Issue with ThinLinc HA setup

Hi,

We recently set up ThinLinc HA to test failover. The vsmserver.hconf on one node looks like this:

# cat /opt/thinlinc/etc/conf.d/vsmserver.hconf
#  -*- mode: conf-unix; -*-

# Hiveconf configuration file - VSM server
#
[/vsmserver]

# Administrators email
admin_email=root@localhost

#
# Terminal servers. A space-separated list of agent server hostnames. These will
# be used for communication between the server and the agent. The names reported
# to clients are fetched from the agent itself; names in terminalservers are not
# reported directly to clients.
#
terminalservers=thinlinc01.abc.com

# Load balance finetuning
ram_per_user=100
bogomips_per_user=600
existing_users_weight=4
load_update_cycle=40

# The maximum number of sessions per user. 0 means no limit.
max_sessions_per_user=4

# Only allow connections from clients in this space-separated list for priv
# operations localhost and hostname IP are always allowed.
allowed_clients=

# ThinLinc access can be limited to certain groups. If the allowed_groups
# space-separated list is empty, all users are accepted. Otherwise, the user
# must be a member of the groups listed below, to be able to use ThinLinc.
# Example: allowed_groups=students teachers
allowed_groups=

# If true, processes occupying the users interval of forwarded ports
# will be killed.
unbind_ports_at_login=true

# A space-separated list of candidate:agenthost pairs that can be used to force
# sessions for specific users or groups to be created on specific agent hosts.
# Note that only one server can be specified per candidate. No load
# balancing is in use for servers selected this way.
#
# If the specific server is down, no session will be created.
#
# If a server specified here is also listed in terminalservers,
# sessions will also be created for users or groups not listed here.
# Use of this parameter is recommended only for special circumstances,
# for example when testing new operating systems.
#
# Groupnames should be prepended by a '+'. Example:
# explicit_agentselection=+agentoneusers:agentone
explicit_agentselection=

# Port to listen on
# This should normally be the same as /vsm/vsm_server_port, but under
# some special circumstances, it might be set to another value.
listen_port=9000

[/vsmserver/subcluster/default]
agents=thinlinc01.abc.com thinlinc02.abc.com

[/vsmserver/HA]
# Enable HA operations by setting this to 1.
enabled=1

# A space-separated list of the nodes in the cluster
nodes=thinlinc01.abc.com thinlinc02.abc.com

and vsmagent.hconf looks like this:

[/vsmagent]


# The host that runs the VSM server (master machine)
master_hostname=thinlinc.abc.com (hostname of LB)
# Only allow connections from the VSM servers in this space-separated list.
# localhost, hostname, IP and master_hostname are always allowed. NOTE: Do not
# change this parameter unless you know what you are doing.
allowed_clients=thinlinc01.abc.com thinlinc02.abc.com

# Automatically create the users home directory, if it doesn't exist?
make_homedir=1
# The file mode for the newly created home directory
make_homedir_mode=0700

# The default geometry, if the client doesn't request anything.
default_geometry=1024 768

# Save password for future logins?
single_signon=1

# Extra arguments to pass to the Xserver Xvnc, for example:
# xserver_args=-MaxIdleTime 60
xserver_args=-br -localhost -verbose 3

# The location of the Xauthority file, either homedir or sessiondir
xauthority_location=sessiondir

# Public hostname; the hostname that clients are redirected to. If not
# defined, the agent will use the computer's IP address.
agent_hostname=

# The maximum port used for VNC and tunnel ports for displays
# display_min to display_max. This number may not be higher than
# lowest_user_port
max_session_port=32767

# The lowest port to be used for user programs needing TCP/UDP ports.
# This must be higher than max_session_port.
lowest_user_port=32768

# Where to start allocating display numbers.
display_min=10

# Timeout in tenths of seconds, for starting new sessions
xvnc_start_timeout=250

# The maximum display number to use on this VSM agent host.
# display_max - display_min is the maximum number of ThinLinc users
# allowed on this host. Default is 2000.
display_max=2000

# Port to listen on
# This should normally be the same as /vsm/vsm_agent_port, but under
# some special circumstances, it might be set to another value.
listen_port=904

# Environment variables to add to users environment, before running
# xstartup. Note: Since xstartup is run through /bin/bash --login,
# files in /etc/profile.d/ will be sourced and may override values in
# default_environment.
# Note: TOWN is just an example.

The same configuration is present on the other node too.

So, when a user connects to the LB IP, the session is created on either thinlinc01.abc.com or thinlinc02.abc.com, which is fine. Now, when I shut down the node where the session was created, I expect the session to be transferred to the other node, but this is not happening.

When the session is first created on thinlinc01, I see the following in the logs on that node:

2023-05-16 15:22:03 DEBUG vsmserver: Requesting VSM Agent 127.0.0.1 to unbind ports for mhartman's display 10
2023-05-16 15:22:03 DEBUG vsmserver: Handling connection from ('loadbalancerIP', 1023)
2023-05-16 15:22:03 DEBUG vsmserver.HA: Handling session change (new, abcd/127.0.0.1:10)
2023-05-16 15:22:03 WARNING vsmserver.HA: Other node sent (new,abcd), but we already have this information
2023-05-16 15:22:03 DEBUG vsmserver.HA: Successfully transferred session change (new,abcd/127.0.0.1:10) to other node
2023-05-16 15:22:03 DEBUG vsmserver.HA: Writing active HA changes to disk
2023-05-16 15:22:03 DEBUG vsmserver.HA: Done writing active HA changes

I was then expecting some kind of HA logs on the other node too, i.e. thinlinc02, but there weren't any.
Now, when I shut down the thinlinc01 node, the session was not available on the 02 node. When I checked the logs on 01 after starting it again some time later, I saw:

2023-05-16 14:59:33 DEBUG vsmserver.HA: Writing active HA changes to disk
2023-05-16 14:59:33 DEBUG vsmserver.HA: Done writing active HA changes
2023-05-16 14:59:33 DEBUG vsmserver: Handling connection from ('loadbalancerIP', 1022)
2023-05-16 14:59:33 DEBUG vsmserver.HA: Handling session change (delete, abcd/127.0.0.1:10)
2023-05-16 14:59:33 WARNING vsmserver.HA: Other node told us to remove non-existing session abcd/127.0.0.1:10
2023-05-16 14:59:33 DEBUG vsmserver: Handling connection from ('127.0.0.1', 46540)
2023-05-16 14:59:33 WARNING vsmserver.HA: Tried to transfer session change (delete,abcd/127.0.0.1:10) to other node but other node reported HA_NOSUCHSESSION
2023-05-16 14:59:33 DEBUG vsmserver.HA: Writing active HA changes to disk
2023-05-16 14:59:33 DEBUG vsmserver.HA: Done writing active HA changes
2023-05-16 14:59:33 DEBUG vsmserver.session: User with uid 0 requested a socket
2023-05-16 14:59:34 DEBUG vsmserver: Scheduled load update of 127.0.0.1 at Tue May 16 15:00:14 2023

Can you please help me here and point out if I am missing something?

Hello,

ThinLinc HA does not provide functionality to transfer sessions between agent nodes if one or more agents have issues.
From the ThinLinc Administrator’s Guide:

 Since the VSM server service handles load-balancing and the session database, it can be problematic if the machine fails. ThinLinc HA provides protection for this service against the single point of failure that the hardware running the VSM server normally is.

I.e., it provides functionality to keep the session database synchronised between the two VSM servers, so that users can still reach their sessions running on an agent in case of failure on one of the VSM servers.
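
As an illustration (all hostnames below are placeholders, not taken from your setup), the layout HA is designed for keeps the VSM server role on a dedicated pair of machines and the agents on separate machines, with both server nodes sharing an identical vsmserver.hconf along these lines:

# /opt/thinlinc/etc/conf.d/vsmserver.hconf on both tlserver01 and tlserver02
# (sketch only; tlserver*/tlagent*/example.com are placeholder names)
[/vsmserver/subcluster/default]
# Agents run on separate machines; sessions are load-balanced across them
agents=tlagent01.example.com tlagent02.example.com

[/vsmserver/HA]
# Enable HA operations by setting this to 1.
enabled=1
# The two machines running the vsmserver service
nodes=tlserver01.example.com tlserver02.example.com

With that separation, if either tlserver01 or tlserver02 fails, the other still has the synchronised session database and can redirect users to their existing sessions on the agents.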

Kind regards,
Martin

Hello,

In my case, the two nodes I have behave as both server and agent. I followed exactly what is mentioned in https://www.cendio.com/resources/docs/tag/HA-configuration.html

The purpose of HA is to provide a way for users to still reach their session in a situation where one of the VSM servers fails for some reason. If the user's session is running on the same host that is running the VSM server, and that host fails, their session will be lost.

One more thing: from your pasted vsmserver.hconf, I can see that you are running an old version of ThinLinc, which has most probably been EOL'd and is without support. You should consider upgrading.

/Martin

Hi @martin,

We are running the latest version, 4.14.0.

We have two nodes, each running both the agent and the server, with HA enabled. So my question is: won't HA work here?

When a session is created on one node, HA should pass that info to the other node. So if the first node, the one with the active session, dies, that info has already been passed on to the second node, and hence, as I understand ThinLinc HA, it should ideally have worked.

The purpose of HA is not to migrate the actual desktop session, but to make sure that there is always one node ready to serve incoming VSM connections and to keep the session database synchronised.

Hence, it does not make sense to have the vsmagent service running on the same node as vsmserver.
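
To sketch what that separation could look like in practice (again with placeholder hostnames, not a definitive recipe), each dedicated agent machine would run only vsmagent, configured roughly like this:

# /opt/thinlinc/etc/conf.d/vsmagent.hconf on a dedicated agent machine
# (sketch only; tl.example.com is a placeholder for the name clients connect to,
#  tlserver01/tlserver02 are the two HA-protected VSM server machines)
[/vsmagent]
# Name that reaches the currently active VSM server (e.g. a floating or load-balanced name)
master_hostname=tl.example.com
# Allow both VSM server nodes to contact this agent
allowed_clients=tlserver01.example.com tlserver02.example.com
# This agent's own public name, which clients are redirected to
agent_hostname=tlagent01.example.com

That way, a failure of one VSM server node does not take any agents, and therefore any running sessions, down with it; a failure of an agent still loses the sessions running on that particular agent.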

Kind regards,
Martin

Hi Martin,

Thanks for the clarification.

Just one last question: if one agent dies, will the sessions running on it still be available (i.e. migrated to the second agent node)?

No, all sessions will be lost on the agent that dies.

Regards,
Martin