Sessions are not being load balanced across agents

Hi,

We recently set up ThinLinc HA and clustering to test failover and load balancing. The vsmserver.hconf on one node looks like:

# cat /opt/thinlinc/etc/conf.d/vsmserver.hconf
#  -*- mode: conf-unix; -*-

# Hiveconf configuration file - VSM server
#
[/vsmserver]

# Administrators email
admin_email=root@localhost

#
# Terminal servers. A space-separated list of agent server hostnames. These will
# be used for communication between the server and the agent. The names reported
# to clients are fetched from the agent itself; names in terminalservers are not
# reported directly to clients.
#
terminalservers=thinlinc01.abc.com

# Load balance finetuning
ram_per_user=100
bogomips_per_user=600
existing_users_weight=4
load_update_cycle=40

# The maximum number of sessions per user. 0 means no limit.
max_sessions_per_user=4

# Only allow connections from clients in this space-separated list for priv
# operations localhost and hostname IP are always allowed.
allowed_clients=

# ThinLinc access can be limited to certain groups. If the allowed_groups
# space-separated list is empty, all users are accepted. Otherwise, the user
# must be a member of the groups listed below, to be able to use ThinLinc.
# Example: allowed_groups=students teachers
allowed_groups=

# If true, processes occupying the users interval of forwarded ports
# will be killed.
unbind_ports_at_login=true

# A space-separated list of candidate:agenthost pairs that can be used to force
# sessions for specific users or groups to be created on specific agent hosts.
# Note that only one server can be specified per candidate. No load
# balancing is in use for servers selected this way.
#
# If the specific server is down, no session will be created.
#
# If a server specified here is also listed in terminalservers,
# sessions will also be created for users or groups not listed here.
# Use of this parameter is recommended only for special circumstances,
# for example when testing new operating systems.
#
# Groupnames should be prepended by a '+'. Example:
# explicit_agentselection=+agentoneusers:agentone
explicit_agentselection=

# Port to listen on
# This should normally be the same as /vsm/vsm_server_port, but under
# some special circumstances, it might be set to another value.
listen_port=9000

[/vsmserver/subcluster/default]
agents=thinlinc01.abc.com thinlinc02.abc.com

[/vsmserver/HA]
# Enable HA operations by setting this to 1.
enabled=1

# A space-separated list of the nodes in the cluster
nodes=thinlinc01.abc.com thinlinc02.abc.com

and vsmagent.hconf looks like:

[/vsmagent]


# The host that runs the VSM server (master machine)
master_hostname=thinlinc.abc.com (resource IP)
# Only allow connections from the VSM servers in this space-separated list.
# localhost, hostname, IP and master_hostname are always allowed. NOTE: Do not
# change this parameter unless you know what you are doing.
allowed_clients=

# Automatically create the users home directory, if it doesn't exist?
make_homedir=1
# The file mode for the newly created home directory
make_homedir_mode=0700

# The default geometry, if the client doesn't request anything.
default_geometry=1024 768

# Save password for future logins?
single_signon=1

# Extra arguments to pass to the Xserver Xvnc, for example:
# xserver_args=-MaxIdleTime 60
xserver_args=-br -localhost -verbose 3

# The location of the Xauthority file, either homedir or sessiondir
xauthority_location=sessiondir

# Public hostname; the hostname that clients are redirected to. If not
# defined, the agent will use the computer's IP address.
agent_hostname=

# The maximum port used for VNC and tunnel ports for displays
# display_min to display_max. This number may not be higher than
# lowest_user_port
max_session_port=32767

# The lowest port to be used for user programs needing TCP/UDP ports.
# This must be higher than max_session_port.
lowest_user_port=32768

# Where to start allocating display numbers.
display_min=10

# Timeout in tenths of seconds, for starting new sessions
xvnc_start_timeout=250

# The maximum display number to use on this VSM agent host.
# display_max - display_min is the maximum number of ThinLinc users
# allowed on this host. Default is 2000.
display_max=2000

# Port to listen on
# This should normally be the same as /vsm/vsm_agent_port, but under
# some special circumstances, it might be set to another value.
listen_port=904

# Environment variables to add to users environment, before running
# xstartup. Note: Since xstartup is run through /bin/bash --login,
# files in /etc/profile.d/ will be sourced and may override values in
# default_environment.
# Note: TOWN is just an example.

The same configuration is present on the other node too. Both of my nodes are running vsmserver and vsmagent, and I have set up HA and clustering between them. I have also configured keepalived.

So, when a user connects to the resource IP, the session is always created on the master, thinlinc01.abc.com. I was expecting sessions to also be created on thinlinc02.abc.com and the load to be distributed, but sadly that is not happening.

Can you please advise if I am missing something?

There is one more issue: when a session is created, in the vsmserver log on the node where the session is created I can see

2023-08-14 12:15:49 DEBUG vsmserver.HA: Writing active HA changes to disk
2023-08-14 12:15:49 DEBUG vsmserver.HA: Done writing active HA changes
2023-08-14 12:15:49 DEBUG vsmserver.session: Writing active sessions to disk
2023-08-14 12:15:49 DEBUG vsmserver.session: Done writing sessions
2023-08-14 12:15:49 DEBUG vsmserver.HA: Successfully transferred session change (new,admintestaman/127.0.0.1:10) to other node
2023-08-14 12:15:49 DEBUG vsmserver.HA: Writing active HA changes to disk
2023-08-14 12:15:49 DEBUG vsmserver.HA: Done writing active HA changes

and on the other machine I see

2023-08-14 12:32:52 DEBUG vsmserver.session: User with uid 0 requested a socket
2023-08-14 12:33:10 DEBUG vsmserver.session: Doing periodic session verification
2023-08-14 12:35:57 DEBUG vsmserver.HA: Handling session change (new, admintestaman/127.0.0.1:10)
2023-08-14 12:35:57 DEBUG vsmserver.HA: Got (new,admintestaman) from other node. Updating my session database
2023-08-14 12:35:57 DEBUG vsmserver.session: Writing active sessions to disk
2023-08-14 12:35:57 DEBUG vsmserver.session: Done writing sessions

My doubt/concern is why it is showing the IP 127.0.0.1; I was expecting it to display the IP/hostname of the server instead.

Am I correct here?

Now, since it says 127.0.0.1, a session is also created on the slave thinlinc02 server, which IMO is not correct. After a while that session dies, this info is then transferred to the master node, which then also deletes its session, and because of this my original session dies.

Can you please point out what I am missing here?

Hello @Aman

Both of my nodes are running vsmserver and vsmagent, and I have set up HA and clustering between them. I have also configured keepalived.

So thinlinc01.abc.com thinlinc02.abc.com are acting as both masters and agents?

So, when a user connects to the resource IP, the session is always created on the master, thinlinc01.abc.com. I was expecting sessions to also be created on thinlinc02.abc.com and the load to be distributed, but sadly that is not happening.

What made you expect the sessions to be created on thinlinc02.abc.com if I may ask?

New sessions will be placed on the agent that has the lightest load, and that has the highest rating value. You can check these values in tlwebadmin under Status > Load.
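To illustrate the general idea, here is a rough sketch (in Python, NOT ThinLinc's actual algorithm — the exact formula is internal to vsmserver) of how a rating based on the tuning parameters in vsmserver.hconf (ram_per_user, bogomips_per_user, existing_users_weight) could work: spare capacity is estimated from free RAM and CPU bogomips divided by the per-user constants, then penalised by the number of existing users. All agent names and numbers below are made up.

```python
# Illustrative sketch only; assumes the hconf values from this thread.
RAM_PER_USER_MB = 100        # ram_per_user
BOGOMIPS_PER_USER = 600      # bogomips_per_user
EXISTING_USERS_WEIGHT = 4    # existing_users_weight

def rating(free_ram_mb, bogomips, num_users):
    ram_capacity = free_ram_mb / RAM_PER_USER_MB
    cpu_capacity = bogomips / BOGOMIPS_PER_USER
    # The scarcer resource limits how many more users fit on this agent.
    capacity = min(ram_capacity, cpu_capacity)
    # Existing users lower the rating, weighted by existing_users_weight.
    return capacity - EXISTING_USERS_WEIGHT * num_users

# Hypothetical agents: 01 already carries 5 users, 02 is idle.
agents = {
    "thinlinc01": rating(free_ram_mb=2000, bogomips=12000, num_users=5),
    "thinlinc02": rating(free_ram_mb=6000, bogomips=12000, num_users=0),
}
best = max(agents, key=agents.get)  # the agent chosen for the next session
```

Under this toy model the idle thinlinc02 gets the higher rating and receives the next session, which matches the behaviour you should see once both agents report their load correctly.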

There is one more issue - when the session is created in the vsmserver logs where the session is created i can see
and on the other machine i see

These logs from your vsmserver.log show a normal session database synchronization, and they seem to be from two different occasions, since the timestamps are about 20 minutes apart.

My doubt/concern is why its showing 127.0.0.1 IP - I was expecting it to display IP/hostname of the server instead.

I believe this is because the vsmserver is also acting as a vsmagent, and you have not defined any value for agent_hostname in your vsmagent.hconf file.
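For reference, a minimal sketch of what that could look like in /opt/thinlinc/etc/conf.d/vsmagent.hconf (the hostname here is an example — use a name or IP that resolves to the agent itself, not to 127.0.0.1), with a restart of vsmagent afterwards for the change to take effect:

```
[/vsmagent]
# Public hostname reported to the master and redirected to by clients.
agent_hostname=thinlinc01.abc.com
```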

i can see a session is also created in slave thinlinc02 server which IMO is not correct and after a while that session dies - and this info is then transfered to master node and then it also deletes its session - and because of this my original session dies.

Please provide your vsmagent.log from the agent server where the session was created and later died. Please include your session log file as well, as it might provide useful clues as to why the session suddenly died. The session log can be found in /var/opt/thinlinc/sessions//last/xinit.log

Kind regards,
Martin

Hello @martin ,

I am not expecting sessions to always be created on thinlinc02.abc.com. My expectation was that once I have a couple of sessions open on thinlinc01.abc.com, some further sessions should then be created on thinlinc02.abc.com to balance the load, which is not happening. I think this has something to do with this:

2023-08-16 13:36:15 WARNING vsmserver.loadinfo: [Errno 111] Connection refused talking to VSM Agent 127.0.0.1:904 in request for loadinfo. Marking as down.

I did set agent_hostname this time around in the vsmagent.hconf file on both servers, but I am still seeing

2023-08-16 13:36:49 INFO vsmserver.session: Session 127.0.0.1:10 created for user admintestaman
2023-08-16 13:36:49 DEBUG vsmserver.HA: Writing active HA changes to disk
2023-08-16 13:36:49 DEBUG vsmserver.HA: Done writing active HA changes
2023-08-16 13:36:49 DEBUG vsmserver.session: Writing active sessions to disk
2023-08-16 13:36:49 DEBUG vsmserver.session: Done writing sessions
2023-08-16 13:36:49 DEBUG vsmserver.HA: Successfully transferred session change (new,admintestaman/127.0.0.1:10) to other node
2023-08-16 13:36:49 DEBUG vsmserver.HA: Writing active HA changes to disk
2023-08-16 13:36:49 DEBUG vsmserver.HA: Done writing active HA changes

My sessions are created on thinlinc01.abc.com, and this info is then passed on to thinlinc02.abc.com since they are in HA. My session dies on thinlinc02.abc.com, which passes the info back to thinlinc01.abc.com, and then my session on thinlinc01.abc.com also dies. I strongly believe it is because of

2023-08-16 13:36:49 DEBUG vsmserver.HA: Successfully transferred session change (new,admintestaman/127.0.0.1:10) to other node

When this info is passed to the 02 server, then since it is 127.0.0.1, the 02 server thinks the session is running on localhost. This is the reason why, when I open the tlwebadm page in the browser, I can see the session listed on 02 as well, and when I click on that session or refresh the page, the session is removed/killed.

On the 02 server, after a while I see in the logs

2023-08-16 13:46:15 WARNING vsmagent.sessions: Broken session for user admintestaman, tl-session process 24886 does not exist

I think that if I can solve the issue of why 127.0.0.1 is being sent to the 02 server instead of the hostname of 01, then this issue should be solved. But I have not been able to figure that out yet.

Also, could you please help me understand why I am seeing

2023-08-16 13:36:15 WARNING vsmserver.loadinfo: [Errno 111] Connection refused talking to VSM Agent 127.0.0.1:904 in request for loadinfo. Marking as down.

I noticed that you had not configured allowed_clients in vsmagent.hconf. I’m a bit surprised that your setup seemed to (at least kind of) work anyway. That misconfiguration would lead to Permission Denied errors in vsmagent.log when the master server tries to communicate with the agent.

Localhost is always allowed, as well as the IP of the host the VSM agent runs on.

Since your setup seemed to work without any allowed_clients configured, and the masters seem to be confused about which agent owns the session (127.0.0.1 instead of the actual hostname), I believe the issues you’re experiencing might be related to DNS resolution.

Do the hostnames resolve to actual IP addresses, or to 127.0.0.1?
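A common cause of this kind of confusion is an /etc/hosts entry that maps the machine's own hostname to a loopback address. As an illustrative example (hostnames and IPs are placeholders, not taken from your setup):

```
# Problematic: the node's FQDN resolves to loopback
127.0.0.1   localhost thinlinc01.abc.com thinlinc01

# Corrected: the FQDN resolves to the real interface address
127.0.0.1   localhost
192.0.2.11  thinlinc01.abc.com thinlinc01
```

You can check what a name actually resolves to on each node with `getent hosts thinlinc01.abc.com`.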

Kind regards,
Martin

Hi @martin ,

I did add allowed_clients in vsmagent.hconf and it still failed.
Yes, the hostnames resolve to the actual IPs of the servers.

On the master, my session is created:

2023-08-22 13:18:59 INFO vsmserver.session: Session 127.0.0.1:10 created for user admintestaman
2023-08-22 13:18:59 DEBUG vsmserver.HA: Writing active HA changes to disk
2023-08-22 13:18:59 DEBUG vsmserver.HA: Done writing active HA changes
2023-08-22 13:18:59 DEBUG vsmserver.session: Writing active sessions to disk
2023-08-22 13:18:59 DEBUG vsmserver.session: Done writing sessions
2023-08-22 13:18:59 DEBUG vsmserver.HA: Successfully transferred session change (new,admintestaman/127.0.0.1:10) to other node
2023-08-22 13:18:59 DEBUG vsmserver.HA: Writing active HA changes to disk
2023-08-22 13:18:59 DEBUG vsmserver.HA: Done writing active HA changes
2023-08-22 13:27:32 DEBUG vsmserver.session: Doing periodic session verification
2023-08-22 13:27:54 DEBUG vsmserver.HA: Handling session change (delete, admintestaman/127.0.0.1:10)
2023-08-22 13:28:03 DEBUG vsmserver.session: Writing active sessions to disk
2023-08-22 13:28:03 DEBUG vsmserver.session: Done writing sessions

and after a while the slave deletes the session:

2023-08-22 13:27:54 INFO vsmserver.session: Session 127.0.0.1:10 for admintestaman has terminated. Removing.
2023-08-22 13:27:54 DEBUG vsmserver.HA: Writing active HA changes to disk
2023-08-22 13:27:54 DEBUG vsmserver.HA: Done writing active HA changes
2023-08-22 13:27:54 DEBUG vsmserver.HA: Successfully transferred session change (delete,admintestaman/127.0.0.1:10) to other node
2023-08-22 13:27:54 DEBUG vsmserver.HA: Writing active HA changes to disk
2023-08-22 13:27:54 DEBUG vsmserver.HA: Done writing active HA changes
2023-08-22 13:27:54 DEBUG vsmserver.session: Writing active sessions to disk
2023-08-22 13:27:54 DEBUG vsmserver.session: Done writing sessions

and this info was transferred to the master, resulting in the deletion of my actual session from the master.

Hi @martin ,

FYI, I did manage to fix the issue with the hostname not appearing in the HA message; the hostname now comes through and my sessions are working perfectly fine.

But I am facing an issue with creating multiple sessions; I am getting a segmentation fault:

(xfwm4:703704): xfwm4-WARNING **: 13:59:38.687: Failed to connect to session manager: Failed to connect to the session manager: IO error occured opening connection
/opt/thinlinc/etc/xstartup.default: line 24: 703679 Segmentation fault      (core dumped) "${TLPREFIX}/libexec/tl-run-profile"

Any idea how I can fix this?

FYI, I did manage to fix the issue with the hostname not appearing in the HA message; the hostname now comes through and my sessions are working perfectly fine.

Great to hear that you managed to sort it out! Could you please tell us what was wrong and how you fixed it?

But I am facing an issue with creating multiple sessions; I am getting a segmentation fault:

This is because modern desktop environments lack support for running multiple sessions for the same user account. This is not a limitation in ThinLinc itself. You can read more about it here.

I believe the segmentation fault comes from the startup of Xfce, which fails.

Does this always happen when starting more than one session for the same user?

Regards,
Martin