ThinLinc VSM Agent connection refused

Hello, for the past week one of my VSM agents has been marked as down, and I can't find the cause.
The virtual machine is up, and DNS and the network are both working. I can ping the host from both of my HA ThinLinc nodes, and I can SSH into it from them.

The log in /var/log/vsmserver.log shows:

WARNING vsmserver.loadinfo: [Errno 111] Connection refused talking to VSM Agent tl-agent-1:904 in request for loadinfo. Marking as down.
2022-03-06 00:00:43 WARNING vsmserver.loadinfo: [Errno 111] Connection refused talking to VSM Agent tl-agent-1:904 in request for loadinfo. Marking as down.
2022-03-06 00:01:23 WARNING vsmserver.loadinfo: [Errno 111] Connection refused talking to VSM Agent tl-agent-1:904 in request for loadinfo. Marking as down.
2022-03-06 00:02:03 WARNING vsmserver.loadinfo: [Errno 111] Connection refused talking to VSM Agent tl-agent-1:904 in request for loadinfo. Marking as down.
2022-03-06 00:02:43 WARNING vsmserver.loadinfo: [Errno 111] Connection refused talking to VSM Agent tl-agent-1:904 in request for loadinfo. Marking as down.
2022-03-06 00:03:23 WARNING vsmserver.loadinfo: [Errno 111] Connection refused talking to VSM Agent tl-agent-1:904 in request for loadinfo. Marking as down.
2022-03-06 00:04:03 WARNING vsmserver.loadinfo: [Errno 111] Connection refused talking to VSM Agent tl-agent-1:904 in request for loadinfo. Marking as down.
2022-03-06 00:04:43 WARNING vsmserver.loadinfo: [Errno 111] Connection refused talking to VSM Agent tl-agent-1:904 in request for loadinfo. Marking as down.
2022-03-06 00:05:24 WARNING vsmserver.loadinfo: [Errno 111] Connection refused talking to VSM Agent tl-agent-1:904 in request for loadinfo. Marking as down.
2022-03-06 00:06:04 WARNING vsmserver.loadinfo: [Errno 111] Connection refused talking to VSM Agent tl-agent-1:904 in request for loadinfo. Marking as down.
2022-03-06 00:06:44 WARNING vsmserver.loadinfo: [Errno 111] Connection refused talking to VSM Agent tl-agent-1:904 in request for loadinfo. Marking as down.
2022-03-06 00:07:24 WARNING vsmserver.loadinfo: [Errno 111] Connection refused talking to VSM Agent tl-agent-1:904 in request for loadinfo. Marking as down.
2022-03-06 00:08:04 WARNING vsmserver.loadinfo: [Errno 111] Connection refused talking to VSM Agent tl-agent-1:904 in request for loadinfo. Marking as down.
2022-03-06 00:08:44 WARNING vsmserver.loadinfo: [Errno 111] Connection refused talking to VSM Agent tl-agent-1:904 in request for loadinfo. Marking as down.
2022-03-06 00:09:24 WARNING vsmserver.loadinfo: [Errno 111] Connection refused talking to VSM Agent tl-agent-1:904 in request for loadinfo. Marking as down.
2022-03-06 00:10:04 WARNING vsmserver.loadinfo: [Errno 111] Connection refused talking to VSM Agent tl-agent-1:904 in request for loadinfo. Marking as down.

This is ThinLinc v12.1. The configuration is the same as last year; I have not changed anything except switching the localectl language from French to English on the agent.

Does anyone have an idea?

Is vsmagent running on the agent server?

% systemctl status vsmagent

What does that show when you run it on the tl-agent-1 server?

And what is the result of:

% nc -zv tl-agent-1 904

when you run it on the ThinLinc master server?
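
If nc is not installed on the master, roughly the same check can be done with a few lines of Python. This is only a sketch: the function name probe_tcp and the result strings are made up for illustration, and it only tells you whether a TCP connection to the port can be opened, not whether vsmagent itself is healthy.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Try to open a TCP connection and classify the outcome.

    Returns one of: "open", "refused", "timeout", "dns-failure",
    or "error: <details>" for any other socket-level failure.
    """
    try:
        # create_connection resolves the name and connects in one step.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Errno 111: the host answered, but nothing listens on that port.
        return "refused"
    except socket.timeout:
        # No answer at all, e.g. packets silently dropped by a firewall.
        return "timeout"
    except socket.gaierror:
        # The hostname could not be resolved.
        return "dns-failure"
    except OSError as exc:
        # Anything else, e.g. "No route to host".
        return f"error: {exc}"
```

Calling probe_tcp("tl-agent-1", 904) on the master should return "refused" while the agent is in the state shown in the log above. A "refused" result usually points at the service not listening (or listening on another address) rather than at the network, while "timeout" or "dns-failure" point elsewhere.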

Hello,

The issue with my agent node is closed: I am migrating to a bigger hardware cluster.

But it appears I have the same situation on a fresh cluster (2 nodes with 2 sockets of 20 cores / 40 threads and 512 GB RAM).

The Corosync / Pacemaker cluster is healthy:

root@tl-alpha-d11:/var/log# crm configure show
node 1: tl-alpha-d11
node 2: tl-beta-d11
primitive cluster_ip IPaddr2 \
        params cidr_netmask=24 ip=10.1.15.2 \
        op monitor interval=30s \
        op start timeout=20s interval=0s \
        op stop timeout=20s interval=0s
primitive fence_tl-alpha-d11 stonith:fence_virsh \
        params ipaddr=10.1.15.2 port=VMtl-alpha-d11 action=off login=root passwd="******" \
        op monitor interval=60s
primitive fence_tl-beta-d11 stonith:fence_virsh \
        params ipaddr=10.1.15.2 port=VMtl-beta-d11 action=off login=root passwd="******" delay=15 \
        op monitor interval=60s
location l_fence_tl-alpha-d11 fence_tl-alpha-d11 -inf: tl-alpha-d11
location l_fence_tl-beta-d11 fence_tl-beta-d11 -inf: tl-beta-d11
property cib-bootstrap-options: \
        have-watchdog=false \
        dc-version=2.0.5-ba59be7122 \
        cluster-infrastructure=corosync \
        cluster-name=debian \
        stonith-enabled=yes \
        no-quorum-policy=ignore \
        last-lrm-refresh=1647291235v

nc -zv tl-alpha-d11 904

root@tl-alpha-d11:/var/log# nc -zv tl-beta-d11 904
Connection to tl-beta-d11 (10.1.15.4) 904 port [tcp/*] succeeded!
root@tl-alpha-d11:/var/log# nc -zv tl-alpha-d11 904
Connection to tl-alpha-d11 (10.1.15.3) 904 port [tcp/*] succeeded!
  • Connexion to the server expired, check your settings

When I check /var/log/vsmserver/agent at the same time, I get:

The "no route to host" in the log was because I had 127.0.0.1 linked to the DNS name in /etc/hosts (now fixed).
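
For anyone who wants to check for the same /etc/hosts mistake, here is a small sketch that lists hostnames bound to a loopback address. The function names, the list of "standard" loopback names, and the sample entries in the usage note are my own assumptions for illustration; adjust them to your setup.

```python
import ipaddress

# Names that are normally expected on loopback addresses (an assumption;
# extend this set to match your distribution's defaults).
STANDARD_LOOPBACK_NAMES = {
    "localhost",
    "localhost.localdomain",
    "ip6-localhost",
    "ip6-loopback",
}


def loopback_names(hosts_text: str) -> dict[str, str]:
    """Map every hostname bound to a loopback address to that address."""
    found = {}
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split()
        try:
            addr = ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # first field is not an IP address, skip the line
        if addr.is_loopback:  # 127.0.0.0/8 or ::1
            for name in fields[1:]:
                found[name] = fields[0]
    return found


def suspicious_names(hosts_text: str) -> dict[str, str]:
    """Loopback-bound names that are not one of the standard ones."""
    return {
        name: addr
        for name, addr in loopback_names(hosts_text).items()
        if name not in STANDARD_LOOPBACK_NAMES
    }
```

Usage would be something like suspicious_names(open("/etc/hosts").read()). With a hosts file containing a line such as "127.0.0.1 tl-alpha-d11" (a made-up example entry), that name would be reported, which is the kind of entry that makes a node resolve its own cluster name to loopback.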

I have exactly the same configuration as in my previous setup on Debian 10; now I’m on Debian 11.

This morning I tested again, and it is working now!

I did not change anything in the configuration. Maybe it was a DNS propagation issue?
I tried to find the cause, but my investigation turned up nothing.
