I recently ran into a problem with our ThinLinc setup, which is one Master server and one Agent server with connections load-balanced between the two.
We have a public web address where you can connect via web interface or the ThinLinc client which goes to the IP address of the Master server in our DMZ. The Master server then transfers the connection to itself or to the Agent server depending on which one is fuller.
The problem I have encountered is that the Master server asks for a 2FA code (which is the desired behaviour) but if he transfers the connection to the Agent, the Agent asks for a 2FA code again (without showing the input window for the code) and then times out and cannot establish a connection. In the client’s logs you can see that the Master asks for 2FA, shows the window to input it, then accepts it (if it is valid of course), then transfers to the Agent which asks for 2FA again, doesn’t show the window and then times out.
If you immediately connect again, you get transferred directly to the Agent which then correctly shows the input box for the 2FA and will accept the NEXT 2FA code you get in your authenticator app (tried Google and Microsoft Authenticators). It does NOT accept the same 2FA code that you input on the previous connection at the Master server, indicating that this code was accepted and saved somewhere in the user’s home folder so it will not be accepted again.
If the Agent server is fuller than the Master, the Master transfers the incoming connection to itself and there are no problems: The 2FA code has already been accepted and the session is started. The problem only comes up when the connection is transferred to the Agent.
To me it sounds like there is some missing communication between the Master and the Agent. I believe, normally the Master should tell the Agent that 2FA was already handled and the Agent should not ask again for the 2FA code. For some reason, though, it does. Maybe there is some problem with our firewall settings not allowing communication between Master and Agent? Which ports should be open on which? Port 9000 on the Master and 904 on the Agent, right?
Is my assumption even correct that the Master should handle 2FA and the Agent should not ask again? Or should 2FA happen AFTER the transfer to either the Master itself or the Agent? That would indicate a problem with our config instead of the communication between Master and Agent.
Which authentication provider are you using?
ThinLinc requires the passcode to be used twice due to how the ThinLinc client first connects and authenticates to the master server and then reconnects and authenticates to the agent server.
Port 22,904 and 9000 needs to be open for communication between the master and agent.
Since the master will direct your user to the agent, this seems to be the case already.
So far, everything has been working with this config on the Master and (before a recent restart of both servers) it also worked with the Agent. Before this reboot, when connecting via web interface or client, the user would get one 2FA prompt and then be logged in. Now the above-mentioned behaviour happens.
The client log from an unsuccessful login shows Password & 2FA prompt on both servers:
22,904 and 9000 needs to be open between master and agent. Port 22 also needs to be open between the client and all masters and agents.
What does the ThinLinc client report when this error occurs?
Could you also please share the sshd log from both master and agent:
journalctl -u sshd.service
Are the home directories shared between the master and agent? If google-authenticator is also installed on the agent (which it seems to be), the .google_authenticator file has to match.
22,904 and 9000 needs to be open between master and agent. Port 22 also needs to be open between the client and all masters and agents.
In our iptables firewall, all three are open on Master and on Agent. However, when listing the packets going through each of the rule chains with iptables -L -n -v the rule for port 904 only shows any activity on the Agent and the rule for port 9000 only shows activity on the Master. The rule for port 904 shows 0 packets on the Master and the rule for port 9000 shows 0 packets on the Agent. The rule for port 22 shows a lot of packets on both machines.
What does the ThinLinc client report when this error occurs?
The client log says:
2023-08-01T14:06:40: ssh[E]: NEXT AUTHMETHOD: none
2023-08-01T14:06:40: ssh[E]: AUTH FAILURE
2023-08-01T14:06:40: ssh[E]: NEXT AUTHMETHOD: keyboard-interactive
2023-08-01T14:06:40: ssh[E]: SSH_PROMPT:(user.name@agent.domain.de) Password:
2023-08-01T14:06:40: ssh[E]: SSH_PROMPT:(user.name@agent.domain.de) Verification code:
2023-08-01T14:06:42: ssh[E]: AUTH FAILURE
2023-08-01T14:06:42: ssh[E]: NEXT AUTHMETHOD: password
2023-08-01T14:06:42: Already tried one password method. Aborting...
2023-08-01T14:06:45: Process 124228 exited with code 0
2023-08-01T14:06:45: Process 124236 killed by signal 15
at the moment when the connection is handed from Master to Agent (the moment when the status message in the client window switches from reading “connecting to server master.domain.de” to “connecting to agent agent@domain.de”). This switch happens the moment you click “OK” on the 2FA prompt. When immediately connecting again, the message “connecting to server master.domain.de” is not shown and it immediately shows “connecting to agent agent@domain.de” and the 2FA code is accepted here, if it is not the previously used code.
Could you also please share the sshd log from both master and agent:
journalctl -u sshd.service
That command shows on both machines (when run as root):
-- Logs begin at Tue 2023-05-16 12:17:34 CEST, end at Mon 2023-08-07 11:21:24 CEST. --
-- No entries --
Are the home directories shared between the master and agent? If google-authenticator is also installed on the agent (which it seems to be), the .google_authenticator file has to match.
The home dirs are on a storage server and served as NFS exports that are mounted on the Master and Agent, so the .google_authenticator file would be the same. It is located in the home folder itself (not under ~/.ssh as in the linked tutorial).
I checked the systemd log while trying to connect with journalctl -f | grep sshd and got the following output:
on the Master:
Aug 07 11:45:47 master sshd[106336]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=192.168.0.1 user=dummy.user
Aug 07 11:46:15 master sshd(pam_google_auth)[106336]: Accepted google_authenticator for dummy.user
Aug 07 11:46:15 master sshd[106334]: Accepted keyboard-interactive/pam for dummy.user from 192.168.0.2 port 57667 ssh2
Aug 07 11:46:15 master sshd[106334]: pam_unix(sshd:session): session opened for user dummy.user by (uid=0)
Aug 07 11:46:18 master sshd[106334]: pam_unix(sshd:session): session closed for user dummy.user
on the Agent:
Aug 07 11:46:20 agent sshd[60824]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=192.168.0.2 user=dummy.user
Aug 07 11:46:20 agent sshd(pam_google_auth)[60824]: Trying to reuse a previously used time-based code. ("/home/dummy.user/.google_authenticator")Retry again in 30 seconds. Warning! This might mean, you are currently subject to a man-in-the-middle attack.
Aug 07 11:46:20 agent sshd(pam_google_auth)[60824]: Invalid verification code for dummy.user
Aug 07 11:46:22 agent sshd[60799]: error: PAM: Authentication failure for dummy.user from 192.168.0.2
Aug 07 11:46:38 agent sshd[60799]: Connection reset by authenticating user dummy.user 192.168.0.2 port 57678 [preauth]
If (as you said in the first answer) the ThinLinc client disconnects from the Master before re-connecting to the Agent, it looks like that’s the problem, right? The Master asks for 2FA, the user inputs the 2FA, the used 2FA is written to the .google-authenticator file and when the ThinLinc client re-connects to the Agent a few seconds later, the Agent tries to login with that 2FA code but it is already marked as used at that point.
Is the Master somehow supposed to communicate to the Agent that 2FA was already handled? Because the logs suggest that the Agent will always try to authenticate against a 2FA code that was already used at the Master and written as used to the .google-authenticator file? I assume here that the Master somehow forwards the 2FA code to the Agent? Because the 2FA prompt is only ever shown once, at the Master. Or is it supposed to be shown twice or only at the Agent?
I definitely remember (from when everything was still working) that the 2FA prompt was only ever shown once, but I’m not sure if it was during the connection phase at the Master or at the Agent?
P.S:
I guess the pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=192.168.0.1 user=dummy.user
part from the journalctl logs correspond to the
2023-08-01T14:06:40: ssh[E]: NEXT AUTHMETHOD: none
2023-08-01T14:06:40: ssh[E]: AUTH FAILURE
part from the client logs and there is also something wrong with our order of auth methods.
Sorry, I wasn’t clear enough. What I meant was what does the ThinLinc client GUI reports?
I checked the systemd log while trying to connect with journalctl -f | grep sshd and got the following output:
Hmm ok. I would expect it to also log something similar to this, since you are using the “grace_period” argument in your google_authenticator PAM module:
Aug 07 10:48:37 lab-196.lkpg.cendio.se sshd(pam_google_authenticator)[8481]: within grace period: "cendio"
Aug 07 11:46:20 agent sshd(pam_google_auth)[60824]: Trying to reuse a previously used time-based code. ("/home/dummy.user/.google_authenticator")Retry again in 30 seconds. Warning! This might mean, you are currently subject to a man-in-the-middle attack.
This is not something I’ve seen before, and something you would’ve to investigate further. I tried to replicate your issue in my test environment, but have yet not been able to. (lab env consists of two RHEL9 with google-authenticator from EPEL, and NFS shared /home)
Is the Master somehow supposed to communicate to the Agent that 2FA was already handled? Because the logs suggest that the Agent will always try to authenticate against a 2FA code that was already used at the Master and written as used to the .google-authenticator file? I assume here that the Master somehow forwards the 2FA code to the Agent? Because the 2FA prompt is only ever shown once, at the Master. Or is it supposed to be shown twice or only at the Agent?
No, ThinLinc is not aware that pam_google_authenticator is in use at all, actually. ThinLinc communicates back and forth to PAM (over ssh) according with how the system is configured. (ChallangeResponseAuthentication yes in sshd_config is fundamental to this).
As you wrote earlier, the issue started after reboot of the servers, not a reconfiguration of you firewall. So I believe we can leave this out of the troubleshooting. For more information regarding ports used by ThinLinc you can find here:
What I meant was what does the ThinLinc client GUI reports?
The GUI says in the small stripe at the bottom of the window:
when user clicks “Connect” it reads: “Connecting to server master.domain.de…”
then the 2FA prompt pops up and the user inputs their Verification code
then the message changes to: “Connecting to agent agent.domain.de…”
after a few seconds a “Connection error” window pops up with the message: “Couldn’t set up secure tunnel to ThinLinc agent. (Login failed! Wrong username or password.)”
While the logs on the sever side implicate that the Agent is trying to re-use the 2FA token that was asked for at the Master but finds that it has been used before. The username and password are correct and work when immediately re-connecting.
Hmm ok. I would expect it to also log something similar to this, since you are using the “grace_period” argument in your google_authenticator PAM modul
The message with the grace_period does appear but only on the second connection, so after login failed once and the user immediately tries again but with the next 2FA code he gets in the Authenticator app. Then the login also works and the desktop is shown and everything works perfectly. It just never works on the first try if the user’s session is put on the Agent. If the session by chance gets put on the Master itself, everything works first try.
(ChallangeResponseAuthentication yes in sshd_config is fundamental to this)
Okay, we have that set to yes on both, Agent and Master.
As you wrote earlier, the issue started after reboot of the servers, not a reconfiguration of you firewall. So I believe we can leave this out of the troubleshooting.
Hmm okay, from what you wrote on the previous post it sounds like it is not a problem with communication between Master and Agent but between Master/Agent/Client on one side and PAM on the other.
Oh okay, the within grace_period message comes up in systemd on the Master if the user immediately tried again and the last used 2FA token is still valid on the Master at that point. Then the Master transfers the session to the Agent which then asks again for a 2FA token and then accepts any token that came after the one given to the Master previously…
Sounds like PAM doesn’t bridge communication between Master and Agent and both always want a 2FA token. In the one situation where the Master wants to give a session to the Agent, this leads to the problem that the Agent tries using a token which was already used.
Aug 07 11:46:20 agent sshd(pam_google_auth)[60824]: Trying to reuse a previously used time-based code. ("/home/dummy.user/.google_authenticator")Retry again in 30 seconds. Warning! This might mean, you are currently subject to a man-in-the-middle attack.
This would imply that your user answered Yes to the question:
Do you want to disallow multiple uses of the same authentication
token? This restricts you to one login about every 30s, but it increases
your chances to notice or even prevent man-in-the-middle attacks (y/n)
The problem is then likely the RATE_LIMIT .. 30 and DISALLOW_REUSE parts, right? I’m not sure what the number behind DISALLOW_REUSE means? I would have expected this to read either True/False or Yes/No or 0/1 or something? But this looks like the problematic part.
I’m still not sure why, though, as this exact setup worked fine before the restart, even with the DISALLOW_REUSE option set exactly like this.
Is the REUSE policy linked to the RATE_LIMIT? As in: REUSE is always allowed if the code is reused within 30 seconds (or whatever is set in RATE_LIMIT)? Or is REUSE generally only ever allowed for codes not older than 30 seconds? I’m sure that allowing REUSE will not enable user to use codes that are a month old or always use just one single code forever, right?
Sorry for the confusion but the servers in question have been running for over 1 year before this specific reboot and back then there were different admins who set up the server, so it is possible that there were some weird options set in firewall or network configs that were put there a year ago and then only became active after this reboot and affected port availability or network latency or any other factor that could have caused this.
I’d have to direct you to FILEFORMAT, since this is not something I’m very familiar with.
Perfect, that helps a lot! I only had checked the readme and the code itself.
Okay, so our only option would be to allow reuse. Or is there an option in ThinLinc that 2FA auth only happens after the Master has delegated to the Agent? So that the Master only checks the password, then delegates and only the Agent checks the 2FA?
Ah okay, then I think we need to change our setup so that we have a Master that does not also function as an Agent taking sessions. I think then we will need a small VM that just acts as Master and has “only password” configured in PAM and which then load-balances sessions between our two current servers (as Agents) which both have 2FA configured in PAM.
Or we just allow reuse for those 30 seconds until a new TOTP token is generated anyways.
Oh that’s right, I could just let the Master have 2FA and the Agent with password only, right? This way the user will get prompted for 2FA only once but definitely at least once.