Written by Martin Östlund (@martin)
3rd May, 2023
What problem are we trying to solve?
A typical ThinLinc cluster consists of one “master” server, and several identical “agent” servers behind it, where the sessions themselves are run. Having multiple agent servers allows ThinLinc to perform load-balancing, and also provides a certain degree of redundancy. If one agent goes down, there are still others available to host sessions. However, as there is only one master server, this presents a potential single point of failure for the entire cluster. To address this, ThinLinc provides high-availability (HA) features which enable administrators to eliminate this single point of failure by having a second master server, which is only used if the primary one fails.
From our ThinLinc Administrator Guide:
Both machines have a unique hostname and a unique IP address, but there is also a third IP address that is active only on the node currently responsible for the VSM server service. This is usually referred to as a resource IP address, and it is the address the clients connect to. ThinLinc does not move this resource IP address between servers; supplementary software is required for this purpose.
The information provided in this guide outlines a possible solution on how to achieve this.
Different use cases
While there are other projects out there that can share a VIP (Virtual IP address) between two or more hosts, for example Pacemaker from the Linux-HA project, these can be a bit daunting to set up and maintain, especially if the sole purpose is to provide a VIP between two hosts.
To give you an example, Pacemaker is a great choice if your setup is designed with a chain of different services that all depend on each other somehow. It might be necessary to bring other services up (or down) on another machine in case of a failure on the primary. In this case, you're not only moving the VIP, but the services themselves: you might need to mount a remote file system or ensure that other services are brought down on the host that just lost the VIP. For that use case, Pacemaker from Linux-HA is a good choice.
In a ThinLinc HA setup, two equal machines are used to keep the VSM service running and the session database in sync. One of the machines is considered to be primary, and the other one secondary. The primary machine is normally handling VSM server requests, and if that machine is, for whatever reason, offline, the secondary is ready to perform its duties. There are no other dependencies involved in a ThinLinc HA pair, and it’s rather simple in its design. That’s why keepalived is a good fit for providing a VIP failover between the pair.
To avoid this single point of failure, and to allow a failure of one VSM server to be completely transparent to ThinLinc users, we need a convenient way to make the transition to the secondary as smooth as possible. Simply informing the users by saying "Hey, use this IP address instead" is neither efficient nor user-friendly. That's where keepalived comes in.
Keepalived is software written in C that provides simple tooling to manage high availability for Linux servers. It uses the Virtual Router Redundancy Protocol (VRRP) to create a virtual IP address that can be shared among multiple machines. It also ensures that if one machine fails, another machine can seamlessly take over its tasks by moving the VIP to itself.
Once keepalived is installed and configured on both VSM servers, the primary VSM server will handle incoming requests to the VIP. In case of failure, the VIP is instantly moved over to the secondary VSM server, which will happily continue serving requests.
Once the primary server is available again, VSM server will synchronize its session database from the secondary server and the VIP will be moved back to the primary.
Let’s get to it.
The prerequisites for this setup are that you have two Linux servers available to you. If you're unsure about some of the commands used in this guide, please consult the man pages or the keepalived documentation; links are provided at the end of this guide.
The example of this guide will use the following setup:
- Primary node with the hostname tlha-primary and IP address 10.0.0.2
- Secondary node with the hostname tlha-secondary and IP address 10.0.0.3
- VIP address of 10.0.0.4 with the DNS name tlha
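The hostnames above are assumed to resolve to the listed addresses. If they are not managed in DNS, a minimal sketch for /etc/hosts on both machines (using this guide's example addresses) could look like:

```
10.0.0.2   tlha-primary
10.0.0.3   tlha-secondary
10.0.0.4   tlha
```

In production, the `tlha` name would normally live in DNS so that clients resolve it without local configuration.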
The distribution we’re using is Red Hat Enterprise Linux 8.
To allow services to bind to an IP address not present on the system, one must configure the kernel parameter net.ipv4.ip_nonlocal_bind.
The following command has to be executed on both tlha-primary and tlha-secondary:
[cendio@tlha-primary ~]$ echo "net.ipv4.ip_nonlocal_bind = 1" | sudo tee -a /etc/sysctl.d/100-nonlocal_bind.conf
Give both machines a reboot to have the changes take effect. Once the machines are rebooted, log back in.
On both machines, execute the following and verify that net.ipv4.ip_nonlocal_bind = 1:
[root@tlha-primary ~]# sysctl -a | grep net.ipv4.ip_nonlocal_bind
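If you would rather not reboot, the new setting can also be loaded immediately; as a sketch (the file path matches the one created above, and applying it requires root):

```shell
# Load the setting from the file created earlier without rebooting
# (run on both nodes); fall back to a message if we lack privileges.
sudo -n sysctl -p /etc/sysctl.d/100-nonlocal_bind.conf 2>/dev/null \
    || echo "need root to apply sysctl settings"

# Reading the current value requires no privileges, and avoids
# grepping through the full "sysctl -a" listing:
sysctl -n net.ipv4.ip_nonlocal_bind
```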
We are now ready to install keepalived. On both machines:
[cendio@tlha-primary ~]$ sudo dnf install keepalived
After keepalived has been installed, a basic configuration file is placed in /etc/keepalived/keepalived.conf. Move it out of the way on both tlha-primary and tlha-secondary:
[cendio@tlha-primary keepalived]$ sudo mv /etc/keepalived/keepalived.conf /etc/keepalived/keepalived.conf.bak
Then create a new configuration on both nodes. The values for some settings differ between tlha-primary and tlha-secondary, so pay extra attention to the node-specific settings below:
On tlha-primary, make sure the contents of keepalived.conf are:
! Configuration File for keepalived

global_defs {
    notification_email {
        root@tlha-primary
    }
    notification_email_from root@tlha-primary
    smtp_server 127.0.0.1
    smtp_connect_timeout 30
    router_id tlha-primary
}

vrrp_instance TL_VSM {
    state MASTER
    interface ens192
    virtual_router_id 51
    priority 101
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        10.0.0.4/24
    }
}
On tlha-secondary, make sure the contents of keepalived.conf are:
! Configuration File for keepalived

global_defs {
    notification_email {
        root@tlha-secondary
    }
    notification_email_from root@tlha-secondary
    smtp_server 127.0.0.1
    smtp_connect_timeout 30
    router_id tlha-secondary
}

vrrp_instance TL_VSM {
    state BACKUP
    interface ens192
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        10.0.0.4/24
    }
}
Note: You will need to modify the variables in the examples above to suit your environment. Recent versions of keepalived can sanity-check a configuration with keepalived --config-test.
Setting | Explanation
---|---
notification_email | If you choose to configure alert emails to be sent out on state changes, this is the address the alerts will go to. The configuration above will not send any alerts; further configuration in the vrrp_instance block is necessary.
router_id | A string that identifies this machine.
state | The initial state of this VRRP instance.
interface | The network interface VRRP will operate on.
virtual_router_id | An arbitrary number used to differentiate multiple instances of vrrpd running on the same machine/network interface. We only have one instance in this scenario.
priority | For electing MASTER, the highest priority wins. We use 101 for MASTER and 100 for BACKUP.
advert_int | The advertisement interval in seconds; the default is one second. If the backup node fails to receive three consecutive VRRP advertisements, the backup server with the highest assigned priority takes over as MASTER and assigns the virtual IP address to its own network interface.
auth_pass | The password vrrpd uses to authenticate the servers to each other for failover. This must be the same on both machines.
virtual_ipaddress | Our VIP for tlha, 10.0.0.4.
There are plenty more configuration options available, and we recommend that you read up on what else there is to configure; /usr/share/doc/keepalived is a good source of example configurations.
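One such option worth knowing about is notify, which makes keepalived run a script of your choice on VRRP state changes. keepalived invokes the script with three arguments: "GROUP" or "INSTANCE", the instance name, and the new state ("MASTER", "BACKUP" or "FAULT"). A minimal sketch of such a hook (the handler name and log tag are our own placeholders):

```shell
# notify_handler: log and report a VRRP state transition.
# keepalived would call the configured script with these three arguments.
notify_handler() {
    type="$1"    # "GROUP" or "INSTANCE"
    name="$2"    # e.g. "TL_VSM"
    state="$3"   # "MASTER", "BACKUP" or "FAULT"
    msg="VRRP ${type} ${name} transitioned to ${state}"
    # Send to syslog if logger is available; ignore failures otherwise.
    logger -t keepalived-notify "$msg" 2>/dev/null
    echo "$msg"
}

# Example invocation, as keepalived would perform it on failover:
notify_handler INSTANCE TL_VSM MASTER
# prints: VRRP INSTANCE TL_VSM transitioned to MASTER
```

Saved as e.g. /etc/keepalived/notify.sh, such a script would be referenced with notify "/etc/keepalived/notify.sh" inside the vrrp_instance block; see man 5 keepalived.conf for the exact semantics.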
If you have firewalld running on your Red Hat system, you must allow VRRP traffic to pass between the keepalived machines. To allow the VRRP traffic with firewalld, run the following commands on both tlha-primary and tlha-secondary:
[cendio@tlha-primary keepalived]$ sudo firewall-cmd --add-rich-rule='rule protocol value="vrrp" accept' --permanent
[cendio@tlha-primary keepalived]$ sudo firewall-cmd --reload
We are now all set for firing up keepalived. Execute the following on both tlha-primary and tlha-secondary:
[cendio@tlha-primary keepalived]$ sudo systemctl enable --now keepalived
Verify that keepalived is running:
[cendio@tlha-primary keepalived]$ sudo systemctl status keepalived
● keepalived.service - LVS and VRRP High Availability Monitor
Loaded: loaded (/usr/lib/systemd/system/keepalived.service; enabled; vendor preset: disabled)
Active: active (running) since Fri 2023-04-28 08:08:45 CEST; 9min ago
Process: 3099 ExecStart=/usr/sbin/keepalived $KEEPALIVED_OPTIONS (code=exited, status=0/SUCCESS)
Main PID: 3100 (keepalived)
Tasks: 2 (limit: 11368)
Memory: 1.8M
CGroup: /system.slice/keepalived.service
├─3100 /usr/sbin/keepalived -D
└─3101 /usr/sbin/keepalived -D
Apr 28 08:08:48 tlha-primary Keepalived_vrrp[3101]: Sending gratuitous ARP on ens192 for 10.0.0.4
Apr 28 08:08:48 tlha-primary Keepalived_vrrp[3101]: Sending gratuitous ARP on ens192 for 10.0.0.4
You should now see that our VIP, 10.0.0.4, has been assigned to interface ens192 on tlha-primary:
[cendio@tlha-primary ~]$ ip a s ens192
2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 00:50:56:b1:a3:25 brd ff:ff:ff:ff:ff:ff
altname enp11s0
inet 10.0.0.2/24 brd 10.0.0.255 scope global dynamic noprefixroute ens192
valid_lft 2545sec preferred_lft 2545sec
inet 10.0.0.4/24 scope global secondary ens192
valid_lft forever preferred_lft forever
Executing the above command on tlha-secondary should show the absence of the VIP. Let's try failing over the VIP to tlha-secondary. On the tlha-primary machine, stop keepalived:
[cendio@tlha-primary ~]$ sudo systemctl stop keepalived
Verify that the VIP has failed over to tlha-secondary:
[cendio@tlha-secondary ~]$ ip a s ens192
2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 00:50:56:b1:2a:69 brd ff:ff:ff:ff:ff:ff
altname enp11s0
inet 10.0.0.3/24 brd 10.0.0.255 scope global dynamic noprefixroute ens192
valid_lft 2070sec preferred_lft 2070sec
inet 10.0.0.4/24 scope global secondary ens192
valid_lft forever preferred_lft forever
This can also be verified by watching the journal for keepalived on tlha-secondary while stopping keepalived on tlha-primary:
[cendio@tlha-secondary ~]$ journalctl -u keepalived -f
Apr 28 08:34:50 tlha-secondary Keepalived_vrrp[1879]: (TL_VSM) Backup received priority 0 advertisement
Apr 28 08:34:50 tlha-secondary Keepalived_vrrp[1879]: (TL_VSM) Receive advertisement timeout
Apr 28 08:34:50 tlha-secondary Keepalived_vrrp[1879]: (TL_VSM) Entering MASTER STATE
Apr 28 08:34:50 tlha-secondary Keepalived_vrrp[1879]: (TL_VSM) setting VIPs.
Apr 28 08:34:50 tlha-secondary Keepalived_vrrp[1879]: (TL_VSM) Sending/queueing gratuitous ARPs on ens192 for 10.0.0.4
When starting keepalived on tlha-primary again, we can observe, using the same commands as above, that the VIP has been moved back.
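To quickly check which node currently holds the VIP, a small helper along these lines can be handy (a sketch; the function name is ours, and the address and interface are this guide's example values):

```shell
# has_vip: report whether the given address is currently assigned
# to the given interface, based on "ip -4 addr show" output.
has_vip() {
    vip="$1"
    iface="$2"
    if ip -4 addr show "$iface" 2>/dev/null | grep -q "inet ${vip}/"; then
        echo "VIP ${vip} is active on ${iface}"
    else
        echo "VIP ${vip} is NOT active on ${iface}"
    fi
}

# Run on either node to check ownership of the cluster VIP:
has_vip 10.0.0.4 ens192
```

Running this on both nodes before and after stopping keepalived makes the failover easy to observe without reading full `ip a` output.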
Problems?
If you experience any problems following this guide, or want to ask follow-up questions, please leave a comment below.
Conclusion
Setting up keepalived can greatly enhance the availability and reliability of your ThinLinc infrastructure. With keepalived, you can ensure that your VSM service remains accessible even in the event of a machine failure. By following the steps outlined in this guide, you should be able to successfully configure keepalived on your system, and you can now follow along in the ThinLinc Administrator Guide to continue setting up ThinLinc HA for VSM.
References
man 8 keepalived
man 5 keepalived.conf
Resources and examples: /usr/share/doc/keepalived
Project homepage: Keepalived for Linux