@s0p4L1n It's hard to know for sure; it depends on your usage patterns and available hardware. As a guess: if users are doing things in the browser that require graphics acceleration and no GPU is available, this could result in high CPU usage and poor performance.
From within your ThinLinc session, run:
glxinfo | grep "OpenGL renderer"
Then open Chrome, browse to chrome://gpu, and check the value of GL_RENDERER.
If llvmpipe shows up in both of these places, then the CPU is being used for graphics rendering instead of a GPU.
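To script that check, here is a minimal sketch; the function name and the sample renderer string are illustrative assumptions, not part of ThinLinc:

```shell
#!/bin/sh
# classify_renderer prints "software" when the GL renderer string mentions
# llvmpipe (Mesa's CPU rasterizer), and "hardware" otherwise.
classify_renderer() {
    case "$1" in
        *llvmpipe*) echo "software" ;;
        *)          echo "hardware" ;;
    esac
}

# In a live session you would feed it the real value:
#   classify_renderer "$(glxinfo | grep 'OpenGL renderer')"
# Sample string for illustration:
classify_renderer "OpenGL renderer string: llvmpipe (LLVM 11.0.1, 256 bits)"   # prints "software"
```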
Like @aaron said, it depends on what your users are doing, but to me, it seems like you should have sufficient resources for typical use.
Is it only Chrome that runs slowly in the sessions?
Are users on the tl-beta-d11 node having a better experience (that machine looks to have a lower load in your screenshot)? If not, is it possible that other factors are playing a part here? I’m thinking of stuff like network latency or bandwidth issues.
What desktop environment are you using? Desktops such as GNOME are typically not designed to run well in remote desktop scenarios and could very well be a reason for the high CPU load you’re seeing.
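A quick way to see which desktop environment a session is running (these environment variables are conventional, but which ones are set varies by distribution and session manager):

```shell
#!/bin/sh
# Print the desktop environment identifiers most session managers export.
# Inside a session these typically show e.g. GNOME, XFCE or MATE;
# "unset" means the variable was not exported in this environment.
echo "XDG_CURRENT_DESKTOP=${XDG_CURRENT_DESKTOP:-unset}"
echo "DESKTOP_SESSION=${DESKTOP_SESSION:-unset}"
```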
We had a CPU usage peak lasting 45 minutes; the load reached 215 at its peak on tl-beta-d11. The load on tl-alpha-d11 was almost the same, with only 35 users (load balanced between the two nodes).
The ThinLinc client accepted user logins, but would not launch Chrome, as everything was very slow.
I rebooted tl-alpha-d11 (it was not working anyway, so the reboot had no extra impact), and after this the load came back to its normal value.
During the afternoon there were no more issues, and the load average stayed between 5 and 15.
Both nodes have redundant switch connections, with 2x10 Gbps QSFP+ cables.
For now we have not identified the issue. I may discuss GPU acceleration with my manager; I did not think about it when we decided to buy dedicated hardware servers.
All employees using ThinLinc for now are graphic artists who need YouTube for references, and it could be this.
I’m trying to find an alternative, such as a YouTube client without upload features, which would let them watch videos directly on the workstation instead of in ThinLinc, avoiding high CPU peaks.
If I’m not mistaken, decoding video in the browser / YouTube is always CPU-bound, so having a real GPU won’t make any real difference in this case? @aaron @samuel, what’s your take on this?
Chrome is capable of offloading certain graphical operations to the GPU via WebGL, as well as decoding video. But again, whether this is what’s causing the increase in load depends on usage patterns. It may be something else.
VirtualGL has just received an EGL front-end, which allows it to be used with recent versions of Chrome and Firefox for GPU offload.
As far as GPU selection is concerned, this is really outside the scope of ThinLinc. Some investigation would be required to determine which is best suited for your particular use case.
The load decreased from 900 to 200 instantly, but as users logged in again, it came back…
I rebooted the tl-alpha node, and the load on tl-beta is now 20.
But the load on tl-alpha came back up to 150.
As Google Chrome is the only app users run in ThinLinc, we tried to identify what is consuming everything, but we cannot tell which Chrome tab is playing YouTube.
I’m pretty sure this is the bottleneck, but I cannot confirm it 100%.
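Individual tabs are not visible from the server side, but each tab runs as its own Chrome renderer process, so listing Chrome processes by CPU per user gets close. A sketch, assuming the binary name contains "chrom" (chrome/chromium; adjust the pattern to your distribution):

```shell
#!/bin/sh
# List the Chrome processes using the most CPU, with their owning user.
# Each tab is a separate renderer process, so the heaviest entries usually
# correspond to the heaviest tabs (e.g. a playing YouTube video).
# Keeps the ps header line plus the top 10 matching processes.
ps -eo user,pcpu,comm --sort=-pcpu | awk 'NR == 1 || $3 ~ /chrom/' | head -n 11
```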
One hour later it is back to its normal state: loads of 60 and 45 on the two nodes. But maybe not for long.
We do not intend to use Firefox; we decided internally to use Google Chrome as the default browser. In our past experience, the resource consumption of the two is pretty much the same anyway.
What I do not understand is that we tested the solution on virtual machines, with 2 servers in the same configuration, before deploying to production.
The 2 VMs had 48 GB RAM and 8 vCPUs allocated, and there were 15-20 people using the solution.
Now, on 2 hardware servers, each with 500 GB RAM and 2 sockets of 20 cores / 40 threads, it cannot support 40 users.
Yesterday there was the same number of users connected, and no issues were reported by users.
Also, a speedtest run from Chrome shows a bottleneck in download, while a speedtest from the server's command line reports 800/800 Mbps down/up.
I understand the issue is not ThinLinc itself; we will try to find what is causing this, and if needed, add 2 more servers. But even when the load is high, the CPUs appear almost idle in top, only 5-6% used.
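A high load average combined with nearly idle CPUs often means processes are stuck in uninterruptible sleep (state "D"), typically waiting on disk or network storage such as NFS, since Linux counts those processes into the load average. A quick sketch to count them:

```shell
#!/bin/sh
# Count processes in uninterruptible sleep (state "D"). A persistently
# non-zero count while the CPUs sit idle points at storage/NFS latency
# rather than at CPU capacity.
ps -eo state,comm | awk '$1 ~ /^D/ { print; n++ } END { print "D-state processes:", n + 0 }'
```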
#!/bin/sh
#
# print total CPU usage in percent of currently logged in users.
#
# 1st column: user
# 2nd column: aggregated CPU usage
# 3rd column: normalized CPU usage according to the number of cores
#
# to sort by CPU usage, pipe the output to 'sort -k2 -nr'
#
set -e
own=$(id -nu)
cpus=$(lscpu | grep "^CPU(s):" | awk '{print $2}')
for user in $(who | awk '{print $1}' | sort -u)
do
# print other user's CPU usage in parallel but skip own one because
# spawning many processes will increase our CPU usage significantly
if [ "$user" = "$own" ]; then continue; fi
(top -b -n 1 -u "$user" | awk -v user="$user" -v CPUS="$cpus" 'NR>7 { sum += $9 } END { print user, sum, sum/CPUS }') &
# don't spawn too many processes in parallel
sleep 0.05
done
wait
# print own CPU usage after all spawned processes completed
top -b -n 1 -u "$own" | awk -v user="$own" -v CPUS="$cpus" 'NR>7 { sum += $9 } END { print user, sum, sum/CPUS }'
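As the header comment notes, the output can be sorted by the aggregated CPU column. A sketch with made-up sample values in the script's three-column format (user, aggregated %, normalized %):

```shell
#!/bin/sh
# Sort sample output by column 2 (aggregated CPU usage), descending.
printf 'alice 35.0 0.9\nbob 180.5 4.5\ncarol 12.0 0.3\n' | sort -k2 -nr
# bob 180.5 4.5
# alice 35.0 0.9
# carol 12.0 0.3
```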
I will keep you updated after some testing tonight, when employees are out of the office.
And I will apologize for bothering the forum if the problem turns out to be this one, because it would be our fault: we bought high-end hardware for the nodes, but not for the /home storage.
No need to apologize at all. This thread can certainly be useful to someone in the future, especially for the thought process of ruling things out when debugging such issues.
Please keep us updated, and thank you for sharing your case!
Hello, the central NFS /home repository has been migrated to another server.
Well, it’s been 2 hours with 40 users load balanced between the two nodes, and the load is very low.