VirtualGL Q&A with Darrell Commander

Many of you will be familiar with VirtualGL as a means to run OpenGL applications remotely, with full hardware acceleration. While not a part of ThinLinc itself, VirtualGL is popular enough amongst ThinLinc users that we commonly receive questions about its usage.

Darrell R. Commander (@drc) is the lead developer and maintainer for the VirtualGL Project. He has kindly agreed to participate in this Q&A to discuss the history of VirtualGL, the reasons for its existence, future direction, and the opportunities and challenges currently being faced by the project.

This thread will remain open for those who wish to ask questions directly, so if you have one, please post it below. To get the ball rolling, however, I’d like to start with a few of my own:

  • Please tell us a bit about the history of VirtualGL, the technology behind it, and the problems that it solves.

  • How is VirtualGL being affected by emerging technologies such as EGL, Vulkan, and Wayland?

  • What are some of the challenges that VirtualGL faces as a project?

  • What is the best way to contribute to VirtualGL?

  • What is the best way to get support for VirtualGL?

–

For more information on the VirtualGL Project, visit https://virtualgl.org

For code, support, and feature requests, visit the main VirtualGL repository on GitHub (VirtualGL/virtualgl).

Darrell can be reached via the VirtualGL mailing lists (see "About / Mailing Lists" on the VirtualGL site).

Hi, @aaron and members of the ThinLinc community. I’ll answer these questions in individual replies.

It’s more than a bit, but bear with me.

After graduating from college, I worked at Compaq in the late 1990s on various projects related to OpenGL performance and 3D accelerator architecture (the term “GPU” didn’t exist at the time), benchmarking and optimizing various technical computing applications, streaming video, and high-performance computing and visualization in general. I was laid off as a result of the DEC merger, and I went to work for Landmark Graphics Corporation, which produced a suite of high-end oil & gas exploration and production applications. Landmark’s upper management at the time made a big push toward remote computing, and my group was tasked with figuring out the details. Remotely displaying 2D graphics applications was feasible, but we also needed to remotely display 3D graphics applications using hardware-accelerated OpenGL. The standard way of doing that at the time was indirect OpenGL rendering, which involved running a 3D application on a remote application server and sending all of the OpenGL commands and data over the network (via the GLX extension to the X Window System protocol) to be rendered by a 3D accelerator in the client machine. However, that was untenable for some oil & gas applications. Seismic visualization applications, in particular, typically dealt with multi-gigabyte volumetric datasets and used large textures to pass planar “probes” through those datasets interactively. (The same could be said about medical visualization applications, incidentally.) Such applications typically generated many megabytes of texture data for every frame they rendered, with no ability to reuse that data between frames (and thus no ability to take advantage of OpenGL caching mechanisms.) 100-megabit networks were mainstream, but some customers still had 10-megabit networks, so the largest probes that could be rendered with “reasonably interactive” frame rates (~15 fps) using indirect OpenGL were around 25,000-250,000 voxels.
The oil & gas industry was pushing for larger and larger datasets, and the existing method of remote 3D application display simply did not scale. More generally, indirect OpenGL rendering was becoming increasingly incompatible with emerging OpenGL extensions, such as fragment/vertex shaders. A colleague of mine came up with the idea of doing OpenGL rendering on the application server and using real-time JPEG compression, via the Intel JPEG Library, to send the OpenGL-rendered frames to the client machine as a video stream. The IJL could deliver 30+ megapixels/second on a Pentium 4, which was sufficient to sustain interactive frame rates with a 1-megapixel application window. Moreover, JPEG compressed the OpenGL-rendered frames down to about 1-2 bits/pixel with perceptually lossless image quality, so 100-megabit networks could theoretically accommodate 50-100 megapixels/second. Furthermore, this thin client approach meant that remote display performance no longer depended on the size of the 3D data being rendered, and it meant that the client machines no longer needed to be high-end 3D workstations. Any machine capable of playing HD video was sufficient.
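
The bandwidth figures above can be checked with a little arithmetic. The following is only a back-of-the-envelope sketch: the function names are made up for illustration, protocol overhead is ignored, and the 4 bytes per voxel (e.g. RGBA texels) is an assumption that is not stated in the post. Under those assumptions it lands in the same range as the numbers quoted above.

```python
# Back-of-the-envelope bandwidth arithmetic for the two remote 3D approaches.
# Helper names and the bytes-per-voxel figure are illustrative assumptions.

MBIT = 1_000_000  # bits per second in one megabit/second


def max_voxels_per_frame(link_mbps, fps, bytes_per_voxel):
    """Largest texture (in voxels) that fits through the link every frame
    when the raw texture data has to be re-sent for each frame."""
    bits_per_frame = link_mbps * MBIT / fps
    return int(bits_per_frame / (8 * bytes_per_voxel))


def max_megapixels_per_second(link_mbps, bits_per_pixel):
    """Pixel throughput the link can carry for compressed frames."""
    return link_mbps * MBIT / bits_per_pixel / 1_000_000


# Indirect rendering at ~15 fps, assuming 4 bytes per voxel.
for link in (10, 100):
    print(link, "Mbit/s:", max_voxels_per_frame(link, 15, 4), "voxels/frame")

# JPEG image transport on a 100-megabit network at 1-2 bits/pixel.
for bpp in (2, 1):
    print(bpp, "bits/pixel:", max_megapixels_per_second(100, bpp), "Mpixels/s")
```

The key difference is that the first function depends on the size of the data being rendered, whereas the second depends only on the number of pixels in the window.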

Initially, I implemented this idea as a remote display library, but that was a dead end, because it would have required all of the application teams to develop and document their own application-specific remote rendering features. I then dabbled with screen scraping— compressing and transporting the contents of a remote machine’s physical display in real time as a motion JPEG stream. That was a dead end as well, because it didn’t allow multiple users to simultaneously share the same application server. In early 2003, I stumbled upon a paper from researchers at the University of Stuttgart that described using LD_PRELOAD on Un*x systems to intercept (“interpose”) GLX commands from unmodified OpenGL applications at run time. The “interposer library” could thus modify those GLX commands in order to automatically redirect OpenGL rendering to a dedicated server-side GPU-attached X display (we call this the “3D X server”) and to automatically redirect OpenGL rendering into off-screen pixel buffers (“Pbuffers.”) This “split rendering” approach meant that non-OpenGL portions of the application’s GUI could continue to be rendered on the client machine via remote X, whereas the OpenGL portions of the application’s GUI could be rendered on the server using a high-end GPU. Once OpenGL rendering was redirected by the interposer library, the OpenGL commands and data could flow directly from the application to the GPU. That allowed for maximum compatibility with emerging OpenGL extensions. The missing piece was the high-speed motion JPEG transport that I had already developed.

In 2001, Landmark had acquired Magic Earth, whose GeoProbe seismic visualization software could deal with huge datasets (sometimes dozens of gigabytes.) That necessitated real-time rendering of planar probes (OpenGL textures) with potentially hundreds of millions of voxels each. This was the “killer app” for my prototype. It’s worth mentioning that a research group at IBM was working on the same problem at the time. Their initial prototype did not involve GLX interposition or a high-speed image transport, but they ultimately adopted both approaches in their software, which became Deep Computing Visualization (DCV.) (DCV was sold to NICE in 2010, and NICE was acquired by Amazon in 2016.) In the meantime, I continued to refine my prototype, and the seismic visualization developers at Landmark (particularly the GeoProbe developers, who officed next to me) helped test it and diagnose various compatibility issues. It’s also worth mentioning a commercial remote desktop solution that added hardware-accelerated OpenGL capabilities in the late 2003/early 2004 timeframe. However, that solution relied upon indirect OpenGL rendering. Indirect OpenGL rendering occurred between the 3D application and a virtual X server (“X proxy”) running on the same machine, so performance was less of a concern than with indirect OpenGL rendering over a network, but compatibility with emerging OpenGL extensions was still a concern.

By early 2004, my software had matured to the point at which it could reasonably take the place of the internal application-specific remote display features in Landmark’s seismic visualization applications (at least on Linux.) The upper management of Landmark’s R&D Division pushed for releasing the software as open source, in the hopes of improving its compatibility even further by encouraging outside vendors and users to test it. One of the aforementioned GeoProbe developers had some experience with maintaining an open source project, so he helped me pick an appropriate license and navigate the initial release on SourceForge. He also came up with the name “VirtualGL.” Since most oil & gas software users were already using remote X to some degree, VirtualGL was a natural fit, since it bolted onto remote X. However, remote X has never been a good solution for low-bandwidth/high-latency networks. Thus, I started experimenting with adding high-speed JPEG capabilities to TightVNC, for the purposes of using VirtualGL over slow networks and enabling collaboration between 3D application users. TurboVNC was born. Soon after VirtualGL and TurboVNC were released, Landmark underwent a management upheaval. The new management was not as supportive of VirtualGL, so I started looking for other avenues. Sun Microsystems was heavily invested both in open source and remote computing, and they already had a closed-source solution that worked similarly to VirtualGL, so VirtualGL was a good fit. I worked for Sun for over four years, developing VirtualGL and TurboVNC into enterprise-quality products, which were released as part of the “Sun Shared Visualization” suite. When Sun laid off the entire Advanced Visualization team in early 2009, I stumbled into independent open source development. 
Some of the former Shared Visualization customers approached me about sustaining support for VirtualGL and TurboVNC, and Cendio also approached me about improving the open source SIMD-accelerated JPEG codec they were using in a next-generation branch of TightVNC (what would eventually become TigerVNC.) 2009 was a really lean year for me, but in 2010, momentum started to pick up when Australia’s largest independent natural gas producer (Santos) decided to replace hundreds of high-end 3D workstations with eight beefy multi-GPU servers running my remote computing software. This enabled them to deploy laptops, rather than expensive 3D workstations, for most of their geoscience software users, which saved them AU$2 million in upfront hardware costs. Moving their suite of applications into the server room continues to save them about AU$1 million/year in IT costs. (Santos received the 2011 Red Hat Innovator of the Year Award for this effort.) Santos also provided a lot of one-off funding to improve the software in various ways, and they continue to provide 200 hours of sustaining yearly funding for ongoing maintenance of the projects. Cendio also contracted with me during that timeframe to add some of TurboVNC’s performance enhancements to TigerVNC, which allowed TigerVNC (and ThinLinc, which is based upon it) to perform well with VirtualGL.

It also bears mentioning that the aforementioned JPEG codec was eventually released as a standalone project, libjpeg-turbo, and I have devoted a great deal of labor over the past 11 years toward maintaining that project as well. libjpeg-turbo is now the most popular JPEG codec in the world and also an ISO reference implementation. When VirtualGL was first released, there was no other open source remote display software that provided sufficient performance for 3D applications. Thanks in part to libjpeg-turbo, there are now a variety of options, each with their own individual strengths. (libjpeg-turbo is used by ThinLinc/TigerVNC, and Cendio provides some of the funding needed to support it.)

Traditionally, VirtualGL has required two different X servers— a “2D X server” and a “3D X server.” The 2D X server, the X display on which the application’s GUI is rendered, could be either a client-side X server or a server-side virtual X server (“X proxy”) such as VNC. Because of VirtualGL, the 2D X server does not need an attached GPU, nor does it even need OpenGL rendering capabilities at all. The 3D X server is a dedicated server-side GPU-attached X display, and VirtualGL uses it solely for GPU access. Since VirtualGL only does off-screen rendering, using a whole X server for GPU access is overkill. Back in the Sun Microsystems days, we had an API called “GLP” that was specific to their SPARC OpenGL implementation, and that allowed VirtualGL to access the GPU on SPARC systems without using a 3D X server. Such did not become possible on x86 systems until 2014, when the EGL_EXT_platform_device extension was released for EGL. In 2015, I started investigating the possibility of using device-based EGL to implement a GLP-like back end for VirtualGL, but since VirtualGL has to emulate OpenGL window rendering using Pbuffers, it needs Pbuffers that support double buffering. Unfortunately, EGL Pbuffers do not, so it was necessary to emulate double-buffered Pbuffers using OpenGL renderbuffer objects (RBOs.) That proved to be extremely difficult, and it ultimately required about 500 hours to complete the proof-of-concept EGL back end. (You can take it for a test drive in the VirtualGL 3.0 evolving pre-release.) About 440 hours of that was funded by four separate companies, and 60 hours is currently unfunded. It will probably take another 30-40 hours to document and release the feature (for which there is also currently no funding.)
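
To illustrate what "emulating double buffering" means here, the following is a purely conceptual model and not VirtualGL's actual code: plain Python lists stand in for GPU renderbuffers, and the class and attribute names are hypothetical. It shows only the general idea of pairing two single-buffered surfaces and exchanging their roles on a swap.

```python
# Conceptual model only: plain Python lists stand in for GPU renderbuffers.


class EmulatedDoubleBuffer:
    """Two single-buffered surfaces that together behave like one
    double-buffered drawable."""

    def __init__(self, width, height):
        def blank():
            return [[0] * width for _ in range(height)]

        self._buffers = [blank(), blank()]
        self._back = 0  # index of the buffer currently being drawn into

    @property
    def back(self):
        return self._buffers[self._back]

    @property
    def front(self):
        return self._buffers[1 - self._back]

    def swap(self):
        """Exchange the roles of the two buffers instead of copying pixels."""
        self._back = 1 - self._back


drawable = EmulatedDoubleBuffer(4, 2)
drawable.back[0][0] = 255   # the application renders into the back buffer
before_swap = drawable.front[0][0]
drawable.swap()             # the finished frame becomes the front buffer
after_swap = drawable.front[0][0]
print(before_swap, after_swap)
```

A real implementation also has to deal with details that this toy model ignores, such as depth/stencil attachments and applications that read from or draw to the front buffer directly.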

VirtualGL could potentially use EGL on the front end as well, interposing EGL/X11 commands to support 3D applications that use that API rather than GLX. That project is much more straightforward but is also in need of funding.

I have looked into the possibility of supporting Vulkan applications with VirtualGL, but there are two major technical hurdles at the moment:

  1. Vulkan implementations for some popular GPUs are hard-coded to use the 2D X server (more specifically, the X server specified in the DISPLAY environment variable) for 3D rendering, and they require a vendor-specific X extension to be present in that X server. At the moment, there is no robust way to work around that in VirtualGL, and there is no way to implement Vulkan support at all in VirtualGL without requiring a 3D X server.
  2. It is unclear how to handle Vulkan swap chains in VirtualGL.

Wayland is a bit of a wildcard from VirtualGL’s point of view. Wayland applications control the creation and disposal of off-screen rendering buffers, into which they can render whatever they want (2D or 3D), and the applications also direct the Wayland compositor to display those buffers. Thus, at least on the surface, Wayland may eventually eliminate the need for a VirtualGL-like solution. The developers of Weston, an open source reference implementation of a Wayland compositor, have proven that it can support both a remote display back end and hardware-accelerated OpenGL, but the technology is still experimental. In a more general sense, technical computing software vendors tend to wait until the last possible minute to adopt new technologies, so as long as operating systems support both Wayland and X11, I predict that there will not be a big push toward Wayland among those vendors. Wayland has great potential, since it could just as easily support virtual displays (off-screen compositors that inherently have remote display and multi-session capabilities) as it could support physical displays. However, GPU vendors and operating system distributors need to recognize that potential before it can become a reality. I envision at least one intermediate step being necessary, which will be for a VirtualGL-like interposer to intercept Wayland/EGL commands from 3D applications and convert them into device-based EGL commands. This would be conceptually similar to the proposed EGL/X11 interposer that I mentioned earlier.

Most of the challenges relate to the fact that I am an independent open source developer without a lot of resources. Almost every major feature that has been implemented in VirtualGL and my other open source projects since 2009 has been implemented because a company or other organization needed that feature badly enough to pay for my labor to implement it. The value proposition for potential project sponsors is that, by using VirtualGL, they are already reaping the benefit of decades of labor. It would probably cost US$millions to create such a project from scratch, but organizations can instead pay US$hundreds or US$thousands to fund the development of a specific feature or fix they need. Still, though, I constantly have to remind people that, although the software is provided free of charge, it isn’t free to produce. I have to hustle just to make enough money to pay my bills. Depending on the year, I earn about 1/4 to 1/3 of what a software engineer with my experience and skill set would normally earn in the U.S. market. I am willing to do that because I believe strongly in what VirtualGL and my other open source projects are accomplishing, but those projects are always on the ragged edge. Without “general sponsors”— companies and other organizations that provide a fixed amount of funding every year that can be used for project maintenance, spinning releases, build system enhancements, investigating and fixing issues reported by the community, answering questions from the community, and other necessary labor that is unlikely to attract outside funding— the projects wouldn’t survive. However, our level of general funding is only about half of what the projects need in order to be healthy and sustainable. I almost always have to eat significant labor cost— sometimes hundreds of hours’ worth— in order to finish a major release. 
If I cease being able to pay my bills through open source development, then I would have no choice but to pursue a different career, which would force me to abandon full-time work on VirtualGL and my other open source projects. In 2018, that almost happened. When major releases from all three projects landed in the same year, I went into debt. As an independent developer, I don’t get paid for vacation or sick days or bathroom breaks or lunch breaks. In the U.S., self-employed people have to pay both the employer’s and employee’s share of Medicare and Social Security, and individual health insurance is much more expensive than the group health insurance that corporate employees can typically get. Thus, I take home much less money than a corporate employee with the same gross income would. (And, again, corporate employees in my line of work typically have a much higher gross income to begin with.) I haven’t been able to meaningfully contribute to my retirement account since 2009, and in fact, I had to start borrowing against the interest on it a few years ago in order to pay my bills. I mention all of this not to complain. There are many other people in the world who have it much worse than I do. I mention all of this primarily to underscore the fact that these open source projects, which are making or saving millions of dollars for multiple companies worldwide, are not on stable financial footing. Stabilizing the project revenue streams would make it easier to speculate on new feature development, and it would increase my efficiency, since I wouldn’t have to spend as much time hustling.

Independent open source development is always a delicate balancing act. You have to give away a certain amount of free milk but not so much that people won’t buy the cow. Bug fixes are always free, covered by General Fund money or, when that money inevitably runs out halfway into the fiscal year, by pro bono labor. That is just good PR: making sure that the software is always as stable and intuitive and performant as it can be. Doing so attracts new users, and the more new users there are, the more likely it is that some of them work for companies that can fund the development of VirtualGL. This patronage-based business model has deep trade-offs, though. The upside is that no one organization controls the agenda of my open source projects, meaning that they are not beholden to any particular variety of CPU or GPU or operating system. Any feature that makes sense can probably happen as long as someone is willing to pay for it. The downside, however, is that lack of funding frequently makes it difficult to work on the features that the projects need the most. Sometimes I have to speculate, to use pro bono labor to implement a feature or finish a release in order to attract new funded development. Long story short: the biggest challenge is money.

A related challenge is lack of access to the latest hardware. Basically all of the equipment I use to develop and test VirtualGL was donated or reimbursed by corporate sponsors. I was able to purchase a new Linux workstation using part of a grant that libjpeg-turbo was awarded in 2019, but prior to that, my newest Linux machine was nearly 10 years old. I test most operating systems using VMware, which works reasonably well for most purposes, but sometimes, I’ll encounter issues that can’t be reproduced on my equipment (usually because the issues involve interaction between a specific O/S and specific hardware.) I have one recent AMD GPU and one old (2012 vintage) NVIDIA GPU at my disposal. I have no access to commercial 3D applications for testing, so if an interaction issue between VirtualGL and such an application can’t be reproduced in any other way, I’m usually stuck unless someone can provide access to the application (which isn’t always possible.) I also have no access to the official Khronos GLX conformance tests, because they charge tens of thousands of dollars for that. Thus, I basically had to write my own.

The technical challenges around developing VirtualGL mostly have to do with “stupid application tricks” and “stupid driver tricks.” It’s less the case now than it used to be, but historically, commercial 3D applications did some really “unique” things. Thus, most of the complexity in VirtualGL stems from workarounds for such applications. In terms of drivers, there are unfortunately some vendor-specific GLX implementations that are just plain wrong. Even more unfortunately, one of them is from a major GPU vendor, who has yet to fix the conformance issues despite me providing test cases and supporting documentation for all of them. Thus, I had to devote significant labor (mostly pro bono) toward working around some of those issues in VirtualGL. I wasn’t able to work around all of them, though, so I continue to have to field tech support requests from users of those GPUs, explaining why VirtualGL can’t currently work properly with them.

Again, I mention all of this not to complain but rather to paint the picture that, despite being a very lean operation with low overhead and relatively modest needs in terms of money and equipment, it’s still a struggle to get what I need, even when what I need is just a little bit of tech support from a particular vendor.

Even though VirtualGL is open source, it is too specialized to allow for much collaborative development. (Translation: it’s way too easy to break things if you don’t know what you’re doing.) Depending on the scope, it may take me dozens of hours to properly regression test, benchmark, review, clean up, and document a submitted patch that implements a major new feature. If there is no funding to cover that labor, then I have to really ponder whether the feature is strategically important enough to integrate using pro bono labor. Since I have nearly 20 years of experience with this stuff, paying me to implement a new feature is almost always less costly than paying someone else to implement it and then paying me to integrate it. I do, however, welcome patches that fix minor issues. I also welcome bug reports. Bugs are fixed for free (within reason), and every squashed bug improves the reputation and reach of VirtualGL, which potentially attracts more funded development.

I have 200 hours of funding per year from Santos, which is used for maintaining both VirtualGL and TurboVNC, but VirtualGL would be much healthier if it had 200 hours of its own. If your organization has benefited significantly from VirtualGL and is in a position to provide that general funding, please contact me.

Another way that organizations can contribute is by funding the development of specific features. You can find a list of outstanding features (most of which I consider strategically important) that are in need of funding here. And I am, of course, open to any other ideas that you might have. Funded development helps the project in two ways. In the near term, it helps pay my bills, and in the long term, the new feature helps attract new users, some of whom might be in a position to fund further development.

Organizations that fund $1000 or more of labor can opt to have their logo displayed on the VirtualGL Sponsors page.

Apart from that, you can find a link for individual donations on the landing page of VirtualGL.org. Every penny I receive goes directly toward funding the labor needed to maintain VirtualGL.

Also, the easiest way that most people can contribute is simply to test VirtualGL with their favorite applications and let me know if any issues are discovered.

I generally prefer users to file an issue in the GitHub issue tracker or to post on one of the mailing lists. This gives us a public, searchable record of the issue or question and its resolution, so others in the community can potentially benefit from it. Feel free to use either forum for bug reports or just general questions about VirtualGL. I am happy to answer them.

Thanks @drc (could that be the most comprehensive history of VirtualGL on the Internet today?) Very interesting reading.

I’d like to highlight this point.

Many people will remember a time when open source software was the domain of hobbyists and tinkerers; not any more. By some estimates the Linux kernel would cost US$612M (2011) to redevelop using proprietary methods.

You can download it for free here.

If you use VirtualGL commercially, please consider donating to or sponsoring the project. Your support makes a difference.

Possibly. The VirtualGL background article walks through essentially the same progression of ideas (and also provides helpful figures), but I don’t think I’ve ever written extensively about the historical context behind those ideas before.

The Linux Foundation estimated the value of Linux R&D at US$10.8 billion (in 2008). See the CNET article “Linux: $10.8 billion worth of R&D...for free.”
