University of Antwerp builds desktop supercomputer with 13 NVIDIA GPUs
Monday, December 14, 2009 - by Thomas De Maesschalck
Almost a year and a half ago, researchers at the University of Antwerp in Belgium were among the first to take advantage of the latest developments in GPGPU technology to create FASTRA, a desktop PC with supercomputer power. Scientists of the ASTRA research group, part of the University of Antwerp’s Vision Lab, had only limited time allocated on the university’s CalcUA supercomputer, and regular PC hardware was not an option because processing a single dataset could take several weeks on a standard desktop PC.
They therefore had to look for an alternative, and once they learned about GPGPU computing, the researchers built a 4000 euro desktop supercomputer with four NVIDIA GeForce 9800 GX2 dual-GPU graphics cards. The results were stunning: for this niche application, the eight NVIDIA GPUs outperformed the university’s three-year-old, 256-node supercomputer with AMD Opteron 250 2.4GHz processors. Besides the higher performance, FASTRA's other major advantages are its very low cost (4000 euro for FASTRA versus 3.5 million euro for the real supercomputer) and its much lower power consumption.
FASTRA was one of the first illustrations of what the massively parallel computing power of GPGPU computing has to offer scientists, and it’s possible that the project inspired NVIDIA to launch the Tesla Personal Supercomputer a couple of months later. These GPU-based desktop supercomputers should not be seen as a replacement for real supercomputers, though: graphics cards are very efficient for highly parallel workloads, but they can’t match supercomputers in other areas. While some people consider it blasphemy to call systems like FASTRA supercomputers, it can’t be denied that GPGPU computing is giving millions of researchers and individuals supercomputer-like power on their desks. GPUs are now being adopted by real supercomputers as well, and Bright Side of News recently wrote that as many as nine out of ten new high-performance computing (HPC) systems will feature at least one GPU, or a whole GPGPU server, for evaluation purposes.
The original FASTRA system (pictured below) delivers a theoretical computing power of four teraflops and was probably one of the most powerful desktop PCs of its time, but as the researchers took on new problems in advanced 3D image reconstruction, they needed a system with even more power.
First, a word about the kind of research the ASTRA group does: these scientists develop new 3D image reconstruction techniques. The Vision Lab specializes in novel processing methods for tomography, the technique medical scanners use to compute 3D images of a patient from a large number of 2D X-ray photos acquired from a range of angles. The group focuses on new computational methods that take prior knowledge into account to reconstruct more accurate 3D models of patients, or objects, from only a very limited number of X-ray photos. On its website, the ASTRA group explains that it hopes to bring its technique into real-world use in medical imaging, materials science, and other fields.
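To make the idea of computing an image from projections a little more concrete, here is a minimal sketch of the classic algebraic reconstruction technique (ART, also known as the Kaczmarz method) on a hypothetical 2x2 image. This is a generic textbook algorithm, not the ASTRA group's own method; the ray layout, the function name `art_reconstruct`, and the non-negativity clamp (a very simple stand-in for "prior knowledge") are illustrative assumptions.

```python
def art_reconstruct(rays, measurements, n_pixels, sweeps=50):
    """Algebraic reconstruction technique (Kaczmarz) with unit ray weights.

    rays: list of pixel-index lists; each ray measures the sum of its pixels.
    measurements: the measured sum for each ray (one X-ray "reading").
    """
    x = [0.0] * n_pixels  # start from an empty image
    for _ in range(sweeps):
        for ray, measured in zip(rays, measurements):
            # Mismatch between the measured value and the current estimate
            residual = measured - sum(x[i] for i in ray)
            # Spread the correction evenly over the pixels this ray crosses
            step = residual / len(ray)
            for i in ray:
                x[i] += step
            # Simple prior (assumption): densities cannot be negative
            x = [max(v, 0.0) for v in x]
    return x


# Hypothetical 2x2 image, pixels numbered row by row: 0 1 / 2 3
true_image = [1.0, 2.0, 3.0, 4.0]
rays = [[0, 1], [2, 3],   # horizontal rays
        [0, 2], [1, 3],   # vertical rays
        [0, 3], [1, 2]]   # diagonal rays
measurements = [sum(true_image[i] for i in ray) for ray in rays]

recon = art_reconstruct(rays, measurements, n_pixels=4)
print([round(v, 3) for v in recon])  # prints [1.0, 2.0, 3.0, 4.0]
```

Real scanners use thousands of rays per projection angle and usually weight each pixel by the length of the ray's path through it, so the same update loop runs over vastly larger systems; fewer projection angles leave the system underdetermined, which is where prior knowledge becomes important.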
Without GPGPU computing, processing large datasets took up to a week on a cluster of four quad-core PCs; FASTRA made it possible to finish the same job in just 10 minutes. Over the past year, the group developed new algorithms that raise the spatial resolution while still using the same input dataset. The upsampling needed for this higher resolution requires more computing power and increased the computation time on FASTRA to about one hour. This was too long, so the researchers started working on a system that could cut the reconstruction time to about 15 minutes.
Here's a 6.5-minute video presentation that explains why they need so much computing power and how they built FASTRA II:
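The timings in the paragraph above translate into speedup factors with a few lines of arithmetic. This is a back-of-the-envelope sketch that assumes "up to a week" means exactly seven days; the real figure is an upper bound, so the first ratio is an upper estimate as well.

```python
MINUTES_PER_WEEK = 7 * 24 * 60

# Figures quoted in the article
cluster_minutes = MINUTES_PER_WEEK   # up to a week on four quad-core PCs (assumed exactly 7 days)
fastra_minutes = 10                  # the same job on FASTRA
upsampled_minutes = 60               # higher-resolution reconstruction on FASTRA
target_minutes = 15                  # reconstruction-time goal for the new system

speedup_vs_cluster = cluster_minutes / fastra_minutes
required_speedup = upsampled_minutes / target_minutes

print(f"FASTRA vs CPU cluster: ~{speedup_vs_cluster:.0f}x")
print(f"Speedup needed for the 15-minute goal: {required_speedup:.0f}x")
```

The second number is the design target in a nutshell: the new machine had to be roughly four times faster than FASTRA to bring the upsampled reconstructions down to about 15 minutes.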
The ASTRA group tested some NVIDIA Tesla units but was not satisfied, due to their high price and the fact that current Tesla units feature a maximum of four GPUs. While gamers do not get linear improvements from adding GPUs in SLI or CrossFire setups, doubling the number of GPUs in a GPGPU system like FASTRA nearly doubles the performance. The scientists therefore once again decided to use off-the-shelf 3D gaming hardware to build FASTRA II. This time the job was more complicated, though: the new system features 13 GPUs, and a custom BIOS from ASUS plus Linux kernel hacks were required to get it running. FASTRA II is 3.5 times faster than the original system and is likely the fastest desktop computer in the world.
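Why can adding GPUs scale almost linearly here? If, hypothetically, the volume is split into independent slices that each GPU reconstructs on its own, the GPUs never need to exchange data, and the only loss comes from blocks of slightly unequal size. The sketch below assumes 1000 slices of equal cost; `split_slices` and `ideal_speedup` are illustrative names, not part of the ASTRA group's code.

```python
def split_slices(n_slices, n_gpus):
    """Assign contiguous blocks of slice indices to GPUs, as evenly as possible."""
    base, extra = divmod(n_slices, n_gpus)
    blocks, start = [], 0
    for gpu in range(n_gpus):
        size = base + (1 if gpu < extra else 0)  # the first `extra` GPUs take one more slice
        blocks.append(range(start, start + size))
        start += size
    return blocks


def ideal_speedup(n_slices, n_gpus):
    """Speedup over one GPU, assuming equal-cost slices and no GPU-to-GPU traffic."""
    slowest_block = max(len(b) for b in split_slices(n_slices, n_gpus))
    return n_slices / slowest_block


for gpus in (4, 8, 13):
    print(gpus, "GPUs ->", round(ideal_speedup(1000, gpus), 2), "x")
```

With 1000 slices, going from 4 to 8 GPUs exactly doubles the ideal speedup, while 13 GPUs land just under 13x because 1000 does not divide evenly by 13. Real workloads also spend time on data transfers and CPU-side coordination, which this sketch ignores.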
Here’s a brief overview of the specifications of FASTRA II:
Lian-Li PC-P80 Armorsuit: This case was chosen because of the massive amount of working space.
ASUS P6T7 WS Supercomputer: This is the only workstation motherboard with seven full-size PCI Express slots. SLI is not required because the group's GPGPU computations involve no communication between the GPUs.
Intel Core i7 920: Managing 13 GPUs simultaneously requires heavy multithreading on the CPU side, but since most of the computational load is shifted to the GPUs, this model was good enough.
6x 2GB Corsair DDR3 1333MHz: Lots of RAM is crucial to load large 3D volumes completely in memory.
Samsung Spinpoint F3 1TB: This drive was chosen because disk access is not a performance bottleneck in their research, so little would be gained from faster devices such as SSDs.
Thermaltake Toughpower 1500W + 3x Thermaltake PowerExpress 450W: Not so long ago most enthusiasts deemed 1500W power supplies pointless, but the ASTRA team needed even more! The 1500W beast from Thermaltake doesn't support that many graphics cards on its own, so the team added three of Thermaltake’s 450W VGA power supplies.
ASUS ENGTX275 + 4x ASUS ENGTX295 (2PCB) + 2x ASUS ENGTX295 (1PCB): The GeForce GTX 275 drives the display. A single-GPU card had to be used for technical reasons, which also limit FASTRA II to 13 GPUs; more on this below. Two different GeForce GTX 295 models are used because FASTRA II was assembled over a rather long period of time. The researchers explain on their site that they prefer the single-PCB GeForce GTX 295 cards because they generate less heat than the older dual-PCB model. NVIDIA graphics cards were used because the researchers rely on NVIDIA’s CUDA programming model.
Custom design GPU suspension cage: In collaboration with Tones.be and Lastertek N.V., an aluminum graphics card cage was designed. The cage resolves space issues and has the side benefit of improved ventilation.
Adex Electronics PE-FLEX16 gen. 2 risers: PCI Express x16 risers were used to connect all seven dual-slot graphics cards to the tightly spaced PCI Express slots of the motherboard.
CentOS 5.3: This operating system was picked because it provides a stable environment that requires little maintenance. A custom-built Linux kernel was used for the FASTRA II project.
In theory, the system can support 14 GPUs with seven dual-GPU graphics cards, but this didn’t work: the system wouldn’t boot because of memory allocation issues caused by the 32-bit BIOS. To deal with the issue, the team worked with ASUS to develop a custom BIOS that skips initialization of certain types of graphics cards during boot. To still get display output, one card must be of a different type, which is why a GeForce GTX 275 is used. Because of this BIOS hack, the Linux kernel had to be modified to leave all memory address space allocation to the operating system. Unfortunately, FASTRA II still suffers from stability issues: it can perform computations successfully with all 13 GPUs simultaneously, but it regularly crashes. The stability issues seem to be related to the NVIDIA driver and/or the changes made to the Linux kernel. The researchers hope the publicity surrounding the launch of FASTRA II will bring them more support in solving these issues. Technical details about the problems the ASTRA group encountered while building FASTRA II can be read on this page.
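As a quick sanity check on the parts list above, the component counts add up to the figures quoted elsewhere in the article: seven cards for seven PCI Express slots, 13 GPUs, and 2850W of combined power supply capacity. The dictionary layout below is just one convenient way to tally them.

```python
# Component counts from the FASTRA II parts list
gpu_cards = {
    "GTX 275": (1, 1),          # (number of cards, GPUs per card)
    "GTX 295 (2 PCB)": (4, 2),
    "GTX 295 (1 PCB)": (2, 2),
}
psu_watts = [1500] + [450] * 3  # Toughpower 1500W + three PowerExpress 450W
ram_gb = 6 * 2                  # six 2GB DDR3 modules

cards = sum(n for n, _ in gpu_cards.values())
gpus = sum(n * per_card for n, per_card in gpu_cards.values())

print("cards:", cards)          # one card per PCI Express slot
print("GPUs:", gpus)
print("PSU capacity (W):", sum(psu_watts))
print("RAM (GB):", ram_gb)
```

Swapping the single-GPU GTX 275 for a seventh dual-GPU card would give the theoretical 14 GPUs, which is the configuration the stock BIOS could not boot.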
FASTRA II uses NVIDIA graphics cards because CUDA was the only available option when the team developed FASTRA I in 2008. The ASTRA group may adopt OpenCL in the future, but that programming model still needs to prove itself, so the team stuck with CUDA for FASTRA II. It had already invested a significant amount of time in its CUDA code, so it didn't make sense to start from scratch with OpenCL just to use the extra horsepower of ATI's Radeon HD 5970.
Here’s a photo that shows how the cards are connected to the motherboard.
All this hard work resulted in a desktop supercomputer with a computation capacity of 12 teraFLOPS that can be carried under your arm and costs less than 6000 euro. Now let's take a look at some benchmarks to see how FASTRA II performs. For the benchmarks, the ASTRA group used the code it runs daily for its research. There are two versions of the algorithm, one for CPUs and one for GPUs. The performance of FASTRA II is compared with FASTRA, a Core i7 940 desktop computer, a Core i7 940 with an NVIDIA Tesla C1060, and the university’s 512-core supercomputer cluster.
Here’s a comparison of the reconstruction speed of their tomography algorithm. The new system is about 50 percent more expensive than FASTRA I, but it’s 3.6x faster, and more than four times as fast as the University of Antwerp’s supercomputer. Furthermore, the researchers believe the performance benefit will be even greater once they solve the remaining stability problems, since the benchmark they ran was not large enough to take full advantage of the increased speed of the GeForce GTX 295 cards.
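The price and speed figures above combine into a performance-per-euro comparison. This is plain arithmetic on numbers quoted in the article, under two stated assumptions: FASTRA II is treated as costing exactly 6000 euro (the article gives this as an upper bound), and its speedup over the supercomputer is treated as exactly 4x (a lower bound).

```python
# Prices and speedups quoted in the article
fastra1_cost_eur = 4000
fastra2_cost_eur = 6000          # "less than 6000 euro", used here as the upper bound
supercomputer_cost_eur = 3_500_000

speedup_vs_fastra1 = 3.6         # measured reconstruction speed ratio
speedup_vs_supercomputer = 4     # "more than four times as fast", used as a lower bound

cost_ratio = fastra2_cost_eur / fastra1_cost_eur
perf_per_euro_gain = speedup_vs_fastra1 / cost_ratio
print(f"FASTRA II costs {cost_ratio:.1f}x as much as FASTRA I")
print(f"Performance per euro vs FASTRA I: {perf_per_euro_gain:.1f}x")

# A lower bound, because the speedup is a lower bound and the price an upper bound
perf_per_euro_vs_super = speedup_vs_supercomputer * supercomputer_cost_eur / fastra2_cost_eur
print(f"Performance per euro vs CalcUA: at least ~{perf_per_euro_vs_super:.0f}x")
```

In other words, the 50 percent higher price buys a 3.6x speedup, so each euro spent on FASTRA II delivers about 2.4 times as much reconstruction speed as a euro spent on FASTRA I.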
Because many people worry about cooling, the researchers deliberately paid no special attention to it in the design, to see what would happen. It turns out that with the case open, temperatures don’t rise much above 60 degrees Celsius, so no exotic forms of cooling are required to keep the system at acceptable temperatures. Here's some thermal footage; you can clearly see that the single-PCB GeForce GTX 295 cards run cooler than their dual-PCB brothers.
While the power supplies in the system can deliver up to 2850W, the actual power consumption under load is only 1200W. Not bad, considering that the university's four-year-old 512-core cluster requires 90,000W and can't deliver even a quarter of FASTRA II's computing power.
While FASTRA II wasn’t specifically built with power efficiency in mind, the following chart illustrates one of the reasons why GPGPU computing is such a big deal. FASTRA II is almost three times more energy efficient than its older brother, and the energy efficiency of the CalcUA supercomputer cluster is shockingly poor in comparison: FASTRA II delivers about 12.25 slices/Wh, roughly 300 times more than CalcUA's 0.04 slices/Wh.
For under 6000 euro, the ASTRA research group managed to create a new personal desktop supercomputer that performs its tomography reconstruction calculations four times faster than a four-year-old, 3.5 million euro supercomputer. Some stability issues still need to be solved, but the hard work has clearly paid off: the computing power, the performance per euro and the energy efficiency of FASTRA II are stunning. More info about the system can be found at FASTRA II.
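The roughly 300x efficiency figure follows from the power draw and the speed difference, since energy efficiency is throughput divided by power. The sketch below only re-derives numbers already quoted in the article; treating the quoted wattages as constant draw during the benchmark is an assumption.

```python
# Figures quoted in the article
fastra2_watts = 1200
cluster_watts = 90_000
fastra2_slices_per_wh = 12.25
cluster_slices_per_wh = 0.04

power_ratio = cluster_watts / fastra2_watts                       # how much more power CalcUA draws
efficiency_ratio = fastra2_slices_per_wh / cluster_slices_per_wh  # slices per Wh, FASTRA II vs CalcUA

# Efficiency = throughput / power, so
# efficiency_ratio = speed_ratio * power_ratio  =>  speed_ratio = efficiency_ratio / power_ratio
implied_speedup = efficiency_ratio / power_ratio

print(f"power ratio: {power_ratio:.0f}x")
print(f"efficiency ratio: {efficiency_ratio:.0f}x")
print(f"implied speedup over the cluster: {implied_speedup:.2f}x")
```

The implied speedup of about 4.1x agrees with the benchmark result that FASTRA II is more than four times as fast as CalcUA, so the power and efficiency figures are consistent with each other.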