The result is that the computational world is suddenly more complex. Not only are there CPUs of every type and variety, there are now also GPUs for data-parallel workloads. Just as the computational power of these products varies, so do their programmability and the range of workloads for which they are suitable. Parallel computing devices such as GPUs, Cell and Niagara tend to be hit or miss: all of them are hopeless for any single-threaded application, and they are frequently poor performers on extremely branch-intensive, unpredictable and messy integer code, but for sufficiently parallel problems they outperform the competition by factors of ten or a hundred. Niagara and general-purpose CPUs are more flexible, while GPUs are difficult to use with more sophisticated data structures and the Cell processor is downright hostile to programmers.
Ironically, of the two GPU vendors, NVIDIA turned out to have the most comprehensive and consistent approach to general-purpose computation, despite the fact that (or perhaps because) ATI was purchased by a CPU company. This article focuses exclusively on the computational aspects of NVIDIA's GPUs, specifically CUDA and the recently released GT200 GPU, which is used across the GeForce, Tesla and Quadro product lines. We will not delve into the intricacies of the modern 3D pipeline as represented by DX10 and OpenGL 2.1, except to note that these are alternative programming models that can be mapped onto CUDA.
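To make the programming model concrete, here is a minimal sketch of the kind of data-parallel code CUDA is built for: a SAXPY kernel where each hardware thread computes one element of the result. The function names and launch parameters are our own illustration, not taken from the article; it uses only the classic `cudaMalloc`/`cudaMemcpy` API that was available on GT200-era CUDA, and it of course requires an NVIDIA GPU to run.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread handles one element: y[i] = a * x[i] + y[i].
// The grid of blocks supplies the index; no explicit loop is needed.
__global__ void saxpy(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                       // guard against the final partial block
        y[i] = a * x[i] + y[i];
}

int main()
{
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    // Host-side data.
    float *hx = new float[n], *hy = new float[n];
    for (int i = 0; i < n; ++i) { hx[i] = 1.0f; hy[i] = 2.0f; }

    // Device-side buffers (GT200-era API: explicit allocation and copies).
    float *dx, *dy;
    cudaMalloc(&dx, bytes);
    cudaMalloc(&dy, bytes);
    cudaMemcpy(dx, hx, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy, bytes, cudaMemcpyHostToDevice);

    // Launch enough 256-thread blocks to cover all n elements.
    saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, dx, dy);

    cudaMemcpy(hy, dy, bytes, cudaMemcpyDeviceToHost);
    printf("y[0] = %f\n", hy[0]);

    cudaFree(dx);
    cudaFree(dy);
    delete[] hx;
    delete[] hy;
    return 0;
}
```

The point of the sketch is the shape of the code: the per-element work is trivially independent, which is exactly why this class of problem maps well onto thousands of GPU threads, and why branchy, pointer-chasing integer code does not.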
NVIDIA GT200's parallel architecture investigated
Posted on Sunday, Sep 14 2008 @ 12:16 CEST by Thomas De Maesschalck