Tom's Hardware has published an editorial about the latest developments in the GPGPU computing world. The site talks about Intel's upcoming Larrabee and compares it to NVIDIA's CUDA programming model; both platforms could deliver a significant boost in computing performance.
The reason Intel will first target the gamer market with Larrabee is pretty simple: even if a chip could vastly improve the performance of many applications, it's hard to sell if those applications don't exist, and without a large install base few developers will be inclined to write applications that take advantage of the technology. It's the same problem AGEIA faced with its PhysX cards before NVIDIA bought the company and created a CUDA version of PhysX that can now run on millions of GeForce cards. Intel will basically do the same as NVIDIA (and AMD): it will attempt to build a large install base by selling multi-purpose graphics cards to gamers.
Intel sees graphics cards that run general applications as a threat to its processor sales, and that's one of the prime reasons why the chip giant is developing the many-core Larrabee architecture. Currently NVIDIA has the advantage with its CUDA programming model, as it already has an install base of over 70 million GeForce GPUs that support CUDA. By the time Intel's Larrabee makes it to market in late 2009 or 2010, there will be more than 100 million GPUs from NVIDIA that support CUDA. Here's an excerpt from Tom's Hardware's article:
Our developer sources partially confirmed and partially denied those claims. On Nvidia’s side, it appears that a carelessly programmed CUDA application still runs faster than what you would get from a CPU, but you do need sufficient knowledge of the GPU hardware, such as the memory architecture, to get to the heart of the acceleration. The same is true for Intel’s Larrabee: The claim that developers simply need x86 knowledge to program Larrabee applications is not entirely correct. While Larrabee may accelerate even applications that are not programmed for it, the purpose of Larrabee is to access its complete potential and that is only possible through the vector units, which require vectorized code. Without vectorization, you will have to rely on a compiler to do that for you and it is no secret that this automated version will rarely work as well as hand-crafted code. Long story short: Both CUDA and Larrabee development benefit from the understanding of the hardware. Both platforms promise decent performance without fine tuning and without knowledge of the hardware. But there seems to be little doubt at this time that developers who understand the devices they are developing for will have a big advantage.
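To make the vectorization point concrete, here is a minimal sketch in portable C of the difference between a plain scalar loop and a loop that has been restructured by hand into independent lanes. It does not use Larrabee's actual vector instructions; the function names and the `LANES` width of 4 are assumptions chosen purely for illustration.

```c
#include <stddef.h>

/* Illustrative lane count only; not the width of any particular chip. */
#define LANES 4

/* Scalar version: one multiply-add per iteration, and every addition
 * depends on the previous one. Under strict floating-point rules a
 * compiler is generally not free to reorder these additions on its own. */
float dot_scalar(const float *a, const float *b, size_t n) {
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Hand-structured version: LANES independent partial sums, which is the
 * shape of work a vector unit performs with a single instruction. */
float dot_lanes(const float *a, const float *b, size_t n) {
    float partial[LANES] = {0};
    size_t i = 0;

    /* Main loop: process LANES elements per step, one per lane. */
    for (; i + LANES <= n; i += LANES)
        for (size_t l = 0; l < LANES; l++)
            partial[l] += a[i + l] * b[i + l];

    /* Combine the per-lane partial sums. */
    float sum = 0.0f;
    for (size_t l = 0; l < LANES; l++)
        sum += partial[l];

    /* Tail: leftover elements when n is not a multiple of LANES. */
    for (; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}
```

The second function does the same job, but the programmer has already decided how the data is split across lanes, which is the kind of hardware-aware restructuring the article says an automated tool will rarely match. Note that the two versions add the products in a different order, so results can differ slightly for general floating-point inputs.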
Interestingly, we talked to developers who believed that Larrabee will not be able to handle essential x86 capabilities, such as system calls. In fact, Intel’s published Larrabee document clearly supports this conclusion. However, Intel confirmed that Larrabee can do everything an x86 CPU can do and some of these features are actually being achieved through a micro OS that is running on top of the architecture. We got the impression that how Larrabee will support essential x86 features and how well they are processed will be closely watched by developers.
A key criticism of CUDA by Intel is a lack of flexibility and the fact that its compiler is tied to GPUs. This claim may be true at this time, but could evaporate within a matter of days. Nvidia says CUDA code can run on any multi-core platform. To prove its point, the company is about to roll out an x86 CUDA compiler. Our sources indicate that the software will be available as a beta by the time the company’s tradeshow Nvision08 opens its doors. In that case, CUDA could be considered to be much more flexible than Larrabee, as it will support x86 as well as GPUs (and possibly even AMD/ATI GPUs.) Even if Intel often describes x86 programming as the root of all programming, the company will have to realize that CUDA may have an edge at this point. The question will be how well CUDA code will run on x86 processors.
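As a rough idea of why CUDA-style code is not inherently tied to GPUs, the sketch below emulates a kernel launch on a CPU in plain C. This is only an illustration of the general concept, not how NVIDIA's x86 compiler works; the `thread_ctx` struct and both function names are hypothetical stand-ins for CUDA's built-in block and thread indices and its launch syntax.

```c
#include <stddef.h>

/* Hypothetical stand-in for CUDA's built-in blockIdx, threadIdx and blockDim. */
typedef struct {
    unsigned block_idx;
    unsigned thread_idx;
    unsigned block_dim;
} thread_ctx;

/* Kernel body written from the point of view of a single thread,
 * in the CUDA style: y[i] = a * x[i] + y[i]. */
static void saxpy_kernel(thread_ctx ctx, size_t n, float a,
                         const float *x, float *y) {
    size_t i = (size_t)ctx.block_idx * ctx.block_dim + ctx.thread_idx;
    if (i < n)                      /* guard threads past the end of the data */
        y[i] = a * x[i] + y[i];
}

/* CPU "launch": the grid of blocks and threads becomes two nested loops.
 * The outer loop over blocks is the natural place to spread work across
 * cores, and the inner loop is a candidate for vector units. */
void saxpy_launch_cpu(unsigned grid_dim, unsigned block_dim, size_t n,
                      float a, const float *x, float *y) {
    for (unsigned b = 0; b < grid_dim; b++)
        for (unsigned t = 0; t < block_dim; t++)
            saxpy_kernel((thread_ctx){b, t, block_dim}, n, a, x, y);
}
```

This simple mapping only works for kernels whose threads never need to wait on each other; anything using synchronization within a block would need more machinery. How efficiently such loops run on real x86 processors is exactly the open question the article ends on.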