Speaking to The Inquirer, NVIDIA Tesla GM Sumit Gupta revealed that the eye-popping speedups of 100x to 200x attributed to GPGPU computing came not from the sheer power of GPUs but from poorly optimized original CPU code. Gupta admits the actual speedup is between 5x and 10x, and in some cases only 2x.
Gupta said, "Most of the time when you saw the 100x, 200x and larger numbers those came from universities. Nvidia may have taken university work and shown it and it has an 100x on it, but really most of those gains came from academic work. Typically we find when you investigate why someone got 100x [speed up] is because they didn't have good CPU code to begin with. When you investigate why they didn't have good CPU code you find that typically they are domain scientists not computer science guys - biologists, chemists, physics - and they wrote some C code and it wasn't good on the CPU. It turns out most of those people find it easier to code in CUDA C or CUDA Fortran than they do to use MPI or Pthreads to go to multi-core CPUs, so CUDA programming for a GPU is easier than multi-core CPU programming."
According to Gupta, users who have already optimised their code to squeeze the most performance out of the CPU see more sedate gains. "Most people we find who have optimised CPU code, and really you'll only find optimised CPU code in the HPC world, get between 5x to 10x speed up, that's the average speed up that people get. In some cases it's even less, we've seen people getting speed ups of 2x but they are delighted with 2x because there is no way for them to get a sustainable 2x speed up from where they are today," said Gupta.