Theo Valich from Bright Side of News has published an overview of the rumors circulating the web about NVIDIA's next-generation GT300 GPU. It appears we can expect some big architectural changes in this DirectX 11-compatible chip: the GT300 is believed to use MIMD (Multiple Instruction, Multiple Data) units rather than SIMD (Single Instruction, Multiple Data) units, along with a more granular scratch cache that allows greater interaction between the cores inside each cluster.
These changes should bring major improvements in both single- and double-precision performance for GPGPU computing applications. Valich guesstimates single-precision computing power of up to 3 teraFLOPS and claims the double-precision performance of the GT300 could be 6 to 15 times that of the GT200. Taking the GT200's double-precision peak as roughly 78 gigaFLOPS, this would put the GT300's double-precision performance between 468 gigaFLOPS and 1.17 teraFLOPS!
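The 468 gigaFLOPS to 1.17 teraFLOPS range is simply the 6x and 15x multipliers applied to the GT200's double-precision peak. The sketch below reproduces that arithmetic. It assumes the commonly cited GT200 (GeForce GTX 280) configuration of 30 double-precision units at a 1.296 GHz shader clock, each retiring one fused multiply-add (2 FLOPs) per clock; these constants are not from the article.

```python
# Reproduce the 6-15x double-precision estimate from a GT200 baseline.
# Assumption: GTX 280 has 30 DP units at 1.296 GHz, 2 FLOPs (one FMA) per clock.
GT200_DP_UNITS = 30
GT200_SHADER_CLOCK_GHZ = 1.296
FLOPS_PER_DP_UNIT_PER_CLOCK = 2


def peak_gflops(units: int, clock_ghz: float, flops_per_clock: int) -> float:
    """Theoretical peak throughput in GFLOPS: units x clock (GHz) x FLOPs per clock."""
    return units * clock_ghz * flops_per_clock


gt200_dp = peak_gflops(GT200_DP_UNITS, GT200_SHADER_CLOCK_GHZ, FLOPS_PER_DP_UNIT_PER_CLOCK)
baseline = round(gt200_dp)  # 77.76 GFLOPS, rounded to 78 as in the estimate
low, high = 6 * baseline, 15 * baseline  # the rumored 6x and 15x multipliers

print(f"GT200 DP peak: {gt200_dp:.2f} GFLOPS")
print(f"GT300 DP estimate: {low} to {high} GFLOPS")
```

With these assumptions the range comes out to 468 to 1170 GFLOPS, matching the figures above.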
The flagship GT300 is expected to have 16 groups of 32 cores, for a total of 512 cores. The launch date of the GT300 (GeForce GTX 380?) is unknown, but it should be within six months or so.
GT300 architecture groups processing cores in sets of 32, up from 24 in the GT200 architecture. But the difference between the two is that GT300 parts ways with the SIMD architecture that dominates today's GPUs. GT300 cores rely on MIMD-like functions [Multiple Instruction, Multiple Data]: all the units work in MPMD mode, executing simple and complex shader and computing operations on the go. We're not sure whether we should continue to use the term "shader processor" or "shader core", as these units are now almost on equal terms with the FPUs inside the latest AMD and Intel CPUs.
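One way to see why moving away from SIMD matters is branch divergence. The toy model below is a simplified illustration, not a description of how GT200 or GT300 actually schedule work: in a lockstep SIMD group, if some lanes take a branch and others do not, both paths run one after the other with inactive lanes masked off, while independent MIMD-style cores each follow only their own path. The function names and cycle costs are hypothetical.

```python
# Toy model of branch divergence: SIMD lockstep vs. independent (MIMD-style) cores.
# Hypothetical cycle costs; not a timing model of any real GPU.
from typing import List


def simd_cycles(takes_branch: List[bool], cost_if: int, cost_else: int) -> int:
    """Lockstep group sharing one instruction stream.

    If lanes disagree, both paths execute serially with inactive lanes masked off.
    """
    any_if = any(takes_branch)
    any_else = not all(takes_branch)
    return (cost_if if any_if else 0) + (cost_else if any_else else 0)


def mimd_cycles(takes_branch: List[bool], cost_if: int, cost_else: int) -> int:
    """Independent cores: each runs only its own path; the group is done
    when its slowest core finishes."""
    return max(cost_if if t else cost_else for t in takes_branch)


# A group of 32 lanes where every other lane takes the branch.
group = [i % 2 == 0 for i in range(32)]
print("SIMD cycles:", simd_cycles(group, 10, 20))
print("MIMD cycles:", mimd_cycles(group, 10, 20))
```

With the alternating group, the lockstep model pays for both paths (30 cycles) while the independent model pays only for the longer one (20 cycles); with a uniform group the two models cost the same.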
GT300 itself packs 16 groups of 32 cores; yes, we're talking about 512 cores for the high-end part. This number alone raises the computing power of GT300 by more than 2x compared with the GT200 core. Before the chip tapes out, there is no way anybody can predict working clocks, but if the clocks remain the same as on GT200, we would have more than double the computing power.
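The "more than 2x" claim follows from the core counts alone, provided clocks and per-core throughput stay the same. The sketch assumes the full GT200 has 240 cores (10 clusters of 24, consistent with the sets of 24 mentioned above); that total is not stated in the article.

```python
# Core count and relative throughput at equal clocks and equal work per core.
GT300_GROUPS = 16
GT300_CORES_PER_GROUP = 32
GT200_CORES = 240  # assumption: 10 clusters x 24 cores in the full GT200

gt300_cores = GT300_GROUPS * GT300_CORES_PER_GROUP
ratio = gt300_cores / GT200_CORES  # speedup if clocks and per-core work are unchanged

print(f"GT300 cores: {gt300_cores}, ratio vs GT200: {ratio:.2f}x")
```

This gives 512 cores and a ratio of about 2.13x.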
If, for instance, nVidia gets a 2 GHz clock for the 512 MIMD cores, we are talking about no less than 3 TFLOPS in single precision. Double precision is highly dependent on how efficient the MIMD-like units will be, but you can count on a 6-15x improvement over GT200.
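Peak single-precision throughput is usually estimated as cores × clock × FLOPs per core per clock. The 3 TFLOPS figure works out only if each core is counted as retiring 3 FLOPs per clock, the dual-issued MAD + MUL convention NVIDIA used for GT200's roughly 933 GFLOPS rating. That per-clock figure is our assumption; the article does not say how it was derived. If a core were counted at only 2 FLOPs per clock (one multiply-add), the same 512 cores at 2 GHz would peak near 2 TFLOPS.

```python
# Peak single-precision estimate: cores x clock (GHz) x FLOPs per core per clock.
def peak_tflops(cores: int, clock_ghz: float, flops_per_clock: int) -> float:
    """Theoretical peak in TFLOPS."""
    return cores * clock_ghz * flops_per_clock / 1000.0


# Assumption: MAD + MUL dual issue counted as 3 FLOPs per clock (GT200 convention).
gt300_sp_mad_mul = peak_tflops(512, 2.0, 3)
# Alternative assumption: a single multiply-add per clock (2 FLOPs).
gt300_sp_mad_only = peak_tflops(512, 2.0, 2)
# Sanity check against GT200 (GTX 280): 240 cores at 1.296 GHz, 3 FLOPs per clock.
gt200_sp = peak_tflops(240, 1.296, 3)

print(f"GT300 @ 2 GHz, 3 FLOPs/clock: {gt300_sp_mad_mul:.3f} TFLOPS")
print(f"GT300 @ 2 GHz, 2 FLOPs/clock: {gt300_sp_mad_only:.3f} TFLOPS")
print(f"GT200 check: {gt200_sp:.3f} TFLOPS")
```

Under the 3-FLOP convention the estimate is about 3.07 TFLOPS, which is where the "no less than 3 TFLOPS" figure comes from.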