NVIDIA's chief scientist Bill Dally talked about the future of the GPU at the Supercomputing 2010 conference. One of the things he revealed is a concept for a 10 teraflops chip; you can read more about it at EE Times.
In his talk, Dally described a graphics core that can process a floating-point operation using just 10 picojoules of energy, down from 200 picojoules on NVIDIA's current Fermi chips. Eight of the cores would be packaged on a single streaming multiprocessor (SM) and 128 of the SMs would be packed into one chip.
The result would be a thousand-core graphics chip, with each core capable of handling four double-precision floating-point operations per clock cycle, the equivalent of 10 teraflops on a chip. A chip with just eight of the cores would someday power a handset, Dally said.
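For a rough sense of how these numbers fit together, here is a back-of-the-envelope sketch in Python. The core counts, operations per clock, and energy-per-operation figures come from the report above. The implied clock speed and the power figures are derived here and were not stated by Dally. The power estimate assumes energy is spent only on the floating-point operations themselves, ignoring memory traffic, data movement, and leakage, so treat it as a lower bound rather than a real chip power budget.

```python
# Figures quoted in the report above
CORES_PER_SM = 8
SMS_PER_CHIP = 128
FLOPS_PER_CORE_PER_CLOCK = 4      # double-precision operations per clock cycle
TARGET_FLOPS = 10e12              # 10 teraflops
ENERGY_PER_FLOP_CONCEPT_J = 10e-12   # 10 picojoules (concept core)
ENERGY_PER_FLOP_FERMI_J = 200e-12    # 200 picojoules (Fermi)

# "Thousand-core" chip: 8 cores per SM times 128 SMs
total_cores = CORES_PER_SM * SMS_PER_CHIP

# Floating-point operations the whole chip completes each clock cycle
flops_per_clock = total_cores * FLOPS_PER_CORE_PER_CLOCK

# Derived, not from the talk: clock rate needed to reach 10 teraflops
implied_clock_hz = TARGET_FLOPS / flops_per_clock

# Derived, not from the talk: power spent on arithmetic alone at 10 teraflops
power_concept_w = TARGET_FLOPS * ENERGY_PER_FLOP_CONCEPT_J
power_fermi_w = TARGET_FLOPS * ENERGY_PER_FLOP_FERMI_J

print(f"Total cores:            {total_cores}")
print(f"Flops per clock:        {flops_per_clock}")
print(f"Implied clock:          {implied_clock_hz / 1e9:.2f} GHz")
print(f"Arithmetic power @10pJ:  {power_concept_w:.0f} W")
print(f"Arithmetic power @200pJ: {power_fermi_w:.0f} W")
```

Under these assumptions the chip has 1,024 cores and would need a clock of roughly 2.4 GHz to hit 10 teraflops. The same throughput at Fermi's 200 picojoules per operation would cost twenty times as much arithmetic power, which is why the energy-per-operation figure is the headline number.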