If you want to learn more about the architecture of NVIDIA's new GT200 GPUs, I suggest you read the new article on Beyond3D. It's an interesting look at how the GeForce GTX 200 series is built, but for benchmarks you'll have to look elsewhere.
GT200 demonstrates subtle yet distinct architectural differences when compared to G80, the chip that pioneered the basic traits of this generation of GPUs from Kirk and Co. As we've alluded to, G80 led a family of chips that have underpinned the company's dominance over AMD in the graphics space since its launch, so it's no surprise to see NVIDIA stick to the same themes of execution, use of on-chip memories, and approach to acceleration of graphics and non-graphics computation.
At its core, GT200 is a MIMD array of SIMD processors, partitioned into what we call clusters, with each cluster a 3-way collection of shader processors, each of which we call an SM. Each SM, or streaming multiprocessor, comprises 8 scalar ALUs, each capable of FP32 and 32-bit integer computation (the only exception being multiplication, which is INT24 and therefore still takes 4 cycles for INT32), a single 64-bit ALU for brand new FP64 support, and a discrete pool of shared memory 16KiB in size.
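To make the hierarchy a little more concrete, here is a rough Python sketch that adds up the per-SM resources described above across a whole chip. The per-SM and per-cluster figures come straight from the excerpt. The cluster count does not: the excerpt never states it, so `clusters` is a parameter, and the value of 10 used in the example is an assumption based on the widely reported GeForce GTX 280 configuration. The names `ChipTotals` and `chip_totals` are made up for illustration.

```python
from dataclasses import dataclass

# Figures taken from the excerpt above.
SMS_PER_CLUSTER = 3          # each cluster is a 3-way collection of SMs
FP32_ALUS_PER_SM = 8         # 8 scalar FP32/INT ALUs per SM
FP64_ALUS_PER_SM = 1         # a single 64-bit ALU per SM
SHARED_MEM_PER_SM_KIB = 16   # 16KiB of shared memory per SM


@dataclass(frozen=True)
class ChipTotals:
    """Chip-wide resource totals (hypothetical helper type)."""
    sms: int
    fp32_alus: int
    fp64_alus: int
    shared_mem_kib: int


def chip_totals(clusters: int) -> ChipTotals:
    """Sum the per-SM resources over the given number of clusters."""
    if clusters < 0:
        raise ValueError("clusters must be non-negative")
    sms = clusters * SMS_PER_CLUSTER
    return ChipTotals(
        sms=sms,
        fp32_alus=sms * FP32_ALUS_PER_SM,
        fp64_alus=sms * FP64_ALUS_PER_SM,
        shared_mem_kib=sms * SHARED_MEM_PER_SM_KIB,
    )


if __name__ == "__main__":
    # Assumption: 10 clusters. This number is NOT given in the excerpt.
    print(chip_totals(clusters=10))
```

Under that 10-cluster assumption the sketch gives 30 SMs, 240 scalar ALUs, 30 FP64 ALUs and 480KiB of shared memory in total. Pass a different `clusters` value to model a cut-down part.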