The Register has some more info about the performance of Project Molecule:
The concept machine at the SC08 show was a 3U rack-mount chassis containing 180 of the Atom boards, for a total of 360 cores. These boards would present 720 virtual threads to a clustered application and have 720 GB of main memory (using 512 MB DDR2 DIMMs mounted on the board), with a total of 720 GB/sec of memory bandwidth. The important thing to realize, Brown explained, is that if the interconnect were architected correctly, the entire memory inside the chassis could be searched in one second. Across a full rack, Brown said, that memory bandwidth would reach up to 15 TB/sec, or about 20 times that of a typical single-rack cluster today. This setup would suit applications where cache memory and out-of-order execution don't help but massive numbers of threads do. (Search, computational fluid dynamics, seismic processing, stochastic modeling, and others were mentioned.)
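The one-second search claim is just total memory divided by aggregate memory bandwidth. Here is a minimal Python sketch of that arithmetic, using only the per-chassis figures quoted above. The `Chassis` class and its method names are illustrative and not anything SGI published. The sketch also shows the implied two cores per board and two threads per core.

```python
"""Back-of-the-envelope check of the per-chassis figures quoted above."""

from dataclasses import dataclass


@dataclass(frozen=True)
class Chassis:
    boards: int
    cores: int
    threads: int
    memory_gb: float
    bandwidth_gb_per_s: float

    def cores_per_board(self) -> float:
        return self.cores / self.boards

    def threads_per_core(self) -> float:
        return self.threads / self.cores

    def full_scan_seconds(self) -> float:
        # Ideal time to read every byte of memory once at full aggregate
        # bandwidth; ignores interconnect and software overhead entirely.
        return self.memory_gb / self.bandwidth_gb_per_s


# Figures as quoted for the SC08 concept machine (one 3U chassis).
SC08 = Chassis(boards=180, cores=360, threads=720,
               memory_gb=720, bandwidth_gb_per_s=720)

if __name__ == "__main__":
    print(f"cores per board:  {SC08.cores_per_board():.0f}")
    print(f"threads per core: {SC08.threads_per_core():.0f}")
    print(f"full memory scan: {SC08.full_scan_seconds():.1f} s")
```

The scan time is a best case. It assumes all 180 boards stream their memory in parallel at full speed, with no interconnect or software overhead. That assumption is why Brown attached the condition that the interconnect be architected correctly.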
The Molecule system's other potential advantages are low energy use and low cost. The aggregate memory bandwidth in a rack of these machines (that's 10,080 cores with 9.8 TB of memory) would deliver about 7 times the GB/sec per watt of a rack of x64 servers in a cluster today, according to Brown. On applications where threads rule, the Molecule would achieve about 7 times the performance per watt of x64 servers, and on SPEC-style floating point tests, it might deliver only about twice the performance per watt. On average, SGI says performance per watt should be around 3.5 times that of a rack of x64 servers.
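Suppose the rack is built from the same 360-core chassis shown at SC08. That is an assumption, because the article does not spell out the rack layout. Under it, the 10,080-core figure implies 28 chassis per rack. The helper below is a hypothetical name used only for this illustration.

```python
def chassis_per_rack(rack_cores: int, cores_per_chassis: int) -> int:
    """Number of identical chassis needed to reach a given rack core count."""
    if rack_cores % cores_per_chassis:
        # A fractional result would mean the rack is not built from whole chassis.
        raise ValueError("rack core count is not a whole number of chassis")
    return rack_cores // cores_per_chassis


if __name__ == "__main__":
    # 10,080 cores per rack, 360 cores per SC08-style chassis (assumed).
    print(chassis_per_rack(10_080, 360))
```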