In a couple of months, AMD's Bulldozer core will be succeeded by Piledriver-based chips. Piledriver-based Trinity APUs are currently making their way into desktop systems, and the Piledriver-based Vishera CPUs should follow soon, although the eight-core Vishera processor isn't expected until next year. Once Piledriver is out in full force, we can start looking forward to Steamroller, and after that AMD has Excavator on its roadmap.
At this year's Hot Chips conference, AMD CTO Mark Papermaster took the time to discuss Steamroller, detailing the per-clock throughput and power-efficiency tweaks made in the new core. Rumoured for a 2013 release, Steamroller aims to improve efficiency across the design: it should feed the cores faster, raise single-core performance and push up performance per watt, addressing some of the weaknesses that plagued Bulldozer.
Steamroller takes a step back toward a traditional dual-core design. AMD gets rid of the shared front end of the dual-core Bulldozer module and gives each integer core its own dedicated decoder, along with larger instruction caches. The revised front end is likely to be the single biggest improvement in the new core. AMD's slide promises up to 30 percent fewer instruction cache misses, 20 percent fewer branch mispredictions and a 25 percent increase in maximum per-thread instruction dispatches. AMD says these numbers come from simulated client-focused workloads, including digital media, productivity and gaming applications.
Single-core performance should benefit from higher integer execution bandwidth and lower average load latency. The slide also promises major improvements in store handling and a 5-10 percent increase in scheduling efficiency.
On the power-efficiency side, Steamroller promises optimized instruction fetch and dynamic L2 cache resizing. The floating-point unit is also being rebalanced: AMD says it has identified some redundancies, in the MMX units for instance, and will reuse some hardware to save power and area with no performance impact.
The chip designer also looked further ahead, showing a slide on the potential of using a high-density cell library rather than the hand-drawn custom logic generally used in high-end x86 CPUs. This design method promises roughly 30 percent reductions in both area and power usage, on the same order as a full process-node improvement. It will not appear in Steamroller, however; AMD is reserving it for a "post-Steamroller design".