AMD Steamroller core promises to solve Bulldozer weaknesses

Posted on Wednesday, August 29 2012 @ 13:18 CEST by Thomas De Maesschalck
AMD's Bulldozer core will be succeeded in a couple of months by Piledriver-based chips. Piledriver-based Trinity APUs are currently making their way into desktop systems and the Piledriver-based Vishera CPUs should follow soon, although the eight-core Vishera processor isn't expected until next year. Once Piledriver is out in full force we can start looking forward to Steamroller, and after that AMD has Excavator on its roadmap.

At this year's Hot Chips conference, AMD CTO Mark Papermaster took the time to discuss Steamroller, detailing the per-clock throughput and power efficiency tweaks made in this new core. Rumoured to be released in 2013, Steamroller aims to improve computation efficiency across its design, feed the cores faster, improve single-core performance and push on performance/Watt, thereby solving some of the weaknesses that plagued Bulldozer.

AMD Steamroller architecture slide 1

Steamroller will be a step back toward the traditional dual-core design: AMD moves away from the shared front-end of the dual-core Bulldozer modules by giving Steamroller separate, dedicated decoders for each integer core, along with larger instruction caches. The revised front end is likely to be the single biggest improvement in the new core. The slide promises that instruction cache misses will be reduced by up to 30 percent, branch mispredictions will drop by 20 percent and the maximum per-thread instruction dispatches will go up by a quarter. AMD says these numbers result from simulated client-focused workloads, including digital media, productivity and gaming applications.

AMD Steamroller architecture slide 2

Single-core performance should be improved by higher integer execution bandwidth and decreased average load latency. The slide promises massive improvements in store handling and a 5-10 percent increase in scheduling efficiency.

AMD Steamroller architecture slide 3

In terms of power efficiency tweaks, Steamroller promises instruction fetch optimization and dynamic L2 cache resizing. The floating-point unit will also be rebalanced: AMD says it has identified some redundancies, in the MMX units for instance, and will re-use some hardware to save power and area with no performance impact.

AMD Steamroller architecture slide 4

The chip designer also looked into the future by showing off a slide that reveals the potential of using a high-density cell library rather than the hand-drawn custom logic that's generally used in high-end x86 CPUs. This design method promises reductions of 30 percent in area and power usage, on the same order as a full process node improvement. It will not be present in Steamroller; it's reserved for a "post-Steamroller design".

AMD Steamroller architecture slide 5

Source: The Tech Report


About the Author

Thomas De Maesschalck

Thomas has been messing with computers since early childhood and firmly believes the Internet is the best thing since sliced bread. Enjoys playing with new tech, is fascinated by science, and passionate about financial markets. When not behind a computer, he can be found with running shoes on or lifting heavy weights in the weight room.


