Where Silvermont (and AMD’s Kabini / Jaguar / Puma) all used dual-issue decoders, Goldmont has three decoder units and can decode a maximum of 20 bytes per cycle. The fetch and instruction cache pipelines are no longer coupled, large page support has been added, and there’s a small L2 “precode” cache (16K) that didn’t exist on prior Atom processors. Goldmont’s triple-wide decoder is matched by its ability to retire up to three instructions per cycle, and the chip is capable of executing one load and one store per clock cycle (Silvermont could only perform one load or one store per clock cycle). Three simple integer operations can be executed per cycle, and address generation is now out-of-order in Goldmont (Silvermont generated and scheduled memory addresses in-order, but could complete them out-of-order).
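To put the width and load/store figures in perspective, here is a rough back-of-the-envelope sketch in Python. It only computes best-case lower bounds from the per-cycle limits quoted above. The function names and the 1,000-operation workload are made up for illustration, and the model deliberately ignores the 20-byte decode limit, cache misses, dependencies and everything else that affects a real core.

```python
from math import ceil


def min_decode_cycles(instructions: int, decode_width: int) -> int:
    """Best-case cycles to decode a stream, given instructions decoded per cycle."""
    return ceil(instructions / decode_width)


def min_memory_cycles(loads: int, stores: int, load_and_store_per_cycle: bool) -> int:
    """Best-case cycles needed just to issue the memory operations."""
    if load_and_store_per_cycle:
        # Goldmont-style (per the text): one load AND one store can issue together.
        return max(loads, stores)
    # Silvermont-style (per the text): one load OR one store per cycle.
    return loads + stores


if __name__ == "__main__":
    # Hypothetical copy-like workload: 1,000 loads paired with 1,000 stores.
    loads = stores = 1000
    print("dual-issue decode cycles:  ", min_decode_cycles(loads + stores, 2))
    print("triple-issue decode cycles:", min_decode_cycles(loads + stores, 3))
    print("load OR store cycles:      ", min_memory_cycles(loads, stores, False))
    print("load AND store cycles:     ", min_memory_cycles(loads, stores, True))
```

Under these simplified assumptions, a perfectly balanced copy loop needs half as many memory-issue cycles on the wider design. Real-world gains will be smaller and depend heavily on the workload.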
Some more details about the Intel Goldmont architecture
Posted on Tuesday, November 15 2016 @ 17:42 CET by Thomas De Maesschalck