NVIDIA's GeForce GTX 970 graphics card has stirred up quite the controversy lately, after enthusiasts discovered that the last 0.5GB of the card's memory capacity is significantly slower than the rest.
In a surprising move, NVIDIA has now updated the official GeForce GTX 970 specifications, claiming there was an error in the reviewer's guide and that the GTX 970 actually has fewer ROPs and less L2 cache than the GeForce GTX 980. According to the corrected specifications, the GTX 970 has 56 ROPs instead of 64, and 1792KB of L2 cache instead of 2048KB.
In layman's terms, the different structure of the GTX 970 required NVIDIA to divide the memory into two pools to prevent dramatic underutilization and to achieve optimal performance and efficiency for the GPU. You can read the full technical explanation at PC Perspective.
A quick note about the GTX 980 here: it uses a 1KB memory access stride to walk across the memory bus from left to right, and can reach all 4GB this way. But the GTX 970, with its altered design, has to do things differently. If you walked across the memory interface in exactly the same way, over the same 4GB capacity, the 7th crossbar port would tend to always get twice as many requests as each of the other ports (because it has two memories attached). In the short term that could be OK thanks to queuing in the memory path. But in the long term, if the 7th port is fully busy and is getting twice as many requests as each of the other ports, then the other six can only be half busy, to match the 2:1 ratio. So the overall bandwidth would be roughly half of peak. This would cause dramatic underutilization and would prevent optimal performance and efficiency for the GPU.
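The "roughly half of peak" figure can be checked with a little arithmetic. The sketch below is only an illustration of the reasoning above, not NVIDIA's model: it assumes seven equally capable crossbar ports, one of which carries twice the request rate of each of the others, and the function name `naive_utilization` is made up for this example.

```python
from fractions import Fraction

def naive_utilization(num_ports=7, double_loaded_ports=1):
    """Fraction of peak crossbar bandwidth under the naive 4GB striping.

    Assumption (illustrative only): every port has the same peak rate, and
    the double-loaded port receives twice the requests of each other port.
    """
    # The double-loaded port saturates first, so it is 100% busy.
    # Each remaining port sees half as many requests, so it is 50% busy.
    single_loaded_ports = num_ports - double_loaded_ports
    busy = double_loaded_ports * Fraction(1) + single_loaded_ports * Fraction(1, 2)
    return busy / num_ports

utilization = naive_utilization()
print(utilization)          # 4/7
print(float(utilization))   # a little over one half
```

With one fully busy port and six half-busy ports, the equivalent of four ports out of seven is doing useful work, which is where the "roughly half" estimate comes from.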
To avert this, NVIDIA divided the memory into two pools: a 3.5GB pool that maps to seven of the DRAMs and a 0.5GB pool that maps to the eighth DRAM. The larger, primary pool is given priority and is accessed in the expected 1-2-3-4-5-6-7-1-2-3-4-5-6-7 pattern, with equal request rates on each crossbar port, so bandwidth is balanced and can be maximized. Since the vast majority of gaming situations stay well under 3.5GB of memory use, this decision makes perfect sense. It is in those instances where memory above 3.5GB needs to be accessed that things get more interesting.
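The two-pool layout can be pictured with a toy address mapping. This is a simplified sketch under stated assumptions, not the actual hardware mapping, which the article does not describe: it assumes a plain 1KB round-robin stride across DRAMs 1 to 7 and that the slow 0.5GB pool sits at the top of the 4GB address range. The name `dram_for_address` is hypothetical.

```python
KIB = 1024
GIB = 1024 ** 3

STRIDE = 1 * KIB                # 1KB access stride (assumed to apply here too)
FAST_POOL_BYTES = 7 * GIB // 2  # 3.5GB pool striped over DRAMs 1-7
TOTAL_BYTES = 4 * GIB           # full 4GB capacity

def dram_for_address(addr):
    """Return the DRAM (1-8) that serves a byte address in this toy model."""
    if not 0 <= addr < TOTAL_BYTES:
        raise ValueError("address outside the 4GB range")
    if addr < FAST_POOL_BYTES:
        # Primary pool: round-robin over the seven DRAMs, 1KB at a time.
        return (addr // STRIDE) % 7 + 1
    # Secondary pool: everything above 3.5GB lands on the eighth DRAM.
    return 8

# Walk the first fourteen 1KB strides of the primary pool.
pattern = [dram_for_address(i * STRIDE) for i in range(14)]
print(pattern)  # [1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7]

# The first byte past 3.5GB falls in the slow pool.
print(dram_for_address(FAST_POOL_BYTES))  # 8
```

In this model every DRAM in the primary pool receives the same share of requests, which is the balance the split is meant to achieve.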
What this means is that on its own and in a vacuum, access to the last 0.5GB of the memory would occur at 1/7th of the speed of the 3.5GB pool of the GTX 970's memory. In real-world gaming benchmarks, though, the difference in performance is much less dramatic.
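The 1/7th ratio follows directly from the number of DRAMs behind each pool. The sketch below puts rough numbers on it; the 28 GB/s per-DRAM figure is an assumption not stated in this article, derived from the card's 224 GB/s launch specification divided evenly over eight DRAMs, and `pool_bandwidth` is a hypothetical helper.

```python
# Assumption: 224 GB/s advertised bandwidth spread evenly over 8 DRAMs.
PER_DRAM_GBPS = 224.0 / 8  # 28 GB/s per DRAM

def pool_bandwidth(num_drams, per_dram=PER_DRAM_GBPS):
    """Theoretical peak bandwidth of a pool striped over num_drams DRAMs."""
    return num_drams * per_dram

fast = pool_bandwidth(7)  # 3.5GB pool, seven DRAMs
slow = pool_bandwidth(1)  # 0.5GB pool, one DRAM

print(fast)         # 196.0
print(slow)         # 28.0
print(slow / fast)  # one seventh
```

These are theoretical peaks for each pool taken in isolation, which is why real-world benchmarks show a much smaller gap.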
Overall, we feel NVIDIA should have been more upfront about this when it launched the GTX 970, as some people may feel they were misled by the official specifications. However, these new technical details about the GTX 970's design don't change anything with regard to the card's position in the discrete graphics card market. At the end of the day, it's still a very good card that performs exactly as the launch-day reviews showed.