AMD filed a patent for a new method (PDF) to protect memory instructions against faults. The patent was filed on April 8, 2019 and published on August 1, 2019. TechPowerUp offers a brief summary of how the system works:
The proposed method uses the system's "master and slave" devices, manipulating their instruction streams and checking for errors in the process. First, the proposed system converts "slave" device requests to dummy operations such as NOPs (no operation) and modifies the memory arbiter to issue N master and N slave global/shared memory instructions per cycle, sending the master memory requests to the memory system. It then uses the slave requests to check for errors and enters the master requests into a memory FIFO (first in, first out) buffer. The slave request is stored in a register. Finally, the two values, one from the register where the slave request was stored and one from the FIFO, are compared to see if there are any differences.
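To make the compare step concrete, here is a minimal software sketch of the idea as summarized above. It is not AMD's implementation: the names `MemRequest` and `RedundantChecker` are hypothetical, and the model assumes one slave request arrives for each master request, in the same order. Only master requests reach the (simulated) memory system; each slave request is held in a register and compared against the oldest master request in the FIFO.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class MemRequest:
    """Hypothetical memory request; the fields are illustrative assumptions."""
    op: str          # "load" or "store"
    address: int
    data: int = 0


class RedundantChecker:
    """Toy model of the master/slave comparison (an assumption-laden sketch)."""

    def __init__(self):
        self.fifo = deque()   # master requests waiting to be checked
        self.issued = []      # requests actually sent to the memory system
        self.mismatches = 0

    def issue_master(self, req):
        # The master request goes to memory and is also queued for comparison.
        self.issued.append(req)
        self.fifo.append(req)

    def issue_slave(self, req):
        # The slave request acts as a NOP toward memory: it is never appended
        # to self.issued, only latched in a register for the check.
        slave_register = req
        master = self.fifo.popleft()
        if master != slave_register:
            self.mismatches += 1
            return False      # fault detected
        return True


checker = RedundantChecker()
checker.issue_master(MemRequest("store", 0x10, 7))
print(checker.issue_slave(MemRequest("store", 0x10, 7)))   # matching pair
checker.issue_master(MemRequest("store", 0x20, 7))
print(checker.issue_slave(MemRequest("store", 0x20, 5)))   # corrupted data
```

The sketch only detects a mismatch; how the hardware reports or recovers from a detected fault is not covered in the summary and is left out here.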
The patent explains that the new GPU memory instruction protection system is aimed at the server, cloud, and HPC markets. It's intended as a low-cost fault detection mechanism with a reduced area cost versus ECC, as well as a method to protect both on-chip memories and logic in the GPU. AMD explains this becomes necessary due to lower operating voltages and increased near-threshold operation as a design choice to control the power envelope of GPUs.
Interestingly, the patent notes that the government has certain rights in this invention as it was made with government support awarded by the Department of Energy (DOE).