Erratum 298 will be described as follows: "The processor operation to change the accessed or dirty bits of a page translation table entry in the L2 from 0b to 1b may not be atomic. A small window of time exists where other cached operations may cause the stale page translation table entry to be installed in the L3 before the modified copy is returned to the L2. In addition, if a probe for this cache line occurs during this window of time, the processor may not set the accessed or dirty bit and may corrupt data for an unrelated cached operation. The system may experience a machine check event reporting an L3 protocol error has occurred. In this case, the MC4 status register (MSR 0000_0410) will be equal to B2000000_000B0C0F or BA000000_000B0C0F. The MC4 address register (MSR 0000_0412) will be equal to 26h."
Anyone can download the patch, but AMD recommends against using it on a regular Linux system.
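The machine check signature quoted in the erratum can be turned into a simple check. The snippet below is only an illustrative sketch: the function name is hypothetical, and it assumes the two register values have already been read by some other means (reading MSRs is not shown).

```python
# Values quoted in the erratum text; names are illustrative only.
ERRATUM_298_STATUS = {0xB2000000_000B0C0F, 0xBA000000_000B0C0F}
ERRATUM_298_ADDR = 0x26


def matches_erratum_298(mc4_status, mc4_addr):
    """Return True if the MC4 status and address values match the
    L3 protocol error signature described for erratum 298."""
    return mc4_status in ERRATUM_298_STATUS and mc4_addr == ERRATUM_298_ADDR
```

A matching signature only indicates that a logged machine check is consistent with the erratum description. The erratum text also says data may be corrupted, and it does not tie that corruption to a machine check being logged.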
Wahlig also describes how the Linux patch works. It bypasses the BIOS workaround and emulates the Accessed and Dirty bits so that the erratum is never triggered:
The basis for the kernel patch solution depends on the root cause of the L2 eviction problem. The only exposure for the problem is when the TLB needs to set an A or D bit in a page table entry. If the TLB never needs to set an A or D bit, the bug cannot occur. By emulating the A and D bits with the help of the Present and Writable bits, the patch will ensure the real A and D bits are always preset. It works by forcing a page fault when the first access is made to a page with the emulated A bit not set, and when the first write access is made to a writable page with the emulated D bit not set. Emulated A and D bits are stored in bits generally available to the OS in the page table entry.
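The emulation idea can be sketched as a small simulation. This is not AMD's kernel patch, only a toy model of the scheme described above. The Present, Writable, Accessed and Dirty positions follow the standard x86 page table entry layout, but the three software bits and all function names are assumptions made for illustration.

```python
# Hardware-defined x86 page table entry bits.
PRESENT = 1 << 0
WRITABLE = 1 << 1
ACCESSED = 1 << 5
DIRTY = 1 << 6

# Hypothetical software bits, placed in positions the hardware ignores.
SOFT_ACCESSED = 1 << 9
SOFT_DIRTY = 1 << 10
SOFT_WRITABLE = 1 << 11  # the page is logically writable


def make_pte(writable):
    """Map a page with the real A and D bits already set. Present and
    Writable are withheld so the first access and first write fault."""
    pte = ACCESSED | DIRTY
    if writable:
        pte |= SOFT_WRITABLE
    return pte


def handle_fault(pte, is_write):
    """Fault handler: record the emulated bits, then grant access."""
    if is_write and not pte & SOFT_WRITABLE:
        raise PermissionError("write to a read-only page")
    pte |= SOFT_ACCESSED | PRESENT
    if is_write:
        pte |= SOFT_DIRTY | WRITABLE
    return pte


def access(pte, is_write):
    """Simulate one memory access; returns (new_pte, faulted)."""
    hw_ok = bool(pte & PRESENT) and (not is_write or bool(pte & WRITABLE))
    if hw_ok:
        # Real A and D are already set, so hardware has nothing to update.
        return pte, False
    return handle_fault(pte, is_write), True
```

In this model the first read of a page faults and sets the emulated A bit, and the first write faults again and sets the emulated D bit. Every later access proceeds without a fault, and the hardware never needs to modify the entry because the real A and D bits were set from the start.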
AMD TLB bug explained
Posted on Thursday, Dec 06 2007 @ 12:41 CET by Thomas De Maesschalck