Earlier this week, Internet users in North America suffered from high latency and dropped connections. The issue was caused by older backbone routers whose ternary content-addressable memory (TCAM) supports a maximum of 512K (524,288) routes. As the Border Gateway Protocol (BGP) tables outgrew the space allotted for them, the routers slowed down or simply shut down.
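To make the 512K figure concrete, here is a minimal sketch of the arithmetic and of a headroom check an operator might run against a reported route count. The function names, the 95% warning threshold, and the sample counts are assumptions chosen for illustration. They are not taken from any vendor's tooling, and how a real router behaves on overflow depends on the platform.

```python
# 512K routes means 512 * 1024 table entries, not 512,000.
TCAM_ROUTE_LIMIT = 512 * 1024  # 524,288


def headroom(route_count, limit=TCAM_ROUTE_LIMIT):
    """Return the number of free entries; a negative value means overflow."""
    return limit - route_count


def status(route_count, limit=TCAM_ROUTE_LIMIT, warn_fraction=0.95):
    """Classify a route count against the table limit.

    warn_fraction is a hypothetical threshold, not a vendor default.
    """
    if route_count > limit:
        return "overflow"
    if route_count >= warn_fraction * limit:
        return "warning"
    return "ok"


if __name__ == "__main__":
    # Hypothetical route counts, for illustration only.
    for count in (400_000, 500_000, 524_288, 524_289):
        print(count, headroom(count), status(count))
```

A table that sits in the "warning" band has little room for the routine growth described here, which is why a router can tip over without any single dramatic event.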
This week's issues aren't an isolated event; the problems are expected to get worse next week as more and more providers cross the 512K line. Some tech firms have been warning about this for quite some time, but now that issues are starting to pop up, providers are expected to finally feel a sense of urgency to resolve the problem.
A more technical explanation of the event can be read at Renesys.
The problem is real, and we still haven’t seen the full effects, because most of the Internet hasn’t yet experienced the conditions that could cause problems for underprovisioned equipment. Everyone on the Internet has a slightly different idea of how big the global routing table is, thanks to slightly different local business rules about peering and aggregation (the merging of very similar routes to close-by parts of the Internet address space). Everyone has a slightly different perspective, but the consensus estimate is indeed just under 512K, and marching higher with time.
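The aggregation mentioned above can be illustrated with Python's standard `ipaddress` module, which merges adjacent prefixes into a single covering prefix. This is only a sketch of the address arithmetic. The prefixes are documentation ranges picked for the example, and real BGP aggregation also depends on routing policy, which is why different networks end up with different table sizes.

```python
import ipaddress

# Three hypothetical routes; the first two are adjacent halves of one /24.
routes = [
    ipaddress.ip_network("192.0.2.0/25"),
    ipaddress.ip_network("192.0.2.128/25"),
    ipaddress.ip_network("198.51.100.0/24"),
]

# collapse_addresses merges adjacent or overlapping networks where possible.
aggregated = list(ipaddress.collapse_addresses(routes))

if __name__ == "__main__":
    print(len(routes), "routes before aggregation")
    print(len(aggregated), "routes after aggregation")
    for network in aggregated:
        print(network)
```

The two /25 routes collapse into 192.0.2.0/24, so a network that aggregates this way carries two entries where one that does not carries three.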
The real test, when large providers commonly believe that the Internet contains 512K routes, and pass that along to all their customers as a consensus representation of Internet structure, will start later this week, and will be felt nearly everywhere by the end of next week.
Enterprises that rely on the Internet for delivery of service should pay close attention to the latency and reachability of the paths to customers in the coming weeks, in order to identify affected service providers upstream and work around them while they perform appropriate upgrades to their infrastructure.
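One way to act on that advice is to compare recent probe results per upstream provider against a known baseline. The following is a rough sketch under stated assumptions. The data layout, the provider names, and the thresholds (twice the baseline latency, 5% loss) are hypothetical, and collecting the round-trip samples is left to whatever probing tool is already in use.

```python
from statistics import median


def flag_degraded(samples, baseline_ms, latency_factor=2.0, max_loss=0.05):
    """Return providers whose loss or median latency looks abnormal.

    samples: dict mapping provider name to a list of round-trip times in
             milliseconds, with None marking a lost probe.
    baseline_ms: dict mapping provider name to its normal median RTT.
    The two thresholds are illustrative assumptions, not recommendations.
    """
    flagged = {}
    for provider, rtts in samples.items():
        if not rtts:
            continue  # no probes sent, so nothing to judge
        answered = [r for r in rtts if r is not None]
        loss = 1 - len(answered) / len(rtts)
        med = median(answered) if answered else None
        slow = med is not None and med > latency_factor * baseline_ms[provider]
        if loss > max_loss or slow:
            flagged[provider] = {"loss": loss, "median_ms": med}
    return flagged


if __name__ == "__main__":
    # Hypothetical measurements for two upstream providers.
    samples = {
        "provider-a": [20, 22, 21, 19],
        "provider-b": [80, None, 95, None],
    }
    baseline = {"provider-a": 20, "provider-b": 25}
    print(flag_degraded(samples, baseline))
```

A provider that shows up in the result is a candidate for rerouting traffic until its upgrade is complete.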