Opera is running an ongoing study, called MAMA (Metadata Analysis and Mining Application), into the structure of Internet content. The browser maker has already crawled approximately 3.5 million pages and found that only 4.13 percent of this sample was standards-compliant:
Opera also ran the pages indexed by MAMA through the W3C's validation tools to see how many conform to standards. The results show that only 4.13 percent are valid. A more startling conclusion that Opera derived from its MAMA data is that only 50 percent of sites that display a badge touting validation are actually valid. This could indicate that many sites that are initially designed with valid HTML later cease to be valid as changes are made and new content is added.
Opera analyzed page meta tags to see whether there was any correlation between editing tools and validation rates. Surprisingly, Apple's iWeb delivered the highest proportion of valid pages: the study shows that 81 percent of pages created with iWeb were valid. By comparison, only 3.4 percent of pages created with Adobe Dreamweaver were valid.
The initial results of Opera's study are fascinating, but its true value hasn't yet been fully unlocked. Opera's efforts to build a search engine on top of MAMA will open the door to much deeper analysis and will enable third parties to use and repurpose the data for their own studies and projects.