Microsoft Office 12's XML file format

Posted on Thursday, June 02 2005 @ 21:41 CEST by Thomas De Maesschalck
Microsoft Office 12, which will be released in the second half of 2006, will feature new file formats. They will be based on XML, and files will by default be saved with the following file extensions: .docx, .xlsx and .pptx.

Microsoft says the new XML-based file format will make documents smaller, offer better protection against corruption and make it easier to share data with other office software. The company says the new XML-based files can be up to 75 percent smaller than Office 2003 documents because they use ZIP compression.
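To give a rough idea of what "XML inside a ZIP container" means in practice, here is a minimal Python sketch. It does not read a real Office 12 file: the part names (document.xml, comments.xml) and their contents are made up for illustration, and the helper names are hypothetical. It only shows that a ZIP archive can hold several XML parts and that repetitive XML text compresses well.

```python
import io
import zipfile


def build_sample_package() -> bytes:
    """Build a small ZIP archive of XML parts in memory.

    Assumption: the part names and contents are invented stand-ins,
    not the actual layout of an Office 12 file.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        body = "<document>" + "<p>Hello, Office 12</p>" * 200 + "</document>"
        zf.writestr("document.xml", body)
        zf.writestr("comments.xml", "<comments><comment>Check this</comment></comments>")
    return buf.getvalue()


def summarize_package(data: bytes) -> dict:
    """Map each part name to (uncompressed size, compressed size) in bytes."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return {info.filename: (info.file_size, info.compress_size) for info in zf.infolist()}


# Print the size of each part before and after compression.
for name, (raw, packed) in summarize_package(build_sample_package()).items():
    print(f"{name}: {raw} bytes uncompressed, {packed} bytes compressed")
```

How much space is actually saved depends on the content, so the numbers from this toy package say nothing about the 75 percent figure Microsoft quotes.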

In the new format, each type of data within a file is segmented and stored separately. So, when one file component is corrupt, the remainder of the file will still open within the application. For example, if a chart were to become damaged, people could still open the rest of the document, just without that chart. This is different from the binary formats used in older applications, where corruption of a particular piece of data would prevent the entire file from loading properly.

Also, for those parts that do become corrupt or damaged, Office applications can detect these defects and attempt to "fix" a document when it is opened by restoring the proper data structure to the content. Missing or improperly written XML data can be rewritten to ensure that the files comply with the file format specification, and to improve the chances of opening the files correctly. And because the content is plain XML text that has simply been compressed, it is easier for any tool or person to recover information from it.
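The idea of opening whatever is still readable can be sketched in a few lines of Python. This is only an illustration of the principle, not how Office does it: the part names are hypothetical, "damaged" here simply means "not well-formed XML", and no repair is attempted.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def make_package(parts: dict) -> bytes:
    """Zip a {part name: XML text} mapping into an in-memory archive (illustrative helper)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, text in parts.items():
            zf.writestr(name, text)
    return buf.getvalue()


def load_readable_parts(data: bytes):
    """Parse every part on its own; return (parsed parts, names of damaged parts)."""
    good, damaged = {}, []
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for name in zf.namelist():
            try:
                good[name] = ET.fromstring(zf.read(name))
            except ET.ParseError:
                # One broken part does not stop the others from loading.
                damaged.append(name)
    return good, damaged


# Hypothetical package: the chart part is truncated, the text part is intact.
package = make_package({
    "document.xml": "<document><p>Annual report</p></document>",
    "chart.xml": "<chart><series>",
})
good, damaged = load_readable_parts(package)
print("readable:", sorted(good))
print("damaged:", damaged)
```

Because each part is parsed separately, the truncated chart ends up in the damaged list while the document text is still available.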

Microsoft also addresses the need to identify and protect sensitive information that can be stored within documents. Comments, tracked changes and document metadata are the types of information businesses don't want leaking outside their firewalls. The Office XML Open Format stores each type of data as a separate tag within the file, making it easy to detect and remove specific types of content. For example, the comments that are stored inside a document as part of a review can be detected and removed before the document travels outside the company. In fact, a developer could write a solution to ensure that Web pages that are about to be published do not contain documents with embedded comments.
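A developer's "solution" of that kind could look roughly like the Python sketch below. The tag name comment and the sample markup are assumptions for illustration only; the real tag names used by the Office 12 formats are not given here. The function removes every element with the given tag and keeps the text that follows it.

```python
import xml.etree.ElementTree as ET


def strip_elements(xml_text: str, tag: str):
    """Remove every element named `tag`; return (cleaned XML text, number removed)."""
    root = ET.fromstring(xml_text)
    removed = 0
    for parent in list(root.iter()):
        previous = None
        for child in list(parent):
            if child.tag == tag:
                # Keep the text that follows the removed element.
                if child.tail:
                    if previous is None:
                        parent.text = (parent.text or "") + child.tail
                    else:
                        previous.tail = (previous.tail or "") + child.tail
                parent.remove(child)
                removed += 1
            else:
                previous = child
    return ET.tostring(root, encoding="unicode"), removed


# Hypothetical review document with two embedded comments.
sample = (
    "<document><p>Results<comment>verify numbers</comment> are final.</p>"
    "<comment>draft only</comment></document>"
)
cleaned, count = strip_elements(sample, "comment")
print(count, "comments removed")
print(cleaned)
```

A publishing check could run the same function and refuse to publish whenever the count is not zero.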

The Office XML Open Format also helps to improve security against documents with embedded code or macros. By default, the new Word, Excel and PowerPoint file formats will not execute embedded code. So, if a person receives an e-mail message with a Word document attached, he or she could open that attachment knowing that the document would not execute harmful code. The Microsoft Office XML Open Format will include a special-purpose format with a separate file extension for files with embedded code, enabling IT staff to quickly identify files that contain such code.
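For IT staff, a separate extension means a simple file name check is enough to flag files that may carry code. The Python sketch below shows the idea. Note that the announcement does not name the extension; the values in CODE_ENABLED_EXTENSIONS are placeholders chosen for illustration and should be replaced with whatever Microsoft ends up using.

```python
from pathlib import PurePath

# Assumption: placeholder extensions, not taken from Microsoft's announcement.
CODE_ENABLED_EXTENSIONS = {".docm", ".xlsm", ".pptm"}


def may_contain_code(filename: str, code_extensions=CODE_ENABLED_EXTENSIONS) -> bool:
    """Return True if the file name carries one of the code-enabled extensions."""
    return PurePath(filename).suffix.lower() in code_extensions


# Flag attachments that would need a closer look.
for attachment in ["report.docx", "Budget.XLSM", "slides.pptx"]:
    print(attachment, "-> may contain code" if may_contain_code(attachment) else "-> no embedded code expected")
```

The check only looks at the name, so it relies on the applications refusing to run code in files saved under the default extensions, as described above.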

Users of previous Office versions will get an update to make their software compatible with the new Office 12 formats. Microsoft will also unveil a tool to transform large batches of documents into the new format.

While Office has been able to create XML documents for years, the new Office 12 Open XML Formats will become the default choice when saving a document, and they will also take up less space on your hard drive because they use ZIP compression.

About the Author

Thomas De Maesschalck

Thomas has been messing with computers since early childhood and firmly believes the Internet is the best thing since sliced bread. He enjoys playing with new tech, is fascinated by science, and is passionate about financial markets. When not behind a computer, he can be found with running shoes on or lifting heavy weights in the weight room.
