XML for the uninitiated
You may have heard of Extensible Markup Language (XML), and you may have heard many reasons why your organization should use it. But what is XML, exactly? This article explains the basics of XML — what it is and how it works.
A brief look at mark up, markup, and tags
To understand XML, it helps to understand the idea of marking up data. People have created documents for centuries, and for just as long they have marked up those documents. For example, school teachers mark up student papers all the time. They tell students to move paragraphs, clarify sentences, correct misspellings, and so on. Marking up a document is how we define the structure, meaning, and visual appearance of the information in the document. If you have ever used the Track Changes feature in Microsoft Office Word, you have used a computerized form of mark up.
In computing, "mark up" has also evolved into "markup." Markup is the process of using codes called tags (or sometimes tokens) to define the structure, the visual appearance, and — in the case of XML — the meaning of any data.
The HTML code for this article is a good example of computer markup at work. If you browse through it (in Microsoft Internet Explorer, right-click the page, and then click View Source), you will see a mix of readable text and Hypertext Markup Language (HTML) tags, such as <p> and <h2>. Tags in HTML and XML documents are easy to recognize because they are surrounded by angle brackets. In the source code for this article, the HTML tags do a variety of jobs, such as defining the beginning and end of each paragraph (<p> ... </p>) and marking the location of each image.
So what makes it XML?
HTML and XML documents both contain data that is surrounded with tags, but that is where the similarities between the two languages end. In HTML, the tags define the look and feel of your data: the headlines go here, the paragraph starts there, and so on. In XML, the tags define the structure and meaning of your data, that is, what the data is.
When you describe the structure and meaning of your data, you make it possible to reuse that data in any number of ways. For example, if you have a block of sales data and each item in the block is clearly identified, you can load just the items that you need into a sales report and load other items into an accounting database. Put another way, you can use one system to generate your data and mark it up with XML tags, and then process that data in any number of other systems, regardless of the hardware platform or operating system. That portability is why XML has become one of the most popular technologies for exchanging data.
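As a rough illustration of that kind of reuse, the Python sketch below reads one block of sales data and hands different items to two different consumers. Everything about the data is hypothetical: the tag names (SALES, SALE, REGION, PRODUCT, AMOUNT, INVOICE) and the values are invented for this example, and a real system would define its own.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical sales data: every tag name and value here is invented.
SALES_XML = """\
<SALES>
  <SALE>
    <REGION>West</REGION><PRODUCT>Widget</PRODUCT>
    <AMOUNT>120.50</AMOUNT><INVOICE>A-1001</INVOICE>
  </SALE>
  <SALE>
    <REGION>East</REGION><PRODUCT>Gadget</PRODUCT>
    <AMOUNT>80.00</AMOUNT><INVOICE>A-1002</INVOICE>
  </SALE>
</SALES>
"""

sales = ET.fromstring(SALES_XML).findall("SALE")

# The sales report needs only the region and the product.
report_rows = [(s.findtext("REGION"), s.findtext("PRODUCT")) for s in sales]

# The accounting system needs only the invoice number and the amount.
ledger_rows = [(s.findtext("INVOICE"), Decimal(s.findtext("AMOUNT"))) for s in sales]

print(report_rows)
print(ledger_rows)
```

Because each item is tagged by name, neither consumer has to know about the items it does not use.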
Remember these facts as you proceed:
You cannot use HTML in place of XML. You can, however, wrap your XML data in HTML tags and display it in a Web page.
HTML is limited to a predefined set of tags that all users share.
XML allows you to create any tag that you need to describe your data and the structure of that data. For instance, say that you need to store and share information about pets. You can create the following XML code:
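The code sample that belongs here is missing from this copy of the article. The sketch below reconstructs the pet record from the tag names used in the schema later in the article and from the values quoted in the browser output, so its exact layout is an assumption. The record is wrapped in a few lines of Python, using the standard library's xml.etree.ElementTree module, only to show that a program can pick out each item by name.

```python
import xml.etree.ElementTree as ET

# Reconstructed pet record. Tag names come from the article's schema and
# values from the browser output it quotes; the layout is assumed.
CAT_XML = """\
<CAT>
  <NAME>Izzy</NAME>
  <BREED>Siamese</BREED>
  <AGE>6</AGE>
  <ALTERED>yes</ALTERED>
  <DECLAWED>no</DECLAWED>
  <LICENSE>Izz138bod</LICENSE>
  <OWNER>Colin Wilcox</OWNER>
</CAT>
"""

cat = ET.fromstring(CAT_XML)

# Each tag names the data it holds, so items can be looked up directly.
name = cat.findtext("NAME")
age = int(cat.findtext("AGE"))
print(f"{name} is a {age}-year-old {cat.findtext('BREED')}")
```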
You can see that XML tags make it possible to know exactly what kind of data you are looking at. For example, you know this is data about a cat, and you can easily find the cat's name, age, and so on. The ability to create tags that define almost any data structure is what makes XML "extensible."
But don't confuse the tags in that code sample with tags in an HTML file. For instance, if you paste that XML structure into an HTML file and view the file in your browser, the results will look something like this:
Izzy Siamese 6 yes no Izz138bod Colin Wilcox
The browser ignores your XML tags and displays just the data.
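A rough way to see what the browser does is to throw away the tags and keep only the text between them. The Python sketch below does that with the standard library. The CAT record is reconstructed from the values shown above, so its exact layout is an assumption, and a real browser's handling of unknown tags is more involved than this.

```python
import xml.etree.ElementTree as ET

# Pet record reconstructed from the article's tag names and values.
CAT_XML = (
    "<CAT><NAME>Izzy</NAME><BREED>Siamese</BREED><AGE>6</AGE>"
    "<ALTERED>yes</ALTERED><DECLAWED>no</DECLAWED>"
    "<LICENSE>Izz138bod</LICENSE><OWNER>Colin Wilcox</OWNER></CAT>"
)

root = ET.fromstring(CAT_XML)

# itertext() yields only the character data, in document order.
# The tag names themselves are dropped.
visible_text = " ".join(text.strip() for text in root.itertext() if text.strip())
print(visible_text)  # Izzy Siamese 6 yes no Izz138bod Colin Wilcox
```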
A word about well-formed data
You may hear someone from your IT department mention "well-formed" XML. A well-formed XML file conforms to a set of very strict rules that govern XML. If a file doesn't conform to those rules, programs that read XML will reject it. For example, in the previous code sample, every opening tag has a closing tag, so the sample adheres to one of the rules for being well-formed. If you remove one of those tags and try to open the file in one of the Office programs, you will see an error message, and the program will stop you from using the file.
You don't necessarily need to know the rules for creating well-formed XML (though they are easy to understand), but you do need to remember that you can share XML data among programs and systems only if that data is well-formed. If you can't open an XML file, chances are that file isn't well-formed.
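If you are curious, you can see the well-formed rule in action with a few lines of Python. This is a minimal sketch that uses the standard library parser. The helper name is_well_formed is made up for this example, and the two short records are trimmed-down versions of the cat data.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts the text, False if it rejects it."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

good = "<CAT><NAME>Izzy</NAME><AGE>6</AGE></CAT>"
bad = "<CAT><NAME>Izzy<AGE>6</AGE></CAT>"  # closing </NAME> tag removed

print(is_well_formed(good))  # True
print(is_well_formed(bad))   # False
```

The parser does not try to guess what the broken record meant. It rejects the whole file, which is the same behavior you see when an Office program refuses to open a file that is not well-formed.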
XML is also platform-independent, meaning that any program built to use XML can read and process your XML data, regardless of the hardware or operating system. For example, with the right XML tags, you can use a desktop program to open and work with data from a mainframe computer. And, regardless of who creates a body of XML data, you can work with the same data in several of the Microsoft Office 2003 and Microsoft Office Professional 2007 programs, including Microsoft Office Access 2007, Microsoft Office Word 2007, Microsoft Office InfoPath 2007, and Microsoft Office Excel 2007. Because it is so portable, XML has become one of the most popular technologies for exchanging data between databases and user desktops.
In addition to tagged, well-formed data, XML systems typically use two additional components: schemas and transforms. The following sections explain how these additional components work.
A quick look at schemas
Don't let the term "schema" intimidate you. A schema is just an XML file that contains the rules for what can and cannot reside in an XML data file. Schema files typically use the .xsd file name extension, while XML data files use the .xml extension.
Schemas allow programs to validate data. They provide the framework for structuring data and ensuring that it makes sense to the creator and any other users. For example, if a user enters invalid data, such as text in a date field, the program can prompt the user to enter the correct data. As long as the data in an XML file conforms to the rules in a given schema, any program that supports XML can use that schema to read, interpret, and process the data. For example, as shown in the following illustration, Excel can validate the <CAT> data against the CAT schema.
Schemas can become complex, and teaching you how to create one is beyond the scope of this article. (Besides, you probably have an IT department that knows how.) However, it helps to know what schemas look like. The following schema defines the rules for the <CAT> ... </CAT> tag set.
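<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="CAT">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="NAME" type="xsd:string"/>
        <xsd:element name="BREED" type="xsd:string"/>
        <xsd:element name="AGE" type="xsd:positiveInteger"/>
        <xsd:element name="ALTERED" type="xsd:boolean"/>
        <xsd:element name="DECLAWED" type="xsd:boolean"/>
        <xsd:element name="LICENSE" type="xsd:string"/>
        <xsd:element name="OWNER" type="xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>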
Don't worry about understanding everything in the sample. Just keep these facts in mind:
The line items in the sample schema are called declarations. If you needed additional information about an animal, such as its color or markings, chances are that your IT department would add a declaration to the schema. You can change your XML system as your business needs evolve.
Declarations provide a tremendous amount of control over the data structure. For instance, the <xsd:sequence> declaration means that the tags, such as <NAME> and <BREED>, have to occur in the order that they are listed above. Declarations can also control the types of data that users can enter. For example, the schema above requires a positive whole number for the cat's age, and Boolean values (true or false, written in lowercase, or 1 or 0) for the ALTERED and DECLAWED tags. Under that rule, the yes and no values shown in the earlier browser output would have to be rewritten as true and false before the data could pass validation.
When the data in an XML file conforms to the rules provided by a schema, that data is said to be valid. The process of checking an XML data file against a schema is called (logically enough) validation. The big advantage to using schemas is that they can help prevent corrupted data. They also make it easy to find corrupted data because the validating program stops when it encounters a problem.
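To give a feel for what validation checks, the Python sketch below applies the same rules as the declarations above: the tags must appear in the listed order, AGE must be a positive whole number, and ALTERED and DECLAWED must be Boolean. This is not a real schema processor. It is a hand-written stand-in that uses only the Python standard library (which does not include an XSD validator), and the names EXPECTED, value_ok, validate_cat, and TEMPLATE are invented for this example.

```python
import re
import xml.etree.ElementTree as ET

# Order and types copied from the article's schema declarations.
EXPECTED = [
    ("NAME", "string"), ("BREED", "string"), ("AGE", "positiveInteger"),
    ("ALTERED", "boolean"), ("DECLAWED", "boolean"),
    ("LICENSE", "string"), ("OWNER", "string"),
]

def value_ok(kind: str, text: str) -> bool:
    text = text.strip()
    if kind == "positiveInteger":
        # Digits with an optional leading plus sign, and greater than zero.
        return re.fullmatch(r"\+?[0-9]+", text) is not None and int(text) > 0
    if kind == "boolean":
        # xsd:boolean allows exactly these four lexical forms.
        return text in {"true", "false", "1", "0"}
    return True  # xsd:string accepts any text

def validate_cat(xml_text: str) -> list:
    """Return a list of problems; an empty list means the record passed."""
    cat = ET.fromstring(xml_text)
    children = list(cat)
    if [child.tag for child in children] != [tag for tag, _ in EXPECTED]:
        return ["elements are missing, extra, or out of order"]
    return [
        f"{child.tag}: {child.text!r} is not a valid {kind}"
        for child, (_, kind) in zip(children, EXPECTED)
        if not value_ok(kind, child.text or "")
    ]

TEMPLATE = (
    "<CAT><NAME>Izzy</NAME><BREED>Siamese</BREED><AGE>{age}</AGE>"
    "<ALTERED>true</ALTERED><DECLAWED>false</DECLAWED>"
    "<LICENSE>Izz138bod</LICENSE><OWNER>Colin Wilcox</OWNER></CAT>"
)

print(validate_cat(TEMPLATE.format(age="6")))    # []
print(validate_cat(TEMPLATE.format(age="six")))  # ["AGE: 'six' is not a valid positiveInteger"]
```

In practice, your IT department would hand the schema file to a program that supports XML Schema rather than writing checks like these by hand. The point is only that every rule in the schema becomes a test that the data either passes or fails.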
A quick look at transforms
As we mentioned earlier, XML also provides powerful ways to use or reuse data. The mechanism for reusing data is called an Extensible Stylesheet Language Transformation (XSLT), or simply, a transform.
You (okay, your IT department) can also use transforms to exchange data between back-end systems, such as databases. For instance, say that Database A stores the sales data in a table structure that works well for the sales department. Database B stores the revenue and expense data in a table structure that is tailored for the accounting department. Database B can use a transform to accept data from A and write that data to the correct tables.
The combination of data file, schema, and transform constitutes a basic XML system. The following illustration shows how such systems typically work. The data file is validated against the schema and then rendered in any number of usable ways by a transform. In this case, the transform deploys the data to a table in a Web page.
The following code sample shows one way to write a transform. It loads the <CAT> data into a table on a Web page. Again, the point of the sample isn't to show you how to write a transform, but to show you one form that a transform can take.
<TR ALIGN="LEFT" VALIGN="TOP">
This sample shows how one type of transform might look when it is coded, but remember that you can just describe what you need from the data in plain English. For example, you can go to your IT department and say that you need to print the sales data for particular regions for the past two years, "and I need it to look this way." Your IT department can then write (or change) a transform to do that job.
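If it helps to see the idea in runnable form, the sketch below does the same kind of job in plain Python instead of XSLT: it reads the CAT data and writes it out as a table for a Web page. It is a stand-in for a transform, not a transform itself. The function name cat_to_html_table, the column headings, and the layout of the CAT record are assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from html import escape

# Pet record reconstructed from the article's tag names and values.
CAT_XML = (
    "<CAT><NAME>Izzy</NAME><BREED>Siamese</BREED><AGE>6</AGE>"
    "<ALTERED>yes</ALTERED><DECLAWED>no</DECLAWED>"
    "<LICENSE>Izz138bod</LICENSE><OWNER>Colin Wilcox</OWNER></CAT>"
)

FIELDS = ["NAME", "BREED", "AGE", "ALTERED", "DECLAWED", "LICENSE", "OWNER"]

def cat_to_html_table(xml_text: str) -> str:
    """Render one CAT record as an HTML table with a header row."""
    cat = ET.fromstring(xml_text)
    header = "".join(f"<TH>{field.title()}</TH>" for field in FIELDS)
    # escape() keeps characters such as < and & in the data from
    # being read as HTML markup.
    cells = "".join(
        f"<TD>{escape(cat.findtext(field, default=''))}</TD>" for field in FIELDS
    )
    return (
        "<TABLE>\n"
        f"  <TR>{header}</TR>\n"
        f'  <TR ALIGN="LEFT" VALIGN="TOP">{cells}</TR>\n'
        "</TABLE>"
    )

table_html = cat_to_html_table(CAT_XML)
print(table_html)
```

The data file itself never changes. To present the same data another way, you change only the code that does the rendering, which is the main attraction of keeping data and presentation separate.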
What makes all of this even more convenient is that Microsoft and a growing number of other vendors are creating transforms for jobs of all sorts. In the future, chances are that you will be able to download a transform that either meets your needs or that you can adjust to suit your purpose. That means XML will cost less to use over time.
A peek at XML in the Microsoft Office System
The professional editions of Microsoft Office 2003 and the 2007 Office release provide extensive XML support.
Office Excel 2007, Office Word 2007, and Office PowerPoint 2007 use XML as their default file formats, a change that has several advantages:
Smaller file sizes. The new format uses ZIP and other compression technologies to reduce file size by as much as 75 percent compared to the binary formats that are used in earlier versions of Office.
Easier information recovery and greater security. XML is human readable, so if a file becomes damaged, you can open the file in Microsoft Notepad or another text reader and recover at least some of your information. Also, the new files are more secure because they cannot contain Visual Basic for Applications (VBA) code. If you use the new format to create templates, any ActiveX controls and VBA macros reside in a separate, more secure section of the file. In addition, you can use tools, such as Document Inspector, to remove any personal data. For more information about using Document Inspector, see the article Remove hidden data and personal information from Office documents.
Greater portability and flexibility. Because XML stores data in a text format instead of a proprietary binary format, your customers can define their own schemas and use your data in more ways, all without having to pay royalties. For more information about the new formats, see Introduction to Open XML File Formats.
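The combination of ZIP compression and readable XML is easy to try for yourself. The Python sketch below builds a small ZIP package in memory, stores an XML part in it, and then reads the part back as ordinary text. It does not create a real Office file: the part name data/cats.xml and the repeated CAT records are invented for this example, and the amount of compression you see depends entirely on the data.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Repetitive markup, standing in for a large document part.
xml_part = "<CATS>" + "<CAT><NAME>Izzy</NAME><AGE>6</AGE></CAT>" * 500 + "</CATS>"

# Write the XML into an in-memory ZIP package with DEFLATE compression.
package = io.BytesIO()
with zipfile.ZipFile(package, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("data/cats.xml", xml_part)  # hypothetical part name

# Any ZIP tool can open the package and get the XML text back.
with zipfile.ZipFile(package) as archive:
    recovered = archive.read("data/cats.xml").decode("utf-8")

print(len(xml_part), "characters of XML")
print(len(package.getvalue()), "bytes in the ZIP package")
print(len(ET.fromstring(recovered)), "records recovered")
```

Markup is repetitive, so it tends to compress well, and because the contents are plain text you can inspect or recover them with general-purpose tools.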
Some of the Office programs use XML in the background, and some, such as Microsoft Office OneNote™, don't support it at all. The best way to learn how an Office program supports XML is to start the online Help for that program and search on XML.
So far so good, but what if you have XML data with no schema? The Office programs that support XML have their own approaches to helping you work with the data. For instance, Excel infers a schema if you open an XML file that doesn't already have one. Excel then gives you the option of loading this data into a read-only file or of mapping the data into either an XML list (in Microsoft Office Excel 2003) or an XML table (in Office Excel 2007). You can use the XML lists and tables to sort, filter, or add calculations to the data.
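To get a feel for what inferring a schema involves, the toy Python sketch below looks at an XML record that has no schema and guesses a data type for each tag from the value it finds. This is only an illustration of the general idea, not a description of how Excel does it. The function names guess_type and infer_fields are invented here, and the CAT record is reconstructed from the article's tag names and values.

```python
import xml.etree.ElementTree as ET

# Pet record reconstructed from the article's tag names and values.
CAT_XML = (
    "<CAT><NAME>Izzy</NAME><BREED>Siamese</BREED><AGE>6</AGE>"
    "<ALTERED>yes</ALTERED><DECLAWED>no</DECLAWED>"
    "<LICENSE>Izz138bod</LICENSE><OWNER>Colin Wilcox</OWNER></CAT>"
)

def guess_type(text: str) -> str:
    """Guess a simple data type from one sample value."""
    text = text.strip()
    if text.lower() in {"true", "false"}:
        return "boolean"
    try:
        int(text)
        return "integer"
    except ValueError:
        return "string"

def infer_fields(xml_text: str) -> dict:
    """Map each child tag of the root element to a guessed data type."""
    root = ET.fromstring(xml_text)
    return {child.tag: guess_type(child.text or "") for child in root}

inferred = infer_fields(CAT_XML)
for tag, kind in inferred.items():
    print(tag, kind)
```

A guess based on one record is only a starting point. Here, for example, the yes and no values are treated as plain text, which is one reason an explicit schema is more dependable than an inferred one.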
Office Professional 2007 and Microsoft Office 2003 provide the same sets of XML tools. In Office Professional 2007, you must first enable XML support, and then you start the tools from different locations. However, after you start the tools, they work the same in Microsoft Office 2003 and Office Professional 2007.
Enable the XML tools in Office Excel 2007
In Excel, click the Microsoft Office Button, and then click Excel Options.
Under Top options for working with Excel, select Show Developer tab in the Ribbon, and then click OK.
Note: The Ribbon is part of the Microsoft Office Fluent user interface.
Start the XML tools in Office Excel 2007
On the Developer tab, click any available command in the XML group.
Start the XML tools in Office Access 2007
Click the External Data tab.
Do one of the following:
In the Import group, click XML File.
In the Export group, click More, and then click XML File.