XML stands for Extensible Markup Language. It is a text-based language that uses tags to describe structured data, including other markup languages. This means that an XML document is only as complex as the data (or language) you are describing. HTML is also a text-based language that uses tags, but HTML is not XML, because HTML is designed to define how data looks rather than how it is structured.
A good working knowledge of XML is an important skill for an application developer because it is a standard way of transferring information between applications, whether within a company or between companies.
Because it is text based, applications developed in any language can read it, and it can be passed over almost any network, including the internet.
It is also used in configuration files. Its widespread use means that there are a lot of XML resources on the internet and it has a lot of associated technologies including XPath, XSLT, Web Services and SOAP.
Objectives
The objectives of this module are to learn enough about XML to use it effectively, to become confident with Document Type Definitions and XSLT, and to get an introduction to Schemas and XPath.
What is XML?
We briefly explained what XML is in the introduction but, although correct, that was a fairly high-level definition. In practice, a discrete block of XML is called a document.
Documents can be stored more or less anywhere but are often stored in files. A file can only contain a single XML document.
Uses of XML
XML has many uses. Because it is text based and describes data, XML documents are commonly used to transfer data between different systems or between different tiers of a single system, either within one organisation or between organisations.
XML suits this role because it doesn't matter what languages the two systems are written in: any language can read and write text.
Other common uses of XML are:
1. Application Log Files.
2. Configuration Files.
3. Storing documents that need to be presented in different formats. For example, an XML document could have a style sheet applied that transforms it into a web page, an MS Word document or a PDF document.
XML syntax
Every XML document has a single root element that acts as the parent of all other elements. The general shape is shown below:
<root>
  <child>
    <subchild>.....</subchild>
  </child>
</root>
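Now that we know what XML is and what it is used for, let's get on with learning the syntax.
Because XML is so flexible and you define the structure yourself, it is important to be able to check that your document is well formed and valid. A well-formed document follows the basic syntax rules of XML; a valid document is well formed and also conforms to the rules in a DTD or schema.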
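As a rough illustration of well-formedness, the sketch below is a small document annotated with the syntax rules it satisfies. The element and attribute names (library, book, isbn, title, author, name, outofprint) and all values are made up for this example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Rule 1: exactly one root element (library) contains everything else. -->
<library>
  <!-- Rule 2: attribute values are always quoted. The isbn is a placeholder. -->
  <book isbn="0000000000">
    <!-- Rule 3: every opening tag has a matching closing tag, and names are
         case sensitive, so <title> must be closed by </title>, not </Title>. -->
    <title>An Example Title</title>
    <!-- Rule 4: elements nest properly; name closes before its parent author. -->
    <author><name>A. Writer</name></author>
    <!-- An element with no content can use the self-closing form. -->
    <outofprint/>
  </book>
</library>
```

A document like this can be well formed without being valid. Validity can only be checked once a DTD or schema states which elements and attributes are allowed.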
Another important part of XML syntax is namespaces. It is often useful to merge two or more XML documents together, and namespaces provide a means to identify which document each element came from and to prevent name conflicts.
A complete XML document, including the XML declaration on its first line, looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<note>
  <to>May</to>
  <from>Jackson</from>
  <heading>Greetings</heading>
  <body>How are you? Please learn about XML</body>
</note>
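The note document declares no namespaces. As a minimal sketch of how namespaces prevent name conflicts, the example below merges the note with a second, invented vocabulary that also has an element called heading. The report root element, the prefixes n and nav, and both namespace URIs are assumptions made for illustration only.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Each xmlns:prefix attribute binds a prefix to a namespace URI. -->
<report xmlns:n="http://example.com/ns/note"
        xmlns:nav="http://example.com/ns/navigation">
  <!-- Elements that came from the note document carry the n prefix. -->
  <n:note>
    <n:to>May</n:to>
    <n:from>Jackson</n:from>
    <n:heading>Greetings</n:heading>
    <n:body>How are you? Please learn about XML</n:body>
  </n:note>
  <!-- This heading is a compass bearing, not a title. The nav prefix
       keeps it distinct from n:heading. -->
  <nav:position>
    <nav:heading>270</nav:heading>
  </nav:position>
</report>
```

The URIs are only used as unique names; a parser does not need to fetch anything from them.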
DTD (Document Type Definition)
This section explains how to ensure that XML documents are valid.
XML describes the structure of data; DTDs provide a means to ensure that an XML document is valid, that is, that it contains exactly the data it should. XML documents contain elements, attributes and entities, and a DTD can declare each of these. The DTD below declares the elements allowed in a note document:
<!DOCTYPE note
[
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>
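The DTD above only declares elements. As a sketch of how attributes and entities could be declared too, the document below embeds an extended DTD in its internal subset. The priority attribute and the topic entity are hypothetical additions and are not part of the original note example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE note [
  <!ELEMENT note (to,from,heading,body)>
  <!-- Hypothetical attribute: one of three allowed values, defaulting to normal. -->
  <!ATTLIST note priority (low|normal|high) "normal">
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT heading (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
  <!-- Hypothetical entity: a named piece of replacement text. -->
  <!ENTITY topic "XML">
]>
<note priority="high">
  <to>May</to>
  <from>Jackson</from>
  <heading>Greetings</heading>
  <!-- The parser replaces the entity reference with the text XML. -->
  <body>How are you? Please learn about &topic;</body>
</note>
```

A validating parser would reject this document if, for instance, the from element were missing or priority were set to a value outside the declared list.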
Schemas
This section explains schemas. It is envisaged that schemas will replace DTDs. Schemas are themselves XML documents that are used to validate other XML documents, but they are more powerful than DTDs and even allow some validation of the data itself (for example, checking that a value is a date or a number).
However, DTDs are still more widely used than schemas, so for now just remember what schemas are.
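To give a feel for what a schema looks like, here is a minimal sketch of a W3C XML Schema (XSD) describing the same note structure as the DTD. The sent attribute is a hypothetical addition, included only to show a schema checking the data type of a value, which a DTD cannot do.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="note">
    <xs:complexType>
      <!-- The child elements must appear once each, in this order. -->
      <xs:sequence>
        <xs:element name="to" type="xs:string"/>
        <xs:element name="from" type="xs:string"/>
        <xs:element name="heading" type="xs:string"/>
        <xs:element name="body" type="xs:string"/>
      </xs:sequence>
      <!-- Hypothetical optional attribute: its value must be a valid date
           such as 2024-01-31, so the data itself is validated. -->
      <xs:attribute name="sent" type="xs:date"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Note that the schema is itself a well-formed XML document, so it can be read and processed with the same tools as the documents it validates.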
Transforming XML
This section explains XML transformations.
Just like any data, once you have got it you usually need to manipulate it.
The structured nature of XML means that it is possible to manipulate it without loading the data into Java objects. This process is known as transformation and uses a technology called XSL. XSL stands for Extensible Stylesheet Language and is made up of three sub-technologies:
1. XSLT - XSL Transformations are used to transform XML documents.
2. XPath - XPath is a language used to navigate XML documents. We'll cover XPath in the next section.
3. XSL-FO - XSL Formatting Objects are used to apply formatting to XML documents. We will not be covering XSL-FO in this module.
Using these technologies you can create new text-based documents in any format (not just XML) that are made up of some or all of the data in your original XML document. The objective is to transform the data into a format that your application can understand, or one that is ready for presentation to the user of your application.
A common use of XSL is to transform XML documents into HTML documents.
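As a minimal sketch of such a transformation, the stylesheet below turns the note document into a simple HTML page. The choice of HTML tags (h1, p) and the page layout are assumptions made for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Ask the processor to write HTML rather than XML. -->
  <xsl:output method="html" indent="yes"/>

  <!-- Match the root note element and build the HTML page around it. -->
  <xsl:template match="/note">
    <html>
      <head>
        <title><xsl:value-of select="heading"/></title>
      </head>
      <body>
        <!-- Each value-of copies the text of one child element of note. -->
        <h1><xsl:value-of select="heading"/></h1>
        <p>To: <xsl:value-of select="to"/></p>
        <p>From: <xsl:value-of select="from"/></p>
        <p><xsl:value-of select="body"/></p>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```

One way to apply it is to save the stylesheet under a name of your choosing (note.xsl is assumed here) and add the processing instruction <?xml-stylesheet type="text/xsl" href="note.xsl"?> to the note document, just after the XML declaration, so that a browser with XSLT support can render the result.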
XPath
This section explains XPath.
XPath is a fundamental part of XSL and is used to find and navigate data within your XML documents.
XPath is basically a set of expressions and functions (not tags) that you can use within the attributes of your XSL tags.
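The sketch below shows a few XPath expressions and functions in use inside the select attributes of XSL tags. It assumes a hypothetical input document whose root element, notes, contains several note elements shaped like the earlier example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Path expression: the heading of the first note under the root. -->
    <xsl:value-of select="/notes/note[1]/heading"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Function: count how many note elements the document holds. -->
    <xsl:value-of select="count(/notes/note)"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Predicate: the recipient of the first note sent by Jackson. -->
    <xsl:value-of select="/notes/note[from='Jackson']/to"/>
    <xsl:text>&#10;</xsl:text>
    <!-- String function: the length of the first note's body text. -->
    <xsl:value-of select="string-length(/notes/note[1]/body)"/>
  </xsl:template>
</xsl:stylesheet>
```

The paths read much like file system paths: each step separated by a slash moves one level down the element tree, and square brackets filter the elements selected at that step.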
XSLT
XSLT - Extensible Stylesheet Language Transformations
Now that you have an overview of the XSL technologies you need to get down to the nitty-gritty of XSLT.
XSLT is the cornerstone of XSL. It uses elements (tags) to provide common programming constructs like loops and condition tests, but it relies on XPath to provide functions and a way to reference data.
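The following sketch shows those constructs together: a loop, a simple condition test and a multi-branch condition. It assumes a hypothetical input document whose root element, notes, contains several note elements; the 40-character threshold and the output wording are arbitrary choices for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" indent="yes"/>

  <xsl:template match="/notes">
    <ul>
      <!-- Loop: visit every note element, sorted by sender. -->
      <xsl:for-each select="note">
        <xsl:sort select="from"/>
        <li>
          <xsl:value-of select="from"/>
          <xsl:text>: </xsl:text>
          <xsl:value-of select="heading"/>
          <!-- Condition test: flag notes addressed to May. -->
          <xsl:if test="to = 'May'">
            <xsl:text> (for May)</xsl:text>
          </xsl:if>
          <!-- Multi-branch condition, the XSLT counterpart of if/else. -->
          <xsl:choose>
            <xsl:when test="string-length(body) &gt; 40">
              <xsl:text> [long]</xsl:text>
            </xsl:when>
            <xsl:otherwise>
              <xsl:text> [short]</xsl:text>
            </xsl:otherwise>
          </xsl:choose>
        </li>
      </xsl:for-each>
    </ul>
  </xsl:template>
</xsl:stylesheet>
```

The XSLT elements supply the control flow, while every select and test attribute holds an XPath expression that picks out or examines the data.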
Summary
XML is an important skill for any developer to have in today's market, and when you look beneath the surface there is quite a lot to it. Of course there is more you could learn about XML and its associated technologies but, although this isn't the biggest module, we have covered everything you need to know. In future Java modules we will cover how to integrate XML into your applications.