The purpose of this article is to introduce some technical XML principles.
Learning goals
The two most fundamental ideas you should start believing are the following:
Below, we shall introduce the technical barebones of XML.
An XML document usually includes:
<my_tag>contents</my_tag>
or tags without contents like <self_closing_tag/>
<my_tag style="green">....</my_tag>
Here, style would be the attribute and "green" the attribute value.
XML documents are trees
For a computer person, an XML document has a so-called tree structure. We also call it “boxes within boxes”. Inside a browser or most other clients, the document is represented as a tree-based data structure, the so-called Document Object Model (DOM).
Below is an XML fragment for a CALS (Docbook) table example:
<?xml version="1.0"?>
<TABLE>
<TBODY>
<TR>
<TD>Pierre Muller</TD>
<TD>http://pm.com/</TD>
</TR>
<TR> <TD>Elisabeth Dupont</TD> <TD></TD> </TR>
</TBODY>
</TABLE>
A graphical representation of this tree looks like this:
If we look at this as "boxes within boxes", there is a TABLE box that includes a TBODY box. The TBODY box includes two TR boxes. Each of the latter includes two TD boxes. In the XML markup language, each tree element or box starts with <TAG> (e.g. <TR>), and ends with a </TAG> (e.g. </TR>).
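The "boxes within boxes" view can be made visible with a few lines of code. The following is a minimal sketch, assuming Python and its standard-library xml.etree.ElementTree parser (neither is prescribed by this article); the helper name outline is made up for illustration. It prints one indented line per element, so that the indentation depth mirrors the nesting of the boxes.

```python
import xml.etree.ElementTree as ET

# The table example from above, without the XML declaration.
TABLE_XML = """<TABLE>
  <TBODY>
    <TR><TD>Pierre Muller</TD><TD>http://pm.com/</TD></TR>
    <TR><TD>Elisabeth Dupont</TD><TD></TD></TR>
  </TBODY>
</TABLE>"""


def outline(element, depth=0):
    """Return one indented line per element (hypothetical helper)."""
    lines = ["  " * depth + element.tag]
    for child in element:  # iterating an element yields its direct children
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(TABLE_XML)
tree_lines = outline(root)
print("\n".join(tree_lines))
```

Each TR line appears one level below TBODY, and each TD line one level below its TR, which is the tree described in the text.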
The totally different example below (inspired by Wikipedia's RDF entry) shows an RDF fragment. RDF is a language for defining relationships between concepts.
<?xml version="1.0"?>
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:dc="http://purl.org/dc/elements/1.1/">
<rdf:Description rdf:about="http://edutechwiki.unige.ch/en/XML_principles">
<dc:title>DKS</dc:title>
<dc:publisher>EduTechWiki</dc:publisher>
</rdf:Description>
</rdf:RDF>
The rdf:RDF "box" includes an rdf:Description box, which in turn includes a dc:title and a dc:publisher box.
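The same containment can be checked programmatically. This is a sketch only, again assuming Python's standard-library xml.etree.ElementTree; the NS dictionary is a lookup table local to this script, not part of the document.

```python
import xml.etree.ElementTree as ET

RDF_XML = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://edutechwiki.unige.ch/en/XML_principles">
    <dc:title>DKS</dc:title>
    <dc:publisher>EduTechWiki</dc:publisher>
  </rdf:Description>
</rdf:RDF>"""

# Prefix-to-URI map used only for the lookups below (an assumption of this sketch).
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}

root = ET.fromstring(RDF_XML)
description = root.find("rdf:Description", NS)    # the box inside rdf:RDF
title = description.find("dc:title", NS).text       # first box inside it
publisher = description.find("dc:publisher", NS).text
print(title, publisher)
```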
All XML documents must be well-formed. XML documents can be valid with respect to a grammar (also called schema, document type, language, etc.). See below for details.
Any XML document must be at least well-formed. Well-formed XML documents obey the following rules:
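(1) A document should start with an XML declaration, including the XML version number. (XML 1.0 makes the declaration optional, but if it is present it must be the very first thing in the file.)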
<?xml version="1.0"?>
You may specify an encoding scheme (the default is UTF-8). Of course, this means that you'll have to stick to this encoding! Make sure to check your editor's settings.
<?xml version="1.0" encoding="ISO-8859-1"?>
We suggest not using any encoding other than UTF-8. However, you may have to deal with legacy XML documents that do use a restricted encoding scheme like ISO-8859-1.
(2) The XML structure must be hierarchical, i.e. elements must be properly nested. Overlapping tags like the following are not allowed:
<i>...<b>...</i> .... </b>
(3) Attributes must have values and values must be quoted, e.g. <my_tag style="green"> rather than <my_tag style=green>.
(4) A single root element per document
(5) Special characters: <, &, >, ", and '. Use one of the five predefined entities:
&lt; &amp; &gt; &quot; &apos;
instead of
<, &, >, ", '
This principle also applies to URLs: an & inside a URL must be written as &amp;.
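A parser will refuse any document that breaks rules (2) to (5). The sketch below, assuming Python's standard-library xml.etree.ElementTree, feeds a few deliberately broken snippets to the parser and records whether each one is accepted. The function name is_well_formed, the sample snippets and the example.org URL are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text, False on a syntax error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical snippets, one per rule discussed above.
samples = {
    "overlapping tags": "<p><i>...<b>...</i>...</b></p>",
    "unquoted attribute": "<page updated=2007></page>",
    "two root elements": "<page></page><page></page>",
    "raw ampersand in URL": "<url>http://example.org/?a=1&b=2</url>",
    "escaped ampersand in URL": "<url>http://example.org/?a=1&amp;b=2</url>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(label, "->", "well-formed" if ok else "NOT well-formed")
```

Only the last snippet is accepted. Note that this only tests well-formedness, not validity with respect to a grammar.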
Example of a minimal well-formed XML document:
<?xml version="1.0" ?>
<page updated="jan 2007">
<title>Hello friend</title>
<content> Here is some content :) </content>
<comment> Written by DKS/Tecfa </comment>
</page>
This example:
Names used for elements should start with a letter and only use letters, numbers, the underscore, the hyphen and the period (no other punctuation marks)!
When you want to display data that includes "XMLish" things like the < sign that should not be interpreted, you can use so-called CDATA sections:
<?xml version="1.0" ?>
<example>
<![CDATA[
(x < y) is a math expression
]]>
</example>
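To see what a CDATA section does, one can compare it with the entity-based spelling of the same text. A minimal sketch, assuming Python's standard-library xml.etree.ElementTree; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Same text, once protected by a CDATA section, once written with an entity.
with_cdata = "<example><![CDATA[(x < y) is a math expression]]></example>"
with_entity = "<example>(x &lt; y) is a math expression</example>"

text_from_cdata = ET.fromstring(with_cdata).text
text_from_entity = ET.fromstring(with_entity).text

print(text_from_cdata)
print(text_from_cdata == text_from_entity)
```

Both spellings give the application the same character data; CDATA is simply more convenient when a passage contains many special characters.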
A valid document must be:
Kinds of XML grammars
There exist several types of XML grammars.
It is possible to use several vocabularies within a well-formed document. If the markup language formally includes compound languages, such documents can also be validated.
Now, imagine that you could just mix tags from different languages together. The problem would be that the client application could not know which tags belong to which XML language. Also, there could be so-called naming conflicts (e.g. "title" does not mean the same thing in XHTML and SVG). To address these problems, so-called namespaces have been invented: one can prefix element and attribute names with a label that represents a namespace.
Declaring name spaces for additional vocabularies
The "xmlns:name_space" attribute introduces a new vocabulary. It declares that all elements or attributes prefixed by "name_space" belong to a different vocabulary.
Syntax:
xmlns:name_space="URL_name_of_name_space"
SVG within (true) XHTML example
<?xml version="1.0" ?>
<html xmlns:svg="http://www.w3.org/2000/svg">
<svg:rect x="50" y="50" rx="5" ry="5" width="200" height="100" ....
Note: This example only works if the *.xhtml file is served as XML from the server. On your local PC, you can try renaming the file to *.xml.
XLink example:
XLink is a language to define links (only works with Firefox-based browsers)
<?xml version="1.0" ?>
<RECIT xmlns:xlink="http://www.w3.org/1999/xlink">
<INFOS>
<Date>30 octobre 2003 - </Date>
<Auteur>DKS - </Auteur>
<A xlink:href="http://jigsaw.w3.org/css-validator/check/referer"
xlink:type="simple">CSS Validator</A>
</INFOS>
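A namespace-aware parser replaces the prefix by the namespace name it stands for. The sketch below assumes Python's standard-library xml.etree.ElementTree, which writes such names as {namespace}local_name. The fragment above is shortened here and the closing </RECIT> tag is added so that it parses on its own.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"

# Shortened version of the fragment above; </RECIT> added for this sketch.
RECIT_XML = """<RECIT xmlns:xlink="http://www.w3.org/1999/xlink">
  <INFOS>
    <A xlink:href="http://jigsaw.w3.org/css-validator/check/referer"
       xlink:type="simple">CSS Validator</A>
  </INFOS>
</RECIT>"""

root = ET.fromstring(RECIT_XML)
link = root.find("INFOS/A")

# The attribute is stored under its namespace name, not under the prefix.
href = link.get(f"{{{XLINK}}}href")
link_type = link.get(f"{{{XLINK}}}type")
by_prefix = link.get("xlink:href")  # None: the prefix itself is not kept

print(href, link_type, by_prefix)
```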
Namespace declaration for the main vocabulary
The main vocabulary can be introduced by an attribute like:
xmlns="URL_name_of_name_space"
Some specifications (e.g. SVG or XHTML) require a namespace declaration in any case (even if you do not use any other vocabulary)!
SVG namespace example
<?xml version="1.0" ?>
<svg xmlns="http://www.w3.org/2000/svg">
<rect x="50" y="50" rx="5" ry="5" width="200" height="100" ....
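With a default namespace, unprefixed elements belong to the declared vocabulary too. The following sketch assumes Python's standard-library xml.etree.ElementTree; the rect element, which is cut short in the example above, is closed here so that the snippet parses.

```python
import xml.etree.ElementTree as ET

SVG = "http://www.w3.org/2000/svg"

# The rect element is closed here by assumption; the original is elided.
SVG_XML = """<svg xmlns="http://www.w3.org/2000/svg">
  <rect x="50" y="50" rx="5" ry="5" width="200" height="100"/>
</svg>"""

root = ET.fromstring(SVG_XML)
rect = root[0]

print(root.tag)           # namespace name plus local name
print(rect.tag)           # the child inherits the default namespace
print(rect.get("width"))  # unprefixed attributes stay outside the namespace
```

A lookup by the bare name, such as root.find("rect"), finds nothing; the namespace has to be part of the name that is searched for.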
What are namespace URLs?
URLs that define namespaces are just names; there doesn’t need to be a real link. E.g. for your own purposes you could very well make up something like:
<?xml version="1.0" ?>
<account xmlns:pin="http://joe.miller.com/pin">
<pin:name>Joe</pin:name>
</account>
... and the URL http://joe.miller.com/pin doesn’t need to exist for real.
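That the URL is only a name can be tried out quickly. In this sketch (assuming Python's standard-library xml.etree.ElementTree) the parser never contacts the URL. Two documents that use different prefixes for the same URL yield the same element name, while the same prefix bound to another URL yields a different one. The second URL and the helper first_child_tag are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Same namespace URL, two different prefixes.
doc_a = '<account xmlns:pin="http://joe.miller.com/pin"><pin:name>Joe</pin:name></account>'
doc_b = '<account xmlns:p="http://joe.miller.com/pin"><p:name>Joe</p:name></account>'
# Same prefix as doc_a, but bound to a different (hypothetical) URL.
doc_c = '<account xmlns:pin="http://example.org/other"><pin:name>Joe</pin:name></account>'


def first_child_tag(text):
    """Return the full name of the first child of the root element."""
    return ET.fromstring(text)[0].tag


tag_a = first_child_tag(doc_a)
tag_b = first_child_tag(doc_b)
tag_c = first_child_tag(doc_c)

print(tag_a)
print(tag_a == tag_b, tag_a == tag_c)
```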
XML per se doesn't say anything about display and style, however:
Remember: XML per se cannot include media (e.g. pictures), doesn't understand links, doesn't have style. XML does not exist. XML languages do....