For some reason (I guess I was way too lazy) I messed up an exam at university. So this post will help me get to know the topics mentioned above.
XML is not just a standard for data exchange. It's more like the root of a whole family of IT standards, e.g.
- Transformation of XML documents (XSLT, STX)
- Addressing of parts of an XML document (XPath)
- Connection of different XML resources (XPointer, XLink, XInclude)
- Selection of data from XML documents (XQuery)
- Data manipulation of XML documents (XQuery Update Facility)
- Creation of XML-based forms (XForms)
- Definition of XML data structures (XML Schema / XSD, DTD)
- Signature and encryption of XML nodes (XML-Signature, XML-Encryption)
- Formatted output of an XML document (XSL-FO)
- User interface markup language (XUL)
- Definition of calls to functions or methods of another system (XML-RPC)
- ...
The questions I want to answer:
- How can I use a DTD to validate an XML document?
- How can I use XSD (= XML Schema) to validate an XML document?
- How can I use XPath to get out the data I'm looking for?
- How can I use XSLT to get nice-looking HTML out of an XML document?
- Why should I know about such things?
Let's start with question no. 1: "How can I use a DTD to validate an XML document?"
Document Type Definition (DTD) is one of several SGML and XML schema languages, and is also the term used to describe a document or portion thereof that is authored in the DTD language. A DTD is primarily used for the expression of a schema via a set of declarations that conform to a particular markup syntax and that describe a class, or type, of document, in terms of constraints on the structure of that document. A DTD may also declare constructs that are not always required to establish document structure, but that may affect the interpretation of some documents. XML documents are described using a subset of DTD which imposes a number of restrictions on the document's structure, as required per the XML standard (XML is in itself an application of SGML optimized for automated parsing).
We all know what a standard XML file looks like:
    <recipe name="bread" prep_time="5 mins" cook_time="3 hours">
      <name>Basic bread</name>
      <ingredient amount="8" unit="dL">Flour</ingredient>
      <ingredient amount="10" unit="grams">Yeast</ingredient>
      <ingredient amount="4" unit="dL" state="warm">Water</ingredient>
      <ingredient amount="1" unit="teaspoon">Salt</ingredient>
      <instructions>
        <step>Mix all ingredients together.</step>
        <step>Knead thoroughly.</step>
        <step>Cover with a cloth, and leave for one hour in warm room.</step>
        <step>Knead again.</step>
        <step>Place in a bread baking tin.</step>
        <step>Cover with a cloth, and leave for one hour in warm room.</step>
        <step>Bake in the oven at 180°C for 30 minutes.</step>
      </instructions>
    </recipe>
Before I go on with the DTD file, we have to link the XML file to the DTD.
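The link is made with a DOCTYPE declaration at the top of the XML file. Here is a minimal sketch, assuming the DTD is saved as recipe.dtd in the same folder as the XML file (that file name is my assumption):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- "recipe" must match the name of the root element; recipe.dtd is an assumed file name -->
<!DOCTYPE recipe SYSTEM "recipe.dtd">
<recipe name="bread" prep_time="5 mins" cook_time="3 hours">
  <!-- ... rest of the recipe as shown above ... -->
</recipe>
```

A validating parser reads the DTD named after SYSTEM and checks the document against it.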

Let's create a simple DTD for that example before we look a little deeper into the rules for validation.
The most important thing to do right now is to define the elements and attributes of the XML document. We have to define all elements (name, ingredient, instructions), because an XML document cannot be valid if it contains elements that are not declared in the DTD.
The element definition for the root element should look like this:
    <!ELEMENT recipe (name, ingredient+, instructions)>
The next step is to define an attribute list (ATTLIST) for this element.
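One possible ATTLIST for recipe, as a sketch: the three attribute names come from the example above, while marking name as required and the two times as optional is my assumption.

```xml
<!-- Attributes of the root element; CDATA means the value is plain text -->
<!ATTLIST recipe
    name      CDATA #REQUIRED
    prep_time CDATA #IMPLIED
    cook_time CDATA #IMPLIED>
```

#REQUIRED means the attribute must be present, #IMPLIED means it may be left out.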

Now we continue with the next element (name). It has no attributes, so we just add an ELEMENT declaration:
    <!ELEMENT name (#PCDATA)>
PCDATA means parsed character data: the parser scans this text for markup and expands entity references such as &amp;. That is the difference from a CDATA section, whose content is passed through literally without being parsed. (The attribute type CDATA used in an ATTLIST simply means that the value is plain text.)
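A small sketch of the difference between parsed text and a CDATA section. The wrapper element examples is made up and only there to keep the snippet well-formed:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- "examples" is a hypothetical wrapper element, not part of the recipe DTD -->
<examples>
  <!-- Parsed text: the parser expands &amp;, so the content is: Salt & pepper -->
  <name>Salt &amp; pepper</name>
  <!-- CDATA section: nothing inside is parsed, so the content is: Salt & <pepper> -->
  <name><![CDATA[Salt & <pepper>]]></name>
</examples>
```

Both forms are allowed inside an element declared as (#PCDATA).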
Let's continue with the next element (ingredient).
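A sketch of what the declarations for ingredient could look like. The attribute names are taken from the example; state appears on only one ingredient there, so it has to be optional, while treating amount and unit as required is my assumption.

```xml
<!-- ingredient contains only text, e.g. "Flour" -->
<!ELEMENT ingredient (#PCDATA)>
<!ATTLIST ingredient
    amount CDATA #REQUIRED
    unit   CDATA #REQUIRED
    state  CDATA #IMPLIED>
```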

And finally we have to define another element (instructions), which has a sub-element (step) but not a single attribute.
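Following the same pattern as before, these two declarations could be sketched like this (the + assumes that a recipe needs at least one step):

```xml
<!-- instructions holds one or more step elements and has no attributes -->
<!ELEMENT instructions (step+)>
<!-- each step contains only text -->
<!ELEMENT step (#PCDATA)>
```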

That's just a basic example. For further instructions on different ways to check your XML, have a look at the DTD pages of the W3C, Wikipedia, or W3Schools.
Question 2: "How can I use XSD (= XML Schema) to validate an XML document?"
As you have probably already realized, a DTD offers only fairly basic validation for XML documents. If you want to check something like the correct date format or put more restrictions on values, you need XSD.
XSD provides a set of 19 primitive data types:
- boolean
- string
- decimal
- double
- float
- anyURI
- QName
- hexBinary
- base64Binary
- duration
- date
- time
- dateTime
- gYear
- gYearMonth
- gMonth
- gMonthDay
- gDay
- and NOTATION
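To give an idea of how these types are used, here is a small sketch. The element names baked_on and oven_temperature are made up for illustration and are not part of the recipe example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical element: its content must be a date such as 2009-03-14 -->
  <xs:element name="baked_on" type="xs:date"/>
  <!-- Hypothetical element: a decimal number restricted to the range 0 to 500 -->
  <xs:element name="oven_temperature">
    <xs:simpleType>
      <xs:restriction base="xs:decimal">
        <xs:minInclusive value="0"/>
        <xs:maxInclusive value="500"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
</xs:schema>
```

Checks like these are not possible with a DTD, where element text is just #PCDATA.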
XML Schema, published as a W3C recommendation in May 2001, is one of several XML schema languages. It was the first separate schema language for XML to achieve Recommendation status by the W3C. Like all XML schema languages, XML Schema can be used to express a schema: a set of rules to which an XML document must conform in order to be considered 'valid' according to that schema. However, unlike most other schema languages, XML Schema was also designed with the intent that determination of a document's validity would produce a collection of information adhering to specific data types. Such a post-validation infoset can be useful in the development of XML document processing software, but the schema language's dependence on specific data types has provoked criticism.
In comparison to the DTD file the XSD file looks quite different, but you'll still be able to find the element and attribute tags in it.
Let's have a look at what a working XSD file could look like.
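Here is a sketch of such a schema, written to match the description below, so the line numbers mentioned in the text refer to the lines of this listing. The attribute types for ingredient and the declarations for instructions and step are my assumptions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"> <!-- sketch; ingredient attribute types and lines 27-34 are assumptions -->
  <xs:element name="recipe">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="name"/>
        <xs:element ref="ingredient" maxOccurs="unbounded"/>
        <xs:element ref="instructions"/>
      </xs:sequence>
      <xs:attribute name="name" type="xs:string"/>
      <xs:attribute name="prep_time" type="xs:string"/>
      <xs:attribute name="cook_time" type="xs:string"/>
    </xs:complexType>
  </xs:element>
  <xs:element name="name" type="xs:string"/>
  <xs:element name="ingredient">
    <xs:complexType>
      <xs:simpleContent>
        <xs:extension base="xs:string">
          <xs:attribute name="amount" type="xs:string"/>
          <xs:attribute name="unit" type="xs:string"/>
          <xs:attribute name="state" type="xs:string"/>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>
  <xs:element name="instructions">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="step" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="step" type="xs:string"/>
</xs:schema>
```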


To link the XML file to the schema, add the following attributes to the root element:
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="file:///E:/Daten/XmlPad%20Projects/XML%20via%20DTD/recipe-schema.xsd"
Another interesting point is the order of the tags. For me it's just important to know that you have to define an element as a complex type with a sequence to add sub-elements.
Let's have a look at lines 6 to 8. There you can find the sub-elements of the root element 'recipe'. Those sub-elements are defined separately further down in the XSD file, so here you only put in a reference (attribute ref="..."). On line 7 we add another attribute, maxOccurs="unbounded", which tells the parser that this element must occur at least once and may be repeated any number of times.
Still inside the complex-type tag, we find the attributes (name, prep_time, cook_time) with their data type (type="xs:string"). That completes the definition of the root element.
Line 15: The definition of the element name (my personal favourite). You just put in the name and the type.
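Lines 16-26: ingredient. Here we do have attributes to define, so we put in the complex-type tag. In the shown example you'll find the tags xs:simpleContent and xs:extension. xs:simpleContent states that the element contains only text and no sub-elements, and xs:extension with base="xs:string" adds the attributes to that text content. For lack of time I just deleted those tags (lines 18, 19, 23, 24), but be aware that without them ingredient is declared with empty content, so a document with text inside ingredient (such as Flour) no longer validates against the schema.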