For some reason (I guess I was way too lazy) I messed up an exam at university, so this post is my way of finally getting to know the topics mentioned above.

XML is not just a standard for data exchange. It's more like the root of a whole family of IT standards, for example:

  • Transformation of XML-Documents (XSLT, STX)
  • Addressing of parts of the XML-Document (XPath)
  • Connection of different XML-Resources (XPointer, XLink, XInclude)
  • Selection of Data from XML-Documents (XQuery)
  • Data manipulation of XML-Documents (Update)
  • Creation of XML based Forms (XForms)
  • Definition of XML datastructure (XML Schema / XSD, DTD)
  • Signature and encryption of XML-Nodes (XML-Signature, XML-Encryption)
  • Formatted Output of an XML-Document (XSL-FO)
  • Markup Language (XUL)
  • Definition to call functions or methods of another system (XML-RPC)
  • ...
For me the most important things are ...
  1. how can I use DTD to validate an XML-Document
  2. how can I use XSD (= XML Schema) to validate an XML-Document
  3. how can I use XPath to get out the data I'm looking for
  4. how can I use XSLT to get nice-looking HTML out of an XML-Document
  5. [viq*]why should I know about such things[/viq]
*) Very Important Question

Let's start with question no. 1: "How can I use DTD to validate an XML-Document?"
Document Type Definition (DTD) is one of several SGML and XML schema languages, and is also the term used to describe a document or portion thereof that is authored in the DTD language. A DTD is primarily used for the expression of a schema via a set of declarations that conform to a particular markup syntax and that describe a class, or type, of document, in terms of constraints on the structure of that document. A DTD may also declare constructs that are not always required to establish document structure, but that may affect the interpretation of some documents. XML documents are described using a subset of DTD which imposes a number of restrictions on the document's structure, as required per the XML standard (XML is in itself an application of SGML optimized for automated parsing).
We all know what a standard XML file looks like...


Basic bread
Flour
Yeast
Water
Salt

Mix all ingredients together.
Knead thoroughly.
Cover with a cloth, and leave for one hour in warm room.
Knead again.
Place in a bread baking tin.
Cover with a cloth, and leave for one hour in warm room.
Bake in the oven at 180(degrees)C for 30 minutes.
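
As an XML file, the recipe above could look roughly like the sketch below. The element names (recipe, name, ingredient, instructions, step) and attribute names (name, prep_time, cook_time, amount, unit) are the ones used in the rest of this post; the attribute values are assumptions made up for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Attribute values below are illustrative assumptions. -->
<recipe name="bread" prep_time="5 mins" cook_time="3 hours">
  <name>Basic bread</name>
  <ingredient amount="8" unit="dL">Flour</ingredient>
  <ingredient amount="10" unit="grams">Yeast</ingredient>
  <ingredient amount="4" unit="dL">Water</ingredient>
  <ingredient amount="1" unit="teaspoon">Salt</ingredient>
  <instructions>
    <step>Mix all ingredients together.</step>
    <step>Knead thoroughly.</step>
    <step>Cover with a cloth, and leave for one hour in warm room.</step>
    <step>Knead again.</step>
    <step>Place in a bread baking tin.</step>
    <step>Cover with a cloth, and leave for one hour in warm room.</step>
    <step>Bake in the oven at 180(degrees)C for 30 minutes.</step>
  </instructions>
</recipe>
```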

To validate this XML we create a DTD and link the XML file to it. To create those files I use a freeware editor called XMLPad.

Before I go on with the DTD file, we have to link the XML file to the DTD. This is done with a DOCTYPE declaration placed before the root element.
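
A minimal sketch of that link, assuming the DTD is saved as recipe.dtd (a hypothetical file name) in the same folder as the XML file. The name after DOCTYPE has to match the root element.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- recipe.dtd is an assumed file name; SYSTEM points to the DTD file by path or URL -->
<!DOCTYPE recipe SYSTEM "recipe.dtd">
<recipe>
  <!-- name, ingredient and instructions elements go here -->
</recipe>
```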
That's it. Now we can start to write the rules that the XML document is validated against.

Let's create a simple DTD for this example before we look a little deeper into validation rules.

The most important thing to do right now is to declare the elements and attributes of the XML document. We have to declare every element (recipe, name, ingredient, instructions, step), because a document cannot be valid if it contains elements that are not declared in the DTD file.

The element declaration for recipe lists its child elements in order: name, one or more ingredient elements, and instructions.
The syntax of the ELEMENT declaration should be self-explanatory. The only thing you have to consider is that we can have several ingredient tags, so we put a '+' right behind the element name (one or more). We could also put a '*' behind it to say that there can be zero or more ingredient tags, or a '?' to tell the parser that the element is optional (zero or one).
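
Reconstructed from the description above, the declaration for the root element would be something like this sketch (it goes into the DTD file):

```xml
<!-- recipe contains one name, then one or more ingredient, then one instructions -->
<!ELEMENT recipe (name, ingredient+, instructions)>
```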

The next step is to define an attribute list (ATTLIST) for this element. It declares three attributes for the element recipe: name, prep_time and cook_time.
CDATA is the type of the attributes (character data), so any XML parser knows that the value is just text and there is nothing more to check. The last keyword says whether the attribute has to be stated: #IMPLIED makes it optional, #REQUIRED makes it mandatory. In our case none of the attributes of recipe are required, so all three are #IMPLIED.
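
A sketch of such an attribute list, using only the names and keywords mentioned above:

```xml
<!-- all three attributes of recipe are optional text values -->
<!ATTLIST recipe
  name      CDATA #IMPLIED
  prep_time CDATA #IMPLIED
  cook_time CDATA #IMPLIED>
```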

Now we continue with the next element (name). It has no attributes, so we just add an ELEMENT declaration that allows text content (#PCDATA).
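
In the DTD file this is a single line; a minimal sketch:

```xml
<!-- name holds text only, no child elements and no attributes -->
<!ELEMENT name (#PCDATA)>
```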



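PCDATA means parsed character data: the element contains plain text, and the parser still scans that text for markup and entity references (so &lt; and &amp; are resolved, and a literal < would be read as the start of a tag). CDATA, the attribute type used above, simply means that the attribute value is a character string. Neither of them converts characters like the German Ö or Ü into HTML entities; how those characters are stored depends on the encoding of the file.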

Let's continue with the next element (ingredient).
This time we declare the attributes amount and unit as #REQUIRED, which forces the parser to check that both are stated; otherwise we'll get a validation error.
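
A sketch of what the two declarations for ingredient could look like. That ingredient holds plain text (#PCDATA) is an assumption based on the recipe example; the attribute names come from the text above.

```xml
<!-- ingredient holds text such as "Flour" (assumed content model) -->
<!ELEMENT ingredient (#PCDATA)>
<!-- both attributes must be present on every ingredient -->
<!ATTLIST ingredient
  amount CDATA #REQUIRED
  unit   CDATA #REQUIRED>
```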

And finally we have to define another element (instructions), which has a sub-element (step) but not a single attribute:
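
A sketch of these last two declarations. The '+' after step (at least one step) is an assumption; '*' would also fit the description.

```xml
<!-- instructions contains one or more step elements (the + is an assumption) -->
<!ELEMENT instructions (step+)>
<!-- each step is plain text -->
<!ELEMENT step (#PCDATA)>
```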
That's it; there is nothing else to do. A validating parser now checks that the XML file contains only the declared elements, in the declared order, and that every mandatory attribute is stated.

That's just a basic example. For further instructions on the different ways to check your XML, have a look at the DTD pages of the W3C, Wikipedia, or the W3Schools site.

Question 2: "How can I use XSD (= XML Schema) to validate an XML-Document?"
As you have probably realized already, DTD offers only fairly basic validation for XML documents. If you want to check something like the right date format, or put tighter restrictions on values, you need XSD.

XSD provides a set of 19 primitive data types:
  • boolean
  • string
  • decimal
  • double
  • float
  • anyURI
  • QName
  • hexBinary
  • base64Binary
  • duration
  • date
  • time
  • dateTime
  • gYear
  • gYearMonth
  • gMonth
  • gMonthDay
  • gDay
  • and NOTATION
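
To give an idea of how these types are used, here is a small sketch that is not part of the recipe example. The element names baked_on and oven_temperature are hypothetical; the first must contain a valid date, the second a decimal within a restricted range.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- hypothetical element: only dates in the form 2009-05-17 are accepted -->
  <xs:element name="baked_on" type="xs:date"/>

  <!-- hypothetical element: a decimal restricted to the range 0 to 300 -->
  <xs:element name="oven_temperature">
    <xs:simpleType>
      <xs:restriction base="xs:decimal">
        <xs:minInclusive value="0"/>
        <xs:maxInclusive value="300"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
</xs:schema>
```

A DTD could only say that both elements contain text; it could not reject "yesterday" or "-40".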

XML Schema, published as a W3C recommendation in May 2001, is one of several XML schema languages. It was the first separate schema language for XML to achieve Recommendation status by the W3C. Like all XML schema languages, XML Schema can be used to express a schema: a set of rules to which an XML document must conform in order to be considered 'valid' according to that schema. However, unlike most other schema languages, XML Schema was also designed with the intent that determination of a document's validity would produce a collection of information adhering to specific data types. Such a post-validation infoset can be useful in the development of XML document processing software, but the schema language's dependence on specific data types has provoked criticism.

In comparison to the DTD file the XSD file looks quite different, but you'll still be able to find the element and attribute tags in it.

Let's have a look at what a working XSD file could look like.
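
The following is a sketch of such a schema, reconstructed from the walkthrough in this post and laid out so that the line numbers mentioned there (6 to 8, 15, 16 to 26) fall on the matching declarations. The use="required" on amount and unit, the comment on line 20 and the declarations for instructions and step are assumptions that mirror the DTD.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="recipe">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="name"/>
        <xs:element ref="ingredient" maxOccurs="unbounded"/>
        <xs:element ref="instructions"/>
      </xs:sequence>
      <xs:attribute name="name" type="xs:string"/>
      <xs:attribute name="prep_time" type="xs:string"/>
      <xs:attribute name="cook_time" type="xs:string"/>
    </xs:complexType>
  </xs:element>
  <xs:element name="name" type="xs:string"/>
  <xs:element name="ingredient">
    <xs:complexType>
      <xs:simpleContent>
        <xs:extension base="xs:string">
          <!-- attributes of ingredient; use="required" is an assumption mirroring the DTD -->
          <xs:attribute name="amount" type="xs:string" use="required"/>
          <xs:attribute name="unit" type="xs:string" use="required"/>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>
  <xs:element name="instructions">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="step" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="step" type="xs:string"/>
</xs:schema>
```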
The next thing you'll notice is that you link the XML file to the schema file in a different way (look at the root element recipe):

xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="file:///E:/Daten/XmlPad%20Projects/XML%20via%20DTD/recipe-schema.xsd"
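
In context, the two attributes sit on the start tag of the root element. A sketch, assuming the schema file lies next to the XML file so that a relative path is enough; the other attribute values are illustrative:

```xml
<!-- recipe-schema.xsd is referenced by a relative path here -->
<recipe xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:noNamespaceSchemaLocation="recipe-schema.xsd"
        name="bread" prep_time="5 mins" cook_time="3 hours">
  <!-- name, ingredient and instructions elements go here -->
</recipe>
```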
Another interesting point is the order of the tags. For me it's just important to know that you have to define an element as a complex type with a sequence in order to give it sub-elements.

Let's have a look at lines 6 to 8. There you can find the sub-elements of the root element 'recipe'. Those sub-elements are declared on their own further down in the XSD file, so here you only put in a reference (attribute ref="...."). On line 7 there is another new attribute, maxOccurs="unbounded", which tells the parser that this element has to occur at least once (minOccurs defaults to 1) and may occur any number of times.

Still inside the complexType tag, after the sequence, we find the attributes (name, prep_time, cook_time) with their data type (type="xs:string"). That completes the definition of the root element.

Line 15: The definition of the element name (my personal favourite). You just put in the name and the type.

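Lines 16-26: ingredient. This element has attributes to define, so we put in the complexType tag. In the example shown you'll find the tags xs:simpleContent and xs:extension. For lack of time (and because I had no idea what exactly they do) I just deleted those tags (lines 18, 19, 23 and 24). As it turns out, they say that the element contains plain text of a simple type and extend that text content with attributes. Without them the complex type has empty content, so a validator will reject an ingredient that contains text such as Flour; it is safer to leave them in.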
