Tuesday, February 02, 2010

XML Schema

Please refer to my earlier post on Learning XML; this post is a continuation of it.

XML Schema is an XML language that defines and validates the structure of an XML document. It is saved as a .xsd file, where xsd stands for XML Schema Definition. It is an alternative to DTD for validating an XML document, and nowadays it is used more often than DTD.
XML Schema allows XML technology to represent data in a standard format. It provides a precise way to specify the type of content that an element or attribute of an XML document can hold. This was not available in DTD, which has no mechanism for specifying data types that a database user could rely on. XML Schema covers the ground that DTD covers and adds XML namespace and data type support. A schema is itself written in XML. It allows easy creation of complex and reusable content models, supports inheritance and substitution of types, and supports the W3C namespace recommendation.

A simple XML schema should have
1>A required namespace declaration, usually with xs as the prefix. The xs prefix is not essential, but it is the conventional one, so we use xs as the prefix in our XML schema.
The namespace used is http://www.w3.org/2001/XMLSchema
2>A document root of <schema>
3>Each element defined in an <element>
So a simple XSD schema should look like
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="dept" type="xs:string">
</xs:element>
</xs:schema>

We will call this dept.xsd.
For this XSD, a corresponding XML document is
<?xml version="1.0"?>
<dept>Integration</dept>

A good point about the XSD is that, being XML itself, it can also be validated.
Now we will again use our XMLStarlet tool to check the validity of the XML document against the XML schema.
But before that we need to check and understand a few things in our XSD. The schema declares dept as the root element of the instance document, which is why we named the file dept.xsd; the file name is only a convention.
http://www.w3.org/2001/XMLSchema is the standard namespace defined by the W3C.
Now open a command prompt, make sure dept.xsd and dept.xml are in the same folder, navigate to that folder and issue the following command
xml val -s dept.xsd dept.xml



Let us look at the XSD document again to understand a few more terms.
The <xs:schema> element is the root element of the XSD and contains the definition of the structure of the XML instance document.
An XML document whose structure is based on the definitions in an XML schema is called an instance document of that XML schema.
In our case <xs:element> declares dept, the root element of the XML instance document.
The value xs:string is set as the data type for the content of dept in the XML instance document. The content can be any string, and no child elements are permitted here.
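The built-in type is where a schema goes beyond a DTD. As a small sketch (the joined element is a made-up name, not part of the dept example), replacing xs:string with the built-in type xs:date restricts the content to a calendar date:

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- hypothetical element: content must be a date in YYYY-MM-DD form -->
  <xs:element name="joined" type="xs:date"/>
</xs:schema>
```

With this schema, <joined>2010-02-02</joined> should validate, while <joined>yesterday</joined> should be rejected.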
One important point to note is that here we are explicitly validating the XML document against the XSD file, because the XML document contains no reference to the XSD file. Normally the XML document should point to its XSD.
The XML schema can be pointed to by using
1>the noNamespaceSchemaLocation attribute
noNamespaceSchemaLocation="file.xsd"
or
2>the schemaLocation attribute.
schemaLocation="namespace file.xsd"
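For a schema without a target namespace, such as the first dept.xsd above, the noNamespaceSchemaLocation form might look like the sketch below. It assumes dept.xsd sits in the same folder as the instance document, and the xsi prefix is bound to the XMLSchema-instance namespace. Keep in mind that the location is only a hint, and a validator may be told to use a different schema.

```xml
<?xml version="1.0"?>
<!-- no target namespace, so the schema file is named directly -->
<dept xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:noNamespaceSchemaLocation="dept.xsd">Integration</dept>
```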

To be validated against an XML schema, an instance document should declare the http://www.w3.org/2001/XMLSchema-instance XML namespace in its root element; this enables the two attributes listed above.
Again, it is recommended to use the schemaLocation attribute.
Now our XML document should look like this
<?xml version="1.0"?>
<dept xmlns="http://www.abc.com/dept"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.abc.com/dept dept.xsd">
Arpit</dept>

And the corresponding schema should look like

<?xml version ="1.0"?>
<xs:schema xmlns:xs ="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://www.abc.com/dept">
<xs:element name="dept" type="xs:string">
</xs:element>
</xs:schema>

Using schemaLocation ties the XML instance document to a specific XML namespace. As you can see, the XML schema declares a target namespace that must be used in the XML instance document. The XML instance document refers to the XML schema for validation through the schemaLocation attribute, which pairs the target namespace with the schema file. It is a bit confusing at first, but it becomes clearer as we go further.

We will look at the schema declaration concept again.

An XML schema document contains the following information
Namespace information, default values and version
Example (for attributeFormDefault and elementFormDefault, pick one of the two values shown)
<?xml version="1.0" ?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://www.arpit.com/rahi"
xmlns:target="http://www.arpit.com/rahi"
attributeFormDefault="qualified|unqualified"
elementFormDefault="qualified|unqualified"
version="1.0">

"http://www.w3.org/2001/XMLSchema" is the XML Schema namespace.
The namespace prefix, xsd in our case, is arbitrary and you can choose any name for it.
The XML schema defines a set of components, and the targetNamespace identifies the namespace they belong to. To refer to those components from within the schema, a matching namespace declaration is also required. So in our case
targetNamespace="http://www.arpit.com/rahi"
is the declaration of the target namespace and
xmlns:target="http://www.arpit.com/rahi"
is the matching namespace declaration used to refer to it.
Declaring the targetNamespace value as a namespace (with a prefix or as the default) in the root element allows the XML schema document to refer to components declared in the same XSD. This is important when one declaration needs to reference an element declared elsewhere in the schema.

Declaring Defaults
You can use the attributeFormDefault and elementFormDefault settings to control whether locally declared elements and attributes must be qualified by the target namespace in the instance document. Both default to unqualified. For example:
<?xml version="1.0" ?>
<emp:employee xmlns:emp="http://www.arpit.com/rahi">
<first_name>Arpit</first_name>
<last_name>Rahi</last_name>
</emp:employee>
Here the employee element is qualified by the namespace, as you can see from emp:employee; however, the elements first_name and last_name are not qualified by the namespace.

Again,
<?xml version="1.0" ?>
<employee xmlns="http://www.arpit.com/rahi">
<first_name>Arpit</first_name>
<last_name>Rahi</last_name>
</employee>
In this case all the elements are implicitly qualified by the default namespace, so it is equivalent to

<?xml version="1.0" ?>
<emp:employee xmlns:emp="http://www.arpit.com/rahi">
<emp:first_name>Arpit</emp:first_name>
<emp:last_name>Rahi</emp:last_name>
</emp:employee>
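
As a rough sketch of the schema side, the fragment below declares employee with two local child elements and sets elementFormDefault to qualified. The structure is an assumption modelled on the instance documents above, not a schema taken from them.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.arpit.com/rahi"
            elementFormDefault="qualified">
  <xsd:element name="employee">
    <xsd:complexType>
      <xsd:sequence>
        <!-- local elements: namespace-qualified only because of elementFormDefault -->
        <xsd:element name="first_name" type="xsd:string"/>
        <xsd:element name="last_name" type="xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

With qualified, the instance documents in which first_name and last_name belong to the namespace should validate. With unqualified (or with the attribute left out), only the first form, where the children carry no namespace, should validate.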
======================================


Components of an XML schema declaration
The various components of an XML schema document are
1>element, which can contain attributes and can be structured as a simple type or a complex type.
2>attribute, which represents a named attribute.
3>simple type, which defines a simple data type and can be structured as a restriction, union or list.
4>complex type, which can be composed of simple types, sequences, choices and groups.
5>group, which can be composed of sequences and choices.
6>attribute group, which defines a group of attributes.

Let's take an example to understand this.
<?xml version ="1.0" ?>
<xsd:schema
xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:element name ="employee">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="first_name" type="xsd:string"/>
<xsd:element name="last_name" type="xsd:string"/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
</xsd:schema>

We can have a corresponding XML document for it as
<?xml version="1.0" ?>
<employee xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<first_name>Arpit</first_name>
<last_name>Rahi</last_name>
</employee>
So again use the XMLStarlet tool to check the validity of the XML document against the XML schema.


Global and local declaration
In an XML schema a component declaration is either global or local. A component is global if it is declared as a direct child of the schema element; global components can be reused throughout the XML schema document. Local declarations, on the other hand, are valid only in the context where they are defined; they are not direct children of the schema element. A local declaration can reference a global declaration by using a namespace prefix.
To reference a global declaration within an XML schema document, it is necessary to declare an XML namespace with the same value as the targetNamespace attribute in the start tag of the <schema> element. So it should look somewhat like this.
<schema targetNamespace="http://www.arpit.com"
xmlns:t="http://www.arpit.com">

Now we will see an example on how to use the global and local declaration.

<?xml version ="1.0" ?>
<department xmlns="http://www.arpit.com/rahi"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.arpit.com/rahi dept.xsd">
<details>
<first_name>Arpit</first_name>
<last_name>Rahi</last_name>
<dept>Integration</dept>
</details>
</department>


And the corresponding XSD file for this will be

<?xml version ="1.0" ?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://www.arpit.com/rahi"
xmlns:t="http://www.arpit.com/rahi"
elementFormDefault="qualified">
<xsd:element name ="dept" type="xsd:string"/>
<xsd:element name="first_name" type="xsd:string"/>
<xsd:element name="last_name" type="xsd:string"/>
<xsd:complexType name ="detail">
<xsd:sequence>
<xsd:element ref="t:first_name"/>
<xsd:element ref="t:last_name"/>
<xsd:element ref ="t:dept"/>
</xsd:sequence>
</xsd:complexType>
<xsd:element name="department">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="details" type="t:detail" maxOccurs="unbounded"/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
</xsd:schema>


Now we will look into the XSD and try to understand it.
<?xml version ="1.0" ?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://www.arpit.com/rahi"
xmlns:t="http://www.arpit.com/rahi"
elementFormDefault="qualified">
This part of the XSD declares the schema element, the target namespace and a prefix t that points to that target namespace. With elementFormDefault="qualified" we also require the elements to be namespace-qualified in the instance document.
<xsd:element name ="dept" type="xsd:string"/>
<xsd:element name="first_name" type="xsd:string"/>
<xsd:element name="last_name" type="xsd:string"/>
<xsd:complexType name ="detail">

These are the global declarations in the schema, as they are direct children of the schema element.
<xsd:element ref="t:first_name"/>
<xsd:element ref="t:last_name"/>
<xsd:element ref ="t:dept"/>

Here we refer to the global declarations. The order is important: the elements in the XML document must appear in the same order, otherwise validation fails with an error.
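
To make the ordering rule concrete, here is a sketch of an instance document that should fail validation against the schema above. It reuses the department example and only moves dept to the front.

```xml
<?xml version="1.0"?>
<department xmlns="http://www.arpit.com/rahi"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:schemaLocation="http://www.arpit.com/rahi dept.xsd">
  <details>
    <!-- dept comes first here, but the sequence expects first_name, last_name, dept -->
    <dept>Integration</dept>
    <first_name>Arpit</first_name>
    <last_name>Rahi</last_name>
  </details>
</department>
```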

<xsd:element name="department">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="details" type="t:detail" maxOccurs="unbounded"/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>

Here is the main logic for the XML document. We define that the root element of the XML document must be department. It must contain details elements, which are of type detail. As you can see, detail is a complex type that contains the references to the three elements. We have also set one more property, maxOccurs, which defines how many times the element can occur in the XML document; unbounded means it can occur any number of times. Now you can once again check the validity of the XML document against the XML schema using XMLStarlet.
===============================================================
Declaring an Element
The general syntax of an element declaration, with its most commonly used attributes, is as follows
<element
name="name of element"
type="global-type|built in type"
ref="global declaration"
form="qualified|unqualified"
minOccurs="somevalue"
maxOccurs="somevalue|unbounded"
default="some default value"
fixed="some fixed value">

I believe all the attributes are self-explanatory here. Types such as string and int are built-in types. minOccurs and maxOccurs must not be negative, and both default to 1. We can also have default as well as fixed values, as we had in DTD.
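
As a sketch of how these attributes behave together, the schema below uses minOccurs, maxOccurs, default and fixed on local elements. The element names middle_name, phone and country are made up for illustration.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="employee">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="first_name" type="xsd:string"/>
        <!-- optional: may be left out -->
        <xsd:element name="middle_name" type="xsd:string" minOccurs="0"/>
        <!-- one to three phone numbers -->
        <xsd:element name="phone" type="xsd:string" maxOccurs="3"/>
        <!-- an empty dept element is treated as Integration -->
        <xsd:element name="dept" type="xsd:string" default="Integration"/>
        <!-- any content supplied must be exactly India -->
        <xsd:element name="country" type="xsd:string" fixed="India"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

Note that an element default applies when the element is present but empty; it does not make the element itself optional.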
=================================================================
Declaring the simple type
A simple type can be derived in three primary ways
1>restriction
2>list and
3>union
We will check each of them one by one.
Let's take an example to make it clearer
<simpleType name="emp_id">
<restriction base="positiveInteger">
<maxInclusive value="1000"/>
</restriction>
</simpleType>
<element name="employee_id" type="emp_id"/>

As you can see, we have defined a new data type emp_id, restricted to positive integers with a maximum allowed value of 1000.
Again, in the element declaration
<element name="employee_id" type="emp_id"/>
we have declared the type of the element as emp_id.
Don't worry about the maxInclusive and positiveInteger keywords: positiveInteger is a built-in type and maxInclusive is a constraining facet, both defined by XML Schema and easy to look up.

A valid XML document for this element would be
<employee_id>999</employee_id>
Many facets can be used in a simpleType declaration, such as enumeration, length, maxExclusive, minLength etc.
We will discuss one more facet, enumeration.
<simpleType name="emp_name">
<restriction base="string">
<enumeration value="Arpit"/>
<enumeration value="Ankit"/>
<enumeration value="Nitin"/>
</restriction>
</simpleType>

Here we are defining a simple type called emp_name with three allowed values. An element declared with this type, say one also named emp_name, must contain exactly one of them:
<emp_name>Arpit|Ankit|Nitin</emp_name>
where Arpit|Ankit|Nitin stands for one of Arpit, Ankit or Nitin. Only the three values defined in the XSD are allowed; any other value is invalid.
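
The length facets mentioned earlier work in the same way. The sketch below is illustrative only: the type emp_code and the element employee_code are assumed names, not part of the examples above.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:simpleType name="emp_code">
    <xsd:restriction base="xsd:string">
      <!-- between 3 and 8 characters long -->
      <xsd:minLength value="3"/>
      <xsd:maxLength value="8"/>
    </xsd:restriction>
  </xsd:simpleType>
  <xsd:element name="employee_code" type="emp_code"/>
</xsd:schema>
```

Here <employee_code>INT42</employee_code> should validate, while <employee_code>X</employee_code> should be rejected as too short.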

Using List
<element name="email">
<simpleType>
<list itemType="string"/>
</simpleType>
</element>

list here defines a whitespace-separated list of strings, so the content below is read as four items
<email>Arpit Rahi is god</email>
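
A list becomes more useful when the item type is stricter than string. As a sketch, with emp_ids being a made-up element name, every item below must be a positive integer:

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="emp_ids">
    <xsd:simpleType>
      <!-- each whitespace-separated item must be a positive integer -->
      <xsd:list itemType="xsd:positiveInteger"/>
    </xsd:simpleType>
  </xsd:element>
</xsd:schema>
```

So <emp_ids>101 102 999</emp_ids> should validate, while <emp_ids>101 abc</emp_ids> should not.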

Using Union
<simpleType name="emp_name">
<restriction base="string">
<enumeration value="Arpit"/>
<enumeration value="Ankit"/>
</restriction>
</simpleType>
<simpleType name="employee_name">
<restriction base="string">
<enumeration value="Nitin"/>
<enumeration value="Krishna"/>
</restriction>
</simpleType>

<element name="Employee">
<simpleType>
<union memberTypes="emp_name employee_name"/>
</simpleType>
</element>

So as you can see, we have defined two simple types, emp_name and employee_name.
We have also defined a union
<union memberTypes="emp_name employee_name"/>
which means the element may hold any value declared in either of the two simpleTypes.
So the following XML elements are valid for it.
<Employee>Arpit</Employee>
<Employee>Krishna</Employee>
As you can see, Arpit is defined in emp_name and Krishna in employee_name, and since we have defined a union we can use values from both enumerated types in our XML document.

Declaring a choice
A choice allows one of a selection of components to be included in the XML instance document. A choice is declared within a complexType. It will be clearer with an example.
<complexType name="employee">
<choice>
<element name="first_name" type="string"/>
<element name="full_name" type="string"/>
</choice>
</complexType>
<element name="emp_name">
<complexType>
<sequence>
<element name="employee_name" type="string"/>
<element name="emp_id" type="employee"/>
</sequence>
</complexType>
</element>

A valid XML document for this XSD would be
<emp_name>
<employee_name>Arpit Rahi</employee_name>
<emp_id>
<first_name>Arpit</first_name>
</emp_id>
</emp_name>
As you can see, we have defined two elements in the choice
<choice>
<element name="first_name" type="string"/>
<element name="full_name" type="string"/>
</choice>
so the XML instance document must contain exactly one of the two. Hence our XML instance document contains only one of the elements.
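
For comparison, here is a sketch of the same document using the other branch of the choice:

```xml
<?xml version="1.0"?>
<emp_name>
  <employee_name>Arpit Rahi</employee_name>
  <emp_id>
    <!-- the other branch of the choice; adding first_name as well would be invalid -->
    <full_name>Arpit Rahi</full_name>
  </emp_id>
</emp_name>
```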

Empty Element
An empty element does not allow text between the start and the end tag. It is useful when we declare an element that has attributes only, that is, no text content.
Consider this example
<element name="employees">
<complexType>
<sequence>
<element name="employee" maxOccurs="unbounded">
<complexType>
<attribute name="emp_name" type="string"/>
</complexType>
</element>
</sequence>
</complexType>
</element>

As you can see, we have declared an element employee whose complex type contains only an attribute, with no child elements and no simple content, so no text content may be supplied for employee. All we need to supply is the emp_name attribute. So a proper XML document for the given XSD file will be
<employees>
<employee emp_name="arpit"></employee>
</employees>

Declaring attributes
An attribute can be declared as
<attribute
name="name of attribute"
type="global_type|built in types"
ref="global attribute declaration"
form="qualified|unqualified"
use="optional|prohibited|required"
default="default-value"
fixed="fixed value">

All the attributes are self-explanatory. An attribute is declared and referenced in much the same way as an element.
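
As a short sketch of use and default in practice (the attribute names are chosen for illustration):

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="employee">
    <xsd:complexType>
      <!-- must always be supplied -->
      <xsd:attribute name="emp_id" type="xsd:positiveInteger" use="required"/>
      <!-- may be omitted; a validating processor then supplies Integration -->
      <xsd:attribute name="dept" type="xsd:string" default="Integration"/>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

With this schema <employee emp_id="420"/> should validate, while <employee dept="Sales"/> should be rejected because emp_id is missing.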
Declaring and referencing an attribute group
An attribute group contains one or more attributes. It will be clearer with examples.
Declaring an attribute group
<attributeGroup name="employees">
<attribute name="emp_name" type="string"/>
<attribute name="emp_id" type="string"/>
</attributeGroup>
Referencing an attributeGroup
<element name="employee">
<complexType>
<attributeGroup ref="employees"/>
</complexType>
</element>

So a valid XML string for this XSD will be something like this
<employee emp_name="Arpit" emp_id="420"/>
So we have covered almost everything about XSD; now it needs a bit of practice to get a good command of XSD programming.

Documenting the XML schema
This is another important concept in XML Schema: we need to know how XML schemas are documented. An XML schema can be documented by using either
XML comments
Annotation declarations
An XML comment, as you are aware, is written between the following delimiters
<!-- Enter your comment -->
Using annotation
<annotation>
<appinfo source="C:\document.xml"/>
<documentation>Comments</documentation>
</annotation>
Within an annotation, the appinfo element carries information meant for applications; its source attribute is a reference to where that information comes from
<appinfo source="C:\document.xml"/>
Here it refers to the file C:\document.xml
We can also provide human-readable comments within the documentation tag.

So a documented XSD schema should look like
<?xml version="1.0" ?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns="http://www.example.org"
targetNamespace="http://www.example.org"
elementFormDefault="qualified">
<xsd:element name="exampleElement">
<xsd:annotation>
<xsd:documentation>
A sample element
</xsd:documentation>
</xsd:annotation>
</xsd:element>
</xsd:schema>
