XML

XML (Extensible Markup Language) is a set of rules for encoding documents electronically. It is defined in the XML 1.0 specification produced by the W3C and in several other related specifications.
  • XML
    • XML is used to design and transport data.
    • It is a simple text based format
    • Benefits of XML
      • Simple to learn
      • Easy to read
      • Widely used on the Internet
      • Strong Unicode support
      • Text based, so it can be freely and widely edited
      • Many editors are available
      • Platform independent
      • Can be used for representing structured information for a variety of purposes
      • Some of the purposes for which it can be used are:
        • documents
        • data
        • configuration
        • books
        • transactions
        • invoices
        • applications
        • user interface templates
        • games
    • XML tags are not predefined.
    • You must define your own tags
    • XML tags are case sensitive
    • XML elements must be properly nested
    • XML documents must have a root element
    • White space is preserved in XML
    • XML Naming Rules
      • Names can contain letters, numbers, and other characters
      • Names cannot start with a digit or with most punctuation characters (an underscore is allowed)
      • Names should not start with the letters xml (XML or Xml etc), because such names are reserved
      • Names cannot contain spaces
    • XML documents must be well formed
    • Displaying XML files with CSS
      • eg: <?xml-stylesheet type="text/css" href="abc.css"?>
  • Example XML document:
      <?xml version="1.0" encoding="ISO-8859-1"?>
      <open>
        <to>Example</to>
        <skj>Area</skj>
        <head>Rest</head>
        <body>Welcome to SKJ Platform</body>
      </open>
  • XML Syntax Rules
    • Document should begin with the XML declaration (it is optional in XML 1.0, but when present it must be the first thing in the document)
      • eg: <?xml version="1.0"?>
    • Document must have one unique root element
    • The start tags in an XML document must have matching end tags
    • All elements in XML :
      • Are case sensitive
      • Must be closed
      • Must be properly nested
    • All attribute values must be quoted
    • Entity references must be used for special characters such as < and & in an XML document
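
The syntax rules above can be tried out with any XML parser. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module; the helper name is_well_formed and the sample documents are made up for illustration. It only checks well-formedness, not validity against a schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical sample documents, one per rule.
good = '<?xml version="1.0"?><note type="memo"><to>Example</to></note>'
bad_nesting = "<note><to>Example</note></to>"   # elements not properly nested
bad_case = "<note><To>Example</to></note>"      # tags are case sensitive
bad_attr = "<note type=memo/>"                  # attribute value not quoted
two_roots = "<a/><b/>"                          # more than one root element
unclosed = "<note><to>Example</note>"           # <to> is never closed

print(is_well_formed(good))  # True
for doc in (bad_nesting, bad_case, bad_attr, two_roots, unclosed):
    print(is_well_formed(doc))  # False each time
```

Each rejected document breaks exactly one rule, so changing a sample until it parses is a quick way to see which rule was violated.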
  • XML Based Languages
    • XHTML, a reformulation of HTML as an XML application
    • WSDL for describing available web services
    • WML, the markup language used with WAP for handheld devices
    • RSS for news feeds
    • RDF and OWL for describing resources and ontologies
    • SMIL for describing multimedia for the web
    • SOAP for exchanging structured information in the implementation of Web Services in computer networks
  • Features in XML which are different from HTML
    • All elements must be closed or marked as empty.
    • In XML, attribute values must always be quoted, eg. <Employee type="Manager" />
    • In XML, there are no built-in element names (although names starting with xml have special meanings)
    • In XML, there are only five built-in character entities:
      • &lt; (<)
      • &gt; (>)
      • &amp; (&)
      • &quot; (")
      • &apos; (')
      • You can define your own entities in a Document Type Definition, or you can use any Unicode character
    • XML also allows numeric character references, in decimal (eg. &#169;) or hexadecimal (eg. &#xA9;)
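
The built-in entities and character references listed above are resolved by the XML processor before the application sees the text. A small sketch, assuming Python's standard library; the element name msg and the sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# All five built-in entities, plus one decimal and one hexadecimal
# character reference for the same character (U+00A9).
doc = (
    "<msg>5 &lt; 7 &amp;&amp; 7 &gt; 5; "
    "&quot;hi&quot; &apos;there&apos;; &#169; &#xA9;</msg>"
)

text = ET.fromstring(doc).text
print(text)

# Going the other way: escape() replaces &, < and > so that arbitrary
# text can be placed safely inside an element.
escaped = escape("Tom & Jerry <3")
print(escaped)  # Tom &amp; Jerry &lt;3
```

The parsed text contains the literal characters, so an application never needs to decode the entities itself.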
  • Technical Things (Should be read only by programmers)
    • A software module called an XML processor is used to read XML documents and provide access to their content and structure
    • Each XML document has both a logical and a physical structure.
    • Physically, the document is composed of units called entities
    • An entity may refer to other entities to cause their inclusion in the document.
    • A document begins in a "root" or document entity
    • All XML processors must accept the UTF-8 and UTF-16 encodings of Unicode
    • Comments
      • Start with <!--
      • End with -->
    • Standalone Document Declaration
      • In a standalone document declaration, the value "yes" indicates that there are no external markup declarations which affect the information passed from the XML processor to the application.
      • The value "no" indicates that there are or may be such external markup declarations.
      • eg. <?xml version="1.0" standalone='yes'?>
    • A special attribute named xml:lang may be inserted in documents to specify the language used in the contents and attribute values of any element in an XML document
    • Language information may also be provided by external transport protocols (e.g. HTTP or MIME). When available, this information may be used by XML applications, but the more local information provided by xml:lang should be considered to override it
    • XML attribute types are of three kinds:
      • a string type
      • a set of tokenized types
      • enumerated types
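
The xml:lang attribute described above applies to an element and to its content unless a nested element overrides it. The sketch below, assuming Python's standard xml.etree.ElementTree module, works out the language in effect for each element; the document and the helper name languages are made up for illustration.

```python
import xml.etree.ElementTree as ET

# ElementTree reports the reserved xml: prefix with its namespace name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

doc = """<?xml version="1.0" standalone="yes"?>
<book xml:lang="en">
  <title>Greetings</title>
  <quote xml:lang="hi">Namaste</quote>
</book>"""


def languages(elem, inherited=None):
    """Yield (tag, language in effect) for elem and all its descendants."""
    lang = elem.get(XML_LANG, inherited)
    yield elem.tag, lang
    for child in elem:
        yield from languages(child, lang)


result = dict(languages(ET.fromstring(doc)))
print(result)  # {'book': 'en', 'title': 'en', 'quote': 'hi'}
```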
XML Namespace
  • An XML namespace is a collection of names, identified by a URI reference [RFC2396], which are used in XML documents as element types and attribute names.

  • URI references which identify namespaces are considered identical when they are exactly the same character-for-character

  • Eg. 

<job xmlns:skj='http://skjworld.org/schema'>
  <!-- the "skj" prefix is bound to http://skjworld.org/schema
       for the "job" element and contents -->
  <!-- the 'price' element's namespace is http://skjworld.org/schema -->
  <skj:price units='Rs'>81.28</skj:price>
</job>
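
To see how a processor reports namespaced names, the example above can be parsed as follows. This is a sketch assuming Python's standard xml.etree.ElementTree module; the prefix "s" used for the lookup is an arbitrary local choice, which shows that only the namespace URI matters, not the prefix.

```python
import xml.etree.ElementTree as ET

doc = """<job xmlns:skj='http://skjworld.org/schema'>
  <skj:price units='Rs'>81.28</skj:price>
</job>"""

root = ET.fromstring(doc)

# The lookup prefix does not have to match the one used in the document.
ns = {"s": "http://skjworld.org/schema"}
price = root.find("s:price", ns)

print(price.tag)           # {http://skjworld.org/schema}price
print(price.get("units"))  # Rs
print(price.text)          # 81.28
print(root.tag)            # job (unprefixed, and no default namespace is declared)
```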

XML Schema (XSD)
  • An XML Schema is generally referred to as an XSD (XML Schema Definition) document and is itself written in XML
  • XSD can be used to express a set of rules to which an XML document must conform in order to be considered 'valid' according to that schema.
  • XML Schema is an XML-based alternative to DTD
  • Unlike DTD the XML Schema is itself written in XML
  • XML Schema (XSD) defines
    • Elements that can appear in a document
    • Attributes that can appear in a document
    • Which elements are child elements
    • The order of child elements
    • The number of child elements
    • Whether an element is empty or can include text
    • Data types for elements and attributes
    • Default and fixed values for elements and attributes
  • The XML representation of schema components uses a vocabulary identified by the namespace named http://www.w3.org/2001/XMLSchema
  • For brevity, the W3C XSD specification and many published examples use the prefix xs: to stand for this namespace; in practice, any prefix can be used
  • Many attributes used in instance documents are identified by another namespace named http://www.w3.org/2001/XMLSchema-instance
  • For brevity, the W3C XSD specification and many published examples use the prefix xsi: to stand for this namespace; in practice, any prefix can be used
  • The xsi:schemaLocation and xsi:noNamespaceSchemaLocation attributes can be used in a document to provide hints as to the physical location of schema documents which may be used for assessment
  • MIME Type - application/xml or text/xml
  • Elements in XML Schema
    • schema
      • Description
        • It is the root element in XML Schema
      • Attributes
        • xmlns
          • It specifies the default namespace declaration: unprefixed names used in this schema belong to the namespace given as the value of this attribute
        • schemaLocation
          • Written as xsi:schemaLocation in the document being validated
          • Its value is a pair of values separated by white space
          • First value is the namespace to use
          • Second value is the location of the XML Schema to use for that namespace
        • targetNamespace
        • elementFormDefault
      • Child Elements
        • schema element can contain two types of child elements
        • Simple type of child elements
        • Complex type of child elements
    • Simple Elements
      • Description
        • It is an XML element that contains only text, with a value in one of the 19 primitive data types or in a type derived from them
        • It cannot contain attributes
        • It cannot contain any other elements inside it
      • Attributes
        • name
          • Defines the name of the element
        • type
          • Defines the data type of the element
        • default
          • Defines the default value of the element.
          • Default value is to be used when no other value is specified
        • fixed
          • Defines the fixed value of the element
          • No other value can be defined by the document writer if this attribute is used
    • Complex Elements
      • Description
        • They are elements that can contain attributes and other elements inside them
        • Tag : <complexType>
      • Attributes
      • Child Elements
        • complexContent
          • It signals that we intend to restrict or extend the content model of a complex type
        • any
          • It enables us to extend the XML document with elements not specified by the schema
          • When present in a complex element definition it tells that any other element can be added to this complex element even if the XSD does not define it
        • anyAttribute
          • It enables us to extend the XML document with attributes not specified by the schema
          • When present in a complex element definition it tells that any other attribute can be added to this complex element even if the XSD does not define it
      • Indicators
        • Order Indicators (they are used to define the order of the elements)
          • All
            • The <all> indicator specifies that the child elements can appear in any order, and that each child element must occur only once
            • When using the <all> indicator you can set the minOccurs attribute to 0 or 1
            • The maxOccurs attribute can only be set to 1
          • Choice
            • specifies that either one child element or another can occur
          • Sequence
            • specifies that the child elements must appear in a specific order
        • Occurrence Indicators (they are used to define how often an element can occur)
          • maxOccurs
            • specifies the maximum number of times an element can occur
            • Default value is 1
            • To allow an element to appear an unlimited number of times, use maxOccurs="unbounded"
          • minOccurs
            • specifies the minimum number of times an element can occur
            • Default value is 1
        • Group Indicators (they are used to define related sets of elements)
          • Element Groups
            • syntax : <group name="groupname">
          • Attribute Groups
            • syntax : <attributeGroup name="groupname">
    • Global Elements
      • Elements that are immediate children of the "schema" element
    • Local Elements
      • Elements nested within elements other than schema element
    • Element Substitution
      • It lets one element stand in for another; the following example shows how
      • Eg. 
        • <element name="country" type="xs:string"/>
        • <element name="desh"    substitutionGroup="country"/>
        • In the example above, the "country" element is the head element and the "desh" element is substitutable for "country".
      • block attribute
        • The element can use the block attribute (block="substitution") in its definition to stop element substitution
      • All elements in the substitutionGroup (head element and the substitutable elements) must be declared as global elements, otherwise element substitution will not work
    • Element Reference
      • ref attribute
        • Any global element can be referenced by using the 'ref' attribute
        • This can help in dividing the schema into reusable parts
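
Because a schema is itself an XML document, the ideas above (global and local elements, occurrence indicators, element references) can be explored with an ordinary XML parser. The sketch below assumes Python's standard xml.etree.ElementTree module and a made-up schema with an "address" element. It only reads the schema; it does not validate anything, since the standard library has no XSD validator.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema, for illustration only.
schema_text = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="country" type="xs:string"/>
  <xs:element name="desh" substitutionGroup="country"/>
  <xs:element name="address">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="street" type="xs:string"/>
        <xs:element name="phone" type="xs:string"
                    minOccurs="0" maxOccurs="unbounded"/>
        <xs:element ref="country"/>
      </xs:sequence>
      <xs:attribute name="kind" type="xs:string" use="required"/>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

schema = ET.fromstring(schema_text)

# Global elements are the immediate children of the schema element.
global_elements = schema.findall(XS + "element")
global_names = [e.get("name") for e in global_elements]

# Local elements and references sit inside the complex type.
address = next(e for e in global_elements if e.get("name") == "address")
sequence = address.find(f"{XS}complexType/{XS}sequence")


def label(e):
    """Identify a child by its name attribute, or by its ref target."""
    name = e.get("name")
    return name if name is not None else "ref:" + e.get("ref")


# minOccurs and maxOccurs both default to 1 when they are not written.
occurs = {
    label(e): (e.get("minOccurs", "1"), e.get("maxOccurs", "1"))
    for e in sequence
}

# Attributes are optional unless use="required" is given.
kind_use = address.find(f"{XS}complexType/{XS}attribute").get("use", "optional")

print(global_names)  # ['country', 'desh', 'address']
print(occurs)
print(kind_use)      # required
```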
  • Attributes in XML Schema
    • Description
      • All attributes in XML Schema are defined as a simple type in a way similar to simple elements
      • Syntax : <attribute name="aaa" type="bbb"/>
    • Attributes of the element 'attribute'
      • name
        • Defines the name of the attribute
      • type
        • Defines the data type of the attribute
      • default
        • Defines the default value of the attribute.
        • Default value is to be used when no other value is specified
      • fixed
        • Defines the fixed value of the attribute
        • No other value can be defined by the document writer if this attribute is used
      • use
        • Defines whether the attribute is required or optional
        • All attributes are optional by default
        • Values : optional/required
        • Using use="required" means that supplying the attribute is compulsory
  • XSD provides a set of 19 primitive data types :
    • boolean
      • It is used to specify a true or false value
      • Legal values for boolean are following :
        • true
        • false
        • 1 (which indicates true)
        • 0 (which indicates false)
    • string
      • It can contain the following things
        • characters
        • line feeds
        • carriage returns
        • tab characters
      • The following is a list of data types that derive from string primitive data type
        • ENTITIES
        • ENTITY
        • ID
          • A string that represents the ID attribute in XML (only used with schema attributes)
        • IDREF
          • A string that represents the IDREF attribute in XML (only used with schema attributes)
        • IDREFS
        • language
          • A string that contains a valid language id
        • Name
          • A string that contains a valid XML name
        • NCName
        • NMTOKEN
          • A string that represents the NMTOKEN attribute in XML (only used with schema attributes)
        • NMTOKENS
        • normalizedString
          • A string that does not contain line feeds, carriage returns, or tabs
        • token
          • A string that does not contain line feeds, carriage returns, tabs, leading or trailing spaces, or multiple spaces
      • The following restrictions can be imposed on string
        • enumeration
        • length
        • maxLength
        • minLength
        • pattern (NMTOKENS, IDREFS, and ENTITIES cannot use this constraint)
        • whiteSpace
    • decimal
      • It is used to specify a numeric value
      • The following data types have been derived from the decimal primitive data type
        • byte
          • A signed 8-bit integer
        • int
          • A signed 32-bit integer
        • integer
          • An integer value
        • long
          • A signed 64-bit integer
        • negativeInteger
          • An integer containing only negative values (..,-2,-1)
        • nonNegativeInteger
          • An integer containing only non-negative values (0,1,2,..)
        • nonPositiveInteger
          • An integer containing only non-positive values (..,-2,-1,0)
        • positiveInteger
          • An integer containing only positive values (1,2,..)
        • short
          • A signed 16-bit integer
        • unsignedLong
          • An unsigned 64-bit integer
        • unsignedInt
          • An unsigned 32-bit integer
        • unsignedShort
          • An unsigned 16-bit integer
        • unsignedByte
          • An unsigned 8-bit integer
      • The restrictions that can be placed on this data type and the ones derived from it are
        • enumeration
        • fractionDigits
        • maxExclusive
        • maxInclusive
        • minExclusive
        • minInclusive
        • pattern
        • totalDigits
        • whiteSpace
    • double
    • float
    • anyURI
      • It is used to specify a URI
    • QName
    • hexBinary
      • It is used for expressing hexadecimal-encoded binary data.
    • base64Binary
      • It is used for expressing Base64-encoded binary data.
    • duration
      • Defines a time interval
      • It uses this format for writing time interval - "PnYnMnDTnHnMnS" where
        • P indicates the period (required)
        • nY indicates the number of years
        • nM indicates the number of months
        • nD indicates the number of days
        • T indicates the start of a time section (required if you are going to specify hours, minutes, or seconds)
        • nH indicates the number of hours
        • nM indicates the number of minutes
        • nS indicates the number of seconds
      • This is how it may look in a document : <time-passed>P5Y2M10DT15H</time-passed>
      • To use a negative duration, place a minus sign before the P, eg. -P10D
      • Restrictions that can be placed on it
        • enumeration
        • maxExclusive
        • maxInclusive
        • minExclusive
        • minInclusive
        • pattern
        • whiteSpace
    • date
      • It specifies dates
      • The following format is used by it for writing dates
        • "YYYY-MM-DD"
        • YYYY indicates the year
        • MM indicates the month
        • DD indicates the day
        • All the three are required for a valid date
      • To specify a time zone, either enter the date in UTC by adding a "Z" after it, eg. 2002-09-24Z
      • Or specify an offset from UTC by adding a positive or negative time after the date, eg. 2002-09-24+06:00
      • Restrictions that can be placed on it
        • enumeration
        • maxExclusive
        • maxInclusive
        • minExclusive
        • minInclusive
        • pattern
        • whiteSpace
    • time
      • It specifies the time
      • The following format is used by it for writing time
        • "hh:mm:ss"
        • hh indicates the hour
        • mm indicates the minute
        • ss indicates the second
      • To specify a time zone, either enter the time in UTC by adding a "Z" after it, eg. 09:30:10Z
      • Or specify an offset from UTC by adding a positive or negative time after the time, eg. 09:30:10+06:00
      • Restrictions that can be placed on it
        • enumeration
        • maxExclusive
        • maxInclusive
        • minExclusive
        • minInclusive
        • pattern
        • whiteSpace
    • dateTime
      • It can be used to specify both date and time in one element
      • The following format is used by it for writing date and time
        • "YYYY-MM-DDThh:mm:ss"
        • YYYY indicates the year
        • MM indicates the month
        • DD indicates the day
        • T indicates the start of the required time section
        • hh indicates the hour
        • mm indicates the minute
        • ss indicates the second
      • To specify a time zone, either enter the date and time in UTC by adding a "Z" after it, eg. 2002-05-30T09:30:10Z
      • Or specify an offset from UTC by adding a positive or negative time after it, eg. 2002-05-30T09:30:10+06:00
      • Restrictions that can be placed on it
        • enumeration
        • maxExclusive
        • maxInclusive
        • minExclusive
        • minInclusive
        • pattern
        • whiteSpace
    • gYear
      • Part of a date - the year (YYYY)
      • Restrictions that can be placed on it
        • enumeration
        • maxExclusive
        • maxInclusive
        • minExclusive
        • minInclusive
        • pattern
        • whiteSpace
    • gYearMonth
      • Part of a date - the year and month (YYYY-MM)
      • Restrictions that can be placed on it
        • enumeration
        • maxExclusive
        • maxInclusive
        • minExclusive
        • minInclusive
        • pattern
        • whiteSpace
    • gMonth
      • Part of a date - the month (MM)
      • Restrictions that can be placed on it
        • enumeration
        • maxExclusive
        • maxInclusive
        • minExclusive
        • minInclusive
        • pattern
        • whiteSpace
    • gMonthDay
      • Part of a date - the month and day (MM-DD)
      • Restrictions that can be placed on it
        • enumeration
        • maxExclusive
        • maxInclusive
        • minExclusive
        • minInclusive
        • pattern
        • whiteSpace
    • gDay
      • Part of a date - the day (DD)
      • Restrictions that can be placed on it
        • enumeration
        • maxExclusive
        • maxInclusive
        • minExclusive
        • minInclusive
        • pattern
        • whiteSpace
    • NOTATION
  • XSD allows new data types to be constructed from these primitives by three mechanisms: 
    • restriction (reducing the set of permitted values)
    • list (allowing a sequence of values)
    • union (allowing a choice of values from several types).
  • Twenty-five derived types are defined within the specification itself, and further derived types can be defined by users in their own schema.
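
As an illustration of how a primitive type constrains its written form, the "PnYnMnDTnHnMnS" format described for duration can be checked with a regular expression. This is a simplified sketch in Python, not a full implementation of the type; the function name parse_duration and the shape of its result are choices made for this example.

```python
import re

# Optional minus sign, the required P, then the date part and an
# optional time part introduced by T.
DURATION = re.compile(
    r"(?P<sign>-)?P"
    r"(?:(?P<years>\d+)Y)?"
    r"(?:(?P<months>\d+)M)?"
    r"(?:(?P<days>\d+)D)?"
    r"(?:T"
    r"(?:(?P<hours>\d+)H)?"
    r"(?:(?P<minutes>\d+)M)?"
    r"(?:(?P<seconds>\d+(?:\.\d+)?)S)?"
    r")?"
)


def parse_duration(text):
    """Return the parts of a duration as a dict, or None if it is invalid."""
    m = DURATION.fullmatch(text)
    if m is None:
        return None
    parts = {k: v for k, v in m.groupdict().items()
             if k != "sign" and v is not None}
    # At least one part is needed, and T must be followed by a time part.
    if not parts or text.endswith("T"):
        return None
    return {"negative": m.group("sign") is not None, **parts}


print(parse_duration("P5Y2M10DT15H"))
print(parse_duration("-P10D"))
print(parse_duration("PT15M"))  # the M after T means minutes, not months
print(parse_duration("P"))      # None
```

The same letter M means months before the T and minutes after it, which is why the T separator is required whenever a time part is present.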
  • Facets
    • Restrictions on XML elements are called facets
    • Types of restrictions allowed
      • enumeration
        • List of acceptable values
      • fractionDigits
        • Maximum number of decimal places allowed
        • Must be equal to or greater than 0
      • length
        • Exact number of characters or list items allowed
        • Must be equal to or greater than 0
      • maxExclusive
        • Upper bounds for numeric values (the value must be less than this value)
      • maxInclusive
        • upper bounds for numeric values (the value must be less than or equal to this value)
      • maxLength
        • Maximum number of characters or list items allowed
        • Must be equal to or greater than 0
      • minExclusive
        • Lower bounds for numeric values (the value must be greater than this value)
      • minInclusive
        • Lower bounds for numeric values (the value must be greater than or equal to this value)
      • minLength
        • Minimum number of characters or list items allowed
        • Must be equal to or greater than 0
      • pattern
        • It defines, as a regular expression, the exact sequence of characters that is acceptable
        • It allows for imposing relatively sophisticated restrictions
      • totalDigits
        • Specifies the maximum number of digits allowed
        • Must be greater than 0
      • whiteSpace
        • Specifies how white space is handled
        • White space includes
          • line feeds
          • tabs
          • spaces
          • carriage returns
        • Possible restrictions of white spaces
          • preserve
            • It tells that white spaces should not be removed
          • replace
            • It tells that all tabs, line feeds and carriage returns should be replaced by spaces
          • collapse
            • It tells that all white space characters should be replaced with spaces, leading and trailing spaces should be removed, and multiple spaces should be reduced to a single space
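
The three whiteSpace settings described above can be imitated in a few lines. The following Python sketch mimics what a schema processor does to a value before checking it; the function name apply_white_space and the sample value are made up for illustration.

```python
import re


def apply_white_space(value: str, mode: str) -> str:
    """Normalize value the way the whiteSpace facet describes."""
    if mode == "preserve":
        return value
    # replace: every tab, line feed and carriage return becomes one space
    replaced = re.sub(r"[\t\n\r]", " ", value)
    if mode == "replace":
        return replaced
    if mode == "collapse":
        # collapse: additionally squeeze runs of spaces and trim both ends
        return re.sub(r" +", " ", replaced).strip(" ")
    raise ValueError(f"unknown whiteSpace mode: {mode!r}")


raw = "  one\ttwo\n\n three  "
print(repr(apply_white_space(raw, "preserve")))  # '  one\ttwo\n\n three  '
print(repr(apply_white_space(raw, "replace")))   # '  one two   three  '
print(repr(apply_white_space(raw, "collapse")))  # 'one two three'
```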
  • Benefits of XSD
    • Gives ability to create markup languages very fast 
    • Defines vocabulary
    • Documentation generation
    • Code generation
    • Ability to use existing XML editor to edit Schema files
    • Ability to use existing XML parser to parse Schema files
    • Ability to manipulate Schema with the XML DOM
    • Ability to transform Schema with XSLT
    • It is extensible which allows one to
      • Reuse Schema in other Schemas
      • Create own data types derived from the standard types
      • Reference multiple schemas in the same document
  • (Work for SKJ Team) Find more about
    • Elements
    • Attributes

XSLT Essentials
  • XSLT is the most important part of XSL
  • XSLT is used to transform an XML document into another XML document, or into another type of document that is recognized by a browser, such as HTML or XHTML. Normally XSLT does this by transforming each XML element into an (X)HTML element.
  • Supported by almost all browsers
  • The root element that declares the document to be an XSL style sheet is <xsl:stylesheet> or <xsl:transform>
  • The correct way to declare an XSL style sheet according to the W3C XSLT Recommendation is:
    • <xsl:transform version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  • Steps to Create XSLT
    • Start with a raw XML document
    • Create an XSL style sheet
    • Link the XSL style sheet to the XML document
  • The <xsl:template> element is used to build templates.

    The match attribute is used to associate a template with an XML element. The match attribute can also be used to define a template for the entire XML document. The value of the match attribute is an XPath expression (eg. match="/" selects the whole document).
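
A style sheet is itself an XML document, so its root element and templates can be inspected with an ordinary parser. The sketch below assumes Python's standard xml.etree.ElementTree module and a made-up style sheet. It does not run the transformation, because the Python standard library has no XSLT processor; a browser or a separate XSLT engine would be needed for that.

```python
import xml.etree.ElementTree as ET

XSL = "{http://www.w3.org/1999/XSL/Transform}"

# Hypothetical style sheet: one template for the whole document ("/")
# and one for <to> elements.
stylesheet_text = """<xsl:transform version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html><body><xsl:apply-templates/></body></html>
  </xsl:template>
  <xsl:template match="to">
    <p><xsl:value-of select="."/></p>
  </xsl:template>
</xsl:transform>"""

stylesheet = ET.fromstring(stylesheet_text)

root_name = stylesheet.tag
matches = [t.get("match") for t in stylesheet.findall(XSL + "template")]

print(root_name)  # {http://www.w3.org/1999/XSL/Transform}transform
print(matches)    # ['/', 'to']
```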

