DTD-Entities

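Entities are variables used to define shortcuts to standard text or special characters. Entity references are references to entities. Entities can be declared internally, with their value written in the declaration itself, or externally, with their value stored in a separate resource.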

Internal Entity Declaration

Syntax

<!ENTITY entity-name "entity-value">
DTD Example:
<!ENTITY name "Amit">
<!ENTITY company "RoseIndia">
XML example:
<Profile>&name;&company;</Profile>
Note: An entity reference has three parts: an ampersand (&), an entity name, and a semicolon (;).
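To see the expansion in practice, here is a minimal sketch using Python's standard xml.etree.ElementTree module (our choice of parser for illustration; any conforming XML parser expands internal entities the same way). It combines the DTD example and the XML example above into one document with an internal DTD subset:

```python
import xml.etree.ElementTree as ET

# The DTD example and the XML example from above, combined into one
# document whose internal DTD subset declares the two entities.
document = """<?xml version="1.0"?>
<!DOCTYPE Profile [
  <!ENTITY name "Amit">
  <!ENTITY company "RoseIndia">
]>
<Profile>&name;&company;</Profile>"""

profile = ET.fromstring(document)
# Each &name; and &company; reference has been replaced by its entity value.
print(profile.text)  # AmitRoseIndia
```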

External Entity Declaration

Syntax

<!ENTITY entity-name SYSTEM "URI/URL">
DTD Example:
<!ENTITY name SYSTEM "http://www.roseindia.net/entities.dtd">
<!ENTITY company SYSTEM "http://www.roseindia.net/entities.dtd">
XML example:
<Profile>&name;&company;</Profile>
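External entities are only fetched if the parser is told to load them. The sketch below uses Python's standard xml.sax module, which (since Python 3.7.1) does not load external general entities unless the feature_external_ges feature is switched on. It assumes two hypothetical local files, name.txt and company.txt, in place of the remote URLs above, and a hypothetical TextCollector handler that simply gathers the text:

```python
import os
import tempfile
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges


class TextCollector(ContentHandler):
    """Hypothetical handler: collects all character data the parser reports."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def characters(self, content):
        self.parts.append(content)


def parse_profile(xml_path):
    parser = xml.sax.make_parser()
    # External general entities are off by default; enable them explicitly
    # (only do this for documents you trust).
    parser.setFeature(feature_external_ges, True)
    handler = TextCollector()
    parser.setContentHandler(handler)
    parser.parse(xml_path)
    return "".join(handler.parts)


with tempfile.TemporaryDirectory() as folder:
    # Hypothetical local files standing in for the external entity URLs.
    for filename, text in [("name.txt", "Amit"), ("company.txt", "RoseIndia")]:
        with open(os.path.join(folder, filename), "w", encoding="utf-8") as f:
            f.write(text)
    xml_path = os.path.join(folder, "profile.xml")
    with open(xml_path, "w", encoding="utf-8") as f:
        f.write("""<?xml version="1.0"?>
<!DOCTYPE Profile [
  <!ENTITY name SYSTEM "name.txt">
  <!ENTITY company SYSTEM "company.txt">
]>
<Profile>&name;&company;</Profile>""")
    result = parse_profile(xml_path)

print(result)  # AmitRoseIndia
```

Loading external entities from untrusted documents is a known security risk (XML external entity attacks), which is why parsers tend to leave this feature switched off by default.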

DTD-Attributes

In a DTD, attributes are declared with an ATTLIST declaration.
Declaring Attributes
The ATTLIST declaration names the element that has the attribute, the attribute name, the attribute type, and the attribute's default value. An attribute declaration has the following syntax:

<!ATTLIST element-name attribute-name attribute-type default-value>
DTD example:
<!ATTLIST receipt type CDATA "check">
XML example:
<receipt type="check" />
Attribute-type
The attribute-type can be one of the following:
Type            Description
CDATA           The value is character data
(en1|en2|..)    The value must be one from an enumerated list
ID              The value is a unique id
IDREF           The value is the id of another element
IDREFS          The value is a list of other ids
NMTOKEN         The value is a valid XML name token
NMTOKENS        The value is a list of valid XML name tokens
ENTITY          The value is the name of an unparsed entity
ENTITIES        The value is a list of unparsed entity names
NOTATION        The value is a name of a notation
xml:            Predefined attributes whose names start with xml:, such as xml:lang and xml:space
Default-value
The default-value can be one of the following:
Value           Explanation
value           The default value of the attribute
#REQUIRED       The attribute is required
#IMPLIED        The attribute is optional and has no default value
#FIXED value    The attribute value is fixed to the given value

A Default Attribute Value
DTD Example:
<!ELEMENT Scale EMPTY>
<!ATTLIST Scale length CDATA "0">
In the example above, the DTD defines an empty "Scale" element with a "length" attribute of type CDATA. If no length is specified, the attribute has a default value of 0.
Valid XML:
<Scale length="100" />
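Python's xml.etree.ElementTree (built on the non-validating expat parser) does not validate a document, but it does read the internal DTD subset and fills in declared default attribute values. A minimal sketch with the Scale example, placed in an internal DTD for illustration:

```python
import xml.etree.ElementTree as ET

# Internal DTD declaring the empty Scale element and its defaulted attribute.
DTD = """<!DOCTYPE Scale [
  <!ELEMENT Scale EMPTY>
  <!ATTLIST Scale length CDATA "0">
]>"""

without_length = ET.fromstring(DTD + "<Scale />")
with_length = ET.fromstring(DTD + '<Scale length="100" />')

print(without_length.get("length"))  # 0   (supplied from the DTD default)
print(with_length.get("length"))     # 100 (value written in the document)
```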
REQUIRED
Syntax
<!ATTLIST element-name attribute-name attribute-type #REQUIRED>
DTD Example
<!ATTLIST person number CDATA #REQUIRED>
Valid XML:
<person number="5677" />
Invalid XML:
<person />
Use the #REQUIRED keyword if there is no sensible default value but the attribute must always be present.
IMPLIED
Syntax
<!ATTLIST element-name attribute-name attribute-type #IMPLIED>
DTD Example
<!ATTLIST emergency no. CDATA #IMPLIED>
Valid XML:
<emergency no.="555-667788" />
Valid XML:
<emergency/>
Use the #IMPLIED keyword if the author should not be forced to include the attribute and there is no sensible default value.
FIXED
Syntax
<!ATTLIST element-name attribute-name attribute-type #FIXED "value">
DTD Example
<!ATTLIST Client company CDATA #FIXED "RoseIndia">
Valid XML:
<Client company="RoseIndia" />
Invalid XML:
<Client company="LotusIndia" />
Use the #FIXED keyword when you want an attribute to have a fixed value without allowing the author to change it. If an author includes another value, a validating XML parser will report an error.
Enumerated Attribute Values
Syntax
<!ATTLIST element-name attribute-name (en1|en2|..) default-value>
DTD Example
<!ATTLIST receipt type (check|cash) "cash">
XML example:
<receipt type="check" />
or
<receipt type="cash" />
Use enumerated attribute values when you want the attribute value to be one of a fixed set of legal values.
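Python's standard library parsers do not enforce ATTLIST rules such as #REQUIRED or #FIXED; that is the job of a validating parser. To show what such a parser checks, here is a toy sketch. The ATTLISTS table and the check_attributes function are hypothetical names, the table is written by hand rather than read from a DTD, and the attribute name company on Client is assumed for the #FIXED example:

```python
import xml.etree.ElementTree as ET

# Hypothetical hand-written copy of the ATTLIST declarations in this section:
# element -> {attribute: (allowed values or None, default rule)}.
# A default rule is "#REQUIRED", "#IMPLIED", ("#FIXED", value) or a plain
# default value.
ATTLISTS = {
    "person": {"number": (None, "#REQUIRED")},
    "emergency": {"no.": (None, "#IMPLIED")},
    "Client": {"company": (None, ("#FIXED", "RoseIndia"))},
    "receipt": {"type": (("check", "cash"), "cash")},
}


def check_attributes(element):
    """Return a list of problems for one element; an empty list means valid."""
    problems = []
    for name, (allowed, rule) in ATTLISTS.get(element.tag, {}).items():
        value = element.get(name)
        if value is None:
            # Only #REQUIRED attributes must be written; the others are
            # either optional (#IMPLIED) or get their value from the DTD.
            if rule == "#REQUIRED":
                problems.append(f"<{element.tag}> is missing required {name!r}")
            continue
        if isinstance(rule, tuple) and value != rule[1]:
            problems.append(f"{name!r} on <{element.tag}> must be {rule[1]!r}")
        if allowed is not None and value not in allowed:
            problems.append(f"{name!r} on <{element.tag}> must be one of {allowed}")
    return problems


for snippet in ['<person number="5677" />', "<person />", "<emergency />",
                '<Client company="LotusIndia" />', '<receipt type="card" />']:
    print(snippet, check_attributes(ET.fromstring(snippet)) or "valid")
```

A real validating parser reads these rules from the DTD itself and also checks element content, IDs and entity names; the toy above covers only the four default rules discussed here.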

DTD - XML Constituent

The building blocks of XML documents, as seen from a DTD.
Seen from a DTD point of view, all XML documents are made up of the following building blocks:
  • Elements
  • Attributes
  • Entities
  • PCDATA
  • CDATA
A brief explanation of each building block:
Elements
Elements are the main building blocks of both XML and HTML documents.
Elements can contain text, other elements, or be empty. For example:
<To>Rohan</To>
<From>Amit</From>

Attributes
Attributes provide extra information about elements.
Attributes are always placed inside the opening tag of an element. Attributes always come in name/value pairs. The following "img" element has additional information about a source file:
<img src="computer.gif" />
The name of the element is "img". The name of the attribute is "src". The value of the attribute is "computer.gif". Since the element itself is empty, it is closed with "/>".
Entities:
Entities are expanded when a document is parsed by an XML parser. Some characters have a special meaning in XML, such as the less-than sign (<), which marks the start of an XML tag, and the greater-than sign (>), which marks the end of an XML tag.
The following entities are predefined in XML:
Entity References    Character
&lt;     <
&gt;     >
&amp;    &
&quot;     "
&apos;     '
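When XML text is generated by a program, these replacements are usually done by a helper function. A minimal sketch with Python's xml.sax.saxutils module: escape() handles &, < and > by default, and the extra mapping passed in (our choice) also covers the two quote characters:

```python
from xml.sax.saxutils import escape, unescape

# escape() always replaces &, < and >; this extra mapping adds the two
# quote characters so that all five predefined entities are used.
QUOTES = {'"': "&quot;", "'": "&apos;"}

raw = 'Tom & Jerry\'s "<show>"'
escaped = escape(raw, QUOTES)
print(escaped)  # Tom &amp; Jerry&apos;s &quot;&lt;show&gt;&quot;

# unescape() reverses the replacements (it handles &amp; last).
restored = unescape(escaped, {"&quot;": '"', "&apos;": "'"})
print(restored == raw)  # True
```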
 
PCDATA:
PCDATA means parsed character data. It can be thought of as the character data (text) found between the start tag and the end tag of an XML element.
PCDATA is text that will be parsed by a parser. The parser checks the text for entities and markup.
Tags inside the text will be treated as markup and entities will be expanded. Therefore, parsed character data should not contain any &, <, or > characters; these should be represented by the &amp;, &lt;, and &gt; entities, respectively.
CDATA:
CDATA is character data that will NOT be parsed by a parser. Tags inside the text will NOT be treated as markup and entities will not be expanded.
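The difference is easy to observe with a parser. Inside a document, unparsed character data is written as a CDATA section, <![CDATA[ ... ]]>. A minimal sketch using Python's xml.etree.ElementTree, where both elements end up with the same text:

```python
import xml.etree.ElementTree as ET

# Parsed character data: markup characters must be written as entities,
# and the parser expands them back into the characters themselves.
pcdata = ET.fromstring("<Body>5 &lt; 6 &amp;&amp; 7 &gt; 3</Body>")

# CDATA section: the same characters are written literally, because the
# parser does not look for markup or entities inside it.
cdata = ET.fromstring("<Body><![CDATA[5 < 6 && 7 > 3]]></Body>")

print(pcdata.text)                # 5 < 6 && 7 > 3
print(pcdata.text == cdata.text)  # True
```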

DTD: Document Type Definition

A Document Type Definition (DTD) defines the legal building blocks of an XML document. It defines the document structure with a list of legal elements and attributes.
A DTD can be declared inside an XML document, or referenced from an external file.

Internal DTD 

If the DTD is defined inside the XML document, it should be wrapped in a DOCTYPE definition with the following syntax:
<!DOCTYPE root-element [element-declarations]>



Example of an XML document with an internal DTD: E-mail.xml
<?xml version="1.0"?>
<!DOCTYPE E-mail [
  <!ELEMENT E-mail (To,From,Subject,Body)>
  <!ELEMENT To (#PCDATA)>
  <!ELEMENT From (#PCDATA)>
  <!ELEMENT Subject (#PCDATA)>
  <!ELEMENT Body (#PCDATA)>
]>
<E-mail>
<To>Rohan</To>
<From>Amit</From>
<Subject>Surprise....</Subject>
<Body>Be ready for a cruise...i will 
catch u  tonight</Body>
</E-mail>

Open the file E-mail.xml in a web browser to see how it is displayed.
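A browser or a non-validating parser such as Python's xml.etree.ElementTree reads the internal DTD but does not check the document against it. The sketch below parses the E-mail example and then applies a toy check, via a hypothetical matches_sequence helper, of the sequence content model (To,From,Subject,Body). It only handles a plain comma-separated sequence, not the ?, *, + or | operators, and is no replacement for a validating parser:

```python
import xml.etree.ElementTree as ET

EMAIL = """<?xml version="1.0"?>
<!DOCTYPE E-mail [
  <!ELEMENT E-mail (To,From,Subject,Body)>
  <!ELEMENT To (#PCDATA)>
  <!ELEMENT From (#PCDATA)>
  <!ELEMENT Subject (#PCDATA)>
  <!ELEMENT Body (#PCDATA)>
]>
<E-mail>
<To>Rohan</To>
<From>Amit</From>
<Subject>Surprise....</Subject>
<Body>Be ready for a cruise...i will catch u tonight</Body>
</E-mail>"""


def matches_sequence(element, expected):
    """Toy check of a plain sequence content model such as (To,From,Subject,Body).

    Child element names are compared case-sensitively, as XML requires.
    """
    return [child.tag for child in element] == expected


root = ET.fromstring(EMAIL)
print(matches_sequence(root, ["To", "From", "Subject", "Body"]))  # True
# A model that says "subject" (lowercase) does not match <Subject>.
print(matches_sequence(root, ["To", "From", "subject", "Body"]))  # False
```

Because element names are case sensitive, a content model that names subject where the document uses Subject can never be satisfied.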
External DTD 
If the DTD is defined in an external file, it should be wrapped in a DOCTYPE definition with the following syntax:
<!DOCTYPE root-element SYSTEM "filename">
This is the same XML document as above, but with an external DTD: E-mail.xml
<?xml version="1.0"?>
<!DOCTYPE E-mail SYSTEM 
"E-mail.dtd">
<E-mail>
<To>Rohan</To>
<From>Amit</From>
<Subject>Surprise....</Subject>
<Body>Be ready for a cruise...i will 
catch u  tonight</Body>
</E-mail>
And this is the file "E-mail.dtd" which contains the following DTD:
<!ELEMENT E-mail (To,From,Subject,Body)>
<!ELEMENT To (#PCDATA)>
<!ELEMENT From (#PCDATA)>
<!ELEMENT Subject (#PCDATA)>
<!ELEMENT Body (#PCDATA)>

Open the file E-mail.xml in a web browser to see how it is displayed.
Why Use a DTD?
  • With a DTD, an XML file can carry a description of its own format.
  • With a DTD, independent groups of people can agree on a standard DTD for interchanging data.
  • An application can use a standard DTD to verify that the data it receives from the outside world is valid.
  • You can also use a DTD to verify your own data.

XML Validation

XML with correct syntax is Well Formed XML.
XML validated against a DTD or an XML Schema is valid XML.
Well Formed XML Documents
A "Well Formed" XML document has correct XML syntax.
A "Well Formed" XML document is a document that conforms to the XML syntax rules:
  • XML documents must have a root element
  • XML elements must have a closing tag
  • XML tags are case sensitive
  • XML elements must be properly nested
  • XML attribute values must always be quoted

<?xml version="1.0" encoding="ISO-8859-1"?>
<E-mail>
<To>Rohan</To>
<From>Amit</From>
<Subject>Surprise....</Subject>
<Body>Be ready for a cruise...i will catch u  tonight</Body>
</E-mail>
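Checking well-formedness does not need a DTD: any XML parser rejects a document that breaks the rules above. A minimal sketch with Python's xml.etree.ElementTree, using a hypothetical is_well_formed helper and small samples that each break one rule:

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if a (non-validating) XML parser accepts the text."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


samples = {
    "correct": "<E-mail><To>Rohan</To><From>Amit</From></E-mail>",
    "missing closing tag": "<E-mail><To>Rohan</E-mail>",
    "case mismatch": "<E-mail><To>Rohan</to></E-mail>",
    "improper nesting": "<E-mail><b><i>Hi</b></i></E-mail>",
    "unquoted attribute": "<E-mail date=12/11/2002/>",
    "no single root element": "<To>Rohan</To><From>Amit</From>",
}

for label, text in samples.items():
    print(f"{label}: {is_well_formed(text)}")
```

This only checks well-formedness. Validating against a DTD needs a validating parser; Python's standard library does not include one, but third-party packages such as lxml do.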

Valid XML Documents:
A "Valid" XML document is a "Well Formed" XML document that also conforms to the rules of a Document Type Definition (DTD) or an XML Schema.
The following XML document is validated against a DTD; notice the DOCTYPE declaration, which names the DTD file.
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE E-mail SYSTEM "InternalE-mail.dtd">
<E-mail>
<To>Rohan</To>
<From>Amit</From>
<Subject>Surprise....</Subject>
<Body>Be ready for a cruise...i will catch u  tonight</Body>
</E-mail>

XML DTD
The purpose of a DTD is to define the legal building blocks of an XML document. It defines the document structure with a list of legal elements and attributes.
XML Schema 
XML Schema is an XML-based alternative to DTDs. It is a W3C Recommendation.

XML Syntax Rules

The syntax rules for XML are very simple and strict. They are easy to learn and use, and because of this, creating software that can read and manipulate XML is easy. XML lets you create your own tags.
Note - XML documents use a simple, self-describing syntax.
Let's develop a simple XML document:
<?xml version="1.0" encoding="ISO-8859-1"?>
<E-mail>
<To>Rohan</To>
<From>Amit</From>
<Subject>Surprise....</Subject>
<Body>Be ready for a cruise...i will catch u  tonight</Body>
</E-mail>

The XML declaration: if present, it must be the first line of the XML document.
The XML declaration should always be included. It defines the XML version and the character encoding used in the document. In this case the document conforms to the 1.0 specification of  XML and uses the ISO-8859-1 (Latin-1/West European) character set.
<?xml version="1.0" encoding="ISO-8859-1"?>
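The declared encoding tells the parser how to turn the file's bytes into characters. A minimal sketch with Python's xml.etree.ElementTree, using a hypothetical Latin-1 encoded byte string:

```python
import xml.etree.ElementTree as ET

# 0xE9 is the letter e with an acute accent in ISO-8859-1 (Latin-1).
latin1_bytes = b'<?xml version="1.0" encoding="ISO-8859-1"?><Subject>Caf\xe9</Subject>'
subject = ET.fromstring(latin1_bytes)
print(subject.text == "Caf\u00e9")  # True

# Without the declaration the parser assumes UTF-8, where the lone byte
# 0xE9 is not a valid character, so parsing fails.
try:
    ET.fromstring(b"<Subject>Caf\xe9</Subject>")
    undeclared_ok = True
except ET.ParseError:
    undeclared_ok = False
print(undeclared_ok)  # False
```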
Root Element: The next line defines the first element of the document. It is called the root element:
<E-mail>
Child Elements: The next 4 lines describe the four child elements of the root (To, From, Subject and Body).
<To>Rohan</To>
<From>Amit</From>
<Subject>Surprise....</Subject>
<Body>Be ready for a cruise...i will catch u  tonight</Body>
Finally, the last line defines the end of the root element:
</E-mail>
You can tell from this example that the XML document contains an e-mail to Rohan from Amit. Don't you agree that XML is quite self-descriptive?
Now let's discuss its syntax rules, which are very simple to learn.

All XML elements must have a closing tag

In XML all the elements must have a closing tag like this:
<To>Rohan</To>
<From>Amit</From>

XML tags are case sensitive

XML tags are case sensitive. The tag <To> is different from the tag <to>. Hence, opening and closing tags must be written with the same case:
<To>Rohan</To>
<to>Rohan</to>

XML Elements Must be Properly Nested

In XML, all elements must be properly nested within each other in a logical order; improperly nested tags are an error. For example:
<b><i>Hi , how are you.....</i></b>

XML Documents Must Have a Root Element

All XML documents must contain a single tag pair that defines a root element. All other elements must be written within this root element. Every element can have sub-elements, called child elements. Sub-elements must be correctly nested within their parent element:
<root>
  <child>
    <subchild>.....</subchild>
  </child>
</root> 

Always Quote the XML Attribute Values 

In XML, attribute values must always be quoted. XML elements can have attributes in name/value pairs, just like in HTML. Look at the two XML documents below.
The error in the first document is that the version and date attribute values are not quoted.
<?xml version=1.0 encoding="ISO-8859-1"?>
<E-mail date=12/11/2002/>
The second document is correct:
<?xml version="1.0" encoding="ISO-8859-1"?>
<E-mail date="12/11/2002"/>

With XML, White Space is Preserved

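With XML, the white space in a document is preserved. Unlike HTML, which collapses a run of spaces into a single space when displaying text, an XML parser passes all white space in element content through to the application.
So a sentence like this:   Hello              How are you, is passed on like this: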
Hello              How are you,
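A quick way to confirm this is a minimal sketch with Python's xml.etree.ElementTree: the element's text keeps every space of a sentence like the one above.

```python
import xml.etree.ElementTree as ET

# "Hello", a run of 14 spaces, then "How are you,".
sentence = "Hello" + " " * 14 + "How are you,"
element = ET.fromstring(f"<Body>{sentence}</Body>")

print(element.text == sentence)  # True: no spaces were collapsed
print(len(element.text))         # 31
```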

Comparing XML with HTML

The Main Differences Between XML and HTML

XML is designed to carry data.

XML describes and focuses on the data, while HTML displays data and focuses on how it looks. HTML is about displaying information; XML is about describing information. Today, XML is one of the most common formats for data manipulation and data transmission.

XML is used to store data in files and to share data between diverse applications. Unlike an HTML document, where data and display logic live in the same file, an XML document holds only data. Different presentation logic can be applied to display the XML data in the required format, which makes XML well suited to exchanging information.
XML tags are not predefined. You must "invent" your own tags.


XML allows the user to define his own tags and document structure.

XML Tags are Case Sensitive

Unlike HTML, XML tags are case sensitive. In HTML the following will work:
<To>This is incorrect</to>

In XML opening and closing tags must therefore be written with the same case:

<to>This is correct</to>

In HTML, improperly nested elements like the following are often tolerated:
<b><i>This text is bold and italic</b></i>

In XML all elements must be properly nested within each other like this:

<b><i>This text is bold and italic</i></b>

XML is a Complement to HTML

XML is not a replacement for HTML.

It is important to understand that XML is not a replacement for HTML. In Web development it is most likely that XML will be used to describe the data, while HTML will be used to format and display the same data.

XML: An Introduction - Brief History

In the 1970’s, Charles Goldfarb, Ed Mosher and Ray Lorie invented GML at IBM. GML was used to describe a way of marking up technical documents with structural tags. The initials stood for Goldfarb, Mosher and Lorie.

Goldfarb coined the term “markup language” so that the initials GML could also stand for Generalized Markup Language. GML later evolved into the Standard Generalized Markup Language (SGML).

In 1986, SGML was adopted by the ISO.

SGML is just a specification for defining markup languages.

SGML (Standard Generalized Markup Language) is the parent of markup languages such as HTML, XML, XHTML and WML.

Once SGML became an international standard in 1986, it was used to define other languages, including HTML, which became very popular on the web. HTML was created by Tim Berners-Lee in 1991.

SGML is very powerful but complex, whereas HTML is very easy but limited to a fixed set of tags. This created the need for a language that was as powerful as SGML and at the same time as simple as HTML. XML fills this gap.

The development of XML started in 1996, when Jon Bosak of Sun Microsystems and his team began work on a project to simplify SGML for use on the web. They took the best of SGML and produced something powerful but much simpler to use.

The World Wide Web Consortium (W3C) also contributed to the creation and development of the XML standard. The XML specification was laid down in just 26 pages.

How Can You Use XML?

A Few Applications of XML

Although there are countless applications that use XML, here are a few examples of the applications that are making use of this technology.

Refined search results - With XML-specific tags, search engines can give users more refined search results. A search engine seeks the term in the tags, rather than the entire document, giving the user more precise results.

EDI Transactions - XML has made electronic data interchange (EDI) transactions accessible to a broader set of users. XML allows data to be exchanged, regardless of the computing systems or accounting applications being used.

Cell Phones - XML data is sent to some cell phones and is then formatted, according to the specification of the cell phone software designer, to display text, images and even play sounds!

File Converters - Many applications have been written to convert existing documents into the XML standard. An example is a PDF to XML converter.

VoiceXML - An XML-based language for interactive voice dialogs, which lets a user listen to information, and respond to it, by voice.

XML--What is XML?

"XML is a cross-platform, software and hardware independent tool for transmitting information"

XML is a W3C Recommendation. It stands for Extensible Markup Language. It is a markup language, much like HTML, designed to describe data. In XML, tags are not predefined: you define your own tags, and you can describe the document structure with a Document Type Definition (DTD) or an XML Schema. Hence XML is self-descriptive too.

There is nothing special about XML: it is just plain text with the addition of some XML tags enclosed in angle brackets, so an XML document can be viewed in any simple text editor.


Why Is XML So Important?

There are a number of reasons for XML's increasing acceptance; a few of them are:

Plain Text

In XML it is easy to create and edit files with anything from a standard text editor to a visual development environment. XML also provides scalability for anything from small configuration files to a company-wide data repository.

Data Identification

The markup tags in XML documents identify the information and break the data into parts. For example, a search program can look for messages sent to particular people without confusing the recipient with the rest of the message. Once the different parts of the information are identified, they can be used in different ways by different applications.
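As an illustration of data identification, the sketch below uses Python's xml.etree.ElementTree to pick out the messages sent to Rohan. The Mailbox element that groups several E-mail elements is a hypothetical wrapper invented for this example:

```python
import xml.etree.ElementTree as ET

# Hypothetical mailbox document built from the E-mail elements used earlier.
MAILBOX = """<Mailbox>
  <E-mail><To>Rohan</To><From>Amit</From><Subject>Surprise....</Subject></E-mail>
  <E-mail><To>Amit</To><From>Rohan</From><Subject>Re: Surprise....</Subject></E-mail>
</Mailbox>"""

mailbox = ET.fromstring(MAILBOX)
# Because <To> marks the recipient, the search looks only at that field,
# so a message merely mentioning Rohan elsewhere is not picked up.
for_rohan = [message.findtext("Subject") for message in mailbox.iter("E-mail")
             if message.findtext("To") == "Rohan"]
print(for_rohan)  # ['Surprise....']
```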

Stylability

When display matters, the stylesheet standard XSL (a stylesheet language for XML) lets you dictate how the data is portrayed, instead of relying on conventional designs (as with HTML). Because XML is style-free, different stylesheets can produce output in PostScript, TeX, PDF, or some new format that hasn't even been invented yet. A single XML document can be used to display data in diverse formats, such as:

* a plain text file
* an XHTML file
* a WML (Wireless Markup Language) document suitable for display on a PDA
* an Adobe PDF document suitable for hard copy
* a VML (Voice Markup Language) dialog for a voicemail information system
* an SVG (Scalable Vector Graphic) document that draws pictures of thermometers and water containers

Universally Processed

An XML file must always be well-formed; otherwise, the XML parser won't be able to read the data. In addition, it can be required to be valid, that is, to conform to a DTD or a Schema. XML is a vendor-neutral standard, so you can choose among several XML parsers to process XML data.

Hierarchical Approach

XML documents benefit from their hierarchical structure. Hierarchical document structures are, in general, faster to access, because you can drill down to the part you need. They are also easier to rearrange, because each piece is delimited. This makes XML files easy to modify and maintain.
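A minimal sketch of both points with Python's xml.etree.ElementTree, using the E-mail document from earlier: find() drills straight down to one child, and moving a delimited element is just a remove() followed by an insert():

```python
import xml.etree.ElementTree as ET

EMAIL = """<E-mail>
<To>Rohan</To>
<From>Amit</From>
<Subject>Surprise....</Subject>
<Body>Be ready for a cruise...i will catch u tonight</Body>
</E-mail>"""

root = ET.fromstring(EMAIL)
# Drill straight down to one child instead of scanning the whole text.
print(root.find("Subject").text)  # Surprise....

# Rearranging: because every piece is delimited, moving an element means
# detaching it from the tree and re-inserting it at a new position.
body = root.find("Body")
root.remove(body)
root.insert(0, body)
print([child.tag for child in root])  # ['Body', 'To', 'From', 'Subject']
```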

Inline Reusability

XML documents can be composed of separate entities. XML entities can be included "in line" in an XML document, and the included sections look like a normal part of the document. You can single-source a section so that an edit to it is reflected everywhere the section is used, and yet the composed document looks like a single piece.