Center for Digital Research in the Humanities

What is XML?

XML (Extensible Markup Language) is an encoding standard that assists in the creation, retrieval, and storage of documents. It consists of a tag structure that identifies specific information within a document. Unlike HTML, XML is not limited to a specific set of tags, because a single tag set would not adapt to all documents or applications that may use XML.

XML utilizes the concepts of tags, elements, and attributes for encoding text.

The following example is a simplified XML document. One can see that there is a title at the beginning of the document and that the line breaks (<lb/>) occur in specific places:

XML Example
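As a rough illustration of tags, elements, and attributes, the sketch below parses a small XML document with Python's standard library. The sample text and the element and attribute names (text, type, title, p, lb) are hypothetical stand-ins chosen for this sketch; they are not the original example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: the element names and the "type" attribute
# are illustrative assumptions, not taken from the original example.
SAMPLE = """<text type="poem">
  <title>A Sample Poem</title>
  <p>The first line ends here,<lb/>
  and the second line ends here.<lb/></p>
</text>"""

root = ET.fromstring(SAMPLE)

doc_type = root.get("type")               # value of the attribute on <text>
title = root.find("title").text           # text content of the <title> element
line_breaks = len(root.findall(".//lb"))  # number of empty <lb/> elements

print(doc_type, title, line_breaks)
```

Here <title>...</title> is an element made of a start tag, content, and an end tag; <lb/> is an empty element; and type="poem" is an attribute on the <text> element.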

Examples of XML Encoding at the University of Nebraska–Lincoln include:

  1. TEI (Text Encoding Initiative) encoding:
    • Poetry
    • Letters
    • Books
    • Any text
  2. EAD (Encoded Archival Description) encoding:
    • Archival collections
    • Finding aids

XML Rules

  1. All tags must be either closed or empty.
  2. All tags must be nested properly.
  3. All special characters (entities) within the document must be well-formed; for example, an ampersand in the text is written as &amp;.
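The three rules can be tried out with any XML parser. The following is a minimal sketch using Python's standard library; the helper name is_well_formed and the sample strings are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False if it raises an error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Rule 1: every tag is either closed or empty.
closed = is_well_formed("<p>Some text</p>")
empty = is_well_formed("<p>Line one<lb/>line two</p>")
unclosed = is_well_formed("<p>Some text")

# Rule 2: tags are nested properly.
misnested = is_well_formed("<p><hi>text</p></hi>")

# Rule 3: special characters are written as well-formed entities.
escaped = is_well_formed("<p>Lewis &amp; Clark</p>")
bare_ampersand = is_well_formed("<p>Lewis & Clark</p>")

print(closed, empty, unclosed, misnested, escaped, bare_ampersand)
```

The parser accepts the closed, empty, and escaped examples and rejects the other three.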

In addition to the concepts of tags, elements, and attributes, it should be noted that XML is case sensitive. Therefore the tags:

<TITLE>...</TITLE>

will mean something different than the tags:

<title>...</title>.

In addition, a mixed markup such as <TITLE>...</title> would not be well-formed, because the start tag and the end tag do not match.
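Case sensitivity can be checked in the same way. In this short sketch, which assumes Python's standard library parser and uses illustrative variable names, the two spellings produce elements with different names, and the mixed-case pair is rejected outright.

```python
import xml.etree.ElementTree as ET

upper = ET.fromstring("<TITLE>Example</TITLE>")
lower = ET.fromstring("<title>Example</title>")

# The element names differ, so a program looking for "title" will not find "TITLE".
same_name = upper.tag == lower.tag

# A start tag and end tag that differ only in case do not match each other.
try:
    ET.fromstring("<TITLE>Example</title>")
    mixed_accepted = True
except ET.ParseError:
    mixed_accepted = False

print(upper.tag, lower.tag, same_name, mixed_accepted)
```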

Hierarchy is important:

<title>Bonfire of the Soul: <subtitle>Wooden Churches in Ireland</subtitle></title>

is well-formed, while

<title>Bonfire of the Soul: <subtitle>Wooden Churches in Ireland</title></subtitle>

is not, because the </title> and </subtitle> tags are not nested properly.
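Proper nesting is what lets a program treat the document as a tree. The sketch below, again assuming Python's standard library and illustrative variable names, reads the properly nested title and then shows that the improperly nested version cannot be parsed.

```python
import xml.etree.ElementTree as ET

nested = ET.fromstring(
    "<title>Bonfire of the Soul: "
    "<subtitle>Wooden Churches in Ireland</subtitle></title>"
)

# <subtitle> is a child of <title>, so it can be reached through the hierarchy.
children = [child.tag for child in nested]
main_text = nested.text
sub_text = nested.find("subtitle").text

# The same text with the end tags swapped is rejected by the parser.
try:
    ET.fromstring(
        "<title>Bonfire of the Soul: "
        "<subtitle>Wooden Churches in Ireland</title></subtitle>"
    )
    error = None
except ET.ParseError as exc:
    error = str(exc)

print(children, repr(main_text), repr(sub_text), error)
```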

What is a DTD?

The Document Type Definition (DTD) defines the structural rules of a type of document. These rules include a complete list of allowable elements and attributes, special character entities, rules for external files (such as images), as well as the hierarchical structure of all elements. Examples of document type definitions include TEI, EAD, and numerous others.
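To give a sense of what such rules look like, the sketch below embeds a very small, hypothetical DTD (it is not the TEI or EAD DTD) at the top of a document and lists the elements and attributes it declares. It assumes Python's standard library expat parser, which reads the declarations but does not validate the document against them; the names DOC and declared_names are illustrative.

```python
import xml.parsers.expat

# Hypothetical DTD: a poem contains one title followed by one or more lines,
# and may carry an optional "type" attribute.
DOC = """<!DOCTYPE poem [
  <!ELEMENT poem (title, line+)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT line (#PCDATA)>
  <!ATTLIST poem type CDATA #IMPLIED>
]>
<poem type="sonnet"><title>Example</title><line>First line</line></poem>"""

def declared_names(xml_text):
    """Collect the element and attribute names declared in the document's DTD."""
    elements, attributes = [], []
    parser = xml.parsers.expat.ParserCreate()
    # Called once for each <!ELEMENT ...> declaration.
    parser.ElementDeclHandler = lambda name, model: elements.append(name)
    # Called once for each attribute declared in an <!ATTLIST ...> declaration.
    parser.AttlistDeclHandler = (
        lambda elname, attname, type_, default, required:
        attributes.append((elname, attname))
    )
    parser.Parse(xml_text, True)
    return elements, attributes

elements, attributes = declared_names(DOC)
print(elements, attributes)
```

Checking a document against its DTD requires a validating parser, which this sketch does not attempt.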

Understanding the DTD: Coding for all documents begins with a DTD. Coding specifications will differ according to the type of material being coded.

There are separate, but related, DTDs for:

  • poetry
  • drama
  • prose

 

 