DAACS DTD documentation

Written by Sarah Wells and Daniel Pitti, March, 2004.
DAACS, Charlottesville: 2004

Introduction

The DAACS DTD is based on the TEI-Lite DTD and is TEI-compliant. All of these elements are in TEI-Lite, although they are not all used in the same way. The definitions here are intended for use in the DAACS DTD only.

How to read the definitions

Each element has a description of its function and any special usage rules are explained. If other elements can be nested inside that element, they are listed as sub-elements. The following symbols are used to define how those nestable elements can be used:

+ You must use this element at least once.
? You may use this element once, but it is not required.
* You may use this element as many times as you wish, but it is not required.
A | B You can use either element A or element B, but not both.

If none of these symbols appear, assume that the element is required but cannot be repeated. The elements are placed inside sets of parentheses to clarify which symbols apply to which elements. For example, suppose that element <hello> can contain the following elements:

sub-elements: (foo+, bar?, (foo2* | bar2*))

The only required sub-element is <foo>. <hello> can also contain one or zero <bar> elements, followed by either zero or more <foo2> elements or zero or more <bar2> elements (but not both). A proper usage might be:

<hello>
  <foo>...</foo>
  <bar2>blah blah blah</bar2>
  <bar2>yammer yammer</bar2>
</hello>

In several cases, you will see "#PCDATA" listed as a sub-element. This stands for "parsed character data," meaning text that is analyzed by the parser for entities and markup and then processed. You can enter any string of characters, with the exception of the "&" and "<" characters, which are used to indicate entity references and the beginning of elements. You must use "&amp;" and "&lt;" instead.

Note that some non-textual characters, such as a copyright symbol, can be typed in as entities ("&copy;").
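
As a brief illustration of the escaping rules above, the following sketch shows a paragraph in which an ampersand and a less-than sign appear in running text. The sentence itself is invented for this example and is not taken from any DAACS document.

```xml
<!-- Hypothetical text: "&" and "<" are entered as &amp; and &lt; -->
<p>Smith &amp; Jones recorded all sherds found at depths &lt; 2 feet.</p>
```

When the file is parsed, the entities are resolved, so the reader sees "Smith & Jones" and "< 2 feet".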

Attributes are listed as either required or implied (i.e., optional). In some cases, they have a limited range of possible values. These values are listed in parentheses. For example,

implied attributes: id, type (foo | bar)

This means that there are two optional attributes, id and type. You can assign any value to id but type can only be set to "foo" or "bar".

Types

There are ten types of DAACS documents:

  • background: Background documents are associated with buildings and sites and contain information about the building's or site's excavation history and summaries of the documentary and archeological evidence and analyses. There is one background document for each site in the Archive.

  • bibliography: This document contains bibliographic information for citations in the DAACS web site. There is only one bibliography document for the Archive.

  • chronology: A chronology of a building or site, based on a DAACS-developed uniform set of methods to infer intra-site chronologies. There is one chronology document for each site.

  • features: An explanation of the feature numbers and groups assigned to a site. There is one feature document for each site.

  • glossary: A list of definitions and attributes of terms used in the Archive documents. There is only one glossary document.

  • harrismatrix: A Harris Matrix for a site and an explanation of the information contained in the Matrix. There is one Harris Matrix document for each site.

  • history: A brief history of a particular site or area associated with sites in the Archive. There are currently three historical documents, for the Mount Vernon and Rich Neck plantations and Mulberry Row.

  • home: The home page for a particular site. It contains basic facts and figures relevant to the data about that building or site. There is one home document for each site in the Archive.

  • images: Contains locations and captions for images associated with a particular site. There is one image document for each site in the Archive.

  • phase: A detailed phasing query for each site. There is one phase document for each site in the Archive.

Elements

author

This element contains the name of the author of a bibliographic entry in the bibliography. The first author's name should be written last name first. If there are multiple authors, the remaining names should be written first name first. For example:

<author>Kennedy, John F.</author>
<author>Lyndon Johnson</author>
<author>Gerald R. Ford</author>

It is used inside the <bibl> element.

sub-elements: (#PCDATA)

required attributes: [none]

implied attributes: [none]


bibl

Describes an item in the bibliography. It is nested in the <listBibl> element and contains the item's title and editor or author.

Each <bibl> must have an id attribute, which is a unique name for each work listed in the bibliography. Any references to a work will refer to it by this name. The id should use one of the following formats:

If the entry has one or more authors, the id should consist of the primary author's last name plus the year of publication that is given in the <date> element. If there are several publications from the same year, the year should be followed by a letter. E.g.,

<bibl id="Jones1995a"> ...</bibl> 
<bibl id="Jones1995b"> ...</bibl>

If there is no author or date, the id should be the entry's title (or a reasonable abbreviation, if the title is long). E.g.,

<bibl id="LudwellPapers">

Either way, every <bibl> must contain at least one <title> or <date>. The <date> is the year of publication. If available, the name(s) of either the editor(s) or author(s) can be included.

sub-elements: ((editor* | author*),(date | title)+, note, term*)

required attributes: id

implied attributes: [none]
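
To show how the pieces of the content model fit together, here is a sketch of a complete entry inside a <listBibl>. The author names, title, note text, and publisher are hypothetical and are used only to show the order of the sub-elements; the id and the <date> follow the "Jones1995a" convention described above.

```xml
<listBibl>
  <bibl id="Jones1995a">
    <!-- First author last name first; remaining authors first name first -->
    <author>Jones, Mary</author>
    <author>John Smith</author>
    <!-- The date matches the year and letter used in the id -->
    <date>1995a</date>
    <title level="m">A Hypothetical Site Report.</title>
    <note type="bibliographic">Hypothetical Press, Charlottesville.</note>
    <!-- Site name drawn from the teiHeader n attribute list -->
    <term type="sitename">BuildingL</term>
  </bibl>
</listBibl>
```

The order shown (authors, then date and title, then note, then terms) follows the sub-elements definition above.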


biblFull

This is bibliographic information about the source of the current XML file and is found inside the <sourceDesc> element in the <teiHeader>. It contains the file's title, its publication history, and how it was created.

sub-elements: (titleStmt, publicationStmt, notesStmt)

required attributes: [none]

implied attributes: [none]


body

A standard TEI element that contains the body of an XML document. It is used in conjunction with the <front> element, which contains preliminary descriptive information. The division is similar to the division in a book between front matter, such as a table of contents, epigraph, and foreword, and the main body of the text. It appears inside the <text> element.

The <body> must include text inside a <p>, <listBibl>, or <list>; or a <div1>.

sub-elements: (head?, ((p | listBibl | list)*, div1*) +)

required attributes: [none]

implied attributes: [none]


cell

This is used for data that will appear in a table cell. It is placed inside a <row> element. It can contain either text (i.e., PCDATA) or an <xref> (a cross-reference).

sub-elements: (xref)*

required attributes: [none]

implied attributes: [none]


creation

This element is found inside the <profileDesc> element and contains a statement about how the content was converted from its previous form to XML and who was responsible for the conversion. For example,

<creation>Manually converted from Microsoft Word 2000 file to TEI Lite (P4)
by Jillian Galle, Department of Archaeology, Thomas Jefferson Foundation.</creation>

sub-elements: (#PCDATA)

required attributes: [none]

implied attributes: [none]


date

This can be found inside the <bibl> and <publicationStmt> elements. It contains a publication date.

When used inside <bibl>, it indicates what year the work was published. E.g.,

<date>1997</date>

This date is used to assign the work a unique id, so it should correspond to the <bibl> id (please see the <bibl> element definition for more information). If an author published more than one work in one year, the publication year should be followed by a letter. For example,

<date>1986a</date>

When used inside the <publicationStmt>, the date is the year that the XML document was published. To make the final HTML documents readable, the date should be followed by a period. E.g.,

<date>2003.</date>

sub-elements: (#PCDATA)

required attributes: [none]

implied attributes: [none]


div1

This is a standard TEI element used to mark a high-level section of a document. In the DAACS DTD, it is used inside the <front> and <body> elements.

It can have an id or a type attribute. The id is used in the history documents to identify sections of information. For example,

<div1 id="stuff">

The type attribute can only be set to "site".

sub-elements: (head?, ((list | p)*, div2*) +)

required attributes: [none]

implied attributes: id, type (site)


div2

A standard TEI element used to mark a second-level section of a document. It can only be used in a <div1> element.

sub-elements: (head?, ((list | p)*, div3*)+)

required attributes: [none]

implied attributes: [none]


div3

A standard TEI element used to mark a third-level section of a document. It can only be used in a <div2> element.

sub-elements: (head?, (list | p)+)

required attributes: [none]

implied attributes: [none]


edition

This element appears inside <editionStmt> and describes which edition of the document is contained inside the current XML file. The documents in this site are all electronic editions and should therefore be identified thus:

<edition>Electronic Edition.</edition>

sub-elements: (#PCDATA)

required attributes: [none]

implied attributes: [none]


editionStmt

This element appears inside <fileDesc> and contains information about the edition of the document in the current XML file. It must contain the <edition> element.

sub-elements: (edition)

required attributes: [none]

implied attributes: [none]


editor

This element contains the name of the editor of a bibliographic entry in the bibliography. The first editor's name should be written last name first. If there are multiple editors, the remaining names should be written first name first. For example:

<editor>Shaun, William</editor>
<editor>Harold Ross</editor>
<editor>David Remnick</editor>

It is used inside the <bibl> element.

sub-elements: (#PCDATA)

required attributes: [none]

implied attributes: [none]


figDesc

This element is used in the images documents. It is nested inside a <figure> and provides a textual description of the image and/or an <xptr> to a full-size version of the image. For example:

<figDesc>Overview of completed Building l excavations
   <xptr n="75K" type="jpg" doc="overview2.full"/>
</figDesc>

sub-elements: (#PCDATA | xptr)*

required attributes: [none]

implied attributes: [none]


figure

Contains information about an image. It is used only in images documents and is nested inside an <item> in a list of images. It contains a <figDesc> to describe the image.

It must have an entity attribute, which corresponds to an entity declaration at the top of the file. The entity declaration, in turn, translates into the location of the image file. For example:

<!ENTITY overview03.ref SYSTEM "bldg_s_overview3.jpg" NDATA jpeg>
...
<item> <figure entity="overview03.ref">
   <figDesc>...</figDesc>
</figure></item>

The entity "overview03.ref", declared at the top of the XML document, contains a URL for the image. It is referred to in the <figure> element with the entity attribute.

sub-elements: (figDesc)

required attributes: entity

implied attributes: [none]
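
The following sketch puts <figure> and <figDesc> together as they might appear in an images document. It assumes that both "overview03.ref" and "overview03.full" are declared as entities at the top of the file; the caption text and the file size given in n are hypothetical.

```xml
<list type="images">
  <item>
    <!-- entity names an entity declared at the top of the file -->
    <figure entity="overview03.ref">
      <figDesc>Overview of completed excavations
        <!-- Pointer to the full-size image; n gives its size in kilobytes -->
        <xptr n="75K" type="jpg" doc="overview03.full"/>
      </figDesc>
    </figure>
  </item>
</list>
```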


fileDesc

This element contains file description information and is the first element inside the <teiHeader>. It holds bibliographic information about the XML file, including information about the origins and publication history of the file's content.

sub-elements: (titleStmt, editionStmt, publicationStmt, notesStmt, sourceDesc)

required attributes: [none]

implied attributes: [none]


front

Like <body>, this is a standard TEI element. It is used in conjunction with the <body> element, and contains preliminary descriptive information. In DAACS documents, it contains pointers to other XML documents and images which appear in the background or in side panels. The division is similar to the division in a book between front matter, such as a table of contents, epigraph, and foreword, and the main body of the text. It appears inside the <text> element.

sub-elements: (div1)

required attributes: [none]

implied attributes: [none]


head

This is a heading for a <body>, <div1>, <div2>, <div3>, or <list>. Use the <hi> element to render parts of a heading in a specific style. For example,

<head>Building <hi rend="font-style:italic">l</hi> Chronology</head>

The <xref> and <name> elements are used here in the same way as in any other element.

sub-elements: (#PCDATA | hi | name | xref)*

required attributes: [none]

implied attributes: [none]


hi

Identifies a portion of text that needs to be formatted in a specific manner. It can only be used inside the <head>, <item>, <name>, <p>, <title>, and <xref> elements.

It must have a rend attribute, which describes how the text should be rendered. The description can be one or more options from a controlled list. Currently, the possible text renderings are:

font-style:italic
font-weight:bold
vertical-align:text-top
font-size:xx-small

To use more than one rendering option, separate them with a semicolon. Use the vertical-align and font-size options to render the text as superscript:

<hi rend="vertical-align:text-top;font-size:xx-small">

sub-elements: (#PCDATA)

required attributes: rend

implied attributes: [none]


item

Marks an item in a <list>. It can contain text, figures, and cross-references to other files. Note that you can start a secondary <list> inside the current list:

<list><item>
   <list>....</list>
</item>...</list>

sub-elements: (#PCDATA | hi | list | p | xref | figure | table)*

required attributes: [none]

implied attributes: [none]


label

Contains the title of an item in a <list>. It is nested inside the <list> element, immediately before the <item> that it labels. For example,

<list>
   <label>Topic A:</label><item>...</item>
   <label>Topic B:</label><item>...</item>
</list>

Be sure to be consistent about labeling items: do not label some but not others.

The id attribute is not required, although it is used in the glossary to assign a unique id to every glossary entry.

sub-elements: (#PCDATA)

required attributes: [none]

implied attributes: id


list

Contains a list. It can be placed inside the <body>, <div1>, <div2>, <div3>, or <item> elements. It may contain a <head> and one or more <item> or <label> elements.

It may have a type attribute, which identifies what kind of list it is. Currently, the possible types are:

  • colophon
  • glossary
  • images
  • label

A colophon list is used in the background file for each site and contains information about the source of the text. The glossary and images lists are used only in the glossary and images files, respectively. Label lists are used in site home files to organize data about the site.

sub-elements: (head?, (item | label)*)

required attributes: [none]

implied attributes: type
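
As a sketch of how a typed list and its labels work together, the fragment below shows a glossary list with one entry, followed by a paragraph elsewhere in the Archive that cross-references it. The term "foo" and its definition are placeholders, and the entity name "glossary" in doc is the value this documentation gives for glossary references.

```xml
<!-- In the glossary document: each label carries a unique id -->
<list type="glossary">
  <label id="foo">foo</label>
  <item>A hypothetical definition of the term.</item>
</list>

<!-- In another document: from matches the label id above -->
<p>The layer contained <xref type="glossary" from="foo" doc="glossary">foo</xref>.</p>
```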


listBibl

This is a list for the bibliography. It holds a list of <bibl> elements, which in turn contain information about the works.

sub-elements: (bibl)+

required attributes: [none]

implied attributes: [none]


name

This is currently used to identify a building mentioned in a background file. E.g.,

<p>Blah blah <name>Building <hi rend="font-style:italic">l</hi></name> blah blah.</p>

It can be found in <p>, <q>, and <head> elements.

sub-elements: (#PCDATA | hi)*

required attributes: [none]

implied attributes: [none]


note

Contains a note about a bibliographic entry or about the file. It can appear in the <teiHeader> in the <notesStmt> element or in the <body> in the <bibl> element. It must use a type attribute to define the information as "technology" or "bibliographic." When it is used in a <notesStmt>, it is technology. For example,

<notesStmt>
   <note type="technology">Microsoft Word 2000.</note>
</notesStmt>

When used in a <bibl> it is bibliographic. If it is a bibliographic note, it can contain one or more <title> elements.

sub-elements: (#PCDATA | title)*

required attributes: [none]

implied attributes: type (bibliographic | technology)


notesStmt

This element contains notes about the XML file. It can appear in two places in the <teiHeader>: in the <fileDesc>, where it holds technical information about the file's encoding, and in the <biblFull>, where it describes what technology was used to encode the original content. In both cases, the information is in a nested <note type="technology"> element.

sub-elements: (note)

required attributes: [none]

implied attributes: [none]


num

This is found inside the <title> element and is used to specify the volume number of a work.

<title>Cabbages and Kings, 1066-1914, <num>Volume III.</num></title>

sub-elements: (#PCDATA)

required attributes: [none]

implied attributes: [none]


p

This is a standard TEI element. It can be used in several elements and is always used to mark the beginning and end of a paragraph of information.

sub-elements: (#PCDATA | hi | name | term | xptr | xref | q | table)*

required attributes: [none]

implied attributes: [none]


profileDesc

Contains information about how the XML file was created. It is located in the <teiHeader> element and contains a <creation> element.

sub-elements: (creation)

required attributes: [none]

implied attributes: [none]


publicationStmt

Contains information about where and when the file was published and who published it. It is found inside the <biblFull> and <fileDesc> elements and should look something like this:

<publicationStmt>
   <pubPlace>Charlottesville: </pubPlace>
   <publisher>DAACS, </publisher>
   <date>2003.</date>
</publicationStmt>

sub-elements: (p | (pubPlace, publisher, date))

required attributes: [none]

implied attributes: [none]


publisher

The name of the publisher of the XML document. It is found in the <publicationStmt> element. To make the final HTML documents readable, the publisher name should be followed by a comma and space. E.g.,

<publisher>DAACS, </publisher>

In all cases, the publisher is DAACS.

sub-elements: (#PCDATA)

required attributes: [none]

implied attributes: [none]


pubPlace

The place where the XML document was published. It is found in the <publicationStmt> element. To make the final HTML documents readable, this place name should be followed by a colon and space. E.g.,

<pubPlace>Charlottesville: </pubPlace>

Currently, all documents are published in Charlottesville.

sub-elements: (#PCDATA)

required attributes: [none]

implied attributes: [none]


q

This marks a quotation and is only used inside the <p> element. It can include an <xref> for citations.

sub-elements: (#PCDATA | name | xref)*

required attributes: [none]

implied attributes: [none]


row

Marks a row inside a <table>. It must contain one or more <cell> elements (be sure that each row in the table has the same number of cells). You may use the role attribute to mark the top row of a table (e.g., <row role="head">).

sub-elements: (cell)+

required attributes: [none]

implied attributes: role


sourceDesc

Describes the source of the document contained in the current XML file. It can be found in the <fileDesc> element and can contain either a <p> or <biblFull>. If the content is being encoded for the first time, use <p> and the word "Original." If the source of the content is another document (such as a Microsoft Word file), use <biblFull>.

sub-elements: (p | biblFull)

required attributes: [none]

implied attributes: [none]


table

This marks a table of data in the text. It can only be found in <p> and <item> elements and must contain one or more <row> elements.

sub-elements: (row)+

required attributes: [none]

implied attributes: [none]
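
A minimal sketch of a table follows, assuming a two-column layout with a heading row. The column headings and values are hypothetical. The cells hold plain text, which the description of <cell> allows, and every row has the same number of cells.

```xml
<p>
  <table>
    <!-- role="head" marks the top row of the table -->
    <row role="head">
      <cell>Phase</cell>
      <cell>Number of contexts</cell>
    </row>
    <row>
      <cell>P01</cell>
      <cell>12</cell>
    </row>
  </table>
</p>
```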


TEI.2

This is the root element of a TEI-conformant file. It is used here because the XML markup we are using is essentially TEI. The root element is the first element in an XML file: all other elements in the file are nested inside. It comes after the XML and general entity declarations and must contain a <teiHeader> and <text> element.

sub-elements: (teiHeader, text)

required attributes: [none]

implied attributes: [none]


teiHeader

This contains header information about the XML file and is a required element in all TEI-based texts. It contains descriptive and bibliographic information about the file and is the first element in the <TEI.2> root element. It must contain a <fileDesc> and <profileDesc>.

It must include a type and an n attribute. The type attribute indicates what type of document the file is. As discussed above, there are currently ten types of DAACS documents, listed again here, and you must choose one of these types.

  • background
  • bibliography
  • chronology
  • features
  • glossary
  • harrismatrix
  • history
  • home
  • images
  • phase

The n attribute associates the file with a given site or given type of information. Attribute values are case sensitive, so it is important that the value be spelled consistently. Currently, the following values are used:

  • Biblio
  • glossary
  • BuildingL
  • BuildingO
  • BuildingR
  • BuildingS
  • BuildingT
  • HouseforFamilies
  • JC298
  • PalaceLandsQuarter
  • RichneckQuarter
  • ST116
  • Mulberry
  • mountvernon
  • richneck

The first two are for the bibliography and glossary documents. The last three should be used in history documents.

sub-elements: (fileDesc, profileDesc)

required attributes: n, type (background | bibliography | chronology | features | glossary | harrismatrix | history | home | images | phase)

implied attributes: [none]
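
The header elements described in this document can be assembled as in the sketch below, which assumes a background document for Building L whose content is being encoded for the first time. The title text is hypothetical; the "..." placeholders stand for text that depends on the individual file.

```xml
<teiHeader type="background" n="BuildingL">
  <fileDesc>
    <titleStmt>
      <title>Building l: Background</title>
    </titleStmt>
    <editionStmt>
      <edition>Electronic Edition.</edition>
    </editionStmt>
    <publicationStmt>
      <pubPlace>Charlottesville: </pubPlace>
      <publisher>DAACS, </publisher>
      <date>2003.</date>
    </publicationStmt>
    <notesStmt>
      <note type="technology">...</note>
    </notesStmt>
    <!-- Content encoded for the first time, so <p> is used, not <biblFull> -->
    <sourceDesc>
      <p>Original.</p>
    </sourceDesc>
  </fileDesc>
  <profileDesc>
    <creation>...</creation>
  </profileDesc>
</teiHeader>
```

The sub-elements of <fileDesc> appear in the order given in its definition: titleStmt, editionStmt, publicationStmt, notesStmt, sourceDesc.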


term

This is used in the bibliography to associate a work with one or more sites. It can be used in the <bibl> and <p> elements. The type attribute must be used and must have the value of "sitename". The site name used should be drawn from the same list used by the <teiHeader> n attribute:

<term type="sitename">BuildingL</term>

sub-elements: (#PCDATA)

required attributes: [none]

implied attributes: type (sitename)


text

This is a standard TEI element and indicates the content of an XML file. It follows the <teiHeader> and can hold <front> and <body> elements.

sub-elements: (front?, body)

required attributes: [none]

implied attributes: [none]
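
The sketch below shows one way the <front>, <body>, and nested division elements might be arranged inside <text>. The entity name "mulberry.history" is an assumption made for this example, and the "..." placeholders stand for real headings and paragraphs.

```xml
<text>
  <front>
    <div1>
      <!-- Pointer to a related history document (hypothetical entity name) -->
      <p><xptr type="history" doc="mulberry.history"/></p>
    </div1>
  </front>
  <body>
    <head>...</head>
    <div1 type="site">
      <head>...</head>
      <p>...</p>
      <!-- div2 can only appear inside div1 -->
      <div2>
        <head>...</head>
        <p>...</p>
      </div2>
    </div1>
  </body>
</text>
```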


title

Contains the title of a bibliographic element, note, or the document. It can occur in the <bibl>, <note>, and <titleStmt> elements. When used in the <bibl> and <note> elements, it must use the level attribute. This attribute indicates what type of resource the bibliographic work is.

There are five possible values for the level attribute: a, meaning an article (i.e., something that was published as part of a larger work); m, meaning a book, manuscript, or other kind of monographic title (including single volumes of multi-volume works); j, meaning a journal; s, meaning a series; and u, meaning an unpublished paper or poster (including theses and dissertations unless published by a commercial press). An article title should include the title of the journal, book, or series in which it was published, in the <note> element. For example,

<bibl>
...
   <title level="a">Slave Diet at Monticello.</title> 
   <note type="bibliographic"><title level="j">American Antiquity</title> 55(4): 690-717.</note>
...
</bibl>

sub-elements: (#PCDATA | hi | name)*

required attributes: [none]

implied attributes: level (a | m | j | s | u)


titleStmt

A standard TEI element that contains information about the title of the XML document. It is used inside the <fileDesc> and <biblFull> elements and contains a <title> element.

sub-elements: (title)

required attributes: [none]

implied attributes: [none]


xptr

This is an empty element, meaning that it does not contain text or any nested elements and consists of a single tag marked with a "/" at the end. It is a pointer to another, internal, location in the current document or to an external document. It can be used in the <figDesc>, <item>, and <p> elements.

It must include the doc and type attributes. The doc attribute specifies an entity reference listed at the top of the document and the type attribute specifies what kind of data the entity reference will resolve to. For example, a pointer to a .gif file of a full image of Building S might look like this:

<xptr type="gif" doc="bldg_s.full.img"/>

When the file is processed by the style sheet, the reference will be resolved and the path of the .gif file will be inserted into the HTML. The type must be one of the choices listed below. Pointers to XML files can be used to associate the content of other files with the current file.

  • backgroundtoc: This is used in background documents and refers to an image that appears next to a table of contents at the top of the page. You can use the n attribute to further clarify the image. For example, if there are two images that show two different buildings at the same site, you might want to clarify which building is in which image:

    <xptr n="68AL" type="backgroundtoc" doc="background.AL"/>
    <xptr n="68AP" type="backgroundtoc" doc="background.AP"/>

  • biblhistory: This is used in the bibliography and helps associate works with specific sites.

  • dxf: Used in image documents to refer to .dxf files (AutoCAD site plans are stored in .dxf format).

  • embedtableright: Used to specify that a table should be embedded in the text and displayed on the right. Currently, this is only used in the glossary to display a formula.

  • gif: Used in the image documents to refer to .gif files.

  • harris: Used in the harrismatrix documents. It refers to an image of the Harris Matrix of the site.

  • history: Used in the home, background, images, harrismatrix, features, and chronology documents to create a link to the history document for the site. Used in the history documents to indicate which sites are associated with that document.

  • jpg: Used in the image documents to refer to .jpg files.

  • pdf: Used in the image documents to refer to .pdf files.

  • siteplan: Used in the home documents. It refers to an image of the site's plan.

The n attribute can be used in conjunction with dxf, gif, and jpg to specify the size of the referenced file. It should be in kilobytes and should include the abbreviation "K". E.g.,

<xptr type="gif" n="37K" doc="bldg_s.full.img"/>

sub-elements: [none]

required attributes: doc, type (backgroundtoc | biblhistory | dxf | embedtableright | gif | harris | history | jpg | pdf | siteplan)

implied attributes: from, n


xref

This element is very similar to the <xptr> element. It is not an empty tag, though, and contains text. It is generally used to mark a portion of text, such as a cross-reference, and can be used in the <cell>, <head>, <item>, <p>, and <q> elements. It can also contain nested <xref> elements. There are no required attributes.

The doc attribute can be used to provide an entity reference or to specify that the text is a cross-reference to the bibliography or glossary.

The from attribute is used in cross-references and should correspond to either a <bibl> id in the bibliography or a <label> id in the glossary.

The type attribute indicates what kind of data the reference is pointing to. Currently, possible values are:

  • citation: Creates a cross-reference to a work in the bibliography. Citations should use the from and doc attributes. Put the appropriate id for the work that is referenced in from (this must correspond to the <bibl> id in the bibliography) and the entity reference for the bibliography document in doc (it should be "bibliography"). For example, a reference to the 1997 work published by Anna Agbe-Davies would look like this:

    <xref type="citation" from="Agbe-Davies1997" doc="bibliography">...</xref>

  • embedtableleft: When the file is run through the stylesheets, the data referenced in the <xref> element will be placed inside an embedded table along the left margin of the page. It should include the doc attribute, giving an entity reference for the desired data.

  • embedtableright: When the file is run through the stylesheets, the data referenced in the <xref> element will be placed inside an embedded table along the right margin of the page. It should include the doc attribute, giving an entity reference for the desired data.

  • glossary: Creates a cross-reference to an entry in the glossary. Glossary references should use the from and doc attributes. Put the word that will be cross-referenced in from and the entity reference for the glossary document in doc (it should be "glossary"). For example, a cross-reference to the word "foo" would look like this:

    <xref type="glossary" from="foo" doc="glossary">...</xref>

  • pdf: For a reference to a PDF file. It should use either the doc attribute to refer to an entity reference or the from attribute to give a file path.

    <xref type="pdf" doc="entity1">...</xref>
    <xref type="pdf" from="bldg_r_HarrisMatrix.pdf">...</xref>

sub-elements: (#PCDATA | hi | xref)*

required attributes: [none]

implied attributes: doc, from, type (citation | embedtableleft | embedtableright | glossary | pdf)
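
As a closing illustration, the sketch below shows a citation cross-reference used inside a quotation. The quoted sentence and the work "Jones1995a" are hypothetical; the from value is assumed to match a <bibl> id in the bibliography.

```xml
<p>The excavators reported that
  <q>the deposit was sealed by a later floor
    <!-- from must match the id of a <bibl> in the bibliography -->
    <xref type="citation" from="Jones1995a" doc="bibliography">(Jones 1995a)</xref>
  </q>
</p>
```

Unlike <xptr>, the <xref> element has both a start tag and an end tag, and the text between them is the portion that becomes the link.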