Metadata and the Collection Object


Table of Contents

1. Introduction

2. Controlling Metadata

3. Information Overload

4. Example

4.1. Catalogs
4.2. Physical model

5. Organizing Metadata

5.1. Wall 13
5.2. Relationships
5.3. Tracking internal and external relations

6. Appendix

A-1. File organization at ingestion
A-2. PFP minimum element set
A-2.1. Catalog
A-2.2. Header
A-2.3. Resource description

1. INTRODUCTION

Collection Object: an aggregation of digital objects where the aggregate form has intellectual content beyond its specific digital objects and the simple membership set.

The definition above is clearly a direct analogy to an edited volume (e.g., the collected works of Kilgore Trout). Such edited volumes have several components:

  1. a representation of each primary work,
  2. core binding structures (e.g. table of contents, index, and credit for original works), and
  3. editorial content (e.g., preface to the volume and maybe to each work, ordering and groupings of the works [as evidenced in the TOC and maybe in more elaborate discussions], and substantial interpretations of the primary works and their interrelations).

A central repository will have metadata for each digital object and an overall index of its digital objects (possibly in both an administrative and a user form). Those two aspects of a central repository imply that the central repository will subsume any collection object that does not include editorial content (as described above).

The primary value of a collection object in a digital repository derives from the information in its editorial content. For the purposes of this discussion, that information can be split into two subcomponents:

3a. interpretation: the editorial content

3b. metadata: descriptive information to aid discovery

The interpretation section might well be another digital object, like the original works of the primary collection, or a smaller work (such as a preface).

Our main interest in this discussion concerns the metadata subcomponent. One critical observation is that this subcomponent provides both an opportunity and a responsibility to establish metadata in context. The context here is a particular collection’s point of view and interests. For example, a collection focused on religious scholarship and a collection about historical construction techniques may refer to the same building, but they will use different metadata to indicate the conceptual function the building holds in each context.

Note that the term "metadata" here means information specifically recorded to enable user discovery, perhaps via a library-wide index, an "information community" index, or a project (collection) index.


2. CONTROLLING METADATA

The Supporting Digital Scholarship (SDS) Technical Committee has been investigating the technical issues associated with transforming born-digital materials (as exemplified by several IATH projects) into library resources. A major aspect of this transformation is coagulating a project’s materials into a coherent whole. For the purposes of this discussion we will consider that whole to be a collection object that comprises the materials for the project. Note that a project may already have its own set of collection objects, which become components of the overall collection object.

[***This still needs a decent transition to move from collection objects to MD***] Metadata can encompass a wide variety of information about a work or collection, its resources, and its connections to other networked collections. It is critically important to the administrative, scholarly, and technical spheres of influence inside and outside an individual work, and any plan for collecting digital scholarship must seriously and thoroughly consider this topic. However, it is a slippery subject and is difficult to define, collect, and manage. How can an author tell the difference between data and metadata? How can a collector know what metadata a project should have? How can a library conveniently and reliably identify and use metadata from every project it acquires?

The technical committee has been discussing ways to represent and organize metadata for several months. There is no generic solution: while there are some types of information that all collection objects or projects produce, each project will have its own particular combination of information that it needs to store and track. An archaeology project, for example, might be primarily interested in mapping information against a physical location, while a literary or historical project might want to track information against a timeline or an intellectual construct.

There are some ways to control the information. The GDMS tools we are developing allow a great deal of flexibility and creativity in collecting, segregating, holding, and discovering different types of metadata. We are also pursuing a basic division between catalogs of resources and an intellectual/physical model, which helps authors distribute information in a relatively straightforward structure.


3. INFORMATION OVERLOAD

Metadata can feed on itself, producing an apparently endless flow of recursive definition and explanation. This can easily overwhelm and drown an otherwise well-organized project. The project creator and the collecting library both need to guard against too much of a good thing.

  • Too much information
  • One benefit of on-line scholarship is that a project can build and use enormous amounts of multimedia information. A single project can encompass all of William Blake’s collected works, a cathedral’s structural history, or the intimate details of life in the U.S. Civil War. It may hold thousands of individual resources (images, texts, audio files, etc.), each with its own administrative, descriptive, and technical metadata. All this information demands precise and conscientious management before, during, and after a project is created and collected.

  • Too many contexts

  • No single vision to unite workspace and repository

  • Nonstandard projects


4. EXAMPLE

The Pompeii Forum Project (PFP) is a good case study for examining these issues. PFP is a collaborative research effort in archaeology and the history of urban design. It aims to provide the first systematic documentation of the architecture and decoration of the Forum, interpret evidence as it pertains to Pompeii's urban history, and make wider contributions to both the history of urbanism and contemporary problems of urban design. The primary resources in the site are images and analytic information. The images are photos of the physical site, taken by PFP team members during on-site digging and research. The photos have been digitized and marked with identifying information (where and when each photo was taken, what it shows, and so forth). The analytic information consists of field notes and observations.

These two categories of information resources – digital images and textual analysis – and metadata about the resource files must be stored and maintained. The information is linked to the physical forum by a virtual structural hierarchy which copies the physical model. The model starts with the whole forum at the top of the hierarchy, then works its way down through buildings, columns, and individual walls. This arrangement is convenient, but it creates small corners and dead ends for isolated pieces of information. What might seem an intuitive arrangement to the project authors may frustrate users and libraries. Best practices need to be established and observed to help the library maintain and track information.

4.1. Catalogs

The PFP catalogs hold digital resource files and metadata about the resources. Each catalog has two sections, headed by one <cathead> and at least one <res> element. A <cathead> element holds information about the catalog itself, such as project and file descriptions and any revisions made to the catalog entries. A <res> holds technical, administrative, and descriptive information about a specific resource. A <res> can also have a pointer to a digital resource (inside a <resptr> element) or textual commentary/observation in a <rescon> element (Figure 1).

A <res> can hold either a <resptrgrp> collection of pointers to digital resources or a <rescon> text resource, but not both. A <rescon> cannot point to a digital resource. This keeps text and digital resources segregated.
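
The segregation rule above can be checked mechanically. The following Python sketch is illustrative only: the element names come from the text, but the sample fragment, the id values, and the helper names (mixes_text_and_pointers, invalid_resources) are assumptions, and the real GDMS DTD may impose constraints that are not modeled here.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog fragment. Element names follow the text;
# the ids, hrefs, and contents are invented for illustration.
SAMPLE = """
<cat>
  <cathead/>
  <res id="img1"><resptrgrp><resptr href="wall13.jpg"/></resptrgrp></res>
  <res id="obs1"><rescon>Field note text.</rescon></res>
  <res id="bad1"><resptrgrp/><rescon>Mixed content.</rescon></res>
</cat>
"""

def mixes_text_and_pointers(res):
    """Return True if a <res> holds both a <resptrgrp> and a <rescon>."""
    has_pointers = res.find("resptrgrp") is not None
    has_text = res.find("rescon") is not None
    return has_pointers and has_text

def invalid_resources(xml_text):
    """List the ids of <res> elements that break the either/or rule."""
    root = ET.fromstring(xml_text.strip())
    return [res.get("id") for res in root.iter("res")
            if mixes_text_and_pointers(res)]

print(invalid_resources(SAMPLE))  # ['bad1']
```

A check like this could run before ingestion, so that a catalog mixing text and pointers in one <res> is caught while the author can still fix it.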

Figure 1: Basic catalog structure and metadata distribution

PFP splits these up into two separate catalogs: the image catalog holds information about and pointers to image objects (via <resptrgrp>) and the observation catalog holds textual observations and commentary about the forum and related topics via <rescon> elements. The data inside the observation catalog are probably not directly related to any of the pictures (they don’t describe the pictures, how they were taken, or what they show) but can be connected to the photos via pointers. The observation catalog is a convenient catch-all for descriptive and interpretive information about a wide variety of subjects. Information that falls outside a “full text” reading is a hazy category, since it is metadata and should be discoverable but shouldn’t displace primary materials. Observations shouldn’t become a junk room where things get tangled up and lost.

Note that in PFP the image catalog does not use <rescon> elements. This is to avoid extensive semantics (as might be included in a <rescon>) in the image catalog.

4.2. Physical model

The model of the physical forum is more complicated, since it contains the project’s intellectual activity. The physical model holds information about the physical site and analyzes the site, using the resources stored in the catalogs. A hierarchy of <div> elements organizes the information around the forum’s physical structure. Each <div> holds metadata about itself and the particular area of the Forum that it covers.

There are three main sections in a <div> (figure 2).

Figure 2: Physical model metadata distribution

  • The <divhead> holds metadata about the <div>, but not about the resources or information in the <div>.
  • The <divdesc> contains descriptive metadata about the part of the physical model encompassed by the current <div>.
  • Each <resgrp> holds metadata about one or more resources. This does not include information about the digital files or the resources themselves but analyses of the resources and their import at the catalog level. Each resource is in a <res> element, which holds administrative, technical, and descriptive information about the resource’s intellectual content. The <source> element stores descriptive metadata about the resource’s original source [***digital or otherwise?***].

Figure 3, below, shows how the physical model connects <div>s and resources. The <relation> element links to other parts of the physical model. As of right now, PFP’s <relation> tags do not point to the catalogs. Instead, all pointers to catalog resources are in <resptr> elements. Similarly, only <relation> elements are used to point to other parts of the physical model. This is not required by the GDMS DTD but it may help avoid confusion.

Figure 3: Basic physical model structure

As in the catalogs, a <res> can hold either a <resptrgrp> or a <rescon>.
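
The PFP pointer convention described above (<relation> for other parts of the physical model, <resptr> for catalog resources) can be sketched as a small consistency check. This is an assumption-laden illustration rather than GDMS tooling: the Div class, its field names, and all ids are hypothetical stand-ins for the real elements and attributes.

```python
from dataclasses import dataclass, field

@dataclass
class Div:
    """One node of the physical model (stands in for a <div>)."""
    div_id: str
    description: str = ""                           # stands in for <divdesc>
    relations: list = field(default_factory=list)   # targets of <relation>
    resptrs: list = field(default_factory=list)     # targets of <resptr>
    children: list = field(default_factory=list)    # nested <div>s

def convention_errors(root, catalog_ids):
    """Report pointers that break the convention: <relation> must target
    a <div>, and <resptr> must target a catalog resource."""
    all_divs = {}
    stack = [root]
    while stack:                       # walk the whole <div> hierarchy
        d = stack.pop()
        all_divs[d.div_id] = d
        stack.extend(d.children)
    errors = []
    for d in all_divs.values():
        for target in d.relations:
            if target not in all_divs:
                errors.append((d.div_id, "relation", target))
        for target in d.resptrs:
            if target not in catalog_ids:
                errors.append((d.div_id, "resptr", target))
    return sorted(errors)

# Invented ids for illustration only.
wall13 = Div("wall13", "Imaginary wall 13",
             relations=["building5", "column15"],
             resptrs=["img-001", "obs-044"])
building5 = Div("building5", relations=["wall13"])
column15 = Div("column15", relations=["wall13", "obs-044"])  # breaks the convention
forum = Div("forum", children=[building5, column15, wall13])
catalog_ids = {"img-001", "obs-044"}

print(convention_errors(forum, catalog_ids))
```

Because the DTD does not enforce the convention, a check of this kind would have to live in project tooling or in the library's ingestion process.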


5. ORGANIZING METADATA

To avoid utter chaos, PFP must identify and follow best practices for categorizing different kinds of metadata. The rest of this paper discusses different aspects of this problem and possible solutions.

5.1. Wall 13

Let’s consider information about an imaginary wall 13 in the Pompeii Forum, which is located near the equally imaginary building 5 and column 15. To avoid confusion, let us assume that the wall is not actually part of the building, but is adjacent to it.

Figure 4, below, shows that the observation catalog contains field note 44, which says that wall 13 is next to building 5 and lines up with column 15. Note that this information could also be stored in the <div>, inside <relation> elements [***check***]. The observation catalog will also note any mentions of the wall in journal articles and books and discussions of wall 13’s greater meaning in the Forum site. The <div> for this particular wall in the physical model describes the wall (height, length, color, etc.) and its location. The image catalog has a <res> with information about a picture that shows the wall and a pointer to the digital image file.

The <res> in the wall 13 <div> would probably hold a list of pointers to all catalog resources that are mentioned in the <divdesc> or in the observation catalog.

Figure 4: Information about wall 13

5.2. Relationships

Probably the most important, and most difficult, aspect of a project like PFP is handling relationships between resources. This would include simple relations such as “this wall stands next to that wall” and more complex relations that involve groups of resources. There are two parts to this problem. One is how to tag a relationship between resources (i.e., how to actually point to another resource) and the other is how to most clearly organize information about these relationships (the information may be in image files, journal articles, field notes, and assorted other metadata in the catalogs and <div>s).

Resources that have some kind of relationship can be connected by an include, in which other information (whether in a catalog or another <div>) is pulled into the current <div>, or by a pointer, which records the information’s location (essentially, “it’s over there” or “look over in yonder catalog”). The <resinc>, <catinc>, and <divinc> tags all act as includes and can use idrefs or Xpaths to tell the system where to find the desired information [1].

But before considering how to implement relationships, we’ll take a brief look at the relationships themselves. The simplest kind begins with an object (O) that exhibits an attribute (A) in a manner specified by a value (V). This forms the triple (O, A, V), sometimes referred to as an associative triple. It could be diagrammed as something like figure 5a, right.

Figure 5: binary relations

An example of an associative triple is (Mary, eye_color, blue), which says that “Mary has blue eyes.” Textually it could be recorded as a two-place predicate EC(Mary, blue), with the predicate “EC” representing eye_color. This could in turn be diagrammed as shown in figure 5b.

The situation becomes slightly more complex when “value” is a specific object in its own right. For example, Mary might have the “attribute” mother_of with the “value” Sue. The triple (Mary, mother_of, Sue) may seem to be more clearly a relationship between two objects than the (Mary, eye_color, blue) triple. Figure 5c shows a possible diagram of the relationship between Mary and Sue. This is often referred to as a binary relationship, since two objects are involved.

We could also record this relationship from Sue’s point of view, as (Sue, mother_is, Mary). Note that the name of the relationship (i.e., the attribute “mother_is”) must be changed to read properly in English, but the two relationships form a symmetric pair (figure 5d). If Mary and Sue are siblings, the single sibling_of relationship appears more immediately symmetric (figure 5e).

However, notice that if we record this as a triple (Mary, sibling_of, Sue), as the predicate SO(Mary, Sue), or in a diagram (the top half of figure 5e) there is an explicit order: all these are from Mary’s point of view. That may be an unintended consequence of the representation scheme and may have strong implications for the possible access to the information.
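
The point-of-view problem can be made concrete with a small Python sketch. It is illustrative only: the triples follow the examples in the text, but the INVERSE table and the helper names are assumptions introduced here, not part of GDMS.

```python
# Assumed table of inverse attribute names; sibling_of is its own inverse.
INVERSE = {
    "mother_of": "mother_is",
    "mother_is": "mother_of",
    "sibling_of": "sibling_of",
}

triples = {
    ("Mary", "eye_color", "blue"),   # plain attribute, no inverse
    ("Mary", "mother_of", "Sue"),    # binary relation between two objects
}

def with_inverses(triples):
    """Add the reverse-direction triple for every attribute that has one."""
    result = set(triples)
    for obj, attr, value in triples:
        if attr in INVERSE:
            result.add((value, INVERSE[attr], obj))
    return result

def about(obj, triples):
    """Everything recorded from one object's point of view."""
    return sorted((a, v) for o, a, v in triples if o == obj)

print(about("Sue", triples))                 # []
print(about("Sue", with_inverses(triples)))  # [('mother_is', 'Mary')]
```

As recorded, nothing can be found by starting from Sue. Storing the inverse triple fixes access at the cost of a second record that must be kept in step with the first.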

We will return to these implications, but first let us turn to a more elaborate scenario involving three objects, called a three-ary relation. If we give Mary and Sue another sibling, Joe, the predicate becomes a three-place predicate, SO(Mary, Sue, Joe).

Figure 6

The associative triple representation is not directly applicable to the three-ary relation, but we can use a pair of triples, (Mary, sibling_of, Sue) and (Mary, sibling_of, Joe), as diagrammed in figure 6. The access implications are clear, in that Sue and Joe’s sibling relationship is not recorded but must be computed. If we want to make Sue and Joe’s relationship explicit in the symmetric relations discussed above, the complexity of the representation increases greatly (figure 7).
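
To show what “must be computed” means here, the following Python sketch derives the unrecorded sibling pairs from the two triples stored from Mary’s point of view. The function name and the grouping approach are assumptions chosen for brevity, not a description of any GDMS tool.

```python
from itertools import permutations

# Only Mary's point of view is recorded, as in figure 6.
recorded = {
    ("Mary", "sibling_of", "Sue"),
    ("Mary", "sibling_of", "Joe"),
}

def sibling_closure(triples):
    """Derive every ordered sibling pair implied by the recorded triples."""
    groups = []                      # disjoint sets of mutual siblings
    for a, attr, b in triples:
        if attr != "sibling_of":
            continue
        merged = {a, b}
        untouched = []
        for group in groups:
            if group & merged:       # shares a member: same family
                merged |= group
            else:
                untouched.append(group)
        groups = untouched + [merged]
    return {(x, "sibling_of", y)
            for group in groups
            for x, y in permutations(group, 2)}

closure = sibling_closure(recorded)
print(("Sue", "sibling_of", "Joe") in recorded)  # False
print(("Sue", "sibling_of", "Joe") in closure)   # True
print(len(closure))                              # 6
```

The six derived triples correspond to the fully explicit representation: each of the three siblings points at the other two.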

Figure 7

If one of the objects changes or moves, all of the objects must be informed, creating a possible maintenance problem. If we add another object into this picture (figure 8, below), the picture quickly becomes more complex. If all the objects are related to each other, there are now twelve relationships. This is not necessarily a problem, depending on the particular situation. In a model such as PFP it might severely limit the user’s ability to move through the project.

Figure 8

Something like this could be implemented in the PFP model with the <relation> tags. Figure 9 shows how this might work to inform wall 13 of its relations with building 5, column 15, and field note 44.

Figure 9: implementing binary relations in PFP

Note that the diagram shows that the <div>-<div> relationships are symmetrical, but while wall 13 knows about field note 44, the note doesn’t have a <relation> pointer back to the wall (<relation> elements aren’t allowed in <rescon>s).

The three-ary relation in figure 7 involves six arcs. If we generalize this to a relation among n objects, called an n-ary relation, the number of arcs is n(n-1), or “of order n squared”. An alternative representation requires only 2n arcs (“of order n”) but slightly more computation in order to access a node. In this representation, n-ary relations are not laid out from any one object’s point of view, as the binary relations shown in figure 6 are, and do not impose an explicit order among the objects.
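
The two arc counts can be compared directly. This short Python sketch only restates the arithmetic from the text; the function names are made up for illustration.

```python
def pairwise_arcs(n):
    """Arcs needed when every object points at every other object."""
    return n * (n - 1)

def relation_node_arcs(n, back_pointers=True):
    """Arcs needed when a separate relationship node points at each object.
    Back pointers from the objects to the node are optional."""
    return 2 * n if back_pointers else n

for n in (3, 4, 10, 100):
    print(n, pairwise_arcs(n), relation_node_arcs(n))
```

For three objects the two schemes cost the same (six arcs each), four objects already need twelve pairwise arcs against eight, and the gap widens quickly after that.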

In this scenario the relationship is separated from the objects and treated like a quasi-object in its own right. Using the same example as in figure 7, we get the diagram shown in figure 10. The dashed circle is the new node for the relationship. Here, only the arcs from the relationship to the objects (shown as solid lines in figure 10) are required. That means that the objects are not directly aware of the relationship or of any other objects in the relationship. If desired, the objects can have pointers back to the relationship, shown in figure 10 as dashed lines.

Figure 10: n-ary relationship based on figure 7

If we add the mother_of relation shown in figure 8, the predicate MO(Jill, Mary, Sue, Joe), we get the diagram shown in figure 11a. Note that this diagram doesn’t explicitly indicate that Jill is the mother (but then neither does the predicate representation, which relies on the order of the arguments and a separate specification that the first argument is the mother). This information can be made explicit by arc labels, as shown in figure 11b. This allows unordered, ordered, and mixed relations and avoids the order bias shown in figure 6 and the complexity shown in figure 7.

Figure 11a: two n-ary relationships based on figure 8

Figure 11b: n-ary relationships with arc labels
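
A relationship node with labeled arcs can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the Relation class and the role labels “mother” and “child” are invented here, and only the forward arcs (relationship to object) are stored.

```python
from dataclasses import dataclass, field

@dataclass
class Relation:
    """A relationship treated as a node in its own right."""
    name: str
    arcs: list = field(default_factory=list)  # (role, object); role may be None

    def members(self, role=None):
        """Objects in the relation, optionally filtered by arc label."""
        return [obj for r, obj in self.arcs if role is None or r == role]

# Unordered relation: no arc labels are needed.
siblings = Relation("sibling_of",
                    [(None, "Mary"), (None, "Sue"), (None, "Joe")])

# Mixed relation: the labels say who is the mother, so argument
# order no longer carries any meaning.
mother = Relation("mother_of",
                  [("child", "Mary"), ("mother", "Jill"),
                   ("child", "Sue"), ("child", "Joe")])

def relations_of(obj, relations):
    """Objects hold no back pointers, so membership is found by scanning."""
    return [rel.name for rel in relations if obj in rel.members()]

print(mother.members("mother"))                # ['Jill']
print(relations_of("Sue", [siblings, mother])) # ['sibling_of', 'mother_of']
```

The scan in relations_of is the “slightly more computation” that this representation trades for fewer arcs; adding back pointers would remove the scan at the cost of n extra arcs.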

Figure 12, below, shows how an n-ary relationship could be used in PFP. Wall 13, building 5, and column 15 are aligned. The “alignment” relationship object has pointers to all three objects, with “role” attributes to further define the objects’ relationship.

Figure 12: n-ary relationships in PFP

This raises the question of how the relation node in figure 12 could be recorded in a GDMS structure. In many cases, making the node part of any one of the objects involved in the relationship would introduce an order bias (as in figure 6). One possible set-up is shown in figure 13.

Figure 13: n-ary relationships in PFP <cat>s and <div>s

The data in the <rescon> does not necessarily qualify as metadata but might better be called data. Whether it’s called one or the other doesn’t necessarily affect where it is stored or how it is tagged, but it may be a potential source of great confusion for authors and libraries. [***need to expand this idea further***]

5.3. Tracking internal and external relations

Relationships between internal and external resources complicate the situation further. If we want to refer to metadata in both internal and external resources, we need to differentiate between pointers to objects inside the project’s immediate boundaries (the DOM, for a specific GDMS structure) and outside (the larger repository and the outside world). The repository is the library’s repository space (although it could theoretically encompass other repositories). The wild is everything outside the repository, as shown in figure 14. Objects inside a DOM can be found via idrefs. Objects in the repository have PIDs, and can be found with either Xpaths or a disseminator [2]. Objects in the wild will probably be found via URLs.

Figure 14: boundaries

Figure 15 shows a possible way to handle external and internal resource relationships. This uses extended pointers (<extptr>) to hold information about the relationship between multiple objects. The information and links to all of the objects are kept in the observation catalog in a <rescon> element. A textual explanation of the relationship can also be kept in the <rescon>. Or, as in figure 13, a “role” attribute could be used to type the relationships between each (e.g., “role=’is_next_to’”). XLink might be used in this setting.

Figure 15: tracking internal and external relations in PFP

Note that in this diagram the objects themselves do not know about the relationship. This can be rectified via some kind of back pointer pointing from the objects to the observation catalog or the objects can have binary relationships pointing directly to the objects themselves. Figure 16, below, shows how pointers between objects can be used to note that there is a relationship without repeating the information in the observation catalog. Theoretically, different kinds of pointers would help differentiate between internal and external pointers, such as <ptr> for internal pointers, <pidptr> for pointers to the repository, and <urlptr> for pointers to the outside world.

Figure 16
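
The three-way split between DOM, repository, and wild can be sketched as a simple dispatch. The element names <ptr>, <pidptr>, and <urlptr> are the theoretical ones suggested above; everything else in this Python sketch is an assumption, in particular the rule that any target that is neither a known id nor a URL is treated as a repository PID.

```python
from enum import Enum

class Scope(Enum):
    DOM = "dom"                # inside the current GDMS structure
    REPOSITORY = "repository"  # elsewhere in the library's repository
    WILD = "wild"              # everything outside the repository

# Proposed pointer element for each scope (theoretical, per the text).
POINTER_ELEMENT = {
    Scope.DOM: "ptr",
    Scope.REPOSITORY: "pidptr",
    Scope.WILD: "urlptr",
}

def classify(target, dom_ids):
    """Guess the scope of a pointer target.
    Assumption: anything that is not a known id or a URL is a PID."""
    if target in dom_ids:
        return Scope.DOM
    if target.startswith(("http://", "https://")):
        return Scope.WILD
    return Scope.REPOSITORY

dom_ids = {"wall13", "building5", "column15"}   # invented ids
for target in ("wall13", "demo:42", "http://example.org/forum.jpg"):
    scope = classify(target, dom_ids)
    print(target, scope.value, POINTER_ELEMENT[scope])
```

A real implementation would need an actual test for PID syntax, which the text does not specify.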

Another alternative is to dispense with the <extptr> in the <rescon> and use <ref>s to create binary relationships between each pair of related objects (e.g., href addresses). We might use a pair of <ref> elements between a pair of DOM objects, a <pidref> for DOM-repository pairs, and a <urlref> for DOM-wild pairs.

There is no one best practice for this issue, since each project will have its own particular pattern of relationships. Of course, if a project isn’t disciplined about how it organizes and tracks relationships there is an excellent chance that information will be duplicated, contradicted, and lost. Best practices and good technical support can help avoid this problem.


6. APPENDIX

A-1. File organization at ingestion

The exact mechanisms for moving a project from the workspace to the repository are still being worked out. This will undoubtedly involve GDMS, FEDORA, METS, etc. and require complex negotiations to settle on a practical and realistic system. To start the discussion, we should consider what the workspace will present to the repository and what the repository’s final result will be. Figure 17 shows one possible set-up.

Figure 17

In this arrangement, the workspace contains a single GDMS file. Upon ingestion, the file is split up into five GDMS files, one each for the main sections (figure 18). One possible variation on this might be if an entire project is handled as a single FEDORA object, with the main GDMS files linked together as datastreams. There may be further FEDORA objects inside these initial datastreams.

Figure 18

A-2. PFP minimum element set

A-2.1. Catalog

The image catalog in a GDMS document or group of documents acts as a pool of resources for the larger project. Descriptive information about the “real world” object (i.e., what the picture shows) is in <div> tags elsewhere. The library requires a minimum set of administrative, technical, and descriptive information to construct individual image objects in the repository. The minimal information for each individual resource in the catalog includes information about the original source for the digital file and the digital object’s history and provenance (if any).

A-2.2. Header

The library requires the following elements in the GDMS header.

<gdmsid> 
	<system> 
		gdms file url 
	</system> 
</gdmsid>

<filedesc> 
	<pubStmt> 
		<title>
			Catalog of Pompeii resources 
		</title> 
	
		<agent type="creator" form="persname" role="author"> 
			Name of GDMS author 
		</agent> 

		<series> 
			The project that owns the GDMS file 
		</series> 

		<place type="publication">
			University of Virginia, Charlottesville, VA
		</place> 

		<time type="creation"> 
			<date> 
				The date of creation – YYYYMMDD 
			</date> 
		</time>

	</pubStmt> 
</filedesc>

This must be included in the header of every GDMS catalog file.
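
A library could test for these required elements before accepting a catalog file. The Python sketch below is one hedged way to do that: the element paths are taken from the template above, but the enclosing <header> wrapper, the sample values, and the function names are assumptions, since the text does not name the element that contains <gdmsid> and <filedesc>.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Paths taken from the header template above.
REQUIRED_PATHS = [
    "gdmsid/system",
    "filedesc/pubStmt/title",
    "filedesc/pubStmt/agent",
    "filedesc/pubStmt/series",
    "filedesc/pubStmt/place",
    "filedesc/pubStmt/time/date",
]

# Hypothetical header; the <header> wrapper and all values are invented.
SAMPLE_HEADER = """
<header>
  <gdmsid><system>http://example.org/pfp/catalog.xml</system></gdmsid>
  <filedesc><pubStmt>
    <title>Catalog of Pompeii resources</title>
    <agent type="creator" form="persname" role="author">A. Author</agent>
    <series>Pompeii Forum Project</series>
    <place type="publication">University of Virginia, Charlottesville, VA</place>
    <time type="creation"><date>20020115</date></time>
  </pubStmt></filedesc>
</header>
"""

def missing_header_elements(header):
    """Return the required paths that are absent or have no text."""
    missing = []
    for path in REQUIRED_PATHS:
        node = header.find(path)
        if node is None or not (node.text or "").strip():
            missing.append(path)
    return missing

def valid_creation_date(header):
    """Check that the creation date is a real date in YYYYMMDD form."""
    node = header.find("filedesc/pubStmt/time/date")
    text = (node.text or "").strip() if node is not None else ""
    if len(text) != 8 or not text.isdigit():
        return False
    try:
        datetime.strptime(text, "%Y%m%d")
        return True
    except ValueError:
        return False

header = ET.fromstring(SAMPLE_HEADER.strip())
print(missing_header_elements(header), valid_creation_date(header))
```

The explicit length and digit test is there because strptime alone accepts some shortened forms, which would defeat the point of a fixed YYYYMMDD format.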

A-2.3. Resource description

A catalog resource description must include information about the resource and its digital equivalent (if the object wasn’t born digital). The digital version is considered a surrogate for the original resource (i.e., a photograph, a painting, or a drawing), and information about the two items needs to be kept distinct in the GDMS file [3]. The distinction becomes more important when tracking administrative, technical, and, to some extent, descriptive metadata.

The library’s minimum set of metadata includes who created the resource, when it was created, where it was created, technical details about the digital file, a brief textual description of the image, and a URL for the digital file. In cases where the two sets of information can be confused, as when recording a resource’s creator, the information’s context becomes important. To help clarify this, information about the original resource is recorded inside the <source> element. The <digiprov> tag holds records of any transformation or migrations of the digital object file after the creation date. The <adminrights> and <technical> tags record information about the digital objects.

-- begin cathead info --
<cat> 
	<cathead> 
		-- A <cathead> is used to distinguish between 
		several catalogs in one file and to help call 
		catalogs across files. -- 
		<filedesc></filedesc>
		<projectdesc></projectdesc> 
		<profiledesc></profiledesc>
		<revisiondesc></revisiondesc> 
	</cathead> 
-- end cathead info --
-- begin digital image info --
<res> 
[an attribute will be added here to note if object 
has digital surrogate or not – D. McShane will do] 
	<agent 
		type="contributor" 
		form="persname/corpname" 
		role="digitizer">
		Name of person who digitized the image 
	</agent>
	<time type="reformat"> 
		<date> 
		The date of digitization of the original 
		image (original photo or whatever)
		</date> 
	</time> 
	<place type="published">
		--Use if the image was acquired from a source 
		other than the PFP staff.--  
		<geogname> 
			Place of publication
		</geogname> 
		<date> 
			Publication date 
		</date> 
	</place> 
	<rights type="unrestricted|educational use|restricted"> 
		Statement of use and accessibility of a resource 
	</rights> 
	<digiprov>
		-- Use to record any transformations or 
		migrations undergone by files composing the 
		digital object subsequent to the initial 
		digitization of an item or, in the case of 
		born digital materials, the files' creation. -- 
	</digiprov> 
-- end digital image info --
-- begin original photo info --
<source> 
	<agent 
		type="creator/provider" 
		form="persname/corpname" 
		role="photographer"> 
		Creator or provider of image
	</agent> 

	<time type="creation"> 
		<date> 
			The date of creation for the original image 
		</date> 
	</time>

	<title type="main"> 
		Title for the image
	</title> 
	--or-- 
	<description type="summary"> 
		A description of the image, i.e. what it is 
	</description>

	<physdesc type="extent" units="inches"> 
		Enter size of original photograph, i.e. 3X5 
	</physdesc> 
	<mediatype type="image"> 
		<form> 
		--optional element to further qualify mediatype-- 
		</form> 
	</mediatype> 
</source>
-- end original photo info -- 
-- begin digital image administrative and 
technical info --
<adminrights> 
	-- Not used right now. Cornell must determine how to 
	resolve user profile (permissions) with resource 
	permissions. -- 
</adminrights>

<technical> 
	-- You may want to look at the definitions for these 	
	at: http://dl.lib.virginia.edu/html/admin/ to determine 
	how many you want to require. Most of the following are 
	empty elements by design. The values are stored in the 
	attributes. I didn't include attributes for most, you 
	should have a look and determine what you want. --
	<image>
		<compression 
			method="method of compression" 
			amount="amount of compression" /> 
		<format> 
			<segment />
			<planar_config></planar_config> 
			<orientation />
		</format> 
		<spatialmetrics> 
			<dimensions />
			<sampling_frequency /> 
		</spatialmetrics> 
		<energetics>
			<sample /> 
			<color_map /> 
			<gray_response /> 
			<chromaticities /> 
		</energetics> 
	</image> 
</technical> 

<resptrgrp>
	<resptr 
		targetype="mimetype of the image" 
		href="URL location of image file" /> 
	-- The attributes on this tag are required. 
	One possible point of confusion, though, is that we might 
	need to distinguish between internal and external pointers. 
	The current consensus is that external pointers can use 
	hrefs and internal pointers can use idrefs. This will 
	probably change. -- 
</resptrgrp> 
</res>
-- end digital image administrative and technical info --
</cat>


Footnotes

1. The choice between idrefs and Xpaths is fairly important, although it is not something that the user should ever have to think about. The GDMSTool can generate unique idrefs for objects it creates and Xpaths are by nature unique. Xpaths are (theoretically) cheaper to use, since the system doesn’t have to download the entire project in order to find the object attached to the specified id.

2. It may be preferable to use PIDs plus a disseminator or Xpath for objects in the DOM after ingestion, since objects are automatically assigned PIDs at ingestion. There are pros and cons on each side, but one big concern with idrefs is the processor cost of rebuilding the DOM every time you need to trace an idref.

3. This leads to the question of what happens if there is no need for a surrogate, when the file is born digital (from a digital camera, for example). The most obvious solution would be to simply leave out the <source> tag or leave it empty, but that can lead to confusion about whether source information is just missing or doesn’t exist. It also means that descriptive information might not be tagged or discovered correctly (since the minimum element set here assumes that this information is inside <source>). An alternative might be to add some kind of attribute or element to <res> or <source> that can indicate that there is no surrogate and that there will therefore be no <source>. This, however, means that the DTD is being changed to enforce current best practices, which is perhaps not desirable.


© 2002 IATH at the University of Virginia. All rights reserved.