Metadata and the Collection Object
Table of Contents
- 4.1. Catalogs
- 4.2. Physical model
- 5.1. Wall 13
- 5.2. Relationships
- 5.3. Tracking internal and external relations
- A-1. File organization at ingestion
- A-2. PFP minimum element set
- A-2.1. Catalog
- A-2.2. Header
- A-2.3. Resource description
1. INTRODUCTION
- Collection Object: an aggregation of digital objects
where the aggregate form has intellectual content beyond its specific digital
objects and the simple membership set.
The definition above is clearly a direct analogy to an edited volume
(e.g., the collected works of Kilgore Trout). Such edited volumes have several
components:
- a representation of each primary work,
- core binding structures (e.g. table of contents, index, and
credit for original works), and
- editorial content (e.g., preface to the volume and maybe to
each work, ordering and groupings of the works [as evidenced in the TOC and
maybe in more elaborate discussions], and substantial interpretations of the
primary works and their interrelations).
A central repository will have metadata for each digital object and
an overall index of its digital objects (possibly in both an administrative and
a user form). Those two aspects of a central repository imply that the central
repository will subsume any collection object that does not
include editorial content (as described above).
The primary value of a collection object in a digital repository
derives from the information in its editorial content. For the purposes of
this discussion, that information can be split into two subcomponents:
- 3a. interpretation: the editorial content
- 3b. metadata: descriptive information to aid discovery
The interpretation section might well be another digital object,
like the original works of the primary collection, or a smaller work (such as a
preface).
Our main interest in this discussion concerns the metadata
subcomponent. One critical observation is that a collection object provides both an
opportunity and a responsibility to establish metadata in context. The context
here is a particular collection’s point of view and interests. For
example, a collection focused on religious scholarship and a collection about
historical construction techniques may refer to the same building, but they
will use different metadata to indicate the conceptual function the building
holds in each context.
Note that the term "metadata" here means information specifically
recorded to enable user discovery, perhaps via a library-wide index, an
"information community" index, or a project (collection) index.
2. CONTROLLING METADATA
The Supporting Digital Scholarship (SDS) Technical Committee has
been investigating the technical issues associated with transforming
born-digital materials (as exemplified by several IATH projects) into library
resources. A major aspect of this transformation is consolidating a
project’s materials into a coherent whole. For the purposes of this
discussion we will consider that whole to be a collection object that comprises
the materials for the project. Note that a project may already have its own set
of collection objects, which become components of the overall collection
object.
[***This still needs a decent transition to move from
collection objects to MD***] Metadata can encompass a wide variety of
information about a work or collection, its resources, and its connections to
other networked collections. It is critically important to the administrative,
scholarly, and technical spheres of influence inside and outside an individual
work, and any plan for collecting digital scholarship must seriously and
thoroughly consider this topic. However, it is a slippery subject and is
difficult to define, collect, and manage. How can an author tell the difference
between data and metadata? How can a collector know what metadata a project
should have? How can a library conveniently and reliably identify and use
metadata from every project it acquires?
The technical committee has been discussing ways to represent and
organize metadata for several months. There is no generic solution: while
there are some types of information that all collection objects or projects
produce, each project will have its own particular combination of information
that it needs to store and track. An archaeology project, for example, might be
primarily interested in mapping information against a physical location, while
a literary or historical project might want to track information against a
timeline or an intellectual construct.
There are some ways to control the information. The GDMS tools we
are developing allow a great deal of flexibility and creativity in collecting,
segregating, holding, and discovering different types of metadata. We are also
pursuing a basic division between catalogs of resources and
an intellectual/physical model, which helps authors distribute information in a
relatively straightforward structure.
3. INFORMATION OVERLOAD
Metadata can feed on itself, producing an apparently endless flow of
recursive definition and explanation. This can easily overwhelm and drown an
otherwise well-organized project. The project creator and the collecting
library both need to guard against too much of a good thing.
- Too much information
One benefit of on-line scholarship is that a project can
build and use enormous amounts of multimedia information. A single project can
encompass all of William Blake’s collected works, a cathedral’s
structural history, or the intimate details of life in the U.S. Civil War. It
may hold thousands of individual resources (images, texts, audio files, etc.),
each with its own administrative, descriptive, and technical metadata. All
this information demands precise and conscientious management before, during,
and after a project is created and collected.
- Too many contexts
- No single vision to unite workspace and repository
- Nonstandard projects
4. EXAMPLE
The Pompeii Forum Project (PFP) is a good case study for examining
these issues. PFP is a collaborative research effort in archaeology and the
history of urban design. It aims to provide the first systematic documentation
of the architecture and decoration of the Forum, interpret evidence as it
pertains to Pompeii's urban history, and make wider contributions to both the
history of urbanism and contemporary problems of urban design. The primary
resources in the site are images and analytic information. The images are
photos of the physical site, taken by PFP team members during on-site digging
and research. The photos have been digitized and marked with identifying
information (where and when each photo was taken, what it shows, and so forth).
The analytic information consists of field notes and observations.
These two categories of information resources – digital images
and textual analysis – and metadata about the resource files must be
stored and maintained. The information is linked to the physical forum by a
virtual structural hierarchy which copies the physical model. The model starts
with the whole forum at the top of the hierarchy, then works its way down
through buildings, columns, and individual walls. This arrangement is
convenient but it creates small corners and dead ends for isolated pieces of
information. What might seem an intuitive arrangement to the project authors
may frustrate users and libraries. Best practices need to be established and
observed to help the library maintain and track information.
4.1. Catalogs
The PFP catalogs hold digital resource files and metadata about
the resources. Each catalog has two sections, headed by one <cathead> and
at least one <res> element. A <cathead> element holds information
about the catalog itself, such as project and file descriptions and any
revisions made to the catalog entries. A <res> holds technical,
administrative, and descriptive information about a specific resource. A
<res> can also have a pointer to a digital resource (inside a
<resptr> element) or textual commentary/observation in a <rescon>
element (Figure 1).
A <res> can hold either a <resptrgrp> collection of
pointers to digital resources or a <rescon> text resource. A
<rescon> cannot point to a digital resource. This keeps text and digital
resources segregated.
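The containment rules above can be checked mechanically. The following is a minimal sketch in Python, not part of the GDMS tools: it assumes a heavily simplified catalog in which only the element names (<cat>, <cathead>, <res>, <resptrgrp>, <resptr>, <rescon>) come from the text, while the ids, the file name, and the function name check_catalog are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified catalog. Only the element names come from
# the text; the ids, file name, and note text are made up for illustration.
SAMPLE_CATALOG = """
<cat>
  <cathead/>
  <res id="img1"><resptrgrp><resptr href="wall13.jpg"/></resptrgrp></res>
  <res id="note44"><rescon>Wall 13 is next to building 5.</rescon></res>
  <res id="mixed"><resptrgrp/><rescon>Text and pointers together.</rescon></res>
</cat>
"""


def check_catalog(xml_text):
    """Return a list of problems found in one <cat> element."""
    cat = ET.fromstring(xml_text)
    problems = []
    if len(cat.findall("cathead")) != 1:
        problems.append("expected exactly one <cathead>")
    resources = cat.findall("res")
    if not resources:
        problems.append("expected at least one <res>")
    for res in resources:
        res_id = res.get("id")
        # A <res> holds pointers or text, never both.
        if res.find("resptrgrp") is not None and res.find("rescon") is not None:
            problems.append(f"{res_id}: has both <resptrgrp> and <rescon>")
        # A <rescon> must not point to a digital resource.
        for rescon in res.iter("rescon"):
            if rescon.find(".//resptr") is not None:
                problems.append(f"{res_id}: <rescon> contains a <resptr>")
    return problems


problems = check_catalog(SAMPLE_CATALOG)
print(problems)
```

In this sample only the third <res> is reported, because it mixes pointers and text. A real check would be driven by the GDMS DTD rather than by hand-written rules.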
Figure 1: Basic catalog structure and metadata distribution
PFP splits these up into two separate catalogs: the image catalog
holds information about and pointers to image objects (via <resptrgrp>)
and the observation catalog holds textual observations and commentary about the
forum and related topics via <rescon> elements. The data inside the
observation catalog are probably not directly related to any of the pictures
(they don’t describe the pictures, how they were taken, or what they
show) but can be connected to the photos via pointers. The observation catalog
is a convenient catch-all for descriptive and interpretive information about a
wide variety of subjects. Information that falls outside a “full
text” reading is a hazy category, since it is metadata and should be
discoverable but shouldn’t displace primary materials. Observations
shouldn’t become a junk room where things get tangled up and lost.
Note that in PFP the image catalog does not use <rescon>
elements. This is to avoid extensive semantics (as might be included in a
<rescon>) in the image catalog.
4.2. Physical model
The model of the physical forum is more complicated, since it
contains the project’s intellectual activity. The physical model holds
information about the physical site and analyzes the site, using the resources
stored in the catalogs. A hierarchy of <div> elements organizes the
information around the forum’s physical structure. Each <div> holds
metadata about itself and the particular area of the Forum that it covers.
There are three main sections in a <div> (figure 2).
Figure 2: Physical model metadata distribution
- The <divhead> holds metadata about the <div>,
but not about the resources or information in the <div>.
- The <divdesc> contains descriptive metadata about the
part of the physical model encompassed by the current <div>.
- Each <resgrp> holds metadata about one or more
resources. This does not include information about the digital files or the
resources themselves but analyses of the resources and their import at the
catalog level. Each resource is in a <res> element, which holds
administrative, technical, and descriptive information about the
resource’s intellectual content. The <source> element stores
descriptive metadata about the resource’s original source
[***digital or otherwise?***].
Figure 3, below, shows how the physical model connects
<div>s and resources. The <relation> element links to other parts
of the physical model. As of right now, PFP’s <relation> tags do
not point to the catalogs. Instead, all pointers to catalog resources are in
<resptr> elements. Similarly, only <relation> elements are used to
point to other parts of the physical model. This is not required by the GDMS
DTD but it may help avoid confusion.
Figure 3: Basic physical model structure
As in the catalogs, a <res> can hold either a
<resptrgrp> or a <rescon>.
5. ORGANIZING METADATA
To avoid utter chaos, PFP must identify and follow best practices
for categorizing different kinds of metadata. The rest of this paper discusses
different aspects of this problem and possible solutions.
5.1. Wall 13
Let’s consider information about an imaginary wall 13 in the
Pompeii Forum, which is located near the equally imaginary building 5 and
column 15. To avoid confusion, let us assume that the wall is not actually part
of the building, but is adjacent to it.
Figure 4, below, shows that the observation catalog contains field
note 44, which says that wall 13 is next to building 5 and lines up with column
15. Note that this information could also be stored in the <div>, inside
<relation> elements [***check***]. The observation catalog will also note
any mentions of the wall in journal articles and books and discussions of wall
13’s greater meaning in the Forum site. The <div> for this
particular wall in the physical model describes the wall (height, length,
color, etc.) and its location. The image catalog has a <res> with
information about a picture that shows the wall and a pointer to the digital
image file.
The <res> in the wall 13 <div> would probably hold a
list of pointers to all catalog resources that are mentioned in the
<divdesc> or in the observation catalog.
Figure 4: Information about wall 13
5.2. Relationships
Probably the most important, and most difficult, aspect of a
project like PFP is handling relationships between resources. This would
include simple relations such as “this wall stands next to that
wall” and more complex relations that involve groups of resources. There
are two parts to this problem. One is how to tag a relationship between
resources (i.e., how to actually point to another resource) and the other is
how to most clearly organize information about these relationships (the
information may be in image files, journal articles, field notes, and assorted
other metadata in the catalogs and <div>s).
Resources that have some kind of relationship can be connected by
an include, in which other information (whether in a catalog or another
<div>) is pulled into the current <div>, or by a pointer, which
records the information’s location (essentially, “it’s over
there” or “look over in yonder catalog”). The <resinc>,
<catinc>, and <divinc> tags all act as includes and can use idrefs
or Xpaths to tell the system where to find the desired information.[1]
But before considering how to implement relationships, we’ll
take a brief look at the relationships themselves. The simplest kind begins
with an object (O) that exhibits an attribute (A) in a manner specified by a
value (V). This forms the triple (O, A, V), sometimes referred to as an
associative triple. It could be diagrammed as something like figure 5a, right.
Figure 5: binary relations
An example of an associative triple is (Mary, eye_color, blue),
which says that “Mary has blue eyes.” Textually it could be
recorded as a two-place predicate EC(Mary, blue), with the predicate
“EC” representing eye-color. This could in turn be diagrammed as
shown in figure 5b.
The situation becomes slightly more complex when
“value” is a specific object in its own right. For example, Mary
might have the “attribute” mother_of with the “value”
Sue. The triple (Mary, mother_of, Sue) may seem to be more clearly a
relationship between two objects than the (Mary, eye_color, blue) triple.
Figure 5c shows a possible diagram of the relationship between Mary and Sue.
This is often referred to as a binary relationship, since two objects are
involved.
We could also record this relationship from Sue’s point of
view, as (Sue, mother_is, Mary). Note that the name of the relationship
(i.e., the attribute “mother_is”) must be changed to read properly
in English, but the two relationships form a symmetric pair (figure 5d). If
Mary and Sue are siblings, the single sibling_of relationship appears more
immediately symmetric (figure 5e).
However, notice that if we record this as a triple (Mary,
sibling_of, Sue), as the predicate SO(Mary, Sue), or in a diagram (the top half
of figure 5e) there is an explicit order: all these are from Mary’s point
of view. That may be an unintended consequence of the representation scheme and
may have strong implications for the possible access to the information.
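To make the point-of-view problem concrete, here is a minimal sketch in Python. It stores each relationship once as an (object, attribute, value) triple and uses a small table of inverse names so the same fact can be read from either side. The INVERSE table, the function name about, and the use of Jill as the mother are illustrative assumptions, not part of GDMS.

```python
# Each relationship is recorded once, from one object's point of view.
TRIPLES = {
    ("Mary", "eye_color", "blue"),
    ("Jill", "mother_of", "Mary"),
    ("Mary", "sibling_of", "Sue"),
}

# Attribute names that can be read in the other direction. A symmetric
# relation such as sibling_of is its own inverse; eye_color has none.
INVERSE = {
    "mother_of": "mother_is",
    "mother_is": "mother_of",
    "sibling_of": "sibling_of",
}


def about(name, triples):
    """Return every triple that can be stated from `name`'s point of view."""
    found = set()
    for obj, attr, value in triples:
        if obj == name:
            found.add((obj, attr, value))
        elif value == name and attr in INVERSE:
            # Flip the triple and rename the attribute so it reads properly.
            found.add((value, INVERSE[attr], obj))
    return found


print(sorted(about("Sue", TRIPLES)))
print(sorted(about("Mary", TRIPLES)))
```

Without the inverse table, a lookup that only matches the first element of each triple would find nothing for Sue, which is the access problem the stored order creates.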
We will return to these implications, but first let us turn to a
more elaborate scenario involving three objects, called a three-ary relation.
If we give Mary and Sue another sibling, Joe, the predicate becomes a
three-place predicate, SO(Mary, Sue, Joe).
Figure 6
The associative triple representation is not directly applicable
to the three-ary relation, but we can use a pair of triples, (Mary, sibling_of,
Sue) and (Mary, sibling_of, Joe), as diagrammed in figure 6. The access
implications are clear, in that Sue and Joe’s sibling relationship is not
recorded but must be computed. If we want to make Sue and Joe’s
relationship explicit in the symmetric relations discussed above, the
complexity of the representation increases greatly (figure 7).
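The computation mentioned above can be sketched as follows, in Python. The sketch assumes that sibling_of is both symmetric and transitive, groups the people connected by the recorded triples, and then lists every ordered pair in each group. The function names are hypothetical.

```python
from itertools import permutations

# Only Mary's point of view is recorded.
RECORDED = [
    ("Mary", "sibling_of", "Sue"),
    ("Mary", "sibling_of", "Joe"),
]


def sibling_groups(triples):
    """Merge people connected by sibling_of triples into groups."""
    groups = []
    for a, attr, b in triples:
        if attr != "sibling_of":
            continue
        # Any existing group that already contains a or b is absorbed.
        touching = [g for g in groups if a in g or b in g]
        merged = {a, b}.union(*touching)
        groups = [g for g in groups if g not in touching] + [merged]
    return groups


def explicit_pairs(triples):
    """List every ordered sibling pair, recorded or computed."""
    return {
        (a, "sibling_of", b)
        for group in sibling_groups(triples)
        for a, b in permutations(sorted(group), 2)
    }


derived = explicit_pairs(RECORDED)
print(len(derived))
print(("Sue", "sibling_of", "Joe") in derived)
```

Two recorded triples expand to six explicit arcs for three siblings, which is the growth in complexity that the fully explicit representation incurs.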
Figure 7
If one of the objects changes or moves, all of the objects must be
informed, creating a possible maintenance problem. If we add another object
into this picture (figure 8, below), the picture quickly becomes more complex.
If all the objects are related to each other, there are now twelve
relationships. This is not necessarily a problem, depending on the particular
situation. In a model such as PFP it might severely limit the user’s
ability to move through the project.
Figure 8
Something like this could be implemented in the PFP model with
the <relation> tags. Figure 9 shows how this might work to inform wall 13
of its relations with building 5, column 15, and field note 44.
Figure 9: implementing binary relations in PFP
Note that the diagram shows that the <div>-<div>
relationships are symmetrical, but while wall 13 knows about field note 44, the
note doesn’t have a <relation> pointer back to the wall
(<relation> elements aren’t allowed in <rescon>s).
The three-ary relation in figure 7 involves six arcs, and if we
generalize this to a relation among n objects, called
an n-ary relation, the number of arcs can be described as n(n-1) or “of order
n²”. An alternative representation, described below, requires only 2n arcs (“of order
n”) but slightly more computation in order to
access the node. N-ary relations are not laid out from any one object’s
point of view, as the binary relations shown in figure 6 are, and do not impose
an explicit order among the objects.
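The two arc counts can be compared directly. The short Python sketch below tabulates n(n-1) for fully explicit pairwise relations against 2n for a separate relation node. It reads 2n as n arcs from the relation to its objects plus n optional back pointers, which is an interpretation of the text rather than something it states. The function names are hypothetical.

```python
def pairwise_arcs(n):
    """Arcs needed when every object points at every other object."""
    return n * (n - 1)


def relation_node_arcs(n, back_pointers=True):
    """Arcs needed when a separate relation node points at each object.

    With back pointers, each object also points at the relation node.
    """
    return 2 * n if back_pointers else n


for n in (3, 4, 10, 100):
    print(n, pairwise_arcs(n), relation_node_arcs(n))
```

For three objects the two counts coincide at six, for four objects they are twelve and eight, and the gap widens quickly from there.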
In this scenario the relationship is separated from the objects
and treated like a quasi-object in its own right. Using the same example as in
figure 7, we get the diagram shown in figure 10. The dashed circle is the new
node for the relationship. Here, only the arcs from the relationship to the
objects (shown as solid lines in figure 10) are required. That means that the
objects are not directly aware of the relationship or of any other objects in
the relationship. If desired, the objects can have pointers back to the
relationship, shown in figure 10 as dashed lines.
Figure 10: n-ary relationship based on figure 7
If we add the mother_of relation shown in figure 8, the predicate
MO(Jill, Mary, Sue, Joe), we get the diagram shown in figure 11a. Note that
this diagram doesn’t explicitly indicate that Jill is the mother (but
then neither does the predicate representation, which relies on the order of
the arguments and a separate specification that the first argument is the
mother). This information can be made explicit by arc labels, as shown in
figure 11b. This allows unordered, ordered, and mixed relations and avoids the
order bias shown in figure 6 and the complexity shown in figure 7.
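A relation node with labeled arcs might be modeled as in the following Python sketch. The Relation class, its methods, and the role names "mother", "child", and "sibling" are assumptions made for illustration; the text does not specify the labels used in figure 11b.

```python
from dataclasses import dataclass, field


@dataclass
class Relation:
    """A relationship treated as a quasi-object with labeled arcs."""
    name: str
    arcs: list = field(default_factory=list)  # (role, object name) pairs

    def add(self, role, obj):
        self.arcs.append((role, obj))
        return self  # allows chained calls

    def members(self, role=None):
        """Objects in the relation, optionally filtered by arc label."""
        return [o for r, o in self.arcs if role is None or r == role]


mother_of = (
    Relation("mother_of")
    .add("mother", "Jill")
    .add("child", "Mary")
    .add("child", "Sue")
    .add("child", "Joe")
)
sibling_of = (
    Relation("sibling_of")
    .add("sibling", "Mary")
    .add("sibling", "Sue")
    .add("sibling", "Joe")
)


def relations_of(obj, relations):
    """Find the relations an object takes part in, without back pointers."""
    return [r.name for r in relations if obj in r.members()]


print(mother_of.members("mother"))
print(relations_of("Mary", [mother_of, sibling_of]))
```

The objects carry no knowledge of the relations here, so finding them means scanning the relation nodes. That scan is the extra computation traded for the smaller number of arcs.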
Figure 11a: two n-ary relationships based on figure 8
Figure 11b: n-ary relationships with arc labels
Figure 12, below, shows how an n-ary relationship could be used
in PFP. Wall 13, building 5, and column 15 are aligned. The
“alignment” relationship object has pointers to all three objects,
with “role” attributes to further define the objects’
relationship.
Figure 12: n-ary relationships in PFP
This raises the question of how the relation node in figure 12
could be recorded in a GDMS structure. In many cases, making the node part of
any one of the objects involved in the relationship would introduce an order
bias (as in figure 6). One possible set-up is shown in figure 13.
Figure 13: n-ary relationships in PFP <cat>s and <div>s
The data in the <rescon> does not necessarily qualify as
metadata but might better be called data. Whether it’s called one or the
other doesn’t necessarily affect where it is stored or how it is tagged,
but it may be a potential source of great confusion for authors and libraries.
[***need to expand this idea further***]
5.3. Tracking internal and external relations
Relationships between internal and external resources complicate
the situation further. If we want to refer to metadata in both internal and
external resources, we need to differentiate between pointers to objects inside
the project’s immediate boundaries (the DOM, for a specific GDMS
structure) and outside (the larger repository and the outside world). The
repository is the library’s repository space (although it could
theoretically encompass other repositories). The wild is everything outside the
repository, as shown in figure 14. Objects inside a DOM can be found via
idrefs. Objects in the repository have PIDs, and can be found with either
Xpaths or a disseminator.[2] Objects in the
wild will probably be found via URLs.
Figure 14: boundaries
Figure 15 shows a possible way to handle external and internal
resource relationships. This uses extended pointers (<extptr>) to hold
information about the relationship between multiple objects. The information
and links to all of the objects are kept in the observation catalog in a
<rescon> element. A textual explanation of the relationship can also be
kept in the <rescon>. Or, as in figure 13, a “role” attribute
could be used to type each relationship (e.g.,
“role=’is_next_to’”). XLink might be used in this
setting.
Figure 15: tracking internal and external relations in PFP
Note that in this diagram the objects themselves do not know about
the relationship. This can be rectified via some kind of back pointer pointing
from the objects to the observation catalog, or the objects can have binary
relationships pointing directly to one another. Figure 16, below,
shows how pointers between objects can be used to note that there is a
relationship without repeating the information in the observation catalog.
Theoretically, different kinds of pointers would help differentiate between
internal and external pointers, such as <ptr> for internal pointers,
<pidptr> for pointers to the repository, and <urlptr> for pointers
to the outside world.
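Choosing among the three pointer types could be sketched as follows, in Python. The element names <ptr>, <pidptr>, and <urlptr> come from the suggestion above. The classification rule is an assumption: a target is treated as internal if its id is known in the current DOM, as external if it looks like an http(s) URL, and as a repository PID otherwise. The sample PID is hypothetical.

```python
from enum import Enum


class Scope(Enum):
    DOM = "dom"                # inside the project's own GDMS structure
    REPOSITORY = "repository"  # elsewhere in the library's repository
    WILD = "wild"              # everything outside the repository


POINTER_ELEMENT = {
    Scope.DOM: "ptr",
    Scope.REPOSITORY: "pidptr",
    Scope.WILD: "urlptr",
}


def pointer_for(target, local_ids):
    """Pick a pointer element for a target (assumed classification rule)."""
    if target in local_ids:
        scope = Scope.DOM
    elif target.startswith(("http://", "https://")):
        scope = Scope.WILD
    else:
        # Assumption: anything else is a repository PID.
        scope = Scope.REPOSITORY
    return POINTER_ELEMENT[scope], scope


local_ids = {"wall13", "building5", "column15"}
for target in ("wall13", "example:1234", "http://example.org/forum.html"):
    print(target, pointer_for(target, local_ids))
```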
Figure 16
Another alternative is to dispense with the <extptr> in the
<rescon> and use <ref>s to create binary relationships between each
pair of related objects (e.g., href addresses). We might use a pair of
<ref> elements between a pair of DOM objects, a <pidref> for
DOM-repository pairs, and a <urlref> for DOM-wild pairs.
There is no one best practice for this issue, since each project
will have its own particular pattern of relationships. Of course, if a project
isn’t disciplined about how it organizes and tracks relationships there
is an excellent chance that information will be duplicated, contradicted, and
lost. Best practices and good technical support can help avoid this problem.
6. APPENDIX
A-1. File organization at ingestion
The exact mechanisms for moving a project from the workspace to
the repository are still being worked out. This will undoubtedly involve GDMS,
FEDORA, METS, etc. and require complex negotiations to settle on a practical and
realistic system. To start the discussion, we should consider what the
workspace will present to the repository and what the repository’s final
result will be. Figure 17 shows one possible set-up.
Figure 17
In this arrangement, the workspace contains a single GDMS file.
Upon ingestion, the file is split up into five GDMS files, one each for the
main sections (figure 18). One possible variation on this might be if an entire
project is handled as a single FEDORA object, with the main GDMS files linked
together as datastreams. There may be further FEDORA objects inside these
initial datastreams.
Figure 18
A-2. PFP minimum element set
A-2.1. Catalog
The image catalog in a GDMS document or group of documents acts
as a pool of resources for the larger project. Descriptive information about the
“real world” object (i.e., what the picture shows) is in
<div> tags elsewhere. The library requires a minimum set of
administrative, technical, and descriptive information to construct individual
image objects in the repository. The minimal information for each individual
resource in the catalog includes information about the original source for the
digital file and the digital object’s history and provenance (if
any).
-
A-2.2. Header
The library requires the following elements in the GDMS
header.
<gdmsid>
<system>
GDMS file URL
</system>
</gdmsid>
<filedesc>
<pubStmt>
<title>
Catalog of Pompeii resources
</title>
<agent type="creator" form="persname" role="author">
Name of GDMS author
</agent>
<series>
The project that owns the GDMS file
</series>
<place type="publication">
University of Virginia, Charlottesville, VA
</place>
<time type="creation">
<date>
The date of creation – YYYYMMDD
</date>
</time>
</pubStmt>
</filedesc>
This must be included in the header of every GDMS catalog file.
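A library could check for these elements at ingestion. The following Python sketch is one possible approach, not the library's actual procedure. It assumes the two header fragments are wrapped in a single element, here called <header>, which is a hypothetical name, and it treats an element as missing if it is absent or empty. The sample values, including the URL, are made up.

```python
import re
import xml.etree.ElementTree as ET

# Paths of the required elements, relative to the assumed wrapper element.
REQUIRED_PATHS = [
    "gdmsid/system",
    "filedesc/pubStmt/title",
    "filedesc/pubStmt/agent",
    "filedesc/pubStmt/series",
    "filedesc/pubStmt/place",
    "filedesc/pubStmt/time/date",
]


def header_problems(header):
    """Return the required elements that are absent, empty, or malformed."""
    problems = []
    for path in REQUIRED_PATHS:
        element = header.find(path)
        if element is None or not (element.text or "").strip():
            problems.append(path)
    date = header.find("filedesc/pubStmt/time/date")
    text = (date.text or "").strip() if date is not None else ""
    # The sample header asks for the creation date as YYYYMMDD.
    if text and not re.fullmatch(r"\d{8}", text):
        problems.append("filedesc/pubStmt/time/date (expected YYYYMMDD)")
    return problems


SAMPLE_HEADER = """
<header>
  <gdmsid><system>http://example.org/pfp/catalog.xml</system></gdmsid>
  <filedesc><pubStmt>
    <title>Catalog of Pompeii resources</title>
    <agent type="creator" form="persname" role="author">Name of GDMS author</agent>
    <series>Pompeii Forum Project</series>
    <place type="publication">University of Virginia, Charlottesville, VA</place>
    <time type="creation"><date>2002-05-01</date></time>
  </pubStmt></filedesc>
</header>
"""

found = header_problems(ET.fromstring(SAMPLE_HEADER))
print(found)
```

The sample is flagged only for its date, which uses hyphens instead of the YYYYMMDD form.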
A-2.3. Resource description
A catalog resource description must include information about
the resource and its digital equivalent (if the object wasn’t born
digital). The digital version is considered a surrogate for the original
resource (i.e., a photograph, a painting, or a drawing), and information about
the two items needs to be kept distinct in the GDMS file.[3]
The distinction becomes more important when tracking administrative, technical,
and, to some extent, descriptive metadata.
The library’s minimum set of metadata includes who created
the resource, when it was created, where it was created, technical details
about the digital file, a brief textual description of the image, and a URL for
the digital file. In cases where the two sets of information can be confused,
as when recording a resource’s creator, the information’s context
becomes important. To help clarify this, information about the original
resource is recorded inside the <source> element. The <digiprov>
tag holds records of any transformation or migrations of the digital object
file after the creation date. The <adminrights> and <technical>
tags record information about the digital objects.
-- begin cathead info --
<cat>
<cathead>
-- A <cathead> is used to distinguish between
several catalogs in one file and to help call
catalogs across files. --
<filedesc></filedesc>
<projectdesc></projectdesc>
<profiledesc></profiledesc>
<revisiondesc></revisiondesc>
</cathead>
-- end cathead info --
-- begin digital image info --
<res>
[an attribute will be added here to note if object
has digital surrogate or not – D. McShane will do]
<agent
type="contributor"
form="persname/corpname"
role="digitizer">
Name of person who digitized the image
</agent>
<time type="reformat">
<date>
The date of digitization of the original
image (original photo or whatever)
</date>
</time>
<place type="published">
--Use if the image was acquired from a source
other than the PFP staff.--
<geogname>
Place of publication
</geogname>
<date>
Publication date
</date>
</place>
<rights type="unrestricted|educational use|restricted">
Statement of use and accessibility of a resource
</rights>
<digiprov>
-- Use to record any transformations or
migrations undergone by files composing the
digital object subsequent to the initial
digitization of an item or, in the case of
born digital materials, the files' creation. --
</digiprov>
-- end digital image info --
-- begin original photo info --
<source>
<agent
type="creator/provider"
form="persname/corpname"
role="photographer">
Creator or provider of image
</agent>
<time type="creation">
<date>
The date of creation for the original image
</date>
</time>
<title type="main">
Title for the image
</title>
--or--
<description type="summary">
A description of the image, i.e. what it is
</description>
<physdesc type="extent" units="inches">
Enter size of original photograph, i.e. 3X5
</physdesc>
<mediatype type="image">
<form>
--optional element to further qualify mediatype--
</form>
</mediatype>
</source>
-- end original photo info --
-- begin digital image administrative and
technical info --
<adminrights>
-- Not used right now. Cornell must determine how to
resolve user profile (permissions) with resource
permissions. --
</adminrights>
<technical>
-- You may want to look at the definitions for these
at: http://dl.lib.virginia.edu/html/admin/ to determine
how many you want to require. Most of the following are
empty elements by design. The values are stored in the
attributes. I didn't include attributes for most, you
should have a look and determine what you want. --
<image>
<compression
method="method of compression"
amount="amount of compression" />
<format>
<segment />
<planar_config></planar_config>
<orientation />
</format>
<spatialmetrics>
<dimensions />
<sampling_frequency />
</spatialmetrics>
<energetics>
<sample />
<color_map />
<gray_response />
<chromaticities />
</energetics>
</image>
</technical>
<resptrgrp>
<resptr
targetype="mimetype of the image"
href="URL location of image file" />
-- The attributes on this tag are required.
One possible point of confusion, though, is that we might
need to distinguish between internal and external pointers.
The current consensus is that external pointers can use
hrefs and internal pointers can use idrefs. This will
probably change. --
</resptrgrp>
-- end digital image administrative and technical info --
</res>
</cat>
Footnotes
1. The choice between idrefs and Xpaths is
fairly important, although it is not something that the user should ever have
to think about. The GDMSTool can generate unique idrefs for objects it creates
and Xpaths are by nature unique. Xpaths are (theoretically) cheaper to use,
since the system doesn’t have to download the entire project in order to
find the object attached to the specified id.
2. It may be preferable to use PIDs plus a
disseminator or Xpath for objects in the DOM after ingestion, since objects are
automatically assigned PIDs at ingestion. There are pros and cons on each side,
but one big concern with idrefs is the processor cost of rebuilding the DOM
every time you need to trace an idref.
3. This leads to the question of what
happens if there is no need for a surrogate, when the file is born digital
(from a digital camera, for example). The most obvious solution would be to
simply leave out the <source> tag or leave it empty, but that can lead to
confusion about whether source information is just missing or doesn’t
exist. It also means that descriptive information might not be tagged or
discovered correctly (since the minimum element set here assumes that this
information is inside <source>). An alternative might be to add some kind
of attribute or element to <res> or <source> that can indicate that
there is no surrogate and that there will therefore not be a <source>. This,
however, means that the DTD is being changed to enforce current best practices,
which is perhaps not desirable.
© 2002 IATH at the University of Virginia. All rights reserved.