Manuscript Transcription Database
Institute for Advanced Technology in the Humanities
University of Virginia

1.     About

2.     Prerequisites

3.     Downloading and installing MTD

3.1.       Editing index.html

3.2.       Editing Tools.inc

3.3.       Setting up the msstranscription database in MySQL

3.3.1.       From the command line

3.3.2.       From an administrative interface

3.3.3.       Editing the database and creating a user

3.4.       Restricting access

4.     Using MTD

4.1.       MTD home page

4.2.       Identifying users

4.3.       Editor Switchboard: managing the list of editors

4.3.1.       Add an editor

4.3.2.       Changing or deleting an editor

4.4.       Editor Interface: the editorial workspace

4.4.1.       Document hierarchy

4.4.2.       Document switchboards

4.4.3.       Adding items to the hierarchy

4.4.4.       Changing the list of transcribers

4.4.5.       Assigning work to a transcriber

4.4.6.       Status report

4.4.7.       Uploading final transcriptions

4.5.       Transcriber interface: transcriber workspace

4.5.1.       Starting and completing assignments

4.5.2.       Viewing other transcribers’ completed transcriptions

5.     Known bugs

1. About

The Manuscript Transcription Database (MTD) was originally built by Category4 Design in Charlottesville, VA, for Benjamin Ray’s Salem Witch Trials Documentary Archive and Transcription Project (http://www.iath.virginia.edu/salem/) at the University of Virginia’s Institute for Advanced Technology in the Humanities (IATH). The MTD was subsequently adapted and generalized for Kenneth Price and Ed Folsom’s The Walt Whitman Archive (http://www.iath.virginia.edu/whitman/), another IATH research project.

Both of these projects rely on skilled transcribers to read and transcribe handwritten manuscripts into an electronic format. The editors and transcribers involved in these projects are spread all over the world (from Finland to Nebraska), and they need to assign and view tasks, upload and download documents, coordinate workflow, and collect finished work. MTD was built in response to that need. It uses a central database and a set of editor and transcriber GUIs to organize and track documents and assignments. Editors can assign pages or documents to particular transcribers and see the current status of individual assignments, groups of documents, and the entire project. Transcribers can see their current assignments, download assignments, upload finished work, and check their work against other transcriptions.

IATH has made MTD publicly available for download and installation.

2. Prerequisites

MTD is a web-based application, and to use it, you must have (or have access to) a web server. We recommend the Apache web server, which is available for free download from http://httpd.apache.org/. MTD is written in PHP, so you must run it in conjunction with a web server that includes the PHP module. You can download the PHP module (and many other useful things, like PHPMyAdmin, a PHP-based administrative interface for MySQL databases) from http://www.php.net/.  If you have problems with the PHP installation, consult the PHP documentation (http://www.php.net/docs.php) or your webmaster.   

MTD is an interface—a front end—to a database, and it expects that database to be MySQL, so you must have MySQL running on the server that will deliver MTD. You can download MySQL for free (see http://www.mysql.com/). Follow the installation instructions. If you have questions or problems, consult MySQL’s documentation (http://www.mysql.com/documentation/index.html).

MTD has been tested with MySQL 3.23.41 and PHP 4.0.6 compiled into Apache 1.3.22. We are confident that MTD will run with other release versions of these packages, but we cannot guarantee that it will run with any and all versions or combinations. Your operating system’s type and version should be irrelevant, as MySQL, PHP, and Apache can be run under all major operating systems. For the record, however, we run it under Solaris 5.8. MTD is free and unsupported, but we are glad to have comments and suggestions from users (send them to mtd@www.iath.virginia.edu).

3. Downloading and installing MTD

These instructions assume that you already have a web server that supports PHP and a copy of MySQL installed, as outlined above.

Download mss.tar.gz from http://www.iath.virginia.edu/mss/mss.tar.gz and save it to your local file system. If necessary, move the file to the server and directory where you will install MTD. The default location, in the configuration files that MTD ships with, is immediately under the document root of the web server (for example, on a Unix system with a standard Apache installation, the document root would be /usr/local/apache/htdocs).

Unzip and untar the file. The file was created with GNU tar and gzip. If you are working in a Unix system and have gunzip (part of the GNU gzip package), you can run:

gunzip mss.tar.gz && tar -xvf mss.tar

If you are working on an NT system, a zip utility such as Winzip should be able to do both operations for you. This should create a single directory called "mss" in your current directory or folder. 

Once unpacked, the mss directory contains two files and four subdirectories. The two files are:

  • index.html: the starting page for the MTD
  • mysql.dump: contains SQL instructions to create the database and its tables in your local installation of MySQL

The four directories are:

  • php-bin: contains the scripts that regulate access to the database
  • cgi-lib: contains Tools.inc, the only file that you will need to edit in order to configure the installation for your site
  • cgi-templates: contains html files used in building the web pages
  • cgi-upload: where uploaded documents will be stored (the owner of the web server process must have writing rights to this directory)
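
Before going further, it can help to confirm that the archive unpacked into the layout listed above. The following is a minimal sketch of such a check; the function name mtd_check_layout is hypothetical and not part of the MTD package, and the check does not look at write permissions on cgi-upload, which you still need to set for your web server's user.

```shell
#!/bin/sh
# Sketch only: report anything missing from an unpacked mss directory.
# mtd_check_layout is a hypothetical helper, not something MTD ships.
mtd_check_layout() {
    mss_dir=$1
    rc=0
    # The two files at the top level of the package.
    for f in index.html mysql.dump; do
        if [ ! -f "$mss_dir/$f" ]; then
            echo "missing file: $f"
            rc=1
        fi
    done
    # The four subdirectories.
    for d in php-bin cgi-lib cgi-templates cgi-upload; do
        if [ ! -d "$mss_dir/$d" ]; then
            echo "missing directory: $d"
            rc=1
        fi
    done
    # The one configuration file that has to be edited later.
    if [ ! -f "$mss_dir/cgi-lib/Tools.inc" ]; then
        echo "missing file: cgi-lib/Tools.inc"
        rc=1
    fi
    return $rc
}
```

For example, after unpacking in the default location you might run: mtd_check_layout /usr/local/apache/htdocs/mss. The function prints nothing and returns success when everything is in place.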

To finish the installation process, you must edit the index.html and Tools.inc files, set up the MySQL database, and create any desired passwords and use restrictions.

3.1. Editing index.html

You will need to edit the links in the index.html file in the mss directory to point to the appropriate web server. The default URL for links to the Transcriber login, Editors login, Add an editor, and Browse the collections has www.yourwebserver.edu as the domain. Open index.html in your favorite HTML editor and change the links to point to the proper URL. For example, if you were running the mss package on the www.great-stuff.org web server with the mss directory directly under the document root, you would change

<a href=http://www.yourwebserver.edu/mss/php-bin/>

to

<a href=http://www.great-stuff.org/mss/php-bin/>

Be sure that this path is complete and correct. You may of course edit anything else you like in this page, or simply incorporate the edited links into some other start page. Note that these links won’t work until you have finished the installation process, so don’t be alarmed if you try them and get back an error message.
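
If you would rather not edit the links by hand, the substitution can be scripted. This is a minimal sketch using sed; the function name mtd_set_host is hypothetical, and it assumes the only thing that needs to change is the placeholder host www.yourwebserver.edu and that the new host name contains no "|" or "&" characters.

```shell
#!/bin/sh
# Sketch only: print FILE with the placeholder host replaced by NEW_HOST.
# mtd_set_host is a hypothetical helper, not part of the MTD package.
mtd_set_host() {
    new_host=$1
    file=$2
    # The dots in the placeholder are escaped so that they match literally.
    sed "s|www\.yourwebserver\.edu|$new_host|g" "$file"
}
```

Because the function writes to standard output, you could review the result first and then replace the file, for example: mtd_set_host www.great-stuff.org index.html > index.html.new && mv index.html.new index.html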

3.2. Editing Tools.inc

In the cgi-lib directory, open the Tools.inc file in your favorite text editor.

If you are new to scripting languages, please be careful in editing this file. Don’t introduce any new line breaks and don’t edit anything outside the quotation marks in any of these lines. 

You first need to set the correct values for the URL and base directory that the database will use. Edit lines 17 and 18 of the file to show the proper values:

$this->url = "http://www.yourwebserver.edu/";

$this->base_dir = "/usr/local/apache/htdocs/mss";

For example, if you are installing MTD on www.great-stuff.org, you should edit line 17 to read:

$this->url = "http://www.great-stuff.org/";

If you have a standard Apache installation, and installed the mss directory in the document root of that directory, you can leave line 18 as is:

$this->base_dir = "/usr/local/apache/htdocs/mss";

but if your mss directory is not in /usr/local/apache/htdocs, you should edit line 18 to provide a full literal system path to the mss directory.

Lines 20-23 in Tools.inc configure the package to provide the correct arguments when connecting to the MySQL database for the user name, password, and database name. The default setting is:

$this->connection_args = array(
                      username  => "username",
                      password  => "password",
                      database  => "msstranscription",

The mysql.dump file will create a database named "msstranscription", so do not change line 23 unless you plan to change that database name.

If you do not change the username and password values, the username and password for connecting to that database will be "username" and "password" (those literal values). [*] If you haven’t yet set up the database, set these to whatever values you want (you will probably want to establish a unique username/password within MySQL for the msstranscription database anyway).
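
Since it is easy to leave a shipped default in place by accident, a quick check of Tools.inc may be worthwhile. The sketch below looks for the placeholder URL and the literal "username" and "password" values; the function name mtd_check_tools is hypothetical, and the patterns assume the lines are formatted as shown above.

```shell
#!/bin/sh
# Sketch only: warn if Tools.inc still contains the shipped defaults.
# mtd_check_tools is a hypothetical helper, not part of the MTD package.
mtd_check_tools() {
    tools=$1
    rc=0
    if grep -q 'www\.yourwebserver\.edu' "$tools"; then
        echo "url still points at www.yourwebserver.edu"
        rc=1
    fi
    # Matches a line such as:  username  => "username",
    if grep -q '=> *"username"' "$tools"; then
        echo "username is still the shipped default"
        rc=1
    fi
    # Matches a line such as:  password  => "password",
    if grep -q '=> *"password"' "$tools"; then
        echo "password is still the shipped default"
        rc=1
    fi
    return $rc
}
```

Run it as mtd_check_tools cgi-lib/Tools.inc from the mss directory; no output means none of the defaults were found.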

3.3. Setting up the msstranscription database in MySQL

These instructions assume that MySQL is already set up and that you have access to it, with permission to create new databases within the MySQL installation. If any of those things are not true, these instructions won’t work.

3.3.1. From the command line

First, you must create a new database called "msstranscription". Go to the MySQL bin directory (e.g., /usr/local/mysql/bin) and run mysqladmin. For example (assuming that you’re using a password):

./mysqladmin -p create msstranscription

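You should next run the mysql command-line client to load the tables and data from the mysql.dump file (mysqlimport will not work here: it loads delimited data files, whereas mysql.dump contains SQL statements). For example, if the mss directory is in the default location:

./mysql -p msstranscription < /usr/local/apache/htdocs/mss/mysql.dump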

3.3.2. From an administrative interface

If you are using an administrative interface such as PHPMyAdmin, open the web page through which you administer the installation of MySQL. You can then create a new database called "msstranscription" (type that name carefully, exactly as given), paste the text of the mysql.dump file in the SQL query window on the next page you see, and submit the query.

3.3.3. Editing the database and creating a user

You should now have eleven tables in your database and three data items: a test transcriber (initials for MTD login are "npt"), a test editor (initials for MTD login are "aka"), and a configuration item giving the terms to be used in the MTD interface for transcriber and transcription. You can change these to something else (such as "scribe" and "document") and the change will be picked up at various points in the web pages the MTD user sees.

You now need to create a user, set that user’s password, and give the user select, insert, update, and delete privileges with respect to the msstranscription database. Please refer to your MySQL documentation for instructions on how to do that; if you are using an administrative interface like PHPMyAdmin, you may be able to figure it out by looking at existing entries in the "mysql" database within your installation, and specifically at the "db" and "user" tables within that database. If you did not edit lines 21 and 22 in the Tools.inc file, you must call this user "username" and set its password to "password".
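
As one possible way to do this from the command line, the sketch below prints a GRANT statement that you can review and then pipe into the mysql client. It is an illustration, not the only method: the function name mtd_grant_sql and the example user "mtduser" are hypothetical, the host defaults to localhost as an assumption, and the GRANT ... IDENTIFIED BY form is accepted by older MySQL releases (including the 3.23 series) but not by MySQL 8.0 and later, where the user must first be created with CREATE USER.

```shell
#!/bin/sh
# Sketch only: print SQL that creates an MTD database user with the four
# privileges MTD needs. mtd_grant_sql is a hypothetical helper.
# Usage: mtd_grant_sql USER PASSWORD [HOST]   (HOST defaults to localhost)
mtd_grant_sql() {
    db_user=$1
    db_pass=$2
    db_host=${3:-localhost}
    printf "GRANT SELECT, INSERT, UPDATE, DELETE ON msstranscription.* TO '%s'@'%s' IDENTIFIED BY '%s';\n" \
        "$db_user" "$db_host" "$db_pass"
}
```

For example, from the MySQL bin directory: mtd_grant_sql mtduser 's3cret' | ./mysql -u root -p. Whatever user name and password you choose must match the values in Tools.inc.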

3.4. Restricting access

Access to the database itself, whether through the MTD pages or outside of them, is handled by MySQL and depends on the username and password given in the Tools.inc file (section 3.2) and on the permissions granted to that user in MySQL (section 3.3.3). The Tools.inc file stores this information unencrypted and provides it for all MTD connections to the MySQL database, so you will probably want a second and even a third layer of access protection.

By default, the MTD installation does not come with any access restriction: if you want to restrict access to any or all of the web pages for the MTD, you can use htaccess files. These should be described in your web server documentation; if they are not, ask your system administrator.

We have some suggestions for how you can use htaccess files in standard Apache installations. In the standard Apache installation, access to directories, subdirectories, and files can be restricted via a .htaccess file (note the leading dot in that filename) in the restricted directory.  The file looks something like this:

AuthUserFile /usr/local/apache/htdocs/mss/php-bin/authorized/password.txt 
AuthGroupFile /dev/null
AuthName "manuscript transcriptions"
AuthType Basic 
<Limit GET POST>
require user editor
</Limit>

This means that only the "editor" user has access to the directory. The AuthUserFile (password.txt) contains a username and encrypted password separated by a colon and no spaces, looking something like this:

editor:Xmw5fi0YU3i

The user must sign in as "editor" and provide the proper password. Use htpasswd to produce this file and the encrypted password. 

Alternately, you can use a slightly different .htaccess file:

AuthUserFile /usr/local/apache/htdocs/mss/users.txt
AuthGroupFile /dev/null
AuthName ByPassword
AuthType Basic
<Limit GET POST>
require valid-user
</Limit>

where the AuthUserFile (users.txt, in this case) contains a list of users and encrypted passwords, one per line, separated by a colon and no spaces, like this:

transcriber:Cm0fi5X84MXw
editor:Xmw5fi0YU3i

This means that any user with a valid user-id and password (i.e., one that matches the list in users.txt) has access to the directory.

The first type of htaccess strategy is useful when you want to limit access to a particular user, as you might in the mss/php-bin/authorized directory (which provides the pages that allow editors to manage the database). The second strategy is useful if you want to provide access to a set of users. You can have one htaccess file per directory.

If the "editor" user referred to in the .htaccess file in mss/php-bin/authorized is also listed as a valid user for mss, then someone logging in as "editor" will have access to any subdirectory in mss. However, a user who is listed as a valid user for mss but is not named in the .htaccess file in mss/php-bin/authorized will not be able to access the managerial pages. You can use this to limit transcriber access to the editorial part of the MTD.
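
The two strategies differ only in the require line, so the file can be generated rather than typed. The following is a rough sketch under a few assumptions: the function name mtd_htaccess is hypothetical, the AuthName value is simply reused from the first example, and the sketch leaves out the <Limit> wrapper so that the requirement applies to every request method rather than only to the ones listed.

```shell
#!/bin/sh
# Sketch only: print a .htaccess file for Basic authentication.
# mtd_htaccess is a hypothetical helper, not part of the MTD package.
# Usage: mtd_htaccess AUTH_FILE [USER ...]
#   With user names: only those users are admitted (first strategy).
#   Without:         any user listed in AUTH_FILE is admitted (second strategy).
mtd_htaccess() {
    auth_file=$1
    shift
    echo "AuthUserFile $auth_file"
    echo "AuthGroupFile /dev/null"
    echo 'AuthName "manuscript transcriptions"'
    echo "AuthType Basic"
    if [ $# -eq 0 ]; then
        echo "require valid-user"
    else
        echo "require user $*"
    fi
}
```

For instance, mtd_htaccess /usr/local/apache/htdocs/mss/php-bin/authorized/password.txt editor > /usr/local/apache/htdocs/mss/php-bin/authorized/.htaccess would write the single-user variant. The password file itself can be created with htpasswd -c followed by the file path and the user name; the -c flag creates a new file, so leave it off when adding further users to an existing file.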

4. Using MTD

MTD has separate sections for editors and transcribers. An editorial interface lets editors assign documents to transcribers. Transcribers can then see their assignments on a transcriber interface and download digital images of manuscript pages, make transcription files, and upload the finished work onto the database server. Editors can track the progress of documents with the Status Report and edit checked-in transcriptions.

4.1. MTD home page

The default home page is shown below in Figure 1.

Figure 1: Home page

These links let you log in as a transcriber or an editor, add editors, and browse through the current collection(s) of documents. Transcribers and editors do not see the same information: transcribers can only see documents that have been assigned to them.

Please note that there is no default password on the Add an editor link, so anyone can add or delete editors and change an existing editor’s identifying information unless an htaccess password file is added to the mss/php-bin/authorized directory (see section 3.4).

4.2. Identifying users

All editors and transcribers are identified in the database by their initials. When adding users, you must assign each user a set of initials (an alphanumeric string of one to six characters). Case does not matter. A new MTD comes with a default editor and transcriber: the default transcriber’s initials are "npt" and the default editor’s initials are "aka".
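
If you prepare a list of initials ahead of time, the rule above is easy to check before you type them into the forms. This is a small sketch, not how MTD itself validates input; both function names are hypothetical, and lower-casing is just one convenient way to treat initials that differ only in case as the same.

```shell
#!/bin/sh
# Sketch only: check a proposed set of initials against the stated rule
# (one to six alphanumeric characters). Hypothetical helpers, not MTD code.
mtd_valid_initials() {
    case $1 in
        "" | *[!A-Za-z0-9]*) return 1 ;;  # empty, or has a non-alphanumeric character
    esac
    [ "${#1}" -le 6 ]                     # no more than six characters
}

# Fold to lower case so that "AKA" and "aka" compare as equal.
mtd_normalize_initials() {
    printf '%s\n' "$1" | tr 'A-Z' 'a-z'
}
```

For example, mtd_valid_initials npt succeeds, while mtd_valid_initials "a.b" fails because of the period.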

4.3. Editor Switchboard: managing the list of editors

Click Add an Editor on the home page. This takes you to the Administrator Menu. Click Add, Edit, and Delete Editors to go to the Editor Switchboard (Figure 2).

Figure 2: Editor Switchboard

The Editor Switchboard lists the names, initials, and e-mail addresses of all editors in the database. Click on a name to view or change an editor’s identifying information. Click an e-mail address to send an e-mail message.

4.3.1. Add an editor

To add an editor, click the Add an Editor button on the Editor Switchboard. You’ll be asked to give the editor’s name and initials. When you are done, click Insert.   

4.3.2. Changing or deleting an editor

Go to the Editor Switchboard. To change an editor’s identifying information, click on the editor’s name. To delete one or more editors, click the Select box next to the editor’s name(s), then click the Delete Selected Records button.

4.4. Editor Interface: the editorial workspace

Click Editors login on the home page to get to the Editor Interface, shown below in Figure 3. This is the front page of the editor section of the database and the starting point for all of the database’s editorial functions. Editors can add new documents to the database, remove documents, control how the documents are organized, and change the list of transcribers.

Figure 3: Editor Interface

There are three sections in the interface: Documents, Transcribers, and Reports. Documents contains links for organizing your documents in a hierarchy (see section 4.4.1). The Search Documents option will search the hierarchy for a desired name. Transcribers displays the list of transcribers and the list of tasks assigned to each. Reports, Instructions contains status reports and documentation. The instructions are generated from the php-bin/transcriberInstructions.php file; you can customize them for your installation if you are comfortable editing text in PHP.

4.4.1. Document hierarchy

The MTD document hierarchy has five levels, moving from general to specific:

  • A collection comprises several documents in a common location.
  • A document group is a part of a collection or several documents with a common theme.
  • A document is an individual work (a book, a poem, a letter, etc.).
  • A page is a single piece of paper, parchment, or other writing surface, either single-sided or double-sided.
  • A side is one side of a page.

When you are first starting to work in the MTD, you must start with the general and build up to the specific. E.g., you must create a collection before a document group, a group before a document, and so on.

4.4.2. Document switchboards

The document switchboards let you organize your documents according to a series of categories. To get to a switchboard, click on one of the Document links in the Editor Interface. Each switchboard contains a list of all current items in that category and buttons to add or delete items or search the lists. Click on an item’s name to view and update its information.

4.4.3. Adding items to the hierarchy

To add a new item to the database, go to the Editor Interface and click one of the Add, Edit, and Delete... links. Click the Add button, enter the item’s name, then click Insert. The switchboard should then display the new addition.

New documents must have a unique name (an alphanumeric string used to track it in the database: this is what the transcribers will see in their list of assignments). They must also be assigned to an existing document group and an editor. The title field is optional. The Editor File Name section should be left blank at this stage.

A new page must be associated with an existing document and an on-line image file (note that you must preface the URL with "http://").

A new side must be marked as either verso or recto and be associated with an on-line file (note that you must preface the URL with "http://").

4.4.4. Changing the list of transcribers

Only editors can view and edit this list. Go to the Editor Interface and click Add, Edit, and Delete Transcribers. This calls up the Transcriber Switchboard (Figure 4), which displays the list of current transcribers.

Figure 4: Transcriber Switchboard

The Search option will search the names, initials, and e-mail addresses in the list. To delete one or more names from the list, click the box next to the desired name(s) and then click Delete Selected Records. To add new names, click Add a Transcriber.

Click on a transcriber’s name to view and edit his or her current list of assignments and contact information, add notes, and delete assignments. Click Update to save any changes.

4.4.5. Assigning work to a transcriber

To assign a document transcription or change a previous assignment, go to the Editor Interface and click the Add, Edit, Delete, and Assign Documents link. This calls up a list of all documents in the database and their assigned transcribers. 

In the Transcriber(s) field, choose one or more transcriber(s) to assign to the document. Click Update to save any changes.

4.4.6. Status report

The status report (Figure 5) shows the current status of all documents in the database. To see it, go to the Editor Interface and click Status Report.

Figure 5: Status Report

The report sorts by collection, then group, and so on (i.e., from left to right), but you can customize this. Select one or more of the pull-down menus to sort and view specific collections and groups, a particular transcriber’s documents, or documents of a specified status, then click the Sort button (in the far right corner) to sort the documents.

The Document column, third from the left, contains links to an information page for each document. The information page contains links to any pages and sides associated with the document and any finished transcriptions that have been checked in. An asterisk in the Document column indicates that the editor has uploaded a final version of the transcription to the database.

The Title column (not shown in Figure 5) lists the document’s title.

The Transcriber column lists the assigned transcriber(s) for each document. Click on a transcriber’s name to send him or her an e-mail.

The Status column lists the status of transcribing work on that document. Possible statuses are Not Assigned, Assigned, Checked Out, and Checked In. If a transcription has been checked in, the check-in date will be displayed and you can click on the date to view or download the file.

The Edit option in the last column calls up an update window for a document. This lets you edit a document’s name, title, and document group; add or edit notes about the transcription; make editor and transcriber assignments; and upload final transcriptions.

4.4.7. Uploading final transcriptions

Editors can download a transcriber’s finished work via the Status Report to their local file space and make any necessary corrections and edits. Once the document is finished, it can be uploaded back to the database. To do this:

  • Click Edit in the far right column of the Status Report. This opens an Update a Document window.
  • Then go to the bottom of the page to the Editor File Name section. Type in the local file path for the corrected file or use the Browse button to find a path.

The document name in the Status Report will be marked with an asterisk to indicate that the editor has finished revising it.

4.5. Transcriber interface: transcriber workspace

You can click on the Transcriber login option on the home page to see your list of assigned tasks. A sample list is shown below in Figure 6.

Figure 6: Transcriber interface

The first column, Document Name, lists the documents by name (this is not the same as the document title: the document name is assigned by the editor). Click on a document name to see pointers to image files of the assigned pages and sides.

The Date Checked Out column displays the date when you checked out a copy of that assignment from the database server, or, if an assignment has not yet been checked out, Check Out.

The Date Checked In column reads either Check In, if the assignment is still in progress, or the date when you uploaded a copy of the finished transcription to the database server.

4.5.1. Starting and completing assignments

Click on a document name to see links to the relevant image files. You can download or view these files (the image files are not actually stored in the database: instead, the database stores the URL of each image file). When you are ready to start working on an assignment, click Check Out in the Date Checked Out column.

To upload a finished assignment, click Check In in the Date Checked In column. You must provide a local file path for the finished file(s). The files are then copied and uploaded to the database server, and the check-in date is noted in the table. An <edit> option next to the check-in date points to the checked-in file, in case you need to view or replace the uploaded file.

4.5.2. Viewing other transcribers’ completed transcriptions

If multiple transcribers are assigned to the same document, they will be able to see each other’s completed transcriptions. Click on the document name in the assignment list to see any previously checked-in transcriptions of the document.

5. Known bugs

So far, none. If you find any, report them to mtd@www.iath.virginia.edu.



* Not a very wise thing to do, but you know best.