harpur


Transcription Guidelines


Contents

0. Introduction
1. Transcribing Plain Text
    1.1 Tools
    1.2 Naming Files
    1.3 Each Work is a Single Item
    1.4 What is Transcribed
        1.4.1 Carriage Returns
        1.4.2 Indentation
        1.4.3 Mistakes, Misspellings and Abbreviations
    1.5 Layers
    1.6 The Default Method
        1.6.1 An Example from A87-1 'These Poems'
    1.7 Exceptions to the Default Method
        1.7.1 Connecting Variants By Sense or Grammar
        1.7.2 Discontinued Cancellations
        1.7.3 Open Alternatives
2. Special Features
    2.1 Comments by the author
    2.2 Dividing Lines
    2.3 Signatures
    2.4 Illegible Text
    2.5 Underlined Text
    2.6 Dashes
    2.7 Poetry Embedded in Prose
3. A Complete Example

0. Introduction

A manuscript containing corrections can be very complicated. This document describes a method for transcribing even the most complex manuscript fairly easily. The first part describes the general technique using plain text, and the second part shows, by means of clear examples, how to encode Special Features using simple markup. The third part shows a complete short work as an example of how it should be transcribed. Although these Guidelines are basically stable, they may be improved and extended in the light of experience.

1. Transcribing Plain Text

1.1 Tools

You will need a plain text editor and photocopies of the original documents. Save your work in UTF-8 encoding, sometimes called 'Unicode' format. If the editor supports word-processor formats, save as 'text only', with the '.txt' suffix used in the file names below. A good editor for Windows is Crimson Editor, available at http://www.crimsoneditor.com/ (save as UTF-8 without 'BOM'), or you can use Wordpad (not Notepad). On MacOSX you can use Textedit or BBEdit, and on Linux GEdit.

1.2 Naming Files

Transcription files should be named by combining three components, joined as described after the list:

  1. An abbreviated form of the name, with all spaces removed. For example, the poem 'The Cloud' could have the identifier 'thecloud'.
  2. The manuscript id from which the transcription was taken, e.g. 'A88'
  3. The number of the layer, e.g. '1'

The layer number is preceded by the underscore character and is followed by '.txt' as the file type. So the full name of the file in this case would be: 'thecloud-A88_1.txt'.
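The naming rule can be sketched in a few lines of Python. This is only an illustration: the function name `transcription_filename` is hypothetical, and lower-casing the abbreviated name is an assumption drawn from the 'thecloud' example rather than a stated rule.

```python
def transcription_filename(title: str, manuscript_id: str, layer: int) -> str:
    """Build a file name such as 'thecloud-A88_1.txt'."""
    # Abbreviated name: remove all whitespace. Lower-casing is an assumption
    # based on the 'thecloud' example in the Guidelines.
    work_id = "".join(title.split()).lower()
    # Hyphen before the manuscript id, underscore before the layer number.
    return f"{work_id}-{manuscript_id}_{layer}.txt"


print(transcription_filename("The Cloud", "A88", 1))
```

Punctuation in a title is not handled here; how it should be abbreviated is left to the transcriber.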

1.3 Each Work is a Single Item

A manuscript frequently contains many works, such as individual poems. Each version of a separate work should be assigned a separate file. This is because the same work may appear in various collections or individually, and each version of it needs to be compared against the others.

1.4 What is Transcribed

The physical layout of the text (e.g. the colour of the ink, where the text is on the page, insertion marks, etc.) is in general not recorded. It can be seen better in a facsimile. However, text formats such as underlining are recorded as described in the Special Features section below.

1.4.1 Carriage Returns

Carriage returns are significant in poetic texts, but you should start a new line in the transcription wherever the manuscript does, regardless of whether it is poetry or prose. In the case of a paragraph break, start a new line and also insert blank lines if there are blank lines in the text. For example:

come at once to the matter I have in hand.

I wish to try a publication in England, and to this end would as for trespass upon your goodness

1.4.2 Indentation

Indentation is always preserved, in poetry and prose. If a line starts with a number of spaces, insert five spaces at the start of that line in the transcription.

In the above example the transcription of layer 1 should read:

Or stamp as with a mighty engine stroke
On many a bold Stripling of my race,
A character, whose strength shall hold all base
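The indentation rule could be applied mechanically along the following lines. This is a minimal sketch: the helper name `normalise_indent` is hypothetical, treating tabs as indentation is an assumption, and the indented sample line is invented for illustration.

```python
INDENT = " " * 5  # the Guidelines ask for five spaces on any indented line


def normalise_indent(line: str) -> str:
    """Replace any run of leading spaces or tabs with exactly five spaces."""
    body = line.lstrip(" \t")
    if not body:      # blank line: keep it blank
        return ""
    if body == line:  # no leading whitespace: leave the line untouched
        return line
    return INDENT + body


# Invented input: two spaces typed before the text.
print(repr(normalise_indent("  On many a bold Stripling of my race,")))
```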

1.4.3 Mistakes, Misspellings and Abbreviations

Mistakes and abbreviations in the manuscripts are simply transcribed as they are. Don't correct the text.

1.5 Layers

A document containing corrections can be divided into layers, each of which represents a single readable version. A layer can be distinguished by pen-colour, by a style of handwriting, by the Default Method, or by sense and grammar.

1.6 The Default Method

The Default Method starts from the original text on the baseline. The transcriber simply enters the text as it was originally written, ignoring all subsequent changes. The markup for any hard-to-read sections is described in the Special Features section below.

For each subsequent correction in each independent part of the text, the layer number of the replacement text is given by its edit-distance from the baseline: the baseline is layer 1, and each successive correction at that place adds one. For example, if the author wrote 'quickly', crossed it out and wrote 'swiftly' above it, then crossed this out and wrote 'rapidly' above that, then the first layer is 'quickly', the second is 'swiftly' and the third is 'rapidly'. Using this method, the last version at a particular place in the text also stands for that place in all subsequent layers. For example:

Layer 1: I went to the shop to buy some bread.
Layer 2: I went to the shop to get some cigarettes.
Layer 3: I went to the shop to get some grapes.

Here 'buy' belongs to layer 1 and 'get' to layers 2 and 3, while 'bread' belongs to layer 1, 'cigarettes' to layer 2 and 'grapes' to layer 3. The remaining text belongs to layers 1, 2 and 3.
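The way layers fall out of the Default Method can be sketched as follows. The data structure is an assumption made for illustration: each independent place in the text is either fixed text or a list of successive versions, and `build_layer` is a hypothetical helper, not part of any project software.

```python
from typing import List, Union

# A segment is either unchanged text or the successive versions at one place.
Segment = Union[str, List[str]]


def build_layer(segments: List[Segment], layer: int) -> str:
    """Assemble the readable text of one layer (layers are numbered from 1)."""
    if layer < 1:
        raise ValueError("layer numbers start at 1")
    parts = []
    for segment in segments:
        if isinstance(segment, str):
            parts.append(segment)  # never corrected: belongs to every layer
        else:
            # Use the correction for this layer, or the last one available.
            index = min(layer, len(segment)) - 1
            parts.append(segment[index])
    return " ".join(parts)


sentence = [
    "I went to the shop to",
    ["buy", "get"],
    "some",
    ["bread.", "cigarettes.", "grapes."],
]

for n in (1, 2, 3):
    print(f"Layer {n}: {build_layer(sentence, n)}")
```

Because 'get' is the last version at its place, it is reused for layer 3 even though only two versions were written there.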

1.6.1 An Example from A87-1 'These Poems'

The first layer reads:

Volubly sounding upon many a hill
Yet to be famous, and by lake or rill

The second layer reads:

Volubly sounding upon many a hill
Yet to be famed, and by lake, river and rill

So this produces two files: thesepoems-A87-1_1.txt and thesepoems-A87-1_2.txt. Note that:

  • The graphical appearance of the text is not recorded.
  • The insertion marks (^) are not recorded.
  • Cancellations are not recorded.

This information is contained in the facsimile, or is implied by the layers.
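Because each layer is a complete readable text, the changes between two layers can be recovered by comparing them. The sketch below uses Python's standard `difflib` module as a stand-in; it is not the comparison software used for the Harpur transcriptions, and `changed_words` is a hypothetical helper.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def changed_words(old: str, new: str) -> List[Tuple[str, str]]:
    """Return (old text, new text) pairs for every place where two layers differ."""
    a, b = old.split(), new.split()
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    changes = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            changes.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return changes


layer1 = "Yet to be famous, and by lake or rill"
layer2 = "Yet to be famed, and by lake, river and rill"

for old, new in changed_words(layer1, layer2):
    print(f"{old!r} -> {new!r}")
```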

1.7 Exceptions to the Default Method

1.7.1 Connecting Variants By Sense or Grammar

When two variants within one sentence are connected by sense or grammar, encode them in the same layer, even if the default method says they belong to different layers. For example:

Layer 1: I went to the shop to buy some bread.
Layer 2: I went to the shop to buy some biscuits.
Layer 3: I went to the library to borrow a book.

In this case the Default Method says that 'shop' is replaced by 'library' as a second-level correction, whereas it is actually connected by sense to the third-level correction 'buy some biscuits' -> 'borrow a book'. The Default Method must therefore be overridden so that the correct layer 3 results.

1.7.2 Discontinued Cancellations

Often, an author will start to write something, but then cross it out in mid-sentence, and continue the new text on the same baseline. Assign all such discontinued cancellations to a separate layer called layer 0. In manuscripts where there are no such corrections there will simply be no layer 0. At the end of the discontinued cancellation append the characters '...', with a preceding space if it ends in a whole word, and no space if it ends in mid-word. A discontinued cancellation is not followed by any text until the start of the next sentence. If more than one discontinued cancellation is contained within a sentence, then add the intervening text so that only the last such cancellation ends without any following text. So in the case:

we would transcribe layer 0 as:

that there may be amongst s... many specimens of translation, ab...

And layer 1 as:

that there may be amongst them many specimens of translation, and from several tongues.

So layer 1 should always be the first readable version.
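The marking of discontinued cancellations in layer 0 can be sketched like this. The helper name `discontinued` is hypothetical, and whether a fragment ends on a whole word is left for the transcriber to decide and pass in.

```python
def discontinued(fragment: str, whole_word: bool) -> str:
    """Append '...' to a cancelled fragment, with a space only after a whole word."""
    return fragment + (" ..." if whole_word else "...")


# Two cancellations in one sentence: the intervening text is kept, and only
# the last cancellation ends without any following text.
layer0 = " ".join([
    discontinued("that there may be amongst s", whole_word=False),
    "many specimens of translation,",
    discontinued("ab", whole_word=False),
])
print(layer0)

# A cancellation that stops after a complete word.
print(discontinued("Where I", whole_word=True))
```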

1.7.3 Open Alternatives

It is possible, although no examples have yet come to light in Harpur, that an alternative may be supplied by the author but the original text is not crossed out. In this case create a separate layer for open alternatives and assign all such variants to that layer. We'll have to create some convention for naming such versions so they can be correctly interpreted by the software.

2. Special Features

Unfortunately, not all features can be recorded via plain text. In many cases additional information must be supplied in a transcription via markup. The form of markup used in the Harpur transcriptions is a very much simplified form of TEI-XML. Eventually it is hoped that such markup can be removed entirely from the text, although this is not possible at present.

XML markup uses tags delimited by angle-brackets (< and >). Tags may be paired.

  • A start-tag consists of a left-angle bracket followed by the tag-name, any number of attributes and then a closing angle-bracket, e.g. <div type="poem"> is the start-tag of a poem.
  • An end-tag consists of a left-angle bracket, a forward-slash (/), the tag-name and a closing angle-bracket, e.g. </div> ends a division.
  • An attribute consists of a name, an equals sign and a quoted value, e.g. type="double".
  • If a pair of tags has no content then they are merged. In this case only one tag appears, with the forward-slash at the end of the tag, before the closing angle-bracket, e.g. <pb n="23"/>, which is the markup for 'page 23 starts here'.

The examples below should make this clearer.
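As a quick way to see how start-tags, end-tags, attributes and merged (empty) tags behave, a fragment can be parsed with Python's standard `xml.etree.ElementTree` module. The fragment is made up for illustration from the tags described above; it is not taken from a real transcription file.

```python
import xml.etree.ElementTree as ET

# A start-tag with one attribute, an empty (merged) tag, some text, an end-tag.
fragment = '<div type="poem"><pb n="23"/>There\'s the great Lord Potather,</div>'

div = ET.fromstring(fragment)
print(div.tag, div.attrib)     # tag-name and attributes of the start-tag

pb = div.find("pb")
print(pb.tag, pb.attrib["n"])  # the merged tag carries only an attribute
print(len(pb), repr(pb.text))  # it has no child elements and no content
print(pb.tail)                 # the text that follows the merged tag
```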

2.1 Comments by the author

Harpur often writes 'final copy' at the end of his texts. This, and other such comments about the text at the end, can be recorded with the colophon tags: <colophon>final copy.</colophon>

2.2 Dividing Lines

The lines dividing sections should be recorded. There are three types: single, double and flourish. Some examples and their marked-up forms are:

<rule type="single"/>

<rule type="double"/>

<rule type="flourish"/>

2.3 Signatures

Signatures at the end of letters are to be encoded using the standard TEI-tags, e.g.:

<signature>I remain, dear Sir
Your very humble servant
Chas Harpur</signature>

2.4 Illegible Text

Illegible text should be recorded in its appropriate layer marked with the TEI unclear tag. The approximate text should be entered if possible, or if not the approximate number of unclear characters should be entered as asterisks (*):

<unclear>Advertisement</unclear>
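Both forms of the unclear markup, an approximate reading or a run of asterisks, can be produced by a small helper. This is a sketch only: the function `unclear` and its parameters are hypothetical, and escaping XML special characters in the reading is an added precaution rather than something the Guidelines require.

```python
from xml.sax.saxutils import escape


def unclear(reading: str = "", length: int = 0) -> str:
    """Wrap an approximate reading, or `length` asterisks, in unclear tags."""
    if reading:
        content = escape(reading)  # protect &, < and > inside the reading
    elif length > 0:
        content = "*" * length     # one asterisk per unclear character
    else:
        raise ValueError("give either a reading or a positive length")
    return f"<unclear>{content}</unclear>"


print(unclear("Advertisement"))
print(unclear(length=5))
print("la" + unclear("nd"))  # only part of a word is unclear
```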

2.5 Underlined Text

There appear to be only two types: single or double.

Single underlining should be recorded as <hi rend="underlined">suffering</hi> (even if this is meant to represent italicised text when printed). Rendering is separate from recording what is there.

Double underlining should be recorded as <hi rend="double underlined">S</hi>tripling (even if this means 'capitalise the S').

2.6 Dashes

Dashes should be rendered as EM-dashes. Harpur seems to prefer the long or EM-dash over the EN-dash. This character (U+2014) is available in the Unicode character set and can be typed.
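If dashes were typed in some other way during transcription, they could be normalised afterwards. The sketch below is an assumption about workflow, not part of the Guidelines: it treats a double hyphen and the EN-dash (U+2013) as stand-ins for the EM-dash and leaves single hyphens alone. The name `normalise_dashes` is hypothetical.

```python
EN_DASH = "\u2013"
EM_DASH = "\u2014"


def normalise_dashes(text: str) -> str:
    """Turn double hyphens and EN-dashes into EM-dashes; keep single hyphens."""
    return text.replace("--", EM_DASH).replace(EN_DASH, EM_DASH)


line = "Into the dread Unknown, -- deep trust in thee,"
print(normalise_dashes(line))
print(normalise_dashes("far-reaching thought"))  # hyphenated word is unchanged
```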

2.7 Poetry Embedded in Prose

Set apart poetry embedded in prose using the <div type="poem"> tag.

[Image: facsimile of the manuscript passage]

Which should be transcribed as:

without looking it in the mouth. But to our imitative specimen: 
<div type="poem"><head>Lord Potather</head>
<rule type="single"/>
There's the great Lord Potather,
So lofty of stature, ...

3. A Complete Example

[Image: facsimile of the first image of MS A98-1]

This is a relatively difficult example from the first image of MS A98-1, containing torn segments of text that are hard to read in the facsimile. Although this is a manuscript, the same principles apply to edited newspaper cuttings and typescripts. Only the central area of the page contains text by Harpur, so the pencil at the top: 'Charles Harpur' and the bit at the bottom: 'D.J. Mitchell Bought' etc can be omitted. An examination of the text reveals several levels of correction:

  1. A layer containing discontinued cancellations, e.g. in line 6: 'Where I ...'.
  2. The text on the baseline after discontinued cancellations.
  3. Another layer of correction by Harpur in dark ink, for example the correction in the last line of 'object' to 'refuge'.
  4. A final layer of correction by Harpur in light ink, e.g. replacement of 'With which' by 'Of'.

The Default Method can be overridden here because the layers are clearly identified by ink colour. The unclear readings could probably be clarified by a closer examination of the MS, and by comparison with any other surviving versions. Roughly transcribed the four layers are as follows:

Layer 0 (file trustingod-A98-1_0.txt)

<head>Trust in God.</head>
<rule type="single"/>
Deep trust in God! — for that I still have sought
Through all the dim doubts that beshade the soul,
When, in the amazement of far-reaching thought,
We list the laborings that for ever <unclear>roll</unclear>
Their thundrous wheels within that clouded la<unclear>nd</unclear>
Where I ...
Wherewith Time's mortal <unclear>*****</unclear> is fraught
And when I've stood upon some fearful stage
Of Speculation, that did weave its base
And rugged ridge into the nebulous air
Of endless change, and thence tremendously
Through its dark shadow, like a <unclear>blend</unclear> man<unclear>***</unclear>
Into the dread Unknown, — deep trust in thee,
O God, hath been my object even there<unclear>***</unclear>
<rule type="single"/>

Layer 1 (file trustingod-A98-1_1.txt)

<head>Trust in God.</head>
<rule type="single"/>
Deep trust in God! — for that I still have sought
Through all the dim doubts that beshade the soul,
When, in the amazement of far-reaching thought,
We list the laborings that for ever <unclear>roll</unclear>
Their thundrous wheels within that clouded la<unclear>nd</unclear>
Where this world's Destiny <unclear>dost</unclear> the secrets keep
Wherewith Time's mortal <unclear>*******</unclear> is fraught
And when I've stood upon some fearful stage
Of Speculation, that did weave its base
And rugged ridge into the nebulous air
Of endless change, and thence tremendously
Through its dark shadow, like a <unclear>blend</unclear> man<unclear>***</unclear>
Into the dread Unknown, — deep trust in thee,
O God, hath been my object even there<unclear>***</unclear>
<rule type="single"/>

Layer 2 (file trustingod-A98-1_2.txt)

<head>Trust in God.</head>
<rule type="single"/>
Deep trust in God! — for that I still have sought
Through all the dim doubts that beshade the soul,
When, in the amazement of far-reaching thought,
We list the laborings that for ever <unclear>roll</unclear>
Their thundrous wheels within that clouded la<unclear>nd</unclear>
Where this world's Destiny the secrets keep
With which Time's mortal <unclear>*******</unclear> is fraught
And when I've stood upon some fearful stage
Of Speculation — heaving up its base
And rugged ridge into the nebulous air
Of endless change, and thence tremendously
Through its shadow, like a <unclear>blend</unclear> man<unclear>***</unclear>
Into the dread Unknown, — Deep trust in thee,
O God, hath been my refuge even there<unclear>***</unclear>
<rule type="single"/>

Layer 3 (file trustingod-A98-1_3.txt)

<head>Trust in God.</head>
<rule type="single"/>
Deep trust in God! — for that I still have sought
Through all the dread doubts that beshade the soul,
When, in the amazement of far-reaching thought,
We list the laborings that for ever <unclear>roll</unclear>
Their thundrous wheels within those clouded re<unclear>gions</unclear>
Where Night and Destiny the counsels keep
Of Time, developing <unclear>their shadowy legacy</unclear>.
And when I've stood upon some fearful stage
Of Speculation, heaving up its base
And rugged ridge into the nebulous air
Of endless change, and thence tremendously
Through its shadow, like a <unclear>blend</unclear> man<unclear>***</unclear>
Into the dread Unknown, — Deep trust in thee,
O God, hath been my refuge even there<unclear>***</unclear>
<rule type="single"/>
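
A finished layer can be checked for unbalanced tags before it is handed on. The sketch below wraps the transcription in a temporary root element and parses it with Python's standard `xml.etree.ElementTree`. The wrapper element name `text` and the helper `is_well_formed` are assumptions for illustration only, and the check says nothing about whether the right tags were chosen.

```python
import xml.etree.ElementTree as ET


def is_well_formed(transcription: str) -> bool:
    """Return True if every tag in the transcription is properly closed."""
    try:
        # A layer has several top-level elements and loose text, so wrap it
        # in a single temporary root element before parsing.
        ET.fromstring(f"<text>{transcription}</text>")
    except ET.ParseError:
        return False
    return True


good = (
    '<head>Trust in God.</head>\n'
    '<rule type="single"/>\n'
    'We list the laborings that for ever <unclear>roll</unclear>\n'
)
bad = 'O God, hath been my object even there<unclear>***\n'  # end-tag missing

print(is_well_formed(good))
print(is_well_formed(bad))
```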
Last Updated on Saturday, 02 January 2010 21:48