Darwin Online print transcription policy v.16
The transcriptions are made from scans of historical texts. Apart from the exceptions outlined below, content is prepared in accordance with the XHTML 1.0 (Transitional) DTD. The UTF-8 encoding is used for all content.
The texts are transcribed exactly as they appear in the original, and no original spelling or printing errors are corrected. If any passages are illegible, please type [illegible] and provide a note at the top of the file indicating the number of such passages that occur. Also, at the top of each file there is an indication of the kind of transcription that has been performed, i.e. double key, OCR etc.
All documents begin and end with the correct heading material.
The dar:id number is derived from the Freeman number ('F' or 'A' number) which is contained in the source image file names. So 1999_Example_F123_001.tiff would become dar:id F123.
The value for this line in the document code header is derived from the number of page breaks ( <p dar:class="page">) that occur in the document. If there are ten pages then the value should be:
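The derivation of the dar:id from an image file name can be sketched in Python as follows. This is only an illustration: the function name and the pattern (an underscore, then 'F' or 'A' followed by digits, then an underscore or full stop) are assumptions based on the single example above, not part of the policy.

```python
import re

# Assumed pattern: "_F123_" or "_A45." style Freeman numbers inside the name.
_FREEMAN = re.compile(r"_([FA]\d+)(?=[_.])")


def dar_id_from_image_name(file_name: str) -> str:
    """Return the Freeman number embedded in an image file name."""
    match = _FREEMAN.search(file_name)
    if match is None:
        raise ValueError(f"no Freeman number found in {file_name!r}")
    return match.group(1)


print(dar_id_from_image_name("1999_Example_F123_001.tiff"))
```

File names that do not follow the assumed pattern raise an error rather than returning a guess.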
The original book or article will probably have a running title like the sample page below:
VARIATION CHAP. I. 8
No case is on record of a variable being ceasing to be variable under cultivation. Our oldest cultivated plants, such as wheat, still often yield new varieties: our oldest domesticated animals are still capable of rapid improvement or modification.
The running title is the text before or after the page number (i.e. 'VARIATION CHAP. I.'). The page number should be moved in front of the running title and the dar page coding added to the source: <p dar:class="page"> [page] 8 VARIATION CHAP. I.</p>
This coding is used to link the transcribed documents to page images on the website.
If the page number is not printed on the original page, then it is inserted inside the brackets thus: [page 8]
Page numbers are transcribed at the top of the page they belong to, even if the original text has its page numbers at the bottom of the page.
If a page, blank page or diagram occurs which has no number, and the page numbering continues uninterrupted on the pages after it, then the beginning of the unnumbered page is encoded <p dar:class="page"> [page break]</p>
The pages of a book before page 1 are numbered with Roman numerals thus: [page i], [page ii] or [page] iii and so forth.
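The page coding rules above can be summarised in a short Python sketch. The function names and parameters are hypothetical, and wrapping an unprinted number such as [page 8] in the same <p dar:class="page"> element is an assumption; the policy only shows that wrapper for printed numbers and page breaks. The second function counts the page markers in a file, which is the figure the document header value is derived from.

```python
import re
from typing import Optional


def page_marker(number: Optional[str], printed: bool = True,
                running_title: str = "") -> str:
    """Build the page paragraph for one page (hypothetical helper)."""
    if number is None:
        # Unnumbered page that does not interrupt the numbering.
        label = "[page break]"
    elif printed:
        # Number printed on the page: it follows the bracketed word.
        label = f"[page] {number}"
    else:
        # Number not printed on the page: it goes inside the brackets.
        label = f"[page {number}]"
    text = f"{label} {running_title}".strip()
    return f'<p dar:class="page"> {text}</p>'


def count_page_markers(xhtml: str) -> int:
    """Count the <p dar:class="page"> openings in a transcription."""
    return len(re.findall(r'<p\s+dar:class="page"\s*>', xhtml))


print(page_marker("8", running_title="VARIATION CHAP. I."))
print(page_marker("iii"))
print(page_marker(None))
```

Page numbers are passed as strings so that Roman numerals for preliminary pages are handled in the same way as Arabic numerals.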
Illustrations in books are prepared as jpegs and given file names such as 1999_example_F123_fig01.jpg. These images should be downsized to about 100 KB.
Fonts (such as Times Roman) are not preserved. Text in reduced size, such as footnote text, is rendered as <p class="style2">
Paragraph centring is preserved, but not paragraph justification.
Superscript, italics, underlining and bold are preserved and rendered according to XHTML 1.0 (Transitional).
Each block of text is treated as a paragraph <p> </p> even if the real paragraph is broken by a page break.
All characters on every page are transcribed. This includes the small signature letters or numbers that sometimes appear at the bottom of pages.
Some documents contain lengthy quotations with marks at the beginning of each line:
"Each line
"of the document
"will begin with
"these inverted commas."
Only the first and last marks should be transcribed, thus:
"Each line of the document will begin with these inverted commas."
Inverted commas, or quotation marks ( " ), are tagged as the straight mark " and not “ or other curled marks. The same is true for apostrophes and single inverted commas.
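The two quotation mark rules can be illustrated with a small Python sketch. Both function names are hypothetical, and the list of curled characters is an assumption covering only the four most common marks; a real file may contain others.

```python
from typing import List

# Assumed set of curled marks and their straight replacements.
_STRAIGHT = str.maketrans({
    "\u201c": '"', "\u201d": '"',   # curled double marks
    "\u2018": "'", "\u2019": "'",   # curled single marks and apostrophes
})


def collapse_line_quotes(lines: List[str]) -> str:
    """Join quoted lines, keeping only the first and last marks."""
    parts = []
    for index, line in enumerate(lines):
        text = line.strip()
        # Drop the repeated opening mark on every line after the first.
        if index > 0 and text.startswith('"'):
            text = text[1:].lstrip()
        parts.append(text)
    return " ".join(parts)


def straighten_quotes(text: str) -> str:
    """Replace curled quotation marks and apostrophes with straight ones."""
    return text.translate(_STRAIGHT)


quoted = ['"Each line', '"of the document', '"will begin with',
          '"these inverted commas."']
print(collapse_line_quotes(quoted))
print(straighten_quotes("\u201cDarwin\u2019s notes\u201d"))
```

The collapsing step only removes a mark at the very start of a continuation line, so marks inside the quoted text are left as they stand.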
Each file is validated as proper XHTML 1.0 (Transitional), spell checked and proof-read twice.
File sizes and names.
The XHTML file names follow the page image file names, without the page sequence number: 1999_Example_F123_001.tif becomes 1999_Example_F123.html
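As a rough illustration of this naming rule, the following Python sketch derives the XHTML file name from a page image name. The function name is hypothetical, and the sketch assumes that the page sequence number is always the final underscore-separated group of digits before a .tif or .tiff extension.

```python
import re


def xhtml_name_from_image_name(image_name: str) -> str:
    """Derive the XHTML file name from a page image file name."""
    # Remove the image extension (.tif or .tiff, any letter case).
    stem = re.sub(r"\.tiff?$", "", image_name, flags=re.IGNORECASE)
    # Remove the trailing page sequence number, e.g. "_001".
    stem = re.sub(r"_\d+$", "", stem)
    return stem + ".html"


print(xhtml_name_from_image_name("1999_Example_F123_001.tif"))
```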
John van Wyhe
12 May, 2011