Introduction to the Online Variorum of Darwin's Origin of Species

by Barbara Bordalejo

This is a new variorum edition of the six British editions of Darwin's Origin of Species, published between 1859 and 1872. It identifies and presents every change made across the six editions, ranging from alterations of single words to the addition of a whole chapter in 1872.

Morse Peckham published a printed variorum of the Origin of Species in 1959. He called it a Variorum Text, rather than a Variorum Edition.1 As many readers have found, the Peckham Variorum is difficult to use and frustrating for anyone attempting to reconstruct the text of a single edition. In his introduction, Peckham admitted that he did not expect readers who wanted to read the Origin to use his book, which he envisioned as a tool for specialized researchers. This online edition is designed to overcome these shortcomings.

Making this Edition

The comparison between the different editions of the Origin was carried out using the Collate software suite.2 To produce the most accurate collation possible, Collate required that every collateable unit receive an individual identifier.

Darwin made a substantial number of changes to each edition of the Origin, amounting to around 2,250 changes in sentences (including sentences that were added and those that were removed) and more than 15,000 changes in words or phrases. Collating at this level of detail required specialized collation software. But such software is not as intelligent as one might wish, and it is thrown off course by long stretches of unmatched variation. The solution is to break the text up into smaller units, which amounts to aligning it by hand.

This alignment is achieved through a numbering system. The numerous changes at the sentence level made it necessary to create a system that would allow new sentences to be introduced without an overly complicated numbering structure. After some testing, I settled on numbering in the hundreds, on the assumption that Darwin was unlikely to have rewritten a single sentence (so as to make it uncollateable) more than 99 times. Under this system, the first paragraph of Chapter I in the 1859 edition is numbered P 100 and its first sentence, correspondingly, S 100. The second sentence is S 200, the third S 300, and so on. Paragraphs follow the same pattern: P 200, P 300, P 400, etc. When a later edition introduces a new paragraph, it receives a number in multiples of ten. For example, the 1869 edition introduces two new paragraphs after the first paragraph (P 100) of Chapter I; these are numbered P 110 and P 120. Both paragraphs are also present, unchanged, in 1872, but had that edition introduced a new paragraph between them, it would have been numbered P 111. The same structure is used for sentences, and it has worked adequately for this edition.

The rest of the encoding in these texts uses standard entities to stand for ü, —, & and other similar characters.
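The allocation rule just described can be sketched in a few lines of Python. This is a hypothetical illustration only, not code used for the edition (the numbering was done by hand), and the function names `next_multiple` and `id_between` are invented here. Given the numbers of the existing units on either side of an insertion point, it tries a new hundred first, then a multiple of ten, then a single unit, which reproduces P 110, P 120 and P 111 from the example.

```python
from typing import Optional


def next_multiple(n: int, step: int) -> int:
    """Smallest multiple of `step` that is strictly greater than `n`."""
    return (n // step + 1) * step


def id_between(prev: int, nxt: Optional[int] = None) -> int:
    """Hypothetical helper: choose a number for a unit inserted after `prev`.

    `nxt` is the number of the following unit, or None when appending at the
    end. Coarser steps (hundreds, then tens) are tried before single units,
    so that later insertions still find room between existing numbers.
    """
    if nxt is None:
        # Nothing follows: continue the sequence in hundreds.
        return next_multiple(prev, 100)
    for step in (100, 10, 1):
        candidate = next_multiple(prev, step)
        if candidate < nxt:
            return candidate
    # Every number in the gap is taken (e.g. between 199 and 200).
    raise ValueError(f"no free number between {prev} and {nxt}")


if __name__ == "__main__":
    p110 = id_between(100, 200)     # 1869 adds a paragraph after P 100
    p120 = id_between(p110, 200)    # ...and another one after it
    p111 = id_between(p110, p120)   # a hypothetical later insertion between them
    print([f"P {n}" for n in (100, p110, p111, p120, 200)])
```

Running the script prints `['P 100', 'P 110', 'P 111', 'P 120', 'P 200']`; asking for a number between 199 and 200 raises an error, since the gap is exhausted.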

To make section titles easier to search, each has been included as part of the paragraph that follows it. This means that, for example, the subtitle 'Causes of Variability,' introduced at the beginning of Chapter I in the 1866 edition, comes immediately after the paragraph encoding. This also solves the problem of numbering individual subtitles: each can be numbered S ST without any further markup.

The Base Text for Collation

Although the original idea was to produce a full historical-critical edition of the Origin (including all the manuscript materials, for example), lack of funding has so far made this impossible. However, the base text for collation was established with the production of a historical-critical edition as its final aim. The first edition of 1859 was chosen as the reference for the numbering of paragraphs and sentences. The base text contains all the paragraphs and sentences as they appear in the first edition, but it also includes any new ones introduced in later editions, numbered as explained above. When paragraphs or sentences from later editions need to be included in the base text, the version included is the earliest available. For example, if a passage first appears in 1866, as is the case with the subtitle 'Causes of Variability,' it is the 1866 version that is included in the base text. It should therefore be possible to extend this system backwards, aligning segments of the print editions with the manuscripts to show phrases and sentences moving through the various drafts into the print editions.
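The rule for assembling the base text (every 1859 unit, plus every unit added later, each in its earliest version) amounts to a merge in which the first edition to contain a unit wins. The Python sketch below assumes, purely for illustration, that each edition is a dictionary keyed by a hypothetical (paragraph, sentence) pair of integers; the sample texts are invented placeholders, subtitles numbered S ST are not modelled, and the edition's real files and tools are not represented.

```python
from typing import Dict, Tuple

# Hypothetical key: (paragraph number, sentence number), e.g. (110, 100).
UnitId = Tuple[int, int]
Edition = Dict[UnitId, str]


def build_base_text(editions: Dict[int, Edition]) -> Dict[UnitId, Tuple[int, str]]:
    """Merge editions into a base text that keeps the earliest version of each unit.

    Returns a mapping from unit id to (year of first appearance, text),
    sorted into numbering order.
    """
    base: Dict[UnitId, Tuple[int, str]] = {}
    for year in sorted(editions):  # 1859 first, so its versions take priority
        for unit_id, text in editions[year].items():
            # setdefault stores a version only if no earlier edition had this unit.
            base.setdefault(unit_id, (year, text))
    return dict(sorted(base.items()))


if __name__ == "__main__":
    # Invented placeholder texts, not quotations from the Origin.
    editions = {
        1859: {(100, 100): "first sentence (1859)", (200, 100): "next paragraph (1859)"},
        1866: {(100, 100): "first sentence (1866 revision)", (110, 100): "new sentence (1866)"},
        1869: {(110, 100): "new sentence (1869 revision)", (120, 100): "another new sentence (1869)"},
    }
    for unit_id, (year, text) in build_base_text(editions).items():
        print(unit_id, year, text)
```

Here unit (100, 100) keeps its 1859 text and (110, 100) its 1866 text, while the numbering keys place the later additions between the original units.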

The Online Variorum was built using the transcriptions of the Origin of Species published on Darwin Online. The original XHTML transcriptions were completely re-encoded using the light encoding system designed for Collate. Once the collations had been carried out, Collate reconverted every file into TEI-XML compliant markup. For publication on Darwin Online, the XML was transformed into XHTML. Although it might seem unusual to go from dynamic XML to static XHTML, there are some advantages. In particular, the site works much faster because the data do not need to be processed every time a page is displayed.

Every paragraph and sentence of the Origin was manually numbered in each edition. This makes it possible to see exactly what changed at each point, down to the individual word. However impressive the visualization systems devised by McInerny and Posavec and by Ben Fry (both also based on the Darwin Online transcriptions), and however neatly they represent some of the variation in the Origin, they lack this level of detail.

Was Peckham right?

In the last few years, part of Peckham's work has been called into question, particularly by Peter Shillingsburg, who undertook a new bibliographical analysis of the editions of the Origin.3 He was surprised by Peckham's categorical statement that the first five editions were printed from the same standing type: 'The text, however, was not reset. The printers simply regarded Darwin's corrected copy of the first edition as still another revise. Instead of resetting the whole book, they reset only lines and paragraphs when necessary.' (p. 19)

If Peckham were right, the type would have been kept standing for ten years. If nothing else, this would have been impractical: it would have meant several formes of type occupying space in the printing house and a large quantity of sorts unusable for years. Shillingsburg concludes that 'with the exception of the inner form of gathering X and all of gathering Y of the 1860 edition, each new edition of Darwin's Origin from 1860 to 1869 was a completely reset new edition even though many of the lines begin and end with the same word and have no textual variants.' (p. 221) He also refuted the suggestion that the fifth edition was the most accurate. Shillingsburg's hypothesis was that 'each edition was completely reset, albeit line for line.' (p. 226) The Online Variorum should help to evaluate Peckham's work and to improve our understanding of the textual history of the Origin.

Peckham summarizes the changes in the Origin as follows: 'Of the 3,878 sentences in the first edition, nearly 3000, about 75 per cent, were rewritten from one to five times each. Over 1,500 sentences were added, and of the original sentences plus these, nearly 325 were dropped. Of the original and added sentences there are nearly 7,500 variants of all kinds. In terms of added sentences, the sixth edition is nearly a third as long again as the first.' (p. 9)

Checking these numbers by hand would require Peckham's work to be carried out again. The Online Variorum, however, allows easy first-hand analysis of the differences between the texts of the successive editions, and one of the tools provided, VBase, can carry out sophisticated textual searches. The way in which Peckham carried out his analysis restricts the way variants are presented. For example, it is difficult to assess exactly what he meant by saying that nearly three thousand sentences were rewritten. Does this include all types of changes? Does a sentence that has only been re-punctuated count as rewritten?

The Editions

Each edition of On the Origin of Species represents a series of changes when compared with the previous or subsequent one. The Online Variorum can display these changes across all six editions at once, or between two editions at a time.

Changes from 1859 to 1860

Peckham described the changes included in the second edition as follows: 'the total number [of changes] in this editions is impressive enough: 9 sentences dropped; 483 rewritten or re-punctuated; 30 added. No chapter was untouched.' (p. 19) To produce comparable results, several searches were carried out following his criteria. The Online Variorum currently counts 692 changes at the word level. At the moment, the collation still includes changes in punctuation, even though I agree with Shillingsburg that Darwin is unlikely to have been responsible for these. In the future, I hope to revise the collations to reflect this; for now, one must bear in mind that changes in punctuation might affect the numbers given. According to my definitions, 24 sentences were dropped and 29 were added. I have defined a sentence as a unit that ends with a full stop (period) in the 1859 edition. There are some exceptions: when only the end of a sentence has been rewritten so heavily as to become uncollateable, I have subdivided the second part of the sentence, using letters to differentiate the versions. Titles and subtitles are also counted as such units.
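With sentences identified by number, counting added, dropped and altered sentences between two editions reduces to comparing sets of sentence numbers. The Python sketch below is an illustration under stated assumptions, not the procedure used by Collate or VBase: each edition is taken to be a hypothetical dictionary from sentence number to text, letter-suffixed sentence parts are not modelled, and the sample sentences are invented. Its `ignore_punctuation` flag shows how the altered count can shrink when punctuation-only changes are disregarded.

```python
import re
from typing import Dict, NamedTuple


class SentenceCounts(NamedTuple):
    added: int
    dropped: int
    altered: int


def normalize(text: str, ignore_punctuation: bool) -> str:
    """Collapse whitespace and, optionally, strip punctuation before comparing."""
    if ignore_punctuation:
        text = re.sub(r"[^\w\s]", "", text)
    return " ".join(text.split())


def count_sentence_changes(
    old: Dict[int, str], new: Dict[int, str], ignore_punctuation: bool = False
) -> SentenceCounts:
    added = len(new.keys() - old.keys())    # numbers only in the later edition
    dropped = len(old.keys() - new.keys())  # numbers only in the earlier edition
    altered = sum(
        1
        for sid in old.keys() & new.keys()
        if normalize(old[sid], ignore_punctuation) != normalize(new[sid], ignore_punctuation)
    )
    return SentenceCounts(added, dropped, altered)


if __name__ == "__main__":
    # Invented sample sentences keyed by hypothetical sentence numbers.
    old = {100: "First sentence.", 200: "Second sentence, unchanged.", 300: "Third sentence."}
    new = {100: "First sentence, revised.", 110: "An added sentence.", 200: "Second sentence unchanged."}
    print(count_sentence_changes(old, new))
    print(count_sentence_changes(old, new, ignore_punctuation=True))
```

The first call reports two altered sentences and the second only one, because sentence 200 differs only by a comma.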

Changes from 1860 to 1861

For the 1861 edition, Peckham stated that Darwin '…dropped 33 sentences, altered 617 and added 266, together 14 per cent of the total number of variations, while the second edition had only 7 per cent. The text was 35 pages longer than in the two previous editions, and the 'Historical Sketch' added six and a half pages in smaller type.' (p. 20)

The Online Variorum records 1,479 changes at word level, with 267 sentences added and 54 dropped. These figures include the twenty-one paragraphs, amounting to 72 sentences, found in the 'Historical Sketch.'

Changes from 1861 to 1866

Darwin's revisions grew more extensive with each edition. Peckham stated that '…[t]he fourth edition was the most extensively revised yet, containing 21 per cent of the total number of variants. Darwin dropped 36 sentences, rewrote 1073 and added 435, although in his new table of differences between the third and fourth editions he listed only 34 passages. He added two pages to the 'Historical Sketch' and fifty two pages to the text. An important structural change involved the addition of a number of new sub-headings within the chapters, and the change from the former place at the beginning of paragraphs to a position centered above paragraphs.' (p. 21)

For these two editions, the changes at word level amount to 2,877; 507 sentences were added and 112 were dropped. At this point, the figures reported by the Online Variorum diverge from Peckham's count.

Changes from 1866 to 1869

It was in the fifth edition that 'Darwin used the famous phrase, taken from Spencer, 'Survival of the Fittest,' and it was the most extensively revised edition yet - indeed, if we except the bulk of the extra chapter added in the sixth edition, the most extensively revised of all. It contains 29 per cent of the total number of variants: 178 sentences dropped, 1770 altered, and 227 added.' (p. 22)

It is evident, even from a manual comparison, that 1869 is the edition with the most changes (that is, if we leave aside the additional chapter included in 1872). There are 6,319 changes at word level, more than double the number in the previous edition. In the fifth edition, Darwin also added 351 sentences and deleted 261. The relative drop in the number of changes at sentence level is probably related to the increasing number of revisions at word level, although more research is required before a firm statement can be made about this.

Changes from 1869 to 1872

For the sixth edition, '…[t]he first word, 'On,' was dropped from the title. The considerations of objections were taken from Chapter IV and placed with new material, chiefly rebutting Mivart's attacks, in a new Chapter VII. Thus the old Chapters VII through XIV were renumbered VIII through XV… Including the new material on Mivart, the new edition had more variants than any of the previous five. Excluding that, it has fewer than the fifth but considerably more than the fourth. Darwin dropped 63 sentences, rewrote 1,669, and added 571. As in the fifth, hundreds of sentences were completely recast with only slight changes in meaning, the cumulative effect of which, however, was of great importance, as detailed studies of the text, if they are forthcoming, will show.' (p. 23)

The 1872 edition contains 5,711 changes at word level. Darwin added 285 sentences and deleted 261. Again, these numbers do not include the additional chapter.

The figures obtained with the Online Variorum and those given by Peckham do not agree because they count different things. The numbers I have given here include every alteration within a sentence, whereas Peckham's result from counting every sentence that was changed, regardless of how many alterations were made to it.
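The difference between the two ways of counting can be made concrete with Python's standard `difflib` module. In this sketch, which is an assumption rather than how either Collate or Peckham counted, each contiguous run of differing word tokens within a sentence (a replacement, deletion or insertion reported by `SequenceMatcher.get_opcodes()`) counts as one alteration, while a Peckham-style count records each changed sentence once. The sample sentences are invented.

```python
import difflib
import re
from typing import Dict, List, Tuple


def tokens(text: str) -> List[str]:
    """Split a sentence into words, keeping punctuation marks as separate tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


def alterations(old: str, new: str) -> int:
    """Number of contiguous non-matching token runs between two versions."""
    matcher = difflib.SequenceMatcher(None, tokens(old), tokens(new))
    return sum(1 for tag, *_ in matcher.get_opcodes() if tag != "equal")


def compare_counts(old: Dict[int, str], new: Dict[int, str]) -> Tuple[int, int]:
    """Return (alterations within sentences, sentences changed) for shared sentences."""
    shared = old.keys() & new.keys()
    per_sentence = [alterations(old[sid], new[sid]) for sid in shared]
    return sum(per_sentence), sum(1 for n in per_sentence if n > 0)


if __name__ == "__main__":
    old = {100: "the cat sat on the mat.", 200: "it was warm.", 300: "nothing happened."}
    new = {100: "the dog sat on a mat.", 200: "it was warm.", 300: "nothing at all happened."}
    print(compare_counts(old, new))  # prints (3, 2)
```

Sentence 100 contains two alterations and sentence 300 one, so an alteration count gives 3 while a count of changed sentences gives 2 for the same pair of texts.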

Types of Changes in the Origin

In the course of preparing the present edition, I have come to the conclusion that Darwin's writing in the Origin developed in very specific ways.

Six types of variant are found in the editions under consideration, each representing a particular kind of stylistic or semantic change.

1. Depersonalization: This first type of variant depersonalizes the text by suppressing the first person and changing personal constructions into impersonal ones. Some of Darwin's correspondents complained about the number of times 'I believe' occurred. The phrase 'I think,' for example, was removed 29 times. He also removed 'I presume' and other similar constructions.

Another example of depersonalization is the change from 'When we attempt...' to 'In attempting…':

When we attempt to estimate the amount of structural difference between the domestic races… (p. 17, 1859)

In attempting to estimate the amount of structural difference between the domestic races… (p. 18, 1869)

This shift from a first-person construction ('we attempt') to an impersonal one ('in attempting') is typical of depersonalization: it makes the tone more detached and apparently more objective.

2. Reinforcement: An idea first presented in a slightly hesitant manner is revised into a more forceful proposition.

... and that some of the breeds closely resemble, perhaps are identical with, those still existing. (p. 18, 1859)

… and that some of these ancient breeds closely resemble or are even identical with those still existing. (p. 18, 1866)

In the above example, Darwin changed 'some of the breeds' to the more precise 'some of these ancient breeds,' more precise particularly because the sentence opens with a reference to Egypt. The second change, from 'perhaps are identical' to 'are even identical,' is the reinforcement proper: the hesitant 'perhaps' gives way to the emphatic 'even.'

3. Objectivization: This type removes colloquial elements from the text. Darwin removed 'simply' at least three times, 'much' at least eleven times, 'often' at least thirteen times, 'just' four times and 'quite' 26 times, and he changed 'pretty clear' to 'clear' once.

4. Clarification: This type makes the language less obscure and the sentence structure easier to follow.

When a deviation appears not unfrequently, and we see it in the father and child, we cannot tell whether it may not be due to the same original cause acting on both;… (pp. 12-13, 1859)

When any deviation of structure often appears, and we see it in the father and child, we cannot tell whether it may not be due to the same cause having acted on both;… (p. 13, 1866)

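In this example, Darwin replaced his convoluted first version with a clearer one: the double negative 'not unfrequently' becomes 'often,' 'a deviation' is specified as 'any deviation of structure,' and 'the same original cause acting on both' becomes 'the same cause having acted on both.'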

5. Updating: When new hypotheses were brought to Darwin's attention or new research was published, he included relevant references in subsequent editions. Chronological references were also changed over time. Updating is a particular kind of semantic change.

Many examples of updated chronological references can be found in the Historical Sketch, which has contained many temporal allusions since its first appearance.

6. Semantic changes: This category includes significant changes that do not obviously belong to type 5 (updating). The following example illustrates it:

Has the little variability of the ass or guinea-fowl, or the small power of endurance of warmth by the rein-deer, or of cold by the common camel, prevented their domestication? (p. 17, 1859)

Has the little variability of the ass and goose, or the small power of endurance of warmth by the reindeer or of cold by the common camel, prevented their domestication? (p. 18, 1866)

The substitution of 'goose' for 'guinea-fowl' is clearly not the correction of a typographical error, nor does it appear to be an update.
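Counts like those given above for objectivization (for example, 'quite' removed 26 times) can be approximated by comparing the word frequencies of aligned sentence versions. The following Python sketch is hypothetical and is not how those figures were obtained: for each target word it adds up how many more times the word occurs in a sentence's earlier version than in its later one, using `collections.Counter` subtraction. It ignores sentences dropped entirely, does not handle multi-word phrases such as 'I think', and uses invented sample sentences.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List


def words(text: str) -> List[str]:
    """Lower-cased word tokens; hyphenated forms such as 'guinea-fowl' stay whole."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


def count_removals(old: Dict[int, str], new: Dict[int, str], targets: Iterable[str]) -> Counter:
    """Count net removals of each target word across sentences present in both editions."""
    targets = list(targets)  # allow any iterable to be reused for every sentence
    removed: Counter = Counter()
    for sid in old.keys() & new.keys():
        # Counter subtraction keeps only words that occur more often in the old version.
        lost = Counter(words(old[sid])) - Counter(words(new[sid]))
        for word in targets:
            removed[word] += lost[word]
    return removed


if __name__ == "__main__":
    old = {100: "It is quite clear that this is much the same.", 200: "This is pretty clear."}
    new = {100: "It is clear that this is the same.", 200: "This is clear."}
    print(count_removals(old, new, ["quite", "much", "pretty", "just"]))
```

In the sample, 'quite', 'much' and 'pretty' are each counted as removed once and 'just' not at all.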

The moment the world changed

On 24 November 1859 a gentleman naturalist living in Kent, who twenty-five years before had traveled the world and had since pursued his own thoughts about how new species arise, published his theory in On the Origin of Species by Means of Natural Selection. His book was an instant best-seller. Over the following years he revised it, making sure to include the latest research by others, while his reputation grew. At some point between 1866 and 1869, Darwin, or one of his friends and colleagues, decided that the world had changed. Most of the stylistic changes in the Origin appeared in 1869. The second sentence of the Historical Sketch underwent a radical transformation:

The great majority of naturalists believe that species are immutable productions, and have been separately created. (p. xiii, 1861; p. xiii, 1866)

This is how the same sentence appears in the fifth edition, in 1869:

Until recently the great majority of naturalists believed that species were immutable productions, and had been separately created. (p. xv, 1869; p. xiii, 1872)

The added chronological qualifier ('Until recently') and, particularly, the shift to the past tense in the later version of the sentence indicate that Darwin now perceived a radical change in the attitude of his contemporaries. What had been the norm had ceased to be. The revisions in this sentence mark the moment when the world changed.

Barbara Bordalejo

24 November 2009


This variorum could not have been realized without the transcriptions from Darwin Online, which were used with the permission of John van Wyhe. I am also grateful to John for his hard work and enthusiasm in improving the edition. David Parker was always loyal and always showed he believed in me more than most. I owe a great debt to Mark Pallen. His practical approach to research has been a source of inspiration, and his book, The Rough Guide to Evolution, a cherished gift that I enjoy reading over and over. Peter Robinson kindly donated his time to make the edition publishable. I can never repay his patience (when he was patient) or make up for his exasperation (when he had lost his patience).
I am also grateful to Alan Bale, Juliet Coates and Joris van Zundert, who often joked about my recurrent complaints about line numbering, and to Peter Shillingsburg and Dirk Van Hulle for showing up and weighing in. All errors remain my responsibility.

1 Peckham, Morse, ed. 1959. The Origin of Species: A Variorum Text. Philadelphia: University of Pennsylvania Press. Peckham made a clear distinction between the two: 'I have prepared a variorum text. The next step is a variorum edition, with the sources of his [Darwin's] information tracked down and verified, authorities already published up to 1872 and authorities whom he consulted as revealed in the various volumes of his correspondence and in the letters, apparently a great body, which remain unpublished.' I have never seen a variorum edition as described by Peckham. Instead, the editions of this type that I am aware of present all the variants found in the chosen source texts, which is what I have done in this edition.

2 Collate is at the end of its life in its current form. The software was developed on Apple computers and always maintained on the Macintosh platform. When the Macintosh switched to Intel processors a few years ago, Collate was not updated. New software, called CollateX, is currently being developed under the umbrella of the Interedition Project. This program will retain many of the original Collate features but will run as a web server application.

3 Shillingsburg, Peter. 2007. The first five English editions of Charles Darwin's On the Origin of Species. Variants 5: 221-243.