T-PIV-2: Authorship
Thursday, 08/Mar/2018:
2:00pm - 3:30pm

Session Chair: Jani Marjanen
Location: PIV

2:00pm - 2:30pm
Long Paper (20+10min) [abstract]

Extracting script features from a large corpus of handwritten documents

Lasse Mårtensson1, Anders Hast2, Ekta Vats2

1Högskolan i Gävle, Sweden; 2Uppsala universitet, Sweden

Before the advent of the printing press, the only way to create a new piece of text was to produce it by hand. The medieval text culture was almost exclusively a handwritten one, even though printing began towards the very end of the Middle Ages. As a consequence, medieval text production is characterized by variation of several kinds: in language forms, in spelling and in the shape of the script. In the current presentation, the shape of the script is in focus, an area referred to as palaeography. The introduction of computers has changed this discipline radically, as computers can handle very large amounts of data and furthermore measure features that are difficult for a human researcher to deal with.

In the current presentation, we will demonstrate two investigations within digital palaeography, carried out on the entire medieval Swedish charter corpus, to the extent that it has been digitized. The script in approximately 14 000 charters has been measured with regard to the aspects described below. The charters are primarily in Latin and Old Swedish, but there are also a few in Middle Low German. The overall purpose of the investigations is to search for script features that may be significant for separating one scribe from another, i.e. scribal attribution. As the investigations have been done on the entire available charter corpus, it is possible to visualize how each separate charter relates to all the others, and furthermore to see how the charters divide into clusters on the basis of similarity in the investigated features.

The two investigations both focus on aspects that have been regarded as significant for scribal attribution, but that are very difficult to measure with any degree of precision without the aid of computers. One of the investigations belongs to a set of methods often referred to as Quill Features. This method focuses, as the name states, on how the scribe has moved the pen over the script surface (parchment or paper). The medieval pen, the quill, consisted of a feather that had been hardened, truncated and split at the top. This construction created variation in the width of the strokes constituting the script, depending mainly on the direction in which the pen was moved, and also on the angle at which the scribe held the pen. This is what the method measures: the variation between thick and thin strokes, in relation to the angle of the pen. The method has been used on medieval Swedish material before, namely on a medieval Swedish manuscript (Cod. Ups. C 61, 1104 pages), but the current investigation is ten times the size of the previous one, and furthermore, we employ a new type of evaluation (see below) of the results that to our knowledge has not been used before.
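The relation between stroke width, stroke direction and pen angle can be illustrated with a toy geometric model. This is a minimal sketch and not the Quill Features method itself: it assumes an idealised flat nib with no thickness, and the function names and the 15-degree sampling step are hypothetical. Both angles are measured from the baseline.

```python
import math


def stroke_width(nib_width, pen_angle_deg, stroke_direction_deg):
    """Width of the mark left by an idealised flat nib (assumed to have zero thickness).

    The nib edge lies at pen_angle_deg; the pen moves along stroke_direction_deg.
    Moving along the edge gives a hairline, moving across it gives the full nib width.
    """
    delta = math.radians(stroke_direction_deg - pen_angle_deg)
    return nib_width * abs(math.sin(delta))


def width_profile(nib_width, pen_angle_deg, step_deg=15):
    """Stroke width for a range of stroke directions between 0 and 180 degrees."""
    return {
        direction: stroke_width(nib_width, pen_angle_deg, direction)
        for direction in range(0, 180, step_deg)
    }


def thinnest_direction(profile):
    """Direction with the thinnest stroke; in this model it equals the pen angle."""
    return min(profile, key=profile.get)


def thickest_direction(profile):
    """Direction with the thickest stroke; in this model it is perpendicular to the pen angle."""
    return max(profile, key=profile.get)


if __name__ == "__main__":
    profile = width_profile(nib_width=1.0, pen_angle_deg=30)
    print(thinnest_direction(profile), thickest_direction(profile))
```

In this simplified model, reading off the direction of the thinnest strokes recovers the pen angle, which is the intuition behind relating thick and thin strokes to the angle of the pen.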

The second investigation focuses on the relations between script elements of different height, and the proportions between these. For instance, three different formations can be discerned among the vertical script elements: minims (e.g. in ‘i’, ‘n’ and ‘m’), ascenders (e.g. in ‘b’, ‘h’ and ‘k’) and descenders (e.g. in ‘p’ and ‘q’). The ascender can extend to varying degrees above the minim, and the descender can extend to varying degrees below the minim, creating different proportions between the components. These measures have also been extracted from the entire available medieval Swedish charter corpus, and display very interesting information from the perspective of scribal identity. It should be noted that the first line of a charter is often divergent from the rest of the charter in this respect, as the ascenders here often extend higher than otherwise. In a similar way, the descenders of the last line of a charter often extend further below the line than in the rest of the charter. In order for a representative measure to be gained from a charter, these two lines must be disregarded.
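The proportion measure described above, including the exclusion of the first and last line, could be sketched as follows. This is only an illustration under stated assumptions: the input format (one dictionary of pixel heights per text line), the field names and the use of the median are hypothetical and are not taken from the investigation itself.

```python
from statistics import median


def charter_proportions(lines):
    """Ascender and descender extension relative to minim height for one charter.

    `lines` is a hypothetical list with one dict per text line, holding the
    typical 'minim' height, the 'ascender' extension above the minim and the
    'descender' extension below it, all in pixels.
    The first and last lines are dropped because they tend to be atypical.
    """
    if len(lines) < 3:
        raise ValueError("need at least three lines to drop the first and last")
    body = lines[1:-1]
    minim = median(line["minim"] for line in body)
    ascender = median(line["ascender"] for line in body)
    descender = median(line["descender"] for line in body)
    return {
        "ascender_ratio": ascender / minim,
        "descender_ratio": descender / minim,
    }


if __name__ == "__main__":
    # Invented toy measurements: exaggerated ascenders in line 1, descenders in line 4.
    toy_charter = [
        {"minim": 10, "ascender": 30, "descender": 8},
        {"minim": 10, "ascender": 12, "descender": 8},
        {"minim": 10, "ascender": 14, "descender": 10},
        {"minim": 10, "ascender": 12, "descender": 25},
    ]
    print(charter_proportions(toy_charter))
```

The median is used here only because it is robust against a few extreme strokes; any other summary statistic could be substituted.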

One of the problems when investigating individual scribal habits in medieval documents is that we rarely know for certain who produced them, which makes the evaluation difficult. In most cases, the scribe of a given document is identified through a process of scribal attribution, usually based on palaeographical and linguistic evidence. In an investigation of individual scribal features, it is not desirable to evaluate the results on the basis of previous attributions. Ideally, the evaluation should be done on charters where the identity of the scribe can be established on external features, where his/her identity is in fact known. For this purpose, we have identified a set of charters where this is actually the case, namely where the scribe himself/herself explicitly states that he/she has held the pen (in our corpus, there are only male scribes). These charters contain a so-called scribal note, containing the formula ego X scripsi (‘I X wrote’), accompanied by a symbol unique to this specific scribe. One such scribe is Peter Tidikesson, who produced 13 charters with such a scribal note in the period 1432–1452, and another is Peter Svensson, who produced six charters in the period 1433–1453. This selection of charters is the means by which the otherwise big-data-focused, computer-aided methods can be evaluated from a qualitative perspective. This step of evaluation is crucial in order for the results to become accessible and useful for the users of the information gained.

2:30pm - 3:00pm
Long Paper (20+10min) [abstract]

Text Reuse and Eighteenth-Century Histories of England

Ville Vaara1, Aleksi Vesanto2, Mikko Tolonen1

1University of Helsinki; 2University of Turku


What kind of history is Hume’s History of England? Is it an impartial account or is it part of a political project? To what extent was it influenced by seventeenth-century Royalist authors? These questions have been asked since the first Stuart volumes were published in the 1750s. The consensus is that Hume’s use of Royalist sources left a crucial mark on his historical project. However, as Mark Spencer notes, Hume did not only copy from Royalists or Tories. One aim of this paper is to weigh these claims against our evidence about Hume’s use of historical sources. To do this we qualified, clustered and compared 129,646 instances of text reuse in Hume’s History. Additionally, we are able to compare Hume’s History of England to other similar undertakings in the eighteenth century and get an accurate view of their composition. We aim to extend the discussion on Hume’s History by applying computational methods to understanding the writing of the history of England in the eighteenth century as a genre.

This paper contributes to the overall development of Digital Humanities by demonstrating how digital methods can help develop and move forward discussion in an existing research case. We do not limit ourselves to general method development, but rather contribute to the specific discussions on Hume’s History and the study of eighteenth-century histories.

Methods and sources

- ----

This paper advances our understanding of the composition of Hume’s History by examining the direct quotes in it based on data in Eighteenth-Century Collections Online (ECCO). It should be noted that ECCO also includes central seventeenth-century histories and other important documents reprinted later. Thus, we do not only include eighteenth-century sources, but also, for example, Clarendon, Rushworth and other notable seventeenth-century historians. We compare the phenomenon of text reuse in Hume’s History to that in the works of Rapin, Guthrie and Carte, all prominent historians at the time. To our knowledge, this kind of text mining effort has not previously been done in the field of historiography.

Our base-text for Hume is the 1778 edition of History of England. For Paul de Rapin we used the 1726-32 edition of his History of England. For Thomas Carte the source was the 1747-1755 edition of his General History of England. And for William Guthrie we used the 1744-1751 edition of his History of Great Britain.

As a starting point for our analysis, we used a dataset of linked text-reuse fragments found in ECCO. The basic idea was to create a dataset that identifies similar sequences of characters (from circa 150 to more than 2000 characters each) instead of trying to match individual characters or tokens/words. This helped with the optical character recognition problems that plague ECCO. The methodology has previously been used in matching DNA sequences, where the problem of noisy data is likewise present. We further enriched the results with bibliographical metadata from the English Short Title Catalogue (ESTC). This enriching allows us to compare the publication chronology and locations, and to create rough estimates of first edition publication dates.
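The idea of matching noisy character sequences rather than exact words can be illustrated as follows. This is a minimal sketch and not the pipeline used to build the ECCO dataset: the function names, the k-gram length and both thresholds are assumptions chosen for illustration, and Python's standard `difflib` stands in for the sequence-alignment step borrowed from DNA matching.

```python
from difflib import SequenceMatcher


def normalise(text):
    """Lowercase and collapse whitespace so that layout differences do not matter."""
    return " ".join(text.lower().split())


def kgrams(text, k):
    """Set of all overlapping character k-grams in the text."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def is_reuse(a, b, k=4, min_seed_overlap=0.3, min_ratio=0.8):
    """Decide whether two passages look like the same text despite OCR noise.

    Step 1 (cheap filter): the passages must share enough character k-grams.
    Step 2 (alignment): difflib's similarity ratio must reach min_ratio.
    All parameter values are hypothetical defaults, not values from the paper.
    """
    a, b = normalise(a), normalise(b)
    grams_a, grams_b = kgrams(a, k), kgrams(b, k)
    if not grams_a or not grams_b:
        return False
    overlap = len(grams_a & grams_b) / min(len(grams_a), len(grams_b))
    if overlap < min_seed_overlap:
        return False
    return SequenceMatcher(None, a, b, autojunk=False).ratio() >= min_ratio


if __name__ == "__main__":
    # Invented example sentences; the second imitates typical OCR confusions.
    clean = "the king was brought to trial before the high court of justice"
    noisy = "the kiug was brought to tria1 before the high conrt of justice"
    other = "parliament voted new taxes on wool and wine in the spring"
    print(is_reuse(clean, noisy), is_reuse(clean, other))
```

Because the comparison works on characters, a handful of misread letters lowers the similarity only slightly, whereas exact word matching would lose every affected word. The fragments in the dataset are far longer than these toy sentences (from circa 150 characters upwards).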

There is no ready-to-use gold standard for text reuse cluster detection. Therefore, we compared our clusters with the critical edition of the Enquiry concerning Human Understanding (EHU) to see if text reuse cases of Hume’s Treatise in EHU are also identified by our method. The results show that we were able to identify all cases included in EHU except those in footnotes. Because some of the changes that Hume made from the Treatise to EHU are not obvious, this is very promising.


To give a general overview of Hume’s History in relation to the other works considered, we compared their respective volumes of source text reuse (figure 1). The comparison reveals some fundamental stylistic and structural differences. Hume’s and Carte’s Histories are composed quite differently from Rapin’s and Guthrie’s, which have roughly three times more reused fragments: Rapin typically opens a chapter with a long quote from a source document, and moves on to discuss the related historical events. Guthrie writes similarly, quoting long passages from sources of his choice. Hume is different: his quotes are more evenly spread, and a greater proportion of the text seems to be his own original formulations.

[Figure 1.]

Change in text reuse in the Histories

- ----

All the histories of England considered in our analysis are massive works, comprising multiple separate volumes. The number of reused text fragments found in these volumes differs significantly, but the trends are roughly similar. The common overall feature is a rise in the frequency of direct quotes in later volumes.

The increase in text reuse peaks in the volumes covering the reign of Charles I and the events of the English Civil War, but with respect to both Hume and Rapin (figures 2 & 3), the highest peak is not at the end of Charles’ reign, but in the lead-up to the confrontation with the parliament. In Guthrie and Carte (figures 4 & 5) the peaks are located in the final volume. Except for Guthrie, all the historical works considered here have their highest reuse rates located around the period of Charles I’s reign, which was an intensely debated topic among Hume’s contemporaries.

[Figure 2.]

[Figure 3.]

[Figures 4, 5.]

We can further break down the sources of reused text fragments by the political affiliation of their authors (figure 6). A significant portion of the detected text reuse cases in Hume link to authors with no strong political leaning in the wider Whig-Tory context. It is obvious that serious antiquarian work that is politically neutral forms the main body of seventeenth-century historiography in England. With the later volumes, the number of text reuse cases tracing back to authors with a political affiliation increases, as might be expected with more heavily politically loaded topics.

[Figure 6.]

Taking an overview of the authors of the text reuse fragments in Hume’s History (figure 7), we note that the statistics are dominated by a handful of writers, with a long “tail” of others whose use is limited to a few fragments. Both groups, the Whig and Tory authors, feature a few “main sources” for Hume. John Rushworth (1612-1690) emerges as the most influential source, followed closely by Edward Hyde Clarendon (1609-1674). Both Rushworth and Clarendon had reached a position of prominence as historians and were among the best known and respected sources available when Hume was writing his own work. We might even question if their use was politically colored at all, as practically everyone was using their works, regardless of political stance.

[Figure 7.]

Charles I’s execution and Hume’s impartiality

- ----

A relatively limited list of authors is responsible for the majority of the text fragments in Hume’s History. As one might intuitively expect, the use of particular authors is concentrated in particular chapters. In general, unevenness in the use of quotes can be seen as more of a norm than an exception.

However, there is at least one central chapter in Hume’s Stuart history that breaks this pattern. This is Chapter LIX, perhaps the most famous chapter in the whole work, which covers the execution of Charles I. Nineteenth-century Whig commentators argued, with great enthusiasm, that Hume’s use of sources, especially in this particular chapter, and Hume’s description of Charles’s execution, followed Royalist sources and the Jacobite Thomas Carte in particular. Thus, the more carefully balanced use of sources in this particular chapter reveals a clear intention of wanting to be (or appear to be) impartial on this specific topic (figure 8).

Of course, there is John Stuart Mill’s claim that Hume only uses Whigs when they support his Royalist bias. In the light of our data, this seems unlikely. If we compare Hume’s use of Royalist sources in his treatment of the execution of Charles I to Carte’s, Carte’s use of Royalists is, statistically, off the chart, whereas Hume’s is aligned with his use of Tory sources elsewhere in the volume.

[Figure 8.]

Hume’s influence on later Histories

- ----

A final area of interest in terms of text reuse is what it can tell us about an author’s influence on later writers. The reuse totals of Hume’s History in works following its publication are surprisingly evenly spread out over all the volumes (figure 9), and in this respect differ from those of the other historians considered here (figures 10 - 12). The only exception is the last volume, where the drop in the number of detected reuse fragments can be considered significant.

Of all the authors, only Hume has significant reuse arising from the volumes discussing the Civil War. The reception of Hume’s first Stuart volume, the first published volume of his History, is well known. It is notable that the next volumes published, that is the following Stuart volumes, possibly written with the angry reception of the first Stuart volume in mind, are the ones that seem to have given rise to the least discussion.

[Figure 9.]

[Figure 10.]

[Figures 11 & 12.]


Original sources

- ----

Eighteenth-century Collections Online (GALE)

English Short-Title Catalogue (British Library)

Thomas Carte, General History of England, 4 vols., 1747-1755.

William Guthrie, History of Great Britain, 3 vols., 1744-1751.

David Hume, History of England, 8 vols., 1778.

David Hume, Enquiry concerning Human Understanding, ed. Tom L. Beauchamp, OUP, 2000.

Paul de Rapin, History of England, 15 vols., 1726-32.

Secondary sources

- ----

Herbert Butterfield, The Englishman and his history, 1944.

John Burrow, Whigs and Liberals: Continuity and Change in English Political Thought, 1988.

Duncan Forbes, Hume’s Philosophical Politics, Cambridge, 1975.

James Harris, Hume. An intellectual biography, 2015.

Colin Kidd, Subverting Scotland's Past. Scottish Whig Historians and the Creation of an Anglo-British Identity 1689–1830, Cambridge, 1993.

Royce MacGillivray, ‘Hume's "Toryism" and the Sources for his Narrative of the Great Rebellion’, Dalhousie Review, 56, 1987, pp. 682-6.

John Stuart Mill, ‘Brodie’s History of the British Empire’, Robson et al. ed. Collected works, vol. 6, pp. 3-58.

Ernest Mossner, ‘Was Hume a Tory Historian?’, Journal of the History of Ideas, 2, 1941, pp. 225-236.

Karen O’Brien, Narratives of Enlightenment: Cosmopolitan History from Voltaire to Gibbon, CUP, 1997.

Laird Okie, ‘Ideology and Partiality in David Hume's History of England’, Hume Studies, vol. 11, 1985, pp. 1-32.

Frances Palgrave, ‘Hume and his influence upon History’ in vol. 9 of Collected Historical Works, ed. R. H. Inglis Palgrave, 10 vols. CUP, 1919-22.

John Pocock, Barbarism and religion, vols. 1-2.

B. A. Ring, ‘David Hume: Historian or Tory Hack?’, North Dakota Quarterly, 1968, pp. 50-59.

Claudia Schmidt, Reason in history, 2010.

Mark Spencer, ‘David Hume, Philosophical Historian: “contemptible Thief” or “honest and industrious Manufacturer”?’, Hume conference, Brown, 2017.

Vesanto, Nivala, Salakoski, Salmi & Ginter: A System for Identifying and Exploring Text Repetition in Large Historical Document Corpora. Proceedings of the 21st Nordic Conference on Computational Linguistics, NoDaLiDa, 22-24 May 2017, Gothenburg, Sweden.

3:00pm - 3:30pm
Long Paper (20+10min) [abstract]

Refutatio errorum – authorship attribution on a late-medieval antiheretical treatise

Reima Välimäki

University of Turku, Cultural history

Since Peter Biller’s attribution of the Cum dormirent homines (1395) to Petrus Zwicker, perhaps the most important late medieval inquisitor prosecuting Waldensians, the treatise has become a standard source on late medieval German Waldensianism. There is, however, another treatise, known as the Refutatio errorum, which has received far less attention. In my dissertation (2016) I proposed that the similarities in style, contents, manuscript tradition and composition of the Refutatio errorum and the Cum dormirent homines are so remarkable that Petrus Zwicker can be confirmed as the author of both texts. The Refutatio exists in four different redactions. However, the redaction edited by J. Gretser in the 17th century, and consequently used by modern scholars, does not correspond to the earlier and more popular redaction that is found in the majority of preserved manuscripts.

In the proposed paper I will add a new element of verification to Zwicker’s authorship: machine-learning-based computational authorship attribution as applied in the digital humanities consortium Profiling Premodern Authors (University of Turku, 2016–2019). In its simplest form, authorship attribution is a binary classification task based on textual features (word uni/bi-grams, character n-grams). In our case, the classes are “Petrus Zwicker” (based on features from his known treatise) and “not-Zwicker” (based on features from a background corpus consisting of medieval Latin polemical treatises, sermons and other theological works). The test cases are the four redactions of the Refutatio errorum. The classifiers used include a linear Support Vector Machine and a more complex Convolutional Neural Network. Researchers from the Turku NLP group (Aleksi Vesanto, Filip Ginter, Sampo Pyysalo) are responsible for the computational analysis.
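The binary classification setup can be illustrated with a deliberately simple stand-in. The sketch below is not the Support Vector Machine or the Convolutional Neural Network mentioned above: it swaps in a nearest-profile classifier using cosine similarity over character n-gram counts, and all function names are hypothetical. The training strings are invented toy sequences, not Latin text from any treatise.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Counts of overlapping character n-grams after simple normalisation."""
    text = " ".join(text.lower().split())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def build_profile(texts, n=3):
    """Sum the n-gram counts of all training texts belonging to one class."""
    profile = Counter()
    for text in texts:
        profile.update(char_ngrams(text, n))
    return profile


def cosine(p, q):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * q[gram] for gram, count in p.items() if gram in q)
    norm_p = math.sqrt(sum(c * c for c in p.values()))
    norm_q = math.sqrt(sum(c * c for c in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


def classify(text, profiles, n=3):
    """Return the label whose class profile is most similar to the text."""
    features = char_ngrams(text, n)
    return max(profiles, key=lambda label: cosine(features, profiles[label]))


if __name__ == "__main__":
    # Toy placeholder strings standing in for the known treatise and the background corpus.
    profiles = {
        "Petrus Zwicker": build_profile(["aaab aaab abab"]),
        "not-Zwicker": build_profile(["xyzz xyzz yzxy"]),
    }
    print(classify("abab aaab", profiles))
```

With real data, each redaction of the Refutatio errorum would play the role of the test text, and the same character n-gram features could be fed to a linear classifier instead of this similarity rule.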

The paper contributes to the conference theme History. It aims to bridge the gap between authorship attribution based on qualitative analysis (e.g. contents, manuscript tradition, codicological features, palaeography) and computational stylometry. Computational methods are treated as one tool that contributes to the difficult task of recognising authorship in a medieval text. The study of author profiles of four different redactions of a single work contributes to the discussions on scribes, secretaries and compilers as authors of medieval texts (e.g. Reiter 1996, Minnis 2006, Connolly 2011, Kwakkel 2012, De Gussem 2017).


Biller, Peter. “The Anti-Waldensian Treatise Cum Dormirent Homines of 1395 and its Author.” In The Waldenses, 1170-1530: Between a Religious Order and a Church, 237–69. Variorum Collected Studies Series. Aldershot: Ashgate, 2001.

Connolly, Margaret. “Compiling the Book.” In The Production of Books in England 1350-1500, edited by Alexandra Gillespie and Daniel Wakelin, 129–49. Cambridge Studies in Palaeography and Codicology 14. Cambridge ; New York: Cambridge University Press, 2011.

De Gussem, Jeroen. “Bernard of Clairvaux and Nicholas of Montiéramey: Tracing the Secretarial Trail with Computational Stylistics.” Speculum 92, no. S1 (2017): S190–225.

Kwakkel, Erik. “Late Medieval Text Collections. A Codicological Typology Based on Single-Author Manuscripts.” In Author Reader Book: Medieval Authorship in Theory and Practice, edited by Stephen Partridge and Erik Kwakkel, 56–79. Toronto: University of Toronto Press, 2012.

Reiter, Eric H. “The Reader as Author of the User-Produced Manuscript: Reading and Rewriting Popular Latin Theology in the Late Middle Ages.” Viator 27, no. 1 (1996): 151–70.

Minnis, A. J. “Nolens Auctor Sed Compilator Reputari: The Late-Medieval Discourse of Compilation.” In La Méthode Critique Au Moyen Âge, edited by Mireille Chazan and Gilbert Dahan, 47–63. Bibliothèque d’histoire Culturelle Du Moyen âge 3. Turnhout: Brepols, 2006.

Välimäki, Reima. “The Awakener of Sleeping Men. Inquisitor Petrus Zwicker, the Waldenses, and the Retheologisation of Heresy in Late Medieval Germany.” PhD Thesis, University of Turku, 2016.

