F-PIV-2: Digital History
Friday, 09/Mar/2018:
4:00pm - 5:30pm

Session Chair: Mikko Tolonen
Location: PIV

4:00pm - 4:30pm
Long Paper (20+10min)

Historical Networks and Identity Formation: Digital Representation of Statistical and Geo- Data to Mobilize Knowledge. Case Study of Norwegian Migration to the USA (1870-1920)

Jana Sverdljuk

National Library of Norway

The article is the result of a collaborative interdisciplinary workshop that brought together expertise from the social sciences, history and digital humanities. It shows how computer-mediated ways of researching the historical networks and identity formation of Norwegian-Americans substantially complement the methods of history and the social sciences. Using the open API of the National Archives of Norway, we combined statistical, geo- and text data to produce an interactive temporal visualization of migrants’ regional origins in Norway on a map of the USA. Spatial visualization allowed us to highlight space, time and changing regional belonging as fundamental values for understanding the social and cultural dimensions of migrants’ lives. We claim that data visualizations of space and time have performative materiality (Drucker 2013). They open a free space for researchers to come up with their own narrative about the studied phenomenon (Perez and Granger 2015). Visualizations make us reflect on the relationship between the phenomenon and its representation (Klein 2014). This digital method supplements classical sociological and socio-constructivist methods and therefore has knowledge-mobilizing effects. In the article, we show what potential this visualization has for the particular field of emigration studies when it enters into a dialogue with existing historical research in the field.

4:30pm - 4:45pm
Short Paper (10+5min)

Spheres of “public” in eighteenth-century Britain

Mark J. Hill¹, Antti Kanner¹, Jani Marjanen¹, Ville Vaara¹, Eetu Mäkelä¹, Leo Lahti², Mikko Tolonen¹

¹University of Helsinki; ²University of Turku

The eighteenth century saw a transformation in the practices of public discourse. With the emergence of clubs, associations, and, in particular, coffee houses, civic exchange intensified from the late seventeenth century. At the same time print media was transformed: book printing proliferated; new genres emerged (especially novels and small histories); works printed in smaller formats made reading more convenient (including in public); and periodicals, generally printed onto single folio half-sheets, emerged as a separate category of printed work written specifically for public consumption and with the intention of influencing public discourse (such periodicals were intended to be both ephemeral and shared, often read and then discussed publicly each day). This paper studies how these changes may be recognized in language by quantitatively studying the word “public” and its semantic context in Eighteenth Century Collections Online (ECCO).

While there are many descriptions of the transformation of public discourse (both contemporary and historical), there has been limited research into the language revolving (and evolving) around “public” in the eighteenth century. Jürgen Habermas (2003: 2-3) famously argues that the emergence of words such as “Öffentlichkeit” in German and “publicity” in English is indicative of a change in the public sphere more generally. The conceptual history of “Öffentlichkeit” has been studied further and in depth by Lucian Hölscher (1978), but a systematic study of the semantic context of “public” in British eighteenth-century material is missing. Studies that have covered this topic, such as Gunn (1989), base their findings on a very limited set of source material. In contrast, this study uses a large-scale digitized corpus to supplement earlier studies that focus on individual speech acts or particular collections of sources, and to provide a more comprehensive account of how the language of “public” changed in the eighteenth century.

The historical subject matter means that the study is based on the ECCO corpus. While ECCO is in many ways an invaluable resource, a key goal of this study is to be methodologically sound from the perspective of corpus linguistics and intellectual history, while developing insights which are relevant more generally to sociologists and historians. In this regard, ECCO comes with its own particular problems, both in terms of content and in terms of size.

With regard to content: OCR mistakes remain problematic; the corpus’s heterogeneity in genres can skew investigations; and the unpredictable distribution of duplicate texts introduced by numerous reprints of certain volumes must be taken into account. However, many of these problems can be mitigated in different ways. For example, in specific cases we compare findings with the much smaller ECCO TCP (an OCR-corrected subset of ECCO). We have further used the English Short Title Catalogue (ESTC) to connect textual findings with the relevant metadata contained in the catalogue. By merging ESTC metadata with ECCO, one can more easily use existing historical knowledge (for example, about reprints and multiple editions) to engage with the corpus.

With regard to size: the corpus itself is too big to run automatic parsers on. We have therefore extracted a separate, smaller corpus (with the help of ESTC metadata) for the more complex and demanding analyses. The results of these analyses were then replicated in a much simpler and cruder form on the whole dataset to gauge whether they corroborate the initial observations.

The size constraints provide their own advantages, however. The smaller subcorpus was chosen to represent pamphlets and other similar short documents by extracting all documents with fewer than 10406 characters. Compared to selecting by other specific genres or text types, this proved to be a successful way of defining a meaningful subcorpus, while at the same time limiting the effects of reprints and including a relatively large number of individual writers in the analysis. The subjects covered by pamphlets also tend to be quite historically topical, and, because the texts are short, inspecting single occurrences in their original context is much more efficient: main theme, context, and the writer’s intentions reveal themselves far more quickly than in larger works. Thus, issues around distant and close reading are more easily overcome. In addition, we are able to compare semantic change between the larger corpus and the more rapidly shifting topical and political debates found in pamphlets, which offers its own historical insights.
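
The length-based selection described above can be sketched as a simple filter. This is a minimal illustration only: the 10406-character threshold comes from the abstract, while the function name, the dictionary layout and the toy texts are assumptions made for the example and do not reflect the authors' actual pipeline.

```python
from typing import Dict

# Threshold taken from the abstract; everything else here is illustrative.
MAX_CHARS = 10406


def select_short_documents(documents: Dict[str, str],
                           max_chars: int = MAX_CHARS) -> Dict[str, str]:
    """Keep only documents whose text has fewer than max_chars characters."""
    return {doc_id: text for doc_id, text in documents.items()
            if len(text) < max_chars}


# Toy corpus with hypothetical document ids (not real ECCO records).
toy_corpus = {
    "pamphlet_a": "The public good requires a publick spirit. " * 10,
    "treatise_b": "A long treatise on government. " * 1000,
}
subcorpus = select_short_documents(toy_corpus)
print(sorted(subcorpus))
```

In practice the character count would presumably be computed on the OCR text of each ECCO document, so OCR noise may shift borderline documents across the threshold.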

In terms of specific linguistic approaches, the analysis started with examinations of the contextual distributions of “public” by year. Then, by changing the parameters of this analysis (for example, by defining the context as the set of syntactic dependencies in which “public” takes part, or as collocation structures of a wider lexical environment), different aspects of the use of “public” can be brought to the foreground.
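
One way to picture a contextual distribution by year is the following sketch, which counts the words occurring within a fixed window around the target word and normalizes the counts per year. The tokenizer, the function names, the window size and the toy sentences are all assumptions for illustration; the abstract does not specify how the distributions were computed.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def tokenize(text: str) -> List[str]:
    """Very crude tokenizer: lowercase alphabetic runs only."""
    return re.findall(r"[a-z]+", text.lower())


def context_distribution_by_year(docs: Iterable[Tuple[int, str]],
                                 target: str = "public",
                                 window: int = 2) -> Dict[int, Dict[str, float]]:
    """For each year, return the relative frequency of words seen within
    `window` tokens to the left or right of `target`."""
    counts: Dict[int, Counter] = defaultdict(Counter)
    for year, text in docs:
        tokens = tokenize(text)
        for i, token in enumerate(tokens):
            if token != target:
                continue
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            counts[year].update(left + right)
    return {year: {word: n / sum(c.values()) for word, n in c.items()}
            for year, c in counts.items()}


# Invented toy sentences, labelled with invented years.
toy_docs = [
    (1710, "The public worship of God."),
    (1710, "A public prayer."),
    (1790, "The public opinion of the nation."),
]
dist = context_distribution_by_year(toy_docs, window=1)
print(dist[1790])
```

Changing `window`, or replacing the window with dependency relations from a parser, corresponds to the change of parameters mentioned in the text.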

As syntactic constraints govern the possible combinations of words in shorter ranges of context, the narrower context windows contain a lot of syntactic information in addition to collocational information. Because of this syntactic restrictedness of close-range combinations, the semantic relatedness of words with similar short-range context distributions is one of degree of mutual interchangeability and, as such, of metaphorical relatedness (Heylen, Peirsman, Geeraerts and Speelman 2008). Wider context windows, such as paragraphs, are free from syntactic constraints, and so semantic relatedness between two words with similar wide-range context distributions carries information from frequent contiguity in context and can be described as more metonymical than metaphorical by nature, as is visible from applications based on term-document matrices, such as topic modelling or Latent Semantic Analysis (cf. Blei, Ng and Jordan (2003) and Dumais (2005)).
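
The contrast between narrow and wide context windows can be illustrated with a toy comparison of context vectors. The sketch below builds a co-occurrence vector for a word either from its immediate neighbours or from the whole paragraph, and compares two words by cosine similarity. All names and the example sentence are invented for the illustration, and raw counts with cosine similarity are an assumption, not necessarily the weighting used in the study.

```python
import math
import re
from collections import Counter
from typing import List, Optional


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z]+", text.lower())


def context_vector(paragraphs: List[str], target: str,
                   window: Optional[int]) -> Counter:
    """Count co-occurring words. window=None uses the whole paragraph;
    otherwise only `window` tokens on each side of the target."""
    vector: Counter = Counter()
    for paragraph in paragraphs:
        tokens = tokenize(paragraph)
        for i, token in enumerate(tokens):
            if token != target:
                continue
            if window is None:
                context = tokens[:i] + tokens[i + 1:]
            else:
                context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            vector.update(context)
    return vector


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


# Invented example paragraph.
paragraphs = ["The public credit supports trade and commerce."]

narrow = cosine(context_vector(paragraphs, "public", 1),
                context_vector(paragraphs, "trade", 1))
wide = cosine(context_vector(paragraphs, "public", None),
              context_vector(paragraphs, "trade", None))
print(narrow, wide)
```

In this toy case “public” and “trade” share no immediate neighbours, so the narrow-window similarity is zero (they are not interchangeable in position), whereas the paragraph-level similarity is high because the two words occur in the same surroundings.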

The syntactic dependencies were counted by analysing the pamphlet subcorpus using Stanford Lexical Parser (Cheng and Manning 2014). The results show changes in the tendency to use “public” as an adjective attribute and in compound positions. Since in English by far the most frequent position for both adjective attributes and compounding attributes is directly before the head word, this analysis could be adequately replicated using bigrams in the whole dataset. Lexical environments have been analysed by clustering second-order collocations (cf. Bertels and Speelman (2014)), and this analysis was replicated by using a random sample from the whole dataset to produce the second-order vectors.
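
The bigram replication rests on the observation that an attributive “public” directly precedes its head word. A minimal sketch of such a count, under the assumption of a crude regular-expression tokenizer and with invented example sentences, might look as follows; the function name is hypothetical.

```python
import re
from collections import Counter
from typing import Iterable


def target_bigrams(texts: Iterable[str], target: str = "public") -> Counter:
    """Count bigrams whose first word is `target`, approximating
    'target used as an attribute of the following head word'."""
    counts: Counter = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z]+", text.lower())
        # Pair every token with its successor and keep the pairs led by target.
        for first, second in zip(tokens, tokens[1:]):
            if first == target:
                counts[(first, second)] += 1
    return counts


# Invented example sentences.
toy_texts = [
    "Public opinion and the public good.",
    "The public opinion of the age.",
]
counts = target_bigrams(toy_texts)
print(counts.most_common())
```

Because it needs no parser, a count of this kind scales to the whole corpus, at the price of also catching cases where “public” is a noun followed by an unrelated word.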

The study of all bigrams relating to “public” (such as “public opinion”, “public finances”, “public religion”) in ECCO allows for a broader analysis of the use of “public” in eighteenth-century discourse, one that not only focuses on particular compounds but also gives a better idea of the domains in which “public” was used. It points towards a declining trend in the relative frequency of religious bigrams over the course of the eighteenth century and a rise in the relative frequency of secular bigrams, both political and economic. This allows us to present three arguments. First, it is argued that this is indicative of an overall shift in the language around “public” as the concept’s focus changed and it began to be used in new domains. This expansion of the discourses or domains in which “public” was used is confirmed in the analyses of a wider lexical environment. Second, we also notice that some collocations with “public”, such as “public opinion” and “public good”, gained a stronger rhetorical appeal. They became tropes in their own right and gained a future orientation in political discourse in the latter half of the eighteenth century (Koselleck 1972). Third, by combining the results of the distributional semantics of “public” in ECCO with information extracted from ESTC, one can recognize how different groups used the language relating to “public” in different ways. For example, authors writing on religious topics tended to use “public” differently from authors associated with the Enlightenment in Scotland or France.
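
A trend in the relative frequency of religious versus secular bigrams could be computed along the following lines. The mapping from head words to domains is a purely hypothetical, hand-made lexicon, and the observations are invented; the abstract does not say how bigrams were assigned to domains, so this is only a sketch of the arithmetic.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical lexicon assigning the head word of a "public X" bigram to a domain.
DOMAIN_OF_HEAD = {
    "worship": "religious",
    "prayer": "religious",
    "credit": "economic",
    "finances": "economic",
    "opinion": "political",
}


def domain_shares_by_decade(
        observations: Iterable[Tuple[int, str]]) -> Dict[int, Dict[str, float]]:
    """observations: (year, head word) pairs, one per 'public X' occurrence.
    Returns, per decade, each domain's share of all occurrences."""
    totals: Dict[int, Counter] = defaultdict(Counter)
    for year, head in observations:
        decade = (year // 10) * 10
        totals[decade][DOMAIN_OF_HEAD.get(head, "other")] += 1
    return {decade: {domain: n / sum(c.values()) for domain, n in c.items()}
            for decade, c in totals.items()}


# Invented observations, not drawn from ECCO.
toy_observations = [
    (1705, "worship"), (1708, "prayer"), (1709, "credit"),
    (1795, "opinion"), (1796, "credit"), (1797, "worship"), (1799, "finances"),
]
shares = domain_shares_by_decade(toy_observations)
print(shares[1700], shares[1790])
```

Using shares per decade rather than raw counts matters here, since the number of printed documents grows considerably over the century.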

There are two important upshots to this study: the methodological and the historical. With regard to the former, the paper works as a convincing case study which could be used as an example, or workflow, for studying other words that are pivotal to large structural change. With regard to the latter, the work is of particular historical relevance to recent discussions in eighteenth-century intellectual history. In particular, the study contributes to the critical discussion of Habermas that has been taking place in the English-speaking world since the translation of his Structural Transformation of the Public Sphere in 1989, while also informing more traditional historical analyses which have not been able to draw on tools from the digital humanities (Hill 2017).


4:45pm - 5:00pm
Short Paper (10+5min)

Charting the ’Culture’ of Cultural Treaties: Digital Humanities approaches to the history of international ideas

Benjamin G. Martin

Uppsala University

Cultural treaties are the bilateral or sometimes multilateral agreements among states that promote and regulate cooperation and exchange in the fields of life we call cultural or intellectual. Pioneered by France just after World War I, this type of treaty represents a distinctive technology of modern international relations, a tool in the toolkit of public diplomacy, a vector of “soft power.” One goal of a comparative examination of these treaties is to locate them in the history of public diplomacy and in the broader history of culture and power in the international arena. But these treaties can also serve as sources for the study of what the historian David Armitage has called “the intellectual history of the international.” In this project, I use digital humanities methods to approach cultural treaties as a historical source with which to explore the emergence of a global concept of culture in the twentieth century. Specifically, the project will investigate the hypothesis that the culture concept, in contrast to earlier ideas of civilization, played a key role in the consolidation of the post-World War II international order.

I approach the topic by charting how concepts of culture were given form in the system of international treaties between 1919 (when the first such treaty was signed) and 1972 (when UNESCO’s Convention on cultural heritage marked the “arrival” of a global embrace of the culture concept), studying them with the large-scale, quantitative methods of the digital humanities, as well as with the tools of textual and conceptual analysis associated with the study of intellectual history. In my paper for DH Nordic 2018, I will outline the topic, goals, and methods of the project, focusing on the ways we (that is, my colleagues at Umeå University’s HUMlab and I) seek to apply DH approaches to this study of global intellectual history.

The project uses computer-assisted quantitative analysis to analyze and visualize how cultural treaties contributed to the spread of cultural concepts and to the development of transnational cultural networks. We explore the source material offered by these treaties by approaching it as two distinct data sets. First, to chart the emergence of an international system of cultural treaties, we use quantitative analysis of the basic information, or “metadata” (countries, date, topic), from the complete set of treaties on cultural matters between 1919 and 1972, approximately 1250 documents. Our source for this information is the World Treaty Index. This data can also help identify historical patterns in the emergence of a global network of bilateral cultural treaties. Once mapped, these networks will allow me to pose interesting questions by comparing them to any number of other transnational systems. How, for example, does the map of cultural agreements compare to that of trade treaties, military alliances, or to the transnational flows of cultural goods, capital, or migrants?
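
As a rough illustration of how treaty metadata can be turned into a network, the sketch below builds an undirected edge list from (parties, year, topic) records and counts each state's treaty partners in a given period. The record layout, the function names and the placeholder party labels “A”, “B”, “C” are assumptions for the example; they do not reproduce the World Treaty Index schema or any real treaty.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Treaty:
    parties: Tuple[str, ...]  # signatory states
    year: int
    topic: str


def bilateral_edges(treaties: Iterable[Treaty], start: int, end: int) -> Counter:
    """Count bilateral treaties per unordered pair of states signed in
    the years start..end (inclusive). Multilateral treaties are skipped."""
    edges: Counter = Counter()
    for treaty in treaties:
        if start <= treaty.year <= end and len(treaty.parties) == 2:
            edges[tuple(sorted(treaty.parties))] += 1
    return edges


def partner_counts(edges: Counter) -> Counter:
    """Number of distinct treaty partners per state (node degree)."""
    degree: Counter = Counter()
    for state_a, state_b in edges:
        degree[state_a] += 1
        degree[state_b] += 1
    return degree


# Invented records with placeholder party labels.
toy_treaties = [
    Treaty(("B", "A"), 1920, "culture"),
    Treaty(("A", "B"), 1935, "culture"),
    Treaty(("A", "C"), 1950, "culture"),
    Treaty(("A", "B", "C"), 1960, "culture"),
]
edges = bilateral_edges(toy_treaties, 1919, 1972)
print(edges, partner_counts(edges))
```

Running the same functions over successive time slices would give a series of networks that could then be compared with, for instance, trade-treaty or alliance networks built in the same way.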

Second, to identify the development of concepts, we will observe the changing use of key terms through quantitative analysis of the treaty texts. By treating a large group of cultural treaties as several distinct text corpora and, perhaps, as a single text corpus, we will be able to explore the treaties using textometry and topic modeling. The treaty texts (digital versions of most of which can be found online) will be limited to four subsets: a) Britain, France, and Italy, 1919-1972; b) India, 1947-1972; c) the German Reich (1919-1945) and the two German successor states (1949-1972); and d) UNESCO’s multilateral conventions (1945-1972). This selection is designed to approach a global perspective while taking into account practical factors, such as language and accessibility. Our use of text analysis seeks (a) to offer insight into the changing usage and meanings of concepts like “culture” and “civilization”; (b) to identify which key areas of cultural activity were regulated by the treaties over time and by world region; and (c) to clarify whether “culture” was used in a broad, anthropological sense, or in a narrower sense to refer to the realm of arts, music, and literature. This aspect of the project raises interesting challenges, for example regarding how best to handle a multilingual text corpus (with texts in English, French, and German, at least).

In these ways, the project seeks to contribute to our understanding of how the concept of culture that guides today’s international society developed. It also explores how digital tools can help us ask (and eventually answer) questions in the field of global intellectual history.

5:00pm - 5:15pm
Short Paper (10+5min)

Facilitating Digital History in Finland: What can we learn from the past?

Mats Fridlund, Mila Oiva, Petri Paju

Aalto University

The paper discusses the findings of the project “From Roadmap to Roadshow: A collective demonstration & information project to strengthen Finnish digital history”. The project, funded by the Kone Foundation, develops the history disciplines in Finland through a collaborative effort. The paper proposed for DHN2018 will discuss what we have learned about the present-day conditions of digital history in Finland, how digital humanities is facilitated today in Finland and abroad, and what suggestions we could give for strengthening the conditions for doing digital history research in Finland.

In the first phase of the project we conducted a survey among Finnish historians and identified several critical issues that require further development: creating better, up-to-date information channels on digital history resources and events; providing relevant education, skills, and teaching by historians; and helping historians and information technology specialists to meet and collaborate better and more systematically than before. Many historians also had issues with the concept of digital history and difficulties identifying with it.

In order to situate Finnish digital history in its domestic and international contexts, we have studied the roots of computational history research in Finland, which date back to the 1960s, and current international best practice in digital history. We have visited selected digital humanities centers in Europe and the US which we identified as having “done something right”. Based on these studies, visits and interviews, we will propose steps to be taken to further strengthen the digital history research community in Finland.

