Conference Agenda

Digital Humanities in the Nordic Countries 3rd Conference

 
Date: Wednesday, 07/Mar/2018
10:00am - 12:30pm Registration
Lobby, Porthania, Yliopistonkatu 3
 
12:30pm - 2:00pm Lunch
Main Building of the University, entrance from the Senate Square side
 
2:00pm - 2:15pm Welcome
PI 
2:15pm - 3:30pm Plenary 1: Alan Liu
Session Chair: Mikko Tolonen
Open and Reproducible Workflows for the Digital Humanities – A 10,000 Meter Elevation View
PI 
3:30pm - 4:00pm Coffee break
Lobby, Porthania
 
4:00pm - 5:30pm W-PI-1: New Media
Session Chair: Bente Maegaard
PI 
 
4:00pm - 4:30pm
Long Paper (20+10min) [publication ready]

Skin Tone Emoji and Sentiment on Twitter

Steven Coats

University of Oulu

In 2015, the Unicode Consortium introduced five skin tone emoji that can be used in combination with emoji representing human figures and body parts. In this study, use of the skin tone emoji is analyzed geographically in a large sample of data from Twitter. It can be shown that values for the skin tone emoji by country correspond approximately to the skin tone of the resident populations, and that a negative correlation exists between tweet sentiment and darker skin tone at the global level. In an era of large-scale migrations and continued sensitivity to questions of skin color and race, understanding how new language elements such as skin tone emoji are used can help frame our understanding of how people represent themselves and others in terms of a salient personal appearance attribute.
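The abstract does not say how a per-country skin tone value was computed. A minimal sketch, assuming each of the five Unicode skin tone modifiers (U+1F3FB to U+1F3FF, the Fitzpatrick-based emoji modifiers) is mapped to an ordinal value from 1 (lightest) to 5 (darkest) and averaged per country; the 1 to 5 scale, the function names and the sample tweets are illustrative assumptions, and the sentiment correlation is not reproduced here.

```python
from collections import defaultdict

# Unicode emoji skin tone modifiers (Fitzpatrick scale). Mapping them to
# ordinal values 1..5 is an assumption made for this illustration.
SKIN_TONE_VALUES = {
    "\U0001F3FB": 1,  # EMOJI MODIFIER FITZPATRICK TYPE-1-2
    "\U0001F3FC": 2,  # EMOJI MODIFIER FITZPATRICK TYPE-3
    "\U0001F3FD": 3,  # EMOJI MODIFIER FITZPATRICK TYPE-4
    "\U0001F3FE": 4,  # EMOJI MODIFIER FITZPATRICK TYPE-5
    "\U0001F3FF": 5,  # EMOJI MODIFIER FITZPATRICK TYPE-6
}


def skin_tone_values(text):
    """Return the ordinal value of every skin tone modifier in a tweet."""
    return [SKIN_TONE_VALUES[ch] for ch in text if ch in SKIN_TONE_VALUES]


def mean_tone_by_country(tweets):
    """Average skin tone value per country from (country, text) pairs.

    Countries whose tweets contain no skin tone modifiers are omitted.
    """
    totals = defaultdict(lambda: [0, 0])  # country -> [sum, count]
    for country, text in tweets:
        for value in skin_tone_values(text):
            totals[country][0] += value
            totals[country][1] += 1
    return {country: s / n for country, (s, n) in totals.items()}


sample = [
    ("FI", "hello \U0001F44B\U0001F3FB"),      # waving hand, type 1-2
    ("FI", "thumbs up \U0001F44D\U0001F3FC"),  # thumbs up, type 3
    ("NG", "\U0001F44D\U0001F3FF great"),      # thumbs up, type 6
]
print(mean_tone_by_country(sample))  # {'FI': 1.5, 'NG': 5.0}
```

Because a modifier is a separate code point following the base emoji, scanning characters is enough to find it; a real pipeline would also need to handle geolocation and tweets mixing several tones.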

Coats-Skin Tone Emoji and Sentiment on Twitter-238_a.pdf
Coats-Skin Tone Emoji and Sentiment on Twitter-238_c.pdf

4:30pm - 4:45pm
Distinguished Short Paper (10+5min) [abstract]

“Memes” as a Cultural Software in the Context of the (Fake) Wall between the US and Mexico

Martin Camps

University of the Pacific

Memes function as “digital graffiti” in the streets of social media, a cultural electronic product that satirizes those in power. The success of a meme is measured by its “going viral” and being reproduced like a germ. I am interested in analyzing these ekphrastic texts in the context of the construction of the wall between the US and Mexico. I examine popular memes in Mexico and in the US from both sides of the political spectrum. I believe these “political haikus” work as an escape valve for the tensions generated in the culture wars that consume American politics. The border is an “open wound” (as the Mexican writer Carlos Fuentes said) that was opened after the War of 1847, which resulted in Mexico losing half of its territory. Now the wall functions as a political membrane to exclude the “expelled citizens” of the Global South from the economic benefits of the North. Memes help to expunge the gravity of a two-thousand-mile concrete wall in a region that shares cultural traits, languages, and an environment that cannot be domesticated with monuments to hatred. Memes are rhetorical devices that convey the absurdity of a situation, as in one example showing a border wall with an enormous “piñata” that infantilizes the State-funded project of a fence. The meme’s iconoclastography sets in motion a discussion of the real issues at hand: global economic disparities and the human right to migrate on this small planet of ours.

Camps-“Memes” as a Cultural Software in the Context-168_a.pdf
Camps-“Memes” as a Cultural Software in the Context-168_c.pdf

4:45pm - 5:00pm
Short Paper (10+5min) [publication ready]

A Mixed Methods Analysis of Local Facebook Groups in Helsinki

Matti Autio

Freelancer

In Helsinki, the largest city of Finland, local Facebook groups have become increasingly popular. Communities in Helsinki have developed a virtual existence as a part of everyday life. More than 50 discussion groups exist that are tied to a specific residential district. Local flea market groups are even more common. The membership of local Facebook groups totals at least 300 000 in a city of little more than half a million. The content of discussion groups was studied using a mixed methods approach. The qualitative results give a typology of local Facebook groups and an insight into the recurring topics of posts. The quantitative study reveals significant differences in the amount of local social control exerted through the Facebook groups. The discussion groups are used for social control more prominently in areas with a lot of detached housing. In high-rise districts the new networks are used mainly for socially cohesive cooperation. The social cohesion and control exercised through local Facebook groups strengthen the community’s collective efficacy.

Autio-A Mixed Methods Analysis of Local Facebook Groups-257_a.pdf

5:00pm - 5:15pm
Short Paper (10+5min) [publication ready]

Medicine Radar – Discovering How People Discuss Their Health

Krista Lagus1, Minna Ruckenstein2, Atte Juvonen3, Chang Rajani3

1Faculty of Social Sciences, University of Helsinki, Finland; 2Consumer Society Research Centre, University of Helsinki, Finland; 3Futurice Oy

In order to understand how people discuss their health and their use of medicines online, we studied the Health section of the Suomi24 discussion forum. Based on an analysis of 19 million comments containing 200 million words, a concept vocabulary of medicines and symptoms was derived from colloquial discussions using a mixture of unsupervised learning, human input and linguistic analyses. We present the method and a tool for browsing the health discussions and the relations between medicines, symptoms and dosages.
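Once a vocabulary of medicines and symptoms exists, one simple way to surface their relations is to count how often a medicine and a symptom are mentioned in the same comment. A minimal sketch, assuming the hypothetical toy vocabularies and English sample comments below; it is not the project's method, whose vocabulary was derived from Finnish colloquial text with unsupervised learning and human input.

```python
import re
from collections import Counter
from itertools import product

# Hypothetical toy vocabularies standing in for the derived concept vocabulary.
MEDICINES = {"ibuprofen", "paracetamol"}
SYMPTOMS = {"headache", "fever", "nausea"}


def cooccurrences(comments):
    """Count (medicine, symptom) pairs mentioned in the same comment."""
    counts = Counter()
    for comment in comments:
        tokens = set(re.findall(r"\w+", comment.lower()))
        meds = sorted(tokens & MEDICINES)
        syms = sorted(tokens & SYMPTOMS)
        # Every medicine in the comment is paired with every symptom in it.
        counts.update(product(meds, syms))
    return counts


comments = [
    "Ibuprofen helped my headache",
    "Paracetamol for fever and headache",
    "ibuprofen, headache again",
]
print(cooccurrences(comments).most_common(1))
# [(('ibuprofen', 'headache'), 2)]
```

Comment-level co-occurrence is deliberately coarse; inflected Finnish word forms would first need lemmatisation for exact token matching like this to work.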

Lagus-Medicine Radar – Discovering How People Discuss Their Health-273_a.pdf

5:15pm - 5:30pm
Short Paper (10+5min) [publication ready]

(Re)Branching Narrativity: Virtual Space Experience in Twitch

Ilgin Kizilgunesler

University of Manitoba, Canada

Twitch, as an online platform for gamers, has been analyzed in terms of its commercial benefits for the increase of game sales and its role in bringing fame to streamers. By focusing on Twitch’s interactive capacity, this paper compares this platform to narrative games, playable stories, and mobile narratives in terms of the role of the user(s) and their virtual space experience. Drawing on theories by Marie-Laure Ryan in “From Narrative Games to Playable Stories: Toward a Poetics of Interactive Narrative” and Rita Raley in “Walk This Way: Mobile Narrative as Composed Experience”, the paper argues that Twitch assigns authorial roles to the users (i.e., the streamers and the subscribers), who branch the existing narrative of the game by collectively determining the path of the setting. In doing so, the paper proposes Twitch as a space that extends the immersion discussed around such interactive forms (i.e., narrative games, playable stories, and mobile narratives).

Kizilgunesler-(Re)Branching Narrativity-149_a.pdf
Kizilgunesler-(Re)Branching Narrativity-149_c.pdf
 
4:00pm - 5:30pm W-PII-1: Historical Texts
Session Chair: Asko Nivala
PII 
 
4:00pm - 4:30pm
Long Paper (20+10min) [abstract]

Diplomatarium Fennicum and the digital research infrastructures for medieval studies

Seppo Eskola, Lauri Leinonen

National Archives of Finland

Digital infrastructures for medieval studies have made great strides in Finland over the last few years. Most literary sources concerning medieval Finland – the Diocese of Åbo – are now available online in one form or another: Diplomatarium Fennicum encompasses nearly 7 000 documentary sources, the Codices Fennici project recently digitized over 200 mostly well-preserved pre-17th-century codices and placed them online, and Fragmenta Membranea contains digital images of 9 300 manuscript leaves belonging to over 1 500 fragmentary manuscripts. In terms of the availability of sources, the preconditions for research have never been better. So, what’s next?

This presentation discusses the current state of digital infrastructures for medieval studies and their future possibilities. For the past two and a half years the presenters have been working on the Diplomatarium Fennicum webservice, published in November 2017, and the topic is approached from this background. Digital infrastructures are being developed on many fronts in Finland: several memory institutions are actively engaged (the three above-mentioned webservices are developed and hosted by the National Archives, the Finnish Literature Society, and the National Library respectively), and many universities have active medieval studies programs with an interest in digital humanities. Furthermore, interest in Finnish digital infrastructures is not restricted to Finland, as Finnish sources are closely linked to those of other Nordic countries and the Baltic Sea region in general. In our presentation, we will compare the different Finnish projects, highlight opportunities for international co-operation, and discuss choices (e.g. selecting metadata models) that could best support collaboration between different services and projects.

Eskola-Diplomatarium Fennicum and the digital research infrastructures-251_a.pdf
Eskola-Diplomatarium Fennicum and the digital research infrastructures-251_c.pdf

4:30pm - 4:45pm
Short Paper (10+5min) [publication ready]

The HistCorp Collection of Historical Corpora and Resources

Eva Pettersson, Beáta Megyesi

Uppsala University

We present the HistCorp collection, a freely available open platform aiming at the distribution of a wide range of historical corpora and other useful resources and tools for researchers and scholars interested in the study of historical texts. The platform contains a monitoring corpus of historical texts from various time periods and genres for 14 European languages. The collection is taken from well-documented historical corpora, and distributed in a uniform, standardised format. The texts are downloadable as plaintext, and in a tokenised format. Furthermore, some texts are normalised with regard to spelling, and some are annotated with part-of-speech and syntactic structure. In addition, preconfigured language models and spelling normalisation tools are provided to allow the study of historical languages.

Pettersson-The HistCorp Collection of Historical Corpora and Resources-135_a.pdf
Pettersson-The HistCorp Collection of Historical Corpora and Resources-135_c.pdf

4:45pm - 5:00pm
Short Paper (10+5min) [publication ready]

Semantic National Biography of Finland

Eero Hyvönen1,2, Petri Leskinen1, Minna Tamper1,2, Jouni Tuominen2,1, Kirsi Keravuori3

1Aalto University; 2University of Helsinki (HELDIG); 3Finnish Literature Society (SKS)

This paper presents the idea and project of transforming and using the textual biographies of the National Biography of Finland, published by the Finnish Literature Society, as Linked (Open) Data. The idea is to publish the lives as semantic, i.e., machine “understandable” metadata in a SPARQL endpoint in the Linked Data Finland (LDF.fi) service, on top of which various Digital Humanities applications are built. The applications include searching and studying individual personal histories as well as historical research of groups of persons using methods of prosopography. The basic biographical data is enriched by extracting events from unstructured texts and by linking entities internally and to external data sources. A faceted semantic search engine is provided for filtering groups of people from the data for research in Digital Humanities. An extension of the event-based CIDOC CRM ontology is used as the underlying data model, where lives are seen as chains of interlinked events populated from the data of the biographies and additional sources, such as museum collections, library databases, and archives.

Hyvönen-Semantic National Biography of Finland-203_a.pdf
Hyvönen-Semantic National Biography of Finland-203_c.pdf

5:00pm - 5:15pm
Short Paper (10+5min) [abstract]

Creating a corpus of communal court minute books: a challenge for digital humanities

Maarja-Liisa Pilvik1, Gerth Jaanimäe1, Liina Lindström1, Kadri Muischnek1, Kersti Lust2

1University of Tartu, Estonia; 2The National Archives of Estonia, Estonia

This paper presents the work of a digital humanities project concerned with the digitization of Estonian communal court minute books. The local communal courts in Estonia came into being through the peasant laws of the early 19th century and were class-specific courts of first instance that tried peasants. Rather than being merely judicial institutions, the communal courts were at first institutions for the self-government of peasants, since they also dealt with police and administrative matters. After the municipal reform of 1866, however, the communal courts were emancipated from the tutelage of the nobility, and the court became a strictly judicial institution that tried peasants for minor offences and settled their civil disputes, claims and family matters. The communal courts in their earlier form ceased to exist in 1918, when Estonia became independent from Russian rule.

The National Archives of Estonia holds almost 400 archives of communal courts from the pre-independence period. They have been preserved very unevenly, and not all of them include minute books. The minute books themselves are also written inconsistently: the earlier minute books are often written in German, and the writing is strongly dependent on the skills and will of the parish clerk. However, the materials from the period starting with the year 1866, when the creation of the minute books became more systematic, are a massive and rich source shedding light on the everyday lives of the peasantry. Still, users of the minute books currently face serious difficulties in finding relevant information, since there are no indexes and one has to go through all the materials manually. The minute books are also a fascinating resource for linguists, both dialectologists and computational linguists: the books contain regional varieties tied to a specific genre and an early time period (making it possible to detect linguistic expressions that are rare, for example, in atlases and in the dialect corpus, which represents language from about 100 years later), while also being a written resource that reflects the writing traditions of the old spelling system. This is also what makes these texts complex and challenging for automatic analysis methods, which are otherwise quite well established in contemporary corpus linguistics.

In our talk we present a project dealing with the digitization and analysis of the minute books from the period between 1866 and 1890. The texts were first digitized in the 2000s and stored on a server in HTML format, which is good for viewing but less suitable for automatic processing. After the server crashed, the texts were rescued via web archives, and the structure of the minute books was used to convert the documents automatically into a more functional format using XML markup, separating the body text with tags for the titles, dates, indexes, participants, content and topical keywords, which indicate the purview of the communal courts in that period.
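The conversion step can be pictured as wrapping the fields recovered from each HTML page in XML elements. A minimal sketch using Python's standard `xml.etree.ElementTree`, assuming the fields have already been extracted into a dictionary; the element names, the `entry_to_xml` helper and the sample values are hypothetical, since the abstract does not describe the project's actual schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names mirroring the kinds of information listed in
# the abstract; the project's real XML schema may differ.
FIELDS = ("title", "date", "index", "participants", "content", "keywords")


def entry_to_xml(entry):
    """Wrap the fields of one minute-book entry in XML elements.

    `entry` maps field names to strings. Missing fields are skipped, so
    unevenly preserved entries still convert.
    """
    root = ET.Element("entry")
    for field in FIELDS:
        if field in entry:
            ET.SubElement(root, field).text = entry[field]
    return ET.tostring(root, encoding="unicode")


# Invented sample values, for illustration only.
print(entry_to_xml({"title": "Protocol No. 12", "date": "1868-03-14"}))
# <entry><title>Protocol No. 12</title><date>1868-03-14</date></entry>
```

ElementTree escapes characters such as `&` and `<` automatically, which matters when the source text was originally scraped from HTML.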

We discuss the workflow of creating a digital resource in a standardized and maximally functional format, as well as challenges such as automatic text processing for cleaning and annotating the corpus in order to distinguish the relevant layers of information. In order to enable queries with different degrees of specificity in the corpus, the texts also need to be linguistically analyzed. For both named entity recognition (NER), which enables network analysis and links the events described in the materials to geospatial locations, and morphological annotation, which makes it possible to perform queries based on lemmas or grammatical information, we have applied the Estnltk library in Python, which is developed for contemporary written standard Estonian. For NER, its performance was satisfactory, i.e. it recognized names well, even though it systematically overrecognized organization names. The most complicated issue so far is the morphological analysis and disambiguation of word forms. Tools developed for Estonian morphological analysis, such as Estnltk or Vabamorf, are trained on contemporary written standard Estonian. Communal court minute books, however, include language variants that are a mixture of dialectal language, inconsistent spelling and the old spelling system. In the presentation, we introduce the results of our first attempts to apply Estnltk tools to the materials of communal court minute books, the problems that we have run into, and possible solutions for overcoming these problems.

The final aim of the project is to create a multifunctional source that could be of interest to researchers in different fields of the humanities. As the National Archives hold a considerable number of communal court minute books that so far exist only in scanned form, the digitized minute book collection is planned to be expanded through crowdsourcing.

References:

Estnltk. Open source tools for Estonian natural language processing; https://estnltk.github.io/estnltk/1.2/#.

Vabamorf. Eesti keele morfanalüsaator [‘The morphological analyzer of Estonian’]; https://github.com/Filosoft/vabamorf.

Pilvik-Creating a corpus of communal court minute books-247_a.pdf

5:15pm - 5:30pm
Distinguished Short Paper (10+5min) [publication ready]

FSvReader – Exploring Old Swedish Cultural Heritage Texts

Yvonne Adesam, Malin Ahlberg, Gerlof Bouma

University of Gothenburg

This paper describes FSvReader, a tool for easier access to Old Swedish (13th–16th century) texts. Through automatic fuzzy linking of words in a text to a dictionary describing the language of the time, the reader has direct access to dictionary pop-up definitions, in spite of the large amount of graphical and morphological variation. The linked dictionary entries can also be used for simple searches in the text, highlighting possible further instances of the same entry.
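Fuzzy linking of a spelling variant to a dictionary headword can be approximated with string similarity. A minimal sketch using the standard library's `difflib.get_close_matches`, which is a swapped-in technique, not necessarily what FSvReader uses; the toy headword list, the `link_word` helper and the 0.6 cutoff are assumptions for illustration.

```python
import difflib

# Hypothetical toy headword list; FSvReader links to a full Old Swedish
# dictionary, and its actual matching method may differ.
HEADWORDS = ["konung", "land", "hus", "kirkia"]


def link_word(form, headwords=HEADWORDS, cutoff=0.6):
    """Return the most similar headword for a spelling variant, or None.

    Similarity is difflib's ratio 2*M/T, where M is the number of matching
    characters and T the combined length of both strings.
    """
    matches = difflib.get_close_matches(form.lower(), headwords, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(link_word("konunger"))  # konung
print(link_word("Hws"))       # hus
print(link_word("xyz"))       # None
```

Pure character similarity ignores morphology, so a real tool would likely combine it with knowledge of inflectional endings and known graphical alternations such as w for u.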

Adesam-FSvReader – Exploring Old Swedish Cultural Heritage Texts-199_a.pdf
 
4:00pm - 5:30pm W-PIII-1: Computational Linguistics 1
Session Chair: Lars Borin
PIII 
 
4:00pm - 4:30pm
Long Paper (20+10min) [abstract]

Dialects of Discord. Using word embeddings to analyze preferred vocabularies in a political debate: nuclear weapons in the Netherlands 1970-1990

Ralf Futselaar, Milan van Lange

NIOD, Institute for War-, Holocaust-, and Genocide Studies

We analyze the debate about the placement of nuclear-enabled cruise missiles in the Netherlands during the 1970s and 1980s. The NATO “double-track decision” of 1979 envisioned the placement of these weapons in the Netherlands, to which the Dutch government eventually agreed in 1985. In the early 1980s, the controversy regarding placement or non-placement of these missiles led to the greatest popular protests in Dutch history and to a long and often bitter political controversy. After 1985, owing to declining tensions between the Soviet Bloc and NATO, the cruise missiles were never stationed in the Netherlands. Much older nuclear warheads, in the country since the early 1960s, remain there to this day.

We are using word embeddings to analyze this particularly bipolar debate in the proceedings of the Dutch lower and upper house of Parliament. The official political positions, as expressed in party manifestos and voting behavior inside parliament, were stable throughout this period. We demonstrate that in spite of this apparent stability, the vocabularies used by representatives of different political parties changed significantly through time.

Using the word2vec algorithm, we have created a combined vector including all synonyms and near-synonyms of “nuclear weapon” used in the proceedings of both houses of parliament during the period under scrutiny. Based on this combined vector, and again using word2vec, we have identified the nearest neighbors of words used to describe nuclear weapons. Where relevant, these terms have been manually classified into terms associated with a pro-proliferation or an anti-proliferation viewpoint, for example “defense” and “disarmament” respectively.
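The combined-vector step can be illustrated by averaging the vectors of the synonyms and ranking the remaining vocabulary by cosine similarity to that average. A minimal sketch in plain Python, assuming word vectors already exist (in the study they came from word2vec trained on the parliamentary proceedings); the three-dimensional toy vectors, English placeholder words and helper names are invented for illustration.

```python
import math


def mean_vector(vectors):
    """Element-wise average of equally long vectors (the combined vector)."""
    n = len(vectors)
    return [sum(components) / n for components in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def nearest_neighbours(query, embeddings, k=2, exclude=()):
    """Return the k words whose vectors are most similar to the query."""
    scored = [(cosine(query, vec), word)
              for word, vec in embeddings.items() if word not in exclude]
    scored.sort(reverse=True)
    return [word for _, word in scored[:k]]


# Invented toy vectors; real word2vec vectors have hundreds of dimensions.
embeddings = {
    "nuclear_weapon": [1.0, 0.0, 0.0],
    "atomic_weapon": [0.8, 0.2, 0.0],
    "defense": [0.6, 0.0, 0.8],
    "disarmament": [0.5, 0.5, 0.0],
    "agriculture": [0.0, 0.0, 1.0],
}
synonyms = ["nuclear_weapon", "atomic_weapon"]
combined = mean_vector([embeddings[w] for w in synonyms])
print(nearest_neighbours(combined, embeddings, k=2, exclude=set(synonyms)))
# ['disarmament', 'defense']
```

Excluding the synonyms themselves keeps the neighbour list focused on the surrounding vocabulary, which is what the manual pro- and anti-proliferation classification then operates on.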

Obviously, representatives of all Dutch political parties used words from both categories in parliamentary debates. At any given time, however, we demonstrate that different political parties had clear preferences in terms of vocabulary. In the “discursive space” created by the binary opposition between pro- and anti-proliferation words, political parties can be shown to have specific and distinct ways of discussing nuclear weapons.

Using this framework, we have analyzed the changing vocabularies of different political parties. This allows us to show that, while stated policy positions and voting behavior remained unchanged, the language used to discuss nuclear weapons shifted strongly towards anti-proliferation terminology. We have also been able to show that this change happened at different times for different political parties. We speculate that these changes resulted from perceived changes of opinion among the target electorates of different parties, as well as the changing geopolitical climate of the mid-to-late 1980s, when nuclear non-proliferation became a more widely shared policy objective.

In the conclusion of this paper, we show that word embedding models offer a methodology to investigate shifting political attitudes outside of, and in addition to, stated opinions and voting patterns.

Futselaar-Dialects of Discord Using word embeddings to analyze preferred vocabularies-112_a.pdf
Futselaar-Dialects of Discord Using word embeddings to analyze preferred vocabularies-112_c.pdf

4:30pm - 4:45pm
Distinguished Short Paper (10+5min) [publication ready]

Emerging Language Spaces Learned From Massively Multilingual Corpora

Jörg Tiedemann

University of Helsinki

Translations capture important information about languages that can be used as implicit supervision in learning linguistic properties and semantic representations. Translated texts are semantic mirrors of the original text, and the significant variations that we can observe across languages can be used to disambiguate the meaning of a given expression using the linguistic signal that is grounded in translation. Parallel corpora consisting of massive amounts of human translations with a large linguistic variation can be used to increase abstractions, and we propose the use of highly multilingual machine translation models to find language-independent meaning representations. Our initial experiments show that neural machine translation models can indeed learn in such a setup, and we can show that the learning algorithm picks up information about the relation between languages in order to optimize transfer learning with shared parameters. The model creates a continuous language space that represents relationships in terms of geometric distances, which we can visualize to illustrate how languages cluster according to language families and groups. With this, we can see a development in the direction of data-driven typology -- a promising approach to empirical cross-linguistic research in the future.

Tiedemann-Emerging Language Spaces Learned From Massively Multilingual Corpora-176_a.pdf
Tiedemann-Emerging Language Spaces Learned From Massively Multilingual Corpora-176_c.pdf

4:45pm - 5:15pm
Long Paper (20+10min) [publication ready]

Digital cultural heritage and revitalization of endangered Finno-Ugric languages

Anisia Katinskaia, Roman Yangarber

University of Helsinki, Department of Computer Science

Preservation of linguistic diversity has long been recognized as a crucial, integral part of supporting our cultural heritage. Yet many “minority” languages (those lacking official state status) are in decline, many severely endangered. We present a prototype system aimed at “heritage” speakers of endangered Finno-Ugric languages. Heritage speakers are people who have heard the language used by the older generations while they were growing up, and possess a considerable passive competency (well beyond the “beginner” level), but are lacking in active fluency. Our system is based on natural language processing and artificial intelligence. It assists the learners by allowing them to use arbitrary texts of their choice, and by creating exercises that require them to engage in active production of language, rather than in passive memorization of material. Continuous automatic assessment helps guide the learner toward improved fluency. We believe that providing such AI-based tools will help bring these languages to the forefront of the modern digital age, raise their prestige, and encourage the younger generations to become involved in reversing their decline.

Katinskaia-Digital cultural heritage and revitalization of endangered Finno-Ugric languages-228_a.pdf

5:15pm - 5:30pm
Short Paper (10+5min) [publication ready]

The Fractal Structure of Language: Digital Automatic Phonetic Analysis

William A Kretzschmar Jr

University of Georgia

In previous study of the Linguistic Atlas data from the Middle and South Atlantic States (e.g. Kretzschmar 2009, 2015), it has been shown that the frequency profiles of variant lexical responses to the same cue are all patterned in nonlinear A-curves. Moreover, these frequency profiles are scale-free, in that the same A-curve patterns occur at every level of scale. In this paper, I will present results from a new study of Southern American English that, when completed, will include over one million vowel measurements from interviews with a sample of sixty-four speakers across the South. Our digital methods, an adaptation of the DARLA and FAVE tools for forced alignment and automatic formant extraction, prove that speech outside of the laboratory or controlled settings can be processed by automatic means on a large scale. Measurements in F1/F2 space are analyzed using point-pattern analysis, a technique for spatial data, which allows for creation and comparison of results without assumptions of central tendency. This Big Data resource allows us to see the fractal structure of language more completely. Not only do A-curve patterns describe the frequency profiles of lexical and IPA tokens, but they also describe the distribution of measurements of vowels in F1/F2 space, for groups of speakers, for individual speakers, and even for separate environments in which vowels occur. These findings are highly significant for how linguists make generalizations about phonetic data. They challenge the boundaries that linguists have traditionally drawn, whether geographic, social, or phonological, and demand that we use a new model for understanding language variation.
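A frequency profile of the kind described here is the list of variant responses to one cue, ranked from most to least frequent with their relative frequencies; plotted against rank, a few very common variants followed by a long tail of rare ones gives the A-curve shape. A minimal sketch, assuming the responses are available as a list of strings; the `frequency_profile` helper and the generic variant labels and counts are invented for illustration.

```python
from collections import Counter


def frequency_profile(responses):
    """Rank-ordered (variant, relative frequency) pairs for one cue."""
    counts = Counter(responses)
    total = sum(counts.values())
    # most_common() sorts variants from most to least frequent.
    return [(variant, n / total) for variant, n in counts.most_common()]


# Invented toy responses: one dominant variant and a tail of rarer ones.
responses = ["variant A"] * 6 + ["variant B"] * 3 + ["variant C"]
print(frequency_profile(responses))
# [('variant A', 0.6), ('variant B', 0.3), ('variant C', 0.1)]
```

The same ranking can be applied at any level of scale (the whole survey, a region, a single speaker), which is how the scale-free claim in the abstract would be checked.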

Kretzschmar Jr-The Fractal Structure of Language-108_a.pdf
Kretzschmar Jr-The Fractal Structure of Language-108_c.pdf
 
4:00pm - 5:30pm W-PIV-1: Infrastructure and Support
Session Chair: Tanja Säily
PIV 
 
4:00pm - 4:30pm
Long Paper (20+10min) [publication ready]

Towards an Open Science Infrastructure for the Digital Humanities: The Case of CLARIN

Koenraad De Smedt1, Franciska De Jong2, Bente Maegaard3, Darja Fišer4, Dieter Van Uytvanck2

1University of Bergen, Norway; 2CLARIN ERIC, The Netherlands; 3University of Copenhagen, Denmark; 4University of Ljubljana and Jožef Stefan Institute, Slovenia

CLARIN is the European research infrastructure for language resources. It is a sustainable home for digital research data in the humanities, and it also offers tools and services for annotation, analysis and modeling. The scope and structure of CLARIN enable a wide range of studies and approaches, including comparative studies across regions, periods, languages and cultures. CLARIN does not see itself as a stand-alone facility, but rather as a player in making the vision that underlies the emerging European policies towards Open Science a reality, interconnecting researchers across national and discipline borders by offering seamless access to data and services in line with the FAIR data principles. CLARIN also aims to contribute to responsible data science through the design as well as the governance of its infrastructure, and to achieve an appropriate and transparent division of responsibilities between data providers, technical centres, and end users. CLARIN offers training towards digital scholarship for humanities scholars and aims at increased uptake from this audience.

De Smedt-Towards an Open Science Infrastructure for the Digital Humanities-249_a.pdf
De Smedt-Towards an Open Science Infrastructure for the Digital Humanities-249_c.pdf

4:30pm - 4:45pm
Short Paper (10+5min) [abstract]

The big challenge of data! Managing digital resources and infrastructures for digital humanities researchers

Isto Huvila

Uppsala University

Digital humanities research is dependent on the development and seizing of appropriate digital methods and technologies, collection and digitisation of data, and development of relevant and practicable research questions. In the long run, however, the potential of the field to sustain itself as a significant social intellectual movement (or, in Kuhnian terms, paradigm) is conditional on the sustainability of the scholarly practices in the field. Digital humanities research has already moved from early methodological experiments to the systematic development of research infrastructures. These efforts are based both on the explicit needs to develop new resources for digital humanities research and on the strategic initiatives of the keepers of relevant existing collections and datasets to open up their holdings for users. Harmonisation and interoperability of the evolving infrastructures are at different stages of development both nationally and internationally, but in spite of the large number of practical difficulties, the various national, European (e.g. DARIAH, CLARIN and ARIADNE) and international initiatives are making progress in this respect. The sustainability of digital infrastructures is another issue that has been scrutinised and addressed both in theory and practice under the auspices of national data archives, specialist organisations like the British Digital Curation Centre, and international discussions, for instance within the iPRES conference community. However, an aspect of the management of the infrastructures that has received relatively little attention so far is management for use. We lack a comprehensive understanding of how the emerging digital data and infrastructures are used and could be used, and consequently of how the emanating resources should be managed to be useful for digital humanities research, not only in the context within which they were developed but also for other researchers and, in many cases, users outside academia.

This paper discusses the processes and competences for the management of digital humanities resources and infrastructures for (theoretically) maximising their current and future usefulness for the purposes of research. On the basis of empirical work on archaeological research data in the context of the Swedish Archaeological Information in the Digital Society (ARKDIS) research project (Huvila, 2014) and a comparative study with selected digital infrastructures in other branches of humanities research, a model of use-oriented management of research data with central processes and competences is presented. The suggested approach complements existing digital curation and management models by opening up the user-side processes of digital humanities data resources and their implications for the functioning, development and management of appropriate research infrastructures. Theoretically, the approach draws on the records continuum theory (as formulated by Upward and colleagues (e.g. Upward, 1996, 1997, 2000; McKemmish, 2001)) and Pickering’s notion of the mangle of practice (Pickering, 1995), developed in the context of the social studies of science. The model demonstrates the significance of being sensitive not only to the explicit wants and needs of the researchers (users) but also to the implicit, often tacit requirements that emerge from their practical research work. Simultaneously, the findings emphasise the need for a meta-competence to manage the data and provide appropriate services for its users.

References

Huvila, I. (Ed.) (2014). Perspectives to Archaeological Information in the Digital Society. Uppsala: Department of ALM, Uppsala University. URL http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-240334

McKemmish, S. (2001). Placing Records Continuum Theory and Practice. Archival Science, 1(4), 333–359. URL http://dx.doi.org/10.1023/A:1016024413538

Pickering, A. (1995). The Mangle of Practice: Time, Agency, and Science. Chicago: University of Chicago Press.

Upward, F. (1996). Structuring the Records Continuum, Part One: Postcustodial Principles and Properties. Archives and Manuscripts, 24(2), 268–285.

Upward, F. (1997). Structuring the Records Continuum, Part Two: Structuration Theory and Recordkeeping. Archives and Manuscripts, 25(1), 10–35.

Upward, F. (2000). Modelling the continuum as paradigm shift in recordkeeping and archiving processes, and beyond – a personal reflection. Records Management Journal, 10(3), 115–139.

Huvila-The big challenge of data! Managing digital resources and infrastructures-104_a.pdf
Huvila-The big challenge of data! Managing digital resources and infrastructures-104_c.pdf

4:45pm - 5:00pm
Short Paper (10+5min) [abstract]

Research in Nordic literary collections: What is possible and what is relevant?

Mads Rosendahl Thomsen1, Kristoffer Laigaard Nielbo2, Mats Malm3

1Aarhus University; 2University of Southern Denmark; 3University of Gothenburg

There is a growing number of digital literary collections in the Nordic countries that make the literary heritage accessible and hold great potential for research that takes advantage of machine-readable texts. These collections range from very large ones, such as the Norwegian Bokhylla, through medium-sized collections, such as the Swedish Litteraturbanken and the Danish Arkiv for Dansk Litteratur, to single-author collections, e.g. the collected works of N.F.S. Grundtvig. In this presentation we will discuss some of the obstacles to more widespread use of these collections by literary scholars. We will also present the outcomes of a series of seminars (UCLA 2015, Aarhus 2016, UCLA 2017) sponsored by the Fondation Maison des sciences de l’homme through a grant from the Andrew W. Mellon Foundation.

We find that there are two important thresholds to the use of these collections:

1) The technical obstacles to collecting the right corpora and applying the appropriate analytical tools are too high for the majority of researchers in literary studies. While much has been done to improve access to works, differences in formats and metadata make it difficult to work across collections. Our project has addressed this issue by creating CLEAR, a Nordic GitHub repository for literary texts, which provides cleaned versions of Nordic literary works along with a suite of Python tools.

2) The capacity to combine traditional hermeneutical approaches to literary studies with computational approaches is still in its infancy. This remains the case despite numerous good studies in recent years, e.g. by the Stanford Literary Lab, Leonard and Tangherlini, and Ted Underwood. In our series of seminars, we have brought together scholars with great technical prowess and more traditionally trained literary scholars to generate projects that are both technically feasible and scholarly relevant. Expanding the methodological vocabulary of literary studies is complicated. It requires significant domain expertise to verify the outcomes of computational analyses and, conversely, openness to working with results that cannot be verified by close reading. In this presentation we will show how thematic variation and readability can provide new perspectives on Swedish and Danish modernist literature, and discuss how this relates to broader visions of literary studies in an age of computation (Heise, Thomsen).

Literature

Algee-Hewitt, Mark et al. 2016. “Canon/Archive. Large-scale Dynamics in the Literary Field.” Stanford Literary Lab Pamphlet 11.

Heise, Ursula. 2017. “Comparative literature and computational criticism: A conversation with Franco Moretti.” In Futures of Comparative Literature: ACLA State of the Discipline Report. London: Routledge.

Leonard, Peter and Timothy R. Tangherlini. 2013. “Trawling in the Sea of the Great Unread: Sub-Corpus Topic Modeling and Humanities Research.” Poetics 41(6): 725–749.

Thomsen, Mads Rosendahl et al. 2015. “No Future without Humanities.” Humanities 1.

Underwood, Ted. 2013. Why Literary Periods Mattered. Stanford: Stanford University Press.

Thomsen-Research in Nordic literary collections-133_a.pdf

5:00pm - 5:30pm
Long Paper (20+10min) [publication ready]

Reassembling the Republic of Letters - A Linked Data Approach

Jouni Tuominen1,2, Eetu Mäkelä1,2, Eero Hyvönen1,2, Arno Bosse3, Miranda Lewis3, Howard Hotson3

1Aalto University, Semantic Computing Research Group (SeCo); 2University of Helsinki, HELDIG – Helsinki Centre for Digital Humanities; 3University of Oxford, Faculty of History

Between 1500 and 1800, a revolution in postal communication allowed ordinary men and women to scatter letters across and beyond Europe. This exchange helped knit together what contemporaries called the respublica litteraria, or Republic of Letters: a knowledge-based civil society that was crucial to that era’s intellectual breakthroughs and formative of many modern European values and institutions. The epistolary data are distributed across different countries and collections. To enable effective Digital Humanities research on them, metadata about the letters have been aggregated, harmonised, and made available to the research community through the Early Modern Letters Online (EMLO) service. This paper discusses the idea and benefits of using Linked Data as the basis for the next digital framework of EMLO, and presents experiences from a first demonstration implementation of such a system.

Tuominen-Reassembling the Republic of Letters-207_a.pdf
Tuominen-Reassembling the Republic of Letters-207_c.pdf
 
6:00pm - 8:00pmJoint reception with Nordic Challenges conference
Main Building of the University, entrance from the Senate Square side