Conference Agenda

Digital Humanities in the Nordic Countries 3rd Conference

 
Date: Thursday, 08/Mar/2018
8:00am - 9:00am Breakfast
Lobby, Porthania
 
9:00am - 10:30am Plenary 2: Kathryn Eccles
Session Chair: Eero Hyvönen
Finding the Human in Data: What can Digital Humanities learn from digital transformations in cultural heritage?
PII 
10:30am - 11:00am Coffee break
Lobby, Porthania
 
11:00am - 12:30pm T-PII-1: Our Digital World
Session Chair: Leo Lahti
PII 
 
11:00am - 11:15am
Short Paper (10+5min) [publication ready]

The unchallenged persuasions of mobile media technology: The pre-domestication of Google Glass in the Finnish press

Minna Saariketo

Aalto University

In recent years, networked devices have taken an ever tighter hold of people’s everyday lives. Tech companies are frantically competing to grab people’s attention and secure a place in their daily routines. In this short paper, I elaborate further on a key finding from an analysis of Finnish press coverage of Google Glass between 2012 and 2015. The concept of pre-domestication is used to discuss the ways in which media discourse invites and persuades us to integrate ourselves into the carefully orchestrated digital environment. It is shown how the news coverage deprives potential new users of digital technology of a chance to evaluate the underpinnings of the device, its attachments to data harvesting, and its practices of hooking attention. The paper reflects on the implications of contemporary computational imaginaries as (re)produced and circulated in the mainstream media, thereby shedding light on and opening possibilities to criticize the politics of mediated pre-domestication.


11:15am - 11:30am
Distinguished Short Paper (10+5min) [publication ready]

Research of Reading Practices and ‘the Digital’

Anna Kaisa Kajander

University of Helsinki

Books and reading habits belong to the areas of our everyday lives that have been strongly affected by digitalisation. The subject has been raised repeatedly in public discussion in Finnish mainstream media, and the typical discourse focuses on e-books and printed books, sometimes still in a manner that juxtaposes the formats. Another aspect of reading that has gained publicity recently concerns the decreasing interest in books in general. The acceptance of e-books and the status of printed books in contemporary reading have raised questions, but it has also been realised that the recent changes are connected with digitalisation in a wider cultural context. Digitalisation has enabled new forms of reading and related habits that benefit readers and book culture, but it has also fuelled free-time activities that do not support interest in books.

In this paper, my aim is to discuss research on books and reading as a socio-cultural practice, and to ask whether this field could benefit from co-operation with digital humanities scholars. The idea of combining digital humanities with book research is not new; collaboration has been welcomed especially in research that focuses on new book technologies and the use of digitised historical records, such as bibliographies. However, I would like to call for discussion on how digital humanities could benefit the research of (new) reading practices and the ordinary reader. Defining ‘the digital’ would be essential, as would knowledge of relevant methodologies, tools and data. I will first introduce my ongoing PhD project and present some questions that have arisen during the process. Then, based on these questions, I would like to discuss what kind of co-operation between digital humanities and reading research could be useful in gaining knowledge of the change in book reading, related habits and contemporary readership.

PhD project: Life as a Reader

In my ongoing dissertation project, I focus on attitudes and expectations towards printed books and e-books and on new reading practices. The research material consists of approximately 540 writings that were sent to the Finnish Literature Society in a public collection called “Life as a reader” in 2014. The collection was organised by the Society’s Literary and Traditional Archives in co-operation with the Finnish Book Historical Society, and its aim was to focus on reading as a socio-cultural practice. The organisers wanted people to write in their own words about reading memories. They also included questions in the collection call that addressed, for example, childhood and learning to read, reading as a private or shared practice, places and situations of reading, and experiences of recent changes, such as opinions about e-books and virtual book groups. Book-historical interests were visible in the project, as all of the questions mentioned above have also been apparent in other book history research: interest in ordinary readers and their everyday lives, the ways readers find and consume texts, and readership in the digital age.

In the dissertation I will focus on the writings, and especially on those writers who liked to read books for pleasure and as a free-time activity. The point is to emphasise the readers’ point of view on the recent changes. I argue that if we want to understand attitudes towards reading or the possible futures of reading habits, we need to understand the different practices that readers themselves attach to their readership. The main focus is on attitudes and expectations towards books as objects, but it is equally important to scrutinise other digitally related changes that have affected reading practices. I am analysing the writings with particular attention to the different roles that books as objects play in readers’ lives and to attitudes towards digitalisation as a cultural change. The ideas behind the research questions are based on my background as an ethnologist interested in material culture studies. I believe that the concept of materiality and research on reading as a sensory experience are important for understanding attitudes towards different book formats, readers’ choices and wishes concerning the development of books.

Aspects of readership

The research material turned out to be rich in different viewpoints on reading. As I expected, people wrote about their feelings about the different book formats and their reasons for choosing them. However, during the analysis it became clear that answering questions about the meanings of the materialities of books also required knowledge of the different aspects of reading habits that the writers themselves connected to their identities as readers. This meant focusing on writings about practices that extended beyond book formats and reading moments. I am now analysing the ways in which people, for example, searched for and found books, collected them and discussed literature with other readers. These activities were often connected to social media, digital book stores and libraries, to the value of (e-)books as objects and to the effects of different formats on reading practices. It also became clear that other free-time activities and media use affected the amount of time spent on reading, even among writers who were interested in books and liked to read.

As the material was collected at a time when smartphones and tablets, which are generally considered to have had an essential impact on reading habits, had only quite recently become popular and well-known objects, the writings often focused on the change and on the uncertain futures of books. The practices mentioned were connected to concepts such as ownership, visibility and representation. As digital texts had changed the ways these aspects were understood, they also seemed to have caused negative associations with the digitalisation of books, especially among readers who saw the different aspects of print culture as positive parts of their readership. However, there were also friends of printed books who saw digital services as something very positive, as things that supported their reading habits. Writings about, for example, finding books to read or discussing literature with other readers online, writing and publishing book reviews in blogs, or being active on the GoodReads or BookCrossing websites were all seen as welcome aspects of “new” readership. A small minority of the writers also wrote about fan fiction and electronic literature.

Comparing the time of material collection with the present day, digital book services, such as e-book and audiobook services, have been gaining popularity, but the situation is not radically different from 2014. E-books have perhaps become better known since then, but they are still marginal in comparison with printed books; they have not gained the popularity that was expected in previous years. To my knowledge, other aspects of new reading practices, such as the meanings of social media or interest in electronic literature, have not yet been studied much in the Finnish context. These observations lead to questions about the possible benefits of digital humanities for book and reading research.

Collaborating with digital humanists?

The changes in books and reading cause worries but also hopes and interest concerning future reading habits. To gain and produce knowledge about the change, we need to define ‘the digital’ and ‘digitalisation’, terms that are so often referred to in different contexts without any specific definition. The problem is that they can mean and include various things attached to both the technological and the cultural sides of the phenomenon. For those interested in reading research, it would be important to theorise digitalisation from the perspectives of readers and to view the changes in reading from the socio-cultural side: as concrete changes in the material environment and as new possibilities to act as a reader. This field of research would benefit from collaboration with digital humanists who have knowledge of ‘the digital’ and of the possibilities of reading-related software and devices.

Secondly, we could benefit from discussions about the possibilities to collect, use and save the data that readers now leave behind as they practise readership in digital environments. Digital book stores, library services and social media sites would be useful sources, but more knowledge is still needed about the nature of these kinds of data: which aspects affect the data, how to get the data, which tools to use, and so on. Questions about collecting and saving data also include important questions of research ethics that should be further discussed in book research: which data should be open and free to use, who owns the data, and which permissions would be required to study certain websites? Changes in free-time activities in general have also raised questions about data that could be used to compare the time spent on different activities with the time spent on reading.

Thirdly, collaboration is needed when reading-related databases are being developed. Some steps have already been taken, for example in the Finnish Reading Experience Database project, but these kinds of projects could be developed further. Again, collecting digital data, but also opening it up and using it for different kinds of research questions, is needed. At its best, multidisciplinary collaboration could help build new perspectives and research questions about contemporary readership, and therefore all discussion and ideas that could benefit the field of books and reading are welcome.


11:30am - 11:45am
Short Paper (10+5min) [publication ready]

Exploring Library Loan Data for Modelling the Reading Culture: project LibDat

Mats Neovius1, Kati Launis2, Olli Nurmi3

1Åbo Akademi University; 2University of Eastern Finland; 3VTT Technical Research Centre of Finland

Reading is evidently a part of the cultural heritage. With respect to nourishing this, Finland is exceptional in the sense that it has a unique library system, used regularly by 80% of the population. The Finnish library system is publicly funded and free of charge. Against this background, the consortium “LibDat: Towards a More Advanced Loaning and Reading Culture and its Information Service” (2017-2021, Academy of Finland) set out to explore the loaning and reading culture and its information service, to the end that the project’s results would help officials to elaborate upon Finnish public library services. The project is part of the constantly growing field of Digital Humanities and aims to show how large born-digital material, new computational methods and literary-sociological research questions can be integrated into the study of contemporary literary culture. The project’s collaborator, Vantaa City Library, collects the daily loan data. This loan data is objective, crisp, and big. In this position paper, the main contribution is a discussion of the limitations the data poses and of the literary questions on which computational means may shed light. For this, we describe the data structure of a loan event and outline the dimensions along which to interpret the data. Finally, we outline the milestones of the project.


11:45am - 12:00pm
Short Paper (10+5min) [publication ready]

Virtual Museums and Cultural Heritage: Challenges and Solutions

Nadezhda Povroznik

Perm State National Research University, Center for Digital Humanities

The paper demonstrates the significance of studying virtual museums, defines the term “virtual museum” and its content more precisely, shows the problems of existing virtual museums and the complexities they present for the study of cultural heritage, and examines the problems of using virtual museum content in classical research, which are connected with the specificity of virtual museums as informational resources. It then demonstrates possible solutions to these problems, sorting out the most effective ways of using cultural heritage in humanities research. The study pays attention to the main problems related to the preservation, documentation, representation and use of cultural heritage associated with virtual museums. It provides a basis for solving these problems through the subsequent development of an information system for the study of virtual museums and their wider use.


12:00pm - 12:15pm
Short Paper (10+5min) [abstract]

The Future of Narrative Theory in the Digital Age?

Hanna-Riikka Roine

University of Helsinki

As has often been noted, digital humanities are to be understood in the plural. It seems, however, that quite as often they are understood as the practice of introducing digital methods to the humanities, or as a way to analyse “the digital” within the humanist framework. This presentation takes a slightly different approach, as its aim is to challenge some of the traditional theoretical concepts within a humanist field, narrative theory, through the properties of today’s computational environment.

Narrative theory originated in literary criticism and has based its concepts and its understanding of narrative in media on printed works. While a few trends with a more broadly defined base are emerging (e.g. the project of “transmedial narratology”), the analysis of verbal narrative structures and strategies from the perspective of literary theory remains the primary concern of the field (see Kuhn & Thon 2017). Furthermore, the focus of current research is mostly medium-specific, even though various phenomena studied by narratology (e.g. narrativity, worldbuilding) are agreed to be medium-independent.

My presentation starts from the fact that the ancient technology of storytelling has become enmeshed in a software-driven environment which not only has the potential to simulate or “transmediate” all artistic media, but also differs fundamentally from verbal language in its structure and strategies. This development or “digital turn” has so far mostly escaped the attention of narratologists, although it has had profound effects on the affordances and environments of storytelling.

In my presentation, I take up the properties of computational media that challenge the print-based bias of current narrative theory. As a starting point, I suggest that the scope of narrative theory should be extended to the machines of digital media instead of looking at their surface (cf. Wardrip-Fruin 2009). As software-driven, conditional, and process-based, storytelling in computational environments is not so much about disseminating a single story as about the multiplication of narrative, centering upon the underlying patterns on which varied instantiations can be based. Furthermore, computational environments challenge the previous theoretical emphasis on fixed media content and the author-controlled model of transmission. (See e.g. Murray 1997 and 2011; Bogost 2007; Hayles 2012; Manovich 2013; Jenkins et al. 2013.)

Because computational environments represent “a new norm” compared to the prototypical narrative developed in the study of literary fiction, Brian McHale has recently predicted that narrative theory “might become divergent and various, multiple narratologies instead of one – a separate narratology for each medium and intermedium” (2016, original emphasis). In my view, such a future fragmentation of the field would only diminish the potential of narrative theory. Instead, the various theories could converge or hybridize in a way similar to contemporary media – especially today’s transmedia, which is hybridizing both in the sense of content being spread across media and in the sense of media being incorporated by the computer and thus acquiring the properties of computational environments.

The consequences of recognising media convergence or hybridization in narrative theory are not only (meta)theoretical. The primary emphasis on media content is still clearly visible in the division of the modern academic study of culture into disciplines – literary studies focus on literature, for example. While the least that narrative theory can do is expand “potential areas of cross-pollination” (Kuhn & Thon 2017) with media studies, for example, and challenge the print-based assumptions behind concepts such as narrativity or storyworld, there may also be a need to effect some changes in the working methods of narratologists. Creating multidisciplinary research groups focusing on narrative and storytelling in current computational media is one solution (still somewhat unusual in the “traditional” humanities focused on single-authored articles and monographs), while another is critically reviewing academic curricula. N. Katherine Hayles, for example, has proposed a “Comparative Media Studies” approach (2012) to describe the transformed disciplinary coherence that literary studies might embrace.

In my view, narrative theory can truly be “transmedial” and contribute to the study of storytelling practices and strategies in contemporary computational media, but various print- and content-based biases underlying its toolkit must be genuinely addressed first. The need for this is urgent not only because “narratives are everywhere”, but also because the old traditional online/offline distinction has begun to disappear.

References

Bogost, Ian. 2007. Persuasive Games: The Expressive Power of Videogames. Cambridge, Ma: The MIT Press.

Hayles, N. Katherine. 2012. How We Think: Digital Media and Contemporary Technogenesis. Chicago: Univ. of Chicago Press.

Jenkins, Henry, Sam Ford, and Joshua Green. 2013. Spreadable Media: Creating Value and Meaning in a Networked Culture. New York: New York Univ. Press.

Kuhn, Markus, and Jan-Noël Thon. 2017. “Guest Editors’ Column. Transmedial Narratology: Current Approaches.” Narrative 25 (3): 253–255.

Manovich, Lev. 2013. Software Takes Command: Extending the Language of New Media. New York and London: Bloomsbury.

McHale, Brian. 2016. “Afterword: A New Normal?” In Narrative Theory, Literature, and New Media: Narrative Minds and Virtual Worlds, edited by Mari Hatavara, Matti Hyvärinen, Maria Mäkelä, and Frans Mäyrä, 295–304. London: Routledge.

Murray, Janet. 1997. Hamlet on the Holodeck: The Future of Narrative in Cyberspace. New York: The Free Press.

―――. 2011. Inventing the Medium. Principles of Interaction Design as a Cultural Practice. Cambridge, Ma: The MIT Press.

Wardrip-Fruin, Noah. 2009. Expressive Processing: Digital Fictions, Computer Games, and Software Studies. Cambridge, Ma. and London: The MIT Press.


12:15pm - 12:30pm
Short Paper (10+5min) [abstract]

Broken data and repair work

Minna Ruckenstein

Consumer Society Research Centre, University of Helsinki, Finland

Recent research introduces a concept-metaphor of “broken data”, suggesting that digital data might be broken and fail to perform, or be in need of repair (Pink et al., forthcoming). Concept-metaphors, anthropologist Henrietta Moore (1999, 16; see also Moore 2004) argues, are domain terms that “open up spaces in which their meanings – in daily practice, in local discourses and in academic theorizing – can be interrogated”. By doing so, concept-metaphors become defined in practice and in context; they are not meant to be foundational concepts, but work as partial and perspectival framing devices. The aim of a concept-metaphor is to arrange and provoke ideas and to act as a domain within which facts, connections and relationships are presented and imagined.

In this paper, the concept-metaphor of broken data is discussed in relation to the open data initiative Citizen Mindscapes, an interdisciplinary project that contextualizes and explores a Finnish-language social media data set (‘Suomi24’, or Finland24 in English) consisting of tens of millions of messages and covering social media over a time span of 15 years (see Lagus et al. 2016). The role of the broken data metaphor in this discussion is to examine the implications of breakages and consequent repair work in data-driven initiatives that take advantage of secondary data. Moreover, the concept-metaphor can sensitize us to the less secure and ambivalent aspects of data worlds. By focusing on how data might be broken, we can highlight misalignments between people, devices and data infrastructures, or bring to the fore failures to align data sources or uses with the everyday.

As Pink et al. (forthcoming) suggest, the metaphorical understanding of digital data, aiming to underline aspects of data brokenness, brings together various strands of scholarly work, highlighting important continuities with earlier research. Studies of material culture explore practices of breakage and repair in relation to the materiality of objects, for instance by focusing on art restoration (Domínguez Rubio 2016) or car repair (Dant 2010). Drawing attention to the fragility of objects and temporal decay, these studies underline that objects break and have to be mended and restored. When these insights are brought into the field of data studies, the materiality of platforms and software and the subsequent data arrangements, including material restrictions and breakages, become a concern (Dourish 2016; Tanweer et al. 2016), emphasizing aspects of brokenness and following repair work in relation to digital data (Pink et al., forthcoming).

In science and technology studies (STS), on the other hand, ‘breakages’ have been studied in relation to infrastructures, demonstrating that it is through instances of breakdown that structures and objects which have become invisible to us in the everyday gain a new kind of visibility. The STS scholar Steven Jackson expands the notion of brokenness further to more everyday situations and asks ‘what happens when we take erosion, breakdown, and decay, rather than novelty, growth, and progress, as our starting points in thinking through the nature, use, and effects of information technology and new media?’ (2014: 174). Instances of data breakage can be seen in light of mundane data arrangements, as a recurring feature of data work rather than an exceptional event (Pink et al., forthcoming; Tanweer et al. 2016).

In order to further concretize the usefulness of the concept-metaphor of broken data, I will detail instances of breakage and repair in the data work of the Citizen Mindscapes initiative, emphasizing the efforts needed to overcome various challenges in working with large digital data sets. This kind of approach treats the obstacles and barriers that slow or derail the data science process as an important resource for knowledge production and innovation (Tanweer et al. 2016). In the collaborative Citizen Mindscapes initiative, discussing the gaps and possible anomalies in the data led to conversations concerning the production of the data, deepening our understanding of the human and material factors at play in processes of data generation.

Identifying data breakages

The Suomi24 data was generated by a media company, Aller. The data set grew on the company servers for over a decade, gaining a new life and purpose when the company decided to open the proprietary data for research purposes. A new infrastructure was needed for hosting and distributing the data. One such data infrastructure was already in place, the Language Bank of Finland, maintained by CSC (IT Centre for Science), developed for acquiring, storing, offering and maintaining linguistic resources, tools and data sets for academic researchers. The Language Bank gave a material structure to the Suomi24 data: it was repurposed as research data for linguistics.

The Korp tool, developed for the analysis of data sets stored in the Language Bank, allowed word searches in relation to individual sentences, retaining the Suomi24 data as a resource for linguistic research. Yet the material arrangements constrained other possible uses of the data that were of interest to the Citizen Mindscapes research collective, which aims to work the data to accommodate the social science focus on topical patterns and on the emotional waves and rhythms characteristic of social media. In the past two years, the research collective, particularly those members experienced in working with large data sets, has been repairing and cleaning the data in order to make it ready for additional computational approaches. The goal is to build a methodological toolbox from which researchers who do not possess computational skills, but are interested in using digital methods in social scientific inquiry, can benefit. This entails, for instance, developing user interfaces that narrow down the huge data set and allow the data to be accessed from topic-led perspectives.

The ongoing work has alerted us to breakages of data, raising more general questions about the origins and nature of data. Social media data, such as the Suomi24 data, is never an accurate or complete representation of society. From the societal perspective, the data is broken, offering discontinuous, partial and interrupted views of individual, social and societal aims. Preparing data for research that takes this brokenness seriously underlines the importance of understanding the limitations and biases in the production of the data, including insights into how the data might be broken. The first step towards this aim was a research report (Lagus et al. 2016) that evaluated and contextualized the Suomi24 data in a wide variety of ways. We paid attention to the writers of the social media community as producers of the data, and the moderation practices of the company were described to demonstrate how they shape the data set by filtering out certain swearwords and racist terms, or certain kinds of messages, for instance advertisements or messages containing personal information.

The yearly volume and daily rhythms of the data were calculated on the basis of timestamps, and the topical hierarchies of the data were uncovered by attention to the conversational structure of the social media forum. When our work identified gaps, errors and anomalies in the data, it revealed that data might be broken and discontinuous due to human or technological forces: infrastructure failures, trolling, or automated spam bots. With the information about gaps in the data, we opened a conversation with the social media company’s employees and learned that nobody could tell us about the 2004-2005 gap in the data. A crack in the organizational memory was revealed, reminding us of the links between the temporality of data and human memory. In contrast, the anomaly in the data volume in July 2009, which we first suspected marked a day when something dramatic happened and created turmoil in the social media, turned out to be a spam bot, remembered very well in the company.

In the field of statistics, for instance, research might require intimate knowledge of all possible anomalies of the data. What appears incomplete, inconsistent and broken to some practitioners might be irrelevant to others, or a research opportunity. The role of the concept-metaphor of broken data is to open a space for discussion about these differences, maintaining them rather than resolving them. One option is to highlight how data is seen as broken in different contexts, compare the breakages, then follow what happens after them, and focus on the repair and cleaning work.

Concluding remarks

The purpose of this paper has been to introduce the broken data metaphor, which calls for paying more attention to the incomplete and fractured character of digital data. Acknowledging the incomplete nature of data is in itself of course nothing new; researchers are well aware that their data lacks perfection. With growing uses of secondary data, however, the ways in which data is broken might not be known beforehand, underlining the need to pay more careful attention to brokenness and the consequent work of repair. In the case of the Suomi24 data, the data breakages suggest that we need to actively question data production and the diverse ways in which data are adapted for different ends by practitioners. As described above, the repurposed data requires an infrastructure, servers and cloud storage; the software and analytics tools enable certain perspectives and operations and disable others. Data is always inferred and interpreted in infrastructure and database design and by professionals, who see the data, and its possibilities, differently depending on their training. As Genevieve Bell (2015: 16) argues, the work of coding data and writing algorithms determines ‘what kind of relationships there should be between data sets’ and, by doing so, data work promotes judgments about what data should speak to what other data. As our Citizen Mindscapes collaboration suggests, making ‘data talk’ to other data sets, or to interpreters of data, is permeated by moments of breakdown and repair that call for a richer understanding of everyday data practices. The intent of this paper has been to suggest that a focus on data breakages is an opportunity to learn about everyday data worlds, and to account for how data breakages challenge the linear, solutionist, and triumphant stories of big data.

References:

Bell, G. (2015). ‘The secret life of big data’. In T. Boellstorff and B. Maurer (eds.), Data, Now Bigger and Better! Chicago: Prickly Paradigm Press, 7-26.

Dant, T., 2010. The work of repair: Gesture, emotion and sensual knowledge. Sociological Research Online, 15(3), p.7.

Domínguez Rubio, F. (2016) ‘On the discrepancy between objects and things: An ecological approach’ Journal of Material Culture. 21(1): 59–86

Jackson, S.J. (2014) ‘Rethinking repair’ in T. Gillespie, P. Boczkowski, and K. Foot, eds. Media Technologies: Essays on Communication, Materiality and Society. MIT Press: Cambridge MA

Lagus, K., M. Pantzar, M. Ruckenstein, and M. Ylisiurua. (2016). Suomi24: Muodonantoa aineistolle. The Consumer Society Research Centre. Helsinki: Faculty of Social Sciences, University of Helsinki.

Moore, H. (1999). Anthropological theory at the turn of the century. In H. Moore (ed.), Anthropological Theory Today. Cambridge: Polity Press, pp. 1-23.

Moore, H. L. (2004). Global anxieties: concept-metaphors and pre-theoretical commitments in anthropology. Anthropological theory, 4(1), 71-88.

Pink et al, forthcoming. Broken data: data metaphors for an emerging world. Big data & Society.

Tanweer, A., Fiore-Gartland, B., & Aragon, C. (2016). Impediment to insight to innovation: understanding data assemblages through the breakdown–repair process. Information, Communication & Society, 19(6), 736-752.

 
11:00am - 12:30pmT-PIII-1: Open and Closed
Session Chair: Olga Holownia
PIII 
 
11:00am - 11:30am
Long Paper (20+10min) [abstract]

When Open becomes Closed: Findings of the Knowledge Complexity (KPLEX) Project.

Jennifer Edmond, Georgina Nugent Folan, Vicky Garnett

Trinity College Dublin, Ireland

The future of cultural heritage seems to be all about “data.” A Google search on the term ‘data’ returns over 5.5 billion hits, but the fact that the term is so well embedded in modern discourse does not necessarily mean that there is a consensus as to what it is or should be. The lack of consensus regarding what data are on a small scale acquires greater significance and gravity when we consider that one of the major terminological forces driving ICT development today is that of "big data." While the phrase may sound inclusive and integrative, "big data" approaches are highly selective, excluding any input that cannot be effectively structured, represented, or, indeed, digitised. The future of DH, of any approaches to understanding complex phenomena or sources such as are held in cultural heritage institutions, indeed the future of our increasingly datafied society, depend on how we address the significant epistemological fissures in our data discourse. For example, how can researchers claim that "when we speak about data, we make no assumptions about veracity" while one of the requisites of "big data" is "veracity"? On the other hand, how can we expect humanities researchers to share their data on open platforms such as the European Open Science Cloud (EOSC) when we, as a community, resist the homogenisation implied and required by the very term “data”, and share our ownership of it with both the institutions that preserve it and the individuals that created it? How can we strengthen European identities and transnational understanding through the use of ICT systems when these very systems incorporate and obscure historical biases between languages, regions and power elites? In short, are we facing a future when the mirage of technical “openness” actually closes off our access to the perspectives, insight and information we need as scholars and as citizens? Furthermore, how might this dystopic vision be avoided?

These are the kinds of questions and issues under investigation by the European Horizon 2020 funded Knowledge Complexity (KPLEX) project, which applies strategies developed by humanities researchers to deal with complex, messy cultural data; the very kind of data that resists datafication and poses the biggest challenges to knowledge creation in large data corpora environments. Arising out of the findings of the KPLEX project, this paper will present the synthesised findings of an integrated set of research questions and challenges addressed by a diverse team led by Trinity College Dublin (Ireland) and encompassing researchers at Freie Universität Berlin (Germany), DANS-KNAW (The Hague) and TILDE (Latvia). We have adopted a comparative, multidisciplinary, and multi-sectoral approach to addressing the issue of bias in big data, focussing on the following four key challenges to the knowledge creation capacity of big data approaches:

1. Redefining what data is and the terms we use to speak of it (TCD);

2. The manner in which data that are not digitised or shared become "hidden" from aggregation systems (DANS-KNAW);

3. The fact that data is human-created, and lacks the objectivity often ascribed to the term (FUB);

4. The subtle ways in which data that are complex almost always become simplified before they can be aggregated (TILDE).

The paper will present a synthesised version of these integrated research questions, and discuss the overall findings and recommendations of the project, which completes its work at the end of March 2018. What follows gives a flavour of the work ongoing at the time of writing this abstract, and of the issues that will be raised in the DHN paper.

1. Redefining what data is and the terms we use to speak of it. Many definitions of data, even thoughtful scholarly ones, associate the term with a factual or objective stance, as if data were a naturally occurring phenomenon. But data is not fact, nor is it objective, nor can it be honestly aligned with terms such as ‘signal’ or ‘stimulus,’ or the quite visceral (but misleading) ‘raw data.’ To become data, phenomena must be captured in some form, by some agent; signal must be separated from noise, like must be organised against like, transformations occur. These organisational processes are human-determined or human-led, and therefore cannot be seen as wholly objective, irrespective of how effective a (human-built) algorithm may be. The core concern of this facet of the project was to expand the understanding of the heterogeneity of definitions of data, and of the implications of this state of understanding. Our primary ambition under this theme was to establish a clear taxonomy of existing theories of data, to underpin a more applied comparison of humanistic versus technical applications of the term. We did this by identifying the key terms (and how they are used differently), key points of bifurcation, and key priorities under each conceptualisation of data. As such, this facet of the project supported the integrated advancement of the three other project themes, while itself developing new perspectives on the rhetorical stakes and practical implications of differing concepts of the term ‘data’ and how these will impact the future not only of DH but of society at large.

2. Dealing with ‘hidden’ data. According to the 2013 ENUMERATE Core 2 survey, only 17% of the analogue collections of European heritage institutions had at that time been digitised. This number actually represents a decrease from the findings of the 2012 survey (almost 20%). The survey also reached only a limited number of respondents: 1,400 institutions across 29 countries, which surely captures the major national institutions but not local or specialised ones. Although the ENUMERATE Core 2 report does not break down these results by country, one has to imagine that there would be large gaps in the availability of data from some countries relative to others. Because so much of this data has not been digitised, it remains ‘hidden’ from potential users. This may have always been the case, as there have always been inaccessible collections, but in a digital world the stakes and the perceptions are changing. The fact that so much other material is available online, and that an increasing proportion of the most well-used and well-financed cultural collections are as well, means that the reasonable assumption of the non-expert user of these collections is that what cannot be found does not exist (whereas in the analogue age, collections would be physically contextualised with their complements, leaving the more likely assumption to be that more information existed but could not be accessed). The threat that our narratives of histories and national identities might thin out to become based on only the most visible sources, places and narratives is high. This facet of the project explored the manner in which data that are not digitised or shared become "hidden" from aggregation systems.

3. Knowledge organisation and epistemics of data. The nature of humanities data is such that even within the digital humanities, where research processes are better optimised toward the sharing of digital data, sharing of "raw data" remains the exception rather than the norm. The ‘instrumentation’ of the humanities researcher consists of a dense web of primary, secondary and methodological or theoretical inputs, which the researcher traverses and recombines to create knowledge. This synthetic approach makes the nature of the data, even at its ‘raw’ stage, quite hybrid, and already marked by the curatorial impulse that is preparing it to contribute to insight. This aspect may be more pronounced in the humanities than in other fields, but the subjective element is present in any human-triggered process leading to the production or gathering of data. Another element of this is the emotional. Emotions are motivators for action and interaction that relate to social, cultural, economic and physiological needs and wants. Emotions are crucial factors in relating or disconnecting people from each other. They help researchers to experientially assess their environments, but this aspect of the research process is considered taboo, as noise that obscures the true ‘factual signal’, and as less ‘scientific’ (seen in terms of strictly Western colonialist paradigms of knowledge creation) than other possible contributors to scientific observation and analysis. Our primary ambition here was to explore the data creation processes of the humanities and related research fields, to understand how they combine pools of information and other forms of intellectual processing to create data that resists datafication and ‘like-with-like’ federation with similar results.
The insights gained will make visible many of the barriers to the inclusion of all aspects of science under current Open Science trajectories, and reveal further central elements of social and cultural knowledge that are unable to be accommodated under current conceptualisations of ‘data’ and the systems designed to use them.

4. Cultural data and representations of system limitations. Cultural signals are ambiguous, polysemic, often conflicting and contradictory. In order to transform culture into data, its elements – like all phenomena being reduced to data – have to be classified, divided, and filed into taxonomies and ontologies. This process of ‘datafication’ robs them of their polysemy, or at least reduces it. One of the greatest challenges for so-called Big Data is the analysis and processing of multilingual content. This challenge is particularly acute for unstructured texts, which make up a large portion of the Big Data landscape. How do we deal with multilingualism in Big Data analysis? What are the techniques by which we can analyze unstructured texts in multiple languages, extracting knowledge from multilingual Big Data? Will new computational techniques such as deep learning improve or merely alter the challenges? The current method for analyzing multilingual Big Data is to leverage language technologies such as machine translation, terminology services, automated speech recognition, and content analytics tools. In recent years, the quality and accuracy of these key enabling technologies for Big Data have improved substantially, making them indispensable tools for high-demand applications with a global reach. However, just as not all languages are alike, the development of these technologies differs for each language. Larger languages with large speaker populations have robust digital resources, the result of large-scale digitization projects in a variety of domains, including cultural heritage information. Smaller languages have far scarcer resources. Those resources that do exist may be underpinned by far less robust algorithms and far smaller bases for statistical modelling, leading to less reliable results, a fact that in large-scale, multilingual environments (like Google Translate) is often not made transparent to the user.
The KPLEX project is exploring and describing the nature and potential of ‘data’ within these clearly defined sets of actors and practices at the margins of what can currently be approached holistically using computational methods. It is also envisioning approaches to the integration of hybrid data forms within and around digital platforms, leading not so much to the virtualisation of information generation as to approaches to its augmentation.


11:30am - 11:45am
Short Paper (10+5min) [publication ready]

Open, Extended, Closed or Hidden Data of Cultural Heritage

Tuula Pääkkönen1, Juha Rautiainen1, Toni Ryynänen2, Eeva Uusitalo2

1National Library of Finland, Finland; 2Ruralia Institute, University of Helsinki, Finland

The National Library of Finland (NLF) agreed on an “Open National Library” policy in 2016 [1]. The policy contains eight principles, divided into the themes of accessibility, openness in actions, and collaboration. Accessibility at the NLF means that access to the material needs to exist both for the metadata and the content, while respecting the rights of the rights holders. Openness in operations means that our actions and decision models are transparent and clear, and that the materials are accessible to researchers and other users. These principles are one way in which the NLF can implement the findable, accessible, interoperable, re-usable (FAIR) data principles [2] in practice.

The purpose of this paper is to view the way in which the policy has impacted our work and how findability and accessibility have been implemented, in particular from the aspects of open, extended, closed and hidden data. In addition, our aim is to specify the characteristics of existing and potential forms of data produced by the NLF from the research and development perspectives. A continuous challenge is the availability of digital resources – gaining access to the digitised material for both researchers and the general public, since there are constant requests for access to newer materials outside the legal deposit libraries’ workstations.


11:45am - 12:00pm
Distinguished Short Paper (10+5min) [publication ready]

Aalto Observatory for Digital Valuation Systems

Jenni Huttunen1, Maria Joutsenvirta2, Pekka Nikander1

1Aalto University, Department of Communications and Networking; 2Aalto University, Department of Management Studies

Money is a recognised factor in creating sustainable, affluent societies. Yet the neoclassical orthodoxy that prevails in our economic thinking remains a contested area, its supporters claiming their results to be objectively true while many heterodox economists claim the whole system stands on feet of clay. Of late, the increased activity around complementary currencies suggests that the fiat money zeitgeist might be giving way to more variety in our monetary system. Rather than emphasizing what money does, as mainstream economists do, other fields of science allow us to approach money as an integral part of the hierarchies and networks of exchange through which it circulates. This paper suggests that a broad understanding of money, and more variety in the monetary system, have great potential to further a more egalitarian and sustainable economy. They can drive the extension of society to more inclusive levels and transform people’s economic roles and identities in the process. New technologies, including blockchain and smart ledger technology, are able to support decentralized money creation through the use of shared and “open” peer-to-peer rewarding and IOU systems. Alongside specialists’ and decision makers’ capabilities, our project most pressingly calls for engaging citizens in the process early on. Multidisciplinary competencies are needed to take relevant action to investigate, envision and foster novel ways of value creation. To this end, we are forming the Aalto Observatory on Digital Valuation Systems to gain a deeper understanding of sustainable value creation structures enabled by new technology.


12:00pm - 12:15pm
Short Paper (10+5min) [publication ready]

Challenges and perspectives on the use of open cultural heritage data across four different user types: Researchers, students, app developers and hackers

Ditte Laursen1, Henriette Roued-Cunliffe2, Stig Svennigsen1

1Royal Danish Library; 2University of Copenhagen

In this paper, we analyse and discuss from a user perspective and from an organisational perspective the challenges and perspectives of the use of open cultural heritage data. We base our study on empirical evidence gathered through four cases where we have interacted with four different user groups: 1) researchers, 2) students, 3) app developers and 4) hackers. Our own role in these cases was to engage with these users as teachers, organizers and/or data providers. The cultural heritage data we provided were accessible as curated data sets or through APIs. Our findings show that successful use of open heritage data is highly dependent on organisations’ ability to calibrate and curate the data differently according to contexts and settings. More specifically, we show what different needs and motivations different user types have for using open cultural heritage data, and we discuss how these can be met by teachers, organizers and data providers.


12:15pm - 12:30pm
Short Paper (10+5min) [abstract]

Synergy of contexts in the light of digital humanities: a pilot study

Monika Porwoł

State University of Applied Sciences in Racibórz

The present paper describes a pilot study pertaining to the linguistic analysis of the meaning of the word ladder [EN]/drabina [PL], taking into account views from the digital humanities. WordnetLoom mapping is introduced as one of the existing research tools offered by the CLARIN ERIC research and technology infrastructure. The material comprises retrospective remarks and interpretations provided by 74 respondents who took part in a survey. A detailed classification of the word’s multiple meanings is presented in tabular form (showing the number of contexts in which participants accentuate the word ladder/drabina), along with comments and opinions. The results suggest that, apart from the general domain of the word offered for consideration, most of its senses can usually be attributed to linguistic recognition. Finally, some perspectives on the continuation of future research and critical afterthoughts are presented in the last part of this paper.

 
11:00am - 12:30pmT-PIV-1: Newspapers
Session Chair: Mats Malm
PIV 
 
11:00am - 11:30am
Long Paper (20+10min) [publication ready]

A Study on Word2Vec on a Historical Swedish Newspaper Corpus

Nina Tahmasebi

Göteborgs Universitet,

Detecting word sense changes is of great interest in the field of digital humanities. Thus far, most investigations and automatic methods have been developed and carried out on English text, and most recent methods make use of word embeddings. This paper presents a study on using Word2Vec, a neural word embedding method, on a Swedish historical newspaper collection. Our study covers a set of 11 words, and our focus is the quality and stability of the word vectors over time. We investigate to what extent a word embedding method like Word2Vec can be used effectively on texts whose volume and quality are limited.
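One way to make the notion of vector "stability" over time concrete is to compare a word's nearest-neighbour set between embedding spaces trained on different time slices. The sketch below is an illustration of one possible operationalisation, not the paper's published procedure; the toy vectors and the Jaccard measure are invented for the example:

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def nearest(word, vectors, k=2):
    """The k most similar words to `word` within one time slice."""
    sims = [(cosine(vectors[word], vec), other)
            for other, vec in vectors.items() if other != word]
    return {w for _, w in sorted(sims, reverse=True)[:k]}

def neighbour_overlap(word, slice_a, slice_b, k=2):
    """Jaccard overlap of nearest-neighbour sets: 1.0 = fully stable."""
    na, nb = nearest(word, slice_a, k), nearest(word, slice_b, k)
    return len(na & nb) / len(na | nb)

# Toy 2-dimensional embeddings for two periods (values invented)
t1 = {"krig": [1, 0], "strid": [0.95, 0.05], "kamp": [0.9, 0.1],
      "fred": [0, 1], "ro": [0.05, 1]}
t2 = {"krig": [1, 0], "strid": [0.95, 0.05], "kamp": [0, 1],
      "fred": [0.05, 1], "ro": [0.9, 0.1]}
print(neighbour_overlap("krig", t1, t2))  # 1/3: only "strid" remains a neighbour
```

In practice the vectors would come from models trained per time slice; a low overlap can signal either genuine sense change or, on small noisy corpora, mere instability of the embeddings, which is precisely the distinction the study probes.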


11:30am - 11:45am
Short Paper (10+5min) [abstract]

A newspaper atlas: Named entity recognition and geographic horizons of 19th century Swedish newspapers

Erik Edoff

Umeå University

What was the outside world for 19th century newspaper readers? That is the overarching problem investigated in this paper. One way of approaching this issue is to investigate which geographical places were mentioned in the newspaper, and how frequently. Newspapers were, of course, not the only medium that contributed to 19th century readers’ notion of the outside world. Public meetings, novels, sermons, edicts, travellers, photography, and chapbooks are other forms of media that people encountered with growing regularity during the century; however, newspapers often covered the sermons, printed lists of travellers and attracted readers with serial novels. This means, at least to some extent, that these are covered in the newspapers’ columns. And after all, newspapers were easier to collect and archive than a public meeting, which makes them an accessible source for the historian.

Two newspapers, digitized by the National Library of Sweden, are analyzed: Tidning för Wenersborgs stad och län (TW) and Aftonbladet (AB). They were chosen based on their publishing places’ different geographical and demographical conditions as well as the papers’ size and circulation. TW was founded in 1848 in the town of Vänersborg, located on the western shore of lake Vänern, which was connected with the west coast port, Göteborg, by the Trollhätte channel, established in 1800. The newspaper was published in about 500 copies once a week (twice a week from 1858) and addressed a local and regional readership. AB was a daily paper founded in Stockholm in 1830 and was soon to become the leading liberal paper of the Swedish capital, with a great impact on national political discourse. For its time, it was widely circulated (between 5,000 and 10,000 copies) in both Stockholm and the country as a whole. Stockholm was an important seaport on the eastern coast. These geographic distinctions probably mean interesting differences in the papers’ respective outlooks. Steamboats revolutionized travelling during the first half of the century, but their glory days had passed by around 1870, when railways replaced them as the most prominent way of transporting people.

This paper focuses on comparing the geographies of the two newspapers by analyzing the places mentioned in the periods 1848–1859 and 1890–1898. The main railroads of Sweden were constructed during the 1860s, and the selected years therefore cover newspaper geographies before and after the railroads.

The main questions the paper addresses relate to media history and the history of media infrastructure. During the second half of the 19th century several infrastructural technologies were introduced and developed (the electric telegraph, the postal system, newsletter corporations, railways, telephony, among others). The hypothesis is that these technologies had an impact on the newspapers’ geographies. The media technologies enabled information to travel great distances in short timespans, which could have had homogenizing effects on newspaper content, as much traditional research suggests (Terdiman 1999). On the other hand, digital historical research has shown that the development of railroads changed the geography of Houston newspapers, increasing the importance of the near region rather than concentrating geographic information in national centers (Blevins 2014).

The goal of the study is, in other words, to investigate what the infrastructural novelties introduced during the course of the 19th century, as well as the different geographic and demographic conditions, meant for the view of the outside world, or the imagined geographies, provided by newspapers. The aim of the paper is therefore twofold: (1) to investigate a historical-geographical problem relating to newspaper coverage and infrastructural change, and (2) to try out the use of Named Entity Recognition on Swedish historical newspaper data.

Named Entity Recognition (NER) software is designed to locate and tag entities, such as persons, locations, and organizations. This paper uses SweNER to mine the data for locations mentioned in the text (Kokkinakis et al. 2014). Earlier research has emphasized the problems with poor OCR scanning of historical newspapers. A picture of a newspaper page is read by OCR software and converted into a text file; the result contains many misinterpretations and therefore a considerable amount of noise (Jarlbrink & Snickars 2017). This is a big obstacle when working with digital tools on historical newspapers. Some earlier research has used and evaluated the performance of different NER tools on digitized historical newspapers, likewise underlining OCR errors as the main problem with using NER on such data (Kettunen et al. 2017). SweNER has also been evaluated for tagging named entities in historical Swedish novels, where the OCR problems are negligible (Borin et al. 2007). This paper, however, does not evaluate the software’s results in a systematic way, even though some important biases have been identified by going through the tagging of some newspaper copies manually. Some important geographic entities are not tagged by SweNER at all (e.g. Paris, Wien [Vienna], Borås and Norge [Norway]). SweNER is able to pick up some OCR reading mistakes, although many recurring ones (e.g. Lübeck read as Liibeck, Liibcck, Ltjbeck, Ltlbeck) are not tagged. These problems can be handled, at least to some degree, by using “leftovers” from the data (wrongly spelled words) that were not matched in a comparison corpus. I have manually scanned the 50,000 most frequent words that were not matched in the comparison corpus, looking for wrongly spelled place names. I ended up with a list of around 1,000 places and some 2,000 spelling variations (e.g. over 100 ways of spelling Stockholm).
This manually constructed list can be used as a gazetteer, complementing the NER results and giving a more accurate picture of 19th century newspaper geographies.
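A gazetteer built from such a variant list can be applied with a simple lookup that maps OCR-mangled spellings back to canonical place names. The entries below are invented examples in the spirit of the errors mentioned above, not items from the author's actual 1,000-place list:

```python
# Map OCR spelling variants to canonical place names
# (illustrative entries only, not the study's real gazetteer)
gazetteer = {
    "stockholm": "Stockholm",
    "stoekholm": "Stockholm",  # hypothetical c/e confusion
    "liibeck": "Lübeck",
    "liibcck": "Lübeck",
    "ltjbeck": "Lübeck",
    "ltlbeck": "Lübeck",
}

def tag_places(tokens, gazetteer):
    """Return (raw token, canonical place) pairs for gazetteer hits."""
    hits = []
    for tok in tokens:
        # Lowercase and strip trailing punctuation before lookup
        canonical = gazetteer.get(tok.lower().strip(".,;:!?"))
        if canonical:
            hits.append((tok, canonical))
    return hits

text = "Nyheter från Liibcck och Stoekholm."
print(tag_places(text.split(), gazetteer))
# [('Liibcck', 'Lübeck'), ('Stoekholm.', 'Stockholm')]
```

Hits from such a lookup would then be merged with SweNER's own tags, so that recurring OCR misreadings the tagger misses still count towards a place's mention frequency.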

REFERENCES

Blevins, C. (2014), ”Space, nation, and the triumph of region: A view on the world from Houston”, Journal of American History, Vol. 101, no 1, pp. 122–147.

Borin, L., Kokkinakis, D., and Olsson, L-G. (2007), “Naming the past: Named entity and animacy recognition in 19th century Swedish literature”, Proceedings of the Workshop on Language Technology for Cultural Heritage Data (LaTeCH 2007), pp. 1–8, available at: http://spraakdata.gu.se/svelb/pblctns/W07-0901.pdf (accessed October 31 2017).

Jarlbrink, J. and Snickars, P. (2017), “Cultural heritage as digital noise: Nineteenth century newspapers in the digital archive”, Journal of Documentation, Vol. 73, no 6, pp. 1228–1243.

Kettunen, K., Mäkelä, E., Ruokolainen, T., Kuokkala, J., and Löfberg, L. (2017), ”Old content and modern tools: Searching named entities in a Finnish OCRed historical newspaper collection 1771–1910”, Digital Humanities Quarterly, (preview) Vol. 11, no 3.

Kokkinakis, D., Niemi, J., Hardwick, S., Lindén, K., and Borin, L. (2014), ”HFST-SweNER – A new NER resource for Swedish”, Proceedings of the 9th edition of the Language Resources and Evaluation Conference (LREC), Reykjavik 26–31 May 2014, pp. 2537–2543.

Terdiman, R. (1999) “Afterword: Reading the news”, Making the news: Modernity & the mass press in nineteenth-century France, Dean de la Motte & Jeannene M. Przyblyski (eds.), Amherst: University of Massachusetts Press.


11:45am - 12:00pm
Short Paper (10+5min) [abstract]

Digitised newspapers and the geography of the nineteenth-century “lingonberry rush” in Finland

Matti La Mela

History of Industrialization & Innovation group, Aalto University,

This paper uses digitised newspaper data to analyse practices of nature use. In the late nineteenth century, a “lingonberry rush” developed in Sweden and Finland due to growing foreign demand and exports of lingonberries. The Finnish newspapers followed the events in neighbouring Sweden carefully, and reported on their pages about the export tons and the economic potential this red gold could have for Finland as well. The paper is interested in the geography of this “lingonberry rush”, and explores how imprecise geographic information about berry-picking can be gathered and structured from the digitised newspapers (metadata and NER). The paper distinguishes between unique and original news and longer chains of news reuse, and makes use of open tools for named-entity recognition and text reuse detection. This geospatial analysis adds to the reinterpretation of the history of the Nordic allemansrätten, a tradition of public access to nature which allows everyone to pick wild berries today; the circulation of commercial news on lingonberries in the nineteenth century reinforced the idea of berries as a commodity, and ultimately helped to portray wild berries as an openly accessible resource.


12:00pm - 12:15pm
Short Paper (10+5min) [abstract]

Sculpting Time: Temporality in the Language of Finnish Socialism, 1895–1917

Risto Turunen

University of Tampere,

The Grand Duchy of Finland had, relative to population, the largest socialist party in Europe in 1907. The breakthrough of Finnish socialism has not yet been analyzed from the perspective of ‘temporality’, that is, the way human beings experience time. Socialists constructed their own versions of the past, present and future that differed from competing Christian and nationalist perceptions of time. This paper examines socialist experiences and expectations primarily through a quantitative analysis of Finnish handwritten and printed newspapers. Three main questions will be addressed by combining traditional conceptual-historical approaches with the corpus-linguistic methods of collocation, keyness and key collocation. First, what is the relation of the past, present and future in the language of socialism? Second, what are the key differences between socialist temporality and the non-socialist temporalities of the time? Third, how did the actual revolutionary moments of 1905 and 1917 affect socialist temporality, and vice versa: did the revolution of time in the consciousness of the people lead to the time of revolution in Finland? The hypothesis is that identifying the changes in temporality will improve our historical understanding of the political ruptures in Finland in the early twentieth century. The results will be compared to Reinhart Koselleck’s theory of the ‘temporalization of concepts’ – expectations towards the future supersede experiences of the past in modernity – and to Benedict Anderson’s theory of ‘imagined communities’, which suggests that the advance of print capitalism transformed temporality from vertical to horizontal. The paper forms part of my ongoing dissertation project, which merges close reading of archival sources with computational distant reading of digital materials, producing a macro-scale picture of the political language of Finnish socialism.


12:15pm - 12:30pm
Short Paper (10+5min) [abstract]

Two cases of meaning change in Finnish newspapers, 1820-1910

Antti Kanner

University of Helsinki,

In Finland, the 19th century saw the formation of a number of state institutions that came to define the political life of the Grand Duchy and of the subsequent independent republic. Alongside legal, political, economic and social institutions and organisations, Modern Finnish, as an institutionally standardised language, can in this context be seen as one of these institutions. As the majority of residents of Finland were native speakers of Finnish dialects, adopting Finnish was necessary for the state’s purposes in extending its influence within the borders of the autonomous Grand Duchy. The widening domains of use of Finnish obviously also played an important role in the development of Finnish national identity. In the last quarter of the 19th century Finnish started to gain ground as the language of administrative, legal and political discourses alongside Swedish. It is in this period that we find the crucial conceptual processes that shaped Finnish political history well into the 20th century.

In this paper I will present two related case studies from my doctoral research, in which I seek to understand the semantic similarity scores of so-called Semantic Vector Spaces obtained from large historical corpora in terms of linguistic semantics. As historical corpora are collections of past speech acts, the view they provide of the changing meanings of words is as much influenced by pragmatic factors and writers’ intentions as by synchronic semantics. Understanding and explicating the historical context of observed processes is essential when studying temporal dynamics in semantic change. To this end, I will try to reflect on the theoretical side of my work in the light of cases of historical meaning change. My research falls under the heading of Finnish Language, but is closely related to history and computational linguistics.

The main data for my research comes from the National Library of Finland’s Newspaper Collection, which I use via the KORP service API provided by the Language Bank of Finland. The collection accessible via the API contains nearly all newspapers and periodicals published in Finland from 1771 to 1910. The collection is, however, very heterogeneous, as the press and other forms of printed public discourse in Finnish only developed in Finland during the 19th century. Historical variation in conventions of typesetting, editing and orthography, as well as in the paper quality used for printing, makes it very difficult for OCR systems to recognize characters with full accuracy. Kettunen et al. (2014) estimated that OCR accuracy is actually somewhere between 60 and 80 percent. However, not all problems in the automatic recognition of the data stem from OCR or even from historical spelling variation. Much is also due to linguistic factors: the 19th century saw large-scale dialectal, orthographical and lexical variation in written Finnish. To exemplify the scale of variation: when a morphological analyser for Modern Finnish (OMORFI, Pirinen 2015) was used, it could only parse around 60 percent of the wordlist of the Corpus of Early Modern Finnish (CEMF).

Because of the unreliability of the results from the automated parser and the temporal heterogeneity inherent in the data, conducting the study with a methodology robust to these kinds of problems poses a challenge. The approach chosen was to use a number of analyses and to see whether their results could be combined into a coherent view of the historical change in word use. In addition, simpler and more robust analyses were chosen over more advanced and elaborate ones. For example, an analysis similar to topic modelling was conducted using second-order collocations (Bertels & Speelman 2014; Heylen, Wielfaerts, Speelman & Geeraerts 2014) instead of algorithms like LDA (Blei, Ng & Jordan 2003) that are widely used for this purpose. This was because the data contains a highly inflated count of individual types and lemmas resulting from the problems with OCR and morphological analysis. It seemed that, in this specific case at least, LDA was not able to produce sensible topics because the number of hapax legomena per text was so high. The analysis based on second-order collocations did not aim at producing a model of a system of topics, as LDA does, but simply at clustering the studied word’s collocates based on their respective similarities. Likewise, when tracking changes in words’ syntactic positioning tendencies, simple morphological case distributions were used instead of resource-intensive syntactic parsing, which is also sensitive to errors in the data. When the task is to track signals of change, morphological case distributions can serve as sufficient proxies for dependency distributions. This is possible because case selection in Finnish is mostly governed by syntax: cases express syntactic relations between, for example, the constituents of a nominal phrase or between a predicate verb and its arguments (Vilkuna 1989).
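To make the second-order collocation idea concrete, here is a minimal sketch of how a target word’s collocates can be described by their own co-occurrence profiles and compared pairwise. The toy sentences, the words, and the similarity printout are invented for illustration only; they are not the study’s data, corpus interface, or code.

```python
from collections import Counter
import math

# Hypothetical toy "corpus": tokenised sentences standing in for newspaper text.
sentences = [
    ["maaseutu", "vilja", "talo", "pelto"],
    ["maaseutu", "pelto", "vilja"],
    ["maaseutu", "kaupunki", "rautatie"],
    ["kaupunki", "rautatie", "tehdas"],
]

target = "maaseutu"

# First-order collocates: words co-occurring with the target in a sentence.
first_order = Counter()
for s in sentences:
    if target in s:
        first_order.update(w for w in s if w != target)

def profile(word):
    """Second-order profile: the word's own co-occurrence counts over the corpus."""
    vec = Counter()
    for s in sentences:
        if word in s:
            vec.update(w for w in s if w != word)
    return vec

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a)  # Counter returns 0 for missing keys
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

vectors = {w: profile(w) for w in first_order}

# Collocates with similar profiles would fall into the same context-word cluster;
# here we only print the pairwise similarities a clustering step would consume.
for w1 in sorted(vectors):
    for w2 in sorted(vectors):
        if w1 < w2:
            print(w1, w2, round(cosine(vectors[w1], vectors[w2]), 2))
```

An agglomerative clustering over these pairwise similarities would then yield the context-word clusters used in the case studies, and morphological case distributions could be tabulated with the same Counter-based approach.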

The first of my case studies focuses on the Finnish word maaseutu. Maaseutu denotes a rural area, but in Modern Finnish it is mostly used as a collective singular referring to the rural as a whole. It is most commonly used as an opposite of the urban, which is often lexicalised as kaupunki, the city, used in a similar collective meaning. After its introduction to Finnish in the 1830s, however, maaseutu was used in a variety of related meanings, mostly referring to specific rural areas or communities, until the turn of the century, by which time the collective singular sense had become dominant. Starting roughly from the 1870s, there seems to have been a period of contesting uses; at that time we find a number of vague cases in which the generic or collective and the specific meanings overlap.

Combining the results of my analyses with newspaper metadata yields a picture of a dynamic situation. The emergence of the collective singular stands out clearly and is connected to an accompanying discourse negotiating urban-rural relations on a national instead of a regional level. This change can be pinpointed quite precisely to the 1870s and to newspapers with geographically wider circulation and a more national identity.

The second word of interest is vaivainen, an adjective referring to a person or a thing that is either of wretched or inadequate quality or suffering from a physical or mental ailment. When used as a noun, it refers to a person of very low and excluded social status and extreme poverty. In this meaning the word appears in Modern Finnish mostly in poetically archaic or historical contexts, and it had already disappeared from the vocabulary of social policy and social legislation by the early 20th century. The word has a biblical background, being used in older Finnish Bible translations, for example in the Sermon on the Mount (as the equivalent of ‘poor’ in Matt. 5:3, “blessed are the poor in spirit”), and as such it was a natural choice for naming the recipients of church charities. When the state poverty relief system started to take form in the mid 19th century, it was built on top of the earlier church organizations (Von Aerschot 1996), and the church terminology was carried over to the state institutions.

When the contexts of the word are tracked over the 19th century using context-word clusters based on second-order collocations, two clear discursive trends appear. First, the poverty relief discourse, already pronounced in the data in the 1860s, disperses into a complex network of different topics and discursive patterns: as the state-run poverty relief institutions become more complex and more efficiently administered, the moral foundations of the whole enterprise are discussed alongside reports of the everyday comings and goings of individual institutions or, indeed, tales of individual relief recipients’ fortunes. The other trend involves the presence of religious or spiritual discourse, which, contrary to preliminary assumptions, does not wane into the background but experiences a strong surge in the 1870s and 1880s. This can be explained in part by the growth of revivalist Christian publications in the National Library Corpus, but also by the intrusion of Christian connotations into the political discussion on the poverty relief system. It is as if the word vaivainen functions as a kind of lightning rod conducting Christian morality into the public poverty relief discourse.

While the methodological contributions of this paper are not highly ambitious in terms of the language technology or computational algorithms used, the selection of analyses presents an innovative approach to Digital Humanities. The aim here has been to combine not just one but an array of simple and robust methods from computational linguistics with theoretical background and analytical concepts from lexical semantics. I argue that the robustness and simplicity of the methods make the overall workflow more transparent, and this transparency makes it easier to interpret the results in a wider historical context. This allows one to ask questions whose relevance is not confined to computational linguistics or lexical semantics, but extends to wider areas of Humanities scholarship. This shared relevance of questions and answers, to my understanding, lies at the core of Digital Humanities.

References

Bertels, A. & Speelman, D. (2014). “Clustering for semantic purposes. Exploration of semantic similarity in a technical corpus.” Terminology 20:2, pp. 279–303. John Benjamins Publishing Company.

Blei, D., Ng, A. Y. & Jordan, M. I. (2003). “Latent Dirichlet Allocation.” Journal of Machine Learning Research 3 (4–5), pp. 993–1022.

CEMF, Corpus of Early Modern Finnish. Centre for Languages in Finland. http://kaino.kotus.fi

Heylen, C., Peirsman, Y., Geeraerts, D. & Speelman, D. (2008). “Modelling Word Similarity: An Evaluation of Automatic Synonymy Extraction Algorithms.” Proceedings of LREC 2008.

Huhtala, H. (1971). Suomen varhaispietistien ja rukoilevaisten sanankäytöstä: semanttis-aatehistoriallinen tutkimus. [On the vocabulary of the early Finnish pietist and revivalist movements.] Suomen Teologinen Kirjallisuusseura.

Kettunen, K., Honkela, T., Lindén, K., Kauppinen, P., Pääkkönen, T. & Kervinen, J. (2014). “Analyzing and Improving the Quality of a Historical News Collection using Language Technology and Statistical Machine Learning Methods.” In IFLA World Library and Information Congress Proceedings: 80th IFLA General Conference and Assembly. Lyon, France.

Pirinen, T. (2015). “Omorfi—Free and open source morphological lexical database for Finnish.” In Proceedings of the 20th Nordic Conference of Computational Linguistics NODALIDA 2015.

Vilkuna, M. (1989). Free word order in Finnish: Its syntax and discourse functions. Suomalaisen Kirjallisuuden Seura.

Von Aerschot, P. (1996). Köyhät ja laki: toimeentulotukilainsäädännön kehitys oikeudellistumisprosessien valossa. [The poor and the law: the development of Finnish welfare legislation in light of juridification processes.] Suomalainen Lakimiesyhdistys.

 
11:00am - 12:30pmT-P674-1: Place
Session Chair: Christian-Emil Smith Ore
P674 
 
11:00am - 11:30am
Long Paper (20+10min) [publication ready]

SDHK meets NER: Linking place names with medieval charters and historical maps

Olof Karsvall2, Lars Borin1

1University of Gothenburg; 2Swedish National Archives

Mass digitization of historical text sources opens new avenues for research in the humanities and social sciences, but also presents a host of new methodological challenges. Historical text collections become more accessible, but new research tools must also be put in place in order to fully exploit the new research possibilities emerging from having access to vast document collections in digital format. This paper highlights some of the conditions to consider when place names in an older source material, in this case medieval charters, are to be matched to geographical data. The Swedish National Archives make some 43,000 medieval letters available in digital form through an online search facility. The volume of the material is such that manual markup of names will not be feasible. In this paper, we present the material, discuss the promises for research of linking, e.g., place names to other digital databases, and report on an experiment where an off-the-shelf named-entity recognition system for modern Swedish is applied to this material.


11:30am - 11:45am
Distinguished Short Paper (10+5min) [publication ready]

On Modelling a Typology of Geographic Places for the Collaborative Open Data Platform histHub

Manuela Weibel, Tobias Roth

Schweizerisches Idiotikon

HistHub will be a platform for Historical Sciences providing authority records for interlinking and referencing basic entities such as persons, organisations, concepts and geographic places within an ontological framework. For the case of geographic places, a draft of a place typology is presented here. Such a typology will be needed for semantic modelling in an ontology. We propose a hierarchical two-step model of geographic place types: a more generic type remaining stable over time that will ultimately be incorporated into the ontology as the essence of the identity of a place, and a more specific type closer to the nature of the place the way it is actually perceived by humans.

Our second approach toward a place typology is decidedly bottom-up. We try to standardise the place types in our database of heterogeneous toponymic data using the place types already present, as well as textual descriptions and name matches with typed external data sources. The types used in this standardisation process are basic conceptual units that are most likely to play a role in any place typology yet to be established. Standardisation at this early stage leads to comprehensive and deep knowledge of our data, which helps us develop a good place typology.


11:45am - 12:00pm
Distinguished Short Paper (10+5min) [publication ready]

Geocoding, Publishing, and Using Historical Places and Old Maps in Linked Data Applications

Esko Ikkala1, Eero Hyvönen1,2, Jouni Tuominen1,2

1Aalto University, Semantic Computing Research Group (SeCo); 2University of Helsinki, HELDIG – Helsinki Centre for Digital Humanities

This paper presents a Linked Open Data brokering service prototype, Hipla.fi, for using and maintaining historical place gazetteers and maps based on distributed SPARQL endpoints. The service introduces several novelties: First, the service facilitates collaborative maintenance of geo-ontologies and maps in real time as a side effect of annotating contents in legacy cataloging systems. The idea is to support a collaborative ecosystem of curators that creates and maintains data about historical places and maps in a sustainable way. Second, in order to foster understanding of historical places, the places can be provided on both modern and historical maps, and with additional contextual Linked Data attached. Third, since data about historical places is typically maintained by different authorities and in different countries, the service can be used and extended in a federated fashion, by including new distributed SPARQL endpoints (or other web services with a suitable API) into the system.


12:00pm - 12:15pm
Short Paper (10+5min) [abstract]

Using ArcGIS Online and Story Maps to visualise spatial history: The case of Vyborg

Antti Härkönen

University of Eastern Finland

Historical GIS (HGIS) or spatially oriented history is a field that uses geoinformatics to look at historical phenomena from a spatial perspective. GIS tools are used to visualize, manage and analyze geographical data. However, the use of GIS tools requires some technical expertise and ready-made historical spatial data is almost non-existent, which significantly reduces the reach of HGIS. New tools should make spatially oriented history more accessible.

Esri’s ArcGIS Online (AGOL) allows publishing internet visualisations of maps and map layers created with Esri’s more traditional desktop GIS program, ArcMap. In addition, the Story Map tool allows the creation of more visually pleasing presentations using maps, text and multimedia resources. I will demonstrate the use of Story Maps to represent spatial change in the case of the city of Vyborg.

The city of Vyborg lies in Russia near the Finnish border. A small town grew near the castle founded by the Swedes in 1293. Vyborg was granted town privileges in 1403, and later in the 15th century it became one of the very few walled towns in the Kingdom of Sweden. The town was located on a hilly peninsula near the castle. Until the 17th century the town space was ‘medieval’, i.e. irregular; the town was regulated to conform to a rectangular street layout in the 1640s. I show the similarities between the old and new town plans by superimposing them on a map.

The Swedish period ended when the Russians conquered Vyborg in 1710, and Vyborg became a provincial garrison town and administrative center. Later, when Russia conquered the rest of Finland in 1809, the province of Vyborg (a.k.a. ‘Old Finland’) was added to the Autonomous Grand Duchy of Finland, a part of the Russian empire. During the 19th century Vyborg became an increasingly important trade and industrial center, and the population grew rapidly. I map the expanding urban areas using old town plans and population statistics.

Another perspective on the changing town space is the growth of fortifications around Vyborg. As the range of artillery grew, the fortifications were pushed further and further outside the original town. I use Story Maps to show the position of fortifications of different eras by placing them in the context of the terrain. I also employ viewshed analyses to show how the fortifications dominate the terrain around them.


12:15pm - 12:30pm
Short Paper (10+5min) [abstract]

Exploring Country Images in School Books: A Comparative Computational Analysis of German School Books in the 20th and the 21st Century

Kimmo Elo1, Virpi Kivioja2

1University of Helsinki; 2University of Turku

This paper is based on an ongoing PhD project entitled “An international triangle drama?”, which studies the depictions of West Germany and East Germany in Finnish geography textbooks, and the depictions of Finland in West German and East German geography textbooks, in the Cold War era. The primary source material consists of Finnish, West German, and East German geography textbooks published between 1946 and 1999.

Contrary to the traditional methods of close reading thus far applied in school book analysis, this paper presents an exploratory approach based on computational analysis of a large book corpus. The corpus consists of geography school books used in the Federal Republic of Germany between 1946 and 1999, and in the German Democratic Republic between 1946 and 1990. The corpus was created by digitising all the books and applying OCR technologies to the scanned page images. It has also been post-processed by correcting OCR errors and adding metadata.

The main aim of the paper is to extract and analyse conceptual geocollocations. Such an analysis focuses on how concepts are embedded geospatially, on the one hand, and on how geographical entities (cities, regions, etc.) are conceptually embedded, on the other. Regarding the former, the main aim is to examine and explain the geospatial distribution of terms and concepts. Regarding the latter, the main focus is on the analysis of concept collocations surrounding geographical entities.

The analysis presented in the paper consists of four steps. First, standard methods of text mining are used in order to identify geographical concepts (names of different regions, cities etc.). Second, concepts and terms in the close neighborhood of geographical terms are tagged with geocodes. Third, network analysis is applied to create concept networks around geographical entities. And fourth, both the geotagged and network data are enriched by adding bibliographical metadata allowing comparisons over time and between countries.
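The four steps above might be sketched, under heavily simplified assumptions, roughly as follows. The tiny gazetteer, the two toy “books” with their metadata, and the window size are invented for illustration; they are not the project’s actual data, corpus, or tools.

```python
from collections import defaultdict

# Hypothetical gazetteer: place name -> (lat, lon); stands in for the
# geographic concepts identified by text mining in step 1.
gazetteer = {"Helsinki": (60.17, 24.94), "Berlin": (52.52, 13.41)}

# Toy corpus with per-book metadata (the enrichment source of step 4).
books = [
    {"country": "FRG", "year": 1965,
     "tokens": ["Helsinki", "harbour", "trade", "Berlin", "industry"]},
    {"country": "GDR", "year": 1975,
     "tokens": ["Berlin", "industry", "socialism"]},
]

WINDOW = 2  # neighbourhood size around each place name (illustrative)

# Steps 1-3: find place names, collect neighbouring terms, and build a
# concept network (edges: place <-> nearby concept, weighted by count).
edges = defaultdict(int)
geotagged = []
for book in books:
    toks = book["tokens"]
    for i, tok in enumerate(toks):
        if tok in gazetteer:
            for j in range(max(0, i - WINDOW), min(len(toks), i + WINDOW + 1)):
                if j != i and toks[j] not in gazetteer:
                    edges[(tok, toks[j])] += 1
                    # Step 2: the neighbour inherits the place's geocode;
                    # step 4: the record is enriched with book metadata.
                    geotagged.append({"term": toks[j],
                                      "coords": gazetteer[tok],
                                      "country": book["country"],
                                      "year": book["year"]})

print(dict(edges))
```

In a real pipeline, the edge list of step 3 would feed a network-analysis library, and the metadata attached in step 4 would support the diachronic and cross-country comparisons described above.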

The paper adopts several methods to visualise the analytical results. Geospatial plots are used to visualise the geographical distribution of concepts and its changes over time. Network graphs are used to visualise collocation structures and their dynamics. An important function of the graphs, however, is to exemplify how graphical visualisations can be used to present historical knowledge and how they can help us to tackle change and continuity from a comparative perspective.

Concerning historical research from a more general perspective, one of the main objectives of this paper is to exemplify and discuss how computational methods could be applied to tackle research questions typical of the social sciences and historical research. The paper is motivated by the big challenge of moving away from computational history guided and limited by the tools and methods of the computational sciences, toward an understanding that computational history requires computational tools developed to find answers to questions typical of and crucial for historical research. All tools, data and methods developed during this research project will later be made available to scholars interested in similar topics, thus helping them to take advantage of this project.

 
12:30pm - 2:00pmLunch
Main Building of the University, entrance from the Senate Square side
 
2:00pm - 3:30pmT-PII-2: Cultural Heritage and Art
Session Chair: Bente Maegaard
PII 
 
2:00pm - 2:30pm
Long Paper (20+10min) [publication ready]

Cultural Heritage ‘In-the-Wild’: Considering Digital Access to Cultural Heritage in Everyday Life

David McGookin, Koray Tahiroglu, Tuomas Vaittinen, Mikko Kyto, Beatrice Monastero, Juan Carlos Vasquez

Aalto University

As digital cultural heritage applications begin to be deployed outwith ‘traditional’ heritage sites, such as museums, there is an increased need to consider their use amongst individuals who are open to learning about the heritage of a site, but where that is a clearly secondary purpose for their visit. Parks, recreational areas and the everyday built environment represent places that although rich in heritage, are often not visited primarily for that heritage. We present the results of a study of a mobile application to support accessing heritage on a Finnish recreational island. Evaluation with 45 participants, who were not there primarily to access the heritage, provided insight into how digital heritage applications can be developed for this user group. Our results showed how low immersion and lightweight interaction support individuals to integrate cultural heritage around their primary visit purpose, and although participants were willing to include heritage as part of their visit, they were not willing to be directed by it.


2:30pm - 2:45pm
Short Paper (10+5min) [publication ready]

Negative to That of Others, But Negligent of One’s Own? On Patterns in National Statistics on Cultural Heritage in Sweden

Daniel Brodén

Gothenburg University, Sweden

In 2015–2016 the Centre for Critical Heritage Studies conducted an interdisciplinary pilot project in collaboration with the SOM-institute at the University of Gothenburg. A key ambition was to demonstrate the usefulness of combining an analysis rooted in the field of critical heritage studies and a statistical perspective. The study was based on a critical discussion of the concept of cultural heritage and collected data from the nationwide SOM-surveys.

The abstract highlights some significant patterns in the SOM data from 2015 concerning differences between people in activities traditionally associated with national cultural heritage and cultural heritage institutions: 1) women are more active than men when it comes to activities related to national cultural heritage; 2) class and education are also significant factors in this context. Since these patterns have been shown in prior research, perhaps the most interesting finding is that 3) people who are negative towards immigration from ‘other’ cultures participate to a lesser extent in activities that are associated with their ‘own’ cultural heritage.


2:45pm - 3:00pm
Distinguished Short Paper (10+5min) [publication ready]

Engaging Collections and Communities: Technology and Interactivity in Museums

Paul Arthur

Edith Cowan University

Museum computing is a field with a long history, dating from at least the 1950s, that has made a substantial impact on humanities computing, now called ‘digital humanities.’ Community access, public engagement, and participation are central to the charter of most museums, and interactive displays are one strategy used to help fulfil that goal. Over the past two decades, interactive elements have been developed to offer more immersive, realistic and engaging possibilities by incorporating motion-sensing spaces, speech recognition, networked installations, eye tracking, and multitouch tables and surfaces. As museums began to experiment with digital technologies, there was an accompanying change of emphasis and policy. Museums aimed to connect themselves more consciously with popular culture by experimenting with the presentation of their collections in ways that would result in increased public appreciation and accessibility. In this paper these shifts are investigated in relation to interactive exhibits, virtual museums, the profound influence of the database, and a wider breaking down of institutional barriers and hierarchies, resulting in trends towards increasing collaboration.


3:00pm - 3:15pm
Short Paper (10+5min) [abstract]

Art of the Digital Natives and Predecessors of Post-Internet Art

Raivo Kelomees

Estonian Academy of Arts, Estonia

The new normal, the digital environment surrounding us, has in recent years surprised us, at least in the fine arts, with the internet's content returning to physical space. Whether this is due to pressure from the galleries or something else, it is in any case clearer than ever that the audience is not separable from its habitual space; there is a huge and primal demand for physical or material art.

Christiane Paul, in her article "Digital Art Now: The Evolution of the Post-Digital Age" in the "ARS17: Hello World!" exhibition catalogue, is critical of the exhibition. Her main message is that all this has been done before. In itself the statement lacks originality, but in the context of the postinternet apologists declaring the birth of a new mentality, the arrival of a new "after experiencing the internet" and "post-digital" generation, it becomes clear that it is indeed rather like shooting fish in a barrel, because art that is critical of the digital and interactive has existed since the 1990s, as have works concerned with the physicalisation of the digital experience.

The background to the exhibition is the discussion over "digitally created" art and the generation related to it. The notion of "digital natives" is related to the post-digital and post-internet generation and the notion of "post-contemporary" (i.e. art is not concerned with the contemporary but with the universal human condition). Apparently for the digital natives, the internet is not a way out of the world anymore, but an original experience in which the majority of their time is spent. At the same time, however, the internet is a natural information environment for people of all ages whose work involves data collection and intellectual work. Communication, thinking, information gathering and creation – all of these realms are related to the digital environment. These new digital nomads travel from place to place and work in a "post-studio" environment.

While digital or new media was created, stored and shared via digital means, post-digital art addresses the digital without being stored using these same means. In other words, this kind of art exists more in the physical space.

Considerable reference also exists to James Bridle's New Aesthetic concept from 2012. In short, this refers to the convergence and conjoining of the virtual and physical worlds. It manifests itself clearly even in the "pixelated" design of consumer goods, or in the oeuvre of sculptors and painters whose work has emerged from something digital. For example, the art objects by Shawn Smith and Douglas Coupland are made using pixel blocks (the sculpture by the latter is indeed reminiscent of a low-resolution digital image). Analogous works induce confusion, not to say a surprising experience, in the minds of the audience, for they bring the virtual quality of the computerised environment into physical surroundings. This makes the artworks appear odd and surreal, like some sort of mistake: errors, images and objects out of place.

The so-called postinternet generation artists are certainly not the only ones making this kind of art. As an example of this, there is a reference to the abstract stained glass collage of 11,500 pixels by Gerhard Richter in the Cologne Cathedral. It is supposed to be a reference to his 1974 painting "4096 Farben" (4096 colours), which indeed is quite similar. It is said that Richter did not accept a fee; however, the material costs were covered by donations. And yet the cardinal did not come to the opening of the glasswork, preferring depictions of Christian martyrs over abstract windows, which instead reminded him of mosques.

One could name other such examples inspired by the digital world or by schisms between the digital and physical worlds: Helmut Smits' "Dead Pixel in Google Earth" (2008); Aram Bartholl's "Map" (2006); the projects by Eva and Franco Mattes, especially the printouts of Second Life avatars from 2006; Achim Mohné's and Uta Kopp's project "Remotewords" (2007–2011), computer-based instructions printed on rooftops to be seen from Google Maps, satellites or planes. There are countless examples where it is hard to discern whether the artist is deliberately and critically minded towards digital art, or rather a representative of the post-digital generation who is not aware of, and wishes not to be part of, the history of digital art.

From the point of view of researchers of digital culture, the so-called media-archaeological direction could be added as an inspirational source for artists today. Media archaeology, the examination of previous art and cultural experience in relation to contemporary media machines and practices, signifies the exploration of earlier non-digital cultural devices, equipment, means of communication, and so on, that can be regarded as the pre-history of today's digital culture and digital devices. From this point of view, the "media-archaeological" artworks of Toshio Iwai or Bernie Lubell belong together: each has taken an earlier "media machine", or a scientific or technical device, and created a modern work on the basis of it.

Then there was the "Ars Electronica" festival (2006), which focused on the umbrella topic "Simplicity" and in a way turned its back on the "complexity" of digital art, returning to the physical space.

Therefore, in the context of digital media based art trends, the last couple of decades have seen many expressions – works, events and exhibitions – of "turning away" from the digital environment that would outwardly qualify as post-digital and postinternet art.


3:15pm - 3:30pm
Short Paper (10+5min) [abstract]

The Stanley Rhetoric: A Procedural Analysis of VR Interactions in 3D Spatial Environments of Stanley Park, BC

Raluca Fratiloiu

Okanagan College

In a seminal text on the language of new media, Manovich (2002) argued:

Traditionally, texts encoded human knowledge and memory, instructed, inspired, convinced, and seduced their readers to adopt new ideas, new ways of interpreting the world, new ideologies. In short, the printed word was linked to the art of rhetoric. While it is probably possible to invent a new rhetoric of hypermedia […] the sheer existence and popularity of hyperlinking exemplifies the continuing decline of the field of rhetoric. (Manovich, 2002)

Depending on the context of each “rhetorical situation” (Bitzer, 1968), it may be both good and bad news to think that interactivity and rhetoric might not always go hand in hand. However, despite the decline of rhetoric anticipated by Manovich (2002), in this paper we propose a closer examination of what constitutes a rhetorically effective discourse in new media in general, and in virtual reality (VR) in particular. The reason we need to examine it more closely is that VR, especially when it has an educational goal, needs to be rhetorically effective to succeed with audiences. A consideration of the rhetorical impact of VR’s affordances may enhance the potential of meaningful interactions with students and users.

In addition to a very long disciplinary history, rhetoric has been investigated in relation to new media mainly through Bogost’s (2007) concept of “procedural rhetoric”. He argued that although “rhetoric was understood as the art of oratory”, “videogames open a new domain for persuasion, thanks to their core representational mode, procedurality” (Bogost, 2007). This has implications, according to Bogost (2007), in three areas: politics, advertising and learning. Several of these implications have already been investigated. Procedural rhetorical analysis of videogames has since become a core methodological approach. Particular attention has also been paid to how new media open new possibilities through play, and how this in turn creates a renewed interest in digital rhetoric (Daniel-Wariya, 2016). At the same time, procedural rhetoric has been investigated at length in connection with learning through games (Gee, 2007). Learning has also been central in a few studies on VR in education (Dalgarno, 2010). However, specific assessments of the procedural rhetoric outcomes of particular VR educational projects are non-existent.

In this paper, we will focus on analysing procedural interactions in a VR project developed by the University of British Columbia’s Emerging Media Lab. The project, funded via an open education grant, led to the creation of a 3D spatial environment of Stanley Park, located in Vancouver, British Columbia (BCCampus, 2017). It presents Stanley Park, one of the most iconic Canadian destinations, as an experiential field trip, with educational content and 3D spatial environment models of Prospect Point, Beaver Lake, Lumberman’s Arch, and the Hollow Tree. Students will have opportunities to visit these locations in the park virtually and to interact with the environment and, remotely, with other learners. In addition, VR provides opportunities to explore the complex history of this impressive location, which was once home to the Burrard, Musqueam and Squamish First Nations peoples (City of Vancouver, 2017).

This case analysis may open up new possibilities for investigating how students and users derive meaning from interacting in these environments, and may continue a dialogue between several connected areas: education and VR, games and pedagogy, games and procedural rhetoric. We also hope to contribute this feedback to the emerging project as it continues to evolve and to share its results with the wider open education community.

References:

BCCampus. (2017, May 10). Virtual reality and augmented reality field trips funded by OER grants. Retrieved from BCCampus: https://bccampus.ca/2017/05/10/virtual-reality-and-augmented-reality-field-trips-funded-by-oer-grants/

Bitzer, L. (1968). The rhetorical situation. Philosophy and Rhetoric, 1, pp. 1-14.

Bogost, I. (2007). Persuasive Games: The Expressive Power of Videogames. Cambridge, MA: MIT Press.

City of Vancouver. (2017). The History of Stanley Park. Retrieved from City of Vancouver: http://vancouver.ca/parks-recreation-culture/stanley-park-history.aspx

Dalgarno, B. (2010). What are the learning affordances of 3-D virtual environments? British Journal of Educational Technology, 41(1), pp. 10-32.

Daniel-Wariya, J. (2016). A Language of Play: New Media’s Possibility Spaces. Computers and Composition, 40, pp 32-47.

Gee, J. P. (2007). What Video Games Have to Teach Us About Learning and Literacy. Second Edition: Revised and Updated Edition. New York: St. Martin's Griffin.

Manovich, L. (2002). The Language of New Media. Cambridge, MA: MIT Press.

 
2:00pm - 3:30pmT-PIII-2: Language Resources
Session Chair: Kaius Sinnemäki
PIII 
 
2:00pm - 2:30pm
Long Paper (20+10min) [publication ready]

Sentimentator: Gamifying Fine-grained Sentiment Annotation

Emily Sofi Öhman, Kaisla Kajava

University of Helsinki

We introduce Sentimentator, a publicly available gamified web-based annotation platform for fine-grained sentiment annotation at the sentence level. Sentimentator is unique in that it moves beyond binary classification. We use a ten-dimensional model which allows for the annotation of over 50 unique sentiments and emotions. The platform is heavily gamified, with a scoring system designed to reward players for high-quality annotations. Sentimentator introduces several unique features that have previously been unavailable, or at best very limited, for sentiment annotation. In particular, it provides streamlined multi-dimensional annotation optimized for sentence-level annotation of movie subtitles. The resulting dataset will allow new avenues to be explored, particularly in the field of digital humanities, but also in knowledge-based sentiment analysis in general. Because both the dataset and the platform will be made publicly available, they will benefit anyone interested in fine-grained sentiment analysis and emotion detection, as well as in the annotation of other datasets.


2:30pm - 2:45pm
Distinguished Short Paper (10+5min) [publication ready]

Defining a Gold Standard for a Swedish Sentiment Lexicon: Towards Higher-Yield Text Mining in the Digital Humanities

Jacobo Rouces, Lars Borin, Nina Tahmasebi, Stian Rødven Eide

University of Gothenburg

There is an increasing demand for multilingual sentiment analysis, and most work on sentiment lexicons is still carried out based on English lexicons like WordNet. In addition, many of the non-English sentiment lexicons that do exist have been compiled by (machine) translation from English resources, thereby arguably obscuring possible language-specific characteristics of sentiment-loaded vocabulary.

In this paper we describe the creation of a gold standard for the sentiment annotation of Swedish terms as a first step towards the creation of a full-fledged sentiment lexicon for Swedish -- i.e., a lexicon containing information about prior sentiment (also called polarity) values of lexical items (words or disambiguated word senses), along a scale negative--positive. We create a gold standard for sentiment annotation of Swedish terms, using the freely available SALDO lexicon and the Gigaword corpus. For this purpose, we employ a multi-stage approach combining corpus-based frequency sampling and two stages of human annotation: direct score annotation followed by Best-Worst Scaling. In addition to obtaining a gold standard, we analyze the data from our process and draw conclusions about the optimal sentiment model.
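The abstract does not specify how the Best-Worst Scaling trials are aggregated into term scores; a common choice is the counting estimator, where each term scores (times judged best minus times judged worst) divided by times shown, landing on the same negative--positive scale. The following stdlib sketch illustrates that estimator only; the trial format and the Swedish toy words are assumptions.

```python
from collections import Counter

def bws_scores(trials):
    """Counting estimator for Best-Worst Scaling: each item scores
    (#times best - #times worst) / #appearances, a value in [-1, 1]."""
    best, worst, seen = Counter(), Counter(), Counter()
    for items, chosen_best, chosen_worst in trials:
        seen.update(items)            # every shown item counts as an appearance
        best[chosen_best] += 1
        worst[chosen_worst] += 1
    return {item: (best[item] - worst[item]) / seen[item] for item in seen}

# Illustrative trials: 4-tuples shown to annotators, with their picks.
trials = [
    (("glad", "hus", "hemsk", "dålig"), "glad", "hemsk"),
    (("glad", "hus", "hemsk", "dålig"), "glad", "dålig"),
    (("glad", "hus", "hemsk", "dålig"), "hus", "hemsk"),
]
scores = bws_scores(trials)  # e.g. "glad" -> 2/3, "hemsk" -> -2/3
```

With enough overlapping trials per term, these scores converge to a stable ranking that can be compared against the direct score annotations.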


2:45pm - 3:00pm
Short Paper (10+5min) [publication ready]

The Nordic Tweet Stream: A dynamic real-time monitor corpus of big and rich language data

Mikko Laitinen1, Jonas Lundberg2, Magnus Levin2, Rafael Martins2

1University of Eastern Finland; 2Linnaeus University

This article presents the Nordic Tweet Stream (NTS), a cross-disciplinary corpus project of computer scientists and a group of sociolinguists interested in language variability and in the global spread of English. Our research integrates two types of empirical data: we not only rely on traditional structured corpus data but also use unstructured data sources that are often big and rich in metadata, such as Twitter streams. The NTS downloads tweets and associated metadata from Denmark, Finland, Iceland, Norway and Sweden. We first introduce some technical aspects of creating a dynamic real-time monitor corpus, and the following case study illustrates how the corpus can be used as empirical evidence in sociolinguistic studies focusing on the global spread of English to multilingual settings. The results show that English is the most frequently used language, accounting for almost a third of the tweets. These results can be used to assess how widespread English use is in the Nordic region and offer a big data perspective that complements previous small-scale studies. Future objectives include annotating the material, making it available to the scholarly community, and expanding the geographic scope of the data stream outside the Nordic region.
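The headline statistic above (English accounting for almost a third of tweets) is a simple relative-frequency computation. The sketch below is not the NTS pipeline itself; it is a minimal stdlib illustration assuming each streamed tweet carries the `lang` metadata field that the Twitter API attaches, with a toy six-tweet sample.

```python
from collections import Counter

def language_shares(tweets):
    """Relative frequency of each language code in a tweet sample,
    read from the `lang` field of the tweet metadata."""
    counts = Counter(t["lang"] for t in tweets)
    total = sum(counts.values())
    return {code: n / total for code, n in counts.items()}

# Illustrative sample (the real NTS corpus is downloaded live).
sample = [{"lang": "en"}, {"lang": "sv"}, {"lang": "en"},
          {"lang": "fi"}, {"lang": "da"}, {"lang": "no"}]
shares = language_shares(sample)  # here "en" accounts for 1/3
```

On the real monitor corpus the same computation runs over millions of tweets per country, so the shares can also be tracked over time.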


3:00pm - 3:15pm
Short Paper (10+5min) [publication ready]

Best practice for digitising small-scale Digital Humanities projects

Peggy Bockwinkel, Dîlan Cakir

University of Stuttgart, Germany

Digital Humanities (DH) are growing rapidly; the necessary infrastructure is being built up gradually and slowly. For smaller DH projects, e.g. for testing methods, as preliminary work for submitting applications, or for use in teaching, a corpus often has to be digitised. These small-scale projects make an important contribution to safeguarding and making available cultural heritage, as they make it possible to machine-read those resources that are of little or no interest to large projects because they are too special or too limited in scope. They close the gap between the large scanning projects of archives, libraries or research projects, and projects that move beyond the canonised paths.

Yet these small projects can fail at this first step of digitisation, because it is often a hurdle for (Digital) Humanists at universities to get the desired texts digitised: either because the digitisation infrastructure in libraries/archives is not available (yet) or because it is a paid service. Also, researchers are often not digitising experts, and a suitable infrastructure at the university is missing.

In order to promote small DH projects for teaching purposes, a digitising infrastructure was set up at the University of Stuttgart as part of a teaching project. It should enable teachers to digitise smaller corpora autonomously.

This article presents a study that was carried out as part of this teaching project. It suggests how to implement best practices and which aspects of the digitisation workflow need special attention.

The target group of this article are (Digital) Humanists who want to digitise a smaller corpus. Even with no expertise in scanning and OCR and no possibility to outsource the digitisation, they would still like to obtain the best possible machine-readable files.


3:15pm - 3:30pm
Distinguished Short Paper (10+5min) [publication ready]

Creating and using ground truth OCR sample data for Finnish historical newspapers and journals

Kimmo Kettunen, Jukka Kervinen, Mika Koistinen

University of Helsinki, Finland

The National Library of Finland (NLF) has digitized historical newspapers, journals and ephemera published in Finland since the late 1990s. The present collection consists of about 12 million pages mainly in Finnish and Swedish. Out of these about 5.1 million pages are freely available on the web site digi.kansalliskirjasto.fi. The copyright restricted part of the collection can be used at six legal deposit libraries in different parts of Finland. The time period of the open collection is from 1771 to 1920. The last ten years, 1911–1920, were opened in February 2017.

This paper presents the ground truth Optical Character Recognition (OCR) data of about 500,000 Finnish words that has been compiled at the NLF for the development of a new OCR process for the collection. We discuss the compilation of the data and show basic results of the new OCR process in comparison to the current OCR, using the ground truth data.
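Comparing an OCR process against ground truth typically reduces to aligning the two word sequences and counting matches. The paper does not give its exact evaluation formula, so the following is only a minimal stdlib sketch of word-level accuracy using `difflib` for the alignment; the Finnish sample line is an illustrative assumption.

```python
import difflib

def word_accuracy(ground_truth, ocr_output):
    """Word-level OCR accuracy: aligned matching words divided by the
    number of ground-truth words. difflib's alignment accounts for
    insertions, deletions and substitutions in the OCR output."""
    gt_words = ground_truth.split()
    ocr_words = ocr_output.split()
    matcher = difflib.SequenceMatcher(a=gt_words, b=ocr_words, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(gt_words)

gt = "suomen kansa on herännyt uuteen aikaan"    # ground truth line
ocr = "suomen kansa on heriinnyt uuteen aikaan"  # OCR output line
accuracy = word_accuracy(gt, ocr)  # 5 of 6 words correct
```

Character-level accuracy can be computed the same way by aligning character lists instead of word lists; production evaluations usually report both.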

 
2:00pm - 3:30pmT-PIV-2: Authorship
Session Chair: Jani Marjanen
PIV 
 
2:00pm - 2:30pm
Long Paper (20+10min) [abstract]

Extracting script features from a large corpus of handwritten documents

Lasse Mårtensson1, Anders Hast2, Ekta Vats2

1Högskolan i Gävle, Sweden; 2Uppsala universitet, Sweden

Before the advent of the printing press, the only way to create a new piece of text was to produce it by hand. The medieval text culture was almost exclusively a handwritten one, even though printing began towards the very end of the Middle Ages. As a consequence, medieval text production is very much characterized by variation of various kinds: regarding language forms, regarding spelling and regarding the shape of the script. In the current presentation, the shape of the script is in focus, an area referred to as palaeography. The introduction of computers has changed this discipline radically, as computers can handle very large amounts of data and can furthermore measure features that are difficult for a human researcher to deal with.

In the current presentation, we will demonstrate two investigations within digital palaeography, carried out on the medieval Swedish charter corpus in its entirety, to the extent that this has been digitized. The script in approximately 14 000 charters has been measured and accounted for, regarding aspects described below. The charters are primarily in Latin and Old Swedish, but there are also a few in Middle Low German. The overall purpose for the investigations is to search for script features that may be significant from the perspective of separating one scribe from another, i.e. scribal attribution. As the investigations have been done on the entire available charter corpus, it is possible to visualize how each separate charter relates to all the others, and furthermore to see how the charters may divide themselves into clusters on the basis of similarity regarding the investigated features.

The two investigations both focus on aspects that have been looked upon as significant from the perspective of scribal attribution, but that are very difficult to measure, at least with any degree of precision, without the aid of computers. One of the investigations belongs to a set of methods often referred to as Quill Features. This method focuses, as the name states, on how the scribe has moved the pen over the script surface (parchment or paper). The medieval pen, the quill, consisted of a feather that had been hardened, truncated and split at the top. This construction created variation in width in the strokes constituting the script, mainly depending on the direction in which the pen was moved, and also depending on the angle in which the scribe had held the pen. This is what this method measures: the variation between thick and thin strokes, in relation to the angle of the pen. This method has been used on medieval Swedish material before, namely a medieval Swedish manuscript (Cod. Ups. C 61, 1104 pages), but the current investigation accounts for ten times the size of the previous investigation, and furthermore, we employ a new type of evaluation (see below) of the results that to our knowledge has not been done before.

The second investigation focuses on the relations between script elements of different height, and the proportions between these. For instance, three different formations can be discerned among the vertical script elements: minims (e.g. in ‘i’, ‘n’ and ‘m’), ascenders (e.g. in ‘b’, ‘h’ and ‘k’) and descenders (e.g. in ‘p’ and ‘q’). The ascender can extend to varying degrees above the minim, and the descender to varying degrees below it, creating different proportions between the components. These measures have also been extracted from the entire available medieval Swedish charter corpus, and they display very interesting information from the perspective of scribal identity. It should be noted that the first line of a charter is often divergent from the rest of the charter in this respect, as its ascenders often extend higher than elsewhere. In a similar way, the descenders of the last line often extend further below the line than in the rest of the charter. In order for a representative measure to be gained from a charter, these two lines must be disregarded.
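The proportion measure described above reduces, once the per-line heights have been extracted from the images, to simple ratios with the first and last lines excluded. The sketch below is a minimal illustration of that aggregation step only; the field names and the measurement values are hypothetical, and the hard part (extracting minim, ascender and descender extents from the scans) is not shown.

```python
def proportion_profile(lines):
    """Mean ascender:minim and descender:minim ratios for a charter.
    The first and last lines are disregarded, since their extenders
    are systematically exaggerated, as noted above."""
    core = lines[1:-1]
    asc = sum(m["ascender"] / m["minim"] for m in core) / len(core)
    desc = sum(m["descender"] / m["minim"] for m in core) / len(core)
    return asc, desc

# Illustrative per-line height measurements (arbitrary units).
measurements = [
    {"minim": 2.0, "ascender": 5.0, "descender": 3.0},  # first line: ignored
    {"minim": 2.0, "ascender": 4.0, "descender": 2.0},
    {"minim": 2.0, "ascender": 4.0, "descender": 2.0},
    {"minim": 2.0, "ascender": 6.0, "descender": 4.0},  # last line: ignored
]
asc_ratio, desc_ratio = proportion_profile(measurements)  # (2.0, 1.0)
```

Charters can then be clustered or visualized in this two-dimensional ratio space to see which documents group together by scribal hand.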

One of the problems when investigating individual scribal habits in medieval documents is that we rarely know for certain who produced them, which makes evaluation difficult. In most cases, the scribe of a given document is identified through a process of scribal attribution, usually based on palaeographical and linguistic evidence. In an investigation of individual scribal features, it is not desirable to evaluate the results on the basis of previous attributions. Ideally, the evaluation should be done on charters where the identity of the scribe can be established on external grounds, where his/her identity is in fact known. For this purpose, we have identified a set of charters where this is actually the case, namely where the scribe himself explicitly states that he has held the pen (in our corpus, there are only male scribes). These charters contain a so-called scribal note, containing the formula ego X scripsi (‘I X wrote’), accompanied by a symbol unique to the specific scribe. One such scribe is Peter Tidikesson, who produced 13 charters with such a scribal note in the period 1432–1452, and another is Peter Svensson, who produced six charters in the period 1433–1453. This selection of charters is the means by which the otherwise big-data-focused, computer-aided methods can be evaluated from a qualitative perspective. This step of evaluation is crucial in order for the results to become accessible and useful for the users of the information gained.


2:30pm - 3:00pm
Long Paper (20+10min) [abstract]

Text Reuse and Eighteenth-Century Histories of England

Ville Vaara1, Aleksi Vesanto2, Mikko Tolonen1

1University of Helsinki; 2University of Turku

Introduction


What kind of history is Hume’s History of England? Is it an impartial account or is it part of a political project? To what extent was it influenced by seventeenth-century Royalist authors? These questions have been asked since the first Stuart volumes were published in the 1750s. The consensus is that Hume’s use of Royalist sources left a crucial mark on his historical project. However, as Mark Spencer notes, Hume did not only copy from Royalists or Tories. One aim of this paper is to weigh these claims against our evidence about Hume’s use of historical sources. To do this we qualified, clustered and compared 129,646 instances of text reuse in Hume’s History. Additionally, we are able to compare Hume’s History of England to other similar undertakings in the eighteenth century and get an accurate view of their composition. We aim to extend the discussion on Hume’s History by applying computational methods to understanding the writing of the history of England in the eighteenth century as a genre.

This paper contributes to the overall development of Digital Humanities by demonstrating how digital methods can help develop and move forward discussion in an existing research case. We do not limit ourselves to general method development, but rather contribute to the specific discussions on Hume’s History and the study of eighteenth-century histories.

Methods and sources


This paper advances our understanding of the composition of Hume’s History by examining the direct quotes in it, based on data in Eighteenth Century Collections Online (ECCO). It should be noted that ECCO also includes central seventeenth-century histories and other important documents reprinted later. Thus, we do not only include eighteenth-century sources but also, for example, Clarendon, Rushworth and other notable seventeenth-century historians. We compare the phenomenon of text reuse in Hume’s History to that in the works of Rapin, Guthrie and Carte, all prominent historians at the time. To our knowledge, this kind of text mining effort has not previously been done in the field of historiography.

Our base-text for Hume is the 1778 edition of History of England. For Paul de Rapin we used the 1726-32 edition of his History of England. For Thomas Carte the source was the 1747-1755 edition of his General History of England. And for William Guthrie we used the 1744-1751 edition of his History of Great Britain.

As a starting point for our analysis, we used a dataset of linked text-reuse fragments found in ECCO. The basic idea was to create a dataset that identifies similar sequences of characters (from circa 150 to more than 2000 characters each) instead of trying to match individual characters or tokens/words. This helped with the optical character recognition problems that plague ECCO. The methodology has previously been used in matching DNA sequences, where the problem of noisy data is likewise present. We further enriched the results with bibliographical metadata from the English Short Title Catalogue (ESTC). This enriching allows us to compare the publication chronology and locations, and to create rough estimates of first edition publication dates.
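The project's actual matcher is a BLAST-style aligner that tolerates the OCR noise in ECCO; as a first approximation, the core idea of finding long shared character sequences (rather than matching individual tokens) can be sketched with Python's standard `difflib`. The length threshold and the sample sentences below are illustrative assumptions, not data from the study.

```python
import difflib

def reuse_fragments(text_a, text_b, min_len=30):
    """Candidate text-reuse fragments: maximal character sequences
    shared by two texts, kept only above a minimum length so that
    short incidental overlaps are not counted as reuse."""
    matcher = difflib.SequenceMatcher(a=text_a, b=text_b, autojunk=False)
    return [text_a[m.a:m.a + m.size]
            for m in matcher.get_matching_blocks()
            if m.size >= min_len]

a = "The king, having dissolved the parliament, resolved to govern alone."
b = "Clarendon writes that the king, having dissolved the parliament, withdrew."
fragments = reuse_fragments(a, b, min_len=20)
```

Unlike this exact-match sketch, the sequence-alignment approach used in the project also bridges OCR misreadings inside a fragment, which is why it scales to a corpus like ECCO.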

There is no ready-to-use gold standard for text reuse cluster detection. Therefore, we compared our clusters with the critical edition of the Enquiry Concerning Human Understanding (EHU) to see whether the text reuse cases of Hume’s Treatise in the EHU are also identified by our method. The results show that we were able to identify all cases included in the EHU except those in footnotes. Because some of the changes that Hume made from the Treatise to the EHU are not obvious, this is a very promising result.

Analysis


To give a general overview of Hume’s History in relation to the other works considered, we compared their respective volumes of source text reuse (figure 1). The comparison reveals some fundamental stylistic and structural differences. Hume’s and Carte’s Histories are composed quite differently from Rapin’s and Guthrie’s, which have roughly three times more reused fragments: Rapin typically opens a chapter with a long quote from a source document, and then moves on to discuss the related historical events. Guthrie writes similarly, quoting long passages from sources of his choice. Hume is different: his quotes are more evenly spread, and a greater proportion of the text seems to be his own original formulation.

[Figure 1.]

Change in text reuse in the Histories


All the histories of England considered in our analysis are massive works, comprising multiple separate volumes. The amount of reused text fragments found in these volumes differs significantly, but the trends are roughly similar. The common overall feature is a rise in the frequency of direct quotes in later volumes.

The increase in text reuse peaks in the volumes covering the reign of Charles I and the events of the English Civil War, but for both Hume and Rapin (figures 2 & 3) the highest peak is not at the end of Charles’s reign but in the lead-up to the confrontation with parliament. In Guthrie and Carte (figures 4 & 5) the peaks are located in the final volume. Except for Guthrie, all the other historical works considered here have their highest reuse rates around the period of Charles I’s reign, which was an intensely debated topic among Hume’s contemporaries.

[Figure 2.]

[Figure 3.]

[Figures 4, 5.]

We can further break down the sources of the reused text fragments by the political affiliation of their authors (figure 6). A significant portion of the detected text reuse cases in Hume link to authors with no strong political leaning in the wider Whig-Tory context. It is obvious that serious antiquarian work that is politically neutral forms the main body of seventeenth-century historiography in England. In the later volumes, the amount of text reuse cases tracing back to authors with a political affiliation increases, as might be expected with more heavily politically loaded topics.

[Figure 6.]

Taking an overview of the authors of the text reuse fragments in Hume’s History (figure 7), we note that the statistics are dominated by a handful of writers, with a long “tail” of others whose use is limited to a few fragments. Both groups, the Whig and the Tory authors, feature a few “main sources” for Hume. John Rushworth (1612-1690) emerges as the most influential source, followed closely by Edward Hyde, Earl of Clarendon (1609-1674). Both Rushworth and Clarendon had reached positions of prominence as historians and were among the best known and most respected sources available when Hume was writing his own work. We might even question whether their use was politically colored at all, as practically everyone used their works, regardless of political stance.

[Figure 7.]

Charles I execution and Hume’s impartiality


A relatively limited list of authors is responsible for the majority of the text fragments in Hume’s History. As one might intuitively expect, the use of particular authors is concentrated in particular chapters. In general, the unevenness in the use of quotes can be seen as more of a norm than an exception.

However, there is at least one central chapter in Hume’s Stuart history that breaks this pattern: Chapter LIX, perhaps the most famous chapter in the whole work, covering the execution of Charles I. Nineteenth-century Whig commentators argued, with great enthusiasm, that Hume’s use of sources, especially in this particular chapter, and his description of Charles’s execution followed Royalist sources, and the Jacobite Thomas Carte in particular. The more carefully balanced use of sources in this particular chapter, however, reveals a clear intention of wanting to be (or to appear to be) impartial on this specific topic (figure 8).

Of course, there is John Stuart Mill’s claim that Hume only uses Whigs when they support his Royalist bias. In the light of our data, this seems unlikely. If we compare Hume’s use of Royalist sources in his treatment of the execution of Charles I to Carte’s, Carte’s use of Royalists is, statistically, off the chart, whereas Hume’s is aligned with his use of Tory sources elsewhere in the volume.

[Figure 8.]

Hume’s influence on later Histories


A final area of interest in terms of text reuse is what it can tell us about an author’s influence on later writers. The reuse totals of Hume’s History in works following its publication are surprisingly evenly spread out over all the volumes (figure 9), and in this respect differ from those of the other historians considered here (figures 10 - 12). The only exception is the last volume, where the drop in the amount of detected reuse fragments can be considered significant.

Of all the authors, only Hume shows significant reuse arising from the volumes discussing the Civil War. The reception of Hume’s first Stuart volume, the first published volume of his History, is well known. It is notable that the volumes published next, that is the following Stuart volumes, possibly written with the angry reception of the first Stuart volume in mind, are the ones that seem to have given rise to the least discussion.

[Figure 9.]

[Figure 10.]

[Figures 11 & 12.]

Bibliography


Original sources


Eighteenth-century Collections Online (GALE)

English Short-Title Catalogue (British Library)

Thomas Carte, General History of England, 4 vols., 1747-1755.

William Guthrie, History of Great Britain, 3 vols., 1744-1751.

David Hume, History of England, 8 vols., 1778.

David Hume, Enquiry concerning Human Understanding, ed. Tom L. Beauchamp, OUP, 2000.

Paul de Rapin, History of England, 15 vols., 1726-32.

Secondary sources


Herbert Butterfield, The Englishman and his history, 1944.

John Burrow, Whigs and Liberals: Continuity and Change in English Political Thought, 1988.

Duncan Forbes, Hume’s Philosophical Politics, Cambridge, 1975.

James Harris, Hume. An intellectual biography, 2015.

Colin Kidd, Subverting Scotland's Past. Scottish Whig Historians and the Creation of an Anglo-British Identity 1689–1830, Cambridge, 1993.

Royce MacGillivray, ‘Hume's "Toryism" and the Sources for his Narrative of the Great Rebellion’, Dalhousie Review, 56, 1987, pp. 682-6.

John Stuart Mill, ‘Brodie’s History of the British Empire’, Robson et al. ed. Collected works, vol. 6, pp. 3-58. (http://oll.libertyfund.org/titles/mill-the-collected-works-of-john-stuart-mill-volume-vi-essays-on-england-ireland-and-the-empire)

Ernest Mossner, ‘Was Hume a Tory Historian?’, Journal of the History of Ideas, 2, 1941, pp. 225-236.

Karen O’Brien, Narratives of Enlightenment: Cosmopolitan History from Voltaire to Gibbon, CUP, 1997.

Laird Okie, ‘Ideology and Partiality in David Hume's History of England’, Hume Studies, vol. 11, 1985, pp. 1-32.

Frances Palgrave, ‘Hume and his influence upon History’ in vol. 9 of Collected Historical Works, ed. R. H. Inglis Palgrave, 10 vols. CUP, 1919-22.

John Pocock, Barbarism and religion, vols. 1-2.

B. A. Ring, ‘David Hume: Historian or Tory Hack?’, North Dakota Quarterly, 1968, pp. 50-59.

Claudia Schmidt, Reason in history, 2010.

Mark Spencer, ‘David Hume, Philosophical Historian: “contemptible Thief” or “honest and industrious Manufacturer”?’, Hume conference, Brown, 2017.

Vesanto, Nivala, Salakoski, Salmi & Ginter, ‘A System for Identifying and Exploring Text Repetition in Large Historical Document Corpora’, Proceedings of the 21st Nordic Conference on Computational Linguistics, NoDaLiDa, 22–24 May 2017, Gothenburg, Sweden. (http://www.ep.liu.se/ecp/131/049/ecp17131049.pdf)


3:00pm - 3:30pm
Long Paper (20+10min) [abstract]

Refutatio errorum – authorship attribution on a late-medieval antiheretical treatise

Reima Välimäki

University of Turku, Cultural history


Since Peter Biller’s attribution of the Cum dormirent homines (1395) to Petrus Zwicker, perhaps the most important late medieval inquisitor prosecuting Waldensians, the treatise has become a standard source on late medieval German Waldensianism. There is, however, another treatise, known as the Refutatio errorum, which has gained far less attention. In my dissertation (2016) I proposed that the similarities in style, contents, manuscript tradition and composition of the Refutatio errorum and the Cum dormirent homines are so remarkable that Petrus Zwicker can be confirmed as the author of both texts. The Refutatio exists in four different redactions. However, the redaction edited by J. Gretser in the 17th century, and consequently used by modern scholars, does not correspond to the earlier and more popular redaction found in the majority of the preserved manuscripts.

In the proposed paper I will add a new element of verification to Zwicker’s authorship: machine-learning-based computational authorship attribution, applied in the digital humanities consortium Profiling Premodern Authors (University of Turku, 2016–2019). In its simplest form, authorship attribution is a binary classification task based on textual features (word uni- and bigrams, character n-grams). In our case, the classes are “Petrus Zwicker” (based on features from his known treatise) and “not-Zwicker”, based on features from a background corpus consisting of medieval Latin polemical treatises, sermons and other theological works. The test cases are the four redactions of the Refutatio errorum. The classifiers used include a linear Support Vector Machine and a more complex Convolutional Neural Network. Researchers from the Turku NLP group (Aleksi Vesanto, Filip Ginter, Sampo Pyysalo) are responsible for the computational analysis.
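The project's classifiers are a linear SVM and a CNN over the features named above; as a stdlib-only illustration of the feature side, the sketch below builds character trigram profiles and makes the binary Zwicker/not-Zwicker call by cosine similarity. This nearest-profile comparison is a stand-in for the real classifiers, and the short Latin snippets are toy assumptions, not the actual corpora.

```python
import math
from collections import Counter

def char_ngrams(text, n=3):
    """Character n-gram counts, the feature type used in the task."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(p, q):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(c * q[g] for g, c in p.items())
    norm = (math.sqrt(sum(c * c for c in p.values()))
            * math.sqrt(sum(c * c for c in q.values())))
    return dot / norm if norm else 0.0

def attribute(disputed, zwicker_corpus, background_corpus):
    """Binary decision: is the disputed text closer to the profile
    built from Zwicker's known treatise or to the background profile?"""
    d = char_ngrams(disputed)
    z = cosine(d, char_ngrams(zwicker_corpus))
    b = cosine(d, char_ngrams(background_corpus))
    return "Petrus Zwicker" if z > b else "not-Zwicker"

# Toy Latin snippets standing in for the real training corpora.
zwicker = "cum dormirent homines venit inimicus eius et superseminavit zizania"
background = "in principio erat verbum et verbum erat apud deum et deus erat verbum"
```

An SVM trained on such n-gram vectors additionally learns which features discriminate, rather than weighting all shared n-grams equally as cosine similarity does.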

The paper contributes to the conference theme History. It aims to bridge the gap between authorship attribution based on qualitative analysis (e.g. contents, manuscript tradition, codicological features, palaeography) and computational stylometry. Computational methods are treated as one tool that contributes to the difficult task of recognising authorship in a medieval text. The study of author profiles of four different redactions of a single work contributes to the discussions on scribes, secretaries and compilers as authors of medieval texts (e.g. Reiter 1996, Minnis 2006, Connolly 2011, Kwakkel 2012, De Gussem 2017).

Bibliography:

Biller, Peter. “The Anti-Waldensian Treatise Cum Dormirent Homines of 1395 and its Author.” In The Waldenses, 1170-1530: Between a Religious Order and a Church, 237–69. Variorum Collected Studies Series. Aldershot: Ashgate, 2001.

Connolly, Margaret. “Compiling the Book.” In The Production of Books in England 1350-1500, edited by Alexandra Gillespie and Daniel Wakelin, 129–49. Cambridge Studies in Palaeography and Codicology 14. Cambridge ; New York: Cambridge University Press, 2011.

De Gussem, Jeroen. “Bernard of Clairvaux and Nicholas of Montiéramey: Tracing the Secretarial Trail with Computational Stylistics.” Speculum 92, no. S1 (2017): S190–225. https://doi.org/10.1086/694188.

Kwakkel, Erik. “Late Medieval Text Collections. A Codicological Typology Based on Single-Author Manuscripts.” In Author Reader Book: Medieval Authorship in Theory and Practice, edited by Stephen Partridge and Erik Kwakkel, 56–79. Toronto: University of Toronto Press, 2012.

Reiter, Eric H. “The Reader as Author of the User-Produced Manuscript: Reading and Rewriting Popular Latin Theology in the Late Middle Ages.” Viator 27, no. 1 (1996): 151–70. https://doi.org/10.1484/J.VIATOR.2.301125.

Minnis, A. J. “Nolens Auctor Sed Compilator Reputari: The Late-Medieval Discourse of Compilation.” In La Méthode Critique Au Moyen Âge, edited by Mireille Chazan and Gilbert Dahan, 47–63. Bibliothèque d’histoire Culturelle Du Moyen âge 3. Turnhout: Brepols, 2006.

Välimäki, Reima. “The Awakener of Sleeping Men. Inquisitor Petrus Zwicker, the Waldenses, and the Retheologisation of Heresy in Late Medieval Germany.” PhD Thesis, University of Turku, 2016.

 
2:00pm - 3:30pmT-P674-2: Crowdsourcing and Collaboration
Session Chair: Hannu Salmi
P674 
 
2:00pm - 2:30pm
Long Paper (20+10min) [abstract]

From crowdsourcing cultural heritage to citizen science: how the Danish National Archives 25-year old transcription project is meeting digital historians

Barbara Revuelta-Eugercios1,2, Nanna Floor Clausen1, Katrine Tovgaard-Olsen1

1Rigsarkivet (Danish National Archives); 2Saxo Institute, University of Copenhagen

The Danish National Archives have the oldest crowdsourcing project in Denmark, with more than 25 million records transcribed that illuminate the lives and deaths of Danes since the early 18th century. Until now, the main group interested in creating and using these resources has been amateur historians and genealogists. However, it has become clear that the material also holds immense value for historians armed with the new digital methods. The rise of citizen science projects likewise shows an alternative way, with clear research purposes, of using crowdsourced cultural heritage material. How can the traditional crowd-centered approach of the existing projects, to the extent that we can talk about co-creation, be reconciled with the narrowly defined research questions and methodological decisions that researchers require? How can the use of these materials by digital historians be increased without losing the projects’ core users?

This article articulates how the Danish National Archives are answering these questions. In the first section, we discuss the tensions and problems of combining crowdsourcing digital heritage and citizen science; in the second, the implications of the crowd-centered nature of the project in the incorporation of research interests; and in the third one, we present the obstacles and solutions put in place to successfully attract digital historians to work on this material.

Crowdsourcing cultural heritage: for the public and for the humanists

In recent decades, GLAMs (galleries, libraries, archives and museums) have embarked on digitization projects to broaden the access, dissemination and appeal of their collections, as well as to enrich them in different ways (tagging, transcribing, etc.), as part of their institutional missions. Many of these efforts have included audience or community participation, which can be loosely defined as crowdsourcing, or as activities that predate or conform to the standard definition of crowdsourcing, taking Howe’s (2006) business-related definition as “the act of taking a job traditionally performed by a designated agent (usually an employee) and outsourcing it to an undefined, generally large group of people in the form of an open call” (Ridge 2014). However, the key feature that differentiates these cultural heritage crowdsourcing projects is that the work the crowd performs has never been undertaken by employees. Instead, volunteers co-create new ways for the collections to be made available, disseminated, interpreted, enriched and enjoyed that could never have been paid for within institutional budgets.

These projects often feature “the crowd” at both ends of the process: volunteers contribute to improving access to and availability of the collections (transcribing records, letters and menus, tagging images, digitizing new material, etc.), which in turn benefits the general public from which the volunteers are drawn. In the process, access to digital cultural heritage material is democratized and facilitated. As a knock-on effect, the research community can also benefit, as the new materials open up possibilities for researchers in the digital humanities; generally financially limited humanities projects could never achieve the transcription of millions of records on their own.

At the same time, there has been a strand of academic applications of crowdsourcing in humanities projects (Dunn and Hedges 2014). These initiatives fall within so-called citizen science: projects driven by researchers and narrowly defined to answer a research question, so that the tasks performed by volunteers are aligned with a research purpose. Citizen science, or public participation in scientific research, which emerged out of natural science projects in the mid-1990s (Bonney et al 2009), has branched out to meet the humanities, building on a similar use of the crowd, i.e. institutional digitization projects of cultural heritage material. In particular, archival material has been a rich source for such endeavours: weather observations from ship logs in Old Weather (Blaser 2014), Bentham’s works in Transcribe Bentham (Causer & Terras 2014), or restaurant menus in What’s on the Menu (2014). While some of these have been carried out in cooperation with the GLAMs responsible for the collections, the new opportunities opened up for the digital humanities also allow such projects to be carried out by researchers independently of the institutions that host the collections, missing a great opportunity to combine interests and avoid duplicating work.

Successfully bringing a project to contribute to both crowdsourced cultural heritage and citizen science faces many challenges. First, a collaboration needs to be established across at least two institutional settings, a GLAM and a research institution, that have very different institutional aims, funding, cultures and legal frameworks. GLAMs’ foundational missions often relate to serving the public in general first, the research community being only a tiny percentage of their users. Any institutional research they undertake on the collections is restricted to particular areas or aspects of the collections and institutional interest, which, on the other hand, is less dependent on external funding. The world of academia, by contrast, has a freer approach to formulating research questions but is often staffed with short-term positions and projects, and faces time constraints, a need for immediate publication and the ever-present demand to prove originality and innovation.

Additionally, when moving from cultural heritage dissemination to research applications, a wide set of issues also comes into view in these crowdsourcing efforts that can determine their development and success: the boundaries between professional and lay expertise, the balance of power in the collaboration between the public, institutions and researchers, ethical concerns in relation to data quality and data ownership, etc. (Riesch and Potter 2014, Shirk et al 2012).

The Danish National Archives’ 25-year-old crowd-centered approach

In this context, the Danish National Archives are dealing with the challenge of how to incorporate a more citizen-science-oriented approach and attract historians (and digital humanists) to work with the existing digitized sources while maintaining their commitment to the volunteers. This challenge is particularly difficult in this case because not only do the interests of the archives and researchers need to align, but so do those of the “crowd” itself, as volunteers have played a major role in co-creating the crowdsourcing effort for 25 years.

The original project, now the Danish Demographic Database, DDD (www.ddd.dda.dk), is the oldest “crowdsourcing project” in the country. It started in 1992 thanks to the interest of the genealogical communities in coordinating the transcription of historical censuses and church books (Clausen & Jørgensen 2000). From its beginning, the volunteers were actively involved in the decision-making process about what was to be done and how, while the Danish National Archives (Rigsarkivet) were in charge of coordination, management and dissemination. Thus, there has been a dual governance of the project and a continuous negotiation of priorities in the form of a coordination committee, which combines members of the public and genealogical societies as well as Rigsarkivet personnel.

This tradition of co-creation has shaped the current state of the project and its relationship to research. The subsequent Crowdsourcing portal, CS (https://cs.sa.dk/), which started in 2014 with an online interface, broadened the sources under transcription and the engagement with volunteers (in photographing, counselling, etc.), and maintains a strong philosophy of serving the volunteers’ wishes and interests rather than imposing particular lines. Crowdsourcing is seen as more than a framework for creating content: it is also a form of engagement with the collections that benefits both audiences and the archive. However, the portal has also introduced some citizen science projects, in which the transcriptions are intended to be used for research (e.g. the Criminality History project).

Digital history from the crowdsourced material: present and future

In spite of the largely crowd-oriented nature of this crowdsourcing project, there were also broad research interests (if not a clearly defined research project) behind the birth of the DDD, so the decisions taken in its setup ensured that the data was suitable for research. Dozens of projects and publications have made use of it, applying new digital history methods, and the data has been included in international efforts such as the North Atlantic Population Project (NAPP.org).

However, while amply known in genealogist and amateur historian circles, the Danish National Archives’ large crowdsourcing projects are still either unknown to or underused by historians and students in the country. Some of the reasons are related to field-specific developments, but one of the key constraints on wider use is, undoubtedly, the lack of adequate training. There is no systematic training in dealing with historical data or digital methods in History degrees, even as we are witnessing a clear rise of the digital humanities.

In this context, the Danish National Archives are trying to put their material into the hands of more digital historians, building bridges to the Danish universities by multiple means: collaboration with universities in seeking joint research projects and applications (the SHIP and Link Lives projects); active dissemination of the material for educational purposes across disciplines (the Supercomputer Challenge at the University of Southern Denmark); addressing students’ and researchers’ lack of training and familiarity with the material through targeted workshops and courses, including training in digital history methods (Rigsarkivets Digital History Labs); and promotion of an open dialogue with researchers to identify more sources that could combine the aims of access democratization and citizen science.

References

Blaser, L., 2014 “Old Weather: approaching collections from a different angle” in Ridge (ed) Crowdsourcing our Cultural Heritage, Ashgate, 45-56.

Bonney et al. 2009. Public Participation in Scientific Research: Defining the Field and Assessing Its Potential for Informal Science Education. Center for Advancement of Informal Science Education (CAISE), Washington, DC

Clausen, N.C. and Marker, H.J., 2000, ”The Danish Data Archive” in Hall, McCall, Thorvaldsen (eds) International Historical Microdata for Population Research, Minnesota Population Center, Minneapolis, Minnesota, 79-92.

Causer, T. and Terras, M. 2014, ”‘Many hands make light work. Many hands together make merry work’: Transcribe Bentham and crowdsourcing manuscript collections”, in Ridge (ed) Crowdsourcing our Cultural Heritage, Ashgate, 57-88.

Dunn, S. and Hedges, M. 2014, “How the crowd can surprise us: Humanities crowdsourcing and the creation of knowledge”, in Ridge (ed) Crowdsourcing our Cultural Heritage, Ashgate, 231-246.

Howe, J. 2006, “The rise of crowdsourcing”, Wired, June.

Ridge, M. 2014, “Crowdsourcing our cultural heritage: Introduction”, in Ridge (ed) Crowdsourcing our Cultural Heritage, Ashgate, 1-16.

Riesch, H., Potter, C., 2014. Citizen science as seen by scientists: methodological, epistemological and ethical dimensions. Public Understanding of Science 23 (1), 107–120

Shirk, J.L. et al, 2012. Public participation in scientific research: a framework for deliberate design. Ecology and Society 17 (2).


2:30pm - 2:45pm
Short Paper (10+5min) [abstract]

CAWI for DH

Jānis Daugavietis, Rita Treija

Institute of Literature, Folklore and Art - University of Latvia

The survey method, which uses a questionnaire to acquire different kinds of information from a population, is an old and classic way to collect data. Examples of such surveys, like censuses or standardised agricultural records, can be traced back to ancient civilizations. The main instrument of this method is the question (closed-ended or open-ended), which should be asked in exactly the same way of all representatives of the surveyed population. Over the last 20-25 years the internet survey method (also called web, electronic, online, or CAWI [computer-assisted web interviewing]) has been well developed and is more and more frequently employed in the social sciences and in marketing research, among others. Usually CAWI is designed for acquiring quantitative data, but as in the other most used survey modes (face-to-face paper-assisted, telephone or mail interviews) it can be used to collect qualitative data, such as un- or semi-structured text/speech, pictures, sounds, etc.

In recent years DH (digital humanities) has started to use CAWI-like methodology more often. At the same time, the knowledge of humanities scholars in this field is somewhat limited (because of a lack of previous experience and, in many cases, training: humanities curricula usually do not include quantitative methods). The paper seeks to analyze the specificity of CAWI designed for the needs of DH, when the goal of interaction with respondents is to acquire primary data (e.g. questioning/interviewing them on a certain topic in order to create a new data set/collection).

Questionnaires as an approach to collecting data on traditional culture date back to an early stage of the disciplinary history of Latvian folkloristics, namely, to the end of the 19th century and the beginning of the 20th century (published by Dāvis Ozoliņš, Eduard Wolter, Pēteris Šmits, Pēteris Birkerts). The Archives of Latvian Folklore was established in 1924. Its founder and first Head, folklorist and schoolteacher Anna Bērzkalne, regularly addressed questionnaires (jautājumu lapas) on various topics of Latvian folklore to the Archives’ collaborators. She both created original sets of questions herself and translated into Latvian and adapted those by Estonian and Finnish folklore scholars (instructions for collecting children’s songs by Walter Anderson; questionnaires on folk beliefs by O. A. F. Mustonen, alias Oskar Anders Ferdinand Lönnbohm, and Viljo Johannes Mansikka). The localised equivalents were published in the press and distributed to Latvian collectors. Printed questionnaires, such as “House and Household”, “Fishing and Fish”, “Relations between Relatives and Neighbors” and others, presented sets of questions formulated in a suggestive way so that everyone who had some interest could easily engage in the work. The hand-written responses by contributors were sent to the Archives of Latvian Folklore from all regions of the country; the collection of folk beliefs in the late 1920s greatly supplemented the range of materials at the Archives.

However, the life of the survey as a method of collecting folklore in Latvia did not last long. Soon after World War II it was overtaken by the dominance of collective fieldwork and, at the end of the 20th century, by individual field research, implying mainly face-to-face qualitative interviews with informants.

Only in 2017, the Archives of Latvian Folklore revitalized the approach of remote data collecting via the online questionnaires. Within the project “Empowering knowledge society: interdisciplinary perspectives on public involvement in the production of digital cultural heritage” (funded by the European Regional Development Fund), a virtual inquiry module has been developed. The working group of virtual ethnography launched a series of online surveys aimed to study the calendric practices of individuals in the 21st century. Along with working out the iterative inquiry, data accumulation and analysis tools, the researchers have tried to find solutions to the technical and ethical challenges of our day.

Mathematics, sociology and other sciences have developed a coherent theoretical methodology and accumulated experience-based knowledge about online survey tools. That raises several questions, such as:

- How much of this knowledge is known by DH?

- How useful is this knowledge for DH? How different is DH CAWI?

- What would be the most important aspects for DH CAWI?

To answer these questions, we will make a schematic comparison of the ‘traditional’ or most common CAWI of the social sciences and that of DH, drawing on our previous work in the fields and institutions of sociology, statistics and the humanities.


2:45pm - 3:00pm
Short Paper (10+5min) [abstract]

Wikidocumentaries

Susanna Ånäs

Aalto University

Background

Wikidocumentaries is a concept for a collaborative online space for gathering, researching and remediating cultural heritage items from memory institutions, open platforms and the participants themselves. The setup brings together communities of interest and of expertise to work on shared topics with online tools. For the memory organization, Wikidocumentaries offers a platform for crowdsourcing; for amateur and expert researchers it provides peers and audiences; and from the point of view of the open environments, it acts as a site of curation.

Current environments fall short of serving this purpose. Content aggregators focus on gathering, harmonizing and serving the content. Commercial services fail to take the open and connected environment into account in their search for profit. Research environments do not prioritize public access and broad participation. Many participatory projects live short lives, from enthusiastic engagement to oblivion, due to a lack of planning for the sustainability of the results. Wikidocumentaries aims to tackle these challenges.

This short paper is a first attempt at creating an inventory of the research topics that this environment surfaces.

The topics

Technologically, the main focus of the project is investigating the use of linked open data, and especially proposing the use of Wikidata for establishing meaningful connections across collections and for the sustainability of the collected data.
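The core idea of using a shared identifier to connect collections can be sketched as follows. This is a minimal illustration only: the collection records and the QID placeholder are invented, and the real project would resolve identifiers against Wikidata itself rather than local lists.

```python
from collections import defaultdict

# Invented example records; "Q0000001" is a placeholder QID, not a real
# Wikidata item. Each institution tags its records with the QID of the
# topic (person, place, event) the record is about.
museum_photos = [{"title": "Portrait of a poet", "wikidata": "Q0000001"}]
archive_letters = [{"title": "Letter, 1897", "wikidata": "Q0000001"}]

def group_by_qid(*collections):
    """Index records from any number of collections by shared Wikidata QID."""
    index = defaultdict(list)
    for collection in collections:
        for record in collection:
            index[record["wikidata"]].append(record)
    return index

linked = group_by_qid(museum_photos, archive_letters)
# Records about the same Wikidata item now sit under one key,
# regardless of which institution they came from.
```

Because the key is a stable public identifier rather than an institution-internal one, the same grouping keeps working as new collections are added.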

Co-creation is an important topic in many senses. What are the design issues of an environment meant to encourage collaborative creative work? How can the collaboration reach out from the online environment into communities of interest in everyday life? What are the characteristics of the collaborative creations, and what kind of creative entrepreneurship can such an open environment promote? How can a community of technical contributors for the open environments be fostered and expanded?

The legislative environment sets the boundaries for this work. How will privacy and openness be balanced? Which copyright licensing schemes can encourage the widest participation? Can novel technologies of personal information management be applied to allow wider participation?

The paper will draw together recent observations from a selection of disciplines for practices in creating participatory knowledge environments.


3:00pm - 3:15pm
Short Paper (10+5min) [abstract]

Heritage Here, K-Lab and intra-agency collaboration in Norway

Vemund Olstad, Anders Olsson

Directorate for Cultural Heritage,

Introduction

This paper aims to give an overview of an ongoing collaboration between four Norwegian government agencies, by outlining its history, its goals and achievements and its current status. In doing so, we will, hopefully, be able to arrive at some conclusions about the usefulness of the collaboration itself – and whether or not anything we have learned during the collaboration can be used as a model for – or an inspiration to – other projects within the cultural heritage sector or the broader humanities environment.

First phase – “Heritage Here” 2012 – 2015

Heritage Here (or “Kultur- og naturreise” as it is known in its native Norwegian) was a national project which ran between 2012 and 2015 (http://knreise.org/index.php/english/). The project had two main objectives:

1. To help increase access to and use of public information and local knowledge about culture and nature

2. To promote the use of better quality open data.

The aim is that anyone with a smartphone can gain instant access to relevant facts and stories about their local area, wherever they might be in the country.

The project was the result of cross-agency cooperation between five agencies from three different ministries. Project partners included:

• the Norwegian Mapping Authority (Ministry of Local Government and Modernization).

• the Arts Council Norway and the National Archives (Ministry of Culture).

• the Directorate of Cultural Heritage and (until December 2014) the Norwegian Environment Agency (the Ministry of Climate and Environment).

Together, these partners made their own data digitally accessible, to be enriched, geo-tagged and disseminated in new ways. Content included information about animal and plant life, cultural heritage and historical events, and varied from factual data to personal stories. The content was collected into Norway’s national digital infrastructure ‘Norvegiana’ (http://www.norvegiana.no/), from where it can be used and developed by others through open and documented APIs to create new services for business, tourism or education. Parts of this content were also exported to the European aggregation service Europeana.eu (http://www.europeana.eu).

In 2012 and 2013 the main focus of the project was to facilitate further development of technical infrastructures - to help extract data from partner databases and other databases for mobile dissemination. However, the project also worked with local partners in three pilot areas:

• Bø and Sauherad rural municipalities in Telemark county

• The area surrounding Akerselva in Oslo

• The mountainous area of Dovre in Oppland county.

These pilots were crucial to the project, both as an arena to test the content from the various national datasets and as a testing ground for user community participation at the local and regional level. They have also been an opportunity to see Heritage Here’s work in a larger context. The Telemark pilot, for example, was used to test the cloud-based mapping tools developed in the Best Practice Network “LoCloud” (http://www.locloud.eu/), which was coordinated by the National Archives of Norway.

In addition to the previously mentioned activities, Heritage Here worked towards being a competence builder, organizing over 20 workshops on digital storytelling and geo-tagging of data, and numerous open seminars with topics ranging from open data and LOD to IPR and copyright-related issues. The project also organized Norway’s first heritage hackathon, “#hack4no”, in early 2014 (http://knreise.org/index.php/2014/02/27/hack4no-a-heritage-here-hackathon/). This first hackathon has since become an annual event, organized by one of the participating agencies (the Mapping Authority), and a great success story, with 50+ participants coming together to create new and innovative services using open public data.

Drawing on the experience gathered, the project focused its final year on developing various web-based prototypes that use a map as the user’s starting point. These demonstrate a number of approaches for visualizing and accessing different types of cultural heritage information from various open data sets, such as content related to a particular area, route or subject. These prototypes are freely and openly accessible as web tools for anyone to use (http://knreise.no/demonstratorer/). The code for the prototypes has been made openly available so it can be used by others, either as it is or as a starting point for something new.

Second phase – “K-Lab” 2016 –>

At the end of 2015, Heritage Here ended as a project, but the four remaining project partners decided to continue their digital cross-agency cooperation. So, in January 2016 a new joint initiative with the same core governmental partners was set up: Heritage Here went from being a project to being a formalized collaboration between four government agencies. This new partnership focuses on some key issues seen as crucial for the further development of the results that came out of the Heritage Here project. Among these are:

• In cooperation, develop, document and maintain robust, common and sustainable APIs for the partnership’s data and content.

• Address and discuss the need for, and potential use of, different aggregation services for this field.

• Develop and maintain plans and services for a free and open flow of open and reusable data between and from the four partner organizations.

• In cooperation with other governmental bodies organize another heritage hackathon in October 2016 with the explicit focus on open data, sharing, reuse and new and other services for both the public and the cultural heritage management sector.

• As a partnership develop skillsets, networks, arenas and competence for the employees in the four partner organizations (and beyond) within this field of expertise.

• Continue developing and strengthening partnerships on a local, national and international level through the use of open workshops, training, conferences and seminars.

• Continue to work towards improving data quality and promoting the use of open data.

One key challenge at the end of the Heritage Here project was making the transition from a project group to a more permanent organizational entity without losing key competence and experience. This was resolved by having each agency employ one person from the project and assign that person in a 50% position to the K-Lab collaboration. The remaining time was to be spent on other tasks for the agency. This helped ensure the following:

• Continuity. The same project group could continue working, albeit organized in a slightly different manner.

• Transfer of knowledge. Competence built during Heritage Here was transferred into the organizational line of the agencies involved.

• Information exchange. By having one employee from each agency meet on a regular basis, information, ideas for common projects and solutions to common problems could easily be exchanged between the collaboration partners.

In addition to the allocation of human resources, each agency chipped in roughly EUR 20,000 as ‘free funds’. The main reasoning behind this approach was to allow the new entity a certain operational freedom and room for creativity, while at the same time tying it closer to the day-to-day running of the agencies.

Based on an evaluation of the results achieved in Heritage Here, the start of 2016 was spent planning the direction forward for K-Lab, and a plan was formulated – outlining activities covering several thematic areas:

Improving data quality and accessibility. Making data available to the public was one of the primary goals of the Heritage Here project, and one of its most important outcomes was the realisation that, in all agencies involved, there is huge room for improvement in the quality of the data we make available and in how we make it accessible. One of K-Lab’s tasks will be to cooperate on making quality data available through well-documented APIs and making sure as much data as possible has open licenses that allow unlimited re-use.

Piloting services. The work done in the last year of Heritage Here with the map service mentioned above demonstrated to all parties involved the importance of actually building services that make use of our own open data. K-Lab will, as part of its scope, function as a ‘sandbox’ both for coming up with new ideas for services and, to the extent that budget and resources allow, for trying out new technologies and services. One such pilot service is the work done by K-Lab, in collaboration with the Estonian photographic heritage society, in setting up a crowdsourcing platform for improving metadata on historic photos (https://fotodugnad.ra.no/).

For 2018, K-Lab will start looking into building a service making use of linked open data from our organizations. All of our agencies are data owners responsible for authority data in some form or another, ranging from geo names to cultural heritage data and person data. Some work has already been done to bring our technical departments closer together in this field, but we plan to do ‘something’ on a practical level next year.

Building competence. In order to facilitate the exchange of knowledge between the collaboration partners, K-Lab will arrange seminars, workshops and conferences as arenas for discussing common challenges, learning from each other and building networks. This is done primarily to strengthen the relationships between the agencies involved, but many activities will have a broader scope. One example is the intention to arrange workshops, roughly every two months, on topics that are relevant to our agencies but open to anyone interested. To give a rough overview of the range of topics, the following workshops were arranged in 2017:

• A practical introduction to Cidoc-CRM (May)

• Workshop on Europeana 1914-1918 challenge – co-host: Wikimedia Norway (June)

• An introduction to KulturNAV – co-host: Vestfoldmuseene (September)

• Getting ready for #hack4no (October)

• Transkribus – Text recognition and transcription of handwritten text - co-host: The Munch museum (November)

Third phase – 2018 and beyond

K-Lab is very much a work in progress, and the direction it takes in the future depends on many factors. However, a joint workshop was held in September 2017 to evaluate the work done so far and to try to map out a direction for the future. Employees from all levels of the organisations were present, with invited guests from other institutions in the cultural sector, like the National Library and Digisam from Sweden, to evaluate, discuss and suggest ideas.

No definite conclusions were drawn, but there was an overall agreement that the focus on the three areas described above is of great importance, and that the work done so far by the agencies together has been, for the most part, successful. Setting up arenas for discussing common problems, sharing success stories and interacting with colleagues across agency boundaries has been a key element in the relative success of K-Lab so far. This work will continue into 2018 with focus on thematic groups on linked open data and photo archives, and a new series of workshops is being planned. The experimentation with technology will continue, and hopefully new ideas will be brought forward and realised over the course of the next year(s).


3:15pm - 3:30pm
Short Paper (10+5min) [abstract]

Semantic Annotation of Cultural Heritage Content

Uldis Bojārs1,2, Anita Rašmane1

1National Library of Latvia; 2Faculty of Computing, University of Latvia

This talk focuses on the semantic annotation of textual content and on annotation requirements that emerge from the needs of cultural heritage annotation projects. The information presented here is based on two text annotation case studies at the National Library of Latvia and was generalised to be applicable to a wider range of annotation projects.

The two case studies examined in this work are (1) correspondence (letters) from the late 19th century between two of the most famous Latvian poets Aspazija and Rainis, and (2) a corpus of parliamentary transcripts that document the first four parliament terms in Latvian history (1922-1934).

The first half of the talk focuses on the collected annotation requirements and how they may be implemented in practical applications. We propose a model for representing annotation data and implementing annotation systems. The model includes support for three core types of annotations: simple annotations, which may link to named entities; structural annotations, which mark up portions of a document that carry special meaning within its context; and composite annotations for more complex use cases. The model also introduces a separate Entity database for maintaining information about the entities referenced from annotations.

In the second half of the talk we will present a web-based semantic annotation tool developed on the basis of this annotation model and requirements. It allows users to import textual documents (various document formats such as HTML and .docx are supported), create annotations and reference the named entities mentioned in these documents. Information about the entities referenced from annotations is maintained in a dedicated Entity database that supports links between entities and can point to additional information about them, including Linked Open Data resources. Information about these entities is published as Linked Data. Annotated documents may be exported (along with annotation and entity information) in a number of representations, including a standalone web view.
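The annotation model described above can be sketched as a small data model. This is a minimal illustration in Python; all class names, fields and the example entity URI are assumptions for the sketch, not the tool's actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Entity:
    entity_id: str
    label: str
    same_as: List[str] = field(default_factory=list)  # e.g. Linked Open Data URIs

@dataclass
class SimpleAnnotation:
    start: int                       # character offsets into the document text
    end: int
    entity_id: Optional[str] = None  # optional link into the Entity database

@dataclass
class StructuralAnnotation:
    start: int
    end: int
    role: str                        # e.g. "salutation" in a letter

@dataclass
class CompositeAnnotation:
    parts: List[object]              # groups other annotations for complex cases

# The separate Entity database: entities are shared across documents
# and referenced from annotations by ID.
entities = {}
entities["aspazija"] = Entity("aspazija", "Aspazija",
                              ["https://example.org/entity/aspazija"])

ann = SimpleAnnotation(0, 8, entity_id="aspazija")
```

Keeping entities in their own store, rather than inside each annotation, is what allows links between entities and later publication as Linked Data.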

 
3:30pm - 4:00pmCoffee break / Surprise Event
Lobby, Porthania
 
4:00pm - 5:30pmT-PII-3: Augmented Reality
Session Chair: Sanita Reinsone
PII 
 
4:00pm - 4:30pm
Long Paper (20+10min) [abstract]

Extending museum exhibits by embedded media content for an embodied interaction experience

Jan Torpus

University of Applied Sciences and Arts Northwestern Switzerland

Investigation topic

Nowadays, museums not only collect, categorize, preserve and present; a museum must also educate and entertain, all the while following market principles to attract visitors. To fulfil this mission, museums started to introduce interactive technologies in the 1990s, such as multimedia terminals and audio guides, which have since become standard for delivering contextual information. More recently there has been a shift towards the creation of personalized sensorial experiences by applying user tracking and adaptive user modeling based on location-sensitive and context-aware sensor systems with mobile information retrieval devices. However, the technological gadgets and complex graphical user interfaces (GUIs) generate a separate information layer and detach visitors from the physical exhibits. The attention is drawn to the screen, and the interactive technology becomes an element competing with the environment and the exhibited collection [Stille 2003, Goulding 2000, Wakkary 2007]. Furthermore, the vast majority of visitors come in groups, and the social setting is interrupted by the digital information extension [Petrelli 2016].

The first studies of museum visitor behavior were carried out at the end of the 19th century and during the 20th century [Robinson 1928, Melton 1972]. More recently, a significant body of ethnographic research on the visitor experience of individuals and groups has examined technologically extended and interactive installations. Publications about visitor motivation, circulation and orientation, engagement, learning processes, as well as cognitive and affective relationships to the exhibits are of interest for our research approach [Bitgood 2006, Vom Lehn 2007, Dudley 2010, Falk 2011]. Most relevant are studies of the Human Computer Interaction (HCI) researcher community in the fields of Ubiquitous Computing, Tangible User Interfaces and Augmented Reality, investigating hybrid exhibition spaces and the bridging of the material and physical with the technologically mediated and virtual [Hornecker 2006, Wakkary 2007, Benford 2009, Petrelli 2016].

Approach

At the Institute of Experimental Design and Media Cultures (IXDM) we have conducted several design research projects applying AR for cultural applications but got increasingly frustrated with disturbing GUIs and physical interfaces such as mobile phones and Head Mounted Displays. We therefore started to experiment with Ubiquitous Computing, the Internet of Things and physical computing technologies that have become increasingly accessible to the design community over the last twelve years because of the shrinking size and price of sensors, actuators and controllers. In the presented research project, we therefore examine the extension of museum exhibits by physically embedded media technologies for an embodied interaction experience. We intend to overcome the problems of distraction, isolation and stifled learning processes associated with artificial GUIs by interweaving mediated information directly into the context of the exhibits and by triggering events according to visitor behavior.

Our research approach was interdisciplinary and praxis-based including the observation of concept, content and design development and technological implementation processes before the final evaluations. The team was composed of two research partners, three commercial/engineering partners and three museums, closely working together on three tracks: technology, design and museology. The engineering partners developed and implemented a scalable distributed hardware node system and a Linux-based content management system. It is able to detect user behavior and accordingly process and display contextual information. The content design team worked on three case studies following a scenario-driven prototyping approach. They first elaborated criteria catalogues, suitable content and scenarios to define the requirement profiles for the distributed technological environment. Subsequently, they carried out usability studies in the Critical Media Lab of the IXDM and finally set up and evaluated three case studies with test persons. The three museums involved, the Swiss Open-Air Museum Ballenberg, the Roman City of Augusta Raurica and the Museum der Kulturen Basel, all have in common that they exhibit objects or rooms that function as staged knowledge containers and can therefore be extended by means of ubiComp technologies. The three case studies were thematically distinct and offered specific exhibition situations:

• Case study 1: Roman City of Augusta Raurica: “The Roman trade center Schmidmatt“. The primary imparting concept was “oral history”, and documentary film served as a related model: An archaeologist present during the excavations acted as a virtual guide, giving visitors information about the excavation and research methods, findings, hypotheses and reconstructions.

• Case study 2: Open-Air Museum Ballenberg: “Farmhouse from Uesslingen“. The main design investigation was “narratives” about the former inhabitants and the main theme “alcohol”: Its use for cooking, medical application, religious rituals and abuse.

• Case study 3: Museum der Kulturen Basel: “Meditation box“. The main design investigation was “visitor participation” with biofeedback technologies.

Technological development

This project entailed the development of a prototype for a commercial hardware and software toolkit for exhibition designers and museums. Our technology partners elaborated a distributed system that can be composed and scaled according to the specific requirements of an exhibition. The system consists of two main parts:

• A centralized database with an online content management system (CMS) to set up and control the main software, node scripts, media content and hardware configuration. After the technical installation it also allows the museums to edit, update, monitor and maintain their exhibitions.

• Different types of hardware nodes that can be extended by specific types of sensors and actuators. Each node, sensor and actuator has its own separate ID; they are all networked together and are therefore individually accessible via the CMS. A node can run on a Raspberry Pi, for example, an FPGA based on Cyclone V or any desktop computer and can thus be adapted to the required performance.

The modular architecture allows for technological adaption or extension according to specific needs. First modules were developed for the project and then implemented according to the case study scenarios.
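The ID-addressable node/sensor/actuator architecture described above can be sketched as a simple registry. This is a hypothetical illustration only: the class names and the `node-01/...` ID scheme are assumptions, and the actual CMS and node software are not described at this level in the paper.

```python
class Device:
    """A node, sensor or actuator in the distributed system."""
    def __init__(self, device_id: str, kind: str):
        self.device_id = device_id
        self.kind = kind  # "node", "sensor" or "actuator"

class Registry:
    """Central index making every device individually addressable by ID,
    as a CMS would need in order to monitor and configure the exhibition."""
    def __init__(self):
        self._by_id = {}

    def register(self, device: Device):
        self._by_id[device.device_id] = device

    def get(self, device_id: str) -> Device:
        return self._by_id[device_id]

registry = Registry()
registry.register(Device("node-01", "node"))           # e.g. a Raspberry Pi
registry.register(Device("node-01/pir-1", "sensor"))   # motion sensor on node-01
registry.register(Device("node-01/spk-1", "actuator")) # speaker on node-01
```

Because each sensor and actuator carries its own ID rather than being hidden behind its node, the CMS can address any single device in the network directly.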

Evaluation methods

Through a participatory design process, we developed a scenario for each case study, suitable for walkthrough with several test persons. Comparable and complementary case study scenarios allowed us to identify risks and opportunities for exhibition design and knowledge transfer and define the tasks and challenges for technical implementation. For the visitor evaluation, we selected end-users, experts and in-house museum personnel. The test persons were of various genders and ages (including families with children), had varying levels of technical understanding and little or no knowledge about the project. For each case study we asked about 12 persons or groups of persons to explore the setting as long as they wanted (normally 10–15 minutes). They agreed to be observed and video recorded during the walkthrough and to participate in a semi-structured interview afterwards. We also asked the supervisory staff about their observations and mingled with regular visitors to gain insight into their primary reactions, comments and general behavior. The evaluation was followed by a heuristic qualitative content analysis of the recorded audio and video files and the notes we took during the interviews. Shortly after each evaluation we presented and discussed the results in team workshops.

Findings and Conclusions

The fieldwork led to many detailed insights about interweaving interactive mediated information directly into the context of physical exhibits. The findings are relevant for museums, design researchers and practitioners, the HCI community and technology developers. We organized the results along five main investigation topics:

1. Discovery-based information retrieval

Unexpected ambient events generate surprise and strong experiences but also contain the risk of information loss if visitors do not trigger or understand the media aids. The concept of unfolding the big picture by gathering distributed, hidden information fragments requires visitor attentiveness. Teasing, timing and the choice of location are therefore crucial to generate flowing trajectories.

2. Embodied interaction

The ambient events are surprising, but visitors are not always aware of their interactions. The unconscious mode of interaction lacks obvious interaction feedback, yet introducing marked hotspots or modes of interaction would destroy the essence of the project’s approach. The fact that visitors do not have to interact with technical devices or learn how to operate graphical user interfaces means that no user groups are excluded from the experience and information retrieval.

3. Non-linear contextual information accumulation

When deploying this project’s approach as a central exhibition concept, information needs to be structured hierarchically. Text boards or info screens are still a good solution for introducing visitors to the ways they can navigate the exhibition. The better the basic topics and situations are initially introduced, the more freedom emerges for selective and memorable knowledge staged in close context to the exhibits.

4. Contextually extended physical exhibits

A crucial investigation topic was the correlation between the exhibit and the media extension. We therefore declined concepts that would overshadow the exhibition and would use it merely as a stage for storytelling with well-established characters or as an extensive media show. The museums requested that media content fade in only briefly when someone approached a hotspot and that there were no technical interfaces or screens for projections that challenged the authenticity of the exhibits. We also discussed to what extent the physical exhibit should be staged to bridge the gap to the media extension.

5. Invisibly embedded technology

The problem of integrating sensors, actuators and controllers into cultural heritage collections was a further investigation topic. We used no visible displays to leave the exhibition space as pure as possible and investigated the applicability of different types of media technologies.

Final conclusion

Our museum partners agreed that the approach should not be implemented as a central concept and dense setting for an exhibition. As often propagated by exhibition makers, first comes the well-researched and elaborated content and carefully constructed story line, and only then the selection of the appropriate design approach, medium and form of implementation. This rule also seems to apply to ubiComp concepts and technologies for knowledge transfer. The approach should be applied as a discreet additional information layer, or simply as a tool to be used when it makes sense to explain something contextually or involve visitors emotionally.

References

Steve Benford et al. 2009. From Interaction to Trajectories: Designing Coherent Journeys Through User Experiences. Proc. CHI ’09, ACM Press. 709–718.

Stephen Bitgood. 2006. An Analysis of Visitor Circulation: Movement Patterns and the General Value Principle. Curator: The Museum Journal, 49(4), 463–475.

John Falk. 2011. Contextualizing Falk’s Identity-Related Visitor Motivational Model. Visitor Studies, 14(2), 141–157.

Sandra Dudley. 2010. Museum materialities: Objects, sense and feeling. In Dudley, S. (ed.) Museum Materialities: Objects, Engagements, Interpretations. Routledge, UK, 1-18.

Christina Goulding. 2000. The museum environment and the visitor experience. European Journal of marketing 34, no. 3/4, pp. 261-278.

Eva Hornecker and Jacob Buur. 2006. Getting a Grip on Tangible Interaction: A Framework on Physical Space and Social Interaction. CHI, Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. 437-446.

Dirk vom Lehn, Jon Hindmarsh, Paul Luff, Christian Heath. 2007. Engaging Constable: Revealing art with new technology. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '07), 1485-1494.

Arthur W. Melton. 1972. Visitor behavior in museums: Some early research in environmental design. In Human Factors. 14(5): 393-403.

Edward S. Robinson. 1928. The behavior of the museum visitor. Publications of the American Association of Museums, New Series, Nr. 5. Washington D.C.

Daniela Petrelli, Nick Dulake, Mark T. Marshall, Anna Pisetti, Elena Not. 2016. Voices from the War: Design as a Means of Understanding the Experience of Visiting Heritage. Proceedings Human-Computer Interaction, San Jose, CA, USA.

Alexander Stille. 2003. The future of the past. Macmillan. Pan Books Limited.

Ron Wakkary and Marek Hatala. 2007. Situated play in a tangible interface and adaptive audio museum guide. Published online: 4 November 2006. Springer-Verlag London Limited.


4:30pm - 5:00pm
Long Paper (20+10min) [abstract]

Towards an Approach to Building Mobile Digital Experiences For University Campus Heritage & Archaeology

Ethan Watrall

Michigan State University,

The spaces we inhabit and interact with on a daily basis are made up of layers of cultural activity that are, quite literally, built up over time. While museum exhibits, archaeological narratives, and public programs communicate this heritage, they often don’t allow the public to experience interactive, place-based, and individually driven exploration of content and spaces. Further, designers of public heritage and archaeology programs rarely explore the dual nature of the presented content and the scholarly process by which the understanding of that content was reached. In short, the scholarly narrative of material culture, heritage, and archaeology is often hidden from public exploration, engagement, and understanding. Additionally, many traditional public heritage and archaeology programs find it challenging to negotiate the balance between the voice and goals of the institution and those of communities and groups. In recent years, the maturation of mobile and augmented reality technology has provided heritage institutions, sites of memory and memorialization, cultural landscapes, and archaeological projects with interesting new avenues to present research and engage the public. We are also beginning to see exemplar projects that suggest fruitful models for moving the domain of mobile heritage forward considerably.

University campuses provide a particularly interesting venue for leveraging mobile technology in the pursuit of engaging, place-based heritage and archaeology experiences. University campuses are usually already well-traveled public spaces, and therefore don’t elicit the same level of concern that you might find in other contexts for publicly providing the location of archaeological and heritage sites and resources. They have a built-in audience of alumni and students eager to better understand the history and heritage of their home campus. Finally, many university campuses are starting to seriously think of themselves as places of heritage and memory, and are developing strategies for researching, preserving, and presenting their own cultural heritage and archaeology.

It is within this context that this paper will explore a deeply collaborative effort at Michigan State University that leverages mobile technology to build an interactive and place-based interpretive layer for campus heritage and archaeology. Driven by the work of the Michigan State University Campus Archaeology Program, an internationally recognized initiative that is unique in its approach to campus heritage, these efforts have unfolded across a number of years and evolved to meet the ever changing need to present the rich and well studied heritage and archaeology of Michigan State University's historic campus.

Ultimately, the goal of this paper is not only to present and discuss the efforts at Michigan State University, but to provide a potential model for other university campuses interested in leveraging mobile technology to produce engaging digital heritage and archaeology experiences.


5:00pm - 5:30pm
Long Paper (20+10min) [publication ready]

Zelige Door on Golborne Road: Exploring the Design of a Multisensory Interface for Arts, Migration and Critical Heritage Studies

Alda Terracciano

University College London,

In this paper I discuss the multisensory digital interface and art installation Zelige Door on Golborne Road as part of the wider research project ‘Mapping Memory Routes: Eliciting Culturally Diverse Collective Memories for Digital Archives’. The interface is conceived as a tool for capturing and displaying the living heritage of members of Moroccan migrant communities, shared through an artwork composed of a digital interactive sensorial map of Golborne Road (also known as Little Morocco), which includes physical objects related to various aspects of Moroccan culture, each requiring a different sense to be experienced (smell, taste, sight, hearing, touch). Augmented Reality (AR) and olfactory technologies have been used in the interface to superimpose pre-recorded video material and smells on the objects. As a result, the neighbourhood is represented as a living museum of cultural memories expressed in the form of artefacts, sensory stimulation and narratives of citizens living, working or visiting the area. Based on a model I developed for the multisensory installation ‘Streets of...7 cities in 7 minutes’, the interface was designed with Dr Mariza Dima (HCI designer), and Prof. Monica Bordegoni and Dr Marina Carulli (olfactory technology designers) to explore new methods able to elicit cultural Collective Memories through the use of multi-sensory technologies. The tool is also aimed at stimulating collective curatorial practices and at democratising decision-making processes in urban planning and cultural heritage.

 
4:00pm - 5:30pmT-PIII-3: Computational Literary Analysis
Session Chair: Mads Rosendahl Thomsen
PIII 
 
4:00pm - 4:30pm
Long Paper (20+10min) [abstract]

A Computational Assessment of Norwegian Literary “National Romanticism”

Ellen Rees

University of Oslo,

In this paper, I present findings derived from a computational analysis of texts designated as “National Romantic” in Norwegian literary historiography. The term “National Romantic,” which typically designates literary works from approximately 1840 to 1860 that are associated with national identity formation, first appeared decades later, in Henrik Jæger’s Illustreret norsk litteraturhistorie from 1896. Cultural historian Nina Witoszek has on a number of occasions written critically about the term, claiming that it is misleading because the works it denotes have little to do with larger international trends in Romanticism (see especially Witoszek 2011). Yet, with the exception of a 1985 study by Asbjørn Aarseth, it has never been interrogated systematically in the way that other period designations such as “Realism” or “Modernism” have. Nor does Aarseth’s investigation attempt to delimit a definitive National Romantic corpus or account for the remarkable disparity among the works that are typically associated with the term. “National Romanticism” is like pornography—we know it when we see it, but it is surprisingly difficult to delineate in a scientifically rigorous way.

Together with computational linguist Lars G. Johnsen and research assistants Hedvig Solbakken and Thomas Rasmussen, I have prepared a corpus of 217 texts that are mentioned in connection with “National Romanticism” in the major histories of Norwegian literature and textbooks for upper secondary instruction in Norwegian literature. I will briefly discuss some of the logistical challenges associated with preparing this corpus.

This corpus forms the point of departure for a computational analysis employing various text-mining methods in order to determine to what degree the texts most commonly associated with “National Romanticism” share significant characteristics. In the popular imagination, the period is associated with folkloristic elements such as supernatural creatures (trolls, hulders), rural farming practices (shielings, herding), and folklife (music, rituals) as well as nature motifs (birch trees, mountains). We therefore employ topic modeling in order to map the frequency and distribution of such motifs across time and genre within the corpus. We anticipate that topic modeling will also reveal unexpected results beyond the motifs most often associated with National Romanticism. This process should prepare us to take the next step and, inspired by Matthew Wilkens’ recent work generating “clusters” of varieties within twentieth-century U.S. fiction, create visualizations of similarities and differences among the texts in the National Romanticism corpus (Wilkens 2016).
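As an illustration of the kind of topic modeling described above, here is a minimal sketch using scikit-learn's LDA on a toy corpus of folkloristic motifs. The toy documents, parameters and library choice are assumptions for the sketch, not the project's actual pipeline.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy documents standing in for texts with folkloristic and nature motifs.
docs = [
    "troll hulder mountain birch forest",
    "shieling herding farm cattle summer",
    "fiddle dance wedding ritual village",
]

vec = CountVectorizer()
X = vec.fit_transform(docs)  # document-term count matrix

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(X)  # rows: documents, columns: topic weights
```

Each row of `doc_topics` sums to roughly 1; the dominant topic per document can then be tracked across time and genre, as the abstract proposes for motif frequency and distribution.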

Based on these initial computational methods, we hope to be able to answer some of the following literary historical questions:

• Are there identifiable textual elements shared by the texts in the National Romantic canon?

• What actually defines a National Romantic text as National Romantic?

• Do these texts cluster in a meaningful way chronologically?

• Is “National Romanticism” in fact meaningful as a period designation, or alternately as a stylistic designation?

• Are there other texts that share these textual elements that are not in the canon?

• If so, why? Do gender, class or ethnicity have anything to do with it?

To answer the last two questions, we need to use the “National Romanticism” corpus as a sub-corpus and “trawl-line” within the full corpus of nineteenth-century Norwegian textual culture, carrying out sub-corpus topic modeling (STM) in order to determine where similarities with texts from outside the period 1840–1860 arise (Tangherlini and Leonard 2013). For the sake of expediency, we use the National Library of Norway’s Digital Bookshelf as our full corpus, though we are aware that there are significant subsets of Norwegian textual culture that are not yet included in this corpus. Despite certain limitations, the Digital Bookshelf is one of the most complete digital collections of a national textual culture currently available.

For the purposes of DHN 2018, this project might best be categorized as an exploration of cultural heritage, understood in two ways. On the one hand, the project is entirely based on the National Library of Norway’s Digital Bookshelf platform, which, as an attempt to archive as much as possible of Norwegian textual culture in a digital and publicly accessible archive, is in itself a vehicle for preserving cultural heritage. On the other hand, the concept of “National Romanticism” is arguably the most widespread, but least critically examined means of linking cultural heritage in Norway to a specifically nationalist agenda.

References:

Jæger, Henrik. 1896. Illustreret norsk litteraturhistorie. Bind II. Kristiania: Hjalmar Biglers forlag.

Tangherlini, Timothy R. and Peter Leonard. 2013. “Trawling in the Sea of the Great Unread: Sub-Corpus Topic Modeling and Humanities Research.” Poetics 41.6: 725–749.

Wilkens, Matthew. 2016. “Genre, Computation, and the Varieties of Twentieth-Century U.S. Fiction.” CA: Journal of Cultural Analytics (online open-access)

Witoszek, Nina. 2011. The Origins of the “Regime of Goodness”: Remapping the Cultural History of Norway. Oslo: Universitetsforlaget.

Aarseth, Asbjørn. 1985. Romantikken som konstruksjon: tradisjonskritiske studier i nordisk litteraturhistorie. Bergen: Universitetsforlaget.


4:30pm - 4:45pm
Short Paper (10+5min) [abstract]

Prose Rhythm in Narrative Fiction: the case of Karin Boye's Kallocain

Carin Östman, Sara Stymne, Johan Svedjedal

Uppsala university,

Swedish author Karin Boye’s (1900-1941) last novel Kallocain (1940) is an icily dystopian depiction of a totalitarian future. The protagonist Leo Kall first embraces this system, but for various reasons rebels against it. The peripety comes when he gives a public speech, questioning the State. It has been suggested (by the linguist Olof Gjerdman) that the novel – which is narrated in the first-person mode – from exactly this point on is characterized by a much freer rhythm (Gjerdman 1942). This paper sets out to test this hypothesis, moving on from a discussion of the concept of rhythm in literary prose to an analysis of various indicators in different parts of Kallocain and Boye’s other novels.

Work on this project started just a few weeks ago. So far we have performed preliminary experiments with simple text-quality indicators, like word length, sentence length, and the proportion of punctuation marks. For all these indicators we have compared the first half of the novel, up until the speech, the second half of the novel, and as a contrast also the "censor's addendum", which is a short last chapter of the novel, written by an imaginary censor. For most of these indicators we find no differences between the two major parts of the novel. The only result that points to a stricter rhythm in the first half is that the proportion of long words is considerably higher there, whether measured in characters or in syllables. For instance, the percentage of words with at least five syllables is 1.85% in the first half, and 1.03% in the second half.

The other indicators that differ do not support the hypothesis, however. In the first half, the sections are shorter, there are proportionally more speech utterances, and there is a higher proportion of three consecutive dots (...), which are often used to mark hesitation. If we compare these two halves to the censor's addendum, however, we can clearly see that the addendum is written in a stricter way, with for instance a considerably higher proportion of long words (4.90% of the words have more than five syllables) and sentences more than twice as long.
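Indicators of this kind can be sketched as follows. This is an illustrative implementation, not the authors' code: the syllable counter is a crude vowel-group heuristic for Swedish, and all function names are assumptions.

```python
import re

VOWELS = "aeiouyåäö"

def syllables(word: str) -> int:
    # Count maximal vowel groups as syllables (rough approximation for Swedish).
    return max(1, len(re.findall(f"[{VOWELS}]+", word.lower())))

def indicators(text: str) -> dict:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[^\W\d_]+", text)  # alphabetic tokens, unicode-aware
    long_words = [w for w in words if syllables(w) >= 5]
    return {
        "avg_sentence_len": len(words) / len(sentences),   # words per sentence
        "avg_word_len": sum(len(w) for w in words) / len(words),
        "pct_long_words": 100 * len(long_words) / len(words),
    }

stats = indicators("Ett kort ord. Organisationsförändringarna fortsätter.")
```

Here one of the five words has at least five vowel groups, so `pct_long_words` comes out at 20.0 for this toy input.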

In future analysis, we plan to use more fine-tuned indicators, based on a dependency parse of the text, from which we can explore issues like phrase length and the proportion of sub-clauses. Separating out speech from non-speech also seems important. We also plan to explore the variation in our indicators, rather than just looking at averages, since this has been suggested in literature on rhythm in Swedish prose (Holm 2015).

Through this initial analysis we have also learned about some of the challenges of analyzing literature. For instance, it is not straightforward to separate speech from non-speech, since the ends of utterances are often not clearly marked in Kallocain, and free indirect speech is sometimes used. We think this would be important for future analysis, as would attribution of speech (Elson & McKeown, 2010), since the speech of the different protagonists cannot be expected to vary in the two parts to the same degree.

References

Boye, Karin (1940) Kallocain: roman från 2000-talet. Stockholm: Bonniers.

Elson, David K. and McKeown, Kathleen R. (2010) Automatic Attribution of Quoted Speech in Literary Narrative. In Proceedings of the 24th AAAI Conference on Artificial Intelligence. The AAAI Press, Menlo Park, pp 1013–1019.

Gjerdman, Olof (1942) Rytm och röst. In Karin Boye. Minnen och studier. Ed. by M. Abenius and O. Lagercrantz. Stockholm: Bonniers, pp 143–160.

Holm, Lisa (2015) Rytm i romanprosa. In Det skönlitterära språket. Ed. by C. Östman. Stockholm: Morfem, pp 215–235.


4:45pm - 5:00pm
Short Paper (10+5min) [abstract]

The Dostoyevskian Trope: State Incongruence in Danish Textual Cultural Heritage

Kristoffer Laigaard Nielbo, Katrine Frøkjær Baunvig

University of Southern Denmark,

In the history of prolific writers, we are often confronted with the figure of the suffering or tortured writer. Setting aside metaphysical theories, the central claim seems to be that a state-incongruent dynamic is an intricate part of the creative process. Two propositions can be derived from this claim: 1) the creative state is inversely proportional to the emotional state, and 2) the creative state is causally predicted by the emotional state. We call this creative-emotional dynamic ‘the Dostoyevskian trope’. In this paper we present a method for studying the Dostoyevskian trope in prolific writers. The method combines Shannon entropy as an indicator of lexical density and readability with fractal analysis in order to measure creative dynamics over multiple documents. We generate a sentiment time series from the same documents and test for causal dependencies between the creative and sentiment time series. We illustrate the method by searching for the Dostoyevskian trope in Danish textual cultural heritage, specifically three highly prolific writers from the 19th century, namely N.F.S. Grundtvig, H.C. Andersen, and S.A. Kierkegaard.
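As an illustration, Shannon word entropy per document can be computed as follows. The toy documents and function name are assumptions for the sketch; the authors' method additionally involves fractal analysis and causal testing, which are omitted here.

```python
import math
from collections import Counter

def word_entropy(text: str) -> float:
    """Shannon entropy (bits) of the word distribution in a document."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Toy documents standing in for a writer's output over time.
docs = [
    "the cat sat on the mat",
    "words words words words",
    "every token here is unique",
]
series = [word_entropy(d) for d in docs]
```

A maximally repetitive text scores 0 bits, while a text of all-distinct words scores log2 of its vocabulary size; tracked over many documents, such values form the kind of time series the abstract describes.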


5:00pm - 5:30pm
Long Paper (20+10min) [abstract]

Interdisciplinary advancement through the unexpected: Mapping gender discourses in Norway (1840-1913) with Bokhylla

Heidi Karlsen

University of Oslo,

This presentation discusses challenges related to sub-corpus topic modeling in the study of gender discourses in Norway from 1840 to 1913 and the role of interdisciplinary collaboration in this process. Through collaboration with the Norwegian National Library, data-mining techniques are used to retrieve data from the digital source Bokhylla [«the Digital Bookshelf»] for the analysis of women’s «place» in society and the impact of women writers on this discourse. My project is part of the research project «Data-mining the Digital Bookshelf», based at the University of Oslo.

1913, the closing year of the period I study, is the year of women’s suffrage in Norway. I study the impact women writers had on the debate in Norway regarding women’s «place» in society, during the approximately 60 years before women were granted the right to vote. A central hypothesis for my research is that women writers in the period had an underestimated impact on gender discourses, especially in defining and loading key words with meaning (drawing on mainly Norman Fairclough’s theoretical framework for discourse analysis). In this presentation, I examine a selection of Swedish writer Fredrika Bremer’s texts, and their impact on gender discourses in Norway.

The Norwegian National Library’s Digital Bookshelf is the main source for the historical documents I use in this project. The Digital Bookshelf includes a vast amount of text published in Norway over several centuries, in a great variety of genres, and thus offers unique access to our cultural heritage. Sub-corpus topic modeling (STM) is the main tool that has been used to process the Digital Bookshelf texts for this analysis. A selection of Bremer’s work has been assembled into a sub-corpus; topics have been generated from this corpus and then applied to the full Digital Bookshelf corpus. Throughout the process, collaboration with the National Library has been essential for overcoming technical challenges, and I will reflect upon this collaboration in my presentation. As the data are retrieved and analyzed by me as a humanities scholar, and weaknesses in the data are detected, the programmer at the National Library assisting us on the project presents, modifies and develops tools to meet our challenges. These tools might in turn open possibilities beyond what they were proposed for, and new ideas for my research design may emerge as a result. Concurrently, the algorithms created at such a stage in the process might be useful for scholars in completely different research projects. I will mention a few examples of such mutually productive collaboration, and briefly reflect upon how these issues relate to questions of open science.

In this STM process, several challenges have emerged along the way, mostly related to OCR errors. Some illustrative examples of passages with such errors will be presented, both to discuss the measures undertaken to address the problems they give rise to and to demonstrate the unexpected progress stemming from these «defective» data. The topics used as a «trawl line»(1) in the initial phase of this study produced few results. Our first attempt to obtain more results was to lower the required Jaccard similarity(2), that is, to reduce the quantity of a topic that has to be identified in a passage for it to qualify as a hit. Once this required topic quantity was lowered, a great number of results were obtained. The obvious weakness of these results, however, is that the rather low required topic match does not allow us to affirm a connection between these passages and Bremer’s texts. Nevertheless, the results have still been useful, for two reasons. Some of the data have proven to be valuable sources for the mapping of gender discourses, although they indicate nothing about women writers’ impact on them. Moreover, these passages have served to illustrate many of the varieties of OCR errors that my topic words give rise to in text from the period I study (frequently in Gothic typeface). This discovery has then been used to improve the topics, which takes us to the next step in the process.
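
The thresholded matching by Jaccard similarity can be sketched minimally (an illustration only, not the project's actual scoring; the function names and the simple set-based treatment of passages are assumptions):

```python
def jaccard(topic_words, passage_words):
    """Jaccard similarity of two word sets: |A ∩ B| / |A ∪ B|."""
    a, b = set(topic_words), set(passage_words)
    return len(a & b) / len(a | b)

def is_hit(topic_words, passage, threshold):
    """A passage qualifies as a hit when its similarity to the topic
    reaches the parameterised threshold. Lowering the threshold captures
    more passages, at the cost of weaker evidence of a real connection."""
    return jaccard(topic_words, passage.split()) >= threshold
```

Revising the required similarity down, as described above, corresponds to calling `is_hit` with a smaller `threshold`.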

In certain documents, one and the same word in the original text has, in the scanning of the document, given rise to up to three different OCR errors(3). This indicates the risk of missing out on potentially relevant documents in the «great unread»(4): if only the correct spelling of the words is included in the topics, potentially valuable documents containing our topic words, bizarrely spelled because of scanning errors, might go unnoticed. In an attempt to meet this challenge I have manually added to the topic the different versions of the words that the OCR errors have given rise to (for instance, for the word «kjærlighed» [love]: «kjaerlighed», «kjcerlighed», «kjcrrlighed»). In that case we cannot, when we run the topic model, require a one hundred percent topic match, perhaps not even 2/3, as all these OCR errors of the same word are highly unlikely to occur in all potential matches(5). Such extensions of the topics in other words condition our parameterization of the algorithm: the required Jaccard similarity for a passage to be captured has to be revised fairly far down. The inconvenience of this approach, however, is the possibly high number of captured passages that are (for our purposes) exaggeratedly saturated with the semantic unit in question. Furthermore, if we add the different versions of a lexeme and its semantic relatives that in some cases are included in the topic, such as «kvinde», «kvinder», «kvindelig», «kvindelighed» [woman, women, feminine, femininity], the topic might capture an even larger number of passages dense in this specific semantic unit and its variations, an amount not proportional to the overall variety of the topic in question.

This takes us back to the question of what we program the «trawl line» to «require» for a passage in the target corpus to qualify as a hit, and to how the scores are ranked. How many of the words in the topic must occur, and to what extent do several occurrences of one of the topic’s words, e.g. five occurrences of «woman» in one paragraph, interest us? The parameter can be set to rank scores as a function of the occurrences of the different words forming the topic, meaning that the score for a topic in a captured passage is proportional to the heterogeneity of the occurrences of the topic’s words, not only their quantity. However, in some cases we might, as mentioned, have a topic comprising several forms of the same lexeme and its semantic relatives and, as described, several versions of the same word due to OCR errors. How can the topic model be programmed to take such occurrences into account in the search for matching passages? To meet this challenge, a «hyperlexeme-sensitive» algorithm has been created(6). This means that the topic model is parameterized to count the lexeme frequency in a passage. It will also rank the scores as a function of occurrences of the hyperlexeme, and not treat occurrences of different forms of one lexeme equally to those of more semantically heterogeneous word units in the topic. Furthermore, and this is the point to be stressed, this algorithm is programmed to treat misspellings of words due to OCR errors as different versions of the same hyperlexeme.
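
The hyperlexeme-sensitive counting described here can be sketched as follows (a minimal illustration under stated assumptions: the variant table, the function, and the flat lookup are hypothetical, not the National Library's actual algorithm):

```python
from collections import Counter

# Hypothetical variant table: each hyperlexeme maps to the spelling
# variants (OCR errors, inflections, semantic relatives) that should
# be counted as one unit rather than as distinct topic words.
HYPERLEXEMES = {
    "kvinde": {"kvinde", "kvinder", "kvindelig", "kvindelighed"},
    "kjaerlighed": {"kjærlighed", "kjaerlighed", "kjcerlighed", "kjcrrlighed"},
}

def hyperlexeme_counts(tokens):
    """Count occurrences per hyperlexeme, so that different forms and
    misspellings of the same word do not inflate a passage's score the
    way genuinely distinct topic words would."""
    lookup = {v: h for h, vs in HYPERLEXEMES.items() for v in vs}
    return Counter(lookup[t] for t in tokens if t in lookup)
```

A scoring step could then weight the heterogeneity of hyperlexemes found in a passage rather than the raw token count.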

The adjustment of the Jaccard similarity value and the hyperlexeme parameterization are thus measures undertaken to compensate for the inconveniences mentioned and to improve and refine the topic model. I will show examples comparing results from before and after these parameters were used, in order to discuss how much closer we have come to being able to establish actual links between the sub-corpus and the passages the topics have captured in the target corpus. All the technical concepts will be defined and briefly explained as I come to them in the presentation. The genesis of these measures, tools and ideas at crucial moments in the process, occurring as a result of unexpected findings and interdisciplinary collaboration, will be elaborated on in my presentation, as well as the potential this might offer for new research.

Notes:

(1) My description of the STM process, with the use of tropes such as «trawl line», is inspired by Peter Leonard and Timothy R. Tangherlini (2013): “Trawling in the Sea of the Great Unread: Sub-corpus topic modeling and Humanities research”, Poetics 41, 725–749.

(2) The Jaccard index is taken into account in the ranging of the scores. The best hit passage for a topic, the one with highest score, will be the one with highest relative similarity to the other captured passages, in terms of concentration of topic words in the passage. The parameterized value of the required Jaccard similarity defines the score a passage must receive in order to be included in the list of captured passages from the «great unread».

(3) Some related challenges were described by Kimmo Kettunen and Teemu Ruokolainen in their presentation, «Tagging Named Entities in 19th century Finnish Newspaper Material with a Variety of Tools» at DHN2017.

(4) Franco Moretti (2000) (drawing on Margaret Cohen) calls the enormous number of works that exist in the world «the great unread» (limited to Bokhylla’s content in the context of my project) in «Conjectures on World Literature», New Left Review 1, 54–68.

(5) As an alternative to including in the topic all detected spelling variations of the topic words due to OCR errors, we will experiment with taking the Levenshtein distance into account when programming the «trawl line». In that case it is not identity between a topic word and a word in a passage in the great unread that matters, but the distance between the two words: the minimum number of single-character edits required to change one word into the other, for instance «kuinde» → «kvinde».
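
The Levenshtein distance mentioned in this note can be computed with the standard dynamic-programming recurrence (a generic sketch, not the project's code):

```python
def levenshtein(a, b):
    """Minimum number of single-character edits (insertions, deletions,
    substitutions) required to change string a into string b."""
    prev = list(range(len(b) + 1))          # distances from "" to prefixes of b
    for i, ca in enumerate(a, 1):
        cur = [i]                           # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

# e.g. the OCR error «kuinde» is a single edit away from «kvinde»
```

Matching by a small distance threshold instead of exact identity would let OCR variants be captured without enumerating them in the topic.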

(6) By the term «hyperlexeme» we understand a collection of graphemic occurrences of a lexeme, including spelling errors and semantically related forms.

 
4:00pm - 5:30pmT-PIV-3: Legal and Ethical Matters
Session Chair: Christian-Emil Smith Ore
PIV 
 
4:00pm - 4:30pm
Long Paper (20+10min) [abstract]

Breaking Bad (Terms of Service)? The DH-scholar as Villain

Pelle Snickars

Umea University,

For a number of years I have been heading a major research project on Spotify (funded by the Swedish Research Council). It takes a software studies and digital humanities approach towards streaming media, and engages in reverse engineering Spotify’s algorithms, aggregation procedures, metadata, and valuation strategies. During the summer of 2017 I received an email from a Spotify legal counsel who was ”concerned about information it received regarding methods used by the responsible group of researchers in this project. This information suggests that the research group systematically violated Spotify’s Terms of Use by attempting to artificially increase plays, among others, and to manipulate Spotify’s services with the help of scripts or other automated processes.” I responded politely—but got no answer. A few weeks later, I received a letter from the senior legal advisor at my university. Spotify had apparently contacted the Research Council with the claim that our research was questionable in a way that would demand “resolute action”, and the possible termination of financial and institutional support. At the time of writing it is unclear if Spotify will file a lawsuit—or start a litigation process.

DH research is embedded in ’the digital’, and so are its methods, from scraping web content to the use of bots as research informants. Within scholarly communities centered on the study of the web or social media there is a rising awareness of the ways in which digital methods might be non-compliant with commercial Terms of Service (ToS), a discussion which has not yet filtered out to, and been taken seriously within, the digital humanities. However, DH researchers will in years to come increasingly have to ask themselves whether their scholarly methods need to abide by ToS or not. As social computing researcher Amy Bruckman has stated, this might have profound scholarly consequences: ”Some researchers choose not to do a particular piece of work because they believe they can’t violate ToS, and then another researcher goes and does that same study and gets it published with no objections from reviewers.”

My paper will recount my legal dealings with Spotify, including a discussion of the digital methods used in our project, but also reflect more generally on the ethical implications of collecting data in novel ways. ToS are contracts, not the law; still, there is a dire need for ethical justification and scholarly discussion of when the importance of academic research justifies breaking ToS.


4:30pm - 4:45pm
Short Paper (10+5min) [abstract]

Legal issues regarding tradition archives: the Latvian case study.

Liga Abele, Anita Vaivade

Latvian Academy of Culture,

Historically, tradition archives have developed rather apart, both in form and in substance, from the formation of other types of cultural heritage collections held by “traditional” archives, museums and libraries. However, for current trends in the development of technical and institutional capacities to exert their full positive influence on managerial approaches, there must be increased legal certainty regarding the status and functioning of tradition archives. There are several trajectories through which tradition archives can be and are influenced by the surrounding legal and administrative framework, both at the national and the regional level. A thorough knowledge of the impact of the existing regulatory base can contribute to informed decision-making consistent with the role that these archives play in safeguarding intangible cultural heritage. The paper presents a case study of the current Latvian situation within a broader regional perspective of the three Baltic states. The legal framework of interest is defined by the institutional status of tradition archives, the legal status of their collections, as well as legal provisions and restrictions regarding work that involves gathering, processing and further use of archive material.

The paper is based on data gathered within the EUSBSR Seed Money Facility project “DigArch_ICH. Connecting Digital Archives of Intangible Heritage” (No. S86, 2016/2017), executed in partnership between the Institute of Literature, Folklore and Art of the University of Latvia, the Estonian Folklore Archives, the Estonian Literary Museum, the Norwegian Museum of Cultural History and the Institute for Language and Folklore in Sweden. One of the project’s thematic lines dealt with legal and ethical issues, surveying national experiences regarding legal concerns and restrictions on the work of tradition archives, the legal status of their collections, the practice of signing written agreements between researcher and informant, as well as existing codes of ethics applicable to the work of tradition archives. Responses were received from altogether 21 institutions in 11 countries of the Baltic Sea region and neighbouring countries.

Fields of legislation involved.

There are several fields of national legislation that influence the work of tradition archives, such as the regulations on intangible heritage, documentary heritage, work of memory institutions (archives, museums, libraries), intellectual property and copyright, as well as protection of personal data. Depending on the national legislative choices and contexts, these fields may be regulated by separate laws, or overarching laws (for instance, in the field of cultural heritage, covering both intangible as well as documentary heritage protection), or some of these fields may remain uncovered by the national legislation.

According to the results of the survey, the legal status of tradition archives can be rather diverse. They can be part of larger institutions such as universities, museums, or libraries. In the Latvian situation, there are specific laws for every type of the above-mentioned institutions, which can entail a large variety of rule sets. The status of the collections can also differ depending on whether they are recognised as part of protected national treasures (such as the national collections of museums, archives etc.). The ownership status can be rather diverse as well, with collections belonging to the state or privately owned; moreover, ownership rights over the same collection can be split between various types of owners of similar or varied legal status. The paper proposes to identify and analyse, in the Latvian situation, the consequences for the collections of tradition archives depending on the institutional status of their holder, their ownership status, and the influence exercised by legislation in the fields of copyright and intellectual property law as well as data protection. The Latvian case will be put into perspective against the Estonian and Lithuanian situations.

International influence on the national legislation.

The national legislation is influenced by international normative instruments at different levels, ranging from the global (UNESCO) to the regional, in this case European, scope. At the global level there are several instruments, ranging from legally binding instruments to “soft law”, such as the 2003 UNESCO Convention for the Safeguarding of the Intangible Cultural Heritage or the 2015 UNESCO Recommendation Concerning the Preservation of, and Access to, Documentary Heritage Including in Digital Form. Concerning the work of tradition archives, the 2003 Convention notably relates to the documentation of intangible cultural heritage as well as the establishment of national inventories of such heritage, and in this regard tradition archives may have or establish a role in national policy implementation processes. European regional legislation and policy documents, adopted either by the Council of Europe or by the European Union, are also of relevance. They concern the field of cultural heritage (with a general direction towards an integrated approach to various cultural heritage fields), as well as personal data protection, copyright and intellectual property rights. The role of legally binding instruments of the European Union, such as directives and regulations, will be examined through the perspective of the national legislation related to tradition archives.

Aspects of deontology.

As varied deontological aspects affect the functioning of tradition archives, these issues will also be examined in the paper. There are national codes of ethics that may apply to the work of tradition archives, either from the perspective of research or in relation to archival work. Within the field of intangible cultural heritage, issues of ethics have also been debated internationally over recent years, with recognised topicality for the different stakeholders involved. Thus, the UNESCO Intergovernmental Committee for the Safeguarding of the Intangible Cultural Heritage adopted the Ethical Principles for Safeguarding Intangible Cultural Heritage in 2015. This document provides recommendations to the various persons and institutions taking part in safeguarding activities, which also concerns the work of tradition archives. There are also international deontology documents that concern the work of archives, museums and libraries. These documents will be referred to in a complementary manner, taking into consideration the specificity of tradition archives. Namely, the 1996 International Council on Archives (ICA) Code of Ethics: although this code does not highlight archives that deal with folklore and traditional culture materials, it nevertheless sets general principles for archival work and for co-operation between archives, and also emphasises the preservation of documentary heritage. Another important deontological reference for tradition archives concerns the work of museums, which is particularly significant for archives that function as units within larger institutions, namely museums. An internationally well-known and often-mentioned reference is the 2004 (1986) International Council of Museums (ICOM) Code of Ethics for Museums. A reference may also be made to the 2012 International Federation of Library Associations (IFLA) Code of Ethics for Librarians and other Information Workers.


4:45pm - 5:00pm
Short Paper (10+5min) [abstract]

Where are you going, research ethics in Digital Humanities?

Sari Östman, Elina Vaahensalo, Riikka Turtiainen

University of Turku,

1 Background

In this paper we will examine the current state and future development of research ethics in Digital Humanities. We have analysed

a) ethics-focused inquiries with researchers in a multidisciplinary consortium project (CM)

b) Digital Humanities-oriented journals, and

c) the objectives of the DigiHum Programme at the Academy of Finland, the ethical guidelines of the AoIR (Association of Internet Researchers, which published extensive sets of ethical guidelines for online research in 2002 and 2012), and academic ethics boards and committees, in particular the one at the University of Turku. We are also planning to analyse the requests for comments that have not been approved by the ethics board at the University of Turku; for that, we need a research permission from the administration of the University of Turku, which is in process.

Östman and Vaahensalo work in the consortium project Citizen Mindscapes (CM), which is part of the Academy of Finland’s Digital Humanities Programme. University Lecturer Turtiainen contributes a percentage of her working time to the project.

In the Digital Humanities programme memorandum, ethical examination of the research field is mentioned as one of the main objectives of the programme (p. 2). The CM project has a work package for researching research ethics, which Östman leads. We aim to examine the current understanding of ethics in multiple disciplines, in order to find tools for more extensive ethical consideration, especially in multidisciplinary environments. Such a toolbox would bring more transparency into multidisciplinary research.

Turtiainen and Östman have started developing the ethical toolbox for online research in their earlier publications (see e.g. Turtiainen & Östman 2013; Östman & Turtiainen 2016; Östman, Turtiainen & Vaahensalo 2017). The current phase takes the study of research ethics to a more analytical level.

2 Current research

When we are discussing a field of research such as Digital Humanities, it is quite clear that online-specific research ethics (Östman & Turtiainen 2016; Östman, Turtiainen & Vaahensalo 2017) plays an especially significant role in it. Research projects often concentrate on one source or topic with a multidisciplinary take: understandings of research ethics may fundamentally vary even within the same research community. Different ethical focal points and varying understandings could be a multidisciplinary resource, but it is essential to recognise and pay attention to the varying disciplinary backgrounds as well as the online-specific research contexts. Only by taking these matters into consideration are we able to create functional ethical guidelines for multidisciplinary online-oriented research.

The Inquiries in CM24

On the basis of the two rounds of ethical inquiry within the CM24 project, the researchers seemed to consider most focal such ethical matters as anonymisation, dependence on corporations, co-operation with other researchers, and preservation of the data. Judging by the answers, ethical views seemed to be

a) individually constructed: the topic of research, methods, data, plus the personal view of what might be significant

b) based on one’s education and disciplinary tradition

c) raised by the topics and themes the researcher had come into contact with during the CM24 project (and in similar multidisciplinary situations earlier)

One thing seemingly happening with the current trend of big data usage is that even individually produced online material is seen as a mass: faceless, impersonal data, available to anyone and everyone. This is an ethical discussion that was already under way in the early 2000s (see e.g. Östman 2007, 2008; Turtiainen & Östman 2009), when researchers turned their interest to online material for the first time. It was not then, and it is not now, ethically sustainable research to treat the private, everyday contents of individual people as ’take and run’ data. However, this seems to be happening again, especially in disciplines where ethics has mostly focused on copyright and perhaps corporate and co-operational relationships. (In CM24, for example, information science seems to be one of the disciplines where intimate data is used as a faceless mass.) Then again, a historian in the project argues in their answer that already choosing an online discussion as an object of research is an ethical choice, ”shaping what we can and should count in into the actual research”.

Neither of the above-mentioned ethical views is faulty. However, it might be difficult for these two researchers to find a common understanding of ethics when, for example, writing a paper together. A multifaceted, generalised collection of guidelines for multidisciplinary research would probably help.

Digital Humanities Journals and Publications

To explore ethics in digital humanities, we needed a diverse selection of publications representing research in Digital Humanities. Nine digital humanities journals were chosen for analysis, based on the listing made by Digital Humanities at Berkeley. The focus of these journals varies from pedagogy to literary studies, but they are all digital humanities oriented. The longest-running journal on the list has been published since 1986, and the most recent journals were released for the first time in 2016. The journals therefore cover the relatively long history of digital humanities and a wide range of multi- and interdisciplinary topics.

In the journals and the articles published in them, research ethics is clearly sidelined, even though it is not entirely ignored. In the publications, research ethics is largely taken into account in the form of source criticism. Big data, digital technology and copyright issues related to research materials and multidisciplinary co-operation are the most common examples of research-ethical considerations. Databases, text digitisation and web archives are also discussed in the publications. These examples show that research ethics does affect digital humanities, but in practice it is discussed relatively scarcely in publications.

Publications of the CM project were also examined, including some of our own articles. Except for one research-ethics-oriented article (Östman & Turtiainen 2016), most of the publications have a historical point of view (Suominen 2016; Suominen & Sivula 2016; Saarikoski 2017; Vaahensalo 2017). For this reason, research ethics is reflected mainly in the form of source criticism and transparency. Ethics is not discussed at greater length in these articles than in most of the examined digital humanities publications.

In this area, too, a multifaceted, generalised collection of guidelines for multidisciplinary research would probably be of benefit: it would be significant to increase transparency in research reporting, especially in Digital Humanities, which is complicated and multifaceted in its disciplinary nature. More thorough reporting of ethical matters would therefore increase the transparency of Digital Humanities itself.

The Ethics Committee

The Ethics committee of the University of Turku follows the development in the field of research ethics both internationally and nationally. The mission of the committee is to maintain a discussion on research ethics, enhance the realisation of ethical research education and give advice on issues related to research ethics. At the moment its main purpose is to assess and give comments on the research ethics of non-medical research that involves human beings as research subjects and can cause either direct or indirect harm to the participants.

The law on protecting the personal data of private citizens appears to be a significant aspect of research ethics. Turtiainen (a member of the committee) states that, at the current point, one of the main concerns seems to be poor data protection. The registers constructed from the informant base are often neglected in the humanities, whereas disciplines such as psychology and welfare research typically consider them on a regular basis. Then again, the other disciplines do not necessarily consider other aspects of vulnerability as deeply as the (especially culture/tradition-oriented) humanists seem to do.

Our aim is to analyse requests for comments which have not been approved and have therefore been asked to be modified before recommendation or re-evaluation. Our interest focuses on the arguments that have caused the rejection. Before that phase of our study we need a research permission of our own from the administration of the University of Turku, which is in process. It will be an interesting viewpoint to compare the rejected requests for comments from the ethics committee with the results of the ethical inquiries within the CM24 project and the outline of research ethics in digital humanities journals and publications.

3 Where do you go now…

According to our current study, it seems that the position of research ethics in Digital Humanities and, more widely, in multidisciplinary research, is somewhat two-fold:

a) For example, in the Digital Humanities Programme of the Academy of Finland, the significance of ethics is strongly emphasised, and the research projects within the programme are encouraged to increase their ethical discussions and the transparency thereof. The discourse about, and interest in, developing online-oriented research ethics seems to be growing, suggesting that ’something should be done’: ethical matters should be present in research projects in a more extensive way.

b) However, it seems that in practice the position of research ethics has not changed much within the last 10 years or so, despite the fact that the digital research environments of the humanities have become more and more multidisciplinary, which leads to multiple understandings of ethics even within individual research projects. Yet ethics is not discussed at greater length or depth in research reports than earlier. Even in Digital Humanities-oriented journals, ethics is mostly present in a paragraph or two, repeating a few similar concerns in a way that at times seems almost ’automatic’, as if the ethical discussion had been added hastily ’on the surface’ because it is required from the outside.

This is an interesting situation. There is a possibility that researchers do not take seriously the significance of ethical focal points in their research. This is, however, an argument that we would not wish to make. We consider it more likely that in the ever-changing digital research environment, researchers lack multidisciplinary tools for analysing and discussing ethical matters in the depth that is needed. By examining the current situation extensively, our study aims to identify the focal ethical matters in multidisciplinary research environments, and to construct at least a basic toolbox for ethical discussion in Digital Humanities research.

Sources and Literature

Inquiries made by Östman, Turtiainen and Vaahensalo with the researchers of the Citizen Mindscapes 24 project. Two rounds in 2016–2017.

Digital Humanities (DigiHum). Academy Programme 2016–2019. Programme memorandum. Helsinki: Academy of Finland.

Digital Humanities journals listed by Digital Humanities at Berkeley. http://digitalhumanities.berkeley.edu/resources/digital-humanities-journals

Markham, Annette & Buchanan, Elizabeth 2012: Ethical Decision-Making and Internet Research: Recommendations from the AoIR Ethics Working Committee (Version 2.0). https://aoir.org/reports/ethics2.pdf.

Saarikoski, Petri: “Ojennat kätesi verkkoon ja joku tarttuu siihen”. Kokemuksia ja muistoja kotimaisen BBS-harrastuksen valtakaudelta. Tekniikan Waiheita 2/2017.

Suominen, Jaakko (2016): ”Helposti ja halvalla? Nettikyselyt kyselyaineiston kokoamisessa.” In: Korkiakangas, Pirjo, Olsson, Pia, Ruotsala, Helena, Åström, Anna-Maria (eds.): Kirjoittamalla kerrotut – kansatieteelliset kyselyt tiedon lähteinä. Ethnos-toimite 19. Ethnos ry., Helsinki, 103–152. [Easy and Cheap? Online surveys in cultural studies.]

Suominen, Jaakko & Sivula, Anna (2016): “Digisyntyisten ilmiöiden historiantutkimus.” In Elo, Kimmo (ed.): Digitaalinen humanismi ja historiatieteet. Historia Mirabilis 12. Turun Historiallinen Yhdistys, Turku, 96–130. [Historical Research of Born Digital Phenomena.]

Turtiainen, Riikka & Östman, Sari 2013: Verkkotutkimuksen eettiset haasteet: Armi ja anoreksia. In: Laaksonen, Salla-Maaria et al. (eds.): Otteita verkosta. Verkon ja sosiaalisen median tutkimusmenetelmät. Tampere: Vastapaino. pp. 49–67.

– 2009: ”Tavistaidetta ja verkkoviihdettä – omaehtoisten verkkosisältöjen tutkimusetiikkaa.” In: Grahn, Maarit & Häyrynen, Maunu (eds.): Kulttuurituotanto – Kehykset, käytäntö ja prosessit. Tietolipas 230. SKS, Helsinki. pp. 336–358.

Vaahensalo, Elina: Kaikenkattavista portaaleista anarkistiseen sananvapauteen – Suomalaisten verkkokeskustelufoorumien vuosikymmenet. Tekniikan Waiheita 2/2017.

Östman, Sari 2007: ”Nettiksistä blogeihin: Päiväkirjat verkossa.” Tekniikan Waiheita 2/2007. Tekniikan historian seura ry. Helsinki. 37–57.

Östman, Sari 2008: ”Elämäjulkaiseminen – omaelämäkerrallisten traditioiden kuopus.” Elore, vol. 15-2/2008. Suomen Kansantietouden Tutkijain Seura. http://www.elore.fi./arkisto/2_08/ost2_08.pdf.

Östman, Sari & Turtiainen, Riikka 2016: From Research Ethics to Researching Ethics in an Online Specific Context. In Media and Communication, vol. 4, iss. 4. pp. 66–74. http://www.cogitatiopress.com/ojs/index.php/mediaandcommunication/article/view/571.

Östman, Sari, Riikka Turtiainen & Elina Vaahensalo 2017: From Online Research Ethics to Researching Online Ethics. Poster. Digital Humanities in the Nordic Countries 2017 Conference.


5:00pm - 5:15pm
Short Paper (10+5min) [abstract]

Copyright exceptions or licensing: how can a library acquire a digital game?

Olivier Charbonneau

Concordia University,

Copyright, caught in a digital maelstrom of perpetual reforms and shifting commercial practices, exacerbates tensions between cultural stakeholders. On the one hand, copyright seems to be drowned in Canada and the USA by the role reserved for copyright exceptions by parliaments and the courts. On the other, institutions such as libraries are keen to navigate digital environments by allocating their acquisitions budgets to digital works. How can markets, social systems and institutions emerge or interact if we are not able to resolve this tension?

Beyond the paradigm shifts brought by digital technologies or globalization, one must recognize the conceptual paradox surrounding digital copyrighted works. In economic terms, they behave naturally as public goods, while copyright attempts to restore their rivalrousness and excludability. Within this paradox lies tension, between the aggregate social wealth spread by a work and its commoditized value, between network effects and reserved rights.

In this paper, I will summarize the findings of my doctoral research project and apply them to the case of digital games in libraries.

The goal of my doctoral work was to ascertain the role of libraries in the markets and social systems of digital copyrightable works. Ancillary goals included exploring the “border” between licensing and exceptions in the context of heritage institutions, as well as building a new method for capturing the complexity of markets and social systems that stem from digital protected works. To accomplish these goals, I analysed a dataset comprising the terms and conditions of licences held by academic libraries in Québec. I show that the terms of these licences overlap with copyright exceptions, highlighting how libraries express their social mission in two normative contexts: positive law (copyright exceptions) and private ordering (licensing). This overlap is necessary yet poorly understood: the two are not competing institutional arrangements but the same image reflected in two distinct normative settings. It also provides rights-holders with a road map for making digital content available through libraries.

The study also points to the rising importance of automation and computerization in the provisioning of licences in the digital world. Metadata describing the terms of a copyright licence are increasingly represented in computer models and leveraged to mobilize digital corpora for the benefit of a community. Whereas the print world was driven by assumptions and physical limits on using copyrighted works, the digital environment introduces new data points for interactions which were previously hidden from scrutiny. The future lies not in optimizing transaction costs but in crafting elegant institutional arrangements through licensing.
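To make the idea of machine-readable licence terms concrete, the following is a minimal sketch, loosely inspired by rights-expression vocabularies such as ODRL. All field names, actions, and values here are invented for illustration; they are assumptions, not the actual schema used in the study.

```python
# Hypothetical machine-readable representation of a library licence.
# Field names and actions are illustrative assumptions only.
LICENCE_TERMS = {
    "resource": "ebook:example-title",
    "permissions": {"read", "interlibrary_loan"},
    "prohibitions": {"text_mining", "bulk_download"},
    "concurrent_users": 3,
}

def is_use_permitted(terms: dict, action: str) -> bool:
    """Return True if the action is explicitly permitted and not prohibited."""
    return action in terms["permissions"] and action not in terms["prohibitions"]

print(is_use_permitted(LICENCE_TERMS, "interlibrary_loan"))  # True
print(is_use_permitted(LICENCE_TERMS, "text_mining"))        # False
```

Once terms are encoded this way, checks that the print world left to assumption can be automated and audited, which is the shift the paragraph above describes.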

If libraries exist to capture some left-over value in the utility curve of our cultural, informational or knowledge markets, the current role they play in copyright need not change in the digital environment. What does change, however, is hermeneutics: how we attribute value to digital copyrighted works and how we study society’s use of them.

We conclude by transposing the results of this study to the case of digital games. Québec is currently a hotbed for both independent and AAA video game studios. Despite this, a market failure currently exists due to the absence of flexible licensing mechanisms to make indie games available through libraries. This part of the study was funded with the generous support from the Knight Foundation in the USA and conducted at the Technoculture Art & Games (TAG) research cluster of the Milieux Institute for arts, culture and technology at Concordia University in Montréal, Canada.

 
4:00pm - 5:30pmT-P674-3: Database Design
Session Chair: Jouni Tuominen
P674 
 
4:00pm - 4:30pm
Long Paper (20+10min) [publication ready]

Open Science for English Historical Corpus Linguistics: Introducing the Language Change Database

Joonas Kesäniemi1, Turo Vartiainen2, Tanja Säily2, Terttu Nevalainen2

1University of Helsinki, Helsinki University Library; 2University of Helsinki, Department of Modern Languages

This paper discusses the development of an open-access resource that can be used as a baseline for new corpus-linguistic research into the history of English: the Language Change Database (LCD). The LCD draws together information extracted from hundreds of corpus-based articles that investigate the ways in which English has changed in the course of history. The database includes annotated summaries of the articles, as well as numerical data extracted from the articles and transformed into machine-readable form, thus providing scholars of English with the opportunity to study fundamental questions about the nature, rate and direction of language change. It will also make the work done in the field more cumulative by ensuring that the research community will have continuous access to existing results and research data.

We will also introduce a tool that takes advantage of this new source of structured research data. The LCD Aggregated Data Analysis workbench (LADA) makes use of annotated versions of the numerical data available from the LCD and provides a workflow for performing meta-analytical experimentations with an aggregated set of data tables from multiple publications. Combined with the LCD as the source of collaborative, trusted and curated linked research data, the LADA meta-analysis tool demonstrates how open data can be used in innovative ways to support new research through data-driven aggregation of empirical findings in the context of historical linguistics.
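The meta-analytical aggregation described above can be illustrated with a small sketch: per-period frequency tables extracted from several publications are merged into one aggregated table. The study names, period labels, and counts below are invented examples, not LCD data, and this is not the actual LADA implementation.

```python
from collections import Counter

# Illustrative only: per-period counts of some linguistic variant,
# as extracted from two hypothetical corpus-based articles.
study_tables = {
    "Study A": {"1500-1549": 12, "1550-1599": 34},
    "Study B": {"1550-1599": 21, "1600-1649": 40},
}

# Aggregate the tables across publications by summing counts per period.
aggregate = Counter()
for table in study_tables.values():
    aggregate.update(table)

print(dict(aggregate))  # {'1500-1549': 12, '1550-1599': 55, '1600-1649': 40}
```

Because the LCD stores such tables in a shared machine-readable form, this kind of aggregation across studies becomes a routine operation rather than a manual re-extraction exercise.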


4:30pm - 4:45pm
Short Paper (10+5min) [abstract]

“Database Thinking and Deep Description: Designing a Digital Archive of the National Synchrotron Light Source (NSLS)”

Elyse Graham

Stony Brook University,

Our project involves developing a new kind of digital resource to capture the history of research at scientific facilities in the era of the “New Big Science.” The phrase “New Big Science” refers to the post-Cold War era at US national laboratories, when large-scale materials science accelerators rather than high-energy physics accelerators became marquee projects at most major basic research laboratories. The extent, scope, and diversity of research at such facilities make it difficult to track using traditional historical methods and linear narratives; there are too many overlapping and bifurcating threads. The sheer number of experiments that took place at the NSLS, and the vast amount of data that it produced across many disciplines, make it nearly impossible to gain a comprehensive global view of the knowledge production that took place at this facility.

We are therefore collaborating to develop a new kind of digital resource to capture the full history of this research. This project will construct a digital archive, along with an associated website, to obtain a comprehensive history of the National Synchrotron Light Source at Brookhaven National Laboratory. The project specifically will address the history of “the New Big Science” from the perspectives of data visualization and the digital humanities, in order to demonstrate that new kinds of digital tools can archive and present complex patterns of research and configurations of scientific infrastructure. In this talk, we briefly discuss methods of data collection, curation, and visualization for a specific case project, the NSLS Digital Archive.


4:45pm - 5:00pm
Distinguished Short Paper (10+5min) [publication ready]

Integrating Prisoners of War Dataset into the WarSampo Linked Data Infrastructure

Mikko Koho1, Erkki Heino1, Esko Ikkala1, Eero Hyvönen1,2, Reijo Nikkilä3, Tiia Moilanen3, Katri Miettinen3, Pertti Suominen3

1Semantic Computing Research Group (SeCo), Aalto University, Finland; 2HELDIG - Helsinki Centre for Digital Humanities, University of Helsinki, Finland; 3The National Prisoners of War Project

One of the great promises of Linked Data and the Semantic Web standards is to provide a shared data infrastructure into which more and more data can be imported and aligned, forming a sustainable, ever-growing knowledge graph or linked data cloud, the Web of Data. This paper studies and evaluates this idea in the context of the WarSampo Linked Data cloud, which provides an infrastructure for data related to the Second World War in Finland. As a case study, a new database of prisoners of war with related contents is considered, and lessons learned are discussed in relation to traditional data publishing approaches.
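The core Linked Data idea the abstract relies on can be sketched in a few lines: statements are subject-predicate-object triples, and a newly imported dataset is aligned with an existing knowledge graph by reusing the graph's shared identifiers. All URIs below are invented for illustration; they are not the actual WarSampo identifiers or vocabulary.

```python
# Hypothetical triples; "ex:" prefixes stand in for full URIs.
triples = [
    # Entity already present in the existing knowledge graph.
    ("ex:unit/JR8", "rdf:type", "ex:MilitaryUnit"),
    # Newly imported prisoner-of-war record, aligned to the graph
    # by pointing at the shared unit identifier.
    ("ex:prisoner/p123", "rdf:type", "ex:PrisonerRecord"),
    ("ex:prisoner/p123", "ex:servedIn", "ex:unit/JR8"),
]

def objects(subject, predicate):
    """Return all objects for a given subject/predicate pair."""
    return [o for s, p, o in triples if s == subject and p == predicate]

print(objects("ex:prisoner/p123", "ex:servedIn"))  # ['ex:unit/JR8']
```

Because the new record reuses the existing URI rather than minting a duplicate, queries over the combined graph traverse both datasets seamlessly, which is the alignment promise the paper evaluates.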


5:00pm - 5:15pm
Short Paper (10+5min) [abstract]

"Everlasting Runes": A Research Platform and Linked Data Service for Runic Research

Magnus Källström1, Marco Bianchi2, Marcus Smith1

1Swedish National Heritage Board; 2Uppsala University

"Everlasting Runes" (Swedish: "Evighetsrunor") is a three-year collaboration between the Swedish National Heritage Board and Uppsala University, with funding provided by the Bank of Sweden Tercentenary Foundation (Riksbankens jubileumsfond) and the Royal Swedish Academy of Letters (Kungliga Vitterhetsakademien). The project combines philology, archaeology, linguistics, and information systems, and is comprised of several research, digitisation, and digital development components. Chief among these is the development of a web-based research platform for runic researchers, built on linked open data services, with the aim of drawing together disparate structured digital runic resources into a single convenient interface. As part of the platform's development, the corpus of Scandinavian runic inscriptions in Uppsala University's Runic Text Database will be restructured and marked up for use on the web, and linked against their entries in the previously digitised standard corpus work (Sveriges runinskrifter). In addition, photographic archives of runic inscriptions from the 19th- and 20th centuries from both the Swedish National Heritage Board archives and Uppsala University library will be digitised, alongside other hitherto inaccessible archive material.

As a collaboration between a university and a state heritage agency with a small research community as its primary target audience, the project must bridge the gap between the different needs and abilities of these stakeholders, as well as resolve issues of long-term maintenance and stability which have previously proved problematic for some of the source datasets in question. It is hoped that the resulting research and data platforms will combine the strengths of both the National Heritage Board and Uppsala University to produce a rich, actively maintained scholarly resource.

This paper will present the background and aims of the project within the context of runic research, as well as the various datasets that will be linked together in the research platform (via its corresponding linked data service) with particular focus on the data structures in question, the philological markup of the corpus of inscriptions, and requirements gathering.


5:15pm - 5:30pm
Distinguished Short Paper (10+5min) [abstract]

Designing a Generic Platform for Digital Edition Publishing

Niklas Liljestrand

Svenska litteratursällskapet i Finland r.f.,

This presentation describes the technical design for streamlining the publication of Digital Editions on the web. The goal of the project is to provide a platform for scholars working with Digital Editions to independently create, edit, and publish their work. The platform is to be generic, but governed by set rules of conduct and processes, with rich documentation of use.

The work on the platform started during 2016, with a rebuild of the website for Zacharias Topelius Skrifter for the mobile web (presented during DHN 2017, http://dhn2017.eu/abstracts/#_Toc475332550). The work continues with making the responsive site easily customizable to suit the needs of the different editions.

The platform will consist of several independent tools, such as tools for publishing, version comparison, editing, and tagging XML TEI formatted documents. Many of the tools are already available today, but they require heavy customization for each new edition and run only on MS Windows. The project aims to combine and simplify these existing tools and make them platform-independent.

The project will be completed in 2018, and the aim is to publish all tools and documentation as open source.

 
5:30pm - 7:00pmDHN Annual meeting
PII, Porthania
 
7:30pm - 10:00pmConference dinner
Restaurant Sipuli, Kanavaranta 7