Digital Humanities in the Nordic Countries
3rd Conference

7–9 March 2018, Helsinki

If you have any problems, please contact the organisers at dhn-2018@helsinki.fi.

Conference Agenda


Session

T-P674-2: Crowdsourcing and Collaboration

Time:

Thursday, 08/Mar/2018:

2:00pm - 3:30pm

Session Chair: Hannu Salmi

Location: P674

Presentations

2:00pm - 2:30pm
Long Paper (20+10 min)

From crowdsourcing cultural heritage to citizen science: how the Danish National Archives’ 25-year-old transcription project is meeting digital historians

Barbara Revuelta-Eugercios¹,², Nanna Floor Clausen¹, Katrine Tovgaard-Olsen¹

¹Rigsarkivet (Danish National Archives); ²Saxo Institute, University of Copenhagen

The Danish National Archives have the oldest crowdsourcing project in Denmark, with more than 25 million records transcribed that illuminate the lives and deaths of Danes since the early 18th century. Until now, the main groups interested in creating and using these resources have been amateur historians and genealogists. However, it has become clear that the material also holds immense value for historians armed with the new digital methods. The rise of citizen science projects likewise shows an alternative way of crowdsourcing cultural heritage material, one with clear research purposes. How can the traditional crowd-centered approach of the existing projects, which goes so far that we can speak of co-creation, be reconciled with the narrowly defined research questions and methodological decisions that researchers require? How can the use of these materials by digital historians be increased without losing the projects’ core users?

This article sets out how the Danish National Archives are answering these questions. In the first section, we discuss the tensions and problems of combining the crowdsourcing of digital heritage with citizen science; in the second, the implications of the project’s crowd-centered nature for incorporating research interests; and in the third, the obstacles encountered and the solutions put in place to attract digital historians to work on this material.

Crowdsourcing cultural heritage: for the public and for the humanists

In recent decades, GLAMs (galleries, libraries, archives and museums) have embarked on digitalization projects to broaden access to their collections and to increase their dissemination and appeal, as well as to enrich them in different ways (tagging, transcribing, etc.), as part of their institutional missions. Many of these efforts have included audience or community participation. This participation can be loosely defined either as crowdsourcing or as activities that predate or conform to the standard definition of crowdsourcing, taking Howe’s (2006) business-related definition: “the act of taking a job traditionally performed by a designated agent (usually an employee) and outsourcing it to an undefined, generally large group of people in the form of an open call” (Ridge 2014). However, the key feature that differentiates these cultural heritage crowdsourcing projects is that the work the crowd performs has never been undertaken by employees. Instead, volunteers co-create new ways for the collections to be made available, disseminated, interpreted, enriched and enjoyed, which could never have been paid for within the institutions’ budgets.

These projects often feature “the crowd” at both ends of the process: volunteers contribute to improving access to and availability of the collections, which in turn benefits the general public from which the volunteers are drawn. In the process, access to digital cultural heritage material is democratized and facilitated through transcribing records, letters and menus, tagging images, digitizing new material, etc. As a knock-on effect, the research community can also benefit, as the new materials open up possibilities for researchers in the digital humanities. Humanities projects, which generally have limited funding, could never achieve the transcription of millions of records on their own.

At the same time, there has been a strand of academic applications of crowdsourcing in Humanities projects (Dunn and Hedges 2014). These initiatives fall within the so-called citizen science projects, which are driven by researchers and narrowly defined to answer a research question, so the tasks performed by the volunteers are aligned with a research purpose. Citizen science, or public participation in scientific research, which emerged from natural science projects in the mid-1990s (Bonney et al. 2009), has branched out into the Humanities, using the crowd in a similar way to institutional digitalization projects of cultural heritage material. In particular, archival material has been a rich source for such endeavours: weather observations from ship logs in Old Weather (Blaser 2014), Bentham’s works in Transcribe Bentham (Causer and Terras 2014) or restaurant menus in What’s on the Menu (2014). While some of these projects have been carried out in cooperation with the GLAMs responsible for the collections, the new opportunities opened up by the digital humanities allow such projects to be carried out by researchers independently of the institutions that host the collections. This misses a great opportunity to combine interests and avoid duplicating work.

Making a single project serve both the crowdsourcing of cultural heritage material and citizen science faces many challenges. First, a collaboration needs to be established across at least two institutional settings (a GLAM and a research institution) that have very different institutional aims, funding, cultures and legal frameworks. The foundational missions of GLAMs often put serving the general public first, and the research community makes up only a tiny percentage of their users. Any institutional research they undertake on the collections is restricted to particular areas or aspects of the collections and to institutional interests, although it is less dependent on external funding. Academia, by contrast, has a freer approach to formulating research questions. However, it is often staffed through short-term positions and projects, and it faces time constraints, a need for rapid publication and the ever-present demand to prove originality and innovation.

Additionally, when moving from cultural heritage dissemination to research applications, a wide set of issues comes into view in these crowdsourcing projects that can determine their development and success: the boundaries between professional and lay expertise, the balance of power in the collaboration between the public, institutions and researchers, ethical concerns about data quality and data ownership, etc. (Riesch and Potter 2014, Shirk et al. 2012).

The Danish National Archives’ 25-year-old crowd-centered crowdsourcing approach

In this context, the Danish National Archives are dealing with the challenge of incorporating a more citizen-science-oriented approach and attracting historians (and digital humanists) to work with the existing digitized sources, while maintaining their commitment to the volunteers. This challenge is particularly difficult here because the interests of the archives and of researchers must align not only with each other but also with those of the “crowd” itself, as volunteers have played a major role in co-creating the project for 25 years.

The original project, now the Danish Demographic Database, DDD (www.ddd.dda.dk), is the oldest “crowdsourcing project” in the country. It started in 1992 because the genealogical communities were interested in coordinating the transcription of historical censuses and church books (Clausen and Marker 2000). From the beginning, the volunteers were actively involved in deciding what was to be done and how, while the Danish National Archives (Rigsarkivet) were in charge of coordination, management and dissemination. The project has therefore had a dual governance and a continuous negotiation of priorities through a coordination committee, which combines members of the public and of genealogical societies with Rigsarkivet personnel.

This tradition of co-creation has shaped the current state of the project and its relationship to research. The subsequent Crowdsourcing portal, CS (https://cs.sa.dk/), which started in 2014 with an online interface, broadened both the range of sources under transcription and the engagement with volunteers (in photographing, counselling, etc.). It maintains a strong philosophy of serving the volunteers’ wishes and interests rather than imposing particular lines of work. Crowdsourcing is seen as more than a framework for creating content: it is also a form of engagement with the collections that benefits both audiences and the archive. However, the portal has also introduced some citizen science projects, in which the transcriptions are intended to be used for research (e.g. the Criminality History project).

Digital history from the crowdsourced material: present and future

Despite the largely crowd-oriented nature of this crowdsourcing project, there were also broad research interests (if not a clearly defined research project) behind the birth of DDD, so the decisions taken in its setup ensured that the data were suitable for research. Dozens of projects and publications have made use of it, applying new digital history methods, and the data have been included in international efforts such as the North Atlantic Population Project (NAPP.org).

However, while they are well known in genealogist and amateur historian circles, the Danish National Archives’ large crowdsourcing projects are still either unknown to historians and students in the country or not taken advantage of by them. Some of the reasons are related to field-specific developments, but one of the key constraints on wider use is undoubtedly the lack of adequate training. History degrees offer no systematic training in handling historical data or in digital methods, even as we witness a clear increase in the digital Humanities.

In this context, the Danish National Archives are trying to put their material into the hands of more digital historians, building bridges to the Danish universities by multiple means: collaboration with universities in seeking joint research projects and applications (SHIP and Link Lives project); active dissemination of the material for educational purposes across disciplines (Supercomputer Challenge at Southern Denmark University); addressing the lack of training and familiarity of students and researchers with it through targeted workshops and courses, including training in digital history methods (Rigsarkivets Digital History Labs); and promotion of an open dialogue with researchers to identify more sources that could combine the aims of access democratization and citizen science.

References

Blaser, L. 2014. “Old Weather: approaching collections from a different angle”, in Ridge (ed.) Crowdsourcing our Cultural Heritage, Ashgate, 45–56.

Bonney, R. et al. 2009. Public Participation in Scientific Research: Defining the Field and Assessing Its Potential for Informal Science Education. Center for Advancement of Informal Science Education (CAISE), Washington, DC.

Causer, T. and Terras, M. 2014. “‘Many hands make light work. Many hands together make merry work’: Transcribe Bentham and crowdsourcing manuscript collections”, in Ridge (ed.) Crowdsourcing our Cultural Heritage, Ashgate, 57–88.

Clausen, N.F. and Marker, H.J. 2000. “The Danish Data Archive”, in Hall, P.K., McCaa, R. and Thorvaldsen, G. (eds) International Historical Microdata for Population Research, Minnesota Population Center, Minneapolis, Minnesota, 79–92.

Dunn, S. and Hedges, M. 2014. “How the crowd can surprise us: Humanities crowd-sourcing and the creation of knowledge”, in Ridge (ed.) Crowdsourcing our Cultural Heritage, Ashgate, 231–246.

Howe, J. 2006. “The rise of crowdsourcing”, Wired, June.

Ridge, M. 2014. “Crowdsourcing our cultural heritage: Introduction”, in Ridge (ed.) Crowdsourcing our Cultural Heritage, Ashgate, 1–16.

Riesch, H. and Potter, C. 2014. “Citizen science as seen by scientists: methodological, epistemological and ethical dimensions”, Public Understanding of Science 23 (1), 107–120.

Shirk, J.L. et al. 2012. “Public participation in scientific research: a framework for deliberate design”, Ecology and Society 17 (2).


2:30pm - 2:45pm
Short Paper (10+5 min)

CAWI for DH

Jānis Daugavietis, Rita Treija

Institute of Literature, Folklore and Art - University of Latvia

The survey method, which uses a questionnaire to acquire different kinds of information from a population, is an old and classic way to collect data. Examples of such surveys can be traced back to ancient civilizations, such as censuses or standardised agricultural records. The main instrument of this method is the question (closed-ended or open-ended), which should be asked in exactly the same way of all representatives of the surveyed population. During the last 20–25 years the internet survey method (also called web, electronic, online or CAWI [computer-assisted web interviewing] survey) has become well developed and is increasingly employed in the social sciences and in marketing research, among other fields. CAWI is usually designed to acquire quantitative data, but, as with the other most widely used survey modes (face-to-face paper-assisted, telephone or mail interviews), it can also be used to collect qualitative data, such as unstructured or semi-structured text or speech, pictures, sounds etc.

In recent years DH (digital humanities) has started to use CAWI-like methodology more often. At the same time, humanities scholars’ knowledge in this field is somewhat limited, because of a lack of previous experience and, in many cases, of training: humanities curricula usually do not include quantitative methods. The paper seeks to analyse the specificity of CAWI designed for the needs of DH, where the goal of interacting with respondents is to acquire primary data (e.g. questioning or interviewing them on a certain topic in order to create a new data set or collection).

The use of questionnaires to collect data on traditional culture dates back to an early stage in the disciplinary history of Latvian folkloristics, namely the end of the 19th century and the beginning of the 20th century (questionnaires published by Dāvis Ozoliņš, Eduard Wolter, Pēteris Šmits and Pēteris Birkerts). The Archives of Latvian Folklore was established in 1924. Its founder and first Head, the folklorist and schoolteacher Anna Bērzkalne, regularly sent the Archives’ collaborators questionnaires (jautājumu lapas) on various topics of Latvian folklore. She both created original sets of questions herself and translated into Latvian and adapted those by Estonian and Finnish folklore scholars (instructions for collecting children’s songs by Walter Anderson; questionnaires on folk beliefs by O. A. F. Mustonen, alias Oskar Anders Ferdinand Lönnbohm, and by Viljo Johannes Mansikka). The localised equivalents were published in the press and distributed to Latvian collectors. Printed questionnaires such as “House and Household”, “Fishing and Fish”, “Relations between Relatives and Neighbors” and others presented sets of questions formulated in a suggestive way, so that anyone with some interest could easily engage in the work. The handwritten responses from contributors were sent to the Archives of Latvian Folklore from all regions of the country; the collection of folk beliefs in the late 1920s greatly extended the range of materials at the Archives.

However, the survey did not last long as a method of collecting folklore in Latvia. Soon after World War II it was supplanted by the dominance of collective fieldwork and, at the end of the 20th century, by individual field research, mainly face-to-face qualitative interviews with informants.

Only in 2017 did the Archives of Latvian Folklore revive remote data collection, this time via online questionnaires. Within the project “Empowering knowledge society: interdisciplinary perspectives on public involvement in the production of digital cultural heritage” (funded by the European Regional Development Fund), a virtual inquiry module has been developed. The working group on virtual ethnography launched a series of online surveys aimed at studying the calendric practices of individuals in the 21st century. Along with developing tools for iterative inquiry, data accumulation and analysis, the researchers have tried to find solutions to the technical and ethical challenges of our day.

Mathematics, sociology and other sciences have developed a coherent theoretical methodology and have accumulated experience-based knowledge of online survey tools. This raises several questions, such as:

- How much of this knowledge is known by DH?

- How useful is it for DH? How different is DH CAWI?

- What would be the most important aspects for DH CAWI?

To answer these questions, we will make a schematic comparison between the ‘traditional’, most common CAWI of the social sciences and the CAWI of DH, drawing on our previous work in the fields and institutions of sociology, statistics and the humanities.


2:45pm - 3:00pm
Short Paper (10+5 min)

Wikidocumentaries

Susanna Ånäs

Aalto University

Background

Wikidocumentaries is a concept for a collaborative online space for gathering, researching and remediating cultural heritage items from memory institutions, open platforms and the participants. The setup brings together communities of interest and of expertise to work together on shared topics with online tools. For memory organizations, Wikidocumentaries offers a platform for crowdsourcing; for amateur and expert researchers, it provides peers and audiences; and from the point of view of the open environments, it acts as a site of curation.

Current environments fall short in serving this purpose. Content aggregators focus on gathering, harmonizing and serving the content. Commercial services, in their search for profit, fail to take into account the open and connected environment. Research environments do not prioritize public access and broad participation. Many participatory projects live short lives, passing from enthusiastic engagement to oblivion because the sustainability of their results was not planned. Wikidocumentaries tries to address these challenges.

This short paper will be a first attempt at creating an inventory of the research topics that this environment brings to the surface.

The topics

Technologically, the main focus of the project is investigating the use of linked open data, and especially proposing Wikidata as a means of establishing meaningful connections across collections and of ensuring the sustainability of the collected data.

Co-creation is an important topic in many senses. What are the design issues for an environment that encourages collaborative creative work? How can the collaboration reach out from the online environment into communities of interest in everyday life? What are the characteristics of the collaborative creations, and what kind of creative entrepreneurship can such an open environment promote? How can a community of technical contributors to the open environments be fostered and expanded?

The legislative environment sets the boundaries for working. How will privacy and openness be balanced? Which copyright licensing schemes can encourage widest participation? Can novel technologies of personal information management be applied to allow wider participation?

The paper will draw together recent observations from a selection of disciplines for practices in creating participatory knowledge environments.


3:00pm - 3:15pm
Short Paper (10+5 min)

Heritage Here, K-Lab and intra-agency collaboration in Norway

Vemund Olstad, Anders Olsson

Directorate for Cultural Heritage

Introduction

This paper aims to give an overview of an ongoing collaboration between four Norwegian government agencies, by outlining its history, its goals and achievements and its current status. In doing so, we will, hopefully, be able to arrive at some conclusions about the usefulness of the collaboration itself – and whether or not anything we have learned during the collaboration can be used as a model for – or an inspiration to – other projects within the cultural heritage sector or the broader humanities environment.

First phase – “Heritage Here” 2012 – 2015

Heritage Here (or “Kultur- og naturreise” as it is known in its native Norwegian) was a national project which ran between 2012 and 2015 (http://knreise.org/index.php/english/). The project had two main objectives:

1. To help increase access to and use of public information and local knowledge about culture and nature

2. To promote the use of better quality open data.

The aim was that anyone with a smartphone could gain instant access to relevant facts and stories about their local area, wherever they might be in the country.

The project was the result of cross-agency cooperation between five agencies from three different ministries. Project partners included:

• the Norwegian Mapping Authority (Ministry of Local Government and Modernization).

• the Arts Council Norway and the National Archives (Ministry of Culture).

• the Directorate of Cultural Heritage and (until December 2014) the Norwegian Environment Agency (the Ministry of Climate and Environment).

Together, these partners made their own data digitally accessible, to be enriched, geo-tagged and disseminated in new ways. Content included information about animal and plant life, cultural heritage and historical events, and varied from factual data to personal stories. The content was collected into Norway’s national digital infrastructure ‘Norvegiana’ (http://www.norvegiana.no/), from where it can be used and developed by others through open and documented APIs to create new services for business, tourism or education. Parts of this content were also exported into the European aggregation service Europeana.eu (http://www.europeana.eu).

In 2012 and 2013 the main focus of the project was to facilitate further development of technical infrastructures - to help extract data from partner databases and other databases for mobile dissemination. However, the project also worked with local partners in three pilot areas:

• Bø and Sauherad rural municipalities in Telemark county

• The area surrounding Akerselva in Oslo

• The mountainous area of Dovre in Oppland county.

These pilots were crucial to the project, both as an arena for testing the content from the various national datasets and as a testing ground for user community participation on a local and regional level. They have also been an opportunity to see Heritage Here’s work in a larger context. The Telemark pilot was, for example, used to test the cloud-based mapping tools developed in the Best Practice Network “LoCloud” (http://www.locloud.eu/), which was coordinated by the National Archives of Norway.

In addition to the activities mentioned above, Heritage Here worked towards being a competence builder – organizing over 20 workshops on digital storytelling and geo-tagging of data, and numerous open seminars on topics ranging from open data and LOD to IPR and copyright issues. The project also organized Norway’s first heritage hackathon, “#hack4no”, in early 2014 (http://knreise.org/index.php/2014/02/27/hack4no-a-heritage-here-hackathon/). This first hackathon has since become an annual event – organized by one of the participating agencies (the Mapping Authority) – and a great success story, with 50+ participants coming together to create new and innovative services using open public data.

Drawing on the experience it had gathered, the project focused its final year on developing various web-based prototypes that use a map as the user’s starting point. These demonstrate a number of approaches for visualizing and accessing different types of cultural heritage information from various open data sets – such as content related to a particular area, route or subject. These prototypes are freely and openly accessible as web tools for anyone to use (http://knreise.no/demonstratorer/). The code for the prototypes has been made openly available so it can be used by others – either as it is, or as a starting point for something new.

Second phase – “K-Lab” 2016 onwards

At the end of 2015 Heritage Here ended as a project, but the four remaining project partners decided to continue their digital cross-agency cooperation. In January 2016 a new joint initiative with the same core governmental partners was therefore set up, and Heritage Here went from being a project to being a formalized collaboration between four government agencies. This new partnership focuses on some key issues seen as crucial for further developing the results of the Heritage Here project. Among these are:

• Jointly develop, document and maintain robust, common and sustainable APIs for the partnership’s data and content.

• Address and discuss the need for, and potential use of, different aggregation services for this field.

• Develop and maintain plans and services for a free and open flow of open and reusable data between and from the four partner organizations.

• In cooperation with other governmental bodies, organize another heritage hackathon in October 2016 with an explicit focus on open data, sharing, reuse and new services for both the public and the cultural heritage management sector.

• As a partnership, develop skill sets, networks, arenas and competence within this field of expertise for the employees of the four partner organizations (and beyond).

• Continue developing and strengthening partnerships on a local, national and international level through the use of open workshops, training, conferences and seminars.

• Continue to work towards improving data quality and promoting the use of open data.

One key challenge at the end of the Heritage Here project was making the transition from a project group to a more permanent organizational entity without losing key competence and experience. This was resolved by having each agency employ one person from the project and assign this person to the K-Lab collaboration in a 50% position. The remaining time was to be spent on other tasks for the agency. This helped ensure the following:

• Continuity. The same project group could continue working, albeit organized in a slightly different manner.

• Transfer of knowledge. Competence built during Heritage Here was transferred to the line organizations of the agencies involved.

• Information exchange. Because one employee from each agency met on a regular basis, information, ideas for common projects and solutions to common problems could easily be exchanged between the collaboration partners.

In addition to allocating human resources, each agency contributed roughly EUR 20,000 as ‘free funds’. The main reasoning behind this approach was to give the new entity a certain operational freedom and room for creativity, while at the same time tying it more closely to the day-to-day running of the agencies.

Based on an evaluation of the results achieved in Heritage Here, the start of 2016 was spent planning the direction forward for K-Lab, and a plan was formulated – outlining activities covering several thematic areas:

Improving data quality and accessibility. Making data available to the public was one of the primary goals of the Heritage Here project, and one of its most important outcomes was the realisation that all the agencies involved have huge room for improvement in the quality of the data they make available and in how they make it accessible. One of K-Lab’s tasks will be to cooperate on making quality data available through well-documented APIs and on making sure that as much data as possible has open licenses that allow unlimited reuse.

Piloting services. The work done in the last year of Heritage Here on the map service mentioned above demonstrated to all parties involved the importance of actually building services that make use of our own open data. As part of its scope, K-Lab will function as a ‘sandbox’ both for coming up with ideas for new services and, to the extent that budget and resources allow, for trying out new technologies and services. One such pilot service is the crowdsourcing platform for improving metadata on historic photos (https://fotodugnad.ra.no/), set up by K-Lab in collaboration with the Estonian photographic heritage society.

For 2018, K-Lab will start looking into building a service that makes use of linked open data from our organizations. All of our agencies are data owners responsible for authority data in some form or another, ranging from geographical names to cultural heritage data and person data. Some work has already been done to bring our technical departments closer together in this field, but we plan to do ‘something’ on a practical level next year.

Building competence. In order to facilitate the exchange of knowledge between the collaboration partners K-Lab will arrange seminars, workshops and conferences as arenas for discussing common challenges, learning from each other and building networks. This is done primarily to strengthen the relationship between the agencies involved – but many activities will have a broader scope. One such example is the intention to arrange workshops – roughly every two months – on topics that are relevant to our agencies, but that are open to anyone interested. To give a rough overview of the range of topics, these workshops were arranged in 2017:

• A practical introduction to CIDOC-CRM (May)

• Workshop on Europeana 1914-1918 challenge – co-host: Wikimedia Norway (June)

• An introduction to KulturNAV – co-host: Vestfoldmuseene (September)

• Getting ready for #hack4no (October)

• Transkribus – Text recognition and transcription of handwritten text – co-host: The Munch Museum (November)

Third phase – 2018 and beyond

K-Lab is very much a work in progress, and the direction it takes in the future depends on many factors. However, a joint workshop was held in September 2017 to evaluate the work done so far and to try to map out a direction for the future. Employees from all levels of the organisations were present, along with invited guests from other institutions in the cultural sector, such as the National Library and Digisam from Sweden, to evaluate, discuss and suggest ideas.

No definite conclusions were drawn, but there was an overall agreement that the focus on the three areas described above is of great importance, and that the work done so far by the agencies together has been, for the most part, successful. Setting up arenas for discussing common problems, sharing success stories and interacting with colleagues across agency boundaries has been a key element in the relative success of K-Lab so far. This work will continue into 2018 with focus on thematic groups on linked open data and photo archives, and a new series of workshops is being planned. The experimentation with technology will continue, and hopefully new ideas will be brought forward and realised over the course of the next year(s).


3:15pm - 3:30pm
Short Paper (10+5 min)

Semantic Annotation of Cultural Heritage Content

Uldis Bojārs¹,², Anita Rašmane¹

¹National Library of Latvia; ²Faculty of Computing, University of Latvia

This talk focuses on the semantic annotation of textual content and on annotation requirements that emerge from the needs of cultural heritage annotation projects. The information presented here is based on two text annotation case studies at the National Library of Latvia and was generalised to be applicable to a wider range of annotation projects.

The two case studies examined in this work are (1) correspondence (letters) from the late 19th century between two of the most famous Latvian poets, Aspazija and Rainis, and (2) a corpus of parliamentary transcripts that document the first four parliamentary terms in Latvian history (1922–1934).

The first half of the talk focuses on the annotation requirements collected and on how they may be implemented in practical applications. We propose a model for representing annotation data and implementing annotation systems. The model supports three core types of annotations: simple annotations that may link to named entities; structural annotations that mark up portions of the document that have a special meaning within the context of the document; and composite annotations for more complex use cases. The model also introduces a separate Entity database for maintaining information about the entities referenced from annotations.
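The annotation model described above can be illustrated with a small data-structure sketch. This is not the authors’ implementation: all class names, field names (`start`, `end`, `role`, `entity_id`, `same_as`) and the helper `referenced_entities` are hypothetical, chosen only to show the three core annotation types and an Entity database kept separate from the annotations that point into it by id. Spans are assumed to be character offsets into the document text.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional, Union


@dataclass
class Entity:
    """A record in the separate Entity database (field names are hypothetical)."""
    entity_id: str
    label: str
    entity_type: str  # e.g. "Person" or "Place" (illustrative values)
    same_as: list[str] = field(default_factory=list)  # links to external LOD resources


@dataclass
class SimpleAnnotation:
    """A character span in the text that may point to an entity by id."""
    start: int
    end: int
    entity_id: Optional[str] = None


@dataclass
class StructuralAnnotation:
    """A span with a special role within the document, e.g. a letter's salutation."""
    start: int
    end: int
    role: str


@dataclass
class CompositeAnnotation:
    """Groups other annotations for more complex use cases."""
    label: str
    parts: list[Annotation] = field(default_factory=list)


# Any of the three core annotation types.
Annotation = Union[SimpleAnnotation, StructuralAnnotation, CompositeAnnotation]


class EntityDatabase:
    """Keeps entities apart from the annotations that reference them."""

    def __init__(self) -> None:
        self._entities: dict[str, Entity] = {}

    def add(self, entity: Entity) -> None:
        self._entities[entity.entity_id] = entity

    def get(self, entity_id: str) -> Entity:
        return self._entities[entity_id]


def referenced_entities(anns: list[Annotation], db: EntityDatabase) -> list[Entity]:
    """Resolve entities referenced by simple annotations, recursing into composites."""
    found: list[Entity] = []
    for ann in anns:
        if isinstance(ann, SimpleAnnotation) and ann.entity_id is not None:
            found.append(db.get(ann.entity_id))
        elif isinstance(ann, CompositeAnnotation):
            found.extend(referenced_entities(ann.parts, db))
    return found


# Usage with an invented sentence (not taken from the actual corpus).
text = "Dear Rainis, Aspazija writes from Riga."
db = EntityDatabase()
db.add(Entity("e1", "Rainis", "Person"))
db.add(Entity("e2", "Riga", "Place"))
anns = [
    StructuralAnnotation(0, 11, role="salutation"),
    CompositeAnnotation("greeting", parts=[SimpleAnnotation(5, 11, entity_id="e1")]),
    SimpleAnnotation(34, 38, entity_id="e2"),
]
print([e.label for e in referenced_entities(anns, db)])  # ['Rainis', 'Riga']
```

Keeping entities in their own store means several annotations, possibly in different documents, can share one entity record, which matches the role the abstract gives to the Entity database.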

In the second half of the talk we will present a web-based semantic annotation tool that was developed on the basis of this annotation model and these requirements. It allows users to import textual documents (various formats such as HTML and .docx are supported), create annotations and reference the named entities mentioned in these documents. Information about the entities referenced from annotations is maintained in a dedicated Entity database that supports links between entities and can point to additional information about them, including Linked Open Data resources. Information about these entities is published as Linked Data. Annotated documents may be exported (along with annotation and entity information) in a number of representations, including a standalone web view.
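As a rough illustration of publishing entity information as Linked Data, the sketch below serializes one entity record as N-Triples using only the standard library. The base namespace `http://example.org/entity/`, the use of schema.org classes for types and the external link target are all hypothetical placeholders; the abstract does not say which vocabularies or URIs the tool actually uses. The `rdf:type`, `rdfs:label` and `owl:sameAs` predicates are standard W3C terms.

```python
from __future__ import annotations

# Hypothetical base namespace for the Entity database; the real URIs are not given in the abstract.
BASE = "http://example.org/entity/"
# Standard W3C vocabulary terms.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
# schema.org classes are used here only as an illustrative choice of types.
SCHEMA = "http://schema.org/"


def escape_literal(value: str) -> str:
    """Escape a string for use inside an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def entity_to_ntriples(entity_id: str, label: str, entity_type: str, same_as: list[str]) -> str:
    """Serialize one entity record as N-Triples lines."""
    subject = f"<{BASE}{entity_id}>"
    lines = [
        f"{subject} <{RDF_TYPE}> <{SCHEMA}{entity_type}> .",
        f'{subject} <{RDFS_LABEL}> "{escape_literal(label)}" .',
    ]
    # owl:sameAs links point to the same entity in external Linked Open Data resources.
    lines.extend(f"{subject} <{OWL_SAME_AS}> <{uri}> ." for uri in same_as)
    return "\n".join(lines) + "\n"


print(entity_to_ntriples("e1", "Rainis", "Person", ["https://example.org/lod/rainis"]), end="")
```

N-Triples is chosen here only because it can be produced without third-party libraries; a production tool would more likely use an RDF library and content negotiation.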


Conference: DHN 2018
