T-P674-3: Database Design
Thursday, 08/Mar/2018:
4:00pm - 5:30pm

Session Chair: Jouni Tuominen
Location: P674

4:00pm - 4:30pm
Long Paper (20+10min) [publication ready]

Open Science for English Historical Corpus Linguistics: Introducing the Language Change Database

Joonas Kesäniemi1, Turo Vartiainen2, Tanja Säily2, Terttu Nevalainen2

1University of Helsinki, Helsinki University Library; 2University of Helsinki, Department of Modern Languages

This paper discusses the development of an open-access resource that can be used as a baseline for new corpus-linguistic research into the history of English: the Language Change Database (LCD). The LCD draws together information extracted from hundreds of corpus-based articles that investigate the ways in which English has changed in the course of history. The database includes annotated summaries of the articles, as well as numerical data extracted from the articles and transformed into machine-readable form, thus providing scholars of English with the opportunity to study fundamental questions about the nature, rate and direction of language change. It will also make the work done in the field more cumulative by ensuring that the research community will have continuous access to existing results and research data.

We will also introduce a tool that takes advantage of this new source of structured research data. The LCD Aggregated Data Analysis workbench (LADA) makes use of annotated versions of the numerical data available from the LCD and provides a workflow for performing meta-analytical experiments with an aggregated set of data tables from multiple publications. Combined with the LCD as the source of collaborative, trusted and curated linked research data, the LADA meta-analysis tool demonstrates how open data can be used in innovative ways to support new research through data-driven aggregation of empirical findings in the context of historical linguistics.
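As a rough illustration of what aggregating data tables from multiple publications can involve, the following Python sketch pools frequency counts from two studies into one normalised rate per period. It is not the LADA workflow: the table layout, the function name `pool_tables` and all numbers are invented for illustration.

```python
from collections import defaultdict

# Hypothetical tables: each row is (period, hits, corpus size in words).
# The values are made up and do not come from the LCD.
study_a = [("1500-1599", 120, 400_000), ("1600-1699", 300, 600_000)]
study_b = [("1600-1699", 90, 200_000), ("1700-1799", 50, 250_000)]

def pool_tables(*tables):
    """Sum hits and corpus sizes per period across all tables, then
    return the pooled frequency per 10,000 words for each period."""
    hits = defaultdict(int)
    words = defaultdict(int)
    for table in tables:
        for period, n_hits, n_words in table:
            hits[period] += n_hits
            words[period] += n_words
    return {p: 10_000 * hits[p] / words[p] for p in sorted(hits)}

pooled = pool_tables(study_a, study_b)
```

Pooling raw counts before normalising weights each study by its corpus size; averaging the per-study rates instead would give small corpora the same weight as large ones.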

4:30pm - 4:45pm
Short Paper (10+5min) [abstract]

“Database Thinking and Deep Description: Designing a Digital Archive of the National Synchrotron Light Source (NSLS)”

Elyse Graham

Stony Brook University

Our project involves developing a new kind of digital resource to capture the history of research at scientific facilities in the era of the “New Big Science.” The phrase “New Big Science” refers to the post-Cold War era at US national laboratories, when large-scale materials science accelerators rather than high-energy physics accelerators became marquee projects at most major basic research laboratories. The extent, scope, and diversity of research at such facilities make it difficult to track and compile using traditional historical methods and linear narratives; there are too many overlapping and bifurcating threads. The sheer number of experiments that took place at the NSLS, and the vast amount of data that it produced across many disciplines, make it nearly impossible to gain a comprehensive global view of the knowledge production that took place at this facility.

We are therefore collaborating to develop a new kind of digital resource to capture the full history of this research. This project will construct a digital archive, along with an associated website, to obtain a comprehensive history of the National Synchrotron Light Source at Brookhaven National Laboratory. The project specifically will address the history of “the New Big Science” from the perspectives of data visualization and the digital humanities, in order to demonstrate that new kinds of digital tools can archive and present complex patterns of research and configurations of scientific infrastructure. In this talk, we briefly discuss methods of data collection, curation, and visualization for a specific case project, the NSLS Digital Archive.

4:45pm - 5:00pm
Distinguished Short Paper (10+5min) [publication ready]

Integrating Prisoners of War Dataset into the WarSampo Linked Data Infrastructure

Mikko Koho1, Erkki Heino1, Esko Ikkala1, Eero Hyvönen1,2, Reijo Nikkilä3, Tiia Moilanen3, Katri Miettinen3, Pertti Suominen3

1Semantic Computing Research Group (SeCo), Aalto University, Finland; 2HELDIG - Helsinki Centre for Digital Humanities, University of Helsinki, Finland; 3The National Prisoners of War Project

One of the great promises of Linked Data and the Semantic Web standards is to provide a shared data infrastructure into which more and more data can be imported and aligned, forming a sustainable, ever-growing knowledge graph or linked data cloud, the Web of Data. This paper studies and evaluates this idea in the context of the WarSampo Linked Data cloud, providing an infrastructure for data related to the Second World War in Finland. As a case study, a new database of prisoners of war with related contents is considered, and lessons learned are discussed in relation to using traditional data publishing approaches.
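The idea of importing a new dataset into a shared graph and aligning it with existing entities can be sketched in a few lines of Python. This is a minimal illustration only, not the WarSampo pipeline: the URIs under `example.org`, the property names, the record fields and the exact-name matching rule are all assumptions made for the example.

```python
# Triples are (subject, predicate, object) tuples; all URIs are hypothetical.
EX = "http://example.org/"

existing_graph = {
    (EX + "person/1", EX + "name", "A. Example"),
    (EX + "person/1", EX + "unit", EX + "unit/7"),
}

# A hypothetical record from the new dataset to be integrated.
new_dataset = [
    {"local_id": "pow-42", "name": "A. Example", "captured": "1941-08-12"},
]

def align(graph, records):
    """Return a new graph in which each record is attached to an existing
    person with the same name, or to a newly minted URI otherwise."""
    by_name = {o: s for s, p, o in graph if p == EX + "name"}
    merged = set(graph)
    for rec in records:
        subject = by_name.get(rec["name"], EX + "pow/" + rec["local_id"])
        merged.add((subject, EX + "name", rec["name"]))
        merged.add((subject, EX + "captured", rec["captured"]))
    return merged

merged = align(existing_graph, new_dataset)
```

Matching on exact name strings is deliberately naive; real person records generally need additional evidence, such as dates or places, before two entries can be treated as the same individual.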

5:00pm - 5:15pm
Short Paper (10+5min) [abstract]

"Everlasting Runes": A Research Platform and Linked Data Service for Runic Research

Magnus Källström1, Marco Bianchi2, Marcus Smith1

1Swedish National Heritage Board; 2Uppsala University

"Everlasting Runes" (Swedish: "Evighetsrunor") is a three-year collaboration between the Swedish National Heritage Board and Uppsala University, with funding provided by the Bank of Sweden Tercentenary Foundation (Riksbankens jubileumsfond) and the Royal Swedish Academy of Letters (Kungliga Vitterhetsakademien). The project combines philology, archaeology, linguistics, and information systems, and comprises several research, digitisation, and digital development components. Chief among these is the development of a web-based research platform for runic researchers, built on linked open data services, with the aim of drawing together disparate structured digital runic resources into a single convenient interface. As part of the platform's development, the corpus of Scandinavian runic inscriptions in Uppsala University's Runic Text Database will be restructured and marked up for use on the web, and linked against their entries in the previously digitised standard corpus work (Sveriges runinskrifter). In addition, photographic archives of runic inscriptions from the 19th and 20th centuries from both the Swedish National Heritage Board archives and Uppsala University library will be digitised, alongside other hitherto inaccessible archive material.

As a collaboration between a university and a state heritage agency with a small research community as its primary target audience, the project must bridge the gap between the different needs and abilities of these stakeholders, as well as resolve issues of long-term maintenance and stability which have previously proved problematic for some of the source datasets in question. It is hoped that the resulting research and data platforms will combine the strengths of both the National Heritage Board and Uppsala University to produce a rich, actively maintained scholarly resource.

This paper will present the background and aims of the project within the context of runic research, as well as the various datasets that will be linked together in the research platform (via its corresponding linked data service) with particular focus on the data structures in question, the philological markup of the corpus of inscriptions, and requirements gathering.

5:15pm - 5:30pm
Distinguished Short Paper (10+5min) [abstract]

Designing a Generic Platform for Digital Edition Publishing

Niklas Liljestrand

Svenska litteratursällskapet i Finland r.f.

This presentation describes the technical design for streamlining work with publishing Digital Editions on the web. The goal of the project is to provide a platform for scholars working with Digital Editions to independently create, edit, and publish their work. The platform is to be generic, but with set rules of conduct and processes, providing rich documentation of use.

The work on the platform started in 2016, with a rebuild of the website for Zacharias Topelius Skrifter for the mobile web (presented during DHN 2017). The work continues with building the responsive site to be easily customizable and to suit the needs of different editions.

The platform will consist of several independent tools, such as tools for publishing, version comparison, editing, and tagging TEI XML documents. Many of the tools are already available today, but they depend heavily on customization for each new edition and run on MS Windows only. For the existing tools, the project aims to combine and simplify them and to make them platform independent.

The project will be completed within 2018 and the aim is to publish all tools and documentation as open-source.

