Session
T-PIII-1: Open and Closed
Presentations
11:00am - 11:30am
Long Paper (20+10min) [abstract] When Open becomes Closed: Findings of the Knowledge Complexity (KPLEX) Project. Trinity College Dublin, Ireland The future of cultural heritage seems to be all about “data.” A Google search on the term ‘data’ returns over 5.5 billion hits, but the fact that the term is so well embedded in modern discourse does not necessarily mean that there is a consensus as to what it is or should be. The lack of consensus regarding what data are on a small scale acquires greater significance and gravity when we consider that one of the major terminological forces driving ICT development today is that of "big data." While the phrase may sound inclusive and integrative, "big data" approaches are highly selective, excluding any input that cannot be effectively structured, represented, or, indeed, digitised. The future of DH, of any approaches to understanding complex phenomena or sources such as are held in cultural heritage institutions, indeed the future of our increasingly datafied society, depend on how we address the significant epistemological fissures in our data discourse. For example, how can researchers claim that "when we speak about data, we make no assumptions about veracity" while one of the requisites of "big data" is "veracity"? On the other hand, how can we expect humanities researchers to share their data on open platforms such as the European Open Science Cloud (EOSC) when we, as a community, resist the homogenisation implied and required by the very term “data”, and share our ownership of it with both the institutions that preserve it and the individuals that created it? How can we strengthen European identities and transnational understanding through the use of ICT systems when these very systems incorporate and obscure historical biases between languages, regions and power elites? 
In short, are we facing a future when the mirage of technical “openness” actually closes off our access to the perspectives, insight and information we need as scholars and as citizens? Furthermore, how might this dystopic vision be avoided? These are the kinds of questions and issues under investigation by the European Horizon 2020 funded Knowledge Complexity (KPLEX) project, which investigates them by applying strategies developed by humanities researchers to deal with complex, messy cultural data: the very kind of data that resists datafication and poses the biggest challenges to knowledge creation in large data corpora environments. Arising out of the findings of the KPLEX project, this paper will present the synthesised findings of an integrated set of research questions and challenges addressed by a diverse team led by Trinity College Dublin (Ireland) and encompassing researchers at Freie Universität Berlin (Germany), DANS-KNAW (The Hague) and TILDE (Latvia). We have adopted a comparative, multidisciplinary, and multi-sectoral approach to addressing the issue of bias in big data, focussing on the following four key challenges to the knowledge creation capacity of big data approaches: 1. Redefining what data is and the terms we use to speak of it (TCD); 2. The manner in which data that are not digitised or shared become "hidden" from aggregation systems (DANS-KNAW); 3. The fact that data is human created, and lacks the objectivity often ascribed to the term (FUB); 4. The subtle ways in which data that are complex almost always become simplified before they can be aggregated (TILDE). The paper will present a synthesised version of these integrated research questions, and discuss the overall findings and recommendations of the project, which completes its work at the end of March 2018. What follows gives a flavour of the work ongoing at the time of writing this abstract, and the issues that will be raised in the DHN paper. 1. Redefining what data is and the terms we use to speak of it. 
Many definitions of data, even thoughtful scholarly ones, associate the term with a factual or objective stance, as if data were a naturally occurring phenomenon. But data is not fact, nor is it objective, nor can it be honestly aligned with terms such as ‘signal’ or ‘stimulus,’ or the quite visceral (but misleading) ‘raw data.’ To become data, phenomena must be captured in some form, by some agent; signal must be separated from noise, like must be organised against like, and transformations occur. These organisational processes are human determined or human led, and therefore cannot be seen as wholly objective, irrespective of how effective a (human built) algorithm may be. The core concern of this facet of the project was to expand the understanding of the heterogeneity of definitions of data, and the implications of this state of understanding. Our primary ambition under this theme was to establish a clear taxonomy of existing theories of data, to underpin a more applied comparison of humanistic versus technical applications of the term. We did this by identifying the key terms (and how they are used differently), key points of bifurcation, and key priorities under each conceptualisation of data. As such, this facet of the project supported the integrated advancement of the three other project themes, as well as itself developing new perspectives on the rhetorical stakes and action implications of differing concepts of the term ‘data’ and how these will impact on the future not only of DH but of society at large. 2. Dealing with ‘hidden’ data. According to the 2013 ENUMERATE Core 2 survey, only 17% of the analogue collections of European heritage institutions had at that time been digitised. This number actually represents a decrease from the findings of their 2012 survey (almost 20%). 
The survey also reached only a limited number of respondents: 1400 institutions over 29 countries, which surely captures the major national institutions but not local or specialised ones. Although the ENUMERATE Core 2 report does not break down these results by country, one has to imagine that there would be large gaps in the availability of data from some countries over others. Because so much of this data has not been digitised, it remains ‘hidden’ from potential users. This may have always been the case, as there have always been inaccessible collections, but in a digital world, the stakes and the perceptions are changing. The fact that so much other material is available on-line, and that an increasing proportion of the most well-used and well-financed cultural collections are as well, means that the reasonable assumption of the non-expert user of these collections is that what cannot be found does not exist (whereas in the analogue age, collections would be physically contextualised with their complements, leaving the more likely assumption to be that more information existed, but could not be accessed). The threat that our narratives of histories and national identities might thin out to become based on only the most visible sources, places and narratives is high. This facet of the project explored the manner in which data that are not digitised or shared become "hidden" from aggregation systems. 3. Knowledge organisation and epistemics of data. The nature of humanities data is such that even within the digital humanities, where research processes are better optimised toward the sharing of digital data, sharing of "raw data" remains the exception rather than the norm. The ‘instrumentation’ of the humanities researcher consists of a dense web of primary, secondary and methodological or theoretical inputs, which the researcher traverses and recombines to create knowledge. 
This synthetic approach makes the nature of the data, even at its ‘raw’ stage, quite hybrid, and already marked by the curatorial impulse that is preparing it to contribute to insight. This aspect may be more pronounced in the humanities than in other fields, but the subjective element is present in any human triggered process leading to the production or gathering of data. Another element of this is the emotional. Emotions are motivators for action and interaction that relate to social, cultural, economic and physiological needs and wants. Emotions are crucial factors in relating or disconnecting people from each other. They help researchers to experientially assess their environments, but this aspect of the research process is considered taboo, as noise that obscures the true ‘factual signal’, and as less ‘scientific’ (seen in terms of strictly Western colonialist paradigms of knowledge creation) than other possible contributors to scientific observation and analysis. Our primary ambition here was to explore the data creation processes of the humanities and related research fields to understand how they combine pools of information and other forms of intellectual processing to create data that resists datafication and ‘like-with-like’ federation with similar results. The insights gained will make visible many of the barriers to the inclusion of all aspects of science under current Open Science trajectories, and reveal further central elements of social and cultural knowledge that are unable to be accommodated under current conceptualisations of ‘data’ and the systems designed to use them. 4. Cultural data and representations of system limitations. Cultural signals are ambiguous, polysemic, often conflicting and contradictory. In order to transform culture into data, its elements – as all phenomena that are being reduced to data – have to be classified, divided, and filed into taxonomies and ontologies. 
This process of 'data-fication' robs them of their polysemy, or at least reduces it. One of the greatest challenges for so-called Big Data is the analysis and processing of multilingual content. This challenge is particularly acute for unstructured texts, which make up a large portion of the Big Data landscape. How do we deal with multilingualism in Big Data analysis? What are the techniques by which we can analyze unstructured texts in multiple languages, extracting knowledge from multilingual Big Data? Will new computational techniques such as AI deep learning improve or merely alter the challenges? The current method for analyzing multilingual Big Data is to leverage language technologies such as machine translation, terminology services, automated speech recognition, and content analytics tools. In recent years, the quality and accuracy of these key enabling technologies for Big Data has improved substantially, making them indispensable tools for high-demand applications with a global reach. However, just as not all languages are alike, the development of these technologies differs for each language. Larger languages with high populations have robust digital resources for their languages, the result of large-scale digitization projects in a variety of domains, including cultural heritage information. Smaller languages have resources that are much more scant. Those resources that do exist may be underpinned by far less robust algorithms and far smaller bases for the statistical modelling, leading to less reliable results, a fact that in large scale, multilingual environments (like Google translate) is often not made transparent to the user. The KPLEX project is exploring and describing the nature and potential for ‘data’ within these clearly defined sets of actors and practices at the margins of what is currently able to be approached holistically using computational methods. 
It is also envisioning approaches to the integration of hybrid data forms within and around digital platforms, leading not so much to the virtualisation of information generation as approaches to its augmentation.
11:30am - 11:45am
Short Paper (10+5min) [publication ready] Open, Extended, Closed or Hidden Data of Cultural Heritage 1National Library of Finland, Finland; 2Ruralia Institute, University of Helsinki, Finland The National Library of Finland (NLF) agreed on an “Open National Library” policy in 2016[1]. The policy sets out eight principles, which are divided into three themes: accessibility, openness in actions, and collaboration. Accessibility in the NLF means that access to the material needs to exist both for the metadata and the content, while respecting the rights of the rights holders. Openness in operations means that our actions and decision models are transparent and clear, and that the materials are accessible to the researchers and other users. This is one way in which the NLF can put the findable, accessible, interoperable, re-usable (FAIR) data principles [2] into practice. The purpose of this paper is to view the way in which the policy has impacted our work and how findability and accessibility have been implemented, in particular from the aspects of open, extended, closed and hidden data themes. In addition, our aim is to specify the characteristics of existing and potential forms of data produced by the NLF from the research and development perspectives. A continuous challenge is the availability of the digital resources – gaining access to the digitised material for both researchers and the general public, since there are also constant requests for access to newer materials outside the legal deposit libraries’ work stations.
11:45am - 12:00pm
Distinguished Short Paper (10+5min) [publication ready] Aalto Observatory for Digital Valuation Systems 1Aalto University, Department of Communications and Networking; 2Aalto University, Department of Management Studies Money is a recognised factor in creating sustainable, affluent societies. Yet the neoclassical orthodoxy that prevails in our economic thinking remains a contested area, its supporters claiming their results to be objectively true while many heterodox economists claim the whole system to stand on clay feet. Of late, the increased activity around complementary currencies suggests that the fiat money zeitgeist might be giving way to more variety in our monetary system. Rather than emphasizing what money does, as the mainstream economists do, other fields of science allow us to approach money as an integral part of the hierarchies and networks of exchange through which it circulates. This paper suggests that a broad understanding of money and more variety in the monetary system have great potential to further a more equalitarian and sustainable economy. They can drive the extension of society to more inclusive levels and transform people’s economic roles and identities in the process. New technologies, including blockchain and smart ledger technology, are able to support decentralized money creation through the use of shared and “open” peer-to-peer rewarding and IOU systems. Alongside the capabilities of specialists and decision makers, our project most pressingly calls for engaging citizens in the process early on. Multidisciplinary competencies are needed to take relevant action to investigate, envision and foster novel ways for value creation. For this, we are forming the Aalto Observatory on Digital Valuation Systems to gain a deeper understanding of sustainable value creation structures enabled by new technology.
12:00pm - 12:15pm
Short Paper (10+5min) [publication ready] Challenges and perspectives on the use of open cultural heritage data across four different user types: Researchers, students, app developers and hackers 1Royal Danish Library; 2University of Copenhagen In this paper, we analyse and discuss, from both a user perspective and an organisational perspective, the challenges and perspectives of the use of open cultural heritage data. We base our study on empirical evidence gathered through four cases where we have interacted with four different user groups: 1) researchers, 2) students, 3) app developers and 4) hackers. Our own role in these cases was to engage with these users as teachers, organizers and/or data providers. The cultural heritage data we provided were accessible as curated data sets or through APIs. Our findings show that successful use of open heritage data is highly dependent on organisations' ability to calibrate and curate the data differently according to contexts and settings. More specifically, we show what different needs and motivations different user types have for using open cultural heritage data, and we discuss how these can be met by teachers, organizers and data providers.
12:15pm - 12:30pm
Short Paper (10+5min) [abstract] Synergy of contexts in the light of digital humanities: a pilot study State University of Applied Sciences in Racibórz The present paper describes a pilot study pertaining to the linguistic analysis of meaning with regard to the word ladder[EN]/drabina[PL], taking into account views of digital humanities. To this end, WordnetLoom mapping is introduced as one of the existing research tools proposed by the CLARIN ERIC research and technology infrastructure. The explicated material comprises retrospective remarks and interpretations provided by 74 respondents, who took part in a survey. A detailed classification of the word’s multiple meanings is presented in tabular form (showing the number of contexts in which participants accentuate the word ladder/drabina) along with some comments and opinions. The results suggest that apart from the general domain of the word offered for consideration, most of its senses can usually be attributed to linguistic recognitions. Moreover, some perspectives on the continuation of future research and critical afterthoughts are made prominent in the last part of this paper.