4:00pm - 4:30pm Long Paper (20+10min) [abstract] Dialects of Discord. Using word embeddings to analyze preferred vocabularies in a political debate: nuclear weapons in the Netherlands 1970-1990
Ralf Futselaar, Milan van Lange
NIOD, Institute for War-, Holocaust-, and Genocide Studies
We analyze the debate about the placement of nuclear-enabled cruise missiles in the Netherlands during the 1970s and 1980s. The NATO “double-track decision” of 1979 envisioned the placement of these weapons in the Netherlands, to which the Dutch government eventually agreed in 1985. In the early 1980s, the question of whether to place these missiles led to the greatest popular protests in Dutch history and to a long and often bitter political controversy. After 1985, due to declining tensions between the Soviet Bloc and NATO, the cruise missiles were never stationed in the Netherlands. Much older nuclear warheads, in the country since the early 1960s, remain there to this day.
We are using word embeddings to analyze this particularly bipolar debate in the proceedings of the Dutch lower and upper house of Parliament. The official political positions, as expressed in party manifestos and voting behavior inside parliament, were stable throughout this period. We demonstrate that in spite of this apparent stability, the vocabularies used by representatives of different political parties changed significantly through time.
Using the word2vec algorithm, we have created a combined vector including all synonyms and near-synonyms of “nuclear weapon” used in the proceedings of both houses of parliament during the period under scrutiny. Based on this combined vector, and again using word2vec, we have identified nearest neighbors of words used to describe nuclear weapons. These terms have been manually classified, insofar as relevant, into terms associated with a pro-proliferation or anti-proliferation viewpoint, for example “defense” and “disarmament” respectively.
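The combined-vector step described above can be illustrated with a minimal sketch. It assumes that the combined vector is the normalized mean of the seed terms' vectors and that neighbors are ranked by cosine similarity; the abstract does not specify either detail. The Dutch words and three-dimensional vectors are invented toy values, not output of the authors' model, and the helper names `combined_vector` and `nearest_neighbors` are hypothetical.

```python
import numpy as np

# Toy embeddings: hypothetical values for illustration only,
# not taken from the model trained on the parliamentary proceedings.
embeddings = {
    "kernwapen":   np.array([0.9, 0.1, 0.0]),
    "atoomwapen":  np.array([0.8, 0.2, 0.1]),
    "kruisraket":  np.array([0.7, 0.3, 0.0]),
    "ontwapening": np.array([0.6, 0.1, 0.7]),
    "verdediging": np.array([0.6, 0.7, 0.1]),
    "landbouw":    np.array([0.0, 0.1, 0.9]),
}


def unit(v):
    """Scale a vector to unit length."""
    return v / np.linalg.norm(v)


def combined_vector(words, emb):
    """Assumed combination rule: normalized mean of the unit vectors."""
    return unit(np.mean([unit(emb[w]) for w in words], axis=0))


def nearest_neighbors(query, emb, exclude=(), top_n=3):
    """Rank all non-excluded words by cosine similarity to the query."""
    scores = {
        w: float(unit(v) @ query)
        for w, v in emb.items()
        if w not in exclude
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


seed_terms = ["kernwapen", "atoomwapen", "kruisraket"]
query = combined_vector(seed_terms, embeddings)
neighbors = nearest_neighbors(query, embeddings, exclude=seed_terms, top_n=2)

for word, score in neighbors:
    print(f"{word}: {score:.3f}")
```

With a real word2vec model, the toy dictionary would be replaced by the trained vectors; the manual classification of the resulting neighbors into the two categories remains a human step.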
Obviously, representatives of all Dutch political parties used words from both categories in parliamentary debates. We demonstrate, however, that at any given time different political parties had clear preferences in terms of vocabulary. In the “discursive space” created by the binary opposition between pro- and anti-proliferation words, political parties can be shown to have specific and distinct ways of discussing nuclear weapons.
Using this framework, we have analyzed the changing vocabularies of different political parties. This allows us to show that, while stated policy positions and voting behavior remained unchanged, the language used to discuss nuclear weapons shifted strongly towards anti-proliferation terminology. We have also been able to show that this change happened at different times for different political parties. We speculate that these changes resulted from perceived changes of opinion among the target electorates of different parties, as well as the changing geopolitical climate of the mid-to-late 1980s, when nuclear non-proliferation became a more widely shared policy objective.
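One way to picture a party's position in this space is as the share of anti-proliferation terms among all classified terms it uses in a given year. The sketch below assumes that simple ratio, which the abstract does not state as the authors' measure. The party names, years, and speech snippets are invented; only “defense” and “disarmament” come from the abstract, and the other listed terms are hypothetical additions.

```python
from collections import defaultdict

# "defense" and "disarmament" are the abstract's examples;
# "deterrence" and "peace" are hypothetical additions for illustration.
PRO_TERMS = {"defense", "deterrence"}
ANTI_TERMS = {"disarmament", "peace"}

# Invented records: (party, year, speech text).
speeches = [
    ("Party A", 1979, "defense deterrence defense disarmament"),
    ("Party A", 1986, "disarmament peace defense disarmament"),
    ("Party B", 1979, "disarmament peace defense"),
    ("Party B", 1986, "disarmament peace peace"),
]


def anti_share_by_party_year(records):
    """Return the fraction of classified terms that are anti-proliferation,
    keyed by (party, year). Unclassified tokens are ignored."""
    counts = defaultdict(lambda: [0, 0])  # [pro count, anti count]
    for party, year, text in records:
        for token in text.lower().split():
            if token in PRO_TERMS:
                counts[(party, year)][0] += 1
            elif token in ANTI_TERMS:
                counts[(party, year)][1] += 1
    return {
        key: anti / (pro + anti)
        for key, (pro, anti) in counts.items()
        if pro + anti > 0
    }


shares = anti_share_by_party_year(speeches)
for (party, year), share in sorted(shares.items()):
    print(f"{party} {year}: {share:.2f}")
```

Tracking this share per party over successive years would show when each party's vocabulary shifted, independently of its voting record.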
In the conclusion of this paper, we show that word embedding models offer a methodology to investigate shifting political attitudes outside of, and in addition to, stated opinions and voting patterns.
4:30pm - 4:45pm Distinguished Short Paper (10+5min) [publication ready] Emerging Language Spaces Learned From Massively Multilingual Corpora
Jörg Tiedemann
University of Helsinki
Translations capture important information about languages that can be used as implicit supervision in learning linguistic properties and semantic representations. Translated texts are semantic mirrors of the original text, and the significant variations that we can observe across languages can be used to disambiguate the meaning of a given expression using the linguistic signal that is grounded in translation. Parallel corpora consisting of massive amounts of human translations with a large linguistic variation can be used to increase abstractions, and we propose the use of highly multilingual machine translation models to find language-independent meaning representations. Our initial experiments show that neural machine translation models can indeed learn in such a setup, and that the learning algorithm picks up information about the relation between languages in order to optimize transfer learning with shared parameters. The model creates a continuous language space that represents relationships in terms of geometric distances, which we can visualize to illustrate how languages cluster according to language families and groups. With this, we can see a development in the direction of data-driven typology, a promising approach to empirical cross-linguistic research in the future.
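The idea of a language space in which geometric distance reflects relatedness can be sketched as follows. The two-dimensional vectors are invented toy values chosen so that two Finnic and two Scandinavian languages fall close together; they are not the embeddings learned by the translation model, and cosine distance is an assumed choice of metric.

```python
import numpy as np

# Hypothetical language embeddings keyed by ISO 639-1 code (toy values).
lang_vectors = {
    "fi": np.array([0.9, 0.1]),
    "et": np.array([0.8, 0.2]),
    "sv": np.array([0.1, 0.9]),
    "da": np.array([0.2, 0.8]),
}


def cosine_distance(a, b):
    """1 minus cosine similarity; 0 means identical direction."""
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def closest_language(lang, vectors):
    """Return the other language whose vector is nearest to `lang`."""
    others = (other for other in vectors if other != lang)
    return min(
        others,
        key=lambda other: cosine_distance(vectors[lang], vectors[other]),
    )


closest = {lang: closest_language(lang, lang_vectors) for lang in lang_vectors}
for lang, neighbor in closest.items():
    print(f"{lang} -> {neighbor}")
```

With learned embeddings, the same pairwise distances could feed a clustering or dimensionality-reduction step for visualization.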
4:45pm - 5:15pm Long Paper (20+10min) [publication ready] Digital cultural heritage and revitalization of endangered Finno-Ugric languages
Anisia Katinskaia, Roman Yangarber
University of Helsinki, Department of Computer Science
Preservation of linguistic diversity has long been recognized as a crucial, integral part of supporting our cultural heritage. Yet many “minority” languages, lacking official state status, are in decline, and many are severely endangered. We present a prototype system aimed at “heritage” speakers of endangered Finno-Ugric languages. Heritage speakers are people who heard the language used by the older generations while they were growing up and who possess considerable passive competency (well beyond the “beginner” level), but lack active fluency. Our system is based on natural language processing and artificial intelligence. It assists learners by allowing them to use arbitrary texts of their choice, and by creating exercises that require them to engage in active production of language rather than in passive memorization of material. Continuous automatic assessment helps guide the learner toward improved fluency. We believe that providing such AI-based tools will help bring these languages to the forefront of the modern digital age, raise their prestige, and encourage the younger generations to become involved in reversing their decline.
5:15pm - 5:30pm Short Paper (10+5min) [publication ready] The Fractal Structure of Language: Digital Automatic Phonetic Analysis
William A Kretzschmar Jr
University of Georgia
In previous studies of the Linguistic Atlas data from the Middle and South Atlantic States (e.g. Kretzschmar 2009, 2015), it has been shown that the frequency profiles of variant lexical responses to the same cue are all patterned in nonlinear A-curves. Moreover, these frequency profiles are scale-free, in that the same A-curve patterns occur at every level of scale. In this paper, I will present results from a new study of Southern American English that, when completed, will include over one million vowel measurements from interviews with a sample of sixty-four speakers across the South. Our digital methods, an adaptation of the DARLA and FAVE tools for forced alignment and automatic formant extraction, prove that speech outside of the laboratory or controlled settings can be processed by automatic means on a large scale. Measurements in F1/F2 space are analyzed using point-pattern analysis, a technique for spatial data, which allows for creation and comparison of results without assumptions of central tendency. This Big Data resource allows us to see the fractal structure of language more completely. Not only do A-curve patterns describe the frequency profiles of lexical and IPA tokens, but they also describe the distribution of measurements of vowels in F1/F2 space, for groups of speakers, for individual speakers, and even for separate environments in which vowels occur. These findings are highly significant for how linguists make generalizations about phonetic data. They challenge the boundaries that linguists have traditionally drawn, whether geographic, social, or phonological, and demand that we use a new model for understanding language variation.