
Analyzing paradigmatic language change by visual correlation

Speech transcript
This is joint work, by the way, with my colleague Marc Kupietz at the Institut für Deutsche Sprache (IDS) in Mannheim, and the title is "Analyzing paradigmatic change by visual correlation". So, what do we mean by paradigmatic change?
Paradigmatic relationships are one of the fundamental relationships between words, identified originally by Ferdinand de Saussure in his classic textbook, the Cours de linguistique générale, which was published posthumously. Words are paradigmatically related, basically, if they have a similar meaning, and you can detect this by analyzing the typical usage of words, that is, the context around a word. That is the distributional semantics hypothesis: words that occur in similar contexts often also have a similar meaning. This includes classical synonyms, but also so-called co-hyponyms; we will see lots of examples of paradigmatically related words. For the sake of completeness, the other fundamental relationship between words is the syntagmatic relationship, which covers words that typically co-occur, that is, what comes after what, and so on. The hypothesis we want to explore is that words with a similar usage context, that is, paradigmatically related words, rise and fall together: they become more frequent, or decrease in frequency, over time together.
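To make the notion of similar usage contexts concrete, here is a minimal toy sketch (with an invented mini corpus; the talk itself uses the neural embeddings described later): it counts the words occurring within a window of ±2 positions around each word and compares words by the cosine similarity of their context-count vectors.

```python
from collections import Counter
import math

# Invented toy corpus; in practice this would be millions of sentences.
sentences = [
    "the quick dog runs fast".split(),
    "the quick cat runs fast".split(),
    "the slow car drives far".split(),
]

def context_vector(target, sents, window=2):
    """Count the words occurring within +/-window positions of target."""
    counts = Counter()
    for sent in sents:
        for i, word in enumerate(sent):
            if word == target:
                lo, hi = max(0, i - window), min(len(sent), i + window + 1)
                for j in range(lo, hi):
                    if j != i:
                        counts[sent[j]] += 1
    return counts

def cosine(c1, c2):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(c1[w] * c2[w] for w in c1)
    n1 = math.sqrt(sum(v * v for v in c1.values()))
    n2 = math.sqrt(sum(v * v for v in c2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

# "dog" and "cat" share their contexts, so they come out as
# paradigmatically similar; "dog" and "car" much less so.
print(cosine(context_vector("dog", sentences), context_vector("cat", sentences)))
print(cosine(context_vector("dog", sentences), context_vector("car", sentences)))
```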
Our approach takes into account the two fundamental factors involved: one is frequency change, and the other is the distributional semantics of words. In order to visually correlate these two main factors, we need dimensionality reduction, because you can only visualize so many dimensions at once. I will first describe the approach, then give quite a few examples, and of course close with concluding remarks. And because Saussure is so central to the central relationship here, the paradigmatic relationship, here is one of his quotes: "Time changes all things; there is no reason why language should escape this universal law."
Alright, let's look first at an example. This is taken from a corpus, which I will describe in a bit more detail later, that covers Der Spiegel and Die ZEIT, two important weekly magazines in Germany, from 1953 to 2015. It shows that a nominalization suffix, in this case -ung, is very clearly decreasing over time; with the fitted curves this is even more apparent. So we see here that these paradigmatically related words, for which you can expect that they also have a similar usage context, decrease together over time. We assume that this is probably due to style guides: newspaper language should not be like scientific language, it should avoid nominalizations and rather use verbs, to be more easily comprehensible.

So how do we go about visualizing frequency change? Essentially through the use of color. For this purpose we simply fit a logistic growth curve, p(t) = 1 / (1 + e^-(b0 + b1*t)), through the frequencies measured in fixed time intervals. This is not an invention of ours; it is simply how the decrease and increase of linguistic phenomena are usually modeled. The exponent here can be an arbitrary polynomial; here it is only a first-order polynomial. We have an intercept b0, which is not really important, and a slope b1, and this slope characterizes the frequency development in one parameter. Technically we do this with a generalized linear model of the binomial family. Once we have the slope, it is an easy matter to map this one parameter to a color, in this case a kind of rainbow color scale, so that words with a similar slope, that is, with a similar frequency development, even if they are on different scales, get the same or a very similar color.

Here we have again some examples, again this nominalization, plotted once on the frequency-per-million scale and once on the logit scale, where the fit is simply linear. So how good are these fits, and how are the slopes distributed? We see that the majority of words do not have a very clear temporal development, in particular over such a rather short period of just 50 years, but we do have some very clearly rising and some very clearly falling words. As for the R squared: whenever there is no clear development in one direction, we also do not get a very good fit, but when there is a clear development, we get really very good fits; for example, the examples that we saw already constituted a pretty good fit. The fit of course improves if we increase the order of the polynomial, although it does not make much sense to go beyond degree two in this case. With a polynomial of degree two you can also model effects where something rises and then falls again, which is quite a common development as well; and of course, if you introduce more parameters, you always get a better fit.
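As a sketch of this fitting step: a binomial GLM with a logit link over per-slice counts, whose slope is then mapped onto a rainbow color scale. The counts below are invented, and statsmodels and matplotlib are assumed tools; the talk does not name its software.

```python
import numpy as np
import statsmodels.api as sm
from matplotlib import cm
from matplotlib.colors import Normalize

# Invented data: occurrences of one word and total tokens per time slice.
years = np.array([1955, 1965, 1975, 1985, 1995, 2005, 2015])
hits = np.array([520, 480, 410, 300, 210, 150, 90])
totals = np.array([1.2e6, 1.3e6, 1.4e6, 1.3e6, 1.5e6, 1.6e6, 1.4e6])

# Binomial GLM with logit link: logit(p) = b0 + b1*t, i.e. a first-order
# polynomial in the exponent of the logistic growth curve.
t = (years - years.mean()) / 10.0          # center and scale for stability
X = sm.add_constant(t)
glm = sm.GLM(np.column_stack([hits, totals - hits]), X,
             family=sm.families.Binomial())
slope = glm.fit().params[1]                # one parameter per word

# Map the slope to a rainbow color scale: words with similar slopes get
# similar colors, regardless of their absolute frequency level.
norm = Normalize(vmin=-1.0, vmax=1.0)      # clamping range is a free choice
r, g, b, _ = cm.rainbow(norm(slope))
print(f"slope={slope:.3f}, color=({r:.2f}, {g:.2f}, {b:.2f})")
```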
So how do we go about representing word usage in a few dimensions? The problem here is that the usage context of a word is very high dimensional. Here, this slot represents the word two positions before, this the word one position before, and the same for the other direction, and the dimensionality of each of these slots is simply the number of distinct words, we call them types, in the vocabulary: on the order of 100,000 or 150,000 or even more, even if we apply some minimum-frequency cutoff. The basic idea of the first step of dimensionality reduction, popularized rather recently by word2vec, a very successful and scalable implementation by Tomas Mikolov's group at Google, is to simply train a backpropagation network, with some really clever sampling strategies, so that this high-dimensional construct is represented by a hidden layer, which typically has a dimension of 100 to 200, depending on how accurate you want it to be; it is of course much easier to calculate with 100 or 200 dimensions. More concretely, we use the so-called structured skip-gram approach, which takes the order of the context into account; the original implementation modeled the context simply as a so-called bag of words, so it did not distinguish how far away the words in the context window were. If I remember right, we are using a context window of minus 5 to plus 5. The other problem, because we are interested in modeling the development of the semantics of words over time, is that we need to coordinate the word embeddings across the different time periods. The approach we use is to start with a randomly initialized neural net, train it on the first period, use the resulting net as the starting point for the next period, and so on; thereby the neural nets are coordinated over time. There exist other approaches for this; this is what we chose to do.

We still have too many dimensions to visualize, so in the next step we take these 100 or 200 dimensions and apply a dimensionality reduction technique that is rather popular in visualization research and particularly suitable for reducing to two or at most three dimensions: t-distributed stochastic neighbor embedding (t-SNE). The basic idea is that you compute, in the high-dimensional space, the probability of one word being a neighbor of another, roughly p(j|i) ∝ exp(−‖x_i − x_j‖² / (2σ_i²)), where x_i and x_j are the vector representations of the words' contexts in 100 or 200 dimensions, and a second probability distribution over the two-dimensional positions, for which a Student t-distribution is used instead of a Gaussian, q(i,j) ∝ (1 + ‖y_i − y_j‖²)^(−1); the points are then positioned such that the Kullback-Leibler divergence between these two probability distributions is minimal. One aspect of this particular technique, and it should be a caveat in interpreting the results, is that it does not preserve global structure, only local structure: it is optimized so that similar words remain close in the low-dimensional space, but distant words can end up more or less arbitrarily positioned. This is different from, for example, principal component analysis, where global structure is also preserved.
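Here is a rough end-to-end sketch of this two-step pipeline, under a few stated assumptions: gensim's Word2Vec implements plain skip-gram rather than the structured skip-gram variant used in the talk, continuing training across periods only approximates the described initialization scheme, and the sentence loader with its toy data is invented.

```python
import numpy as np
from gensim.models import Word2Vec
from sklearn.manifold import TSNE

def sentences_for(period):
    """Hypothetical loader for one time slice; toy data so the sketch runs."""
    return [["language", "change", "over", "time", "word", "usage"]] * 200

periods = [1955, 1960, 1965]               # 5-year slices, as in the talk
model, snapshots = None, {}
for p in periods:
    sents = sentences_for(p)
    if model is None:
        # First period: skip-gram net trained from a random initialization,
        # 100-dimensional hidden layer, context window of -5..+5.
        model = Word2Vec(sents, vector_size=100, window=5, sg=1,
                         min_count=5, epochs=5)
    else:
        # Later periods: continue training from the previous period's net,
        # which keeps the embedding spaces coordinated over time.
        model.build_vocab(sents, update=True)
        model.train(sents, total_examples=model.corpus_count,
                    epochs=model.epochs)
    snapshots[p] = {w: model.wv[w].copy() for w in model.wv.index_to_key}

# Second reduction step: t-SNE from 100 dimensions down to 2 for plotting.
# t-SNE preserves local neighborhoods, not global distances.
words = list(snapshots[periods[-1]])
X = np.array([snapshots[periods[-1]][w] for w in words])
xy = TSNE(n_components=2, perplexity=3, init="pca").fit_transform(X)
# perplexity=3 only because the toy vocabulary is tiny; ~30 is more typical.
```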
Here we see the original example in this visualization. This snapshot is for the first five years around 1955, and this one for the five years around 2015; the color coding is that greenish-blue means decreasing and reddish means increasing. We see at first glance that here we have many more terms: not only have these words been consistently going down, but that whole semantic field was also more diversified, it had more different words, it was more productive originally than it is now in actual usage. This is one example; interestingly, here is another nominalization pattern behaving the same way. So: red is rising, and green-blue is decreasing.
Here is a short summary of the corpora to which we applied this method. One is a very beautiful corpus that has been compiled by colleagues in Saarbrücken: the Royal Society Corpus, the Philosophical Transactions of the Royal Society of London, in this particular investigation from 1665 to 1869. They are currently extending it, because the Royal Society has provided them with material beyond that period. Then there are the already mentioned Spiegel and ZEIT, and the biggest corpus is the Deutsches Referenzkorpus, the German Reference Corpus, DeReKo, the main corpus of written German, which is compiled by the IDS. We use time slices of 10 years for the long Royal Society period, 5 years here, and for DeReKo only 2000 to 2015 and only the news part, but there we have 18 billion words, whereas for the Royal Society Corpus we have only about 35 million words. The figures here are some derived statistics; I will come to those later. Everything is available online; under this URL you will find links to all these individual visualizations, so you can play around with them.
This is how the visualization looks completely zoomed out, here for the Royal Society Corpus; this is the time slice that covers the entire corpus. One can then navigate on this chart into the different time slices and see how the landscape changes. That bluish region here is simply not English, it is Latin, and we also have a French region, because it is a multilingual corpus. On this side you can always look at the frequency development, together with the fitted curves, for selected words in the corpus. Here is a nice example: 'tryal', spelled with a y, being slowly substituted by 'trial', spelled with an i. One can also zoom in. What one also sees here is already an indication of the hypothesis that paradigmatically related words, that is, words that are close in this visualization, rise and fall together: the colors here are not uniformly, not randomly, distributed; we have very clear islands of reddish or greenish coloring.

We can also look at this more formally, by analyzing whether there is a correlation between the frequency slopes of very similar words. What we have done here, for the individual corpora, Royal Society, Spiegel and ZEIT, and DeReKo, is simply to compute each word's nearest neighbor, second nearest neighbor, and third nearest neighbor in the vector space, not in the two-dimensional space but in the more accurate high-dimensional vector space, and then apply the Spearman rank correlation. We see here, for example, that the slopes of nearest neighbors in the Royal Society Corpus have a really fairly high correlation of 0.727; it is lower in Spiegel and ZEIT, and lower still in DeReKo. We also see that, at least in these two corpora, the correlation is always strongest for the slope and next strongest for the curvature, the second-order term, and that it decreases with increasing distance between the neighbors. So the basic hypothesis also has some statistical support (a sketch of this computation follows below).

Another thing we can do is look at different snapshots of the Royal Society Corpus, 1660, 1700, 1740 and so on until 1860. We see a very clear diversification of the vocabulary in this corpus: science, and the language for science, exploded in this time, also semantically, so it simply needed more vocabulary. And of course the later periods contain more red-colored words than the early periods. We also see at one glance that Latin died out as part of these particular publications.
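The neighbor correlation just described can be sketched as follows. The embeddings and slopes here are random stand-ins, so the printed correlations will hover around zero, unlike the 0.727 measured on the real Royal Society data.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.neighbors import NearestNeighbors

# Random stand-ins: one embedding row and one fitted slope per word.
rng = np.random.default_rng(0)
emb = rng.normal(size=(1000, 100))      # would be the word2vec vectors
slopes = rng.normal(size=1000)          # would be the fitted GLM slopes

# n_neighbors=4 because each word's nearest neighbor is itself.
nn = NearestNeighbors(n_neighbors=4, metric="cosine").fit(emb)
_, idx = nn.kneighbors(emb)

for k in (1, 2, 3):                     # 1st, 2nd, 3rd nearest neighbor
    rho, _ = spearmanr(slopes, slopes[idx[:, k]])
    print(f"neighbor {k}: Spearman rho = {rho:.3f}")
```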
Here are a few nice examples to give a better understanding of what levels of paradigmatic similarity can be seen with this visualization. This is one of the favorite examples of my linguist friends in Saarbrücken, because they are more interested in grammatical phenomena than in thematic phenomena: a very clear decrease of WH-adverbs. In the early stages of the Royal Society we had all these clause-after-clause, very, very long sentences, which were typically connected with 'wherewith', 'thereafter', and so on. All of these clearly go down. Another grammatical example
concerns tense: present tense goes up and past tense goes down, with forms like 'employs', 'delivers', etc., and passive or past participle forms, where at this level we cannot really say whether it is a passive construction or a past tense construction, also go up: 'shown', 'recognized', etc. This is grammatical style.
I like this one: person adjectives like 'industrious', 'expert', 'honest', typically a kind of honorific reference to a person, go down, whereas process nouns, 'evolution', 'recommendation', 'calcification', go up. This, by the way, is exactly the other way round from the Spiegel and ZEIT corpus: nominalization is a very important grammatical style phenomenon in the development of scientific language.
But a good deal of this paradigmatic change is of course simply due to thematic reasons: if a theme becomes important, then it needs a vocabulary of paradigmatically related words. Even that can be interesting, though. Adjectives of provenance, like 'Italian', 'English', 'British', 'French', 'German', etc., go down, and chemistry, all the chemical compounds and elements, clearly goes up.
It is even visible within one theme: in medicine there is a clear trend away from describing symptoms, these are all symptoms, and they all go down, towards what I called a more analytical approach to medicine, rather than just empirically observing and describing the symptoms.
One can also ask the other question: which nearest neighbors supersede each other, so that one goes down and its nearest neighbor goes up? Like 'bigness', which clearly becomes an obsolete term and is sort of substituted by 'size'. One of my favorite examples is that 'truths' goes down and 'facts' goes up, well, at least in this period; nowadays it might go the other way around, towards multiple truths, whatever. 'Plentiful' and 'abundant' are another pair. All these examples make sense, and it is just a nice way to use the numbers we have in order to find out these things. In Spiegel, ZEIT and DeReKo we see that this mostly concerns named entities, like Carter and Obama, or football coaches like Klinsmann. One other thing is that, except for the named entities, as in the Royal Society Corpus, parallel change typically concerns co-hyponyms, not synonyms, whereas the opposing, superseding change typically concerns kinds of synonyms: language does not like synonyms, because one often chooses one term over the other. This is clearly mostly thematic, but it is also nice to see that Windows is superseded by Android, that one academic term goes down while its successor goes up, we all know that from our academic institutions, and, also interesting, that a non-neutral term is superseded by what is, at least currently, the more neutral term.

That was it already. In summary, we have seen, I hope, that paradigmatically related words indeed rise and fall together, and that it is nice to visualize this in order to explore a corpus for such phenomena. The basic insight is not anything new; already on the basis of lexical counts this hypothesis of parallel change was formulated, and there is the constant rate hypothesis, etc. Thanks for your attention.

Metadata

Formal metadata

Title Analyzing paradigmatic language change by visual correlation
Series title The Leibniz "Mathematical Modeling and Simulation" (MMS) Days 2018
Author Frankhauser, Peter
Contributors Leibniz-Institut für Oberflächenmodifizierung e.V. (IOP)
Leibniz-Institut für Troposphärenforschung (TROPOS)
License CC Attribution - NonCommercial 3.0 Germany:
You may use, modify, and reproduce, distribute, and make publicly available the work or its content in unchanged or modified form for any legal, non-commercial purpose, provided that you credit the author/rights holder in the manner specified by them.
DOI 10.5446/35364
Publisher Weierstraß-Institut für Angewandte Analysis und Stochastik (WIAS), Technische Informationsbibliothek (TIB)
Publication year 2018
Language English
Production place Leipzig

Content metadata

Subject area Computer Science, Mathematics
Abstract Paradigmatic language change occurs when paradigmatically related words with similar usage rise or fall together. Such change is the rule rather than the exception: words rarely increase or decrease in isolation, but together with similar words. In the short term this is usually due to thematic change, but in the longer term grammatical preferences change as well. We present an approach to visually explore paradigmatic change by reducing the dimensionality of, and correlating, the two main factors involved: frequency change and the distributional semantics of words. Frequency change is reduced to one dimension by fitting logistic growth curves to the observed word frequencies in fixed intervals (e.g. year or decade). The semantics of words is reduced to two dimensions such that words with similar usage contexts are positioned close together. This is accomplished by reducing the very high dimensional representation of word usage contexts in two steps: neural-network-based word embeddings and t-distributed stochastic neighbour embedding.

Related material

The following resource is accompanying material for the video
