
Analyzing paradigmatic language change by visual correlation


Formal Metadata

Title
Analyzing paradigmatic language change by visual correlation
Series Title
Number of Parts
20
Author
Contributors
License
CC Attribution - NonCommercial 3.0 Germany:
You may use and modify the work or its content for any legal and non-commercial purpose, and reproduce, distribute, and make it publicly available in unchanged or changed form, provided that you name the author/rights holder in the manner specified by them.
Identifiers
Publisher
Year of Publication
Language
Production Place: Leipzig

Content Metadata

Subject Area
Genre
Abstract
Paradigmatic language change occurs when paradigmatically related words with similar usage rise or fall together. Such change is the rule rather than the exception: words rarely increase or decrease in isolation, but together with similar words. In the short term, this is usually due to thematic change, but in the longer term, grammatical preferences also change. We present an approach to visually explore paradigmatic change by reducing the dimensionality of, and correlating, the two main factors involved: frequency change and the distributional semantics of words. Frequency change is reduced to one dimension by fitting logistic growth curves to the observed word frequencies in fixed intervals (e.g. year or decade). The semantics of words is reduced to two dimensions such that words with similar usage contexts are positioned close together. This is accomplished by reducing the very high-dimensional representation of word usage context in two steps: neural-network-based word embeddings and t-distributed stochastic neighbour embedding.
Transcript: English (automatically generated)
This is joint work, by the way, with my colleague at the IDS, Marc Kupietz, and Elke Teich, who is at the Universität des Saarlandes in Saarbrücken. And the title is Analyzing Paradigmatic Language Change by Visual Correlation.
All right. What do we mean by paradigmatic change? Paradigmatic relationships are one of the fundamental relationships between words, identified originally by Ferdinand de Saussure in his still classical textbook, which was published posthumously. And words are paradigmatically related, basically, if they have a similar meaning. And you can detect this by analyzing the typical usage of words, so the context around the word.
And that's the distributional semantics hypothesis, that words that occur in similar contexts often also have a similar meaning. That includes classical synonyms, but also so-called co-hyponyms. So we will see lots of examples
of paradigmatically related words. For the sake of completeness, the other fundamental relationship between words is the syntagmatic relationship. And that covers words that typically co-occur together, so which word typically comes after another, et cetera.
And the hypothesis that we want to explore is that words with a similar usage context, that is, paradigmatically related words, rise and fall together. So they become more frequent, or they decrease in frequency over time, together.
And the approach to do so is that we take into account the two fundamental factors involved. One is the frequency change, and the other one is the distributional semantics of words. And in order to visually correlate these two main factors, we need to do dimensionality reduction,
because you can only visualize so many dimensions at once, as you all know. I will first describe the approach, then give quite a few examples, and close with concluding remarks, of course.
And because Saussure is so central to the central relationship here, the paradigmatic relationship, here is one of his quotes: Time changes all things, and there is no reason why language should escape this universal law. All right.
Let's look first at an example. This is taken from a corpus, which I will describe a little bit more in detail later, which covers Spiegel and Zeit, two important weekly magazines in Germany from 1953 to 2015.
And it shows that nominalization with, in this case, -ung is very clearly (let me see, yeah, with the fitted curves it is even clearer) decreasing over time. And so we see here that these paradigmatically related words, of which you can expect that they also have a similar usage context, decrease together over time. We assume that this is probably even due to stylistic guidelines
that newspaper language should not be like scientific language and should avoid nominalization and rather use verbs to be more easily comprehensible. So how do we go about it?
To visualize frequency change, we use color. And for this purpose, we simply try to fit a logistic growth curve to the frequencies measured in fixed time intervals.
This is not an invention of ours. That is just how the decrease and increase of linguistic phenomena are usually modeled.
And in this case, the exponent here can be an arbitrary polynomial. Here it's only a first order polynomial. Then we have an intercept, which is not really important. We have a slope.
And this slope then characterizes the frequency development in one parameter. Technically, we do this with a generalized linear model of the binomial family. And then, once we have the slope, it is easy to simply map this one parameter to a color scale, in this case kind of a rainbow color scale, so that words with a similar slope, with a similar frequency development, even if they are on a different scale altogether, simply get the same or a very similar color.
The second thing: so here we have, again, some examples. Again, this nominalization with -ung, once on a frequency-per-million scale and once on the logit scale.
And there we see that it's simply a linear fit on the logit scale of the terms. All right, well, how good is this fit?
And how are the slopes distributed? We see that the majority of words don't have a very clear temporal development, in particular over such a rather short period of just 50 years. But we do have some very clear rising and some very clear falling words. And as for the pseudo R squared: whenever there is no clear development in one direction, we also don't get a very good pseudo R squared. But when there is a clear development, we get really very good fits. So, for example, the examples that we already saw constituted a pretty good fit. The fit, of course, improves if we increase the order of the polynomial. It doesn't make much sense to go beyond degree 2, but with a polynomial of degree 2 you can also model effects where something rises and then falls again, which is quite a common development as well. And of course, if you introduce more parameters, you always get a better fit anyway.
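As a concrete illustration of the fitting just described (a logistic growth curve p(t) = 1 / (1 + exp(-(b0 + b1*t))) fitted as a binomial GLM, with the slope b1 mapped to a colour), here is a minimal sketch in Python. The yearly counts, corpus sizes, and the slope range for the colour scale are made-up placeholders, not data from the talk.

```python
# Minimal sketch: fit a logistic growth curve to yearly word frequencies with a
# binomial GLM and map the fitted slope to a rainbow colour scale.
# All numbers below are hypothetical placeholders.
import numpy as np
import statsmodels.api as sm
from matplotlib import cm, colors

years = np.arange(1953, 2016)
word_counts = np.random.poisson(50, size=years.size)   # hits of the word per year (toy data)
corpus_sizes = np.full(years.size, 1_000_000)          # corpus size per year (toy data)

# Response: successes (the word) vs. failures (rest of the corpus);
# predictor: first-order polynomial in (centred) time.  For a degree-2 fit,
# which can also model rise-then-fall, add a squared time column.
t = (years - years.mean()) / 10.0
endog = np.column_stack([word_counts, corpus_sizes - word_counts])
exog = sm.add_constant(t)

fit = sm.GLM(endog, exog, family=sm.families.Binomial()).fit()
intercept, slope = fit.params                          # the slope characterises the frequency development
pseudo_r2 = 1.0 - fit.deviance / fit.null_deviance     # a simple "pseudo R squared"

# Words with a similar slope get the same or a very similar colour.
norm = colors.Normalize(vmin=-1.0, vmax=1.0)           # assumed slope range
colour = cm.rainbow(norm(slope))
print(slope, pseudo_r2, colour)
```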
So how do we go about representing word usage in a few dimensions? So the problem here is that the usage context of a word
is very high dimensional. Here we see that all these, so this represents the word two words before, this represents the word one word before,
and the same for the other direction. And the dimensionality of all these things is simply the number of distinct words, we call them types, in the vocabulary: on the order of 100,000 or 150,000, or even more, if we apply some minimum-frequency cut-off. So the basic idea in a first step of dimensionality reduction, and this was popularized rather recently by a very successful and scalable implementation, word2vec, done by a group at Google, is to simply train a backpropagation network with some really clever sampling strategies so that you then represent this high-dimensional construct via a hidden layer. And this hidden layer then typically has a dimension of 100 to 200, depending on how accurate you want it to be. And it is, of course, much easier to calculate with 100 or 200 dimensions.
More concretely, we use in this case the so-called structured skip-gram approach; that is, we take into account the order of the context. The original implementation modeled the context simply
as a so-called bag of words, so it didn't distinguish how far the words in the context window were away. And I think, if I remember, we are using a context window of minus 5 to plus 5.
The other particular problem, because we are interested in modeling the semantic development of words over time, is that we need to coordinate the word embeddings for the different time periods. And the approach that we use here is that we start initially with a random neural net, then we simply train it for the first period and use the resulting net as the input for the next period, et cetera, and thereby we have the neural nets coordinated over time. There exist other approaches where you can do that ex post, but this is what we chose to do.
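A minimal sketch of this incremental coordination, using gensim's standard skip-gram as a stand-in for the structured skip-gram implementation mentioned in the talk; the per-period sentences, window size, and dimensionality below are illustrative assumptions, not the actual setup.

```python
# Sketch: train embeddings for the first time slice, then continue training the
# same model on each subsequent slice, so the vector spaces stay coordinated.
# gensim's plain skip-gram stands in here for structured skip-gram.
from gensim.models import Word2Vec

# Tokenised sentences per time slice: tiny placeholders, real corpora go here.
periods = [
    ("1953-1957", [["die", "nutzung", "steigt"], ["die", "verwendung", "steigt"]]),
    ("1958-1962", [["die", "nutzung", "sinkt"], ["die", "verwendung", "sinkt"]]),
]

model = Word2Vec(vector_size=100, window=5, sg=1, min_count=1, seed=1)
vectors_per_period = {}
for i, (label, sentences) in enumerate(periods):
    # First slice: build the vocabulary from scratch; later slices: update the
    # vocabulary and keep training from the previous slice's weights.
    model.build_vocab(sentences, update=(i > 0))
    model.train(sentences, total_examples=model.corpus_count, epochs=model.epochs)
    vectors_per_period[label] = {w: model.wv[w].copy() for w in model.wv.index_to_key}
```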
We still have too many dimensions to visualize, though. So in the next step, we take these 100 or 200 dimensions and apply a dimensionality reduction technique that is rather popular in visualization circles and particularly suitable for reducing to two or at most three dimensions. It's called t-distributed stochastic neighbor embedding. And the basic idea is that you calculate the probability of a word being similar to another word in the high-dimensional space. x_i and x_j are now the vector representations of the context of the word in 100 or 200 dimensions. And then you seek another probability distribution in two dimensions. In this case, this is a t-distribution.
This one should be a Gaussian distribution. And you choose the low-dimensional positions such that the Kullback-Leibler divergence between these two probability distributions is minimal. One aspect of this particular dimensionality reduction technique, and that should be a caveat in interpreting the results, is that it does not preserve global structure, but it preserves local structure. So it is optimized in a way that similar words should also still be close in the low-dimensional space.
But long distance words can then be more or less arbitrarily positioned. This is different from, for example, principal component analysis where global structure is also preserved.
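For reference, the objective sketched here is the standard t-SNE formulation (van der Maaten and Hinton); the notation below is mine, not copied from the slides. Pairwise similarities p_ij come from Gaussian kernels over the high-dimensional context vectors x_i, similarities q_ij from a Student t-distribution with one degree of freedom over the low-dimensional map positions y_i, and the y_i are chosen to minimize the Kullback-Leibler divergence between the two distributions:

```latex
p_{j|i} = \frac{\exp\!\left(-\lVert x_i - x_j\rVert^2 / 2\sigma_i^2\right)}
               {\sum_{k \neq i} \exp\!\left(-\lVert x_i - x_k\rVert^2 / 2\sigma_i^2\right)},
\qquad
p_{ij} = \frac{p_{j|i} + p_{i|j}}{2n}

q_{ij} = \frac{\left(1 + \lVert y_i - y_j\rVert^2\right)^{-1}}
              {\sum_{k \neq l} \left(1 + \lVert y_k - y_l\rVert^2\right)^{-1}}

\min_{y_1,\dots,y_n} \; \mathrm{KL}(P \,\Vert\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}
```

Off-the-shelf implementations such as scikit-learn's sklearn.manifold.TSNE optimize this objective (with approximations).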
Here we see the original example in this particular visualization. This snapshot is from five years around 1955, and this snapshot is from five years around 2015.
And color: greenish and bluish means decreasing, and reddish means increasing. And we also see at first glance that here we have many more terms. So not only are these consistently similarly colored and have they been consistently going down, but of course that whole semantic field was also more diversified. It had more different words. It was more productive originally than it is now in actual usage. Umsetzung is one example. And one word here, interestingly, is not a nominalization with -ung. So red is rising.
And green or blue is decreasing. Here is a short summary of the corpora to which we applied this method. One is a very beautiful corpus that has been compiled by my colleagues in Saarbrücken. It's the Philosophical Transactions of the Royal Society of London, in this particular investigation from 1665 to 1869. They are currently extending it, because the Royal Society has provided them with extensions beyond that. Then there is the already mentioned Spiegel/Zeit corpus. And the biggest corpus is the Deutsches Referenzkorpus, the German Reference Corpus, DeReKo. That is the main corpus of written language that is compiled by the IDS, my institute. Here we use time slices of 10 years for this longer period, here five years. And DeReKo is only 2000 to 2015, and only the news part of DeReKo. But here we have 18 billion words, whereas in the Royal Society corpus, for example, we have only 35,000 words. Those data here, I will come to that later.
Those are some derived statistics. Everything is available, so under this URL you will find a link to all these individual visualizations so you can play around with it. This is how the visualization looks completely zoomed out.
This is the Royal Society corpus we see here. And this is now the time slice which covers the entire Royal Society corpus. One can then navigate on this chart
to different time slices to also see how the landscape changes. That bluish region here is simply not English, it is Latin. We have also a French region because it's a multilingual corpus. And on this side you can always look at the frequency
development together with the curves for selected words in the corpus. Here is also a nice example of "tryal" spelled with a y being slowly substituted by "trial" spelled with an i.
One can then also zoom in, but what one also sees here is already an indication of this hypothesis that paradigmatically related words, that is close words in this visualization, rise and fall together.
The colors here are not uniformly distributed, not randomly distributed. We have very clear islands of reddish or greenish colorings. We can also look at this more formally
by analyzing whether there is a correlation in the slopes of the frequency development between very similar words. So what we have done here, for the individual corpora, that is the Royal Society corpus, Spiegel/Zeit, and DeReKo: we have simply computed the nearest neighbor, the second nearest neighbor, and the third nearest neighbor in the vector space, not in the two-dimensional space but the more accurate vector space, and then applied the Spearman rank correlation. And we see here, for example, that the slopes of the nearest neighbors in the Royal Society corpus have a really fairly high correlation of 0.77. It is lower in Spiegel/Zeit and even lower in DeReKo.
We also see that, at least in these two corpora, the correlation is always strongest for the slope and next strongest for the curvature, so the second order. And we also see that it decreases with increasing distance between the neighbors. So that basic hypothesis also has some statistical backing.
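A minimal sketch of this check, under the assumption that a `model` with word embeddings (for instance from the gensim sketch above) and a dict `slopes` mapping each word to its fitted slope (for instance from the GLM sketch above) are available; both names are mine, not from the talk.

```python
# Sketch: rank-correlate each word's slope with the slope of its nearest
# neighbour in the (full, not two-dimensional) embedding space.
from scipy.stats import spearmanr

words = [w for w in slopes if w in model.wv]
own, neighbour = [], []
for w in words:
    nn_word, _ = model.wv.most_similar(w, topn=1)[0]   # nearest neighbour in vector space
    if nn_word in slopes:
        own.append(slopes[w])
        neighbour.append(slopes[nn_word])

rho, p_value = spearmanr(own, neighbour)
print(f"Spearman rank correlation of slopes: {rho:.2f} (p = {p_value:.3g})")
```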
Another thing that we can do: if we look now at different snapshots of the Royal Society corpus in 1660, 1700, 1740, and so on until 1860, we see a very clear diversification of vocabulary in this corpus. Well, science, and the language for science, exploded in this time, also thematically, so it simply needed more vocabulary. And of course, the later periods then contain more red-colored words than the early periods.
And we also see at one glance that Latin died out as being part of this particular publication. Here are a few nice examples to give a better understanding of what levels of paradigmatic similarity
can be seen via this visualization. That's one of the favorite examples of my linguist friends in Saarbrücken, because they're more interested in grammatical phenomena than thematic phenomena. It's a very clear decrease of wh-adverbs.
So in the early stages of the Royal Society, we had all these clause-after-clause, very, very long sentences, which were typically connected with these: wherewith, thereafter, and so on. So all of these clearly go down.
Another grammatical example is that present tense forms go down: alleges, employs, delivers, et cetera. Whereas passive or past forms (at this level we cannot really say whether this is a passive construct or a past tense construct) go up: shown, recognized, et cetera. Grammar style, I like this. Person adjectives, like industrious, expert, learned, honest, typically a kind of epitheta ornantia, so an honorific reference to a person, they go down. Whereas, for example, process nouns, evolution, accumulation, calcification, they go up.
This is, by the way, really completely the other way around than in the Spiegel/Zeit corpus. That's nominalization in the development of scientific language, which is a very important grammatical style phenomenon. A good deal of this paradigmatic change, parallel change, of course, is simply due to thematic reasons. If a theme becomes important, then it needs a vocabulary of paradigmatically related words. But that can also be interesting, like adjectives of provenance, like Italian, English, British, French, German, et cetera, they go down. And chemistry: these are all chemical compounds and elements; they clearly go up. Even within one theme, in medicine,
there is a clear trend away from describing symptoms (these are all symptoms; they all go down) towards, in this case I called it anatomy, a more analytical approach to medicine than just empirically observing and describing symptoms. One can also ask the other question, that is: what are the nearest neighbors that supersede each other?
So, words that go down when their nearest neighbor goes up. Like bigness, which clearly becomes an obsolete term and is sort of substituted with size. Or one of my favorite examples: truths go down and facts go up. Well, at least in this period; it might go the other way around, too. Multiple truths, whatever. Plentiful, abundant. Well, all these examples make sense. And it's just a nice way to use the numbers that we have to find out these things. In the Spiegel/Zeit corpus, we see that mostly it relates to named entities, like Carter and Obama, or Herberger and Klinsmann (similar roles), or Rhodesia and Zimbabwe. One other thing is that, except for the named entities, as in the Royal Society corpus, we see that the parallel change typically concerns co-hyponyms, not
synonyms. Whereas the opposing change typically concerns kinds of synonyms. Language doesn't like synonyms. It often chooses one term over the other.
OK. And then this is clearly mostly thematic. But it's also nice to see that Windows is superseded by Android.
And Neuverschuldung is going down; Schuldenbremse goes up. We all know that from our academic institutions. Or, also interesting, Selbstmord to Suizid: Selbstmord as a non-neutral term, and Suizid as a term that is, well, at least currently perceived as more neutral. Yeah. That was it already. Summary. We have seen, I hope, that paradigmatically related words indeed rise and fall together, and that it's
nice to visualize this in order to explore a corpus for such phenomena. That basic insight isn't anything new. Lehrer, based on lexica, had already formed this hypothesis of parallel change. Then we have the constant rate hypothesis, et cetera. Yeah. Thanks for your attention. Thank you very much.