Pushing the frontiers of Information Extraction
Formal Metadata
Title: Pushing the frontiers of Information Extraction
Series: FrOSCon 2016
Number of parts: 84
License: CC Attribution 3.0 Unported: You may use, change and reproduce the work or its content, distribute it and make it publicly accessible in changed or unchanged form for any legal purpose, provided the author/rights holder is credited in the manner they specify.
Identifiers: 10.5446/32449 (DOI)
FrOSCon 2016, Part 35 / 84
Transcript: English (automatically generated)
00:07
Thank you very much for being here. I hope this microphone is on, so whoever is recording will be recording this fine. So also thanks to everybody watching this.
00:20
What we are going to talk about today is a perspective on text extraction, information extraction which tries to get a little bit out of the NLP, the linguistic perspective and also look at it from a point of view of the social sciences of what we are doing with the text extraction when we are processing large amounts of text.
00:45
Because our starting point was that in a way text has always been a key information source. If we are interested in very many things we turn to text, if we just want to know what we know about things, what other people have already found out about things, we turn
01:01
to text and all those books are by now digitised. If we want to know what's going on in the world we turn to text, if we want to understand how people are discussing things we can turn to newspapers for instance that report about this stuff and all these newspapers are again now available digitally.
01:21
We can be interested in planning processes, we can be interested in argumentation processes, we can be interested in basically opinions that people have, ideas that people present, even the behaviour of people is nowadays partly recorded in a textual format.
01:45
So basically almost all kinds of questions that we can have in the social sciences, and also beyond the social sciences, very many questions that we have in real life, are things for which we turn to text. And that of course means that the processing of all those texts in text analysis and information
02:02
extraction has been growing very fast and has been a very important thing to develop. And our starting point for this project or for this development here is that there is a little bit of a gap between what we normally can find easily by using the tools that are available and what we really would like to find.
02:23
So for instance if we look at existing technology in text extraction and information extraction, one of the things that we are pretty good at is finding stuff. Obviously Google made this a business model and became very rich doing this, but in very many cases also complex tasks for finding specific entities, for finding specific locations
02:46
in a text, that's something we are pretty good at. We can filter and aggregate a lot of things very efficiently. So for instance sentiment analysis has been developing quite a lot in the last years. There are summarisation techniques where I can find specific topics and see what people
03:03
are saying about those topics and basically write this down in a more efficient format. Very interesting, very useful. I can find bigger patterns where I'm not really interested in specific content anymore but more in the arrangement of content in the patterning of things and use this for instance to classify documents that show similar patterns, can use this to check whether
03:25
Shakespeare really wrote what Shakespeare says Shakespeare wrote. All these things are very fascinating but they are not normally the kind of questions that we ask when we approach text from a social science perspective. So for instance when we want to find things it does happen that we have a very specific
03:42
question where we just want to find a specific article and then Google Scholar is an awesome thing because it finds the article very well. But most often what we try to find are like fuzzy entities, arguments, higher order semantic constructs, kinds of ways of talking about current issues.
04:02
We very often don't know exactly what we are looking for, we know what kind of thing we are looking for and when we see it we can say okay that wasn't what I was looking for or that was. But those are things that are defined by the structure of text and not so much by the content and these we are much worse at finding so far.
04:24
If we want to summarise things, filter, summarise, aggregate things, we're normally not just interested in things like sentiment. And sentiment is nice, you know, knowing whether the mood on a forum or in the news or whatever is positive or negative is a nice thing to have. The problem is the mood in the news is always negative, that's how the news is.
04:43
So that doesn't really give us terribly much. So the interesting question is what is it more negative about? Under which circumstances does it become more negative? Also if we want to look at specific attributions, under which circumstances do which kind of people attribute what kind of qualities to what kind of issues or objects?
05:02
That's more kind of questions so we're looking for these connections between all the contents that we can find. And again this is something where existing technology has a tendency to stop at a certain point. It becomes difficult. And if we're looking for specific patterns and like regularities in text, we are not
05:23
normally interested when it happens again but we're not normally interested in classifying documents but we're normally interested in classifying things within documents that are not bounded in a particularly clear way. So the document as a unit or the paragraph as a unit is not really what we're interested
05:43
in if we're looking for arguments, or if we're looking for things like frames, a contextualisation of issues. If we're looking for pragmatic contents, which is what we're going to do today, what language does, what is being expressed in text as a request, as a call for action?
06:01
That is something that is not terribly easily recognised as a pattern within a given unit so we need to have a slightly different approach. So the starting point for this entire project was that there is a little bit of a gap between the existing tools that we have and the scientific inquiry trying to use those because
06:21
we can operationalise for instance actors as named entities to some extent but they're not really the same thing. There are a number of ways of referring to actors that aren't named entities and there are a number of things that are misclassified. That's a smaller problem, right? Sentiment is not the same thing as an evaluation. You can use positive sentiment words to pass a negative evaluation.
06:44
Irony would be a classic case but there are very many ways of doing this. Topic models get us beautiful things that look a lot like topics but they aren't quite the same thing as topics. If you read a text and you have the topic model extracted from a body of text next to it, you see there's a difference. It's not exactly the same thing.
07:02
So what we are interested in is trying to get these two things closer to one another and to find ways of bringing the technology that we have closer to the question that we want to really answer. So when we depart from NLP, natural language processing, which is if you want the
07:21
intersection of computer science, linguistics and statistics, and you see there is no social science in there quite yet, what we are trying to do is use the existing tools, these are some that we are using, and look at them from a perspective that takes into account the social science questions and in this case communication
07:45
research. In particular what we are going to present here today is questions that come out of a project that you can already see in the logo. Infocore stands for informing conflict prevention, response and resolution:
08:02
the role of media in violent conflict. Basically it's a big political science, communication science and conflict studies research network that comprises all those universities that you can see here. We are a big team. We are processing very many texts, with questions like: what kind of views do social media
08:23
texts present of the enemy? What kinds of solutions are being advocated in the newspapers? How are specific ideas for resolving a conflict discussed and possibly dismissed in a parliamentary debate? And you see these are things that are questions that can be answered by turning to
08:43
text, but they don't map quite so well on the tools that we have. And what we do here is that we pick out or maybe first we are looking at six conflicts here. Kosovo, Macedonia, Israel, Palestine, Syria, Congo and Burundi.
09:03
And we are looking at, as I already alluded to, texts that are very heterogeneous in nature. We have strategic communication outputs, so that's PR, propaganda, all that stuff that governments, NGOs and so on, turn out and try and make sense of conflicts, tell us what it's all about, what should be done about it. We look at news coverage of those conflicts.
09:22
So what is popularised as the kind of the mainstream interpretation or community specific interpretations of these conflicts? What's going on in social media? How do people respond to all these debates? How do they fuel their own debates? And what do they argue about? And political communication here is mostly the debate in parliaments and political forums.
09:43
So the idea is what happens with all that information that turns it into policy or into conflict prevention and management. And what we're particularly interested in here is calls for action, because you can easily imagine that in conflict research that is a very critical question.
10:03
What kind of solutions, what kinds of actions are being called for in a conflict? Are there newspapers that call up people to go take the knife and murder the neighbours? Are there situations where there is a widespread consensus in the news that we should all calm
10:21
down and not escalate a conflict and start finding a peaceful solution? Are there propositions in a debate that might turn into radicalised violent action, which again may fuel conflict? So these are all things that we're interested in. What are people saying that needs to be done about the conflict?
10:42
And that's basically something that we, from a social science perspective, we call calls for action. And we've brought you a short definition of calls for action here. Calls for action are an expression of a request or desire for a specific course of action, with the aim of changing the current improvable or undesirable state.
11:02
So if I'm unhappy with what is currently the case, what needs to be done to get to a state that is fine? Or maintaining the current desirable state against threatening deterioration, so we're fine, but if we don't do a certain something, then this might actually stop and we will have a problem, we will have conflict, so what do we need to do in order to either
11:23
achieve a desirable state, whatever that is, or to prevent an undesirable state? So these calls for actions consist of a definition of what needs to be acted upon, what is it that we are concerned about, something that we propose to act, a specific cause of
11:45
action, and the motivational force. So some of these elements are more easily mappable on linguistic theories or on tools than others, but this is basically our starting point, and I'll turn over here to Katja, who will explain to you what exactly we have been doing and where we stand.
12:05
Oh, the microphone is clearly working. There we go. Hello.
12:20
All right, good afternoon, everyone. So yeah, just to sum up a little bit what Christian just said: we're interested in those sentences in texts that express calls for action, and we're aiming at developing a tool that will extract them from text automatically.
12:41
We will use natural language processing techniques, namely machine learning. And it can be formulated as a classification task, which will be performed in two steps. In the first step, we will classify sentences as ones calling for action or not calling for action. And then in the second step, those that do call for action, we will classify them
13:02
based on what exactly is being called for. So I will walk you through all these steps. But first of all, let me give you a good sense of what exactly we mean by calls for action, what exactly we are looking for in texts. So calls for action, they can be expressed explicitly, like pretty much straightforward, or they can be hidden, and we are interested in both ways.
13:22
So let's first look at the explicit ones. There are a number of ways in natural language that express this sort of information, that call for action. First of all, it's with the means of specific words, like command, request, demand, right? Like these verbs, like very straightforward, say that something needs to be done.
13:40
As an example here: Chad has called for the humanitarian community to support the government in dealing with the influx of Nigerian refugees. We have the expression has called for, or to call for in the infinitive, which pretty straightforwardly says that something is being called for. Another way to express this information would be to use modal verbs that oblige someone
14:03
to do something. For English, those would be must, need to, have to, ought to, should. It can be just one modal verb, and there can be a negator, like should not; in that case, it will be a call for not doing something. So if we have negators, it doesn't mean that there is no call for action; it is still there, just that the call is for not performing a specific action.
14:26
Another example where calls for action are still expressed explicitly, but it's a bit trickier, because there are no specific words here; it's in the grammar. For English, it is the imperative mood, right? I guess most languages, to the best of my knowledge all languages, have
14:42
specific grammatical ways to express the imperative mood. In English, it would be omitting the subject and placing the verb in first position, like fight them. Or another example, which is a bit more interesting: fire, which is ambiguous. And here we start dealing with the fact that natural language is very ambiguous.
15:02
So it can be a verb. It can be synonymous to shoot, right, like fire. So I call for someone shooting at someone. But I can also exclaim fire, meaning hey guys, there is a fire here, so we need to run away. It's dangerous, which will be a noun. And this information is actually available only from the context. So like right now, it's really hard to say what I meant. So we really need to know when the sentence was exclaimed, written, uttered, and what
15:25
is the context. Yes. We already did it to the optimal, yeah.
15:42
So probably, yeah, sorry. I don't know if we tried to... no, it won't work either, will it? But the slides, the presentation, will be available through the recording, right? So if you want to read up. Yeah, I'm so sorry about this.
16:00
So probably you have to listen to me a bit more. Yeah, so the next thing is that we are also trying to catch not only those calls for action that are very straightforward in the text, but also those that are implicit, that are just hidden, implied. These are very common in parliamentary debates, for example.
16:21
Politicians probably don't always like to say directly, let's go and kill someone. They like to say, okay, let's probably do something bad to someone, but then actually I didn't mean it, and then I meant it again, in order to leave room for different interpretations. For us, we still want to know about those calls for action; they're still interesting. So here I will show you where the fine line is, what is still considered to be
16:44
a call for action and what is not. Because again, if you look at different linguistic theories, like speech act theory, basically every sentence can call for something. So we need to really draw a fine line, for us and for the computers, to better understand what we're interested in and what we're not interested in.
17:02
So first of all, we consider those sentences that communicate dissatisfaction with the current situation to call for something. The idea is that if someone is not happy with the current state of affairs, then most likely the person will want to do something, will ask someone to do something, will perform some action to get to a better situation,
17:23
to a better future, to get the desirable outcome. It can be expressed with specific verbs, for example, as here, with words meaning condemn or blame. Or set expressions, idiomatic expressions, can be used, like we will not sit with folded hands. And it can actually be shown that these sentences indeed call for action:
17:44
we can rephrase them with the help of modal verbs. We can say, this motion should not be happening, or someone should not do something, with which we arrive at a more classical way of expressing this sort of information. There are expressions that still use the same lexical tools.
18:04
Modal verbs, for example. But they don't call for a specific action. They just say, okay, something should be done, some steps need to be taken, some actions should be taken. This is also very interesting for us: again in political communication, for example, no specific course of action
18:21
may be advanced, but the idea that some action will follow is still there, and we're also interested in capturing this. This information can also be expressed with the help of questions, like rhetorical questions. As here: can we accept such a treatment? So asking, can we accept it? Or shall we do something so that this treatment does not happen?
18:42
Shall we? So we basically call either for those people who perform the unfavorable treatment to change it, or we ourselves will do something to change it into a better situation. And the last example on this slide is propositions about desirable states.
19:04
For example, peace is the only answer. This calls for peace in a pretty hidden way: either we currently have a war, and let's change it to a state of peace, or peace is already established, and let's just maintain it. So here we also have cues that signal this sort of information,
19:23
like something is the only answer, the only solution to a problem. So hopefully you more or less get what we are trying to catch in the texts; you know what we are looking for. So next I want to introduce the data which we are working
19:46
on and the tools that we are using. So first of all, we borrowed this nice open source tool, AmCAT, which stands for Amsterdam Content Analysis Toolkit, and which was developed, obviously, at the University of Amsterdam, by Wouter van Atteveldt and his team.
20:03
And we adjusted it to our needs. We put it in Jerusalem; we call it JAmCAT, which has nothing to do with jam, actually, but with the Jerusalem AmCAT server. So we adjusted it to our needs, and we also store our data there. It also hosts some other projects, not only Infocore, also Record.
20:22
Actually, it's open source. Everyone can get an account there if Christian and I agree on that, and put their data there. This is the overview of the corpus that we are using. As Christian already mentioned, we address four different communication spheres, namely political communication, strategic communication, conventional media, and social media.
20:43
We have nearly two million texts currently. The most represented sphere is news, conventional newspapers basically. Then we have social media, strategic communication, and parliamentary debates. So this is the corpus that we are working with, which we want to analyze, from which we want to extract our calls for action.
21:05
Now, as I said, we want to use machine learning. So we want to train a statistical model, a statistical algorithm, to do it automatically. The algorithm needs something to learn from: it needs labeled data, it needs a training corpus.
21:21
That's what we also did; we crafted a calls for action corpus. We basically queried the same four types of documents within a time range of January to March 2015, with the search terms war, violence, and conflict. So we got all possible texts, then we split them into sentences, and each sentence was labeled accordingly:
21:42
either it calls for action or not. And those that do call for action were also classified, which will be the second part of my talk today. We have 5,000 sentences currently labeled. The shares are basically more or less the same as in the overall corpus. The richest in terms of the share of calls for action is strategic communication;
22:06
then come the news media and political communication, and social media surprisingly express the smallest number of calls for action. And on average, calls for action comprise 30% of all texts. So 30% of our corpus is basically calling for something.
22:23
So this is what our algorithm will learn from. But those are just sentences, right? And computers, unfortunately, are somehow not very good at understanding words; they need numbers. Our sentences have to be preprocessed in a specific way to become readable for computers.
22:42
So when we deal with machine learning, it means that we need to extract specific features, which will be used as learning material for the algorithm to label unseen data. And I will walk you through some steps which are, let's say, state of the art in natural language processing
23:01
and computational linguistics. Some of them we did perform, some of them we didn't. The first one is called n-gram extraction: we want to extract n-grams from our text. Probably the best way to explain what n-grams are is to give an example. So if we have the example sentence FrOSCon meets science and we want to extract unigrams, where n equals 1
23:23
(this is also called a bag of words), we will basically have three tokens, FrOSCon, meets, science: three words, simple tokens. These will be the units of our analysis. If n equals 2, then we will be dealing with bigrams, and we will have the units FrOSCon meets and meets science. If n equals 3, then it will be FrOSCon meets science, and the whole sentence will be treated as one unit of analysis.
23:42
So the whole sentence will be treated as one unit of analysis. But still this seemed to be like just two words, right? Like probably the computer won't get, if it's like one word or two words. So we still need to get some scores. And we normally get TF-IDF scores,
24:02
which stand for terms, frequency, inverse document frequencies. Oh yeah, probably you don't really see the formula here, but it's like very easy, and it's like it will be the only formula for today. So no worries, I won't torture you with many of them. So in order to compute these scores, the formula consists of two parts. So the left part is like we need to get the number of occurrences
24:25
of each word and divide it by the total number of words in the sentence. And then we will multiply it. The second part is the number of sentences divided by the number of sentences that that term occurs. We take logarithm of that, we multiply those two parts, and we get the score.
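For readers who cannot see the slide, the formula just described can be written out as follows; it is the standard TF-IDF weighting, with sentences playing the role of documents:

```latex
\mathrm{tfidf}(t, s) =
\underbrace{\frac{f_{t,s}}{\sum_{t'} f_{t',s}}}_{\text{term frequency}}
\times
\underbrace{\log \frac{N}{|\{\, s' : t \in s' \,\}|}}_{\text{inverse document frequency}}
```

Here f_{t,s} is the number of occurrences of term t in sentence s, and N is the total number of sentences in the collection.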
24:42
So all those units of analysis from the previous slide, all our n-grams, will be represented as a vector of TF-IDF scores. Why don't we deal with just occurrences? Well, there are good reasons for that: some words are more informative, some words are less informative, some words occur more often in a sentence
25:01
and might be less informative, and sentences can be of different lengths. All those things are accounted for in this formula. Another three common normalization steps in natural language processing are stop word removal, stemming, and lemmatization. Stop word removal means that we basically get rid of small words
25:24
that occur very often in text but that don't bear any semantic information and probably don't contribute to classification. Those are words like articles, prepositions, particles, connectors like or, and some auxiliary verbs. Usually, when we're dealing with large-scale text classification,
25:43
it really increases the performance. The second step is stemming, when we reduce all words in our text to their stem. As an example here, terrorism, terrorist and terrorize will all be reduced to terror. So we get rid of suffixes, keep the root only, and analyze only the roots,
26:03
which also makes sort of sense, right? Because the core meaning is there in the root, and when we want to decide whether a text belongs to the topic war or the topic sports, that should suffice. The third very common step is lemmatization, which is a softer version of stemming:
26:21
we don't just chop off suffixes, but bring all terms to their canonical normal form. So for example, we get rid of tenses: am, being and all the other variations of a verb will be brought to be, and for nouns we disregard whether they are singular or plural. So document and documents will both be lemmatized to document.
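A minimal sketch of these two normalization steps in Python, using NLTK as one common choice; the talk does not say which stemmer or lemmatizer was tried, so the specific classes here are assumptions:

```python
# Sketch: stemming vs. lemmatization with NLTK (assumed tooling; the talk
# does not name a specific implementation).
from nltk.stem import PorterStemmer, WordNetLemmatizer
# One-time setup for the lemmatizer: nltk.download("wordnet")

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

for word in ["terrorism", "terrorist", "terrorize"]:
    # crude suffix stripping; exact outputs depend on the stemmer's rule set
    print(word, "->", stemmer.stem(word))

print(lemmatizer.lemmatize("documents"))       # 'document' (canonical noun form)
print(lemmatizer.lemmatize("being", pos="v"))  # 'be' (verbs need a POS hint)
```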
26:44
As I said, when we deal with a large text classification problem, this really helps; there are lots of papers and experiments showing that it holds. But in our case, as I said, we are classifying sentences, which are very short pieces of text, right?
27:01
And if you recall those slides where I was giving you examples of calls for action, you might have noticed that sometimes this information is concentrated in words like request, but very often it is concentrated in the grammar. So we need the imperative grammar; we need the connections between words.
27:20
So in our case, we did not perform any of these steps. We tried; it decreased the performance. And right now, I will try to give you a sense of why it was important not to perform them, and moreover, to play up this grammar information, to give more importance to those tiny, seemingly meaningless words:
27:43
we developed linguistic features, which we will be using together with all the previous ones. So we are using n-grams, in our case with n between 1 and 4, and we computed TF-IDF scores for all of them. This is, let's say, our first set of features. Then, on top of that, we developed 32 linguistic features.
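In scikit-learn, the library the speakers name later in the talk, this first feature set corresponds to roughly the following sketch; the exact vectorizer settings are my assumption:

```python
# Sketch: 1- to 4-gram TF-IDF features over sentences, with no stop word
# removal, stemming or lemmatization, as the talk describes.
from sklearn.feature_extraction.text import TfidfVectorizer

sentences = [
    "He needs to help the refugees.",
    "The needs of all the refugees cannot be met.",
]

vectorizer = TfidfVectorizer(
    ngram_range=(1, 4),  # unigrams up to 4-grams
    stop_words=None,     # keep the small function words: they carry the grammar
)
X = vectorizer.fit_transform(sentences)  # sparse matrix: sentences x n-grams
print(X.shape)
```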
28:02
I won't just list them, but I will again give you a couple of examples to show why we developed them and why they are important; actually, including them improved the performance of the classifier. Well, there are lots of examples here. Probably you will be able to read them somewhere, somehow, between the flickers.
28:23
So, two examples: he needs to help the refugees, and the needs of all the refugees cannot be met. Hopefully, by now you're almost experts on calls for action yourselves, and you agree with me that the first sentence, he needs to help the refugees, does call for action, while the second one does not.
28:40
But we have the word needs in both cases, and it's very likely that the classifier, judging based on this word alone, will assign the same label to these two sentences. Especially if we get rid of those small, tiny words such as articles, it's very likely that they will be classified in the same way. So how can we deal with this problem?
29:01
And the answer is that we need to have part-of-speech information. So we use a part-of-speech tagger; namely, we use Stanford CoreNLP, which performs not only tagging but also extracts other useful linguistic information, like grammatical dependency trees from grammar parsing. In the sentence that does call for action, needs is a verb,
29:20
while in the second sentence it is a noun, so we use this information to disambiguate the two sentences. Another example, basically the same: the request to cease fire versus the user's request is being processed. Again, request is in both sentences; the first one is a verb, and it does call for action; the second one is a noun, and it does not call for action. And most likely, given this information, our classifier will get the labels correct.
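A minimal sketch of this disambiguation step. The project used Stanford CoreNLP; spaCy stands in here as a compact alternative tagger, so the model name and the feature itself are illustrative assumptions:

```python
# Sketch: a part-of-speech feature to tell "needs" (verb) from "needs" (noun).
# spaCy stands in for the Stanford CoreNLP pipeline described in the talk.
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes the small English model is installed

def hint_word_pos(sentence: str, hint: str) -> str:
    """Return the coarse POS tag of the first occurrence of `hint`, or ''."""
    for token in nlp(sentence):
        if token.lower_ == hint:
            return token.pos_
    return ""

print(hint_word_pos("He needs to help the refugees.", "needs"))                # VERB -> call
print(hint_word_pos("The needs of all the refugees cannot be met.", "needs"))  # NOUN -> no call
```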
29:43
Now let's have a look at a trickier example: the text must be withdrawn versus it must be cold. So we have must be on the left-hand side and must be on the right-hand side; even if we use bigrams, we will still be dealing with the same entities.
30:04
All of them are verbs, so parts of speech won't help here. So what to do? And the answer is that we have to look at the grammatical relationship with the neighboring words; we have to look at which words our hint verbs are connected to. In the first case, it is connected to a past participle,
30:20
must be withdrawn. In the second case, it is connected to an adjective. So the features we develop will basically try to capture those cases where the hint word is connected to a past participle or to an adjective: checking the part of speech of the dependent word, basically. A similar example:
30:41
we must not lower our guard at any time, Prime Minister Manuel Valls told parliament, adding that serious and very high risks remain, versus this certainly must not feel so bad. The same situation: we have must not in one sentence and must not in the other. So in order to disambiguate these two, we also have to look at the words, specifically at which verbs they are connected to.
31:03
So we have to create a specific list of verbs which signal to us either positive calls for action or negative ones. Another example: to call for peace, and a call for your sister, or to call on the phone.
31:21
So probably by now you already know what the answer will be. In the first case, we're dealing with a verb. In the second case, a call for your sister, even though we have the same combination of a word and a preposition, we really need to know the part of speech: the sentence with the verb will be calling for something, and the other one won't. To call on the phone:
31:40
here, call is still a verb, but it is connected to a different preposition. So again, having the information on the neighbouring words and on what sort of relationship they are in with our main word will answer the question, will solve the problem. This is the only answer versus the only answer I have is that I simply didn't know.
32:03
Here we have three words that are the same in both sentences, so even trigrams won't help us here. That's a hard case, and probably not every classifier will be able to assign the labels correctly. The features that are supposed to capture these cases again look at the position in the sentence:
32:23
whether it is an object or a subject, and if it is an object, whether it is an object of the verb to be, which is called a copula verb, or an object of a different verb. All that information comes in handy, and hopefully our classifiers, having all that linguistic knowledge, will be able to make the right decision.
32:42
But this is already a pretty hard case. Last but not least on this slide: the time to stop killing is now versus it takes a long time to stop killing. Time to versus time to: similar, the same expression, but different meanings.
33:01
And again, we look at the words that are connected to those. For example, when the word time is modified by adjectives such as long or short or right, then it's not calling for something, or is very likely not calling for something. Or if time is an object of a word like take or request, then again, it is not calling for anything.
33:20
But if we have just time to, and especially if we have a verbal complement following our hint noun, then we will very likely classify it as a call for action. So hopefully now you're convinced that for our task it's a good idea to keep those small words in, and maybe even give a bit more weight to them.
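A sketch of what such dependency-based features might look like, again with spaCy standing in for Stanford CoreNLP; the feature names are invented for illustration, and the exact tags depend on the parser's dependency scheme:

```python
# Sketch: dependency features for the hint word "must": what is it attached to?
import spacy

nlp = spacy.load("en_core_web_sm")

def modal_features(sentence: str) -> dict:
    """Toy features mirroring the talk: inspect the word governing the modal."""
    feats = {"must_head_past_participle": 0, "must_head_adj_complement": 0}
    for token in nlp(sentence):
        if token.lower_ == "must":
            head = token.head  # the verb that "must" is auxiliary to
            if head.tag_ == "VBN":  # past participle: "must be withdrawn"
                feats["must_head_past_participle"] = 1
            if any(c.dep_ == "acomp" for c in head.children):  # "must be cold"
                feats["must_head_adj_complement"] = 1
    return feats

print(modal_features("The text must be withdrawn."))
print(modal_features("It must be cold."))
```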
33:46
The last thing I wanted to say about the features we used: we also used the data origin, so where our data comes from, from which communication sphere, as a feature. And the intuition behind it is that the language on Twitter, for example, and the language in newspapers
34:01
and in parliamentary debates are very different. On Twitter you have limited length, and people have lots of misspellings, acronyms, abbreviations. While, for example, we have lots of texts from British parliamentary debates, where the language is very polite and very ambiguous, where every sentence starts with my honourable friend and so on and so forth.
34:20
So hopefully our classifiers will be able to capture the differences and make use of this information to make decisions. So, we have our corpus, we have our features; now let's move to the algorithms. There are numerous algorithms in machine learning, very many of them.
34:42
Some of them perform better for specific tasks. We used three. If you ask me why only three: well, we read many papers which recommend these algorithms as performing best for text classification tasks, and we ran a couple of experiments which supported the papers' results. So indeed, the three algorithms Naive Bayes, k-nearest neighbors
35:02
and support vector machines perform the best for the task. I will walk you very briefly through all of them, through their logic. Naive Bayes is very simple. Let's imagine that our task is to classify a text: does it belong to cars or to sports, or is it a detective story? What Naive Bayes does is, first it places a new item,
35:24
a new unseen piece of data, into the class with the most probable label. So in this case, that's cars. Then it looks at the next word, and the next word is dark. Okay, if we have dark, then this feature is very likely to occur in the class of detective stories; so then probably the text does not belong to cars, but is a detective story.
35:43
Let's look at the next word in the text, and that's football. Okay, probably no texts about cars or detective stories contain the word football, but it's very likely that many, many texts about sport will contain this word. So it means that the label sport is the most probable, and eventually the new text will be assigned that label,
36:02
because that label was the most probable. That's how Naive Bayes makes decisions. The next algorithm, k-nearest neighbors, works in the following way: it computes the distance from a new, unseen item to its k nearest neighbors, and then it decides that this item belongs to the class
36:23
to which the majority of those neighbors belong. So if the majority belongs to cars, then this new one also belongs to cars, and so on and so forth. The good thing about this classifier is that it can make non-linear decisions. As you might notice in the picture here, the shapes are more or less round,
36:42
so when the data is not linearly separable, it can handle those cases. Unlike the support vector machine, which is a linear classifier, meaning it works very well when our data can be separated by a straight line. It's also called a large margin classifier,
37:01
and the intuition behind it is that if we have two data sets, or two classes of our data, probably you can catch it from this picture here. We can draw the line to separate our data in multiple ways, but it will try to draw it in such a way that the distance
37:21
between each data item and the line is maximized, so that when we have new unseen data, it is very likely to fall under the right label, into the right category. Even though it's a linear classifier, it somehow also performs rather well for non-linear problems, and this is actually the classifier
37:41
that showed the best performance in our case, and that's the one that we are using for our task. Well, oh look, you can see it, right? Cool. The problem is fixed. So these are the scores for the three classifiers with different sets of features. Here you have Naive Bayes, here you have support vector machines,
38:03
and here k-nearest neighbors; the top line is three groups of features, then two groups of features, and then only TF-IDF. The best one, as I said, is the SVM: it performs the best, reaching accuracy around 0.8 in most cases;
38:22
sometimes it's less on some categories, sometimes it's a bit higher, but generally speaking it's the best one. And actually it performs rather well even without our linguistic features. So one might ask, why bother then, why develop those complicated features? And the answer is here: because they make the algorithm more powerful.
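A compact sketch of that comparison as it might look in scikit-learn, the library named later in the Q&A; the four toy sentences are placeholders for the 5,000-sentence annotated corpus, and the k for k-nearest neighbors is chosen only to make the toy example run:

```python
# Sketch: comparing the three classifiers from the talk on TF-IDF features.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import LinearSVC
from sklearn.metrics import classification_report

sentences = [  # placeholder data; the real corpus has 5,000 labeled sentences
    "He needs to help the refugees.",
    "The needs of all the refugees cannot be met.",
    "We must not lower our guard at any time.",
    "It takes a long time to stop killing.",
]
labels = [1, 0, 1, 0]  # 1 = call for action, 0 = not

X = TfidfVectorizer(ngram_range=(1, 4)).fit_transform(sentences)
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.2  # the 80/20 split mentioned in the Q&A
)

for clf in (MultinomialNB(), KNeighborsClassifier(n_neighbors=1), LinearSVC()):
    clf.fit(X_train, y_train)
    print(type(clf).__name__)
    print(classification_report(y_test, clf.predict(X_test), zero_division=0))
```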
38:40
I'm sort of short of time, so I will just briefly mention that you can draw the performance of your classifier as learning curves. In this one, where we're using the three sets of features, you can see that the two lines converge, while in that one they are not parallel and the gap is very wide.
39:02
That wide gap signals high variance: it means the model cannot generalize very well, and new data cannot be classified as well. While here, where we see the curves converging like this, that is a good signal: it means that our model is rather powerful, more powerful than that one. So actually, including the linguistic features makes sense.
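For reference, a hedged sketch of how such learning curves can be produced with scikit-learn; the diagnosis in the comments mirrors the talk's argument, and X, y stand for the feature matrix and labels from the sketches above:

```python
# Sketch: learning curves to diagnose high variance (overfitting).
# A large, persistent gap between training and validation scores = high variance;
# converging curves = the model generalizes better.
import numpy as np
from sklearn.model_selection import learning_curve
from sklearn.svm import LinearSVC

def variance_gap(X, y):
    sizes, train_scores, val_scores = learning_curve(
        LinearSVC(), X, y, cv=5, train_sizes=np.linspace(0.1, 1.0, 5)
    )
    # Gap at the largest training size; a smaller gap means lower variance.
    return train_scores.mean(axis=1)[-1] - val_scores.mean(axis=1)[-1]
```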
39:23
That's the big problem that we sort of solved: we've learned to extract calls for action from text. But, as I said, it's also very interesting to know what exactly is being called for, right? It's very important to know whether we want to start a war or whether we want to maintain peace. So the classification that we developed includes cooperative treatment,
39:43
which has two subcategories: either calling for peace or de-escalation, or calling for support and help, which can be financial support or humanitarian support. Then there is restrictive treatment, which has three subcategories: calling for violent solutions, for the escalation of conflict,
40:03
calling for punishment and investigation, which can be all kinds of legal acts like sentencing, sending someone to court or to jail, or exclusion of protest, when we say, okay, let's ignore someone, don't pay attention to him. Then we have calls for not doing something.
40:23
As I also mentioned before, it's probably better to give an example: we must not lower our guard at any time, Prime Minister Manuel Valls said, right? So we must not lower, we must not do something. And this information can also be expressed with the help of verbs such as condemn or warn.
40:43
Another category is like general calls for just doing something, or rhetorical questions when no specific course of action is mentioned, but just the idea that we need to do something. Another category is when sentences call for multiple actions.
41:01
Normally those are complex sentences with many clauses, where each clause calls for something different. So one clause can call for de-escalation, another for support, and another one for sentencing, and then say, okay, but let's just do something. And the last category is the rubbish bin, so to say: other. Whenever we don't know where a sentence belongs,
41:21
we just assign the label other to it. As in this example: the militants who massacre school children, behead soldiers, and attack defense installations have surely committed war crimes and must be dealt with as such.
41:42
Obviously those people should not be glorified because they did bad things, but should they be killed, should they be sent to jail, or just do something, we don't know. So this is other. We also perform classification for this, but here we're not doing that great yet, unfortunately. Here you can see the results,
42:00
if you can see them, the results for the fine-grained classification with nine categories. Okay, you don't see it. Well, I will just say that the scores are very low, normally less than 50%. The reasons are that it's natural language, it's very ambiguous, and even for human coders it's hard to assign the correct label. We also don't have enough data; sometimes it's just a dozen examples.
42:22
But if we merge a couple of categories and deal with only four of them, then the scores get higher: we have about 0.6 accuracy on average. We have more of the data here, so probably we are improving. We hopefully will get good results one day,
42:40
and one of the answers that we are aiming at is to get more data; hopefully it will help us in this classification task. So, to wrap up, I'm finishing. Christian started by saying that there is a gap between what we can do from a technological point of view and what we need, what questions we have to answer, in communication science.
43:12
Many things can already be done. But the problem is that many things that we are interested in,
43:21
they are hidden in the language, they are very ambiguous, and this is where the technology cannot yet help us. What we tried to do here, and we reached certain results, is extract calls for action, which is very pragmatic, semantic information hidden deep in language, with certain achievements, but there are obviously many more things to do and room to improve.
43:47
Can calls for action extraction be used in other areas, not only in communication science? Obviously yes. For example, we can identify user requests: it's the same formulation, the same words.
44:03
We can generate to-do lists automatically. Imagine a situation where you come back from vacation, you have hundreds of emails, you don't have time to read all of them, and you are afraid to miss important tasks; and then you have a to-do list generated, so you already know where to go, what time you have meetings, what tasks you have,
44:22
which is quite handy, right? We can use it for review analysis. For example, if we have a review of a hotel and the guest suggested improving something, the swimming pool, for example, managers can easily extract this information and act upon it.
44:41
Or we can use it, for example, on medical documents: when doctors have lots of documents to process, they can just query them and extract what treatment was recommended for the same symptoms, to speed up their work.
45:01
What could be very useful for communication science, what still can be done to answer our questions? We can combine different NLP tools. For example, if we combine semantic role labelling with lexical resources such as FrameNet or WordNet,
45:21
we can find evidential claims, claims about the truth about a specific object or entity. We can identify different frames if we have our calls for action, the same evidential claims, sources and semantic roles. We could, in theory, find causal links:
45:44
again, using all those above-mentioned tools, we could find what caused what, and so build those causal links, causal chains of things. And of course, that's not everything; those are a couple of ideas we came up with, so maybe you have your own, which can also be very interesting and valuable,
46:02
and we are open to them. So that's it, thank you very much. Hope you enjoyed it.
46:51
And in particular, addressing ethical dilemmas in medicine, for example around euthanasia. And their modus operandi at the moment
47:03
is to have trained researchers analyse texts which are found initially, obviously, by searching databases. But then the analysis, the thematic text analysis, is done entirely manually;
47:21
obviously, that's expensive. One would like to be able to do that for all sorts of policy development, guideline development, that sort of thing, but one can imagine that it's going to be hard to persuade an awful lot of bodies to pay for that. If one had tools to support that at least, so that you have some of the groundwork done
47:42
to do that, that would seem to be a great thing. I wonder if you see that as a fit for these methods. Yeah, I'll repeat the question. So the question is whether this technology or similar technology can support
48:01
in real life applications. And an example was research run in one of the medical universities in Germany, that there are lots of texts that currently are analysed and annotated manually, which is very expensive and time consuming, so whether we can use technology for that.
48:21
Yeah, I think it's very possible, especially since you said there are already lots of texts that are manually annotated and analysed; then basically all the algorithms that are there can be used out of the box. The main problem, the case where this cannot be used, is when we don't have labeled data, and that is not the case here. Of course, you will have to spend some time
48:41
on training algorithms, adjusting them, but in principle, it is very possible. I think we can have very good results. I think the trick probably is that the technology can take part of the way, right? And for instance, extracting passages
49:00
that we know that contain relevant information, what needs to be done, for instance, ethical recommendations. That is something that we can do with a certain high level of confidence. So we will miss some, but we will catch most, and we'll reduce the time needed to find those instances quite a lot. And then it's another question
49:22
whether you also want to use the technology for the second step, that is, reading those fragments and then deciding what exactly needs to be done, because there might be a point in having the additional nuance and precision that you can get with a human reader. But I mean, you know, we're working on that, and basically it works better the fewer ambiguous cases of language use there are
49:43
in distinguishing different kinds of recommendations. And the thing with war and conflict discourse is that things are awfully ambiguous, because people strategically try to hide what exactly they are calling for, right? So if you have texts that don't try to do that, performance might actually be much better.
50:21
All right, the question was where exactly the linguistic feature extraction is implemented. As I mentioned, the big environment for all our development is the JAmCAT server, which is Python-based. For machine learning, specifically for algorithm training, we're using the Python library scikit-learn,
50:43
and the features are a separate module, developed by us using Stanford CoreNLP, that extracts all those features. So we have them, and we just add them to our feature matrix, basically at the stage when we extract the other features as well.
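That combination step might look roughly like this; the two toy features stand in for the 32 CoreNLP-derived linguistic features, which are not spelled out in the talk:

```python
# Sketch: appending handcrafted linguistic features to the TF-IDF matrix.
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer

def linguistic_features(sentence):
    # Placeholder: in the real pipeline these come from Stanford CoreNLP parses.
    return [float(" must " in sentence.lower()), float(sentence.endswith("?"))]

sentences = ["We must not lower our guard.", "Can we accept such a treatment?"]
X_tfidf = TfidfVectorizer(ngram_range=(1, 4)).fit_transform(sentences)
X_ling = csr_matrix(np.array([linguistic_features(s) for s in sentences]))
X = hstack([X_tfidf, X_ling])  # final feature matrix fed to the classifier
print(X.shape)
```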
51:12
So the question is whether our training corpus of 5,000 sentences is big enough for the algorithm to learn from. Not really; of course, the more data we have, the better.
51:23
But for the task of just disambiguating between calling for action or not, it's fine. I mean, we have about 80% accuracy, which is a good number. For the more fine-grained classification, which you probably didn't see, it's obviously not enough. But an increase in data can improve it:
51:42
we started with a corpus of about 2,000 sentences and the scores were lower; now we have 5,000 and the scores got higher, and hopefully soon we will have an even bigger corpus, as we have some human coders working on annotation. So yeah, it is enough, but the more data we get, hopefully the better results we will get.
52:09
With the graphs?
52:35
This one, right? So the question was to bring back this slide.
52:47
So you're basically interested in the train-test split, right? So yeah, the question is how many sentences, which part of the corpus, was used for training and which for testing. For this experiment, it's a 20% split: 20% for testing and 80% for training.
53:06
Okay, so apparently, like, the question is to elaborate more on the graphs.
53:24
Yeah, what you can see: you have three bars. The blue one is the score for calls for action identification, the red one is for not calls for action, and green is the average; and then there are precision, recall and F1 score, pretty standard measurements of accuracy.
53:44
Let's just look at the top right corner. You see the score there: the precision for calls for action is actually one. The precision for not calls for action
54:00
is around 0.7, and the average is close to 0.8, while the recall for calls for action is zero, the recall for not calls for action is one, and the average is in between. So these are just the scores that show how well or how badly the classifiers perform.
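The corner case she points at, precision 1.0 with recall at or near zero for calls for action, is easy to reproduce; a tiny illustration with made-up labels:

```python
# Tiny illustration: a classifier that almost never predicts the positive class
# can have perfect precision and very low recall for it.
from sklearn.metrics import precision_score, recall_score

y_true = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = call for action
y_pred = [1, 0, 0, 0, 0, 0, 0, 0]  # a single, correct, positive prediction

print(precision_score(y_true, y_pred))  # 1.0: every predicted positive is right
print(recall_score(y_true, y_pred))     # 0.25: most true positives were missed
```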
54:32
So the question is whether we can decide in favor of different classifiers depending on which score is more important for us. In principle, yes.
54:40
If for some specific task we are only interested in calls for action, and we really want high precision for those, then of course we could use the classifier that performs better on that. Or if we don't care about calls for action and are only interested in recall for non-calls for action, then Naive Bayes would be the best.
55:00
But in our case, we're interested in getting the highest scores across all the tasks, for all the entities. That's why we chose SVM for our experiment. And I guess the challenge here is:
55:20
if you rely on the precision score here, the precision is high only because it didn't catch terribly many, right? You can see the recall is bad for calls for action, which means there were a few calls for action so clear that Naive Bayes did catch them,
55:42
and those were right, but it didn't catch many. So maybe that's not the best strategy. The SVM performed clearly better on the whole.
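To make the trade-off concrete, here is a hedged sketch comparing the two classifiers metric by metric via cross-validation; the toy sentences and labels are invented for illustration, not taken from the project's corpus:

```python
# Sketch: pick the classifier by the metric you care about.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC

sentences = [
    "they must be punished", "we should act now", "stop them today",
    "arm the militias", "hold them accountable", "the meeting was long",
    "reporters described the scene", "the river crossed the town",
    "markets opened on monday", "the report cites five sources",
]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]  # 1 = call for action

X = CountVectorizer(ngram_range=(1, 3)).fit_transform(sentences)

for name, clf in (("SVM", LinearSVC()), ("NaiveBayes", MultinomialNB())):
    for metric in ("precision", "recall", "f1"):
        score = cross_val_score(clf, X, labels, cv=2, scoring=metric).mean()
        print(name, metric, round(score, 2))
```

Whichever classifier wins on the metric that matters for the task at hand is the one to deploy; as discussed above, the speakers optimized for overall scores and chose SVM.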
56:04
The question is about giving some examples that are hard to classify. One would think that ambiguous examples like "fire" would be hard to classify, but actually that's a good question, because when I was preparing the presentation,
56:21
I looked into some misclassified examples, and most of the misclassifications were caused by imprecise linguistic features. When Stanford CoreNLP assigned a part-of-speech tag wrongly, or some of the relations were not caught, those sentences were very often misclassified,
56:42
and when I went into my files manually and corrected the feature values, the misclassification was gone. But for many of the examples I really could not understand why the classifier decided in favor of the wrong label, when, for example, "must" is obviously there,
57:00
and the sentence is obviously calling for something, yet the decision was that it was not a call for action. So this is definitely something I will dig into more deeply.
57:32
Yeah, the question is whether the tool so far works only in English, considering that we have texts in quite a number of languages.
57:40
That's correct. Basically, the entire Infocore project works in eight languages, and we have a huge dictionary that tries to extract something like 4,000 different kinds of concepts that can be mentioned in eight languages. This is a lot of fun. This tool in particular has been developed for English so far. Most of the things that it uses
58:02
are in principle available for other languages too, right? If we have a proper toolkit for assigning parts of speech and for extracting grammatical information, there's no particular reason why we can't run this for Arabic too. We have not done that yet, but in principle the technology
58:22
is one that should be transferable, to the extent that the languages you deal with have the features or structures we're looking at. Obviously, some adjustment is always needed, right? If you have a language that, for instance, doesn't use separate words for conjunctions or definite articles,
58:41
as in Hebrew, where prefixation is the solution for many of these things, then you need to adjust the way this is done. But in principle there's no particular reason why this needs to be restricted to English.
59:12
Texts can have hidden meanings underlying them; these might be religious or poetic. So for example, if you have a call to deal with these people
59:21
as Judas, John, or Aeternus, you have to know who Judas was for the call to make sense. Another example would be certain countries where free speech is somewhat discouraged and social media is full of rather more elusive speech. And I just wonder how well your techniques deal with
59:41
those sorts of things. I mean, they must apply in some way. So the question is how the technology deals with ambiguity, things like referentiality: basically, when you externalize parts of the information to something people are supposed to know, or allude to things.
01:00:00
And I guess it depends partly on how exactly this is done. If you have a training corpus with a lot of cases that work roughly like that, then the classifier should have a pretty good chance of finding it. This also relates to another project that I'm currently toying with, which is not yet at this stage,
01:00:22
where the idea is to look at the history of the same discourse. So basically, say for instance, Katja had the example that those people are war criminals and need to be treated as such, right? If you then have, in the history of the discourse, other texts that say that war criminals
01:00:40
need to be treated by whatever is the locally appropriate means, hanging, shooting, imprisoning, pardoning, there are different ways, and you have this kind of information from the history of the discourse, then you can fill this in. But this is obviously a much more complex procedure that goes far beyond trigrams, right?
01:01:02
So it is something that we have on the radar, and there is work in that area, but it is far from the stage where we can present it and say that it actually works. Just very briefly to add: here the sentence is our unit of analysis, so basically
01:01:22
we don't look beyond the sentence, which is obviously a downside, because many things are in the context. In principle there are tools in natural language processing, like anaphora resolution, which can help us identify what a pronoun refers to, but currently, no, it's not included yet. Of course, that would be cool to do.
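As a sketch of what plugging in such a tool could look like (not something the project currently does), Stanford CoreNLP ships a 'coref' annotator; the server URL and the exact JSON layout assumed here are illustrative:

```python
# Sketch: coreference ("anaphora") resolution via Stanford CoreNLP's
# 'coref' annotator. Assumes a CoreNLP server on localhost:9000 with
# the coref models available; the 'corefs' JSON key is an assumption
# about the server's output format.
from pycorenlp import StanfordCoreNLP

nlp = StanfordCoreNLP('http://localhost:9000')
ann = nlp.annotate(
    "The militias attacked the village. They must be punished.",
    properties={
        'annotators': 'tokenize,ssplit,pos,lemma,ner,parse,coref',
        'outputFormat': 'json',
    })

# Each coreference chain links mentions such as "The militias" and "They",
# letting a downstream classifier see what a pronoun refers to.
for chain in ann['corefs'].values():
    print([mention['text'] for mention in chain])
```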
01:01:43
The next thing then, of course, is that every additional tool you plug in multiplies the error sources, right? You have the error source of Stanford CoreNLP, then the error source of the anaphora resolution, and if, just by multiplying the precision rates of all these individual tools, you already get down to an overall precision rate of 0.5,
01:02:02
then it starts being useless. So basically, there are some things we can do by combining these tools, but the price we pay is that we depend on the tools and their precision. Where tools are not perfect, adding one has a price.
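The compounding is easy to see with made-up numbers:

```python
# Illustrative arithmetic (all numbers invented): if each tool in the
# chain is an independent error source, precisions roughly multiply.
pos_tagger = 0.90   # part-of-speech tagging
parser     = 0.85   # dependency relations
coref      = 0.70   # anaphora resolution
classifier = 0.85   # the final classifier itself

print(pos_tagger * parser * coref * classifier)  # ~0.46 -- nearly useless
```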
01:02:28
There was a comment that there is a lot of work on this in the digital humanities, both philosophical work and work on developing the perspective of how one can try to find all these things
01:02:41
in texts: dealing with irony, with implicature, with figurative speech. And there are also quite a lot of tools, so there's a lot of material there. There seem to be no more pressing questions, so let us thank you very much for being here
01:03:01
and for the discussion. It has been a great pleasure, and if you have any further questions later, this tool is online, Infocore is online, we have email addresses, and we're happy to receive questions, ideas, suggestions, and so on. So thanks a lot. Thank you.