Language Processing Pipelines Part 3
Formal Metadata
Part Number | 3
Number of Parts | 3
License | No Open Access License: German copyright law applies. This film may be used for your own use but it may not be distributed via the internet or passed on to external parties.
Identifiers | 10.5446/46148 (DOI)
Production Year | 2019
Production Place | Dubrovnik, Croatia
Transcript: English (auto-generated)
00:00
Of course, you can process a much larger chunk of text, but for this, okay, so I say I want dependency parsing. And you have language auto-detect, okay? Not secure, so what?
00:21
You get the analysis. So you have a sentence here with lemmas in blue and part-of-speech and MSD tagging, and you have a parse tree here. As you see, a dependency parse tree. But this parse tree can then be listed in CoNLL format, which is common among computational linguistics and NLP people.
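To make the CoNLL listing concrete, here is a minimal sketch of how such tab-separated output could be read back into tokens and dependency edges. It assumes the 10-column CoNLL-U layout (the demo's exact column set is not shown in the talk), and the sample rows, the `Token` class, and the function names are hypothetical illustrations, not the tool's actual output or API.

```python
from dataclasses import dataclass

# Hypothetical sample in the 10-column CoNLL-U layout (an assumption):
# ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC
SAMPLE = "\n".join([
    "1\tUNESCO\tUNESCO\tPROPN\t_\t_\t2\tnsubj\t_\t_",
    "2\tholds\thold\tVERB\t_\t_\t0\troot\t_\t_",
    "3\tmeetings\tmeeting\tNOUN\t_\t_\t2\tobj\t_\t_",
    "4\tin\tin\tADP\t_\t_\t5\tcase\t_\t_",
    "5\tParis\tParis\tPROPN\t_\t_\t3\tnmod\t_\t_",
])


@dataclass
class Token:
    idx: int      # 1-based position in the sentence
    form: str     # surface word
    lemma: str
    upos: str     # part-of-speech tag
    head: int     # index of the governing token, 0 for the root
    deprel: str   # dependency relation to the head


def parse_conll(block):
    """Turn one sentence block into a list of Token objects."""
    tokens = []
    for line in block.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue  # skip multiword ranges such as "1-2"
        tokens.append(
            Token(int(cols[0]), cols[1], cols[2], cols[3], int(cols[6]), cols[7])
        )
    return tokens


def dependency_edges(tokens):
    """Return (head form, relation, dependent form) triples."""
    forms = {t.idx: t.form for t in tokens}
    forms[0] = "ROOT"
    return [(forms[t.head], t.deprel, t.form) for t in tokens]


tokens = parse_conll(SAMPLE)
edges = dependency_edges(tokens)
for head, rel, dep in edges:
    print(f"{head} --{rel}--> {dep}")
```

Each row is one token, and the head column is what encodes the dependency tree, so the same file serves both as a tagged token list and as a parse.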
00:45
And here you also see that the named entities are recognized, Paris as location, UNESCO as location. So, no, person, UNESCO is tagged as person, so that's a problem. So UNESCO is not really being recognized as it should be.
01:03
Or you can also get very detailed XML output, which looks like this one. But this is only one type of selected output. So if I go to semantic graph, I will receive many more details.
01:24
So you see the dependency syntactic relations are up there and the semantic graph is down there. And if you look at the CoNLL format, then you have additional information. So these are like syntactic roles
01:41
and these are word senses annotated following the knowledge base. So these are word senses here. And on the right hand side you have semantic roles also recognized automatically.
02:01
But the semantic graph gives you something like that. So you have like four main predicates recognized in this. And you see the relations in the semantic graph look like that. So you have hold, meet, that's derived from the noun meeting, but you have an underlying semantic action to meet.
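A semantic graph of this kind can be thought of as predicates with labeled role edges to their arguments. The following is a minimal sketch under that assumption; the `SemanticGraph` class, the role labels attached here, and the argument strings are invented for illustration and do not reproduce the demo's actual analysis.

```python
from collections import defaultdict


class SemanticGraph:
    """Predicates as nodes, with labeled role edges to their arguments."""

    def __init__(self):
        self.edges = defaultdict(list)

    def add_predicate(self, predicate):
        # Accessing the key registers the predicate even without arguments.
        self.edges[predicate]

    def add_role(self, predicate, role, argument):
        self.edges[predicate].append((role, argument))

    def predicates(self):
        return sorted(self.edges)

    def arguments(self, predicate, role=None):
        # .get avoids creating an entry for an unknown predicate
        return [
            arg
            for r, arg in self.edges.get(predicate, [])
            if role is None or r == role
        ]


graph = SemanticGraph()
# The four predicate names come from the talk; everything attached to
# them below is a made-up example.
for name in ("hold", "meet", "prepare", "undertake"):
    graph.add_predicate(name)
graph.add_role("hold", "A0", "UNESCO")
graph.add_role("hold", "A1", "meeting")
graph.add_role("hold", "AM-LOC", "Paris")

print(graph.predicates())
print(graph.arguments("hold", role="A0"))
```

Note that "meet" appears as a predicate even though the surface word is the noun "meeting", which is the point the speaker makes about the underlying semantic action.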
02:23
You have prepare and you have undertake, which is next to undertake next and so on. And you see the AM1 actually is, and so no, this is a temporal relation, and you have A0 as the relation to its,
02:44
so it's meet and undertake and so on. So and I mean this graphical representation can be tweaked. I mean this elasticity of this network can be tweaked, but that's, it's just a question of how to show
03:01
what has been automatically analyzed out of this sentence. And then you can have the same sentence in Slovenian. So it's translated.
03:20
But there I'm afraid we don't have some of the modules available. So multiword detection for Slovenian doesn't work. And then we'll see what else doesn't work. And then we'll switch that off. Quantities detection is not available.
03:41
So these are the modules that are missing. Named entity classification is not available. Something else is not available. Let me check. Oh, word senses, this in yellow, we got something. Okay. Yes, but we didn't get the semantic graph. It's a very modest one, okay.
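The situation in the demo, where some modules exist for one language but not for another, can be sketched as a per-language availability table that the pipeline consults before running. The table contents, the module names, and the `plan_pipeline` function below are hypothetical; they loosely mirror what was said about Slovenian but are not the tool's real configuration.

```python
# Hypothetical availability table: which modules exist for which language.
# The entries are illustrative assumptions, not the tool's actual coverage.
AVAILABLE_MODULES = {
    "en": {
        "tokenize", "lemmatize", "pos_tag", "multiword", "quantities",
        "ner_classify", "word_senses", "dependency_parse", "semantic_graph",
    },
    "sl": {
        "tokenize", "lemmatize", "pos_tag", "word_senses", "dependency_parse",
    },
}


def plan_pipeline(language, requested):
    """Split the requested modules into those that can run and those skipped."""
    available = AVAILABLE_MODULES.get(language, set())
    run = [m for m in requested if m in available]
    skipped = [m for m in requested if m not in available]
    return run, skipped


requested = [
    "tokenize", "multiword", "quantities",
    "ner_classify", "word_senses", "dependency_parse",
]
run, skipped = plan_pipeline("sl", requested)
print("run:", run)
print("skipped:", skipped)
```

Checking availability up front lets the chain degrade gracefully, which is what switching the missing modules off by hand achieves in the demo.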
04:03
So to organize means meeting UNESCO, just that. But as you see, dependency parsing is working. And you have some word sense annotation, that's this one.
04:20
And if you click there, then you open the right sense in the Princeton WordNet. No, but it's a wrong word sense. So this is 3.1, and this was analyzed with 2.0. So it had to be translated and converted
04:41
from older version to the newer version. So it opened the wrong literal. And then if we add another sentence in Portuguese,
05:06
Diego, is this a valid Portuguese sentence? To the languages, semantic graph is not available for Portuguese, okay.
05:22
Not even dependency parsing. So, okay, full task parsing. I think the full parsing is available, no. Shallow parsing, okay, let's see that.
05:41
Okay, so as you see, this is constituency parsing. So this is not the dependency parse. So it's a different formalism, okay. And you get it in CoNLL format, which is useful for further processing, of course. Or XML output as well, which is even more detailed.
06:06
So that's what, FreeLing actually took some of the background processes from XLike and developed and added that to a number of languages. So it's not a very exhaustive number of languages,
06:21
but still there are some, and not all of them are covered with all modules. That's something that we would like to build up for sure. Well, I've shown that, and I will quickly end my talk with the following things.
06:42
Okay, oh, that's the XML version. Well, what you might find very useful in your future work and experiments is to try to find existing solutions for some of the tasks that you would like to use or do.
07:03
So I'll give you a list of different language repositories where you can find a lot of things. I mean, the NLP community is already quite mature. You can find tools and services for almost every task you need, for almost.
07:22
That doesn't mean that it will already be done for the language that you want to cover, but then you can find it for other languages, for English at least, and see how this can be ported maybe. Maybe that solution is language independent, but no one has ever tried it on a different language. So one of the things would be
07:41
the European Research Infrastructure, CLARIN ERIC. Particularly, you can look at linguistic processing, the framework for building online, building graphically, building linguistic processing chains that has been developed, which is called WebLicht. It's been developed by a number
08:02
of German universities together, but the entrance point is in Tübingen. The address of this European Research Infrastructure is clarin.eu, and another repository, which is actually a federation of European repositories for different language technologies, resources, and tools
08:26
is called META-SHARE. And the next one is the European Language Resources Association, ELRA, which actually,
08:41
these two are mostly for free. In ELRA, you will have to pay for some of the resources, and in ELRA, you will predominantly find resources, not so many tools. But the Linguistic Data Consortium is an American repository at the University of Pennsylvania,
09:00
and then you will, there you can find also some resources that come for free, and some are on a pay basis. But I think these are four main repositories for language technologies in general. And then there are some other frameworks that allow you to build linguistic processing chains.
09:22
One of the most popular and best known is GATE from the University of Sheffield, and FreeLing, I've just shown you FreeLing. It's run by the Polytechnic University of Catalonia in Barcelona, and you have, of course, the Stanford NLP set of tools,
09:41
and with those you cannot always build chains so easily as in WebLicht, or where you already have complete chains, like FreeLing or GATE. But still, they have valuable pieces of software
10:00
that could very often be language independent. So they sort of favor the language independent approach. And of course, today, if you look at the commercial analytics product packages, like IBM Watson or Linguamatics,
10:21
they all already include language technology modules in their analytical packages. They offer that as part of a commercial product. And if I tell you that NLP will make a difference, particularly if we include the neural networks
10:43
in training new generation of NLP modules and NLP tools, well, don't take my word for that, read this guy. So you have data-informed bright future for NLP. Why natural language processing will change everything?
11:02
So it was written three years ago, and it is still valid. I mean, I would surely say that. And thank you for your attention. So.