Towards Knowledge Graph based Representation, Augmentation and Exploration of Scholarly Communications


Formal Metadata

Title
Towards Knowledge Graph based Representation, Augmentation and Exploration of Scholarly Communications
Author
Auer, Sören
License
CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Technische Informationsbibliothek Hannover (TIB), Leibniz Universität Hannover (LUH)
Release Date
2019
Language
English

Thank you very much. Now I'm coming to the topic of today's talk, and I want to start with the bigger, broader picture: the history of computing. Of course it started even earlier with Leibniz — you can see in the logo of Leibniz Universität the binary system he developed — but let's begin with the first IT systems, for example the Zuse Z4 from 1944. At that time you really had to physically interact with the hardware: you had to push and pull those registers to instruct the computer to carry out certain computations. In the seventies and eighties, interaction with computers via punch cards was quite popular and common. There you also had to interact physically — no longer with muscle power, but you still had to know which register stored which data and how to shift data from one register to another. So programming was very closely aligned to the hardware and its capabilities.
Then computer scientists realized that this is not the most intuitive way to deal with and process information, and we were inspired by cooking recipes — used for hundreds if not thousands of years to represent a procedure: how to add ingredients and how to combine them. That gave rise to procedural and functional programming in the eighties and nineties. But that was not the end of the
development. Then there was the era of objects. Objects have a form, a shape and a function. This, for example, is a Chinese incense burner: an object with a nice form and shape, but also a function — you can create a pleasant room scent with the incense sticks you burn in it, and it also serves as a candleholder. That inspired computer scientists to come to the object-oriented paradigm of modelling, of programming, of interacting with the hardware. It organizes information, I think, in a more intuitive way: we bundle functions (methods) and data into objects and define relationships between different objects — but the data is still somehow hidden inside these objects. What we have seen in the last decade, I would say, is that we have come to an era where data, information and knowledge play an increasingly important role, inspired by how we humans process information in our brains and exchange information with each other. I would like to talk a bit about this today: how we can represent data, information and knowledge in a cognitive way, how we can exchange it, and how we can use that to improve scholarly communication — a particularly important topic also for TIB.
One approach — maybe now the dominant approach — for dealing with heterogeneous, distributed information is to follow the Linked Data principles; I'm a member of the institute for distributed information systems here. First, we use URIs to identify things. In the data world, URIs are the identifiers; in libraries we of course have other identifier systems such as DOIs and ISBNs, or the EAN barcodes many of you know, which are used for products. It is very important to be able to identify all kinds of things, also in the data world. Second, we provide a mechanism to look this information up: just as we can retrieve web documents, we can retrieve data items using the same principles — URIs that are dereferenceable via the HTTP protocol — and return a description that everybody can understand. For web documents we have HTML as a standard representation; that is the reason we can build search engines and e-commerce applications and exchange information in this global information space, the web. For data, the representation standard is RDF, the Resource Description Framework, and I will show you on the next slide how it works. And the same mechanism by which we link between websites we can apply to data: linking between different data items and data stores. These are the Linked Data principles. Tim Berners-Lee not only invented and developed the web, he was also one of the drivers behind modelling data in knowledge graphs, as
it is now called. So RDF is a data representation formalism, and the basic idea is that we organize information very similarly to how we humans organize information in sentences. A sentence consists of a subject, a predicate and an object — German has a lot of additional variations, but I think more than 90 per cent of languages are based on this simple paradigm, with some variations. We can for example say that the Faculty of Electrical Engineering and Computer Science organizes this colloquium in 2019 — that is a subject, a predicate and an object. And just as we use the objects of one sentence as the subjects of other sentences in natural language, we can do the same here: we can say that this colloquium takes place on the 20th of May 2019 and that it takes place in Hannover. That already illustrates how we can create a small knowledge graph. How many triples, how many statements do we have here? Three — and each statement is an edge in a small knowledge graph. You can already imagine attaching more information to the nodes: we could describe the faculty in more detail, or the lecture; and Hannover actually refers to another entity in the DBpedia knowledge base, which was extracted from Wikipedia almost a decade ago and contains a lot of structured information about all the entities described in Wikipedia. The interesting thing about this triple data model is that, because everything is written down as subject-predicate-object statements, we can easily integrate information from different sources. That is very different from — and very difficult with — other data models: with relational databases, XML or object-oriented representation paradigms it takes a lot of effort and time to integrate and combine data from different sources, while here it is in a way already built into the data model; it is very easy, almost trivial, to integrate information from different sources. The idea is also to reuse identifiers: "organizes", "starts" and "takes place" are predicates from a vocabulary, and we are supposed to reuse such predicates so that the integration also makes sense semantically.
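The three-statement example above can be sketched in code. The following is a minimal illustration in plain Python — all identifiers (`ex:...`, `dbpedia:...`) are made up for illustration, not real IRIs — showing how subject-predicate-object triples form a graph to which further facts can be attached without restructuring anything:

```python
# Toy model of RDF-style subject-predicate-object statements.
# All identifiers (ex:..., dbpedia:...) are illustrative, not real IRIs.

triples = {
    ("ex:FacultyEECS", "ex:organizes", "ex:Colloquium2019"),
    ("ex:Colloquium2019", "ex:takesPlaceOn", "2019-05-20"),
    ("ex:Colloquium2019", "ex:takesPlaceIn", "dbpedia:Hannover"),
}

# Objects of one statement can be subjects of another, so new facts
# simply attach to existing nodes:
triples.add(("dbpedia:Hannover", "rdfs:label", "Hannover"))

def describe(subject):
    """All predicate/object pairs attached to a subject node."""
    return sorted((p, o) for s, p, o in triples if s == subject)

print(len(triples))                       # 4 statements after the addition
print(describe("ex:Colloquium2019"))
```

Real systems would of course use full IRIs and an RDF library, but the integration property discussed above is visible even here: merging two such sets of statements is just a set union.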
We could, for example, build a search engine for events worldwide by integrating information from many such sources. This kind of knowledge representation is already taking off, and more and more companies and organizations are using it. As an example, information about an enterprise can be represented this way: we can say which industry a company is active in, attach labels in different languages — for example Chinese and German — describe the headquarters, and represent units and data items. In this way we establish a knowledge representation that spans not only different languages but also different domains — cross-domain knowledge — and we don't have to structure the information in advance; we can add more information as we go. This paradigm of
representing information in knowledge graphs is becoming more and more popular. A knowledge graph is often a fabric of concept, class, property and instance relationships, using some formal representation formalism — for example RDF, or the Web Ontology Language OWL, which builds on top of RDF Schema. It often integrates information from different sources, different domains and different granularities. This can be instance data, which can come from open sources or from closed ones — supply chains or product models inside companies are examples of such closed data — as well as derived and aggregated data; schema data such as vocabularies and ontologies, which can be represented in the same way as the data itself (a difference from other data representation formalisms); metadata; taxonomies to categorize information; links between internal and external data; and mappings to other data representation formalisms such as relational databases. All these types of information and data can be represented according to the same subject-predicate-object RDF statement paradigm. Here you see an example: an excerpt of the DBpedia knowledge graph, with facts about Bob Dylan, the United States, Steve Jobs and many more.
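One consequence worth making concrete: because schema and instance data share the same statement form, integrating sources amounts to a union of statement sets. A rough sketch in plain Python, with all identifiers invented for illustration:

```python
# Schema statements (classes, properties) in the same triple form
# as instance data; all identifiers are illustrative only.

schema = {
    ("ex:Company", "rdf:type", "rdfs:Class"),
    ("ex:headquarters", "rdf:type", "rdf:Property"),
}

# Instance data, possibly from a different source, with labels
# in different languages:
instances = {
    ("ex:ACME", "rdf:type", "ex:Company"),
    ("ex:ACME", "ex:headquarters", "dbpedia:Hannover"),
    ("ex:ACME", "rdfs:label", "ACME GmbH@de"),
    ("ex:ACME", "rdfs:label", "ACME Inc.@en"),
}

# Integration is just the union of the two statement sets:
graph = schema | instances
print(len(graph))   # 6 statements, schema and data side by side
```

No up-front global schema agreement is needed; statements about the same identifier simply accumulate in the graph.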
DBpedia was one of the first knowledge graphs; meanwhile there are many others. One which is now better curated is Wikidata, which basically followed up on the role of DBpedia. In the library world there are many examples as well; one is the GND, the integrated authority file of the German National Library. Another one, from industry, is Google's knowledge graph, which spans many different domains and builds the backbone of Google's services. Google invested a large amount of money — I think more than 100 million dollars — in acquiring a company called Freebase, and invested even more in building this backbone knowledge graph to organize information about people, places and organizations, which Google now uses to provide services such as Maps or e-commerce search. When you search on Google you find not only documents but also results from the knowledge graph — facts about people, which political party they belong to, where they were born, how many children they have — and if you look for restaurants on Google Maps, you get information from this knowledge graph too. It is all interlinked and connected. Now, how can we exploit
this information? One project we performed partially also here in Hannover — the nice thing is that, since I have been here for almost two years, I can already report some results we have achieved — is to use such knowledge graphs for question answering: taking natural-language questions and providing answers to users, exploiting information from knowledge graphs and from the Web of Data, the web of interlinked data increasingly available online, and offering such question-answering services to citizens, communities and industry. Many of you may have used Google Now or Amazon Alexa; these are instantiations of such question-answering systems operating on structured knowledge. If you have a question like "Who is the director of A Clockwork Orange?", the system has to understand the spoken question, analyze it, find the data needed to answer it, and finally present the result to the user. As you can see, this is not that simple: "A Clockwork Orange" is a named entity — it starts out looking like an orange clockwork, but it is a piece of art, a movie in this case.
Another example: find publications and health reports related to Alzheimer's in Greece. That is a question more related to research, where we may need to integrate not just one data source but several — for example publication databases, data from the World Health Organization, and clinical trials. That is an area where a lot of work still has to be done, and partners are working with us intensively on this topic.
So how did we tackle this problem? Previously, question answering was often approached monolithically. In the WDAqua Marie Curie training network, with a group of 15 PhD students, we tackled the problem by developing a component-based architecture for building question-answering systems: not a monolithic system that takes a question and produces an answer, but a pipeline with many different components — for query decomposition, data source selection, query execution, named entity recognition, answer generation — where you can integrate different components depending on the use case and application area.
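Such a component pipeline can be sketched as follows — this is a toy illustration in Python, not the actual WDAqua implementation; the entity dictionary and graph contents are made up:

```python
# Toy component pipeline for question answering over a knowledge graph:
# named entity recognition -> query construction -> query execution.
# All data and component logic below are invented for illustration.

KG = {("dbpedia:A_Clockwork_Orange", "dbo:director",
       "dbpedia:Stanley_Kubrick")}

def recognize_entities(question):
    """Component 1: very naive named entity recognition."""
    known = {"a clockwork orange": "dbpedia:A_Clockwork_Orange"}
    return [iri for phrase, iri in known.items()
            if phrase in question.lower()]

def build_query(question, entities):
    """Component 2: map a question pattern to a triple pattern."""
    if "director" in question.lower() and entities:
        return (entities[0], "dbo:director", None)  # object is the unknown
    return None

def execute(query, graph):
    """Component 3: match the triple pattern against the graph."""
    s, p, _ = query
    return [o for (ts, tp, o) in graph if ts == s and tp == p]

def answer(question, graph=KG):
    entities = recognize_entities(question)
    query = build_query(question, entities)
    return execute(query, graph) if query else []

print(answer("Who is the director of A Clockwork Orange?"))
```

The point of the architecture is that each stage is exchangeable: a different entity recognizer or query builder can be swapped in without touching the rest of the pipeline.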
Let me show you how this works in reality — you can actually try the WDAqua demo online and test it yourself. Let's have a look: if you are interested in the question about A Clockwork Orange, you can enter the question at the top, and you actually get the answer. You not only see the answer, which here is derived from Wikipedia, but also a confidence score — how confident the system is about the result — and you can provide feedback. And of course you can ask a lot of other questions: who is the mayor of London — Sadiq Khan — or who is the author of a particular book — let's see whether I formed the syntax properly — or you can also ask for
paintings by Monet, for example, and you will get a collection of paintings by Monet. This nicely illustrates that structured knowledge gives us a lot of power to answer questions, and even to go a step beyond what commercial question-answering systems can currently do. Systems like Google Now or Amazon's Alexa work with relatively rigid templates: the types of questions are largely predefined, and thousands of curators basically create these templates. Here it is more open — you can ask all kinds of questions, and everything contained in the knowledge graph can be answered, whereas in the commercial systems the answerable questions often have to be predefined. On the other hand, you can also ask weird questions and get weird answers; I think that is one of the reasons why Alexa, for example, doesn't allow arbitrary questions — with templates the system can be kept childproof and certain safety constraints can be guaranteed. Now, how
is this related to scholarly communication? I would like to come to the question of how this can be applied at TIB, at libraries and digital libraries, to organize information flows — also for the individual scientist and scholar.
Let's look at a few examples of how publishing traditionally worked. This was maybe one of the reasons the wall in East Germany came down: the East Germans all wanted this nice mail-order catalogue — product descriptions, prices, and identifiers, numbers to identify the products.
Twenty years ago you also had street maps — the publishing industry produced them and you had to buy a new one every year. Or you may remember the phone books used to look up phone numbers. These were all industries with thousands of publishing houses worldwide, often billion-dollar or billion-euro industries — and they have all disappeared.
Publishing works completely differently nowadays. In e-commerce we don't use PDF versions of mail-order catalogues; we use applications like Amazon or eBay, where we can really organize and drill down into the information in a completely new way — very different from just a digitized PDF of a mail-order catalogue. The same holds for street maps: we don't use PDF versions of street maps, we use navigation systems that allow us to zoom in, to personalize maps, and to locate information on them. It is a completely new way of interacting with this information.
This world of publishing and information exchange has profoundly changed in many of these verticals and domains. New possibilities were developed — this zooming, this dynamics — and the business models changed completely: there is much more focus on data, on linking services, on search within the data. The integration of crowdsourcing also plays a very important role: on Amazon we have lots of reviews crowdsourced from people who bought the products, and on Google Maps you have reviews of the businesses on the map. Integrating this information, and integrating user contributions, is a very important aspect. So what about scholarly communication — the way researchers publish, share and exchange information? How does it look there?
This was one of the first publications, from 1665, in the Philosophical Transactions of the Royal Society — one of the first journal publications, in the 17th century. In the 19th century, publications looked like this. And in the 1970s — you may remember this famous paper on relational databases — they looked like this; typesetting systems were already in use. Nowadays we use
PDF. So even with open access, we share PDF documents on the internet, but they are only partially machine-readable: they don't preserve much of the structure and don't allow embedding of semantics or interactivity. That is comparable to digitizing a mail-order catalogue as a PDF and sending it around by e-mail or putting it on a website, or digitizing a street map as a PDF. That is how we represent information in science nowadays, while these other domains were completely disrupted. Science, I think, still uses very antiquated ways of knowledge sharing and
information exchange, and there is a large number of issues and problems: we don't use the potential of the web and of collaboration in science; we have the monopolization by commercial actors exploiting their lock-in effects; the reproducibility crisis; the proliferation of publications; and the deficiencies of peer review. Let me go a
bit more into detail. Take the proliferation of science: in the decade from 2004 to 2014, the number of publications in science and technology almost doubled, and it probably continues to grow. Not so much because we in Germany publish much more — we publish a bit more, but the numbers didn't double in Germany or in the US, they only slightly increased — but they roughly tripled in China and in India, for example. And if you look at Brazil or Russia, these are countries which have now also entered the scientific publication market, follow in our footsteps and publish large amounts of documents.
With regard to reproducibility, there was a study in Nature: 70 per cent of experiments failed to be reproduced by other scientists, and 50 per cent of scientists even failed to reproduce their own experiments. I think this happens to all of us, because after a few years the runtime environment is no longer available and we cannot rerun our experiments. This of course differs a bit between scientific areas and domains, but I think it is a major issue, because reproducibility is one of the cornerstones of science, and its absence results in a lot of duplication and inefficiency.
Terminology is often not clearly defined. Every paper of course defines its own terminology, but if you take 10 or 15 papers, even in the same area addressing the same research problem, you see subtle differences in terminology, which makes it extremely difficult to compare and integrate: the problems, methods and characteristics are not properly defined. Imagine how engineering or building construction would work if you could not identify the parts exactly and could not fit them together exactly. Unfortunately, my impression is that in science we often have exactly this situation: the different bits and pieces don't fit together very well, and it takes enormous effort to make them fit
later on. So we have this lack of transparency — information is hidden in text; a lack of integratability — different research results don't fit together well; almost no machine assistance — you may use Google Scholar or the TIB portal, but these full-text search methods don't support scientists very well beyond metadata; little identifiability; collaboration is a major issue; and getting an overview is very difficult — it takes years for scientists to get an overview of a research field.
Let me illustrate that. If you search the TIB portal for CRISPR, the genome-editing method, you find 9,000 results. We don't have that many because it is a biochemical method and we don't focus much on biology and chemistry — but still, 9,000 results; Google Scholar has 238,000 results. Now imagine you are interested in the precision, the safety or the cost of the method, or you want to know which specific Cas9 genome editing has been applied to insects, or who has applied it to butterflies — you are basically lost in this haystack of publications.
Now, after this depressing message: how can we fix it? There was already a vision — from 1945, from Vannevar Bush — of organizing information in a kind of Memex, as he described it at that time. For its time it must have sounded quite esoteric: the idea was that as a researcher you have a desk, and on top of the desk a tablet which shows you, at your fingertips, the information you are interested in for your research, provided almost magically by some apparatus below the desk. In 1945 that was really science fiction. Today my thesis is: we now have the opportunity to realize something like this, to create this Memex.
We would like to — and we have started to — work in this direction here at TIB, using this approach of knowledge graphs: identifying overarching concepts in scientific publications, such as research problems, definitions, research approaches, methods and artifacts (at TIB we have not only publications but also data, software, images, audio and video), and then very domain-specific concepts — in mathematics, for example, definitions, theorems and proofs; in physics, experiments, states and models; in chemistry, substances, structures and reactions; and so on. We need to go much deeper into the publications, identify these concepts, and link not just on the level of publications but on the level of these concepts, and clarify them in such a knowledge graph.
want to show you a bit how this can work. If you have, for example, a publication here, "A practical guide to CRISPR-Cas9 genome editing in Lepidoptera" — Lepidoptera being the Latin name for butterflies — we can now represent not only the bibliographic metadata, like the author and the title of the document, but also information about the research problem which is addressed by this publication, about the methods which are applied and on which species they are applied, and about the experimental data, and represent that in a kind of knowledge graph, linking these concepts with each other and establishing relationships between these different entities describing the information. If we do this for many publications, where we reuse those identifiers and link between those publications, we can then find connections between them and get an overview of the state of the art, for
example. In order to do that, we need to lift knowledge graphs to a more cognitive level, where they can represent uncertainty, disagreement, semantic granularity, and also the evolution and provenance of information, while at the same time being flexible and simple. In the ScienceGRAPH project, which has just started as an ERC Consolidator project, we want to follow the notion of knowledge molecules: the contributions of a research artifact are captured in such a molecule, a relatively compact, simple but still structured unit of knowledge, which can then be incrementally enriched, annotated and interlinked. If we
look at knowledge graphs today, they describe real-world, atomic entities; we add and delete facts, and maybe we collaborate by enriching facts. But in the future we need to make knowledge graphs more cognitive, in the sense that what they describe is not only real-world entities but also conceptual ideas, as is common in research: of course these are linked to the real world, but often they are ideas, abstractions of methods we developed and conceive intellectually. We interlink and annotate them; of course we do not always agree, so we have disagreement; we might also review or provide different assessments of contributions, and this should be captured in such knowledge graphs. There might also be a drift of these concepts over time, and varying aggregation levels, and the semantics emerges, in a way, out of the collaboration of different
scientists. As a result, the goal is to identify and provide information, for example using the question answering approach I showed earlier, applied to such cognitive knowledge graphs which capture scientific knowledge. For example, we can answer the question "How do the different genome editing techniques compare?" by parsing the question, discovering named entities, relationships and links between them, and then translating it into a formal query: we use the query language for knowledge graphs called SPARQL, which is very much inspired by SQL, construct a query and then render the results to the user. We apply a pipeline approach here with different components, because named entity recognition, for example, is something which works relatively well in specific domains but does not work well in open or generic domains. For biomedical concepts we can achieve quite good precision and recall, but for other domains we need different approaches; we need to plug in different systems and mechanisms for constructing such queries, for example for specific forms of questions with regard to specific domains.
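The pipeline just described — entity recognition, entity linking, query construction, query evaluation — can be sketched in a few lines. This is a toy illustration over an invented triple store, not the actual ORKG implementation; all resource names and the string-matching "NER" are assumptions made for the example:

```python
# Toy sketch of a question-answering pipeline over a knowledge graph:
# 1) spot named entities in the question, 2) link them to graph resources,
# 3) construct a formal (SPARQL-like) pattern, 4) evaluate it on the triples.
# All identifiers below are invented for illustration.

TRIPLES = {
    ("CRISPR-Cas9", "isA", "GenomeEditingMethod"),
    ("TALEN",       "isA", "GenomeEditingMethod"),
    ("ZFN",         "isA", "GenomeEditingMethod"),
    ("CRISPR-Cas9", "easeOfUse", "high"),
    ("TALEN",       "easeOfUse", "medium"),
}

ENTITY_LEXICON = {  # naive NER: surface form -> graph resource
    "genome editing": "GenomeEditingMethod",
}

def answer(question: str):
    q = question.lower()
    # Steps 1+2: named entity recognition and linking (string matching here).
    concept = next(v for k, v in ENTITY_LEXICON.items() if k in q)
    # Step 3: a SPARQL-like conjunctive pattern: ?m isA <concept>.
    pattern = ("?m", "isA", concept)
    # Step 4: evaluate the pattern against the triple store.
    return sorted(s for s, p, o in TRIPLES
                  if p == pattern[1] and o == pattern[2])

print(answer("How do the different genome editing techniques compare?"))
# prints ['CRISPR-Cas9', 'TALEN', 'ZFN']
```

A real system would replace the lexicon lookup with a trained NER component and emit an actual SPARQL query against a graph database.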
We then provide an overview of the results to researchers, for example here a comparison of different genome editing methods with their specificity, safety, ease of use and cost. This is a mock-up, so it is not reality yet, but I will now show you an example of a prototype which we started to develop in the last year. We have a first, early version; you can actually go to the ORKG site — ORKG stands for Open Research Knowledge Graph. That is a service we want to develop over the next years here in Hannover, which is intended to realize exactly this vision of structuring information for scholarly communication in such a knowledge graph. How does it work? The idea is that you
describe research publications. Maybe in the medium to far future we will not even need publications anymore and will just describe our research findings in the knowledge graph, but in the short term I think we still need these publications and people still want them.
So you can start with the publication: if you have a DOI, you can add your DOI there, and you can describe the publication or link it to the research fields it is related to. Then, based on a Crossref query for example, you can automatically obtain the bibliographic metadata if the publication is already indexed in some bibliographic database. Finally, and most importantly, you describe the content of the publication in a semantic way. This is of course a very early prototype of the interface, developed by a colleague who is also here in the auditorium; we will have to experiment much more in the next months or even years. The idea is then to describe, for example, that the publication here
addresses the problem of sorting algorithms, and then to describe the approach: this is the merge sort algorithm, it is implemented in C++, it has a stable implementation, it has a certain best-case complexity and worst-case complexity. You can add arbitrary additional properties, and reuse properties from other publications which already tackled the same problem, to describe your publication. We can then also run similarity computations, for example, and show an overview of the state of the art for works addressing a certain research problem — here, for example, related to sorting algorithms, of course a classical toy example for computer scientists.
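The kind of structured description sketched above can be pictured as subject–predicate–object statements. Here is a minimal, hypothetical encoding of the merge sort example — the property names and resource identifiers are made up for illustration, not ORKG's actual vocabulary:

```python
# A hypothetical ORKG-style description of one paper's contribution,
# expressed as subject-predicate-object triples (property names invented).
paper = "paper:merge-sort-study"
triples = [
    (paper, "addresses",   "problem:sorting-algorithms"),
    (paper, "hasApproach", "approach:merge-sort"),
    ("approach:merge-sort", "implementedIn",   "C++"),
    ("approach:merge-sort", "isStable",        "true"),
    ("approach:merge-sort", "bestComplexity",  "O(n log n)"),
    ("approach:merge-sort", "worstComplexity", "O(n log n)"),
]

def describe(subject, triples):
    """Collect all properties of one resource - the basis for the
    state-of-the-art comparison tables mentioned in the talk."""
    return {p: o for s, p, o in triples if s == subject}

print(describe("approach:merge-sort", triples)["worstComplexity"])
# prints O(n log n)
```

Because several papers can reuse the same problem and property identifiers, collecting `describe(...)` results across papers yields exactly the side-by-side comparison shown in the prototype.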
I will show you a slightly more complicated one in a minute, but this may be considered an intuition of what we want to achieve in the future: that this works for arbitrary research problems, that you can quickly get an overview of the state of the art in a certain field, and that we represent information about the various approaches there. We follow a crowdsourcing approach: researchers add this information themselves; there may be librarians, at TIB for example, or the library community contributing and also curating it; and of course we also involve machine learning and automatic techniques, perhaps suggesting information in advance, so that you do not have to fill out the forms completely manually. That is something we will work on in the coming years. In order to do that, behind this user
interface, we extended the RDF data model. You see here this subject–predicate–object, or resource–predicate–resource, representation, so the base data model is RDF, but we need a lot of additional metadata, basically because every statement can be debatable: researchers can agree with it or disagree. So we need to attach further information to these kinds of statements. You can attach arbitrary metadata to each individual resource, to each predicate, but also to the statements themselves. Always important is when the statement was made and by whom; we have an identifier for each statement, but you could also attach, for example, who agrees with the statement and who disagrees, and so represent a kind of scientific debate within the data model. We also developed, of course, an application with different layers, where you have a graph database — we use a property graph database, because property graph databases allow us to realize this kind of data model — and then an API, which allows us to provide and develop different user interfaces on top of the application. This is where we are now, experimenting with, expanding and extending this.
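The statement-level metadata just described can be sketched by giving every statement an identifier of its own, so that provenance and agreement can hang off it. This is a simplified, property-graph-style illustration of the idea, not the actual ORKG data model; all names and values are invented:

```python
# Sketch: each statement gets its own identifier so that arbitrary
# metadata (provenance, agreement/disagreement) can be attached to it.
import datetime

statements = {}

def add_statement(sid, subject, predicate, obj, created_by):
    statements[sid] = {
        "triple": (subject, predicate, obj),
        "created_by": created_by,
        "created_at": datetime.date(2019, 1, 15).isoformat(),  # example date
        "agrees": set(),
        "disagrees": set(),
    }

add_statement("st1", "method:crispr", "easeOfUse", "high", "alice")
statements["st1"]["agrees"].add("bob")       # Bob endorses the claim
statements["st1"]["disagrees"].add("carol")  # Carol disputes it

# The scientific debate around the statement is now itself queryable.
print(sorted(statements["st1"]["agrees"] | statements["st1"]["disagrees"]))
# prints ['bob', 'carol']
```

In a property graph database the same effect is achieved by making the statement an edge (or node) that carries its own key–value properties.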
Finally, I would like to show you a slightly more complex example, where we also tried to lower the barrier for scientists even further and to increase the possibility of making things reproducible. This is work by Markus Stocker, one of the postdocs in my group, who worked on making statistical analyses reproducible. In this paper, for example, a statistical hypothesis test is included; he uses a statistical methods ontology to represent it, and
we can then also store that in the Open Research Knowledge Graph. For example, when you read the paper, you find a sentence here which basically says — I am translating it as a lay person now — that patients who have heart failure often also suffer from iron deficiency, so that their blood is not capable of binding sufficient oxygen. This is statistically validated, and some plots are integrated here which illustrate that and basically provide evidence for it. But these plots in the paper are often in very small print, difficult to read, and five or seven years later you cannot easily get the data back to debate, falsify or refine
it. Increasingly many researchers are using Jupyter notebooks, for example, but you could also integrate this into SPSS, R, or other environments. What Markus developed is an approach which stores the results of the Jupyter notebook in the Open Research Knowledge Graph, so you do not actually have to leave your environment: while you perform a computation in your Jupyter notebook, the computation results — the data — are also added to the
knowledge graph at the same time. Once you then describe your paper — again with a DOI, for example — and add your paper to the Open Research Knowledge Graph, stating that it addresses iron deficiency in heart failure patients, you can add the research contribution for this research problem, and we can then link it to the data which was captured from the Jupyter notebook — in this case the statistically significant hypothesis test — and so link basically all the data generated by the Jupyter notebook to the contribution in this research publication. After linking
that, we can browse the paper again, and we now have a rich description of the contribution
data: we have the statistically significant hypothesis test, and a p-value computation which includes a two-sample t-test with unequal variances and its input variables. We can then go into more detail and look into the particular values. Of course, we are still experimenting: this is the generic user interface for looking into the graph, and in the future we want to develop more tailored views, so that if you have such a t-test you can also visualize it differently and make it more intuitive. You can basically browse and zoom into the graph and look at all the values, down to the measurement values you measured and added in your Jupyter notebook. Here, for example, is 105: this numeric value corresponds exactly to one point in the plot in your publication.
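The two-sample t-test with unequal variances mentioned above is commonly known as Welch's t-test. As a sketch of what such a captured computation contains — the sample data here is invented, and only the t statistic and degrees of freedom are computed (the p-value would additionally require the t-distribution's CDF, which the standard library does not provide):

```python
# Welch's two-sample t-test (unequal variances) using only the stdlib.
# The sample data is invented for illustration.
from statistics import mean, variance
from math import sqrt

def welch_t(a, b):
    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)     # sample variances
    se2 = va / na + vb / nb               # squared standard error of the difference
    t = (mean(a) - mean(b)) / sqrt(se2)   # Welch t statistic
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df

t, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(round(t, 3), round(df, 3))
# prints -1.897 5.882
```

Storing the inputs, the t statistic, the degrees of freedom and the resulting p-value as linked resources in the knowledge graph is what makes the reported significance re-checkable years later.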
This illustrates the resulting knowledge graph: we have the paper as one entity, the authors linked to the paper, but then also the research result, and a description of this research result, with the statistical contributions — the statistical computations — represented as a knowledge graph. That makes it really reproducible, so that at a later stage you can also compare or aggregate information from different scientific papers, which today requires a lot of cumbersome manual effort, sometimes even e-mailing the authors to try to get the data. We hope that in the future this helps to make research much more reproducible and verifiable, and to represent information in a more structured, systematic way. This leads
me to the end of my overview talk. Of course I did not manage to present many of the things we also work on; there are a number of other projects we run at the faculty, at L3S and at TIB: SlideWiki, for example, an inclusive OpenCourseWare crowdsourcing wiki environment for open educational resources; or a big-data-for-factories project, where we use knowledge graph vocabularies and semantic knowledge representation techniques and tools to integrate data in manufacturing processes, within a large European consortium of many companies; or a project on AI in nursing, coordinated by a colleague, which is about assessing the future effect of AI on nursing; or projects which apply these methods to medical data analysis and integration, or to environmental data analysis, and other community activities we are involved in. So this talk gave you the overview, and of course there are many people
already involved in this. I started almost two years ago; we now have a sizable number of people on the team, many of whom are here in the audience: my postdocs, Markus Stocker among them, as well as the PhD students and software developers who work on these applications, and collaborators, including colleagues who also take part in the ScienceGRAPH project, and so
on. This is the team responsible for and supporting this research. It covers quite a wide variety of topics, and we have the ambition, of course, to bring these knowledge graph techniques to many different areas, ideally to all the areas in which TIB is active — engineering, science and technology, basically. I think we are still at the beginning there, and there is a lot more work to do in the future. Thank you very much for your attention.