Using Neo4j for exploring the research graph connections made by RD-Switchboard

Video in TIB AV-Portal: Using Neo4j for exploring the research graph connections made by RD-Switchboard

Formal Metadata

Title
Using Neo4j for exploring the research graph connections made by RD-Switchboard
Author
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
2016
Language
English

Content Metadata

Subject Area
Abstract
Jingbo Wang (NCI) and Amir Aryani (ANDS) present the Neo4j queries that can help data managers to explore the connections between datasets, researchers, grants, and publications using the graph model and Research Data Switchboard. In addition, they discuss the recent paper on "Graph connections made by RD-Switchboard using NCI’s metadata", presented in the Reproducible Open Science workshop in Hannover earlier this month.
Good afternoon everyone, and welcome to the webinar this afternoon. My name is Amir Aryani and I'm working for ANDS. With me today I have Jingbo Wang from NCI, who will be the co-presenter in this talk. We are going to talk about Neo4j and how it is used as part of Research Data Switchboard. I will give some background in the first part of the talk, and then Jingbo will talk about NCI's implementation of this technology.
So, the agenda for the talk: first, the background on Research Data Switchboard and on Research Graph, which is the data model behind it. Then we'll talk about Neo4j queries and have a look at some of them; we'll spend 10 to 15 minutes on the technical side of things. Then we go to the NCI implementation, and at the end we will have time for questions and answers. First, the background.
This work started from the challenge of cross-platform discovery of research data. It goes back to 2014, to a Research Data Alliance working group, when we had the problem of finding related connections.
Some of you might have seen this slide; it is actually the first slide in this work, and I keep using it because it shows the early stage of the problem. Although we have solved this issue to a great degree, many repositories still have it: if you have a dataset and you want to know what else in scholarly communication can be linked to this information, keyword search is usually not an efficient way. In this case, for example, on an ANDS page in 2014 we had a dataset whose page cross-linked to other records based on a keyword search on the title and the keywords. The problem was that the results that came back included a lot of false positives: more than 1000 records were connected to this dataset. These were supposed to be recommendations for the researcher, but they were information unrelated to this dataset and, in practice, not useful. One of the initial ideas behind this work was how Amazon and other retail stores do this. Look at their "also" service: if you look at a book, they tell you that you might want to look at five other books by the same author or publisher, or that people who purchased this book also purchased these. Those recommendations are usually very precise and limited: they give their end users five options, not a thousand.
In this context we worked with the initial partners that joined the group at the time to tackle this cross-platform discovery challenge. We had significant contributions from many partners over time, for example partners in astronomy, and there are also other universities that have been involved in this project since the early stages of the working group.
Usually when I get to this part of the presentation I talk about the structure of the Research Data Alliance, but in the scope of this talk we don't have much time for that. I will just briefly note that the Research Data Alliance is a joint venture of the main funders that invest in research infrastructure, and its main goal is that people who work on different projects and need to coordinate and collaborate can form working groups, each with its own deliverables. In this environment, the Research Data Alliance had a working group which started in 2014 and concluded its main deliverables in 2015, and at the moment we are continuing to maintain the work and extend the platform.
The working group's main recommendation, after multiple prototypes, was that datasets can be connected using a co-authorship model. Although in principle this is a simple idea, in practice it requires connecting information across different infrastructures. When we were doing this at the time, the first stage of the process was looking at how this can be done by librarians. So we went through the process of asking librarians: we gave them cases, asked them to look at the process and tell us what they would do, and learned from their practice.
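The co-authorship idea can be sketched in a few lines of Python. This is a minimal illustration, not the RD-Switchboard code: it assumes researchers have already been resolved to a single identifier (the hard part described in this talk), and the dataset and researcher ids are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical harvested records: dataset id -> identifiers of its contributors.
# Assumes each researcher has already been resolved to one identifier.
contributors = {
    "rda:dataset-1": {"researcher-A", "researcher-B"},
    "dryad:dataset-2": {"researcher-A"},
    "dryad:dataset-3": {"researcher-C"},
}


def coauthorship_links(records):
    """Return the set of dataset pairs that share at least one contributor."""
    datasets_by_person = defaultdict(set)
    for dataset, people in records.items():
        for person in people:
            datasets_by_person[person].add(dataset)

    links = set()
    for datasets in datasets_by_person.values():
        # Every pair of datasets with this contributor in common is linked.
        for a, b in combinations(sorted(datasets), 2):
            links.add((a, b))
    return links


links = coauthorship_links(contributors)
print(links)
```

Only the two datasets that share `researcher-A` are linked; `dryad:dataset-3` stays unconnected, which is the kind of precision the talk contrasts with keyword search.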
In this context, this is the old version of the slide, and here is a new version. This is a screenshot of the same dataset from 2014. The dataset is from the University of Sydney, and when a researcher or a librarian looks at it, they can identify the researcher who authored or contributed to this dataset, search for that person in Google, and find their profile page, including their publications. When you go to their publication list, if you read every single paper and the content of those papers, you will find datasets in other repositories that have been cited or mentioned. So in this case you have a dataset from the Dryad repository.
Now, in the Dryad repository you can see the same researcher under a different name and affiliation, and the publication is basically disconnected from the University of Sydney. We have put these together in this slide to emphasize the connection, but in practice, when you look at Dryad alone, you don't know that this work is connected to a researcher at the University of Sydney. So if you follow the chain of all of these connections, you get something that goes from a dataset in ANDS, to a researcher at the University of Sydney, to an article in PLOS ONE, to a dataset in the Dryad repository. The first step of the work was to demonstrate this. So we looked at about 250 collections at the time and established that these connections can be made [unclear], and the next step was to have machines do it.
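The chain above is a path in a graph, and its length is what the talk later calls degrees of separation. A minimal Python sketch, using hypothetical node names for the four records in the slide and treating every link as bidirectional, finds it with breadth-first search.

```python
from collections import deque

# Hypothetical node ids for the four records in the chain.
edges = [
    ("rda:dataset", "researcher:sydney"),      # researcher contributed to the dataset
    ("researcher:sydney", "plosone:article"),  # researcher authored the article
    ("plosone:article", "dryad:dataset"),      # article cites the Dryad dataset
]


def build_graph(edge_list):
    """Adjacency sets; each link is stored in both directions."""
    graph = {}
    for a, b in edge_list:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the list of nodes on a shortest path, or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk back to the start
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in sorted(graph.get(node, ())):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


graph = build_graph(edges)
path = shortest_path(graph, "rda:dataset", "dryad:dataset")
print(" -> ".join(path), f"({len(path) - 1} degrees of separation)")
```

The number of edges on the path (three here) is the degree of separation between the two datasets.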
So the goal here was to have a solution without spending too much time on research or on inventing a standard, so we adopted the standards and tools already used by other groups and platforms, and tried to implement something that is simple for others to adopt and also easy to maintain. The architecture basically has 3 different layers. The first one is the harvester, which basically uses OAI-PMH to read from repositories; OAI-PMH is the most obvious one, but there are also other formats and APIs from the interacting repositories. All of this is documented in the working group outputs, and I believe I have a link to them further on in the slides. The harvesting layer puts the information into a set of machines, which in this case we implemented on [unclear], but you could also select other high-performance computing platforms. The main requirement for those platforms is that they should be able to run Java programs, because everything is implemented in Java. What those programs do is basically read information from all of those sources and connect them. Where possible, the connections are made through identifiers: the system uses Crossref integration to get the metadata for DOIs, the same for DataCite, and ORCID API integration when you have a grant or paper, and we also search certain university domains to find profile pages. It uses some level of disambiguation, and then the linking runs across the graph and links the nodes; there is an inference component for those connections to happen. Again, I cannot go into the details of this; there is a document in the working group documentation that talks about the relationship called "known as". If you find 2 different elements, and those elements either have the same identifier, or have the same title together with similarity in other elements, we mark them as "known as" elements and link the nodes.
The point is that all of that information comes together in 1 database, and in the case of our project we use Neo4j. The main reasons for that were, first, the simplicity of implementation: it was very easy to hire programmers who can code Java and actually program for Neo4j; and second, performance: the speed of querying the database in Neo4j is much quicker than [unclear]. So Neo4j was the main point of aggregating all of these connections for us. The data access layer comes after something called metadata harmonization, which basically harmonizes the names, meaning the property names, as much as possible with the working group's conventions, and we ended up with a graph model called Research Graph. Now, there is 1 characteristic of this data model which is different from the others. First of all, it is mainly URI-based: where possible we convert information to URIs, for example DOI links, or converting a grant ID to a URI, and that enables the graph to scale to a distributed graph across multiple platforms. The other thing is that we have a separate relation object. Some of you are familiar with RIF-CS: there you have parties, collections (a dataset is a collection), services and activities, and also the relation object. The main advantage of this is that it enables connecting nodes to
URIs that don't actually exist in the graph yet. For example, if you're looking at a data repository at a university, a record might have an ORCID as a related identifier. In this model we just put a relation object there pointing the record to ORCID, and we don't have to resolve it at that time; later we can resolve the record to that identifier, and the graph system handles this as a bidirectional relationship between nodes. Now, I'm not supposed to talk too much about the architecture, because we have a lot to talk about in the actual Neo4j queries, so I just put this topic here.
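The "known as" rule can be illustrated with a small sketch. The exact matching criteria used by RD-Switchboard are in the working group documentation and are not reproduced here; the function below is a hypothetical simplification that treats two records as the same entity if their normalised DOIs match, or if their normalised titles match and they share at least one author. The record fields (`doi`, `title`, `authors`) and sample values are made up.

```python
import re


def normalise_doi(value):
    """Lower-case a DOI and strip a resolver or 'doi:' prefix."""
    value = value.strip().lower()
    return re.sub(r"^(https?://(dx\.)?doi\.org/|doi:)", "", value)


def normalise_title(title):
    """Keep only lower-case alphanumeric words, separated by single spaces."""
    return " ".join(re.findall(r"[a-z0-9]+", title.lower()))


def known_as(a, b):
    """Hypothetical 'known as' test: same identifier, or same title plus a shared author."""
    if a.get("doi") and b.get("doi"):
        if normalise_doi(a["doi"]) == normalise_doi(b["doi"]):
            return True
    same_title = normalise_title(a["title"]) == normalise_title(b["title"])
    shared_author = bool(set(a["authors"]) & set(b["authors"]))
    return same_title and shared_author


# Made-up records from different sources.
ands_record = {"doi": "10.5555/example.1", "title": "Coral reef survey", "authors": ["Smith, J."]}
dryad_record = {"doi": "https://doi.org/10.5555/EXAMPLE.1", "title": "Coral Reef Survey (data)", "authors": ["Smith, Jane"]}
no_doi_record = {"doi": None, "title": "Coral reef survey.", "authors": ["Smith, J."]}
unrelated = {"doi": "10.5555/example.2", "title": "Soil samples", "authors": ["Lee, K."]}

print(known_as(ands_record, dryad_record))   # True: same DOI after normalisation
print(known_as(ands_record, no_doi_record))  # True: same title and a shared author
print(known_as(ands_record, unrelated))      # False
```

A real implementation would need fuzzier title and author comparison; exact matching after normalisation keeps the sketch short.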
These are also examples of multiple degrees of separation, just to show how 2 datasets can be connected: for example, if a dataset has a contributor who is also the author of a paper, and that paper cited another dataset, this is what we call 3 degrees of separation. This is a link to the Research Graph data model. In the interest of time I think I will skip this slide; there are a lot of elements there. If we have time at the end, or if you have questions about the connections, we can come back to it and discuss it. The slides will be available, so you can look at the links later. And this leads to the next part of the talk, which is about Neo4j.
That was actually the main motivation for this webinar. After we implemented Research Data Switchboard, institutions in Australia such as NCI and [unclear] university adopted it and are using it, and in Europe we also have multiple partners using this technology. We came across the same questions again and again, in the form of different queries that people ask: how can I find my dataset using its DOI, how can I find the datasets that are from a particular publisher. So in this part of the presentation I'm going to walk through some of those scenarios.
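The two recurring questions above map naturally onto parameterised Cypher queries. The sketch below only builds the query text and parameters in Python; the label `dataset` and the properties `doi` and `publisher` are assumptions for illustration, not a confirmed description of the Research Graph schema, so check the property names in your own database first.

```python
def find_dataset_by_doi(doi):
    """Build a Cypher query for one dataset by DOI.

    Assumes (hypothetically) that datasets are labelled `dataset` and store a
    bare, lower-case DOI in a `doi` property.
    """
    query = "MATCH (d:dataset) WHERE d.doi = $doi RETURN d LIMIT 1"
    return query, {"doi": doi.strip().lower()}


def datasets_from_publisher(publisher, limit=25):
    """Build a Cypher query listing datasets from one publisher (hypothetical `publisher` property)."""
    query = (
        "MATCH (d:dataset) WHERE d.publisher = $publisher "
        f"RETURN d.title AS title, d.doi AS doi LIMIT {int(limit)}"
    )
    return query, {"publisher": publisher}


query, params = find_dataset_by_doi(" 10.5555/EXAMPLE.1 ")
print(query)
print(params)
```

With the official `neo4j` Python driver, each pair can be passed to `session.run(query, params)`. Passing values as parameters rather than pasting them into the query string avoids quoting problems and lets Neo4j reuse the cached query plan.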
Now, 1 thing about Neo4j is that it has a graph browser, and for the purpose of this presentation and of Research Data Switchboard we have an extended version of this browser, which has some built-in functionality specifically related to scholarly works. Basically, you can think of it as an extended graph explorer for Neo4j. One thing you can do in this environment, as an example of this functionality, is search for a dataset. What I'm going to do is search for the same dataset from the slides and look at it in our example. So what I did is take the dataset from the slide and paste in its identifier [unclear]. When you get the content of this, it tells you this is a record from Dryad, this is the title, and these are the authors. What we can do in this environment is explore it by double-clicking on the node, or just clicking on it and expanding it. So here you get the other information: there are 400 datasets in Dryad related to this, there is a paper, which is the PLOS ONE article that was also in the slides, and this is a researcher. In this environment I can keep expanding the nodes and walk into the graph. So here I can see all the publications for that researcher while this is catching up, and then all the grants; some of these grants actually have connections, and 1 of them is also connected to another dataset. Now, in this environment, if I want to look at the metadata of a record, I can expand this panel at the bottom of the page, and here I can see the title, which is the first dataset from the slides. So, back to the point: here I have a dataset that leads to a researcher and goes all the way back to the initial dataset.
Now, in this domain you can run a set of queries, and I actually have a list of things that we need to look at. We're going to look at how to find a dataset, how to find a publication, how to find a grant [unclear], how to find datasets that have [unclear], how to find DOIs using a prefix, how to find highly connected datasets using the number of edges in the graph, how to find connections with multiple degrees of separation, and how to find the shortest path between 2 instances. It might be a bit overwhelming to go into all of these queries, so I will show some of them; the slides will be available online so you can go and try them, and then basically send me an e-mail and we can have an offline conversation about the syntax. In this case, we already checked finding a dataset by DOI, but you can also find it by title. The way it works is that nodes in our Research Graph model have a property called title, and in this case the title of the dataset is the one we get from [unclear]. This is a simple query. You can do the same for publications: you get the publication records with the same kind of query.
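The scenarios in that list can be written as Cypher templates. The sketch below keeps them in a Python dictionary and checks that every `$parameter` a template uses has been supplied. The labels (`dataset`, `publication`), the `title` and `doi` properties, and the `*..6` depth cap on the shortest path are assumptions for illustration; the queries in the speakers' slides may differ.

```python
import re

# Hypothetical Cypher templates for the scenarios listed in the talk.
SCENARIOS = {
    "dataset_by_title": "MATCH (d:dataset {title: $title}) RETURN d LIMIT 1",
    "publication_by_title": "MATCH (p:publication {title: $title}) RETURN p LIMIT 1",
    "dois_by_prefix": (
        "MATCH (d:dataset) WHERE d.doi STARTS WITH $prefix "
        "RETURN d.doi AS doi LIMIT 25"
    ),
    # Degree = number of relationships attached to each dataset node.
    "highly_connected_datasets": (
        "MATCH (d:dataset)-[r]-() "
        "RETURN d, count(r) AS degree ORDER BY degree DESC LIMIT 10"
    ),
    # Datasets exactly three relationships away (three degrees of separation).
    "datasets_three_hops_away": (
        "MATCH (a:dataset {doi: $doi})-[*3]-(b:dataset) WHERE b <> a "
        "RETURN DISTINCT b LIMIT 25"
    ),
    "shortest_path": (
        "MATCH (a {doi: $doi_a}), (b {doi: $doi_b}) "
        "MATCH p = shortestPath((a)-[*..6]-(b)) RETURN p"
    ),
}


def scenario(name, **params):
    """Return (query, params) for a named scenario, refusing missing parameters."""
    query = SCENARIOS[name]
    needed = set(re.findall(r"\$(\w+)", query))
    missing = needed - set(params)
    if missing:
        raise ValueError(f"missing parameters for {name}: {sorted(missing)}")
    return query, params


query, params = scenario("dois_by_prefix", prefix="10.5061")
print(query, params)
```

Dryad DOIs start with the prefix `10.5061`, so the example call would list Dryad DOIs held in the graph. The `*3` pattern means exactly three relationships, matching the three-degrees example earlier in the talk, and `shortestPath` needs both end nodes bound before it runs.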
at how much it cost us included here and then I can actually get this circle from
database now 1 thing another point out is that if you see the the the point of view on it actually can take longer than usual because they're the size of the graph is the database and this this cell at 16 in militants instructed and what it what you have just 1 here is that it's asked for a string search in a graph database full some of us were familiar with the architecture design of databases that graph they that this is not designed for string search longer trick about this and that I was doing of it would be a very good example in this in this case it was actors with 6 million nodes to find the remote that element but we collect them you connection actually do this much better by just making this will be more precise if you know it is from so we could just at the Serre in space here and also we can add that
limit longer at and what it does it says the first one that you founded comeback those go for the rest of this I know is only 1 instance effects so long to do here if I had been a button and it kind of that immediately so you know where you make a clearly have a direct impact on the performance of from the RAF it is now there this is another example of how 1 can
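To make the effect of the label and the limit concrete, here is a minimal Python sketch over a hypothetical in-memory node list. The label names (dataset, dryad, ands, orcid) and the title property layout are illustrative assumptions, not the exact Research Graph schema. It mimics a Cypher query of the form MATCH (n:dryad) WHERE n.title = '...' RETURN n LIMIT 1: the label filter skips nodes before any title comparison, and the limit stops the scan at the first hit. A real Neo4j store uses label and property indexes rather than a Python loop, so this only illustrates the idea.

```python
# Hypothetical toy graph: every node has a set of labels and a property dict.
# The label names (dataset, dryad, ands, orcid) are illustrative only.
NODES = [
    {"labels": {"dataset", "ands"}, "props": {"title": "Soil moisture survey"}},
    {"labels": {"publication", "orcid"}, "props": {"title": "Coral growth rates"}},
    {"labels": {"dataset", "dryad"}, "props": {"title": "Coral growth rates"}},
    {"labels": {"dataset", "dryad"}, "props": {"title": "Seagrass cover"}},
]


def find_by_title(nodes, title, label=None, limit=None):
    """Return (matching nodes, number of title comparisons made).

    label narrows the candidates, like MATCH (n:dryad) in Cypher;
    limit stops the scan once enough hits are found, like LIMIT 1.
    """
    compared = 0
    hits = []
    for node in nodes:
        if label is not None and label not in node["labels"]:
            continue  # wrong source: the title is never compared
        compared += 1
        if node["props"].get("title") == title:
            hits.append(node)
            if limit is not None and len(hits) >= limit:
                break  # early termination: do not look for the rest
    return hits, compared


# Broad search: every node's title is compared.
all_hits, broad_cost = find_by_title(NODES, "Coral growth rates")
# Precise search: only Dryad nodes, and stop at the first match.
one_hit, narrow_cost = find_by_title(NODES, "Coral growth rates", label="dryad", limit=1)
print(len(all_hits), broad_cost)   # 2 4
print(len(one_hit), narrow_cost)   # 1 1
```

The broad search also returns the ORCID publication with the same title, which is another reason to name the source label when you know it.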
Now, this is another example of how one can find a publication by title; this is the same as for a dataset, so there is nothing complex about it: the publication record also has a title. The grant query is different, because grants have another useful property called PURL, a persistent URL. There is a structure to it: the PURL is basically a namespace followed by the grant ID at the end. We have purl.org/au-research/grants/, then arc/ followed by the grant number if it is an ARC grant, or nhmrc/ followed by the grant number if it is an NHMRC grant. So I can copy this query here and paste it.
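The PURL structure described above can be sketched as a small builder and parser. This is a minimal illustration, assuming the http://purl.org/au-research/grants namespace with an arc or nhmrc segment as described in the talk; the http scheme, the helper names and the grant number DP0000001 are hypothetical and only for illustration.

```python
import re

# Namespace as described in the talk; the "http" scheme is an assumption.
PURL_BASE = "http://purl.org/au-research/grants"
FUNDERS = {"arc", "nhmrc"}  # the two funders mentioned in the talk


def grant_purl(funder, grant_number):
    """Build <namespace>/<funder>/<grant number> for an ARC or NHMRC grant."""
    funder = funder.lower()
    if funder not in FUNDERS:
        raise ValueError(f"unsupported funder: {funder!r}")
    return f"{PURL_BASE}/{funder}/{grant_number}"


def parse_grant_purl(purl):
    """Split a grant PURL into (funder, grant number); None if it does not fit."""
    match = re.fullmatch(re.escape(PURL_BASE) + r"/(arc|nhmrc)/([^/]+)", purl)
    return (match.group(1), match.group(2)) if match else None


# DP0000001 is a made-up grant number used only for illustration.
purl = grant_purl("ARC", "DP0000001")
print(purl)                    # http://purl.org/au-research/grants/arc/DP0000001
print(parse_grant_purl(purl))  # ('arc', 'DP0000001')
```

Because the PURL is an exact identifier, matching a grant on it avoids the slow string search over titles.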
But before I paste it, another thing: for every query you type, you can hit the star and add it to your favorites. So I
have a list of favorites with all the queries I want to run, just in case. In this case I can go to this one; it is the same query that I have already written, and it returns just one result. Now this is
the grant, and you can look at the content of the grant here. Now, in the same way, you can
search for a grant by title. Obviously, for researchers there are a lot of options: you can certainly search by first name and last name, but what is more useful is the ORCID ID, because ORCID records are interlinked with other records. So if you are looking for a researcher with a known ORCID ID, you can find them with a query that would look like
this: you say I want a researcher with an ORCID ID equal to this value, and when you run it, you get exactly that
node. The next one is a lot more complicated, because here we are going to find connections: I am not only looking for a particular node, I am looking for the nodes that satisfy these criteria. In this case we are looking for records from Dryad that are linked to records from ORCID. Now, one thing:
if you try to copy these Cypher queries into a PowerPoint presentation or a text editor, the tools fiddle with the format of dashes and other special characters, so just be mindful of that when you copy and paste; sometimes they get converted. So back to the topic: in this case we have a dataset from Dryad, and this syntax basically says I want only the nodes from Dryad that are connected to ORCID, and then I want a count of them. This introduces another element, count, which counts the connected elements: instead of the nodes themselves, you get the number of nodes. In this case we have 1,231 Dryad datasets that are connected, via publications, to records included in ORCID profiles. To actually see them, I can run it again with just 10 of those records, and I can expand one of them: this publication here. When you look at the nodes in the Research Graph model, there are multiple labels on each node, and the labels identify the source. In this case we have one publication which came from both ORCID and CrossRef, which means two different sources had metadata for this record, and we merged them. In one of the presentations someone asked me what happens if there is a conflict, for example if the title in one source is different from the title in another. The way we manage this is with priorities: different sources have authority over different information. For example, for anything related to the DOI, CrossRef is the authority, so its information overwrites the other nodes. So let's say in this case the title in the ORCID record has been modified by the researcher; during integration we overwrite the information from
ORCID with the information from CrossRef for anything related to the DOI. I know this might actually lead to false negatives, but for us this was the most practical way to manage it. Now, this publication we can expand to reach its ORCID record, and it could be expanded further if that record had more connections, which in this case it does not: it is one ORCID record linked to one publication. Going back to the slides.
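The merge-with-priority rule described above can be sketched in a few lines of Python. This is a simplified illustration, not the Switchboard's actual code: it assumes records are matched on a normalised DOI, that CrossRef outranks ORCID (the SOURCE_PRIORITY table is hypothetical), and, for brevity, it lets the more authoritative source overwrite every conflicting property rather than only DOI-related fields.

```python
# Hypothetical priorities: a lower number means more authority.
# Unknown sources get the lowest authority.
SOURCE_PRIORITY = {"crossref": 0, "orcid": 1}


def normalize_doi(doi):
    """Lower-case a DOI and strip common resolver prefixes so records can be matched."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi


def merge_by_doi(records):
    """Merge records that share a DOI into one node with one label per source."""
    merged = {}
    # Apply the least authoritative sources first, so that more
    # authoritative ones overwrite conflicting properties afterwards.
    ordered = sorted(records, key=lambda r: SOURCE_PRIORITY.get(r["source"], 99), reverse=True)
    for rec in ordered:
        key = normalize_doi(rec["doi"])
        node = merged.setdefault(key, {"labels": set(), "props": {}})
        node["labels"].add(rec["source"])
        node["props"].update(rec["props"])
    return merged


records = [
    {"source": "crossref", "doi": "10.1000/xyz", "props": {"title": "Published title"}},
    {"source": "orcid", "doi": "https://doi.org/10.1000/XYZ", "props": {"title": "Title edited in ORCID"}},
]
node = merge_by_doi(records)["10.1000/xyz"]
print(sorted(node["labels"]), node["props"]["title"])  # ['crossref', 'orcid'] Published title
```

The merged node keeps both source labels, which is why the publication in the demo shows both ORCID and CrossRef.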
Now, this is another type of query that might be useful for many of us: how to find records from a particular institution in Research Data Australia that have a connection to ORCID. In this case I want to look for records from the University of Sydney
that were contributed to Research Data Australia and that have a connection to ORCID. The only difference here is that I use a new element called group. Another characteristic of our graph model is that, unlike many other models that have a fixed set of fields, our graph model does not enforce one. We have the core fields, but if repositories have data elements in a specific field, we can just ingest them into the system; that is what we did for NCI when ingesting their nodes into the graph structure. Because the system is agnostic to the metadata, apart from the nodes that are joined, you can basically have a hybrid model. In this case we have group, which is a data element from ANDS, and we say it is equal to the University of Sydney, and we get the matching records: these are records from the University of Sydney in ANDS that have a connection to ORCID. For example, this is a dataset; I can expand it, and this is a publication, and then the ORCID record.
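The pattern behind this query (and the Dryad-to-ORCID count before it) is: pick nodes with a source label, filter them on a property such as group, and keep only those with a neighbour carrying another label. Here is a minimal Python sketch of that pattern over a hypothetical toy graph; the labels, the group values and the one-hop (direct neighbour) rule are assumptions for illustration, and len() of the result plays the role of Cypher's count.

```python
from collections import defaultdict

# Hypothetical toy graph; labels and the "group" property are illustrative.
NODES = {
    1: {"labels": {"dataset", "ands"}, "props": {"group": "University of Sydney"}},
    2: {"labels": {"publication", "crossref"}, "props": {}},
    3: {"labels": {"researcher", "orcid"}, "props": {}},
    4: {"labels": {"dataset", "ands"}, "props": {"group": "University of Sydney"}},
    5: {"labels": {"dataset", "ands"}, "props": {"group": "Another University"}},
}
EDGES = [(1, 2), (2, 3), (1, 3), (5, 3)]


def adjacency(edges):
    """Build an undirected adjacency map from (a, b) pairs."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def linked_to_label(nodes, edges, source_label, target_label, **prop_filters):
    """Ids of source_label nodes that match every property filter and have
    at least one direct neighbour carrying target_label."""
    adj = adjacency(edges)
    result = []
    for nid, node in nodes.items():
        if source_label not in node["labels"]:
            continue
        if any(node["props"].get(k) != v for k, v in prop_filters.items()):
            continue
        if any(target_label in nodes[m]["labels"] for m in adj[nid]):
            result.append(nid)
    return result


sydney = linked_to_label(NODES, EDGES, "ands", "orcid", group="University of Sydney")
print(sydney, len(sydney))  # [1] 1
```

Node 4 matches the group filter but has no relationships, so it is excluded, just as unconnected records would not appear in the query result.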
Another useful thing you can do in the graph is search for properties. I want to know all the datasets that have a DOI in our graph database system, and you can run the query and see how many datasets have that property: the answer is 157 thousand records in our database. Now, you can replace this with other properties, such as PURL for grants, and the answer is 45 thousand grants.
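A property-existence query boils down to counting the nodes of one label whose property is set. In Cypher this is typically written with a condition such as WHERE n.doi IS NOT NULL together with count(n); the Python sketch below shows the same logic over a few hypothetical nodes (the dataset/grant labels and doi/purl property names are assumptions for illustration).

```python
# Hypothetical nodes: some datasets carry a doi property, some grants a purl.
NODES = [
    {"labels": {"dataset"}, "props": {"doi": "10.1000/a", "title": "A"}},
    {"labels": {"dataset"}, "props": {"title": "B"}},
    {"labels": {"dataset"}, "props": {"doi": "10.1000/c"}},
    {"labels": {"grant"}, "props": {"purl": "http://purl.org/au-research/grants/arc/DP0000001"}},
    {"labels": {"grant"}, "props": {"title": "Unlinked grant"}},
]


def count_with_property(nodes, label, prop):
    """Count nodes with the given label whose property is present and non-empty."""
    return sum(
        1
        for node in nodes
        if label in node["labels"] and node["props"].get(prop) not in (None, "")
    )


print(count_with_property(NODES, "dataset", "doi"))  # 2
print(count_with_property(NODES, "grant", "purl"))   # 1
```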
Now, another query that we were asked for, especially by our European partners, was finding datasets by DOI prefix. As some of you know, the goal here is to find, for example, the datasets associated with a journal from a particular publisher. So
this uses the regular expression syntax in Neo4j: you put a tilde after the equals sign (=~), then you put the prefix at the start and .* at the end, which means everything after the prefix is acceptable. When I run it, it comes back with the matching records, and this returns ten
records, because I limited it to ten. I can then just pick out the DOIs that match my criteria.
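The prefix match can be reproduced with Python's re module. This is a sketch, not the query from the slides: it uses 10.5061 (the prefix Dryad uses) purely as an example, and re.fullmatch to mirror Cypher's =~, which must match the whole string. One detail worth noting: an unescaped "." in the prefix means "any character", so re.escape is used to make the prefix literal.

```python
import re
from itertools import islice


def doi_prefix_pattern(prefix):
    # re.escape turns "10.5061" into a literal match, so "." no longer means
    # "any character"; ".*" then accepts anything after the slash.
    return re.compile(re.escape(prefix) + r"/.*")


def dois_with_prefix(dois, prefix, limit=10):
    """Return up to `limit` DOIs that match prefix/.* as a whole-string match."""
    pattern = doi_prefix_pattern(prefix)
    # fullmatch mirrors Cypher's =~, which must match the entire string.
    return list(islice((d for d in dois if pattern.fullmatch(d)), limit))


dois = ["10.5061/dryad.abc12", "10.1371/journal.pone.0000001", "10.5061/dryad.def34", "10x5061/typo"]
print(dois_with_prefix(dois, "10.5061"))  # ['10.5061/dryad.abc12', '10.5061/dryad.def34']
```

The same escaping concern applies inside a Cypher pattern: writing the prefix dot as \\. keeps it literal.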
Now, another example would be how to find highly connected datasets. Things are getting more and more complicated, but I have good news: there are only a few of these complicated queries. So here, what we have
done is a query to see all the datasets ranked by the number of connections they have. We wanted to find, among all the datasets in the system, the ones that have a lot of connections. So I say: match the datasets and return their titles, count the number of connections each one has, and sort them by that count in reverse order. It goes into the system and comes back quickly, so here we get all the datasets we have, ordered by the number of connections linked to them. If you look at this particular one, it ends up with 757 nodes linked to it.
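Ranking by number of connections is a degree count followed by a descending sort, roughly what a Cypher query with count(...) and ORDER BY ... DESC does. The Python sketch below assumes a hypothetical edge list where d1, d2 and d3 stand in for datasets; each relationship adds one to the degree of both endpoints.

```python
from collections import Counter


def rank_by_degree(edges, candidates):
    """Return (node, number of relationships) pairs, most connected first."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    # Sort by descending degree, then by name so ties are deterministic.
    return sorted(((n, degree[n]) for n in candidates), key=lambda p: (-p[1], p[0]))


edges = [("d1", "p1"), ("d1", "r1"), ("d1", "g1"), ("d2", "p1"), ("d2", "r2"), ("d3", "r1")]
print(rank_by_degree(edges, ["d1", "d2", "d3"]))  # [('d1', 3), ('d2', 2), ('d3', 1)]
```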
Now, this is the most complex of all the queries: an example of finding links with multiple degrees of separation. The application of this query: someone asked how many connections we have to Dryad. Well, we might have connections directly, but we also have indirect connections. An example of an indirect connection would be: I have a dataset in ANDS that is connected to a researcher who has an ORCID record, and that ORCID record also has connections to Dryad. The syntax for this would be something like [*1..3], which says I want all the connections from one to three hops. So if I run the query like
this, and ask it to return the title and the key with a limit of 25, it returns the titles along the way: this one is from ANDS, and then we have the dataset it found in Dryad at the end of the path.
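The 1-to-3-hop idea can be illustrated with a depth-limited breadth-first search. Note the difference: Cypher's variable-length pattern [*1..3] enumerates paths, whereas this sketch only reports which target nodes are reachable and their smallest hop count. The node names and labels are hypothetical, modelled on the ANDS, researcher, ORCID, Dryad chain described in the talk.

```python
from collections import defaultdict, deque


def reachable_within(edges, labels, start, target_label, min_hops=1, max_hops=3):
    """Map each target_label node reachable from start in min_hops..max_hops
    relationships to its smallest hop count (breadth-first search)."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand past the hop limit
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return {
        n: d
        for n, d in dist.items()
        if min_hops <= d <= max_hops and target_label in labels.get(n, set())
    }


# Hypothetical chain: ANDS dataset -> researcher -> ORCID record -> Dryad dataset.
edges = [("ands1", "res1"), ("res1", "orcid1"), ("orcid1", "dryad1"),
         ("ands1", "dryad2"), ("dryad1", "pub9"), ("pub9", "dryad3")]
labels = {"ands1": {"ands"}, "res1": {"researcher"}, "orcid1": {"orcid"},
          "dryad1": {"dryad"}, "dryad2": {"dryad"}, "pub9": {"publication"},
          "dryad3": {"dryad"}}
print(reachable_within(edges, labels, "ands1", "dryad"))  # {'dryad2': 1, 'dryad1': 3}
```

dryad3 sits five hops away, so the limit of three keeps it out, which is also what keeps this kind of query tractable on a large graph.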
I find this one interesting for a lot of people in the publication domain: when they are looking at researchers, or at two different datasets, they wonder how these are actually connected. What you can do in a graph database is search for something called the shortest
path. This is an example I copied into the browser. It is still loading, so let me go back to it. There we go, we have a result. So
what this query does is say: I have a dataset from Dryad and a dataset from ANDS; find me the shortest path between the two. And it looks like these two datasets are connected via a researcher and a publication. In this case you can replace a dataset with a researcher, or replace a dataset with a grant, and find the shortest path between those. Now, if you use the extended version of the Neo4j browser with the Research Graph model, you get this last tab here, which opens and has a
template for all of these queries. For example, for the shortest path, the query is already in the box and you only need to fill in the values. We are extending this further: at the moment we have about 10 example queries, and we are planning to add more queries to this template.
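Neo4j provides a shortestPath function for this; on an unweighted graph the underlying idea is breadth-first search with parent links. The sketch below illustrates that idea only, not Neo4j's implementation, on a hypothetical graph where two datasets are joined either through a publication and a researcher (three hops) or through a grant (four hops).

```python
from collections import defaultdict, deque


def shortest_path(edges, start, goal):
    """Breadth-first search for one shortest path; None if goal is unreachable."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk the parent links back to start
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in sorted(adj[node]):  # sorted only to make ties deterministic
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


# Hypothetical graph: a short route via a publication and a researcher,
# and a longer one via a grant.
edges = [("dryad:ds1", "publication:p1"), ("publication:p1", "researcher:r1"),
         ("researcher:r1", "ands:ds2"), ("dryad:ds1", "grant:g1"),
         ("grant:g1", "x"), ("x", "y"), ("y", "ands:ds2")]
print(shortest_path(edges, "dryad:ds1", "ands:ds2"))
# ['dryad:ds1', 'publication:p1', 'researcher:r1', 'ands:ds2']
```

Swapping the start or goal for a researcher or grant node works the same way, which is what the template's replaceable boxes rely on.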
So this is basically the last slide for the Neo4j queries. At the end of the presentation we will have time to discuss your questions. The next part of the talk is by Dr. Jingbo Wang from NCI. I did not actually introduce her at the beginning: Jingbo is the data collections manager at NCI, which is located at ANU. Her background is in seismology, and she now works in data science. NCI collects information across different platforms, basically providing systems for researchers to make the research environment more efficient. Without further introduction, I will hand this presentation over to Jingbo.
I will use about 10 to 15 minutes to share my experience as a user of RD-Switchboard, and I want to show the graph connection experiments using our NCI metadata. This talk was also presented two weeks ago at the first Reproducible Open Science workshop in Hannover, Germany, and it got positive feedback. For those of you who don't know much about NCI: NCI, in short, is the National Computational Infrastructure. We are a national-level centre, physically located at the Australian National University campus. From 2013 we received a big chunk of funding to store research data, and the
motivation is that data are getting bigger and bigger, from gigabytes to terabytes and even much larger, and especially in domains such as the environment, the volume is growing very fast. If you use a PC or a hard disk, it cannot handle large-scale data, and transferring and sharing the data became a problem. That's why we got this funding and built one of the largest storage facilities to support the research data infrastructure. At the moment we have more than 10 petabytes of data, as you can see in this figure, and our data range from astronomy observations to satellite images, climate models and climate change research, down to the ground, like geophysics exploration, and even deeper, to mantle and geodynamic processing data. With the funding,
and being one of the research data infrastructures, we make use of these advantages to work on the data we have collected and make seamless connections across different disciplines. So as you can see
here, we care about data formats, because we want to make use of the HPC facility, and we care about providing open access to researchers. Because of the large scale of the data, it is impossible for them to download it to their local machines and do the processing; it is better to provide some kind of virtual environment for them to log in to and do the processing at our centre. Because we have so much
data, we need a modern catalog so that people know what datasets are available at NCI. This is one of the common questions researchers care about. Our catalog was built on a relational model of researcher, data, grant and paper. For example, the first line says a researcher used data 1, supported by grant A, and generated papers 1 and 2; similarly, the second line is another record. However, the obvious problem here is redundancy: the same researcher appears twice, the same dataset appears twice, and the same grant appears twice. Every entity is repeated in our database, which creates a lot of redundancy. So the idea of adopting RD-Switchboard is that we use identifiers: the same identifier for the same researcher, like ORCID; the same identifier for the data, like a DOI; and the same identifier for a grant, like a PURL. After we merge the different nodes that share an identifier, each entity (researcher, data, grant and paper) is connected through graph relationships. That is my understanding, as a user, of how RD-Switchboard can help us make connections.
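The step from redundant catalog rows to a connected graph can be sketched as keying every entity on its identifier, so repeated rows collapse into shared nodes. This is an illustration of the idea, not how RD-Switchboard is implemented; the row fields (orcid, doi, purl, paper_dois), the edge types, and all identifier values are hypothetical.

```python
def rows_to_graph(rows):
    """Collapse catalog rows into unique nodes and edges keyed by identifiers."""
    nodes, edges = set(), set()
    for row in rows:
        researcher = ("researcher", row["orcid"])
        dataset = ("dataset", row["doi"])
        grant = ("grant", row["purl"])
        papers = [("paper", doi) for doi in row["paper_dois"]]
        nodes.update([researcher, dataset, grant, *papers])
        edges.add((researcher, dataset))    # researcher used the data
        edges.add((dataset, grant))         # data supported by the grant
        for paper in papers:
            edges.add((researcher, paper))  # researcher generated the paper
    return nodes, edges


# Two hypothetical rows repeating the same researcher, dataset and grant.
rows = [
    {"orcid": "0000-0000-0000-0001", "doi": "10.1000/data1",
     "purl": "http://purl.org/au-research/grants/arc/DP0000001",
     "paper_dois": ["10.1000/p1", "10.1000/p2"]},
    {"orcid": "0000-0000-0000-0001", "doi": "10.1000/data1",
     "purl": "http://purl.org/au-research/grants/arc/DP0000001",
     "paper_dois": ["10.1000/p3"]},
]
nodes, edges = rows_to_graph(rows)
print(len(nodes), len(edges))  # 6 5
```

The two rows mention nine entities in total, but only six distinct identifiers, so the redundancy disappears once identifiers become node keys.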
With this graph view, I can answer questions, for example: what is the usage of NCI datasets? That can be translated directly into an RD-Switchboard query: how many datasets published at NCI have been referenced in research journal articles? Other questions, such as what is the awareness of the available datasets within the research community, translate into queries like: how many researchers and institutes are connected to the datasets, and so on. An even more specific question: if I would like to know more about this dataset, who should I contact? Who generated this data, or used it to publish a paper, and what previous research has been done using this dataset? I believe this is a very common question for researchers: when they start a new topic they do this kind of search, like myself doing a Google search first. But if we provide this kind of infrastructure, it will make a researcher's literature review much easier. Now I will use this slide
to explain how we organize our catalog and how we then adopted RD-Switchboard technically. We organize our catalog in a hierarchical structure. At the top, as you can see here, is the NCI GeoNetwork node, which is only a top-level, high-level summary of the data collections; at the moment we have more than 200. At the middle level, every single project has its own GeoNetwork catalog (GeoNetwork is our metadata display interface, but you can use other interfaces as well). For each individual project we might have thousands of other records at the file level, the granular level. It is not appropriate to have all these different granularities catalogued in a single node, because then it is harder to separate them, for example by space or by research domain. So we use this structure, as it gives us the flexibility to do more aggregation at a later stage. You can check out our main GeoNetwork website using this link. So what RD-Switchboard does: every single GeoNetwork catalog has
its own database, and we dump those databases into the RD-Switchboard graph database, where the connections are made. I don't show exactly how it does it here, but that is where the magic happens, in this box: when an identifier is shared, it merges the two different nodes, and that is how the connection is made. This screenshot is the current status
using NCI's metadata. We find connections, for example, between datasets, researchers, grants and institutes, but I also noticed that some nodes are disconnected. In my follow-up, I found that they are actually connected, but our metadata lacks some critical information, so when I presented this database in a graph view, they appeared disconnected. It means that I have to correct some of the metadata information in our database.
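Spotting disconnected records like these can be automated with a connected-components pass: anything outside the main component is a candidate for missing or wrong metadata. The sketch below is a hypothetical illustration, not the process NCI used; the node ids are made up, and treating the largest component as "the connected part" is an assumption that suits a catalog where most records link together.

```python
from collections import defaultdict


def connected_components(node_ids, edges):
    """Group nodes into connected components (relationships treated as undirected)."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, components = set(), []
    for start in node_ids:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            component.add(node)
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        components.append(component)
    return components


def flag_for_review(node_ids, edges):
    """Nodes outside the largest component: candidates for missing or wrong metadata."""
    components = connected_components(node_ids, edges)
    largest = max(components, key=len)
    return sorted(n for c in components if c is not largest for n in c)


node_ids = ["ds1", "ds2", "ds3", "ds4", "ds5", "res1", "grant1"]
edges = [("ds1", "res1"), ("ds2", "res1"), ("ds1", "grant1"), ("ds3", "ds4")]
print(flag_for_review(node_ids, edges))  # ['ds3', 'ds4', 'ds5']
```

The flagged list is only a starting point for manual checking, since a small component can also be a genuinely separate collection.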
So far, as I explained, RD-Switchboard has helped me identify some missing critical metadata entries, which I can then complete, and sometimes it also helps me identify errors in the catalog, which I can then fix. Without RD-Switchboard it is almost impossible, because we have hundreds of thousands of records and it is very hard to check them manually, but RD-Switchboard can tell me immediately. The RD-Switchboard graph view provides an analytical view of how research data has been used so far. This has been a very common question, asked many times by our users, because they care about who uses their data and about how to make the data even more public and make more connections to the external world, and RD-Switchboard is an ideal tool to make that happen. It can also help me evaluate the impact of datasets, researchers and institutes based on their connections: the more connections, the bigger the impact, if you like. Finally, as you saw in the demonstration, if a researcher does not have an ORCID ID, those queries won't work. So it is a really good motivation, or encouragement, for researchers to register for an ORCID ID, for data managers to mint DOIs for their datasets, and for data repositories to provide persistent identifiers to increase the accessibility of the datasets. So this is our experience so far. I would like to end
my presentation with a real example of how I feel it is really helpful from a data repository point of view, and the basic question is: are these groups connected to each other? We have a group from the Bureau of Meteorology who downloaded climate reanalysis data from the U.S.; because it is very large and a number of people want to use that data, they were approved by NCI to store it at NCI so that they can use it. After a little while, another group from CSIRO, also a climate research group, was downloading climate reanalysis data from the same source, but a different portion, a different subset, to do some research. However, those two groups did not know each other, but they both came to me asking for storage at NCI to support their research. I suggested that since they shared a common interest, they should talk to each other and see whether group A had already downloaded some data that group B could use, and vice versa, and they started talking to each other. After a few months, another group, also from the Bureau of Meteorology but from a different branch, asked me something very similar about using and sharing reanalysis data, and I suggested the same thing. However, acting as a human communication hub is very difficult and time-consuming. I can see a good chance for RD-Switchboard to play my role as the communication hub and present those connections automatically in the graph database, so people can go there any time they want, check the connections of a dataset and start talking to each other without going through me. It might also motivate collaboration between different groups when they see the connections. So that is my hope for RD-Switchboard as NCI adopts it, so that we can offer these kinds of services to our users. So, in summary, RD-Switchboard is a great tool
to create linkages among researchers, datasets and publications. The Neo4j graph view is very eye-catching and a straightforward way to show complex interconnections within the research community. I also view data management as a joint effort by the whole research community, including the librarian community. That is the end of my talk; I will hand back to Amir, who will tell us what will happen
after this webinar. After this webinar we will have a Birds of a Feather (BoF) session, and that will be one place where you can
find out more; Jingbo will be there as well, and we will talk about this technology further there. For those of you who are interested in getting the Neo4j database that I was using for the demonstration: it is quite a big file, so I can give it to you on a USB stick. That is one way of getting it: next week, if you come to the eResearch conference, I can give you the file there. If that does not work for you, send me an email. We are also putting that data on the NCI platform as an open-access repository, so other people can download it. The slides will be available online, and if you have any further questions about the Neo4j technology, you can send me an email at this address.
The documentation for RD-Switchboard and the Research Graph database structure is all on GitHub, and the links to the repositories are on the slide. I can also bring up a screenshot of this. So, the first thing you would probably need to do if you want to create a graph database is go to Neo4j and download it. The easiest way is this: we have a Neo4j repository here that is already compiled, with all the plugins and everything built, so you can just download it. That would be the database. If you are asking about the schema, it is in a repository here, and there is also a page explaining how the crosswalks work; there are some crosswalks in the repository, and you can look at the files and get the crosswalks if you want to import your data into Neo4j. There is also another component, for harvesting: the harvested information goes into the Switchboard, and the Switchboard code is also in a GitHub repository under the Switchboard name. We have multiple instances here as well, in some of the repositories, and the code that makes the magic happen is in the inference repository. Now we have about 5 minutes that we can allocate to questions and answers.
I have one question from Christopher: what processing power is required for the queries? The answer is that it very much depends on two things: the graph size is the obvious one, and the other is how you index elements and properties. If you index everything, queries will obviously be much quicker and need less computation, but the indexes require more storage. The trade-off is that you can index fewer properties, and then you need more computation allocated to the queries. The example that I showed is running on my MacBook Pro, which has an i5 processor: it is running the graph database, plus the presentation, plus the webinar, with Neo4j in the background. So queries are not expensive to run. What is expensive to run is the inference engine: that needs high-performance computing and a lot of memory, because it does a lot of filesystem work. In our case, the machine, which I believe is on one of the slides, has 36 cores, and it takes about 72 hours to complete the pipeline. The cost does not scale linearly: if you have a machine with half the power, it does not take twice the time, it actually takes 8 times longer. So for inference on large graph databases you need a very powerful machine, or a cluster of machines. This actually opens a conversation about distributed graphs, which I briefly mentioned, and that is why the Research Graph project is now adopting the idea of having a cluster of graphs running on different platforms. That is probably a conversation for another, more technical webinar.
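The indexing trade-off in this answer can be shown with a toy in-memory index: building it costs extra storage (one entry per indexed node), but a lookup becomes a single dictionary access instead of a scan over every node. The PropertyIndex class and the sample nodes are hypothetical; Neo4j's own indexes are far more sophisticated, so this only illustrates the storage-versus-computation balance.

```python
from collections import defaultdict


class PropertyIndex:
    """Hypothetical in-memory index from one property's values to node ids."""

    def __init__(self, nodes, prop):
        self._index = defaultdict(set)
        for node_id, props in nodes.items():
            if prop in props:
                self._index[props[prop]].add(node_id)

    def lookup(self, value):
        # One dictionary access instead of visiting every node.
        return set(self._index.get(value, ()))

    def size(self):
        # Extra storage the index costs: one entry per indexed node.
        return sum(len(ids) for ids in self._index.values())


def scan(nodes, prop, value):
    """Unindexed alternative: compare the property on every node."""
    return {node_id for node_id, props in nodes.items() if props.get(prop) == value}


nodes = {1: {"doi": "10.1000/a"}, 2: {"doi": "10.1000/b"}, 3: {"title": "x"}, 4: {"doi": "10.1000/a"}}
index = PropertyIndex(nodes, "doi")
print(index.lookup("10.1000/a") == scan(nodes, "doi", "10.1000/a"), index.size())  # True 3
```

Indexing only the properties you actually query on (such as DOI, ORCID ID or PURL) keeps the storage cost down while still avoiding full scans for the common lookups.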
One more question here: is there a form-based search option planned, other than Cypher queries? The answer is yes: we are working on a couple of different options for this. One is for institutional repositories: at the moment we are exploring the idea of integrating it into repository platforms such as DSpace. We are also working on the idea of providing a form-based search, so that people can run queries and get to the graph without actually learning Cypher. OK, the next question is: is there any study comparing Neo4j with other traditional databases? I acknowledge there are a lot of studies underway comparing Neo4j with other options: there are comparisons of Neo4j with traditional relational databases, and comparisons of Neo4j with triple stores.
In this context, the first comparison is quite obvious: relational databases are not designed for this kind of scenario, because finding a chain of relationships is a very, very expensive process in SQL databases. Neo4j is one of the graph databases, and when comparing products within the same category, there are a couple of other options in this group. The main differences are in performance, simplicity of use, and interoperability with other tools and platforms and with the Semantic Web. The main difference between Neo4j and a triple store is the inference model: I would say Neo4j is far less capable of complex logic, but at the same time it gives you simplicity of implementation and query performance, so it is much quicker to get the data out of Neo4j.
However, there are different triple stores with different technologies, and I remember in 2014, when we were building the first prototype, our experience at that time suggested that triple-store technology requires more computation, but that might have changed in the last 2 years. OK, the next question: is there a comparison of the quality of the inference, that is, the search quality? The short answer is yes. Then a question about the relationship with the Research Data Alliance: yes, it is a collaborative project that was initially started by a few partners, and other people joined, so we had infrastructure contributions, data contributions and coding contributions from partners. Overall we can say this is the implementation of what the working group recommended: the working group came up with a combination of different recommendations, which we implemented as the Switchboard. And the last question: is RD-Switchboard implemented in Research Data Australia?
The answer to this one is that we use the first version of the Switchboard in Research Data Australia; it is one of the linking capabilities of ANDS. At the moment we do not have the graph visualizer there; this is one of the items in our pipeline, which has already moved from planning to the development cycle, and in future versions of Research Data Australia we are looking at having a graph visualizer that will highlight some of this information. OK, thank you everyone for your time in this webinar, and thank you for having us.