Andrew Treloar, ANDS at DataCite summer meeting 2012

Video in TIB AV-Portal: Andrew Treloar, ANDS at DataCite summer meeting 2012

Formal Metadata

Andrew Treloar, ANDS at DataCite summer meeting 2012
Seeking Serendipity: repurposing DataCite metadata to augment ANDS discovery
CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
10.5446/6571 (DOI)

I think I spend more time here than in Australia, but that's not true. Today I want to talk to you about how we are using DataCite metadata to augment our existing discovery systems. I feel a little embarrassed talking after the previous speaker, because that was a really amazing talk about a whole range of fantastic things. This is more mundane; to me at least not quite as exciting, but hopefully still of interest to you.

So first, [for those who have not heard this before], even though I apparently spend all my life in Europe, let me talk about the Australian National Data Service (ANDS). We're an initiative, unsurprisingly, of the Australian government. We got two tranches of funding: one from a thing that was focused on collaborative research infrastructure, and then a second tranche from a thing that was called Super Science. I'm not entirely sure why, or what is particularly super about the science. Anyway, we're a collaboration between Monash University (the university that I work at), the Australian National University, and a federally funded R&D organization called the Commonwealth Scientific and Industrial Research Organisation; I can't think of a good equivalent in Europe. We're about 50 staff.

We have a couple of things that we say we're interested in. We're interested in more researchers reusing more data more often: a big focus on data reuse, which [came up earlier] at the conference. And secondly, this idea of data as a first-class object, resonating very nicely with the previous talks. So it's not just about publications; it's about data as being as important as, and in some cases more important than, publications.

More recently we've started talking about what we're trying to do in ANDS as enabling four transformations. There's a subtle difference there: you'll see that one column is headed "data" and the other column is headed "structured collections". But essentially:

From things that are unmanaged (stuff that's in people's pockets on USB drives, on hard drives, on their laptops, whatever) to structured collections of data that are now much better managed for the long term.

From data that is disconnected from the context in which it was created to data that is now deliberately connected to its context, much like those graphs that we saw in the previous talk: making connections between the data and the publication and the research project and the researcher and the institution and the instrument and services, a rich [mesh] of connections.

From data that is largely invisible, because it's in someone's pocket, to data that is now much more findable.

And lastly, [phrase unclear in the recording] about use: from data that is single-use to data that is now much more reusable.

And so those build up, if you like. You need to get data managed, then you need to get it connected; once it's connected it's much more findable; once you've found the thing you can reuse it. Our overall goal is that Australian researchers can work better with data. But of course we're not just about Australian researchers; we're making our data available to the whole world.

As part of this we're building this thing that we call the Australian Research Data Commons. The metaphor is a common area that people can come to, which brings together the data and descriptions and the relationships between them, and the infrastructure that lets people [do] things [in the] commons. The reason I'm telling you this is that what I want to focus on for the rest of my talk is what we call the window into the commons. The window into the commons is our discovery system.

When we were trying to build our discovery environment for the Australian Research Data Commons we had some deliberate things in mind. The first was that we were not trying to replace discipline-specific discovery environments. If you're a marine researcher and you want just marine data, there are places that you can go to for that, and you probably know about those already because you work in that space. So we weren't trying to replace those; we were trying to complement them. Our goal was discovery across disciplines, or discovery of data by people who were not within that discipline.

The second thing is that we didn't want to say to people "you have to come to us to find data". So we deliberately went for a model where we go to where the users are. That is, we make our data descriptions discoverable through the discovery environments that people use, which primarily at the moment means making sure that people can find our stuff in Google and Bing and Yahoo and DuckDuckGo, or whatever [search engine they happen to be] using. So we make the data easily accessible for people, and they don't have to change their information-seeking behavior.

The third thing we were trying to do was provide context around the data. So it's not just the data [on its own]; it's the data that is linked to the publications, to the institutions, to the researchers, to the research projects. We did that for two reasons. One was to provide more context to help with discovery. What I mean by that is: maybe you met someone at a conference, maybe you [saw something about a talk on Twitter], so you can remember someone's name, or you can remember the research project, but that's all you can remember. We make it easy for you to find, first of all, that person or project, and then to follow the links to the organization they work for, the projects they work on, the data they produced.

But that context is also useful for assisting with assessment. Once you've found some data you need some way of assessing its value, and making links to funding bodies or to institutions is a way of helping you say "well, that was funded by the National Institutes of Health, it's probably good", or "that's associated with that research group, I don't want anything to do with it". So the context is helpful.
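The idea of starting from the one thing you remember and following links outward can be sketched as a small graph walk. This is only an illustration: the record identifiers, the `LINKS` table, and the `reachable` function are hypothetical and are not taken from the ANDS system described in the talk.

```python
from collections import deque

# Hypothetical linked records: "type:name" -> records it links to.
LINKS = {
    "person:A. Researcher": ["project:Reef Sediment Study", "org:Example University"],
    "project:Reef Sediment Study": ["collection:Sediment cores 2010", "funder:Example Funding Body"],
    "collection:Sediment cores 2010": ["publication:Reef sediment paper"],
}


def reachable(start, links, max_hops=2):
    """Breadth-first walk over the links, treated as bidirectional.

    Returns a dict mapping each record found to its distance in hops from start.
    """
    # Build an undirected adjacency map so links can be followed either way.
    graph = {}
    for src, targets in links.items():
        for dst in targets:
            graph.setdefault(src, set()).add(dst)
            graph.setdefault(dst, set()).add(src)

    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in sorted(graph.get(node, ())):
            if nxt not in seen:
                seen[nxt] = seen[node] + 1
                queue.append(nxt)
    return seen
```

Calling `reachable("person:A. Researcher", LINKS)` would surface the person's project and organization at one hop and the project's collection and funder at two hops, which is the kind of context the talk describes using for both discovery and assessment.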
As I said, we don't replace discipline portals; we link to them, and what we try to do is act as a window into them. So what this looks like: this is the production version of Research Data Australia. We have 40 thousand-odd collections and 5 thousand-odd parties (a party is a person or an organization). We have a number of services associated with data, and we have about 27 thousand research projects. You can put in a search here and it will search the underlying database of all those things. Those 40 thousand collections and 27 thousand activities [each have a] page that's being indexed by web search [engines].

That's the context leading up to where the DataCite metadata comes in. One of the things that we decided to do fairly early on in the development of our discovery service was to not just support people looking for a [specific] query, but to support serendipity: to make it easy for people to find things that they hadn't deliberately searched for. I think in the back of our heads when we were doing this we were thinking of something a little bit like the Amazon system: people who searched for this also searched for that. We [didn't] decide to do it like that, as it turns out, but that was the kind of idea. So we provide suggested links, and we started small and we're gradually adding functionality.

The first stage was what you might think of as internal suggestions. So if I go here and I search for, I don't know, kangaroos ([as an] Australian researcher [I'm] contractually required to do this [when I] work overseas). OK, so here is a survey of kangaroos from [source unclear in the recording].

Down the bottom of the page it says "Suggested links". So in addition to the search that I've done, it's saying: well, here are another 892 data collections with matching subjects. So if I want more stuff on kangaroos, because you know you can't get too many kangaroos, I click on that and, in the fullness of time... it is actually doing something... that was not what I wanted. It is still running, but it's gone invisible. OK, let's see if it comes back. So that was stage one.
Yes, this is of course exactly what you want in a demo. [Remark unclear in the recording.] So here is the first block, called suggested links, for this particular query: these are other things that have got kangaroo stuff in them. So stage one was that. Stage two:
As Jan said, we were involved early on in DataCite, so we thought: well, now that DataCite has a search API, what could we do with that? What I'll show you this morning is not yet in production, but will be in production by the end of this month, I think.

What we're doing is using the title of the record that someone is looking at as a search query against the DataCite metadata. So rather than just doing suggested links inside our own collections, we're broadening that. We search in real time. We search for the best possible match, and we reduce the match percentage if we don't get any results: we start by looking for all the words in the title; if that doesn't work, we look for most of the words in the title; if that doesn't work, some of the words in the title. And we preferentially rank dataset results ahead of the other resource types in the DataCite metadata.

So, what this looks like. I'm living dangerously now, but I'm not doing it with the browser that's been shown to be problematic. I'm now doing this on alpha code, code running in the cloud, so I'm stretching out the degree of difficulty here; it's like walking a tightrope blindfolded. So I search for kangaroo... actually kangaroo is a bad example for this one; there's not going to be a lot of good matches in DataCite. Let me look for marine sediment, where I know that this is going to work better. So again I do a search ([it is] helpfully reminding me it's a demonstration) and I get the same kind of results.
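The title-matching fallback described above (all of the title words, then most, then some, with datasets ranked first) can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the ANDS implementation: the threshold fractions, the `resource_type` field name, and the pluggable `search(words, min_match)` callable are all hypothetical, and the in-memory `make_local_search` stands in for a real call to a search API.

```python
import math


def make_local_search(records):
    """Build a toy search function over in-memory records (stand-in for a remote API)."""
    def search(words, min_match):
        hits = []
        for rec in records:
            rec_words = set(rec["title"].lower().split())
            # Keep the record if at least min_match query words occur in its title.
            if sum(w in rec_words for w in words) >= min_match:
                hits.append(rec)
        return hits
    return search


def suggest(title, search, thresholds=(1.0, 0.75, 0.5), limit=5):
    """Try progressively looser title matches until something comes back."""
    # Very short words are dropped as a crude stop-word filter (an assumption).
    words = [w.lower() for w in title.split() if len(w) > 2]
    for fraction in thresholds:
        min_match = max(1, math.ceil(fraction * len(words)))
        hits = search(words, min_match)
        if hits:
            # Stable sort: datasets first, otherwise keep the search's own order.
            hits = sorted(hits, key=lambda h: h.get("resource_type") != "Dataset")
            return hits[:limit]
    return []
```

Because the sort is stable, the ranking returned by the underlying search is preserved within each group; only the dataset-first preference is layered on top.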
So say I'm interested in sedimentation stress in this particular kind of coral on the Great Barrier Reef, Australia. Let me click on this one.
OK, and once again there are suggested links. They start off by providing the internal records, and then in real time [the page] painted external websites: 44 collections. [If I] do a refresh you'll probably see that: internal records, then external websites. So that's a search in real time against the DataCite metadata. We haven't pulled the DataCite metadata across; we're using the API to query in real time [and pull] stuff back.

So I can do exactly the same thing as I did before, and we get the same [display]; let's give it a go. But now I'm pulling back the DataCite metadata using the search API. Let's say I'm interested in this one [record title unclear in the recording; it mentions partial pressure]. I click on it and it tells me: well, here is a description, here is a citation, it's a collection (as I said, we preference collections up), and I can now look at the DataCite [record]; I bounce off to the DataCite service. More interestingly, I can say: well, I don't want to see the DataCite [record] today, I've looked at enough metadata already. Everybody loves metadata, right? But I've now read enough of the metadata to decide that I want to go look at the record.

And here (please do not tweet this, [or I won't] be let back in the country) I can just click directly, bypass DataCite, and go straight to the original source.
OK, I can relax now. So that's stage two. Stage three we're still thinking about; we're building this stuff in stages. So what other things could we mine for those additional suggested links? We have some ideas. The National Library of Australia is one possible target: they have a very comprehensive system called Trove which we could search against. Another thing we could search is specialist data directories, such as the Atlas of Living Australia, which is associated with GBIF. [A remark about the Atlas's director is unclear in the recording.] We're in early conversations with DANS in the Netherlands about using NARCIS, which is a research portal the Dutch have, as a possible target, and I'm sure other ideas will come along as well.

In terms of DataCite, we have a number of possible enhancements that we're now looking at. One would be the ranking: we're simply taking the DataCite search rankings using the search API, and we might want to think about tweaking that [so that it ranks] highly the things we care about. We're in discussions with the DataCite developers.

It would be nice if we could do this sort of "see also" suggested-links thing at the search level. At the moment, if I do a search and scroll up, I only get to see those suggested links when I go into the record. It would be nice if I could see suggested links for the entire set of records rather than having to look at each one.

It would be nice if we could use the path the user followed to get to the page they've got to, to re-rank terms in the search query. In other words, to say: I know they followed this sequence of things to get here, so let's use that to tweak the search results.

At the moment we're just using the title: [we take the] title [and] pull things [out of it] to search [with]. It would be nice if we could use our subject tags against the DataCite subject tags to give much richer results, but that pushes the search [engine] a little bit harder.

We've got some other ideas. It would be nice if you could say "I'm looking for things with a similar spatial coverage", or "things with a similar temporal coverage", or both. It would be nice if we could mine information about the authors: here is some other stuff written by these people. So again, it's that Amazon richness. [We could search] for things that share keywords. There are lots of ways that we could improve the ways in which we help people stumble across serendipitous stuff that they would otherwise miss.

There are issues for the future. One is a conversation Jan and I had: when someone clicks on "view resolved record", they bypass DataCite altogether and just jump directly to the underlying thing. [Part of this remark is unclear in the recording.] We can tweak that; he shook his head and said he didn't [think it was an issue].

Scale is obviously an issue when you start doing federated search. There's been a debate in this [space] for a while [about federated search; details unclear in the recording]. What if there are lots of "see also" services? So imagine, in this example, that as well as 2,144 collections from DataCite we have another 10 thousand from [one source], about 3 thousand from somewhere [else], and so on and so on. How's that going to work? What you may not have noticed was that in fact you can use the web page as soon as the page loads; you don't have to wait for those things to paint. But you could imagine someone sitting there twiddling their thumbs waiting for six or seven of these to fill in, and that's not a good user experience.
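One common way to keep a page responsive while several external "see also" sources are queried is to fetch them concurrently and give each a time budget, dropping any that do not answer in time. The sketch below illustrates that general pattern with Python's `asyncio`; it is an assumption about how the scale problem might be handled, not something the talk says ANDS does. The `gather_suggestions` name and the `sources` mapping of name to async fetch function are hypothetical.

```python
import asyncio


async def gather_suggestions(sources, query, timeout=2.0):
    """Query every source concurrently; omit any that fail or exceed the timeout.

    sources: dict mapping a source name to an async function fetch(query) -> list.
    """
    async def one(name, fetch):
        try:
            return name, await asyncio.wait_for(fetch(query), timeout)
        except Exception:
            # Timeouts and source errors are both treated as "no suggestions".
            return name, None

    results = await asyncio.gather(*(one(n, f) for n, f in sources.items()))
    return {name: hits for name, hits in results if hits is not None}


# Usage with two stand-in sources, one fast and one too slow for the budget.
async def fast_source(query):
    return [query + " (fast source)"]


async def slow_source(query):
    await asyncio.sleep(0.5)
    return [query + " (slow source)"]


if __name__ == "__main__":
    found = asyncio.run(
        gather_suggestions({"fast": fast_source, "slow": slow_source},
                           "marine sediment", timeout=0.05)
    )
    print(found)
```

With this shape the total wait is bounded by the timeout rather than by the slowest source, at the cost of sometimes showing fewer suggestion blocks.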
And how does the UI work? I mean, you can see that that's fine with one external collection. It's going to be OK with [ten]. It's not really going to work with 20 or 30 or a hundred. We're still thinking through how that's going to work in practice.

And of course there's the [phrase unclear in the recording] question: should we worry about all of these issues? Does the actual user [care]? Is it going to be enough to simply say to the user "here are some suggested links" [and leave it at that]? The user probably doesn't care whether this comes from DataCite or [elsewhere, or] internally from Research Data Australia; they just think "here's some other stuff I'm interested in". I suspect that's where we're going.

Last slide: it has information about me in the middle and links to the two demos I was using. [The first URL] is the website for ANDS; it links to the production environment that I did the first searches on, Research Data Australia. [The second] is the pre-release beta thing. I can't guarantee that it will always be up, because we're fiddling with it, but it's the one that lets you play with the "see also" [feature], so I figured I should show it to you. [The closing remarks, which mention the developers and tweeting from the session, are unclear in the recording.] Thank you.