
Project VandyCite: Using Wikidata to support research information management

Video in TIB AV-Portal: Project VandyCite: Using Wikidata to support research information management

Formal Metadata

Title
Project VandyCite: Using Wikidata to support research information management
Title of Series
Author
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
2020
Language
English

Content Metadata

Subject Area
virtual reality Clifford time Universal projects digital Stream structure Part events libraries
circulation Actions time unit sources ones materials schemes heads Part strategy hypermedia systems area digital moment coordination bits maximal staff digital entire skeleton Types message-based communication chain website editors sort Remote Manage point statistics Barriers link print student number production period Average terms Clifford intrusion detection systems Authorization testing standards information NET projects catalog visualization environment Software mix Universal libraries
Open Archives Initiative Protocol for Metadata Harvesting integrators views time workstation sources materials sets space functions Part information wikis sign different arrow model systems Development feedback The list effects bits several useful work category document management communication Board progress record web pages link XML events threshold training versions period goodness form addition mid information key bases law projects lines catalog Migrations environment visualization Factory Blog Universal statements archive libraries
link decision time sources number sign terms Average Software information statements contrast series fuzzing unite user interfaces script sources services link information decision projects storage staff lines completion van Faculty connections pub category words loop case string calculation interfaces statements website sort Video Results record
script states code routine schemes Databases Part Faculty wikis Blog processes series errors script collaboratives link mapping formating bits Part structured data processes orders website Right sort record point web pages services Identify files fields write spreadsheets Google statements form user interfaces Graph matchings information key interfaces projects Databases GRASS non-existence Faculty loop statements fuzzing life
standards script Graph time schemes Part wikis structured data mathematics different HDDs model systems script services track mapping moment storage bits Part van mechanisms structured data interfaces sort stable record point track services Identify table potential versions spreadsheets unique statements form web information verfolgen graphs basis lines Query calculation statements archive life table Routing
torus track circulation presentation management integrators time fields different terms Authorization systems conditions form tablets collaboratives Super projects storage interactive catalog Types category calculation Universal website Manage
[unclear] That would be great, and then I will take myself out of the stream.
Good morning, everybody. It's really nice to be with you. We're excited to be part of this event, and we're thrilled to be joining you from Nashville. It's an early morning hour here, but it's around breakfast time, so not too bad. OK. Our topic is Project VandyCite, and we're going to be discussing this initiative that took place at Vanderbilt University during the past year. Let me begin with an introduction to myself and to Vanderbilt University. Vanderbilt University is a research university located in Nashville, Tennessee. It was founded in 1873 and has an undergraduate enrollment of roughly 7,000 students and a graduate enrollment of about 6,500, so about 13,500 students altogether.
I'm a librarian at Vanderbilt. My title is Associate University Librarian for Research and Digital Strategy, which means I get involved in a lot of digital things, as you can imagine. I'm also a member, along with [unclear] and others in this group, of the Wikimedia and Libraries steering committee, so I'm very much trying to bridge the communities of Wikimedians and librarians. I'm also active in other projects, like 1000 Women in Religion. So that's my own background coming into this. But let me begin by talking a little bit about the project itself.
First, let me say a few things about scholarly communications, because this term may not be familiar to everyone who is listening, though I think it is very familiar to academic librarians. Scholarly communications is a broad topic that deals with the entire research and publication value chain. It thinks about scholarly publishing as an ecosystem from end to end. We think that Wikidata can serve as a central hub in that scholarly communications value chain by doing what it has been so good at doing: connecting these disparate islands of information and bringing them together. I'm thinking about places that identify who people are, like VIAF; connecting people with institutions; identifying where those institutions are located and the kinds of [unclear] that support publishing activities; the authors and their publications, or other types of products; the editors who are working on journals, so that you know who the editors are for each of the journals; and also the citation network that emerges when you start looking at how all these groups connect together. That is of critical interest to people who are doing scholarly communications and looking at how information and publications pass through that value chain.
I would also say that scholarly communications is not simply an analytical activity. To quote someone who is fairly well known, [unclear] scholarly communications is not about just interpreting the world in various ways; the point is to change that world. So scholarly communications is also about advocacy. We're trying to make this publication ecosystem more efficient and to reduce the barriers to access, so that people can actually read the research that universities like Vanderbilt and universities around the world produce. That leads into this question about the linked data side of things.
We have been working for a long time at Vanderbilt, and Steve has taken part with me in a long effort to introduce concepts about linked data to our librarians. We had an active linked data working group for years, really. I would say, and you can comment if you like, Steve, it was a bit tough going. We focused on some of the standards that are out there [unclear]. We looked at RDF and the ways you can serialize RDF. We used tools like Blazegraph. We explored AWS Neptune, which is their graph database. I did a lot of work with graph databases myself. And we tried to show our librarians how this could affect their work in the area of bibliography by connecting things like [unclear], ISSNs, Library of Congress IDs, VIAF, and ORCID iDs, and bringing together all those different sources of information that they work with on a daily basis but don't necessarily see as linked data in the systems they are using.
So I would say it was a bit tough going: a high barrier to entry. Thank you, Steve. Wikidata has dramatically lowered that barrier to entry, and I think that's one of the great successes that I see, even for the linked data community: it provides a great entry point for people to get into using linked data [unclear]. I'll come back to that in a moment, because what we're talking about today is how we got those librarians involved in this particular project, called VandyCite.
VandyCite came about because, like everyone else, we had to make this crazy quick transition to remote work back in mid-March. We all know exactly what happened with COVID. So we, as library managers (that's my role, to help people manage projects across a number of areas), had to think about how to keep people productively engaged in their work as we moved to remote work. That was particularly difficult, I think, for people whose work was actually tied to the facility in some way. They might have been staffing a reference desk or circulation desk, or they might have been involved in the delivery or circulation of print materials, and a number of those activities were shut down. So we needed to make sure that we could keep people productively employed, especially during that early period of remote work.
What we decided was to take an approach of organizing teams. Normally libraries are organized in departments, and it's fairly hierarchical. But we thought, given that we've got people working in different units that have different needs, let's form teams around particular goals and then meet those goals through pretty intensive work as we move to remote work. So we organized teams around things like digital preservation, web accessibility, asynchronous learning, digital pedagogy, and so on. There's a separate talk, just for librarians, about managing that move from a departmental mode to a team mode. Nevertheless, one of those teams was the Wikidata team. These teams required twenty hours of work at a minimum; some people did more. They were cross-departmental and non-hierarchical. Our particular team for Wikidata included a cataloger, someone who was working in our [unclear] library, a copyright clearance coordinator, two staff members from our divinity library, and others, including Steve and me. So it was a very mixed group of skills. At the beginning, since I formed the team, I was the nominal head. What was really great was that this only lasted for a little while; others quickly took over and led the way, which I think was fantastic, because that's exactly what we wanted. My main role actually became recordkeeping, which is something Wikidata really helped with. When we tried to show that our people were engaged and making contributions, there was no easier way than just using the statistics that you get from Wikimedia to demonstrate that.
OK. When we began this team's work, we had to train almost everybody, because they hadn't used Wikidata before, to use Wikidata and to engage in the goals of WikiCite. So we had to introduce those communities and talk about how to get involved in them. One thing that we wanted to avoid was a kind of [unclear] of prior efforts. There had been [unclear], which was something that was done in the mid-2000s or maybe a little bit later, and the goal was to have a kind of engagement on a periodic basis with reference sources, to show the value of [unclear]. I think that was a fine goal. But what we wanted to do here was, rather than make a periodic engagement, have our librarians engage with the community and really learn from them, as well as learn from us, and understand themselves as part of the community.
So one thing that we asked everyone to do was to write a conflict of interest statement before they began editing. [unclear] conflict of interest in Wikidata, but we made sure that everyone understood that we're editing on Wikidata for the greater good of Wikidata. We thought that this aligned well with making sure that what we were adding, articles by our faculty, really did serve the ends of Wikidata. But we wanted to make sure that they were aware that there could potentially be a conflict of interest.
Then I led several training sessions just on basic entry: how to create an item, how to add new properties, how to add references. After that, the librarians were teaching each other as they had questions; I'll come back to that in a second. We also said, plug into the existing communities: sign up for WikiProject Books and learn about the models that are being used there. And, lo and behold, they're using a FRBR model, which is very natural for librarians, simplified to use just work and edition rather than [unclear]. So I think the people who were working in cataloging thought, this is great, now I can also apply [unclear], which is something that I want to do. We did develop communication channels for librarians to talk to each other about the issues they were encountering.
As I say, we started just by gathering the group. At first I thought maybe the communication should take place on wiki, so I set up just a single page, a subpage of my user page, with training resources and lists of what people should read up on. Then we started a Teams channel that was internal, and I think people felt most comfortable there, which is why I list it here: communicating in Teams, which, if you haven't used it, is a Slack-like environment. As they were starting out they would raise questions and others would chime in, and I think they felt a little more comfortable there. But as we moved on and people grew in their level of comfort in Wikidata itself, we also started a project page called WikiProject VandyCite. Maybe we can put this into the notes. That's where we primarily visualize our progress with things like Listeria and InteGraality, I think is what it's called, to show basically the progress over time of our efforts. That's helpful to us, and it also helps others keep tabs on what we're up to.
One of the things that we were able to achieve in this project was to make links, not only in the way that we've been talking about up to now, between different information silos in cyberspace, but also between different information silos in our own library system. As in some academic libraries at research universities [unclear], we had nine individual libraries, and librarians in those libraries had been keeping their own bibliographies of faculty members' publications. Sometimes those bibliographies are kept in Zotero, sometimes they're in [unclear], and sometimes they're being imported into an institutional repository like DSpace or bepress [unclear].
So there was no single bibliography for faculty publications on campus, meaning that, in effect, when we tried to look at what our faculty was producing, we were forced to go to third parties to collect that data and then present it back to us. We were also missing research output, because even the people who were keeping those data in their individual libraries weren't really looking at newer forms of scholarly communication, like clinical trials or publications of data sets [unclear]. So it helped us, in thinking about the scope of what we wanted to get into Wikidata, to start taking all those different information sources and bringing them together, just as a library, internally.
I mentioned that there could be potential for conflict of interest. Another interesting discussion that we had, where we would look for feedback from you, is about notability. In some ways this is a much less fraught issue than it is on, say, English-language Wikipedia. Our view was that as long as you could find a relevant identifier for an article that established that it's a scholarly article, it would belong within the purview of WikiCite. We are branching out, though, from just scholarly articles to things like archives. We have an initiative called the Slave Societies Digital Archive that collects rare materials about the forced migrations of African peoples around the world. It's a very [unclear] archive, and now we have an entry for it. Again, we feel that it's notable and that it belongs under the category of digital archive. There are, though, in some faculty bibliographies, things that we thought probably don't belong, and this is where it got a little bit more sensitive. Some faculty would list blog posts and maybe non-peer-reviewed posters, and after some conversation we decided that unless there was peer review of that blog, so that it was in some sense more like a publication, it wouldn't be included.
We never actually had any challenges on that. I'm not sure that everybody's fully aware of it, though we did publicize the project. That's one thing I should mention: we issued a press release so that everyone on campus would know what was going on, which we can also add to the notes. But again, we wanted to be sure that we were serving the goals of the Wikidata and WikiCite community, and so there were some things that did fall below the threshold, which we cut off and didn't include.
I think, Steve, that I've ended my initial part, so I'll bring you up here.
OK, thanks, Cliff. As Cliff mentioned, I've been interested in linked data for a long time, and I got quite interested in Wikidata because it's such a practical application of that. Next slide. I'm going to talk about VanderBot, which is a project I've been working on for about a year now. I thought I would define what a bot is and what I mean when I call this a bot. I'm just going to say it's a thing that can read and write to Wikidata via its API. Some of the bots that are out there operate without human intervention, but VanderBot is not an autonomous bot; it is a bot that works together with humans. Next slide.
Just to contrast: a lot of people are familiar with QuickStatements, and this is kind of a one-way loop, where the person looks at what's in Wikidata, creates a CSV or some other form of data, and then uses the web interface to push that data into Wikidata. Next slide. VanderBot is really kind of a team approach. VanderBot, which is a series of Python scripts, sits in the middle. It can both read and write to Wikidata, and it uses spreadsheets as a local storage mechanism, but it's also tied into other data sources. It uses APIs to look at things in ORCID, PubMed, and Crossref, and it uses fuzzy matching to make disambiguation decisions, but it can also call in a human if it can't decide what to do. Next slide.
So basically it's a team effort where the bot looks things up for the human, disambiguates where it can, and tries to avoid duplicating existing information, but again, it has human assistance. The other thing that I really try to do is to identify when references are missing and make it possible to add references to existing statements. I made a back-of-the-envelope calculation that I can make probably forty edits per hour working by myself, but with VanderBot I can edit about two hundred items per hour on average. And that's items, not edits: each of those items might actually involve a number of edits. So using VanderBot as a tool, I could work probably ten times faster, or more, than I could work by myself. Next slide.
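The disambiguation step described here, where the bot decides on its own when a match is clear and calls in a human when it is not, can be sketched roughly as follows. This is only an illustration, not VanderBot's actual code: the function names, the two thresholds, and the use of Python's standard `difflib` for the similarity score are all assumptions made for the example.

```python
from difflib import SequenceMatcher

# Hypothetical thresholds chosen for illustration; they are not from the talk.
AUTO_ACCEPT = 0.95  # at or above this score, treat the candidate as the same person
ASK_HUMAN = 0.80    # between the two thresholds, defer the decision to a human


def similarity(a, b):
    """Return a 0..1 similarity score, ignoring case and extra whitespace."""
    def normalize(text):
        return " ".join(text.lower().split())
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def disambiguate(name, candidates):
    """Pick the best-matching existing item for a name.

    candidates maps an item identifier to its label.
    Returns (decision, item_id), where decision is 'match',
    'ask_human', or 'new' (no plausible existing item).
    """
    best_id, best_score = None, 0.0
    for item_id, label in candidates.items():
        score = similarity(name, label)
        if score > best_score:
            best_id, best_score = item_id, score
    if best_score >= AUTO_ACCEPT:
        return "match", best_id
    if best_score >= ASK_HUMAN:
        return "ask_human", best_id
    return "new", None
```

The middle band is the point of the design: an automatic match avoids creating a duplicate item, a clear miss allows a new item, and anything in between is queued for a person rather than guessed.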
So prior to the start of the VandyCite project, I used VanderBot to make over eight thousand item edits, and most of those edits involved multiple statements being added at once. The result was that basically all of Vanderbilt's faculty, and many of the postdocs and other staff, had fairly complete records (well, not records, but items) with at least minimal information. This then allowed linking those people to other resources. I created almost a thousand links between researchers and clinical trials using the new principal investigator property. This was also ready to go when VandyCite started, so there are now about twenty-two thousand works that are linked to Vanderbilt researchers. VanderBot did not make those links, but the fact that the researchers were there made it possible for humans to do that, and other connections that had been made previously are now associated with Vanderbilt because the people are associated with it. Next slide.
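A count like the one quoted above could in principle be checked against the Wikidata Query Service. The sketch below is an assumption about how such a check might look, not the query used in the project: it assumes researchers are linked to the university through the employer property (P108) and works to researchers through the author property (P50), and that Q29052 is the item for Vanderbilt University, which should be verified before use.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"


def build_count_query(employer_qid):
    """Build a SPARQL query counting works authored by people employed by an institution."""
    return f"""
SELECT (COUNT(DISTINCT ?work) AS ?works) WHERE {{
  ?person wdt:P108 wd:{employer_qid} .  # employer
  ?work wdt:P50 ?person .               # author
}}"""


def run_query(query):
    """Send the query to the public endpoint and return the parsed JSON result."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    request = urllib.request.Request(
        url, headers={"User-Agent": "example-sketch/0.1 (illustration only)"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `run_query(build_count_query("Q29052"))` would return the standard SPARQL JSON results structure; the number it reports will differ from the figure in the talk, both because the data has changed since 2020 and because the project may have counted links differently.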
So I'm going to try not to get into the weeds too much on how VanderBot works. Next slide.
But basically, as I said, it's a series of several scripts. The key script is the last one. It's a general-purpose script that ingests CSV and writes data to the Wikidata API. But that script gets fed by a series of other scripts for the project of getting people, the Vanderbilt researchers, into Wikidata. These scripts are pretty idiosyncratic, because I had to scrape web pages, so essentially I wrote a different routine for every web page, and then the information from the web page had to be disambiguated using fuzzy matching. I also downloaded the existing statements using the Wikidata query interface. So the key thing here was to try not to duplicate existing records, to identify when they were already there, and to add only new information. Next slide.
I'm not going to dwell on this too much, but there's a sort of loop that involves getting the information from the Blazegraph interface, the Wikidata Query Service, and then, through either pulling data from other places or having it manually added by humans, putting that information into a CSV spreadsheet. Because it's a spreadsheet, it's easily editable: the human part of the team can go in, make corrections, and look for errors prior to actually writing it to the API. Then, when everything is ready to go, there is a mapping schema that VanderBot uses that translates the format of the spreadsheet into the form that the API needs in order to ingest the data. A critical piece of this is, once the data is written to the API, to grab the identifiers of the newly created items. That's important because there's actually a lag between when the data is put into the API and when it's available in the query service, so you can't depend on finding out about things that were written immediately; sometimes there could be a lag of up to several hours before the information is available. Next slide.
At this point I'm working with Charlotte [unclear] in the Divinity School, and we're trying to create journal items for all of the journals that people regularly publish in. As a part of this process, I'm trying to generalize the preliminary scripts that acquire the data so that they're not focused entirely on people, but really on any kind of thing, like journals or works or whatever. I have also been working with my collaborator Jessie Baskauf, who is helping me develop a web interface to create the schema that does the mapping between the CSV files and the Wikibase data model, and we have a manuscript we're hoping to submit by December on this. As Cliff mentioned, there is one of these [unclear], and eventually we want to take the publications that are in there and connect them with the journal items we're creating and with the divinity faculty who are already in Wikidata, to create a graph of these publications. These aren't typically works that are available in commercial databases. For fields like the biomedical fields, that kind of information is readily available, but we're excited about this because it would increase exposure of the work of people in departments that aren't covered as well [unclear].
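The spreadsheet loop described above, in which rows without an identifier are written and the identifier returned by the API is recorded immediately rather than looked up later through the query service, can be sketched like this. It is a simplified illustration under stated assumptions: the column name `qid`, the function name `write_pending_rows`, and the `create_item` callback (which stands in for the real API call) are hypothetical and not taken from VanderBot.

```python
import csv
import io


def write_pending_rows(csv_text, create_item):
    """Write rows that have no identifier yet and record the IDs that come back.

    csv_text must have a header row that includes a 'qid' column.
    create_item is a callback standing in for the API write; it receives
    the row as a dict and returns the identifier of the new item.
    Returns (updated_csv_text, number_of_items_created).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames
    rows = list(reader)

    created = 0
    for row in rows:
        if row["qid"]:
            continue  # already in Wikidata, so nothing to write
        # Store the returned identifier right away: the query service may
        # not show the new item for some time after the write.
        row["qid"] = create_item(row)
        created += 1

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue(), created
```

Because the identifier is saved back into the spreadsheet, running the function a second time on its own output creates nothing, which is what keeps the loop from duplicating items while the query service catches up.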
So you may be wondering why I'm developing this code instead of just using existing code, so I want to talk a little bit about the future uses I see for this. Next slide.
I mentioned that the identifiers are really important. In the spreadsheet, the identifiers are basically how the script keeps track of whether the data already exists or whether it doesn't yet exist and needs to be written. But another piece of keeping track of these identifiers is that they uniquely identify all the bits that are in Wikidata, so you can check the records in the spreadsheet against what's in Wikidata using the query service, and this gives a possible mechanism for tracking vandalism, which I'll talk about in just a moment. Next slide.
I mentioned that there was this mapping schema. It's actually based on a W3C Recommendation that maps table columns to an RDF model, and the schema that I use maps them to the Wikidata model. As I mentioned, these schemas are used by VanderBot so that it knows how to connect the columns of the spreadsheet to the Wikibase model. That's kind of the technical thing, but the thing I'm really excited about is that you can also use the schemas to emit exactly the same RDF that you would get if you queried the query service. Essentially, this allows us to create an unambiguous, stable, standardized way to archive snapshots of what is in Wikidata in an easy-to-read [unclear] form. Next slide.
At WikiCon North America we gave a talk, and this is a slide I showed. At that point I was thinking of Wikibase as a possible way to keep a sort of local copy of Wikidata. Next slide. But I sort of gave up on that when I realized that if you archive these spreadsheets in GitHub, you can use the built-in versioning of GitHub combined with the ability to emit triples from the spreadsheets. There is a Ruby script called rdf-tabular that will do this. So you can basically take these snapshots and put them in a triplestore as separate named graphs, and then it would potentially be possible to have another script that would query the triplestore, query the live Wikidata Query Service, and look for differences between the two. Next slide.
So this basically gives us the ability to do things that other tools, like QuickStatements and OpenRefine, cannot do: the ability to keep snapshots of the part of Wikidata that we're interested in, which is really powerful, and then being able to use this information both to detect changes that other people have made to the topics of our interest and to potentially detect, and possibly revert, vandalism. I think this is a future capability of the system that I'm really excited about being able to explore. I think that's my last slide.
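The idea of emitting triples from an archived spreadsheet and comparing them with the live data can be sketched in a few lines. This is a deliberately reduced illustration: the W3C Recommendation uses a JSON metadata file with templates such as `aboutUrl`, `propertyUrl`, and `valueUrl`, whereas the small `MAPPING` dictionary, the column names, and the function names here are hypothetical stand-ins, and only simple "truthy" statements are produced, not the full statement, qualifier, and reference structure.

```python
import csv
import io

WD = "http://www.wikidata.org/entity/"
WDT = "http://www.wikidata.org/prop/direct/"

# Hypothetical column mapping: column name -> (property ID, kind of value).
MAPPING = {
    "employer": ("P108", "item"),   # value is another item's identifier
    "orcid": ("P496", "string"),    # value is a plain string literal
}


def csv_to_triples(csv_text, mapping=MAPPING, subject_column="qid"):
    """Return a set of N-Triples lines for the non-empty mapped cells."""
    triples = set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = row.get(subject_column)
        if not subject:
            continue  # rows without an identifier have not been written yet
        for column, (prop, kind) in mapping.items():
            value = row.get(column)
            if not value:
                continue
            if kind == "item":
                obj = f"<{WD}{value}>"
            else:
                escaped = value.replace("\\", "\\\\").replace('"', '\\"')
                obj = f'"{escaped}"'
            triples.add(f"<{WD}{subject}> <{WDT}{prop}> {obj} .")
    return triples


def diff_snapshots(archived, live):
    """Compare two sets of triples and report what has changed."""
    return {
        "removed": sorted(archived - live),  # in the snapshot, gone from live data
        "added": sorted(live - archived),    # in live data, absent from the snapshot
    }
```

In practice the `live` set would come from the query service rather than from a second spreadsheet. A triple that shows up under `removed` without a corresponding deliberate edit is the kind of change a person would then review as possible vandalism.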
Great, thanks, Steve. I know our time is coming to an end, so I just want to quickly say a few things about future steps. Steve mentioned some of that. It's fantastic work, as you can imagine, and I'm so glad that he is a collaborator on this project.
We're also thinking about some other types of integrations into our systems. One is to connect what we're doing with VandyCite more closely with our institutional repository. For those of you who aren't familiar with that term, it provides open access to faculty and student publications and preprints. There's already a property that we can use, full work available at URL, P953. We've put that on a few of the items that we've entered. I'd like to have that on every item, because that would help us determine what's actually available in our open access institutional repository versus what's not yet available in that form, and that helps, as I mentioned, scholarly communications librarians to engage in depositing more items in the institutional repository in open access.
We're also actively involved, working with other people at the university, in [unclear] a new research information management system. There are systems out there, as I alluded to, that will essentially sell your data back to you: you produce the publications, and then they collate them and provide them in a nice form. We think that it should be possible, through Wikidata, to build tools that will rise to that level and maybe be even more complete. For example, those systems miss items in the humanities, whereas they are very good in the sciences. Because we don't have any bias towards one discipline or another, and we're working across fields with librarians who are very well aware of the publications in their respective areas, we think we may be more comprehensive. A tool like Scholia, suitably developed, will I think become a very strong competitor in this research information management area, so we'll hear more about that.
And finally, we're very happy to have received a grant from WikiCite to develop learning pathways. What we're going to do with this grant is to develop an interactive experience for librarians, one that varies somewhat [unclear] as librarians come into WikiCite and Wikidata for the first time and need to find [unclear] ideas. You might come in as a cataloger, so there are a lot of things that you know already, about FRBR for example, that you don't need explained to you, whereas if you come in from the circulation desk, that might be less familiar to you, so you might need to learn some of the ideas behind FRBR so you can actively engage in understanding what the difference is between works and editions, and so on. We're excited about that. We're going to try to use everything that we learned, and that we just rushed through in talking about, in this learning pathways exercise and release it back to the community [unclear]. With that I will stop sharing, as I think we're at the end of our time. Thank you so much.
[unclear] Thanks, that was a great talk. We have some questions for you in the pad, but we are right on time, so I thought [unclear]; we can certainly continue in the pad.
I apologize for talking too long.
No, that's absolutely fine. It was a great presentation, I really enjoyed it. But we'll stay on track [unclear], so if you could answer the questions that have been put in the pad [unclear].