Research Grant Data in the Griffith University Research Hub
Formal Metadata
License: CC Attribution 3.0 Unported. You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifier: 10.5446/36009 (DOI)
Transcript: English (auto-generated)
00:00
So, yeah, I work at Griffith University in eResearch Services, and a while ago we used the ANDS Research Grant API to improve the data that we present in the Griffith University Research Hub. The Research Hub is our public-facing researcher profile system, and we built it for two main purposes: one, to make Griffith research more discoverable and show what we are doing; and the other, to give researchers a profile that they can use and share for their own purposes, one that shows their individual work.

To give a bit of background, the Research Hub is built using VIVO, a semantic web application that is becoming quite popular. A large number of universities worldwide build their researcher profile systems on it. It originally came from Cornell University, so there is a huge uptake in the US in particular, and as a semantic web application it has a couple of very nice benefits for this sort of purpose. One is that it provides a very rich ontology to model information about researchers, research-related activities, and organizations such as institutes, schools and groups; in terms of activities, we can model publications, grants and other research output. And of course it's also easy to add third-party ontologies, or your own, to bring in even more data.
01:27
Now, when we developed the Research Hub, one of the main aspects we wanted to cover was that people would not have to maintain their profiles themselves. In that spirit, we tried to get as much data as possible from various Griffith enterprise systems, and from external systems where available. At the moment, researchers really only have to add a photo if they want one, a short bio statement and maybe a research statement; everything else, including academic degrees, employment history, publications, grants, supervision and so on, is drawn from enterprise systems, and we get the same information about institutes, groups and schools.

However, one problem we came across was that each enterprise system was built for a specific purpose, and that purpose was usually not to display the data publicly. For a lot of the data that's not a huge issue: publication records are fairly standardized, so we didn't have any problems there. Grant information in particular, however, was not very well covered in our systems. Sometimes that was simply because we weren't the managing organization, so if titles, amounts and so on changed later, the change wasn't necessarily reflected in our systems. The other reason is that the reporting purposes the systems were built for didn't necessarily require descriptions and similar details.

So for the Research Hub, we identified two business cases where we could use external grant data and really add some value. The first was to improve data on existing grants: get better descriptions and full funding amounts, such as the total grant amount rather than just the share Griffith University received. The second was that, while we knew about grants that had some affiliation with Griffith, we knew nothing about grants researchers had held while they were not at Griffith University. Adding that information became quite important: although it doesn't showcase any Griffith research, it is an important part of our researchers' biographies and gives a much more complete picture, especially because we do have historic information about publications and so on. Not having the grants left a gap that many people were eager to close, and again, we didn't want people to enter this information manually, so the end goal was to automate as much of it as possible.

This is where the ANDS Research Grant API came in. As mentioned in the previous talks, it draws from the same data sources as the Research Data Australia portal, so it has very comprehensive information, especially about ARC and NHMRC grants. It also provides a nicely cleaned-up version of this grant information: details that may not be well captured in a standardized vocabulary in the source data have been cleaned up and are now provided in a very usable form.
04:24
The API is based on Solr, a very easy-to-use and well-documented enterprise search engine, so using this data was actually quite easy for us. For the first business case we didn't have to do very much: we could basically look up grants by their grant ID and funding body.
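As an illustration of this lookup, here is a minimal Python sketch that builds a Solr query URL from a grant ID and a funding body. Because grant IDs are not necessarily unique across funding bodies, the funding body is applied as a filter query. The endpoint URL and the field names `grant_id` and `funder` are hypothetical placeholders, not the documented schema of the ANDS Research Grants API; the `q`, `fq`, `wt` and `rows` parameters and the `response`/`docs` layout of the JSON reply are standard Solr.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: replace with the Solr "select" URL given in the
# ANDS Research Grants API documentation.
SOLR_SELECT_URL = "https://example.org/solr/grants/select"


def solr_phrase(value: str) -> str:
    """Quote a value as a Solr phrase, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_grant_lookup_url(grant_id: str, funder: str) -> str:
    """Build a Solr query URL for a single grant.

    "grant_id" and "funder" are placeholder field names, not the API's
    documented schema. The funder filter is needed because grant IDs are
    not necessarily unique across funding bodies.
    """
    params = {
        "q": f"grant_id:{solr_phrase(grant_id)}",
        "fq": f"funder:{solr_phrase(funder)}",
        "wt": "json",  # ask Solr for a JSON response
        "rows": "2",   # two rows suffice to detect an ambiguous match
    }
    return f"{SOLR_SELECT_URL}?{urlencode(params)}"


def single_grant(solr_json: dict):
    """Return the only matching document, or None for zero or several matches."""
    docs = solr_json.get("response", {}).get("docs", [])
    return docs[0] if len(docs) == 1 else None
```

For example, `build_grant_lookup_url("XX0000000", "Australian Research Council")` (a made-up grant ID) produces a URL whose `q` parameter is the URL-encoded form of `grant_id:"XX0000000"`; the decoded JSON reply would then be passed to `single_grant`, which refuses to guess when the lookup is ambiguous.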
04:43
Grant IDs are not necessarily unique across funding bodies, but the lookup itself was quite easy. We got the record back as JSON, and all we really had to do was map those fields to our RDF vocabulary and do a few related lookups for people in our database to link everything up properly. All in all, it was a very easy process. We did most of this work quite a while ago, about a year and a half ago I think, maybe a bit longer, and at that time many of the text fields still contained much of the actual information, such as funding amounts, so we did a fair bit of text processing to extract it. Since then ANDS has done a lot of work on improving this, and we're now getting a much cleaner version of the data, so anyone who wants to get into this area now and use this information is in a really good position to get very clean data.
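The mapping and text-processing steps just described might look roughly like the Python sketch below, which turns one JSON grant record into N-Triples lines and falls back to pulling a dollar figure out of free text when no structured amount is present. The JSON keys `title`, `amount` and `description`, and the predicate `http://example.org/hub#totalAmount`, are hypothetical; `vivo:Grant`, `rdf:type`, `rdfs:label` and `xsd:decimal` are real vocabulary terms. Treating the largest figure in a description as the total is an assumed heuristic, not the method used at Griffith.

```python
import re

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
VIVO_GRANT = "http://vivoweb.org/ontology/core#Grant"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"
# Hypothetical predicate for the total award amount; substitute the
# property your ontology actually uses.
TOTAL_AMOUNT = "http://example.org/hub#totalAmount"

# Dollar figures such as "$1,234,567" or "$450000.00" in free text.
AMOUNT_RE = re.compile(r"\$\s?(\d{1,3}(?:,\d{3})+|\d+)(?:\.(\d{2}))?")


def extract_amount(text: str):
    """Return the largest dollar figure in the text, or None.

    Treating the largest figure as the total is an assumed heuristic:
    descriptions may also mention per-year amounts.
    """
    amounts = [
        int(whole.replace(",", "")) + int(cents or 0) / 100
        for whole, cents in AMOUNT_RE.findall(text)
    ]
    return max(amounts, default=None)


def literal(value: str) -> str:
    """Serialize a plain N-Triples string literal."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def grant_to_ntriples(record: dict, grant_uri: str) -> list[str]:
    """Map one JSON grant record to N-Triples lines.

    The keys "title", "amount" and "description" are hypothetical
    placeholders for whatever fields the API response actually contains.
    """
    s = f"<{grant_uri}>"
    triples = [f"{s} <{RDF_TYPE}> <{VIVO_GRANT}> ."]
    if record.get("title"):
        triples.append(f"{s} <{RDFS_LABEL}> {literal(record['title'])} .")
    amount = record.get("amount")
    if amount is None:
        # Older records carried the amount only inside free-text fields.
        amount = extract_amount(record.get("description") or "")
    if amount is not None:
        value = f"{float(amount):.2f}"
        triples.append(f'{s} <{TOTAL_AMOUNT}> "{value}"^^<{XSD_DECIMAL}> .')
    return triples
```

With cleaner API data the `amount` field would normally be present and the regex fallback would rarely fire; it is kept separate so it can be dropped without touching the mapping.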
05:42
The second business case was a lot more difficult. As we just heard in the talk on researcher identifiers, it's still very difficult to get that information for our researchers at the moment: ORCID is not very common yet, and we don't get ORCID identifiers from the API or from the funding bodies. So to get historic grants for researchers that had nothing to do with Griffith, we had to come up with a way of matching researchers by name, and for that we built a two-stage scoring function.

The first stage simply looked at name similarity and gave us some idea of whether two names could refer to the same person. We put a lot of empirical work into that, because sometimes people go by a preferred name and sometimes by their actual first name, and some people always include their middle name while others don't. Then there is still the problem that names are not unique, so we added a second score based on the fields of research people had published in. We have very good information about that in the Research Hub, so we could build a portfolio of the FoR (Field of Research) codes each person had previously published in. We went by the assumption that if someone had a grant in the past with a certain FoR code, they would have at least one publication with that FoR code as well.

We then had to implement some additional handling for edge cases where grants were actually managed by Griffith and we had information about them, but the people attached to them were at different institutions, and link all of that up; but that was relatively easy once we had the matching up and running. I can't actually give any numbers about how well we're doing, but empirically it has worked quite well: in practice, over the last one and a half years, I think we've had about two or three false positives where people informed us that the data was incorrect. We also built in functionality to manually add and remove grants while still ingesting the data automatically.

So both of these business cases were very successful, and that was largely thanks to how easy the ANDS API was for us to access and use. To wrap up, I've put up some links to the systems involved. The first is our Research Hub. The second, for those who are interested and may not know about it already, is the VIVO project, which is definitely worth a look for anyone interested in getting into the space of researcher profile systems. The last is the documentation for the ANDS API, and since it's based on Solr, there are a lot of additional resources on the web. That's all from me.
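The two-stage name matching described in the talk can be sketched in Python as follows. This is an illustrative reconstruction, not Griffith's implementation: the `Researcher` record, the list of titles, the 0.9 surname-similarity cut-off, the 0.7 weight for an initial-only match and the comparison of FoR codes at the four-digit group level are all assumptions. Stage 1 uses `difflib.SequenceMatcher` from the standard library to tolerate small spelling differences in surnames and compares only first and last name tokens, so an included or omitted middle name does not matter. Stage 2 encodes the stated assumption that a researcher who held a grant with a given FoR code has at least one publication with that code.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import Optional

# Honorifics ignored when parsing investigator names (illustrative list).
TITLES = {"dr", "prof", "professor", "assoc", "a/prof", "mr", "mrs", "ms"}


@dataclass
class Researcher:
    surname: str
    given_names: list[str]                # legal given names, first one first
    preferred_name: Optional[str] = None  # e.g. "Bob" for "Robert"
    for_codes: set[str] = field(default_factory=set)  # FoR codes of past publications


def _tokens(name: str) -> list[str]:
    words = (w.strip(".,").lower() for w in name.split())
    return [w for w in words if w and w not in TITLES]


def name_score(r: Researcher, grant_name: str) -> float:
    """Stage 1: how plausibly a grant investigator name refers to r.

    Only the first and last tokens are compared, so a middle name present
    on one side and absent on the other does not lower the score.
    """
    tokens = _tokens(grant_name)
    if len(tokens) < 2:
        return 0.0
    surname_sim = SequenceMatcher(None, r.surname.lower(), tokens[-1]).ratio()
    if surname_sim < 0.9:  # tolerance for spelling variants (assumed)
        return 0.0
    accepted = {n.lower() for n in r.given_names[:1]}
    if r.preferred_name:
        accepted.add(r.preferred_name.lower())
    first = tokens[0]
    if first in accepted:
        return surname_sim
    if len(first) == 1 and any(n.startswith(first) for n in accepted):
        return 0.7 * surname_sim  # initial only: weaker evidence (assumed weight)
    return 0.0


def for_score(r: Researcher, grant_codes: set[str], digits: int = 4) -> float:
    """Stage 2: 1.0 if the grant shares a FoR code with r's publications.

    Codes are truncated to `digits` characters (group level by default,
    an assumption) before comparison.
    """
    portfolio = {c[:digits] for c in r.for_codes}
    return 1.0 if any(c[:digits] in portfolio for c in grant_codes) else 0.0


def match_score(r: Researcher, grant_name: str, grant_codes: set[str]) -> float:
    """Combined score: a name match counts only if the FoR portfolio agrees."""
    return name_score(r, grant_name) * for_score(r, grant_codes)
```

A candidate grant would then be attached when `match_score` exceeds a threshold tuned empirically, with manual additions and removals overriding the automatic result, in the spirit of the override functionality mentioned in the talk.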