The challenge of organisation-based publishing (grey literature) for Wikidata
Formal Metadata
Number of Parts | 36
License | CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/51066 (DOI)
Transcript: English (auto-generated)
00:00
All right, so we've got another talk scheduled, but I haven't got those people available yet, so we might skip ahead to Amanda Lawrence. Amanda's a researcher and librarian who works in the areas of research communications,
00:21
media, and publishing systems, digital collecting, open scholarship, grey literature, which is what she's gonna be talking about today, knowledge infrastructure, and public policy. She's currently completing a PhD on research communications and public policy in the media communications department at RMIT University. I hope that wasn't out of date. I got it from your ORCID, I think.
00:42
Amanda was the director of the Analysis and Policy Observatory, an award-winning public policy research database, from 2006 to 2018, where she managed a number of Australian Research Council linkage and infrastructure grants. Amanda's worked in various roles in the book and cultural sector, including establishing the literature residency programme
01:02
and the author touring programme at the Asialink Centre at the University of Melbourne. So I'll hand you over to Amanda, who's going to give us a presentation on the challenge of institution-published grey literature.
01:22
Great, thanks, Alex. Maybe make the, I think I'm quite large, so if you want to make the slides a little bit bigger. Yeah, okay, yeah, great. Get rid of me entirely, fantastic. So yes, thanks for that introduction. I didn't realise you were going to,
01:42
I'm not sure where that bio came from. It is still accurate, although submission is hopefully going to occur very soon for my PhD and then this week, I'm hoping. So, yeah, thanks everybody for joining in this session.
02:01
I'd just like to say first up that I'm not really much of a Wikidata, WikiCite or even Wikipedia expert. I have, I'm a total dabbler, so I can't really, so Margaret's presentation was pretty, a bit advanced for me,
02:23
but I do know a fair bit about grey literature, so I'm going to focus on that and focus on why it's really important and why we need to think about how we can get it into Wikidata and WikiCite.
02:40
So firstly, I'll just talk for 15 minutes: a little bit of an overview of what we're talking about here and then some of the kinds of publications we might be wanting to deal with and what the sort of referencing issues there might be in the data model
03:00
and some of the challenges that are involved. So firstly, I think Daniel already mentioned the sustainable development goals. So the reason why I want to bring this in is to have a think about just the scale, the complexity, the diversity of topics,
03:25
of institutions, of people, of countries, of types of research or research methods that would be involved in trying to address these goals. It is not going to be done
03:41
simply by academic journal articles, although they are going to be important. It's also going to involve electoral data, it's going to involve economic data, it's going to involve government research, it's going to involve thousands of NGOs
04:00
and think tanks and consultants, et cetera. So if we want to be a source of information for the sustainable development goals, we need to expand our idea of where research is located. So I came across recently this term requisite variety
04:24
and I really like it. It comes from a sort of computing basis. So some of the audience may be familiar with it. It was new to me and it's also being adopted by people looking at sort of evidence-based policy
04:42
and we might think about that. So the quote is, an effective evidence ecosystem would seem to need a requisite variety of initiatives to encourage and enable evidence use. That is, the repertoire of initiatives to promote evidence use needs to be at least as varied and nuanced as the policy and practice context
05:01
that are being targeted. So the point is that the kinds of resources that we're going to need are going to have to be probably as complex and diverse as the problems that we face. And therefore the computing system that we provide is also going to need to be fairly complex.
05:23
So the kinds of reports that we might be thinking about in this context of the sustainable development goals and climate change, et cetera, include reports from the IPCC itself, clearly peer reviewed. So it's not true that research reports
05:41
from organisations aren't peer reviewed. UN, the environment programme, biodiversity issues and World Bank, major quasi-government organisations and international non-government organisations.
06:00
There's also a huge variety of reports produced at a national, at a local level, all around the world. So these examples are all from Australia, but they include research centres, often research centres in collaboration
06:21
with both companies and NGOs, advocacy groups, government research centres, et cetera. So my concept of what we're dealing with here is that if we think about there's research and there's research communication
06:40
and scholarly communication is a subset of that, but so is research publishing and they kind of overlap. There's commercial academic publishing and there's organisation research publishing. So a lot of what we have on WikiCite and Wikidata is commercial academic publishing. There's also market research publishing, where you can pay thousands of dollars
07:02
to get access to a market research report. And there's substantial, but still relatively small, non-profit academic publishing. So the open access movement, diamond open access journals, et cetera. So if we think about what might be then included
07:24
on Wikidata and WikiCite, we need to think beyond books and journals to think about reports, technical reports, evaluations. It's not quite clear what that type is referring to.
07:41
There's policy briefs and briefings. Again, not very well-defined on Wikidata. Discussion papers, it's on Wikidata, but actually to my mind is the wrong definition. White papers produced by consultants and governments as well. Systematic reviews, again, on Wikidata,
08:01
that kind of looks a bit like a topic. It's actually got a topic reference to the Library of Congress. So a lot of these can be topics as well as publication types. So we need to kind of make sure we're not getting those things confused. Literature reviews, review articles, there's lots of those.
08:23
It's not quite clear exactly what that's referring to. Case studies, but the type there is actually referring to a research method rather than a case study as a publication. Preprints, working papers, a case report,
08:42
that's a medical report, and scholarly articles. So it's just a sample of what we might be wanting to work on and create references for in Wikidata. So on the topic of climate change,
09:02
there are actually thousands of publications. So this is using Scholia to see what was available on Wikidata in terms of diverse kinds of content. And when you have a look at that, you've got a lot of scholarly articles and then a lot of review articles.
09:22
If we get rid of the scholarly articles and the review articles, we've actually got, on the topic of climate change, one report a year for the last few years and maybe two. So there's an incredible lack of representation
09:41
of the kinds of material that we would want to see on Wikidata about climate change that would be very relevant. On the other hand, a source like APO has got 1400 reports on climate change.
10:03
And I really sort of show this mainly just to confirm to people that the reports are out there. It's not that the reports aren't there and it's not that they aren't important and substantial research reports and this is the example that's given here from a climate change round table,
10:23
development round table, the Ministry for Environment from New Zealand. So we're not necessarily dealing with insubstantial unverifiable research. It's a lot of really substantial material. And also something about APO is to have a look
10:42
at the diverse kinds of documents that are also there, conference papers, discussion papers, working papers, strategies, briefings, articles, guides, fact sheets, policy reports. So there's a lot of diversity within this area.
11:01
So this diversity is kind of pretty problematic because it's never really been worked out. So there are various attempts underway at the moment to develop ontologies for different sort of genres.
11:20
And even the word genre on Wikidata seems to be used for literary kinds of content rather than all sorts of publication types, which is sort of how it's more usually used. So there are a number of projects looking at this.
11:41
It would be great to see a project looking specifically at kind of report literature or organisation publications. I actually prefer not to use grey literature but I use it because it's understandable for a certain segment of the community. But if we think of organisation publications
12:02
I think it's the easiest way of thinking about it. So the Confederation of Open Access Repositories has made an attempt to create a vocabulary and that's referenced on Wikidata. Some of those are really useful.
12:21
That actually is interesting. A lot of these ontologies sort of cross-reference each other. But so the COAR vocabulary reference is the GISG one from 2008. The Library of Congress has also got an extensive genre/form vocabulary.
12:44
The Library of Congress sort of has a little bit of a bias towards books. In fact most of these ontologies have a bit of a bias towards books. So they often miss, the policy area
13:00
has a lot of publication types of its own special nature. Schema.org is very random I think in what it chooses to create a genre for. So that can be used to a certain extent. But yes, something interesting.
13:21
You saw the working paper definition in the Confederation of Open Access Repositories actually describes it as unpublished documents. It references the GISG scheme which actually describes a working paper as a published document. So even if it's referencing the same thing
13:41
it can actually have the opposite definition. So there's really quite a bit more work that needs to be done here. But having kind of talked about all the complexity and differences, we can absolutely look to books
14:00
as a guide to a lot of what we would wanna be using for describing a lot of these materials. They are documents, so they have a title, they have an author, the author is often multiple authors, often institutional authors. You need to think about the instance.
14:21
So it's a publication, a report, a working paper, a technical report. It could even be multiples of those. The title, the subtitle, the description. The publisher is often an organisation, often multiple organisations. So the organisation itself may need to be added
14:41
and the country. The publication date, that can be confusing. It's interesting that the IPCC report, at least one I looked at, had two dates. This can be the case with government documents where the report has got a date
15:01
of say November of one year but it actually doesn't get released until February of the following year. The topic, so there are vast numbers of topics. So the topic is going to be quite complex and I think that's a big issue
15:21
that WikiCite needs to address is how we look at getting some sort of topic vocabulary that we can use. The URL to the full article. So this is why I asked the question,
15:40
where do we go to and what's going to happen? A lot of this material is published online by the organisation and it doesn't remain stable. So this is a big problem. The full work where that might be available, various identifiers. So a lot of reports do have identifiers,
16:00
a lot of reports don't. Some might have multiple identifiers. I have tried using identifiers to put things in and sometimes it works and sometimes it doesn't. So even with an identifier, it might not pull any content. The language, whether it's part of a series or collection, et cetera.
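The fields listed above can be sketched as a simple record type. This is only an illustrative sketch, not a proposed WikiCite schema: the class name `ReportRecord`, its field names, and the `CORE_FIELDS` list are hypothetical, and the Wikidata property IDs in the mapping are given as best understood here and should be checked against Wikidata before use.

```python
from dataclasses import dataclass, field

# Wikidata property IDs for each field, as best understood here;
# verify them against Wikidata before relying on this mapping.
WIKIDATA_PROPERTIES = {
    "instance_of": "P31",
    "title": "P1476",
    "subtitle": "P1680",
    "authors": "P50",
    "publishers": "P123",
    "publication_dates": "P577",
    "topics": "P921",
    "full_text_urls": "P953",
    "language": "P407",
    "series": "P179",
}

# Hypothetical list of fields a cataloguer would want filled in.
CORE_FIELDS = (
    "title", "instance_of", "authors", "publishers",
    "publication_dates", "full_text_urls", "identifiers",
)


@dataclass
class ReportRecord:
    """Hypothetical record for one organisation-published document."""
    title: str
    instance_of: list[str] = field(default_factory=list)   # may hold several types
    subtitle: str = ""
    alternative_titles: list[str] = field(default_factory=list)
    authors: list[str] = field(default_factory=list)       # people or institutions
    publishers: list[str] = field(default_factory=list)    # often several organisations
    publication_dates: list[str] = field(default_factory=list)  # more than one is possible
    topics: list[str] = field(default_factory=list)
    full_text_urls: list[str] = field(default_factory=list)
    identifiers: dict[str, str] = field(default_factory=dict)   # e.g. {"doi": "..."}
    language: str = ""
    series: str = ""

    def missing_fields(self) -> list[str]:
        """Return the names of core fields that are still empty."""
        return [name for name in CORE_FIELDS if not getattr(self, name)]


finch = ReportRecord(
    title="Accessibility, Sustainability, Excellence",
    alternative_titles=["Finch report"],
    instance_of=["report"],
)
print(finch.missing_fields())
```

Allowing lists for authors, publishers, types and dates reflects the points made in the talk that a report can have multiple institutional authors, multiple publishers, and even two publication dates.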
16:22
So there's a really strong basis to develop a data model for different kinds of documents based on books, but we also need to think about the differences. So this is the example where it's actually got two publication dates.
16:42
I'm not quite sure why, but it's interesting that that's possible. You can also sort of say that something follows or is followed by. So this is part of a series. So it's the case with the IPCC. It doesn't actually have any topics on it for this one. So that's a real shame.
17:01
Another example, the Finch report. So this is a good example where it's really important to put the alternative title. So although the name of this report is Accessibility, Sustainability, Excellence, it's generally known as the Finch report,
17:20
which is the case with a lot of major government reports. And I, yeah, so this is a general guide. Just be, yeah, oh, I'm sorry. I was gonna, I also looked at the link for the Finch report, and it's not a dead link.
17:42
It goes to, it's a moved link. So the content for it has been moved. I'll just, if I can do this. Yeah, so we go to the live page, this, it's got a link here. This actually goes to this page.
18:01
So it's still sort of, it's still a page. It might not register as a dead link, but as you can see, there's no copy of the Finch report. And in fact, you know, you try and look it up and well, you think you've got it, but actually this is the government's response to the Finch report. And from what I can gather,
18:21
gov.uk doesn't have a copy of the Finch report. You can find it on this website. I'm not really quite sure what this website is about, but anyway, they do have a copy of the Finch report. So, whoa, fantastic. This, so Semantic Scholar has also got a copy of the Finch report, which apparently has got a DOI.
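The moved-link problem shown here, where a page still loads but no longer holds the report, can be sketched as a small classifier. This is a minimal sketch under stated assumptions: the function name `classify_link` and its four labels are hypothetical, the HTTP outcome (final URL, status code, page text) is assumed to have been fetched already by whatever client is to hand, and matching on the title is only a rough proxy for the document actually being there.

```python
from urllib.parse import urlsplit


def classify_link(requested_url: str, final_url: str, status_code: int,
                  expected_title: str, page_text: str) -> str:
    """Classify a reference link as 'dead', 'content-missing', 'moved' or 'live'."""
    # An HTTP error status is the only case a naive dead-link checker catches.
    if status_code >= 400:
        return "dead"
    # The page loads, but the cited document's title is nowhere on it.
    if expected_title.lower() not in page_text.lower():
        return "content-missing"
    # The document is there, but at a different host or path than the one cited.
    requested, final = urlsplit(requested_url), urlsplit(final_url)
    if (requested.netloc, requested.path) != (final.netloc, final.path):
        return "moved"
    return "live"


# Illustrative values only; example.org is a placeholder domain.
print(classify_link(
    "https://example.org/reports/finch",
    "https://example.org/archive",
    200,
    "Accessibility, Sustainability, Excellence",
    "Archived publications index",
))
```

The point of the extra "content-missing" label is that a status check alone would report such a link as healthy.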
18:43
Oh yay, fantastic. Yeah, no, DOI is not found. So this is a wild west of verifiability for something that is, you know, like an extremely important government report
19:00
in terms of open access policy. So I'll just finish up. And how am I going for time? Probably going over now. So yeah, just, we can start to sort of think about what kind of standard guide we could put together
19:24
to help people catalogue these sorts of materials. I'd be really interested in working on that. There are a lot of challenges in this area. So multiple departments, name changes. There's a huge problem with machinery-of-government changes.
19:43
There are often multiple organisations, often, you know, whole groups are mentioned. It's really hard to work out who any of them actually are. Authors, there are often no identifiers. There may be multiple identifiers. There may be identifiers that actually don't even work anymore.
20:00
Copyright is often missing. So it would depend on the jurisdiction you're in as to what that means. In Australia, that means automatically all rights reserved. You know, you can have different copyright jurisdictions depending on whether something's published or unpublished. So we've only just in the last couple of years
20:21
got a duration for unpublished works. And there's often missing information. You know, no date, no author, no publisher, no location, no identifier, sometimes not even a title, it's quite incredible. So yeah, just have a think
20:40
if you're ever publishing online and to make sure that, you know, it will float around by itself. So, you know, there's a lot of potential here. There's actually such an enormous amount of material that we really need to be looking at bulk import
21:02
and how to be doing that. A lot of organisations like the UN, the WHO, World Bank have very well organised collections. They're not necessarily all that great for doing systematic review searches on. And so actually there's an enormous need
21:23
for a large scale database where you can systematically review the literature on various, on any topic. And being able to do that and include the major research reports is a serious issue. There is no large scale collection
21:40
doing that at the moment. There's lots of small ones. There is a new, I should have put it on, there's a new project called Policy Commons, which I'm sort of vaguely involved with. And so that's one option coming along, but it's a real opportunity for Wikidata and WikiCite if we can work out how to do it at scale.
22:06
Then the, you know, institutional repositories have actually often got some level of content that might be able to be mined. The British Library has a really great collection of reports. So there are various databases,
22:21
but they tend to be small to medium sized and having sort of, you know, a certain segment. So there is a real need for an overarching coordinating collection in this space. Okay, sorry, I hope I didn't go over time. I think I might've gone over time.
22:43
Don't worry at all, that's totally fine. As mentioned earlier in the session, we've unfortunately had a no-show from one of our presenters. So we are reshuffling the presenter order currently. And we've actually got a couple of questions for you, Amanda, that have been added into the etherpad.
23:01
So the first of those was a definitional question, actually. What do you reckon is wrong with the Wikidata definition of discussion paper? Oh, so I have to look it up again. I think it was that it, yeah, sorry.
23:29
No, my finger's not going to do anything. Yes, sorry. I think it was that it presumed to be coming
23:40
from a very much a research angle and wasn't, sorry, my computer's not finding it for me. No, sorry, I'm not getting it up.
24:07
Yes, sorry, that it was coming from the perspective of a research discussion paper, like preliminary findings, that sort of thing, whereas actually a discussion paper in a political context is often,
24:23
or in a research sort of public discussion context, is a lot more about kind of presenting ideas and policies for feedback. So just the nuancing around a lot of the definitions come from an academic perspective
24:43
and don't really take account of a policy or an organisation perspective. And there was actually an additional question about the APO taxonomy. So it asks, has the APO term taxonomy changed?
25:02
The link that the questioner had to climate change now returns a 404 error. So they add the link in the etherpad. It may have, Dave, actually.
25:21
So I haven't worked there for nearly two years, so I'm not responsible for that anymore. And they have actually kind of created a new taxonomy, a public policy sort of subset taxonomy, which has been added to Research Data Australia,
25:42
their taxonomy list. And so it may have been that they've been changed. Yeah, so that's all I can say. I also wanted to second the comment that you made about the difficulties in this wild west of verifiability.
26:04
I was working on a project that required me to look at a particular report a while back. And not only did that report, of course, not have a DOI or any stable identifier like that, it didn't have a stable URL that pointed to it.
26:21
And it didn't even have a stable title. So it was published, it appeared in multiple different places under different variant spellings of its title. And so in many ways, Wikidata is actually more vital for these sorts of items than it even is for academic journal publications and books, which already at least have some stability
26:43
in their citability. And actually for a lot of the organisation-published literature and reports and grey literature, actually Wikidata might end up being the primary, the primary location for collating that sort of information.
27:00
I think so. Well, I think, so my feeling is that there will be various small scale collections because of the niche nature of it. But it's really difficult to do any search at scale,
27:25
particularly like if you're doing a systematic review, you actually want to be able to, because you can do large scale searches of databases, what you really want to be able to do is separate out,
27:40
okay, we've got all the journal articles covered. What do we need to find out that's not a journal article? How do we actually search everything that's not a journal article? You cannot do that on Google or Scholar, you can't do that on Microsoft Academic. I think a lot of the databases
28:01
you can't really sort of differentiate from, I don't want to see journal articles. And then when you do get good big collections like the WHO, you can't run specialist queries on them. So I had a project last year looking at digital health,
28:21
and I was given this sort of, you know, string of 10 different terms that I was meant to sort of run on various aspects of digital health that you can put into like a commercial academic database or some of the sort of systematic review software systems.
28:42
Every time I would run this string on another database like WHO, I'd get nothing, absolutely nothing, because it just couldn't, you know, you had to actually go back to just find digital health, not any other extra qualifiers. So then there's really a desperate need
29:02
for an overarching aggregator, but also one that's able to have a lot more complex metadata, you know, around research methods and all that sort of thing that could sort of start to be added into records for different content so that that can be reused.
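The search need described in this answer, everything on a topic that is not a journal article, can be sketched as a filter over typed records. This is a hedged illustration only: the function `non_journal_records`, the record layout, and the sample entries are all hypothetical, and it assumes each record already carries a publication type and topics, which is exactly the metadata the talk says is often missing.

```python
# Publication types to exclude; hypothetical labels for illustration.
JOURNAL_TYPES = frozenset({"scholarly article", "review article"})


def non_journal_records(records, topic, excluded_types=JOURNAL_TYPES):
    """Return records on the given topic whose type is not an excluded type."""
    wanted = topic.lower()
    return [
        record for record in records
        if wanted in {t.lower() for t in record.get("topics", [])}
        and record.get("type", "").lower() not in excluded_types
    ]


# Made-up sample records, not real publications.
sample = [
    {"title": "Example journal paper", "type": "scholarly article",
     "topics": ["digital health"]},
    {"title": "Example policy brief", "type": "policy brief",
     "topics": ["Digital health"]},
    {"title": "Example technical report", "type": "technical report",
     "topics": ["climate change"]},
]
print([r["title"] for r in non_journal_records(sample, "digital health")])
```

A filter like this only works if publication types are recorded consistently, which is why the vocabulary questions raised earlier in the talk matter for search at scale.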
29:25
Fantastic. Thanks, Amanda, very much for that.