DISCOVERY - Using Linked Data relationships to enhance discovery and mitigate bias
Formal Metadata
Number of Parts | 14 | |
License | CC Attribution 3.0 Germany: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor. | |
Identifiers | 10.5446/60262 (DOI) | |
Transcript: English (auto-generated)
00:00
is Juliet Hardesty from Indiana University Libraries. She has a research focus on enhancing inclusivity. Please, Juliet, go ahead.
00:20
Hello. My name is Julie Hardesty, and I'm the metadata analyst at Indiana University Libraries. Thank you for the opportunity to share this work concerning linked data, metadata bias, and the use of controlled vocabularies in academic research. In getting started, I wish to acknowledge the Miami, Delaware, Potawatomi, and Shawnee people
00:42
on whose ancestral homelands and resources Indiana University Bloomington is built. The work I am sharing today is an attempt to improve the experience of researchers from these communities and researchers learning about these communities. And hopefully, what you are seeing today provides more substance than just the words of a land acknowledgment.
01:02
The problem I'm working on is that controlled vocabularies used in galleries, libraries, archives, and museums for cataloging and description can misrepresent or even use harmful terminology when describing groups of people, particularly groups of people who experience marginalization or systemic oppression.
01:21
This example here, Indians of North America, is a Library of Congress Subject Headings (LCSH) term, provided by a major vocabulary in the United States. It's a problematic overgeneralization and a misnaming of how Indigenous people identify with their communities. Larger controlled vocabularies like the Library of Congress
01:40
subject headings are slow to update. So even if they try, they're generally behind on the terminology used by a community of people. This is also showing on the screen the library catalog search interface at my institution. And as you can see, the subject is used in tens of thousands of records at this point.
02:00
What I'm proposing is that in addition to continuing to encourage large controlled vocabularies like the Library of Congress subject headings to update their terminology so their description more accurately reflects groups of people, I'm also interested in turning the lens. And instead of relying on the large generalized controlled vocabulary for subject access, use the controlled vocabularies
02:21
from these communities of people to look through the larger vocabulary to get more relevant and accurate search results from these systems. Linked data has the potential to provide these connections in a way that can be automated and maintained, I think. The approach here is to use a linked data vocabulary from a community that has experienced
02:41
systemic marginalization and connect their terms to LCSH as that community defines those connections. The proof of concept that I'm showing today is an application that provides an interface for accessing a system using Library of Congress subject headings by connecting the term list from the community to searches based on the linked data match terms.
03:03
So the application here is a proof of concept, and it's available on GitHub. It's a JavaScript file and an HTML file at this point. It can just run locally in the browser. Any searches that are conducted will require an internet connection to reach the library catalog that's being searched.
03:20
But otherwise, viewing the interface of this proof of concept information retrieval aid can just be done locally. The screenshot here shows what the interface looks like. The vocabulary list of terms is a scrollable list on the left side of the screen. When one of those terms is selected, the right side of the screen is filled in to show the preferred term from the vocabulary in the center.
03:42
Then the surrounding bubbles show connections to other terms and the relationships used for those connections. The top bubble is showing LCSH connections for exact matches and close matches. The left and right bubbles are showing narrower or broader terms from the vocabulary. And the bottom bubble is showing
04:01
any "used for" alternative terms and related terms from the vocabulary. Homosaurus is the linked data controlled vocabulary that we are using for this proof of concept. It's an international LGBTQ+ linked data vocabulary. The local term list provided in the proof of concept as JSON-LD is from homosaurus.org,
04:21
but includes only the Homosaurus terms that actually provide LCSH connections. The Homosaurus is used by the Digital Transgender Archive, and their implementation is what turned this vocabulary into a linked data vocabulary. Originally, the Homosaurus was created by IHLIA LGBTI Heritage in the Netherlands.
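The term list and bubble layout described above can be sketched roughly as follows. This is a minimal Python illustration, not the code of the proof of concept (which is a JavaScript and HTML file). The record layout, the field names, and the placeholder term are assumptions made for this example, and which of the left and right bubbles holds narrower versus broader terms is also assumed.

```python
from typing import Any

# Hypothetical, simplified term records. Field names are assumptions for
# illustration and are not taken from the Homosaurus serializations.
TERMS: list[dict[str, Any]] = [
    {"prefLabel": "AIDS activists", "exactMatch": ["AIDS activists"],
     "closeMatch": [], "narrower": [], "broader": [], "usedFor": [], "related": []},
    {"prefLabel": "Gender roles", "exactMatch": ["Sex role"],
     "closeMatch": [], "narrower": [], "broader": [], "usedFor": [], "related": []},
    # Placeholder term with no LCSH connection, so it is left out of the local list.
    {"prefLabel": "Placeholder term", "exactMatch": [],
     "closeMatch": [], "narrower": [], "broader": [], "usedFor": [], "related": []},
]


def has_lcsh_connection(term: dict[str, Any]) -> bool:
    """Keep a term only if it has an exact or close match to LCSH."""
    return bool(term.get("exactMatch") or term.get("closeMatch"))


def bubbles(term: dict[str, Any]) -> dict[str, Any]:
    """Group one term's relationships the way the interface arranges them."""
    return {
        "center": term["prefLabel"],                      # preferred term
        "top": {"exact match": term["exactMatch"],        # LCSH connections
                "close match": term["closeMatch"]},
        "left": {"narrower": term["narrower"]},           # side is an assumption
        "right": {"broader": term["broader"]},            # side is an assumption
        "bottom": {"used for": term["usedFor"],
                   "related": term["related"]},
    }


linked_terms = [t for t in TERMS if has_lcsh_connection(t)]
for term in linked_terms:
    print(bubbles(term))
```

Filtering first keeps the scrollable list limited to terms that can actually lead to a search against the LCSH-based catalog.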
04:43
And the interface design that you see in the proof of concept information retrieval aid is also based on the design of the search aid at IHLIA LGBTI Heritage. On the other side of the proof of concept is the system that we're searching against. And this is Indiana University's online library catalog.
05:02
It's also called IUCAT. This is a library catalog that uses Library of Congress subject headings. IU is a major research university in the United States, and its catalog contains millions of results, as you can see here. So here are the basics of how we're viewing the functional connections between
05:20
the Homosaurus vocabulary, LCSH, and the library catalog. So this is showing the Homosaurus full vocabulary list. And this is an HTML view of one of the terms in that vocabulary, AIDS activists. And the HTML view that the Homosaurus provides includes an external exact match to id.loc.gov,
05:42
which hosts the Library of Congress linked data vocabularies. And there's also a JSON-LD serialization provided for this record, which at this point does not include that external exact match information. The exact match identified in the Homosaurus is to this term in the Library of Congress subject headings;
06:01
this is the HTML view of AIDS activists there. So we have a Homosaurus term with an identified connection to an LCSH term. This means we have a term from the community that we know we can use for a subject search in the library catalog that is also using LCSH terms. And here are the results of searching for this LCSH subject in the library catalog.
06:23
There are exact matches that are identified by the Homosaurus to LCSH terms. And there are close matches that are also identified by the Homosaurus to Library of Congress subject heading terms. So there's definitely something to work with here. So now I'll take you through some quick example screenshots and discuss what we are doing with the connections
06:42
that are being provided. And again, I'll remind you that the list of terms here is only including the Homosaurus terms that have LCSH connections identified. So it's not the full Homosaurus vocabulary. So when an exact match is provided by the Homosaurus, we're mapping that into the JSON-LD serialization
07:01
that we're using locally, using the exactMatch property from the SKOS data model. When the exact match provided by the Homosaurus is literally an exact match of the LCSH term, both terms linked here, the exact match from LCSH in the top bubble and the preferred Homosaurus term, conduct subject searches in the library catalog.
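As a rough illustration of that mapping, the sketch below builds one JSON-LD record that carries a SKOS exactMatch link. The SKOS namespace is the standard one, but the two URIs are made-up placeholders (not real Homosaurus or id.loc.gov identifiers), and the record shape is an assumption rather than the layout of the local term list.

```python
import json

SKOS = "http://www.w3.org/2004/02/skos/core#"


def term_record(term_uri: str, pref_label: str,
                exact: tuple[str, ...] = (),
                close: tuple[str, ...] = ()) -> dict:
    """Build a small JSON-LD record with SKOS mapping properties."""
    record = {
        "@context": {"skos": SKOS},
        "@id": term_uri,
        "@type": "skos:Concept",
        "skos:prefLabel": pref_label,
    }
    # Only add a mapping property when the community has defined that link.
    if exact:
        record["skos:exactMatch"] = [{"@id": uri} for uri in exact]
    if close:
        record["skos:closeMatch"] = [{"@id": uri} for uri in close]
    return record


# Placeholder URIs for illustration only.
record = term_record(
    "https://example.org/community-vocab/aids-activists",
    "AIDS activists",
    exact=("https://example.org/lcsh/aids-activists",),
)
print(json.dumps(record, indent=2))
```

A close match would be recorded the same way under `skos:closeMatch`, which keeps the two kinds of connection distinguishable for the interface.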
07:21
And we can see the terms are the same here for AIDS activists. And then this is again the IUCAT search results for doing that subject search for that LCSH term. If the Homosaurus term is different from the exact match term identified from LCSH, the LCSH term is still a subject search in the catalog,
07:42
but the Homosaurus term is a keyword search instead of a subject search. So in this example, gender roles is the Homosaurus term, but that won't work as a subject search since it's not an actual subject term in the Library of Congress subject headings. The exact match identified by Homosaurus is sex role, and that term can be used as a subject search.
08:03
So sex role is used as the subject search in the catalog with the results that will come back from items identified by that LCSH subject heading. And gender roles will be a keyword search and will return records that have that term match in any indexed field in the catalog.
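The rule just described (the LCSH term is always a subject search, and the community term is a subject search only when it is identical to the LCSH term) could be sketched like this. The catalog URL and the `search_field` and `q` parameter names are hypothetical placeholders, not IUCAT's actual search syntax, and comparing labels case-insensitively is an assumption.

```python
from urllib.parse import urlencode

# Placeholder catalog endpoint; a real discovery system will differ.
CATALOG_SEARCH_URL = "https://catalog.example.edu/search"


def search_url(query: str, field: str) -> str:
    """Build a catalog search link for the given field ('subject' or 'keyword')."""
    return f"{CATALOG_SEARCH_URL}?{urlencode({'search_field': field, 'q': query})}"


def links_for_exact_match(pref_label: str, lcsh_label: str) -> dict[str, str]:
    """Choose the search type for a community term and its LCSH exact match."""
    same = pref_label.strip().lower() == lcsh_label.strip().lower()
    return {
        # The LCSH heading is a real subject term, so it is always a subject search.
        "lcsh": search_url(lcsh_label, "subject"),
        # The community term is a subject search only if it is also the LCSH heading.
        "community": search_url(pref_label, "subject" if same else "keyword"),
    }


print(links_for_exact_match("AIDS activists", "AIDS activists"))
print(links_for_exact_match("Gender roles", "Sex role"))
```

Falling back to a keyword search means the community term still reaches records where it appears in any indexed field, even though it is not an LCSH heading.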
08:21
This provides a way to see this terminology not only from the community's perspective, but it also helps expand the search terms used when looking for a topic. For some terms, the Homosaurus provides a connection to an LCSH term, and that connection is defined as a close match instead of an exact match. We provide that in the local JSON-LD term list
08:42
using the closeMatch property from the SKOS data model and link that term to conduct a subject search in the library catalog as well. So this example is for non-binary people, and this is an example of a term that shows a difference coming from the community versus what is used in the Library of Congress subject headings. And in seeing that the Library of Congress
09:02
is using gender nonconformity, and that's what the Homosaurus term's exact match is pointing to, it shows that the Library of Congress term is potentially showing some judgment in "nonconformity". So if there are "used for" terms identified,
09:21
those identify terms that are alternatives to the Homosaurus term. So the Homosaurus term is the preferred term to use, but those other terms might provide more context or different kinds of results. These terms are connected to the library catalog as a keyword search. I also want to show when description is being provided,
09:42
there are some terms in the Homosaurus where there is a description provided with the term, and that can be really important to provide context. So this is an example of a term that is part of the Homosaurus vocabulary, but as you can see from the description, it's an outdated term for lesbians and gay men. So it's a term that's part of the vocabulary,
10:00
but there are some qualifications to its use. This is another example that shows a term that is defined as only to be used in historical context. So it's a similar sort of situation. And then last, I want to show that this term, LGBTQ+ movement from the Homosaurus, is showing an update in terminology
10:22
from what is defined as the exact match in LCSH. So this is something where LCSH is potentially not up to date with the terminology that the community is using. So the current status of the proof of concept is that nothing at this point is automated
10:43
or machine-read. So the linked data connections are helping to define, but not to implement. The connections from Homosaurus to LCSH are coming from the community; they're coming from the folks that are managing the Homosaurus vocabulary. So I think it's important to amplify what the community is providing
11:01
as the connections that they see from their terminology to the Library of Congress subject headings. But at this point, the Homosaurus is not publishing those connections as part of any of the linked data serializations yet. The external exact matches are not yet provided in, for example, the JSON-LD form of the records.
11:23
So even this case is still forming and their implementation of these connections might still change. There's been a little bit of evaluation, but there is more evaluation that's needed. So preliminarily, we showed this to the LGBTQ+ Culture Center librarian at Indiana University.
11:40
And in their experience, they thought that offering the lens of this vocabulary into the library catalog would be useful for those new to the LGBTQ+ community and new to LGBTQ+ topics and issues. But they weren't as certain about the interface terminology we're using, where we've got the bubbles that are labeled with preferred and broader and narrower and used for, those types of terms;
12:01
they were concerned that those might not be easy for researchers to understand. So we're contemplating whether there's other, better terminology that can be used to help people navigate those different relationships. But we still have a need to evaluate this with researchers new to LGBTQ+ topics and issues to see if it does in fact help people to understand more about these topics
12:23
and about this community as they're doing the research. And we also need to evaluate this with researchers familiar with LGBTQ+ topics and issues to see if it does help them expand and get more relevant results than what they would otherwise be getting just directly in the library catalog. We also need to recognize that a vocabulary
12:40
from a community is not the vocabulary from a community. So the Homosaurus might be really useful for a certain lens into LGBTQ+ topics, but it's not necessarily a lens to use for all topics related to the LGBTQ+ community. So that's also something to keep in mind. And last, we have additional work,
13:02
which I want to try and consider as opportunities. Some of the things that we're still looking into: are there other linked data vocabularies from other communities that can offer a similar lens on researching and learning? There are other vocabularies like the Homosaurus that exist for communities that have experienced marginalization
13:20
and should have descriptive representation centered on the terminology they're using. The classification schemes and CV spreadsheet that's linked here on this slide is gathering what I have found so far working with colleagues at Indiana University and elsewhere. But there might be other resources like this that are gathering these vocabularies together. And there are likely other vocabularies out there
13:42
that are not listed in what I've found so far. And then I'm also still considering: what about vocabularies that are not linked data? There are other vocabularies that are not linked data that also have LCSH connections defined. The Xwi7xwa Library at the University of British Columbia has the BC First Nations Subject Headings,
14:02
which has terms from the Indigenous communities as well as terms from LCSH, but it's a PDF document and it's not linked data yet. So is there something that can be done with that even though it's not linked data at this point? And then other questions I have around implementation and automating this:
14:21
is an API necessary to automate how this can work? The Homosaurus does not have an API endpoint yet, but each term does have machine-readable serializations available. So is there something that can be worked with there even if there's not an API endpoint? Is a tool like this something that can be made configurable to switch out vocabularies
14:40
or switch out discovery systems? So on either side. So there's still a lot to consider and work through to determine how something like this can be most useful and effective to provide vocabularies from communities to better represent themselves on their own terms for search and discovery in library and other cultural heritage institution discovery systems.
15:00
So thank you very much for your attention today and special thanks to the folks who've contributed and helped with this project so far. Please do get in touch and check out the GitHub repo. And I think there are opportunities here to use linked data to improve research and discovery outcomes and disrupt metadata bias. Thank you.