Indexing of Special Collections for Increased Accessibility
Formal Metadata
Number of Parts | 14
License | CC Attribution 3.0 Germany: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/37250 (DOI)
Transcript: English(auto-generated)
00:00
We are going to do this as a team because we want to share with you this project that we're doing with the assistance of Access Innovations. The libraries at the University of Florida are transforming access to library collections by transforming the way we catalog our collections and acknowledging the decreasing reliance on the online public access catalog for discovery.
00:28
The vision statement of the libraries calls for providing service at the point of need and digital collections are essential to that. Charles mentioned that Georgetown is a research one university. The University of Florida is also, we're a very large university, about 53,000 students.
00:46
Over a third of them are in graduate school or professional degree programs. We have six medical and health science colleges, over $770 million in research. So it's a very large, complex university.
01:01
We're also a land-grant university, so we have agricultural extension offices in every county in Florida, and we also have numerous research centers and other facilities outside of our home campus in Gainesville. Most on-campus students take one or more online courses per semester, and remote online degree programs are growing.
01:23
Service to users who are not physically in the libraries is very important, even though student use of the libraries is very high and growing. Historically, when we digitized materials from our print collections, the MARC record was the initial and often the only descriptive metadata, and Library of Congress subject
01:41
headings from the library were often the only controlled vocabulary assigned to the digital item. I'm not a fan of Library of Congress subject headings. Google has taught people to search with natural language, and using LCSH is often like asking our users to learn a foreign language.
02:30
I have to go way back. I'm sorry. Let's start over. I apologize. Got a little... This is it? Yes. Thank you. I'm sorry.
02:42
I guess I can't set my papers there, so maybe we'll have to hold them for you, Margie, when it's your turn, too. So our UF digital collections are very large and growing very quickly, so we are looking for automated processes to expand and enhance our metadata. Improved and consistent metadata practices must be defined and then rigorously applied prospectively,
03:04
and metadata for existing content must be brought up to these new standards. This requires new tools and changing roles and responsibilities for our cataloging and metadata staff. We have a very large repository, the UF Digital Collections, which has more than 300
03:24
digital collections, over 545,000 items, and over 13 million pages submitted by UF libraries and partner institutions. We have several major initiatives that are driving us to make these changes in our metadata.
03:44
For example, Florida History is one of our preeminent collections at UF, both in print and digital form, but the content is drawn from many different collections: rare books, manuscripts, political papers, newspapers, maps, government documents, university archives, et cetera. The challenge is to identify all of that digital content so it can be aggregated and presented in a
04:04
coherent body of information, which we call the Portal of Florida History. As a result of this project, we turned to Access Innovations for assistance with this effort, and we jointly decided to begin with the digital and digitized theses and dissertations of
04:23
UF graduates. So what you see here is the Portal of Florida History. In 2016, we began a pilot project, and we worked specifically with the theses
04:43
and dissertations. And as you probably well know, if you were also involved in digitization, digital could be just a new microfilm, which is straightforward to create, but very difficult to
05:00
search and retrieve. The Florida Thesis Project, as we called it, was taking all the digital and digitized theses and dissertations, a collection of about 25,000 items covering a broad range of disciplines, to see if we couldn't enhance the
05:23
metadata for those to make them more retrievable. For about the last 10 years, the University of Florida has been receiving digital deposits of electronic theses and dissertations and performing retrospective scanning on the older dissertations. So it's a good-sized collection, and they have also begun the retrospective
05:44
scanning of all the previous theses. What we did was take software from the Data Harmony collection, the Access Innovations SIS, and build a project to hold the data, to build a repository.
06:01
The data was extracted from the digital collection, including the existing metadata, which is in MARC and METS XML. We tested against three different controlled vocabularies, three thesauri whose coverage was very broad:
06:23
News Indexer, the National Information Center for Educational Media, and JSTOR, which covers a large repository. After we did all the testing and assessment, we determined that we really needed two thesauri. One would be the topical one.
06:43
In this case, we chose JSTOR, and the second one would need to be Florida-specific authority terms. So we have a combination of the JSTOR Thesaurus modified for Florida, and a Geographic and Great Floridians file as well.
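As a rough illustration of the two-vocabulary approach just described, the sketch below tags a record against one topical thesaurus and one Florida-specific authority list. It is only a minimal phrase-matching sketch: the terms, trigger phrases, field names, and the `tag` and `enhance` functions are all hypothetical and are not taken from the JSTOR thesaurus, the Florida files, or the Data Harmony software.

```python
import re

# Hypothetical mini-thesauri: preferred term -> trigger phrases.
# Every entry here is invented for illustration.
TOPICAL = {
    "Citrus fruits": ["citrus", "orange grove", "grapefruit"],
    "Wetlands": ["wetland", "marsh", "swamp"],
}
FLORIDA = {
    "Everglades (Fla.)": ["everglades"],
    "Alachua County (Fla.)": ["alachua county", "gainesville"],
}


def tag(text, thesaurus):
    """Return the sorted preferred terms whose trigger phrases occur in the text."""
    lowered = text.lower()
    hits = set()
    for preferred, phrases in thesaurus.items():
        for phrase in phrases:
            # Anchor only the start of the phrase so simple plurals still match.
            if re.search(r"\b" + re.escape(phrase), lowered):
                hits.add(preferred)
                break
    return sorted(hits)


def enhance(record):
    """Copy the record and add controlled terms from both vocabularies."""
    text = record["title"] + " " + record["abstract"]
    enhanced = dict(record)
    enhanced["topical_terms"] = tag(text, TOPICAL)
    enhanced["florida_terms"] = tag(text, FLORIDA)
    return enhanced


thesis = {
    "title": "Marsh restoration in the Everglades",
    "abstract": "A study of wetland hydrology.",
}
print(enhance(thesis))
```

Keeping the two vocabularies separate mirrors the decision in the talk: any record that picks up a term from the Florida list can be routed to a Florida-specific portal regardless of its topical subject.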
07:02
We also tagged the data for the Florida Thesis Project, and this shows you some before and after sample records. I don't know if this will work, does it?
07:21
Well, on your left, you see the original catalog record, which is very thin. There aren't very many keywords. There's not much in the line of genre. And on the right, you see the enhanced record, which has considerably more terms listed, those
07:44
that come from JSTOR, as well as the additional ones from the Florida thesaurus. And you can see that there's a great deal more information for you to hang on to. Students can add their keywords as they submit their theses or dissertations, but they're not accurate as metadata. And libraries
08:04
often add the LCSH headings, but as you know, particularly for special or grey literature collections, those are very thin. And they end up with words like biography, theses, and electronic theses or dissertations. So out of the four genre headings, three of them are not particularly
08:25
useful for searching, particularly for conceptual searching. As you see from these examples, the first two and the last four lines remain the same in both records, but the automated process added 12 additional controlled vocabulary terms, which are thesaurus and topical terms
08:43
specific to the geography of Florida, and therefore made the records automatically inclusive for the Portal of Florida History. What you see here on the left is an example of some of the University of Florida collections. And then
09:04
we move to the arrow on the top, which shows XML records exported from the University of Florida digital collections into the SIS repository. From that repository, we can export MARC records
09:20
to OCLC, we can export for staff internal use and review, we can put them into a local repository with the enhanced metadata, and return them to the University of Florida digital collections. So as a result of all this work, the University of Florida now has a taxonomy of Florida-specific
09:42
terms that it can maintain and expand to use and manage both print and digital collections. But also because of the use of the taxonomy as the default search in the University of Florida digital collection Lucene software, instead of using the normal default in Lucene, which is full text, we get
10:02
80% or better accuracy in retrieval. Retrieval in just full text would be 55 to 60%, which gives an awful lot of false drops and frustration to the user groups. If we just add the terms to the full text, we only get a 6 to 7
10:23
percent increase in accuracy, so searching on the controlled vocabulary terms is very, very important. So we have done the theses, we have proved the workflow, and begun work on some additional collections as well.
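To make the comparison concrete, the sketch below computes precision, the share of retrieved items that are actually relevant, for two result sets. Reading the "accuracy" figures in the talk as precision is an assumption (the mention of false drops suggests it), and the item identifiers and result lists are invented toy data, not the project's measurements.

```python
def precision(retrieved, relevant):
    """Fraction of retrieved items that are relevant (0.0 for an empty result set)."""
    retrieved = set(retrieved)
    if not retrieved:
        return 0.0
    return len(retrieved & set(relevant)) / len(retrieved)


# Invented toy data: items t1..t4 are relevant to the query, x1 and x2 are false drops.
relevant = {"t1", "t2", "t3", "t4"}
full_text_hits = ["t1", "t2", "t3", "x1", "x2"]  # matches any mention of the query word
vocabulary_hits = ["t1", "t2", "t3", "t4", "x1"]  # matches assigned subject terms only

print(precision(full_text_hits, relevant))   # 0.6
print(precision(vocabulary_hits, relevant))  # 0.8
```

Each false drop in the result list lowers precision, which is why searching on assigned terms rather than on every word of the full text can change the figure so sharply.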
10:43
We have built the crosswalk between the UFDC collections and the SobekCM catalog at the University of Florida, and now that we've done the assessment of the search results and enhanced the metadata, the pilot itself is concluded.
11:05
Do our little dance here. I will say that we were very pleased with the results of the pilot. The one record is just obviously a tiny sample, but many of the records ended up with that equivalent quality of
11:21
search terms. So we're very eager now to apply these tools and processes to the rest of our digital collections. So SobekCM, which Margie mentioned, is the open source software platform that we use, and it supports 11 subject metadata fields, which have not been used consistently over time. This is certainly not a
11:43
unique occurrence, but it's one that needs to be addressed to improve access to our digital collections. We need to standardize the use of the fields. That's essential to development and application of the enhanced metadata and to support advanced search focused on one or more fields. All topical subject terms will be assigned to a single field. Geographic terms, place names,
12:03
corporate names will each be assigned a single field. They'll have authority files. The existing terms will be mapped to the appropriate fields and replaced with controlled vocabulary terms. This will, as Margie said, give much greater precision. So, for example, José Martí could be an author. He could be a subject. He could be part of a place or corporate name.
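The mapping step described above could look something like the following sketch, which routes legacy subject strings to one field per term type and swaps known variants for an authorized form. The field names, the authority lists, the variant table, and the `normalize` function are hypothetical illustrations, not the SobekCM schema or the actual UF authority files.

```python
# Hypothetical authority files, one per target field (illustrative entries only).
AUTHORITIES = {
    "topical": {"Poetry"},
    "geographic": {"Havana (Cuba)", "Gainesville (Fla.)"},
    "personal_name": {"Martí, José, 1853-1895"},
    "corporate_name": {"Biblioteca Nacional José Martí"},
}

# Hypothetical lookup from legacy variants (lowercased) to authorized forms.
VARIANTS = {
    "jose marti": "Martí, José, 1853-1895",
    "la habana": "Havana (Cuba)",
}


def normalize(legacy_terms):
    """Route each legacy term to a single field; collect unmatched terms for review."""
    fields = {name: [] for name in AUTHORITIES}
    unresolved = []
    for term in legacy_terms:
        cleaned = term.strip()
        authorized = VARIANTS.get(cleaned.lower(), cleaned)
        for field, authority in AUTHORITIES.items():
            if authorized in authority:
                fields[field].append(authorized)
                break
        else:
            # No authority file recognizes the term, so leave it for a cataloger.
            unresolved.append(term)
    return fields, unresolved


fields, unresolved = normalize(
    ["Jose Marti", "La Habana", "Biblioteca Nacional José Martí", "misc"]
)
print(fields)
print(unresolved)
```

Because each authority file feeds exactly one field, the same name string lands in different fields depending on which authorized heading it resolves to, which is the distinction the José Martí example is drawing.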
12:23
We're doing a major project with the Biblioteca Nacional de Cuba José Martí, and it's going to show up as a corporate name in that context. As with the Portal of Florida History project, we need to apply automated tools and techniques to existing collections in the Digital Library of the Caribbean and also to new collections submitted to dLOC, including
12:45
our new Cuban Heritage Collections. This will be particularly important for improved retrieval across collections that have come from different institutions with different metadata schemas, vocabularies, and languages. We hope that this process will allow us to continue to have metadata in the native language of the submitted material, but also in English
13:05
so that there will be an easier common search across disciplines. One of the new collections that will go into the Digital Library of the Caribbean is resulting from our agreement with the Biblioteca Nacional in Cuba to create deep, broad open access to the Cuban Heritage Collections and into dLOC.
13:25
The British – the Biblioteca Nacional estimated that 58 percent of its Cuban heritage materials are uniquely held in Cuba, and it is committed to digitizing those materials. But it's asking UF and its partners outside of Cuba to identify sources and digitize the other 42 percent, making bibliographic control
13:44
essential to avoid duplication of effort and make the collection as comprehensive as possible. They have shared their bibliographic records with UF, and we have agreed to establish an OCLC symbol for them, which we will manage and make sure that all of their Cuban heritage records are available in WorldCat. We will also – in the project management database.
14:05
However, 16,000 of the records that they provided to us while digital are not in MARC format. They're merely scanned images of catalogue cards, such as you can see on the slide, had I changed that slide. So we once again
14:20
turned to Access Innovations for assistance with conversion of these catalogue cards. Since we've already had the three-minute thing, I'll just point out to you very quickly that –
14:41
What you see here, those of you that are of an older generation might actually recognize what those are. And on the left, you see some of the conversion that the Biblioteca Nacional did. But for those where there was not a conversion
15:03
to MARC, what we got was a card that was undifferentiated in any way, so it was just text. And so we tried to separate that text and we were fairly successful in separating the text into a full bibliographic record. And for search purposes, we needed to separate out not
15:22
only the call number and the name and the title and so on, but we also wanted the place of publication and date of publication and the publisher, because for searching grey literature in general, those are important fields. And so we didn't want just a blob of undifferentiated material. What you
15:42
find through this thing that we fairly quickly described to you is that we are actually creating the records in the SIS database, no longer creating them in the cataloging application or the ILS. And that means that with the records created in SIS, we are
16:06
exporting to OCLC and the OPAC and, as appropriate, to the University of Florida digital collections as well. So this is a more up-to-date version of the image that we showed you.
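The idea of creating one source record and exporting it to several targets can be sketched as follows. This is only an illustration under stated assumptions: the record layout and both export functions are hypothetical, the first output is a MARC-like text display rather than a valid MARC 21 record, and nothing here reflects the actual export routines of the repository software.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical source record held in the repository (all values invented).
record = {
    "title": "Example thesis title",
    "author": "Doe, Jane",
    "topical_terms": ["Wetlands"],
    "geographic_terms": ["Everglades (Fla.)"],
}


def to_marc_like(rec):
    """Render the record as MARC-style display lines (not a valid MARC 21 record)."""
    lines = [f"100 1  $a {rec['author']}", f"245 10 $a {rec['title']}"]
    lines += [f"650  4 $a {term}" for term in rec["topical_terms"]]
    lines += [f"651  4 $a {term}" for term in rec["geographic_terms"]]
    return "\n".join(lines)


def to_dublin_core(rec):
    """Render the same record as a simple Dublin Core XML fragment."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("record")
    pairs = (
        ("title", [rec["title"]]),
        ("creator", [rec["author"]]),
        ("subject", rec["topical_terms"]),
        ("coverage", rec["geographic_terms"]),
    )
    for tag, values in pairs:
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{tag}").text = value
    return ET.tostring(root, encoding="unicode")


print(to_marc_like(record))
print(to_dublin_core(record))
```

Both outputs are derived from the same dictionary, so a correction made once in the source record flows to every export target.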
16:22
You see the SIS input panel, then you see the repository itself, and from the repository, we're exporting the records to all of the other places. So we are not cataloging originally in MARC, but rather inverting the cataloging process to create a metadata record in SIS and export
16:42
it as a catalog record. This gives us a very streamlined workflow. It goes title by title and catalogers find it a very fast way to work. Once it's in the repository, we can output in any number of formats, including HTML,
17:02
and all the records can be both print and electronic. So to conclude, I wanted to go back, assuming it will let me change the page, I wanted to go back to the original title, which Margie
17:21
actually created for us, the death of the library catalog with a big question mark. And I want to come back to that title because I think when we look at these new metadata tools and techniques, you can ask the question, does this signal the death of the library catalog? And clearly most libraries continue to invest significant time and resources in
17:42
cataloging and in the integrated library systems that host our catalogs. But our students who have grown up with Google are much less likely to turn to the OPAC for discovery. So the answer to the question is not yet. But as research libraries like ours continue to place more and more emphasis on digital collections, there will be reduced emphasis on traditional title by
18:02
title cataloging and greater emphasis on automated metadata generation. As Margie said, we're inverting the traditional cataloging process. Automated metadata will become the source of MARC records rather than MARC records continuing to be the source of metadata.