
Improving Named Entity Recognition in the Biodiversity Heritage Library with Machine Learning


License: CC Attribution - ShareAlike 3.0 Unported
Abstract
Scientific names are important access points to biodiversity literature and significant indicators of content coverage. The Biodiversity Heritage Library (BHL) mines its content using the open source Global Names Recognition and Discovery (GNRD) tool from the Global Names Architecture (GNA) suite of machine learning and named entity recognition algorithms, to extract scientific names to index and attach to page records. The 2017 BHL National Digital Stewardship Residents (NDSR) are working collaboratively on a group of projects designed to deliver a set of best practices recommendations for the next version of the BHL digital library portal. NDSR Residents Katie Mika and Alicia Esquivel will discuss (i.) BHL and the significance of taxon names, (ii.) the current workflow, proposed improvements, and example workflows for linking content across scientific names including semantic linking to biodiversity aggregators such as Encyclopedia of Life and the Global Biodiversity Information Facility, (iii.) how to use scientific names for content analysis, and (iv.) optimizing manuscript transcription of archival content, which introduces problems like outdated and common names, misspellings, and antiquated taxonomies to GNA tools. Authors invite questions, comments, and discussion from audience members as the Residents prepare to submit their final recommendations at the end of the year.
Transcript (English, auto-generated)
Hi everyone, I am Alicia Esquivel and I'm here today with my colleague Katie Mika. We are two research fellows for the Biodiversity Heritage Library. We have spent the last year researching and developing a set of best practices and recommendations for the next iteration of BHL.
So today we are going to be discussing biodiversity informatics and semantic links between the BHL library collection and domain data repositories, specifically the links that occur between scientific names. Briefly, the Biodiversity Heritage Library, or BHL as we will refer to it often today, is a consortium of natural history and botanical libraries that digitize their material and make it freely available for people to use online. As of December 2017, there are over 53 million pages available online for people to read and use. Our project is funded by IMLS. We are National Digital Stewardship Residents, and for the past year, like I said, we have been working on improvements for the next iteration of BHL.
My main project has been to conduct a content analysis of the collection, and today I'm going to be talking about how scientific names are indexed in BHL, what these names are currently linked to, and how I was able to use these names to conduct a content analysis on the corpus. Katie will be speaking about how to create structured data from some of the archival material that's currently in BHL and how to move forward towards linked data in biodiversity informatics. So this is a look at the current BHL data model. I hear some laughter in the audience.
This model is organized in tables and stored in a relational database. The magnified insert that you can maybe see here is about the scientific names in BHL, and I'm going to talk a little bit more about how those are indexed out of the text and what BHL is currently doing to link out to other biodiversity informatics resources. Scientific names are important because names can be used as an index point in connecting different types of biodiversity data, including literature, observational data, images and videos, genetic data, and taxonomic catalogs. The majority of biodiversity data does relate to scientific names, which makes them a great index point for getting between these different types of data.
BHL indexes its scientific names from the OCR text using the Global Names Recognition and Discovery (GNRD) tool, which is made by Global Names Architecture. Global Names Architecture is a system of web services that helps people register, find, check, and organize biological scientific names and interconnects online information about species. There is a suite of different tools made by Global Names Architecture, and I'm going to be talking about three of them today: the Global Names Usage Bank, the Global Names Index, and Global Names Recognition and Discovery. The Global Names Index and the Global Names Usage Bank are central components of the Global Names Architecture, and they are what actually allows for semantic linking across the scientific databases. The Global Names Recognition and Discovery tool is what BHL specifically uses to index its own scientific names.
The Recognition and Discovery tool uses two different algorithms, called TaxonFinder and NetiNeti. TaxonFinder is a dictionary-based tool that uses the Global Names Index as its thesaurus to search for scientific names throughout the text. NetiNeti is a naive Bayesian algorithm that uses machine learning to identify and discover scientific names, and it does this based on the particular letter combinations that are common in scientific names. For example, the training data would teach NetiNeti that a scientific name looks like a two-word phrase that often begins with a capital letter and ends in "-us". Those are some of the indications of a phrase being a scientific name.
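To make that concrete, here is a minimal sketch of a naive Bayes name finder in the spirit of NetiNeti. The features, toy training phrases, and use of scikit-learn are all illustrative assumptions, not the actual NetiNeti implementation.

```python
# Illustrative naive Bayes scientific-name detector (not NetiNeti itself).
import re
from sklearn.feature_extraction import DictVectorizer
from sklearn.naive_bayes import BernoulliNB

def name_features(phrase):
    """Surface features of the kind a learned name finder might rely on."""
    words = phrase.split()
    return {
        "two_words": len(words) == 2,
        "first_capitalized": bool(words) and words[0][:1].isupper(),
        "rest_lowercase": all(w.islower() for w in words[1:]),
        "latinate_ending": bool(re.search(r"(us|um|is|ii|ae)$", phrase)),
    }

# Toy labelled phrases (hypothetical): 1 = scientific name, 0 = not.
training = [("Felis catus", 1), ("Quercus alba", 1), ("Rattus rattus", 1),
            ("United States", 0), ("next Tuesday", 0), ("the genus", 0)]

vec = DictVectorizer()
X = vec.fit_transform(name_features(p) for p, _ in training)
clf = BernoulliNB().fit(X, [label for _, label in training])

print(clf.predict(vec.transform([name_features("Homo sapiens")])))  # -> [1]
```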
The semantic linking done through scientific names is made possible by the Global Names Usage Bank, which indexes every usage of a scientific name and the resources in which they are mentioned. Globally unique identifiers are given to three concepts in the Global Names Usage Bank: agents, references, and taxon name usage instances (TNUs). Agents are typically the people or organizations that are authors of the references; references are the published or unpublished works that the names are found in; and taxon name usage instances are every time a name is used in those references. So references can have multiple TNUs. The TNUs are used to decipher protonyms; a protonym is the first time a name is used to identify an organism. TNUs contain unique and persistent identifiers. They contain links to the reference, at the page level if possible, which is done in BHL, and links to the protonym TNU. They also include an indication of the taxonomic rank, the exact spelling of the name within the reference, and a link to the TNU that represents the immediate parent taxon. This is what the data model just for the TNUs looks like, and in this way different spellings of a name are brought together into one taxonomic concept.
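As a rough paraphrase of that data model, a TNU record can be thought of as something like the following; the field names are restatements of the concepts just described, not the actual Global Names Usage Bank schema.

```python
# Illustrative paraphrase of a taxon name usage (TNU) record; field
# names are assumptions, not the real Global Names Usage Bank schema.
from dataclasses import dataclass
from typing import Optional

@dataclass
class TaxonNameUsage:
    guid: str                         # globally unique, persistent identifier
    reference_guid: str               # the work the usage appears in
    page_link: Optional[str]          # page-level link (as done in BHL)
    protonym_guid: str                # TNU of the name's first use
    parent_taxon_guid: Optional[str]  # TNU of the immediate parent taxon
    rank: str                         # indication of taxonomic rank
    verbatim_spelling: str            # exact spelling within the reference
```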
Currently, BHL is linked to the Encyclopedia of Life through scientific names, and users can search by scientific name when they perform an advanced search. There's an option to search by scientific names, and I'm actually going to run through that process with you right now. Okay, so this is the advanced search page and the scientific name box. I will run a search on this scientific name, and the results show up here. If I click the first result, I get every instance in BHL, at the page level, where this name is mentioned.
So I can link directly to this page. The name is here on this page, and to the side is every scientific name that is found on this page, and there are a lot, because this is an index. These link out to the Encyclopedia of Life. BHL serves as the literature background for the Encyclopedia of Life, which aggregates data from multiple biodiversity data sources, and here are those literature references from BHL hosted at the Encyclopedia of Life. There are also maps that come from occurrence records from the Global Biodiversity Information Facility, and there are more scientific names here, organized by different taxonomic backbones. Now I'm going to step back to the original results in BHL, and you can see that these species and subspecies are not actually linked together; this is just a search for the string. If you look at all of these name sources, we have a display similar to the one in the Encyclopedia of Life. However, these are mostly static, and you can't really browse through the collection in this way.
Which is why, when I wanted to do a content analysis based on scientific names, I had to do it this way. First I had to take a download that is freely available on BHL; it's just a CSV file of all name instances that are found in the text. I can then feed that back through a Global Names tool that resolves each name to a particular taxonomic reference source, and then take those results and filter them to the taxonomic kingdom level. Quickly, that looks like this. It's a very easy tool to use: you can upload CSVs into the resolver tool. Make sure that you have headings that map correctly to the Global Names headings; in the simplest case, all that you need is a "scientific name" heading and a list of names. You select a data source; I used Catalogue of Life. And this is what the results page looks like. This sample size was only about 30 names, and it took two seconds to process. For the whole BHL corpus, I was able to run a hundred thousand names at once, which had to be done 38 times, and each batch took about 45 minutes. The results come back in a CSV file, and this is part of those results, just the classification paths.
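For readers who would rather script this step than use the upload form, something like the following is one way to batch names against the Global Names Resolver web service. The endpoint, parameters, and response fields follow the resolver's documented JSON API as best I recall, so treat this as a hedged sketch and verify against the current Global Names services before relying on it.

```python
# Hedged sketch: querying the Global Names Resolver for a small batch of
# names against Catalogue of Life (data source id 1). Verify endpoint,
# parameter names, and response fields against the current documentation.
import requests

names = ["Felis catus", "Quercus alba", "Puma concolor"]
resp = requests.get(
    "http://resolver.globalnames.org/name_resolvers.json",
    params={"names": "|".join(names), "data_source_ids": "1"},
    timeout=60,
)
resp.raise_for_status()
for datum in resp.json()["data"]:
    for result in datum.get("results", []):
        print(datum["supplied_name_string"], "->",
              result.get("classification_path"))
```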
I was then able to take these classification paths and run a Python script over them to filter them all to the kingdom level, and this is what those results look like. This is the whole BHL corpus split into taxonomic kingdoms: the unique names are in dark gray, the total occurrences of names are in light gray, and in the middle is a species estimate for the names. This gives the collections committee an idea of which kingdoms are better represented in BHL.
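The kingdom filtering itself can be as simple as splitting each classification path on its delimiter and counting; the column names below are assumptions about the resolver's CSV export, not verified headers.

```python
# Tally resolver results by kingdom: the first segment of a path like
# "Animalia|Chordata|Mammalia|...|Felis catus" is the kingdom. Column
# names are assumptions about the export format.
import csv
from collections import Counter

totals, uniques, seen = Counter(), Counter(), set()

with open("resolver_results.csv", newline="") as f:
    for row in csv.DictReader(f):
        path = row.get("classification_path") or ""
        kingdom = path.split("|")[0] or "Unresolved"
        totals[kingdom] += 1
        name = row.get("supplied_name_string", "")
        if name not in seen:              # count each name string once
            seen.add(name)
            uniques[kingdom] += 1

for kingdom, count in totals.most_common():
    print(f"{kingdom}: {count} occurrences, {uniques[kingdom]} unique names")
```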
And there is potential to use visualizations like this to browse through the collection. So now I will pass this along to Katie, who's going to talk about field notebooks in BHL. Okay, hi. My project with BHL has largely focused on archives collections. One immediate issue that comes up all the time is that creators of manuscript items don't generally refer to taxa by their valid scientific names, so Global Names has a hard time identifying and indexing outdated, vernacular, and common names. In considering BHL's primary users, it became pretty clear that scientists and taxonomists are increasingly interested in doing large-scale computational research across our collections, in concert with biodiversity big data repositories: the Global Biodiversity Information Facility, or GBIF; the Ocean Biogeographic Information System, or OBIS; and the Encyclopedia of Life, which Alicia already showed you. So my question was whether it is possible, or useful in any way, or sustainable, to extract some of this vital research data, like species occurrence events, which are records of where and when a particular taxon was observed and recorded.
And then transform that into a format or schema that can be understood by these aggregators. In this example you can see an ornithologist's field notes, which combine scientific Latin binomial taxon names with symbols that refer to counting or indexing. Below that, you can see a lot of common vernacular names and abbreviations, and just a mess of unstructured handwritten data that's trapped in an image. There are some text mining programs and natural language processing systems that we can run over OCR or transcribed material, but they're often not good enough to manage the complex and outdated linguistic models that are really common in 18th- and 19th-century informal documents. Specifically, we would run into problems with negations and counting, trait descriptions that span and jump across different observations, interpreting tables or illustrations when they pop up, and understanding relationships between entity types like locations, abbreviations, and taxon names. As Alicia mentioned earlier, our research focuses on improving the next version of the BHL portal, and one thing they're particularly interested in implementing is a new IIIF-compatible image delivery system.
My research initially focused on how to use something like IIIF annotations to identify occurrence events in manuscript collections and in grey literature, and then use linked data to connect them to the published literature, to taxon treatments within that literature, to these biodiversity big data repositories, to digitized specimen collections from natural history museums, to taxonomic nomenclatures; there's a lot of stuff. But unfortunately, this is sort of a story of failure. There isn't really any linked data available for biodiversity so far; there's no knowledge graph yet. There are some ideas. These are a few conceptual models of what biodiversity informatics agents and data owners are imagining as a domain knowledge graph, along with some linked data models for Darwin Core and Darwin Core Archive files, which are the schemas we use to represent this data.
But there is no widespread adoption; no one has really implemented it to the extent that you can make it usable, and biodiversity data is still largely stored the way we store ours: in tables, in giant, awful relational databases. Also, several of our member institutions are already engaged in crowdsourcing transcription programs. These are the three platforms that are probably most widely used. Essentially, they turn images of manuscript items into machine-readable text, but they support many different levels of encoding: some can really only give you plain text, while others can encode it completely into TEI XML. And we have some people who are starting to implement IIIF and are using the Open Annotation data model and the Web Annotation data model. But since we are a consortial institution, we have a lowest common denominator: we need to be able to take in many different types of data, so we have to start at the bottom. This is going to be kind of a low-tech description of what we're doing, or what we're hoping to do soon. So generally I'm walking it back, and my question is now morphing a little bit into how we can use these existing transcription programs
and transform their output into some kind of interoperable schema, hooking them into the biodiversity data universe in a way that isn't going to impede future linked data efforts. The simplest, lowest-tech way is to just tag instances of taxa, locations, and dates; these are the primary components that are absolutely required to declare species occurrences. We can then also parse any kind of tag that our members give us, with whatever data export they use. We can parse XML really easily, and the Open Annotation or W3C Web Annotation models are fine, too. We can also utilize the indexed names identified by the Global Names recognition service that Alicia just told you about. This is an example of MediaWiki syntax, which is going to be the simplest, most flexible, easiest-to-implement process for collecting some kind of structured data from even just plain-text transcriptions.
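As a minimal sketch of that tagging step, here is one way tagged taxa, locations, and dates could be pulled out of a plain-text transcription. The {{taxon|...}}-style template markup is an invented stand-in for whatever tag syntax a given transcription platform actually exports.

```python
# Pull hypothetical wiki-style tags out of a plain-text transcription;
# the tag syntax is illustrative, not a specific platform's format.
import re

page = ("Saw two {{taxon|Sitta carolinensis}} near {{location|Rock Creek}} "
        "on {{date|May 3, 1899}}; heard {{taxon|white-breasted nuthatch}}.")

TAG = re.compile(r"\{\{(taxon|location|date)\|([^}]+)\}\}")
tags = {}
for kind, value in TAG.findall(page):
    tags.setdefault(kind, []).append(value.strip())
print(tags)
# {'taxon': ['Sitta carolinensis', 'white-breasted nuthatch'],
#  'location': ['Rock Creek'], 'date': ['May 3, 1899']}
```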
Once data is tagged, we just extract it, and then we need to do some intellectual work to create the relationships between the different data that we pull out. This is going to depend on the transcription platform: some will export XML, some will export CSV files of tagged subjects, and some will export just the plain text without parsing anything. But in general, the idea is that you want to associate names with their locations and dates. You can see in this CSV file that each row is an occurrence record, and the columns are the Darwin Core terms that you can use to keep going.
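One simple association rule, sketched below, is to pair every taxon tagged on a page with that page's location and date; real notebooks need more careful scoping than this, so treat it as an illustrative assumption rather than the actual workflow.

```python
# Illustrative association step: every taxon on a page is paired with
# the page's (single) location and date to form occurrence rows.
def to_occurrences(tags):
    locality = (tags.get("location") or [None])[0]
    event_date = (tags.get("date") or [None])[0]
    return [{"scientificName": taxon,
             "locality": locality,
             "eventDate": event_date}
            for taxon in tags.get("taxon", [])]

tags = {"taxon": ["Sitta carolinensis", "white-breasted nuthatch"],
        "location": ["Rock Creek"], "date": ["May 3, 1899"]}
print(to_occurrences(tags))
```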
This is just a slide about reconciling data. We use OpenRefine, along with an R package called taxize that helps do some taxon referencing to turn common names into standard names. I also used the R package ggmap to turn location strings into coordinates; it works very well, I was surprised by that. And then the Python library dateparser standardizes dates.
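For the date step, something like the following works with the dateparser library; the sample strings are invented, but dateparser.parse is the library's standard entry point.

```python
# Standardizing free-form field-notebook dates with dateparser.
import dateparser

for raw in ["May 3rd, 1899", "3 May 1899"]:
    parsed = dateparser.parse(raw)
    print(raw, "->", parsed.date().isoformat() if parsed else "unparsed")
```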
Once we have good data, we will crosswalk it into an official Darwin Core Archive file, which is still the standard schema for biodiversity data. One of the simplest ways, again, is just using a CSV file: we use Darwin Core terms as headers, and we can create URIs for occurrence records. So the occurrence record will have a URI for future linked data references, and then we can also add URIs for our bibliographic data.
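A minimal sketch of that CSV crosswalk follows, using a handful of real Darwin Core terms (occurrenceID, scientificName, locality, eventDate) as headers; the URI pattern minted here is purely illustrative, not an actual BHL identifier scheme.

```python
# Write occurrence rows with Darwin Core terms as CSV headers and a
# minted URI per record (the URI pattern is an illustrative assumption).
import csv
import uuid

rows = [{"scientificName": "Sitta carolinensis",
         "locality": "Rock Creek", "eventDate": "1899-05-03"}]

with open("occurrences.csv", "w", newline="") as f:
    writer = csv.DictWriter(
        f, fieldnames=["occurrenceID", "scientificName",
                       "locality", "eventDate"])
    writer.writeheader()
    for row in rows:
        row["occurrenceID"] = f"https://example.org/occurrence/{uuid.uuid4()}"
        writer.writerow(row)
```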
I don't think that's on this one, but farther down in the CSV you can see that all of our bibliographic data will have a URI. GBIF doesn't have linked data, but they do have a validation service, so you can give them whatever data you have, or whatever you would like them to index, and they will validate it for you, which is pretty cool. And then a final step is to go back and officially resolve all of the scientific names in the data that's been tagged and extracted, and make sure that when it goes back into BHL, Global Names can still find it and index it.
So now these data have been added to the Global Biodiversity Information Facility and other domain-specific databases for occurrence data. They've been identified by Global Names, indexed in BHL, and added to the taxon bibliography. Clearly we're at the very beginning of our linked data journey, and we would love some advice, if you have any, from anybody who has worked with really complicated vocabularies like taxonomic nomenclatures specifically, or who is trying to hook into a knowledge base that isn't really supporting linked data yet.
Okay, all right. We don't have a ton of time for questions, but while the next speakers are setting up their computer, if someone has a quick one... I'd also recommend you check out BHL's blog; they do some really good stuff. Any quick questions? Oh, okay, Tom. Oh, sorry.
Hello. Thank you for the talk. Having biodiversity data as linked data presupposes that one would have persistent, maintained URIs for taxa. Do you see any organizations that are candidates for providing persistent URIs for taxa? Definitely not. That's sort of the main problem: there are a lot of conflicting issues with the naming part of it. Organizations will be able to create URIs for name strings that are valid and that people can agree on, but the problem is when you connect that to an organism that might have a disputed taxonomic status. So there has to be a way to be a bit more flexible, I think: separating the concept of the organism from the name, which can be hard, because one concept can have many names and one name can have many concepts of organism. That sounds like a great topic for y'all to chat about over coffee, because I'm sure Tom has ideas to help with that.
Thank you again very much.