Obtaining Training Data - Different Tasks, Different Options

Boulanger, Christian

Wagner, Andreas

Max-Planck-Institut für Rechtsgeschichte und Rechtstheorie

Formal Metadata

Title

Obtaining Training Data - Different Tasks, Different Options

Title of Series

New Approaches For Extracting Heterogenous Reference Data

Number of Parts

Author

Wagner, Andreas

License

CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Identifiers

10.5446/68328 (DOI)

Publisher

Boulanger, Christian

Wagner, Andreas

Max-Planck-Institut für Rechtsgeschichte und Rechtstheorie

Release Date

2023

Language

English

Producer

Reinold, Fabian

Production Year

2023

Production Place

Frankfurt am Main

Content Metadata

Subject Area

Information Science

Genre

Conference/Talk

Abstract

Scholars who want to use the best available mechanisms for reference extraction invariably have to train the available tools for their particular dataset (and, eventually, for the particular task of reference extraction, too). Unfortunately, training routines and, even more cumbersome, the data formats required by such routines differ considerably between the most promising tools. To provide some orientation both for scholars planning a reference extraction project and for those considering offering data they hold as training data, this presentation discusses (a) the most important data formats and (b) data preparation workflows for such training data. The first part covers the data formats themselves and mentions some conversion mechanisms. The second part briefly presents various tooling options for the manual or semi-manual tagging of references in source texts.