New Approaches for Extracting Heterogeneous Reference Data
Extracting heterogeneous references from texts, in particular from historical documents and from humanities or legal scholarship, remains an unresolved problem, yet there is currently no coordinated effort to develop solutions. In May 2023, we therefore invited scholars and practitioners from the social sciences, the humanities, and the informational and computational disciplines to a workshop at the Max Planck Institute for Legal History and Legal Theory in Frankfurt/Main. There, we aimed to define the problem(s), establish the state of the art, and share resources. The overarching aim of the event was to find ways to jointly develop new tools and workflows that can unlock previously untapped reference/citation data in the humanities, law, and the social sciences. A particular focus was on newly emerging technologies based on (pre-trained) language models. For more information, see https://mpilhlt.github.io/reference-extraction/.