The University Library of Tübingen has been faced with an increasing number of requests from different faculties to transform relevant scientific achievements from the past - like type- and sometimes even partially handwritten bibliographies, registers and dictionaries - to a form that lends itself to the import into a database based retrieval system. In a considerable number of cases such works comprise the lifework of an individual person and partly have been an indispensable tool for generations of scientists, but despite their relevancy still only exist in a limited quality printed form, since smaller subjects lack the resources to provide a structured digital edition. Since these works are often both highly idiosyncratic in their structure (i.e. they follow individual rules for the formation of the information given), contain a plethora of abbreviations, have a high entropy (i.e. nearly every token is relevant) but have in many cases never been written with the intent of providing the unambiguous syntactical structure needed for computer processing, they pose a considerable challenge for the transformation process. Moreover, the text scope is normally too large to allow economic manual transformation, but nonetheless small in terms of the currently needed amount of data necessary to train an individual model. This, accompanied by the fact that the institution only has restricted resources for these kind of projects, leads to the question, whether and how generally accessible methods can be used to achieve feasible results. |