A spatial view in the culture heritage domain
Seoul, South Korea

Cultural heritage institutions host digital collections of historic maps, and these collections increasingly allow spatial-temporal searching and georeferencing of their maps. At the Saxon State and University Library Dresden (SLUB) this led to the development of the Virtual Map Forum 2.0, a spatial data infrastructure (SDI) for searching, visualizing and georeferencing plane survey sheets. This SDI mainly relies on OpenLayers 3, MapServer, GeoNetwork and GDAL. Besides that, tools for automatic georeferencing based on image recognition software have been developed and compared with crowdsourcing tools for georeferencing. A further topic on which cultural heritage institutions are focusing is the enrichment, transformation and merging of existing heterogeneous metadata sets. The goal is to allow better searching and use of digital and analog objects. At the SLUB this led to the development of the open source ETL tool d:swarm, which supports the transformation and enrichment of metadata records. This opens up possibilities for adding spatial identifiers to large numbers of library objects, such as pictures, newspaper articles or books, and thereby allows greater consideration of the spatial dimension in discovery systems. Another big topic is long-term preservation, which becomes even more important with the growing number of born-digital publications and datasets. Libraries and archives, as experts in long-term preservation, and providers of spatial data infrastructures are both confronted with tasks and questions regarding the preservation of content. They can therefore benefit from exchanging knowledge and work with each other. The presentation will give an insight into the world of cultural heritage institutions. It will present topics where FOSS4G and libraries can benefit from each other. To that end it discusses different issues from within the SLUB where FOSS4G is used or could be used and where spatial questions are involved.
The main topics are spatial-temporal searching and visualization, georeferencing, metadata enrichment and long-term preservation.
Hello all, my name is Jacob. The last talks which I gave were mainly at conferences from the cultural heritage domain, and at those conferences my talks were mostly a bit too technical for the audience. At this conference I think it is the other way round, and my talk is probably not technical enough for you. But what is my talk about? I will try to give you some small insights into topics from the cultural heritage domain, from a more or less FOSS4G point of view.
I am basically a geoinformatics guy. I came to the Saxon State and University Library in 2013 and work there in the IT department. The library is actually a quite big one: we have about 400 employees and large holdings, I think more than four million print media as well as 4.1 million picture documents. So that's enough advertising. Our house is strongly engaged in a couple of digitization projects, and we try to bring a lot of content into the digital domain. We also try to be an active member of a couple of different open source communities. For example, in Germany there is a content management system which is quite popular, called TYPO3. We are a member of its association, and we are also an active developer of a couple of extensions for this content management system. We are also an active member of other projects, for example another digitization software called Goobi. This is just to give you an idea that we do a lot with open source. But why am I telling you all this?
This slide actually has an animated picture, which is quite funny, but unluckily the animation does not play here. When I came to the library, I asked myself: why a library? I mean, what can a library have to do with spatial data or spatial questions? Libraries have books, they have newspapers, they have sheet music, pictures and so on, but where is the spatial data? Well, for me it turned out that the spatial IT domain and the cultural heritage institutions, especially the libraries, have more points of contact than I expected. Then I thought maybe there are other people out there who share the same destiny, and therefore it could be a good idea to share my experience and some projects which we are doing in our house. So I picked this topic. In the last decade
libraries have started to run more and more digitization projects, and they digitize a lot of different content. You can normally find it in their discovery systems and view it on the web. Within this digitized content there is a lot of content with a spatial dimension: we have historic maps and other cartographic products, we have books, diaries, newspapers and articles, pictures and so on. Some of these even have multiple spatial dimensions: where was this article published, and on which region does it report? We also have optical character recognition, which is run on a lot of the digitized images to extract text from them, and we can then take this text and scan it for location information. So there is a lot of potential in libraries, but unluckily most of the content is not yet ready to be used by developers or machines, and it is not yet of great use for the public. Our house is now working on two projects to make it more useful for the public, and I will try to introduce these projects in the following minutes. Further, I will try to bring in another topic which I find quite interesting and which I haven't seen a lot so far at spatial conferences. First of all,
I will introduce a project called Virtual Map Forum 2.0. This project was funded by the German Research Foundation, and we did it together with the University of Rostock. It is based on the old Map Forum. Our map collection holds something like 177,000 historic maps, and something like 25,000 of them are already digitized. These digitized maps could already be accessed and viewed through the old Map Forum. Not least thanks to a company called Klokan Technologies, a lot of other libraries, like the British Library and the National Library of Scotland, give you the opportunity to georeference historic maps and to search them by spatial and temporal criteria. We thought we wanted to go this way too, so we made an application for funding, got the funding, and built a georeferencing infrastructure and a spatial data infrastructure for our institution. That is what this project is about. Within the project we
developed a portal which allows users to perform spatial-temporal searching and visualization of the maps. Various tools support the comparison of different maps and the animation of time series. We give the user the possibility to visualize georeferenced maps on top of a base map and on top of each other, and also to view the original maps. Further, the portal functions as an entry point to our mapping services as well as to our georeferencing tools. We built a crowdsourcing georeferencing tool as well as an automatic image recognition tool for georeferencing.
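As a rough illustration of what georeferencing a scanned sheet means computationally, here is a minimal sketch that fits an affine transform to ground control points, that is, pairs of pixel positions and real-world coordinates. This is not the code of the Virtual Map Forum tools; the function names, the pixel positions and the coordinates are all made up for the example, and the result is laid out in the six-value order that GDAL uses for geotransforms.

```python
def solve3(matrix, rhs):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with pivoting."""
    a = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("control points are collinear or too few")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(3):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [a[i][3] / a[i][i] for i in range(3)]


def fit_affine(gcps):
    """Least-squares affine fit from control points.

    gcps: list of ((col, row), (x, y)) pairs, at least three, not collinear.
    Returns (x0, dx_col, dx_row, y0, dy_col, dy_row), GDAL geotransform order.
    """
    n = len(gcps)
    mean_col = sum(p[0] for p, _ in gcps) / n
    mean_row = sum(p[1] for p, _ in gcps) / n
    # Centre the pixel coordinates to keep the normal equations well conditioned.
    design = [(1.0, c - mean_col, r - mean_row) for (c, r), _ in gcps]
    ata = [[sum(d[i] * d[j] for d in design) for j in range(3)] for i in range(3)]

    def fit(values):
        atb = [sum(d[i] * v for d, v in zip(design, values)) for i in range(3)]
        offset, per_col, per_row = solve3(ata, atb)
        # Undo the centring so the offset refers to pixel (0, 0).
        return offset - per_col * mean_col - per_row * mean_row, per_col, per_row

    x0, dx_col, dx_row = fit([world[0] for _, world in gcps])
    y0, dy_col, dy_row = fit([world[1] for _, world in gcps])
    return (x0, dx_col, dx_row, y0, dy_col, dy_row)


def pixel_to_world(gt, col, row):
    """Apply a geotransform to a pixel position."""
    return (gt[0] + col * gt[1] + row * gt[2],
            gt[3] + col * gt[4] + row * gt[5])


# Hypothetical sheet: four detected corners (pixels) and their known coordinates.
corners = [
    ((100, 80), (13.5, 51.4)),     # upper left
    ((5100, 80), (14.0, 51.4)),    # upper right
    ((5100, 4080), (14.0, 51.0)),  # lower right
    ((100, 4080), (13.5, 51.0)),   # lower left
]
geotransform = fit_affine(corners)
print(pixel_to_world(geotransform, 2600, 2080))  # centre of the sheet
```

With exactly four sheet corners the fit is nearly trivial; the same function also accepts more control points, which is the situation when a person sets the points by hand.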
To sum this up: georeferencing with both approaches was quite successful. We have already georeferenced about 5,700 plane survey sheets, and the quality was mostly really good. Only in about 1 percent of the cases was the quality bad. Automatic recognition is an easy task for plane survey sheets, because we only have to do edge detection, as you can see here. Automatic image recognition becomes quite challenging, however, when you go to other map types, like island maps. There you have to integrate a user, a person who sets the reference points. For that reason we will focus more in the future on improving our crowdsourcing georeferencing tool, because we now have a lot of other map types which we want to share with the public and with scientists. Once the maps are georeferenced, they are published through our historic maps SDI. The SDI relies completely on FOSS4G components, and actually there are really great tools out there. The data which we publish through our SDI is mostly licensed under CC BY-SA and CC0. We support multiple service interfaces like WMS, WCS, WFS, CSW and so on. The metadata of the maps is published in conformance with ISO 19115. This is quite untypical for the library domain, but it allows us to couple our infrastructure easily with other spatial data infrastructures. Unluckily the picture on this slide does not have such good quality; the quality is better in the PDF of the presentation. What we see is that we realize our WMS and WCS support
with MapServer, as you can see here. For every georeferenced plane survey sheet we publish a layer. This is because of our access policy, which comes from the portal; it fits better with our system. But some scientists asked us whether we could aggregate all the plane survey sheets into one layer and make them time-enabled. We did this too, as a time layer, and you can query it that way. We also implemented WFS, with GeoServer. Why did we use GeoServer and not MapServer for this? Because when we compared both products, we found that GeoServer had better support for the WFS standard, and especially features like pagination were quite useful when using it as a search index. We implemented the CSW with a GeoNetwork instance, and this runs on PostgreSQL and PostGIS. Our vector data is mainly saved in PostgreSQL and PostGIS, and our raster data is saved in the plain file system as GeoTIFFs. So we have an almost complete open source stack; as you can see, it is nearly all FOSS4G. The only thing that is not FOSS is the recognition tool, which relies on proprietary software. Apart from that, you can build up this stack yourself. For example, the portal and georeferencing code is available on GitHub, although I have to say it is not yet really good for reuse, because it is quite dirty and we are still refactoring it. I will say something more about this in a minute. In the portal we use OpenLayers
3, which is an awesome library. And because we like open source quite a lot, we also use other open source projects, for example a project called Goobi.Presentation, a TYPO3 extension. It is basically a viewer for digitized objects and is used by, I think, 120 institutions in Germany, so it is widely used. I don't know whether everybody uses it, but enough big houses do. Now a short outlook for this
project. We are going to do some restructuring and refactoring of the infrastructure to increase its reusability. The goal is to port parts of our developments to a TYPO3 extension, to make them more easily reusable for other institutions in Germany, to help them publish their map collections as georeferenced maps, and to give more georeferenced maps to the public. We also want to further improve our crowdsourcing georeferencing tool so that it will support other map types. What scientists are already asking for right now are survey sheets at other scales and the geological maps. OK, so much for the historic maps.
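To make the time-enabled layer mentioned for the historic maps SDI a little more concrete, here is a small sketch of how a client could ask a WMS for the sheets of one particular year. It only builds the request URL with the standard WMS 1.1.1 `GetMap` parameters plus the `TIME` dimension. The endpoint `https://example.org/wms` and the layer name `historic_sheets` are placeholders invented for this example, not the real service of the library.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_getmap_url(base_url, layer, bbox, year, size=(512, 512)):
    """Build a WMS 1.1.1 GetMap URL for one point in time.

    bbox is (min_lon, min_lat, max_lon, max_lat) in EPSG:4326.
    year is sent as the TIME dimension value.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",
        "SRS": "EPSG:4326",
        "BBOX": ",".join(str(value) for value in bbox),
        "WIDTH": size[0],
        "HEIGHT": size[1],
        "FORMAT": "image/png",
        "TRANSPARENT": "TRUE",
        "TIME": str(year),
    }
    return base_url + "?" + urlencode(params)


# Placeholder endpoint and layer name, for illustration only.
url = build_getmap_url(
    "https://example.org/wms",
    "historic_sheets",
    (13.5, 51.0, 14.0, 51.4),
    1925,
)
query = parse_qs(urlsplit(url).query, keep_blank_values=True)
print(url)
print(query["TIME"])
```

A web map client such as OpenLayers would send the same parameters; changing only `TIME` between requests is what makes an animation of a time series possible.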
But as I said at the beginning, we have a lot more content in the libraries. A big treasure of location information lies in the metadata of books, newspapers and other library objects. They often carry a reference to a specific location, but the problem is that the spatial dimension is only represented through a location name and is therefore only considered in a text search. What would be really cool is if we could take this location name and extend the information with coordinates. Then we could create completely new portals and search interfaces, and we could also support users with new applications, or users could create their own applications with the data. But before we can do this, we mainly have to solve two tasks. First, we need a strong gazetteer. In our case the gazetteer also has to support historic place names, and maybe it has to support operations like querying the gazetteer with a timestamp and a name. So far we have not found such a good gazetteer. The second task would be to create transformation workflows that match the location names against a gazetteer and add the coordinates to the metadata records. This task doesn't sound too hard, but it can become quite challenging when you see the multiple, heterogeneous data formats which are used by the various cultural heritage institutions and libraries. So we often have problems getting it all together. At this point another open source project comes in, called d:swarm. It is developed
in our house, in cooperation with the company Avantgarde Labs. It is basically a graphical, web-based ETL tool that should enable librarians, not only developers, to import data in different formats from different sources, to create transformations and to map the data to a target schema. There is also a strong focus on open data: it tries to make the creation of transformation workflows, as well as the sharing of them, as easy as possible. What does that mean? It means that the user of the software shouldn't have to be a developer; it should be a librarian. And if you have worked in a library, you know that librarians are usually not technical people, so for them the software has to be really easy.
This is basically a screenshot of the software. You can see here, for example, the area where you do the mapping from one schema to another schema: here is the input schema and here is the output schema. Unluckily you cannot see much on this graphic. You could now, for example, take a field here and say that it goes into that field there, with this transformation workflow, and then it becomes this. This is the goal of d:swarm. What does this mean now for all
of you? d:swarm is still in the beta phase, but we already use it in our house to produce our main search index. Our library is thereby the first big library in Germany that relies completely on open source software for producing its search index. In the library domain there are some quite big companies that make a lot of money with this. But what does it mean for location-aware metadata? Right now d:swarm doesn't really have spatial functionality. We want to couple it with a gazetteer and add a transformation workflow so that librarians and other people can easily match place names against this gazetteer and enhance their records with location information, I mean with coordinates. Further, I could imagine that d:swarm could be extended with other transformation capabilities, especially regarding transformation jobs in the INSPIRE program. In the INSPIRE program a lot of agencies and institutions have to struggle with bringing data from one schema into another schema. If we could enhance d:swarm in this direction, there could be a lot of further use cases. The software is open source, so if you have such problems, you are invited to take it and extend it with that ability. I know this was already quite a lot about these two projects, and before I come to the end I want to switch to another interesting topic. The topic about which I want to speak next is one which I haven't seen much so far in the IT domain or in the spatial domain, but I find it actually quite interesting. When I started at the library, talking about digital
preservation sounded boring to me, and actually it is a bit boring. But a lot of cultural heritage institutions, and also public agencies and private companies, are confronted with it. If we look at the world, we see right now that digital content is growing at a really incredible speed, in all domains. In the library sector, for example, we have a fast-growing amount of retro-digitized objects, that means digital objects made from analog content, but also native digital objects, for example master's theses or other data that exist only in the digital domain. We have to preserve this data, because if we cannot preserve it, it will be lost, since we don't have it in analog form. Over centuries, libraries and archives have built up expertise in preserving data in the analog world, and now they have to build up this expertise again for the digital domain. You see it in this picture: in the past, when you engraved text in stone, it was quite easy to preserve it for a very long time. When you now save a file on a hard drive and don't give this file further attention, in six years it may well be lost or damaged. And this
is the task where digital preservation comes in. The goal of digital preservation is to keep data accessible and usable for 50 years plus. The main challenge there is to anticipate future scenarios; it helps to get a feeling for what could be relevant in the future. It is important to recognize that digital preservation is not a simple backup. If you save, for example, a shapefile and keep it in a backup for, let's say, 50 years, you probably cannot use it any more when you take it from the backup again. And if you could still use it, the shapefile would actually be a good candidate format for digital preservation. When it comes to practice, there are two main topics in digital preservation: bit-stream preservation and content preservation. Just a short word on both. Bit-stream preservation has to guarantee the correctness of the digital information; it has to protect against data loss through failures. Basically the main problems here are solved: there are systems which help to preserve data, and a lot of the questions are already answered. The bigger research topic right now is content preservation. It has to guarantee the interoperability and usability of the digital information in the future. The main questions here are: which file format should we use, which metadata is relevant, and how should it be preserved? I will skip this one. Now, again, a spatial view on this topic. So far we have gained a lot of experience with the digitization of spatial data, but we are only now starting to dig deeper into this topic, especially regarding our own data in the house. We really see that there is a growing demand for services in this area regarding digital preservation. I see it also in my part-time job as an IT consultant: other agencies in Saxony came to us and
asked us: we have problems preserving this kind of data, can you help us? This demand is growing, and the demands of digital preservation lead to new requirements regarding the data formats used, the data management systems and the metadata. We have to think about the specific requirements for spatial data, especially regarding digital preservation. For example: which file format should we use for raster and vector data? I think this will be a really hard task in the spatial domain, if we see how many data formats we have and how many requirements there are. In our house we are right now examining the GeoTIFF format for digital preservation. We already use TIFF for preserving digital images. TIFF is a quite simple format and a good candidate for preservation, and this is also the reason why we hope that we can use GeoTIFF as well.
We also need software that is really standard-compliant. Why do I say this? When we preserve retro-digitized data, we often see that a lot of software products actually produce TIFF files which you can read in every major viewer but which are not fully compliant with the standard. For digital preservation we then have to repair these files or reproduce them completely anew. To sum up: I know this is kind of a boring topic, and actually we don't have much to say right now regarding the spatial domain, but I am sure there will be a further growing demand for services and software in this area. I hope that FOSS will become a major actor on this topic, especially because of the aspect of openness, which brings a really big benefit for digital preservation, where data has to be usable for 50 years plus and the services and software which are used have to be updated and enhanced. So, to come to my conclusion on the last slide: I know this was quite a lot of topics, and I hope I could give you a small feeling for the topics which the cultural heritage domain is working on right now. What we saw was the Virtual Map Forum 2.0, a short presentation of d:swarm, which is still missing spatial functionality, and a short introduction to the topic of digital preservation. Actually, I could have made a separate presentation for every one of these topics, because you can tell a lot about each of them. I also hope that libraries will become more aware of their spatial data and open it to the public. I think libraries could be a good partner for FOSS, because it was always their mission to make information more open to the public. Libraries also have to make a shift now from the analog domain to the digital domain.
Openness, with all its aspects, is important there, and there is a great commitment to it. There are actually a lot of people and institutions who try to push their policies towards openness, so if you get a chance, support their work. Thank you very much. Any questions?
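As a small technical footnote to the preservation part of the talk: the bit-stream side, guaranteeing that the stored bits are still correct, is commonly handled with checksums that are recorded once and re-verified regularly. The following is only a minimal sketch of that idea using Python's standard library, not a description of the system used at the library; the function names and the manifest layout are assumptions made for this example.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(directory):
    """Record a checksum for every file below the directory."""
    root = Path(directory)
    return {
        str(path.relative_to(root)): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def verify_manifest(directory, manifest):
    """Return the relative paths that are missing or whose content changed."""
    root = Path(directory)
    damaged = []
    for relative_path, expected in manifest.items():
        candidate = root / relative_path
        if not candidate.is_file() or sha256_of(candidate) != expected:
            damaged.append(relative_path)
    return damaged
```

In practice the manifest would be stored separately from the data, and `verify_manifest` would run on a schedule so that a damaged copy can be replaced from another copy. This covers only the bit-stream part; it says nothing about whether the format can still be interpreted in 50 years.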