A spatial view in the culture heritage domain
Formal Metadata
Number of Parts | 183
License | CC Attribution - NonCommercial - ShareAlike 3.0 Germany: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Identifiers | 10.5446/32125 (DOI)
Production Year | 2015
Production Place | Seoul, South Korea
Transcript: English (auto-generated)
00:04
Hello, my name is Jacob. Actually, the last talks I gave were mainly at conferences in the cultural heritage domain, and at those conferences the talks were mostly a bit too technical for the folks there. Now I'm at a conference with a lot of technical
00:24
people, and I think my talk is probably not technical enough for you. What my talk is about: I will try to give you some small insights into topics from the cultural heritage domain, from a more or less FOSS4G point of view.
00:45
Well, basically I'm a geographer and geoinformatics guy. I came to the Saxon State and University Library in 2013 and work there in the IT department. The library is actually quite a big one: it has about 400 employees, and we have large holdings,
01:04
I think about 4.3 million print media and 4.1 million picture documents. So, that's enough advertising. Our house is strongly engaged in a couple of digitization projects; we try to bring our analogue content into the digital domain, and we also try to be an active member of
01:27
a couple of different open source communities. For example, in Germany there is a content management system which is quite popular, called TYPO3. We are a member of its association, and we are also an active developer of a couple of extensions for this content management system.
01:45
We are also an active member of, for example, another digitization software project called Goobi, and there are other projects. This is only to give you a feeling that we also do some open source.
02:00
But why do I tell you that? In the PowerPoint there is an animated picture here, which is quite funny, but unluckily we now have the PDF.
02:22
When I came to the library in 2013, I asked myself: why a library? I mean, what can a library have to do with spatial data or spatial questions? Libraries have books, they have newspapers, notes, movies, pictures, et cetera, but where is the spatial
02:44
data? Well, for me it turned out that the spatial IT domain and the cultural heritage domain, or libraries, because I'm working at a library, have more points of contact than I thought,
03:02
and I thought maybe there are other people out there who share the same destiny, and therefore it could be a good idea to share my experience and some projects which we are doing at our house. But back to the topic: in the last decade, libraries have started to run more and more digitization projects,
03:28
and they digitize a lot of different content. Users can find this content, normally via discovery systems on the web, and they can view it on the web. Within this digitized
03:43
content there is a lot with a spatial dimension: we have historic maps, panoramic views and other cartographic products, and we have books, diaries, newspapers, articles, pictures and so on, which often have multiple spatial
04:02
dimensions: where was this article published, or which region does it report on? We also have optical character recognition, which is run on a lot of digitized images to extract text, and we can then take this text and scan it for location information. So there's a lot of potential in libraries, but unluckily most of the content isn't yet
04:26
ready to be used by developers or machines, and it isn't ready to be used by the public. In our house we are now working on two projects to make it more usable for
04:44
the public. In the following minutes of this presentation I will try to introduce you to these two projects, and I will further try to dig into another topic which I find quite interesting and which I haven't seen so far at a lot of spatial IT conferences.
05:03
First of all, I will introduce you to a project called Virtual Map Forum 2.0. This project was funded by the German Research Foundation, and we did it together with the University of Rostock. It is based on the old map forum of the SLUB.
05:23
The old map forum hosts about 177,000 historic maps and panoramic views. Of these, about 25,000 are already digitized, and these digitized maps could be searched by thematic criteria and viewed in a browser in the old map forum. But in
05:44
the last years, a lot of other libraries like the British Library or the National Library of Scotland, especially supported by a company called Klokan Technologies, started to give users the opportunity to georeference historic maps and to search them by spatio-temporal criteria.
06:02
We thought we would do this too, so we applied for funding, we got the funding, and we built our own georeferencing infrastructure and data infrastructure for our historic maps. This is the project about it.
06:20
Within the project we developed a portal which allows users to perform spatio-temporal searching and visualization of the maps through various tools. It supports the comparison of different maps and the animation of custom map series.
06:41
We give the user the possibility to visualize the georeferenced maps on top of a base map or on top of each other, and also to visualize the original maps. Further, the portal functions as an entry point to our mapping services as well as to our georeferencing tools.
07:03
We also built a crowdsourcing georeferencing tool as well as an automatic image recognition tool for the plane survey sheets. And to sum this up, the georeferencing was quite successful with both approaches.
07:21
We have already georeferenced about 5,700 plane survey sheets, and the quality was mostly really, really good. Only in about 1% of the cases was the positional error of the coordinates larger than 100 meters.
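One way to make a quality figure like the 100 meter threshold concrete is to look at control-point residuals. The following is a minimal sketch, not the project's actual georeferencing code: it assumes a plain affine transform fitted by least squares, and the pixel positions, map coordinates and function names are made up for illustration.

```python
import math


def solve3(matrix, vector):
    """Solve a 3x3 linear system by Gaussian elimination with pivoting."""
    m = [row[:] + [vector[i]] for i, row in enumerate(matrix)]
    for i in range(3):
        pivot = max(range(i, 3), key=lambda r: abs(m[r][i]))
        m[i], m[pivot] = m[pivot], m[i]
        for r in range(i + 1, 3):
            factor = m[r][i] / m[i][i]
            for c in range(i, 4):
                m[r][c] -= factor * m[i][c]
    solution = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        rest = sum(m[i][c] * solution[c] for c in range(i + 1, 3))
        solution[i] = (m[i][3] - rest) / m[i][i]
    return solution


def fit_affine(pixels, coords):
    """Least-squares fit of x = a*col + b*row + c and y = d*col + e*row + f."""
    rows = [(col, row, 1.0) for col, row in pixels]
    # Normal equations (A^T A) p = A^T b, solved once for x and once for y.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    params = []
    for axis in (0, 1):
        atb = [sum(r[i] * xy[axis] for r, xy in zip(rows, coords)) for i in range(3)]
        params.append(solve3(ata, atb))
    return params


def residuals(params, pixels, coords):
    """Distance in map units between transformed and given control points."""
    (a, b, c), (d, e, f) = params
    return [
        math.hypot(a * col + b * row + c - x, d * col + e * row + f - y)
        for (col, row), (x, y) in zip(pixels, coords)
    ]


# Hypothetical sheet corners in pixel space and made-up local map coordinates.
corners_px = [(0, 0), (100, 0), (100, 80), (0, 80)]
corners_map = [(1000, 5000), (1200, 5000), (1200, 4840), (1000, 4840)]

params = fit_affine(corners_px, corners_map)
worst = max(residuals(params, corners_px, corners_map))
needs_review = worst > 100  # threshold in map units, borrowed from the talk
```

With four consistent corners the residuals are essentially zero; a misplaced control point shows up as a larger residual, which is one possible signal for sending a sheet to manual review.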
07:42
Nevertheless, automatic image recognition isn't a hard task for plane survey sheets, because we only have to do edge detection. You see it here; it isn't too hard for automatic image recognition software. But it becomes quite challenging when you go to other map types like island maps
08:04
and so on. There you have to integrate a user, a person who sets the different control points. Therefore we will focus more in the future on further improving the crowdsourcing georeferencing tool, because we have a lot of other map types which we want to
08:22
share with the public and which we want to open up to scientists. After the maps were georeferenced, they were published through our historic maps SDI. The SDI relies completely on FOSS4G components, and actually we are really,
08:42
really grateful for the wide range of tools out there which allow us to build up such an infrastructure with reasonable effort. The data which we publish in our SDI is mostly licensed under CC BY-SA and CC0. And we support multiple service interfaces like WMS, WCS, TMS or CSW, and so on.
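To illustrate what using one of these service interfaces looks like from a client, here is a small sketch that assembles a standard WMS 1.3.0 GetMap request. The endpoint URL and the layer name are placeholders, not the real addresses of the SDI described in the talk.

```python
from urllib.parse import urlencode


def wms_getmap_url(base_url, layer, bbox, width, height,
                   crs="EPSG:4326", image_format="image/png"):
    """Build a WMS 1.3.0 GetMap URL from standard request parameters."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",
        "CRS": crs,
        # WMS 1.3.0 uses latitude/longitude axis order for EPSG:4326.
        "BBOX": ",".join(str(value) for value in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,
    }
    return base_url + "?" + urlencode(params)


# Placeholder endpoint and layer; the bounding box roughly covers Dresden.
url = wms_getmap_url(
    "https://example.org/wms", "historic_sheet",
    bbox=(50.9, 13.5, 51.2, 14.0), width=800, height=600,
)
```

The resulting URL can be opened in a browser or handed to any WMS-capable client.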
09:11
The metadata of the maps is mainly published in conformance with ISO 19115. This is quite untypical for the library domain, but it allows us to
09:22
couple our infrastructure easily with other spatial data infrastructures like, for example, the SDI of Germany. Unluckily, you can't see it so well on this picture.
09:44
The quality is a bit better in the PowerPoint presentation. But what we see is that we realize our WMS and WCS support with the MapServer software. You see it here.
10:00
For every plane survey sheet, we publish a layer. That is because of our access policy in the portal; it fits better into our system. But some scientists also asked us: hey, why don't you aggregate all the plane survey sheets into one layer for Central Europe and make it time-enabled?
10:21
And we did this, so it is also available as a time-enabled layer. You can query it. We also implement WFS with GeoServer. It's mainly used for the portal. Why do we use GeoServer and not MapServer? Because when we compared both products,
10:44
we found that GeoServer has better support for GeoJSON output and especially for other features like pagination, which is quite useful when you're using it as a search index, for example. We implement the CSW with a GeoNetwork instance.
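As a rough illustration of the GeoJSON and pagination point, the sketch below builds paged WFS 2.0.0 GetFeature requests using the standard `count` and `startIndex` parameters. The endpoint and the feature type name are hypothetical, and `application/json` is assumed as the output format name the server accepts.

```python
from urllib.parse import urlencode


def wfs_page_url(base_url, type_name, page, page_size=50):
    """Build a WFS 2.0.0 GetFeature URL for one page of GeoJSON results."""
    params = {
        "service": "WFS",
        "version": "2.0.0",
        "request": "GetFeature",
        "typeNames": type_name,
        "outputFormat": "application/json",  # assumed GeoJSON format name
        "count": page_size,                  # features per page
        "startIndex": page * page_size,      # offset of the first feature
    }
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint and feature type: first three pages of 50 features.
page_urls = [
    wfs_page_url("https://example.org/wfs", "maps:sheets", page)
    for page in range(3)
]
```

A search front end could walk through such pages on demand instead of loading every feature at once.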
11:06
The GeoNetwork instance is running in front of a PostgreSQL and PostGIS server. Our metadata and vector data are mainly stored in PostgreSQL and PostGIS, and our raster data is stored in a plain file system as GeoTIFFs.
11:24
And we use, for example, GDAL. So you see we have a complete, simple open source stack, and it worked out quite well. As you see, it's nearly all FOSS4G. The only thing that is not FOSS4G is the image recognition tool.
11:42
It relies on a proprietary software called HALCON, but it's not on the sketch here. So you can build this stuff up yourself. For example, the portal and the georeferencing code are available on GitHub, although I have to say it's not yet really good for reuse because it's quite dirty.
12:04
But we're doing a refactoring; I will say something more about this in a minute. In the portal, we use OpenLayers 3. It's an awesome library.
12:21
And because we like OpenLayers 3 so much, we now also use it in other open source projects. This is, for example, a project called Goobi.Presentation. It's a TYPO3 extension. It's basically a web viewer for digitized objects, and it's used by, I think, 100 or 200 institutions in Germany. So it's widely used.
12:44
Or at least there are a lot of institutions who have downloaded it. I don't know if everybody uses it, but there are enough big houses who use it. So, credits to OpenLayers, yeah. To give you a small outlook for this project,
13:02
we are now planning to do some restructuring and refactoring of the infrastructure to increase the reusability of our software. The goal is to port parts of our developments to a TYPO3 extension to make them more easily reusable for other institutions in Germany,
13:21
and thereby to help them publish their map collections as georeferenced maps and give more georeferenced maps to the public. We also want to further improve our crowdsourcing georeferencing tool so that it will support other map types.
13:43
What is really demanded by scientists right now are the plane survey sheets with a scale of 1:100,000 and the geological maps. Okay, so that was it for the historic maps, but like I said at the beginning, we have a lot more content in the libraries.
14:04
A big treasure of location information lies in the metadata of books, newspapers and other knowledge objects. They often have a reference to a specific location, but the problem is often that the spatial dimension
14:22
is only represented through a location name, and because of this it is only considered in a thematic search. What would be really cool is if we could use this location name and extend the information with geo-coordinates, because then we can create completely new or different search interfaces,
14:40
and we can also support users with new applications, or the users can create new applications with the data. But before we can do this, we mainly have to solve two tasks. First, we need a strong gazetteer; there was a talk about this in the last session. In our case the gazetteer also has to support historic place names,
15:01
and maybe we also have to support operations where you can query the gazetteer with a timestamp and a name; we have to see. Once we have such a good gazetteer, the second task would be to create transformation workflows to map the location names against the gazetteer
15:23
and then to add the coordinates to the data record. This task doesn't sound too hard, but it can become quite challenging when you see the multiple heterogeneous and partly quite antiquated data silos which are used and hosted by various
15:41
cultural heritage institutions and libraries. In our house we often have problems getting them all together. At this point, another open source project comes in.
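The two tasks described here, a gazetteer that can be queried by name and time, and a step that adds coordinates to a record, can be sketched roughly as follows. This is only an illustration under simple assumptions: the tiny in-memory gazetteer, the record layout and the function names are invented, and the coordinates are rounded.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GazetteerEntry:
    name: str
    lon: float
    lat: float
    valid_from: int  # first year the name applies (inclusive)
    valid_to: int    # first year the name no longer applies (exclusive)


# Tiny made-up gazetteer; coordinates are rounded and only illustrative.
GAZETTEER = [
    GazetteerEntry("Chemnitz", 12.92, 50.83, 0, 1953),
    GazetteerEntry("Karl-Marx-Stadt", 12.92, 50.83, 1953, 1990),
    GazetteerEntry("Chemnitz", 12.92, 50.83, 1990, 9999),
    GazetteerEntry("Dresden", 13.74, 51.05, 0, 9999),
]


def lookup(name: str, year: int) -> Optional[GazetteerEntry]:
    """Find an entry whose name matches and whose validity covers the year."""
    key = name.strip().casefold()
    for entry in GAZETTEER:
        if entry.name.casefold() == key and entry.valid_from <= year < entry.valid_to:
            return entry
    return None


def enrich(record: dict) -> dict:
    """Return a copy of the record with coordinates added when a match exists."""
    entry = lookup(record["place"], record["year"])
    if entry is None:
        return {**record, "geocoded": False}
    return {**record, "lon": entry.lon, "lat": entry.lat, "geocoded": True}


enriched = enrich({"title": "Stadtplan", "place": "karl-marx-stadt ", "year": 1975})
```

Records that find no match are kept and only flagged, so they can be reviewed by a person later instead of being dropped.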
16:02
It's called D:SWARM. It is developed by our house in cooperation with a company, Avantgarde Labs, and it's basically a graphical web-based ETL tool that should enable librarians and non-developers to import data in different formats
16:21
from different sources, create transformation workflows, and map them to custom output schemas. The whole software has a strong focus on linked open data, and it tries to make the creation of your own transformation workflows, as well as the sharing of them, as easy as possible. What does that mean? It means the end user of D:SWARM
16:42
shouldn't have to be a developer; it should be a librarian, a non-developer. And if you have worked in a library, you know that librarians really are non-developers. They are hard folk here, but for them the software has to be really easy.
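The idea of mapping an input schema to an output schema through small transformations can be pictured with a few lines of code. This is a conceptual sketch only and says nothing about how the tool itself is implemented; the input tags, output field names and transformations are assumptions chosen for illustration.

```python
def apply_mapping(record, mapping):
    """Map input fields to output fields, applying one transformation each."""
    output = {}
    for source_field, (target_field, transform) in mapping.items():
        if source_field in record:
            output[target_field] = transform(record[source_field])
    return output


# Hypothetical mapping: input tag -> (output field, transformation).
MAPPING = {
    "245a": ("title", str.strip),
    "260c": ("year", lambda value: int(value.strip("[]. "))),
    "651a": ("place", lambda value: value.strip().rstrip(".")),
}

source_record = {"245a": " Karte von Sachsen ", "260c": "[1880].", "651a": "Dresden."}
target_record = apply_mapping(source_record, MAPPING)
```

In a graphical tool, each entry of such a mapping would correspond to one connection drawn between an input tag and an output tag.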
17:02
This is basically a screenshot from the software. You can see here, for example, how a system librarian can do the mapping from one schema to another schema. You have the input schema here and the output schema here, and unluckily you can't see the tag names
17:21
in this graphic. Normally there are tag names here, and you can then say, for example: for this tag name, run this transformation workflow, and then it becomes this tag name. This is the goal of the software. What does it mean now for,
17:41
oh, here, yeah: D:SWARM is still in the beta phase, but we already use it in our house to produce our main search index, and our library is now the first big library in Germany that relies completely on open source software for producing its search index. In the library domain
18:01
there are also some quite big companies who are making a lot of money with tasks that could also be solved by open source software. But what does it mean for location-aware metadata? Right now, D:SWARM doesn't really have spatial functionality.
18:26
We want to couple it with a gazetteer and to add a transformation workflow so that librarians and other persons
18:40
can easily match place names or location names against this gazetteer and enhance the record with location information, with coordinates. We could further imagine that D:SWARM
19:01
could be extended with other transformation capabilities, especially regarding transformation jobs for the INSPIRE programme. It's a big programme in Europe where a lot of agencies and private institutions have to struggle with bringing data from one schema to another schema, and if we could enhance D:SWARM in this direction,
19:21
there could also be a lot of further use cases. After all, this software is open source, so you're invited: if you have ETL problems and you think, ah, this software looks good, maybe you can take it and extend it with spatial capabilities. I know this was already quite a lot,
19:45
and we have now seen two projects, and before I end, I will dig into another interesting topic.
20:05
The topic I want to speak about next is one which I haven't seen a lot so far in the IT domain or in the spatial IT domain, but I found it actually quite interesting when I started at the library. I will talk about digital preservation.
20:22
It sounds a bit boring, and actually it is a bit boring, but there are a lot of cultural heritage institutions, also federal agencies, and also private companies who are struggling with this topic right now. If we look at the IT world, we see right now that digital content is growing with really, really incredible speed
20:43
in all domains. In the library sector, for example, we have a fast-growing amount of retro-digitized objects, that means digitized objects from analogue content, and we also have the native digital objects. That means, for example, master's theses or doctoral theses which are only available
21:00
in the digital domain, and we have to preserve this data, because if we cannot preserve it, it will be lost in 50 years plus, and then it's gone, because we don't have it in analogue form anymore. Over centuries, libraries and archives have built up expertise in preserving data, but in the analogue world,
21:20
and now they have to build up this expertise again for the digital domain. And when you see this picture here: in the past, when you carved a text in stone, it was quite easy to preserve it for a thousand years. When you save a file on a hard drive now
21:42
and you don't give this file any further attention, in six years it will probably be lost or damaged. So this is the task where digital preservation comes in.
22:00
The goal of digital preservation is to make data accessible and usable in 50 years plus. The main thing everybody has to achieve is to anticipate future usage scenarios of the preserved data, to get a feeling for what could be relevant in the future.
22:23
What is also important to recognize is that digital preservation is not a simple backup. If you save, for example, a shapefile and keep it in a backup for, I would say, 50 years, you can probably not use it anymore in 50 years when you take it from the backup again. And if you could use it, then actually
22:41
the shapefile could be a good format candidate for digital preservation of vector data. When it comes to practice, we have two main topics in digital preservation: bitstream preservation and content preservation. Only some short words. Bitstream preservation has to guarantee the correctness of digital information. It has to deal with hardware failures,
23:01
with software failures, with hazards, with attrition, and basically the main problems there are solved. There are systems which help people to preserve the data, and a lot of the questions have already been solved. The bigger research topic right now is content preservation:
23:21
it has to guarantee the interoperability and the usability of the digital information in the future. The main questions here are which file formats we should use, which metadata is relevant, and how we should preserve it. I will skip this one and go directly on.
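For the bitstream side mentioned above, a common building block is a fixity check: store a checksum when a file is ingested and recompute it later to detect silent corruption. The sketch below is a generic illustration of that idea, not a description of any particular preservation system; the manifest layout and function names are assumptions.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(manifest):
    """Return the paths that are missing or whose checksum has changed.

    `manifest` maps file paths to the hex digests recorded at ingest time.
    """
    failed = []
    for path, expected in manifest.items():
        if not Path(path).is_file() or sha256_of(path) != expected:
            failed.append(str(path))
    return failed
```

Such a check only tells you that the bits are unchanged; whether the file can still be interpreted in the future is the content preservation question.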
23:41
So, again, a spatial view on this topic. So far, we haven't gained a lot of experience with digital preservation of spatial data, but we are actually starting right now to dig deeper into this topic, especially regarding our maps in the house. We really, really see that there is a growing demand
24:02
for services in this area regarding digital preservation. I also see it in my part-time job as an IT consultant, and we hear it from other federal and Saxon agencies; they come to us and ask: hey, we have problems with preserving this kind of data, and this kind of data; can you help us?
24:22
This growing demand for digital preservation leads to new requirements regarding the data formats used, the data management systems, and the data life cycles. We have to think about what the specific requirements for spatial data are,
24:42
especially regarding digital preservation, and, for example, which data formats we should use for raster and vector data. I think it will be a really, really hard task in the spatial domain if we see how many data formats we have and how many requirements we have.
25:01
In our house, we are right now examining the GeoTIFF format for digital preservation. We already use TIFF for preserving digital images. TIFF is quite a simple format, so it's a good candidate for digital preservation,
25:21
and this is also the reason why we hope that we can do well with GeoTIFF. We also need software that is really, really standards-compliant. Why do I say this? When we preserve TIFF data, we often see
25:43
that a lot of different software products actually produce TIFF files which you can read in every major viewer, but you cannot use them, because they are invalid. Mostly the TIFF headers are invalid,
26:01
and for digital preservation we then have to repair the TIFF headers or produce a completely new TIFF file. To sum up: I know this is a kind of boring topic, and actually we don't have much to say about it right now for the spatial domain, but I'm sure
26:21
there will be a further growing demand for services and software in this area, and I hope that FOSS will become a major actor on this topic, especially because of its openness, which brings a really big benefit for digital preservation, where data has to be hosted for 50 years plus and the services and software used have to be updated and enhanced permanently.
26:44
To come to my conclusion; oh, it's the last slide. I know these were quite a lot of topics, and I hope I could give you a small feeling for the spatial IT topics with which the cultural heritage domain is struggling right now.
27:01
What we saw was a full-blown FOSS4G implementation. We saw a short presentation about a FOSS tool which is still missing the G, and we got an even shorter introduction to the topic of digital preservation. Actually, I could have made a whole presentation on every one of these topics, because you can tell a lot about them. I hope you have indeed seen some contact points,
27:23
and I also hope that libraries will become more aware of their spatial data and open it to the public. I think libraries could be a good partner for the FOSS world, because it was always their mission to make information open to the public,
27:43
and libraries now also have to make a role shift from the analogue domain to the digital domain. Openness with all its aspects could therefore be a great commitment, and there are actually a lot of people in the institutions who try to push their houses towards openness,
28:01
so if you get the chance, support them. Okay, thank you very much. Any questions?