
Durable geospatial data and the budding open-source ecosystem driving it

Video in TIB AV-Portal: Durable geospatial data and the budding open-source ecosystem driving it

Formal Metadata

Title: Durable geospatial data and the budding open-source ecosystem driving it
License: CC Attribution 3.0 Germany: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract: Too often geospatial data is treated in an ephemeral manner. Its metadata is missing, it is hard to find, and usually it is not preserved. Stanford University Libraries presents an approach for preservation, access, and discovery of geospatial data and maps in a digital repository. This community approach uses a sustainable, collaborative ecosystem of existing open-source software projects and standards, while also developing new FOSS4G tools to fill gaps. These new tools include GeoBlacklight and OpenGeoMetadata. Presented topics will include an overview of geospatial data in a digital repository, an introduction to the tools that make it possible, a live demonstration, and discussion about the role of libraries and repositories within the FOSS4G community.
Jack Reed (Stanford University)
Keywords: Stanford University
Moderator: OK, so we'll continue with Jack Reed from Stanford University, who will give us an overview of an approach to preserving digital maps.
Jack Reed: Thanks. My name's Jack Reed. I'm a software engineer and I work at Stanford University Libraries. Is the "metadata is evil" guy still here? Who was that? Is he outside? OK, well, that's how I used to think. I'm a geo person, and I got into software engineering because I wanted to do things with data that the current systems, the software I was working with, wouldn't allow me to do. So I thought, well, I'm just going to build these tools myself. Then I came into the library world, and I was like, "Metadata? No, we don't do metadata, we don't deal with that stuff." But they've slowly brainwashed me, and now I think about things like metadata and durable data.
So what is durable data? What do we think about when we say durable data? We may not know right now, so let's talk about what non-durable data can look like. This is the data.gov catalog that the US federal government has, of federated web services and data from a lot of different providers. I searched for "GIS" and went to the second result, and the web service that backs it is not found. Who has used data.gov or a similar type of catalog? Have you ever gotten a "not found" dataset? All the time, yes. It happens all the time. It's like half the data that's in there doesn't exist, isn't there anymore.
Or here, in another data catalog, you come across the data, and it looks like the web service backing the data is there and the data is there. Great. But there is no license for the data and there's no description available. So I put this title into Google Translate and it told me something. Does anybody speak Japanese here? It's like "government town centers" or something like that. So we get some information about it, but I don't know anything else about the dataset, and I would argue that's not durable data.
This is from New York City's open data policy. New York City has done a great job; they've created this whole open data policy, and it's great. I looked up what it says about retention of the data, and basically the policy is that any agency can decide what its retention schedule is, and it can get rid of the data at any point in time. That's why we see a lot of these 404s on open data: because it's not maintained, it's not durable, it's not being maintained throughout.
I'd argue open data is not like free beer; open data is like free speech. It's great if it's there and you can use it, but it's not free. Just because your city or your organization has put up an open data portal, that's great, that's a first step, and it takes a ton of work. I'm not trivializing any of that; that's a huge accomplishment. But then there's maintaining that for the long term, so that users and other people can keep using that data.
So I think we're creating a thing I would call an open data gap. Now everything's gone digital, everything's gone online, and we don't print things out anymore. But a lot of people working in agencies will come across an old cabinet. You all know what it looks like: it has drawers, about this thick, and you pull out these drawers and there are maps inside. Has anybody worked in a place like that before and seen that? Yeah, it's really cool. You pull out these printed maps, and these maps are really cool too. And then your agency is changing offices and can't keep these things anymore. So what should you do? A lot of times people say, "You know what, I think a library or a museum would like them; let's just give them to them," and call the local library. Sometimes the library is like, "Great, that's an amazing resource, nobody's ever had this before," and the library will take it. Sometimes they can't; they don't have the staff to do that. But with these we had the historic record. We won't necessarily have that with digital data that maybe sat in a catalog: if the data is now a 404 from a web service, we won't have access to it anymore, unlike when you have a physical copy of that data in a map drawer. So we're creating this open data gap, a data gap in the record.
I'd argue durable data is long-term, it's consistent, it's usable. In that earlier talk, where we were looking at satellite imagery from 40 years ago, it's amazing that we can measure glacial retreat from 40 years ago. Do we have the data to look at Flint, Michigan, and when their water pipes were installed, from 20 years ago? We don't have that. And we need to make sure (the talk beforehand was excellent) that that data is going to be usable in the future as well.
When we talk about the future, we don't only mean tomorrow; we're also thinking 30 years. What about 300 years, when we want to look at glacial retreat 300 years from now? What about 3,000 years? This graphic is from the Long Now organization, and it's about starting to think about data that way. That may not be important to a specific organization, but I'd argue it's pretty important to humanity as a whole.
I'm not arguing that your organization or company should be preserving your data for, you know, 500,000 years. You have to make money; you have to meet your business goals. But there are organizations (I work at the Stanford libraries; this is not what it looks like anymore) whose sole purpose is to think about data preservation, to think about durability of data, and to work with you on that.
So libraries: really what we do is provide access to information. That's kind of our core mandate. At Stanford Libraries we're really focused on Stanford researchers, but Stanford researchers are studying everything in the world, so we want to provide access to information on everything in the world. Where libraries are really unique is that we think about things like durability, equity of access, and privacy. Some of these things have been talked about here this week, which is good, it's great, but these are places where libraries really shine.
When we think about durability and what that looks like in practice, it's things like persistent URLs to data. Does the website where I access the data work without JavaScript? Can I still download something if I'm not running JavaScript? Do I have long-term access to it, and can I use it? Then there's this concept of digital preservation: maybe I preserve the original file format, but I also convert it to a long-term preservation file format, so that we know in 50 years we can use this file, and the thesis that you wrote in MS-DOS isn't gone. We think about those things: 5 years, 50 years, 500 years from now, what types of data formats can we use? I'll bring this all back to software.
Then equity of access. Libraries also want to provide equity of access, so that regardless of who you are, your age, ethnicity, language, income, or other types of barriers, you should have access to information. And privacy: this is kind of a mainstay of libraries, trying to provide safe access to information without tracking users, or tracking users in a responsible way.
I mention all that because these are things that libraries think about and excel at, and this is really a mandate of the Stanford Libraries. We try to think about all the digital content that comes in in these three different areas: preservation, access, and discovery. We do this for images, audio files, moving images, and scanned books, but we also do this for GIS, and that's what I'm going to talk about.
For the FOSS4G community, I want to introduce you to some of the library free and open-source communities that I'm a part of. One's called Project Hydra, one's called GeoBlacklight, and the other one's called OpenGeoMetadata. What we're really focusing on is building tools and functionality that drive these preservation, access, and discovery goals of the library, and that also collaborate and work with some of the existing FOSS4G tools that are out there.
Project Hydra: there's a geospatial interest group in it. Really, Hydra is a generalizable, portable, open-source framework for building digital repositories. Do people know what digital repositories are? Yeah, I see two hands here. Who doesn't know what digital repositories are? Yeah, that's what I thought when I came into the library world: what are these digital repositories everybody's talking about? Really, it's a way to preserve data, provide fixity, and manage the data for long-term use. That's really what a digital repository is.
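
As a rough illustration of the fixity idea mentioned here, a repository can record a checksum for every file at deposit time and recompute it later to detect silent corruption or loss. The sketch below is a minimal Python example using only the standard library. The function names (`file_digest`, `build_manifest`, `check_fixity`) and the choice of SHA-256 are assumptions made for illustration; they are not taken from Project Hydra or any other tool named in the talk.

```python
import hashlib
from pathlib import Path


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file in chunks so large datasets need not fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(directory):
    """Map each file's path (relative to directory) to its checksum."""
    root = Path(directory)
    return {
        path.relative_to(root).as_posix(): file_digest(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def check_fixity(directory, manifest):
    """Compare the files on disk with a manifest recorded earlier.

    Hypothetical report format: names of files whose content changed
    and names of files that have disappeared.
    """
    current = build_manifest(directory)
    changed = sorted(
        name for name in manifest
        if name in current and current[name] != manifest[name]
    )
    missing = sorted(name for name in manifest if name not in current)
    return {"changed": changed, "missing": missing}
```

In use, the manifest would be stored alongside the deposited data and `check_fixity` run on a schedule; any non-empty `changed` or `missing` list signals that a copy should be restored from a replica.
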
There's currently software development going on within this interest group on a software project called GeoConcerns. That's the name; we can work on the name. It's a software project that allows you to submit geospatial data to a digital repository, and it does some automatic things for you. It provides fixity, so it's hashing all the data, and it provides data versioning across time, persistent identification, and derivative creation. It can automatically create access derivatives, or it can create preservation derivatives: that file geodatabase may not be long-term accessible, but it may be able to convert it to something that is more long-term accessible for you.
This is also one of the projects that I work on, called GeoBlacklight. GeoBlacklight really tries to further that discovery goal: how do people find geospatial data? It's built on the popular open-source project Blacklight. I'll give a demo of it, but it includes all kinds of cool features: a customizable spatial search algorithm, a real focus on user experience, integrated open standards (so WMS, WFS, and other web services support), and it's easy to upgrade. It's built with the mindset that you're going to be upgrading this thing and maintaining the software long-term. And it's installable with a single command, which is really important.
So this is GeoBlacklight. This is the Stanford installation of it, called EarthWorks; you can go to it at earthworks.stanford.edu. It allows you to do spatial search: you zoom around the map, zoom in some more, and the search results will actually update based on where you zoom. You can do text-based search, autocomplete, and faceted refinement here. When you go to a dataset you can do all kinds of things, like feature inspection, downloading it in multiple formats, or opening it in CartoDB (Carto, I guess, is the new term). That gives people really quick access to the datasets in the ways that they want to use them.
The dataset I just showed was created by a Stanford researcher, and it was found by somebody at MSF, Médecins Sans Frontières, who used it to create maps for a cholera outbreak earlier this year in Zambia. So these things are being used not only by researchers but by NGOs and people from across the world.
It's based on the Blacklight project, which is used in places from the Rock and Roll Hall of Fame to national intelligence agencies. It's an open-source front end to Solr, and it was recently used in the ICIJ Panama Papers investigation as the software supporting that research. The original developers of GeoBlacklight were software engineers from MIT, Princeton, and Stanford University, with additional contributors coming from NYU and quick adoption by Notre Dame and James Cook University in Australia over the past year. Just recently, in the past month, we had a sprint and released the 1.0 version, but it's been in production at several of these institutions for over a year now. We're seeing widespread adoption in North America, and also Australia so far. It has also been used in commercial companies; I'm not sure how public they want to make that.
It's also been used in other organizations besides universities. All these different universities are now sharing all of their metadata. We're sharing metadata on GitHub: we have a project called OpenGeoMetadata, where organizations and people share their metadata, collaborate on metadata, open issues on other people's metadata, make pull requests on metadata, and all that kind of stuff. This is geospatial metadata, and now there are metadata toolkits forming around it. So if you want to convert ISO 19139 to HTML, there's a great package here for you to do that. And if you want to go check out this metadata, there are something like 97,000 different metadata files in here, all in machine-readable formats, with an API to it and the ability to download it all and use it to your heart's content.
All these technologies I've discussed really integrate with existing FOSS4G software, so we're using all of these across the stack here. We really think about this concept that a piece of software should do one thing really well; that way we can replace it if that software dies and we need to replace it with something else. For discovery we use GeoBlacklight. For metadata collaboration we use OpenGeoMetadata. For submission to the digital repository, GeoConcerns. For metadata transformation and enhancement we use a toolkit called GeoCombine. And then we also have a toolkit to monitor web service availability, called GeoMonitor, and this actually updates our discovery indexes in real time. So if a layer goes down, if one of our datasets, or one that we've ingested, goes down, our users are updated in real time, so nobody's going to be prompted to download a dataset that's not actually available.
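
The availability monitoring described here can be sketched roughly as follows. This is a hypothetical, standard-library Python example and not GeoMonitor's actual code: the function names, the shape of the `index` dictionary, and the rule that any 2xx or 3xx response counts as "available" are all assumptions made for illustration. A real discovery index would typically live in a search engine rather than an in-memory dictionary.

```python
from urllib.error import URLError
from urllib.request import urlopen


def is_available(url, timeout=10, opener=urlopen):
    """Return True if the service answers with a 2xx or 3xx status."""
    try:
        with opener(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (URLError, OSError, ValueError):
        # HTTP errors, timeouts, refused connections and malformed URLs
        # are all treated as "not available".
        return False


def refresh_index(index, check=is_available):
    """Re-check every layer and store the result on its index record.

    `index` is an assumed structure: layer id -> {"url": ..., "available": ...}.
    """
    for record in index.values():
        record["available"] = check(record["url"])
    return index


def downloadable_layers(index):
    """Layer ids that a user interface could safely offer for download."""
    return sorted(
        layer_id for layer_id, record in index.items() if record["available"]
    )
```

Passing `check` and `opener` as parameters keeps the sketch testable without network access; a scheduler would call `refresh_index` periodically so that search results only advertise layers whose backing service responded on the last check.
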
All of these different pieces of software tie into our goals of preservation, access, and discovery. We don't want to just create new software when there are great FOSS solutions out there; they all work together to build out the geospatial infrastructure of our digital library.
I'd encourage you (we talked about durability at the start of this presentation), if you're not from a library (I know there's one person in here from a library; I don't know if anybody else here is), to talk with your local university libraries and other digital repositories near you about preservation of data. A lot of organizations like ours want to partner with the open data movement and with government organizations that are putting up open data portals, to provide preservation services for that data. They may not have the mandate to preserve that data for the long term, but we want to help and aid them in doing so. That's all I have. These are links to some of our GitHubs and different informational pages, or we're @geoblacklight on Twitter. Any questions?
Audience member: How do you see cloud technology affecting this? In the past, in the United States, [unclear] tapes [unclear] at the Library of Congress. Does the cloud change all that? Does it have an impact?
Jack Reed: Yes, definitely. We're using cloud technologies, and a lot of other institutions are, for some parts of the stack. We're also doing things I didn't even mention here: universities are involved in digital preservation networks, so Stanford and other institutions all back up data to each other. So when the big one hits in California, and Stanford gets cut off from the rest of the US, our data is backed up to other universities inside the United States and around the world, so there will be long-term access to it. We definitely use the cloud for different types of things, and we use it for preservation too, but we also have other preservation backup plans in place.
Audience member: [unclear] how does the choice of standard affect [unclear]?
Jack Reed: I think the choice of standard definitely impacts the durability of the data. Sometimes you have the option to think about that in advance, before the data is created, but a lot of times you're handling it after the fact. So a lot of times, if it comes in in a proprietary or hard-to-handle format, we want to preserve the original dataset, but we'll also convert it to a preservation derivative. We try to preserve that data as intact as we can, but in a standards-based, open format, so that it will be accessible in the future. So I think standards work is really important here. But when you're dealing with hundreds and hundreds of different types of datasets, you can't guarantee the longevity of every file format. So standards are good, you should use them, and you should follow data management best practices to try to ensure durability.
Any other questions?