Supporting Open Data with Open Source

Video in TIB AV-Portal: Supporting Open Data with Open Source

Formal Metadata

Supporting Open Data with Open Source
Title of Series
CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Release Date
Open Source Geospatial Foundation (OSGeo)
Production Year
Production Place
Portland, Oregon, United States of America

Content Metadata

Subject Area
Within the US Federal Government, there is a trend towards embracing the benefits of open data to increase transparency and maximize potential innovation and resulting economic benefit from taxpayer investment. Recently, an Executive Order was signed specifically requiring federal agencies to provide a public inventory of their non-restricted data and to use standard web-friendly formats and services for public data access. For geospatial data, popular free and open source software packages are ideal options to implement an open data infrastructure. NOAA, an agency whose mission has long embraced and indeed centered on open data, has recently deployed or tested several FOSS products to meet the open data executive order. Among these are GeoServer, GeoNode, and CKAN (Comprehensive Knowledge Archive Network), a data management and publishing system. This talk will focus on how these three FOSS products can be deployed together to provide an open data architecture built exclusively on open source. Data sets hosted in GeoServer can be cataloged and visualized in GeoNode, and fed to CKAN for search and discovery as well as translation to open data policy-compliant JSON format. Upcoming enhancements to GeoNode, the middle tier of the stack, will allow integration with data hosting backends other than GeoServer, such as Esri's ArcGIS REST services or external WMS services. We'll highlight NOAA's existing implementation of the above, including the recently deployed public data catalog and GeoServer data hosting platform, as well as potential build-out of the full stack including the GeoNode integration layer.
Keywords Open Data GeoServer GeoNode CKAN PostgreSQL OGC CS-W catalog WMS REST ISO 19115 ISO 19139 XML JSON geospatial metadata
Good afternoon, folks, and thank you all for coming. My name is Mike and I'm with NOAA, the National Oceanic and Atmospheric Administration, a US federal agency. On behalf of my co-author, the NOAA data management architect, I'm going to be discussing the topic of supporting open data with open source.
This talk is divided into two parts. The first segment gives a little background on open data and what it means in the context of this presentation and of the US federal government. Back in the middle of 2012 there was a Presidential memorandum released government-wide, entitled "Building a 21st Century Digital Government." Its real message was to codify specific ways in which the government could increase use of its services and improve the overall digital experience of US citizens. That was intended as a broad umbrella document, with more specific follow-on policies to come later. The most relevant here is what's called Project Open Data, or the Open Data Policy, which followed last year, in May 2013, in the form of an executive order titled "Making Open and Machine Readable the New Default for Government Information." This was a specific policy that placed requirements on federal agencies and departments to release their data, where appropriate, in open and interoperable formats, with open licenses as well. The main message of the policy was to treat government data, and investments in government data, as an asset: recognizing the intrinsic value of those investments and of the data itself. The policy cited a few examples of historical releases of open data by the government. One was the GPS system, which I think is particularly relevant here; everyone knows the value of GPS today. That initially was a private, closed system developed by the Department of Defense, and it was released for public use in the early nineties once it was completed. The second example is weather data, released by my
agency, NOAA, which has traditionally been an open data agency in that regard. In both cases, really large industries have been built exclusively on top of that data, with crafty developers and entrepreneurs innovating and creating value-added services on top of it. The core of Project Open Data, the executive order, is to delineate a specific metadata schema, consisting of both a vocabulary and a data format for describing the datasets an agency releases. The format used in the policy is JSON, which we're probably all familiar with, and the vocabulary is sourced from terms that have previously been common in geospatial and other descriptive metadata vocabularies. I should also mention that the schema itself is released on GitHub in the spirit of open source; the creators of the policy really wanted to embrace that spirit and take input both from users of the actual data and schema, and from the implementers, federal workers such as myself.
To go into a bit more detail on the actual files: the executive order essentially mandated that each federal department list its open data at a particular prescribed location on the web, so that public users can count on accessing these data.json files — sometimes very massive files, just as a word of warning. The policy dictated that these be published at a particular URL, so there's some consistency there. I don't know if this is really visible, but this is just a small screen capture of part of one dataset entry that NOAA produced to comply with the policy, along with a few of the schema elements. If you're familiar with geospatial metadata, you can see that there's some carry-over in the common vocabulary.
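To make the file format concrete, here is a minimal sketch of what one dataset entry in a data.json file looks like and how you might check it for the expected fields. The field names follow the Project Open Data schema vocabulary, but the dataset itself and its values are invented for illustration:

```python
import json

# A hypothetical Project Open Data dataset entry. The values are invented,
# but the field names ("title", "description", "keyword", "modified",
# "publisher", "accessURL") come from the POD metadata schema.
sample = """
{
  "title": "Example Coastal Bathymetry",
  "description": "Gridded bathymetry for an example coastal region.",
  "keyword": ["bathymetry", "oceans"],
  "modified": "2014-01-15",
  "publisher": "Example Agency",
  "accessURL": "http://example.gov/data/bathymetry"
}
"""

dataset = json.loads(sample)

# Check that the entry carries a core subset of the schema's fields.
required = {"title", "description", "keyword", "modified", "publisher"}
missing = required - dataset.keys()
print(sorted(missing))  # an empty list means the core fields are present
```

A real agency data.json wraps many such entries in one (often very large) array, which is why parsing it incrementally is worth considering.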
To meet this mandate: NOAA, as I mentioned before, is traditionally an open data agency, comprised of several data centers that have been releasing data freely online for a number of years, and which as a result have developed their own catalog systems and inventories to facilitate that data access. However, we needed a way to essentially gather the existing information into a single output data.json file, which would then be fed up the chain to the Department of Commerce, since NOAA is an agency within DOC. To do that, the decision was made to deploy a centralized data catalog able to harvest from these existing remote catalogs. Our catalog is based on CKAN, which is open source, and it was actually a collaboration between NOAA and the Department of the Interior through an existing inter-agency working group; together we co-developed systems that could be deployed for both departments. The way this system works is by first harvesting the remote inventories and then making use of a plug-in developed for CKAN, related to Project Open Data, which handles the translation from the native metadata format to data.json. Here is a little workflow diagram of what the catalog does: it takes in the existing metadata, performs the translation, and in addition provides the benefit of a CSW endpoint for query and data access, as well as a native web UI.
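The translation step described above — native geospatial metadata in, a Project Open Data entry out — can be sketched as a simple field mapping. This is a toy illustration only: the input field names and the mapping are hypothetical, and the real work is done by the CKAN harvesting plug-in, not by code like this.

```python
# Illustrative sketch of the harvest-and-translate step: mapping a
# native (ISO 19115-style) record to a Project Open Data entry.
# The input keys here are invented stand-ins for ISO elements.
def to_pod_dataset(iso_record):
    return {
        "title": iso_record["citation_title"],
        "description": iso_record["abstract"],
        "keyword": iso_record.get("keywords", []),
        "modified": iso_record["date_stamp"],
        "publisher": iso_record.get("organisation", "NOAA"),
        "accessURL": iso_record["online_resource"],
    }

record = {
    "citation_title": "Example Sea Surface Temperature",
    "abstract": "Daily SST analysis for an example region.",
    "keywords": ["SST", "oceans"],
    "date_stamp": "2014-06-01",
    "online_resource": "http://example.gov/sst",
}
pod = to_pod_dataset(record)
print(pod["title"])
```

In the real pipeline the harvested records arrive over CSW or other remote catalog protocols, and the translated entries are aggregated into the department-level data.json.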
So that's some context for the rest of my talk. What I want to focus on is a particular full open source stack that we're experimenting with. It's not necessarily in operational use at the moment, but I want to take some time to illustrate how a few well-known open source projects we're all familiar with can work together in compliance with Project Open Data. The first of those is GeoServer, a spatial data hosting platform for OGC services. The second is GeoNode, which is essentially a web-based geospatial content management system built to sit on top of GeoServer and provide a dynamic, modern user interface that lets users discover and access the underlying services. And the last, of course, is CKAN, which I've spoken about already.
Some background on GeoServer: historically, GeoServer has certainly been used piecemeal in different offices in the agency, along with other open source spatial data hosting systems. However, it hadn't really been used as an enterprise-wide solution until 2011-2012, when the NOAA High Performance Computing and Communications program chose to fund a project to set up a prototype GeoServer that could be deployed agency-wide and used by individual office data providers who didn't have the resources to run GeoServer themselves; they could simply rely on a shared solution to publish their data. Funding through that project went to OpenGeo to provide a few enhancements to GeoServer. The first was to finalize some work that had been done on the security subsystem to enable enterprise integration capabilities like LDAP authentication. The second was first-class support for user isolation — essentially an improved user and permission management system, so you can restrict users to access only their own information rather than everything across the board, which is obviously essential for an enterprise deployment. As a result, the NOAA GeoServer hosting environment has been online for about two years for testing and evaluation, at the URL shown here. It is a prototype, and an operational transition wasn't really planned; however, I do want to highlight that this past year the National Weather Service, as part of its Integrated Dissemination Program, chose GeoServer alongside Esri's server products for its production geospatial hosting service, so there will be production web services running off GeoServer, which I think is pretty cool.
Let me now step through this open source and open data stack that we've been testing. The first layer, obviously, is GeoServer. This is a bit of a simplification, since GeoServer provides many additional service types, but I just want to highlight WMS and WFS, which are what we've primarily used in our prototype system. PostGIS and PostgreSQL should also be mentioned, because they're the underlying data storage backbone for our GeoServer instance and are used in every other component of this stack as well.
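The WMS layer of the stack can be exercised with a plain HTTP request. As a sketch, here is how a client builds a standard WMS 1.1.1 GetMap URL against a GeoServer endpoint; the server host and layer name are placeholders, not NOAA's actual deployment:

```python
from urllib.parse import urlencode

# Build a WMS 1.1.1 GetMap request against a GeoServer endpoint.
# The base URL and layer name below are hypothetical placeholders.
base = "http://example.gov/geoserver/wms"
params = {
    "service": "WMS",
    "version": "1.1.1",
    "request": "GetMap",
    "layers": "noaa:example_layer",   # workspace:layer, placeholder
    "styles": "",                     # default style
    "bbox": "-180,-90,180,90",        # minx,miny,maxx,maxy
    "width": 512,
    "height": 256,
    "srs": "EPSG:4326",
    "format": "image/png",
}
url = base + "?" + urlencode(params)
print(url)
```

Fetching that URL from a live server would return a rendered PNG of the layer; WFS GetFeature requests follow the same key-value pattern with a different `service`/`request` pair.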
The second tier in the system is GeoNode. For those not familiar with it, GeoNode is a web-based geospatial content management system, and it's pretty tightly coupled with GeoServer: you essentially pair a GeoNode instance with a GeoServer instance, and it gives you a modern web user interface that's really good for data discovery, with fine-grained permission controls and other features. NOAA's history with GeoNode goes back a couple of years as well. It was actually included as part of the Federal GeoCloud, an inter-agency working group, in 2012. A NOAA group had a proposal accepted to participate in that effort, which basically built out a shared infrastructure for transitioning agency-hosted geospatial services to the cloud, to Amazon Web Services. We collaborated with them to stand up our GeoNode there, and we've been tinkering with it ever since. And even though our NOAA GeoNode system isn't publicly deployed yet, through that project the Department of Energy came along and decided they were interested in using GeoNode, so they were able to use our infrastructure as a starting point and deploy their own GeoNode-based system related to the National Environmental Policy Act.
Let me quickly step through some GeoNode features for those who don't know it. This is a screen capture of the search interface, which brings together the individual data layers with end-user services. A user can go to GeoNode and search by common fields such as title and abstract, filter by ISO topic category and keywords, and of course, if there is temporal information, filter by that as well. GeoNode also includes an integrated CSW service; this is critical for the overall stack design, as you'll see later on. By default it's based on pycsw, but the CSW backend is pluggable, so if you want to use GeoNetwork instead, that's available as well. That provides a good connection point with desktop GIS: a desktop user with a CSW search extension can talk to the service, view results on a map, and use it as another data discovery tool.
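The CSW endpoint mentioned above speaks a standard OGC protocol, so any catalog client can query it. As a sketch, here is a CSW 2.0.2 GetRecords request in key-value form, of the kind a desktop GIS plugin might send to GeoNode's integrated, pycsw-backed endpoint; the endpoint URL is a placeholder:

```python
from urllib.parse import urlencode

# A CSW 2.0.2 GetRecords request in key-value-pair form. The endpoint
# URL is hypothetical; the parameter names follow the CSW specification.
endpoint = "http://example.gov/geonode/catalogue/csw"
params = {
    "service": "CSW",
    "version": "2.0.2",
    "request": "GetRecords",
    "typeNames": "csw:Record",
    "elementSetName": "brief",          # return brief records
    "resultType": "results",
    "constraintLanguage": "CQL_TEXT",
    # Full-text constraint over all queryable fields:
    "constraint": "csw:AnyText LIKE '%temperature%'",
}
url = endpoint + "?" + urlencode(params)
print(url)
```

A live endpoint would answer with an XML `GetRecordsResponse` listing matching metadata records, which is exactly what CKAN's harvester consumes later in the stack.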
For data access, GeoNode, as I mentioned, is pretty tightly coupled with GeoServer, so it understands the output formats that GeoServer provides. Once a user has logged in and found the data they're looking for, it presents convenient endpoints and makes it very easy to download the information directly.
Additionally, there are two different ways you can publish data to GeoNode. It can be configured so that a user logs into the web interface with a spatial dataset they want to share, interactively uploads it to GeoNode along with some relevant metadata, and GeoNode pushes it down to GeoServer automatically. There's also the opposite approach, which is taking data from an existing GeoServer and ingesting it into GeoNode. Either way, once your GeoNode instance is populated with data layers, you get the capabilities of the integrated metadata editor, which I'm showing here. If some information is lacking from the needed metadata, you have the option to fill it out in the user interface. There's also pretty fine-grained access control, so you can share data with particular users or groups of users, or simply publish it publicly.
Very recently there has been work done in GeoNode on some pretty cool new features. The first of these is remote services. GeoNode is really meant to run off GeoServer; however, it now has a fledgling capability to connect to a remote GeoServer endpoint and pull layers from its REST API, as well as remote WMS and ArcGIS REST servers, among others. The second is GeoGig. For those who don't know, GeoGig is very similar to Git: it's basically versioned editing for geospatial data. There was recent work done by some GeoNode partners to provide GeoGig read access, so if you configure GeoServer with a GeoGig repository, the edit history for your spatial data can be read and displayed within the user interface. There's also an external JavaScript client called MapLoom that handles the editing side as well. So if you have a spatial dataset and you configure your GeoNode instance to work with MapLoom, it provides disconnected editing and syncing with the remote GeoGig repository. It's a pretty powerful
data editing workflow. There's actually a presentation on GeoGig on Friday, so check it out. The last feature I want to mention is maps: once GeoNode is populated with a variety of layers, you can create integrated map mashups, with the same provisions for sharing them with the users you choose.
Here's the architecture diagram with GeoNode added: a GeoNode instance sits mostly on top of GeoServer, talks to GeoServer via its REST API, and adds the CSW endpoint for data discovery as well as the interactive catalog. Moving along to CKAN: we built on that CSW endpoint in GeoNode, because it allows CKAN to use GeoNode as a remote harvesting point. As I mentioned before, the NOAA data catalog already harvests several NOAA catalogs, and with GeoNode's CSW that integration can happen there as well, so any layer in a GeoNode instance can be automatically harvested by CKAN. There are maybe some similarities between the two products, but CKAN takes more of a data catalog approach to presentation. It does a good job of parsing fields out of spatial metadata and presenting them in an approachable, user-friendly way, and it's good at parsing out online resource linkages, so users have direct access to the endpoints you want them to use to reach your data. It's also pretty efficient in terms of search: it has an Apache Solr instance on the back end, which can be configured to handle spatial search as well. So it's a pretty powerful system that sits alongside GeoNode in this stack, and of course CKAN handles the data.json translation, which is of interest to anyone here, especially
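Once records land in CKAN, they are searchable through its Action API. As a sketch, a real query would be an HTTP GET against `/api/3/action/package_search?q=...` on the catalog host; here the response body is an abridged, invented example just to show the shape a client works with:

```python
import json

# Abridged, invented example of a CKAN Action API package_search
# response. A live call would be an HTTP GET against
# /api/3/action/package_search?q=... on the catalog host.
response_body = """
{
  "success": true,
  "result": {
    "count": 2,
    "results": [
      {"name": "example-sst", "title": "Example Sea Surface Temperature"},
      {"name": "example-bathy", "title": "Example Coastal Bathymetry"}
    ]
  }
}
"""

response = json.loads(response_body)
titles = [pkg["title"] for pkg in response["result"]["results"]]
print(titles)
```

This same API is what downstream consumers (including other catalogs) can use for search and discovery against the CKAN tier.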
federal users. The other thing I want to mention is that CKAN has some interactive mapping capabilities as well: if your GeoNode instance, or really any harvested spatial metadata, provides WMS endpoints, CKAN has a native map preview tool, so you get interactive capability there too. I should note that I had to modify it myself to provide this capability, but that's something that hopefully will be merged back into the core at some point.
So here's our diagram again, now with CKAN added alongside. As you can see, CKAN really just complements your existing GeoNode site, providing that remote harvest capability as well as integration with any external catalogs you may want to use. Lastly, Data.gov: for those who are familiar with the US federal open data sphere, this is the federal government's open data catalog. It's also CKAN-based, and it works very similarly to the NOAA data catalog: it does a remote harvest of a whole variety of existing federal geospatial metadata sources. I think the plan is for it to eventually harvest the data.json files exclusively; that's not quite implemented yet, but by one means or another it's the merged collection of all federal open data
according to the open data policy. So it sits somewhat alongside the core of the stack that I wanted to highlight, but nonetheless the federal-wide catalog is certainly important, and it's based on the same software. Just a few takeaway points. Hopefully I've shown how these open source technologies can be used together to create a full open data stack for geospatial data that complies with the federal open data policy, if that's of interest to you. NOAA as an agency is trying to continue its role and leadership in the open data world, keeping up with the latest policies as much as possible. And lastly, getting back to the original slide I mentioned, the digital government strategy: one of its main goals was to develop a shared platform for federal IT infrastructure, and I think the work that's been done on CKAN related to the open data policy is a good example of leveraging open source software. If you really read the digital government strategy that way, it encourages not only the use of open source software but also contributions. As a community of IT users in the federal government, why shouldn't we work together to develop a common product and collaborate, as opposed to sitting around waiting for someone else to do it, or going out and buying the same thing many, many times? It just makes sense.
Lastly, a lot of the work I've been involved with over the last few years would not have been possible without the support of Doug Nebert, who tragically passed away this year. A lot of this, and a lot of other advancements in the federal geospatial space, were possible because of his leadership, so I just want to give credit where credit is due. If anyone has any questions, I'd be happy to try to answer them; you can also reach out to either of us by email or Twitter. Thank you.

[Question: are the customizations you made published anywhere?] The changes we deployed for GeoNode and CKAN are publicly available, yes. As for the GeoGig work done through that cloud project, I think there's no reason it couldn't be baked in upstream as well, but I don't know that for sure.

[Question: I may be exposing my ignorance, but does data.json deal with the datasets themselves, the data that's in there?] It's really leveraging JSON as a metadata format, so in terms of actually encoding spatial data, it doesn't do that. It basically provides the descriptive metadata for a dataset along with an access URL; whether it's a downloadable dataset published on the web or a web API, it will contain that link, but it doesn't encode the data itself.

[Question: I'm not as familiar with GeoNode and CKAN; when would users use one versus the other, given data is shared in both?] That's a good question. I think CKAN is a good entry point to a dataset: if you make the connection between the two systems, the CKAN entry is indexed well by Google, so if someone does a web search they can find the page on the CKAN site and then be directed to GeoNode for the more interactive mapping capabilities. I think it flows that way most likely.

[Audience comment:] There are two different worlds, the so-called open data world and the open geo-data world. They're both a little different, and they kind of don't communicate with each other that well yet; we need to get the geo people and the open data people talking to each other more.

[Moderator:] OK, I assume everybody is looking forward to the next session, which is called drinks in the hall. Thank you.