
Supporting Open Data with Open Source

Transcript
Good afternoon, folks, and thank you all for coming. My name is Micah Wengren, and I'm with NOAA, the National Oceanic and Atmospheric Administration, a US federal agency. On behalf of my co-author Jeff de la Beaujardiere, the NOAA data management architect, I'm going to be discussing the topic of supporting open data with open source.
This talk is divided into two parts. The first segment gives a little background on open data and what it means in the context of this presentation and of the US federal government. Back in the middle of 2012 there was a Presidential memorandum released government-wide, entitled "Building a 21st Century Digital Government," and its real message was to codify specific ways in which the government could increase use of its services and improve the overall digital experience for the citizens of the US. That was intended as a broad umbrella document, with more specific follow-on policies to come later. The most relevant here is what's called Project Open Data, or the Open Data Policy, which followed in May 2013 in the form of an executive order titled "Making Open and Machine Readable the New Default for Government Information." This was a specific policy that placed requirements on federal agencies and departments to release their data, where appropriate, in open and interoperable formats and with open licenses. The main message of the policy was to treat government data, and investments in government data, as an asset, recognizing the intrinsic value of those investments and of the data itself. The policy cited a few examples of historical releases of open data by the government. These included the GPS system, which I think is particularly relevant here since everyone knows the value of GPS today; it initially was a closed system developed by the Department of Defense and was released for public use in the early nineties when it was completed. The second example is the weather data released by my agency, NOAA, which has traditionally been an open data agency in that regard. In both cases, really large industries have been built exclusively off of that data, with crafty developers and entrepreneurs innovating and creating value-added services on top of it. The core of Project Open Data and the executive order is to delineate a specific metadata schema, consisting of both a vocabulary and a data format for describing the datasets that agencies release. The format used in the policy is JSON, which we're probably all familiar with, and the vocabulary is sourced from terms that had previously been common in geospatial metadata and other descriptive metadata vocabularies. I should also mention that the schema itself is released on GitHub in the spirit of open source; the creators of the policy really wanted to embrace open source and take input both from users of the actual data and the schema and from implementers like federal workers such as myself.
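As a concrete illustration, here is a minimal Python sketch of what one dataset entry in that JSON schema might look like. The field names follow the general shape of the published Project Open Data schema, but every value, identifier, and URL below is hypothetical, and the schema documentation should be consulted for the authoritative field list.

```python
import json

# Hypothetical dataset entry shaped like the Project Open Data metadata
# schema; field names mirror the general layout of the published schema,
# but every value here is made up for illustration.
dataset = {
    "title": "Example Gridded Sea Surface Temperature",
    "description": "Illustrative record only; not a real NOAA dataset.",
    "keyword": ["oceans", "sea surface temperature"],
    "modified": "2014-09-10",
    "publisher": {"name": "National Oceanic and Atmospheric Administration"},
    "contactPoint": {"fn": "Data Manager", "hasEmail": "mailto:data.manager@example.gov"},
    "identifier": "gov.noaa.example:sst-demo",
    "accessLevel": "public",
    "distribution": [
        {
            "downloadURL": "https://example.gov/data/sst-demo.nc",
            "mediaType": "application/x-netcdf",
        }
    ],
}

# An agency data.json file is essentially a catalog wrapping many such entries.
catalog = {"dataset": [dataset]}
print(json.dumps(catalog, indent=2))
```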
To go into a bit more detail on the actual files themselves: the executive order essentially mandated that each federal department list its open data at a particular prescribed location on the web, so that the public can count on accessing these data.json files. Sometimes they are very massive files, just a word of warning, so don't try to parse one by hand. The policy dictated that these be published at a particular URL, so there's some consistency there. I don't know how visible this is, but this is just a small example, a screen capture of part of one dataset that NOAA produced to comply with the policy, and it also lists a few of the schema elements. If you're familiar with geospatial metadata, you can see that there's some carry-over in the common language.
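A short, hedged example of consuming such a file: assuming an agency publishes its catalog at the conventional /data.json path (the URL below is a placeholder), a few lines of Python can pull it down and list dataset titles. As noted above, these files can be very large.

```python
import requests

# The policy prescribes a predictable catalog location, conventionally
# https://<agency domain>/data.json; the URL below is a placeholder.
CATALOG_URL = "https://www.example-agency.gov/data.json"

resp = requests.get(CATALOG_URL, timeout=120)
resp.raise_for_status()
catalog = resp.json()

# Early data.json files were a bare JSON array of datasets; later revisions
# wrap them in a catalog object, so handle either shape defensively.
datasets = catalog["dataset"] if isinstance(catalog, dict) else catalog
for entry in datasets[:10]:
    print(entry.get("title", "<untitled>"))
```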
So, in order to meet this mandate: NOAA, which as I mentioned is traditionally an open data agency, is comprised of several data centers that have been releasing data for free online for a number of years and that have, as a result, developed their own catalog systems and inventories to facilitate that data access. However, we needed a way to gather the existing information into a single output file, the data.json file, which would then be fed up the chain to the Department of Commerce, since NOAA is an agency within DOC. To do that, the decision was made to deploy a centralized data catalog able to harvest from these existing remote catalogs. That catalog is based on CKAN, which is open source, and it was actually a collaboration between NOAA and the Department of the Interior, through an existing inter-agency working group called the Federal GeoCloud, to co-develop the system that would be deployed for both the Department of the Interior and NOAA. The way the system works is by first harvesting the remote inventories and making use of a plugin that's been developed for CKAN related to Project Open Data, which handles the translation from the native metadata format to data.json. This is just a little workflow diagram of what the catalog does: it takes in the existing data and does the translation, and in addition it provides the benefit of CSW endpoints for query and data access, as well as a native web GUI.
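For a sense of what the harvested catalog looks like from the consumer side, here is a small sketch against CKAN's standard action API; the search term and row count are illustrative only.

```python
import requests

# Query the CKAN-based catalog through CKAN's standard action API; the
# search term and row count are arbitrary illustration values.
CKAN_SEARCH = "https://data.noaa.gov/api/3/action/package_search"

resp = requests.get(CKAN_SEARCH, params={"q": "sea surface temperature", "rows": 5}, timeout=60)
resp.raise_for_status()
result = resp.json()["result"]

print("matches:", result["count"])
for pkg in result["results"]:
    # Each harvested record keeps its resource links (the "online resource
    # linkages" discussed later in the talk) under "resources".
    print(pkg["title"])
    for res in pkg.get("resources", []):
        print("   ", res.get("format", "?"), res.get("url", ""))
```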
So that's some context for the rest of my talk. What I want to focus on is a particular full open source stack that we've been experimenting with. It's not necessarily in operational use at the moment, but I wanted to take some time to illustrate how a few well-known open source projects that you're all familiar with can work together in compliance with Project Open Data. The first of those is GeoServer, a spatial data hosting platform for OGC services. The second is GeoNode, which is essentially a web-based geospatial content management system built to sit on top of GeoServer and provide a dynamic, modern user interface that lets users discover and access the underlying services. And of course there is CKAN, which I've just spoken about.
Some background on GeoServer at NOAA: historically, GeoServer has certainly been used piecemeal in different offices in the agency over the years, along with other open source spatial data hosting systems. However, it hadn't really been used as an enterprise-wide solution until 2011-2012, when the NOAA High Performance Computing and Communications program chose to fund a project to set up a prototype GeoServer that could be deployed agency-wide and used by individual office data providers who don't have the resources to run GeoServer themselves; they could just rely on a shared solution to publish their data. Funding through that project was provided to OpenGeo for a few enhancements to GeoServer. The first was to finalize some work that had been done on the security subsystem, which enables enterprise integration capabilities like LDAP authentication. The second was first-class support for isolation, essentially an improved user management and permission system, so you can restrict users to only have access to their own information rather than everything across the board, which is obviously essential for an enterprise deployment. As a result, the NOAA GeoServer hosting environment has been online for about two years for testing and evaluation, at the URL shown here. It is a prototype and wasn't really planned for operational transition; however, I do want to highlight that this past year the Weather Service, as part of the Integrated Dissemination Program, chose GeoServer, alongside Esri's ArcGIS Server, for its production geospatial hosting service, so there will be some production web services running off GeoServer, which is pretty cool.
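To make the shared-hosting idea concrete, the following is a rough sketch, not NOAA's actual provisioning procedure, of creating a per-office workspace on a GeoServer instance through its REST configuration API; the host, credentials, and workspace name are placeholders.

```python
import requests

# Hypothetical provisioning step for a shared, multi-tenant GeoServer:
# create a per-office workspace via the REST configuration API.  Host,
# credentials, and workspace name are placeholders; the per-user access
# rules discussed above would be layered on top through GeoServer's
# security subsystem.
GEOSERVER = "https://geoserver.example.gov/geoserver"
AUTH = ("admin", "change-me")  # placeholder admin credentials

resp = requests.post(
    f"{GEOSERVER}/rest/workspaces",
    json={"workspace": {"name": "office_abc"}},
    auth=AUTH,
    timeout=30,
)
resp.raise_for_status()
print("workspace created, HTTP", resp.status_code)
```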
Let me step through this open source and open data stack that we've been testing out. The first layer, obviously, is GeoServer. This is a bit of a simplification, since GeoServer provides many additional service types, but I just want to highlight WMS and WFS, which are what we've primarily used in our incubator prototype system. PostGIS should also be mentioned, because PostGIS on PostgreSQL is the underlying data storage backbone for our GeoServer instance and is also used in each of the other components of this stack as well.
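A minimal sketch of talking to those two service types from Python with OWSLib; the endpoint URLs and the layer name are assumed for illustration.

```python
from owslib.wms import WebMapService
from owslib.wfs import WebFeatureService

# Placeholder endpoints for a shared GeoServer instance; the layer queried
# below is likewise hypothetical.
WMS_URL = "https://geoserver.example.gov/geoserver/wms"
WFS_URL = "https://geoserver.example.gov/geoserver/wfs"

# WMS: discover the map layers the server advertises.
wms = WebMapService(WMS_URL, version="1.1.1")
for name, layer in list(wms.contents.items())[:5]:
    print(name, "-", layer.title)

# WFS: pull a handful of raw features from one published layer.
wfs = WebFeatureService(WFS_URL, version="1.1.0")
response = wfs.getfeature(typename=["office_abc:sample_layer"], maxfeatures=10)
print(response.read()[:200])
```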
The second tier in the system is GeoNode. For those not familiar, GeoNode is a web-based spatial content management system, and it's pretty tightly coupled with GeoServer: you essentially pair a GeoNode instance with a GeoServer instance, and it gives you a modern web user interface that is really good for data discovery, with fine-grained permission controls and other features. NOAA's history with GeoNode goes back two years as well. It was included as part of the Federal GeoCloud, an inter-agency working group, in 2012; a NOAA group had a proposal accepted to participate in what was basically a shared infrastructure for transitioning agency-hosted geospatial services to the cloud, to Amazon Web Services. We collaborated with them to build the system that hosts our GeoNode, and we've been tinkering with it ever since. Even though our NOAA GeoNode system isn't operationally deployed yet, through that project the Department of Energy came along and decided they were interested in using GeoNode, so they were able to use our infrastructure as a starting point and deploy their own GeoNode-based system related to the National Environmental Policy Act.
Let me quickly step through some GeoNode features for those who don't know it. This screen capture shows how it brings together the individual data layers with end-user services. A user can go to GeoNode and search by common fields such as title and abstract, filter by ISO topic category keywords, and, if temporal information is present, filter by that as well. GeoNode also includes an integrated CSW service, which is critical for this overall stack design, as you'll see later on. By default it's based on pycsw, but other backends can be plugged in, so if you want to use GeoNetwork, that's available as well. That also provides a good connection point with desktop GIS: for a QGIS user with a CSW search extension, or any other client that can talk to a CSW service or to WMS, it's a great data discovery tool.
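As an illustration of that CSW connection point, here is a hedged sketch of querying a GeoNode catalogue endpoint with OWSLib; the host is a placeholder, and the /catalogue/csw path is the typical GeoNode location rather than a guaranteed one.

```python
from owslib.csw import CatalogueServiceWeb
from owslib.fes import PropertyIsLike

# GeoNode usually exposes its integrated (pycsw-backed) catalogue under a
# /catalogue/csw path; the host below is a placeholder.
CSW_URL = "https://geonode.example.gov/catalogue/csw"

csw = CatalogueServiceWeb(CSW_URL)
query = PropertyIsLike("csw:AnyText", "%temperature%")
csw.getrecords2(constraints=[query], maxrecords=10, esn="full")

for rec_id, rec in csw.records.items():
    print(rec.title)
    for ref in rec.references:
        # Dublin Core references carry the service and download links a
        # desktop GIS client would follow.
        print("   ", ref.get("scheme"), ref.get("url"))
```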
For data access, GeoNode, as I mentioned, is pretty tightly coupled with GeoServer, so it understands the output formats that GeoServer provides. Once a user has logged in and found the data they're looking for, it provides convenient endpoints, and it's very easy to download the information directly.
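Since those download links ultimately resolve to GeoServer OGC requests, a direct equivalent looks roughly like the following WFS GetFeature call asking for GeoServer's GeoJSON output; the host and layer name are again placeholders.

```python
import requests

# Direct WFS GetFeature call equivalent to a GeoNode download link, asking
# GeoServer for its GeoJSON output format.  Host and layer name are
# placeholders.
WFS_URL = "https://geoserver.example.gov/geoserver/wfs"
params = {
    "service": "WFS",
    "version": "1.0.0",
    "request": "GetFeature",
    "typeName": "office_abc:sample_layer",
    "outputFormat": "application/json",
    "maxFeatures": 100,
}

resp = requests.get(WFS_URL, params=params, timeout=60)
resp.raise_for_status()
features = resp.json()["features"]
print("downloaded", len(features), "features")
```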
Additionally, there are two different ways you can publish data to GeoNode. It can be configured so that a user can log into the web interface with a spatial dataset they want to share, upload it interactively to GeoNode along with some relevant metadata, and GeoNode will push it to GeoServer automatically. There's also the opposite approach, which is taking data from an existing GeoServer and pulling it into GeoNode. Either way, once your GeoNode instance is populated with data layers, you get the capabilities of an integrated metadata editor, which I've shown here: if some information is lacking from the required metadata, you have the option to fill it out through the user interface. There's also pretty fine-grained access control, so you can share data with other users or groups of users if you want, or just publish it publicly as well.
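A rough sketch of how those two publication routes map onto GeoNode's management commands; command names and flags vary between GeoNode versions, so treat this as an outline rather than exact syntax.

```python
import subprocess

# Sketch of the two publication routes via GeoNode's management commands;
# command names and flags differ between GeoNode versions, so treat this as
# an outline, not exact syntax.  Paths and workspace are placeholders.

# Route 1: push a local spatial file up through GeoNode, which registers it
# in the paired GeoServer.
subprocess.run(
    ["python", "manage.py", "importlayers", "/data/uploads/sample_layer.shp"],
    check=True,
)

# Route 2: the opposite direction -- scan an already-populated GeoServer and
# pull its layers into GeoNode's catalogue and metadata editor.
subprocess.run(
    ["python", "manage.py", "updatelayers", "--workspace", "office_abc"],
    check=True,
)
```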
Very recently there's been some work done in GeoNode on some pretty cool new features. The first of these is remote services: GeoNode is really meant to run off GeoServer, but it does have a fledgling capability to connect to a remote ArcGIS Server endpoint and pull in layers from its REST API, as well as remote WMS servers and some others. The second is GeoGig. For those who don't know, GeoGig is very similar to Git; it's basically versioned editing for geospatial data. Recent work by some GeoNode partners provides GeoGig read access, so if you configure GeoServer with a GeoGig repository, the edit history for your spatial data can be read and displayed within the GeoNode user interface. There's also an external client called MapLoom that handles the editing side: if you have a spatial dataset and you configure your GeoNode instance to work with MapLoom, it can provide disconnected editing and then sync with the remote GeoGig repository, which makes for a pretty powerful data editing workflow. There's actually a presentation on MapLoom on Friday, so check it out.
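For readers unfamiliar with GeoGig, the following is a hedged sketch of the Git-like command-line workflow such a repository is built on; the command names follow the GeoGig documentation of that era, and the repository path and shapefile are placeholders.

```python
import subprocess

# Hedged sketch of the Git-like workflow GeoGig provides for spatial data,
# driven through its command-line client (command names follow the GeoGig
# documentation of that era; the repository path and shapefile are
# placeholders).  A repository prepared this way is what GeoServer's GeoGig
# datastore -- and, in turn, GeoNode's history view and MapLoom editing --
# would sit on top of.
REPO = "/data/repos/sample_repo"
commands = [
    ["geogig", "init"],
    ["geogig", "shp", "import", "/data/uploads/sample_layer.shp"],
    ["geogig", "add"],
    ["geogig", "commit", "-m", "Initial import of sample_layer"],
    ["geogig", "log", "--oneline"],
]
for cmd in commands:
    subprocess.run(cmd, check=True, cwd=REPO)
```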
The last feature I want to mention is maps: once GeoNode is populated with this variety of layers, you can create integrated map mashups, with the same provisions to share them with the users you choose.
Back to the architecture diagram: GeoNode sits mostly on top of GeoServer, talks to GeoServer via its REST API, and adds the CSW endpoint for data discovery as well as the interactive catalog. Moving along to CKAN.
As I said, the CSW endpoint in GeoNode allows CKAN to use it as a remote harvesting point. The NOAA data catalog is already harvesting several remote catalogs, and with GeoNode's CSW that integration can happen there as well; any GeoNode instance can be automatically harvested by CKAN. There are maybe some similarities between the two products, but CKAN takes more of a data catalog approach to presentation. It does a good job of parsing fields out of spatial metadata and presenting them in an approachable, user-friendly way, and it's good at parsing out the online resource linkages, so users have direct access to the endpoints you want them to use to access your data. It's also pretty efficient in terms of search: it has an Apache Solr instance on the back end, which can be configured to handle spatial search as well. So it's pretty powerful, and it sits nicely alongside GeoNode in this system. And of course CKAN can handle the data.json translation, which is of interest to federal users especially.
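As a sketch of what that spatial search looks like in practice, with ckanext-spatial enabled CKAN's package_search accepts an ext_bbox filter; the catalog URL, query, and bounding box below are illustrative.

```python
import requests

# With ckanext-spatial enabled (and the Solr-backed spatial search just
# mentioned), CKAN's package_search accepts an ext_bbox filter.  The catalog
# URL, query, and bounding box are illustrative.
CKAN_SEARCH = "https://data.noaa.gov/api/3/action/package_search"
params = {
    "q": "bathymetry",
    "ext_bbox": "-125.0,42.0,-117.0,46.5",  # roughly Oregon
    "rows": 5,
}

resp = requests.get(CKAN_SEARCH, params=params, timeout=60)
resp.raise_for_status()
for pkg in resp.json()["result"]["results"]:
    print(pkg["title"])
```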
The other thing I want to mention is that CKAN has some interactive mapping capabilities as well. If your GeoNode instance, or really any spatial metadata record, provides a WMS GetCapabilities endpoint, CKAN has a native map preview tool, so you get interactive capability there too. This actually had to be modified a bit, and I had to tweak it myself to provide this capability, but that's something that hopefully will be merged back into the core at some point.
In the diagram here, you can see I've put them side by side: CKAN really just complements your existing GeoNode site and provides that remote harvest capability, as well as integration with any external catalogs that you may want to use. Lastly, data.gov.
For those who are familiar with the sphere of US federal open data, data.gov is the federal government's open data catalog. It's also CKAN-based, and it works very similarly to the NOAA data catalog: it does a remote harvest of a whole variety of existing federal geospatial metadata sources. I think the plan is to have it at some point exclusively harvest the data.json files; that's not quite implemented yet, but by one means or another it's the merged collection of all federal open data
according to the open data policy. So again, this sits somewhat alongside the core of the stack that I wanted to highlight, but nonetheless data.gov is certainly important in the federal space and is based on the same software.
Just a few take-home points. Hopefully I've shown how these open source technologies can be used together to create a full open data stack for geospatial data that complies with the federal open data policy, if that's of interest to you. NOAA as an agency is trying to continue its role and leadership in the open data world, keeping up with the latest policies as much as possible. Lastly, getting back to the original slide I mentioned, the Digital Government Strategy: one of its main goals was to develop a shared platform for federal IT infrastructure, and I think the work that's been done on CKAN related to the open data policy illustrates a good example of leveraging open source software. If you read the Digital Government Strategy that way, it really encourages not only the use of open source software but also contributions. As a community of IT users in the federal government, why shouldn't we work together to develop a common product and collaborate, as opposed to sitting around waiting for someone else to do it, or going out and buying the same thing many times over? It just makes sense.
Lastly, I want to mention that a lot of the work I've been involved with over the last few years wouldn't have been possible without the support of Doug Nebert, who passed away tragically this year. A lot of this, and a lot of other advancements in the federal geospatial space, were possible because of his leadership, so I just want to give credit where credit is due. If anyone has any questions, I'd be happy to try to answer them, and you can also reach out to myself or Jeff at the email addresses or Twitter handles shown here. Thank you.

[Question, partly inaudible, about whether the software is publicly available.] GeoNode and CKAN are publicly available, and for the work done through the GeoCloud, I think there's no reason it couldn't just be folded back in, but I don't know that for sure.

[Question] I may be exposing my ignorance, but how does the data.json format deal with raster datasets? [Answer] It's really leveraging JSON as a metadata format, so in terms of actually encoding spatial data it doesn't do that. It basically provides the associated metadata for a dataset along with an access URL, so whether it's a dataset published on the web or a web API, it will contain that link, but it doesn't encode the data itself.

[Question] I'm not as familiar with GeoNode and CKAN as I'd like to be, but when would users use one versus the other, if you're sharing through both? [Answer] That's a good question. I think CKAN is a good entry point to the actual data, or to a dataset that exists in GeoNode; it's also indexed well by Google, so if someone does a web search they can find the page on the CKAN site and then be directed to GeoNode for the more interactive mapping capabilities. I think it would flow that way most likely.

[Comment from the audience] Maybe I can add to this: there are two different communities, the so-called open data world and the so-called open geodata world. They're both a little bit different, and they don't communicate with each other that well yet, so we geo people should talk more with the open data people.

[Moderator] OK, I assume everybody is looking forward to the next session, which is called "drinks in the hall." Thank you.

Metadata

Formal metadata

Title Supporting Open Data with Open Source
Series title FOSS4G 2014 Portland
Authors Wengren, Micah
Beaujardiere, Jeff de la
License CC Attribution 3.0 Germany:
You may use, adapt, copy, distribute, and make publicly available the work or its content, in unaltered or altered form, for any legal purpose, provided you credit the author/rights holder in the manner specified by them.
DOI 10.5446/31643
Publisher FOSS4G, Open Source Geospatial Foundation (OSGeo)
Publication year 2014
Language English
Producer FOSS4G
Open Source Geospatial Foundation (OSGeo)
Production year 2014
Production location Portland, Oregon, United States of America

Content metadata

Subject area Computer science
Abstract Within the US Federal Government, there is a trend towards embracing the benefits of open data to increase transparency and maximize potential innovation and resulting economic benefit from taxpayer investment. Recently, an Executive Order was signed specifically requiring federal agencies to provide a public inventory of their non-restricted data and to use standard web-friendly formats and services for public data access. For geospatial data, popular free and open source software packages are ideal options to implement an open data infrastructure. NOAA, an agency whose mission has long embraced and indeed centered on open data, has recently deployed or tested several FOSS products to meet the open data executive order. Among these are GeoServer, GeoNode, and CKAN, or Comprehensive Knowledge Archive Network, a data management and publishing system. This talk will focus on how these three FOSS products can be deployed together to provide an open data architecture exclusively built on open source. Data sets hosted in GeoServer can be cataloged and visualized in GeoNode, and fed to CKAN for search and discovery as well as translation to open data policy-compliant JSON format. Upcoming enhancements to GeoNode, the middle tier of the stack, will allow integration with data hosting backends other than GeoServer, such as Esri's ArcGIS REST services or external WMS services. We'll highlight NOAA's existing implementation of the above, including the recently-deployed public data catalog, https://data.noaa.gov/, and GeoServer data hosting platform, as well as potential build out of the full stack including the GeoNode integration layer.
Keywords Open Data
GeoServer
GeoNode
CKAN
PostgreSQL
OGC
CS-W
catalog
WMS
REST
ISO 19115
ISO 19139
XML
JSON
geospatial metadata
