Data.gov/Geoplatform.gov CSW implementation through pycsw and CKAN integration

Automated media analysis (beta)

Speech transcript
Hi everyone, my name is Angelos Tzotsos and I'm here to present the Data.gov CSW implementation, which is based on pycsw and also uses CKAN integration. In the next minutes I'm going to show you how we implemented the project during the last six months.
The outline of the presentation is this: I'm going to give a short introduction to CKAN and pycsw, then talk about the components of Data.gov, which features we implemented during this project, and how the configuration was done. Then I'm going to show you some demonstration of how to access the data and how to search the catalog, and what is coming in the next months.
Basically, Data.gov is the home of the US Government's open data. This is the second iteration of the project. You are able to find federal, state and local data and resources in order to conduct research, build applications, or do whatever you want with the data, since it's about open data. The Data.gov project is currently run by the General Services Administration, and it is now completely open source, so you can find the source code on GitHub and make modifications to it.
The new portal, version 2, is based on CKAN, which is an abbreviation for Comprehensive Knowledge Archive Network. It is an open-source platform for publishing and sharing open data, and it has a really impressive history of deployments so far. These are just a few of the well-known deployments: the EU Open Data Portal, data.gov.uk in the UK, Australia, and other places. There is also an extension of CKAN, the spatial extension, which enables CKAN to have geospatial capabilities. It is
based on PostGIS, and it has integrated OpenLayers; it now also has Leaflet support. You can access OGC services through OWSLib, it has support for CSW through pycsw, and lately there has been GeoJSON support.
pycsw is an OGC CSW server implemented in Python. It's an open source project, and we're using the MIT license. Currently pycsw is fully certified by the OGC, we are a reference implementation of CSW, and we are also in OSGeo incubation.
This is an overview of the history of the project. Initially there were discussions about integrating CKAN with pycsw, basically because both projects are Python-based and pycsw can be used as a library. There were initial discussions in 2011 about doing such an integration. Then in 2012 OKFN and Data.gov partnered to build the new version of Data.gov. OKFN implemented the first prototype during the first months of 2013, and then GSA took over the project, and extensions were developed in order to bring the product to deployment and production state.
At some point CKAN had an internal implementation of CSW, which was used in the UK, but there were some issues with it, so they finally made the choice to drop the internal implementation and use pycsw as their official CSW library. From then on pycsw was actively involved in the CKAN implementation. We started working on new features that would cover the needs of Data.gov: things like full-text search, repository filtering, connection pooling and so on; I will talk about that later. Then in early 2014 we released pycsw 1.
8, which was the basis for what we needed for Data.gov. Currently, actually on Saturday, we're going to release pycsw 1.10 during the code sprint here, and this is the version that is going to be deployed on Data.gov. It also brings new features like OpenSearch; I'll talk about that later in more detail.
The goals of the project, from our perspective, were to be able to deploy pycsw
as a WSGI application directly into the Data.gov infrastructure; to be able to synchronize the metadata between CKAN and pycsw, because the two projects have different database schemas; and to provide collection-level support, because for the collection level CKAN uses a special extension and data collections. Then we had the big issue of making this run very fast, so we had to optimize for performance during this project. We also wrote some documentation and worked on the deployment configurations.
This is a short diagram of how the project is implemented. The core software is CKAN, which works on top of PostgreSQL and PostGIS in order to be able to do spatial queries, but in order to perform fast queries CKAN uses Solr. The later versions of CKAN can actually do spatial queries through Solr, but PostGIS is still used because it has more features. As you can see here, there are many extensions around CKAN. On the left, the geodatagov extension was implemented directly for this project, and there are other plugins like the harvesters and the spatial extension. I'll talk later about how we deploy using Ansible and RPMs. pycsw was used as a search engine next to Solr in order to provide the CSW interface.
CKAN is very successful because it targets governments, companies and organizations, and they are looking at all kinds of open data, not only geospatial. So CKAN is not only about geospatial software. The good thing is that it provides loosely coupled services which can be turned on and off according to the needs. It has capabilities like publishing and finding datasets, storing data, creating networks of federated nodes,
harvesting between the different sources of data and metadata, and editing and managing metadata. It also has a very strong API which can be used directly to give access to the data.
I have some screenshots of how the original CKAN looks, where you can do searches through the user interface. CKAN is a Pylons project, so it is a Python-based project. Search and discovery is done through this user interface: you just type keywords and get the results, and based on the results you can refine your search, or you can use the keywords or topic categories provided by the user interface. You can also look into the metadata that are provided with the data. CKAN is very capable of using many kinds of metadata harvested from many sources, and this is a very powerful feature. There is also the feature of visualizing geospatial data sources directly on a map, and it can do the same not only for maps but also for grid data, and it can create graphs.
The spatial extension specifically adds a spatial column to the PostgreSQL database of CKAN and uses it to perform queries and display the results in the front end. For this project, for Data.gov, we tried to use only ISO XML metadata, so when we harvested from various other catalogs we were doing transformations in order to have a unified XML representation of all the metadata.
This is the way it is implemented today. There are some extra capabilities added to the CKAN core. As you see, the spatial extension adds the map on the left, where you can create bounding box searches, you can
refine the search with keywords, and it also introduces heuristics for relevance: it ranks search results in terms of the bounding box, so it gives you first the results that fit better to the bounding box that you give. This is a nice thing, because you see more of the data that you are expecting to get. So this is how it performs the filtering that I told you about.
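The bounding-box relevance ranking described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the talk does not say which formula the spatial extension uses, so the intersection-over-union score and the names `BBox`, `overlap_score` and `rank_by_fit` are hypothetical.

```python
from typing import Dict, List, NamedTuple, Tuple


class BBox(NamedTuple):
    """Axis-aligned bounding box in lon/lat order (hypothetical helper type)."""
    minx: float
    miny: float
    maxx: float
    maxy: float

    def area(self) -> float:
        # A box with inverted extents has no area.
        return max(0.0, self.maxx - self.minx) * max(0.0, self.maxy - self.miny)


def overlap_score(query: BBox, record: BBox) -> float:
    """Intersection over union: 1.0 for identical boxes, 0.0 for disjoint ones."""
    inter = BBox(
        max(query.minx, record.minx),
        max(query.miny, record.miny),
        min(query.maxx, record.maxx),
        min(query.maxy, record.maxy),
    )
    inter_area = inter.area()
    union_area = query.area() + record.area() - inter_area
    return inter_area / union_area if union_area > 0 else 0.0


def rank_by_fit(query: BBox, records: Dict[str, BBox]) -> List[Tuple[float, str]]:
    """Return (score, title) pairs for overlapping records, best fit first."""
    scored = [(overlap_score(query, box), title) for title, box in records.items()]
    return sorted((pair for pair in scored if pair[0] > 0), reverse=True)
```

With this score, a record whose extent closely matches the search box outranks a record that merely covers the whole world, which is the behaviour the speaker describes.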
When you find the dataset you are looking for, CKAN provides all the resources and the metadata so that you can view and download the data. From this page, which is the resource page, you can go directly to a map and see a WMS, for example, if there is such a resource, or you can click on the original XML file and look at it directly; there are also some viewers for that. So you have different choices in how you see your data and metadata. The big topic in this project was to be able to visualize many different kinds of resources, so there are things supported like WMS, WFS and GeoJSON, you can download shapefiles, or you can even view spreadsheets directly from the user interface. A large part of CKAN is the dataset preview extension, where you can see the data from the original resource within the viewer. And this is how you get to see the original XML file if you click on it.
This is the CSW interface, which was implemented directly in the CKAN deployment. Here we see something more than 400,000 records within the CSW endpoint.
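As an illustration of what a request to such a CSW endpoint looks like, here is a minimal sketch that builds a CSW 2.0.2 GetRecords request in key-value form. The endpoint URL is a placeholder, not the real Data.gov address, and the function name `getrecords_url` is hypothetical; the parameter names are the standard CSW 2.0.2 KVP ones.

```python
from urllib.parse import urlencode

# Placeholder endpoint (an assumption); substitute the CSW URL of the catalog you query.
CSW_ENDPOINT = "https://example.org/csw"


def getrecords_url(keyword: str, max_records: int = 10,
                   endpoint: str = CSW_ENDPOINT) -> str:
    """Build a CSW 2.0.2 GetRecords KVP request with a CQL full-text constraint."""
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecords",
        "typeNames": "csw:Record",
        "elementSetName": "brief",
        "resultType": "results",
        "constraintLanguage": "CQL_TEXT",
        "constraint_language_version": "1.1.0",
        # csw:AnyText is the queryable that covers the full text of a record.
        "constraint": f"csw:AnyText like '%{keyword}%'",
        "maxRecords": str(max_records),
    }
    return f"{endpoint}?{urlencode(params)}"


if __name__ == "__main__":
    print(getrecords_url("water"))
```

The resulting URL can be opened in a browser or fetched with any HTTP client; the server answers with an XML document listing the matching records.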
So what are the features that we implemented? Basically, we needed to replicate the way that Data.gov handles collections. Instead of showing the end user 400,000 records, Data.gov uses collections as an intermediate layer, so you get to search something like 80 to 90 thousand records, some of which are actually collections of metadata. We needed to be able to do that, and this is why we implemented a filtering process: you have your catalog with 400,000 records, and you can filter those with an SQL query depending on whether a record is a collection dataset or not. This was needed for consistency, so that the CSW behaves similarly to the CKAN search.
Then we did some work on database connection pooling for WSGI, in order to be able to deploy within the Data.gov environment. The most important feature in this project was the PostgreSQL full-text search. This is the feature that made the CSW endpoints really fast, so there is no problem searching those 400,000 records or even more, because PostgreSQL has this nice feature and can build an index of all the data.
Then we did some work on link type detection. The big problem on Data.gov is that you get resources from all those organizations around the US, with data like shapefiles within zip archives, where you don't know what is inside. This was a pretty tough problem, and we could not solve it completely. Another problem is when you have a WMS which doesn't have the word "WMS" in the URL. What do you do then? You might get "ows"; what is that? Is it WMS, is it WCS? So there were problems like that.
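To make the link type problem concrete, here is a small sketch of a URL-based guess. It is not the project's actual detection code; the rules and the name `guess_link_type` are assumptions chosen to show why an explicit `service` parameter is easy, while a bare "ows" path or a zip archive stays ambiguous.

```python
from urllib.parse import parse_qs, urlsplit


def guess_link_type(url: str) -> str:
    """Guess a resource type from its URL; return 'unknown' when nothing matches."""
    parts = urlsplit(url.lower())
    query = parse_qs(parts.query)

    # An explicit OGC 'service' parameter is the strongest hint available.
    service = query.get("service", [""])[0]
    if service in {"wms", "wfs", "wcs", "csw"}:
        return service.upper()

    # Otherwise look for a service name somewhere in the path.
    for token in ("wms", "wfs", "wcs"):
        if token in parts.path:
            return token.upper()

    # File extensions identify the container, not necessarily the content:
    # a zip archive may hold a shapefile or anything else.
    for ext, kind in ((".kml", "KML"), (".csv", "CSV"), (".zip", "ZIP"), (".xml", "XML")):
        if parts.path.endswith(ext):
            return kind

    return "unknown"
```

For example, `guess_link_type("http://example.org/ows")` returns `"unknown"`, which is exactly the case where only better metadata from the publishing organization can help.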
Some organizations do provide the link types in their metadata, so there we know what kind of data it is. We made some heuristics in order to solve the easy cases, but some other cases we could not solve, and we need to ask the organizations to provide new metadata, which is something that nobody wants to do.
The last addition to our features is that we are the first to implement the OGC OpenSearch Geo and Time extensions, because it appeared about two months before our release and it was something that people wanted. This is the extension that lets you do OpenSearch queries that provide bounding boxes and time, not only keywords. It was implemented in pycsw and is going to be released on Saturday, but it is already in the Data.gov CSW deployment.
Apart from what we did for this project, I'm just showing you some features that pycsw already has, like harvesting WMS, WCS and all those standards. We also support [unintelligible], and we implemented INSPIRE support; we have been using the INSPIRE documentation to implement the service. We also support many databases: we can work with SQLite, PostgreSQL, or whatever is possible through SQLAlchemy. There are more features, but we try to keep it simple. The configuration of pycsw can be done in minutes and is very simple. We have a very extensible plugin architecture: if you have a database and your own metadata schema, you can create a plugin so that pycsw can understand your schema and respond to queries according to your database. This requires a little bit of coding. pycsw is already integrated into portals and other Python projects like GeoNode and Open Data Catalog.
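The OpenSearch Geo and Time queries mentioned above combine a keyword with a bounding box and a time range. The sketch below builds such a query string. The parameter names (`mode=opensearch`, `q`, `bbox`, `time`) follow pycsw's OpenSearch mode as best recalled and should be treated as assumptions to verify against the pycsw documentation; the endpoint and the function name `opensearch_url` are hypothetical.

```python
from typing import Optional, Sequence, Tuple
from urllib.parse import urlencode


def opensearch_url(endpoint: str,
                   q: Optional[str] = None,
                   bbox: Optional[Sequence[float]] = None,
                   time_range: Optional[Tuple[str, str]] = None) -> str:
    """Build an OpenSearch-style query with optional keyword, box and time filters."""
    params = {
        "mode": "opensearch",
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecords",
        "elementsetname": "brief",
        "typenames": "csw:Record",
        "resulttype": "results",
    }
    if q:
        params["q"] = q
    if bbox:
        # Order assumed here: west, south, east, north.
        params["bbox"] = ",".join(str(value) for value in bbox)
    if time_range:
        # Start and end joined with a slash, as in an ISO 8601 interval.
        params["time"] = "/".join(time_range)
    return f"{endpoint}?{urlencode(params)}"
```

A call such as `opensearch_url("https://example.org/csw", q="water", bbox=(-125, 24, -66, 50))` would restrict a keyword search to records that fall within that box.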
[unintelligible] These are the standards that we are currently supporting; I won't go through that list.
So how did we actually configure and deploy? It was a long process, not something that was done in a few days. Also, we didn't have any access to the physical machines; we had to provide
every step of the deployment in an e-mail or in some kind of manual, so we had to automate everything: every single thing that would be done through a terminal needed to be automated. There was already work done there, and the Data.gov people were using Ansible for that, so we built on that and provided scripts to do the deployment. The tricky part is that Ansible is not used directly on the servers; it is used to create the RPM packages that are then deployed to the servers of the GSA. So it was a two-level deployment scheme: first we create the RPM packages, and then those RPM packages are sent to the production servers. This way, every month or so, the GSA updates the portal with new patches and new features, so this is an easy way for the administrators to upgrade their systems.
There are three kinds of clusters: the database cluster; the front end, which is CKAN and pycsw; and plenty of harvesters, because the harvesters are doing the hard work, harvesting from every source of metadata that is available around the US, so they need a specific set of hardware. We're using CentOS as the operating system.
Now that the Data.gov catalog is here, how do we use it? We have CKAN and we can do it through the UI, but why do we have CSW, and how can we use CSW? Part of the project was to document this process, and this is why we created a set of documents, so that users can first understand and then do some simple CSW requests. Here I provide some links to the documentation. This presentation is available on the internet, so I will give you the URL on the last slide. There are also many other tools that somebody can use to access the data. One of them is QGIS, where there is the MetaSearch
plugin, which is now a core QGIS plugin, so whenever you install QGIS 2.4, or a later version when that comes, you are able to use MetaSearch to search the metadata from Data.gov or any other CSW server out there. You can also use any other CSW client that you have.
I already told you how the data are organized in collections, so these are the two endpoints that we provided for this CSW implementation. The first one is the first-order, collection-level filtering: this is the URL where you can search the collections. Then, when you search the collections, you can specify the ID of a collection in order to go to the second endpoint and search within that collection. So there are two interfaces instead of one, just to be able to reproduce the workflow that you can do in the CKAN UI.
Since Data.gov uses CKAN and pycsw, others are very interested in that, so suddenly we hear from other organizations that want to reproduce this installation. I have met many people here, and I'm very happy to be here and to be part of this process. A few portals have already deployed the CSW extension, and you see there are more and more coming. We also have a live deployment map where you can find who is doing that.
What did we learn from this process? The first thing we learned is that you have to optimize your database. This cannot be done without database tuning; the database needs to be really fine-tuned in order to be able to do this kind of heavy filtering and searching. It's also very important to create packages, and you have to be very careful with dependencies and with how you deploy the software onto your production machines. In the end it all came down to using the
full-text search of PostgreSQL in order to be really fast, and this is also part of the upgrade of Data.gov to the latest PostgreSQL 9.3, which is actually very fast.
Another thing we learned was that identifiers are very important. What I mean by that is this. [Brief interruption.] The original XML records from all the resources that we harvest have identifiers, in order to be able to index the data. CKAN does something a bit different: when it harvests, it creates new identifiers for the records in order to manage the metadata internally. That was very important, because we had to find ways to synchronize, to couple, the identifiers: the original ones and the new ones. This was a bit of trouble, but it was made possible. There were also some problems where the metadata were not the best; there were issues there, and sometimes we have to ask for updates. This is a work in progress, and some issues are still there, but they will be fixed eventually.
In the future, what we want to do is a deeper integration: not synchronizing between two databases but using only one. This is also a work in progress. The next big thing is going to be CSW 3, which is not yet released by OGC, but we're already working on it, and I think at some point in the next few months it will be released.
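The synchronization and identifier coupling described above can be pictured with a small sketch. It is not the project's synchronization script: the data layout, the `CkanRecord` type and the function `plan_sync` are hypothetical, and the sketch only shows the idea of keying both sides by the CKAN-generated identifier while carrying the original identifier along.

```python
from typing import Dict, List, NamedTuple


class CkanRecord(NamedTuple):
    original_id: str  # identifier found in the harvested XML record
    modified: str     # ISO 8601 timestamp of the last change in CKAN


def plan_sync(ckan: Dict[str, CkanRecord], csw: Dict[str, str]) -> Dict[str, List[str]]:
    """Decide what to change in the CSW repository.

    `ckan` maps CKAN identifier -> record; `csw` maps CKAN identifier -> the
    timestamp already stored in the CSW database.
    """
    ckan_ids, csw_ids = set(ckan), set(csw)
    return {
        # Present in CKAN only: new records to insert.
        "insert": sorted(ckan_ids - csw_ids),
        # Present in the CSW database only: records deleted from CKAN.
        "delete": sorted(csw_ids - ckan_ids),
        # Present in both, but changed in CKAN since the last run.
        "update": sorted(i for i in ckan_ids & csw_ids if ckan[i].modified != csw[i]),
    }
```

Keeping `original_id` next to the CKAN identifier is what allows a CSW response to expose the identifier that the publishing organization knows, even though CKAN manages the record under its own key.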
We have some time, so I could show you a couple of things, like the actual UI and how you can perform searches directly on it. This is the Data.gov portal. [The speaker checks the remaining time and skips the demonstration.]
I want to thank the GSA for supporting this project, [name unclear], with whom we worked together for many months, the FGDC, the GSA and the Data.gov development team, and the integrators, [name unclear] and REI. A big thanks goes to the pycsw development team. Unfortunately the project manager, Doug Nebert, passed away a few months ago, so I want to dedicate this presentation to him. He was a really inspiring person who made this possible, and he was very passionate about CSW 3, but he never got to see it. Anyway, that's life. So this is for Doug. Thank you very much.

Metadata

Formal metadata

Title Data.gov/Geoplatform.gov CSW implementation through pycsw and CKAN integration
Series title FOSS4G 2014 Portland
Author Tzotsos, Angelos
License CC Attribution 3.0 Germany:
You may use and modify the work or its content for any legal purpose, and reproduce, distribute and make it publicly available in unchanged or modified form, provided that you credit the author/rights holder in the manner specified by them.
DOI 10.5446/31698
Publisher FOSS4G, Open Source Geospatial Foundation (OSGeo)
Release year 2014
Language English
Producer Foss4G
Open Source Geospatial Foundation (OSGeo)
Production year 2014
Production place Portland, Oregon, United States of America

Content metadata

Subject area Computer science
Abstract This presentation will discuss the implementation of the CSW endpoint using pycsw within the Data.gov infrastructure (architecture/enhancements/testing/deployment) and CKAN, which powers Data.gov. CSW (Catalogue Service for the Web) is an OGC (Open Geospatial Consortium) specification that defines common interfaces to discover, browse, and query metadata about data, services, and other potential resources. Data.gov provides access to its catalog via the CSW standard for both first-order and all metadata for harvested data, services and applications. Data may be referenced from federal, state, local, tribal, academic, commercial, or non-profit organizations. The first-order CSW endpoint provides collection level filtering of all metadata records. The all metadata CSW endpoint provides all levels of metadata at varying levels of granularity. Any client supporting CSW (desktop, GIS, web application, client library, etc.) can integrate the Data.gov CSW endpoints.
Keywords Open Data
Catalogue
CSW
OGC
pycsw
CKAN
Geospatial
