
pygeofilter: geospatial filtering made easy


Formal Metadata

Title: pygeofilter: geospatial filtering made easy
Number of Parts: 351
License: CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Production Year: 2022

Content Metadata

Abstract

pygeofilter (github.com/geopython/pygeofilter/) is a library to support the integration of geospatial filters. It is split into frontend language parsers (CQL 1 + 2 text/JSON, JFE, FES), a common Abstract Syntax Tree (AST) representation and several backends (database systems) where the parsed filters can be integrated into queries.

Parsers: Currently pygeofilter supports CQL 1, CQL 2 in both text and JSON encoding, the OGC Filter Encoding Specification (FES) and JSON Filter Expressions (JFE) as input languages. Additionally, pygeofilter provides utilities to help create parsers for new filter languages. The filters are parsed to an AST representation, which is a common denominator across all filter capabilities, including logical and arithmetic operators, geospatial comparisons, temporal filters and property lookups. An AST can also be easily created via the API, if necessary.

Backends: pygeofilter provides several backends and helpers to roll your own. Built-in backends are for Django, SQLAlchemy, raw SQL, (Geo)Pandas dataframes, and native Python lists of dicts or objects.

Usage: pygeofilter is used in several applications, such as pycsw (pycsw.org/), EOxServer (github.com/EOxServer/eoxserver/) and ogc-api-fast-features (github.com/microsoft/ogc-api-fast-features/).
Transcript: English (auto-generated)
Hi, everyone. Thanks for joining. I'm going to talk about a small Python library called pygeofilter, and the idea of the talk is easing up geospatial filtering. First, a little bit about me: I'm Fabian Schindler. I work for a small Austrian company called EOX.
We are dealing with geospatial data, Earth observation data, and geospatial services; you can check out our home page. As already mentioned, pygeofilter is a Python package, pure Python, to parse geospatial filters and apply them to storages.
This sounds really abstract. So I'm going to start with explaining a little bit more about the actual problem. So the actual problem is you have a store of geospatial metadata or data. And you have stored records there. And then you want the users to perform geospatial queries.
Maybe not simple queries, like some simple filters with values, but maybe some more complex filters. And you also don't want to expose your system to the public. Maybe you don't want other people to actually change the records on your system
without any security. So you just want to limit them to querying. There are a couple of filtering standards. For example, we have the venerable OGC filters, which are XML-based. You can see here how you would define a compound filter: you have the And filter keyword, you compare a property with a specific value that you provided, and you can also make a spatial query with some envelope. Then you combine them all with the And keyword.
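For illustration, a compound filter of that kind could look roughly like the following; the property names, value and coordinates here are invented, not taken from the slide:

# A generic OGC Filter Encoding (FES) document along the lines described above:
# an And of a property comparison and a spatial test against an envelope.
# All names and numbers are placeholders for illustration only.
FES_FILTER_XML = """
<ogc:Filter xmlns:ogc="http://www.opengis.net/ogc"
            xmlns:gml="http://www.opengis.net/gml">
  <ogc:And>
    <ogc:PropertyIsEqualTo>
      <ogc:PropertyName>name</ogc:PropertyName>
      <ogc:Literal>some value</ogc:Literal>
    </ogc:PropertyIsEqualTo>
    <ogc:BBOX>
      <ogc:PropertyName>geometry</ogc:PropertyName>
      <gml:Envelope srsName="EPSG:4326">
        <gml:lowerCorner>0 0</gml:lowerCorner>
        <gml:upperCorner>10 10</gml:upperCorner>
      </gml:Envelope>
    </ogc:BBOX>
  </ogc:And>
</ogc:Filter>
"""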
So this is XML-based. It's rather heavy-handed and verbose, as you can see. But there are other query filtering languages as well. So for example, as we have seen already with the CQL text, CQL text is part of the catalog specification. And as you can see here, it's basically
the same functionality. But now you have it in a SQL-like filtering language. It's much, much, much less verbose. And it's also quite expressive. But in the end, it actually supports the same concepts as the OGC filter.
There's also now CQL2 JSON, which expresses the same filters that we have seen in the earlier example. So basically, the same semantics are now encoded in JSON. You can do the same things: geospatial queries, temporal queries, and you can combine them with AND and OR and so on.
So you again have the same functionality as before, but encoded in yet another standard. Another thing I really like is JSON Filter Expressions (JFE), which is a different JSON format, not CQL. It's kind of LISP style: you have the operator first, and then
the operands afterwards. I think it's quite expressive. But again, it's just another filtering language to do exactly the same things as the other filtering languages we had before. So again, you have geospatial queries, you have value queries, and you can combine them with ANDs and ORs.
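To make that concrete, here is a minimal sketch of parsing one of these filters with pygeofilter; the module path pygeofilter.parsers.ecql follows the project README, and the claim that the other encodings have sibling parser modules under pygeofilter.parsers is an assumption on my part:

# Parse a simple CQL text filter into pygeofilter's abstract syntax tree.
# Sibling parser modules are assumed to cover CQL2 (text/JSON), FES XML
# and JSON Filter Expressions, all producing the same kind of tree.
from pygeofilter.parsers.ecql import parse

ast_node = parse("population < 50 AND name LIKE 'A%'")
print(ast_node)  # a tree of AST nodes, independent of the input encoding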
So they have very similar concepts. They allow you to access the attributes of your records. They allow you to specify predicates, usually spatial, temporal and others. And then you have logical combinators. But the encodings are very different, and if you want to support more than one, you basically have to write your own parser for each of them
and apply them to your business logic, which is hard. This is compounded by the fact that there are many, many backends that you can then apply these filters to. You have search engines like Elasticsearch. You have higher-level frameworks like SQLAlchemy or Django.
You can also make low-level queries for SQLite or PostgreSQL. And even GDAL has its own SQL interface that you could use. Then there are also other things like GeoPandas: you can have a local data frame that you might also want to filter. Or you can have a NoSQL database like MongoDB.
So we have a very diverse set of backends that you can apply the filters to and that drives your application. And switching from one to the other is, again, a huge task. Maybe you even want to support more than one
because you want your users to configure them. OK, so again, there is a huge variety of backends available. And this is where pygeofilter comes in. On the one hand, it is a filter language parser for all the standards that we have seen before.
And on the other hand, it is an adapter for basically all the backends that you have seen here. It is available on GitHub, and pull requests are welcome. So how does it actually work? It works with the concept that I've explained before. At the top, the green bars
are basically all the filtering languages that are supported. At the bottom, the blue items are the backends that we apply the filters to. And in the middle, we have the thing called an AST. This is a term borrowed from actual computer language parsing.
So for example, a C parser, or whatever language parser, usually has the concept of an abstract syntax tree; I'll come back to it later. The frontend parser has the responsibility of turning whatever is passed to the service into this AST.
And in some cases, for example for the XML filters, we can build on parsers that are already available: we can parse the XML and then construct our AST from that. For JSON, it's also quite easy. But for example, when we have the CQL text,
we have to write our own parser. This is where we are using the Lark framework. Lark is a really nice framework where you can define your grammar in a BNF-like form, and then you can parse out your AST that way. OK, I wanted to expand on the abstract syntax tree. As we said, this is a term borrowed from actual
programming language parsing. It is basically a representation of the filters in an abstract format, and for us it is the least common denominator of all the filtering languages that we support. Every node expresses a single filtering operation.
As the name implies, it is a tree structure: we start from a root node and branch out from there. I will give an example of that. There are several node types. Some are for combining or negating, some are for predicates, some are for property access.
Some are even for arithmetic operations. You can also have function calls, or you can have input values. For example, if you only want to get the records which are inside of a specific bounding box, the bounding box is the input value. OK, now we finally have an example.
So this is a query written in the CQL text query language. We want to get all the records which either have a population lower than 50 or have a name matching some pattern, and which additionally intersect with a given geometry. So it's a complex query.
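Reconstructed from the description (the name pattern and the geometry literal are placeholders, not the values on the slide), the query has roughly this shape:

# A stand-in for the query on the slide; only the structure matches the description.
query = "(population < 50 OR name LIKE 'A%') AND INTERSECTS(geometry, POINT(1 1))"
# Rough structure of the resulting tree:
#   And
#    +-- Or
#    |    +-- population < 50    (property access vs. literal value)
#    |    +-- name LIKE 'A%'     (property access vs. pattern value)
#    +-- Intersects(geometry, <geometry value>)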
We also have some parentheses to group several expressions into a single one, and then we have combinators to combine them: the Or combinator inside the group, and the And combinator joining it with the second part. So what does this look like when we construct an abstract syntax tree? It looks like this.
So the root node would be the And. Then we have the first group in the parentheses, which is basically the left side of the whole tree, and on the other hand we have the right side of the tree, which is just the Intersects predicate. I tried to color-code it to make it a little bit more obvious what is happening.
The green ones are the logical combinators, the blue ones are the predicates, the yellow ones are the property accesses, and the red ones are the actual values. So here, for example, we are comparing with the literal value 50, which is an actual value.
And here we also have a geometry that we are comparing to, which is also a value that is passed in. Right. It's also possible, if for whatever reason you want to, to construct an abstract syntax tree directly in Python. You can simply do that.
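A sketch of what that construction can look like; the class names used here (GeometryWithin, Attribute, Geometry) are assumptions about pygeofilter's ast and values modules, so check those modules for the exact spelling:

# Building an AST node directly, without going through any parser.
# Class names are assumptions; see pygeofilter.ast and pygeofilter.values
# for the node classes that actually exist.
from pygeofilter import ast, values

node = ast.GeometryWithin(
    ast.Attribute("geometry"),            # left-hand side: attribute access
    values.Geometry({                     # right-hand side: a geometry value
        "type": "Polygon",
        "coordinates": [[[0, 0], [10, 0], [10, 10], [0, 10], [0, 0]]],
    }),
)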
It's a simple module, and these are just objects that you combine. So for example, here we have a geometry 'within' predicate, and you pass in the left-hand side and the right-hand side: the left-hand side is the first argument, the right-hand side is the second argument. The left-hand side is just an attribute access, where we access one of the record's attributes.
And on the right-hand side, we are simply passing in the geometry that we want to compare it to. OK, so this is how we construct an abstract syntax tree. But now, what can we actually do with it? Here is the concept of a backend and an evaluator, which are very close to each other. The backend is what we call
the implementation of, for example, your Elasticsearch service. This is your backend. And the evaluator is the piece sitting in between the backend and the AST. Why do we need that? We need to transform the AST into something that the backend understands, and each backend functions completely differently.
For example, PostgreSQL requires SQL queries, Elasticsearch has a JSON construct defining the query, and so on. So for every backend you have a different filter language, and it varies a great deal.
So for each one, you basically need a plugin to deal with that particular backend. Also, it's important to note that not all backends support all AST types. Especially for spatial queries,
not every query mechanism might be supported. For example, MongoDB only has a limited set of spatial queries that you can actually perform, so it's not possible to translate the whole set of AST nodes to that particular backend. Some are more conformant; PostgreSQL, for example, basically supports everything.
But some backends are lacking in that regard. And this is fine, you just need to know about it and handle it accordingly, because you have chosen your backend for a specific set of requirements, and if you know its capabilities, you can access them using the filters.
So pygeofilter already has a set of built-in backend connectors. We have one for Django, one for SQLAlchemy, one for Elasticsearch and MongoDB. There's also one for generic SQL. Generic SQL is quite powerful, because you can pass it
on to many different backends; for example, you can also use it with the GDAL API, so you basically have access to many different backends again through GDAL. Currently there is a caveat: because we are targeting GDAL, it is not
possible to properly separate the input values from the query, which is usually done in order to limit SQL injection attacks. So this should be used with caution. We also have the GeoPandas backend.
So when you have a local GeoPandas data frame, you can create your filters and apply them to it. We also have a native backend: if you have an iterator or a list of local Python objects in your code that you want to filter, you can do that as well.
There are also some special backends. One is for CQL2 JSON. The idea behind that is that we actually use a frontend language again as a target: we get an AST from any filtering language, and we can encode that again into CQL2 JSON.
So pygeofilter now acts as a translator between two filtering standards. This is used, I think, in the Planetary Computer by Microsoft, because their system only understands CQL2 JSON, so it is translated using pygeofilter.
And it actually performs its task really well. Then there is the next special backend, the optimization one. It can do some static analysis on your abstract syntax tree in order to simplify it, and in some cases it can even cut out parts of the abstract syntax tree entirely. Usually, this is done by the backends themselves.
And they are more performant at it, so this is not really used anywhere unless you're using the native backend. Right, this is how you would use it in code. For this example, I used MongoDB: we connect to MongoDB and select the database and the collection.
Then we parse a CQL query to an abstract syntax tree. This is where we then make the actual evaluation: we simply pass the AST to a function called to_filter, which is from the MongoDB backend module. What we get back is a filter.
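The flow just described might look roughly like this; the backend module path (pygeofilter.backends.mongodb) and the exact to_filter signature are assumptions on my part, since the talk only names the function:

# Sketch: parse a CQL filter, turn it into a MongoDB filter document, run it.
from pymongo import MongoClient

from pygeofilter.parsers.ecql import parse
# Module path assumed; the talk only refers to "the MongoDB backend module".
from pygeofilter.backends.mongodb import to_filter

client = MongoClient("mongodb://localhost:27017")
collection = client["mydb"]["records"]        # select database and collection

ast_node = parse("population < 50")           # CQL text -> AST
mongo_query = to_filter(ast_node)             # AST -> MongoDB filter document

# The filter document can still be tweaked here before it is used.
results = list(collection.find(mongo_query))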
We can then transform the filter yet again: if we have expert knowledge about the backend, we can make changes to it or extend it with other filters. And then we simply pass it on to the collection, to MongoDB, which does the actual filtering work, and we get back our results. If you have a backend that is not represented in this list,
it's rather easy to roll your own evaluator. This is also done in Python: you simply inherit from a specific base class, and then you can declare, with decorators, which abstract syntax tree nodes you want to handle in a given function.
Then you simply write a small function that translates this abstract syntax tree node into whatever is required for your backend. There's also a catch-all, the adopt function, which is called if a particular node was not handled by the others.
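Rolling your own can look roughly like the following sketch; the Evaluator base class, the handle decorator, the handler signature and the node class names are assumptions based on how the built-in backends are described, so compare with one of them for the exact interface:

# A minimal custom evaluator that turns a tiny subset of the AST into strings.
# Evaluator and handle are assumed to live in pygeofilter.backends.evaluator,
# mirroring how the built-in backends are structured.
from pygeofilter import ast
from pygeofilter.backends.evaluator import Evaluator, handle


class ToStringEvaluator(Evaluator):
    # Handlers are assumed to receive the node plus already-evaluated sub-results.
    @handle(ast.Attribute)
    def attribute(self, node):
        return node.name

    @handle(ast.LessThan)
    def less_than(self, node, lhs, rhs):
        return f"{lhs} < {rhs}"

    # Catch-all for node types not handled above.
    # Literal values and the remaining node types would need handlers of their own.
    def adopt(self, node, *sub_args):
        raise NotImplementedError(f"unsupported node: {node}")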
Right. Finally, some projects that are using pygeofilter already: pycsw, which we have already heard about, and also our own geospatial data server, EOxServer, is using it. In the sphere of the Planetary Computer
by Microsoft, there's also a project called ogc-api-fast-features, which is using it. Yeah, and I hear some rumors that it's going to be used in even more projects, so I'm really proud. This whole project is now just, I'd say, two years old, so it's rather new.
If you find it interesting, please write issues, contribute to discussions, open pull requests; everything is welcome. I'd like to say thank you for your attendance and your attention, and questions are welcome.