
Supporting precision farming with GeoServer: past experiences and way forward


Formal Metadata

Title
Supporting precision farming with GeoServer: past experiences and way forward
Series Title
Number of Parts
351
Author
Contributors
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose, as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Year of Publication
Language
Production Year
2022

Content Metadata

Subject Area
Genre
Abstract
The amount of data available from drones, Earth observation and the machinery itself (i.e. telemetry data), plus the advent of cloud infrastructure, has given a huge impulse to innovating the way we used to support farmers and farming in general, democratizing access to data and capabilities like never before through precision (or digital) farming solutions. Precision farming has therefore become one of the main use cases for GeoServer deployments over the past years, and at GeoSolutions we have worked with many clients, from NGOs to large private companies (like Bayer), from startups to organizations like DLR, helping them support their clients in making sense of data and information through GeoServer and other geospatial open source technologies, at scale, in the cloud. This presentation will condense 10 years of GeoSolutions experience in ingesting, managing and disseminating data at scale in the cloud for the precision farming industry, covering items like:
- Proper optimization and organization of raster data
- Proper optimization and organization of vector data
- Modeling data for performance and scalability in GeoServer and PostGIS
- Deployment guidelines for performance and scaling of GeoServer
- Styling to create NDVI and other visualizations on the fly
At the end of the presentation the attendees will be able to properly design and plan a GeoServer deployment to serve precision farming data at scale.
Transcript: English (automatically generated)
Good morning everybody. So, I'll try to skip through the marketing stuff. I'm telling you, the presentation is a wall of text with a lot of information, because this was supposed to be a workshop at the beginning; but then we didn't have time to prepare it, because we had too many things going on, so we decided to turn what we were proposing into a presentation.
So the goal is to give you information that you can follow up on later: there are a lot of links to other resources, webinars and things like that. We'll basically try to talk about our experience in supporting companies that, one way or the other, work with GeoServer for precision farming. You probably know what GeoServer is, so I'll skip this as well.
Trying to set, let's say, the context and the key concepts: at GeoSolutions we develop a number of open source products. Probably the most important one is GeoServer; other companies work on it, other contributors too, but it is still a very important piece of, let's say, work and business for us. Right now we support a number of companies that, one way or the other, are working with precision farming. The type of data varies a lot.
Companies tend to focus on one type of data or the other; a few focus on all of them, but most focus on one or the other. Some are working with EO data; they're more focused on longer-term forecasts and predictions. Some use it for real-time processing and indexes. Some focus more on drones. Some focus on field sensors, weather stations and so on: think about wineries. I mean, when you have super expensive wines you want to know the temperature, the humidity, everything; close to the sea, the salinity, which many people don't think about, but it's an important thing to keep under control, and you need field sensors on the wineries. Machinery positions and telemetry are more about intensive farming. It's not an important thing for Italy, for example, because we have few large farms; but in the US, Ukraine obviously, Russia, Brazil, there it's very important. We also work with companies that work with meteorological models, and more.
Deployments vary a lot, as you can imagine, because some are small startups, some are large organizations that focus on EO data, some are actually subsidiaries of larger organizations that decided to branch off and focus on precision farming. So things vary a lot, but in general what we help these organizations build is essentially Data as a Service solutions: they sell the data, usually as subscriptions (data, information, indexes, whatever), and we help them build these platforms, focusing on the data we saw before.
So usually we're talking about a lot of data that is continuously ingested or changes very frequently. Processing is done on top to come up with additional information: dashboards, alerts, charts, whatever, because you basically have to monitor the environment one way or the other when you do precision farming, and also the tools that you use to work on the environment, like the tractors and so on. Everybody's heard the stories, you know, about the tractors that were remotely turned off in Russia, whatever. Anyway. So what are we going to talk about? We're going to try to address some of the things we have seen that can be challenging when working on precision farming with GeoServer, and obviously the tools around GeoServer, but we focus mostly on GeoServer because that's what we do. It's too early to fall asleep.
Okay, first thing: EO data. That's probably the most common, and it's actually where all the EO companies are pushing, so we have a tremendous amount of EO data, both from the private sector as well as the public sector. Just think about Sentinel and Landsat, and then think about Planet, etc., so we had to find a use for it. The typical EO scenario is multispectral data. Sometimes it's also SAR data, because, again, think about wineries: you want to monitor landslides, or potential landslides, or land movements that you want to anticipate. I mean, think about the wineries in Italy, or in France, even in Spain; think about the ones on the Mosel River: they are on steep hills, so you need to monitor them, okay? But most of the time it's multispectral, and now hyperspectral, data. I'm a computer engineer, so I don't know anything about the science behind it and I'm not going to talk about it. Most of the time it's actually pure visualization, or
building the usual indexes with map algebra and band math, like NDVI. Some companies focus only on individual imagery: although they ingest a lot of data, they're always focused on the freshest data. Some companies work more with deep time series, because they want to do comparisons and analysis. One thing that you might have seen, for example, talking about wineries (obviously, we're in Italy, and it's a good example because they do many different things and they usually have a lot of money, which is not bad, and they help you experiment): right now there is this huge push to understand how to cope with the fact that the temperature is rising, and this poses a huge threat to wineries. Maybe you know, maybe you don't, but too much sun is actually super bad for the wine, because it increases the alcohol percentage, it changes the taste, etc. So you actually want to predict that, in order to understand the exact period when you cut the leaves, and how many leaves you cut, in order not to expose the grapes too much to the sun. And so there are huge studies with long time series to understand what to do now and what to do in the future. For example, investors are starting to think: we should probably start making wine, or more wine, in England, or in these places where in Roman times they used to make wine because the weather was different. They don't do it any longer, but maybe the conditions will be super good in a few years, and there is money and technical people to do it. So it's also about planning.
Okay, typical mistakes that we see, and what to do. As I told you, there is a lot of text and then a lot of links. The number one thing is understanding the data you want to serve with GeoServer. We usually get involved with the usual question: GeoServer is slow, what can I do, etc. Ninety percent of the time people have not optimized the data they are serving, especially when we're talking about EO data. You should always optimize, and there are recipes for that afterwards. We won't give you a lot of details here, but you know what to do: GeoTIFF compression, tiling, overviews. These are things you can automate, especially if you ingest Earth observation data: as far as I know there are a lot of satellites going up, but the good sources don't change continuously, so once you set up the automation you can ingest terabytes, actually petabytes, of data relatively easily nowadays, with the tools that we have and the cloud infrastructure. This is the first thing you should look into.
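This kind of automation can be a short script. A minimal sketch, assuming GDAL's Python bindings are installed; file names are hypothetical:

```python
from osgeo import gdal

# Minimal sketch: rewrite an incoming raster as an internally tiled,
# compressed GeoTIFF, then add overviews so zoomed-out requests read
# far fewer pixels.
def optimize(src_path: str, dst_path: str) -> None:
    gdal.Translate(
        dst_path,
        src_path,
        creationOptions=[
            "TILED=YES",
            "COMPRESS=DEFLATE",
            "BLOCKXSIZE=512",
            "BLOCKYSIZE=512",
        ],
    )
    ds = gdal.Open(dst_path, gdal.GA_Update)
    ds.BuildOverviews("AVERAGE", [2, 4, 8, 16, 32])
    ds = None  # close and flush to disk

optimize("granule_raw.tif", "granule_optimized.tif")
```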
For example, one thing that we do for some of our clients: usually duplication is bad, but in certain cases you might want to create visualization-ready versions of your data while you still keep the raw data. What does this mean? Let's say you have hyperspectral data, or even Sentinel-2, which is 13 bands if I'm not mistaken. It's always good to have some data prepared for quick visualization, even if you duplicate. Which means: you keep all your raw bands, so you can compute whatever index you want on the fly, but you also keep an RGB version, pre-processed and compressed, e.g. as JPEG, which is super fast and super small.
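The visualization-ready copy can be produced once at ingestion time. A hedged sketch for a Sentinel-2-like product; the band numbers and the 0-3000 reflectance stretch are assumptions to adapt:

```python
from osgeo import gdal

# Sketch: derive a small, fast RGB copy from a multispectral image while
# the raw bands stay untouched. Bands 4/3/2 (red/green/blue) and the
# 0-3000 stretch are assumptions for a Sentinel-2-like product.
gdal.Translate(
    "scene_rgb.tif",
    "scene_raw_bands.tif",
    bandList=[4, 3, 2],
    outputType=gdal.GDT_Byte,
    scaleParams=[[0, 3000, 0, 255]],  # applied to every selected band
    creationOptions=["TILED=YES", "COMPRESS=JPEG", "PHOTOMETRIC=YCBCR"],
)
```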
So, as I wrote at the end: do not fear compression, tiling and overviews. You need to do them, and you need to do them as early as possible in your process. Sometimes you might need to sacrifice space, like disk space, for performance, duplicating a little bit. And there is a ton of links to things you might want to review if you're interested. There is also this document here, on precision farming with GeoServer, which we actually wrote for a few clients and then decided to put in the open: it's probably 70 pages of, let's say, ideas and suggestions on how to do things with GeoServer.
Now, about this first point you might say: yeah, it's obvious, everybody makes this mistake at the beginning, not optimizing data. Well, it is not that obvious: I've seen people trying to serve a 500-gigabyte DTM out of GeoServer with no overviews, a single huge GeoTIFF image. And they said it was slow; I'm actually surprised they were able to see anything at all. Anyway, the other thing is that you need to organize your data inside GeoServer one way or the other, and this is true for any server. I'll go directly to a simple example.
You start with GeoServer and you start publishing. You collect your data, I don't know, Sentinel-2, and then Landsat, and you publish a single layer every time you get an image. At the beginning it works very fast; after five years you have ten thousand, ten million layers. The capabilities document explodes, because it becomes one gigabyte, and most of the time you're not even able to generate it. When you restart GeoServer it takes five minutes, etc. Why does this happen? Because you didn't model. I mean, when you start with a database, well, they taught me in software engineering that you should do normalization and so on, but at least you always do a little bit of modeling: you don't just throw data into the database all the time.
Maybe someone will not agree, with NoSQL etc., but it's always good to think about what you do before you do it, in general. The same approach should be used with GeoServer: you need to try and organize your data. If you see that your data is actually a harmonized flow of similar data that changes over time, like a time series, you should probably model it as a time series in GeoServer: a single layer holding a petabyte of data rather than one million layers for the same amount of data, because those will become impossible to search. Again, these are very simple suggestions, but I've seen many companies struggle with this. Because then what happens: you build clients on top, you make the assumption that the structure of the data is a certain structure, and going back and changing the structure to use an image mosaic, like a time series layer, is almost impossible, because you would need to rewrite the clients. So you end up building all sorts of gateways in the middle, because you restructured the server
but you can't rewrite the clients, etc. So you have even more layers of problems. The solution we usually use in this case, and there is a lot of information linked here, is the image mosaic; we'll see a little bit more afterwards. Basically the image mosaic, and there are similar tools in other software like MapServer, allows you to have, instead of many many different layers when you are collecting remote sensing data (we do it also with drone data, whatever), a single layer that stays more or less constant and to which you can add multiple dimensions. It can be time, it can be other dimensions: for example, for drones, you'll see that we use a flight ID. You can add more drone flights later into the same container, and you should do it. So you can end up with 10, 15, 20 different layers with a time dimension, more or less constant, and you can use a REST API to ingest and delete data. You could have, for example, a Sentinel-2 layer with the raw bands, and then, as you'll see, there are ways to actually do band math and build the indexes on the fly, and you might have on the side an RGB Sentinel-2 layer; you keep ingesting Sentinel data from all over the world, and you use time and other dimensions, via WMS, via WCS, etc., to access the data.
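To make the append workflow concrete, here is a hedged sketch that harvests one granule into an existing ImageMosaic through the GeoServer REST API and then requests one slice of the same layer by dimension. Server URL, credentials, workspace, store, layer and dimension names are all hypothetical:

```python
import requests

BASE = "http://localhost:8080/geoserver"   # hypothetical server
AUTH = ("admin", "geoserver")              # hypothetical credentials

# Harvest a new file into the mosaic: no new layer, no configuration change.
r = requests.post(
    f"{BASE}/rest/workspaces/farm/coveragestores/drone_ndvi/external.imagemosaic",
    data="file:///data/mosaics/drone_ndvi/flight_0042_20220825.tif",
    headers={"Content-Type": "text/plain"},
    auth=AUTH,
)
r.raise_for_status()

# Query one slice via WMS using TIME plus a custom dimension
# (GeoServer exposes custom dimensions with the DIM_ prefix).
params = {
    "service": "WMS", "version": "1.3.0", "request": "GetMap",
    "layers": "farm:drone_ndvi", "crs": "EPSG:4326",
    "bbox": "43.0,11.0,43.1,11.1", "width": 512, "height": 512,
    "format": "image/png",
    "time": "2022-08-25", "DIM_FLIGHT_ID": "0042",
}
png = requests.get(f"{BASE}/wms", params=params).content
```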
This is super important, also because it will help you with clustering; we'll talk about that later on. And then: caching. Usually when people see things being slow they say: we should cache. Yes, but it's not by mistake that I'm talking about caching after all the rest: caching is the last thing you should do, after you have improved everything else, because otherwise we just hide the problems for a little while and make them harder to solve afterwards. People tell me: the solution is caching, right? Yes, but only if you have optimized everything else first; if not, you'll need to go back. Okay: once you have optimized the data and the structure is more or less optimal, yes, you need to cache. You usually start with tile caching, so server-side caching, and then, if you can, you should look into HTTP caching, so browser caching, if you know what I'm talking about. This is super important.
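A quick way to verify the browser-caching side is to fetch one tile and look at the HTTP caching headers that come back. A minimal sketch; the endpoint, layer and tile coordinates are hypothetical:

```python
import requests

# Fetch one WMTS tile from the embedded GeoWebCache and print the headers
# that let browsers and CDNs reuse it. Layer and tile address are hypothetical.
resp = requests.get(
    "http://localhost:8080/geoserver/gwc/service/wmts",
    params={
        "service": "WMTS", "version": "1.0.0", "request": "GetTile",
        "layer": "farm:ndvi", "style": "",
        "tilematrixset": "EPSG:900913", "tilematrix": "EPSG:900913:8",
        "tilerow": 95, "tilecol": 135, "format": "image/png",
    },
)
# Without Cache-Control/ETag, every pan and zoom goes back to the server
# even when the tile has not changed.
for h in ("Cache-Control", "Expires", "ETag", "Last-Modified"):
    print(h, "->", resp.headers.get(h))
```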
Just to put the three things together: people may say, but if I cache things I don't need to optimize the data. Well, not always; in my experience you actually should. Only in very few cases can you rely on caching alone, because under load caching per se doesn't improve things: it actually adds to the amount of work that you need to do on the server. Caching works very well, as I usually see, if you reuse the cache a lot, but it has an impact, because you need to do some processing, some extra work, etc. So it's important that you optimize first. What you need to look for, if you put the dots together, is a structure where your data doesn't get updated but is appended all the time. That makes everything easier to cache, because when you have a new image you're not replacing anything, so for the same URL you always get the same image; if you use a time dimension and things like that, every time you look at something new it's a different URL, and this is the perfect situation for caching. I'll skip some of this to get to the deployment part. COG: we have been using COG a lot lately.
There is a little bit of information here on how to use it: we now have COG support in GeoServer for S3, Google Cloud and soon Azure, natively. It works both for a plain GeoTIFF as well as for the image mosaic, and if you can use object storage, I would use it. It won't give you a performance boost over, you know, super fast disks, but in terms of scalability etc. it's the way to go. And we have had companies go from, for example, AWS EFS to S3: you can save like an order of magnitude in money, with decent performance.
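The reason COG on object storage works is HTTP range reads: a client fetches the file header plus only the tiles it needs, never the whole file. A small sketch using rasterio; the URL is hypothetical, any public COG will do:

```python
import rasterio
from rasterio.windows import Window

# Open a Cloud Optimized GeoTIFF straight from object storage and read a
# 512x512 window: only a few HTTP range requests, not a full download.
URL = "https://example-bucket.s3.amazonaws.com/fields/ndvi_2022_08.tif"  # hypothetical

with rasterio.open(URL) as src:
    block = src.read(1, window=Window(0, 0, 512, 512))
    print(block.shape, src.overviews(1))  # overview levels stored in the file
```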
This is what I was talking about: the image mosaic. Again, there is no time to go into details, but there are links to resources for everything. The goal, when you organize your data, is to keep the number of layers in GeoServer more or less constant, and to publish them as containers with dimensions, so that you can continuously add data to them, possibly in an append-only way or in a moving-window way: you either add and remove, or you simply add. We have had clients with petabyte datasets, so they had been adding a lot of data for a huge amount of time; but if you only focus on a short window, like nowcasting or things like that, you can just keep a moving window. And having a moving window is important, for example, for what I said before about caching. So: dimensions, etc. These obviously may make your clients a little bit more complex, but still.
This is interesting: GeoServer has support for map algebra, which reduces the amount of pre-processing you have to do. As in the example at the beginning, most of the simplest indexes can be computed on the fly if you have the raw data; you could also compute the RGB. But usually we pre-compute the RGB for performance reasons, and all the other indexes we compute on the fly, because there are a lot of combinations and it would take a lot of time to compute them all. If you pre-process your data, this is going to be fast; if you don't, it's going to be even slower than serving plain data, obviously, because this adds more computation every time you touch a pixel.
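As a sketch of the band math that such on-the-fly map algebra evaluates, here is NDVI computed from raw bands in plain Python; the band numbers (8 = NIR, 4 = red) and the file name are assumptions for a Sentinel-2-like product:

```python
import numpy as np
from osgeo import gdal

# NDVI = (NIR - red) / (NIR + red), computed from raw bands.
ds = gdal.Open("scene_raw_bands.tif")  # hypothetical file
nir = ds.GetRasterBand(8).ReadAsArray().astype("float32")
red = ds.GetRasterBand(4).ReadAsArray().astype("float32")

denom = nir + red
# Guard against division by zero on nodata/black pixels.
ndvi = np.where(denom == 0.0, 0.0, (nir - red) / np.where(denom == 0.0, 1.0, denom))
print(float(ndvi.min()), float(ndvi.max()))
```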
But if you optimize the data and you do proper caching, this will be super fast. Drones: I'll probably skip part of this, I told you there is a lot of stuff. The interesting part about drones is that you can use the same approach.
We usually use it with time and flight ID, because even for drone data you can more or less build a conceptual model for similar data and have a container for it. So you have a single layer, and you select which scenes you want to view or process using flight IDs, etc. An interesting thing, since we talked about raster data: there is also vector data. Usually you have vector data on the side as well, for example the results of processing you did on the EO scenes or on the drone data, so we need to find a seamless way to view the two things at the same time.
We tend to use this approach of having what we call SLTs, single large tables, partitioned on the same dimensions, and we're talking about time, flight ID, etc., so that you can quickly access portions of this vector data out of super large tables; we're talking about PostGIS, obviously. You shouldn't be scared of having tables in PostGIS that are super big, if you organize them properly, and this is what we do most of the time.
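A minimal sketch of such a single large table, using PostgreSQL declarative range partitioning on acquisition time; the connection string, table and column names are hypothetical:

```python
import psycopg2

# One "single large table" for all flight-derived vector results,
# partitioned by acquisition time so queries prune to a few partitions.
conn = psycopg2.connect("dbname=farming user=postgres")  # hypothetical DSN
with conn, conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS field_results (
            id        bigserial,
            flight_id text        NOT NULL,
            acquired  timestamptz NOT NULL,
            geom      geometry(Polygon, 4326),
            ndvi_mean real
        ) PARTITION BY RANGE (acquired);
    """)
    # One partition per month keeps each chunk small and prunable.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS field_results_2022_08
            PARTITION OF field_results
            FOR VALUES FROM ('2022-08-01') TO ('2022-09-01');
    """)
    # Spatial index per partition for fast bbox access from GeoServer.
    cur.execute("""
        CREATE INDEX IF NOT EXISTS field_results_2022_08_gix
            ON field_results_2022_08 USING gist (geom);
    """)
```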
What we are experimenting with is actually a mosaicking approach, not released yet, similar to what we do for raster data, using FlatGeobuf or other streaming formats for vector data, so that we can stream them. IoT is a similar story. One thing that I want to say: we have a number of companies that started with Elasticsearch for IoT data, and our experience is mixed, to be honest. It can be super good and super bad, depending on your experience, on how you organize the data, and on how you want to visualize the data. The one thing I can tell you: Elasticsearch is the perfect use case if you want to get a small dataset out of an enormous dataset. If you are trying to visualize one million points coming from Elasticsearch, maybe you're doing something wrong, because you're moving one million JSON points, etc., so it's going to be super slow and there is nothing you can do. You can actually do aggregation and geohashing directly in Elasticsearch; but if you really want to see all those points, and there are reasons for doing that, something like PostGIS is usually much better.
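When the client only needs an overview, letting Elasticsearch aggregate the points into geohash cells avoids shipping raw documents. A hedged sketch; the index and field names are hypothetical:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # hypothetical cluster

# Ask for geohash buckets (plus an average per cell), not a million documents.
resp = es.search(
    index="field-sensors",  # hypothetical index
    size=0,                 # buckets only, no hits
    aggs={
        "grid": {
            "geohash_grid": {"field": "location", "precision": 5},
            "aggs": {"avg_moisture": {"avg": {"field": "soil_moisture"}}},
        }
    },
)
for b in resp["aggregations"]["grid"]["buckets"]:
    print(b["key"], b["doc_count"], b["avg_moisture"]["value"])
```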
Okay, quickly, a bit of the deployment part; this would probably be a workshop per se. I'll say something and then I'll stop, in like ten seconds. I'll start from the resources, because we could talk for hours about this topic, okay? There is a huge amount of material and documentation on the web; I put together some things in the slides. This is actually the most important thing, and there is no way around it: you need to have a look. But first, a few words.
Cloud and GeoServer: yeah, you can do it. Obviously it's not cloud native, but there are efforts to improve this, like the GeoServer Cloud extension. And we have been running GeoServer everywhere, with huge amounts of data. I don't think anybody serves millions of concurrent users with any GIS application that I know of, but we have peaked at 20,000 concurrent users. So, it can be done. Let me see, there are a few more things, this is important, and then I'll stop.
Let's put things together: optimized data, a good conceptual model, caching, and then you need scalability and high availability. First you get performance, meaning speed, and then you need to scale, so you need to cluster, possibly with auto-scaling, etc. If you do things properly, you don't need anything super fancy to scale GeoServer, because, if you think about what I said, you can keep ingesting terabytes of data per day without having to touch GeoServer at all: you simply use the REST interface, or publish data into the database if it's vector data. So GeoServer can auto-scale: you keep your configuration stable, and GeoServer can run as containers that you auto-scale up and down
continuously. That's why, putting the pieces together, we try to build a structure where you don't create new layers continuously, but you publish data to existing layers. So you don't touch the GeoServer configuration: GeoServer simply adds new data behind the scenes, to the image mosaics or to PostGIS, and you keep serving. And if the data is incremental, you don't even need to truncate any caches before seeing the new data: you can do that in the background, because old data falls out of the window, you know, that's the interesting part, and you can truncate the cache lazily. This is the simplest way of clustering GeoServer; we call it back office / production, and we use it 95% of the time.
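The background truncation can be driven through the GeoWebCache REST API; a hedged sketch, with hypothetical server URL, credentials and layer:

```python
import requests

# Ask the embedded GeoWebCache to truncate the cached tiles of one layer
# (zoom 0-12, one grid set) so fresh data shows up; runs in the background.
BODY = """
<seedRequest>
  <name>farm:ndvi</name>
  <gridSetId>EPSG:900913</gridSetId>
  <zoomStart>0</zoomStart>
  <zoomStop>12</zoomStop>
  <format>image/png</format>
  <type>truncate</type>
  <threadCount>1</threadCount>
</seedRequest>
"""
r = requests.post(
    "http://localhost:8080/geoserver/gwc/rest/seed/farm:ndvi.xml",
    data=BODY,
    headers={"Content-Type": "text/xml"},
    auth=("admin", "geoserver"),
)
r.raise_for_status()
```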
And I think that's it. As I said, there is a lot of stuff, but there are links and documentation for everything we didn't have time to cover.