Supporting precision farming with GeoServer: past experiences and way forward
Formal Metadata
Title | Supporting precision farming with GeoServer: past experiences and way forward
Number of Parts | 351
License | CC Attribution 3.0 Unported: You may use, adapt, copy, distribute and make the work or its content publicly accessible, in unchanged or changed form, for any legal purpose, as long as you credit the author/rights holder in the manner they specify.
Identifiers | 10.5446/69024 (DOI)
Production Year | 2022
Transcript: English (automatically generated)
00:00
Good morning everybody. I'll try to skip through the marketing stuff because, I'm telling you, the presentation is a wall of text with a lot of information. This was supposed to be a workshop at the beginning, but then we didn't have time to prepare it because we had too many, so we decided to turn it into a presentation.
00:23
So the goal is to give you information that you can follow up on later, so there are a lot of links to other resources, webinars and things like that, and we basically try to talk about our experience in supporting companies that, one way or the other, work with GeoServer for precision farming. You probably already know what GeoServer is, so I'll skip this as well.
00:49
Trying to set, let's say, the context and the key concepts: at GeoSolutions we developed a number of open source products. Probably the most important one is GeoServer; other companies work on it, other contributors too,
01:02
but it is still a very important piece of, let's say, work and business for us. Right now we support a number of companies that, one way or the other, are working with precision farming. The type of data varies a lot, and
01:21
companies tend to focus on one or the other; a few focus on all of them, but most focus on one or the other. Some are working with EO data and are more focused on longer-term forecasts and predictions, some use it for real-time processing and indexes, some focus more on drones, some focus on field sensors, weather stations and so on. Think about wineries: when you have
01:47
super expensive wines you want to know the temperature, the humidity and everything; close to the sea, the salinity too. Many people don't think about that, but it's an important thing to keep under control, and you need field sensors on the wineries.
02:03
Weather prediction is more about intensive farming. It's not an important thing for Italy, for example, because we have few large farms, but in the US, Ukraine obviously, Russia, Brazil, there it's very important. We work with companies that work with meteorological models and more.
02:26
Deployments vary a lot, as you can imagine, because some are small startups, some are large organizations that focus on EO data, some are actually subsidiaries of larger organizations that decided to branch off and focus on precision farming. So
02:42
things vary a lot. But in general what we help these organizations build is actually data-as-a-service solutions: they sell the data, usually as subscriptions, data, information, indexes, whatever, and we help them build these platforms, focusing on the data we saw before.
03:02
So usually we're talking about a lot of data that is continuously ingested or changes very frequently. Processing is done on top to come up with additional information, dashboards, charts, whatever, because you basically have to monitor the environment one way or the other when you do precision farming, along with the tools that you use to
03:25
work on the environment, like the tractors and so on. Everybody has heard the stories, you know, about the tractors that were remotely turned off in Russia, whatever. Anyway. So what are we going to talk about? We're going to try and address some of the things we have seen that can be
03:46
challenging when doing precision farming with GeoServer, and obviously the tools around GeoServer, but we focus mostly on GeoServer because that's what we do. It's too early to fall asleep.
04:00
Okay, first thing: EO data. That's probably the most common, and it's actually what all the EO companies are pushing for, so we have a tremendous amount of EO data, both from the private sector as well as the public sector. Just think about Sentinel and Landsat, and then think about Planet, etc., so we had to find a use for it.
04:23
The typical EO scenario is multispectral data. Sometimes it's also SAR data, because again, think about wineries: landslides, or potential landslides, or land movements that you want to anticipate. Think about the wineries in Italy or in France, even in Spain,
04:43
think about the ones on the Mosel River: they are on steep hills, so you need to monitor them, okay? But most of the time it's multispectral, and now hyperspectral, data. I'm a computer engineer, so I don't know anything about the science behind it and I'm not going to talk about it. Most of the time it's actually pure visualization, or
05:02
building the usual indexes with map algebra and band math, like NDVI. Some companies focus only on individual imagery; although they ingest a lot of data, they're always focused on the freshest data. Some companies work more with deep time series because they want to do comparisons and analysis.
05:23
One thing that you might have seen, for example, talking about wineries, because obviously we are in Italy and it's a good example: they do many different things and they usually have a lot of money, which is not bad, and they help you experiment. Right now there is this huge push to understand how to cope with the fact that the
05:42
temperature is rising, and this poses a huge threat to wineries because, maybe you know it or maybe you don't, too much sun is actually super bad for the wine: it increases the alcohol percentage, it changes the taste, etc. So you actually have to predict that in order to understand the exact period when you
06:05
cut the leaves, and how many leaves you cut, in order not to expose the grapes too much to the sun, etc. And so there are huge studies with long time series to understand what to do now and what to do in the future. And for example investors are starting to think: yeah,
06:20
we should probably start making more wine in England, or in these places where in Roman times they used to make wine because the weather was different; now they don't do it any longer, but maybe the conditions will be super good in a few years, and there is the money and the technical people to do it. So it's also about blending.
06:41
Okay. Typical mistakes that we see and what to do about them: as I told you, there is a lot of text and then a lot of links. The number one thing is understanding the data you want to serve with GeoServer. We usually get involved with the usual question: GeoServer is slow, what can I do, etc.
07:01
Ninety percent of the time people have not optimized the data they are serving, especially when you're talking about EO. You should always optimize, and there are recipes afterwards. We won't give you a lot of details here, but you know what to do: GeoTIFF compression, tiling,
07:21
overviews. These are things you can automate, especially if you ingest Earth observation data; as far as I know a lot of satellites are being launched, but the data sources don't change continuously, so once you set up the automation you can ingest terabytes, actually petabytes, of data relatively easily nowadays with the tools that we have and the cloud infrastructure. So this is the first thing that you should look into. And for example, one thing that we do for some of the clients:
07:47
usually duplication is bad, but in certain cases you might want to come up with visualization-ready versions of your data while you still keep the raw data. What does this mean? Let's say you have
08:01
hyperspectral data, or even Sentinel, which is 13 bands if I'm not mistaken: it's always good to have some data which is prepared for quick visualization, even if you duplicate. Which means you keep all your raw bands, so you can do whatever index you want on the fly, but you also keep an RGB that is pre-processed, compressed with JPEG, etc., which is super fast and super small.
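To make this concrete, here is a minimal sketch of such an optimization step using the GDAL Python bindings. The file names, band order, stretch values and compression settings are assumptions for illustration, not a recipe from the talk:

```python
from osgeo import gdal

# Hypothetical input: a raw multi-band scene (e.g. a Sentinel-2-like product).
src = "raw_scene.tif"

# 1. Tiled, compressed copy of the raw bands with overviews, kept for on-the-fly band math.
gdal.Translate(
    "raw_scene_optimized.tif", src,
    creationOptions=["TILED=YES", "COMPRESS=DEFLATE", "BLOCKXSIZE=512", "BLOCKYSIZE=512"],
)
ds = gdal.Open("raw_scene_optimized.tif", gdal.GA_Update)
ds.BuildOverviews("AVERAGE", [2, 4, 8, 16, 32])
ds = None

# 2. Duplicated, visualization-ready RGB: 8-bit, JPEG-compressed, small and fast to render.
gdal.Translate(
    "rgb_visualization.tif", src,
    bandList=[4, 3, 2],                  # assumed red/green/blue band order
    outputType=gdal.GDT_Byte,
    scaleParams=[[0, 3000, 0, 255]],     # assumed reflectance stretch
    creationOptions=["TILED=YES", "COMPRESS=JPEG", "PHOTOMETRIC=YCBCR", "JPEG_QUALITY=85"],
)
ds = gdal.Open("rgb_visualization.tif", gdal.GA_Update)
ds.BuildOverviews("AVERAGE", [2, 4, 8, 16, 32])
ds = None
```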
08:27
So, as I wrote at the end: do not fear compression, tiling and overviews. You need to do them, and you need to do that as early as possible in your process. Sometimes you might need to sacrifice
08:40
space, like disk space, for performance, duplicating a little bit. And there is a ton of links to things that you might want to review if you're interested; there is also this document here, precision farming and so on, which we actually wrote for a few clients and then decided to put out in the open. So it's probably 70 pages of,
09:02
let's say, ideas and suggestions on how to do things with GeoServer. The other thing: about the first point you might say, yeah, it's obvious, everybody makes these mistakes at the beginning, not optimizing data. Well, it is not that obvious: I've seen people trying to serve a 500 gigabyte
09:21
DTM out of GeoServer with no overviews, a single huge GeoTIFF image. They said it was slow; I'm actually surprised they were able to see anything at all. Anyway, the other thing is that you need to organize your data inside GeoServer one way or the other, and this is true for any server. I'll go directly to a simple example.
09:42
You start with GeoServer and you start publishing: you collect your data, I don't know, Sentinel-2 and then Landsat, and you publish a single layer every time you get an image. At the beginning it works very well; after five years you have ten thousand, ten million layers. The capabilities document explodes because it becomes one gigabyte and most of the time you're not even able to generate it.
10:03
It's impossible; when you restart GeoServer it takes five minutes, etc. Why does this happen? Because you didn't... I mean, when you start with a database, well, they told me in software engineering you should do, you know, normalization, etc., but you always do at least a little bit of modeling. You don't just throw the data into the database as it comes.
10:24
Maybe someone will not agree, with NoSQL etc., but still, in general it's always good to think about what you do before you do it. So the same approach should be used with GeoServer: you need to try and organize your data. If you see that your data is actually a harmonized flow of similar data that changes over time, like a time series,
10:44
you should probably build a time series inside GeoServer, so a single layer holding a petabyte of data rather than one million layers for the same amount of data, because otherwise it will become impossible to search. Again, this is a very simple suggestion, but I've seen many companies struggling with this, because then what happens? You build clients on top,
11:04
you make the assumption that the data has a certain structure, and going back and changing that structure to use an image mosaic, like a time series layer, is almost impossible because you would need to rewrite the client. So you end up building all sorts of gateways in the middle, because you restructured GeoServer
11:22
but you can't rewrite the client, etc., so you add even more layers of problems. The solution we usually use in this case, and there is a lot of information here, is to use the image mosaic. We'll see a little bit more afterwards; basically the image mosaic, and there are similar tools in other software like MapServer, etc.,
11:43
allows you, instead of having many, many different layers when you are collecting some remote sensing data (we do it also with drone data, whatever), to have a layer that stays more or less constant, and you can add multiple dimensions to it. It can be time, it can be other dimensions; for example for drones, as you will see, we use a flight ID.
12:02
You can add more drones later in the same container, and you should do it. So you can end up having 10, 15, 20 different layers with a time dimension, more or less constant, and you can use a REST API to ingest and delete data. You could have, for example, a Sentinel-2 layer with the raw bands, and then, as you will see, there are ways to actually do
12:24
band math and build the indexes on the fly, and you might have on the side an RGB Sentinel-2 layer. You keep ingesting the Sentinel data all over the world and you use time and other indexes, sorry, other dimensions, via WMS, via WCS, etc., to access the data.
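As an illustration of this REST-driven ingestion, here is a minimal sketch using Python's requests against GeoServer's image mosaic REST endpoints; the base URL, credentials, workspace, store and attribute names are assumptions, and an existing mosaic store is presumed:

```python
import requests

GEOSERVER = "https://example.com/geoserver"   # assumed base URL
AUTH = ("admin", "geoserver")                 # assumed credentials

# Harvest a newly arrived granule into an existing image mosaic store, so the
# layer stays the same and simply gains one more time slice.
resp = requests.post(
    f"{GEOSERVER}/rest/workspaces/eo/coveragestores/sentinel2_rgb/external.imagemosaic",
    data="file:///data/sentinel2/S2_RGB_20220801.tif",
    headers={"Content-type": "text/plain"},
    auth=AUTH,
)
resp.raise_for_status()

# For a moving window, old granules can be removed through the granule index of
# the same store; "ingestion" is the (assumed) name of the mosaic's time attribute.
requests.delete(
    f"{GEOSERVER}/rest/workspaces/eo/coveragestores/sentinel2_rgb"
    "/coverages/sentinel2_rgb/index/granules.json",
    params={"filter": "ingestion BEFORE 2022-01-01T00:00:00Z"},
    auth=AUTH,
)
```

The layer itself never changes; only its granule index grows or shrinks, which is what keeps the GeoServer configuration stable.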
12:44
So this is super important, also because it will help you with clustering; we'll talk about that later on. And then caching: usually when people see things are slow they say, we should cache. Yes, but it's not by mistake that I'm talking about caching after the rest.
13:04
Caching is the last thing you should do, after you have improved everything else, because otherwise you just hide the problems for a little while and it will make them worse to solve afterwards. Okay? And people tell me: the solution is caching, isn't it? Yes, but only if you have optimized everything else first; if not,
13:24
you need to go back. Okay, once you have optimized the data and the structure is more or less optimal, yes, you need to cache. You usually start with tile caching, so server-side caching, and then, if you can, you need to look into HTTP caching, so browser caching, if you know what I'm talking about. This is super important.
13:45
Just to put the three things together: people may say, but if I cache things I don't need to optimize the data. Well, not really; in my experience you actually should.
14:00
Only in very few cases can you rely on caching alone, because under load caching per se doesn't improve things; it actually increases the amount of work that you need to do on the server. Caching works very well, as I usually see, if you reuse the cache a lot, but it has an impact, because you need to do some processing, you need to do
14:23
some extra work, etc. So it's important that you optimize first. What you need to look for, if you connect the dots, is actually a structure where your data doesn't get updated but is appended all the time. That way it's much easier to cache, because when you have a new image
14:40
you're not replacing anything, and for the same URL you always get the same image. If you use a time dimension and things like that, every time you look at something new it's a different URL, and this is the perfect situation for caching.
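To illustrate the point about stable URLs, here is a small sketch of a WMS GetMap request against a time-enabled layer; the endpoint, layer name and bounding box are hypothetical. With an append-only ingestion model, the response for a fixed TIME value never changes and can be cached aggressively:

```python
from urllib.parse import urlencode

# A GetMap request for one time slice of a hypothetical mosaic layer. Because the
# granule for that time is appended once and never rewritten, the same URL always
# returns the same image, which is what tile caches, proxies and browsers want.
params = {
    "service": "WMS",
    "version": "1.3.0",
    "request": "GetMap",
    "layers": "eo:sentinel2_rgb",             # assumed workspace:layer
    "crs": "EPSG:3857",
    "bbox": "1113194,5009377,1252344,5148569",
    "width": "256",
    "height": "256",
    "format": "image/png",
    "time": "2022-08-01T00:00:00Z",           # fixed time slice -> stable URL
}
print("https://example.com/geoserver/eo/wms?" + urlencode(params))
```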
15:02
I'll try to skip some of this and get to the deployment part. Well, COG: we have been using COG a lot lately. There is a bit of information on how to use it; GeoServer now has COG support for S3, Google Cloud and soon Azure natively. It works both for a pure GeoTIFF as well as for the image mosaic, and if you can use object storage, I would use it.
15:22
It won't give you a performance boost over, you know, super fast disks, but in terms of scalability, etc., it's the way to go, and we have had companies go from, for example, AWS EFS to S3; you can save like an order of magnitude in money, okay, with decent performance.
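For completeness, a sketch of producing a Cloud Optimized GeoTIFF with GDAL's COG driver (available since GDAL 3.1); the file names are placeholders and the upload to object storage is only indicated in a comment:

```python
from osgeo import gdal

# Convert an already-prepared GeoTIFF into a Cloud Optimized GeoTIFF.
# The COG driver writes the tiling and overviews in the cloud-friendly layout itself.
gdal.Translate(
    "scene_cog.tif", "raw_scene_optimized.tif",
    format="COG",
    creationOptions=["COMPRESS=DEFLATE", "BLOCKSIZE=512", "OVERVIEW_RESAMPLING=AVERAGE"],
)

# The resulting file can then be pushed to object storage (S3, GCS, ...) and
# referenced from GeoServer, e.g. via a URL such as
#   https://my-bucket.s3.amazonaws.com/scene_cog.tif   (hypothetical)
```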
15:41
This is what I was talking about, the image mosaic. Again, there is no time to go into details, but there are links to resources for everything. The goal when you organize your data is to keep the number of layers in GeoServer more or less constant, but to set them up as containers with dimensions, so you can continuously add data to them,
16:03
possibly in an append-only way or in a moving-window way: so you add and remove, or you simply add. Okay, we have had clients with petabyte-sized datasets, so they have been adding a lot of data for a huge amount of time, but if you only focus on a narrow window, like nowcasting or things like that,
16:21
you can just have a moving window. And having a moving window is important, for example, for what I said before about caching. So: dimensions, etc. These obviously may make your client a little bit more complex, but still. This is interesting: GeoServer has support for
16:43
map algebra, in order to reduce the amount of pre-processing you have to do. For example, again, there was an example at the beginning: most of the simplest indexes can be computed on the fly if you have the raw data; you can also compute the RGB.
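To make the band math concrete, this is what the classic NDVI computation looks like as a pre-processing step with rasterio and NumPy; the band numbers follow a Sentinel-2-like layout and the file names are assumptions. In GeoServer the same formula can instead be evaluated on the fly through the map algebra support mentioned here:

```python
import numpy as np
import rasterio

# Assumed Sentinel-2-like band layout: band 4 = red, band 8 = near infrared.
with rasterio.open("raw_scene_optimized.tif") as src:
    red = src.read(4).astype("float32")
    nir = src.read(8).astype("float32")
    profile = src.profile

# Classic NDVI formula: (NIR - Red) / (NIR + Red), guarding against division by zero.
ndvi = np.where((nir + red) > 0, (nir - red) / (nir + red), 0.0)

profile.update(count=1, dtype="float32", compress="deflate", tiled=True)
with rasterio.open("ndvi.tif", "w", **profile) as dst:
    dst.write(ndvi, 1)
```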
17:04
But usually what we do is pre-compute the RGB for performance reasons, and then all the others we compute on the fly, because there are a lot of combinations and it would take a lot of time to compute them all. And if you pre-process your data, this is going to be fast; if you don't, it's going to be even slower than the normal data,
17:24
obviously, because this adds more computation every time you read a pixel. But if you optimize the data and you do proper caching, this will be super fast. Drones: I would probably skip part of this; I told you, there is a lot of stuff. The interesting part about drones is that you can use the same approach.
17:44
We usually use it with time and a flight ID, because even with drone data you can more or less make a conceptual model for similar data and have a container for it. So you have a single layer and you select which scenes you want to view or
18:02
process using flight IDs, etc. An interesting thing, since we talked about raster data: vector data. Usually you also have vector data on the side, for example the processing that you did on the EO scenes or on the drone data, etc., so you need to find a seamless way to view the two things at the same time.
18:22
We tend to use this approach of having what we call single large tables, partitioned on the same dimensions (think about time, flight ID, etc.), so that you can quickly access portions of this vector data out of super large tables; we're talking about PostGIS, obviously.
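A minimal sketch of what such a partitioned single large table could look like in PostGIS, driven from Python with psycopg2; the database, table and column names are invented for illustration and the PostGIS extension is assumed to be installed:

```python
import psycopg2

# Hypothetical schema: one large observations table, declaratively partitioned by
# acquisition time so that queries for a given period only touch a few partitions.
DDL = """
CREATE TABLE IF NOT EXISTS drone_observations (
    id          bigserial,
    flight_id   text        NOT NULL,
    acquired_at timestamptz NOT NULL,
    geom        geometry(Polygon, 4326),
    properties  jsonb,
    PRIMARY KEY (id, acquired_at)
) PARTITION BY RANGE (acquired_at);

CREATE TABLE IF NOT EXISTS drone_observations_2022_08
    PARTITION OF drone_observations
    FOR VALUES FROM ('2022-08-01') TO ('2022-09-01');

CREATE INDEX IF NOT EXISTS drone_observations_2022_08_gix
    ON drone_observations_2022_08 USING gist (geom);
"""

with psycopg2.connect("dbname=farming user=geoserver") as conn:
    with conn.cursor() as cur:
        cur.execute(DDL)
```

Partitioning on the same dimensions used for the raster containers (time, flight ID) keeps the raster and vector access patterns aligned.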
18:42
You shouldn't be scared of having tables in PostGIS that are super big, if you organize them properly, and this is what we do part of the time. What we are experimenting with is actually a mosaicking approach, not released yet, similar to what we do for raster data, using FlatGeobuf or other
19:05
streaming formats for vector data, so we can put them in a stream. IoT is a similar thing. One thing that I want to say: we have a number of companies that started with Elasticsearch for IoT data, and
19:22
our experience is mixed, to be honest. It can be super good and super bad, depending on your experience, on how you organize the data and on how you want to visualize it. The one thing I can tell you: Elasticsearch is the perfect use case if you want to get a small dataset out of an enormous dataset. If you are trying to visualize one million points coming from Elasticsearch,
19:44
maybe you're doing something wrong, because you're moving one million JSON points, etc., so this is going to be super slow and there is nothing you can do about it. You can actually do aggregation, geohashing, directly in Elasticsearch, but if you really want to see all those points, and there are reasons for doing that, something like PostGIS is usually much better.
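For reference, the aggregation and geohashing just mentioned map to Elasticsearch's geohash grid aggregation; here is a minimal query sketch over its HTTP API, with an invented index and field name:

```python
import requests

# Aggregate IoT points into geohash cells on the Elasticsearch side, so the client
# only receives a few hundred buckets instead of a million raw documents.
query = {
    "size": 0,
    "aggs": {
        "cells": {
            "geohash_grid": {"field": "location", "precision": 5}
        }
    },
}
resp = requests.post(
    "http://localhost:9200/iot-observations/_search",  # assumed index name
    json=query,
)
for bucket in resp.json()["aggregations"]["cells"]["buckets"]:
    print(bucket["key"], bucket["doc_count"])
```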
20:02
Okay, now a little bit about the deployment part. This would probably be a workshop in itself; I'll say something and then I'll stop, in like ten seconds. I start from the resources, because we could talk for hours about this topic, okay?
20:26
There is a huge amount of material and documentation on the web. I put together some things in the slides, but this is actually the most important thing and there is no way around it: you need to have a look. But first, a few words about
20:44
cloud and GeoServer. Yeah, you can do it. Obviously it is not cloud native, but there are efforts to improve this, like the GeoServer Cloud extension, and we have been running GeoServer everywhere, with a huge amount of data.
21:01
I don't think anybody has millions of concurrent users with any GIS application that I know of, but we have reached, we have peaked at, 20,000 concurrent users. So it can be done. Let me see, there are a few things, this is important, and then I'll stop.
21:22
Let's put things together: optimize the data, a good conceptual model, caching, and then you need scalability and high availability. Okay, because first you get performance, like speed, and then you need to scale, so you need to cluster, possibly with auto-scaling, etc. If you do things properly
21:41
you don't need anything super fancy to scale GeoServer, because if you think about what I said, you can keep ingesting terabytes of data per day without having to touch GeoServer at all, simply by using the REST interface or by publishing data into the database if it's vector data. So GeoServer can auto-scale: you can have your configuration, and GeoServer can run as containers that you auto-scale up and down
22:04
continuously. That's why, if you now put the pieces together, we try to have a structure where you don't create new layers continuously, but you publish data to existing layers. So you don't touch the GeoServer configuration; GeoServer simply picks up new data behind the scenes, in the image mosaics or in PostGIS, and you keep serving. And
22:23
if the data is incremental, you don't even need to truncate any caches before seeing new data; you can do that in the background, because old data will fall out of the interesting window and you can truncate its cache then. So this is the simplest way of scaling GeoServer; we call it back office / production, and we use it 95% of the time.
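As one concrete piece of that workflow, a background truncate of a cached layer can be issued through the GeoWebCache seeding REST endpoint embedded in GeoServer; here is a sketch with requests, where the base URL, credentials, layer name, gridset and zoom range are assumptions:

```python
import requests

GEOSERVER = "https://example.com/geoserver"   # assumed base URL
AUTH = ("admin", "geoserver")                 # assumed credentials

# Ask GeoWebCache to truncate the cached tiles of one layer in the background, so
# that freshly ingested (or expired) data becomes visible without blocking serving.
truncate = {
    "seedRequest": {
        "name": "eo:sentinel2_rgb",           # assumed workspace:layer
        "gridSetId": "EPSG:4326",
        "format": "image/png",
        "type": "truncate",
        "zoomStart": 0,
        "zoomStop": 12,
        "threadCount": 1,
    }
}
resp = requests.post(
    f"{GEOSERVER}/gwc/rest/seed/eo:sentinel2_rgb.json",
    json=truncate,
    auth=AUTH,
)
resp.raise_for_status()
```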
22:43
And I think that's it. As I said, there is a lot of stuff, but there are links and documentation, and we don't have any more time.