PROBA-V mission exploitation platform

Video in TIB AV-Portal: PROBA-V mission exploitation platform

Formal Metadata

PROBA-V mission exploitation platform
CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Subject Area
In January, the European Space Agency launched the first version of the PROBA-V mission exploitation platform (MEP). This platform, which is fully operated by VITO Remote Sensing, aims to simplify the use of open remote sensing data, which should eventually result in operational applications that benefit society. Exploitation platforms are the way of the future for handling the ever-increasing volumes of remote sensing data, and at VITO we believe that the use of open source software is the only way to collaborate on this shared vision. In this talk, I first want to give a general overview of what users can do with the PROBA-V MEP. This ranges from using an OpenStack VM loaded with FOSS software and direct access to the dataset, to accessing a Hadoop cluster where users can distribute their processing using Spark. Secondly, I want to show how we are using GeoTrellis to support interactive queries on the full time series of remote sensing data that is available in the platform, and how this can be done from within an interactive Scala notebook in the browser.
Keywords VITO Remote Sensing
OK, let's move to the next presentation.
Thank you, everybody. I'm Jeroen Dries from VITO, presenting the mission exploitation platform for the PROBA-V satellite mission, which we created for ESA, or which we are creating, in fact. I'll try to explain a bit what it is, then demo most of the applications, and try to explain as well as I can what the platform is.
At VITO we are basically the remote sensing unit, and we cover the whole range of remote sensing applications. We have our own platforms, UAVs and airborne, we are making sensors for those platforms, and then we have a number of value-added services, like creating products based on the recorded data.
Our most important platform is in fact PROBA-V. It's a satellite mission which captures the globe daily, a global image at 300 meter resolution, and we are basically doing the user segment for that for ESA. The data comes in each day, we do processing, we do atmospheric correction and so on, and we generate Level 3 products that are distributed through our catalog services.
Basically, that's how a user segment has always been built up until now: you just provide a catalog service where users can download the data, and the user downloads it from there and then has to do whatever he wants with his own processing resources. Of course this means that there is a long feedback loop between the users and the user segment. We also don't always know what they want or what they are doing with our data.
Now, as those datasets become larger and larger, this downloading of data becomes a bit of an issue, so it's easier to bring the users to the data, because we already have a fairly large data center that hosts all of this. If we simply add the processing resources, then the user can just come to us, and we provide a cloud platform where they can do their processing. Basically that's the vision of ESA for exploitation platforms.
We're not the only ones doing that; there will also be others. We are a mission exploitation platform because we have the mission as our central theme, but there are also thematic exploitation platforms, like the one for forestry or the polar exploitation platform. Those also exist.
So that's the idea. What do we have in there? There is of course the complete archive of the PROBA-V satellite, but also of its predecessor, SPOT-VGT, which provided 1 kilometer products. If you combine those two, you get a time series which starts in 1998, so you get a pretty long time series for continuous monitoring of, for instance, vegetation. To that we add some scalable IT resources using OpenStack, so it's really a cloud environment that we have, and we add some additional processing resources using the Hadoop distributed processing system, which allows you to really do distributed processing, to write distributed algorithms on the data which is in this data center. For instance, the well-known tool there is Spark, so you can write Spark jobs that process the full archive of PROBA-V data. On top of that we also try to support you and try to provide documentation. So this exploitation platform is really a big thing which contains a whole bunch of things: it's not just the software products or just a cloud platform, it's also the user support and so on.
If you look at it from a technical side, it is basically a layered architecture where we have of course the full archive of PROBA-V, but we can also provide access to other satellite missions, because sometimes you want to combine data from various missions, so it makes sense to have a way to also query that data. PROBA-V is of course all online, it's very easy to reach and very fast, but we don't have the full Sentinel-2 dataset downloaded at our data center. Then on top of that we have those resources, with OpenStack in there, and a whole bunch of open source libraries. Almost everything we use is open source, so I won't be able to mention each project we use by name, but it's the usual ones like GDAL, and also more cutting-edge stuff like GeoTrellis, which can be used to do raster processing on top of Spark and Hadoop. I will show that later on. But also, for instance, you can very easily use GRASS. You as a user will of course have direct access to those libraries, but we also build applications on top of them, and those applications are what I will try to demo now.
So the idea is actually that we have some real end-user applications that are accessible for a large group of users: if you're a journalist, or if you don't know anything about remote sensing or about vegetation monitoring, you can still use them. But of course you have less flexibility. If you go for more flexibility, we have IPython notebooks, where a remote sensing expert who knows a bit of Python can use the algorithms that he knows and the Python modules that he knows to do processing, but also using Spark, so using these distributed resources to make things go really quite fast on these large datasets. And if you go down all the way, we also provide support for developers, so you will be able to plug in your own applications. Let's say you have an algorithm and you want to provide it to the user, but the algorithm requires a dataset: you will be able to deploy a full service inside the platform, so you can really build your own applications. We will have some examples of that as well.
If you go to the website, this is the PROBA-V MEP website, just a simple website, and here we have the list of applications. The first one to see, which is the easiest to understand and which also shows the dataset, is a simple viewer. You can just view the global image. The one you see here is from July, but it always contains the latest data, and when you browse to another date it updates. Of course you can zoom in. Viewing is not something the remote sensing expert does very often, but for a more regular user this is already nice.
We also have a standard product in there, the ten-daily composite, which you can also view, so you can easily go through the real dataset. Then the next thing to do... or maybe I should say something else first. We have the PROBA-V data, but we are also involved in the global land component of the European Copernicus programme, and for that there are also some data products, for which we built similar services. So if you go to the Copernicus Global Land Service website, you can now access those viewer services, and there you can do the same thing but for those products. These are really products about, sometimes, long time series of various vegetation conditions that you can view in there. That has also just been released; it's basically the same application but ported to the Copernicus products. The next step is that we will provide those layers directly on the PROBA-V MEP website as well, which makes more sense.
Before the next application: the design of the viewer is basically all open source. I won't go too deep into detail, because there are much better GeoServer talks over here, but it is a cluster of GeoServers, with a tile cache server in front of it for the WMTS service, for faster serving of map tiles, and then we have everything around that to do monitoring and so on. Of course, if you have any questions about that, I'm here for all of the conference, so don't hesitate.
The next one is the time series viewer, which is more interesting because it provides access to precomputed time series. As I said, for PROBA-V the data starts in 2014, so it's a relatively short time series; for Copernicus it sometimes starts in 1998. So really, if you want to do vegetation analysis, it's a nice tool to get a quick idea of differences between regions.
Here it shows the time series at the country level on the map, but if you zoom in, which makes more sense, you can click on the regions and query the time series for a region, and also for a specific land cover. That usually makes more sense to do, because otherwise you aggregate over entirely different land covers together and get something which doesn't say as much. I'm not at all a remote sensing expert myself, but when you browse through these time series you do see the differences between regions, and you get a good feel for what the various parameters mean. Here we only have NDVI, and you can combine it with the rainfall, for instance, to get an idea of the correlation between NDVI and rainfall. If you go to the Copernicus version, you see that the time series is much longer and that you have more parameters available, so if you combine one of them with the vegetation cover, for instance, you get an idea of how they behave.
Then the next step. What people usually do in their research is that they have a very specific area of interest that they are researching, certain parcels for which they need to query the time series, and that's what we will be doing next: you can just draw your own polygon, or maybe upload a shapefile, or click on a point, and you get that time series on the fly. At this level it's still aggregated, which is not good enough for a lot of people, but if you can really get it on the fly, you can learn a lot, for your own garden, about how it has been behaving over the last years. The only problem is that the resolution is 1 kilometer, so you need a fairly big garden.
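To make the NDVI and rainfall comparison mentioned above concrete, here is a minimal sketch of a Pearson correlation between two time series for one region. It is plain Python written for illustration only: the function name and the sample values are assumptions, not data or code from the platform.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Made-up values for one region, one value per period (illustration only).
ndvi = [0.21, 0.25, 0.34, 0.48, 0.55, 0.52, 0.40]
rainfall_mm = [5.0, 12.0, 30.0, 55.0, 60.0, 41.0, 20.0]

r = pearson(ndvi, rainfall_mm)
print(f"correlation between NDVI and rainfall: {r:.2f}")
```

Vegetation often responds to rainfall with a delay, so in practice one might shift one of the two series by a few periods before correlating; that step is left out here.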
That's the time series viewer application. Technology-wise, we built it using OpenLayers, and there is a kind of OLAP cube behind it to serve the time series data very fast. The time series themselves are extracted from the imagery using our Hadoop distributed processing environment. So that's basically all open source. This is the demo in case I don't have internet.
Then the last application, which I will not really show, is an example of how we want to provide algorithms to the community, and of how you can integrate one. A very easy example is the N-daily compositing. We provide a ten-daily composite made with a specific algorithm, but not everybody may be happy with that: some users want to use another sort of algorithm, or want to use 5 days, or whatever. Using this tool you can provide all those parameters, select your area of interest, and generate a composite. The next step, which you can also do, is create a subscription, so you say: OK, whenever new data becomes available, I want to have this new composite.
That's an example of an application that you can also write yourself as a developer. If you have an algorithm that you want to integrate, you can just deploy it on the cloud platform, potentially even with an interface like this, and other people have access to your new invention. If you want to use this, you need to log in on the portal. It's free to use; you create a login when you register, and then you can actually do this. Basically it's a matter of filling in the parameters, and later you will get your product.
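The N-daily compositing idea described above can be sketched in a few lines. The example below uses maximum-value compositing, a common technique for vegetation indices; the talk does not say which algorithm the platform actually applies, so treat the method, the function name, the `None` no-data convention and the sample values as assumptions.

```python
from datetime import date, timedelta

NODATA = None  # assumption: missing or cloud-masked pixels are marked with None


def max_value_composite(daily_images, start, n_days):
    """Combine the daily images in [start, start + n_days) into one image.

    daily_images maps a date to a 2-D list of pixel values (all the same shape).
    Each output pixel is the maximum valid value seen in the window,
    or NODATA when no day in the window had a valid observation.
    """
    window = [start + timedelta(days=i) for i in range(n_days)]
    images = [daily_images[d] for d in window if d in daily_images]
    if not images:
        raise ValueError("no input images in the requested window")
    rows, cols = len(images[0]), len(images[0][0])
    composite = [[NODATA] * cols for _ in range(rows)]
    for image in images:
        for r in range(rows):
            for c in range(cols):
                value = image[r][c]
                if value is NODATA:
                    continue
                if composite[r][c] is NODATA or value > composite[r][c]:
                    composite[r][c] = value
    return composite


# Tiny 2x2 example with made-up NDVI values; 3 July has no image at all.
daily = {
    date(2016, 7, 1): [[0.30, None], [0.10, None]],
    date(2016, 7, 2): [[0.45, 0.20], [None, None]],
    date(2016, 7, 4): [[0.40, 0.35], [0.15, None]],
}
five_day = max_value_composite(daily, date(2016, 7, 1), 5)
print(five_day)
```

Changing `n_days` from 5 to 10 corresponds to the kind of parameter a user would fill in when requesting a composite; a different compositing rule would replace the comparison inside the inner loop.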
The next step, if we go to the remote sensing expert who really needs flexibility: what we provide is a toolbox with a bunch of open source tools available. It's a regular VM in the cloud, like you can get on Amazon, but with a number of tools preinstalled, QGIS, GRASS, everything, and with direct access to the data archive. This will normally be available in autumn; we will try to release it somewhere at the end of September. What is available right now is the toolbox on another cloud with a sample dataset, but it's similar. So here you see that we just preinstalled the tools, so you don't need to worry about that anymore.
Finally, the one application I also really like is the notebooks, which are very popular nowadays as a way to do reproducible research. Instead of writing a paper, or complementary to that, people can write notebooks to publish their results, which are basically just bits of code and text and the results, all in one thing, and it's interactive. Let me just switch to the notebooks.
Here is one example, where we analyzed the time series which we have from the time series viewer, so the precomputed time series for all the world. That's an interesting dataset to work with if you want to do, say, trend analysis, and what you get out of it in the end is that you find regions in the world where there's a clear upward or downward trend. Now I have to add a disclaimer that this was not really done as a published result, peer-reviewed or whatever, so it's just an example of what can be done. For instance, here at the bottom you see that for the downward trend there are just some noisy datasets which probably don't mean much. But you probably also want to see the code, so let me see where I can show that. Here, for instance, we use Spark. This is how you load your data into a distributed dataset, and you can filter it, because the notebook is connected to the cluster, so you can really do powerful processing, also machine learning for instance.
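The notebook code itself is not reproduced in this transcript. As a rough sketch of the kind of trend analysis described above, the following plain-Python example fits a least-squares line to each region's time series and labels the trend. The region names, values and threshold are made up for illustration, and the notebook shown in the talk ran on Spark rather than on local lists.

```python
def linear_trend(values):
    """Least-squares slope of the values against their index (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_t = (n - 1) / 2
    mean_v = sum(values) / n
    numerator = sum((t - mean_t) * (v - mean_v) for t, v in enumerate(values))
    denominator = sum((t - mean_t) ** 2 for t in range(n))
    return numerator / denominator


def classify(values, threshold=0.005):
    """Label a series by its slope; the threshold is an arbitrary assumption."""
    slope = linear_trend(values)
    if slope > threshold:
        return "upward"
    if slope < -threshold:
        return "downward"
    return "no clear trend"


# Hypothetical yearly mean NDVI per region (illustration only).
regions = {
    "region_a": [0.30, 0.32, 0.35, 0.37, 0.41],
    "region_b": [0.50, 0.47, 0.45, 0.41, 0.38],
    "region_c": [0.40, 0.41, 0.39, 0.40, 0.40],
}
trends = {name: classify(series) for name, series in regions.items()}
print(trends)
```

Because `classify` only depends on one region's series, the same function could in principle be mapped over a distributed collection of regions; a slope alone says nothing about significance, which fits the speaker's disclaimer about noisy results.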
The final thing I want to show you is GeoTrellis. This is a Scala notebook, because GeoTrellis is written in Scala, and the Jupyter notebook server also supports Scala kernels. The only thing we did not get to work here was the connection with the cluster, so when I process here, it runs only on the Jupyter server itself. That is because the notebooks application will only be released somewhere around January next year, so this is really a prototype where some things still need to be worked out.
Here I show how you can just load a bunch of open source libraries in your notebook using Maven identifiers; you add them to your classpath. Then you do some Spark initialization. This can all be copy-pasted, so you don't really need to be a real expert in that. And then you can start using the GeoTrellis library. Loading the data is very simple, because we preloaded it into an Accumulo database, which is a key-value store on top of Hadoop, so we benefit from data locality. You don't need to worry about how to open your files and so on; it's just already in there. You write this query where you specify your time of interest and your extent, and then you get this dataset containing all of your tiles. I will post this on the website so that you can have a look at it at a later time.
Here you see that you have these RDDs, which is a Spark concept: basically a distributed collection of something. And finally, what GeoTrellis provides on top of that is a method to compute statistics. So I define my function to compute mean values, and it's basically just one method call, a zonal operation. You need to provide a mask and put it in the right format, so there's some code in between, but the zonal call itself is just one method call to compute stats, so means, for a specific mask over your time series of data.
And then, because it wouldn't be a notebook if we could not plot it, we also always need to have some nice plot. Eventually I ran it on three different datasets: NDVI from two different datasets, including NDVI at 300 meter, and the FAPAR from Copernicus at 1 kilometer. You can see the results, but it's not really about the results here; it's more about trying to show the technology and what the real experts could do with it. Good, I think I need to conclude.
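The Scala notebook is not included in this transcript, and the sketch below does not use the GeoTrellis API. It only illustrates, in plain Python, what a zonal mean over a time series computes: for each date, the mean of the valid pixels that fall inside a mask. All names and values are assumptions made for the example.

```python
def masked_mean(tile, mask):
    """Mean of the valid pixels of `tile` where `mask` is True, or None if there are none."""
    total, count = 0.0, 0
    for tile_row, mask_row in zip(tile, mask):
        for value, inside in zip(tile_row, mask_row):
            if inside and value is not None:
                total += value
                count += 1
    return total / count if count else None


def mean_time_series(tiles_by_date, mask):
    """Apply the same mask to every date, returning {date: mean} in date order."""
    return {d: masked_mean(tile, mask) for d, tile in sorted(tiles_by_date.items())}


# Two made-up 2x2 tiles keyed by ISO date; None marks a missing pixel.
tiles = {
    "2016-07-11": [[0.3, 0.5], [0.7, 0.9]],
    "2016-07-01": [[0.2, 0.4], [0.6, None]],
}
# The mask selects the area of interest (here three of the four pixels).
mask = [[True, True], [False, True]]

series = mean_time_series(tiles, mask)
print(series)
```

In the setup described in the talk, the tiles would come out of the Accumulo query as a distributed collection and the per-date means would be computed on the cluster; the resulting dictionary is the kind of small result that is then handed to a plotting library.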
The development is still ongoing, that's the most important message here. Actually it's still a pre-release, so in autumn there will be the real first release. We are always looking for further collaboration, because the more people are using the platform, the more use cases we can get to make it really work. This is the website. Thank you.
The last one was only on a bit of data. I could have shown you how I actually run this; that works, and it doesn't take so long because I use a very small mask. That's the reason why I wanted the connection with Hadoop to be working. The problem was the combination: getting the Scala kernel of Jupyter to work with GeoTrellis and with Spark. But there are also other notebook alternatives available, Apache Zeppelin for instance, which already integrates Spark and Hadoop directly, so there it works for sure. So we're still looking at what direction to go: will we get this to work eventually, or do we just have to go for another thing? I think the problem was some dependency issue, because if you run it locally it's simple, but if you distribute it you get conflicts if you already have certain classes on the cluster with a different version, and so on. That's the kind of thing you have to fix.
Well, you could do two things, of course. You have the time series data: the extracted time series are available on the platform, so then you're working on the aggregated data, but there you can do statistics between the rainfall and another parameter, the NDVI for instance. The other thing is of course that the rainfall imagery is also available on the platform, and from there you can do whatever you want.
I'm not sure I can answer that; I'm not a remote sensing expert. I wouldn't say no, because there has been some validation, for instance. I think that's all I can say.
What I showed was Scala, but we also have the Python notebooks, because we know a lot of people in the remote sensing community use Python, so we have to support it. But GeoTrellis, for instance, is written in Scala and doesn't have Python bindings, and it's also a very advanced interface. So we're looking at that. Ideally we could provide Python bindings, but that's really something to work out over the coming years. Ideally, from this you could go to some kind of Google Earth Engine-like interface, maybe, if we work a lot on it, with fewer resources of course, and then provide something in a number of languages.
Not all algorithms use the Accumulo database. All the data is on disk in the regular GeoTIFF or HDF5 formats, and specific datasets we load into Accumulo. Because this is a duplication of data, you need resources for that, and you cannot just put any dataset in there; it comes at a cost. Right now we're thinking about putting more of the ten-daily composites in there, because it's very interesting to do interactive time series analysis on data like that in Accumulo, and the ten-dailies are smaller than the daily products.
Yes, I know of it, but I haven't really dug into it. It's kind of similar, and it would be interesting to also try it.
As for reliability, we have had some issues with versions, but I think that with a little bit of work it's quite stable. We've really been working for the past months on getting the Hadoop cluster running, on getting security and everything in place, so there are still issues everywhere.
As for how we will charge for access: right now it's still free, because it's a project we develop for ESA, and we have resources to support a number of users. But it's not Google Earth Engine, and we don't have unlimited resources. At some point, when you start to do a lot of processing, we may start to think about how to set up something where you pay per use, something along that line. It still needs to be figured out what will be offered for free and what has to be paid for. I would say: don't hesitate to use it, because if it's not used at all because people think they might have to pay at some point, it won't get anywhere. The first step is to get users on board. Then we have to think about something sustainable to keep it running; of course somebody has to come up with a bit of money.
To use it, you can just register on the website. Right now what you get when registering is mostly the N-daily compositing application and the toolbox with sample data, so not that much. In autumn we will do our first release. It will be announced on the website, but if you're a user you might also get an e-mail. Then you will get more: the full toolbox with access to the Hadoop cluster, normally, and the notebooks will come at a later point. So it's released in steps.
It's a combination of Saiku, to translate MDX queries into SQL queries, and MonetDB as the SQL database, a column database which is quite fast. Saiku is basically just a business intelligence tool. It's web-based and it has an API, so it could offer more flexibility, although I didn't really think of doing that just yet. What we also do is put the data on the Hadoop platform in Parquet file format, an efficient file format which you can also process easily using Spark. So then you can just work on that, with very fast access to all the time series, and do whatever you want.
The N-daily compositing application is a WPS application, but the idea is also to use WPS more in the future for all the interfaces. When a developer implements his own algorithm and wants to expose it, we would like to expose it through the WPS standard, maybe build an aggregated WPS which combines all of the algorithms that have been developed, and make it very easy for developers to register their new algorithm, so that they don't have to worry too much themselves about how to expose it as a WPS.
OK, thank you. Thank you.
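Since the last answer mentions exposing algorithms through the WPS standard, here is a hedged sketch of how a client could build a WPS 1.0.0 key-value-pair Execute request for a compositing process. The endpoint URL, the process identifier and the input names are hypothetical; the platform's real process names are not given in the talk.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def wps_execute_url(endpoint, identifier, inputs):
    """Build a WPS 1.0.0 KVP Execute URL; inputs are joined as name=value;name=value."""
    data_inputs = ";".join(f"{name}={value}" for name, value in inputs.items())
    query = urlencode({
        "service": "WPS",
        "version": "1.0.0",
        "request": "Execute",
        "identifier": identifier,
        "DataInputs": data_inputs,  # urlencode percent-encodes the '=' and ';' inside
    })
    return f"{endpoint}?{query}"


# Hypothetical endpoint, process name and parameters (illustration only).
url = wps_execute_url(
    "https://example.org/wps",
    "ndaily_composite",
    {"days": 5, "start": "2016-07-01"},
)
print(url)

# Decoding the query string again shows what the server would receive.
params = parse_qs(urlsplit(url).query)
print(params["DataInputs"][0])
```

An aggregated WPS as described in the answer would let a client list the available algorithms with a GetCapabilities request first and then call any of them with an Execute request of this shape.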

