GRASS GIS 7: Efficiently processing big geospatial data

Video in TIB AV-Portal: GRASS GIS 7: Efficiently processing big geospatial data

Formal Metadata

GRASS GIS 7: Efficiently processing big geospatial data
Alternative Title
Geospatial - Grass 7
Title of Series
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Release Date
Production Year

... whatever is of interest for you. There is also a graphical representation. In this plot you see point data, and then continuous but interrupted time series and so forth. You get something out from which you can see whether a time series is complete or not, and this is particularly interesting if you are dealing with millions of points, for example, or a long time series. This plot here shows chlorophyll versus time. These are MODIS observations from the southern hemisphere, which have been analyzed to see how the chlorophyll evolves there over the various years. Next I will show the MODIS land surface temperature reconstruction we have been doing. Before doing so, a few words about visualization.
Something you have already seen is the new animation tool included in GRASS GIS 7. On this slide you see a time series that has been animated. It comes from multi-annual time series observations made along the coastline in North Carolina. These data are publicly available. In Portland last year we gave a workshop on that, and you can search for this data set and do the exercises yourself, so it is pretty easy to get something like this. You can see how this sand dune is moving over time because of sand transport by wind, and you also see houses being built or even disappearing, because they were probably destroyed by some bad weather event, and so forth.
Another point is multi-temporal data analysis. This is the tsunami event in Japan in 2011. For disaster management you can see the situation before and after the event: you have the slider and can visually compare what happened, to get an idea about the impact in a graphical way.
As already shown, looking into a volume is not so easy, so there are two options. One is to make slices in any direction, which you can see over there. The other option is to get a semi-transparent visualization of your volume content, and you can then move around.
And if you have this kind of data, like here from North Carolina, where much of this part has been developed, you can get a coastline visualization that really looks realistic.
OK, connecting to other software, which is also quite of interest.
GRASS has been added to the Processing toolbox. I don't want to go into detail, many of you know all that. We have updated the toolbox for GRASS 7, so I think in the next release there will be a GRASS 7 entry as well, so that you can go for that.
Published only, I think, 24 hours ago is spgrass7, the extension for R. So you can now directly connect GRASS and R as before, but now with the new GRASS 7 version. I just made a plot of elevation versus geological classes: you get your data over into the R space (raster and vector are both supported), you draw a box plot, and that is it. Like this you can really do sophisticated statistics in no time.
New in 7 is the WPS support. If you want to define WPS processes, there are different software packages supporting GRASS 7: ZOO-Project, PyWPS and 52°North all come with GRASS providers. Maybe there are more, I don't know. The interesting part is that each command can describe itself in this XML style here, and this also applies to your own scripts. So if you write a script and make use of this kind of parser, for example by just cloning an existing script, which is pretty easy to set up, it will also deliver this kind of information, which you can then integrate in your workflow.
This is just a quick view of what you can do. You don't always want to import your data, because this duplicates the disk space occupation: if you have something like 1 terabyte and you import it, you get another terabyte of space consumption, which is not that fun. For this there is a command called r.external, and v.external as well. You just register the external data set, which could be a GeoTIFF or whatever, in your GRASS location, which you can also create automatically from the original data set. Then you define that as output you want GeoTIFF. At this point you say: the original data set is there, you don't import it, you just tell GRASS where it is, and you tell it that everything which is calculated will be saved as GeoTIFF in this case. Then you do your computation, and you can see here that I put the file ending on the map name: the new map is equal to some function, and it will immediately appear as a GeoTIFF in the directory specified here. So you don't bother anymore with import and export. This is especially interesting for WPS: the data just go through, you get your GeoTIFF or whatever format you prefer, then you cease the connection and can make use of it.
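The import-free workflow described here can be sketched as a short command sequence. This is a minimal sketch, not the commands from the slides: the module names r.external, r.external.out, g.region and r.mapcalc are real GRASS GIS 7 modules, but the helper function, the file paths, the map name `elevation` and the expression are hypothetical, and the helper only builds the argument lists without running anything.

```python
def external_workflow(src_tif, out_dir, expression):
    """Return GRASS command lines for an import-free raster workflow.

    Hypothetical helper for illustration: it builds argument lists only
    and does not execute any GRASS module.
    """
    return [
        # Register the external file as a raster map without copying it.
        ["r.external", f"input={src_tif}", "output=elevation"],
        # Write all newly computed raster maps as GeoTIFF files in out_dir.
        ["r.external.out", f"directory={out_dir}", "format=GTiff"],
        # Match the computational region to the registered map.
        ["g.region", "raster=elevation"],
        # The result map is written directly as a GeoTIFF in out_dir.
        ["r.mapcalc", f"expression={expression}"],
        # Cease the GDAL output link and return to native GRASS output.
        ["r.external.out", "-r"],
    ]


# Assumed example values; the ".tif" ending is part of the output map name.
commands = external_workflow(
    "/data/elevation.tif", "/data/out", "result.tif = elevation * 2"
)
for cmd in commands:
    print(" ".join(cmd))
```

Inside a running GRASS session, each list could be passed to `subprocess.run(cmd, check=True)`; outside a session the modules are not on the path, which is why the sketch stops at printing.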
So, programming: there is the new Python API, which I don't show here because it is being shown in the next talk, so just stay in the room. You can also read the slides later on from the website, I think.
And now some words about the massive data support. What is massive? Just quickly: the limiting factors are probably memory, and processing time can be a limiting factor if you have lots of data. Disk space is something which nowadays I would consider solved, in a period of terabytes and going toward petabytes maybe, and large file support is also no longer an issue. What is an issue, and what can be solved in the software itself (this applies generally to any software, obviously), is to make it faster. Here is an example of a query: how much time it takes if you increase the number of points. For example, you have a LiDAR point cloud and you want to query something within 10 million points; it should be fast. You can see the difference between 6 and 7: it is really fast now, and this is due to the new format which has been implemented. So the GRASS vector engine has been quite improved, and you can also easily upgrade from one format to the other. Then there is the computational time in the raster world: the cost surface calculation in GRASS 6 showed a nonlinear growth in time consumption, which has been turned into a linear problem, and this is something quite better. You see my small laptop here, so this is nothing fancy, but I can do a PCA, that is a principal component analysis, of 30 million points in, what did I write, 6 seconds on this machine. So try this in some other software, and I think it will take a little bit more time.
So what have we done? We have been using MODIS land surface temperature data, which is an example of a large data set: the entire archive of MODIS land surface temperature. (If you want to just move over there to see more, no problem for me.) So this is Europe; you can hardly see it. This is one particular overpass. It has been cloud-contaminated, and what we wanted to do is to reconstruct the values which are not there. This is a fairly complex algorithm which we have published in this paper here. From there to there, everything is done in multiple steps: outlier detection, multiple regression (multiple regression is also in 7), and you eventually get out this map. It looks like magic how we come from here to there, but what we do is that we do not consider only the single map: we look back and forth in time, with weights. The closer we are to the observation itself, the more weight we give, and the further away we go, the less. So maybe the day before or the day after there were no clouds in this particular pixel. We also assume, of course, that the season does not change abruptly, which is naturally to be considered here.
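The idea of looking back and forth in time with distance-dependent weights can be illustrated with a toy example. This is only a sketch under stated assumptions, not the published reconstruction algorithm, which also involves outlier detection and multiple regression: the function name `fill_gap`, the inverse-distance weight `1 / |offset|` and the sample values are all hypothetical.

```python
def fill_gap(series, index, window=3):
    """Estimate a missing value from temporally neighbouring observations.

    Neighbours closer in time get a larger weight (assumed here to be
    1 / |offset|); missing neighbours (None) are skipped.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for offset in range(-window, window + 1):
        j = index + offset
        if offset == 0 or j < 0 or j >= len(series):
            continue
        value = series[j]
        if value is None:  # this neighbour is cloud-covered as well
            continue
        weight = 1.0 / abs(offset)
        weighted_sum += weight * value
        weight_total += weight
    if weight_total == 0.0:
        return None  # no usable observation inside the window
    return weighted_sum / weight_total


# One pixel over five days; day 2 is cloud-contaminated (None).
lst = [10.0, 12.0, None, 14.0, 20.0]
estimate = fill_gap(lst, 2, window=2)
print(estimate)
```

The days directly before and after the gap dominate the estimate, while observations two days away contribute half as much.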
So this is an example: one map out of 17,000 maps at the time. We have been processing the entire archive covering Europe. Each map has something like 450 million pixels, and to calculate this map we have nine different input maps, so multiplying this by nine we are at a bit more than 4 billion pixels at this point. This is something which you can now easily do in GRASS 7; in 6 you could not, so in 7 you can do that. And this is now the animation of monthly averages computed from the 17,000 maps.
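The pixel counts quoted above can be checked with a few lines of arithmetic. The figures are the rounded numbers from the talk, so the results are orders of magnitude rather than exact counts; the variable names are only illustrative.

```python
pixels_per_map = 450_000_000  # "something like 450 million pixels" per map
input_maps = 9                # input maps needed to calculate one output map
total_maps = 17_000           # maps in the processed archive

# Pixel values touched when calculating a single reconstructed map.
pixels_per_job = pixels_per_map * input_maps

# Output pixels over the whole archive, ignoring the input maps.
pixels_in_archive = pixels_per_map * total_maps

print(f"{pixels_per_job:,} pixel values per reconstructed map")
print(f"{pixels_in_archive:,} output pixels across the archive")
```

The first figure is the "4 billion" mentioned in the talk; the second shows why the complete run is a cluster job rather than a laptop job.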
Including the averaged data, this is approximately 20 terabytes which we have been generating. We used our cluster for this. Due to lack of time I won't speak about the technical stuff too much, but just to give you an idea:
this is what we have been setting up, and I would be happy to discuss it if you are doing similar things. Yesterday there was a talk about the Gluster file system; we are also using this file system here, with small low-cost boxes, each of which contains four hard disks of 3 terabytes. This part here is already something like 96 terabytes of storage for the raw data. Then we have the chassis here, connected to the front-end node, and using a job manager we are doing the computation here. We also have two high-speed devices for the GRASS data management and so on. If you are interested, we can discuss this.
Some big data challenges. This is something I have been doing for many years now. We always had the problem of saturating connections, like saturating the Internet connection, for example, and tuning the internal TCP protocol for that. Then we exceeded the ext3 specifications, so we switched to XFS; then we exceeded the XFS specifications, and so forth. The more data you get, the more this is naturally a problem, also for the data center. Now, data processing: I think things have changed a bit. Ten years ago, maybe 15, you were looking for data; now we are almost overwhelmed by data, which is a nice problem, but we need to get our hardware and software right and so on. This was a nice benchmark for us in order to see if GRASS can handle this kind of data, and we would say we can do so now. I already mentioned the issue when you run the computations in parallel: you have something like 4 billion points in one job, and when you then launch, let's say, a few of them in parallel, you really know if your I/O works.
So where is the stuff? Everything is ready to use. We are currently at release candidate number 1, so probably, what is today, Sunday, in two or three days we will release the next release candidate, and this is hopefully also the last one. You get free sample data to play with, also the time series which I have already mentioned, including a tutorial by the way, so that you can easily explore climate data analysis or LiDAR time series, which of course everybody has at home. You can just download it from there
and find out about the new features on this dedicated page, which is also linked everywhere. You are welcome to test it out, and if you are a user of GRASS 6, please consider upgrading rather sooner than later. Thank you.