GRASS GIS 7: Efficiently processing big geospatial data
Formal Metadata
Title 
GRASS GIS 7: Efficiently processing big geospatial data

Alternative Title 
Geospatial  Grass 7

Title of Series  
Author 

License 
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor. 
Identifiers 

Publisher 

Release Date 
2016

Language 
English

Production Year 
2015

00:05
...whatever is of interest for you. There is also a graphical representation: in this plot you see point data, and then continuous but interrupted time series and so forth. You get something out, and you can see whether your time series is complete or not; this is particularly interesting if you are dealing with millions of points, for example, or a long time series. This plot here shows chlorophyll versus time. These are MODIS observations in the southern hemisphere, and they have been analyzed to see how the chlorophyll evolves there over the various years. And this is a plot I have done, which I will show next, about the MODIS land surface temperature reconstruction we have been doing. Before doing so, a few words about visualization:
01:03
something you have already seen: there is an animation tool included in GRASS GIS 7. On this slide you see a time series that has been animated; it comes from multi-annual time series observations being done in North Carolina at the coastline. These data are publicly available; in Portland last year we gave a workshop on that, and you can search for this dataset and do the exercises yourself. So it's pretty easy to get something like this, and you can see how this sand dune is moving over time because it is transported, as sand is transported by wind. You also see houses being built and even disappearing, because they were probably destroyed by some bad weather event, and so forth. Then, for
01:55
another point, multi-temporal data analysis: this is the tsunami event in Japan in 2011. For disaster management you can see before and after the event; you have the slider and can visually compare what happened, to get an idea about the impact in a graphical way. Williams had already shown how
02:18
to look into a volume, which is not so easy. There are two options: one is to make slices in any direction, which you can see over there; another option is to get a semi-transparent visualization of your volume content. This is another possibility, and you can then move around. And if you have this kind of data,
02:38
like here in North Carolina, where much of this has been developed, you can get a coastline visualization that really looks something like realistic. OK, connecting to other
02:59
software, which is also quite of interest:
03:02
GRASS has been added to the QGIS Processing toolbox. I don't want to go into detail; many of you know that. We have updated the toolbox for GRASS 7, so I think in the next release there will be a GRASS 7 entry as well, so that you can go
03:18
for that. Published only, I think, 24 hours ago: the GRASS 7 extension for R, so you can now directly connect GRASS and R as before, but now with the new GRASS 7 version. I just made some plot, elevation versus geological classes: you get your data over into the R space (raster and vector are both supported), you draw a boxplot, and it's already done. Like this you can really do sophisticated statistics in no time.
New in 7 is the WPS support. If you want to define WPS processes, there are different software packages supporting GRASS 7: ZOO-Project, PyWPS, 52°North; not all of them come with GRASS providers, and maybe there are more, I don't know. The interesting part is that each command can describe itself in this XML style here, and this also applies to your own scripts: if you write a script and make use of this kind of parser (you just clone an existing script, it's pretty easy to set up), it will also deliver this kind of information, which you can then integrate in your workflow.
This is just a quick view of what you can do. You don't really want to import your data always, because this also duplicates disk space occupation: if you have something like 1 terabyte and you import it, you get another terabyte of space consumption, and that is not fun. For this there is a command called r.external, and v.external as well: you just register the external dataset, which could be a GeoTIFF or whatever, in your GRASS location, which you can automatically create from the original dataset as well. Then you define the output you want, which means at this point you say: the original dataset is there, you don't import it, you just tell GRASS where it is, and everything which is calculated will be saved as GeoTIFF in this case. Then you do your computation, and you can see here I put the file ending: the new map is equal to some function, and this will immediately appear as a GeoTIFF in the directory specified here. So you don't bother anymore with import and export. This is especially interesting for WPS: you just go through and get your GeoTIFF or whatever format you prefer, and then you have the connection and can make use of it.
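The link-instead-of-import workflow described here can be sketched roughly as follows. This is only an illustration under assumptions: the file paths, map names and the expression are made up, and the function merely builds the command lines (r.external, r.external.out, r.mapcalc) rather than running them, so it can be tried without a GRASS session.

```python
import shlex


def external_workflow(src_tif, linked_name, out_dir, out_map, expression):
    """Return GRASS commands as argument lists; nothing is executed here.

    All paths and map names passed in are hypothetical examples.
    """
    return [
        # Register the external file in the current location without copying it.
        ["r.external", f"input={src_tif}", f"output={linked_name}"],
        # Ask GRASS to write every new raster map as GeoTIFF into out_dir.
        ["r.external.out", f"directory={out_dir}", "format=GTiff"],
        # The map name carries the file ending, so the result shows up as a .tif file.
        ["r.mapcalc", f"expression={out_map}.tif = {expression}"],
        # Revert to the native GRASS raster format afterwards.
        ["r.external.out", "-r"],
    ]


if __name__ == "__main__":
    commands = external_workflow(
        "/data/elevation.tif", "elevation", "/data/results",
        "elevation_km", "elevation / 1000.0",
    )
    for cmd in commands:
        print(shlex.join(cmd))
```

Inside a running GRASS session, the printed lines could be pasted into the shell; whether the output name needs the explicit ending depends on how r.external.out is configured, so treat that detail as an assumption.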
06:04
Then, for programming, there is the new Python API, which I don't show here because it is being shown in the next talk; so just stay in the room, and I think you can also read the slides later on from the website. And now
06:20
some words about the massive data support. What is massive? Just quickly, you all know this probably: limiting factors are memory, and a limiting factor can be processing time if you have lots of data. Disk space is something which nowadays I would consider more or less solved, in a period of terabytes and going toward petabytes maybe, and large file support is also no longer an issue. But what is an issue, and what can be solved in the software itself (this applies generally to any software, obviously), is to make it faster. Here is an example of a query: how much time it takes if you increase the number of points. For example, you have a LiDAR point cloud and you want to query something within 10 million points, and it should be fast. You can see the difference between 6 and 7: it is really fast, and this is due to a new format which has been implemented. So the GRASS vector engine has been quite improved, and you can also easily operate between both formats. Then there is the computational time in the raster world: the cost surface calculation in GRASS 6 had a nonlinear growth of time consumption, which has been turned into a linear problem, and this is something quite better. You see my small laptop here, this is nothing fancy, but I can do a PCA, that is the principal component analysis, of 30 million points in (what did I write?) 6 seconds on this machine. So try this in some other software, and I think it would take a little bit more time.
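For readers who have not met PCA before, the following is a minimal sketch of what a principal component analysis computes for two bands. It is a plain-Python illustration with a closed-form solution for the two-band case only; it is not the GRASS implementation, and the function name is made up.

```python
import math


def pca_two_bands(band_a, band_b):
    """Return (eigenvalues, first_axis) for the covariance matrix of two bands."""
    n = len(band_a)
    mean_a = sum(band_a) / n
    mean_b = sum(band_b) / n
    var_a = sum((a - mean_a) ** 2 for a in band_a) / (n - 1)
    var_b = sum((b - mean_b) ** 2 for b in band_b) / (n - 1)
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(band_a, band_b)) / (n - 1)
    # Closed-form eigenvalues of the symmetric matrix [[var_a, cov], [cov, var_b]].
    half_trace = (var_a + var_b) / 2.0
    spread = math.hypot((var_a - var_b) / 2.0, cov)
    eigenvalues = (half_trace + spread, half_trace - spread)
    # Unit vector pointing along the first principal component.
    angle = 0.5 * math.atan2(2.0 * cov, var_a - var_b)
    return eigenvalues, (math.cos(angle), math.sin(angle))


if __name__ == "__main__":
    values, axis = pca_two_bands([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
    print(values, axis)
```

With perfectly correlated bands, as in the usage lines, all variance ends up in the first component and the second eigenvalue is zero up to rounding.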
08:07
So what have we done? We have been using MODIS land surface temperature data; this is a known example of a large dataset. This is the land surface temperature of MODIS, and if you want to move over there to see more, no problem for me. So this is Europe, you can hardly see it. This is one particular overpass; it has been cloud contaminated, and what we wanted to do is reconstruct the values which are not there. This is a fairly complex algorithm which we have published in this paper here, and from there to there everything is done in multiple steps: outlier detection, multiple regression (multiple regression is also in 7), and you eventually get out this map. It looks like magic how we come from here to there, but we do not consider only the single map: we look back and forth in time and apply weights. The closer we are to the observation itself, the more weight we give; the further away we go, the less. So maybe the day before or the day after, there were no clouds in this particular pixel. And of course we also take into account that the seasons change, which naturally has to be considered here.
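The idea of looking back and forth in time with distance-dependent weights can be sketched for a single pixel as below. This is a toy illustration, not the published reconstruction algorithm: the 1/lag weighting, the `max_lag` window and the function name are assumptions chosen for brevity, and the outlier detection, regression and seasonality steps are left out entirely.

```python
def fill_gap(series, index, max_lag=3):
    """Estimate a missing value from neighbouring days of one pixel.

    series: list of floats; None marks a missing (e.g. cloud-covered) day.
    Nearer days get more weight (1/lag), which is an illustrative choice.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for lag in range(1, max_lag + 1):
        for neighbour in (index - lag, index + lag):
            if 0 <= neighbour < len(series) and series[neighbour] is not None:
                weight = 1.0 / lag
                weighted_sum += weight * series[neighbour]
                weight_total += weight
    if weight_total == 0.0:
        return None  # no valid observation inside the window
    return weighted_sum / weight_total


if __name__ == "__main__":
    temperatures = [10.0, None, 14.0, None, None]
    print(fill_gap(temperatures, 1))
```

In the usage lines the missing day sits between two valid days at equal distance, so the estimate is simply their mean.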
09:37
So this is an example: one map out of 17,000 maps at the time. We have been processing the entire archive covering Europe. Each map has something like 450 million pixels, and to reconstruct, let's say calculate, this map we have 9 different input maps, so we are multiplying this by 9: we are close to, more or less, 4 billion pixels at this point. This is something which you can now easily do in GRASS 7; so in 7 you can do that. And this is now the animation of monthly averages out of the 17,000 maps. So this is approximately, including the averaged data,
10:22
20 terabytes, which we have been generating. We used our cluster for this. Due to lack of time I won't speak about the technical stuff too much, but just to
10:33
give you an idea, this is what we have been setting up, and I would be happy to discuss this if you are doing similar things. Yesterday there was a talk about the GlusterFS file system; we are also using the Gluster file system here, having small low-cost boxes, each of which contains four hard disks of 3 terabytes. This part here is already something like 96 terabytes for the raw data storage. Then we have all the chassis here connected to the front-end node, and using a job manager we are doing the computation here. And then we have two high-speed devices as well for the GRASS data management and so on. So if you are interested, we can discuss
11:19
some big data challenges. This is something I have been doing for many years meanwhile. We always had the problem of saturating connections, like saturating the internal network connection for example, and tuning the internal TCP protocol for that. Then we exceeded the ext3 specifications, so we switched to XFS; then we exceeded the XFS specifications, and so forth. So this is something you run into the more data you get, and this is naturally a problem also for the new Sentinel data processing. I think things have changed a bit: 10 years ago, maybe 15, we were looking for data; now we are almost flooded by data, which is a nice problem, but we need to get our hardware and software right. This was a nice benchmark for us in order to see if GRASS can handle this kind of data, and we would say we can do so now. As I already mentioned, we then run the computations in parallel: you have something like 4 billion pixels in one job, but then you launch, let's say, a few of them in parallel, and then you really know if your I/O works.
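Launching a few jobs in parallel, as mentioned here, can be sketched on a single machine like this. It is only a local stand-in: the talk describes a cluster with a job manager, whereas this example uses Python's standard thread pool, and `process_tile` is a hypothetical placeholder for one real job that would read and write maps.

```python
from concurrent.futures import ThreadPoolExecutor


def process_tile(tile_id):
    """Hypothetical stand-in for one job; returns the tile id and a dummy result."""
    return tile_id, sum(range(tile_id * 1000))


def run_parallel(tile_ids, workers=4):
    """Run several jobs concurrently and collect the results by tile id."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(process_tile, tile_ids))


if __name__ == "__main__":
    print(run_parallel([1, 2, 3]))
```

For jobs that are dominated by disk reads and writes, running several at once is exactly what exposes whether the storage and network keep up, which is the point made in the talk.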
12:39
Where is the stuff? Everything is ready to use. We are currently at release candidate number 1, so probably next (what is today? Sunday), so in 2 or 3 days we will release the next release candidate, and this is hopefully also the last one. You get free sample data to play with, also the time series which I have already mentioned, so that you can easily explore, including a tutorial by the way, climate data analysis or LiDAR time series, which not everybody has at home. You can just download from
13:14
there and figure out the new features on this dedicated page, which is also linked everywhere around. You are welcome to test it out if you don't do so already, and if you are a user of GRASS 6, consider upgrading rather sooner than later. Thank you.