The Open Data Cube in a Box

Video in TIB AV-Portal: The Open Data Cube in a Box

Formal Metadata

The Open Data Cube in a Box
Title of Series
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Release Date

Well, thanks Adam. Good morning, folks. I'm here today to talk about the Open Data Cube and some of the work that we've been doing at FrontierSI to make it easier to use. I'll start off by answering a question: what is the Open Data Cube? We have a long definition, which comes off the website, and I'll read that out here: the objective of the Open Data Cube is to increase the impact of satellite data by providing an open and freely accessible exploitation tool, and to foster a community to develop, sustain and grow the breadth and depth of applications. My nutshell explanation of what the Open Data Cube is, from a technical perspective, is that it's an open source Python library that facilitates working with large volumes of raster data.
So which data does it facilitate working with? Well, Earth observation data. Radiant Earth have a nice graph here showing the number of satellites that have been launched to capture Earth observation data. Last year there were twice as many launched as the previous year, and that was pretty much twice as many launches as the year before that, and it's fairly likely that there's going to be an enormous number of satellites launched this year and next year. The point is that it looks exponential in nature: it's a trend that's going up. There's a lot of data pouring down from space now, and there's going to be a lot more. What that means is that we've got more data and we need to be able to access it more easily, and there are a lot of different ways in which access to that data is getting easier.
One of the things that makes it easier is an emerging paradigm, which is storing data in an object store like Amazon's S3, or the equivalents in the Google cloud or the Alibaba cloud. It relies on a part of the HTTP protocol, the GET range request, which allows you to ask a web server for a file and ask for just part of that file, so you don't have to download the whole thing. Some clever people extended GeoTIFF in a backwards-compatible way, adding a little bit more to the headers and to the structure of the data internally, so that you can go and access the parts of GeoTIFFs that you want while they stay stored up on the cloud.
As we all know, the cloud means that all that complexity becomes somebody else's problem, which, as someone who implements technical systems, is really nice when you can offload some of that. So COGs [Cloud Optimized GeoTIFFs] are a great way to store enormous volumes of data somewhere, which means that I don't have to manage it; someone else is managing it, and I can go and access it. So how can we use COGs?
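
The range-request mechanism described above can be sketched with nothing but the Python standard library. This is a minimal illustration, not how any particular COG reader is implemented: the URL is a made-up placeholder, `range_request` is a hypothetical helper name, and the request is only built, never sent. A server that honours the header answers with status 206 (Partial Content) and just the requested bytes.

```python
from urllib.request import Request


def range_request(url: str, offset: int, length: int) -> Request:
    """Build (but do not send) a GET request for `length` bytes starting at `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    last_byte = offset + length - 1  # the end of an HTTP byte range is inclusive
    return Request(url, headers={"Range": f"bytes={offset}-{last_byte}"})


# Hypothetical URL: ask only for the first 16 KiB, where a reader would
# expect to find the header of a cloud-optimized file.
req = range_request("https://example.com/scene.tif", 0, 16384)
print(req.get_method(), req.get_header("Range"))
```

A reader built on this idea first fetches the header, learns where each internal tile lives, and then issues further range requests only for the tiles that cover the area of interest.
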
Well, QGIS 3 comes with support out of the box, so you can go to S3, to somewhere like Digital Earth Australia's data store, have a look at all of their TIFFs, find one of their TIFFs, copy the link, paste it into QGIS, and start using it. For example, a week or two ago I spoke to a fellow from Queensland who stores a 600-gigabyte TIFF on S3 and just browses it, streaming it off the net into QGIS. So it's a way of storing very large bits of raster data and just accessing some of it.
Some good news for us is that COGs are supported natively in the Open Data Cube. Very convenient. So, back to the question: what is the Open Data Cube? Well, a little bit of history.
A long time ago, Landsat satellites were launched, and they didn't have a lot of onboard storage, so they used to dump data down wherever was convenient. One of the convenient spots on the other side of the world from the US is in [unclear: possibly Alice Springs], where there's an antenna, or a satellite dish, whatever you call it, and so Australia was a data downlink station for a bunch of data. That got sent over to Canberra somehow, and got sent back to the US somehow, and in Canberra the folks put it onto tapes and stored it in some kind of deep repository. There was a project in the less distant past called Unlocking the Landsat Archive, where these tapes were digitized and the data stored at NCI, on the supercomputer, and as part of that project a bit of software was written called the Australian Geoscience Data Cube, which was used to make that data easier to access. At one point that was rewritten so that it worked not only with Landsat data but with other data sources, and with projections other than, I think, Australian Albers, which was used for that, and it was called the AGDC version 2. Very creative. Soon after that it was renamed to the Open Data Cube, and that's where we're at now. So it's a software library that enables access to vast quantities of raster data.
Technically, it comes in a few components. We have the Open Data Cube Python library itself. There's a Postgres database, which contains an index pointing at the actual data; whether that's on a filesystem that you can access or in one of these object stores is up to you. After that, you build applications on top of that, whether that's a data science kind of application like a Jupyter notebook where you're exploring the data, or some spatial web services. Geoscience Australia's Digital Earth Australia folks built a WMS system on top of the Open Data Cube, serving data directly out of S3 into things like the National Map or whatever you want. It's a Python library, so you can do whatever you want; you can build your own bespoke tool. So that's what it is now.
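
The idea of an index that only points at data, wherever it lives, can be illustrated with a toy in-memory version. This is a conceptual sketch, not the Open Data Cube's schema or API: `DatasetRecord`, `find_datasets`, the product name and every URI below are invented for illustration, and a plain Python list stands in for the Postgres database.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple

BBox = Tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat)


@dataclass(frozen=True)
class DatasetRecord:
    """One index entry: metadata plus a pointer to where the pixels live."""
    product: str
    acquired: date
    bbox: BBox
    uri: str  # object-store URL or filesystem path; the index does not care


def overlaps(a: BBox, b: BBox) -> bool:
    """True when two lon/lat bounding boxes intersect (touching edges count)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def find_datasets(index: List[DatasetRecord], product: str, bbox: BBox,
                  start: date, end: date) -> List[str]:
    """Return the URIs of matching datasets without reading any pixel data."""
    return [
        rec.uri
        for rec in index
        if rec.product == product
        and start <= rec.acquired <= end
        and overlaps(rec.bbox, bbox)
    ]


# Made-up records: two in an object store, one on a local filesystem.
INDEX = [
    DatasetRecord("example_product", date(2018, 1, 5),
                  (146.0, -43.0, 148.0, -41.0), "s3://example-bucket/scene-a.tif"),
    DatasetRecord("example_product", date(2018, 6, 9),
                  (150.0, -35.0, 152.0, -33.0), "s3://example-bucket/scene-b.tif"),
    DatasetRecord("example_product", date(2017, 3, 1),
                  (146.0, -43.0, 148.0, -41.0), "/data/local/scene-c.tif"),
]

hits = find_datasets(INDEX, "example_product", (147.0, -42.5, 147.5, -42.0),
                     date(2018, 1, 1), date(2018, 12, 31))
print(hits)  # ['s3://example-bucket/scene-a.tif']
```

Only the matching URIs come back, so the expensive step of reading pixels can be deferred until an application actually needs them.
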
How can we use it? Well, we've been doing a little bit of work at FrontierSI to make it easier, just to sort of illustrate how you can implement it and how you can use it.
One of the things that we've put together is what I think we're calling the sandbox. We have a test environment there which we've launched for Australian data, and we also have another one in testing which we've launched on a global data set, using JupyterHub, which enables a multi-user environment for Jupyter. We've indexed data off S3, which means that people like myself and Felix over there, who maintain the system, don't have to wrangle terabytes and terabytes of data. Somebody else does that, somebody like Andrew over here. Thanks, Andrew. And it means that people can turn up on a website, log into this system, and start exploring vast quantities of satellite imagery very easily.
We've also put together a thing called the Cube in a Box, the title of this talk, which is a Docker and Docker Compose workspace that's up on GitHub. You can download the code, and if you've got Docker and Docker Compose installed, in about five minutes you can launch your own little Open Data Cube environment, including indexing data off a Landsat 8 global archive. You can include a bounding box, index any data from anywhere in the world, and be up and running and exploring that data really easily. We also have a template for CloudFormation, so if you've got an AWS account you can click a button and launch your own Open Data Cube up on AWS. Again, from clicking the button, deploying an EC2 instance, indexing data and giving yourself an environment takes around seven minutes.
This here is an example of how long it takes to launch the Open Data Cube. The point of all of this is to take the learning curve of the Open Data Cube, which is fairly steep, and try to make it shallower, so you can get up and running with an environment without having to understand everything. The Cube in a Box is an infrastructure-as-code project, which means that all of the code that's required to get this magic happening is there inside the box. So maybe it's a black box, but you can open it up, have a look, and start learning and reading the scripts that are used to get it all working. As I said, up in the cloud it doesn't take that long, and you can get yourself a scalable environment that gives you access to all the data. So what can we do with the Open Data Cube?
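
Once an environment like the ones above is running, a first query through the Python library might look roughly like the sketch below. It assumes the `datacube` package is installed and an index is reachable; the product name, coordinates, dates, app name and the helper names `build_query` and `load` are illustrative assumptions, not values taken from the talk. The import is done lazily inside `load` so that the query-building part runs anywhere.

```python
from typing import Any, Dict, Tuple


def build_query(product: str,
                lon_range: Tuple[float, float],
                lat_range: Tuple[float, float],
                time_range: Tuple[str, str]) -> Dict[str, Any]:
    """Assemble keyword arguments for a datacube load call."""
    return {
        "product": product,          # must match a product defined in your index
        "x": lon_range,              # longitude range (EPSG:4326 by default)
        "y": lat_range,              # latitude range
        "time": time_range,          # inclusive date range as strings
        "output_crs": "EPSG:3577",   # Australian Albers, mentioned in the talk
        "resolution": (-30, 30),     # assumed 30 m pixels (y, x)
    }


def load(query: Dict[str, Any]):
    """Run the query; requires the datacube library and a configured index."""
    import datacube  # imported lazily so build_query works without it

    dc = datacube.Datacube(app="cube-in-a-box-demo")  # app name is arbitrary
    return dc.load(**query)  # returns an xarray.Dataset


# Hypothetical product name and area of interest.
query = build_query("ls8_level1_usgs", (146.8, 147.0), (-42.1, -41.9),
                    ("2018-01-01", "2018-12-31"))
print(query)
```

In a running environment, `load(query)` would return the matching data as an xarray dataset with `time`, `y` and `x` dimensions, which is the usual starting point for notebook analysis.
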
Well, there are some simple case studies, like a mining example. This is a real example of data from Digital Earth Australia. Here we have a mine site: on the right-hand side we've got an area which is a mine site being rehabilitated, and on the left side we've got an area of native vegetation.
We can quickly investigate that area, do a trend over time, and have a look at how things like bare earth are changing. This is out of a data set called fractional cover, which is a classification of pixels into the various clusters. In this case the mine site, at the top, is the blue line, so there's more bare earth there, and over time it's getting closer to the orange line, which is the forested area. So what we're showing in that graph is that the mine site actually is being rehabilitated: there's less bare earth, so it's being revegetated. We can use something like that to monitor a mining company's obligations in terms of rehabilitating a site.
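
The "trend over time" in the mining example can be approximated with an ordinary least-squares slope. The sketch below uses invented yearly bare-earth percentages purely for illustration; they are not the Digital Earth Australia values shown in the talk, and `linear_trend` is a hypothetical helper rather than part of any library.

```python
from typing import Sequence


def linear_trend(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Least-squares slope of ys against xs (change in y per unit of x)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long series with at least two points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Invented illustrative series: mean bare-earth fraction (%) per year.
years = [2013, 2014, 2015, 2016, 2017, 2018]
mine_site = [60.0, 55.0, 48.0, 40.0, 35.0, 30.0]   # area being rehabilitated
native_veg = [12.0, 11.0, 13.0, 12.0, 11.0, 12.0]  # reference area

mine_slope = linear_trend(years, mine_site)
native_slope = linear_trend(years, native_veg)
print(f"mine site: {mine_slope:+.2f} % per year")
print(f"native vegetation: {native_slope:+.2f} % per year")
```

A clearly negative slope for the mine site next to a roughly flat reference area is the numerical counterpart of the two lines converging in the graph.
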
The other example is WMS. This is an example of the National Map visualization using the data cube, and you can access this across the entirety of Australia right now.
Another advantage of the Cube in a Box environment is something called the developer experience. You've heard of user experience, I guess: focusing on the user and making sure that they find an application easy. Well, there's this idea of developer experience, which, as a developer and implementer myself, I'm pretty interested in. I think that's an important opportunity for us at FrontierSI to put forward for the community: building tools that illustrate how to use the Open Data Cube and how to implement the Open Data Cube, and that improve the developer experience. So we've got code examples and these infrastructure examples that demonstrate how to use the Open Data Cube, to try, again, to make that learning curve a little shallower.
So, in terms of outcomes, I think that we've flattened the learning curve a little bit. One, we've got infrastructure as code, which documents the architecture. We've got an environment where users can worry about using, not deploying, this infrastructure. Developers and implementers can open up that black box and really explore inside how we've done it, so they can do it themselves. And it means that testing and evaluation is easy. Docker is great: there might be a bug in a point release of a library that's used by the Open Data Cube, and we can install that specific release into Docker, test that environment, then blow it away, install a different point release, and see how they work. It's a really good opportunity.
In conclusion: COGs make the data easy, the Open Data Cube makes accessing that data easy, the Cube in a Box makes the Open Data Cube easy, and you can come and join us on Slack or on GitHub. Thank you very much for your time.
[Audience question] I'm quite interested to look at [unclear], because I don't quite understand [unclear], but I think we can do range requests on JPEG 2000. How easy is it to use for your users with the data cube? The second one is: do you support raster attribute tables?
[Speaker] Good questions. The first one is really easy: you need to get a product definition, then get a little bit of metadata around each of the [unclear] files, then index them into the database, and then you can use the Open Data Cube to query [unclear]. For your second question, I'm probably not the best person to answer that.
[Audience question] [unclear] you mean the cube, so [unclear] time series, and could I have something like a cube which has all sorts of [unclear] metric data across time?
[Speaker] Don't be deceived by the name: the data cube is data cubes. They essentially have as many [unclear] as you like, but they're all individual cubes on their own, and they all fit together. It's an interesting discussion about where the cube sits: is it the data that's on the server that's the cube, or is it when you load it into memory?
[Music]
[Speaker] Well, I think it's a platform for [unclear], certainly around abstracting access to the data and putting everything behind maybe a layer that makes it easier. In terms of large volumes of data, again, [unclear] slice and dice it, patch it up and send it off to processes. The Open Data Cube's pipeline for doing that is the Dask library [unclear], which I think is a really interesting way of doing that, and it certainly is used operationally at [unclear] Australia, so our own [transcript ends]

