Open Source Spatial Tools for Biodiversity and Environmental Data


Formal Metadata

Title
Open Source Spatial Tools for Biodiversity and Environmental Data
Title of Series
Author
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
2018
Language
English

Content Metadata

Subject Area
Abstract
Open Source Spatial Tools for Biodiversity and Environmental Data in the Atlas of Living Australia
Okay, thanks everyone, I guess that's the introduction. I'm an analyst developer and I've been with the Atlas now for about four years. My first development was ZoaTrack, which is a platform for animal tracking data, and both ZoaTrack and I were taken into the fold of the Atlas some time ago. I've since wheedled my way into this management position, and that's the capacity in which I speak to you today: I work with all of our ecological analysis tools. So, a quick introduction
to the ALA. Who here has heard of or used the ALA? Okay, so we're kind of all friends. The ALA is a big database of plant and animal information: we are an aggregator of biodiversity data, we pull it together from multiple sources and then we make it freely available for reuse. We're funded by NCRIS, and the important words there are "collaborative infrastructure", which means we're driven by an open-source software development strategy and open data policies. We're hosted by CSIRO and there are about 30 of us; most of our staff are based in Canberra at Black Mountain, and a handful of us are scattered around Melbourne. We're the Australian node of GBIF, which you saw in Jane's talk before; GBIF has many international nodes. We're also partnered with a whole heap of museums and collections people: the original idea for the Atlas came from the museums and collections wanting a central database so they could look up their species and location information. That was around about ten years ago, and we still partner closely with all of those organisations; I'm based at the Melbourne Museum myself. Our open source
software development has been really successful: we've got countries all over the world now picking up our software and using it.
There are about a dozen, and about ten more in negotiation; just last week Austria came on board, and Sweden signed up recently as well. We're working on building that "Living Atlases" community, basically so it doesn't just become a whole heap of forks of our software: we can end up with a common code base and get the good work that those other countries are doing brought back to us through collaboration. So my talk will go through
the elements here on the left-hand side: data capture; processing, what we do to the data; some of our discovery tools; and mostly our data analysis and visualization, with a couple of our visualization platforms covered just quickly.
With data capture, we have a data management team that pulls data in from all sorts of different places, through both automated and manual loads. We speak Darwin Core. Quick show of hands: who knows what Darwin Core is? It's more or less a biodiversity data standard. For us that means a bunch of 186 terms, so we can tell people to make these their file headers and we roughly know and understand what those things mean; it contains things like species name or scientific name, decimal latitude and decimal longitude. We've got other platforms that pull in data too, non-occurrence data you might say, or some occurrence data: the BioCollect platform for supporting citizen science and field data collection; a Profiles app that gives us more descriptive information about species; ZoaTrack, my application for managing and visualizing animal tracking data; and DigiVol, which we support as well, a digitization and transcription platform.
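As a rough illustration of Darwin Core terms used as file headers, here is a minimal Python sketch; the three headers are real Darwin Core terms, but the records themselves are invented for the example:

import csv
import io

# A tiny occurrence file using Darwin Core terms as column headers.
# The records are made up purely for illustration.
darwin_core_csv = """scientificName,decimalLatitude,decimalLongitude
Eucalyptus camaldulensis,-36.76,144.28
Phascolarctos cinereus,-37.81,144.96
"""

# Reading it back is ordinary CSV handling, because the standard only
# fixes the meaning of the headers, not the file format itself.
for record in csv.DictReader(io.StringIO(darwin_core_csv)):
    print(record["scientificName"], record["decimalLatitude"], record["decimalLongitude"])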
Okay, our data processing. I just wanted to take you through the sorts of things we do to that data once it comes in. We run it through a massive engine: we augment each record with a whole heap of information about taxonomy, environment and spatial context, and then we run a whole heap of data quality tests on that data.
First up is taxonomy. We've got a piece of software that we built called the Large Taxon Collider, and the guy who built it is a quantum physicist. Here we work with all of the different Australian taxonomic authorities and try to come up with one big, unique list of all Australian species. It handles things like updates, mergers and synonyms, and all of those sorts of things that happen with taxonomic names; for those of you who aren't aware of what a bloodbath taxonomy is, it's quite an exercise. So we get that taxonomic information, we come up with what we think is the right scientific name for that organism, and we actually add the whole taxonomic tree to the record, basically, so it's easy to look up.
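The Large Taxon Collider itself isn't shown in the talk, but the general shape of this step, resolving a supplied name to an accepted name and attaching the taxonomic tree to the record, can be sketched roughly like this (the lookup tables and function are invented for illustration, not the ALA's actual code):

# Invented, simplified lookup: maps names (including synonyms) to an
# accepted name plus its classification. The real system handles
# updates, mergers and synonyms across many taxonomic authorities.
ACCEPTED = {
    "Eucalyptus rostrata": "Eucalyptus camaldulensis",       # synonym -> accepted
    "Eucalyptus camaldulensis": "Eucalyptus camaldulensis",
}
CLASSIFICATION = {
    "Eucalyptus camaldulensis": {
        "kingdom": "Plantae", "family": "Myrtaceae", "genus": "Eucalyptus",
    },
}

def augment_with_taxonomy(record):
    """Attach the accepted name and its taxonomic tree to a record."""
    accepted = ACCEPTED.get(record["scientificName"])
    if accepted:
        record["acceptedName"] = accepted
        record.update(CLASSIFICATION.get(accepted, {}))
    return record

print(augment_with_taxonomy({"scientificName": "Eucalyptus rostrata"}))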
We host around about five hundred spatial layers, different environmental and contextual layers. I had no idea how to describe all of these on one slide, so I'm sorry about how much information is there, but an important part of our data processing is that we take each location and intersect it with each and every one of those layers, grab the value back and put it next to the record, so that it's easy to use later on. Then we run our data quality tests. We have around about a hundred of them, and they're in the process of being internationally standardized for biodiversity through TDWG, the Taxonomic Databases Working Group, which is the Darwin Core people. Jane alluded before to the idea that there are lots of data quality issues within the Atlas, and that's entirely true, because we rarely actually throw data out and we don't set ourselves up as the judges of what makes a good record. Instead we run these tests to help people assess the fitness for purpose of a record for their use: we run the hundred data quality tests and they can then pick and choose which ones are useful for their purpose. Species distribution modelling obviously needs really high quality records, and you would do quite a bit of filtering before you found the right data set for such a scientifically important goal; but if you're just playing around with data you might want everything. Those data quality tests do things like check names and check whether a record is where we would expect that species to be, among other things, and I think around half of them are location-based tests. What we've got in the end can be a record up to a thousand fields wide, and we put that into a Cassandra database and index it with Solr. So we've got a lot of information that has come along with the original record.
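A couple of the location-based quality tests can be sketched as simple flag-setting checks; these are generic examples of the kind of test described, not the ALA's actual test suite:

def run_quality_tests(record):
    """Attach pass/fail flags for a few illustrative location checks."""
    lat, lon = record.get("decimalLatitude"), record.get("decimalLongitude")
    flags = {}
    # Coordinates that are exactly zero are a common sign of bad data.
    flags["zeroCoordinates"] = (lat == 0 and lon == 0)
    # Coordinates outside the valid range can never be right.
    flags["coordinatesOutOfRange"] = not (
        lat is not None and lon is not None
        and -90 <= lat <= 90 and -180 <= lon <= 180
    )
    record["assertions"] = flags
    return record

# A record failing both checks versus one that passes.
print(run_quality_tests({"decimalLatitude": 0, "decimalLongitude": 0}))
print(run_quality_tests({"decimalLatitude": -36.76, "decimalLongitude": 144.28}))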
record most people most people I don't know if that's most people which we've got a web front-end for navigating our data we call it the biodiversity information Explorer you can search species location you can go in through your collections or buy data set
Probably our biggest strength is our web API. The Atlas is a service-oriented architecture: we have our database at the back, which we call the biocache, and lots and lots of different front ends, not just our main front end but also others like the Australian Virtual Herbarium. The thing is that the web service layer that sits in the middle and supports all of our infrastructure is also publicly available. We publish our API, so anyone can use any of the tools that we use internally. Our API is at api.ala.org.au, and these are just the groupings of the different services on this side of the slide; we've got around about a hundred services exposed, I think.
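For illustration, a minimal Python call against the occurrence search service; the hostname, endpoint path and response field names here are my reading of the public biocache web services rather than something stated in the talk, so treat them as assumptions:

import requests

# Query the public occurrence search service for a species.
resp = requests.get(
    "https://biocache.ala.org.au/ws/occurrences/search",   # assumed endpoint
    params={"q": "Eucalyptus camaldulensis", "pageSize": 5},
    timeout=30,
)
resp.raise_for_status()
data = resp.json()

# Field names below are assumed; check the current API documentation.
print("total records:", data.get("totalRecords"))
for occ in data.get("occurrences", []):
    print(occ.get("scientificName"), occ.get("decimalLatitude"), occ.get("decimalLongitude"))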
The Spatial Portal, spatial.ala.org.au, is our visualization and analytics tool for dealing with all this data. Its purpose is to manipulate, analyze, display, import and export spatial data, so it's got every tool under the sun that I can think of for working with species, areas, layers and those dozen or so facets. That lets you come up with visualizations like this one: all the river red gum occurrence records coloured by the type of observation they are, the specimens from the museums as opposed to human observations; there we've coloured the sub-grouping those occurrences lie in and overlaid it with a temperature layer. Those are the sorts of visualizations we can do within the Spatial Portal.
It's got lots of other tools out the back end. There are scatter plot analyses for working with the continuous variables that come in through the environmental layers, cross-tabs for discrete variable analysis, and prediction software, MaxEnt, in there as well; I think there are around about seventeen analysis tools in the Spatial Portal.
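The cross-tab style of discrete-variable analysis can be approximated outside the portal with pandas; the occurrence table here is invented for the example:

import pandas as pd

# Invented occurrence records: each row has a basis of record and a
# discrete contextual value, e.g. which state the point falls in.
occurrences = pd.DataFrame({
    "basisOfRecord": ["PreservedSpecimen", "HumanObservation",
                      "HumanObservation", "PreservedSpecimen"],
    "state": ["Victoria", "Victoria", "New South Wales", "New South Wales"],
})

# A cross-tab counts occurrences for each combination of the two
# discrete variables, much like the Spatial Portal's cross-tab tool.
print(pd.crosstab(occurrences["basisOfRecord"], occurrences["state"]))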
Then we have ALA4R. ALA4R was written around about 2014 by a chap called Ben Raymond at the Antarctic Division. It hits those web services and puts the results into a nice spatial point data frame so they're right there to be used by tools like leaflet and ggplot, and it's really trivial to get that going if you're an R programmer. I think it covers most of the API; well, not all of it, but it covers the more well-used API services.
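ALA4R itself is an R package; as a rough Python analogue of that "web service result into a spatial point data frame" step, assuming occurrence records like those returned by the search service sketched above and using geopandas (my choice, not something the talk mentions):

import pandas as pd
import geopandas as gpd

# Invented occurrence records standing in for a web-service response.
df = pd.DataFrame({
    "scientificName": ["Eucalyptus camaldulensis"] * 3,
    "decimalLatitude": [-36.76, -34.93, -37.81],
    "decimalLongitude": [144.28, 138.60, 144.96],
})

# Turn the plain table into a spatial point data frame, ready for
# mapping libraries, much as ALA4R prepares data for leaflet/ggplot.
gdf = gpd.GeoDataFrame(
    df,
    geometry=gpd.points_from_xy(df["decimalLongitude"], df["decimalLatitude"]),
    crs="EPSG:4326",
)
print(gdf.head())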
So just for my last couple of words, I wanted to talk about some of the issues we're having at the moment around spatial stuff. We're coming up to ten years old: next year we'll be celebrating ten years of being live, which is pretty cool. I think it was at FOSS4G, when it was in Melbourne before, that we got a lot of help, I hear, with dynamically producing tiles and those sorts of services, so we are appreciative of this conference. We're sort of turning from an innovative, start-up-ish culture into a BAU house, and we've got a lot of work to do: we haven't looked outward a lot, I think, and we have to look into things like the OGC WxS services and what's going on there, making sure that we're keeping up with new tech. Probably a bigger problem at the moment, though, is our 500 spatial layers. They come from all different agencies, we want 200 more, and we've got 200 more waiting in the pipeline. They're all different: they all have different licensing arrangements, different metadata, different coverage, different styles and scales, all sorts of things. I'm sure there are a few of you in this room who are familiar with those sorts of problems, and I guess we're wondering who else is dealing with them, and whether there's a call for a central agency that could manage these things and then help us with web services like the ones we use to intersect those layers. We produce such a service ourselves: you send a lat/long and your layer name, or you can get values back for all 500 layers if you like, and it brings back those values; we have a batch version of the same. We feel it would be great if someone else could do that, because you need scalable infrastructure, standardized layers, standardized vocabularies, all that sort of boring stuff that isn't such sexy work to do but has such great returns.
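A minimal sketch of calling the point-intersect service just described; the URL pattern and the layer id below are assumptions based on my recollection of the ALA layers service and should be checked against the current API documentation:

import requests

# Point-intersect: send a coordinate and a layer/field id, get back the
# layer's value at that point.
layer_id = "cl22"          # assumed contextual layer id (e.g. Australian states)
lat, lng = -37.81, 144.96  # roughly Melbourne

resp = requests.get(
    f"https://spatial.ala.org.au/ws/intersect/{layer_id}/{lat}/{lng}",  # assumed pattern
    timeout=30,
)
resp.raise_for_status()
for result in resp.json():
    print(result.get("field"), "->", result.get("value"))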
So that's it for me, thanks very much everyone. Are there any questions?
[Audience question about species databases and machine learning.]
Yes, well, we're teaming up with iNaturalist, who are really in the lead with deep learning; they use it to do species identification on images. That's great for well-known species, but once you get out to the tail, where you might only have a few images of a species, it gets harder and harder. So we're really looking at what deep learning can do, and we can't wait to get something like that in place.
[Audience question about taking a photo and wondering what species it is.]
Yeah, check out iNaturalist, it has really great species-suggestion functionality, and we're creating an Australian node of iNaturalist.
[Applause]