
Human-in-the-loop Machine Learning with Realtime Model Predictions using GroundWork and Raster Vision


Formal Metadata

Title: Human-in-the-loop Machine Learning with Realtime Model Predictions using GroundWork and Raster Vision
Title of Series
Number of Parts: 351
Author
License: CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language
Production Year: 2022

Content Metadata

Subject Area
Genre
Abstract
Acquiring and labeling geospatial data for training machine learning models is a time-consuming and expensive process. It is made even more difficult by the lack of specialized open-source tools for dealing with the idiosyncrasies of geospatial data. At Azavea, we have encountered both of these problems before. In this talk, we will present a solution that incorporates our geospatial annotation platform, GroundWork (groundwork.azavea.com), with our open-source deep learning framework, Raster Vision (rastervision.io), to provide a human-in-the-loop active learning workflow. This workflow allows labelers to immediately see the effect of their created labels on the model’s performance, thus speeding up the labeling-training-labeling cycle and making the connection between the AI and human GIS data labelers easy and seamless. This talk will extend the hands-on experience introduced in last year’s “Human-in-the-loop Machine Learning with GroundWork and STAC” FOSS4G workshop. We will present an enhanced active-learning workflow that allows labelers to train a model and see predictions on-the-fly as they create labels in GroundWork. The model-training and predictions will be handled by Raster Vision. This workflow will give the labelers a clear view of the model’s current strengths and weaknesses at all times, and thus allow them to direct their labeling efforts more efficiently. Newly created labels will propagate back to the AI model in real time, and an asynchronous job will continue to refine the model and predictions. This loop is backed by the open-source Raster Foundry (rasterfoundry.azavea.com) and Franklin (azavea.github.io/franklin) APIs, and is compliant with the STAC (stacspec.org) and OGC Features (ogc.org/standards/ogcapi-features) open standards.
Transcript: English (auto-generated)
Good morning. Thanks for joining us. My name's Simon. I work for a firm called Azavea and I'm here with my colleague Aaron. Our other colleague and collaborator is Adil. He unfortunately couldn't make it today in person, but you'll hear from him later via a prerecorded section of the presentation. In this talk, we're going to give you an overview of some work
we did incorporating two of our products into a human-in-the-loop machine learning tool. But first, a little bit about Azavea. We're a B Corp based in Philadelphia, focusing on geospatial technology for civic, social and environmental impact. We also have a strong
commitment to open source software and try to contribute to the open source geospatial community whenever possible. All right, so in order to motivate what we're doing here, I'm going to walk through a little hypothetical scenario. So let's say that you just got hired as a short order cook at a diner. But here's the problem: you have
no idea how to cook at all. I guess you're persuasive, because you talked yourself into this job, but you have no idea how to work a griddle. In fact, the only thing you really know how to cook is toast. You'll be working breakfast, but really, how useful are you if that's all you can do? So you get to the diner and you meet Mike. He's
like, well, just when you think you've seen everything, right? So I can do my best to teach you today, but we've got to make the most of it because you're on your own tomorrow. So he teaches you how to make some stuff and then the customers start
showing up. It's getting busy and the orders are rolling in. You had a quick lesson, so you kind of know how to make these dishes, but you're far from an expert. So you need to use these orders as learning experiences and make the most of them in the time you've got this morning. You're not going to be able to make every order, so how do you
choose which dishes you're going to cook? You've got some options on the best way to go about this. You and Mike could just take turns grabbing whatever random order gets printed out. So this may be a decent approximation of what tomorrow's shift will look like when you're working alone, but again, you got to make the
most of this training time, and if you do this, you'll probably spend too much time working on the tasks like toast that you don't need practice in. No, you would want to choose the dishes that you're least confident in. You want to be making one dish, and then once it's done, you come over to Mike for another assignment. He'll look over the options, find one that you need the most work on, and say, here you go. Go make pancakes. And you do it, then you bring the
pancakes back to Mike. He'll look at it. If you nailed it, maybe he'll send you off to make French toast because you haven't done that yet, but if you screwed up, he'll explain what you did wrong and send you back to make another order of pancakes. So why am I talking about this? Why am I talking about
breakfast at this software conference? I'm talking about this because the approach that we just decided on is the same approach that human-in-the-loop machine learning uses. So a human-in-the-loop AI system, which is also sometimes referred to as active learning, and which I will shorten to HITL for the purposes of this presentation, is one in which a human is involved in an iterative training process, or a loop, with the AI model. This approach is appealing because it promises the harmony of the efficiency of the machine with the nuance of a human. So we start by labeling a small amount of randomly selected data. This is the initial training set. We use this to train a basic model. The model comes back with predictions of varying confidence levels, so we focus on confirming or correcting some of the predictions that the model is least confident in, then add those corrected labels back to the training set. So now the training set is not only larger but it's more diverse. We've targeted our input specifically to fill in gaps in the model. To tie a bow on our diner metaphor before we leave it behind: in this case, Mike is the human. You are the AI model. The training process happens when you cook food. Your low-confidence predictions are the dishes you bring to Mike for inspection, and he corrects the outputs by telling you what you did wrong.
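To make the shape of that loop concrete, here is a minimal sketch of an uncertainty-sampling cycle. The helper functions and the task/label representation are hypothetical stand-ins mocked with NumPy so the loop runs, not GroundWork or Raster Vision APIs; they only show the select-label-retrain rhythm described above.

```python
# Minimal sketch of the human-in-the-loop / active-learning cycle described above.
# train_model, predict_proba, and human_review are hypothetical stand-ins (mocked
# here with NumPy so the loop actually runs), not GroundWork or Raster Vision calls.
import numpy as np

rng = np.random.default_rng(0)
unlabeled = set(range(100))                                   # ids of unlabeled tasks
seed_ids = rng.choice(100, size=5, replace=False)             # small random seed set
labeled = {int(i): int(rng.integers(0, 2)) for i in seed_ids}
unlabeled.difference_update(labeled)

def train_model(labels):
    """Mock training step: a real system would fit a segmentation model here."""
    return {"n_seen": len(labels)}

def predict_proba(model, task_id):
    """Mock per-task probability that the task contains the target class."""
    return float(rng.random())

def human_review(task_id, proba):
    """Mock human confirming or correcting the model's prediction."""
    return int(proba > 0.5)

for round_num in range(3):
    model = train_model(labeled)
    scores = {t: predict_proba(model, t) for t in unlabeled}
    # Least confident = probability closest to 0.5; send those tasks to the human first.
    to_review = sorted(scores, key=lambda t: abs(scores[t] - 0.5))[:10]
    for task_id in to_review:
        labeled[task_id] = human_review(task_id, scores[task_id])
        unlabeled.discard(task_id)
    print(f"round {round_num}: {len(labeled)} labeled tasks, {len(unlabeled)} remaining")
```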
So why use this HITL approach? The first reason is efficiency. There's no shortage of imagery out there. The limiting factor is usually a lack of good ground truth labels, and of course the process of labeling imagery is expensive. So one way to think of an active learning approach is as a method of simply speeding up the process of human labeling. Let's consider the two images here. On the left is an image that you would need to label from scratch, and on the right is the same image with predictions that you would be correcting in a HITL context. The active learning case is much quicker because the model has already done some of the work for you. Yes, you'll likely need to adjust, delete, and add some labels, but it's certainly much quicker to correct the image than to annotate every car from scratch. This approach also offers the potential for model transparency. Here we've got the predictions from a first round of training. You can see that the model labeled one car correctly, but it also thinks that a lot of different rectangular formations on top of the roof are cars, which they're not. In a traditional approach, the model would just happily carry on labeling skylights and HVAC rooftop units as cars, but this approach allows the human to peek into what the model is doing and say, that's not bad, but here are some ways you could do better in the future. A third benefit is that it can help find edge cases. In this example we've got a piece of construction equipment and what looks to be some sort of shed, both of which are car-like enough for the model to be confused. So it stands to reason that these will be the predictions that the model is least confident in, so they'll be presented to the human to confirm or correct, therefore increasing the likelihood that it will be able to correctly classify the next piece of construction equipment or the next shed. And we were interested in this approach
because we do a lot of work applying computer vision to geospatial imagery, and as part of that work we've developed two different tools, Groundwork and Raster Vision, which are foundational to this project. Groundwork is a labeling tool for geospatial imagery. It's designed for creating training data for machine learning tasks. It has an open source API, which includes the active learning implementation that we're discussing today, and it works such that users upload imagery to Groundwork, which breaks it into manageable chunks of work called tasks. Then users can collaboratively label and validate imagery before exporting the labels as STAC catalogs. Raster Vision is a library for deep learning on geospatial imagery. It's fully open source and built on top of open source libraries like GDAL, NumPy, and PyTorch, and if you're interested in contributing we certainly encourage you to reach out. It enables geospatial developers to bypass much of the tedious data processing associated with machine learning. They can pass in geospatial data in formats like GeoTIFF and GeoJSON, train a model, and get predictions back in the same familiar formats. Raster Vision handles everything in between. It's particularly easy to use Raster Vision in conjunction with Groundwork because both of these tools read and write STAC catalogs. So pulling the two tools together in a HITL context, we label imagery in Groundwork, which stores the imagery with the labels. When a user kicks off a job via the Groundwork UI, it takes the imagery and labels and uses Raster Vision to train a model and make predictions on the unlabeled imagery, before sending those predictions back to Groundwork where we use the UI for human validation. As predictions are validated they get stored as labels, and then after some validation the user can kick off another job, starting the loop again. So at this point I'm gonna turn it over to Aaron to demo the app.
Thank you Simon. So let's play the video. Suppose you have a project in Groundwork and you want to create a training data set for cars from the image here. So you start labeling. You have a bunch of tools to help you trace the objects on the map. When you are done labeling the current task, simply click on the confirm button and it will submit the labels for you. And if a task contains none of your objects of interest, which in this case are cars, you can just submit it and it will not ask you to label it again. We have a bunch of tools for you here. For example, you can trace the cars by dropping vertices or by continuously moving your mouse; it works the same way. Since this is semantic segmentation, you would probably want pixel-perfect, or somewhat pixel-perfect, labels, and the tool does exactly that. It cuts out the part of the labels that falls outside of the current task grid, so you get a good quality training data set. So when you're done labeling the current task grid, with only just a few labels of course, there is this predict button on top of the task map to let you run predictions. It kicks off a human-in-the-loop process that takes in the labels you just drew and trains our active learning model under the hood. This was done in just a few minutes, with an in-app notification. And here it has already created some pretty nice predictions with just a little label input, and now you may want to start labeling or validating these predictions again. This is just after three rounds of training, prediction, and validating. So now you're in the validation queue. Under the hood, the tasks in this validation queue are ordered in a way such that you will be focusing on the parts of the image that the model is most uncertain about. What does that mean? It means HITL will leave the heavy lifting to the active learning model under the hood and will reduce your labeling effort. And if at this point you want to pause, or you want to start a new round of prediction, that's also fine. You are the human, and this is the loop built into the Groundwork application. And now I will pass it to Adil, who unfortunately is not here today, but he recorded a pretty nice video to talk you through the deep dive of the machine learning. Does the audio work?
Audio is not working. All right, so I'm going to walk through Adil's portion given that the audio isn't working. Let's start off with that conceptual image and take a closer look. Now we know that, apart from the labels, it also takes in as input the entire Groundwork task grid, and apart from the predictions, as we just saw, it also outputs priority scores for all the tasks in the task grid that basically help you prioritize which areas of the image to label next. With that in mind, we can proceed to take a closer look inside the black box and see how the magic happens. The first step is to train a model, specifically a neural network. Now, geospatial scenes are usually too large to fit all at once into a neural network, so we'll first need to break them down into smaller training chips. We also want to be careful about sampling those training chips only from the parts of the image that we have actually labeled. Raster Vision, the library that we're using for this, is well equipped to handle all of this and makes it fairly trivial, and you can see here what the process looks like at a high level.
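As a rough illustration of that chipping step (not Raster Vision's actual API; the array shapes, sizes, and helper name here are assumptions for the sketch), here is how you might sample fixed-size training chips while rejecting windows that fall outside the labeled tasks:

```python
# Sketch: sample fixed-size training chips only from labeled areas of the scene.
# `image` and `labeled_mask` are fabricated stand-ins; Raster Vision handles this
# internally, so this only illustrates the idea, not its API.
import numpy as np

rng = np.random.default_rng(42)
H, W, CHIP = 1024, 1024, 256
image = rng.random((3, H, W), dtype=np.float32)     # stand-in for the GeoTIFF scene
labeled_mask = np.zeros((H, W), dtype=bool)
labeled_mask[:512, :512] = True                     # pretend one quadrant has labels

def sample_chips(n_chips, min_labeled_frac=0.9):
    """Draw random windows, keeping only those mostly covered by labeled pixels."""
    chips = []
    while len(chips) < n_chips:
        row = int(rng.integers(0, H - CHIP))
        col = int(rng.integers(0, W - CHIP))
        window_mask = labeled_mask[row:row + CHIP, col:col + CHIP]
        if window_mask.mean() >= min_labeled_frac:
            chips.append(image[:, row:row + CHIP, col:col + CHIP])
    return np.stack(chips)

train_chips = sample_chips(n_chips=16)
print(train_chips.shape)                            # (16, 3, 256, 256)
```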
Now, all of this is pretty much par for the course for geospatial machine learning. However, it is worth mentioning that there are some machine learning considerations that only come into play with this kind of active learning workflow. The first of these considerations is the starting point of your model. In each round, as you add more labels, you will have to update your model, and to do that you have a choice of whether to create a brand new model and train it from scratch, or to pick up your model from the last round and continue training it with new data. We found that the second approach works better in practice. However, it does have the potential disadvantage that if the model learns something incorrect in earlier rounds, it might not be able to unlearn it in later rounds.
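A minimal sketch of that choice in PyTorch follows; the checkpoint path and the torchvision model are assumptions for illustration, since the talk does not specify the architecture or storage layout:

```python
# Sketch: continue training from the previous round's weights instead of starting fresh.
# The checkpoint path and the torchvision model choice are assumptions for illustration.
from pathlib import Path

import torch
from torchvision.models.segmentation import deeplabv3_resnet50

CHECKPOINT = Path("artifacts/last_round_model.pth")   # hypothetical artifact from round N-1

model = deeplabv3_resnet50(num_classes=2)             # e.g. car / background
if CHECKPOINT.exists():
    # Option 2 from the talk: pick up where the last round left off.
    model.load_state_dict(torch.load(CHECKPOINT, map_location="cpu"))
# Otherwise, option 1: a brand new model trained from scratch (nothing to load).

# ...train on chips sampled from the newly enlarged label set, then save for the next round.
CHECKPOINT.parent.mkdir(parents=True, exist_ok=True)
torch.save(model.state_dict(), CHECKPOINT)
```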
The second major consideration is your approach to sampling training chips. One thing to note is that your pool of available data is not static throughout training. In fact, it grows with each round as you add more labels, and so you probably want to increase the number of training chips that you use in each round to take advantage of that. That's what we're doing currently: we scale up the number of training chips linearly with the number of labeled tasks. Another sampling consideration, especially if you're reusing the same model in each round, is that it will get exposed to some parts of the image more often than others, and so you might want to strike a balance on how often it sees old data versus new data. Thirdly and finally, we've made a big deal about sampling only from the parts of the image that are labeled. However, believe it or not, the unlabeled portions are not entirely useless. In fact, in the past few years a lot of research has gone into techniques like self-supervised and semi-supervised learning that are able to squeeze useful training information even from data that is not labeled. Using those techniques with this kind of human-in-the-loop workflow could prove to be a powerful combination and is exciting to think about. Regardless of how you choose to train, you will end up with a model. Well, not this model. Probably something a million times larger, but a model all the same. And the next thing you want to do is to run this model over the entire scene to produce predictions, which at the most basic level look something like this. What this is is a raster, a two-dimensional array of pixel probabilities, where the value of each pixel represents how likely it is to be part of a car. Brighter regions here correspond to higher values, darker regions are lower values, and the transparent regions are values that are so low that they're basically zero. With that in mind, we can see that the model is more or less getting it correct, except for a few interesting mistakes which we'll get back to in a moment. But first, at this point we are ready to produce the first one of our two main outputs, that is, the prediction polygons. And to do that we can simply threshold this image using a value of, let's say, 0.5, to get something like this. We can then take all these contiguous regions, convert them into polygons, and ship them off to Groundwork, where they will show up as predictions on the map, as you saw in the demo.
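That thresholding-and-polygonizing step can be sketched with rasterio and shapely; the probability array and geotransform below are made up, and the real pipeline works on the full georeferenced scene rather than random values:

```python
# Sketch: turn the per-pixel probability raster into prediction polygons.
# `probs` and `transform` are fabricated stand-ins for the model output and the
# scene's georeferencing.
import numpy as np
import rasterio.features
from rasterio.transform import from_origin
from shapely.geometry import shape

rng = np.random.default_rng(1)
probs = rng.random((256, 256), dtype=np.float32)     # model output, values in [0, 1]
transform = from_origin(-75.16, 39.95, 1e-5, 1e-5)   # hypothetical geotransform

car_mask = (probs >= 0.5).astype(np.uint8)           # threshold at 0.5
polygons = [
    shape(geom)
    for geom, _ in rasterio.features.shapes(
        car_mask, mask=car_mask.astype(bool), transform=transform
    )
]
print(f"{len(polygons)} prediction polygons ready to send back to Groundwork")
```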
Now, getting back to the pixel probabilities, we can take a closer look at some of the areas that the model is struggling with. We can see, for example, that it is a little uncertain about those cars on the right side of the road that are partially hidden in the trees. It is also mistaking some of the vents on rooftops for cars, as well as a small round object on the left sidewalk near the top of the image that is clearly not a car. These are exactly the kind of areas that we want to prioritize for human correction, and to do that we would want to apply some kind of transformation that takes these middling values, not too high and not too low, and assigns them the highest scores. To do that we use the entropy function from information theory, which does exactly this. Note, however, that this is not the only way to define uncertainty. We could, for example, train multiple models, make predictions from each of those models, and then take the areas where the models most disagree and categorize them as the most uncertain areas. But that is obviously very expensive to do. So anyway, once we have these uncertainty scores at the pixel level, the final step is to realize that we cannot expect the human to go pixel by pixel correcting them. So what you want to do is aggregate them to a less granular level, and to do that we simply take the average. So if you imagine this to be a single Groundwork task, the priority score for that task would then just be the average of all the values that you see here. And this completes the second one of our two outputs, the priority scores.
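In code, the per-pixel entropy and the per-task averaging might look like the following; this is pure NumPy, and the 256-pixel task size is an arbitrary assumption for the sketch:

```python
# Sketch: per-pixel uncertainty via binary entropy, then averaged over each task
# cell to produce the priority scores that order the validation queue.
import numpy as np

rng = np.random.default_rng(2)
probs = rng.random((1024, 1024))        # stand-in for per-pixel car probabilities
TASK = 256                              # assumed task size in pixels (made up)

def binary_entropy(p, eps=1e-12):
    """Highest at p = 0.5 (most uncertain), zero at p = 0 or p = 1."""
    p = np.clip(p, eps, 1 - eps)
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

uncertainty = binary_entropy(probs)

# Average the pixel uncertainties within each TASK x TASK cell of the task grid.
h, w = uncertainty.shape
priority = uncertainty.reshape(h // TASK, TASK, w // TASK, TASK).mean(axis=(1, 3))

# Tasks are then surfaced to the labeler from most to least uncertain.
rows, cols = np.unravel_index(np.argsort(priority, axis=None)[::-1], priority.shape)
print(list(zip(rows[:5], cols[:5])))    # row/col of the five highest-priority task cells
```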
And now we're ready to zoom back out of the machine learning black box and take a more holistic view of how all of this comes together. Awesome. So let's take a look at how we engineer this under the hood. We engineer this workflow by combining our open source APIs with cloud infrastructure for production use. You can create any amount of training data to get the model started. The labels are then sent through our APIs and persisted to our open source Postgres database with PostGIS. When you start a new prediction run in Groundwork, the APIs under the hood submit a new job to AWS Batch. This job is like a snapshot of all your image data and your label data, and it contains all the information needed for the machine learning down the line. The Batch process then submits the job to our configured compute environment, which has a fleet of instances ready to take your input and run predictions on it. Each of these GPU-enabled instances pulls the HITL container from the registry, takes the snapshot of your input data, and runs a custom Raster Vision active learning model under the hood to give you training and predictions. During training and prediction, and across these iterations, we persist the learning artifacts to S3 storage so that the model can learn from the previous iterations as well.
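A rough sketch of that job submission with boto3 follows; every resource name, URI, and parameter here is a placeholder for illustration, not Groundwork's actual configuration:

```python
# Sketch: kick off one HITL training/prediction round as an AWS Batch job.
# Every resource name and parameter below is a placeholder, not Groundwork's real config.
import boto3

batch = boto3.client("batch")

response = batch.submit_job(
    jobName="hitl-round-7",
    jobQueue="gpu-job-queue",                 # placeholder queue backed by GPU instances
    jobDefinition="hitl-raster-vision:3",     # placeholder container job definition
    containerOverrides={
        "environment": [
            # Snapshot of this round's imagery and labels, staged on S3 (placeholder URIs).
            {"name": "IMAGE_URI", "value": "s3://example-bucket/project/scene.tif"},
            {"name": "LABELS_URI", "value": "s3://example-bucket/project/labels.geojson"},
            {"name": "ARTIFACT_PREFIX", "value": "s3://example-bucket/project/artifacts/"},
        ]
    },
)
print("submitted Batch job:", response["jobId"])
```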
So this asynchronous workflow produces two major products, as mentioned before: the prioritized task grid and the prediction labels. These are pretty critical for you to make further decisions when you are validating these predictions. These data are then persisted through our API to the Groundwork backend, and when this is done, as demoed before, you will get an in-app notification saying, okay, you have these predictions, now you want to validate them. From here you can validate these predictions based off of the model's suggestions. You can create your own edits on top of them; your edits are your own copy of the labels, and the predictions are the predictions, so they do not affect each other. From here you are ready to go for another round in the loop, until you're happy with the training data set. And now let's see some quick results. As you can see in this area, the predictions improved round by round. Take example car number one: the model started with some initial predictions in this case, and it then gradually produced fuller annotations after some human validation in later rounds. And look at example cars number two and three here. These are the cars under the tree branches, and as you can see, the model predicted them better in later rounds with some human involvement as well. The predictions also tend to converge to a more stable shape in later rounds, and this can be seen in all of these labels, especially in label number four. So this shows that with some proper human intervention in between just a few machine learning iterations, and with very little training data, the resulting training data set already has pretty good quality. We do plan to improve this entire workflow, and we want to enhance the in-app experience as well as the machine learning under the hood, and if you're interested to know more or want to talk to us after this talk, we're happy to show you more. We hope this presentation sounded interesting to you. If you're interested in working on these cool projects or products, we're currently hiring. Our CEO Robert, who's also here at this talk, is going to be at booth 11 throughout the entire conference, so please drop by if you want to talk to us directly as well. Our colleagues Mike, who's also here today, and Daniel are going to give a general track talk about DistrictBuilder and TopoJSON tomorrow at 9 a.m. in this same room. And if you'd like to try this HITL workflow out, please let us know. It is currently by request only, so if you're interested you can send us an in-app message, talk to us directly today, or scan the QR code here (or take a photo) to fill out the contact form. And if you'd like to try out Raster Vision, which is the machine learning tool under the hood, please scan the code there; that's a GitHub link to that open source tool. And if you have any other questions, we're here today, and you can also submit your questions through the contact form on this QR code as well. Yeah, that's it, let's keep in touch. Thank you guys.