
CanoClass: Creation of an open framework for tree canopy monitoring

Formal Metadata

Title
CanoClass: Creation of an open framework for tree canopy monitoring
Title of Series
Number of Parts
237
Author
Contributors
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
Forested areas play an integral role in the maintenance of both local and global environments. They account for the bulk of Earth’s carbon sequestration, which mitigates anthropogenic processes; they provide natural erosion and runoff control for flooding events, which have been growing in frequency because of climate change; and they can offer respite from urban heat islands. The effective creation of canopy data is of utmost importance for analyzing the aforementioned processes, in addition to forest patterns such as disturbance and mortality and the societal and economic effects forests can provide. Because of the importance of forests and the cycles they are a part of, it is imperative that systems are created that enable the effective monitoring of forest canopy. In particular, canopy classification using remotely sensed data plays an essential role in monitoring tree canopy on a large scale. As remote sensing technologies advance, the quality and resolution of satellite imagery have significantly improved. Oftentimes, leveraging high-resolution imagery such as the National Agriculture Imagery Program (NAIP) imagery requires proprietary software. However, the lack of insight into the inner workings of such software and the inability to modify its code lead many researchers towards open-source solutions. In this research, we introduce CanoClass, an open-source cross-platform canopy classification system written in Python. CanoClass utilizes machine-learning techniques, including the Random Forest and Extra Trees algorithms provided by scikit-learn, to classify canopy using remote sensing imagery. One similar Python module that is based on scikit-learn is DetecTree, but it does not utilize near-infrared (NIR) band imagery. To the best of the authors' knowledge, there are no dedicated tree canopy classification libraries that use scikit-learn in conjunction with infrared data.
Transcript: English (auto-generated)
Okay, the second talk today is "CanoClass: Creation of an open framework for tree canopy monitoring" with Owen Smith. Thanks, Owen. Good morning.

Hello, good morning.

Okay. Owen is a student at the Institute for Environmental and Spatial Analysis at the University of North Georgia. Is that okay? Okay, Owen, the stage is yours. Please share your presentation, and good luck.

Thank you.
So this work was initially undertaken and completed roughly two years ago, while I was at the Institute for Environmental and Spatial Analysis. I recently graduated from there, and I'm now at North Carolina State pursuing my PhD in geospatial analytics. So with that in mind, I've learned a lot since then.

This work came about with the completion of a contract for the Georgia Forestry Commission, in which we were using proprietary software to create tree canopy products for the entire state of Georgia using NAIP imagery. They wanted to know a lot about the deforestation metrics in the state, especially as Georgia is growing rapidly within the United States. Urban sprawl is a huge issue, and naturally with that comes a lot of deforestation and the different environmental factors that are caused by it. So we go a little bit into deforestation here, at the small scales and even at the large scales.

They wanted to be able to monitor it. However, we didn't have a ton of resources. Ideally, we would have had some sort of access to cloud compute, with real-time monitoring systems set up that could be updated. On top of that, we were using Textron's Feature Analyst, for which licenses are expensive. So Dr. Huidae Cho and I, at the University of North Georgia, were thinking: well, can we do this with open-source software? And so we did. As for previous studies, there is, as I mentioned, the Georgia Forestry Commission work, which was undertaken by us.
Others have used PyTorch, Keras, TensorFlow, the Orfeo ToolBox and the like.

The imagery used for this was the National Agriculture Imagery Program imagery, otherwise known as NAIP. It's collected by the USDA every three to four years, give or take. Traditionally it has been at a one-meter resolution; however, after 2019 it's now at a 0.6-meter resolution. They offer it in two formats: a three-band product, just standard RGB, and additionally a four-band product, which has the near-infrared band. It has really exceptional pre-processing quality. A lot of it is flown from airplanes, hence the high resolution, and they remove cloud cover for us. So oftentimes there's really no need for a cloud mask, which helps with the processing because it's an extra step that doesn't need to be done.

The Python libraries that were used in this were GDAL, obviously, as I'm sure everybody attending this conference knows GDAL, and then NumPy, which is kind of the fundamental computing library for science in Python, in addition to SciPy. GDAL utilizes NumPy really heavily for its abstraction into Python, and additionally scikit-learn, which Charlotte gave a great overview of, utilizes NumPy really heavily as well. It's all implemented in NumPy.
On to scikit-learn: it's pretty much the go-to for Python machine learning. Again, it's built on top of NumPy and SciPy, the sort of premier Python scientific computing libraries. But why scikit-learn, given that I mentioned other libraries such as PyTorch and Keras earlier? The reason is that a lot of these packages use artificial neural networks pretty extensively, and they additionally rely on GPUs for processing. At the time this was conducted, they had really limited AMD GPU support, and I didn't have access to any other type of GPU, so I was pretty limited with that. Ultimately we were CPU-confined, so we wanted something that was able to run in parallel across the CPU, and scikit-learn does that really well, in particular with its random forest classifier. And since scikit-learn is built on top of NumPy and uses it extensively, they integrate really well together for any sort of remote sensing classification that's needed.
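A minimal sketch of the CPU-parallel, NumPy-based pixel classification described here may help. It is not the CanoClass code: the function name `classify_pixels` and the synthetic single-band "index" raster are assumptions made for illustration. The scikit-learn parts (`RandomForestClassifier`, `n_jobs=-1`, `oob_score=True`) are standard.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def classify_pixels(features, train_mask, train_labels, n_trees=100, seed=0):
    """Classify every pixel of a (rows, cols, n_features) array.

    train_mask is a boolean (rows, cols) array marking labelled pixels;
    train_labels is an integer (rows, cols) array (1 = canopy, 0 = other).
    Hypothetical helper, not part of CanoClass.
    """
    rows, cols, n_features = features.shape
    flat = features.reshape(-1, n_features)  # one row per pixel
    selected = train_mask.ravel()
    X_train = flat[selected]
    y_train = train_labels.ravel()[selected]

    # n_jobs=-1 spreads tree building and prediction over all CPU cores.
    clf = RandomForestClassifier(
        n_estimators=n_trees, n_jobs=-1, oob_score=True, random_state=seed
    )
    clf.fit(X_train, y_train)
    return clf.predict(flat).reshape(rows, cols), clf.oob_score_


# Synthetic stand-in for a vegetation-index raster: high values are "canopy".
rng = np.random.default_rng(0)
index = rng.uniform(-1.0, 1.0, size=(40, 40))
features = index[..., np.newaxis]
labels = (index > 0.2).astype(int)
train_mask = rng.random((40, 40)) < 0.25  # pretend ~25% of pixels are labelled

canopy, oob = classify_pixels(features, train_mask, labels)
print(canopy.shape, round(oob, 3))
```

Flattening the raster to one row per pixel is what lets the same NumPy array pass straight from the raster reader into scikit-learn and back.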
So on to the algorithm. We decided to use random forests, and Charlie gave a really great overview of that as well. The reasons for choosing this over other things, such as neural networks or support vector machines, were that random forest has been found to be very useful in land cover classification, and it has a good balance of both time and accuracy. It has a lighter computational load than, say, the AdaBoost algorithm. And again, it can be parallelized across the CPU, which is incredibly useful, especially as, A, we didn't have access to any sort of GPU that could be utilized for this in the Python environment, and B, it becomes incredibly useful if used in a high-performance computing environment, where you're primarily working across CPU cores. One consideration, though, is that it can be a memory hog, especially as a matrix of the number of samples by the number of trees is stored in memory. This becomes important because we're working with 1-meter and 0.6-meter resolution imagery.

We also explored the extra trees classifier. Like random forests, it's a multi-tree predictor built using an ensemble of decision trees.
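As a rough illustration of the two ensemble classifiers named here, the sketch below fits both on a small synthetic data set. The data from `make_classification` is purely an assumption for demonstration and says nothing about the accuracies reported in the talk. In scikit-learn, `RandomForestClassifier` defaults to `bootstrap=True`, while `ExtraTreesClassifier` defaults to `bootstrap=False`, so each extra tree sees the whole training sample.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic, deliberately noisy data (flip_y mislabels about 10% of samples).
X, y = make_classification(
    n_samples=600, n_features=4, n_informative=3, n_redundant=1,
    flip_y=0.1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "random_forest": RandomForestClassifier(
        n_estimators=100, n_jobs=-1, random_state=0
    ),
    "extra_trees": ExtraTreesClassifier(
        n_estimators=100, n_jobs=-1, random_state=0
    ),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # mean accuracy on held-out data
    print(name, "bootstrap =", model.bootstrap, "accuracy =", round(scores[name], 3))
```

Because both classes share the same estimator interface, swapping one for the other in a pipeline is a one-line change.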
This classifier splits the nodes of the tree completely at random. It uses the entirety of the sample, and not just a bootstrap, to grow trees. This means that each tree is independent of, or uncorrelated with, the rest, whereas with random forest you can oftentimes get some correlation there. And so again, we went with this because it has a higher bias and lower variance than standard random forest, and it's suited for noisy or highly correlated data sets. Noise in particular was a big consideration due to the spatial resolution of the data we were working with.

With that, we had to decide what we were going to classify. As I mentioned, the NAIP imagery comes in two products: the standard RGB product and the RGB plus NIR product. We wanted to test both; we wanted to see if using just pure RGB was just as viable as the NIR index. So we chose the Visible Atmospherically Resistant Index (VARI) and the Atmospherically Resistant Vegetation Index (ARVI). For those who don't know, the near-infrared is useful for vegetation remote sensing, as it's strongly reflected by photosynthetically active vegetation, less so by photosynthetically inactive vegetation, and largely absorbed by bodies of water, which also sets vegetation apart from impervious surfaces. So it's quite useful, especially in Georgia, where we do have growing urban sprawl and it becomes important to be able to separate that.

The Visible Atmospherically Resistant Index uses only visible light bands, which potentially makes it more accessible and more flexible in areas that don't have an NIR product available at a high resolution. The blue band is incorporated into both of the indices I'm going to show, and this adds a kind of proxy, one could say, for the removal of atmospheric effects without any sort of higher-level processing to remove them. So here's the formula, and we also normalized it between the values 1 and negative 1 for classification.

These are just some examples of VARI, the normalized VARI, and ARVI, which uses the NIR. You can see that the non-normalized Visible Atmospherically Resistant Index doesn't do quite a great job. The black body you can see in the middle is a waterway; this, I believe, is a wetland area, so kind of a tricky area to classify as is. You can see the normalized VARI is a little bit better, but with ARVI it's very clear that it becomes better at separating those values without any sort of, say, ensemble method using data fusion.

So then, on to the Atmospherically Resistant Vegetation Index. As I mentioned, it uses the blue band to simulate the removal of atmospheric effects. It works very well; the literature was clear about that.
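The two indices can be sketched in NumPy as follows. The formulas are the commonly published ones: VARI = (G - R) / (G + R - B), and ARVI = (NIR - RB) / (NIR + RB) with RB = 2R - B (gamma = 1). The function names, the sample reflectance values, and the min-max `rescale` helper are assumptions for illustration. The talk only says the index was normalized between -1 and 1, not how.

```python
import numpy as np


def _safe_ratio(numerator, denominator):
    """Divide element-wise, leaving NaN where the denominator is zero."""
    out = np.full(np.shape(numerator), np.nan)
    np.divide(numerator, denominator, out=out, where=denominator != 0)
    return out


def vari(red, green, blue):
    """Visible Atmospherically Resistant Index: (G - R) / (G + R - B)."""
    red, green, blue = (np.asarray(b, dtype=float) for b in (red, green, blue))
    return _safe_ratio(green - red, green + red - blue)


def arvi(nir, red, blue):
    """Atmospherically Resistant Vegetation Index with gamma = 1."""
    nir, red, blue = (np.asarray(b, dtype=float) for b in (nir, red, blue))
    red_blue = 2.0 * red - blue  # blue-corrected red band
    return _safe_ratio(nir - red_blue, nir + red_blue)


def rescale(index, low=-1.0, high=1.0):
    """Min-max rescale to [low, high]; an assumed normalization, see lead-in."""
    lo, hi = np.nanmin(index), np.nanmax(index)
    if hi == lo:
        return np.zeros_like(index)
    return low + (index - lo) * (high - low) / (hi - lo)


# Made-up reflectances for three pixels: vegetation, water, pavement.
red = np.array([0.05, 0.04, 0.30])
green = np.array([0.10, 0.05, 0.30])
blue = np.array([0.04, 0.05, 0.28])
nir = np.array([0.50, 0.02, 0.32])

print("VARI           ", vari(red, green, blue))
print("VARI (rescaled)", rescale(vari(red, green, blue)))
print("ARVI           ", arvi(nir, red, blue))
```

With these made-up values the ARVI ordering is vegetation, then pavement, then water, which is the kind of separation the NIR band is expected to give.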
As we show here, the previous slide was a subset of this current image. You can see that throughout it, that water body throughout the wetland is very clearly separated. You can see the difference between farmland, and even some spots within the wetland where there's forest that is potentially dying, which is a whole other issue.

We also enacted additional image processing steps for our output: just a simple local-statistics image processing step, a Gaussian five-by-five median filter. It's really fast as well; that was implemented with SciPy, so very negligible computational overhead is added with this.
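A five-by-five smoothing pass of this kind can be sketched with `scipy.ndimage`. The transcript mentions both "Gaussian" and "median", so which filter was actually used is unclear. This sketch assumes a median filter, and `scipy.ndimage.gaussian_filter` would be the alternative. The tiny classified raster is made up for illustration.

```python
import numpy as np
from scipy import ndimage

# Made-up binary classification: the right-hand side is canopy (1).
classified = np.zeros((9, 9), dtype=np.uint8)
classified[:, 5:] = 1
classified[2, 1] = 1  # isolated speckle in the non-canopy half
classified[6, 7] = 0  # isolated hole in the canopy half

# Each output pixel becomes the median of its 5 x 5 neighbourhood, which
# removes single-pixel noise while keeping the canopy boundary in place.
smoothed = ndimage.median_filter(classified, size=5)

print(classified)
print(smoothed)
```

On a binary raster the median acts as a majority vote within the window, which is why isolated pixels disappear.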
The overall workflow can be seen here. I've spoken a lot about NAIP imagery, and initially this work was designed specifically to enable classification on NAIP imagery for the entire state. But I wanted to try to make it as modular as possible, to be used with any sort of remote sensing data. So in the bottom half here, we have it set up so that, after pre-processing, you can use it on individual files. In the top half you see that pre-processing, classification and post-processing functions are included, but then there's a whole suite of functions and methods to enable efficient processing of NAIP imagery. From start to finish, you can input your parameters in a configuration file, to hopefully make it more reproducible, and testing is included.

Within this as well, I didn't mention this, but we also utilized scikit-learn's hyperparameter tuning, which utilizes a grid search. That's offered as well, kind of packaged for this specific use case, to try to find the most efficient and most accurate parameters for the random forest classification.

So then, again, I've talked about Georgia. For those who don't know, this is what Georgia looks like.
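The grid-search tuning mentioned a moment ago can be sketched with scikit-learn's `GridSearchCV`. The parameter grid and the synthetic training data below are assumptions chosen to keep the example small. The talk does not say which parameters or ranges were searched.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for labelled training pixels (4 features per pixel).
X, y = make_classification(
    n_samples=300, n_features=4, n_informative=3, n_redundant=1, random_state=0
)

# Hypothetical grid: every combination is fitted and cross-validated.
param_grid = {
    "n_estimators": [25, 50],
    "max_depth": [None, 5],
    "max_features": ["sqrt", None],
}

search = GridSearchCV(
    RandomForestClassifier(n_jobs=-1, random_state=0),
    param_grid,
    cv=3,  # 3-fold cross-validation per combination
)
search.fit(X, y)

print(search.best_params_, round(search.best_score_, 3))
```

After fitting, `search.best_estimator_` is a forest refitted on all of the training data with the best-scoring combination.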
It's where I was born and raised, and it's where my family's from, so it's on the mind. But we used it as our case study primarily because we were already working within it, and we had an existing data set that we had created using different software and had already validated. We already had all the raw data, so it was kind of natural to use this. We chose a couple of physiographic districts within the state. And again, as to why Georgia's important: it has very high biodiversity. You can go from the coast, to the southern Appalachians, to farmland in between, to one of the most populated cities in the world in Atlanta. So it's a really varied area, and challenging as well for creating accurate data sets.
So then we ran the workflow as I have gone over. A big thing for us was that we wanted to look at the time taken by what was created compared to the existing workflow. This was also relatively easy to do because, again, we had created this other data set in the proprietary software. You can see on the several graphs here that, utilizing the pipeline set up from GDAL for all the pre-processing, all the way through scikit-learn, all the way through the post-processing to the smoothing, we have a very clear time advantage. It's important to note that, against Feature Analyst, it's not necessarily an apples-to-apples comparison based on the algorithms. Feature Analyst does use an ensemble method, so of course there will be more processing steps within that. But again, if we can get comparable results in less time, then to me that is worth it.
Here are the times as well. These are in minutes; that's an important note. You can see them based on area, and then the times for each step. We have the index creation, which is quite quick, and then we also ran extra trees and the base random forest classifier for each and compared the times to Feature Analyst.

Beyond the out-of-bag metrics that scikit-learn provides for uncertainty and accuracy analysis, we also wanted to try to quantify it against the product we had already created with this other software. For that, we implemented the moving window comparison coefficient in Python.
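A spatially aware comparison of two categorical rasters can be sketched as below. This is not the CanoClass implementation. It is a simplified, single-window-size take on the idea: for every window position, compare the class counts of the two maps and average the per-window agreement. The function name `window_fit` and the toy maps are assumptions for illustration.

```python
import numpy as np


def window_fit(map_a, map_b, window=3):
    """Mean per-window agreement between two categorical rasters.

    For each window position the fit is 1 - sum_k |a_k - b_k| / (2 * w^2),
    where a_k and b_k are the pixel counts of class k in each map.
    Hypothetical helper, not part of CanoClass.
    """
    a = np.asarray(map_a)
    b = np.asarray(map_b)
    if a.shape != b.shape:
        raise ValueError("both rasters must have the same shape")

    classes = np.union1d(a, b)
    rows, cols = a.shape
    fits = []
    for r in range(rows - window + 1):
        for c in range(cols - window + 1):
            win_a = a[r:r + window, c:c + window]
            win_b = b[r:r + window, c:c + window]
            diff = sum(
                abs(int((win_a == k).sum()) - int((win_b == k).sum()))
                for k in classes
            )
            fits.append(1.0 - diff / (2.0 * window ** 2))
    return float(np.mean(fits))


# Two toy canopy maps: the same two-column strip, shifted by one column.
reference = np.zeros((4, 4), dtype=int)
reference[:, :2] = 1
candidate = np.zeros((4, 4), dtype=int)
candidate[:, 1:3] = 1

for w in (1, 4):
    print("window", w, "fit", window_fit(reference, candidate, window=w))
```

A window of one pixel reduces to plain per-pixel agreement, while larger windows forgive small positional shifts between the two maps.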
Basically, it's a spatially aware comparison index for categorical raster data sets. We're using categorical raster data sets, so it's a prime example of how and when to use it. It's simple, and you can go more into the literature about that: it was initially created by Robert Costanza in 1989, and he has a great paper about it.

Here are just some examples. This is a selection of 20 tiles for comparison. The mean similarity coefficient score for extra trees was 87.56, and for random forest it was 87.62. In the top right you can see, from beginning to end, the raw data, the extra trees outputs, and then the Feature Analyst outputs. You can see in that second column one thing that was noticed, which is that our pipeline performed better in areas of dense forest than the other data set had, without any sort of post-processing, which is really important to us as well.

So, just some considerations.
This was conducted about two years ago, while I was still an undergraduate, so I had limited resources and time. This project was a learning opportunity for me as well. With the sort of resources and technical skills I've learned since then, I think the product would be more robust than it currently is, and I would certainly try my hardest to make it more robust. Any questions?

Great presentation. We have some questions here; I'll put them on the screen. The first is: are only four bands sufficient in your case?

Yeah, so that's a good question. That was a discussion that was had quite a bit.
There is a trade-off with NAIP imagery, because we could use moderate-resolution imagery, say Landsat. At the time, HLS, the Harmonized Landsat Sentinel data set, wasn't quite as robust as it is currently. But we wanted to have that really fine-scale spatial resolution to classify with, so it's a trade-off. I believe the four bands will be sufficient, because oftentimes with vegetation remote sensing you're going to be using the NIR band anyway; those will be the main wavelengths that you'll be looking at and analyzing. If we wanted to do different feature extraction, say we wanted to potentially be better at removing water, we could use the SWIR bands that more moderate-resolution imagery would have. Since then, I think this would be a great use case for, say, Planet data, which offers probably better wavelengths and, what I'm looking for, more frequently updated data, because it's almost near-daily data for that monitoring paradigm. But yes, four bands are sufficient in this case.

Okay.
We have another question: is it possible to extend the library for use in other biomes?

Yes, that's a good question. I was focused a lot on the state of Georgia, because that's where we had the data for the NAIP imagery; I think it's approximately 39,000 data tiles. So there wasn't any sort of robust testing for other biomes. Georgia itself has quite a few biomes. There was testing in emergent wetland areas along the shores, within the inner city, within the foothills of the Appalachian Mountains, and then the mountains themselves. But there was no testing in, say, areas that perhaps would have snow. That's also a symptom of the nature of NAIP imagery: it's taken during the growing seasons. So we weren't really able to test for, say, snow masks or anything like that. That's a great question and a consideration for the future.

Okay, so if we don't have any other questions, I'd say: good presentation. The people in the chat say they loved your presentation, so it's very good. We have a few minutes, if you'd like to say something or invite people to learn more about your work and share your contacts.

Yeah, so hopefully there's a lot more work coming from me. I just started my PhD this past month or so at the University of North Carolina. Or no, North Carolina State; don't tell North Carolina State I just said University of North Carolina. But yeah, you can email me. I'm on GitHub; my username is ocsmit. I'm fairly active with the GRASS community as well, so I'm looking forward to contributions there.
Okay. Thanks, Owen, for the presentation. See you soon at the social gathering, I think. We'll take a little break of five minutes before the next presentation, so you can drink some water or have a coffee, and be back here for the next presentations. Well, thank you, Owen.

Yes, thank you. Bye.