CANNABIS VILLAGE - Identifying sick Cannabis with AI

Video in TIB AV-Portal: CANNABIS VILLAGE - Identifying sick Cannabis with AI

Formal Metadata

CANNABIS VILLAGE - Identifying sick Cannabis with AI
Alternative Title
Diagnosing Sick Plants with Computer Vision
Title of Series
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Release Date

Content Metadata

Subject Area
Harry Moreno is a professional Web Software Engineer and a Machine Learning Enthusiast. He has spent most of his career developing web applications for startups in Healthtech, Hospitality and Advertising. He has built machine learning applications for radar sensors, cell segmentation in images and audio identification. Using cutting edge approaches in the field of Machine Learning and Computer Vision, we'll take you through the steps for building an ML model for classifying sick cannabis plants. This is a feature present in many plant or cannabis monitoring systems, but why pay big bucks for big licenses when it could be freely available?
Software engineering Identifiability Closed set Virtual machine Set (mathematics) Distance Usability Digital photography Machine learning Process (computing) Self-organization Smartphone Endliche Modelltheorie Quicksort Computing platform
Domain name Context awareness Standard deviation Film editing Term (mathematics) Model theory Expert system Smartphone Online help Endliche Modelltheorie
Predictability Machine learning Virtual machine Set (mathematics) Drop (liquid) Wave packet Data model Digital photography Process (computing) Software Iteration Software Process (computing) Endliche Modelltheorie
Digital photography Computer-generated imagery Planning Website Bit
Medical imaging Digital photography Randomization Computer-generated imagery Virtual machine Object (grammar) Endliche Modelltheorie Quicksort Social class Wave packet
Data model Service (economics) Latent heat Artificial neural network Sheaf (mathematics) Heat transfer Endliche Modelltheorie Instance (computer science) Wave packet Library (computing)
Convolution Greatest element Validity (statistics) Weight Model theory Objekterkennung Resonator Heat transfer Wave packet Wave packet Residual (numerical analysis) Data model Medical imaging Digital photography Software Prediction Personal digital assistant Energy level Endliche Modelltheorie Resultant Social class Computer architecture
Predictability Data model Medical imaging Inference Point cloud Wave packet Product (business) 2 (number) Point cloud
User interface User interface Demo (music) Computer file Open source Multiplication sign Computer-generated imagery Expert system Web browser Digital photography Medical imaging Iteration Software testing Endliche Modelltheorie Quicksort Computational visualistics Sinc function Computing platform Social class
Predictability Source code Expert system Maxima and minima Set (mathematics) Water vapor Medical imaging Digital photography Website Natural language Quicksort Endliche Modelltheorie Computational visualistics Freeware Computing platform Social class
Point (geometry) Computer file Link (knot theory) View (database) Multiplication sign Set (mathematics) Water vapor Coma Berenices Wave packet Number Medical imaging Machine learning Different (Kate Ryan album) Energy level Endliche Modelltheorie Social class Augmented reality Projective plane Machine code Type theory Digital photography Process (computing) Software Website Quicksort
We have Harry Moreno. He's going to talk about how one would identify sick cannabis plants using machine learning. Harry? All right, thank you. So hi, I hope you're enjoying DEF CON, and so we'll begin. This is machine learning for sick cannabis: how would you build a model to identify sick cannabis? I designed this with the intention of being very practical, so folks could take a photo of any sort of cannabis plant at an average distance from their smartphone. So who am I? I'm a software engineer. I'm not a data scientist, that's not my job title, but I am interested in data science and machine learning, and I thought this would be a pretty interesting, very practical problem to solve. I'm from New York. I am an organizer at Kaggle New York City. Kaggle, for those who don't know, is a data science platform where people compete from around the world to be the best data scientist on interesting data sets. So if you're in New York and want to do hands-on machine learning, please join us. What's the background
here? So, some context on what's possible, what's being done with AI in 2018. Artificial intelligence is becoming more pervasive and accessible. Two examples: in radiology, interpreting x-rays, the state of the art is that we've built models that are as good as or better than professional radiologists in terms of accuracy. And I think last year Stanford University published a model for diagnosing skin cancer that could run on a smartphone. So that's what's possible. And so what
are we trying to do? We are trying to see if we can build something similar, but for diagnosing cannabis. The intended users are hobbyists, people who aren't plant doctors and just want to grow a plant, and industrial players: if you have a cannabis farm, you're very keen on knowing if your crops are sick. The other problem is that diagnosing a plant requires a domain expert. For example, I'm not a domain expert in diagnosing plants, so I would want a model like this to help me. And so that's our goal: we want
software that tells you if cannabis is sick by using your phone. So how do we
make this? How do you make a model? How do we make a predictive model? The machine
learning process is this: you gather data, you train on examples, you build a predictive model on the examples, we launch it, we publicize it so that people can use it, and then we iterate. One of the things that we can do once we've deployed it is that, as people use it, as people upload photos to get their predictions, that grows our data set. And the goal is to make a very, very accurate, free predictive model. So for step one,
gathering data, myself and some other people in the community helped scrape for pictures of sick cannabis. There are websites where you upload pictures of your sick plant and other people tell you what might be wrong. If it's purple or if it's yellow, it means different things, and they have different solutions. So we built a scraper to collect sick cannabis photos.
Similarly, we collected healthy cannabis photos. This was a little bit easier because people like to show off their healthy plants and how well they're doing. Here we
collected data for a dataset that would trick the model. If our goal is to build a model that can tell you if a cannabis plant is sick or healthy, another very practical concern is: if I upload a photo, can the model tell me if there's even cannabis in the photo? So I added this third class that was basically composed of photos from the Caltech object dataset, with random everyday objects, and a lot of pictures of flowers and plants, to see if the model could learn to identify what is a cannabis plant first, and then if it's sick or healthy. So this is sort of the design of the training: we built this "other" images set. So,
at the end of this stage of gathering the data, we ended up with three thousand images: one thousand sick, one thousand healthy, one thousand other. This is actually a very, very tiny data set in terms of doing this machine learning stuff, and we'll see how it performs. If you want to follow along,
what I pretty much did was read section 5.3 in Deep Learning with Python. There they go over data augmentation, which helps with mutating your data set so that you can get more mileage out of it, as well as transfer learning, which allows you to begin building your model based off of cutting-edge research that other people have published. So for practical
concerns, I work off of a fairly old MacBook, and it has a GPU but it's just very, very old. So I used Amazon Web Services, their SageMaker offering, which is a one-click deploy of a GPU instance, and you get all of the libraries installed. The one-GPU instance is a dollar fifty an hour; the four-GPU one is ten dollars an hour. So if you want to do this and you don't have a GPU, this is how you could do it. The specific
technique that I used here was transfer learning, in which we basically take the ImageNet winner. ImageNet was another problem that was very, very popular, and now it's considered a solved problem. The challenge was object recognition: you're given tons of data, there are 1,000 classes, and your model has to say what's in the photo. This is actually considered a solved problem now, and there are plenty of pre-trained models that you can build your own models off of. That's what this transfer learning technique is. The full image here, from top to bottom, is the ImageNet solution; you would take that, remove the top layer, and put in your own, and that's what you're training. So these lower layers are reusable feature detectors that you can use. And so what
are some of the results? I tried two architectures. Broadly, there's this ResNet50, ResNet for residual network. I don't want to get into the details, but I tried this one first because this is the one that reached human-level accuracy on the ImageNet competition, and the accuracy that we got when building our model was about 60 percent validation accuracy. Then I tried another architecture called VGG16, and this is coming out of Oxford, and this one achieved 80% accuracy. This second architecture is pretty much what I ended up deploying. I want to figure out why this is the case, since everything online tells me that ResNet should have been more accurate, but it seems like we just need more data. So if we had something like 20,000 or 50,000 images, I would like to try this again with ResNet and see if that could get up to 95 or 99% accuracy. But for now, with
3,000 images, we have VGG16. And then
we deployed it to the cloud. This is running on EC2. There's no GPU in production; it takes about one second for inference to happen, for you to get a prediction. And it's built with Python tools like Keras and Flask, and that's about it.
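The transfer-learning recipe described above (take a network pre-trained on ImageNet, drop its classifier layers, and train a new three-class head) might look roughly like this in Keras. This is a sketch under assumptions, not the code behind the deployed model: the 224x224 input size, the 256-unit hidden layer, the optimizer, the augmentation settings and the names `CLASS_NAMES`, `INPUT_SHAPE` and `build_classifier` are illustrative choices, not details given in the talk.

```python
# Hypothetical names and hyperparameters; the talk only specifies VGG16,
# Keras, three classes, transfer learning and data augmentation.
CLASS_NAMES = ["healthy", "sick", "other"]
INPUT_SHAPE = (224, 224, 3)  # assumed; the standard VGG16 input size


def build_classifier(weights="imagenet"):
    """Build a three-class classifier on top of a frozen VGG16 base."""
    # Imported lazily so the module can be loaded without TensorFlow installed.
    from tensorflow import keras
    from tensorflow.keras import layers

    # include_top=False drops VGG16's original 1,000-class classifier layers
    # and keeps the convolutional layers as reusable feature detectors.
    base = keras.applications.VGG16(
        weights=weights, include_top=False, input_shape=INPUT_SHAPE
    )
    base.trainable = False  # only the new layers below are trained

    model = keras.Sequential([
        keras.Input(shape=INPUT_SHAPE),
        # Data augmentation; these layers are only active during training.
        layers.RandomFlip("horizontal"),
        layers.RandomRotation(0.1),  # up to +/-10% of a full turn
        layers.RandomZoom(0.2),      # zoom in or out by up to 20%
        base,
        layers.Flatten(),
        layers.Dense(256, activation="relu"),
        # One output per class; softmax makes the outputs sum to one.
        layers.Dense(len(CLASS_NAMES), activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

Calling `build_classifier()` downloads the ImageNet weights on first use. Images fed to the model would normally be passed through `keras.applications.vgg16.preprocess_input` first, and labels are expected in one-hot form to match the loss chosen here.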
We built a user interface, and you can check this out right now at chronicsickness.com. If there's enough time I'll demo it, but it's there, it's live. Pretty much you upload a file from your phone or from your browser on your computer, you submit it, and you get a probability of what the model thinks is the outcome. So for this example picture it's 88%, and it all adds up to one across the three classes, so it would predict that that picture is of healthy cannabis. I encourage you to test it out and help us make it better. And we want
to iterate. So this is just the first step. 80% isn't that great. I feel like, if we're putting radiologists and dermatologists out of work, this in my opinion would be a simpler problem. So it's really just about getting more data; three thousand images is way, way too small. It's already open source. The key problem, though, is getting good labeled data, and that's very time intensive and requires an expert. So we want to now build some sort of crowdsourcing platform where people can contribute when they have free time or as their interest brings them in, and as they lose interest they can leave, but we want to preserve that work in a crowdsourced sort of way. So look out for that future work. So the
problem of diagnosing sick plants is a three-class problem. It's hard to say what should be more difficult or less difficult, but for example ImageNet was a 1,000-class challenge, and statistically, if you just made a random guess, it'd be pretty hard to get the right guess when there are lots of classes. So these other problems: classifying the exact disease of a plant is much, much more difficult than what we have now, which is sick, or not sick, or not cannabis. And then it might be fun, because there are lots of sites like this already, to try to predict the strain of the cannabis, just for fun. That is many, many classes; one source told me that it was about eight hundred strains, so that's arguable as well. But it'd be pretty interesting to build something like this with no access to the plant's genetics, purely from its outward appearance, and see what computers think, what sort of strains they discover. And that's my talk.
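To put the class counts above in perspective, here is a small sketch of the accuracy you would expect from uniform random guessing. It assumes the classes are balanced, which the talk does not claim for ImageNet or for strains, and `chance_accuracy` is a name made up for this illustration.

```python
def chance_accuracy(num_classes):
    """Expected accuracy of uniform random guessing over balanced classes."""
    if num_classes < 1:
        raise ValueError("num_classes must be at least 1")
    return 1.0 / num_classes


# Class counts as quoted in the talk; the strain count is the speaker's
# rough figure from a single source.
problems = {
    "sick / healthy / other": 3,
    "cannabis strains": 800,
    "ImageNet": 1000,
}

for name, num_classes in problems.items():
    print(f"{name}: chance accuracy {chance_accuracy(num_classes):.3%}")
```

By this yardstick the reported 80% validation accuracy sits well above the roughly one-in-three chance level for three balanced classes, while a strain classifier would start from a far lower baseline.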
The site's available at chronicsickness.com, and if you want to reach out to me, that's my site, harrymoreno.com. If you want to help, if you have access to a large data set of labeled cannabis photos, I really want to speak with you. Other than that, I want to build the crowdsourcing sort of platform so that people can contribute, and let's make a free predictive model for cannabis disease that's free for everybody, that we could all build together and that benefits everybody. So that's kind of what I want to build, and that's it. [Applause] So I can take questions. By my clock I have 10 more minutes, so if people want to, yeah, I guess, any questions? Yep. Okay, so I think the question is: do we want to build something that is for ailments other than diseases, like lack of water? And yeah, so, I'm not an expert in cannabis, but when I said disease I was actually including those sort of ailments, because this breakdown of 40
plant diseases is actually from grow etz calm, and so we see things like boron deficiency, for example, or light burn, which isn't a disease but is more of an issue that you'll have if you are just doing this in your closet or something. So we could build both, but I think proper diseases would actually be more difficult; you would actually want to get some sort of botanist. From a more practical point of view, it'd be much easier to collect data on those things that you did mention, practical things like: is it lacking water, does it need more sunlight, is it too close to the generator? That's one that I learned in this project. So it's sort of like the software meets you in the middle. If people want to build certain things, or we have the talent, we could certainly build a proper disease one, but I think it'd be just way easier to do the common ailments. In the back? So the question is, does the model at chronicsickness.com do this. It does not; this is for future work. We have a model that's about 80% accurate on just whether it thinks it's sick or not, and first we want to solve that problem. In my opinion, we should be able to build a model that's above human-level accuracy (we could debate what that means exactly), but we should be able to build that for sick versus not sick. Once that's solved, we can tackle more granular problems like the specific ailment, and the strains just for fun. So we don't have an API yet; it's just chronicsickness.com, where you can upload a photo, and there's a link to the GitHub so you can see the code. But the model is not published; the model itself is about a hundred megabytes, and GitHub doesn't have support for large files, so it's not there. But we can speak offline about that. If people want an API, we could build
something like that. One more question? Yeah, so the GPU really helps with the specific technique called data augmentation. You can do that for different types of data: in machine learning problems you might have audio data or text data, and in this problem it's image data. It turns out that we can do random projections of that data, like rotating and skewing it and zooming in, and all of that is drastically accelerated with a GPU. In the book it says that if you don't have a GPU, don't even try to do data augmentation. So there's that. And then the actual model training time with a GPU is about an hour and a half for 3,000 images, and I think that would grow linearly. So if we get 20,000 images, it might be like six-hour jobs, I think. The model training time grows linearly with your data set size, not so much with the number of classes you're trying to predict. And so if you have any more questions, the
site is chronicsickness.com. You can follow through to the GitHub and find my handle, and you can send me emails if you want to help or if you have suggestions. Thank you.
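As a footnote to the training-time answer, the linear-scaling assumption can be written down directly. This sketch extrapolates from the figures quoted in the talk (about 1.5 hours for 3,000 images, and $1.50 per hour for the single-GPU instance). The function name `estimate_training` is hypothetical, and real training time also depends on hardware, number of epochs and image size.

```python
# Figures quoted in the talk.
BASELINE_IMAGES = 3000
BASELINE_HOURS = 1.5
SINGLE_GPU_USD_PER_HOUR = 1.50


def estimate_training(num_images):
    """Return (hours, cost in USD), assuming time grows linearly with data size."""
    hours = BASELINE_HOURS * num_images / BASELINE_IMAGES
    cost = hours * SINGLE_GPU_USD_PER_HOUR
    return hours, cost


for n in (3000, 20000, 50000):
    hours, cost = estimate_training(n)
    print(f"{n} images: about {hours:.1f} hours, about ${cost:.2f}")
```

Taken literally, linear scaling puts 20,000 images at around ten hours rather than six, or roughly $15 on the single-GPU instance at the quoted rate.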