Citizen Science with Python

Video in TIB AV-Portal: Citizen Science with Python

Formal Metadata

Title
Citizen Science with Python
License
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Release Date
2018
Language
English

Content Metadata

Abstract
You could make a difference in the world with a little science and Python. We'll look at several data-driven humanitarian and healthcare projects developed using Python and, all going well, run some audience experiments. By the end of the talk I hope you'll be looking to run your own experiments with the scientific Python stack.
So, welcome. I'm really happy to be here; this is a huge honor to be given a
chance to give a keynote. I'm always pushing data science on everyone, so my goal this morning is to educate you, convert you to the field of data science, and then bring you into my meetup, into PyData, and into the general data science world. So, I'm an engineering
data scientist. If you're outside the data science world, you'll have heard a lot about data science: it's one of the sexiest things at the moment and it gets all of the mind share. Many data scientists come into this world via PhDs and postdocs. That's not me: I came in via a theoretical computer science background 15 years ago, so I've taken the other route into data science, and I'm talking to you today more as engineers coming into this world than as academics coming into it. I've been running my own company for nearly 15 years. I coach, I train, and I act as an interim senior data scientist in teams that are lacking a senior. One of the reasons I do that is that I love to learn new things, so I keep challenging myself to go off and work in new domains. A couple of the talks today have a medical focus, but I've worked in a number of domains, just because I really enjoy learning new things, and then I love sharing the things I learn. Where possible I run experiments, including on my wife, as you've already heard, and then see what I can learn. Along the way I've written a book, High Performance Python, with my co-author Micha, who was at bitly and has since moved on to Cloudera, I think. I co-founded the PyData London meetup, which Alexander has mentioned. I'm really proud of PyData: we've got over a hundred PyData events around the world plus a set of conferences, and of those hundred, my London one, built with Marco, who's here, and a number of other colleagues, is the largest in the world. I'm super happy about that; we've got seven and a half thousand members. I'd love to invite some of you to come and join us at PyData London, but we've also got PyData Edinburgh and PyData Cardiff in the UK, there are a number of PyDatas throughout Europe, and there's a PyData Frankfurt, I'm reminded. Really, if you're interested in this at all, it's a very friendly and welcoming community. It's all
about all the nice things of the Python community, with people who like talking about data, so if you're at all into that, go and join. I consult through my company ModelInsight, and I work with companies like Mitsubishi, finance and banking firms, and Channel 4. I've been working with QBE, a very large insurer, helping them figure out how to apply machine learning to their insurance business. So I work with large companies on large projects; they can be very slow and they can have a big impact, but they're big corporate things, and that's not what I'm talking about today.

I'm going to give you some stories on citizen science. These are small projects, either individual or lightweight ones inside various organizations, and they're public projects. I'll be giving a crowd-led demo, so I'm sacrificing chickens to the demo gods here: I'm giving a live demo with JupyterLab, so that those of you who haven't seen the JupyterLab environment before can see how a data scientist might work, and you might be inspired to go and try this as well. And I need you to participate with me. I'm going to be sharing a link to a Google Form, twice. No login is required: you just visit the link, there's a single question, you type in an answer and you hit submit, so you can do that on your phone. I know the Wi-Fi in here might be a bit tricky, but the form is very lightweight and works over a 4G connection, so for anyone who's got a mobile phone, I would very much like you to take part in this little experiment. You go to this form twice, type in a number and hit submit, and having done that you can submit another one, so if your neighbor doesn't have a device you can pass yours over and let them submit their answer as well. That's all: no logins, no complexity at all. There are also two appendix slides that I'll show; they've got all the links to the talks behind the stories I'm using here, so you can follow up and learn more about them. So first of all, I'm going to talk
about Macedonian air quality. Has anyone been to a city with really bad air quality before? OK, so a bunch of us have. When I was at PyData Amsterdam about six weeks ago, I met a chap called Gorjan Jovanovski (I think that's his name), and he told the story of the Macedonian smelly fog. As he tells it, every year the smelly fog descends. This isn't an unusual photograph, and that isn't a strange cloud layer over the city: this is the smog in the city, taken from above looking down, and this is what the populace lives in. You can see some of the skyscrapers just peeking through at the top there. This bad weather descends for many months of the year, every year. It's a known thing; everyone just says, "That's the smelly fog, there'll be rain later, it'll clear, it'll be OK." In between, the government issues warnings: anyone with breathing difficulties, or any baby, maybe shouldn't leave the house today, because the air is particularly lethal. And then they changed the limits. When my wife and I lived in Chile we had a similar thing: the government would change the red levels and raise the limits for what it meant for a day to be a red day, when you shouldn't leave the house. But it's kind of terrifying when you can't see down the street because the pollution is so thick. Gorjan was learning programming at the time, so he took some government open data about pollution measurements, and because he was still learning he wasn't very confident in what he was doing; this was about five years ago, I think. He takes the data and draws some graphs, and he's sure he's made mistakes, because when he draws the graphs the numbers make no sense. It's an undocumented dataset, so he hasn't got any guidance there, but every time he draws these numbers they're crazy: significantly higher than anything he sees around him. He does some reading online, and these numbers are consistently four times higher than the bad pollution that he's
read about in Beijing, and, in the worst case, 20 times the numbers expected in the EU, and these are daily readings that he's experiencing. After a while he realizes: oh, they're true, these numbers are actually correct, and he's the first person he's found who's playing with this dataset. So this is really awful, right? There's killer air at 20 times the EU pollution limits, and no one in the country is talking about it. He writes a web page graphing these results, and then he finds other people who care about this topic; it turns out a lot of people in Macedonia care about the fact that they're being poisoned on a daily basis. So he finds some people, they popularize these results, and within a month they've got a million people consuming them, first from a web page and then from a mobile app. It gets to the point in that picture on the right there: that's a member of parliament holding up printouts from the website in Parliament, discussing the fact that this is an issue that transcends nationality, sex, education and wealth bracket. This is the air that everybody, every politician and every child, is breathing, and it's killing them all, and maybe this needs to be discussed. The incumbent government doesn't want to discuss it, because it's a bad topic for them, while the opposition that wants change is talking about it and using it to generate some action. There was an interesting part in Gorjan's story where the Minister for Ecology goes online, I think on to the radio, and says: it's all lies, the data is wrong, it's a conspiracy. So Gorjan comes back and says: look, I'll come into the government with my data and compare it to your official data and your paper records; if I'm wrong, I'll apologize, delete the app and remove everything, and if you're wrong, you resign. And that was the last they heard from the Minister for Ecology. Then I asked what you'll all want to know: how
did you get the data? If the data is bad, why would the government release it? Well, Macedonia wants to be in the EU. The EU provided these sensors, and a requirement of having the sensors is that the data has to be published. The data was published: not documented, but published. So the data was made available, but it wasn't promoted, documented or investigated in any way, and it took someone like Gorjan to go and do something with it. They're improving on this now: they've gone from a single dump of data to frequent updates, and it drove government policy change. During the election, the challengers promised rapid improvements in air quality standards; they won the election, and nothing changed after that, so this is clearly going to be an ongoing, slow process. But some things did change. Using the mobilized population, they tracked down a highly polluting incinerator. It turns out it was a British-supplied, highly polluting incinerator, something we in the UK got rid of over a decade ago because we wouldn't meet the EU limits with it; the Guardian wrote about this in 2001. So this was gifted out there, and as I read it, in fairness, it was better than what was available out there at the time, so it was an improvement; it just should have been better. As a result of highlighting the problems with this unit (it was running 24/7 rather than to a strict timetable, and it didn't have some of its filters on), they got it fixed, so it's now far less polluting. The big step up they're taking now is a collaboration with the European Space Agency and the Copernicus project, looking at real-time satellite data, which doesn't depend on where a government places its sensors (which may or may not be sensibly located), and they're beginning to analyze that to drive further change.

So what can we learn from this? The simple lesson here is: draw a graph of unseen data. If you
can find a dataset that no one's looked at (and there's a lot of open data; I've got links to that in the appendix here), go and find some data that no one's drawn before, and draw it. You can draw it in Excel if you want; a lot of this stuff is really easy, typically CSV files. Draw the data and tell a story, find some people if you want to try to get some change around this, and then see where that can take you. This is a really easy entry point into data: just getting some data and drawing it. There's an additional slide here that I put in a month ago: at PyData London my colleagues Robert and Olivia talked about their personal air quality project. They're working on a Raspberry Pi device with a low-cost sensor; the kit costs about 60 pounds. They can use it for monitoring in the house (there's the infamous dirty sausage story, which you can see told in that presentation if you go and watch it), and what they're doing is mounting these sensors on the backs of pedestrians and cyclists, so as they go around they're monitoring their own personal air quality as they travel through the streets. That way they can make better choices about the streets they take and the pollution they may or may not breathe in. There's a talk this afternoon by Douglas Finch on air quality; if you're interested, I'd suggest you go and attend.

So here's the first audience participation moment. I would like you to guess the weight of my dog, and you know nothing about my dog, so this is a wide-open, simple survey. The link is bitly.com/keynote801 (and you can guess that the second one will have a two on the end, but don't go there yet), so bitly.com/keynote801. There's no sign-in, and you can go through on your mobile phone; that link will appear on the next set of slides, so you can go in there in the next couple of minutes. Please guess the weight of my dog in kilograms. Only put in a number; if you put in any text it'll get stripped out. And I'm
going to give you no information right now. Later on I'll give you some more information, you'll make a second guess, and then we'll compare those two sets of results in a Jupyter notebook. So just a number: no negative numbers, kilograms, nice round numbers, nothing too crazy. Please don't be the clever person who, when I gave this last time, typed in NaN to see if my parsing code works. It does, but there's no need to test this. Kilograms, kilograms please. Yes: when I ran this last time I deliberately left the units blank, and immediately an engineer I know dived in with requirements for units, and I love that, but yes, kilograms only.

So, we've mentioned my wife's sneezing. She's here in the audience, and I love the fact that she supports me in running these experiments on members of my family, including her. My wife sneezes a lot, and when I say a lot, there's a histogram in the bottom right-hand corner. We wrote an app in which she records every time she sneezes. The left-hand bar is the days when she sneezed zero times: over the course of about a year, there were about 35 days when Emily didn't sneeze at all. The next bar is the days when she sneezed once; that's about 20 days. Then two times a day, three, four: she sneezed about four times a day on about 40 days over the course of the year. The far right side is 28 sneezes in a day; that was a particularly bad day. The question was this: Emily was a mobile developer at the time, so could she write an open-source app that would benefit other people suffering from different conditions, a generalized app for personal medical healthcare? And could I analyze the data to find possibly correlated, possibly causal connections between events, to see what might drive this sneezing? So we had a hypothesis: there are environmental factors that drive sneezing. If we record all of these factors, can we do something about
it? So this is the app that Emily built. It's an iOS-based, open-source app. It has an event log with a simple button interface: you just tap when something has happened, such as a runny nose, itchy eyes, or "I've sneezed". We talked about whether you could use the device to record sneezes automatically, since you get a physical jerk and a loud noise, but that would be quite a lot of work and I didn't want to go quite that far; tapping a button was easy enough for the first version of the experiment. But I can see lots of ways you could automate parts of collecting your personal reactions over time, if you suffer in that way, which we might imagine seeing in future devices. Oh, there's a hand up. Oh yes: for the survey, please just one answer per person, and then there'll be the second survey, where you put in a second answer later on. Thank you. So, it's an open-source app with an editable history, and it records GPS traces. I will say one thing about the GPS traces: I take periodic updates from Emily and then I do the analysis, and it was really weird realizing that I had the same kind of view that Google and Apple have, watching a person's movements over time. It felt incredibly intrusive, and of course it is incredibly intrusive. I only get it with a lag, but nonetheless you get this view, and it's a view that Apple and Google and any other controllers of our data, or any mobile phone company, have all the time. If we aren't looking at it we never think about it; we just take it for granted. But once I actually had it in my hands, it was pretty weird. That's one of the reasons I encourage people to run these kinds of experiment: it makes you think a little outside what's normal in your everyday life, about how you're interacting with the world and the data that's available. So we've been gathering all this data, and there are a number of things we've got out of it. I've given a couple of talks on this; I'm going to show just one little result here.
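The sneeze histogram described above (how many days had zero sneezes, one sneeze, and so on) can be rebuilt from a timestamped event log like the one the app keeps. Below is a minimal sketch in plain Python; the `(timestamp, event_name)` log format, the `"sneeze"` label and the helper names are hypothetical, not the app's real export format. The one subtle step is that days with no sneezes never appear in an event log at all, so the sketch walks every calendar day in the range and fills in zeros.

```python
from collections import Counter
from datetime import date, datetime, timedelta

# Hypothetical event log format: (timestamp, event_name) pairs.
# This is an assumption for illustration, not the app's actual export.
EventLog = list[tuple[datetime, str]]


def sneezes_per_day(events: EventLog, start: date, end: date) -> dict[date, int]:
    """Count 'sneeze' events for every calendar day in [start, end], including zero days."""
    counts = Counter(ts.date() for ts, name in events if name == "sneeze")
    days = (start + timedelta(days=i) for i in range((end - start).days + 1))
    # Days with no recorded sneezes are absent from the log, so add them as 0.
    return {day: counts.get(day, 0) for day in days}


def daily_histogram(per_day: dict[date, int]) -> dict[int, int]:
    """Map 'sneezes in a day' -> 'number of days with that many sneezes'."""
    hist = Counter(per_day.values())
    return dict(sorted(hist.items()))


if __name__ == "__main__":
    # Tiny made-up log, purely to show the shape of the output.
    log = [
        (datetime(2017, 3, 1, 8, 15), "sneeze"),
        (datetime(2017, 3, 1, 9, 40), "sneeze"),
        (datetime(2017, 3, 1, 9, 45), "antihistamine"),
        (datetime(2017, 3, 3, 18, 5), "sneeze"),
    ]
    per_day = sneezes_per_day(log, date(2017, 3, 1), date(2017, 3, 4))
    print(daily_histogram(per_day))  # → {0: 2, 1: 1, 2: 1}
```

Dropping the zero-filling step would silently remove the left-hand bar of the histogram: the days when Emily didn't sneeze at all.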
We'll be looking at a single-patient antihistamine effect. Emily sneezes a lot, and she takes antihistamines roughly every other day. On a day when an antihistamine is taken, what is the effect? Clearly, Emily thinks she
needs the antihistamine: she's sneezing, she already feels like she's sneezing, so it's a day with a high propensity to sneeze. So what effect does the drug have? On the left-hand side we've got all of the traces of when individual sneezes occurred, covering a period of 12 hours after the first (in fact the only) antihistamine of the day was taken. Emily taps away whenever she sneezes, and she has already recorded that an antihistamine was taken. So if I take those days, align them so that hour zero is when the antihistamine was taken, and count all of the sneezes, we get a single count per hour, and that's the blue line on the right-hand side. In hours zero and one after the antihistamine was taken the sneezes are high; they're close to fifty. Two hours after the antihistamine was taken we see a marked drop: the total number of sneezes over all of these antihistamine days is markedly lower, it stays low for about eight hours, and then it increases again. We might ask what's driving that. The dotted line behind it is just an extrapolated line that I put together. I know that the antihistamine Emily was taking at the time takes about two hours to have an effect, to enter the bloodstream, and then it follows an exponential decay curve, so it drops off with a certain half-life. So I can plot that extrapolated line based on a simulation, and we see that the two-hour point is when the sneezes drop, and then, as the drug decays to a certain point, the sneezes pick up again. Of course, this kind of effect applies to everyone, but in different ways based on personal biology. Depending on the kind of medication you're taking, you might have a different reaction, a different effect: with this particular drug it might last for days or only for hours, and other drugs might work in different ways, better or worse. So here's a nice, simple way to record the data and see how
it works for you, to improve your own personal healthcare.

Now, I had the strong hypothesis that there were causal factors in the environment driving the sneezing, and I worked awfully hard, really, really hard, with a couple of colleagues, trying to find any evidence of this causal connection. We couldn't. We found one result: a weak relationship with humidity. As the air got drier, the propensity to sneeze increased; as the air got damper, the propensity to sneeze decreased. It turns out that the nasal lining, the mucous membrane in the nose, is more irritable when it's drier, so, all things being equal, you're more likely to sneeze just because the nasal lining is drier. We can't control humidity, but it is interesting at least to find a proper result in there. Now, we escalated this and took it to a King's College professor, one of the top professors in the world, whom we connected with via our PyData London community. He said this is an amazing result: clearly this is a non-allergic reaction, a chronic, persistent rhinitis, so Emily is primed to sneeze just because that's the way her body works, and there are no environmental factors. We had data for different countries and different seasons, different allergen types in the air, and the kind of travel we were doing at the time, London Underground, buses, all sorts, and there was no connection at all with any of that. He did suggest a new treatment, which we tried, and we didn't get any improved results out of it, so it had some benefit in ruling out another treatment method. I mean, the antihistamine works just fine, but we were looking to see if there was a better solution. The important takeaway here is that graphing was enough to get a diagnosis. The machine learning did give us something new, but you don't need to go all the way through to machine learning in a data analysis project. Typically, getting good data, or good enough data, drawing graphs,
and having someone who can interpret them is what you need. That's the key takeaway here, and I'm going to repeat it a little bit more. You might want to see Marco Bonzanini's "Lies, Damned Lies and Statistics" talk in a couple of hours' time, where he goes into some of the issues around data analysis; if you're interested in this, that might be a good little step forward. OK, second guess-the-weight exercise.
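Comparing the two rounds of guesses in a notebook, as promised, mostly comes down to cleaning free-text answers and then summarizing each round. The sketch below assumes the form answers arrive as raw strings; the sample answers are invented placeholders, not the audience's real guesses, and `clean_guess` and `summarize` are hypothetical helpers. It keeps the first number in each answer (so "15kg" becomes 15), rejects NaN-style, text-only and non-positive entries, and reports the median, which is robust to the occasional silly guess, alongside the spread.

```python
import re
import statistics
from typing import Optional

# Invented placeholder answers for the two form rounds (NOT real survey data).
ROUND_1 = ["20", "15kg", "  30 ", "nan", "-5", "lots", "12.5"]
ROUND_2 = ["22", "20 kg", "18", "25", "NaN", "21"]

# First (optionally negative) number in the answer, e.g. "15kg" -> "15".
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def clean_guess(raw: str) -> Optional[float]:
    """Strip any text and return a positive weight in kg, or None if unusable."""
    match = NUMBER.search(raw)
    if match is None:  # no digits at all, e.g. "lots" or "NaN"
        return None
    value = float(match.group())
    if value <= 0:  # the talk asks for no negative numbers
        return None
    return value


def summarize(raw_answers: list[str]) -> dict[str, float]:
    """Clean one round of answers and return its size, median, mean and sample stdev."""
    guesses = [g for g in (clean_guess(a) for a in raw_answers) if g is not None]
    return {
        "n": len(guesses),
        "median": statistics.median(guesses),
        "mean": statistics.mean(guesses),
        "stdev": statistics.stdev(guesses),  # needs at least two valid guesses
    }


if __name__ == "__main__":
    for label, answers in [("round 1", ROUND_1), ("round 2", ROUND_2)]:
        s = summarize(answers)
        print(f"{label}: n={s['n']}, median={s['median']:.1f} kg, stdev={s['stdev']:.1f} kg")
    # → round 1: n=4, median=17.5 kg, stdev=7.7 kg
    # → round 2: n=5, median=21.0 kg, stdev=2.6 kg
```

Comparing the two spreads is the interesting part: it shows whether the extra evidence from the photos pulled the crowd's guesses closer together.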
I've got an English Springer Spaniel. You get some sizing evidence from those photographs, and more photographs appear on some of the subsequent slides. Then there'll be this second link, bitly.com/keynote802: just go there and give a second guess for her weight in kilograms, a number only. I'll let you look at those lovely photos for just a minute; I'm going to move on, but you'll see a few more photos, so you can make your guess in a minute or two, when you've seen a few more, if you want. So that one's my dog, who clearly puts up with sensors as well: that's a video camera on her back, as one of the experiments.

So: updating outdated medical results. This was a talk given at PyData Warsaw last year by Anna, in which she looked at updating outdated medical results; it was a really nice lightning talk. I didn't realize this, but it turns out that in birthing centers and maternity units, when a woman is coming up to giving birth, there is a critical curve, developed about sixty years ago by Friedman, which is used to judge whether the woman is on track to give birth, based on time and cervical dilation. At ten centimeters the baby is ready to come out, so you want to track that the cervix is dilating appropriately over time, and if the woman is progressing too slowly (failure to progress, I think, is the technical term), then you need to intervene to make sure the baby comes out successfully. Hospitals around the world typically use the Friedman result from sixty years ago. That result was developed when we had different technology: women gave birth at different ages, women had different levels of health, drug and mechanical interventions were very different, and our understanding of the body was very different. And yet sixty years later we still use the same guidelines, and increasingly around the world there is discussion about whether this is actually wrong. So Anna was part of a team looking
into how this might be wrong and how it might be fixed. The important point is that when a doctor chooses to intervene because of a failure to progress (that nice phrase), the intervention is either drugs or perhaps a cesarean operation, which could have significant negative impacts on the patient and on the baby. So the question is: do you need to worry at this stage, or are we actually intervening too soon? She and her colleagues conducted experiments and made recordings on a couple of hundred mothers, first-time and second-time mothers I think; the link is in the appendix, and you can go and watch the talk for the details. They recorded the cervical dilation of all of these mothers over a number of hours, about 12 hours I think. In those box plots, each hourly box represents the majority of the mothers' readings at that hour. At the one-hour point, cervical dilation was between zero and three centimeters; by the four-hour point it's between about three and eight centimeters; and typically by about six hours at least some of the mothers have reached ten centimeters of dilation, the baby has popped out and they're finished, while other mothers are still progressing. This center doesn't practice cesarean operations or drug intervention, so they typically see all of their mothers through to a successful delivery without intervening, although medical facilities are available if an intervention were required. Lots of other hospitals follow the Friedman curve and intervene early if they believe it's necessary. The red line is the Friedman curve: if a mother is above this curve, she's progressing on track or faster than expected, and that's fine. If she's below it, on the right-hand side, like those black dots below the red
line (each black dot might be one or more mothers at that point), then she's not progressing fast enough, and that's when a doctor has to intervene according to the classical result. But all of these mothers had no intervention and gave birth successfully. So this is part of a growing body of evidence being gathered in birthing centers around the world showing that this intervention strategy is, or could be, inappropriate, and that some refinement is required to improve the quality of healthcare for these mothers. So what did they do? Just
having graphed this and shown it would have been useful, but Anna took an extra step: could they give an interpretable result that staff in the healthcare unit can use? They used machine learning to
develop a decision tree. From a machine learning perspective this is an incredibly trivial result: it's a really, really simple, old-fashioned single decision tree. It's not deep learning, it's not big data, it's none of the buzzy things, but it's an incredibly useful result, because it's interpretable by the staff at the birthing center. It's effectively a flowchart that says: help me make a better decision than the one available in the textbook. You can see it here: if you're a first-time mother, go to the left side; then, based on your weight, go left or right; then, based on your height, go left or right; and then we predict how long it should take you to have the baby. If you're not progressing appropriately within that time, that's a secondary bit of evidence suggesting that maybe an intervention is required; or else you're on track, you're under the time, and everything still looks sensible. This has been introduced to the staff there; they like the idea, they want to do something with it, and they're doing further experiments. So what are our lessons here? Well, check for outdated assumptions. Many of you work in organizations that are old, and large, old organizations carry lots of historic baggage. Maybe some of that baggage is outdated (lots of it probably is), and if you fixed some of it, just by reviewing the data you've got available, maybe you could make better decisions. Maybe that saves time or money, or improves people's interventions, or whatever the metric is that you want
to use. People forget to go and check these outdated assumptions; they just become a matter of fact. But if you've got access to the data, because you've got access to a database or an Excel spreadsheet or whatever it is, maybe you can go and draw some graphs and think about interpreting that evidence in a way that helps make better decisions. One of the important outputs there is to make interpretable advice: don't build a really complicated system just because you can; instead, make something that is interpretable by your colleagues. One of the big challenges I've been talking about in my public talks over the last couple of years is interpreting machine learning outputs, so that you can go to a colleague who doesn't do machine learning and explain why the system is saying a certain thing, and that flowchart, that decision tree, is exactly the kind of output you want. So if you want to make a guess for my dog's weight, having seen some more photos, now is about the time to do it, at keynote802.

I think we've run out of pictures, as I go on to the last of the stories before we do a little demo: where are the orangutans? My colleague Dirk Gorissen runs the London Machine Learning meetup, which is a sibling to my PyData meetup, but it focuses less on the data science story and far more specifically on machine learning and advances in machine learning. It's a similarly large, very popular meetup, hosted by the same hedge fund, AHL, that hosts my meetup, and we're both super grateful to that company for hosting us. Both my meetup and Dirk's have about 200 attendees every month, with free beer and free pizza, fully hosted, which is the size of a small conference, for free, every month. That's a lovely example of community contribution helping us progress our own goals. So Dirk runs this machine learning meetup, and he's got this personal
project. Some years ago he was involved in a company, a commercial organization, looking to track animals in the wild to see if you could intervene and monitor them to provide better care for animals in the world. That company didn't work out, but he managed to acquire some rights to carry on working with the underlying technology, and he found a charity that wanted to work with it, specifically around orangutans. It turns out orangutans are very bright primates. They can be kept as pets, and then they get bigger and less cute, and people just get rid of them. They live in areas that suffer from deforestation and farming, and they can be mistreated, and so you have aid agencies (that's the picture in the middle) that rehome the animals that have been found. One of the problems with rehoming is this: once you've rehomed an animal, how do you know you've done that successfully? How do you know that the animal is happy in the new environment, has integrated, and that your strategy for rehoming is a good one? If you can demonstrate success you are likely to raise more funding, and if you can't demonstrate success you've got a problem, and of course you want to succeed at rehoming these animals into a nice environment. The way you do this is you take a little radio transmitter (that's the device on the right) and you embed it in the body, under the skin. You can't put a big tracker on them: these are very bright creatures, they don't want a big bracelet strapped onto them, they don't have necks, just big thick stubby bits, and whatever you try to adorn them with has to survive years in a rugged environment with an animal that's not afraid to be a bit heavy-handed. So they put in these subcutaneous trackers. One of the problems there is limited range: you've got a radio tracker that gives out a weak signal, and the way you track it is that a human turns up at the point where they saw the animal yesterday (that's their best guess as to where the animal is today) and walks around
with a radio tracker. If it starts beeping, brilliant, that means there's a signal within 200 meters in dense jungle, and they walk around, back and forth, trying to make the signal stronger. If the signal gets stronger, hey, they've found the ape, brilliant, and if they don't, well, they try again tomorrow. At the beginning, when they release an animal, they have teams of two tracking 24/7 for several weeks; then it becomes more intermittent, and then coincidentally they discover other animals that were released and they can start tracking them. But it's kind of patchy and it's really time intensive. So can we automate this? That's Dirk's project: can we use drones to automate this? Really sensible idea. Can you send a drone back and forth across the sky with a radio receiver, picking up the radio signal, processing it and then providing some kind of GPS location? Really sensible idea. It turns out that doing this on your own, when you don't have a background in, for example, audio and radio signal processing, drone dynamics, automated flight systems and the like, means that it takes some time to build up. Now, Dirk's a very smart guy. He also works on autonomous self-driving vehicles at a large, well-funded company, so he does have a good, strong background in engineering and robotics, but nonetheless building a drone to fly autonomously in a jungle is a non-trivial operation. If you watch his keynote talk, he talks about the Python-powered software-defined radio behind this. They have to pick up the raw radio signals over quite a wide spectrum and then do post-processing: things like the humidity in the jungle affect the signal propagation and the wavelength being used, so they have to process the recording to find these pings. There is no simple API that just finds the pinging device; they have to do the raw processing themselves when the drone comes back. That means you send this drone off, it flies a flight path, and it comes back.
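The talk doesn't describe how the pings are actually extracted from the raw recording. As a rough illustration only, here is one simple way to find short bursts in a recording: compute the energy of fixed-size windows, estimate the noise floor as the median window energy, and flag runs of windows well above it. This is a minimal sketch under loud assumptions: the samples are taken to be already-demodulated real amplitudes, the window size and the 5x-median threshold are arbitrary choices, and `window_energy`, `detect_pings` and `synthetic_signal` are hypothetical names, not part of Dirk's software.

```python
import statistics


def window_energy(samples, window):
    """Mean squared amplitude of each non-overlapping window of `window` samples."""
    return [
        sum(s * s for s in samples[i:i + window]) / window
        for i in range(0, len(samples) - window + 1, window)
    ]


def detect_pings(samples, window=4, factor=5.0):
    """Return (first_window, last_window) pairs for runs of windows whose energy
    exceeds `factor` times the median window energy (a crude noise-floor estimate).
    Assumption: pings are rare, so the median window is mostly noise."""
    energies = window_energy(samples, window)
    threshold = factor * statistics.median(energies)
    pings, start = [], None
    for i, energy in enumerate(energies):
        if energy > threshold and start is None:
            start = i  # a burst begins
        elif energy <= threshold and start is not None:
            pings.append((start, i - 1))  # the burst ended on the previous window
            start = None
    if start is not None:  # the recording ended mid-burst
        pings.append((start, len(energies) - 1))
    return pings


def synthetic_signal():
    """Hypothetical test recording: weak alternating noise with two strong bursts."""
    samples = [0.1 if i % 2 == 0 else -0.1 for i in range(80)]
    for i in list(range(20, 28)) + list(range(60, 64)):
        samples[i] = 1.0 if i % 2 == 0 else -1.0
    return samples


print(detect_pings(synthetic_signal()))  # [(5, 6), (15, 15)]
```

Multiplying a window index by `window` and dividing by the sample rate would give a time offset, which could then be matched against the drone's own position log to build a signal-strength map like the one from the test run; how the real system does that step isn't covered in the talk.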
Hopefully it comes back, and then you can process the data to find out what it has recorded. It's not a real-time system, which can lead to some problems. Here are the results from one of their test runs. They were releasing an orangutan called Suzy, and they knew where she was because she was being led away by a keeper to be released. So they took the drone (I'll show you some videos of this drone in just a minute), set it off, and it starts flying. In the middle top diagram, the green one, you can see those black dots: it's basically flying up and down, like going up and down a field, but over an area of jungle. Then it gets to the bottom and flies straight back up as it returns home, and when it gets home you can process the data. Now, Dirk developed this in the UK, and heathland in the UK is very different to Bornean jungle; when they were there they discovered they had to fly the device lower because the signal quality was worse. Nonetheless, you can see areas of poor signal and then a bright, strong signal, and that's where this orangutan was, so they had a successful test flight. The question then was how to take this further and do some more work. I'll show you two videos. There's no audio; there should be audio, but it didn't want to work, so
you're going to have to pretend that there's a buzzing sound, because that's all the audio really is here. So here we've got the drone that Dirk is using, and what's recording it is a professional drone with a camera rig, which is incredibly stable, so you
can see this other drone in the background. It's now going off on an autonomous flight, just a calibration run, and so it flies off across... This is out in the jungle, but on the edge of the jungle, a very safe area where they're developing. So this thing flies out, it's all very sensible, and then I think
we see it somewhere in here... You can just see the shadow coming down the middle, and then that's the drone going down to land. So it flies autonomously, brilliant, in a nice wide
open area. Then you get to the release site, and this footage is from the drone itself.
This is Dirk's unit flying up. This is one of the test runs; we know it's a test run because it came back and they got the video. So this one flies up. You notice the hole in the canopy there, and then you notice there are no other holes in the canopy around it, so when this thing flies off it has to fly back to exactly the right spot to come down and land. It's got to fly off on quite a large route, tens of kilometers of flying, out where there's no radio link. So this thing flies up, and I think it flies around a little bit, and you get the idea: dense jungle. You can't see the orangutan from the ground, and you can't see the sky, so you can't see the drone, but then this one comes all the way back.
And then it comes in and it lands again. So this is nice: it comes in, it successfully lands, and everybody's happy. They're ready for this; they know there's an orangutan out there. This was a year ago (Dirk was out a couple of weeks ago for a second run), but when they were out a year ago they knew roughly where they wanted to send this device. So they say go, and the device takes off. He's got a little signal tracker, so he knows the device is in the nearby area; it flies off, they hear it go, and then it's going to be gone for some period of time.
Fair enough. And then they wait, and they wait, and they wait, and then the signal tracker shows a bit of signal, so the device is coming back... and then there's nothing. No, wait... no, no, no... and then there's nothing. So they decide: right, we've lost the drone, and it's quite an expensive, big bit of kit that has disappeared in the jungle somewhere. Looking back at the maps, it turns out they thought they had a flat elevation as they were flying a crisscross pattern, but on the way back there was some kind of knoll somewhere in there, so the trees were higher. It's not a smart device, so it flew into a tree, probably crashed, and that was that. Actually, some time later, the aid agency found the remains of the drone crashed into a tree, which was exactly what had happened, and they sent it home. So it was a disappointing first result, but it did prove that this thing works. If you follow the keynote you will see that Dirk had lots of problems even getting lithium-ion batteries out of the eurozone; he lost some of them along the way when they were seized by customs. Once you're out in the middle of nowhere you've only got so much kit you can carry, and then something else breaks and you have to start jerry-rigging parts that might just about keep it going, so it's quite difficult to keep this kind of thing working. But the aid agency funded another device, and they went out and did a test run again. I was hoping to get some video of that second run; Dirk tells me that it worked better this time and he got the device back, but he doesn't have successful results yet. They're going to continue with this project, and there are links in the appendix if you want to read about it and follow where this project might go next. So hardware is hard. I mean, hardware really is hard, it's always hard, but freeing up human time is valuable. If you could free up those
tracking humans who wander around with a radio device just listening to it, and let them go and intervene more successfully and track more animals more consistently, they can only make better decisions with that kind of result. So if you tackle any kind of hardware problem, always expect to iterate a lot. Always break it down into a project you can achieve in stages; even with something like that handheld air quality monitor, break it down into tiny, achievable stages. So now we're going to do the live demo, and we'll see if this works. I'm a little bit nervous, because now
I've got to fetch the data from your surveys. If you remember, I asked you how heavy Ada was without showing you any evidence of what kind of dog she was, and then again after showing you evidence of what kind of dog she was, so we should see two different distributions of data, and maybe we can learn something from that. As a data scientist I use Jupyter Notebooks, and this is the new JupyterLab interface. It's a web-based, interactive Python environment where you can do charting and graphing, 3D plots and JavaScript, and you can query SQL and big data systems and CSV files and anything else you need, and you can develop in a way that makes for easy demonstrations. If you've never used it before, I recommend you give it a go. You may recognize some of the code, but I'm not really going to go into the code that's here. So I'm just going to load in... let's see if this works over 4G... all right, we've loaded the data, we've got the data files down fine. These are the last rows from the last time I ran this, but the rest of the data will be the answers that you've put in. Oh, good grief: a mean of infinity and a standard deviation of NaN. That's after putting in the most robust parsing process I could, and last time it ran just fine. Well, that might be annoying; if it fails I've got the pre-rendered demo and I'll have to improvise. Let's see... good grief, who put in a range? Okay, let's skip that one and go to the next. There we go. So what I would have shown you, and what I will now show you, is a pre-rendered version. One of the first things you always do is load in the raw data and look at it, and then you process the data to get rid of outliers and weirdness, so we can look at the version that's hopefully a bit more sane, with any mistakes that might have crept through removed. I'm very curious to see what mistakes actually crept through, but
I'll debug that offline. So, having taken out some of the unusual guesses that generated infinite results (thank you to whoever did that), we've got 448 responses in the clipped region, which is pretty sensible. For this clipped region I keep any guess of one kilogram or more and sixty kilograms or less. Sixty kilograms is the weight of a large Rottweiler, which is a pretty hefty dog. Dogs do go up to over 100 kilograms, but those are pretty rare; they're bigger than humans and they're fairly terrifying beasts. My spaniel is much smaller, as you saw. So we've got nearly 500 responses, I'm really happy with this, and we get an interesting distribution: a lopsided, skewed distribution with lots of weight on the left-hand side. Many of you are guessing between 5 and 15 kilograms, there's a spike at around 15 kilograms and there's a spike at around 20 kilograms. Now, I expected this: if you don't have any evidence to work from, you're probably going to pick a round number that's pretty sensible, just not obviously wrong, so at 15, 20, 25, 30, every five points, we're going to see spikes, which is this kind of artificial result. Then some of you are taking some punts much further out at the larger weights, and if we looked at the raw weights we would see guesses going up to a hundred plus, I'm guessing, because it's not unreasonable to have a dog that heavy, it's just unlikely. So we've got this skewed distribution with a median guess of 12.8 kilograms: if we sort all of the guesses into numeric order and go halfway along that sequence, the median is the 50th percentile, and that's 12.8 kilograms, which is a reasonable guess when you know nothing in advance. What happens when we introduce some evidence? How do your guesses change? Let's load in the second one: 412 guesses in the second case, with a median of 12. So it turns out you're not all dog fanciers, because you've
all wrong, or a lot of you are wrong. So here we see this distribution, and this is what I wanted to show: by providing more evidence, the distribution has closed down. Those of you who would have guessed higher have probably come lower, and those of you who guessed very low might have guessed higher, so the distribution has narrowed a bit. It's still a skewed distribution: there's still a lot of weight on the left-hand side and a longer tail to the right-hand side. The median hasn't changed very much, which is kind of interesting, but the spike that we saw at 15 has disappeared. There's still one at twenty, but now there are spikes just under ten and just over ten kilograms, which means you're guessing 9 kilograms, 13 kilograms, which isn't crazy at all. She is a smaller dog, but if you're not a dog owner it may be hard to guess her weight. It turns out she's actually 17 kilograms, so you're not too far out. Now, one thing we could do if we want to start comparing the results is to take these two individual sequences of numbers and put them together into a data frame, which holds multiple sequences of numbers, like an Excel spreadsheet with multiple columns. So I combine these two, and then we can look at them. Because there are fewer results in one than the other, one of them has got some missing numbers, and that's fine from the graphing perspective. Then if we draw these and overlay them, we get a simple visual comparison of your before and after guesses: the blue is before and the orangey-red one is after. What we see is that the blues are higher as we go to the right-hand side, so more of you were making larger guesses beforehand, in particular at those round-number points: we see fifteen, twenty, thirty, forty and sort of fifty-five jumping out. Once you've got some more information, your guesses have moved towards the left-hand side, towards the lower numbers, and we see a greater volume of guesses around the ten
kilogram point, and it's all kind of bunched up in there. So the wisdom of the crowd is kind of working here: you've made sensible guesses, but you're not dog fanciers, and this isn't some kind of dog competition, so you don't have great information about what the weight of the dog might be, and so you're not spot on. The actual correct answer is somewhere in here, at a low point in the result, which is interesting, but the purpose of this was to see that the variance of the results shrank, rather than exactly where the median or the mean guess might be. So I'm really happy with that demo, and hopefully what you've seen from it is that you can take some raw data, draw some graphs, make some comparisons, ask some questions and inevitably raise some new questions, which drives you back to the beginning to get more data and draw more graphs, so you go around in a circle. Okay, so it's time to wrap up.
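The demo's cleaning and comparison steps can be sketched with just the standard library. This is a minimal sketch, not the notebook used in the talk: `parse_guesses`, `in_plausible_range` and `summarise` are hypothetical names, and the answers in the example are invented, not the audience's data. It also illustrates one plausible (assumed, not confirmed) cause of the infinite mean and NaN standard deviation seen live: Python's `float()` accepts strings such as `'inf'`, `'nan'` and `'1e999'`, and a single infinite value makes the mean infinite and the standard deviation NaN.

```python
import math
import statistics


def parse_guesses(raw_answers):
    """Turn free-text answers into floats, skipping anything unparseable or non-finite."""
    guesses = []
    for answer in raw_answers:
        text = answer.strip().lower()
        if text.endswith("kg"):
            text = text[:-2].strip()  # accept answers like '12 kg'
        try:
            value = float(text)
        except ValueError:
            continue  # e.g. 'banana' or a range such as '10-15'
        if math.isfinite(value):  # float() accepts 'inf', 'nan' and '1e999'
            guesses.append(value)
    return guesses


def in_plausible_range(guesses, low=1.0, high=60.0):
    """Keep guesses from `low` to `high` kg inclusive (the talk used 1 to 60 kg)."""
    return [g for g in guesses if low <= g <= high]


def summarise(guesses):
    """Count, median and sample standard deviation (needs at least two guesses)."""
    return {
        "n": len(guesses),
        "median": statistics.median(guesses),
        "stdev": statistics.stdev(guesses),
    }


# Invented example answers, before and after seeing photos of the dog.
before_raw = ["12", "15", "20", "30", "5kg", "inf", "banana", "100", "8", "45"]
after_raw = ["9", "10", "11", "13", "12 kg", "nan", "15", "20", "10"]

before = summarise(in_plausible_range(parse_guesses(before_raw)))
after = summarise(in_plausible_range(parse_guesses(after_raw)))
print(before["n"], before["median"])  # 7 15.0
print(after["n"], after["median"])  # 8 11.5
print(after["stdev"] < before["stdev"])  # True: the spread shrank
```

In the talk the two sets of guesses were combined into a data frame; in pandas, building a `DataFrame` from two `Series` of different lengths pads the shorter column with NaN, which are the missing numbers mentioned in the demo. For distributions this skewed and spiky, a robust spread measure such as the interquartile range may be a better choice than the standard deviation used here for simplicity.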
Closing thoughts: it's all about collecting data, visualizing it and then sharing your results. There's an awful lot of hype about big data and deep learning and the cleverest, smartest next thing coming, but almost all of my work with clients involves finding their data, or realizing they haven't got the data they thought they had, fixing it up so that it can be used for drawing graphs, then interrogating people about what it actually means, then providing some results, then iterating and making things slightly more complex, and iterating and iterating. It's all about getting the data and visualizing it, and you've all got access to data. There are datasets in the appendix, and you're very welcome to follow those up when these slides go online if you don't have access to your own data, but working off data you understand is the right way to go. That domain knowledge is incredibly important, and only you have the domain knowledge about the data that you've got. I have a
request of you. If I've made you think about something new, if you're interested in this topic, and if you want to go and make some change in your own environment, I'd love to get a postcard. I've been collecting postcards for the last year; they remind me that these talks actually work, that they make people think about what they're doing, so I've got a lovely collection of postcards at home. If you would like to send me a postcard, just send me an email and I'll send you my address. Wherever you send it from, I just like getting postcards with nice messages saying "hey, you made me think", so if that's something you would like to do, please get in contact. And more importantly, if you haven't yet thanked an organizer and a speaker here, please go and thank an organizer and a speaker. Many people forget that these are volunteer-run events: the speakers put a lot of time in, the organizers put a lot of time in, and people forget to go and say thank you. It's easy to consume from the ecosystem without contributing back, even just to say thank you. We're a lovely group here; Pythonistas are a very lovely bunch, so please go and thank the people around you for the work they've put into this. Right, thank you very much.