
Bayes is BAE


Formal Metadata

Title
Bayes is BAE
Series Title
Part
35
Number of Parts
86
Author
License
CC Attribution - ShareAlike 3.0 Unported:
You may use, adapt, copy, distribute, and make publicly available the work or content, in unchanged or adapted form, for any legal and non-commercial purpose, provided that you credit the author/rights holder in the manner they specify and that you pass on the work or content, including in adapted form, only under the terms of this license.
Identifiers
Publisher
Publication Year
Language

Content Metadata

Subject Area
Genre
Abstract
Before programming, before formal probability, there was Bayes. He introduced the notion that multiple related, uncertain estimates could be combined to form a more certain estimate. It turns out that this extremely simple idea has a profound impact on how we write programs and how we can think about life. The applications range from machine learning and robotics to determining cancer treatments. In this talk we'll take an in-depth look at Bayes' rule and how it can be applied to solve problems in programming and beyond.
Transcript: English (auto-generated)
What if you could predict the future? What if we all could? I'm here today to tell you
that you can. We all can. We have the power to predict the future. The bad news is that we're not very good at it. The good news is that even a bad prediction can tell
us something about the future. Today, we will predict. We will learn. Today, we will discover
why Bayes is BAE. Introducing our protagonist. This is Thomas Bayes. Thomas was born in 1701.
Maybe. We don't exactly know. He was born in a town called Hertfordshire. Okay. Possibly. We
can't know for certain. We don't actually even know what Bayes looked like. What we do know is that Bayes was a Presbyterian minister and a statistician. We also know that his most famous work, the paper that gave us Bayes' rule, was not published until after his death.
Before this, he published two other papers. "Divine Benevolence, or an Attempt to Prove That the Principal End of the Divine Providence and Government Is the Happiness of His Creatures." Yes, that is one title. As well as "An Introduction to the Doctrine of Fluxions, and a Defence of the Mathematicians Against the Objections of the Author of The Analyst." You
know, I like my titles a little bit shorter, but everybody has different preferences. So, why do we care about this? Well, Bayes contributed significantly to probability with the formulation of Bayes' rule. And again, even though it wasn't published until after his death,
let's travel back and put ourselves in the mind of a commoner of the era. So, the year is 1720. Sweden and Prussia just signed the Treaty of Stockholm. Anna Maria Mozart, the mother of the
person who wrote the Requiem that we just enjoyed, Wolfgang Amadeus Mozart, so not Mozart, but his mother, was born in 1720. And statistics is all the rage, as well as probabilities. At the time, we can do things like say, given we know the number of winning tickets at a raffle,
what is the probability that any one given ticket will be a winner? In the 1720s, Gulliver's Travels was published. This is 45 years before the American Revolution, 45 years before, you know,
we went to battle with Britain and gained our independence. And also in the 1720s, Easter Island is discovered because people knew it was there before, but the Dutch didn't. So, and I don't know if you know this or if you've seen this, but there's actually a lot
more to the statues. There's a lot more underneath the surface, which is also very true of probability as well. See, we knew how to get the probability of a winning ticket; what we didn't know how to do was the inverse. An inverse probability says that, okay, well,
if we draw 100 tickets and we find that 10 of them are winners, what does that say about the probability of drawing a winner? Well, in this case, it's pretty simple. 10 are winners, we drew 100 tickets, this is about 10%. But what if we have fewer samples? What if we have one sample? We drew one ticket and it was a winner. Well, does that mean that 100%
of tickets are winners? Is that what we're gonna guess? So, the answer is no. We wouldn't guess that. That's not, you're like, oh, well, maybe it's a really weird raffle, but I've not found any raffles that are like that. And the reason why you were able to correctly answer that
is because you can predict the future. Even if that prediction's wrong, not dead on, it's still better than making no prediction at all. This was Bayes' insight, that we can take two probability distributions that are related, and even if they're both inaccurate, the result
will be more accurate. We can do things with this, such as machine learning and artificial intelligence, I'll be focusing on artificial intelligence in this talk, but I want to take a second and introduce myself. My name is Schneems, it's pronounced like schnapps,
it's got the little fun shh at the beginning. I maintain Sprockets, poorly. I have commits to Rails as well as Puma, and I'm also doing a master's in CS at Georgia Tech with their online program. I went there for my bachelor's, for a mechanical
engineering degree, and absolutely hated it. It was brutal and not very much fun, but they're only charging me seven grand for the entire program, so it's pretty cheap, not a bad deal. So I work full time for a timeshare company. It's timeshare with
computers, that's what we do. So hopefully some of you already know what Heroku is, so instead of pitching or explaining Heroku, I'm going to explain some new features you might not have heard of. We have a new thing called Automatic Certificate Management,
this will provision a Let's Encrypt SSL cert for your app and automatically rotate it every 90 days, which is pretty sweet. We also have SSL for free, and that is on all paid and hobby dynos, and the SSL that we offer for free is what's known
as SNI SSL, and I don't know if you heard about the legislation that went through Congress that was like, hey, FCC, you cannot protect people's privacy. Anybody hear about that? Okay, yeah, so adding SSL onto your server is going to help your clients get a little
bit of protection. The free version of SSL that we have, which is SNI, does leak the host name to your ISP, but we also have an NSA grade SSL, which is an add-on that you have to add and then you also have to provision and maintain your own certificate. We have Heroku CI, which is continuous integration, it's in beta, you can give that a shot.
Review apps, which I absolutely positively love, try these if you haven't. Every time you make a pull request, Heroku will automatically deploy a staging server just for that pull request. So you're like, hey, I fixed the CSS bug. It's like, did you really? Did you?
The person reviewing can click through, see an actual live deployed app and verify that. So that's it for the company I work for. Typically this would be the time when I do a little bit of self-promotion and typically I would do something like promote the service that I run called Code Triage, which is the best place to get started contributing to open source, but since I'm not going to be talking about Code Triage, instead
what I want to talk about is the biggest problem our country faces, and especially, since I come from Texas, that the state of Texas faces: gerrymandering, which is awful, and unlike Code Triage, gerrymandering is very bad. Anyway, so this is gerrymandering. Basically, given a population, you could represent
it perfectly and say, okay, well, there are more blue squares than there are red squares, so we should have more blue districts than red districts, but if you look all the way over on the side, you can create those districts in such a way that, oh, magically,
now there are more red districts. So this is where I live. This is the district in Texas that stretches from San Antonio to Austin. I don't know if you know, but that's a really far way. Yeah, I mean, like, just look at it. Seriously.
So yeah, gerrymandering kind of, like, takes away your voice, diminishes the power of your vote, and so I think we need countrywide redistricting reforms, and it's not just me who thinks this. My district was actually ruled illegal by the state of Texas, the judicial branch. Unfortunately, an illegal district will not deter the people in charge
of redistricting in Texas, and they're refusing to hear any bills on the issue, and you might say, wow, that's a really important issue. Okay, but what can I do? So I highly recommend looking up your state representatives. You have a House representative
and a Senate representative. Like, find them. Mine are Kirk Watson and Eddie Rodriguez. I have their phone numbers in my phone, and then call them and let them know, like, hey, I care about redistricting, and I care about gerrymandering, and, like, I want this to be an issue that we should push. You might say, oh, well, is there more
that I can do? Well, there are local organizations. For example, in Texas, there's DJerrymanderTexas, which is a really long Twitter handle, and they give guides and talk about current legislation and those types of things. So, yeah, I just think that gerrymandering is very unpatriotic, un-Texan. It can be un-Arizonan, too,
you know, no bias, and it really just takes away the freedom to elect people who represent us. So, okay, yeah, back to Bayes. So artificial intelligence. For this talk, I'm going to
be talking about some examples from the grad course that I've been taking at Georgia Tech, where we've been using Bayes' rule for artificial intelligence with robotics. If you're not familiar, this is what a robot looks like. The world is very different ever since the robotic uprising of the mid-'90s. There is no more unhappiness. Affirmative.
Okay. Can I get the audio just, like, a little bit? Okay. There we go. So when we have a robot, and we need to get that robot somewhere, we need two things. We need to know where the robot is, and then we also need to have a plan on how to get it there. So robots don't see the world the same way we see it. They see it through sensors,
and those sensors are unfortunately noisy, so they don't see the world perfectly clearly. So take the case that we have a robot, and a really simple robot that can move, let's say, just right and left. If we take a measurement, it will tell us about where it is. We can represent this by putting it on a graph, and this is a normal distribution.
So here we have a robot. It's at position zero, but we don't know for sure that it's at position zero. It could be further away. It could be all the way over at 0.6, but this is a lot less likely. It's not very probable. The more accurate our measurement, the steeper our curve will be.
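As a rough sketch, that bell curve is just a Gaussian, and you can play with it in a few lines of Python (the sigma values here are my own illustrative numbers, not ones from the slides):

import math

def gaussian(x, mu, sigma):
    # Density of a normal distribution with mean mu and
    # standard deviation sigma.
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

print(gaussian(0.0, 0.0, 0.2))   # high density right at the mean
print(gaussian(0.6, 0.0, 0.2))   # way out at 0.6: possible, but not very probable
print(gaussian(0.0, 0.0, 0.05))  # a more accurate sensor: smaller sigma, steeper peak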
With a steep curve, at this point in time, it's almost impossible that the robot would be at 0.6, and it's much more likely that it would be a lot closer to 0. So a robot is an example of a low-information-state system. We could take hundreds or thousands of measurements of that robot as it's just sitting there
and average them together, but what if our world is changing? What if there's other things impacting our sensors? Or it's like, hey, our robot needs to move and do things. And so one of the things that we can do is use Bayes' rule. We can make a prediction, and with that prediction, use it to increase the accuracy of the estimate of where the robot is.
So previously we thought we were at position zero, plus or minus some error. Well, then we can predict what the world would look like if we were to drive forwards by 10 feet. If we did that, it would look something kind of like this.
We were at zero. Now we're at 10. But we want to be sure, so we take a measurement, and it says, oh, we're not at 10. It's showing that we're at five. So what do we do? Our measurement and our prediction disagree.
So probably a good guess might be somewhere right in between the two. We can take our measurement and our prediction and make a convolution, which is a really fancy way of saying the product of two functions. The result is actually more accurate than either of our guesses individually.
So even though our measurement was noisy, we don't actually know if we're at five, and our prediction was noisy, we're not actually at 10, the end result is more reliable. And this gives us a Kalman filter. A Kalman filter can be used any time you have a model of motion and some noisy data that you want to produce a more accurate prediction.
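In one dimension this has a simple closed form: the product of two Gaussians is a new Gaussian whose mean sits between the two and whose variance is smaller than either. A minimal sketch, using the standard textbook formulas rather than anything from the talk:

def multiply_gaussians(mu1, var1, mu2, var2):
    # Combine two Gaussian estimates: the new mean is a variance-weighted
    # average, and the new variance is smaller than either input.
    mu = (var2 * mu1 + var1 * mu2) / (var1 + var2)
    var = (var1 * var2) / (var1 + var2)
    return mu, var

# Prediction says 10, measurement says 5, equally uncertain:
print(multiply_gaussians(10.0, 4.0, 5.0, 4.0))  # (7.5, 2.0): in between, and more certain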
So how good is a Kalman filter, you might ask. This is an example of a homework assignment that was given to us. The green represents an actual robot's path, while all of the little red dots are the noisy measurements. And it's so noisy that if you just take two subsequent points, two measurements, you can't tell which direction the robot is moving in
because the second point might actually be way behind the first point. So it's incredibly, incredibly noisy. And this is part of the class. You can actually go to Udacity and take the course for free, and this is the final thing that they do in the course.
If you end up going to Georgia Tech, there's a little bit more involved. But to make things even more interesting, not only do you have to figure out where the robot is, you have your own robot that moves slightly slower than the one you're trying to find, and you have to chase it.
So you have to predict where it will be a time step or two into the future and then be there. And sorry for anybody who's colorblind. They picked the colors, not me. So what does this look like? Well, if we apply a Kalman filter, we end up with something kind of like this. Before, our red dots were virtually unusable.
As I mentioned, given two points, we can't even determine the direction. But with this correctly implemented, we can see our chaser robot getting closer and closer. So I like a little bit of audience participation.
Who here likes money? Okay. All right. I think some people didn't raise their hands. It's okay. Before we look at how a Kalman filter looks like, let's look at some cold hard cache. This is a 1913 Liberty Head Nickel. It was produced without the approval of the U.S. Mint, and as a result, they only made five of them.
Only five of these got into circulation. As a result, it's incredibly, incredibly rare, and if you find this, it's worth $3.7 million. So yeah, I'd say that's a pretty penny, but I'll be here all week, folks.
This is not a Liberty Head Nickel. This is a trick coin that for some reason your coin collecting friend happened to have that has two heads instead of being the actual Liberty Head Nickel. And this coin collecting friend also has a $3.7 million coin.
And for some strange reason, they put two coins into a bag and shake it up and draw one. So we have one fair coin and one trick coin in our bag. They say, hey, you know what, do you want to play a game? Do you want to make $3.7 million?
And so they take a coin out, they flip it, and they say that, oh, okay, it landed on heads. From here on, they might try to make some sort of a wager or bet, like, okay, well, you know, if it's a $3.7 million coin, you can keep it, but otherwise you have to, you know, I don't know, mow my lawn or something.
I mean, it's fairly equivalent, right? But, like, would that be a good bet or not? In order to know, we have to know the probability that, given it landed on heads, we have our fair coin. To do this, we can use Bayes' Rule. So this is what it looks like.
To explain a little bit of the syntax, the P stands for probability, and we are saying the probability of A given B. So this is the probability that we have a $3.7 million coin given that we know it was heads. That's the information.
That's all we knew. So in order to do this, we can flesh this out piece by piece. So the probability of heads. Well, what is the probability of heads? We have three total chances of getting heads and one chance of getting tails. So we have a 3 out of 4 or 75% chance of getting heads.
Another way that we can do this is say, well, there's a 50% chance that we get our fair coin, and if we get that fair coin, there's a 50% chance that it's heads. We can add that to a 50% chance of getting our trick coin, and if we get our trick coin, there's a 100% chance that we're going to get heads.
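That second way of computing it is total probability, and as a sketch it looks like this in Python (the variable names are mine, not from the slides):

# Total probability of heads: sum P(heads | coin) * P(coin) over both coins.
p_fair, p_trick = 0.5, 0.5
p_heads_given_fair = 0.5   # fair coin: heads is one of two sides
p_heads_given_trick = 1.0  # trick coin: heads on both sides

p_heads = p_heads_given_fair * p_fair + p_heads_given_trick * p_trick
print(p_heads)  # 0.75, the same 3 out of 4 we got by counting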
And when you do that, you end up with the exact same result. This is just the more mathy way of achieving that instead of intuition, because later on, I tried to teach my program intuition. It didn't work out too well. Also, so this is a talk on artificial intelligence, and I have to admit I don't know a whole lot about artificial intelligence,
or I would have written an artificial intelligence to write my talk. So thank you. Okay, so we're going to add this onto our equation and keep moving. So now we want to know what is the probability of A, the probability of getting that $3.7 million coin.
Well, we know we have two different cases. They're equally probable. We have a 50% chance of getting that coin, and we can add this back to our equation. The last piece is the probability of heads given that we have a fair coin, given that we have this $3.7 million coin.
So in that case, assuming that we have the fair coin, we flip it, there's only a one out of two chance that we have heads. So that's 50%, and we can add it here. When we put all of that together, we end up with a one in three, or 0.33, a 33% chance of owning a multimillion-dollar 1913 Liberty Head Nickel.
So one in three, it's not great, but it's not nothing. This is what we can do with Bayes' Rule.
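Put together as a sketch (a hypothetical helper, not the code from the slides):

def bayes(p_b_given_a, p_a, p_b):
    # Bayes' rule: P(A | B) = P(B | A) * P(A) / P(B)
    return p_b_given_a * p_a / p_b

# P(fair coin | heads): 50% chance of heads given the fair coin,
# 50% chance of the fair coin, 75% chance of heads overall.
print(bayes(0.5, 0.5, 0.75))  # 0.333..., our one in three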
Given two related probabilities, in this case, what is the probability we'll get heads, and also what is the probability that we'll draw our money coin, we can accurately predict that relationship. Khan Academy has a really good resource on Bayes' Rule, and another way to teach this, instead of the very mathy way,
is to look at it with trees. So here's essentially that. To answer this question, we need only rewind and grow a tree. The first event, he picks one of two coins, so our tree grows two branches, leading to two equally likely outcomes, fair or unfair.
The next event, he flips the coin, we grow again. If he had the fair coin, we know this flip can result in two equally likely outcomes, heads and tails, while the unfair coin results in two outcomes, both heads.
Our tree is finished, and we see it has four leaves, representing four equally likely outcomes. The final step, new evidence. He says, Whenever we gain evidence, we must trim our tree. We cut any branch leading to tails, because we know tails did not occur, and that is it.
So the probability that he chose the fair coin is the one fair outcome leading to heads, divided by the three possible outcomes leading to heads, or one third. All right, so whether we use trees or we use Bayes' rule, we get the same outcome. I'm not an expert in probability, but that's probably a good thing.
One element I mentioned, but didn't dwell on, was total probability. And also, I'm very terribly sorry, I lied about Bayes' rule. That isn't all of Bayes' rule; it actually looks a little bit more like this. So this is the expanded form, and seeing both side by side,
it's just the total probability of B expanded on the bottom. So what exactly is total probability? If we look at our problem another way, we can say, all right, well, we have a 50% chance of our actual coin, or the zero-dollar trick coin,
and in this problem space, if we land on heads, heads is going to completely take up the trick-coin case. If we have the trick coin, there's a 100% chance of heads. However, it only half takes up the $3.7 million coin. If we land on tails, tails falls entirely inside of the $3.7 million coin,
and we have a 100% chance that that is that coin. Now, what we actually want to know is this section. Sorry, what we want to know is the probability,
the total probability, of getting heads, and in order to do that, we can calculate it by adding up this section along with this section, and that will give us the total probability. To write it out long form, we have the probability of heads given that we have our fair coin, times the probability of the fair coin,
plus the probability of heads given the trick coin, multiplied by the probability of getting that trick coin. So it's just this summation, and we did this previously when I showed you this slide, but I didn't explain exactly why we did it or where we're getting that math from.
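Written out, the formulas being described look like this (a reconstruction, since the slide itself isn't in the transcript): total probability of heads, and then the expanded form of Bayes' rule with that total probability in the denominator.

\[
P(H) = P(H \mid \text{fair})\,P(\text{fair}) + P(H \mid \text{trick})\,P(\text{trick}) = 0.5 \cdot 0.5 + 1 \cdot 0.5 = 0.75,
\]
\[
P(A \mid B) = \frac{P(B \mid A)\,P(A)}{\sum_i P(B \mid A_i)\,P(A_i)}.
\]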
So that's where it came from. We can make this a little bit tougher, though. What if we flipped two coins, or what if we flipped the coin twice and it landed on heads both times? In order to do that, it makes it actually a little simpler if we use the expanded form. I'm not going to dwell on exactly where we got all of the numbers from as much,
but here, the subscript i indicates each of the different cases, so we could have a coin that is our fair coin, or we could have a coin that is the unfair coin. So the probability of landing on heads twice given our fair coin is going to be: you flip it and it's a 50% chance of heads.
You flip it again, it's a 50% chance of heads. Multiply those two together. The probability of getting that fair coin hasn't changed. It never will. There's always a 50% chance of getting one out of two coins. And then we can flesh out this summation, and at the bottom, again, so it's 0.25 times a half,
plus, if we have the trick coin, it's a 100% probability of heads, so it's one times the probability of getting the trick coin, which is a half.
You all with me? Okay. All right. So if you add all this together, you end up with a fifth, which is 0.2. And now Bayes' rule doesn't claim certainty. Our values are going down. It is more and more and more likely that we do not have the fair coin. But it's never actually going to reach zero.
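That repeated update is easy to sketch in Python (my own variable names), and you can watch the probability shrink without ever hitting zero:

p_fair = 0.5  # prior: one of two coins

for flip in range(5):
    # Every flip comes up heads; update P(fair) with Bayes' rule.
    p_heads = 0.5 * p_fair + 1.0 * (1 - p_fair)  # total probability of heads
    p_fair = 0.5 * p_fair / p_heads
    print(p_fair)  # 0.333..., 0.2, 0.111..., 0.0588..., 0.0303...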
And that's a really important part, because if it does reach zero, and then we flipped it again, and it turned out to be tails, well, the way Bayes' rule is written, it would never recover from that. Mathematically, it would never recover from that. So sorry to get a little bit mathy, but we need it. Is anybody ready for a break from math?
All right. So we are going to take a break from math. With some more math. All right. For that, I'm going to put on my math jacket. I do appreciate you all bearing with me.
So if we look back at Bayes' rule again, one way to represent it would be splitting the equation out. This is exactly what we had before, but on one side, we basically have a constant. The probability of getting our fair coin every single time was exactly the same. So this is going to be called our prior.
Without any information at all in the system, we can say that would be the probability of getting our coin. This other section is after we have information, so it's a posterior, so post information.
And even if our prior is 0.5, our posterior, if we have the case where we got a tails, our posterior is so large that it actually pulls the 0.5 up all the way to be 100% and say we definitively have a fair coin. So a Kalman filter is a recursive Bayes estimation,
and I can guarantee you that all of these are words. Previously, we looked at a graph, and we had a prediction, and so that is actually going to be our prior. We also had a measurement, and that's going to be our posterior. This is the thing that updated after we got new information.
And our convolution, we're going to be somewhere in between. We don't exactly know where. So that's where actually implementing a Kalman filter comes from. So the next example comes from Simon D. Levy. I have a link to this resource.
It goes through step by step and really explains the math. I know your heads might be hurting a little bit, but I'm barely skimming the surface, and some of it's really interesting. He also has a fairly unique and fairly simple example, and I'm going to walk through how to implement it as a Kalman filter.
So let's say we've got a plane, and this plane is really simple. All it can do is land, apparently. And the way you control it is by multiplying your current altitude by some other value. In this case, it's 0.75. And this gives us a nice steady landing.
Towards the end, it's moving in smaller and smaller and smaller increments until eventually we kind of touch down. Unfortunately, our measurements are really, really, really noisy. So this is that line but with 20% noise, and we're actually going below the ground here. We're getting negative measurements.
So according to our measurements, we're repeatedly slamming into the ground. And I know visually, mentally, you're just like, oh, yeah, there's a nice little line in there. But if you are writing a system that depends on those measurements, we need it to be a nice straight line, nice smooth line, instead of this jagged thing that sometimes indicates we're below the ground.
So we're going to actually program this in a Kalman filter. We're going to start off with our rate of descent, just 0.75, our initial position, and our measurement error. We're then going to just make a guess. We're going to say, well, let's just assume you were at the very first position that you were measured at.
And we also introduce a new thing called P, which is our estimation error. This is our prediction error. And there's going to be a value between 0 and 1 that we're going to use to remember how we kind of adjusted our robot sort of back and forth. Is it closer to the prediction? Is it closer to the measurement? And that's how we're going to do that.
To get started, we pull a measurement off of our measurement array. Oh, and I do apologize. This is in Python. Yeah, I assume everybody here is a polyglot. Luckily, all of the code is identical to what it would be in Ruby, except for the very top line, the "for k in range(10)".
So, all right, so we start off with our guess. We multiply where we currently were by our constant, so 0.75. That's now where we think we are. We then want to say, build into our system some way where if we move just a little teeny tiny bit, our prediction's probably pretty accurate.
But if we move a whole lot, our prediction's not that accurate. So we're going to multiply our motion by our prediction error. And the reason we do this twice is that prediction error is actually represented as sigma squared, so it's error squared. And you don't really need to know that. Just multiply it twice.
So that's the prediction phase. Then, after we've predicted, we have to update it with our measurement. I'm going to skip this gain line and instead go straight to the actual update. So we have our guess of where we currently are. Then we add it with a mysterious gain number times the current measurement minus the previous guess.
And so the way that we can think about this gain is it's sort of the ratio of our last measurement and the prediction. If our prediction error is really low, like really, really low, then our gain is really, really low.
And if it's so low that it gets pretty close to zero, it can approximate zero. And when that happens, we can actually eliminate out this entire term, and that means that we should just ignore our noisy measurements altogether. Our last prediction was so good, it was so good,
we don't even need our new measurements. Either that, or our new measurements were so bad that they're not helping us in any way, shape, or form. If the prediction error is high, then it means we have a really high gain. And when that happens, we end up approaching one. And when we do this, we have an x guess,
and then we also have a negative x guess, and those two terms cancel each other out, and we end up just guessing whatever our measurement is. This means we throw out our previous prediction and just use our measurement. You might want to do this in a case where it turns out that your sensor is really, really, really accurate, but your prediction model is not.
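Putting the whole walkthrough together, here's a sketch of the scalar filter (reconstructed from the description above and Simon D. Levy's tutorial, so treat the starting values and noise levels as assumptions):

import random

a = 0.75  # rate of descent: next altitude = a * current altitude
r = 0.2   # measurement error (sigma squared)

# Simulate the noisy landing: a true altitude plus up to ~20% noise.
altitude = 1000.0
measurements = []
for k in range(10):
    altitude = a * altitude
    measurements.append(altitude * (1 + random.uniform(-0.2, 0.2)))

x = measurements[0]  # just assume we start wherever we first measured
p = 1.0              # estimation (prediction) error

for z in measurements[1:]:
    # Predict: apply the motion model, and grow our uncertainty with it.
    x = a * x
    p = a * p * a
    # Update: blend prediction and measurement, weighted by the gain.
    g = p / (p + r)      # gain near 0: trust the prediction;
    x = x + g * (z - x)  # gain near 1: trust the measurement
    p = (1 - g) * p
    print(round(x, 2))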
So a way to visualize that is if our prediction is less certain or less accurate, it's kinda a little bit more flat, and our robot would be leaning towards our measurement, or if our prediction is more certain, it's a little bit more peaky, then our robot is gonna be leaning more
towards the prediction. You put all of this together, and you recursively update your prediction error, and you end up with a graph that looks a little bit like this. So the jagged line represents our very noisy measurements, the blue line represents the actual value of the plane,
and the little green squares are what we are predicting. Now, it's not dead on. Again, we're not perfect at predicting the future, but we're pretty close. We're a lot better than what we had previously, and given this, hopefully our plane
won't crash into the ground repeatedly. So that's pretty much the simplest case of a Kalman filter. We can get a lot, a lot deeper. There's a lot more scenarios and situations. One of the more common things
is having a Kalman filter in a matrix form. For example, in this case, we only had altitude, but what if we also had engine speed and barometric pressure and the angle of our flaps and the angle at which the pilot is pulling back on the controls?
And if we put all of those together, and if they are related, instead of individually writing a Kalman filter for each of them, we put them in one Kalman filter, it actually ends up being much, much, much more accurate for the entire system. And so, yeah, this looks pretty similar,
but it's, yeah, there's a little bit more going on that we don't necessarily have time to get into. The other case where a Kalman filter gets into trouble is in motion that isn't linear. So previously, yes, we had a nice, gentle curve, but each step itself was linear. Each step was just based on a constant
multiplied by the previous step. But there are cases where we have sinusoidal motion or logarithmic or just, you know, not linear, and when that happens, we end up having two different probability distributions, and then when we put them together, in order to add two probability distributions together, they have to be on the same plane.
And here, we're kind of estimating and making a bad estimation. Granted, this is still likely better than just taking the noisy measurements, but I would recommend not doing this. Instead, there are other ways. There's an extended Kalman filter, there's an unscented Kalman filter, and this is kind of the way I think
of extended Kalman filters: it rotates the plane of our probability distribution so that it approximates our curve. It still has to be on a line, and both distributions still have to be on the same line, but we can approximate our curve by rotating it.
All right, okay, so that's it for Bayes' rule, or sorry, that's it for Kalman filters. I did want to go back a little bit to Bayes' rule and touch on the two most important parts.
So the prediction: if we never predict the future, then we can't know if we're right or wrong. This is why scientists start with a hypothesis. If the hypothesis is wrong, we're forced to reevaluate our underlying assumption. And then, whenever we get new information, we have to update.
We have to update our own set of beliefs. And the interesting thing about this is we can never be too sure of ourselves. No matter how many times we get heads, we can never be 100% sure that it is a trick coin unless we actually investigate it. That's why this is probability. As soon as it dips all the way
to zero, or if you just make that claim, if you say, oh, there's a 0% chance this could ever happen, Bayes' rule will not help you. Your system can never recover. So yes, I already gave the example previously: even if you get tails, it's like, sorry, Bayes' rule tells you
there's a 0% chance you cannot recover. So no matter how sure of yourself that you are, you always need to remain a little bit skeptical. You might think that there's a 0% chance, or that there's 100% chance of the sun coming up tomorrow. That'd be a pretty good bet. And for most days, you'd be right.
But if it turns out that tomorrow is the day that our sun turns into a red giant and consumes the earth, hopefully your millennia of prior experience with the sun coming up every day doesn't cause you to accidentally die. On that note, it always pays to have good information.
And good guesses. We don't have to wait until our sun explodes. We can actually take a look at other stars and see what happens to them. We can compare our situation to theirs. It's like, oh, maybe it's not exactly the same,
but it'll give us a better prediction than we would have otherwise. And so the more data and the more predictions that we make, the better our outcomes will be. Let that sink in. So I highly recommend a book called
Algorithms to Live By. I think it's a book every programmer should read. It's very narrative, and it has an entire chapter on Bayes' rule. It's very easy to read. It doesn't get into the math nitty-gritty like I did. Also, I see some people taking photos. I'm gonna leave it up here and speak, to delay the next slide.
Okay, good, good. I also highly recommend The Signal and the Noise. This is a book written by Nate Silver. It's about probability. Nate Silver runs FiveThirtyEight. He successfully predicted our 45th president has a one in five chance of winning and would likely lose the popular vote.
He did not predict the magnitude by which he would lose the popular vote. Just saying. The audio I played is Mozart's Requiem in D minor. For the Kalman tutorial from earlier,
you can go to bit.ly/kalman-tutorial. This is Simon D. Levy's resource. And then also, if you're really into Kalman filters and you want to see a lot of those unscented Kalman filters and extended Kalman filters, this is a great resource. It's bit.ly/kalman-notebook. And unfortunately, all of this is also in Python.
But it's, I mean, if you know Ruby, it's pretty easy to read. You can also check out Udacity and Georgia Tech. And if you didn't know, BAE is not short for baby. It's African American vernacular, and it stands for "before anyone else."
So Copernicus built on top of Bayes' theory and developed special cases for when we truly have no prior estimate: what should we do? Laplace took Bayes' work, and actually much of what we know as Bayes' rule and Bayes' theorem, the nice, pleasant, polished thing that it is,
actually comes from Laplace. So before there was Copernicus, before there was Laplace, Bayes was BAE. Thank you very much.