Joint Distribution (1)
Formal Metadata

Title: Joint Distribution (1)
Part: 4 of 16
License: CC Attribution - ShareAlike 3.0 Unported: You may use, change and reproduce, distribute and make the work or its content publicly available in unchanged or changed form for any legal and non-commercial purpose, provided you credit the author/rights holder in the manner specified and pass on the work or this content, including in changed form, only under the terms of this license.
Identifier: 10.5446/12889 (DOI)
Language: English
Transcript: English (automatically generated)
00:05
capital X is less than or equal to little x is equal to 0 if little x is less than
00:23
0. It's 1 minus p if 0 is less than or equal to x is less than 1 and it's 1 if x is
00:41
greater than or equal to 1. That function looks like this. It's 0 up to but not
01:28
including the value x equals 0. It jumps to 1 minus p at 0, stays there until you get to 1, then it jumps to 1 and stays there. And from this graph you can read
01:49
out the following pieces of information. The probability that capital X is 0 is the jump at 0. And the jump at 0 is 1 minus p. The probability that x
02:09
is 1 is the jump at 1 and that's size p. Now to see why this is true, this captures
02:25
every aspect that you need to know about discrete random variables. Discrete random variables are ones that have this feature. They have jumps and between the jumps are flat. Those are called discrete random variables. If you look
02:42
at this function here and you have y less than x then f of x minus f of y would be the probability that little y is strictly less than capital X is less
03:01
than or equal to little x. So x is a function on this two-point space. How many values can x take on
03:21
if it's a function? How many values can it take on at S? Only one, and at F, only one. So x can only have two values. And what could those two values be? Well, if I put some
03:42
number here bigger than 0 and some number here less than 0. For example, if I draw a picture like this, what's this probability in this case? Well, it's f at x minus f of y. We
04:01
did this on Friday. What's f at x? 1 minus p. What's f of y? It's 0. And does it matter how close x and y are to 0? Well, a little bit. x can't be over here. But as long as x is near 0 to the right of it and y is near 0 to the left of it, this will be 1 minus p. And if I
04:24
let x and y collapse down, what do we conclude about the probability that x is equal to 0? Well, what does this interval collapse to when x goes to 0 and y goes to 0?
04:45
Collapse to the point 0. So where could x be? It has only two possible values. One of them is going to be 0 and the probability of being 0 is 1 minus p. On the other hand, if I put x over
05:01
here and y there, what would f of x minus f of y equal? Well, f of x is 1. f of y is 1 minus p. 1 minus 1 minus p is p. So this would be the jump here, which would be p. And I can let
05:21
x and y get arbitrarily close to 1. And that would mean that the probability that x is 1 is p. Now to review, Bernoulli random variables led to a whole bunch of other ones. In fact,
05:42
probably almost all the ones you'll study that are discrete are coming from Bernoulli random variables. Any question on this Bernoulli random variable and its distribution function? Okay. I guess there's one last thing to say. The probability mass function,
06:43
otherwise the probability that x is equal to a value little x is 0. And when values aren't assigned by a probability mass function, we'll assume that they're 0. And we'll use an
07:03
abbreviation for a probability mass function, PMF. Okay, so Bernoulli random variable is a
07:50
random variable that assigns value x to success, I'm sorry, 1 to success, 0 to failure. Binomial random variable, you do n trials, which are Bernoulli. These aren't all equally
08:28
likely, but the probability, so what you do here is x of an outcome is the number of S's in the outcome. So each of these sequences of S's and F's, in each one you count the number
08:51
of successes, that's x. And the probability of any outcome is the following. It's p to
09:04
the number of S's in omega, and then 1 minus p to the number of F's in omega. Okay,
09:24
so omega would be a sequence of S's and F's. And we put a probability measure on this sample space, which assigns this amount, this probability to omega. If omega has k, say k S's and n minus
09:45
k failures, this would be p to the k, 1 minus p to the n minus k. And then here, a binomial random variable with parameter n and p, which we'll abbreviate binomial n and
10:07
p, has the property that the probability that x is equal to k is p to the k, 1 minus p to the n minus k. So a binomial random variable has the interpretation: it's the number of successes
10:24
in n independent trials of a, n independent Bernoulli trials. Oh, I'm sorry, I forgot some here, times n choose k. Bernoulli also led to geometric. You perform independent
11:45
Bernoulli trials, that is, you perform an experiment that has two outcomes, success and failure. Probability of success is p, probability of failure is 1 minus p. You perform that until the first success happens. That random variable is called a geometric
12:01
random variable. And last time we said that the probability that x is k for this random variable is 1 minus p to the k minus 1, and then times p, and this is for k equal 1, 2, etc. So this
12:21
is the probability mass function for a geometric random variable. The distribution function here, the probability that capital X is less than or equal to k, would be the sum j equal 1 to k, 1 minus p to the j minus 1 times p. You get the
12:53
distribution function from the probability mass function by adding. Bernoulli also led
13:04
to Poisson, performed independent trials, but the probability of success depends on n.
14:16
I gave some biological examples, also the example of emission of alpha particles from
14:21
a blob of radioactive substance. Then the probability that x is equal to k is approximately equal to lambda to the k over k factorial e to the minus lambda. And a random variable with this PMF is called Poisson. And then finally Bernoulli led to a normal distribution.
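The Poisson approximation described just above can be checked numerically. The following sketch is an editor's addition, not part of the lecture; the parameter values are illustrative. With n independent trials each succeeding with probability lambda/n, the binomial probabilities approach lambda to the k over k factorial, times e to the minus lambda.

```python
import math

def binom_pmf(n, p, k):
    # Binomial probability: C(n, k) p^k (1 - p)^(n - k)
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(lam, k):
    # Poisson probability: lambda^k e^(-lambda) / k!
    return lam**k * math.exp(-lam) / math.factorial(k)

lam = 2.0
for k in range(5):
    b = binom_pmf(1000, lam / 1000, k)   # many trials, small success probability
    q = poisson_pmf(lam, k)
    print(k, round(b, 5), round(q, 5))   # the two columns nearly agree
```

With n = 1000 the binomial and Poisson probabilities already agree to about three decimal places.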
15:12
So then the probability x over root n is less than or equal to the number little x is
16:21
approximately given by this distribution function. And this is the distribution function for normal random variable with mean 0 and variance 1. So Bernoulli led to all these different random variables or distributions. We'll study all of them. We spent some time on discrete.
16:53
Let's spend a little time on continuous. I would like to do one computation with the geometric before going on to continuous though. Continuous means that the distribution
17:05
function of the random variable has no jumps. That's what continuous random variable means.
17:24
So here I'm taking x to be geometric. Okay, so when I say that, I don't say what the value of x is. I say something about the statistical properties or probabilistic properties
17:43
of x. And the only probabilistic information we have about x would be its PMF or equivalently its distribution function. So when I say x is a geometric random variable, I'm saying its distribution function is this. Whenever I identify a random variable, x is something
18:01
like normal. I'm saying what the distribution function is. If I say x is Bernoulli, I'm saying what the distribution function is. So here I'm saying x is a geometric random variable. And I should say what the parameters are. That means if I want to know what the
18:23
probability that x is smaller than or equal to some value k, it's equal to this. That's the information we're given. Now I want to do a conditional probability. I want to
18:52
compute the probability that what? What's the interpretation of x again? It's the trial on which the first success appears. So it's like p is a sixth. This is the same as rolling
19:04
a die until the first one appears. And the number of rolls it takes, the number of roll on which the first one appeared, that's the value x. So what's the probability that the first success occurs after this trial given that it didn't occur in the first
19:25
n minus 1 trials? So what's the probability that the success doesn't occur in the first n plus k minus 1 trials given that it did not occur in the first n minus 1 trials? Does everyone understand why what I'm saying is contained in this line here? X is the
19:46
trial in which the first success occurs. This says you've been trying whatever you're trying like rolling a die or tossing a coin, waiting for success. You've done it n minus 1 times and you didn't succeed. What's the probability that if you do it k more times
20:00
you still don't succeed? That's what this probability is. Now do you recall the definition of conditional probability? The probability of a given b? These are two events. This is an event. That's an event. So we look at the probability of a given b.
20:26
You remember that's the probability of a intersected with b divided by the probability of b. So we do that here. Here's the event a. This is the event b. What does this become?
21:16
So this comma means both things happen. Now one of those inequalities is stricter than the
21:25
other in the numerator. Which one is stricter or which one gives a smaller event? Right. Because if x is bigger than this it's automatically bigger than that. We're taking k to be 1, 2, 3, etc. So that means I can just do what? Get rid of it.
21:46
Cross that out. It doesn't add anymore. So now let's compute. What's the probability? Can we compute the probability that x is bigger than something?
22:01
We know the probability that x is smaller than something, smaller than or equal to. A distribution function has this form. What if you wanted to compute that? Right. It's 1 minus f of x.
22:30
It's just the thing that I told you before. The probability of a complement is 1 minus the probability of a. a here is this event. x less than or equal to little x. What's
22:45
a complement then? x bigger than little x. So here's a complement. This would be the probability of a. So we can compute these things. Because what? We know the distribution function. So how would we turn this around? What would 1 minus, this is f of k here.
23:11
What's 1 minus f of k in that case? Well what happens when I add this up, j equal 1 to infinity? I get f of infinity which is 1. 1 minus f of k would be the sum k plus 1 to
23:28
infinity. Let me do it this way. This one is the sum j equals 0 to infinity.
23:41
1 minus p to the j minus 1 times p. This one is the sum j equals 0 to k. 1 minus p to the j minus 1 times p. Does everybody agree with this? I did this last time. Why is this 1?
24:15
1 minus p. Right. This is a geometric series here. Oops, I'm sorry. This should be 1 here. Sorry. That's 1. I started at the wrong place here. 1, 1, 1. Did I start at the right place
24:25
here? Yeah, I did it right. Or another way to say this is f of infinity and the distribution function at infinity is always equal to 1. So what would we have if we subtract this from this? Well we'd have the terms from k plus 1 to infinity, right?
24:55
Okay, so can you say anything more specific about that? Can we evaluate that?
25:06
Can we make it simpler? Can we express it without using an infinite series? Well there's a factor that's multiplied times every term here with p, right? And that doesn't
25:25
depend on j so I can pull that out. And what do we know about series that look like this?
25:42
We do know the sum j equals 0 to infinity r to the j is 1 over 1 minus r as long as absolute r is less than 1. And here we have 1 minus p, p is the number between 0 and 1,
26:02
so absolute 1 minus p is less than 1. Now the difference between this and this is what? Well I guess one difference is here's a j and there's a j minus 1. But I don't think that's a big obstacle. We could just change variables, right? So let's do that first.
26:26
Well maybe we do that second. What's the other difference? Here we start at k plus 1, here we start at 0. So we'd like to start at 0 and have a j there. So maybe what can we do? Can we
26:47
factor out? What's the first term here? If j is k plus 1, what appears here? If j is k plus 1, what appears here? k, right? If this is k plus 1, k plus 1 minus 1 is k.
27:03
So the first term here is 1 minus p to the k and all the others will have higher powers of 1 minus p. So why don't we factor out 1 minus p to the k? It'll occur in every term. So p, 1 minus p to the k sum, j equals something to infinity, 1 minus p to the what?
27:28
Well now that we've taken out the 1 minus p to the k, we start with what? 1 minus p to the k was here. We pulled it out. So what's left for the first term
27:41
when j is k plus 1? We factor out 1 minus p, this becomes 1. What's the next term? We put k plus 2, we get 1 minus p to the 1. What's the next one? When k plus 3, we get 1 minus p squared. So we get 1 minus p to the 0 plus 1 minus p to the first plus 1 minus p squared
28:03
plus 1 minus p cubed. In other words, we get some j equals 0 to infinity, 1 minus p to the j, wouldn't we? And this we can evaluate using that. So we get p times 1 minus p to the k times 1
28:25
over 1 minus 1 minus p. And 1 minus 1 minus p is p. And here's a p. So this and this cancel,
28:48
leaving 1 minus p to the k. So let's summarize, what did we just do? The probability that x is bigger than k for a geometric random variable with parameter p
29:05
is 1 minus p to the k. All right, so what did we want to compute? We wanted to compute this ratio. Can we do it? Yes, we can. We know what things like this look like and what
29:28
things like that look like. So let's evaluate that. Here we get 1 minus p to what power?
29:55
And here we get, and this is 1 minus p to the k, which was the probability you have to wait k
30:12
trials. Or, to be a little more careful, it's the probability that you don't get a success in the
30:20
first k trials. So the probability you have to wait k more trials, even after you've gone n minus 1 trials, is the same as though you're starting from the beginning. So if you're rolling a die and you've done it a thousand times, you didn't get a 1, does it mean that
30:45
you're due? That you should be more likely to get a 1 next or in the next few trials? No, you're not due. It's just like you're starting again and so you haven't rolled the die at all. Sometimes you hear people say that in the lottery some numbers are going to be hot,
31:05
because why? Well, they haven't appeared for a while. Well, does that mean they're going to appear soon? No. So keep this lesson in mind. If you forget everything else, remember this. And does it make sense to you? Yeah. Why? The coin or die or whatever
31:30
you're doing with it is an object with no memory, no way of controlling outcomes, just a stupid object. So now let's go to continuous random variables.
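The lack-of-memory computation just finished can be verified numerically. This sketch is an editor's addition, not from the lecture; the values of n and k are illustrative. Using the tail formula P(X > k) = (1 - p) to the k for a geometric random variable, the conditional probability of waiting k more trials equals the unconditional one.

```python
# Geometric with parameter p: P(X > k) = (1 - p)**k
def tail(p, k):
    return (1 - p)**k

p = 1 / 6   # rolling a die until the first one appears
n, k = 10, 4

# Conditional probability P(X > n + k - 1 | X > n - 1)
# = P(X > n + k - 1) / P(X > n - 1), since the larger event is contained in the smaller
cond = tail(p, n + k - 1) / tail(p, n - 1)
print(cond, tail(p, k))  # equal: the past failures don't matter
```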
31:58
Spell it right. So it means that the distribution function is a continuous function of x.
32:38
The distribution functions I did earlier all had jumps interspersed with flat spots.
32:48
Here there's continuous increase, though these functions can have flat spots, but no jumps.
33:00
We did one last time, uniform random variable. The very important one is called the exponential random variable with parameter lambda. Let me
33:58
be a little more careful. X cannot take on negative values. The probability that capital
34:15
X is negative is zero. The probability that it's less than or equal to little x is like this.
34:23
If this is the case, what would the probability that capital X is bigger than little x equal?
34:48
What's the complement of this event? This is the complement of this event. Right? Capital X bigger than little x is the complement of capital X less than or equal to little x. This probability is one minus that probability. What's one minus this probability?
35:06
Right, e to the minus lambda x. The probability dies off exponentially fast as little x goes to infinity. Sometimes these functions are differentiable.
35:25
In this case it is. What's the derivative of this function? It would be lambda e to the minus lambda x for x positive and zero for x negative. At zero it's not differentiable.
35:50
But we usually give the notation little f and that's called the density.
36:05
What's the relation between a function and its derivative? That is given by the integral calculus. Do you remember the fundamental theorem of calculus? If you integrate the derivative of a function, what do you get? You get the function back again. If f prime is little f,
36:48
then the integral of little f from y to x gives f at x minus f at y. If these are distribution functions, then this would be... So keep this formula in mind. If you want to know
37:08
probability like this, if you know the distribution functions, you just take the difference of the distribution functions. If you know the density, you integrate the density from y to x.
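The relation between distribution function and density can be checked for the exponential case just described. This sketch is an editor's addition; the rate lambda and the interval endpoints are illustrative, and the midpoint-rule integrator is a stand-in for the exact integral.

```python
import math

lam = 0.5  # rate parameter lambda

def F(x):
    # Exponential distribution function: 1 - e^(-lambda x) for x >= 0
    return 1 - math.exp(-lam * x) if x >= 0 else 0.0

def f(x):
    # Its derivative, the density: lambda e^(-lambda x) for x > 0
    return lam * math.exp(-lam * x) if x > 0 else 0.0

def integrate(g, a, b, n=100000):
    # Midpoint rule; accurate enough for this smooth integrand
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

y, x = 1.0, 3.0
print(F(x) - F(y), integrate(f, y, x))  # both equal P(y < X <= x)
```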
37:26
These random variables, the exponential random variables, the parameter lambda, are used to model waiting times, like the time you wait for a bus, the time you wait for the car in front of you to start up after light turns from red to green.
37:42
The time you wait for a phone call, the time you wait for lightning to strike, the time someone waits for a customer, the first customer coming to a store, all kinds of things like that. The time you wait for a light bulb to burn out, the time you wait until the first emission of an alpha particle from your smoke detector,
38:04
or once one is emitted, the time you wait until the next one is emitted. So many, many applications of this. And they have also the same peculiar property here that geometric random variable had.
38:26
And that is, if you're interpreting this in terms of light bulbs,
38:47
and x is the time when the light bulb burns out, you put the light bulb in, now you keep track of how long it's been burning. X is the first moment it burns out. X bigger than t, that's the
39:00
event that the light bulb has not burned out by time t. It's still, when it burns out is later than t. What's the probability that it lasts longer than s plus t moments, given that it's lasted t moments? It says that it's the probability that it lasts at least s moments.
39:21
In other words, it forgets that it's been burning for t time units already, and its probabilistic behavior in the future is as though you just put it in. Let's check to see that this is true. I mean that this equation is true. I don't know that it's definitely true about light bulbs.
39:43
If it is true about light bulbs, I think lambda is pretty big. My experience at home, I'm always going to the garage getting the ladder out and putting in new light bulbs. So lambda must be large for the light bulbs I'm buying. Let's check this. How can we do this again? We use the definition of conditional probability.
40:18
Now s and t are positive. I didn't say that, but let me say it now.
40:24
That means s plus t is bigger than t. That means that this event has no effect, because if x is bigger than s plus t, it's automatically bigger than t. Now do we know how to compute the numerator and denominator?
40:42
Well, what random variable is this? This is the exponential. Here it is. The probability an exponential is bigger than this value would be e to the minus lambda times that value. So what's x in the numerator? It's s plus t.
41:11
Here it's e to the minus lambda t. That's e to the minus lambda s. Which is, in fact, the probability capital X is bigger than s.
41:27
This is sort of a renewal property. If something's been going on for t time units and nothing's happened yet, the probability you have to wait s more time units is the same as if you just started your wait for the event at time zero and had to wait s time units.
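The renewal property just verified algebraically can also be checked with numbers. This sketch is an editor's addition; lambda, s, and t are illustrative values.

```python
import math

lam = 2.0

def tail(x):
    # P(X > x) = e^(-lambda x) for an exponential random variable
    return math.exp(-lam * x)

s, t = 0.7, 5.0
cond = tail(s + t) / tail(t)  # definition of conditional probability
print(cond, tail(s))          # equal: the wait restarts from scratch
```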
41:55
A random variable closely related to exponential is called the gamma.
42:09
Let me first introduce the gamma function.
42:38
This is an interesting function. If you think functions can be interesting, then
42:43
this is one of them. This is for alpha. We'll define this for alpha bigger than zero, because when alpha is zero you have a one over x here and near the origin that would not have a
43:00
finite integral, but as long as alpha is bigger than zero this is fine. You can integrate this by parts. Let me get rid of my sign there.
43:48
That would say the gamma of alpha is what? u times v evaluated between zero and infinity
44:00
minus the integral from zero to infinity v du. The v has a minus sign. Here's the minus sign. It becomes plus e to the minus x. That's the v. du is alpha minus one. I can put that alpha minus one out in front and I get x to the alpha minus two.
44:22
The first thing here is what? At infinity it's zero because of this guy. At zero it's zero because of this guy, so this term is zero. We get alpha minus two integral from zero to infinity x to the... I'm going to write alpha minus two in a perverse way. It may be weird.
44:47
Alpha minus one minus one because why? Look at the form of the gamma function. You have a number minus one and an exponent for x. What do we have here? A number minus one
45:06
in the exponent for x. What is this integral here? That's gamma at alpha minus one.
45:22
Oh wait, I'm sorry. I don't know how I got to alpha minus two there. It's alpha minus one here. That was alpha minus one. What does this say? I can erase this now.
45:49
What's gamma of one? When alpha is one, there's no power of x here. It's just the integral
46:01
from zero to infinity e to the minus x dx and what's that? If we look back to the exponential random variable, that's the integral of the density of the exponential with parameter lambda equal one and that has to be one. I think you all know how to do this integral. It's minus e to the minus x from zero to infinity. Gamma of one is one.
46:27
These two together imply that gamma of an integer n is what? n minus one factorial. So this is an integral way of representing the factorial functions. Why is that? Why is this?
46:47
Why does this follow from these two things? Put n here. What do we get? n minus one times gamma at n minus one. Now we repeat this with gamma of n minus one. We get gamma of n minus one is n minus two times gamma of n minus two. We'll keep going down until we get to
47:02
one and gamma of one is one. So this says if I put n here, this thing would be n minus one factorial. One last little variation on that. Let's try to compute this
47:32
before the break. Now we know how to compute this if lambda is one.
47:48
So that suggests what operation here? What should we do? We should try to get the variable of integration here. We should change variables. If lambda is one, we know this is gamma of alpha. So let's put y equal to lambda x. What's dy in that case? It's lambda
48:14
dx. How would this change a variable affect the limits of integration? When x is zero, what's y?
48:22
0. When x is infinity, y is infinity. So that doesn't change. We get our e to the minus y here. That's good. For dx, we put in lambda to the minus one dy. And now for x, we put in
48:45
y over lambda. For x, we put y over lambda. We'd put y over lambda here. We get y to the alpha minus one. What power of lambda would we get? It's y over lambda, so we get lambda to the one minus alpha. Now I erase this.
49:20
And this gives lambda to the minus alpha. And what's this integral then? That's gamma of
49:30
alpha. So this is equal to gamma of alpha over lambda to the alpha.
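The change-of-variables result just derived, integral from zero to infinity of x to the alpha minus one times e to the minus lambda x dx equals gamma of alpha over lambda to the alpha, can be confirmed numerically. This sketch is an editor's addition; alpha and lambda are illustrative, the integration range is truncated at 50 (where the integrand is negligible), and the midpoint rule stands in for exact integration.

```python
import math

def integrand(x, alpha, lam):
    # x^(alpha - 1) e^(-lambda x), the integrand in the gamma computation
    return x**(alpha - 1) * math.exp(-lam * x)

def integrate(g, a, b, n=200000):
    # Midpoint rule on a truncated range; the tail beyond b is negligible
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

alpha, lam = 3.5, 2.0
numeric = integrate(lambda x: integrand(x, alpha, lam), 0.0, 50.0)
exact = math.gamma(alpha) / lam**alpha  # the change-of-variables result
print(numeric, exact)
```

The built-in math.gamma also confirms the recursion: gamma of an integer n is n minus one factorial.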
49:49
And that says that the following is a density. That just means it's non-negative
50:30
and integrates to one. And when we come back, we'll talk about the random variables with that density. Those are called gamma random variables. And these
50:41
model things where you have to wait for successive waiting times. Like maybe you wait until five light bulbs burn out. You put in a light bulb, wait until it burns out. Put in another one that's identical, wait until it burns out. Third one, wait until it burns out. Fourth one, wait until it burns out. Fifth one, when the fifth one burns out,
51:04
that would be a gamma random variable. Or I mentioned waiting at a traffic light, the light turns red to green. You're the fifth car back. The waiting time for the first car to take off is probably exponential. After that, the second one takes off. Another exponential,
51:20
third one takes off exponential. Some of these exponential random variables are a gamma random variable. So a lot of waiting time problems are modeled using gamma random variables. So let's take a break. Come back in about 10 minutes. Okay, so let me finish up the gamma random variable here.
51:45
This is a gamma random variable if its density is given by that.
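The light-bulb interpretation above can be simulated: for an integer parameter alpha, a gamma random variable is a sum of alpha independent exponential waiting times. This sketch is an editor's addition; lambda, alpha, the seed, and the sample size are illustrative.

```python
import random

lam = 1.0    # rate of each exponential wait
alpha = 5    # number of light bulbs to burn out

def gamma_wait():
    # Total time until the fifth bulb burns out: a sum of five
    # independent exponential(lambda) lifetimes
    return sum(random.expovariate(lam) for _ in range(alpha))

random.seed(0)
samples = [gamma_wait() for _ in range(100000)]
mean = sum(samples) / len(samples)
print(mean)  # close to alpha / lambda = 5
```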
52:10
And we'll work extensively with gamma later. I think I already mentioned the uniform distribution. Let me also mention the normal: x is a normal random variable. So this is the distribution function for x.
53:18
That's given by that. That means its density is this constant times that function.
53:24
That thing that looks like sigma squared, is that just a symbol? It's a number, a positive number. And it is a number squared: it's a sigma, and then it's squared, sigma to the power of two. But they usually go together.
53:44
Well, they go together here and here. But then sigma is called the standard deviation. And the graph of this function, the density then,
54:05
this statement is equivalent to saying the density of x is one over root two pi sigma squared e to the minus x minus mu squared over two sigma squared. That's good for all x.
54:25
This is the famous bell shaped curve. The graph of this function looks like this. Symmetric with respect to mu.
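The two properties just stated, symmetry about mu and total probability one, can be checked for the density above. This sketch is an editor's addition; mu and sigma are illustrative, and a midpoint rule over a wide truncated range stands in for the integral over the whole line.

```python
import math

mu, sigma = 1.5, 2.0

def density(x):
    # Bell-shaped curve: (1 / sqrt(2 pi sigma^2)) e^(-(x - mu)^2 / (2 sigma^2))
    return math.exp(-(x - mu)**2 / (2 * sigma**2)) / math.sqrt(2 * math.pi * sigma**2)

# Symmetric with respect to mu:
print(density(mu - 0.8), density(mu + 0.8))

# Integrates to one (midpoint rule over a wide range):
n, a, b = 200000, mu - 40, mu + 40
h = (b - a) / n
total = sum(density(a + (i + 0.5) * h) for i in range(n)) * h
print(total)
```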
54:40
Mu is zero, then it's centered over here. I have here a 10 mark note, which was in use before
55:03
the euro. And there's a picture on here of Gauss. How do you like that? A country where they put mathematicians on the currency. That's a great country. He was a surveyor, so they have a surveying tool on the back. He was also the one who introduced that density. And so
55:28
there's a graph of it here on the 10 mark note. I'll pass it around. Here it is on the front. I'm waiting for the day when they have someone like that. Maybe Gibbs, who was a great American mathematician and
55:44
did some things in statistical mechanics. Maybe they'll put something on a $100 bill someday. If you ever run for president, maybe that'll be a great idea.
56:06
So that's sort of a category or a catalog, I should say, of random variables with various distributions. A lot of times we do manipulations on random variables because we don't like the
56:20
ones we started with, so we try to transform them in some way. Or we do it for simulation purposes. There are tables that give a list of independent copies of uniformly distributed random variables. So there'll be just tables and numbers. There's a number, it's picked at
56:42
random between 0 and 1. The next one is another one picked at random between 0 and 1. You can use these to give independent copies of random variables with other distributions. So we take functions of random variables for a couple of reasons.
57:59
So we'll start with a random variable that has a given distribution function capital F.
58:04
It could be any one of the ones we've written down before. And then we take a function of that random variable. Let's say that it's an increasing function, though a decreasing function also would work, but I want the function to be 1 to 1. So maybe I should say strictly here,
58:22
so that g is 1 to 1. Makes it simpler. So it's either strictly increasing or it could be strictly decreasing. And then we take g of x. And what's the distribution function of g of x?
58:41
So that's another random variable. A function of a random variable is a random variable. And what is the distribution function of a random variable defined by? It's just that: it's the probability
59:21
that that random variable is smaller than or equal to little x. It's this function of little x. Let's try to compute that in terms of capital F. What is capital F? Capital F is the probability that capital X is less than or equal to little x.
59:45
Can we express this in terms of capital X is less than or equal to something?
01:00:00
Well, g is 1 to 1, so it has an inverse, and the inverse would be increasing or decreasing? If g is increasing, what about its inverse? It's increasing also. So if I apply g inverse to both sides, if this number is less than or equal to that one, what about g inverse of this number? It'd
01:00:20
be less than or equal to g inverse of that number. In other words, g inverse would preserve this order if it's strictly increasing. If I apply g inverse to here, what do I get? Capital X. And then that would be less than or equal to g inverse of little x, right?
01:00:47
But this is the function capital F evaluated here, okay? So that would be the distribution
01:01:03
function for g of x, F at g inverse of x. Let's do an example. Let's suppose x is uniformly
01:01:22
distributed on the interval from 0 to 1. When I say what is the distribution of a random variable,
01:02:15
it means what is the distribution function? So you should write down the probability that g
01:02:20
of x is less than or equal to little x and then try to compute that. So let's remind ourselves what this means, that capital X is uniformly distributed on the interval from 0 to 1. This means that's the graph of capital F. That's what it means to
01:03:46
be uniformly distributed. It means its distribution function is this. That means you're picking a point at random from the interval from 0 to 1 and there's no bias for any particular place in the interval. This is likely to come from one side, it's from the other.
01:04:03
Okay, so let's try to compute the distribution function for g of capital X where that's g.
01:04:34
Oh, and by the way, maybe I should say what would happen if g is decreasing. What would
01:04:56
change here? G inverse would then be decreasing so this would be reversed and that would be 1
01:05:22
minus F of g inverse of little x, okay? I just want to add that. So is this function increasing? X goes from where to where here? We're interested in values of capital X between 0 and 1. So we're
01:05:42
looking at X between 0 and 1 here. Is this function increasing? What about minus X, increasing or decreasing? The function L of X equals minus X. Is that increasing or decreasing? Look at the line Y equals minus X. What does that do? It goes down, right? What about 1
01:06:01
minus X? Increasing or decreasing? Decreasing. What about the log of 1 minus X? Log is an increasing function. An increasing function of a decreasing function would be, well, take derivatives. Maybe we just take the derivative of this. What's the derivative? And now you
01:06:25
have a minus sign here and when you take the derivative of this part you get a minus sign also so they would cancel giving something positive. So this is an increasing function, okay? This is increasing. So we can apply that formula there. And what's the inverse here? Find inverses.
01:07:28
You just solve for X here, right? So let's multiply by minus lambda, okay? So there's
01:07:47
g inverse, okay? So this would be the probability capital X is less than or equal to g inverse of little x, and g inverse of x is 1 minus e to the minus lambda x. And
01:08:14
now where is this number? 1 minus this thing. X is going to be bigger than or equal to 0 here.
01:08:32
This number is between 0 and 1. So it falls in here. So this is just capital F, evaluated at 1 minus e to the minus lambda X. What would that be? I'm sorry, you can't read that,
01:08:46
can you? Let me just say if this is Y, what's f of Y? What's the height of that function, that Y? It's Y, right? So if we put this in here, what's f at this value? It's that value.
01:09:10
What distribution function is that? That was the exponential. So if I start with a uniform random variable on the interval from 0 to 1, and I take this function of it,
01:09:21
I get an exponential random variable. So I just mentioned that there are tables of independent samples of uniform random variables. How could you get independent samples of exponential random variables from that? Just apply this function to them. And that'll
01:09:43
generate a list of samples of independent random variables that have the exponential distribution. So this transforms random variables with one distribution into random variables with another one. Let's do another example. Let's say X is normal with mean mu and variance sigma squared,
01:10:40
and let's say Y is equal to aX plus b. What's the distribution of Y?
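Before working that out, here is the uniform-to-exponential transformation from a moment ago in code. This is a sketch, not part of the lecture; the rate lambda = 2 is an arbitrary choice:

```python
import math
import random

def uniform_to_exponential(u, lam):
    """g(u) = -log(1 - u)/lam maps a Uniform(0,1) sample to an Exponential(lam) sample."""
    return -math.log(1.0 - u) / lam

random.seed(0)  # for reproducibility
lam = 2.0
samples = [uniform_to_exponential(random.random(), lam) for _ in range(100_000)]

# An Exponential(lam) random variable has mean 1/lam = 0.5.
print(sum(samples) / len(samples))  # ≈ 0.5

# Check the distribution function F(x) = 1 - e^(-lam*x) at x = 1.
x = 1.0
empirical = sum(s <= x for s in samples) / len(samples)
print(empirical, 1.0 - math.exp(-lam * x))  # both ≈ 0.865
```

This is exactly how simulation libraries turn tables of uniform samples into samples from other distributions.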
01:11:13
So what does it mean when you're asked, what is the distribution of Y? It means you should try to figure out what the distribution function is. That is,
01:11:21
you should write down the probability Y is less than or equal to little x and see if you can compute it. So you're being asked, what is this? Well, how do you start?
01:11:51
Well, this is given in terms of something that's known. What? That. And now we could
01:12:14
just subtract B from both sides. These two inequalities are equivalent. Now the next step
01:12:26
might require assuming A is either positive or negative. Which do you prefer? Okay, let's take positive. That's a good outlook. Let's say if A is positive, we divide both sides by A. If A was negative, what would happen? We would just reverse the inequality, right?
01:12:51
But the point is that this inequality is equivalent to that. So this event and this event are the same. That means the probabilities remain the same. Now we know what
01:13:16
that is. That's 1 over the square root of 2 pi sigma squared, integral from minus infinity to X minus
01:13:27
B over A of e to the minus (y minus mu) squared over 2 sigma squared, dy. Now you might think, well, how can I ever remember all that? Well, you're going to see this so often that you'll wake up in the middle of the night and it'll be the last thing you were
01:13:45
dreaming about before you woke up. Okay, so that's the distribution function of Y. What's the density of Y? The density of a distribution function is its derivative. So let's take the
01:14:02
derivative, the density of Y. How do you differentiate an integral? Especially if you have an integral from minus infinity to, let's say, H of X, little f of Y, dy. What's the
01:14:28
derivative of this with respect to X? Remember how that goes? It's f at H of X times H prime of X. Yeah, you're right. Thank you. I told you about my vertical version of, right? That's
01:14:59
happening again. Okay, so that's the H of X. So we just evaluate the density at H and then times
01:15:11
H prime. Okay, H of X is X minus B over A. So what's H prime? If that's H, X minus B over A,
01:15:22
H prime would be one over A, right? I'm going to put a little cap, little Y here. I evaluate
01:15:40
here. Okay, I don't like that all that well because it looks a little complicated, doesn't it?
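That complicated-looking expression can at least be sanity-checked numerically: the density f_X(h(y)) times h'(y), with h(y) = (y - b)/a, should agree with a normal density whose mean and variance are shifted and scaled. The constants below are hypothetical, chosen only for illustration:

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2)."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

mu, sigma = 1.0, 2.0   # X ~ N(mu, sigma^2); hypothetical values
a, b = 3.0, -1.0       # Y = a*X + b with a > 0

def density_of_Y(y):
    """f_Y(y) = f_X(h(y)) * h'(y), where h(y) = (y - b)/a and h'(y) = 1/a."""
    return normal_pdf((y - b) / a, mu, sigma) / a

# This matches the N(b + a*mu, (a*sigma)^2) density at every point.
for y in (-2.0, 0.0, 2.0, 5.0):
    print(abs(density_of_Y(y) - normal_pdf(y, b + a * mu, a * sigma)) < 1e-12)  # True
```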
01:16:14
So let's try to simplify it a bit. I'm going to put the A inside here and that means it becomes
01:16:38
A squared. And then because I don't have enough room to write it up high, I'm going to write exp instead
01:16:46
of e to the. I think you know that notation, right? And then up here, let's see, I'm combining
01:17:18
these two guys and this one's a fraction. And if I'm going to add two things, I should have
01:17:25
a common denominator. So the denominator should be A. So this becomes mu A over A. And then I'll have something over A in here, right? But it's squared. So where could we put that?
01:17:42
We could put it down here, right? We can put the A squared down here. And then what would be left up here? Well, this doesn't have to be changed. You get X minus B, but then what's left here? A mu. I'm going to write this as A mu over A. I combine this, I get X minus B minus A mu. And we write it this way. Okay? Is everybody
01:18:16
okay with that? No objections? You see why? Okay. So what do we have?
01:18:26
Yeah, it's normal. What's the new mu? This is a normal density, and the new mu becomes B plus
01:18:46
A mu. And the new variance, the new sigma squared, becomes sigma squared A squared. Okay. So if we start with something normal with mean mu and variance sigma squared, can we get
01:19:11
by this transformation, which I erased? Can we get to a normal with mean zero and variance one?
01:19:23
We took A X plus B and we got to this. Can we arrive at normal mean zero variance one? Can we get the new mu to be zero and the new variance to be one? How could we go from
01:19:40
normal mean mu, variance sigma squared, to normal mean zero, variance one? Can we do that? All we have to do is solve the equations B plus A mu equals zero and sigma squared A squared equals one. So what should A be?
01:20:04
A should be one over sigma. And B should be
01:20:21
minus mu over sigma. Then, if we multiply by A and add B, we go from there to there. What about the other way? Can we go backwards? Suppose we start with a normal mean zero and variance one. Can we get to a normal mean mu variance sigma squared? How do we go from here to here? So that means
01:21:07
we start with mu equal to zero and sigma squared equal to one. That means we wind up with one over the square root of two pi A squared, times the exponential of what? X minus, well, mu is zero, so X minus B
01:21:28
squared over two A squared. Now how could we get mu here and sigma here? Just take B equal to mu and A equal to sigma. That is, and I'll usually write it this way: if X is
01:21:57
normal with mean zero and variance one, then
01:22:08
sigma X plus mu is normal with mean mu and variance sigma squared. This is how you go from a normal with mean zero and variance one to a normal with mean mu and variance sigma squared. So this is handy because in a lot of statistical tests, like
01:22:27
here's a dream job, you get your degree in math and statistics and you're hired by I don't know, who owns M&M's? I forget. Maybe Nestle's, I don't know. Or maybe you're hired
01:22:43
by Nestle's. They make chocolate drops, right? These chocolate chips. Chocolate chips don't all have the same weight. The weight of a chocolate chip is a random variable and it's got a normal distribution with some mean mu and some variance sigma squared and you're supposed to
01:23:02
find out what it is. You use things like this where you try to go from normal with mean zero and variance one to whatever this mean weight and variance for the chocolate chip is. So you want to use this transformation. Also, there are tables for normal random variables with mean
01:23:25
zero and variance one. In fact, in the back of the book, there's a table for the distribution function of normal with mean zero and variance one. So that is, in the back of the book, you'll find values for this function of x for a lot of different x's.
01:23:59
And it should be table two. In my older version, well, it's the appendix B table two.
01:24:18
So they will have a bunch of numbers in here and right here is 0.4 and up here is 0.04. So
01:24:26
if I go across here to 0.44, this number is the probability that a normal random variable with mean zero and variance one is less than or equal to 0.44, and that probability is 0.6700.
01:24:45
Now, if I want to know a similar probability for this kind of random variable, I apply this transformation to a normal random variable with mean zero and variance one and I get one with mean mu and variance sigma squared, or I might go backwards. So this helps you to read off
01:25:04
distribution function for this kind of random variable from a distribution function for the normal mean zero and variance one. And later on in the statistical testing part of the course, we'll be doing that kind of thing a lot. Okay, so that's the end of chapter two.
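The table lookup just described is easy to reproduce with the error function from Python's math module. A sketch (the chocolate-chip numbers are invented for illustration):

```python
import math

def std_normal_cdf(x):
    """Distribution function of a normal with mean 0 and variance 1, via erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# The table entry from the lecture: P(Z <= 0.44) = 0.6700.
print(round(std_normal_cdf(0.44), 4))  # 0.67

# For X ~ N(mu, sigma^2), standardize: P(X <= x) = Phi((x - mu)/sigma).
def normal_cdf(x, mu, sigma):
    return std_normal_cdf((x - mu) / sigma)

# Hypothetical chip weights: mean 3.0 grams, standard deviation 0.2 grams.
print(round(normal_cdf(3.2, 3.0, 0.2), 4))  # Phi(1.0) ≈ 0.8413
```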
01:25:26
Now let's start chapter three on joint distributions. A lot of times comparisons are made. For example, my oldest son has a shoe size three sizes bigger than mine, size 14.
01:25:51
So that's a random variable. My shoe size is a random variable. You might look at the shoe size of a father and the shoe size of his son. These will be two random variables. Will they be
01:26:03
related somehow? Would you expect, I don't know, Yao Ming, for example, to have a son with a size nine shoe? Probably not, right? But Yao Ming's son's shoe size would be related to his shoe size.
01:26:20
Or to go the other way, who knows who Messi is? Who is he? Soccer player. Yeah, he's a soccer player. He's probably the best soccer player in the world. And in fact, his team spotted him when he was a young boy. He was very highly skilled,
01:26:41
but he was really small. So in fact, they gave him growth hormones. But he's still small. But you would expect his son to have small shoe size, right? So these are examples of pairs of random variables, shoe size of a father, shoe size of a son, or height of a father, height of a son, which are correlated.
01:27:03
They have some kind of joint characteristics, joint distribution. So chapter three is on jointly distributed random variables. And there are many examples of random variables that come
01:27:26
in pairs that are jointly distributed. So let's do a couple of maybe very simple examples that involve what we've done already. Let's say we select a number at random from 0, 1.
01:28:08
Then we perform n independent Bernoulli trials with probability u of success.
01:29:22
What is the probability that we have no more than k successes and that the U we picked is less than or equal to little u? This is called the joint distribution of
01:29:45
x and u. So we perform some operation, some experiment first, namely picking a number at random. That becomes our probability of success. And then with that probability of success, we generate a random variable, which is
01:30:01
the number of successes in n independent Bernoulli trials with that probability of success each time. Now this is a little ambitious to do the first time. Let's try something a little easier. Let's try the following. Let's say, I'm sure you'll be in favor of trying
01:30:34
an easier computation first. Let's say that the probability that U is one fourth and
01:30:44
the probability that U is three fourths are each one half. So we have a random variable that takes two values, one fourth or three fourths, with equal probability. So how could you do that?
01:31:01
Well, you toss a coin. If it comes up heads, you set U equal to a fourth. If it comes up tails, you set U equal to three fourths. Now let's try this. We could look now, in fact, at the pmf. This would be the joint pmf. Well, what kind of values
01:31:42
of u do we have to consider here? Just this one and this one. So let's start with this one. So what do we have here? Maybe we should write this in terms of conditional probabilities.
01:32:14
It would look like that, right? Now, given that u is a fourth, what kind of distribution does
01:32:20
x have? It's binomial. Yeah, go ahead. Oh, I'm sorry. Yeah, yeah, yeah. There it is again. Thanks. Thank you. So given u is a fourth, what distribution does x have?
01:32:49
Over here it says that we do the same kind of thing: it's binomial with parameter one fourth. So this would be
01:33:02
n choose k, one fourth to the k, three fourths to the n minus k, times a half, the half coming from here. And above it here, I'll write the probability
01:33:30
x is equal to k and u is three fourths would be what? Do the same thing. That's the probability x is equal to k given u is three fourths times the probability u is three fourths,
01:33:49
which would be, this is now binomial with parameters n and u equal to three fourths. So that's n choose k, three fourths to the k, and one minus three fourths is one fourth,
01:34:09
to the n minus k, times a half. So generally we say the pair X, Y of random variables has
01:35:27
a joint distribution function capital F if the probability that capital X is less than or equal to x and capital Y is less than or equal to y is capital F of x, y. So this would look like this. Here's
01:35:43
x. Here would be y. This is the point x, y. That probability here is the probability that the pair x, y falls in that region. In case of continuous random variables,
01:36:55
the pair X, Y has joint density little f if the probability that the pair lies in a region like
01:37:00
this can be gotten by integrating a function little f. So let me just give one last example of a density and we'll stop.
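Before that last example, the coin-then-binomial joint pmf computed a moment ago translates directly into code. A minimal sketch, with n = 10 as an assumed trial count:

```python
from math import comb

def joint_pmf(k, u, n):
    """P(X = k, U = u): U is 1/4 or 3/4 with probability 1/2 each,
    and given U = u, X is binomial with parameters n and u."""
    if u not in (0.25, 0.75):
        return 0.0
    return comb(n, k) * u ** k * (1 - u) ** (n - k) * 0.5

n = 10

# The joint pmf sums to 1 over all pairs (k, u).
total = sum(joint_pmf(k, u, n) for k in range(n + 1) for u in (0.25, 0.75))
print(total)  # 1.0 up to floating-point rounding

# Summing out U gives the marginal pmf of X.
marginal = [joint_pmf(k, 0.25, n) + joint_pmf(k, 0.75, n) for k in range(n + 1)]
print(round(marginal[0], 4))  # (1/2)(3/4)^10 + (1/2)(1/4)^10 ≈ 0.0282
```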
01:37:43
OK, so do you know what this means? Indicator of something? This means a function that is one if this condition is satisfied and it's zero if it's not. So this density is one times this
01:38:24
number when x squared plus y squared is less than or equal to r squared. And it's zero when x squared plus y squared is bigger than r squared. So where is this function different from zero? Inside this circle centered at the origin of radius r. And what's its value on
01:38:42
that circle? It's one over the area of the circle, one over pi r squared. A pair of random variables with this joint density is said to be selected at random from the disk of radius r centered at the origin. This is uniformly distributed on the disk of radius r centered at the origin.
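A point uniformly distributed on the disk can be simulated by rejection from the enclosing square; this sketch (with r = 2 as an arbitrary choice) also checks a consequence of the density being 1/(pi r^2):

```python
import random

def sample_disk(r):
    """Draw uniformly from the square [-r, r]^2 and keep the point
    only when it falls inside the disk x^2 + y^2 <= r^2."""
    while True:
        x = random.uniform(-r, r)
        y = random.uniform(-r, r)
        if x * x + y * y <= r * r:
            return x, y

random.seed(0)
r = 2.0
pts = [sample_disk(r) for _ in range(100_000)]

# With density 1/(pi r^2) on the disk, the chance of landing within
# radius r/2 is pi (r/2)^2 divided by pi r^2, which is 1/4.
frac = sum(x * x + y * y <= (r / 2) ** 2 for x, y in pts) / len(pts)
print(frac)  # ≈ 0.25
```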
01:39:03
OK, we'll do more with that later. OK, thanks.