Joint Distribution (1)
Part 4 of 16 · DOI: 10.5446/12889 · License: CC Attribution - ShareAlike 3.0 Unported
Transcript: English (auto-generated)
00:05
The probability that capital X is less than or equal to little x is equal to 0 if little x is less than
00:23
0. It's 1 minus p if 0 is less than or equal to x is less than 1 and it's 1 if x is
00:41
greater than or equal to 1. That function looks like this. It's 0 up to but not
01:28
including the value x equals 0. It jumps to 1 minus p at 0, stays there until you get to 1, then it jumps to 1 and stays there. And from this graph you can read
01:49
out the following pieces of information. The probability that capital X is 0 is the jump at 0 and the jump at 0 is 1 minus p. The probability
02:08
that X is 1 is the jump at 1, and that jump has size p. Now to see why this is true, this
02:24
captures every aspect that you need to know about discrete random variables. Discrete random variables are ones that have this feature. They have jumps and between the jumps they're flat. Those are called discrete random variables. If you
02:42
look at this function here and you have y less than x, then f of x minus f of y would be the probability that little y is strictly less than capital X is
03:01
less than or equal to little x. So X is a function on this two-point space. How many values can X
03:20
take on if it's a function? How many values can it take on at S? Only one, and at F only one. So X can only have two values. If I put, and what
03:47
is this probability in this case? Well, it's f of x minus f of y. We did this on Friday.
04:04
What's f of x? 1 minus p. What's f of y? It's 0. And does it matter how close x and y are to 0? Well, a little bit x can't be over here. But as long as x is near 0 to the right of it and y is near 0 to the left of it, this will be 1 minus p.
04:24
And if I let x and y collapse down, what do we conclude about the value of x? What do we conclude about the probability that x is equal to 0? That it's, well what does
04:40
this interval collapse to when x goes to 0 and y goes to 0? Collapse to the point 0. So where could x be? It has only two possible values. One of them is going to be 0 and the probability of being 0 is 1 minus p. On the other hand, if I put x over here and y there,
05:04
what would f of x minus f of y equal? Well, f of x is 1, f of y is 1 minus p, and 1 minus (1 minus p) is p. So this would be the jump here, which would be p. And I can let x and y get arbitrarily
05:22
close to 1. And that would mean that the probability that X is 1 is p. Now to review, Bernoulli random variables led to a whole bunch of other ones. In fact, probably almost all the ones
05:44
you will study that are discrete are coming from Bernoulli random variables. Any question on this Bernoulli random variable and its distribution function? I guess there's one
06:04
last thing to say. The probability mass function gives the probability that X takes each of its possible values; otherwise the probability that X is
06:44
equal to a value little x is 0. And when we don't assign values to a probability mass function, we assume they're 0. And we'll use an abbreviation for a probability
07:04
mass function, pmf. OK, so Bernoulli random variable is a random variable that assigns
07:51
value x to success, I'm sorry, 1 to success, 0 to failure. Binomial random variable,
08:03
you do n trials, which are Bernoulli. These aren't all equally likely, but the probability
08:31
and so what you do here is x of an outcome is the number of S's in the outcome. So each
08:47
of these sequences of S's and F's, in each one you count the number of successes, that's x. And the probability of any outcome is the following, it's p to the number of S's,
09:07
p to the number of S's in omega, and then 1 minus p to the number of F's in omega.
09:24
OK, so omega would be a sequence of S's and F's. And we put a probability measure on this sample space, which assigns this amount, this probability to omega. If omega has k, say, k S's and n minus k failures, this would be p to the k, 1 minus p to the n minus k.
09:51
And then here a binomial random variable with parameter n and p, which will abbreviate
10:06
binomial n, p, has the property that the probability that x is equal to k is p to the k, 1 minus p to the n minus k. So a binomial random variable has the interpretation that it's the number of successes
10:24
in n independent trials of, n independent Bernoulli trials. Oh, I'm sorry, I forgot something here, times n choose k. Bernoulli also led to geometric. You perform independent
11:45
Bernoulli trials, that is, you perform an experiment that has two outcomes, success and failure. Probability of success is p, probability of failure is 1 minus p. You perform that until the first success happens. That random variable is called a geometric random variable.
12:03
And last time we said that the probability that X is k for this random variable is 1 minus p to the k minus 1, and then times p, and this is for k equal 1, 2, etc. So this is
12:21
the probability mass function for a geometric random variable. The distribution function here, the probability capital X is, let's say, less than or equal to k, would be the sum, j equal 1 to k, of 1 minus p to the j minus
12:48
1 times p. Get the distribution function from the probability mass function by adding.
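A small sketch of those two formulas, assuming Python's standard library (the parameter value p = 0.3 is an arbitrary choice for illustration):

```python
# Geometric pmf and distribution function, built by adding up the pmf.
# p is the success probability; values k = 1, 2, 3, ... (trial of first success).

def geom_pmf(k, p):
    # P(X = k) = (1 - p)^(k - 1) * p
    return (1 - p) ** (k - 1) * p

def geom_cdf(k, p):
    # P(X <= k) = sum of the pmf over j = 1, ..., k
    return sum(geom_pmf(j, p) for j in range(1, k + 1))

p = 0.3                      # arbitrary example value
print(geom_pmf(1, p))        # 0.3
print(geom_cdf(3, p))        # 0.657
print(1 - (1 - p) ** 3)      # matches 1 - (1-p)^k, the tail formula derived later
```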
13:04
Bernoulli also led to Poisson, perform n independent trials, but the probability of
14:13
success depends on n. I gave some biological examples, also the example of an emission of alpha particles from a blob of radioactive substance. Then the probability that x is
14:27
equal to k is approximately equal to lambda to the k over k factorial e to the minus lambda. And a random variable with this pmf is called Poisson. And then finally, Bernoulli
15:02
led to a normal distribution, so then the probability X over root n is less than
16:18
the number little x, is approximately given by this distribution function. And this is
16:25
the distribution function for normal random variable with mean 0 and variance 1. So Bernoulli led to all these different random variables or distributions. We'll study all of them. We spent some time on discrete, let's spend a little time on continuous.
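As a quick numerical check of the Poisson approximation mentioned a moment ago, here is a sketch assuming Python's standard library (n = 1000 and p = 0.002 are arbitrary choices, with lambda = n times p):

```python
import math

# Binomial(n, p) pmf versus the Poisson(lambda = n * p) approximation.
def binom_pmf(k, n, p):
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    return lam ** k / math.factorial(k) * math.exp(-lam)

n, p = 1000, 0.002           # many trials, small success probability
lam = n * p                  # lambda = 2
for k in range(5):
    print(k, binom_pmf(k, n, p), poisson_pmf(k, lam))
# The two columns agree to several decimal places.
```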
16:56
I would like to do one computation with the geometric before going on to continuous
17:01
though. Continuous means that the distribution function of the random variable has no jumps. That's what continuous random variable means. So again, here I'm taking x to be geometric.
17:35
Okay, so when I say that, I don't say what the value of x is, I say something about the statistical properties or probabilistic properties of x. And the only probabilistic
17:45
information we have about x would be its pmf or equivalently its distribution function. So when I say x is a geometric random variable, I'm saying its distribution function is this. Whenever I identify a random variable, x is something like normal. I'm saying what
18:03
the distribution function is. If I say x is Bernoulli, I'm saying what the distribution function is. So here I'm saying x is a geometric random variable. And I should say what the parameters are. That means if I want to know what the probability that x is smaller
18:24
and equal to some value k, it's equal to this. That's the information we're given. Now I want to do a conditional probability. I want to compute the probability that what?
18:55
What's the interpretation of x again? It's a trial on which the first success appears.
19:00
So it's like, p is one-sixth. This is the same as rolling a die until the first one appears. And the number of rolls it takes, the number of the roll on which the first one appeared, that's the value of X. So what's the probability that the first success occurs after this
19:21
trial given that it didn't occur in the first n minus 1 trials? So what's the probability that the success doesn't occur in the first n plus k minus 1 trials given that it did not occur in the first n minus 1 trials? Does everybody understand what I'm saying is contained
19:44
in this line here? x is the trial in which the first success occurs. It says you've been trying whatever you're trying like rolling a die or tossing a coin, waiting for success. You've done it n minus 1 times and you didn't succeed. What's the probability that if you do it k more times, you still don't succeed? That's what this probability is.
20:05
Now do you recall the definition of conditional probability? Probability of a given b? These are two events. This is an event, that's an event. So we look at the probability of a given b. Remember that's the probability of a
20:28
intersected with b divided by the probability of b.
20:45
So we do that here. Here's the event a. This is the event b. What does this become?
21:16
So this comma means both things happen. Now one of those inequalities is stricter
21:24
than the other in the numerator. Which one is stricter or which one gives a smaller event? Right. Because if x is bigger than this, it's automatically bigger than that. We're taking k to be 1, 2, 3, etc. So that means I can just do what? Get rid of it,
21:46
cross that out. It doesn't add anymore to the event. Right. So now let's compute. What's the probability? Can we compute the probability that x is bigger than something?
22:01
We know the probability x is smaller than something, smaller than or equal to; a distribution function has this form. What if you want to compute this? Right, it's 1 minus f of x.
22:30
It's just the thing that I told you before. The probability of A complement is 1 minus the probability of A. A here is this event, capital X less than or equal to
22:43
little x. What's A complement then? Capital X bigger than little x. So here's A complement. This would be the probability of A. So we can compute these things because what? We know the distribution function. So how would we turn this around? What would
23:05
1 minus, this is f of k here. What's 1 minus f of k in that case? Well what happens when I add this up j equal 1 to infinity? I get f of infinity which is 1. 1 minus f of k would be the sum k plus 1 to
23:28
infinity. Let me do it this way. This one is the sum j equals 0 to infinity.
23:41
1 minus p to the j minus 1 times p. This one is the sum j equals 0 to k. 1 minus p to the j minus 1 times p. Does everybody agree with this? I did this last time.
24:02
Why is this 1? 1 minus p. This is a geometric series here. I'm sorry this should be 1 here.
24:21
That's 1. I started at the wrong place here. 1, 1, 1. Did I start at the right place here? Yeah I did it right. Or the other way to say it is this is f of infinity and the So what would we have if we subtract this from this?
24:41
Well we'd have the terms from k plus 1 to infinity right? Okay so can you say anything more specific about that? Can we evaluate that?
25:08
Well what do I mean? Can we make it simpler? Can we express it without using an infinite series? Well, there's a factor that multiplies every term here, the p, right, and that
25:25
doesn't depend on j so I can pull that out. And what do we know about series that look like this?
25:42
We do know the sum j equals 0 to infinity r to the j is 1 over 1 minus r as long as absolute r is less than 1. And here we have 1 minus p. p is the number between 0 and 1.
26:02
So absolute 1 minus p is less than 1. Now the difference between this and this is what? Well I guess one difference is here's a j and there's a j minus 1. But I don't think that's a big obstacle. We could just change variables right?
26:21
So let's do that first. Well maybe we do that second. What's the other difference? Here we start at k plus 1. Here we start at 0. So we'd like to start at 0 and have a j there.
26:41
So maybe what can we do? Can we factor out? What's the first term here? If j is k plus 1 what appears here? If j is k plus 1 what appears here? k right? If this is k plus 1, k plus 1 minus 1 is k.
27:03
So the first term here is 1 minus p to the k and all the others will have higher powers of 1 minus p. So why don't we factor out 1 minus p to the k? It'll occur in every term. So p 1 minus p to the k sum j equals something to infinity 1 minus p to the what?
27:28
Well now that we've taken out the 1 minus p to the k we start with what? 1 minus p to the k was here. We pulled it out. So what's left for the first term?
27:41
When j is k plus 1 we factor out 1 minus p. This becomes 1. What's the next term? We put k plus 2 we get 1 minus p to the 1. What's the next one? When k plus 3 we get 1 minus p squared so we get 1 minus p to the 0 plus 1 minus p to the first plus 1 minus p squared
28:03
plus 1 minus p cubed. In other words we get some j equals 0 to infinity 1 minus p to the j wouldn't we? And this we can evaluate using that. So we get p times 1 minus p to the k times 1
28:25
over 1 minus 1 minus p and 1 minus 1 minus p is p and here's a p. So this and this cancel
28:48
leaving 1 minus p to the k. So let's summarize what did we just do. The probability that x is bigger than k for a geometric random variable of parameter p
29:05
is 1 minus p to the k. All right so what did we want to compute? We want to compute this ratio.
29:21
Can we do it? Yes we can. We know what things like this look like and what things like that look like, so let's evaluate that. Here we get 1 minus p to what power? n plus k minus 1. And here we get 1 minus p to the n minus 1. The ratio is
30:02
1 minus p to the k, which was the probability you have to wait k trials, or, if you're a little more careful, the probability that you don't get a success in the first k trials. So the probability you have to wait k more trials
30:28
even after you've gone n minus 1 trials is the same as though you're starting from the beginning. So if you're rolling a die and you've done it a thousand times and
30:41
you didn't get a 1, does it mean that you're due, that you should be more likely to get a 1 next or in the next few trials? No, you're not due. It's just like you're starting again, as though you haven't rolled the die at all. Sometimes you hear people say that in the lottery some
31:04
numbers are going to be hot because why? Well they haven't appeared for a while. Well does that mean they're going to appear soon? No. So keep this lesson in mind. If you forget everything else remember this. Okay and does it make sense to you? Yeah. Why? The coin or die or whatever
31:30
you're doing with it is an object with no memory, no way of controlling outcomes, just a stupid object, right? Okay, so now let's go to continuous random variables again. I think I did
31:55
maybe I didn't. I'll write "continuous"; I should try to spell it right.
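Before moving on to continuous random variables, here is a simulation check of the no-memory lesson just drawn from the geometric distribution, a sketch assuming Python's standard library (p, n, and k are arbitrary example values):

```python
import random

# Simulate geometric waiting times and compare the conditional tail
# probability with the unconditional one (the "no memory" property).
def geometric_sample(p):
    trials = 1
    while random.random() >= p:   # failure with probability 1 - p
        trials += 1
    return trials

p, n, k = 1 / 6, 5, 3            # e.g. rolling a die; n and k arbitrary
samples = [geometric_sample(p) for _ in range(200_000)]

past = [x for x in samples if x > n - 1]
cond = sum(x > n + k - 1 for x in past) / len(past)    # P(X > n+k-1 | X > n-1)
uncond = sum(x > k for x in samples) / len(samples)    # P(X > k)
print(cond, uncond, (1 - p) ** k)                       # all close to (1-p)^k
```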
32:34
Okay, so continuous means that the distribution function is a continuous function of x. The distribution functions I did earlier all had
32:42
flat spots interspersed, I mean jumps interspersed with flat spots. Okay, here there's continuous increase; these functions can have flat spots but no jumps. Okay, we did one last time, the uniform random variable.
33:04
The very important one is called the exponential random variable with parameter lambda. Let me be a
33:59
little more careful x cannot take on negative values the probability capital x is negative is
34:16
zero the probability that's less than or equal to x is like this. If this is the case what
34:39
would the probability that capital x is bigger than little x equal? Well what's the complement
34:49
of this event? It'd be one minus; this is the complement of this event, right? Capital X bigger than little x is the complement of capital X less than or equal to little x. So this probability
35:01
is one minus that probability. What's one minus this probability? Right e to the minus lambda x. So the probability dies off exponentially fast as little x goes to infinity. Now
35:23
sometimes these functions are differentiable in this case it is. What's the derivative of this function? It would be lambda e to the minus lambda x for x positive
35:40
and zero for x negative, and at zero it's not differentiable. We usually give this the notation little f, and that's called the density
36:05
and what's the relation between a function and its derivative that is given by the integral calculus? Do you remember the fundamental theorem of calculus? If you integrate the derivative of a function what do you get? You get the function back again right.
36:46
If f prime is little f, then the integral of little f from y to x gives capital F at x minus capital F at y, and if these are distribution functions then this would be the probability that capital X is bigger than y and less than or equal to x. So keep this
37:06
formula in mind if you want to know the probability like this if you know the distribution functions you just take the difference of the distribution functions if you know the density you integrate the density from y to x. Okay so these random variables the exponential
37:28
random variables parameter are used to model waiting times like the time you wait for a bus the time you wait for the car in front of you to start up after light turns from red to green
37:42
uh the time you wait for a phone call the time you wait for lightning to strike the time someone waits for a customer the first customer come into a store all kinds of things like that the time you wait for a light bulb to burn out the time you wait until the first emission
38:01
of an alpha particle from your smoke detector or once one is emitted the time you wait until the next one is emitted so many many uh applications of this and they have also the same peculiar property here that uh geometric random variable had and that is if you're interpreting this in
38:45
terms of light bulbs and x is the time when the light bulb burns out you put the light bulb in now you keep track of how long it's been burning when the first moment it burns out that's the value of capital x it says that this is the event that the light bulb has not burned out by
39:03
time t, that is, the time when it burns out is later than t. What's the probability that it lasts longer than s plus t moments given that it's lasted t moments? It says that it's the probability that it lasts at least s moments; in other words, it forgets that it's been burning for t
39:27
time units already, and its probabilistic behavior in the future is as though you just put it in. Let's check to see this is true, I mean that this equation is true. I don't know that it's definitely true about light bulbs; if it is true about light bulbs, I think lambda
39:46
is pretty big my experience at home i'm always going to the garage getting the ladder out and new light bulbs so lambda must be large for the light bulbs i'm buying but let's check this how can we do this again we use the definition of conditional probability
40:17
and now s and t are positive i didn't say that but let me say it now
40:24
that means s plus t is bigger than t, and that means that this event has no effect, because if x is bigger than s plus t it's automatically bigger than t. And now, do we know how to compute the numerator and denominator?
40:42
well, what random variable is this? This is the exponential, and the probability an exponential is bigger than this value would be e to the minus lambda times that value. Okay, so what's x in the numerator? It's s plus t,
41:10
and here, in the denominator, it's e to the minus lambda t, and the ratio is e to the minus lambda s, which is in fact the probability capital X is bigger than s. So this is sort of a renewal
41:29
property if something's been going on for t time units and nothing's happened yet probably you have to wait s more time units is the same as if you just started your wait for the event at time zero and had to wait s time units
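Here is a quick check of that renewal property for the exponential, both from the formula and by simulation, a sketch assuming Python's standard library (lambda, s, and t are arbitrary example values):

```python
import math, random

# Exponential(lambda) tail: P(X > x) = exp(-lambda * x); check memorylessness.
lam, s, t = 2.0, 0.5, 1.5        # arbitrary example values
print(math.exp(-lam * (s + t)) / math.exp(-lam * t))   # P(X > s+t | X > t)
print(math.exp(-lam * s))                              # P(X > s): the same number

# The same thing by simulation (random.expovariate draws exponential samples).
draws = [random.expovariate(lam) for _ in range(200_000)]
survived_t = [x for x in draws if x > t]
print(sum(x > s + t for x in survived_t) / len(survived_t))  # approx exp(-lam * s)
```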
41:46
okay a random variable closely related to exponential is called the gamma
42:08
let me first introduce the gamma function
42:38
this is an interesting function if you think functions can be interesting then
42:42
this is one of them and this is for alpha we'll define this for alpha bigger than zero because when alpha is zero you have a one over x here and near the origin that would not have a
43:00
finite integral, but as long as alpha is bigger than zero this is fine. You can integrate this by parts; let me get rid of my sign there. Okay, and that would say
43:50
the gamma of alpha is what u times v evaluated between zero and infinity minus integral from
44:01
zero to infinity v du well the v has a minus sign here's the minus sign it becomes plus e to the minus x that's the v du is alpha minus one i can put that out the alpha minus one out in front and i get x to the alpha minus two the first thing here is what at
44:24
infinity it's zero because of this guy at zero it's zero because of this guy so this term is zero so we get alpha minus two integral from zero to infinity x to the i'm going to
44:41
write alpha minus two in a perverse way well maybe weird let me take that alpha minus one minus one because why look at the form of the gamma function you have a number minus one and
45:01
an exponent with x for x right what do we have here a number minus one in the exponent for x so what is this integral here that's gamma at alpha minus one oh wait i'm sorry this is this
45:25
is, I don't know how I got to alpha minus two there, it's alpha minus one here, that was alpha minus one. So what does this say? I can erase this now, right? And what's gamma of
45:50
one when alpha is one there's no power of x here it's just the integral from zero to infinity
46:02
e to the minus x dx and what's that well if we look back to the exponential random variable that's the integral of the density of the exponential with parameter lambda equal one and that has to be one and i think you all know how to do this integral it's minus e to the
46:21
minus x, right, from zero to infinity; gamma of one is one. So these two together imply that gamma of an integer n is, what, n minus one factorial. So this is an integral way of representing the factorial function. Why does this follow from these two things? Put
46:48
n here, what do we get? n minus one times gamma at n minus one. Now if you repeat this with gamma of n minus one, we get gamma of n minus one is n minus two times gamma of n minus two, and we keep going down until we get to one, and gamma of one
47:03
is one okay so this says it if i put n here this thing would be n minus one factorial okay one last little variation on that let's try to compute this before the break
47:43
now we know how to compute this if lambda is one so that suggests what operation here what should we do we should try to get the variable of integration here we should change variables if lambda is one we know this is gamma of alpha so let's put y equal to lambda x
48:08
what's dy in that case it's lambda dx okay how would this change the variable affect the limits of integration when x is zero which y
48:22
zero when x is infinity y is infinity so that doesn't change we get our e to the minus y here that's good for dx we put in lambda to the minus one dy and now for x we put in y over lambda for x we put y over lambda we put y over lambda here
48:53
we get y to the alpha minus one what power of lambda would we get it's y over lambda so
49:01
we get lambda to the one minus alpha right now i'll erase this and this gives lambda to the minus alpha and what's the what's this integral then that's
49:28
gamma of alpha so this is equal to gamma of alpha over lambda to the alpha and that says
49:50
that the following is a density, lambda to the alpha times x to the alpha minus one times e to the minus lambda x, divided by gamma of alpha: that just means it's non-negative and integrates to one
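A short numerical check of these facts, assuming Python with SciPy available (alpha and lambda are arbitrary example values; the gamma function here is the standard Gamma(alpha) = integral from 0 to infinity of x^(alpha - 1) e^(-x) dx):

```python
import math
from scipy.special import gamma
from scipy.integrate import quad

# Gamma function at integers: Gamma(n) = (n - 1)!
for n in range(1, 6):
    print(n, gamma(n), math.factorial(n - 1))

# The gamma density with parameters alpha, lambda integrates to one.
alpha, lam = 2.5, 3.0            # arbitrary example values
density = lambda x: lam ** alpha * x ** (alpha - 1) * math.exp(-lam * x) / gamma(alpha)
total, _ = quad(density, 0, math.inf)
print(total)                     # approximately 1.0
```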
50:32
and when we come back we'll talk about the random variables with that density those are called gamma random variables and these model things where you have to wait for a successive
50:47
you have successive waiting times like maybe you wait till five light bulbs burn out you put in a light bulb wait till it burns out put another one that's identical wait till it burns out third one wait till it burns out fourth one wait till it burns out fifth one when the fifth one
51:03
burns out that would be a gamma random variable or i mentioned waiting at a traffic light the light turns red to green you're the fifth car back the waiting time for the first car to take off is probably exponential after that the second one takes off another exponential third
51:21
one takes off, exponential. Sums of these exponential random variables are then gamma random variables, so a lot of waiting time problems are modeled using gamma random variables. So let's take a break, come back in about 10 minutes. Okay, so let me finish up the gamma random variable here
51:45
this is a gamma random variable if its density is given by that
52:10
and we'll work extensively with gamma later. I think I already mentioned the uniform distribution, but let me also mention
52:31
X is a random variable, and this is the distribution function for X.
53:18
Let's see, that means its density is this constant times that.
53:22
yeah that thing that looks like sigma squared is just the symbol right it's not it's a number it's a number yeah positive number but it's not the number squared that's the symbol uh no it's a number squared it's a sigma and then it's squared yeah it's a sigma to the power of two but they always they usually go together well they go together here and here
53:47
but then sigma is called the standard deviation and the graph of this function the density then
54:05
this statement is equivalent to saying the density of x is one over root two pi sigma squared e to the minus x minus mu squared over two sigma squared that's good for all x
54:25
this is the famous bell-shaped curve. Mu is here; the graph of this function looks like this, symmetric with respect to mu. If mu is zero then it's centered over here. I have here a 10 mark note, which used to be
55:00
the currency in germany before the eu and there's a picture on here of uh gauss how do you like that country where they put mathematicians on the on the currency that's a great country he was a surveyor so they have a surveying tool on the back he was also the one who
55:25
introduced that density and so there's a graph of it here on the 10 mark note i'll pass it around see here it is on the front i'm waiting for the day when they have some maybe gibbs was
55:42
a great American mathematician, and he did some things in statistical mechanics; maybe they'll put something on a hundred dollar bill someday. If you ever run for president, maybe that'll be a great idea. Okay, so that's sort of a category, or a catalog
56:10
I should say, of random variables with various distributions. A lot of times we do manipulations on random variables because we don't like the ones we started with, so we
56:21
try to transform them in some way, or we do it for simulation purposes. There are tables that give a list of independent copies of uniformly distributed random variables, so there'll be just tables of numbers: there's a number, picked at random between zero and one,
56:43
the next one is picked, another one picked at random between zero and one. You can use these to give independent copies of random variables with other distributions. So we take functions of random variables for a couple of reasons. So we'll start with a random variable
58:01
that has this given distribution function capital f it could be any one of the ones we've written down before and then we take a function of that random variable and let's say that it's an increasing function though a decreasing function also will work but i want the function to be one to one so maybe i should say strictly here so that g is one to one
58:25
makes it simpler so it's either strictly increasing or it could be strictly decreasing and then we take g of x and what's the distribution function of g of x so that's
58:41
another random variable function of a random variable is a random variable and what's the distribution function of a random variable defined by it's just that it's a probability
59:21
that that random variable is smaller than or equal to little x; it's this function of little x. Let's try to compute that in terms of capital F. What is capital F? Capital F is the probability capital X is less than or equal to little x. Can we express this in terms of capital X is
59:47
less than or equal to something? g is strictly increasing, so it's one to one.
01:00:00
So it has an inverse, and the inverse would be increasing or decreasing? If g is increasing, what about its inverse? It's increasing also. So if I apply g inverse to both sides, if this number is less than or equal to that one, what about g inverse of this number? It would be less than or equal to g inverse of that number.
01:00:23
In other words, g inverse would preserve this order if it's strictly increasing. If I apply g inverse to here, what do I get? Capital X. And then that would be less than or equal to g inverse of little x, right?
01:00:47
But this is the function capital F evaluated here.
01:01:02
So that would be the distribution function for g of x, f at g inverse of x. Let's do an example. Let's suppose x is uniformly distributed on the interval from 0 to 1.
01:02:13
When I say what is the distribution of a random variable, it means what is the distribution function? So you should write down the probability that g of x is less than or equal to little x
01:02:21
and then try to compute that. So let's remind ourselves what this means, that capital X is uniformly distributed on the interval from 0 to 1. This means that's the graph of capital F.
01:03:45
That's what it means to be uniformly distributed. It means its distribution function is this. That means you're picking a point at random from the interval from 0 to 1 and there's no bias for any particular place in the interval. It's as likely to come from one side as from the other.
01:04:03
Okay, so let's try to compute the distribution function for g of capital X where that's g. Oh, and by the way, maybe I should say what would happen if g is decreasing.
01:04:56
What would change here?
01:05:03
g inverse would then be decreasing so this would be reversed
01:05:20
and that would be 1 minus F of g inverse of x. I just want to add that. So is this function increasing? x goes from where to where here? We're interested in values of capital X between 0 and 1.
01:05:41
So we're looking at X between 0 and 1 here. Is this function increasing? Well, what about minus x? Increasing or decreasing? The function L of x equals minus x. Is that increasing or decreasing? Look at the line y equals minus x. What does that do? It goes down, right?
01:06:01
What about 1 minus x? Increasing or decreasing? Decreasing. What about the log of 1 minus x? Log is an increasing function. The increasing function of a decreasing function would be? Well, take derivatives. Maybe we could just take the derivative of this. What's the derivative?
01:06:24
And now you have a minus sign here and when you take the derivative of this part you get a minus sign also so they would cancel giving something positive. So this is an increasing function. This is increasing. So we can apply that formula there.
01:06:45
And what's the inverse here?
01:07:23
Find inverses. You just solve for x here. So let's multiply by minus lambda.
01:07:46
So there's g inverse. So this would be the probability capital X is less than or equal to g inverse of y.
01:08:01
g inverse of x. g inverse of x is 1 minus e to the minus lambda x. And now where is this number? 1 minus this thing.
01:08:23
X is going to be bigger or equal to 0 here. This number is between 0 and 1. So it falls in here. So this is just capital F evaluated at
01:08:42
1 minus e to the minus lambda x. What would that be? I'm sorry, you can't read that, can you? Let me just say if this is y, what's f of y? What's the height of that function, that y? It's y, right? So if we put this in here, what's f at this value?
01:09:03
It's that value. What distribution function is that? That was the exponential. So if I start with a uniform random variable on the interval from 0 to 1,
01:09:20
and I take this function of it, I get an exponential random variable. So I just mentioned that there are tables of independent copies of, or independent samples of uniform random variables. How could you get independent samples of exponential random variables from that? Just apply this function to them.
01:09:41
And that'll give you a, that'll generate a list of independent random variables or samples of independent random variables that have exponential distribution. So this transforms random variables with one distribution to random variables with another one.
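Here is that recipe in code, a sketch assuming Python's standard library (lambda = 2 is an arbitrary choice): apply the increasing function g(u) = -ln(1 - u)/lambda worked out above to uniform samples, and the results have the exponential distribution function.

```python
import math, random

# Turn Uniform(0, 1) samples into Exponential(lambda) samples by applying
# the increasing function g(u) = -ln(1 - u) / lambda.
lam = 2.0
uniforms = [random.random() for _ in range(200_000)]
exponentials = [-math.log(1 - u) / lam for u in uniforms]

# Check the distribution function: P(X <= x) should be 1 - exp(-lam * x).
for x in (0.25, 0.5, 1.0):
    empirical = sum(e <= x for e in exponentials) / len(exponentials)
    print(x, empirical, 1 - math.exp(-lam * x))
```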
01:10:03
Let's do another example.
01:10:21
Did I write this symbol here? You'll see this a lot. Suppose x has a normal distribution with mean mu and variance sigma squared.
01:10:41
And let's say y is equal to ax plus b. What's the distribution of y?
01:11:13
So what does it mean when you're asked, what is the distribution of y? It means you should try to figure out what the distribution function is. That is,
01:11:20
you should write down the probability y is less than or equal to little x and see if you can compute it. So you're being asked,
01:11:44
what is this? Well, how do you start? This is given in terms of something that's known. What?
01:12:08
And now we can just subtract b from both sides. These two inequalities are equivalent.
01:12:24
Now the next step might require assuming a is either positive or negative. Which do you prefer? Okay, let's take positive. That's a good outlook. Let's say if a is positive, we divide both sides by a.
01:12:43
If a was negative, what would happen? It would just reverse inequality, right? The point is that this inequality is equivalent to that. So this event and this event are the same. That means the probabilities remain the same. Okay.
01:13:16
Now we know what that is. That's 1 over square root 2 pi
01:13:22
sigma squared integral from minus infinity to x minus a over b e to the minus y minus mu squared over 2 sigma squared. Now you might think, well, how can I ever remember all that? Well, you're going to see this so often that you'll wake up in the middle of the night
01:13:43
and it'll be the last thing you were dreaming about before you woke up. Okay, so that's the distribution function of y. What's the density of y? The density of a distribution function is its derivative.
01:14:02
So let's take the derivative, the density of y. How do you differentiate an integral? Especially if you have an integral from minus infinity
01:14:22
to, let's say, h of x little f of y dy. What's the derivative of this with respect to x? Remember how that goes? It's f at h of x times h prime of x. Yeah?
01:14:45
You're right, the upper limit should be x minus b over a, not x minus a over b. Thank you. I told you about my vertical version of this, right? That's happening again.
01:15:06
Okay, so that's the h of x. So we just evaluate the density at h and then times h prime. Okay? h of x is x minus b over a. So what's h prime? If that's h, x minus b over a,
01:15:22
h prime would be one over a, right? I'm going to put a little cap, a little y here. I evaluate here.
01:16:05
I don't like that all that well because it looks a little complicated, doesn't it? So let's try to simplify it a bit.
01:16:32
I'm going to put the a inside here. And that means it becomes a squared.
01:16:42
And then because I don't have enough room to write high, I'm going to write x instead of d2. I think you know that notation, right? And then up here,
01:17:16
let's see. I'm combining these two guys and this one's a fraction.
01:17:21
And if I'm going to add two things, I should have a common denominator. So the denominator should be a. So this becomes mu a over a. And then I'll have something over a in here, right? But it's squared. So where could we put that?
01:17:42
We could put it down here, right? We could put the a squared down here. And then what would be left up here? Well this doesn't have to be changed. I should get x minus b. But then what's left here? a mu. I'm going to write this as a mu over a. I combine this, I get x minus b minus a mu.
01:18:03
And I'm going to write it this way. Okay? Is everybody okay with that? No objections? You see why?
01:18:22
Okay. So what do we have? Yeah, it's normal. What's the new mu? This is normal density.
01:18:40
The new mu becomes b plus a mu. And the new variance, the new sigma squared, becomes sigma squared a squared. Okay?
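A simulation check of that conclusion, assuming Python with NumPy available (mu, sigma, a, and b are arbitrary example values):

```python
import numpy as np

# If X ~ Normal(mu, sigma^2) and Y = a*X + b, then Y ~ Normal(b + a*mu, a^2 * sigma^2).
mu, sigma, a, b = 3.0, 2.0, 0.5, 1.0        # arbitrary example values
rng = np.random.default_rng(0)
x = rng.normal(mu, sigma, size=500_000)
y = a * x + b
print(y.mean(), b + a * mu)                 # sample mean vs b + a*mu
print(y.var(), a ** 2 * sigma ** 2)         # sample variance vs a^2 * sigma^2
```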
01:19:02
So if we start with something with normal mean mu variance sigma squared, can we get by this transformation which I erased? Can we get to a normal with mean
01:19:20
zero and variance one? We took ax plus b and we got to this. Can we arrive at normal mean zero variance one? Can we get the new mu to be zero and the new variance to be one? How could we go from
01:19:40
normal mean mu variance sigma squared to normal mean zero variance one? Can we do that? All we have to do is solve the equation b plus a mu is zero. Sigma squared a squared equal one. So what should a squared be? Or what should a be? One over a?
01:20:03
I'm sorry. a should be one over sigma. And mu should be I'm sorry, b should be well
01:20:21
minus mu over sigma. Then we go from a if we multiply by a and add b we go from there to there. What about the other way? Can we go backwards? Suppose we start with a normal mean zero and variance one. Can we get to a normal mean mu
01:20:40
variance sigma squared? How do we go from here to here?
01:21:03
So that means if we start with this as zero, the mu, and this as one, the sigma squared, we wind up with one over square root two pi a squared exponential
01:21:22
we wind up with what? x minus this is zero so x minus b squared over two a squared. Now how could we get mu here and sigma here? Just take b equal mu and a equal to
01:21:41
sigma. That is if x is I'll usually write it this way if x is normal with mean zero and variance one
01:22:01
then mu x, I'm sorry sigma x plus mu is normal with mean mu and variance sigma squared. This is how you go from a normal with mean zero and variance one to a normal with mean mu and variance sigma squared.
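In code, this standardization lets a standard normal distribution function play the role of a printed table, since P(X less than or equal to x) equals Phi of (x minus mu) over sigma. This is a sketch assuming Python with SciPy available; mu, sigma, and x here are made-up example values.

```python
from scipy.stats import norm

# norm.cdf is the distribution function Phi of a Normal(0, 1).
# For X ~ Normal(mu, sigma^2), standardize: P(X <= x) = Phi((x - mu) / sigma).
print(norm.cdf(0.44))                 # about 0.6700

mu, sigma = 1.0, 0.1                  # hypothetical example values
x = 1.15
print(norm.cdf((x - mu) / sigma))     # P(X <= 1.15) for that normal
```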
01:22:21
So this is handy because in a lot of statistical tests like here's a dream job you get your degree in math and statistics and you're hired by I don't know who owns M&M's? I forget some
01:22:40
maybe Nestle's I don't know or maybe you're hired by Nestle's and they make chocolate drops these chocolate chips chocolate chips don't all have the same weight the weight of a chocolate chip is a random variable and it's got a normal distribution with some mean mu
01:23:00
and some variance sigma squared and you're supposed to find out what it is. You use things like this where you try to go from normal with mean zero and variance one to whatever this mean weight and variance for the chocolate chip is. So you want to use this transformation.
01:23:21
Also there are tables for normal random variables with mean zero and variance one. In fact in the back of the book there's a table for the distribution function of normal with mean zero and variance one. So that is in the back of the book you'll find
01:23:53
values for this function of x for a lot of different x's and it should be table
01:24:00
which yeah table two in my older version well it's the appendix B table two so they will have a bunch of numbers in here
01:24:22
and right here is point four and up here is point zero four so if I go across here to point four four this number is a probability that a normal random variable with mean zero and variance one is less than or equal to point six seven zero I'm sorry the probability of a random variable with mean zero and variance one
01:24:41
is less than or equal to point four four is point six seven zero zero. Now if I want to know a similar probability for this kind of random variable, I apply this transformation to a normal random variable with mean zero and variance one and I get one with mean mu and variance sigma squared, or I might go backwards,
01:25:01
so this helps you to read off a distribution function for this kind of random variable from a distribution function for the normal with mean zero and variance one and later on in the statistical testing part of the course we'll be doing that kind of thing a lot. Okay so that's
01:25:21
the end of chapter two now let's start chapter three on joint distributions a lot of times comparisons are made for example my oldest son is
01:25:40
has shoe size three sizes bigger than mine size fourteen and so that's a random variable my shoe size is a random variable you might look at the shoe size of a father and the shoe size of his son these will be two random variables
01:26:02
will they be related somehow would you expect I don't know Yao Ming for example to have a son with a size nine shoe probably not right but Yao Ming's son's shoe size would be related to his shoe size or
01:26:21
to go the other way who knows who Messi is who is he soccer player yeah he's a soccer player he's probably the best soccer player in the world and in fact his team spotted him when he was a young boy he was very highly skilled but he was really small
01:26:43
so in fact they gave him growth hormones but he's still small but you would expect his son to have small shoe size right so these are examples of pairs of random variables shoe size of a father shoe size of a son or height of a father height of a son which are
01:27:01
correlated they have some kind of joint characteristics joint distribution so chapter three is on jointly distributed random variables and there are many
01:27:21
examples of random variables that come in pairs that are jointly distributed so let's do a couple of maybe very simple examples that involve what we've done already let's say we take
01:28:01
we select a number at random from zero one then we perform n independent Bernoulli
01:28:21
trials with probability u of success
01:28:40
what is the probability that we have
01:29:23
no more than k successes and that the capital U we picked is less than or equal to little u?
01:29:42
this is called the joint distribution of x and u so we perform some operation some experiment first namely picking a number at random that becomes our probability of success and then with that probability of success we generate a random variable which is
01:30:01
the number of successes in n independent Bernoulli trials with that probability of success each time now this is a little ambitious to do the first time let's try something a little easier let's try the following let's say I'm sure you'll
01:30:33
be in favor of trying an easier computation first let's say that the probability that u is
01:30:42
a fourth is equal to the probability that u is three fourths is one half so we have a random variable that takes two values one fourth or three fourths with equal probability so how could you do that
01:31:01
well you toss a coin if it comes up heads you set u equal a fourth if it comes up tails you set u equal to three fourths now let's try this we could look at now in fact the pmf this would be the joint pmf
01:31:41
well, what kind of values of U do we have to consider here? Just this one and this one. So let's start with this one. So what do we have here?
01:32:04
maybe we should write this in terms of conditional probabilities it would look like that now given that u is a fourth what kind of distribution does x have
01:32:24
it's binomial yeah go ahead oh I'm sorry yeah there it is again thanks thank you
01:32:44
given u is a fourth what distribution does x have over here it says that it would be we do the same kind of thing it's binomial with parameter one fourth it's binomial with parameter one fourth
01:33:01
so this would be n choose k one fourth to the k three fourths to the n minus k times a half the half coming from here
01:33:26
and above it here I'll write the probability that x is equal to k and u is three fourths would be what do the same thing that's the probability that x is equal to k
01:33:41
given u is three fourths times the probability that u is three fourths which would be this is now binomial with parameters n and p equals three fourths so it would be n choose k
01:34:03
three fourths to the k one minus three fourth is one fourth to the n minus k times a half so generally we say
01:34:49
the pair x, y
01:35:26
of random variables has a joint distribution function capital F if the probability that capital X is less than or equal to x capital Y is less than or equal to y is f of x, y so this would look like this
01:35:41
here's x here would be y this is the point x, y that probability here is the probability that the pair x, y falls in that region
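Going back to the two-valued U example from a moment ago, here is a sketch of that joint pmf, assuming Python's standard library (n = 10 is an arbitrary choice): U is one fourth or three fourths with probability one half each, and given U, X is binomial with parameters n and U.

```python
import math

# Joint pmf of (X, U):  P(X = k, U = u) = P(X = k | U = u) * P(U = u),
# where X | U = u is binomial(n, u) and U is 1/4 or 3/4 with probability 1/2.
def joint_pmf(k, u, n):
    return math.comb(n, k) * u ** k * (1 - u) ** (n - k) * 0.5

n = 10                                   # arbitrary number of Bernoulli trials
print(joint_pmf(3, 0.25, n))             # P(X = 3, U = 1/4)
print(joint_pmf(3, 0.75, n))             # P(X = 3, U = 3/4)

# The joint pmf sums to one over all (k, u) pairs.
print(sum(joint_pmf(k, u, n) for k in range(n + 1) for u in (0.25, 0.75)))
```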
01:36:28
in case of continuous random variables the pair x, y has joint density
01:36:57
little f if the probability that you lie in a region like this
01:37:01
can be gotten by integrating a function little f so let me just give one last example of a density and we'll stop
01:37:42
okay, so do you know what this means? Indicator of something? This means the function that is one if this condition is satisfied and zero if it's not. So this density is
01:38:23
one times this number when x squared plus y squared is less than or equal to r squared and it's zero when x squared plus y squared is bigger than r squared so where is this in words, where is this function different from zero? inside this circle centered at the origin of radius r and what's its value
01:38:42
on that circle? It's one over the area of this circle. A pair of random variables with this joint density is said to be selected at random from the disc of radius r centered at the origin; this is uniformly distributed on the disc
01:39:01
or circle of radius r centered at the origin okay we'll do more with that later okay, thanks
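Finally, a sketch of selecting a pair at random from the disc, matching that last density (assuming Python's standard library; rejection sampling from the enclosing square is one simple way to do it, not necessarily how the course will do it):

```python
import random

# Sample (X, Y) uniformly from the disc x^2 + y^2 <= r^2 by rejection:
# pick points uniformly in the enclosing square and keep those inside the disc.
def uniform_on_disc(r):
    while True:
        x = random.uniform(-r, r)
        y = random.uniform(-r, r)
        if x * x + y * y <= r * r:
            return x, y

r = 2.0
points = [uniform_on_disc(r) for _ in range(100_000)]
# The chance of landing in the smaller disc of radius r/2 is its area fraction, 1/4.
inside = sum(x * x + y * y <= (r / 2) ** 2 for x, y in points) / len(points)
print(inside)        # approximately 0.25
```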