Hypothesis Testing
Formal Metadata
Part Number: 13
Number of Parts: 16
License: CC Attribution - ShareAlike 3.0 Unported
DOI: 10.5446/12883
Transcript (English, auto-generated)
00:05
Any questions on anything? No? Okay. Thought I'd start today with just a couple of computations,
00:25
one related to Chapter 6, whatnot.
01:03
You have independent Poisson random variables with the same parameter lambda, and you want to find the conditional PMF of one of them given information about the sum.
01:21
Does anybody know what this would be? So this is then the probability that x is j given x plus y is k.
01:51
Since they're discrete random variables, that's what it would be. And how do you proceed from here?
02:02
What's the definition of conditional probabilities? The probability of a given b is what? That's the probability that x is equal to j and x plus y is equal to k
02:23
divided by the probability that x plus y is k.
02:41
So now what? So far we've only used the definition of conditional probabilities or conditional PMFs. The first line here is the definition of conditional PMF. Here we've just used the definition of conditional probability. Maybe it's now time to use something about the random variables x and y.
03:02
What do we know about them? They're independent and Poisson. So how can we use independence? Are x and x plus y independent? x and x plus y? No. But can we use it somehow? We can't use it here. This is not the probability that x is j times the probability that x plus y is k.
03:23
But can we use it somehow? When x is j, what does y have to be if x plus y is k? k minus j. So we can use this information, replace this by j. Let's replace this by j.
03:45
If we replace this by j, what does it say about y? It's k minus j. So here we haven't used independence yet, so let's do it now.
04:04
x and y are independent. So the numerator would be the probability that x is j times the probability that y is k minus j divided by the probability that x plus y is k.
04:26
So we've used independence. Now it's time to use other information about x and y, which is... Yes, it should. Thanks.
04:51
What's the probability that x is j?
05:04
What's the probability that y is k minus j? So far so good. What about x plus y?
05:21
If x and y are independent Poissons with parameter lambda, what's the distribution for x plus y? Yeah, Poisson with parameter 2 lambda. We've done that before. How did we do it? We computed moment generating functions. So this would be Poisson with parameter 2 lambda.
05:42
So that means this would be 2 lambda to the k over k factorial e to the minus 2 lambda in the denominator. Okay, does that admit some simplification?
06:05
What about e terms, exponentials? They go, right? Because what's the power of e in the numerator if I combine them? Yeah, e to the minus 2 lambda in the denominator is e to the minus 2 lambda, so they cancel.
06:25
Right. So here in the denominator, we're dividing by k factorial. So that would mean it goes into the numerator up here. Let's take care of the lambdas first. What happens to the lambdas?
06:40
What's the power of lambda in the numerator? Yeah, lambda to the k. What's the power of lambda in the denominator? Also lambda to the k, so the lambdas cancel. So what's left?
07:00
k factorial goes in the numerator over j factorial and then k minus j factorial and then a 1 over 2 to the k. That's it. And somebody in the first row noted
07:25
this part is k choose j. So the conditional PMF of one of the summands given the value of the sum is this.
07:42
And j can be what values here? What values can j have in here? 0 to k, because x and y are non-negative, right? Non-negative integer valued. The one summand can't be more than the sum. So j goes from 0 to k.
08:00
Otherwise, what would this be? 0. And what? That's binomial. So if you know x plus y, x is binomial when they're both Poisson. Actually, you had an exercise like this. It's kind of the other way around.
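Written out, the computation just done is, for j = 0, 1, ..., k,
\[
P(X=j\mid X+Y=k)=\frac{P(X=j)\,P(Y=k-j)}{P(X+Y=k)}
=\frac{\frac{\lambda^{j}}{j!}e^{-\lambda}\cdot\frac{\lambda^{k-j}}{(k-j)!}e^{-\lambda}}{\frac{(2\lambda)^{k}}{k!}e^{-2\lambda}}
=\binom{k}{j}\Big(\frac{1}{2}\Big)^{k},
\]
that is, given X plus Y equals k, X is binomial with parameters k and one half.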
08:38
Let's say if z is Poisson with parameter 2 lambda,
09:43
this means you generate z first. It gives you some non-negative integer value, and then given z, x is binomial with parameters z and one half. So that means that this thing,
10:00
one way to rewrite that statement in the form of conditional PMFs is this. If you're given the value of z as k, then x is binomial with parameters k and one half.
10:28
So what would the marginal distribution of x be? How do you find marginals? Yeah, you have to find the joint and then you have to sum over all the other variables,
10:44
the other values. So what's the joint distribution of x and z in this case?
11:06
What are we given? We're given the conditional distribution of x given z. Are we given any other distributions? Are we given this, for example, in the statement of the problem?
11:21
What are we given? We're given the distribution of z. We're given the PMF for z and the conditional PMF for x given z. So can we get the PMF for the pair xz from this and this?
11:40
How can we get the joint PMF from this and the marginal for z? What's the marginal for z? This is for k equals 0, 1, 2, and so on.
12:00
Is that right? So we have the marginal for z and the conditional. Can we get the joint? How? Right.
12:25
All we've done here is something like this. Probability of A intersect B is probability of A given B times probability of B. It's essentially the definition of conditional probabilities. So we multiply these together and what do we get?
13:10
Well, it looks like the 1 half to the k and the 2 to the k cancel.
13:36
So how do we get the marginal for x?
13:41
Right, you take the joint for x and z and we sum over k. And what values can k have and this be non-zero from j to infinity?
14:35
We have here, we have a lambda to the k.
14:40
Well, let's see. We have k choose j lambda to the k over k factorial e to the minus lambda. That's what we have to sum. Did I get everything? No? Oh, right, here I guess.
15:35
k choose j is this and we have a lambda to the k
15:40
over k factorial. And then the e term doesn't depend on k, which is the index of summation, so I pull it out. It appears in every term. And the k factorials cancel. Now what?
16:01
Is there anything in here that doesn't depend on the index of summation? The j factorial. So in the next step let's factor that out. And then what's left? We have k equal j to infinity
16:23
lambda to the k over k minus j factorial. And this would almost look like e to the lambda if instead of lambda to the k we had lambda to what?
16:43
k minus j. Well, can we make it k minus j? Well, certainly we can. Of course that violates this. So how do we... This means we multiply by lambda to the minus j so we should also multiply by lambda to the j and let's do that outside.
17:10
Or you might think of it this way. I factored lambda to the j out of here. Lambda to the k is lambda to the j times lambda to the k minus j.
17:21
And now we can relabel here. Let's call L k minus j. And what do we get? We get lambda to the j over j factorial e to the minus 2 lambda sum L equal what to what? When k is j, j minus j is 0
17:44
and k goes to infinity so so does L and we get lambda to the L over L factorial. And what's that sum? That's e to the lambda. That sum is e to the lambda. What's e to the lambda times e to the minus 2 lambda?
18:03
e to the lambda times e to the minus 2 lambda. e to the minus lambda. So this becomes lambda to the j over j factorial e to the minus lambda.
18:21
And what's that distribution? Poisson. So if, given z, x is binomial with parameters z and one half, where z is a random variable which is Poisson with parameter 2 lambda, then the marginal distribution of x is Poisson with parameter lambda.
18:43
That's kind of the reverse of the first computation we did. And remember how to extract the marginal for x from the information about the PMF for z and the conditional PMF of x given z?
19:03
You find a joint first and then you sum out the z variables. That gives the marginal for x. OK. I do this so you'll get more familiarity with these kinds of computations.
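In symbols, the marginalization just described is
\[
P(X=j)=\sum_{k=j}^{\infty}\binom{k}{j}\Big(\frac{1}{2}\Big)^{k}\,\frac{(2\lambda)^{k}}{k!}e^{-2\lambda}
=\frac{\lambda^{j}}{j!}e^{-2\lambda}\sum_{\ell=0}^{\infty}\frac{\lambda^{\ell}}{\ell!}
=\frac{\lambda^{j}}{j!}e^{-\lambda},
\]
so the marginal of X is Poisson with parameter lambda.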
19:23
Let's do another computation. Last time in class we did a ratio. I believe it was this one.
20:12
If u is chi-squared with m degrees of freedom and v is chi-squared with n degrees of freedom, and they are independent, then u divided by its degrees of freedom, over v divided by its degrees of freedom, has a distribution that we call F
20:20
with parameters m and n. And we even computed the density for this. It's a little complicated so I haven't memorized it.
20:41
Probably you don't have to memorize it either because usually you can look this up. The density, and this is for w bigger than or equal to 0,
21:54
for w less than 0, the density is 0 because we're taking the ratio of two non-negative quantities.
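The density itself was written on the board and is not captured in the transcript; for reference, the standard form of the F density with parameters m and n, which matches the features discussed below (coefficient m over n on w, exponent m/2 minus 1, and the power minus (m+n)/2), is
\[
f_W(w)=\frac{\Gamma\!\big(\tfrac{m+n}{2}\big)}{\Gamma\!\big(\tfrac{m}{2}\big)\Gamma\!\big(\tfrac{n}{2}\big)}
\Big(\frac{m}{n}\Big)^{m/2}w^{\,m/2-1}\Big(1+\frac{m}{n}\,w\Big)^{-(m+n)/2},\qquad w\ge 0.
\]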
22:05
And we did this in class. I didn't rearrange all the terms but we got all these things. So useful to be able to... We've done a lot of computations with getting PDFs or PMFs
22:21
for sums of independent random variables but we didn't do very much with ratios so let's try another ratio. This one involved finding the PDF for a ratio. Let's do another computation that involves ratios.
22:59
Suppose that x and y are independent
23:00
exponential random variables with parameter 1 and they are, well, I said independent. Let's find the PDF of the ratio x over y.
23:27
Sometimes in statistics you might be interested in ratios like say the ratio of your height to your foot size. Might be some interesting statistic there.
23:40
In fact this is the, I think this might be the way people gave an individual's height in the Middle Ages. I was just reading about Charlemagne, Charlemagne? You know who that is, Charlemagne? A French guy. A French guy, yeah.
24:01
Some French guy. He was not long after France was unified by Clovis and he was named Holy Roman Emperor by the Pope in the Middle Ages. Anyway, his scribe said that he was seven times,
24:23
his height was seven times the length of his foot. His height was seven times the length of his foot. So that means he was seven feet tall. Where a foot was, you know, his foot length. Which was common in those days.
24:40
Foot was set by the length of the foot of the king. So he was quite big. So you might look at something like the distribution of height to foot size if you're, I don't know, if you have a fetish for this. So this might be, there are other examples where you might be interested in height ratios.
25:03
Like maybe, well, I think you can think of your own. So let's get the PDF here. This is the CDF. So I'm going to, how do I get from the CDF to the PDF?
25:24
What's the relation between the CDF and the PDF? Yeah, I want to compute this and then compute the derivative. Oh, this is the CDF. This is a cumulative, yeah, okay, so.
25:41
Oh, oh, I see. Yeah, yeah, yeah, yeah. Okay, thank you. Okay, so we'll find this and then, so the thing we want is the derivative of this guy. This is a little x here. Okay, so what happens if I multiply this inequality by capital Y?
26:06
Does it change direction, or do I have to worry about the cases Y negative and Y positive here? No, because Y is an exponential variable and is positive, okay?
26:22
And now again, any time, you know, jointly distributed random variables, in this case they're independent. If X and Y have a PDF, joint PDF,
26:51
how do I compute the probability that the pair is in a set A? Right, integrate over A the density, the joint density, okay?
27:10
That's what densities do for you. They allow you to compute probabilities. Okay, do we have something like this here? Is this of that form?
27:23
Yeah, so what we have to do is figure out what is, what does this look like so that we can integrate it over. In other words, we have to figure out, this is the pair X, Y is in the set A and then we have to integrate over A. Okay, so let's try to figure out what A is in this case.
28:03
I can, I think I shouldn't have picked X, little x here. I think it's going to be confusing. Can I change it to Z?
28:26
Notice it only took one person to say sure for me to, to do it.
28:41
So I'm interested in the event that pair X capital Y satisfies capital X less than or equal to Z times capital Y. So let's draw that in little x, little y space here.
29:00
And of course we're only interested in the first quadrant because our variables capital X, capital Y are non-negative. And where do we have equality here? What are the set of pairs little x, little y such that, let's first decide what this looks like.
29:21
That would be the boundary of this region. It's a line with slope, nope, 1 over Z, line with slope 1 over Z. Okay, that's that line.
29:46
But we're interested in one side of that line. Which side would that be? Well, we're interested in the part of the first quadrant that's on one side of that line. What if I take the point up here with zero for the x coordinate?
30:04
Would that satisfy this inequality? Yeah, so what side of the line is it? It's above, okay? So that's our set A.
30:20
So we want to integrate over A the joint density. Well, what's the joint density for capital X and capital Y in this problem? Individual ones, right? Because they're independent, it's a product. Now I didn't say explicitly what it is, but they're exponential.
30:40
What's the density for an exponential with parameter 1? E to the minus x. For a general parameter lambda, it's lambda e to the minus lambda x. And that's for x bigger than zero. So it would be e to the minus x. So the joint density would be e to the minus x times e to the minus y.
31:13
Now in this problem, all we have to do is set up limits of integration that describe A.
31:20
And how can we do that? What limits of integration would describe an integral over A? Y goes from zero to infinity. Whoops, that's not infinity. And then for a particular value of y, x goes from zero up to what?
31:47
That value there, which would be zy. X goes from zero to zy. Actually, you can read it off from here, right? X goes from zero up to zy.
32:09
Everybody got that? See, in here, there is no restriction on y, is there? Except that y should be bigger or equal to zero. So y goes zero to infinity. X goes zero to zy.
32:24
So let's perform these integrations. They're very difficult. Integral of e to the minus x is? Negative e to the minus x.
32:53
So we have to evaluate at the two ends here. We get integral of zero to infinity. We subtract the value at zero, which is one, minus one.
33:04
If you subtract minus one, you get plus one. And then we have e to the minus zy here. Okay, that's what this evaluation gives. Times e to the minus y, dy. Now we have to do what?
33:20
Let's write this in two steps maybe. Okay, I just multiplied this through here.
33:44
Here's a z times minus y. Here's a one times minus y. So that's z plus one times minus y. And let's see. I think we don't need this.
34:04
The first integral, e to the minus y from zero to infinity is what? Well, that's the e to the minus y. That's the density for an exponential random variable with parameter one. And if you integrate a density, what do you get?
34:20
One. The second one is almost a density for an exponential with parameter one plus z. What's it missing from being a density? What's the density for exponential with parameter lambda? Lambda e to the minus lambda. So this is missing the lambda or the one plus z.
34:41
So I put a one plus z here, and then I divide by it. So if I put the one plus z in the integral, I get one. So that means I get one over one plus z. You could also just do the integration. Okay. Well, that's kind of neat.
35:02
It simplifies. Just find a common denominator of one plus z, and what do we get in the end? Yeah, z over one plus z, or z times one plus z to the minus one. This is for z bigger than or equal to zero.
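Collecting the steps, the CDF just computed is
\[
P\Big(\frac{X}{Y}\le z\Big)=\int_0^{\infty}\!\!\int_0^{zy}e^{-x}e^{-y}\,dx\,dy
=\int_0^{\infty}\big(1-e^{-zy}\big)e^{-y}\,dy
=1-\frac{1}{1+z}=\frac{z}{1+z},\qquad z\ge 0.
\]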
35:25
Okay. So have we ever seen that anywhere? I wonder.
35:41
Is that familiar to you, or moderately familiar? Have you seen it recently? No? No? I wonder what that could be. Hmm. Hmm.
36:02
I don't know. What do you think that is? Ah, here! What a coincidence. But this is too complicated. That's so simple. Well, if it, okay, so, all right, let's see. All right, maybe you're right, but what are m and n, then?
36:26
Well, that says z over there. It says w. Does that make a difference? No. No? Okay, good. So z is w. Maybe I should have changed that to w, right? But then it would have been too easy for you. So what's the coefficient of w here?
36:40
One. No, here. M over n. Over there, it's one. So what does that say about m and n if they have to be the same? So we have m is equal to n. What's the exponent here on w? M over 2 minus 1. What's the exponent on the z term over there?
37:04
See, this 1 plus this to that would have to be 1 plus z to the minus 1, right? And the z would have to be this, so this has to be 1. So what does it say about m? The second equation says m must be what?
37:20
2? No, 4? What? Did I do this right? M over 2. Minus 1 here. Minus 1, sorry. Yeah. M must be 2. And so n must be 2.
37:40
What? Okay. M over 2. Professor, we didn't differentiate. Oh, this? Oh, we didn't differentiate yet. Oh, yeah, yeah, yeah. Okay, thanks. That's what it is, yeah. So let's differentiate now.
38:01
No wonder it wasn't working. Is this right? We get the derivative of the numerator. That would be 1 times the denominator, which is here.
38:20
Minus the numerator, which would be z, times the derivative of the denominator, which would give you that, over the denominator squared. So this is 1 over 1 plus z squared, or 1 plus z to the minus 2. Okay, now let's try. This would be...
38:45
So here, m over n would still have to be equal to each other, right? Or equal to 1. M over n would have to be 1, because what is it there?
39:02
It's 1. M plus n over 2 would have to be what? So here we get m is n, and here we get m plus n is 4.
39:25
So m and n must be 2. Now let's see if that all works out. So if m is 2, what does this become? Well, okay. Yeah, so this would be w to the 0.
39:41
We'd get w to the 0. We'd get 1 plus w to what power? M over n would be 1. Here we'd have 2 plus 2 is 4, over 2 is 2, to the minus 2. And here we have what? 1, so we want to write that.
40:03
Here we'd have gamma of 2 plus 2 over 2 is 2, divided by gamma of 1, and then gamma of 1. And what's gamma of 2? It's 1 times gamma of 1, right?
40:22
So we'd get 1 over gamma of 1 times 1 plus w to the minus 2. And gamma of 1 is 1, so this would be 1 plus w to the minus 2. So this ratio is an f, with parameters m and n equal to 2.
41:05
Questions? All right, good.
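A quick numerical sanity check of this conclusion (not part of the lecture; a minimal simulation sketch assuming only numpy is available, with illustrative variable names): simulate the ratio of two independent Exponential(1) variables and compare the empirical CDF with z/(1+z).

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n = 1_000_000
x = rng.exponential(scale=1.0, size=n)   # X ~ Exponential(1)
y = rng.exponential(scale=1.0, size=n)   # Y ~ Exponential(1), independent of X
ratio = x / y

for z in (0.5, 1.0, 2.0, 5.0):
    empirical = np.mean(ratio <= z)       # empirical P(X/Y <= z)
    theoretical = z / (1.0 + z)           # CDF derived in the lecture
    print(f"z={z}: empirical={empirical:.4f}, theoretical={theoretical:.4f}")
```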
42:30
Last time we showed that if you have independent normal random variables, then their sample mean and sample variance are independent.
42:41
Actually, to be honest, we only did this when mu was 0 and sigma squared is 1, but the minor modifications of what we did established this. Here's an interesting fact.
43:27
I'm sorry, I wrote the wrong thing here. If I multiply s squared by n minus 1, divide by sigma squared, I get a chi squared with n minus 1 degrees of freedom.
43:45
Oh, and by the way, I didn't make a remark about why x bar is here instead of mu. Typically, in statistical problems, you might conclude, because of the central limit theorem,
44:01
that your samples or measurements are realizations of a normal random variable, but you don't know mu and sigma squared. And what you usually want to do is give an estimate for mu and an estimate for sigma squared. So this gives an estimate for mu, but you don't know what mu is exactly.
44:24
So you can't put mu here because you don't know what it is. So what can you put in instead? An estimate for it. So that's why you have this x bar here instead of mu, because typically you don't know mu. For example, you might go to work for Nestle's, a company in Switzerland.
44:44
I like Switzerland, a beautiful country. They make chocolate chips. You're probably aware of that. Anybody here ever make chocolate chip cookies? Good, all right. And you're probably aware that not all chocolate chips are the same size. There's some variability in the weight of a chocolate chip.
45:05
This is very interesting to Nestle company. They want to know what's the mean weight for a chocolate chip. So it would be some normal random variable. The weight of a chocolate chip would be normal with some mean and some variance. So you want to estimate the mean and the variance.
45:20
You don't want much variance in the chocolate chips. Well, people get upset if one chocolate chip is like this and one is like this and you're trying to make cookies, right? So you want this variance to be small. So here's an amazing fact that n minus 1 over sigma squared s squared is chi-squared with n minus 1.
45:42
Do you remember what chi-squared means? It's the sum of the squares of n minus 1 independent normals with mean 0 and variance 1. So how can we see that?
46:51
By the way, this is under the same condition as I have up here, that these are independent normals with mean mu and variance sigma squared. What about this random variable here?
47:04
I have the sum of n things, squares. What's the distribution of each summand before I square it? Well, what's the distribution of xi? Mean mu, variance sigma squared. So if I subtract mu, what's the distribution of xi if I subtract mu?
47:27
Normal mean 0 and variance sigma squared. And now if I divide that by sigma, what's the distribution of this thing? Normal mean 0 variance 1. Right, normal mean 0 variance 1. And I square it and I add up independent ones. So what's the distribution of this?
47:43
This is chi-squared with n degrees of freedom. Because we are summing n independent normals, which have mean 0 and variance 1, squared.
48:14
What would n minus 1, well, over sigma squared s squared equal?
48:26
That would be 1 over sigma squared sum i equal 1 to n of xi minus, well, no, I don't want to start with this.
48:41
I'll start with that, I guess. I'm really interested in this with a mu bar, mu replaced by x bar.
49:00
So I'll add and subtract it inside here. And now I'm going to expand the square in each term.
49:58
So this chi-squared can be written as this plus this plus this.
50:06
So I start with my chi-squared with n degrees of freedom. And I subtract x bar and add x bar in each term. Then when I square that, I think of this as a plus b squared would be a squared plus 2ab plus b squared.
50:24
Nothing you're not completely familiar with. The first term is n minus 1 over sigma squared s squared. This is n minus 1, this is s squared, right?
50:47
That doesn't have the 1 over n minus 1, right? So if I solve this, what do I get? n minus 1 s squared is the sum i equal 1 to n xi minus x bar squared, right?
51:01
And now if I divide by sigma squared, I get this. That's what we're interested in. So the first term is the thing we're interested in, okay? Yeah, shouldn't be.
51:34
Okay, now in here, this does not depend on the i, the index of summation.
51:45
So that means this term appears as a factor in every term. So I can factor it out. So let's look at 2 times x bar minus mu over sigma squared, times the sum i equal 1 to n of xi minus x bar.
52:06
What's that equal to? Who said that? It's zero because, well, when we sum this part, we get sum i equal 1 to n xi, right?
52:35
When I just sum this, I get the sum i equal 1 to n xi. How is that related to x bar?
52:41
How is the sum of the first n related to x bar? That sum is n times x bar. So when I sum this up, I get n x bar. Now I add this up how many times? n times. So I get minus n x bar.
53:02
So what's that equal to? Zero, okay? The sum of the xi's is n times x bar. And if I sum up x bar n times, I get n x bar. So n x bar minus n x bar is zero.
53:20
So the second term here is zero. And then I get this last term. And I write the last term as this. I have the same thing every time, so I get n over sigma squared times that. We're going to write it this way. x bar minus mu divided by sigma over root n squared.
53:55
This would be n, when I sum this, it would be n times this.
54:00
So I get an n over sigma squared without the sum. And I'm going to put this inside the square root, but I'm going to write it this way. I'm sorry, inside the square. So I have to take the square root when I put it inside. Okay, so this is equal to, what's the distribution of this sum here?
54:26
This pair? Well, it's what we started with, and that's chi-squared with n degrees of freedom. So this plus this is chi-squared with n degrees of freedom.
54:48
What's the distribution of this one? Well, what's the distribution of x bar? What have we done? We've added up n independent normals.
55:02
If you add up n independent normals, what's the distribution of the sum of n independent normals?
55:21
If you add up independent normals, you get a normal back again. The only question is what's the mean and what's the variance? So all you do is compute the expected value and the variance. What's the expected value of x bar? Well, we get one over n and then the expected value of the sum. What's the expected value of a sum? It's the sum of the expected values.
55:41
So we get one over n, sum of the expected values. Each one of these is mu. We add up mu n times, we get n mu. Divide by n, we get mu. So this is normal with mu. Now we have to compute the variance of this. Okay, we have the variance of a number times a random variable. What happens to the number?
56:01
It becomes squared. The variance of x bar is one over n squared. And then we get the variance of a sum of independent random variables. And what's the variance of a sum of independent random variables? It's the sum of the variances. So we add up the variances of the xi's. What's the variance of xi?
56:22
Sigma squared. We add that up n times, we get n sigma squared. So this would be sigma squared over n. So x bar is normal with mean mu and variance sigma squared over n. So if we have a normal random variable,
56:42
we subtract its mean and we divide by its variance, square root of its variance, I'm sorry. That becomes normal with what? Mean zero and variance one.
57:03
Now if I square that, what do I get? What happens when you square normal with mean zero and variance one? That's chi-squared with one degree of freedom. So this guy here is chi-squared with one degree of freedom.
57:22
What do I have to add to that to get chi-squared with n degrees of freedom? Chi-squared with n minus one degrees of freedom. The sum is chi-squared with n degrees of freedom. This is chi-squared with one degree of freedom. What's a chi-squared with n degrees of freedom?
57:44
It's Z1 squared plus Z2 squared plus ZN minus 1 squared plus ZN squared. This is chi-squared with N degrees of freedom, where Z1 through ZN are normal, independent normals
58:03
with mean 0 and variance 1. So we have that the whole thing is chi-squared with N degrees of freedom. This part is with 1. So that means that what we're adding to it must be chi-squared with N minus 1 degrees of freedom.
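Written compactly, the decomposition just carried out is
\[
\sum_{i=1}^{n}\Big(\frac{X_i-\mu}{\sigma}\Big)^{2}
=\frac{(n-1)S^2}{\sigma^2}+\Big(\frac{\bar X-\mu}{\sigma/\sqrt{n}}\Big)^{2},
\]
where the left side is chi-squared with n degrees of freedom, the cross term vanished because the xi minus x bar sum to zero, and the last term is chi-squared with 1 degree of freedom; so the first term on the right must be chi-squared with n minus 1 degrees of freedom.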
58:27
And now before the break, I want to just mention
58:41
that X bar minus mu over sigma root N divided by square root of N minus 1 over sigma squared S squared divided by N minus 1, I believe.
59:01
Is that right? I think I got that right. In other words, I want to say what a T with N degrees of freedom is: a normal Z divided by the square root of U
59:20
over N, where U is chi-squared with N degrees of freedom. Z is normal, mean 0 variance 1. And U and Z are independent. That's what a T distribution is. Normal divided by square root of a chi-squared divided
59:42
by its number of degrees of freedom. This is what kind of random variable? X bar minus mu over sigma divided by root N. It's normal, mean 0 variance 1. This is what kind of random variable? Chi-squared with N minus 1 degrees of freedom.
01:00:01
X-bar and S squared are independent. So, so far we have u is chi-squared with, well, we have n minus 1 degrees of freedom. This is the u. Z is this part. We have n minus 1 instead of n. If we simplify this, this becomes X-bar minus mu over sigma divided by root n.
01:00:26
And we divide by the square root of this thing. And now, can we get rid of the sigma squared?
01:00:41
Yeah, the sigma squareds cancel. So we get X-bar minus mu over root n divided by the square root of S squared. This is a, has a distribution of a T with n minus 1 degrees of freedom. And that's where T comes from, really, comparing sample mean to S squared.
01:01:04
So I could probably put n, the n down here might look a little bit nicer. So this is where T came from originally, comparing sample mean to sample variance. So let's take a break.
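Putting those pieces together, the statistic just constructed is
\[
T=\frac{(\bar X-\mu)\big/(\sigma/\sqrt{n})}{\sqrt{\dfrac{(n-1)S^2/\sigma^2}{n-1}}}
=\frac{\bar X-\mu}{S/\sqrt{n}},
\]
a standard normal divided by the square root of an independent chi-squared over its degrees of freedom, which is exactly a t distribution with n minus 1 degrees of freedom.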
01:02:13
Okay, so to summarize, if you have X1 through Xn are independent, normal, with mean mu variance sigma squared,
01:02:26
then the sample mean and the sample variance are independent. This multiple of the sample variance is a chi-squared with n minus 1 degrees of freedom. And the ratio of X-bar minus mu to S over root n,
01:02:40
where S means you take the square root of S squared, is a T random variable with n minus 1 degrees of freedom. Okay, so now it's time to talk about sampling. So that finishes the discussion of chapter six.
01:03:03
We're now in chapter seven. We'll start with simple random sampling. So we sample from a population.
01:03:22
Population could be a bag of chocolate chips. It could be a bag of M&Ms. It could be a bucket full of bolts. You might maybe measure their strength or size. It could be students at a university, the students in this room.
01:03:43
Could be voters. And what usually happens is these objects are assigned some value. Like a weight, a strength, a score on a test, or an opinion about a candidate.
01:04:00
And you want to discern what's the mean of the whole population, what's the mean value of these things, and what's the variance. So the population size will be denoted by capital N.
01:04:23
The values in the population will be denoted by little x's.
01:04:47
And I'll try to be careful. The index subscript on the x goes from 1 to capital N. So maybe individual 1 has this measurement assigned to it.
01:05:02
That would be the first chocolate chip in the bag. That's how much it weighs, maybe. This would be how much the second one weighs, how much the third one weighs. This is how much the last one weighs. Or if these are people, maybe this is a typical problem around an election. You sample to see who's ahead in, say, the presidential race.
01:05:23
or a typical thing that's not a reelection. You sample to see who's ahead in the, say, presidential race. So maybe you assign 1 to individual 1 if they're for candidate A, minus 1 if they're for candidate B. So these would be a bunch of 1's and minus 1's.
01:05:43
If you add them all up, what would you get? You'd get some number. If that number is positive, what would that mean? Candidate A is ahead. If it's negative, candidate B is ahead. You might want to, instead of just adding them all up,
01:06:01
you might want to do something like add them all up and divide by the size of the population. So the true mean, denoted mu, is 1 over capital N times the sum, i equal 1 to capital N xi.
01:06:34
Now suppose I wanted to have some estimate on the, I don't know,
01:06:43
height of the, average height of the students at UCI. What could I do? How many students do you think there are here? Maybe 30,000, so capital N would be 30,000. I could go around stopping people on the inner circle and say,
01:07:00
hold on a minute, I want to see how tall you are, and do that 30,000 times and then add them up and divide by this. You can, by your laughter, I can tell you know that this wouldn't meet with much success. So what's another possibility? I could measure some sub-population, some randomly chosen subset of the student body and measure them.
01:07:27
And the question is, if I measure some subset of the student body and take the mean of that sample, how close will that be to the true mean? How good of an estimate will I have? This is what's done in public opinion polling that I just mentioned for presidential elections.
01:07:42
A survey company like Gallup or whatever, the other ones, the Reuters or whatever, they don't call up every voter in the country every time they're on a survey, do they? I mean, that would be millions and millions of phone calls. They just don't have time to do it in a short period, not to mention the resources.
01:08:01
So do they sample the entire population? A typical sample size I think is in the low thousands, maybe a thousand, two thousand. Yet the estimates are very good. You can sample a small subset if you sample randomly and get a very good estimate of the true mean.
01:08:22
But you have to be careful about random sampling. None of us were around when Truman beat Dewey, but do you know this election, this famous election? Truman won. He's holding up a newspaper the next morning that said Dewey Defeats Truman, which they had run
01:08:41
because there were public opinion polls then, and how did they run public opinion polls? This is, I think, the 1948 election. Well, they would phone people and ask, are you voting for Dewey or for Truman? And they found out that among the people they asked, Dewey was ahead. So what was wrong with that?
01:09:01
Well, in 1948, not everybody had phones. Who had phones? Richer people had phones, and there's a bias. Richer people were biased towards Dewey in that election. So they found, not surprisingly, that Dewey was ahead. So you have to be careful about how you sample, but if you are able to get a good sample,
01:09:22
you might be able to estimate this without measuring every individual in the population. And that's what statistics is about, is trying to find the true mean and true variance in a population without checking every single individual in the population.
01:09:47
Sometimes the total might be interesting. For example, if this is the number of people in the United States, this might be the total dollar worth of individual one. This would be the total dollar worth of individual two, et cetera.
01:10:01
And the sum of all this would be the total wealth of individuals in the United States. So that might be an interesting statistic. So the total is given the name tau, and of course that's capital N times mu.
01:10:37
And the variance, that would be the true variance.
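In symbols, with population values x_1, ..., x_N, the three population quantities are
\[
\mu=\frac{1}{N}\sum_{i=1}^{N}x_i,\qquad \tau=N\mu,\qquad \sigma^2=\frac{1}{N}\sum_{i=1}^{N}(x_i-\mu)^2,
\]
where the variance is written here in the usual finite-population form; the exact expression on the board was not captured in the transcript.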
01:11:09
Okay. Now, so we select a set of size little n,
01:12:13
and we're going to see what values that subset has.
01:12:24
Okay, this means we select, this happens too much more, class is dismissed. Ah, you're in luck.
01:13:02
So you've probably seen Toy Story. Anybody here seen the movie Toy Story? Remember at the, at one point there's one of these machines that you have at amusement parks with little toys inside and there's a claw that reaches down?
01:13:22
So think of the population as being inside one of those toys like that, and your claw can reach down and grab little n individuals at one time. So it reaches down and the little things are looking up at it and saying, ooh. And so you draw out little n things without, you know, without any preference for one over another.
01:13:47
Those claws are truly random, right? So we select n individuals without replacement. So that means we don't pick one person twice and sample their size or their value.
01:14:19
So let capital X1 through capital Xn be the resulting values that you get
01:14:23
when you make a sample of little n individuals. This is, we treat these as random variables. The values depend on the randomness of the selection procedure. So they are random. You reach in, you could throw them back in, reach in again and draw little n out, you get another random sample.
01:14:41
So we treat these as random variables. These numbers on the other hand are not random. These are fixed. These are the values in the population. But those are the random values we see if we select little n individuals from population of capital N.
01:15:07
Then we estimate mu and sigma squared with capital X bar and S squared.
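The estimators referred to are the sample mean and sample variance of the n draws; assuming the same 1/(n-1) convention used earlier in the lecture for S squared, these are
\[
\bar X=\frac{1}{n}\sum_{i=1}^{n}X_i,\qquad S^2=\frac{1}{n-1}\sum_{i=1}^{n}(X_i-\bar X)^2.
\]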
01:16:12
If you take expectation, meaning take the expected value over all selections, the expected value of X bar will be the true one, true mean,
01:16:23
and the expected value of the sample variance will be the true variance. This I'll leave as an exercise. Or maybe I'll do it. I don't know. But to compute expectations you need to know distributions.
01:16:40
So let's see if we can figure out the distribution for the capital X's. Now in the case of voting or public opinion polls, you want to know if somebody's in favor of candidate A or candidate B, then we would assign the numbers plus one or minus one to individuals, right? So these little X's would have how many values?
01:17:17
Actually you could have a third value. You could have no preference, maybe.
01:17:23
So let's suppose we're doing a presidential public opinion poll and we have this set of values, X1, X2, et cetera, X of capital N, one value for each member of the population. That's a set of numbers, but there might be repetitions. And so this might actually be the set of minus one, zero, one
01:17:47
in the case of public opinion polling. This means the person's in favor of candidate B. This would mean the person has no preference. This means the person's in favor of candidate A, okay? So each one of these numbers is either a minus one, a zero, or a one.
01:18:04
So what's of interest then would be something like N sub minus one, equal to the number of i's such that xi is equal to minus one,
01:18:20
N sub zero, the number of i's such that xi is equal to zero, and N sub one, the number of i's such that xi is one.
01:19:07
Then what's the probability that Xj is equal to L, for L equal to minus one, zero, or one?
01:19:44
What's the probability that the jth individual you picked out is in favor of candidate B? What's the total number of people in favor of candidate B?
01:20:04
N sub minus one, right? N sub minus one. So what's the probability you picked out somebody from the set here that has a minus one assigned to it? Well, how many have that? It'd be N sub minus one divided by the total population size,
01:20:22
which would be capital N. So this would be N sub minus one over N. What about here? This would be the number of individuals that have no preference divided by N. And this would be the number of individuals that favor A, divided by N, something like that.
01:21:29
Let me take a coffee break while you figure that out.
01:21:40
Well, I think conditioning would be a good idea here.
01:22:08
This one we know. That's N sub j over capital N. Right? j is minus one, zero, or one. N sub j would be N sub minus one if j is minus one.
01:22:22
It's the number in the population that have the value j divided by the total population size. What about here? Well, this says I've picked out what? Somebody who has opinion j. And now I want to know what's the probability that this one has opinion L.
01:22:42
Well, when I picked somebody out, how many individuals are left in the population? N minus one.
01:23:06
N minus one. So here we go. Nj over N. Nj over N. And does it matter whether L and j are the same or different? What if L is equal to j?
01:23:22
That is, what if j is zero? So I picked out an individual, that person had opinion zero. What's the probability now that I pick out another person with opinion zero? How many are left with opinion zero? N zero minus one. On the other hand, if L is different from j,
01:23:41
the full set of people with opinion L is still there. It would be N sub L over N. So if j is equal to L, I would get N sub j minus one over N minus one for this conditional probability. Because I have one fewer individual remaining with opinion j
01:24:03
and one fewer member of the population. If j is different from L, then I still have N sub L people with opinion L
01:24:21
because I didn't pull one under this conditioning. But I do have one fewer person. So it's this product or that product. So this gives the probability distribution for the pair.
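In symbols, for two distinct draws without replacement, the pieces just worked out are
\[
P(X_i=j)=\frac{N_j}{N},
\qquad
P(X_k=\ell\mid X_i=j)=
\begin{cases}
\dfrac{N_j-1}{N-1}, & \ell=j,\\[1mm]
\dfrac{N_\ell}{N-1}, & \ell\neq j,
\end{cases}
\]
and the joint PMF of the pair is the product of these two.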
01:24:41
And how could I compute the expected value of x there? What's the expected value of xj? Well, it's the sum over L. Whatever value it could have.
01:25:02
It could have value L, I guess. And then times the probability that xj is equal to L. Okay? That's just the definition. But now we have the sum over L equal minus one, zero, and one, of L times the probability that xj is L, which is N sub L over capital N.
01:25:27
So it would be minus N sub minus one over capital N plus zero times N zero over capital N plus N sub one over capital N.
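In symbols, the sum just written is
\[
E[X_j]=\sum_{\ell\in\{-1,0,1\}}\ell\,P(X_j=\ell)
=(-1)\frac{N_{-1}}{N}+0\cdot\frac{N_0}{N}+1\cdot\frac{N_1}{N}
=\frac{N_1-N_{-1}}{N}.
\]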
01:25:41
Or N sub one minus N sub minus one, over capital N. That's the expected value. What about the variance of xj?
01:27:01
Okay, now here, what would we get? L would be minus one, zero, or one. When L is minus one, this becomes one. So if this is minus one, we get one here. What do we get here when I put minus one there?
01:27:22
It's N sub minus one over capital N. When I put zero here, I get nothing. And when I put one there, I get one times the probability that xj is one. That would be plus N one over capital N.
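Completing the step (the variance itself is not spelled out on the recording): since E[X_j squared] = (N_{-1} + N_1)/N, the variance of a single draw is
\[
\operatorname{Var}(X_j)=\frac{N_{-1}+N_1}{N}-\Big(\frac{N_1-N_{-1}}{N}\Big)^{2}.
\]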
01:28:08
Okay? So the expected value of x bar, we already know no matter what,
01:28:23
that's going to be... Is that going to be mu? Yes, that's mu always. Let's check. What's the true population mean here? Did I write that down already? I think I did, didn't I? The true mean here was N one minus N minus one over capital N.
01:28:43
That's the true mean for the population. So we already know that. Does everybody agree with that? If I add up one over capital N, I equal one to capital N mu.
01:29:02
That's the... This is the true mean where I use these values, the values that I erased, x one through x capital N. What's this equal to? Well, what values do I get here? I get minus ones, zeros, and ones. How many times in this sum do I get minus ones?
01:29:22
N sub minus one, right? So I get minus one times the number of times you get minus one. How many times do I get zeros? Zero times N zero. And how many times do I get ones?
01:29:43
So that's the true mean. So you see the expected value of a sampled individual is the true mean. What about the expected value of x bar?
01:30:02
That is of the sample mean. I just take expected value inside. Each term here would be this. I add that up little n times.
01:30:20
I get little n times this. I divide by little n, I get this. Finally, let's do the variance of x bar.
01:31:30
I think you remember this formula. The variance of a sum is the sum of the covariances. This also tells you why the variance of a sum
01:31:42
is the sum of the variances in the case of independent random variables, because what's the covariance of xi and xj if xi and xj are independent? It's zero. So you'd only get covariance of xi, xi, but that's the variance of xi. OK, so are xi and xj independent?
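The formula being used is
\[
\operatorname{Var}\!\Big(\sum_{i=1}^{n}X_i\Big)=\sum_{i=1}^{n}\sum_{j=1}^{n}\operatorname{Cov}(X_i,X_j),
\qquad\text{so}\qquad
\operatorname{Var}(\bar X)=\frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n}\operatorname{Cov}(X_i,X_j);
\]
whether the off-diagonal covariances vanish is exactly the independence question being asked next.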
01:32:11
Well, what does this say up here? Up here, we find that the probability
01:32:20
that xk is equal to l given xm is equal to j was n sub j minus 1, over capital N minus 1, if j is equal to l.
01:32:42
And n sub l over capital N minus 1 if j is different from l. And what's the probability that xk is equal to l? It's n sub l over capital N, right? Is this conditional probability equal to this probability?
01:33:02
No. So what does that say? It says given information about xm, it changes your opinion about the distribution of xk. Is that independence? No, independence would mean if you know something about xm, it doesn't change your probability distribution for xk. So obviously, if you pick out somebody with opinion minus 1
01:33:23
from the population and then sample again, that changes the probabilities for the second time. So these won't be zero. So we should try to figure out what's the covariance of xi and xj.
01:34:06
OK, so if i is equal to j, the covariance of xj with xj is just the variance of xj, which is equal to this. This could probably be simplified.
01:34:24
I don't need to do that. But let's consider now for i different from j.
01:34:46
Well, the covariance is given by this. We know what these are.
01:35:02
Each one is the true mean of the population. Here and here. So we need to figure out this.
01:35:20
Actually, let me ask you, do you think the covariance will be positive or negative? Positive means they're more likely to agree than not, sort of, roughly speaking. Negative would mean they would tend to disagree on average.
01:35:46
So if you take somebody out of the population and they are going to be for candidate A, is it more likely or less likely that the next person will be for candidate A? Less likely. So you might expect this to be negative.
01:36:04
OK, so how do we do this? How do you compute the expected value
01:36:32
of a function of two random variables? You evaluate that function at l and k, and then multiply by the, in this discrete setting,
01:36:40
the joint PMF, which is here. And what do we know about this? Well, actually, we don't know it directly. Or do we? Yeah, we do here. Here it is.
01:37:09
So it matters whether l is equal to k or not, right? It matters whether, well, in this case, I have l and j. But over there, I have l and k.
01:37:21
It matters whether l and k are equal or not. OK, so let's break this up into the sum over l. l squared, probably x1. xi is l. And xj is l. Plus the sum over l different from k.
01:37:44
of l times k times the probability that xi is l and xj is k. In the first one, when they're equal,
01:38:04
we get n sub l minus 1 times n sub l over capital N times capital N minus 1 here. And then here we get the sum over l not equal to k
01:38:22
of l times k times n sub l over capital N — oops, I'm sorry, this is wrong: n sub l times n sub k over capital N times capital N minus 1. Like that.
01:38:59
Now, if I didn't have this minus 1 here,
01:39:02
I could combine them again, right? Because this would be nl times nl. It would be nl times nk. So I'm going to separate out this minus 1 term and then over here, I have the sum over l and k.
01:39:28
lk, nl, nk over n times n minus 1.
01:39:55
Now how many terms are in this sum? l and k go minus 1, 0, 1.
01:40:03
So when they're both 1, what do we get? When they're both 1, we get 1 here. N sub 1 squared over N minus 1 — I'm sorry, capital N times capital N minus 1. When one is 1 and the other is minus 1, we're going to have l equal 1.
01:40:20
So this is l equal k equal 1. Let's consider l equal 1, k equal minus 1. At the same time, let's consider l equal minus 1, k equal 1. Both times, this would be a minus 1.
01:40:42
What would it be here? You'd have n sub 1, n sub minus 1. What would it be here? You'd have n sub minus 1, n sub 1. They're the same, right? So you get two terms that are the same. So it would be minus 2 times n sub 1 times n sub minus 1, over capital N times capital N minus 1. Or they could both be minus 1.
01:41:05
In that case, this would be minus 1 times minus 1 would be 1. And then you get n sub minus 1 squared over capital N times capital N minus 1. And then here, we get two terms, one where this is 1
01:41:20
and one where this is minus 1, so we get minus n sub 1 and minus n sub minus 1; both times, this is 1. So there's the expected value of xi xj. And I can see I've run over by four minutes. So next time, I want you to do this.
01:41:48
Substitute that in here, subtract this, and simplify. Substitute this in here, subtract that, simplify, and that gives you the covariance.
01:42:02
That'll be your assignment, which won't be handed in.
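For reference, substituting the expression for E[X_i X_j] just derived and subtracting E[X_i]E[X_j] = mu squared, the algebra (which is the exercise, not carried out in the lecture) leads to the standard simple-random-sampling result
\[
\operatorname{Cov}(X_i,X_j)=-\frac{\sigma^2}{N-1}\qquad(i\neq j),
\]
which is negative, as anticipated earlier.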