Lab 10
When we discussed z-scores, we were using z-scores to locate a score or set of scores in the population. However, in our last lab we discussed the practical necessity of sampling. Recall that we don't usually have access to the entire population, so instead we take a subset of the population, a sample. In this lab we will begin to learn some of the inferential statistical methods that will allow us to make claims about population parameters based on sample statistics.
Suppose that you take 3 different random samples from the same population. They are probably going to be different from one another. See the figure below for an example of what I mean. The samples may have different shapes, different means, and different variability. So how do you figure out what the best estimate of the population mean is? There is an essentially infinite number of samples that can be taken from a population if we sample with replacement (put the ones we choose back into the population each time). But this huge set of possible samples forms a simple, orderly, and predictable pattern (a sampling distribution). Because of this, we are able to base our predictions about sample characteristics on the distribution of sample means.
In other words, what we want to do is look at all of the possible samples (of a particular size; this part is important) and make predictions based on the properties of all of them. We do this the same way that we've done it in the past: we essentially find the average of those properties.
The Distribution of Sample Means
We can create a distribution of sample means by looking at all possible samples of a certain size (n) and considering the means of each of those samples. Let's look at a concrete example:
Because this population (the four scores 2, 4, 6, and 8) is so small, we actually can know the mean (and variability): $\mu = (2+4+6+8)/4 = 5$. But suppose that we didn't, and we wanted to estimate this population from samples chosen from it (like we do when we conduct a research study).
step 1: Pick a sample size. For this example we'll pick samples of n = 2.
step 2: Because we selected such a small population, we can actually consider all of the possible samples that you could get and look at their distribution. Since we sample with replacement, we count every ordered sample, including ones that repeat a score such as (2, 2), which gives 4 × 4 = 16 possible samples.
step 3: Now you're ready to answer questions like: What is the probability of getting a sample with a mean greater than 7, $p(\bar{X} > 7)$? If we look at our distribution of sample means, we find that 1 out of 16 samples (only the sample 8, 8) has a mean greater than 7. So that's our answer: 1/16 = .0625 = 6.25%.
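The three steps above can be checked by brute force. The lab works them by hand; this Python sketch is our own illustration, not part of the original materials. It lists every ordered sample of size n = 2 drawn with replacement from the population 2, 4, 6, 8, tallies the sample means, and counts how many exceed 7. Exact fractions are used so the probability prints as 1/16 rather than a rounded decimal.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [2, 4, 6, 8]  # the small population from the example
n = 2                      # step 1: the sample size

# Step 2: every ordered sample of size n drawn with replacement (4 * 4 = 16).
samples = list(product(population, repeat=n))
sample_means = [Fraction(sum(s), n) for s in samples]

# Tally the distribution of sample means.
distribution = Counter(sample_means)
for value in sorted(distribution):
    print(f"mean {value}: {distribution[value]} of {len(samples)} samples")

# Step 3: p(sample mean > 7) = (number of samples with mean > 7) / 16.
hits = sum(1 for m in sample_means if m > 7)
p = Fraction(hits, len(samples))
print(f"p(mean > 7) = {p} = {float(p)}")
```

The tally shows the orderly shape of the distribution of sample means: a mean of 5 (the population mean) occurs most often (4 of 16 samples), and the extreme means 2 and 8 occur only once each.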
Sampling Distribution Simulations
The standard deviation of the distribution of sample means is called the standard error. The standard error is influenced by two factors: the variability of the population ($\sigma$) and the sample size (n).
We'll consider each of these factors below:
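(A) the variability of the population ($\sigma$):
- large $\sigma$: samples tend to show big differences from the population mean
- small $\sigma$: samples tend to show small differences from the population mean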
(B) the size of the sample: the larger your sample size (n), the more accurately the sample represents the population. This is known as the law of large numbers.
- If I randomly selected 1 score, how accurately would that score predict the population's mean?
- Suppose that I take 5 scores. Are things more accurate?
- What about 100 scores?
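The questions above (1, 5, or 100 scores) can be explored with a quick simulation. This Python sketch assumes a hypothetical normal population with mean 100 and standard deviation 15 (the IQ values used later in this lab); the trial count and random seed are arbitrary choices. For each sample size it draws many samples and reports how far, on average, the sample mean lands from the population mean.

```python
import random
from statistics import fmean

# Hypothetical population for illustration: normal with mean 100 and SD 15.
MU, SIGMA = 100, 15
rng = random.Random(42)  # fixed seed so the run is repeatable

def average_miss(n, trials=2000):
    """Average distance between a sample mean (sample size n) and MU."""
    misses = []
    for _ in range(trials):
        sample = [rng.gauss(MU, SIGMA) for _ in range(n)]
        misses.append(abs(fmean(sample) - MU))
    return fmean(misses)

errors = {n: average_miss(n) for n in (1, 5, 100)}
for n, err in errors.items():
    print(f"n = {n:>3}: the sample mean misses mu by about {err:.2f} on average")
```

A single score typically misses the population mean by roughly 12 IQ points, while a mean of 100 scores typically misses by only about 1, which is the law of large numbers at work.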
These two characteristics are combined in the formula for the standard error.
standard error of $\bar{X}$ = $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}$
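The standard error formula can be checked against the small population 2, 4, 6, 8 from earlier, where every sample can be listed. This Python sketch is our own illustration: `standard_error` is a hypothetical helper name, and `statistics.pstdev` computes a population standard deviation (dividing by N). The formula's result should match the standard deviation of all 16 sample means of size n = 2.

```python
import math
from itertools import product
from statistics import pstdev

def standard_error(sigma, n):
    """Standard error of the mean: sigma divided by the square root of n."""
    return sigma / math.sqrt(n)

# Check the formula against the small population 2, 4, 6, 8 with n = 2.
population = [2, 4, 6, 8]
n = 2
sigma = pstdev(population)  # population SD: sqrt(5), about 2.24
means = [sum(s) / n for s in product(population, repeat=n)]

print(f"formula:            {standard_error(sigma, n):.4f}")
print(f"SD of sample means: {pstdev(means):.4f}")

# The two IQ examples in this lab (sigma = 15).
print(standard_error(15, 9), standard_error(15, 25))
```

Both lines print about 1.5811 (the square root of 2.5), and the last line gives the standard errors 5.0 and 3.0 used in the IQ examples.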
Blackboard 6) Answer the Blackboard question about standard errors.
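The Central Limit Theorem: for any population with mean $\mu$ and standard deviation $\sigma$, the distribution of sample means for samples of size n has a mean of $\mu$ and a standard error of $\sigma/\sqrt{n}$, and its shape approaches a normal distribution as n approaches infinity.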
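The Central Limit Theorem can be seen in a simulation even when the population is far from normal. This Python sketch assumes a hypothetical, strongly skewed exponential population with mean 10 (for an exponential distribution the standard deviation equals the mean, so it is also 10) and samples of n = 30; the number of samples and the seed are arbitrary. It compares the mean and spread of the simulated sample means with what the theorem predicts, and checks that roughly 68% of them fall within one standard error, as they would on a normal curve.

```python
import math
import random
from statistics import fmean, pstdev

# Hypothetical skewed population: exponential with mean 10 (its SD is also 10).
MU = SIGMA = 10.0
n = 30                  # sample size
rng = random.Random(0)  # fixed seed so the run is repeatable

# Draw 5000 samples of size n and keep each sample's mean.
sample_means = [
    fmean([rng.expovariate(1 / MU) for _ in range(n)])
    for _ in range(5000)
]

se = SIGMA / math.sqrt(n)  # the CLT's prediction for the SD of sample means
center = fmean(sample_means)
spread = pstdev(sample_means)
# On a normal curve, about 68% of values fall within one SD of the mean.
within_one_se = sum(abs(m - MU) < se for m in sample_means) / len(sample_means)

print(f"mean of sample means: {center:.2f} (mu = {MU})")
print(f"SD of sample means:   {spread:.2f} (sigma/sqrt(n) = {se:.2f})")
print(f"share within one SE:  {within_one_se:.2f}")
```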
Often we are not interested in where individuals fall in population distributions but rather where a sample falls in the distribution of sample means. We might have data from a sample and wonder whether it really came from a particular population or from a different population. Using the properties of the normal curve and the z-score distribution, we can calculate how likely it would be to get a sample mean like ours if the sample really did come from that particular population.
Remember that, by the Central Limit Theorem, the distribution of sample means is normal if the population is normal, and approximately normal if n is greater than 30 even when the population is not normal.
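For example, IQ scores are normally distributed with $\mu = 100$ and $\sigma = 15$. If we take a random sample of n = 9 students, what is the probability that the sample mean is 112 or higher, $p(\bar{X} \geq 112)$? In other words, we don't want to know the probability of each individual having a score of 112 or better separately. Instead we want to know, as a group, the probability of getting an average score of 112 or better.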
Next we need to get the mean and standard deviation (the standard error) of the distribution of sample means so that we can calculate the z-score. (Note: we can treat this distribution as normal because the original population distribution of IQ scores is normally distributed.)
$\mu_{\bar{X}} = 100$ (because the population mean is 100).
$\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{15}{\sqrt{9}} = \dfrac{15}{3} = 5$
Now we need to figure out the z-score that corresponds to this sample mean. The z-score formula looks exactly the same except for the new subscripts, $z = \dfrac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}}$. The subscripts make no difference in the calculations. They just show that something different is being calculated (we're locating a sample in the distribution of sample means rather than finding a single score in a population):
$P(\bar{X} > 112) = P\left(Z > \dfrac{112 - 100}{5}\right) = P(Z > 2.4) = 0.0082$
In other words, the probability that we'll get a sample of size n = 9 students with an average IQ equal to or greater than 112 is very small (0.0082). In our next labs we will extend this result to make claims concerning hypotheses about our population and our sample.
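The whole calculation can be reproduced in a few lines. This Python sketch is our own illustration; it uses the standard library's `statistics.NormalDist` (Python 3.8 or later) in place of a z-table, with the values from the IQ example above.

```python
import math
from statistics import NormalDist

mu, sigma, n = 100, 15, 9  # IQ population and sample size from the example
sample_mean = 112

se = sigma / math.sqrt(n)          # 15 / 3 = 5
z = (sample_mean - mu) / se        # (112 - 100) / 5 = 2.4
p_upper = 1 - NormalDist().cdf(z)  # area above z under the standard normal curve

print(f"standard error = {se}, z = {z}, p = {p_upper:.4f}")
```

Taking one minus the cumulative area gives the upper tail, which is the "112 or greater" region asked about; the printed probability rounds to 0.0082, matching the table lookup.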
Population distribution
- At first this result looks wrong.
- It seems like 112 should correspond to a z of less than 1, because 115 is where z equals 1 in the population of individual IQ scores.
Distribution of sample means
- However, we must remember that the population distribution isn't the correct distribution to be looking at; we need to look at the distribution of sample means.
- We know that the distribution of sample means has a mean of 100 and a standard error of 5, so a sample mean of 112 should have a z greater than 2.
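The contrast between the two distributions comes down to which standard deviation goes in the denominator. This short Python sketch (our own illustration, using the example's values) computes the z-score of 112 once as an individual score and once as the mean of a sample of 9.

```python
import math

mu, sigma, n, score = 100, 15, 9, 112

z_individual = (score - mu) / sigma                    # one person scoring 112
z_sample_mean = (score - mu) / (sigma / math.sqrt(n))  # a mean of 9 people at 112

print(round(z_individual, 2), round(z_sample_mean, 2))  # 0.8 2.4
```

The same 12-point difference is less than one standard deviation for an individual but 2.4 standard errors for a sample mean, because sample means vary much less than individual scores.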
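Now let's work in the other direction. Suppose we take samples of n = 25 from the same IQ population. What sample mean marks off the top 10% of the distribution of sample means? First, find the standard error:
$\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{15}{\sqrt{25}} = \dfrac{15}{5} = 3$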
Now we need to work backwards because we don't know the z-score. We can determine the z-score for the range based on the portion of the distribution we're looking for. We want the top 10% of the distribution. This corresponds to a cumulative proportion of .90 for the distribution (i.e., the top 10% is higher than 90% or .90 of the distribution). If we use the NORMINV function in Excel (select any cell and type "=NORMINV(.9,0,1)" [without any quotes]), we find that .90 corresponds to a z-score of about 1.28.
So for our example:
step 1: Look up the z-score associated with a cumulative proportion of 0.90 (that is, the cutoff for the top 10%): z ≈ 1.28.
step 2: work backwards through the z-score formula to solve for the sample mean
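$\bar{X} = z\,\sigma_{\bar{X}} + \mu = (1.28)(3) + 100 = 103.84$
So a sample of n = 25 with a mean of about 103.84 or higher falls in the top 10% of the distribution of sample means.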
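Working backwards can also be sketched in Python. This is our own illustration, not part of the lab: `NormalDist().inv_cdf` from Python's standard library plays the role of Excel's `=NORMINV(.9,0,1)` mentioned above, and the z-score formula is then solved for the sample mean.

```python
import math
from statistics import NormalDist

mu, sigma, n = 100, 15, 25
se = sigma / math.sqrt(n)           # 15 / 5 = 3
z_cut = NormalDist().inv_cdf(0.90)  # about 1.28, the cutoff for the top 10%
cutoff = z_cut * se + mu            # solve z = (sample mean - mu) / se for the mean

print(f"z = {z_cut:.2f}, cutoff = {cutoff:.2f}")
```

Because the unrounded z is 1.2816, the cutoff prints as 103.84, the same as the hand calculation. Equivalently, `NormalDist(mu, se).inv_cdf(0.90)` finds the cutoff in one step by working directly on the distribution of sample means.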
Now try some on your own in Blackboard questions 7 through 9.
Blackboard 7 through 9) Answer 3 questions about group psychotherapy.
Answer additional review questions.