Considering measures of variability
The variability of a distribution tells us important information about
how different the scores are from one another. Variability helps us
understand the shape of the distribution, understand the difference
between the mean and median of the distribution, and make decisions
about what the data distribution means with regard to our research
question.
We'll concentrate on two measures of variability, the range and the
standard deviation. Consider a very simple distribution (scores on a
30-point quiz), with only eight data points.
21, 22, 23, 24, 25, 26, 27, 28
Score X | Mean μ | Deviation (X - μ) | Squared Deviation (X - μ)²
2       |        |                   |
5       |        |                   |
7       |        |                   |
10      |        |                   |
11      |        |                   |
Totals  |        |                   |
STEP 1: Our first step is to compute the mean of
the distribution.
Blackboard 2) Compute the mean
for
this
quiz score distribution: 2, 5, 7, 10, 11.
STEP 2: The next step is to figure out how far away each of the
data points in the distribution is from the mean (the deviations).
Calculate the deviation of each score by subtracting the mean you
calculated above from each score in the distribution
(i.e., score - mean = deviation). REMEMBER THAT DEVIATIONS CAN BE
NEGATIVE.
Blackboard 3) The deviation for 2 is:
Blackboard 4) The deviation for 5 is:
Blackboard 5) The
deviation for
7 is:
Blackboard 6) The
deviation for
10 is:
Blackboard 7) The
deviation for
11 is:
Notice that if you add up all of the deviations, they must equal 0.
Think about it at a conceptual level. Scores above the mean have
positive deviations and scores below the mean have negative deviations,
and the mean is the balance point between the two sides. When you add
them together, they cancel each other out.
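The cancellation can be checked with a short Python sketch. The scores below are hypothetical (they are not the quiz distribution in the table), so the Blackboard questions are still yours to work out.

```python
# Hypothetical scores, chosen only for illustration.
scores = [4, 8, 6, 5, 7]

mean = sum(scores) / len(scores)   # 30 / 5 = 6.0

# Deviation = score - mean; scores below the mean give negative values.
deviations = [x - mean for x in scores]

print(deviations)       # [-2.0, 2.0, 0.0, -1.0, 1.0]
print(sum(deviations))  # 0.0
```

When the mean is not exactly representable in floating point, the sum may print as a very small number rather than exactly 0. That is rounding error, not a failure of the rule.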
STEP 3: The next step is to find the "average" of these differences.
But notice that we have a problem. If they always add up to 0, then the
average difference will always be zero. So what we have to do is get
rid of the negative signs. We do this by squaring the deviations (and
then later we'll reverse this by taking the square root of the average
of the squared deviations). So to do this step we'll square the
deviations first and then add them together. We'll refer to this value
as the Sum of the Squared Deviations (Sum of Squares for short).
Blackboard 8) Square each of
the
deviation scores and then add up the values you get. This will be the
sum of squared deviations. What
is the sum of squares for this distribution?
STEP 4: Now we have the sum of squares (SS), but remember that
we're looking for the average of the squared deviations. The average
squared deviation is called the variance. To calculate the variance of
a sample, we divide the sum of squares by the number of individuals in
the sample minus 1 (n - 1). The reason that we need to subtract 1 is
related to the fact that our deviations always add up to 0. Because we
know the mean of our sample in advance, this constrains one of the data
points. We will discuss this issue (called degrees of freedom) in more
detail later in the course. If we are computing the variance for a
population rather than a sample, then we figure out the average squared
deviation by dividing by n alone. Some calculators and statistical
packages will give you the option of calculating the variance for
either a sample or a population.
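As an illustration of the two divisors, here is a minimal Python sketch using hypothetical scores (not the quiz distribution above). It relies on Python's standard statistics module, where variance divides by n - 1 and pvariance divides by n.

```python
import statistics

scores = [4.0, 8.0, 6.0, 5.0, 7.0]   # hypothetical scores
n = len(scores)
mean = sum(scores) / n
sum_of_squares = sum((x - mean) ** 2 for x in scores)   # 10.0

sample_variance = sum_of_squares / (n - 1)   # 10 / 4 = 2.5
population_variance = sum_of_squares / n     # 10 / 5 = 2.0

# The library functions should agree with the hand calculation.
print(sample_variance, statistics.variance(scores))
print(population_variance, statistics.pvariance(scores))
```

The sample variance is always the larger of the two, because the same sum of squares is divided by a smaller number.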
Blackboard 9) Calculate the sample variance by dividing the sum of
squares by n - 1 (i.e., 5 - 1 = 4).
STEP 5: The last step is to reverse that
squaring we did to get
rid of the signs of the deviations. We need to take the square root of
the variance to
get our standard deviation.
Blackboard 10) Calculate the
sample
standard deviation by taking the square root of the sample variance.
Round your answer to 2 digits.
You should always ask yourself, "Does this answer
make sense?" The final answer should be roughly the average distance of
each score from the mean. If your answer is much bigger or much smaller
than you think it should be just by eyeballing the distribution, there
is probably a miscalculation somewhere. When you square numbers, it is
easy for miscalculations to make answers come out way off the mark.
To review:
STEP 1: Compute the mean of the distribution
STEP 2: Find the deviation scores by subtracting the mean
from each score
STEP 3: Square the deviation scores to get rid
of the sign and
add them up to get the sum of squares
STEP 4: Average the squared deviations by dividing the sum of
squares by n for a population or n - 1 for a sample
STEP 5: Take the square root to reverse the
squaring that was
done earlier
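The five steps above can be sketched as a small Python function. The function name, its sample flag, and the example scores are assumptions made for illustration. They are not part of the lab or of SPSS.

```python
import math


def standard_deviation(scores, sample=True):
    """Follow the five steps: mean, deviations, SS, variance, square root."""
    n = len(scores)
    mean = sum(scores) / n                            # STEP 1
    deviations = [x - mean for x in scores]           # STEP 2: score - mean
    sum_of_squares = sum(d ** 2 for d in deviations)  # STEP 3
    divisor = n - 1 if sample else n                  # STEP 4: n - 1 or n
    variance = sum_of_squares / divisor
    return math.sqrt(variance)                        # STEP 5


hypothetical_scores = [4, 8, 6, 5, 7]
print(round(standard_deviation(hypothetical_scores), 2))                # 1.58
print(round(standard_deviation(hypothetical_scores, sample=False), 2))  # 1.41
```

Both answers pass the "does this make sense?" check: the scores sit between 0 and 2 points away from their mean of 6.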
Let's compare the standard deviation and the range for a second.
The range
is based on just 2 numbers: the lowest and the highest.
The standard deviation is based on every number in the distribution.
Every number matters.
Thus it is easy to see why statisticians prefer the standard deviation
over the range as an accurate and informative measure of variability.
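To make the comparison concrete, here is a small Python sketch with two hypothetical distributions that share the same lowest and highest scores. It uses stdev from the standard statistics module, which computes the sample standard deviation (dividing by n - 1).

```python
import statistics

# Two hypothetical distributions with identical minimum and maximum.
a = [1, 5, 5, 5, 9]   # most scores sit at the mean
b = [1, 1, 5, 9, 9]   # most scores sit at the extremes

for name, scores in (("A", a), ("B", b)):
    score_range = max(scores) - min(scores)
    sd = statistics.stdev(scores)
    print(name, score_range, round(sd, 2))
# A 8 2.83
# B 8 4.0
```

The range cannot tell the two distributions apart, while the standard deviation reflects that the scores in B are spread further from the mean.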
Using SPSS to compute measures of variability
Let's check our calculations using SPSS to find the range and standard
deviation for this distribution.
Open SPSS and instead of opening a
data file,
choose the option to type in new data or just click "Cancel." SPSS
opens to the "Variable View" tab by default. In the "Name" column, type
in the name of a new variable. It does not matter what you call it. You
can do what you want but whenever I don't care what something is named,
I name it "Fred" like this:

Click on the "Data View" tab at the bottom left and type in the quiz
scores (21, 22,
23, 24, 25, 26, 27, 28) into a single column. Like this:

You can access these descriptive statistics in the same
way that we
accessed the mean, median, and mode. Go to "Analyze", "Descriptive
Statistics",
and "Descriptives".

This will open up a window so that you can choose the
variable you want
descriptive statistics on. Choose the variable and click on the arrow
button in the middle of the window to put it in the box. Then click
"Ok." Your
output will display minimum and maximum values, the mean of the
distribution,
and the standard deviation.

You can get the range by choosing "Options" and clicking
the range box
to check it.

Click "Continue"
and then click "Ok." Your output will display minimum and
maximum values,
the range, the mean of the distribution, and the standard deviation.
Blackboard 11) What is the range
of
this distribution?
Blackboard 12) What is the standard deviation
of this distribution?
Comparing variability for different distributions
Let's go back to the students.sav
file and look at the distributions of
quiz scores (quizzes 1-5) that we looked at in lab 8 to compare them
based
on their variability.
Open the students.sav file and calculate the range and standard
deviation for each quiz (1-5). There is more than one way to get means
and standard deviations. Let's use the Frequencies function by clicking
Analyze -> Descriptive Statistics -> Frequencies. Click the
"Statistics" button and select "Standard Deviation" and "Range." Click
"Continue." Now click the "Charts" button and select "Histograms."
Click "Continue." Select all 5 quiz variables at once in the variable
list at the left and click the black arrow button in the middle. All
five quizzes should be in the "Variable(s)" box to the right. Click
"OK" and inspect the standard deviations, the ranges, and the
histograms of each variable.
A good test discriminates between learners who
really know the material and those who don't. If the scores are nearly
all the same (and thus there is not much variability in the test
scores), it is not clear that all the questions are good questions.
They might be too easy or too hard. A test with good questions will
distinguish between students who know very little, who know some but
not much, who know most of the material but not the finer points, and
who have really mastered the material. Usually this means that there
will be a wide range of scores and a large standard deviation. Also,
you can look at a histogram to make sure that there is not too much
skew in the distribution. A quiz with a lot of scores piled up at the
top means that the test was probably too easy to discriminate between
those who know the material well enough and those who have really
mastered it. A quiz with a lot of scores piled up at the bottom is
probably too hard. Good tests usually have a symmetrical distribution
with a lump in the middle but still have a wide range of scores.
Blackboard 13) Assuming that each
quiz had
10 possible points, which quiz appears
to do
a better job of discriminating
student learning of the material on the quiz (that is, which quiz best
differentiates between high and low performers/students)?
Properties of the standard deviation
Let's look at some properties of the standard deviation
using SPSS.
Enter the following
set of data into an SPSS data file and calculate the standard deviation.
Transform the
distribution
by adding 3 points to
each score (adding a constant to every score in the distribution).
(hint:
you can use the compute
function in SPSS to do this).
Calculate the standard deviation for the transformed variable.
Blackboard 14) Comparing the standard deviations of the original and
transformed variables, what happened?
Now transform the original
distribution by multiplying every score by 2 (again, use the
compute
function). Calculate the new standard deviation.
Blackboard 15) Comparing the standard
deviations of the original and transformed variables, what happened?
Here is what I hope
you learned by doing this:
1) Adding a constant to each score in the distribution will not change
the standard deviation.

So if you add 2 to every score in the distribution, the mean changes
(by 2), but the standard deviation stays the same (since none of the
deviations
would change because you add 2 to each score and the mean changes by
2).
2) Multiplying each score by a constant causes the standard deviation
to be multiplied by the same constant. This one is easier to think of
with numbers. Suppose that your mean
is 20, and that two of the individuals in your distribution are 21 and
23. If you multiply 21 and 23 by 2 you get 42 and 46, and your mean
also
changes by a factor of 2 and is now 40. Before your deviations were (21
- 20 = 1) & (23 - 20 = 3). But now, your deviations are (42 - 40 =
2) & (46 - 40 = 6). So your deviations are getting twice as big as
well.
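Both properties can also be checked outside SPSS. The following Python sketch uses hypothetical scores and stdev from the standard statistics module (the sample standard deviation). It is only an illustration, not a replacement for the compute function in SPSS.

```python
import statistics

original = [4, 8, 6, 5, 7]            # hypothetical scores
shifted = [x + 3 for x in original]   # add a constant to every score
doubled = [x * 2 for x in original]   # multiply every score by a constant

sd_original = statistics.stdev(original)
sd_shifted = statistics.stdev(shifted)
sd_doubled = statistics.stdev(doubled)

print(round(sd_original, 2))  # 1.58
print(round(sd_shifted, 2))   # 1.58 (unchanged by adding 3)
print(round(sd_doubled, 2))   # 3.16 (twice the original)
```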