Psychology 138
Lab 6

Variability




 

Considering measures of variability

The variability of a distribution tells us important information about how different the scores are from one another. Variability helps us understand the shape of the distribution, understand the difference between the mean and median of the distribution, and make decisions about what the data distribution means with regard to our research question.

We'll concentrate on two measures of variability: the range and the standard deviation. Consider a very simple distribution (scores on a 30-point quiz) with only eight data points.

 21, 22, 23, 24, 25, 26, 27, 28

    The simplest measure of variability is the range.
      The range is the difference between the largest (maximum) X value and the smallest (minimum) X value. In this case the range is 28 - 21 = 7.
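If you want to check range calculations on a computer, here is a minimal Python sketch. Python is not part of this lab, and the function name score_range is made up for illustration.

```python
def score_range(scores):
    """Range = largest (maximum) score minus smallest (minimum) score."""
    return max(scores) - min(scores)


quiz = [21, 22, 23, 24, 25, 26, 27, 28]
print(score_range(quiz))  # → 7
```

The name score_range avoids clashing with Python's built-in range().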

      Blackboard 1) What is the range of these scores? (Don't answer from the example range above. Blackboard will generate random scores for each person.)
       
       

    We can also calculate the standard deviation of the distribution. It measures how far the individual scores in the distribution typically are from a standard, where that standard is the mean of the distribution. In other words, it is roughly the "average distance" of each point from the mean. Calculating the standard deviation takes several steps, and it will be much easier if you create a table like the one below and fill in the blanks as you go. Trust me on this one.

Score (X) | Mean (μ) | Deviation (X - μ) | Squared Deviation (X - μ)²
2         |          |                   |
5         |          |                   |
7         |          |                   |
10        |          |                   |
11        |          |                   |
Totals    |          |                   |
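As an optional check on your hand calculations, the Python sketch below fills in the same four columns for a made-up set of scores (4, 6, 8, 10, 12), deliberately different from the quiz scores used in the Blackboard questions. The function name worksheet is hypothetical.

```python
def worksheet(scores):
    """Return one (X, mean, X - mean, (X - mean)^2) row per score, plus column totals."""
    mu = sum(scores) / len(scores)                        # STEP 1: the mean
    rows = [(x, mu, x - mu, (x - mu) ** 2) for x in scores]  # STEPS 2-3: deviations and their squares
    total_dev = sum(r[2] for r in rows)                   # Deviation column total
    total_sq = sum(r[3] for r in rows)                    # Squared Deviation column total (SS)
    return rows, total_dev, total_sq


scores = [4, 6, 8, 10, 12]  # hypothetical scores, not the lab's quiz data
rows, total_dev, total_sq = worksheet(scores)
print(f"{'X':>5} {'μ':>6} {'X-μ':>6} {'(X-μ)²':>7}")
for x, mu, dev, sq in rows:
    print(f"{x:>5} {mu:>6.1f} {dev:>6.1f} {sq:>7.1f}")
print(f"{'Total':>5} {'':>6} {total_dev:>6.1f} {total_sq:>7.1f}")
```

For these made-up scores the mean is 8, the Deviation column totals 0.0, and the Squared Deviation column totals 40.0.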




      STEP 1: Our first step is to compute the mean of the distribution.

      Blackboard 2) Compute the mean for this quiz score distribution: 2, 5, 7, 10, 11.

      STEP 2: The next step is to figure out how far away each of the data points in the distribution is from this standard (the deviations).

      Calculate the deviation of each score from the mean by subtracting the mean you calculated above from each score in the distribution (i.e., score - mean = deviation). REMEMBER THAT DEVIATIONS CAN BE NEGATIVE.
      Blackboard 3) The deviation for 2 is:
      Blackboard 4) The deviation for 5 is:
      Blackboard 5) The deviation for 7 is:
      Blackboard 6) The deviation for 10 is:
      Blackboard 7) The deviation for 11 is:

      Notice that if you add up all of the deviations, they must equal 0 (or very nearly 0 if you rounded the mean). Think about it at a conceptual level: scores above the mean have positive deviations, scores below the mean have negative deviations, and when you add them together the two sides cancel each other out.
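To see that the cancelling-out is exact rather than a coincidence, this Python sketch computes deviations with exact fractions (Python's standard fractions module, so nothing gets rounded) for a hypothetical set of scores whose mean is not a whole number. The function name deviations is made up for illustration.

```python
from fractions import Fraction


def deviations(scores):
    """Deviation scores (score - mean), computed with exact fractions."""
    mu = Fraction(sum(scores), len(scores))  # exact mean, no decimal rounding
    return [x - mu for x in scores]


devs = deviations([1, 2, 4])  # hypothetical scores; the mean is 7/3
print(devs)       # [Fraction(-4, 3), Fraction(-1, 3), Fraction(5, 3)]
print(sum(devs))  # 0
```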

      STEP 3: The next step is to find the "average" of these deviations. But notice that we have a problem: if the deviations always add up to 0, then their average will always be 0 too. So we have to get rid of the negative signs. We do this by squaring the deviations (later we'll reverse this by taking the square root of the variance). For this step, square each deviation and then add the squared values together. We'll call this value the Sum of the Squared Deviations (Sum of Squares, or SS, for short).

      Blackboard 8) Square each of the deviation scores and then add up the values you get. This will be the sum of squared deviations. What is the sum of squares for this distribution?

      STEP 4: Now we have the sum of squares (SS), but remember that we're looking for the average of the squared deviations. The average squared deviation is called the variance. To calculate the variance of a sample, we divide the SS by the number of individuals in the sample minus 1 (n - 1). The reason we subtract 1 is related to the fact that our deviations always add up to 0: because the mean of the sample is already fixed, once we know all but one of the deviations, the last one is determined. We will discuss this issue (called degrees of freedom) in more detail later in the course. If we are computing the variance for a population rather than a sample, we divide the SS by n alone. Some calculators and statistical packages give you the option of calculating the variance for either a sample or a population.
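A minimal Python sketch of STEP 3 and STEP 4, showing how dividing by n - 1 versus n changes the result. The scores (4, 6, 8, 10, 12) and the function names are hypothetical examples, not part of the lab.

```python
def sum_of_squares(scores):
    """STEP 3: sum of the squared deviations from the mean (SS)."""
    mu = sum(scores) / len(scores)
    return sum((x - mu) ** 2 for x in scores)


def variance(scores, sample=True):
    """STEP 4: SS / (n - 1) for a sample, SS / n for a population."""
    n = len(scores)
    ss = sum_of_squares(scores)
    return ss / (n - 1) if sample else ss / n


scores = [4, 6, 8, 10, 12]  # hypothetical scores
print(sum_of_squares(scores))          # 40.0
print(variance(scores, sample=True))   # 10.0  (40 / 4)
print(variance(scores, sample=False))  # 8.0   (40 / 5)
```

Python's standard statistics module makes the same distinction: statistics.variance divides by n - 1 and statistics.pvariance divides by n.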

      Blackboard 9) Calculate the sample variance by dividing the sum of squares by n - 1 (i.e., 5 - 1 = 4).

      STEP 5: The last step is to reverse that squaring we did to get rid of the signs of the deviations. We need to take the square root of the variance to get our standard deviation.

      Blackboard 10) Calculate the sample standard deviation by taking the square root of the sample variance. Round your answer to 2 decimal places.

      You should always ask yourself, "Does this answer make sense?" The final answer should be roughly the average distance of each score from the mean. If your answer is much bigger or much smaller than you would expect from eyeballing the distribution, there is probably a miscalculation somewhere. When you square numbers, it is easy for a small miscalculation to push the answer way off the mark.

      To review:

        STEP 1: Compute the mean of the distribution
        STEP 2: Find the deviation scores by subtracting the mean from each score
        STEP 3: Square the deviation scores to get rid of the sign and add them up to get the sum of squares
        STEP 4: Divide the sum of squares by n for a population or n - 1 for a sample to get the variance (the average squared deviation)
        STEP 5: Take the square root to reverse the squaring that was done earlier
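Putting the five review steps together, here is a Python sketch for a made-up set of scores (4, 6, 8, 10, 12). The function name standard_deviation is hypothetical; the last two lines cross-check it against Python's standard statistics module, where statistics.stdev uses n - 1 and statistics.pstdev uses n.

```python
import math
import statistics


def standard_deviation(scores, sample=True):
    """Follow the five review steps literally."""
    n = len(scores)
    mu = sum(scores) / n                      # STEP 1: mean
    devs = [x - mu for x in scores]           # STEP 2: deviation = score - mean
    ss = sum(d ** 2 for d in devs)            # STEP 3: sum of squares
    var = ss / (n - 1) if sample else ss / n  # STEP 4: variance
    return math.sqrt(var)                     # STEP 5: square root


scores = [4, 6, 8, 10, 12]  # hypothetical, not the Blackboard data
sd = standard_deviation(scores)
print(round(sd, 2))  # 3.16
print(math.isclose(sd, statistics.stdev(scores)))  # True
print(math.isclose(standard_deviation(scores, sample=False), statistics.pstdev(scores)))  # True
```

The deviations for these scores are 0, 2, or 4 points in size, so a standard deviation of about 3.16 passes the "Does this answer make sense?" check.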

    Let's compare the standard deviation and the range for a second.
      The range is based on just 2 numbers: the lowest and the highest.
      The standard deviation is based on every number in the distribution. Every number matters.
      Thus it is easy to see why statisticians prefer the standard deviation over the range as an accurate and informative measure of variability.
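To see why every number matters, here is a hypothetical pair of five-score distributions with the same range but different standard deviations, sketched in Python with the standard statistics module (statistics.stdev divides by n - 1, matching the sample formula).

```python
import statistics

a = [0, 5, 5, 5, 10]   # hypothetical: most scores bunched at the middle
b = [0, 0, 5, 10, 10]  # hypothetical: most scores out at the extremes

for name, scores in (("a", a), ("b", b)):
    rng = max(scores) - min(scores)  # range uses only the two extreme scores
    sd = statistics.stdev(scores)    # sample standard deviation uses every score
    print(name, rng, round(sd, 2))
# a 10 3.54
# b 10 5.0
```

Both distributions have a range of 10, but only the standard deviation picks up that the scores in b sit farther from the mean.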


Using SPSS to compute measures of variability

Let's check our calculations using SPSS to find the range and standard deviation for this distribution.

Open SPSS and, instead of opening a data file, choose the option to type in new data or just click "Cancel." SPSS opens to the "Variable View" tab by default. In the "Name" column, type in the name of a new variable. It does not matter what you call it. You can name it whatever you want, but whenever I don't care what something is named, I name it "Fred."



Click on the "Data View" tab at the bottom left and type the quiz scores (21, 22, 23, 24, 25, 26, 27, 28) into a single column.


You can access these descriptive statistics in the same way that we accessed the mean, median, and mode. Go to "Analyze", "Descriptive Statistics", and "Descriptives".


This will open a window where you can choose the variable you want descriptive statistics on. Select the variable and click the arrow button in the middle of the window to move it into the box. Then click "OK." Your output will display the minimum and maximum values, the mean of the distribution, and the standard deviation.

You can also get the range by clicking "Options" and checking the "Range" box.

Click "Continue" and then click "Ok." Your output will display minimum and maximum values, the range, the mean of the distribution, and the standard deviation.

Blackboard 11) What is the range of this distribution?
Blackboard 12) What is the standard deviation of this distribution?


Comparing variability for different distributions


      Let's go back to the students.sav file and look at the distributions of quiz scores (quizzes 1-5) that we looked at in an earlier lab, this time comparing them based on their variability.

      Open the students.sav file and calculate the range and standard deviation for each quiz (1-5). There is more than one way to get these statistics; this time, let's use the Frequencies procedure by clicking Analyze > Descriptive Statistics > Frequencies. Click the "Statistics" button, select "Standard Deviation" and "Range," and click "Continue." Now click the "Charts" button, select "Histograms," and click "Continue." Select all 5 quiz variables at once in the variable list on the left and click the arrow button in the middle. All five quizzes should now be in the "Variable(s)" box on the right. Click "OK" and inspect the standard deviations, the ranges, and the histograms of each variable.


      A good test discriminates between learners who really know the material and those who don't. If the scores are nearly all the same (and thus there is not much variability in the test scores), it is not clear that the questions are good ones. They might be too easy or too hard. A test with good questions will distinguish among students who know very little, who know some but not much, who know most of the material but not the finer points, and who have really mastered the material. Usually this means that there will be a wide range of scores and a large standard deviation. You can also look at a histogram to make sure that there is not too much skew in the distribution. A quiz with a lot of scores piled up at the top was probably too easy to discriminate between those who know the material well enough and those who have really mastered it. A quiz with a lot of scores piled up at the bottom was probably too hard. Good tests usually have a roughly symmetrical distribution with a lump in the middle but still have a wide range of scores.

Blackboard 13) Assuming that each quiz had 10 possible points, which quiz appears to do a better job of discriminating student learning of the material on the quiz (that is, which quiz best differentiates between high and low performers/students)?


Properties of the standard deviation

Let's look at some properties of the standard deviation using SPSS.

Enter the following set of data into an SPSS data file and calculate the standard deviation.

2, 2, 8, 14, 14
Transform the distribution by adding 3 points to each score (adding a constant to every score in the distribution). (Hint: you can use the Compute Variable function, under the Transform menu in SPSS, to do this.)
Calculate the standard deviation for the transformed variable.
Blackboard 14) Comparing the standard deviations of the original and transformed variables, what happened?


Now transform the original distribution by multiplying every score by 2 (again, use the compute function). Calculate the new standard deviation.
Blackboard 15) Comparing the standard deviations of the original and transformed variables, what happened?

Here is what I hope you learned by doing this:



    1) Adding a constant to each score in the distribution will not change the standard deviation.

      So if you add 2 to every score in the distribution, the mean also increases by 2, but the standard deviation stays the same: each score and the mean shift by the same amount, so none of the deviations change.
    2) Multiplying each score by a constant causes the standard deviation to be multiplied by the same constant (by its absolute value, if the constant is negative).

      This one is easier to think of with numbers. Suppose that your mean is 20 and that two of the individuals in your distribution have scores of 21 and 23. If you multiply 21 and 23 by 2, you get 42 and 46, and your mean also doubles to 40. Before, your deviations were (21 - 20 = 1) and (23 - 20 = 3). Now your deviations are (42 - 40 = 2) and (46 - 40 = 6). Every deviation doubles, so the standard deviation doubles as well.
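You can confirm both properties with any data. Here is a Python sketch using a made-up set of scores (3, 5, 10) rather than the SPSS data above; it divides each transformed standard deviation by the original one, so a ratio of 1.0 means "unchanged" and 2.0 means "doubled."

```python
import statistics

scores = [3, 5, 10]                   # hypothetical data
plus_three = [x + 3 for x in scores]  # add a constant to every score
times_two = [x * 2 for x in scores]   # multiply every score by a constant

sd = statistics.stdev(scores)  # sample standard deviation of the original scores
print(round(statistics.stdev(plus_three) / sd, 6))  # 1.0 -> standard deviation unchanged
print(round(statistics.stdev(times_two) / sd, 6))   # 2.0 -> standard deviation doubled
```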