Lab 7

Standard scores and the normal distribution

 

Transformations

At the end of the last lab you had to add a constant to every score in the distribution. If you change every score in the distribution in the same way, this is called a transformation. Essentially what you do with a transformation is change the scale of the distribution. This will typically change some of the properties of the overall distribution (e.g., center and spread), but within the distribution all of the points remain in the same location relative to each other.

Consider the following example:

    Suppose that you have a distribution of heights for 10 individuals measured with a metric ruler. You can transform this distribution into a different unit, feet for example. Now the mean will change (it will be in feet instead of meters), as will the standard deviation (again in feet rather than meters). However, what doesn't change are the heights of any of the 10 people in the distribution. Everybody stays the same height!

One of the most common transformations that is performed on distributions is to convert "raw scores" into z-scores. Z-scores are measured in standard deviation units. That is, the transformation removes measures like feet or meters, and replaces them with a unit that can be interpreted as "how many standard deviations away from the mean is this point." The transformation is performed by using the z-score formula.


z = (X - μ) / σ


This formula computes the deviation between a score and the mean of the distribution and divides it by the average deviation in the distribution (which is the standard deviation).

Consider the following example:

Suppose that you have recently taken the SAT and you get a score of 540. The information pack that came with your score states that the mean SAT score is μ = 500, with a standard deviation of σ = 100. Suppose that you would like to convert your score into a z-score (the rest of the lab will give you some idea why you might want to do this).

z-score = (540-500)/100 = 40/100 = .4

This means that your score is .4 standard deviations above the mean.
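
If you would like to check this arithmetic outside of SPSS or Excel, here is a minimal Python sketch of the z-score formula. The function name z_score is just an illustrative choice and is not part of any course software.

```python
def z_score(x, mean, sd):
    """Return how many standard deviations x lies from the mean."""
    return (x - mean) / sd

# The SAT example from above: score 540, mean 500, standard deviation 100
print(z_score(540, 500, 100))  # 0.4
```

A negative result means the score is below the mean.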


Blackboard 1) Answer the question about reaction time (Blackboard will generate different numbers for each student. Remember to round to 2 decimals, if necessary. Don't forget the negative sign!)

Blackboard 2) Answer the question about memory for words.


Converting from z-scores to raw scores is simple. If we multiply both sides of the z-score formula by σ and add μ to both sides, we get the following formula:

zσ + μ = X

Thus, if we know that σ = 5, μ = 10, and z = 2, we know that the original score (X) is 2 * 5 + 10 = 20.
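
The reverse conversion can be sketched the same way. This is only an illustration; the function name raw_score is hypothetical.

```python
def raw_score(z, mean, sd):
    """Convert a z-score back to the original scale: X = z * sd + mean."""
    return z * sd + mean

# The example from above: z = 2, mean 10, standard deviation 5
print(raw_score(2, 10, 5))  # 20
```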

Blackboard 3) Answer the question about the special ops qualifying test.



You can use SPSS to compute z-scores. From the Analyze menu, choose Descriptive Statistics and then Descriptives.

Then check the "Save standardized values as variables" box.

This will create a new column in the dataset that includes the z-scores for each data point.
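
For the curious, here is a rough Python sketch of what that new column contains. The quiz scores below are made up for illustration (they are not from students.sav), and the sketch assumes that SPSS standardizes with the sample standard deviation (the n - 1 version), which is my understanding of how Descriptives works.

```python
from statistics import mean, stdev

def standardize(scores):
    """Convert every score in a list to a z-score."""
    m = mean(scores)
    s = stdev(scores)  # sample standard deviation (divides by n - 1); assumed to match SPSS
    return [(x - m) / s for x in scores]

quiz = [4, 6, 7, 8, 10]  # made-up scores, for illustration only
z = standardize(quiz)
print([round(v, 2) for v in z])  # [-1.34, -0.45, 0.0, 0.45, 1.34]
print(round(max(z), 2))          # 1.34, the highest z-score
```

Notice that the z-scores keep the same order and relative spacing as the raw scores. Only the scale has changed.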


Open the students.sav file again and create z-scores for the Quiz1 variable.

Blackboard 4) What is the highest z-score for Quiz1 in the dataset? Hint: Rather than wasting time looking at each and every score, have SPSS create a frequency table (Analyze->Descriptive Statistics->Frequencies) with the new variable Zquiz1. You'll find Zquiz1 at the end of the variable list. Its label in the list will be "Zscore(quiz1)." Round your answer to 2 decimal places.


The Normal Distribution

One of the most commonly occurring distributions is the Normal Distribution.

Let's examine the Normal Distribution and see how we work with probabilities to find the area under the curve for different ranges of scores. If a distribution is normally distributed then it is symmetrical and unimodal. A graph of a normal distribution is shown below.

A few things to note about Normal Distributions.

  • Not all unimodal, symmetrical curves are normal, but a lot are.
  • For this class, we won't worry about how close a distribution is to normal. In fact, for most of the course we'll assume that the distribution is normal.
  • A smooth curve like that above is referred to as a density curve (rather than a frequency curve).
  • The area under any density curve must sum to 1. Why? Remember that the area under the curve refers to probabilities (or proportions), and the total probability must equal 1.
  • The normal distribution is often transformed into z-scores.
  • For a normal distribution:
      34.13% of the scores will fall between the mean (μ) and 1 standard deviation above it.
      13.59% of the scores will fall between 1 and 2 standard deviations above the mean.
      2.14% of the scores will fall between 2 and 3 standard deviations above the mean.
      The same percentages apply below the mean, because the curve is symmetrical.

This relationship is sometimes referred to as the 68-95-99.7 rule.
    In the normal distribution with mean μ and standard deviation σ:
    • 68% of the observations fall within 1σ of the mean μ.
    • 95% of the observations fall within 2σ of the mean μ.
    • 99.7% of the observations fall within 3σ of the mean μ.
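
You can verify the rule numerically. The sketch below uses NormalDist from Python's standard library (available in Python 3.8 and later); the variable names are my own.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)  # the standard normal (z) distribution

for k in (1, 2, 3):
    # Area within k standard deviations = area below +k minus area below -k
    within = std_normal.cdf(k) - std_normal.cdf(-k)
    print(f"within {k} SD: {within:.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```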





What is the probability of having an IQ of 85 or less?
A more compact way of asking this question uses probability notation like this:
What is p(IQ < 85)?
    For IQ scores, μ = 100 and σ = 15.
z = (IQ - μ) / σ = (85 - 100) / 15 = -1

    Thus, 85 is 1 standard deviation below the mean.

However, we don't know how much of the figure at the right is shaded.
In the old days, we had to look up the answer in a large and cumbersome
statistical table. Fortunately, we can now use Excel (or other spreadsheet
programs like Corel Quattro or OpenOffice Calc) to get the answer. We will use
Excel's NORMDIST function.

The NORMDIST function tells you how much of the normal curve is less than
the value you look up.

Open Excel and in cell A1 type "=NORMDIST(85,100,15,TRUE)" (without the quotes).

Press the Enter key.

Cell A1 should now display a value close to 0.1587. This means that about
15.87% of the population has an IQ of 85 or lower.

In the NORMDIST function, the first value is the one you want to look up (85
in this case).
The second value is the mean of the population or sample
(100, in this case).
The third is the standard deviation of the population or sample
(15, in this case).
The fourth value ("TRUE") tells the function to calculate the cumulative
proportion rather than the density (i.e., the height or frequency of the
normal curve).
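
If you do not have Excel handy, the same lookup can be sketched in Python with the standard library's NormalDist class. This is an equivalent calculation, not what Excel does internally; cdf plays the role of the "TRUE" setting and pdf plays the role of "FALSE".

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)  # IQ scores: mean 100, standard deviation 15

# Cumulative proportion at or below 85, like NORMDIST(85,100,15,TRUE)
p_below_85 = iq.cdf(85)
print(round(p_below_85, 4))  # 0.1587

# Height of the curve at 85, like NORMDIST(85,100,15,FALSE).
# This is a density, not a probability.
density = iq.pdf(85)
```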

What if we know the percentile and wish to find the score associated with it?
For example, what IQ do you need to be in the 75th percentile? To find a
score associated with a percentage, use the NORMINV function.

In cell A2 type "=NORMINV(.75,100,15)" (without the quotes)

Cell A2 should now display 110.12 or something close (I rounded). This means that
you need an IQ of about 110 to be at the 75th percentile.
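
The reverse lookup has a Python counterpart as well. In this sketch, inv_cdf from the standard library stands in for NORMINV.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)

# Score at the 75th percentile, like NORMINV(.75,100,15)
score_75 = iq.inv_cdf(0.75)
print(round(score_75, 2))  # 110.12
```

As with NORMINV, the percentile must be given as a proportion strictly between 0 and 1.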


Blackboard 5) What is the probability of having an IQ score of 78 or less? (Hint: This is not a percentage question, so don't multiply by 100. Probability has a range from 0 to 1. Round your answer to 2 decimal places.)

Blackboard 6) On a test with a mean of 34 and a standard deviation of 3, which score falls at the 94th percentile? (Hint: Since this is a percentile question, divide 94 by 100 to convert it to a proportion first. Round your answer to 2 decimal places.)

Although I do want you to be aware of the fact that most of the statistical tables we used to have to use can be replaced with Excel, I don't expect you to become an "Excel Master" in this course. I think that a little time invested in learning Excel will pay large dividends in many aspects of your life (not just in statistics). People who know Excel extremely well get raises at work because they can be many times more productive than their peers in a wide variety of tasks. That being said, some of the questions I would like you to be able to answer in this course can become unnecessarily complicated using a blank Excel sheet to start with. Therefore, because I wish to spare you needless headaches, I made an Excel spreadsheet tool that makes this whole process easier.

Download this file and open it in Excel.

To use this spreadsheet:
1. Select "Score to Proportion" if you know the score(s) and wish to calculate proportions or probabilities. Select "Proportion to Score" if you wish to know a raw score when you already know the probability or proportion.
2. Select "Less Than", "More Than", "Between", or "Exclude Between" depending on what you wish to do.
3. Enter the mean in the dark box at the top left.
4. Enter the standard deviation in the dark box at the top right.
5. Enter the raw score(s) or proportion(s) that are known in the dark boxes below the mean and standard deviation boxes. Remember that proportions MUST range from 0 to 1. Any value outside this range will result in an error.

Here is a silent demonstration of how to use the file

Example:
Suppose you wish to know what proportion of scores are less than 5 when μ = 10 and σ = 3.
1. You know the score (i.e., 5) and you want to know a proportion so you select "Score to Proportion."
2. You want to know how much of the scores are LESS THAN 5 so select "Less Than."
3. Enter 10 as the mean
4. Enter 3 as the standard deviation.
5. Enter 5 as the raw score.
You should now see the answer (.05) in the "Proportion Under Curve" box.

Blackboard 7) What proportion of scores are less than 50 when μ = 50 and σ = 10?
Blackboard 8) What proportion of scores are more than 64 when μ = 50 and σ = 10? Hint: Select "More than" instead of "Less than" in the listbox.
Blackboard 9) Approximately which score is in the top 25% of scores (i.e., higher than 75% of scores) in a distribution in which μ = 100 and σ = 10? Hint: Select "Proportion to Score." Select "More Than" and enter .75 in the cumulative proportion box. The answer will appear in the "Raw Score" box. Round answer to 2 decimals.



Sometimes we need to find the probability that X will fall between two scores rather than simply above a score or below a score.

The spreadsheet tool looks up the cumulative proportions for both z-scores and computes the difference between them.


Example:
    What proportion of the population scores between 22 and 28
    on the ACT?

    Assume that for the ACT: μ = 21, σ = 5

    Before computers did this task for us, we had to calculate
    the z-scores, look up the cumulative proportions associated with
    those z-scores in a table, and then take the difference. The
    process used to look like this:

    The z-score of 22 is (22 - 21) / 5 = .20
    The z-score of 28 is (28 - 21) / 5 = 1.40

    The cumulative proportion associated with a z-score of .20 is .5793
    (value obtained from a standard normal table).
    The cumulative proportion associated with a z-score of 1.40 is .9192.
    The difference between these proportions is .9192 - .5793 = .3399,
    or about .34.


    To do all this with the spreadsheet is simple:
    1. Make sure that the "Score to Proportion" option is selected.
    2. Make sure that the "Between" option is selected.
    3. Enter 21 as the mean in the dark box at the top.
    4. Enter 5 as the standard deviation.
    5. Enter 22 and 28 as the two raw scores.
    The box labeled "Proportion Under Curve" should now say .34,
    which is the same answer obtained above.
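
The "between" calculation for this ACT example can also be sketched in Python. This mirrors the logic described here (look up both cumulative proportions and take the difference); it is not the spreadsheet tool's actual code, and the variable names are my own.

```python
from statistics import NormalDist

act = NormalDist(mu=21, sigma=5)  # ACT values assumed in the example

# Area between two scores = area below the higher score minus area below the lower score
p_between = act.cdf(28) - act.cdf(22)
print(round(p_between, 2))  # 0.34
```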




Blackboard 10) If μ = 25 and σ = 10, what proportion is between 34 and 41? (Hint: This is not a percentage question, so don't multiply by 100. Proportion has a range from 0 to 1. Round your answer to 2 decimal places.)


And finally, you might want to know what proportion lies outside two points (essentially the opposite of the last situation). If you did not have the spreadsheet tool, you would solve the problem just like you did with the between type of question but then you would subtract your answer from 1. Thus, in the previous example, 34% of the ACT scores were between 22 and 28. This means that 100% - 34% = 66% of the scores fell outside this range.

Example:

    What is the probability of scoring lower than 300 or higher than 650 on the SAT?

    Assume: μ = 500, σ = 100

    p(z < (650 - 500) / 100) = p(z < 1.5) = .9332

    p(z < (300 - 500) / 100) = p(z < -2.0) = .0228

    The difference between these numbers is .9104. This is the area between the scores. Subtracting .9104 from 1 gives us the area outside the scores.
    Thus, 1 - .9104 = .0896
    Rounding gives us .09.

    To answer the problem with the spreadsheet tool, select "Score to Proportion", select "Exclude Between", enter 500 and 100 as the mean and standard
    deviation, and enter 300 and 650 as the two raw scores.
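
Here is the same "outside" calculation as a short Python sketch, under the same SAT assumptions. Adding the two tails directly gives the same result as subtracting the between-area from 1.

```python
from statistics import NormalDist

sat = NormalDist(mu=500, sigma=100)

lower_tail = sat.cdf(300)      # proportion scoring below 300
upper_tail = 1 - sat.cdf(650)  # proportion scoring above 650
p_outside = lower_tail + upper_tail
print(round(p_outside, 4))  # 0.0896
```
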
Blackboard 11) If μ = 47 and σ = 6, what proportion is outside of 40 and 44?


Comparing Distributions

    Consider the following example:

    The distribution is of SAT scores.

    The mean (μ) is 500.
    The standard deviation (σ) is 100.

    If you got a score of 650 on the SAT, what is the corresponding z-score?

    z = (X - μ) / σ = (650 - 500) / 100 = 150 / 100 = 1.5

    So your score (650) is 1.5 standard deviations above the mean.

    Now let's think about why making the transformation into z-scores is important. Suppose that we want to compare two scores from two distributions. If these distributions each have a different mean and standard deviation, then this task can be difficult. However, if we transform each distribution into z-scores (standardize the distribution, like SPSS did in the earlier section), then we can compare the distributions more easily.

    Consider the following situation. You take the ACT test and the SAT test. You get a 26 on the ACT and a 620 on the SAT. The college that you apply to only needs one score. Which do you want to send them (that is, which score is better, 26 or 620?).

    It is hard to do a direct comparison here because the two distributions have different properties: different means, and different variabilities.

    How might we go about it?

      Step 1: look at the distribution graphs, locate the scores and compare -- still hard to tell
      Step 2: think about cumulative percentiles and percentile ranks -- this will work
      Step 3: try and take the deviations and standard deviations into account by converting all the scores to z-scores

        e.g., ACT mean = 18, SD = 6, deviation = 26 - 18 = 8
        so a 26 is 1.33 SD above the mean (8 / 6)
        SAT mean = 500, SD = 100, deviation = 620 - 500 = 120
        so a 620 is 1.2 SD above the mean (120 / 100)
        - so the ACT score is better than the SAT score
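
The comparison in Step 3 can be sketched in Python as follows, using the ACT and SAT means and standard deviations from this example. The percentile lines additionally assume that both tests are normally distributed, and the function and variable names are my own.

```python
from statistics import NormalDist

def z_score(x, mean, sd):
    """Return how many standard deviations x lies from the mean."""
    return (x - mean) / sd

z_act = z_score(26, 18, 6)      # ACT: mean 18, SD 6
z_sat = z_score(620, 500, 100)  # SAT: mean 500, SD 100
print(round(z_act, 2), round(z_sat, 2))  # 1.33 1.2

# Step 2's percentile idea, assuming both distributions are normal
pct_act = NormalDist().cdf(z_act)
pct_sat = NormalDist().cdf(z_sat)
print(round(pct_act, 2), round(pct_sat, 2))  # 0.91 0.88

better = "ACT" if z_act > z_sat else "SAT"
print(better)  # ACT
```

Whichever score has the larger z-score also has the higher percentile rank, so the two approaches agree.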

    So to be able to make a comparison, one approach would be to transform both distributions into a standardized distribution.

    We can transform any and all observations or values from a distribution to z-scores if we know both μ and σ.

      Blackboard 12) Suppose that you got a 540 on the SAT (μ = 500, σ = 100) and a 20 on the ACT (μ = 18, σ = 6). Which score is better?

      Blackboard 13) Suppose instead that you got a 600 on the SAT (μ = 500, σ = 100) and a 24 on the ACT (μ = 18, σ = 6). Now which score is better?

      There are also review questions on Blackboard to complete.