In this lab, we will learn about transforming scores in distributions so
that the scores become more useful to us. Before anything is done to the
scores in a distribution, we call the scores **raw
scores**.

At the end of the last lab, you had to add a constant to every score in
the distribution. If you change every score in the distribution in the same
way, this is called a **transformation**.

A transformation can alter the mean and the standard deviation of the distribution so that the location and the space between the numbers may change. However, all the numbers in the distribution stay in the same order.

Suppose that you have a distribution of heights for 10 individuals
measured with a metric ruler. You can transform this distribution into a
different measure, *feet* for example. The mean will change
because of the change from *meters* to *feet*. The standard
deviation will also change for the same reason. However, what does NOT
change are the heights of any of the 10 people in the distribution.
Everybody stays the same height! Therefore, when height is measured in
*feet* or in *meters*, the position of each person in the
distribution will remain the same.

One of the most common transformations is to convert raw scores into
***z*-scores**. Instead of familiar units like feet or meters, a *z*-score expresses each score in standard deviation units:

\[z=\dfrac{X-\mu}{\sigma}\]

In the numerator of this formula,
*X* − *μ* is a **deviation**: the distance of a score from the mean.
The denominator is the standard deviation (*σ*), which is
the typical distance of scores from the mean. Thus, the *z*-score is the
ratio of a particular score's deviation to the typical
deviation.

Suppose that you have recently taken the SAT test and you get a score of
540. In the information pack that came with your score, it states that the
mean SAT score is *μ* = 500, with a standard deviation
of *σ* = 100. Suppose that you would like to convert
your score into a *z*-score (the rest of the lab will give you some
idea why you might want to do this).

\[z=\dfrac{X-\mu}{\sigma}=\dfrac{540-500}{100} =\dfrac{40}{100}=0.4\]

This means that your score is 0.4 standard deviations above the mean.
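The same arithmetic can be sketched in Python. This is only an illustration; the function name `z_score` is our own choice and is not part of the lab materials.

```python
def z_score(x, mu, sigma):
    """Return how many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

# SAT example from the text: X = 540, mu = 500, sigma = 100
sat_z = z_score(540, 500, 100)
print(sat_z)  # 0.4
```

A score below the mean gives a negative result, which is why the sign matters when you report a *z*-score.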

ReggieNet: Answer the question about reaction time.

ReggieNet will generate different numbers for each student. Remember to round to 2 decimals, if necessary. Don't forget the negative sign!

ReggieNet: Answer the question about memory for words.

Converting from *z*-scores to raw scores is simple. If we
multiply both sides of the *z*-score formula by *σ* and
add *μ* to both sides, we get the following
formula:

\[X=z\sigma+\mu\]

Thus, if we know that *σ* = 5,
*μ* = 10, and that *z* = 2, we would
know that the original score (*X*) would be:

\[X = 2 \times 5 + 10=20\]
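As a quick check, here is a minimal Python sketch of the reverse conversion. The function name `raw_score` is hypothetical and chosen only for this example.

```python
def raw_score(z, mu, sigma):
    """Convert a z-score back to a raw score: multiply by sigma, then add mu."""
    return z * sigma + mu

# Example from the text: z = 2, mu = 10, sigma = 5
x = raw_score(2, 10, 5)
print(x)  # 20
```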

ReggieNet: Answer the question about the special ops qualifying test.

In SPSS, go to the *Analyze* menu and select *Descriptive
Statistics*→*Descriptives*.

Then check the *Save standardized values as
variables* box.

This will create a new column in the dataset that includes the
*z*-scores for each data point.
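If you are curious what SPSS is doing behind the scenes, the sketch below standardizes a small list of scores in Python. The scores are made up for illustration and are NOT taken from students.sav. We also assume that SPSS standardizes with the sample standard deviation (dividing by n − 1), which is what `statistics.stdev` computes.

```python
from statistics import mean, stdev

# Made-up quiz scores for illustration only (not from students.sav).
quiz = [7, 8, 5, 9, 10, 6, 8]

m = mean(quiz)
s = stdev(quiz)  # sample standard deviation (divides by n - 1); assumed to match SPSS

# One z-score per data point, like the new column SPSS adds to the dataset.
z_quiz = [(x - m) / s for x in quiz]

highest = max(z_quiz)
print(round(highest, 2))
```

After standardizing, the new scores have a mean of 0 and a standard deviation of 1, but every score keeps its position in the distribution.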

Open the students.sav file
again and create *z*-scores for the Quiz1 variable.

ReggieNet: What is the highest *z*-score for `Quiz1` in the dataset?

**Hint**: Rather than wasting time looking at
each and every score, have SPSS create a frequency table
(*Analyze*→*Descriptive Statistics*→*Frequencies*)
with the new variable `Zquiz1`. You'll find `Zquiz1` at the end of the variable list. Its label in the list will be
*Zscore(quiz1)*. Round your answer to 2 decimal places.

One of the most commonly occurring distributions in nature is the normal distribution.

Let’s examine the normal distribution and see how we work with
probabilities to find the area under the curve for different ranges of
scores. If a distribution is normally distributed then it is
*symmetrical* and *unimodal*. A graph of a normal
distribution is shown below.

A few things to note about normal distributions:

- Not all unimodal, symmetrical curves are normal, but a lot are.
- For this class, we will not worry about how close a distribution is to normal; in fact, for most of the course we'll assume that the distribution is normal.
- The equation that defines a smooth curve like that above is referred to as a *probability density function*. In the case of the normal curve with a mean *μ* and a standard deviation *σ*, the probability density function is: \[f(X)=\dfrac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\dfrac{X-\mu}{\sigma}\right)^2} \]
- The area under the normal curve (or any other probability density curve) must sum to 1. Why? Remember that the area under the curve refers to the probabilities (or proportions) and the total probability must equal 1.
- The normal distribution is often transformed into *z*-scores.
- In the image below, you can see the proportions between each standard deviation interval.
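The density formula in the list above can be checked numerically. The sketch below writes the formula out by hand and compares it with `NormalDist.pdf` from Python's standard library (Python 3.8 or later is assumed). The function name `normal_pdf` and the IQ-style values *μ* = 100 and *σ* = 15 are chosen only for illustration.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Height of the normal curve at x, written out directly from the formula."""
    return (1 / (sigma * sqrt(2 * pi))) * exp(-0.5 * ((x - mu) / sigma) ** 2)

# Compare the hand-written formula with the standard library at the mean.
height_at_mean = normal_pdf(100, 100, 15)
library_value = NormalDist(mu=100, sigma=15).pdf(100)
print(height_at_mean, library_value)

# Symmetry: the curve is equally high 15 points below and 15 points above the mean.
print(normal_pdf(85, 100, 15), normal_pdf(115, 100, 15))
```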

In the normal distribution with mean *μ* and a standard
deviation *σ*:

- 68% of the observations fall within 1 *σ* of the mean.
- 95% of the observations fall within 2 *σ* of the mean.
- 99.7% of the observations fall within 3 *σ* of the mean.

This relationship is sometimes referred to as the “68–95–99.7 rule.”
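You can verify these three percentages yourself. Here is a minimal sketch using `statistics.NormalDist` from Python's standard library (Python 3.8 or later is assumed); the helper name `within` is our own.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)  # the z-score distribution

def within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

The rule's 68, 95, and 99.7 are rounded versions of these values.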

What is the probability of having an IQ of 85 or less?

For IQ scores, *μ* = 100 and
*σ* = 15, so

\[z=\dfrac{85-100}{15}=-1\]

Thus, 85 is 1 standard deviation below the mean (*z* = −1).

In the old days, we had to look up the answer in a large and cumbersome
statistical table. Fortunately, we can now use Excel (or other spreadsheet
programs like Corel Quattro or OpenOffice Calc) to get the answer. We will
use Excel's `NORM.DIST` function.

The `NORM.DIST` function tells you how much of the normal
curve is less than the value you look up.

Open Excel and in cell `A1` type `=NORM.DIST(85,100,15,TRUE)`

Press the Enter key.

Cell `A1` should now display a value close to 0.1587. This
means that about 15.87% of the population has an IQ of 85 or lower.

In the `NORM.DIST` function:

- The first value is the one you want to look up (85 in this case).
- The second value is the mean of the population or sample (100, in this case).
- The third is the standard deviation of the population or sample (15, in this case).
- The fourth value (`TRUE`) tells the function to calculate the cumulative proportion rather than the density (i.e., the height or frequency of the normal curve).
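If you do not have Excel handy, a rough Python equivalent is sketched below. It assumes Python 3.8 or later, where the standard library's `statistics.NormalDist` offers `cdf` (the cumulative proportion, like `TRUE`) and `pdf` (the density, like `FALSE`). The variable names are ours.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)

# Cumulative proportion at or below 85, like =NORM.DIST(85,100,15,TRUE)
p_85_or_less = iq.cdf(85)
print(round(p_85_or_less, 4))  # 0.1587

# Height of the curve at 85, like =NORM.DIST(85,100,15,FALSE)
density_at_85 = iq.pdf(85)
print(density_at_85)
```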

Suppose we know the percentile and wish to find the score associated with it.

For example, what IQ do you need to be in the 75th
percentile? To find a score associated with a percentage, use the
`NORM.INV` function.

In cell `A2` type `=NORM.INV(.75,100,15)`

Cell `A2` should now display 110.12 or something close (I
rounded). This means that you need an IQ of about 110 to be at the
75th percentile.
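The same lookup can be sketched in Python with the `inv_cdf` method of `statistics.NormalDist` (Python 3.8 or later assumed). As in Excel, the percentile is entered as a proportion between 0 and 1.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)

# Score at the 75th percentile, like =NORM.INV(.75,100,15)
iq_at_75th = iq.inv_cdf(0.75)
print(round(iq_at_75th, 2))  # 110.12
```

Feeding the result back into `iq.cdf` returns 0.75, which is a handy way to confirm that you looked up the right thing.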

ReggieNet: If IQ is normally distributed, has a mean of 100, and a standard deviation of 15, what proportion of the population has an IQ score of 78 or less?

**Hint**: This is not a percentage question,
so don't multiply by 100. Proportion has a range from 0 to 1. Round your
answer to 2 decimal places.

ReggieNet: On a test with a mean of 34 and a standard
deviation of 3, which score falls at the 94th percentile?

**Hint**: Since this is a percentile question,
divide 94 by 100 to convert it to a proportion first. Round your answer to
2 decimal places.

Download this Excel spreadsheet tool and open it.

To use this spreadsheet:

- Select *Score to Proportion* if you know the score(s) and wish to calculate proportions or probabilities. Select *Proportion to Score* if you wish to know a raw score when you already know the probability or proportion.
- Select *Less Than*, *More Than*, *Between*, or *Exclude Between*, depending on what you wish to do.
- Enter the mean in the dark box at the top left.
- Enter the standard deviation in the dark box at the top right.
- Enter the raw score(s) or proportion(s) that are known in the dark boxes below the mean and standard deviation boxes. Remember that proportions MUST range from 0 to 1. Any value outside this range will result in an error.

Here is a silent demonstration of how to use the file:

Suppose you wish to know what proportion of scores are less than 5 when
*μ* = 10 and *σ* = 3.

- You know the score (i.e., 5) and you want to know a proportion, so you select *Score to Proportion*.
- You want to know what proportion of the scores is less than 5, so select *Less Than*.
- Enter 10 as the mean.
- Enter 3 as the standard deviation.
- Enter 5 as the raw score.

You should now see the answer (0.05) in the *Proportion Under
Curve* box.

ReggieNet: What proportion of scores are less than 50
when *μ* = 50 and *σ* = 10?

ReggieNet: What proportion of scores are greater than
64 when *μ* = 50 and
*σ* = 10?

**Hint**: Select *More Than* instead of
*Less Than* in the listbox. Round to 2 decimals.

ReggieNet: Approximately which score is in the top 25%
of scores (i.e., higher than 75% of scores) in a distribution in which
*μ* = 100 and *σ* = 10?

**Hint**: Select *Proportion to Score*.
Select *More Than* and enter 0.75 in the cumulative proportion box.
The answer will appear in the *Raw Score* box. Round answer to 2
decimals.

Sometimes we need to find the probability that X will fall between two scores rather than simply above a score or below a score.

The spreadsheet tool looks up the cumulative proportions for both
*z*-scores and computes the difference between them.

What proportion of the population scores between 22 and 28 on the ACT?

Assume that for the ACT: *μ* = 21,
*σ* = 5.

Before computers did this task for us, we had to calculate the
*z*-scores, look up the cumulative proportions associated with those
*z*-scores in a table, and then subtract the smaller proportion from the
larger one. The process used to look like this:

The *z*-score for 22 is \( \dfrac{22-21}{5}=0.20 \).

The *z*-score for 28 is \( \dfrac{28-21}{5}=1.40 \).

The cumulative proportion associated with a *z*-score of 0.20 had
to be looked up in a table like this one. The value is
0.5793.

The cumulative proportion associated with a *z*-score of 1.40 is
0.9192.

The difference between these proportions is 0.9192 − 0.5793 = 0.3399, which rounds to 0.34.

To do this with the `NORM.DIST` function in Excel is fairly
simple:

`=NORM.DIST(28,21,5,TRUE)-NORM.DIST(22,21,5,TRUE)`

However, even this approach is sometimes hard to remember. It is easier to use the spreadsheet tool:

- Make sure that the *Score to Proportion* option is selected.
- Make sure that the *Between* option is selected.
- Enter 21 as the mean in the dark box at the top.
- Enter 5 as the standard deviation.
- Enter 22 and 28 in the *Raw Scores* boxes.

The box labeled *Proportion Under Curve* should now say 0.34,
which is the same answer obtained above.
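For comparison, here is a minimal Python sketch of the same "between" calculation, assuming Python 3.8 or later and the standard library's `statistics.NormalDist`. The function name `proportion_between` is hypothetical.

```python
from statistics import NormalDist

def proportion_between(low, high, mu, sigma):
    """Proportion of a normal distribution that falls between low and high."""
    dist = NormalDist(mu=mu, sigma=sigma)
    return dist.cdf(high) - dist.cdf(low)

# ACT example from the text: mu = 21, sigma = 5, scores between 22 and 28
act_between = proportion_between(22, 28, 21, 5)
print(round(act_between, 2))  # 0.34
```

Like the spreadsheet tool, the function simply subtracts the cumulative proportion at the lower score from the cumulative proportion at the higher score.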

ReggieNet: If *μ* = 25 and
*σ* = 10, what proportion is between 34 and 41?

**Hint**: This is not a percentage question,
so don't multiply by 100. Proportion has a range from 0 to 1. Round your
answer to 2 decimal places.

And finally, you might want to know what proportion lies outside two points (essentially the opposite of the last situation). If you did not have the spreadsheet tool, you would solve the problem just like you did with the between type of question but then you would subtract your answer from 1. Thus, in the previous example, 34% of the ACT scores were between 22 and 28. This means that 100% − 34% = 66% of the scores fell outside this range.

What proportion of the population scores lower than 300 or higher than 650 on the Verbal SAT?

Assume: *μ* = 500, *σ* = 100

First we find the proportion of the distribution less than 300 and also less than 650.

\[P\left(z \le \dfrac{650-500}{100}\right)= P(z \le 1.5) = 0.9332\]

\[P\left(z \le \dfrac{300-500}{100}\right)= P(z \le -2) = 0.0228\]

The difference between these numbers is 0.9104. This is the area
*between* the scores. Subtracting 0.9104 from 1 gives us the area
outside the scores.

Thus, 1 − 0.9104 = 0.0896.

Rounding gives us 0.09.

In Excel:
`=1-(NORM.DIST(650,500,100,TRUE)-NORM.DIST(300,500,100,TRUE))`

To answer the problem with the spreadsheet tool, select *Score to
Proportion*, select *Exclude Between*, enter 500 and 100 as the
mean and standard deviation, and enter 300 and 650 as the 2 raw scores.
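The "outside" calculation can also be sketched in Python, again assuming Python 3.8 or later and `statistics.NormalDist`. The function name `proportion_outside` is our own.

```python
from statistics import NormalDist

def proportion_outside(low, high, mu, sigma):
    """Proportion of a normal distribution below low or above high."""
    dist = NormalDist(mu=mu, sigma=sigma)
    between = dist.cdf(high) - dist.cdf(low)
    return 1 - between  # everything that is not between the two scores

# Verbal SAT example from the text: mu = 500, sigma = 100, outside 300 to 650
sat_outside = proportion_outside(300, 650, 500, 100)
print(round(sat_outside, 4))  # 0.0896
```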

ReggieNet: If *μ* = 47 and
*σ* = 6, what proportion is outside of 40 and
44?

Consider the following situation. You take the ACT and the SAT. You get a 26 on the ACT and a 620 on the SAT. The college that you apply to only needs one score. Which do you want to send them? That is, which score is better, 26 or 620?

It is hard to do a direct comparison here because the two distributions have different properties: different means, and different variabilities.

How might we go about it?

1. Look at the distribution graphs, locate the scores, and compare. However, sometimes it is hard to tell when the scores are close.
2. Compute and compare the *z*-scores:
   - ACT: \( z=\dfrac{26-18}{6}=1.\bar{3} \)
   - SAT: \( z=\dfrac{620-500}{100}=1.2 \)

   Because the ACT *z*-score is larger, the ACT score is better than the SAT score.
3. Compute and compare the cumulative proportions in Excel using the `NORM.DIST` function:
   - ACT: `=NORM.DIST(26,18,6,TRUE)`: 0.9088
   - SAT: `=NORM.DIST(620,500,100,TRUE)`: 0.8849

Methods 2 and 3 are both valid and will yield the same answer every time.
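Both methods can be sketched in a few lines of Python (3.8 or later assumed, using `statistics.NormalDist`). The means and standard deviations are the ones used in the example above; the variable names are ours.

```python
from statistics import NormalDist

act = NormalDist(mu=18, sigma=6)
sat = NormalDist(mu=500, sigma=100)

# Method 2: compare z-scores
act_z = (26 - act.mean) / act.stdev
sat_z = (620 - sat.mean) / sat.stdev
print(round(act_z, 2), round(sat_z, 2))  # 1.33 1.2

# Method 3: compare cumulative proportions
act_p = act.cdf(26)
sat_p = sat.cdf(620)
print(round(act_p, 4), round(sat_p, 4))  # 0.9088 0.8849

better = "ACT" if act_z > sat_z else "SAT"
print(better)  # ACT
```

The two methods agree because a larger *z*-score always corresponds to a larger cumulative proportion under the normal curve.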

ReggieNet: Suppose that you got a 540 on the SAT (*μ*
= 500, *σ* = 100) and a 20 on the ACT
(*μ* = 18, *σ* = 6). Which score is
better?

ReggieNet: Suppose instead that you got a 600 on the
SAT (*μ* = 500, *σ* = 100) and a 24 on
the ACT (*μ* = 18, *σ* = 6). Now which
score is better?

If you are interested, you can check out an introduction to a different way of thinking about the normal curve. The material in this video will not be on any exams in this course.

Normal curves are symmetrical. However, many variables are lopsided or
*skewed*. Here is an introduction to skewness. The material in this
video will not be on any exams in this course.

Typically, the first thing we want to know about a distribution is its
central tendency. Second, we want to know about its variability. Then we
want to know about its skewness. What is left to know after that?
*Kurtosis*.

Although *z*-scores are very useful, there are good reasons to
transform them into other types of standard scores.