# Transformations

In this lab, we will learn about transforming scores in distributions so that the scores become more useful to us. Before anything is done to the scores in a distribution, we call the scores raw scores.

At the end of the last lab, you had to add a constant to every score in the distribution. If you change every score in the distribution in the same way, this is called a transformation.

A transformation can alter the mean and the standard deviation of the distribution so that the location and the space between the numbers may change. However, all the numbers in the distribution stay in the same order.

#### Example:

Suppose that you have a distribution of heights for 10 individuals measured in meters with a metric ruler. You can transform this distribution into a different unit of measure, feet for example. The mean will change because of the change from meters to feet. The standard deviation will also change for the same reason. However, what does NOT change are the heights of any of the 10 people in the distribution. Everybody stays the same height! Therefore, whether height is measured in feet or in meters, the position of each person in the distribution will remain the same.
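To see this with numbers, here is a small Python sketch. The ten heights are made up for illustration, and the helper name `rank_order` is my own; the only fixed fact is the conversion factor of about 3.28084 feet per meter.

```python
from statistics import mean, pstdev

# Hypothetical heights in meters for 10 people (made-up data)
heights_m = [1.70, 1.52, 1.83, 1.65, 1.91, 1.60, 1.75, 1.68, 1.80, 1.73]
FEET_PER_METER = 3.28084

# The transformation: every score is changed in the same way
heights_ft = [h * FEET_PER_METER for h in heights_m]

def rank_order(values):
    """Return the positions of the people sorted from shortest to tallest."""
    return sorted(range(len(values)), key=lambda i: values[i])

print(mean(heights_m), mean(heights_ft))      # the means differ
print(pstdev(heights_m), pstdev(heights_ft))  # the standard deviations differ
print(rank_order(heights_m) == rank_order(heights_ft))  # True: same order
```

The mean and standard deviation are both multiplied by the conversion factor, but the order of the people does not change.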

# z-scores

One of the most common transformations is to convert raw scores into z-scores. Instead of familiar units like meters or grams, z-scores are measured in standard deviation units. A z-score measures how many standard deviations a particular score is from the mean. The transformation is performed by using the z-score formula:

$z=\dfrac{X-\mu}{\sigma}$

In the numerator of this formula, X − μ is a deviation: the distance of a particular score from the mean. The denominator is the standard deviation (σ), which is the typical distance of scores from the mean. Thus, the z-score is the ratio of a particular score's deviation to the typical score's deviation.

#### Example:

Suppose that you recently took the SAT and got a score of 540. The information pack that came with your score states that the mean SAT score is μ = 500, with a standard deviation of σ = 100. Suppose that you would like to convert your score into a z-score (the rest of the lab will give you some idea why you might want to do this).

$z=\dfrac{X-\mu}{\sigma}=\dfrac{540-500}{100} =\dfrac{40}{100}=0.4$

This means that your score is 0.4 standard deviations above the mean.
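The same calculation can be sketched in Python. The function name `z_score` is just an illustrative choice, not something defined elsewhere in this lab.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

z = z_score(540, 500, 100)
print(z)  # 0.4
```

A score below the mean gives a negative z-score, for example `z_score(460, 500, 100)` is `-0.4`.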

ReggieNet will generate different numbers for each student. Remember to round to 2 decimals, if necessary. Don't forget the negative sign!

## Convert from z-scores back to raw scores.

Converting from z-scores to raw scores is simple. If we multiply both sides of the z-score formula by σ and add μ to both sides, we get the following formula:

$X=z\sigma+\mu$

Thus, if we know that σ = 5, μ = 10, and that z = 2, we would know that the original score (X) would be:

$X = 2 \times 5 + 10 = 20$
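As a quick check, the reverse transformation can be written as a short Python function. The name `raw_score` is hypothetical and used only for this sketch.

```python
def raw_score(z, mu, sigma):
    """Convert a z-score back to the original (raw) score: X = z * sigma + mu."""
    return z * sigma + mu

x = raw_score(2, mu=10, sigma=5)
print(x)  # 20
```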

# Using SPSS to compute z-scores

Go to the menu and select Analyze > Descriptive Statistics > Descriptives.

Then check the Save standardized values as variables box.

This will create a new column in the dataset that includes the z-scores for each data point.
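If you want to see what that new column contains, here is a rough Python sketch of the same idea. The quiz scores are made up (they are not the values in students.sav), and I am assuming that SPSS standardizes with the sample standard deviation (n − 1 in the denominator), which is what `statistics.stdev` computes.

```python
from statistics import mean, stdev

# Hypothetical quiz scores (NOT the real students.sav data)
quiz1 = [7, 10, 5, 8, 9, 6, 10, 4]

m = mean(quiz1)
s = stdev(quiz1)  # sample standard deviation (assumed to match SPSS)

# One z-score per data point, like the new column SPSS adds
zquiz1 = [(x - m) / s for x in quiz1]

for raw, z in zip(quiz1, zquiz1):
    print(raw, round(z, 2))
```

After the transformation, the new column has a mean of 0 and a standard deviation of 1.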

Open the students.sav file again and create z-scores for the Quiz1 variable.

ReggieNet: What is the highest z-score for Quiz1 in the dataset?

Hint: Rather than wasting time looking at each and every score, have SPSS create a frequency table (Analyze > Descriptive Statistics > Frequencies) with the new variable Zquiz1. You'll find Zquiz1 at the end of the variable list. Its label in the list will be Zscore(quiz1). Round your answer to 2 decimal places.

## The Normal Distribution

One of the most commonly occurring distributions in nature is the normal distribution.

Let’s examine the normal distribution and see how we work with probabilities to find the area under the curve for different ranges of scores. If a distribution is normally distributed then it is symmetrical and unimodal. A graph of a normal distribution is shown below.

A few things to note about normal distributions:

• Not all unimodal, symmetrical curves are normal, but a lot are.
• For this class, we will not worry about how close a distribution is to normal. In fact, for most of the course we'll assume that the distribution is normal.
• The equation that defines a smooth curve like that above is referred to as a probability density function. In the case of the normal curve with a mean μ and a standard deviation σ, the probability density function is: $f(X)=\dfrac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\dfrac{X-\mu}{\sigma}\right)^2}$
• The total area under the normal curve (or any other probability density curve) must equal 1. Why? Remember that the area under the curve represents probabilities (or proportions), and the total probability must equal 1.
• The normal distribution is often transformed into z-scores.
• In the image below, you can see the proportions between each standard deviation interval.
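For the curious, the probability density function from the list above can be typed into Python directly. This is only a sketch: the function names are my own, the IQ values (μ = 100, σ = 15) are used as an example, and the area is approximated by adding up many thin rectangles between 8 standard deviations below and above the mean.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Height of the normal curve at x, written out from the formula."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z ** 2) / (sigma * math.sqrt(2 * math.pi))

def area_under_curve(mu, sigma, steps=20000):
    """Approximate the total area with thin rectangles (midpoint rule)."""
    lo, hi = mu - 8 * sigma, mu + 8 * sigma
    width = (hi - lo) / steps
    heights = (normal_pdf(lo + (i + 0.5) * width, mu, sigma) for i in range(steps))
    return sum(heights) * width

# The hand-written formula agrees with Python's built-in normal distribution
print(normal_pdf(100, 100, 15))
print(NormalDist(100, 15).pdf(100))

# The total area is 1 (to within rounding)
print(round(area_under_curve(100, 15), 6))  # 1.0
```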

In the normal distribution with mean μ and a standard deviation σ:

• 68% of the observations fall within 1σ of the mean.
• 95% of the observations fall within 2σ of the mean.
• 99.7% of the observations fall within 3σ of the mean.

This relationship is sometimes referred to as the “68–95–99.7 rule.”
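The percentages in the rule are rounded. A short Python sketch using the standard library's `NormalDist` shows the more precise values; the helper name `proportion_within` is hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1 (the z-score scale)

def proportion_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(proportion_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```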

## Which percentile is associated with a particular score in the normal distribution?

What is the probability of having an IQ of 85 or less?

For IQ scores, μ = 100, σ = 15,

$z=\dfrac{IQ-\mu}{\sigma}=\dfrac{85-100}{15}=-1$

Thus, an IQ of 85 is 1 standard deviation below the mean.

In the old days, we had to look up the answer in a large and cumbersome statistical table. Fortunately, we can now use Excel (or other spreadsheet programs like Corel Quattro or OpenOffice Calc) to get the answer. We will use Excel's NORM.DIST function.

The NORM.DIST function tells you how much of the normal curve is less than the value you look up.

Open Excel and in cell A1 type =NORM.DIST(85,100,15,TRUE)

Press the Enter key.

Cell A1 should now display a value close to 0.1587. This means that about 15.87% of the population has an IQ of 85 or lower.

In the NORM.DIST function:

• The first value is the one you want to look up (85 in this case).
• The second value is the mean of the population or sample (100, in this case).
• The third is the standard deviation of the population or sample (15, in this case).
• The fourth value (TRUE) tells the function to calculate the cumulative proportion rather than the density (i.e., the height or frequency of the normal curve).
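If you prefer Python to Excel, the standard library's `NormalDist` class can do the same lookup. This is offered as an optional sketch, not as part of the lab: `cdf` plays the role of NORM.DIST with TRUE, and `pdf` plays the role of NORM.DIST with FALSE.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)

# Cumulative proportion at or below 85, like =NORM.DIST(85,100,15,TRUE)
p_85_or_less = iq.cdf(85)
print(round(p_85_or_less, 4))  # 0.1587

# Height of the curve at 85, like =NORM.DIST(85,100,15,FALSE)
density_at_85 = iq.pdf(85)
print(density_at_85)
```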

## Which score in a normal distribution is associated with a particular percentile?

Suppose we know the percentile and wish to find the score associated with it.

For example, what IQ do you need to be in the 75th percentile? To find a score associated with a percentage, use the NORM.INV function.

In cell A2 type =NORM.INV(.75,100,15)

Cell A2 should now display 110.12 or something close (I rounded). This means that you need an IQ of about 110 to be at the 75th percentile.
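The Python counterpart of NORM.INV is the `inv_cdf` method of `NormalDist`. As before, this is an optional sketch for readers who want to check the Excel result another way.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)

# Score at the 75th percentile, like =NORM.INV(.75,100,15)
score_75th = iq.inv_cdf(0.75)
print(round(score_75th, 2))  # 110.12

# Going back the other way recovers the proportion
print(round(iq.cdf(score_75th), 2))  # 0.75
```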

ReggieNet: If IQ is normally distributed, has a mean of 100, and a standard deviation of 15, what proportion of the population has IQ score of 78 or less?

Hint: This is not a percentage question, so don't multiply by 100. Proportion has a range from 0 to 1. Round your answer to 2 decimal points.

ReggieNet: On a test with a mean of 34 and a standard deviation of 3, which score falls at the 94th percentile?

Hint: Since this is a percentile question, divide 94 by 100 to convert it to a proportion first. Round your answer to 2 decimal points.

## The Area Under the Normal Curve Spreadsheet

1. Select Score to Proportion if you know the score(s) and wish to calculate proportions or probabilities. Select Proportion to Score if you wish to know a raw score when you already know the probability or proportion.
2. Select Less Than, More Than, Between, or Exclude Between, depending on what you wish to do.
3. Enter the mean in the dark box at the top left.
4. Enter the standard deviation in the dark box at the top right.
5. Enter the raw score(s) or proportion(s) that are known in the dark boxes below the mean and standard deviation boxes. Remember that proportions MUST range from 0 to 1. Any value outside this range will result in an error.

Here is a silent demonstration of how to use the file:

### Example:

Suppose you wish to know what proportion of scores are less than 5 when μ = 10 and σ = 3.

1. You know the score (i.e., 5) and you want to know a proportion so you select Score to Proportion.
2. You want to know what proportion of the scores are less than 5, so select Less Than.
3. Enter 10 as the mean.
4. Enter 3 as the standard deviation.
5. Enter 5 as the raw score.

You should now see the answer (0.05) in the Proportion Under Curve box.
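The Less Than and More Than options can be mimicked in a few lines of Python. This is only a sketch of the idea and makes no claim about how the spreadsheet file itself is built: Less Than is the cumulative proportion, and More Than is whatever is left over.

```python
from statistics import NormalDist

dist = NormalDist(mu=10, sigma=3)

# "Less Than": proportion of scores below 5
less_than_5 = dist.cdf(5)

# "More Than": the rest of the area under the curve
more_than_5 = 1 - dist.cdf(5)

print(round(less_than_5, 2))  # 0.05
print(round(more_than_5, 2))  # 0.95
```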

ReggieNet: What proportion of scores are less than 50 when μ = 50 and σ = 10?

ReggieNet: What proportion of scores are greater than 64 when μ = 50 and σ = 10?

Hint: Select More than instead of Less than in the listbox. Round to 2 decimals.

ReggieNet: Approximately which score is in the top 25% of scores (i.e., higher than 75% of scores) in a distribution in which μ = 100 and σ = 10?

Hint: Select Proportion to Score. Select More Than and enter 0.75 in the cumulative proportion box. The answer will appear in the Raw Score box. Round answer to 2 decimals.

Sometimes we need to find the probability that X will fall between two scores rather than simply above a score or below a score.

The spreadsheet tool looks up the cumulative proportions for both z-scores and computes the difference between them.

#### Example:

What proportion of the population scores between 22 and 28 on the ACT?

Assume that for the ACT: μ = 21, σ = 5.

Before computers did this task for us, we had to calculate the z-scores, look up the cumulative proportions associated with those z-scores in a table, and then subtract the smaller proportion from the larger one. The process used to look like this:

The z-score for 22 is $\dfrac{22-21}{5}=0.20$.

The z-score for 28 is $\dfrac{28-21}{5}=1.40$.

The cumulative proportion associated with a z-score of 0.20 had to be looked up in a standard normal (z) table. The value is 0.5793.

The cumulative proportion associated with a z-score of 1.40 is 0.9192.

The difference between these proportions is 0.9192 − 0.5793 = 0.3399, or about 0.34.

To do this with the NORM.DIST function in Excel is fairly simple:

=NORM.DIST(28,21,5,TRUE)-NORM.DIST(22,21,5,TRUE)
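The same subtraction can be sketched in Python with `NormalDist` (an optional alternative to Excel, using the ACT values assumed above):

```python
from statistics import NormalDist

act = NormalDist(mu=21, sigma=5)

# Proportion below 28 minus proportion below 22
between_22_and_28 = act.cdf(28) - act.cdf(22)
print(round(between_22_and_28, 2))  # 0.34
```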

However, even this approach is sometimes hard to remember. It is easier to use the spreadsheet tool:

1. Make sure that the Score to Proportion option is selected.
2. Make sure that the Between option is selected.
3. Enter 21 as the mean in the dark box at the top.
4. Enter 5 as the standard deviation.
5. Enter 22 and 28 in the Raw Scores boxes.

The box labeled Proportion Under Curve should now say 0.34, which is the same answer obtained above.

ReggieNet: If μ = 25 and σ = 10, what proportion is between 34 and 41?

Hint: This is not a percentage question, so don't multiply by 100. Proportion has a range from 0 to 1. Round your answer to 2 decimal points.

And finally, you might want to know what proportion lies outside two points (essentially the opposite of the last situation). If you did not have the spreadsheet tool, you would solve the problem just like you did with the between type of question but then you would subtract your answer from 1. Thus, in the previous example, 34% of the ACT scores were between 22 and 28. This means that 100% − 34% = 66% of the scores fell outside this range.

#### Example:

What proportion of the population scores lower than 300 or higher than 650 on the Verbal SAT?

Assume: μ = 500, σ = 100

First we find the proportion of the distribution less than 300 and also less than 650.

$P\left(z \le \dfrac{650-500}{100}\right)= P(z \le 1.5) = 0.9332$

$P\left(z \le \dfrac{300-500}{100}\right)= P(z \le -2) = 0.0228$

The difference between these numbers is 0.9104. This is the area between the scores. Subtracting 0.9104 from 1 gives us the area outside the scores.

Thus, 1 − 0.9104 = 0.0896.

Rounding gives us 0.09.

In Excel: =1-(NORM.DIST(650,500,100,TRUE)-NORM.DIST(300,500,100,TRUE))
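Here is the same calculation as a Python sketch. It also shows a second route to the answer: adding the lower tail (below 300) to the upper tail (above 650) gives the same area as subtracting the middle from 1.

```python
from statistics import NormalDist

sat = NormalDist(mu=500, sigma=100)

# Route 1: one minus the area between the two scores
between = sat.cdf(650) - sat.cdf(300)
outside = 1 - between

# Route 2: lower tail plus upper tail
tails = sat.cdf(300) + (1 - sat.cdf(650))

print(round(between, 4))  # 0.9104
print(round(outside, 4))  # 0.0896
print(round(tails, 4))    # 0.0896
```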

To answer the problem with the spreadsheet tool, select Score to Proportion, select Exclude Between, enter 500 and 100 as the mean and standard deviation, and enter 300 and 650 as the two raw scores.

ReggieNet: If μ = 47 and σ = 6, what proportion is outside of 40 and 44?

# Comparing Distributions

Now let's think about why making the transformation into z-scores is important. Suppose that we want to compare two scores from two distributions. If these distributions each have a different mean and standard deviation, then this task can be difficult. However, if we transform each distribution into z-scores (standardize the distribution, like SPSS did in the earlier section), then we can compare the distributions more easily.

Consider the following situation. You take the ACT and the SAT. You get a 26 on the ACT and a 620 on the SAT. The college that you apply to only needs one score. Which do you want to send them? That is, which score is better, 26 or 620?

It is hard to do a direct comparison here because the two distributions have different properties: different means, and different variabilities.

How might we go about it?

1. Look at the distribution graphs, locate the scores, and compare. However, it can be hard to tell which score is better when the two are close.
2. Compute and compare the z-scores:
• ACT: $$z=\dfrac{26-18}{6}=1.\bar{3}$$
• SAT: $$z=\dfrac{620-500}{100}=1.2$$
Because its z-score is larger, the ACT score is better than the SAT score.
3. Compute and compare the cumulative proportions in Excel using the NORM.DIST function.
• ACT: =NORM.DIST(26,18,6,TRUE): 0.9088
• SAT: =NORM.DIST(620,500,100,TRUE): 0.8849
Because this cumulative proportion is higher, the ACT score is better than the SAT score.

Methods 2 and 3 are both valid and will yield the same answer every time.
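The two methods agree because a larger z-score always corresponds to a larger cumulative proportion. The following Python sketch runs both methods on the example above; the variable names are my own, and the means and standard deviations are the ones assumed in this section.

```python
from statistics import NormalDist

act = NormalDist(mu=18, sigma=6)
sat = NormalDist(mu=500, sigma=100)

# Method 2: compare z-scores
z_act = (26 - act.mean) / act.stdev
z_sat = (620 - sat.mean) / sat.stdev
print(round(z_act, 2), round(z_sat, 2))  # 1.33 1.2

# Method 3: compare cumulative proportions
p_act = act.cdf(26)
p_sat = sat.cdf(620)
print(round(p_act, 4), round(p_sat, 4))  # 0.9088 0.8849

# Both methods pick the same winner
print((z_act > z_sat) == (p_act > p_sat))  # True
```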

ReggieNet: Suppose that you got a 540 on the SAT (μ = 500, σ = 100) and a 20 on the ACT (μ = 18, σ = 6). Which score is better?

ReggieNet: Suppose instead that you got a 600 on the SAT (μ = 500, σ = 100) and a 24 on the ACT (μ = 18, σ = 6). Now which score is better?

# Beyond Psy 138: The Normal Curve and the Central Limit Theorem (Sum of the many reasons variables are normally normal)

If you are interested, you can check out an introduction to a different way of thinking about the normal curve. The material in this video will not be on any exams in this course.

# Beyond Psy 138: Skewness

Normal curves are symmetrical. However, many variables are lopsided, or skewed. Here is an introduction to skewness. The material in this video will not be on any exams in this course.

# Beyond Psy 138: Kurtosis

Typically, the first thing we want to know about a distribution is its central tendency. Second, we want to know about its variability. Then we want to know about its skewness. What is left to know after that? Kurtosis.

# Beyond Psy 138: Standard Scores and Why We Need Them

Although z-scores are very useful, there are good reasons to transform them into other types of standard scores.