Frequency Distribution Tables
Note:
In the past, many students have found it difficult to complete this lab
in the allotted 50 minutes. You have until Friday at
11:55pm to submit it. Most computers on campus have SPSS installed on
them.
The pictures in this lab come from different computers (Mac &
Windows) running slightly different versions of SPSS. Most things are
the same but there may be a few
cosmetic differences.
Also, a word of advice: save all of the assignments that you email to the GAs, so that if there is any question about whether you completed the work, you can show us the files.
For this lab we'll use a new data file that includes hypothetical course grade information. Click on students.sav and then select "save". You may need to name the file "students.sav" so that SPSS will recognize it (SPSS data files must have the .sav extension). Save the file somewhere you can find it again later.
After saving the file you should open it up in SPSS. Open SPSS and then
open the data file students.sav from where you saved it.
In this file there are a number of variables. For now
we'll just look at quiz1 and quiz2. Your task for this part of the lab
is to create a frequency distribution table for each of these variables
and to compare them to get a feel for some of the features of
distributions.
Go to SPSS (which should already be open if you
followed the
instructions above). The students.sav file should be open already.
We'll start by looking at quiz1. Think about each of
the following questions.
What's the variable of interest?
Scores on quiz 1, which correspond to the data in the column labeled quiz1.
What kind of variable is quiz1?
The quiz scores are numerical, so the variable is quantitative.
What is the most typical score on quiz 1?
What is the range of scores on quiz 1?
Did some people do a lot worse or better than the rest of the class?
Overall, was the quiz easy or hard?
It is hard to answer these last 4 questions just by looking at the numbers as they are. Instead, we can use some statistical procedures to organize the data and make it easier to understand.
The first thing that we can do is
to "sort" the datafile by quiz1.
By sorting the file you can begin to see the pattern of the
distribution. For example, now it is easy to see what the lowest and
highest scores are (now at the top and bottom of the column). However,
usually just sorting the variable isn't enough.
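If it helps to see the idea outside SPSS, here is a minimal Python sketch of sorting. The `scores` list is made up for illustration (it is not the students.sav data); after sorting, the lowest and highest scores end up at the two ends of the list, just like the top and bottom of a sorted SPSS column.

```python
# Made-up quiz scores for illustration (not the students.sav data)
scores = [7, 10, 3, 8, 10, 6, 0, 9, 8, 5]

sorted_scores = sorted(scores)  # ascending order: smallest score first
lowest = sorted_scores[0]       # first entry after sorting
highest = sorted_scores[-1]     # last entry after sorting

print(sorted_scores)
print("lowest:", lowest, "highest:", highest)
```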
Another statistical tool to help "see" the distribution is to make a frequency
distribution table.
A frequency distribution is an
organized tabulation of the number of individuals located in each
category on the scale of measurement.
STEP 1: What is the range of responses
(highest and lowest numbers)? The X column has been filled in for you
based on the range of responses.
X  | f | p | cf | cp
---+---+---+----+---
0  |   |   |    |
1  |   |   |    |
2  |   |   |    |
3  |   |   |    |
4  |   |   |    |
5  |   |   |    |
6  |   |   |    |
7  |   |   |    |
8  |   |   |    |
9  |   |   |    |
10 |   |   |    |
Scores on quiz1 range from 0 to 10 so we list these values in the X
column starting with the lowest value and listing each value down to
the highest.
STEP 2: Frequencies (f): How many of each did we get?
Fill in the f column. This is the frequency of occurrence. For each X value, count how many times that score appears in the quiz1 column of the students.sav file, and write the count in the f column next to it.
This tells you how many of each response we got. Note
that there may be 0's in the f column if no one got that particular
score.
Notice that if you add up the frequency column, you
get the total number of observations.
$\sum f = N$
If you wanted to know the total of all of the X's, how would you do it? The easiest way is to multiply each X value by its frequency f and then add (sum) the products:
$\sum Xf$
Calculate the sum of all the scores
using this formula.
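Here is a minimal Python sketch of Step 2 and the two sums above, assuming a small made-up list of quiz scores (not the students.sav data). `Counter` from Python's standard library does the counting, and scores nobody got show up as 0 in the f column.

```python
from collections import Counter

# Made-up quiz scores for illustration (not the students.sav data)
scores = [0, 3, 5, 5, 6, 7, 7, 7, 8, 8, 8, 8, 9, 9, 10, 10, 10, 10, 10, 10]

counts = Counter(scores)        # maps each score to how many times it occurs
xs = list(range(0, 11))         # X column: every possible score, lowest to highest
f = [counts[x] for x in xs]     # f column; a Counter returns 0 for scores nobody got

N = sum(f)                                     # sum of f = N
sum_xf = sum(x * fx for x, fx in zip(xs, f))   # sum of X*f = total of all scores

for x, fx in zip(xs, f):
    print(x, fx)
print("N =", N, "| sum of Xf =", sum_xf)
```

Because every score is counted exactly once, the sum of Xf always equals adding up the raw scores directly.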
Now let's work on the other columns in the table.
STEP 3: Percentages (p): How much of the total group got this value for X? How do you get this information?
p = 100* f / N
Recall that N = the total number of observations.
Fill in the p column for each X value by converting the frequency you listed in the f column into a percentage of all scores.
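This minimal Python sketch turns a frequency column into the p column with p = 100 * f / N. The f values are made up for illustration (N = 20), not taken from students.sav.

```python
# Made-up f column for X = 0..10 (N = 20), for illustration only
xs = list(range(0, 11))
f = [1, 0, 0, 1, 0, 2, 1, 3, 4, 2, 6]

N = sum(f)                                   # total number of observations
p = [round(100 * fx / N, 1) for fx in f]     # p = 100 * f / N, one decimal place

for x, px in zip(xs, p):
    print(x, px)
```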
STEP 4: Cumulative Frequencies (cf)
The first entry in the cf column is the same as the first entry in the f column. The second cf entry is the entry above it (the first cf entry) plus the second f entry. The third cf entry is the entry above it (the second cf entry) plus the third f entry. Keep repeating this pattern until you reach the last row. The last cf entry should equal N.
Fill in the cf column in the table by adding each cf to the next f value, starting with the first row.
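The running-total rule for cf is exactly what Python's `itertools.accumulate` computes. A minimal sketch with a made-up f column (not the students.sav data):

```python
from itertools import accumulate

# Made-up f column for X = 0..10 (N = 20), for illustration only
xs = list(range(0, 11))
f = [1, 0, 0, 1, 0, 2, 1, 3, 4, 2, 6]

# Running total: the first cf equals the first f, and each later cf is
# the cf above it plus the f in its own row
cf = list(accumulate(f))

for x, c in zip(xs, cf):
    print(x, c)
```

As the step says, the last cf equals N.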
STEP 5: Cumulative Percentages (cp)
The cp column is the cumulative percentage. You do the same thing as for the cumulative frequency, but add up the percentage column (p) instead. The cumulative percentages tell us something about percentiles. Think back to getting your ACT scores. You may remember something like "your score is in the 76th percentile." This means that 76% of the people who took the test scored at or below your score.
Notice that the final cp (at the bottom of the table) should always equal 100, because 100% of the people scored at or below the maximum score.
Fill in the cp column in the table by adding each cp to the next p value, starting with the first row.
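A similar Python sketch for cp, again with a made-up f column. It adds up the p column as described above, so the last cp comes out to 100.

```python
from itertools import accumulate

# Made-up f column for X = 0..10 (N = 20), for illustration only
xs = list(range(0, 11))
f = [1, 0, 0, 1, 0, 2, 1, 3, 4, 2, 6]
N = sum(f)

p = [100 * fx / N for fx in f]   # p column, unrounded
cp = list(accumulate(p))         # each cp = the cp above it + this row's p

for x, c in zip(xs, cp):
    print(x, round(c, 1))
```

If you round each p before adding them up, the last cp can drift to 99.9 or 100.1; computing each cp as 100 * cf / N from the cumulative frequencies avoids that.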
From a frequency distribution table you can "see" the
distribution more easily. At a glance you can see what the highest and
lowest scores are, whether some scores are "outside" of the rest (that
is, did a few people really bomb the test or did a few ace it), what
the
most common score was, where most of the scores were, etc.
Answer the
following questions in Blackboard Lab 4:
Look at your finished frequency
distribution table and answer the
following questions:
(1) What percentage of the scores is at or below a score of 7?
(2) Which was the most frequent score in Quiz 1?
(3) What is the cumulative frequency for a score of 4?
(4) What
percent of people scored a 5?
Making Frequency Distribution Tables with SPSS (Note
that the pictures below may look a little different in version 16)
SPSS will also create this table for you. Go to the
"Analyze" menu, select "Descriptive statistics", and within that sub
menu select "Frequencies".

SPSS will then ask which variable you want the table for.

For quiz 1 the frequency table output should look
something like this:

Create a frequency distribution table using SPSS for quiz2. Select the table and copy it (the shortcut for Copy is Control-c; you can also click Edit->Copy). Paste the table into your Word document (the shortcut for Paste is Control-v).
Grouped Frequency Distribution Tables
Use SPSS to
create a grouped frequency table:
First we will
recode the Quiz 1 scores into grouped values.
Step 1: Click Transform at the top of your screen and click Recode into Different Variables.

Step 2: Select Quiz 1 and click the arrow to move the quiz1 variable into the white box.

Step 3: Click “Old and New Values.”

Step 4: Select “Range, LOWEST through value” on the left and enter “2.” Then, in the upper right corner of the box, enter “2” where it says “Value.” Next, click “Add.”

Step 5: Enter the rest of the grouped values by first selecting “Range” on the left and entering 2 through 4 as the range. Enter “4” where it says “Value” on the upper right and then click “Add.”

Recode the ranges 4-6, 6-8, and 8-10 into the values 6, 8, and 10, respectively. Then click “Continue.”

Step 6: Enter the new variable name “quiz1grouped” in the “Name” box. Then click “Change.”

Click OK.

Step 7: Click “Analyze” at the top of the screen and go to “Descriptive Statistics.” Click “Frequencies.”

Step 8: Select “quiz1grouped” at the bottom of the variable list and move it into the white box with the blue arrow. Click “OK.”


You should see a table that
looks like this:
quiz1grouped
              | Frequency | Percent | Valid Percent | Cumulative Percent
--------------+-----------+---------+---------------+-------------------
Valid  2.00   | 3         | 2.9     | 2.9           | 2.9
       4.00   | 12        | 11.4    | 11.4          | 14.3
       6.00   | 18        | 17.1    | 17.1          | 31.4
       8.00   | 30        | 28.6    | 28.6          | 60.0
       10.00  | 42        | 40.0    | 40.0          | 100.0
       Total  | 105       | 100.0   | 100.0         |
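The recode in Steps 4 and 5 can be sketched in Python. The per-score counts below are read off the quiz1 stem-and-leaf display later in this lab. The sketch assumes that a score sitting on a shared boundary (for example, exactly 4) goes to the first range that contains it; that assumption is consistent with the SPSS table above, where the 2.00 group holds the 3 students who scored 0, 1, or 2. `RANGES` and `recode` are hypothetical names chosen for this example.

```python
from collections import Counter

# Students at each quiz1 score, read off the quiz1 stem-and-leaf display
# in this lab (score -> number of students)
quiz1_counts = {0: 2, 1: 0, 2: 1, 3: 6, 4: 6, 5: 6,
                6: 12, 7: 14, 8: 16, 9: 8, 10: 34}

# (low, high, new value) in the order the ranges were added in the dialog.
# Assumption: the first range containing a score wins, so 2 -> 2 and 4 -> 4.
RANGES = [(float("-inf"), 2, 2), (2, 4, 4), (4, 6, 6), (6, 8, 8), (8, 10, 10)]

def recode(score):
    """Return the grouped value for one quiz1 score (None if no range matches)."""
    for low, high, new_value in RANGES:
        if low <= score <= high:
            return new_value
    return None

grouped = Counter()
for score, count in quiz1_counts.items():
    grouped[recode(score)] += count

N = sum(grouped.values())
cumulative = 0
for value in sorted(grouped):
    cumulative += grouped[value]
    percent = 100 * grouped[value] / N
    print(f"{value:6.2f} {grouped[value]:4d} {percent:6.1f} {100 * cumulative / N:6.1f}")
```

The printed rows (value, frequency, percent, cumulative percent) line up with the SPSS output.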
In your Word document, finish the table that looks like the one below for the variable "percent", which represents the students' final course grades in the students.sav file. Use the "Recode" function described above.
X      | f | p | cf | cp
-------+---+---+----+---
0-49   |   |   |    |
50-59  |   |   |    |
60-69  |   |   |    |
70-79  |   |   |    |
80-89  |   |   |    |
90-100 |   |   |    |
Answer in Blackboard
(5) What percent of
people earned an A
(90-100)?
(6) What percent of people earned less than 80 percent?
Graphing Frequency Distributions
In the previous sections we saw that one way to summarize and simplify an entire distribution of scores is to organize the scores in a frequency distribution table. In this section we will learn about several other ways to represent distributions, focusing primarily on graphic displays: bar charts, histograms, and stem-and-leaf plots.
The charts in this lab will mostly be found under Graphs-->Legacy Dialogs in SPSS 15, or just under the Graphs menu in SPSS 14. The Graphs-->Chart Builder menu offers a newer, intuitive method of making charts that you can also try. You might also try the Interactive Charts under the Graphs menu. The figures come out slightly different, but some people find them more intuitive than the other options.
Bar graphs
To display the distribution of a categorical variable, use a bar graph (pie charts are also used, but we won't discuss them in this lab). SPSS can make several kinds of bar charts (simple, clustered, and stacked); we'll focus on simple and clustered.
| Bar chart: (simple, clustered, and
stacked): These are used most often to display the distribution of
subjects or cases in certain categories, such as the number of A, B, C,
D, and F grades in a given class. |
Let's start by looking at the distribution of ethnicity in our students.sav data file. Our graph will show the count (or frequency) for each ethnic category.
Step 1: First select bar graph from the menu.
Step 2: Then select "simple" from the bar chart box.

Step 3: Then click define.
Step 4: Then select your variable and insert it into the category field.

You should get a bar chart that looks something like this.
Bar charts are also useful for presenting distributions that are "broken down" into different categories. For example, suppose that we wanted to know the mean score on quiz 1 (the mean is basically the arithmetic average; we'll talk more about means next week) for each of the three different sections. Here the quiz 1 scores are the response variable, and section is our categorical explanatory variable.
Step 1: We'd select bar graph, select simple,
but then we need to make some different selections in the bar graph
window.
Step 2: We need to click on "other summary
function", and then select the variable for which we want to plot the means
(quiz 1 in this case). The default summary function is mean. Then we
need to put the category variable in the category field.

We should end up with a graph that looks like this.

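Behind this bar chart is one mean per section. A minimal Python sketch of that computation, using made-up (section, quiz1) pairs rather than the students.sav data:

```python
from collections import defaultdict

# Made-up (section, quiz1) pairs for illustration (not the students.sav data)
records = [(1, 8), (1, 10), (1, 6), (2, 7), (2, 8), (3, 10), (3, 4), (3, 7)]

totals = defaultdict(float)
counts = defaultdict(int)
for section, score in records:
    totals[section] += score
    counts[section] += 1

# One bar per section; each bar's height is that section's mean score
means = {s: totals[s] / counts[s] for s in sorted(totals)}
for section, mean in means.items():
    print(f"section {section}: mean quiz1 = {mean:.2f}")
```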
Suppose that we want to look at the same means by
section but broken down by ethnicity. To do this we must use a clustered
bar graph.
So select bar graph, then choose clustered. Now enter things as we did in the example above, except that we must also select ethnicity for the 'cluster bars by' field.

We should end up with a graph that looks like this.
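A clustered bar chart just adds a second grouping variable: one mean per section-and-ethnicity combination. In this Python sketch the records are made up, and the ethnicity labels "A" and "B" are placeholders, not the categories coded in students.sav.

```python
from collections import defaultdict

# Made-up (section, ethnicity, quiz1) records; "A" and "B" are placeholder
# labels, not the ethnicity categories coded in students.sav
records = [(1, "A", 8), (1, "B", 10), (1, "A", 6),
           (2, "A", 7), (2, "B", 8), (2, "B", 9)]

sums = defaultdict(float)
ns = defaultdict(int)
for section, ethnicity, score in records:
    sums[(section, ethnicity)] += score   # one cell per section x ethnicity pair
    ns[(section, ethnicity)] += 1

cell_means = {key: sums[key] / ns[key] for key in sorted(sums)}

# Each section is one cluster; within it there is one bar per ethnicity
for section in sorted({s for s, _ in cell_means}):
    bars = {e: m for (s, e), m in cell_means.items() if s == section}
    print(f"section {section}: {bars}")
```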
Make a bar graph of the counts of
the final grades (called "grades" in the file) in the class (i.e. A, B,
C,...) and paste it into the Word document.
Make a bar graph of the counts of the final grades in the class (i.e. A, B, C,...), further broken down by whether or not the students attended the review session, and paste it into the Word document.
Answer in Blackboard
(7) Based on the graph, would you
conclude that
attending the review session had an impact on final grades?
Histograms
Suppose that we wish to know how the students did on quiz 1. We could try looking at all of the scores, but that's a lot of numbers. Instead, it is better to look at the entire distribution rather than all of the individual scores. In the last lab we did this by creating frequency distribution tables. Another way to do it is to construct a histogram to represent the entire distribution. We should use a histogram because our variable (score on quiz 1) is a quantitative (numerical) variable rather than a categorical one.
| Histogram: A histogram is a pictorial representation of the distribution of values for a particular variable. The bars represent the number of occurrences of each value. Histograms look similar to bar graphs, except that they are used more often to indicate the number of subjects or cases in ranges of values for a continuous variable. |
Using SPSS to create a histogram:
Creating a histogram of the students' scores on quiz1.
Step 1. At the top of the data window is a row of menus. To make graphs we will use the 'graphs' menu. Click on the graphs menu.
Step 2. Under this menu a large number of graphing options will appear. In the bottom third of the list is 'histogram'. This is the option that we'll use to look at distributions (for this lab at least).
Step 3. Select histogram. Now you'll get a window that looks like this:

Step 4. Select 'quiz 1' as your variable and then click OK.
This should result in a new window (the output
window) opening up, and it should have your histogram in it.
The histogram of quiz 1 is basically just a picture of the frequency distribution table. Below are a frequency distribution table and a histogram for quiz 1.

For quiz 1 the frequency table output should look something like this:

In this case the histogram is a little different from what you might expect after comparing it to the frequency distribution table above. Why? Because the histogram above is based on a grouped frequency distribution table of quiz 1 (see the previous lab for discussion). Go ahead and group scores 10 & 9, 8 & 7, 6 & 5, etc., and see whether the histogram now looks as you'd expect.
An important lesson from this is that the size of
the interval that you plot may influence the overall shape of the
histogram.
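To see this effect without SPSS, here is a small text-only Python sketch that counts the same made-up scores two ways: one bar per score, and then in pairs (10 & 9, 8 & 7, and so on). The scores are hypothetical, and `interval_counts` and `draw` are illustrative helper names for this example, not SPSS features.

```python
# Made-up quiz scores for illustration (not the students.sav data)
scores = [0, 3, 5, 5, 6, 7, 7, 7, 8, 8, 8, 8, 9, 9, 10, 10, 10, 10, 10, 10]

def interval_counts(values, intervals):
    """Count how many values fall in each inclusive (low, high) interval."""
    return [sum(1 for v in values if low <= v <= high) for low, high in intervals]

def draw(intervals, counts):
    """Print one row of '*' per interval, like a histogram turned on its side."""
    for (low, high), n in zip(intervals, counts):
        label = str(low) if low == high else f"{low}-{high}"
        print(f"{label:>5} | {'*' * n}")

single = [(x, x) for x in range(0, 11)]                      # one bar per score
pairs = [(0, 0), (1, 2), (3, 4), (5, 6), (7, 8), (9, 10)]    # 10 & 9, 8 & 7, ...

draw(single, interval_counts(scores, single))
print()
draw(pairs, interval_counts(scores, pairs))
```

With one bar per score, these made-up data show a dip at 9; grouped in pairs, the dip disappears and the bars climb toward the 9-10 interval.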
Using SPSS, create a histogram for quiz 5 and paste it into the Word document.
Stem and leaf plots
There is also another kind of plot, other than a histogram, that provides a quick and easy way to "see" the distribution: the stem-and-leaf plot (or display).
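| Stem and leaf displays - These displays break each number down into a left part called the stem and a right part called the leaf. If numbers are two digits, then the left digit is the stem and the right digit is the leaf. For larger numbers, such as the three-digit weights below, the last digit is the leaf and the remaining digits form the stem. |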
For example: weight of students in a Psych 820 course

Data Set:
90, 110, 112, 118, 120,
130, 130, 140, 140, 140,
145, 145, 145, 150, 175,
176, 185, 205, 205, 220,
225, 240

 9 | 0
10 |
11 | 028
12 | 0
13 | 00
14 | 000555
15 | 0
16 |
17 | 56
18 | 5
19 |
20 | 55
21 |
22 | 05
23 |
24 | 0

| Using this procedure we get a picture that looks like a histogram rotated 90 degrees; however, unlike with a histogram, we can recover all of the individual data points. |
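The procedure is easy to automate. This Python sketch builds the display for the weight data set above, assuming non-negative whole numbers: `divmod(value, 10)` splits each number into a stem (all but the last digit) and a leaf (the last digit). Empty stems are kept so that gaps in the data stay visible. Because 145 appears three times in the data set, stem 14 gets the leaves 000555. `stem_and_leaf` is a hypothetical helper name for this example.

```python
from collections import defaultdict

# Weights from the Psych 820 example data set
weights = [90, 110, 112, 118, 120, 130, 130, 140, 140, 140,
           145, 145, 145, 150, 175, 176, 185, 205, 205, 220,
           225, 240]

def stem_and_leaf(values):
    """Return display rows: the last digit is the leaf, the rest is the stem."""
    leaves = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        leaves[stem].append(str(leaf))
    stems = range(min(leaves), max(leaves) + 1)   # keep empty stems so gaps show
    return [f"{s:>3} | {''.join(leaves[s])}" for s in stems]

for row in stem_and_leaf(weights):
    print(row)
```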
Let's look at our quiz1 data. Since the values for the variable range from 0 to 10, the stems are the integers and the leaves are the digits after the decimal point (since there was no partial credit on the quiz, all of these leaves are zeros).
 0 | 00
 1 |
 2 | 0
 3 | 000000
 4 | 000000
 5 | 000000
 6 | 000000000000
 7 | 00000000000000
 8 | 0000000000000000
 9 | 00000000
10 | 0000000000000000000000000000000000
SPSS will also create a stem and leaf plot for you.
Step 1: Go to the "Analyze" menu and select "Descriptive statistics."
Step 2: Within that submenu, select "Explore."

SPSS will then ask which variable you want the plot for.

Then you'll need to click the plot button and make sure that stem and leaf is checked.
For quiz 1 the stem-and-leaf output should look something like this:

Using SPSS create a stem and
leaf plot for quiz 2 and paste it into the
Word document.
Submit your Blackboard answers. If
you
get any questions wrong, you have 1 more chance to correct them.
Save your Word
document and email it
as an attachment to your GA.