Lab 4

Frequency distributions

Download Lab 4 Worksheet


Frequency Distribution Tables


Note: In the past, many students have found it difficult to complete this lab in the allotted 50 minutes. You have until Friday at 11:55pm to submit it. Most computers on campus have SPSS installed on them.

The pictures in this lab come from different computers (Mac & Windows) running slightly different versions of SPSS. Most things are the same but there may be a few cosmetic differences.

Also, a word of advice. You should save all of the assignments that you email to the GAs so that if there is any question about whether you completed the work, you can show us the files.


For this lab we'll use a new data file that includes hypothetical course grade information.
Click on students.sav and then select "save". You may need to name the file as "students.sav" so that SPSS will recognize it (all SPSS data files must have a .sav ending in the name). Put the file someplace to save it for future use. After saving the file you should open it up in SPSS. Open SPSS and then open the data file students.sav from where you saved it.

In this file there are a number of variables. For now we'll just look at quiz1 and quiz2. Your task for this part of the lab is to create a frequency distribution table for each of these variables and to compare them to get a feel for some of the features of distributions.

    Go to SPSS (which should already be open if you followed the instructions above). The students.sav file should be open already.

    We'll start by looking at quiz1. Think about each of the following questions.

      What's the variable of interest?
        scores on quiz 1 - which corresponds to the data in the column labeled quiz 1
      What kind of variable is quiz1?
        The quiz scores are numerical, so the variable is quantitative.
      What is the most typical score on quiz 1?
      What is the range of scores on quiz 1?
      Did some people do a lot worse or better than the rest of the class?
      Overall, was the quiz easy or hard?

    It is hard to answer these last 4 questions just by looking at the numbers as they are. Instead we can use some statistical procedures to organize the data and make it easier to understand.

    The first thing that we can do is to "sort" the datafile by quiz1.

      Step 1: To do this select "sort cases" under the "data" menu.


      Step 2: Then select "quiz1" for the sort variable field.

    By sorting the file you can begin to see the pattern of the distribution. For example, now it is easy to see what the lowest and highest scores are (now at the top and bottom of the column). However, usually just sorting the variable isn't enough. Another statistical tool to help "see" the distribution is to make a frequency distribution table.

    A frequency distribution is an organized tabulation of the number of individuals located in each category on the scale of measurement.

    STEP 1: What is the range of responses (highest and lowest numbers)? The X column has been filled in for you based on the range of responses. 

    _________________________________
    X f p cf cp
    0
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    ________________________________
    Scores on quiz1 range from 0 to 10 so we list these values in the X column starting with the lowest value and listing each value down to the highest.

    STEP 2 Frequencies (f): How many of each did we get?

    Fill in the f column. This is the frequency of occurrence. For each X value, count how many times that score appears in the quiz1 column of the students.sav file, and write that count next to it in the f column.

    This tells you how many of each response we got. Note that there may be 0's in the f column if no one got that particular score.

    Notice that if you add up the frequency column, you get the total number of observations.
    Σf = N


    If you wanted to know what the total of all of the X's was, how would you do it? The easiest way would be to multiply the (X) & (f) columns and then add (sum) the results.
    Σ(Xf)

    Calculate the sum of all the scores using this formula.
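As a check on the two sums above, here is a minimal Python sketch. The scores are made up for illustration (they are not the quiz1 data), and the variable names are assumptions of this example only.

```python
from collections import Counter

# Hypothetical quiz scores (NOT the students.sav data)
scores = [3, 5, 5, 7, 7, 7, 8, 10]

# f column: how many times each X value occurs (0 if nobody got that score)
f = Counter(scores)
table = [(x, f[x]) for x in range(0, 11)]

# Sum of the f column gives N, the total number of observations
n = sum(freq for _, freq in table)

# Sum of X * f gives the total of all the scores
total = sum(x * freq for x, freq in table)

print("N =", n, "and the list has", len(scores), "scores")
print("Sum of Xf =", total, "and adding the raw scores gives", sum(scores))
```

Both lines should report matching numbers, which is the point of the two formulas: the table loses no information about how many scores there are or what they add up to.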


    Now let's work on the other columns in the table.

    STEP 3: Percentages (p) How much of the total group got this value for X? How do you get this information?
    p = 100* f / N

    Recall that N = the total number of observations.

    Fill in the p column for each X value by calculating the percentage of all scores from the value you listed in the f column.
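The percentage formula can be sketched in Python as follows. The frequencies are hypothetical (chosen so that N = 20), not the quiz1 values.

```python
# Hypothetical f column for X = 0..4 (made-up numbers, N = 20)
f = {0: 1, 1: 3, 2: 4, 3: 10, 4: 2}

n = sum(f.values())  # N = total number of observations

# p = 100 * f / N for each X value
p = {x: 100 * freq / n for x, freq in f.items()}

for x in f:
    print(x, f[x], round(p[x], 1))

# The p column should add up to 100 (allowing for rounding)
print("Sum of p:", round(sum(p.values()), 1))
```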

STEP 4: Cumulative Frequencies (cf)
The first entry in the cf column is the same as the f column. The second cf entry is the score above (the first cf entry) plus the second f entry. The third cf entry is the score above (the second cf entry) plus the third f entry. Keep repeating this pattern until you end. The last cf entry should equal N.

Fill in the cf column in the table by adding each cf to the next f value, starting with the first row.
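The running-total pattern for cf can be sketched with Python's `itertools.accumulate`. The f values below are hypothetical, not the quiz1 frequencies.

```python
from itertools import accumulate

# Hypothetical f column, listed from the lowest X value to the highest
f = [1, 3, 4, 10, 2]

# Each cf entry is the cf entry above it plus the f entry on the same row
cf = list(accumulate(f))

for freq, running in zip(f, cf):
    print(freq, running)

# The last cf entry should equal N
print("Last cf:", cf[-1], "N:", sum(f))
```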

    STEP 5: Cumulative Percentages (cp) The cp column is the cumulative percentage. Basically you do the same thing as for the cumulative frequency, but add up the percentages column (p). The cumulative percentages tell us something about percentiles. Think back to getting your ACT scores. You may remember something like "your score is in the 76th percentile." This means that 76% of the people who took the test got your score or worse.

    Notice that the final cp (on the bottom of the chart) should always equal 100 (because 100% of the people could get the maximum score or worse).

    Fill in the cp column in the table by adding each cp to the next p value, starting with the first row.
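A minimal Python sketch of the cp column, again with hypothetical frequencies. It computes cp two ways (adding up p, and converting cf to a percentage) to show that they agree.

```python
from itertools import accumulate

# Hypothetical f column, lowest X value first (made-up numbers)
f = [1, 3, 4, 10, 2]
n = sum(f)

# p column, then cp as the running total of p
p = [100 * freq / n for freq in f]
cp = list(accumulate(p))

# Equivalent route: cumulative frequency expressed as a percentage of N
cf = list(accumulate(f))
cp_from_cf = [100 * c / n for c in cf]

for row in zip(f, p, cf, cp):
    print(row)
```

Reading a cp entry answers "what percentage of scores is at or below this X value", which is the percentile idea described above.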

    From a frequency distribution table you can "see" the distribution more easily. At a glance you can see what the highest and lowest scores are, whether some scores are "outside" of the rest (that is, did a few people really bomb the test or did a few ace it), what the most common score was, where most of the scores were, etc.

    Answer the following questions in Blackboard Lab 4:

    Look at your finished frequency distribution table and answer the following questions:
    (1) What percentage of the scores is at or below a score of 7?
    (2) Which was the most frequent score in Quiz 1?
    (3) What is the cumulative frequency for a score of 4?
    (4) What percent of people scored a 5?


Making Frequency Distribution Tables with SPSS (Note that the pictures below may look a little different in version 16)


SPSS will also create this table for you. Go to the "Analyze" menu, select "Descriptive statistics", and within that sub menu select "Frequencies".

SPSS will then ask which variable you want the table for.

For quiz 1 the frequency table output should look something like this:

Create a frequency distribution table using SPSS for quiz2. Select the table and copy it (the shortcut for Copy is Control-c; you can also click Edit->Copy). Paste the table into your Word document (the shortcut for Paste is Control-v).


Grouped Frequency Distribution Tables


Use SPSS to create a grouped frequency table:

 

First we will recode the Quiz 1 scores into grouped values.

 

            Step 1: Click Transform at the top of your screen and click Recode into Different Variables

 

 

Step 2: Select Quiz 1 and click the arrow to move the quiz1 variable into the white box.

 

            Step 3:  Click “Old and New Values.”

 

 

            Step 4: Select “Range, LOWEST through value” on the left and enter “2.” Then on the upper right corner of the box enter “2” where it says “Value.” Next, click “Add.”

 

 

 

            Step 5: Enter the rest of the grouped values by first selecting “Range” on the left and enter the range between 2 and 4. Enter “4” where it says “Value” on the upper right and then click “Add.”

 

Recode the ranges for 4-6, 6-8, and 8-10 into the values 6, 8, and 10, respectively. Then click “Continue.”

 

 

 

 

Step 6: Enter the new variable name “quiz1grouped” in the “Name” box. Then click “Change.”

 

            Click OK.

 

 

Step 7: Click “Analyze” at the top of the screen and go to “Descriptive Statistics.” Click “Frequencies.”

 

            Step 8: Select “quiz1grouped” at the bottom of the variable list and move it into the white box with the blue arrow. Click “OK.”

 

 

 

 

 

You should see a table that looks like this:

 

 

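quiz1grouped
______________________________________________________________________
               Frequency   Percent   Valid Percent   Cumulative Percent
Valid   2.00           3       2.9             2.9                  2.9
        4.00          12      11.4            11.4                 14.3
        6.00          18      17.1            17.1                 31.4
        8.00          30      28.6            28.6                 60.0
        10.00         42      40.0            40.0                100.0
        Total        105     100.0           100.0
______________________________________________________________________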
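The recode-then-count procedure above can also be sketched in Python. This is only an illustration, not what SPSS runs internally. The quiz 1 counts are read off the stem and leaf plot of quiz 1 shown later in this lab, and the sketch assumes that a score on a boundary (for example 4) goes into the lower group, which is what the table above implies.

```python
# Quiz 1 counts (score -> frequency), read off the quiz 1 stem and leaf plot
quiz1_counts = {0: 2, 1: 0, 2: 1, 3: 6, 4: 6, 5: 6,
                6: 12, 7: 14, 8: 16, 9: 8, 10: 34}

def recode(score):
    """Map a score to its group value (2, 4, 6, 8, or 10).

    The first group whose upper bound is >= the score wins, so a
    boundary score such as 4 lands in the lower group (assumption).
    """
    for upper in (2, 4, 6, 8, 10):
        if score <= upper:
            return upper
    raise ValueError(f"score out of range: {score}")

# Add up the frequencies of all scores that fall into the same group
grouped = {}
for score, freq in quiz1_counts.items():
    group = recode(score)
    grouped[group] = grouped.get(group, 0) + freq

n = sum(grouped.values())
running = 0
for group in sorted(grouped):
    running += grouped[group]
    print(group, grouped[group],
          round(100 * grouped[group] / n, 1),   # Percent
          round(100 * running / n, 1))          # Cumulative Percent
```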

 

 

Finish the table in the Word document that looks like the table below for the variable "percent", which represents final course grades in the students.sav file. Use the "Recode" function described above.

________________________________________
X f p cf cp
0-49
50-59
60-69
70-79
80-89
90-100
________________________________________

Answer in Blackboard
(5) What percent of people earned an A (90-100)?
(6) What percent of people earned less than 80 percent?

Graphing Frequency Distributions

In the previous sections we saw that one way to summarize and simplify an entire distribution of scores is by organizing the scores in a frequency distribution table. In this section we will learn about several other ways to represent distributions, focusing primarily on graphic displays: bar charts, histograms, and stem-and-leaf plots.


The charts from this lab will mostly be found under Graphs-->Legacy Dialogs in SPSS 15 or just under the Graphs menu in SPSS 14. In the Graphs-->Chart Builder menu there is a new and intuitive method of making charts that you can also try. You might also try the Interactive Charts under the Graphs menu. The figures come out slightly different but some people find them more intuitive than the other options.


    Bar graphs

    To display the distribution of a categorical variable one should use a bar graph (pie charts are also used, but we won't be discussing these in this lab). SPSS will make a number of different kinds of bar charts (simple, clustered, and stacked); we'll focus on simple and clustered.

    Bar chart: (simple, clustered, and stacked): These are used most often to display the distribution of subjects or cases in certain categories, such as the number of A, B, C, D, and F grades in a given class.
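To make the idea of "counts per category" concrete, here is a minimal Python sketch that prints a rough text version of a simple bar chart. The letter grades are made up for illustration and are not taken from students.sav.

```python
from collections import Counter

# Hypothetical final grades (made-up data)
grades = ["A", "B", "B", "C", "A", "B", "D", "C", "B", "F", "A", "C"]

# Count how many cases fall into each category
counts = Counter(grades)

# One bar per category; the bar length is the count
for category in ["A", "B", "C", "D", "F"]:
    print(f"{category} | {'#' * counts[category]} ({counts[category]})")
```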

    Let's start by looking at the distribution of ethnicity in our students.sav datafile. Our graph will show the counts (or frequencies) for each ethnic category.

    Step 1: First select bar graph from the menu.

    Step 2: Then select "simple" from the bar chart box.

    Step 3: Then click define.

    Step 4: Then select your variable and insert it into the category field.

    You should get a bar chart that looks something like this.


    Bar charts are also useful for presenting distributions that are "broken into" different categories.

    For example, suppose that we wanted to know the mean scores (basically the arithmetic average; we'll talk more about means next week) on quiz 1 (so these scores are a response variable) broken down by the three different sections (our categorical explanatory variable).

    Step 1: We'd select bar graph, select simple, but then we need to make some different selections in the bar graph window.

    Step 2: We need to click on "other summary function", and then select the variable for which we want to plot the means (quiz 1 in this case). The default summary function is mean. Then we need to put the category variable in the category field.

    We should end up with a graph that looks like this.
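The height of each bar in a graph like this is a group mean. A minimal Python sketch of that calculation follows; the (section, score) pairs are hypothetical, not the students.sav data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (section, quiz 1 score) pairs (made-up data)
records = [(1, 8), (1, 6), (2, 9), (2, 7), (2, 8), (3, 5), (3, 10)]

# Collect the scores belonging to each section
by_section = defaultdict(list)
for section, score in records:
    by_section[section].append(score)

# One mean per section: this is what each bar would show
means = {section: mean(scores) for section, scores in sorted(by_section.items())}

for section, value in means.items():
    print("Section", section, "mean quiz 1 score:", value)
```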

    Suppose that we want to look at the same means by section but broken down by ethnicity. To do this we must use a clustered bar graph.

    So select bar graph, then choose clustered. Now enter things as we did in the example above, except we must also select ethnicity for the 'cluster bars by' field.

    We should end up with a graph that looks like this.
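A clustered bar graph shows one mean for every combination of the two categorical variables. The sketch below illustrates that in Python with hypothetical data; the cluster codes "a" and "b" are placeholders invented for this example, not categories from the file.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (section, cluster code, quiz 1 score) records (made-up data)
records = [(1, "a", 8), (1, "a", 6), (1, "b", 9),
           (2, "a", 7), (2, "b", 5), (2, "b", 9)]

# Group scores by the pair (section, cluster code)
cells = defaultdict(list)
for section, cluster, score in records:
    cells[(section, cluster)].append(score)

# One mean per combination: each would be one bar within a cluster
cell_means = {key: mean(scores) for key, scores in sorted(cells.items())}

for (section, cluster), value in cell_means.items():
    print("Section", section, "cluster", cluster, "mean:", value)
```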

        Make a bar graph of the counts of the final grades (called "grades" in the file) in the class (i.e. A, B, C,...) and paste it into the Word document.
       
        Make a bar graph of the counts of the final grades in the class (i.e. A, B, C,...), further broken down by whether they attended the review session or not, and paste it into the Word document.

Answer in Blackboard
      (7) Based on the graph, would you conclude that attending the review session had an impact on final grades?


    Histograms

    Suppose that we wish to know how the students did on quiz 1. We could try looking at all of the scores, but that's a lot of numbers. Instead, it is better to look at the entire distribution rather than all of the individual scores. Earlier in this lab we did this by creating frequency distribution tables. Another way to do it is to construct a histogram to represent the entire distribution. We should use a histogram because our variable (score on quiz1) is a continuous variable.

    Histogram: A histogram is a pictorial representation of the distribution of values for a particular variable. The bars represent the number of occurrences of each value. Histograms look similar to bar graphs, except that they are used more often to show the number of subjects or cases in ranges of values for a continuous variable.

    Using SPSS to create a histogram:

      Creating a histogram of the students' scores on quiz1.
        Step 1. At the top of the data window is a row of menus. To make graphs we will use the 'graphs' menu. Click on the graphs menu.

        Step 2. Under this menu a large number of graphing options will appear. On the bottom third of the list is 'histogram'. This is the option that we'll use to look at distributions (for this lab at least).

        Step 3. Select histogram. Now you'll get a window that looks like this:

        Step 4. Select 'quiz 1' as your variable and then click OK.

        This should result in a new window (the output window) opening up, and it should have your histogram in it.

    The histogram of quiz 1 is basically just a picture of the frequency distribution table. Below is a frequency distribution table and a histogram for quiz 1.

    For quiz 1 the frequency table output should look something like this:

    In this case the histogram is a little different than you might expect after comparing it to the frequency distribution table above. Why?
      Because the above histogram is based on a grouped frequency distribution table of quiz 1 (see the Grouped Frequency Distribution Tables section above for discussion). Go ahead and group scores 10 & 9, 8 & 7, 6 & 5, etc. and see if the histogram now looks as you'd expect it would.

      An important lesson from this is that the size of the interval that you plot may influence the overall shape of the histogram.
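The effect of interval size can be sketched in Python with a rough text histogram. The counts are read off the quiz 1 stem and leaf plot later in this lab; the helper `bin_counts` is a hypothetical name for this example, and the pairing of scores (10 & 9, 8 & 7, and so on, counting down from 10) follows the suggestion above.

```python
# Quiz 1 counts (score -> frequency), read off the quiz 1 stem and leaf plot
quiz1_counts = {0: 2, 1: 0, 2: 1, 3: 6, 4: 6, 5: 6,
                6: 12, 7: 14, 8: 16, 9: 8, 10: 34}

def bin_counts(counts, width, top=10):
    """Group scores into intervals of `width` values, counting down from `top`."""
    bins = {}
    for score, freq in counts.items():
        index = (top - score) // width      # which interval, counting from the top
        hi = top - index * width            # highest score in the interval
        lo = max(hi - width + 1, 0)         # lowest score, not below 0
        bins[(lo, hi)] = bins.get((lo, hi), 0) + freq
    return dict(sorted(bins.items()))

# Same data, two interval sizes: compare the shapes
for width in (1, 2):
    print(f"interval width = {width}")
    for (lo, hi), freq in bin_counts(quiz1_counts, width).items():
        label = str(lo) if lo == hi else f"{lo}-{hi}"
        print(f"{label:>5} | {'#' * freq}")
```

With a width of 1 the dip at a score of 9 is visible; with a width of 2 it is absorbed into the 9-10 interval and the bars rise steadily.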

      Using SPSS, create a histogram for quiz 5 and paste it into the Word document.


    Stem and leaf plots

    There is also another kind of plot, other than a histogram, that provides a quick and easy way to "see" the distribution: the stem and leaf plot (or display).
    Stem and leaf displays - These displays break each number down into a left part called the stem and a right part called the leaf. If numbers are two digits, then the left digit is the stem and the right digit is the leaf.

    For example: weight of students in Psych 820 course
    Data Set:
      90, 110, 112, 118, 120,
      130, 130, 140, 140, 140,
      145, 145, 145, 150, 175,
      176, 185, 205, 205, 220,
      225, 240
     9 | 0
    10 |
    11 | 028
    12 | 0
    13 | 00
    14 | 000555
    15 | 0
    16 |
    17 | 56
    18 | 5
    19 |
    20 | 55
    21 |
    22 | 05
    23 |
    24 | 0
    Using this procedure we get a picture that looks like a histogram rotated 90 degrees. However, unlike with histograms, we can recover all of the individual data points.
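The procedure can be sketched in Python using the weights listed above. The function name `stem_and_leaf` is an assumption of this example; it treats the last digit of each number as the leaf and everything to its left as the stem.

```python
# Weights from the example data set above
weights = [90, 110, 112, 118, 120, 130, 130, 140, 140, 140, 145,
           145, 145, 150, 175, 176, 185, 205, 205, 220, 225, 240]

def stem_and_leaf(values):
    """Return the plot as a list of text lines, one line per stem."""
    leaves = {}
    for value in sorted(values):
        stem, leaf = divmod(value, 10)   # e.g. 118 -> stem 11, leaf 8
        leaves.setdefault(stem, []).append(str(leaf))
    lines = []
    # Include stems with no scores so gaps in the distribution stay visible
    for stem in range(min(leaves), max(leaves) + 1):
        row = "".join(leaves.get(stem, []))
        lines.append(f"{stem:>2} | {row}".rstrip())
    return lines

plot = stem_and_leaf(weights)
print("\n".join(plot))
```

Because every leaf is kept, the original numbers can be rebuilt from the plot by joining each stem with each of its leaves.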

    Let's look at our quiz1 data. Since the values for the variable were 0-10, the stems are the integers, and the leaves are the numbers after the decimal place (and since there was no partial credit on the quiz, all of these numbers are zeros).

      Quiz 1
      0 | 00
      1 |
      2 | 0
      3 | 000000
      4 | 000000
      5 | 000000
      6 | 000000000000
      7 | 00000000000000
      8 | 0000000000000000
      9 | 00000000
      10 | 0000000000000000000000000000000000
    SPSS will also create a stem and leaf plot for you.

    Step 1: Go to the "Analyze" menu and select "Descriptive statistics".
    Step 2: Within that submenu select "Explore".

    SPSS will then ask which variable you want the plot for.

    Then you'll need to click the "Plots" button and make sure that stem and leaf is checked.

    For quiz 1 the stem and leaf output should look something like this:

      Using SPSS create a stem and leaf plot for quiz 2 and paste it into the Word document.

    Submit your Blackboard answers. If you get any questions wrong, you have 1 more chance to correct them.

    Save your Word document and email it as an attachment to your GA.