Crosstabulation and the Pearson Chi-Square Test
Suppose that you have noticed that a lot of psychology majors are women, with many fewer men. It could be that there are just more women enrolled in the university, so you'd expect more female psych majors than male psych majors. Or it could be that there is something about the psychology major that attracts women (or repels men?).
Both major and gender are categorical (i.e., nominal) variables: they are measured in categories. In this case, we're interested in whether there is a relationship between these two categorical variables, major and gender. These two things put us at the bottom left of the Decision Tree diagram:

(1) We're looking for a relationship, and (2) we have categorical (nominal, not interval/ratio) data. Following these two facts down to the bottom of the chart leads us to the Chi-Square test.
One part of this test is a crosstabulation. Crosstabulation is a statistical technique used to display a breakdown of the data by these two variables (that is, a table that displays the frequency of each major broken down by gender).
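A crosstabulation is simply a count of every combination of the two variables' categories. The minimal Python sketch below builds one from raw records; the gender/major records are made-up, hypothetical data used only to show the mechanics, and the `crosstab` helper is written here for illustration, not taken from any library.

```python
from collections import Counter

# Hypothetical raw records: one (gender, major) pair per student.
# These values are invented for illustration only.
records = [
    ("female", "psychology"),
    ("female", "psychology"),
    ("male", "psychology"),
    ("female", "biology"),
    ("male", "biology"),
    ("male", "biology"),
]

def crosstab(pairs):
    """Count how often each (row category, column category) pair occurs."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    # One list per row category, one count per column category.
    table = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, table

rows, cols, table = crosstab(records)
print("gender", *cols)
for r, counts in zip(rows, table):
    print(r, *counts)
```

Each printed row gives the frequency of each major for one gender, which is exactly the breakdown the chi-square test then evaluates.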
The Pearson chi-square test of independence essentially tells us whether the results of a crosstabulation are statistically significant. That is, are the two categorical variables independent (unrelated) of one another? So, loosely speaking, the chi-square test is a kind of correlation test for categorical variables.
- A chi-square will be significant if the residuals (the differences between observed frequencies and expected frequencies) for one level of a variable differ as a function of another variable.
- The chi-square value does not tell us the nature of the differences (for example, which cells differ or in which direction).
So for our example, the chi-square test will tell us whether there are more female psychology majors than you would expect by chance (based on the total number of males and females and the total number of people in different majors).
The Chi-Square Formula
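Written out, the Pearson chi-square statistic adds up, over every cell of the crosstab, the squared residual divided by the expected frequency. Using fo for observed and fe for expected frequencies (the notation of the example below), with each expected frequency computed from the row and column totals:

```latex
\chi^2 = \sum_{\text{cells}} \frac{(f_o - f_e)^2}{f_e},
\qquad
f_e = \frac{(\text{row total}) \times (\text{column total})}{N}
```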

When do we use these methods?
- When we have categorical variables
- Do the percentages match up with how we thought they would?
- Are two (or more) categorical variables independent?
Hypothesis Testing with Chi-square
- We test the null hypothesis that nothing interesting is happening (i.e., there is no relationship) versus the alternative hypothesis that the findings are interesting (i.e., there is a relationship).
- The null hypothesis can only be rejected if there is a .05 or lower probability of getting findings like ours by chance alone (that is, if the null hypothesis were true).
Hypothesis tests determine the extent to which our findings may be due to chance.
Example
A manufacturer of watches takes a sample of 200 people. Each person is classified by age and watch type preference (digital, analog, or undecided).
The question: is there a relationship between age and watch preference?
Set up the data in a "cross tabulation" of our two variables. The data are observed frequencies (fo).
Age \ Watch preference | digital | analog | undecided
under 30 | 90 | 40 | 10
over 30 | 10 | 40 | 10
Step 1: State the hypotheses and select an alpha level
H0: In the population, preference is independent of (NOT related to) age.
Ha: In the population, preference is related to age.
We'll set α = 0.05.
Step 2:
- Compute your degrees of freedom: df = (#Columns - 1) * (#Rows - 1)
- Go to a chi-square statistic table and find the critical value
For this example, with df = 2 and α = 0.05, the critical chi-squared value is 5.99.
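As a cross-check on the table lookup, here is a minimal Python sketch of Step 2. It relies on a shortcut that holds only for df = 2: the chi-square distribution with 2 degrees of freedom has upper-tail probability exp(-x/2), so the critical value is -2 ln(α). For other df you would still use a chi-square table or a statistics library. The function names are hypothetical, chosen for illustration.

```python
import math

def degrees_of_freedom(n_rows, n_cols):
    """df = (#Rows - 1) * (#Columns - 1) for a test of independence."""
    return (n_rows - 1) * (n_cols - 1)

def critical_value_df2(alpha):
    """Critical chi-square value, valid ONLY for df = 2.

    For df = 2, P(X > x) = exp(-x / 2); solving exp(-x / 2) = alpha
    gives x = -2 * ln(alpha).
    """
    return -2.0 * math.log(alpha)

df = degrees_of_freedom(n_rows=2, n_cols=3)  # age (2 levels) x preference (3 levels)
crit = critical_value_df2(0.05)
print(df, round(crit, 2))  # 2 5.99
```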
Step 3: Collect your data and compute your test statistic
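So let's enter the predicted (expected) frequencies (fe) into our crosstabulation, shown in parentheses after each observed frequency. Each expected frequency is the row total times the column total, divided by N (200).

Age \ Watch preference | digital | analog | undecided | Row total
under 30 | 90 (70) | 40 (56) | 10 (14) | 140
over 30 | 10 (30) | 40 (24) | 10 (6) | 60
Column total | 100 | 80 | 20 | 200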
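The expected frequencies can be computed directly from the marginal totals. This is a minimal Python sketch using the observed counts from the watch example; `expected_counts` is a hypothetical helper name, not part of any library.

```python
def expected_counts(observed):
    """Expected frequency for each cell: row total * column total / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Observed counts from the watch example:
# rows = under 30, over 30; columns = digital, analog, undecided.
observed = [[90, 40, 10],
            [10, 40, 10]]
expected = expected_counts(observed)
print(expected)  # [[70.0, 56.0, 14.0], [30.0, 24.0, 6.0]]
```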
Step 3 (continued): Compute the chi-squared statistic
- Find the residuals (fo - fe) for each cell
- Square these differences
- Divide each squared difference by fe
- Sum the results

So then add them up: χ² = 5.71 + 4.57 + 1.14 + 13.33 + 10.67 + 2.67 = 38.09

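The four computing steps translate almost line for line into code. This minimal Python sketch applies them to the watch example; the exact result is 800/21 ≈ 38.095, and the 38.09 used in Step 4 comes from adding the per-cell values after rounding each to two decimals.

```python
observed = [[90, 40, 10],
            [10, 40, 10]]
expected = [[70, 56, 14],
            [30, 24, 6]]

def chi_square(observed, expected):
    """Pearson chi-square: sum over cells of (fo - fe)^2 / fe."""
    total = 0.0
    for obs_row, exp_row in zip(observed, expected):
        for fo, fe in zip(obs_row, exp_row):
            residual = fo - fe        # step 1: residual for this cell
            squared = residual ** 2   # step 2: square it
            total += squared / fe     # steps 3-4: divide by fe, add to the sum
    return total

chi2 = chi_square(observed, expected)
print(round(chi2, 3))  # 38.095
```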
Step 4: Compare this computed statistic (38.09) against the critical value (5.99) and make a decision about your hypotheses.
df = (rows - 1) * (columns - 1) = (2 - 1) * (3 - 1) = 1 * 2 = 2
The Excel function CHIINV gives us the critical value of χ² (newer versions of Excel also provide it as CHISQ.INV.RT).
χ² critical = CHIINV(α, df) = CHIINV(.05, 2) = 5.99
The Excel function CHIDIST gives us the p-value of χ² (newer versions of Excel also provide it as CHISQ.DIST.RT).
p = CHIDIST(χ² obtained, df) = CHIDIST(38.09, 2) = 0.0000000054
- Because 38.09 > 5.99 (equivalently, p < .05), here we reject H0 and conclude that there is a relationship between age and watch preference.
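The CHIDIST p-value can be reproduced the same way, again assuming df = 2 so that p = exp(-χ²/2). This hypothetical Python sketch also shows that comparing χ² to the critical value and comparing p to α lead to the same decision.

```python
import math

def p_value_df2(chi2):
    """Upper-tail p-value for a chi-square statistic, valid ONLY for df = 2.

    With 2 degrees of freedom, P(X > x) = exp(-x / 2).
    """
    return math.exp(-chi2 / 2.0)

chi2_obtained = 38.09                 # obtained statistic from Step 3
alpha = 0.05
critical = -2.0 * math.log(alpha)     # about 5.99 (df = 2 shortcut)

p = p_value_df2(chi2_obtained)
reject_h0 = chi2_obtained > critical  # same decision as p < alpha
print(f"p = {p:.2g}, reject H0: {reject_h0}")  # p = 5.4e-09, reject H0: True
```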
Computing Crosstabs and Chi-squared in SPSS
Choose Analyze, Descriptive Statistics, Crosstabs.
Select your categorical variables: put one in Row and the other in Column. Click on the Statistics button and then check the Chi-square option.

Expected Counts
Expected counts are based on marginal percentages. Multiply the marginal percentages together to get the expected percentage for that cell, then multiply by N to get expected counts.
Or, have SPSS compute them: choose Cells, then check Expected.
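The marginal-percentage recipe can be sketched in Python with the observed counts from the watch example (a minimal illustration, not SPSS's own code). It gives the same expected counts 70, 56, 14, 30, 24 and 6 as in the watch example, because (row total / N) × (column total / N) × N simplifies to row total × column total / N.

```python
# Observed counts from the watch example:
# rows = under 30, over 30; columns = digital, analog, undecided.
observed = [[90, 40, 10],
            [10, 40, 10]]
n = sum(sum(row) for row in observed)  # N = 200

# Marginal proportions (marginal percentages divided by 100).
row_props = [sum(row) / n for row in observed]        # [0.7, 0.3]
col_props = [sum(col) / n for col in zip(*observed)]  # [0.5, 0.4, 0.1]

# Expected cell proportion = row proportion * column proportion;
# multiplying by N turns it into an expected count.
expected = [[rp * cp * n for cp in col_props] for rp in row_props]

for row in expected:
    print([round(e, 1) for e in row])
```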
Residuals
Residuals are the differences between observed and expected counts. Choose Cells, then check Unstandardized in the Residuals box.
Standardized residuals are distributed approximately as z-scores (each residual is divided by an estimate of its standard deviation).
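Both kinds of residual can be computed directly, as in this Python sketch for the watch example. It assumes that SPSS's "Standardized" option is the Pearson residual (fo - fe)/√fe; SPSS also offers an "Adjusted standardized" residual, which uses a different denominator and is not shown here.

```python
import math

observed = [[90, 40, 10],
            [10, 40, 10]]
expected = [[70, 56, 14],
            [30, 24, 6]]

# Unstandardized residual: fo - fe.
residuals = [[fo - fe for fo, fe in zip(o, e)]
             for o, e in zip(observed, expected)]

# Standardized (Pearson) residual, assumed form: (fo - fe) / sqrt(fe).
standardized = [[(fo - fe) / math.sqrt(fe) for fo, fe in zip(o, e)]
                for o, e in zip(observed, expected)]

print(residuals)  # [[20, -16, -4], [-20, 16, 4]]
for row in standardized:
    print([round(z, 2) for z in row])
```

Squaring the standardized residuals and adding them up gives back the chi-square statistic, so the cells with the largest |z| (here, the over-30 digital and analog cells) contribute most to a significant result.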

Output:
Here is some sample output looking at a crosstab of final grade and review session attendance from the students.sav file.
- The crosstab shows frequencies of one variable for each level of the other
- Count refers to the observed frequencies (from the data)
- Expected Count refers to the expected frequencies
The output shows the Pearson chi-square and "Asymp. Sig." (the asymptotic significance level, i.e., the p-value) for the crosstab above.
If "Asymp. Sig." is less than .05, then the residuals differ as a function of the other variable, and we reject the hypothesis that the two variables are independent.
- So here the chi-square is not significant (Sig. is greater than α = 0.05), so we fail to reject H0. This means that we are not rejecting the hypothesis that final grade and review session attendance are independent (in other words, we have no evidence of a relationship between the two variables).
For some of the questions in the lab, you will need this data file: students.sav.
Lab 24 Worksheet
Email to your GA when finished.
Use any extra time to complete your homework and your project.
Every computer lab on campus has SPSS. Here is a complete list of them.