SPSS is one of many software packages that are useful for data
analysis. SPSS is, by far, the most popular program among
psychologists. We will use Version 16.0. Version 18 has been released
recently, but the University is still using Version 16 (what happened to
17?). I started with Version 6.0 in 1996. The program is considerably
better now and continues to improve but the basics are the same and are
likely to remain so for a long time.
For whatever reason, approximately 75% of my students think that saying
"SPSS" is a mouthful and shorten the name to "SPS" (or SSP, PSP, SSPS,
SP, and many other variants). Be among the 25% who get it right and I
will think that you are a very bright and conscientious student with a
special flair for statistics! Recently, the name of the program changed
from SPSS to PASW. This is not an improvement, IMHO. I will probably
continue to refer to the program as "SPSS" for a long time.
Before you can learn to read, you need to work on your ABC's. Before
you can learn to analyze data, you need to learn to create a data file.
"Data file" refers to a computer file with organized data (BTW, the
word "data" is the plural of "datum" and thus it is proper to use the word
like this: "The data are inconclusive." or "There are no data to
support your conclusion." Fuddy-duddies like me talk like this but
recently usage notes in dictionaries have begun to allow sentences like
"The data is inconclusive." and "There is no data to support your
conclusion." I am not bothered by this usage but many of my fellow
fuddy-duddies will think you uninformed and possibly unintelligent if
you talk like that. So if you want to avoid embarrassment in
professional settings, stick to the traditional rule of using "data" as
a plural noun. Of course, then you'll sound like a fuddy-duddy. Thus,
it is safer to circumvent the problem by saying things like, "The
results are inconclusive, given the data we have." or "I haven't seen
any data that would support your conclusion." Better yet, maintain your
smart and cool image by avoiding the word entirely by saying, "I'm not
sure if we can conclude anything yet." or "I don't think you're right.
Do you have any evidence?").
Go ahead and open SPSS. There should be a shortcut icon for SPSS on
your desktop. Sometimes it is slow to load (it is a big, powerful
program). Click "Cancel" (or "Type in data") if you see this:
You should see an empty datafile that looks like this:
Click on the "Variable View" tab at the bottom left (I circled it in
red on the image above). Now your screen should look like this:
You are now ready to create a dataset.
Naming your variables
Let's say we have data from participants in a clinical trial for a new
medical treatment. We have information about the person's name, sex,
age, and annual income. In most datasets, each person has a unique
identification number of some sort (e.g., social security number). This
prevents problems that arise when people have the exact same name.
Let's call the first variable ParticipantID.
Go ahead and type "ParticipantID"
(without the quotes) on the first row in the "Name" column, change the
"Decimals" value from 2 to 0, and type "Participant ID" in the "Label"
column. It should now look like this:
The ParticipantID variable will be a numeric variable (i.e., a variable
that holds numbers instead of text, dates, or other kinds of values). I
had you set the "Decimals" column to 0 so that values would be
displayed as integers rather than more precise values like 2.14. The
"Label" column helps you describe what the variable is and it allows
characters that the "Name" column won't. Labels can be up to 255
characters long. I recommend labeling all variables. It takes extra
time at first but will save you time during data analysis because you
won't have to figure out what your variables are each time.
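To see what the "Decimals" setting does, here is a minimal Python sketch. It assumes, as described above, that the setting changes only how a value is displayed. The function name display is made up for this illustration and is not part of SPSS.

```python
def display(value, decimals):
    """Format a number with a fixed number of decimal places (display only)."""
    return f"{value:.{decimals}f}"


# A ParticipantID shown with Decimals = 0, and an age shown with Decimals = 2.
print(display(7, 0))
print(display(2.5, 2))
```

The value itself is not changed by the formatting; only its appearance on screen differs.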
Here are some restrictions on variable
names (Reading these might save you future heartache but there is no
need to memorize them):
1. Variable names typically must begin with a letter (there are some
exceptions to this rule but that topic is more advanced than anything
you'll need for this class).
2. Variable names can have uppercase letters, lowercase letters,
numbers, and these characters: _ . $ # @. For example, A._$@#1 is a
valid variable name (this might be a good name for a variable that
contains a curse word!).
3. Variable names cannot have spaces.
4. Variable names cannot be longer than 64 characters. In the
old days, variable names could not be longer than 8 characters. Such
are the fruits of progress!
5. The following cannot be variable names: ALL, AND, BY, EQ, GE, GT,
LE, LT, NE, NOT, OR, TO, WITH. However, any variation on these words
can be used as a variable name. For example, "all" is not allowed but
"all1" is perfectly fine.
6. It is possible but not recommended to have variable names that end
in a period (.) or an underscore (_). This can cause problems with syntax
("Syntax" refers to an SPSS-specific programming language that many
people, including me, use to run SPSS in a more efficient manner. We
won't be using syntax in this class, though).
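If you like to see rules written out as a checklist, here is a rough Python sketch of the restrictions above. It is only an illustration, not how SPSS itself checks names. The function name check_variable_name is made up, the sketch ignores the advanced exceptions to the "begin with a letter" rule, and it only considers plain English letters.

```python
import re

# Reserved words from the list above; compared without regard to case.
RESERVED = {"ALL", "AND", "BY", "EQ", "GE", "GT", "LE", "LT",
            "NE", "NOT", "OR", "TO", "WITH"}

# A letter first, then letters, digits, or _ . $ # @ (ASCII only, a simplification).
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_.$#@]*")


def check_variable_name(name):
    """Return a list of problems with a proposed variable name (empty if none)."""
    problems = []
    if len(name) > 64:
        problems.append("longer than 64 characters")
    if " " in name:
        problems.append("contains a space")
    elif not NAME_PATTERN.fullmatch(name):
        problems.append("must start with a letter and use only "
                        "letters, digits, and _ . $ # @")
    if name.upper() in RESERVED:
        problems.append("is a reserved word")
    if name.endswith((".", "_")):
        problems.append("ends in a period or underscore (not recommended)")
    return problems


for candidate in ["ParticipantID", "all", "all1", "A._$@#1", "Last Name"]:
    print(candidate, check_variable_name(candidate))
```

An empty list means the name passed every check in this sketch.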
Suggestions about variable names
1. The variable name should be a description of the variable, if
possible. If the variable can't be succinctly described in the name, be
sure to describe it in the "Label" column. I promise you that otherwise
you will forget what the variable is if you put the dataset away and
then return to it months or years later. If you have a longitudinal
study of couples, one of your variables might be "DepressionWifeTime1"
and you can label the variable "Wife's level of depression at the
beginning of the study".
2. Periods and underscores are useful to show that some variables are
grouped together. For example, if you measure the height of children
twice a year for 3 years starting in 2007, you could name the variables
like this:
height.2007.1
height.2007.2
height.2008.1
height.2008.2
height.2009.1
height.2009.2
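The pattern in these names (stem, year, measurement number) is regular enough that a few lines of code can produce it. Here is a minimal Python sketch. The function name wave_names and its parameters are made up for this illustration.

```python
def wave_names(stem, years, waves_per_year):
    """Build names of the form stem.year.wave for every year and measurement."""
    return [f"{stem}.{year}.{wave}"
            for year in years
            for wave in range(1, waves_per_year + 1)]


# Height measured twice a year for 3 years starting in 2007.
names = wave_names("height", range(2007, 2010), 2)
for name in names:
    print(name)
```

Generating names this way keeps the grouping consistent, which matters more as the number of variables grows.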
Variable Types:
The 4 variable types used in this course are:
1. Numeric. A variable whose values are numbers and are displayed in
standard format.
2. String. A variable whose values are text. Uppercase and lowercase
letters are considered distinct. String variables are also known as
alphanumeric variables.
3. Date. A numeric variable whose values are displayed as a date. There
are many different date formats available.
4. Dollar. A numeric variable displayed with a leading dollar sign ($).
When entering data, you don't need to type the dollar sign.
There are other variable types, including scientific notation, custom
currency, comma, and dot. Check the Help menu in SPSS for explanations
if you think you might need them (you won't in this course).
Let's make another numeric variable.
On line 2, enter "Age" in the "Name" column. In the "Label" column,
enter "Participant Age (in years)".
For adults, we would probably enter age as an integer but we would want
more precision for children, especially infants. So 2.50 would mean
that the person is 2 and a half years old. Let's leave the "Decimals"
as 2, which is the default for all numeric variables.
Let's make 2 string variables.
On line 3, enter "LastName" and on line 4 enter "FirstName". These
contain the last and first names of the participant in the study. By
default, new variables are assumed to be numeric.
Change the variable type of LastName by selecting the cell in the "Type"
column. Click the gray box with the 3 dots that appears on the right
side of the cell. Now select String and enter 30 in the "Characters"
box. The "Characters" box specifies how many characters can fit in the
variable. If you have a name with more than 30 characters, you would
need to enter a higher number.
Repeat this process for the FirstName variable.
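If you ever need to check whether a width of 30 is enough for your names, the idea can be sketched in a few lines of Python. This is only an illustration. The function name values_too_long is made up, and the list holds the last names used in this lab.

```python
def values_too_long(values, width=30):
    """Return the values that would not fit in a string variable of the given width."""
    return [value for value in values if len(value) > width]


last_names = ["Ardle", "Barnes", "Chamorro", "David", "Franco"]
print(values_too_long(last_names))           # names longer than 30 characters
print(values_too_long(last_names, width=6))  # names longer than 6 characters
```

Any name the function returns would call for a higher number in the "Characters" box.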
Let's make a dollar variable.
On line 5, enter "Income". Change the variable type to "Dollar".
The display can be formatted in several ways but you don't need to
select any of the options.
Participant sex could be coded as a string variable and you would
just enter "Male" and "Female" for each person. This would be okay for
small datasets but it is generally a better idea to have a code number
for male and another code number for female. This saves time during
data entry, makes the file smaller, and makes the analyses
run faster. Most researchers use either 1 and 2 or 0 and 1 for
categorical variables like sex. However, this is merely a
convention. You could choose anything you like when you create your own
research data. What is important is that you remember which number
corresponds to which sex. You do this by using the "Values" column.
Enter "Sex" on line 6. Change the
"Decimals" to 0. Enter "Participant Sex" in the "Label" column. Select
the "Values" cell on line 6. Click the gray box on the right side of
the cell. Enter 1 in the "Value" box. Enter "Male" in the "Label" box.
Click Add. Enter 2 in the "Value" box. Enter "Female" in the "Label"
box. Click Add. Click OK.
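The idea behind value labels (store a short code number, display a readable label) can be sketched in Python as a simple lookup table. This is an illustration of the concept only. The names SEX_LABELS and label_for are made up, and SPSS does not work with Python dictionaries here.

```python
# Code numbers and labels as entered in the "Values" column for Sex.
SEX_LABELS = {1: "Male", 2: "Female"}


def label_for(code, labels):
    """Translate a stored code number into its label; flag codes with no label."""
    return labels.get(code, f"(no label for {code})")


codes = [1, 2, 2, 2, 1]
decoded = [label_for(code, SEX_LABELS) for code in codes]
print(decoded)
```

A code with no entry in the table is flagged rather than guessed at, which is a handy way to catch typing mistakes made during data entry.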
This is what you should have so far.
Add in the missing labels as shown
below.
If all I had were these variables, I probably wouldn't have written
"Participant" in each of the labels because it is obvious what the
variables are. However, in some datasets it is good to specify. For
example, if you were to also measure the age of the participants'
spouses and children, you wouldn't want there to be any ambiguity in
any of your printouts. In general, it is better to be too detailed than
to be not detailed enough.
Saving your dataset
1. Save your dataset often. I have had heartbreaking events happen
because I failed to save my data often. I've had power outages,
software crashes, computer failures, roommate interference, pet
interference, and my own general stupidity cause me to have to redo hours
of work. Save often. A lot. Frequently. Really. I'm not kidding.
Pressing the Ctrl-S shortcut key is a quick and easy way to save data in
SPSS (and most other programs).
2. It is a good idea to put the date in the name of the file. In this
course, your data will be neat and clean. In real data analysis, you
often have multiple copies of similar datasets so it is helpful to know
which is the most recent one. Note that dates in file names cannot
contain slashes (e.g., 1/16/2008) because Windows treats slashes as
folder separators.
3. Name your dataset something descriptive rather than "Data" or
something like that. If the study is about the effect of journaling on
stress-related illnesses, call it "Journaling and Stress 1-16-2008".
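Here is a minimal Python sketch of the naming suggestions above: a descriptive name followed by a date written with hyphens instead of slashes. The function name dated_filename is made up for this illustration.

```python
from datetime import date


def dated_filename(description, when):
    """Append a month-day-year date with hyphens, which are safe in file names."""
    stamp = f"{when.month}-{when.day}-{when.year}"
    return f"{description} {stamp}"


name = dated_filename("Journaling and Stress", date(2008, 1, 16))
print(name)
```

Passing date.today() instead of a fixed date would stamp the file with the current date.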
Click File. Click Save (or click the Save button or press the Ctrl-S
shortcut keys). Name your dataset
"Lab 2" followed by your section number, your last name, and today's
date. If your name is Jones and you are in section 2, save the
file as "Lab 2 Section 2 Jones 1-16-2007". This will help your GA know
whose file is whose. Note section meeting times:
12:00 Section 1
1:00 Section 2
2:00 Section 3
3:00 Section 4
Entering your data
With real data analysis, datasets are often very large. We will
start small with only 5 people.
Click the "Data View" tab at the
bottom. In the "Variable View" page, each row is a variable. In
the "Data View" page, each row is a person and each column is a
variable. This is a little tricky at first but you'll get used to it
soon.
Here are the data:
Participant 1: Franz Ardle, Age 34, makes $48,000 per year, male
Participant 2: Julie Barnes, Age 50, makes $79,000 per year, female
Participant 3: Maria Chamorro, Age 22, makes $26,000 per year, female
Participant 4: Wynona David, Age 18, makes $5,600 per year, female
Participant 5: Zachary Franco, Age 41, makes $40,000 per year, male
Remember that "male" and "female" are entered as 1 and 2. Your data
should look like this:
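As a rough sketch of the same layout outside SPSS, the five records can be written in Python with one row per person and one column per variable. The structure is only an illustration of the rows-and-columns idea. It is not an SPSS file format, and the name participants is made up.

```python
# One dictionary per participant (a row); each key is a variable (a column).
# Sex is stored as a code number: 1 = male, 2 = female.
participants = [
    {"ParticipantID": 1, "Age": 34, "LastName": "Ardle",
     "FirstName": "Franz", "Income": 48000, "Sex": 1},
    {"ParticipantID": 2, "Age": 50, "LastName": "Barnes",
     "FirstName": "Julie", "Income": 79000, "Sex": 2},
    {"ParticipantID": 3, "Age": 22, "LastName": "Chamorro",
     "FirstName": "Maria", "Income": 26000, "Sex": 2},
    {"ParticipantID": 4, "Age": 18, "LastName": "David",
     "FirstName": "Wynona", "Income": 5600, "Sex": 2},
    {"ParticipantID": 5, "Age": 41, "LastName": "Franco",
     "FirstName": "Zachary", "Income": 40000, "Sex": 1},
]

columns = ["ParticipantID", "Age", "LastName", "FirstName", "Income", "Sex"]

# Print a header line, then one line per participant.
print("\t".join(columns))
for row in participants:
    print("\t".join(str(row[column]) for column in columns))
```

Note that you type incomes without the dollar sign or comma and type sex as the code number, not the word.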
Save your file again and email it
as an attachment to your GA.
Put "Lab 2", your name, and
section number in the subject line. If you are in Section 4 and
your name is Fred Jones, your subject line should be "Lab 2 Fred Jones
Section 4".
You've made your first dataset in SPSS! Congratulations!