SPSS allows us to select part of the data set for further analysis,
while excluding the remaining cases from these analyses. The procedure is
found by choosing **Select Cases** from the **Data** menu.

We then have several "Select" options within the dialogue box that comes
up so we can tell SPSS which data to select and which to ignore. The
select dialogue box looks like this:

First, we have to specify how to select data and which data to retain for the analyses:

**All cases** This option turns off any previous selection and uses all data in the file. Click on this radio button and then click on the OK button.

**If condition is satisfied** This option allows us to specify a rule based on values of variables; all cases that meet the criteria are retained. After clicking on the radio button for this option, we click on the "If..." button to bring up an additional dialogue box so we can define the rule or rules for including or excluding data:

We want to specify a condition in the right box based on one or more variables listed in the box to the left. Click on the HELP button for definitions of each of the "calculator" buttons in this dialogue box. For instance, if we have a variable for gender coded as 1 for women and 2 for men, we can select only the women for analysis by selecting "gender=1" as shown below:
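The logic behind the `gender=1` rule can also be sketched outside SPSS. The following is a minimal Python illustration, not SPSS syntax; the four-case data set and the names `cases` and `selected` are made up for the example.

```python
# Hypothetical data: gender is coded 1 for women and 2 for men.
cases = [
    {"id": 1, "gender": 1},
    {"id": 2, "gender": 2},
    {"id": 3, "gender": 1},
    {"id": 4, "gender": 2},
]

# Keep only the cases that satisfy the condition gender = 1.
selected = [case for case in cases if case["gender"] == 1]

print([case["id"] for case in selected])  # [1, 3]
```

Every case is tested against the same rule, so the order of the cases in the file does not affect which ones are retained.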

**Random sample of cases** This option allows us to select a specified proportion of the cases or a fixed number of cases at random. After selecting the corresponding radio button, we click on the "Sample" button to bring up a dialogue box that allows us to specify either an approximate percentage of cases or to sample an exact number of cases at random.
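The difference between an approximate percentage and an exact number of cases can be illustrated with a short Python sketch. It assumes that the approximate option amounts to an independent random decision for each case; the ten-case file, the fixed seed, and all variable names are hypothetical.

```python
import random

case_numbers = list(range(1, 11))  # hypothetical file with 10 cases
rng = random.Random(0)  # fixed seed only so the sketch is repeatable

# "Approximately 50%": each case is kept independently with probability 0.5,
# so the number of cases retained can differ from exactly half.
approx_sample = [n for n in case_numbers if rng.random() < 0.5]

# "Exactly 5 cases": draw a fixed number of cases without replacement.
exact_sample = sorted(rng.sample(case_numbers, 5))

print(len(approx_sample), len(exact_sample))
```

With the exact option the sample size is guaranteed; with the approximate option only the expected proportion is controlled.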

**Based on time or case range** This option allows us to specify a range of cases based on the internal case number. Thus, by typing in "4" and "8", we can select only cases 4, 5, 6, 7, and 8.
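As a small Python illustration of the same idea (not SPSS syntax), the range is inclusive at both ends and case numbers start at 1. The ten-case file and the variable names are assumptions made for the sketch.

```python
case_numbers = list(range(1, 11))  # hypothetical file with 10 cases
first, last = 4, 8

# Case numbers are 1-based, and both end points belong to the range.
in_range = [n for n in case_numbers if first <= n <= last]

print(in_range)  # [4, 5, 6, 7, 8]
```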

**Use filter variable** This option uses a special variable type (a filter variable that has values of 1 or 0 only): the cases with a 1 are included, and the cases with a 0 on the filter variable are excluded.

We also have to tell SPSS what to do with the unselected data. SPSS
can either filter it or delete it. If we choose to delete the unselected
data, those cases not meeting the criteria specified above will be **deleted
and cannot be recovered!** If we choose to filter the unselected data,
then the data will **not** be deleted, but SPSS will ignore the data in
any and all analyses until the filter is "turned off" by selecting the
**All cases** option described above. This filtering option,
therefore, is far safer than the deleting option.
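The contrast between filtering and deleting can be sketched in Python. This is only an analogy for what happens to the data file; the flag name `in_filter` and the three-case data set are hypothetical.

```python
# Hypothetical data: gender is coded 1 for women and 2 for men.
cases = [
    {"id": 1, "gender": 1},
    {"id": 2, "gender": 2},
    {"id": 3, "gender": 1},
]

# Filtering: every row stays in the data set; a 1/0 flag (the made-up name
# "in_filter") records which cases an analysis should use.
filtered = [dict(case, in_filter=1 if case["gender"] == 1 else 0) for case in cases]
used_in_analysis = [case["id"] for case in filtered if case["in_filter"] == 1]

# Deleting: unselected rows are removed outright and cannot be brought back
# from this copy of the data.
deleted = [case for case in cases if case["gender"] == 1]

# Turning the filter off ("All cases") makes every row available again.
restored = [{k: v for k, v in case.items() if k != "in_filter"} for case in filtered]

print(len(filtered), len(deleted), len(restored))  # 3 2 3
```

The filtered copy still holds all three cases, which is why the filter can be undone while the deletion cannot.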

After we have selected one of the radio buttons for the selection
method and after we have selected one of the radio buttons for handling
unselected data, clicking on the **OK** button will perform the
selection. If we have chosen to filter unselected data, cases that are not
being used will have a slash through the case number.

The split file option from the **Data** menu works similarly to the
select option. The difference, though, is that we use the split function
to repeat the same analyses, separately, on multiple groups. For instance,
if we want to compute descriptive statistics for men and women separately,
we would select the **Split File** option from the **Data** menu.

Then, we would click on the radio button for "Organize output by
groups":

Then, we would select the gender variable in the left field and click
the right arrow to move it to the right field:

After clicking on the **OK** button, we would see the "Split file
on" message in the lower right-hand corner of the SPSS data window.

Finally, any further analyses we run will be run separately for men and
for women until we turn off the split file option by selecting "Analyze
all cases -- do not split groups":
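Conceptually, splitting the file means running the same analysis once per group. A minimal Python sketch of that idea follows; the `studyhrs` values and the variable names are invented for the illustration, and the mean stands in for whatever analysis is being repeated.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical data: gender is coded 1 for women and 2 for men.
cases = [
    {"gender": 1, "studyhrs": 3.0},
    {"gender": 2, "studyhrs": 1.5},
    {"gender": 1, "studyhrs": 4.0},
    {"gender": 2, "studyhrs": 2.5},
]

# Collect the values of each group under its gender code.
groups = defaultdict(list)
for case in cases:
    groups[case["gender"]].append(case["studyhrs"])

# Run the same analysis (here, the mean) separately for each group.
group_means = {code: mean(values) for code, values in sorted(groups.items())}

print(group_means)  # {1: 3.5, 2: 2.0}
```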

Sometimes we need to let some cases have more "weight" in the analyses because we under- or oversampled from a group. We can determine a set of weights, one weight for each case, so that the groups more closely resemble the proportions we had hoped to sample. Alternatively, sometimes there is a single variable in a data set that represents the number of occurrences of a behavior or a frequency. We can instruct SPSS to treat such a variable as a case weight so that we can create shorter data sets for frequency data without having to enter a separate case for each observation. You may want to refer to this section after chi-square procedures are discussed in class.

Select the **Weight cases** item of the **Data** menu:

In the dialogue box that pops up, select the option for "Weight cases
by":

Click on the weighting variable in the left field that contains frequencies or optimal
case-weight information, then click on the right arrow to move that variable
into the right field:

After we click on the **OK** button, the "Weight on" message appears
in the lower right-hand corner of the SPSS data window.

When we select the **Weight cases** option from the **Data** menu,
select the "Do not weight cases" option, and click "OK", cases will no
longer be weighted and the "Weight on" message disappears.
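The frequency-weight idea can be checked with a small Python sketch: a short data set with one row per category and a frequency variable gives the same counts as a long data set with one row per observation. The `response` and `freq` names and the counts are hypothetical.

```python
from collections import Counter

# Hypothetical short data set: one row per category plus its frequency.
weighted_rows = [
    {"response": "yes", "freq": 3},
    {"response": "no", "freq": 2},
]

# Long form: one separate case for every observation.
expanded = [row["response"] for row in weighted_rows for _ in range(row["freq"])]

# Weighted form: each row counts as many times as its frequency says.
weighted_counts = Counter()
for row in weighted_rows:
    weighted_counts[row["response"]] += row["freq"]

print(Counter(expanded) == weighted_counts)  # True
```

Two weighted rows stand in for five individual cases here, which is the saving in data entry that a weight variable provides.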

SPSS has very powerful capabilities for creating new variables as a
function of existing variables. For instance, we can use these functions
to create averages of existing variables, to rescale existing variables, or
to compute difference scores by subtracting one variable from another. To
do so, we select the **Compute** option from the **Transform**
menu:

Selecting this option will bring up the compute dialogue box:

First, we need to supply a name for the target variable (i.e., the new
variable SPSS will create to contain the new values). For example, we may
want to create a new variable to report the number of minutes studied
rather than the number of hours spent studying. Thus, we would name the
new variable "minutes":

The next step is to define for SPSS how the new values should be
computed, essentially giving SPSS a formula. To convert hours to minutes,
we multiply the studyhrs variable by 60. Thus, we type
"studyhrs*60" in the numeric expression field:

After we have clicked on the **OK** button, the new variable
"minutes" is created:
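The same computation can be sketched in Python to show that the formula is applied to every case in turn. This is an illustration rather than SPSS syntax, and the `studyhrs` values are made up.

```python
# Hypothetical data: hours spent studying for three cases.
cases = [{"studyhrs": 1.5}, {"studyhrs": 2.0}, {"studyhrs": 0.5}]

# Apply the formula studyhrs * 60 to each case and store the result
# in a new variable called "minutes".
for case in cases:
    case["minutes"] = case["studyhrs"] * 60

print([case["minutes"] for case in cases])  # [90.0, 120.0, 30.0]
```

The original `studyhrs` values are left untouched; the formula only fills in the new variable.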

- Create a data set for the following data:

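| Group | Hw1 | Hw2 | Hw3 |
|---|---|---|---|
| expt | 92 | 84 | 93 |
| expt | 77 | 84 | 85 |
| expt | 87 | 86 | 81 |
| expt | 89 | 90 | 93 |
| expt | 64 | 73 | 78 |
| control | 81 | 84 | 93 |
| control | 83 | 90 | 91 |
| control | 84 | 88 | 86 |
| control | 82 | 80 | 78 |
| control | 96 | 91 | 88 |

- Select only the experimental group.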
- Select approximately 50% of the sample.
- Select only cases 2 through 5.
- Split the file to analyze the experimental group and the control group separately.
- Compute a new variable to represent the SUM of all three homework scores.
- Compute a new variable to represent the average of all three homework scores.