NOTE: The output below was produced using SPSS version 15. The commands should work with earlier versions of SPSS (back to version 7.5).

NOTE: Although commands are shown in ALL CAPS, this is not necessary. We follow the SPSS convention of doing this to make clear which parts of the syntax are SPSS commands, subcommands or keywords, and which parts are variable names (shown in lower case letters). SPSS is not case sensitive, so use whichever case is easiest for you.

NOTE: In some examples, the name of the command or subcommand has been shortened to only three letters. This is done to remind users that the full command name does not need to be provided. Users can use either the abbreviations or the full command name.

## 1. Introduction and description of data

We will present sample programs for some basic
statistical tests in SPSS, including t-tests, chi-square tests, correlation, regression,
and analysis of variance. These examples use the **auto** data
file. The program below reads the data and creates a temporary SPSS data file.
(In order to demonstrate how these commands handle missing values, some of the values of
**mpg** have been set to missing for the AMC cars. This differs from the
data files for other modules, where the AMC cars have valid data for **mpg**.)

DATA LIST FIXED/ make (A17) price 19-23 mpg 25-26 rep78 28 hdroom 30-32 trunk 34-35 weight 37-40 length 42-44 turn 46-47 displ 49-51 gratio 53-56 foreign 58 . BEGIN DATA. AMC Concord 4099 3 2.5 11 2930 186 40 121 3.58 0 AMC Pacer 4749 3 3.0 11 3350 173 40 258 2.53 0 AMC Spirit 3799 3.0 12 2640 168 35 121 3.08 0 Audi 5000 9690 17 5 3.0 15 2830 189 37 131 3.20 1 Audi Fox 6295 23 3 2.5 11 2070 174 36 97 3.70 1 BMW 320i 9735 25 4 2.5 12 2650 177 34 121 3.64 1 Buick Century 4816 20 3 4.5 16 3250 196 40 196 2.93 0 Buick Electra 7827 15 4 4.0 20 4080 222 43 350 2.41 0 Buick LeSabre 5788 18 3 4.0 21 3670 218 43 231 2.73 0 Buick Opel 4453 26 3.0 10 2230 170 34 304 2.87 0 Buick Regal 5189 20 3 2.0 16 3280 200 42 196 2.93 0 Buick Riviera 10372 16 3 3.5 17 3880 207 43 231 2.93 0 Buick Skylark 4082 19 3 3.5 13 3400 200 42 231 3.08 0 Cad. Deville 11385 14 3 4.0 20 4330 221 44 425 2.28 0 Cad. Eldorado 14500 14 2 3.5 16 3900 204 43 350 2.19 0 Cad. Seville 15906 21 3 3.0 13 4290 204 45 350 2.24 0 Chev. Chevette 3299 29 3 2.5 9 2110 163 34 231 2.93 0 Chev. Impala 5705 16 4 4.0 20 3690 212 43 250 2.56 0 Chev. Malibu 4504 22 3 3.5 17 3180 193 31 200 2.73 0 Chev. Monte Carlo 5104 22 2 2.0 16 3220 200 41 200 2.73 0 Chev. Monza 3667 24 2 2.0 7 2750 179 40 151 2.73 0 Chev. Nova 3955 19 3 3.5 13 3430 197 43 250 2.56 0 Datsun 200 6229 23 4 1.5 6 2370 170 35 119 3.89 1 Datsun 210 4589 35 5 2.0 8 2020 165 32 85 3.70 1 Datsun 510 5079 24 4 2.5 8 2280 170 34 119 3.54 1 Datsun 810 8129 21 4 2.5 8 2750 184 38 146 3.55 1 Dodge Colt 3984 30 5 2.0 8 2120 163 35 98 3.54 0 Dodge Diplomat 4010 18 2 4.0 17 3600 206 46 318 2.47 0 Dodge Magnum 5886 16 2 4.0 17 3600 206 46 318 2.47 0 Dodge St. Regis 6342 17 2 4.5 21 3740 220 46 225 2.94 0 Fiat Strada 4296 21 3 2.5 16 2130 161 36 105 3.37 1 Ford Fiesta 4389 28 4 1.5 9 1800 147 33 98 3.15 0 Ford Mustang 4187 21 3 2.0 10 2650 179 43 140 3.08 0 Honda Accord 5799 25 5 3.0 10 2240 172 36 107 3.05 1 Honda Civic 4499 28 4 2.5 5 1760 149 34 91 3.30 1 Linc. 
Continental 11497 12 3 3.5 22 4840 233 51 400 2.47 0 Linc. Mark V 13594 12 3 2.5 18 4720 230 48 400 2.47 0 Linc. Versailles 13466 14 3 3.5 15 3830 201 41 302 2.47 0 Mazda GLC 3995 30 4 3.5 11 1980 154 33 86 3.73 1 Merc. Bobcat 3829 22 4 3.0 9 2580 169 39 140 2.73 0 Merc. Cougar 5379 14 4 3.5 16 4060 221 48 302 2.75 0 Merc. Marquis 6165 15 3 3.5 23 3720 212 44 302 2.26 0 Merc. Monarch 4516 18 3 3.0 15 3370 198 41 250 2.43 0 Merc. XR-7 6303 14 4 3.0 16 4130 217 45 302 2.75 0 Merc. Zephyr 3291 20 3 3.5 17 2830 195 43 140 3.08 0 Olds 98 8814 21 4 4.0 20 4060 220 43 350 2.41 0 Olds Cutl Supr 5172 19 3 2.0 16 3310 198 42 231 2.93 0 Olds Cutlass 4733 19 3 4.5 16 3300 198 42 231 2.93 0 Olds Delta 88 4890 18 4 4.0 20 3690 218 42 231 2.73 0 Olds Omega 4181 19 3 4.5 14 3370 200 43 231 3.08 0 Olds Starfire 4195 24 1 2.0 10 2730 180 40 151 2.73 0 Olds Toronado 10371 16 3 3.5 17 4030 206 43 350 2.41 0 Peugeot 604 12990 14 3.5 14 3420 192 38 163 3.58 1 Plym. Arrow 4647 28 3 2.0 11 3260 170 37 156 3.05 0 Plym. Champ 4425 34 5 2.5 11 1800 157 37 86 2.97 0 Plym. Horizon 4482 25 3 4.0 17 2200 165 36 105 3.37 0 Plym. Sapporo 6486 26 1.5 8 2520 182 38 119 3.54 0 Plym. Volare 4060 18 2 5.0 16 3330 201 44 225 3.23 0 Pont. Catalina 5798 18 4 4.0 20 3700 214 42 231 2.73 0 Pont. Firebird 4934 18 1 1.5 7 3470 198 42 231 3.08 0 Pont. Grand Prix 5222 19 3 2.0 16 3210 201 45 231 2.93 0 Pont. Le Mans 4723 19 3 3.5 17 3200 199 40 231 2.93 0 Pont. Phoenix 4424 19 3.5 13 3420 203 43 231 3.08 0 Pont. 
Sunbird 4172 24 2 2.0 7 2690 179 41 151 2.73 0 Renault Le Car 3895 26 3 3.0 10 1830 142 34 79 3.72 1 Subaru 3798 35 5 2.5 11 2050 164 36 97 3.81 1 Toyota Celica 5899 18 5 2.5 14 2410 174 36 134 3.06 1 Toyota Corolla 3748 31 5 3.0 9 2200 165 35 97 3.21 1 Toyota Corona 5719 18 5 2.0 11 2670 175 36 134 3.05 1 Volvo 260 11995 17 5 2.5 14 3170 193 37 163 2.98 1 VW Dasher 7140 23 4 2.5 12 2160 172 36 97 3.74 1 VW Diesel 5397 41 5 3.0 15 2040 155 35 90 3.78 1 VW Rabbit 4697 25 4 3.0 15 1930 155 35 89 3.78 1 VW Scirocco 6850 25 4 2.0 16 1990 156 36 97 3.78 1 END DATA. FORMATS hdroom (F3.1) gratio (F4.2) .

The data set has missing values, which were left blank, and
a long character variable, **make**, which contains blanks. For this reason, fixed-field input was
used with column ranges specified.
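Before running any tests, it can help to confirm how many values are actually missing. The following is a minimal sketch, assuming the data step above has already been run; the choice of variables to inspect is ours and is not part of the original program.

```spss
* Report valid and missing counts without printing full frequency tables.
FREQUENCIES /VARIABLES=mpg rep78 /FORMAT=NOTABLE.

* List the cases where mpg is missing, for the next procedure only.
TEMPORARY.
SELECT IF (MISSING(mpg)).
LIST VARIABLES=make price mpg rep78.
```

Because `TEMPORARY` applies the selection only to the procedure that follows it, the full data file remains available for the analyses in later sections.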

## 2. T-tests

We can use the **t-test** command to determine whether
the average **mpg** for domestic cars differs from the mean for foreign cars.

T-TEST /GROUPS=foreign(0 1) /VARIABLES=mpg.

Here is the output produced by the **t-test**.
The results show that foreign cars have significantly higher gas mileage (**mpg**) than domestic cars. Note that the overall N is 71 (not 74). This is because
**mpg**
was missing for 3 of the observations, so those observations were omitted from the
analysis.

Note that the output provides two t values, one assuming
that the variances are **Equal** and another assuming that the variances are
**Unequal**. To the left of the t-test output is **"Levene’s Test for Equality
of Variances"**, which tests whether the variances are equal.
However, this test is sensitive to issues other than unequal variances (such as
departures from normality and sample size), so we often ignore it. When deciding between the t-test
assuming equal or unequal variances, instead look at the standard deviations in
the Group Statistics table. If the standard deviation of one group is
not more than about twice that of the other group, then it is probably safe to use
the equal variances version of the t-test. If the standard deviation of
one group is much larger than that of the other group, then you may want
to use the t-test with unequal variances assumed.
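If you want to inspect the group standard deviations on their own before choosing between the two versions of the test, one option is the **means** command. This is only a sketch; it should report the same values that appear in the Group Statistics table of the **t-test** output.

```spss
* Group means, counts and standard deviations of mpg by foreign.
MEANS /TABLES=mpg BY foreign /CELLS=MEAN COUNT STDDEV.
```

Comparing the two standard deviations in this table against the "about twice" rule of thumb described above gives a quick basis for choosing which row of the t-test output to read.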

## 3. Chi-square tests

We can use the **crosstabs** command to examine the
repair records of the cars (**rep78**, where 1 is the worst repair record, 5
is the best repair record) by **foreign** (foreign coded 1, domestic coded
0). Use the **chisq** keyword on the **statistics**
subcommand to request a chi-square test. This test determines if these two variables are
independent. The syntax is shown below.

CROSSTABS /TABLES=rep78 BY foreign /STATISTICS=CHISQ.

The results are shown below, presenting the crosstab first and then following with the chi-square test.

Notice that SPSS tells us that four of 10 cells have an expected value of less than five. The chi-square test may not be valid when cells have expected values less than five. In that case, you should use Fisher’s exact test, which is valid under such circumstances. Unfortunately, for a table larger than 2 x 2, such as this one, Fisher’s exact test is only available if you have installed the Exact Tests add-on module to SPSS.
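The sketch below shows two ways to follow up on the warning about small expected counts. The first command adds the expected counts to the table so you can see which cells fall below five. The second requests exact p-values and assumes that the Exact Tests module is installed; without it, the **method** subcommand is not available. The five-minute time limit is an arbitrary choice for illustration.

```spss
* Show observed and expected counts for each cell.
CROSSTABS /TABLES=rep78 BY foreign /CELLS=COUNT EXPECTED /STATISTICS=CHISQ.

* Request exact p-values, assuming the Exact Tests module is installed.
CROSSTABS /TABLES=rep78 BY foreign /STATISTICS=CHISQ /METHOD=EXACT TIMER(5).
```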

## 4. Correlation

Let’s use the **correlations** command to examine the
relationships among **price**, **mpg** and **weight**.

CORRELATIONS /VARIABLES=price mpg weight.

The results of the **correlations** command are shown
below.

The output is a correlation matrix for **price**, **mpg**
and **weight**. The off-diagonal cells have three entries: correlation coefficient, p-value
and number of cases (N). The p-value is the two-tailed p-value for the hypothesis test that the correlation is 0.

By looking at the sample sizes, we can see how the
**correlations** command handles missing values. Since **mpg** had
three missing values, all of the
correlations with **mpg** have an N of 71. The rest of the correlations were based on an N of
74. This is called pairwise deletion of missing data: SPSS uses the maximum
number of non-missing values available for each pair of variables.
It is also possible to ask SPSS for correlations based only on the cases having complete data for all
of the variables on the **variables** subcommand. This is called listwise deletion of
missing data: when any of the variables is missing for a case, the entire case is
omitted from the analysis. You can request listwise deletion with the **listwise** keyword
on the **missing** subcommand. This is demonstrated in the syntax below.

CORR /VARIABLES=price mpg weight /MISSING=LISTWISE.

Notice that the **correlations** command can be abbreviated
as **corr**.

The results of this command are shown below.

Footnote **a** indicates that the N is 71 for all of the correlations in the matrix
(because the **missing=listwise** subcommand was specified).


## 5. Regression

Regression is a technique used to find the best linear
prediction of a criterion variable from a set of predictor variables.
Let’s perform a
regression analysis to predict **price** from **mpg** and **weight**. We can use the **regression** command
(abbreviated to **reg**) as in the example below. The **dependent** subcommand names the criterion variable
(also known as the outcome or dependent variable), **price**. The
**method** subcommand names the predictor variables, **mpg** and **weight**, and the
**enter** keyword
causes both variables to enter the equation at the same time.

REG /DEPENDENT price /METHOD=ENTER mpg weight.

You should note the following two points in looking at the output below.

1) Only 71 observations are used instead of 74 because
**mpg** had three missing
values. The **regression** command deletes missing cases using listwise deletion. If you have a
large amount of missing data you may lose too many cases unless you use some method for
estimating missing values.

2) In the Coefficients table, we can see
that **weight** is the only variable that
significantly predicts **price**. The estimated regression coefficient (B) for **weight**
is 1.690 with a t value of 2.603 and a p-value of 0.011. One reason that **mpg**
is not a significant predictor may be the high correlation between **mpg** and **weight**.

The results are shown below.
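Both points above can be explored with additional subcommands. The sketch below is illustrative rather than a recommended analysis. The first command adds tolerance and VIF to the Coefficients table, which helps judge how much the correlation between **mpg** and **weight** affects the estimates. The second replaces missing values with the variable mean, which is a crude way of estimating missing values and is shown only to demonstrate the **missing** subcommand.

```spss
* Add tolerance and VIF to the Coefficients table to gauge collinearity.
REGRESSION /STATISTICS=COEFF R ANOVA TOL /DEPENDENT=price /METHOD=ENTER mpg weight.

* Keep all 74 cases by substituting the variable mean for missing values.
* This is a crude approach and is shown only to illustrate the subcommand.
REGRESSION /MISSING=MEANSUBSTITUTION /DEPENDENT=price /METHOD=ENTER mpg weight.
```

Note that the **missing** and **statistics** subcommands must come before the **dependent** subcommand.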

## 6. Analysis of variance (and analysis of covariance)

To compare the average **mpg** among the cars in the
different repair groups, we use analysis of variance. We can use the **anova**
command to perform an ANOVA
comparing **mpg** among the repair groups. Since there are so few cars with a
repair record (**rep78**) of 1 or 2, we will concentrate on the cars with repair records of
3, 4 and 5. We use the range specification (3,5) on the **variables** subcommand to
limit processing to categories three through five. The ANOVA below tests
the hypothesis that the average **mpg** is the same for the three repair groups (**rep78**). It
also produces the means for the three repair groups.

ANOVA /VARIABLES=mpg BY rep78(3,5) /METHOD=EXPERIM /STATISTICS MEAN.

The results of the ANOVA are shown below. SPSS
informs us that it used only 57 observations (due to the missing values of **mpg** and
restrictions on the values of **rep78**). The results suggest that there are significant
differences in **mpg** among the three repair groups (based on the F value of 8.081 with a p-value of 0.001). The means for groups 3, 4 and 5 were 19.43, 21.67 and 27.36.

## 7. Analysis of variance with the glm command

You can also perform this analysis with
the **glm** command, which additionally allows the calculation of post hoc tests.
Since the **glm** command does not allow the specification of a range, you will have to
use the **filter** command to restrict the range of **rep78**. An example of the
**glm** command with
filtering and the Tukey HSD post hoc test follows.


COMPUTE filt345=(ANY(rep78 ,3,4,5)).
FILTER BY filt345.
EXECUTE.
GLM mpg BY rep78 /POSTHOC = rep78 ( TUKEY ) /EMMEANS = TABLES(rep78).
FILTER OFF.
EXECUTE.

The results are shown below. The group with **rep78** of 5 is significantly different both from 3 and
from 4. However, the group with **rep78** of 3 is not significantly
different from **rep78** of 4.
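As a cross-check on the filtering approach, the same comparison can be sketched with a temporary selection and the **oneway** command. This is an alternative we are suggesting, not part of the original example. `TEMPORARY` limits the selection to the next procedure, so no filter variable has to be created or switched off afterwards.

```spss
* Restrict the next procedure only to cars with rep78 of 3, 4 or 5.
TEMPORARY.
SELECT IF (rep78 >= 3 AND rep78 <= 5).
* One-way ANOVA of mpg by repair group with Tukey HSD comparisons.
ONEWAY mpg BY rep78
  /STATISTICS=DESCRIPTIVES
  /POSTHOC=TUKEY ALPHA(0.05).
```

Cases with a missing **rep78** are dropped by the selection as well, because a logical expression that evaluates to missing does not select the case.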

## 8. Problems to look out for

- Be sure to check the N when you do correlations, regression or ANOVA. It is possible to have seemingly small amounts of missing data for each variable, but with listwise deletion you may have very few remaining cases.
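One way to see in advance how many cases listwise deletion would leave is to count the missing values per case. The sketch below assumes an analysis involving **price**, **mpg**, **weight** and **rep78**; the variable name `nmissing` is hypothetical and can be anything not already in the data file.

```spss
* Count how many of the analysis variables are missing for each case.
COUNT nmissing = price mpg weight rep78 (MISSING).
* Cases with nmissing equal to 0 are the ones that survive listwise deletion.
FREQUENCIES /VARIABLES=nmissing.
```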


## 9. For more information

- For more information on descriptive statistics, see the SPSS Learning Module Descriptive Statistics in SPSS.
- For more information about **t-test**, **descriptives**, **crosstabs**, **correlations**, **regression** and **anova**, please see the appropriate chapters in the SPSS Command Syntax Reference Guide.