SPSS Class Notes: Analyzing Data

1.0 SPSS commands used in this unit

crosstabs	Crosstabulations
t-test	t-tests
glm	General linear models
regression	OLS regressions
pplot	Normal probability plot
logistic	Logistic regressions
npar	Non-parametric tests

2.0 Demonstration and explanation

For this section we will be using the hs1.sav data set that we worked with in previous sections.

File
 Open
  Data
   select C:\spss_data\hs1.sav

2.1 Chi-square

The chi-square test is used to determine if there is a relationship between two categorical variables.

Analyze
 Descriptive Statistics
  Crosstabs...
   select prgtype for the rows and ses for the columns
    click on "Statistics"
     check the chi-square box
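
As a rough illustration of what the chi-square box computes, the following Python sketch works out the Pearson chi-square statistic by hand. The counts are made up for illustration and are not taken from hs1.sav, and the function name chi_square is hypothetical.

```python
def chi_square(table):
    """Return (statistic, degrees of freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were unrelated
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Made-up 2 x 2 table of counts (an assumption, not hs1.sav data)
counts = [[10, 20], [20, 10]]
stat, df = chi_square(counts)
print(round(stat, 4), df)
```

A larger statistic relative to its degrees of freedom indicates a larger departure from what independence between the two variables would predict.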

2.2 t-tests

This is the one-sample t-test, testing whether the sample of writing scores was drawn from a population with a mean of 50.

Analyze
 Compare Means
  One Sample t-test
   select write and compare it to 50
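
The arithmetic behind the one-sample t-test can be sketched in a few lines of Python. The scores below are made up for illustration (they are not the write variable from hs1.sav), and one_sample_t is a hypothetical helper name.

```python
import math
import statistics

def one_sample_t(values, test_value):
    """Return (t statistic, degrees of freedom) for H0: mean == test_value."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)        # sample SD, n - 1 in the denominator
    standard_error = sd / math.sqrt(n)
    t = (mean - test_value) / standard_error
    return t, n - 1

# Made-up scores (an assumption, not hs1.sav data)
scores = [52, 54, 56, 58, 60]
t, df = one_sample_t(scores, 50)
print(round(t, 4), df)
```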

This is the two-sample independent t-test. The SPSS output reports results both for equal variances assumed and for separate (unequal) variances.

Analyze
 Compare Means
  Independent Samples t-test
   select write as the dependent variable and female as the independent 
   variable
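
The separate-variances form of the independent-samples t statistic (often called Welch's t-test) can be sketched as follows. This is a minimal illustration on made-up numbers, not hs1.sav data, and welch_t is a hypothetical helper name.

```python
import math
import statistics

def welch_t(a, b):
    """Return (t, df) for two independent samples without pooling variances."""
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    n_a, n_b = len(a), len(b)
    se_squared = var_a / n_a + var_b / n_b
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se_squared)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = se_squared ** 2 / (
        (var_a / n_a) ** 2 / (n_a - 1) + (var_b / n_b) ** 2 / (n_b - 1)
    )
    return t, df

# Made-up groups (an assumption, not hs1.sav data)
group_0 = [1, 2, 3, 4, 5]
group_1 = [2, 4, 6, 8, 10]
t, df = welch_t(group_0, group_1)
print(round(t, 4), round(df, 4))
```

Because the variances are not pooled, the degrees of freedom are generally not a whole number.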

This is the paired t-test, testing whether or not the mean of write equals the mean of science.

Analyze
 Compare Means
  Paired Samples t-test
   select write and science

2.2 ANOVA

In this example the glm command is used to perform a one-way analysis of variance (ANOVA).

Analyze
 General Linear Model
  Univariate
   select write as the dependent variable and prog as the fixed factor
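
To show what the one-way ANOVA F statistic measures, here is a small Python sketch that splits the total variation into between-group and within-group parts. The three groups are made up for illustration (they are not the prog groups from hs1.sav), and one_way_anova is a hypothetical helper name.

```python
import statistics

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of values."""
    all_values = [x for group in groups for x in group]
    grand_mean = statistics.mean(all_values)
    group_means = [statistics.mean(group) for group in groups]
    ss_between = sum(
        len(group) * (m - grand_mean) ** 2
        for group, m in zip(groups, group_means)
    )
    ss_within = sum(
        (x - m) ** 2
        for group, m in zip(groups, group_means)
        for x in group
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Made-up groups (an assumption, not hs1.sav data)
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f, df_between, df_within = one_way_anova(groups)
print(round(f, 4), df_between, df_within)
```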

In this example the glm command is used to perform a two-way analysis of variance (ANOVA). The plot option creates plots of the means, which can be a great visual aid to understanding the data.

Analyze
 General Linear Model
  Univariate
   select write as the dependent variable and prog and ses as fixed factors
    Plots
     select prog to be the X axis and ses to be the separate lines
      Add
       Continue

The Tukey test is used to test all the pair-wise comparisons of the levels of prog.

Repeat the above analysis (Dialog Recall)
 Post Hoc
  select prog and choose Tukey test

Here the glm command performs an analysis of covariance (ANCOVA). Note that the results are exactly the same as in the regression where math is regressed on write and science.

Analyze
 General Linear Model
  Univariate
   select math as the dependent variable and science and write as covariates
    Model
     select Custom
      choose main effects in the build terms field and select every variable in 
      the Factors & Covariates field and move them to the Model field

2.3 Regression

This is plain old OLS regression.

Analyze
 Regression
  Linear
   select math as the dependent variable and write and science as independent 
   variables

It is often very useful to look at the standardized residual versus standardized predicted plot in order to look for outliers and to check for homogeneity of variance. Ideally, no observations fall beyond the reference lines, which suggests that there are no outliers. We would also like the points on the plot to be scattered randomly, which suggests that the model has captured the systematic variance.

Analyze
 Regression
  Linear
   select math as the dependent variable and female, write and socst
   as independent variables
    Save
     check Unstandardized under Residuals
      Continue
       Plots...
        select *ZRESID for the Y axis and *ZPRED for the X axis
         Continue
          OK
           Double click on the plot
            Options
             Y Axis Reference Line
              add a line at Y = -2.5
               Apply
                add a line at Y = 2.5
                 Apply
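
The idea behind the reference lines at -2.5 and 2.5 can be sketched in Python. This minimal example uses a single predictor and made-up data (not hs1.sav), and it assumes that a standardized residual is the raw residual divided by the standard error of the estimate. The helper name standardized_residuals is hypothetical.

```python
import math

def standardized_residuals(x, y):
    """Fit y = a + b*x by least squares and return each residual divided by
    the standard error of the estimate, sqrt(SSE / (n - 2))."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    return [e / s for e in residuals]

# Made-up data: y equals x except for one inflated point at index 5
x = list(range(1, 13))
y = [float(v) for v in x]
y[5] += 12

z = standardized_residuals(x, y)
# Indexes of observations that would fall outside the reference lines
flagged = [i for i, zi in enumerate(z) if abs(zi) > 2.5]
print(flagged)
```

Only the inflated observation falls outside the band, which is the kind of point the plot is meant to expose.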

The P-P plots command produces a normal probability plot. It is a graphical method of checking whether the residuals from the regression are normally distributed.

Analyze
 Descriptive Statistics
  P-P plots
   select res_1 and the test distribution to be "normal"

The Q-Q plots command produces a normal quantile plot. It is another method for checking whether the residuals are normally distributed. The normal quantile plot is more sensitive to deviations from normality in the tails of the distribution, whereas the normal probability plot is more sensitive to deviations near the mean of the distribution.

Analyze
 Descriptive Statistics
  Q-Q plots
   Select res_1 and the test distribution to be "normal"

2.4 Logistic regression

Logistic regression requires a dependent variable that is dichotomous (i.e., has only two values). As we do not have such a variable in our data set, we will create one called honcomp (honors composition). This is for illustrative purposes only!

Transform
 Compute
  select honcomp for the "target variable" and for numeric expression enter 
  "write >= 60".

Analyze
 Regression
  Binary Logistic
   select honcomp as the dependent variable, and select read and socst as 
   covariates
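
The honcomp variable used as the dependent variable above is just a 0/1 indicator. A minimal Python sketch of the same rule, using made-up scores rather than hs1.sav, shows how the boundary value of 60 is treated.

```python
def honcomp(write):
    """Return 1 when the writing score is 60 or higher, otherwise 0."""
    return 1 if write >= 60 else 0

# Made-up writing scores (an assumption, not hs1.sav data)
scores = [45, 59, 60, 61]
flags = [honcomp(w) for w in scores]
print(flags)  # [0, 0, 1, 1]
```

A score of exactly 60 counts as honors composition under this rule.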

2.5 Non-parametric tests

The binomial test is the nonparametric analog of the single-sample two-sided t-test.

Analyze
 Nonparametric Tests
  Binomial 
   select write and define the cut point to be 50
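
A rough Python sketch of the binomial test with a cut point follows. It assumes that values less than or equal to the cut point form one group and larger values form the other, and that the test proportion is .50. The scores are made up (not hs1.sav data) and the helper name is hypothetical.

```python
from math import comb

def binomial_two_sided_p(k, n):
    """Exact two-sided p-value for k of n cases in one group when each case
    is equally likely to fall in either group (test proportion 0.5)."""
    smaller = min(k, n - k)
    tail = sum(comb(n, i) for i in range(smaller + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Made-up scores (an assumption, not hs1.sav data)
scores = [42, 47, 50, 52, 55, 57, 59, 61, 63, 65]
cut_point = 50
above = sum(1 for s in scores if s > cut_point)
p = binomial_two_sided_p(above, len(scores))
print(above, p)
```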

The Wilcoxon signed-rank test is the nonparametric analog of the paired t-test.

Analyze
 Nonparametric Tests
  2 Related Samples 
   select write and read as the test pair list and select Wilcoxon 
   as the test type

The Mann Whitney U test is the nonparametric analog of the independent two-sample t-test.

Analyze
 Nonparametric Tests
  2 Independent Samples 
   select write as the test variable list, 
   select female as the group variable
    click on Define Groups and enter 0 and 1
     Continue
      select Mann Whitney U as the test type
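
The Mann Whitney U statistic is built from the ranks of the pooled observations. The Python sketch below computes it for two made-up groups (not hs1.sav data), giving tied values their average rank. It returns the smaller of the two U values, a common convention that is assumed here; the helper names are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks

def mann_whitney_u(a, b):
    """Return the smaller of the two U statistics for groups a and b."""
    ranks = average_ranks(list(a) + list(b))
    rank_sum_a = sum(ranks[:len(a)])
    u_a = rank_sum_a - len(a) * (len(a) + 1) / 2
    u_b = len(a) * len(b) - u_a
    return min(u_a, u_b)

# Made-up groups (an assumption, not hs1.sav data)
group_0 = [3, 5, 8]
group_1 = [4, 6, 7, 9]
u = mann_whitney_u(group_0, group_1)
print(u)
```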

The Kruskal Wallis test is the nonparametric analog of the one-way ANOVA.

Analyze
 Nonparametric Tests
  K Independent Samples 
   select write as the test variable list and select prog as the group variable
    click on Define Range and enter 1 for the Minimum and 3 for the Maximum
     Continue

3.0 Syntax version

get file "c:\spss_data\hs1.sav".

* chi-square test.
crosstabs
  /tables prgtype by ses
  /statistic = chisq.

* t-tests.
t-test
  /testval=50
  /variables=write.

t-test
  groups=female(0 1)
  /variables=write.

t-test
  pairs= write with science (paired).

* anova.
glm
  write  by prog
  /design = prog.

glm
  write  by prog ses
  /design = prog, ses, prog*ses
  /plot = profile(prog*ses).

glm
  write  by prog ses
  /design = prog, ses, prog*ses
  /posthoc = prog(tukey).
  
* ancova.
glm
 math with science write
 /design= science write.

* regression.
regression
  /dependent math
  /method=enter write science.

regression
  /dependent math
  /method=enter female write socst
  /scatterplot=(*zresid ,*zpred )
  /save resid.

*residual plots.
pplot
  /variables=res_1
  /type=p-p
  /dist=normal.

pplot
  /variables=res_1
  /type=q-q
  /dist=normal.

* creating a dichotomous variable.
compute honcomp = (write >= 60).
execute.

* logistic regression.
logistic regression var=honcomp
  /method=enter read socst.

* non-parametric tests.

* binomial test.
npar tests
  /binomial (.50)= write (50).

* wilcoxon signed-rank test.
npar tests
  /wilcoxon= write with read (paired).

* mann-whitney u test.
npar tests
  /m-w= write by female(0 1).

* kruskal-wallis test.
npar tests
  /k-w=write by prog(1 3).

4.0 For more information

Choosing the Correct Statistical Test in SPSS
Includes guidelines for choosing the correct non-parametric test
SPSS Frequently Asked Questions
Covers many different topics including: ANOVA, Generalized Linear Models (GLM) and linear regression
SPSS Regression Webbook
Includes such topics as diagnostics, categorical predictors, testing interactions and testing contrasts
SPSS Data Analysis Examples
Includes examples of common data analysis techniques
SPSS Annotated Output
Includes annotated output for descriptive statistics, correlation, regression and logistic regression
SPSS Library
Topics in ANOVA and other subjects