1.0 SAS statements and procs in this unit
| Procedure | Used in this unit for |
|---|---|
| proc ttest | t-tests: one-sample, paired and independent two-sample |
| proc freq | chi-squared tests |
| proc reg | simple and multiple regression |
| proc glm | ANOVA and ANCOVA models |
| proc logistic | logistic regression |
| proc npar1way | nonparametric tests (Wilcoxon rank-sum and Kruskal-Wallis) |
| proc univariate | sign and signed-rank tests |
2.0 Demonstration and explanation
2.1 Chi-squared test
Below we use proc freq to perform a chi-squared test (chisq) and to show the expected frequencies (expected) used to compute the test statistic.
proc freq data=in.hs1;
  table prgtype*ses / chisq expected;
run;
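To make the link between the expected frequencies and the test statistic more visible, two further table options can be added in a sketch like the one below. The deviation option prints each cell's observed minus expected frequency, and cellchi2 prints each cell's contribution, (observed - expected)**2 / expected, to the Pearson chi-square. The norow, nocol and nopercent options only suppress percentages, and the data set name chisq_stats is our own choice. As in the example above, the in libref is assumed to point to the folder holding hs1.

```sas
/* Sketch: show how each cell contributes to the Pearson chi-square */
ods output ChiSq=chisq_stats;   /* save the test table; data set name is our choice */
proc freq data=in.hs1;
  table prgtype*ses / chisq expected deviation cellchi2
                      norow nocol nopercent;
run;
```

Adding up the cellchi2 values over all cells should reproduce the Pearson Chi-Square value reported in the test table.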
2.2 T-tests
This is the one-sample t-test, testing whether the sample of writing scores was drawn from a population with a mean of 50. The H0= option on the proc ttest statement sets the hypothesized mean (the default is 0).
proc ttest data=in.hs1 H0=50;
  var write;
run;
This is the paired t-test, testing whether or not the mean of write equals the mean of read.
proc ttest data=in.hs1;
  paired write*read;
run;
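A paired t-test is the same as a one-sample t-test on the within-student differences, which can help when reading the paired output. The sketch below computes a difference variable in a data step and tests whether its mean is 0. Because paired write*read analyzes write minus read, the t value, degrees of freedom and p-value should match the paired test above. The data set name hs1_diff and the variable name d are hypothetical.

```sas
/* Sketch: a paired t-test equals a one-sample t-test on the differences */
data hs1_diff;                 /* hypothetical data set name */
  set in.hs1;
  d = write - read;            /* same direction as paired write*read */
run;

proc ttest data=hs1_diff h0=0; /* H0: mean difference = 0 */
  var d;
run;
```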
This is the independent two-sample t-test. The output includes the t-test assuming equal variances (the pooled test) and the t-test assuming unequal variances (the Satterthwaite test). The class statement is required; it names the variable (here female) that defines the two groups whose means of write are compared.
proc ttest data=in.hs1;
  class female;
  var write;
run;
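If you want to work with the equal- and unequal-variance results afterwards, an ods output statement can save them to data sets. In this sketch, TTests and Equality are the ODS table names proc ttest uses for the t-test results and for the folded F test of equal variances; the data set names tt and eqvar are our own.

```sas
/* Sketch: save the pooled and Satterthwaite t-tests and the folded F test */
ods output TTests=tt Equality=eqvar;   /* data set names tt and eqvar are our choice */
proc ttest data=in.hs1;
  class female;
  var write;
run;

proc print data=tt;     /* one row per method: Pooled and Satterthwaite */
run;

proc print data=eqvar;  /* folded F test of equal variances */
run;
```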
2.3 ANOVA
SAS has a procedure called proc anova, but it is intended for balanced designs, in which every cell of the ANOVA has the same number of observations. Proc glm is a more general procedure that handles both balanced and unbalanced designs (an unbalanced design has unequal numbers of observations in the cells).
In this example we use proc glm to perform a one-way analysis of variance. As with proc ttest, the class statement indicates that prog is a categorical variable. The ss3 option requests only the Type III sums of squares, which test each effect adjusted for all other effects in the model and are the ones usually reported for unbalanced designs.
proc glm data=in.hs1;
  class prog;
  model write = prog / ss3;
run;
quit;
Here proc glm performs an analysis of covariance (ANCOVA). In this example, prog is the categorical predictor and read is the continuous covariate.
proc glm data=in.hs1;
  class prog;
  model write = read prog / ss3;
run;
quit;
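The choice of sums of squares matters once the model has more than one effect. Type I (sequential) sums of squares depend on the order in which effects appear in the model statement, while Type III sums of squares adjust each effect for all the others. Requesting both, as sketched below, shows this for the ANCOVA: in a main-effects model like this one the two types agree for prog, the last effect listed, but generally differ for read, whose Type I sum of squares is not adjusted for prog. The data set name anova_ss is our own.

```sas
/* Sketch: compare sequential (Type I) and Type III sums of squares */
ods output ModelANOVA=anova_ss;  /* effect tests; data set name is our choice */
proc glm data=in.hs1;
  class prog;
  model write = read prog / ss1 ss3;
run;
quit;
```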
2.4 Regression
In this example we demonstrate how to set up an ordinary least squares (OLS) regression model. proc reg is a powerful and versatile procedure, and the following examples illustrate just a few of its many uses. Note that proc reg does not support a class statement. If you need a categorical predictor variable, use proc glm or create dummy variables in a data step.
proc reg data=in.hs1;
  model write = female read;
run;
quit;
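Because proc reg has no class statement, a categorical predictor such as prog has to be entered as indicator (dummy) variables. The sketch below assumes prog is coded 1, 2 and 3 (an assumption; check the coding in your copy of hs1) and uses prog = 1 as the reference group. The names hs1_dummy, prog2 and prog3 are hypothetical. Each logical comparison evaluates to 1 when true and 0 when false, the same device used to create honcomp in the logistic regression example.

```sas
/* Sketch: dummy-code prog (assumed coded 1, 2, 3) for use in proc reg */
data hs1_dummy;                  /* hypothetical data set name */
  set in.hs1;
  if not missing(prog) then do;  /* leave the dummies missing when prog is missing */
    prog2 = (prog = 2);          /* 1 if prog = 2, else 0 */
    prog3 = (prog = 3);          /* 1 if prog = 3, else 0 */
  end;
run;

proc reg data=hs1_dummy;
  model write = read prog2 prog3;  /* prog = 1 is the reference group */
  test prog2, prog3;               /* joint test that both dummy coefficients are 0 */
run;
quit;
```

The joint F test from the test statement should match the Type III test of prog in the proc glm ANCOVA shown earlier, provided both models use the same observations.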
If you are using SAS 9.3 or earlier, specifying plots=diagnostics on the proc reg statement produces a number of diagnostic graphs; SAS 9.4 produces these diagnostic plots by default. The output statement creates a new data set, called temp, which contains the predicted values (requested with the p= option) and the residuals (requested with the r= option). The proc print step then displays selected variables for the first 20 observations of temp.
proc reg data=in.hs1 plots=diagnostics;
  model math = write socst;
  output out=temp p=predict r=resid;
run;
quit;

proc print data=temp (obs=20);
  var math predict resid;
run;
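The residuals saved by r= are the observed values minus the predicted values saved by p=. The short data step below is a sketch that recomputes this relationship from temp and writes a note to the log if any observation disagrees; the data set name check, the variable diff and the 1e-8 tolerance are our own choices.

```sas
/* Sketch: confirm that resid = math - predict in the output data set temp */
data check;                            /* hypothetical data set name */
  set temp;
  if not missing(resid) then do;       /* resid is missing when math is missing */
    diff = resid - (math - predict);   /* should be 0 up to rounding */
    if abs(diff) > 1e-8 then put "NOTE: residual mismatch at " _n_= diff=;
  end;
run;
```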
2.5 Logistic regression
To demonstrate logistic regression, we create a dichotomous variable called honcomp (honors composition), which equals 1 when the logical test write >= 60 is true and 0 when it is false. This variable is created purely for illustration.
data hs2;
  set in.hs1;
  honcomp = (write >= 60);
run;
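One caution about this shortcut: SAS treats a missing numeric value as smaller than any number, so write >= 60 is false for a student with a missing write score, and honcomp would be set to 0 rather than missing. If write can be missing, a guarded version such as the sketch below keeps honcomp missing in those rows. It reuses the name hs2, so it can stand in for the data step above.

```sas
/* Sketch: create honcomp only when write is observed */
data hs2;
  set in.hs1;
  if not missing(write) then honcomp = (write >= 60);
  /* otherwise honcomp is left missing */
run;
```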
Proc logistic performs a logistic regression. By default, proc logistic models the probability of the lowest ordered response value, which for a 0/1 variable is 0. The descending option reverses this ordering so that the probability of honcomp = 1 is modeled, and the parameter estimates and odds ratios then describe the event of interest.
proc logistic data=hs2 descending;
  model honcomp = female read;
run;
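As an alternative to descending, the event= option on the response variable names the modeled category directly, which stays correct even if the coding of the response changes. The sketch below should give the same estimates as the descending version. It assumes honcomp is an unformatted 0/1 variable, so that its formatted value for the event is '1'; the data set name pe is our own.

```sas
/* Sketch: model Pr(honcomp = 1) by naming the event level explicitly */
ods output ParameterEstimates=pe;   /* data set name is our choice */
proc logistic data=hs2;
  model honcomp(event='1') = female read;
run;
```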
2.6 Nonparametric tests
The sign test is a nonparametric analog of the one-sample t-test; it tests the median rather than the mean. It appears in the Tests for Location table of the proc univariate output, and the value being tested is specified with the mu0= option on the proc univariate statement.
proc univariate data=in.hs1 mu0=50;
  var write;
run;
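Proc univariate prints a lot of output. When only the location tests are needed, an ods select statement can restrict the printed output to the Tests for Location table, which reports Student's t, the sign test and the signed rank test against mu0. The sketch below does this for write; the same ods select statement can be placed before the proc univariate step for the difference variable in the signed-rank example. The data set name loc is our own.

```sas
/* Sketch: print only the Tests for Location table */
ods select TestsForLocation;
ods output TestsForLocation=loc;   /* data set name is our choice */
proc univariate data=in.hs1 mu0=50;
  var write;
run;
```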
The Wilcoxon signed-rank test is a nonparametric analog of the paired t-test. To obtain it, first compute the difference between the two variables in a separate data step, then analyze the new difference variable with proc univariate. The signed-rank test is reported in the Tests for Location table of the output, in the row labeled Signed Rank.
data hs1c;
  set in.hs1;
  diff = read - write;
run;

proc univariate data=hs1c;
  var diff;
run;
The Wilcoxon rank-sum test (equivalent to the Mann-Whitney U test) is the nonparametric analog of the independent two-sample t-test.
proc npar1way data=in.hs1;
  class female;
  var write;
run;
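Without any analysis options, proc npar1way prints several different analyses. Adding the wilcoxon option, as in the sketch below, limits the output to the Wilcoxon analysis: with two groups this is the rank-sum test, and with more than two groups (as in the ses example) the same option produces the Kruskal-Wallis test. WilcoxonTest is the ODS table name we believe holds the two-sample test; the data set name wt is our own.

```sas
/* Sketch: request only the Wilcoxon (rank-sum) analysis */
ods output WilcoxonTest=wt;   /* two-sample test table; data set name is our choice */
proc npar1way data=in.hs1 wilcoxon;
  class female;               /* two groups: Wilcoxon rank-sum test */
  var write;
run;
```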
The Kruskal-Wallis test is the nonparametric analog of the one-way ANOVA.
proc npar1way data=in.hs1;
  class ses;
  var write;
run;
3.0 For more information
- The Little SAS Book, Fifth Edition
  - Chapter 9
- SAS Statistics by Example
  - Chapters 4-12
- Regression and ANOVA: An Integrated Approach Using SAS Software
- SAS System for Linear Models, Fourth Edition
- Logistic Regression Using the SAS System: Theory and Application
- Logistic Regression Examples Using the SAS System
- Choosing the Correct Statistical Test: includes guidelines for choosing the correct nonparametric test
- Data Analysis Examples: gives examples of common analyses and interpretation of the output
- Annotated Output: fully annotates the output from common statistical procedures
- SAS Frequently Asked Questions: covers many topics, including ANOVA, Generalized Linear Models (GLM), linear regression and logistic regression
- SAS Regression Webbook: includes topics such as diagnostics, categorical predictors, testing interactions and testing contrasts