1.0 SAS statements and procs in this unit
| Statement or proc | Description |
|---|---|
| proc contents | Contents of a SAS dataset |
| proc print | Displays the data |
| proc means | Descriptive statistics |
| proc univariate | More descriptive statistics |
| proc freq | Frequency tables, frequency charts, and crosstabs |
| ods | Output delivery system, creating output in various formats |
| proc corr | Correlation matrix and scatterplots |
| proc sgplot | Produces many types of plots |
2.0 Demonstration and explanation
We begin by submitting the options nocenter statement so that the output is left-justified. We use the libname statement to assign the libref in to the folder c:\sas_data\, which holds our SAS data files. We will continue to use the SAS data set hs0 that was created in the previous unit.
Before we start our statistical exploration we will look at the data using proc contents and proc print.
options nocenter;
libname in 'c:\sas_data\';
proc contents position data=in.hs0;
run;
*If we only want to print some observations, we can use the obs= option;
proc print data=in.hs0(obs=20);
run;
*If we only want to print some variables, we can use the var statement;
proc print data=in.hs0(obs=20);
  var gender id race ses schtyp prgtype read;
run;
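The obs= data set option can be combined with firstobs= to print a range of observations rather than only the first ones. A minimal sketch, assuming the same libref in and data set hs0 used above; observations 11 through 20 are printed because obs= gives the number of the last observation to process, not a count of observations.

```sas
options nocenter;
libname in 'c:\sas_data\';

/* firstobs= sets the first observation to read and obs= sets the   */
/* last one, so this prints observations 11 through 20 only.        */
proc print data=in.hs0(firstobs=11 obs=20);
  var id gender read;
run;
```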
Before we go any further, let’s use a data step to make a temporary copy of hs0, which we will still call hs0. We can then make changes to the temporary data set hs0 (stored in the work library) without changing the permanent data set in.hs0 in c:\sas_data\. If we decide later that we want to keep the changes, we can save hs0 as a permanent data set.
data hs0;
  set in.hs0;
run;
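If we later want to keep changes made to the temporary hs0, a data step whose output name uses the in libref writes it back to c:\sas_data\. A minimal sketch; the name hs0_modified is hypothetical, and writing to in.hs0 itself would overwrite the original permanent data set.

```sas
/* Save the temporary (work) copy as a permanent data set.           */
/* hs0_modified is a hypothetical name chosen for illustration;      */
/* using in.hs0 here would overwrite the original permanent file.    */
data in.hs0_modified;
  set hs0;
run;
```

The libref in must already be assigned by the libname statement for this to work.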
One of the basic descriptive statistics procedures in SAS is proc means. Below we get means for all of the numeric variables in hs0. Along with proc means, we also show the proc univariate output for read and write, which displays additional descriptive statistics.
proc means data=hs0;
run;
proc univariate data=hs0;
  var read write;
run;
With the var statement, we can specify which variables we want to analyze. The statistic keywords n, mean, median, std and var on the proc means statement let us choose which statistics are computed. By default (with no statistic keywords), SAS reports the number of nonmissing values, mean, standard deviation, minimum and maximum.
proc means data=hs0 n mean median std var;
  var read math science write;
run;
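Other statistic keywords can be listed the same way. A sketch, assuming the temporary hs0 data set created above: nmiss counts missing values, q1 and q3 give the lower and upper quartiles, and maxdec= limits the number of decimal places displayed.

```sas
/* Five-number summary plus counts of nonmissing and missing values; */
/* maxdec=2 displays the statistics with at most 2 decimal places.   */
proc means data=hs0 n nmiss min q1 median q3 max maxdec=2;
  var read math science write;
run;
```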
We use the where statement below to subset the data and examine just those students with a reading score of 60 or higher.
proc means data=hs0 n mean median std var;
  where read >= 60;
  var read math science write;
run;
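The where statement also accepts compound conditions joined with and or or. For example, a sketch that restricts the analysis to students who scored 60 or higher on both read and write (the cutoff for write is chosen only for illustration):

```sas
/* Only observations that meet both conditions are analyzed. */
proc means data=hs0 n mean median std var;
  where read >= 60 and write >= 60;
  var read math science write;
run;
```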
With the class statement, we get the descriptive statistics broken down by the levels of a categorical variable. Here we will get the statistics for read, math, science and write for each level of prgtype.
proc means data=hs0 n mean median std var;
  class prgtype;
  var read math science write;
run;
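The ods statement listed at the start of this unit can route the same output to a file. A minimal sketch, assuming a PDF file is wanted; the path and file name means_by_prgtype.pdf are hypothetical, and the destination must be closed before the file can be opened.

```sas
/* Send output to a PDF file (hypothetical path and file name). */
ods pdf file='c:\sas_data\means_by_prgtype.pdf';

proc means data=hs0 n mean median std var;
  class prgtype;
  var read math science write;
run;

/* Close the destination so the file is written and released. */
ods pdf close;
```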
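We can use proc univariate to get a histogram of write with a fitted normal curve overlaid. The noprint option suppresses the tables of descriptive statistics that the proc univariate statement would otherwise produce; remove it to get detailed descriptive statistics for write as well.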
proc univariate data=hs0 noprint;
  var write;
  histogram / normal;
run;
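A normal quantile-quantile plot is another way to judge whether write is approximately normal: points falling close to the reference line suggest normality. A sketch using the qqplot statement, with the reference line's mean and standard deviation estimated from the data (mu=est sigma=est):

```sas
proc univariate data=hs0 noprint;
  var write;
  /* Normal Q-Q plot; the reference line uses the sample mean and SD. */
  qqplot write / normal(mu=est sigma=est);
run;
```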
Below we use proc freq to get a frequency table for ses. The second example uses proc freq to produce a bar chart and cumulative frequency graph in addition to the frequency table for ses.
proc freq data=hs0;
  table ses;
run;
proc freq data=hs0;
  table ses / plots=(freqplot cumfreqplot);
run;
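The plots= requests in proc freq need ODS Graphics to be enabled, which may not be the case in every session. A sketch that turns it on and passes options to the frequency plot: scale=percent plots percentages instead of counts, and orient=horizontal draws the bars horizontally.

```sas
/* ODS Graphics must be on for plots= requests in proc freq. */
ods graphics on;

proc freq data=hs0;
  /* Bar chart of percentages instead of counts, drawn horizontally. */
  table ses / plots=freqplot(scale=percent orient=horizontal);
run;
```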
Here we use proc freq to get frequencies for gender, schtyp and prgtype, each table shown separately.
proc freq data=hs0;
  table gender schtyp prgtype;
run;
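By default each one-way table also shows cumulative frequencies and percentages, and missing values are reported only as a count below the table. A sketch of two table options that change this: nocum drops the cumulative columns, and missing treats missing values as a level of the variable.

```sas
proc freq data=hs0;
  /* nocum: omit cumulative columns; missing: count missing as a level */
  table gender schtyp prgtype / nocum missing;
run;
```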
Below we show how to get a crosstab of prgtype by ses.
proc freq data=hs0;
  table prgtype*ses;
run;
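Each cell of the crosstab shows the frequency along with the overall, row, and column percentages. When only some of these are needed, options on the table statement suppress the rest; a sketch that keeps only the counts and row percentages:

```sas
proc freq data=hs0;
  /* Suppress overall and column percentages; keep counts and row percents */
  table prgtype*ses / nopercent nocol;
run;
```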
proc corr is used to get correlations among two or more variables. By default, proc corr uses pairwise deletion for missing values. If you use the nomiss option, proc corr uses listwise deletion and omits all observations with missing data on any of the named variables.
proc corr data=hs0;
  var write read science;
run;
proc corr data=hs0 nomiss;
  var write read science;
run;
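When only the correlations between one set of variables and another are of interest, the with statement limits the output to those pairs instead of the full matrix. A sketch, using math (also in hs0, as in the proc means examples above) as an additional variable:

```sas
/* Rows: science and math (with statement); columns: write and read (var). */
proc corr data=hs0 nomiss;
  var write read;
  with science math;
run;
```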
In the first example below, we use proc corr with the plots=matrix option to generate a scatterplot matrix. In the second, we use proc sgplot to get a scatterplot showing the relationship between the two variables write and read.
proc corr data=hs0 nomiss plots=matrix;
  var write read science;
run;
proc sgplot data=hs0;
  scatter x=read y=write;
run;
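The matrix plot request also accepts options; for example, histogram places a histogram of each variable on the diagonal of the scatterplot matrix. A sketch:

```sas
/* Scatterplot matrix with a histogram of each variable on the diagonal. */
proc corr data=hs0 nomiss plots=matrix(histogram);
  var write read science;
run;
```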
We can also use the markerchar= option to plot the value of the id variable in place of the default marker symbols. This is especially useful for identifying outliers or other interesting observations.
proc sgplot data=hs0;
  scatter x=write y=read / markerchar=id;
run;
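An alternative to markerchar= is datalabel=, which keeps the usual markers and writes the id value next to each one. A sketch:

```sas
proc sgplot data=hs0;
  /* Keep the default markers and label each point with its id value. */
  scatter x=write y=read / datalabel=id;
run;
```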
We can also create a scatter plot in which the appearance of each marker identifies a characteristic of the observation being plotted. Below we use the group= option so that observations are displayed with different marker colors (and, depending on the ODS style, symbols) according to the gender of the subject. This can be used to check whether the relationship between write and read is linear within each gender group.
proc sgplot data=hs0;
  scatter x=write y=read / group=gender;
run;
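To make the linearity check more direct, a loess smooth can be overlaid for each gender group; a curve that bends noticeably suggests a nonlinear relationship in that group. A sketch, assuming your release of proc sgplot supports group= on the loess statement:

```sas
proc sgplot data=hs0;
  scatter x=write y=read / group=gender;
  /* One loess curve per gender; nomarkers avoids plotting points twice. */
  loess x=write y=read / group=gender nomarkers;
run;
```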
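Here are a couple more examples of the flexibility of proc sgplot. The first produces a bar chart of the mean of write for each level of ses, with upper and lower confidence limits for each mean. The second overlays a normal density curve and a kernel density estimate on a histogram of read.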
proc sgplot data=hs0;
  vbar ses / response=write stat=mean limits=both;
run;
proc sgplot data=hs0;
  histogram read;
  density read / type=normal;
  density read / type=kernel;
run;
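Box plots are another option. A sketch using the vbox statement to compare the distribution of write across the levels of ses:

```sas
proc sgplot data=hs0;
  /* One box per level of ses, summarizing the distribution of write. */
  vbox write / category=ses;
run;
```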
3.0 For more information
- The Little SAS Book, Fifth Edition
  - Chapters 4 and 8
- Just Enough SAS: A Quick-start Guide to SAS for Engineers
  - Chapters 5-6
- SAS Statistics by Example
  - Chapters 2-3