#### 1.0 SAS statements and procs in this unit

| Statement or procedure | Purpose |
| --- | --- |
| proc contents | Contents of a SAS dataset |
| proc print | Displays the data |
| proc means | Descriptive statistics |
| proc univariate | More descriptive statistics |
| proc freq | Frequency tables, frequency charts, and crosstabs |
| ods | Output delivery system, creating output in various formats |
| proc corr | Correlation matrix and scatterplots |
| proc sgplot | Produces many types of plots |

#### 2.0 Demonstration and explanation

We begin by submitting **options nocenter** so that the output is left-justified. We use the **libname** statement to point to a folder of SAS data files. We will continue to use the SAS dataset **hs0** that was created in the previous unit.

Before we start our statistical exploration we will look at the data using **proc contents** and **proc print**.

    options nocenter;
    libname in 'c:\sas_data\';
    proc contents position data=in.hs0;
    run;

    * If we only want to print some observations, we can use the obs= option;
    proc print data=in.hs0 (obs=20);
    run;

    * If we only want to print some variables, we can use the var statement;
    proc print data=in.hs0 (obs=20);
      var gender id race ses schtyp prgtype read;
    run;

Before we go any further, let's use a data step to make a temporary copy of **hs0**, which we will still call **hs0**. Because the new data set is named without a library prefix, SAS stores it in the temporary **work** library. We can now change the temporary data set **hs0** without changing the permanent data set in **c:\sas_data\hs0**. If we decide later that we want to keep the changes, we can save **hs0** as a permanent data set.

    data hs0;
      set in.hs0;
    run;
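
Saving the temporary copy back to a permanent data set could look like the sketch below. It assumes the **libname in** statement from earlier in this unit has already been submitted. The name `hs0_edited` is hypothetical, chosen here so that the original permanent **hs0** is not overwritten.

```sas
/* Hypothetical name hs0_edited: writes the temporary copy into the folder  */
/* that the libname "in" points to, leaving the original in.hs0 untouched. */
data in.hs0_edited;
  set hs0;
run;
```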

One of the basic descriptive statistics commands in SAS is **proc means**. Below we get means for all of the numeric variables. Along with **proc means**, we also show the **proc univariate** output, which displays additional descriptive statistics.

    proc means data=hs0;
    run;

    proc univariate data=hs0;
      var read write;
    run;

With the **var** statement, we can specify which variables we want to analyze. The **n mean median std var** options on the **proc means** statement indicate which statistics we want computed (as an option, **var** requests the variance). By default, SAS outputs the sample size, mean, standard deviation, minimum, and maximum.

    proc means data=hs0 n mean median std var;
      var read math science write;
    run;

We use the **where** statement below to subset the data and examine just those students with a reading score of 60 or higher.

    proc means data=hs0 n mean median std var;
      where read>=60;
      var read math science write;
    run;

With the **class** statement, we get the descriptive statistics broken down by levels of the categorical variable. Here we will get the means of **read**, **math**, **science** and **write** by **prgtype**.

    proc means data=hs0 n mean median std var;
      class prgtype;
      var read math science write;
    run;

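We can also use **proc univariate** to draw a histogram of **write** with a normal overlay. The **noprint** option used below suppresses the tables of descriptive statistics, so only the graph is produced; remove it to get the detailed statistics along with the histogram.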

    proc univariate data=hs0 noprint;
      var write;
      histogram / normal;
    run;

Below we use **proc freq** to get a frequency table for **ses**. The second example adds the **plots=freqplot** option, which produces a frequency plot (a bar chart) in addition to the table; here it is requested for the crosstab of **prgtype** by **ses**.

    proc freq data=hs0;
      table ses;
    run;

    proc freq data=hs0;
      table prgtype*ses / plots=freqplot;
    run;
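
For a single variable, a cumulative frequency plot can be requested alongside the frequency plot. The following is a minimal sketch, assuming ODS Graphics is available in your session (the `ods graphics on;` line is included in case it is not already enabled):

```sas
ods graphics on;  /* the plots= option requires ODS Graphics */

proc freq data=hs0;
  /* one-way table for ses with a frequency plot and a cumulative frequency plot */
  table ses / plots=(freqplot cumfreqplot);
run;
```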

Here we use **proc freq** to get frequencies for **gender**, **schtyp** and **prgtype**, each table shown separately.

    proc freq data=hs0;
      table gender schtyp prgtype;
    run;

Below we show how to get a crosstab of **prgtype** by **ses**.

    proc freq data=hs0;
      table prgtype*ses;
    run;
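
By default, each cell of a crosstab shows the frequency, the overall percent, the row percent, and the column percent. If only the counts are of interest, the extra statistics can be suppressed with table options, as in this short sketch:

```sas
proc freq data=hs0;
  /* keep only the cell counts: drop row, column, and overall percentages */
  table prgtype*ses / norow nocol nopercent;
run;
```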

**proc corr** is used to get correlations among two or more variables. By default, **proc corr** uses pairwise deletion for missing values, so each correlation is computed from all observations that have values for that pair of variables. If you use the **nomiss** option, **proc corr** uses listwise deletion and omits all observations with missing data on any of the named variables.

    proc corr data=hs0;
      var write read science;
    run;

    proc corr data=hs0 nomiss;
      var write read science;
    run;
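
The difference between pairwise and listwise deletion is easiest to see on a tiny data set. The sketch below uses made-up values in a hypothetical data set named `corr_demo` (not part of **hs0**), with one missing value in `b` and one in `c`:

```sas
/* Made-up illustration data, not part of hs0 */
data corr_demo;
  input a b c;
  datalines;
1 2 3
2 . 5
3 5 4
4 7 .
5 9 8
6 10 9
;
run;

/* Pairwise deletion (default): a-b uses 5 observations, */
/* a-c uses 5 observations, and b-c uses 4 observations. */
proc corr data=corr_demo;
  var a b c;
run;

/* Listwise deletion: only the 4 complete observations are used throughout. */
proc corr data=corr_demo nomiss;
  var a b c;
run;
```

In the default output, the number of observations is reported for each pair because it differs across pairs; with **nomiss**, a single N applies to every correlation.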

In the example below we use **proc corr** to generate a scatterplot matrix. In the second example below, we use **proc sgplot** to get a scatterplot showing the relationship between the two variables **write** and **read**.

    proc corr data=hs0 nomiss plots=matrix;
      var write read science;
    run;

    proc sgplot data=hs0;
      scatter x=read y=write;
    run;

We can also modify the symbol with the **markerchar** option to use the **id** variable instead of dots. This is especially useful to identify outliers or other interesting observations.

    proc sgplot data=hs0;
      scatter x=write y=read / markerchar=id;
    run;

We can also create a scatterplot in which different symbols identify a characteristic of each observation being plotted. Below we use the **group** option to mark observations according to the gender of the subjects. This can be used to check whether the relationship between **write** and **read** is linear for each gender group.

    proc sgplot data=hs0;
      scatter x=write y=read / group=gender;
    run;
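
To make the linearity check easier to read, a fitted line can be drawn for each group. This is one possible sketch using the **reg** statement of **proc sgplot**, which plots the points and overlays a separate regression line for each level of **gender**:

```sas
proc sgplot data=hs0;
  /* reg draws the scatter points and a fitted line for each gender group */
  reg x=write y=read / group=gender;
run;
```

Points that bend away from their group's line suggest that a straight-line relationship may not hold for that group.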

Here are a couple more examples of the flexibility of **proc sgplot**.

    proc sgplot data=hs0;
      vbar ses / response=write stat=mean limits=both;
    run;

    proc sgplot data=hs0;
      histogram read;
      density read / type=normal;
      density read / type=kernel;
    run;
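
The **ods** statement listed at the start of this unit can route the output of any of these procedures to a file. The following is a minimal sketch that writes a small **proc means** table to a PDF. The file path is hypothetical and should be changed to a folder that exists on your machine.

```sas
/* Hypothetical output path; change it to a folder that exists on your machine */
ods pdf file='c:\sas_data\hs0_means.pdf';

proc means data=hs0 n mean std;
  var read write;
run;

ods pdf close;  /* closing the destination finishes writing the file */
```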

#### 3.0 For more information

**The Little SAS Book, Fifth Edition** - Chapters 4 and 8

**Just Enough SAS: A Quick-start Guide to SAS for Engineers** - Chapters 5-6

**SAS Statistics by Example** - Chapters 2-3