1. Introduction
This module shows how to obtain basic descriptive statistics using SAS. We illustrate this using a data file about 26 automobiles with their make, price, mpg, repair record, and whether the car is foreign or domestic. The data file is shown below.
MAKE    PRICE  MPG  REP78  FOREIGN
AMC      4099   22      3        0
AMC      4749   17      3        0
AMC      3799   22      3        0
Audi     9690   17      5        1
Audi     6295   23      3        1
BMW      9735   25      4        1
Buick    4816   20      3        0
Buick    7827   15      4        0
Buick    5788   18      3        0
Buick    4453   26      3        0
Buick    5189   20      3        0
Buick   10372   16      3        0
Buick    4082   19      3        0
Cad.    11385   14      3        0
Cad.    14500   14      2        0
Cad.    15906   21      3        0
Chev.    3299   29      3        0
Chev.    5705   16      4        0
Chev.    4504   22      3        0
Chev.    5104   22      2        0
Chev.    3667   24      2        0
Chev.    3955   19      3        0
Datsun   6229   23      4        1
Datsun   4589   35      5        1
Datsun   5079   24      4        1
Datsun   8129   21      4        1
The program below reads the data and creates a temporary data file called auto. The descriptive statistics shown in this module are all performed on this data file called auto.
DATA auto ;
  input MAKE $ PRICE MPG REP78 FOREIGN ;
DATALINES;
AMC 4099 22 3 0
AMC 4749 17 3 0
AMC 3799 22 3 0
Audi 9690 17 5 1
Audi 6295 23 3 1
BMW 9735 25 4 1
Buick 4816 20 3 0
Buick 7827 15 4 0
Buick 5788 18 3 0
Buick 4453 26 3 0
Buick 5189 20 3 0
Buick 10372 16 3 0
Buick 4082 19 3 0
Cad. 11385 14 3 0
Cad. 14500 14 2 0
Cad. 15906 21 3 0
Chev. 3299 29 3 0
Chev. 5705 16 4 0
Chev. 4504 22 3 0
Chev. 5104 22 2 0
Chev. 3667 24 2 0
Chev. 3955 19 3 0
Datsun 6229 23 4 1
Datsun 4589 35 5 1
Datsun 5079 24 4 1
Datsun 8129 21 4 1
;
RUN;

PROC PRINT DATA=auto(obs=10);
RUN;
The output of the proc print is shown below. You can compare the program above to the output below.
OBS    MAKE     PRICE    MPG    REP78    FOREIGN
  1    AMC       4099     22      3         0
  2    AMC       4749     17      3         0
  3    AMC       3799     22      3         0
  4    Audi      9690     17      5         1
  5    Audi      6295     23      3         1
  6    BMW       9735     25      4         1
  7    Buick     4816     20      3         0
  8    Buick     7827     15      4         0
  9    Buick     5788     18      3         0
 10    Buick     4453     26      3         0
2. Using proc freq for frequencies
We can use proc freq to produce frequency tables. Below, we use it to make frequency tables for make, rep78 and foreign.
PROC FREQ DATA=auto;
  TABLES make ;
RUN;

PROC FREQ DATA=auto;
  TABLES rep78 ;
RUN;

PROC FREQ DATA=auto;
  TABLES foreign ;
RUN;
Here is the output produced by the proc freq statements above.
                                Cumulative  Cumulative
MAKE      Frequency   Percent   Frequency    Percent
----------------------------------------------------
AMC               3      11.5           3       11.5
Audi              2       7.7           5       19.2
BMW               1       3.8           6       23.1
Buick             7      26.9          13       50.0
Cad.              3      11.5          16       61.5
Chev.             6      23.1          22       84.6
Datsun            4      15.4          26      100.0

                                Cumulative  Cumulative
REP78     Frequency   Percent   Frequency    Percent
---------------------------------------------------
    2             3      11.5           3       11.5
    3            15      57.7          18       69.2
    4             6      23.1          24       92.3
    5             2       7.7          26      100.0

                                Cumulative  Cumulative
FOREIGN   Frequency   Percent   Frequency    Percent
-----------------------------------------------------
      0          19      73.1          19       73.1
      1           7      26.9          26      100.0
Instead of running three separate proc freq steps, we could have done this all in one proc freq step as illustrated below. The output will be the same as shown above.
PROC FREQ DATA=auto;
  TABLES make rep78 foreign ;
RUN;
Let’s use proc freq to look at a cross tabulation of the repair history of the cars (rep78) for foreign and domestic cars (foreign). The proc freq statements for this are shown below. Note the asterisk (*) between the variables rep78 and foreign on the tables statement.
PROC FREQ DATA=auto;
  TABLES rep78*foreign ;
RUN;
This is the output produced.
TABLE OF REP78 BY FOREIGN

REP78     FOREIGN

Frequency|
Percent  |
Row Pct  |
Col Pct  |       0|       1|  Total
---------+--------+--------+
        2|      3 |      0 |      3
         |  11.54 |   0.00 |  11.54
         | 100.00 |   0.00 |
         |  15.79 |   0.00 |
---------+--------+--------+
        3|     14 |      1 |     15
         |  53.85 |   3.85 |  57.69
         |  93.33 |   6.67 |
         |  73.68 |  14.29 |
---------+--------+--------+
        4|      2 |      4 |      6
         |   7.69 |  15.38 |  23.08
         |  33.33 |  66.67 |
         |  10.53 |  57.14 |
---------+--------+--------+
        5|      0 |      2 |      2
         |   0.00 |   7.69 |   7.69
         |   0.00 | 100.00 |
         |   0.00 |  28.57 |
---------+--------+--------+
Total          19        7       26
            73.08    26.92   100.00
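Each cell of the crosstab holds four numbers, in the order given by the legend in the upper left corner: the frequency, the percent of the whole table, the row percent and the column percent. As a check on how to read them, the sketch below recomputes the three percentages for the cell where rep78 is 3 and foreign is 0 (14 cars), using the row total (15), column total (19) and table total (26) from the output. The variable names in this data step are our own, chosen only for illustration.

```sas
/* Recompute the percentages for the cell rep78=3, foreign=0.   */
/* All variable names here are illustrative, not part of auto.  */
DATA _NULL_;
  cell      = 14;   /* cars with rep78=3 and foreign=0 */
  row_total = 15;   /* all cars with rep78=3           */
  col_total = 19;   /* all cars with foreign=0         */
  n         = 26;   /* all cars in the table           */

  percent = 100 * cell / n;          /* share of the whole table */
  row_pct = 100 * cell / row_total;  /* share of the rep78=3 row */
  col_pct = 100 * cell / col_total;  /* share of the foreign=0 column */

  /* Write the three values to the SAS log */
  PUT percent= 6.2 row_pct= 6.2 col_pct= 6.2;
RUN;
```

The values written to the log should agree with the 53.85, 93.33 and 73.68 shown in that cell of the table.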
We can show just the cell percentages to make the table easier to read by using the norow, nocol and nofreq options on the tables statement to suppress the printing of the row percentages, column percentages and frequencies (leaving just the cell percentages). Note that the options come after the forward slash ( / ) on the tables statement.
PROC FREQ DATA=auto;
  TABLES rep78*foreign / NOROW NOCOL NOFREQ ;
RUN;
The output is shown below.
TABLE OF REP78 BY FOREIGN

REP78     FOREIGN

Percent |       0|       1|  Total
--------+--------+--------+
       2|  11.54 |   0.00 |  11.54
--------+--------+--------+
       3|  53.85 |   3.85 |  57.69
--------+--------+--------+
       4|   7.69 |  15.38 |  23.08
--------+--------+--------+
       5|   0.00 |   7.69 |   7.69
--------+--------+--------+
Total         19        7       26
           73.08    26.92   100.00
The order of the options does not matter. We would have gotten the same output had we written the command like this.
PROC FREQ DATA=auto;
  TABLES rep78*foreign / NOFREQ NOROW NOCOL ;
RUN;
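The same mechanism works for other combinations of options. As a sketch, assuming we wanted only the counts in each cell, the nopercent option (which suppresses the cell percentages) could be combined with norow and nocol. This variation is ours and is not one of the runs whose output is shown in this module.

```sas
/* Keep only the cell frequencies: suppress the row, column, */
/* and overall percentages in the crosstab.                   */
PROC FREQ DATA=auto;
  TABLES rep78*foreign / NOROW NOCOL NOPERCENT ;
RUN;
```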
3. Using proc means for summary statistics
Proc means can be used to produce summary statistics. Below, proc means is used to get descriptive statistics for the variable mpg.
PROC MEANS DATA=auto;
  VAR mpg;
RUN;
The results of the proc means are shown below.
Analysis Variable : MPG

 N          Mean       Std Dev       Minimum       Maximum
----------------------------------------------------------
26    20.9230769     4.7575042    14.0000000    35.0000000
----------------------------------------------------------
Suppose we would like to get the summary statistics separately for foreign and domestic cars (indicated by the variable foreign). We can use the class statement (shown below) to get separate results for the different values of foreign.
PROC MEANS DATA=auto;
  CLASS foreign ;
  VAR mpg;
RUN;
As you see below, the results of proc means are presented separately for the seven foreign cars (when foreign equals 1) and the 19 domestic cars (when foreign equals 0).
Analysis Variable : MPG

FOREIGN   N Obs    N     Mean     Std Dev    Minimum   Maximum
--------------------------------------------------------------
      0      19   19    19.78   4.0356598    14.0000     29.00
      1       7    7    24.00   5.5075705    17.0000     35.00
--------------------------------------------------------------
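By default, proc means reports the N, mean, standard deviation, minimum and maximum, as in the output above. If other statistics are wanted, statistic keywords can be listed on the proc means statement. The sketch below is one possible variation, not a run from this module: it adds the median, limits the output to two decimal places with maxdec, and summarizes price along with mpg.

```sas
/* Request specific statistics by keyword and round the output */
/* to two decimals. Adding price to VAR is our own choice.      */
PROC MEANS DATA=auto N MEAN MEDIAN STD MIN MAX MAXDEC=2;
  CLASS foreign ;
  VAR mpg price;
RUN;
```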
4. Using proc univariate for detailed summary statistics
You can use proc univariate to get more detailed summary statistics, as shown below.
PROC UNIVARIATE DATA=auto;
  VAR mpg;
RUN;
And here are the results of proc univariate.
Univariate Procedure

Variable=MPG

                Moments

N                26  Sum Wgts         26
Mean       20.92308  Sum             544
Std Dev    4.757504  Variance   22.63385
Skewness   0.935473  Kurtosis     1.7927
USS           11948  CSS        565.8462
CV         22.73807  Std Mean   0.933023
T:Mean=0   22.42503  Pr>|T|       0.0001
Num ^= 0         26  Num > 0          26
M(Sign)          13  Pr>=|M|      0.0001
Sgn Rank      175.5  Pr>=|S|      0.0001

            Quantiles(Def=5)

100% Max         35       99%         35
 75% Q3          23       95%         29
 50% Med         21       90%         26
 25% Q1          17       10%         15
  0% Min         14        5%         14
                           1%         14
Range            21
Q3-Q1             6
Mode             22

                Extremes

   Lowest    Obs     Highest    Obs
       14(     15)        24(     25)
       14(     14)        25(      6)
       15(      8)        26(     10)
       16(     18)        29(     17)
       16(     12)        35(     24)
We can use the class statement to obtain separate univariate results for foreign and domestic cars.
PROC UNIVARIATE DATA=auto;
  CLASS foreign;
  VAR mpg;
RUN;
As you see in the output below, you get a complete set of output for the case when foreign equals 0 and then another set of output when foreign equals 1.
The UNIVARIATE Procedure
Variable:  MPG
FOREIGN = 0

                           Moments

N                          19    Sum Weights                 19
Mean               19.7894737    Sum Observations           376
Std Deviation      4.03565976    Variance            16.2865497
Skewness             0.477379    Kurtosis            0.04119835
Uncorrected SS           7734    Corrected SS        293.157895
Coeff Variation    20.3929616    Std Error Mean      0.92584385

              Basic Statistical Measures

    Location                    Variability

Mean     19.78947     Std Deviation            4.03566
Median   20.00000     Variance                16.28655
Mode     22.00000     Range                   15.00000
                      Interquartile Range      6.00000

           Tests for Location: Mu0=0

Test           -Statistic-    -----p Value------

Student's t    t  21.37453    Pr > |t|    <.0001
Sign           M       9.5    Pr >= |M|   <.0001
Signed Rank    S        95    Pr >= |S|   <.0001

Quantiles (Definition 5)

Quantile      Estimate

100% Max            29
99%                 29
95%                 29
90%                 26
75% Q3              22
50% Median          20
25% Q1              16
10%                 14
5%                  14
1%                  14
0% Min              14

Variable:  MPG
FOREIGN = 0

        Extreme Observations

----Lowest----        ----Highest---

Value      Obs        Value      Obs

   14       15           22       19
   14       14           22       20
   15        8           24       21
   16       18           26       10
   16       12           29       17

Variable:  MPG
FOREIGN = 1

                           Moments

N                           7    Sum Weights                  7
Mean                       24    Sum Observations           168
Std Deviation      5.50757055    Variance            30.3333333
Skewness           1.34081176    Kurtosis            3.28605241
Uncorrected SS           4214    Corrected SS               182
Coeff Variation    22.9482106    Std Error Mean        2.081666

              Basic Statistical Measures

    Location                    Variability

Mean     24.00000     Std Deviation            5.50757
Median   23.00000     Variance                30.33333
Mode     23.00000     Range                   18.00000
                      Interquartile Range      4.00000

           Tests for Location: Mu0=0

Test           -Statistic-    -----p Value------

Student's t    t  11.52923    Pr > |t|    <.0001
Sign           M       3.5    Pr >= |M|   0.0156
Signed Rank    S        14    Pr >= |S|   0.0156

Quantiles (Definition 5)

Quantile      Estimate

100% Max            35
99%                 35
95%                 35
90%                 35
75% Q3              25
50% Median          23
25% Q1              21
10%                 17
5%                  17
1%                  17
0% Min              17

Variable:  MPG
FOREIGN = 1

        Extreme Observations

----Lowest----        ----Highest---

Value      Obs        Value      Obs

   17        4           23        5
   21       26           23       23
   23       23           24       25
   23        5           25        6
   24       25           35       24
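A by statement can produce the same kind of split analysis, but it expects the data to be sorted by the by variable first. Here is a minimal sketch, assuming a sorted copy of the data named auto_sorted (a name we made up for this example so that the original auto file is left untouched).

```sas
/* Sort a copy of auto by the grouping variable. */
/* auto_sorted is a hypothetical data set name.   */
PROC SORT DATA=auto OUT=auto_sorted;
  BY foreign;
RUN;

/* Run proc univariate once per value of foreign */
PROC UNIVARIATE DATA=auto_sorted;
  BY foreign;
  VAR mpg;
RUN;
```

The class statement used earlier avoids the extra sorting step, which is one reason to prefer it for a small example like this.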
5. Problems to look out for
If you make a crosstab with proc freq and one of the variables has a large number of values (say 10 or more), the crosstab table could be very hard to read. In such cases, try using the list option on the tables statement.
TABLES rep78*foreign / LIST ;
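The tables statement above is only a fragment. Placed inside a complete proc freq step for our auto data, it might look like the sketch below. With the list option, each combination of rep78 and foreign is printed as one row of a list rather than as a cell in a grid.

```sas
/* Crosstab of rep78 by foreign, displayed in list format */
PROC FREQ DATA=auto;
  TABLES rep78*foreign / LIST ;
RUN;
```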
When using the class or by statement in proc univariate, if you choose a grouping variable with a large number of values (say 5, 10, or more), it will produce a very large amount of output, because a complete set of results is printed for each value. In such cases, you may try to use proc means with a class statement instead of proc univariate.
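For example, make has seven values in our data, so grouping proc univariate by make would print seven full sets of results. A more compact alternative might be the following sketch, which gives one line of summary statistics for mpg per make.

```sas
/* One row of summary statistics for mpg for each make */
PROC MEANS DATA=auto;
  CLASS make ;
  VAR mpg;
RUN;
```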
6. For more information
For information on Statistical Tests in SAS, see the SAS Learning Module An Overview of Statistical Tests in SAS.
7. Web Notes
You can view the SAS program associated with this module by clicking descript.sas. While viewing the file, you can save it by choosing File then Save As from the pull-down menu of your web browser. In the Save As dialog box, change the file name to descript.sas, choose the directory where you want to save the file, and then click Save.