1. Introduction
This module illustrates how to obtain basic descriptive statistics using SAS. We use a data file of 26 automobiles that records the make, price, mpg, repair record, and foreign or domestic status of each car. The data file is shown below.
MAKE     PRICE  MPG  REP78  FOREIGN
AMC       4099   22      3        0
AMC       4749   17      3        0
AMC       3799   22      3        0
Audi      9690   17      5        1
Audi      6295   23      3        1
BMW       9735   25      4        1
Buick     4816   20      3        0
Buick     7827   15      4        0
Buick     5788   18      3        0
Buick     4453   26      3        0
Buick     5189   20      3        0
Buick    10372   16      3        0
Buick     4082   19      3        0
Cad.     11385   14      3        0
Cad.     14500   14      2        0
Cad.     15906   21      3        0
Chev.     3299   29      3        0
Chev.     5705   16      4        0
Chev.     4504   22      3        0
Chev.     5104   22      2        0
Chev.     3667   24      2        0
Chev.     3955   19      3        0
Datsun    6229   23      4        1
Datsun    4589   35      5        1
Datsun    5079   24      4        1
Datsun    8129   21      4        1
The program below reads the data and creates a temporary data file called auto. The descriptive statistics shown in this module are all performed on this data file called auto.
DATA auto ;
  input MAKE $ PRICE MPG REP78 FOREIGN ;
DATALINES;
AMC 4099 22 3 0
AMC 4749 17 3 0
AMC 3799 22 3 0
Audi 9690 17 5 1
Audi 6295 23 3 1
BMW 9735 25 4 1
Buick 4816 20 3 0
Buick 7827 15 4 0
Buick 5788 18 3 0
Buick 4453 26 3 0
Buick 5189 20 3 0
Buick 10372 16 3 0
Buick 4082 19 3 0
Cad. 11385 14 3 0
Cad. 14500 14 2 0
Cad. 15906 21 3 0
Chev. 3299 29 3 0
Chev. 5705 16 4 0
Chev. 4504 22 3 0
Chev. 5104 22 2 0
Chev. 3667 24 2 0
Chev. 3955 19 3 0
Datsun 6229 23 4 1
Datsun 4589 35 5 1
Datsun 5079 24 4 1
Datsun 8129 21 4 1
;
RUN;

PROC PRINT DATA=auto(obs=10);
RUN;
The output of the proc print is shown below. Because of the obs=10 option, only the first 10 observations are printed. You can compare the program above to the output below.
OBS  MAKE     PRICE  MPG  REP78  FOREIGN
  1  AMC       4099   22      3        0
  2  AMC       4749   17      3        0
  3  AMC       3799   22      3        0
  4  Audi      9690   17      5        1
  5  Audi      6295   23      3        1
  6  BMW       9735   25      4        1
  7  Buick     4816   20      3        0
  8  Buick     7827   15      4        0
  9  Buick     5788   18      3        0
 10  Buick     4453   26      3        0
2. Using proc freq for frequencies
We can use proc freq to produce frequency tables. Below, we use it to make frequency tables for make, rep78 and foreign.
PROC FREQ DATA=auto;
  TABLES make ;
RUN;

PROC FREQ DATA=auto;
  TABLES rep78 ;
RUN;

PROC FREQ DATA=auto;
  TABLES foreign ;
RUN;
Here is the output produced by the proc freq statements above.
                              Cumulative Cumulative
MAKE     Frequency   Percent   Frequency    Percent
---------------------------------------------------
AMC              3      11.5           3       11.5
Audi             2       7.7           5       19.2
BMW              1       3.8           6       23.1
Buick            7      26.9          13       50.0
Cad.             3      11.5          16       61.5
Chev.            6      23.1          22       84.6
Datsun           4      15.4          26      100.0

                              Cumulative Cumulative
REP78    Frequency   Percent   Frequency    Percent
---------------------------------------------------
2                3      11.5           3       11.5
3               15      57.7          18       69.2
4                6      23.1          24       92.3
5                2       7.7          26      100.0

                              Cumulative Cumulative
FOREIGN  Frequency   Percent   Frequency    Percent
---------------------------------------------------
0               19      73.1          19       73.1
1                7      26.9          26      100.0
Instead of having three separate proc freqs, we could have done this all in one proc freq step as illustrated below. The output will be the same as shown above.
PROC FREQ DATA=auto;
  TABLES make rep78 foreign ;
RUN;
Let’s use proc freq to look at a cross tabulation of the repair history of the cars (rep78) for foreign and domestic cars (foreign). The proc freq statements for this are shown below. Note the asterisk (*) between the variables rep78 and foreign on the tables statement.
PROC FREQ DATA=auto;
  TABLES rep78*foreign ;
RUN;
This is the output produced.
TABLE OF REP78 BY FOREIGN
REP78 FOREIGN
Frequency|
Percent |
Row Pct |
Col Pct | 0| 1| Total
---------+--------+--------+
2 | 3 | 0 | 3
| 11.54 | 0.00 | 11.54
| 100.00 | 0.00 |
| 15.79 | 0.00 |
---------+--------+--------+
3 | 14 | 1 | 15
| 53.85 | 3.85 | 57.69
| 93.33 | 6.67 |
| 73.68 | 14.29 |
---------+--------+--------+
4 | 2 | 4 | 6
| 7.69 | 15.38 | 23.08
| 33.33 | 66.67 |
| 10.53 | 57.14 |
---------+--------+--------+
5 | 0 | 2 | 2
| 0.00 | 7.69 | 7.69
| 0.00 | 100.00 |
| 0.00 | 28.57 |
---------+--------+--------+
Total 19 7 26
73.08 26.92 100.00
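To see where the percentages in each cell come from, the short data step below recomputes them for the cell where rep78 equals 3 and foreign equals 0. This is only an illustrative sketch: the variable names (cell, row_total, col_total, n) are made up for this example, and the counts are copied by hand from the table above.

```sas
DATA _NULL_;
  /* Counts copied by hand from the crosstab above (hypothetical variable names) */
  cell      = 14;   /* cars with rep78 = 3 and foreign = 0 */
  row_total = 15;   /* all cars with rep78 = 3             */
  col_total = 19;   /* all cars with foreign = 0           */
  n         = 26;   /* all cars in the table               */

  percent = 100 * cell / n;          /* cell percentage   */
  row_pct = 100 * cell / row_total;  /* row percentage    */
  col_pct = 100 * cell / col_total;  /* column percentage */

  /* Write the three percentages to the SAS log with two decimals */
  PUT percent= 6.2 row_pct= 6.2 col_pct= 6.2;
RUN;
```

The values written to the log should agree with the 53.85, 93.33 and 73.68 shown in that cell of the table.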
We can show just the cell percentages to make the table easier to read by using the norow, nocol and nofreq options on the tables statement to suppress the printing of the row percentages, column percentages and frequencies (leaving just the cell percentages). Note that the options come after the forward slash ( / ) on the tables statement.
PROC FREQ DATA=auto;
  TABLES rep78*foreign / NOROW NOCOL NOFREQ ;
RUN;
The output is shown below.
TABLE OF REP78 BY FOREIGN
REP78 FOREIGN
Percent | 0| 1| Total
--------+--------+--------+
2 | 11.54 | 0.00 | 11.54
--------+--------+--------+
3 | 53.85 | 3.85 | 57.69
--------+--------+--------+
4 | 7.69 | 15.38 | 23.08
--------+--------+--------+
5 | 0.00 | 7.69 | 7.69
--------+--------+--------+
Total 19 7 26
73.08 26.92 100.00
The order of the options does not matter. We would have gotten the same output had we written the command like this.
PROC FREQ DATA=auto;
  TABLES rep78*foreign / NOFREQ NOROW NOCOL ;
RUN;
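If you would rather keep the frequencies and drop the percentages, the same approach should work with the nopercent option in place of nofreq. The step below is a sketch that was not part of the original program for this module, so its output is not shown here.

```sas
PROC FREQ DATA=auto;
  /* Suppress row, column and cell percentages, leaving only the frequencies */
  TABLES rep78*foreign / NOROW NOCOL NOPERCENT ;
RUN;
```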
3. Using proc means for summary statistics
Proc means can be used to produce summary statistics. Below, proc means is used to get descriptive statistics for the variable mpg.
PROC MEANS DATA=auto;
  VAR mpg;
RUN;
The results of the proc means are shown below.
Analysis Variable : MPG

 N          Mean       Std Dev       Minimum       Maximum
----------------------------------------------------------
26    20.9230769     4.7575042    14.0000000    35.0000000
----------------------------------------------------------
Suppose we would like to get the summary statistics separately for foreign and domestic cars (indicated by the variable foreign). We can use the class statement (shown below) to get separate results for the different values of foreign.
PROC MEANS DATA=auto;
  CLASS foreign ;
  VAR mpg;
RUN;
As you see below, the results of proc means are presented separately for the seven foreign cars (when foreign equals 1) and the 19 domestic cars (when foreign equals 0).
Analysis Variable : MPG
FOREIGN N Obs N Mean Std Dev Minimum Maximum
-------------------------------------------------------------
0 19 19 19.78 4.0356598 14.0000 29.00
1 7 7 24.00 5.5075705 17.0000 35.00
--------------------------------------------------------------
4. Using proc univariate for detailed summary statistics
You can use proc univariate to get more detailed summary statistics, as shown below.
PROC UNIVARIATE DATA=auto;
  VAR mpg;
RUN;
And here are the results of proc univariate.
Univariate Procedure
Variable=MPG
Moments
N 26 Sum Wgts 26
Mean 20.92308 Sum 544
Std Dev 4.757504 Variance 22.63385
Skewness 0.935473 Kurtosis 1.7927
USS 11948 CSS 565.8462
CV 22.73807 Std Mean 0.933023
T:Mean=0 22.42503 Pr>|T| 0.0001
Num ^= 0 26 Num > 0 26
M(Sign) 13 Pr>=|M| 0.0001
Sgn Rank 175.5 Pr>=|S| 0.0001
Quantiles(Def=5)
100% Max 35 99% 35
75% Q3 23 95% 29
50% Med 21 90% 26
25% Q1 17 10% 15
0% Min 14 5% 14
1% 14
Range 21
Q3-Q1 6
Mode 22
Extremes
Lowest Obs Highest Obs
14( 15) 24( 25)
14( 14) 25( 6)
15( 8) 26( 10)
16( 18) 29( 17)
16( 12) 35( 24)
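Several entries in the Moments section are simple functions of N, the mean and the standard deviation. As a rough check, the data step below recomputes three of them from the values printed above. It is an illustrative sketch only: the variable names are made up for this example, and the inputs are the rounded values copied from the output.

```sas
DATA _NULL_;
  /* Values copied by hand from the Moments section above */
  n       = 26;
  mean    = 20.92308;
  std_dev = 4.757504;

  std_mean = std_dev / SQRT(n);     /* standard error of the mean (Std Mean) */
  t        = mean / std_mean;       /* t statistic for testing mean = 0      */
  cv       = 100 * std_dev / mean;  /* coefficient of variation (CV)         */

  /* Write the results to the SAS log */
  PUT std_mean= 8.6 t= 8.5 cv= 8.5;
RUN;
```

Up to rounding, the results should agree with the Std Mean, T:Mean=0 and CV entries in the output above.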
We can use the class statement to obtain separate univariate results for foreign and domestic cars.
PROC UNIVARIATE DATA=auto;
  CLASS foreign;
  VAR mpg;
RUN;
As you see in the output below, you get a complete set of output for the case when foreign equals 0 and then another set of output when foreign equals 1.
The UNIVARIATE Procedure
Variable: MPG
FOREIGN = 0
Moments
N 19 Sum Weights 19
Mean 19.7894737 Sum Observations 376
Std Deviation 4.03565976 Variance 16.2865497
Skewness 0.477379 Kurtosis 0.04119835
Uncorrected SS 7734 Corrected SS 293.157895
Coeff Variation 20.3929616 Std Error Mean 0.92584385
Basic Statistical Measures
Location Variability
Mean 19.78947 Std Deviation 4.03566
Median 20.00000 Variance 16.28655
Mode 22.00000 Range 15.00000
Interquartile Range 6.00000
Tests for Location: Mu0=0
Test -Statistic- -----p Value------
Student's t t 21.37453 Pr > |t| <.0001
Sign M 9.5 Pr >= |M| <.0001
Signed Rank S 95 Pr >= |S| <.0001
Quantiles (Definition 5)
Quantile Estimate
100% Max 29
99% 29
95% 29
90% 26
75% Q3 22
50% Median 20
25% Q1 16
10% 14
5% 14
1% 14
0% Min 14
Variable: MPG
FOREIGN = 0
Extreme Observations
----Lowest---- ----Highest---
Value Obs Value Obs
14 15 22 19
14 14 22 20
15 8 24 21
16 18 26 10
16 12 29 17
Variable: MPG
FOREIGN = 1
Moments
N 7 Sum Weights 7
Mean 24 Sum Observations 168
Std Deviation 5.50757055 Variance 30.3333333
Skewness 1.34081176 Kurtosis 3.28605241
Uncorrected SS 4214 Corrected SS 182
Coeff Variation 22.9482106 Std Error Mean 2.081666
Basic Statistical Measures
Location Variability
Mean 24.00000 Std Deviation 5.50757
Median 23.00000 Variance 30.33333
Mode 23.00000 Range 18.00000
Interquartile Range 4.00000
Tests for Location: Mu0=0
Test -Statistic- -----p Value------
Student's t t 11.52923 Pr > |t| <.0001
Sign M 3.5 Pr >= |M| 0.0156
Signed Rank S 14 Pr >= |S| 0.0156
Quantiles (Definition 5)
Quantile Estimate
100% Max 35
99% 35
95% 35
90% 35
75% Q3 25
50% Median 23
25% Q1 21
10% 17
5% 17
1% 17
0% Min 17
Variable: MPG
FOREIGN = 1
Extreme Observations
----Lowest---- ----Highest---
Value Obs Value Obs
17 4 23 5
21 26 23 23
23 23 24 25
23 5 25 6
24 25 35 24
5. Problems to look out for
If you make a crosstab with proc freq and one of the variables has a large number of values (say 10 or more), the crosstab table could be very hard to read. In such cases, try using the list option on the tables statement.
TABLES rep78*foreign / LIST ;
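For context, the tables statement above would sit inside a complete proc freq step. A minimal sketch, using the auto data file from this module (its output is not shown here):

```sas
PROC FREQ DATA=auto;
  /* LIST prints one line per combination of rep78 and foreign
     instead of a two-way table */
  TABLES rep78*foreign / LIST ;
RUN;
```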
When using the class statement in proc univariate, if you choose a class variable with a large number of values (say 5, 10, or more), it will produce a very large amount of output. In such cases, you may try to use proc means with a class statement instead of proc univariate.
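As a sketch of this suggestion, the step below uses make as the grouping variable, since it has seven values in the auto data file. The choice of make is ours for illustration, and the output is not shown here.

```sas
PROC MEANS DATA=auto;
  /* One line of summary statistics per make,
     rather than a full univariate report for each */
  CLASS make ;
  VAR mpg;
RUN;
```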
6. For more information
For information on Statistical Tests in SAS, see the SAS Learning Module An Overview of Statistical Tests in SAS.
7. Web Notes
You can view the SAS program associated with this module by clicking descript.sas. While viewing the file, you can save it by choosing File then Save As from the pull-down menu of your web browser. In the Save As dialog box, change the file name to descript.sas, choose the directory where you want to save the file, and then click Save.
