It is not uncommon for a cross tabulation of two variables to produce cells with zero counts. In these cases, the output for proc freq with the list option will omit combinations of variable values that have zero counts. For example, in the sample data below, there are no cases where gender = 2 and eth = 2. The dataset below contains three variables. Two variables, gender and eth, are the categorical variables we want to cross tabulate. The third variable is count, which is a frequency weight, indicating the number of cases with that pattern in the dataset, i.e., there are 21 cases where gender=1 and eth=3.
data test; input gender eth count; datalines; 1 1 12 1 2 12 1 3 21 2 1 2 2 3 43 ; run;
When we run proc freq without the list option, the table includes the cell with a zero count (i.e., gender=2 and eth=2). The weight statement specifies that the variable count should be used to determine the number of cases with each pattern in the dataset.
proc freq data = test; weight count; table gender*eth; run; The FREQ Procedure Table of gender by eth gender eth Frequency Percent Row Pct Col Pct 1 2 3 Total 1 12 12 21 45 13.33 13.33 23.33 50.00 26.67 26.67 46.67 85.71 100.00 32.81 2 2 0 43 45 2.22 0.00 47.78 50.00 4.44 0.00 95.56 14.29 0.00 67.19 Total 14 12 64 90 15.56 13.33 71.11 100.00
Running proc freq with the list option as shown below, however, produces a table without a line for gender=2 and eth=2, because the count for this combination is zero.
proc freq data = test; weight count; table gender*eth /missprint list; run; Cumulative Cumulative gender eth Frequency Percent Frequency Percent 1 1 12 13.33 12 13.33 1 2 12 13.33 24 26.67 1 3 21 23.33 45 50.00 2 1 2 2.22 47 52.22 2 3 43 47.78 90 100.00
It is possible, with a little work, to get a table such as the one above, that does contain the missing cell. The first step is to create a variable that is a constant equal to one. We do this below with the variable called one.
data test; set test; one = 1; run;
Next we will use ods to capture the output from running proc means. The option summary = t instructs SAS to store the summary produced by proc means in a dataset t. In the proc means command, the n option specifies that only the n (i.e., the count of cases) should be computed. The completetypes option specifies that all combinations of the class variables (gender and eth) should be listed, including those with zero counts in the dataset; otherwise, they would be omitted. The freq option specifies that the variable count should be used as the number of cases that follow a given pattern in the dataset (i.e., in this example, the freq option does in proc means what weight did in proc freq). The var statement indicates the variable about which statistics should be calculated; this is where the variable one is used.
ods output summary = t; proc means data = test n completetypes; class gender eth; freq count; var one; run; The MEANS Procedure Analysis Variable : one gender eth N Obs N 1 1 12 12 2 12 12 3 21 21 2 1 2 2 2 0 0 3 43 43
This creates a dataset t (shown below) with six observations, one for each combination of eth by gender both of which appear as variables in our new dataset. The new dataset also contains a variable called one_N, which is the number of cases that fall into each of the categories of gender by eth (including a count of 0 where appropriate).
proc print data=t; run; Obs gender eth NObs one_N 1 1 1 12 12 2 1 2 12 12 3 1 3 21 21 4 2 1 2 2 5 2 2 0 0 6 2 3 43 43
Now we can run proc freq on the new dataset t. The weight statement uses one_N as the variable containing the count of cases with a given pattern of variables. The /zeros option instructs SAS to include in the list cases in which one_N = 0. The resulting list includes a row for the combination of gender and eth with a zero count.
proc freq data = t; table gender*eth /list; weight one_N /zeros; run; The FREQ Procedure Cumulative Cumulative gender eth Frequency Percent Frequency Percent 1 1 12 13.33 12 13.33 1 2 12 13.33 24 26.67 1 3 21 23.33 45 50.00 2 1 2 2.22 47 52.22 2 2 0 0.00 47 52.22 2 3 43 47.78 90 100.00