Proc tabulate is used predominantly to make presentation-quality tables. Unlike proc freq, it can handle multiple variables in both the row and column expressions, and it can nest multiple levels in both rows and columns, whereas proc freq builds its output as two-variable contingency tables. Proc tabulate is often used to create tables for publication because it gives control over almost every aspect of the table.
Inputting the dataset ex1, which is used in all of the following code.

data ex1;
  input treat visit ptn score1 score2;
  cards;
1 1 1 6.8496 1.3007
1 2 1 14.7009 14.4018
1 3 1 8.9982 2.9965
1 1 2 7.5940 0.1880
1 2 2 14.2160 13.4321
1 3 2 14.6928 14.3855
2 1 3 10.4298 5.8596
2 2 3 10.3169 5.6338
2 3 3 5.4979 4.0041
2 1 4 5.6657 3.6687
2 2 4 13.1932 11.3864
2 3 4 10.2387 5.4774
2 1 1 13.5339 12.0679
2 2 1 5.6718 3.6563
2 3 1 14.5702 14.1405
2 1 2 7.9719 0.9439
2 2 2 7.7261 0.4522
2 3 2 11.8993 8.7986
1 1 3 14.7676 14.5353
1 2 3 7.2651 0.4698
1 3 3 11.8824 8.7647
1 1 4 9.1276 3.2553
1 2 4 10.5855 6.1711
1 3 4 7.8723 0.7445
;
run;
Creating a basic table of patients by treatment showing their score of drug A for each treatment averaged over all visits.
proc tabulate data=ex1;
  class treat ptn;
  var score1;
  table ptn='Patient id',
        mean=' '*score1='Drug A, average score over all visits'*treat='Treatment'*F=10.
        / RTS=13.;
run;

-----------------------------------
|           |Drug A, average score|
|           |   over all visits   |
|           |---------------------|
|           |      Treatment      |
|           |---------------------|
|           |    1     |    2     |
|-----------+----------+----------|
|Patient id |          |          |
|-----------|          |          |
|1          |        10|        11|
|-----------+----------+----------|
|2          |        12|         9|
|-----------+----------+----------|
|3          |        11|         9|
|-----------+----------+----------|
|4          |         9|        10|
-----------------------------------
Creating a table with multiple levels for the columns
Note: If you do not blank out the labels with sum=' ' or variable_name=' ' in the table statement, SAS adds header lines to the top of the table for the defaults, i.e., the statistic name (Sum) and the variable names. As a result, the table below has six header lines above the data: Drug A, Sum, treat, the values of treat, visit, and the values of visit.
proc tabulate data=ex1;
  class ptn treat visit;
  var score1;
  table ptn='Patient Id',
        score1='Drug A'*treat*visit*F=6.
        / RTS=13.;
run;

-------------------------------------------------------
|           |                 Drug A                  |
|           |-----------------------------------------|
|           |                   Sum                   |
|           |-----------------------------------------|
|           |                  treat                  |
|           |-----------------------------------------|
|           |         1          |         2          |
|           |--------------------+--------------------|
|           |       visit        |       visit        |
|           |--------------------+--------------------|
|           |  1   |  2   |  3   |  1   |  2   |  3   |
|-----------+------+------+------+------+------+------|
|Patient Id |      |      |      |      |      |      |
|-----------|      |      |      |      |      |      |
|1          |     7|    15|     9|    14|     6|    15|
|-----------+------+------+------+------+------+------|
|2          |     8|    14|    15|     8|     8|    12|
|-----------+------+------+------+------+------+------|
|3          |    15|     7|    12|    10|    10|     5|
|-----------+------+------+------+------+------+------|
|4          |     9|    11|     8|     6|    13|    10|
-------------------------------------------------------
Here is the same table, but using sum=' ', treat=' ' and visit=' ' to eliminate the extra lines. This leaves only the category values of treat and visit, which can be confusing; hence the formats applied in the final version of this table.
proc tabulate data=ex1;
  class ptn treat visit;
  var score1;
  table ptn='Patient Id',
        sum=' '*score1='Drug A'*treat=' '*visit=' '*F=6.
        / RTS=13.;
run;

-------------------------------------------------------
|           |                 Drug A                  |
|           |-----------------------------------------|
|           |         1          |         2          |
|           |--------------------+--------------------|
|           |  1   |  2   |  3   |  1   |  2   |  3   |
|-----------+------+------+------+------+------+------|
|Patient Id |      |      |      |      |      |      |
|-----------|      |      |      |      |      |      |
|1          |     7|    15|     9|    14|     6|    15|
|-----------+------+------+------+------+------+------|
|2          |     8|    14|    15|     8|     8|    12|
|-----------+------+------+------+------+------+------|
|3          |    15|     7|    12|    10|    10|     5|
|-----------+------+------+------+------+------+------|
|4          |     9|    11|     8|     6|    13|    10|
-------------------------------------------------------
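The final version of this table uses mean=' ' and variable_name=' ' in the table statement, together with value labels defined by proc format. Because each cell holds a single observation, the mean shown here equals the sum shown in the previous tables.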
proc format;
  value visit 1='Visit 1' 2='Visit 2' 3='Visit 3';
  value tr 1='Therapy 1' 2='Therapy 2';
run;

proc tabulate data=ex1;
  class ptn treat visit;
  var score1 score2;
  table ptn='Patient id',
        mean=' '*score1='Drug A'*treat=''*visit=''*F=10.
        / RTS=13.;
  format treat tr. visit visit.;
run;

-------------------------------------------------------------------------------
|           |                             Drug A                              |
|           |-----------------------------------------------------------------|
|           |           Therapy 1            |           Therapy 2            |
|           |--------------------------------+--------------------------------|
|           | Visit 1  | Visit 2  | Visit 3  | Visit 1  | Visit 2  | Visit 3  |
|-----------+----------+----------+----------+----------+----------+----------|
|Patient id |          |          |          |          |          |          |
|-----------|          |          |          |          |          |          |
|1          |         7|        15|         9|        14|         6|        15|
|-----------+----------+----------+----------+----------+----------+----------|
|2          |         8|        14|        15|         8|         8|        12|
|-----------+----------+----------+----------+----------+----------+----------|
|3          |        15|         7|        12|        10|        10|         5|
|-----------+----------+----------+----------+----------+----------+----------|
|4          |         9|        11|         8|         6|        13|        10|
-------------------------------------------------------------------------------
Eliminating the lines separating the rows by using the noseps option in the proc tabulate statement.
proc tabulate data=ex1 noseps;
  class treat ptn;
  var score1;
  table ptn='Patient id',
        mean=''*score1='Drug A, average score over all visits'*treat=''*F=10.
        / RTS=13.;
  format treat tr.;
run;

-----------------------------------
|           |Drug A, average score|
|           |   over all visits   |
|           |---------------------|
|           |Therapy 1 |Therapy 2 |
|-----------+----------+----------|
|Patient id |          |          |
|1          |        10|        11|
|2          |        12|         9|
|3          |        11|         9|
|4          |         9|        10|
-----------------------------------
Creating a table with multiple levels of rows, i.e., separating patients by treatment.
proc tabulate data=ex1;
  class ptn treat visit;
  var score1;
  table treat='Treatment'*ptn='Patient id',
        mean='Average Score from all visits'*score1='Drug A'*F=10.
        / RTS=25.;
  format treat tr.;
run;

------------------------------------
|                       | Average  |
|                       |Score from|
|                       |all visits|
|                       |----------|
|                       |  Drug A  |
|-----------------------+----------|
|Treatment  |Patient id |          |
|-----------+-----------|          |
|Therapy 1  |1          |        10|
|           |-----------+----------|
|           |2          |        12|
|           |-----------+----------|
|           |3          |        11|
|           |-----------+----------|
|           |4          |         9|
|-----------+-----------+----------|
|Therapy 2  |1          |        11|
|           |-----------+----------|
|           |2          |         9|
|           |-----------+----------|
|           |3          |         9|
|           |-----------+----------|
|           |4          |        10|
------------------------------------
Creating a table with multiple columns, patients by visits within treatment.
proc tabulate data=ex1;
  class ptn treat visit;
  var score1;
  table ptn='Patient id',
        mean=' '*score1='Drug A'*treat=''*visit='Visit'*F=3.
        / RTS=13.;
  format treat tr.;
run;

-------------------------------------
|           |        Drug A         |
|           |-----------------------|
|           | Therapy 1 | Therapy 2 |
|           |-----------+-----------|
|           |   Visit   |   Visit   |
|           |-----------+-----------|
|           | 1 | 2 | 3 | 1 | 2 | 3 |
|-----------+---+---+---+---+---+---|
|Patient id |   |   |   |   |   |   |
|-----------|   |   |   |   |   |   |
|1          |  7| 15|  9| 14|  6| 15|
|-----------+---+---+---+---+---+---|
|2          |  8| 14| 15|  8|  8| 12|
|-----------+---+---+---+---+---+---|
|3          | 15|  7| 12| 10| 10|  5|
|-----------+---+---+---+---+---+---|
|4          |  9| 11|  8|  6| 13| 10|
-------------------------------------
The width of the row-title cells (here, the patient id) is controlled by the RTS option in the table statement; the value counts the border characters on either side of the row titles. The width of the cells inside the table is controlled by the *F=w.d format attached to each column expression (everything listed after the comma in the table statement).
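As a small sketch of how the two width controls interact, the step below reuses ex1 with a wider row-title area and a cell format that shows two decimals. The specific widths (RTS=20 and F=8.2) are arbitrary values chosen for illustration; they do not come from the examples on this page.

```sas
/* Sketch: RTS= sets the width of the row-title area (borders included); */
/* F=w.d sets the width and number of decimals of each table cell.       */
proc tabulate data=ex1;
  class ptn treat;
  var score1;
  table ptn='Patient id',
        mean=' '*score1='Drug A'*treat='Treatment'*F=8.2
        / RTS=20;
run;
```

Changing RTS= only moves the left edge of the data columns, while changing F= alters every data cell, so the two can be tuned separately.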
In the following table we are adding several columns with multiple levels. We are also formatting visits to be dates.
Note: Since there is only one observation in each cell, the mean is the same as the raw score, and it does not matter which function you specify (mean, sum, etc.). Unless you want a separate header line containing the function name, however, it is advisable to include the function with a blank label, i.e., mean=' ' or sum=' ', as in the program shown. Various functions are available, including sum, mean, n (the frequency), colpctsum, rowpctsum and reppctsum.
proc format;
  value vi 1='3/20' 2='8/30' 3='11/03';
run;

proc tabulate data=ex1;
  class ptn treat visit;
  var score1 score2;
  table ptn='Id #',
        mean=' '*score1='Drug A'*treat=''*visit=''*F=6.
        sum=' '*score2='Drug B'*treat=''*visit=''*F=6.
        / RTS=6.;
  format treat tr. visit vi.;
run;

------------------------------------------------------------------------------------------
|    |                 Drug A                  |                 Drug B                  |
|    |-----------------------------------------+-----------------------------------------|
|    |     Therapy 1      |     Therapy 2      |     Therapy 1      |     Therapy 2      |
|    |--------------------+--------------------+--------------------+--------------------|
|    | 3/20 | 8/30 |11/03 | 3/20 | 8/30 |11/03 | 3/20 | 8/30 |11/03 | 3/20 | 8/30 |11/03 |
|----+------+------+------+------+------+------+------+------+------+------+------+------|
|Id #|      |      |      |      |      |      |      |      |      |      |      |      |
|----|      |      |      |      |      |      |      |      |      |      |      |      |
|1   |     7|    15|     9|    14|     6|    15|     1|    14|     3|    12|     4|    14|
|----+------+------+------+------+------+------+------+------+------+------+------+------|
|2   |     8|    14|    15|     8|     8|    12|     0|    13|    14|     1|     0|     9|
|----+------+------+------+------+------+------+------+------+------+------+------+------|
|3   |    15|     7|    12|    10|    10|     5|    15|     0|     9|     6|     6|     4|
|----+------+------+------+------+------+------+------+------+------+------+------+------|
|4   |     9|    11|     8|     6|    13|    10|     3|     6|     1|     4|    11|     5|
------------------------------------------------------------------------------------------
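The choice of function only matters once a cell holds more than one observation. The following is a minimal sketch, not part of the original examples, that drops ptn from the table so that each treatment-by-visit cell pools four patients, and then requests n, mean and sum side by side. The labels and the widths (F=4. and F=7.2) are illustrative assumptions.

```sas
/* Sketch: several statistics for one analysis variable.            */
/* Each treat-by-visit cell pools the four patients, so n, mean and */
/* sum now show different values.                                   */
proc tabulate data=ex1;
  class treat visit;
  var score1;
  table treat='Treatment',
        visit='Visit'*score1='Drug A'*(n='N'*F=4. mean='Mean'*F=7.2 sum='Sum'*F=7.2)
        / RTS=13;
run;
```

Here the function labels are kept rather than blanked, because with three statistics per visit the reader needs them to tell the columns apart.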
Creating a table with multiple variables in the row expression (variables listed before the comma in the table statement).
proc tabulate data=ex1;
  class ptn treat visit;
  var score1 score2;
  table treat='Treatment' visit='Visit',
        mean=' '*score1='Drug A'*ptn='Patient Id';
  format treat tr. visit vi.;
run;

----------------------------------------------------------------------------
|                      |                      Drug A                       |
|                      |---------------------------------------------------|
|                      |                    Patient Id                     |
|                      |---------------------------------------------------|
|                      |     1      |     2      |     3      |     4      |
|----------------------+------------+------------+------------+------------|
|Treatment             |            |            |            |            |
|----------------------|            |            |            |            |
|Therapy 1             |       10.18|       12.17|       11.31|        9.20|
|----------------------+------------+------------+------------+------------|
|Therapy 2             |       11.26|        9.20|        8.75|        9.70|
|----------------------+------------+------------+------------+------------|
|Visit                 |            |            |            |            |
|----------------------|            |            |            |            |
|3/20                  |       10.19|        7.78|       12.60|        7.40|
|----------------------+------------+------------+------------+------------|
|8/30                  |       10.19|       10.97|        8.79|       11.89|
|----------------------+------------+------------+------------+------------|
|11/03                 |       11.78|       13.30|        8.69|        9.06|
----------------------------------------------------------------------------
Creating a table with missing values labeled 'Missing' by using the misstext option in the table statement.
data miss;
  set ex1;
  if ptn=4 then score1=. ;
run;

proc tabulate data=miss;
  class ptn visit;
  var score1;
  table ptn='Patient id',
        sum=' '*score1='Drug A'*visit=' '*F=10.
        / misstext=[label="Missing"] RTS=25.;
  format visit vi.;
run;

----------------------------------------------------------
|                       |             Drug A             |
|                       |--------------------------------|
|                       |   3/20   |   8/30   |  11/03   |
|-----------------------+----------+----------+----------|
|Patient id             |          |          |          |
|-----------------------|          |          |          |
|1                      |        20|        20|        24|
|-----------------------+----------+----------+----------|
|2                      |        16|        22|        27|
|-----------------------+----------+----------+----------|
|3                      |        25|        18|        17|
|-----------------------+----------+----------+----------|
|4                      |   Missing|   Missing|   Missing|
----------------------------------------------------------
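The misstext option also accepts a plain quoted string, which may be easier to read when no style attributes are needed. The step below is a sketch that uses the miss dataset created above; it omits the visit format so that it does not depend on proc format having been run.

```sas
/* Sketch: misstext= given as a simple quoted string.            */
/* Cells whose statistic is missing print the text 'Missing'.    */
proc tabulate data=miss;
  class ptn visit;
  var score1;
  table ptn='Patient id',
        sum=' '*score1='Drug A'*visit=' '*F=10.
        / misstext='Missing' RTS=25;
run;
```

The text is printed inside a cell of the usual width, so it should be no longer than the width given by F= if it is to appear in full.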