How can I estimate relative risk in SAS using proc genmod for common outcomes in cohort studies?

Credits

This page was developed and written by Karla Lindquist, Senior Statistician in the Division of Geriatrics at UCSF. We are very grateful to Karla for taking the time to develop this page and giving us permission to post it on our site.

Introduction

Binary outcomes in cohort studies are commonly analyzed by applying a logistic regression model to the data to obtain odds ratios for comparing groups with different sets of characteristics. Although this is often appropriate, there may be situations in which it is more desirable to estimate a relative risk or risk ratio (RR) instead of an odds ratio (OR). Several articles in recent medical and public health literature point out that when the outcome event is common (incidence of 10% or more), it is often more desirable to estimate an RR since there is an increasing differential between the RR and OR with increasing incidence rates, and there is a tendency for some to interpret ORs as if they are RRs ([1]-[3]). There are some who hold the opinion that the OR should be used even when the outcome is common, however ([4]). Here the purpose is to demonstrate methods for calculating the RR, assuming that it is the appropriate thing to do. There are several options for how to estimate RRs directly in SAS, which have been demonstrated to be reliable in simulated and real data sets of various sizes and outcome incidence rates ([1],[2]). Two of these methods will be demonstrated here using hypothetical data created for this purpose. Both methods use proc genmod. One estimates the RR with a log-binomial regression model, and the other uses a Poisson regression model with a robust error variance.

Example Data: Odds ratio versus relative risk

A hypothetical data set was created to illustrate two methods of estimating relative risks using SAS. The outcome generated is called lenses, to indicate if the hypothetical study participants require corrective lenses by the time they are 30 years old. Assume all participants do not need them at a baseline assessment when they are 10 years old. Assume none of them have had serious head injuries or had brain tumors or other major health problems during the 20 years between assessments. Suppose we wanted to know if requiring corrective lenses is associated with having a gene which causes one to have a lifelong love and craving for carrots (assume not having this gene results in the opposite), and that we screened everyone for this carrot gene at baseline (carrot = 1 if they have it, = 0 if not). We also noted their gender (= 1 if female, = 2 if male), and what latitude of the continental US they lived on the longest (24 to 48 degrees north). All values (N=100) were assigned using a random number generator. The data are in https://stats.idre.ucla.edu/wp-content/uploads/2016/02/eyestudy.sas7bdat. Here s a quick description of the variables:

proc means data = eyestudy maxdec = 2;
 var carrot gender latitude lenses;
run;

The MEANS Procedure

Variable      N            Mean         Std Dev         Minimum         Maximum
-------------------------------------------------------------------------------
carrot      100            0.51            0.50            0.00            1.00
gender      100            1.48            0.50            1.00            2.00
latitude    100           35.97            7.51           24.00           48.00
lenses      100            0.53            0.50            0.00            1.00
-------------------------------------------------------------------------------

We have an overall outcome rate of 53%. So if we want to talk about whether the carrot-loving gene, gender, or latitude is associated with the risk of requiring corrective lenses by the age of 30, then relative risk is a more appropriate measure than the odds ratio. Here is a simple crosstab of carrot and lenses, which will allow us to calculate the unadjusted OR and RR by hand.

proc freq data = eyestudy;
 tables carrot*lenses/nopercent nocol;
run;

Table of carrot by lenses

carrot     lenses

Frequency|
Row Pct  |       0|       1|  Total
---------+--------+--------+
       0 |     17 |     32 |     49
         |  34.69 |  65.31 |
---------+--------+--------+
       1 |     30 |     21 |     51
         |  58.82 |  41.18 |
---------+--------+--------+
Total          47       53      100

It is interesting that fewer people with the carrot-loving gene needed corrective lenses (especially since these are fake data!). The OR and RR for those without the carrot gene versus those with it are:

OR = (32/17)/(21/30) = 2.69

RR = (32/49)/(21/51) = 1.59

We could use either proc logistic or proc genmod to calculate the OR. Since proc genmod will be used to calculate the RR, it will also be used to calculate the OR for comparison purposes (and it gives the same results as proc logistic). Here is the logistic regression with just carrot as the predictor:

proc genmod data = eyestudy descending;
 class carrot;
 model lenses = carrot/ dist = binomial link = logit;
 estimate 'Beta' carrot 1 -1/ exp;
run;

The GENMOD Procedure

        Model Information

Data Set                 EYESTUDY
Distribution             Binomial
Link Function               Logit
Dependent Variable         lenses
Observations Used             100

  Class Level Information

Class       Levels    Values

carrot           2    0 1

        Response Profile

 Ordered                  Total
   Value    lenses    Frequency

       1    1                53
       2    0                47

PROC GENMOD is modeling the probability that lenses='1'.

       Parameter Information

Parameter       Effect       carrot

Prm1            Intercept
Prm2            carrot       0
Prm3            carrot       1


           Criteria For Assessing Goodness Of Fit

Criterion                 DF           Value        Value/DF

Deviance                  98        132.3665          1.3507
Scaled Deviance           98        132.3665          1.3507
Pearson Chi-Square        98        100.0000          1.0204
Scaled Pearson X2         98        100.0000          1.0204
Log Likelihood                      -66.1832

Algorithm converged. 
                               Analysis Of Parameter Estimates

                                    Standard     Wald 95% Confidence       Chi-
Parameter         DF    Estimate       Error           Limits            Square    Pr > ChiSq

Intercept          1     -0.3567      0.2845     -0.9143      0.2010       1.57        0.2100
carrot       0     1      0.9892      0.4136      0.1786      1.7997       5.72        0.0168
carrot       1     0      0.0000      0.0000      0.0000      0.0000        .           .
Scale              0      1.0000      0.0000      1.0000      1.0000

NOTE: The scale parameter was held fixed.

                                 Contrast Estimate Results

                         Standard                                        Chi-
Label        Estimate       Error     Alpha      Confidence Limits     Square    Pr > ChiSq

Beta           0.9892      0.4136      0.05      0.1786      1.7997      5.72        0.0168
Exp(Beta)      2.6891      1.1121      0.05      1.1956      6.0481

The estimate statement with the exp option gives us the same OR we calculated by hand above for those without the carrot gene versus those with it. Now this can be contrasted with the two methods of calculating the RR described below.

Relative risk estimation by log-binomial regression

With a very minor modification of the statements used above for the logistic regression, a log-binomial model can be run to get the RR instead of the OR. All that needs to be changed is the link function between the covariate(s) and outcome. Here it is specified as log instead of logit:

proc genmod data = eyestudy descending;
 class carrot;
 model lenses = carrot/ dist = binomial link = log;
 estimate 'Beta' carrot 1 -1/ exp;
run;

The GENMOD Procedure

        Model Information

Data Set                 EYESTUDY
Distribution             Binomial
Link Function                 Log
Dependent Variable         lenses
Observations Used             100

  Class Level Information

Class       Levels    Values

carrot           2    0 1

        Response Profile

 Ordered                  Total
   Value    lenses    Frequency

       1    1                53
       2    0                47

PROC GENMOD is modeling the probability that lenses='1'.

       Parameter Information

Parameter       Effect       carrot

Prm1            Intercept
Prm2            carrot       0
Prm3            carrot       1

           Criteria For Assessing Goodness Of Fit

Criterion                 DF           Value        Value/DF

Deviance                  98        132.3665          1.3507
Scaled Deviance           98        132.3665          1.3507
Pearson Chi-Square        98        100.0000          1.0204
Scaled Pearson X2         98        100.0000          1.0204
Log Likelihood                      -66.1832

Algorithm converged.

                               Analysis Of Parameter Estimates

                                    Standard     Wald 95% Confidence       Chi-
Parameter         DF    Estimate       Error           Limits            Square    Pr > ChiSq

Intercept          1     -0.8873      0.1674     -1.2153     -0.5593      28.11        <.0001
carrot       0     1      0.4612      0.1971      0.0749      0.8476       5.48        0.0193
carrot       1     0      0.0000      0.0000      0.0000      0.0000        .           .
Scale              0      1.0000      0.0000      1.0000      1.0000

NOTE: The scale parameter was held fixed.

                                 Contrast Estimate Results

                         Standard                                        Chi-
Label        Estimate       Error     Alpha      Confidence Limits     Square    Pr > ChiSq

Beta           0.4612      0.1971      0.05      0.0749      0.8476      5.48        0.0193
Exp(Beta)      1.5860      0.3126      0.05      1.0778      2.3339

Now the exp option on the estimate statement gives us the estimated RR instead of the OR, and it also matches what was calculated by hand above for the RR. Notice that the standard error (SE) for the beta estimate calculated here is much smaller than that calculated in the logistic regression above (SE = 0.414), but so is the estimate itself (logistic regression beta estimate = 0.989), so the significance level is very similar (logistic regression p = 0.017) in this case. One of the criticisms of using the log-binomial model for the RR is that it produces confidence intervals that are narrower than they should be, and another is that there can be convergence problems (1,2). This is why the second approach is also presented here.

Relative risk estimation by Poisson regression with robust error variance

Zou ([2]) suggests using a modified Poisson approach to estimate the relative risk and confidence intervals by using robust error variances. Using a Poisson model without robust error variances will result in a confidence interval that is too wide. The robust error variances can be estimated by using the repeated statement and the subject identifier (here id), even if there is only one observation per subject, as Zou cleverly points out. Here is how it is done:

proc genmod data = eyestudy;
 class carrot id;
 model lenses = carrot/ dist = poisson link = log;
 repeated subject = id/ type = unstr;
 estimate 'Beta' carrot 1 -1/ exp;
run;

Notice that id, the individual subject identifier, has been added to the class statement and is also on the repeated statement (with an unstructured correlation matrix), telling proc genmod to calculate the robust errors. Also notice that the distribution has been changed to Poisson, but the link function remains log.

The GENMOD Procedure

        Model Information

Data Set                 EYESTUDY
Distribution              Poisson
Link Function                 Log
Dependent Variable         lenses
Observations Used             100

                        Class Level Information

Class       Levels    Values

carrot           2    0 1
id             100    1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
                      21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37
                      38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
                      55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
                      72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87
                      ...

       Parameter Information

Parameter       Effect       carrot

Prm1            Intercept
Prm2            carrot       0
Prm3            carrot       1

           Criteria For Assessing Goodness Of Fit

Criterion                 DF           Value        Value/DF

Deviance                  98         64.5361          0.6585
Scaled Deviance           98         64.5361          0.6585
Pearson Chi-Square        98         47.0000          0.4796
Scaled Pearson X2         98         47.0000          0.4796
Log Likelihood                      -85.2681

Algorithm converged.
                           Analysis Of Initial Parameter Estimates

                                    Standard     Wald 95% Confidence       Chi-
Parameter         DF    Estimate       Error           Limits            Square    Pr > ChiSq

Intercept          1     -0.8873      0.2182     -1.3150     -0.4596      16.53        <.0001
carrot       0     1      0.4612      0.2808     -0.0892      1.0116       2.70        0.1005
carrot       1     0      0.0000      0.0000      0.0000      0.0000        .           .
Scale              0      1.0000      0.0000      1.0000      1.0000

NOTE: The scale parameter was held fixed.

             GEE Model Information

Correlation Structure              Unstructured
Subject Effect                  id (100 levels)
Number of Clusters                          100
Correlation Matrix Dimension                  1
Maximum Cluster Size                          1
Minimum Cluster Size                          1

Algorithm converged.

              Analysis Of GEE Parameter Estimates
               Empirical Standard Error Estimates

                     Standard   95% Confidence
Parameter   Estimate    Error       Limits            Z Pr > |Z|

Intercept    -0.8873   0.1674  -1.2153  -0.5593   -5.30   <.0001
carrot    0   0.4612   0.1971   0.0749   0.8476    2.34   0.0193
carrot    1   0.0000   0.0000   0.0000   0.0000     .      .

                                 Contrast Estimate Results

                         Standard                                        Chi-
Label        Estimate       Error     Alpha      Confidence Limits     Square    Pr > ChiSq

Beta           0.4612      0.1971      0.05      0.0749      0.8476      5.48        0.0193
Exp(Beta)      1.5860      0.3126      0.05      1.0778      2.3339

Again, the exp option on the estimate statement gives us the estimated RR, and it matches exactly what was calculated by the log-binomial method. In this case, the SE for the beta estimate and the p-value are also exactly the same as in the log-binomial model. This may not always be the case, but they should be similar. The SE calculated without the repeated statement (i.e., not using robust error variances) is 0.281, and the p-value is 0.101, so the robust method is quite different.

Adjusting the relative risk for continuous or categorical covariates

Adjusting the RR for other predictors or potential confounders is simply done by adding them to the model statement as you would in any other procedure. Here gender and latitude will be added to the model:

proc genmod data = eyestudy;
 class carrot gender id;
 model lenses = carrot gender latitude/ dist = poisson link = log;
 repeated subject = id/ type = unstr;
 estimate 'Beta Carrot' carrot 1 -1/ exp;
 estimate 'Beta Gender' gender 1 -1/ exp;
 estimate 'Beta Latitude' latitude 1 / exp;
run;

The GENMOD Procedure

        Model Information

Data Set                 EYESTUDY
Distribution              Poisson
Link Function                 Log
Dependent Variable         lenses
Observations Used             100

                        Class Level Information

Class       Levels    Values

carrot           2    0 1
gender           2    1 2
id             100    1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
                      21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37
                      38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
                      55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
                      72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87
                      ...

            Parameter Information

Parameter       Effect       carrot    gender

Prm1            Intercept
Prm2            carrot       0
Prm3            carrot       1
Prm4            gender                 1
Prm5            gender                 2
Prm6            latitude

           Criteria For Assessing Goodness Of Fit

Criterion                 DF           Value        Value/DF

Deviance                  96         63.7618          0.6642
Scaled Deviance           96         63.7618          0.6642
Pearson Chi-Square        96         46.7434          0.4869
Scaled Pearson X2         96         46.7434          0.4869
Log Likelihood                      -84.8809

Algorithm converged.

                           Analysis Of Initial Parameter Estimates

                                    Standard     Wald 95% Confidence       Chi-
Parameter         DF    Estimate       Error           Limits            Square    Pr > ChiSq

Intercept          1     -0.6521      0.6982     -2.0206      0.7163       0.87        0.3503
carrot       0     1      0.4832      0.2831     -0.0716      1.0381       2.91        0.0878
carrot       1     0      0.0000      0.0000      0.0000      0.0000        .           .
gender       1     1      0.2052      0.2781     -0.3398      0.7502       0.54        0.4605
gender       2     0      0.0000      0.0000      0.0000      0.0000        .           .
latitude           1     -0.0100      0.0190     -0.0472      0.0272       0.28        0.5980
Scale              0      1.0000      0.0000      1.0000      1.0000

NOTE: The scale parameter was held fixed.

             GEE Model Information

Correlation Structure              Unstructured
Subject Effect                  id (100 levels)
Number of Clusters                          100
Correlation Matrix Dimension                  1
Maximum Cluster Size                          1
Minimum Cluster Size                          1

Algorithm converged.

              Analysis Of GEE Parameter Estimates
               Empirical Standard Error Estimates

                     Standard   95% Confidence
Parameter   Estimate    Error       Limits            Z Pr > |Z|

Intercept    -0.6521   0.4904  -1.6134   0.3091   -1.33   0.1836
carrot    0   0.4832   0.1954   0.1003   0.8662    2.47   0.0134
carrot    1   0.0000   0.0000   0.0000   0.0000     .      .
gender    1   0.2052   0.1848  -0.1570   0.5674    1.11   0.2669
gender    2   0.0000   0.0000   0.0000   0.0000     .      .
latitude     -0.0100   0.0127  -0.0350   0.0150   -0.79   0.4324

                                  Contrast Estimate Results

                                Standard                                    Chi-
Label                Estimate      Error    Alpha    Confidence Limits    Square   Pr > ChiSq

Beta Carrot            0.4832     0.1954     0.05     0.1003     0.8662     6.12       0.0134
Exp(Beta Carrot)       1.6213     0.3168     0.05     1.1055     2.3777
Beta Gender            0.2052     0.1848     0.05    -0.1570     0.5674     1.23       0.2669
Exp(Beta Gender)       1.2278     0.2269     0.05     0.8547     1.7637
Beta Latitude         -0.0100     0.0127     0.05    -0.0350     0.0150     0.62       0.4324
Exp(Beta Latitude)     0.9900     0.0126     0.05     0.9656     1.0151

We have also requested the RRs for gender and latitude in the estimate statement. In this case, adjusting for them does not reduce the association between having the carrot-loving gene and risk of needing corrective lenses by age 30.

One should always pay attention to goodness of fit statistics and perform other diagnostic tests. Refer to Categorical Data Analysis Using the SAS System, by M. Stokes, C. Davis and G. Kock for standard methods of checking whichever type of model you use.

References

1. McNutt LA, Wu C, Xue X, Hafner JP. Estimating the Relative Risk in Cohort Studies and Clinical Trials of Common Outcomes. Am J Epidemiol 2003; 157(10):940-3. 2. Zou G. A Modified Poisson Regression Approach to Prospective Studies with Binary Data. Am J Epidemiol 2004; 159(7):702-6. 3. Sander Greenland , Model-based Estimation of Relative Risks and Other Epidemiologic Measures in Studies of Common Outcomes and in Case-Control Studies, American Journal of Epidemiology 2004;160:301-305 4. Cook TD. Up with odds ratios! A case for odds ratios when outcomes are common. Acad Emerg Med 2002; 9:1430-4. 5. Spiegelman, D. und Hertzmark, Easy SAS Calculations for Risk or Prevalence Ratios and Differences, E American Journal of Epidemiology, 2005, 162, 199-205.