How can I get an R2 with robust regression (rreg)?

Some Stata users have noticed that e(r2) and e(r2_a) contain values after running rreg, as the example below shows using the crime dataset. We caution against using these values as measures of model fit (see the discussion below).

use https://stats.idre.ucla.edu/stat/data/crime, clear

rreg crime pctmetro pcths poverty single, tolerance(.001)

   Huber iteration 1:  maximum difference in weights = .40486383
   Huber iteration 2:  maximum difference in weights = .10882095
   Huber iteration 3:  maximum difference in weights = .012962
   Huber iteration 4:  maximum difference in weights = .00356214
Biweight iteration 5:  maximum difference in weights = .1550734
Biweight iteration 6:  maximum difference in weights = .00583604
Biweight iteration 7:  maximum difference in weights = .00102764
Biweight iteration 8:  maximum difference in weights = .00023256

Robust regression                                      Number of obs =      50
                                                       F(  4,    45) =   27.85
                                                       Prob > F      =  0.0000

------------------------------------------------------------------------------
       crime |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    pctmetro |   7.008278   1.182934     5.92   0.000     4.625726    9.390829
       pcths |  -2.049362    6.96946    -0.29   0.770    -16.08657    11.98785
     poverty |   15.49762   10.16624     1.52   0.134    -4.978252    35.97348
      single |   99.83795   19.01549     5.25   0.000     61.53879    138.1371
       _cons |   -1071.87   641.3146    -1.67   0.102    -2363.544    219.8038
------------------------------------------------------------------------------

display "R2 = " e(r2)

R2 = .71228451

display "adjusted R2 = " e(r2_a)

adjusted R2 = .6867098

To understand why the values shown above are not appropriate, you need to understand what is going on inside the rreg program. rreg goes through a series of iterations in which it computes and recomputes weights for each of the observations. After the program reaches convergence, it goes through one more step in which it creates pseudovalues of the dependent variable using the final set of weights, a scaling factor and a couple of other values. It then uses the pseudovalues as the response variable in an OLS regression. The ereturn values, such as e(r2) and e(r2_a), are left over from that OLS regression model. According to Street, Carroll and Ruppert (1988), these auxiliary values left over from the pseudovalue regression are not meaningful and should not be used.
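
One way to look at what rreg leaves behind is to save the final weights and list the stored results. The lines below are a minimal sketch, assuming the crime dataset from the example above is still in memory; the variable name w is arbitrary and chosen only for illustration. The genwt() option of rreg saves the final weight for each observation. Refitting with regress and those weights as analytic weights should give coefficients close to the rreg estimates (up to the convergence tolerance), but its standard errors and R-squared are ordinary weighted least-squares quantities and are not robust fit measures either.

```stata
* Assumes the crime dataset from the example above is in memory.
* genwt() saves the final weights; w is an arbitrary variable name.
rreg crime pctmetro pcths poverty single, tolerance(.001) genwt(w)

* Everything rreg leaves in e(), including e(r2) and e(r2_a)
ereturn list

* Final weights: values near 0 mark heavily downweighted observations
summarize w, detail

* Weighted least squares using the final weights; the coefficients should
* be close to those from rreg, but this R-squared is not a robust measure
regress crime pctmetro pcths poverty single [aweight = w]
```

Listing the observations with the smallest values of w is a quick way to see which cases the robust fit is discounting.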

UCLA Statistical Consulting has written a program, rregfit, that will compute R-squared and several other fit indices. You can download the rregfit command by typing search rregfit in the Stata command line (see How can I use the search command to search for programs and get additional help? for more information about using search). It is demonstrated in the example below using the robust regression model from above.

rregfit

robust regression measures of fit
R-square = .66989605
AICR     = 42.917151
BICR     = 55.940273
deviance = 1064093

Using rregfit, the R-squared was 0.67, while the e(r2) value left over from rreg was 0.71, which should not be used.

References

Hampel, F. R., Ronchetti, E.M., Rousseeuw, P.J. and Stahel, W.A. (1986) Robust Statistics: The Approach Based on Influence Functions, New York: John Wiley & Sons, Inc.

Ronchetti, E. (1985) “Robust Model Selection in Regression,” Statistics and Probability Letters, 3, 21-23.

SAS Institute Inc. (2008) SAS 9.2 Documentation for GLM. Cary, NC: SAS Institute Inc.

Street, J.O., Carroll, R.J. and Ruppert, D. (1988) “A Note on Computing Robust Regression Estimates via Iteratively Reweighted Least Squares,” The American Statistician, 42(2), 152-154.