Robust regression is an alternative to least squares regression when the data are contaminated with outliers or influential observations, and it can also be used for the purpose of detecting influential observations.
Please note: The purpose of this page is to show how to use various data analysis commands. It does not cover all aspects of the research process which researchers are expected to do. In particular, it does not cover data cleaning and checking, verification of assumptions, model diagnostics or potential follow-up analyses.
This page was developed using SAS 9.2.
Introduction
Let’s begin our discussion on robust regression with some terms in linear regression.
Residual: The difference between the predicted value (based on the regression equation) and the actual, observed value.
Outlier: In linear regression, an outlier is an observation with a large residual. In other words, it is an observation whose dependent-variable value is unusual given its values on the predictor variables. An outlier may indicate a sample peculiarity, or it may indicate a data entry error or other problem.
Leverage: An observation with an extreme value on a predictor variable is a point with high leverage. Leverage is a measure of how far an independent variable deviates from its mean. High leverage points can have a great amount of effect on the estimate of regression coefficients.
Influence: An observation is said to be influential if removing the observation substantially changes the estimate of the regression coefficients. Influence can be thought of as the product of leverage and outlierness.
Cook’s distance (or Cook’s D): A measure that combines the information of leverage and residual of the observation.
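To make "combines" concrete, a standard textbook formula for Cook's D (not specific to SAS) for observation i, in a model with p coefficients and error variance estimate s^2, is

D_i = \frac{e_i^2}{p\,s^2} \cdot \frac{h_{ii}}{(1 - h_{ii})^2}

where e_i is the residual and h_ii is the leverage of observation i. An observation needs both a sizable residual and sizable leverage to produce a large Cook's D.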
Robust regression can be used in any situation in which you would use least squares regression. When fitting a least squares regression, we might find some outliers or high-leverage data points. Suppose we have decided that these data points are not data entry errors, nor are they from a different population than most of our data, so we have no compelling reason to exclude them from the analysis. Robust regression might be a good strategy, since it is a compromise between excluding these points entirely from the analysis and including all the data points and treating them all equally in OLS regression. The idea of robust regression is to weight the observations differently based on how well behaved these observations are. Roughly speaking, it is a form of weighted and reweighted least squares regression.
Proc robustreg in SAS implements several versions of robust regression. On this page, we will show M estimation with Huber and bisquare weighting. These two weighting functions are very standard; Stata's robust regression command, for example, uses them in combination by default. In Huber weighting, observations with small residuals get a weight of 1, and the larger the residual, the smaller the weight. With bisquare weighting, all cases with a non-zero residual get down-weighted at least a little. The two functional forms are sketched in the short data step below.
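As a rough illustration, this data step computes both weight functions over a grid of scaled residuals e. The tuning constants 1.345 (Huber) and 4.685 (bisquare) are the commonly cited defaults; this is a sketch of the functional forms only, so check the proc robustreg documentation for the constants your SAS release actually uses.

data weights;
  do e = -6 to 6 by 0.1;
    /* Huber: weight 1 inside the cutoff, k/|e| beyond it */
    k_h = 1.345;
    w_huber = min(1, k_h / max(abs(e), 1e-8));
    /* Bisquare: smooth down-weighting, weight 0 beyond the cutoff */
    k_b = 4.685;
    if abs(e) <= k_b then w_bisq = (1 - (e/k_b)**2)**2;
    else w_bisq = 0;
    output;
  end;
run;

proc sgplot data=weights;
  series y=w_huber x=e;
  series y=w_bisq  x=e;
run;

Plotting the two series shows the key difference: Huber weights stay at 1 for small residuals, while bisquare weights decline smoothly from the start and hit exactly zero for extreme residuals.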
Description of the example data
For our data analysis below, we will use the data set crime. This dataset appears in Statistical Methods for Social Sciences, Third Edition by Alan Agresti and Barbara Finlay (Prentice Hall, 1997). The variables are state id (sid), state name (state), violent crimes per 100,000 people (crime), murders per 1,000,000 (murder), the percent of the population living in metropolitan areas (pctmetro), the percent of the population that is white (pctwhite), percent of population with a high school education or above (pcths), percent of population living under poverty line (poverty), and percent of population that are single parents (single). It has 51 observations. We are going to use poverty and single to predict crime.
data crime;
  infile "crime.csv" delimiter="," firstobs=2;
  input sid state $ crime murder pctmetro pctwhite pcths poverty single;
run;

proc means data = crime;
  var crime poverty single;
run;

The MEANS Procedure

Variable     N           Mean        Std Dev        Minimum        Maximum
----------------------------------------------------------------------------
crime       51    612.8431373    441.1003229     82.0000000        2922.00
poverty     51     14.2588235      4.5842415      8.0000000     26.4000000
single      51     11.3254902      2.1214941      8.4000000     22.1000000
----------------------------------------------------------------------------
Using robust regression analysis
In most cases, we begin by running an OLS regression and doing some diagnostics. We create a graph showing the leverage versus the squared residuals, labeling the points with the state abbreviations. To do so, we output the residuals and leverage from proc reg (along with Cook's D, which we will use later).
proc reg data = crime;
  model crime = poverty single;
  output out = t student=res cookd=cookd h=lev;
run;
quit;

data t;
  set t;
  resid_sq = res*res;
run;

proc sgplot data = t;
  scatter y = lev x = resid_sq / datalabel = state;
run;
As we can see, DC, Florida and Mississippi have either high leverage or large residuals. We can display the observations that have relatively large values of Cook’s D. A conventional cut-off point is 4/n, where n is the number of observations in the data set. We will use this criterion to select the values to display.
proc print data = t;
  where cookd > 4/51;
  var state crime poverty single cookd;
run;

Obs    state    crime    poverty    single      cookd

  1    ak         761        9.1      14.3    0.12547
  9    fl        1206       17.8      10.6    0.14259
 25    ms         434       24.7      14.7    0.61387
 51    dc        2922       26.4      22.1    2.63625
We probably should drop DC to begin with, since it is not even a state. We include it in the analysis just to show that it has a large Cook's D and to demonstrate how it will be handled by proc robustreg. Now we will look at the residuals. We will generate a new variable called rabs, the absolute value of the residual (because the sign of the residual doesn't matter). We then print the ten observations with the highest absolute residual values.
data t2;
  set t;
  rabs = abs(res);
run;

proc sort data = t2;
  by descending rabs;
run;

proc print data = t2 (obs=10);
run;

Obs  sid  state  crime  murder  pctmetro  pctwhite  pcths  poverty  single       res    cookd      lev  resid_sq     rabs

  1   25  ms       434    13.5      30.7      63.3   64.3     24.7    14.7  -3.56299  0.61387  0.12669   12.6949  3.56299
  2    9  fl      1206     8.9      93.0      83.5   74.4     17.8    10.6   2.90266  0.14259  0.04832    8.4255  2.90266
  3   51  dc      2922    78.5     100.0      31.8   73.1     26.4    22.1   2.61645  2.63625  0.53602    6.8458  2.61645
  4   46  vt       114     3.6      27.0      98.4   80.8     10.0    11.0  -1.74241  0.04272  0.04050    3.0360  1.74241
  5   26  mt       178     3.0      24.0      92.6   81.0     14.9    10.8  -1.46088  0.01676  0.02301    2.1342  1.46088
  6   21  me       126     1.6      35.7      98.5   78.8     10.7    10.6  -1.42674  0.02233  0.03186    2.0356  1.42674
  7    1  ak       761     9.0      41.8      75.2   86.6      9.1    14.3  -1.39742  0.12547  0.16161    1.9528  1.39742
  8   31  nj       627     5.3     100.0      80.8   76.7     10.9     9.6   1.35415  0.02229  0.03519    1.8337  1.35415
  9   14  il       960    11.4      84.0      81.0   76.2     13.6    11.5   1.33819  0.01266  0.02076    1.7908  1.33819
 10   20  md       998    12.7      92.8      68.9   78.4      9.7    12.0   1.28709  0.03570  0.06072    1.6566  1.28709
Now let's run our first robust regression. Robust regression is done by iteratively reweighted least squares (IRLS). The procedure for running robust regression in SAS is proc robustreg. Several weighting functions can be used for IRLS; we are going to first use the Huber weights in this example. We can also save the final weights created by the IRLS process, which can be very useful. We will use the data set t2 generated above.
proc robustreg data=t2 method=m (wf=huber);
  model crime = poverty single;
  output out = t3 weight=wgt;
run;

Model Information

Data Set                           WORK.T2
Dependent Variable                 crime
Number of Independent Variables    2
Number of Observations             51
Method                             M Estimation

Number of Observations Read    51
Number of Observations Used    51

Summary Statistics

                                                        Standard
Variable         Q1     Median         Q3       Mean   Deviation       MAD
poverty     10.7000    13.1000    17.4000    14.2588      4.5842    4.2995
single      10.0000    10.9000    12.1000    11.3255      2.1215    1.4826
crime         326.0      515.0      780.0      612.8       441.1     345.4

Parameter Estimates

                          Standard      95% Confidence        Chi-
Parameter  DF  Estimate      Error          Limits          Square  Pr > ChiSq
Intercept   1  -1423.23   167.5099   -1751.54   -1094.91     72.19      <.0001
poverty     1    8.8694     8.0429    -6.8944    24.6331      1.22      0.2701
single      1  169.0012    17.3795   134.9381   203.0644     94.56      <.0001
Scale       1  181.7251

Diagnostics Summary

Observation Type    Proportion    Cutoff
Outlier                 0.0392    3.0000

Goodness-of-Fit

Statistic      Value
R-Square      0.5257
AICR         73.1089
BICR         78.9100
Deviance     2216391

proc sort data = t3;
  by wgt;
run;

proc print data = t3 (obs=15);
  var state crime poverty single res cookd lev wgt;
run;

Obs  state  crime  poverty  single       res    cookd      lev      wgt

  1  ms       434     24.7    14.7  -3.56299  0.61387  0.12669  0.28886
  2  fl      1206     17.8    10.6   2.90266  0.14259  0.04832  0.35947
  3  vt       114     10.0    11.0  -1.74241  0.04272  0.04050  0.59545
  4  dc      2922     26.4    22.1   2.61645  2.63625  0.53602  0.64980
  5  mt       178     14.9    10.8  -1.46088  0.01676  0.02301  0.68630
  6  me       126     10.7    10.6  -1.42674  0.02233  0.03186  0.72509
  7  nj       627     10.9     9.6   1.35415  0.02229  0.03519  0.73812
  8  il       960     13.6    11.5   1.33819  0.01266  0.02076  0.76600
  9  ak       761      9.1    14.3  -1.39742  0.12547  0.16161  0.78039
 10  md       998      9.7    12.0   1.28709  0.03570  0.06072  0.79570
 11  ma       805     10.7    10.9   1.19854  0.01640  0.03311  0.83933
 12  la      1062     26.4    14.9  -1.02183  0.06700  0.16143  0.91528
 13  ca      1078     18.2    12.5   1.01521  0.01231  0.03458  1.00000
 14  wy       286     13.3    10.8  -0.96626  0.00667  0.02099  1.00000
 15  sc      1023     18.7    12.3   0.91213  0.01111  0.03853  1.00000
We can see that, roughly, as the absolute residual goes down, the weight goes up; in other words, cases with large residuals tend to be down-weighted. We can also see that the values of Cook's D don't correspond exactly to the weights. This output shows us that the observation for Mississippi will be down-weighted the most, and Florida will also be substantially down-weighted. All observations not shown above have a weight of 1. In OLS regression, all cases have a weight of 1. Hence, the more cases in the robust regression that have a weight close to one, the closer the results of the OLS and robust regressions. A quick visual check of this relationship is sketched below.
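One quick way to see the residual-weight relationship is to plot the final weights against the absolute residuals. This is just a sketch using the data sets already created above; it assumes that the output data set t3 carries along the rabs variable from the input data set t2, which the proc print output above suggests it does (res, cookd, and lev were carried over).

proc sgplot data = t3;
  /* each point is a state; weight should fall as |residual| grows */
  scatter y = wgt x = rabs / datalabel = state;
run;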
Next, let's run the same model, but with the bisquare weighting function (the default weighting function for M estimation in proc robustreg). Again, we can look at the weights.
proc robustreg data=t2 method=m (wf=bisquare);
  model crime = poverty single;
  output out = t4 weight=wgt;
run;

Model Information

Data Set                           WORK.T2
Dependent Variable                 crime
Number of Independent Variables    2
Number of Observations             51
Method                             M Estimation

Number of Observations Read    51
Number of Observations Used    51

Summary Statistics

                                                        Standard
Variable         Q1     Median         Q3       Mean   Deviation       MAD
poverty     10.7000    13.1000    17.4000    14.2588      4.5842    4.2995
single      10.0000    10.9000    12.1000    11.3255      2.1215    1.4826
crime         326.0      515.0      780.0      612.8       441.1     345.4

Parameter Estimates

                          Standard      95% Confidence        Chi-
Parameter  DF  Estimate      Error          Limits          Square  Pr > ChiSq
Intercept   1  -1535.36   164.4627   -1857.70   -1213.02     87.15      <.0001
poverty     1   11.6911     7.8966    -3.7859    27.1681      2.19      0.1387
single      1  175.9320    17.0633   142.4884   209.3755    106.31      <.0001
Scale       1  202.3464

Diagnostics Summary

Observation Type    Proportion    Cutoff
Outlier                 0.0392    3.0000

Goodness-of-Fit

Statistic      Value
R-Square      0.4545
AICR         55.1327
BICR         62.3771
Deviance     2071016

proc print data = t4 (obs=15);
  var state crime poverty single res cookd lev wgt;
run;

Obs  state  crime  poverty  single       res    cookd      lev      wgt

  1  ms       434     24.7    14.7  -3.56299  0.61387  0.12669  0.00764
  2  fl      1206     17.8    10.6   2.90266  0.14259  0.04832  0.25292
  3  vt       114     10.0    11.0  -1.74241  0.04272  0.04050  0.67151
  4  mt       178     14.9    10.8  -1.46088  0.01676  0.02301  0.73114
  5  nj       627     10.9     9.6   1.35415  0.02229  0.03519  0.75135
  6  la      1062     26.4    14.9  -1.02183  0.06700  0.16143  0.76887
  7  me       126     10.7    10.6  -1.42674  0.02233  0.03186  0.77412
  8  ak       761      9.1    14.3  -1.39742  0.12547  0.16161  0.77766
  9  il       960     13.6    11.5   1.33819  0.01266  0.02076  0.79368
 10  md       998      9.7    12.0   1.28709  0.03570  0.06072  0.79908
 11  ma       805     10.7    10.9   1.19854  0.01640  0.03311  0.81260
 12  dc      2922     26.4    22.1   2.61645  2.63625  0.53602  0.85455
 13  wy       286     13.3    10.8  -0.96626  0.00667  0.02099  0.88166
 14  ca      1078     18.2    12.5   1.01521  0.01231  0.03458  0.91174
 15  ga       723     13.5    13.0  -0.68492  0.00691  0.04232  0.92402
We can see that the weight given to Mississippi is dramatically lower under the bisquare weighting function than under the Huber weighting function, and that the parameter estimates from the two weighting methods differ. When comparing the results of a regular OLS regression and a robust regression, if the results are very different, you will most likely want to use the results from the robust regression: large differences suggest that the model parameters are being highly influenced by outliers. Regarding the choice of weight function, the SAS documentation notes that "estimates are more sensitive to the parameters of these weight functions than to the type of the weight function". However, different functions have advantages and drawbacks: Huber weights can have difficulties with severe outliers, and bisquare weights can have difficulties converging or may yield multiple solutions.
As you can see, the results from the two analyses are fairly different, especially with respect to the coefficient of single and the constant (intercept). While normally we are not interested in the constant, if you had centered one or both of the predictor variables, the constant would be useful. On the other hand, you will notice that poverty is not statistically significant in either analysis, while single is significant in both analyses.
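As noted earlier, DC is not even a state, so a natural follow-up (not shown in the output above; just a sketch, assuming the state variable holds the lowercase abbreviations printed earlier) is to re-fit both models without it and see how far the estimates move:

proc robustreg data=t2 method=m (wf=huber);
  where state ne "dc";   /* drop the District of Columbia */
  model crime = poverty single;
run;

proc robustreg data=t2 method=m (wf=bisquare);
  where state ne "dc";
  model crime = poverty single;
run;

If the coefficients change little, the robust fits were already handling DC well; if they change a lot, the weighting alone was not enough to contain its influence.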
Things to consider
- Proc robustreg does not address issues of heterogeneity of variance (heteroscedasticity). This problem can be addressed by using proc surveyreg instead; details can be found in this SAS Code Fragment, and a minimal call is sketched after this list.
- The examples shown here have presented SAS code for M estimation. There are other estimation options available in proc robustreg: least trimmed squares (LTS), S estimation, and MM estimation. See the SAS documentation for when and how these alternatives are preferable; an MM example is also sketched below.
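For reference, here are minimal sketches of the two follow-ups mentioned above. The statements use documented options (method=mm in proc robustreg; a plain proc surveyreg call, which reports heteroscedasticity-robust Taylor-series standard errors), but treat them as starting points rather than complete analyses:

* MM estimation: high breakdown value combined with high efficiency. ;
proc robustreg data=t2 method=mm;
  model crime = poverty single;
run;

* Robust standard errors for the least squares fit. ;
proc surveyreg data=t2;
  model crime = poverty single;
run;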
References
- Li, G. (1985). Robust regression. In D. C. Hoaglin, F. Mosteller, and J. W. Tukey (Eds.), Exploring Data Tables, Trends, and Shapes. New York: Wiley.
- Chen, C. (2002). Robust Regression and Outlier Detection with the ROBUSTREG Procedure. Proceedings of the Twenty-Seventh Annual SAS Users Group International Conference. Cary, NC: SAS Institute Inc.
- Fox, J. (1997). Applied Regression Analysis, Linear Models, and Related Methods. Thousand Oaks, CA: Sage Publications.