Version info: Code for this page was tested in Stata 12.
NOTE: We will use the NHANES II data as an example.
The question
Let’s say that you ran an OLS regression model with survey data in Stata.
use http://www.stata-press.com/data/r12/nhanes2.dta, clear
svyset psu [pw=finalwgt], strata(strata)

      pweight: finalwgt
          VCE: linearized
  Single unit: missing
     Strata 1: strata
         SU 1: psu
        FPC 1: <zero>

svy: regress weight height age female
(running regress on estimation sample)

Survey: Linear regression

Number of strata   =        31                  Number of obs      =     10351
Number of PSUs     =        62                  Population size    = 117157513
                                                Design df          =        31
                                                F(   3,     29)    =   1177.18
                                                Prob > F           =    0.0000
                                                R-squared          =    0.2827

------------------------------------------------------------------------------
             |             Linearized
      weight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      height |   .7405073    .027744    26.69   0.000     .6839229    .7970917
         age |   .1484546   .0116501    12.74   0.000      .124694    .1722153
      female |  -2.898197   .5888597    -4.92   0.000    -4.099184   -1.697209
       _cons |   -57.6088   4.955696   -11.62   0.000      -67.716   -47.50159
------------------------------------------------------------------------------
At the top of the output, you see the test of the overall regression model: F(3, 29) = 1177.18, p < 0.0001.
Next, you run the same model in SAS.
proc surveyreg data = nhanes2;
  cluster psu;
  strata strata;
  weight finalwgt;
  model weight = height age female;
run;

The SURVEYREG Procedure

Regression Analysis for Dependent Variable weight

              Data Summary

Number of Observations           10351
Sum of Weights               117157513
Weighted Mean of weight       71.90064
Weighted Sum of weight      8423699699

       Design Summary

Number of Strata            31
Number of Clusters          62

      Fit Statistics

R-square            0.2827
Root MSE           13.0725
Denominator DF          31

      Tests of Model Effects

Effect       Num DF    F Value    Pr > F
Model             3    1258.00    <.0001
Intercept         1     135.10    <.0001
height            1     712.19    <.0001
age               1     162.33    <.0001
female            1      24.22    <.0001

NOTE: The denominator degrees of freedom for the F tests is 31.

          Estimated Regression Coefficients

                              Standard
Parameter       Estimate         Error    t Value    Pr > |t|
Intercept     -57.608796    4.95641443     -11.62      <.0001
height          0.740507    0.02774807      26.69      <.0001
age             0.148455    0.01165183      12.74      <.0001
female         -2.898197    0.58894508      -4.92      <.0001

NOTE: The denominator degrees of freedom for the t tests is 31.
The results for the overall test of the regression model are reported as F(3, 31) = 1258.00, p < .0001. Both the test statistic and denominator degrees of freedom are different from your Stata output, so you decide to run the model in SUDAAN.
proc regress data = nhanes2 filetype = sas design = wr;
  weight finalwgt;
  nest strata psu;
  model weight = height age female;
run;

S U D A A N
Software for the Statistical Analysis of Correlated Data
Copyright Research Triangle Institute October 2009
Release 10.0.1

DESIGN SUMMARY: Variances will be computed using the Taylor Linearization Method,
                Assuming a With Replacement (WR) Design
  Sample Weight: FINALWGT
  Stratification Variables(s): STRATA
  Primary Sampling Unit: PSU

Number of observations read          :  10351    Weighted count: 117157513
Observations used in the analysis    :  10351    Weighted count: 117157513
Denominator degrees of freedom       :     31

Maximum number of estimable parameters for the model is 4

File NHANES2 contains 62 Clusters
62 clusters were used to fit the model
Maximum cluster size is 288 records
Minimum cluster size is 67 records

Weighted mean response is 71.900636

Multiple R-Square for the dependent variable WEIGHT: 0.282704

------------------------------------------------------------------------------------
Independent                                                                  P-value
  Variables and     Beta                 Lower 95%    Upper 95%    T-Test    T-Test
  Effects           Coeff.    SE Beta    Limit Beta   Limit Beta   B=0       B=0
------------------------------------------------------------------------------------
Intercept           -57.61       4.96       -67.72       -47.50    -11.62    0.0000
HEIGHT                0.74       0.03         0.68         0.80     26.69    0.0000
AGE                   0.15       0.01         0.12         0.17     12.74    0.0000
FEMALE               -2.90       0.59        -4.10        -1.70     -4.92    0.0000
------------------------------------------------------------------------------------

-------------------------------------------------------
Contrast                  Degrees of            P-value
                          Freedom     Wald F    Wald F
-------------------------------------------------------
OVERALL MODEL                   4   58649.64    0.0000
MODEL MINUS INTERCEPT           3    1258.36    0.0000
INTERCEPT                       1     135.14    0.0000
HEIGHT                          1     712.39    0.0000
AGE                             1     162.38    0.0000
FEMALE                          1      24.22    0.0000
-------------------------------------------------------
The test of the overall model (the MODEL MINUS INTERCEPT contrast) is F(3, 31) = 1258.36, p < 0.0001. The test statistic is very close to the one in the SAS output, and the denominator degrees of freedom match the SAS output. What is going on?
The answer
By default, Stata reports an adjusted Wald F test in the output, while SAS and SUDAAN do not. To have Stata match the results given by SAS and SUDAAN, you can use the nosvyadjust option on the test command. (We use the test command with all of the predictor variables in the model to recreate the test of the overall regression shown at the top of the Stata output.)
svy: regress weight height age female
(running regress on estimation sample)

Survey: Linear regression

Number of strata   =        31                  Number of obs      =     10351
Number of PSUs     =        62                  Population size    = 117157513
                                                Design df          =        31
                                                F(   3,     29)    =   1177.18
                                                Prob > F           =    0.0000
                                                R-squared          =    0.2827

------------------------------------------------------------------------------
             |             Linearized
      weight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      height |   .7405073    .027744    26.69   0.000     .6839229    .7970917
         age |   .1484546   .0116501    12.74   0.000      .124694    .1722153
      female |  -2.898197   .5888597    -4.92   0.000    -4.099184   -1.697209
       _cons |   -57.6088   4.955696   -11.62   0.000      -67.716   -47.50159
------------------------------------------------------------------------------

test height age female

Adjusted Wald test

 ( 1)  height = 0
 ( 2)  age = 0
 ( 3)  female = 0

       F(  3,    29) = 1177.18
            Prob > F =    0.0000
The F test reported by regress and the one reported by test match.
test height age female, nosvyadjust

Unadjusted Wald test

 ( 1)  height = 0
 ( 2)  age = 0
 ( 3)  female = 0

       F(  3,    31) = 1258.36
            Prob > F =    0.0000
The output from test, nosvyadjust differs from the Stata output above but matches the SAS and SUDAAN output. Alternatively, you could use the adjwaldf and adjwaldp options on the print command in SUDAAN to reproduce the results that Stata gives by default.
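To compare the two tests side by side without reading them off the output, the results that test leaves behind can be displayed directly. This is a minimal sketch, assuming the same data and svyset design as above, and assuming that test stores its statistic and degrees of freedom in r(F), r(df), and r(df_r), as it does in the Stata versions we know of.

```stata
* Sketch: display the adjusted and unadjusted overall F tests side by side.
* Assumes the NHANES II example data and the survey design used on this page.
use http://www.stata-press.com/data/r12/nhanes2.dta, clear
svyset psu [pw=finalwgt], strata(strata)
svy: regress weight height age female

* Adjusted Wald test (the Stata default after svy estimation)
quietly test height age female
display "adjusted:   F(" r(df) ", " r(df_r) ") = " %9.2f r(F)

* Unadjusted Wald test (the version reported by SAS and SUDAAN)
quietly test height age female, nosvyadjust
display "unadjusted: F(" r(df) ", " r(df_r) ") = " %9.2f r(F)
```

If the stored results behave as assumed, the first line should echo the F(3, 29) test from the regress header and the second the F(3, 31) test reported by SAS and SUDAAN.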
The "why" and the degrees of freedom
A discussion of the adjusted Wald test is given on page 2184 of the Stata 12 Base Reference Manual (in the section for the test command). It cites the 1990 American Statistician article by Edward Korn and Barry Graubard entitled "Simultaneous testing of regression coefficients with complex survey data: Use of Bonferroni t statistics". Basically, they argue that the adjusted test statistic is more appropriate when more than a few terms are being tested simultaneously (in other words, when there are more predictors in the model). The adjusted test statistic (what the authors call the Wald procedure) has numerator degrees of freedom p, the number of predictors (excluding the intercept), and denominator degrees of freedom (# of PSUs) – (# of strata) – p + 1. In the example above, we have 62 PSUs, 31 strata and 3 predictors. Hence, the denominator degrees of freedom are calculated as 62 – 31 – 3 + 1 = 29. In SAS and SUDAAN, you see notes indicating that the denominator degrees of freedom equal 31, which is simply the number of PSUs minus the number of strata: 62 – 31 = 31.
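The arithmetic linking the two sets of results can be written out as a short worked equation. This is a sketch in our own notation rather than something quoted from the manual: d is the design degrees of freedom, p is the number of coefficients tested, and F_unadj and F_adj are the unadjusted and adjusted statistics. The scaling factor (d - p + 1)/d is our understanding of how Stata adjusts the Wald F, so treat it as an assumption to check against the Stata documentation.

```latex
\begin{align*}
d &= \#\text{PSUs} - \#\text{strata} = 62 - 31 = 31, \\
F_{\text{unadj}} &\sim F(p,\, d) = F(3,\, 31), \\
F_{\text{adj}} &= \frac{d - p + 1}{d}\, F_{\text{unadj}} \sim F(p,\, d - p + 1) = F(3,\, 29), \\
F_{\text{adj}} &= \frac{29}{31} \times 1258.36 \approx 1177.18 .
\end{align*}
```

Under this assumption, rescaling the unadjusted F from SUDAAN (or from test, nosvyadjust) by 29/31 recovers the F statistic at the top of the Stata regression output.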
References
Korn, E. and Graubard, B. (1990). Simultaneous testing of regression coefficients with complex survey data: Use of Bonferroni t statistics. American Statistician, Vol. 44, No. 4, pages 270-276.
Stata 12 Base Reference Manual. College Station, TX: Stata Press.