Chapter 3: Simple random sampling
page 53 simple random sampling
This example uses the momsag data set.
NOTE: The n= option on the proc surveymeans statement gives the size of the population, here 773 primary sampling units (PSUs). SAS uses this value to compute the finite population correction.
proc surveymeans data = momsag n = 773 mean sum std;
  weight weight1;
  var momsag;
run;

The SURVEYMEANS Procedure

Data Summary

Number of Observations            25
Sum of Weights            773.000002

Statistics

                          Std Error
Variable        Mean        of Mean          Sum       Std Dev
---------------------------------------------------------------
MOMSAG      0.920000       0.054475   711.160002     42.108894
---------------------------------------------------------------
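The standard error can be checked by hand with the usual simple random sampling formulas. The variance of the mean is (1 - n/N) s^2 / n. The total and its Std Dev are N times the mean and N times the standard error. The sketch below assumes momsag is coded 0/1, so a mean of 0.92 means that 23 of the 25 sampled units have the value 1.

```python
import math

# Assumption: momsag is coded 0/1 and 23 of the 25 sampled units are 1 (mean 0.92).
sample = [1] * 23 + [0] * 2
n = len(sample)   # sample size (25)
N = 773           # population size, the n= value on proc surveymeans

mean = sum(sample) / n
s2 = sum((y - mean) ** 2 for y in sample) / (n - 1)  # sample variance
fpc = 1 - n / N                                       # finite population correction
se_mean = math.sqrt(fpc * s2 / n)

total = N * mean          # estimated population total ("Sum")
se_total = N * se_mean    # standard error of the total ("Std Dev")

print(f"mean={mean:.6f} se={se_mean:.6f} sum={total:.6f} sd={se_total:.6f}")
```

Both printed standard errors agree with the SAS output to the precision shown. This is consistent with SAS applying the fpc 1 - 25/773.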
Chapter 4: Systematic sampling
page 109 repeated systematic sampling
This example uses the wloss2 data set.
NOTE: In SAS, the variable that identifies the PSUs goes on the cluster statement. The coloring in the (enhanced) program editor window does not handle the cluster statement properly, so the keyword cluster may not turn blue. This does not mean that you have specified the statement incorrectly.
proc surveymeans data = wloss2 n = 54 sum std mean;
  weight wt1;
  cluster cluster;
  var xi;
run;

The SURVEYMEANS Procedure

Data Summary

Number of Clusters                 6
Number of Observations            18
Sum of Weights                   162

Statistics

                          Std Error
Variable        Mean        of Mean          Sum       Std Dev
---------------------------------------------------------------
XI          4.500000       0.530548   729.000000     85.948822
---------------------------------------------------------------
Chapter 5: Stratification and stratified random sampling
page 138 stratification and stratified random sampling
This example uses the hospsamp data set.
data second138;
  input id _TOTAL_ oblevel;
  cards;
1 42 1
2 42 1
3 42 1
4 42 1
5 99 2
6 99 2
7 99 2
8 99 2
9 99 2
10 17 3
11 17 3
12 17 3
13 17 3
14 17 3
15 17 3
;
run;
NOTE: You cannot get the totals for both the whole group and the sub-groups in the same proc surveymeans.
NOTE: The data set second138 tells SAS the population total in each stratum. These totals are used to compute the finite population correction (fpc). SAS allows only one number to be supplied on the proc surveymeans statement. Because the totals change from one stratum to the next, we have to supply them to SAS in a data set. These data can go in the primary data set or in a secondary data set; this example uses a secondary data set. The secondary data set can also be "collapsed", that is, reduced to just one line (observation) for each stratum. In the secondary data set, the variable that contains the totals must be called _TOTAL_. The variable oblevel is copied from the original data set because SAS requires every variable listed on the strata statement to appear in this data set. In this example only one variable is listed on the strata statement, but in other cases there may be two or more.
proc surveymeans data = hospsamp n = second138 sum;
  weight weighta;
  strata oblevel;
  domain oblevel;
  var births;
run;

The SURVEYMEANS Procedure

Data Summary

Number of Strata                   3
Number of Observations            15
Sum of Weights            157.999931

Statistics

Variable          Sum       Std Dev
-----------------------------------
BIRTHS         183983         34014
-----------------------------------

Domain Analysis: OBLEVEL

OBLEVEL  Variable          Sum       Std Dev
--------------------------------------------
1        BIRTHS          14931   2669.856738
2        BIRTHS         117117         33068
3        BIRTHS          51935   7508.399372
--------------------------------------------
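The sketch below makes the role of _TOTAL_ concrete. It collapses the 15 lines of second138 to one line per stratum, then derives each stratum's weight N_h/n_h and its fpc. It assumes, as the matching id values suggest, that hospsamp has as many sampled hospitals in each stratum as second138 has lines (4, 5 and 6). Under that assumption the weights add up to 42 + 99 + 17 = 158, in line with the printed Sum of Weights of 157.999931.

```python
from collections import Counter

# (oblevel, _TOTAL_) for the 15 lines of data set second138; the id column is dropped.
second138 = [(1, 42)] * 4 + [(2, 99)] * 5 + [(3, 17)] * 6

# "Collapse" to one line per stratum: every line in a stratum carries the same total.
N_h = dict(second138)

# Assumption: hospsamp has as many sampled hospitals in each stratum
# as second138 has lines for that stratum.
n_h = Counter(stratum for stratum, _ in second138)

weights = {h: N_h[h] / n_h[h] for h in N_h}   # design weight N_h / n_h
fpcs = {h: 1 - n_h[h] / N_h[h] for h in N_h}  # stratum-level fpc

for h in sorted(N_h):
    print(f"oblevel={h}  _TOTAL_={N_h[h]}  n_h={n_h[h]}  "
          f"weight={weights[h]:.4f}  fpc={fpcs[h]:.4f}")

sum_of_weights = sum(weights[h] * n_h[h] for h in N_h)
print(f"sum of weights = {sum_of_weights:.6f}")
```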
Chapter 6: Stratified random sampling: Further issues
page 167 stratified random sampling: allocation of sample to strata
This example uses the jacktwn data set and the secondjt data set. Please see the SAS documentation for information on how to create the secondary data sets needed.
data secondjt;
  input stratum _TOTAL_;
  cards;
1 39867.75
2 11256.75
3 10019.75
7 2542
8 99.25
9 151.5
13 79735.5
14 22513.5
15 20039.5
19 5084
20 198.5
21 303
25 39867.75
26 11256.75
27 11269.75
31 2542
32 99.25
33 151.5
;
run;
The SURVEYMEANS Procedure

Data Summary

Number of Strata                  18
Number of Observations           831
Sum of Weights                256998

Statistics

                          Std Error
Variable        Mean        of Mean          Sum       Std Dev
---------------------------------------------------------------
TWIN        0.101384       0.014751        26055   3791.043615
---------------------------------------------------------------

Domain Analysis: QUART1

                                  Std Error
QUART1  Variable        Mean        of Mean          Sum       Std Dev
-----------------------------------------------------------------------
1       TWIN        0.300042       0.041629        19184   2661.629006
2       TWIN        0.052692       0.021088  6737.906748   2696.605016
3       TWIN        0.002051       0.001944   133.686957    126.744274
-----------------------------------------------------------------------
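The secondary data set can be checked against this output in two ways:

- secondjt has 18 strata, matching Number of Strata, and its _TOTAL_ values add up to 256998, the printed Sum of Weights.
- The three QUART1 domain totals add back to the overall TWIN total of about 26055, up to the rounding in the printout.

A quick check, using only values listed on this page:

```python
# (stratum, _TOTAL_) pairs from data set secondjt.
secondjt = {
    1: 39867.75, 2: 11256.75, 3: 10019.75, 7: 2542, 8: 99.25, 9: 151.5,
    13: 79735.5, 14: 22513.5, 15: 20039.5, 19: 5084, 20: 198.5, 21: 303,
    25: 39867.75, 26: 11256.75, 27: 11269.75, 31: 2542, 32: 99.25, 33: 151.5,
}

population_size = sum(secondjt.values())
print(len(secondjt), "strata, population size", population_size)

# Domain totals for TWIN as printed in the QUART1 domain analysis.
domain_sums = [19184, 6737.906748, 133.686957]
print("sum of domain totals:", round(sum(domain_sums), 2))
```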
page 173 stratified random sampling: Stratification after sampling
This has been skipped for now.
Chapter 7: Ratio estimation
page 198 ratio estimation
This example uses the tab7pt1 data set. Note that there may be a typo in the text for the weighted X-sum and the weighted Y-sum.
NOTE: The ratio statement (and the ratio keyword on the proc surveymeans statement) is new in SAS version 9. If you are using SAS version 8, you will need to calculate the ratio by hand by dividing one estimated sum by the other. The standard error of the ratio, however, cannot be obtained this way.
proc surveymeans data = tab7pt1 n = 8 sum;
  weight wt1;
  var pharmexp totmedex;
  ratio 'ratio estimation' pharmexp / totmedex;
run;

The SURVEYMEANS Procedure

Data Summary

Number of Observations             7
Sum of Weights                     8

Statistics

Variable          Sum       Std Dev
-----------------------------------
PHARMEXP      1028571         58612
TOTMEDEX      3222857        150771
-----------------------------------

Ratio Analysis: ratio estimation

Numerator  Denominator     Ratio    Std Err
--------------------------------------------
PHARMEXP   TOTMEDEX     0.319149   0.004007
--------------------------------------------
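In SAS version 8 the ratio itself is just the quotient of the two printed sums. Its standard error needs the linearization (Taylor series) approach:

1. Form the residuals e_i = y_i - R x_i.
2. Compute the standard error of their weighted total as for any other total.
3. Divide by the estimated X total.

The sketch below assumes an unstratified, unclustered design in which each observation is its own PSU, as in this example, with fpc 1 - n/N. The helper ratio_se is hypothetical, and so are the toy data in the check, since the tab7pt1 data are not listed here.

```python
import math

def ratio_se(y, x, w, N):
    """Hypothetical helper: ratio sum(w*y)/sum(w*x) and its linearized SE.

    Assumes each observation is a PSU (no strata or clusters) and applies
    the finite population correction 1 - n/N.
    """
    n = len(y)
    Y = sum(wi * yi for wi, yi in zip(w, y))
    X = sum(wi * xi for wi, xi in zip(w, x))
    R = Y / X
    # Weighted residuals y_i - R * x_i, one per PSU.
    z = [wi * (yi - R * xi) for wi, yi, xi in zip(w, y, x)]
    zbar = sum(z) / n
    var_total = (1 - n / N) * n / (n - 1) * sum((zi - zbar) ** 2 for zi in z)
    return R, math.sqrt(var_total) / X

# The SAS version 8 workaround: the ratio from the two printed sums.
print(round(1028571 / 3222857, 6))
```

When every x_i is 1, the ratio reduces to a mean and ratio_se reduces to the ordinary standard error of a mean. This is a convenient sanity check on the formula.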
Chapter 9: Simple one-stage cluster sampling
page 250 simple one-stage cluster sampling
This example uses the tab9_1c data set.
proc surveymeans data = tab9_1c n = 5 sum mean;
  weight wt1;
  cluster devlpmnt;
  var nge65 nvstnrs hhneedvn;
run;

The SURVEYMEANS Procedure

Data Summary

Number of Clusters                 2
Number of Observations            40
Sum of Weights                   100

Statistics

                          Std Error
Variable        Mean        of Mean          Sum       Std Dev
---------------------------------------------------------------
NGE65       1.675000       0.019365   167.500000      1.936492
NVSTNRS     0.575000       0.019365    57.500000      1.936492
HHNEEDVN    0.525000       0.019365    52.500000      1.936492
---------------------------------------------------------------
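With only two sampled clusters, the standard error of a total depends only on the difference between the two weighted cluster totals. The identical Std Dev values therefore mean that, for each variable, the two cluster totals differ by the same amount.

The sketch below applies the Taylor-series estimator for a one-stage cluster sample, as we understand the proc surveymeans formula:

- take the weighted cluster totals;
- compute their between-cluster variance multiplied by m/(m - 1);
- apply the fpc 1 - m/N.

The tab9_1c data are not listed here, so the cluster totals 34 and 33 are hypothetical. They were chosen to match the printed NGE65 sum; any two totals that differ by 1 give the printed 1.936492.

```python
import math

def cluster_total(cluster_totals, weight, N):
    """Estimated total and its standard error for one-stage cluster sampling.

    cluster_totals: unweighted totals for the sampled clusters
    weight:         the common weight M/m carried by every observation
    N:              number of clusters in the population (the n= value)
    """
    m = len(cluster_totals)
    z = [weight * t for t in cluster_totals]   # weighted cluster totals
    zbar = sum(z) / m
    var = (1 - m / N) * m / (m - 1) * sum((zi - zbar) ** 2 for zi in z)
    return sum(z), math.sqrt(var)

# Hypothetical NGE65 cluster totals (the tab9_1c data are not listed here).
total, sd = cluster_total([34, 33], weight=2.5, N=5)
print(total, round(sd, 6))
```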
page 253 simple one-stage cluster sampling
data probbksm;
  input record district eligible treated w n;
  cards;
1 6 486 79 2.6 26
2 10 240 94 2.6 26
3 14 428 17 2.6 26
4 15 343 57 2.6 26
5 17 1130 63 2.6 26
6 19 983 10 2.6 26
7 20 333 58 2.6 26
8 21 13 0 2.6 26
9 22 1506 101 2.6 26
10 25 1755 411 2.6 26
;
run;

proc surveymeans data = probbksm n = 26 sum;
  weight w;
  var treated eligible;
run;

The SURVEYMEANS Procedure

Data Summary

Number of Observations            10
Sum of Weights                    26

Statistics

Variable          Sum       Std Dev
-----------------------------------
treated   2314.000000    763.030216
eligible        18764   3778.280164
-----------------------------------
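Because the probbksm data are listed in full, the printed sums and standard deviations can be reproduced directly. As in the proc step (which has no cluster statement), each district is treated as a PSU carrying weight 2.6. The sketch uses the Taylor-series formula for a total with fpc 1 - 10/26.

```python
import math

# District data from data step probbksm.
treated = [79, 94, 17, 57, 63, 10, 58, 0, 101, 411]
eligible = [486, 240, 428, 343, 1130, 983, 333, 13, 1506, 1755]
w = 2.6   # weight carried by each sampled district (26 / 10)
N = 26    # districts in the population (the n = 26 option)

def total_and_sd(y, w, N):
    """Estimated total and its standard error when each observation is a PSU."""
    n = len(y)
    z = [w * yi for yi in y]   # weighted PSU totals
    zbar = sum(z) / n
    var = (1 - n / N) * n / (n - 1) * sum((zi - zbar) ** 2 for zi in z)
    return sum(z), math.sqrt(var)

for name, y in [("treated", treated), ("eligible", eligible)]:
    total, sd = total_and_sd(y, w, N)
    print(f"{name:8s} sum={total:.6f} sd={sd:.6f}")
```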
Chapter 10: Two-stage cluster sampling: Clusters sampled with equal probability
page 285 two-stage cluster sampling: clusters sampled with equal probability, all clusters have the same n, and sampling is without replacement
data pt1;
  input center nurse m nbar w npatnts nrefrred;
  cards;
1 2 5 3 2.5 44 6
1 3 5 3 2.5 18 6
2 1 5 3 2.5 42 3
2 3 5 3 2.5 10 2
4 1 5 3 2.5 16 5
4 2 5 3 2.5 32 14
;
run;
This has been skipped for now.
page 286 two-stage cluster sampling: clusters sampled with equal probability, all clusters have the same n and sampling is with replacement (or the sampling fractions are small at each stage)
This has been skipped for now.
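page 310 two-stage cluster sampling: clusters sampled with equal probability, not all clusters have the same n
This example uses the pt210 data set. The total= option is another name for the n= option used in the other examples. Here it names a secondary data set, secondpt210, which is not shown on this page.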
proc surveymeans data = pt210 total = secondpt210 sum;
  weight w;
  cluster hospno;
  var dxdead lifethrt;
run;

The SURVEYMEANS Procedure

Data Summary

Number of Clusters                 3
Number of Observations           708
Sum of Weights            23599.9883

Statistics

Variable          Sum       Std Dev
-----------------------------------
DXDEAD     499.378967    114.637498
LIFETHRT  2932.319092    905.894114
-----------------------------------
Chapter 11: Cluster sampling in which clusters are sampled with unequal probability: Probability proportional to size sampling
page 350 cluster sampling with unequal probabilities: probability proportional to size sampling
This example uses the hospslct data set.
proc surveymeans data = hospslct n = 4672 sum mean;
  weight wstar;
  cluster drawing;
  var lifethrt dxdead;
run;

The SURVEYMEANS Procedure

Data Summary

Number of Clusters                 5
Number of Observations            50
Sum of Weights                 50056

Statistics

                          Std Error
Variable        Mean        of Mean          Sum       Std Dev
---------------------------------------------------------------
LIFETHRT    0.120000       0.019989  6006.720000   1000.584155
DXDEAD      0.040000       0.024482  2002.240000   1225.460312
---------------------------------------------------------------
page 353 cluster sampling with unequal probabilities: probability proportional to size sampling
data hspslct2;
  set hospslct;
  /* n is 50 */
  /* N_i is admiss */
  /* X is 7087, the total number of life-threatening conditions across all the hospitals */
  /* X_i is tl, the total number of life-threatening conditions for each hospital */
  if hospno = 2 then tl = 785;
  if hospno = 5 then tl = 3404;
  if hospno = 9 then tl = 778;
  w2star = (admiss/50)*(7087/tl);
run;

proc surveymeans data = hspslct2 n = 4672 mean sum;
  weight w2star;
  cluster drawing;
  var lifethrt dxdead;
run;

The SURVEYMEANS Procedure

Data Summary

Number of Clusters                 5
Number of Observations            50
Sum of Weights            51344.9973

Statistics

                          Std Error
Variable        Mean        of Mean          Sum       Std Dev
---------------------------------------------------------------
LIFETHRT    0.121904       0.021404  6259.175814   1276.638273
DXDEAD      0.034287       0.022991  1760.471466   1078.465840
---------------------------------------------------------------
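The w2star formula in the data step can be read as the reciprocal of the expected number of times a patient is selected. The design here is an assumption suggested by the 5 clusters (drawings) and 50 observations: m = 5 PPS draws of hospitals with replacement, each followed by a subsample of n̄ = 10 patients. Under that assumption, with N_i = admiss, X_i = tl and X = 7087:

```latex
\begin{aligned}
P(\text{hospital } i \text{ chosen on a given draw}) &= \frac{X_i}{X},\\
E(\text{times patient } j \text{ of hospital } i \text{ is selected})
  &= m \cdot \frac{X_i}{X} \cdot \frac{\bar{n}}{N_i}
   = \frac{n\,X_i}{X\,N_i},
  \qquad n = m\,\bar{n} = 5 \times 10 = 50,\\
w^{*}_{ij} &= \frac{X\,N_i}{n\,X_i}
   = \frac{N_i}{50}\cdot\frac{7087}{X_i}
   = \frac{\texttt{admiss}}{50}\cdot\frac{7087}{\texttt{tl}}.
\end{aligned}
```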
Chapter 12: Variance estimation in complex sample surveys
page 370 variance estimation in complex sample surveys: linearization
This example uses the exmp12_2 data set.
proc surveymeans data = exmp12_2 n = 65 sum;
  weight w;
  var ovpaymnt payment;
run;

The SURVEYMEANS Procedure

Data Summary

Number of Observations            10
Sum of Weights                    65

Statistics

Variable          Sum       Std Dev
-----------------------------------
OVPAYMNT  6922.500000   1844.835998
PAYMENT         17914   2540.097723
-----------------------------------
NOTE: The amblnce2 data file on the Wiley website has a problem: its first variable has no name. To open the file, you need SAS 6.12 and its viewer (the viewer is not available in SAS 8.x). Give the first variable a name and save the file. Then use StatTransfer (or whatever method you like) to convert the file to SAS 8.x format. The data set is printed on page 377, except for the replicate weight variables.
pages 376 and 377 variance estimation in complex sample surveys: replication methods
This example uses the am (amblnce2) data set.
proc surveymeans data = am n = 5 sum;
  cluster esa;
  weight wt;
  var cardarrs alive;
run;

The SURVEYMEANS Procedure

Data Summary

Number of Clusters                 3
Number of Observations             6
Sum of Weights                    15

Statistics

Variable          Sum       Std Dev
-----------------------------------
CARDARRS  4527.500000   1444.644420
ALIVE      695.000000    142.047527
-----------------------------------
page 379 variance estimation in complex sample surveys: replication methods
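NOTE: The versions of proc surveymeans used for these examples cannot handle replicate weights. Later releases (SAS/STAT 9.2 and up) add a repweights statement and a varmethod= option for replication methods.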
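Replication methods estimate a variance by recomputing the estimate under a set of replicate weights and measuring how much the replicate estimates vary.

As an illustration only, the sketch below implements the simplest scheme, the delete-one-PSU (JK1) jackknife for an unstratified total. Each replicate drops one PSU and inflates the remaining weights by n/(n - 1). The replicate weights supplied with amblnce2 may follow a different scheme, and the PSU totals used here are hypothetical.

For a linear estimator such as a total, JK1 gives the same standard error as linearization without an fpc. Printing both values shows this.

```python
import math

def jk1_total(z):
    """Delete-one-PSU jackknife (JK1) standard error for an estimated total.

    z: weighted totals of the sampled PSUs (no strata). Each replicate drops
    one PSU and multiplies the remaining weights by n / (n - 1).
    """
    n = len(z)
    full = sum(z)
    replicates = [n / (n - 1) * (full - zr) for zr in z]
    rbar = sum(replicates) / n
    var = (n - 1) / n * sum((t - rbar) ** 2 for t in replicates)
    return full, math.sqrt(var)

def linearized_total(z):
    """Taylor-series standard error of the same total, without an fpc."""
    n = len(z)
    zbar = sum(z) / n
    return sum(z), math.sqrt(n / (n - 1) * sum((zi - zbar) ** 2 for zi in z))

# Hypothetical weighted PSU totals; these are not the amblnce2 data.
z = [120.0, 95.0, 143.0]
print(jk1_total(z), linearized_total(z))
```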
page 485 strategies for design-based analysis of sample survey data
data ch16;
  input id region nurshome patient medicaid rgnhomes nhadmiss;
  weight = (rgnhomes*nhadmiss)/10;
  cards;
1 1 1 1 1 12 123
2 1 1 2 1 12 123
3 1 1 3 1 12 123
4 1 1 4 0 12 123
5 1 1 5 1 12 123
6 1 2 1 0 12 89
7 1 2 2 0 12 89
8 1 2 3 1 12 89
9 1 2 4 0 12 89
10 1 2 5 0 12 89
11 2 1 1 1 20 231
12 2 1 2 0 20 231
13 2 1 3 1 20 231
14 2 1 4 0 20 231
15 2 1 5 1 20 231
16 2 2 1 0 20 187
17 2 2 2 0 20 187
18 2 2 3 0 20 187
19 2 2 4 1 20 187
20 2 2 5 0 20 187
21 3 1 1 1 11 43
22 3 1 2 1 11 43
23 3 1 3 1 11 43
24 3 1 4 0 11 43
25 3 1 5 1 11 43
26 3 2 1 1 11 49
27 3 2 2 1 11 49
28 3 2 3 1 11 49
29 3 2 4 1 11 49
30 3 2 5 0 11 49
31 4 1 1 0 8 56
32 4 1 2 1 8 56
33 4 1 3 1 8 56
34 4 1 4 0 8 56
35 4 1 5 0 8 56
36 4 2 1 0 8 38
37 4 2 2 0 8 38
38 4 2 3 0 8 38
39 4 2 4 0 8 38
40 4 2 5 1 8 38
41 5 1 1 1 6 359
42 5 1 2 0 6 359
43 5 1 3 1 6 359
44 5 1 4 1 6 359
45 5 1 5 0 6 359
46 5 2 1 0 6 460
47 5 2 2 1 6 460
48 5 2 3 0 6 460
49 5 2 4 1 6 460
50 5 2 5 0 6 460
;
run;
This has been skipped for now.
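The weight in data step ch16 is rgnhomes*nhadmiss/10. Reading rgnhomes as the number of nursing homes in the region and nhadmiss as the number of admissions to the home (an assumption based on the variable names), this weight is a product of two inverse selection fractions. Two homes are drawn per region, giving rgnhomes/2, and five patients per home, giving nhadmiss/5.

The sketch below rebuilds the weights from the listed data and forms the weighted number and proportion of Medicaid patients. It does not attempt standard errors, which this section has not worked through.

```python
# Data set ch16, one entry per sampled home:
# (region, rgnhomes, nhadmiss, medicaid values for its five sampled patients)
# Assumption (from the variable names): rgnhomes = homes in the region,
# nhadmiss = admissions to the home; 2 homes per region, 5 patients per home.
homes = [
    (1, 12, 123, [1, 1, 1, 0, 1]), (1, 12, 89, [0, 0, 1, 0, 0]),
    (2, 20, 231, [1, 0, 1, 0, 1]), (2, 20, 187, [0, 0, 0, 1, 0]),
    (3, 11, 43, [1, 1, 1, 0, 1]), (3, 11, 49, [1, 1, 1, 1, 0]),
    (4, 8, 56, [0, 1, 1, 0, 0]), (4, 8, 38, [0, 0, 0, 0, 1]),
    (5, 6, 359, [1, 0, 1, 1, 0]), (5, 6, 460, [0, 1, 0, 1, 0]),
]

sum_w = 0.0
medicaid_total = 0.0
for region, rgnhomes, nhadmiss, medicaid in homes:
    # (rgnhomes / 2) * (nhadmiss / 5), identical to rgnhomes * nhadmiss / 10
    w = (rgnhomes / 2) * (nhadmiss / len(medicaid))
    sum_w += w * len(medicaid)
    medicaid_total += w * sum(medicaid)

print(f"sum of weights = {sum_w:.1f}")
print(f"estimated Medicaid patients = {medicaid_total:.1f}")
print(f"estimated proportion = {medicaid_total / sum_w:.4f}")
```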