Sampling of Populations, Third Edition by Levy and Lemeshow

Chapter 3: Simple random sampling

page 53 simple random sampling

This example uses the momsag data set.

NOTE: The n = option on the proc surveymeans statement gives the number of primary sampling units (PSUs) in the population, here 773. SAS uses this total to compute the finite population correction (fpc).

proc surveymeans data = momsag n = 773 mean sum std;
  weight weight1;
  var momsag;
run;

The SURVEYMEANS Procedure

            Data Summary

Number of Observations            25
Sum of Weights            773.000002

                               Statistics

                               Std Error
Variable            Mean         of Mean             Sum         Std Dev
------------------------------------------------------------------------
MOMSAG          0.920000        0.054475      711.160002       42.108894
------------------------------------------------------------------------
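
The standard error in the output above can be checked by hand. The data step below is a minimal sketch, assuming that momsag is a 0/1 indicator (a mean of 0.92 with 25 observations is consistent with 23 ones, but the source does not say so) and that the usual simple random sampling formula applies, se = sqrt((1 - n/N) * s2 / n). The variable names n, popn, p, s2, fpc, se and total are chosen for illustration only.

```sas
data _null_;
  n    = 25;                            /* sample size, from the Data Summary      */
  popn = 773;                           /* population size, from n = 773           */
  p    = 0.92;                          /* sample mean of momsag, from the output  */
  s2   = (n / (n - 1)) * p * (1 - p);   /* sample variance, ASSUMING 0/1 data      */
  fpc  = 1 - n / popn;                  /* finite population correction            */
  se   = sqrt(fpc * s2 / n);            /* standard error of the mean              */
  total = popn * p;                     /* estimated population total              */
  put fpc= 8.6 se= 8.6 total= 10.2;     /* results are written to the SAS log      */
run;
```

Under these assumptions the values written to the log should agree with the Std Error of Mean and Sum columns to the precision printed.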

Chapter 4: Systematic sampling

page 109 repeated systematic sampling

This example uses the wloss2 data set.

NOTE: The PSUs are listed on the cluster statement in SAS. Also, the coloring in the (enhanced) program editor window does not work properly with the cluster statement, so don’t think that you have specified the statement incorrectly just because the keyword cluster does not turn blue.

proc surveymeans data = wloss2 n = 54 sum std mean;
  weight wt1;
  cluster cluster;
  var xi;
run;

The SURVEYMEANS Procedure

            Data Summary

Number of Clusters                 6
Number of Observations            18
Sum of Weights                   162

                               Statistics

                               Std Error
Variable            Mean         of Mean             Sum         Std Dev
------------------------------------------------------------------------
XI              4.500000        0.530548      729.000000       85.948822
------------------------------------------------------------------------

Chapter 5: Stratification and stratified random sampling

page 138 stratification and stratified random sampling

This example uses the hospsamp data set.

data second138;
  input id _TOTAL_ oblevel;
  cards;
  1 42 1
  2 42 1
  3 42 1
  4 42 1
  5 99 2
  6 99 2
  7 99 2
  8 99 2
  9 99 2
  10 17 3
  11 17 3
  12 17 3
  13 17 3
  14 17 3
  15 17 3
  ;
run;

NOTE: You cannot get the totals for both the whole group and the sub-groups in the same proc surveymeans.

NOTE: The data set second138 tells SAS the population total in each stratum. These totals are used to compute the finite population correction (fpc). Only a single number can be supplied directly on the proc surveymeans statement, and because the totals change from one stratum to the next, we need to supply them to SAS in a data set. You can include these data in the primary data set or in a secondary data set; in this example, we use a secondary data set. The secondary data set can also be "collapsed"; in other words, it can contain just one line (observation) for each stratum. In the secondary data set, the variable that contains the totals must be called _TOTAL_. The variable oblevel is copied from the original data set because SAS requires all of the variables listed on the strata statement to appear in this data set. In our example, there is only one variable listed on the strata statement, but in other cases there may be two or more.
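
As a minimal sketch of the collapsed form described above, the stratum totals 42, 99 and 17 from second138 could be entered with one observation per stratum. The data set name second138c is hypothetical and is not used elsewhere in these examples.

```sas
/* Collapsed secondary data set: one observation per stratum.   */
/* The name second138c is hypothetical; the totals are the same */
/* values that appear in second138.                             */
data second138c;
  input oblevel _TOTAL_;
  cards;
  1 42
  2 99
  3 17
  ;
run;
```

It would be supplied in the same way as second138, that is, as n = second138c on the proc surveymeans statement.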

proc surveymeans data = hospsamp n = second138 sum;
  weight weighta;
  strata oblevel;
  domain oblevel;
  var births;
run;

The SURVEYMEANS Procedure

            Data Summary

Number of Strata                   3
Number of Observations            15
Sum of Weights            157.999931


               Statistics

Variable             Sum         Std Dev
----------------------------------------
BIRTHS            183983           34014
----------------------------------------


              Domain Analysis: OBLEVEL

OBLEVEL    Variable             Sum         Std Dev
---------------------------------------------------
      1    BIRTHS             14931     2669.856738
      2    BIRTHS            117117           33068
      3    BIRTHS             51935     7508.399372
---------------------------------------------------

Chapter 6: Stratified random sampling: Further issues

page 167 stratified random sampling: allocation of sample to strata

This example uses the jacktwn data set and the secondjt data set. Please see the SAS documentation for information on how to create the secondary data sets needed.

data secondjt;
input stratum _TOTAL_;
cards;
1 39867.75
2 11256.75 
3 10019.75 
7 2542 
8 99.25
9 151.5
13 79735.5 
14 22513.5
15 20039.5 
19 5084
20 198.5
21 303
25 39867.75 
26 11256.75
27 11269.75
31 2542 
32 99.25
33 151.5
;
run;

proc surveymeans data = hospsamp n = second138 sum;
  weight weighta;
  strata oblevel;
  domain oblevel;
  var births;
run;

The SURVEYMEANS Procedure

            Data Summary

Number of Strata                  18
Number of Observations           831
Sum of Weights                256998


                               Statistics

                               Std Error
Variable            Mean         of Mean             Sum         Std Dev
------------------------------------------------------------------------
TWIN            0.101384        0.014751           26055     3791.043615
------------------------------------------------------------------------


                              Domain Analysis: QUART1

                                         Std Error
QUART1    Variable            Mean         of Mean             Sum         Std Dev
----------------------------------------------------------------------------------
     1    TWIN            0.300042        0.041629           19184     2661.629006
     2    TWIN            0.052692        0.021088     6737.906748     2696.605016
     3    TWIN            0.002051        0.001944      133.686957      126.744274
----------------------------------------------------------------------------------

page 173 stratified random sampling: Stratification after sampling

This has been skipped for now.

Chapter 7: Ratio estimation

page 198 ratio estimation

This example uses the tab7pt1 data set. Note that there may be a typo in the text for the weighted X-sum and the weighted Y-sum.

NOTE: The ratio statement (and the ratio keyword on the proc surveymeans statement) is new in SAS version 9. If you are using SAS version 8, you will need to calculate the ratio by hand by dividing the two sums; however, its standard error cannot be obtained this way.

proc surveymeans data = tab7pt1 n = 8 sum;
  weight wt1;
  var pharmexp totmedex;
  ratio 'ratio estimation' pharmexp / totmedex;
run;

The SURVEYMEANS Procedure

            Data Summary

Number of Observations             7
Sum of Weights                     8


               Statistics

Variable             Sum         Std Dev
----------------------------------------
PHARMEXP         1028571           58612
TOTMEDEX         3222857          150771
----------------------------------------


         Ratio Analysis: ratio estimation

Numerator Denominator        Ratio         Std Err
--------------------------------------------------
PHARMEXP  TOTMEDEX        0.319149        0.004007
--------------------------------------------------
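
The by-hand calculation mentioned in the NOTE for SAS version 8 can be sketched as a short data step that divides the two weighted sums. The sums are copied from the Statistics table above as printed (so they are rounded), and the variable names xsum, ysum and ratio are chosen for illustration only.

```sas
data _null_;
  xsum  = 1028571;        /* weighted sum of pharmexp, as printed above */
  ysum  = 3222857;        /* weighted sum of totmedex, as printed above */
  ratio = xsum / ysum;    /* point estimate of the ratio                */
  put ratio= 8.6;         /* result is written to the SAS log           */
run;
```

This reproduces only the point estimate in the Ratio column; as the NOTE says, the standard error is not available from this calculation.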

Chapter 9: Simple one-stage cluster sampling

page 250 simple one-stage cluster sampling

This example uses the tab9_1c data set.

proc surveymeans data = tab9_1c n = 5 sum mean;
  weight wt1;
  cluster devlpmnt;
  var nge65 nvstnrs hhneedvn;
run;

The SURVEYMEANS Procedure

            Data Summary

Number of Clusters                 2
Number of Observations            40
Sum of Weights                   100

                               Statistics

                               Std Error
Variable            Mean         of Mean             Sum         Std Dev
------------------------------------------------------------------------
NGE65           1.675000        0.019365      167.500000        1.936492
NVSTNRS         0.575000        0.019365       57.500000        1.936492
HHNEEDVN        0.525000        0.019365       52.500000        1.936492
------------------------------------------------------------------------

page 253 simple one-stage cluster sampling

data probbksm;
  input record district eligible treated w n;
  cards;
  1 6 486 79 2.6 26
  2 10 240 94 2.6 26
  3 14 428 17 2.6 26
  4 15 343 57 2.6 26
  5 17 1130 63 2.6 26
  6 19 983 10 2.6 26
  7 20 333 58 2.6 26
  8 21 13 0 2.6 26
  9 22 1506 101 2.6 26
  10 25 1755 411 2.6 26
  ;
run;
proc surveymeans data = probbksm n = 26 sum;
  weight w;
  var treated eligible;
run;

The SURVEYMEANS Procedure

            Data Summary

Number of Observations            10
Sum of Weights                    26

               Statistics

Variable             Sum         Std Dev
----------------------------------------
treated      2314.000000      763.030216
eligible           18764     3778.280164
----------------------------------------

Chapter 10: Two-stage cluster sampling: Clusters sampled with equal probability

page 285 two-stage cluster sampling: clusters sampled with equal probability and without replacement, and all of the clusters have the same n

data pt1;
  input center nurse m nbar w npatnts nrefrred;
  cards;
  1 2 5 3 2.5 44 6
  1 3 5 3 2.5 18 6
  2 1 5 3 2.5 42 3
  2 3 5 3 2.5 10 2
  4 1 5 3 2.5 16 5
  4 2 5 3 2.5 32 14
  ;
run;

This has been skipped for now.

page 286 two-stage cluster sampling: clusters sampled with equal probability, all clusters have the same n and sampling is with replacement (or the sampling fractions are small at each stage)

This has been skipped for now.

page 310 two-stage cluster sampling: clusters sampled with equal probability in which not all clusters have the same n

This example uses the pt210 data set.

proc surveymeans data = pt210 total = secondpt210 sum;
  weight w;
  cluster hospno;
  var dxdead lifethrt;
run;

The SURVEYMEANS Procedure

            Data Summary

Number of Clusters                 3
Number of Observations           708
Sum of Weights            23599.9883

               Statistics

Variable             Sum         Std Dev
----------------------------------------
DXDEAD        499.378967      114.637498
LIFETHRT     2932.319092      905.894114
----------------------------------------

Chapter 11: Cluster sampling in which clusters are sampled with unequal probability: Probability proportional to size sampling

page 350 cluster sampling with unequal probabilities: probability proportional to size sampling

This example uses the hospslct data set.

proc surveymeans data = hospslct n = 4672 sum mean;
  weight wstar;
  cluster drawing;
  var lifethrt dxdead;
run;

The SURVEYMEANS Procedure

            Data Summary

Number of Clusters                 5
Number of Observations            50
Sum of Weights                 50056

                               Statistics

                               Std Error
Variable            Mean         of Mean             Sum         Std Dev
------------------------------------------------------------------------
LIFETHRT        0.120000        0.019989     6006.720000     1000.584155
DXDEAD          0.040000        0.024482     2002.240000     1225.460312
------------------------------------------------------------------------

page 353 cluster sampling with unequal probabilities: probability proportional to size sampling

data hspslct2;
  set hospslct;
  /*n is 50*/
  /*N_i is admiss*/
  /* X is 7087, the total number of life-threatening conditions across all the hospitals*/
  /*X_i is tl, the total number of life-threatening conditions for each hospital*/
  if hospno = 2 then  tl = 785; 
  if hospno = 5 then  tl = 3404; 
  if hospno = 9 then  tl = 778; 
  w2star = (admiss/50)*(7087/tl);
run;
proc surveymeans data = hspslct2 n = 4672 mean sum;
  weight w2star;
  cluster drawing;
  var lifethrt dxdead;
run;

The SURVEYMEANS Procedure

            Data Summary

Number of Clusters                 5
Number of Observations            50
Sum of Weights            51344.9973

                               Statistics

                               Std Error
Variable            Mean         of Mean             Sum         Std Dev
------------------------------------------------------------------------
LIFETHRT        0.121904        0.021404     6259.175814     1276.638273
DXDEAD          0.034287        0.022991     1760.471466     1078.465840
------------------------------------------------------------------------

Chapter 12: Variance estimation in complex sample surveys

page 370 variance estimation in complex sample surveys: linearization

This example uses the exmp12_2 data set.

proc surveymeans data = exmp12_2 n = 65 sum;
  weight w;
  var ovpaymnt payment;
run;

The SURVEYMEANS Procedure

            Data Summary

Number of Observations            10
Sum of Weights                    65

               Statistics

Variable             Sum         Std Dev
----------------------------------------
OVPAYMNT     6922.500000     1844.835998
PAYMENT            17914     2540.097723
----------------------------------------

NOTE: The amblnce2 data file on the Wiley website has a problem: the first variable does not have a name. To open the file, you need to use SAS 6.12 and its viewer (which is not available in SAS 8.x). Add a name to the first variable and save the file, then use StatTransfer (or any other method you like) to convert the file to SAS 8.x format. Note that the data set is printed on page 377, except for the replicate weight variables.

pages 376 and 377 variance estimation in complex sample surveys: replication methods

This example uses the am (amblnce2) data set.

proc surveymeans data = am n = 5 sum;
  cluster esa;
  weight wt;
  var cardarrs alive;
run;

The SURVEYMEANS Procedure

            Data Summary

Number of Clusters                 3
Number of Observations             6
Sum of Weights                    15

               Statistics

Variable             Sum         Std Dev
----------------------------------------
CARDARRS     4527.500000     1444.644420
ALIVE         695.000000      142.047527
----------------------------------------

page 379 variance estimation in complex sample surveys: replication methods

NOTE: SAS cannot handle replicate weights.

page 485 strategies for design-based analysis of sample survey data

data ch16;
  input id region nurshome patient medicaid rgnhomes nhadmiss ;
  weight = (rgnhomes*nhadmiss)/10;
  cards;
  1 1 1 1 1 12 123
  2 1 1 2 1 12 123
  3 1 1 3 1 12 123
  4 1 1 4 0 12 123
  5 1 1 5 1 12 123
  6 1 2 1 0 12 89
  7 1 2 2 0 12 89
  8 1 2 3 1 12 89
  9 1 2 4 0 12 89
  10 1 2 5 0 12 89
  11 2 1 1 1 20 231
  12 2 1 2 0 20 231
  13 2 1 3 1 20 231
  14 2 1 4 0 20 231
  15 2 1 5 1 20 231
  16 2 2 1 0 20 187 
  17 2 2 2 0 20 187
  18 2 2 3 0 20 187
  19 2 2 4 1 20 187
  20 2 2 5 0 20 187
  21 3 1 1 1 11 43
  22 3 1 2 1 11 43
  23 3 1 3 1 11 43
  24 3 1 4 0 11 43
  25 3 1 5 1 11 43
  26 3 2 1 1 11 49
  27 3 2 2 1 11 49
  28 3 2 3 1 11 49
  29 3 2 4 1 11 49
  30 3 2 5 0 11 49
  31 4 1 1 0 8 56
  32 4 1 2 1 8 56
  33 4 1 3 1 8 56
  34 4 1 4 0 8 56
  35 4 1 5 0 8 56
  36 4 2 1 0 8 38
  37 4 2 2 0 8 38
  38 4 2 3 0 8 38
  39 4 2 4 0 8 38
  40 4 2 5 1 8 38
  41 5 1 1 1 6 359
  42 5 1 2 0 6 359
  43 5 1 3 1 6 359
  44 5 1 4 1 6 359
  45 5 1 5 0 6 359
  46 5 2 1 0 6 460
  47 5 2 2 1 6 460
  48 5 2 3 0 6 460
  49 5 2 4 1 6 460
  50 5 2 5 0 6 460
  ;
run;

This has been skipped for now.