Elementary Survey Sampling, 5th Edition, by Scheaffer, Mendenhall and Ott
Chapter 9: Two-stage cluster sampling

Page 339, Table 9.1

use "A:table91.dta", clear
rename col1 plant
rename col2 downtime
sort plant
by plant: list downtime
_______________________________________________________________________________
-> plant = 1

     +----------+
     | downtime |
     |----------|
  1. |        5 |
  2. |        7 |
  3. |        9 |
  4. |        0 |
  5. |       11 |
     |----------|
  6. |        2 |
  7. |        8 |
  8. |        4 |
  9. |        3 |
 10. |        5 |
     +----------+

_______________________________________________________________________________
-> plant = 2

     +----------+
     | downtime |
     |----------|
  1. |        4 |
  2. |        3 |
  3. |        7 |
  4. |        2 |
  5. |       11 |
     |----------|
  6. |        0 |
  7. |        1 |
  8. |        9 |
  9. |        4 |
 10. |        3 |
     |----------|
 11. |        2 |
 12. |        1 |
 13. |        5 |
     +----------+

_______________________________________________________________________________
-> plant = 3

     +----------+
     | downtime |
     |----------|
  1. |        5 |
  2. |        6 |
  3. |        4 |
  4. |       11 |
  5. |       12 |
     |----------|
  6. |        0 |
  7. |        1 |
  8. |        8 |
  9. |        4 |
     +----------+

_______________________________________________________________________________
-> plant = 4

     +----------+
     | downtime |
     |----------|
  1. |        6 |
  2. |        4 |
  3. |        0 |
  4. |        1 |
  5. |        0 |
     |----------|
  6. |        9 |
  7. |        8 |
  8. |        4 |
  9. |        6 |
 10. |       10 |
     +----------+

_______________________________________________________________________________
-> plant = 5

     +----------+
     | downtime |
     |----------|
  1. |       11 |
  2. |        4 |
  3. |        3 |
  4. |        1 |
  5. |        0 |
     |----------|
  6. |        2 |
  7. |        8 |
  8. |        6 |
  9. |        5 |
 10. |        3 |
     +----------+

_______________________________________________________________________________
-> plant = 6

     +----------+
     | downtime |
     |----------|
  1. |       12 |
  2. |       11 |
  3. |        3 |
  4. |        4 |
  5. |        2 |
     |----------|
  6. |        0 |
  7. |        0 |
  8. |        1 |
  9. |        4 |
 10. |        3 |
     |----------|
 11. |        2 |
 12. |        4 |
     +----------+

_______________________________________________________________________________
-> plant = 7

     +----------+
     | downtime |
     |----------|
  1. |        3 |
  2. |        7 |
  3. |        6 |
  4. |        7 |
  5. |        8 |
     |----------|
  6. |        4 |
  7. |        3 |
  8. |        2 |
     +----------+

_______________________________________________________________________________
-> plant = 8

     +----------+
     | downtime |
     |----------|
  1. |        3 |
  2. |        6 |
  3. |        4 |
  4. |        3 |
  5. |        2 |
     |----------|
  6. |        2 |
  7. |        8 |
  8. |        4 |
  9. |        0 |
 10. |        4 |
     |----------|
 11. |        5 |
 12. |        6 |
 13. |        3 |
     +----------+

_______________________________________________________________________________
-> plant = 9

     +----------+
     | downtime |
     |----------|
  1. |        6 |
  2. |        4 |
  3. |        7 |
  4. |        3 |
  5. |        9 |
     |----------|
  6. |        1 |
  7. |        4 |
  8. |        5 |
     +----------+

_______________________________________________________________________________
-> plant = 10

     +----------+
     | downtime |
     |----------|
  1. |        6 |
  2. |        7 |
  3. |        5 |
  4. |       10 |
  5. |       11 |
     |----------|
  6. |        2 |
  7. |        1 |
  8. |        4 |
  9. |        0 |
 10. |        5 |
     |----------|
 11. |        4 |
     +----------+

by plant: tabstat downtime, s(mean var)

_______________________________________________________________________________
-> plant = 1

    variable |      mean  variance
-------------+--------------------
    downtime |       5.4  11.37778
----------------------------------

_______________________________________________________________________________
-> plant = 2

    variable |      mean  variance
-------------+--------------------
    downtime |         4  10.66667
----------------------------------

_______________________________________________________________________________
-> plant = 3

    variable |      mean  variance
-------------+--------------------
    downtime |  5.666667     16.75
----------------------------------

_______________________________________________________________________________
-> plant = 4

    variable |      mean  variance
-------------+--------------------
    downtime |       4.8  13.28889
----------------------------------

_______________________________________________________________________________
-> plant = 5

    variable |      mean  variance
-------------+--------------------
    downtime |       4.3  11.12222
----------------------------------

_______________________________________________________________________________
-> plant = 6

    variable |      mean  variance
-------------+--------------------
    downtime |  3.833333  14.87879
----------------------------------

_______________________________________________________________________________
-> plant = 7

    variable |      mean  variance
-------------+--------------------
    downtime |         5  5.142857
----------------------------------

_______________________________________________________________________________
-> plant = 8

    variable |      mean  variance
-------------+--------------------
    downtime |  3.846154  4.307692
----------------------------------

_______________________________________________________________________________
-> plant = 9

    variable |      mean  variance
-------------+--------------------
    downtime |     4.875     6.125
----------------------------------

_______________________________________________________________________________
-> plant = 10

    variable |      mean  variance
-------------+--------------------
    downtime |         5      11.8
----------------------------------

_______________________________________________________________________________
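As a cross-check outside Stata, the per-plant means and variances above can be reproduced with Python's standard `statistics` module. This is a sketch, not part of the original page. The `downtime` lists are copied from the Table 9.1 listing, and `statistics.variance` uses the m_i - 1 divisor, which matches what `tabstat` reports.

```python
import statistics

# Downtime (hours) for each sampled machine, keyed by plant (Table 9.1 listing)
downtime = {
    1: [5, 7, 9, 0, 11, 2, 8, 4, 3, 5],
    2: [4, 3, 7, 2, 11, 0, 1, 9, 4, 3, 2, 1, 5],
    3: [5, 6, 4, 11, 12, 0, 1, 8, 4],
    4: [6, 4, 0, 1, 0, 9, 8, 4, 6, 10],
    5: [11, 4, 3, 1, 0, 2, 8, 6, 5, 3],
    6: [12, 11, 3, 4, 2, 0, 0, 1, 4, 3, 2, 4],
    7: [3, 7, 6, 7, 8, 4, 3, 2],
    8: [3, 6, 4, 3, 2, 2, 8, 4, 0, 4, 5, 6, 3],
    9: [6, 4, 7, 3, 9, 1, 4, 5],
    10: [6, 7, 5, 10, 11, 2, 1, 4, 0, 5, 4],
}

# Per-plant sample mean and sample variance (divisor m_i - 1), like tabstat s(mean var)
summary = {
    plant: (statistics.mean(ys), statistics.variance(ys))
    for plant, ys in downtime.items()
}

for plant, (mean, var) in summary.items():
    print(f"plant {plant:2d}: mean = {mean:.6g}, variance = {var:.6g}")
```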

Page 340

NOTE: Some of the variables needed for the calculations below must be entered by hand; alternatively, you can merge the Table 9.1 and Table 9.2 datasets (table91.dta and table92.dta).

use "A:page340.dta", clear
bysort plant: gen num = _n
reshape wide hours, i(plant) j(num)
(note: j = 1 2 3 4 5 6 7 8 9 10 11 12 13)

Data                               long   ->   wide
-----------------------------------------------------------------------------
Number of obs.                      104   ->      10
Number of variables                   7   ->      18
j variable (13 values)              num   ->   (dropped)
xij variables:
                                  hours   ->   hours1 hours2 ... hours13
-----------------------------------------------------------------------------

egen y_bar = rowmean(hours1-hours13)
egen sd = rowsd(hours1-hours13)
gen miyi = M*y_bar
tabstat m y_bar miyi, s(n mean p50 sd)

   stats |         m     y_bar      miyi
---------+------------------------------
       N |        10        10        10
    mean |      10.4  4.672115  240.0179
     p50 |        10    4.8375  242.1231
      sd |  1.837873  .6469981  27.71871
----------------------------------------
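The same steps (reshape to one row per plant, take the row mean while skipping the missing cells that `reshape wide` creates for plants with fewer than 13 sampled machines, then multiply by M_i) can be sketched in plain Python. This is an illustrative re-creation, not Stata's implementation; `rowmean` is a hypothetical helper, the downtime values are copied from Table 9.1, and the M_i values are the machine counts from Table 9.2 (the `Mi` column in the Page 347 listing).

```python
import statistics

# Downtime per sampled machine, by plant (long form: one list per plant)
downtime = {
    1: [5, 7, 9, 0, 11, 2, 8, 4, 3, 5],
    2: [4, 3, 7, 2, 11, 0, 1, 9, 4, 3, 2, 1, 5],
    3: [5, 6, 4, 11, 12, 0, 1, 8, 4],
    4: [6, 4, 0, 1, 0, 9, 8, 4, 6, 10],
    5: [11, 4, 3, 1, 0, 2, 8, 6, 5, 3],
    6: [12, 11, 3, 4, 2, 0, 0, 1, 4, 3, 2, 4],
    7: [3, 7, 6, 7, 8, 4, 3, 2],
    8: [3, 6, 4, 3, 2, 2, 8, 4, 0, 4, 5, 6, 3],
    9: [6, 4, 7, 3, 9, 1, 4, 5],
    10: [6, 7, 5, 10, 11, 2, 1, 4, 0, 5, 4],
}

# M_i: number of machines in each plant (Table 9.2)
M = {1: 50, 2: 65, 3: 45, 4: 48, 5: 52, 6: 58, 7: 42, 8: 66, 9: 40, 10: 56}

# "Reshape wide": one row per plant, padded with None up to the largest m_i (13),
# mirroring the missing values in hours1-hours13
width = max(len(ys) for ys in downtime.values())
wide = {plant: ys + [None] * (width - len(ys)) for plant, ys in downtime.items()}


def rowmean(row):
    """Mean of the non-missing cells, like egen rowmean() (hypothetical helper)."""
    present = [v for v in row if v is not None]
    return sum(present) / len(present)


y_bar = {plant: rowmean(row) for plant, row in wide.items()}
miyi = {plant: M[plant] * y_bar[plant] for plant in wide}

print("mean m     :", statistics.mean(len(ys) for ys in downtime.values()))
print("mean y_bar :", statistics.mean(y_bar.values()))
print("mean miyi  :", statistics.mean(miyi.values()))
print("p50 miyi   :", statistics.median(miyi.values()))
```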

Pages 341-342

use "A:page340.dta", clear
gen p1=nplant/10
gen p2=M/m
gen pwt=p1*p2
svyset plant [pweight=pwt], fpc(nplant) vce(linearized) || _n, fpc(nmachine)

      pweight: pwt
          VCE: linearized
     Strata 1: <one>
         SU 1: plant
        FPC 1: nplant
     Strata 2: <one>
         SU 2: <observations>
        FPC 2: nmachine
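The weight `pwt = p1*p2 = (nplant/10)*(M/m)` is the inverse of the two-stage selection probability (n/N for the plant, times m_i/M_i for the machine within the plant). The sketch below recomputes it outside Stata. It assumes nplant = N = 90; that value is not printed on this page but is implied by Stata's reported population size, since the weights sum to (N/10) × ΣM_i = (N/10) × 522 = 4698. The M_i and m_i values are from Table 9.2.

```python
# Assumption: N = 90 plants in the population, backed out from 4698 = (N / 10) * 522
N = 90   # nplant (assumed)
n = 10   # number of sampled plants

# Table 9.2: M_i machines per plant and m_i machines sampled
M = {1: 50, 2: 65, 3: 45, 4: 48, 5: 52, 6: 58, 7: 42, 8: 66, 9: 40, 10: 56}
m = {1: 10, 2: 13, 3: 9, 4: 10, 5: 10, 6: 12, 7: 8, 8: 13, 9: 8, 10: 11}

# pwt_i = (N / n) * (M_i / m_i): inverse of (n / N) * (m_i / M_i)
pwt = {plant: (N / n) * (M[plant] / m[plant]) for plant in M}

# Each of the m_i sampled machines in plant i carries weight pwt_i, so the sum of
# weights over all 104 observations equals (N / n) * sum(M_i)
population_size = sum(m[plant] * pwt[plant] for plant in M)
print(f"estimated population size: {population_size:.1f}")
```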

svy: mean hours
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =       1          Number of obs    =     104
Number of PSUs   =      10          Population size  =    4698
                                    Design df        =       9

--------------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
       hours |   4.598045   .2268273      4.084926    5.111164
--------------------------------------------------------------

svy: total hours
(running total on estimation sample)

Survey: Total estimation

Number of strata =       1          Number of obs    =     104
Number of PSUs   =      10          Population size  =    4698
                                    Design df        =       9

--------------------------------------------------------------
             |             Linearized
             |      Total   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
       hours |   21601.62    894.421      19578.29    23624.94
--------------------------------------------------------------
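The point estimates from `svy: total` and `svy: mean` can be checked by hand: the total is the weighted sum of all 104 downtime values, and the mean is that total divided by the sum of the weights. The sketch below does this from the per-plant downtime sums (computed from the Table 9.1 listing), again assuming nplant = N = 90 as implied by the population size of 4698. It reproduces only the point estimates, not the linearized standard errors.

```python
# Assumption: N = 90 plants in the population (implied by 4698 = (N / 10) * 522)
N, n = 90, 10

# Table 9.2: M_i machines per plant and m_i machines sampled
M = {1: 50, 2: 65, 3: 45, 4: 48, 5: 52, 6: 58, 7: 42, 8: 66, 9: 40, 10: 56}
m = {1: 10, 2: 13, 3: 9, 4: 10, 5: 10, 6: 12, 7: 8, 8: 13, 9: 8, 10: 11}

# Sum of the m_i sampled downtime values in each plant (from the Table 9.1 listing)
plant_sum = {1: 54, 2: 52, 3: 51, 4: 48, 5: 43, 6: 46, 7: 40, 8: 50, 9: 39, 10: 55}

pwt = {p: (N / n) * (M[p] / m[p]) for p in M}

total = sum(pwt[p] * plant_sum[p] for p in M)  # sum of w * y over all machines
pop_size = sum(pwt[p] * m[p] for p in M)       # sum of w: estimated number of machines
mean = total / pop_size                        # weighted (ratio) estimate of the mean

print(f"total = {total:.2f}, estimated population size = {pop_size:.0f}, mean = {mean:.6f}")
```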

NOTE: The estimate of the mean given by Stata (and other statistical packages) differs somewhat from the one shown in the text because the two are calculated differently. The text takes the total number of elements (machines) in the population to be 4500 and divides the estimated total by that number to get 4.8. Stata instead estimates the total number of elements in the population by summing the probability weights, which gives 4698, and divides the estimated total by 4698 to get 4.598. It may seem that Stata’s estimate of the mean is less precise than the text’s, but both numbers are merely estimates of the true population mean, which is unknown. If a different sample were drawn, both the textbook’s method and Stata would give a different estimate of the mean. If you know the total number of elements in the population, you can do the division yourself, so that only one number (the total) is an estimate.
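Doing the division yourself takes one line. The sketch below divides the estimated total from `svy: total` by the known population size (4500, from the text) and by Stata's estimated population size (4698). Because 4500 is treated as a fixed constant, the standard error of the resulting mean is just the standard error of the total divided by 4500; this scales Stata's linearized SE and is not necessarily the standard error the text reports.

```python
# Estimated total downtime and its linearized standard error, from svy: total
total = 21601.62
se_total = 894.421

known_M = 4500      # total number of machines, as given in the text
estimated_M = 4698  # sum of the probability weights, as computed by Stata

mean_known = total / known_M          # only the numerator is an estimate
mean_estimated = total / estimated_M  # both numerator and denominator are estimates
se_mean_known = se_total / known_M    # dividing by a constant scales the SE

print(f"mean with known M:     {mean_known:.4f} (SE {se_mean_known:.4f})")
print(f"mean with estimated M: {mean_estimated:.4f}")
```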

Page 345

use "A:table92.dta", clear
rename col1 plant
rename col2 Mi
rename col3 mi
rename col4 pi

Page 347

gen Mipi = Mi*pi
gen diff = Mipi - .34*Mi
gen within = Mi*(Mi-mi)*pi*(1 - pi)/(mi - 1)
list

     +------------------------------------------------------+
     | plant   Mi   mi    pi    Mipi        diff     within |
     |------------------------------------------------------|
  1. |     1   50   10    .4      20           3   53.33333 |
  2. |     2   65   13   .38    24.7    2.599999   66.36066 |
  3. |     3   45    9   .22     9.9   -5.400001     34.749 |
  4. |     4   48   10    .3    14.4   -1.919999      42.56 |
  5. |     5   52   10    .5      26        8.32   60.66667 |
     |------------------------------------------------------|
  6. |     6   58   12   .25    14.5       -5.22   45.47727 |
  7. |     7   42    8   .38   15.96        1.68    48.0624 |
  8. |     8   66   13   .31   20.46   -1.979999   62.35185 |
  9. |     9   40    8   .25      10        -3.6   34.28571 |
 10. |    10   56   11   .36   20.16        1.12    58.0608 |
     +------------------------------------------------------+
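The three generated columns can be recomputed from the Table 9.2 inputs as a cross-check. The `within` term M_i(M_i - m_i)p_i(1 - p_i)/(m_i - 1) is the same quantity as M_i²(1 - m_i/M_i)p_i q_i/(m_i - 1). The .34 in `diff` matches the overall estimate 176.08/522 = .337 computed at the end of this page, rounded to two places. Small values such as 2.599999 in the Stata listing come from `gen` storing results as single-precision floats; the double-precision results below do not show them.

```python
# Plant-level inputs from Table 9.2: M_i machines, m_i sampled, p_i sample proportion
M = [50, 65, 45, 48, 52, 58, 42, 66, 40, 56]
m = [10, 13, 9, 10, 10, 12, 8, 13, 8, 11]
p = [.40, .38, .22, .30, .50, .25, .38, .31, .25, .36]

p_rounded = 0.34  # the constant used in the diff line above

Mipi = [Mi * pi for Mi, pi in zip(M, p)]
diff = [a - p_rounded * Mi for a, Mi in zip(Mipi, M)]
# Within-plant term: M_i (M_i - m_i) p_i (1 - p_i) / (m_i - 1)
within = [Mi * (Mi - mi) * pi * (1 - pi) / (mi - 1) for Mi, mi, pi in zip(M, m, p)]

for row in zip(range(1, 11), Mipi, diff, within):
    print("plant %2d  Mipi=%6.2f  diff=%6.2f  within=%9.5f" % row)
```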

tabstat Mi mi Mipi diff, s(n mean median sd)

   stats |        Mi        mi      Mipi      diff
---------+----------------------------------------
       N |        10        10        10        10
    mean |      52.2      10.4    17.608      -.14
     p50 |        51        10     17.98 -.3999998
      sd |  9.003703  1.837873  5.588203  4.292608
--------------------------------------------------

tabstat Mipi Mi, s(sum)

   stats |      Mipi        Mi
---------+--------------------
     sum |    176.08       522
------------------------------

di 176.08/522

.33731801
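The division above is the estimate of the overall proportion: ΣM_i p_i / ΣM_i, a ratio-type estimator that amounts to a weighted average of the plant proportions p_i with weights M_i. A minimal sketch using the Table 9.2 values:

```python
# Table 9.2: M_i machines per plant and p_i sample proportion per plant
M = [50, 65, 45, 48, 52, 58, 42, 66, 40, 56]
p = [.40, .38, .22, .30, .50, .25, .38, .31, .25, .36]

# Ratio estimator of the population proportion: sum(M_i * p_i) / sum(M_i)
p_hat = sum(Mi * pi for Mi, pi in zip(M, p)) / sum(M)
print(f"p_hat = {p_hat:.8f}")
```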