When using survey data, the mean of a variable is calculated as the estimated total (the weighted sum of the values) of that variable divided by the population size (the total number of elements in the population). In many cases the population size is unknown and is estimated as the sum of the probability weights. In the example below, which is taken from chapter 9 of Elementary Survey Sampling, Fifth Edition, by Scheaffer, Mendenhall and Ott, Stata estimates the total of the variable hours to be 21601.62 and the population size to be 4698. Doing the division, 21601.62/4698 = 4.598, which is what Stata reports as the mean of hours.

In some cases, however, you may know the true population size. If so, you can calculate the mean by hand using the known population size instead of the estimated one. There is no way to tell Stata what the known population size is, so you cannot get an estimate calculated using that method from Stata (or other survey data analysis packages). For example, let’s say that you know the population size to be 4500. You would divide 21601.62/4500 = 4.8. Please note that 4.6 and 4.8 are both estimates of the true (unknown) population mean, and that if a different sample were drawn, a slightly different mean would be obtained using either method.
use "A:page340.dta", clear
gen p1=nplant/10
gen p2=M/m
gen pwt=p1*p2
svyset plant [pweight=pwt], fpc(nplant) vce(linearized) || _n, fpc(nmachine)

      pweight: pwt
          VCE: linearized
     Strata 1: <one>
         SU 1: plant
        FPC 1: nplant
     Strata 2: <one>
         SU 2: <observations>
        FPC 2: nmachine

svy: total hours
(running total on estimation sample)

Survey: Total estimation

Number of strata =       1          Number of obs    =     104
Number of PSUs   =      10          Population size  =    4698
                                    Design df        =       9

--------------------------------------------------------------
             |             Linearized
             |      Total   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
       hours |   21601.62    894.421      19578.29    23624.94
--------------------------------------------------------------
svy: mean hours
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =       1          Number of obs    =     104
Number of PSUs   =      10          Population size  =    4698
                                    Design df        =       9

--------------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
       hours |   4.598045   .2268273      4.084926    5.111164
--------------------------------------------------------------
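The hand calculation described in the text can also be done with Stata's display command, using the results that svy: total leaves behind. The lines below are a minimal sketch rather than part of the original example. They assume the data have already been svyset as shown above and that the lines are run from a do-file. The scalar name N_known is hypothetical, and 4500 is the illustrative known population size from the text.

```stata
* Re-run the total so that its results are the active estimates
svy: total hours

* Mean based on the estimated population size (the sum of the weights);
* this should agree with the svy: mean result of about 4.598
display _b[hours]/e(N_pop)

* Mean based on a known population size.
* N_known is a hypothetical name; 4500 is the value used in the text.
scalar N_known = 4500
display _b[hours]/N_known
```

Here _b[hours] is the estimated total and e(N_pop) is the estimated population size reported in the output header. The second display line should give approximately 4.8, matching the division by hand.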