As a statistical programming language, R allows users to access precise statistics instead of simply printing a mass of output to the screen. The examples below highlight how to create a complex sample survey design object and then directly query specific coefficients, error terms, and other survey design-related information as needed.
This page uses the survey
package. Make sure that you can load
it before trying to run the examples on this page. If you do not have
a package installed, run: install.packages("packagename")
, or
if you see the version is out of date, run: update.packages()
.
library(survey)
Version info: Code for this page was tested in R version 3.0.1 (2013-05-16)
On: 2013-06-25
With: survey 3.29-5; knitr 1.2
Example
This example is taken from Levy and Lemeshow’s Sampling of Populations.
Page 106 repeated systematic sampling
Import the dataset from text directly into R using the read.table
function and the text=
parameter specifying the entire data set. The syntax n
indicates the end of one line of data.
mydata <-
read.table( text =
"cluster xi wt1 xibar M
2 15 9 5 54
13 13 9 4.33 54
31 12 9 4.00 54
34 10 9 3.33 54
46 21 9 7 54
53 10 9 3.33 54" ,
header = TRUE
)
More detail about read.table
or any other R function can be viewed by typing a question mark in front of the function name in the R console. For example: ?read.table
.
In the R language, individual data sets are stored as data.frame
objects, allowing users to load as many tables into working memory as necessary for the analysis. After loading the mydata
table into memory, R functions can be run directly on this data table.
# the `class` of the `mydata` object class(mydata)
## [1] "data.frame"
# the first six rows of the data head(mydata)
## cluster xi wt1 xibar M ## 1 2 15 9 5.00 54 ## 2 13 13 9 4.33 54 ## 3 31 12 9 4.00 54 ## 4 34 10 9 3.33 54 ## 5 46 21 9 7.00 54 ## 6 53 10 9 3.33 54
# the last six rows of the data tail(mydata)
## cluster xi wt1 xibar M ## 1 2 15 9 5.00 54 ## 2 13 13 9 4.33 54 ## 3 31 12 9 4.00 54 ## 4 34 10 9 3.33 54 ## 5 46 21 9 7.00 54 ## 6 53 10 9 3.33 54
# the number of columns ncol(mydata)
## [1] 5
View a simple summary of your variable of interest. This summary
function accesses the xibar
column (variable) stored inside the mydata
data.frame object.
summary(mydata$xibar)
## Min. 1st Qu. Median Mean 3rd Qu. Max. ## 3.33 3.50 4.16 4.50 4.83 7.00
Initiate your svydesign
object for a repeated systematic sample design. This `mydesign`
object will be used for all subsequent analysis commands:
mydesign <-
svydesign(
id = ~1 ,
data = mydata ,
weight = ~wt1 ,
fpc = ~M
)
From this point forward, the sampling specifications of the mydata
data set’s survey design have been fixed and most analysis commands will simply use the set of tools outlined on the R survey package homepage, referring to the object `mydesign`
at the design=
parameter of the specific R function or method.
View the survey design structure:
mydesign
## Independent Sampling design ## svydesign(id = ~1, data = mydata, weight = ~wt1, fpc = ~M)
View the survey design’s object class or type:
class(mydesign)
## [1] "survey.design2" "survey.design"
View the weighted total population of this survey design, by referring to the wt1
column from the original data.frame:
sum(mydata$wt1)
## [1] 54
View the degrees of freedom for this survey design object:
degf(mydesign)
## [1] 5
Count the number of unweighted observations where the variable xibar
is not missing:
unwtd.count(~xibar, mydesign)
## counts SE ## counts 6 0
Print the mean and standard error of the xibar
variable:
svymean(~xibar, mydesign)
## mean SE ## xibar 4.5 0.53
Print the weighted total and standard error of the xi
variable:
svytotal(~xi, mydesign)
## total SE ## xi 729 86
Alternatively, the result of a function call like svymean
or svytotal
can be stored into a secondary R object.
mysvytotal <- svytotal(~xi, mydesign)
Once created, this svytotal
can be queried independently from the mydesign
object. For example, the coef
and SE
functions can directly extract those attributes:
coef(mysvytotal)
## xi ## 729
SE(mysvytotal)
## xi ## xi 85.9
Note that the number of decimal places shown can be adjusted by modifying the digits
parameter within the options
function at any time.
options(digits = 10) SE(mysvytotal)
## xi ## xi 85.94882198
We can use the confint
function to obtain confidence intervals for the coefficient estimates.
confint(mysvytotal)
## 2.5 % 97.5 % ## xi 560.5434044 897.4565956
We can specify the df=
parameter to use the survey design’s degrees of freedom (instead of the default df=Inf
) to replicate Stata’s confidence interval calculation method.
# matches stata confint(mysvytotal, df = degf(mydesign))
## 2.5 % 97.5 % ## xi 508.0615194 949.9384806
Print the median of the xibar
variable, including the confidence interval in the output.
svyquantile(~xibar, mydesign, quantiles = 0.5, ci = TRUE)
## $quantiles ## 0.5 ## xibar 4 ## ## $CIs ## , , xibar ## ## 0.5 ## (lower 3.330000000 ## upper) 5.958360258
See also
- The R survey package homepage
- Lumley, T. Complex Surveys: A Guide to Analysis Using R (Wiley Series in Survey Methodology)
- Damico, A. Step-by-step instructions to analyze major public-use survey data sets with the R language