As a statistical programming language, R allows users to access precise statistics instead of simply printing a mass of output to the screen. The examples below highlight how to create a complex sample survey design object and then directly query specific coefficients, error terms, and other survey design-related information as needed.

This page uses the `survey`

package. Make sure that you can load
it before trying to run the examples on this page. If you do not have
a package installed, run: `install.packages("packagename")`

, or
if you see the version is out of date, run: `update.packages()`

.

```
library(survey)
```

**Version info: **Code for this page was tested in R version 3.0.1 (2013-05-16)
On: 2013-06-25
With: survey 3.29-5; knitr 1.2

## Example

This example is taken from Levy and Lemeshow’s Sampling of Populations.

Page 106 repeated systematic sampling

Import the dataset from text directly into R using the `read.table`

function and the `text=`

parameter specifying the entire data set. The syntax `n`

indicates the end of one line of data.

```
mydata <-
read.table( text =
"cluster xi wt1 xibar M
2 15 9 5 54
13 13 9 4.33 54
31 12 9 4.00 54
34 10 9 3.33 54
46 21 9 7 54
53 10 9 3.33 54" ,
header = TRUE
)
```

More detail about `read.table`

or any other R function can be viewed by typing a question mark in front of the function name in the R console. For example: `?read.table`

.

In the R language, individual data sets are stored as `data.frame`

objects, allowing users to load as many tables into working memory as necessary for the analysis. After loading the `mydata`

table into memory, R functions can be run directly on this data table.

# the `class` of the `mydata` object class(mydata)

## [1] "data.frame"

# the first six rows of the data head(mydata)

## cluster xi wt1 xibar M ## 1 2 15 9 5.00 54 ## 2 13 13 9 4.33 54 ## 3 31 12 9 4.00 54 ## 4 34 10 9 3.33 54 ## 5 46 21 9 7.00 54 ## 6 53 10 9 3.33 54

# the last six rows of the data tail(mydata)

## cluster xi wt1 xibar M ## 1 2 15 9 5.00 54 ## 2 13 13 9 4.33 54 ## 3 31 12 9 4.00 54 ## 4 34 10 9 3.33 54 ## 5 46 21 9 7.00 54 ## 6 53 10 9 3.33 54

# the number of columns ncol(mydata)

## [1] 5

View a simple summary of your variable of interest. This `summary`

function accesses the `xibar`

column (variable) stored inside the `mydata`

data.frame object.

```
summary(mydata$xibar)
```

## Min. 1st Qu. Median Mean 3rd Qu. Max. ## 3.33 3.50 4.16 4.50 4.83 7.00

Initiate your `svydesign`

object for a **repeated systematic sample** design. This ``mydesign``

object will be used for all subsequent analysis commands:

```
mydesign <-
svydesign(
id = ~1 ,
data = mydata ,
weight = ~wt1 ,
fpc = ~M
)
```

From this point forward, the sampling specifications of the `mydata`

data set’s survey design have been fixed and most analysis commands will simply use the set of tools outlined on the R survey package homepage, referring to the object ``mydesign``

at the `design=`

parameter of the specific R function or method.

View the survey design structure:

mydesign

## Independent Sampling design ## svydesign(id = ~1, data = mydata, weight = ~wt1, fpc = ~M)

View the survey design’s object class or type:

```
class(mydesign)
```

## [1] "survey.design2" "survey.design"

View the weighted total population of this survey design, by referring to the `wt1`

column from the original data.frame:

```
sum(mydata$wt1)
```

## [1] 54

View the degrees of freedom for this survey design object:

```
degf(mydesign)
```

## [1] 5

Count the number of unweighted observations where the variable `xibar`

is not missing:

```
unwtd.count(~xibar, mydesign)
```

## counts SE ## counts 6 0

Print the mean and standard error of the `xibar`

variable:

```
svymean(~xibar, mydesign)
```

## mean SE ## xibar 4.5 0.53

Print the weighted total and standard error of the `xi`

variable:

```
svytotal(~xi, mydesign)
```

## total SE ## xi 729 86

Alternatively, the result of a function call like `svymean`

or `svytotal`

can be stored into a secondary R object.

```
mysvytotal <- svytotal(~xi, mydesign)
```

Once created, this `svytotal`

can be queried independently from the `mydesign`

object. For example, the `coef`

and `SE`

functions can directly extract those attributes:

```
coef(mysvytotal)
```

## xi ## 729

```
SE(mysvytotal)
```

## xi ## xi 85.9

Note that the number of decimal places shown can be adjusted by modifying the `digits`

parameter within the `options`

function at any time.

options(digits = 10) SE(mysvytotal)

## xi ## xi 85.94882198

We can use the `confint`

function to obtain confidence intervals for the coefficient estimates.

```
confint(mysvytotal)
```

## 2.5 % 97.5 % ## xi 560.5434044 897.4565956

We can specify the `df=`

parameter to use the survey design’s degrees of freedom (instead of the default `df=Inf`

) to replicate Stata’s confidence interval calculation method.

# matches stata confint(mysvytotal, df = degf(mydesign))

## 2.5 % 97.5 % ## xi 508.0615194 949.9384806

Print the median of the `xibar`

variable, including the confidence interval in the output.

svyquantile(~xibar, mydesign, quantiles = 0.5, ci = TRUE)

## $quantiles ## 0.5 ## xibar 4 ## ## $CIs ## , , xibar ## ## 0.5 ## (lower 3.330000000 ## upper) 5.958360258

## See also

- The R survey package homepage
- Lumley, T. Complex Surveys: A Guide to Analysis Using R (Wiley Series in Survey Methodology)
- Damico, A. Step-by-step instructions to analyze major public-use survey data sets with the R language