Probabilities and Distributions | R Learning Modules

Version info: Code for this page was tested in R version 3.0.2 (2013-09-25) On: 2013-11-19 With: lattice 0.20-24; foreign 0.8-57; knitr 1.5

1. Generating random samples from a normal distribution

Even though we would like to think of our samples as random, a computer running a deterministic algorithm cannot produce truly random numbers. So, we will admit that we are really drawing a pseudo-random sample. In order to be able to reproduce the results on this page we will set the seed for our pseudo-random number generator to the value of 124 using the set.seed function. (For more information on the random number generator used in R please refer to the help page for .Random.seed, which also documents set.seed and has a very detailed explanation.)

set.seed(124)
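As a quick check of what the seed does, the sketch below draws two samples after setting the same seed and confirms that they are identical. The seed value 42 and the object names a, b and d are arbitrary choices made for this illustration; the last line resets the seed to 124 so that the results on the rest of this page are unaffected.

```r
# Two draws made after setting the same seed are identical
set.seed(42)
a <- rnorm(5)
set.seed(42)
b <- rnorm(5)
identical(a, b)

# Without resetting the seed, the next draw continues the stream and differs
d <- rnorm(5)
identical(a, d)

# Restore the seed used on this page
set.seed(124)
```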

It is often very useful to be able to generate a sample from a specific distribution. To generate a sample of size 100 from a standard normal distribution (with mean 0 and standard deviation 1) we use the rnorm function. We only have to supply the n (sample size) argument since mean 0 and standard deviation 1 are the default values for the mean and sd arguments.

norm <- rnorm(100)

Now let’s look at the first 10 observations. We select them by placing the index range 1:10 in square brackets after the object name. In the output, the number in square brackets at the start of each line is the index of the first element printed on that line. For example, the [9] indicates that the first number on the second line (0.1971) is the ninth element.

norm[1:10]

##  [1] -1.3851  0.0383 -0.7630  0.2123  1.4255  0.7445  0.7002 -0.2294
##  [9]  0.1971  1.2072

mean(norm)

## [1] 0.00962

sd(norm)

## [1] 0.884

If we want to obtain a sample of values drawn from a normal distribution with a different value for the mean and standard deviation then we just have to use the mean and sd arguments. Let’s draw a sample of size 100 from a normal distribution with mean 2 and standard deviation 5.

set.seed(124)
norm <- rnorm(100, 2, 5)
norm[1:10]

##  [1] -4.925  2.192 -1.815  3.062  9.128  5.722  5.501  0.853  2.985  8.036

mean(norm)

## [1] 2.05

sd(norm)

## [1] 4.42
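Comparing the two samples above suggests that, with the same seed, the second sample is a shifted and rescaled copy of the first (for example, 2 + 5 * -1.3851 is about -4.925). A minimal sketch that checks this relationship; the object names z and x are chosen here for illustration only.

```r
set.seed(124)
z <- rnorm(100)                     # standard normal draws
set.seed(124)
x <- rnorm(100, mean = 2, sd = 5)   # same seed, different mean and sd

# x should equal 2 + 5 * z up to floating-point error
all.equal(x, 2 + 5 * z)

# so the sample mean and standard deviation transform the same way
all.equal(mean(x), 2 + 5 * mean(z))
all.equal(sd(x), 5 * sd(z))
```

This is why the sample standard deviation of 4.42 is five times the earlier value of 0.884.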

2. Generating random samples from other distributions

Here is a list of the functions that will generate a random sample from other common distributions: runif, rpois, rmvnorm, rnbinom, rbinom, rbeta, rchisq, rexp, rgamma, rlogis, rstab, rt, rgeom, rhyper, rwilcox, rweibull. Most of these are available in base R, but rmvnorm and rstab are not; rmvnorm, for example, is provided by the add-on package mvtnorm. Each function has its own set of parameter arguments in addition to the sample size n. For example, the rpois function is the random number generator for the Poisson distribution and it has only the parameter argument lambda. The rbinom function is the random number generator for the binomial distribution and it takes two parameter arguments: size and prob. The size argument specifies the number of Bernoulli trials and the prob argument specifies the probability of a success for each trial.

# Generating a random sample from a Poisson distribution with lambda=3
set.seed(124)
pois <- rpois(100, lambda = 3)
pois[1:10]

##  [1] 1 2 3 2 2 2 3 3 6 2

mean(pois)

## [1] 2.83

var(pois)

## [1] 2.34

# Generating a random sample from a Binomial distribution with size=20 and
# prob=.2
set.seed(124)
binom <- rbinom(100, 20, 0.2)
binom[1:10]

##  [1] 2 3 4 3 3 3 4 4 7 3

mean(binom)

## [1] 3.85

sd(binom)

## [1] 1.6
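For comparison with the sample statistics above, the theoretical moments can be computed directly: a Poisson distribution has mean and variance both equal to lambda, and a binomial distribution has mean size * prob and standard deviation sqrt(size * prob * (1 - prob)). The sketch below evaluates these for the parameters used above; the object names are chosen for illustration only.

```r
# Poisson(lambda): mean and variance both equal lambda
lambda <- 3
pois_mean <- lambda
pois_var <- lambda

# Binomial(size, prob): mean = size * prob, sd = sqrt(size * prob * (1 - prob))
size <- 20
prob <- 0.2
binom_mean <- size * prob
binom_sd <- sqrt(size * prob * (1 - prob))

c(pois_mean = pois_mean, pois_var = pois_var)
c(binom_mean = binom_mean, binom_sd = binom_sd)
```

With only 100 observations the sample values are close to these theoretical values, but not equal to them.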

3. Other probability and distribution functions

For each of the distributions there are four functions which return fundamental quantities of the distribution. Let’s consider the normal distribution as an example. We have already given examples of the rnorm function, which generates a random sample from a specific normal distribution. The dnorm function returns the value of the density function at a specific point. For a continuous distribution such as the normal this value is the height of the density curve, not a probability; for a discrete distribution such as the binomial, the corresponding function (dbinom) returns the point probability. These functions are very useful for creating a plot of the density function of a distribution. Just as the random number generator functions all start with an “r”, the density functions for all the distributions start with a “d”.

# density at a specific value of a standard normal dist
dnorm(-1.96)

## [1] 0.0584

# plotting the density function of a normal distribution: N(2, .25),
# i.e. mean 2 and variance .25, so the sd argument is 0.5
x <- seq(0, 4, 0.1)

plot(x, dnorm(x, 2, 0.5), type = "l")

# plotting the density function of a binomial distribution: Binom(30, .25)
y <- 0:30
plot(y, dbinom(y, 30, 0.25), type = "h")
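As a sanity check on the two density functions plotted above, the sketch below confirms that the binomial point probabilities sum to 1 and that the normal density integrates to 1. It uses the base R integrate function for numerical integration; the final line, with sd = 0.1, is an extra illustration added here to show that a density value is not a probability and can exceed 1.

```r
# Point probabilities of Binom(30, .25) over its whole support sum to 1
binom_total <- sum(dbinom(0:30, 30, 0.25))
binom_total

# The N(2, .25) density integrates to 1 (numerically) over the real line
norm_total <- integrate(dnorm, -Inf, Inf, mean = 2, sd = 0.5)$value
norm_total

# A density value is a height, not a probability, and can be greater than 1
peak <- dnorm(0, mean = 0, sd = 0.1)
peak
```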

It is also possible to calculate p-values using the cumulative distribution functions. For the normal distribution this function is pnorm, which returns the probability of a value less than or equal to its argument, and for the other distributions these functions all start with a “p”.

# calculating the upper-tail p-values for quantiles of a standard normal
1 - pnorm(1.959964)

## [1] 0.025

1 - pnorm(1.644854)

## [1] 0.05

It is also possible to calculate the quantiles for a specific distribution. For the normal distribution this function is qnorm, and for the other distributions these functions all start with a “q”.

# calculating the quantiles for the standard normal
qnorm(0.05)

## [1] -1.64

qnorm(0.025)

## [1] -1.96
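The “p” and “q” functions are inverses of each other, which the sketch below checks for the standard normal. It also uses the lower.tail argument of pnorm, which returns the upper-tail probability directly instead of subtracting from 1. The object names are chosen here for illustration.

```r
# qnorm inverts pnorm: this recovers 1.5 up to floating-point error
p <- pnorm(1.5)
back <- qnorm(p)
back

# lower.tail = FALSE gives the upper-tail probability directly
upper <- pnorm(1.959964, lower.tail = FALSE)
upper

# two-sided p-value for an observed z statistic
z <- 1.959964
two_sided <- 2 * pnorm(abs(z), lower.tail = FALSE)
two_sided
```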

4. The sample function

The sample function is used to generate a random sample from a given population. It can be used to sample with or without replacement by using the replace argument (the default is FALSE, that is, sampling without replacement). The only obligatory argument is x, which is either a vector of data that will constitute the population from which the sample will be drawn, or a single positive integer n, which is treated as the population 1:n. The default is to create a sample equal in size to the population, but by using the size argument any sample size can be specified; when sampling without replacement the size cannot exceed the population size. A vector of probabilities can also be supplied in the prob argument. This vector has to be equal in length to the size of the population and it will automatically be normalized if its elements do not sum up to one. The default is for every element in the population to have equal chance of being chosen.

# random sample of size 8 from sequence [5, 15]
set.seed(124)

sample(seq(5, 15), 8)

## [1]  5  9 14  8  6 11  7 10

# random permutation of sequence [1, 10]
set.seed(124)

sample(10)

##  [1]  1  4  5  3  2  6  7  8  9 10

# random sample of size 10 from sequence [1, 5] with unequal probabilities
# of being chosen
set.seed(124)

sample(5, 10, prob = c(0.3, 0.4, 0.1, 0.1, 0.1), replace = TRUE)

##  [1] 2 1 1 2 2 2 1 1 4 2
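To illustrate the normalization of the prob argument, the sketch below supplies weights that sum to 10 rather than 1 and compares the observed proportions in a large sample with the normalized weights. The weights, the sample size of 100000 and the object names are choices made for this illustration.

```r
set.seed(124)
w <- c(3, 4, 1, 1, 1)   # weights that sum to 10, not 1
draws <- sample(5, 1e5, replace = TRUE, prob = w)

# observed proportions next to the normalized weights
freq <- as.numeric(table(factor(draws, levels = 1:5))) / length(draws)
round(rbind(observed = freq, expected = w / sum(w)), 3)
```

The observed proportions should be close to 0.3, 0.4, 0.1, 0.1 and 0.1, the same probabilities used in the example above.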