Introduction to Power Simulations in R

OARC Statistical Methods and Data Analytics

Introduction to Power Analysis

A scientific study

  • The average systolic blood pressure (SBP) of adult American males is 120 mmHg, with a standard deviation of 15 mmHg.

  • Amphetamine is a drug used to treat ADHD; however, we suspect it may have a side effect that increases SBP.

  • To determine whether the drug affects SBP, we conduct a hypothesis test.

  • We collect data from treated individuals to test this hypothesis.

  • We use a power analysis to determine how many participants are needed to reliably detect the drug’s potential side effect on SBP.

Hypothesis Testing

Hypothesis testing is a statistical method used to make inferences about a population based on a sample. It involves the following steps:

  1. State a null and an alternative hypothesis

    Null Hypothesis (\(H_0\)): The drug has no effect on SBP; that is, the mean SBP of the treated group is 120 mmHg.

    Alternative Hypothesis (\(H_{1}\)): The drug has a side effect; that is, the mean SBP of the treated group is greater than 120 mmHg.

    \[ \begin{array}{ll} H_0 : & \mu_{treated} = 120 \\ H_{1} : & \mu_{treated} > 120 \end{array} \]

  2. Select a statistical test

    In this case, we can use a one-sample z-test or one-sample t-test to compare the sample mean to the known population mean under \(H_0\).

  3. Collect data and calculate the test statistic

    We gather a sample from the treated group, compute the sample mean (\(\bar{x}\)), and use it to calculate the test statistic for the chosen test.

  4. Draw a conclusion

    Based on the test statistic, we assess whether the sample provides enough evidence to conclude that the null hypothesis should be rejected in favor of the alternative.

Type I and Type II Errors

In hypothesis testing, our decisions are based on random sample data, which introduces uncertainty. As a result, we might sometimes make mistakes. These mistakes are classified as Type I and Type II errors.

  • Type I error (\(\alpha\)): Rejecting the null hypothesis when it is actually true. Also known as a false positive.

  • Type II error (\(\beta\)): Failing to reject the null hypothesis when the alternative hypothesis is actually true. Also known as a false negative.

Decision vs. truth:

  • Reject \(H_0\) when \(H_0\) is true: Type I error (\(\alpha\))
  • Reject \(H_0\) when \(H_1\) is true: correct decision (power, \(1 - \beta\))
  • Fail to reject \(H_0\) when \(H_0\) is true: correct decision
  • Fail to reject \(H_0\) when \(H_1\) is true: Type II error (\(\beta\))

  • The power of a test is the probability of correctly rejecting the null hypothesis when the alternative hypothesis is true. It is calculated as:

\[\text{Power} = 1 - \beta\]

Test statistic, critical value, and significance level

In this example, we assume the population standard deviation is known (\(\sigma = 15\)), so we use a one-sample z-test to construct our test.

  • We take a random sample of size \(n\) from the treated group and calculate the sample mean \(\bar{x}\).

  • We then standardize the sample mean to form a test statistic called the z-statistic:

\[z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}\]

  • This z-statistic follows a standard normal distribution under the null hypothesis, assuming the population standard deviation \(\sigma\) is known.

  • We set a significance level \(\alpha\) (e.g., 0.05), to control the probability of making a Type I error — rejecting the null hypothesis when it is actually true.

  • The critical value for the test is the value beyond which we reject \(H_0\):

\[\Pr(z > z_{1 - \alpha}) = \alpha\]
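
In R, the critical value is simply a quantile of the standard normal distribution; for example, with \(\alpha = 0.05\):

# Critical value z_(1 - alpha) for a one-sided test at alpha = 0.05
qnorm(1 - 0.05)
[1] 1.644854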

  • Figure 1 below shows the sampling distribution of the sample mean under the null hypothesis and how the critical value and type I error are related.
Figure 1: One-sided z-test: Alpha and Critical Value

Distribution of the alternative hypothesis and power

  • The power of a statistical test is the probability of correctly rejecting the null hypothesis when the alternative hypothesis is true.

  • In our example, the null hypothesis states that the mean SBP is 120 mmHg, while the alternative hypothesis states that it is greater than 120 mmHg.

  • If the true mean is actually 130 mmHg, then the power of the test is the probability that the test statistic falls in the rejection region, correctly leading us to reject the null hypothesis.

  • Figure 2 shows the distribution of the test statistic under the null hypothesis and under two alternative hypotheses, where the true mean SBP is assumed to be either 130 mmHg or 135 mmHg.

Figure 2: One-sided z-test: power and Critical Value

Sample size calculation for a given power

  • From Figure 2, we observe that under the alternative hypothesis, the distribution of the test statistic is shifted to the right, depending on the value of \(\mu_1\).

  • More specifically, the distribution of the test statistic under the alternative is normal with mean:

\[ \delta = \frac{\mu_{1} - \mu_0}{\sigma/\sqrt{n}} \] and standard deviation of 1.

  • The power of the test is the probability that the test statistic under the alternative distribution exceeds the critical value \(z_{1 - \alpha}\).

  • By setting the power to \(1 - \beta\) and using the cumulative distribution function (CDF) of the standard normal distribution, we can express the required sample size as:

\[ n = \left[\frac{z_{1 - \alpha} - z_{\beta}}{(\mu_{1} - \mu_0)/\sigma}\right]^2 \]
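
As a quick check of this formula in R (using the SBP example with \(\mu_0 = 120\), \(\mu_1 = 130\), \(\sigma = 15\), \(\alpha = 0.05\), and power \(1 - \beta = 0.8\)):

# Sample size from the formula above
mu0 <- 120; mu1 <- 130; sigma <- 15
alpha <- 0.05; beta <- 0.2
n <- ((qnorm(1 - alpha) - qnorm(beta)) / ((mu1 - mu0) / sigma))^2
n          # about 13.9, matching the pwr.norm.test result shown later
ceiling(n) # round up to 14 participants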

Effect size

  • Effect size measures how large a difference or relationship is — not just whether it exists.

  • In power analysis, effect size is crucial. It directly affects how large your sample needs to be to detect a meaningful effect.

  • Effect sizes can be unscaled or scaled.

For example, in the one-sample mean test, the unscaled effect size is simply the raw difference between the true mean and the mean under the null:

\[ d = \mu_{1} - \mu_0 \]

and the scaled effect size standardizes this difference by dividing by the standard deviation:

\[ d_{scaled} = \frac{\mu_{1} - \mu_0}{\sigma} \]
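
For the SBP example, assuming a true mean of 130 mmHg under the alternative, the scaled effect size is:

# Scaled effect size for the SBP example
(130 - 120) / 15  # about 0.67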

There are many different ways to define effect size, depending on the type of test and the context of the analysis.

Why Does Power Matter?

  • A low-powered study may fail to detect real effects — leading to false negatives.

  • A high-powered study gives you greater confidence that real effects will be detected when they exist.

  • Power analysis helps you choose the minimum sample size needed to detect meaningful effects with a desired level of certainty.

Components of Power Analysis

  • The power of a statistical test depends on several key components:
    • \(\boldsymbol{\alpha}\) (alpha): Type I error rate (false positive rate)
    • \(\boldsymbol{\beta}\) (beta): Type II error rate (false negative rate)
    • Effect size: magnitude of the effect or difference being tested
    • Sample size: larger sample sizes increase precision and power
    • Variability: lower variability makes effects easier to detect

General Considerations for Sample Size Calculations

Significance level (\(\alpha\)):

  • The maximum probability of making a Type I error; commonly set at 0.05 for a two-sided test.

  • Other choices should be justified. For example, in rare diseases where participants are hard to find, a higher significance level (e.g., 0.1) may be appropriate.

  • May also need adjustment for multiple comparisons.

Power (\(1 - \beta\)):

  • Commonly used values are \(1 - \beta = 0.8\) or \(0.9\).

  • If a study cannot be repeated, consider \(1 - \beta = 0.95\).

  • Power below 0.8 should be justified; a study that is unlikely to detect the effect of interest may not be worth conducting at all.

Effect size:

  • The effect size should reflect a clinically or scientifically meaningful difference.

  • A study should not be conducted if the expected difference is not important or relevant to the research question.

  • For sample size calculations, use either the smallest clinically relevant difference or the expected difference, whichever is larger.

  • If no established threshold for clinical relevance exists, use domain expertise to define an appropriate effect size.

Assumed value of nuisance parameters

  • Nuisance parameters (e.g., the standard deviation in a test of means) are not the focus of the primary research question but still affect power and sample size calculations.

  • These values can often be obtained from prior studies, but they typically vary or may be poorly estimated.

  • Therefore, it is important to account for their uncertainty by conducting a sensitivity analysis to examine how the required sample size changes with different assumptions.

Sample size allocation ratio

  • Generally, when comparing treatments, balanced groups will produce the most efficient test (smallest variance) and will thus require the smallest sample size overall.

  • Efficiency loss is usually modest if the allocation ratio isn’t too extreme (e.g., 2:1 is often acceptable).

  • Additionally, if dropouts or missing data are expected, sample size should be increased to maintain power.

Example: Visualizing Power Curve in R

  • In this example, we visualize the power curve for a two-sample t-test using R.

    \[ \begin{array}{ll} H_0 : & \mu_1 - \mu_2 = 0 \\ H_{1} : & \mu_1 - \mu_2 \neq 0 \end{array} \]

  • The power curve shows how the power of the test changes with sample size. Effect sizes of \(d = 0.2, 0.5, 0.7, 0.9\) (small to large) are shown in different colors, and the significance level is set to 0.05 and 0.1.

  • Effect size (Cohen’s d) is the difference between the means divided by the pooled standard deviation.

\[ d = \frac{\mu_1 - \mu_2}{S_{pooled}} \]

Figure 3: Power Curve for Two-Sample t-test
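
A curve like this can be generated with the pwr package; the sketch below uses an assumed grid of sample sizes and may not match Figure 3 exactly:

library(pwr)
n_seq  <- seq(5, 150, by = 5)     # sample sizes per group
d_vals <- c(0.2, 0.5, 0.7, 0.9)   # effect sizes, small to large
plot(NULL, xlim = range(n_seq), ylim = c(0, 1),
     xlab = "Sample size per group", ylab = "Power")
for (j in seq_along(d_vals)) {
  pw <- sapply(n_seq, function(n)
    pwr.t.test(n = n, d = d_vals[j], sig.level = 0.05,
               type = "two.sample", alternative = "two.sided")$power)
  lines(n_seq, pw, col = j)
}
abline(h = 0.8, lty = 2)
legend("bottomright", legend = paste("d =", d_vals),
       col = seq_along(d_vals), lty = 1)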

Power Calculation using pwr Package

  • The pwr package provides analytical power calculations for:
    • t-tests
    • ANOVA
    • Correlation
    • Chi-squared tests
  • These are helpful when assumptions are met and models are simple.

Power Calculations for Common Tests

  • Power Calculation for One-Sample Z-Test

\[ \begin{array}{ll} H_0 : & \mu = 120 \\ H_{1} : & \mu > 120 \end{array} \]

Compute the power at \(\mu_1 = 130\), given \(\sigma = 15\), \(n = 16\), and \(\alpha = 0.05\).

# Load the pwr package for analytical power calculations
library(pwr)
# Population standard deviation
sigma <- 15
# Mu under the null hypothesis
mu0 <- 120
# Mu under the alternative hypothesis
mu1 <- 130
# Effect size
d <- (mu1 - mu0) / sigma 
# Power calculation for a given sample size
pwr.norm.test(d = d, n = 16, sig.level = 0.05, alternative = "greater")

     Mean power calculation for normal distribution with known variance 

              d = 0.6666667
              n = 16
      sig.level = 0.05
          power = 0.8465653
    alternative = greater
  • Required Sample Size for One-Sample Z-Test
# Required sample size for a given power
pwr.norm.test(d = d, power=0.8, sig.level = 0.05, alternative="greater")

     Mean power calculation for normal distribution with known variance 

              d = 0.6666667
              n = 13.91076
      sig.level = 0.05
          power = 0.8
    alternative = greater

The sample size should be rounded up, so \(n = 14\) participants are needed.

  • Visualizing Power Function
# Power function of the same test for different values of mu (i.e., different effect sizes)
mu <- seq(120, 140, length.out = 100)
# Effect size
d <- (mu - mu0) / sigma
plot(d,
     pwr.norm.test(d = d, n = 16, sig.level = 0.05, alternative = "greater")$power,
     type = "l", xlab = "Effect size (Cohen's d)", ylab = "Power", ylim = c(0,1))
abline(h = 0.05, lty = 2)
abline(h = 0.80, col = "blue", lty = 2)

  • Two-Sample, Two-Sided t-Test

Medium effect size (Cohen’s d = 0.5)

# Calculating n for two sided, two sample t-test 
pwr.t.test(d = 0.5, power = 0.8, sig.level = 0.05, 
           type = "two.sample", alternative = "two.sided")

     Two-sample t test power calculation 

              n = 63.76561
              d = 0.5
      sig.level = 0.05
          power = 0.8
    alternative = two.sided

NOTE: n is number in *each* group
  • One-Way ANOVA

    Three groups, medium effect size (Cohen’s f = 0.25)

#Power Calculation for One-Way ANOVA
pwr.anova.test(k = 3, f = 0.25, sig.level = 0.05, power = 0.8)

     Balanced one-way analysis of variance power calculation 

              k = 3
              n = 52.3966
              f = 0.25
      sig.level = 0.05
          power = 0.8

NOTE: n is number in each group
  • Correlation Test
#Power Calculation for correlation Test
pwr.r.test(r = 0.3, sig.level = 0.05, power = 0.8, alternative = "two.sided")

     approximate correlation power calculation (arctangh transformation) 

              n = 84.07364
              r = 0.3
      sig.level = 0.05
          power = 0.8
    alternative = two.sided
  • Chi-Squared Test of Independence
#Power Calculation for Chi-Squared Test
pwr.chisq.test(w = 0.3, df = 1, sig.level = 0.05, power = 0.8)

     Chi squared power calculation 

              w = 0.3
              N = 87.20954
             df = 1
      sig.level = 0.05
          power = 0.8

NOTE: N is the number of observations
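
If no study-specific effect size is available, the pwr package also provides Cohen’s conventional small, medium, and large values through cohen.ES(); for example, the medium effect size for a chi-squared test used above:

# Conventional (Cohen) effect size for a chi-squared test
cohen.ES(test = "chisq", size = "medium")  # effect.size = 0.3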

Power Simulations in R

Why Use Power Simulations?

  • Analytical methods rely on simplifying assumptions:

    • Normality.
    • Equal variances.
    • Simple comparisons (e.g., t-tests, ANOVA).
    • Difficulty accounting for missing data or complex sampling designs.
  • Simulations are helpful when:

    • Assumptions are violated (e.g., skewed or non-normal data).
    • Models are complex (e.g., mixed models, mediation, nonlinear relationships).
  • Benefits of simulation-based power analysis:

    • Flexible: Adaptable to any hypothesis or study design.
    • Realistic: mimics the data-generating process.
    • Transparent: you control and inspect every step.

Workflow for a Power Simulation

  1. Identify the hypothesis and the parameter of interest.

  2. Define the data-generating process

    • Set the effect size, sample size, variance, and distribution.
    • Specify the statistical model to be tested (e.g., two-sample t-test, logistic regression).
  3. Simulate data

    • Use functions like rnorm(), rbinom(), etc.
    • Optionally, generate a large population dataset and sample from it.
  4. Fit the model

    • Use t.test(), lm(), glm(), or other relevant functions.
    • Extract test statistics (e.g., t-value, p-value)
  5. Evaluate significance

    • Apply a decision rule (e.g., compare p-value to significance level such as 0.05).
  6. Repeat the process many times (e.g., 1,000 simulations):

    • Use replicate() or loops to repeat the simulation.
    • Store results (e.g., p-values, test statistics).
  7. Estimate power

    • Power is the proportion of simulations where the effect is detected (i.e., \(p < \alpha\)).

For example, if we collect 1,000 p-values, the estimated power is:

\[ \text{Power} = \dfrac{ \sum_{i=1}^{1000} I(p_i < \alpha) } {1000 } \] Where \(I(p_i<\alpha)\) is an indicator function for the \(i\)-th simulation that equals 1 if the p-value is less than \(\alpha\), and 0 otherwise.

Useful R Tools We’ll Need

  • Random data generation
    • rnorm(n, mean, sd) — normal distribution
    • rbinom(n, size, prob) — binomial outcomes
    • runif(n, min, max) — uniform distribution
    • sample(x, size, replace = TRUE) — random sampling
  • Model fitting and hypothesis testing
    • t.test(x, y) — for comparing means
    • lm(y ~ x) — linear regression
    • glm(y ~ x, family=binomial) — logistic regression
  • Simulation helpers
    • for loops — repeat steps manually or by condition
    • replicate(n, expr) — repeat simulation n times
    • set.seed() — for reproducibility
    • mean(), sd(), summary() — for descriptive summaries

Functions and Loops

Turn blocks of code into functions to avoid repetition and improve readability.

  • Function definition:

my_function <- function(arg1, arg2) { ... }

  • Basic structure of a for loop:

for (i in 1:n) {
  # repeat this code n times
}

  • We can use a function inside a loop:

nsim <- 1000
results <- numeric(nsim)
for (i in 1:nsim) {
  results[i] <- my_function(arg1, arg2)
}

  • Or use replicate() for compact syntax:

results <- replicate(nsim, my_function(arg1, arg2))

Example:

We write a function to sample data from a normal distribution and calculate the mean.

# Function to sample data and calculate mean
# Set seed for reproducibility
set.seed(101)
sample_mean <- function(n, mean = 0, sd = 1) {
  data <- rnorm(n, mean, sd)
  return(mean(data))
}
  • Use the function to simulate systolic blood pressure (SBP) under the null hypothesis (\(\mu = 120\)), and calculate the mean of a random sample.
# Simulate SBP under the null hypothesis and compute the sample mean
# Set seed for reproducibility
set.seed(101)
sample_mean(n=100, mean = 120, sd = 15)
[1] 119.4421

Repeated Sampling using replicate()

  • We use the replicate() function to repeat a simulation multiple times.

  • Here, we simulate systolic blood pressure (SBP) under the null hypothesis (\(\mu = 120\)) 1,000 times, and calculate the sample mean each time.

# Set seed for reproducibility
set.seed(101)
# Replicate 1000 times
nsim <- 1000
x_bars <- replicate(nsim, sample_mean(100, mean = 120, sd = 15))
x_bars[1:6]
[1] 119.4421 119.3700 120.0311 119.6180 116.8302 120.7440
  • Then estimate the standard error of the sampling distribution:
#standard error
sd(x_bars)
[1] 1.43297

This should closely approximate the theoretical value of \(15/\sqrt{100} = 1.5\).

  • We can also visualize the sampling distribution of the sample means:
#Histogram of the sample means
hist(x_bars, breaks = 30, main = "Distribution of Sample Means", 
     xlab = "Sample Mean", col = "lightblue", freq = FALSE)
Figure 4: Histogram of Sample Means

Question: If we repeatedly take samples of size \(n = 10\) when the true population mean is 120 (so the null hypothesis is true) and conduct a hypothesis test each time, in what proportion of tests will the p-value be less than 0.05?

Hint: This proportion approximates the Type I error rate (\(\alpha\)).

# Function to sample data and calculate the z-statistic and p-value
sample_p_value <- function(n, mu0 = 0, mu1 = 0, sd = 1) {
  # Generate a random sample from a normal population with mean mu1
  data <- rnorm(n, mu1, sd)
  # Compute the one-sample z-statistic (sigma assumed known)
  z_value <- (mean(data) - mu0) / (sd / sqrt(n))
  # p-value: probability that a standard normal exceeds z_value
  p_value <- pnorm(z_value, lower.tail = FALSE)
  return(p_value)
}
#set seed
set.seed(101)

#set the number of simulations
nsim <- 1000

#replicate "nsim" times
p_values <- replicate(nsim, sample_p_value(10, mu0 = 120, mu1 = 120, sd = 15))

# Proportion of p-values less than 0.05 (here, an estimate of the Type I error rate)
mean(p_values < 0.05)
[1] 0.039
  • Exercise

Simulate the statistical power when the true population mean is 125, the null hypothesis mean is 120, and the sample size is 100. Use nsim = 1000 repetitions.

[1] 0.959
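
One way to obtain the result shown above, using replicate() with the sample_p_value() function defined earlier:

# Exercise solution: power when the true mean is 125, null mean 120, n = 100
set.seed(101)
p_values <- replicate(1000, sample_p_value(n = 100, mu0 = 120, mu1 = 125, sd = 15))
# Power is the proportion of p-values below 0.05
mean(p_values < 0.05)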

R loops for Simulations

  • We use for loops to iterate over multiple parameters, conditions, or simply repeat the same process many times. A loop can be used instead of replicate().

  • for loops are more flexible and allow for complex operations, but they are generally less efficient than replicate() for simple tasks.

set.seed(101)

# Set number of simulations
nsim <- 1000

# Initialize an empty vector to store p-values
p_values <- rep(NA, nsim)

# Run the simulation
for (i in 1:nsim) {
  p_values[i] <- sample_p_value(n = 100, mu0 = 120, mu1 = 125, sd = 15)
}

# Estimate power as the proportion of p-values less than 0.05
mean(p_values < 0.05)
[1] 0.959
  • Here, we use a custom function inside the loop to simulate data and compute the p-value for each iteration.

Example: Simulating a Two-Sample t-Test

  • We want to estimate, via simulation, the power of a two-sample t-test for a difference in blood pressure between two groups, treatment and control.

    From previous studies, we know that the standard deviation is approximately 15 mmHg and is expected to be similar between the two groups.

    We aim to detect a 5 mmHg difference in systolic blood pressure between the treatment and control groups.

These are the steps for the power simulation:

Step 1: Define the Hypothesis and Identify the Test Statistic:

Null hypothesis (\(H_0\)): \(\mu_1 - \mu_2 = 0\) (no difference in means)

Alternative hypothesis (\(H_1\)): \(\mu_1 - \mu_2 \neq 0\) (there is a difference)

Step 2: Define the data-generating process:

  • Population means: \(\mu_1 = 120\), \(\mu_2 = 125\)

  • Common standard deviation: \(\sigma = 15\)

  • Sample size per group: \(n = 50\)

  • Data are generated from normal distributions

Step 3: Simulate data

We simulate one dataset and store it in a data.frame.

The R code is as follows:

set.seed(111)

# Parameters
n <- 50
mu1 <- 120
mu2 <- 125
sd <- 15

# Simulate data for both groups
group1 <- rnorm(n, mu1, sd)
group2 <- rnorm(n, mu2, sd)

# Create data frame
# (note: group1, mean 120, is labeled "Yes" and group2, mean 125, "No";
#  the labeling does not affect the two-sided test)
dat <- data.frame(BP = c(group1, group2),
                  treatment = rep(c("Yes", "No"), each = n))

dat[1:6, ]
         BP treatment
1 123.52831       Yes
2 115.03896       Yes
3 115.32564       Yes
4  85.46482       Yes
5 117.43686       Yes
6 122.10417       Yes

Step 4: Conduct two-sample t-test and extract p-value

# two-sample t-test and return p-value
test_result <- t.test(BP ~ treatment, var.equal = TRUE,
       conf.level = 0.95, data = dat)  
test_result

    Two Sample t-test

data:  BP by treatment
t = 3.2839, df = 98, p-value = 0.00142
alternative hypothesis: true difference in means between group No and group Yes is not equal to 0
95 percent confidence interval:
  4.13435 16.76217
sample estimates:
 mean in group No mean in group Yes 
         127.5318          117.0836 
#extract p-value
test_result$p.value
[1] 0.001420294

Step 5 & 6: Repeat the Process and Evaluate Significance

We repeat this simulation 1000 times and store the resulting p-values.

#set seed
set.seed(111)

# Set parameters
n <- 50 # sample size for each group
mu1 <- 120 # mean of group 1
mu2 <- 125 # mean of group 2 (5 mmHg difference)
sd <- 15 # standard deviation of both groups
# Number of simulations
nsim <- 1000

# Initiate an empty vector for p-values
p_values <- rep(NA, nsim)

for (i in 1:nsim) {
  # Generate two random samples from normal distributions
  group1 <- rnorm(n, mu1, sd)
  group2 <- rnorm(n, mu2, sd)
  # Create a data frame
  dat <- data.frame(BP = c(group1, group2), treatment = rep(c("Yes", "No"), each = n))
  # Two-sample t-test
  test_result <- t.test(BP ~ treatment, var.equal = TRUE,
                        conf.level = 0.95, data = dat)
  # Extract p-value
  p_values[i] <- test_result$p.value
}

Step 7: Estimate power

 # Calculate power as the proportion of p-values less than 0.05
power_t_test <- mean(p_values  < 0.05)
power_t_test    
[1] 0.402
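
As a rough check, the analytical power from the pwr package for the same design (\(d = 5/15 \approx 0.33\), \(n = 50\) per group) should be close to this simulated value; any remaining gap reflects Monte Carlo error:

# Analytical power for comparison (pwr package)
pwr.t.test(n = 50, d = (125 - 120) / 15, sig.level = 0.05,
           type = "two.sample", alternative = "two.sided")$power  # approximately 0.38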

Wrapping It All Up in a Function

  • Let’s create a reusable function that simulates the power of a two-sample t-test, given input parameters such as sample size, effect size, standard deviation, and significance level.

  • This function makes it easy to explore how power changes across different scenarios.

To be more flexible, we include a few extra options:

  1. Use the alpha argument when calculating power (instead of a hardcoded 0.05).

  2. Include optional reproducibility with set.seed().

  3. Add a var.equal option.

# Function to simulate power for a two-sample t-test
sample_t_test_power <- function(n, mu1 = 0, mu2 = 0, sd = 1,
                                alpha = 0.05, var.equal = TRUE,
                                nsim = 1000, seed = NULL) {
  # Set seed for reproducibility
  if (!is.null(seed)) set.seed(seed)

  # Preallocate vector to store p-values
  p_values <- rep(NA, nsim)

  # Run simulation
  for (i in 1:nsim) {
    group1 <- rnorm(n, mu1, sd)
    group2 <- rnorm(n, mu2, sd)
    dat <- data.frame(BP = c(group1, group2),
                      treatment = rep(c("Yes", "No"), each = n))
    result <- t.test(BP ~ treatment, data = dat, 
                     var.equal = var.equal, conf.level = 1 - alpha)
    p_values[i] <- result$p.value
  }

  # Return estimated power
  mean(p_values < alpha)
}
  • Example usage:
sample_t_test_power(n = 150, mu1 = 120, mu2 = 125, sd = 15, seed = 101)
[1] 0.816
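
Keep in mind that the simulated power is itself an estimate based on nsim trials; its approximate Monte Carlo standard error is \(\sqrt{p(1-p)/\text{nsim}}\). For the estimate above:

# Approximate Monte Carlo standard error of the simulated power
p_hat <- 0.816
sqrt(p_hat * (1 - p_hat) / 1000)  # about 0.012; increase nsim for more precision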

Step-by-Step Code Using replicate() (optional)

# Function to simulate a two-sample t-test and return p-value
sample_t_test_p_value <- function(n, mu1 = 120, mu2 = 125, sd = 15) {
  group1 <- rnorm(n, mu1, sd)
  group2 <- rnorm(n, mu2, sd)
  t.test(group1, group2, var.equal = TRUE)$p.value
}

# Simulation setup
set.seed(111)
n_sim <- 1000
n <- 50  # sample size per group

# Run simulations using replicate
p_values <- replicate(n_sim, sample_t_test_p_value(n))

# Estimate power as proportion of p < 0.05
power_estimate <- mean(p_values < 0.05)
power_estimate
[1] 0.402
  • We can also simulate the distribution of the test statistic and plot it.
# Plot the distribution of t statistics from the two-sample t-test
  
# Function to simulate a two-sample t-test and return the t statistic
sample_t_test_t_value <- function(n, mu1 = 0, mu2 = 0, sd = 1) {
  # Generate two random samples from normal distributions
  group1 <- rnorm(n, mu1, sd)
  group2 <- rnorm(n, mu2, sd)
  # Conduct two-sample t-test
  t_test_result <- t.test(group1, group2, alternative = "two.sided")
  return(t_test_result$statistic)
}
# Set parameters
n <- 25 # sample size for each group
mu1 <- 120 # mean of group 1
mu2 <- 125 # mean of group 2 (5 mmHg difference)
sd <- 15 # standard deviation of both groups
# Number of simulations
n_sim <- 1000 
# Simulate t statistics for the two-sample t-test
t_values_t_test <- replicate(n_sim, sample_t_test_t_value(n, mu1, mu2, sd))

hist(t_values_t_test , breaks = 30, main = "Distribution of T statistics from Two-Sample t-Test", xlab = "T", col = "lightblue")  

Plotting sample size vs. power

  • Sample size is a key input in power simulations and must be set at the beginning of each simulation.

  • To determine the sample size required for a desired level of power, we run simulations across a range of sample sizes.

  • We then plot the estimated power for each sample size to visualize how power increases with sample size.

  • This approach helps identify the minimum sample size needed to achieve a target power (e.g., 0.8 or 0.9).

  • We can write a loop to simulate the power for each sample size, or use sapply() for more compact code, as shown below.

# Let n be from 10 to 200 for each group
n_values <- 10:200
# Use sapply() to compute the simulated power at each sample size
# (this can take a while: 191 sample sizes x 1000 simulations each)
powers <- sapply(n_values, function(n) {
  sample_t_test_power(n = n, mu1 = 120, mu2 = 125, sd = 15, nsim = 1000)
})

plot(powers ~ n_values, xlab = "Sample size", ylab = "Power")
abline(h = 0.8, col = "red", lty = 2)

Plot of power vs. sample size for a two-sided, two-sample t-test
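
Once the powers vector has been computed, the smallest simulated sample size reaching the target power can be read off directly (simulation noise may make the curve slightly non-monotone near the threshold):

# Smallest sample size per group with estimated power >= 0.80
n_values[which(powers >= 0.8)[1]]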

Example: Power Simulation for Regression

In the two-sample t-test example, we generated data from two normal distributions and compared group means.

In this example, we simulate the same comparison using a linear regression model.

When the outcome is continuous and the exposure is binary (e.g., treatment vs. control), linear regression can be used to estimate the difference in means.

The model is specified as:

\[ Y_i = \beta_0 + \beta_1 X_i + \epsilon_i \]

Where:

  • \(Y_i\) is the outcome variable

  • \(X_i\) is a binary exposure variable

  • \(\beta_0\) is the intercept

  • \(\beta_1\) is the coefficient for the exposure variable

  • \(\epsilon_i\) is the error term

In terms of \(\mu_1\) and \(\mu_2\), the model can be written as:

\[ Y_i = \mu_1 + (\mu_2 - \mu_1) X_i + \epsilon_i \]

This is equivalent to

\[ Y_i = \mu_1(1 - X_i) + \mu_2 X_i + \epsilon_i\]

Where \(\mu_1\) is the mean for the control group and \(\mu_2\) is the mean for the treatment group.

  • We will simulate data under this model and estimate power by checking how often the regression detects a significant \(\beta_1\).
# Function to simulate data for linear regression and return p-value
power_lm_p_value <- function(n, mu1 = 0, mu2 = 0, sd = 1) {
  # Generate binary exposure variable
  X <- rbinom(n, 1, 0.5) # 50% chance of being in either group
  # Generate outcome variable based on the linear model
  Y <- mu1 * (1 - X) + mu2 * X + rnorm(n, 0, sd)
  # Fit linear regression model
  lm_result <- lm(Y ~ X)
  #extract p-value for the exposure variable
  p_value <- summary(lm_result)$coefficients["X", "Pr(>|t|)"]
  return(p_value) # p-value for the exposure variable
}
# Set parameters
n <- 300 # total sample size (both groups combined)
mu1 <- 120 # mean of group 1
mu2 <- 125 # mean of group 2 (5 mmHg difference)
sd <- 15 # standard deviation of both groups
# Number of simulations
n_sim <- 1000
# Simulate p-values for linear regression
p_values_lm <- replicate(n_sim, power_lm_p_value(n, mu1, mu2, sd))
# Calculate power as the proportion of p-values less than 0.05
power_lm <- mean(p_values_lm < 0.05)
power_lm 
[1] 0.822

Adding a Covariate (e.g., Weight)

We can also include a covariate such as weight in both the data-generating model and the fitted regression. Because the covariate is adjusted for in the model (and is independent of treatment here), the residual variance is unchanged and the power for the treatment effect stays roughly the same.

# Function with a covariate (e.g., weight)
power_lm_p_value_cov <- function(n, mu1 = 0, mu2 = 0, b1 = 0, sd = 1) {
  X <- rbinom(n, 1, 0.5)
  w <- rnorm(n, 170, 20)
  Y <- mu1 * (1 - X) + mu2 * X + b1 * w + rnorm(n, 0, sd)
  lm_result <- lm(Y ~ X + w)
  summary(lm_result)$coefficients["X", "Pr(>|t|)"]
}

# Parameters
b1 <- 5
p_values_cov <- replicate(n_sim, power_lm_p_value_cov(n, mu1, mu2, b1, sd))
power_lm_cov <- mean(p_values_cov < 0.05)
power_lm_cov
[1] 0.822

Advanced Power Simulation Examples

In this section, we extend power simulation methods to more advanced scenarios, including:

  • Paired t-test: Simulating power for detecting a mean difference in paired data (e.g., before vs. after treatment).

  • Logistic Regression with a Binary Predictor: Estimating power to detect an association between a binary exposure and a binary outcome.

  • Poisson Regression: Estimating power for count data modeled with Poisson regression (e.g., number of events per time unit).

Paired t-test

  • The paired t-test is used when measurements are taken on the same subject before and after a treatment, or when observations are naturally paired (e.g., left vs. right eye).

  • To simulate paired data, we need to account for the correlation between paired observations, which often requires simulating from a multivariate normal distribution.

  • This makes the simulation slightly more complex than for independent samples, but still manageable using R functions such as MASS::mvrnorm().

# Simulate power for a paired t-test using multivariate normal data
# Requires the MASS package for mvrnorm()

library(MASS)

power_paired_ttest <- function(mu1 = 0, mu2 = 1,          # Means of the two conditions
                               sigma1 = 1, sigma2 = 1,    # Standard deviations
                               rho = 0.5,                 # Correlation between paired values
                               alpha = 0.05,              # Significance level
                               n = 20,                    # Number of paired observations
                               nsim = 1000,               # Number of simulations
                               varequal = TRUE,           # Equal-variance flag (not used by the paired test)
                               seed = 12345) {
  
  # Compute the covariance from correlation and standard deviations
  cov12 <- rho * sigma1 * sigma2
  
  # Create covariance matrix
  Sigma <- matrix(c(sigma1^2, cov12, 
                    cov12, sigma2^2), nrow = 2)

  # Set random seed for reproducibility
  set.seed(seed)
  
  # Initialize vector to store whether null was rejected
  reject <- logical(nsim)
  
  for (i in seq_len(nsim)) {
    # Simulate n paired observations from bivariate normal
    y <- mvrnorm(n, mu = c(mu1, mu2), Sigma = Sigma)
    
    # Paired t-test
    p <- t.test(y[, 1], y[, 2], paired = TRUE)$p.value
    
    # Store whether the null hypothesis is rejected
    reject[i] <- (p < alpha)
  }
  
  # Calculate power
  power <- mean(reject)
  
  cat("Estimated power:", round(power, 4), "\n")
  return(power)
}

# Example usage
power_paired_ttest(mu1 = 0, mu2 = 1, sigma1 = 1, sigma2 = 1, rho = 0.5, nsim = 10000)
Estimated power: 0.9892 
[1] 0.9892
power_paired_ttest(mu1 = 0, mu2 = 1, sigma1 = 1, sigma2 = 1, rho = 0.0, nsim = 10000)
Estimated power: 0.85 
[1] 0.85
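
The difference between these two results comes from the variance of the paired differences, \(\sigma^2_d = \sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2\): positive correlation shrinks this variance and therefore increases power. With the parameters above:

# Variance of the paired differences for rho = 0.5 vs. rho = 0
c(rho_0.5 = 1^2 + 1^2 - 2 * 0.5 * 1 * 1,  # = 1
  rho_0.0 = 1^2 + 1^2 - 2 * 0.0 * 1 * 1)  # = 2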

Logistic Regression with Binary Predictor

We simulate binary outcome data using the logistic regression model:

\[ \log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 X \]

Where:

  • \(X\) is the binary predictor (e.g., treatment, coded 0 or 1).
  • \(p\) is the probability of success (e.g., disease presence).
  • \(\beta_0\) is the intercept.
  • \(\beta_1\) is the coefficient for the binary predictor (e.g., treatment effect).

To estimate power:

  • Simulate a large population under a logistic model
  • Sample from the population and fit logistic regression
  • Estimate power as the proportion of times the p-value < \(\alpha\)
# Function to estimate power for logistic regression
power_logistic <- function(p0 = 0.3,      # baseline probability when x = 0 (not used directly; implied by b0)
                           b0 = 0,        # intercept (logit scale)
                           b1 = 1,        # effect size (log odds ratio)
                           n = 40,        # sample size per simulation
                           alpha = 0.05,  # significance level
                           nsim = 1000) { # number of simulations
  
  # Create a large synthetic population
  popsize <- 1e6
  pop <- data.frame(x = rep(0:1, popsize / 2))
  
  # Compute log-odds and convert to probabilities
  pop$logodds <- b0 + b1 * pop$x
  pop$p <- exp(pop$logodds)/(1+exp(pop$logodds)) # logistic transformation
  
  # Simulate binary outcome from Bernoulli trials
  pop$y <- rbinom(popsize, size = 1, prob = pop$p)
  
  # Print average probabilities for x = 0 and x = 1
  cat("Mean p for x=0:", mean(pop$p[pop$x == 0]), "\n")
  cat("Mean p for x=1:", mean(pop$p[pop$x == 1]), "\n")
  
  # Initialize storage for results
  reject <- numeric(nsim)
  b1_sim <- numeric(nsim)
  
  # Simulation loop
  for (i in 1:nsim) {
    samp <- pop[sample(1:popsize, size = n), ]  # draw sample
    fit <- glm(y ~ x, data = samp, family = binomial)  # fit model
    b1_sim[i] <- coef(fit)["x"]  # store estimated coefficient
    # store whether null was rejected
    reject[i] <- summary(fit)$coefficients["x", "Pr(>|z|)"] < alpha
  }
  
  # Compute and print estimated power and average coefficient
  power <- mean(reject)
  cat("Estimated power:", round(power, 4), "\n")
  cat("Average estimated b1:", round(mean(b1_sim), 3), "\n")
}

# Run the power simulation
power_logistic(nsim = 1000)
Mean p for x=0: 0.5 
Mean p for x=1: 0.7310586 
Estimated power: 0.31 
Average estimated b1: 1.161 
power_logistic(nsim = 1000, n = 100)
Mean p for x=0: 0.5 
Mean p for x=1: 0.7310586 
Estimated power: 0.638 
Average estimated b1: 1.011 
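
These printed probabilities follow directly from the model: with \(\beta_0 = 0\) and \(\beta_1 = 1\), the success probability is 0.5 when \(x = 0\) and about 0.73 when \(x = 1\), corresponding to an odds ratio of \(e^{1} \approx 2.72\):

# Probabilities and odds ratio implied by b0 = 0, b1 = 1
plogis(0)      # p when x = 0: 0.5
plogis(0 + 1)  # p when x = 1: about 0.731
exp(1)         # odds ratio: about 2.72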

Poisson Regression Simulation

We simulate count data using a Poisson regression model:

\[ \log(\lambda_i) = \beta_0 + \beta_1 X_i \]

Where:

  • \(X_i\): binary predictor (e.g., control vs. treatment)
  • \(\lambda_i\): expected count for individual \(i\)
  • \(\beta_0\): log mean for control group
  • \(\beta_1\): log rate ratio between groups
power_poisson <- function(b0 = 0, b1 = 1,
                          n = 40, alpha = 0.05, nsim = 1000) {
  
  # Step 1: Create a large synthetic population
  popsize <- 1e6
  pop <- data.frame(x = rep(0:1, popsize / 2))  # Binary predictor (0 or 1)
  
  # Step 2: Define expected counts (lambda) based on Poisson regression model
  pop$lambda <- exp(b0 + b1 * pop$x)
  
  # Step 3: Generate Poisson-distributed outcomes for the entire population
  pop$y <- rpois(popsize, lambda = pop$lambda)
  
  # Print average counts by group (to show expected difference)
  cat("Mean y for x = 0:", mean(pop$y[pop$x == 0]), "\n")
  cat("Mean y for x = 1:", mean(pop$y[pop$x == 1]), "\n")
  
  # Step 4: Initialize vectors to track results
  reject <- rep(0, nsim)   # 1 if null hypothesis is rejected
  b1_sim <- rep(0, nsim)   # Stores estimated b1 from each simulation
  
  # Step 5: Run simulations
  for (i in 1:nsim) {
    # Sample n individuals from the population
    samp <- pop[sample(1:popsize, size = n), ]
    
    # Fit Poisson regression model to sampled data
    m <- glm(y ~ x, data = samp, family = "poisson")
    
    # Store estimated coefficient for x
    b1_sim[i] <- coef(m)["x"]
    
    # Check if p-value for x is less than alpha
    reject[i] <- summary(m)$coefficients["x", 4] < alpha
  }
  
  # Step 6: Estimate power as proportion of simulations with significant result
  power <- sum(reject) / nsim
  cat("Power is", power, "\n")
  
  # Also report average estimated effect size (b1)
  cat("Mean b1 is", mean(b1_sim), "\n")
}

# Run the function with default parameters
power_poisson(nsim = 1000)
Mean y for x = 0: 0.998118 
Mean y for x = 1: 2.721098 
Power is 0.983 
Mean b1 is 1.030355 
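
The printed group means agree with the model: with \(\beta_0 = 0\) and \(\beta_1 = 1\), the expected count is \(e^{0} = 1\) in the control group and \(e^{1} \approx 2.72\) in the treatment group, i.e., a rate ratio of \(e^{\beta_1}\):

# Expected counts implied by b0 = 0, b1 = 1
exp(0)      # control group mean: 1
exp(0 + 1)  # treatment group mean: about 2.72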

Summary

  • These examples show how to simulate power for GLMs (logistic and Poisson).
  • Flexibility of simulation allows control over predictors, link functions, and data-generating processes.
  • Can be extended to other models (e.g., mixed models, survival analysis).

Reference

  • Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences (2nd ed.)