Goal of this workshop

  • Tables are the main way we present statistical results in papers, reports, and dissertations.

  • Journals expect tables that are clear, properly labeled, and reproducible.

  • Manually formatted tables are time-consuming to update and easy to get wrong.

  • In this workshop, we’ll focus on creating publication-ready tables directly from R, so formatting and numbers stay in sync with your analysis.

  • We will introduce packages such as kableExtra, flextable, gt, gtExtras, DT, sjPlot, and gtsummary as examples of reproducible table creation in R.

From copy-paste to reproducible tables

A common workflow is:

  • run a model in R

  • copy estimates and p-values into Word or PowerPoint

  • adjust the table by hand

This is slow and error-prone. Any change in the code or data requires editing the table again.

A code-based workflow (for example using R Markdown):

  • keeps code, results, and text connected

  • generates tables directly from fitted models and data

  • updates tables automatically when the analysis changes

  • makes it easier to reproduce and share the results later

What is a statistical table?

A statistical table is a structured arrangement of data in rows and columns that helps readers see patterns and compare results.

Key components:

  • Title – briefly describes what the table shows and in what context

  • Rows – groups, categories, or observations

  • Columns – variables or summary measures

  • Cells – the numerical values or estimates

  • Headings – labels for rows and columns

  • Footnotes – details such as sample size, model type, or abbreviations

The following table is an example of a well-formatted statistical table created using the flextable package in R. It includes a title, clear column headings, and footnotes for additional context.

Table 1. Annual Population and Growth Rate, 2020–2024

Year    Population (millions)    Growth Rate (%)
2020    3.98                     1.2
2021    4.02                     1.0
2022    4.05                     0.8
2023    4.10                     1.2
2024    4.12                     0.5

Population estimates based on mid-year counts.

‡ Source: National Statistics Department (2024).

Basic tables and output in R

  • For simple tabular results in R, you can put values into a small data.frame and print it.

  • The print() function is the most common way to display output in the console.

  • The summary() function provides a quick overview of the main features of an object.

  • Both print() and summary() are generic functions. They behave differently for different R objects (for example, data frames, model objects, and factors).
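To see this dispatch in action, here is a minimal sketch. The objects x_num, x_fac, and fit are made up for illustration and are not part of the workshop data. The same call, summary(), returns a different kind of result depending on the class of its input:

```r
# Illustrative objects (not part of the workshop data)
x_num <- c(2.5, 3.1, 4.8, 5.0)                      # numeric vector
x_fac <- factor(c("low", "high", "low", "middle"))  # factor
fit   <- lm(mpg ~ wt, data = mtcars)                # linear model

# summary() dispatches on the class of its argument
summary(x_num)        # quartiles, minimum, maximum, and mean
summary(x_fac)        # counts per level
class(summary(fit))   # "summary.lm", which has its own print method
```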

Example: Summary statistics of the mtcars dataset

# Load the mtcars dataset
data(mtcars)

# First 6 rows
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
# Create a data frame with the mean and SD of mpg in mtcars
tab <- data.frame(
  Statistic = c("Mean", "SD"),
  Value     = c(mean(mtcars$mpg), sd(mtcars$mpg))
)

print(tab)      # print using print function (explicitly)
##   Statistic     Value
## 1      Mean 20.090625
## 2        SD  6.026948
# tab           # or rely on implicit printing

summary(mtcars[,1:4]) # summary of first 4 columns of mtcars
##       mpg             cyl             disp             hp       
##  Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0  
##  1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5  
##  Median :19.20   Median :6.000   Median :196.3   Median :123.0  
##  Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7  
##  3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0  
##  Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0

hsbdemo dataset

The hsbdemo dataset is a sample of high school performance data for 200 students.

The first step in any statistical analysis is to understand the data.

Note: The datasets used in this workshop are not real. They are only for demonstrating statistical analysis.

# Read the data
hsb <- read.csv("https://stats.idre.ucla.edu/stat/data/hsbdemo.csv")
# Variable names
names(hsb)
##  [1] "id"      "female"  "ses"     "schtyp"  "prog"    "read"    "write"  
##  [8] "math"    "science" "socst"   "honors"  "awards"  "cid"
# Structure of the data frame
str(hsb)
## 'data.frame':    200 obs. of  13 variables:
##  $ id     : int  45 108 15 67 153 51 164 133 2 53 ...
##  $ female : chr  "female" "male" "male" "male" ...
##  $ ses    : chr  "low" "middle" "high" "low" ...
##  $ schtyp : chr  "public" "public" "public" "public" ...
##  $ prog   : chr  "vocation" "general" "vocation" "vocation" ...
##  $ read   : int  34 34 39 37 39 42 31 50 39 34 ...
##  $ write  : int  35 33 39 37 31 36 36 31 41 37 ...
##  $ math   : int  41 41 44 42 40 42 46 40 33 46 ...
##  $ science: int  29 36 26 33 39 31 39 34 42 39 ...
##  $ socst  : int  26 36 42 32 51 39 46 31 41 31 ...
##  $ honors : chr  "not enrolled" "not enrolled" "not enrolled" "not enrolled" ...
##  $ awards : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ cid    : int  1 1 1 1 1 1 1 1 1 1 ...
# First 6 rows
head(hsb)
##    id female    ses schtyp     prog read write math science socst       honors
## 1  45 female    low public vocation   34    35   41      29    26 not enrolled
## 2 108   male middle public  general   34    33   41      36    36 not enrolled
## 3  15   male   high public vocation   39    39   44      26    42 not enrolled
## 4  67   male    low public vocation   37    37   42      33    32 not enrolled
## 5 153   male middle public vocation   39    31   40      39    51 not enrolled
## 6  51 female   high public  general   42    36   42      31    39 not enrolled
##   awards cid
## 1      0   1
## 2      0   1
## 3      0   1
## 4      0   1
## 5      0   1
## 6      0   1

Printing the first 6 rows of the data gives a simple table of the first 6 observations.

We can use the summary() function to report summary statistics. For example, we can get summary statistics for the students' scores and for honors.

#Summary statistics
summary(hsb[c("read", "write", "math", "science", "socst", "honors")])
##       read           write            math          science     
##  Min.   :28.00   Min.   :31.00   Min.   :33.00   Min.   :26.00  
##  1st Qu.:44.00   1st Qu.:45.75   1st Qu.:45.00   1st Qu.:44.00  
##  Median :50.00   Median :54.00   Median :52.00   Median :53.00  
##  Mean   :52.23   Mean   :52.77   Mean   :52.65   Mean   :51.85  
##  3rd Qu.:60.00   3rd Qu.:60.00   3rd Qu.:59.00   3rd Qu.:58.00  
##  Max.   :76.00   Max.   :67.00   Max.   :75.00   Max.   :74.00  
##      socst          honors         
##  Min.   :26.00   Length:200        
##  1st Qu.:46.00   Class :character  
##  Median :52.00   Mode  :character  
##  Mean   :52.41                     
##  3rd Qu.:61.00                     
##  Max.   :71.00

Which variables look categorical?
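One way to answer this question in code is to look for character columns, which summary() reports only as Length/Class/Mode. The sketch below uses a small made-up data frame (toy) as a stand-in for hsb so that it runs without downloading anything. The column values are invented.

```r
# Hypothetical stand-in for hsb with the same kinds of columns
toy <- data.frame(
  id     = 1:6,
  ses    = c("low", "middle", "high", "low", "middle", "middle"),
  honors = c("not enrolled", "enrolled", "enrolled",
             "not enrolled", "not enrolled", "enrolled"),
  read   = c(34L, 47L, 63L, 39L, 50L, 57L),
  stringsAsFactors = FALSE
)

# Character columns are the usual candidates for categorical variables
is_char  <- vapply(toy, is.character, logical(1))
cat_vars <- names(toy)[is_char]
cat_vars

# Frequency of each value, one table per candidate variable
lapply(toy[cat_vars], table)
```

With the workshop data you would replace toy with hsb. Note that this check misses categorical variables stored as integer codes, so it is a first pass rather than a complete answer.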

Contingency tables with table() and xtabs()

The table() function in base R creates frequency tables that summarize categorical data.

Some variables in hsbdemo are categorical.
We convert them to factors before creating tables.

In our data, we want a cross-tabulation (contingency table) for the variables ses and honors.

# Convert categorical variables to factors
hsb <- within(hsb, {
  female <- factor(female)
  ses    <- factor(ses)
  schtyp <- factor(schtyp)
  prog   <- factor(prog)
  honors <- factor(honors)
})

# Two-way contingency table
tab1 <- table(hsb$ses, hsb$honors)
tab1
##         
##          enrolled not enrolled
##   high         26           32
##   low          11           36
##   middle       16           79
# Proportional table
prop.table(tab1)
##         
##          enrolled not enrolled
##   high      0.130        0.160
##   low       0.055        0.180
##   middle    0.080        0.395
# Proportional table by row
prop.table(tab1, margin = 1)
##         
##           enrolled not enrolled
##   high   0.4482759    0.5517241
##   low    0.2340426    0.7659574
##   middle 0.1684211    0.8315789

We can also use the xtabs() function from the stats package.

#Two-way contingency table using xtabs()
tab2 <- xtabs(~ ses + honors, data = hsb)
tab2
##         honors
## ses      enrolled not enrolled
##   high         26           32
##   low          11           36
##   middle       16           79
#Proportional table
prop.table(tab2)
##         honors
## ses      enrolled not enrolled
##   high      0.130        0.160
##   low       0.055        0.180
##   middle    0.080        0.395
#Proportional table by row
prop.table(tab2, margin = 1)
##         honors
## ses       enrolled not enrolled
##   high   0.4482759    0.5517241
##   low    0.2340426    0.7659574
##   middle 0.1684211    0.8315789
#Summary on a table object will perform a chi-squared test
summary(tab2)
## Call: xtabs(formula = ~ses + honors, data = hsb)
## Number of cases in table: 200 
## Number of factors: 2 
## Test for independence of all factors:
##  Chisq = 14.783, df = 2, p-value = 0.0006164
#Three-way cross-tabulation
tab3 <- xtabs(~ ses + honors + female, data = hsb)

tab3
## , , female = female
## 
##         honors
## ses      enrolled not enrolled
##   high         15           14
##   low          10           22
##   middle       10           38
## 
## , , female = male
## 
##         honors
## ses      enrolled not enrolled
##   high         11           18
##   low           1           14
##   middle        6           41
ftable(tab3)
##                     female female male
## ses    honors                         
## high   enrolled                15   11
##        not enrolled            14   18
## low    enrolled                10    1
##        not enrolled            22   14
## middle enrolled                10    6
##        not enrolled            38   41

Regression models

Regression models are often used to understand the relationship between a dependent variable and one or more independent variables.

In R, we use the summary() function to extract and report results from a regression model.

As an example, we use the hsb data to regress math score on read and write scores and prog.

# Run regression of math on read, write, and prog
m1 <- lm(math ~ read + write + prog, data = hsb)
lm.result <- summary(m1)
lm.result
## 
## Call:
## lm(formula = math ~ read + write + prog, data = hsb)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -19.257  -4.564  -0.211   4.271  17.527 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  19.20202    3.35561   5.722 3.91e-08 ***
## read          0.37186    0.05685   6.541 5.24e-10 ***
## write         0.29591    0.06149   4.812 2.98e-06 ***
## proggeneral  -2.87185    1.18968  -2.414  0.01670 *  
## progvocation -3.79862    1.23942  -3.065  0.00249 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.408 on 195 degrees of freedom
## Multiple R-squared:  0.5415, Adjusted R-squared:  0.5321 
## F-statistic: 57.57 on 4 and 195 DF,  p-value: < 2.2e-16
# Extracting coefficients table (it is a matrix)
lm.result$coefficients
##                Estimate Std. Error   t value     Pr(>|t|)
## (Intercept)  19.2020151 3.35561280  5.722357 3.911030e-08
## read          0.3718589 0.05684928  6.541138 5.240877e-10
## write         0.2959093 0.06149161  4.812190 2.984266e-06
## proggeneral  -2.8718518 1.18968055 -2.413969 1.670404e-02
## progvocation -3.7986171 1.23941526 -3.064846 2.486293e-03
# Adding confidence interval for coefficients
lm.table <- cbind(lm.result$coefficients, confint(m1))

# Changing the names of columns
colnames(lm.table)[c(5,6)] <- c("LL", "UL")

# Round numbers to 4 digits and print
round(lm.table, 4)
##              Estimate Std. Error t value Pr(>|t|)      LL      UL
## (Intercept)   19.2020     3.3556  5.7224   0.0000 12.5841 25.8200
## read           0.3719     0.0568  6.5411   0.0000  0.2597  0.4840
## write          0.2959     0.0615  4.8122   0.0000  0.1746  0.4172
## proggeneral   -2.8719     1.1897 -2.4140   0.0167 -5.2181 -0.5256
## progvocation  -3.7986     1.2394 -3.0648   0.0025 -6.2430 -1.3542
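Note that rounding turns very small p-values into 0.0000, which can be misleading in a published table. One option, sketched below with the built-in mtcars data (the model and the name reg_tab are only for illustration), is to collect the results in a data frame and format the p-values with format.pval():

```r
# Illustrative model on a built-in dataset
fit   <- lm(mpg ~ wt + hp, data = mtcars)
coefs <- summary(fit)$coefficients   # matrix of estimates, SEs, t, p
ci    <- confint(fit)                # 95% confidence limits

# A data frame can mix numeric columns with formatted text columns
reg_tab <- data.frame(
  Term     = rownames(coefs),
  Estimate = round(coefs[, "Estimate"], 3),
  SE       = round(coefs[, "Std. Error"], 3),
  LL       = round(ci[, 1], 3),
  UL       = round(ci[, 2], 3),
  # p-values below eps are shown as "less than eps" instead of 0
  p        = format.pval(coefs[, "Pr(>|t|)"], digits = 3, eps = 0.001),
  row.names = NULL
)

print(reg_tab)
```

Because the p column is now text, this step belongs at the very end, after any calculations that need the numeric values.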

Advanced Tables in R Using R Packages within R Markdown

Advantages of using R Markdown to create tables

  • Reproducibility
    Embedding code directly in the document ensures that tables can be reproduced easily and reduces errors from manual copying.

  • Dynamic updates
    Changes to the data or analysis are automatically reflected in the tables when the document is re-rendered.

  • Automation
    R Markdown generates and formats tables automatically, avoiding repetitive tasks such as reformatting or retyping results.

  • Consistency
    Tables maintain a consistent format and style throughout the document, which is especially useful for longer reports.

  • Customization and formatting
    Table packages in R allow advanced customization and professional presentation, without relying on external tools for formatting.

In summary, using R Markdown to create tables supports automation, reproducibility, and consistency, while also providing more control over formatting than simple copy–paste from the console.

In the rest of the workshop, we introduce some of these packages through examples.

knitr::kable and kableExtra

The kable() function in the knitr package is a simple table generator for rectangular data (matrices and data frames).
It creates basic tables in formats such as HTML, LaTeX, and Pandoc pipe tables, and is often the starting point before adding styling with kableExtra.

Advantages of knitr::kable and kableExtra

  • Simplicity
    knitr::kable provides a straightforward way to create clean tables with minimal code. It is easy to use for simple tables that do not require extensive customization.

  • Integration with R Markdown
    kable() is designed to work seamlessly with R Markdown, making it easy to generate tables that fit well within dynamic documents.

  • Flexibility with kableExtra
    When paired with kableExtra, kable becomes highly customizable. You can add features such as multi-row headers, colors, borders, alignment adjustments, column spanning, custom styling, and footnotes.

  • Themes
    kableExtra offers several alternative HTML table themes beyond the default Bootstrap theme.

Because kable() handles only strictly rectangular data, its interface is small. The main arguments for customizing the appearance of a table are shown in its usage:

kable(x, format, digits = getOption("digits"), row.names = NA, col.names = NA, align, caption = NULL, label = NULL, format.args = list(), escape = TRUE, ...)

format is a character string. Possible values include "latex", "html", "pipe" (Pandoc’s pipe tables), and others.

If you only need one table format that is not the default format for a document, you can set the global R option knitr.table.format, for example:

options(knitr.table.format = "html")
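As a small illustration of these arguments, the sketch below builds a pipe table from the mean and SD of mpg and sets digits, col.names, and align. The object names tab and out are our own choices. The pipe format is plain text, so the result can be inspected directly in the console.

```r
library(knitr)

# Small summary table, as in the earlier mtcars example
tab <- data.frame(
  Statistic = c("Mean", "SD"),
  Value     = c(mean(mtcars$mpg), sd(mtcars$mpg))
)

out <- kable(
  tab,
  format    = "pipe",                  # Pandoc pipe table (plain text)
  digits    = 2,                       # round numeric columns to 2 decimals
  col.names = c("Statistic", "mpg"),   # display names for the columns
  align     = c("l", "r")              # left-align text, right-align numbers
)

out
```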

In the examples below, we use the state.x77 dataset to make a table of the variables recorded for the first seven states (population, income, illiteracy, and so on).

state7 <- data.frame(state.x77)[1:7, ]

knitr::kable(state7, format = "html",
             caption = "Table 1. Population, Income, and Literacy for First Seven States")
Table 1. Population, Income, and Literacy for First Seven States
Population Income Illiteracy Life.Exp Murder HS.Grad Frost Area
Alabama 3615 3624 2.1 69.05 15.1 41.3 20 50708
Alaska 365 6315 1.5 69.31 11.3 66.7 152 566432
Arizona 2212 4530 1.8 70.55 7.8 58.1 15 113417
Arkansas 2110 3378 1.9 70.66 10.1 39.9 65 51945
California 21198 5114 1.1 71.71 10.3 62.6 20 156361
Colorado 2541 4884 0.7 72.06 6.8 63.9 166 103766
Connecticut 3100 5348 1.1 72.48 3.1 56.0 139 4862

With format = "pipe", kable() produces a Pandoc pipe table. How it is rendered depends on the R Markdown output format.

For example, these slides use ioslides, so if we do not specify format, or if we use format = "pipe", the output table looks like this:

my.table <- knitr::kable(state7, 
             caption = "Table 1. Population, Income, and Literacy for First Seven States")
my.table
Table 1. Population, Income, and Literacy for First Seven States
Population Income Illiteracy Life.Exp Murder HS.Grad Frost Area
Alabama 3615 3624 2.1 69.05 15.1 41.3 20 50708
Alaska 365 6315 1.5 69.31 11.3 66.7 152 566432
Arizona 2212 4530 1.8 70.55 7.8 58.1 15 113417
Arkansas 2110 3378 1.9 70.66 10.1 39.9 65 51945
California 21198 5114 1.1 71.71 10.3 62.6 20 156361
Colorado 2541 4884 0.7 72.06 6.8 63.9 166 103766
Connecticut 3100 5348 1.1 72.48 3.1 56.0 139 4862

To learn more about knitr::kable() and its options, you can check out the link below:

rmarkdown-cookbook, 10.1 The function knitr::kable()

kableExtra

The kableExtra package extends knitr::kable(). Its goal is to help you build common complex tables and manipulate table styles.
It imports the pipe %>% from magrittr (and also works with the base R pipe |>) and provides a set of functions that can be added in layers to a kable output, in a way similar to how layers are added in ggplot2.

The basic HTML output from kbl() is just a plain HTML table without any styling.

library(kableExtra)

# plain HTML
kbl(state7,
    caption = "Table 1. Population, Income, and Literacy for First Seven States")
Table 1. Population, Income, and Literacy for First Seven States
Population Income Illiteracy Life.Exp Murder HS.Grad Frost Area
Alabama 3615 3624 2.1 69.05 15.1 41.3 20 50708
Alaska 365 6315 1.5 69.31 11.3 66.7 152 566432
Arizona 2212 4530 1.8 70.55 7.8 58.1 15 113417
Arkansas 2110 3378 1.9 70.66 10.1 39.9 65 51945
California 21198 5114 1.1 71.71 10.3 62.6 20 156361
Colorado 2541 4884 0.7 72.06 6.8 63.9 166 103766
Connecticut 3100 5348 1.1 72.48 3.1 56.0 139 4862

Bootstrap theme

kable_styling() will automatically apply a Bootstrap theme to the table.

To see more options for this function, check the help file:

?kable_styling

state7 %>%
  kbl(caption = "Table 1. Population, Income, and Literacy for First Seven States") %>%
  # Bootstrap theme
  kable_styling()
Table 1. Population, Income, and Literacy for First Seven States
Population Income Illiteracy Life.Exp Murder HS.Grad Frost Area
Alabama 3615 3624 2.1 69.05 15.1 41.3 20 50708
Alaska 365 6315 1.5 69.31 11.3 66.7 152 566432
Arizona 2212 4530 1.8 70.55 7.8 58.1 15 113417
Arkansas 2110 3378 1.9 70.66 10.1 39.9 65 51945
California 21198 5114 1.1 71.71 10.3 62.6 20 156361
Colorado 2541 4884 0.7 72.06 6.8 63.9 166 103766
Connecticut 3100 5348 1.1 72.48 3.1 56.0 139 4862

Alternative themes

kableExtra also offers six alternative HTML table themes besides the default Bootstrap theme:
kable_paper, kable_classic, kable_classic_2, kable_minimal, kable_material, and kable_material_dark.

We can also use options in kable_styling() and theme functions to customize the output table.

Here are some examples:

state7 %>%
  kbl(caption = "Table 1. Population, Income, and Literacy for First Seven States") %>%
  # paper theme with hover and full_width = FALSE
  kable_paper("hover", full_width = FALSE)
Table 1. Population, Income, and Literacy for First Seven States
Population Income Illiteracy Life.Exp Murder HS.Grad Frost Area
Alabama 3615 3624 2.1 69.05 15.1 41.3 20 50708
Alaska 365 6315 1.5 69.31 11.3 66.7 152 566432
Arizona 2212 4530 1.8 70.55 7.8 58.1 15 113417
Arkansas 2110 3378 1.9 70.66 10.1 39.9 65 51945
California 21198 5114 1.1 71.71 10.3 62.6 20 156361
Colorado 2541 4884 0.7 72.06 6.8 63.9 166 103766
Connecticut 3100 5348 1.1 72.48 3.1 56.0 139 4862

Full width

state7 %>%
  kbl(caption = "Recreating booktabs-style table") %>%
  # classic theme and other options
  kable_classic(full_width = FALSE, html_font = "Cambria", position = "left")
Recreating booktabs-style table
Population Income Illiteracy Life.Exp Murder HS.Grad Frost Area
Alabama 3615 3624 2.1 69.05 15.1 41.3 20 50708
Alaska 365 6315 1.5 69.31 11.3 66.7 152 566432
Arizona 2212 4530 1.8 70.55 7.8 58.1 15 113417
Arkansas 2110 3378 1.9 70.66 10.1 39.9 65 51945
California 21198 5114 1.1 71.71 10.3 62.6 20 156361
Colorado 2541 4884 0.7 72.06 6.8 63.9 166 103766
Connecticut 3100 5348 1.1 72.48 3.1 56.0 139 4862

Striped rows

state7 %>%
  kbl(caption = "material theme with striped rows") %>%
  # material theme with striped rows
  kable_material(lightable_options = c("striped"))
material theme with striped rows
Population Income Illiteracy Life.Exp Murder HS.Grad Frost Area
Alabama 3615 3624 2.1 69.05 15.1 41.3 20 50708
Alaska 365 6315 1.5 69.31 11.3 66.7 152 566432
Arizona 2212 4530 1.8 70.55 7.8 58.1 15 113417
Arkansas 2110 3378 1.9 70.66 10.1 39.9 65 51945
California 21198 5114 1.1 71.71 10.3 62.6 20 156361
Colorado 2541 4884 0.7 72.06 6.8 63.9 166 103766
Connecticut 3100 5348 1.1 72.48 3.1 56.0 139 4862

Column / Row Specification

kbl(state7) %>%
  # paper theme 
  kable_paper(full_width = FALSE) %>%
  # make first column bold and add right border
  column_spec(1, bold = TRUE, border_right = TRUE) %>%
  # make column 9 wider with yellow background
  column_spec(9, width = "6em", background = "yellow")
Population Income Illiteracy Life.Exp Murder HS.Grad Frost Area
Alabama 3615 3624 2.1 69.05 15.1 41.3 20 50708
Alaska 365 6315 1.5 69.31 11.3 66.7 152 566432
Arizona 2212 4530 1.8 70.55 7.8 58.1 15 113417
Arkansas 2110 3378 1.9 70.66 10.1 39.9 65 51945
California 21198 5114 1.1 71.71 10.3 62.6 20 156361
Colorado 2541 4884 0.7 72.06 6.8 63.9 166 103766
Connecticut 3100 5348 1.1 72.48 3.1 56.0 139 4862

Conditional formatting

kbl(state7) %>%
  # paper theme 
  kable_paper(full_width = FALSE) %>%
  # conditional text color in column 2
  column_spec(2, color = spec_color(state7$Population, palette = c("black", "red"))) %>%
  # conditional background in column 4 with white text
  column_spec(4, color = "white",
              background = spec_color(state7$Illiteracy <= 1.5, palette = c("red", "green"))) %>%
  # rotate the header row (row 0) by -45 degrees
  row_spec(0, angle = -45)
Population Income Illiteracy Life.Exp Murder HS.Grad Frost Area
Alabama 3615 3624 2.1 69.05 15.1 41.3 20 50708
Alaska 365 6315 1.5 69.31 11.3 66.7 152 566432
Arizona 2212 4530 1.8 70.55 7.8 58.1 15 113417
Arkansas 2110 3378 1.9 70.66 10.1 39.9 65 51945
California 21198 5114 1.1 71.71 10.3 62.6 20 156361
Colorado 2541 4884 0.7 72.06 6.8 63.9 166 103766
Connecticut 3100 5348 1.1 72.48 3.1 56.0 139 4862
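Grouped column headers and footnotes were listed among the kableExtra features but have not been shown yet. The sketch below adds both with add_header_above() and footnote(). The group labels ("Demographics", "Other indicators") and the caption are our own choices for illustration, not part of the dataset.

```r
library(kableExtra)

state7 <- data.frame(state.x77)[1:7, ]

tbl <- kbl(state7, format = "html",
           caption = "State indicators with grouped column headers") |>
  kable_classic(full_width = FALSE) |>
  # 1 row-name column + 4 + 4 data columns = 9 columns in total
  add_header_above(c(" " = 1, "Demographics" = 4, "Other indicators" = 4)) |>
  # general footnote placed under the table
  footnote(general = "Data: state.x77 from the datasets package.")

# Leave tbl as the last line of an R Markdown chunk to display the table
```

The widths given to add_header_above() must add up to the number of columns in the table, including the row-name column.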

One useful feature of kableExtra (available for HTML output) is scroll_box().
If you have a large table and want to include it in a website or HTML document without using a lot of space, adding a scroll box is a good option.

kbl(state.x77) %>%
  kable_paper() %>%
  # add scroll bar
  scroll_box(width = "400px", height = "200px")
Population Income Illiteracy Life Exp Murder HS Grad Frost Area
Alabama 3615 3624 2.1 69.05 15.1 41.3 20 50708
Alaska 365 6315 1.5 69.31 11.3 66.7 152 566432
Arizona 2212 4530 1.8 70.55 7.8 58.1 15 113417
Arkansas 2110 3378 1.9 70.66 10.1 39.9 65 51945
California 21198 5114 1.1 71.71 10.3 62.6 20 156361
Colorado 2541 4884 0.7 72.06 6.8 63.9 166 103766
Connecticut 3100 5348 1.1 72.48 3.1 56.0 139 4862
Delaware 579 4809 0.9 70.06 6.2 54.6 103 1982
Florida 8277 4815 1.3 70.66 10.7 52.6 11 54090
Georgia 4931 4091 2.0 68.54 13.9 40.6 60 58073
Hawaii 868 4963 1.9 73.60 6.2 61.9 0 6425
Idaho 813 4119 0.6 71.87 5.3 59.5 126 82677
Illinois 11197 5107 0.9 70.14 10.3 52.6 127 55748
Indiana 5313 4458 0.7 70.88 7.1 52.9 122 36097
Iowa 2861 4628 0.5 72.56 2.3 59.0 140 55941
Kansas 2280 4669 0.6 72.58 4.5 59.9 114 81787
Kentucky 3387 3712 1.6 70.10 10.6 38.5 95 39650
Louisiana 3806 3545 2.8 68.76 13.2 42.2 12 44930
Maine 1058 3694 0.7 70.39 2.7 54.7 161 30920
Maryland 4122 5299 0.9 70.22 8.5 52.3 101 9891
Massachusetts 5814 4755 1.1 71.83 3.3 58.5 103 7826
Michigan 9111 4751 0.9 70.63 11.1 52.8 125 56817
Minnesota 3921 4675 0.6 72.96 2.3 57.6 160 79289
Mississippi 2341 3098 2.4 68.09 12.5 41.0 50 47296
Missouri 4767 4254 0.8 70.69 9.3 48.8 108 68995
Montana 746 4347 0.6 70.56 5.0 59.2 155 145587
Nebraska 1544 4508 0.6 72.60 2.9 59.3 139 76483
Nevada 590 5149 0.5 69.03 11.5 65.2 188 109889
New Hampshire 812 4281 0.7 71.23 3.3 57.6 174 9027
New Jersey 7333 5237 1.1 70.93 5.2 52.5 115 7521
New Mexico 1144 3601 2.2 70.32 9.7 55.2 120 121412
New York 18076 4903 1.4 70.55 10.9 52.7 82 47831
North Carolina 5441 3875 1.8 69.21 11.1 38.5 80 48798
North Dakota 637 5087 0.8 72.78 1.4 50.3 186 69273
Ohio 10735 4561 0.8 70.82 7.4 53.2 124 40975
Oklahoma 2715 3983 1.1 71.42 6.4 51.6 82 68782
Oregon 2284 4660 0.6 72.13 4.2 60.0 44 96184
Pennsylvania 11860 4449 1.0 70.43 6.1 50.2 126 44966
Rhode Island 931 4558 1.3 71.90 2.4 46.4 127 1049
South Carolina 2816 3635 2.3 67.96 11.6 37.8 65 30225
South Dakota 681 4167 0.5 72.08 1.7 53.3 172 75955
Tennessee 4173 3821 1.7 70.11 11.0 41.8 70 41328
Texas 12237 4188 2.2 70.90 12.2 47.4 35 262134
Utah 1203 4022 0.6 72.90 4.5 67.3 137 82096
Vermont 472 3907 0.6 71.64 5.5 57.1 168 9267
Virginia 4981 4701 1.4 70.08 9.5 47.8 85 39780
Washington 3559 4864 0.6 71.72 4.3 63.5 32 66570
West Virginia 1799 3617 1.4 69.48 6.7 41.6 100 24070
Wisconsin 4589 4468 0.7 72.48 3.0 54.5 149 54464
Wyoming 376 4566 0.6 70.29 6.9 62.9 173 97203

To learn more about table styles and the options in kable_styling(), you can check the link below:

Create Awesome HTML Table with knitr::kable and kableExtra

Using the flextable package

flextable is designed to create and format tables that can be easily exported to Word and PowerPoint documents. It allows users to build richly formatted tables with features such as text formatting, colors, borders, and alignment, making it useful for generating professional-looking tables in document reports.

Advantages of flextable compared to other packages

  • Extensive customization
    flextable offers detailed control over the formatting of tables, including text alignment, fonts, colors, borders, and cell-level styling. This level of customization goes beyond what simpler packages like kable can offer.

  • Integration with Word and PowerPoint
    A key feature of flextable is its integration with Microsoft Word and PowerPoint through the officer package. You can export formatted tables directly into these documents, which is ideal if you frequently work in Word or PowerPoint.

  • Conditional formatting
    The package supports conditional formatting based on the values in the table, which is useful for highlighting key data points or making tables more informative visually.

  • Predefined themes
    flextable provides built-in themes that give tables a consistent and professional appearance, while reducing the effort needed to style them.

The main function is flextable(), which takes a data.frame as an argument and returns a flextable object.

library(flextable)

# first 10 rows, dropping column 13 (cid)
ft <- flextable(hsb[1:10, -13])
ft |>
  # add header row
  add_header_row(
    colwidths = c(3, 2, 5, 2),
    values = c("Student", "School", "Grades", "Achievements")
  ) |>
  # apply vanilla theme
  theme_vanilla() |>
  # add footer
  add_footer_lines("This data is simulated and it is not real") |>
  color(part = "footer", color = "#666666") |>
  # set caption
  set_caption(caption = "First 10 rows of a sample of high school data") |>
  # align header to center
  align(align = "center", part = "header", i = 1)
First 10 rows of a sample of high school data

      Student             School                    Grades                    Achievements
 id  female  ses     schtyp  prog      read  write  math  science  socst  honors        awards
 45  female  low     public  vocation    34     35    41       29     26  not enrolled       0
108  male    middle  public  general     34     33    41       36     36  not enrolled       0
 15  male    high    public  vocation    39     39    44       26     42  not enrolled       0
 67  male    low     public  vocation    37     37    42       33     32  not enrolled       0
153  male    middle  public  vocation    39     31    40       39     51  not enrolled       0
 51  female  high    public  general     42     36    42       31     39  not enrolled       0
164  male    middle  public  vocation    31     36    46       39     46  not enrolled       0
133  male    middle  public  vocation    50     31    40       34     31  not enrolled       0
  2  female  middle  public  vocation    39     41    33       42     41  not enrolled       0
 53  male    middle  public  vocation    34     37    46       39     31  not enrolled       0

This data is simulated and it is not real

The flextable package will not aggregate data for you, but it helps you present aggregated data. It also has some useful functions to generate descriptive statistics.
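Since export to Word is one of the main reasons to use flextable, here is a minimal sketch of that step together with a simple conditional format. It uses the built-in mtcars data. The names cars6 and out_file and the cut-off of 21 mpg are arbitrary choices for illustration.

```r
library(flextable)

# flextable() does not show row names, so keep the car names as a column
cars6 <- data.frame(
  car = rownames(mtcars)[1:6],
  mtcars[1:6, c("mpg", "cyl", "hp")],
  row.names = NULL
)

ft <- flextable(cars6) |>
  theme_vanilla() |>
  # conditional formatting: bold red text where mpg is above 21
  bold(i = ~ mpg > 21, j = "mpg") |>
  color(i = ~ mpg > 21, j = "mpg", color = "red") |>
  autofit()

# Write the table to a Word file in a temporary folder
out_file <- file.path(tempdir(), "cars_table.docx")
save_as_docx(ft, path = out_file)
```

For PowerPoint, save_as_pptx() is used in the same way. In an R Markdown document that is knitted to Word, printing ft in a chunk is enough.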

Cross-tabulation with proc_freq()

The function proc_freq() computes a contingency table and creates a flextable from the result. Its goal is to reproduce the output of SAS PROC FREQ.

proc_freq(
  hsb, "ses", "honors",
  include.row_percent    = TRUE,
  include.column_percent = TRUE,
  include.table_percent  = TRUE
)

                        honors
ses                     enrolled        not enrolled    Total
high    Count           26 (13.0%)      32 (16.0%)      58 (29.0%)
        Mar. pct (1)    49.1% ; 44.8%   21.8% ; 55.2%
low     Count           11 (5.5%)       36 (18.0%)      47 (23.5%)
        Mar. pct        20.8% ; 23.4%   24.5% ; 76.6%
middle  Count           16 (8.0%)       79 (39.5%)      95 (47.5%)
        Mar. pct        30.2% ; 16.8%   53.7% ; 83.2%
Total   Count           53 (26.5%)      147 (73.5%)     200 (100.0%)

(1) Columns and rows percentages

There is much more flexibility in the flextable package, especially when used in conjunction with other packages, than we can cover in this workshop.

For more on flextable you can check the links below:

DT

The R package DT provides an interface to the JavaScript library DataTables. R data objects (matrices or data frames) can be displayed as tables on HTML pages, and DataTables provides interactive tables with filtering, pagination, sorting, and many other features.

Key features of the DT package

  • Interactivity
    DT creates interactive tables with features such as sorting and searching, which is ideal for web use and Shiny apps. In contrast, flextable and kableExtra focus on static tables.

  • JavaScript integration
    DT leverages the DataTables library for advanced client-side features such as filtering, pagination, and exporting, making it well suited for web applications.

  • Ease of use for web applications
    DT is easy to use and implement when building HTML reports or Shiny apps.

The main function in this package is datatable(). It creates an HTML widget to display R data objects with DataTables.

library(DT)
library(ggplot2)  # provides the diamonds dataset

datatable(diamonds[1:200, ])

If you are familiar with the DataTables JavaScript table library, you can use the options argument to customize the table.

We can also add a filter argument to datatable() to automatically generate column filters. By default, the filters are not shown (filter = "none"). You can enable these filters with filter = "top" or filter = "bottom".

datatable(
  diamonds[1:200, ],
  filter  = "top",
  options = list(
    pageLength = 5,
    autoWidth  = TRUE
  )
)
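Number formatting can be added after the table is created. The sketch below uses the built-in iris data so that it runs without ggplot2, and applies formatRound() to two columns. The object name w and the choice of columns are ours.

```r
library(DT)

w <- datatable(
  iris[1:20, ],
  rownames = FALSE,              # hide the row-number column
  filter   = "top",              # column filters above the table
  options  = list(pageLength = 5)
)

# Show the two sepal measurements with two decimal places
w <- formatRound(w, columns = c("Sepal.Length", "Sepal.Width"), digits = 2)

# Print w (or leave it as the last line of a chunk) to display the widget
```

The formatting is applied in the browser, so the underlying values that are sorted and filtered stay numeric.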

For more examples and options for the DT package, see:

DT: An R interface to the DataTables library

gt (Grammar of Tables)

The gt package is designed to take data frames and tibbles and turn them into presentation-ready tables, with tools for labels, footnotes, and formatting suitable for reports and publications.

Advantages of the gt package

  • Customization
    gt provides extensive options for customizing table appearance, including fonts, colors, borders, and spacing. This makes it possible to create visually appealing and professionally formatted tables.

  • Easy to use
    The package has a user-friendly syntax that simplifies the creation of complex tables. It is designed to be intuitive and relatively easy to learn.

  • Integration with R Markdown
    gt integrates well with R Markdown, enabling you to include sophisticated tables in dynamic documents. It supports rendering to HTML and other formats commonly used in reports.

  • Publication-ready tables
    gt is designed for generating clean, well-formatted tables suitable for academic papers, reports, and presentations where table aesthetics are important.

The gt package is similar to flextable but targets different output formats: gt renders to HTML, LaTeX, and RTF (recent versions can also write Word documents), while flextable is particularly well suited to Microsoft Word and PowerPoint through the officer package.

Here we run a simple example adapted from the package reference page.

library(dplyr)
library(gt)

# Modify the `airquality` dataset by adding the year
# of the measurements (1973) and limiting to 10 rows
airquality_m <-
  airquality |>
  # add year 1973
  mutate(Year = 1973L) |>
  # select the first 10 rows
  slice(1:10)

# Create a display table using the modified `airquality` dataset
gt_tbl <- gt(airquality_m)

# Print basic gt table
gt_tbl
Ozone Solar.R Wind Temp Month Day Year
41 190 7.4 67 5 1 1973
36 118 8.0 72 5 2 1973
12 149 12.6 74 5 3 1973
18 313 11.5 62 5 4 1973
NA NA 14.3 56 5 5 1973
28 NA 14.9 66 5 6 1973
23 299 8.6 65 5 7 1973
19 99 13.8 59 5 8 1973
8 19 20.1 61 5 9 1973
NA 194 8.6 69 5 10 1973
# Add title, subtitle, and column groups
gt_tbl |>
  tab_header(
    title = "New York Air Quality Measurements",
    subtitle = "Daily measurements in New York City (May 1–10, 1973)"
  ) |>
  tab_spanner(
    label   = "Time",
    columns = c(Year, Month, Day)
  ) |>
  tab_spanner(
    label   = "Measurement",
    columns = c(Ozone, Solar.R, Wind, Temp)
  )

New York Air Quality Measurements
Daily measurements in New York City (May 1–10, 1973)

Measurement                    Time
Ozone  Solar.R  Wind  Temp     Year  Month  Day
41     190      7.4   67       1973  5      1
36     118      8.0   72       1973  5      2
12     149      12.6  74       1973  5      3
18     313      11.5  62       1973  5      4
NA     NA       14.3  56       1973  5      5
28     NA       14.9  66       1973  5      6
23     299      8.6   65       1973  5      7
19     99       13.8  59       1973  5      8
8      19       20.1  61       1973  5      9
NA     194      8.6   69       1973  5      10
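
Besides headers and spanners, gt has functions for the labels, footnotes, and formatting mentioned in the introduction to this section. The sketch below is a self-contained illustration, assuming a reasonably recent gt version (sub_missing() is not available in very old releases); the object names aq and aq_tbl and the placeholder text are our own.

```r
library(gt)

# First 10 rows of the built-in airquality data
aq <- head(airquality, 10)

aq_tbl <- gt(aq) |>
  # relabel a column without changing the underlying data
  cols_label(Solar.R = "Solar radiation") |>
  # show Wind with exactly one decimal place
  fmt_number(columns = Wind, decimals = 1) |>
  # replace NA cells with a readable placeholder
  sub_missing(columns = everything(), missing_text = "n/a") |>
  # attach a footnote to the Ozone column label
  tab_footnote(
    footnote  = "Ozone in parts per billion.",
    locations = cells_column_labels(columns = Ozone)
  )

aq_tbl
```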

Reference for the gt package:

gtExtras

The gtExtras package provides additional functions to extend the gt package, especially when you want to include plots in tables or apply more advanced styling.

Overall, there are four families of functions in gtExtras:

  • Themes
    Seven themes that style almost every element of a gt table, inspired by data journalism–styled tables.

  • Utilities
    Helper functions for aligning and padding numbers, adding Font Awesome icons and images, highlighting, adding dividers, styling by group, creating two-table or two-column layouts, extracting ordered data from gt internals, and generating example datasets.

  • Plotting
    Twelve plotting functions for inline sparklines, win–loss charts, distributions (density/histogram), percentiles, dot + bar plots, bar charts, confidence intervals, and summarizing an entire data frame.

  • Colors
    Three functions for color scales, including a "Hulk" style (purple/green), coloring rows with default palettes from paletteer, and adding a "color box" next to cell values.

gt_tbl |>
  # use NYT-style theme
  gt_theme_nytimes() |>
  # change header title
  tab_header(title = "Table styled like the New York Times") |>
  # apply Hulk color scale to Ozone column (trim gives a tighter range)
  gt_hulk_col_numeric(Ozone, trim = TRUE)
Table styled like the New York Times
Ozone Solar.R Wind Temp Month Day Year
41 190 7.4 67 5 1 1973
36 118 8.0 72 5 2 1973
12 149 12.6 74 5 3 1973
18 313 11.5 62 5 4 1973
NA NA 14.3 56 5 5 1973
28 NA 14.9 66 5 6 1973
23 299 8.6 65 5 7 1973
19 99 13.8 59 5 8 1973
8 19 20.1 61 5 9 1973
NA 194 8.6 69 5 10 1973
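
The plotting family can be illustrated with an inline sparkline. This sketch follows the pattern shown in the gtExtras documentation and uses mtcars rather than the workshop data; the object name spark_tbl is our own. gt_plt_sparkline() expects a list-column, so the raw values are first collected per group.

```r
library(dplyr)
library(gt)
library(gtExtras)

spark_tbl <- mtcars |>
  group_by(cyl) |>
  # one row per group, with the raw mpg values kept in a list-column
  summarize(mpg_data = list(mpg), .groups = "drop") |>
  gt() |>
  # draw each list of values as an inline sparkline
  gt_plt_sparkline(mpg_data)

spark_tbl
```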

For more options and features in the gtExtras package, see:

Summary tables

  • Several R packages can create publication-ready summary tables with minimal effort.

  • These packages provide built-in functions to generate standard summary table formats.

In this part of the workshop, we will look at:

  • table1
  • sjPlot
  • gtsummary

Advantages and disadvantages of summary table packages

  • Advantage: no need to write separate data-preparation code to summarize data or model results.

  • Disadvantage: less flexibility, and it can be harder to customize tables beyond the defaults.

"Table 1" in statistical analysis and the table1 package

In journal articles, especially in epidemiology and health research, the first table ("Table 1") usually presents descriptive statistics of baseline characteristics for the study sample.
This table is often stratified by one or more grouping variables, such as treatment group or outcome status.

The table1 package in R simplifies the creation of such tables.

Key features of the table1 package

  • Descriptive statistics
    Provides means, medians, standard deviations, and proportions for various variables.

  • Stratification
    Allows grouping and stratifying by one or more categorical variables.

  • Customization
    Offers options to customize labels, units, and layout to meet publication standards, but customization is not always straightforward.

  • Easy to use (with limits)
    table1 is easy to use for default tables, but users may find more advanced customization challenging.

  • Converting to other table packages
    The output of table1() can be converted (with some limitations) to data.frame, kableExtra, or flextable using as.data.frame(), t1kable(), and t1flex().

Example: basic table1 table

The data used here are from the boot package (melanoma).
The data consist of measurements on patients with malignant melanoma.

The grouping variable is patient status at the end of the study:
1 = melanoma death, 2 = alive, 3 = non-melanoma death.

library(boot)
library(table1)

melanoma1 <- melanoma

# Change status to factor with labels
melanoma1$status <- factor(
  melanoma1$status,
  levels = c(2, 1, 3),
  labels = c("Alive",           # reference
             "Melanoma death",
             "Non-melanoma death")
)

# Change sex to factor and label
# (in boot::melanoma, sex is coded 0 = female, 1 = male)
melanoma1$sex <- factor(
  melanoma1$sex,
  levels = c(0, 1),
  labels = c("Female", "Male")
)

# Change ulcer to factor and label
melanoma1$ulcer <- factor(
  melanoma1$ulcer,
  labels = c("Absent", "Present")
)

# Basic Table 1
(table1.1 <- table1(~ sex + age + ulcer + thickness | status,
                    data = melanoma1))

                     Alive               Melanoma death      Non-melanoma death  Overall
                     (N=134)             (N=57)              (N=14)              (N=205)
sex
  Female             91 (67.9%)          28 (49.1%)          7 (50.0%)           126 (61.5%)
  Male               43 (32.1%)          29 (50.9%)          7 (50.0%)           79 (38.5%)
age
  Mean (SD)          50.0 (15.9)         55.1 (17.9)         65.3 (10.9)         52.5 (16.7)
  Median [Min, Max]  52.0 [4.00, 84.0]   56.0 [14.0, 95.0]   65.0 [49.0, 86.0]   54.0 [4.00, 95.0]
ulcer
  Absent             92 (68.7%)          16 (28.1%)          7 (50.0%)           115 (56.1%)
  Present            42 (31.3%)          41 (71.9%)          7 (50.0%)           90 (43.9%)
thickness
  Mean (SD)          2.24 (2.33)         4.31 (3.57)         3.72 (3.63)         2.92 (2.96)
  Median [Min, Max]  1.36 [0.100, 12.9]  3.54 [0.320, 17.4]  2.26 [0.160, 12.6]  1.94 [0.100, 17.4]

Improving labels and adding units

We can improve the table by:

  • adding descriptive labels for variables
  • specifying units for continuous variables
  • adding a caption and a footnote
  • labeling the overall column "Total" and placing it on the left

melanoma2 <- melanoma1

# Label variables
label(melanoma2$sex)       <- "Sex"
label(melanoma2$age)       <- "Age"
label(melanoma2$ulcer)     <- "Ulceration"
# use asterisk for footnote
label(melanoma2$thickness) <- "Thickness *"

# Assign units
units(melanoma2$age)       <- "years"
units(melanoma2$thickness) <- "mm"

# Caption and footnote
caption  <- "Descriptive statistics of patient characteristics by status"
footnote <- "* Also known as Breslow thickness"

table1(
  ~ sex + age + ulcer + thickness | status,
  data     = melanoma2,
  overall  = c(left = "Total"),
  caption  = caption,
  footnote = footnote
)

Descriptive statistics of patient characteristics by status

                     Total               Alive               Melanoma death      Non-melanoma death
                     (N=205)             (N=134)             (N=57)              (N=14)
Sex
  Female             126 (61.5%)         91 (67.9%)          28 (49.1%)          7 (50.0%)
  Male               79 (38.5%)          43 (32.1%)          29 (50.9%)          7 (50.0%)
Age (years)
  Mean (SD)          52.5 (16.7)         50.0 (15.9)         55.1 (17.9)         65.3 (10.9)
  Median [Min, Max]  54.0 [4.00, 95.0]   52.0 [4.00, 84.0]   56.0 [14.0, 95.0]   65.0 [49.0, 86.0]
Ulceration
  Absent             115 (56.1%)         92 (68.7%)          16 (28.1%)          7 (50.0%)
  Present            90 (43.9%)          42 (31.3%)          41 (71.9%)          7 (50.0%)
Thickness * (mm)
  Mean (SD)          2.92 (2.96)         2.24 (2.33)         4.31 (3.57)         3.72 (3.63)
  Median [Min, Max]  1.94 [0.100, 17.4]  1.36 [0.100, 12.9]  3.54 [0.320, 17.4]  2.26 [0.160, 12.6]

* Also known as Breslow thickness
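
Beyond labels and units, table1() also accepts custom render functions that control which statistics appear in each cell. The sketch below follows the pattern used in the table1 vignette; the helper name render_mean_sd and the object names dat and tab_custom are hypothetical, and the data are re-created here so the example runs on its own.

```r
library(boot)    # melanoma data
library(table1)

dat <- melanoma
dat$status <- factor(
  dat$status,
  levels = c(2, 1, 3),
  labels = c("Alive", "Melanoma death", "Non-melanoma death")
)

# Hypothetical helper: show continuous variables as "Mean (SD)" only,
# rounded with table1's own rounding helper
render_mean_sd <- function(x) {
  s <- stats.apply.rounding(stats.default(x), digits = 2)
  c("", "Mean (SD)" = sprintf("%s (%s)", s$MEAN, s$SD))
}

tab_custom <- table1(
  ~ age + thickness | status,
  data              = dat,
  render.continuous = render_mean_sd
)

tab_custom
```

The first (empty) element of the returned vector fills the variable's heading row; each named element after it becomes one row under that heading.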

Grouping strata under a common heading

Now we group the two death strata (Melanoma and Non-melanoma) under a common "Death" heading.

# Labels for variables and group header
labels <- list(
  variables = list(
    sex       = "Sex",
    age       = "Age (years)",
    ulcer     = "Ulceration",
    thickness = "Thickness* (mm)"
  ),
  groups = list("", "", "Death")
)

# Remove the word "death" from the levels, since it appears above
levels(melanoma2$status) <- c("Alive", "Melanoma", "Non-melanoma")

# Set up strata (columns) as a list of data frames
strata <- c(list(Total = melanoma2),
            split(melanoma2, melanoma2$status))

# New Table 1 with grouped columns
table1(
  strata,
  labels,
  groupspan = c(1, 1, 2),
  caption   = caption,
  footnote  = footnote
)

Descriptive statistics of patient characteristics by status

                                                             Death
                     Total               Alive               Melanoma            Non-melanoma
                     (N=205)             (N=134)             (N=57)              (N=14)
Sex
  Female             126 (61.5%)         91 (67.9%)          28 (49.1%)          7 (50.0%)
  Male               79 (38.5%)          43 (32.1%)          29 (50.9%)          7 (50.0%)
Age (years)
  Mean (SD)          52.5 (16.7)         50.0 (15.9)         55.1 (17.9)         65.3 (10.9)
  Median [Min, Max]  54.0 [4.00, 95.0]   52.0 [4.00, 84.0]   56.0 [14.0, 95.0]   65.0 [49.0, 86.0]
Ulceration
  Absent             115 (56.1%)         92 (68.7%)          16 (28.1%)          7 (50.0%)
  Present            90 (43.9%)          42 (31.3%)          41 (71.9%)          7 (50.0%)
Thickness* (mm)
  Mean (SD)          2.92 (2.96)         2.24 (2.33)         4.31 (3.57)         3.72 (3.63)
  Median [Min, Max]  1.94 [0.100, 17.4]  1.36 [0.100, 12.9]  3.54 [0.320, 17.4]  2.26 [0.160, 12.6]

* Also known as Breslow thickness

Customizing table1 in this way is powerful but not very straightforward and requires extra work.

Converting to flextable for further customization

One advantage of table1 is the ability to convert its output to other table formats, such as flextable, where formatting and layout may be easier to control.

library(flextable)

# Convert table1.1 to flextable
tab1.flex <- table1.1 |> t1flex()
tab1.flex

 

                     Alive               Melanoma death      Non-melanoma death  Overall
                     (N=134)             (N=57)              (N=14)              (N=205)
sex
  Female             91 (67.9%)          28 (49.1%)          7 (50.0%)           126 (61.5%)
  Male               43 (32.1%)          29 (50.9%)          7 (50.0%)           79 (38.5%)
age
  Mean (SD)          50.0 (15.9)         55.1 (17.9)         65.3 (10.9)         52.5 (16.7)
  Median [Min, Max]  52.0 [4.00, 84.0]   56.0 [14.0, 95.0]   65.0 [49.0, 86.0]   54.0 [4.00, 95.0]
ulcer
  Absent             92 (68.7%)          16 (28.1%)          7 (50.0%)           115 (56.1%)
  Present            42 (31.3%)          41 (71.9%)          7 (50.0%)           90 (43.9%)
thickness
  Mean (SD)          2.24 (2.33)         4.31 (3.57)         3.72 (3.63)         2.92 (2.96)
  Median [Min, Max]  1.36 [0.100, 12.9]  3.54 [0.320, 17.4]  2.26 [0.160, 12.6]  1.94 [0.100, 17.4]

Now we modify tab1.flex directly with flextable functions:

tab1.flex |>
  # add header row
  add_header_row(
    values    = c("", "Death", ""),       # labels for top header row
    colwidths = c(2, 2, 1)                # columns spanned by each header
  ) |>
  # remove the border line under "Death"
  hline(part = "header", i = 1,
        border = officer::fp_border(width = 0)) |>
  # add line over Melanoma and Non-melanoma columns
  hline(part = "header", i = 1,
        border = officer::fp_border(width = 1.5), j = 3:4) |>
  # change labels in the first column
  compose(i = 1,  j = 1, as_paragraph(as_chunk("SEX"))) |>
  compose(i = 4,  j = 1, as_paragraph(as_chunk("AGE (years)"))) |>
  compose(i = 7,  j = 1, as_paragraph(as_chunk("Ulceration"))) |>
  compose(i = 10, j = 1, as_paragraph(as_chunk("Thickness (mm)"))) |>
  # add caption
  set_caption(caption = "Table 1: Descriptive statistics of patient characteristics by status") |>
  # add footnote
  footnote(
    i           = 10,
    j           = 1,
    ref_symbols = "a",
    value       = as_paragraph("Also known as Breslow thickness")
  ) |>
  # adjust font size in footer/footnote if needed
  fontsize(i = 1, j = 1, size = 9, part = "footer")

Table 1: Descriptive statistics of patient characteristics by status

                                         Death
                     Alive               Melanoma death      Non-melanoma death  Overall
                     (N=134)             (N=57)              (N=14)              (N=205)
SEX
  Female             91 (67.9%)          28 (49.1%)          7 (50.0%)           126 (61.5%)
  Male               43 (32.1%)          29 (50.9%)          7 (50.0%)           79 (38.5%)
AGE (years)
  Mean (SD)          50.0 (15.9)         55.1 (17.9)         65.3 (10.9)         52.5 (16.7)
  Median [Min, Max]  52.0 [4.00, 84.0]   56.0 [14.0, 95.0]   65.0 [49.0, 86.0]   54.0 [4.00, 95.0]
Ulceration
  Absent             92 (68.7%)          16 (28.1%)          7 (50.0%)           115 (56.1%)
  Present            42 (31.3%)          41 (71.9%)          7 (50.0%)           90 (43.9%)
Thickness (mm)ᵃ
  Mean (SD)          2.24 (2.33)         4.31 (3.57)         3.72 (3.63)         2.92 (2.96)
  Median [Min, Max]  1.36 [0.100, 12.9]  3.54 [0.320, 17.4]  2.26 [0.160, 12.6]  1.94 [0.100, 17.4]

ᵃ Also known as Breslow thickness

You may find that this approach uses more lines of code, but the steps are more explicit and can be easier to control.
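
Once a table is a flextable object, it can be written straight to a Word file. The sketch below is self-contained and uses a small built-in dataset instead of tab1.flex; the object names ft and out_file and the temporary file path are placeholders for illustration.

```r
library(flextable)

# A small flextable built from built-in data
ft <- flextable(head(mtcars[, 1:4])) |>
  set_caption(caption = "Example table written to Word") |>
  autofit()

# Write to a temporary .docx file; use a project path in practice
out_file <- tempfile(fileext = ".docx")
save_as_docx(ft, path = out_file)
```

The same call should work for a flextable created with t1flex(), since that function also returns a flextable object.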

To learn more about the table1 package, see:

Using the table1 Package to Create HTML Tables of Descriptive Statistics

sjPlot

The sjPlot package is a collection of plotting and table-output functions for data visualization.

Results of many statistical analyses (commonly used in the social sciences) can be visualized using this package, including simple and cross-tabulated frequencies, linear models, generalized linear models, mixed-effects models, PCA and correlation matrices, cluster analyses, and more.

Key features of sjPlot

  • Cross-tabulation
    tab_xtab() creates cross-tabulations with options for adding row and column percentages and association statistics.

  • Regression tables
    tab_model() creates tables of regression models with detailed statistical summaries, including coefficients, standard errors, p-values, and confidence intervals. It supports various model types such as linear, logistic, and mixed-effects models.

  • Multiple models
    You can combine results from multiple models into a single table for comparative analysis.

Cross-tabulation with tab_xtab()

library(sjPlot)

# Cross-tabulation of SES by sex for the hsb data
tab_xtab(
  var.row      = hsb$ses,
  var.col      = hsb$female,
  show.col.prc = TRUE
)

            female
ses         female      male        Total
high        29          29          58
            26.6 %      31.9 %      29 %
low         32          15          47
            29.4 %      16.5 %      23.5 %
middle      48          47          95
            44 %        51.6 %      47.5 %
Total       109         91          200
            100 %       100 %       100 %
χ²=4.577 · df=2 · Cramer’s V=0.151 · p=0.101

tab_xtab(
  var.row      = hsb$ses,
  var.col      = hsb$female,
  show.row.prc = TRUE,
  statistics   = "phi"
)

            female
ses         female      male        Total
high        29          29          58
            50 %        50 %        100 %
low         32          15          47
            68.1 %      31.9 %      100 %
middle      48          47          95
            50.5 %      49.5 %      100 %
Total       109         91          200
            54.5 %      45.5 %      100 %
χ²=4.577 · df=2 · φ=0.151 · p=0.101
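
The association statistics in the two outputs can be checked by hand from the chi-squared value. The sketch below uses the standard definition of Cramér's V, sqrt(χ² / (n · (min(rows, columns) − 1))), with the numbers read from the output above; the variable names are our own.

```r
# Values read from the tab_xtab() output
chi2   <- 4.577   # chi-squared statistic
n      <- 200     # total number of observations
n_rows <- 3       # levels of ses
n_cols <- 2       # levels of female

# Cramér's V = sqrt(chi2 / (n * (min(rows, cols) - 1)))
cramers_v <- sqrt(chi2 / (n * (min(n_rows, n_cols) - 1)))
round(cramers_v, 3)   # 0.151

# Phi = sqrt(chi2 / n)
phi <- sqrt(chi2 / n)
round(phi, 3)         # 0.151
```

Because one of the variables has only two levels, min(rows, columns) − 1 equals 1, so the two statistics coincide. That is why both tables report 0.151.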

Regression tables with tab_model()

library(MASS)   # for glm.nb

# Poisson model of awards on math, read, and SES
m.pois <- glm(awards ~ math + read + ses,
              family = poisson(),
              data   = hsb)

# Print using tab_model
tab_model(
  m.pois,
  dv.labels = "Poisson model"
)

                 Poisson model
Predictors       Incidence Rate Ratios  CI            p
(Intercept)      0.03                   0.01 – 0.07   <0.001
math             1.05                   1.03 – 1.06   <0.001
read             1.03                   1.01 – 1.04   <0.001
ses [low]        0.89                   0.65 – 1.22   0.491
ses [middle]     0.78                   0.61 – 1.00   0.049
Observations     200
R² Nagelkerke    0.629

# Poisson model with cluster-robust covariance matrix
tab_model(
  m.pois,
  vcov.fun  = "CL",
  vcov.args = list(type = "HC1", cluster = hsb$cid),
  dv.labels = "Poisson with cluster-robust covariance matrix"
)

                 Poisson with cluster-robust covariance matrix
Predictors       Incidence Rate Ratios  CI            p
(Intercept)      0.03                   0.01 – 0.07   <0.001
math             1.05                   1.03 – 1.06   <0.001
read             1.03                   1.01 – 1.04   <0.001
ses [low]        0.89                   0.65 – 1.22   0.487
ses [middle]     0.78                   0.61 – 1.00   0.035
Observations     200
R² Nagelkerke    0.629

# Negative binomial model of awards on math, read, and SES
m.nbin <- glm.nb(awards ~ math + read + ses, data = hsb)

# Print two models together in one table, adding AIC and deviance
tab_model(
  m.pois,
  m.nbin,
  vcov.fun  = "CL",
  vcov.args = list(type = "HC1", cluster = hsb$cid),
  show.dev  = TRUE,
  show.aic  = TRUE,
  dv.labels = c("Poisson", "Negative binomial")
)

               Poisson                                       Negative binomial
Predictors     Incidence Rate Ratios  CI            p        Incidence Rate Ratios  CI            p
(Intercept)    0.03                   0.01 – 0.07   <0.001   0.03                   0.01 – 0.06   <0.001
math           1.05                   1.03 – 1.06   <0.001   1.05                   1.03 – 1.07   <0.001
read           1.03                   1.01 – 1.04   <0.001   1.03                   1.01 – 1.05   <0.001
ses [low]      0.89                   0.65 – 1.22   0.487    0.89                   0.62 – 1.27   0.484
ses [middle]   0.78                   0.61 – 1.00   0.035    0.79                   0.60 – 1.04   0.040
Observations   200                                           200
R² Nagelkerke  0.629                                         0.595
Deviance       256.818                                       221.505
AIC            613.047                                       611.548
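
The "Incidence Rate Ratios" column is on the exponentiated scale: each value is exp() of a model coefficient. The sketch below shows that transformation on a built-in dataset, because hsb is not reproduced here. The model and the use of Wald limits from confint.default() are assumptions for illustration; the intervals tab_model() computes may be obtained differently.

```r
# Poisson model on built-in data (number of carburetors as the count)
m <- glm(carb ~ wt + hp, family = poisson(), data = mtcars)

# Coefficients are on the log scale; exponentiating gives
# incidence rate ratios
irr <- exp(coef(m))

# Wald 95% confidence limits, also exponentiated
ci <- exp(confint.default(m))

round(cbind(IRR = irr, ci), 2)
```

An IRR below 1 means the expected count decreases as the predictor increases, and an IRR above 1 means it increases.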

For more details, see:

Summary of regression models as HTML table

gtsummary

The gtsummary package is designed to create summary tables for a variety of statistical analyses.
It focuses on making publication-ready tables that are easy to generate and aesthetically pleasing.

Key advantages

  • Ease of use
    Minimal coding is required to generate complex, publication-quality tables.

  • Flexibility
    Tables can be customized to suit the needs of different publications or audiences.

  • Integrated statistical reporting
    Automatically includes relevant statistics such as p-values, confidence intervals, and effect sizes.

  • Exportability
    Tables can be converted to gt or flextable objects for further customization, and exported to formats suitable for Word and PowerPoint.

  • Themes
    It is possible to set themes in gtsummary. Themes control many aspects of how a table is printed (labels, style, formatting, etc.).

Descriptive summary tables

library(gtsummary)

# Descriptive summary table (similar to Table 1)
tab1_gt <- tbl_summary(
  melanoma1,
  include = -c(time, year),
  by      = status
)

tab1_gt

Characteristic   Alive               Melanoma death      Non-melanoma death
                 N = 134¹            N = 57¹             N = 14¹
sex
    Female       91 (68%)            28 (49%)            7 (50%)
    Male         43 (32%)            29 (51%)            7 (50%)
age              52 (40, 62)         56 (44, 68)         65 (56, 72)
thickness        1.36 (0.81, 2.90)   3.54 (2.24, 4.84)   2.26 (1.29, 6.12)
ulcer
    Absent       92 (69%)            16 (28%)            7 (50%)
    Present      42 (31%)            41 (72%)            7 (50%)
¹ n (%); Median (Q1, Q3)

# Add p-values and modify headers/labels
tab1_gt |>
  add_p() |>
  modify_header(label = "**Variable**") |>
  bold_labels()

Variable         Alive               Melanoma death      Non-melanoma death  p-value²
                 N = 134¹            N = 57¹             N = 14¹
sex                                                                          0.033
    Female       91 (68%)            28 (49%)            7 (50%)
    Male         43 (32%)            29 (51%)            7 (50%)
age              52 (40, 62)         56 (44, 68)         65 (56, 72)         0.001
thickness        1.36 (0.81, 2.90)   3.54 (2.24, 4.84)   2.26 (1.29, 6.12)   <0.001
ulcer                                                                        <0.001
    Absent       92 (69%)            16 (28%)            7 (50%)
    Present      42 (31%)            41 (72%)            7 (50%)
¹ n (%); Median (Q1, Q3)
² Pearson’s Chi-squared test; Kruskal-Wallis rank sum test

# Summary statistics for continuous variables as mean (SD),
# add overall column and confidence intervals
tbl_summary(
  melanoma1,
  include   = -c(time, year),
  by        = status,
  statistic = all_continuous() ~ "{mean} ({sd})"
) |>
  add_overall() |>
  add_ci(
    method  = all_categorical() ~ "wald",
    pattern = "{stat} ({ci})"
  ) |>
  modify_spanning_header(
    c("stat_2", "stat_3") ~ "**Death**"
  )

                                                                 Death
Characteristic   Overall                 Alive                   Melanoma death          Non-melanoma death
                 N = 205 (95% CI)¹       N = 134 (95% CI)¹       N = 57 (95% CI)¹        N = 14 (95% CI)¹
sex
    Female       126 (61%) (55%, 68%)    91 (68%) (60%, 76%)     28 (49%) (35%, 63%)     7 (50%) (20%, 80%)
    Male         79 (39%) (32%, 45%)     43 (32%) (24%, 40%)     29 (51%) (37%, 65%)     7 (50%) (20%, 80%)
age              52 (17) (50, 55)        50 (16) (47, 53)        55 (18) (50, 60)        65 (11) (59, 72)
thickness        2.92 (2.96) (2.5, 3.3)  2.24 (2.33) (1.8, 2.6)  4.31 (3.57) (3.4, 5.3)  3.72 (3.63) (1.6, 5.8)
ulcer
    Absent       115 (56%) (49%, 63%)    92 (69%) (60%, 77%)     16 (28%) (16%, 41%)     7 (50%) (20%, 80%)
    Present      90 (44%) (37%, 51%)     42 (31%) (23%, 40%)     41 (72%) (59%, 84%)     7 (50%) (20%, 80%)
Abbreviation: CI = Confidence Interval
¹ n (%); Mean (SD)
Cross tables of categorical variables

# Basic cross-tabulation with p-value
tbl_cross(
  row     = ses,
  col     = honors,
  percent = "row",
  data    = hsb
) |>
  add_p()

                 honors
                 enrolled     not enrolled   Total        p-value¹
ses                                                       <0.001
    high         26 (45%)     32 (55%)       58 (100%)
    low          11 (23%)     36 (77%)       47 (100%)
    middle       16 (17%)     79 (83%)       95 (100%)
Total            53 (27%)     147 (74%)      200 (100%)
¹ Pearson’s Chi-squared test

# Set compact theme
theme_gtsummary_compact()
## Setting theme "Compact"
tbl_cross(
  row     = ses,
  col     = honors,
  percent = "row",
  data    = hsb
) |>
  add_p()

                 honors
                 enrolled     not enrolled   Total        p-value¹
ses                                                       <0.001
    high         26 (45%)     32 (55%)       58 (100%)
    low          11 (23%)     36 (77%)       47 (100%)
    middle       16 (17%)     79 (83%)       95 (100%)
Total            53 (27%)     147 (74%)      200 (100%)
¹ Pearson’s Chi-squared test

Formatted table of regression model results

# Logistic regression model
m2 <- glm(
  honors ~ math + ses,
  family = binomial(link = "logit"),
  data   = hsb
)

# Reset to gtsummary default theme
reset_gtsummary_theme()

# Regression table with odds ratios
tab1.glm <- tbl_regression(m2, exponentiate = TRUE)
tab1.glm

Characteristic   OR     95% CI       p-value
math             0.84   0.79, 0.88   <0.001
ses
    high
    low          1.05   0.35, 3.10   >0.9
    middle       3.44   1.43, 8.58   0.007
Abbreviations: CI = Confidence Interval, OR = Odds Ratio

# Set theme to Journal of the American Medical Association (JAMA)
theme_gtsummary_journal(journal = "jama")
## Setting theme "JAMA"
tab1.glm

Characteristic   OR     95% CI       p-value
math             0.84   0.79, 0.88   <0.001
ses
    high
    low          1.05   0.35, 3.10   >0.9
    middle       3.44   1.43, 8.58   0.007
Abbreviations: CI = Confidence Interval, OR = Odds Ratio

# Convert table for further formatting with gt
tab1.glm |>
  add_global_p() |>          # add overall p-value for ses
  bold_p(t = 0.01) |>        # bold p-values < 0.01
  as_gt() |>                 # convert to gt table
  tab_source_note(           # add source note (Markdown interpreted)
    md("*This data is simulated*")
  )

Characteristic   OR     95% CI       p-value
math             0.84   0.79, 0.88   <0.001
ses                                  0.011
    high
    low          1.05   0.35, 3.10
    middle       3.44   1.43, 8.58
Abbreviations: CI = Confidence Interval, OR = Odds Ratio
This data is simulated

Inline reporting with inline_text()

Reproducible reports are an important part of good practice.
We often need to report results from a table in the text of an R Markdown report.
Inline reporting is made simple with inline_text().

The method inline_text.tbl_regression() has the following format:

inline_text(
  x,
  variable,
  level        = NULL,
  pattern      = "{estimate} ({conf.level*100}% CI {conf.low}, {conf.high}; {p.value})",
  estimate_fun = x$inputs$estimate_fun,
  pvalue_fun   = label_style_pvalue(prepend_p = TRUE),
  ...
)

To report a result from a gtsummary table in the text, call inline_text() inside an inline R expression, `r inline_text(...)`, in the R Markdown source.

For example, to report the odds ratio for math from the regression table, we can type:

For every one-unit increase in math score, we expect, on average, the odds of not being enrolled in the honors program to change by a factor of `r inline_text(tab1.glm, variable = math, pattern = "{estimate}; 95% CI ({conf.low}, {conf.high})")`, holding ses constant.

In the report it will appear like this:

For every one-unit increase in math score, we expect, on average, the odds of not being enrolled in the honors program to change by a factor of 0.84; 95% CI (0.79, 0.88), holding ses constant.
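
The same calls can be tried at the console before placing them in the text. The sketch below uses the trial dataset shipped with gtsummary rather than hsb, so the model and the object names (m_trial, tbl_trial, age_txt, grade_txt) are illustrative only.

```r
library(gtsummary)

# Logistic regression on the example data included in gtsummary
m_trial   <- glm(response ~ age + grade, family = binomial(), data = trial)
tbl_trial <- tbl_regression(m_trial, exponentiate = TRUE)

# Continuous predictor: custom pattern without the p-value
age_txt <- inline_text(
  tbl_trial,
  variable = age,
  pattern  = "{estimate} (95% CI {conf.low}, {conf.high})"
)

# Categorical predictor: the level argument selects which row to report
grade_txt <- inline_text(tbl_trial, variable = grade, level = "III")

age_txt
grade_txt
```

Both calls return a single string, which is what gets pasted into the sentence when the document is rendered.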

Converting to other packages

The output of gtsummary tables can be converted to gt, kableExtra, or flextable objects.
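
A minimal sketch of these conversions, using the trial dataset shipped with gtsummary; the object names are our own, and the flextable and kableExtra packages need to be installed for the last two calls.

```r
library(gtsummary)

tbl <- tbl_summary(trial, by = trt, include = c(age, grade))

tbl_gt   <- as_gt(tbl)           # gt object
tbl_flex <- as_flex_table(tbl)   # flextable object
tbl_kbl  <- as_kable_extra(tbl)  # kableExtra object
```

After conversion, each object can be styled further with functions from the target package, for example tab_source_note() for gt or fontsize() for flextable.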

Below is a summary of various Quarto and R Markdown output types and the print engines that support them (image from the gtsummary website):

To learn more about using gtsummary in R Markdown, see:

https://www.danieldsjoberg.com/gtsummary/articles/rmarkdown.html

gtsummary reference

Sjoberg DD, Whiting K, Curry M, Lavery JA, Larmarange J.
Reproducible summary tables with the gtsummary package. The R Journal 2021;13:570–80.
https://doi.org/10.32614/RJ-2021-053

Principles for effective statistical tables

  • 1. Purpose and audience
    Be clear why the table exists (compare, summarize, show trends, show relationships) and design it for the intended readers.

  • 2. Clear structure
    Arrange rows and columns in a logical order (time, region, alphabetical, or by size).
    Use short, specific headings and explain any units (e.g., "in millions", "in %").

  • 3. Consistent units and scale
    Use the same units and number of decimal places within a table.
    If different units are needed, label them clearly.

  • 4. Simplicity and visual balance
    Keep the layout simple and easy to scan.
    Avoid overcrowding—use spacing, alignment, and borders to keep the table readable.
    If needed, split an overloaded table into two smaller related tables.

  • 5. Accuracy and comparability
    Check that all numbers, totals, and derived values are correct.
    Ensure data from different years, regions, or sources are truly comparable
    (same definitions, time periods, and methods).

  • 6. Titles, notes, and sources
    Use a precise title that states what, where, and when the data represent.
    Add footnotes for special cases or symbols, and always include the data source.

  • 7. Emphasize key figures
    Use subtle formatting (e.g., bold, spacing, or grouping) to highlight important totals, averages, or results.
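
Principle 3 is easy to break by accident in R, because numbers converted to text lose trailing zeros. The sketch below reuses the 2020–2023 population values from the example table at the start of the workshop and shows one way to fix the number of decimal places with base R; the column name population_fmt is our own.

```r
pop <- data.frame(
  year       = 2020:2023,
  population = c(3.98, 4.02, 4.05, 4.10)   # millions
)

# Converting the raw number to text drops the trailing zero
as.character(pop$population[4])   # "4.1"

# Format every value to the same number of decimal places
pop$population_fmt <- formatC(pop$population, format = "f", digits = 2)
pop$population_fmt                # "3.98" "4.02" "4.05" "4.10"
```

Table packages offer the same control at the display level, for example fmt_number() in gt or colformat_double() in flextable, which keeps the underlying data numeric.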

Thanks!