Introduction

A table is a structured arrangement of data, typically organized in rows and columns. It helps you see and compare data easily, making it simpler to understand and communicate the results.

Key components of a statistical table:

  • Title: Clearly describes the content and context of the table.

  • Rows: Represent different categories, groups, or individual data points.

  • Columns: Indicate variables or measures being reported.

  • Cells: Contain the actual data values corresponding to the intersection of rows and columns.

  • Headings: Labels for rows and columns to clarify the data being presented.

  • Footnotes: Additional information or explanations about the data.

We will explore how to effectively generate and present data using R, with a focus on utilizing RMarkdown for creating professional reports.

In this workshop we will cover the R packages kableExtra, flextable, gt, gtExtras, DT, table1, sjPlot, and gtsummary.

Basic Report and Result Tables in R

  • A data.frame in R is a type of data structure used to store data in a table format. It is one of the most common and versatile structures in R, allowing for the storage of different types of data (e.g., numeric, character, factor) in a single object.

Two generic functions in R are commonly used to display output in the console.

  • The print() function is used to display the contents of an object in the console. It’s the most basic way to output data or results in R.

  • The summary() function provides a quick overview of the main statistical features of an object.

Both functions above work on various object types, such as vectors, data frames, and models.

Note: Implicit Printing: When we type an object’s name and run it, R internally calls the print() function to display the object’s contents. This is why you see the output in the console even if you don’t explicitly use print().
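
The difference between implicit and explicit printing can be seen in a small sketch. The vector scores and its values are made up for illustration and are not part of the workshop data. Inside a for loop R does not auto-print, so print() has to be called explicitly there.

```r
# A small numeric vector, invented only for this illustration
scores <- c(52, 47, 63, 58, 41)

# At the top level, typing the name triggers implicit printing
scores

# The explicit call displays the same thing
print(scores)

# Inside a loop nothing is auto-printed: this displays nothing
for (s in scores[1:2]) s

# ... while an explicit print() call displays each value
for (s in scores[1:2]) print(s)

# summary() is generic: the result depends on the class of the object
num_summary <- summary(scores)                      # five-number summary plus mean
fac_summary <- summary(factor(c("a", "b", "a")))    # counts per level
num_summary
fac_summary
```

The same dispatch mechanism is what lets summary() return quartiles for a numeric vector, counts for a factor, and a coefficient table for a fitted model.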

Review of R basic outputs

The first data set we use in this workshop is hsbdemo, a sample of high school performance data for 200 students.

The first step in any statistical analysis is to understand our data.

Note: The datasets used in this workshop are not real and are intended solely to demonstrate statistical analysis.

#Read the data
hsb <- read.csv("https://stats.idre.ucla.edu/stat/data/hsbdemo.csv")
#Names of columns
names(hsb)
##  [1] "id"      "female"  "ses"     "schtyp"  "prog"    "read"    "write"  
##  [8] "math"    "science" "socst"   "honors"  "awards"  "cid"
#Structure of data.frame
str(hsb)
## 'data.frame':    200 obs. of  13 variables:
##  $ id     : int  45 108 15 67 153 51 164 133 2 53 ...
##  $ female : chr  "female" "male" "male" "male" ...
##  $ ses    : chr  "low" "middle" "high" "low" ...
##  $ schtyp : chr  "public" "public" "public" "public" ...
##  $ prog   : chr  "vocation" "general" "vocation" "vocation" ...
##  $ read   : int  34 34 39 37 39 42 31 50 39 34 ...
##  $ write  : int  35 33 39 37 31 36 36 31 41 37 ...
##  $ math   : int  41 41 44 42 40 42 46 40 33 46 ...
##  $ science: int  29 36 26 33 39 31 39 34 42 39 ...
##  $ socst  : int  26 36 42 32 51 39 46 31 41 31 ...
##  $ honors : chr  "not enrolled" "not enrolled" "not enrolled" "not enrolled" ...
##  $ awards : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ cid    : int  1 1 1 1 1 1 1 1 1 1 ...
#Change categorical variables from character to factors
hsb <- within(hsb,{
             female <- factor(female)
             ses <- factor(ses)
             schtyp <- factor(schtyp)
             prog <- factor(prog)
             honors <- factor(honors)
             })
#Print first 6 rows of data
hsb6 <- head(hsb)
print(hsb6)
##    id female    ses schtyp     prog read write math science socst       honors
## 1  45 female    low public vocation   34    35   41      29    26 not enrolled
## 2 108   male middle public  general   34    33   41      36    36 not enrolled
## 3  15   male   high public vocation   39    39   44      26    42 not enrolled
## 4  67   male    low public vocation   37    37   42      33    32 not enrolled
## 5 153   male middle public vocation   39    31   40      39    51 not enrolled
## 6  51 female   high public  general   42    36   42      31    39 not enrolled
##   awards cid
## 1      0   1
## 2      0   1
## 3      0   1
## 4      0   1
## 5      0   1
## 6      0   1

By printing the first 6 rows of the data we created a tabular display of the first 6 observations.

We can use the summary() function to report summary statistics. For example, we can get the summary statistics of the students’ scores and honors enrollment.

#Summary statistics
summary(hsb[c("read", "write", "math", "science", "socst", "honors")])
##       read           write            math          science     
##  Min.   :28.00   Min.   :31.00   Min.   :33.00   Min.   :26.00  
##  1st Qu.:44.00   1st Qu.:45.75   1st Qu.:45.00   1st Qu.:44.00  
##  Median :50.00   Median :54.00   Median :52.00   Median :53.00  
##  Mean   :52.23   Mean   :52.77   Mean   :52.65   Mean   :51.85  
##  3rd Qu.:60.00   3rd Qu.:60.00   3rd Qu.:59.00   3rd Qu.:58.00  
##  Max.   :76.00   Max.   :67.00   Max.   :75.00   Max.   :74.00  
##      socst                honors   
##  Min.   :26.00   enrolled    : 53  
##  1st Qu.:46.00   not enrolled:147  
##  Median :52.00                     
##  Mean   :52.41                     
##  3rd Qu.:61.00                     
##  Max.   :71.00

Contingency Table with table() and xtabs()

The table() function from base R creates frequency tables that summarize categorical data. We can also use the function xtabs() from the R stats package, which takes a formula and a data argument.

In our data we want to make a cross tabulation (contingency table) for the variables ses and honors.

#Two-way contingency table with table()
tab1 <- table(hsb$ses, hsb$honors)
tab1
##         
##          enrolled not enrolled
##   high         26           32
##   low          11           36
##   middle       16           79
#Proportional table
prop.table(tab1)
##         
##          enrolled not enrolled
##   high      0.130        0.160
##   low       0.055        0.180
##   middle    0.080        0.395
#Proportional table by row
prop.table(tab1, margin = 1)
##         
##           enrolled not enrolled
##   high   0.4482759    0.5517241
##   low    0.2340426    0.7659574
##   middle 0.1684211    0.8315789
#Two-way contingency table with xtabs()
tab2 <- xtabs(~ ses + honors, data = hsb)
tab2
##         honors
## ses      enrolled not enrolled
##   high         26           32
##   low          11           36
##   middle       16           79
#Proportional table
prop.table(tab2)
##         honors
## ses      enrolled not enrolled
##   high      0.130        0.160
##   low       0.055        0.180
##   middle    0.080        0.395
#Proportional table by row
prop.table(tab2, margin = 1)
##         honors
## ses       enrolled not enrolled
##   high   0.4482759    0.5517241
##   low    0.2340426    0.7659574
##   middle 0.1684211    0.8315789
#Summary on a table object will perform a chi-squared test
summary(tab2)
## Call: xtabs(formula = ~ses + honors, data = hsb)
## Number of cases in table: 200 
## Number of factors: 2 
## Test for independence of all factors:
##  Chisq = 14.783, df = 2, p-value = 0.0006164
#Three-way cross tab
tab3 <- xtabs(~ ses + honors + female, data = hsb)
tab3
## , , female = female
## 
##         honors
## ses      enrolled not enrolled
##   high         15           14
##   low          10           22
##   middle       10           38
## 
## , , female = male
## 
##         honors
## ses      enrolled not enrolled
##   high         11           18
##   low           1           14
##   middle        6           41
ftable(tab3)
##                     female female male
## ses    honors                         
## high   enrolled                15   11
##        not enrolled            14   18
## low    enrolled                10    1
##        not enrolled            22   14
## middle enrolled                10    6
##        not enrolled            38   41
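
Reports often need row and column totals or percentages next to the counts. The following is a minimal sketch of how that could be done with base R; it uses the built-in mtcars data (not the workshop data) so that it runs on its own, and the object names are illustrative.

```r
# Built-in mtcars data, used here only so the example is self-contained
tab <- xtabs(~ cyl + am, data = mtcars)

# Add row and column totals (the margins are labelled "Sum")
tab_total <- addmargins(tab)

# Row percentages, rounded to one decimal place
row_pct <- round(100 * prop.table(tab, margin = 1), 1)

tab_total
row_pct
```

The same calls should work on tab1 or tab2 from the examples above, since both are table objects.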

Regression models

Regression models are often used to understand the relationship between a dependent variable and one or more independent variables.

In R we use the summary() function to extract and report the results of a regression model.

As an example, we use the hsb data to regress the math score on the read and write scores and prog.

#Run regression of math on read, write, and prog
m1 <- lm(math ~ read + write + prog, data = hsb)
lm.result <- summary(m1)
lm.result
## 
## Call:
## lm(formula = math ~ read + write + prog, data = hsb)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -19.257  -4.564  -0.211   4.271  17.527 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  19.20202    3.35561   5.722 3.91e-08 ***
## read          0.37186    0.05685   6.541 5.24e-10 ***
## write         0.29591    0.06149   4.812 2.98e-06 ***
## proggeneral  -2.87185    1.18968  -2.414  0.01670 *  
## progvocation -3.79862    1.23942  -3.065  0.00249 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.408 on 195 degrees of freedom
## Multiple R-squared:  0.5415, Adjusted R-squared:  0.5321 
## F-statistic: 57.57 on 4 and 195 DF,  p-value: < 2.2e-16
#Extracting coefficients table (it is a matrix)
lm.result$coefficients
##                Estimate Std. Error   t value     Pr(>|t|)
## (Intercept)  19.2020151 3.35561280  5.722357 3.911030e-08
## read          0.3718589 0.05684928  6.541138 5.240877e-10
## write         0.2959093 0.06149161  4.812190 2.984266e-06
## proggeneral  -2.8718518 1.18968055 -2.413969 1.670404e-02
## progvocation -3.7986171 1.23941526 -3.064846 2.486293e-03
#Adding Confidence interval for coefficients
lm.table <- cbind(lm.result$coefficients, confint(m1))
#Changing the names of columns
colnames(lm.table)[c(5,6)] <- c("LL", "UL")
#Round number to 4 digits and print
round(lm.table, 4)
##              Estimate Std. Error t value Pr(>|t|)      LL      UL
## (Intercept)   19.2020     3.3556  5.7224   0.0000 12.5841 25.8200
## read           0.3719     0.0568  6.5411   0.0000  0.2597  0.4840
## write          0.2959     0.0615  4.8122   0.0000  0.1746  0.4172
## proggeneral   -2.8719     1.1897 -2.4140   0.0167 -5.2181 -0.5256
## progvocation  -3.7986     1.2394 -3.0648   0.0025 -6.2430 -1.3542
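
The table packages introduced later generally expect a data frame rather than a matrix with row names. As a hedged sketch, the coefficient matrix can be converted as follows; the model on the built-in mtcars data and the names coef_tab and coef_df are assumptions made only to keep the example self-contained.

```r
# Illustrative model on built-in data (not the workshop model)
m <- lm(mpg ~ wt + hp, data = mtcars)

# Coefficient matrix plus confidence limits, as in the example above
coef_tab <- cbind(coef(summary(m)), confint(m))
colnames(coef_tab)[5:6] <- c("LL", "UL")

# Move the row names into a regular column so no information is lost
coef_df <- data.frame(
  term = rownames(coef_tab),
  round(coef_tab, 4),
  check.names = FALSE   # keep names such as "Std. Error" unchanged
)
rownames(coef_df) <- NULL

coef_df
```

Keeping the term names in an ordinary column makes the result easier to pass to functions that ignore or drop row names.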

Advanced Tables in R Using R Packages within RMarkdown

Advantages of Using R Markdown to Create Tables

  • Reproducibility: Embedding code directly in the document ensures that tables can be easily reproduced by anyone, reducing errors from manual copying.

  • Dynamic Updates: Changes to the data or analysis are automatically reflected in the tables when the document is re-rendered, eliminating the need for manual updates.

  • Automation: RMarkdown generates and formats tables automatically, streamlining the process and avoiding repetitive tasks like reformatting.

  • Consistency: Tables maintain a consistent format and style throughout the document, which is particularly useful for large reports.

  • Customization and Formatting: Advanced table customization options allow for professional and polished presentation, without needing to rely on external tools for formatting.

In summary, using RMarkdown to create tables ensures automation, reproducibility, and consistency, while also providing powerful customization and formatting options that are not available with simple copy-pasting from the console.

In the rest of the workshop we introduce some of the packages that provide these options, with examples.

knitr::kable and kableExtra

Advantages of knitr::kable and kableExtra

  • Simplicity: knitr::kable offers a straightforward way to create clean tables with minimal code. It’s easy to use for beginners and perfect for simple tables that don’t require extensive customization.

  • Integration with RMarkdown: kable() is designed to work seamlessly with RMarkdown, making it easy to generate tables that fit well within dynamic documents.

  • Flexibility with kableExtra: When paired with kableExtra, kable becomes highly customizable. You can add advanced features like multi-row headers, colors, borders, alignment adjustments, column spanning, custom styling, and footnotes.

  • Theme: kableExtra offers some alternative HTML table themes other than the default bootstrap theme.

The kable() function in package knitr is a very simple table generator. It only generates tables for strictly rectangular data such as matrices and data frames.

This function does have a large number of arguments for you to customize the appearance of tables:

kable(x, format, digits = getOption("digits"), row.names = NA, col.names = NA, align, caption = NULL, label = NULL, format.args = list(), escape = TRUE, ...)

The format argument is a character string. Possible values include latex, html, and pipe (Pandoc’s pipe tables).

If you only need one table format that is not the default format for a document, you can set the global R option knitr.table.format, e.g.,

options(knitr.table.format = "html")

We use the built-in data set state.x77.

state7 <- data.frame(state.x77)[1:7,]

knitr::kable(head(state7), format = "html")
Population Income Illiteracy Life.Exp Murder HS.Grad Frost Area
Alabama 3615 3624 2.1 69.05 15.1 41.3 20 50708
Alaska 365 6315 1.5 69.31 11.3 66.7 152 566432
Arizona 2212 4530 1.8 70.55 7.8 58.1 15 113417
Arkansas 2110 3378 1.9 70.66 10.1 39.9 65 51945
California 21198 5114 1.1 71.71 10.3 62.6 20 156361
Colorado 2541 4884 0.7 72.06 6.8 63.9 166 103766

With the pipe format, the output is a Pandoc pipe table, and how it is rendered depends on the R Markdown output format.

For example, since these slides were created with ioslides, if I do not specify format or I use format = "pipe", the output table will look like this:

my.table <- knitr::kable(state7, format = "pipe")
my.table
Population Income Illiteracy Life.Exp Murder HS.Grad Frost Area
Alabama 3615 3624 2.1 69.05 15.1 41.3 20 50708
Alaska 365 6315 1.5 69.31 11.3 66.7 152 566432
Arizona 2212 4530 1.8 70.55 7.8 58.1 15 113417
Arkansas 2110 3378 1.9 70.66 10.1 39.9 65 51945
California 21198 5114 1.1 71.71 10.3 62.6 20 156361
Colorado 2541 4884 0.7 72.06 6.8 63.9 166 103766
Connecticut 3100 5348 1.1 72.48 3.1 56.0 139 4862
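
Several of the kable() arguments listed earlier (digits, col.names, align, caption) can be combined in one call. The following is a minimal sketch; the column selection, the labels, and the caption text are illustrative choices rather than part of the original example.

```r
library(knitr)

# Same subset of state.x77 as above
state7 <- data.frame(state.x77)[1:7, ]

tbl <- kable(
  state7[, c("Population", "Income", "Life.Exp")],
  format    = "pipe",
  digits    = 1,                                     # round numeric columns
  col.names = c("Population", "Income", "Life exp."), # display labels
  align     = "rrr",                                 # right-align the three data columns
  caption   = "Selected columns of state.x77"
)
tbl
```

Because the row names are kept as an extra first column, col.names and align only need one entry per data column.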

To learn more about knitr::kable() and its options you can check out the link below:

rmarkdown-cookbook, 10.1 The function knitr::kable()

kableExtra

Package kableExtra is an extension of knitr::kable(). Its goal is to help you build common complex tables and manipulate table styles. It imports the pipe symbol %>% from magrittr (it also works with the base R pipe, |>) and names its functions as verbs, so you can add “layers” to a kable output in a way that is similar to ggplot2.

The basic HTML output is just a plain HTML table without any styling.

#plain HTML
kbl(state7)
Population Income Illiteracy Life.Exp Murder HS.Grad Frost Area
Alabama 3615 3624 2.1 69.05 15.1 41.3 20 50708
Alaska 365 6315 1.5 69.31 11.3 66.7 152 566432
Arizona 2212 4530 1.8 70.55 7.8 58.1 15 113417
Arkansas 2110 3378 1.9 70.66 10.1 39.9 65 51945
California 21198 5114 1.1 71.71 10.3 62.6 20 156361
Colorado 2541 4884 0.7 72.06 6.8 63.9 166 103766
Connecticut 3100 5348 1.1 72.48 3.1 56.0 139 4862

Bootstrap theme

kable_styling() automatically applies the Twitter Bootstrap theme to the table.

To see more options for this function, please check the help file:

?kable_styling

state7 %>%
  kbl() %>%
  #twitter bootstrap theme
  kable_styling()
Population Income Illiteracy Life.Exp Murder HS.Grad Frost Area
Alabama 3615 3624 2.1 69.05 15.1 41.3 20 50708
Alaska 365 6315 1.5 69.31 11.3 66.7 152 566432
Arizona 2212 4530 1.8 70.55 7.8 58.1 15 113417
Arkansas 2110 3378 1.9 70.66 10.1 39.9 65 51945
California 21198 5114 1.1 71.71 10.3 62.6 20 156361
Colorado 2541 4884 0.7 72.06 6.8 63.9 166 103766
Connecticut 3100 5348 1.1 72.48 3.1 56.0 139 4862

Alternative themes

kableExtra also offers 6 alternative HTML table themes in addition to the default bootstrap theme. They are: kable_paper, kable_classic, kable_classic_2, kable_minimal, kable_material and kable_material_dark.

These theme functions accept the same options as kable_styling(), which we can use to customize the output table.

Here are some examples:

state7  %>%
  kbl() %>%
  #paper  theme with hover and full_width = F
  kable_paper("hover", full_width = F)
Population Income Illiteracy Life.Exp Murder HS.Grad Frost Area
Alabama 3615 3624 2.1 69.05 15.1 41.3 20 50708
Alaska 365 6315 1.5 69.31 11.3 66.7 152 566432
Arizona 2212 4530 1.8 70.55 7.8 58.1 15 113417
Arkansas 2110 3378 1.9 70.66 10.1 39.9 65 51945
California 21198 5114 1.1 71.71 10.3 62.6 20 156361
Colorado 2541 4884 0.7 72.06 6.8 63.9 166 103766
Connecticut 3100 5348 1.1 72.48 3.1 56.0 139 4862

Full width

state7 %>%
  kbl(caption = "Recreating booktabs style table") %>%
  # classic theme  and other options
  kable_classic(full_width = F, html_font = "Cambria",  position = "left")
Recreating booktabs style table
Population Income Illiteracy Life.Exp Murder HS.Grad Frost Area
Alabama 3615 3624 2.1 69.05 15.1 41.3 20 50708
Alaska 365 6315 1.5 69.31 11.3 66.7 152 566432
Arizona 2212 4530 1.8 70.55 7.8 58.1 15 113417
Arkansas 2110 3378 1.9 70.66 10.1 39.9 65 51945
California 21198 5114 1.1 71.71 10.3 62.6 20 156361
Colorado 2541 4884 0.7 72.06 6.8 63.9 166 103766
Connecticut 3100 5348 1.1 72.48 3.1 56.0 139 4862

striped

state7 %>%
  kbl() %>%
  #material theme with striped rows
  kable_material(lightable_options= c("striped"))
Population Income Illiteracy Life.Exp Murder HS.Grad Frost Area
Alabama 3615 3624 2.1 69.05 15.1 41.3 20 50708
Alaska 365 6315 1.5 69.31 11.3 66.7 152 566432
Arizona 2212 4530 1.8 70.55 7.8 58.1 15 113417
Arkansas 2110 3378 1.9 70.66 10.1 39.9 65 51945
California 21198 5114 1.1 71.71 10.3 62.6 20 156361
Colorado 2541 4884 0.7 72.06 6.8 63.9 166 103766
Connecticut 3100 5348 1.1 72.48 3.1 56.0 139 4862

Column / Row Specification

kbl(state7) %>%
  #paper theme 
  kable_paper(full_width = F) %>%
  #Make first column bold and add border
  column_spec(1, bold = T, border_right = T) %>%
  #Make column 9 width larger and background yellow
  column_spec(9, width = "6em", background = "yellow")
Population Income Illiteracy Life.Exp Murder HS.Grad Frost Area
Alabama 3615 3624 2.1 69.05 15.1 41.3 20 50708
Alaska 365 6315 1.5 69.31 11.3 66.7 152 566432
Arizona 2212 4530 1.8 70.55 7.8 58.1 15 113417
Arkansas 2110 3378 1.9 70.66 10.1 39.9 65 51945
California 21198 5114 1.1 71.71 10.3 62.6 20 156361
Colorado 2541 4884 0.7 72.06 6.8 63.9 166 103766
Connecticut 3100 5348 1.1 72.48 3.1 56.0 139 4862
kbl(state7) %>%
  #paper theme
  kable_paper(full_width = F) %>%
  #Conditional formatting: text color of column 2 (Population)
  column_spec(2, color = spec_color(state7$Population, palette = c("black", "red"))) %>%
  #Conditional formatting: background of column 4 (Illiteracy), white text
  column_spec(4, color = "white",
              background = spec_color(state7$Illiteracy<=1.5, palette = c("red", "green"))) %>%
  #Rotate the header row (row 0) by -45 degrees
  row_spec(0, angle = -45)
Population Income Illiteracy Life.Exp Murder HS.Grad Frost Area
Alabama 3615 3624 2.1 69.05 15.1 41.3 20 50708
Alaska 365 6315 1.5 69.31 11.3 66.7 152 566432
Arizona 2212 4530 1.8 70.55 7.8 58.1 15 113417
Arkansas 2110 3378 1.9 70.66 10.1 39.9 65 51945
California 21198 5114 1.1 71.71 10.3 62.6 20 156361
Colorado 2541 4884 0.7 72.06 6.8 63.9 166 103766
Connecticut 3100 5348 1.1 72.48 3.1 56.0 139 4862
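
row_spec() can also style whole rows selected by a condition. The sketch below highlights the states with a murder rate above 10; the threshold and the background color are arbitrary choices made for illustration.

```r
library(kableExtra)

state7 <- data.frame(state.x77)[1:7, ]

# Row indices where the murder rate exceeds 10 (illustrative threshold)
high_murder <- which(state7$Murder > 10)

styled <- kbl(state7, format = "html") %>%
  kable_paper(full_width = FALSE) %>%
  # Bold text and a light background for the selected rows
  row_spec(high_murder, bold = TRUE, background = "#FDE9D9")
styled
```

Computing the row indices first keeps the condition separate from the styling, so the same index vector can be reused in other calls.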

One nice feature of kableExtra, available only for HTML output, is the scroll box. If you have a huge table that you want to include in a website or HTML document without using a lot of space, a scroll box is a good solution.

kbl(state.x77) %>%
  kable_paper() %>%
  #Add scroll bar
  scroll_box(width = "400px", height = "200px")
Population Income Illiteracy Life Exp Murder HS Grad Frost Area
Alabama 3615 3624 2.1 69.05 15.1 41.3 20 50708
Alaska 365 6315 1.5 69.31 11.3 66.7 152 566432
Arizona 2212 4530 1.8 70.55 7.8 58.1 15 113417
Arkansas 2110 3378 1.9 70.66 10.1 39.9 65 51945
California 21198 5114 1.1 71.71 10.3 62.6 20 156361
Colorado 2541 4884 0.7 72.06 6.8 63.9 166 103766
Connecticut 3100 5348 1.1 72.48 3.1 56.0 139 4862
Delaware 579 4809 0.9 70.06 6.2 54.6 103 1982
Florida 8277 4815 1.3 70.66 10.7 52.6 11 54090
Georgia 4931 4091 2.0 68.54 13.9 40.6 60 58073
Hawaii 868 4963 1.9 73.60 6.2 61.9 0 6425
Idaho 813 4119 0.6 71.87 5.3 59.5 126 82677
Illinois 11197 5107 0.9 70.14 10.3 52.6 127 55748
Indiana 5313 4458 0.7 70.88 7.1 52.9 122 36097
Iowa 2861 4628 0.5 72.56 2.3 59.0 140 55941
Kansas 2280 4669 0.6 72.58 4.5 59.9 114 81787
Kentucky 3387 3712 1.6 70.10 10.6 38.5 95 39650
Louisiana 3806 3545 2.8 68.76 13.2 42.2 12 44930
Maine 1058 3694 0.7 70.39 2.7 54.7 161 30920
Maryland 4122 5299 0.9 70.22 8.5 52.3 101 9891
Massachusetts 5814 4755 1.1 71.83 3.3 58.5 103 7826
Michigan 9111 4751 0.9 70.63 11.1 52.8 125 56817
Minnesota 3921 4675 0.6 72.96 2.3 57.6 160 79289
Mississippi 2341 3098 2.4 68.09 12.5 41.0 50 47296
Missouri 4767 4254 0.8 70.69 9.3 48.8 108 68995
Montana 746 4347 0.6 70.56 5.0 59.2 155 145587
Nebraska 1544 4508 0.6 72.60 2.9 59.3 139 76483
Nevada 590 5149 0.5 69.03 11.5 65.2 188 109889
New Hampshire 812 4281 0.7 71.23 3.3 57.6 174 9027
New Jersey 7333 5237 1.1 70.93 5.2 52.5 115 7521
New Mexico 1144 3601 2.2 70.32 9.7 55.2 120 121412
New York 18076 4903 1.4 70.55 10.9 52.7 82 47831
North Carolina 5441 3875 1.8 69.21 11.1 38.5 80 48798
North Dakota 637 5087 0.8 72.78 1.4 50.3 186 69273
Ohio 10735 4561 0.8 70.82 7.4 53.2 124 40975
Oklahoma 2715 3983 1.1 71.42 6.4 51.6 82 68782
Oregon 2284 4660 0.6 72.13 4.2 60.0 44 96184
Pennsylvania 11860 4449 1.0 70.43 6.1 50.2 126 44966
Rhode Island 931 4558 1.3 71.90 2.4 46.4 127 1049
South Carolina 2816 3635 2.3 67.96 11.6 37.8 65 30225
South Dakota 681 4167 0.5 72.08 1.7 53.3 172 75955
Tennessee 4173 3821 1.7 70.11 11.0 41.8 70 41328
Texas 12237 4188 2.2 70.90 12.2 47.4 35 262134
Utah 1203 4022 0.6 72.90 4.5 67.3 137 82096
Vermont 472 3907 0.6 71.64 5.5 57.1 168 9267
Virginia 4981 4701 1.4 70.08 9.5 47.8 85 39780
Washington 3559 4864 0.6 71.72 4.3 63.5 32 66570
West Virginia 1799 3617 1.4 69.48 6.7 41.6 100 24070
Wisconsin 4589 4468 0.7 72.48 3.0 54.5 149 54464
Wyoming 376 4566 0.6 70.29 6.9 62.9 173 97203

To learn more about table styles and the options in kable_styling(), you can check the link below:

Create Awesome HTML Table with knitr::kable and kableExtra

Using the flextable

flextable is designed to create and format tables that can be easily exported into Word and PowerPoint documents. It allows users to create richly formatted tables with features like text formatting, colors, borders, and alignment, making it ideal for generating professional-looking tables in document reports.

Advantages of flextable Compared to Other Packages

  • Extensive Customization: flextable offers detailed control over the formatting of tables, including text alignment, fonts, colors, borders, and cell-level styling. This level of customization goes beyond what simpler packages like kable can offer, allowing for professional and polished tables.

  • Integration with Word and PowerPoint: One of the standout features of flextable is its seamless integration with Microsoft Word and PowerPoint through the officer package. You can directly export beautifully formatted tables into these documents, making it ideal for users who frequently work with Word and PowerPoint.

  • Conditional Formatting: The package allows for conditional formatting based on the values in the table, which is useful for highlighting key data points or making tables more informative visually.

  • Predefined Themes: flextable offers built-in themes that provide consistent, aesthetically pleasing styles for tables. This reduces the effort needed to style tables while maintaining a professional appearance.

The main function is flextable(), which takes a data.frame as its argument and returns a flextable object.

#Default flextable of the first 10 rows, dropping column 13 (cid)
ft <- flextable(hsb[1:10, -13])
ft

id   female  ses     schtyp  prog      read  write  math  science  socst  honors        awards
45   female  low     public  vocation  34    35     41    29       26     not enrolled  0
108  male    middle  public  general   34    33     41    36       36     not enrolled  0
15   male    high    public  vocation  39    39     44    26       42     not enrolled  0
67   male    low     public  vocation  37    37     42    33       32     not enrolled  0
153  male    middle  public  vocation  39    31     40    39       51     not enrolled  0
51   female  high    public  general   42    36     42    31       39     not enrolled  0
164  male    middle  public  vocation  31    36     46    39       46     not enrolled  0
133  male    middle  public  vocation  50    31     40    34       31     not enrolled  0
2    female  middle  public  vocation  39    41     33    42       41     not enrolled  0
53   male    middle  public  vocation  34    37     46    39       31     not enrolled  0

ft |>
  #add header row
  add_header_row(
    colwidths = c(3, 2, 5, 2),
    values = c("Student", "School", "Grades", "Achievements")) |>
  #Use theme_vanilla
  theme_vanilla() |>
  #Add footer
  add_footer_lines("This data is simulated and it is not real") |>
  color(part = "footer", color = "#666666") |>
  #set Caption
  set_caption(caption = "First 10 rows of a sample of high school data") |>
  #Align header to center
  align(align = "center", part = "header", i = 1)
First 10 rows of a sample of high school data

Student              School            Grades                             Achievements
id   female  ses     schtyp  prog      read  write  math  science  socst  honors        awards
45   female  low     public  vocation  34    35     41    29       26     not enrolled  0
108  male    middle  public  general   34    33     41    36       36     not enrolled  0
15   male    high    public  vocation  39    39     44    26       42     not enrolled  0
67   male    low     public  vocation  37    37     42    33       32     not enrolled  0
153  male    middle  public  vocation  39    31     40    39       51     not enrolled  0
51   female  high    public  general   42    36     42    31       39     not enrolled  0
164  male    middle  public  vocation  31    36     46    39       46     not enrolled  0
133  male    middle  public  vocation  50    31     40    34       31     not enrolled  0
2    female  middle  public  vocation  39    41     33    42       41     not enrolled  0
53   male    middle  public  vocation  34    37     46    39       31     not enrolled  0

This data is simulated and it is not real
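
Conditional formatting, listed among the advantages of flextable above, is not shown in the examples so far. A minimal sketch could look like the following; it uses the built-in mtcars data, and the cutoff of 21 and the highlight color are arbitrary choices made for illustration.

```r
library(flextable)

# Small illustrative data set with the car model as a regular column
cars6 <- head(mtcars[, c("mpg", "cyl", "hp")])
cars6 <- cbind(model = rownames(cars6), cars6)

ft_cond <- flextable(cars6) |>
  # Highlight mpg cells above 21; the formula is evaluated on the data
  bg(i = ~ mpg > 21, j = "mpg", bg = "#FFF2CC") |>
  bold(i = ~ mpg > 21, j = "mpg") |>
  # Adjust column widths to the content
  autofit()
ft_cond

# Optionally write the table to a Word file (path is a placeholder):
# save_as_docx(ft_cond, path = "table.docx")
```

The row selector i accepts a one-sided formula, so the condition stays next to the styling call instead of being computed separately.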

The flextable package will not aggregate data for you, but it will help you present aggregated data. It does, however, have some useful functions for generating descriptive statistics.

Cross tab with proc_freq()

The function proc_freq() computes a contingency table and creates a flextable from the result. Its aim is to reproduce the output of SAS PROC FREQ.

proc_freq(hsb, "ses", "honors",
          include.row_percent = TRUE,
          include.column_percent = TRUE,
          include.table_percent = TRUE)

ses                   honors
                      enrolled       not enrolled   Total
high    Count         26 (13.0%)     32 (16.0%)     58 (29.0%)
        Mar. pct (1)  49.1% ; 44.8%  21.8% ; 55.2%
low     Count         11 (5.5%)      36 (18.0%)     47 (23.5%)
        Mar. pct      20.8% ; 23.4%  24.5% ; 76.6%
middle  Count         16 (8.0%)      79 (39.5%)     95 (47.5%)
        Mar. pct      30.2% ; 16.8%  53.7% ; 83.2%
Total   Count         53 (26.5%)     147 (73.5%)    200 (100.0%)

(1) Columns and rows percentages

The flextable package offers much more flexibility than we can cover in this workshop, especially when used in conjunction with other packages.

For more on flextable you can check the links below:

Using flextable

flextable cheat sheet

Function reference (manuals)

flextable gallery

DT

The R package DT provides an R interface to the JavaScript library DataTables. R data objects (matrices or data frames) can be displayed as tables on HTML pages, and DataTables adds filtering, pagination, sorting, and many other interactive features to those tables.

Key Features of DT Package

  • Interactivity: DT creates interactive tables with features like sorting and searching, ideal for web use and Shiny apps. flextable and kableExtra focus on static tables.

  • JavaScript Integration: DT leverages DataTables for advanced client-side features like inline editing and exporting, making it great for web applications.

  • Ease of Use for Web Applications: DT is well suited to web applications and is easy to use and implement.

The main function in this package is datatable(). It creates an HTML widget to display R data objects with DataTables.

#The diamonds data set comes from package ggplot2
datatable(diamonds[1:200,])

If you are familiar with the DataTables JavaScript library, you may use the options argument to customize the table.

We can add a filter argument in datatable() to automatically generate column filters. By default, the filters are not shown since filter = "none". You can enable them with filter = "top" or filter = "bottom".

#Add filter
datatable(diamonds[1:200,], filter = 'top', options = list(
  pageLength = 5, autoWidth = TRUE
))
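
Beyond filters and paging, DT has helper functions for formatting individual columns. The sketch below uses the built-in mtcars data so that it does not depend on ggplot2; the cutoff of 25 and the chosen columns are illustrative assumptions.

```r
library(DT)

dt_tbl <- datatable(
  mtcars[, c("mpg", "wt", "hp")],
  filter  = "top",
  options = list(pageLength = 5)
) |>
  # Show weight with one decimal place
  formatRound(columns = "wt", digits = 1) |>
  # Bold mpg values above 25, normal weight otherwise
  formatStyle("mpg", fontWeight = styleInterval(25, c("normal", "bold")))
dt_tbl
```

The formatting is applied in the browser, so the underlying data keep their full precision for sorting and filtering.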

For more examples and options for package DT you can check the link below:

DT: An R interface to the DataTables library

gt

Package gt draws a distinction between data tables (e.g., tibbles and data.frames) and the presentation and summary tables built for display.

Advantages of the gt Package

  • Customization:

    gt provides extensive options for customizing table appearance, including fonts, colors, borders, and spacing. This allows for creating visually appealing and professionally formatted tables.

  • Easy to Use:

    The package has a user-friendly syntax that simplifies the creation of complex tables. It’s designed to be intuitive and easy to learn, making table creation straightforward.

  • Integration with RMarkdown:

    gt integrates well with RMarkdown, enabling you to include sophisticated tables in dynamic documents. It supports rendering in HTML and integrates seamlessly into RMarkdown reports.

  • Publication-Ready Tables:

    gt is designed for generating publication-quality tables that are clean and well-formatted. It’s ideal for academic papers, reports, and presentations where table aesthetics are important.

Here we run one simple example from the package reference page. The package gt is very similar to package flextable, but it currently supports HTML, LaTeX, and RTF output, whereas flextable is compatible with Microsoft software such as Word and PowerPoint.

# Modify the `airquality` dataset by adding the year
# of the measurements (1973) and limiting to 10 rows
airquality_m <- 
  airquality |>
  #add year 1973
  mutate(Year = 1973L) |>
  #select the first 10 rows
  slice(1:10)
  
# Create a display table using the `airquality`
# dataset; arrange columns into groups
gt_tbl <- 
  gt(airquality_m)
#Print gt table
gt_tbl
Ozone Solar.R Wind Temp Month Day Year
41 190 7.4 67 5 1 1973
36 118 8.0 72 5 2 1973
12 149 12.6 74 5 3 1973
18 313 11.5 62 5 4 1973
NA NA 14.3 56 5 5 1973
28 NA 14.9 66 5 6 1973
23 299 8.6 65 5 7 1973
19 99 13.8 59 5 8 1973
8 19 20.1 61 5 9 1973
NA 194 8.6 69 5 10 1973
gt_tbl |>
  #Add title and subtitle 
  tab_header(
    title = "New York Air Quality Measurements",
    subtitle = "Daily measurements in New York City (May 1-10, 1973)"
  ) |>
  #Span columns 
  tab_spanner(
    label = "Time",
    columns = c(Year, Month, Day)
  ) |>
  tab_spanner(
    label = "Measurement",
    columns = c(Ozone, Solar.R, Wind, Temp)
  )
New York Air Quality Measurements
Daily measurements in New York City (May 1-10, 1973)
Measurement Time
Ozone Solar.R Wind Temp Year Month Day
41 190 7.4 67 1973 5 1
36 118 8.0 72 1973 5 2
12 149 12.6 74 1973 5 3
18 313 11.5 62 1973 5 4
NA NA 14.3 56 1973 5 5
28 NA 14.9 66 1973 5 6
23 299 8.6 65 1973 5 7
19 99 13.8 59 1973 5 8
8 19 20.1 61 1973 5 9
NA 194 8.6 69 1973 5 10
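
The NA values in the output above can be replaced with a display text, and numeric columns can be given a fixed number of decimals. The following is a minimal sketch using gt formatting functions; the replacement text and the source note wording are illustrative choices.

```r
library(gt)

gt_fmt <- head(airquality, 10) |>
  gt() |>
  # Show wind speed with one decimal place
  fmt_number(columns = Wind, decimals = 1) |>
  # Replace missing values in all columns with a display text
  sub_missing(columns = everything(), missing_text = "n/a") |>
  # Add a note below the table
  tab_source_note(source_note = "Source: built-in airquality data set")
gt_fmt
```

These calls change only how the values are displayed; the data stored in the gt object are left as they were.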

The reference for package gt

gt package

gtExtras

Package gtExtras provides additional functions to assist with package gt, especially if you want to include plots in your tables.

Overall, there are four families of functions in gtExtras:

  • Themes: 7 themes that style almost every element of a gt table, built off of data journalism-styled tables

  • Utilities: Helper functions for aligning/padding numbers, adding fontawesome icons, images, highlighting, dividers, styling by group, creating two tables or two column layouts, extracting ordered data from a gt table internals, or generating a random dataset.

  • Plotting: 12 plotting functions for inline sparklines, win-loss charts, distributions (density/histogram), percentiles, dot + bar, bar charts, confidence intervals, or summarizing an entire dataframe!

  • Colors: 3 functions, a palette for “Hulk” style scale (purple/green), coloring rows with good defaults from paletteer, or adding a “color box” along with the cell value

gt_tbl %>% 
  #Use theme NYT
  gt_theme_nytimes() %>% 
  #Change header title
  tab_header(title = "Table styled like the NY Times") %>% 
  #Hulk data_color
  #Trim provides a tighter range of purple/green
  gt_hulk_col_numeric(Ozone, trim = TRUE)
Table styled like the NY Times
Ozone Solar.R Wind Temp Month Day Year
41 190 7.4 67 5 1 1973
36 118 8.0 72 5 2 1973
12 149 12.6 74 5 3 1973
18 313 11.5 62 5 4 1973
NA NA 14.3 56 5 5 1973
28 NA 14.9 66 5 6 1973
23 299 8.6 65 5 7 1973
19 99 13.8 59 5 8 1973
8 19 20.1 61 5 9 1973
NA 194 8.6 69 5 10 1973
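
The plotting family can be illustrated with an inline sparkline. This is a small sketch under our own assumptions rather than an example from the package page: we summarize airquality by month, and the column names mean_temp and temp_trend are made up for the illustration. gt_plt_sparkline() expects a list column, so the daily temperatures of each month are stored as a list.

```r
library(gt)
library(gtExtras)
library(dplyr)

# One row per month, with the daily temperatures stored in a list column
temp_by_month <- airquality |>
  group_by(Month) |>
  summarize(
    mean_temp  = mean(Temp),
    temp_trend = list(Temp),
    .groups = "drop"
  )

spark_tbl <- temp_by_month |>
  gt() |>
  # One decimal place for the monthly mean
  fmt_number(columns = mean_temp, decimals = 1) |>
  # Draw each list of temperatures as an inline sparkline
  gt_plt_sparkline(temp_trend)

spark_tbl
```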

For more options and features of the gtExtras package, check the package references:

Plotting with gtExtras

Beautiful tables in R with gtExtras

Summary tables

There are some packages in R that will create publication-ready summary tables in R with minimal effort. Those packages have built-in functions to generate standard format summary tables.

In this part we are going to use the R packages table1, sjPlot, and gtsummary.

The advantage of these packages is that you do not need to write extra data-preparation code to first summarize the data and the analysis results in a data.frame. The trade-off is that we lose some flexibility, and the tables are more difficult to customize.

“Table 1” in Statistical Analysis and the table1 Package

In journal articles, particularly in fields like epidemiology and health data, the first table (commonly referred to as “Table 1”) presents the descriptive statistics of the baseline characteristics of the study sample. This table is typically stratified by one or more grouping variables, such as treatment groups or demographic categories.

The table1 package in R simplifies the creation of such tables.

Key Features of table1 Package:

  • Descriptive Statistics: Provides means, medians, standard deviations, and proportions for various variables.

  • Stratification: Allows for grouping and stratifying by one or more categorical variables.

  • Customization: The package offers options for customizing the appearance and content of the table to meet publication standards, but customization is not straightforward.

  • Easy to use: A basic table needs only a single call to table1() with a formula.

  • Converting to other packages: One advantage of this package is that it is possible (with some limitations) to convert the output of table1() to a data.frame, kableExtra or flextable, using the functions as.data.frame(), t1kable() and t1flex() respectively.
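
The conversion mentioned in the last bullet can be sketched as follows. The toy data frame and its column names are hypothetical and serve only to keep the example self-contained; the point is that the result of table1() can be turned into an ordinary data.frame and handled with any other tool.

```r
library(table1)

# Hypothetical toy data, used only for illustration
toy <- data.frame(
  group = factor(rep(c("A", "B"), each = 4)),
  score = c(10, 12, 9, 11, 15, 14, 16, 13)
)

# Descriptive table of score, stratified by group
toy_tab <- table1(~ score | group, data = toy)

# Convert the table to a plain data.frame of formatted cells
toy_df <- as.data.frame(toy_tab)
toy_df
```

The cells of the converted data.frame hold formatted text (for example a mean and SD together), so the conversion is meant for presentation rather than for further calculation.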

Example of table1

The data used for this example are from the package boot and are called melanoma. The data consist of measurements made on patients with malignant melanoma.

The grouping variable is the patient’s status at the end of the study: 1 indicates that they had died from melanoma, 2 indicates that they were still alive, and 3 indicates that they had died from causes unrelated to their melanoma.

melanoma1 <- melanoma
# Change status to factor
melanoma1$status <- 
  factor(melanoma1$status, 
         levels=c(2,1,3),
         labels=c("Alive", # Reference
                  "Melanoma death", 
                  "Non-melanoma death"))
# Change sex to factor and label the levels
# (in boot::melanoma, sex is coded 0 = female, 1 = male)
melanoma1$sex <- 
  factor(melanoma1$sex, levels = c(0, 1),
         labels = c("Female", "Male"))
# Change ulcer to factor and label them  
melanoma1$ulcer <- 
  factor(melanoma1$ulcer, labels = c("Absent", 
                  "Present"))
#Basic table 1
(table1.1 <- table1(~ sex + age + ulcer + thickness | status, data=melanoma1))
Alive
(N=134)
Melanoma death
(N=57)
Non-melanoma death
(N=14)
Overall
(N=205)
sex
Female 91 (67.9%) 28 (49.1%) 7 (50.0%) 126 (61.5%)
Male 43 (32.1%) 29 (50.9%) 7 (50.0%) 79 (38.5%)
age
Mean (SD) 50.0 (15.9) 55.1 (17.9) 65.3 (10.9) 52.5 (16.7)
Median [Min, Max] 52.0 [4.00, 84.0] 56.0 [14.0, 95.0] 65.0 [49.0, 86.0] 54.0 [4.00, 95.0]
ulcer
Absent 92 (68.7%) 16 (28.1%) 7 (50.0%) 115 (56.1%)
Present 42 (31.3%) 41 (71.9%) 7 (50.0%) 90 (43.9%)
thickness
Mean (SD) 2.24 (2.33) 4.31 (3.57) 3.72 (3.63) 2.92 (2.96)
Median [Min, Max] 1.36 [0.100, 12.9] 3.54 [0.320, 17.4] 2.26 [0.160, 12.6] 1.94 [0.100, 17.4]

To improve the table, we label each variable the way we want and specify units for the continuous variables (age and thickness); the categorical variables (sex and ulcer) were already converted to factors with descriptive labels above. We also label the overall column “Total”, position it on the left, and add a caption and a footnote.

melanoma2 <- melanoma1
#Label the variables name
label(melanoma2$sex)       <- "Sex"
label(melanoma2$age)       <- "Age"
label(melanoma2$ulcer)     <- "Ulceration"
#I Use asterisk for footnote!
label(melanoma2$thickness) <- "Thickness *"
#Assign unit to age and thickness
units(melanoma2$age)       <- "years"
units(melanoma2$thickness) <- "mm"
#create caption
caption  <- "Descriptive statistics of patients characteristics by status"
#create footnote
footnote <- "* Also known as Breslow thickness"
#Create table1
table1(~ sex + age + ulcer + thickness | status, data=melanoma2,
    overall=c(left="Total"), caption=caption, footnote=footnote)
Descriptive statistics of patients characteristics by status
Total
(N=205)
Alive
(N=134)
Melanoma death
(N=57)
Non-melanoma death
(N=14)

* Also known as Breslow thickness

Sex
Female 126 (61.5%) 91 (67.9%) 28 (49.1%) 7 (50.0%)
Male 79 (38.5%) 43 (32.1%) 29 (50.9%) 7 (50.0%)
Age (years)
Mean (SD) 52.5 (16.7) 50.0 (15.9) 55.1 (17.9) 65.3 (10.9)
Median [Min, Max] 54.0 [4.00, 95.0] 52.0 [4.00, 84.0] 56.0 [14.0, 95.0] 65.0 [49.0, 86.0]
Ulceration
Absent 115 (56.1%) 92 (68.7%) 16 (28.1%) 7 (50.0%)
Present 90 (43.9%) 42 (31.3%) 41 (71.9%) 7 (50.0%)
Thickness * (mm)
Mean (SD) 2.92 (2.96) 2.24 (2.33) 4.31 (3.57) 3.72 (3.63)
Median [Min, Max] 1.94 [0.100, 17.4] 1.36 [0.100, 12.9] 3.54 [0.320, 17.4] 2.26 [0.160, 12.6]

Next, we group the two “Death” strata (Melanoma and Non-melanoma) together under a common heading.

#Label the variables sex, age, ulcer, and thickness
#Add groups label
labels <- list(
    variables=list(sex="Sex",
                   age="Age (years)",
                   ulcer="Ulceration",
                   thickness="Thickness* (mm)"),
    groups=list("", "", "Death"))

# Remove the word "death" from the labels, since it now appears above
levels(melanoma2$status) <- c("Alive", "Melanoma", "Non-melanoma")
#Set up our “strata”, or column, as a list of data.frame
strata <- c(list(Total=melanoma2), split(melanoma2, melanoma2$status))
#Create new table1
table1(strata, labels, groupspan=c(1, 1, 2), caption=caption, footnote=footnote)
Descriptive statistics of patients characteristics by status
Death
Total
(N=205)
Alive
(N=134)
Melanoma
(N=57)
Non-melanoma
(N=14)

* Also known as Breslow thickness

Sex
Female 126 (61.5%) 91 (67.9%) 28 (49.1%) 7 (50.0%)
Male 79 (38.5%) 43 (32.1%) 29 (50.9%) 7 (50.0%)
Age (years)
Mean (SD) 52.5 (16.7) 50.0 (15.9) 55.1 (17.9) 65.3 (10.9)
Median [Min, Max] 54.0 [4.00, 95.0] 52.0 [4.00, 84.0] 56.0 [14.0, 95.0] 65.0 [49.0, 86.0]
Ulceration
Absent 115 (56.1%) 92 (68.7%) 16 (28.1%) 7 (50.0%)
Present 90 (43.9%) 42 (31.3%) 41 (71.9%) 7 (50.0%)
Thickness* (mm)
Mean (SD) 2.92 (2.96) 2.24 (2.33) 4.31 (3.57) 3.72 (3.63)
Median [Min, Max] 1.94 [0.100, 17.4] 1.36 [0.100, 12.9] 3.54 [0.320, 17.4] 2.26 [0.160, 12.6]

As you can see, customizing the table within table1 in the last step is not easy and needs some extra work.

Converting to flextable

Now let’s try to convert the table to a flextable and customize it there!

#Converting to flextable
tab1.flex <- table1.1 |> t1flex() 
#Print 
tab1.flex

 

                     Alive               Melanoma death      Non-melanoma death  Overall
                     (N=134)             (N=57)              (N=14)              (N=205)
sex
  Female             91 (67.9%)          28 (49.1%)          7 (50.0%)           126 (61.5%)
  Male               43 (32.1%)          29 (50.9%)          7 (50.0%)           79 (38.5%)
age
  Mean (SD)          50.0 (15.9)         55.1 (17.9)         65.3 (10.9)         52.5 (16.7)
  Median [Min, Max]  52.0 [4.00, 84.0]   56.0 [14.0, 95.0]   65.0 [49.0, 86.0]   54.0 [4.00, 95.0]
ulcer
  Absent             92 (68.7%)          16 (28.1%)          7 (50.0%)           115 (56.1%)
  Present            42 (31.3%)          41 (71.9%)          7 (50.0%)           90 (43.9%)
thickness
  Mean (SD)          2.24 (2.33)         4.31 (3.57)         3.72 (3.63)         2.92 (2.96)
  Median [Min, Max]  1.36 [0.100, 12.9]  3.54 [0.320, 17.4]  2.26 [0.160, 12.6]  1.94 [0.100, 17.4]

#Modify tab1.flex
tab1.flex |>
  #Add a header row
  add_header_row(
    values = c("", "Death", ""),  # Labels for the top header row
    colwidths = c(2, 2, 1)        # Number of columns spanned by each header
  ) |>
  #First remove the border line under the new header row
  hline(part = "header", i = 1, border = officer::fp_border(width = 0)) |>
  #Then draw a line over Melanoma and Non-melanoma
  hline(part = "header", i = 1, border = officer::fp_border(width = 1.5), j = 3:4) |>
  #Change labels for rows
  compose(i = 1, j = 1, as_paragraph(as_chunk("SEX"))) |>
  compose(i = 4, j = 1, as_paragraph(as_chunk("AGE (years)"))) |>
  compose(i = 7, j = 1, as_paragraph(as_chunk("Ulceration"))) |>
  compose(i = 10, j = 1, as_paragraph(as_chunk("Thickness (mm)"))) |>
  #Add caption (the table is passed in by the pipe, so no object name is needed)
  set_caption(caption = "Table 1: Descriptive statistics of patients characteristics by status") |>
  #Add footnote
  footnote(i = 10, j = 1,
    ref_symbols = "a",
    value = as_paragraph("Also known as Breslow thickness")
  ) |>
  #Set the font size of the footnote
  fontsize(i = 1, j = 1, size = 9, part = "footer")
Table 1: Descriptive statistics of patients characteristics by status

                                         Death
                     Alive               Melanoma death      Non-melanoma death  Overall
                     (N=134)             (N=57)              (N=14)              (N=205)
SEX
  Female             91 (67.9%)          28 (49.1%)          7 (50.0%)           126 (61.5%)
  Male               43 (32.1%)          29 (50.9%)          7 (50.0%)           79 (38.5%)
AGE (years)
  Mean (SD)          50.0 (15.9)         55.1 (17.9)         65.3 (10.9)         52.5 (16.7)
  Median [Min, Max]  52.0 [4.00, 84.0]   56.0 [14.0, 95.0]   65.0 [49.0, 86.0]   54.0 [4.00, 95.0]
Ulceration
  Absent             92 (68.7%)          16 (28.1%)          7 (50.0%)           115 (56.1%)
  Present            42 (31.3%)          41 (71.9%)          7 (50.0%)           90 (43.9%)
Thickness (mm) (a)
  Mean (SD)          2.24 (2.33)         4.31 (3.57)         3.72 (3.63)         2.92 (2.96)
  Median [Min, Max]  1.36 [0.100, 12.9]  3.54 [0.320, 17.4]  2.26 [0.160, 12.6]  1.94 [0.100, 17.4]

(a) Also known as Breslow thickness

This approach takes somewhat more lines of code, but the steps are more straightforward and clear.

To learn more about package table1 see the link below:

Using the table1 Package to Create HTML Tables of Descriptive Statistics

sjPlot

The sjPlot package is a collection of plotting and table output functions for data visualization.

Results of various statistical analyses (that are commonly used in social sciences) can be visualized using this package, including simple and cross tabulated frequencies, linear models, glm models, mixed effects models, PCA and correlation matrices, cluster analyses, and much more.

Key Features of sjPlot:

  • Cross tabulation: tab_xtab() creates cross-tabulations with options for adding row and column percentages.

  • Regression Tables: tab_model() creates tables of regression models with detailed statistical summaries, including coefficients, standard errors, p-values, and confidence intervals. It supports various model types such as linear, logistic, and mixed-effects models.

  • Multiple models: Combine results from multiple models into a single table for comparative analysis.

Cross tabulation with tab_xtab()

#Cross tab of ses by female for the hsb data, with column percentages
tab_xtab(var.row = hsb$ses, var.col = hsb$female, 
         show.col.prc = TRUE)
ses       female                      Total
          female        male
high      29 (26.6 %)   29 (31.9 %)   58 (29 %)
low       32 (29.4 %)   15 (16.5 %)   47 (23.5 %)
middle    48 (44 %)     47 (51.6 %)   95 (47.5 %)
Total     109 (100 %)   91 (100 %)    200 (100 %)
χ2=4.577 · df=2 · Cramer’s V=0.151 · p=0.101
tab_xtab(var.row = hsb$ses, var.col = hsb$female, 
         show.row.prc = TRUE,
         statistics = "phi")
ses       female                      Total
          female        male
high      29 (50 %)     29 (50 %)     58 (100 %)
low       32 (68.1 %)   15 (31.9 %)   47 (100 %)
middle    48 (50.5 %)   47 (49.5 %)   95 (100 %)
Total     109 (54.5 %)  91 (45.5 %)   200 (100 %)
χ2=4.577 · df=2 · φ=0.151 · p=0.101

Regression Tables with tab_model()

#Run poisson model of awards on math, read, and ses
m.pois <- glm(awards ~ math + read + ses, family = poisson(), data = hsb)
#Print using tab_model
tab_model(m.pois, dv.labels = c("Poisson Model"))
  Poisson Model
Predictors Incidence Rate Ratios CI p
(Intercept) 0.03 0.01 – 0.07 <0.001
math 1.05 1.03 – 1.06 <0.001
read 1.03 1.01 – 1.04 <0.001
ses [low] 0.89 0.65 – 1.22 0.491
ses [middle] 0.78 0.61 – 1.00 0.049
Observations 200
R2 Nagelkerke 0.629
#Print using tab_model with cluster-robust covariance matrix estimation
tab_model(m.pois, vcov.fun = "CL", 
  vcov.args = list(type = "HC1", cluster = hsb$cid), 
  dv.labels = c("Poisson With Cluster-Robust Covariance Matrix"))
  Poisson With Cluster-Robust Covariance Matrix
Predictors Incidence Rate Ratios CI p
(Intercept) 0.03 0.01 – 0.07 <0.001
math 1.05 1.03 – 1.06 <0.001
read 1.03 1.01 – 1.04 <0.001
ses [low] 0.89 0.65 – 1.22 0.487
ses [middle] 0.78 0.61 – 1.00 0.035
Observations 200
R2 Nagelkerke 0.629
#Run negative binomial model of awards on math, read, and ses
m.nbin <- glm.nb(awards ~ math + read + ses, data = hsb)

#Print the two models together in one table and add AIC and deviance
tab_model(m.pois, m.nbin, vcov.fun = "CL", 
  vcov.args = list(type = "HC1", cluster = hsb$cid), show.dev = TRUE, show.aic = TRUE,
  dv.labels = c("Poisson",
    "Negative-binomial"))
  Poisson Negative-binomial
Predictors Incidence Rate Ratios CI p Incidence Rate Ratios CI p
(Intercept) 0.03 0.01 – 0.07 <0.001 0.03 0.01 – 0.06 <0.001
math 1.05 1.03 – 1.06 <0.001 1.05 1.03 – 1.07 <0.001
read 1.03 1.01 – 1.04 <0.001 1.03 1.01 – 1.05 <0.001
ses [low] 0.89 0.65 – 1.22 0.487 0.89 0.62 – 1.27 0.484
ses [middle] 0.78 0.61 – 1.00 0.035 0.79 0.60 – 1.04 0.040
Observations 200 200
R2 Nagelkerke 0.629 0.595
Deviance 256.818 221.505
AIC 613.047 611.548

Summary of Regression Models as HTML Table

gtsummary

The gtsummary package is designed to create summary tables for a variety of statistical analyses. It focuses on making publication-ready tables that are easy to generate and aesthetically pleasing.

Key Advantages:

  • Ease of Use: Minimal coding required to generate complex, publication-quality tables.

  • Flexibility: The ability to customize tables to suit the needs of different publications or audiences.

  • Integrated Statistical Reporting: Automatically includes relevant statistics such as p-values, confidence intervals, and effect sizes.

  • Exportability: Ability to export tables to various formats for different types of reports and manuscripts. You can convert gtsummary tables to gt or flextable objects for further customization or export to Microsoft Word and PowerPoint.

  • gtsummary themes: It’s possible to set themes in gtsummary. The themes control many aspects of how a table is printed.
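
The export path described above can be sketched in a few lines. This is an illustration under stated assumptions: it uses the trial example dataset that ships with gtsummary instead of the workshop data, and the output file name is made up.

```r
library(gtsummary)
library(flextable)

# `trial` is an example dataset shipped with gtsummary
sum_tbl <- tbl_summary(trial, include = c(age, grade), by = trt)

# Convert the gtsummary table to a flextable object
sum_ft <- as_flex_table(sum_tbl)

# Write the flextable to a Word document (hypothetical file name)
out_file <- file.path(tempdir(), "summary_table.docx")
save_as_docx(sum_ft, path = out_file)
```

Once converted, the table is a regular flextable, so the flextable functions shown earlier in the workshop can be applied to it before saving.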

Here are some key features of using gtsummary:

  • Descriptive summary tables: tbl_summary() automatically creates summary statistics tables for data frames, stratified by groups (if desired), with options for mean, median, standard deviation, proportions, and more. Similar to table1 package, this is particularly useful for creating “Table 1” in medical and epidemiological research.

  • Tables for statistical tests: tbl_cross() creates cross-tabulations (contingency tables) with statistical tests like chi-square or Fisher’s exact test.

  • Summary tables for regression models: tbl_regression() generates detailed tables from regression models, including coefficients, confidence intervals, p-values, and more.

Descriptive summary tables

#Use tbl_summary for summary statistics
tab1_gt <- tbl_summary(melanoma1, include = -c(time, year), by = status)
#Print
tab1_gt 
Characteristic Alive
N = 134
1
Melanoma death
N = 57
1
Non-melanoma death
N = 14
1
sex


    Female 91 (68%) 28 (49%) 7 (50%)
    Male 43 (32%) 29 (51%) 7 (50%)
age 52 (40, 62) 56 (44, 68) 65 (56, 72)
thickness 1.36 (0.81, 2.90) 3.54 (2.24, 4.84) 2.26 (1.29, 6.12)
ulcer


    Absent 92 (69%) 16 (28%) 7 (50%)
    Present 42 (31%) 41 (72%) 7 (50%)
1 n (%); Median (Q1, Q3)
#Test for a difference between groups
tab1_gt |>
#  Add P-value column
  add_p() |>
#Change header and Labels
 modify_header(label = "**Variable**") |>
  bold_labels()
Variable Alive
N = 134
1
Melanoma death
N = 57
1
Non-melanoma death
N = 14
1
p-value2
sex


0.033
    Female 91 (68%) 28 (49%) 7 (50%)
    Male 43 (32%) 29 (51%) 7 (50%)
age 52 (40, 62) 56 (44, 68) 65 (56, 72) 0.001
thickness 1.36 (0.81, 2.90) 3.54 (2.24, 4.84) 2.26 (1.29, 6.12) <0.001
ulcer


<0.001
    Absent 92 (69%) 16 (28%) 7 (50%)
    Present 42 (31%) 41 (72%) 7 (50%)
1 n (%); Median (Q1, Q3)
2 Pearson’s Chi-squared test; Kruskal-Wallis rank sum test
#Statistics for continuous variable to mean and sd
tbl_summary(melanoma1, include = -c(time, year), by = status,
  statistic = all_continuous() ~ "{mean} ({sd})") |>
#Add overall column
add_overall() |>
#Add CI (Wald method for the categorical variables)
add_ci(pattern = "{stat} ({ci})", 
       method = all_categorical() ~ "wald") |>
 #Update spanning headers
 modify_spanning_header(c("stat_2", "stat_3") ~ "**Death**") 
Characteristic Overall
N = 205 (95% CI)
1,2
Alive
N = 134 (95% CI)
1,2
Death
Melanoma death
N = 57 (95% CI)
1,2
Non-melanoma death
N = 14 (95% CI)
1,2
sex



    Female 126 (61%) (55%, 68%) 91 (68%) (60%, 76%) 28 (49%) (35%, 63%) 7 (50%) (20%, 80%)
    Male 79 (39%) (32%, 45%) 43 (32%) (24%, 40%) 29 (51%) (37%, 65%) 7 (50%) (20%, 80%)
age 52 (17) (50, 55) 50 (16) (47, 53) 55 (18) (50, 60) 65 (11) (59, 72)
thickness 2.92 (2.96) (2.5, 3.3) 2.24 (2.33) (1.8, 2.6) 4.31 (3.57) (3.4, 5.3) 3.72 (3.63) (1.6, 5.8)
ulcer



    Absent 115 (56%) (49%, 63%) 92 (69%) (60%, 77%) 16 (28%) (16%, 41%) 7 (50%) (20%, 80%)
    Present 90 (44%) (37%, 51%) 42 (31%) (23%, 40%) 41 (72%) (59%, 84%) 7 (50%) (20%, 80%)
1 n (%); Mean (SD)
2 CI = Confidence Interval
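
Variable labels, the number of digits, and the handling of missing values can also be set directly in tbl_summary(). The following is a minimal sketch, assuming the trial example dataset from gtsummary rather than the melanoma data; the label text is our own choice.

```r
library(gtsummary)

labelled_tbl <- tbl_summary(
  trial,
  include = c(age, marker, stage),
  by = trt,
  # Custom labels for two variables (label text chosen for illustration)
  label = list(age ~ "Age (years)", marker ~ "Marker level (ng/mL)"),
  # Mean (SD) instead of the default median (Q1, Q3)
  statistic = all_continuous() ~ "{mean} ({sd})",
  # One decimal place for continuous summaries
  digits = all_continuous() ~ 1,
  # Do not show rows with the number of missing values
  missing = "no"
)

labelled_tbl
```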

Cross table of categorical variables

tbl_cross(row = ses, col = honors,   percent = "row", data = hsb) |>
  add_p()
honors Total p-value1
enrolled not enrolled
ses


<0.001
    high 26 (45%) 32 (55%) 58 (100%)
    low 11 (23%) 36 (77%) 47 (100%)
    middle 16 (17%) 79 (83%) 95 (100%)
Total 53 (27%) 147 (74%) 200 (100%)
1 Pearson’s Chi-squared test
# Setting theme "Compact"
theme_gtsummary_compact()
## Setting theme "Compact"
tbl_cross(row = ses, col = honors,   percent = "row", data = hsb) |>
  add_p()
honors Total p-value1
enrolled not enrolled
ses


<0.001
    high 26 (45%) 32 (55%) 58 (100%)
    low 11 (23%) 36 (77%) 47 (100%)
    middle 16 (17%) 79 (83%) 95 (100%)
Total 53 (27%) 147 (74%) 200 (100%)
1 Pearson’s Chi-squared test

Formatted table of regression model results

#Running a logistic regression model
m2 <- glm(honors ~ math + ses, family = binomial(link = "logit"), data= hsb)
#gtsummary default theme
reset_gtsummary_theme()
tab1.glm <- tbl_regression(m2, exponentiate = TRUE)
tab1.glm 
Characteristic OR1 95% CI1 p-value
math 0.84 0.79, 0.88 <0.001
ses


    high
    low 1.05 0.35, 3.10 >0.9
    middle 3.44 1.43, 8.58 0.007
1 OR = Odds Ratio, CI = Confidence Interval
#Set theme to Journal of the American Medical Association—JAMA.
theme_gtsummary_journal(journal = "jama")
## Setting theme "JAMA"
tab1.glm 
Characteristic OR1 95% CI1 p-value
math 0.84 0.79, 0.88 <0.001
ses


    high
    low 1.05 0.35, 3.10 >0.9
    middle 3.44 1.43, 8.58 0.007
1 OR = Odds Ratio, CI = Confidence Interval
#We can convert our table to a gt table and modify the table output.
tab1.glm |>
  #add overall p-values for ses
  add_global_p() |>
  #bold p-value less than 0.01
  bold_p(t = 0.01) |>
  #convert our table to a gt table 
  as_gt() |>
  #add source note from package gt
  #md() interpret input text as Markdown-formatted text
  tab_source_note(md("*This data is simulated*"))
Characteristic OR1 95% CI1 p-value
math 0.84 0.79, 0.88 <0.001
ses

0.011
    high
    low 1.05 0.35, 3.10
    middle 3.44 1.43, 8.58
This data is simulated
1 OR = Odds Ratio, CI = Confidence Interval

inline_text()

Reproducible reports are an important part of good practices. We often need to report the results from a table in the text of an R markdown report. Inline reporting has been made simple with inline_text().

The inline_text.tbl_regression has the following format:

inline_text(
  x,
  variable,
  level = NULL,
  pattern = "{estimate} ({conf.level*100}% CI {conf.low}, {conf.high}; {p.value})",
  estimate_fun = x$inputs$estimate_fun,
  pvalue_fun = label_style_pvalue(prepend_p = TRUE),
  ...
)

We call inline_text() in inline R code, written between two backticks as ` r inline_text() `, to report a result from a gtsummary table.

For example, to report the OR from the regression table in the text, we can type:

For every unit increase in math score, we expect on average the odds of not being enrolled in the honors program to change by a factor of `r inline_text(tab1.glm, variable = math, pattern = "{estimate}; 95% CI ({conf.low}, {conf.high})")`, keeping ses constant.

In the report it will appear like this:

For every unit increase in math score, we expect on average the odds of not being enrolled in the honors program to change by a factor of 0.84; 95% CI (0.79, 0.88), keeping ses constant.
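
Before placing inline_text() in the text, it can help to run it in a regular code chunk and look at the string it returns. The sketch below is self-contained and therefore uses the trial example dataset from gtsummary, not the hsb model; the model itself is only an illustration.

```r
library(gtsummary)

# Logistic regression on the gtsummary example data
fit <- glm(response ~ age + grade, data = trial, family = binomial)
reg_tbl <- tbl_regression(fit, exponentiate = TRUE)

# Continuous variable: only `variable` is needed
age_txt <- inline_text(
  reg_tbl,
  variable = age,
  pattern = "{estimate}; 95% CI ({conf.low}, {conf.high})"
)

# Categorical variable: `level` selects which row of the table to report
grade_txt <- inline_text(reg_tbl, variable = grade, level = "III")

age_txt
grade_txt
```

Both calls return a character string, which is what R Markdown inserts into the sentence when the report is knitted.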

Reference

Sjoberg DD, Whiting K, Curry M, Lavery JA, Larmarange J. Reproducible summary tables with the gtsummary package. The R Journal 2021;13:570–80. https://doi.org/10.32614/RJ-2021-053.

Thanks!