Output Tables in R

Introduction

A table is a structured arrangement of data, typically organized in rows and columns. It helps you see and compare data easily, making it simpler to understand and communicate the results.

Key components of a statistical table:

Title: Clearly describes the content and context of the table.
Rows: Represent different categories, groups, or individual data points.
Columns: Indicate variables or measures being reported.
Cells: Contain the actual data values corresponding to the intersection of rows and columns.
Headings: Labels for rows and columns to clarify the data being presented.
Footnotes: Additional information or explanations about the data.

We will explore how to effectively generate and present data using R, with a focus on utilizing RMarkdown for creating professional reports.

In this workshop we will cover the R packages kableExtra, flextable, gt, gtExtras, DT, table1, sjPlot, and gtsummary.

Basic Report and Result Tables in R

A data.frame in R is a type of data structure used to store data in a table format. It is one of the most common and versatile structures in R, allowing for the storage of different types of data (e.g., numeric, character, factor) in a single object.

There are two generic functions in R that are commonly used to display output in the console.

The print() function is used to display the contents of an object in the console. It’s the most basic way to output data or results in R.
The summary() function provides a quick overview of the main statistical features of an object.

Both functions above work on various object types, such as vectors, data frames, and models.

Note: Implicit Printing: When we type an object’s name and run it, R internally calls the print() function to display the object’s contents. This is why you see the output in the console even if you don’t explicitly use print().
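The difference between implicit and explicit printing can be seen in a short sketch. The vector x below is made up for illustration and is not part of the workshop data.

```r
# A small made-up numeric vector (illustrative only)
x <- c(34, 39, 42, 50, 37)

# Implicit printing: typing the name at top level calls print() for us
x

# Explicit printing gives the same console output
print(x)

# Inside a loop or a function body nothing is auto-printed,
# so print() has to be called explicitly
for (i in 1:2) {
  print(x[i])
}

# summary() on a numeric vector returns the minimum, quartiles,
# mean, and maximum
s <- summary(x)
s
```

The same two functions dispatch to different methods depending on the class of the object, which is why summary() looks different for a vector, a data frame, and a fitted model.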

Review of R basic outputs

The first data set that we use in this workshop is hsbdemo, a sample of high school performance data for 200 students.

The first step in any statistical analysis is to understand our data.

Note: The datasets used in this workshop are not real and are intended solely to demonstrate statistical analysis.

#Read the data
hsb <- read.csv("https://stats.idre.ucla.edu/stat/data/hsbdemo.csv")
#Names of columns
names(hsb)

##  [1] "id"      "female"  "ses"     "schtyp"  "prog"    "read"    "write"  
##  [8] "math"    "science" "socst"   "honors"  "awards"  "cid"

#Structure of data.frame
str(hsb)

## 'data.frame':    200 obs. of  13 variables:
##  $ id     : int  45 108 15 67 153 51 164 133 2 53 ...
##  $ female : chr  "female" "male" "male" "male" ...
##  $ ses    : chr  "low" "middle" "high" "low" ...
##  $ schtyp : chr  "public" "public" "public" "public" ...
##  $ prog   : chr  "vocation" "general" "vocation" "vocation" ...
##  $ read   : int  34 34 39 37 39 42 31 50 39 34 ...
##  $ write  : int  35 33 39 37 31 36 36 31 41 37 ...
##  $ math   : int  41 41 44 42 40 42 46 40 33 46 ...
##  $ science: int  29 36 26 33 39 31 39 34 42 39 ...
##  $ socst  : int  26 36 42 32 51 39 46 31 41 31 ...
##  $ honors : chr  "not enrolled" "not enrolled" "not enrolled" "not enrolled" ...
##  $ awards : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ cid    : int  1 1 1 1 1 1 1 1 1 1 ...

#Change categorical variables from character to factors
hsb <- within(hsb,{
             female <- factor(female)
             ses <- factor(ses)
             schtyp <- factor(schtyp)
             prog <- factor(prog)
             honors <- factor(honors)
             })
#Print first 6 rows of data
hsb6 <- head(hsb)
print(hsb6)

##    id female    ses schtyp     prog read write math science socst       honors
## 1  45 female    low public vocation   34    35   41      29    26 not enrolled
## 2 108   male middle public  general   34    33   41      36    36 not enrolled
## 3  15   male   high public vocation   39    39   44      26    42 not enrolled
## 4  67   male    low public vocation   37    37   42      33    32 not enrolled
## 5 153   male middle public vocation   39    31   40      39    51 not enrolled
## 6  51 female   high public  general   42    36   42      31    39 not enrolled
##   awards cid
## 1      0   1
## 2      0   1
## 3      0   1
## 4      0   1
## 5      0   1
## 6      0   1

Printing the first 6 rows of the data gives a tabular display of the first 6 observations.

We can use the summary() function to report summary statistics. For example, we can get summary statistics for the students’ scores and honors enrollment.

#Summary statistics
summary(hsb[c("read", "write", "math", "science", "socst", "honors")])

##       read           write            math          science     
##  Min.   :28.00   Min.   :31.00   Min.   :33.00   Min.   :26.00  
##  1st Qu.:44.00   1st Qu.:45.75   1st Qu.:45.00   1st Qu.:44.00  
##  Median :50.00   Median :54.00   Median :52.00   Median :53.00  
##  Mean   :52.23   Mean   :52.77   Mean   :52.65   Mean   :51.85  
##  3rd Qu.:60.00   3rd Qu.:60.00   3rd Qu.:59.00   3rd Qu.:58.00  
##  Max.   :76.00   Max.   :67.00   Max.   :75.00   Max.   :74.00  
##      socst                honors   
##  Min.   :26.00   enrolled    : 53  
##  1st Qu.:46.00   not enrolled:147  
##  Median :52.00                     
##  Mean   :52.41                     
##  3rd Qu.:61.00                     
##  Max.   :71.00
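The honors column above is summarized as counts because it was converted to a factor earlier. The sketch below, on a small made-up data frame (not the hsb data), shows how summary() treats the same values as a character vector and as a factor.

```r
# Made-up data for illustration only
df <- data.frame(
  score  = c(52, 61, 47, 58),
  honors = c("enrolled", "not enrolled", "not enrolled", "enrolled"),
  stringsAsFactors = FALSE
)

# As a character vector, summary() only reports length, class, and mode
chr_summary <- summary(df$honors)
chr_summary

# As a factor, summary() reports the count in each level
df$honors <- factor(df$honors)
fct_summary <- summary(df$honors)
fct_summary
```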

Contingency Table with `table()` and `xtabs()`

The table() function from base R creates frequency tables that summarize categorical data. We can also use the xtabs() function from the stats package, which takes a formula.

Here we want to make a cross-tabulation (contingency table) of the variables ses and honors.

#Two-way contingency table
tab1 <- table(hsb$ses, hsb$honors)
tab1

##         
##          enrolled not enrolled
##   high         26           32
##   low          11           36
##   middle       16           79

#Proportional table
prop.table(tab1)

##         
##          enrolled not enrolled
##   high      0.130        0.160
##   low       0.055        0.180
##   middle    0.080        0.395

#Proportional table by row
prop.table(tab1, margin = 1)

##         
##           enrolled not enrolled
##   high   0.4482759    0.5517241
##   low    0.2340426    0.7659574
##   middle 0.1684211    0.8315789

#Two-way contingency table with xtabs()
tab2 <- xtabs(~ ses + honors, data = hsb)
tab2

##         honors
## ses      enrolled not enrolled
##   high         26           32
##   low          11           36
##   middle       16           79

#Proportional table
prop.table(tab2)

##         honors
## ses      enrolled not enrolled
##   high      0.130        0.160
##   low       0.055        0.180
##   middle    0.080        0.395

#Proportional table by row
prop.table(tab2, margin = 1)

##         honors
## ses       enrolled not enrolled
##   high   0.4482759    0.5517241
##   low    0.2340426    0.7659574
##   middle 0.1684211    0.8315789

#Summary on a table object will perform a chi-squared test
summary(tab2)

## Call: xtabs(formula = ~ses + honors, data = hsb)
## Number of cases in table: 200 
## Number of factors: 2 
## Test for independence of all factors:
##  Chisq = 14.783, df = 2, p-value = 0.0006164

#Three-way cross tab
tab3 <- xtabs(~ ses + honors + female, data = hsb)
tab3

## , , female = female
## 
##         honors
## ses      enrolled not enrolled
##   high         15           14
##   low          10           22
##   middle       10           38
## 
## , , female = male
## 
##         honors
## ses      enrolled not enrolled
##   high         11           18
##   low           1           14
##   middle        6           41

#Flatten the three-way table into a compact two-dimensional layout
ftable(tab3)

##                     female female male
## ses    honors                         
## high   enrolled                15   11
##        not enrolled            14   18
## low    enrolled                10    1
##        not enrolled            22   14
## middle enrolled                10    6
##        not enrolled            38   41
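Two further base R helpers are often useful with a contingency table: addmargins() for totals and chisq.test() for the test of independence. The sketch below is self-contained, so instead of reading hsb it re-enters the ses by honors counts printed earlier; the object names are illustrative.

```r
# Counts copied from the ses-by-honors table shown earlier
counts <- matrix(
  c(26, 11, 16, 32, 36, 79), ncol = 2,
  dimnames = list(ses = c("high", "low", "middle"),
                  honors = c("enrolled", "not enrolled"))
)
tab <- as.table(counts)

# Add row and column totals (labelled "Sum")
tab_totals <- addmargins(tab)
tab_totals

# Pearson chi-squared test of independence
chi <- chisq.test(tab)
chi
```

For a table larger than 2 x 2, chisq.test() applies no continuity correction, so the statistic should agree with the value reported by summary(tab2) above.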

Regression models

Regression models are often used to understand the relationship between a dependent variable and one or more independent variables.

In R, we use the summary() function to extract and report the results of a regression model.

As an example, we use the hsb data to regress math score on read score, write score, and prog.

#Run regression of math on read, write, and prog
m1 <- lm(math ~ read + write + prog, data = hsb)
lm.result <- summary(m1)
lm.result

## 
## Call:
## lm(formula = math ~ read + write + prog, data = hsb)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -19.257  -4.564  -0.211   4.271  17.527 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  19.20202    3.35561   5.722 3.91e-08 ***
## read          0.37186    0.05685   6.541 5.24e-10 ***
## write         0.29591    0.06149   4.812 2.98e-06 ***
## proggeneral  -2.87185    1.18968  -2.414  0.01670 *  
## progvocation -3.79862    1.23942  -3.065  0.00249 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.408 on 195 degrees of freedom
## Multiple R-squared:  0.5415, Adjusted R-squared:  0.5321 
## F-statistic: 57.57 on 4 and 195 DF,  p-value: < 2.2e-16

#Extracting coefficients table (it is a matrix)
lm.result$coefficients

##                Estimate Std. Error   t value     Pr(>|t|)
## (Intercept)  19.2020151 3.35561280  5.722357 3.911030e-08
## read          0.3718589 0.05684928  6.541138 5.240877e-10
## write         0.2959093 0.06149161  4.812190 2.984266e-06
## proggeneral  -2.8718518 1.18968055 -2.413969 1.670404e-02
## progvocation -3.7986171 1.23941526 -3.064846 2.486293e-03

#Adding Confidence interval for coefficients
lm.table <- cbind(lm.result$coefficients, confint(m1))
#Changing the names of columns
colnames(lm.table)[c(5,6)] <- c("LL", "UL")
#Round numbers to 4 decimal places and print
round(lm.table, 4)

##              Estimate Std. Error t value Pr(>|t|)      LL      UL
## (Intercept)   19.2020     3.3556  5.7224   0.0000 12.5841 25.8200
## read           0.3719     0.0568  6.5411   0.0000  0.2597  0.4840
## write          0.2959     0.0615  4.8122   0.0000  0.1746  0.4172
## proggeneral   -2.8719     1.1897 -2.4140   0.0167 -5.2181 -0.5256
## progvocation  -3.7986     1.2394 -3.0648   0.0025 -6.2430 -1.3542
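Most table packages covered later take a data.frame as input, so it can help to turn the coefficient matrix into one. The following is a minimal sketch of that step; it uses the built-in mtcars data rather than hsb so that it runs on its own, and the short column names are illustrative choices.

```r
# Fit a small model on the built-in mtcars data (illustrative only)
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Coefficient matrix plus 95% confidence limits
coef_mat <- cbind(summary(fit)$coefficients, confint(fit))
colnames(coef_mat) <- c("Estimate", "SE", "t", "p", "LL", "UL")

# Move the term names from the row names into a regular column,
# so the result is a plain data.frame
coef_df <- data.frame(Term = rownames(coef_mat), round(coef_mat, 4))
rownames(coef_df) <- NULL
coef_df
```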

Advanced Tables in R Using R Packages within RMarkdown

Advantages of Using R Markdown to Create Tables

Reproducibility: Embedding code directly in the document ensures that tables can be easily reproduced by anyone, reducing errors from manual copying.
Dynamic Updates: Changes to the data or analysis are automatically reflected in the tables when the document is re-rendered, eliminating the need for manual updates.
Automation: RMarkdown generates and formats tables automatically, streamlining the process and avoiding repetitive tasks like reformatting.
Consistency: Tables maintain a consistent format and style throughout the document, which is particularly useful for large reports.
Customization and Formatting: Advanced table customization options allow for professional and polished presentation, without needing to rely on external tools for formatting.

In summary, using RMarkdown to create tables ensures automation, reproducibility, and consistency, while also providing powerful customization and formatting options that are not available with simple copy-pasting from the console.

In the rest of the workshop, we introduce some of these packages with examples.

knitr::kable and kableExtra

Advantages of knitr::kable and kableExtra

Simplicity: knitr::kable offers a straightforward way to create clean tables with minimal code. It’s easy to use for beginners and perfect for simple tables that don’t require extensive customization.
Integration with RMarkdown: kable() is designed to work seamlessly with RMarkdown, making it easy to generate tables that fit well within dynamic documents.
Flexibility with kableExtra: When paired with kableExtra, kable becomes highly customizable. You can add advanced features like multi-row headers, colors, borders, alignment adjustments, column spanning, custom styling, and footnotes.
Theme: kableExtra offers some alternative HTML table themes other than the default bootstrap theme.

The kable() function in package knitr is a very simple table generator. It only generates tables for strictly rectangular data such as matrices and data frames.

This function does have a large number of arguments for you to customize the appearance of tables:

kable(x, format, digits = getOption("digits"), row.names = NA, col.names = NA, align, caption = NULL, label = NULL, format.args = list(), escape = TRUE, ...)

The format argument is a character string. Possible values include latex, html, and pipe (Pandoc’s pipe tables), among others.

If you only need one table format that is not the default format for a document, you can set the global R option knitr.table.format, e.g.,

options(knitr.table.format = "html")

We are using the built-in dataset state.x77.

state7 <- data.frame(state.x77)[1:7,]

knitr::kable(head(state7), format = "html")

	Population	Income	Illiteracy	Life.Exp	Murder	HS.Grad	Frost	Area
Alabama	3615	3624	2.1	69.05	15.1	41.3	20	50708
Alaska	365	6315	1.5	69.31	11.3	66.7	152	566432
Arizona	2212	4530	1.8	70.55	7.8	58.1	15	113417
Arkansas	2110	3378	1.9	70.66	10.1	39.9	65	51945
California	21198	5114	1.1	71.71	10.3	62.6	20	156361
Colorado	2541	4884	0.7	72.06	6.8	63.9	166	103766

With the pipe format, kable() produces a Pandoc pipe table, and how it is rendered depends on the R Markdown output format.

For example, these slides were created with ioslides, so if I do not specify format, or if I use format = "pipe", the output table looks like this:

my.table <- knitr::kable(state7, format = "pipe")
my.table

	Population	Income	Illiteracy	Life.Exp	Murder	HS.Grad	Frost	Area
Alabama	3615	3624	2.1	69.05	15.1	41.3	20	50708
Alaska	365	6315	1.5	69.31	11.3	66.7	152	566432
Arizona	2212	4530	1.8	70.55	7.8	58.1	15	113417
Arkansas	2110	3378	1.9	70.66	10.1	39.9	65	51945
California	21198	5114	1.1	71.71	10.3	62.6	20	156361
Colorado	2541	4884	0.7	72.06	6.8	63.9	166	103766
Connecticut	3100	5348	1.1	72.48	3.1	56.0	139	4862
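Several of the other kable() arguments listed in the signature above can be combined in one call. The sketch below is only an illustration: it uses the built-in mtcars data, and the display labels and caption are made up.

```r
library(knitr)

# First four rows of three columns from the built-in mtcars data
cars4 <- head(mtcars[, c("mpg", "wt", "hp")], 4)

tab <- kable(
  cars4,
  format    = "pipe",                      # Pandoc pipe table
  digits    = 1,                           # round numeric columns
  row.names = TRUE,                        # keep the car names
  col.names = c("MPG", "Weight", "HP"),    # illustrative display labels
  align     = "ccc",                       # center the three data columns
  caption   = "First four cars in mtcars"  # illustrative caption
)
tab
```

With format = "pipe" the result is plain text, so it can also be inspected directly in the console before the document is rendered.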

To learn more about knitr::kable() and its options, you can check out the link below:

rmarkdown-cookbook, 10.1 The function knitr::kable()

kableExtra

Package kableExtra is an extension of knitr::kable(). The goal of kableExtra is to help you build common complex tables and manipulate table styles. It imports the pipe symbol %>% from magrittr (it also works with the base R pipe, |>) and verbalizes all of its functions, so you can add “layers” to a kable output in a way that is similar to ggplot2.

The basic HTML output is just a plain HTML table without any styling.

#Load kableExtra, which provides kbl() and the %>% pipe
library(kableExtra)
#plain HTML
kbl(state7)

	Population	Income	Illiteracy	Life.Exp	Murder	HS.Grad	Frost	Area
Alabama	3615	3624	2.1	69.05	15.1	41.3	20	50708
Alaska	365	6315	1.5	69.31	11.3	66.7	152	566432
Arizona	2212	4530	1.8	70.55	7.8	58.1	15	113417
Arkansas	2110	3378	1.9	70.66	10.1	39.9	65	51945
California	21198	5114	1.1	71.71	10.3	62.6	20	156361
Colorado	2541	4884	0.7	72.06	6.8	63.9	166	103766
Connecticut	3100	5348	1.1	72.48	3.1	56.0	139	4862

Bootstrap theme

kable_styling() automatically applies the Twitter Bootstrap theme to the table.

To see more options for this function, please check the help file:

?kable_styling

state7 %>%
  kbl() %>%
  #twitter bootstrap theme
  kable_styling()

	Population	Income	Illiteracy	Life.Exp	Murder	HS.Grad	Frost	Area
Alabama	3615	3624	2.1	69.05	15.1	41.3	20	50708
Alaska	365	6315	1.5	69.31	11.3	66.7	152	566432
Arizona	2212	4530	1.8	70.55	7.8	58.1	15	113417
Arkansas	2110	3378	1.9	70.66	10.1	39.9	65	51945
California	21198	5114	1.1	71.71	10.3	62.6	20	156361
Colorado	2541	4884	0.7	72.06	6.8	63.9	166	103766
Connecticut	3100	5348	1.1	72.48	3.1	56.0	139	4862

Alternative themes

kableExtra also offers six alternative HTML table themes besides the default Bootstrap theme: kable_paper, kable_classic, kable_classic_2, kable_minimal, kable_material, and kable_material_dark.

We can also pass kable_styling() options, such as full_width, to these theme functions to customize the output table.

Here are some examples:

state7  %>%
  kbl() %>%
  #paper  theme with hover and full_width = F
  kable_paper("hover", full_width = F)

	Population	Income	Illiteracy	Life.Exp	Murder	HS.Grad	Frost	Area
Alabama	3615	3624	2.1	69.05	15.1	41.3	20	50708
Alaska	365	6315	1.5	69.31	11.3	66.7	152	566432
Arizona	2212	4530	1.8	70.55	7.8	58.1	15	113417
Arkansas	2110	3378	1.9	70.66	10.1	39.9	65	51945
California	21198	5114	1.1	71.71	10.3	62.6	20	156361
Colorado	2541	4884	0.7	72.06	6.8	63.9	166	103766
Connecticut	3100	5348	1.1	72.48	3.1	56.0	139	4862

Full width

state7 %>%
  kbl(caption = "Recreating booktabs style table") %>%
  # classic theme  and other options
  kable_classic(full_width = F, html_font = "Cambria",  position = "left")

Recreating booktabs style table
	Population	Income	Illiteracy	Life.Exp	Murder	HS.Grad	Frost	Area
Alabama	3615	3624	2.1	69.05	15.1	41.3	20	50708
Alaska	365	6315	1.5	69.31	11.3	66.7	152	566432
Arizona	2212	4530	1.8	70.55	7.8	58.1	15	113417
Arkansas	2110	3378	1.9	70.66	10.1	39.9	65	51945
California	21198	5114	1.1	71.71	10.3	62.6	20	156361
Colorado	2541	4884	0.7	72.06	6.8	63.9	166	103766
Connecticut	3100	5348	1.1	72.48	3.1	56.0	139	4862

Striped rows

state7 %>%
  kbl() %>%
  #material theme with striped rows
  kable_material(lightable_options= c("striped"))

	Population	Income	Illiteracy	Life.Exp	Murder	HS.Grad	Frost	Area
Alabama	3615	3624	2.1	69.05	15.1	41.3	20	50708
Alaska	365	6315	1.5	69.31	11.3	66.7	152	566432
Arizona	2212	4530	1.8	70.55	7.8	58.1	15	113417
Arkansas	2110	3378	1.9	70.66	10.1	39.9	65	51945
California	21198	5114	1.1	71.71	10.3	62.6	20	156361
Colorado	2541	4884	0.7	72.06	6.8	63.9	166	103766
Connecticut	3100	5348	1.1	72.48	3.1	56.0	139	4862

Column / Row Specification

kbl(state7) %>%
  #paper theme 
  kable_paper(full_width = F) %>%
  #Make first column bold and add border
  column_spec(1, bold = T, border_right = T) %>%
  #Make column 9 width larger and background yellow
  column_spec(9, width = "6em", background = "yellow")

	Population	Income	Illiteracy	Life.Exp	Murder	HS.Grad	Frost	Area
Alabama	3615	3624	2.1	69.05	15.1	41.3	20	50708
Alaska	365	6315	1.5	69.31	11.3	66.7	152	566432
Arizona	2212	4530	1.8	70.55	7.8	58.1	15	113417
Arkansas	2110	3378	1.9	70.66	10.1	39.9	65	51945
California	21198	5114	1.1	71.71	10.3	62.6	20	156361
Colorado	2541	4884	0.7	72.06	6.8	63.9	166	103766
Connecticut	3100	5348	1.1	72.48	3.1	56.0	139	4862

kbl(state7) %>%
  #paper theme
  kable_paper(full_width = F) %>%
  #Conditional text color for column 2 (Population)
  column_spec(2, color = spec_color(state7$Population, palette = c("black", "red"))) %>%
  #Conditional background for column 4 (Illiteracy), with white text
  column_spec(4, color = "white",
              background = spec_color(state7$Illiteracy<=1.5, palette = c("red", "green"))) %>%
  #Rotate the header row (row 0)
  row_spec(0, angle = -45)

	Population	Income	Illiteracy	Life.Exp	Murder	HS.Grad	Frost	Area
Alabama	3615	3624	2.1	69.05	15.1	41.3	20	50708
Alaska	365	6315	1.5	69.31	11.3	66.7	152	566432
Arizona	2212	4530	1.8	70.55	7.8	58.1	15	113417
Arkansas	2110	3378	1.9	70.66	10.1	39.9	65	51945
California	21198	5114	1.1	71.71	10.3	62.6	20	156361
Colorado	2541	4884	0.7	72.06	6.8	63.9	166	103766
Connecticut	3100	5348	1.1	72.48	3.1	56.0	139	4862
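Grouped column headers and footnotes, mentioned among the advantages of kableExtra, can be added with add_header_above() and footnote(). The sketch below is a minimal illustration; the group labels and the footnote text are made up for this example.

```r
library(kableExtra)

state7 <- data.frame(state.x77)[1:7, ]

tbl <- kbl(state7, format = "html") %>%
  kable_classic(full_width = FALSE) %>%
  # One label per group of columns; the widths must sum to 9
  # (the row-name column plus the 8 data columns)
  add_header_above(c(" " = 1, "People" = 4, "Other measures" = 4)) %>%
  # General footnote below the table (illustrative text)
  footnote(general = "Data: built-in state.x77 matrix, first 7 states.")
tbl
```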

One nice feature of kableExtra, available only for the HTML format, is the scroll box. If you have a huge table and want to include it in a website or HTML document without using a lot of space, a scroll box is a good solution.

kbl(state.x77) %>%
  kable_paper() %>%
  #Add scroll bar
  scroll_box(width = "400px", height = "200px")

	Population	Income	Illiteracy	Life Exp	Murder	HS Grad	Frost	Area
Alabama	3615	3624	2.1	69.05	15.1	41.3	20	50708
Alaska	365	6315	1.5	69.31	11.3	66.7	152	566432
Arizona	2212	4530	1.8	70.55	7.8	58.1	15	113417
Arkansas	2110	3378	1.9	70.66	10.1	39.9	65	51945
California	21198	5114	1.1	71.71	10.3	62.6	20	156361
Colorado	2541	4884	0.7	72.06	6.8	63.9	166	103766
Connecticut	3100	5348	1.1	72.48	3.1	56.0	139	4862
Delaware	579	4809	0.9	70.06	6.2	54.6	103	1982
Florida	8277	4815	1.3	70.66	10.7	52.6	11	54090
Georgia	4931	4091	2.0	68.54	13.9	40.6	60	58073
Hawaii	868	4963	1.9	73.60	6.2	61.9	0	6425
Idaho	813	4119	0.6	71.87	5.3	59.5	126	82677
Illinois	11197	5107	0.9	70.14	10.3	52.6	127	55748
Indiana	5313	4458	0.7	70.88	7.1	52.9	122	36097
Iowa	2861	4628	0.5	72.56	2.3	59.0	140	55941
Kansas	2280	4669	0.6	72.58	4.5	59.9	114	81787
Kentucky	3387	3712	1.6	70.10	10.6	38.5	95	39650
Louisiana	3806	3545	2.8	68.76	13.2	42.2	12	44930
Maine	1058	3694	0.7	70.39	2.7	54.7	161	30920
Maryland	4122	5299	0.9	70.22	8.5	52.3	101	9891
Massachusetts	5814	4755	1.1	71.83	3.3	58.5	103	7826
Michigan	9111	4751	0.9	70.63	11.1	52.8	125	56817
Minnesota	3921	4675	0.6	72.96	2.3	57.6	160	79289
Mississippi	2341	3098	2.4	68.09	12.5	41.0	50	47296
Missouri	4767	4254	0.8	70.69	9.3	48.8	108	68995
Montana	746	4347	0.6	70.56	5.0	59.2	155	145587
Nebraska	1544	4508	0.6	72.60	2.9	59.3	139	76483
Nevada	590	5149	0.5	69.03	11.5	65.2	188	109889
New Hampshire	812	4281	0.7	71.23	3.3	57.6	174	9027
New Jersey	7333	5237	1.1	70.93	5.2	52.5	115	7521
New Mexico	1144	3601	2.2	70.32	9.7	55.2	120	121412
New York	18076	4903	1.4	70.55	10.9	52.7	82	47831
North Carolina	5441	3875	1.8	69.21	11.1	38.5	80	48798
North Dakota	637	5087	0.8	72.78	1.4	50.3	186	69273
Ohio	10735	4561	0.8	70.82	7.4	53.2	124	40975
Oklahoma	2715	3983	1.1	71.42	6.4	51.6	82	68782
Oregon	2284	4660	0.6	72.13	4.2	60.0	44	96184
Pennsylvania	11860	4449	1.0	70.43	6.1	50.2	126	44966
Rhode Island	931	4558	1.3	71.90	2.4	46.4	127	1049
South Carolina	2816	3635	2.3	67.96	11.6	37.8	65	30225
South Dakota	681	4167	0.5	72.08	1.7	53.3	172	75955
Tennessee	4173	3821	1.7	70.11	11.0	41.8	70	41328
Texas	12237	4188	2.2	70.90	12.2	47.4	35	262134
Utah	1203	4022	0.6	72.90	4.5	67.3	137	82096
Vermont	472	3907	0.6	71.64	5.5	57.1	168	9267
Virginia	4981	4701	1.4	70.08	9.5	47.8	85	39780
Washington	3559	4864	0.6	71.72	4.3	63.5	32	66570
West Virginia	1799	3617	1.4	69.48	6.7	41.6	100	24070
Wisconsin	4589	4468	0.7	72.48	3.0	54.5	149	54464
Wyoming	376	4566	0.6	70.29	6.9	62.9	173	97203

To learn more about table styles and options in kable_styling(), you can check the link below:

Create Awesome HTML Table with knitr::kable and kableExtra

Using the `flextable`

flextable is designed to create and format tables that can be easily exported into Word and PowerPoint documents. It allows users to create richly formatted tables with features like text formatting, colors, borders, and alignment, making it ideal for generating professional-looking tables in document reports.

Advantages of flextable Compared to Other Packages

Extensive Customization: flextable offers detailed control over the formatting of tables, including text alignment, fonts, colors, borders, and cell-level styling. This level of customization goes beyond what simpler packages like kable can offer, allowing for professional and polished tables.
Integration with Word and PowerPoint: One of the standout features of flextable is its seamless integration with Microsoft Word and PowerPoint through the officer package. You can directly export beautifully formatted tables into these documents, making it ideal for users who frequently work with Word and PowerPoint.
Conditional Formatting: The package allows for conditional formatting based on the values in the table, which is useful for highlighting key data points or making tables more informative visually.
Predefined Themes: flextable offers built-in themes that provide consistent, aesthetically pleasing styles for tables. This reduces the effort needed to style tables while maintaining a professional appearance.

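The main function is flextable(), which takes a data.frame as its argument and returns a flextable object.

#Load the package
library(flextable)
#Default flextable of the first 10 rows, dropping column 13 (cid)
ft <- flextable(hsb[1:10, -13])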
ft

id	female	ses	schtyp	prog	read	write	math	science	socst	honors	awards
45	female	low	public	vocation	34	35	41	29	26	not enrolled	0
108	male	middle	public	general	34	33	41	36	36	not enrolled	0
15	male	high	public	vocation	39	39	44	26	42	not enrolled	0
67	male	low	public	vocation	37	37	42	33	32	not enrolled	0
153	male	middle	public	vocation	39	31	40	39	51	not enrolled	0
51	female	high	public	general	42	36	42	31	39	not enrolled	0
164	male	middle	public	vocation	31	36	46	39	46	not enrolled	0
133	male	middle	public	vocation	50	31	40	34	31	not enrolled	0
2	female	middle	public	vocation	39	41	33	42	41	not enrolled	0
53	male	middle	public	vocation	34	37	46	39	31	not enrolled	0

ft |>
  #add header row
  add_header_row(
    colwidths = c(3, 2, 5, 2),
    values = c("Student", "School", "Grades", "Achievements")) |>
  #Use theme_vanilla
  theme_vanilla() |>
  #Add footer
  add_footer_lines("This data is simulated and it is not real") |>
  color(part = "footer", color = "#666666") |>
  #set Caption
  set_caption(caption = "First 10 rows of a sample of high school data") |>
  #Align header to center
  align(align = "center", part = "header", i = 1)

First 10 rows of a sample of high school data
Student			School		Grades					Achievements
id	female	ses	schtyp	prog	read	write	math	science	socst	honors	awards
45	female	low	public	vocation	34	35	41	29	26	not enrolled	0
108	male	middle	public	general	34	33	41	36	36	not enrolled	0
15	male	high	public	vocation	39	39	44	26	42	not enrolled	0
67	male	low	public	vocation	37	37	42	33	32	not enrolled	0
153	male	middle	public	vocation	39	31	40	39	51	not enrolled	0
51	female	high	public	general	42	36	42	31	39	not enrolled	0
164	male	middle	public	vocation	31	36	46	39	46	not enrolled	0
133	male	middle	public	vocation	50	31	40	34	31	not enrolled	0
2	female	middle	public	vocation	39	41	33	42	41	not enrolled	0
53	male	middle	public	vocation	34	37	46	39	31	not enrolled	0
This data is simulated and it is not real
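Conditional formatting and export to Word, both listed among the advantages above, can be sketched in a few lines. This example is an illustration only: it uses the built-in mtcars data, the threshold of 21 is arbitrary, and the file is written to a temporary directory.

```r
library(flextable)

# Eight rows of three columns from the built-in mtcars data
cars8 <- head(mtcars[, c("mpg", "cyl", "hp")], 8)

ft_cond <- flextable(cars8)
# Highlight and embolden mpg cells above an arbitrary threshold;
# the formula in `i` selects rows, `j` selects the column
ft_cond <- bg(ft_cond, i = ~ mpg > 21, j = "mpg", bg = "yellow")
ft_cond <- bold(ft_cond, i = ~ mpg > 21, j = "mpg")
# Adjust column widths to the content
ft_cond <- autofit(ft_cond)

# Write the table to a Word document in a temporary location
out_file <- file.path(tempdir(), "flextable_sketch.docx")
save_as_docx(ft_cond, path = out_file)
```

Inside an R Markdown document rendered to Word, printing the flextable object in a chunk is normally enough, and save_as_docx() is mainly useful from a plain script.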

The flextable package will not aggregate data for you, but it helps you present aggregated data. It also has some useful functions for generating descriptive statistics.

Cross tab with `proc_freq()`

The function proc_freq() computes a contingency table and creates a flextable from the result. Its aim is to reproduce the output of SAS PROC FREQ.

proc_freq(hsb, "ses", "honors",
          include.row_percent = TRUE,
          include.column_percent = TRUE,
          include.table_percent = TRUE)

ses		honors
ses		enrolled	not enrolled	Total
high	Count	26 (13.0%)	32 (16.0%)	58 (29.0%)
high	Mar. pct (1)	49.1% ; 44.8%	21.8% ; 55.2%
low	Count	11 (5.5%)	36 (18.0%)	47 (23.5%)
low	Mar. pct	20.8% ; 23.4%	24.5% ; 76.6%
middle	Count	16 (8.0%)	79 (39.5%)	95 (47.5%)
middle	Mar. pct	30.2% ; 16.8%	53.7% ; 83.2%
Total	Count	53 (26.5%)	147 (73.5%)	200 (100.0%)
(1) Columns and rows percentages
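To read the percentages in this table, it may help to recompute them in base R. The sketch below re-enters the counts from the table, so it runs without the hsb data. In each count cell the value in parentheses is the share of the whole table, and in each “Mar. pct” cell the first value is the column percentage and the second is the row percentage.

```r
# Counts copied from the ses-by-honors table above
counts <- matrix(
  c(26, 11, 16, 32, 36, 79), ncol = 2,
  dimnames = list(ses = c("high", "low", "middle"),
                  honors = c("enrolled", "not enrolled"))
)

# Share of the whole table, e.g. 26 / 200 for high and enrolled
table_pct <- round(100 * prop.table(counts), 1)

# Column percentages, e.g. 26 / 53 for high and enrolled
col_pct <- round(100 * prop.table(counts, margin = 2), 1)

# Row percentages, e.g. 26 / 58 for high and enrolled
row_pct <- round(100 * prop.table(counts, margin = 1), 1)

table_pct
col_pct
row_pct
```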

The flextable package offers much more flexibility, especially when used in conjunction with other packages, than we can cover in this workshop.

For more on flextable you can check the links below:

Using flextable

flextable cheat sheet

Function reference (manuals)

flextable gallery

DT

The R package DT provides an R interface to the JavaScript library DataTables. R data objects (matrices or data frames) can be displayed as tables on HTML pages, and DataTables provides interactive tables with filtering, pagination, sorting, and many other features.

Key Features of DT Package

Interactivity: DT creates interactive tables with features like sorting and searching, ideal for web use and Shiny apps. flextable and kableExtra focus on static tables.
JavaScript Integration: DT leverages DataTables for advanced client-side features like inline editing and exporting, making it great for web applications.
Ease of Use for Web Applications: DT is best suited to web applications and is easy to use and implement.

The main function in this package is datatable(). It creates an HTML widget to display R data objects with DataTables.

library(DT); library(ggplot2) #the diamonds data set comes from ggplot2
datatable(diamonds[1:200,])

If you are familiar with the DataTables JavaScript library, you can use the options argument to customize the table.

We can add a filter argument to datatable() to automatically generate column filters. By default, the filters are not shown, since filter = "none". You can enable them with filter = "top" or filter = "bottom".

#Add filter
datatable(diamonds[1:200,], filter = 'top', options = list(
  pageLength = 5, autoWidth = TRUE
))
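A few more datatable() arguments and one formatting helper are sketched below. This is an illustration on the built-in iris data; the caption text, page length, and sort order are arbitrary choices.

```r
library(DT)

dt <- datatable(
  iris,
  rownames = FALSE,                 # hide the row-number column
  filter   = "top",                 # column filters above the header
  caption  = "Iris measurements (built-in data)",
  options  = list(
    pageLength = 5,                 # rows shown per page
    order = list(list(0, "desc"))   # sort by the first column; indices start at 0
  )
)

# Display two numeric columns with one decimal place
dt <- formatRound(dt, columns = c("Sepal.Length", "Sepal.Width"), digits = 1)
dt
```

Because rownames = FALSE removes the row-name column, index 0 in the order option refers to Sepal.Length here.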

For more examples and options for package DT you can check the link below:

DT: An R interface to the DataTables library

gt

Package gt aims to distinguish between data tables (e.g., tibbles, data.frames, etc.) and presentation tables, such as summary tables.

Advantages of the gt Package

Customization: gt provides extensive options for customizing table appearance, including fonts, colors, borders, and spacing. This allows for creating visually appealing and professionally formatted tables.
Easy to Use: The package has a user-friendly syntax that simplifies the creation of complex tables. It’s designed to be intuitive and easy to learn, making table creation straightforward.
Integration with RMarkdown: gt integrates well with RMarkdown, enabling you to include sophisticated tables in dynamic documents. It supports rendering in HTML and integrates seamlessly into RMarkdown reports.
Publication-Ready Tables: gt is designed for generating publication-quality tables that are clean and well-formatted. It’s ideal for academic papers, reports, and presentations where table aesthetics are important.

Here we run a simple example from the package reference page. The gt package is very similar to flextable, but it currently supports HTML, LaTeX, and RTF output, whereas flextable is compatible with Microsoft software such as Word and PowerPoint.

#Load dplyr (for mutate() and slice()) and gt
library(dplyr)
library(gt)
# Modify the `airquality` dataset by adding the year
# of the measurements (1973) and limiting to 10 rows
airquality_m <- 
  airquality |>
  #add year 1973
  mutate(Year = 1973L) |>
  #select the first 10 rows
  slice(1:10)

# Create a display table from the modified
# `airquality` dataset
gt_tbl <- 
  gt(airquality_m)
#Print gt table
gt_tbl

Ozone	Solar.R	Wind	Temp	Month	Day	Year
41	190	7.4	67	5	1	1973
36	118	8.0	72	5	2	1973
12	149	12.6	74	5	3	1973
18	313	11.5	62	5	4	1973
NA	NA	14.3	56	5	5	1973
28	NA	14.9	66	5	6	1973
23	299	8.6	65	5	7	1973
19	99	13.8	59	5	8	1973
8	19	20.1	61	5	9	1973
NA	194	8.6	69	5	10	1973

gt_tbl |>
  #Add title and subtitle 
  tab_header(
    title = "New York Air Quality Measurements",
    subtitle = "Daily measurements in New York City (May 1-10, 1973)"
  ) |>
  #Span columns 
  tab_spanner(
    label = "Time",
    columns = c(Year, Month, Day)
  ) |>
  tab_spanner(
    label = "Measurement",
    columns = c(Ozone, Solar.R, Wind, Temp)
  )

New York Air Quality Measurements
Daily measurements in New York City (May 1-10, 1973)
Measurement				Time
Ozone	Solar.R	Wind	Temp	Year	Month	Day
41	190	7.4	67	1973	5	1
36	118	8.0	72	1973	5	2
12	149	12.6	74	1973	5	3
18	313	11.5	62	1973	5	4
NA	NA	14.3	56	1973	5	5
28	NA	14.9	66	1973	5	6
23	299	8.6	65	1973	5	7
19	99	13.8	59	1973	5	8
8	19	20.1	61	1973	5	9
NA	194	8.6	69	1973	5	10
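Besides headers and spanners, gt has functions for formatting cell values. The sketch below is a minimal illustration that rebuilds a 10-row table from the built-in airquality data; it assumes a version of gt that provides sub_missing(), and the replacement text and source note are made up.

```r
library(gt)

# First 10 rows of the built-in airquality data
airquality_10 <- head(airquality, 10)

gt_fmt <- gt(airquality_10) |>
  # Show Wind with exactly one decimal place
  fmt_number(columns = Wind, decimals = 1) |>
  # Replace NA in the two columns that contain missing values
  sub_missing(columns = c(Ozone, Solar.R), missing_text = "n/a") |>
  # Add a note below the table (illustrative text)
  tab_source_note(source_note = "Source: the built-in airquality data set.")
gt_fmt
```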

The reference for package gt

gt package

gtExtras

Package gtExtras provides additional functions to assist with package gt, especially if you want to include plots in your tables.

Overall, there are four families of functions in gtExtras:

Themes: 7 themes that style almost every element of a gt table, built off of data journalism-styled tables
Utilities: Helper functions for aligning/padding numbers, adding fontawesome icons, images, highlighting, dividers, styling by group, creating two tables or two column layouts, extracting ordered data from a gt table internals, or generating a random dataset.
Plotting: 12 plotting functions for inline sparklines, win-loss charts, distributions (density/histogram), percentiles, dot + bar, bar charts, confidence intervals, or summarizing an entire dataframe!
Colors: 3 functions, a palette for “Hulk” style scale (purple/green), coloring rows with good defaults from paletteer, or adding a “color box” along with the cell value

gt_tbl %>% 
  #Use the NYT theme
  gt_theme_nytimes() %>% 
  #Change header title
  tab_header(title = "Table styled like the NY Times") %>% 
  #Hulk data_color
  #Trim provides a tighter range of purple/green
  gt_hulk_col_numeric(Ozone, trim = TRUE)

Table styled like the NY Times
Ozone	Solar.R	Wind	Temp	Month	Day	Year
41	190	7.4	67	5	1	1973
36	118	8.0	72	5	2	1973
12	149	12.6	74	5	3	1973
18	313	11.5	62	5	4	1973
NA	NA	14.3	56	5	5	1973
28	NA	14.9	66	5	6	1973
23	299	8.6	65	5	7	1973
19	99	13.8	59	5	8	1973
8	19	20.1	61	5	9	1973
NA	194	8.6	69	5	10	1973

For more options and features in package gtExtras, check the package references:

Plotting with gtExtras

Beautiful tables in R with gtExtras

Summary tables

Some R packages create publication-ready summary tables with minimal effort. They have built-in functions that generate summary tables in standard formats.

In this part we are going to use the R packages table1, sjPlot, and gtsummary.

The advantage of these packages is that you do not need to write extra data preparation code to summarize the data and analysis in a data.frame. However, we lose flexibility, and the tables are more difficult to customize.

“Table 1” in Statistical Analysis and the `table1` Package

In journal articles, particularly in fields like epidemiology and health data, the first table (commonly referred to as “Table 1”) presents the descriptive statistics of baseline characteristics of the study sample. This table is typically stratified by one or more grouping variables, such as treatment groups or demographic categories.

The table1 package in R simplifies the creation of such tables.
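
To make concrete what such a table contains, here is a minimal base R sketch (it does not use table1) that computes the two typical cell formats, "Mean (SD)" for a continuous variable and "n (%)" for a categorical one, within each stratum. The built-in mtcars data and the names dat, mean_sd, mpg_by_am, and cyl_by_am are assumptions chosen for illustration; they are not part of the workshop data.

```r
# Built-in data; 'am' (transmission) plays the role of the grouping variable
dat <- transform(mtcars,
                 am  = factor(am, labels = c("Automatic", "Manual")),
                 cyl = factor(cyl))

# Continuous variable: "Mean (SD)" within each stratum
mean_sd   <- function(x) sprintf("%.1f (%.1f)", mean(x), sd(x))
mpg_by_am <- tapply(dat$mpg, dat$am, mean_sd)

# Categorical variable: "n (%)" with percentages taken within each column
counts    <- table(dat$cyl, dat$am)
pct       <- prop.table(counts, margin = 2) * 100
cyl_by_am <- matrix(sprintf("%d (%.1f%%)", as.vector(counts), as.vector(pct)),
                    nrow = nrow(counts), dimnames = dimnames(counts))

mpg_by_am
cyl_by_am
```

A call to table1() produces the same kind of cells, together with the layout and labels, from a single formula.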

Key Features of table1 Package:

Descriptive Statistics: Provides means, medians, standard deviations, and proportions for various variables.
Stratification: Allows for grouping and stratifying by one or more categorical variables.
Customization: The package offers options for customizing the appearance and content of the table to meet publication standards, but they are not straightforward to use.
Easy to use: A basic table takes a single call to table1(), although customizing it can be challenging.
Converting to other packages: The output of table1() can be converted (with some limitations) to a data.frame, a kableExtra table, or a flextable, using the functions as.data.frame(), t1kable(), and t1flex() respectively.

Example of `table1`

The data used for this example is the melanoma data set from package boot. The data consist of measurements made on patients with malignant melanoma.

The grouping variable is the patient's status at the end of the study: 1 indicates that they had died from melanoma, 2 that they were still alive, and 3 that they had died from causes unrelated to their melanoma.

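# melanoma ships with package boot; table1() comes from package table1
library(boot)
library(table1)
melanoma1 <- melanoma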
# Change status to factor
melanoma1$status <- 
  factor(melanoma1$status, 
         levels=c(2,1,3),
         labels=c("Alive", # Reference
                  "Melanoma death", 
                  "Non-melanoma death"))
# Change sex to factor and label them 
melanoma1$sex <- 
  factor(melanoma1$sex, labels = c("Male", 
                  "Female"))
# Change ulcer to factor and label them  
melanoma1$ulcer <- 
  factor(melanoma1$ulcer, labels = c("Absent", 
                  "Present"))
#Basic table 1
(table1.1 <- table1(~ sex + age + ulcer + thickness | status, data=melanoma1))

	Alive (N=134)	Melanoma death (N=57)	Non-melanoma death (N=14)	Overall (N=205)
sex
Male	91 (67.9%)	28 (49.1%)	7 (50.0%)	126 (61.5%)
Female	43 (32.1%)	29 (50.9%)	7 (50.0%)	79 (38.5%)
age
Mean (SD)	50.0 (15.9)	55.1 (17.9)	65.3 (10.9)	52.5 (16.7)
Median [Min, Max]	52.0 [4.00, 84.0]	56.0 [14.0, 95.0]	65.0 [49.0, 86.0]	54.0 [4.00, 95.0]
ulcer
Absent	92 (68.7%)	16 (28.1%)	7 (50.0%)	115 (56.1%)
Present	42 (31.3%)	41 (71.9%)	7 (50.0%)	90 (43.9%)
thickness
Mean (SD)	2.24 (2.33)	4.31 (3.57)	3.72 (3.63)	2.92 (2.96)
Median [Min, Max]	1.36 [0.100, 12.9]	3.54 [0.320, 17.4]	2.26 [0.160, 12.6]	1.94 [0.100, 17.4]

To improve the table, we label each variable the way we want and specify units for the continuous variables (age and thickness); the categorical variables (sex and ulcer) already have descriptive factor labels. We also label the overall column “Total”, position it on the left, and add a caption and footnote.

melanoma2 <- melanoma1
#Label the variables name
label(melanoma2$sex)       <- "Sex"
label(melanoma2$age)       <- "Age"
label(melanoma2$ulcer)     <- "Ulceration"
#Use an asterisk to mark the footnote
label(melanoma2$thickness) <- "Thickness *"
#Assign unit to age and thickness
units(melanoma2$age)       <- "years"
units(melanoma2$thickness) <- "mm"
#create caption
caption  <- "Descriptive statistics of patients characteristics by status"
#create footnote
footnote <- "* Also known as Breslow thickness"
#Create table1
table1(~ sex + age + ulcer + thickness | status, data=melanoma2,
    overall=c(left="Total"), caption=caption, footnote=footnote)

Descriptive statistics of patients characteristics by status
	Total (N=205)	Alive (N=134)	Melanoma death (N=57)	Non-melanoma death (N=14)
Sex
Male	126 (61.5%)	91 (67.9%)	28 (49.1%)	7 (50.0%)
Female	79 (38.5%)	43 (32.1%)	29 (50.9%)	7 (50.0%)
Age (years)
Mean (SD)	52.5 (16.7)	50.0 (15.9)	55.1 (17.9)	65.3 (10.9)
Median [Min, Max]	54.0 [4.00, 95.0]	52.0 [4.00, 84.0]	56.0 [14.0, 95.0]	65.0 [49.0, 86.0]
Ulceration
Absent	115 (56.1%)	92 (68.7%)	16 (28.1%)	7 (50.0%)
Present	90 (43.9%)	42 (31.3%)	41 (71.9%)	7 (50.0%)
Thickness * (mm)
Mean (SD)	2.92 (2.96)	2.24 (2.33)	4.31 (3.57)	3.72 (3.63)
Median [Min, Max]	1.94 [0.100, 17.4]	1.36 [0.100, 12.9]	3.54 [0.320, 17.4]	2.26 [0.160, 12.6]
* Also known as Breslow thickness

Next, we group the two “Death” strata (Melanoma and Non-melanoma) under a common heading.

#Label the variables sex, age, ulcer, and thickness
#Add groups label
labels <- list(
    variables=list(sex="Sex",
                   age="Age (years)",
                   ulcer="Ulceration",
                   thickness="Thickness* (mm)"),
    groups=list("", "", "Death"))

# Remove the word "death" from the labels, since it now appears above
levels(melanoma2$status) <- c("Alive", "Melanoma", "Non-melanoma")
#Set up our “strata”, or column, as a list of data.frame
strata <- c(list(Total=melanoma2), split(melanoma2, melanoma2$status))
#Create new table1
table1(strata, labels, groupspan=c(1, 1, 2), caption=caption, footnote=footnote)

Descriptive statistics of patients characteristics by status
			Death
	Total (N=205)	Alive (N=134)	Melanoma (N=57)	Non-melanoma (N=14)
Sex
Male	126 (61.5%)	91 (67.9%)	28 (49.1%)	7 (50.0%)
Female	79 (38.5%)	43 (32.1%)	29 (50.9%)	7 (50.0%)
Age (years)
Mean (SD)	52.5 (16.7)	50.0 (15.9)	55.1 (17.9)	65.3 (10.9)
Median [Min, Max]	54.0 [4.00, 95.0]	52.0 [4.00, 84.0]	56.0 [14.0, 95.0]	65.0 [49.0, 86.0]
Ulceration
Absent	115 (56.1%)	92 (68.7%)	16 (28.1%)	7 (50.0%)
Present	90 (43.9%)	42 (31.3%)	41 (71.9%)	7 (50.0%)
Thickness* (mm)
Mean (SD)	2.92 (2.96)	2.24 (2.33)	4.31 (3.57)	3.72 (3.63)
Median [Min, Max]	1.94 [0.100, 17.4]	1.36 [0.100, 12.9]	3.54 [0.320, 17.4]	2.26 [0.160, 12.6]
* Also known as Breslow thickness

As the last step shows, customizing a table in table1 is not very easy and needs some extra work.

Converting to flextable

Now let's try to convert it to flextable and customize it there!

#Converting to flextable
tab1.flex <- table1.1 |> t1flex() 
#Print 
tab1.flex

	Alive (N=134)	Melanoma death (N=57)	Non-melanoma death (N=14)	Overall (N=205)
sex
Male	91 (67.9%)	28 (49.1%)	7 (50.0%)	126 (61.5%)
Female	43 (32.1%)	29 (50.9%)	7 (50.0%)	79 (38.5%)
age
Mean (SD)	50.0 (15.9)	55.1 (17.9)	65.3 (10.9)	52.5 (16.7)
Median [Min, Max]	52.0 [4.00, 84.0]	56.0 [14.0, 95.0]	65.0 [49.0, 86.0]	54.0 [4.00, 95.0]
ulcer
Absent	92 (68.7%)	16 (28.1%)	7 (50.0%)	115 (56.1%)
Present	42 (31.3%)	41 (71.9%)	7 (50.0%)	90 (43.9%)
thickness
Mean (SD)	2.24 (2.33)	4.31 (3.57)	3.72 (3.63)	2.92 (2.96)
Median [Min, Max]	1.36 [0.100, 12.9]	3.54 [0.320, 17.4]	2.26 [0.160, 12.6]	1.94 [0.100, 17.4]

#Modify tab1.flex
tab1.flex |>
  #Add a header row
  add_header_row(
    values = c("", "Death", ""),  # Labels for the top header row
    colwidths = c(2, 2, 1)        # Number of columns spanned by each label
  ) |>
  #First remove the border line under the new header row
  hline(part = "header", i = 1, border = officer::fp_border(width = 0)) |>
  #Then draw a line under "Death", over Melanoma and Non-melanoma
  hline(part = "header", i = 1, j = 3:4, border = officer::fp_border(width = 1.5)) |>
  #Change labels for rows
  compose(i = 1, j = 1, value = as_paragraph(as_chunk("SEX"))) |>
  compose(i = 4, j = 1, value = as_paragraph(as_chunk("AGE (years)"))) |>
  compose(i = 7, j = 1, value = as_paragraph(as_chunk("Ulceration"))) |>
  compose(i = 10, j = 1, value = as_paragraph(as_chunk("Thickness (mm)"))) |>
  #Add caption (the piped table is already the first argument)
  set_caption(caption = "Table 1: Descriptive statistics of patients characteristics by status") |>
  #Add footnote
  footnote(i = 10, j = 1,
    ref_symbols = "a",
    value = as_paragraph("Also known as Breslow thickness")
  ) |>
  #Set the font size of the footnote
  fontsize(i = 1, j = 1, size = 9, part = "footer")

Table 1: Descriptive statistics of patients characteristics by status
		Death
	Alive (N=134)	Melanoma death (N=57)	Non-melanoma death (N=14)	Overall (N=205)
SEX
Male	91 (67.9%)	28 (49.1%)	7 (50.0%)	126 (61.5%)
Female	43 (32.1%)	29 (50.9%)	7 (50.0%)	79 (38.5%)
AGE (years)
Mean (SD)	50.0 (15.9)	55.1 (17.9)	65.3 (10.9)	52.5 (16.7)
Median [Min, Max]	52.0 [4.00, 84.0]	56.0 [14.0, 95.0]	65.0 [49.0, 86.0]	54.0 [4.00, 95.0]
Ulceration
Absent	92 (68.7%)	16 (28.1%)	7 (50.0%)	115 (56.1%)
Present	42 (31.3%)	41 (71.9%)	7 (50.0%)	90 (43.9%)
Thickness (mm)a
Mean (SD)	2.24 (2.33)	4.31 (3.57)	3.72 (3.63)	2.92 (2.96)
Median [Min, Max]	1.36 [0.100, 12.9]	3.54 [0.320, 17.4]	2.26 [0.160, 12.6]	1.94 [0.100, 17.4]
aAlso known as Breslow thickness

This takes somewhat more lines of code, but the steps are more straightforward and clear.

To learn more about package table1 see the link below:

Using the table1 Package to Create HTML Tables of Descriptive Statistics

sjPlot

The sjPlot package is a collection of plotting and table output functions for data visualization.

Results of various statistical analyses (that are commonly used in social sciences) can be visualized using this package, including simple and cross tabulated frequencies, linear models, glm models, mixed effects models, PCA and correlation matrices, cluster analyses, and much more.

Key Features of `sjPlot`:

Cross tabulation: tab_xtab() creates cross-tabulations with options for adding row and column percentages.
Regression Tables: tab_model() creates tables of regression models with detailed statistical summaries, including coefficients, standard errors, p-values, and confidence intervals. It supports various model types such as linear, logistic, and mixed-effects models.
Multiple models: Combine results from multiple models into a single table for comparative analysis.

Cross tabulation with `tab_xtab()`

#Cross tab of ses by female for the hsb data, with column percentages
tab_xtab(var.row = hsb$ses, var.col = hsb$female, 
         show.col.prc = TRUE)

ses	female		Total
ses	female	male	Total
high	29 26.6 %	29 31.9 %	58 29 %
low	32 29.4 %	15 16.5 %	47 23.5 %
middle	48 44 %	47 51.6 %	95 47.5 %
Total	109 100 %	91 100 %	200 100 %
χ²=4.577 · df=2 · Cramer’s V=0.151 · p=0.101

tab_xtab(var.row = hsb$ses, var.col = hsb$female, 
         show.row.prc = TRUE,
         statistics = "phi")

ses	female		Total
ses	female	male	Total
high	29 50 %	29 50 %	58 100 %
low	32 68.1 %	15 31.9 %	47 100 %
middle	48 50.5 %	47 49.5 %	95 100 %
Total	109 54.5 %	91 45.5 %	200 100 %
χ²=4.577 · df=2 · φ=0.151 · p=0.101
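
The statistics in the table footer can be checked with base R alone. The sketch below re-enters the counts printed in the output above (so it does not need the hsb data) and recomputes the percentages, the Pearson chi-squared test, and Cramér's V. The object names xt, row_pct, col_pct, chi, and cramers_v are hypothetical.

```r
# Counts copied from the tab_xtab() output (rows: ses, columns: female)
xt <- matrix(c(29, 29,
               32, 15,
               48, 47),
             nrow = 3, byrow = TRUE,
             dimnames = list(ses = c("high", "low", "middle"),
                             female = c("female", "male")))

# Row and column percentages, as shown by show.row.prc / show.col.prc
row_pct <- round(100 * prop.table(xt, margin = 1), 1)
col_pct <- round(100 * prop.table(xt, margin = 2), 1)

# Pearson chi-squared test (the continuity correction only applies to 2x2 tables)
chi <- chisq.test(xt)

# Cramér's V = sqrt(chi2 / (n * (min(rows, cols) - 1)))
n <- sum(xt)
cramers_v <- sqrt(unname(chi$statistic) / (n * (min(dim(xt)) - 1)))

round(c(chi2 = unname(chi$statistic), df = unname(chi$parameter),
        V = cramers_v, p = chi$p.value), 3)
```

Because the smaller dimension of this table is 2, Cramér's V and φ coincide here, which is why both outputs report 0.151.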

Regression Tables with `tab_model()`

#Run poisson model of awards on math, read, and ses
m.pois <- glm(awards ~ math + read + ses, family = poisson(), data = hsb)
#Print using tab_model
tab_model(m.pois, dv.labels = c("Poisson Model"))

	Poisson Model
Predictors	Incidence Rate Ratios	CI	p
(Intercept)	0.03	0.01 – 0.07	<0.001
math	1.05	1.03 – 1.06	<0.001
read	1.03	1.01 – 1.04	<0.001
ses [low]	0.89	0.65 – 1.22	0.491
ses [middle]	0.78	0.61 – 1.00	0.049
Observations	200
R² Nagelkerke	0.629
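
The "Incidence Rate Ratios" column holds the exponentiated Poisson coefficients. A minimal base R sketch of that transformation follows; it uses the built-in warpbreaks data as a stand-in for hsb, and it uses Wald intervals from confint.default(), which is an assumption made for simplicity and may differ slightly from the interval method tab_model() applies.

```r
# warpbreaks is a built-in dataset; it stands in for the hsb data here
fit <- glm(breaks ~ wool + tension, family = poisson(), data = warpbreaks)

# Coefficients are on the log scale; exponentiate to get rate ratios
irr <- exp(coef(fit))

# Wald intervals on the log scale, then exponentiated (assumed method)
ci <- exp(confint.default(fit))

round(cbind(IRR = irr, ci), 2)
```

An IRR below 1 means the expected count decreases as the predictor increases, and an interval that excludes 1 corresponds to a significant coefficient at the matching level.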

#Print using tab_model with clustered covariance matrix estimation
tab_model(m.pois, vcov.fun = "CL", 
  vcov.args = list(type = "HC1", cluster = hsb$cid), 
  dv.labels = c("Poisson With Cluster-Robust Covariance Matrix"))

	Poisson With Cluster-Robust Covariance Matrix
Predictors	Incidence Rate Ratios	CI	p
(Intercept)	0.03	0.01 – 0.07	<0.001
math	1.05	1.03 – 1.06	<0.001
read	1.03	1.01 – 1.04	<0.001
ses [low]	0.89	0.65 – 1.22	0.487
ses [middle]	0.78	0.61 – 1.00	0.035
Observations	200
R² Nagelkerke	0.629

#Run negative binomial model of awards on math, read, and ses
#(glm.nb() comes from package MASS)
m.nbin <- glm.nb(awards ~ math + read + ses, data = hsb)

#Print the two models together in one table and add deviance and AIC
tab_model(m.pois, m.nbin, vcov.fun = "CL", 
  vcov.args = list(type = "HC1", cluster = hsb$cid), show.dev = TRUE, show.aic = TRUE,
  dv.labels = c("Poisson",
    "Negative-binomial"))

	Poisson			Negative-binomial
Predictors	Incidence Rate Ratios	CI	p	Incidence Rate Ratios	CI	p
(Intercept)	0.03	0.01 – 0.07	<0.001	0.03	0.01 – 0.06	<0.001
math	1.05	1.03 – 1.06	<0.001	1.05	1.03 – 1.07	<0.001
read	1.03	1.01 – 1.04	<0.001	1.03	1.01 – 1.05	<0.001
ses [low]	0.89	0.65 – 1.22	0.487	0.89	0.62 – 1.27	0.484
ses [middle]	0.78	0.61 – 1.00	0.035	0.79	0.60 – 1.04	0.040
Observations	200			200
R² Nagelkerke	0.629			0.595
Deviance	256.818			221.505
AIC	613.047			611.548

To learn more about tab_model() in package sjPlot, see the link below:

Summary of Regression Models as HTML Table

`gtsummary`

The gtsummary package is designed to create summary tables for a variety of statistical analyses. It focuses on making publication-ready tables that are easy to generate and aesthetically pleasing.

Key Advantages:

Ease of Use: Minimal coding required to generate complex, publication-quality tables.
Flexibility: The ability to customize tables to suit the needs of different publications or audiences.
Integrated Statistical Reporting: Automatically includes relevant statistics such as p-values, confidence intervals, and effect sizes.
Exportability: Ability to export tables to various formats for different types of reports and manuscripts. You can convert gtsummary tables to gt or flextable objects for further customization or export to Microsoft Word and PowerPoint.
Themes: It is possible to set themes in gtsummary. The themes control many aspects of how a table is printed.

Here are some key features of using `gtsummary`:

Descriptive summary tables: tbl_summary() automatically creates summary statistics tables for data frames, stratified by groups (if desired), with options for mean, median, standard deviation, proportions, and more. Similar to the table1 package, this is particularly useful for creating “Table 1” in medical and epidemiological research.
Tables for statistical tests: tbl_cross() creates cross-tabulations (contingency tables) with statistical tests like chi-square or Fisher’s exact test.
Summary tables for regression models: tbl_regression() generates detailed tables from regression models, including coefficients, confidence intervals, p-values, and more.

Descriptive summary tables

#Use tbl_summary for summary statistics
tab1_gt <- tbl_summary(melanoma1, include = -c(time, year), by = status)
#Print
tab1_gt

Characteristic	Alive N = 134¹	Melanoma death N = 57¹	Non-melanoma death N = 14¹
sex
Male	91 (68%)	28 (49%)	7 (50%)
Female	43 (32%)	29 (51%)	7 (50%)
age	52 (40, 62)	56 (44, 68)	65 (56, 72)
thickness	1.36 (0.81, 2.90)	3.54 (2.24, 4.84)	2.26 (1.29, 6.12)
ulcer
Absent	92 (69%)	16 (28%)	7 (50%)
Present	42 (31%)	41 (72%)	7 (50%)
¹ n (%); Median (Q1, Q3)

#Test for a difference between groups
tab1_gt |>
#  Add P-value column
  add_p() |>
#Change header and Labels
 modify_header(label = "**Variable**") |>
  bold_labels()

Variable	Alive N = 134¹	Melanoma death N = 57¹	Non-melanoma death N = 14¹	p-value²
sex				0.033
Male	91 (68%)	28 (49%)	7 (50%)
Female	43 (32%)	29 (51%)	7 (50%)
age	52 (40, 62)	56 (44, 68)	65 (56, 72)	0.001
thickness	1.36 (0.81, 2.90)	3.54 (2.24, 4.84)	2.26 (1.29, 6.12)	<0.001
ulcer				<0.001
Absent	92 (69%)	16 (28%)	7 (50%)
Present	42 (31%)	41 (72%)	7 (50%)
¹ n (%); Median (Q1, Q3)
² Pearson’s Chi-squared test; Kruskal-Wallis rank sum test
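
The footnote names the tests behind the p-value column. As a check, the chi-squared p-value for sex can be reproduced in base R from the counts printed in the table; for the continuous variables the corresponding base R function is kruskal.test(). The object names sex_by_status and res are hypothetical.

```r
# Counts copied from the table above (rows: sex, columns: status)
sex_by_status <- matrix(c(91, 28, 7,
                          43, 29, 7),
                        nrow = 2, byrow = TRUE,
                        dimnames = list(sex = c("Male", "Female"),
                                        status = c("Alive", "Melanoma death",
                                                   "Non-melanoma death")))

# Pearson's chi-squared test of independence between sex and status
res <- chisq.test(sex_by_status)

res$expected           # expected counts under independence
round(res$p.value, 3)  # compare with the p-value reported for sex
```

The result agrees with the 0.033 shown for sex in the table.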

#Report mean and sd for the continuous variables
tbl_summary(melanoma1, include = -c(time, year), by = status,
  statistic = all_continuous() ~ "{mean} ({sd})") |>
#Add overall column
add_overall() |>
#Add CI
add_ci(pattern = "{stat} ({ci})", 
       all_categorical() ~ "wald") |>
 #Update spanning headers
 modify_spanning_header(c("stat_2", "stat_3") ~ "**Death**")

Characteristic	Overall N = 205 (95% CI)¹,²	Alive N = 134 (95% CI)¹,²	Death
Characteristic	Overall N = 205 (95% CI)¹,²	Alive N = 134 (95% CI)¹,²	Melanoma death N = 57 (95% CI)¹,²	Non-melanoma death N = 14 (95% CI)¹,²
sex
Male	126 (61%) (55%, 68%)	91 (68%) (60%, 76%)	28 (49%) (35%, 63%)	7 (50%) (20%, 80%)
Female	79 (39%) (32%, 45%)	43 (32%) (24%, 40%)	29 (51%) (37%, 65%)	7 (50%) (20%, 80%)
age	52 (17) (50, 55)	50 (16) (47, 53)	55 (18) (50, 60)	65 (11) (59, 72)
thickness	2.92 (2.96) (2.5, 3.3)	2.24 (2.33) (1.8, 2.6)	4.31 (3.57) (3.4, 5.3)	3.72 (3.63) (1.6, 5.8)
ulcer
Absent	115 (56%) (49%, 63%)	92 (69%) (60%, 77%)	16 (28%) (16%, 41%)	7 (50%) (20%, 80%)
Present	90 (44%) (37%, 51%)	42 (31%) (23%, 40%)	41 (72%) (59%, 84%)	7 (50%) (20%, 80%)
¹ n (%); Mean (SD)
² CI = Confidence Interval
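
The intervals for the categorical rows can be reproduced by hand. The sketch below assumes a Wald interval with a continuity correction of 1/(2n); that correction is inferred from the numbers in the output above, not taken from the workshop text. The function name wald_ci is hypothetical.

```r
# Wald interval for a proportion with a continuity correction of 1/(2n)
# (the correction term is an assumption inferred from the printed output)
wald_ci <- function(x, n, conf.level = 0.95) {
  p  <- x / n
  z  <- qnorm(1 - (1 - conf.level) / 2)
  hw <- z * sqrt(p * (1 - p) / n) + 1 / (2 * n)  # half-width of the interval
  c(lower = max(0, p - hw), upper = min(1, p + hw))
}

round(100 * wald_ci(126, 205))  # Male, Overall: 55 68
round(100 * wald_ci(7, 14))     # Male, Non-melanoma death: 20 80
```

The small Non-melanoma death group (N = 14) shows how wide these intervals become when a stratum has few observations.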

Cross table of categorical variables

tbl_cross(row = ses, col = honors,   percent = "row", data = hsb) |>
  add_p()

	honors		Total	p-value¹
	enrolled	not enrolled	Total	p-value¹
ses				<0.001
high	26 (45%)	32 (55%)	58 (100%)
low	11 (23%)	36 (77%)	47 (100%)
middle	16 (17%)	79 (83%)	95 (100%)
Total	53 (27%)	147 (74%)	200 (100%)
¹ Pearson’s Chi-squared test

# Setting theme "Compact"
theme_gtsummary_compact()

## Setting theme "Compact"

tbl_cross(row = ses, col = honors,   percent = "row", data = hsb) |>
  add_p()

	honors		Total	p-value¹
	enrolled	not enrolled	Total	p-value¹
ses				<0.001
high	26 (45%)	32 (55%)	58 (100%)
low	11 (23%)	36 (77%)	47 (100%)
middle	16 (17%)	79 (83%)	95 (100%)
Total	53 (27%)	147 (74%)	200 (100%)
¹ Pearson’s Chi-squared test

Formatted table of regression model results

#Running a logistic regression model
m2 <- glm(honors ~ math + ses, family = binomial(link = "logit"), data= hsb)
#gtsummary default theme
reset_gtsummary_theme()
tab1.glm <- tbl_regression(m2, exponentiate = TRUE)
tab1.glm

Characteristic	OR¹	95% CI¹	p-value
math	0.84	0.79, 0.88	<0.001
ses
high	—	—
low	1.05	0.35, 3.10	>0.9
middle	3.44	1.43, 8.58	0.007
¹ OR = Odds Ratio, CI = Confidence Interval

#Set theme to Journal of the American Medical Association—JAMA.
theme_gtsummary_journal(journal = "jama")

## Setting theme "JAMA"

tab1.glm

Characteristic	OR¹	95% CI¹	p-value
math	0.84	0.79, 0.88	<0.001
ses
high	—	—
low	1.05	0.35, 3.10	>0.9
middle	3.44	1.43, 8.58	0.007
¹ OR = Odds Ratio, CI = Confidence Interval

#We can convert our table to package gt and modify the table output.
tab1.glm |>
  #add overall p-values for ses
  add_global_p() |>
  #bold p-values less than 0.01
  bold_p(t = 0.01) |>
  #convert our table to package gt 
  as_gt() |>
  #add source note from package gt
  #md() interpret input text as Markdown-formatted text
  tab_source_note(md("*This data is simulated*"))

Characteristic	OR¹	95% CI¹	p-value
math	0.84	0.79, 0.88	<0.001
ses			0.011
high	—	—
low	1.05	0.35, 3.10
middle	3.44	1.43, 8.58
This data is simulated
¹ OR = Odds Ratio, CI = Confidence Interval

inline_text()

Reproducible reports are an important part of good practice. We often need to report the results from a table in the text of an R Markdown report. Inline reporting has been made simple with inline_text().

The inline_text.tbl_regression has the following format:

inline_text(
  x,
  variable,
  level = NULL,
  pattern = "{estimate} ({conf.level*100}% CI {conf.low}, {conf.high}; {p.value})",
  estimate_fun = x$inputs$estimate_fun,
  pvalue_fun = label_style_pvalue(prepend_p = TRUE),
  ...
)

To report a result from a gtsummary table in the text, call inline_text() in inline R code, that is, between two backticks with a leading r: ` r inline_text() `.

For example, to report the OR for math from the regression table, we can type:

For every unit increase in math score, we expect on average the odds of not being enrolled in the honors program to change by a factor of `r inline_text(tab1.glm, variable = math, pattern = "{estimate}; 95% CI ({conf.low}, {conf.high})")`, keeping ses constant.

In the report it will appear like this:

For every unit increase in math score, we expect on average the odds of not being enrolled in the honors program to change by a factor of 0.84; 95% CI (0.79, 0.88), keeping ses constant.

Reference

Sjoberg DD, Whiting K, Curry M, Lavery JA, Larmarange J. Reproducible summary tables with the gtsummary package. The R Journal 2021;13:570–80. https://doi.org/10.32614/RJ-2021-053.