A table is a structured arrangement of data, typically organized in rows and columns. It helps you see and compare data easily, making it simpler to understand and communicate the results.
Key components of a statistical table:
Title: Clearly describes the content and context of the table.
Rows: Represent different categories, groups, or individual data points.
Columns: Indicate variables or measures being reported.
Cells: Contain the actual data values corresponding to the intersection of rows and columns.
Headings: Labels for rows and columns to clarify the data being presented.
Footnotes: Additional information or explanations about the data.
We will explore how to effectively generate and present data using R, with a focus on utilizing RMarkdown for creating professional reports.
In this workshop we will cover the R packages kableExtra, flextable, gt, gtExtras, DT, sjPlot, and gtsummary.
A data.frame in R is a data structure used to store data in a table format. It is one of the most common and versatile structures in R, allowing columns of different types (e.g., numeric, character, factor) to be stored in a single object.
There are two generic functions in R that are used to display output in the console.
The print() function is used to display the contents of an object in the console. It’s the most basic way to output data or results in R.
The summary() function provides a quick overview of the main statistical features of an object.
Both functions above work on various object types, such as vectors, data frames, and models.
Note: Implicit Printing: When we type an object’s name and run it, R internally calls the print() function to display the object’s contents. This is why you see the output in the console even if you don’t explicitly use print().
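As a small illustration of the note above, the following sketch uses a made-up numeric vector (the name scores is hypothetical and not part of the workshop data) to compare explicit and implicit printing, and to show that summary() returns an object that can be stored.

```r
# A small made-up vector of test scores (illustrative only)
scores <- c(34, 39, 42, 50, 61)

# Explicit printing
print(scores)

# Implicit printing: at the top level R calls print() for us
scores

# summary() returns an object that can be stored and printed later
s <- summary(scores)
print(s)

# Inside a loop, implicit printing does not happen,
# so print() has to be called explicitly
for (x in scores[1:2]) print(x)
```

Because implicit printing only happens at the top level, tables created inside loops or functions need an explicit print() call to appear in the output.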
The first data set we use in this workshop is hsbdemo, a sample of high school performance data for 200 students.
The first step in any statistical analysis is to understand our data.
Note: The datasets used in this workshop are not real and are intended solely to demonstrate statistical analysis.
#Read the data
hsb <- read.csv("https://stats.idre.ucla.edu/stat/data/hsbdemo.csv")
#Names of columns
names(hsb)
## [1] "id" "female" "ses" "schtyp" "prog" "read" "write"
## [8] "math" "science" "socst" "honors" "awards" "cid"
#Structure of data.frame
str(hsb)
## 'data.frame': 200 obs. of 13 variables:
## $ id : int 45 108 15 67 153 51 164 133 2 53 ...
## $ female : chr "female" "male" "male" "male" ...
## $ ses : chr "low" "middle" "high" "low" ...
## $ schtyp : chr "public" "public" "public" "public" ...
## $ prog : chr "vocation" "general" "vocation" "vocation" ...
## $ read : int 34 34 39 37 39 42 31 50 39 34 ...
## $ write : int 35 33 39 37 31 36 36 31 41 37 ...
## $ math : int 41 41 44 42 40 42 46 40 33 46 ...
## $ science: int 29 36 26 33 39 31 39 34 42 39 ...
## $ socst : int 26 36 42 32 51 39 46 31 41 31 ...
## $ honors : chr "not enrolled" "not enrolled" "not enrolled" "not enrolled" ...
## $ awards : int 0 0 0 0 0 0 0 0 0 0 ...
## $ cid : int 1 1 1 1 1 1 1 1 1 1 ...
#Change categorical variables from character to factors
hsb <- within(hsb,{
female <- factor(female)
ses <- factor(ses)
schtyp <- factor(schtyp)
prog <- factor(prog)
honors <- factor(honors)
})
#Print first 6 rows of data
hsb6 <- head(hsb)
print(hsb6)
## id female ses schtyp prog read write math science socst honors
## 1 45 female low public vocation 34 35 41 29 26 not enrolled
## 2 108 male middle public general 34 33 41 36 36 not enrolled
## 3 15 male high public vocation 39 39 44 26 42 not enrolled
## 4 67 male low public vocation 37 37 42 33 32 not enrolled
## 5 153 male middle public vocation 39 31 40 39 51 not enrolled
## 6 51 female high public general 42 36 42 31 39 not enrolled
## awards cid
## 1 0 1
## 2 0 1
## 3 0 1
## 4 0 1
## 5 0 1
## 6 0 1
By printing the first 6 rows of the data, we created a tabular display of the first 6 observations.
We can use the summary() function to report summary statistics. For example, we can get the summary statistics of students’ scores and honors enrollment.
#Summary statistics
summary(hsb[c("read", "write", "math", "science", "socst", "honors")])
## read write math science
## Min. :28.00 Min. :31.00 Min. :33.00 Min. :26.00
## 1st Qu.:44.00 1st Qu.:45.75 1st Qu.:45.00 1st Qu.:44.00
## Median :50.00 Median :54.00 Median :52.00 Median :53.00
## Mean :52.23 Mean :52.77 Mean :52.65 Mean :51.85
## 3rd Qu.:60.00 3rd Qu.:60.00 3rd Qu.:59.00 3rd Qu.:58.00
## Max. :76.00 Max. :67.00 Max. :75.00 Max. :74.00
## socst honors
## Min. :26.00 enrolled : 53
## 1st Qu.:46.00 not enrolled:147
## Median :52.00
## Mean :52.41
## 3rd Qu.:61.00
## Max. :71.00
table() and xtabs()
The table() function from base R creates frequency tables that summarize categorical data. We can also use the function xtabs() from the stats package, which takes a formula and a data argument.
In our data we want to make a cross-tabulation (contingency table) for the variables ses and honors.
#Two-way contingency table with table()
tab1 <- table(hsb$ses, hsb$honors)
tab1
##
## enrolled not enrolled
## high 26 32
## low 11 36
## middle 16 79
#Proportional table
prop.table(tab1)
##
## enrolled not enrolled
## high 0.130 0.160
## low 0.055 0.180
## middle 0.080 0.395
#Proportional table by row
prop.table(tab1, margin = 1)
##
## enrolled not enrolled
## high 0.4482759 0.5517241
## low 0.2340426 0.7659574
## middle 0.1684211 0.8315789
#Two-way contingency table with xtabs()
tab2 <- xtabs(~ ses + honors, data = hsb)
tab2
## honors
## ses enrolled not enrolled
## high 26 32
## low 11 36
## middle 16 79
#Proportional table
prop.table(tab2)
## honors
## ses enrolled not enrolled
## high 0.130 0.160
## low 0.055 0.180
## middle 0.080 0.395
#Proportional table by row
prop.table(tab2, margin = 1)
## honors
## ses enrolled not enrolled
## high 0.4482759 0.5517241
## low 0.2340426 0.7659574
## middle 0.1684211 0.8315789
#Summary on a table object will perform a chi-squared test
summary(tab2)
## Call: xtabs(formula = ~ses + honors, data = hsb)
## Number of cases in table: 200
## Number of factors: 2
## Test for independence of all factors:
## Chisq = 14.783, df = 2, p-value = 0.0006164
#Three-way cross tab
tab3 <- xtabs(~ ses + honors + female, data = hsb)
tab3
## , , female = female
##
## honors
## ses enrolled not enrolled
## high 15 14
## low 10 22
## middle 10 38
##
## , , female = male
##
## honors
## ses enrolled not enrolled
## high 11 18
## low 1 14
## middle 6 41
ftable(tab3)
## female female male
## ses honors
## high enrolled 15 11
## not enrolled 14 18
## low enrolled 10 1
## not enrolled 22 14
## middle enrolled 10 6
## not enrolled 38 41
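In the tables above the levels of ses appear in alphabetical order (high, low, middle), and no totals are shown. The sketch below, which uses a small made-up data frame (toy is a hypothetical name, not the hsb data), shows one way to set a meaningful level order and add margins with addmargins().

```r
# Toy data (made up for illustration; not the hsb data)
toy <- data.frame(
  ses    = c("low", "low", "middle", "middle", "high", "high", "high"),
  honors = c("no",  "yes", "no",     "no",     "yes",  "yes",  "no")
)

# Put the levels in a meaningful order instead of alphabetical order
toy$ses <- factor(toy$ses, levels = c("low", "middle", "high"))

# Two-way table of counts
tab <- xtabs(~ ses + honors, data = toy)

# Add row and column totals
tab_tot <- addmargins(tab)
tab_tot

# Row percentages, rounded to one decimal place
round(100 * prop.table(tab, margin = 1), 1)
```

The same approach should carry over to the hsb data by reordering the levels of ses before calling xtabs().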
Regression models are often used to understand the relationship between a dependent variable and one or more independent variables.
In R we use the summary() function to extract and report the results of a regression model.
As an example, we use the hsb data to regress math score on read score, write score, and prog.
#Run regression of math on read, write, and prog
m1 <- lm(math ~ read + write + prog, data = hsb)
lm.result <- summary(m1)
lm.result
##
## Call:
## lm(formula = math ~ read + write + prog, data = hsb)
##
## Residuals:
## Min 1Q Median 3Q Max
## -19.257 -4.564 -0.211 4.271 17.527
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 19.20202 3.35561 5.722 3.91e-08 ***
## read 0.37186 0.05685 6.541 5.24e-10 ***
## write 0.29591 0.06149 4.812 2.98e-06 ***
## proggeneral -2.87185 1.18968 -2.414 0.01670 *
## progvocation -3.79862 1.23942 -3.065 0.00249 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.408 on 195 degrees of freedom
## Multiple R-squared: 0.5415, Adjusted R-squared: 0.5321
## F-statistic: 57.57 on 4 and 195 DF, p-value: < 2.2e-16
#Extracting coefficients table (it is a matrix)
lm.result$coefficients
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 19.2020151 3.35561280 5.722357 3.911030e-08
## read 0.3718589 0.05684928 6.541138 5.240877e-10
## write 0.2959093 0.06149161 4.812190 2.984266e-06
## proggeneral -2.8718518 1.18968055 -2.413969 1.670404e-02
## progvocation -3.7986171 1.23941526 -3.064846 2.486293e-03
#Adding Confidence interval for coefficients
lm.table <- cbind(lm.result$coefficients, confint(m1))
#Changing the names of columns
colnames(lm.table)[c(5,6)] <- c("LL", "UL")
#Round number to 4 digits and print
round(lm.table, 4)
## Estimate Std. Error t value Pr(>|t|) LL UL
## (Intercept) 19.2020 3.3556 5.7224 0.0000 12.5841 25.8200
## read 0.3719 0.0568 6.5411 0.0000 0.2597 0.4840
## write 0.2959 0.0615 4.8122 0.0000 0.1746 0.4172
## proggeneral -2.8719 1.1897 -2.4140 0.0167 -5.2181 -0.5256
## progvocation -3.7986 1.2394 -3.0648 0.0025 -6.2430 -1.3542
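The coefficient table above is a matrix with the term names stored as row names. A data frame with the term names in their own column is often easier to pass on to table packages. The following is a minimal sketch of that conversion, assuming the built-in mtcars data and hypothetical object names (fit, coef_df) rather than the workshop model.

```r
# Fit a small model on the built-in mtcars data (illustrative only)
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Coefficient matrix and confidence limits
est <- coef(summary(fit))
ci  <- confint(fit)

# Combine into a data frame with an explicit "term" column
coef_df <- data.frame(
  term = rownames(est),
  est,
  LL = ci[, 1],
  UL = ci[, 2],
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Drop the row names, since the terms now have their own column
rownames(coef_df) <- NULL

# Round only the numeric columns
num_cols <- vapply(coef_df, is.numeric, logical(1))
coef_df[num_cols] <- round(coef_df[num_cols], 4)
coef_df
```

Setting check.names = FALSE keeps column names such as "Std. Error" and "Pr(>|t|)" unchanged.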
Reproducibility: Embedding code directly in the document ensures that tables can be easily reproduced by anyone, reducing errors from manual copying.
Dynamic Updates: Changes to the data or analysis are automatically reflected in the tables when the document is re-rendered, eliminating the need for manual updates.
Automation: RMarkdown generates and formats tables automatically, streamlining the process and avoiding repetitive tasks like reformatting.
Consistency: Tables maintain a consistent format and style throughout the document, which is particularly useful for large reports.
Customization and Formatting: Advanced table customization options allow for professional and polished presentation, without needing to rely on external tools for formatting.
In summary, using RMarkdown to create tables ensures automation, reproducibility, and consistency, while also providing powerful customization and formatting options that are not available with simple copy-pasting from the console.
In the rest of the workshop, we introduce some of these packages with examples.
Advantages of knitr::kable and kableExtra
Simplicity: knitr::kable offers a straightforward way to create clean tables with minimal code. It’s easy to use for beginners and perfect for simple tables that don’t require extensive customization.
Integration with RMarkdown: kable() is designed to work seamlessly with RMarkdown, making it easy to generate tables that fit well within dynamic documents.
Flexibility with kableExtra: When paired with kableExtra, kable becomes highly customizable. You can add advanced features like multi-row headers, colors, borders, alignment adjustments, column spanning, custom styling, and footnotes.
Theme: kableExtra offers some alternative HTML table themes other than the default bootstrap theme.
The kable() function in package knitr is a very simple table generator. It only generates tables for strictly rectangular data such as matrices and data frames.
This function does have a large number of arguments for you to customize the appearance of tables:
kable(x, format, digits = getOption("digits"), row.names = NA, col.names = NA, align, caption = NULL, label = NULL, format.args = list(), escape = TRUE, ...)
format is a character string. Possible values are latex, html, pipe (Pandoc’s pipe tables), ... .
If you only need one table format that is not the default format for a document, you can set the global R option knitr.table.format, e.g.,
options(knitr.table.format = "html")
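To make a few of these arguments concrete, here is a minimal sketch that assumes the knitr package is installed. The object names (st, tab), the column labels, and the caption are illustrative choices, not part of the workshop material.

```r
library(knitr)

# First rows of the built-in state.x77 matrix as a data frame
st <- data.frame(state.x77)[1:3, c("Population", "Income", "Life.Exp")]

# Pipe-format table with rounded numbers, custom column names, and a caption
tab <- kable(
  st,
  format    = "pipe",
  digits    = 1,
  col.names = c("Population (1000s)", "Income", "Life expectancy"),
  caption   = "Three states from state.x77"
)
tab
```

With format = "pipe" the result is plain text that can be inspected in the console, which makes it a convenient format for checking what kable() produces before styling it.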
We are using the built-in dataset state.x77.
state7 <- data.frame(state.x77)[1:7,]
knitr::kable(head(state7), format = "html")
Population | Income | Illiteracy | Life.Exp | Murder | HS.Grad | Frost | Area | |
---|---|---|---|---|---|---|---|---|
Alabama | 3615 | 3624 | 2.1 | 69.05 | 15.1 | 41.3 | 20 | 50708 |
Alaska | 365 | 6315 | 1.5 | 69.31 | 11.3 | 66.7 | 152 | 566432 |
Arizona | 2212 | 4530 | 1.8 | 70.55 | 7.8 | 58.1 | 15 | 113417 |
Arkansas | 2110 | 3378 | 1.9 | 70.66 | 10.1 | 39.9 | 65 | 51945 |
California | 21198 | 5114 | 1.1 | 71.71 | 10.3 | 62.6 | 20 | 156361 |
Colorado | 2541 | 4884 | 0.7 | 72.06 | 6.8 | 63.9 | 166 | 103766 |
If we use the pipe format, the result is a Pandoc pipe table, and how it is rendered depends on the R Markdown output format.
For example, since these slides were created with ioslides, if format is not specified or format = "pipe" is used, the output table will look like this:
my.table <- knitr::kable(state7)
Nice!
To learn more about knitr::kable() and its options, see its help page (?knitr::kable).
Package kableExtra is an addition to knitr::kable(). The goal of the R package kableExtra is to help you build common complex tables and manipulate table styles. It imports the pipe symbol %>% from magrittr (it also works with the base R pipe, |>) and names its functions as verbs, so you can add “layers” to a kable output in a way that is similar to ggplot2.
The basic HTML output is just a plain HTML table without any styling.
#plain HTML
kbl(state7)
Population | Income | Illiteracy | Life.Exp | Murder | HS.Grad | Frost | Area | |
---|---|---|---|---|---|---|---|---|
Alabama | 3615 | 3624 | 2.1 | 69.05 | 15.1 | 41.3 | 20 | 50708 |
Alaska | 365 | 6315 | 1.5 | 69.31 | 11.3 | 66.7 | 152 | 566432 |
Arizona | 2212 | 4530 | 1.8 | 70.55 | 7.8 | 58.1 | 15 | 113417 |
Arkansas | 2110 | 3378 | 1.9 | 70.66 | 10.1 | 39.9 | 65 | 51945 |
California | 21198 | 5114 | 1.1 | 71.71 | 10.3 | 62.6 | 20 | 156361 |
Colorado | 2541 | 4884 | 0.7 | 72.06 | 6.8 | 63.9 | 166 | 103766 |
Connecticut | 3100 | 5348 | 1.1 | 72.48 | 3.1 | 56.0 | 139 | 4862 |
kable_styling() will automatically apply the Twitter Bootstrap theme to the table.
To see more options for this function, please check the help file: ?kable_styling
state7 %>%
kbl() %>%
#twitter bootstrap theme
kable_styling()
Population | Income | Illiteracy | Life.Exp | Murder | HS.Grad | Frost | Area | |
---|---|---|---|---|---|---|---|---|
Alabama | 3615 | 3624 | 2.1 | 69.05 | 15.1 | 41.3 | 20 | 50708 |
Alaska | 365 | 6315 | 1.5 | 69.31 | 11.3 | 66.7 | 152 | 566432 |
Arizona | 2212 | 4530 | 1.8 | 70.55 | 7.8 | 58.1 | 15 | 113417 |
Arkansas | 2110 | 3378 | 1.9 | 70.66 | 10.1 | 39.9 | 65 | 51945 |
California | 21198 | 5114 | 1.1 | 71.71 | 10.3 | 62.6 | 20 | 156361 |
Colorado | 2541 | 4884 | 0.7 | 72.06 | 6.8 | 63.9 | 166 | 103766 |
Connecticut | 3100 | 5348 | 1.1 | 72.48 | 3.1 | 56.0 | 139 | 4862 |
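As a sketch of what the options of kable_styling() can look like in practice, the example below assumes the kableExtra package is installed and uses illustrative object names (st, styled). The particular option values are arbitrary.

```r
library(kableExtra)

# First five states and four columns of the built-in state.x77 data
st <- data.frame(state.x77)[1:5, 1:4]

styled <- kbl(st, format = "html", caption = "First five states") |>
  kable_styling(
    bootstrap_options = c("striped", "hover", "condensed"),  # Bootstrap table classes
    full_width = FALSE,   # do not stretch the table across the page
    position   = "left",  # align the table to the left
    font_size  = 12
  )
styled
```

The bootstrap_options argument maps to Bootstrap table classes, so the visible effect depends on the HTML output format loading Bootstrap.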
kableExtra also offers 6 alternative HTML table themes in addition to the default bootstrap theme. They are: kable_paper, kable_classic, kable_classic_2, kable_minimal, kable_material, and kable_material_dark.
We can also use options in kable_styling() to customize the output table.
Here are some examples:
state7 %>%
kbl() %>%
#paper theme with hover and full_width = F
kable_paper("hover", full_width = F)
Population | Income | Illiteracy | Life.Exp | Murder | HS.Grad | Frost | Area | |
---|---|---|---|---|---|---|---|---|
Alabama | 3615 | 3624 | 2.1 | 69.05 | 15.1 | 41.3 | 20 | 50708 |
Alaska | 365 | 6315 | 1.5 | 69.31 | 11.3 | 66.7 | 152 | 566432 |
Arizona | 2212 | 4530 | 1.8 | 70.55 | 7.8 | 58.1 | 15 | 113417 |
Arkansas | 2110 | 3378 | 1.9 | 70.66 | 10.1 | 39.9 | 65 | 51945 |
California | 21198 | 5114 | 1.1 | 71.71 | 10.3 | 62.6 | 20 | 156361 |
Colorado | 2541 | 4884 | 0.7 | 72.06 | 6.8 | 63.9 | 166 | 103766 |
Connecticut | 3100 | 5348 | 1.1 | 72.48 | 3.1 | 56.0 | 139 | 4862 |
Full width
state7 %>%
kbl(caption = "Recreating booktabs style table") %>%
# classic theme and other options
kable_classic(full_width = F, html_font = "Cambria", position = "left")
Population | Income | Illiteracy | Life.Exp | Murder | HS.Grad | Frost | Area | |
---|---|---|---|---|---|---|---|---|
Alabama | 3615 | 3624 | 2.1 | 69.05 | 15.1 | 41.3 | 20 | 50708 |
Alaska | 365 | 6315 | 1.5 | 69.31 | 11.3 | 66.7 | 152 | 566432 |
Arizona | 2212 | 4530 | 1.8 | 70.55 | 7.8 | 58.1 | 15 | 113417 |
Arkansas | 2110 | 3378 | 1.9 | 70.66 | 10.1 | 39.9 | 65 | 51945 |
California | 21198 | 5114 | 1.1 | 71.71 | 10.3 | 62.6 | 20 | 156361 |
Colorado | 2541 | 4884 | 0.7 | 72.06 | 6.8 | 63.9 | 166 | 103766 |
Connecticut | 3100 | 5348 | 1.1 | 72.48 | 3.1 | 56.0 | 139 | 4862 |
striped
state7 %>%
kbl() %>%
#material theme with striped rows
kable_material(lightable_options= c("striped"))
Population | Income | Illiteracy | Life.Exp | Murder | HS.Grad | Frost | Area | |
---|---|---|---|---|---|---|---|---|
Alabama | 3615 | 3624 | 2.1 | 69.05 | 15.1 | 41.3 | 20 | 50708 |
Alaska | 365 | 6315 | 1.5 | 69.31 | 11.3 | 66.7 | 152 | 566432 |
Arizona | 2212 | 4530 | 1.8 | 70.55 | 7.8 | 58.1 | 15 | 113417 |
Arkansas | 2110 | 3378 | 1.9 | 70.66 | 10.1 | 39.9 | 65 | 51945 |
California | 21198 | 5114 | 1.1 | 71.71 | 10.3 | 62.6 | 20 | 156361 |
Colorado | 2541 | 4884 | 0.7 | 72.06 | 6.8 | 63.9 | 166 | 103766 |
Connecticut | 3100 | 5348 | 1.1 | 72.48 | 3.1 | 56.0 | 139 | 4862 |
Column / Row Specification
kbl(state7) %>%
#paper theme
kable_paper(full_width = F) %>%
#Make first column bold and add border
column_spec(1, bold = T, border_right = T) %>%
#Make column 9 width larger and background yellow
column_spec(9, width = "6em", background = "yellow")
Population | Income | Illiteracy | Life.Exp | Murder | HS.Grad | Frost | Area | |
---|---|---|---|---|---|---|---|---|
Alabama | 3615 | 3624 | 2.1 | 69.05 | 15.1 | 41.3 | 20 | 50708 |
Alaska | 365 | 6315 | 1.5 | 69.31 | 11.3 | 66.7 | 152 | 566432 |
Arizona | 2212 | 4530 | 1.8 | 70.55 | 7.8 | 58.1 | 15 | 113417 |
Arkansas | 2110 | 3378 | 1.9 | 70.66 | 10.1 | 39.9 | 65 | 51945 |
California | 21198 | 5114 | 1.1 | 71.71 | 10.3 | 62.6 | 20 | 156361 |
Colorado | 2541 | 4884 | 0.7 | 72.06 | 6.8 | 63.9 | 166 | 103766 |
Connecticut | 3100 | 5348 | 1.1 | 72.48 | 3.1 | 56.0 | 139 | 4862 |
kbl(state7) %>%
#paper theme
kable_paper(full_width = F) %>%
#Conditional formatting column 2
column_spec(2, color = spec_color(state7$Population, palette = c("black", "red"))) %>%
#Conditional formatting background column 4 text white
column_spec(4, color = "white",
background = spec_color(state7$Illiteracy<=1.5, palette = c("red", "green"))) %>%
#Rotate the header row (row 0) by -45 degrees
row_spec(0, angle = -45)
Population | Income | Illiteracy | Life.Exp | Murder | HS.Grad | Frost | Area | |
---|---|---|---|---|---|---|---|---|
Alabama | 3615 | 3624 | 2.1 | 69.05 | 15.1 | 41.3 | 20 | 50708 |
Alaska | 365 | 6315 | 1.5 | 69.31 | 11.3 | 66.7 | 152 | 566432 |
Arizona | 2212 | 4530 | 1.8 | 70.55 | 7.8 | 58.1 | 15 | 113417 |
Arkansas | 2110 | 3378 | 1.9 | 70.66 | 10.1 | 39.9 | 65 | 51945 |
California | 21198 | 5114 | 1.1 | 71.71 | 10.3 | 62.6 | 20 | 156361 |
Colorado | 2541 | 4884 | 0.7 | 72.06 | 6.8 | 63.9 | 166 | 103766 |
Connecticut | 3100 | 5348 | 1.1 | 72.48 | 3.1 | 56.0 | 139 | 4862 |
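Grouped headers and footnotes were mentioned earlier among the features of kableExtra. The following sketch, assuming kableExtra is installed, shows one possible use of add_header_above() and footnote(). The group labels ("Economy", "Other") and the footnote text are made up for illustration.

```r
library(kableExtra)

# Five states and four columns of the built-in state.x77 data
st <- data.frame(state.x77)[1:5, c("Population", "Income", "Murder", "Frost")]

tbl <- kbl(st, format = "html") |>
  kable_classic(full_width = FALSE) |>
  # Group header: one blank cell for the row names, then two groups of two columns
  add_header_above(c(" " = 1, "Economy" = 2, "Other" = 2)) |>
  # General footnote below the table
  footnote(general = "Rows are the first five states in state.x77.")
tbl
```

The numbers in add_header_above() are column spans and must add up to the number of displayed columns, including the row-name column.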
One of the nice features of kableExtra, which is only available for HTML format, is the scroll box. If you have a huge table and you want to include it in your website or HTML document but don’t want to use a lot of space, using a scroll box is a good solution.
kbl(state.x77) %>%
kable_paper() %>%
#Add scroll bar
scroll_box(width = "400px", height = "200px")
Population | Income | Illiteracy | Life Exp | Murder | HS Grad | Frost | Area | |
---|---|---|---|---|---|---|---|---|
Alabama | 3615 | 3624 | 2.1 | 69.05 | 15.1 | 41.3 | 20 | 50708 |
Alaska | 365 | 6315 | 1.5 | 69.31 | 11.3 | 66.7 | 152 | 566432 |
Arizona | 2212 | 4530 | 1.8 | 70.55 | 7.8 | 58.1 | 15 | 113417 |
Arkansas | 2110 | 3378 | 1.9 | 70.66 | 10.1 | 39.9 | 65 | 51945 |
California | 21198 | 5114 | 1.1 | 71.71 | 10.3 | 62.6 | 20 | 156361 |
Colorado | 2541 | 4884 | 0.7 | 72.06 | 6.8 | 63.9 | 166 | 103766 |
Connecticut | 3100 | 5348 | 1.1 | 72.48 | 3.1 | 56.0 | 139 | 4862 |
Delaware | 579 | 4809 | 0.9 | 70.06 | 6.2 | 54.6 | 103 | 1982 |
Florida | 8277 | 4815 | 1.3 | 70.66 | 10.7 | 52.6 | 11 | 54090 |
Georgia | 4931 | 4091 | 2.0 | 68.54 | 13.9 | 40.6 | 60 | 58073 |
Hawaii | 868 | 4963 | 1.9 | 73.60 | 6.2 | 61.9 | 0 | 6425 |
Idaho | 813 | 4119 | 0.6 | 71.87 | 5.3 | 59.5 | 126 | 82677 |
Illinois | 11197 | 5107 | 0.9 | 70.14 | 10.3 | 52.6 | 127 | 55748 |
Indiana | 5313 | 4458 | 0.7 | 70.88 | 7.1 | 52.9 | 122 | 36097 |
Iowa | 2861 | 4628 | 0.5 | 72.56 | 2.3 | 59.0 | 140 | 55941 |
Kansas | 2280 | 4669 | 0.6 | 72.58 | 4.5 | 59.9 | 114 | 81787 |
Kentucky | 3387 | 3712 | 1.6 | 70.10 | 10.6 | 38.5 | 95 | 39650 |
Louisiana | 3806 | 3545 | 2.8 | 68.76 | 13.2 | 42.2 | 12 | 44930 |
Maine | 1058 | 3694 | 0.7 | 70.39 | 2.7 | 54.7 | 161 | 30920 |
Maryland | 4122 | 5299 | 0.9 | 70.22 | 8.5 | 52.3 | 101 | 9891 |
Massachusetts | 5814 | 4755 | 1.1 | 71.83 | 3.3 | 58.5 | 103 | 7826 |
Michigan | 9111 | 4751 | 0.9 | 70.63 | 11.1 | 52.8 | 125 | 56817 |
Minnesota | 3921 | 4675 | 0.6 | 72.96 | 2.3 | 57.6 | 160 | 79289 |
Mississippi | 2341 | 3098 | 2.4 | 68.09 | 12.5 | 41.0 | 50 | 47296 |
Missouri | 4767 | 4254 | 0.8 | 70.69 | 9.3 | 48.8 | 108 | 68995 |
Montana | 746 | 4347 | 0.6 | 70.56 | 5.0 | 59.2 | 155 | 145587 |
Nebraska | 1544 | 4508 | 0.6 | 72.60 | 2.9 | 59.3 | 139 | 76483 |
Nevada | 590 | 5149 | 0.5 | 69.03 | 11.5 | 65.2 | 188 | 109889 |
New Hampshire | 812 | 4281 | 0.7 | 71.23 | 3.3 | 57.6 | 174 | 9027 |
New Jersey | 7333 | 5237 | 1.1 | 70.93 | 5.2 | 52.5 | 115 | 7521 |
New Mexico | 1144 | 3601 | 2.2 | 70.32 | 9.7 | 55.2 | 120 | 121412 |
New York | 18076 | 4903 | 1.4 | 70.55 | 10.9 | 52.7 | 82 | 47831 |
North Carolina | 5441 | 3875 | 1.8 | 69.21 | 11.1 | 38.5 | 80 | 48798 |
North Dakota | 637 | 5087 | 0.8 | 72.78 | 1.4 | 50.3 | 186 | 69273 |
Ohio | 10735 | 4561 | 0.8 | 70.82 | 7.4 | 53.2 | 124 | 40975 |
Oklahoma | 2715 | 3983 | 1.1 | 71.42 | 6.4 | 51.6 | 82 | 68782 |
Oregon | 2284 | 4660 | 0.6 | 72.13 | 4.2 | 60.0 | 44 | 96184 |
Pennsylvania | 11860 | 4449 | 1.0 | 70.43 | 6.1 | 50.2 | 126 | 44966 |
Rhode Island | 931 | 4558 | 1.3 | 71.90 | 2.4 | 46.4 | 127 | 1049 |
South Carolina | 2816 | 3635 | 2.3 | 67.96 | 11.6 | 37.8 | 65 | 30225 |
South Dakota | 681 | 4167 | 0.5 | 72.08 | 1.7 | 53.3 | 172 | 75955 |
Tennessee | 4173 | 3821 | 1.7 | 70.11 | 11.0 | 41.8 | 70 | 41328 |
Texas | 12237 | 4188 | 2.2 | 70.90 | 12.2 | 47.4 | 35 | 262134 |
Utah | 1203 | 4022 | 0.6 | 72.90 | 4.5 | 67.3 | 137 | 82096 |
Vermont | 472 | 3907 | 0.6 | 71.64 | 5.5 | 57.1 | 168 | 9267 |
Virginia | 4981 | 4701 | 1.4 | 70.08 | 9.5 | 47.8 | 85 | 39780 |
Washington | 3559 | 4864 | 0.6 | 71.72 | 4.3 | 63.5 | 32 | 66570 |
West Virginia | 1799 | 3617 | 1.4 | 69.48 | 6.7 | 41.6 | 100 | 24070 |
Wisconsin | 4589 | 4468 | 0.7 | 72.48 | 3.0 | 54.5 | 149 | 54464 |
Wyoming | 376 | 4566 | 0.6 | 70.29 | 6.9 | 62.9 | 173 | 97203 |
To learn more about table styles and options in kable_styling(), see the help file (?kable_styling) and the kableExtra package vignettes.
flextable
flextable is designed to create and format tables that can be easily exported into Word and PowerPoint documents. It allows users to create richly formatted tables with features like text formatting, colors, borders, and alignment, making it ideal for generating professional-looking tables in document reports.
Advantages of flextable Compared to Other Packages
Extensive Customization: flextable offers detailed control over the formatting of tables, including text alignment, fonts, colors, borders, and cell-level styling. This level of customization goes beyond what simpler packages like kable can offer, allowing for professional and polished tables.
Integration with Word and PowerPoint: One of the standout features of flextable is its seamless integration with Microsoft Word and PowerPoint through the officer package. You can directly export beautifully formatted tables into these documents, making it ideal for users who frequently work with Word and PowerPoint.
Conditional Formatting: The package allows for conditional formatting based on the values in the table, which is useful for highlighting key data points or making tables more informative visually.
Predefined Themes: flextable offers built-in themes that provide consistent, aesthetically pleasing styles for tables. This reduces the effort needed to style tables while maintaining a professional appearance.
The main function is flextable(), which takes a data.frame as its argument and returns a flextable object.
#Default flextable of the first 10 rows, dropping column 13 (cid)
ft <- flextable(hsb[1:10, -13])
ft
id | female | ses | schtyp | prog | read | write | math | science | socst | honors | awards |
---|---|---|---|---|---|---|---|---|---|---|---|
45 | female | low | public | vocation | 34 | 35 | 41 | 29 | 26 | not enrolled | 0 |
108 | male | middle | public | general | 34 | 33 | 41 | 36 | 36 | not enrolled | 0 |
15 | male | high | public | vocation | 39 | 39 | 44 | 26 | 42 | not enrolled | 0 |
67 | male | low | public | vocation | 37 | 37 | 42 | 33 | 32 | not enrolled | 0 |
153 | male | middle | public | vocation | 39 | 31 | 40 | 39 | 51 | not enrolled | 0 |
51 | female | high | public | general | 42 | 36 | 42 | 31 | 39 | not enrolled | 0 |
164 | male | middle | public | vocation | 31 | 36 | 46 | 39 | 46 | not enrolled | 0 |
133 | male | middle | public | vocation | 50 | 31 | 40 | 34 | 31 | not enrolled | 0 |
2 | female | middle | public | vocation | 39 | 41 | 33 | 42 | 41 | not enrolled | 0 |
53 | male | middle | public | vocation | 34 | 37 | 46 | 39 | 31 | not enrolled | 0 |
ft |>
#add header row
add_header_row(
colwidths = c(3, 2, 5, 2),
values = c("Student", "School", "Grades", "Achievements")) |>
#Use theme_vanilla
theme_vanilla() |>
#Add footer
add_footer_lines("This data is simulated and it is not real") |>
color(part = "footer", color = "#666666") |>
#set Caption
set_caption(caption = "First 10 rows of a sample of high school data") |>
#Align header to center
align(align = "center", part = "header", i = 1)
Student | | | School | | Grades | | | | | Achievements | |
---|---|---|---|---|---|---|---|---|---|---|---|
id | female | ses | schtyp | prog | read | write | math | science | socst | honors | awards |
45 | female | low | public | vocation | 34 | 35 | 41 | 29 | 26 | not enrolled | 0 |
108 | male | middle | public | general | 34 | 33 | 41 | 36 | 36 | not enrolled | 0 |
15 | male | high | public | vocation | 39 | 39 | 44 | 26 | 42 | not enrolled | 0 |
67 | male | low | public | vocation | 37 | 37 | 42 | 33 | 32 | not enrolled | 0 |
153 | male | middle | public | vocation | 39 | 31 | 40 | 39 | 51 | not enrolled | 0 |
51 | female | high | public | general | 42 | 36 | 42 | 31 | 39 | not enrolled | 0 |
164 | male | middle | public | vocation | 31 | 36 | 46 | 39 | 46 | not enrolled | 0 |
133 | male | middle | public | vocation | 50 | 31 | 40 | 34 | 31 | not enrolled | 0 |
2 | female | middle | public | vocation | 39 | 41 | 33 | 42 | 41 | not enrolled | 0 |
53 | male | middle | public | vocation | 34 | 37 | 46 | 39 | 31 | not enrolled | 0 |
This data is simulated and it is not real |
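Conditional formatting and Word export were listed among the advantages of flextable but are not shown above. The sketch below assumes flextable (and its dependency officer) is installed. The data frame, the threshold of 43, and the object names are made up for illustration.

```r
library(flextable)

# Small made-up score table (illustrative only; not the hsb data)
scores <- data.frame(
  id   = 1:5,
  read = c(34, 39, 42, 50, 61),
  math = c(41, 44, 42, 40, 58)
)

ft_scores <- flextable(scores) |>
  theme_vanilla() |>
  # Highlight math scores above 43 (threshold chosen arbitrarily)
  bg(i = ~ math > 43, j = "math", bg = "yellow") |>
  bold(i = ~ math > 43, j = "math") |>
  # Adjust column widths to the content
  autofit()
ft_scores

# Export the table to a Word file in a temporary location
out_file <- tempfile(fileext = ".docx")
save_as_docx(ft_scores, path = out_file)
```

The row selector i accepts a formula evaluated on the data, which is what makes the formatting conditional on the cell values.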
The flextable package will not aggregate data for you, but it will help you present aggregated data. However, it has some useful functions to generate descriptive statistics.
proc_freq()
The function proc_freq() computes a contingency table and creates a flextable from the result. The aim of the function is to reproduce the results of the SAS procedure PROC FREQ.
proc_freq(hsb, "ses", "honors",
include.row_percent = TRUE,
include.column_percent = TRUE,
include.table_percent = TRUE)
ses | | honors: enrolled | honors: not enrolled | Total |
---|---|---|---|---|
high | Count | 26 (13.0%) | 32 (16.0%) | 58 (29.0%) |
high | Mar. pct (1) | 49.1% ; 44.8% | 21.8% ; 55.2% | |
low | Count | 11 (5.5%) | 36 (18.0%) | 47 (23.5%) |
low | Mar. pct | 20.8% ; 23.4% | 24.5% ; 76.6% | |
middle | Count | 16 (8.0%) | 79 (39.5%) | 95 (47.5%) |
middle | Mar. pct | 30.2% ; 16.8% | 53.7% ; 83.2% | |
Total | Count | 53 (26.5%) | 147 (73.5%) | 200 (100.0%) |
(1) Columns and rows percentages
There is much more flexibility in the flextable package, especially when used in conjunction with other packages, than we can cover in this workshop.
For more on flextable, see the package documentation.
The R package DT provides an R interface to the JavaScript library DataTables. R data objects (matrices or data frames) can be displayed as tables on HTML pages, and DataTables provides interactive tables with filtering, pagination, sorting, and many other features.
Key Features of the DT Package
– DT creates interactive tables with features like sorting and searching, ideal for web use and Shiny apps. flextable and kableExtra focus on static tables.
– JavaScript Integration: DT leverages DataTables for advanced client-side features like inline editing and exporting, making it great for web applications.
– Ease of Use for Web Applications: DT is well suited to web applications and is easy to use and implement.
The main function in this package is datatable(). It creates an HTML widget to display R data objects with DataTables.
datatable(diamonds[1:200,])
If you are familiar with the DataTables JavaScript library, you may use the options argument to customize the table.
We can add a filter argument in datatable() to automatically generate column filters. By default, the filters are not shown since filter = "none". You can enable these filters with filter = "top" or filter = "bottom".
#Add filter
datatable(diamonds[1:200,], filter = 'top', options = list(
pageLength = 5, autoWidth = TRUE
))
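The options argument passes settings through to DataTables. As a hedged sketch, assuming the DT package is installed, the example below sets a few commonly used options on the built-in mtcars data. The specific values and the object name dt are illustrative.

```r
library(DT)

dt <- datatable(
  mtcars[, c("mpg", "cyl", "wt", "qsec")],
  rownames = TRUE,
  caption  = "Built-in mtcars data (illustrative)",
  options  = list(
    pageLength = 5,                      # rows per page
    lengthMenu = c(5, 10, 25),           # choices in the page-length menu
    order      = list(list(1, "desc")),  # sort by column 1 (0-based; column 0 is the row names), i.e. mpg
    searching  = FALSE                   # hide the global search box
  )
) |>
  # Show wt and qsec with one decimal place
  formatRound(columns = c("wt", "qsec"), digits = 1)
dt
```

Column indices in DataTables options are 0-based, so with row names displayed the first data column has index 1.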
For more examples and options for package DT, see the package documentation.
Package gt distinguishes between data tables (e.g., tibbles, data.frames, etc.) and presentation or summary tables, and it is aimed at building the latter from the former.
Advantages of the gt Package
Customization: gt provides extensive options for customizing table appearance, including fonts, colors, borders, and spacing. This allows for creating visually appealing and professionally formatted tables.
Easy to Use: The package has a user-friendly syntax that simplifies the creation of complex tables. It’s designed to be intuitive and easy to learn, making table creation straightforward.
Integration with RMarkdown: gt integrates well with RMarkdown, enabling you to include sophisticated tables in dynamic documents. It supports rendering in HTML and integrates seamlessly into RMarkdown reports.
Publication-Ready Tables: gt is designed for generating publication-quality tables that are clean and well-formatted. It’s ideal for academic papers, reports, and presentations where table aesthetics are important.
Here we run one simple example from the package reference page. The package gt is very similar to package flextable, but it currently supports HTML, LaTeX, and RTF output. Package flextable is compatible with Microsoft software like Word and PowerPoint.
# Modify the `airquality` dataset by adding the year
# of the measurements (1973) and limiting to 10 rows
airquality_m <-
airquality |>
#add year 1973
mutate(Year = 1973L) |>
#select the first 10 rows
slice(1:10)
# Create a display table using the `airquality`
# dataset; arrange columns into groups
gt_tbl <-
gt(airquality_m)
#Print gt table
gt_tbl
Ozone | Solar.R | Wind | Temp | Month | Day | Year |
---|---|---|---|---|---|---|
41 | 190 | 7.4 | 67 | 5 | 1 | 1973 |
36 | 118 | 8.0 | 72 | 5 | 2 | 1973 |
12 | 149 | 12.6 | 74 | 5 | 3 | 1973 |
18 | 313 | 11.5 | 62 | 5 | 4 | 1973 |
NA | NA | 14.3 | 56 | 5 | 5 | 1973 |
28 | NA | 14.9 | 66 | 5 | 6 | 1973 |
23 | 299 | 8.6 | 65 | 5 | 7 | 1973 |
19 | 99 | 13.8 | 59 | 5 | 8 | 1973 |
8 | 19 | 20.1 | 61 | 5 | 9 | 1973 |
NA | 194 | 8.6 | 69 | 5 | 10 | 1973 |
gt_tbl |>
#Add title and subtitle
tab_header(
title = "New York Air Quality Measurements",
subtitle = "Daily measurements in New York City (May 1-10, 1973)"
) |>
#Span columns
tab_spanner(
label = "Time",
columns = c(Year, Month, Day)
) |>
tab_spanner(
label = "Measurement",
columns = c(Ozone, Solar.R, Wind, Temp)
)
New York Air Quality Measurements
Daily measurements in New York City (May 1-10, 1973)
Measurement | | | | Time | | |
---|---|---|---|---|---|---|
Ozone | Solar.R | Wind | Temp | Year | Month | Day |
41 | 190 | 7.4 | 67 | 1973 | 5 | 1 |
36 | 118 | 8.0 | 72 | 1973 | 5 | 2 |
12 | 149 | 12.6 | 74 | 1973 | 5 | 3 |
18 | 313 | 11.5 | 62 | 1973 | 5 | 4 |
NA | NA | 14.3 | 56 | 1973 | 5 | 5 |
28 | NA | 14.9 | 66 | 1973 | 5 | 6 |
23 | 299 | 8.6 | 65 | 1973 | 5 | 7 |
19 | 99 | 13.8 | 59 | 1973 | 5 | 8 |
8 | 19 | 20.1 | 61 | 1973 | 5 | 9 |
NA | 194 | 8.6 | 69 | 1973 | 5 | 10 |
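Beyond headers and spanners, gt has functions for relabeling columns, formatting numbers, and handling missing values. The following is a minimal sketch, assuming a reasonably recent version of gt (sub_missing() is not available in older releases). The labels, the dash used for NA, and the object name aq_tbl are illustrative.

```r
library(gt)

aq_tbl <- gt(head(airquality, 10)) |>
  # Friendlier column labels
  cols_label(Solar.R = "Solar radiation", Temp = "Temperature (F)") |>
  # Show wind speed with exactly one decimal place
  fmt_number(columns = Wind, decimals = 1) |>
  # Replace NA cells with a dash
  sub_missing(missing_text = "-") |>
  # Source note below the table
  tab_source_note(source_note = "Data: built-in airquality dataset.")
aq_tbl
```

Each function takes a gt table and returns a modified gt table, so the steps can be chained in any order that reads well.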
The reference for package gt
gtExtras
Package gtExtras provides additional functions to assist with package gt, especially if you want to include plots in your tables:
Overall, there are four families of functions in gtExtras:
Themes: 7 themes that style almost every element of a gt table, built off of data journalism-styled tables
Utilities: Helper functions for aligning/padding numbers, adding fontawesome icons, images, highlighting, dividers, styling by group, creating two tables or two column layouts, extracting ordered data from a gt table internals, or generating a random dataset.
Plotting: 12 plotting functions for inline sparklines, win-loss charts, distributions (density/histogram), percentiles, dot + bar, bar charts, confidence intervals, or summarizing an entire dataframe!
Colors: 3 functions, a palette for “Hulk” style scale (purple/green), coloring rows with good defaults from paletteer, or adding a “color box” along with the cell value
gt_tbl %>%
#Use the NYT theme
gt_theme_nytimes() %>%
#Change header title
tab_header(title = "Table styled like the NY Times") %>%
#Hulk data_color
#Trim provides a tighter range of purple/green
gt_hulk_col_numeric(Ozone, trim = TRUE)
Table styled like the NY Times | ||||||
Ozone | Solar.R | Wind | Temp | Month | Day | Year |
---|---|---|---|---|---|---|
41 | 190 | 7.4 | 67 | 5 | 1 | 1973 |
36 | 118 | 8.0 | 72 | 5 | 2 | 1973 |
12 | 149 | 12.6 | 74 | 5 | 3 | 1973 |
18 | 313 | 11.5 | 62 | 5 | 4 | 1973 |
NA | NA | 14.3 | 56 | 5 | 5 | 1973 |
28 | NA | 14.9 | 66 | 5 | 6 | 1973 |
23 | 299 | 8.6 | 65 | 5 | 7 | 1973 |
19 | 99 | 13.8 | 59 | 5 | 8 | 1973 |
8 | 19 | 20.1 | 61 | 5 | 9 | 1973 |
NA | 194 | 8.6 | 69 | 5 | 10 | 1973 |
For more options and features in the gtExtras package,
check the package references:
Some R packages create publication-ready summary tables with minimal effort. They have built-in functions that generate summary tables in standard formats.
In this part we are going to use the R packages table1, sjPlot, gtsummary, and stargazer.
The advantage of these packages is that you do not need to write extra data-preparation code to summarize data and analyses in a data.frame. However, we lose flexibility, and the tables are more difficult to customize.
table1 Package
In journal articles, particularly in fields like epidemiology and health data, the first table (commonly referred to as “Table 1”) presents descriptive statistics of the baseline characteristics of the study sample. This table is typically stratified by one or more grouping variables, such as treatment groups or demographic categories.
The table1 package in R simplifies the creation of such tables.
Descriptive Statistics: Provides means, medians, standard deviations, and proportions for various variables.
Stratification: Allows for grouping and stratifying by one or more categorical variables.
Customization: The package offers options for customizing the appearance and content of the table to meet publication standards, but using them is not straightforward.
Easy to use: table1 is easy to use for standard tables. However, users might find it challenging to customize.
Converting to other packages: One advantage of this package is that it is possible (with some limitations) to convert the output of table1() to a data.frame, kableExtra, or flextable object, using the functions as.data.frame(), t1kable(), and t1flex(), respectively.
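The conversions listed above can be sketched as follows. This is a minimal illustration rather than the workshop's own example: it assumes the built-in mtcars data, and the object names (dat, tab, tab_df, tab_kbl, tab_flex) are hypothetical. The kableExtra and flextable conversions are only attempted when those packages are installed.

```r
library(table1)

# Illustrative data: mtcars with the number of cylinders as a grouping factor
dat <- transform(mtcars, cyl = factor(cyl))

# Build a small "Table 1" stratified by number of cylinders
tab <- table1(~ mpg + hp | cyl, data = dat)

# Convert to a plain data.frame
tab_df <- as.data.frame(tab)

# Convert to kableExtra and flextable objects (only if the packages are available)
if (requireNamespace("kableExtra", quietly = TRUE)) tab_kbl <- t1kable(tab)
if (requireNamespace("flextable", quietly = TRUE)) tab_flex <- t1flex(tab)
```

Once converted, the result can be styled with the functions of the target package instead of the table1 options.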
table1
The data used for this example are the melanoma data from the boot package. The data consist of measurements made on patients with malignant melanoma.
The grouping variable is the patient’s status at the end of the study: 1 indicates that they had died from melanoma, 2 indicates that they were still alive, and 3 indicates that they had died from causes unrelated to their melanoma.
# Load the melanoma data from the boot package
data(melanoma, package = "boot")
melanoma1 <- melanoma
# Change status to factor
melanoma1$status <-
factor(melanoma1$status,
levels=c(2,1,3),
labels=c("Alive", # Reference
"Melanoma death",
"Non-melanoma death"))
# Change sex to factor and label them
melanoma1$sex <-
factor(melanoma1$sex, labels = c("Male",
"Female"))
# Change ulcer to factor and label them
melanoma1$ulcer <-
factor(melanoma1$ulcer, labels = c("Absent",
"Present"))
#Basic table 1
(table1.1 <- table1(~ sex + age + ulcer + thickness | status, data=melanoma1))
| Alive (N=134) | Melanoma death (N=57) | Non-melanoma death (N=14) | Overall (N=205) |
---|---|---|---|---|
sex | ||||
Male | 91 (67.9%) | 28 (49.1%) | 7 (50.0%) | 126 (61.5%) |
Female | 43 (32.1%) | 29 (50.9%) | 7 (50.0%) | 79 (38.5%) |
age | ||||
Mean (SD) | 50.0 (15.9) | 55.1 (17.9) | 65.3 (10.9) | 52.5 (16.7) |
Median [Min, Max] | 52.0 [4.00, 84.0] | 56.0 [14.0, 95.0] | 65.0 [49.0, 86.0] | 54.0 [4.00, 95.0] |
ulcer | ||||
Absent | 92 (68.7%) | 16 (28.1%) | 7 (50.0%) | 115 (56.1%) |
Present | 42 (31.3%) | 41 (71.9%) | 7 (50.0%) | 90 (43.9%) |
thickness | ||||
Mean (SD) | 2.24 (2.33) | 4.31 (3.57) | 3.72 (3.63) | 2.92 (2.96) |
Median [Min, Max] | 1.36 [0.100, 12.9] | 3.54 [0.320, 17.4] | 2.26 [0.160, 12.6] | 1.94 [0.100, 17.4] |
To improve things, we can label each variable the way we want and specify units for the continuous variables (age and thickness); the categorical variables (sex and ulcer) already have descriptive factor labels. We also specify that the overall column be labeled “Total” and positioned on the left, and we add a caption and a footnote.
melanoma2 <- melanoma1
#Label the variables
label(melanoma2$sex) <- "Sex"
label(melanoma2$age) <- "Age"
label(melanoma2$ulcer) <- "Ulceration"
#Use an asterisk to mark the footnote
label(melanoma2$thickness) <- "Thickness *"
#Assign units to age and thickness
units(melanoma2$age) <- "years"
units(melanoma2$thickness) <- "mm"
#create caption
caption <- "Descriptive statistics of patients characteristics by status"
#create footnote
footnote <- "* Also known as Breslow thickness"
#Create table1
table1(~ sex + age + ulcer + thickness | status, data=melanoma2,
overall=c(left="Total"), caption=caption, footnote=footnote)
| Total (N=205) | Alive (N=134) | Melanoma death (N=57) | Non-melanoma death (N=14) |
---|---|---|---|---|
Sex | ||||
Male | 126 (61.5%) | 91 (67.9%) | 28 (49.1%) | 7 (50.0%) |
Female | 79 (38.5%) | 43 (32.1%) | 29 (50.9%) | 7 (50.0%) |
Age (years) | ||||
Mean (SD) | 52.5 (16.7) | 50.0 (15.9) | 55.1 (17.9) | 65.3 (10.9) |
Median [Min, Max] | 54.0 [4.00, 95.0] | 52.0 [4.00, 84.0] | 56.0 [14.0, 95.0] | 65.0 [49.0, 86.0] |
Ulceration | ||||
Absent | 115 (56.1%) | 92 (68.7%) | 16 (28.1%) | 7 (50.0%) |
Present | 90 (43.9%) | 42 (31.3%) | 41 (71.9%) | 7 (50.0%) |
Thickness * (mm) | ||||
Mean (SD) | 2.92 (2.96) | 2.24 (2.33) | 4.31 (3.57) | 3.72 (3.63) |
Median [Min, Max] | 1.94 [0.100, 17.4] | 1.36 [0.100, 12.9] | 3.54 [0.320, 17.4] | 2.26 [0.160, 12.6] |
* Also known as Breslow thickness
Next, we group the two “Death” strata (Melanoma and Non-melanoma) under a common heading.
#Label the variables sex, age, ulcer, and thickness
#Add group labels
labels <- list(
variables=list(sex="Sex",
age="Age (years)",
ulcer="Ulceration",
thickness="Thickness* (mm)"),
groups=list("", "", "Death"))
# Remove the word "death" from the labels, since it now appears above
levels(melanoma2$status) <- c("Alive", "Melanoma", "Non-melanoma")
#Set up our “strata”, or columns, as a list of data.frames
strata <- c(list(Total=melanoma2), split(melanoma2, melanoma2$status))
#Create new table1
table1(strata, labels, groupspan=c(1, 1, 2), caption=caption, footnote=footnote)
| | | Death | |
| Total (N=205) | Alive (N=134) | Melanoma (N=57) | Non-melanoma (N=14) |
---|---|---|---|---|
Sex | ||||
Male | 126 (61.5%) | 91 (67.9%) | 28 (49.1%) | 7 (50.0%) |
Female | 79 (38.5%) | 43 (32.1%) | 29 (50.9%) | 7 (50.0%) |
Age (years) | ||||
Mean (SD) | 52.5 (16.7) | 50.0 (15.9) | 55.1 (17.9) | 65.3 (10.9) |
Median [Min, Max] | 54.0 [4.00, 95.0] | 52.0 [4.00, 84.0] | 56.0 [14.0, 95.0] | 65.0 [49.0, 86.0] |
Ulceration | ||||
Absent | 115 (56.1%) | 92 (68.7%) | 16 (28.1%) | 7 (50.0%) |
Present | 90 (43.9%) | 42 (31.3%) | 41 (71.9%) | 7 (50.0%) |
Thickness* (mm) | ||||
Mean (SD) | 2.92 (2.96) | 2.24 (2.33) | 4.31 (3.57) | 3.72 (3.63) |
Median [Min, Max] | 1.94 [0.100, 17.4] | 1.36 [0.100, 12.9] | 3.54 [0.320, 17.4] | 2.26 [0.160, 12.6] |
* Also known as Breslow thickness
As you can see, customizing the table in this last step is not easy in table1 and requires some extra work.
Converting to flextable
Now let’s try converting it to flextable and customizing it there!
#Converting to flextable
tab1.flex <- table1.1 |> t1flex()
#Print
tab1.flex
| Alive | Melanoma death | Non-melanoma death | Overall |
---|---|---|---|---|
sex | ||||
Male | 91 (67.9%) | 28 (49.1%) | 7 (50.0%) | 126 (61.5%) |
Female | 43 (32.1%) | 29 (50.9%) | 7 (50.0%) | 79 (38.5%) |
age | ||||
Mean (SD) | 50.0 (15.9) | 55.1 (17.9) | 65.3 (10.9) | 52.5 (16.7) |
Median [Min, Max] | 52.0 [4.00, 84.0] | 56.0 [14.0, 95.0] | 65.0 [49.0, 86.0] | 54.0 [4.00, 95.0] |
ulcer | ||||
Absent | 92 (68.7%) | 16 (28.1%) | 7 (50.0%) | 115 (56.1%) |
Present | 42 (31.3%) | 41 (71.9%) | 7 (50.0%) | 90 (43.9%) |
thickness | ||||
Mean (SD) | 2.24 (2.33) | 4.31 (3.57) | 3.72 (3.63) | 2.92 (2.96) |
Median [Min, Max] | 1.36 [0.100, 12.9] | 3.54 [0.320, 17.4] | 2.26 [0.160, 12.6] | 1.94 [0.100, 17.4] |
#Modify tab1.flex
tab1.flex |>
#add header row
add_header_row(
values = c("", "Death", ""), # Labels for the top header row
colwidths = c(2, 2, 1) # Number of columns spanned by each header
) |>
#First remove the border line under "Death"
hline(part = "header", i = 1, border = officer::fp_border(width = 0))|>
# Line over Melanoma and Non-melanoma
hline(part = "header", i = 1, border = officer::fp_border(width = 1.5), j = 3:4) |>
#Change labels for rows
compose(i = 1, j = 1, as_paragraph(as_chunk("SEX"))) |>
compose(i = 4, j = 1, as_paragraph(as_chunk("AGE (years)"))) |>
compose(i = 7, j = 1, as_paragraph(as_chunk("Ulceration"))) |>
compose(i = 10, j = 1, as_paragraph(as_chunk("Thickness (mm)"))) |>
#Add caption (the piped table is the first argument, so no table object is named here)
set_caption(caption = "Table 1: Descriptive statistics of patients characteristics by status") |>
##Add footnote
footnote( i = 10, j = 1,
ref_symbols = "a",
value = as_paragraph("Also known as Breslow thickness")
) |>
fontsize(i = 1, j =1 , size = 9, part = "footer")
Death | ||||
---|---|---|---|---|
| Alive | Melanoma death | Non-melanoma death | Overall |
SEX | ||||
Male | 91 (67.9%) | 28 (49.1%) | 7 (50.0%) | 126 (61.5%) |
Female | 43 (32.1%) | 29 (50.9%) | 7 (50.0%) | 79 (38.5%) |
AGE (years) | ||||
Mean (SD) | 50.0 (15.9) | 55.1 (17.9) | 65.3 (10.9) | 52.5 (16.7) |
Median [Min, Max] | 52.0 [4.00, 84.0] | 56.0 [14.0, 95.0] | 65.0 [49.0, 86.0] | 54.0 [4.00, 95.0] |
Ulceration | ||||
Absent | 92 (68.7%) | 16 (28.1%) | 7 (50.0%) | 115 (56.1%) |
Present | 42 (31.3%) | 41 (71.9%) | 7 (50.0%) | 90 (43.9%) |
Thickness (mm)a | ||||
Mean (SD) | 2.24 (2.33) | 4.31 (3.57) | 3.72 (3.63) | 2.92 (2.96) |
Median [Min, Max] | 1.36 [0.100, 12.9] | 3.54 [0.320, 17.4] | 2.26 [0.160, 12.6] | 1.94 [0.100, 17.4] |
aAlso known as Breslow thickness |
This takes a few more lines of code, but the steps are more straightforward and clear.
To learn more about the table1 package, see the link below:
Using the table1 Package to Create HTML Tables of Descriptive Statistics
The sjPlot package is a collection of plotting and table output functions for data visualization.
Results of various statistical analyses (that are commonly used in social sciences) can be visualized using this package, including simple and cross tabulated frequencies, linear models, glm models, mixed effects models, PCA and correlation matrices, cluster analyses, and much more.
sjPlot:
Cross tabulation: tab_xtab() creates cross-tabulations with options for adding row and column percentages.
Regression tables: tab_model() creates tables of regression models with detailed statistical summaries, including coefficients, standard errors, p-values, and confidence intervals. It supports various model types such as linear, logistic, and mixed-effects models.
Multiple models: Combine results from multiple models into a single table for comparative analysis.
tab_xtab()
#Cross tab of ses by female for the hsb data, with column percentages
tab_xtab(var.row = hsb$ses, var.col = hsb$female,
show.col.prc = TRUE)
ses | female | | Total |
| female | male | |
---|---|---|---|
high | 29 26.6 % | 29 31.9 % | 58 29 % |
low | 32 29.4 % | 15 16.5 % | 47 23.5 % |
middle | 48 44 % | 47 51.6 % | 95 47.5 % |
Total | 109 100 % | 91 100 % | 200 100 % |
χ²=4.577 · df=2 · Cramer’s V=0.151 · p=0.101
tab_xtab(var.row = hsb$ses, var.col = hsb$female,
show.row.prc = TRUE,
statistics = "phi")
ses | female | | Total |
| female | male | |
---|---|---|---|
high | 29 50 % | 29 50 % | 58 100 % |
low | 32 68.1 % | 15 31.9 % | 47 100 % |
middle | 48 50.5 % | 47 49.5 % | 95 100 % |
Total | 109 54.5 % | 91 45.5 % | 200 100 % |
χ²=4.577 · df=2 · φ=0.151 · p=0.101
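The two cross tabulations above differ only in the margin over which the percentages are computed. As a rough cross-check, the same quantities can be reproduced in base R. This sketch types in the counts printed in the tables instead of reading the hsb data, and the object names are hypothetical; it is not how tab_xtab() computes its output internally.

```r
# Counts copied from the cross tabulation of ses by female shown above
counts <- matrix(
  c(29, 32, 48,   # column "female": high, low, middle
    29, 15, 47),  # column "male":   high, low, middle
  ncol = 2,
  dimnames = list(ses = c("high", "low", "middle"),
                  female = c("female", "male"))
)

# Column percentages (what show.col.prc = TRUE displays)
col_pct <- round(100 * prop.table(counts, margin = 2), 1)

# Row percentages (what show.row.prc = TRUE displays)
row_pct <- round(100 * prop.table(counts, margin = 1), 1)

# Pearson chi-squared test of independence
chi <- chisq.test(counts)

# Cramer's V: sqrt(chi-squared / (n * (min(rows, cols) - 1)))
cramers_v <- sqrt(unname(chi$statistic) /
                    (sum(counts) * (min(dim(counts)) - 1)))

col_pct
row_pct
chi
cramers_v
```

Because one dimension of this table has only two levels, Cramer’s V and phi coincide, which is why both outputs report 0.151.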
tab_model()
#Run poisson model of awards on math, read, and ses
m.pois <- glm(awards ~ math + read + ses, family = poisson(), data = hsb)
#Print using tab_model
tab_model(m.pois, dv.labels = c("Poisson Model"))
Poisson Model | |||
---|---|---|---|
Predictors | Incidence Rate Ratios | CI | p |
(Intercept) | 0.03 | 0.01 – 0.07 | <0.001 |
math | 1.05 | 1.03 – 1.06 | <0.001 |
read | 1.03 | 1.01 – 1.04 | <0.001 |
ses [low] | 0.89 | 0.65 – 1.22 | 0.491 |
ses [middle] | 0.78 | 0.61 – 1.00 | 0.049 |
Observations | 200 | ||
R2 Nagelkerke | 0.629 |
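The “Incidence Rate Ratios” column is the exponentiated Poisson coefficient. The sketch below illustrates that relationship on simulated data (the hsb data are not used here, and all object names are hypothetical). It uses simple Wald intervals from confint.default(); tab_model() may compute its intervals differently, so treat the interval method as an assumption rather than a reproduction of the table above.

```r
set.seed(1)

# Simulated data: one predictor and a Poisson outcome (illustration only)
n <- 200
sim <- data.frame(x = rnorm(n))
sim$y <- rpois(n, lambda = exp(-0.5 + 0.3 * sim$x))

# Fit a Poisson regression
fit <- glm(y ~ x, family = poisson(), data = sim)

# Incidence rate ratios: exponentiate the log-scale coefficients
irr <- exp(coef(fit))

# Wald confidence intervals, computed on the log scale and then exponentiated
ci <- exp(confint.default(fit))

cbind(IRR = irr, ci)
```

An IRR above 1 means the expected count increases with the predictor, and an IRR below 1 means it decreases.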
#Print using tab_model with cluster-robust covariance matrix estimation
tab_model(m.pois, vcov.fun = "CL",
vcov.args = list(type = "HC1", cluster = hsb$cid),
dv.labels = c("Poisson With Cluster-Robust Covariance Matrix"))
Poisson With Cluster-Robust Covariance Matrix | |||
---|---|---|---|
Predictors | Incidence Rate Ratios | CI | p |
(Intercept) | 0.03 | 0.01 – 0.07 | <0.001 |
math | 1.05 | 1.03 – 1.06 | <0.001 |
read | 1.03 | 1.01 – 1.04 | <0.001 |
ses [low] | 0.89 | 0.65 – 1.22 | 0.487 |
ses [middle] | 0.78 | 0.61 – 1.00 | 0.035 |
Observations | 200 | ||
R2 Nagelkerke | 0.629 |
#Run negative binomial model of awards on math, read, and ses
#glm.nb() comes from the MASS package
m.nbin <- glm.nb(awards ~ math + read + ses, data = hsb)
#Print the two models together in one table and add deviance and AIC
tab_model(m.pois, m.nbin, vcov.fun = "CL",
vcov.args = list(type = "HC1", cluster = hsb$cid), show.dev = TRUE, show.aic = TRUE,
dv.labels = c("Poisson",
"Negative-binomial"))
Poisson | Negative-binomial | |||||
---|---|---|---|---|---|---|
Predictors | Incidence Rate Ratios | CI | p | Incidence Rate Ratios | CI | p |
(Intercept) | 0.03 | 0.01 – 0.07 | <0.001 | 0.03 | 0.01 – 0.06 | <0.001 |
math | 1.05 | 1.03 – 1.06 | <0.001 | 1.05 | 1.03 – 1.07 | <0.001 |
read | 1.03 | 1.01 – 1.04 | <0.001 | 1.03 | 1.01 – 1.05 | <0.001 |
ses [low] | 0.89 | 0.65 – 1.22 | 0.487 | 0.89 | 0.62 – 1.27 | 0.484 |
ses [middle] | 0.78 | 0.61 – 1.00 | 0.035 | 0.79 | 0.60 – 1.04 | 0.040 |
Observations | 200 | 200 | ||||
R2 Nagelkerke | 0.629 | 0.595 | ||||
Deviance | 256.818 | 221.505 | ||||
AIC | 613.047 | 611.548 |
gtsummary
The gtsummary
package is designed to create summary
tables for a variety of statistical analyses. It focuses on making
publication-ready tables that are easy to generate and aesthetically
pleasing.
Ease of Use: Minimal coding required to generate complex, publication-quality tables.
Flexibility: The ability to customize tables to suit the needs of different publications or audiences.
Integrated Statistical Reporting: Automatically includes relevant statistics such as p-values, confidence intervals, and effect sizes.
Exportability: Ability to export tables to
various formats for different types of reports and manuscripts. You can
convert gtsummary
tables to gt
or
flextable
objects for further customization or export to
Microsoft Word and PowerPoint.
gtsummary themes: It’s possible to set themes in gtsummary. The themes control many aspects of how a table is printed.
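The export path described under Exportability can be sketched as follows. This is a minimal example under stated assumptions: it uses the trial data that ships with gtsummary rather than the workshop data, writes to a temporary file, and the object names (tbl, ft, out_file) are hypothetical.

```r
library(gtsummary)

# Summary table from the example `trial` data shipped with gtsummary
tbl <- tbl_summary(trial, include = c(age, grade), by = trt)

# Convert the gtsummary table to a flextable object
ft <- as_flex_table(tbl)

# Write the flextable to a Word document (a temporary file in this sketch)
out_file <- tempfile(fileext = ".docx")
flextable::save_as_docx(ft, path = out_file)
```

After the conversion, any flextable function can be applied to ft before saving, which is where most of the fine-grained styling would happen.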
gtsummary:
Descriptive summary tables: tbl_summary() automatically creates summary statistics tables for data frames, stratified by groups (if desired), with options for mean, median, standard deviation, proportions, and more. Similar to the table1 package, this is particularly useful for creating “Table 1” in medical and epidemiological research.
Tables for statistical tests: tbl_cross() creates cross-tabulations (contingency tables) with statistical tests like chi-square or Fisher’s exact test.
Summary tables for regression models: tbl_regression() generates detailed tables from regression models, including coefficients, confidence intervals, p-values, and more.
#Use tbl_summary for summary statistics
tab1_gt <- tbl_summary(melanoma1, include = -c(time, year), by = status)
#Print
tab1_gt
Characteristic | Alive N = 134¹ | Melanoma death N = 57¹ | Non-melanoma death N = 14¹ |
---|---|---|---|
sex | |||
Male | 91 (68%) | 28 (49%) | 7 (50%) |
Female | 43 (32%) | 29 (51%) | 7 (50%) |
age | 52 (40, 62) | 56 (44, 68) | 65 (56, 72) |
thickness | 1.36 (0.81, 2.90) | 3.54 (2.24, 4.84) | 2.26 (1.29, 6.12) |
ulcer | |||
Absent | 92 (69%) | 16 (28%) | 7 (50%) |
Present | 42 (31%) | 41 (72%) | 7 (50%) |
¹ n (%); Median (Q1, Q3)
#Test for a difference between groups
tab1_gt |>
# Add P-value column
add_p() |>
#Change header and Labels
modify_header(label = "**Variable**") |>
bold_labels()
Variable | Alive N = 134¹ | Melanoma death N = 57¹ | Non-melanoma death N = 14¹ | p-value² |
---|---|---|---|---|
sex | | | | 0.033 |
Male | 91 (68%) | 28 (49%) | 7 (50%) | |
Female | 43 (32%) | 29 (51%) | 7 (50%) | |
age | 52 (40, 62) | 56 (44, 68) | 65 (56, 72) | 0.001 |
thickness | 1.36 (0.81, 2.90) | 3.54 (2.24, 4.84) | 2.26 (1.29, 6.12) | <0.001 |
ulcer | | | | <0.001 |
Absent | 92 (69%) | 16 (28%) | 7 (50%) | |
Present | 42 (31%) | 41 (72%) | 7 (50%) | |
¹ n (%); Median (Q1, Q3)
² Pearson’s Chi-squared test; Kruskal-Wallis rank sum test
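The footnote names the two tests behind the p-value column: Pearson’s chi-squared test for the categorical variables and the Kruskal-Wallis rank sum test for the continuous ones. A rough base R cross-check for sex and age is sketched below. It assumes the melanoma data from the boot package and leaves sex and status unlabeled, since labels do not affect either test; the object names are hypothetical.

```r
# Load the melanoma data from the boot package
data(melanoma, package = "boot")
mel <- melanoma
mel$status <- factor(mel$status)

# Categorical variable: Pearson chi-squared test of sex by status
p_sex <- chisq.test(table(mel$sex, mel$status), correct = FALSE)$p.value

# Continuous variable: Kruskal-Wallis rank sum test of age across status groups
p_age <- kruskal.test(age ~ status, data = mel)$p.value

round(c(sex = p_sex, age = p_age), 3)
```

Running the tests by hand like this is a quick way to confirm which test was applied to which row before reporting the table.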
#Report mean and SD for continuous variables
tbl_summary(melanoma1, include = -c(time, year), by = status,
statistic = all_continuous() ~ "{mean} ({sd})") |>
#Add overall column
add_overall() |>
#Add CIs, using the Wald method for the categorical variables
add_ci(method = all_categorical() ~ "wald",
pattern = "{stat} ({ci})") |>
#Update spanning headers
modify_spanning_header(c("stat_2", "stat_3") ~ "**Death**")
| | | Death | |
Characteristic | Overall N = 205 (95% CI)¹,² | Alive N = 134 (95% CI)¹,² | Melanoma death N = 57 (95% CI)¹,² | Non-melanoma death N = 14 (95% CI)¹,² |
---|---|---|---|---|
sex | ||||
Male | 126 (61%) (55%, 68%) | 91 (68%) (60%, 76%) | 28 (49%) (35%, 63%) | 7 (50%) (20%, 80%) |
Female | 79 (39%) (32%, 45%) | 43 (32%) (24%, 40%) | 29 (51%) (37%, 65%) | 7 (50%) (20%, 80%) |
age | 52 (17) (50, 55) | 50 (16) (47, 53) | 55 (18) (50, 60) | 65 (11) (59, 72) |
thickness | 2.92 (2.96) (2.5, 3.3) | 2.24 (2.33) (1.8, 2.6) | 4.31 (3.57) (3.4, 5.3) | 3.72 (3.63) (1.6, 5.8) |
ulcer | ||||
Absent | 115 (56%) (49%, 63%) | 92 (69%) (60%, 77%) | 16 (28%) (16%, 41%) | 7 (50%) (20%, 80%) |
Present | 90 (44%) (37%, 51%) | 42 (31%) (23%, 40%) | 41 (72%) (59%, 84%) | 7 (50%) (20%, 80%) |
¹ n (%); Mean (SD)
² CI = Confidence Interval
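The intervals printed for the categorical rows appear consistent with a Wald interval that includes a continuity correction, i.e. p ± (z·sqrt(p(1 − p)/n) + 1/(2n)). The helper below is a hypothetical hand-written function, not part of gtsummary, and whether a given gtsummary version applies the correction under the "wald" method is an assumption worth checking against its documentation.

```r
# Wald confidence interval for a proportion, optionally with continuity correction.
# x = number of events, n = group size. Hypothetical helper for illustration.
wald_ci <- function(x, n, conf.level = 0.95, correct = TRUE) {
  p <- x / n
  z <- qnorm(1 - (1 - conf.level) / 2)
  half_width <- z * sqrt(p * (1 - p) / n)
  if (correct) half_width <- half_width + 1 / (2 * n)
  c(lower = max(0, p - half_width), upper = min(1, p + half_width))
}

# Males among non-melanoma deaths: 7 of 14
ci_small <- wald_ci(7, 14)

# Males overall: 126 of 205
ci_overall <- wald_ci(126, 205)

round(100 * ci_small)
round(100 * ci_overall)
```

With only 14 patients in the non-melanoma death group the correction term is noticeable; without it the interval for 7 of 14 would be visibly narrower than the one shown in the table.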
tbl_cross(row = ses, col = honors, percent = "row", data = hsb) |>
add_p()
| honors | | Total | p-value¹ |
| enrolled | not enrolled | | |
---|---|---|---|---|
ses | | | | <0.001 |
high | 26 (45%) | 32 (55%) | 58 (100%) | |
low | 11 (23%) | 36 (77%) | 47 (100%) | |
middle | 16 (17%) | 79 (83%) | 95 (100%) | |
Total | 53 (27%) | 147 (74%) | 200 (100%) | |
¹ Pearson’s Chi-squared test
# Setting theme "Compact"
theme_gtsummary_compact()
## Setting theme "Compact"
tbl_cross(row = ses, col = honors, percent = "row", data = hsb) |>
add_p()
| honors | | Total | p-value¹ |
| enrolled | not enrolled | | |
---|---|---|---|---|
ses | | | | <0.001 |
high | 26 (45%) | 32 (55%) | 58 (100%) | |
low | 11 (23%) | 36 (77%) | 47 (100%) | |
middle | 16 (17%) | 79 (83%) | 95 (100%) | |
Total | 53 (27%) | 147 (74%) | 200 (100%) | |
¹ Pearson’s Chi-squared test
#Running a logistic regression model
m2 <- glm(honors ~ math + ses, family = binomial(link = "logit"), data= hsb)
#gtsummary default theme
reset_gtsummary_theme()
tab1.glm <- tbl_regression(m2, exponentiate = TRUE)
tab1.glm
Characteristic | OR¹ | 95% CI¹ | p-value |
---|---|---|---|
math | 0.84 | 0.79, 0.88 | <0.001 |
ses | | | |
high | — | — | |
low | 1.05 | 0.35, 3.10 | >0.9 |
middle | 3.44 | 1.43, 8.58 | 0.007 |
¹ OR = Odds Ratio, CI = Confidence Interval
#Set theme to Journal of the American Medical Association—JAMA.
theme_gtsummary_journal(journal = "jama")
## Setting theme "JAMA"
tab1.glm
Characteristic | OR¹ | 95% CI¹ | p-value |
---|---|---|---|
math | 0.84 | 0.79, 0.88 | <0.001 |
ses | | | |
high | — | — | |
low | 1.05 | 0.35, 3.10 | >0.9 |
middle | 3.44 | 1.43, 8.58 | 0.007 |
¹ OR = Odds Ratio, CI = Confidence Interval
#We can convert our table to a gt object and modify the table output.
tab1.glm |>
#add overall p-values for ses
add_global_p() |>
#bold p-value less than 0.01
bold_p(t = 0.01) |>
#convert our table to a gt object
as_gt() |>
#add source note from package gt
#md() interpret input text as Markdown-formatted text
tab_source_note(md("*This data is simulated*"))
Characteristic | OR¹ | 95% CI¹ | p-value |
---|---|---|---|
math | 0.84 | 0.79, 0.88 | <0.001 |
ses | | | 0.011 |
high | — | — | |
low | 1.05 | 0.35, 3.10 | |
middle | 3.44 | 1.43, 8.58 | |
This data is simulated
¹ OR = Odds Ratio, CI = Confidence Interval
inline_text()
Reproducible reports are an important part of good practice. We often need to report results from a table in the text of an R Markdown report. Inline reporting is made simple with inline_text().
The inline_text.tbl_regression method has the following format:
inline_text(
  x,
  variable,
  level = NULL,
  pattern = "{estimate} ({conf.level*100}% CI {conf.low}, {conf.high}; {p.value})",
  estimate_fun = x$inputs$estimate_fun,
  pvalue_fun = label_style_pvalue(prepend_p = TRUE),
  ...
)
For example, we can call inline_text() inside an inline R chunk, ` r inline_text() `, to report a result from a gtsummary table.
To report the OR from the regression table in the text, we can type:
For every unit increase in math score, we expect on average the odds of not being enrolled in the honors program to change by a factor of `r inline_text(tab1.glm, variable = math, pattern = "{estimate}; 95% CI ({conf.low}, {conf.high})")`, keeping ses constant.
In the report it will appear like this:
For every unit increase in math score, we expect on average the odds of not being enrolled in the honors program to change by a factor of 0.84; 95% CI (0.79, 0.88), keeping ses constant.
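The same call also works outside a report, which is a convenient way to preview the text before embedding it. The sketch below assumes the trial data that ships with gtsummary rather than the hsb data, and the object names are hypothetical; the level argument selects one level of a categorical predictor.

```r
library(gtsummary)

# Logistic regression on the example `trial` data shipped with gtsummary
fit <- glm(response ~ age + grade, data = trial, family = binomial)
tbl <- tbl_regression(fit, exponentiate = TRUE)

# Default pattern for a continuous predictor
txt_age <- inline_text(tbl, variable = age)

# One level of a categorical predictor, with a custom pattern
txt_grade <- inline_text(
  tbl,
  variable = grade,
  level = "III",
  pattern = "{estimate}; 95% CI ({conf.low}, {conf.high})"
)

txt_age
txt_grade
```

Because the strings are generated from the table object, the text in the report updates automatically whenever the model or data change.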
Reference
Sjoberg DD, Whiting K, Curry M, Lavery JA, Larmarange J. Reproducible summary tables with the gtsummary package. The R Journal 2021;13:570–80. https://doi.org/10.32614/RJ-2021-053.