How can I generate a Venn diagram in R?

Venn diagrams are a commonly used graphing technique for illustrating the degree of overlap between groups in data. They can be created in R using limma, a package for linear models for microarray data that is distributed as part of the Bioconductor Project. We follow the Bioconductor directions for installing this package.

For more information on how to install packages from Bioconductor you can look at this page: https://bioconductor.org/install

Note that each Bioconductor release is designed to work with a specific version of R, so you may need to update R to a recent version for this package to install and work properly!

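To install limma, we first need to install the BiocManager package.

# Install BiocManager only if it is not already available
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install("limma")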

Finally, we need to load this package.

library(limma)

We can now use the commands in this package for generating Venn diagrams. The data needed for a Venn diagram consist of a set of binary variables indicating membership. We will be using the hsb2 dataset, which contains data from 200 students, including scores on writing, reading, and math tests. We will create indicators for “high” values (scores of 60 or above) in each of these variables and generate Venn diagrams that tell us about the degree of overlap in high math, writing, and reading scores.


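hsb2 <- read.csv("https://stats.idre.ucla.edu/wp-content/uploads/2016/02/hsb2-3.csv")
# Indicators are TRUE when a score is 60 or above.
# Columns are referenced with $ rather than attach(), so that
# names such as write cannot be confused with R functions.
hw <- (hsb2$write >= 60)
hm <- (hsb2$math >= 60)
hr <- (hsb2$read >= 60)
c3 <- cbind(hw, hm, hr)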
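To make the required data layout concrete, here is a minimal sketch that builds the same kind of indicator matrix from made-up scores. The five students, their scores, and the object names toy_scores and toy_high are assumptions for illustration only; they are not taken from hsb2.

```r
# Made-up scores for five hypothetical students (not from hsb2)
toy_scores <- data.frame(
  write = c(65, 52, 60, 41, 70),
  math  = c(58, 61, 72, 45, 66),
  read  = c(63, 47, 59, 50, 68)
)

# One logical column per group: TRUE means the student is a member
toy_high <- cbind(
  hw = toy_scores$write >= 60,
  hm = toy_scores$math >= 60,
  hr = toy_scores$read >= 60
)

print(toy_high)

# Number of members in each group
print(colSums(toy_high))
```

Each row is one student and each column is one group, which is the layout that vennCounts is given above.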

Next, we can use the vennCounts command to impose the structure needed to generate the Venn diagram.

a <- vennCounts(c3)

a
     hw hm hr Counts
[1,]  0  0  0    113
[2,]  0  0  1     18
[3,]  0  1  0      8
[4,]  0  1  1      8
[5,]  1  0  0     12
[6,]  1  0  1      8
[7,]  1  1  0     11
[8,]  1  1  1     22
attr(,"class")
[1] "VennCounts"
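Each row of this table gives the number of students with one particular combination of memberships. For example, the last row shows that 22 students scored high on all three tests, and the first row shows that 113 scored high on none. As a quick check of how to read it, the sketch below re-enters the printed counts by hand in base R and adds up the rows belonging to each group. The object names venn_tab, n_hw, n_hm, n_hr, and n_any are our own and are not part of limma.

```r
# Counts copied by hand from the vennCounts output printed above
venn_tab <- data.frame(
  hw     = c(0, 0, 0, 0, 1, 1, 1, 1),
  hm     = c(0, 0, 1, 1, 0, 0, 1, 1),
  hr     = c(0, 1, 0, 1, 0, 1, 0, 1),
  Counts = c(113, 18, 8, 8, 12, 8, 11, 22)
)

# All 200 students are accounted for
sum(venn_tab$Counts)

# Total number of students in each group (sum over rows where the flag is 1)
n_hw <- sum(venn_tab$Counts[venn_tab$hw == 1])  # high writing: 53
n_hm <- sum(venn_tab$Counts[venn_tab$hm == 1])  # high math:    49
n_hr <- sum(venn_tab$Counts[venn_tab$hr == 1])  # high reading: 56

# Students who scored high on at least one test: 87
n_any <- sum(venn_tab$Counts) - venn_tab$Counts[1]
```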

We can now generate our Venn diagram with the vennDiagram command:

vennDiagram(a)

[Figure: Venn diagram showing overlapping and unique areas]

While some of the options for the vennDiagram command are specific to tests run on microarray data, we can change some of the formatting. Below, we add names to the groups, we change the relative size of the labels and counts, and we opt for the counts to appear in red.

vennDiagram(a, include = "both", 
  names = c("High Writing", "High Math", "High Reading"), 
  cex = 1, counts.col = "red")

We could opt to present just two groups in this way, and recent versions of limma can display up to five groups. Note that the sizes of the areas of overlap do not correspond to the relative counts. It is also worth noting that the areas in these Venn diagrams may suggest overlap where, in fact, there is none. The example below illustrates this.

g <- cbind(
  g1 = c(rep(0, 6), rep(1, 3)),
  g2 = c(rep(1, 6), rep(0, 3)))
d <- vennCounts(g)
vennDiagram(d)
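Since the drawn areas cannot be relied on here, the counts are what to read. As a minimal base R sketch, the same two groups can be tabulated directly to confirm that no observation belongs to both. The names n_both, n_g1_only, and n_g2_only are our own.

```r
# The same two non-overlapping groups as in the example above
g <- cbind(
  g1 = c(rep(0, 6), rep(1, 3)),
  g2 = c(rep(1, 6), rep(0, 3)))

# Observations in both groups, and in each group alone
n_both    <- sum(g[, "g1"] == 1 & g[, "g2"] == 1)  # 0
n_g1_only <- sum(g[, "g1"] == 1 & g[, "g2"] == 0)  # 3
n_g2_only <- sum(g[, "g1"] == 0 & g[, "g2"] == 1)  # 6

# Cross-tabulation of the two indicators
print(table(g1 = g[, "g1"], g2 = g[, "g2"]))
```

The count for the intersection is 0, so the shared area in the diagram should be read as empty.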