Making Heat Maps In R: A Comparison


Amanda Birmingham, CCBB, UCSD (abirmingham at ucsd.edu)

Heat maps are a staple of data visualization for many tasks, including differential expression analyses on microarray and RNA-Seq data. Many people have already written heat-map-plotting packages for R, so it takes a little effort to decide which to use; here I investigate the performance of the six that I found referenced most frequently online.

My main goals (YMMV) beyond basic plotting were to be able to (a) annotate rows and columns with metadata information, (b) include scales and labels in the figure itself (since figures are often reused in presentations, etc., without caption information), and (c) do as much label customization as possible with the shallowest learning curve. I also wanted automatic dendrogram creation, so rolling my own using ggplot2/etc was out. Note that throughout I have accepted the default colors for every heat map tool, as these are pretty easy to change after the fact if one cares.

TL;DR: I recommend using heatmap.2 if you want simple and easy, and heatmap3 (NB: not “heatmap.3”) if you want more fine-grained control. Both use the same interface so moving from the first to the second if needed should be relatively painless.

Table of Contents

Set-Up

Note that this is an R-kernel Jupyter notebook, which will only be runnable on your notebook server if you have the R kernel installed.

The two test files below must be in the same directory as this notebook before running it. If you download the entire “Making Heat Maps In R” folder from GitHub, they will come with it, or you can download them manually from the links below:

heatmap_test_matrix.txt
heatmap_test_annotation.txt

All test “data” used here are just random numbers 🙂
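
If the two test files are not at hand, stand-ins can be generated. The following is only a sketch: the file layout (whitespace-delimited, with row names and a header line) and the `sample_name`/`subject_drug` column names are inferred from how the files are read and used in this notebook, while the file names, dimensions, and the "Placebo" value are made up for illustration.

```r
# Build a small random matrix and a matching annotation table (all values hypothetical)
set.seed(42)
sampleNames <- paste0("sample", 1:6)
toyMatrix <- matrix(rnorm(10 * 6), nrow = 10,
    dimnames = list(paste0("gene", 1:10), sampleNames))
toyAnnotation <- data.frame(
    sample_name = sampleNames,
    subject_drug = rep(c("MiracleDrugA", "MiracleDrugB", "Placebo"), times = 2))

# write.table's defaults write row names plus a header with one fewer field,
# which read.table's defaults then detect as a header automatically
matrixPath <- file.path(tempdir(), "toy_matrix.txt")
annotationPath <- file.path(tempdir(), "toy_annotation.txt")
write.table(toyMatrix, matrixPath, quote = FALSE)
write.table(toyAnnotation, annotationPath, quote = FALSE)

# Read them back the same way the notebook reads the real files
toyLogCpm <- as.matrix(read.table(matrixPath))
toyAnn <- read.table(annotationPath)
```

Pointing the two `read.table` calls in the cells below at these paths should let the rest of the notebook run on the stand-in data.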

In [1]:
# This line prevents SVG output, which does not play well with export to HTML
options(jupyter.plot_mimetypes = c("text/plain", "image/png" ))
In [2]:
# Load the example "data"
gLogCpmData = as.matrix(read.table("heatmap_test_matrix.txt"))
gLogCpmData
In [3]:
# Load the example annotation/metadata
gAnnotationData = read.table("heatmap_test_annotation.txt")
gAnnotationData
In [4]:
# Make helper function to map metadata category to color
mapDrugToColor<-function(annotations){
    colorsVector = ifelse(annotations["subject_drug"]=="MiracleDrugA", 
        "blue", ifelse(annotations["subject_drug"]=="MiracleDrugB", 
        "green", "red"))
    return(colorsVector)
}
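
As the number of categories grows, nested `ifelse` calls get hard to read. One alternative, sketched here with a hypothetical helper name (`mapDrugToColorByLookup`) and a made-up "Placebo" value, is a named-vector lookup with a fallback color. Unlike the helper above, it returns a plain character vector rather than a one-column matrix.

```r
# Palette names must match the values found in subject_drug
drugPalette <- c(MiracleDrugA = "blue", MiracleDrugB = "green")

mapDrugToColorByLookup <- function(annotations, palette = drugPalette,
                                   fallback = "red") {
    # as.character() guards against factor columns, which would otherwise
    # index the palette by integer code instead of by name
    drugs <- as.character(annotations[["subject_drug"]])
    colorsVector <- unname(palette[drugs])
    # Anything not in the palette gets the fallback color
    colorsVector[is.na(colorsVector)] <- fallback
    colorsVector
}

toyAnnotations <- data.frame(
    sample_name = c("s1", "s2", "s3"),
    subject_drug = c("MiracleDrugA", "MiracleDrugB", "Placebo"))
mapDrugToColorByLookup(toyAnnotations)
```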

heatmap

heatmap is the built-in option for heat maps in R:

In [5]:
# Test heatmap with column annotations
testHeatmap<-function(logCPM, annotations) {    
    sampleColors = mapDrugToColor(annotations)
    heatmap(logCPM, margins=c(5,8), ColSideColors=sampleColors)
}

testHeatmap(gLogCpmData, gAnnotationData)

Not bad, but there are no legends for either the main or annotation information …
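
An annotation legend can be added by hand with base graphics by calling `legend()` after `heatmap()`. This is a sketch on toy random data, not part of the comparison above; the legend position, labels, and text size are my own choices and will likely need tuning for a real figure.

```r
# Toy data: random numbers, as elsewhere in this notebook
set.seed(1)
toyMatrix <- matrix(rnorm(40), nrow = 8,
    dimnames = list(paste0("gene", 1:8), paste0("s", 1:5)))
toyColors <- c("blue", "green", "red", "blue", "green")

# heatmap() invisibly returns the row/column ordering it used
heatmapResult <- heatmap(toyMatrix, margins = c(5, 8),
    ColSideColors = toyColors)

# Draw the annotation legend on top of the finished plot
legend("topright", legend = c("MiracleDrugA", "MiracleDrugB", "other"),
    fill = c("blue", "green", "red"), title = "subject_drug", cex = 0.7)
```

This still leaves the main color scale without a key, so it only partly closes the gap.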

heatmap.2

heatmap.2 is an “enhanced” heat map function from the add-on package gplots:

In [6]:
install.packages("gplots")
The downloaded binary packages are in
	/var/folders/hn/rpn4rhms41v939mg20d7w0dh0000gn/T//RtmpjRP53o/downloaded_packages
In [7]:
library(gplots)

# Test heatmap.2 with column annotations and custom legend text
testHeatmap2<-function(logCPM, annotations) {    
    sampleColors = mapDrugToColor(annotations)
    heatmap.2(logCPM, margins=c(5,8), ColSideColors=sampleColors,
        key.xlab="log CPM",
        key=TRUE, symkey=FALSE, density.info="none", trace="none")
}

testHeatmap2(gLogCpmData, gAnnotationData)
Attaching package: 'gplots'

The following object is masked from 'package:stats':

    lowess

I turned off a few of the default options (density.info, trace) to make the graphic a bit less busy. The default main legend is nice, but I don’t see an option to include a legend for the annotation information.

aheatmap

aheatmap, which stands for “annotated heatmap”, is a heat map plotting function from the NMF package:

In [8]:
install.packages("NMF")
The downloaded binary packages are in
	/var/folders/hn/rpn4rhms41v939mg20d7w0dh0000gn/T//RtmpjRP53o/downloaded_packages
In [9]:
library(NMF)

# Test aheatmap with column annotations
testAheatmap<-function(logCPM, annotations) {    
    aheatmap(logCPM, annCol=annotations[
        "subject_drug"])
}

testAheatmap(gLogCpmData, gAnnotationData)
Loading required package: pkgmaker
Loading required package: registry

Attaching package: 'pkgmaker'

The following object is masked from 'package:base':

    isNamespaceLoaded

Loading required package: rngtools
Loading required package: cluster
NMF - BioConductor layer [OK] | Shared memory capabilities [NO: bigmemory] | Cores 3/4
  To enable shared memory capabilities, try: install.extras('NMF')

Yay, legends for both the main data and the annotations! However, note that something weird is going on here: The dendrograms aren’t showing up right. It appears that somehow the body of the heatmap is overlapping with the finer levels of the dendrograms at both top and left. There may be a way to fix this by digging further into the settings of aheatmap, but since I’m looking for something easy to use out-of-the-box, I consider this a disqualifier for my usage.

pheatmap

pheatmap, where the “p” stands for “pretty”, is the sole function of the package pheatmap:

In [10]:
install.packages("pheatmap")
The downloaded binary packages are in
	/var/folders/hn/rpn4rhms41v939mg20d7w0dh0000gn/T//RtmpjRP53o/downloaded_packages
In [11]:
library(pheatmap)

# Test pheatmap with two annotation options
testPheatmap<-function(logCPM, annotations) {    
    drug_info = data.frame(annotations[,"subject_drug"])
    rownames(drug_info) = annotations[["sample_name"]]
    
    # Assign the column annotation straight from 
    # the input annotation dataframe
    pheatmap(logCPM, annotation_col=drug_info, 
        annotation_names_row=FALSE,
        annotation_names_col=FALSE,
        fontsize_col=5)
    
    # Assign the column annotation to an intermediate
    # variable first in order to change the name 
    # pheatmap uses for its legend
    subject_drug = annotations[["subject_drug"]]
    drug_df = data.frame(subject_drug)
    rownames(drug_df) = annotations[["sample_name"]]
    
    pheatmap(logCPM, annotation_col=drug_df, 
        annotation_names_row=FALSE,
        annotation_names_col=FALSE,
        fontsize_col=5)
}
testPheatmap(gLogCpmData, gAnnotationData)

Again, nice to have legends for both main and annotation information. Note that:

  1. One controls the annotation legend title through the column name of the annotation data frame (which data.frame() takes from the variable name by default), which I consider suboptimal as variable names often do not read nicely as English text.
  2. The function uses row names to match annotations to data, so the annotations must be supplied as a data frame whose row names match the column names of the data matrix.
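
Both points can be handled when building the annotation data frame. The sketch below uses only base R on toy data; the column name "Drug" and the sample names are illustrative. Naming the column explicitly should give a more readable legend title, and the check confirms that every matrix column has a matching annotation row.

```r
set.seed(7)
toyMatrix <- matrix(rnorm(12), nrow = 3,
    dimnames = list(paste0("gene", 1:3), c("s1", "s2", "s3", "s4")))

# Annotation rows deliberately in a different order than the matrix columns
toyAnnotations <- data.frame(
    sample_name = c("s3", "s1", "s4", "s2"),
    subject_drug = c("MiracleDrugB", "MiracleDrugA",
        "MiracleDrugB", "MiracleDrugA"))

# Name the column explicitly instead of relying on a variable name
drug_df <- data.frame(
    Drug = toyAnnotations[["subject_drug"]],
    row.names = as.character(toyAnnotations[["sample_name"]]))

# Fail early if the sample names do not line up
stopifnot(setequal(rownames(drug_df), colnames(toyMatrix)))

# Reordering is only for easier inspection; matching is done by name
drug_df <- drug_df[colnames(toyMatrix), , drop = FALSE]
drug_df
```

A data frame built this way could then be passed as `annotation_col` in place of `drug_df` in the cell above.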

heatmap3

heatmap3 is the central function of the heatmap3 package. Beware that this is different from “heatmap.3”, of which there are numerous versions (e.g., here, here, and here). Apparently a lot of people felt heatmap.2 needed an upgrade! I don’t investigate these others here because I haven’t seen them discussed online by users very often.

In [12]:
install.packages("heatmap3")
The downloaded binary packages are in
	/var/folders/hn/rpn4rhms41v939mg20d7w0dh0000gn/T//RtmpjRP53o/downloaded_packages
In [13]:
library(heatmap3)

# Test heatmap3 with several annotation options
testHeatmap3<-function(logCPM, annotations) {    
    sampleColors = mapDrugToColor(annotations)
    
    # Assign just column annotations
    heatmap3(logCPM, margins=c(5,8), ColSideColors=sampleColors) 
    
    # Assign column annotations and make a custom legend for them
    heatmap3(logCPM, margins=c(5,8), ColSideColors=sampleColors, 
        legendfun=function()showLegend(legend=c("MiracleDrugA", 
        "MiracleDrugB", "?"), col=c("blue", "green", "red"), cex=1.5))
    
    # Assign column annotations as a mini-graph instead of colors,
    # and use the built-in labeling for them
    ColSideAnn<-data.frame(Drug=annotations[["subject_drug"]])
    heatmap3(logCPM,ColSideAnn=ColSideAnn,
        ColSideFun=function(x)showAnn(x),
        ColSideWidth=0.8)
}
             
testHeatmap3(gLogCpmData, gAnnotationData)

This one follows the syntax of heatmap.2, which is good if one already knows the latter. But it is quite complicated … definitely complicated enough to get me into trouble (e.g., in the second option above, my annotation legend runs into my heat map and I’ve lost the main legend). It may also be complicated enough to get me out of trouble again (e.g., via explicit setting of the legend and/or heat map placement) but it would clearly take more digging.

annHeatmap2

annHeatmap2 is the core function of the Heatplus package. Unlike the other packages discussed in this evaluation, Heatplus is available through the Bioconductor bioinformatics software project rather than through CRAN.

In [14]:
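# Install Heatplus from Bioconductor. biocLite() was the installer for
# Bioconductor 3.3 (used here); Bioconductor 3.8 and later use
# BiocManager::install("Heatplus") instead.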
source("http://bioconductor.org/biocLite.R")
biocLite("Heatplus")
Bioconductor version 3.3 (BiocInstaller 1.22.3), ?biocLite for help
BioC_mirror: https://bioconductor.org
Using Bioconductor 3.3 (BiocInstaller 1.22.3), R 3.3.0 (2016-05-03).
Installing package(s) 'Heatplus'
The downloaded binary packages are in
	/var/folders/hn/rpn4rhms41v939mg20d7w0dh0000gn/T//RtmpjRP53o/downloaded_packages
Old packages: 'AnnotationDbi', 'DBI', 'IRanges', 'IRdisplay', 'Matrix', 'R6',
  'Rcpp', 'S4Vectors', 'curl', 'devtools', 'digest', 'httr', 'irlba',
  'jsonlite', 'limma', 'manipulate', 'mgcv', 'mime', 'plyr', 'repr',
  'rstudioapi', 'statmod', 'stringi', 'stringr', 'survival', 'withr'
In [15]:
library(Heatplus)

# Test annHeatmap2 with column annotations
testAnnHeatmap2<-function(logCPM, annotations){
    ann.dat = data.frame(annotations[,"subject_drug"])

    plot(annHeatmap2(logCPM, legend=2,
        ann = list(Col = list(data = ann.dat))))
}

testAnnHeatmap2(gLogCpmData, gAnnotationData)

Like heatmap3, annHeatmap2 does metadata annotations as a mini-graph; apparently it doesn’t do such annotations as color bars? It requires that all annotation (and dendrogram, etc) options be passed in as lists, which clearly offers a lot of powerful abilities but is sort of heavy-weight.

Summary

Below I summarize the features I assessed for these tools:

feature heatmap heatmap.2 aheatmap pheatmap heatmap3 annHeatmap2
source built-in cran cran cran cran bioconductor
can add main legend x x x x x
can control main legend text x x
can add row/col annotations x x x x x x
can specify annotations as table column x x x
can add annotation legend x x x ~
can control annotation legend text ~ x ~
notes:
  heatmap: no clear advantages
  heatmap.2: pretty nice results with little tinkering
  aheatmap: appears UNUSABLE as dendrograms show up wrong, at least in notebook
  pheatmap: pretty nice results with little tinkering
  heatmap3: powerful but complicated
  annHeatmap2: powerful but complicated; doesn’t support ColSideColors?

Recommendation

I recommend using heatmap.2 if you want simple and easy, and heatmap3 if you want more fine-grained control. Since they both use the same interface, it should be easy to migrate from the first to the second if you discover you need a more complex solution.
