Make heatmaps in R

Here are a few tips for making heatmaps in R. We’ll use quantile color breaks, so each color represents an equal proportion of the data. We’ll also cluster the data with neatly sorted dendrograms, so it’s easy to see which samples are closely or distantly related.

Read the source code for this post.

Summary

1. Making random data
2. Making a heatmap
3. Uniform breaks
4. Quantile breaks
5. Transforming the data
6. Sorting the dendrograms
7. Rotating column labels

Making random data

Let’s make some random data:

Here’s the data:

Let’s split our columns into 3 groups:

Let’s increase the values for group 1 by a factor of 5:

The data is skewed: most of the values are below 50, but the maximum value is 172:
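The data-generation steps above can be sketched as follows. This is only an illustration: the seed, the gamma distribution, the matrix dimensions, and the row and column naming scheme are all assumptions, so the exact numbers (such as the maximum) will not match the ones quoted in this post.

```r
set.seed(42)  # assumed seed; the original seed is not shown

# 20 rows x 51 columns of skewed, positive values (gamma is an assumption).
mat <- matrix(rgamma(20 * 51, shape = 1) * 5, ncol = 51)

# Split the columns into 3 groups of 17 and encode the group in the name.
col_groups <- rep(c("1", "2", "3"), each = 17)
dimnames(mat) <- list(
  sprintf("row%02d", 1:20),
  sprintf("g%s_%02d", col_groups, 1:51)
)

# Increase the group 1 values by a factor of 5.
mat[, col_groups == "1"] <- mat[, col_groups == "1"] * 5

# Inspect the skew: summary statistics and the share of values below 50.
summary(as.vector(mat))
mean(mat < 50)
max(mat)
```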

Making a heatmap

Let’s make a heatmap and check if we can see that the group 1 values are 5 times larger than the group 2 and 3 values:
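A minimal sketch of such a heatmap, assuming the pheatmap package is installed. The data setup repeats the assumptions stated earlier (seed, gamma distribution, naming scheme), and the palette from `hcl.colors()` is a stand-in for whatever palette the original figure used.

```r
library(pheatmap)

set.seed(42)  # assumed seed
mat <- matrix(rgamma(20 * 51, shape = 1) * 5, ncol = 51)
col_groups <- rep(c("1", "2", "3"), each = 17)
dimnames(mat) <- list(
  sprintf("row%02d", 1:20),
  sprintf("g%s_%02d", col_groups, 1:51)
)
mat[, col_groups == "1"] <- mat[, col_groups == "1"] * 5

# Column annotation: one row per column of mat, labelled by group.
mat_col <- data.frame(group = col_groups, row.names = colnames(mat))

# 10 colors; no breaks are given, so pheatmap picks its default breaks.
p <- pheatmap(
  mat            = mat,
  color          = hcl.colors(10, "Inferno"),
  border_color   = NA,
  show_colnames  = FALSE,
  show_rownames  = FALSE,
  annotation_col = mat_col,
  main           = "Default heatmap"
)
```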

The default color breaks in pheatmap are uniformly distributed across the range of the data.

We can see that values in group 1 are larger than values in groups 2 and 3. However, we can’t distinguish different values within groups 2 and 3.

Uniform breaks

We can visualize the unequal proportions of data represented by each color:

With our uniform breaks and non-uniformly distributed data, we represent 86.5% of the data with a single color.

On the other hand, 6 data points greater than or equal to 100 are represented with 4 different colors.
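One way to check these proportions is to bin the values with the same uniformly spaced breaks and tabulate each bin. The sketch below uses base R only and the assumed data from earlier, so its percentages will differ from the 86.5% quoted above.

```r
set.seed(42)  # assumed seed
mat <- matrix(rgamma(20 * 51, shape = 1) * 5, ncol = 51)
col_groups <- rep(c("1", "2", "3"), each = 17)
mat[, col_groups == "1"] <- mat[, col_groups == "1"] * 5

# 10 colors need 11 uniformly spaced breaks across the range of the data.
uniform_breaks <- seq(min(mat), max(mat), length.out = 11)

# Proportion of values that fall into each color bin.
bins <- cut(as.vector(mat), breaks = uniform_breaks, include.lowest = TRUE)
bin_props <- prop.table(table(bins))
round(bin_props, 3)

barplot(
  bin_props,
  las  = 2,
  ylab = "Proportion of values",
  main = "Uniform breaks"
)
```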

Quantile breaks

If we reposition the breaks at the quantiles of the data, then each color will represent an equal proportion of the data:
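A sketch of how quantile breaks could be computed with base R. The helper name `quantile_breaks` is hypothetical, and the data setup is the same assumed one as before. Duplicate breaks are dropped because heatmap breaks must be strictly increasing.

```r
set.seed(42)  # assumed seed
mat <- matrix(rgamma(20 * 51, shape = 1) * 5, ncol = 51)
col_groups <- rep(c("1", "2", "3"), each = 17)
mat[, col_groups == "1"] <- mat[, col_groups == "1"] * 5

# Hypothetical helper: n breaks placed at evenly spaced quantiles of xs.
quantile_breaks <- function(xs, n = 11) {
  breaks <- quantile(xs, probs = seq(0, 1, length.out = n))
  unname(breaks[!duplicated(breaks)])
}

# 11 breaks define 10 color bins.
mat_breaks <- quantile_breaks(mat, n = 11)

# Each bin should now hold roughly the same number of values.
bin_counts <- table(
  cut(as.vector(mat), breaks = mat_breaks, include.lowest = TRUE)
)
bin_counts
```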

When we use quantile breaks in the heatmap, we can clearly see that group 1 values are much larger than values in groups 2 and 3, and we can also distinguish different values within groups 2 and 3:
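The quantile breaks can be passed to pheatmap through its `breaks` argument, which expects one more break than there are colors. This is a sketch under the same assumptions as before (seed, data, palette), not the exact call behind the original figure.

```r
library(pheatmap)

set.seed(42)  # assumed seed
mat <- matrix(rgamma(20 * 51, shape = 1) * 5, ncol = 51)
col_groups <- rep(c("1", "2", "3"), each = 17)
dimnames(mat) <- list(
  sprintf("row%02d", 1:20),
  sprintf("g%s_%02d", col_groups, 1:51)
)
mat[, col_groups == "1"] <- mat[, col_groups == "1"] * 5
mat_col <- data.frame(group = col_groups, row.names = colnames(mat))

# 11 quantile breaks for 10 colors; unique() guards against ties.
mat_breaks <- unique(quantile(mat, probs = seq(0, 1, length.out = 11)))

p <- pheatmap(
  mat            = mat,
  color          = hcl.colors(length(mat_breaks) - 1, "Inferno"),
  breaks         = mat_breaks,
  border_color   = NA,
  show_colnames  = FALSE,
  show_rownames  = FALSE,
  annotation_col = mat_col,
  main           = "Quantile color scale"
)
```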

Transforming the data

Instead of using quantile breaks, we can also transform the data to the log scale. Notice that the clustering is different on this scale:
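A sketch of the log-scale version, using `log10()` on the assumed data (all values are positive, so the logarithm is defined). The last lines compare the column order from hierarchical clustering on the raw and log scales, using the same Euclidean distance and complete linkage that `hclust(dist(...))` applies by default.

```r
library(pheatmap)

set.seed(42)  # assumed seed
mat <- matrix(rgamma(20 * 51, shape = 1) * 5, ncol = 51)
col_groups <- rep(c("1", "2", "3"), each = 17)
dimnames(mat) <- list(
  sprintf("row%02d", 1:20),
  sprintf("g%s_%02d", col_groups, 1:51)
)
mat[, col_groups == "1"] <- mat[, col_groups == "1"] * 5
mat_col <- data.frame(group = col_groups, row.names = colnames(mat))

# Cluster and color the log-transformed values with default breaks.
log_mat <- log10(mat)
p_log <- pheatmap(
  mat            = log_mat,
  color          = hcl.colors(10, "Inferno"),
  border_color   = NA,
  show_colnames  = FALSE,
  show_rownames  = FALSE,
  annotation_col = mat_col,
  main           = "Log10-transformed heatmap"
)

# Does the column ordering change between the two scales?
raw_order <- hclust(dist(t(mat)))$order
log_order <- hclust(dist(t(log_mat)))$order
identical(raw_order, log_order)
```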

Sorting the dendrograms

The dendrogram on top of the heatmap is messy, because the left-to-right order of the branches is arbitrary:
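To look at the column dendrogram on its own, we can cluster the columns with base R and plot the result. The sketch assumes Euclidean distance and complete linkage (the defaults of `dist()` and `hclust()`) and the same assumed data as before.

```r
set.seed(42)  # assumed seed
mat <- matrix(rgamma(20 * 51, shape = 1) * 5, ncol = 51)
col_groups <- rep(c("1", "2", "3"), each = 17)
dimnames(mat) <- list(
  sprintf("row%02d", 1:20),
  sprintf("g%s_%02d", col_groups, 1:51)
)
mat[, col_groups == "1"] <- mat[, col_groups == "1"] * 5

# Cluster the columns: transpose so that each column becomes an observation.
mat_cluster_cols <- hclust(dist(t(mat)))

plot(mat_cluster_cols, main = "Unsorted dendrogram", xlab = "", sub = "")
```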

Let’s flip the branches to sort the dendrogram. The most similar columns will appear clustered toward the left side of the plot. The columns that are more distant from each other will appear clustered toward the right side of the plot.

Let’s do the same for rows, too, and use these dendrograms in the heatmap:
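One way to sort the branches is the dendsort package, which reorders a dendrogram without changing the clustering itself. The sketch below assumes dendsort and pheatmap are installed. The helper name `sort_hclust` is hypothetical, and the data and palette are the same assumptions as before.

```r
library(pheatmap)
library(dendsort)

set.seed(42)  # assumed seed
mat <- matrix(rgamma(20 * 51, shape = 1) * 5, ncol = 51)
col_groups <- rep(c("1", "2", "3"), each = 17)
dimnames(mat) <- list(
  sprintf("row%02d", 1:20),
  sprintf("g%s_%02d", col_groups, 1:51)
)
mat[, col_groups == "1"] <- mat[, col_groups == "1"] * 5
mat_col <- data.frame(group = col_groups, row.names = colnames(mat))

# Hypothetical helper: sort the branches of an hclust object with dendsort().
sort_hclust <- function(hc) as.hclust(dendsort(as.dendrogram(hc)))

mat_cluster_cols <- sort_hclust(hclust(dist(t(mat))))
mat_cluster_rows <- sort_hclust(hclust(dist(mat)))

plot(mat_cluster_cols, main = "Sorted dendrogram", xlab = "", sub = "")

# Pass the sorted trees to pheatmap instead of letting it cluster again.
mat_breaks <- unique(quantile(mat, probs = seq(0, 1, length.out = 11)))
p <- pheatmap(
  mat            = mat,
  color          = hcl.colors(length(mat_breaks) - 1, "Inferno"),
  breaks         = mat_breaks,
  border_color   = NA,
  cluster_cols   = mat_cluster_cols,
  cluster_rows   = mat_cluster_rows,
  show_colnames  = FALSE,
  show_rownames  = FALSE,
  annotation_col = mat_col,
  main           = "Sorted dendrograms"
)
```

Sorting only flips branches at each merge, so the tree heights and cluster memberships stay the same. Only the left-to-right order of the leaves changes.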

Rotating column labels

Here’s a way to rotate the column labels in pheatmap (thanks to Josh O’Brien):
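The original workaround is not reproduced here. As an alternative sketch, recent versions of pheatmap (1.0.12 and later, as far as I know) accept an `angle_col` argument that rotates the column labels directly. It takes one of a few fixed angles, given as strings such as "45" or "90". The data setup is the same assumed one as before.

```r
library(pheatmap)

set.seed(42)  # assumed seed
mat <- matrix(rgamma(20 * 51, shape = 1) * 5, ncol = 51)
col_groups <- rep(c("1", "2", "3"), each = 17)
dimnames(mat) <- list(
  sprintf("row%02d", 1:20),
  sprintf("g%s_%02d", col_groups, 1:51)
)
mat[, col_groups == "1"] <- mat[, col_groups == "1"] * 5

# Show the column labels and rotate them by 45 degrees.
p <- pheatmap(
  mat           = mat,
  color         = hcl.colors(10, "Inferno"),
  border_color  = NA,
  show_colnames = TRUE,
  show_rownames = FALSE,
  angle_col     = "45",
  fontsize_col  = 6,
  main          = "Rotated column labels"
)
```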