Kamil Slowikowski

Color points by density with ggplot2

A code snippet that shows how to color points by density in R with ggplot2.

Load libraries, define a convenience function to call MASS::kde2d, and generate some data:

library(MASS)
library(ggplot2)
library(viridis)
theme_set(theme_bw(base_size = 16))

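# Get density of points in 2 dimensions.
# @param x A numeric vector.
# @param y A numeric vector.
# @param n Number of grid points per axis (an n by n grid) used to estimate density.
# @return A numeric vector with the estimated density at each point's grid cell.
get_density <- function(x, y, n = 100) {
  # Estimate the 2D kernel density on an n by n grid spanning the data.
  dens <- MASS::kde2d(x = x, y = y, n = n)
  # Find the grid index of each point along each axis.
  ix <- findInterval(x, dens$x)
  iy <- findInterval(y, dens$y)
  # A two-column matrix index picks one element of dens$z per point.
  ii <- cbind(ix, iy)
  return(dens$z[ii])
}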

set.seed(1)
dat <- data.frame(
  x = c(
    rnorm(1e4, mean = 0, sd = 0.1),
    rnorm(1e3, mean = 0, sd = 0.1)
  ),
  y = c(
    rnorm(1e4, mean = 0, sd = 0.1),
    rnorm(1e3, mean = 0.1, sd = 0.2)
  )
)
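
The lookup at the end of get_density may be easier to follow on a toy example. The sketch below uses made-up grid breakpoints and a made-up density matrix (grid_x, grid_y, z, px, and py are illustrative names, not part of the snippet above) to show how findInterval and a two-column index matrix pick one value per point:

```r
# Toy grid: 3 breakpoints per axis and a 3 by 3 matrix of made-up values.
grid_x <- c(0, 1, 2)
grid_y <- c(0, 10, 20)
z <- matrix(1:9, nrow = 3)  # z[i, j] pairs with grid_x[i] and grid_y[j]

# Two hypothetical points: (0.5, 15) and (2, 0).
px <- c(0.5, 2)
py <- c(15, 0)

# findInterval() returns the index of the largest breakpoint <= each value.
ix <- findInterval(px, grid_x)
iy <- findInterval(py, grid_y)

# A two-column matrix index selects one element per row: z[ix[k], iy[k]].
picked <- z[cbind(ix, iy)]
picked
```

Here the first point maps to z[1, 2] and the second to z[3, 1], which is the same pattern get_density applies to dens$z.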

Notice how the points are overplotted, so you can’t see the peak density:

ggplot(dat) + geom_point(aes(x, y))


Here, we estimate the density on a 100 by 100 grid that spans the data and then color each point by the density of the grid cell it falls in. I recommend viridis for the color scheme.

dat$density <- get_density(dat$x, dat$y)
ggplot(dat) + geom_point(aes(x, y, color = density)) + scale_color_viridis()

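
One optional tweak, not part of the original snippet: since geom_point draws rows in the order they appear in the data, sorting by density should put the densest points on top. A minimal sketch with a made-up data frame (toy and toy_sorted are hypothetical names standing in for dat):

```r
# Made-up stand-in for dat with an already computed density column.
toy <- data.frame(
  x = c(0.0, 0.5, 0.1, 0.9),
  y = c(0.0, 0.4, 0.1, 0.8),
  density = c(9.0, 1.5, 7.2, 0.3)
)

# order() returns the row indices that sort density from low to high,
# so the densest points end up in the last rows and are drawn last.
toy_sorted <- toy[order(toy$density), ]
toy_sorted$density
```

With the real data, the same idea would be dat <- dat[order(dat$density), ] before calling ggplot.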

Here’s what happens when you set n = 15 (the grid cells are too big):

dat$density <- get_density(dat$x, dat$y, n = 15)
ggplot(dat) + geom_point(aes(x, y, color = density)) + scale_color_viridis()

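
To get a feel for why n = 15 looks blocky, it can help to compute how far apart the grid points are. This sketch assumes the default kde2d behavior of placing n evenly spaced grid points across the range of the data; grid_spacing is a hypothetical helper and v is made-up data:

```r
# Hypothetical helper: distance between neighbouring grid points when
# n evenly spaced points span the range of v.
grid_spacing <- function(v, n) {
  diff(range(v)) / (n - 1)
}

# Made-up values with a range of 0.8.
v <- c(-0.4, -0.1, 0.0, 0.2, 0.4)

fine <- grid_spacing(v, n = 100)
coarse <- grid_spacing(v, n = 15)

# How many times wider the cells are with n = 15 than with n = 100.
coarse / fine
```

The ratio is 99 / 14, so each cell is roughly seven times wider along each axis with n = 15, and every point inside a cell gets the same color.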