Color points by density with ggplot2

2017-01-17

Here, we use the 2D kernel density estimation function from the MASS R package to color points by density in a plot created with ggplot2. This helps us see where most of the data points lie in a busy plot with many overplotted points.

Load libraries, define a convenience function to call MASS::kde2d, and generate some data:

library(MASS)
library(ggplot2)
library(viridis)
#> Loading required package: viridisLite
theme_set(theme_bw(base_size = 16))

# Get density of points in 2 dimensions.
# @param x A numeric vector.
# @param y A numeric vector.
# @param ... Arguments passed on to MASS::kde2d, e.g. n (grid points per
#   axis) or h (bandwidths for x and y).
# @return A numeric vector with the estimated density at each point's grid cell.
get_density <- function(x, y, ...) {
  dens <- MASS::kde2d(x, y, ...)
  ix <- findInterval(x, dens$x)
  iy <- findInterval(y, dens$y)
  ii <- cbind(ix, iy)
  return(dens$z[ii])
}

set.seed(1)
dat <- data.frame(
  x = c(
    rnorm(1e4, mean = 0, sd = 0.1),
    rnorm(1e3, mean = 0, sd = 0.1)
  ),
  y = c(
    rnorm(1e4, mean = 0, sd = 0.1),
    rnorm(1e3, mean = 0.1, sd = 0.2)
  )
)
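The lookup inside get_density can be easier to follow on a tiny example. The sketch below uses hypothetical toy data (five points and a 4 by 4 grid, neither of which appears in this post) to show how findInterval maps each point to a grid index, and how a two-column index matrix picks one density value per point:

```r
library(MASS)

# Toy data (an assumption for illustration): five points, tiny grid.
set.seed(42)
x <- rnorm(5)
y <- rnorm(5)

# kde2d returns grid coordinates (dens$x, dens$y) and an n-by-n matrix dens$z.
dens <- MASS::kde2d(x, y, n = 4)
dim(dens$z)

# For each point, findInterval() gives the index of the largest grid
# coordinate that is <= the point's value.
ix <- findInterval(x, dens$x)
iy <- findInterval(y, dens$y)

# Indexing a matrix with a two-column matrix selects one element per row,
# so this yields one density value per point.
point_density <- dens$z[cbind(ix, iy)]
point_density
```

By default the grid spans the range of the data, so every index falls between 1 and n.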

Notice how the points are overplotted, so you can’t see the peak density:

ggplot(dat) + geom_point(aes(x, y))

plot of chunk plot-without-density

Here, we estimate the density on a 100 by 100 grid spanning the data and then color each point by the estimate at its grid cell. I recommend viridis for the color scheme, since it is perceptually uniform and stays readable for colorblind viewers.

dat$density <- get_density(dat$x, dat$y, n = 100)
ggplot(dat) + geom_point(aes(x, y, color = density)) + scale_color_viridis()

plot of chunk plot-with-density
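One optional tweak, not part of the original recipe: ggplot2 draws points in row order, so low-density points can land on top of the peak. A minimal sketch, assuming you want the densest points drawn last, is to sort the data frame by density before plotting. The data here is a small hypothetical sample, and get_density is repeated so the snippet runs on its own:

```r
library(MASS)
library(ggplot2)
library(viridis)

# Same helper as defined earlier in this post.
get_density <- function(x, y, ...) {
  dens <- MASS::kde2d(x, y, ...)
  ix <- findInterval(x, dens$x)
  iy <- findInterval(y, dens$y)
  dens$z[cbind(ix, iy)]
}

# Hypothetical sample data for illustration.
set.seed(1)
dat <- data.frame(x = rnorm(1e3, sd = 0.1), y = rnorm(1e3, sd = 0.1))
dat$density <- get_density(dat$x, dat$y, n = 100)

# Sort rows by increasing density so the densest points are drawn last.
dat_sorted <- dat[order(dat$density), ]

p <- ggplot(dat_sorted) +
  geom_point(aes(x, y, color = density)) +
  scale_color_viridis()
```

Printing p renders the plot.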

Here’s what happens when you set n = 15 (the squares in the grid are too big):

dat$density <- get_density(dat$x, dat$y, n = 15)
ggplot(dat) + geom_point(aes(x, y, color = density)) + scale_color_viridis()

plot of chunk plot-with-density-rough
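To put a number on "too big", you can compare the spacing between grid coordinates for the two values of n. This is a rough sketch on hypothetical sample data, and grid_step is a helper name introduced here for illustration only:

```r
library(MASS)

# Hypothetical sample data for illustration.
set.seed(1)
x <- rnorm(1e3, mean = 0, sd = 0.1)
y <- rnorm(1e3, mean = 0, sd = 0.1)

# Spacing between adjacent grid coordinates along x for a given n.
grid_step <- function(x, y, n) {
  dens <- MASS::kde2d(x, y, n = n)
  diff(dens$x)[1]
}

step_coarse <- grid_step(x, y, n = 15)
step_fine <- grid_step(x, y, n = 100)

# The grid covers the same range with n - 1 steps, so the ratio is 99 / 14.
step_coarse / step_fine
```

With n = 15 each cell is about seven times wider, so many neighboring points share the same density value and the colors look blocky.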

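And what if you modify the bandwidth of the normal kernel with h = c(1, 1)? A bandwidth of 1 is far wider than the spread of this data (standard deviations of 0.1 to 0.2), so the estimate is oversmoothed: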

dat$density <- get_density(dat$x, dat$y, h = c(1, 1), n = 100)
ggplot(dat) + geom_point(aes(x, y, color = density)) + scale_color_viridis()

plot of chunk plot-with-density-bandwith
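For comparison, when h is not supplied, kde2d falls back to the normal reference bandwidth from MASS::bandwidth.nrd for each axis. A small sketch that regenerates the same x and y as above and computes those defaults:

```r
library(MASS)

# Same simulated data as earlier in this post.
set.seed(1)
x <- c(
  rnorm(1e4, mean = 0, sd = 0.1),
  rnorm(1e3, mean = 0, sd = 0.1)
)
y <- c(
  rnorm(1e4, mean = 0, sd = 0.1),
  rnorm(1e3, mean = 0.1, sd = 0.2)
)

# Default bandwidths that kde2d uses when h is missing.
h_default <- c(MASS::bandwidth.nrd(x), MASS::bandwidth.nrd(y))
h_default
```

Both defaults come out well below 1, which is why h = c(1, 1) flattens the density so much.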

Check out the MASS package for more cool functions!

© 2024 Kamil Slowikowski