Quickly aggregate your data with data.table in R
In genomics data, we often have multiple measurements for each gene.
Sometimes we want to aggregate those measurements with a summary statistic such as the mean or median.
The data.table R package is perfect for this task! It can quickly process
very large datasets.
In this note, I show how to average multiple probes in a gene expression
matrix. To see what else you can do with data.table, check out the
cheat sheets available for it.
- Make random data
- Aggregate quickly with data.table
- Aggregate slowly with stats::aggregate()
Step 1. Make random data
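Here is a minimal sketch of what the random data could look like: a matrix with one row per probe and one column per sample, plus a vector assigning each probe to a gene. The sizes and the names `mat`, `probe_gene`, `n_probes`, `n_samples`, and `n_genes` are assumptions chosen for illustration, not the values used in the original post.

```r
set.seed(42)

# Hypothetical sizes, chosen only for illustration.
n_probes  <- 1000
n_samples <- 10
n_genes   <- 200

# Expression matrix: one row per probe, one column per sample.
mat <- matrix(
  rnorm(n_probes * n_samples),
  nrow = n_probes,
  dimnames = list(
    paste0("probe", seq_len(n_probes)),
    paste0("sample", seq_len(n_samples))
  )
)

# Assign each probe to a gene at random, so most genes get several probes.
probe_gene <- sample(paste0("gene", seq_len(n_genes)), n_probes, replace = TRUE)
```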
Step 2. Aggregate quickly with data.table
Now we can easily average the probes for each gene.
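As a sketch, the averaging step might look like the following. It uses a tiny made-up matrix (`mat`) and gene assignment (`probe_gene`) so that it runs on its own. Both names are assumptions for this example. The key idiom is `lapply(.SD, mean)` with `by = gene`, which applies `mean()` to every sample column within each gene.

```r
library(data.table)

set.seed(42)

# Toy data: 6 probes, 3 samples, 3 genes (illustrative only).
mat <- matrix(rnorm(6 * 3), nrow = 6,
              dimnames = list(NULL, c("s1", "s2", "s3")))
probe_gene <- c("A", "A", "B", "B", "B", "C")

# Convert the matrix to a data.table and attach the gene for each probe.
dt <- as.data.table(mat)
dt[, gene := probe_gene]

# Average every sample column within each gene.
dt_mean <- dt[, lapply(.SD, mean), by = gene]
```

The result has one row per gene and one column per sample, with the `gene` column first.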
Step 3. Aggregate slowly with stats::aggregate()
The base R function
stats::aggregate() can do the same thing, but it is slower.
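A minimal sketch of the base R version, using the same kind of toy data (the names `mat` and `probe_gene` are assumptions for this example). To compare speed on your own data, you could wrap each approach in `system.time()`.

```r
set.seed(42)

# Toy data: 6 probes, 3 samples, 3 genes (illustrative only).
mat <- matrix(rnorm(6 * 3), nrow = 6,
              dimnames = list(NULL, c("s1", "s2", "s3")))
probe_gene <- c("A", "A", "B", "B", "B", "C")

# Average every sample column within each gene using base R.
df_mean <- aggregate(
  as.data.frame(mat),
  by = list(gene = probe_gene),
  FUN = mean
)
```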
The results are identical:
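One way to check this, sketched on made-up data: run both approaches, sort both results by gene, and compare them. The names `mat`, `probe_gene`, `a`, and `b` are assumptions for this example. The comparison uses `all.equal()`, which tolerates tiny floating-point differences, rather than `identical()`.

```r
library(data.table)

set.seed(42)

# Illustrative data: 1000 probes, 5 samples, up to 100 genes.
mat <- matrix(rnorm(1000 * 5), nrow = 1000,
              dimnames = list(NULL, paste0("s", 1:5)))
probe_gene <- sample(paste0("gene", 1:100), 1000, replace = TRUE)

# data.table version
dt <- as.data.table(mat)
dt[, gene := probe_gene]
dt_mean <- dt[, lapply(.SD, mean), by = gene]

# base R version
df_mean <- aggregate(as.data.frame(mat), by = list(gene = probe_gene), FUN = mean)

# Put both results in the same row order before comparing.
a <- as.data.frame(dt_mean)
a <- a[order(a$gene), ]
b <- df_mean[order(df_mean$gene), ]
rownames(a) <- NULL
rownames(b) <- NULL

sample_cols  <- colnames(mat)
genes_match  <- identical(as.character(a$gene), as.character(b$gene))
values_match <- all.equal(as.matrix(a[, sample_cols]), as.matrix(b[, sample_cols]))
```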