R

Get notifications on desktop and mobile from long-running jobs in your terminal sessions
·847 words·4 mins
If you’re like me, you get tired of waiting for long-running jobs in the terminal. You run a new command, and you don’t really know how long it should take to finish. …
hlabud: HLA genotype analysis in R
·71 words·1 min
hlabud provides methods to retrieve sequence alignment data from IMGTHLA and convert the data into convenient R matrices ready for downstream analysis. See the usage examples to …
Benchmark principal component analysis (PCA) of scRNA-seq data in R
·1896 words·9 mins
Principal component analysis (PCA) is frequently used for analysis of single-cell RNA-seq (scRNA-seq) data. We can use it to reduce the dimensionality of a large matrix with …
Make a table with ligands and receptors in R with OmnipathR
·4860 words·23 mins
Curated lists of genes help computational biologists to focus analyses on a subset of genes that might be important for a research question. For example, we might be interested to …
Make a table with your most recent coauthors in R
·538 words·3 mins
Some grant agencies might require a table that lists all of your coauthors, departments, and dates for publications from the last few years. Making such a table can be a laborious …
Working with a sparse matrix in R
·2272 words·11 mins
Sparse matrices are necessary for dealing with large single-cell RNA-seq datasets. They require less memory than dense matrices, and they allow some computations to be more …
Harmony in motion: visualize an iterative algorithm for aligning multiple datasets
·2487 words·12 mins
Harmony is an algorithm for aligning multiple high-dimensional datasets, described by Ilya Korsunsky et al. in this paper. When analyzing multiple single-cell RNA-seq datasets, …
Extract data from a PDF file with Tabula
·358 words·2 mins
Kirkham et al. 2006 is a prospective 2-year study of 60 patients with rheumatoid arthritis (RA). It shows that “synovial membrane cytokine mRNA expression is predictive of …
Generate a large color palette with Colorgorical
·222 words·2 mins
Sometimes we need a lot of colors to represent all the categories in our data. We can use the httr and jsonlite packages to retrieve a list of colors from the Colorgorical website …
Make heatmaps in R with pheatmap
·932 words·5 mins
Here are a few tips for making heatmaps with the pheatmap R package by Raivo Kolde. We’ll use quantile color breaks, so each color represents an equal proportion of the data. …
Color points by density with ggplot2
·396 words·2 mins
Here, we use the 2D kernel density estimation function from the MASS R package to color points by density in a plot created with ggplot2. This helps us to see where most of the …
Immunogenomics.io
·36 words·1 min
View the primary genomics data from several biomedical research studies. I developed many of the data visualizations on this site with R and JavaScript. You can view bulk RNA-seq, …
ggrepel: Automatically Position Non-Overlapping Text Labels with 'ggplot2'
·106 words·1 min
ggrepel is an R package that provides geoms for ggplot2 to repel overlapping text labels: geom_text_repel() and geom_label_repel(). Text labels repel away from each other, away from …
Determine if a transcription factor is bound to a genomic site with CENTIPEDE
·106 words·1 min
I wrote a practical tutorial on how to use CENTIPEDE to determine if a transcription factor is bound to a site in the genome. The tutorial explains how to prepare appropriate …
Get human transcription factor target genes
·380 words·2 mins
I made a data package with human transcription factor target genes for use in R. It is a collection of data from three sources: TRED, ITFP, and ENCODE. I use them to test if the …
Quickly aggregate your data in R with data.table
·429 words·3 mins
In genomics data, we often have multiple measurements for each gene. Sometimes we want to aggregate those measurements with the mean, median, or sum. The data.table R package can …
Create a quantile-quantile plot with ggplot2
·652 words·4 mins
After performing many tests for statistical significance, the next step is to check if any results are more extreme than we would expect by random chance. One way to do this is by …