Tutorial

Replace a failed hard drive in a RAID array
·2116 words·10 mins
One of our servers started beeping very loudly and disturbing the people working in the same room. What happened? It turns out that the RAID device was beeping due to a failed hard …
Reduce packet loss on a cable internet connection by increasing power
·953 words·5 mins
An internet connection with high packet loss is frustrating to use. When packets of data are lost, phone calls over wifi can be randomly punctuated with silence. Zoom meetings might …
Benchmark principal component analysis (PCA) of scRNA-seq data in R
·1896 words·9 mins
Principal component analysis (PCA) is frequently used for analysis of single-cell RNA-seq (scRNA-seq) data. We can use it to reduce the dimensionality of a large matrix with …
Debug a workflow on the Terra platform
·1499 words·8 mins
Some biomedical researchers use the Terra platform to run data analysis jobs on Google Cloud. When we run into errors, it can be daunting to figure out where the errors are coming …
Find the most abundant barcodes in FASTQ files
·418 words·2 mins
Single-cell RNA-seq data contains oligonucleotide barcodes to uniquely identify each multiplexed sample, each single cell, and each individual molecule. Can we check which barcodes …
Make a table with ligands and receptors in R with OmnipathR
·4860 words·23 mins
Curated lists of genes help computational biologists to focus analyses on a subset of genes that might be important for a research question. For example, we might be interested to …
Make a table with your most recent coauthors in R
·538 words·3 mins
Some grant agencies might require a table that lists all of your coauthors, departments, and dates for publications from the last few years. Making such a table can be a laborious …
Working with a sparse matrix in R
·2272 words·11 mins
Sparse matrices are necessary for dealing with large single-cell RNA-seq datasets. They require less memory than dense matrices, and they allow some computations to be more …
Harmony in motion: visualize an iterative algorithm for aligning multiple datasets
·2487 words·12 mins
Harmony is an algorithm for aligning multiple high-dimensional datasets, described by Ilya Korsunsky et al. in this paper. When analyzing multiple single-cell RNA-seq datasets, …
Extract data from a PDF file with Tabula
·358 words·2 mins
Kirkham et al. 2006 is a prospective 2-year study of 60 patients with rheumatoid arthritis (RA). It shows that “synovial membrane cytokine mRNA expression is predictive of …
Generate a large color palette with Colorgorical
·222 words·2 mins
Sometimes we need a lot of colors to represent all the categories in our data. We can use the httr and jsonlite packages to retrieve a list of colors from the Colorgorical website …
Make heatmaps in R with pheatmap
·932 words·5 mins
Here are a few tips for making heatmaps with the pheatmap R package by Raivo Kolde. We’ll use quantile color breaks, so each color represents an equal proportion of the data. …
Color points by density with ggplot2
·396 words·2 mins
Here, we use the 2D kernel density estimation function from the MASS R package to color points by density in a plot created with ggplot2. This helps us to see where most of the …
Embed compressed data in HTML files
·173 words·1 min
HTML is great for presenting rich text documents on the web. JavaScript takes the web experience to the next level by allowing the content creator to develop scripts that run on …
Build bioinformatics pipelines with Snakemake
·1458 words·7 mins
Snakemake is a Pythonic variant of GNU Make. Recently, I learned how to use it to build and launch bioinformatics pipelines on an LSF cluster. However, I had trouble understanding …
Determine if a transcription factor is bound to a genomic site with CENTIPEDE
·106 words·1 min
I wrote a practical tutorial for how to use CENTIPEDE to determine if a transcription factor is bound to a site in the genome. The tutorial explains how to prepare appropriate …
Get human transcription factor target genes
·380 words·2 mins
I made a data package with human transcription factor target genes for use in R. It is a collection of data from three sources: TRED, ITFP, and ENCODE. I use them to test if the …
How to ssh to a remote server without typing your password
·408 words·2 mins
Here are a few tips to use ssh more effectively. Log in to your server using public key authentication instead of typing a password. Use the ~/.ssh/config file to create short and …