Notes

Here, you can find 39 posts about computers, programming, computational biology, and biomedical research.

You can also see all my posts by category.

2025

Automatically name screenshots with the active window's title on macOS
·1210 words·6 mins
On macOS, screenshots are named like Screenshot 2025-10-06 at 1.39.16 PM.png. When I’m taking multiple screenshots of figures in multiple papers, I can’t tell which …
Animate a rotating protein from the Protein Data Bank
·870 words·5 mins
Let’s make an animated GIF of a rotating protein from PDB. This animation shows a human HLA-TCR complex (6py2). You’re looking at two human proteins: The HLA protein …

2023

pubmed-pairs: Search PubMed for each pair of terms from two lists
·292 words·2 mins
Searching PubMed is my favorite way to find biomedical research about any topic. But sometimes I want to explore complicated queries. For example, how many papers have been …
Replace a failed hard drive in a RAID array
·2116 words·10 mins
One of our servers started beeping very loudly and disturbing the people working in the same room. What happened? It turns out that the RAID device was beeping due to a failed hard …
Reduce packet loss on a cable internet connection by increasing power
·953 words·5 mins
An internet connection with high packet loss is frustrating to use. When packets of data are lost, phone calls over wifi can be randomly punctuated with silence. Zoom meetings might …

2022

Benchmark principal component analysis (PCA) of scRNA-seq data in R
·1896 words·9 mins
Principal component analysis (PCA) is frequently used for analysis of single-cell RNA-seq (scRNA-seq) data. We can use it to reduce the dimensionality of a large matrix with …

2021

Debug a workflow on the Terra platform
·1499 words·8 mins
Some biomedical researchers use the Terra platform to run data analysis jobs on Google Cloud. When we run into errors, it can be daunting to figure out where the errors are coming …
Tools for exploring the scientific literature
·2108 words·10 mins
There are millions of scientific publications, and many people have created tools for exploring them. In this post, we’ll highlight a few websites and tools that allow us to …
Find the most abundant barcodes in FASTQ files
·418 words·2 mins
Single-cell RNA-seq data contains oligonucleotide barcodes to uniquely identify each multiplexed sample, each single cell, and each individual molecule. Can we check which barcodes …

2020

Make a table with ligands and receptors in R with OmnipathR
·4860 words·23 mins
Curated lists of genes help computational biologists to focus analyses on a subset of genes that might be important for a research question. For example, we might be interested to …
Resources for illustrating biomedical science
·862 words·5 mins
Clear illustrations are essential for effective science communication. Here, we’ll mention a few resources that might be of interest for researchers, educators, or anyone …
Make a table with your most recent coauthors in R
·538 words·3 mins
Some grant agencies might require a table that lists all of your coauthors, departments, and dates for publications from the last few years. Making such a table can be a laborious …
Working with a sparse matrix in R
·2272 words·11 mins
Sparse matrices are necessary for dealing with large single-cell RNA-seq datasets. They require less memory than dense matrices, and they allow some computations to be more …

2019

Harmony in motion: visualize an iterative algorithm for aligning multiple datasets
·2487 words·12 mins
Harmony is an algorithm for aligning multiple high-dimensional datasets, described by Ilya Korsunsky et al. in this paper. When analyzing multiple single-cell RNA-seq datasets, …
Monitor disk usage on your server
·532 words·3 mins
You might consider monitoring your disk usage, because this might reveal trends that help you to plan for the future. Here, we’ll use a cron job to periodically scan a …
Make tidy variant tables with MyVariant.info and Tabulator
·140 words·1 min
We can use the MyVariant.info API and Tabulator to create a web page for making tidy tables with genomic variants.
Make tidy gene tables with MyGene.info and Tabulator
·139 words·1 min
Let’s use the MyGene.info API with the Tabulator JavaScript library by Oli Folkerd to create a simple web page for making tidy tables with information about genes. See how it …

2018

Repeating pattern of colorful circles
·39 words·1 min
What happens if you move circles back and forth, and each one changes its color as it moves? 🌈 You end up with this groovy animation: Full screen: circles
Extract data from a PDF file with Tabula
·358 words·2 mins
Kirkham et al. 2006 is a prospective 2-year study of 60 patients with rheumatoid arthritis (RA). It shows that “synovial membrane cytokine mRNA expression is predictive of …
Generate a large color palette with Colorgorical
·222 words·2 mins
Sometimes we need a lot of colors to represent all the categories in our data. We can use the httr and jsonlite packages to retrieve a list of colors from the Colorgorical website …
Animated doodle
Animated doodle.

2017

Barnsley fern
·26 words·1 min
🌿 A JavaScript animation of the Barnsley Fern, inspired by Chaos Game - Numberphile. Please, fork it on GitHub:
Make heatmaps in R with pheatmap
·932 words·5 mins
Here are a few tips for making heatmaps with the pheatmap R package by Raivo Kolde. We’ll use quantile color breaks, so each color represents an equal proportion of the data. …
Color points by density with ggplot2
·396 words·2 mins
Here, we use the 2D kernel density estimation function from the MASS R package to color points by density in a plot created with ggplot2. This helps us to see where most of the …

2016

Embed compressed data in HTML files
·173 words·1 min
HTML is great for presenting rich text documents on the web. JavaScript takes the web experience to the next level by allowing the content creator to develop scripts that run on …
snpbook: view single nucleotide polymorphism linkage disequilibrium in your browser
SNP LD in your browser. (Doesn’t load anymore).

2015

Build bioinformatics pipelines with Snakemake
·1458 words·7 mins
Snakemake is a Pythonic variant of GNU Make. Recently, I learned how to use it to build and launch bioinformatics pipelines on an LSF cluster. However, I had trouble understanding …
Determine if a transcription factor is bound to a genomic site with CENTIPEDE
·106 words·1 min
I wrote a practical tutorial for how to use CENTIPEDE to determine if a transcription factor is bound to a site in the genome. The tutorial explains how to prepare appropriate …
Print bigWig data for each region in a BED file
·275 words·2 mins
I wrote a Bash script to call bigWigToBedGraph for each region in a BED file. You can quickly take a subset of bigWig data for regions of interest. In my particular case, I needed …
Get human transcription factor target genes
·380 words·2 mins
I made a data package with human transcription factor target genes for use in R. It is a collection of data from three sources: TRED, ITFP, and ENCODE. I use them to test if the …
Quickly aggregate your data in R with data.table
·429 words·3 mins
In genomics data, we often have multiple measurements for each gene. Sometimes we want to aggregate those measurements with the mean, median, or sum. The data.table R package can …

2014

Make ribosomal RNA intervals for Picard CollectRnaSeqMetrics
·219 words·2 mins
Before you can use the CollectRnaSeqMetrics Picard tool, you must create a table of genomic intervals with the coordinates of all ribosomal genes in the genome. I wrote a bash …
Join multiple PLINK dosage files into one file
·179 words·1 min
If you have multiple PLINK dosage files and would like to merge them into one file, this script might save you some time.
Autocomplete gene names with mygene.info and typeahead.js
·128 words·1 min
We can use mygene.info with typeahead.js to autocomplete gene names and retrieve every annotation you can think of (GO, KEGG, Ensembl, position, homologs, etc.). Try typing your …
Create a quantile-quantile plot with ggplot2
·652 words·4 mins
After performing many tests for statistical significance, the next step is to check if any results are more extreme than we would expect by random chance. One way to do this is by …
How to ssh to a remote server without typing your password
·408 words·2 mins
Here are a few tips to use ssh more effectively. Log in to your server using public key authentication instead of typing a password. Use the ~/.ssh/config file to create short and …

2013

Count the number of coding base pairs in each Gencode gene
·226 words·2 mins
We can use Python to count the coding base pairs in each Gencode gene. Here, we report the base pair count by gene rather than by transcript. When we encounter different …
GTEx RNA-Seq Visualizations
·121 words·1 min
I created three visualizations of RNA-Seq data from the GTEx project (version 2013-03-21). They’re powered by JBrowse, the WashU Epigenome Browser, and canvasXpress.
0-based and 1-based genomic intervals, overlap, and distance
·425 words·2 mins
Here, I describe two kinds of genomic intervals and include source code for testing overlap and calculating distance between intervals.