Notes

Here, you can find 40 posts about computers, programming, computational biology, or biomedical research.

You can also see all my posts by category.

2025

Get notifications on desktop and mobile from long-running jobs in your terminal sessions

2 December 2025·847 words·4 mins

Bash R Python

Use ntfy.sh to send yourself desktop and mobile push notifications when long-running terminal jobs finish.
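
As a rough illustration of the idea (not the code from the post itself), a notification can be published by sending an HTTP POST with the message body to a topic URL on ntfy.sh. The topic name `my-jobs-topic` and the helper names `build_notification` and `send_notification` are hypothetical, chosen only for this sketch.

```python
import urllib.request

NTFY_BASE_URL = "https://ntfy.sh"


def build_notification(topic, message, title=None):
    """Build a POST request that publishes `message` to an ntfy topic.

    `topic` is a hypothetical topic name; anyone who knows it can
    subscribe, so pick something hard to guess.
    """
    headers = {}
    if title is not None:
        # ntfy reads the notification title from the "Title" header.
        headers["Title"] = title
    return urllib.request.Request(
        url=f"{NTFY_BASE_URL}/{topic}",
        data=message.encode("utf-8"),
        headers=headers,
        method="POST",
    )


def send_notification(topic, message, title=None):
    """Send the notification and return the HTTP status code."""
    request = build_notification(topic, message, title)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

Calling `send_notification("my-jobs-topic", "Alignment finished", title="Job done")` at the end of a long-running script would push the message to any device subscribed to that topic.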

Automatically name screenshots with the active window's title on macOS

6 October 2025·1210 words·6 mins

Automatically rename macOS screenshots using the active window’s title, so each file says what it shows.

Animate a rotating protein from the Protein Data Bank

4 October 2025·870 words·5 mins

Python

Make an animated gif of a rotating protein structure from the Protein Data Bank, here a human HLA-TCR complex (6py2).

2023

pubmed-pairs: Search PubMed for each pair of terms from two lists

18 September 2023·292 words·2 mins

Javascript

A tool that searches PubMed for every pairwise combination of terms from two lists, to map associations in the literature.
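
The pairing step can be sketched in a few lines. This is a minimal Python illustration, not the tool's own JavaScript; the function name `pairwise_queries` and the quoted `AND` query format are assumptions made for this example.

```python
from itertools import product


def pairwise_queries(terms_a, terms_b):
    """Return one PubMed-style query string for every pair of terms.

    Each term is quoted and the two are joined with AND, so a list of
    m terms and a list of n terms yields m * n queries.
    """
    return [f'"{a}" AND "{b}"' for a, b in product(terms_a, terms_b)]


queries = pairwise_queries(["IL6", "TNF"], ["asthma", "lupus"])
for query in queries:
    print(query)
```

Each query could then be submitted to PubMed, and the number of hits per pair recorded in a table.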

Replace a failed hard drive in a RAID array

22 February 2023·2116 words·10 mins

Tutorial

One of our servers started beeping very loudly and disturbing the people working in the same room. What happened? It turns out that the RAID device was beeping due to a failed hard …

Reduce packet loss on a cable internet connection by increasing power

4 January 2023·953 words·5 mins

Tutorial

An internet connection with high packet loss is frustrating to use. When packets of data are lost, phone calls over wifi can be randomly punctuated with silence. Zoom meetings might …

2022

Benchmark principal component analysis (PCA) of scRNA-seq data in R

24 January 2022·1896 words·9 mins

R Tutorial

Principal component analysis (PCA) is frequently used for analysis of single-cell RNA-seq (scRNA-seq) data. We can use it to reduce the dimensionality of a large matrix with …

2021

Debug a workflow on the Terra platform

2 September 2021·1499 words·8 mins

Tutorial Cloud

Some biomedical researchers use the Terra platform to run data analysis jobs on Google Cloud. When we run into errors, it can be daunting to figure out where the errors are coming …

Tools for exploring the scientific literature

10 August 2021·2108 words·10 mins

List

There are millions of scientific publications, and many people have created tools for exploring them. In this post, we’ll highlight a few websites and tools that allow us to …

Find the most abundant barcodes in FASTQ files

22 April 2021·418 words·2 mins

Tutorial

Single-cell RNA-seq data contains oligonucleotide barcodes to uniquely identify each multiplexed sample, each single cell, and each individual molecule. Can we check which barcodes …

2020

Make a table with ligands and receptors in R with OmnipathR

24 November 2020·4860 words·23 mins

R Tutorial

Curated lists of genes help computational biologists to focus analyses on a subset of genes that might be important for a research question. For example, we might be interested to …

Resources for illustrating biomedical science

24 August 2020·907 words·5 mins

List

Clear illustrations are essential for effective science communication. Here, we’ll mention a few resources that might be of interest for researchers, educators, or anyone …

Make a table with your most recent coauthors in R

13 August 2020·538 words·3 mins

R Tutorial

Some grant agencies might require a table that lists all of your coauthors, departments, and dates for publications from the last few years. Making such a table can be a laborious …

Working with a sparse matrix in R

11 March 2020·2272 words·11 mins

R Tutorial

Sparse matrices are necessary for dealing with large single-cell RNA-seq datasets. They require less memory than dense matrices, and they allow some computations to be more …

2019

Harmony in motion: visualize an iterative algorithm for aligning multiple datasets

25 August 2019·2487 words·12 mins

R Tutorial

Harmony is an algorithm for aligning multiple high-dimensional datasets, described by Ilya Korsunsky et al. in this paper. When analyzing multiple single-cell RNA-seq datasets, …

Monitor disk usage on your server

18 July 2019·532 words·3 mins

Bash

You might consider monitoring your disk usage, because this might reveal trends that help you to plan for the future. Here, we’ll use a cron job to periodically scan a …

Make tidy variant tables with MyVariant.info and Tabulator

14 July 2019·140 words·1 min

Javascript

We can use the MyVariant.info API and Tabulator to create a web page for making tidy tables with genomic variants.

Make tidy gene tables with MyGene.info and Tabulator

10 July 2019·139 words·1 min

Javascript

Let’s use the MyGene.info API with the Tabulator JavaScript library by Oli Folkerd to create a simple web page for making tidy tables with information about genes. See how it …

2018

Repeating pattern of colorful circles

30 December 2018·39 words·1 min

Javascript

A groovy JavaScript animation made by moving circles back and forth while each one shifts color as it goes.

Extract data from a PDF file with Tabula

29 December 2018·358 words·2 mins

R Data Rheumatoid-Arthritis Tutorial

Kirkham et al. 2006 is a prospective 2-year study of 60 patients with rheumatoid arthritis (RA). It shows that “synovial membrane cytokine mRNA expression is predictive of …

Generate a large color palette with Colorgorical

23 July 2018·222 words·2 mins

R Tutorial

Sometimes we need a lot of colors to represent all the categories in our data. We can use the httr and jsonlite packages to retrieve a list of colors from the Colorgorical website …

Animated doodle ↗

3 July 2018

Animated doodle.

2017

Barnsley fern

7 May 2017·59 words·1 min

Javascript

A JavaScript animation that draws the Barnsley fern with the chaos game, inspired by Numberphile.

Make heatmaps in R with pheatmap

16 February 2017·932 words·5 mins

R Tutorial

Here are a few tips for making heatmaps with the pheatmap R package by Raivo Kolde. We’ll use quantile color breaks, so each color represents an equal proportion of the data. …

Color points by density with ggplot2

17 January 2017·396 words·2 mins

R Tutorial

Here, we use the 2D kernel density estimation function from the MASS R package to color points by density in a plot created with ggplot2. This helps us to see where most of the …

2016

Embed compressed data in HTML files

17 December 2016·173 words·1 min

Tutorial

HTML is great for presenting rich text documents on the web. JavaScript takes the web experience to the next level by allowing the content creator to develop scripts that run on …

snpbook: view single nucleotide polymorphism linkage disequilibrium in your browser ↗

18 November 2016

SNP LD in your browser. (Doesn’t load anymore.)

2015

Build bioinformatics pipelines with Snakemake

23 November 2015·1458 words·7 mins

Python Tutorial

Snakemake is a Pythonic variant of GNU Make. Recently, I learned how to use it to build and launch bioinformatics pipelines on an LSF cluster. However, I had trouble understanding …

Determine if a transcription factor is bound to a genomic site with CENTIPEDE

3 June 2015·106 words·1 min

R Tutorial

I wrote a practical tutorial for how to use CENTIPEDE to determine if a transcription factor is bound to a site in the genome. The tutorial explains how to prepare appropriate …

Print bigWig data for each region in a BED file

30 March 2015·275 words·2 mins

Bash

I wrote a Bash script to call bigWigToBedGraph for each region in a BED file. You can quickly take a subset of bigWig data for regions of interest. In my particular case, I needed …

Get human transcription factor target genes

5 March 2015·380 words·2 mins

R Tutorial

I made a data package with human transcription factor target genes for use in R. It is a collection of data from three sources: TRED, ITFP, and ENCODE. I use them to test if the …

Quickly aggregate your data in R with data.table

28 January 2015·429 words·3 mins

In genomics data, we often have multiple measurements for each gene. Sometimes we want to aggregate those measurements with the mean, median, or sum. The data.table R package can …

2014

Make ribosomal RNA intervals for Picard CollectRnaSeqMetrics

12 December 2014·219 words·2 mins

Bash

Before you can use the CollectRnaSeqMetrics Picard tool, you must create a table of genomic intervals with the coordinates of all ribosomal genes in the genome. I wrote a Bash …

Join multiple PLINK dosage files into one file

29 October 2014·179 words·1 min

Python

If you have multiple PLINK dosage files and would like to merge them into one file, this script might save you some time.

Autocomplete gene names with mygene.info and typeahead.js

5 October 2014·128 words·1 min

Javascript

We can use mygene.info with typeahead.js to autocomplete gene names and retrieve every annotation you can think of (GO, KEGG, Ensembl, position, homologs, etc.). Try typing your …

Create a quantile-quantile plot with ggplot2

16 February 2014·652 words·4 mins

After performing many tests for statistical significance, the next step is to check if any results are more extreme than we would expect by random chance. One way to do this is by …

How to ssh to a remote server without typing your password

4 February 2014·408 words·2 mins

Tutorial

Here are a few tips for using ssh more effectively. Log in to your server using public key authentication instead of typing a password. Use the ~/.ssh/config file to create short and …

2013

Count the number of coding base pairs in each Gencode gene

23 December 2013·226 words·2 mins

Python

We can use Python to count the coding base pairs in each Gencode gene. Here, we report the base pair count by gene rather than by transcript. When we encounter different …

GTEx RNA-Seq Visualizations

21 September 2013·121 words·1 min

Javascript

I created three visualizations of RNA-Seq data from the GTEx project (version 2013-03-21). They’re powered by JBrowse, the WashU Epigenome Browser, and canvasXpress.

0-based and 1-based genomic intervals, overlap, and distance

7 August 2013·425 words·2 mins

Python

Here, I describe two kinds of genomic intervals and include source code for testing overlap and calculating distance between intervals.
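
To make the distinction concrete, here is a minimal sketch, assuming the two kinds are 0-based half-open intervals `[start, end)` and 1-based closed intervals `[start, end]`. The function names are hypothetical, and "distance" is taken here to mean the number of bases lying strictly between two intervals, which may differ from the definition used in the post.

```python
def overlap_half_open(start1, end1, start2, end2):
    """Bases shared by two 0-based, half-open intervals [start, end)."""
    return max(0, min(end1, end2) - max(start1, start2))


def overlap_closed(start1, end1, start2, end2):
    """Bases shared by two 1-based, closed intervals [start, end]."""
    return max(0, min(end1, end2) - max(start1, start2) + 1)


def distance_half_open(start1, end1, start2, end2):
    """Bases strictly between two 0-based, half-open intervals.

    Returns 0 when the intervals overlap or are directly adjacent.
    """
    return max(0, max(start1, start2) - min(end1, end2))


def distance_closed(start1, end1, start2, end2):
    """Bases strictly between two 1-based, closed intervals.

    Returns 0 when the intervals overlap or are directly adjacent.
    """
    return max(0, max(start1, start2) - min(end1, end2) - 1)
```

The same ten bases are written as `[0, 10)` in the 0-based system and `[1, 10]` in the 1-based system, so the closed versions differ from the half-open ones only by the `+ 1` and `- 1` corrections.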
