Kamil Slowikowski

Make heatmaps in R

Here are a few tips for making heatmaps in R. We’ll use quantile color breaks, so each color represents an equal proportion of the data. We’ll also cluster the data with neatly sorted dendrograms, so it’s easy to see which samples are closely or distantly related.
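
To make the idea of quantile color breaks concrete, here is a minimal sketch in Python (not the R code from the post). The helper names `quantile_breaks` and `color_index` are hypothetical, and the sketch assumes the data has few ties, so the break points are distinct.

```python
from bisect import bisect_right
from statistics import quantiles


def quantile_breaks(values, n_colors):
    """Return n_colors + 1 break points so that each color bin
    covers roughly the same number of values."""
    # Interior cut points at the 1/n, 2/n, ... quantiles of the data.
    cuts = quantiles(values, n=n_colors, method="inclusive")
    return [min(values)] + cuts + [max(values)]


def color_index(value, breaks):
    """Map a value to the index of its color bin (0 .. n_colors - 1)."""
    idx = bisect_right(breaks, value) - 1
    # Clamp so the maximum value falls in the last bin.
    return min(max(idx, 0), len(breaks) - 2)


# Strongly skewed example data: evenly spaced breaks would put almost
# every value into the first color.
values = [2 ** i for i in range(16)]
breaks = quantile_breaks(values, n_colors=4)
bins = [color_index(v, breaks) for v in values]
```

With evenly spaced breaks, most of these skewed values would share one color. With quantile breaks, each of the four colors is used for the same number of values.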

Color points by density with ggplot2

A code snippet that shows how to color points by density in R with ggplot2.

Build bioinformatics pipelines with Snakemake

Snakemake is a Pythonic variant of GNU Make. Recently, I learned how to use it to build and launch bioinformatics pipelines on an LSF cluster. However, I had trouble understanding the Snakemake documentation. I like to learn by trying simple examples, so this post walks you through a very simple pipeline step by step. If you already know how to use Snakemake, then you might want to copy my Snakefiles for RNA-seq data analysis here.

Compare distributions with box plots, not bar plots

In many scientific journals, authors use bar plots to compare two or more distributions. Often, the error bar is only present for the upper limit and not for the lower limit. Sometimes, the bar for the control group has no error bars due to data normalization. Here, I simulate a small experiment to illustrate why this normalization is problematic and why box plots are better than bar plots for comparing two distributions.
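
The following Python sketch illustrates one way the missing error bar can arise. It is not the simulation from the post: the group means, the spread, the sample size, and the pairing of each treated replicate with a control replicate are all assumptions made for illustration.

```python
import random
from statistics import quantiles, stdev

random.seed(1)

# Hypothetical experiment: 6 paired replicates per group.
control = [random.gauss(10, 2) for _ in range(6)]
treated = [random.gauss(12, 2) for _ in range(6)]

# Assumed normalization: divide each replicate by its paired control.
# Every control value becomes exactly 1, so its variation disappears.
norm_control = [c / c for c in control]
norm_treated = [t / c for t, c in zip(treated, control)]


def five_number_summary(xs):
    """Return (min, Q1, median, Q3, max), the values a box plot draws."""
    q1, median, q3 = quantiles(xs, n=4, method="inclusive")
    return min(xs), q1, median, q3, max(xs)


print("control SD before normalization:", stdev(control))
print("control SD after normalization: ", stdev(norm_control))
print("box plot summary, control:", five_number_summary(control))
print("box plot summary, treated:", five_number_summary(treated))
```

After normalization the control group has a standard deviation of zero, so a bar plot has no error bar to draw, even though the raw control values vary. The five-number summary of the raw values keeps that variation visible for both groups.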

Determine if a transcription factor is bound with CENTIPEDE

I wrote a practical tutorial on how to use CENTIPEDE to determine whether a transcription factor is bound to a site in the genome. The tutorial explains how to prepare appropriate input data and how to run the analysis. Please get in touch if you have any comments or suggestions. In the future, I would like to incorporate the code from the tutorial into the CENTIPEDE R package.

Print bigWig data for each region in a BED file

I wrote a Bash script to call bigWigToBedGraph for each region in a BED file. You can quickly take a subset of bigWig data for regions of interest. In my particular case, I needed to get phastCons conservation scores for putative transcription factor binding sites.
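
The per-region loop can be sketched in Python as follows. This is not the Bash script from the post. It only builds one `bigWigToBedGraph` command per BED region, using the tool's `-chrom`, `-start`, and `-end` options. The helper names, the file name `scores.bw`, and the choice of `/dev/stdout` as the output path are assumptions for illustration.

```python
def bed_regions(lines):
    """Yield (chrom, start, end) from BED lines, skipping blank lines,
    comments, and track/browser header lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        yield chrom, int(start), int(end)


def bigwig_commands(bed_lines, bigwig_path, out_path="/dev/stdout"):
    """Build one bigWigToBedGraph command (as an argument list) per region."""
    commands = []
    for chrom, start, end in bed_regions(bed_lines):
        commands.append([
            "bigWigToBedGraph",
            bigwig_path,
            out_path,
            f"-chrom={chrom}",
            f"-start={start}",
            f"-end={end}",
        ])
    return commands


# Hypothetical input: two regions and a comment line.
example_bed = ["# putative binding sites", "chr1\t100\t200\tsite1", "chr2\t5000\t5050\tsite2"]
commands = bigwig_commands(example_bed, "scores.bw")
```

Each argument list can be passed to `subprocess.run(cmd, check=True)` on a machine where `bigWigToBedGraph` is installed. Building the commands separately from running them makes it easy to inspect them first.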