Kamil Slowikowski

Run Picard tools and collate multiple metrics files

Picard is a set of Java command-line tools for manipulating high-throughput sequencing (HTS) data files such as BAM and VCF. I needed to check the quality of thousands of BAM files, so I created a Bash script called picardmetrics. It runs 10 of the Picard tools on a BAM file and then collates all of the generated metrics files into a single table. I also include utility scripts for generating the reference files that Picard requires.
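The collating step can be sketched in a few lines of Python. This is not the picardmetrics script itself. It is a minimal sketch that assumes the usual Picard layout: `#` comment lines, then a `## METRICS CLASS` line, then a tab-separated header row and data rows, with the table ending at a blank line. The function names and the added `BAM` column are hypothetical.

```python
def parse_metrics_section(text):
    """Return (header, rows) from the '## METRICS CLASS' table of a Picard metrics file."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("## METRICS CLASS"):
            header = lines[i + 1].split("\t")
            rows = []
            for row in lines[i + 2:]:
                if not row.strip() or row.startswith("#"):
                    break  # a blank line or a new '##' section ends the table
                rows.append(dict(zip(header, row.split("\t"))))
            return header, rows
    raise ValueError("no '## METRICS CLASS' section found")


def collate(metrics_by_bam):
    """Stack one metrics file per BAM into a single table (a list of dicts)."""
    table = []
    for bam, text in metrics_by_bam.items():
        _, rows = parse_metrics_section(text)
        for row in rows:
            # 'BAM' is a hypothetical column recording which file each row came from.
            table.append({"BAM": bam, **row})
    return table
```

Each row keeps its original Picard column names, so the resulting list can be written out as one TSV with a shared header.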

Get transcription factor target genes

I made an R data package of human transcription factor target genes. It is a collection of data from three sources: TRED, ITFP, and ENCODE. I use it to test whether the targets of a transcription factor are differentially expressed in my data. I can also test whether a set of transcription factor target genes is enriched for some gene set of interest.
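A one-sided hypergeometric test is one common way to check this kind of enrichment. The package itself may use a different test. The sketch below asks how likely an overlap at least as large as the observed one is, if the transcription factor's targets were drawn at random from a universe of genes. The function name and arguments are hypothetical.

```python
from math import comb


def enrichment_p_value(universe, targets, gene_set):
    """One-sided hypergeometric P(X >= k) for the overlap of `targets` with `gene_set`."""
    universe = set(universe)
    targets = set(targets) & universe
    gene_set = set(gene_set) & universe
    N = len(universe)             # all genes considered
    K = len(gene_set)             # genes in the set of interest
    n = len(targets)              # transcription factor targets
    k = len(targets & gene_set)   # observed overlap
    # Sum the probabilities of every overlap size from k up to the largest possible.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)
```

Restricting both sets to a shared universe matters. Target genes that were never measured in the experiment should not count toward the overlap.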

Quickly aggregate your data with data.table in R

In genomics data, we often have multiple measurements for each gene. Sometimes we want to aggregate those measurements with the mean, median, or sum.
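The post does this with data.table in R. The same group-then-summarize idea looks like this in plain Python, using only the standard library. This is a sketch of the concept, not data.table's API, and the `aggregate_by_gene` name is hypothetical.

```python
from collections import defaultdict
from statistics import mean, median


def aggregate_by_gene(records, func=mean):
    """Collapse (gene, value) pairs into one value per gene using `func`."""
    groups = defaultdict(list)
    for gene, value in records:
        groups[gene].append(value)
    return {gene: func(values) for gene, values in groups.items()}


records = [("A", 1), ("A", 2), ("A", 3), ("B", 4), ("B", 6)]
by_mean = aggregate_by_gene(records)           # mean per gene
by_median = aggregate_by_gene(records, median)  # median per gene
by_sum = aggregate_by_gene(records, sum)        # sum per gene
```

Passing the summary function as an argument means one helper covers the mean, median, and sum cases.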

featureCounts requires identical mate ids

featureCounts, a read-counting program, requires identical mate ids to identify a pair of read mates as correctly paired. However, FASTQ files generated from an SRA file with fastq-dump have different mate ids for each mate in a pair. The forward and reverse mate ids end with .1 and .2, respectively. I wrote a bash function to fix BAM files with this problem.
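The fix amounts to removing the trailing `.1` or `.2` from each read name. The post uses a bash function. The Python sketch below shows the same idea on SAM text lines and leaves header lines (those starting with `@`) untouched. It assumes the read name is the first tab-separated field and that the suffix is exactly `.1` or `.2`.

```python
import re

# Trailing mate suffix added to read names by fastq-dump: '.1' or '.2'.
MATE_SUFFIX = re.compile(r"\.[12]$")


def strip_mate_suffix(sam_line):
    """Remove a trailing '.1' or '.2' from the read name (QNAME) of one SAM line."""
    if sam_line.startswith("@"):
        return sam_line  # header lines are left unchanged
    qname, sep, rest = sam_line.partition("\t")
    return MATE_SUFFIX.sub("", qname) + sep + rest
```

One way to apply it to a whole BAM file is to stream the records as text with `samtools view -h`, pass each line through this function, and re-encode the output with `samtools view -b`.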

Make ribosomal RNA intervals for Picard CollectRnaSeqMetrics

Before you can use the CollectRnaSeqMetrics Picard tool, you must create a list of genomic intervals with the coordinates of all ribosomal genes in the genome. I wrote a bash script to prepare this ribosomal interval file from GENCODE gene annotations.
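The core of this preparation is a filter over the GTF. The post's script is written in bash. The Python sketch below is a rough equivalent that keeps `gene` records whose `gene_type` is in a chosen set and prints them as interval-list body lines: sequence, start, end, strand, and name. Treating `rRNA` and `Mt_rRNA` as the ribosomal types is an assumption, not necessarily what the script selects. A complete Picard interval list also needs a SAM-style header with `@SQ` lines that match the BAM's reference, and this sketch does not produce that header.

```python
# Assumption: which GENCODE gene types count as ribosomal (hypothetical choice).
RIBOSOMAL_TYPES = {"rRNA", "Mt_rRNA"}


def gtf_attribute(attributes, key):
    """Return the value of `key` from a GTF attribute column, or None if absent."""
    for item in attributes.split(";"):
        item = item.strip()
        if item.startswith(key + " "):
            return item[len(key) + 1:].strip('"')
    return None


def rrna_intervals(gtf_lines):
    """Yield interval-list body lines (chrom, start, end, strand, gene_id) for ribosomal genes."""
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip GTF comment and description lines
        chrom, _, feature, start, end, _, strand, _, attributes = line.rstrip("\n").split("\t")
        if feature != "gene":
            continue
        if gtf_attribute(attributes, "gene_type") in RIBOSOMAL_TYPES:
            # GTF and Picard interval lists are both 1-based with inclusive ends.
            yield "\t".join([chrom, start, end, strand, gtf_attribute(attributes, "gene_id")])
```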

Join multiple PLINK dosage files into one file

If you have multiple PLINK dosage files and would like to merge them into one file, this script might save you some time.
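One simple case is several dosage files that hold the same samples in the same order but different SNPs, for example one file per chromosome. In that case, joining them means keeping a single header line and concatenating the SNP rows. The sketch below handles only this case and may not match what the script does; files that differ in their samples would need a column-wise join instead. It assumes each file is a header line followed by one line per SNP, and the function name is hypothetical.

```python
def join_dosage_files(files):
    """Concatenate dosage files (lists of lines) that share one header, keeping it once."""
    header = None
    merged = []
    for lines in files:
        first, *body = lines
        if header is None:
            header = first
            merged.append(first)
        elif first.split() != header.split():
            # Different headers mean different samples or sample order: refuse to concatenate.
            raise ValueError("dosage files have different headers")
        merged.extend(body)
    return merged
```

Checking the headers before concatenating prevents silently misaligned dosages when two files list their samples in a different order.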