Picard is a set of Java command line tools for manipulating high-throughput sequencing (HTS) data files such as BAM and VCF. I needed to check the quality of thousands of BAM files, so I created a Bash script called picardmetrics. It runs 10 of the Picard tools on a BAM file and easily collates all of the generated metrics files into a single table. I also include utility scripts for generating the reference files required for Picard.
I made a data package with human transcription factor target genes for use in R. It is a collection of data from three sources: TRED, ITFP, and ENCODE. I use them to test if the targets of a transcription factor are differentially expressed in my data. Also, I can test if a set of transcription factor target genes is enriched for some gene set of interest.
In genomics data, we often have multiple measurements for each gene. Sometimes we want to aggregate those measurements with the mean, median, or sum.
featureCounts, a read-counting program, requires identical mate ids to
identify a pair of read mates as correctly paired. However, FASTQ files
generated from an SRA file with fastq-dump have different mate ids for each
mate in a pair. The forward and reverse mate ids end with
respectively. I wrote a bash function to fix BAM files with this problem.
Before you can use the CollectRnaSeqMetrics Picard tool, you must create a table of genomic intervals with the coordinates of all ribosomal genes in the genome. I wrote a bash script to prepare this ribosomal interval file from Gencode gene annotations.