Print bigWig data for each region in a BED file

2015-03-30

I wrote a Bash script, bigWigRegions, that calls bigWigToBedGraph for each region in a BED file, so you can quickly extract the subset of bigWig data that covers your regions of interest. In my case, I needed phastCons conservation scores for putative transcription factor binding sites.

Suppose you’d like to determine the evolutionary conservation of putative transcription factor binding sites, to improve discrimination of true and false positive sites. It is possible to use conservation information with CENTIPEDE, for example.

Let’s start by retrieving precomputed phastCons scores for conservation across 100 vertebrates from UCSC (see http://hgdownload.cse.ucsc.edu/goldenPath/hg19/phastCons100way/):

mkdir phastCons100way
cd phastCons100way
URL=rsync://hgdownload.cse.ucsc.edu/goldenPath/hg19/phastCons100way
rsync -avz --progress ${URL}/hg19.100way.phastCons.bw .

We also need the bigWigToBedGraph utility. Download it from: http://hgdownload.cse.ucsc.edu/admin/exe/
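
As a rough sketch of that download step: the prebuilt binaries sit in per-platform subdirectories of the page above, such as linux.x86_64 or macOSX.x86_64. The helper names ucsc_tool_url and fetch_ucsc_tool are hypothetical, and the sketch assumes curl is installed; check the directory listing for the folder that matches your machine.

```bash
# Base URL from the post; platform subdirectory names are an assumption
# (linux.x86_64 is used as the default here).
UCSC_EXE=http://hgdownload.cse.ucsc.edu/admin/exe

# Hypothetical helper: print the download URL for a tool on a platform.
ucsc_tool_url() {
    local tool="$1" platform="${2:-linux.x86_64}"
    printf '%s/%s/%s\n' "$UCSC_EXE" "$platform" "$tool"
}

# Hypothetical helper: download a tool into the current directory
# and mark it executable.
fetch_ucsc_tool() {
    local tool="$1" platform="${2:-linux.x86_64}"
    curl -fsSL -o "$tool" "$(ucsc_tool_url "$tool" "$platform")" &&
        chmod +x "$tool"
}

# Example (not run here):
#   fetch_ucsc_tool bigWigToBedGraph linux.x86_64
```

After downloading, put the binary somewhere on your PATH so that later commands can find it.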

If we have our binding sites in a BED file called sites.bed, we can use the bigWigRegions script to get the conservation scores for those sites as follows:

bigWigRegions hg19.100way.phastCons.bw sites.bed > sites.phastCons.bedGraph
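
For reference, a BED file needs at least three tab-separated columns: chromosome, 0-based start, and exclusive end. The sketch below writes a small file with made-up coordinates (purely illustrative, not real binding sites) and uses a hypothetical helper, check_bed, to catch malformed lines before running the command above.

```bash
# Made-up coordinates for illustration only; the file name is hypothetical.
printf 'chr1\t1000\t1020\tsiteA\nchr2\t5000\t5015\tsiteB\n' > example_sites.bed

# Hypothetical helper: succeed only if every line has at least 3 columns,
# non-negative integer start and end, and start < end.
check_bed() {
    awk -F'\t' '
        NF < 3 || $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ || $2 + 0 >= $3 + 0 {
            printf "bad BED line %d: %s\n", NR, $0 > "/dev/stderr"
            bad = 1
        }
        END { exit bad }
    ' "$1"
}

check_bed example_sites.bed && echo "example_sites.bed looks OK"
```

Note that this simple check also flags track or browser header lines, so strip those first if your file has them.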

Source code
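
The script itself is not reproduced here, so what follows is only a minimal sketch of how such a wrapper could look, not the original bigWigRegions. It assumes bigWigToBedGraph is on your PATH, relies on its -chrom, -start, and -end options to restrict output to one region, and writes to /dev/stdout so that results can be redirected. The function name bigwig_regions is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch of a bigWigRegions-style wrapper (an assumption, not the original).
# Usage: bigwig_regions in.bw regions.bed > out.bedGraph

bigwig_regions() {
    local bw="$1" bed="$2"
    local chrom start end rest
    # Read the first three BED columns; extra columns land in "rest".
    # The "|| [ -n ... ]" part handles a final line without a newline.
    while read -r chrom start end rest || [ -n "$chrom" ]; do
        # Skip blank lines, comments, and track/browser header lines.
        case "$chrom" in
            ''|'#'*|track*|browser*) continue ;;
        esac
        # Print the bedGraph records that fall inside this region.
        bigWigToBedGraph -chrom="$chrom" -start="$start" -end="$end" \
            "$bw" /dev/stdout
    done < "$bed"
}
```

To use it like the command shown earlier, you could save this in an executable file named bigWigRegions and add a final line that calls bigwig_regions "$@". Running one bigWigToBedGraph process per region is simple but can be slow for very large BED files.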

© 2024 Kamil Slowikowski