Kamil Slowikowski

Print bigWig data for each region in a BED file

I wrote a Bash script that calls bigWigToBedGraph for each region in a BED file, which makes it quick to extract the subset of bigWig data covering your regions of interest. In my case, I needed phastCons conservation scores for putative transcription factor binding sites.

Suppose you’d like to determine the evolutionary conservation of putative transcription factor binding sites in order to better discriminate true binding sites from false positives. CENTIPEDE, for example, can make use of conservation information.

Let’s start by retrieving the precomputed phastCons conservation scores across 100 vertebrates for the hg19 human genome assembly from UCSC (see http://hgdownload.cse.ucsc.edu/goldenPath/hg19/phastCons100way/):

mkdir phastCons100way
cd phastCons100way
URL=rsync://hgdownload.cse.ucsc.edu/goldenPath/hg19/phastCons100way
rsync -avz --progress ${URL}/hg19.100way.phastCons.bw .

We also need the bigWigToBedGraph utility. See: http://hgdownload.cse.ucsc.edu/admin/exe/
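
One way to fetch a precompiled binary is sketched below. It assumes the per-platform subdirectories that UCSC publishes under the URL above (linux.x86_64 is used as the example) and that curl is installed. The helper names ucsc_tool_url and fetch_ucsc_tool are hypothetical, introduced only for this illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the download URL for a UCSC command-line tool.
# Assumes binaries are laid out as <base>/<platform>/<tool>.
ucsc_tool_url() {
  local platform=$1 tool=$2
  echo "http://hgdownload.cse.ucsc.edu/admin/exe/${platform}/${tool}"
}

# Hypothetical helper: download a tool into the current directory
# and mark it executable.
fetch_ucsc_tool() {
  local platform=$1 tool=$2
  curl -fSL -o "${tool}" "$(ucsc_tool_url "${platform}" "${tool}")" \
    && chmod +x "${tool}"
}

# Example call (not run automatically):
#   fetch_ucsc_tool linux.x86_64 bigWigToBedGraph
```

On a Mac, the platform argument would likely be macOSX.x86_64 instead. Remember to put the directory holding the binary on your PATH.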

If our binding sites are in a BED file called sites.bed, we can get the conservation scores for those sites with my script, bigWigRegions, as follows:

bigWigRegions hg19.100way.phastCons.bw sites.bed > sites.phastCons.bedGraph
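
To make the idea concrete, here is a minimal sketch of the kind of loop such a script can use. It is not necessarily the script itself: the function name bigwig_regions and the temporary-file approach are assumptions made for this illustration. It relies only on the -chrom, -start, and -end options of bigWigToBedGraph, and it stops at the first region for which bigWigToBedGraph fails.

```shell
#!/usr/bin/env bash
# Sketch of a bigWigRegions-like loop (hypothetical function name).
# Usage: bigwig_regions in.bw regions.bed > out.bedGraph
bigwig_regions() {
  local bw=$1 bed=$2 tmp chrom start end rest
  tmp=$(mktemp) || return 1
  # BED columns 1-3 are chrom, start, end; any extra columns are ignored.
  # The "|| [ -n ... ]" part handles a final line with no trailing newline.
  while read -r chrom start end rest || [ -n "$chrom" ]; do
    # Skip blank lines, comments, and track/browser header lines.
    case "$chrom" in ''|'#'*|track|browser) continue ;; esac
    # Write the bedGraph rows for this one region to the temporary file.
    bigWigToBedGraph -chrom="$chrom" -start="$start" -end="$end" \
      "$bw" "$tmp" < /dev/null || { rm -f "$tmp"; return 1; }
    cat "$tmp"
  done < "$bed"
  rm -f "$tmp"
}
```

With the function sourced into your shell, `bigwig_regions hg19.100way.phastCons.bw sites.bed > sites.phastCons.bedGraph` would mirror the command above.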

Source code