# 0-based and 1-based genomic intervals, overlap, and distance

2013-08-07 Here, I describe two kinds of genomic intervals and include source code for testing overlap and calculating distance between intervals.

You will find files specifying genomic coordinates in two formats:

``````0-based : 0 1 2 3 4   (UCSC, BED, bedGraph, narrowPeak)
1-based :  1 2 3 4    (NCBI, Ensembl, GFF, GTF, VCF, SAM, BAM, wiggle)
sequence:  A T G C
``````

0-based starts with 0 and numbers the spaces in between nucleotides.

1-based starts with 1 and numbers the nucleotides.

The subsequence `TG` of the full string `ATGC` is:

``````0-based : [1, 3)
1-based : [2, 3]
``````

The 0-based style does not include the last position: `)`

The 1-based style includes the last position: `]`

This results in different length calculations for subsequence `TG`:

``````0-based : 3 - 1     = 2
1-based : 3 - 2 + 1 = 2
``````

# Example#

``````>>> a, b = (1, 3), (3, 7)

>>> print_intervals0(a, b)
01234567890
==
====

>>> print_intervals1(a, b)
1234567890
===
=====

>>> overlap0(a, b)
False

>>> overlap1(a, b)
True

>>> distance0(a, b)
0

>>> distance1(a, b)
-1
``````
``````# 0-based intervals

def overlap0(a, b):
"""Check if two 0-based intervals overlap."""
# a.start < b.end and a.end > b.start
return a < b and a > b

def distance0(a, b):
"""Get the number of bases between two 1-based intervals, 0 if the
intervals are book-ended against each other, or, if negative, the number
of bases in the overlap.
"""
return max(a - b, b - a)

def print_intervals0(*intervals):
start  = min([i for i in intervals])
stop   = max([i for i in intervals])
length = stop - start
print '0' + '1234567890' * ((length + 10) / 10)
for i in intervals:
spaces = ' ' * i
marks  = '=' * (i - i)
print spaces + marks

# 1-based intervals

def overlap1(a, b):
"""Check if two 1-based intervals overlap."""
# a.start <= b.end and a.end >= b.start
return a <= b and a >= b

def distance1(a, b):
"""Get the number of bases between two 1-based intervals, 0 if the
intervals are book-ended against each other, or, if negative, the number
of bases in the overlap.
"""
return max(a - b, b - a) - 1

def print_intervals1(*intervals):
start  = min([i for i in intervals])
stop   = max([i for i in intervals])
length = stop - start + 1
print '1234567890' * ((length + 10) / 10)
for i in intervals:
spaces = ' ' * (i - 1)
marks  = '=' * (i - i + 1)
print spaces + marks
``````