[Biopython-dev] Where should feature intersection code go?
Michael Sandford
sandford at ufl.edu
Mon Feb 8 21:49:20 UTC 2010
I'm working on a project that looks for alternative splicing using
Solexa data instead of microarray data. Basically we've got a GFF file
containing all the genes, introns and exons, and 35M reads that have each
been aligned to one of the chromosomes by the excellent bowtie
application out of Maryland.
Bowtie output is documented here:
http://bowtie-bio.sourceforge.net/manual.shtml#default-bowtie-output
In summary it's roughly a cross between FASTQ and GFF. Each line has the
read name, strand, name of the reference sequence the read aligned to,
position, read sequence, quality, and a few other fields. It seems like
it could rather easily be coerced into a SeqRecord
(http://biopython.org/DIST/docs/api/Bio.SeqRecord.SeqRecord-class.html).
Not every SeqRecord field would get filled in, but it'd be better than
handling things in a one-off way.
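
As a rough sketch of the parsing step, assuming the eight tab-separated
columns of bowtie's default output (read name, strand, reference name,
0-based offset, read sequence, qualities, count of other alignments,
mismatch descriptors): the BowtieHit container and parse_bowtie_map
function are hypothetical names, not existing Biopython API, and the
mapping onto SeqRecord is deliberately left out so the example needs
only the standard library.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class BowtieHit:
    """One alignment line from a bowtie .map file (hypothetical container)."""
    read_name: str
    strand: str       # "+" or "-"
    reference: str    # name of the reference sequence, e.g. a chromosome
    offset: int       # 0-based leftmost position on the forward strand
    sequence: str
    quality: str
    other_hits: int   # count reported in the seventh column
    mismatches: str   # raw comma-separated descriptors, may be empty


def parse_bowtie_map(lines: Iterable[str]) -> Iterator[BowtieHit]:
    """Yield one BowtieHit per non-empty line of default bowtie output."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        fields = line.split("\t")
        # The mismatch column can be empty or missing; pad to eight fields.
        fields += [""] * (8 - len(fields))
        name, strand, ref, offset, seq, qual, other, mism = fields[:8]
        yield BowtieHit(name, strand, ref, int(offset), seq, qual,
                        int(other or 0), mism)
```

Each hit carries the pieces a SeqRecord would want: read_name as the id,
sequence as the seq, and quality as a per-letter annotation, with the
reference name, strand and offset describing where the read landed.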
The FeatureLocation class provides for approximate and exact locations
(both start and stop positions). It seems like the correct place to
put code that determines whether two FeatureLocations overlap, or
whether one contains, or is contained by, another.
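
The comparisons themselves are small. Here is a minimal sketch, assuming
exact positions only and treating each location as a (start, end) pair
of 0-based, half-open coordinates. The function names overlaps and
contains are hypothetical, and fuzzy positions and strand are ignored.

```python
def overlaps(a, b):
    """Return True if half-open intervals a and b share at least one base."""
    a_start, a_end = a
    b_start, b_end = b
    return a_start < b_end and b_start < a_end


def contains(outer, inner):
    """Return True if inner lies entirely within outer (ends may coincide)."""
    outer_start, outer_end = outer
    inner_start, inner_end = inner
    return outer_start <= inner_start and inner_end <= outer_end


exon = (100, 200)
read = (190, 226)            # a 36 bp read starting at offset 190
print(overlaps(exon, read))  # the read straddles the exon's 3' end
print(contains(exon, read))  # but it is not wholly inside the exon
```

"Is contained by" is just contains with the arguments swapped. With
approximate positions the answer can be "maybe" rather than yes or no,
which is the part that would need a design decision.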
Overall I'm talking about writing a bowtie .map parser and the
comparison code for FeatureLocation. Would these be welcome features?
Thanks,
Mike