[Biopython] New to BP. Looking for closely spaced genes

Peter Cock p.j.a.cock at googlemail.com
Tue Apr 2 16:33:53 UTC 2013


On Mon, Apr 1, 2013 at 7:41 PM, Mark Budde <markbudde at gmail.com> wrote:
> Hi,
> Before I dive too far into BioPython, I'd like to get some input on whether
> BioPython is an appropriate tool for my task.
>
> I would like to look at the human genome ORF structure and identify regions
> where ORFs are closely spaced but differentially regulated, and also
> identify whether the ORFs are facing the same direction or opposing
> directions. To do this, I assume I would first download the annotated
> genome and write a script in BioPython annotating how far each ORF is from
> its neighbors, what the orientation is, and store the result in a
> dictionary. Then I would download some expression data sets and add that
> data to the dictionary. Then I would write some algorithm comparing
> gene distance, orientation and expression correlation to generate a list of
> candidate ORF pairs which fit my criteria.
>
> My question is, is BioPython a reasonable tool to accomplish this, or is it
> going to be way too slow, in which case is some alternative package better
> suited for my task?
> Thanks,
> Mark Budde

Hi Mark,

That sounds very doable with Biopython parsing GenBank format
chromosomes downloaded from the NCBI/EMBL/DDBJ. I did
something similar to look at overlaps and gaps between genes of
bacteria some years back - also using the Biopython GenBank
parser, e.g. http://mbe.oxfordjournals.org/cgi/content/abstract/msp302

In your case with humans there'll be lots of intron/exon structure
(join locations in GenBank) so I recommend trying the current
code from git (which will become Biopython 1.62) where this has
been re-factored to hopefully make joins much easier than before.
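
As a rough illustration of the approach (a sketch only, not the code used
for the bacterial analysis above), something like the following could list
neighbouring genes with their gap and relative orientation. The names Gene,
genes_in_record, neighbor_pairs, report and the max_gap threshold are all
made up for this example, and it assumes a Biopython version (1.62 or
later) where feature.location exposes start, end and strand for both
simple and join locations.

```python
from collections import namedtuple

# Hypothetical minimal gene record: 0-based start, exclusive end,
# strand is +1, -1 or None (e.g. mixed-strand joins).
Gene = namedtuple("Gene", ["name", "start", "end", "strand"])


def genes_in_record(record):
    """Collect 'gene' features from a Biopython SeqRecord as Gene tuples."""
    genes = []
    for feature in record.features:
        if feature.type != "gene":
            continue
        quals = feature.qualifiers
        name = quals.get("gene", quals.get("locus_tag", ["unknown"]))[0]
        loc = feature.location  # for join(...) this spans all the parts
        genes.append(Gene(name, int(loc.start), int(loc.end), loc.strand))
    return genes


def neighbor_pairs(genes, max_gap=1000):
    """Return (left, right, gap, orientation) for adjacent genes within max_gap."""
    ordered = sorted(genes, key=lambda g: (g.start, g.end))
    pairs = []
    for left, right in zip(ordered, ordered[1:]):
        gap = right.start - left.end  # negative means the two genes overlap
        if gap > max_gap:
            continue
        if left.strand is None or right.strand is None:
            orientation = "unknown"
        elif left.strand == right.strand:
            orientation = "same"
        elif left.strand == 1:
            orientation = "convergent"  # left gene forward, right gene reverse
        else:
            orientation = "divergent"  # left gene reverse, right gene forward
        pairs.append((left.name, right.name, gap, orientation))
    return pairs


def report(path, max_gap=1000):
    """Print close neighbours for every record in a GenBank file."""
    from Bio import SeqIO  # imported here so the helpers above work without it

    for record in SeqIO.parse(path, "genbank"):
        found = neighbor_pairs(genes_in_record(record), max_gap)
        for left, right, gap, orientation in found:
            print(record.id, left, right, gap, orientation, sep="\t")
```

Calling report() on a downloaded chromosome file would print one
tab-separated line per close pair. Only genes adjacent in start order are
compared, so nested genes would need extra handling, and the expression
data would be joined on the gene names afterwards.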

Regards,

Peter


