[Biopython] Parsing problem

Peter biopython at maubp.freeserve.co.uk
Tue Dec 8 22:43:29 UTC 2009


On Tue, Dec 8, 2009 at 6:52 PM, Iwan Grin <iwan.grin at googlemail.com> wrote:
> Hi all,
>
> I am having a little problem while trying to parse a GenBank (or rather
> GenPept) file using Biopython. I am trying to extract the position on the
> genome from the "coded_by" qualifier of the CDS feature of a protein.
>
> The "coded_by" string in this specific case looks like this:
>
> 'complement(NC_012967.1:3622110..3624728)'

Oh, one of those tricky cross references to another file :(

> Now, when I run
>
> Bio.GFF.easy.LocationFromString('complement(NC_012967.1:3622110..3624728)')
>

This is interesting timing: Bio.GFF.easy has a lot of
code which duplicates the EMBL/GenBank parsing,
and I'm actually suggesting we deprecate it in the
next release (!). What made you use Bio.GFF in the
first place? It has never been documented. That said,
it does look like you have found a bug in Bio.GFF.easy ...

In the long term, I think Bio.GenBank would be a
better place to put this functionality (and reworking
the location parsing is on the todo list, partly as it
is currently a speed bottleneck).
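
In the meantime, here is a rough sketch of how a "coded_by" string like
the one above could be pulled apart with plain Python, without going
through Bio.GFF.easy at all. The helper name parse_coded_by and the
regular expression are assumptions made for illustration, not Biopython
API, and the sketch only handles a single simple span (no join(...),
no fuzzy positions):

```python
import re

# Hypothetical helper, not part of Biopython. It only understands
# "ACCESSION:start..end", optionally wrapped in complement(...).
_CODED_BY = re.compile(
    r"^(?P<comp>complement\()?"            # optional complement( prefix
    r"(?P<acc>[A-Za-z_0-9]+(?:\.\d+)?):"   # accession with optional version
    r"(?P<start>\d+)\.\.(?P<end>\d+)"      # start..end
    r"(?(comp)\))$"                        # closing ) only if complement( was seen
)


def parse_coded_by(text):
    """Return (accession, start, end, strand); coordinates are kept as written."""
    # Qualifier values can be wrapped over several lines, so drop all whitespace.
    cleaned = "".join(text.split())
    match = _CODED_BY.match(cleaned)
    if match is None:
        raise ValueError("Unsupported coded_by location: %r" % text)
    strand = -1 if match.group("comp") else 1
    return (match.group("acc"), int(match.group("start")),
            int(match.group("end")), strand)


print(parse_coded_by("complement(NC_012967.1:3622110..3624728)"))
# prints ('NC_012967.1', 3622110, 3624728, -1)
```

The coordinates come back exactly as they appear in the file (GenBank
style, one-based and inclusive), so they would still need adjusting
before being used to slice a Python sequence.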

Peter


