[Biopython] Parsing problem

Wed Dec 9 09:51:20 EST 2009

2009/12/9 Brad Chapman <chapmanb at 50mail.com>

> Iwan and Peter;
>
> > > I am new to Biopython and stumbled upon GFF.easy while searching
> > > through the API docs. Actually, what I wanted was a way to parse
> > > that location string into a SeqFeature-like thing from which I
> > > could get start, end and strand. Unfortunately I could not find the
> > > correct parser in Bio.GenBank - any suggestions are welcome.
> >
> > Right now Bio.GenBank doesn't really expose the location parsing in an
> > easy to use way like Bio.GFF.easy does.
>
> If you don't like ugly code, please avert your eyes now. This will
> work with the standard GenBank parsing and is definitely not future
> proof since it involves using private members. However, it'll work
> for something quick n' dirty:
>
> from Bio.GenBank import _FeatureConsumer
> from Bio.SeqFeature import SeqFeature
>
> def gb_string_to_feature(content, use_fuzziness=True):
>    """Convert a GenBank location string into a SeqFeature.
>    """
>    consumer = _FeatureConsumer(use_fuzziness)
>    consumer._cur_feature = SeqFeature()
>    consumer.location(content)
>    return consumer._cur_feature
>
> print(gb_string_to_feature('complement(NC_012967.1:3622110..3624728)'))
>
> Hope this helps,
> Brad
>

Brad, Thank you very much!

As much as this is a hack, it works for what I need. To make this
future-proof, I guess either the parsers from Bio.GenBank should be
exposed, or the coded_by qualifier should be parsed as a location by
default, although I am not sure how well the latter idea fits into the
present data structure.
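
For simple locations like the one above, a regex-only sketch that
avoids the private members could look like the following. This is not
Biopython's parser: the names parse_simple_location and SimpleLocation
are hypothetical, and it only assumes a plain start..end range with an
optional accession prefix and an optional complement(...) wrapper. It
does not handle join(...), fuzzy positions such as <1..200, or single
bases, and it keeps GenBank's 1-based inclusive coordinates as written.

```python
import re
from collections import namedtuple

# Hypothetical container, not a Biopython class.
SimpleLocation = namedtuple("SimpleLocation", ["ref", "start", "end", "strand"])

# Matches e.g. "complement(NC_012967.1:3622110..3624728)" or "100..200".
# The trailing conditional requires ")" only if "complement(" was matched.
_SIMPLE_LOCATION = re.compile(
    r"^(?P<comp>complement\()?"
    r"(?:(?P<ref>[A-Za-z0-9_.]+):)?"
    r"(?P<start>\d+)\.\.(?P<end>\d+)"
    r"(?(comp)\))$"
)


def parse_simple_location(text):
    """Parse a simple GenBank location string into ref, start, end, strand.

    Coordinates are returned unchanged (1-based, inclusive). Strand is -1
    for complement(...) and +1 otherwise. Anything more complex raises
    ValueError instead of being guessed at.
    """
    match = _SIMPLE_LOCATION.match(text.strip())
    if match is None:
        raise ValueError("unsupported location string: %r" % text)
    strand = -1 if match.group("comp") else 1
    return SimpleLocation(
        ref=match.group("ref"),  # None when no accession prefix is given
        start=int(match.group("start")),
        end=int(match.group("end")),
        strand=strand,
    )
```

With this, parse_simple_location('complement(NC_012967.1:3622110..3624728)')
gives ref 'NC_012967.1', start 3622110, end 3624728 and strand -1.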

Iwan