[Biopython] "raw" genbank locations?

Brad Chapman chapmanb at 50mail.com
Thu Mar 10 02:05:45 UTC 2011


Cedar;
Glad to hear Biopython has been helping out with your work.

> I'm trying to get the "raw" genbank locations for a downstream
> application after parsing a genbank file. Is there any way to get at
> this (or reproduce it)? As it is, the SeqRecord feature has start and
> stop information for the whole feature, and a list of sub-features
> each with it's own start and stops. I'm looking for one concise text
> string the describes the entire feature location, much like the
> original raw genbank locations do.

You can do this with the GenBank RecordParser, which doesn't parse
the location strings:

>>> from Bio.GenBank import RecordParser
>>> parser = RecordParser()
>>> handle = open("NT_019265.gb")
>>> rec = parser.parse(handle)
>>> for f in rec.features:
...     print f.location
... 
1..1250660
1..3290
215902..365470
217508
join(342430..342515,363171..363300,365741..365814,376398..376499,390169..390297,391257..391379,392606..392679,398230..398419,399082..399167,399534..399650,405844..405913,406704..406761,406868..407010,407962..408091,408508..409092)

If you have SeqRecord objects from SeqIO you can do this in a ugly
way by reaching into the internals of the GenBank writer:

>>> from Bio import SeqIO
>>> from Bio.SeqIO import InsdcIO
>>> handle = open("NT_019265.gb")
>>> for rec in SeqIO.parse(handle, "genbank"):
...     for f in rec.features:
...         print InsdcIO._insdc_feature_location_string(f, len(rec.seq))
... 
1..1250660
1..3290
215902..365470
217508
join(342430..342515,363171..363300,365741..365814,376398..376499,390169..390297,391257..391379,392606..392679,398230..398419,399082..399167,399534..399650,405844..405913,406704..406761,406868..407010,407962..408091,408508..409092)

That might work for a quick hack but is not necessarily future proof
is the internal change. Peter, do you think this would be useful to
expose as a function of a SeqFeature directly, so you could do
feature.insdc_string() or something similar?

Brad



More information about the Biopython mailing list