[Biopython] "raw" genbank locations?
Brad Chapman
chapmanb at 50mail.com
Wed Mar 9 21:05:45 EST 2011
Cedar;
Glad to hear Biopython has been helping out with your work.
> I'm trying to get the "raw" genbank locations for a downstream
> application after parsing a genbank file. Is there any way to get at
> this (or reproduce it)? As it is, the SeqRecord feature has start and
> stop information for the whole feature, and a list of sub-features
> each with it's own start and stops. I'm looking for one concise text
> string the describes the entire feature location, much like the
> original raw genbank locations do.
You can do this with the GenBank RecordParser, which doesn't parse
the location strings:
>>> from Bio.GenBank import RecordParser
>>> parser = RecordParser()
>>> handle = open("NT_019265.gb")
>>> rec = parser.parse(handle)
>>> for f in rec.features:
... print f.location
...
1..1250660
1..3290
215902..365470
217508
join(342430..342515,363171..363300,365741..365814,376398..376499,390169..390297,391257..391379,392606..392679,398230..398419,399082..399167,399534..399650,405844..405913,406704..406761,406868..407010,407962..408091,408508..409092)
If you have SeqRecord objects from SeqIO you can do this in a ugly
way by reaching into the internals of the GenBank writer:
>>> from Bio import SeqIO
>>> from Bio.SeqIO import InsdcIO
>>> handle = open("NT_019265.gb")
>>> for rec in SeqIO.parse(handle, "genbank"):
... for f in rec.features:
... print InsdcIO._insdc_feature_location_string(f, len(rec.seq))
...
1..1250660
1..3290
215902..365470
217508
join(342430..342515,363171..363300,365741..365814,376398..376499,390169..390297,391257..391379,392606..392679,398230..398419,399082..399167,399534..399650,405844..405913,406704..406761,406868..407010,407962..408091,408508..409092)
That might work for a quick hack but is not necessarily future proof
is the internal change. Peter, do you think this would be useful to
expose as a function of a SeqFeature directly, so you could do
feature.insdc_string() or something similar?
Brad
More information about the Biopython
mailing list