[Biopython-dev] Error in SeqFeature.CompoundLocation parsing NCBI efetch format

Brynjar Smári Bjarnason binni at binnisb.com
Thu Dec 5 18:08:20 UTC 2013


Thanks, will look at this when I'm at the computer :-)
On 5 Dec 2013 19:06, "Brynjar Smári Bjarnason" <binni at binnisb.com> wrote:

> I'll ask someone who knows, but I think I can do without the bonds. Can you
> suggest how I can ignore them, either in the efetch response or in the parser?
>
> Thanks a lot for looking at this!
> On 5 Dec 2013 18:12, "Peter Cock" <p.j.a.cock at googlemail.com> wrote:
>
>> On Thu, Dec 5, 2013 at 4:46 PM, Peter Cock <p.j.a.cock at googlemail.com>
>> wrote:
>> >
>> > Not to worry - the site did respond when I retried a bit later, and
>> > I can reproduce the parser error:
>> >
>> > >>> from Bio import SeqIO
>> > >>> r = SeqIO.read("1MRR_A.gp", "genbank")
>> > BiopythonParserWarning: Couldn't parse feature location:
>> > 'join(bond(84),bond(115),bond(118),bond(238))'
>> > BiopythonParserWarning: Couldn't parse feature location:
>> > 'join(bond(115),bond(204),bond(238),bond(241))'
>> > BiopythonParserWarning: Couldn't parse feature location:
>> > 'join(bond(194),bond(272))'
>> > ...
>> > ValueError: CompoundLocation should have at least 2 parts
>>
>> The problem is the bond locations. In particular, while the
>> parser gave up on the join(...) ones with just a warning, it
>> fell over on the single bond entry, bond(196).
>>
>> This is partly due to a change in the use of the bond term,
>> which used to be a compound entry like bond(194,272).
>> Also, the GenBank parser was and is primarily used on
>> nucleotide sequences rather than GenPept files, which are
>> occasionally weirder (like here!).
>>
>> A short-term hack would be to strip out the bond term
>> (with a warning) and parse the remainder as a simple
>> join or single residue accordingly.
>>
>> Would that work for you - do you need the bond bit?
>>
>> Peter
>>
>
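
The short-term hack Peter describes (strip the bond term with a warning, then
treat what is left as a simple join or a single residue) might look roughly
like the sketch below. It works on the location string alone. The helper name
strip_bond_terms is hypothetical and is not part of Biopython, and where such
a step would hook into the GenBank parser is left open here.

```python
import re
import warnings

# Matches one bond(...) wrapper with no nested parentheses inside it.
_BOND = re.compile(r"bond\(([^()]*)\)")


def strip_bond_terms(location):
    """Return the location string with every bond(...) wrapper removed.

    Hypothetical helper, not a Biopython function.
    """
    if "bond(" not in location:
        return location
    warnings.warn("Dropping bond() from feature location: %r" % location)
    stripped = _BOND.sub(r"\1", location)
    # Old-style bond(194,272) leaves a bare comma-separated pair,
    # so wrap it in join(...) to keep it a valid compound location.
    if "," in stripped and not stripped.startswith(("join(", "order(")):
        stripped = "join(%s)" % stripped
    return stripped


print(strip_bond_terms("join(bond(84),bond(115),bond(118),bond(238))"))
print(strip_bond_terms("bond(196)"))
```

With this, the locations from the warnings above reduce to plain joins such as
join(84,115,118,238), and the single entry bond(196) reduces to 196.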



