[Biopython-dev] SeqFeature.FeatureLocation.extract() silently fails when coordinates are outside of the parent_sequence.

Peter Cock p.j.a.cock at googlemail.com
Mon Nov 19 16:32:11 UTC 2012

On Mon, Nov 19, 2012 at 4:25 PM, Kai Blin
<kai.blin at biotech.uni-tuebingen.de> wrote:
> Peter Cock <p.j.a.cock at googlemail.com> wrote:
>>> GenBank parser accepts these features, but later my program crashes.
>>Perhaps we should have a parser error/warning at that point?
>>(as well as any fix to the extract method)
> Probably a bit tricky because the GenBank file might not contain a
> sequence at all, and we can't tell until we either see the sequence or
> an end of record marker.

The first line should tell you the length, and we already have
a warning in place for naughty GenBank files where the actual
sequence has a different length. Those could be a problem for
this new warning, as you'd only know the expected sequence
length from the header while parsing the features.
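
As a rough sketch of what such a parse-time check could look like (the
helper name check_feature_bounds and its arguments are hypothetical, not
existing Biopython API; it assumes the declared length from the first
line is already available as an int, and uses 0-based, end-exclusive
coordinates):

```python
import warnings


def check_feature_bounds(start, end, declared_length):
    """Hypothetical helper: warn if a feature lies outside the declared length.

    Returns True when the feature fits, False (after warning) otherwise.
    """
    if start < 0 or end > declared_length:
        warnings.warn(
            "Feature location %i:%i is outside the declared sequence length %i"
            % (start, end, declared_length)
        )
        return False
    return True
```

Since the header length may itself disagree with the actual sequence,
this would only be a warning and not a hard error.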

>>> I have the feeling that raising an exception would be the best way
>>> of dealing with this, but of course I can also check the result
>>> of extract() to be different from an empty Seq object.
>>> The line I'd like to throw a ValueError on out-of-bounds coordinates
>>> is
>>> https://github.com/biopython/biopython/blob/master/Bio/SeqFeature.py#L811
>>> What are your thoughts on this?
>>Some might find this surprising given the (initially rather odd)
>>Python slicing behaviour with out-of-range coordinates (which
>>indirectly causes the behaviour observed here):
>>>>> "hello"[100:200]
>>''
>>i.e. slicing a string outside its bounds gives an empty string.
> Yes, that is why we end up with an empty Seq object.
>>On balance you're probably right that an error in this situation
>>makes more sense (a discrepancy between feature location
>>and the given parent sequence not being long enough).
> Yes. The way I understand the intention of the parent sequence,
> the whole point is that the feature should be located on it.
> I'll gladly prepare a patch (and some tests).
> Cheers,
>  Kai
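
For illustration, here is a minimal sketch of the behaviour under
discussion, using a plain string as a stand-in for the parent sequence.
The function extract_checked is a hypothetical name, not the actual
extract() implementation; it only shows the kind of bounds check that
would turn the silent empty result into a ValueError:

```python
def extract_checked(parent, start, end):
    """Hypothetical stand-in for a bounds-checked extract.

    Uses 0-based, end-exclusive coordinates like Python slices.
    """
    if not 0 <= start <= end <= len(parent):
        raise ValueError(
            "Location %i:%i is outside the parent sequence of length %i"
            % (start, end, len(parent))
        )
    return parent[start:end]


# Plain slicing silently returns an empty string when out of range.
assert "hello"[100:200] == ""

# The checked version returns the same slice when the location fits...
assert extract_checked("hello", 1, 3) == "el"

# ...and raises instead of returning an empty result when it does not.
try:
    extract_checked("hello", 100, 200)
except ValueError as err:
    print("ValueError:", err)
```

Whether a partially overlapping location (start in range, end beyond the
sequence) should also raise is a design choice; the sketch above treats
it as an error too.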
