[Biopython-dev] SeqFeature.FeatureLocation.extract() silently fails when coordinates are outside of the parent_sequence.

Peter Cock p.j.a.cock at googlemail.com
Mon Nov 19 16:10:15 UTC 2012


On Mon, Nov 19, 2012 at 2:11 PM, Kai Blin
<kai.blin at biotech.uni-tuebingen.de> wrote:
> Hi folks,
>
> I'm currently investigating an error caused by an invalid GenBank file
> input that annotates CDS features with invalid coordinates. The
> GenBank parser accepts these features, but later my program crashes.

Perhaps we should have a parser error/warning at that point?
(as well as any fix to the extract method)
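A parser-side warning could be sketched roughly like this. It is a minimal, hypothetical helper (warn_out_of_bounds is not part of Biopython). It assumes plain (start, end) pairs in Python-style 0-based, half-open coordinates rather than real SeqFeature objects:

```python
import warnings

def warn_out_of_bounds(locations, seq_length):
    """Hypothetical check, not a Biopython API.

    Emits a warning for each (start, end) pair (0-based, half-open)
    that does not fit inside a sequence of length seq_length, and
    returns the offending pairs.
    """
    bad = []
    for start, end in locations:
        if start < 0 or end > seq_length:
            warnings.warn(
                f"Feature location {start}:{end} is outside the sequence "
                f"(length {seq_length})"
            )
            bad.append((start, end))
    return bad
```

For example, warn_out_of_bounds([(0, 10), (90, 120)], 100) would emit one warning and return [(90, 120)].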

> It turns out the crash is because I'm calling the extract() method for
> my seq features, which then returns an empty Seq object when the
> coordinates are out of range for the parent_sequence.
>
> I have the feeling that raising an exception would be the best way of
> dealing with this, but of course I can also check the result of
> extract() to be different from an empty Seq object.
>
> The line where I'd like to raise a ValueError for out-of-bounds coordinates is
> https://github.com/biopython/biopython/blob/master/Bio/SeqFeature.py#L811
>
> What are your thoughts on this?

Some might find this surprising given Python's (initially rather odd)
slicing behaviour with out-of-range coordinates, which indirectly
causes the behaviour observed here:

>>> "hello"[100:200]
''

i.e. slicing a string outside its bounds gives an empty string.
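The same rule also covers partial overlaps: a slice is clipped to the valid range rather than raising, so a location that runs past the end presumably comes back truncated, not empty. Only single-position indexing raises IndexError. A quick illustration in plain Python:

```python
# Python slices never raise IndexError; they are clipped to the valid range.
s = "hello"
print(repr(s[100:200]))  # entirely out of range: ''
print(repr(s[3:10]))     # partially out of range: 'lo' (silently truncated)

# Indexing a single position, by contrast, does raise.
try:
    s[100]
except IndexError as err:
    print("IndexError:", err)
```

This suggests that checking extract() for an empty result alone would not catch every bad location.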

On balance you're probably right that raising an error in this situation
makes more sense, since it indicates a discrepancy between the feature
location and the given parent sequence (which is not long enough).
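A bounds-checked extraction along the proposed lines might look like the following minimal sketch. checked_extract is a hypothetical helper, not the actual Bio/SeqFeature.py code. It assumes a simple 0-based, half-open location and ignores strand and compound locations. It works on anything supporting len() and slicing, such as a str or a Seq:

```python
def checked_extract(parent_sequence, start, end):
    """Hypothetical bounds-checked slice over [start, end), 0-based.

    Raises ValueError instead of returning an empty or truncated
    result when the location does not fit inside parent_sequence.
    """
    if start < 0 or end > len(parent_sequence) or start > end:
        raise ValueError(
            f"Location {start}:{end} is out of bounds for a parent "
            f"sequence of length {len(parent_sequence)}"
        )
    return parent_sequence[start:end]
```

For example, checked_extract("ACGTACGT", 2, 6) returns "GTAC", while checked_extract("ACGT", 2, 10) raises ValueError rather than silently returning "GT".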

Peter


