[Biopython-dev] SeqFeature's FeatureLocation for GenBank
Peter
biopython-dev at maubp.freeserve.co.uk
Thu Nov 3 05:38:38 EST 2005
Marc Colosimo wrote:
> I want to point out the very bizarre behavior of FeatureLocations when
> using GenBank.FeatureParser (well to me anyways).
It's by design...
> When I was testing out some code, I noticed that the start positions
> were 1 less than in the GenBank Record, but the end positions were
> correct. My first thought was that this must be a bug, and so I went
> looking for it. I soon gave up because I just don't have the time to
> understand all the code that is involved (I was going to file a bug
> report). So, I just added 1 to the start positions and went on to get
> the features from the DNA. Suddenly I now understand why the positions
> were like that: slicing!
Exactly, e.g. something like:
seq[feature.location.start.position:feature.location.end.position]
> Unless I missed something, I didn't see anything talking about this
> behavior.
Python (like C) starts counting at zero, and this behaviour is
deliberate: it makes the Biopython sequence objects as easy to handle as
possible. The reason is that the Biopython DNA/RNA/protein sequences are
meant to behave as much like Python strings as possible.
For example, to extract letters 5 to 7 from "abcdefghijk" (using one
based counting, i.e. "efg") in Python you say "abcdefghijk"[4:7], because
a slice counts from zero and stops just before its end index.
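As a quick check of the string example, here is a minimal sketch in plain Python (the variable names are only illustrative, nothing here comes from Biopython):

```python
letters = "abcdefghijk"

# One-based, inclusive positions 5 to 7, i.e. the letters e, f and g
start, end = 5, 7

# Zero-based slice that excludes its end index:
# subtract 1 from the start and leave the end unchanged
piece = letters[start - 1:end]
print(piece)  # efg
```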
Suppose your gene is bases 150..300 (using one based counting as in a
GenBank file).
To extract this from the full DNA sequence, you would use something
like: fullsequence[149:300]
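The same conversion could be wrapped up as below. This is only a sketch: genbank_slice is a hypothetical helper (not part of Biopython), and the sequence is made up so the example runs with plain Python strings.

```python
def genbank_slice(sequence, start, end):
    """Return bases start..end, counted from one and inclusive (GenBank style).

    Hypothetical helper for illustration only; not a Biopython function.
    """
    # Python slices count from zero and exclude the end index,
    # so only the start needs adjusting.
    return sequence[start - 1:end]


fullsequence = "ACGT" * 100  # made-up 400 base sequence

gene = genbank_slice(fullsequence, 150, 300)
print(gene == fullsequence[149:300])  # True
print(len(gene))                      # 151
```

Note that 150..300 inclusive covers 151 bases, which matches the slice length of 300 - 149.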
I suppose the Cookbook may have assumed people were already familiar
with Python strings...
> Is this consistent with other parsers? If so, I would suggest
> that this is included in the Cookbook ...
It should be consistent with other parsers. Would you be able to
suggest some rewording of the CookBook to clarify this?
(I'm sure I have seen a similar question on the mailing list in the
past, so something could be improved)
> ... and that the classes are modified so that when printed (__str__)
> reports 1 instead of 0 (basically +1).
That would be bad for people using the existing behaviour.
You'll get used to it (especially if you have to switch between zero
based and one based languages).
Peter