[Biopython-dev] SeqFeature's FeatureLocation for GenBank
Marc Colosimo
mcolosimo at mitre.org
Fri Nov 4 10:20:15 EST 2005
Michiel,
That would probably help. Along with some additions to the cookbook.
Marc
Michiel Jan Laurens de Hoon wrote:
> I think the confusion is coming from the way FeatureLocation prints
> itself. (0..2701) looks too much like a GenBank-style location. If
> FeatureLocation were to print (0:2701) instead, it's pretty clear that
> this is a Python-style slice. One solution might be to let
> FeatureLocation inherit from list, and override as needed for abstract
> positions.
>
> --Michiel.
>
> Peter wrote:
>
>> Marc Colosimo wrote:
>>
>>> Thank you for the response. However, I know how lists work in
>>> Python (and C, and Java, etc...). That was not question. Here is
>>> some code to show you what I mean about the inconsistent behavior of
>>> Locations.
>>>
>>> from Bio import GenBank
>>> gi_list = GenBank.search_for("AB077698")
>>> ncbi_dict = GenBank.NCBIDictionary( 'nucleotide', 'genbank', parser
>>> = GenBank.FeatureParser() )
>>> seq_rec = ncbi_dict[gi_list[0]]
>>> print len(seq_rec.seq) # returns 2701, which is correct
>>> # now lets look at a feature location
>>> source_feature = seq_rec.features[0] print
>>> source_feature.type # should be 'source'
>>> print source_feature.location # (0..2701), in the gb record it
>>> was (1..2701). The start is correct, the end is NOT
>>
>>
>>
>> The start and end ARE correct in that seq_rec.seq[0:2701] will return
>> all of the sequence.
>>
>> The first nucleotide is seq_rec.seq[0]
>> The last nucleotide is seq_rec.seq[2700]
>> The length is 2701
>>
>> It makes more sense in the case of (sub)features, rather than the
>> source 'feature' which is everything.
>>
>> In the same way, a string of length 5, e.g. "abcde"
>> "abcde"[0] == "a"
>> "abcde"[4] == "e"
>> "abcde"[0:5] == "abcde"
>> "abcde"[5] is out of range.
>>
>> From memory, the location object is actually rather more complicated
>> because it copes with nasty locations like 123..<150 plus joins etc,
>> as well as the simple cases like 123..150
>>
>> It took me a while to get my head round the location object too.
>>
>>> # get a slice
>>> seq_rec.seq[source_feature.location.start.position :
>>> source_feature.location.end.position]
>>> # returns the correct thing
>>
>>
>>
>>> # now lets see what the first nt looks like
>>> seq_rec.seq[source_feature.location.start.position] # works fine
>>
>>
>>
>>> # now lets see what the last nt looks like
>>> seq_rec.seq[source_feature.location.end.position]
>>> IndexError: string index out of range
>>
>>
>>
>> This is correct. See my example with a string "abcde"[5]
>>
>>> # The correct answer for is...
>>> seq_rec.seq[source_feature.location.end.position - 1] # now, this
>>> is different from how start position works!
>>
>>
>>
>> It has to be different for the splicing trick. Again, its the
>> "fault" of trying to be the same as python strings.
>>
>>> # but wait there is more...
>>> # What if I didn't know about the funny end position business and
>>> wrote this,
>>> seq_rec.seq[source_feature.location.start.position:
>>> source_feature.location.end.position + 1]
>>> # This works, but it is not correct because it has added a nt from
>>> the beginning to the end (slices are nice about that)
>>> # If I were to use this on the other internal features I would get
>>> the wrong thing (by one nt)
>>>
>>> So, either location End should be 2700, Start should be 1, or state
>>> 'explicitly' what Locations positions represent. But not 0..2701.
>>
>>
>>
>> I personally am happy with having start 0, end 2701 for a genbank
>> location of 1..2701 and this it is logical.
>>
>> However, the documentation could be improved.
>>
>>> Changing the end position probably would mess up lots of code. So
>>> that leaves documentation. You can add my code above to the cookbook
>>> <http://biopython.org/docs/tutorial/Tutorial004.html#toc16>.
>>
>>
>>
>> Maybe an extra sub section, before the current "3.7.2.2 Locations"
>> for the location simple case? i.e. No joins, no fuzzy locations.
>> Just some very simple examples...
>>
>> Peter
>
More information about the Biopython-dev
mailing list