[Biopython-dev] Bio.GFF and Brad's code

Brad Chapman chapmanb at 50mail.com
Fri Apr 17 20:05:58 UTC 2009


Peter and Michiel;

[start/end attributes on SeqFeatures]
> The SeqFeature already has start and end "attributes", but they are
> done with some magic in __getattr__, I was planning to update this
> to use a modern python property get.  I can't find an enhancement
> bug on this so it may just have been on my mental to do list ;)

These attributes are on the FeatureLocation object. The whole
location hierarchy is a bit complicated to represent all of the
GenBank fuzziness, but it looks like:

SeqFeature -- has_a --> FeatureLocation -- has_two --> Positions (start, end)

So if you want a non-fuzzy start and end, you need to do:

feature.location.nofuzzy_start, feature.location.nofuzzy_end

Your way above would be:

feature.location.start.position

So, I was thinking of hiding this Location/Position stuff from the
end user and just adding a start and end attribute directly on the
feature. For everyone who never touches fuzziness, this would make
more sense; it is also in line with making SeqFeature like Michiel's
proposed GFFRecord object.
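
As a rough sketch of that idea, the snippet below uses simplified stand-in
classes (these are NOT the real Bio.SeqFeature classes, just hypothetical
toys with the same names) to show how start and end properties on the
feature could delegate down through the location to the positions:

```python
class Position:
    """Stand-in for a position; only holds an exact integer coordinate."""

    def __init__(self, position):
        self.position = position


class FeatureLocation:
    """Stand-in for a location made of two positions (start, end)."""

    def __init__(self, start, end):
        self.start = Position(start)
        self.end = Position(end)

    @property
    def nofuzzy_start(self):
        # With no fuzziness modelled here, this is just the start coordinate.
        return self.start.position

    @property
    def nofuzzy_end(self):
        return self.end.position


class SeqFeature:
    """Stand-in feature exposing start/end directly (the proposal above)."""

    def __init__(self, location, type="", strand=None):
        self.location = location
        self.type = type
        self.strand = strand

    @property
    def start(self):
        # Hide the Location/Position layers from the end user.
        return self.location.nofuzzy_start

    @property
    def end(self):
        return self.location.nofuzzy_end


feature = SeqFeature(FeatureLocation(12759746, 12764936),
                     type="PCR_product", strand=-1)
print(feature.start, feature.end)
```

Users who do care about fuzziness could still reach feature.location
directly; the properties are only a shortcut for the common case.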

[GFF to SeqFeature example]
> > I	Orfeome	PCR_product	12759747	12764936	.	-	.	PCR_product "mv_B0019.1" ; Amplified 1 ; Amplified 1
> > 
> > type: PCR_product
> > location: [12759746:12764936]
> > strand: -1
> > qualifiers: 
> >         Key: amplified, Value: ['1']
> >         Key: pcr_product, Value: ['mv_B0019.1']
> >         Key: source, Value: ['Orfeome']
> > 
> 
> Just to make sure I understand how this works, looking at your previous code example:
> 
> >>> from BCBio.GFF.GFFParser import GFFAddingIterator
> >>> gff_iterator = GFFAddingIterator()
> >>> rec_dict = gff_iterator.get_all_features(gff_file)
> 
> > The returned dictionary is like a dictionary from SeqIO.to_dict;
> > keys are ids and values are SeqRecords.
> 
> What will be the key in rec_dict for the example GFF file above? Is that the "I" in the first column, as in
>
> rec_dict["I"] = a SeqRecord with the SeqFeature you described above?

Yes, that is exactly right. If we decide to have a SeqFeature
iterator, we should also add a 'rec_id' key/value pair to the
qualifiers that would map to the record -- chromosome 'I' in this
case. This would let the user do the mapping themselves.
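
To illustrate the 'rec_id' idea, here is a minimal sketch. It uses a toy
parser for a single GFF2-style line, which is an assumption for
illustration and not the BCBio.GFF implementation; parse_gff_line,
group_by_record and the dictionary layout are all hypothetical names.
Each feature carries a 'rec_id' qualifier, so the user can rebuild the
record mapping themselves:

```python
def parse_gff_line(line):
    """Toy parser for one tab-delimited GFF2-style line (illustration only)."""
    fields = line.rstrip("\n").split("\t", 8)
    seqid, source, ftype, start, end, _score, strand, _phase, attributes = fields
    # 'rec_id' points back to the record (first column) the feature belongs to.
    qualifiers = {"source": [source], "rec_id": [seqid]}
    for chunk in attributes.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue
        key, _, value = chunk.partition(" ")
        value = value.strip().strip('"')
        values = qualifiers.setdefault(key.lower(), [])
        if value not in values:  # skip repeated key/value pairs
            values.append(value)
    return {
        "type": ftype,
        "start": int(start) - 1,  # GFF is 1-based; Python slicing is 0-based
        "end": int(end),
        "strand": {"+": 1, "-": -1}.get(strand),
        "qualifiers": qualifiers,
    }


def group_by_record(features):
    """Map each rec_id to the list of features that carry it."""
    grouped = {}
    for feat in features:
        rec_id = feat["qualifiers"]["rec_id"][0]
        grouped.setdefault(rec_id, []).append(feat)
    return grouped


gff_line = "\t".join([
    "I", "Orfeome", "PCR_product", "12759747", "12764936", ".", "-", ".",
    'PCR_product "mv_B0019.1" ; Amplified 1 ; Amplified 1',
])
feature = parse_gff_line(gff_line)
grouped = group_by_record([feature])
print(sorted(grouped))
```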

Brad


