[Biopython-dev] Bio.GFF and Brad's code
Brad Chapman
chapmanb at 50mail.com
Fri Apr 17 20:05:58 UTC 2009
Peter and Michiel;
[start/end attributes on SeqFeatures]
> The SeqFeature already has start and end "attributes", but they are
> done with some magic in __getattr__, I was planning to update this
> to use a modern python property get. I can't find an enhancement
> bug on this so it may just have been on my mental to do list ;)
These attributes are on the FeatureLocation object. The whole
location hierarchy is a bit complicated to represent all of the
GenBank fuzziness, but it looks like:
SeqFeature -- has_a --> FeatureLocation -- has_two --> Positions (start, end)
So if you want a non-fuzzy start and end, you need to do:
feature.location.nofuzzy_start, feature.location.nofuzzy_end
Your way above would be:
feature.location.start.position
So, I was thinking of hiding this Location/Position stuff from the
end user and just adding start and end attributes directly on the
feature. For everyone who never touches fuzziness, this would make
more sense; it is also in line with making SeqFeature like Michiel's
proposed GFFRecord object.
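Roughly what I have in mind, as a minimal sketch. The classes below are
simplified stand-ins written for this example, not the real Bio.SeqFeature
code, and they ignore fuzzy positions entirely; the point is only to show
start and end as plain properties that delegate to the location:

```python
class Position:
    """Stand-in for an exact (non-fuzzy) position."""

    def __init__(self, position):
        self.position = position


class FeatureLocation:
    """Stand-in location holding two Position objects."""

    def __init__(self, start, end):
        self.start = Position(start)
        self.end = Position(end)

    @property
    def nofuzzy_start(self):
        # With only exact positions this is just the stored value.
        return self.start.position

    @property
    def nofuzzy_end(self):
        return self.end.position


class SeqFeature:
    """Stand-in feature exposing start/end directly via properties."""

    def __init__(self, location, type=""):
        self.location = location
        self.type = type

    @property
    def start(self):
        # Delegate to the location so there is one source of truth.
        return self.location.nofuzzy_start

    @property
    def end(self):
        return self.location.nofuzzy_end


feature = SeqFeature(FeatureLocation(12759746, 12764936), type="PCR_product")
print(feature.start, feature.end)
```

Anyone who does need the fuzziness can still reach it through
feature.location; the properties are just a shortcut.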
[GFF to SeqFeature example]
> > I Orfeome PCR_product 12759747 12764936 . - . PCR_product "mv_B0019.1" ; Amplified 1 ; Amplified 1
> >
> > type: PCR_product
> > location: [12759746:12764936]
> > strand: -1
> > qualifiers:
> > Key: amplified, Value: ['1']
> > Key: pcr_product, Value: ['mv_B0019.1']
> > Key: source, Value: ['Orfeome']
> >
>
> Just to make sure I understand how this works, looking at your previous code example:
>
> >>> from BCBio.GFF.GFFParser import GFFAddingIterator
> >>> gff_iterator = GFFAddingIterator()
> >>> rec_dict = gff_iterator.get_all_features(gff_file)
>
> > The returned dictionary is like a dictionary from SeqIO.to_dict;
> > keys are ids and values are SeqRecords.
>
> What will be the key in rec_dict for the example GFF file above? Is that the "I" in the first column, as in
>
> rec_dict["I"] = a SeqRecord with the SeqFeature you described above?
Yes, that is exactly right. If we decide to have a SeqFeature
iterator, we should also add a 'rec_id' key/value pair to the
qualifiers that would map to the record -- chromosome 'I' in this
case. This would let the user do the mapping themselves.
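To make the 'rec_id' idea concrete, here is a rough sketch in plain Python.
It does not use the BCBio.GFF parser; parse_gff2_line and group_by_record
are hypothetical helpers made up for this example, features are plain dicts
rather than SeqFeatures, and the lowercasing and de-duplication of qualifiers
are assumptions chosen to match the example output above:

```python
from collections import defaultdict


def parse_gff2_line(line):
    """Parse one tab-separated GFF2 line into a plain dict (hypothetical helper)."""
    fields = line.rstrip("\n").split("\t", 8)
    seqid, source, ftype, start, end, _score, strand, _frame, attrs = fields
    qualifiers = defaultdict(list)
    qualifiers["source"].append(source)
    # Record the parent record id so a bare feature can be mapped back.
    qualifiers["rec_id"].append(seqid)
    for part in attrs.split(";"):
        part = part.strip()
        if not part:
            continue
        key, _, value = part.partition(" ")
        key = key.lower()
        value = value.strip().strip('"')
        if value not in qualifiers[key]:  # drop repeated key/value pairs
            qualifiers[key].append(value)
    return {
        "type": ftype,
        "start": int(start) - 1,  # GFF is 1-based; Python slices are 0-based
        "end": int(end),
        "strand": {"+": 1, "-": -1}.get(strand),  # None for "."
        "qualifiers": dict(qualifiers),
    }


def group_by_record(features):
    """Do the feature -> record mapping using the 'rec_id' qualifier."""
    by_record = defaultdict(list)
    for feat in features:
        by_record[feat["qualifiers"]["rec_id"][0]].append(feat)
    return dict(by_record)


line = ('I\tOrfeome\tPCR_product\t12759747\t12764936\t.\t-\t.\t'
        'PCR_product "mv_B0019.1" ; Amplified 1 ; Amplified 1')
feat = parse_gff2_line(line)
rec_dict = group_by_record([feat])
print(sorted(rec_dict))
```

With a feature iterator, the grouping step is the part the user would
write themselves.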
Brad