[BioPython] I don't understand why SeqRecord.feature is a list

Giovanni Marco Dall'Olio dalloliogm at gmail.com
Thu Jul 12 15:00:03 UTC 2007


Yes, it's true, it is similar to the way SeqFeature should work.
But I still don't get how to represent my genes in Biopython :(

You know, I've printed the Bio module UML diagram from here:
http://www.pasteur.fr/recherche/unites/sis/formation/python/images/seq_class.png
and put it on the wall above my computer monitor like a
poster.
So every day, when I come to work, I see the Bio module UML diagram and
ask myself why SeqRecord.features is a list instead of a dictionary :)
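
One thing that may partly explain the list: a gene usually has several
features of the same type (two exons, say), so a dictionary keyed on type
could hold only one of them. Below is a minimal sketch of keeping the list
and building a dictionary view on top of it. The Feature class and the
index_by_type helper are hypothetical names made up for illustration; this
is not Biopython's SeqFeature API.

```python
from collections import defaultdict


class Feature(object):
    """Hypothetical stand-in for a sequence feature (not Biopython's SeqFeature)."""

    def __init__(self, kind, start, end):
        self.kind = kind
        self.start = start
        self.end = end


def index_by_type(features):
    """Group features by kind, preserving their list order within each group."""
    index = defaultdict(list)
    for feature in features:
        index[feature.kind].append(feature)
    return dict(index)


# A gene with two exons: the same kind appears twice, so a plain
# {kind: feature} dictionary would silently keep only one of them.
features = [
    Feature("exon", 0, 120),
    Feature("intron", 120, 300),
    Feature("exon", 300, 450),
]

by_type = index_by_type(features)
exon_starts = [f.start for f in by_type["exon"]]
print(exon_starts)
```

The dictionary holds the same objects as the list, so the list stays the
ordered source of truth and the index can be rebuilt whenever it changes.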



2007/7/5, Peter <biopython at maubp.freeserve.co.uk>:
> Giovanni Marco Dall'Olio wrote:
> > Let's have a look at your example:
> > - we have a list of features like this:
> > list_features = ['GTAAGT', 'TACTAAC', 'TGT']
> >
> > - then we specify the meaning of these features with separate variables:
> > splicesignal5 = list_features[0]
> > polypirimidinetract = list_features[1]
> > splicesignal3 = list_features[2]
> >
> > Python assignment binds a name to an object, not to the list slot: this
> > means that if you replace one of the values in list_features, you have
> > to update all the variables which refer to it manually.
> >
> > >>> list_features = ['GTAAGT', 'TACTAAC', 'TGT']
> > >>> splicesignal5 = list_features[0]
> > >>> print splicesignal5
> > GTAAGT
> > >>> list_features[0] = 'TTTTTTT'
> > >>> print splicesignal5
> > GTAAGT
> > >>> # wrong! splicesignal5 still holds the old value, so all the
> > >>> # variables which refer to list_features must be updated manually
> > >>> splicesignal5 = list_features[0]
> > >>> print splicesignal5
> > TTTTTTT
> >
> > This is why I prefer to save the positions of the features instead of
> > their values:
> > >>> list_features = ['GTAAGT', 'TACTAAC', 'TGT']
> > >>> dict_aliases = {'splicesignal5': 0, 'polypirimidinetract': 1,
> > ...                 'splicesignal3': 2}
> > >>> def get_feature(feature_name):
> > ...     return list_features[dict_aliases[feature_name]]
>
> ...
>
>  > Another option could be to use references to memory positions instead
>  > of dictionary keys, but I don't know how to implement this in python,
>  > and I'm not sure it would be computationally convenient.
>
> Have you considered making "feature objects", where each object can hold
> multiple pieces of information such as a name, alias, type - as well as
> the sequence data itself? You may wish to create your own class here, or
> try to use the existing Biopython SeqFeature object.
>
> You could then use a list to hold your feature objects, or a dictionary
> keyed on the alias perhaps. Or both.
>
> e.g.
>
> class Feature :
>      #Very simple class which could be extended
>      def __init__(self, seq_string) :
>          self.seq = seq_string
>
>      def __repr__(self) :
>          #Use id(self) to show the memory location (in hex), just
>          #to show the difference between two instances with the same seq
>          return "Feature(%s) instance at %s" \
>                 % (self.seq, hex(id(self)))
>
>
> list_features = [Feature('GTAAGT'),
>                   Feature('TACTAAC'),
>                   Feature('TGT')]
>
> splicesignal5 = list_features[0]
> print splicesignal5
> print list_features[0]
>
> print "EDITING first object in the list:"
> list_features[0].seq = 'TTTTTTT'
>
> print splicesignal5 #changed, now TTTTTTT
> print list_features[0]
>
> print "REPLACING first object in the list:"
> list_features[0] = Feature('GGGGGG')
>
> print splicesignal5 #still points to old object, TTTTTTT
> print list_features[0]
>
> --
>
> I'm not sure if that is closer to what you wanted, or not.
>
> Peter
>
>
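
To check that I understand the "or both" idea, here is a minimal sketch of
a list and a dictionary that hold the same feature objects. The Feature
class is cut down to the bare minimum, and the dict_aliases layout is only
my assumption of how the alias dictionary could look.

```python
class Feature(object):
    """Minimal feature object holding only a sequence string."""

    def __init__(self, seq_string):
        self.seq = seq_string


list_features = [Feature('GTAAGT'), Feature('TACTAAC'), Feature('TGT')]

# Hypothetical alias table: each name points at the same object as the list.
dict_aliases = {
    'splicesignal5': list_features[0],
    'polypirimidinetract': list_features[1],
    'splicesignal3': list_features[2],
}

# Editing an object in place is visible through both containers.
list_features[0].seq = 'TTTTTTT'
after_edit = dict_aliases['splicesignal5'].seq
print(after_edit)

# Replacing the list slot is not: the alias still points at the old
# object until it is rebound explicitly.
list_features[0] = Feature('GGGGGG')
after_replace = dict_aliases['splicesignal5'].seq
print(after_replace)

dict_aliases['splicesignal5'] = list_features[0]
after_rebind = dict_aliases['splicesignal5'].seq
print(after_rebind)
```

So with objects, in-place edits propagate for free, and only whole
replacements need the alias dictionary to be updated.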


-- 
-----------------------------------------------------------

My Blog on Bioinformatics (italian): http://dalloliogm.wordpress.com


