[BioPython] I don't understand why SeqRecord.feature is a list

Peter biopython at maubp.freeserve.co.uk
Thu Jul 5 09:33:26 UTC 2007


Giovanni Marco Dall'Olio wrote:
> Let's have a look at your example:
> - we have a list of features like this:
> list_features = ['GTAAGT', 'TACTAAC', 'TGT']
> 
> - then we specify the meaning of these features with separate variables:
> splicesignal5 = list_features[0]
> polypirimidinetract = list_features[1]
> splicesignal3 = list_features[2]
>
> Python passes the variables by value: this means that if you change
> one of the values in the list_features list, then you have to update
> all the variables which refer to it manually.
> 
>>>> list_features = ['GTAAGT', 'TACTAAC', 'TGT']
>>>> splicesignal5 = list_features[0]
>>>> splicesignal5
> 'GTAAGT'
>>>> list_features[0] = 'TTTTTTT'
>>>> splicesignal5
> 'GTAAGT'     # wrong!
>>>> splicesignal5 = list_features[0]    # have to update all the variables which refer to list_features manually
>>>> splicesignal5
> 'TTTTTTT'
> 
> This is why I prefer to save the positions of the features instead of
> their values:
>>>> list_features = ['GTAAGT', 'TACTAAC', 'TGT']
>>>> dict_aliases = {'splicesignal5': [0], 'polypirimidinetract': [1],
> ...                 'splicesignal3': [2]}
>>>> def get_feature(feature_name):
> ...     return list_features[dict_aliases[feature_name]]  # (this code doesn't work)
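For reference, the quoted lookup fails because each alias maps to a one-element list such as [0], and indexing a Python list with a list raises a TypeError. A minimal sketch of a working variant, assuming plain integer indices are acceptable (the names are taken from the quoted code; the integer values are the only change):

```python
# Map each alias to an int index rather than a one-element list like [0],
# because list indices must be integers (or slices).
list_features = ['GTAAGT', 'TACTAAC', 'TGT']
dict_aliases = {'splicesignal5': 0, 'polypirimidinetract': 1, 'splicesignal3': 2}


def get_feature(feature_name):
    """Return the current value stored at the alias's position."""
    return list_features[dict_aliases[feature_name]]


print(get_feature('splicesignal5'))  # GTAAGT
list_features[0] = 'TTTTTTT'
print(get_feature('splicesignal5'))  # TTTTTTT: the lookup sees the new value
```

One caveat of storing positions: if an item is inserted into, removed from, or reordered within list_features, the stored indices silently point at the wrong features.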

...

> Another option could be to use references to memory positions instead
> of dictionary keys, but I don't know how to implement this in Python,
> and I'm not sure it would be computationally convenient.
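Python has no pointer syntax, but every name, list slot and dict value already holds a reference to an object. What the quoted session shows is rebinding: assigning to list_features[0] makes that slot refer to a new string, while splicesignal5 keeps referring to the old one. A minimal illustration using the strings from the quoted example:

```python
list_features = ['GTAAGT', 'TACTAAC', 'TGT']
splicesignal5 = list_features[0]

# Both names refer to the very same string object (no copy was made).
assert splicesignal5 is list_features[0]

# Assigning to the list slot rebinds that slot to a different object;
# splicesignal5 still refers to the original string.
list_features[0] = 'TTTTTTT'
assert splicesignal5 is not list_features[0]

print(splicesignal5, list_features[0])  # GTAAGT TTTTTTT
```

Because str objects are immutable, the only way to "change" one is to rebind a name to a new string. Sharing changes between names therefore needs a mutable object that both names refer to.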

Have you considered making "feature objects", where each object can hold
multiple pieces of information such as a name, alias and type, as well as
the sequence data itself? You may wish to create your own class here, or
try using the existing Biopython SeqFeature object.

You could then use a list to hold your feature objects, or a dictionary 
keyed on the alias perhaps. Or both.
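A minimal sketch of the "both" option, assuming a hypothetical Feature class that also stores an alias (the alias attribute and the features_by_alias name are illustrative, not part of Biopython). The list and the dictionary hold references to the same objects, so an edit made through one is visible through the other:

```python
class Feature:
    # Hypothetical minimal feature: an alias plus the sequence string.
    def __init__(self, alias, seq):
        self.alias = alias
        self.seq = seq


# The list keeps the features in order...
list_features = [Feature('splicesignal5', 'GTAAGT'),
                 Feature('polypirimidinetract', 'TACTAAC'),
                 Feature('splicesignal3', 'TGT')]

# ...and the dict indexes the *same* objects by alias (no copies are made).
features_by_alias = {f.alias: f for f in list_features}

# Editing through the dict is visible through the list, and vice versa.
features_by_alias['splicesignal5'].seq = 'TTTTTTT'
print(list_features[0].seq)  # TTTTTTT
```

Note that this only holds for in-place edits: if a list slot is replaced with a brand-new Feature object, the dictionary entry still refers to the old one and must be updated as well.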

e.g.

class Feature:
    # Very simple class which could be extended
    def __init__(self, seq_string):
        self.seq = seq_string

    def __repr__(self):
        # Use id(self) to show the object's identity (in CPython, its
        # memory address) in hex, just to show the difference between
        # two instances with the same seq
        return "Feature(%s) instance at %s" % (self.seq, hex(id(self)))


list_features = [Feature('GTAAGT'),
                 Feature('TACTAAC'),
                 Feature('TGT')]

splicesignal5 = list_features[0]
print(splicesignal5)
print(list_features[0])

print("EDITING first object in the list:")
list_features[0].seq = 'TTTTTTT'

print(splicesignal5)  # changed, now TTTTTTT
print(list_features[0])

print("REPLACING first object in the list:")
list_features[0] = Feature('GGGGGG')

print(splicesignal5)  # still points to old object, TTTTTTT
print(list_features[0])
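As one way the simple class could be extended with a name, alias and type, here is a sketch using a Python 3.7+ dataclass. The field names are hypothetical and chosen only for illustration; eq=False is set so that two features with the same sequence are still distinct objects, matching the identity-based behaviour of the plain class:

```python
from dataclasses import dataclass


@dataclass(eq=False)  # eq=False: == compares identity, like the plain class
class Feature:
    # Hypothetical extended feature; field names are illustrative only.
    seq: str
    name: str = ''
    alias: str = ''
    feature_type: str = ''


splice5 = Feature('GTAAGT', name='donor site', alias='splicesignal5',
                  feature_type='splice_site')
other = Feature('GTAAGT')

print(splice5)
print(splice5 == other)  # False: same seq, but two different objects
```

The dataclass generates __init__ and __repr__ automatically; if you would rather have features with equal fields compare equal, drop eq=False.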

--

I'm not sure if that is closer to what you wanted, or not.

Peter



