[Biopython] Adaptor trimmer and dimers

natassa natassa_g_2000 at yahoo.com
Thu Oct 22 09:29:35 UTC 2009


Hi Brad, 
Thank you very much for your comments! 

Peter had a good suggestion on profiling. The Python profile module
is quick to learn and can quickly point you in the direction of the
most used functions:
http://docs.python.org/library/profile.html


I looked at the profile module, but I am still not sure what input to give cProfile (my module name?). It is a syntax comprehension problem for now, but I am sure I'll solve it ;-)
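
On the question of what cProfile takes as input: a whole script can be profiled from the command line with "python -m cProfile -s cumulative myscript.py" (myscript.py is a placeholder name), and a single function can be profiled from inside Python. The sketch below shows the second route. The count_matches function and the reads are made up for illustration and only stand in for the real trimming loop.

```python
import cProfile
import io
import pstats


def count_matches(reads, adaptor):
    """Toy workload standing in for the real trimming loop (hypothetical)."""
    return sum(1 for read in reads if adaptor in read)


def profile_call(func, *args):
    """Run func(*args) under cProfile; return its result and a text report."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time and show at most the ten costliest entries.
    stats.sort_stats("cumulative").print_stats(10)
    return result, buffer.getvalue()


# Made-up reads, repeated so the profiler has something to measure.
reads = ["ACGTACGTTTGACA", "TTTTGGGGCCCC", "GGACGTAC"] * 1000
result, report = profile_call(count_matches, reads, "ACGTAC")
print(report)
```

The report lists each function with its call count and cumulative time, so the functions at the top are the first candidates for optimisation.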


- You are calling the pairwise2 alignment 3 times. You should call
  this once, assign the alignment information to a variable, and then
  perform your if/else tests on that. The updated trimming code above 
  is a good example of doing this.
Thanks! I forgot to clean up the code after I sorted out this index error; this was my 'dirty' version from when I was trying to understand the issue.
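
The "align once, then test" pattern can be sketched as follows. This is not the pairwise2 code from the thread: best_match is a hypothetical stand-in built on difflib from the standard library, playing the role of the expensive alignment call, and the adaptor sequence and the min_len threshold are made up.

```python
from difflib import SequenceMatcher

ADAPTOR = "GATCGGAAGAGCTCGT"  # made-up adaptor sequence, for illustration only


def best_match(adaptor, read):
    """Stand-in for one costly alignment call: longest exact shared block.

    Returns (start_in_read, length). This is NOT pairwise2; it only plays
    the role of an expensive function whose result should be reused.
    """
    matcher = SequenceMatcher(None, adaptor, read, autojunk=False)
    block = matcher.find_longest_match(0, len(adaptor), 0, len(read))
    return block.b, block.size


def classify(read, min_len=10):
    """Align once, keep the result, then branch on the stored values."""
    start, length = best_match(ADAPTOR, read)  # the single alignment call
    if length < min_len:
        return "no adaptor"
    if start == 0:
        return "adaptor at start"
    return "adaptor inside read"


for read in ("GATCGGAAGAGCTCGTAAAA", "TTTTGATCGGAAGAGCTCGT", "TTTTTTTTTTTTCCCCCCCC"):
    print(read, classify(read))
```

Every if/else test reads the stored start and length, so the costly call runs once per read instead of once per test.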


- You are slicing SeqRecord objects, and then never using the sliced
  records. Your code doesn't look like adaptor trimming, but rather
  filtering out reads without a sequence. If you don't need the
  trimmed record, pass a string (str(rec1.seq) and str(rec2.seq)) to
  the handle_adaptor function instead of the record; the slicing is
  then done on a much simpler object and you avoid the substantial 
  overhead of slicing up quality scores that are never used.

Again, not very clean code, as I have been oscillating between trimming and removing for some days now. I finally decided that if only a small proportion of my reads contain nearly exact (max 2 errors) matches to the adaptor, I may just discard those reads, as trimming a 33/37 bp adaptor from a 55 bp read does not leave much anyway.
You were right about passing a string to the function; I had not realised that passing the whole record would be heavier. The revised script (for removing reads, but taking into account all your suggestions, so using the general iterator) is still taking very long to run, unfortunately without a profiler, as I need to understand this module better.
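
A minimal sketch of the discard-instead-of-trim filter working on plain strings is given below. It rests on simplifying assumptions that are not in the original script: errors are counted as substitutions only (no indels), only full-length adaptor windows are checked, and the function names, adaptor and reads are all made up.

```python
def mismatches(a, b):
    """Count positions where two equal-length strings differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def contains_adaptor(read, adaptor, max_errors=2):
    """True if some window of the read matches the adaptor within max_errors.

    Simplification: substitutions only, full-length adaptor windows only.
    """
    width = len(adaptor)
    return any(
        mismatches(read[i:i + width], adaptor) <= max_errors
        for i in range(len(read) - width + 1)
    )


def keep_clean_reads(reads, adaptor, max_errors=2):
    """Yield only the reads (plain strings) without a near-exact adaptor hit."""
    for read in reads:
        if not contains_adaptor(read, adaptor, max_errors):
            yield read


adaptor = "GATCGGAAGAGC"  # made-up adaptor sequence
reads = [
    "TTTTGATCGGAAGAGCTTTT",  # exact adaptor match, discarded
    "TTTTGATCGGTTGAGCTTTT",  # two mismatches, discarded
    "ACACACACACACACACACAC",  # no adaptor, kept
]
kept = list(keep_clean_reads(reads, adaptor))
print(kept)
```

With records, the string to pass would be str(rec.seq), as suggested above, so that no quality scores are sliced along the way.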
Thanks for all suggestions!
Anastasia


Anastasia Gioti
Post-Doc, Evolutionary Biology Department
Uppsala University
Norbyvägen 18D
SE-752 36  UPPSALA
anastasia.gioti at ebc.uu.se
Tel: +46-18-471 2837
Fax: +46-18-471 6310



      


