[Biopython] Adaptor trimmer and dimers
natassa
natassa_g_2000 at yahoo.com
Thu Oct 22 09:29:35 UTC 2009
Hi Brad,
Thank you very much for your comments!
> Peter had a good suggestion on profiling. The Python profile module
> is quick to learn and can quickly point you in the direction of the
> most used functions:
> http://docs.python.org/library/profile.html
I looked at the profile module, but I am still not sure what input to give cProfile (my module name?). It is a syntax comprehension problem for now, but I am sure I'll solve it ;-)
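For reference, a minimal sketch of the two usual ways to feed cProfile. From the command line it takes a script file (`python -m cProfile -s cumulative my_script.py`, where `my_script.py` is a placeholder name). From inside Python it profiles whatever runs between `enable()` and `disable()`, or a statement given as a string to `cProfile.run("main()")`. The `count_gc` and `main` functions below are toy stand-ins, not the trimming code from this thread.

```python
import cProfile
import io
import pstats


def count_gc(seq):
    """Toy workload standing in for the real per-read function."""
    return sum(1 for base in seq if base in "GC")


def main():
    # 2000 identical 80-bp reads, just to give the profiler something to time.
    reads = ["ACGTGGCC" * 10] * 2000
    return sum(count_gc(read) for read in reads)


# Profile one call to main() rather than a whole module.
profiler = cProfile.Profile()
profiler.enable()
total = main()
profiler.disable()

# Sort by cumulative time and report the ten most expensive entries.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(10)
report = buffer.getvalue()
print(report)
```

The functions at the top of the report are the ones worth optimising first.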
> - You are calling the pairwise2 alignment 3 times. You should call
> this once, assign the alignment information to a variable, and then
> perform your if/else tests on that. The updated trimming code above
> is a good example of doing this.
Thanks! I forgot to clean up the code after I sorted out this index error; this was my 'dirty' version from when I was trying to understand the issue.
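The "align once, test many times" pattern can be sketched as follows. This is only an illustration: `find_adaptor` is a hypothetical stand-in that uses an exact string search in place of the pairwise2 call, and `classify` and its return labels are invented for the example.

```python
def find_adaptor(read, adaptor):
    """Hypothetical stand-in for the alignment call.

    Returns (start, end) of the first exact match, or None if there is none.
    """
    start = read.find(adaptor)
    if start == -1:
        return None
    return start, start + len(adaptor)


def classify(read, adaptor):
    # Run the (expensive) search once and keep the result...
    hit = find_adaptor(read, adaptor)
    # ...then perform every if/else test on the stored value.
    if hit is None:
        return "no adaptor"
    start, end = hit
    if start == 0:
        return "adaptor at 5' end"
    if end == len(read):
        return "adaptor at 3' end"
    return "adaptor inside read"


print(classify("CCAAACC", "AAA"))
```

With a real aligner the saving is proportional to the number of repeated calls removed, so three calls reduced to one should cut the alignment time to roughly a third.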
> - You are slicing SeqRecord objects, and then never using the sliced
> records. Your code doesn't look like adaptor trimming, but rather
> filtering out reads without a sequence. If you don't need the
> trimmed record, pass a string (str(rec1.seq) and str(rec2.seq)) to
> the handle_adaptor function instead of the record; the slicing is
> then done on a much simpler object and you avoid the substantial
> overhead of slicing up quality scores that are never used.
Again, not very clean code, as I have been oscillating between trimming and removing for some days now. I finally decided that if reads with nearly exact (max 2 errors) matches to the adaptor do not make up a big proportion of my data, I may just discard them, as trimming a 33/37 bp adaptor from a 55-bp read does not leave much anyway.
You were right about passing a string to the function; I had not realized that passing the whole record would be heavier. The revised script (for removing reads, but taking into account all your suggestions, so using the general iterator) still runs for a very long time, unfortunately without a profiler. I need to understand this module better.
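A rough sketch of the discard approach on plain strings, assuming "max 2 errors" means at most two substitutions in an ungapped window (indels are not handled, unlike a real pairwise2 alignment). The names `has_adaptor` and `keep_pair` and the `ADAPTOR` sequence are illustrative only, not taken from the actual script. With Biopython records one would pass `str(rec1.seq)` and `str(rec2.seq)`.

```python
def has_adaptor(read, adaptor, max_errors=2):
    """True if some ungapped window of read matches adaptor
    with at most max_errors mismatches (substitutions only)."""
    width = len(adaptor)
    for start in range(len(read) - width + 1):
        mismatches = 0
        for a, b in zip(read[start:start + width], adaptor):
            if a != b:
                mismatches += 1
                if mismatches > max_errors:
                    break  # this window is hopeless, try the next one
        else:
            # Inner loop finished without break: window is within budget.
            return True
    return False


def keep_pair(seq1, seq2, adaptor, max_errors=2):
    """Keep a read pair only if neither read carries the adaptor."""
    return not (has_adaptor(seq1, adaptor, max_errors)
                or has_adaptor(seq2, adaptor, max_errors))


ADAPTOR = "GATCGGAAGAGC"  # illustrative sequence, an assumption for the demo

pairs = [
    ("TTTTTTTTTTTTTTTT", "AAAAAAAAAAAAAAAA"),       # clean pair
    ("TTTTTTTTTTTTTTTT", "TTTT" + ADAPTOR),         # exact adaptor in read 2
    ("CC" + "GTTCGGAAGAGA", "TTTTTTTTTTTTTTTT"),    # adaptor with 2 mismatches
]
kept = [pair for pair in pairs if keep_pair(pair[0], pair[1], ADAPTOR)]
print(len(kept), "of", len(pairs), "pairs kept")
```

Breaking out of a window as soon as the error budget is exceeded keeps the scan cheap for the common case of reads with no adaptor at all.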
Thanks for all suggestions!
Anastasia
Anastasia Gioti
Post-Doc, Evolutionary Biology Department
Uppsala University
Norbyvägen 18D
SE-752 36 UPPSALA
anastasia.gioti at ebc.uu.se
Tel: +46-18-471 2837
Fax: +46-18-471 6310