[Biopython] Trimming adaptors sequences

Mon Aug 10 07:12:21 EDT 2009

Hi all,

Brad's got an interesting blog post up on using Biopython for trimming
adaptors for next gen sequencing reads, using Bio.pairwise2 for
pairwise alignments between the adaptor and the reads:

http://bcbio.wordpress.com/2009/08/09/trimming-adaptors-from-short-read-sequences/

The basic idea is similar to what Giles Weaver was describing last
month, although Giles was using EMBOSS needle to do a global pairwise
alignment via BioPerl:
http://lists.open-bio.org/pipermail/biopython/2009-July/005338.html

We already had a simple FASTQ "primer trimming" example in the
tutorial, which I have just extended to add a more general FASTQ
"adaptor trimming" example. For this I am deliberately only looking
for exact matches. This is faster of course, but it also makes the
example much easier to understand - something important for an
introductory example.
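
To illustrate the exact-match idea, here is a minimal sketch on plain
Python strings. It is not the tutorial code itself: the function name
trim_adaptor is made up for this illustration, and keeping only the
sequence that follows the adaptor is an assumption about where the
adaptor sits in the read.

```python
def trim_adaptor(read, adaptor):
    """Remove the first exact copy of adaptor and everything before it.

    Hypothetical helper for illustration only. Reads without an exact
    copy of the adaptor are returned unchanged.
    """
    index = read.find(adaptor)  # -1 when there is no exact match
    if index == -1:
        return read
    # Assumption: the adaptor precedes the sequence of interest, so we
    # keep only what follows it.
    return read[index + len(adaptor):]


reads = ["AAGATGACGGTGTCCC", "TTTTTTTT"]
trimmed = [trim_adaptor(read, "GATGACGGTGT") for read in reads]
```

A read with even a single sequencing error inside the adaptor is left
untouched by this, which is the price of keeping the example simple.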

A full worked example of how to use pairwise alignments would seem
like a great idea for a cookbook entry on the wiki. It would be
interesting to see which is faster - using EMBOSS needle/water or
Bio.pairwise2. Both are written in C, but using EMBOSS we'd have the
overhead of parsing the output file.

Brad - why are you using a local alignment and not a global alignment?
Shouldn't we be looking for the entire adaptor sequence? It looks like
you don't consider the unaligned parts of the adaptor when you
count the mismatches - is this a bug? I wonder if it would be simpler
(and faster) to use a score-based threshold.
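
To make the question concrete, here is a rough sketch of what a
score-based threshold could look like. It does not use Bio.pairwise2
or EMBOSS; it is a small pure-Python dynamic programme that aligns the
entire adaptor against the best-matching stretch of the read (global
in the adaptor, free end gaps in the read), so unaligned adaptor bases
cannot be silently ignored. The function names, the scoring values
(match 1, mismatch -1, gap -2) and the 80% cut-off are all assumptions
picked for illustration.

```python
def best_adaptor_score(adaptor, read, match=1, mismatch=-1, gap=-2):
    """Score the whole adaptor against the best stretch of the read.

    Returns (score, end), where end is the read position just after
    the best-scoring stretch. Scoring values are illustrative only.
    """
    # Row 0 is all zeros: the alignment may start anywhere in the read.
    prev = [0] * (len(read) + 1)
    for i, a in enumerate(adaptor, start=1):
        # Column 0: adaptor[:i] aligned against nothing but gaps.
        curr = [i * gap]
        for j, r in enumerate(read, start=1):
            diag = prev[j - 1] + (match if a == r else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    # The last row covers the full adaptor; it may end anywhere in the read.
    end = max(range(len(prev)), key=prev.__getitem__)
    return prev[end], end


def has_adaptor(adaptor, read, min_fraction=0.8):
    """Accept when the score reaches a fraction of the perfect score."""
    score, _ = best_adaptor_score(adaptor, read)
    return score >= min_fraction * len(adaptor)
```

With these values a perfect hit scores len(adaptor), and each mismatch
costs two points relative to that, so the threshold translates
directly into a number of tolerated errors across the full adaptor.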

Regards,

Peter