[Biopython] Adaptor trimmer and dimers

Brad Chapman chapmanb at 50mail.com
Mon Oct 19 11:24:41 UTC 2009


Hi Anastasia;

> I've gone through these posts already, so my question was whether
> a global alignment script exists; Brad Chapman's script does a
> local alignment.

I found that local alignments behaved better in terms of trimming,
but if you want global alignments it's easy to change. Edit line 42
of the script from:

pairwise2.align.localms

to:

pairwise2.align.globalms
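To see why local alignment tends to suit trimming, here is a minimal pure-Python sketch that scores an adaptor against a read both ways. It is not pairwise2 itself: `align_score` is a hypothetical helper with a simple linear gap penalty, and the match/mismatch/gap values and the adaptor sequence are illustrative assumptions, not the settings used in the script.

```python
# Minimal sketch contrasting global (Needleman-Wunsch) and local
# (Smith-Waterman) alignment scores. Not Biopython's pairwise2: it uses a
# linear gap penalty and illustrative scores chosen for this example.

def align_score(a, b, local, match=5, mismatch=-4, gap=-9):
    """Return the best global or local alignment score of a against b."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] holds the best score for aligning a[:i] with b[:j]
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # A global alignment pays for every leading gap in either sequence
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if local:
                # A local alignment may start afresh anywhere (floor at zero)
                cell = max(cell, 0)
                best = max(best, cell)
            score[i][j] = cell
    # Local: best cell anywhere; global: both sequences used end to end
    return best if local else score[rows - 1][cols - 1]

adaptor = "GATCGGAAGAGC"          # illustrative 12-base adaptor
read = "ACGTTTGCA" + adaptor      # 9 read bases, then the adaptor
print(align_score(adaptor, read, local=True))   # 60
print(align_score(adaptor, read, local=False))  # -21
```

The local score is 60 because all twelve adaptor bases match; the global score drops to -21 because the alignment must also pay a gap penalty for each of the nine read bases in front of the adaptor. That end-gap cost is what makes a plain global alignment awkward for locating an adaptor inside a longer read.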


> Also, I am mostly interested in discarding adapter-dimer reads,
> and unless I am wrong, I do not see anything in his code to
> detect those.

An adaptor dimer is essentially all adaptor, so trimming it should give
back an empty or very short read, which you can then discard in your
script.
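A minimal sketch of that check, assuming a simplified stand-in for the trimmer: `trim_adaptor_hamming` and `is_adaptor_dimer` are hypothetical names, and the matching is ungapped (mismatches only, no indels, no partial adaptor at the 3' end), unlike the alignment-based script. Note that a length cutoff flags anything that trims below the threshold, so it catches dimers and very short inserts alike.

```python
# Hypothetical, simplified stand-in for an adaptor trimmer: ungapped
# matching that tolerates up to num_errors mismatches. Unlike an
# alignment-based trimmer it ignores indels and partial adaptors at the
# 3' end of the read.

def trim_adaptor_hamming(read, adaptor, num_errors):
    """Cut the read at the first adaptor match with <= num_errors mismatches."""
    for start in range(len(read) - len(adaptor) + 1):
        window = read[start:start + len(adaptor)]
        mismatches = sum(1 for x, y in zip(window, adaptor) if x != y)
        if mismatches <= num_errors:
            return read[:start]  # keep only the bases before the adaptor
    return read  # no adaptor found: read is unchanged

def is_adaptor_dimer(read, adaptor, num_errors, size_thresh):
    """Treat reads that trim to fewer than size_thresh bases as dimers."""
    return len(trim_adaptor_hamming(read, adaptor, num_errors)) < size_thresh
```

A read that begins with the adaptor trims to an empty string, so it is flagged for any positive `size_thresh`.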

> I would also like to discard their
> pairs, as I am feeding those into the Velvet assembler, which uses
> the read-pair information for scaffolding.

This is also something you can do after calling the trimmer. Read
each end of the pair, trim both sequences and then check that they
pass your size threshold. If both pass, then write them to the file
you'll be using for assembly:

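from Bio import SeqIO

adaptor = "GATC"
num_errors = 2
size_thresh = 17

# trim_adaptor is the function from the trimming script; this sketch
# assumes it takes and returns a SeqRecord (slicing keeps the qualities).
# The file names are placeholders for your own paired FASTQ files.
with open("reads_1.fastq") as in1, open("reads_2.fastq") as in2, \
        open("trimmed_1.fastq", "w") as out1, open("trimmed_2.fastq", "w") as out2:
    for pair1, pair2 in zip(SeqIO.parse(in1, "fastq"), SeqIO.parse(in2, "fastq")):
        trim1 = trim_adaptor(pair1, adaptor, num_errors)
        trim2 = trim_adaptor(pair2, adaptor, num_errors)
        # Keep the pair only if both mates survive trimming
        if len(trim1) >= size_thresh and len(trim2) >= size_thresh:
            SeqIO.write([trim1], out1, "fastq")
            SeqIO.write([trim2], out2, "fastq")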
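When the two ends live in separate files, the mates also have to stay in step. The sketch below is a hypothetical `filter_pairs` generator over `(read_id, sequence)` tuples: it takes the trimming function as a parameter, assumes Illumina-style `/1` and `/2` suffixes on mate names, and drops the whole pair when either end is too short.

```python
from typing import Callable, Iterable, Iterator, Tuple

Read = Tuple[str, str]  # (read_id, sequence)

def filter_pairs(mates1: Iterable[Read], mates2: Iterable[Read],
                 trim: Callable[[str], str], size_thresh: int
                 ) -> Iterator[Tuple[Read, Read]]:
    """Yield trimmed mate pairs where both ends pass size_thresh."""
    # zip stops at the shorter input, so both inputs should have equal length
    for (id1, seq1), (id2, seq2) in zip(mates1, mates2):
        # Assumes mate names differ only in a trailing "/1" or "/2"
        if id1.rsplit("/", 1)[0] != id2.rsplit("/", 1)[0]:
            raise ValueError(f"mates out of sync: {id1} vs {id2}")
        trim1, trim2 = trim(seq1), trim(seq2)
        # Drop the whole pair if either end is too short after trimming
        if len(trim1) >= size_thresh and len(trim2) >= size_thresh:
            yield (id1, trim1), (id2, trim2)
```

Raising on mismatched names, rather than skipping, makes an out-of-order input fail loudly instead of silently pairing the wrong reads for scaffolding.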

Hope this helps,
Brad


> I can try to write up something integrating the above features;
> I was just wondering if there is anything out there already and
> whether people find this a sensible approach.
>
> Kind regards,
> Anastasia
> --- On Thu, 10/15/09, Peter <biopython at maubp.freeserve.co.uk> wrote:
> 
> From: Peter <biopython at maubp.freeserve.co.uk>
> Subject: Re: [Biopython] Adaptor trimmer and dimers
> To: "natassa" <natassa_g_2000 at yahoo.com>
> Cc: biopython at lists.open-bio.org
> Date: Thursday, October 15, 2009, 12:20 PM
> 
> On Thu, Oct 15, 2009 at 5:00 PM, natassa <natassa_g_2000 at yahoo.com> wrote:
> > Hello Biopythoners,
> > I followed a recent thread about adaptor trimming, which I intend
> > to do on Illumina runs, and I am not sure where exactly on GitHub
> > I can find Brad Chapman's trimming code AFTER the modifications he
> > made based on that thread. ...
> 
> I guess you mean Brad's August blog post:
> http://bcbio.wordpress.com/2009/08/09/trimming-adaptors-from-short-read-sequences/
> and the following mailing list thread which included some tips on
> speeding up the Biopython side of things:
> http://lists.open-bio.org/pipermail/biopython/2009-August/005417.html
> 
> For anyone else interested, there are some simple examples in the
> tutorial (using SeqRecord slicing - elegant and simple, but a bit slow):
> http://biopython.org/DIST/docs/tutorial/Tutorial.html#sec:FASTQ-slicing-off-primer
> http://biopython.org/DIST/docs/tutorial/Tutorial.html#sec:FASTQ-slicing-off-adaptor
> 
> And I did a blog post about low level FASTQ handling for speed
> at the cost of flexibility and simplicity (using some of the same
> ideas from the August mailing list discussion):
> http://news.open-bio.org/news/2009/09/biopython-fast-fastq/
> 
> Peter
> 
> _______________________________________________
> Biopython mailing list  -  Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython


