[Biopython] still more questions about NGS sequence trimming

Kiss, Csaba csaba.kiss at lanl.gov
Thu Oct 25 11:34:46 EDT 2012


I believe mothur checks the moving-average quality of a sequence with a sliding window of 50 bp. If the average falls below the given threshold, it discards the sequence. I don't think it does any end trimming besides removing the lowercase letters from the ends.
Of course, it can also remove adapter and primer sequences, but that isn't based on quality values.
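
The check described above could be sketched in plain Python roughly as follows. This is not mothur's code: the function name, the default threshold of 25, and the handling of reads shorter than the window are all assumptions made for illustration.

```python
def passes_sliding_window(quals, window=50, min_avg=25.0):
    """Return True if every full window of `window` scores averages >= min_avg.

    `quals` is a list of per-base PHRED scores. Reads shorter than the
    window are judged on their overall average (an assumption).
    """
    if not quals:
        return False
    if len(quals) <= window:
        return sum(quals) / len(quals) >= min_avg

    # Keep a running sum so each step costs O(1) instead of O(window).
    running = sum(quals[:window])
    if running / window < min_avg:
        return False
    for i in range(window, len(quals)):
        running += quals[i] - quals[i - window]
        if running / window < min_avg:
            return False
    return True
```

With Biopython, the per-base scores of a FASTQ record are available as record.letter_annotations["phred_quality"], so a record would be kept when passes_sliding_window() returns True for that list.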

C
-----Original Message-----
From: Peter Cock [mailto:p.j.a.cock at googlemail.com] 
Sent: Thursday, October 25, 2012 9:30 AM
To: Kiss, Csaba
Cc: biopython at lists.open-bio.org
Subject: Re: [Biopython] still more questions about NGS sequence trimming

On Thu, Oct 25, 2012 at 3:49 PM, Kiss, Csaba <csaba.kiss at lanl.gov> wrote:
> Thanks, Peter. I am writing my quality functions. Another question
> about trimming. As you mentioned, the quality of the ends tends to be
> lower than in the middle. Could that be fixed just by using "sff-trim"
> when I create my FASTQ file? If I don't do that, I get sequences with
> lowercase and uppercase letters.
> Are you suggesting further trimming beyond just "sff-trim"?

In Bio.SeqIO, we use the file format names "sff" and "sff-trim" to mean the raw sequence data from the SFF file in full, or with the trimming values inside the SFF file applied. If you have used the Roche tools you'll see a similar option in their SFF extraction tool. This default trimming is decided by the Roche 454 instrument and does quite a good job at removing the adapters, barcodes and poor quality bits.
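
To make the difference concrete: in the untrimmed "sff" output, the clipped flanks appear in lowercase and the retained region in uppercase. The sketch below mimics the effect of applying the clip, using plain strings rather than a real SFF file. The helper name trim_lowercase_flanks is hypothetical and not part of Biopython.

```python
def trim_lowercase_flanks(seq, quals):
    """Strip leading and trailing lowercase bases and their quality scores.

    Mimics the effect of applying the SFF clip values to a mixed-case read.
    Lowercase letters in the middle of the read are left alone.
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality list differ in length")

    start = 0
    while start < len(seq) and seq[start].islower():
        start += 1
    end = len(seq)
    while end > start and seq[end - 1].islower():
        end -= 1
    return seq[start:end], quals[start:end]


# Example: a 4-base lowercase key/adapter, 8 retained bases, 2 clipped bases.
trimmed_seq, trimmed_quals = trim_lowercase_flanks("tcagACGTACGTnn", list(range(14)))
```

In practice there is no need to do this by hand: something like SeqIO.convert("reads.sff", "sff-trim", "reads.fastq", "fastq") writes the trimmed reads directly, where reads.sff and reads.fastq are placeholder file names.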

I assume you were using Mothur to do further trimming based on a more stringent sliding window of quality scores?

Peter


