[Biopython] still more questions about NGS sequence trimming

Peter Cock p.j.a.cock at googlemail.com
Thu Oct 25 15:29:57 UTC 2012


On Thu, Oct 25, 2012 at 3:49 PM, Kiss, Csaba <csaba.kiss at lanl.gov> wrote:
> Thanks, Peter. I am writing my quality functions. Another question about
> trimming. As you mentioned, the quality of the ends tends to be lower than
> in the middle. Could that be fixed just by using "sff-trim" when I create my
> FASTQ file? If I don't do that I get sequences with small and capital letters.
> Are you suggesting further trimming than just "sff-trim"?

In Bio.SeqIO, the format name "sff" gives the raw sequence data from the
SFF file in full, while "sff-trim" applies the trimming values stored
inside the SFF file. With "sff" the regions marked for trimming are
shown in lower case, which is why you see a mix of small and capital
letters. If you have used the Roche tools you'll see a similar option
in their SFF extraction tool. This default trimming is decided by the
Roche 454 instrument and does quite a good job of removing the
adapters, barcodes and poor quality bits.
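
As a rough sketch of the difference, something like the following could
be used. The function names sff_to_fastq and kept_region are made up for
this illustration, and the file names in the usage note are placeholders.
Only SeqIO.convert and the "sff" / "sff-trim" / "fastq" format names are
Biopython's own.

```python
def sff_to_fastq(sff_path, fastq_path, trim=True):
    """Convert an SFF file to FASTQ and return the number of reads written.

    With trim=True the "sff-trim" format is used, so the clip values stored
    in the SFF file are applied. With trim=False the full reads are written,
    in mixed case.
    """
    from Bio import SeqIO  # requires Biopython to be installed

    in_format = "sff-trim" if trim else "sff"
    return SeqIO.convert(sff_path, in_format, fastq_path, "fastq")


def kept_region(mixed_case_seq):
    """Return the span from the first to the last upper-case letter.

    Illustrative helper (not part of Biopython): it mimics on a plain string
    what the default trimming does to a mixed-case read, assuming lower case
    only occurs at the two ends.
    """
    upper_positions = [i for i, base in enumerate(mixed_case_seq) if base.isupper()]
    if not upper_positions:
        return ""
    return mixed_case_seq[upper_positions[0]:upper_positions[-1] + 1]
```

For example, sff_to_fastq("reads.sff", "reads_trimmed.fastq") would write
the trimmed reads, and passing trim=False would keep the lower-case ends.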

I assume you were using Mothur to do further trimming based on a
more stringent sliding window of quality scores?
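
For reference, a minimal sketch of a sliding window quality trim is
below. This is not Mothur's algorithm, just one simple way to do it. The
function name sliding_window_cut, the window size and the quality
threshold are all assumptions chosen for illustration.

```python
def sliding_window_cut(quals, window=10, min_mean=20.0):
    """Return the length to keep from the start of a read.

    Scans windows from left to right and stops at the first window whose
    mean quality drops below min_mean. The read is cut at the start of
    that window. If no window fails, the whole read is kept.
    """
    if not quals:
        return 0
    # Reads shorter than the window are treated as a single window.
    window = min(window, len(quals))
    for start in range(len(quals) - window + 1):
        chunk = quals[start:start + window]
        if sum(chunk) / len(chunk) < min_mean:
            return start
    return len(quals)
```

With a SeqRecord parsed from FASTQ, the scores are available as
record.letter_annotations["phred_quality"], and the trimmed read would
then be record[:cut] where cut is the value returned above.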

Peter


