[Open-bio-l] FASTQ support in Biopython, BioPerl, and EMBOSS

Aaron Mackey ajmackey at gmail.com
Thu Jul 30 23:52:03 UTC 2009


I would strongly warn against truncation, for any reason.  Use the formulas
you have for quality-encoding conversions, but do not assume that you know
more than I do about what my data contains, or that you are in any way
helping me by altering my data, silently or otherwise.  Said another way,
feel free to warn me that my data may contain garbage, and utterly fail to
convert it for me, but do not try to fix it for me.
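
To make that concrete, here is a rough Python sketch of the behaviour I am
asking for, not what any of the three projects actually implements.  The
function name and constants are made up for illustration; the offsets assume
Sanger FASTQ stores PHRED 0-93 at ASCII 33-126 and Illumina 1.3+ FASTQ stores
PHRED 0-62 at ASCII 64-126.

```python
SANGER_OFFSET = 33        # assumed: Sanger FASTQ, PHRED 0..93 as ASCII 33..126
ILLUMINA_OFFSET = 64      # assumed: Illumina 1.3+ FASTQ, PHRED 0..62 as ASCII 64..126
ILLUMINA_MAX_PHRED = 62   # highest PHRED score that fits below ASCII 127


def sanger_to_illumina(quality_string):
    """Hypothetical helper: re-encode Sanger qualities as Illumina 1.3+.

    Raises ValueError for any score that does not fit, instead of
    silently truncating it to the maximum.
    """
    converted = []
    for position, char in enumerate(quality_string):
        phred = ord(char) - SANGER_OFFSET
        if phred < 0 or phred > ILLUMINA_MAX_PHRED:
            # Refuse to convert; the caller decides what to do with the data.
            raise ValueError(
                "PHRED score %d at position %d cannot be represented in "
                "Illumina 1.3+ FASTQ (valid range 0-%d)"
                % (phred, position, ILLUMINA_MAX_PHRED)
            )
        converted.append(chr(phred + ILLUMINA_OFFSET))
    return "".join(converted)
```

A caller who really does want capped scores can catch the ValueError and cap
them explicitly, which keeps that decision with the owner of the data.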

-Aaron

On Thu, Jul 30, 2009 at 5:50 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:

> On Thu, Jul 30, 2009 at 9:08 PM, Chris Fields <cjfields at illinois.edu>
> wrote:
> >
> > I do think if it affects performance to a significant enough degree we
> > can do this silently, we just need to ensure this is well-documented.
>
> Agreed.
>
> > My opinion is that this use will prove to be an edge case anyway (most
> > will want conversion to Sanger vs. Illumina/Solexa).
>
> Absolutely.
>
> Going from Solexa/Illumina to Sanger FASTQ will be more common
> (and there are no truncation issues). Going from Sanger FASTQ to
> Solexa or Illumina FASTQ will be rarer, and while a truncation is
> possible it requires very high scores (above PHRED 62) which are
> likely only to be possible from a consensus alignment or such like.
> i.e. Yes, it should be an edge case.
>
> I guess this expected usage supports the argument about issuing a
> warning on truncation, even with a modest performance overhead
> (because it only slows down the rarer expected usage).
>
> But let's get some benchmarks done to help settle this...
>
> Peter