[EMBOSS] Conservation of FASTQ scores by the EMBOSS tools.

Peter biopython at maubp.freeserve.co.uk
Wed Sep 16 05:31:22 EDT 2009


On Wed, Sep 16, 2009 at 7:57 AM, Charles Plessy
<charles-listes-emboss at plessy.org> wrote:
>
> Dear EMBOSS developers,
>
> I have a multi-sequence file in FASTQ format that contains sequencing reads, and
> would like to retrieve them with seqret. But as you see in the following
> example, quality scores are not preserved:
>
> $ seqret P13-CA.fq:F1EZY7316JY25B fastq::stdout
> Reads and writes (returns) sequences
> @F1EZY7316JY25B rank=0000040 x=3973.0 y=285.0 length=68
> AATGATACGGCGACCACCGAACACTGCGTTTGCTGGCTTTGATGCACTTCTCATGGCCAATTTCATTG
> +
> """"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""

You need to use "fastq-sanger" (or one of the other variants), since in
EMBOSS "fastq" currently means FASTQ with the qualities ignored.
This is documented here:

http://emboss.sourceforge.net/docs/themes/SequenceFormats.html
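
To see what the line of `"` characters in the quoted output amounts to, it
may help to decode it. This is only a minimal sketch, assuming the standard
Sanger FASTQ encoding (Phred score = ASCII code minus 33); the function name
is illustrative and not part of EMBOSS:

```python
# Assumption: Sanger-style FASTQ, where each quality character encodes
# a Phred score as (ASCII code - 33).
SANGER_OFFSET = 33


def decode_sanger_qualities(quality_line):
    """Convert a Sanger FASTQ quality string into a list of Phred scores."""
    return [ord(ch) - SANGER_OFFSET for ch in quality_line]


# Quality line as in the quoted seqret output: one '"' per base
# (the read header gives length=68).
placeholder = '"' * 68
scores = decode_sanger_qualities(placeholder)
print(set(scores))  # {1}
```

A uniform score of 1 for every base is consistent with the original
qualities having been dropped on input rather than converted.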

As an EMBOSS user, I think the current situation is confusing, and
it would make much more sense to have "fastq" be just an alias for
"fastq-sanger" (which would be consistent with Biopython and BioPerl).
See this emboss-dev post from July:

http://lists.open-bio.org/pipermail/emboss-dev/2009-July/000576.html

See also this email, especially the last example:
http://lists.open-bio.org/pipermail/emboss-dev/2009-July/000599.html

> The purpose was to use seqret as a workaround for the fact that
> vectorstrip does not keep the quality either.

Preserving the qualities in vectorstrip has also been suggested, and is
likely to be supported in future:
http://lists.open-bio.org/pipermail/emboss/2009-August/003722.html

Peter

