[EMBOSS] Conservation of FASTQ scores by the EMBOSS tools.
Peter Rice
pmr at ebi.ac.uk
Thu Sep 17 03:24:07 EDT 2009
Charles Plessy wrote:
> I would also like it if the qualities were kept by default. I actually
> had tried
> to force the fastq-sanger format before, but by adding its name to the USAs,
> like in ‘seqret fastq-sanger::stdin fastq-sanger::stdout’. Unfortunately it did
> not work; I do not know if it is by design or because of the dash in the format
> name. Nevertheless -sformat=fastq-sanger and -osformat=fastq-sanger worked very
> well after I applied Mahmut's patch.
Yes, the dash in the format name is causing problems. It should be
allowed where there is a '::' in the USA (it is not allowed in database
queries because of the dbname-field:value query syntax).
I will make a patch for this.
> I am tempted to apply it also to the Debian EMBOSS package, but maybe it is too
> premature. In particular, I have the following warning each time the quality
> is encoded by an equal sign:
>
> Warning: Illegal character '='
> Warning: Illegal pattern: =
This is surprising. Is your EMBOSS version the original distribution, or
have you applied the current patches?
If it fails with the patched version, could you send me an input file
that causes this error?
> By the way, I think I found a bug in revseq: it seems that it does not reverse
> the qualities:
True ... this I will also patch. We have used qualities for some years
(in Staden experiment format) but it appears nobody has reversed
sequences and kept the qualities. Life is changing with FASTQ data!
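
To illustrate what reversing with qualities should do, here is a minimal
Python sketch. It is not the EMBOSS code; the function name
reverse_complement_with_quality is hypothetical, and it assumes one
quality character per base, as in FASTQ.

```python
# Illustrative sketch only, not the revseq implementation.
# Assumes a DNA sequence and a quality string of the same length.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement_with_quality(seq, qual):
    """Return the reverse complement and the qualities in matching order."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality must have the same length")
    # Complement each base, then reverse the whole sequence.
    rc_seq = seq.translate(_COMPLEMENT)[::-1]
    # Qualities are only reversed, so each score stays with its base.
    rc_qual = qual[::-1]
    return rc_seq, rc_qual


rc_seq, rc_qual = reverse_complement_with_quality("ACGTN", "IH=5!")
print(rc_seq, rc_qual)  # NACGT !5=HI
```

If only the sequence is reversed, the first quality score ends up
attached to what was the last base, which is the bug reported above.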
> Also, contrary to what the documentation predicts, using the fastq format
> for the output does not ignore the quality scores. (Not that it would be
> particularly useful, but…)
This is deliberate. We have to write something in FASTQ format and we
default to the fastq-sanger format. On input, plain fastq ignores
qualities because there is no safe way to decide which variant is correct.
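
A small Python sketch of why the variant cannot be guessed safely. The
offsets used (33 for Sanger, 64 for Illumina) are the commonly documented
FASTQ conventions, assumed here rather than taken from the EMBOSS source,
and decode_quality is a hypothetical helper.

```python
# Illustrative sketch only; offsets are assumed from the usual FASTQ
# conventions, not read from EMBOSS.
OFFSETS = {"fastq-sanger": 33, "fastq-illumina": 64}


def decode_quality(qual, variant):
    """Convert a quality string to integer scores for the given variant."""
    offset = OFFSETS[variant]
    return [ord(ch) - offset for ch in qual]


# '=' is ASCII 61: a normal score as Sanger, negative as Illumina.
print(decode_quality("=", "fastq-sanger"))    # [28]
print(decode_quality("=", "fastq-illumina"))  # [-3]

# 'h' is ASCII 104: both readings are non-negative, so the characters
# alone do not say which variant was intended.
print(decode_quality("h", "fastq-sanger"))    # [71]
print(decode_quality("h", "fastq-illumina"))  # [40]
```

A low character such as '=' can rule out the offset-64 reading, but a
read made only of high characters is consistent with both.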
regards,
Peter