[emboss-dev] EMBOSS seqret FASTQ support

Fri Jul 24 10:01:11 EDT 2009

On Fri, Jul 24, 2009 at 11:14 AM, Peter Rice <pmr at ebi.ac.uk> wrote:
>
> Peter C. wrote:
>> I'd like to re-test with your fixes. I presume these things are being fixed
>> in the public CVS repository, so I could try building EMBOSS from there.
>> Is there a particular branch? Or are you planning an EMBOSS 6.1.1
>> release shortly?
>
> You found various things in sequence formats, but all are resolved by
> changes to ajseqread.c and ajseqwrite.c
>
> Assuming I am happy with the test I plan to make a patch which will
> update those files.

If issuing patches is how you prefer to handle this, that's fine with me.
Will you also update the binaries for Windows users, etc.?

> The CVS code would have new things for the next release. For now,
> if you are using the 6.1.0 release, patching is the way to go.

So if I want to retest with your fixes, I can either use CVS or wait for
the patches?

> Fixes so far:
>
> FASTQ format changes:
>
> * sequence and quality scores on one line

That does seem to be preferred in general.

> * quality ID line shortened to '+'

This is certainly how MAQ does it, and since MAQ is a Sanger-based tool,
that gives the convention some status, in addition to the file size benefit ;)
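
For concreteness, a record in that layout (sequence and quality each on a
single line, with a bare '+' separator) could be written like this in Python.
This is only a minimal sketch: format_fastq is a made-up helper name for
illustration, not an EMBOSS or Biopython function.

```python
def format_fastq(title, seq, qual):
    """Return one FASTQ record as a four-line string.

    Hypothetical helper for illustration only. The sequence and the
    quality string each go on a single line, and the separator line is
    a bare '+' with no repeated identifier.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ")
    return "@%s\n%s\n+\n%s\n" % (title, seq, qual)
```

Repeating the identifier after the '+' is optional in the format, so leaving
it out loses no information.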

> * Solexa negative quality score output corrected
> * Phred quality score rounding error fixed

Were the above two the same issue?
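
For reference, the usual mapping between the two scales can be sketched in
Python as below. The function names are illustrative only (they are not the
EMBOSS or Biopython API), and the sketch says nothing about where the EMBOSS
bugs actually were. It assumes the conventional Solexa FASTQ encoding with an
ASCII offset of 64.

```python
import math


def solexa_to_phred(q_solexa):
    """Convert a Solexa quality score to the equivalent PHRED score.

    Uses the standard relation Q_phred = 10 * log10(10**(Q_solexa/10) + 1).
    Returns a float; the caller decides how to round it.
    """
    return 10 * math.log10(10 ** (q_solexa / 10.0) + 1)


def solexa_to_char(q_solexa):
    """Encode a Solexa score as a FASTQ character (assumed offset 64).

    Rounds to the nearest integer first. Plain int() would truncate
    toward zero, which gives a different answer for negative scores.
    """
    return chr(int(round(q_solexa)) + 64)
```

Solexa scores can go as low as -5, which encodes as ';' at offset 64. Since
int(-0.6) is 0 while round(-0.6) is -1, truncation and rounding disagree for
negative values, so negative scores are worth a test case of their own.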

> * Corrected reading of quality lines starting with '@'

Great. Can you read this file correctly now?
http://biopython.org/SRC/biopython/Tests/Quality/tricky.fastq
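
The awkward case is that '@' is also a valid quality character, so a line
starting with '@' is not necessarily the start of a new record. One way to
cope, sketched below in Python, is to keep reading quality lines until their
total length matches the sequence length. This is a minimal sketch under that
assumption; parse_fastq is a hypothetical name, and this is not how EMBOSS or
Biopython necessarily implement it.

```python
def parse_fastq(lines):
    """Yield (title, sequence, quality) tuples from an iterable of lines.

    Illustrative sketch only. Sequence and quality may be wrapped over
    several lines. The quality block is delimited by length, not by
    looking for '@', so quality lines may start with '@' or '+'.
    """
    it = (ln.rstrip("\r\n") for ln in lines)
    line = next(it, None)
    while line is not None:
        if not line:
            # Skip blank lines between records.
            line = next(it, None)
            continue
        if not line.startswith("@"):
            raise ValueError("expected '@' at record start, got %r" % line)
        title = line[1:]

        # Sequence lines run until the '+' separator line.
        seq_parts = []
        for line in it:
            if line.startswith("+"):
                break
            seq_parts.append(line)
        else:
            raise ValueError("no '+' line found for record %r" % title)
        seq = "".join(seq_parts)

        # Quality lines run until we have as many characters as bases.
        qual_parts = []
        qual_len = 0
        while qual_len < len(seq):
            line = next(it, None)
            if line is None:
                raise ValueError("truncated quality for record %r" % title)
            qual_parts.append(line)
            qual_len += len(line)
        if qual_len != len(seq):
            raise ValueError("quality longer than sequence in %r" % title)

        yield title, seq, "".join(qual_parts)
        line = next(it, None)
```

Usage would be along the lines of list(parse_fastq(open("tricky.fastq"))),
assuming the file has been downloaded locally.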

> GenBank format changes:
>
> * protein (genpept and refseqp) formats auto-detect fix for multiple
> input sequences
>
> Intelligenetics format:
>
> * Sequence ID corrected for DOS format input file
>
> Did I miss anything?

I think that's everything. Thank you!

Peter