[Biopython-dev] Line wrapping in FASTQ output

Peter Rice pmr at ebi.ac.uk
Thu Jul 23 04:08:51 EDT 2009


Peter C. wrote:
> Hi Peter R. et al,
>
> For Biopython we should be able to cope with any strange line breaks in
> the sequence and quality lines on input, but on output we don't do
> any line wrapping. I felt this would result in more widely parseable
> output. I wondered what your thought process was, and if you think it
> is worth removing the line wrapping on EMBOSS's FASTQ output (or
> indeed, if you have a good argument to convince me to make Biopython
> output FASTQ with line wrapping by default).

There is also an issue with making the lines so long that brain-damaged 
parsers (those that read a line in C and fail to check it was a complete 
line) will fail.
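
To illustrate the failure mode, here is a rough Python sketch that mimics a C parser calling fgets() with a fixed buffer and never checking for the trailing newline. The function name and the 512-character buffer size are assumptions chosen for illustration, not taken from any real parser.

```python
import io


def read_lines_fixed_buffer(handle, bufsize=512):
    """Mimic C's fgets(buf, bufsize, fp) used without a newline check.

    Hypothetical helper: each call returns at most bufsize - 1 characters,
    so an over-long line is silently split into several "lines".
    """
    lines = []
    while True:
        chunk = handle.readline(bufsize - 1)
        if not chunk:
            break
        lines.append(chunk.rstrip("\n"))
    return lines


# One unwrapped record with a 600-base read.
record = "@read1\n" + "A" * 600 + "\n+\n" + "I" * 600 + "\n"
naive = read_lines_fixed_buffer(io.StringIO(record))

# The parser sees six "lines" instead of four, so its idea of which
# line is the sequence and which is the quality is now wrong.
print(len(naive), [len(line) for line in naive])
```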

Leaving the line breaks in was deliberate in EMBOSS 6.1.0 to see whether 
any parsers would object.

The obvious compromise is to increase the default line length in EMBOSS 
to, say, 500 so that anyone reading up to 512 characters will still be 
safe. Unfortunately some folk will then assume there will never be a line 
break.
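
As a sketch of what a configurable line length could look like on the output side (this is not the EMBOSS or Biopython code; the function name and the width parameter are made up for illustration):

```python
def format_fastq(title, seq, qual, width=None):
    """Return one FASTQ record as text.

    Hypothetical helper: width=None (or 0) writes sequence and quality
    on one line each; a positive width wraps both at that many characters.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ")

    def wrap(text):
        if not width:
            return [text]
        # Keep one (empty) line even for an empty sequence.
        return [text[i:i + width] for i in range(0, len(text), width)] or [""]

    lines = ["@" + title] + wrap(seq) + ["+"] + wrap(qual)
    return "\n".join(lines) + "\n"


# No wrapping, and wrapping at 500 as suggested above.
unwrapped = format_fastq("read1", "ACGT" * 300, "I" * 1200)
wrapped = format_fastq("read1", "ACGT" * 300, "I" * 1200, width=500)
```

With a 1200-base read, the unwrapped record has four lines and the record wrapped at 500 has eight.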

Alternatively, we could truly make everything fit on one line.

Or we could double up the fastq outputs with and without line breaks 
(horrible problems with naming the output formats).

I suspect this one-line thing is a simple attempt to avoid the "quality 
line starting with '@' or '+'" issue.
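
For what it is worth, a reader can sidestep that ambiguity by counting characters: once the sequence length is known, it keeps reading quality lines until it has that many characters, whatever they start with. A minimal sketch follows, assuming a plain "+" separator line or one that repeats the title; the function name is hypothetical and this is not the actual Biopython or EMBOSS parser.

```python
import io


def parse_fastq(handle):
    """Yield (title, sequence, quality) tuples, tolerating line wrapping.

    Hypothetical sketch: the quality string is read by length, so a
    wrapped quality line starting with '@' or '+' is not misread as
    the start of a new record.
    """
    line = handle.readline()
    while line:
        if not line.strip():
            # Skip blank lines between records.
            line = handle.readline()
            continue
        if not line.startswith("@"):
            raise ValueError("expected '@' at start of record")
        title = line[1:].rstrip()

        # Sequence lines run until the '+' separator line.
        seq_parts = []
        line = handle.readline()
        while line and not line.startswith("+"):
            seq_parts.append(line.strip())
            line = handle.readline()
        if not line:
            raise ValueError("missing '+' separator line")
        seq = "".join(seq_parts)

        # Quality lines are consumed until the lengths match.
        qual = ""
        while len(qual) < len(seq):
            line = handle.readline()
            if not line:
                raise ValueError("quality string is truncated")
            qual += line.strip()
        if len(qual) != len(seq):
            raise ValueError("quality string is longer than the sequence")

        yield title, seq, qual
        line = handle.readline()


# The second quality line of r1 starts with '@', and r2's with '+'.
text = "@r1\nACGT\nACGT\n+\nIIII\n@+II\n@r2\nGG\n+\n+@\n"
records = list(parse_fastq(io.StringIO(text)))
```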

> [I nearly CC'd BioPerl-l with this. In fact, this topic strikes me as
> ideal for an OBF cross project mailing list, something we talked about
> at BOSC/ISMB 2009. Am I right in thinking you (Peter Rice) were going
> to look into this?]

Yes indeed I was. Waylaid by the demands of the 6.1.0 EMBOSS release but 
I will get back on to it.

regards,

Peter

