[Biopython-dev] Line wrapping in FASTQ output
Peter Rice
pmr at ebi.ac.uk
Thu Jul 23 08:08:51 UTC 2009
Peter C. wrote:
> Hi Peter R. et al,
>
> For Biopython we should be able to cope with any strange line breaks in
> the sequence and quality lines on input, but for output we don't do
> any line wrapping. I felt this would result in more widely parseable
> output. I wondered what your thought process was, and if you think it
> is worth removing the line wrapping on EMBOSS's FASTQ output (or
> indeed, if you have a good argument to convince me to make Biopython
> output FASTQ with line wrapping by default).
There is also an issue with making the lines so long that brain-damaged
parsers (those that read a line in C and fail to check it was a complete
line) will fail.
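
To illustrate the failure mode, here is a minimal sketch in Python rather than C. It assumes that readline(size) on a text handle behaves much like a bounded C read: it returns at most size characters and may stop mid-line. The helper name read_full_line and the 512-character buffer are only illustrative, not taken from any real parser.

```python
import io


def read_full_line(handle, bufsize=512):
    """Read one whole line, however long, in chunks of at most bufsize characters."""
    chunks = []
    while True:
        chunk = handle.readline(bufsize)
        chunks.append(chunk)
        # An empty chunk means end of file; a trailing newline means
        # the line is complete. Otherwise keep reading the same line.
        if not chunk or chunk.endswith("\n"):
            return "".join(chunks)


# A single unwrapped line of 1000 characters plus a newline.
long_line = "A" * 1000 + "\n"

# What a careless parser sees: one bounded read, no completeness check.
naive = io.StringIO(long_line).readline(512)

# What a careful parser sees: the whole line.
careful = read_full_line(io.StringIO(long_line))
```

The careless read silently returns a fragment with no trailing newline, which is exactly the check such parsers skip.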
Leaving the line breaks in was deliberate in EMBOSS 6.1.0 to see whether
any parsers would object.
The obvious compromise is to increase the default line length in EMBOSS
to say 500 so that anyone reading up to 512 characters will still be
safe. Unfortunately some flk will then assume there will never be a line
break.
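
As a rough sketch of what a configurable line length could look like on the output side (this is not EMBOSS or Biopython code; the function name format_fastq and its width parameter are hypothetical), something like the following would cover both the wrapped and the one-line cases:

```python
def format_fastq(title, seq, qual, width=0):
    """Return one FASTQ record as text.

    width is the maximum length of each sequence/quality line;
    width=0 (the default here) means no line wrapping at all.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ")

    def wrap(text):
        # Split text into pieces of at most `width` characters.
        if width <= 0 or not text:
            return [text]
        return [text[i:i + width] for i in range(0, len(text), width)]

    lines = ["@" + title] + wrap(seq) + ["+"] + wrap(qual)
    return "\n".join(lines) + "\n"


unwrapped = format_fastq("r1", "ACGTAC", "IIII@+")
wrapped = format_fastq("r1", "ACGTAC", "IIII@+", width=4)
```

Note that with width=4 the second quality line of this record begins with '@', which is the kind of output that trips up parsers expecting one line per field.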
Alternatively, we could truly make everything fit on one line.
Or we could double up the fastq outputs with and without line breaks
(horrible problems with naming the output formats).
I suspect this one-line thing is a simple attempt to avoid the "quality
line starting with '@' or '+'" issue.
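
One way around the '@' or '+' ambiguity when lines are wrapped is to stop trusting the first character of quality lines and instead read quality text until it is as long as the sequence. The sketch below assumes well-formed records in which sequence lines never begin with '+'; parse_fastq is a hypothetical name, not the Biopython or EMBOSS parser.

```python
import io


def parse_fastq(handle):
    """Yield (title, seq, qual) tuples, tolerating wrapped lines."""
    lines = (line.rstrip("\r\n") for line in handle)
    for line in lines:
        if not line:
            continue  # skip blank lines between records
        if not line.startswith("@"):
            raise ValueError("expected '@' title line, got %r" % line)
        title = line[1:]

        # Sequence lines run until the '+' separator line.
        seq_parts = []
        for line in lines:
            if line.startswith("+"):
                break
            seq_parts.append(line)
        else:
            raise ValueError("missing '+' line for record %r" % title)
        seq = "".join(seq_parts)

        # Quality lines are counted by length, so a leading '@' or '+'
        # on a wrapped quality line is never mistaken for a new record.
        qual_parts = []
        qual_len = 0
        while qual_len < len(seq):
            try:
                line = next(lines)
            except StopIteration:
                raise ValueError("truncated quality for record %r" % title)
            qual_parts.append(line)
            qual_len += len(line)
        if qual_len != len(seq):
            raise ValueError("quality longer than sequence in %r" % title)
        yield title, seq, "".join(qual_parts)


example = io.StringIO("@r1\nACGT\nAC\n+\nIIII\n@+\n@r2\nGG\n+r2\n+@\n")
records = list(parse_fastq(example))
```

Here the wrapped quality line "@+" in the first record and the quality line "+@" in the second are both read as quality data, because the parser is still short of the sequence length when it meets them.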
> [I nearly CC'd BioPerl-l with this. In fact, this topic strikes me as
> ideal for an OBF cross project mailing list, something we talked about
> at BOSC/ISMB 2009. Am I right in thinking you (Peter Rice) were going
> to look into this?]
Yes indeed I was. Waylaid by the demands of the EMBOSS 6.1.0 release, but
I will get back on to it.
regards,
Peter