[Open-bio-l] FASTQ records with no sequence?
biopython at maubp.freeserve.co.uk
Thu Jul 30 11:35:25 EDT 2009
On the continuing topic of the nebulous FASTQ format, are there
any strong views as to whether a FASTQ file could hold records
without a sequence (and therefore no quality scores)? This could
make sense as output from an (aggressive) quality filter.
This was a discussion I meant to start on the OBF list, not the
EMBOSS list - so here is the start of the thread:
Basically in some contexts an empty FASTQ record makes sense,
so perhaps we should include examples of this for our test suite.
However, there is more than one reasonable way to represent
such a record (either omitting the sequence and quality lines, or
including blank sequence and quality lines).
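To make the two candidate representations concrete, here is a minimal
sketch in Python. The function name format_record and its
keep_blank_lines flag are hypothetical, made up for illustration only;
they are not part of Biopython, EMBOSS, or any other existing library.

```python
def format_record(title, seq="", qual="", keep_blank_lines=True):
    """Return one unwrapped FASTQ record as a string.

    Hypothetical helper for illustration only. For an empty record,
    keep_blank_lines selects between the two candidate forms:
    four lines with blank sequence/quality lines, or just two lines
    (the '@' title line and the '+' line).
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ")
    if seq or keep_blank_lines:
        # Four line form: title, sequence, '+', quality.
        return "@%s\n%s\n+\n%s\n" % (title, seq, qual)
    # Two line form: sequence and quality lines omitted entirely.
    return "@%s\n+\n" % title


four_line_form = format_record("empty")
two_line_form = format_record("empty", keep_blank_lines=False)
```

With the four line form every record, empty or not, occupies the same
number of lines, which keeps simple line-counting tools working.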
On Thu, Jul 30, 2009 at 4:09 PM, Peter Rice<pmr at ebi.ac.uk> wrote:
> Peter C. wrote:
>> As we are recommending no line wrapping on output this means
>> typical FASTQ records would be four lines - so doing the same
>> makes sense here too.
> I vote for 4 lines on output.
If we want to allow zero length sequences, then yes, I would also
vote for the 4 line output (i.e. blank lines for the sequence and
the quality string).
> It should be possible to allow zero lines on input depending on
> where the '+' check is.
Yes, I'm pretty sure a parser could cope with any of the zero length
sequence FASTQ examples I gave.
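As a rough illustration of how a parser might cope, here is a sketch
that accepts both forms of an empty record. It is not how Biopython or
EMBOSS do it; parse_fastq is a hypothetical name, and the approach
(read sequence lines up to the '+' line, then read quality lines until
their length matches the sequence) is just one assumed strategy.

```python
def parse_fastq(lines):
    """Yield (title, sequence, quality) tuples from an iterable of lines.

    Illustrative sketch only. Accepts empty records written either with
    blank sequence/quality lines or with those lines omitted.
    """
    it = iter(lines)
    line = next(it, None)
    while line is not None:
        line = line.rstrip("\r\n")
        if not line:
            # Blank line between records, or the blank quality line
            # left over from a four line empty record.
            line = next(it, None)
            continue
        if not line.startswith("@"):
            raise ValueError("expected '@' title line, got %r" % line)
        title = line[1:]

        # Sequence lines run until the '+' separator; there may be none.
        seq_parts = []
        for line in it:
            line = line.rstrip("\r\n")
            if line.startswith("+"):
                break
            seq_parts.append(line.strip())
        else:
            raise ValueError("no '+' line for record %r" % title)
        seq = "".join(seq_parts)

        # Read quality by length, since a quality line may itself start
        # with '@'. For an empty sequence this reads nothing at all.
        qual = ""
        while len(qual) < len(seq):
            line = next(it, None)
            if line is None:
                raise ValueError("truncated quality for %r" % title)
            qual += line.rstrip("\r\n")
        if len(qual) != len(seq):
            raise ValueError("quality longer than sequence for %r" % title)

        yield title, seq, qual
        line = next(it, None)


# Made-up example data: a normal record, then both empty forms.
example = "@r1\nACGT\n+\n@III\n@empty4\n\n+\n\n@empty2\n+\n@r2\nAC\n+\nII\n"
records = list(parse_fastq(example.splitlines(True)))
```

Reading the quality by length rather than by looking for the next '@'
is what lets the zero length case fall out without special handling.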