[EMBOSS] fasta single-line sequence format?
    Niels Larsen 
    niels at genomics.dk
       
    Tue Aug 27 16:40:07 UTC 2013
    
    
  
> Ah, but can you trust the first record? If it is a relatively short 
> sequence it may be on one line, but later sequences may wrap. Depends on 
> the record limit.
> 
> As to the format name .... a name beginning 'fasta-' would be easiest to 
> document. For FASTQ we used fastq, fastq-sanger, fastq-solexa, and so on.
> 
Indeed, file sampling isn't watertight, but I still think the programmatic
equivalent of this:

    head -n 2000 file | grep '^>' | wc --lines

(where the output number is 1000 if the file is unwrapped) is much faster
than a watertight, bulletproof check of the whole file, given the very
large files being handled. Besides, when did I last see a fasta file with
580 long sequence lines .. can't think of it.
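
A minimal sketch of that sampling check in Python, for illustration only. The
function name looks_unwrapped and the sample_lines parameter are made up here
and are not part of EMBOSS. Like the shell pipeline, it only counts header
lines in the first 2000 lines, so it is a heuristic and later records may
still wrap.

```python
from itertools import islice


def looks_unwrapped(path, sample_lines=2000):
    """Guess whether a FASTA file has one sequence line per record.

    Hypothetical helper: reads at most `sample_lines` lines from the top
    of the file and compares the number of '>' header lines with the
    number of lines read. Records beyond the sample are never inspected.
    """
    headers = 0
    total = 0
    with open(path, "r") as handle:
        # islice stops after sample_lines lines, so huge files stay cheap.
        for line in islice(handle, sample_lines):
            total += 1
            if line.startswith(">"):
                headers += 1
    # Unwrapped records are two lines each (header + sequence), so
    # exactly half of the sampled lines should be headers.
    return total > 0 and headers * 2 == total
```

With the default sample of 2000 lines this returns True when 1000 of them are
headers, which matches the count the pipeline above would print.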
Niels L
> regards,
> 
> Peter Rice
> EMBOSS Team
    
    