[Biopython] SeqIO for fasta conversion of Illumina files with > 60 bp

Peter biopython at maubp.freeserve.co.uk
Tue Sep 15 15:08:13 UTC 2009


On Tue, Sep 15, 2009 at 4:02 PM, natassa <natassa_g_2000 at yahoo.com> wrote:
>
>> That does look like a FASTQ file, and you probably know that it
>> came from a Solexa/Illumina machine. However, it could be an early
>> Solexa/Illumina file using Solexa scores ("fastq-solexa" in SeqIO),
>> or a more recent Illumina GA pipeline 1.3+ FASTQ file with PHRED
>> scores ("fastq-illumina" in SeqIO). From the read length (76bp) I
>> would guess this is probably a "fastq-illumina" file, but you
>> should double check this, as it does matter for poor quality reads.
>
> Because you created some doubts in my already confused mind:
> The machine is indeed Solexa/Illumina. I have 55bp and 76 bp
> reads from pipeline v1.3 and v1.4, respectively. In the pipeline
> manuals they say that the scoring scheme is Phred. I know
> there is a lot of confusion about the terms, which is why I
> preferred to use SeqIO - I hope I did not mix up the formats....

That's fine then - the Solexa/Illumina 1.3 and 1.4 pipelines use
PHRED scores (with a FASTQ ASCII offset of 64), and in
Biopython we call this the "fastq-illumina" format.
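
To make the offset concrete, here is a rough sketch in plain Python (no
Biopython needed) of how an offset-64 quality string maps to PHRED scores,
plus the usual Solexa-to-PHRED formula to show why the distinction only
matters for poor quality reads. The function names are made up for this
illustration and are not part of SeqIO.

```python
import math

ILLUMINA_OFFSET = 64  # ASCII offset of the Illumina 1.3+ FASTQ variant


def decode_illumina_quality(quality_string):
    """Return the PHRED scores for a quality string encoded with offset 64."""
    return [ord(char) - ILLUMINA_OFFSET for char in quality_string]


def solexa_to_phred(solexa_score):
    """Convert one Solexa score to the equivalent PHRED score (a float)."""
    return 10 * math.log10(10 ** (solexa_score / 10.0) + 1)


if __name__ == "__main__":
    # 'h' is ASCII 104 and 'B' is ASCII 66, so with offset 64 they
    # decode to PHRED 40 and PHRED 2 respectively.
    print(decode_illumina_quality("hhB"))

    # High Solexa scores are almost identical to PHRED scores; the two
    # scales only drift apart at the low (poor quality) end.
    for score in (40, 10, 0, -5):
        print(score, round(solexa_to_phred(score), 2))
```

If your Biopython version provides SeqIO.convert, then something like
SeqIO.convert("reads.fastq", "fastq-illumina", "reads.fasta", "fasta")
should handle the FASTA conversion itself (the file names here are just
placeholders).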

Peter



