[Biopython] SeqIO for fasta conversion of Illumina files with > 60 bp

natassa natassa_g_2000 at yahoo.com
Tue Sep 15 15:02:06 UTC 2009


That does look like a FASTQ file, and you probably know that it
came from a Solexa/Illumina machine. However, it could be an early
Solexa/Illumina file using Solexa scores ("fastq-solexa" in SeqIO),
or a more recent Illumina GA pipeline 1.3+ FASTQ file with PHRED
scores ("fastq-illumina" in SeqIO). From the read length (76bp) I
would guess this probably is a "fastq-illumina" file, but you
should double check this, as it does matter for poor quality reads.

Because you created some doubts in my already confused mind:
The machine is indeed Solexa/Illumina. I have 55 bp and 76 bp reads from pipeline v1.3 and v1.4, respectively. The pipeline manuals say that the scoring scheme is Phred. I know there is a lot of confusion about the terms, which is why I preferred to use SeqIO. I hope I did not mix up the formats.
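
To make the difference between the two encodings concrete, here is a minimal sketch, not taken from any pipeline manual. It assumes that both "fastq-solexa" and "fastq-illumina" store scores with an ASCII offset of 64, and that Solexa scores map to PHRED scores via Q_phred = 10 * log10(10^(Q_solexa / 10) + 1). The function names (solexa_to_phred, decode_quality, fastq_to_fasta) are made up for illustration; only SeqIO.convert is a real Biopython call, and it is imported lazily so the score functions run without Biopython installed.

```python
import math

# Assumption: both "fastq-solexa" and "fastq-illumina" use an ASCII offset of 64.
OFFSET = 64


def solexa_to_phred(q_solexa):
    """Convert a Solexa score to the equivalent PHRED score."""
    return 10 * math.log10(10 ** (q_solexa / 10.0) + 1)


def decode_quality(quality_string, solexa=False):
    """Decode a FASTQ quality string into a list of PHRED scores.

    With solexa=True the characters are read as Solexa scores and
    converted to PHRED; otherwise they are read as PHRED directly.
    """
    raw = [ord(ch) - OFFSET for ch in quality_string]
    if solexa:
        return [solexa_to_phred(q) for q in raw]
    return raw


def fastq_to_fasta(in_path, out_path, fastq_variant="fastq-illumina"):
    """Convert a FASTQ file to FASTA; returns the number of records written.

    Requires Biopython. Pass "fastq-solexa" for older Solexa-scored files.
    """
    from Bio import SeqIO  # imported here so the functions above need no Biopython

    return SeqIO.convert(in_path, fastq_variant, out_path, "fasta")


if __name__ == "__main__":
    # High scores are nearly identical in both schemes; low scores diverge.
    for q in (40, 20, 10, 0, -5):
        print("Solexa %3d -> PHRED %.2f" % (q, solexa_to_phred(q)))
```

For a plain FASTQ to FASTA conversion the quality scores are discarded, so the choice of variant should only matter if a file contains characters that are invalid for the variant chosen. It starts to matter as soon as you filter or trim on quality.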


If you prefer to work in Python, it should be easy to recreate
a Biopython version of the same script. Which script are we
talking about? Is it publicly available?

It is called shuffleSequences_fasta.pl and comes with the (free) distribution of Velvet (Zerbino, EBI). The script is really simple.
Thanks again, 
Anastasia
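
A rough Biopython equivalent might look like the sketch below. It assumes that the Perl script simply interleaves two paired-end FASTA files (forward read 1, reverse read 1, forward read 2, and so on), which is the layout Velvet expects for paired reads. The script itself is not reproduced here, so treat that as an assumption. The names interleave and shuffle_fasta are hypothetical; SeqIO.parse and SeqIO.write are the real Biopython calls, imported lazily so that interleave can be tried on plain lists.

```python
from itertools import zip_longest


def interleave(forward_records, reverse_records):
    """Yield records alternately: forward 1, reverse 1, forward 2, ...

    Raises ValueError if the two inputs have different lengths.
    """
    missing = object()  # sentinel marking the shorter input running out
    pairs = zip_longest(forward_records, reverse_records, fillvalue=missing)
    for fwd, rev in pairs:
        if fwd is missing or rev is missing:
            raise ValueError("inputs contain different numbers of reads")
        yield fwd
        yield rev


def shuffle_fasta(forward_path, reverse_path, output_path):
    """Write an interleaved FASTA file; returns the number of records written.

    Requires Biopython. Assumes both files list the reads in the same order.
    """
    from Bio import SeqIO  # imported here so interleave() needs no Biopython

    records = interleave(
        SeqIO.parse(forward_path, "fasta"),
        SeqIO.parse(reverse_path, "fasta"),
    )
    return SeqIO.write(records, output_path, "fasta")


if __name__ == "__main__":
    # Plain strings stand in for SeqRecord objects in this small demo.
    print(list(interleave(["read1/1", "read2/1"], ["read1/2", "read2/2"])))
```

Because both parsers are iterators, the reads are streamed pair by pair rather than loaded into memory, which should help with large Illumina runs.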