[Biopython] SeqIO for fasta conversion of Illumina files with > 60 bp
natassa
natassa_g_2000 at yahoo.com
Tue Sep 15 15:02:06 UTC 2009
> That does look like a FASTQ file, and you probably know that it
> came from a Solexa/Illumina machine. However, it could be an early
> Solexa/Illumina file using Solexa scores ("fastq-solexa" in SeqIO),
> or a more recent Illumina GA pipeline 1.3+ FASTQ file with PHRED
> scores ("fastq-illumina" in SeqIO). From the read length (76bp) I
> would guess this probably is a "fastq-illumina" file, but you
> should double check this, as it does matter for poor quality reads.
Because you created some doubts in my already confused mind:
The machine is indeed Solexa/Illumina. I have 55 bp and 76 bp reads from pipeline v1.3 and v1.4, respectively. The pipeline manuals say that the scoring scheme is Phred. I know there is a lot of confusion about the terms, which is why I preferred to use SeqIO. I hope I did not mix up the formats.
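To make the "it does matter for poor quality reads" point concrete, here is a minimal sketch in plain Python (no Biopython required). Both "fastq-solexa" and "fastq-illumina" store each score as an ASCII character with an offset of 64; the difference is whether the stored number is a Solexa score or a PHRED score. The function names below are made up for illustration and are not part of SeqIO.

```python
import math

# ASCII offset shared by the "fastq-solexa" and "fastq-illumina" variants.
OFFSET = 64


def phred_from_illumina_char(ch):
    """Read a quality character as a PHRED score (pipeline 1.3+ style)."""
    return ord(ch) - OFFSET


def phred_from_solexa_char(ch):
    """Read a quality character as a Solexa score and convert it to PHRED.

    Uses the standard mapping Q_phred = 10 * log10(10 ** (Q_solexa / 10) + 1).
    """
    solexa = ord(ch) - OFFSET
    return 10 * math.log10(10 ** (solexa / 10.0) + 1)


if __name__ == "__main__":
    # The same characters, interpreted under each scheme.
    for ch in "hTJB@":
        print(ch, phred_from_illumina_char(ch),
              round(phred_from_solexa_char(ch), 2))
```

For high scores the two readings are almost identical, but near zero they diverge by a few units, so picking the wrong format name mostly affects the low-quality bases. In Biopython the format string passed to SeqIO.parse or SeqIO.convert ("fastq-solexa" or "fastq-illumina") selects which interpretation is applied.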
> If you prefer to work in Python, it should be easy to recreate
> a Biopython version of the same script. Which script are we
> talking about? Is it publicly available?
It is called shuffleSequences_fasta.pl and comes with the (free) distribution of Velvet (Zerbino, EBI). The script is really simple.
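For reference, a rough Python sketch of what such a script could look like. This assumes the Perl script does nothing more than interleave two FASTA files of paired reads (read 1 from the first file, its mate from the second, and so on); that is my understanding of it, not something checked against the Velvet source. The helper names are hypothetical, and the small FASTA reader is only there to keep the example free of dependencies.

```python
from itertools import zip_longest


def read_fasta(lines):
    """Yield (header, sequence) tuples from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def interleave_pairs(forward, reverse):
    """Alternate records from two iterables; fail if their lengths differ."""
    missing = object()
    for fwd, rev in zip_longest(forward, reverse, fillvalue=missing):
        if fwd is missing or rev is missing:
            raise ValueError("the two inputs hold different numbers of reads")
        yield fwd
        yield rev


def write_fasta(records, handle):
    """Write (header, sequence) tuples as single-line FASTA records."""
    for header, seq in records:
        handle.write(">%s\n%s\n" % (header, seq))
```

With two open file handles this would be called as write_fasta(interleave_pairs(read_fasta(f1), read_fasta(f2)), out). In a Biopython version, SeqIO.parse(handle, "fasta") could supply the records and SeqIO.write could replace write_fasta, with interleave_pairs left unchanged.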
Thanks again,
Anastasia