[Biopython] SeqIO for fasta conversion of Illumina files with > 60 bp

Peter biopython at maubp.freeserve.co.uk
Tue Sep 15 10:11:05 UTC 2009


On Tue, Sep 15, 2009 at 9:37 AM, natassa <natassa_g_2000 at yahoo.com> wrote:
>
> Hello,
> I have been using SeqIO to convert Illumina (v1.3+) *sequence.txt files
> (containing both quality and sequence info) to simple FASTA files.

Are you talking about Illumina FASTQ files? i.e. fastq-illumina in SeqIO?

> This worked well until I tried it on new reads of 75 bp. I need to have
> each read on a single line, so after fiddling around with the code, I guess
> I need to change the wrap=60 argument in the FastaIO/FastaWriter class to
> wrap=0 to make it work. Am I right? Are there any other bits of code
> that may be affected that I may have missed?

Bio.SeqIO defaults to writing FASTA files with 60bp line wrapping.
You want to output 75bp FASTA files without line wrapping?
As an aside, why? Line wrapping is common and normal in
FASTA files and in fact is more widely used than non-wrapping.
If another software tool can't read line wrapped FASTA it has
a bug in my opinion.
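
To illustrate the point about wrapping, here is a minimal sketch using only
the Python standard library. It is not Biopython code: wrap_sequence and
read_fasta are hypothetical helpers written for this example, and the read
itself is made up. It wraps a 75 bp read at 60 characters, which gives one
60 bp line plus one 15 bp line, and shows that a reader which joins the
sequence lines recovers the same record from wrapped and unwrapped text.

```python
def wrap_sequence(seq, width=60):
    """Split seq into lines of at most width characters.

    A width of 0 or None means no wrapping (hypothetical helper).
    """
    if not width:
        return [seq]
    return [seq[i:i + width] for i in range(0, len(seq), width)]


def read_fasta(text):
    """Minimal FASTA reader that joins wrapped sequence lines.

    Returns a list of (name, sequence) tuples (hypothetical helper).
    """
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


read = "ACGTA" * 15  # made-up 75 bp read
wrapped = ">read1\n" + "\n".join(wrap_sequence(read, 60)) + "\n"
unwrapped = ">read1\n" + "\n".join(wrap_sequence(read, 0)) + "\n"

print([len(line) for line in wrap_sequence(read, 60)])  # [60, 15]
print(read_fasta(wrapped) == read_fasta(unwrapped))     # True
```

A downstream tool that reads FASTA this way does not care which of the two
layouts it is given.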

> I am sure this way of handling things is not a good one ;-), so I was
> wondering if other people have had the same problem and how this
> class could be modified to address it in the future.

If you don't like the Bio.SeqIO.write(...) defaults, you can use the
underlying writer class directly, which may offer more options. In the
case of FASTA output, the FastaWriter class in Bio.SeqIO.FastaIO lets
you set the line wrapping via its wrap argument.

e.g.

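from Bio import SeqIO
from Bio.SeqIO.FastaIO import FastaWriter

# Parse the Illumina 1.3+ FASTQ file one record at a time
records = SeqIO.parse(open("illumina.fastq"), "fastq-illumina")
handle = open("example.fasta", "w")
# wrap=80 is longer than a 75 bp read, so each sequence stays on one line
count = FastaWriter(handle, wrap=80).write_file(records)
handle.close()
# print(...) with a single argument works on both Python 2 and Python 3
print("Converted %i records" % count)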
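
If the aim is no wrapping at all, whatever the read length, more recent
Biopython releases also accept "fasta-2line" as an output format name in
Bio.SeqIO.write(...). As far as I know this arrived in Biopython 1.71, so
treat the version as an assumption and check your installation. The sketch
below uses in-memory handles and a made-up 75 bp read (name, bases and
quality string are all invented for the example) so that it runs without
any input file.

```python
from io import StringIO

from Bio import SeqIO

# Made-up single-read FASTQ text in Illumina 1.3+ encoding (ASCII offset 64);
# "h" corresponds to a PHRED quality of 40.
sequence = "ACGTA" * 15  # 75 bp
fastq_text = "@read1\n" + sequence + "\n+\n" + "h" * len(sequence) + "\n"

records = SeqIO.parse(StringIO(fastq_text), "fastq-illumina")
output = StringIO()
# "fasta-2line" writes one header line and one unwrapped sequence line
# per record (assumes a Biopython version that supports this format name)
count = SeqIO.write(records, output, "fasta-2line")

lines = output.getvalue().splitlines()
print("Converted %i records" % count)
print("Sequence line length: %i" % len(lines[1]))
```

For real data, replace the StringIO objects with handles or filenames for
the FASTQ input and the FASTA output.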

Peter


