[Biopython] SeqIO for fasta conversion of Illumina files with > 60 bp

Peter biopython at maubp.freeserve.co.uk
Tue Sep 15 15:14:52 UTC 2009


On Tue, Sep 15, 2009 at 4:02 PM, natassa <natassa_g_2000 at yahoo.com> wrote:
>
>> If you prefer to work in Python, it should be easy to recreate
>> a Biopython version of the same script. Which script are we
>> talking about? Is it publicly available?
>
> It is called shuffleSequences_fasta.pl and goes along with the
> (free) distribution of velvet (Zerbino, EBI). The script is really
> simple.

Oh right - you can see the scripts on Daniel's github repository,
http://github.com/dzerbino/velvet

Both scripts are very simple-minded, which means fixing
the bug will actually require a big change:

shuffleSequences_fasta.pl appears to assume every FASTA
entry is exactly two lines. That is a safe assumption for short
reads like 36bp from early Solexa/Illumina, but not a safe choice
in general, as line wrapping in FASTA is normal.
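
To make the two-line problem concrete, here is a minimal sketch in
plain Python (no Biopython needed). The function names and the toy
reads are made up for illustration, and this is not a translation of
the Perl script, just the same assumption spelled out next to a
wrap-tolerant alternative:

```python
def naive_two_line_records(lines):
    """Pair lines as (header, sequence), assuming two lines per record."""
    lines = [line.rstrip("\n") for line in lines]
    return [(lines[i], lines[i + 1]) for i in range(0, len(lines) - 1, 2)]


def wrap_aware_records(lines):
    """Start a record at each '>' line and join the sequence lines after it."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            records.append([line, ""])
        elif records and line:
            records[-1][1] += line
    return [tuple(record) for record in records]


# Two toy reads, each wrapped over two sequence lines.
wrapped = [">read1", "ACGTACGT", "ACGT", ">read2", "TTTTGGGG", "CCCC"]

naive = naive_two_line_records(wrapped)
correct = wrap_aware_records(wrapped)
```

On the wrapped input the naive version returns three "records", the
second of which uses a sequence line as its header, while the
wrap-aware version returns the two reads intact.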

shuffleSequences_fastq.pl appears to assume every FASTQ
entry is exactly four lines. That is a reasonable assumption,
especially for short reads like 36bp reads from early
Solexa/Illumina, but not a safe choice in general, as FASTQ
files can also be wrapped (even if it is discouraged).
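
Wrapped FASTQ is harder to handle than wrapped FASTA, because "@" is
also a valid quality character, so a quality line can look like the
start of a new record. One way round this is to keep reading quality
lines until their total length matches the sequence length. The
following is a rough plain-Python sketch of that idea (the function
name and the toy data are hypothetical, and it is not how any
particular parser is implemented):

```python
def read_fastq(lines):
    """Yield (title, sequence, quality) tuples, tolerating line wrapping."""
    stripped = (line.rstrip("\n") for line in lines)
    for line in stripped:
        if not line:
            continue  # tolerate blank lines between records
        if not line.startswith("@"):
            raise ValueError("expected '@' title line, got %r" % line)
        title = line[1:]
        # Collect sequence lines up to the '+' separator line.
        seq = ""
        for line in stripped:
            if line.startswith("+"):
                break
            seq += line
        else:
            raise ValueError("no '+' line found for %r" % title)
        # Collect quality lines until they cover the whole sequence.
        qual = ""
        while len(qual) < len(seq):
            chunk = next(stripped, None)
            if chunk is None:
                raise ValueError("quality truncated for %r" % title)
            qual += chunk
        if len(qual) != len(seq):
            raise ValueError("quality longer than sequence for %r" % title)
        yield title, seq, qual


# The second quality line of r1 starts with '@', which would fool a
# parser that treats every '@' line as a new record.
fastq_lines = ["@r1", "ACGT", "ACGT", "+", "IIII", "@III",
               "@r2", "GG", "+", "@I"]
records = list(read_fastq(fastq_lines))
```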

We should be able to mimic these in Biopython using SeqIO...
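
Something along these lines might do as a starting point. It is only
a sketch, and it assumes the Perl scripts simply interleave the two
mate files record by record (forward 1, reverse 1, forward 2, ...).
The helper names and the file names in the usage note are made up;
SeqIO.parse and SeqIO.write are the real Bio.SeqIO functions:

```python
def interleave(forward_records, reverse_records):
    """Yield records alternately from two iterables of paired reads."""
    # zip() stops at the shorter input, so unpaired trailing reads are dropped.
    for forward, reverse in zip(forward_records, reverse_records):
        yield forward
        yield reverse


def shuffle_sequences(forward_path, reverse_path, output_path, fmt="fasta"):
    """Write an interleaved file; returns the number of records written."""
    # Imported here so that interleave() can be used without Biopython.
    from Bio import SeqIO

    with open(forward_path) as forward_handle, \
            open(reverse_path) as reverse_handle, \
            open(output_path, "w") as output_handle:
        pairs = interleave(SeqIO.parse(forward_handle, fmt),
                           SeqIO.parse(reverse_handle, fmt))
        return SeqIO.write(pairs, output_handle, fmt)
```

For example, shuffle_sequences("reads_1.fasta", "reads_2.fasta",
"shuffled.fasta") for FASTA, or the same call with fmt="fastq" for
Sanger-style FASTQ. Because SeqIO does the parsing, wrapped input
should not matter. Note that SeqIO.write wraps FASTA sequence lines by
default, so the output for longer reads will not be two lines per
record.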

Peter


