[Biopython] SeqIO for fasta conversion of Illumina files with > 60 bp
Peter
biopython at maubp.freeserve.co.uk
Tue Sep 15 11:14:52 EDT 2009
On Tue, Sep 15, 2009 at 4:02 PM, natassa <natassa_g_2000 at yahoo.com> wrote:
>
>> If you prefer to work in Python, it should be easy to recreate
>> a Biopython version of the same script. Which script are we
>> talking about? Is it publicly available?
>
> It is called shuffleSequences_fasta.pl and goes along with the
> (free) distribution of velvet (Zerbino, EBI). The script is really
> simple.
Oh right - you can see the scripts on Daniel's github repository,
http://github.com/dzerbino/velvet
Both scripts are very simple-minded, which means fixing
the bug will actually be a big change:
shuffleSequences_fasta.pl appears to assume every FASTA
entry is exactly two lines (fine for short reads like 36bp
from early Solexa/Illumina), but this is not safe in general
because line wrapping is normal in FASTA files.
shuffleSequences_fastq.pl appears to assume every FASTQ
entry is exactly four lines (a reasonable assumption, especially
for short reads like 36bp reads from early Solexa/Illumina),
but this is not safe in general because FASTQ files can also
be wrapped (even if it is discouraged).
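To make the FASTA failure mode concrete, here is a minimal
plain-Python sketch. It is not code from the velvet scripts: the
sample data and both function names are made up for illustration.
It compares a reader that pairs lines two at a time with one that
starts a new record at each ">" line.

```python
# Hypothetical illustration; not taken from the velvet scripts.
# read1 is wrapped over two sequence lines, read2 is not.
wrapped_fasta = """\
>read1
ACGTACGTAC
GTACGTACGT
>read2
TTTTGGGGCC
"""


def naive_two_line_records(text):
    """Pair up lines two at a time, as a fixed two-line reader would."""
    lines = text.splitlines()
    return [(lines[i], lines[i + 1]) for i in range(0, len(lines) - 1, 2)]


def wrap_aware_records(text):
    """Start a new record at each '>' line and join its sequence lines."""
    records = []
    for line in text.splitlines():
        if line.startswith(">"):
            records.append([line[1:], ""])
        elif records:
            records[-1][1] += line.strip()
    return [tuple(record) for record in records]


naive = naive_two_line_records(wrapped_fasta)
aware = wrap_aware_records(wrapped_fasta)

# The naive reader treats the second half of read1 as a "header"
# and the read2 title line as its "sequence".
print(naive[1])
print(aware)
```

With wrapped input the two-line reader silently produces nonsense
records rather than raising an error, which is why the bug is easy
to miss until reads get longer than the line width.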
We should be able to mimic these in Biopython using SeqIO...
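As a rough sketch of what that could look like, assuming the job
of the FASTA script is to interleave two paired-read files into
one (as its name suggests): the function name interleave_fasta and
the sample reads below are invented for illustration, and only
Bio.SeqIO.parse and Bio.SeqIO.write are real Biopython calls.
Because SeqIO parses whole records, wrapped input is not a problem.

```python
from io import StringIO

from Bio import SeqIO


def interleave_fasta(handle_1, handle_2, out_handle):
    """Write records from two FASTA handles alternately.

    Hypothetical helper, not part of Biopython or velvet. Assumes
    both inputs list the pairs in the same order; zip() stops at
    the shorter input, so unpaired trailing reads are dropped.
    Returns the number of pairs written.
    """
    reads_1 = SeqIO.parse(handle_1, "fasta")
    reads_2 = SeqIO.parse(handle_2, "fasta")
    pairs = 0
    for rec_1, rec_2 in zip(reads_1, reads_2):
        SeqIO.write([rec_1, rec_2], out_handle, "fasta")
        pairs += 1
    return pairs


# In-memory stand-ins for the two input files; some records are
# deliberately wrapped over more than one sequence line.
forward = StringIO(">pair1/1\nACGTACGT\nACGT\n>pair2/1\nGGGGCCCC\n")
reverse = StringIO(">pair1/2\nTTTTAAAA\n>pair2/2\nCCCC\nGGGG\n")
shuffled = StringIO()

n_pairs = interleave_fasta(forward, reverse, shuffled)
print(shuffled.getvalue())
```

For real files you would pass open file handles instead of the
StringIO objects. The same pattern should carry over to FASTQ by
changing the format string, though which FASTQ variant name to use
depends on the quality encoding of the data.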
Peter