[Bioperl-l] Next-gen modules
Chris Fields
cjfields at illinois.edu
Thu Jul 23 22:58:01 UTC 2009
On Jul 23, 2009, at 6:31 AM, Peter Cock wrote:
> On Wed, Jul 8, 2009 at 5:24 PM, Chris Fields <cjfields at illinois.edu> wrote:
>>
>> It would be nice to get some regression tests going for this to make
>> sure it does what we expect, so maybe some test data and expected results?
>>
>
> Regression tests for BioPerl's FASTQ support would of course
> be sensible. In terms of sample data and expected results...
>
> I've got some test files put together for Biopython, and I have
> been cross checking Biopython's FASTQ support against
> EMBOSS 6.1.0 which has turned up a few issues:
> http://lists.open-bio.org/pipermail/emboss-dev/2009-July/000577.html
>
> ------------------------------------------------------------------------------
>
> I'd like to get comparisons against BioPerl's new FASTQ support
> going too. To do this I'd need to know which (branch?) of BioPerl I
> should install, and I'd also like a trivial sample BioPerl script to do
> piped FASTQ conversion. i.e. read a FASTQ file from stdin (say
> as "fastq-solexa"), and output it to stdout (say as "fastq" meaning
> the Sanger Standard FASTQ).
You would have to install bioperl-live from svn if you want the
refactored FASTQ parser. That commit was made within the last month.
> i.e. Something like this four line Biopython script would be perfect:
> http://biopython.org/wiki/Reading_from_unix_pipes
We use named parameters so it's a little more verbose.
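use Bio::SeqIO;
# Read Sanger-encoded FASTQ from stdin. Swap the two -format values for
# the Solexa-to-Sanger direction asked about above.
my $in = Bio::SeqIO->new(-fh => \*STDIN, -format => 'fastq-sanger');
# Write Solexa-encoded FASTQ to stdout (named explicitly here).
my $out = Bio::SeqIO->new(-fh => \*STDOUT, -format => 'fastq-solexa');
while (my $seq = $in->next_seq) { $out->write_seq($seq) }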
Don't be surprised if there are still bugs lurking about; just let me
know and I'll fix 'em.
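For cross-checking converter output, it may help to see what a fastq-solexa to Sanger conversion does to the quality line. The following is a minimal sketch in plain Python, not the BioPerl or Biopython implementation. It assumes unwrapped four-line records, the usual ASCII offsets (64 for Solexa, 33 for Sanger), and rounding to the nearest integer PHRED score; the function names are made up for illustration.

```python
import math

SOLEXA_OFFSET = 64  # ASCII offset used by fastq-solexa quality characters
SANGER_OFFSET = 33  # ASCII offset used by Sanger FASTQ quality characters


def solexa_to_phred(q_solexa):
    """Map a Solexa-scaled score to the nearest integer PHRED score."""
    return int(round(10 * math.log10(10 ** (q_solexa / 10.0) + 1)))


def solexa_qual_to_sanger(qual):
    """Re-encode one fastq-solexa quality string as a Sanger quality string."""
    return "".join(
        chr(solexa_to_phred(ord(ch) - SOLEXA_OFFSET) + SANGER_OFFSET)
        for ch in qual
    )


def convert_lines(lines):
    """Yield FASTQ lines unchanged, except every fourth (quality) line.

    Assumption: each record is exactly four lines (no line wrapping).
    """
    for index, line in enumerate(lines):
        text = line.rstrip("\n")
        if index % 4 == 3:
            text = solexa_qual_to_sanger(text)
        yield text + "\n"
```

Used as a pipe, this would be `sys.stdout.writelines(convert_lines(sys.stdin))` after `import sys`. Low Solexa scores are where converters are most likely to disagree, since the Solexa and PHRED scales diverge there.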
> ------------------------------------------------------------------------------
>
> Peter Rice and I have also been talking about line wrapping when
> writing FASTQ output, and if this is a good idea or not:
> http://lists.open-bio.org/pipermail/emboss-dev/2009-July/000593.html
>
> Thanks!
>
> Peter C. (@Biopython)
BTW, I think the BioPerl parser does handle line-wrapped FASTQ now.
Anyway, I tend to agree with Aaron on that point. Too many exceptions
to the rule make it harder to write parsers for a human-readable format.
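To illustrate why wrapping complicates parsing: quality lines may themselves begin with "@" or "+", so a parser cannot rely on those characters to find record boundaries and has to count quality characters against the sequence length instead. A minimal sketch of that approach in plain Python follows; it is not the BioPerl parser, and the function name and error messages are hypothetical.

```python
def parse_wrapped_fastq(lines):
    """Yield (title, sequence, quality) from possibly line-wrapped FASTQ."""
    it = (line.rstrip("\r\n") for line in lines)
    line = next(it, None)
    while line is not None:
        if not line:  # skip blank lines between records
            line = next(it, None)
            continue
        if not line.startswith("@"):
            raise ValueError("expected '@' title line, got %r" % line)
        title = line[1:]

        # Sequence lines run until the '+' separator line.
        seq_parts = []
        for line in it:
            if line.startswith("+"):
                break
            seq_parts.append(line)
        else:
            raise ValueError("no '+' line for record %r" % title)
        seq = "".join(seq_parts)

        # Quality lines are consumed by length, not by leading character,
        # because '@' and '+' are both valid quality characters.
        qual = ""
        while len(qual) < len(seq):
            line = next(it, None)
            if line is None:
                raise ValueError("truncated quality for record %r" % title)
            qual += line
        if len(qual) != len(seq):
            raise ValueError("quality longer than sequence for %r" % title)

        yield title, seq, qual
        line = next(it, None)
```

With unwrapped four-line records the same loop still works; the length check simply completes after one quality line.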
chris