[Bioperl-l] FASTQ, was Re: BioPerl long-term, was Re: dependencies on perl version

Aaron Mackey amackey at virginia.edu
Thu Feb 7 15:25:07 UTC 2013


You might also want to consider a lazy/pull-based parser to defer
parsing/object-building for pieces of the object that don't get used.  This
also usually provides some error tolerance, since a malformed field only
causes a failure if it is actually accessed.
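
As a rough illustration of the lazy approach, here is a minimal sketch in
Python (chosen for brevity; it is not BioPerl, Biopython, or kseq code). The
names LazyFastqRecord and iter_fastq are hypothetical, and the sketch assumes
unwrapped four-line FASTQ entries with Sanger (offset 33) quality encoding.
A cheap reader hands out raw strings, and quality decoding and validation
are deferred until someone asks for them.

```python
import io


class LazyFastqRecord:
    """Holds the raw FASTQ strings; decodes qualities only on first access."""

    __slots__ = ("_title", "_seq", "_qual", "_phred")

    def __init__(self, title, seq, qual):
        self._title = title
        self._seq = seq
        self._qual = qual
        self._phred = None  # not decoded yet

    @property
    def id(self):
        # First whitespace-delimited token of the title line.
        return self._title.split(None, 1)[0]

    @property
    def seq(self):
        return self._seq

    @property
    def phred(self):
        # Deferred work: validation and decoding happen on first use only.
        if self._phred is None:
            if len(self._qual) != len(self._seq):
                raise ValueError(
                    "quality/sequence length mismatch in %s" % self.id
                )
            # Assumption: Sanger encoding, ASCII offset 33.
            self._phred = [ord(c) - 33 for c in self._qual]
        return self._phred


def iter_fastq(handle):
    """Pull-based reader: yields one record per four-line FASTQ entry."""
    while True:
        title = handle.readline()
        if not title:
            return  # end of input
        if not title.strip():
            continue  # skip blank lines between entries
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored in this sketch
        qual = handle.readline().rstrip("\n")
        yield LazyFastqRecord(title[1:].rstrip("\n"), seq, qual)


# The second entry has a truncated quality string.
data = "@r1 desc\nACGT\n+\nIIII\n@r2\nACGT\n+\nII\n"
records = list(iter_fastq(io.StringIO(data)))
```

In this sketch the second record can still be read for its id and sequence;
the length mismatch only raises an error if the caller requests its decoded
qualities, which is the kind of error tolerance meant above.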

-Aaron

--
Aaron J. Mackey, PhD
Assistant Professor
Center for Public Health Genomics
University of Virginia
amackey at virginia.edu
http://www.cphg.virginia.edu/mackey


On Wed, Feb 6, 2013 at 5:53 PM, Fields, Christopher J
<cjfields at illinois.edu> wrote:

> On Feb 6, 2013, at 4:43 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>
> > On Wed, Feb 6, 2013 at 10:11 PM, Fields, Christopher J
> > <cjfields at illinois.edu> wrote:
> >>
> >> I see no problem in stating any generic parsing and low-level interfaces
> >> are just as much a part of what BioPerl encompasses as the higher-level
> >> Bio::* classes themselves.  Steve and Jason were on to something with
> >> SearchIO; it's maybe not as performant as we would like, but it certainly
> >> is more flexible in terms of what can be done, b/c it separates out
> >> low-level parsing from object creation.  That's the general model we
> >> should look at.  There is a good reason Biopython is following this
> >> model with their SearchIO implementation (Peter C, are you reading this?)
> >
> > Actually I don't think we did end up with that kind of separation in the
> > Biopython SearchIO - which is not to say it isn't an excellent model
> > to follow. Rather the Biopython SearchIO (like the BioPerl one) had
> > as the first goal a consistent object model across assorted file
> > formats.
> >
> > The idea of low-level, minimal-overhead parsers (which are very
> > format specific), on which a heavier but consistent object model
> > can be built, might be a good balance - the high-level API has the
> > convenience, but if you give that up you can have more speed.
> > That's what I recommend with FASTQ and Biopython, e.g.
> > http://news.open-bio.org/news/2009/09/biopython-fast-fastq/
> >
> >>
> >> I have started a wrapper around Heng's FASTQ/FASTA parsing
> >> code (kseq), it seems to work quite well (~20M FASTQ in 30 sec
> >> last I recall?).
> >>
> >
> > I'd have to dig through my emails, but I think the BioRuby guys
> > looked at that too - as I recall while it was fast, the error handling
> > left something to be desired. Email me directly or on the BioRuby
> > list if you want to follow up on that.
> >
> > Regards,
> >
> > Peter
>
> I did a little on this, worth following up on, but I pulled the FASTQ test
> examples you created from the paper to test it out.  IIRC it parsed where
> it needed to, but I'm not sure how it handled bad sequences, so yes, worth
> looking into.  Maybe worth moving to open-bio-l for broader discussion.
>
> chris
