[Bioperl-l] Bio::Index::Fastq '@' in qual

Tue Nov 1 10:38:43 EDT 2011

On Tue, Nov 1, 2011 at 1:40 PM, Fields, Christopher J
<cjfields at illinois.edu> wrote:
>
> One problem the various Bio* indexers have currently is the lack of
> standardization on a specific schema for indexing.  There are in-roads
> towards this (OBDA) that haven't been adequately traveled IMHO,
> which need to be taken up again.
>

Something to switch to open-bio-l at lists.open-bio.org for,
http://lists.open-bio.org/mailman/listinfo/open-bio-l

We can continue this thread from last summer,
http://lists.open-bio.org/pipermail/open-bio-l/2010-April/000662.html
http://lists.open-bio.org/pipermail/open-bio-l/2010-June/000676.html
...
http://lists.open-bio.org/pipermail/open-bio-l/2010-June/000680.html

And CC Peter Rice from EMBOSS too - we chatted about this
at ISMB/BOSC 2011 in July - and whoever looks after the
OBDA/indexing code in BioRuby and BioJava too.

> A second, and maybe this is more specific to BioPerl, is that the
> parsers and indexers essentially reimplement the format parsing
> in each module, so if there are bugs they have to be independently
> fixed (hence why SeqIO works and the indexer doesn't; I wrote the
> first but not the second).  The best place for any optimizations
> would be in a unified parser that both the SeqIO and indexer
> modules could use.

We have that problem to an extent in Biopython's Bio.SeqIO code.
The indexing code duplicates some logic of the parsing code
(how much depends on the format), sufficient to extract the read
ID and the bounds on disk. The two could be more unified, but
the parsers came first and I didn't want to change them at the time.
Instead I tried to be rigorous in consistency testing for the index
code's unit tests.
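
To illustrate the kind of consistency test I mean, here is a rough
sketch. It is not the actual Biopython code: the function names are
made up for this example, and it assumes simple four-line FASTQ
records with no line wrapping. The indexer finds record boundaries by
counting lines rather than by looking for '@', so a quality string
that starts with '@' is not mistaken for a new record.

```python
import io


def parse_fastq(handle):
    """Yield (read_id, sequence, quality) tuples from a binary handle.

    Hypothetical parser; assumes four lines per record (no wrapping).
    """
    while True:
        header = handle.readline()
        if not header:
            return
        seq = handle.readline().rstrip(b"\r\n")
        handle.readline()  # the '+' separator line, ignored here
        qual = handle.readline().rstrip(b"\r\n")
        read_id = header[1:].split(None, 1)[0].decode("ascii")
        yield read_id, seq.decode("ascii"), qual.decode("ascii")


def index_fastq(handle):
    """Return {read_id: (offset, length)} for each record.

    Hypothetical indexer; note that it repeats the parser's logic for
    extracting the read ID, which is the duplication discussed above.
    """
    index = {}
    while True:
        start = handle.tell()
        header = handle.readline()
        if not header:
            return index
        for _ in range(3):  # skip sequence, '+' and quality lines
            handle.readline()
        read_id = header[1:].split(None, 1)[0].decode("ascii")
        index[read_id] = (start, handle.tell() - start)


def check_index_matches_parser(data):
    """Check that the index and the parser agree on every record."""
    index = index_fastq(io.BytesIO(data))
    parsed = list(parse_fastq(io.BytesIO(data)))
    # Same IDs, in the same order as they appear in the file.
    assert [record[0] for record in parsed] == list(index)
    for record in parsed:
        offset, length = index[record[0]]
        chunk = data[offset:offset + length]
        # Re-parsing just the indexed bytes must give the same record.
        assert next(parse_fastq(io.BytesIO(chunk))) == record
    return len(parsed)


# Two records; both quality strings contain '@', one as the first character.
EXAMPLE = b"@read1 desc\nACGT\n+\n@III\n@read2\nGGCC\n+\nIII@\n"
check_index_matches_parser(EXAMPLE)
```

The real test data would of course need to cover wrapped records and
the other formats too, but the idea is the same: parse the whole file,
then check that each indexed slice parses back to the same record.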

Regards,

Peter