[Biopython-dev] SearchIO, was: PEP8 lower case module names?

Peter Cock p.j.a.cock at googlemail.com
Mon Dec 3 16:49:47 UTC 2012


On Mon, Dec 3, 2012 at 2:02 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>
> I've started work on SearchIO indexing of BGZF files now,
> enabling it was quite simple (the same code as used for
> the SeqIO indexing):
> https://github.com/biopython/biopython/commit/cf063bf6a2dca4d534d00699310548e43bf2e14f
>
> Thus far I've only tested this with BLAST XML, but that did
> require a bit of reworking to avoid doing file offset arithmetic:
> https://github.com/biopython/biopython/commit/600b231a1817035141c8de80e5689dcfd31290b5
>
> I will resume this work later this afternoon, going over all
> the SearchIO file formats one by one.

I've refactored test_SearchIO_index.py to make adding
additional get_raw tests easier. Proper testing of all the
formats with BGZF will need some larger test files (over
64k before compression), which we probably don't want to
include in the repository.
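
To illustrate why the 64k threshold and the offset arithmetic
matter, here is a minimal sketch of BGZF virtual offsets. It
assumes the usual BGZF layout (upper 48 bits for the start of
the compressed block, lower 16 bits for the position inside
the decompressed block). The function names mirror those in
Bio.bgzf but are re-implemented here for illustration, and the
block positions used are hypothetical.

```python
def make_virtual_offset(block_start, within_block):
    """Pack a compressed block start and an in-block offset into one integer."""
    if not 0 <= within_block < 2 ** 16:
        raise ValueError("within-block offset must fit in 16 bits")
    if not 0 <= block_start < 2 ** 48:
        raise ValueError("block start must fit in 48 bits")
    return (block_start << 16) | within_block


def split_virtual_offset(virtual_offset):
    """Unpack a virtual offset into (block_start, within_block)."""
    return virtual_offset >> 16, virtual_offset & 0xFFFF


# Hypothetical record: starts 65000 bytes into the first block
# (which begins at compressed byte 0) and is 1000 bytes long,
# so it ends in the following block.
record_start = make_virtual_offset(0, 65000)
naive_end = record_start + 1000  # plain arithmetic on the offset

# The addition overflows the 16-bit field and corrupts the block
# start, so the result no longer points at a real BGZF block.
print(split_virtual_offset(naive_end))
```

A file under 64k before compression fits in a single block,
so this kind of overflow would never show up in testing.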

However, I also added code to test
Bio.SearchIO.index_db(...).get_raw(...) as well as your
original testing of Bio.SearchIO.index(...).get_raw(...)
alone. These should return the exact same string, and
that is now working nicely for BLAST XML (and BGZF
from limited testing), but not yet for all the formats.
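
As a rough sketch of that check (not the actual code in
test_SearchIO_index.py), something along these lines compares
the two. The helper names find_raw_mismatches and check_file
are made up for illustration, and using ":memory:" for the
SQLite index is an assumption to avoid writing an index file.

```python
def find_raw_mismatches(index_a, index_b):
    """Return the keys whose raw records differ between two indexes.

    Both arguments only need to be iterable over their keys and
    offer get_raw(key), as the SearchIO index objects do.
    """
    mismatches = []
    for key in index_a:
        if index_a.get_raw(key) != index_b.get_raw(key):
            mismatches.append(key)
    return mismatches


def check_file(filename, fmt):
    """Compare index(...) and index_db(...) raw records for one file."""
    # Imported here so the helper above works without Biopython.
    from Bio import SearchIO

    idx = SearchIO.index(filename, fmt)
    # Assumption: an in-memory SQLite index is good enough for a test.
    db = SearchIO.index_db(":memory:", filename, fmt)
    try:
        return find_raw_mismatches(idx, db)
    finally:
        idx.close()
        db.close()
```

An empty list from check_file means the two approaches agree
for every query in that file.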

Could you look at the difference in get_raw and the
record length found during indexing for: blast-tab
(with comments), hmmscan3-domtab, hmmer3-tab,
and hmmer3-text?

i.e. anything where test_SearchIO_index.py is now
printing a WARNING line when run.

Thanks,

Peter


