[Biopython-dev] Binary blast DB reader

Peter Cock p.j.a.cock at googlemail.com
Tue Mar 10 09:17:21 UTC 2015


Hi Alexey,

I think this would be a very interesting addition to Bio.SeqIO:
initially a simple iterator for parsing the records one by one
(like a binary FASTA file). With more work, if we can decode
the BLAST lookup files, efficient random access via the
Bio.SeqIO.index interface would be great.
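
To make the two stages concrete, here is a rough sketch of the pattern
I have in mind: sequential iteration first, then random access from a
table of offsets. Note the record layout below (a 4-byte big-endian
length followed by the payload) is a made-up toy format for
illustration only, NOT the real BLAST database layout, and all the
function names are hypothetical.

```python
import io
import struct

# Hypothetical toy layout (NOT the real BLAST database format):
# each record is a 4-byte big-endian length followed by that many bytes.
_LEN = struct.Struct(">I")


def write_records(handle, payloads):
    """Write each payload as a length-prefixed record (test helper)."""
    for payload in payloads:
        handle.write(_LEN.pack(len(payload)) + payload)


def iter_records(handle):
    """Yield (offset, payload) for each record from the current position."""
    while True:
        offset = handle.tell()
        prefix = handle.read(_LEN.size)
        if not prefix:
            return  # clean end of file
        if len(prefix) < _LEN.size:
            raise ValueError("Truncated length prefix at offset %d" % offset)
        (length,) = _LEN.unpack(prefix)
        payload = handle.read(length)
        if len(payload) < length:
            raise ValueError("Truncated record at offset %d" % offset)
        yield offset, payload


def build_offset_index(handle):
    """Scan the file once and return the start offset of every record."""
    handle.seek(0)
    return [offset for offset, _ in iter_records(handle)]


def get_record(handle, offsets, i):
    """Random access: seek to record i and read only that record."""
    handle.seek(offsets[i])
    return next(iter_records(handle))[1]


buf = io.BytesIO()
write_records(buf, [b"MKV", b"", b"ACDEFG"])
offsets = build_offset_index(buf)
third = get_record(buf, offsets, 2)
```

If the BLAST lookup files already store such offsets, the scanning
step in build_offset_index could presumably be replaced by reading
them directly.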

Your code currently only looks at protein databases - have
you tried nucleotide BLAST databases yet? I would expect
the basic structure to be quite similar...

Also, for many uses parsing *.pal (and *.nal) alias files would
be important (e.g. accessing sequences from the nr/nt
databases).
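
As a starting point, something like the following might do. This is
only a sketch, assuming the alias file is plain text with "#" comment
lines and one keyword per line (such as TITLE and DBLIST) followed by
its value. The function names and the sample text are hypothetical,
and nested alias files or paths relative to the alias file are not
handled.

```python
import shlex


def parse_alias_text(text):
    """Return a dict mapping each keyword to the rest of its line."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        fields[parts[0]] = parts[1] if len(parts) > 1 else ""
    return fields


def database_volumes(fields):
    """Return the volume names listed under DBLIST (quotes are honoured)."""
    return shlex.split(fields.get("DBLIST", ""))


# Made-up example content, not copied from a real database.
sample = (
    "#\n"
    "# Alias file created 01/01/2015\n"
    "#\n"
    "TITLE Example protein set\n"
    'DBLIST example.00 "example.01"\n'
)
fields = parse_alias_text(sample)
volumes = database_volumes(fields)
```

Each name in volumes would then be opened as a separate database
volume and the records chained together.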

Once you have some functional tests, generalising Python 3
code to also run on Python 2.6 and 2.7 is not too hard - and
we would need to do that for inclusion in Biopython. See
the code in SffIO.py for another example which deals with
a binary sequence file format:

https://github.com/biopython/biopython/blob/master/Bio/SeqIO/SffIO.py
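
For a flavour of what that involves, here is a small sketch (my own
illustration, not code taken from SffIO.py; the helper names are
hypothetical). The main trap is that indexing a bytes object gives an
int on Python 3 but a one-character str on Python 2, so it is safer to
go through struct or bytearray.

```python
import struct


def read_uint32_be(data, offset=0):
    """Decode a big-endian unsigned 32-bit integer from a bytes string."""
    return struct.unpack_from(">I", data, offset)[0]


def byte_values(data):
    """Return the bytes as a list of ints on both Python 2 and Python 3."""
    return list(bytearray(data))


def ascii_text(data):
    """Decode raw bytes to text explicitly rather than relying on str()."""
    return data.decode("ascii")


count = read_uint32_be(b"\x00\x00\x01\x00")
values = byte_values(b"\x00\x0a\xff")
title = ascii_text(b"nr")
```

Remember to open the files in binary mode ("rb") as well, otherwise
Python 3 will try to decode them as text.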

Peter



On Tue, Mar 10, 2015 at 7:59 AM, Alexey Morozov
<alexeymorozov1991 at gmail.com> wrote:
> Dear colleagues,
> I have a pretty crude pure Python 3 module that reads binary BLAST databases
> (https://github.com/SynedraAcus/BinaryBlast). Is there any chance it'll make
> it into the main distribution as a SeqIO subclass if I add iterator behaviour
> and fix the dirty hacks?
>
> --
> Alexey Morozov,
> LIN SB RAS, bioinformatics group.
> Irkutsk, Russia.

