[Biopython-dev] Binary blast DB reader

Alexey Morozov alexeymorozov1991 at gmail.com
Fri Mar 13 11:06:01 UTC 2015


I got the iterator support and random access working. Alias support can
also be worked around by using each of the nr.*.pin volumes as a separate
DB. So before I move on to more complex things like proper alias files and
DNA support, I have a few integration-related questions:
1) How does adding code to Biopython work anyway? Do I send a pull request
to the main Biopython repo?
2) What handle do you think a SeqIO-compatible module should take, if any?
A BLAST database is at least 3 files (if there are no aliases and no
additional indexes).
3) Is there any need to even bother with SeqIO.write()? Writing one
sequence is a major hassle, and no one is going to generate these databases
directly from Python when there is Bio.Applications.
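
A minimal sketch of the workaround and of question 2, assuming the usual
protein database layout in which each volume consists of a .pin index, a
.phr header file and a .psq sequence file sharing one base name. The helper
name find_protein_volumes and the idea of passing a base path instead of a
file handle are illustrative assumptions, not existing Biopython API:

```python
import glob
import os

# The three files that make up one protein BLAST database volume.
PROTEIN_EXTENSIONS = (".pin", ".phr", ".psq")


def find_protein_volumes(directory, prefix):
    """Return sorted base paths of complete volumes (hypothetical helper).

    Matches both a single-volume database (prefix.pin) and a
    multi-volume one (prefix.00.pin, prefix.01.pin, ...).
    """
    candidates = glob.glob(os.path.join(directory, prefix + ".pin"))
    candidates += glob.glob(os.path.join(directory, prefix + ".*.pin"))
    volumes = []
    for index_path in sorted(set(candidates)):
        base = index_path[: -len(".pin")]
        # Keep a volume only if all three of its files are present.
        if all(os.path.exists(base + ext) for ext in PROTEIN_EXTENSIONS):
            volumes.append(base)
    return volumes
```

Each returned base path could then be handed to the reader as one
independent database, which sidesteps alias parsing for the time being.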

2015-03-11 11:54 GMT+08:00 Alexey Morozov <alexeymorozov1991 at gmail.com>:

> Yes, DNA support, alias files and so on will be added. I'm not sure about
> indexing, though. That may be a nice option to have, but definitely not
> default.
>
> 2015-03-10 17:17 GMT+08:00 Peter Cock <p.j.a.cock at googlemail.com>:
>
>> Hi Alexey,
>>
>> I think this would be a very interesting addition to Bio.SeqIO,
>> initially as a simple iterator for parsing the records one by one
>> (like a binary FASTA file). With more work, if we can decode
>> the BLAST lookup files, efficient random access via the
>> Bio.SeqIO.index interface would be great.
>>
>> Your code currently only looks at protein databases - have
>> you tried nucleotide BLAST databases yet? I would expect
>> the basic structure to be quite similar...
>>
>> Also, for many uses parsing *.pal (and *.nal) alias files would
>> be important (e.g. accessing sequences from the nr/nt
>> databases).
>>
>> Once you have some functional tests, generalising Python 3
>> code to also run on Python 2.6 and 2.7 is not too hard - and
>> we would need to do that for inclusion in Biopython. See
>> the code in SffIO.py for another example which deals with
>> a binary sequence file format:
>>
>> https://github.com/biopython/biopython/blob/master/Bio/SeqIO/SffIO.py
>>
>> Peter
>>
>>
>>
>> On Tue, Mar 10, 2015 at 7:59 AM, Alexey Morozov
>> <alexeymorozov1991 at gmail.com> wrote:
>> > Dear colleagues,
>> > I have a pretty crude pure Python 3 module that reads binary BLAST
>> > databases (https://github.com/SynedraAcus/BinaryBlast). Is there any
>> > chance it'll make it into the main distribution as a SeqIO subclass if
>> > I add iterator behaviour and fix the dirty hacks?
>> >
>> > --
>> > Alexey Morozov,
>> > LIN SB RAS, bioinformatics group.
>> > Irkutsk, Russia.
>> >
>> > _______________________________________________
>> > Biopython-dev mailing list
>> > Biopython-dev at mailman.open-bio.org
>> > http://mailman.open-bio.org/mailman/listinfo/biopython-dev
>>
>
>
>
> --
> Alexey Morozov,
> LIN SB RAS, bioinformatics group.
> Irkutsk, Russia.
>



-- 
Alexey Morozov,
LIN SB RAS, bioinformatics group.
Irkutsk, Russia.