[Biopython-dev] Binary blast DB reader

Peter Cock p.j.a.cock at googlemail.com
Fri Mar 13 11:29:35 UTC 2015

On Fri, Mar 13, 2015 at 11:06 AM, Alexey Morozov
<alexeymorozov1991 at gmail.com> wrote:
> I got the iterator support and random access working. Alias support can also
> be worked around by treating each of the nr.*.pin databases as a separate DB.
> So before I go on to more complex stuff like proper alias support and DNA
> databases, I have a few integration-related questions:


> 1) How does adding code to Biopython work, anyway? Do I send a pull request
> to the main Biopython repo?

Yes please.

> 2) What handle do you think a SeqIO-compatible module should take, if any?
> A BLAST database is at least 3 files (if there are no aliases and no
> additional indexes).

Unlike the rest of Bio.SeqIO, a single handle does not make sense here.

From a user perspective, I suggest we require the user to give us the
BLAST database name as used with the BLAST command line tools,
i.e. the stem of the filename without the .p?? or .n?? extension.
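To illustrate how a database name (the stem) maps to files on disk, here
is a minimal sketch. It assumes a single-volume database with no alias
file (.pal/.nal) and checks only the three core files of each type
(.pin/.phr/.psq for protein, .nin/.nhr/.nsq for nucleotide). The function
name find_blast_db_files is hypothetical, not an existing Biopython API.

```python
import os

# Core files of a single-volume BLAST database (no alias file).
# Protein: .pin (index), .phr (headers), .psq (sequences)
# Nucleotide: .nin (index), .nhr (headers), .nsq (sequences)
_PROTEIN_EXTS = (".pin", ".phr", ".psq")
_NUCLEOTIDE_EXTS = (".nin", ".nhr", ".nsq")


def find_blast_db_files(stem):
    """Return (dbtype, paths) for the BLAST database named by ``stem``.

    Hypothetical helper: ``stem`` is the database name as given to the
    BLAST+ command line tools, i.e. without the .p?? / .n?? extension.
    """
    for dbtype, exts in (("prot", _PROTEIN_EXTS), ("nucl", _NUCLEOTIDE_EXTS)):
        paths = [stem + ext for ext in exts]
        # Only accept the database if every core file is present.
        if all(os.path.isfile(p) for p in paths):
            return dbtype, paths
    raise FileNotFoundError(
        "No complete protein or nucleotide BLAST database found for %r" % stem
    )
```

Multi-volume databases such as nr (nr.00.pin, nr.01.pin, ...) would need
the alias file to be parsed first, which this sketch does not attempt.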

The high-level SeqIO API accepts either a filename or a handle. We have
special-case code to determine which mode (text or binary) to open
handles with; see the variable _BinaryFormats in Bio/SeqIO/__init__.py.
We'd need something similar for BLAST databases, where we'd check that
the argument is a string and not a handle.
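A rough sketch of such a check follows. The set _NameOnlyFormats, the
format name "blastdb" and the function check_source are all hypothetical;
this is not Biopython's actual code, only an illustration of rejecting
handles for a format that spans several files.

```python
import os

# Hypothetical: formats that must be given a database name, not a handle
# (similar in spirit to the _BinaryFormats special case in Bio.SeqIO).
_NameOnlyFormats = {"blastdb"}


def check_source(source, fmt):
    """Return ``source`` unchanged, or raise TypeError if ``fmt`` needs a
    database name but a file handle (or other object) was passed."""
    if fmt in _NameOnlyFormats and not isinstance(source, (str, os.PathLike)):
        raise TypeError(
            "Format %r expects a BLAST database name (the filename stem), "
            "not a handle; got %s" % (fmt, type(source).__name__)
        )
    return source
```

Other formats pass through untouched, so handles keep working for FASTA,
GenBank and so on.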

> 3) Is there any need to even bother with SeqIO.write()?

I see no strong reason to attempt to write out a BLAST database from
within SeqIO (note the same issues apply regarding setting the filename).

> Writing one sequence is a major hassle, and no one is going to generate
> these databases directly from Python when there is Bio.Applications.

Writing a single-sequence BLAST database would be nearly useless,
but it ought to be just part of the general case of writing a BLAST
database with many sequences.

Recommending the user call makeblastdb seems much more sensible.
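For example, a user could write their records to FASTA (e.g. with
SeqIO.write(records, "my_seqs.fasta", "fasta")) and then call makeblastdb
directly. The sketch below uses plain subprocess rather than a
Bio.Applications wrapper; it assumes BLAST+ is installed and on the PATH,
and the helper names and file names are hypothetical. The -in, -dbtype,
-out and -title options are standard makeblastdb arguments.

```python
import subprocess


def makeblastdb_command(fasta_path, db_name, dbtype="prot", title=None):
    """Build the makeblastdb argument list (hypothetical helper)."""
    if dbtype not in ("prot", "nucl"):
        raise ValueError("dbtype must be 'prot' or 'nucl'")
    cmd = ["makeblastdb", "-in", fasta_path, "-dbtype", dbtype, "-out", db_name]
    if title is not None:
        cmd += ["-title", title]
    return cmd


def run_makeblastdb(fasta_path, db_name, dbtype="prot", title=None):
    """Run makeblastdb, raising CalledProcessError if it exits non-zero."""
    subprocess.run(
        makeblastdb_command(fasta_path, db_name, dbtype, title), check=True
    )
```

Keeping the command construction separate from running it makes the
argument handling easy to check without BLAST+ installed.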
