[Biopython-dev] Documentation for new Blast parser

Peter biopython-dev at maubp.freeserve.co.uk
Sat Mar 10 07:24:33 EST 2007


Michiel Jan Laurens de Hoon wrote:
> Hi everybody,
> 
> For the upcoming Biopython release, I rewrote the chapter on Blast in 
> the tutorial to describe our new Blast parser. For those of you who want 
> to have a preview, I put a copy here:
> 
> http://biopython.org/DIST/docs/tutorial/Tutorial-new.html
> 
> Please let me know if you have any comments, or if you find any errors 
> or omissions.

Good work - I've made a few notes.

--------------------------------------------------------------------

In section "3.1  Running BLAST locally" I would also stress that this 
is your only choice if you are using "private" data, for example 
unpublished data from a company.  You could add something like this 
after the "two advantages" of running BLAST locally:

Another reason to run BLAST locally is if you are dealing with 
proprietary or unpublished sequence data.  You may not be allowed to 
redistribute the sequences, so submitting them to the NCBI as a BLAST 
query would not be an option.

--------------------------------------------------------------------

In section "3.1  Running BLAST locally" the wording about the location 
of the database files could be a little clearer.

You wrote the following (which I have reformatted to use shorter lines):

 >>> my_blast_db = "/home/mdehoon/Data/Genomes/Databases/bsubtilis"
# I used formatdb to create a BLAST database named bsubtilis in the
# directory /home/mdehoon/Data/Genomes/Databases.
# The BLAST database consists of the files bsubtilis.nhr, bsubtilis.nin,
# and bsubtilis.nsq in this directory.

You talk about four files, but only name three of them.

I also found the path to be a little unclear... I think you meant this:

 >>> my_blast_db = "/home/mdehoon/Data/Genomes/Databases/bsubtilis"
# I used formatdb to create a BLAST database named bsubtilis
# (for Bacillus subtilis) consisting of the following four files:
# /home/mdehoon/Data/Genomes/Databases/bsubtilis.nhr
# /home/mdehoon/Data/Genomes/Databases/bsubtilis.nin
# /home/mdehoon/Data/Genomes/Databases/bsubtilis.nsq
# /home/mdehoon/Data/Genomes/Databases/bsubtilis.???

rather than the files being inside a subdirectory, bsubtilis, like this:

 >>> my_blast_db = "/home/mdehoon/Data/Genomes/Databases/bsubtilis"
# I used formatdb to create a BLAST database named bsubtilis
# (for Bacillus subtilis) consisting of the following four files:
# /home/mdehoon/Data/Genomes/Databases/bsubtilis/bsubtilis.nhr
# /home/mdehoon/Data/Genomes/Databases/bsubtilis/bsubtilis.nin
# /home/mdehoon/Data/Genomes/Databases/bsubtilis/bsubtilis.nsq
# /home/mdehoon/Data/Genomes/Databases/bsubtilis/bsubtilis.???
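
To make the first reading concrete, here is a rough sketch (not from the 
tutorial) of how the database name acts as a filename prefix rather than 
a directory.  The helper names are made up for illustration, and only 
the three extensions actually named in the tutorial are listed; the 
fourth file is left out because I don't know its name.

```python
import os

# Extensions named in the tutorial text for this nucleotide database.
# The fourth file mentioned there is not named, so it is omitted here.
NAMED_EXTENSIONS = (".nhr", ".nin", ".nsq")


def expected_db_files(db_prefix, extensions=NAMED_EXTENSIONS):
    """Return the file paths expected alongside db_prefix (hypothetical helper)."""
    # The database "name" is a path prefix: each file is prefix + extension.
    return [db_prefix + ext for ext in extensions]


def missing_db_files(db_prefix, extensions=NAMED_EXTENSIONS):
    """Return those expected files that do not exist on disk."""
    return [path for path in expected_db_files(db_prefix, extensions)
            if not os.path.isfile(path)]


my_blast_db = "/home/mdehoon/Data/Genomes/Databases/bsubtilis"
for path in expected_db_files(my_blast_db):
    print(path)
```

Something like missing_db_files(my_blast_db) could then be used as a 
quick sanity check before running BLAST.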

--------------------------------------------------------------------

I think you should include an explicit example of running standalone 
blast and getting XML files back, i.e. include this at the end of 
section 3.1 (rather than just mentioning it):

 >>> from Bio.Blast import NCBIStandalone
 >>> result_handle, error_info = NCBIStandalone.blastall(my_blast_exe,
 ...                  'blastn', my_blast_db, my_blast_file, align_view=7)
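
It might also help readers to see what this corresponds to on the 
command line.  As far as I know, align_view=7 maps to the "-m 7" option 
of blastall, which requests XML output.  The sketch below only builds 
the argument list and does not run anything; the function name and the 
example executable and query file names are placeholders I made up.

```python
def blastall_command(blast_exe, program, database, infile, align_view=7):
    """Build the blastall argument list (illustrative helper, not Biopython API)."""
    return [
        blast_exe,
        "-p", program,          # BLAST program, e.g. blastn
        "-d", database,         # database name (path prefix, no extension)
        "-i", infile,           # query file
        "-m", str(align_view),  # alignment view; 7 requests XML output
    ]


# Placeholder values: the executable and query file names are assumptions.
cmd = blastall_command("blastall", "blastn",
                       "/home/mdehoon/Data/Genomes/Databases/bsubtilis",
                       "query.fasta")
print(" ".join(cmd))
```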

I am wondering if now is a good time to switch the default output format 
to XML in NCBIStandalone.blastall, NCBIStandalone.rpsblast, etc., given 
that NCBIWWW.qblast already defaults to XML.

----------------------------------------------------------------------

There is an extra "the" at the end of the first paragraph of section 
"3.4  Parsing BLAST output":

"..., it is also much easier to parse automatically, making the 
Biopython a whole lot more stable."

Should read:

"..., it is also much easier to parse automatically, making Biopython a 
whole lot more stable."

Also, should it be "Biopython" or "BioPython"?  The website uses a mixture...

-----------------------------------------------------------------------

This email is getting a bit long - I'll read the rest of the document later.

Peter


