[Biopython-dev] Documentation for new Blast parser
Peter
biopython-dev at maubp.freeserve.co.uk
Sat Mar 10 07:24:33 EST 2007
Michiel Jan Laurens de Hoon wrote:
> Hi everybody,
>
> For the upcoming Biopython release, I rewrote the chapter on Blast in
> the tutorial to describe our new Blast parser. For those of you who want
> to have a preview, I put a copy here:
>
> http://biopython.org/DIST/docs/tutorial/Tutorial-new.html
>
> Please let me know if you have any comments, or if you find any errors
> or omissions.
Good work - I've made a few notes.
--------------------------------------------------------------------
In section "3.1 Running BLAST locally" I would also stress the fact
that this is your only choice if you are using "private" data, for
example unpublished data from a company. e.g. add something like this
after the "two advantages" of running BLAST locally:
Another reason to run BLAST locally is if you are dealing with
proprietary or unpublished sequence data. You may not be allowed to
redistribute the sequences, so submitting them to the NCBI as a BLAST
query would not be an option.
--------------------------------------------------------------------
In section "3.1 Running BLAST locally" the wording about the location
of the database files could be a little clearer.
You wrote the following (which I have reformatted to use shorter lines):
>>> my_blast_db = "/home/mdehoon/Data/Genomes/Databases/bsubtilis"
# I used formatdb to create a BLAST database named bsubtilis in the
# directory /home/mdehoon/Data/Genomes/Databases.
# The BLAST database consists of the files bsubtilis.nhr, bsubtilis.nin,
# and bsubtilis.nsq in this directory.
You talk about four files, but only name three of them.
I also found the path to be a little unclear... I think you meant this:
>>> my_blast_db = "/home/mdehoon/Data/Genomes/Databases/bsubtilis"
# I used formatdb to create a BLAST database named bsubtilis
# (for Bacillus subtilis) consisting of the following four files:
# /home/mdehoon/Data/Genomes/Databases/bsubtilis.nhr
# /home/mdehoon/Data/Genomes/Databases/bsubtilis.nin
# /home/mdehoon/Data/Genomes/Databases/bsubtilis.nsq
# /home/mdehoon/Data/Genomes/Databases/bsubtilis.???
rather than the files being inside a subdirectory, bsubtilis, like this:
>>> my_blast_db = "/home/mdehoon/Data/Genomes/Databases/bsubtilis"
# I used formatdb to create a BLAST database named bsubtilis
# (for Bacillus subtilis) consisting of the following four files:
# /home/mdehoon/Data/Genomes/Databases/bsubtilis/bsubtilis.nhr
# /home/mdehoon/Data/Genomes/Databases/bsubtilis/bsubtilis.nin
# /home/mdehoon/Data/Genomes/Databases/bsubtilis/bsubtilis.nsq
# /home/mdehoon/Data/Genomes/Databases/bsubtilis/bsubtilis.???
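A quick way to tell the two layouts apart is to list the files that share the
database name as a prefix. The following is only a rough sketch using the
standard library: the helper list_db_files and the throwaway demo directory
are made up for illustration, and only the three file extensions named above
are created.

```python
import glob
import os
import tempfile


def list_db_files(db_prefix):
    """Return the sorted paths of files named db_prefix plus '.' and a suffix."""
    # glob.escape protects any glob metacharacters in the prefix itself.
    return sorted(glob.glob(glob.escape(db_prefix) + ".*"))


# Build a throwaway directory that mimics the first (flat) layout,
# with the database files sitting directly in the directory.
demo_dir = tempfile.mkdtemp()
for ext in ("nhr", "nin", "nsq"):
    open(os.path.join(demo_dir, "bsubtilis." + ext), "w").close()

# The database "name" is a path prefix, not a directory.
my_blast_db = os.path.join(demo_dir, "bsubtilis")
found = [os.path.basename(path) for path in list_db_files(my_blast_db)]
print(found)

# With the flat layout, nothing matches the subdirectory-style prefix.
nested = list_db_files(os.path.join(demo_dir, "bsubtilis", "bsubtilis"))
print(nested)
```

If the first list is empty but the second is not, the files live in a
subdirectory and the prefix passed as the database name needs to include it.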
--------------------------------------------------------------------
I think you should include an explicit example of running standalone
blast and getting XML files back, i.e. include this at the end of
section 3.1 (rather than just mentioning it):
>>> from Bio.Blast import NCBIStandalone
>>> result_handle, error_info = NCBIStandalone.blastall(my_blast_exe,
...     'blastn', my_blast_db, my_blast_file, align_view=7)
I am wondering if now is a good time to switch the default output format
to XML in NCBIStandalone.blastall, NCBIStandalone.rpsblast, etc., given
that NCBIWWW.qblast already defaults to XML.
----------------------------------------------------------------------
There is an extra "the" at the end of the first paragraph of section
"3.4 Parsing BLAST output":
"..., it is also much easier to parse automatically, making the
Biopython a whole lot more stable."
Should read:
"..., it is also much easier to parse automatically, making Biopython a
whole lot more stable."
Also, should it be "Biopython" or "BioPython"? The website uses a mixture...
-----------------------------------------------------------------------
This email is getting a bit long - I'll read the rest of the document later.
Peter