[BioPython] Parsing...

Brad Chapman chapmanb at uga.edu
Tue Jun 17 17:43:10 EDT 2003


Hi Uriel;

> i am working and searching informations on Blastp and PSIBlast.
> in order to compare Blast Results with Clustal Results, i need to parse Blast 
> output files and obtain completes sequences (not only hits locations).
> could you tell me:
>   1-Where could i find parsing blast results scripts/programs?

Biopython has BLAST parsers in Bio.Blast. The module you want to
work with is NCBIStandalone.

Documentation on using the parser, and on the record objects it
parses the results into, is in the Biopython Tutorial:

http://www.biopython.org/docs/tutorial/index.html

Section 3.1.5 would probably be a good place to start to learn how
to use the code.

>   2- is it possible to get completes sequences?

The parser can only get what is in the BLAST result file, and this
does not include the complete sequence. If you need this, the best
solution is to parse the GI (or some other identifier, depending
on the source of your database files) out of each hit and then
retrieve the full sequence from some other source.
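
As a rough sketch of the identifier step, assuming your hit titles use
the NCBI-style "gi|number|..." layout (databases from other sources will
look different), the GI can be pulled out with a regular expression. The
function name and the example title below are made up for illustration;
this is plain Python, not part of the Biopython parser.

```python
import re

# Assumption: titles follow the NCBI layout "gi|<digits>|<db>|<accession>|".
GI_PATTERN = re.compile(r"\bgi\|(\d+)")


def extract_gi(hit_title):
    """Return the GI number found in a hit title as a string, or None."""
    match = GI_PATTERN.search(hit_title)
    return match.group(1) if match else None


# Hypothetical hit title, invented for this example.
example_title = ">gi|123456|ref|XP_000001.1| hypothetical protein"
print(extract_gi(example_title))
```

Titles without a GI give None, so you can fall back to another
identifier for those hits.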

Using Biopython, there are really two different ways you could do
this. The first would be to index a FASTA file of all the sequences
and then retrieve each sequence you need from this indexed file.
Section 2.4.4 describes doing this.
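
To show the idea behind indexing, here is a minimal plain-Python sketch.
It is not the code from the tutorial section; it only assumes that the
record identifier is the first whitespace-delimited word after ">" on
each header line. The function names are hypothetical.

```python
def build_fasta_index(path):
    """Map each record identifier to the byte offset of its header line."""
    index = {}
    with open(path, "rb") as handle:
        while True:
            offset = handle.tell()
            line = handle.readline()
            if not line:
                break
            if line.startswith(b">"):
                # Assumption: identifier is the first word after ">".
                fields = line[1:].split()
                if fields:
                    index[fields[0].decode("ascii")] = offset
    return index


def fetch_sequence(path, index, identifier):
    """Return the full sequence for one identifier as a single string."""
    parts = []
    with open(path, "rb") as handle:
        handle.seek(index[identifier])
        handle.readline()  # skip the header line
        for line in handle:
            if line.startswith(b">"):
                break  # reached the next record
            parts.append(line.strip().decode("ascii"))
    return "".join(parts)
```

The index is built once by scanning the file; each lookup afterwards
seeks straight to the record instead of rereading the whole file.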

The second solution, which may be viable depending on how many
sequences you need to retrieve, would be to fetch the sequence
information from NCBI (assuming your results contain enough
information to get the GI). Section 3.4.1 of the tutorial describes
how to do this.
There is also brand new code (in CVS only as of this weekend) 
to access EUtils at NCBI, but that is not yet documented (except 
in the source files in Bio.EUtils).
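
For a sense of what such a fetch involves, here is a hedged sketch that
talks to NCBI's EFetch service over plain HTTP using only the standard
library. It is not the Bio.EUtils code and not the tutorial's method.
The function names are hypothetical, the "protein" database is an
assumption about your data, and the identifier is whatever you parsed
out of your BLAST hits.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base address of NCBI's EFetch service.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(identifier, db="protein"):
    """Build an EFetch URL requesting one record in FASTA text format."""
    params = {
        "db": db,            # assumption: protein hits; change as needed
        "id": identifier,    # GI or accession parsed from the BLAST hit
        "rettype": "fasta",
        "retmode": "text",
    }
    return EFETCH_URL + "?" + urlencode(params)


def fetch_fasta(identifier, db="protein"):
    """Download one FASTA record from NCBI (requires network access)."""
    with urlopen(build_efetch_url(identifier, db)) as response:
        return response.read().decode("utf-8")
```

If you have many sequences to retrieve, space the requests out so you
do not overload NCBI's servers; for large batches the indexed FASTA
file is likely the better choice.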

Hope this helps.
Brad

