[Biopython-dev] Blast records
Peter
biopython at maubp.freeserve.co.uk
Tue Sep 22 07:40:46 EDT 2009
On Tue, Sep 22, 2009 at 11:12 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> Hi everybody,
>
> When I was checking the current behavior of Biopython's blast parsers,
> I noticed that the plain-text parser and the XML parser give different
> results when parsing psi-blast output. The plain-text parser returns a
> Blast.Record.PSIBlast object, whereas the XML parser returns
> Blast.Record.Blast objects. ...
>
> Any other opinions, comments, suggestions?
As I recall (backed up by what I wrote in the tutorial), when I last
checked, the plain text PSI-BLAST output (i.e. from the command
line tool blastpgp) included a lot of information missing in the XML
output. Perhaps this has improved? If it hasn't, I am inclined to
leave things as they are. If the current PSI-BLAST outputs more
details in the XML we may be able to do a better job.
The next bit is my recollection of some of the background to this:
Classic BLAST (and also RPS-BLAST) allow multiple queries and
use the "iteration" block in the XML file for each query. This was an
odd choice of naming, but I think the XML tag was originally only
intended for the PSI-BLAST output where each "iteration" block
in the XML corresponds to each step of the algorithm. You may
recall early versions of BLAST would output "concatenated" XML
files for multiple queries - which were not true XML files. I guess
they fixed this by reusing the existing "iteration" structure for
multiple queries (rather than adding new XML tags). With this in
mind the current parsing of the XML from PSI-BLAST makes
sense.
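To make the structure concrete, here is a rough sketch using only the
Python standard library (not Biopython's own parser). The XML snippet is
hand-written and heavily trimmed, so treat it as an assumption about the
layout rather than real BLAST output; the helper name list_iterations is
made up for illustration. It only shows that every "Iteration" block
becomes one entry, whether it stands for a separate query or for a
PSI-BLAST round.

```python
import xml.etree.ElementTree as ET

# Hand-written, trimmed stand-in for BLAST XML output (illustrative only,
# not real data). Here each Iteration block represents a separate query.
SAMPLE_XML = """<BlastOutput>
  <BlastOutput_program>blastp</BlastOutput_program>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_iter-num>1</Iteration_iter-num>
      <Iteration_query-def>query_A</Iteration_query-def>
    </Iteration>
    <Iteration>
      <Iteration_iter-num>2</Iteration_iter-num>
      <Iteration_query-def>query_B</Iteration_query-def>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>
"""


def list_iterations(xml_text):
    """Return (iteration number, query description) for each Iteration block."""
    root = ET.fromstring(xml_text)
    result = []
    for block in root.iter("Iteration"):
        number = int(block.findtext("Iteration_iter-num"))
        query = block.findtext("Iteration_query-def")
        result.append((number, query))
    return result


iterations = list_iterations(SAMPLE_XML)
for number, query in iterations:
    print(number, query)
```

For PSI-BLAST output the same loop would presumably yield one entry per
round for the same query, which is why returning one record per
"Iteration" block is a consistent reading of the file.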
[In any case, I plan to release Biopython 1.52 this afternoon, with
the PSI-BLAST parsing left as it is].
Peter