[Biopython] Problems parsing with PSIBlastParser

Tue Nov 3 13:16:02 UTC 2009

Peter,

On Nov 3, 2009, at 6:52 AM, Peter wrote:

> On Fri, Oct 16, 2009 at 1:04 AM, Michiel de Hoon
> <mjldehoon at yahoo.com> wrote:
>>
>> Last time I checked (which was a few weeks ago), a multiple-query
>> PSIBlast search gives a file consisting of concatenated XML files.
>> The problem is in the design of Blast XML output. For a single-query
>> PSIBlast, the fields under <BlastOutput_iterations> are used to store
>> the output of the PSIBlast iterations. For multiple-query regular
>> Blast, the same fields are used to store the search results of each
>> query. With multiple-query PSIBlast, there is then no way to store
>> the output in the current XML format. I've been meaning to write to
>> NCBI about this, but I haven't gotten round to it yet. Will do so
>> this weekend.
>>
>> --Michiel.
>
> Did you get any reply?
>
> Peter

We had the same problem w/ the BioPerl XML parser and ended up  
preprocessing the data into separate XML files, carrying over the  
relevant information into each file (yes, there is a better way, but  
it essentially involves a redesign of the XML parser and related  
objects).
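
A minimal sketch of that kind of preprocessing, in Python and using only
the standard library. It assumes every concatenated report begins with
its own "<?xml " declaration; the function names are made up for
illustration, and it does not carry shared information over from the
first report into the later ones.

```python
import xml.etree.ElementTree as ET


def split_concatenated_xml(text):
    """Yield each XML document found in a string of concatenated documents.

    Assumption: every document starts with its own '<?xml ' declaration.
    """
    marker = "<?xml "
    starts = []
    pos = text.find(marker)
    while pos != -1:
        starts.append(pos)
        pos = text.find(marker, pos + len(marker))
    # Each document runs from its declaration to the next one (or to the end).
    for begin, end in zip(starts, starts[1:] + [len(text)]):
        chunk = text[begin:end].strip()
        if chunk:
            yield chunk


def parse_each(text):
    """Parse every split document separately and return the root elements."""
    return [ET.fromstring(chunk) for chunk in split_concatenated_xml(text)]
```

Each chunk could instead be written to its own file and handed to
whichever BLAST XML parser is in use.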

BTW, the same thing happens if one runs multiple queries in the same
file. All the individual XML reports end up in one single XML file, and
information relevant to all reports is found only in the first
report. I think this has been known for a while. I've repeatedly
tried contacting NCBI but haven't had a response re: this problem.

chris