[Biopython] Problems parsing with PSIBlastParser
Chris Fields
cjfields at illinois.edu
Tue Nov 3 13:16:02 UTC 2009
Peter,
On Nov 3, 2009, at 6:52 AM, Peter wrote:
> On Fri, Oct 16, 2009 at 1:04 AM, Michiel de Hoon
> <mjldehoon at yahoo.com> wrote:
>>
>> Last time I checked (which was a few weeks ago), a multiple-query PSIBlast
>> search gives a file consisting of concatenated XML files. The problem is in
>> the design of Blast XML output. For a single-query PSIBlast, the fields under
>> <BlastOutput_iterations> are used to store the output of the PSIBlast iterations.
>> For multiple-query regular Blast, the same fields are used to store the search
>> results of each query. With multiple-query PSIBlast, there is then no way to
>> store the output in the current XML format. I've been meaning to write to NCBI
>> about this, but I haven't gotten round to it yet. Will do so this weekend.
>>
>> --Michiel.
>
> Did you get any reply?
>
> Peter
We had the same problem w/ the BioPerl XML parser and ended up
preprocessing the data into separate XML files, carrying over the
relevant information into each file (yes, there is a better way, but
it essentially involves a redesign of the XML parser and related
objects).
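
A minimal sketch of that preprocessing step, in Python, in case it is useful
on the Biopython side. It assumes each concatenated report starts with its
own "<?xml" declaration; the function name split_concatenated_xml is made up
for illustration and is not part of BioPerl or Biopython. It does not carry
shared header information over between reports, which would still have to
be handled separately.

```python
import re

# Assumption: every concatenated report begins with its own XML declaration.
XML_DECL = re.compile(r"<\?xml\s")


def split_concatenated_xml(text):
    """Split a string of concatenated XML documents into a list of documents.

    Anything before the first XML declaration is discarded. If no
    declaration is found, the stripped input is returned as a single item.
    """
    starts = [m.start() for m in XML_DECL.finditer(text)]
    if not starts:
        stripped = text.strip()
        return [stripped] if stripped else []
    # Pair each start offset with the next one (or the end of the text).
    bounds = starts + [len(text)]
    return [text[a:b].strip() for a, b in zip(bounds, bounds[1:])]
```

Each returned string can then be written to its own file or handed to an
XML parser on its own.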
BTW, the same thing happens if one runs multiple queries in the same
file. All of the individual XML reports end up in one single XML file,
and information relevant to all reports is only found in the first
report. I think this has been known for a while. I've repeatedly
tried contacting NCBI but haven't had a response re: this problem.
chris