[Biopython] help with NCBIXML.parse

Peter Cock p.j.a.cock at googlemail.com
Wed Mar 28 10:19:05 EDT 2012


On Wed, Mar 28, 2012 at 3:03 PM,  <ferreirafm at usp.br> wrote:
> Quoting Peter Cock <p.j.a.cock at googlemail.com>:
>> You seem to be calling BLAST multiple times in a loop and
>> trying to give it SeqRecord objects.
>
>
> Yes, because I want only one hit per sequence. If someone has a
> workaround for this, it would be great. If I run it with a multi-FASTA file,
> I get several hits per sequence. Like this:
>
> ...

Try using the -max_target_seqs argument (e.g. -max_target_seqs 1)
to limit how many hits are reported for each query.
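
As a rough sketch of what that could look like when building the
command line by hand: the program name (blastp), the file name and
the database name below are assumptions for illustration, since the
original script is not shown in full. Only the -query, -db, -outfmt
and -max_target_seqs options of the BLAST+ tools are used.

```python
def blast_command(query_fasta, database, max_hits=1):
    """Build a BLAST+ command line that writes XML and caps hits per query."""
    return [
        "blastp",                           # assumed program; use the one you need
        "-query", query_fasta,              # may be a multi-FASTA file
        "-db", database,
        "-outfmt", "5",                     # 5 = XML output
        "-max_target_seqs", str(max_hits),  # keep at most this many hits per query
    ]


# Hypothetical file and database names, for illustration only.
cmd = blast_command("queries.fasta", "my_db")
print(" ".join(cmd))
```

Run once on the whole multi-FASTA file, this should give a single
XML file with one iteration per query, so there is no need to call
BLAST in a loop.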

>> As to the specific error, did you look at your blast_out.xml
>> file and what it said on line 88?
>
> Line 88 is a second "header" in the XML file. It seems the XML parser
> can't handle it.
>
> </BlastOutput><?xml version="1.0"?>
> <!DOCTYPE BlastOutput PUBLIC "-//NCBI//NCBI BlastOutput/EN"
> "http://www.ncbi.nlm.nih.gov/dtd/NCBI_BlastOutput.dtd">
> <BlastOutput>

That is not allowed in XML: a file may contain only one XML
declaration and one root element. On re-reading your code, I see
this happens because you are effectively concatenating the
output of several BLAST runs (via stdout) into one file.
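
To see why any XML parser will object, here is a small check using
only the Python standard library (not Biopython). The XML string is
a made-up stand-in for real BLAST output, and is_well_formed is a
helper name invented for this example.

```python
import xml.etree.ElementTree as ET

# Minimal stand-in for the XML written by a single BLAST run.
ONE_RUN = (
    '<?xml version="1.0"?>\n'
    "<BlastOutput><BlastOutput_program>blastp</BlastOutput_program></BlastOutput>"
)


def is_well_formed(text):
    """Return True if text parses as a single XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


single_ok = is_well_formed(ONE_RUN)
# Two runs written back to back, as happens when stdout is appended in a loop.
concatenated_ok = is_well_formed(ONE_RUN + ONE_RUN)
print(single_ok, concatenated_ok)
```

The second XML declaration and the second root element are what
make the concatenated text fail.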

Historically the NCBI BLAST tools used to do something like
this but with <?xml version="1.0"?> on a new line, so we do
have some special case code to cope with that. You could
try making this small change:

 outf.write(stdout)

to:

 outf.write(stdout)
 outf.write("\n")

That might work. However, it isn't an elegant solution:
if it works, it relies on special-case code in Biopython
that exists only to cope with an NCBI bug.

Instead you could parse each output inside the for loop?
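
A minimal sketch of that idea follows. To keep it runnable without
BLAST or Biopython installed, it fakes the stdout of each run and
parses it with the standard library; in your script the string
would come from the BLAST call, and you would presumably hand it to
NCBIXML.read via a StringIO handle instead. The names
fake_blast_stdout and top_hit are invented for this example.

```python
import xml.etree.ElementTree as ET


def fake_blast_stdout(hit_names):
    """Stand-in for the XML that one BLAST run prints to stdout."""
    hits = "".join("<Hit><Hit_def>%s</Hit_def></Hit>" % n for n in hit_names)
    return (
        '<?xml version="1.0"?>\n'
        "<BlastOutput><BlastOutput_iterations><Iteration><Iteration_hits>"
        + hits
        + "</Iteration_hits></Iteration></BlastOutput_iterations></BlastOutput>"
    )


def top_hit(xml_text):
    """Return the description of the first hit in one XML report, or None."""
    root = ET.fromstring(xml_text)
    hit_def = root.find(".//Hit/Hit_def")  # first match in document order
    return None if hit_def is None else hit_def.text


results = []
for hit_names in (["seqA", "seqB"], ["seqC"], []):
    stdout = fake_blast_stdout(hit_names)  # real code: stdout of one BLAST run
    results.append(top_hit(stdout))        # parse this run's output right away
print(results)
```

Each run is parsed as its own complete document, so nothing ever
has to cope with several XML headers in one file.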

Peter

