[BioPython] Problem parsing Blast XML output from different sources

Steffi Gebauer-Jung gebauer-jung at ice.mpg.de
Fri Oct 13 13:23:35 UTC 2006


Hello Michiel,

the fix works fine.
Thanks for the fast reply and for fixing it!

Maybe there should be a hint for other users
not to rely on the frame information in the BLAST XML output,
but to check the start/end positions of the HSP sequences instead,
and to be aware that query sequences may be reversed.
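
A minimal sketch of such a check, assuming that a reversed sequence shows up
in the XML output as a start position greater than its end position. The
Hsp tuple below is a hypothetical stand-in for a parsed HSP record; it
mirrors the hsp.query_end and hsp.sbjct_end attributes from the fix and
assumes query_start and sbjct_start exist alongside them. The coordinates
are made up for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in for a parsed HSP; only the four coordinate
# attributes are used here.
Hsp = namedtuple("Hsp", "query_start query_end sbjct_start sbjct_end")


def strand(start, end):
    """Return '+' if the coordinates ascend, '-' if they descend."""
    return "+" if start <= end else "-"


def hsp_strands(hsp):
    """Return (query strand, subject strand) derived from positions only."""
    return (strand(hsp.query_start, hsp.query_end),
            strand(hsp.sbjct_start, hsp.sbjct_end))


# Made-up coordinates for a Plus / Minus hit.
example = Hsp(query_start=1, query_end=120, sbjct_start=480, sbjct_end=361)
print(hsp_strands(example))  # ('+', '-')
```

Deriving the orientation from the positions alone avoids any dependence on
the frame fields, which is the point of the hint above.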

For my purposes I need the query sequence in the forward direction.
That's why I reverse-complement the complete alignment
whenever this is not already the case.

While doing so I found that Bio.Seq.Seq.complement() cannot handle unicode
sequences, even though Bio.Seq.Seq can be initialized with unicode strings:
 >>> import Bio.Seq
 >>> s = Bio.Seq.Seq(u'acgt')
 >>> s
Seq(u'acgt', Alphabet())
 >>> s.complement()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.5/site-packages/Bio/Seq.py", line 101, in complement
    s = self.data.translate(ttable)
TypeError: character mapping must return integer, None or unicode
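
The error message suggests that a translation table built for byte strings
is being handed to unicode.translate(), which expects a mapping from code
points instead. The following is a sketch of a complement that works on
unicode strings under that assumption. It is not Biopython's code; the names
COMPLEMENT and complement_text are hypothetical.

```python
# Complement table keyed by code point. Characters not listed here
# (for example '-' or 'n') are left unchanged by translate().
COMPLEMENT = dict(
    (ord(a), ord(b)) for a, b in zip(u"acgtACGT", u"tgcaTGCA")
)


def complement_text(text):
    """Complement a unicode nucleotide string (no reversal)."""
    return text.translate(COMPLEMENT)


print(complement_text(u"acgt"))  # tgca
```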

And just another idea:
In order to (reverse-)complement aligned sequences it would be useful
to have the gap character '-' in the alphabets.
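
Until then, a gapped alignment can be flipped with plain string handling.
This is only a sketch: the helpers revcomp_gapped and revcomp_alignment are
hypothetical and not part of Biopython, and the three strings are a made-up
alignment (query row, match line, subject row).

```python
# Hypothetical helpers, not part of Biopython.
PAIRS = {"a": "t", "c": "g", "g": "c", "t": "a",
         "A": "T", "C": "G", "G": "C", "T": "A"}


def revcomp_gapped(seq):
    """Reverse-complement one aligned row, keeping '-' (and any other
    unlisted symbol) as it is."""
    return "".join(PAIRS.get(ch, ch) for ch in reversed(seq))


def revcomp_alignment(query, match, sbjct):
    """Flip a pairwise alignment; the match line is only reversed."""
    return revcomp_gapped(query), match[::-1], revcomp_gapped(sbjct)


q, m, s = revcomp_alignment("ACG-TT", "||| | ", "ACGAT-")
print(q)  # AA-CGT
print(m)  #  | |||
print(s)  # -ATCGT
```

The start and end positions of both sequences would have to be swapped as
well so that they still describe the flipped rows.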

Steffi



Michiel Jan Laurens de Hoon wrote:

> Hi Steffi,
>
> I had the same result when running Blast locally.
>
> I added hsp.query_end and hsp.sbjct_end to the Blast XML parser, so 
> you can get around this problem. Could you try the fixed Blast parser?
> You'll need to pick up Bio/Blast/NCBIXML.py and Bio/Blast/Record.py from
> http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Blast/?cvsroot=biopython 
>
>
> If it works fine (or if it doesn't), please send a message to the 
> Biopython mailing list (instead of my email address), so that this 
> gets into the mailing list archives.
>
> --Michiel.
>
>
> Steffi Gebauer-Jung wrote:
>
>> Hello,
>>
>> the db was downloaded from ftp://ftp.ncbi.nih.gov//blast/db/patnt.tar.gz
>>
>> In fact the special query sequence and db shouldn't matter.
>>
>> If you have any 'Plus / Minus' HSP in a pairwise BlastN output
>> you can run BlastN again in order to get the xml formatted output.
>>
>> Comparing the special HSP in both formats you should see the effect.
>
>
>



