[BioPython] Problem parsing Blast XML output from different sources

Michiel de Hoon mdehoon at c2b2.columbia.edu
Sun Oct 8 04:51:09 UTC 2006


Hi Steffi,

I am trying to replicate this problem with Blast. Where did you get the
pat database? I searched for it with Google, but there seems to be more
than one Blast database called pat.

--Michiel.

Steffi Gebauer-Jung wrote:
> Hello,
> 
> I don't know what local databases you have available for testing.
> The discrepancy between the XML and 'pairwise text' output should be seen
> for every Plus/Minus HSP created by local Blastn (local server or
> standalone blastall from the command line; I use version 2.2.14).
> 
> I tried several combinations, one is M38240 vs. pat database,
> the hsp hit was BD298385.
> Here are the interesting output snippets:
> 
> >dbj|BD298385.1|
> <http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=Nucleotide&list_uids=92136243&dopt=GenBank>
> CLEAN SYNTHETIC VECTORS, PLASMIDS, TRANSGENIC PLANTS AND PLANT PARTS
>            CONTAINING THEM, AND METHODS FOR OBTAINING THEM
>          Length = 14108
> 
> Score =  125 bits (63), Expect = 1e-25
> Identities = 63/63 (100%)
> Strand = Plus / Minus
> 
> Query: 727  aatgaagactaatctttttctctttctcatcttttcacttctcctatcattatcctcggc 786
>             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
> Sbjct: 8332 aatgaagactaatctttttctctttctcatcttttcacttctcctatcattatcctcggc 8273
> 
> Query: 787  cga 789
>             |||
> Sbjct: 8272 cga 8270
> 
> =====================================================
>        <Hit>
>          <Hit_num>15</Hit_num>
>          <Hit_id>gi|92136243|dbj|BD298385.1|</Hit_id>
>          <Hit_def>CLEAN SYNTHETIC VECTORS, PLASMIDS, TRANSGENIC PLANTS AND PLANT PARTS CONTAINING THEM, AND METHODS FOR OBTAINING THEM</Hit_def>
>          <Hit_accession>BD298385</Hit_accession>
>          <Hit_len>14108</Hit_len>
>          <Hit_hsps>
>            <Hsp>
>              <Hsp_num>1</Hsp_num>
>              <Hsp_bit-score>125.381</Hsp_bit-score>
>              <Hsp_score>63</Hsp_score>
>              <Hsp_evalue>9.63859e-26</Hsp_evalue>
>              <Hsp_query-from>789</Hsp_query-from>
>              <Hsp_query-to>727</Hsp_query-to>
>              <Hsp_hit-from>8270</Hsp_hit-from>
>              <Hsp_hit-to>8332</Hsp_hit-to>
>              <Hsp_query-frame>1</Hsp_query-frame>
>              <Hsp_hit-frame>-1</Hsp_hit-frame>
>              <Hsp_identity>63</Hsp_identity>
>              <Hsp_positive>63</Hsp_positive>
>              <Hsp_align-len>63</Hsp_align-len>
>              <Hsp_qseq>TCGGCCGAGGATAATGATAGGAGAAGTGAAAAGATGAGAAAGAGAAAAAGATTAGTCTTCATT</Hsp_qseq>
>              <Hsp_hseq>TCGGCCGAGGATAATGATAGGAGAAGTGAAAAGATGAGAAAGAGAAAAAGATTAGTCTTCATT</Hsp_hseq>
>              <Hsp_midline>|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||</Hsp_midline>
>            </Hsp>
>          </Hit_hsps>
>        </Hit>
> 
> Thanks, Steffi
> 
> Michiel Jan Laurens de Hoon wrote:
> 
>> Which sequence are you running blast on?
>> I'd like to try this on our local blast installation.
>>
>> --Michiel.
>>
>> Steffi Gebauer-Jung wrote:
>>
>>> Hello,
>>>
>>> because blastall 2.2.14 output could not be parsed by the
>>> Bio.Blast.NCBIStandalone parser,
>>> I tried to switch to the recommended Bio.Blast.NCBIXML parser.
>>>
>>> In doing so I found that the XML output of the locally installed
>>> standalone blastall (2.2.14)
>>> differs from the web XML output.
>>>
>>> For BlastN hsps on Plus/Minus strands, the xml gives
>>> query_frame/hit_frame  1 / -1 as usual.
>>> But query and hit positions and sequences are switched in direction
>>> (would match frames -1/1).
>>>
>>> As the Bio.Blast.Record returned by the NCBIXML parser only gives
>>> frames, sequences, and start positions, it is not possible (without
>>> knowing the source of the XML file) to be sure of finding the right data.
>>>
>>> This is clearly a problem of Blast.
>>> But because of the missing end positions in the returned record object
>>> it becomes a problem for users of the parser too.
>>>
>>> Could somebody try to confirm the different behaviour of the XML
>>> blast output with his/her own examples/installation?
>>>
>>> Thanks, Steffi
> 
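
Since the XML carries both the from and to coordinate of each HSP, the orientation can in principle be read from the coordinate order instead of from the frame fields. Below is a minimal sketch of that idea, assuming a trimmed copy of the <Hsp> element quoted above and using only xml.etree.ElementTree from the standard library rather than the Bio.Blast.NCBIXML parser. The helper name hsp_orientation and the shape of its return value are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <Hsp> element quoted above (standalone blastall 2.2.14).
HSP_XML = """
<Hsp>
  <Hsp_num>1</Hsp_num>
  <Hsp_query-from>789</Hsp_query-from>
  <Hsp_query-to>727</Hsp_query-to>
  <Hsp_hit-from>8270</Hsp_hit-from>
  <Hsp_hit-to>8332</Hsp_hit-to>
  <Hsp_query-frame>1</Hsp_query-frame>
  <Hsp_hit-frame>-1</Hsp_hit-frame>
</Hsp>
"""


def hsp_orientation(hsp):
    """Hypothetical helper: derive strands from coordinate order, not frames."""

    def span(prefix):
        # Read the from/to pair for "query" or "hit" and normalise it.
        start = int(hsp.findtext(f"Hsp_{prefix}-from"))
        end = int(hsp.findtext(f"Hsp_{prefix}-to"))
        strand = "Plus" if start <= end else "Minus"
        return strand, min(start, end), max(start, end)

    query = span("query")
    hit = span("hit")
    return {
        "query": query,
        "hit": hit,
        # Frames exactly as written in the file, kept only for comparison.
        "frames": (
            int(hsp.findtext("Hsp_query-frame")),
            int(hsp.findtext("Hsp_hit-frame")),
        ),
        # True when query and hit run in opposite directions.
        "opposite_strands": query[0] != hit[0],
    }


result = hsp_orientation(ET.fromstring(HSP_XML))
print(result)
```

For the HSP above, the coordinates put the query on the reversed side and the hit on the forward side, while the frame fields say 1 / -1. The normalised (low, high) ranges and the opposite_strands flag come out the same under either convention, so a caller does not need to know which source produced the file.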



