[BioPython] import Standalone problems

Jacob Joseph jmjoseph at andrew.cmu.edu
Wed Jul 19 19:18:12 UTC 2006


I do not believe the current version of the parser will work with
multiple queries using recent versions of BLAST, regardless of the output
format.  I do know that XML output from blastall 2.2.13 works with the
parser corrections previously attached.  I have attached a further
updated NCBIXML.py, fixing the performance issues in parse() that I
mentioned.

-Jacob
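
[Editor's sketch] The quoted message below notes that BLAST 2.2.13 writes a separate <?xml> block for each query. One way to iterate over such output is to cut the text at each XML declaration and parse every piece on its own. This is a minimal illustration using only the Python standard library, not the method used in the attached NCBIXML.py; the helper names and the tiny SAMPLE string are made up for the example, and real BLAST output contains far more elements.

```python
import re
import xml.etree.ElementTree as ET


def split_xml_documents(text):
    """Yield each XML document found in a string of concatenated documents."""
    # Offsets of every XML declaration; each one starts a new document.
    starts = [m.start() for m in re.finditer(r"<\?xml\b", text)]
    for begin, end in zip(starts, starts[1:] + [len(text)]):
        chunk = text[begin:end].strip()
        if chunk:
            yield chunk


def query_names(text):
    """Return the query definition of each concatenated BLAST document."""
    names = []
    for doc in split_xml_documents(text):
        root = ET.fromstring(doc)
        names.append(root.findtext("BlastOutput_query-def"))
    return names


# Hypothetical, heavily trimmed stand-in for multi-query XML output.
SAMPLE = (
    '<?xml version="1.0"?>\n'
    "<BlastOutput><BlastOutput_query-def>P1</BlastOutput_query-def></BlastOutput>\n"
    '<?xml version="1.0"?>\n'
    "<BlastOutput><BlastOutput_query-def>P2</BlastOutput_query-def></BlastOutput>\n"
)

if __name__ == "__main__":
    print(query_names(SAMPLE))
```

Output that lacks the repeated <?xml> declarations, as reported for 2.2.14, would come back as a single chunk here, which is why a different iteration method is needed for it.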

Rohini Damle wrote:
> Hi,
> Can someone suggest which version of BLAST Biopython's
> (text or XML) parser works well with?
> I will download that BLAST version locally so that I can use Biopython's parser.
> Thanks,
> Rohini
> 
> On 7/18/06, Jacob Joseph <jacob at jjoseph.org> wrote:
>> Hi.
>> I encountered similar difficulties over the past few days myself and
>> have made some improvements to the XML parser.  Well, that is, it now
>> functions with blastall, but I have made no effort to parse the other
>> blast programs.  I do not expect I have done any harm to other parsing,
>> however.
>>
>> Attached are Record.py, NCBIStandalone.py, and NCBIXML.py.  I have not
>> yet spent significant time to clean up my changes.  Without getting into
>> specific modifications, I have made an effort to make consistent the
>> variables in Record and NCBIXML, focusing primarily on what I needed
>> this week.
>>
>> One portion I am not settled on is the reinitialization of Record.Blast at
>> every call to iterator.next(), and, by extension, BlastParser.parse().
>> See NCBIXML.py, line 114.  Without re-initializing this class, we run
>> the risk of retaining portions of a Record from previously parsed
>> queries.  This causes bug 1970, mentioned below.  Unfortunately,
>> this re-initialization exacts a significant performance penalty of at
>> least a factor of 10 by some rough measures.  I would appreciate any
>> suggestions for improvement here.
>>
>> I do apologize for not being more specific about my changes.  When I get
>> a chance (next week?), I will package them up as a proper patch and file
>> a bug.  Perhaps what I have done so far will be of use until then.
>>
>> FYI, I have done all of my testing with BLAST 2.2.13.  2.2.14 seems
>> not to have separate <?xml> blocks within its output, requiring a different
>> method of iteration.
>>
>> -Jacob
>>
>> Peter wrote:
>> > Rohini Damle wrote:
>> >> Hi,
>> >> I have an XML file with 4 BLAST records (for proteins P1, P2, P3, P4)
>> >> I am trying to extract alignment information for each of them.
>> >> So I wrote the following code:
>> >>
>> >>  E_VALUE_THRESH = 20
>> >>
>> >>  for b_record in b_iterator:
>> >>      for alignment in b_record.alignments:
>> >>          for hsp in alignment.hsps:
>> >>              # Report only HSPs below the E-value cutoff.
>> >>              if hsp.expect < E_VALUE_THRESH:
>> >>                  print '****Alignment****'
>> >>                  print 'sequence:', alignment.title.split()[0]
>> >>
>> >> With this code, I am getting information for P1,
>> >> then information for P1+P2,
>> >> then for P1+P2+P3,
>> >> and finally for P1+P2+P3+P4.
>> >> Why is this so?
>> >> Is there something wrong with the looping?
>> >
>> > I'm aware of something funny with the XML parsing, Bug 1970, which might
>> > well be the same issue:
>> >
>> > http://bugzilla.open-bio.org/show_bug.cgi?id=1970
>> >
>> > I confess I haven't looked into exactly what is going wrong here - too
>> > many other demands on my time to learn about XML and how BioPython
>> > parses it.
>> >
>> > Does the workaround on the bug report help?  Depending on which version
>> > of standalone blast you have installed, you might have better luck with
>> > plain text output - the trouble is this is a moving target and the NCBI
>> > keeps tweaking it.
>> >
>> > Peter
>> >

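[Editor's sketch] The accumulation reported in the quoted thread (P1, then P1+P2, and so on) is what one would expect when a single record object is reused across queries, and the reinitialization Jacob describes corresponds to building a fresh object per query. The toy below illustrates only that difference. The Record class, both parser functions and the QUERIES data are hypothetical stand-ins, not Bio.Blast.Record or the NCBIXML code, and the example says nothing about the performance cost mentioned in the thread.

```python
class Record:
    """Toy stand-in for a parsed BLAST record (hypothetical)."""

    def __init__(self):
        self.alignments = []


def parse_reusing_record(queries):
    """Reuse one Record for every query, so earlier hits are retained."""
    record = Record()
    for hits in queries:
        record.alignments.extend(hits)
        yield record


def parse_fresh_record(queries):
    """Create a new Record per query, so each holds only its own hits."""
    for hits in queries:
        record = Record()
        record.alignments.extend(hits)
        yield record


# Made-up hits for three queries, one hit each.
QUERIES = [["P1-hit"], ["P2-hit"], ["P3-hit"]]


def alignment_counts(parser):
    """Count the alignments seen in each record as it is yielded."""
    return [len(record.alignments) for record in parser(QUERIES)]


if __name__ == "__main__":
    print(alignment_counts(parse_reusing_record))  # counts grow: 1, 2, 3
    print(alignment_counts(parse_fresh_record))    # counts stay at 1
```

A cheaper middle ground might be to reset only the per-query fields of an existing record, but whether that helps in the real parser is an open question here.
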
-------------- next part --------------
A non-text attachment was scrubbed...
Name: NCBIXML.py.gz
Type: application/x-gzip
Size: 3209 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/biopython/attachments/20060719/a4f5167e/attachment-0002.gz>

