[BioPython] import Standalone problems
Jacob Joseph
jmjoseph at andrew.cmu.edu
Wed Jul 19 19:18:12 UTC 2006
I do not believe the current version of the parser will work with
multiple queries using recent versions of BLAST, regardless of the output
format. I do know that XML output from blastall 2.2.13 works with the
parser corrections I attached previously. I have attached a further
updated NCBIXML.py, fixing the performance issues in parse() that I
mentioned.
-Jacob
Rohini Damle wrote:
> Hi,
> Can someone suggest which version of BLAST works with Biopython's
> (text or XML) parser?
> I will download that BLAST version locally and use Biopython's parser.
> Thanks,
> Rohini
>
> On 7/18/06, Jacob Joseph <jacob at jjoseph.org> wrote:
>> Hi.
>> I encountered similar difficulties over the past few days myself and
>> have made some improvements to the XML parser. Well, that is, it now
>> functions with blastall, but I have made no effort to parse the other
>> blast programs. I do not expect I have done any harm to other parsing,
>> however.
>>
>> Attached are Record.py, NCBIStandalone.py, and NCBIXML.py. I have not
>> yet spent significant time to clean up my changes. Without getting into
>> specific modifications, I have made an effort to make consistent the
>> variables in Record and NCBIXML, focusing primarily on what I needed
>> this week.
>>
>> One portion I am not settled on is the reinitialization of Record.Blast at
>> every call to iterator.next(), and, by extension, BlastParser.parse().
>> See NCBIXML.py, line 114. Without re-initializing this class, we run
>> the risk of retaining portions of a Record from previously parsed
>> queries. This causes the bug 1970, mentioned below. Unfortunately,
>> this re-initialization exacts a significant performance penalty of at
>> least a factor of 10 by some rough measures. I would appreciate any
>> suggestions for improvement here.
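
A minimal sketch of the trade-off described in the paragraph above, assuming nothing about the real classes: `ToyRecord`, `ReusingParser`, and `FreshParser` are hypothetical stand-ins, not Biopython's Record.Blast or BlastParser. It only illustrates why a record object that is reused across calls can carry hits over from earlier queries (the P1, P1+P2, ... pattern reported in this thread), and why building a fresh record per call avoids that.

```python
# Hypothetical stand-ins; these are NOT Biopython's Record.Blast or BlastParser.
class ToyRecord:
    def __init__(self):
        self.alignments = []


class ReusingParser:
    """Keeps one record for its whole lifetime, so hits accumulate."""

    def __init__(self):
        self._record = ToyRecord()

    def parse(self, hits):
        # Nothing is cleared here, so earlier hits stay in the record.
        self._record.alignments.extend(hits)
        return self._record


class FreshParser:
    """Builds a new record on every call, so each query stands alone."""

    def parse(self, hits):
        record = ToyRecord()
        record.alignments.extend(hits)
        return record


def alignments_per_query(parser, queries):
    # Snapshot each record's alignments at the moment it is returned,
    # which is what a caller printing results inside a loop would see.
    return [list(parser.parse(hits).alignments) for hits in queries]


queries = [["P1"], ["P2"], ["P3"], ["P4"]]
reused = alignments_per_query(ReusingParser(), queries)
fresh = alignments_per_query(FreshParser(), queries)
```

In this toy, `reused` grows with every query while `fresh` holds one entry per query. Whether a cheaper reset of only the per-query fields would recover the lost performance in the real parser is not something this sketch can show.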
>>
>> I do apologize for not being more specific about my changes. When I get
>> a chance (next week?), I will package them up as a proper patch and file
>> a bug. Perhaps what I have done so far will be of use until then.
>>
>> FYI, I have done all of my testing with BLAST 2.2.13. Version 2.2.14
>> does not seem to have separate <?xml> blocks within its output, which
>> requires a different method of iteration.
>>
>> -Jacob
>>
>> Peter wrote:
>> > Rohini Damle wrote:
>> >> Hi,
>> >> I have an XML file with 4 blast records (for proteins P1, P2, P3, P4)
>> >> I am trying to extract alignment information for each of them.
>> >> So I wrote the following code:
>> >>
>> >> for b_record in b_iterator:
>> >>     E_VALUE_THRESH = 20
>> >>     for alignment in b_record.alignments:
>> >>         for hsp in alignment.hsps:
>> >>             if hsp.expect < E_VALUE_THRESH:
>> >>                 print '****Alignment****'
>> >>                 print 'sequence:', alignment.title.split()[0]
>> >>
>> >> With this code, I am getting information for P1,
>> >> then information for P1 + P2
>> >> then for P1+P2 +P3
>> >> and finally for P1+P2+P3+P4
>> >> Why is this so?
>> >> Is there something wrong with the looping?
>> >
>> > I'm aware of something funny with the XML parsing, Bug 1970, which
>> > might well be the same issue:
>> >
>> > http://bugzilla.open-bio.org/show_bug.cgi?id=1970
>> >
>> > I confess I haven't looked into exactly what is going wrong here - too
>> > many other demands on my time to learn about XML and how BioPython
>> > parses it.
>> >
>> > Does the workaround on the bug report help? Depending on which
>> > version of standalone blast you have installed, you might have better
>> > luck with plain text output - the trouble is this is a moving target
>> > and the NCBI keeps tweaking it.
>> >
>> > Peter
>> >
>> > _______________________________________________
>> > BioPython mailing list - BioPython at lists.open-bio.org
>> > http://lists.open-bio.org/mailman/listinfo/biopython
-------------- next part --------------
A non-text attachment was scrubbed...
Name: NCBIXML.py.gz
Type: application/x-gzip
Size: 3209 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/biopython/attachments/20060719/a4f5167e/attachment-0002.gz>