[BioPython] import Standalone problems

Jacob Joseph jmjoseph at andrew.cmu.edu
Thu Jul 20 20:05:14 UTC 2006


Great!

Can someone point me to the current maintainer of the Blast parsing package?

-Jacob

Rohini Damle wrote:
> Hi,
> I used hsp.evalue instead of hsp.expect and I am getting the desired 
> output.
> 
> Thank you very much for your help, efforts, and all those modified files.
> Rohini
> 
> On 7/20/06, Rohini Damle <rohini.damle at gmail.com> wrote:
>> Hi,
>> I am now using your updated Record.py, NCBIXML.py and NCBIStandalone.py
>> (all three updated).
>> I no longer get the previous error,
>> but I am still not getting the desired output.
>> Here is my code
>>
>> blast_out = open("C:/Documents and Settings/rdamle/My Documents/Rohini's Documents/Blast Parsing/onlymouse4proteinblastout.xml", "r")
>>
>> b_parser = NCBIXML.BlastParser()
>> b_iterator = NCBIStandalone.Iterator(blast_out, b_parser)
>> E_VALUE_THRESH = 22
>>
>> for b_record in b_iterator:
>>     for alignment in b_record.alignments:
>>         for hsp in alignment.hsps:
>>             if hsp.expect < E_VALUE_THRESH:
>>                 print b_record.query.split()[0]
>>                 print '****Alignment****'
>>                 print 'sequence:', alignment.title.split()[0]
>>
>>
>> With this code I expected to get only the alignments with
>> hsp.expect < E_VALUE_THRESH,
>>
>> but I am getting all the alignments, not just the ones with e-value < 22.
>> -Rohini.
>>
>>
>>
>>
>>
>> On 7/20/06, Jacob Joseph <jmjoseph at andrew.cmu.edu> wrote:
>> > Hi.  I suspect you are not using my updated Record.py.  You'll notice
>> > that, at least for the moment, I have changed _blast.gap_penalties from
>> > a tuple to a list, which allows assignment per item without worrying
>> > about the order of entries within the XML file.  There are other ways
>> > this could be accomplished while still using a tuple.
>> >
>> > -Jacob
>> >
>> > Rohini Damle wrote:
>> > > Hi,
>> > > When I tried your NCBIXML.py instead of the original one, I got the
>> > > following error message:
>> > >
>> > > File "C:\Python24\lib\site-packages\Bio\Blast\NCBIXML.py", line 210,
>> > > in _end_Parameters_gap_open
>> > >    self._blast.gap_penalties[0] = int(self._value)
>> > > TypeError: object does not support item assignment
>> > >
>> > > In the original version there is no "[0]" after
>> > > self._blast.gap_penalties.
>> > >
>> > > What might be causing this error?
>> > > -Rohini
>> > >
>> > > On 7/19/06, Jacob Joseph <jmjoseph at andrew.cmu.edu> wrote:
>> > >> I do not believe the current version of the parser will work with
>> > >> multiple queries using recent versions of BLAST, regardless of the
>> > >> output format.  I do know that XML output from blastall 2.2.13 works
>> > >> with the parser corrections previously attached.  I have attached a
>> > >> further updated NCBIXML.py, fixing the performance issues in parse()
>> > >> that I mentioned.
>> > >>
>> > >> -Jacob
>> > >>
>> > >> Rohini Damle wrote:
>> > >> > Hi,
>> > >> > Can someone tell me which version of BLAST Biopython's (text or
>> > >> > XML) parser works with?
>> > >> > I will download that BLAST version locally so that I can use
>> > >> > Biopython's parser.
>> > >> > Thanks,
>> > >> > Rohini
>> > >> >
>> > >> > On 7/18/06, Jacob Joseph <jacob at jjoseph.org> wrote:
>> > >> >> Hi.
>> > >> >> I encountered similar difficulties over the past few days myself and
>> > >> >> have made some improvements to the XML parser.  That is, it now
>> > >> >> functions with blastall, but I have made no effort to parse output
>> > >> >> from the other BLAST programs.  I do not expect I have done any harm
>> > >> >> to other parsing, however.
>> > >> >>
>> > >> >> Attached are Record.py, NCBIStandalone.py, and NCBIXML.py.  I have not
>> > >> >> yet spent significant time cleaning up my changes.  Without getting
>> > >> >> into specific modifications, I have made an effort to make the
>> > >> >> variables in Record and NCBIXML consistent, focusing primarily on what
>> > >> >> I needed this week.
>> > >> >>
>> > >> >> One portion I am not settled on is the reinitialization of
>> > >> >> Record.Blast at every call to iterator.next(), and, by extension,
>> > >> >> BlastParser.parse().  See NCBIXML.py, line 114.  Without
>> > >> >> re-initializing this class, we run the risk of retaining portions of a
>> > >> >> Record from previously parsed queries.  This causes bug 1970,
>> > >> >> mentioned below.  Unfortunately, this re-initialization exacts a
>> > >> >> significant performance penalty of at least a factor of 10 by some
>> > >> >> rough measures.  I would appreciate any suggestions for improvement
>> > >> >> here.
>> > >> >>
>> > >> >> I do apologize for not being more specific about my changes.  When I
>> > >> >> get a chance (next week?), I will package them up as a proper patch
>> > >> >> and file a bug.  Perhaps what I have done so far will be of use until
>> > >> >> then.
>> > >> >>
>> > >> >> FYI, I have done all of my testing with Blast 2.2.13.  2.2.14 seems
>> > >> >> not to have separate <?xml> blocks within its output, requiring a
>> > >> >> different method of iteration.
>> > >> >>
>> > >> >> -Jacob
>> > >> >>
>> > >> >> Peter wrote:
>> > >> >> > Rohini Damle wrote:
>> > >> >> >> Hi,
>> > >> >> >> I have an XML file with 4 blast records (for proteins P1, P2, P3,
>> > >> >> >> P4).
>> > >> >> >> I am trying to extract alignment information for each of them.
>> > >> >> >> So I wrote the following code:
>> > >> >> >>
>> > >> >> >> for b_record in b_iterator:
>> > >> >> >>     E_VALUE_THRESH = 20
>> > >> >> >>     for alignment in b_record.alignments:
>> > >> >> >>         for hsp in alignment.hsps:
>> > >> >> >>             if hsp.expect < E_VALUE_THRESH:
>> > >> >> >>                 print '****Alignment****'
>> > >> >> >>                 print 'sequence:', alignment.title.split()[0]
>> > >> >> >>
>> > >> >> >> With this code, I am getting information for P1,
>> > >> >> >> then information for P1 + P2,
>> > >> >> >> then for P1 + P2 + P3,
>> > >> >> >> and finally for P1 + P2 + P3 + P4.
>> > >> >> >> Why is this so?
>> > >> >> >> Is there something wrong with the looping?
>> > >> >> >
>> > >> >> > I'm aware of something funny with the XML parsing, Bug 1970, which
>> > >> >> > might well be the same issue:
>> > >> >> >
>> > >> >> > http://bugzilla.open-bio.org/show_bug.cgi?id=1970
>> > >> >> >
>> > >> >> > I confess I haven't looked into exactly what is going wrong here -
>> > >> >> > too many other demands on my time to learn about XML and how
>> > >> >> > BioPython parses it.
>> > >> >> >
>> > >> >> > Does the workaround on the bug report help?  Depending on which
>> > >> >> > version of standalone BLAST you have installed, you might have
>> > >> >> > better luck with plain text output - the trouble is that this is a
>> > >> >> > moving target and the NCBI keeps tweaking it.
>> > >> >> >
>> > >> >> > Peter
>> > >> >> >
>> > >> >> > _______________________________________________
>> > >> >> > BioPython mailing list  -  BioPython at lists.open-bio.org
>> > >> >> > http://lists.open-bio.org/mailman/listinfo/biopython
>> >
>>
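
The P1, then P1 + P2, then P1 + P2 + P3 pattern reported at the bottom of the thread is what one would expect when a parser keeps appending to a single record object that is reused across queries. The sketch below illustrates that failure mode and the re-initialization fix in isolation. It is not Biopython's code: `Record`, `ReusingParser` and `FreshParser` are hypothetical stand-ins, and the example is written for Python 3.

```python
class Record:
    """Hypothetical stand-in for a parsed BLAST record."""

    def __init__(self):
        self.alignments = []


class ReusingParser:
    """Keeps one Record for its whole lifetime, so results accumulate."""

    def __init__(self):
        self._record = Record()  # created once, never reset

    def parse(self, hits):
        self._record.alignments.extend(hits)
        return list(self._record.alignments)


class FreshParser:
    """Creates a new Record on every call, so nothing leaks between queries."""

    def parse(self, hits):
        record = Record()  # re-initialized per query
        record.alignments.extend(hits)
        return list(record.alignments)


queries = [["P1"], ["P2"], ["P3"]]

reusing = ReusingParser()
fresh = FreshParser()

leaky = [reusing.parse(q) for q in queries]
clean = [fresh.parse(q) for q in queries]

print(leaky)  # [['P1'], ['P1', 'P2'], ['P1', 'P2', 'P3']]
print(clean)  # [['P1'], ['P2'], ['P3']]
```

Creating a fresh object per query is the simplest fix; whether it accounts for the performance cost mentioned in the thread is not something this sketch can show.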

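A threshold filter that lets every HSP through, as described in the thread, is consistent with the compared attribute never having been set. Under Python 2, `None < 22` evaluates to true, so an unset `hsp.expect` would pass the test; under Python 3 the same comparison raises a `TypeError`. That explanation is an assumption: the thread only confirms that switching to `hsp.evalue` gave the desired output with the modified files. The sketch below guards against unset values explicitly. `FakeHSP`, `FakeAlignment` and `significant_titles` are hypothetical names used for illustration, not part of Biopython, and the code is written for Python 3.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FakeHSP:
    """Hypothetical stand-in for an HSP; evalue is None until a parser sets it."""
    evalue: Optional[float] = None


@dataclass
class FakeAlignment:
    """Hypothetical stand-in for an alignment with a title and its HSPs."""
    title: str
    hsps: List[FakeHSP] = field(default_factory=list)


def significant_titles(alignments, threshold):
    """Return the first word of the title for each HSP with e-value < threshold."""
    hits = []
    for alignment in alignments:
        for hsp in alignment.hsps:
            if hsp.evalue is None:
                # Unset value: skip rather than compare None with a number.
                continue
            if hsp.evalue < threshold:
                hits.append(alignment.title.split()[0])
    return hits


alignments = [
    FakeAlignment("gi|1 good hit", [FakeHSP(1e-30)]),
    FakeAlignment("gi|2 weak hit", [FakeHSP(50.0)]),
    FakeAlignment("gi|3 unset value", [FakeHSP(None)]),
]

print(significant_titles(alignments, threshold=22))  # ['gi|1']
```

Printing the attribute for the first few HSPs is a quick way to check whether a parser populates it at all before relying on it in a filter.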

