[Bioperl-l] remoteblast xml problem

Hubert Prielinger hubert.prielinger at gmx.at
Mon Jun 5 19:12:37 UTC 2006


hi chris,
sorry, I have tried it with the latest CVS version:

# $Id: RemoteBlast.pm,v 1.33 2006/06/03 06:26:41 cjfields Exp $

but it still doesn't work.

Hubert

Chris Fields wrote:
> Hubert, 
>
> Make sure you have the latest Bio::Tools::Run::RemoteBlast from CVS.  The
> option to save XML was committed relatively recently (last month or so).
>
> Chris
>
>   
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger
>> Sent: Monday, June 05, 2006 1:18 PM
>> To: Chris Fields; bioperl-l at bioperl.org
>> Subject: Re: [Bioperl-l] remoteblast xml problem
>>
>> hi,
>> you were right, removing the composition-based statistics solved the
>> problem. Now I get the result printed to STDOUT, but it doesn't save the
>> output in the file.
>> I have tried reopening the file and writing it to another file
>> again, but it doesn't work.....
>> The strange thing is that if I retrieve text instead of xml output it
>> works without any problem. Don't know why
>>
>> Hubert
>>
>>
>>
>> Chris Fields wrote:
>>     
>>> On Jun 2, 2006, at 8:36 PM, Hubert Prielinger wrote:
>>>
>>>
>>>       
>>>> hi chris,
>>>> thanks, but I never intended to run the remoteblast with so many
>>>> sequences, only a few of them. Actually my goal is to run phiblast
>>>> with a regular expression, so I won't need that
>>>> file anymore
>>>>
>>>>         
>>> Not a problem.  Just to let you know, I did manage to get the script
>>> working, so I'm marking the bug INVALID.  I think the problem isn't
>>> that there is an infinite loop so much as setting composition-based
>>> statistics causes the search to take much much longer; try removing
>>> that line to see what I mean.
>>>
>>> Just so you know, using $result->query_name doesn't get you what you
>>> would expect (it gives you a part of the RID, which you don't want;
>>> this is something in the XML output that is beyond our control).  You
>>> might want to change it to something else or you'll get filenames
>>> with numerical names.
>>>
>>>
>>>       
>>>> another question about parsing the xml output....is there an xml
>>>> parser available for blast xml output, or how do I start.....
>>>> I have looked at the wiki and at Bio::SearchIO::blastxml on CPAN,
>>>> but I'm not sure how to start....sorry, I guess I'm too stupid....
>>>> is there maybe another introduction or an example.
>>>>
>>>>         
>>> Bio::SearchIO objects are used to parse BLAST XML output if you have
>>> it saved to a file.  For instance:
>>>
>>> use Bio::SearchIO;
>>>
>>> my $factory = Bio::SearchIO->new(-file => $file, -format => 'blastxml');
>>>
>>> while (my $result = $factory->next_result) {
>>>    while (my $hit = $result->next_hit) {
>>>       while (my $hsp = $hit->next_hsp) {
>>>          # do stuff here
>>>       }
>>>    }
>>> }
>>>
>>> The only thing that changes in parsing a text BLAST report from an
>>> XML BLAST report is the -format line (similar to the -readmethod
>>> parameter in RemoteBlast).  You shouldn't need to look up any more
>>> documentation other than these on the wiki:
>>>
>>> http://www.bioperl.org/wiki/HOWTO:SearchIO
>>>
>>> http://www.bioperl.org/wiki/Module:Bio::SearchIO
>>>
>>> http://www.bioperl.org/wiki/Module:Bio::SearchIO::blastxml
>>>
>>> Pay attention to the fact you'll need to install XML::SAX (CPAN) and
>>> that XML::SAX::ExpatXS (and Expat) is highly recommended for speeding
>>> up parsing.
>>>
>>> Chris
>>>
>>>
>>>       
>>>> thanks
>>>> Hubert
>>>>
>>>>
>>>> Chris Fields wrote:
>>>>
>>>>         
>>>>> Yes, I see the same error you do.  But I have a similar script
>>>>> (blastp, XML blast report, XML parsing, similar loop structure)
>>>>> that works fine.  I'm trying to dissect the problem but I think
>>>>> it may be something logically wrong here (something not so
>>>>> obvious) and not a bug...
>>>>>
>>>>> What I'm trying to say is, when you send sequences using
>>>>> remoteblast like this, you are essentially spamming the NCBI
>>>>> BLAST server with ~1600 requests.  This script wasn't set up with
>>>>> that intent in mind; you should really try to set up your own
>>>>> local blast database if possible.  If you can't, try running this
>>>>> script in off-hours (10pm-6am EST or something like that).
>>>>>
>>>>>
>>>>> Chris
>>>>>
>>>>> On Jun 2, 2006, at 7:49 PM, Hubert Prielinger wrote:
>>>>>
>>>>>
>>>>>
>>>>>           
>>>>>> hi,
>>>>>> input database: swissprot
>>>>>>         matrix: pam30
>>>>>>         count: 1
>>>>>>         gapcosts: 9 1
>>>>>>
>>>>>> I know that there are a lot of sequences, but that doesn't
>>>>>> matter, you can delete all of them except one. The number of
>>>>>> sequences is not the problem: the script reads one line and
>>>>>> submits it.....then the second line and so on.....I have also tried
>>>>>> it with only one sequence and I got the same result....
>>>>>> the script ran that time for more than 20
>>>>>> minutes!!!!!! .....and that should be enough time to retrieve
>>>>>> the results for ONE sequence, I guess
>>>>>>
>>>>>> regards
>>>>>> Hubert
>>>>>>
>>>>>>
>>>>>>
>>>>>> Chris Fields wrote:
>>>>>>
>>>>>>
>>>>>>             
>>>>>>> You need to add the input conditions as well (you have several
>>>>>>> <STDIN> lines which may play a role; I would like to know what
>>>>>>> you normally enter for those).
>>>>>>>
>>>>>>> How long did you let the script run?  I ran a quick check on
>>>>>>> your sequences; you have almost 1600, so you have to expect
>>>>>>> that you'll run into some problems here!  Most here (including
>>>>>>> me) would suggest you try installing a local blast setup for
>>>>>>> something like this.
>>>>>>>
>>>>>>> Chris
>>>>>>>
>>>>>>> On Jun 2, 2006, at 6:19 PM, Hubert Prielinger wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>               
>>>>>>>> hi,
>>>>>>>> I have submitted the bug -> Bug 2017
>>>>>>>> with the script and input file, just start it from command line
>>>>>>>>
>>>>>>>> thank you very much
>>>>>>>> greetings
>>>>>>>>
>>>>>>>> Hubert
>>>>>>>>
>>>>>>>> Chris Fields wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>                 
>>>>>>>>> Hubert,
>>>>>>>>>
>>>>>>>>> I have a script that's using blastxml and XML output which
>>>>>>>>> seems to work.
>>>>>>>>> I'll try looking at it to get a better idea this weekend.
>>>>>>>>>
>>>>>>>>> Chris
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>                   
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>>>>>>>> bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger
>>>>>>>>>> Sent: Friday, June 02, 2006 4:12 PM
>>>>>>>>>> To: Chris Fields; bioperl-l at bioperl.org; Chris Fields;
>>>>>>>>>> 'Sendu  Bala'
>>>>>>>>>> Subject: Re: [Bioperl-l] remoteblast xml problem
>>>>>>>>>>
>>>>>>>>>> hi,
>>>>>>>>>> sorry, but I have updated the remoteblast module and I have
>>>>>>>>>> run several attempts with the same results as before. It
>>>>>>>>>> didn't work. I didn't get any results.
>>>>>>>>>>
>>>>>>>>>> regards
>>>>>>>>>> Hubert
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Chris Fields wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>                     
>>>>>>>>>>> Sendu, Hubert,
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Hubert, your code looks fine so Sendu's patch should fix the
>>>>>>>>>>> problem (break out of that infinite loop).  I applied Sendu's
>>>>>>>>>>> patch to RemoteBlast in CVS; it passed all tests in
>>>>>>>>>>> RemoteBlast.t.  Try updating from CVS to see if it works.
>>>>>>>>>>>
>>>>>>>>>>> Chris
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>                       
>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>>>>>>>>>> bounces at lists.open-bio.org] On Behalf Of Sendu Bala
>>>>>>>>>>>> Sent: Friday, June 02, 2006 4:04 AM
>>>>>>>>>>>> To: bioperl-l at lists.open-bio.org
>>>>>>>>>>>> Subject: Re: [Bioperl-l] remoteblast xml problem
>>>>>>>>>>>>
>>>>>>>>>>>> Hubert Prielinger wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>                         
>>>>>>>>>>>>> hi,
>>>>>>>>>>>>> I have the following program and it worked quite well for
>>>>>>>>>>>>> retrieving remoteblast results in a text file.
>>>>>>>>>>>>> Now I have altered it to xml, and it doesn't work anymore.....
>>>>>>>>>>>>> It takes all the parameters at the command line, submits the
>>>>>>>>>>>>> query, but I don't retrieve any results file anymore.....
>>>>>>>>>>>>>
>>>>>>>>>>>>> it seems that it hangs in an endless loop......
>>>>>>>>>>>>> the only output I get is:  $rc is not a ref! over and
>>>>>>>>>>>>> over..... it doesn't enter the else branch anymore....
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>                           
>>>>>>>>>>>> There is no problem with your code. The problem is with the
>>>>>>>>>>>> NCBI server and should be reported to them. You can visit the
>>>>>>>>>>>> site and do a blast, requesting xml format, and you will
>>>>>>>>>>>> typically get one normal 'waiting' message and the promise that
>>>>>>>>>>>> it will be updated in x seconds, but subsequent attempts to get
>>>>>>>>>>>> progress information result in an xml error page because the
>>>>>>>>>>>> NCBI server doesn't actually send any data.
>>>>>>>>>>>>
>>>>>>>>>>>> Unfortunately the way that the bioperl code is written, it
>>>>>>>>>>>> treats no data as 'waiting' instead of an error. I've offered a
>>>>>>>>>>>> patch to fix this at this bug page:
>>>>>>>>>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=2015
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> Bioperl-l mailing list
>>>>>>>>>>>> Bioperl-l at lists.open-bio.org
>>>>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>                         
>>>>>>> Christopher Fields
>>>>>>> Postdoctoral Researcher
>>>>>>> Lab of Dr. Robert Switzer
>>>>>>> Dept of Biochemistry
>>>>>>> University of Illinois Urbana-Champaign
>   
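
The polling problem discussed in this thread (an empty response being treated
as 'still waiting', so the loop never ends) can be illustrated with a small
stand-alone sketch. This is not the RemoteBlast code or Sendu's patch: the
helper name poll_for_result, its arguments, and the return convention (a
reference means a result, a negative number or undef means an error, 0 means
keep waiting) are assumptions made for illustration only. The point is simply
that a bounded number of attempts plus an explicit error branch keeps a
misbehaving server from causing an endless loop.

```perl
use strict;
use warnings;

# Hypothetical helper: call $fetch until it yields a result, reports an
# error, or the attempt budget is used up. $fetch is a stand-in for whatever
# call retrieves the report; the return convention is assumed, not taken
# from any BioPerl module.
sub poll_for_result {
    my ($fetch, $max_tries, $delay) = @_;
    for my $try (1 .. $max_tries) {
        my $rc = $fetch->();
        # A reference means we got a usable result object back.
        return { status => 'ok', result => $rc, tries => $try } if ref $rc;
        # No data at all, or a negative code, is an error rather than "waiting".
        return { status => 'error', tries => $try }
            if !defined $rc || $rc < 0;
        # Otherwise the job is still running; pause before asking again.
        sleep $delay if $delay;
    }
    # Give up instead of looping forever.
    return { status => 'timeout', tries => $max_tries };
}

# Usage with a stub that "finishes" on the third call.
my $calls   = 0;
my $outcome = poll_for_result(sub { ++$calls < 3 ? 0 : +{ hits => [] } }, 10, 0);
print "$outcome->{status} after $outcome->{tries} tries\n";
```

In a real script the stub would be replaced by the actual retrieval call, and
the delay would be several seconds so the server is not hammered.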



