[Bioperl-l] remoteblast xml problem

Hubert Prielinger hubert.prielinger at gmx.at
Mon Jun 5 18:17:53 UTC 2006


hi,
you were right: removing the composition-based statistics solved the 
problem. Now the result is printed to STDOUT, but it isn't saved to the 
output file.
I have tried reopening the file and writing it out to another file, 
but that doesn't work either.
The strange thing is that if I retrieve text instead of XML output, it 
works without any problem. I don't know why.

Hubert



Chris Fields wrote:
> On Jun 2, 2006, at 8:36 PM, Hubert Prielinger wrote:
>
>   
>> hi chris,
>> thanks, but I never intended to run remoteblast with that many
>> sequences, only a few of them. Actually, my goal is to run PHI-BLAST
>> with a regular expression, so I won't need that file anymore.
>>     
>
> Not a problem.  Just to let you know, I did manage to get the script  
> working, so I'm marking the bug INVALID.  I think the problem isn't  
> that there is an infinite loop so much as setting composition-based  
> statistics causes the search to take much much longer; try removing  
> that line to see what I mean.
>
> Just so you know, using $result->query_name doesn't get you what you  
> would expect (it gives you a part of the RID, which you don't want;  
> this is something in the XML output that is beyond our control).  You  
> might want to change it to something else or you'll get filenames  
> with numerical names.
>
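One way to sidestep the numeric names, sketched under assumptions: build
the file name from a running counter and the ID of the sequence that was
submitted, rather than from $result->query_name. The helper name
report_filename and the blast_NNNN_<id>.<ext> pattern are made up for
illustration; they are not part of BioPerl.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl method): build a report file name
# from a running counter and the ID of the submitted sequence.
sub report_filename {
    my ($count, $seq_id, $ext) = @_;
    $ext = 'xml' unless defined $ext;

    # Replace anything that is awkward in a file name with an underscore.
    (my $safe_id = $seq_id) =~ s/[^A-Za-z0-9._-]+/_/g;

    return sprintf('blast_%04d_%s.%s', $count, $safe_id, $ext);
}

# Made-up example ID; prints "blast_0001_my_query_1.xml"
print report_filename(1, 'my|query 1'), "\n";
```

The returned name could then be handed to $factory->save_output(...) in
place of one derived from query_name.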
>   
>> Another question, about parsing the XML output: is there an XML
>> parser available for BLAST XML output, or how should I start?
>> I have looked at the wiki and at Bio::SearchIO::blastxml on CPAN,
>> but I'm not sure how to start. Sorry, I guess I'm too stupid.
>> Is there maybe another introduction or an example?
>>     
>
> Bio::SearchIO objects are used to parse BLAST XML output if you have  
> it saved to a file.  For instance:
>
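> use Bio::SearchIO;
>
> # $file is the path to a saved BLAST XML report
> my $factory = Bio::SearchIO->new(-file => $file, -format => 'blastxml');
>
> while (my $result = $factory->next_result) {   # one result per query
>    while (my $hit = $result->next_hit) {       # one hit per database entry
>       while (my $hsp = $hit->next_hsp) {       # one HSP per local alignment
>          # do stuff here
>       }
>    }
> }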
>
> The only thing that changes between parsing a text BLAST report and an  
> XML BLAST report is the -format argument (similar to the -readmethod  
> parameter in RemoteBlast).  You shouldn't need to look up any  
> documentation beyond these pages on the wiki:
>
> http://www.bioperl.org/wiki/HOWTO:SearchIO
>
> http://www.bioperl.org/wiki/Module:Bio::SearchIO
>
> http://www.bioperl.org/wiki/Module:Bio::SearchIO::blastxml
>
> Pay attention to the fact you'll need to install XML::SAX (CPAN) and  
> that XML::SAX::ExpatXS (and Expat) is highly recommended for speeding  
> up parsing.
>
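A quick way to check whether those modules are present, as a minimal
sketch: try to load each one and report the outcome. The helper name
module_available is invented for this example; only the module names
come from the message above.

```perl
use strict;
use warnings;

# Return 1 if the named module can be loaded, 0 otherwise.
sub module_available {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # XML::SAX -> XML/SAX.pm
    return eval { require $file; 1 } ? 1 : 0;
}

for my $module (qw(XML::SAX XML::SAX::ExpatXS)) {
    printf "%-20s %s\n", $module,
        module_available($module) ? 'installed' : 'missing';
}
```

If XML::SAX is reported missing, the blastxml parser will not load; if
only XML::SAX::ExpatXS is missing, parsing should still work, just more
slowly.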
> Chris
>
>   
>> thanks
>> Hubert
>>
>>
>> Chris Fields wrote:
>>     
>>> Yes, I see the same error you do.  But I have a similar script
>>> (blastp, XML blast report, XML parsing, similar loop structure)
>>> that works fine.  I'm trying to dissect the problem, but I think
>>> it may be something logically wrong here (something not so
>>> obvious) and not a bug...
>>>
>>> What I'm trying to say is, when you send sequences using
>>> remoteblast like this, you are essentially spamming the NCBI
>>> BLAST server with ~1600 requests.  This script wasn't set up with
>>> that intent in mind; you should really try to set up your own
>>> local blast database if possible.  If you can't, try running this
>>> script in off-hours (10pm-6am EST or something like that).
>>>
>>>
>>> Chris
>>>
>>> On Jun 2, 2006, at 7:49 PM, Hubert Prielinger wrote:
>>>
>>>
>>>       
>>>> hi,
>>>> input database: swissprot
>>>>         matrix: pam30
>>>>         count: 1
>>>>         gapcosts: 9 1
>>>>
>>>> I know that there are a lot of sequences, but that doesn't
>>>> matter; you can delete all of them except one. The number of
>>>> sequences is not the problem: the script reads one line and
>>>> submits it, then the second line, and so on. I have also tried
>>>> it with only one sequence and got the same result. The script
>>>> ran that time for more than 20 minutes, and that should be
>>>> enough time to retrieve the results for ONE sequence, I guess.
>>>>
>>>> regards
>>>> Hubert
>>>>
>>>>
>>>>
>>>> Chris Fields wrote:
>>>>
>>>>         
>>>>> You need to add the input conditions as well (you have several
>>>>> <STDIN> lines which may play a role; I would like to know what
>>>>> you normally enter for those).
>>>>>
>>>>> How long did you let the script run?  I ran a quick check on
>>>>> your sequences; you have almost 1600, so you have to expect
>>>>> that you'll run into some problems here!  Most here (including
>>>>> me) would suggest you try installing a local blast setup for
>>>>> something like this.
>>>>>
>>>>> Chris
>>>>>
>>>>> On Jun 2, 2006, at 6:19 PM, Hubert Prielinger wrote:
>>>>>
>>>>>
>>>>>           
>>>>>> hi,
>>>>>> I have submitted the bug -> Bug 2017
>>>>>> with the script and input file, just start it from command line
>>>>>>
>>>>>> thank you very much
>>>>>> greetings
>>>>>>
>>>>>> Hubert
>>>>>>
>>>>>> Chris Fields wrote:
>>>>>>
>>>>>>             
>>>>>>> Hubert,
>>>>>>>
>>>>>>> I have a script that's using blastxml and XML output which
>>>>>>> seems to work.
>>>>>>> I'll try looking at it to get a better idea this weekend.
>>>>>>>
>>>>>>> Chris
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>               
>>>>>>>> -----Original Message-----
>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>>>>>> bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger
>>>>>>>> Sent: Friday, June 02, 2006 4:12 PM
>>>>>>>> To: Chris Fields; bioperl-l at bioperl.org; Chris Fields;  
>>>>>>>> 'Sendu  Bala'
>>>>>>>> Subject: Re: [Bioperl-l] remoteblast xml problem
>>>>>>>>
>>>>>>>> hi,
>>>>>>>> sorry, but I have updated the remoteblast module and made
>>>>>>>> several attempts, with the same results as before. It didn't work.
>>>>>>>> I didn't get any results.
>>>>>>>>
>>>>>>>> regards
>>>>>>>> Hubert
>>>>>>>>
>>>>>>>>
>>>>>>>> Chris Fields wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>                 
>>>>>>>>> Sendu, Hubert,
>>>>>>>>>
>>>>>>>>> Hubert, your code looks fine, so Sendu's patch should fix the
>>>>>>>>> problem (break out of that infinite loop).  I applied Sendu's
>>>>>>>>> patch to RemoteBlast in CVS; it passed all tests in
>>>>>>>>> RemoteBlast.t.  Try updating from CVS to see if it works.
>>>>>>>>>
>>>>>>>>> Chris
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>                   
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>>>>>>>> bounces at lists.open-bio.org] On Behalf Of Sendu Bala
>>>>>>>>>> Sent: Friday, June 02, 2006 4:04 AM
>>>>>>>>>> To: bioperl-l at lists.open-bio.org
>>>>>>>>>> Subject: Re: [Bioperl-l] remoteblast xml problem
>>>>>>>>>>
>>>>>>>>>> Hubert Prielinger wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>                     
>>>>>>>>>>> hi,
>>>>>>>>>>> I have the following program, and it worked quite well for
>>>>>>>>>>> retrieving remoteblast results in a text file.
>>>>>>>>>>> Now I have altered it to retrieve XML, and it doesn't work
>>>>>>>>>>> anymore.
>>>>>>>>>>> It takes all the parameters at the command line and submits
>>>>>>>>>>> the query, but I don't retrieve any results file anymore.
>>>>>>>>>>>
>>>>>>>>>>> It seems that it hangs in an endless loop.
>>>>>>>>>>> The only output I get is "$rc is not a ref!" over and
>>>>>>>>>>> over. It doesn't enter the else branch anymore.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>                       
>>>>>>>>>> There is no problem with your code. The problem is with the
>>>>>>>>>> NCBI server and should be reported to them. You can visit the
>>>>>>>>>> site and do a blast, requesting xml format, and you will
>>>>>>>>>> typically get one normal 'waiting' message and the promise that
>>>>>>>>>> it will be updated in x seconds, but subsequent attempts to get
>>>>>>>>>> progress information result in an xml error page because the
>>>>>>>>>> NCBI server doesn't actually send any data.
>>>>>>>>>>
>>>>>>>>>> Unfortunately, the way that the bioperl code is written, it
>>>>>>>>>> treats no data as 'waiting' instead of an error. I've offered a
>>>>>>>>>> patch to fix this at this bug page:
>>>>>>>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=2015
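The general idea of not waiting forever can be sketched as a polling
loop with an upper bound on attempts. This is not the patch attached to
the bug report, only an illustration: poll_with_limit and its arguments
are invented names, and the status check is a stand-in code ref. It
assumes the check returns a reference when the result is ready, a
negative number on error, and anything else while the job is pending.

```perl
use strict;
use warnings;

# Call $check (a code ref) until it returns a reference (the result)
# or a negative number (an error), giving up after $max_tries calls.
# Any other return value is treated as "still waiting".
sub poll_with_limit {
    my ($check, $max_tries, $delay) = @_;
    for my $try (1 .. $max_tries) {
        my $rc = $check->();
        return $rc if ref $rc;                         # result is ready
        die "server reported an error\n"
            if defined $rc && $rc < 0;                 # explicit failure
        sleep $delay if $delay && $try < $max_tries;   # still waiting
    }
    die "gave up after $max_tries tries\n";
}

# Stand-in for a status check: "waiting" twice, then a result.
my @replies = (0, 0, { status => 'done' });
my $result  = poll_with_limit(sub { shift @replies }, 5, 0);
print "status: $result->{status}\n";   # prints "status: done"
```

In a RemoteBlast script the code ref would wrap the retrieve_blast call
for one RID, so that a job which never leaves the 'waiting' state ends
in an error message instead of an endless loop.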
>>>>>>>>>> _______________________________________________
>>>>>>>>>> Bioperl-l mailing list
>>>>>>>>>> Bioperl-l at lists.open-bio.org
>>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>                     
>>>>>>>>>                   
>>>>>>>>                 
>>>>>>>               
>>>>>>             
>>>>> Christopher Fields
>>>>> Postdoctoral Researcher
>>>>> Lab of Dr. Robert Switzer
>>>>> Dept of Biochemistry
>>>>> University of Illinois Urbana-Champaign
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>           
>>>       
>
>
>   



