[Bioperl-l] Parsing a netblast file

Wes Barris wes.barris at csiro.au
Thu Jul 31 22:36:23 EDT 2003


Jason Stajich wrote:

> Here is the patch; it is pretty simple. Alternatively, you can just grab
> the latest version of blast.pm from CVS.

Thank you!  It works like a charm.
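
For anyone else hitting this, here is a minimal sketch of what the patched
pattern does when used outside of Bio::SearchIO.  Only the regexp is taken
from the patch below; the helper name parse_db_stats and the comma-stripping
step are illustrative assumptions, not code from blast.pm.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::SearchIO::blast): pull the sequence
# and letter counts out of a BLAST database summary line.
sub parse_db_stats {
    my ($line) = @_;
    # Same pattern as the patched line 325: an optional leading '-' is
    # accepted so that an overflowed (negative) count still matches.
    if ( $line =~ /^\s+(\-?[\d\,]+)\s+sequences\;\s+(\-?[\d,]+)\s+total\s+letters/ ) {
        my ( $seqs, $letters ) = ( $1, $2 );
        tr/,//d for $seqs, $letters;    # strip thousands separators
        return ( $seqs, $letters );
    }
    return;    # no match
}

# The line from the netblast report quoted further down in this message.
my $line = "           1,819,241 sequences; -24,217,474 total letters";
my ( $seqs, $letters ) = parse_db_stats($line);
print "sequences=$seqs letters=$letters\n";
```

With the original pattern (no \-?), the same line does not match at all,
which is why the parser stumbled on this report.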


> 
> Index: Bio/SearchIO/blast.pm
> ===================================================================
> RCS file: /home/repository/bioperl/bioperl-live/Bio/SearchIO/blast.pm,v
> retrieving revision 1.42.2.9
> diff -r1.42.2.9 blast.pm
> 273c273
> <                if( /\(([\d,]+)\s+letters.*\)/ ) {
> ---
> >                if( /\((\-?[\d,]+)\s+letters.*\)/ ) {
> 325c325
> < 	       if( /^\s+([\d\,]+)\s+sequences\;\s+([\d,]+)\s+total\s+letters/){
> ---
> > 	       if( /^\s+(\-?[\d\,]+)\s+sequences\;\s+(\-?[\d,]+)\s+total\s+letters/){
> 525c525
> < 	       } elsif ( /letters in database:\s+([\d,]+)/i) {
> ---
> > 	       } elsif ( /letters in database:\s+(\-?[\d,]+)/i) {
> 
> 
> On Fri, 1 Aug 2003, Wes Barris wrote:
> 
> 
>>Jason Stajich wrote:
>>
>>
>>>>Through trial and error I have narrowed down the problem to the negative
>>>>sign in the database details.  Here is the section in question from a
>>>>netblast result file:
>>>>
>>>>Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS, GSS,
>>>>or phase 0, 1 or 2 HTGS sequences)
>>>>           1,819,241 sequences; -24,217,474 total letters
>>>
>>>
>>>Integer overflow.  The number of letters in nt is greater than the
>>>largest value (2147483647) that a signed 32-bit integer can represent.
>>>
>>>Looks like nt length is 8,782,847,770 - seems like it has been larger than
>>>INT_MAX for a while, surprised they haven't updated their code.  Do you
>>>have the latest version of netblast on your machine?  A bug report to NCBI
>>>is probably a good idea if you are running the latest version
>>
>>Hi Jason,
>>
>>Thanks for responding.  Yes, I am running the latest blastcl3 from the NCBI
>>ftp site.  I had already alerted NCBI to the problem (although I didn't
>>understand the source of the problem until you pointed it out).  Here is their
>>response.  It doesn't look like they are interested in fixing it:
>>
>>--------------------------
>>We have some backward-compatibility issues for the older client and would
>>not be able to change this.
>>
>>The best way is to address it to bioperl and have it changed to be more
>>tolerant.  As I mentioned before, the correct db info is given at the end.
>>
>>Regards,
>>
>>Tao Tao
>>NCBI USER Service
>>----------------------------
>>
>>[...snip...]
>>
>>
>>>We'd just need to tweak the regexp a little bit to handle a leading -.
>>>What version of bioperl are you running, so I can provide a patch which
>>>is appropriate for your version?
>>
>>I am running bioperl-1.2.2
>>
>>
> 
> 
> --
> Jason Stajich
> Duke University
> jason at cgt.mc.duke.edu
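
On the overflow itself, a small sketch of how a large count ends up printed
as a negative number once it is stored in a signed 32-bit integer.  The
helper wrap_int32 is hypothetical, and 4270749822 is simply one value that
wraps to the figure in my report; it is not a claim about the real size of
the database.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(floor);

# Hypothetical helper: reinterpret a non-negative count as a signed
# 32-bit integer, the way a 32-bit counter would store it.
sub wrap_int32 {
    my ($n) = @_;
    my $two32 = 4294967296;                        # 2**32
    my $m = $n - $two32 * floor( $n / $two32 );    # reduce modulo 2**32
    $m -= $two32 if $m >= 2147483648;              # upper half maps to negatives
    return $m;
}

printf "%d\n", wrap_int32(2147483647);    # INT_MAX is unchanged
printf "%d\n", wrap_int32(2147483648);    # INT_MAX + 1 wraps to -2147483648
printf "%d\n", wrap_int32(4270749822);    # illustrative count, wraps to -24217474
```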


-- 
Wes Barris
E-Mail: Wes.Barris at csiro.au


