[Bioperl-l] StandAloneBlast->blastall array of Bio::Seq objects

Andrew Stewart stewarta at nmrc.navy.mil
Thu Dec 14 21:23:07 UTC 2006


> It was a shot in the dark, really.  The fact that the return status  
> was bad could be due to a number of problems (permissions issues,  
> bad data, etc).  The fact that a single sequence worked indicated  
> that permissions and output format likely weren't to blame.  The  
> only other thing left was a problem with blastall itself.
>
> BTW, the blast docs do not indicate whether there is a maximum  
> number of sequences.  There may be a point where available memory  
> becomes the limiting issue.
>
> chris

Interesting.  I ran the 738-sequence dataset through blastall
manually, and the report contained only 198 of the 738 expected
results.  Not only that, the output was cut off right in the middle
of the 198th result and a segmentation fault was reported.  Wondering
whether the input might be at fault, I removed the 198th sequence;
the segmentation fault occurred again, this time with the output
ending at the 210th result.  I then put the 198th sequence back in,
but at the start of the file, and sure enough the segmentation fault
occurred earlier.  I think we can rule out the size of the input or
the number of sequences as the source of the error here.  I'm more
inclined to think it has something to do with the BLAST databases
being queried.
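
For anyone wanting to repeat this kind of check, here is a minimal Python sketch of how one might compare the queries in the FASTA input with the queries that actually made it into a truncated report. It assumes the default plain-text blastall output, in which each result begins with a "Query= " line; the function names are hypothetical and are not part of BioPerl or BLAST.

```python
def fasta_ids(fasta_text):
    """Return the ID (first word after '>') of every record, in file order."""
    ids = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            ids.append(fields[0] if fields else "")
    return ids


def reported_queries(report_text):
    """Return the query IDs found on 'Query=' lines of a plain-text report."""
    ids = []
    for line in report_text.splitlines():
        if line.startswith("Query="):
            fields = line[len("Query="):].split()
            ids.append(fields[0] if fields else "")
    return ids


def find_cutoff(fasta_text, report_text):
    """Return (results reported, results expected, ID of last query reported).

    The last element is None when the report contains no 'Query=' line.
    """
    expected = fasta_ids(fasta_text)
    reported = reported_queries(report_text)
    last = reported[-1] if reported else None
    return len(reported), len(expected), last
```

Calling find_cutoff() on the input file contents and the truncated report gives the counts and the ID of the query at which the output stopped, which is the sequence one would then try removing or moving.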

I found an old discussion on a problem that sounds fairly similar to  
this one, for anyone interested.
http://bioinformatics.org/pipermail/bioclusters/2004-June/001742.html

I think I'll try to work around the problem for now.
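
One possible workaround, along the lines of breaking @query down into smaller doses as mentioned further down the thread, is to split the input into batches and run blastall on each batch separately, so that a crash only costs one batch and points at a small set of sequences. The Python below is only a sketch under assumptions: the batch size, database name and output path are placeholders, and the only blastall options used are -p, -d, -i and -o.

```python
import subprocess
import tempfile


def split_fasta(fasta_text):
    """Split FASTA text into records, each starting with a '>' header.

    Assumes well-formed input whose first non-blank line is a header.
    """
    records, current = [], []
    for line in fasta_text.splitlines():
        if line.startswith(">") and current:
            records.append("\n".join(current) + "\n")
            current = []
        if line.strip():
            current.append(line)
    if current:
        records.append("\n".join(current) + "\n")
    return records


def batches(records, size):
    """Yield consecutive lists of at most `size` records."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(records), size):
        yield records[start:start + size]


def run_batch(batch, database, program="blastn", out_path="batch.out"):
    """Run blastall on one batch and return the process return code.

    The temporary input file is deliberately left on disk for inspection.
    On POSIX a negative return code means the process was killed by a
    signal (typically -11 for a segmentation fault on Linux).
    """
    with tempfile.NamedTemporaryFile("w", suffix=".fa", delete=False) as handle:
        handle.writelines(batch)
        in_path = handle.name
    cmd = ["blastall", "-p", program, "-d", database,
           "-i", in_path, "-o", out_path]
    return subprocess.run(cmd).returncode
```

Looping over batches(split_fasta(text), 20) and calling run_batch() on each one, with a distinct out_path per batch, would show which batches fail. This narrows the problem down but does not explain the crash itself.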

andrew


On Dec 14, 2006, at 1:36 PM, Chris Fields wrote:

>
> On Dec 14, 2006, at 11:49 AM, Andrew Stewart wrote:
>
>>> So can you look at the tempfile that is created and see if it is  
>>> sane?
>>>
>>> Set -save_tempfiles => 1 when you initialize the factory object  
>>> or do
>>> $factory->save_tempfiles(1)
>>> before calling the blastall.
>>>
>>> -jason
>>>
>>
>> Jason,
>> I was actually wondering how to do that.  Thanks.  Odd though, it
>> still doesn't seem to be saving the tempfiles.  Might not matter
>
> That needs to be checked out.  Can anyone verify that?
>
>>> The error pops up when the executable returns a bad status, so
>>> maybe it's choking on too many input sequences (i.e. Bioperl is
>>> doing everything correctly, but you are attempting to BLAST too
>>> many sequences in one go).  How many sequences are you attempting
>>> to use as input?  What happens when you use fewer input sequences?
>>>
>>> chris
>>>
>>
>> I was processing 738 sequences for input.  I cut that down to 20
>> sequences and I'm getting some other exception thrown further
>> downstream, so it appears you may be correct.  You don't happen to
>> know the max number of sequences that blastall allows for input,
>> would ya? ;)  I suppose I'll have to break @query down into smaller
>> doses or something.
>>
>> Thanks,
>> Andrew



--
Andrew Stewart
Research Assistant, Genomics Team
Navy Medical Research Center (NMRC)
Biological Defense Research Directorate (BDRD)
BDRD Annex
12300 Washington Avenue, 2nd Floor
Rockville, MD 20852

email: stewarta at nmrc.navy.mil
phone: 301-231-6700 Ext 270




