[Bioperl-l] PullHSP bug

Chris Fields cjfields at illinois.edu
Tue Dec 2 21:40:32 EST 2008


On Dec 2, 2008, at 6:25 PM, Sendu Bala wrote:

> Chris Fields wrote:
>> Sendu,
>> I found a silent bug while fixing some issues with SimpleAlign  
>> which are causing blast_pull tests to fail (HSP-related).  It  
>> appears that the query sequence ID is not being caught
>
> I don't see this as a bug of PullHSPI. The blastn report being  
> parsed in that failing test has no query sequence id, so obviously  
> PullHSPI can't give an ID to the query sequence it creates.

It appears the problem boils down to a small inconsistency with the  
way blank IDs are handled between the two BLAST parsers.  The regular  
blast parser explicitly sets this to '', while blast_pull leaves it  
undef.  When one constructs a LocatableSeq w/o an ID it dies, but ''  
works, hence the problem.  I don't think using '' is a real solution  
though (seems like something added to silence errors).

The short and easy fix is either having get_nse() or blast_pull fall  
back to '', but both are technically wrong as there isn't a true ID.   
Ideas?  Maybe a fallback 'unknown' ID (which seems to be used as a ).

> Why does SimpleAlign now fall over when an added sequence has no id?  
> I'd suggest the old behaviour be retained...


I don't think going back to the old behavior would be the correct  
thing to do.  When adding seqs to a SimpleAlign, the NSE is used to  
index sequences and check for conflicts.  The index was previously  
built from scratch (didn't use LocatableSeq::get_nse) but was recalled  
using get_nse, which is inconsistent.  If get_nse changes slightly,  
such as optionally adding the version to remain consistent with Pfam/ 
Rfam, this suddenly breaks.  That behavior is now fixed to always use  
get_nse regardless.  Coming up with a fix for the ID so indexing works  
correctly would be a better solution.

chris




More information about the Bioperl-l mailing list