[Bioperl-l] alignIO::fasta bug

Chris Fields cjfields at illinois.edu
Mon Jan 26 08:37:00 EST 2009


On Jan 26, 2009, at 4:24 AM, Bernd Web wrote:

> Hi,
>
> Regarding the symbols, I change the variable with allowed symbols in  
> my script:
> $Bio::PrimarySeq::MATCHPATTERN = 'A-Za-z\-\.\*\?=~:';
>
> This works fine; if I have an unusual gap symbol I can just add it to
> this var. For transparency it might be nice to have a method for
> setting allowed symbols, or better, allowed gap symbols.

PrimarySeqs shouldn't have a way to define gaps (no start/end);  
LocatableSeqs (on the other hand) have the global $GAP_SYMBOLS.  But  
see here for caveats:

http://bugzilla.open-bio.org/show_bug.cgi?id=2715

> I also needed to change Bio::LocatableSeq::_ungapped_len to include
> the same gap symbols. SimpleAlign (sub slice) deletes all non-word
> characters from the string, but LocatableSeq does not. This caused
> SimpleAlign to crash after slicing an alignment. E.g. it looked for a
> sequence with end 0, whereas end had become 17 in LocatableSeq (since
> I used a non-standard gap symbol).  LocatableSeq always calculates the
> end (sub end) and returns a different end due to the difference in
> treating the allowed/gap symbols, when slicing an alignment.
>
> SimpleAlign slice uses: 		$slice_seq =~ s/\W//g;
> LocatableSeq, _ungapped_len uses:     $string =~ s/[\.\-]+//g;
>
> Regards,
> Bernd

This behavior stems from various problems within both LocatableSeq and  
SimpleAlign, nothing that isn't fixable per se.  If anything,  
SimpleAlign::slice shouldn't roll its own way of determining the  
ungapped length.  A bug report with the problems you are seeing would  
help tremendously.
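
For such a report, the mismatch can probably be shown without BioPerl at all. The following is only a rough sketch in plain Perl: it reuses the two substitutions quoted above, while the sub names and the test string (17 columns of '~' as the non-standard gap symbol) are made up for illustration and are not taken from the BioPerl code.

```perl
use strict;
use warnings;

# Hypothetical helper: ungapped length the way SimpleAlign::slice
# strips gaps (every non-word character is removed).
sub len_simplealign {
    my ($s) = @_;
    $s =~ s/\W//g;
    return length $s;
}

# Hypothetical helper: ungapped length the way
# LocatableSeq::_ungapped_len strips gaps (only '.' and '-' are removed).
sub len_locatableseq {
    my ($s) = @_;
    $s =~ s/[\.\-]+//g;
    return length $s;
}

# Assumed input: a slice consisting only of the non-standard gap '~'.
my $slice = '~' x 17;

printf "SimpleAlign-style length:  %d\n", len_simplealign($slice);   # 0
printf "LocatableSeq-style length: %d\n", len_locatableseq($slice);  # 17
```

With only '.' and '-' as gaps both subs agree; they diverge as soon as any other non-word character is used as a gap, which would be consistent with the end 0 versus end 17 disagreement described above.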

chris



