[Bioperl-l] problem with alignments and sequence locations

Florent Angly florent.angly at gmail.com
Thu Dec 3 13:26:57 EST 2009


Hi all,

Like Steffen, I've had a few burning questions too regarding 
LocatableSeq lately.

I've had an occasional issue with LocatableSeq. Most assembly-related 
modules use LocatableSeq objects. They specify the sequence start but 
not the sequence end. This works in most cases, but I've recently 
encountered very occasional error messages related to having not 
explicitely set the end of the sequence. I've been unable to put 
together a small test case to reproduce the bug easily.

My question is. If the start of the sequence is set, is it mandatory to 
set the end of the sequence? If so, then maybe the documentation needs 
to be explicit about it and maybe there needs to be a check that 
enforces that the end is set. In fact, it seems like if I provide a 
sequence and its start position, the LocatableSeq code should be able to 
automatically calculate its end, no?

Florent



Steffen Heyne wrote:
> Hello,
>
> so I tried to fix the problem with the location. Currently it works for
> me with the following changes:
>
> LocatableSeq.pm
>
> sub get_nse{
>
> ...
>
> 	my $ret;
> 	if ($self->strand() >= 0) {
> 		$ret = $id . $v. $char1 . $st . $char2 . $end ;	
> 	} else {
> 		$ret = $id . $v. $char1 . $end . $char2 . $st ;
> 	}
> 	return $ret;
> }
>
> Then I recognized during the usage of $aln->remove_seq() that it cannot
> remove a seq as it uses a wrong NSE to lookup sequences. I changed the
> following:
>
> SimpleAlign.pm
>
> sub remove_seq {
>
> ...
> 	$id = $seq->id();
>     	$start = $seq->start();
>     	$end  = $seq->end();
>
> ## changed code:
>
> 	my $v = $seq->version ? '.'.$seq->version : '';
>     	if ($seq->strand >=0){
> 		$name = sprintf("%s%s/%d-%d",$id,$v,$start,$end);
> 	} elsif ($seq->strand == -1){
> 		$name = sprintf("%s%s/%d-%d",$id,$v,$end,$start);
> 	}	
> ...
>
> }
>
> The above code in LocatableSeq.pm worked in the case if I read an
> alignment in stockholm format and write it out in clustalw format. But
> if I read an alignment in clustalw and write it out as stockholm (or
> something else) it didn't worked, as the strand is not correctly set in
> ClustalW::next_aln. It works with the following changes:
>
> ClustalW.pm
>
> sub next_aln{
>
> ...
>
> 	my ( $sname, $start, $end, $strand );	## strand added
> 	$strand = 0;				## new, standard = 0???
>     	foreach my $name ( sort { $order{$a} <=> $order{$b} } keys
> %alignments ) {
>         if ( $name =~ /(\S+):(\d+)-(\d+)/ ) {
>         	( $sname, $start, $end ) = ( $1, $2, $3 );
> 		$strand = 1;			## new			
> 		if ($start > $end) {		## new
>        		($start, $end, $strand) = ($end, $start, -1); ##new
> 		}				## new
> 	
>       }
>         else {
>             ( $sname, $start ) = ( $name, 1 );
>             my $str = $alignments{$name};
>             $str =~ s/[^A-Za-z]//g;
>             $end = length($str);
>         }
>
>         my $seq = Bio::LocatableSeq->new(
>             -seq   => $alignments{$name},
>             -id    => $sname,
>             -start => $start,
>             -end   => $end,
> 	    -strand=> $strand			## new
>         );
>
> ...
>
> }
>
> So I don't know if I changed things at their correct position. And I
> found them only because I used certain functions. I dont know how broad
> the effect of a changed NSE in LocatableSeq.pm is to other Modules and
> functions. But I'm happy with my changes (so far :-)...).
>
> Do you will change this to your proposed way in bioperl trunk?
>
> Thanks!
>
> steffen
>
>
> Chris Fields schrieb:
>   
>> On Nov 10, 2009, at 6:55 AM, Steffen Heyne wrote:
>>
>>     
>>> Hi,
>>>
>>> I'm using Bioperl for my research and it is very useful! Thank you!
>>>
>>> Currently I have a problem with locations tags of sequences. I read in
>>> seed alignments of Rfam (in stockholm format, but I think it is
>>> similar to other formats).
>>>
>>> If the location is like:
>>>
>>> AB194432.1/908-846
>>>
>>> the start/end values are changed to
>>>
>>> $seq->start = 846
>>> $seq->end = 908
>>>
>>> and therefore the new location (e.g.$seq->get_nse) is:
>>>
>>> AB194432.1/846-908
>>>
>>> The $seq->strand tag is correctly set to -1 in this case, but if the
>>> alignment is written out again (clustal, stockholm,...) this strand
>>> info is lost and the sequences have this "wrong" location. But this
>>> information is important in respect to the sequence accession number.
>>>
>>> Is there a way to set the location back to the original one or is this
>>> behavior desired? Any manually setting with $seq->start($val) failed
>>> due to automatic checking.
>>>
>>> I'm using bioperl 1.6.1
>>>
>>> Thanks!
>>>
>>> steffen
>>>       
>> This is a definite bug. We recently discussed amending the NSE format
>> due to this (the subject came up over the last few months or so); it's
>> fallen through the cracks.  Fortunaely it is very easy to fix (the
>> relevant method is in LocatableSeq).
>>
>> Does anyone have a problem with me adding this in?  It will change
>> output for only those instances where the strand is -1, so
>>
>> AB194432.1/908-846
>>
>> would be start = 846, end = 908, strand = -1
>>
>> AB194432.1/846-908
>>
>> would be start = 846, end = 908, strand = 1
>>
>> chris
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>     
>
>
>   



More information about the Bioperl-l mailing list