[Bioperl-l] problem with alignments and sequence locations

Steffen Heyne heyne at informatik.uni-freiburg.de
Thu Dec 3 13:19:51 UTC 2009


Hello,

so I tried to fix the problem with the location. Currently it works for
me with the following changes:

LocatableSeq.pm

sub get_nse{

...

	my $ret;
	if ($self->strand() >= 0) {
		$ret = $id . $v. $char1 . $st . $char2 . $end ;	
	} else {
		$ret = $id . $v. $char1 . $end . $char2 . $st ;
	}
	return $ret;
}

Then I recognized during the usage of $aln->remove_seq() that it cannot
remove a seq as it uses a wrong NSE to lookup sequences. I changed the
following:

SimpleAlign.pm

sub remove_seq {

...
	$id = $seq->id();
    	$start = $seq->start();
    	$end  = $seq->end();

## changed code:

	my $v = $seq->version ? '.'.$seq->version : '';
    	if ($seq->strand >=0){
		$name = sprintf("%s%s/%d-%d",$id,$v,$start,$end);
	} elsif ($seq->strand == -1){
		$name = sprintf("%s%s/%d-%d",$id,$v,$end,$start);
	}	
...

}

The above code in LocatableSeq.pm worked in the case if I read an
alignment in stockholm format and write it out in clustalw format. But
if I read an alignment in clustalw and write it out as stockholm (or
something else) it didn't worked, as the strand is not correctly set in
ClustalW::next_aln. It works with the following changes:

ClustalW.pm

sub next_aln{

...

	my ( $sname, $start, $end, $strand );	## strand added
	$strand = 0;				## new, standard = 0???
    	foreach my $name ( sort { $order{$a} <=> $order{$b} } keys
%alignments ) {
        if ( $name =~ /(\S+):(\d+)-(\d+)/ ) {
        	( $sname, $start, $end ) = ( $1, $2, $3 );
		$strand = 1;			## new			
		if ($start > $end) {		## new
       		($start, $end, $strand) = ($end, $start, -1); ##new
		}				## new
	
      }
        else {
            ( $sname, $start ) = ( $name, 1 );
            my $str = $alignments{$name};
            $str =~ s/[^A-Za-z]//g;
            $end = length($str);
        }

        my $seq = Bio::LocatableSeq->new(
            -seq   => $alignments{$name},
            -id    => $sname,
            -start => $start,
            -end   => $end,
	    -strand=> $strand			## new
        );

...

}

So I don't know if I changed things at their correct position. And I
found them only because I used certain functions. I dont know how broad
the effect of a changed NSE in LocatableSeq.pm is to other Modules and
functions. But I'm happy with my changes (so far :-)...).

Do you will change this to your proposed way in bioperl trunk?

Thanks!

steffen


Chris Fields schrieb:
> On Nov 10, 2009, at 6:55 AM, Steffen Heyne wrote:
> 
>> Hi,
>>
>> I'm using Bioperl for my research and it is very useful! Thank you!
>>
>> Currently I have a problem with locations tags of sequences. I read in
>> seed alignments of Rfam (in stockholm format, but I think it is
>> similar to other formats).
>>
>> If the location is like:
>>
>> AB194432.1/908-846
>>
>> the start/end values are changed to
>>
>> $seq->start = 846
>> $seq->end = 908
>>
>> and therefore the new location (e.g.$seq->get_nse) is:
>>
>> AB194432.1/846-908
>>
>> The $seq->strand tag is correctly set to -1 in this case, but if the
>> alignment is written out again (clustal, stockholm,...) this strand
>> info is lost and the sequences have this "wrong" location. But this
>> information is important in respect to the sequence accession number.
>>
>> Is there a way to set the location back to the original one or is this
>> behavior desired? Any manually setting with $seq->start($val) failed
>> due to automatic checking.
>>
>> I'm using bioperl 1.6.1
>>
>> Thanks!
>>
>> steffen
> 
> This is a definite bug. We recently discussed amending the NSE format
> due to this (the subject came up over the last few months or so); it's
> fallen through the cracks.  Fortunaely it is very easy to fix (the
> relevant method is in LocatableSeq).
> 
> Does anyone have a problem with me adding this in?  It will change
> output for only those instances where the strand is -1, so
> 
> AB194432.1/908-846
> 
> would be start = 846, end = 908, strand = -1
> 
> AB194432.1/846-908
> 
> would be start = 846, end = 908, strand = 1
> 
> chris
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


-- 
---
Steffen Heyne, Dipl.-Bioinf.
Lehrstuhl für Bioinformatik
Institut für Informatik
Albert-Ludwigs-Universität Freiburg
Georges-Köhler-Allee 106
79110 Freiburg, Germany

Tel: (+49) 761 203 7465
Fax: (+49) 761 203 7462
Mail: heyne at informatik.uni-freiburg.de




More information about the Bioperl-l mailing list