[Bioperl-l] Bio::SimpleAlign constructor?

Dan Bolser dan.bolser at gmail.com
Mon Aug 24 12:50:46 UTC 2009


I just ran into the same problem described here. Here is my code to
demonstrate what I expected:

#!/usr/bin/perl -w

use strict;

use Bio::SimpleAlign;
use Bio::LocatableSeq;
use Bio::AlignIO;

my $CLUDGE = 0;


## REF  tacattaaagacccg
## SEQ1 taca.taaa......
## SEQ2 .....taaaga.ccg

my $aln = Bio::SimpleAlign->new();

$aln->gap_char('.');


my $r  = Bio::LocatableSeq->new( -id=>'r', -seq=>'tacattaaagacccg' );
my $s1 = Bio::LocatableSeq->new( -id=>'s1', -start=>1, -seq=>'taca.taaa' );
my $s2 = Bio::LocatableSeq->new( -id=>'s2', -start=>6, -seq=>'taaaga.ccg' );

$aln->add_seq( $r );
$aln->add_seq( $s1 );
$aln->add_seq( $s2 );

if($CLUDGE){
 foreach(($r, $s1, $s2)){
   $_->seq( '.' x ($_->start - 1) . $_->seq )
 }
}

## Prepare an 'output stream' for the alignment:
my $aliWriter = Bio::AlignIO->
  new( -fh     => \*STDOUT,
       -format => 'clustalw',
     );

warn "\nOUTPUT:\n";
$aliWriter->write_aln($aln);


I was calling the "fill in the gaps yourself" step a CLUDGE because I
had expected the alignment object to take care of this for me.  Is
there any reason that it couldn't do this 'CLUDGE' automatically? It
seems strange that it insists on being passed locatable sequence
objects, but then largely ignore the given location.

Would it not be possible to have this happen when the sequences are
written out from the alignment? I think it should still be possible to
index the column number via the (gapless) sequence number... or did I
get confused? There are two levels of confusion here (on my part), 1)
the concepts behind the objects and 2) the implementation details.

Thanks for any hints on how to understand or potentially how to fix
these problems.


Cheers,
Dan.


2009/7/22 Mark A. Jensen <maj at fortinbras.us>:
> Hi Paolo,
> I think I see what you want to do, however, it doesn't quite work
> this way. I'm supposing you want to specify something like
>
> s1/3-6 attc
> s2/7-10 gaag
>
> and obtain output like
>
> s1 --attc----
> s2 ------gaag
>
> But (and this is why LocatableSeqs are "locatable"), the alignment described
> by the former data is always going to be
>
> s1 attc
> s2 gaag
>
> so that I can query the alignment *column* number 1 and obtain
> the residue coordinates of the original sequences in that column:
>
> $loc = $aln->get_seq_by_pos(1)->location_from_column(1); # 3
>
> or vice-versa
>
> $col = $aln->column_from_residue_number( 's1', 3); # 1
>
> As far as I know, you have to fill in the gaps yourself; a good
> exercise, since you already have all the information you need, in having set
> up the start and end coordinates (which are really
> the column coordinates in this model).
> If this wasn't what you had in mind, I apologize.
> cheers, Mark
>
>
> ----- Original Message ----- From: "Paolo Pavan" <paolo.pavan at gmail.com>
> To: <bioperl-l at lists.open-bio.org>
> Sent: Thursday, July 16, 2009 6:17 AM
> Subject: [Bioperl-l] Bio::SimpleAlign constructor?
>
>
>> Hi,
>> I have a brief question: I would like to know if there is a method to
>> obtain a valid formatted and flush Bio::SimpleAlign object (i.e.
>> properly filled with gaps on the right and on the left side of each
>> sequence) given a bounch of Bio::LocatableSeq objects in which I have
>> specified the -start and -end properties.
>> Can anyone help me? Thank you very much,
>>
>> Paolo
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>



More information about the Bioperl-l mailing list