[Bioperl-l] Cannot read in alignment data with Bio::AlignIO

Chris Fields cjfields at illinois.edu
Tue Feb 10 15:27:23 UTC 2009


On Feb 10, 2009, at 8:40 AM, manni122 wrote:

>
> Hi,
> I am trying to read in a file with multiple pairwise alignments. Some IDs
> appear frequently. So if I am using this code below I get the error
> message:
> --- MSG: Replacing one sequence xxx ---
> Is there a way to read the data even with those similar names?
> Regards, manni122. [...]

The NSE (Name.version/start-end) is used to distinguish the sequences  
from one another, so as long as each sequence differs from the others  
in at least one of accession, version, start, or end, there should be  
no replacement (and no warning).

If you think about it, that's a feature.  Any single sequence that  
appears in an alignment more than once is either (1) matching multiple  
regions (i.e. repeats, motifs, etc.), so the location varies, or (2) a  
modified copy of the sequence, so the version changes (support for the  
latter is fairly new).  Beyond that, one has to question the logic of  
including multiple copies of exactly the same sequence record in a  
multiple alignment, so unless additional information distinguishing  
the potential duplicates is provided, we assume unintentional (and  
erroneous) duplication and punt.
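
To make the keying concrete, here is a toy model in plain Perl.  It is  
only a sketch of the idea, not BioPerl's actual code: nse_key() and  
add_record() are hypothetical helpers, and leaving the version out of  
the key when none is set is an assumption of this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): build a
# Name.version/start-end style key from a plain record.
sub nse_key {
    my (%rec) = @_;
    my $name = $rec{id};
    # assumption: the version is only included when one is set
    $name .= ".$rec{version}" if defined $rec{version};
    return "$name/$rec{start}-$rec{end}";
}

# Toy stand-in for an alignment: one slot per key.
my %aln;
my @replaced;

sub add_record {
    my (%rec) = @_;
    my $key = nse_key(%rec);
    # an existing key means the earlier record is about to be overwritten
    push @replaced, $key if exists $aln{$key};
    $aln{$key} = {%rec};
    return $key;
}

add_record(id => 'ABCD1234', start => 1,  end => 6);
add_record(id => 'ABCD1234', start => 1,  end => 6);   # same key, replaces the first
add_record(id => 'ABCD1234', start => 40, end => 45);  # case (1): different location
add_record(id => 'ABCD1234', version => 2,
           start => 1, end => 6);                      # case (2): different version

print "$_\n" for sort keys %aln;
print "replaced: @replaced\n";
```

Of the four records, only the exact duplicate collides; the ones that  
differ in location or version each keep their own slot.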

Weighing the options, I would rather have a warning indicating a  
problem than nothing at all.  If you absolutely need duplicates (I am  
curious as to why), I suggest changing the version number:

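use strict;
use warnings;
use Bio::LocatableSeq;
use Bio::SimpleAlign;
use Bio::AlignIO;

my $aln = Bio::SimpleAlign->new();
# no -file or -fh is given, so the alignment is written to STDOUT
my $out = Bio::AlignIO->new(-format => 'clustalw');

# same ID and sequence every time; only the version differs,
# which is enough to give each copy a unique NSE
for my $v (1..10) {
     my $ls = Bio::LocatableSeq->new(-id => 'ABCD1234',
                                     -version => $v,
                                     -alphabet => 'dna',
                                     -seq => '--atg---gta--');
     $aln->add_seq($ls);
}
$out->write_aln($aln);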

# output below
__DATA__

CLUSTAL W(1.81) multiple sequence alignment


ABCD1234.1/1-6         --atg---gta--
ABCD1234.2/1-6         --atg---gta--
ABCD1234.3/1-6         --atg---gta--
ABCD1234.4/1-6         --atg---gta--
ABCD1234.5/1-6         --atg---gta--
ABCD1234.6/1-6         --atg---gta--
ABCD1234.7/1-6         --atg---gta--
ABCD1234.8/1-6         --atg---gta--
ABCD1234.9/1-6         --atg---gta--
ABCD1234.10/1-6        --atg---gta--
                          ***   ***


chris



