[Bioperl-l] Cannot read in alignment data with Bio::AlignIO
Chris Fields
cjfields at illinois.edu
Tue Feb 10 15:27:23 UTC 2009
On Feb 10, 2009, at 8:40 AM, manni122 wrote:
>
> Hi,
> I am trying to read in a file with multiple pairwise alignments.
> Some IDs appear more than once, so when I use the code below I get the error
> message:
> --- MSG: Replacing one sequence xxx ---
> Is there a way to read the data even with those similar names?
> Regards, manni122. [...]
The NSE (Name.version/start-end) is used to distinguish the sequences
from one another, so as long as each sequence differs from the others in
at least one of accession, version, start, or end, there should be no
replacement (and no warning).
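To make the NSE rule concrete, here is a plain-Perl sketch (no BioPerl needed) that builds an NSE key for a few records and flags the ones that collide. The helper nse_key and the sample records are hypothetical and only mimic the display names SimpleAlign prints (e.g. ABCD1234.1/1-6); this is an illustration of the idea, not BioPerl's internal code.

```perl
use strict;
use warnings;

# Hypothetical helper: build a Name.version/start-end (NSE) key.
# The version part is omitted when no version is given.
sub nse_key {
    my ($id, $version, $start, $end) = @_;
    my $name = defined $version ? "$id.$version" : $id;
    return "$name/$start-$end";
}

# Hypothetical records: [id, version, start, end]
my @records = (
    ['ABCD1234', 1,     1, 6],
    ['ABCD1234', 2,     1, 6],    # same id, different version: distinct
    ['ABCD1234', undef, 7, 12],   # same id, different location: distinct
    ['ABCD1234', 1,     1, 6],    # identical NSE: would replace the first
);

my %seen;
my @duplicates;
for my $r (@records) {
    my $key = nse_key(@$r);
    # Record the key as a duplicate only if it has been seen before.
    push @duplicates, $key if $seen{$key}++;
}
print "duplicate: $_\n" for @duplicates;
```

Only the last record collides, because it matches the first on every NSE field; changing any one of them is enough to keep both.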
If you think about it, that's a feature. Any single sequence that
appears in an alignment more than once is either (1) matching multiple
regions (i.e. repeats, motifs, etc.), so the location varies, or (2) the
sequence was modified, so the version changes (support for the latter is
fairly new). Beyond that, one has to question the logic of including
multiple copies of exactly the same sequence record in a multiple
alignment, so unless additional information distinguishing the potential
duplicates is provided, we assume unintentional (and erroneous)
duplication and punt.
Weighing the options, I would rather have the warning indicating a
problem than nothing at all. If you absolutely need duplicates (I am
curious as to why), I suggest changing the version number:
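use strict;
use warnings;
use Bio::LocatableSeq;
use Bio::SimpleAlign;
use Bio::AlignIO;

my $aln = Bio::SimpleAlign->new();
# Write ClustalW-formatted output to STDOUT
my $out = Bio::AlignIO->new(-format => 'clustalw', -fh => \*STDOUT);

# Add ten copies of the same sequence; giving each a distinct version
# makes every Name.version/start-end key unique, so none is replaced.
for my $v (1..10) {
    my $ls = Bio::LocatableSeq->new(-id       => 'ABCD1234',
                                    -version  => $v,
                                    -alphabet => 'dna',
                                    -seq      => '--atg---gta--');
    $aln->add_seq($ls);
}
$out->write_aln($aln);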
# output below
___DATA___
CLUSTAL W(1.81) multiple sequence alignment

ABCD1234.1/1-6   --atg---gta--
ABCD1234.2/1-6   --atg---gta--
ABCD1234.3/1-6   --atg---gta--
ABCD1234.4/1-6   --atg---gta--
ABCD1234.5/1-6   --atg---gta--
ABCD1234.6/1-6   --atg---gta--
ABCD1234.7/1-6   --atg---gta--
ABCD1234.8/1-6   --atg---gta--
ABCD1234.9/1-6   --atg---gta--
ABCD1234.10/1-6  --atg---gta--
                   ***   ***
chris