[Bioperl-l] Extracting matching subsequence from pairwise alignment

WoA witch.of.agnessi at gmail.com
Wed May 8 19:24:53 UTC 2013


Hello All,

I've a pairwise global alignemnet of two DNA sequences generated by the
program NEEDLE of EMBOSS package. I wish to extract the sub-sequence that
matches/aligns to a given region of the other sequence. In  this alignment
<http://pastebin.com/wrhEDzJU>   (Pastebin Link) the given region (actually
the CDS) falls between base number 24:485 in the original sequence with ID
'XM_001005073.' I wish to extract the sub-sequence in the sequence ID
'Homolog' that aligns with that 24:485 region of the other sequence.

I'm using Bioperl to parse the alignment. I find out the the alignment
column numbers corresponding to 24:485 region in the particular sequence,
using 'column_from_residue_number'. Then I extract the sub-sequence from the
'aligned' other sequence(containing gaps) using the corresponding column
numbers. Finally I remove the gap characters.

Am I doing this thing correctly and are there any pitfalls ? Is there any
better way to do it by (Bio)Perl/Python? The code goes here:
use strict;
use warnings;
use Bio::AlignIO;

# read in an alignment generated by the EMBOSS program Needle
my $in = new Bio::AlignIO(-format => 'emboss',
                  -file   => 'test_needle.aln');

while( my $aln = $in->next_aln ) {
          #Seqnames: 'XM_001005073.'(CDS:24-485),'Homolog'
          my ($cds_start,$cds_end)=(24,485);#
          my $col_cdsstart = $aln->column_from_residue_number(
'XM_001005073.', $cds_start);
          my $col_cdsend= $aln->column_from_residue_number( 'XM_001005073.',
$cds_end);

          foreach my $seq ($aln->each_seq) {
                    if($seq->id() eq 'Homolog'){
                        my
$homolog_cds=$seq->subseq($col_cdsstart,$col_cdsend);
                         $homolog_cds=~s/\-//g;
                        print $homolog_cds,"\n";
                    }

        }
}





--
View this message in context: http://bioperl.996286.n3.nabble.com/Extracting-matching-subsequence-from-pairwise-alignment-tp16935.html
Sent from the Bioperl-L mailing list archive at Nabble.com.



More information about the Bioperl-l mailing list