[Bioperl-l] sequence comparison

Phillip Lord p.lord@russet.org.uk
Mon, 11 Mar 2002 19:38:12 +0000




I'm trying to compare two sequences at the moment. I want to end up
with a simple metric of how similar the sequences are. My thought was
just to align the sequences and come up with a percent ID value. 
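A percent identity value only has meaning relative to a particular alignment, and conventions for the denominator differ (alignment length, length of the shorter sequence, or number of non-gap columns). Below is a minimal sketch in core Perl. It assumes the two sequences are already aligned as equal-length strings with '-' for gaps and divides by the full alignment length. `percent_identity` is a hypothetical helper, not a BioPerl method.

```perl
use strict;
use warnings;

# Hypothetical helper: percent identity of two already-aligned sequences.
# Assumes equal-length strings with '-' marking gaps; a column counts as
# identical only if both residues match and neither is a gap.
sub percent_identity {
    my ( $a1, $a2 ) = @_;
    die "aligned sequences must have equal length\n"
        unless length $a1 == length $a2;
    my $len = length $a1;
    return 0 unless $len;
    my $identical = 0;
    for my $i ( 0 .. $len - 1 ) {
        my $r1 = uc substr( $a1, $i, 1 );
        my $r2 = uc substr( $a2, $i, 1 );
        $identical++ if $r1 eq $r2 && $r1 ne '-';
    }
    # Denominator: full alignment length, gap columns included
    return 100 * $identical / $len;
}

print percent_identity( "ACDEFG", "ACD-FW" ), "\n";    # prints 66.6666666666667
```

Choosing a different denominator, for example only the non-gap columns, would give a higher figure for gapped alignments. It is worth stating which convention is used whenever a % ID is reported.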

I've tried using the pSW class to do this. It seems to work, but it
appears to produce alignments over only a very small section of the
two proteins.


For instance (where $swissprot->{inx} is a Bio::Index::Swissprot object):

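use Bio::Tools::pSW;

my $seq1 = $swissprot->{inx}->fetch( "KV1K_HUMAN" );
my $seq2 = $swissprot->{inx}->fetch( "IF1Y_HUMAN" );

# Smith-Waterman (local) alignment factory using BLOSUM62,
# gap opening penalty 12 and gap extension penalty 2
my $factory = Bio::Tools::pSW->new( '-matrix' => 'blosum62.bla',
                                    '-gap'    => 12,
                                    '-ext'    => 2, );

print "Seq1 is " . $seq1->seq . "\n";
print "Seq2 is " . $seq2->seq . "\n";

# Align the pair; returns a Bio::SimpleAlign object
my $aln = $factory->pairwise_alignment( $seq1, $seq2 );

$aln->write_fasta( \*STDOUT );
print "% ID is " . $aln->percentage_identity . "\n";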


gives me...

Seq1 is DIQMTQSPSTLSVSVGDRVTITCEASQTVLSYLNWYQQKPGKAPKLLIYAASSLETGV
        PSRFSGQGSGTBFTFTISSVZPZBFATYYCQZYLDLPRTFGQGTKVDLKR
Seq2 is PKNKGKGGKNRRRGKNENESEKRELVFKEDGQEYAQVIKMLGNGRLEALCFDGVKRLC
        HIRGKLRKKVWINTSDIILVGLRDYQDNKADVILKYNADEARSLKAYGGLPEHAKINE
        TDTFGPGDDDEIQFDDIGDDDEDIDDI

>KV1K_HUMAN/89-101
QZYLDLPRTFGQG
>IF1Y_HUMAN/32-44
QEYAQVIKMLGNG
% ID is 30.7692307692308
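The reported figure can be checked by hand. Doing so confirms that it describes only the 13 aligned columns of the two fragments, not the full-length proteins. Q, Y, G and G are the identical positions, so 4/13 × 100 ≈ 30.77. Here is a small core-Perl check of that arithmetic, requiring no BioPerl:

```perl
use strict;
use warnings;

# The two aligned fragments reported by pSW (this alignment has no gaps)
my $frag1 = "QZYLDLPRTFGQG";    # KV1K_HUMAN/89-101
my $frag2 = "QEYAQVIKMLGNG";    # IF1Y_HUMAN/32-44

my @r1 = split //, $frag1;
my @r2 = split //, $frag2;

# grep in scalar context returns the number of identical columns
my $identical = grep { $r1[$_] eq $r2[$_] } 0 .. $#r1;

my $pct = 100 * $identical / scalar @r1;
print "$identical of ", scalar @r1, " columns identical: $pct%\n";
# prints "4 of 13 columns identical: 30.7692307692308%"
```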


I'm not sure whether this is the expected behaviour of the pSW
class. I'm guessing it's just returning the maximal-scoring local
fragment shared by the two sequences.
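That guess is consistent with how local alignment works. pSW performs Smith-Waterman alignment, where each cell of the dynamic-programming matrix is floored at zero, so a stretch of poor similarity resets the score instead of dragging it down. The alignment is traced back from the single highest-scoring cell until a zero is reached, which is why only a short fragment comes back. The recurrence is shown here in a simplified linear-gap form, with substitution score s(a_i, b_j) (e.g. from BLOSUM62) and gap penalty g. pSW's separate -gap and -ext settings suggest it uses the affine-gap (open/extend) variant instead.

```latex
\begin{aligned}
H_{i,0} &= H_{0,j} = 0,\\
H_{i,j} &= \max\{\,0,\; H_{i-1,j-1} + s(a_i, b_j),\; H_{i-1,j} - g,\; H_{i,j-1} - g\,\},\\
S_{\text{local}} &= \max_{i,j} H_{i,j}.
\end{aligned}
```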

My question, though, is this: what's the best way to extract some
measure of sequence similarity from two sequences using BioPerl?
Would the bl2seq module work better for me?
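For a whole-sequence measure, one option is a global alignment. It forces both sequences to be aligned end to end rather than picking out the best local fragment. bl2seq is BLAST-based, so it also reports local alignments. The sketch below is a plain Needleman-Wunsch implementation in core Perl, swapped in for illustration. It is not pSW, not bl2seq, and not a BioPerl API. The scoring (+1 match, -1 mismatch, -2 per gap position) is an assumption chosen for brevity; a real protein comparison would use a substitution matrix such as BLOSUM62 and affine gaps. `global_align` and `global_identity` are hypothetical names.

```perl
use strict;
use warnings;

# Minimal global (Needleman-Wunsch) alignment sketch.
# Assumed toy scoring, NOT pSW's: +1 match, -1 mismatch, -2 per gap position.
my ( $MATCH, $MISMATCH, $GAP ) = ( 1, -1, -2 );

sub global_align {
    my ( $s1, $s2 ) = @_;
    my @x = split //, $s1;
    my @y = split //, $s2;
    my ( $n, $m ) = ( scalar @x, scalar @y );

    # Fill the score matrix; first row/column score all-gap prefixes
    my @F;
    $F[$_][0] = $_ * $GAP for 0 .. $n;
    $F[0][$_] = $_ * $GAP for 0 .. $m;
    for my $i ( 1 .. $n ) {
        for my $j ( 1 .. $m ) {
            my $diag = $F[ $i - 1 ][ $j - 1 ]
                + ( $x[ $i - 1 ] eq $y[ $j - 1 ] ? $MATCH : $MISMATCH );
            my $up   = $F[ $i - 1 ][$j] + $GAP;
            my $left = $F[$i][ $j - 1 ] + $GAP;
            my $best = $diag;
            $best = $up   if $up > $best;
            $best = $left if $left > $best;
            $F[$i][$j] = $best;
        }
    }

    # Trace back from the bottom-right corner so that both sequences
    # are aligned end to end
    my ( $a1, $a2 ) = ( '', '' );
    my ( $i, $j ) = ( $n, $m );
    while ( $i > 0 || $j > 0 ) {
        if ( $i > 0 && $j > 0
            && $F[$i][$j] == $F[ $i - 1 ][ $j - 1 ]
               + ( $x[ $i - 1 ] eq $y[ $j - 1 ] ? $MATCH : $MISMATCH ) )
        {
            $a1 = $x[ $i - 1 ] . $a1;
            $a2 = $y[ $j - 1 ] . $a2;
            $i--; $j--;
        }
        elsif ( $i > 0 && $F[$i][$j] == $F[ $i - 1 ][$j] + $GAP ) {
            $a1 = $x[ $i - 1 ] . $a1;    # gap in the second sequence
            $a2 = '-' . $a2;
            $i--;
        }
        else {
            $a1 = '-' . $a1;             # gap in the first sequence
            $a2 = $y[ $j - 1 ] . $a2;
            $j--;
        }
    }
    return ( $a1, $a2 );
}

# Percent identity over the whole global alignment (gap columns included)
sub global_identity {
    my ( $a1, $a2 ) = global_align(@_);
    return 0 unless length $a1;
    my $same = grep { substr( $a1, $_, 1 ) eq substr( $a2, $_, 1 ) }
        0 .. length($a1) - 1;
    return 100 * $same / length $a1;
}

printf "%.1f%%\n", global_identity( "GATTACA", "GCATGCA" );
```

Because every residue of both sequences is included, the result reflects the full length of each protein. For sequences of quite different lengths, though, much of the denominator will be gap columns, so the figure partly measures the length difference.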

Thanks in advance

Phil