[Bioperl-l] Grouping the sequences in the alignment based on similarity?

Bhakti Dwivedi bhakti.dwivedi at gmail.com
Tue Feb 16 15:17:48 UTC 2010


I have nucleotide sequence alignments containing both closely related and
distantly related sequences.  I wish to produce new sequence alignment(s)
based on the presence of conserved regions among the sequences, so that the
new alignment(s) are subgroups of the original alignment, split by degree
of similarity.

For example (this may not be a perfect example, but it shows the point):
seq1    ATGGCAR
seq2    ATGGCAR
seq3    GCGCTAN
seq4    GCCGTAY

will produce the following two groups:
seq1    ATGGCAR              seq3    GCGCTAN
seq2    ATGGCAR              seq4    GCCGTAY
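
To make the grouping above concrete, here is a minimal sketch in plain Python
(not BioPerl) of one way it could be automated.  It assumes the sequences are
already aligned to equal length, measures similarity as the fraction of
identical columns, and merges sequences by single linkage at a cutoff.  The
function names, the 0.5 threshold and the literal comparison of IUPAC
ambiguity codes (R, N, Y) are illustrative assumptions, not part of the
original question or of any library.

```python
from itertools import combinations


def identity(a, b):
    """Fraction of alignment columns where both sequences carry the same symbol.

    Columns where both sequences have a gap ('-') are skipped.
    Assumption: ambiguity codes are compared literally, so 'N' != 'Y'.
    """
    pairs = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def group_by_identity(seqs, threshold):
    """Single-linkage grouping of aligned sequences.

    Two sequences end up in the same group when they are connected by a
    chain of pairs whose identity is >= threshold (hypothetical cutoff).
    """
    parent = {name: name for name in seqs}

    def find(n):
        # Follow parent links to the group representative, compressing the path.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in combinations(seqs, 2):
        if identity(seqs[a], seqs[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in seqs:
        groups.setdefault(find(name), []).append(name)
    return list(groups.values())


alignment = {
    "seq1": "ATGGCAR",
    "seq2": "ATGGCAR",
    "seq3": "GCGCTAN",
    "seq4": "GCCGTAY",
}
groups = group_by_identity(alignment, threshold=0.5)
print(groups)  # [['seq1', 'seq2'], ['seq3', 'seq4']]
```

Here seq3 and seq4 share 4 of 7 columns (about 0.57), while seq1 shares only
2 of 7 with either of them, so a cutoff of 0.5 separates the two groups.  The
right cutoff for real data would have to be chosen by inspection.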


At the moment this is a manual process: I select or de-select sequences in
the alignment based on how similar they look to each other, in order to
obtain a better conserved consensus sequence for each group.  I know that
there are techniques such as clustering algorithms for grouping the
sequences in a multiple alignment, but I am wondering whether there is a
way to automate this in BioPerl.
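
To illustrate why splitting the alignment gives a better consensus for each
group, here is a small plain-Python sketch (again not BioPerl).  It builds a
strict column-wise consensus in which a column keeps its symbol only when at
least `min_fraction` of the sequences agree, and is otherwise written as `N`.
The function name, the `min_fraction` parameter and the use of `N` as the
placeholder are assumptions made for the illustration.

```python
from collections import Counter


def consensus(seqs, min_fraction=1.0, unknown="N"):
    """Column-wise consensus of equal-length aligned sequences.

    A column contributes its most common symbol if that symbol's share of
    the column is >= min_fraction; otherwise `unknown` is emitted.
    """
    out = []
    for col in zip(*seqs):
        symbol, count = Counter(col).most_common(1)[0]
        out.append(symbol if count / len(col) >= min_fraction else unknown)
    return "".join(out)


group_a = ["ATGGCAR", "ATGGCAR"]   # seq1, seq2 from the example
group_b = ["GCGCTAN", "GCCGTAY"]   # seq3, seq4 from the example
everything = group_a + group_b

print(consensus(group_a))     # ATGGCAR
print(consensus(group_b))     # GCNNTAN
print(consensus(everything))  # NNNNNAN
```

With all four sequences together only one column is fully conserved, whereas
each subgroup on its own retains most of its columns.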

Thanks!

Bhakti


