[Bioperl-l] Grouping the sequences in the alignment based on similarity?
Bhakti Dwivedi
bhakti.dwivedi at gmail.com
Tue Feb 16 15:17:48 UTC 2010
I have nucleotide sequence alignments that contain both closely related and
distantly related sequences. I would like to produce new alignment(s) based
on the conserved regions shared among the sequences, so that each new
alignment is a subgroup of the original alignment, grouped by degree of
similarity.
For example (this may not be a perfect example, but it shows the point):
seq1 ATGGCAR
seq2 ATGGCAR
seq3 GCGCTAN
seq4 GCCGTAY
will produce the following two groups:
Group 1:
seq1 ATGGCAR
seq2 ATGGCAR
Group 2:
seq3 GCGCTAN
seq4 GCCGTAY
At the moment this is a manual process: I select or de-select sequences in
the alignment based on how similar they look to each other, in order to
obtain a better conserved consensus sequence for each group. I know that
clustering algorithms can be used to group the sequences in a multiple
alignment, but I am wondering whether there is a way to automate this in
BioPerl?
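To make the manual step concrete, here is a minimal sketch of the kind of
grouping I mean. It is plain Python rather than BioPerl, and the function
names (identity, group_by_identity), the 0.5 threshold, and the
single-linkage rule are all my own assumptions for illustration. Ambiguity
codes (R, Y, N) are compared as literal characters, which is a
simplification.

```python
def identity(a, b):
    """Fraction of aligned columns in which two rows carry the same character."""
    if len(a) != len(b):
        raise ValueError("rows must come from the same alignment")
    # Skip columns where both rows have a gap; they carry no information.
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if not (x == "-" and y == "-")]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def group_by_identity(alignment, threshold=0.5):
    """Single-linkage grouping of alignment rows.

    A row joins every existing group that has at least one member with
    identity >= threshold to it; groups linked this way are merged.
    The threshold is a hypothetical cutoff, not a recommended value.
    """
    groups = []
    for name, seq in alignment.items():
        joined, rest = [], []
        for group in groups:
            if any(identity(seq, alignment[m]) >= threshold for m in group):
                joined.extend(group)
            else:
                rest.append(group)
        groups = rest + [joined + [name]]
    return groups


# The four rows from the example above.
alignment = {
    "seq1": "ATGGCAR",
    "seq2": "ATGGCAR",
    "seq3": "GCGCTAN",
    "seq4": "GCCGTAY",
}

for number, group in enumerate(group_by_identity(alignment), start=1):
    print("Group", number)
    for name in group:
        print(name, alignment[name])
```

With a stricter threshold the rows split into more, tighter groups, and
with a looser one they collapse into fewer, so the cutoff would need tuning
per alignment.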
Thanks!
Bhakti