[Biojava-l] Similarity measures for generalized sequences
Andreas Prlic
andreas at sdsc.edu
Tue May 8 02:33:33 UTC 2012
Hi Oliver,
Here are just a couple of keywords for you to look at; I'm not sure
whether you have already looked at any of these:
- distance between sequences
- probabilistic similarity measures
- multidimensional scaling
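As one concrete reading of "distance between sequences", here is a minimal sketch (not a BioJava API) of the Levenshtein (edit) distance between two paths. It treats each whole region name as a single symbol, so a name like "Bed nucleus of the stria terminalis anterior division" is one token. The class name `PathEditDistance` and the normalisation 1 - d / max(length) are illustrative assumptions, not an established convention.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Minimal sketch: edit (Levenshtein) distance between two paths, where each
 * path is a list of region names and every region name counts as one symbol.
 */
final class PathEditDistance {

    /** Minimum number of region insertions, deletions or substitutions turning a into b. */
    static int distance(List<String> a, List<String> b) {
        int[] prev = new int[b.size() + 1];
        int[] curr = new int[b.size() + 1];
        for (int j = 0; j <= b.size(); j++) {
            prev[j] = j; // cost of inserting the first j regions of b
        }
        for (int i = 1; i <= a.size(); i++) {
            curr[0] = i; // cost of deleting the first i regions of a
            for (int j = 1; j <= b.size(); j++) {
                // whole region names are compared, not individual characters
                int subst = a.get(i - 1).equals(b.get(j - 1)) ? 0 : 1;
                curr[j] = Math.min(Math.min(prev[j] + 1, curr[j - 1] + 1), prev[j - 1] + subst);
            }
            int[] tmp = prev; // the row just filled becomes the previous row
            prev = curr;
            curr = tmp;
        }
        return prev[b.size()];
    }

    /** Hypothetical normalisation to a similarity in [0, 1]: 1 - d / max(|a|, |b|). */
    static double similarity(List<String> a, List<String> b) {
        int longest = Math.max(a.size(), b.size());
        return longest == 0 ? 1.0 : 1.0 - (double) distance(a, b) / longest;
    }

    public static void main(String[] args) {
        List<String> s1 = Arrays.asList("A", "B", "C", "G");
        List<String> s3 = Arrays.asList("A", "C", "B", "G");
        System.out.println(distance(s1, s3));   // prints 2
        System.out.println(similarity(s1, s3)); // prints 0.5
    }
}
```

For Oliver's example paths quoted below, S1, S2 and S4 are each one substitution apart (distance 1), while S3 differs from each of them in two positions (distance 2). Plain Levenshtein has no transposition operation, so swapping B and C costs two edits.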
Hope that makes some sense...
Andreas
On Mon, May 7, 2012 at 3:21 AM, Oliver Schmitt
<schmitt at med.uni-rostock.de> wrote:
> Hi,
>
> I'm looking for general advice regarding the comparison of sequences
> (S). I don't necessarily mean DNA sequences, but rather
> sequences such as "Region A is connected with Region B" (shortly A->B), and
> then a distance or similarity measure that
> allows identifying similar sequences or paths. The regions are
> coded by alphanumeric names such as "Bed nucleus of the stria terminalis
> anterior division".
> Given are 10^2 to 10^7 different paths. What I am looking for are all their mutual
> similarities (e.g., a similarity matrix) and a multivariate
> classification such as a dendrogram
> based on a meaningful cluster analysis.
>
> Example
> Given:
> S1: A->B->C->G
> S2: A->B->F->G
> S3: A->C->B->G
> S4: A->B->D->G
>
> Searched:
> Similarity matrix
>
> S1 S2 S3 S4
> S1 ? ? ? ?
> S2 ? ? ? ?
> S3 ? ? ? ?
> S4 ? ? ? ?
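One way to fill such a matrix, sketched under the assumption that a path is characterised only by the set of directed connections it contains (A->B, B->C, ...): compare two paths with the Jaccard index |E1 ∩ E2| / |E1 ∪ E2| of their connection sets. This is an illustrative choice, not a BioJava API, and it ignores where in the path a connection occurs. The class name `PathJaccard` is hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Minimal sketch: similarity of two paths as the Jaccard index of their
 * sets of directed connections.
 */
final class PathJaccard {

    /** Directed connections of a path, e.g. [A, B, C] gives {[A, B], [B, C]}. */
    static Set<List<String>> edges(List<String> path) {
        Set<List<String>> result = new HashSet<>();
        for (int i = 0; i + 1 < path.size(); i++) {
            // a two-element list has value-based equals/hashCode, so it works as a set key
            result.add(Arrays.asList(path.get(i), path.get(i + 1)));
        }
        return result;
    }

    /** Jaccard similarity of the connection sets, in [0, 1]. */
    static double similarity(List<String> p, List<String> q) {
        Set<List<String>> inter = edges(p);
        Set<List<String>> union = edges(p);
        Set<List<String>> other = edges(q);
        inter.retainAll(other);
        union.addAll(other);
        if (union.isEmpty()) {
            // hypothetical convention for single-region paths (no connections at all)
            return p.equals(q) ? 1.0 : 0.0;
        }
        return (double) inter.size() / union.size();
    }

    /** Symmetric all-pairs similarity matrix; needs O(n^2) comparisons. */
    static double[][] matrix(List<List<String>> paths) {
        int n = paths.size();
        double[][] m = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = i; j < n; j++) {
                m[i][j] = similarity(paths.get(i), paths.get(j));
                m[j][i] = m[i][j];
            }
        }
        return m;
    }

    public static void main(String[] args) {
        List<List<String>> paths = Arrays.asList(
                Arrays.asList("A", "B", "C", "G"),  // S1
                Arrays.asList("A", "B", "F", "G"),  // S2
                Arrays.asList("A", "C", "B", "G"),  // S3
                Arrays.asList("A", "B", "D", "G")); // S4
        for (double[] row : matrix(paths)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

For the four example paths this gives 0.2 between each pair of S1, S2 and S4 (they share only A->B out of five distinct connections) and 0 between S3 and every other path. Note that a full matrix grows quadratically: it is unproblematic for 10^2 to 10^4 paths, but for 10^7 paths it would have about 10^14 entries, far more than fits in memory.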
>
> Then I would like to generate a dendrogram based on the similarity measure:
>
> S1--
>     |--
> S2--   |
>        |----
> S3--   |
>     |--
> S4--
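A dendrogram like this can be derived from a distance matrix by agglomerative clustering. Below is a minimal sketch of average linkage (UPGMA), one common choice among several linkage rules, not necessarily the "meaningful" one for brain-region paths. The input matrix holds the token edit distances for S1 to S4 worked out by hand from the example (1 between S1, S2 and S4; 2 from S3 to each of them). The class `AverageLinkage` and its `Merge` record are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Minimal sketch of average-linkage (UPGMA) agglomerative clustering on a
 * symmetric distance matrix. The merge list, read in order, is the
 * information a dendrogram is drawn from. Naive O(n^3) version.
 */
final class AverageLinkage {

    /** One merge: the two clusters joined and the distance (height) at which they joined. */
    static final class Merge {
        final String left;
        final String right;
        final double height;

        Merge(String left, String right, double height) {
            this.left = left;
            this.right = right;
            this.height = height;
        }

        @Override
        public String toString() {
            return left + " + " + right + " at " + height;
        }
    }

    static List<Merge> cluster(double[][] dist, String[] labels) {
        int n = labels.length;
        double[][] d = new double[n][];
        for (int i = 0; i < n; i++) {
            d[i] = dist[i].clone(); // work on a copy, the caller's matrix stays unchanged
        }
        String[] name = labels.clone();
        int[] size = new int[n];
        boolean[] active = new boolean[n];
        Arrays.fill(size, 1);
        Arrays.fill(active, true);
        List<Merge> merges = new ArrayList<>();

        for (int step = 1; step < n; step++) {
            // closest pair of active clusters; the first pair found wins ties
            int bi = -1, bj = -1;
            for (int i = 0; i < n; i++) {
                for (int j = i + 1; j < n; j++) {
                    if (active[i] && active[j] && (bi < 0 || d[i][j] < d[bi][bj])) {
                        bi = i;
                        bj = j;
                    }
                }
            }
            merges.add(new Merge(name[bi], name[bj], d[bi][bj]));
            // UPGMA update: distance to the merged cluster is the size-weighted mean
            for (int k = 0; k < n; k++) {
                if (active[k] && k != bi && k != bj) {
                    d[bi][k] = (size[bi] * d[bi][k] + size[bj] * d[bj][k]) / (size[bi] + size[bj]);
                    d[k][bi] = d[bi][k];
                }
            }
            name[bi] = "(" + name[bi] + "," + name[bj] + ")"; // Newick-style label
            size[bi] += size[bj];
            active[bj] = false;
        }
        return merges;
    }

    public static void main(String[] args) {
        double[][] dist = {
            {0, 1, 2, 1},
            {1, 0, 2, 1},
            {2, 2, 0, 2},
            {1, 1, 2, 0},
        };
        for (Merge m : cluster(dist, new String[] {"S1", "S2", "S3", "S4"})) {
            System.out.println(m);
        }
    }
}
```

With these distances S1 and S2 merge first at 1.0 (a tie with S4, broken by index order), S4 joins at 1.0, and S3 joins last at 2.0 because it visits B and C in the opposite order. This naive version is only meant for small inputs; it will not scale to millions of paths.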
>
>
> Thanks a lot for any advice.
>
> Regards,
> Oliver
>
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
>