[Biojava-l] Similarity measures for generalized sequences

Tue May 8 02:33:33 UTC 2012

Hi Oliver,

Here are just a couple of keywords for you to look at; I'm not sure whether you
have looked at any of these already...

- distance between sequences
- probabilistic similarity measures
- multidimensional scaling
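A minimal sketch of the first keyword, assuming each path is stored as a Python list of region labels. Full names such as "Bed nucleus of the stria terminalis anterior division" work just as well as single letters. The classic Levenshtein edit distance counts how many insertions, deletions and substitutions of single regions are needed to turn one path into another. The function name `edit_distance` is hypothetical and is not a BioJava API.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences of region labels.

    Each insertion, deletion or substitution of one region costs 1.
    """
    # prev[j] holds the distance between a[:i-1] and b[:j]
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute (free if labels are equal)
            ))
        prev = curr
    return prev[-1]


s1 = ["A", "B", "C", "G"]
s3 = ["A", "C", "B", "G"]
print(edit_distance(s1, s3))  # 2
```

Swapping two adjacent regions, as in S1 versus S3 below, costs 2 under this measure. A Damerau-style variant would count it as 1 if transpositions should be cheap.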

Hope that makes some sense...

Andreas

On Mon, May 7, 2012 at 3:21 AM, Oliver Schmitt
<schmitt at med.uni-rostock.de> wrote:
> Hi,
>
> I'm looking for general advice regarding the comparison of sequences
> (S). I don't necessarily mean DNA sequences, but rather
> sequences like "Region A is connected with Region B" (A->B for short),
> and then a distance or similarity measure that
> allows one to identify similar sequences or paths. The regions are
> coded alphanumerically, e.g. "Bed nucleus of the stria terminalis
> anterior division".
> Given are 10^2 to 10^7 different paths; what I am looking for are all
> their mutual similarities (e.g., a similarity matrix) and a multivariate
> classification such as a dendrogram
> based on a meaningful cluster analysis.
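For scale, here is a rough count of how large a full pairwise matrix gets. It assumes condensed (upper-triangle only) storage of 8-byte floating-point values.

```latex
\binom{n}{2} = \frac{n(n-1)}{2}, \qquad
n = 10^2:\ 4950 \text{ pairs} \approx 40\ \text{kB}, \qquad
n = 10^7:\ \approx 5 \times 10^{13} \text{ pairs} \approx 4 \times 10^{14}\ \text{bytes} \approx 400\ \text{TB}.
```

The lower end of the range is trivial. The upper end probably needs an approach that avoids materializing every pair.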
>
> Example
> Given:
> S1: A->B->C->G
> S2: A->B->F->G
> S3: A->C->B->G
> S4: A->B->D->G
>
> Searched:
> Similarity matrix
>
>     S1  S2  S3  S4
> S1  ?   ?   ?   ?
> S2  ?   ?   ?   ?
> S3  ?   ?   ?   ?
> S4  ?   ?   ?   ?
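One way to fill in the ? entries is sketched below. It defines similarity as one minus the Levenshtein distance divided by the length of the longer path, which is only one of many possible choices. The helper names `parse_path`, `edit_distance` and `similarity` are hypothetical. The sketch also assumes that region names never contain the "->" separator.

```python
def parse_path(text):
    """Split a path such as 'A->B->C->G' into a list of region labels."""
    return [region.strip() for region in text.split("->")]


def edit_distance(a, b):
    # Levenshtein distance on region labels (unit cost per edit)
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = curr
    return prev[-1]


def similarity(a, b):
    # The edit distance never exceeds the longer length, so this lies in [0, 1];
    # identical paths score 1.0
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


paths = {
    "S1": parse_path("A->B->C->G"),
    "S2": parse_path("A->B->F->G"),
    "S3": parse_path("A->C->B->G"),
    "S4": parse_path("A->B->D->G"),
}
names = list(paths)
matrix = [[similarity(paths[p], paths[q]) for q in names] for p in names]

print("    " + "  ".join(f"{n:>4}" for n in names))
for n, row in zip(names, matrix):
    print(f"{n:<4}" + "  ".join(f"{v:4.2f}" for v in row))
```

With this definition, S1, S2 and S4 are pairwise 0.75 similar because each pair differs by a single substitution. Every pair involving S3 scores 0.50, because S3 differs from each of the others in two positions.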
>
> Then I would like to generate a dendrogram based on the similarity measure:
>
> S1--+
>     |--+
> S2--+  |
>        |----
> S3--+  |
>     |--+
> S4--+
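A dendrogram like this can be built by agglomerative clustering. Below is a minimal sketch of average linkage (UPGMA), assuming that distance is defined as 1 - similarity. The 4 x 4 matrix is what the normalized edit-distance similarity described above gives for S1 to S4. The function name `average_linkage` is hypothetical. Because it uses a naive search over all cluster pairs, it is only suitable for small inputs.

```python
def average_linkage(names, dist):
    """Agglomerative clustering with average linkage (UPGMA).

    names: labels of the n items; dist: symmetric n x n distance matrix.
    Returns the merge steps as (left members, right members, distance).
    """
    clusters = {i: [i] for i in range(len(names))}  # cluster id -> member indices
    next_id = len(names)
    merges = []

    def linkage(a, b):
        # Average linkage: mean distance over all cross-cluster pairs
        pairs = [dist[x][y] for x in clusters[a] for y in clusters[b]]
        return sum(pairs) / len(pairs)

    while len(clusters) > 1:
        ids = sorted(clusters)
        candidates = [(p, q) for k, p in enumerate(ids) for q in ids[k + 1:]]
        # min() keeps the first candidate on ties, so ties follow id order
        a, b = min(candidates, key=lambda pq: linkage(*pq))
        merges.append(([names[x] for x in clusters[a]],
                       [names[x] for x in clusters[b]],
                       linkage(a, b)))
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return merges


names = ["S1", "S2", "S3", "S4"]
dist = [
    [0.00, 0.25, 0.50, 0.25],
    [0.25, 0.00, 0.50, 0.25],
    [0.50, 0.50, 0.00, 0.50],
    [0.25, 0.25, 0.50, 0.00],
]
for left, right, height in average_linkage(names, dist):
    print(f"{'+'.join(left)}  with  {'+'.join(right)}  at {height:.2f}")
```

Here S1 and S2 merge first at 0.25, S4 then joins them at 0.25, and S3 joins last at 0.50. Ties are broken by the order in which pairs are examined, so the resulting tree need not match the sketch above. For a few thousand paths, SciPy's `scipy.cluster.hierarchy.linkage(..., method="average")` together with `scipy.cluster.hierarchy.dendrogram` covers the same ground without the naive loop.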
>
>
> Thanks a lot for any advice.
>
> Regards,
> Oliver
>
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
>