[Bioperl-l] Comparative genomics

Osborne, Brian Brian.Osborne@osip.com
Fri, 28 Sep 2001 10:02:56 -0400


Elia,

>> protein families, genomic location, phylogenetic tree structure, sequence
>> similarity

>> Comments?

Just a thought. When you have trees or families you have the opportunity to
compare members of orthologous groups to see what's unique to the individual
orthologues. I.e. mouse mGluR5 and human mGluR5 might not simply be the most
similar in the overall sense, but might also share motifs, regular
expressions or sequences that other members of the mGlu family do not. So the
code would look for these mGluR5-unique identifiers. Of course, databases of motifs
already exist, but they don't always get down to the individual orthologue
level. Another form of verification, I suppose, if the commonality exists or
is detectable. I hear biologists say "these are orthologues" all the time -
I wonder "how do they know?"
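
To make the idea of group-unique identifiers concrete, here is a minimal
Python sketch. It assumes the simplest possible "motif", an exact k-mer, and
reports the k-mers present in every member of an orthologue group but in no
other family member. The function names and the toy sequences are
hypothetical, invented for illustration; real motifs would need alignments or
regular expressions rather than exact matches.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def group_unique_kmers(orthologues, other_members, k=4):
    """k-mers found in every orthologue but in no other family member."""
    if not orthologues:
        raise ValueError("need at least one orthologue sequence")
    # Shared by all members of the orthologue group.
    shared = set.intersection(*(kmers(s, k) for s in orthologues))
    # Seen anywhere in the rest of the family.
    seen_elsewhere = set().union(*(kmers(s, k) for s in other_members))
    return shared - seen_elsewhere


# Toy sequences, not real mGluR data.
group = ["MKTAYWQRL", "MKSAYWQRV"]
rest_of_family = ["MKTAHWPRL", "MKSAHWQRV"]
print(sorted(group_unique_kmers(group, rest_of_family)))  # ['AYWQ', 'YWQR']
```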

I can also imagine that these unique identifiers don't always exist at the
primary sequence level, but are conserved in folds, etc.

Brian O.

 -----Original Message-----
From: 	Elia Stupka [mailto:elia@ebi.ac.uk] 
Sent:	Friday, September 28, 2001 6:56 AM
To:	Ewan Birney
Cc:	ensdev; Bioperl; Fugu Project mailing list
Subject:	[Bioperl-l] Comparative genomics

Hello,

   during my week in Singapore we had some interesting discussions over
genome comparisons. The take home messages seemed to be:

1-One cannot do anything serious about orthologue/homologue/paralogue
detection with sequence similarity alone

2-To get good scientific results (read research project) one needs one or
more researchers actually going down there and analysing families with
the help of phylogenetic trees, and spending considerable time and brain
on figuring out what is what.

Over in Singapore, we are going to try and develop a two-track system. On
one hand we will have a postdoc looking at each gene family, trees, etc.
in detail as per point 2. On the other hand we obviously want to get
something as good as possible to run automatically as part of the
ensembl pipeline.

It turns out that phylogenetic trees have a nice flatfile format, and can
be parsed. We'll write the db schema, object layer, etc. to store those
phylogenetic trees in the db. Please shout if similar things exist
already, I am thinking of adding them to bioperl.
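
As an illustration of how little is needed to parse such a flatfile, here is
a minimal Python sketch. It assumes the format in question is Newick (New
Hampshire) notation, which the message does not actually name, and it handles
only plain labels and branch lengths, not quoted labels or comments. The Node
class and parse_newick function are hypothetical names, not part of bioperl.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str = ""
    length: Optional[float] = None
    children: List["Node"] = field(default_factory=list)

    def leaves(self):
        """Leaf names below this node, in left-to-right order."""
        if not self.children:
            return [self.name]
        return [leaf for child in self.children for leaf in child.leaves()]


def parse_newick(text):
    """Parse a simple Newick string such as '((a:1,b:2):0.5,c:3);'."""
    text = text.strip()
    if not text.endswith(";"):
        raise ValueError("Newick string must end with ';'")
    pos = 0

    def parse_node():
        nonlocal pos
        node = Node()
        if text[pos] == "(":
            pos += 1  # consume "("
            while True:
                node.children.append(parse_node())
                if text[pos] == ",":
                    pos += 1
                elif text[pos] == ")":
                    pos += 1
                    break
                else:
                    raise ValueError(f"unexpected {text[pos]!r} at {pos}")
        start = pos
        while text[pos] not in ",():;":  # optional label
            pos += 1
        node.name = text[start:pos].strip()
        if text[pos] == ":":  # optional branch length
            pos += 1
            start = pos
            while text[pos] not in ",();":
                pos += 1
            node.length = float(text[start:pos])
        return node

    root = parse_node()
    if pos != len(text) - 1:
        raise ValueError(f"unexpected {text[pos]!r} at {pos}")
    return root


tree = parse_newick("((human_mGluR5:0.1,mouse_mGluR5:0.12):0.3,fugu_mGluR5:0.5);")
print(tree.leaves())
```

Storing such a tree in a db then comes down to one row per node with a
parent reference, a label and a branch length.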

Once those trees are in the db, we hope to write a magic OrthologueFinder
which will take into account protein families, genomic location,
phylogenetic tree structure, sequence similarity, and assign orthologues.
As far as I am aware the question hasn't been tackled in an automated way
yet, so probably we will find out what is doable, and how good the
approach is once we get the data rolling...
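
One way the tree structure alone could contribute is sketched below in
Python. This is not the OrthologueFinder described above: it is the
well-known species-overlap heuristic, swapped in as an example, and it
ignores protein families, genomic location and sequence similarity. A node
whose two subtrees share no species is treated as a speciation, and gene
pairs split by it are called orthologues; any overlap is treated as a
duplication. The nested-tuple tree encoding, the "species|gene" leaf labels
and the function names are assumptions made for illustration, and binary
trees are assumed.

```python
from itertools import product


def species_of(leaf):
    """Leaves are labelled 'species|gene' (an assumed convention)."""
    return leaf.split("|", 1)[0]


def leaves(tree):
    """All leaf labels of a nested-tuple tree."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def orthologue_pairs(tree):
    """Gene pairs separated by a node whose subtrees share no species."""
    pairs = set()
    if isinstance(tree, str):
        return pairs
    child_leaves = [leaves(child) for child in tree]
    for i, child in enumerate(tree):
        pairs |= orthologue_pairs(child)
        for j in range(i + 1, len(tree)):
            species_i = {species_of(x) for x in child_leaves[i]}
            species_j = {species_of(x) for x in child_leaves[j]}
            if species_i.isdisjoint(species_j):  # looks like a speciation
                for a, b in product(child_leaves[i], child_leaves[j]):
                    pairs.add(frozenset((a, b)))
    return pairs


# Toy gene tree: the root is a duplication, so no cross-subfamily pairs.
toy = (("human|mGluR5", "mouse|mGluR5"), ("human|mGluR1", "mouse|mGluR1"))
for pair in sorted(sorted(p) for p in orthologue_pairs(toy)):
    print(pair)
```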

Once the orthologues are identified we could also have a
GeneClusterFinder, which looks at a gene and its orthologues, and walks on
the sides of them to look for conserved gene clusters.
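
The walk along both sides might look roughly like the Python sketch below. It
assumes each genome region is available as a list of gene ids in chromosomal
order and that orthologues are given as a dictionary from one genome to the
other. The function name, the window parameter and the toy gene ids are
hypothetical, and strand and gene orientation are ignored.

```python
def conserved_neighbours(gene, order_a, order_b, orthologue_of, window=3):
    """Flanking genes of `gene` whose orthologues flank its orthologue."""
    partner = orthologue_of.get(gene)
    if partner is None or gene not in order_a or partner not in order_b:
        return []

    def flank(order, g):
        # Up to `window` genes on each side of g, excluding g itself.
        i = order.index(g)
        return order[max(0, i - window):i] + order[i + 1:i + 1 + window]

    flank_b = set(flank(order_b, partner))
    return [g for g in flank(order_a, gene) if orthologue_of.get(g) in flank_b]


# Toy gene orders and orthologue assignments.
order_a = ["a1", "a2", "a3", "a4", "a5"]
order_b = ["b9", "b2", "b3", "b4", "b7"]
orthologue_of = {"a1": "b8", "a2": "b2", "a3": "b3", "a4": "b4"}
print(conserved_neighbours("a3", order_a, order_b, orthologue_of, window=2))
# ['a2', 'a4']
```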

Comments?

Elia

******************************
* http://www.ebi.ac.uk/~elia *
* tel:    +44 1223 49 44 31  *
* mobile: +44 7971 59 03 69  *
* fax:    +44 1223 49 44 68  *
******************************


