[Bioperl-l] Comparative genomics

Elia Stupka elia@ebi.ac.uk
Fri, 28 Sep 2001 11:56:00 +0100 (BST)


Hello,

   During my week in Singapore we had some interesting discussions about
genome comparisons. The take-home messages seemed to be:

1- One cannot do anything serious about orthologue/homologue/paralogue
detection with sequence similarity alone.

2- To get good scientific results (read: a research project) one needs one
or more researchers actually going in and analysing families with the
help of phylogenetic trees, and spending considerable time and brainpower
on figuring out what is what.

Over in Singapore, we are going to try and develop a two-track system. On
one hand we will have a postdoc looking at each gene family, trees, etc.
in detail as per point 2. On the other hand we obviously want to get
something as good as possible to run automatically as part of the
ensembl pipeline.

It turns out that phylogenetic trees have a nice flatfile format, and can
be parsed. We'll write the db schema, object layer, etc. to store those
phylogenetic trees in the db. Please shout if similar things exist
already; I am thinking of adding them to bioperl.
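
To make the parsing idea concrete, here is a minimal sketch. It assumes
the flatfile format is the parenthesised Newick (New Hampshire) style,
which this message does not actually name, and it is written in Python
purely for illustration rather than as proposed bioperl code. The names
parse_newick and leaf_names are hypothetical, and quoted labels and
comments are not handled.

```python
def parse_newick(text):
    """Parse a simple Newick string into nested dicts.

    Each node is {'name': str, 'length': float or None, 'children': list}.
    Assumption: no quoted labels and no [comments] in the input.
    """
    text = text.strip()
    if text.endswith(";"):
        text = text[:-1]
    pos = 0

    def parse_node():
        nonlocal pos
        children = []
        if pos < len(text) and text[pos] == "(":
            pos += 1  # consume '('
            while True:
                children.append(parse_node())
                if pos >= len(text):
                    raise ValueError("unbalanced parentheses")
                if text[pos] == ",":
                    pos += 1  # another sibling follows
                    continue
                if text[pos] == ")":
                    pos += 1  # end of this group of children
                    break
                raise ValueError("unexpected character at %d" % pos)
        # Optional node label
        start = pos
        while pos < len(text) and text[pos] not in ",():":
            pos += 1
        name = text[start:pos].strip()
        # Optional branch length after ':'
        length = None
        if pos < len(text) and text[pos] == ":":
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",()":
                pos += 1
            length = float(text[start:pos])
        return {"name": name, "length": length, "children": children}

    root = parse_node()
    if pos != len(text):
        raise ValueError("trailing characters after tree")
    return root


def leaf_names(node):
    """Return the leaf labels of a parsed tree, left to right."""
    if not node["children"]:
        return [node["name"]]
    names = []
    for child in node["children"]:
        names.extend(leaf_names(child))
    return names


# Made-up example tree, not real data.
tree = parse_newick("((human_A:0.1,mouse_A:0.2):0.05,fugu_A:0.3);")
```

A nested structure like this maps fairly directly onto a node table with
a parent pointer, which is roughly what a db schema would need to store.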

Once those trees are in the db, we hope to write a magic OrthologueFinder
which will take into account protein families, genomic location,
phylogenetic tree structure, sequence similarity, and assign orthologues.
As far as I am aware the question hasn't been tackled in an automated way
yet, so probably we will find out what is doable, and how good the
approach is once we get the data rolling...
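
As a rough illustration of the tree-structure part only, here is a
sketch of one known heuristic (species overlap): an internal node whose
child subtrees share a species is treated as a duplication, otherwise as
a speciation, and two genes are called orthologues when the node joining
them is a speciation. This is not the OrthologueFinder itself; it
ignores protein families, genomic location and sequence similarity. The
"species|gene" leaf naming, the nested-tuple tree and the function names
are assumptions made for the example, which is in Python for
illustration.

```python
from itertools import product


def species_of(leaf):
    # Assumed naming convention for leaves: "species|gene_id"
    return leaf.split("|", 1)[0]


def leaves(tree):
    """Collect leaf labels from a nested-tuple tree."""
    if isinstance(tree, str):
        return [tree]
    out = []
    for child in tree:
        out.extend(leaves(child))
    return out


def orthologue_pairs(tree):
    """Return gene pairs whose joining node looks like a speciation."""
    pairs = set()
    if isinstance(tree, str):
        return pairs
    child_leaves = [leaves(child) for child in tree]
    child_species = [{species_of(l) for l in ls} for ls in child_leaves]
    n = len(child_leaves)
    # Species-overlap rule: shared species between subtrees => duplication
    is_duplication = any(
        child_species[i] & child_species[j]
        for i in range(n)
        for j in range(i + 1, n)
    )
    if not is_duplication:
        for i in range(n):
            for j in range(i + 1, n):
                for a, b in product(child_leaves[i], child_leaves[j]):
                    pairs.add(frozenset((a, b)))
    # Recurse so that speciations below a duplication are still found
    for child in tree:
        pairs |= orthologue_pairs(child)
    return pairs


# Made-up family: a duplication at the root, then a speciation in each copy.
family = (("human|A1", "mouse|A1"), ("human|A2", "mouse|A2"))
found = orthologue_pairs(family)
```

On this toy family the root is flagged as a duplication, so human|A1 and
mouse|A2 are not paired, while the two within-copy pairs are.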

Once the orthologues are identified we could also have a
GeneClusterFinder, which looks at a gene and its orthologues, and walks
along either side of them to look for conserved gene clusters.
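
The walking step could look something like the sketch below. It assumes
gene order along each genome is available as a plain list and that
orthologue assignments are already in a dictionary; the function name,
the window parameter and the gene ids are all hypothetical, and strand
and distance are ignored. Python is used for illustration only.

```python
def conserved_neighbours(anchor_a, anchor_b, order_a, order_b,
                         orthologue_of, window=3):
    """List (gene_a, gene_b) orthologue pairs found near both anchors.

    order_a / order_b: gene ids in chromosomal order for each genome.
    orthologue_of: dict mapping a gene in genome A to its orthologue in B.
    window: how many genes to walk on each side of the anchors.
    """
    ia = order_a.index(anchor_a)
    ib = order_b.index(anchor_b)
    near_a = order_a[max(0, ia - window): ia + window + 1]
    near_b = set(order_b[max(0, ib - window): ib + window + 1])
    hits = []
    for gene in near_a:
        if gene == anchor_a:
            continue  # the anchor pair itself is not counted
        partner = orthologue_of.get(gene)
        if partner is not None and partner != anchor_b and partner in near_b:
            hits.append((gene, partner))
    return hits


# Made-up gene orders and orthologue assignments.
order_a = ["a1", "a2", "a3", "a4", "a5"]
order_b = ["b9", "b2", "b3", "b4", "b7"]
orthologue_of = {"a2": "b2", "a3": "b3", "a4": "b4", "a5": "b5"}

hits = conserved_neighbours("a3", "b3", order_a, order_b,
                            orthologue_of, window=2)
```

A cluster could then be called conserved when the number of hits passes
some threshold, which would need tuning on real data.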

Comments?

Elia

******************************
* http://www.ebi.ac.uk/~elia *
* tel:    +44 1223 49 44 31  *
* mobile: +44 7971 59 03 69  *
* fax:    +44 1223 49 44 68  *
******************************