[Bioperl-l] Homology/Phylogeny pretty-print for non-bioinformatics researchers

Mon Aug 17 17:14:57 UTC 2009

One of the questions facing people working in bioinformatics is "How do we
present information so that it can be effectively interpreted by
non-informatics specialists?"

My expertise lies in computer science (esp. O.S. & databases) and, as a
second vocation, the biology of aging (DNA damage & repair and, to a lesser
extent, cancer and pathologies of aging, etc.).  By my estimate there are
perhaps 5 people in the world who are able to effectively discuss computer
science X aging (gerontology) [3].  There are perhaps several dozen people
for whom those areas, esp. aging, may overlap with DNA damage & repair.  But
then there is a wider audience of perhaps a few hundred members of AGE, and
maybe a thousand or so who are members of the scientific subgroup of GSA.
Most of those individuals are "old school" scientists who know
relatively little about bioinformatics.  So there are barriers to presenting
bioinformatics information in ways that they can put to use.

I have found in my limited experience that homology graphs of conserved
protein domains, such as those displayed in HomoloGene or those in Ensembl
(including phylogeny graphs) can be quite useful in reaching interesting
conclusions.  For example, double strand break repair processes, which may
involve 8-10 relatively conserved proteins, may have a critical role in the
mechanisms of aging.  In particular, two of those proteins, WRN & DCLRE1C
(Artemis), contain complementary exonuclease activities which chew up the DNA
in order to prepare the strands for ligation.  Of course, programmers may
appreciate better than gerontologists the significance of deleting random
bytes from instruction sequences in one's code.  At the recent AGE meeting in
June several discussions arose as to possible differences in "aging" in
yeast, *C. elegans* and mammals [1].  A quick database search showed that *C.
elegans* seems to lack the exonuclease domain on the WRN homologue and
may be missing a DCLRE1C homologue entirely (which, if true, would suggest
that aging in *C. elegans* may be fundamentally different from aging in
vertebrates).  Explaining this to researchers is best done using pictures.

I've been through PubMed and have found several papers (NAR / BMC
Bioinformatics) describing programs that do homology comparisons and build
phylogenetic trees.  However, these seem to lean towards producing less
condensed output aimed at bioinformaticians.  I do not know whether the
outputs from databases like NCBI HomoloGene or Ensembl have been packaged in
tools that might be part of BioPerl.  I am interested in programs that can
be run on a regular basis to draw "pretty pictures" that can be used for
publication and/or internet browsing.  In particular, I'm interested in
running such programs on species of interest to various gerontological
communities [2], which involves subsets of databases that seem to be
scattered around the world.
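
To make concrete the kind of condensed output I mean, here is a minimal
sketch in plain Python.  It is not a BioPerl tool or any database's API.  It
reads a tree written in the standard Newick notation (names only, no branch
lengths) and prints it as an indented outline.  The function names and the
example tree string are my own placeholders for illustration, not real
phylogeny data.

```python
def parse_newick(text):
    """Parse a simple Newick string (names only, no branch lengths).

    Returns nested (name, children) tuples; unnamed internal nodes get "".
    """
    pos = 0

    def node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1                      # skip "("
            children.append(node())
            while text[pos] == ",":
                pos += 1                  # skip ","
                children.append(node())
            pos += 1                      # skip ")"
        start = pos
        while pos < len(text) and text[pos] not in ",();":
            pos += 1
        return (text[start:pos].strip(), children)

    return node()


def render(tree, indent=0):
    """Return the tree as a list of indented text lines."""
    name, children = tree
    lines = [" " * indent + (name or "+")]   # "+" marks an unnamed node
    for child in children:
        lines.extend(render(child, indent + 2))
    return lines


if __name__ == "__main__":
    # Hypothetical topology, for illustration only.
    example = "((human,mouse),(fly,worm),yeast);"
    print("\n".join(render(parse_newick(example))))
```

A real tool would of course draw graphics rather than text and would mark
which protein domains are present in each species, but the input side would
look much the same.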

Thanks.

1. Of course there has been lots of discussion and rationalization over the
last 15+ years about how "aging" is largely the same in more complex and
simpler organisms -- in part to justify sequencing some organisms and in
part to justify funding research at certain laboratories.  A closer
examination based on some of the complete and emerging genome sequences may
suggest this is a very swampy discussion.
2. For example, nematode DNA repair gene comparisons would be interesting to
nematode researchers, insect DNA repair gene comparisons to insect
researchers, both to invertebrate researchers, etc.
3. The recently published textbooks *Aging of the Genome* by Jan Vijg and
the 2nd edition of *DNA Repair and Mutagenesis* by Errol Friedberg *et al.*
go a long way towards moving these areas from the stacks of research
libraries into areas for more general discussion.  Both volumes deal
extensively with the ~150 DNA repair genes.