[Bioperl-l] Generalized reciprocal blast

Robert Bradbury robert.bradbury at gmail.com
Wed Aug 26 15:38:44 UTC 2009


Has anyone attempted to create a "generalized" reciprocal blast component
for BioPerl?

One sees papers all the time where they discuss running reciprocal blasts to
compare a new species to an old "standard" species or a set of species, or
running an all-to-all set of comparisons to match up all of the "known"
proteins from species and determine which are outliers (and therefore
"novel").  There are also accumulating merged sets in NCBI HomoloGene (which
seems to be some strict subset (perhaps a dozen) of "well sequenced" genomes)
and Ensembl (which seems to be working with a much larger set of 40-50
genomes, some of which may be somewhat incomplete and are certainly poorly
"explored").

I have, I believe, seen code "fragments" from various authors, perhaps some
on the BioPerl list, which perform some major subset of a typical
"reciprocal blast".

Now what I am looking for is a relatively generalizable some-to-some
reciprocal blast utility.  I want to be able to specify the genes (or gene
family), e.g. some of the ~150 known DNA repair genes.  It would be helpful
to also specify how "tolerant" the blast "true reciprocal" criteria are.
There are some genes where there is a very strict 1-to-1 relationship across
many genomes.  But for genes which involve relatively standard domains, e.g.
"helicase" domains, the 1-to-1 relationship becomes cloudy -- in mammals, for
example, it's more like 5-to-5 and it would be really nice to be able to
specify the strictness or quality level [1] for "matching" genes (and even
which genes are to be excluded because they are known to be false
homologues).
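
To make the "tolerance" idea concrete, here is a minimal sketch of what I
have in mind.  It is plain Python rather than BioPerl, it assumes the blast
reports have already been parsed into (query, subject, bit score) tuples,
and every name in it (near_best, reciprocal_hits, the gene labels and the
scores) is made up for illustration, not taken from any existing library.
A tolerance of 0.0 gives the strict 1-to-1 reciprocal best hit; a larger
value lets near-best hits in both directions count as reciprocal.

```python
from collections import defaultdict


def near_best(hits, tolerance):
    """Map each query to the subjects scoring within `tolerance` of its best hit."""
    by_query = defaultdict(list)
    for query, subject, score in hits:
        by_query[query].append((subject, score))
    kept = {}
    for query, subject_scores in by_query.items():
        best = max(score for _, score in subject_scores)
        cutoff = best * (1.0 - tolerance)  # tolerance=0.0 keeps only the top score
        kept[query] = {s for s, score in subject_scores if score >= cutoff}
    return kept


def reciprocal_hits(forward, reverse, tolerance=0.0, exclude=frozenset()):
    """Return sorted (a, b) pairs that hit each other within the tolerance.

    `exclude` holds (a, b) pairs already known to be false homologues.
    """
    fwd = near_best(forward, tolerance)
    rev = near_best(reverse, tolerance)
    pairs = []
    for a, subjects in fwd.items():
        for b in subjects:
            if (a, b) in exclude:
                continue
            if a in rev.get(b, set()):
                pairs.append((a, b))
    return sorted(pairs)


# Invented example data: species A searched against species B, and back.
forward = [("hsRECQ1", "mmRecq1", 900.0), ("hsRECQ1", "mmBlm", 850.0),
           ("hsBLM", "mmBlm", 1200.0), ("hsBLM", "mmRecq1", 700.0)]
reverse = [("mmRecq1", "hsRECQ1", 880.0), ("mmRecq1", "hsBLM", 690.0),
           ("mmBlm", "hsBLM", 1150.0), ("mmBlm", "hsRECQ1", 840.0)]

strict = reciprocal_hits(forward, reverse)                 # two 1-to-1 pairs
loose = reciprocal_hits(forward, reverse, tolerance=0.3)   # adds hsRECQ1/mmBlm
```

With the invented scores above, the strict call pairs each gene with its
single best partner, while the looser call also reports the hsRECQ1/mmBlm
cross-match, which is the kind of "cloudy" relationship one would then want
to filter or re-rate.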

Then to top this off I want to be able to combine known public databases
(e.g. HomoloGene / UniGene / Ensembl) with perhaps local private
databases or database subsets (e.g. emerging or specialized genomes).

The goal here, of course, is to determine the precise phylogenetic relationships
between all of the DNA repair genes and how there may be gain / loss /
evolution of function that can be related to species characteristics (size,
longevity, etc.).

Is there a generalized reciprocal blast component in BioPerl?  Or is it a
"build-it-yourself" situation (that I have to believe has been built
probably a few dozen times by various researchers / organizations /
companies)?

Thanks,
Robert Bradbury

1. This would be handled in BioPerl with a customizable user function which
could be tailored to handle specific cases -- for example a function which
when handed a set of 100 potential "matches" could go through those 100
matches, identify common domains, and then "re-rate" matches based on
considerations such as the type and number of common domains, domains being
in the same order, etc.  I.e. criteria which may be difficult to completely
generalize across entire genomes but are fairly obvious if you are looking
at a graphical replication of a gene set in HomoloGene.
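
As a rough illustration of such a user function, here is a sketch in plain
Python (not BioPerl).  It assumes each candidate match already comes with an
ordered list of domain names from some domain annotation step; the scoring
rule (number of shared domains plus the length of the longest run of domains
appearing in the same order) and all of the names are my own assumptions,
chosen only to show the shape of a pluggable "re-rate" callback.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two domain lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def domain_score(query_domains, candidate_domains):
    """Shared domain types plus domains that occur in the same relative order."""
    shared = len(set(query_domains) & set(candidate_domains))
    in_order = lcs_length(query_domains, candidate_domains)
    return shared + in_order


def rerate(query_domains, candidates, score=domain_score):
    """Rank candidate names by descending score; `score` is user-replaceable.

    `candidates` maps a candidate name to its ordered list of domains.
    Ties are broken alphabetically so the result is deterministic.
    """
    scored = [(score(query_domains, doms), name) for name, doms in candidates.items()]
    return [name for _, name in sorted(scored, key=lambda t: (-t[0], t[1]))]


# Invented example: a RecQ-like query and four hypothetical candidates.
query = ["DEAD", "Helicase_C", "RQC", "HRDC"]
candidates = {
    "candA": ["DEAD", "Helicase_C", "RQC", "HRDC"],  # same domains, same order
    "candB": ["Helicase_C", "DEAD"],                 # shared domains, swapped order
    "candC": ["DEAD", "Helicase_C"],                 # shared domains, same order
    "candD": ["Kinase"],                             # nothing in common
}
ranking = rerate(query, candidates)
```

Passing a different `score` function is where the case-specific knowledge
would go, e.g. weighting particular domains more heavily for a given gene
family.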


