[Bioperl-l] Reciprocal best hits using Bioperl?
Tristan Lefebure
tristan.lefebure at gmail.com
Sun Jan 17 20:36:38 EST 2010
On Sunday 17 January 2010 18:59:05 Jason Stajich wrote:
> yes - but mcl alone is something slightly different in
> that it doesn't correct for inparalogs, but for
> incomplete genomes this is probably okay.
Interestingly, my experience with not-too-divergent
bacterial genomes (same genus) does not support the
normalization used in orthoMCL (which, as far as I
understand, is a standardization of the -log10(evalue) per
taxon combination, including a taxon with itself). MCL, which
does not do any normalization (just -log10(evalue)), gives
about the same number of false negatives (i.e. missed
orthologs), but far fewer false positives (false orthologs).
In other words, you get many fake singletons. I don't know
exactly whether the problem lies in the normalization process
or in the fact that orthoMCLv1.x uses a very old version of
MCL. What I do know is that many false positives are made up
of short or incomplete proteins, which are very common in draft
genomes and automatic annotations... Things might be
completely different with more divergent and globally longer
proteins. Testing orthoMCLv2 on the same data set would
probably give the answer.
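
To make the two weighting schemes concrete, here is a minimal Python
sketch. It assumes that "normalization" means dividing each
-log10(evalue) by the mean weight of its taxon pair (a taxon paired with
itself included). That is only one reading of the description above, not
orthoMCL's actual code. The function names, the `floor` parameter and
the input layout (tuples of query, subject, evalue) are hypothetical.

```python
import math
from collections import defaultdict


def neg_log10_evalue(evalue, floor=1e-200):
    """Turn a BLAST E-value into a -log10 weight.

    The floor is an assumed guard against log(0), since very small
    E-values are often reported as 0.0.
    """
    return -math.log10(max(evalue, floor))


def raw_weights(hits):
    """Plain -log10(evalue) edge weights, with no normalization."""
    return {(a, b): neg_log10_evalue(e) for a, b, e in hits}


def normalized_weights(hits, taxon_of):
    """Divide each weight by the mean weight of its taxon pair.

    Assumed scheme for illustration only. Within-taxon pairs form
    their own group.
    """
    raw = raw_weights(hits)

    # Collect the weights of every unordered taxon pair.
    groups = defaultdict(list)
    for (a, b), w in raw.items():
        groups[frozenset((taxon_of[a], taxon_of[b]))].append(w)
    means = {pair: sum(ws) / len(ws) for pair, ws in groups.items()}

    out = {}
    for (a, b), w in raw.items():
        mean = means[frozenset((taxon_of[a], taxon_of[b]))]
        # Leave the weight untouched if the group mean is zero.
        out[(a, b)] = w / mean if mean != 0 else w
    return out


# Toy input: two between-taxon hits and one within-taxon hit.
hits = [("a1", "b1", 1e-100), ("a2", "b2", 1e-50), ("a1", "a2", 1e-10)]
taxon_of = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}

raw = raw_weights(hits)
norm = normalized_weights(hits, taxon_of)
```

With raw weights, the weak within-taxon hit keeps a much lower score than
the between-taxon hits. After this per-pair rescaling it sits at the mean
of its own group, which is the kind of shift in relative edge weights
that could change the clusters MCL produces.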
--Tristan