[Bioperl-l] Reciprocal best blast hits using BioPerl?
Chris Larsen
clarsen at vecna.com
Mon Jan 18 17:42:13 UTC 2010
Bhakti, (and Chris, Mark)--
Yes, there is some Perl available to parse reciprocal best BLAST hits.
The archived post Mark referenced was mine; we were looking to do what
you want. I'll pick up that thread here.
We ended up running OrthoMCL 1.4, as Chris F suggested, and then wrote
a simple Perl parser that takes the raw OrthoMCL output, splits each
group line into its members, and writes a delimited table of all the
orthologs in each group (for, say, the genus Mycobacterium), so you can
stuff it into DBLoader.
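A minimal sketch of that kind of parser, written in Python rather than
Perl, and not the script linked in the SOP. It assumes each OrthoMCL 1.4
group line looks roughly like
"ORTHOMCL1(5 genes,1 taxa): 110321310(Ftu) 110321361(Ftu) ...", so check
that against your own output first. The names parse_groups and
format_rows, and the label argument, are made up for illustration.

```python
import re

# ASSUMED shape of one OrthoMCL 1.4 group line (verify against your file):
#   ORTHOMCL1(5 genes,1 taxa):	 110321310(Ftu) 110321361(Ftu) ...
GROUP_LINE = re.compile(r"^ORTHOMCL(\d+)\([^)]*\):\s*(.*)$")
MEMBER = re.compile(r"^(.+)\(([^()]+)\)$")


def parse_groups(lines, label):
    """Yield (label, group_number, gene_id) for every member of every group."""
    for line in lines:
        match = GROUP_LINE.match(line.strip())
        if match is None:
            continue  # skip blank or unrecognized lines
        group_number = int(match.group(1))
        for token in match.group(2).split():
            member = MEMBER.match(token)
            # Keep only the gene id; drop the "(taxon)" suffix if present.
            gene_id = member.group(1) if member else token
            yield label, group_number, gene_id


def format_rows(rows, delimiter="\t"):
    """Join each (label, group_number, gene_id) row into one delimited line."""
    return [delimiter.join(str(field) for field in row) for row in rows]
```

To use it, pass an open file handle (or any iterable of lines) to
parse_groups along with a label such as "Francisella", then write out
the lines returned by format_rows with whatever delimiter your loader
expects.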
The link to the script, SOP, and method is at:
http://www.biohealthbase.org/brcDocs/documents/BHB_ORTHOLOG_SOP.pdf
Giving e.g.:
Francisella 1 110321310
Francisella 1 110321361
Francisella 1 56707275
Francisella 1 56707366
Francisella 1 56707462
Those are five members of Ortholog Group 1, identified only by their GI
numbers. You can see the results of that parsing, backed by a database
and used to load BioHealthBase with all the reciprocal best BLAST hits
plus other parsed OrthoMCL output, for mycobacterial PolA at:
http://www.biohealthbase.org/brc/details.do?locus=MAV_3155&decorator=mycobacterium
See? Pretty? We were just interested in making ortholog groups on the
basis of paralog-conscious reciprocal BLAST hits. Like you. I think
this package and doc I've made does what you want, as long as you
stay in prokaryotes. But careful: garbage in, garbage out. We
started with clean Genuses. (. o O Genii?). You'll get more junky HUGE
and TINY ortholog groups if you put in different orders of microbes.
It's sensitive to taxa. OrthoMCL author David Roos is great at it,
though, and designed it with higher unicellular eukaryotes in mind too.
Comb the docs for that; sorry, I was doing bacterial work at the time
and can't guide you if that's what you want. If you end up installing
OrthoMCL 1.4, you can pipe the output to this method and get usable
results out.
Hope it works for you.
Cheers,
Chris L
--
Christopher Larsen, Ph.D.
Sr. Scientist / Grants Manager
Vecna Technologies
6404 Ivy Lane #500
Greenbelt, MD 20770
Phone: (240) 965-4525
Fax: (240) 547-6133
240-737-4525