[Bioperl-l] Reciprocal blast

Dave Messina David.Messina at sbc.su.se
Fri Mar 7 18:14:38 UTC 2008


Hey Matt,

Your question is a little beyond the scope of this mailing list. I don't
know what your bioinformatics background is, but in my experience it's best
to get started hands-on, either in a class or with someone you can sit down
and work through it with. You'll have a million questions, and a mailing
list isn't really suitable for that.

That said, I would run the BLAST searches on the command line, parse out the
best hits with BioPerl, and then use hashes to identify mutual best hits.

Briefly, you have two datasets, A and B. Format each dataset into a BLAST
database using xdformat or formatdb. Run two BLAST searches: one with A as the
query and B as the database, and one with B as the query and A as the
database. The two output files, each containing multiple BLAST reports, can
then be processed with Bio::SearchIO to extract the best hit for each query
protein.

Read this tutorial for help with that:
http://www.bioperl.org/wiki/HOWTO:SearchIO
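Here is a minimal sketch of that extraction step. It assumes BioPerl is
installed and that the reports are in the default plain-text BLAST format. The
`best_hits` subroutine is hypothetical; it simply takes the first hit in each
result, which relies on BLAST listing hits in rank order.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SearchIO;

# Hypothetical helper: return a hash ref mapping each query name to the
# name of its first (top-ranked) hit in a multi-report BLAST output file.
sub best_hits {
    my ($file) = @_;
    my $in = Bio::SearchIO->new(-format => 'blast', -file => $file);
    my %best;
    while (my $result = $in->next_result) {
        # Hits come back in report order, so the first one is the best hit;
        # queries with no hits at all are skipped.
        my $hit = $result->next_hit or next;
        $best{ $result->query_name } = $hit->name;
    }
    return \%best;
}

# Usage: perl best_hits.pl A_vs_B.blast
if (@ARGV) {
    my $best = best_hits($ARGV[0]);
    print "$_\t$best->{$_}\n" for sort keys %$best;
}
```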

Once you have the best hit for each protein, you can use Perl to find every
instance where two proteins, one from each set, are each other's best hit.
One way is to create two hashes, one for each set, with query proteins as
keys and best hits as values. Then step through one hash and, for each query,
check whether its best hit's own best hit in the other hash is that same
query. Those pairs are the reciprocal best hits.
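A rough sketch of that hash comparison in plain Perl. The
`reciprocal_best_hits` subroutine and the toy IDs (a1, b1, ...) are made up
for illustration; in practice the two hashes would be filled from the
A-vs-B and B-vs-A reports.

```perl
use strict;
use warnings;

# Hypothetical helper: given two hash refs (query => best hit) for A-vs-B
# and B-vs-A, return the pairs [a, b] that are each other's best hit.
sub reciprocal_best_hits {
    my ($a_to_b, $b_to_a) = @_;
    my @pairs;
    for my $a_prot (sort keys %$a_to_b) {
        my $b_prot = $a_to_b->{$a_prot};
        # Reciprocal only if B's best hit points back at the same A protein
        if (defined $b_to_a->{$b_prot} && $b_to_a->{$b_prot} eq $a_prot) {
            push @pairs, [ $a_prot, $b_prot ];
        }
    }
    return @pairs;
}

# Toy data, invented for illustration only
my %a_to_b = (a1 => 'b1', a2 => 'b2', a3 => 'b1');
my %b_to_a = (b1 => 'a1', b2 => 'a3');

my @rbh = reciprocal_best_hits(\%a_to_b, \%b_to_a);
print "$_->[0]\t$_->[1]\n" for @rbh;    # prints "a1<TAB>b1"
```

Only a1/b1 survives: a3's best hit is also b1, but b1's best hit is a1, and
b2's best hit (a3) does not point back at a2.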


Dave


