[BioPython] Substitution matrices

Brad Chapman chapmanb@arches.uga.edu
30 Sep 2000 12:37:39 EDT


Iddo wrote:
>  1) Pretty encouraging to see that someone else has been playing around
>  with it, along the same line of thought I took :)

Agreed :-). I was very happy to see that we were thinking about it in the
same way.

>  2) I was thinking along the line of enabling users to generate
>  substitution matrices from their own data. This means that once
>  someone has a bunch of multiple/pairwise alignments, that person could
>  use the data to make their own substitution matrix/log-odds
>  matrix/frequency tables/whatever. So I think that providing a generic
>  matrix type which has an attribute telling what it is is a good idea.

This sounds like an excellent idea! If you have ideas for how to model
this in a good way, I think it would be very nice to see. I only have a
basic knowledge of what goes into making a good substitution table, so it
would at least be useful for me to see a nice implementation of the
ideas.
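
To make the idea concrete, here is a minimal sketch of deriving a log-odds
table from a set of gapped pairwise alignments. It is only an illustration:
the function name log_odds_matrix and its parameters are hypothetical and
are not part of Biopython, residue pairs are treated as unordered, and
there are no pseudocounts, so pairs never seen in the data are simply
absent from the result.

```python
import math
from collections import Counter


def log_odds_matrix(aligned_pairs, gap="-", base=2.0):
    """Build a symmetric log-odds table from gapped pairwise alignments.

    aligned_pairs: iterable of (seq_a, seq_b) strings of equal length.
    Returns a dict mapping sorted residue pairs, e.g. ("A", "G"), to scores.
    (Hypothetical helper for illustration; not a Biopython function.)
    """
    pair_counts = Counter()
    residue_counts = Counter()
    for seq_a, seq_b in aligned_pairs:
        if len(seq_a) != len(seq_b):
            raise ValueError("aligned sequences must have equal length")
        for a, b in zip(seq_a, seq_b):
            if a == gap or b == gap:
                continue  # columns with a gap carry no substitution information
            pair_counts[tuple(sorted((a, b)))] += 1
            residue_counts[a] += 1
            residue_counts[b] += 1

    total_pairs = sum(pair_counts.values())
    total_residues = sum(residue_counts.values())
    scores = {}
    for (a, b), count in pair_counts.items():
        observed = count / total_pairs
        p_a = residue_counts[a] / total_residues
        p_b = residue_counts[b] / total_residues
        # Pairs are unordered, so a mismatch can arise as (a, b) or (b, a).
        expected = p_a * p_b if a == b else 2 * p_a * p_b
        scores[(a, b)] = math.log(observed / expected, base)
    return scores


# Toy input, made up for this example.
pairs = [("ACGT-A", "ACGTTA"), ("AAGT", "AGGT")]
for pair, score in sorted(log_odds_matrix(pairs).items()):
    print(pair, round(score, 2))
```

The intermediate counts and frequencies are the other "kinds" of matrix
mentioned in the quoted text, which is one argument for a generic matrix
type that records what its values represent.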

>  3) This opens up a new arena: biopython objects for pairwise
>  alignments, and for multiple alignments. 
>  o  Does biopython have a pairwise alignment object?

I was just working this morning on finishing up the alignment objects I
have been talking about, and just committed them. This added the
following new code:

o The Bio.Align directory - This has the Generic Alignment object and an
object to convert between different alignment formats.

o The Bio.Clustalw directory - This has code to parse clustal formatted
alignment files into ClustalAlignment objects (which inherit from the
Generic object), and also has helpful code for dealing directly with
clustalw.

o Bio.Fasta.FastaAlign.py - Allows reading and writing of fasta formatted
alignment files into a FastaAlignment object.

There are tests for all of these (but note, the alignment test will fail
if you don't have Martel and all of its dependencies!).

The API documentation is updated with these new classes, and I also wrote
some tutorial documentation about using them. All of these docs can be
grabbed from the usual place:

http://www.biopython.org/wiki/html/BioPython/BiopythonCode.html

The alignment object is modelled loosely on SimpleAlign.pm in the bioperl
distribution, with the main differences being that I inherit from the
base object to provide implementations for specific formats, and deal
with the conversions in a separate converter object.
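
The shape of that design can be sketched roughly as follows. This is not
the committed code: every class and method name here is hypothetical, and
the output formats are heavily simplified (the Clustal-style output, for
instance, has no header line or consensus row). It only shows a generic
base class, format-specific subclasses, and a converter kept separate from
both.

```python
class GenericAlignment:
    """Format-neutral container: an ordered list of (name, gapped sequence)."""

    def __init__(self):
        self.records = []

    def add_sequence(self, name, sequence):
        self.records.append((name, sequence))

    def __str__(self):
        return "\n".join("%s %s" % rec for rec in self.records)


class FastaStyleAlignment(GenericAlignment):
    """Writes each record as a '>name' line followed by the sequence."""

    def __str__(self):
        return "".join(">%s\n%s\n" % rec for rec in self.records)


class ClustalStyleAlignment(GenericAlignment):
    """Writes name-padded rows (simplified; not the full Clustal format)."""

    def __str__(self):
        width = max((len(name) for name, _ in self.records), default=0)
        return "".join(
            "%s %s\n" % (name.ljust(width), seq) for name, seq in self.records
        )


class AlignmentConverter:
    """Copies records into a new alignment of the requested class."""

    def __init__(self, target_class):
        self.target_class = target_class

    def convert(self, alignment):
        converted = self.target_class()
        for name, sequence in alignment.records:
            converted.add_sequence(name, sequence)
        return converted


clustal = ClustalStyleAlignment()
clustal.add_sequence("seq1", "AC-GT")
clustal.add_sequence("seq2", "ACGGT")
fasta = AlignmentConverter(FastaStyleAlignment).convert(clustal)
print(fasta)
```

Keeping conversion in its own object means a new format only needs a new
subclass; neither the base class nor the existing formats have to change.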

What do you think about working off of these objects to deal with the
necessary alignment stuff? I would love to get feedback and thoughts on
them.

Sorry, I can't help with the Blast stuff. I need to play with that stuff
more...

Brad