[Bioperl-l] Perl wrappers for multiple sequence alignment algorithms

Peter Schattner schattner@alum.swarthmore.edu
Mon, 14 Aug 2000 16:30:17 -0700


Hello folks,

I have needed an easy-to-use Perl interface to a good multiple
sequence alignment program. Is anyone aware of an existing
Perl interface of this kind?

Since I have not found such an interface, I have written a simple Perl
wrapper script around the clustalw program. If there is interest, I'd
like to turn this script into a bioperl module that gives easier
(bio)perl access to clustalw and can later be extended to wrap
other multiple sequence alignment algorithms. (Although the source code
for clustalw is freely available, I intend to contact its authors to
confirm that they have no objection to a Perl interface to
their program.)

I have attached a brief outline describing the capabilities and usage of
the proposed module. Initially I envision implementing the module
simply through one or more Perl "system()" calls to the clustalw
program. I am aware that this approach has limitations.
Consequently, if the interface turns out to be useful, I would
convert the implementation into something more flexible and robust in
the future (e.g. using "SWIG" or "XS").
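For concreteness, here is a minimal sketch of what the initial system()-based approach could look like. The helper names build_clustalw_args and run_clustalw are hypothetical, and the option spellings (-infile=, -outfile=, -align, and -NAME=value for other parameters) follow clustalw's command-line help as I understand it; they should be checked against the installed clustalw version.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a parameter hash into clustalw command-line
# arguments of the form -name=value (clustalw's own option syntax).
sub build_clustalw_args {
    my ($infile, $outfile, $params) = @_;
    # -align requests a full multiple alignment of the input sequences.
    my @args = ("-infile=$infile", "-outfile=$outfile", '-align');
    # Sort the keys so the command line is the same from run to run.
    for my $name (sort keys %{ $params || {} }) {
        push @args, "-$name=$params->{$name}";
    }
    return @args;
}

# Hypothetical wrapper: run clustalw through system() and report success.
# The list form of system() bypasses the shell, so file names containing
# spaces or shell metacharacters are passed through unchanged.
sub run_clustalw {
    my ($infile, $outfile, $params, $program) = @_;
    $program ||= 'clustalw';    # assumes clustalw is on the PATH
    my $status = system($program, build_clustalw_args($infile, $outfile, $params));
    return $status == 0;        # 0 means the program started and exited cleanly
}
```

A call such as run_clustalw('seqs.fa', 'seqs.aln', { ktuple => 2, pwmatrix => 'BLOSUM' }) would then run one alignment. Keeping the argument building separate from the system() call makes it easy to check the command line without clustalw installed, and it is the part that a later SWIG or XS implementation could replace.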

I would appreciate any feedback on the usefulness, structure, usage,
interaction with other modules, etc. of this proposed module.

Thanks

Peter Schattner
 
============
MultiAlign.pm: a proposed bioperl module for calculating a
multiple sequence alignment from a set of unaligned sequences or
alignments.

An "alignment factory object" is created by passing to the constructor
the name of an implemented method for computing the alignment (e.g.
clustalw, Gibbs sampling, HMM) and a reference to a hash of
(non-default) parameters to be used by the factory, e.g.:

my $params = {ktuple => 2, output => 'GCG', pwmatrix => 'BLOSUM'};
my $clustalfactory = Bio::Tools::MultiAlign->new(method => 'clustalw', $params);
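One way the constructor could accept this calling convention (name/value pairs followed by a trailing hash reference) is sketched below. The package name Hypothetical::MultiAlign and the %IMPLEMENTED table are illustrative assumptions, not part of any existing bioperl module; only clustalw is listed as implemented.

```perl
use strict;
use warnings;

package Hypothetical::MultiAlign;
use Carp qw(croak);

# Methods this sketch knows how to run; only clustalw for now.
my %IMPLEMENTED = map { $_ => 1 } qw(clustalw);

# Accepts the proposed calling convention:
#   new(method => 'clustalw', $params)
# i.e. name/value pairs followed by an optional trailing hash reference.
sub new {
    my ($class, @args) = @_;
    my $params = (@args && ref $args[-1] eq 'HASH') ? pop @args : {};
    my %opts   = @args;
    my $method = $opts{method} or croak "new(): no alignment method given";
    croak "new(): method '$method' is not implemented"
        unless $IMPLEMENTED{$method};
    # Copy the caller's hash so later changes to it do not leak into the factory.
    return bless { method => $method, params => { %$params } }, $class;
}

package main;
```

Rejecting unknown method names in the constructor means a typo fails immediately rather than when align() is first called.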

Alignment parameters can be changed and/or examined in the usual manner:

my $ktuple = $clustalfactory->get_set('ktuple');
$ktuple++;
$ktuple = $clustalfactory->get_set('ktuple', $ktuple);
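A combined getter/setter of this kind takes only a few lines. The sketch below uses a hypothetical package name, Hypothetical::ParamFactory, with a bare-bones constructor. With one argument get_set returns the current value; with two it stores the new value and returns it.

```perl
use strict;
use warnings;

package Hypothetical::ParamFactory;

sub new {
    my ($class, $params) = @_;
    # Store a private copy of the (non-default) parameters.
    return bless { params => { %{ $params || {} } } }, $class;
}

# get_set('name')         returns the current value of a parameter
# get_set('name', $value) stores $value and returns it
sub get_set {
    my ($self, $name, @value) = @_;
    # Checking @value rather than defined($value) allows an explicit undef to be stored.
    $self->{params}{$name} = $value[0] if @value;
    return $self->{params}{$name};
}

package main;
```

Parameters that were never set simply come back as undef, so the module would have to supply clustalw's own defaults for those.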

Once the factory has been created and the appropriate parameters set,
one can call a method to perform a multiple sequence alignment. Input
to this method consists of a set of unaligned sequences in the form of
either a Bio::SeqIO object or an array of bioperl sequence objects:

$in  = Bio::SeqIO->new(-file => "inputfilename" , -format => 'Fasta');
$alignment = $clustalfactory->align($in);

or 

my $in = Bio::SeqIO->new(-file => "inputfilename", -format => 'Fasta');
my @seq_array;
while (my $seq = $in->next_seq()) {
    push @seq_array, $seq;
}
$alignment = $clustalfactory->align(@seq_array);
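Internally, align() would need to accept both forms of input and hand clustalw a file it can read. A minimal sketch of that step follows. The helpers collect_seqs and write_temp_fasta are hypothetical. The sketch assumes that a stream is any single object with a next_seq() method (as Bio::SeqIO has), and that each sequence object provides display_id() and seq() (as bioperl sequence objects do).

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);
use File::Temp qw(tempfile);

# Accept either one stream object with next_seq() (such as a Bio::SeqIO)
# or a plain list of sequence objects, and return the sequences as a list.
sub collect_seqs {
    my @input = @_;
    if (@input == 1 && blessed($input[0]) && $input[0]->can('next_seq')) {
        my $stream = $input[0];
        my @seqs;
        while (my $seq = $stream->next_seq()) {
            push @seqs, $seq;
        }
        return @seqs;
    }
    return @input;    # already a list of sequence objects
}

# Write the sequences to a temporary FASTA file for clustalw to read.
# The file is removed automatically when the program exits.
sub write_temp_fasta {
    my @seqs = @_;
    my ($fh, $filename) = tempfile(SUFFIX => '.fa', UNLINK => 1);
    for my $seq (@seqs) {
        printf {$fh} ">%s\n%s\n", $seq->display_id, $seq->seq;
    }
    close $fh or die "cannot close $filename: $!";
    return $filename;
}
```

Checking for next_seq() with can(), rather than testing for a specific class name, lets any stream-like object be passed in.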

In either case, align() returns a reference to a SimpleAlign object
which can then be displayed, stored, or converted to a UnivAln object
for further manipulation.
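To build that SimpleAlign, the module has to read clustalw's output file back in. Current BioPerl provides Bio::AlignIO (format 'clustalw'), whose next_aln() returns a Bio::SimpleAlign, and a real implementation would most likely use it. The dependency-free sketch below only shows what that parsing involves. It assumes clustalw's default CLUSTAL output format (not GCG), and read_clustal_aln is a hypothetical name.

```perl
use strict;
use warnings;

# Minimal reader for clustalw's default (CLUSTAL) output format.
# Returns a list of [id, aligned_sequence] pairs in input order.
sub read_clustal_aln {
    my ($filename) = @_;
    open my $fh, '<', $filename or die "cannot open $filename: $!";
    my (%aligned, @order);
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^CLUSTAL/;    # header line
        next if $line =~ /^\s*$/;       # blank lines between blocks
        next if $line =~ /^\s/;         # conservation lines (*, :, .) start with spaces
        # Each block line is "id  segment", optionally followed by a residue count.
        my ($id, $segment) = split /\s+/, $line;
        push @order, $id unless exists $aligned{$id};
        $aligned{$id} .= $segment;      # join the blocks for this sequence
    }
    close $fh;
    return map { [ $_, $aligned{$_} ] } @order;
}
```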

In the future, I envision providing perl-wrapper methods for additional
multiple sequence alignment functions (such as aligning two aligned
sequence sets to each other).
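As an illustration of how little the system()-based approach would need to change for this, aligning two existing alignments to each other corresponds to clustalw's profile mode. The sketch below builds that command line. The helper name build_profile_args is hypothetical. The option names (-profile1=, -profile2=, -profile) are taken from clustalw's command-line help as I recall it and should be verified against the installed version.

```perl
use strict;
use warnings;

# Hypothetical helper: clustalw arguments for aligning two existing
# alignments (profiles) to each other.
sub build_profile_args {
    my ($profile1, $profile2, $outfile, $params) = @_;
    my @args = ("-profile1=$profile1", "-profile2=$profile2", '-profile',
                "-outfile=$outfile");
    # Any extra parameters are passed through as -name=value, sorted for stability.
    push @args, "-$_=$params->{$_}" for sort keys %{ $params || {} };
    return @args;
}
```

The resulting list can be passed to system() in the same way as an ordinary alignment run.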