[Bioperl-l] Recoding Bio::SimpleAlign

Jun Yin jun.yin at ucd.ie
Fri Jul 16 11:54:36 EDT 2010


Dear all,

 

I am the Google Summer of Code student working on refactoring the Bio::Align
subsystems. The first aim of the project is to recode Bio::SimpleAlign. The
package is really useful, but it was created a long time ago and written by
several people, so it has become somewhat inconsistent.

 

I tried to keep the package consistent with the previous distribution (e.g.
in method calling and coding style). However, there are still a few changes.
Since this package is created and used by the community, I think it is
better to show it to everyone before it is merged into the main
distribution. Any suggestions and criticisms are welcome.

 

Here are the major improvements to Bio::SimpleAlign:

 

1. MSA modification and selection methods are more consistent and easier to
use. I have enabled multiple and reverse selections for all sequence/column
selection methods, and changed the names to be more understandable.

 

For example, 

$aln->select() and $aln->select_noncont() are both deprecated and replaced
by $aln->select_Seqs(). A selection can apply to either sequences or
columns, so the method name now states explicitly which one is meant.

 

For example, several ranges of sequences can be selected in one call:

 

$newaln=$aln->select_Seqs([4..10,20..35,37]);
$newaln=$aln->select_Seqs(-selection=>[4..10,20..35,37]);

Or you can toggle the selection (reverse selection) using:
$newaln=$aln->select_Seqs([4..10,20..35,37],1);
$newaln=$aln->select_Seqs(-selection=>[4..10,20..35,37],-toggle=>1);
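To make the toggle behaviour concrete, here is a minimal plain-Perl sketch of how a selection list and a toggle flag could be resolved into row numbers. It is only an illustration of the idea, not the code in Bio::SimpleAlign; the helper name resolve_selection and the 8-row alignment are assumptions made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::SimpleAlign: resolve a list of
# 1-based row numbers against $total rows, optionally inverting the choice.
sub resolve_selection {
    my ($total, $selection, $toggle) = @_;

    # Remember which rows were named, ignoring numbers outside 1..$total.
    my %picked = map { $_ => 1 } grep { $_ >= 1 && $_ <= $total } @$selection;

    # Without toggle keep the named rows; with toggle keep all the others.
    my @rows = grep { $toggle ? !$picked{$_} : $picked{$_} } 1 .. $total;
    return @rows;
}

my @kept    = resolve_selection(8, [2 .. 4, 7]);       # rows 2 3 4 7
my @dropped = resolve_selection(8, [2 .. 4, 7], 1);    # rows 1 5 6 8
print "@kept\n@dropped\n";
```

With the toggle set, the same selection list names the rows to leave out instead of the rows to keep.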

 

If you call the method the old way, e.g.

$newaln=$aln->select(1,5);

A warning will be shown:

select - deprecated method. Use select_Seqs() instead.

And the call will be redirected to

$newaln=$aln->select_Seqs([1..5]);
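The redirect described above can be pictured as a thin shim around the new method. The following is a rough sketch under stated assumptions, not the actual Bio::SimpleAlign source: the class My::ToyAlign and its internal "seqs" list are hypothetical stand-ins, and only the warning text and the forwarding to select_Seqs([start..end]) are taken from the description above.

```perl
use strict;
use warnings;

package My::ToyAlign;    # hypothetical stand-in, not Bio::SimpleAlign

sub new {
    my ($class, @names) = @_;
    return bless { seqs => [@names] }, $class;
}

# New-style method: takes an array reference of 1-based row numbers.
sub select_Seqs {
    my ($self, $selection) = @_;
    my @chosen = map { $self->{seqs}[ $_ - 1 ] } @$selection;
    return ref($self)->new(@chosen);
}

# Old-style method kept as a shim: warn, then forward the range.
sub select {
    my ($self, $start, $end) = @_;
    warn "select - deprecated method. Use select_Seqs() instead.\n";
    return $self->select_Seqs([ $start .. $end ]);
}

package main;

my $aln    = My::ToyAlign->new(qw(seqA seqB seqC seqD seqE seqF));
my $newaln = $aln->select(1, 5);    # warns, then acts like select_Seqs([1..5])
print join(',', @{ $newaln->{seqs} }), "\n";
```

Keeping the old name as a forwarding wrapper means existing scripts keep running while the warning points users to the new call.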

 

2. Gap chars/missing chars are more consistent in the package.

Default values for the gap char and missing char are now set in the package.

Getting or setting the gap char should be done through $aln->gap_char, e.g.
$aln->gap_char("-").
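As an illustration of a package-level default combined with a single getter/setter, here is a small sketch. It is not the Bio::SimpleAlign implementation: the class My::GapDemo is hypothetical, and the default of "-" is an assumption taken from the example call above.

```perl
use strict;
use warnings;

package My::GapDemo;    # hypothetical stand-in, not Bio::SimpleAlign

my $DEFAULT_GAP = '-';  # assumed default, matching the example call

sub new { return bless {}, shift }

# Combined getter/setter: with an argument it stores the character;
# in both cases it returns the character currently in effect.
sub gap_char {
    my ($self, $char) = @_;
    $self->{gap_char} = $char if defined $char;
    return defined $self->{gap_char} ? $self->{gap_char} : $DEFAULT_GAP;
}

package main;

my $demo   = My::GapDemo->new;
my $before = $demo->gap_char;         # nothing set yet, so the default
my $after  = $demo->gap_char('.');    # sets and returns the new character
print "$before $after\n";
```

Routing every read and write through one method keeps the default in a single place, so other methods do not need their own hard-coded gap characters.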

 

3. Some redundant methods are removed, and the remaining methods are moved
to more reasonable categories.

For example, $aln->select and $aln->select_noncont are now deprecated.
Please use $aln->select_Seqs instead.

 

 

4. Some methods are renamed. Methods that select or return objects are
capitalized, e.g. each_seq becomes each_Seq.

Other methods are renamed to describe more clearly what they do:

$aln->purge is renamed to $aln->remove_redundant_Seqs

$aln->splice_by_seq_pos is renamed to $aln->remove_gaps

 

For further information, you can visit:

http://spreadsheets.google.com/ccc?key=0AssLTcJFJMbXdDFfZGpJZlhidFY5blBneGdhQUZ6WFE&hl=en&authkey=CJTCw4QL

 

Cheers,

Jun Yin

Ph.D. student in U.C.D.

 

Bioinformatics Laboratory

Conway Institute

University College Dublin

 



