[Bioperl-l] Packages retrieving online alignment sequences

Jun Yin jun.yin at ucd.ie
Fri Aug 6 10:52:14 UTC 2010


Hi, all,

 

I am the Google Summer of Code student working on refactoring the Bio::Align
subsystem. I recently implemented several packages for retrieving alignments
from online databases. The aim of the packages is to give BioPerl users
convenient methods for retrieving online alignments. Each retrieved alignment
is converted into a Bio::SimpleAlign object, which is easy to manipulate and
write to local disk. The packages currently support the Pfam, Rfam, Prosite
and Entrez Protein Clusters databases.

 

Here is the structure of the packages:

Packages

Bio::DB::Align (general interface, which calls the other packages)

Bio::DB::Align::Pfam (retrieving alignment from Pfam)

Bio::DB::Align::Rfam (retrieving alignment from Rfam)

Bio::DB::Align::Prosite (retrieving alignment from Prosite)

Bio::DB::Align::ProtClustDB (retrieving alignment from Entrez Protein
Clusters Database)
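
As a rough illustration of how a general interface could hand off to a
database-specific package based on the -db argument, here is a minimal
sketch in plain Perl. The Sketch::Align namespace, the %BACKEND table and
the stub constructors are all hypothetical and are not the actual
Bio::DB::Align code; only the -db => "rfam" calling style is taken from
this message.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the database-specific packages.
package Sketch::Align::Pfam;
sub new { my ($class, %args) = @_; return bless {%args}, $class }

package Sketch::Align::Rfam;
sub new { my ($class, %args) = @_; return bless {%args}, $class }

package Sketch::Align;

# Assumed mapping from lower-cased -db values to back-end classes.
my %BACKEND = (
    pfam => 'Sketch::Align::Pfam',
    rfam => 'Sketch::Align::Rfam',
);

sub new {
    my ($class, %args) = @_;
    # Pull out -db and normalise its case before the lookup.
    my $db = lc(delete $args{-db} // '');
    my $backend = $BACKEND{$db}
        or die "Unknown database '$db'\n";
    # Pass any remaining arguments on to the chosen back end.
    return $backend->new(%args);
}

package main;

my $dbobj = Sketch::Align->new(-db => 'rfam');
print ref($dbobj), "\n";   # prints Sketch::Align::Rfam
```

With this kind of layout, supporting another database would only need a new
package and one more entry in the lookup table.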

 

Each package usually provides four methods:

Methods

get_Aln_by_id (retrieves an alignment by id and returns a Bio::SimpleAlign
object)

get_Aln_by_acc (retrieves an alignment by accession and returns a
Bio::SimpleAlign object) (Rfam and Prosite support only this method)

id2acc (id to accession conversion)

acc2id (accession to id conversion)

 

These packages depend on LWP::UserAgent, HTTP::Request and
Bio::DB::GenericWebAgent. Bio::DB::Align::ProtClustDB depends on
Bio::DB::EUtilities.

 

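The packages can be called like this:

use Bio::DB::Align;

# Create a database object through the general interface ...
my $dbobj = Bio::DB::Align->new(-db => "rfam");

# ... or instantiate a specific package directly:
# my $dbobj = Bio::DB::Align::Pfam->new();

# Retrieve an alignment by accession, optionally choosing the alignment type
my $aln  = $dbobj->get_Aln_by_acc("RF0001");
my $aln2 = $dbobj->get_Aln_by_acc(-accession => "RF0001", -alignment => "full");

print $aln->length();

foreach my $seq ($aln->each_Seq) {
    # do something
}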
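
Since the retrieved alignments are Bio::SimpleAlign objects, writing one to
local disk could look roughly like the sketch below. It assumes a released
BioPerl installation and builds a tiny alignment by hand as a stand-in for
a retrieved one, so that it does not depend on the new packages. The
sequences, the output file name example.aln and the choice of clustalw
format are illustrative assumptions.

```perl
use strict;
use warnings;
use Bio::LocatableSeq;
use Bio::SimpleAlign;
use Bio::AlignIO;

# Hand-built stand-in for an alignment returned by get_Aln_by_acc().
my $aln = Bio::SimpleAlign->new();
$aln->add_seq(Bio::LocatableSeq->new(
    -id => 'seq1', -seq => 'ACGU-ACGU', -start => 1, -end => 8));
$aln->add_seq(Bio::LocatableSeq->new(
    -id => 'seq2', -seq => 'ACGUUACGU', -start => 1, -end => 9));

# Write the alignment to a local file (name and format are assumptions).
my $out = Bio::AlignIO->new(-file => '>example.aln', -format => 'clustalw');
$out->write_aln($aln);
$out->close;

# Report the number of alignment columns and sequences.
my @seqs = $aln->each_seq;
print "columns: ", $aln->length, ", sequences: ", scalar(@seqs), "\n";
```

Other formats supported by Bio::AlignIO, such as stockholm or fasta, can be
selected by changing the -format argument.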

 

I have run some informal tests on these packages and will turn them into
standard tests later. Any suggestions on these packages are welcome.

 

Cheers,

Jun Yin

Ph.D. student in U.C.D.

 

Bioinformatics Laboratory

Conway Institute

University College Dublin

 



