[Bioperl-l] working with large alignments

Heikki Lehvaslaiho heikki at ebi.ac.uk
Mon Feb 2 06:41:23 EST 2004


Albert Vilella, who is visiting me here at EBI, works with really big genomic 
sequence alignments. I've committed several of his modules into CVS for that 
purpose. The most important additions are:

* Bio::Seq::LargeLocatableSeq
    Bio::RangeI-compliant version of Bio::Seq::LargePrimarySeq;
    uses File::Temp to store the sequence
* Bio::Seq::LargeSeqI
    Interface class for LargeSeq implementations
* Bio::AlignIO::largemultifasta
    IO class creating Bio::Seq::LargeLocatableSeq and SimpleAlign objects
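
To make the storage idea concrete, here is a minimal sketch of a sequence
container that keeps its residues in a temporary file and reads back only the
requested range. It is written in Python purely for illustration. The class
name TempFileSeq and the method append are hypothetical and are not part of
BioPerl, and the sketch makes no claim about how Bio::Seq::LargeLocatableSeq
is implemented internally. The 1-based, inclusive coordinates of subseq follow
the usual BioPerl convention.

```python
import os
import tempfile


class TempFileSeq:
    """Toy sequence container backed by a temporary file.

    Hypothetical class for illustration only; not the BioPerl code.
    """

    def __init__(self):
        # Anonymous binary temp file; it is deleted when closed.
        self._fh = tempfile.TemporaryFile()
        self._length = 0

    def append(self, chunk):
        # Write the chunk at the end of the file, so earlier chunks
        # never need to be held in memory.
        self._fh.seek(0, os.SEEK_END)
        self._fh.write(chunk.encode("ascii"))
        self._length += len(chunk)
        return self._length

    def length(self):
        return self._length

    def subseq(self, start, end):
        # 1-based, inclusive range; only this slice is read from disk.
        if start < 1 or end > self._length or start > end:
            raise ValueError("range outside sequence")
        self._fh.seek(start - 1)
        return self._fh.read(end - start + 1).decode("ascii")

    def close(self):
        self._fh.close()


seq = TempFileSeq()
seq.append("ACGT--AC")   # gap characters are stored like any residue
seq.append("GGTTAA")
print(seq.length())
print(seq.subseq(5, 10))
seq.close()
```

With this layout, memory use depends on the size of the chunks appended and
the ranges requested, not on the total sequence length. Any step that pulls a
whole sequence back into a string gives up that advantage.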


The LargeLocatableSeq is based on code from Bio::Seq::LargePrimarySeq. 
Everything seems to work, but if we run the tests added to the end of the 
t/AlignIO.t file with larger files, the process still uses a large amount 
of memory. We'd be interested in hearing from anyone who can suggest 
improvements.

If you are willing to test the code with larger data sets, I've put two files 
here:
 
http://www.ebi.ac.uk/~lehvasla/bioperl/medium.largemultifasta (1.3M)
http://www.ebi.ac.uk/~lehvasla/bioperl/large.largemultifasta (31M)

Thanks,

	-Heikki  and Albert
-- 
______ _/      _/_____________________________________________________
      _/      _/                      http://www.ebi.ac.uk/mutations/
     _/  _/  _/  Heikki Lehvaslaiho    heikki_at_ebi ac uk
    _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
   _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
  _/  _/  _/  Cambs. CB10 1SD, United Kingdom
     _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________

