[Bioperl-l] Proposal for bio-perl updates: ACE assembly file

Jordan Swanson jswanson at iastate.edu
Mon Feb 14 13:05:52 EST 2005


Hi,
I am new to bioperl, but I have a proposal for updating bioperl with some of 
the code I have been using.

Bioperl packages currently exist that open ACE assembly files (output by 
phrap/cap3 and other assembly programs).  However, the current code reads 
the entire file in one call:

use Bio::Assembly::IO;

my $assembly_in = Bio::Assembly::IO->new(-file   => "input.ace",
                                         -format => 'ace');

my $assembly = $assembly_in->next_assembly;

I am working on a large EST assembly project (roughly 150K), and our assembly 
files have been around 200 MB in size.  For many of our applications we only 
need to process one contig at a time, and reading the entire assembly at once 
requires a large amount of memory and/or disk space.

I have developed some code that reads in contigs one at a time, therefore 
using only the amount of space needed for one contig object. A brief 
synopsis:

my $contig_in = ContigIO->new(-file=>$filename, -format=>'ace');
while( my $contig = $contig_in->next_contig)
{
	do_stuff_with_contig($contig);
}
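
To make the one-contig-at-a-time idea concrete, here is a minimal sketch in plain Perl with no bioperl dependency. It is not the ContigIO code itself: make_contig_iterator and the hash layout are hypothetical names chosen for illustration, and it only picks up the CO header, the consensus and the AF read placements, skipping everything else in the record.

```perl
use strict;
use warnings;

# Hypothetical helper (not a bioperl API): returns a closure that yields one
# contig record per call from an open ACE filehandle, or undef at end of file.
sub make_contig_iterator {
    my ($fh) = @_;
    my $pending;   # "CO" line already read while scanning the previous contig

    return sub {
        my $line = $pending;
        undef $pending;

        # Skip anything before the next contig header (e.g. the "AS" line).
        while (!defined $line || $line !~ /^CO\s/) {
            $line = <$fh>;
            return unless defined $line;    # end of file: no more contigs
        }

        # CO <name> <padded bases> <reads> <base segments> <U|C>
        my (undef, $name, $n_bases, $n_reads, undef, $strand) = split ' ', $line;
        my %contig = (name => $name, n_bases => $n_bases, n_reads => $n_reads,
                      strand => $strand, consensus => '', reads => []);

        # Consensus lines run until the first blank line.
        while (defined($line = <$fh>)) {
            last if $line !~ /\S/;
            chomp $line;
            $contig{consensus} .= $line;
        }

        # Collect read placements until the next "CO" line or end of file.
        while (defined($line = <$fh>)) {
            if ($line =~ /^CO\s/) { $pending = $line; last; }
            # AF <read name> <U|C> <padded start in consensus>
            if ($line =~ /^AF\s+(\S+)\s+([UC])\s+(-?\d+)/) {
                push @{ $contig{reads} }, { name => $1, strand => $2, start => $3 };
            }
        }
        return \%contig;
    };
}

# Tiny made-up ACE fragment, read from memory so the sketch is self-contained.
my $ace = <<'ACE';
AS 2 3

CO Contig1 8 2 1 U
ACGT*ACG

BQ
 20 20 20 20 20 20 20

AF read1 U 1
AF read2 C 3
BS 1 8 read1

CO Contig2 4 1 1 U
TTGA

BQ
 30 30 30 30

AF read3 U 1
BS 1 4 read3
ACE

open my $fh, '<', \$ace or die "Cannot open in-memory ACE data: $!";
my $next_contig = make_contig_iterator($fh);
while (my $contig = $next_contig->()) {
    printf "%s: %d padded bases, %d reads placed\n",
        $contig->{name}, length($contig->{consensus}), scalar @{ $contig->{reads} };
}
close $fh;
```

Only one contig hash is alive at a time, so memory use follows the largest contig rather than the whole file.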

Furthermore, there is currently no code that writes out ACE files or reverses 
a contig's orientation.  I have developed code that implements both and, 
if you would have it, I would like to submit it.  I have been 
working on converting this code to a more bioperl-friendly format 
(inheriting from bioseq objects, using the bioperl IO system, bioperl-style 
warnings, and so forth).

I would appreciate some advice on how to proceed, specifically on inheriting 
from the correct classes and avoiding duplication of code. My initial 
thoughts:

* Pull the parsing code out of Assembly::IO::ace.pm and into a new 
ContigIO::ace.pm (possibly inheriting from AlignIO, since the contig object 
is an AlignI)
* Alter Assembly::IO::ace.pm to use ContigIO.pm to load the contigs into the 
assembly, and to output the assembly
* Incorporate my reverse_contig function somewhere (it is like revcom for 
Bio::SeqI, so possibly in the ContigI.pm file)
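
For the last point, a rough sketch of what reversing a contig involves, again in plain Perl on a simple hash rather than a bioperl object. The function body and data layout here are assumptions for illustration, not the actual reverse_contig code: the consensus and each aligned read are reverse-complemented, each read's padded start is mirrored about the contig, and its U/C orientation is flipped.

```perl
use strict;
use warnings;

# Reverse-complement a padded sequence. Only A/C/G/T (either case) are
# complemented; pads ('*'), N and X are symmetric and pass through unchanged.
sub revcom_padded {
    my ($seq) = @_;
    my $rc = reverse $seq;              # scalar context: reverses the string
    $rc =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

# Hypothetical stand-in for reverse_contig, working on a plain hash.
# Read starts are 1-based padded consensus positions, as in ACE "AF" lines.
sub reverse_contig {
    my ($contig) = @_;
    my $contig_len = length $contig->{consensus};

    my @reads;
    for my $read (@{ $contig->{reads} }) {
        my $end = $read->{start} + length($read->{seq}) - 1;   # last column covered
        push @reads, {
            name   => $read->{name},
            seq    => revcom_padded($read->{seq}),
            start  => $contig_len - $end + 1,                  # mirrored start
            strand => ($read->{strand} eq 'U' ? 'C' : 'U'),    # flipped orientation
        };
    }

    return {
        name      => $contig->{name},
        consensus => revcom_padded($contig->{consensus}),
        reads     => \@reads,
    };
}

# Made-up example contig with two overlapping reads.
my $contig = {
    name      => 'Contig1',
    consensus => 'AACGT*TG',
    reads     => [
        { name => 'r1', strand => 'U', start => 1, seq => 'AACG'  },
        { name => 'r2', strand => 'C', start => 4, seq => 'GT*TG' },
    ],
};

my $flipped = reverse_contig($contig);
print "$flipped->{consensus}\n";
printf "%s %s %d %s\n", @{$_}{qw(name strand start seq)} for @{ $flipped->{reads} };
```

Applying the function twice gives back the original coordinates and sequences, which is a handy sanity check. Consensus qualities, base segments and tags would need the same mirroring and are left out here.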

Thoughts?

---  
Jordan M Swanson   
Department of Ecology, Evolution, and Organismal Biology 
431 Bessey Hall 
Iowa State University 
Ames, IA 50011 
Lab 515 294-7098 
FAX: 515-294-1337 

