[Bioperl-l] Proposal for bio-perl updates: ACE assembly file

Jordan Swanson jswanson at iastate.edu
Mon Feb 28 11:05:06 EST 2005


On Monday 14 February 2005 12:05 pm, Jordan Swanson wrote:
> Hi,
> I am new to bioperl, but I have a proposal for updating bioperl with some
> of the code I have been using.
>
> Bioperl packages currently exist that open ACE assembly files (output by
> phrap, cap3, and other assembly programs).  However, the current code
> brings in the entire file in one call:
>
> my $assembly_in = Bio::Assembly::IO->new(-file   => "input.ace",
>                                          -format => 'ace');
>
> my $assembly = $assembly_in->next_assembly;
>
> I am working on a large EST assembly project (roughly 150K) and our assembly
> files have been around 200 MB in size.  For many of our applications, we
> only need to process one contig at a time, not to mention that reading the
> entire assembly at once requires a large amount of memory and/or disc
> space.
>
> I have developed some code that reads in contigs one at a time, therefore
> using only the amount of space needed for one contig object. A brief
> synopsis:
>
> my $contig_in = ContigIO->new(-file => $filename, -format => 'ace');
> while ( my $contig = $contig_in->next_contig )
> {
> 	do_stuff_with_contig($contig);
> }
>
> Furthermore, there is currently no code that writes out ACE files or
> reverses a contig's orientation.  I have developed some code that
> implements both, and if you would have it, I would like to submit this
> code.  I have been working on converting this code to a more
> bioperl-friendly format (inheriting from bioseq objects, using the
> bioperl IO system, bioperl-style warnings, and so forth).
>
> I would appreciate some advice on how to proceed, specifically on
> inheriting from the correct classes and avoiding duplication of code. My
> initial thoughts:
>
> * Pull the parsing code out of Assembly::IO::ace.pm and into a new
> ContigIO::ace.pm (possibly inheriting from AlignIO, since the contig
> object is an AssemblyI)
> * Alter Assembly::IO::ace.pm to use ContigIO.pm to load each contig into
> the assembly, and to output the assembly
> * Incorporate my reverse_contig function somewhere (it is like revcom
> for Bio::SeqI, so possibly in the ContigI.pm file)
>
> Thoughts?
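
The one-contig-at-a-time reading described in the quoted message can be sketched in plain Perl, without any bioperl classes. This is only an illustration of the idea, not the submitted ContigIO code: make_contig_iterator and the hash layout are hypothetical names, and the parser assumes the usual ACE layout (a "CO" header line, the padded consensus terminated by a blank line, then "AF" lines placing each read). Everything else (BQ, BS, RD, QA, DS) is skipped.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of bioperl): returns a closure that yields
# one contig record per call and undef once the file is exhausted, so only
# one contig is held in memory at a time.
sub make_contig_iterator {
    my ($fh) = @_;
    my $next_header;    # "CO" line read ahead while finishing the previous contig
    return sub {
        my $contig;
        my $in_consensus = 0;
        while (1) {
            my $line;
            if ( defined $next_header ) {
                $line = $next_header;
                undef $next_header;
            }
            else {
                $line = <$fh>;
                last unless defined $line;
            }
            chomp $line;
            if ( $line =~ /^CO\s+(\S+)\s+(\d+)\s+(\d+)/ ) {
                if ($contig) {    # header of the following contig: stop here
                    $next_header = $line;
                    return $contig;
                }
                $contig = {
                    name      => $1,
                    length    => $2,
                    num_reads => $3,
                    consensus => '',
                    reads     => [],
                };
                $in_consensus = 1;
            }
            elsif ( !$contig ) {
                next;             # file header ("AS" line) before the first contig
            }
            elsif ($in_consensus) {
                if   ( $line =~ /^\s*$/ ) { $in_consensus = 0 }
                else                      { $contig->{consensus} .= $line }
            }
            elsif ( $line =~ /^AF\s+(\S+)\s+([UC])\s+(-?\d+)/ ) {
                push @{ $contig->{reads} },
                  { name => $1, strand => $2, start => $3 };
            }
        }
        return $contig;           # last contig in the file, or undef at EOF
    };
}

# Tiny made-up ACE fragment (RD/QA/DS records omitted for brevity).
my $ace = <<'ACE';
AS 2 3

CO Contig1 8 2 1 U
ACGT*ACG

BQ
 20 20 20 20 20 20 20

AF read1 U 1
AF read2 C 3
BS 1 8 read1

CO Contig2 4 1 1 U
TTGA

BQ
 30 30 30 30

AF read3 U 1
BS 1 4 read3
ACE

open my $fh, '<', \$ace or die "Cannot open in-memory ACE data: $!";
my $next_contig = make_contig_iterator($fh);
my @names;
while ( my $contig = $next_contig->() ) {
    printf "%s: %d padded bases, %d reads placed\n",
      $contig->{name}, length( $contig->{consensus} ),
      scalar @{ $contig->{reads} };
    push @names, $contig->{name};
}
```

A closure is used here only to keep the sketch short; in a bioperl-style module the same read-ahead state would presumably live in the IO object behind next_contig.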

I've gone ahead and incorporated my changes into bioperl-compliant objects.

* Bio/Assembly/ContigIO.pm created
* Bio/Assembly/ContigIO directory created
* Bio/Assembly/ContigIO/ace.pm created
* Bio/Assembly/IO/ace.pm modified to use Bio::Assembly::Contig
* Bio/Assembly/Contig.pm modified to allow base segments and to add a revcom
  method
* t/ContigIO.t created
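
For readers wondering what reversing a contig involves, here is a rough sketch of the bookkeeping. It is not the revcom method added to Bio/Assembly/Contig.pm; revcom_padded and revcom_read_placement are hypothetical helpers, and the sketch assumes 1-based padded coordinates with "*" as the pad character, as in ACE files. The consensus is reversed and complemented with pads left in place, and each read's start is recomputed from its old end position while its strand flag (U/C) flips.

```perl
use strict;
use warnings;

# Hypothetical helper: reverse-complement a padded consensus.
# Pad characters ("*") and any other symbols are reversed but not translated.
sub revcom_padded {
    my ($seq) = @_;
    my $rc = scalar reverse $seq;
    $rc =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

# Hypothetical helper: placement of a read after the contig is reversed.
# A read covering [start, start + length - 1] in a contig of $contig_len
# padded bases ends up starting at $contig_len - old_end + 1.
sub revcom_read_placement {
    my ( $contig_len, $read ) = @_;
    my $old_end = $read->{start} + $read->{length} - 1;
    return {
        name   => $read->{name},
        length => $read->{length},
        start  => $contig_len - $old_end + 1,
        strand => $read->{strand} eq 'U' ? 'C' : 'U',
    };
}

# Made-up example: an 8-base padded consensus with one read on bases 1..4.
my $consensus = 'ACGT*ACG';
my $read      = { name => 'read1', start => 1, length => 4, strand => 'U' };

my $rc_consensus = revcom_padded($consensus);
my $rc_read      = revcom_read_placement( length($consensus), $read );

printf "%s -> %s\n", $consensus, $rc_consensus;
printf "%s: start %d (%s) -> start %d (%s)\n",
  $read->{name}, $read->{start}, $read->{strand},
  $rc_read->{start}, $rc_read->{strand};
```

A full implementation would also have to reverse the quality values and base segments, which this sketch leaves out.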

How does one submit their code for inspection/review/incorporation?  I used 
cvs to check out the code I've been using, but "cvs add" is not working at my 
permission level.




-- 
Jordan M Swanson   
Department of Ecology, Evolution, and Organismal Biology 
Iowa State University 

