[BioPython] iterative ace parsing

Thu May 26 09:59:48 EDT 2005

Hi,

On Thu, 2005-05-26 at 15:31 +0200, edeveaud at pasteur.fr wrote:
> 	Hi,
> 	
> after reading the doc for Bio.Sequencing.Ace
> 
> I would like to run some analysis on an assembly composed of 174 contigs
> based on approximately 49000 reads.
> 
> the only problem is that parsing the whole ace file at once needs 872M of memory.
> 
> my idea was to iterate over the contigs in order to decrease the memory needs,
> but the doc claims 
>  2) *** DEPRECATED: not entirely suitable for ACE files!
>     Or you can iterate over the contigs of an ace file one by one in
>     the usual way:
> 
> could someone point me to some explanation about this warning?
> 

It works fine, in theory. The problem with ace files is that they are
not entirely suitable for contig-by-contig parsing: they can contain
contig-specific information at the very end of the file. So in your
case, after reading contig no. 174, there might still be some more info
left in the file about contigs no. 12, 132, and 160. Depending on what
kind of contigs you have, there might be no such info at all, or it might
just be irrelevant for your analysis. The phrap manual (you're using phrap
to create the contigs?) lists the tags that can appear at the end of an ace
file, so you might want to have a look there and decide whether they are
important for you or not. If not, iterating over contigs should do just
fine.
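
To make the point concrete, here is a rough sketch in plain Python. It
does not use Bio.Sequencing.Ace; the function names and the tiny sample
file are made up for illustration. It assumes that the trailing tags of
interest are CT{ blocks whose first line starts with the contig name, so
please check that against the phrap/consed documentation and your own
files. The idea is a cheap first pass that only collects those tags,
followed by a second pass that holds one contig in memory at a time.

```python
import io

# Hypothetical miniature ACE file: two contigs plus one trailing CT tag.
SAMPLE = """AS 2 3

CO Contig1 10 1 1 U
AAAAACCCCC

CO Contig2 8 2 1 U
GGGGTTTT

CT{
Contig1 repeat consed 2 5 050526:095948
}
"""


def trailing_contig_tags(handle):
    """First pass: map contig name -> list of CT tag header lines.

    Assumption: the line right after 'CT{' starts with the contig name.
    """
    tags = {}
    in_ct = False
    for line in handle:
        if line.startswith("CT{"):
            in_ct = True
        elif in_ct:
            fields = line.split()
            if fields:
                tags.setdefault(fields[0], []).append(line.strip())
            in_ct = False
    return tags


def iter_contig_blocks(handle):
    """Second pass: yield (contig_name, lines) one 'CO' block at a time.

    Everything after the last 'CO' line, including any trailing tags,
    ends up in the last block.
    """
    name, lines = None, []
    for line in handle:
        if line.startswith("CO "):
            if name is not None:
                yield name, lines
            name, lines = line.split()[1], [line]
        elif name is not None:
            lines.append(line)
    if name is not None:
        yield name, lines


# With a real file you would open it twice (or seek back to the start).
tags = trailing_contig_tags(io.StringIO(SAMPLE))
for name, lines in iter_contig_blocks(io.StringIO(SAMPLE)):
    print(name, len(lines), "lines,", len(tags.get(name, [])), "trailing tag(s)")
```

If the first pass finds nothing for your contigs, plain contig-by-contig
iteration loses no information.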

Frank

> is the ace parser suitable for iterative tasks ??
> 

> 	thanks
> 
-- 
Frank Kauff
Dept. of Biology
Duke University
Box 90338
Durham, NC 27708
USA

Phone 919-660-7382
Fax 919-660-7293
Web http://www.lutzonilab.net/member/frankkauff.shtml