[Bioperl-l] Bio::Assembly::IO problems reading .ace files

Chris Fields cjfields at uiuc.edu
Tue Dec 4 05:10:57 UTC 2007


Yes, it's possible this would cause memory issues, as each
Bio::Assembly::Contig instance would have a
Bio::SeqFeature::Collection attached, and each Collection has a tied DB
hash, i.e. an open filehandle.  So if you had 1000 contigs open at any
one time (in a parsed scaffold, for instance), you would have 1000 open
filehandles.  Not very efficient.
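The one-handle-per-contig arithmetic above can be checked before parsing. Below is a minimal Perl sketch that assumes three things: one open handle per contig (as described in this thread), contig records in an .ace file starting with a `CO` line, and `POSIX::sysconf(POSIX::_SC_OPEN_MAX())` reporting the per-process soft limit (it does on Linux/glibc; other systems may differ). The function names and the 64-descriptor headroom are hypothetical choices for illustration, not part of BioPerl.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX ();

# Extra descriptors to leave free (STDIN/STDOUT/STDERR, the input file,
# modules, ...). This figure is a guess, not a BioPerl number.
my $HEADROOM = 64;

# Count contig records in an ACE file; each contig starts with "CO <name> ...".
sub count_contigs {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!\n";
    my $n = 0;
    while (my $line = <$fh>) {
        $n++ if $line =~ /^CO\s/;
    }
    close $fh;
    return $n;
}

# Soft limit on open file descriptors for this process, or undef if the
# system does not report one.
sub open_file_limit {
    return POSIX::sysconf(POSIX::_SC_OPEN_MAX());
}

# True if the limit leaves room for one handle per contig plus headroom.
sub enough_handles {
    my ($n_contigs, $limit, $headroom) = @_;
    $headroom //= $HEADROOM;
    return 1 unless defined $limit;    # no reported limit: assume OK
    return $n_contigs + $headroom <= $limit;
}

# Only run the check when given a file, e.g. "perl check_ace_fds.pl my.ace"
if (@ARGV) {
    my $ace   = $ARGV[0];
    my $n     = count_contigs($ace);
    my $limit = open_file_limit();
    if (enough_handles($n, $limit)) {
        print "$ace: $n contigs, open-file limit ", ($limit // 'unknown'), ": OK\n";
    }
    else {
        print "$ace: $n contigs but open-file limit is $limit;\n",
              "try raising it first, e.g. 'ulimit -n ", $n + $HEADROOM, "'\n";
    }
}
```

Note that `ulimit -n` only affects the shell it is run in and that shell's children, so it has to be raised in the same session before running the parsing script; going above the hard limit (`ulimit -Hn`) needs administrator privileges.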

My thought was to have each Bio::Assembly::Scaffold instance carry a
single Bio::SeqFeature::CollectionI (a Bio::SeqFeature::Collection,
Bio::DB::SeqFeature::Store, or any other CollectionI, whichever is
easiest).  Each Contig would be passed (and would store) a reference to
the Scaffold's SF::Collection and pull its features from there; I just
haven't had time to work on it.  I don't think anyone's tackling it, so
feel free to code away!
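The shared-collection design can be sketched in plain Perl. This is a minimal, hypothetical illustration: `My::FeatureStore`, `My::Contig` and `My::Scaffold` are made-up classes, and a simple hash keyed by contig id stands in for the Bio::SeqFeature::CollectionI the proposal calls for; it is not BioPerl's API. What it shows is the ownership pattern: the scaffold creates one store, and every contig keeps only a reference to it, so the number of stores no longer grows with the number of contigs.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the single shared feature collection.
package My::FeatureStore;
sub new { my ($class) = @_; return bless { by_contig => {} }, $class }
sub add_feature {
    my ($self, $contig_id, $feature) = @_;
    push @{ $self->{by_contig}{$contig_id} }, $feature;
    return;
}
sub features_for {
    my ($self, $contig_id) = @_;
    return @{ $self->{by_contig}{$contig_id} || [] };
}

# Each contig holds only its id and a reference to the shared store;
# it opens nothing of its own.
package My::Contig;
sub new {
    my ($class, %args) = @_;
    return bless { id => $args{-id}, store => $args{-store} }, $class;
}
sub id { $_[0]{id} }
sub add_feature {
    my ($self, $feature) = @_;
    $self->{store}->add_feature($self->{id}, $feature);
}
sub get_features {
    my ($self) = @_;
    return $self->{store}->features_for($self->{id});
}

# The scaffold owns the one store and hands the same reference to every contig.
package My::Scaffold;
sub new {
    my ($class) = @_;
    return bless { store => My::FeatureStore->new, contigs => [] }, $class;
}
sub new_contig {
    my ($self, $id) = @_;
    my $contig = My::Contig->new(-id => $id, -store => $self->{store});
    push @{ $self->{contigs} }, $contig;
    return $contig;
}
sub all_contigs { @{ $_[0]{contigs} } }

package main;

my $scaffold = My::Scaffold->new;
my $c1 = $scaffold->new_contig('Contig1');
my $c2 = $scaffold->new_contig('Contig2');
$c1->add_feature({ tag => 'read', start => 1, end => 400 });
$c2->add_feature({ tag => 'read', start => 5, end => 300 });

# Prints "Contig1 has 1 feature(s)" and "Contig2 has 1 feature(s)"
printf "%s has %d feature(s)\n", $_->id, scalar($_->get_features)
    for $scaffold->all_contigs;
```

Swapping the in-memory hash for a single disk-backed collection would move features out of RAM while still using one handle per scaffold rather than one per contig; which store to use is left open in the original proposal.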

chris

On Dec 3, 2007, at 8:25 PM, Florent Angly wrote:

> Could this issue cause excessive memory usage? I was seeing high
> memory usage when parsing some TIGR Assembler files and was wondering
> whether the TIGR parser or the parent assembly IO module was
> responsible.
> I'd definitely be interested in a fix to the Bio::Assembly
> implementation if it's the assembly IO module's fault.
> Florent
>
> Chris Fields wrote:
>> This seems similar to the 'too many open filehandles' issue
>> documented here:
>>
>> http://bugzilla.open-bio.org/show_bug.cgi?id=2320
>>
>> Unfortunately, it is caused by having an open DB_File for every
>> contig, and it is a problem with the Bio::Assembly implementation
>> that isn't easily fixed.  Raising the open-filehandle limit with
>> ulimit is the only known workaround:
>>
>> ulimit -n 10000
>>
>> chris
>>
>> On Dec 3, 2007, at 6:49 PM, Smithies, Russell wrote:
>>
>>
>>> Hi all,
>>>
>>> I'm trying to read .ace files but keep getting an error and I can't
>>> work out its cause.
>>> Really basic example code:
>>>
>>> 	#!/usr/local/bin/perl -w
>>>
>>> 	use strict;
>>> 	use lib "/data/home/smithiesr/bioperl-live";
>>> 	use Bio::Assembly::IO;
>>> 	use Data::Dumper;
>>>
>>> 	my $ace = "CLP0001001240-cE15_20030319.ace";
>>>
>>> 	# Create the reader with class-method syntax rather than indirect 'new'
>>> 	my $io = Bio::Assembly::IO->new(-file => $ace, -format => "ace");
>>> 	my $assembly = $io->next_assembly;
>>>
>>> 	# Dump every contig in the assembly
>>> 	foreach my $contig ($assembly->all_contigs) {
>>> 	    print Dumper($contig);
>>> 	}
>>>
>>> Gives this error:
>>> 	[smithiesr at impala ace_phrap]$ perl bp_read_ace.pl
>>> 	Can't call method "get_consensus_sequence" on an undefined value at /data/home/smithiesr/bioperl-live/Bio/Assembly/IO/ace.pm line 170, <GEN0> line 42.
>>>
>>> Which relates to this bit in ace.pm:
>>> 	# Loading contig qualities... (Base Quality field)
>>> 	/^BQ/ && do {
>>> 	    my $consensus = $contigOBJ->get_consensus_sequence()->seq();
>>>
>>> Is this caused by a dud .ace file or a problem with
>>> Bio::Assembly::IO::ace, or is the Contig object not getting created?
>>> Any ideas?
>>>
>>> Thanx,
>>>
>>> Russell Smithies
>>>
>>> Bioinformatics Software Developer
>>> T +64 3 489 9085
>>> E  russell.smithies at agresearch.co.nz
>>>
>>> Invermay  Research Centre
>>> Puddle Alley,
>>> Mosgiel,
>>> New Zealand
>>> T  +64 3 489 3809
>>> F  +64 3 489 9174
>>> www.agresearch.co.nz
>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Robert Switzer
>> Dept of Biochemistry
>> University of Illinois Urbana-Champaign
>>
>
