[Bioperl-l] DB_File and assembly IO

Chris Fields cjfields at illinois.edu
Fri Aug 29 14:30:49 UTC 2008


This is a known problem with Bio::Assembly and stems from having a  
DB_File tied (opened) for each Bio::Assembly::Contig (via a retained  
Bio::SeqFeature::Collection).  You can extend the number of open  
filehandles on UNIX'y flavors using ulimit (see the following link), but  
I'm not sure about Win32.

http://bugzilla.open-bio.org/show_bug.cgi?id=2320

The general bug is reproducible using the following simple script.  If  
needed, adjust the range end in the for loop so it exceeds the limit  
reported by 'ulimit -n'; Mac OS X 10.5 is set to 2560.

---------------------------
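use strict;
use warnings;
use Bio::Assembly::Contig;

my @contigs;

# Each new contig ties its own DB_File, so this dies once the
# open-file limit ('ulimit -n') is exceeded.
push @contigs, Bio::Assembly::Contig->new() for (1..10000);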
---------------------------
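To see the underlying limit without BioPerl at all, here is a minimal core-Perl sketch (an illustration only, not BioPerl code). It keeps opening read handles on one scratch file and holds every one of them open, the way each contig holds its tied DB_File, until the OS refuses with EMFILE, the errno behind 'Too many open files'.  The sub name exhaust_handles and the 100,000 cap are arbitrary choices made for this sketch.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Errno qw(EMFILE);

# Keep opening read handles on one scratch file, holding every handle open
# (mimicking one tied DB_File per contig), until the OS refuses with EMFILE
# ("Too many open files") or $cap handles are open.
# Returns (handles opened, whether EMFILE was hit, error text).
sub exhaust_handles {
    my ($cap) = @_;
    my (undef, $path) = tempfile(UNLINK => 1);
    my @handles;
    while (@handles < $cap) {
        my $fh;
        unless (open $fh, '<', $path) {
            # $! is a dualvar: numeric errno plus the system's message text
            return (scalar @handles, ($! == EMFILE ? 1 : 0), "$!");
        }
        push @handles, $fh;
    }
    # @handles is freed on return, which closes every descriptor again
    return (scalar @handles, 0, '');
}

my ($opened, $hit_limit, $msg) = exhaust_handles(100_000);
if ($hit_limit) {
    print "Failed after opening $opened extra handles: $msg\n";
}
else {
    print "Opened $opened handles without reaching the limit\n";
}
```

The reported count should land a little below 'ulimit -n', since STDIN, STDOUT, STDERR and the script itself already use descriptors.  Raising the soft limit first (e.g. 'ulimit -n 4096', where the hard limit allows it) should raise the count accordingly, which is why ulimit works as a stopgap rather than a fix.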

I'll open a bug report on this for tracking (for release 1.7, along  
with any other Bio::Assembly issues).  That doesn't mean it won't get  
fixed sooner, just that we aren't under pressure to fix it for the  
next release, which already has a full plate.  IMO there doesn't need  
to be one SF::Collection per contig; one instance should do for the  
entire assembly, with the same SF::Collection passed in to each  
contig and each contig distinguished by the SeqFeature seq_id.  It  
would also be nice to allow other SeqFeature::CollectionI  
implementations (e.g. Bio::DB::SeqFeature::Store and the like).
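The one-collection-per-assembly idea can be illustrated with a toy, pure-Perl sketch.  SharedFeatureIndex and ToyContig are hypothetical classes invented here, not BioPerl's API, and a plain hash stands in for the DB_File-backed Bio::SeqFeature::Collection.  The point is only that every contig holds a reference to one shared index and filters by its own seq_id, so creating 10,000 contigs opens no extra filehandles.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sketch, NOT the BioPerl API: one feature index shared by every
# contig in an assembly, features told apart by seq_id. A plain in-memory
# hash stands in for the DB_File-backed Bio::SeqFeature::Collection.
package SharedFeatureIndex;

sub new {
    my ($class) = @_;
    return bless { by_seq => {} }, $class;
}

# Store a feature under the contig (seq_id) it belongs to.
sub add_feature {
    my ($self, %f) = @_;
    push @{ $self->{by_seq}{ $f{seq_id} } }, \%f;
    return;
}

# Return the features on $seq_id that overlap [$start, $end].
sub features_in_range {
    my ($self, $seq_id, $start, $end) = @_;
    my $list = $self->{by_seq}{$seq_id} || [];
    return grep { $_->{start} <= $end && $_->{end} >= $start } @$list;
}

package ToyContig;

# Each contig keeps only its own id plus a reference to the shared index.
sub new {
    my ($class, %args) = @_;
    return bless { id => $args{id}, index => $args{index} }, $class;
}

sub add_feature {
    my ($self, %f) = @_;
    $self->{index}->add_feature(%f, seq_id => $self->{id});
    return;
}

sub features_in_range {
    my ($self, $start, $end) = @_;
    return $self->{index}->features_in_range($self->{id}, $start, $end);
}

package main;

# 10,000 contigs, but only one index object (and no filehandles at all).
my $index   = SharedFeatureIndex->new;
my @contigs = map { ToyContig->new(id => "Contig$_", index => $index) } 1 .. 10_000;

$contigs[246]->add_feature(start => 10, end => 50,  tag => 'read1');
$contigs[246]->add_feature(start => 80, end => 120, tag => 'read2');
$contigs[0]->add_feature(start => 10, end => 50, tag => 'other');

my @hits = $contigs[246]->features_in_range(40, 90);
print scalar(@hits), " features overlap 40..90 on $contigs[246]{id}\n";
# prints "2 features overlap 40..90 on Contig247"
```

With a real Bio::SeqFeature::Collection the shared object would hold a single DB_File tie for the whole assembly instead of one per contig; how it would be passed through Bio::Assembly::Contig->new is left open here.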

chris

On Aug 29, 2008, at 3:40 AM, Florent Angly wrote:

> Hi Joshua,
>
> I don't know the specifics of DB_File, but the error 'Cannot open  
> file tree: Too many open files' is pretty explicit.
> If you're on Unix/Linux you can check which files are open by  
> your program by typing:
>   lsof | grep name_of_program
> There is probably a filehandle that is not closed somewhere in your  
> code or the BioPerl code.
> Best,
>
> Florent
>
>
>
> Joshua Udall wrote:
>> Bioperl -
>>
>> I'm trying to read/parse a single cap3 ace file with several thousand
>> contigs.  I get a DB_File error at Contig247.  Here's the error:
>>
>> ------------- EXCEPTION -------------
>> MSG: Unable to tie DB_File handle
>> STACK Bio::SeqFeature::Collection::new
>> /Users/jaudall/bin/src/bioperl-live/Bio/SeqFeature/Collection.pm:195
>> STACK Bio::Assembly::Contig::new
>> /Users/jaudall/bin/bioperl-live/Bio/Assembly/Contig.pm:256
>> STACK Bio::Assembly::IO::ace::next_assembly
>> /Users/jaudall/bin/src/bioperl-live/Bio/Assembly/IO/ace.pm:148
>> STACK toplevel /Users/jaudall/bin/read_ace.pl:214
>> -------------------------------------
>>
>> Looking at the Collection::new, the error is on the middle line:
>>
>>  $self->{'_btree'} = tie %{$self->{'_btreehash'}}, 'DB_File',
>>      $self->indexfile, O_RDWR|O_CREAT, 0640, $DB_BTREE;  # or die "Cannot open file: $!\n";
>>  $self->{'_btree'} || $self->throw("Unable to tie DB_File handle");
>>  return $self;
>>
>> If I uncomment the $! die statement that I inserted, I get this:
>>
>> 'Cannot open file tree: Too many open files'
>>
>> Apparently the Collection constructor is creating a new index file
>> for each contig, and the handles for each are sticking around?  That
>> confuses me because, from reading more about Collection.pm and
>> DB_File, it appeared to me that no files are written by default
>> (which is how I'm using it); rather, the Collection objects are all
>> stored in memory.  I'm pretty sure the error is not a permission
>> error, and if it is not the open filehandles, what else should I
>> look for?
>>
>>
>> If I 'warn' the error instead of throwing it, I get:
>>
>> Can't call method "get_dup" on an undefined value at
>> /Users/jaudall/bin/src/bioperl-live/Bio/SeqFeature/Collection.pm line 360
>>
>> This kind of makes sense because the index appears not to be created,
>> and it can't look things up in an undefined tied hash.  I'm stuck.
>>
>> Thanks for any help and suggestions.
>>
>> OSX, perl 5.8.8, bioperl-live (svn last week)
>>


