[Bioperl-l] Trouble with Bio::DB::Fasta and large files

Wed, 06 Nov 2002 09:48:31 +0000

I would like to add to this discussion that any reasonably recent Linux 
kernel (if Linux is the 2 GB-limited platform in this case), starting 
with 2.4 I think, can handle files larger than 2 GB, up to some 4 TB or 
whatever the next limit is. I have not had any problems like this with 
the kernel itself since Red Hat 7 or so.

In our case, as Lincoln describes, the problem was solved simply by 
recompiling Perl. In an earlier setup where I needed to do something 
similar (a 2.2-series kernel recompiled for large file support), the job 
was just gunzipping on the fly and piping the output into a reformatting 
script, so Perl never saw a >2 GB file. There I had to do what Allen 
just mentioned, i.e. recompile the shell (Bash) along with some file 
utilities and related GNU tools. That took care of the problem.

Mummi, CSHL

Allen Day wrote:

>Tyler,
>
>Are you perchance using tcsh?  It could simply be a problem with your 
>shell.  This came up on the bioclusters mailing list a while ago:
>
>http://bioinformatics.org/pipermail/bioclusters/2002-May/000220.html
>
>I ran into this problem last week, when gzip appeared not to work for 
>me while loading a big (human) file into Bio::DB::GFF.  I recompiled 
>the shell and it was fine.
>
>-Allen
>
>On Tue, 5 Nov 2002, Lincoln Stein wrote:
>
>>I believe you are hitting the 2 GB file limit on some Unix systems.  In 
>>general, you will have to do three things:
>>
>>	1) make sure that your kernel supports large files (> 2 GB).
>>	Recompile the kernel if not.
>>
>>	2) make sure that you have a recent version of the C library,
>>	libc, that supports large files.  Install a new one if not (good luck!)
>>
>>	3) make sure that you have a version of Perl that was compiled
>>	with large file support.  Recompile with large file support turned
>>	on if not.
>>
>>It's a big pain.  We just had to do this for one of our servers when we 
>>experienced a similar problem.
>>
>>Lincoln
>>
>>On Tuesday 05 November 2002 07:32 pm, Tyler wrote:
>>
>>>I have been using Bio::DB::Fasta to extract sequences from FASTA-format
>>>BLAST databases for zebrafish and fugu with no problems. I've used both
>>>the tied-hash and object-oriented interfaces, and they work great with
>>>these databases. Thanks, Lincoln.
>>>
>>>However, when I try to use Bio::DB::Fasta on local mouse or human
>>>genome databases (Ensembl raw data), it throws an "Invalid file or
>>>dirname" exception. The mouse FASTA file is 2.7 GB and the human one is
>>>3.2 GB, compared with 1.2 GB for zebrafish and 340 MB for fugu. All
>>>scripts are identical except for the name of the database file, and all
>>>databases work fine with standalone BLAST (both the web interface and
>>>the BioPerl interface).
>>>
>>>Is there a workaround for dealing with these extremely large files?
>>>
>>>-Tyler
>>>
>>>_______________________________________________
>>>Bioperl-l mailing list
>>>Bioperl-l@bioperl.org
>>>http://bioperl.org/mailman/listinfo/bioperl-l