[Bioperl-l] Trouble with Bio::DB::Fasta and large files
Gudmundur A. Thorisson
Wed, 06 Nov 2002 09:48:31 +0000
I would like to add to this discussion that any reasonably recent
Linux kernel (if Linux is the 2 GB-limited platform in this case),
starting with 2.4 I think, can handle >2 GB files, up to some 4 TB
or whatever the next limit is. I have not had any problems like this
with the kernel itself since RedHat 7 or so.
The problem in our case, as Lincoln describes, was solved just by
recompiling Perl. But in a previous setting where I needed to do a
similar thing (2.2 series kernel, recompiled for large file support),
I got by with gzip-on-the-fly and piping into a reformatting script,
so Perl never saw a >2 GB file. In that case, I needed to do the same
thing Allen just mentioned, i.e. recompile the (Bash) shell and some
file utilities and related GNU tools. That took care of that problem.
Allen Day wrote:
>Are you perchance using tcsh? It could simply be a problem with your
>shell. This came up on the bioclusters mailing list a while ago:
>I ran into the problem last week when it appeared gzip wouldn't work for
>me when trying to load a big (human) file into Bio::DB::GFF. Recompiled
>the shell and it was fine.
>On Tue, 5 Nov 2002, Lincoln Stein wrote:
>>I believe you are hitting the 2 GB file limit on some Unix systems. In
>>general, you will have to do three things:
>> 1) make sure that your kernel supports large files > 2 GB
>> Recompile the kernel if not.
>> 2) make sure that you have a recent version of the C library,
>> libc, that supports large files. Install a new one if not (good luck!)
>> 3) make sure that you have a version of Perl that was compiled
>> with large file support. Recompile with large file support turned
>> on if not.
>>It's a big pain. We just had to do this for one of our servers when we
>>experienced a similar problem.
>>On Tuesday 05 November 2002 07:32 pm, Tyler wrote:
>>>I have been using Bio::DB::Fasta to extract sequences from fasta BLAST
>>>databases for zebrafish and fugu with no problems. I've used both the
>>>tied hash and object oriented implementations and they work great with
>>>these databases. Thanks Lincoln.
>>>However, when trying to use Bio::DB::Fasta on local mouse or human
>>>genome databases (ensembl raw data) they throw the "Invalid file or
>>>dirname" exception. The mouse fasta file is 2.7GB and the human one is
>>>3.2GB, as opposed to 1.2GB for zebrafish and 340MB for fugu. All
>>>scripts are the same except for the name of the database file. All
>>>databases work fine with standalone blast (both the web interface and
>>>the bioperl interface).
>>>Is there a workaround for dealing with these extremely large files?
>>>Bioperl-l mailing list