[Bioperl-l] SeqIO problem on fasta file with large number of sequences.

Kun Zhang kzhang@gsbs3.gs.uth.tmc.edu
Thu, 11 Oct 2001 18:54:29 -0500


It turns out that nothing is wrong with SeqIO. I was calling StandAloneBlast 
inside the loop to compare each sequence, and that is where the problem 
lies. My program always stops after 1003 sequences have been processed. 
My solution was to write a very simple wrapper around the local blast 
instead. It looks very ugly compared with StandAloneBlast, but it works.
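
For anyone hitting the same limit, here is a minimal sketch of what such a
wrapper might look like. It is not the actual script: the helper name
run_on_tempfile and the database name "mydb" in the commented call are
hypothetical, and it assumes a Unix system and a Perl recent enough to
support list-form pipe opens. The point is only that the temp file's handle
is closed and the file removed on every iteration, rather than left for
cleanup at program exit.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write one FASTA record to a temp file, run an
# external command with that file as its last argument, and release both
# the file handle and the file before returning.
sub run_on_tempfile {
    my ($fasta_text, @cmd) = @_;

    # UNLINK => 0: we delete the file ourselves instead of waiting for exit.
    my ($fh, $path) = tempfile('queryXXXXXX', TMPDIR => 1, UNLINK => 0);
    print {$fh} $fasta_text;
    close($fh) or die "close $path: $!";

    # List-form pipe open runs the command without going through a shell.
    open(my $out, '-|', @cmd, $path) or die "cannot run $cmd[0]: $!";
    my $report = do { local $/; <$out> };   # slurp all of the output
    close($out);                            # reaps the child, sets $?
    my $status = $?;

    unlink($path) or warn "unlink $path: $!";
    warn "$cmd[0] exited with status $status\n" if $status != 0;
    return defined $report ? $report : '';
}

# Example call (assumes the NCBI blastall binary and a formatted database
# named "mydb"; blastall writes its report to stdout by default):
#   my $report = run_on_tempfile($fasta, 'blastall', '-p', 'blastn',
#                                '-d', 'mydb', '-i');
```

Because nothing stays open between calls, the loop can run over the whole
fasta file without approaching the per-process open-file limit.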

Kun

At 08:15 AM 10/9/2001 -0400, Jason Eric Stajich wrote:
>Kun -
>
>I'm not aware of any times when we allocate tempfiles in the SeqIO system.
>I regularly parse files with SeqIO that are 10k+ with no problem, so
>perhaps you are doing something else in addition within these loops that
>is allocating new tempfiles.
>
>The tempfile cleanup function that is part of File::Temp is often not called
>until the program exits, so you have the confusing situation of running out
>of space/free FH while the directory is empty when the program exits.  We have
>worked around this by trying to register cleanup in the object destructor
>but I'm not sure what version of bioperl you are running and whether or
>not that fix has been implemented properly.
>
>Happy to help chase it down, but it would be helpful if you could submit this
>as a bug report and describe your version of perl, bioperl, architecture,
>etc.  Is it possible for you to send your complete code so we don't go
>chasing too far?
>
>-Jason
>
>On Mon, 8 Oct 2001, Kun Zhang wrote:
>
> > Hello! I'm using SeqIO to read DNA sequences of NCBI's unigene cluster,
> > which is distributed as a big fasta file
> > (ftp://ftp.ncbi.nlm.nih.gov/pub/schuler/unigene/Hs.seq.uniq.Z), and process
> > every gene sequentially. I got the following error message after 1002 (or
> > 1003) sequences had been processed.
> > ======================================================================
> > Error in tempfile() using /tmp/XXXXXXXXXX: Could not create temp file
> > /tmp/eo0UwZVasW: Too many open files at
> > /usr/lib/perl5/site_perl/5.6.0/Bio/Root/IO.pm line 41
> > ======================================================================
> >
> > However, I checked the /tmp directory and did not find a single temporary
> > file. Is there any way to work around this problem other than chopping
> > the fasta file into several smaller ones? Thanks!
> >
> > Kun Zhang
> > Human Genetics Center
> > UT-Houston Health Science Center
> >
>
>--
>Jason Stajich
>Duke University
>jason@cgt.mc.duke.edu