[Bioperl-l] SeqIO problem on fasta file with large number of sequences.

Jason Eric Stajich jason@cgt.mc.duke.edu
Thu, 11 Oct 2001 20:18:09 -0400 (EDT)


This is an obvious bug we should work on fixing - I believe you can also
work around this by just allocating a new StandAloneBlast object in the
loop each time.
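
Roughly, the workaround could look like the sketch below. It is untested
against your setup and rests on a few assumptions: blast_each is a made-up
helper name, the FASTA file and BLAST database are placeholders taken from
the command line, and the program/database parameter spelling follows the
StandAloneBlast synopsis, so check it against your installed version.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: call $work->($factory, $seq) once per sequence,
# building a brand-new factory object for every sequence. The previous
# factory goes out of scope at the end of each pass, so whatever it was
# holding can be released then instead of piling up until program exit.
sub blast_each {
    my ($seqio, $new_factory, $work) = @_;
    my $n = 0;
    while (my $seq = $seqio->next_seq) {
        my $factory = $new_factory->();   # fresh object on every pass
        $work->($factory, $seq);
        $n++;
    }
    return $n;
}

# Only runs when invoked as: perl script.pl <fasta file> <blast database>
if (@ARGV == 2) {
    require Bio::SeqIO;
    require Bio::Tools::Run::StandAloneBlast;

    my ($fasta, $db) = @ARGV;
    my $in = Bio::SeqIO->new(-file => $fasta, -format => 'fasta');

    my $n = blast_each(
        $in,
        # Assumed parameter names; adjust to your BioPerl version.
        sub {
            Bio::Tools::Run::StandAloneBlast->new(
                program  => 'blastn',
                database => $db,
            );
        },
        sub {
            my ($factory, $seq) = @_;
            my $report = $factory->blastall($seq);
            # inspect $report here
        },
    );
    print "processed $n sequences\n";
}
```

The only point is that the factory is created inside the loop rather than
once before it; the rest is scaffolding.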

-jason


On Thu, 11 Oct 2001, Kun Zhang wrote:

> It turns out that nothing is wrong with SeqIO. StandAloneBlast was used
> within the loop to do some comparison with each sequence, and that's the
> problem. My program always stops after 1003 sequences have been processed.
> My solution was to write a very simple wrapper for the local blast
> instead. It looks very ugly compared with StandAloneBlast, but it works.
>
> Kun
>
> At 08:15 AM 10/9/2001 -0400, Jason Eric Stajich wrote:
> >Kun -
> >
> >I'm not aware of any times when we allocate tempfiles in the SeqIO system.
> >I regularly parse files with SeqIO that are 10k+ with no problem, so
> >perhaps you are doing something else in addition within these loops that
> >is allocating new tempfiles.
> >
> >The tempfile cleanup function that is part of File::Temp is often not
> >called until the program exits, so you have the confusing situation of
> >running out of space/free FH while the directory is empty when the program
> >exits.  We have worked around this by trying to register cleanup in the
> >object destructor, but I'm not sure what version of bioperl you are
> >running and whether or not that fix has been implemented properly.
> >
> >Happy to help chase it down, but it would be helpful if you could submit
> >this as a bug report and describe your version of perl, bioperl,
> >architecture, etc.  Is it possible for you to send your complete code so
> >we don't go chasing too far?
> >
> >-Jason
> >
> >On Mon, 8 Oct 2001, Kun Zhang wrote:
> >
> > > Hello! I'm using SeqIO to read DNA sequences of NCBI's unigene cluster,
> > > which is distributed as a big fasta file
> > > (ftp://ftp.ncbi.nlm.nih.gov/pub/schuler/unigene/Hs.seq.uniq.Z), and process
> > > every gene sequentially. I got the following error message when 1002 (or
> > > 1003) sequences have been processed.
> > >
> > > ==================================================================
> > > Error in tempfile() using /tmp/XXXXXXXXXX: Could not create temp file
> > > /tmp/eo0UwZVasW: Too many open files at
> > > /usr/lib/perl5/site_perl/5.6.0/Bio/Root/IO.pm line 41
> > >
> > > ==================================================================
> > >
> > > However, I checked the /tmp directory and found not a single temporary
> > > file. Is there any way to work around this problem other than chopping
> > > the fasta file into several smaller ones? Thanks!
> > >
> > > Kun Zhang
> > > Human Genetics Center
> > > UT-Houston Health Science Center
> > >
> >
> >--
> >Jason Stajich
> >Duke University
> >jason@cgt.mc.duke.edu
>

-- 
Jason Stajich
Duke University
jason@cgt.mc.duke.edu