[Bioperl-l] Processing large fasta sequences through SeqIO

Jason Stajich jason@chg.mc.duke.edu
Fri, 31 Aug 2001 11:31:22 -0400 (EDT)


On Fri, 31 Aug 2001, Josep Francesc Abril Ferrando wrote:

> Hi Jason,
> 
> > > Error in tempdir() using /tmp/XXXXXXXXXX: Could not create directory
> > > /tmp/Z0gD8R0rlB: Too many links at
> > > /usr/lib/perl5/site_perl/5.005//Bio/Root/IO.pm line 457
> >
> > Is your tmp dir really full of files/directories, or does it not have
> > enough space to hold all of the sequence data?  This seems like a
> > system problem.
> 
> Currently, "/tmp" is only ~150 MB and I have more than 1 GB of free hard
> disk space (on a PC box with 386 MB of RAM, Red Hat 6.2 with kernel
> version 2.2.14, and perl 5.6.1). Maybe it could be a permissions
> issue.
>
That seems strange, again.
I will cook up a test script for you in a minute.  In the meantime, can
you at least run the following?

% mkdir /tmp/me
% echo "I am great" > /tmp/me.txt
% rm -rf /tmp/me /tmp/me.txt
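
If it is easier, the same check can be wrapped in a small script.  This is
only a rough sketch: the function names (check_tmp_writable, count_subdirs)
and the probe file names are made up for illustration.  The second function
counts the subdirectories directly under /tmp, on the assumption (not
confirmed on your machine) that "Too many links" means mkdir was refused
because the parent directory already holds as many subdirectories as the
filesystem allows.

```shell
#!/bin/sh
# Sketch only: function names and probe names are hypothetical.

# Create, write to, and remove a probe directory and file under $1
# (default /tmp).  Prints "ok", or "failed: <step>" for the first step
# that fails.  Error messages from the commands themselves go to stderr.
check_tmp_writable() {
    dir=${1:-/tmp}
    probe="$dir/probe.$$"
    mkdir "$probe" || { echo "failed: mkdir"; return 1; }
    echo "I am great" > "$probe.txt" || { rmdir "$probe"; echo "failed: write"; return 1; }
    rm -rf "$probe" "$probe.txt" || { echo "failed: rm"; return 1; }
    echo "ok"
}

# Count the immediate subdirectories of $1 (default /tmp).
# -mindepth/-maxdepth are find extensions (GNU and BSD), not POSIX.
count_subdirs() {
    dir=${1:-/tmp}
    find "$dir" -mindepth 1 -maxdepth 1 -type d 2>/dev/null | wc -l | tr -d ' '
}
```

Running check_tmp_writable /tmp followed by count_subdirs /tmp would show
whether a plain mkdir works and how many leftover directories are sitting
in /tmp.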
 
> > Do you have File::Temp installed?  There is a known bug in the 0.7
> > release: if you do not have File::Temp installed, the application will
> > not clean up its tempdirs/tempfiles properly.  Installing File::Temp
> > will take care of that.
> 
> It is installed, and it is version 0.12. Do I have to include the
> corresponding "use File::Temp;" in the script? Maybe I have to ask
> our sysadmin to update both File::Temp and BioPerl.
> 
Nope, you don't need to include it; Bio::Root::IO does that for you.
We have tried to make the modules as simple as possible to use, and
I've never had the problems you describe.  File::Temp 0.12 is fine for sure.
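
One thing that might be worth ruling out: the error path mentions
site_perl/5.005 while you report perl 5.6.1, so more than one perl may be
installed, and File::Temp may not be visible to all of them.  That is only
a guess.  A rough way to check what a given perl finds (the function name
file_temp_version is made up for illustration):

```shell
#!/bin/sh
# Sketch only: the function name is hypothetical.

# Print the File::Temp version seen by the perl binary given as $1
# (default: the first "perl" on PATH), together with that perl's own
# version, or a note if the module cannot be loaded.
file_temp_version() {
    perl_bin=${1:-perl}
    "$perl_bin" -MFile::Temp \
        -e 'print "File::Temp ", $File::Temp::VERSION, " under perl ", $], "\n"' \
        2>/dev/null || echo "File::Temp not found for $perl_bin"
}
```

Calling it once with no argument and once with the full path of the perl
named on your script's #! line would show whether the two agree.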

I have access to a RH box so I'll see if I can duplicate any of the
problems. 

> > > If I look at the saved file, the sequence is OK (it has no more and no
> > > fewer nucleotides than expected, and they are in the correct order),
> > > but the file contains a lot of empty lines (or lines with just '>')
> > > after the end of the sequence. Any idea what could be wrong in the
> > > following script?
> >
> > Nothing obvious is jumping out right now by looking at your code -
> > How large are your files?
> 
> At the moment I am working with sequences around 50 Mbp long, but I
> would like to be able to scale up to 250 Mbp.
>
> > > Is that the right way to use "Bio::SeqIO" for processing large fasta
> > > files? Do I have to include "Bio::Seq::LargeSeq" and, if so, how can
> > > I do that?
> >
> > You could add the line
> > use Bio::Seq::LargeSeq;
> > just below --> use Bio::SeqIO <--
> > if you wanted, but the largefasta modules already include it, so it is
> > optional.
> 
> Well, I've run some tests, first adding "use Bio::Seq::LargeSeq" and
> then also "use File::Temp", and I got the same results (the same
> error/warning, where only the name of the temporary directory that
> cannot be created changes, and the same trailing extra lines).
> 
> Thanks again... Josep F.
> 
> ________________________________________
> 
>     Josep Francesc ABRIL FERRANDO
> 
> RESEARCH GROUP on BIOMEDICAL INFORMATICS
>         GENOME INFORMATICS LAB
>               IMIM - UPF
>           C/ Dr. Aiguader 80
>        08003 - Barcelona  (SPAIN)
> 
>     Ph:  +34 93 2211009 ext 2016
>     Fax: +34 93 2213237
> 
>     http://www1.imim.es/~jabril/