[Bioperl-l] Processing large fasta sequences through SeqIO

Josep Francesc Abril Ferrando jabril@imim.es
Thu, 30 Aug 2001 20:05:06 +0200


I need to work with chromosome-sized fasta sequences, and I was trying to run some Perl code using
BioPerl version 0.7 ("$Id:largefasta.pm,v 1.5.2.1$", which is the one currently installed on our
system). Since the "Bio::SeqIO::largefasta" documentation says that this module has to be accessed
through "Bio::SeqIO", I did not include that module directly in the program. I wrote a script that
basically reads the whole sequence, optionally processes it a little (e.g. reformatting sequence
lines of non-uniform length, when I build the input by joining many sequences under the same id),
and then saves the processed large sequence. It seems to work, but I get some strange results in
the saved file, together with the following error/warning:

Error in tempdir() using /tmp/XXXXXXXXXX: Could not create directory /tmp/Z0gD8R0rlB: Too many links
at /usr/lib/perl5/site_perl/5.005//Bio/Root/IO.pm line 457

If I look at the saved file, the sequence is OK (it has neither more nor fewer nucleotides than
expected, and they are in the correct order), but the file contains a lot of empty lines (or lines
containing just '>') after the end of the sequence. Any idea what could be wrong with the following
script?

---->8---->8---->8---->8---->8----

perl -ne 'BEGIN { print ">bigseq\n"; }
          $_ !~ /^>|^\s*$/o && print;' $INDIR/*.fa |
  perl -e '
    use Bio::Seq;
    use Bio::SeqIO;
    my $seqin  = Bio::SeqIO->new(-format => "largefasta", -fh => \*STDIN );
    my $seqout = Bio::SeqIO->new(-format => "largefasta", -fh => \*STDOUT);
    while (my $sequence = $seqin->next_seq()) {
        # do some checks/changes on substrings of the sequence here
        $seqout->write_seq($sequence);
    } # while
    exit(0);
  ' - > $OUTDIR/bigseq.fa

----8<----8<----8<----8<----8<----

Is that the right way to use "Bio::SeqIO" for processing large fasta files? Do I have to include
"Bio::Seq::LargeSeq" and, if so, how can I do that?
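For comparison, the joining-and-reformatting step on its own can be sketched in plain Perl, without
Bio::SeqIO. This is only a minimal sketch under stated assumptions: the subroutine name
rewrap_fasta is hypothetical, and the header "bigseq" and the 60-residue line width are arbitrary
choices. It streams the input line by line, so (as long as input lines are of ordinary length)
only a short buffer is held in memory rather than the whole chromosome.

```perl
#!/usr/bin/perl
# Minimal sketch (hypothetical helper, not part of BioPerl): concatenate the
# sequence lines of one or more fasta records under a single header and
# re-wrap them to a uniform line width, streaming instead of loading the
# whole sequence into memory.
use strict;
use warnings;

# Assumptions: header id and line width are arbitrary illustrative values.
my $id    = 'bigseq';
my $width = 60;

# Read fasta text from $in_fh, write one record with lines of $width residues to $out_fh.
sub rewrap_fasta {
    my ($in_fh, $out_fh, $id, $width) = @_;
    print {$out_fh} ">$id\n";
    my $buffer = '';
    while (my $line = <$in_fh>) {
        next if $line =~ /^>/ || $line =~ /^\s*$/;   # drop original headers and blank lines
        $line =~ s/\s+//g;                           # strip newline and stray whitespace
        $buffer .= $line;
        # Emit every complete line of $width residues; 4-arg substr removes it from the buffer.
        while (length($buffer) >= $width) {
            print {$out_fh} substr($buffer, 0, $width, ''), "\n";
        }
    }
    print {$out_fh} "$buffer\n" if length $buffer;   # last, possibly shorter, line
    return;
}

# Run as a script: read STDIN, write STDOUT (skipped when loaded with require).
rewrap_fasta(\*STDIN, \*STDOUT, $id, $width) unless caller;
```

Saved under a hypothetical name such as rewrap_fasta.pl, it would replace the first stage of the
pipeline, e.g. `cat $INDIR/*.fa | perl rewrap_fasta.pl > $OUTDIR/bigseq.fa`.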

Thanks for your attention... Josep F.
________________________________________

    Josep Francesc ABRIL FERRANDO

RESEARCH GROUP on BIOMEDICAL INFORMATICS
        GENOME INFORMATICS LAB
              IMIM - UPF
          C/ Dr. Aiguader 80
       08003 - Barcelona  (SPAIN)

    Ph:  +34 93 2211009 ext 2016
    Fax: +34 93 2213237

    http://www1.imim.es/~jabril/