[Bioperl-l] Parsing FASTA files into PrimarySeq objects

Florent Angly florent.angly at gmail.com
Mon Feb 15 22:13:42 UTC 2010


Hi all,

I am trying to reduce memory usage and speed up the reading of FASTA files
with the facilities BioPerl provides.

The first thing I noticed is that Bio::SeqIO::fasta returns Bio::Seq
objects rather than Bio::PrimarySeq objects. Bio::PrimarySeq objects are
lighter than Bio::Seq objects, so they are what I need. See the code below:
> use warnings;
> use strict;
> use Data::Dumper;
> use Bio::SeqIO;
> my $in = Bio::SeqIO->new(-fh=>\*DATA);
> my $seq = $in->next_seq;
> print Dumper($seq);
> __END__
> >seq1 a small test sequence
> ACGTACGACTACGACTAGCGCCATCAGC
It returns:
> $VAR1 = bless( {
>                  'primary_id' => 'seq1',
>                  'primary_seq' => bless( {
>                                            'display_id' => 'seq1',
>                                            'primary_id' => 'seq1',
>                                            'desc' => 'a small test sequence',
>                                            'seq' => 'ACGTACGACTACGACTAGCGCCATCAGC',
>                                            'alphabet' => 'dna'
>                                          }, 'Bio::PrimarySeq' )
>                }, 'Bio::Seq' );

In other words, what comes back is a Bio::Seq wrapping a Bio::PrimarySeq,
and I only need the Bio::PrimarySeq. The Bio::SeqIO documentation suggests
that, in theory, I can configure the sequence factory and sequence builder
to suit my needs, so I tried this:
> use warnings;
> use strict;
> use Data::Dumper;
> use Bio::SeqIO;
> my $in = Bio::SeqIO->new(-fh=>\*DATA);
> my $seqfactory = $in->sequence_factory; # Bio::Factory::SequenceFactoryI
> print "The factory is a ".ref($seqfactory)."\n";
> $seqfactory->type('Bio::PrimarySeq'); # gives an error
> my $seq = $in->next_seq;
> print Dumper($seq);
> __END__
> >seq1 a small test sequence
> ACGTACGACTACGACTAGCGCCATCAGC
This returns:
> The factory is a Bio::Seq::SeqFastaSpeedFactory
> Can't locate object method "type" via package "Bio::Seq::SeqFastaSpeedFactory" at ./seqbuilder_test_3.pl line 12, <DATA> line 1.
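As a stopgap, each record can be unwrapped as it is read so that only the Bio::PrimarySeq is kept. This minimal sketch assumes only that Bio::Seq's primary_seq accessor returns the wrapped object (the Dumper output above shows it stored under the 'primary_seq' key). It still builds a Bio::Seq wrapper for every record, so it saves retained memory but not parsing time. The FASTA text is read from an in-memory filehandle to keep the example self-contained.

```perl
use strict;
use warnings;
use Bio::SeqIO;

# Same test record as above, read through a scalar filehandle
# so that no input file is needed.
my $fasta = ">seq1 a small test sequence\nACGTACGACTACGACTAGCGCCATCAGC\n";
open my $fh, '<', \$fasta or die "Cannot open in-memory FASTA: $!";

my $in = Bio::SeqIO->new(-fh => $fh, -format => 'fasta');

my @primary;
while (my $seq = $in->next_seq) {
    # next_seq hands back a Bio::Seq; primary_seq returns the
    # Bio::PrimarySeq stored inside it, and only that is kept.
    push @primary, $seq->primary_seq;
}

# Print class, ID and sequence of each retained object.
printf "%s\t%s\t%s\n", ref($_), $_->display_id, $_->seq for @primary;
```

Because the Bio::PrimarySeq is held by reference, pushing it onto @primary keeps it alive after the Bio::Seq wrapper goes out of scope at the end of each loop iteration.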

According to Bio::Seq::SeqFastaSpeedFactory's documentation:
> If you want the factory to create Bio::Seq objects instead
> of the default Bio::PrimarySeq objects, use the -type parameter
First, Bio::PrimarySeq is supposed to be the default type, yet I get
Bio::Seq objects. Second, I cannot use -type to change the return type:
the factory has no type() method, so calling it dies with the error above.
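Another option, which I have not checked against the Bio::SeqIO::fasta source, is to replace the default factory with the generic Bio::Seq::SeqFactory, whose constructor takes a -type argument. This sketch assumes that the fasta parser passes its parsed fields (ID, description, sequence) to whatever factory is installed through the sequence_factory setter. If that holds, next_seq should return plain Bio::PrimarySeq objects with no Bio::Seq wrapper.

```perl
use strict;
use warnings;
use Bio::SeqIO;
use Bio::Seq::SeqFactory;

# Same test record as above, read through a scalar filehandle.
my $fasta = ">seq1 a small test sequence\nACGTACGACTACGACTAGCGCCATCAGC\n";
open my $fh, '<', \$fasta or die "Cannot open in-memory FASTA: $!";

my $in = Bio::SeqIO->new(-fh => $fh, -format => 'fasta');

# Swap the default Bio::Seq::SeqFastaSpeedFactory for the generic
# factory, telling it which class to build via -type.
# ASSUMPTION: the fasta parser defers object creation to this factory.
$in->sequence_factory(Bio::Seq::SeqFactory->new(-type => 'Bio::PrimarySeq'));

my @records;
while (my $seq = $in->next_seq) {
    push @records, $seq;
}

# Print class, ID and description of each record.
printf "%s\t%s\t%s\n", ref($_), $_->display_id, $_->desc for @records;
```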

Any ideas? Thanks,

Florent
