[Bioperl-l] Parsing FASTA files into PrimarySeq objects
Florent Angly
florent.angly at gmail.com
Mon Feb 15 22:13:42 UTC 2010
Hi all,
I am trying to reduce memory usage and speed up reading FASTA files using
the facilities provided by BioPerl.
The first thing I noticed is that when using Bio::SeqIO::fasta, the
objects returned are Bio::Seq, not Bio::PrimarySeq objects.
Bio::PrimarySeq sequences are lighter than Bio::Seq sequences, so it's
what I need. See code below:
> use warnings;
> use strict;
> use Data::Dumper;
> use Bio::SeqIO;
> my $in = Bio::SeqIO->new(-fh=>\*DATA);
> my $seq = $in->next_seq;
> print Dumper($seq);
> __END__
> >seq1 a small test sequence
> ACGTACGACTACGACTAGCGCCATCAGC
It returns:
> $VAR1 = bless( {
>     'primary_id' => 'seq1',
>     'primary_seq' => bless( {
>         'display_id' => 'seq1',
>         'primary_id' => 'seq1',
>         'desc' => 'a small test sequence',
>         'seq' => 'ACGTACGACTACGACTAGCGCCATCAGC',
>         'alphabet' => 'dna'
>     }, 'Bio::PrimarySeq' )
> }, 'Bio::Seq' );
Actually, we have a Bio::Seq containing a Bio::PrimarySeq. I really only
need the Bio::PrimarySeq. Looking at the documentation for Bio::SeqIO I
found that I could in theory adjust the sequence factory and sequence
builder to my liking... I tried this:
> use warnings;
> use strict;
> use Data::Dumper;
> use Bio::SeqIO;
> my $in = Bio::SeqIO->new(-fh=>\*DATA);
> my $seqfactory = $in->sequence_factory; # a Bio::Factory::SequenceFactoryI
> print "The factory is a ".ref($seqfactory)."\n";
> $seqfactory->type('Bio::PrimarySeq'); # gives an error
> my $seq = $in->next_seq;
> print Dumper($seq);
> __END__
> >seq1 a small test sequence
> ACGTACGACTACGACTAGCGCCATCAGC
This returns:
> The factory is a Bio::Seq::SeqFastaSpeedFactory
> Can't locate object method "type" via package
> "Bio::Seq::SeqFastaSpeedFactory" at ./seqbuilder_test_3.pl line 12,
> <DATA> line 1.
According to the documentation for Bio::Seq::SeqFastaSpeedFactory:
> If you want the factory to create Bio::Seq objects instead
> of the default Bio::PrimarySeq objects, use the -type parameter
So Bio::PrimarySeq should be the default type, yet in my case I get a
Bio::Seq. On top of that, I cannot change the return type: calling the
type() method on the factory fails with the error shown above.
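One possible workaround, given here only as an untested sketch: instead of calling type() on the default factory, replace the factory altogether. This assumes that Bio::SeqIO's sequence_factory method also works as a setter and that Bio::Seq::SeqFactory accepts a -type argument, as its documentation suggests. The in-memory filehandle stands in for the DATA section used above.

```perl
use strict;
use warnings;
use Bio::SeqIO;
use Bio::Seq::SeqFactory;

# Same test record as above, read from an in-memory filehandle
my $fasta = ">seq1 a small test sequence\nACGTACGACTACGACTAGCGCCATCAGC\n";
open my $fh, '<', \$fasta or die "Cannot open in-memory FASTA: $!";

my $in = Bio::SeqIO->new(-fh => $fh, -format => 'fasta');

# Assumption: sequence_factory() is a getter/setter, so the default
# Bio::Seq::SeqFastaSpeedFactory can be swapped for a generic factory
# configured to build Bio::PrimarySeq objects.
my $factory = Bio::Seq::SeqFactory->new(-type => 'Bio::PrimarySeq');
$in->sequence_factory($factory);

my $seq = $in->next_seq;
print "Got a ", ref($seq), " with id ", $seq->display_id, "\n";
```

If this behaves as assumed, the generic factory probably gives up whatever speed advantage Bio::Seq::SeqFastaSpeedFactory provides, so the memory saving would need to be weighed against parsing time.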
Any ideas??? Thanks,
Florent