[Bioperl-l] Parsing FASTA files into PrimarySeq objects
Jason Stajich
jason at bioperl.org
Tue Feb 16 00:03:36 UTC 2010
I don't think that part of the documentation was ever implemented -
the interface object doesn't specify a type method or a -type init
argument, even though the documentation says so. I'm not really sure
why not; unfortunately this was ages ago.
This particular factory definitely assumes you are building Bio::Seq
objects - you can try subclassing it and building your own to see if it
makes much of a difference in speed/memory - I would posit it won't make
a significant difference, but I'd be interested to see what you find.
Just make your own Bio::Seq::PrimarySeqFactory object based on
Bio::Seq::SeqFastaSpeedFactory, simplify that code so it doesn't build
the Bio::Seq object wrapper around the Bio::PrimarySeq, and run some
perf tests so we'll know whether it makes any difference here.
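
A minimal sketch of such a factory might look like the following. It is
untested and rests on a few assumptions: the package name
Bio::Seq::PrimarySeqFactory is only the name proposed above and is not a
module shipped with BioPerl, the factory goes through the public
Bio::PrimarySeq constructor rather than blessing a hash directly the way
the speed factory does, and it only handles the -seq, -id, -desc and
-alphabet arguments.

```perl
use strict;
use warnings;

# Hypothetical factory (name taken from the suggestion above; it is not
# part of BioPerl). It returns bare Bio::PrimarySeq objects.
package Bio::Seq::PrimarySeqFactory;
use base qw(Bio::Root::Root Bio::Factory::SequenceFactoryI);
use Bio::PrimarySeq;

# Build a Bio::PrimarySeq with no Bio::Seq wrapper around it.
# Assumption: only these four arguments matter for FASTA input.
sub create {
    my ($self, %param) = @_;
    return Bio::PrimarySeq->new(
        -seq      => $param{-seq},
        -id       => $param{-id},
        -desc     => $param{-desc},
        -alphabet => $param{-alphabet},
    );
}

package main;
use Bio::SeqIO;

my $in = Bio::SeqIO->new(-fh => \*DATA, -format => 'fasta');

# Replace the default factory before reading any records.
$in->sequence_factory(Bio::Seq::PrimarySeqFactory->new);

my $seq = $in->next_seq;
print "Got a ", ref($seq), " with id ", $seq->display_id, "\n";
__END__
>seq1 a small test sequence
ACGTACGACTACGACTAGCGCCATCAGC
```

Printing ref($seq) makes it easy to confirm which class the stream now
returns. For the perf tests, the core Benchmark module (cmpthese) can
time a read loop with and without the replacement factory. It may also
be worth trying Bio::Seq::SeqFactory->new(-type => 'Bio::PrimarySeq') in
the same sequence_factory call, since that generic factory appears to be
where the quoted -type documentation comes from.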
-jason
Florent Angly wrote:
> Hi all,
>
> I am trying to reduce memory usage and speed up reading FASTA files
> using the facilities provided by BioPerl.
>
> The first thing I noticed is that when using Bio::SeqIO::fasta, the
> objects returned are Bio::Seq, not Bio::PrimarySeq objects.
> Bio::PrimarySeq sequences are lighter than Bio::Seq sequences, so it's
> what I need. See code below:
>> use warnings;
>> use strict;
>> use Data::Dumper;
>> use Bio::SeqIO;
>> my $in = Bio::SeqIO->new(-fh=>\*DATA);
>> my $seq = $in->next_seq;
>> print Dumper($seq);
>> __END__
>> >seq1 a small test sequence
>> ACGTACGACTACGACTAGCGCCATCAGC
> It returns:
>> $VAR1 = bless( {
>>     'primary_id' => 'seq1',
>>     'primary_seq' => bless( {
>>         'display_id' => 'seq1',
>>         'primary_id' => 'seq1',
>>         'desc' => 'a small test sequence',
>>         'seq' => 'ACGTACGACTACGACTAGCGCCATCAGC',
>>         'alphabet' => 'dna'
>>     }, 'Bio::PrimarySeq' )
>> }, 'Bio::Seq' );
>
> Actually, we have a Bio::Seq containing a Bio::PrimarySeq. I really
> only need the Bio::PrimarySeq. Looking at the documentation for
> Bio::SeqIO I found that I could in theory adjust the sequence factory
> and sequence builder to my liking... I tried this:
>> use warnings;
>> use strict;
>> use Data::Dumper;
>> use Bio::SeqIO;
>> my $in = Bio::SeqIO->new(-fh=>\*DATA);
>> my $seqfactory = $in->sequence_factory; # a Bio::Factory::SequenceFactoryI
>> print "The factory is a ".ref($seqfactory)."\n";
>> $seqfactory->type('Bio::PrimarySeq'); # gives an error
>> my $seq = $in->next_seq;
>> print Dumper($seq);
>> __END__
>> >seq1 a small test sequence
>> ACGTACGACTACGACTAGCGCCATCAGC
> This returns:
>> The factory is a Bio::Seq::SeqFastaSpeedFactory
>> Can't locate object method "type" via package "Bio::Seq::SeqFastaSpeedFactory" at ./seqbuilder_test_3.pl line 12, <DATA> line 1.
>
> According to Bio::Seq::SeqFastaSpeedFactory's documentation:
>> If you want the factory to create Bio::Seq objects instead
>> of the default Bio::PrimarySeq objects, use the -type parameter
> So Bio::PrimarySeq should be the default type, even though in my case
> it does not seem to be. Also, I can't use the type method to change
> the return type: it errors.
>
> Any ideas??? Thanks,
>
> Florent