[Bioperl-l] Parsing FASTA files into PrimarySeq objects

Jason Stajich jason at bioperl.org
Tue Feb 16 00:03:36 UTC 2010


I don't think that part of the documentation was ever implemented - the
factory doesn't provide a type method or a -type init argument, even
though the documentation says it does. Not really sure why not; this was
ages ago, unfortunately.

This particular factory definitely assumes you are building Bio::Seq
objects - you can try subclassing it and building your own to see if it
makes much of a difference in speed/memory. I would posit it won't make a
significant difference, but I'd be interested to see what you find.

Just make your own Bio::Seq::PrimarySeqFactory object based on
Bio::Seq::SeqFastaSpeedFactory and simplify that code so it doesn't build
the Bio::Seq wrapper object around the Bio::PrimarySeq, then run some perf
tests so we'll know whether it makes any difference here.
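
As a rough starting point for those tests, here is a minimal sketch. Note
that it is not the simplified speed factory described above: it swaps in
the existing generic Bio::Seq::SequenceFactory, on the assumption that it
accepts -type => 'Bio::PrimarySeq' and that the reader's sequence_factory
accessor can be set after construction. The helper name read_primary_seqs
is made up for illustration.

```perl
use strict;
use warnings;
use Bio::SeqIO;
use Bio::Seq::SequenceFactory;

# Hypothetical helper: read every record from an open FASTA filehandle
# and return the sequence objects the reader produces.
sub read_primary_seqs {
    my ($fh) = @_;
    my $in = Bio::SeqIO->new(-fh => $fh, -format => 'fasta');

    # Assumption: replacing the default Bio::Seq::SeqFastaSpeedFactory with
    # the generic factory makes next_seq build bare Bio::PrimarySeq objects.
    $in->sequence_factory(
        Bio::Seq::SequenceFactory->new(-type => 'Bio::PrimarySeq')
    );

    my @seqs;
    while (my $seq = $in->next_seq) {
        push @seqs, $seq;
    }
    return @seqs;
}

# Same test record as in the original post, read from an in-memory handle.
my $fasta = ">seq1 a small test sequence\nACGTACGACTACGACTAGCGCCATCAGC\n";
open my $fh, '<', \$fasta or die "Cannot open in-memory FASTA: $!";
my @seqs = read_primary_seqs($fh);

# Report the class, id and length of each object that came back.
print ref($_), "\t", $_->display_id, "\t", $_->length, "\n" for @seqs;
```

The generic factory goes through the normal Bio::PrimarySeq constructor
rather than blessing hashes directly, so it may well be slower than the
speed factory even though the objects are lighter. Timing both on a large
file is the only way to know.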

-jason

Florent Angly wrote:
> Hi all,
>
> I am trying to reduce memory usage and speed up reading FASTA files
> using the facilities provided by BioPerl.
>
> The first thing I noticed is that when using Bio::SeqIO::fasta, the 
> objects returned are Bio::Seq, not Bio::PrimarySeq objects. 
> Bio::PrimarySeq sequences are lighter than Bio::Seq sequences, so it's 
> what I need. See code below:
>> use warnings;
>> use strict;
>> use Data::Dumper;
>> use Bio::SeqIO;
>> my $in = Bio::SeqIO->new(-fh=>\*DATA);
>> my $seq = $in->next_seq;
>> print Dumper($seq);
>> __END__
>> >seq1 a small test sequence
>> ACGTACGACTACGACTAGCGCCATCAGC
> It returns:
>> $VAR1 = bless( {
>>                  'primary_id' => 'seq1',
>>                  'primary_seq' => bless( {
>>                                            'display_id' => 'seq1',
>>                                            'primary_id' => 'seq1',
>>                                            'desc' => 'a small test sequence',
>>                                            'seq' => 'ACGTACGACTACGACTAGCGCCATCAGC',
>>                                            'alphabet' => 'dna'
>>                                          }, 'Bio::PrimarySeq' )
>>                }, 'Bio::Seq' ); 
>
> Actually, we have a Bio::Seq containing a Bio::PrimarySeq. I really 
> only need the Bio::PrimarySeq. Looking at the documentation for 
> Bio::SeqIO I found that I could in theory adjust the sequence factory 
> and sequence builder to my liking... I tried this:
>> use warnings;
>> use strict;
>> use Data::Dumper;
>> use Bio::SeqIO;
>> my $in = Bio::SeqIO->new(-fh=>\*DATA);
>> my $seqfactory = $in->sequence_factory; # Bio::Factory::ObjectBuilderI
>> print "The factory is a ".ref($seqfactory)."\n";
>> $seqfactory->type('Bio::PrimarySeq'); # gives an error
>> my $seq = $in->next_seq;
>> print Dumper($seq);
>> __END__
>> >seq1 a small test sequence
>> ACGTACGACTACGACTAGCGCCATCAGC 
> This returns:
>> The factory is a Bio::Seq::SeqFastaSpeedFactory
>> Can't locate object method "type" via package 
>> "Bio::Seq::SeqFastaSpeedFactory" at ./seqbuilder_test_3.pl line 12, 
>> <DATA> line 1. 
>
> According to Bio::Seq::SeqFastaSpeedFactory's documentation:
>> If you want the factory to create Bio::Seq objects instead
>> of the default Bio::PrimarySeq objects, use the -type parameter 
> So, PrimarySeq should be the default type, even though in my case it
> seems not to be. Second, I can't seem to use the type method to
> change the return type... It errors.
>
> Any ideas??? Thanks,
>
> Florent


