[Bioperl-l] SeqFeatureCollection issue

Jason Stajich jason at cgt.duhs.duke.edu
Thu Jul 15 14:17:01 EDT 2004


Well each feature has a reference to the original contig - this gets set
when a feature is added to a sequence in $seq->add_SeqFeature($f) (which
in turn calls attach_seq on the feature).

In memory this is fine since we're just talking references -
presumably Storable tries to follow the reference and store the
sequence as well when freeze is called.   I had some tests which instead
of using Storable used $feature->gff_string to "serialize" the feature
object - but this didn't seem to work so well and wouldn't of course allow
Bio::RangeI objects to also be passed in.
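
To illustrate the point about Storable following references, here is a
minimal sketch using only core Perl. The hashes are made-up stand-ins for
the contig and its features, not real Bio::SeqFeature objects, and the
sizes are arbitrary. The only name taken from this thread is the
'_gsf_seq' key. It assumes each feature is frozen on its own, which is
what the temp file size suggests but which I have not verified here.

```perl
use strict;
use warnings;
use Storable qw(freeze);

# Hypothetical stand-in for a contig: a plain hash with a 1 MB sequence.
my $contig = { id => 'contig1', seq => 'A' x 1_000_000 };

# Hypothetical stand-ins for features. Each one keeps a reference to the
# contig under '_gsf_seq', the key named in the workaround quoted below.
my @features;
for my $i (1 .. 5) {
    push @features, { start => $i * 100, end => $i * 100 + 50, _gsf_seq => $contig };
}

# Freeze each feature separately. Storable follows the reference, so every
# frozen string carries its own full copy of the sequence.
my $attached_bytes = 0;
$attached_bytes += length(freeze($_)) for @features;

# Drop the reference on each feature, then freeze again.
$_->{'_gsf_seq'} = undef for @features;
my $detached_bytes = 0;
$detached_bytes += length(freeze($_)) for @features;

printf "attached: %d bytes, detached: %d bytes\n",
    $attached_bytes, $detached_bytes;
```

With the reference in place the total grows with (number of features) x
(sequence length); once it is cleared, only the small feature hashes are
stored.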

Most of my tests had centered around building feature sets and reading
them in/out to GFF so I probably never saw this because I wasn't getting
my features from sequence objects.  Clearly a problem though if you are
experiencing the behavior you are seeing.
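
On the 38 MB to 18 GB question quoted below, a rough back-of-the-envelope
check using only the three numbers reported in this thread. The
interpretation is an assumption on my part: a GenBank flat file spends
part of its size on line numbering, spacing and annotation, so the raw
sequence should be somewhat smaller than the file itself.

```perl
use strict;
use warnings;

# Numbers as reported in the quoted messages.
my $genbank_bytes = 38_914_775;        # size of the GenBank file
my $index_bytes   = 18_520_702_976;    # size of the temp file
my $n_features    = 619;               # features added to the collection

# Average space the temp file spends per feature.
my $per_feature = $index_bytes / $n_features;

printf "bytes per feature: %.0f\n", $per_feature;
printf "fraction of GenBank file size: %.2f\n", $per_feature / $genbank_bytes;
```

That works out to just under 30 MB per feature, which is consistent with
(though it does not prove) one full copy of the contig sequence being
stored with every feature.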

Let me know if it works.

-jason
 On Thu, 15 Jul 2004, Wiepert, Mathieu wrote:

> Hi,
>
> Just a question on that large sequence that gets attached.  How many
> times *does* it get attached?  I am still wondering how 38 MB of data gets
> the hefty weight of an 18 GB file.
>
> I will try your suggestion, I think that will work,
>
> Thanks,
>
> -mat
>
> > -----Original Message-----
> > From: Jason Stajich [mailto:jason at cgt.duhs.duke.edu]
> > Sent: Thursday, July 15, 2004 9:19 AM
> > To: Wiepert, Mathieu
> > Cc: bioperl-l at portal.open-bio.org
> > Subject: RE: [Bioperl-l] SeqFeatureCollection issue
> >
> > I suspect it has something to do with freeze/thaw and the large attached
> > contig sequence which is also getting frozen for each feature.
> >
> > If you call
> > $feature->{'_gsf_seq'} = undef;
> > on each feature (sorry no one wrote an 'unattach_seq' method) before it
> > gets added that might help.
> >
> > -jason
> > On Thu, 15 Jul 2004, Wiepert, Mathieu wrote:
> >
> > > Hi,
> > >
> > > I did try that actually, that was the last thing I was doing, as I left
> > > last night.  I thought it was going to work, but it didn't get far
> > > before I got "Out of memory!" again.  It seems a contig with a file size
> > > of 38,914,775 bytes, which had 619 features of type mRNA, CDS, or gene,
> > > creates a temp file of 18,520,702,976 bytes.  So that's 38 MB to 18 GB.  Wow!
> > > Pulling a range out of that collection does take a bit of time too.
> > > Perhaps there is a better way to do this...
> > >
> > > I am just not sure where all the memory is getting eaten up, if you have
> > > an idea (large seq, something with that?) let me know.  I made the temp
> > > file get created in a place that I know can hold it at least, and it is
> > > working (though I have a 100mb file, I am afraid what that one will do)
> > >
> > > Thanks for the input though,
> > >
> > > -mat
> > >
> > > > -----Original Message-----
> > > > From: Jason Stajich [mailto:jason at cgt.duhs.duke.edu]
> > > > Sent: Wednesday, July 14, 2004 8:58 PM
> > > > To: Wiepert, Mathieu
> > > > Cc: bioperl-l at portal.open-bio.org
> > > > Subject: Re: [Bioperl-l] SeqFeatureCollection issue
> > > >
> > > > Did you try passing in a filename with -file => '/tmp/myfile.idx'?
> > > >
> > > >  Title   : new
> > > >  Usage   : my $obj = new Bio::SeqFeature::Collection();
> > > >  Function: Builds a new Bio::SeqFeature::Collection object
> > > >  Returns : Bio::SeqFeature::Collection
> > > >  Args    :
> > > >
> > > >            -minbin        minimum value to use for binning
> > > >                           (default is 100,000,000)
> > > >            -maxbin        maximum value to use for binning
> > > >                           (default is 1,000)
> > > >            -file          filename to store/read the
> > > >                           BTREE from rather than an in-memory
> > > >                           structure
> > > >                           (default is false and in-memory).
> > > >            -keep          boolean, will not remove index file on
> > > >                           object destruction.
> > > >            -features      Array ref of features to add initially
> > > >
> > > > No idea where the /var/tmp is going...
> > > >
> > > > This *should* work but I haven't done much with it/used it for quite a
> > > > while so I don't know if there are things that don't work...
> > > >
> > > > If it is really not working you can always go the -> to GFF -> load in
> > > > Bio::DB::GFF route using the in-memory adaptor - I wanted to merge the
> > > > interface so that SeqFeature::Collection used the same method names but
> > > > never got around to it.  If someone is using the module it would be a
> > > > nice thing to have...
> > > >
> > > > -jason
> > > >
> > > >
> > > > On Wed, 14 Jul 2004, Wiepert, Mathieu wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > >
> > > > >
> > > > > I was trying to use the seqfeature collection to pull out features
> > > > > in a range I was interested in.  I have two problems (maybe because
> > > > > I am loading features from a contig?)
> > > > >
> > > > >
> > > > >
> > > > > In the first case, I ended up running out of space on /var/tmp.  We
> > > > > have about .5 GB there I am.  Code is like
> > > > >
> > > > > my $in1  = Bio::SeqIO->new('-file' => $contig.'.gb' , '-format' => 'Genbank');
> > > > >
> > > > > while (my $seq = $in1->next_seq) {
> > > > >             my @feat_ary = $seq->get_SeqFeatures();
> > > > >             my $col = new Bio::SeqFeature::Collection();
> > > > >             # add these features to the object
> > > > >             my $totaladded = $col->add_features(\@feat_ary);
> > > > > }
> > > > >
> > > > >
> > > > >
> > > > > I end up filling /var/tmp to 100%, as I said.
> > > > >
> > > > >
> > > > >
> > > > > So I tried to initialize the collection like
> > > > >
> > > > > my $col = new Bio::SeqFeature::Collection(-features => \@feat_ary);
> > > > >
> > > > >
> > > > >
> > > > > but that gave an error:
> > > > >
> > > > >
> > > > >
> > > > > "Can't call method "put" on an undefined value at
> > > > > /usr/local/biotools/perl/5.8.2/lib/site_perl/5.8.2/Bio/SeqFeature/Collection.pm
> > > > > line 225, <GEN0> line 95373."
> > > > >
> > > > >
> > > > >
> > > > > That looked like the _btree wasn't set, but not sure.
> > > > >
> > > > >
> > > > >
> > > > > I am told we have plenty of room in /tmp, so I should change my tmp
> > > > > dir, but the docs said that it was all in memory by default, is that
> > > > > not the case?  I tried to export a new tmp dir, but that didn't fix
> > > > > the problem...
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > -mat
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > > > --
> > > > Jason Stajich
> > > > Duke University
> > > > jason at cgt.mc.duke.edu
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at portal.open-bio.org
> > > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> > >
> >
> > --
> > Jason Stajich
> > Duke University
> > jason at cgt.mc.duke.edu
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

