[Bioperl-l] Zombie processes with GenBank get_Seq_by_acc()

Dave Messina David.Messina at sbc.su.se
Thu May 12 07:22:04 UTC 2011


Thanks for posting the code, O'car.

I haven't tried running it, but one thing that occurs to me is that on line
18, where you create your Bio::DB::GenBank object, there's no 'my', so those
objects may be hanging around longer than you expect. The zombies may be
those objects' forked processes for connecting to GenBank, similar to what
Kevin said earlier.

But that's all speculation.
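
For background, the general mechanism behind a <defunct> entry is independent
of BioPerl: a child that has exited stays in the process table until its
parent waits on it. Below is a minimal sketch of that in Python (not the Perl
from the pastebin). The names spawn_children and reap are made up for
illustration, and nothing here is taken from Bio::DB::GenBank's internals.

```python
import os


def spawn_children(n):
    """Fork n children that exit immediately and return their PIDs.

    Until the parent waits on them, these children show up in ps
    as <defunct> (zombies).
    """
    pids = []
    for _ in range(n):
        pid = os.fork()
        if pid == 0:
            # Child process: exit at once without running parent cleanup code.
            os._exit(0)
        pids.append(pid)
    return pids


def reap(pids):
    """Wait on each child so the kernel can drop its process-table entry.

    Returns a dict mapping PID to exit code (None if the child did not
    exit normally, e.g. it was killed by a signal).
    """
    codes = {}
    for pid in pids:
        _, status = os.waitpid(pid, 0)
        codes[pid] = os.WEXITSTATUS(status) if os.WIFEXITED(status) else None
    return codes
```

Calling reap(spawn_children(3)) on a Unix-like system leaves no zombies
behind. Skipping the reap step is what lets the <defunct> list grow for as
long as the parent keeps running.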

The other thing I'll say as a general comment is that fetching thousands of
records from GenBank this way, one accession per request (or really fetching
any more than 100), is inefficient and probably slow as well.

Instead you might try using NCBI's own fetching tools for GenBank, the
EUtilities, either directly or via the two BioPerl interfaces to them
(Bio::DB::EUtilities and Bio::DB::SoapEUtilities).
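
To illustrate the "directly" route, here is a rough sketch in Python (stdlib
only) that groups accessions into batches and builds one efetch URL per batch
instead of one request per record. The batch size of 100, the nuccore
database, and the gb/text return settings are assumptions chosen for the
example, and batches and efetch_url are hypothetical helper names. The code
only builds URLs; it does not contact NCBI.

```python
from urllib.parse import urlencode

# Base URL of the NCBI EUtilities efetch service.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def batches(items, size):
    """Yield consecutive slices of items, each at most size long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def efetch_url(accessions, db="nuccore", rettype="gb", retmode="text"):
    """Build a single efetch URL requesting all given accessions at once.

    The db, rettype and retmode defaults are assumptions for this sketch
    (GenBank flat-file records from the nucleotide database).
    """
    params = {
        "db": db,
        "id": ",".join(accessions),  # comma-separated list of IDs
        "rettype": rettype,
        "retmode": retmode,
    }
    # safe="," keeps the commas in the ID list unescaped and readable.
    return EFETCH_URL + "?" + urlencode(params, safe=",")


def efetch_urls(accessions, batch_size=100):
    """Return one efetch URL per batch of accessions."""
    return [efetch_url(chunk) for chunk in batches(accessions, batch_size)]
```

With 8000 accessions and a batch size of 100, efetch_urls returns 80 URLs
rather than 8000 separate lookups. Each URL can then be handed to whatever
HTTP client you prefer.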


Dave




On Thu, May 12, 2011 at 00:16, O'car Johann Campos
<ocarnorsk138 at gmail.com> wrote:

> Kevin Brown <Kevin.M.Brown <at> asu.edu> writes:
>
> >
> > Seeing your code might help. They could just be forked children waiting
> > for the script to exit before they go away or something else forked them
> > and failed to clean up before quitting.
> >
> > > -----Original Message-----
> > > From: bioperl-l-bounces <at> lists.open-bio.org [mailto:bioperl-l-
> > > bounces <at> lists.open-bio.org] On Behalf Of Belaid MOA
> > > Sent: Tuesday, May 10, 2011 1:41 PM
> > > To: bioperl-l <at> lists.open-bio.org
> > > Subject: [Bioperl-l] Zombie processes with GenBank get_Seq_by_acc()
> > >
> > >
> > > Dear All,
> > >   I installed the latest version of BioPerl and ran a very simple
> > > script: it goes through each line (an ACC) in a file and uses GenBank
> > > to get the sequence via get_Seq_by_acc(). A look at ps shows that a lot
> > > of zombie processes (with the <defunct> attribute) were created. The
> > > list grows over time.
> > > This means that Bio::DB::GenBank is forking and not cleaning up the
> > > children. Is there any way to overcome the issue? Moreover, is there
> > > any way to specify the number of forked processes?
> > >
> > > With best regards,
> > > -Belaid.
> > >
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l <at> lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
>
> Kevin, Belaid, All:
>
>        Recently I've been working with GenBank too and ran some code to get
> GenBank info from accession numbers. I also noticed the weird behavior and
> the zombie processes in the background. Although the code works and I get
> the info I need, there are a lot of zombie processes in the background, and
> running this task with, for example, 8000 accession numbers would be a pain
> where you all know. I'm not a BioPerl expert and I may be missing some
> piece of code to quit the forked children, as may be happening to Belaid,
> so this is my piece of code in case anyone has an idea why this is
> happening.
>
> http://pastebin.com/Zq88cpwb
>
> Thanks in advance.
> Cheers.
>
> O'car.
>
>
>
>
