[Bioperl-l] bioperl pulls, xml parsers push, and things get complicated

Stephen Gordon Lenk slenk at emich.edu
Wed Jul 19 16:04:16 EDT 2006


Hi,

I have found that POE fails to execute a periodic task after 32
iterations in a Perl thread. The failure is consistent on both XP and
OSX. If I knew how to write up a defect for Perl I would do so (hint:
how is this done? I'm *not* asking for an RTFM answer), and I am
probably remiss for not having done it. I was going to write messages
to a Controller Area Network (CAN) to control automotive widgets from
Perl, but I wound up using a C executable (piped to from Perl) with
its own threads to do this. Oh yes, I believe that bio lab systems can
be done this way as well.
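
The "piped to from Perl" arrangement mentioned above could look roughly like the following sketch. It is not the actual harness: the helper here is a child perl standing in for the C executable, and the message strings and the "ack" reply format are made up for illustration. It uses only the core IPC::Open2 and IO::Handle modules.

```perl
use strict;
use warnings;
use IPC::Open2;
use IO::Handle;

# Stand-in for the external C helper (an assumption for this sketch):
# a child perl that acknowledges every line it receives on STDIN.
my @helper = ($^X, '-e',
    '$| = 1; while (<STDIN>) { chomp; print "ack $_\n" }');

# open2 returns the child's pid and gives us a read and a write handle.
my $pid = open2(my $from_helper, my $to_helper, @helper);
$to_helper->autoflush(1);    # send each message immediately

my @replies;
for my $msg ('id=0x101 data=01', 'id=0x102 data=ff') {   # made-up messages
    print {$to_helper} "$msg\n";          # one message per line
    my $reply = <$from_helper>;           # wait for the helper's answer
    die "helper closed the pipe early" unless defined $reply;
    chomp $reply;
    push @replies, $reply;
}

close $to_helper;      # EOF tells the helper to exit
close $from_helper;
waitpid($pid, 0);      # reap the child

print "$_\n" for @replies;
```

Keeping the threads inside the helper means the Perl side only ever does line-oriented reads and writes on two handles.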

But ... POE is really neat if you think in state machine terms. I have 
an alternate architecture for my test harness (Perlizer) that would 
use POE to run tests with CAN and GPIB.

Steve Lenk

----- Original Message -----
From: Robert Buels <rmb32 at cornell.edu>
Date: Wednesday, July 19, 2006 3:30 pm
Subject: Re: [Bioperl-l] bioperl pulls, xml parsers push, and things 
get complicated

> POE is a really neat thing, I didn't know about it before.  Something
> tells me, however, that I would have trouble convincing people to
> install POE as a dependency for a genomethreader output parser.  ;-)
> I hope I'll have the opportunity to use it sometime.
> 
> For the curious, here's a nice intro to POE:
> http://perl.com/pub/a/2001/01/poe.html
> And the POE main site:
> http://poe.perl.org/
> 
> Rob
> 
> aaron.j.mackey at GSK.COM wrote:
> > There are 3rd generation XML "Pull" parsers (also called "StAX" for
> > Streaming API for XML), but they seem to still be stuck in Java land
> > (e.g. "MXP1").
> >
> > You could probably use POE to set up a state machine that used
> > XML::Twig to "push" units of XML content onto a stack, to be read by
> > your "next_*" pull method (where the XML::Twig push "stalled" until
> > the "next_*" method was called, and vice versa).
> >
> > -Aaron
> >
> > bioperl-l-bounces at lists.open-bio.org wrote on 07/18/2006 08:06:02 PM:
> >
> >> Hi all,
> >>
> >> Here's a kind of abstract question about Bioperl and XML parsing:
> >>
> >> I'm thinking about writing a bioperl parser for genomethreader XML,
> >> and I'm sort of mulling over the 'impedance mismatch' between the way
> >> bioperl Bio::*IO::* modules work and the way all of the current XML
> >> parsers work.  Bioperl uses a 'pull' model, where every time you want
> >> a new chunk of stuff, you call $io_object->next_thing.  All the XML
> >> parsers (including XML::SAX, XML::Parser::PerlSAX and XML::Twig) use
> >> a 'push' model, where every time they parse a chunk, they call _your_
> >> code, usually via a subroutine reference you've given to the XML
> >> parser when you start it up.
> >>
> >> From what I can tell, current Bioperl IO modules that parse XML are
> >> using push parsers to parse the whole document, holding stuff in
> >> memory, then spoon-feeding it in chunks to the calling program when
> >> it calls next_*().  This is fine until the input XML gets really big,
> >> in which case you can quickly run out of memory.
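
The buffer-then-spoon-feed pattern described above can be sketched as follows. This is only an illustration, not code from any Bioperl module: push_parse is a hypothetical stand-in for a real push parser, and BufferedReader and next_thing are made-up names.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a push parser such as XML::Twig: it calls
# $handler once per record rather than returning records to the caller.
sub push_parse {
    my ($records, $handler) = @_;
    $handler->($_) for @$records;
}

package BufferedReader;

sub new {
    my ($class, $records) = @_;
    my @buffer;
    # The whole input is parsed up front, so every record is held in
    # @buffer before the caller has asked for anything.
    main::push_parse($records, sub { push @buffer, $_[0] });
    return bless { buffer => \@buffer }, $class;
}

# Pull-style accessor in the spirit of next_*(): hands out one buffered
# record per call and returns undef once the buffer is empty.
sub next_thing {
    my ($self) = @_;
    return shift @{ $self->{buffer} };
}

package main;

my $io = BufferedReader->new([qw(a b c)]);
my @out;
while (defined(my $thing = $io->next_thing)) {
    push @out, $thing;
}
print "@out\n";
```

Memory use grows with the number of records because new() does not return until the parser has finished, which is the problem with very large inputs.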
> >>
> >> Does anybody have good ideas for nice, robust ways of writing a
> >> bioperl IO module for really big input XML files?  There don't seem
> >> to be any perl pull parsers for XML.  All I've dug up so far would be
> >> having the XML push parser running in a different thread or process,
> >> pushing chunks of data into a pipe or similar structure that blocks
> >> the progress of the push parser until the pulling bioperl code wants
> >> the next piece of data, but there are plenty of ugly issues with
> >> that, whether one were to use perl threads for it (aaagh!) or fork
> >> and push some kind of intermediate format through a pipe or socket
> >> between the two processes (eek!).
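
A minimal sketch of the fork-and-pipe variant described above, using only core Perl. It makes several assumptions: push_parse is a hypothetical stand-in for the real push parser, the "intermediate format" is simply one record per line (so records must not contain newlines), and make_pull_iterator is a made-up name. fork is emulated on Windows, so behaviour there may differ.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical stand-in for a push parser such as XML::Twig: it calls
# $handler once per record rather than returning records to the caller.
sub push_parse {
    my ($records, $handler) = @_;
    $handler->($_) for @$records;
}

# Run the push parser in a child process and return a pull-style
# iterator: each call yields the next record, or undef at end of input.
sub make_pull_iterator {
    my ($records) = @_;
    pipe(my $reader, my $writer) or die "pipe failed: $!";
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {                  # child: push side
        close $reader;
        # Writes block once the OS pipe buffer is full, so the parser
        # can only run a bounded distance ahead of the reader.
        push_parse($records, sub { print {$writer} "$_[0]\n" });
        close $writer;                # flushes remaining output
        POSIX::_exit(0);              # skip END blocks inherited from parent
    }

    close $writer;                    # parent: pull side
    my $done = 0;
    return sub {
        return if $done;
        my $line = <$reader>;
        if (!defined $line) {         # EOF: clean up and reap the child
            $done = 1;
            close $reader;
            waitpid($pid, 0);
            return;
        }
        chomp $line;
        return $line;
    };
}

my $next_chunk = make_pull_iterator([qw(hit1 hit2 hit3)]);
my @seen;
while (defined(my $chunk = $next_chunk->())) {
    push @seen, $chunk;
}
print join(',', @seen), "\n";
```

The child is not stalled on every single record; it stalls only when the pipe buffer fills, which still bounds the memory held between the two processes.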
> >>
> >> So, um, if you've read this far, do you have any ideas?
> >>
> >> Rob
> >>
> >> -- 
> >> Robert Buels
> >> SGN Bioinformatics Analyst
> >> 252A Emerson Hall, Cornell University
> >> Ithaca, NY  14853
> >> Tel: 503-889-8539
> >> rmb32 at cornell.edu
> >> http://www.sgn.cornell.edu
> >>
> >>
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>
> 

