[Bioperl-l] GenBankParser comparison to bioperl parser

John Kloss jkloss@sapiens.wustl.edu
Thu, 12 Sep 2002 11:22:30 -0700


You guys all get up so early.  It's almost like you're in a different
time zone :)

First: if you guys could take off the gishlab cc, I'd appreciate it.
The guys in my lab know I've been working on this for a long time and I
just thought they'd like to know what some results were, that's why I
cc'd them before.  I don't think they wanted all of last night's
discussion.  My fault.  I should've just walked over to their desks and
said "Hey, look what I did".  

Second: I'm happy to maintain my own code.  I wasn't looking to hand it
off.  I use this parser every day so it has to work.  If you all think it
would be a nice addition to a bioperl-util or contrib directory, that's
fine with me.  I'll field bug and usage issues.

Third: if you'd like to use the code as an underlying base for the
Bio::SeqIO genbank format, that's fine with me, too.  I'll still maintain it and
I'd be willing to at least try and integrate it into the bioperl
framework.  I actually like building parsers so it doesn't seem much of
a burden to me.

And here are the results of the GenBankParser against Lincoln Stein's
Boulder::Genbank:

GenBankParser

real    1m0.093s
user    0m55.430s
sys     0m6.820s

Boulder::Genbank

real    13m21.597s
user    12m56.850s
sys     0m27.180s

Note, the times are slightly faster because I had to gunzip the
gbbct1.seq file first and tear out the first 11 lines of the form

GBBCT1.SEQ           Genetic Sequence Data Bank
                           August 15 2002

                NCBI-GenBank Flat File Release 131.0

                        Bacterial Sequences (Part 1)

   19841 loci,   103009067 bases, from    19841 reported sequences

because Lincoln's parser was dying on that with a substr out of range at
Boulder::Genbank.pm line 853.  After I removed the GenBank header cruft,
it worked fine.
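
The header-stripping step described above can be sketched as follows.  This
is a minimal Python illustration, not the commands used for the timings
above.  The function names and file paths are hypothetical, and instead of
counting a fixed number of header lines it assumes the first record starts
at the first line beginning with LOCUS.

```python
import gzip


def strip_release_header(lines):
    """Yield lines starting at the first one that begins with 'LOCUS'.

    Everything before that line is treated as release-header text and is
    dropped. If no LOCUS line is found, nothing is yielded.
    """
    seen_locus = False
    for line in lines:
        if not seen_locus and line.startswith("LOCUS"):
            seen_locus = True
        if seen_locus:
            yield line


def gunzip_without_header(src_path, dst_path):
    """Decompress src_path (gzip) to dst_path, minus the release header.

    Both paths are hypothetical; e.g. 'gbbct1.seq.gz' -> 'gbbct1.seq'.
    """
    with gzip.open(src_path, "rt", encoding="ascii", errors="replace") as src, \
            open(dst_path, "w", encoding="ascii", errors="replace") as dst:
        dst.writelines(strip_release_header(src))
```

Keying on the LOCUS line rather than a line count means the sketch does not
depend on the header being exactly the same length in every release file.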

Lincoln, I've idolized you since I first learned how to code perl so
this is a really big moment for me :)

	John Kloss.

-----Original Message-----
From: bioperl-l-admin@bioperl.org [mailto:bioperl-l-admin@bioperl.org]
On Behalf Of Ewan Birney
Sent: Thursday, September 12, 2002 6:26 AM
To: Lincoln Stein
Cc: Elia Stupka; Ian Korf; John Kloss; bioperl-l@bioperl.org;
gishlab@species.wustl.edu
Subject: Re: [Bioperl-l] GenBankParser comparison to bioperl parser


On Thu, 12 Sep 2002, Lincoln Stein wrote:

> A separate repository is also fine with me, but I prefer Bioperl-contrib,
> because it should not just be for utility code, and nicely echoes the
> "contrib" directory of the X Windows Consortium code distribution.
> 
> I'll put Boulder into a Bioperl-contrib if there is one.


Deal. John --- sounds good to you?


> 
> Lincoln
> 
> On Thursday 12 September 2002 7:20 am, Elia Stupka wrote:
> > > I like the bioperl-util repository - "for professionals only" I think it
> > > keeps diversity without freaking newbies out and we can trade code.
> >
> > Rightie-ho, easiest way out, for some silly reason I thought nobody would
> > like that...
> >
> > Elia
> >
> > ********************************
> > * http://www.fugu-sg.org/~elia *
> > * tel:    +65 6874 1467        *
> > * mobile: +65 9030 7613        *
> > * fax:    +65 6779 1117        *
> > ********************************
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l@bioperl.org
> > http://bioperl.org/mailman/listinfo/bioperl-l
> 

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>. 
-----------------------------------------------------------------
