[Bioperl-l] new directions

Aaron J Mackey ajm6q@virginia.edu
Wed, 7 Mar 2001 19:02:27 -0500 (EST)


On Wed, 7 Mar 2001, Jason Stajich wrote:

>  o Fasta parsing.  We should find a way to support this, either with a
>    formal grammar or just some perl code.

I've been promising this for a long time now.  Unfortunately, my thesis
committee feels it's a waste of time (or not a worthy enough project for a
PhD - of course I'm in a microbiology department, so it's not surprising).

My first efforts were hard-ball regular expressions.  What I came up with
resembled Bio::Tools::Blast, i.e. I would be the only one in the world who
could deal with it, and the spaghetti-ness of it was unbelievable.  I
quickly found that it was unmaintainable, even for me (during a period when
Bill Pearson was making formatting changes to the FASTA output seemingly
daily).  I threw up my hands because I was spending more time fixing my
parser than analyzing results, and I knew that to get the data I needed
at the time, I just had to use grep.

Recently I've begun to think about a parser written via a formal grammar
(under which I believe most, if not all, of the FASTA output variations
can be expressed).  Of course choosing a grammar requires me to ask the
same questions others have asked: do we care whether we can parse a
streaming, non-seekable pipe, or can we live with just parsing files (or
creating temp files from pipes to get around this)?
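
To make the temp-file workaround concrete, here is a rough sketch of the
idea.  It is written in Python purely for illustration and is not BioPerl
code; the helper name spool_to_seekable and the 1 MB in-memory threshold
are assumptions of mine.  It copies a non-seekable stream into a temporary
file so that a parser needing seek() or backtracking can then treat it like
an ordinary file.

```python
import shutil
import tempfile


def spool_to_seekable(stream, max_in_memory=1 << 20):
    """Return a seekable binary file object holding the contents of `stream`.

    Hypothetical helper: if `stream` is already seekable it is returned
    unchanged; otherwise its bytes are copied into a temporary file that
    stays in memory until it grows past `max_in_memory` bytes.
    """
    try:
        if stream.seekable():
            return stream
    except AttributeError:
        # Object has no seekable() method; assume it cannot seek.
        pass

    spooled = tempfile.SpooledTemporaryFile(max_size=max_in_memory, mode="w+b")
    shutil.copyfileobj(stream, spooled)  # reads the pipe to EOF
    spooled.seek(0)                      # rewind so the parser starts at the top
    return spooled
```

The cost is that nothing can be parsed until the whole pipe has been read,
which is exactly the trade-off in the question above: a grammar that only
ever needs to look forward would not need this step at all.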

So, if there's someone who wants to tackle this one, I'm more than willing
to provide guidance or give feedback.  Otherwise, I'll try to force myself
sometime soon to deal with it (if nothing else because Bill and I would
like to provide XML output from FASTA itself, and would rather not do it
from within the C code, but via a filter - if it could be written using
BioPerl, and therefore coupled to successful BioPerl parsing, all the
better).  Words of encouragement and/or expressions of need are always
appreciated (sometimes I feel like no one else in the world ever uses
FASTA - at least not in the ways I do).

-Aaron

-- 
 o ~   ~   ~   ~   ~   ~  o
/ Aaron J Mackey           \
\  Dr. Pearson Laboratory  /
 \ University of Virginia  \
 /  (804) 924-2821          \
 \  amackey@virginia.edu    /
  o ~   ~   ~   ~   ~   ~  o