[Biojava-l] Parsing circular sequences
Keith James
kdj@sanger.ac.uk
Tue, 12 Nov 2002 10:04:18 GMT
>>>>> "Matthew" == Matthew Pocock <matthew_pocock@yahoo.co.uk> writes:
Matthew> Is there any will to replace the current monolithic
Matthew> parsers for embl/genbank/swissprot et al. with modular
Matthew> event-based parsers based upon tag-value? If we did this
Matthew> then the location parsing module can just listen for
Matthew> sequence length events. I really have no idea how the
Matthew> performance of the two approaches would compare, but I'm
Matthew> willing to help with writing the tag-value embl parser
Matthew> and benchmarking the result.
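As a rough illustration of the proposal quoted above, here is a minimal
sketch of a tag-value parser that fans events out to listeners, with a
location module that picks up the sequence length from the ID line before
it sees any feature lines. Everything in it is hypothetical: the names
TagValueSketch, TagValueListener, LocationListener and lengthFromId are
made up for this example and are not BioJava classes, and it assumes the
old-style EMBL ID line ending in "NNNN BP." and handles only plain
"start..end" locations.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: none of these types are part of BioJava.
class TagValueSketch {

    /** Receives parse events; each module implements only what it needs. */
    interface TagValueListener {
        void startRecord();
        void tagValue(String tag, String value);
        void endRecord();
    }

    /** Splits EMBL-style lines into (tag, value) events for every listener. */
    static void parse(List<String> lines, List<? extends TagValueListener> listeners) {
        boolean inRecord = false;
        for (String line : lines) {
            if (line.startsWith("//")) {            // record terminator
                if (inRecord) for (TagValueListener l : listeners) l.endRecord();
                inRecord = false;
                continue;
            }
            if (line.length() < 2) continue;        // skip blank lines
            if (!inRecord) {
                for (TagValueListener l : listeners) l.startRecord();
                inRecord = true;
            }
            String tag = line.substring(0, 2);      // two-letter line code
            String value = line.length() > 5 ? line.substring(5).trim() : "";
            for (TagValueListener l : listeners) l.tagValue(tag, value);
        }
    }

    private static final Pattern ID_LENGTH = Pattern.compile("(\\d+)\\s+BP\\.\\s*$");

    /** Returns the length from an ID value ending in "NNNN BP.", or -1 if absent. */
    static int lengthFromId(String idValue) {
        Matcher m = ID_LENGTH.matcher(idValue);
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }

    /** Learns the length from ID, then checks simple "a..b" locations against it. */
    static class LocationListener implements TagValueListener {
        int seqLength = -1;
        final List<String> outOfRange = new ArrayList<>();

        public void startRecord() { seqLength = -1; }

        public void tagValue(String tag, String value) {
            if (tag.equals("ID")) {
                seqLength = lengthFromId(value);    // the "sequence length event"
            } else if (tag.equals("FT")) {
                String[] f = value.split("\\s+");   // e.g. ["CDS", "100..200"]
                if (f.length == 2 && f[1].matches("\\d+\\.\\.\\d+")) {
                    int end = Integer.parseInt(f[1].substring(f[1].indexOf("..") + 2));
                    if (seqLength >= 0 && end > seqLength) outOfRange.add(value);
                }
            }
        }

        public void endRecord() { }
    }
}
```

Calling TagValueSketch.parse(lines, listeners) with a LocationListener in
the list is enough to drive it. A feature whose end coordinate runs past
the length is only recorded in outOfRange here; what a real location
parser should do with it on a circular sequence is left open.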
I started on a hybrid EMBL parser which combined tag-value and
JFlex/CUP for the feature table, but gave it up for more interesting
things. (It was a real drag trying to get conflicts in the feature
table BNF to resolve and then there are the syntax errors in the DB
itself.)
I'd help with this. I'm messing with the same thing in Lisp, so it
would be an interesting exercise. (Dammit! I *swore* I'd never do
another EMBL parser!)
Keith
--
- Keith James <kdj@sanger.ac.uk> bioinformatics programming support -
- Pathogen Sequencing Unit, The Wellcome Trust Sanger Institute, UK -