[Bioperl-l] Entrez Gene ASN parsers

Hilmar Lapp hlapp at gmx.net
Sat Mar 12 23:54:14 EST 2005


On Saturday, March 12, 2005, at 08:12  PM, Liu, Mingyi wrote:

>
> My parser does to NCBI's ASN.1 EntrezGene file what an XML parser does 
> to a yet-to-exist XML-formatted EntrezGene file (or better than it, if 
> NCBI decides to code Entrez Gene in the XML format that Eutils 
> provide).

This is apparently what they will be doing, or at least that is my 
understanding. The discomforting thing is that it has already taken 
them so long to come up with that supposedly little tool. In fact, 
apparently the reason they're still maintaining the LocusLink download 
is that they haven't been able to provide the off-line tool yet. 
That's what they told me in response to an inquiry. From Monday on, 
though, they'll remove C. elegans and fruitfly from LL_tmpl. Not good.

> And it performs better than XML parsers.

Actually, even an expat-based XML parser would be orders of 
magnitude slower than your regexp-based one.

The question is how safe your regexps are against unexpected input, 
such as an escaped quote, or a curly brace that is part of a string 
rather than the end of an entity, or whatever else might confuse them.

Maybe in ASN.1 this isn't a big deal? I just have too little knowledge 
about ASN.1 to make any judgment here.
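
To make the concern concrete, here is a minimal sketch in Python. It is not 
the parser under discussion. It assumes the ASN.1 text convention that 
strings are double-quoted and an embedded quote is written as a doubled 
quote (""); the sample record and the function names are made up for 
illustration. The idea is to match whole strings as single tokens first, so 
that a brace inside a string is never counted as structure.

```python
import re

# Assumption: strings are double-quoted, and an embedded quote is doubled ("").
# Alternatives, in order: a whole quoted string, one structural character,
# or a bare word (identifier, number, keyword).
TOKEN = re.compile(r'"(?:[^"]|"")*"|[{},]|[^\s{},"]+')


def tokenize(text):
    """Split ASN.1-like text into string, brace, comma and bare-word tokens."""
    return TOKEN.findall(text)


def unquote(token):
    """Strip the surrounding quotes and collapse doubled quotes."""
    return token[1:-1].replace('""', '"')


def braces_balanced(text):
    """Check brace nesting using only structural brace tokens."""
    depth = 0
    for tok in tokenize(text):
        if tok == "{":
            depth += 1
        elif tok == "}":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


# Hypothetical record: the description holds a brace and doubled quotes.
sample = 'gene { locus "A2M", desc "odd } brace and ""quoted"" word" }'

tokens = tokenize(sample)
naive_open = sample.count("{")    # counts characters, ignoring strings
naive_close = sample.count("}")   # also counts the brace inside the string
```

Counting brace characters directly gives one opening and two closing braces 
for this sample, while the token-based check treats the record as balanced. 
Whether real Entrez Gene data ever contains such strings is a separate 
question that this sketch does not answer.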

>
> So I really don't think there's any need for XML file from NCBI.

Yeah, I actually started to change my mind w.r.t. waiting for the XML 
format.

	-hilmar
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------



