[Bioperl-l] Error reporting/Validation implemented

Stefan Kirov skirov at utk.edu
Tue Mar 15 16:46:13 EST 2005


Mingyi,
A few things:
I used your parser to produce Bioperl objects from some of the 
high-level features and compared it to what I have. Your parser is 
considerably faster (about twice as fast), but it is still hard to tell, 
as I am descending further into the hierarchy with mine. At the same 
time, I don't think the difference will vanish, so I will start building 
on top of your parser to produce Bioperl objects. I am not sure exactly 
how I am going to deal with the relationships that are necessary, but 
I'll deal with that when I finish everything else.
By the way, it took 9 minutes on a 64-bit Xeon 3.4 GHz, even with 
Bioperl object construction, on the whole Homo_sapiens ASN file. The 
data that went into the objects was the general description of the genes 
(symbol, name, summary, etc.) and the organism description, but none of 
the truly big parts. 
Unfortunately, I am leaving tomorrow for a conference, so I will have 
more next week at the earliest. Thanks for sharing the code!
Stefan

Mingyi Liu wrote:

> Hi, there,
>
> I just implemented basic error reporting and validation in my Entrez 
> Gene parser in Perl (the regex version). Validation catches all 
> non-conforming data, while error reporting gives the line number, the 
> error type, and the first 20 (customizable) characters of the 
> offending data. The line number could be incorrect if the format 
> resulted in an exception, which is hard to deal with for 
> ASN.1-formatted data, although easy for XML parsers.
> The parser did of course slow down, but I'd say it would still beat 
> most parsers hands down. The full human genome now takes a bit over 
> 12 minutes instead of 11 minutes to process on one Intel Xeon 2.4 GHz 
> CPU. So I don't think my parser's speed has much to do with 
> performing validation or not.
>
> I also communicated with Stefan Kirov, and it turns out the dead 
> entries and 0-sized (should be 1-sized) arrays were simply related to 
> data trimming options. So far, so good.
>
> If anyone is interested, check it out at 
> http://www.sourceforge.net/projects/egparser/.
>
> Regards,
>
> Mingyi
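
The error reporting described in the quoted message (line number, error 
type, and the first 20 customizable characters of the offending data) 
could be sketched roughly as below. This is not code from the egparser 
project, which is written in Perl; it is a minimal Python illustration 
that only checks brace balance in ASN.1-like text. The names ParseError, 
check_braces and snippet_len are hypothetical, and the sketch ignores 
braces that appear inside quoted strings.

```python
from dataclasses import dataclass


@dataclass
class ParseError:
    line: int      # 1-based line number where the problem was detected
    kind: str      # short error type label
    snippet: str   # leading characters of the offending data


def check_braces(text, snippet_len=20):
    """Report unbalanced braces in ASN.1-like text (illustrative only).

    Assumption: braces inside quoted strings are not treated specially,
    which a real validator would have to handle.
    """
    errors = []
    pending = []  # (line number, rest of line) for each '{' not yet closed
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line):
            if ch == "{":
                pending.append((lineno, line[col:]))
            elif ch == "}":
                if pending:
                    pending.pop()
                else:
                    errors.append(
                        ParseError(lineno, "unmatched '}'",
                                   line[col:col + snippet_len]))
    # Anything still pending was opened but never closed.
    for lineno, rest in pending:
        errors.append(ParseError(lineno, "unclosed '{'", rest[:snippet_len]))
    return errors
```

Calling check_braces(data, snippet_len=40) would report longer excerpts. 
Here the line number is known because the text is scanned line by line; 
a parser that only notices the problem later, for instance through an 
exception, may no longer know it, which is the caveat raised in the 
quoted message.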
