[Bioperl-l] Re: entrezgene binary ASN

Michael Seewald mseewald at gmail.com
Fri Sep 30 14:46:21 EDT 2005


Hi Mingyi,

I am not sure what you mean. The piping in my example already worked
nicely. Did it not work for you?

With respect to gene2xml failures: this is nothing the module has to care
about. It *might* check for correct ASN.1 syntax, but that is as far as I
would go. Otherwise, I would just make sure that any errors gene2xml
throws are caught and passed on. It is the duty of the module and/or the
person running the script to watch STDERR output!
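
To illustrate "caught and passed on", here is a minimal sketch, in Python
rather than Perl and purely for illustration. It reads a converter's output
through a pipe and turns a non-zero exit status into an exception that
carries the child's STDERR. The function name stream_converted and the
stand-in child command are hypothetical; this is not code from
Bio::ASN1::EntrezGene, and the real gene2xml command line is deliberately
not shown.

```python
import subprocess
import sys
import tempfile


def stream_converted(cmd):
    """Yield output lines of an external converter; raise if it fails.

    `cmd` is the converter's full argument list (hypothetical here;
    substitute the real gunzip/gene2xml invocation).
    """
    # STDERR goes to a temporary file so a chatty child cannot block
    # on a full pipe while we are still reading its STDOUT.
    with tempfile.TemporaryFile(mode="w+") as err:
        proc = subprocess.Popen(
            cmd, stdout=subprocess.PIPE, stderr=err, text=True
        )
        with proc.stdout:
            for line in proc.stdout:
                yield line
        status = proc.wait()
        if status != 0:
            err.seek(0)
            raise RuntimeError(
                f"{cmd[0]} exited with status {status}: {err.read().strip()}"
            )


if __name__ == "__main__":
    # Stand-in child process that fails, in place of a real converter.
    failing = [
        sys.executable,
        "-c",
        "import sys; sys.stderr.write('choked'); sys.exit(3)",
    ]
    try:
        for line in stream_converted(failing):
            sys.stdout.write(line)
    except RuntimeError as exc:
        print("caught:", exc, file=sys.stderr)
```

Note that lines already yielded before the failure have been handed to the
caller, so the caller still has to decide what to do with partial output.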

With respect to the indexing: again, I do not think this would break
anything. Both gunzipping and transforming with gene2xml are transparent to
the module, so the index need not care about them. The indexer should
recognize, however, when the index has to be rebuilt. (This is something
that some Bioperl modules have problems with, AFAICR.)
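
One simple way to recognize a stale index is to compare modification times.
The following is a rough sketch in Python, not a description of how any
Bioperl indexer works; the function name index_is_stale and the file names
in the usage comment are made up for illustration.

```python
import os


def index_is_stale(source_path, index_path):
    """Return True if the index is missing or older than its source file."""
    if not os.path.exists(index_path):
        return True
    # Assumption: a source file modified after the index was written
    # means the stored offsets can no longer be trusted.
    return os.path.getmtime(index_path) < os.path.getmtime(source_path)


# Usage (hypothetical file names):
#     if index_is_stale("genes.asn", "genes.idx"):
#         ...rebuild the index before retrieving entries...
```

Modification times are only a heuristic; storing the source file's size or
a checksum in the index would be a stricter check.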

With respect to disk I/O: piping is definitely a time-saver, as more and
more of us are running multi-CPU machines.

Just my 2p,
Michael


On 9/30/05, Mingyi Liu <mingyi.liu at gpc-biotech.com> wrote:
>
> I was half way through adding the support for pipe in
> Bio::ASN1::EntrezGene before I realized that this is not a good
> solution. The problem I have with the pipe thing is that it merely
> added more troubles and did not really save anything.
>
> I mean, one superficial advantage of using pipe directly would be that
> you don't need to first launch gene2xml. But 1. Nobody needs to
> manually launch gene2xml. In any shell/perl script that does the
> automatic download of the NCBI binary ASN files, just add a line to
> launch gene2xml right after download. 2. Having EntrezGene module deal
> with it transparently would force it to deal with multiple failure
> possibilities (no gene2xml installed? gene2xml choked? ...), let alone
> hassles of changing syntax in input_file. Simply put, it's not worth it.
>
> Another proposed advantage is saving disk I/O. In a sense it does (the
> gzipped binary files are much smaller), but that does not necessarily
> lead to shorter processing time, since the time gene2xml spends doing its
> work on the fly should be counted as well. Not to mention what happens if
> gene2xml chokes for whatever reason.
>
> A major disadvantage of using a pipe would be doing any sort of seeking
> operation on the file - the performance would be abysmal. For indexing
> and indexed entry retrieval, one simply has to do the pre-conversion of
> those binary gzipped files.
>
> As such, I feel there are compelling reasons for one to first convert the
> binary gzip files to text files, then use the existing Bioperl modules
> to parse, index, and retrieve. Any further input or discussion on the
> matter is welcome!
>
> Thanks,
>
> Mingyi
>
> Michael Seewald wrote:
>
> >Hi Stefan,
> >
> >There are ways to capture these errors. Perl exception handling might
> >be a way to do it.
> >
> >On the other hand: Wouldn't incomplete .gz downloads throw an error
> >right away? I have to check (but can't right now).
> >
> >Michael
> >
>

--
Dr. Michael Seewald
Bioinformatics
Bayer HealthCare AG


