[Bioperl-l] Entrez Gene and bioperl-db

Sun Jan 9 17:01:02 EST 2005

I meant that the information about a single gene is spread across
several Entrez Gene files, so parsing them all at once would mean
keeping a lot of data in memory, especially since the entries are not
necessarily in the same order in each file. For instance, gene2unigene
is ordered by UniGene identifier and gene2accession is not. Adding the
UniGene info to all entries in one fell swoop would therefore seem to
require keeping the entries either in memory or in some indexed file.
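
To make the in-memory option concrete, here is a rough sketch (in Python
rather than Perl, just for brevity) that first indexes the UniGene
clusters by gene id and then streams through the accession file once.
The column layout is a simplified assumption, not the real file format,
and the sample ids, the constants and the function names
build_unigene_index and annotate are all made up for illustration.

```python
import io
from collections import defaultdict

# Hypothetical sample data. The layout (tab-delimited, '#' header line,
# these particular columns) is an assumption made for this sketch.
GENE2UNIGENE = "#GeneID\tUniGene_cluster\n20\tXx.100\n10\tXx.200\n10\tXx.300\n"
GENE2ACCESSION = (
    "#tax_id\tGeneID\tRNA_accession\n"
    "9606\t10\tACC_A.1\n"
    "9606\t20\tACC_B.1\n"
    "9606\t30\tACC_C.1\n"
)


def data_rows(handle):
    """Yield the tab-separated fields of each non-empty, non-comment line."""
    for line in handle:
        line = line.rstrip("\n")
        if line and not line.startswith("#"):
            yield line.split("\t")


def build_unigene_index(handle):
    """Map each gene id to the list of UniGene clusters recorded for it."""
    index = defaultdict(list)
    for gene_id, cluster in data_rows(handle):
        index[gene_id].append(cluster)
    return index


def annotate(accession_handle, index):
    """Stream the accession file once, attaching clusters by lookup."""
    for _tax_id, gene_id, accession in data_rows(accession_handle):
        # .get() avoids creating empty entries for genes without clusters.
        yield gene_id, accession, index.get(gene_id, [])


if __name__ == "__main__":
    index = build_unigene_index(io.StringIO(GENE2UNIGENE))
    for record in annotate(io.StringIO(GENE2ACCESSION), index):
        print(record)
```

Only the smaller file has to be held in memory here; the order of the
entries in either file no longer matters. If even that index were too
large, the dictionary could presumably be replaced by an on-disk
key-value store, which is the "indexed file" variant.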

In contrast, the ontology files you mention are more independent of one
another, so there is no particular difficulty in combining flat files
for the three subontologies.

I am starting to think that it might make the most sense to concentrate
on the ASN.1 files. I think it should be reasonably simple to parse
these with a kind of recursive descent strategy, either using some CPAN
modules or, perhaps better, a hand-rolled parser. So far I have not seen
any modules that look like great candidates for lexing the ASN.1 text
(ideas, anyone?).
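
To give an idea of what I mean by lexing plus recursive descent, here is
a minimal sketch (again in Python for brevity). It handles only a small
subset of the ASN.1 text notation that I am assuming here: braces,
commas, quoted strings, integers, and "name value" pairs. The sample
record, its field values, and the names tokenize and Parser are
invented for illustration; this is not a complete ASN.1 parser.

```python
import re

# One alternative per token kind; exactly one named group matches per token.
TOKEN_RE = re.compile(r'''
    \s*(?:
        (?P<assign>::=)
      | (?P<lbrace>\{)
      | (?P<rbrace>\})
      | (?P<comma>,)
      | "(?P<string>(?:[^"]|"")*)"
      | (?P<number>-?\d+)
      | (?P<ident>[A-Za-z][A-Za-z0-9-]*)
    )''', re.VERBOSE)


def tokenize(text):
    """Yield (kind, value) pairs for the simplified notation."""
    text = text.rstrip()
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected input at offset {pos}")
        pos = m.end()
        kind = m.lastgroup
        value = m.group(kind)
        if kind == "string":
            value = value.replace('""', '"')  # doubled quote = literal quote
        elif kind == "number":
            value = int(value)
        yield kind, value


class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.pos = 0

    def peek(self):
        if self.pos < len(self.tokens):
            return self.tokens[self.pos]
        return ("eof", None)

    def take(self, kind):
        found, value = self.peek()
        if found != kind:
            raise SyntaxError(f"expected {kind}, got {found}")
        self.pos += 1
        return value

    def parse_record(self):
        # record := ident '::=' value
        type_name = self.take("ident")
        self.take("assign")
        return type_name, self.parse_value()

    def parse_value(self):
        # value := block | string | number | ident
        kind, value = self.peek()
        if kind == "lbrace":
            return self.parse_block()
        if kind in ("string", "number", "ident"):
            self.pos += 1
            return value
        raise SyntaxError(f"unexpected {kind}")

    def parse_block(self):
        # block := '{' [ item (',' item)* ] '}'
        self.take("lbrace")
        items = []
        while self.peek()[0] != "rbrace":
            items.append(self.parse_item())
            if self.peek()[0] != "comma":
                break
            self.pos += 1
        self.take("rbrace")
        return items

    def parse_item(self):
        # item := ident value | value
        kind, value = self.peek()
        if kind == "ident":
            self.pos += 1
            if self.peek()[0] in ("comma", "rbrace"):
                return value  # bare identifier, e.g. an enumerated value
            return (value, self.parse_value())
        return self.parse_value()


SAMPLE = '''
Entrezgene ::= {
  track-info { geneid 1234 , status live } ,
  gene { locus "ABC1" , syn { "FOO" , "BAR" } }
}
'''

if __name__ == "__main__":
    print(Parser(tokenize(SAMPLE)).parse_record())
```

Each grammar rule maps onto one method, and nested blocks fall out of
parse_value calling parse_block recursively. The real files contain
constructs this sketch ignores, so it would need extending before it
could read actual Entrez Gene records.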

-peter

On Sat, 2005-01-08 at 19:09, Hilmar Lapp wrote:
> On Thursday, January 6, 2005, at 10:51  PM, Peter Robinson wrote:
> 
> > On the other hand, parsing multiple Entrez Gene files at once
> > in order to synthesize various forms of information about an Entrez
> > Gene id seemed to depart from the style of the rest of Bio::SeqIO code.
> 
> I don't think so at all. It only appears so because most other formats 
> happen to come in a single file. The OntologyIO GO parser e.g. takes 
> any number of files.
> 
> 	-hilmar
-- 
Peter N. Robinson
peter.robinson at t-online.de
peter.robinson at charite.de
http://www.charite.de/ch/medgen/robinson/