[BioPython] UniGene parser

Jeffrey Chang jchang@smi.stanford.edu
Tue, 23 Jul 2002 16:31:59 -0700


On Tue, Jul 23, 2002 at 06:07:02PM -0700, Cayte wrote:
> > For accessing LocusLink, and maybe also UniGene, I would
> > recommend downloading the whole database in ASCII flat-file
> > format and then parsing the flat files. In my opinion
> > it is much easier to write parsers for these
> > flat files than for any HTML generated primarily for human
> > readers.
> >
>   The full file is 21 MB and takes over an hour to download to my Win98
> machine. Presumably the size of these databases is exploding, so I wonder
> if this is appropriate for desktop environments.  What do others think?

It's appropriate for biopython, which is used in many different types
of environments.  We already have code for iterating over, parsing, and
indexing MEDLINE (40 GB), GenBank (25+ GB), etc.  I think Peter's
request to handle the flat text files directly from NCBI is
reasonable.
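
To make the size concern concrete, here is a rough sketch of how a large
flat file can be read one record at a time, so memory use depends on the
largest record rather than on the file size.  It is not the biopython
parser.  It assumes records are terminated by a line containing only "//"
and that each line starts with a tag followed by whitespace, as in the
UniGene flat files; the name iter_records and the sample data are made up
for illustration.

```python
import io


def iter_records(handle, terminator="//"):
    """Yield each record as a list of (tag, value) pairs.

    Hypothetical helper, not part of biopython. Only the current record
    is held in memory, so the file can be far larger than RAM.
    """
    record = []
    for line in handle:
        stripped = line.strip()
        if stripped == terminator:
            # End of record: hand it to the caller and start a new one.
            if record:
                yield record
            record = []
        elif stripped:
            # Split "TAG   value" at the first space; tags may repeat,
            # so keep an ordered list of pairs rather than a dict.
            tag, _, value = stripped.partition(" ")
            record.append((tag, value.strip()))
    if record:
        # Tolerate a final record with no terminator line.
        yield record


# Made-up sample data standing in for a downloaded flat file.
sample = io.StringIO(
    "ID          Hs.1\n"
    "TITLE       example cluster one\n"
    "//\n"
    "ID          Hs.2\n"
    "TITLE       example cluster two\n"
    "//\n"
)
ids = [dict(rec)["ID"] for rec in iter_records(sample)]
```

With a real download, the StringIO object would be replaced by an open
file handle; the loop itself stays the same.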

Jeff