[BioPython] UniGene parser

Sagar Damle sagar@caltech.edu
Wed, 17 Jul 2002 07:34:25 -0700


Hi Peter,

> For accessing LocusLink and maybe also for UniGene I would
> recommend to download the whole database in ASCII flatfile
> format, and then parsing the flat files. In my opinion
> it is much easier to write parsers for these
> flatfiles, than for any HTML generated primarily for human
> readers.

This seems like a good idea, but my own attempt at parsing the UniGene/LocusLink flatfiles (like LL_tmpl) makes me worry that these files are just too large to parse every time I need information.  Might it be even better to parse them once and store the results in a local searchable database?  I think the people at the GO Consortium have done this with their GO annotations, but I've never seen anything similar made available for UniGene/LocusLink at NCBI.  Going to the website seemed like the shortest path to a solution.
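
A minimal sketch of the parse-once, query-many idea, using Python's standard sqlite3 module. Everything specific here is an assumption made for illustration and should be checked against the README: that each record opens with a line starting with ">>", that fields are "KEY: value" lines, and that the keys are named LOCUSID, OFFICIAL_SYMBOL and OFFICIAL_GENE_NAME. The function and table names (iter_records, build_index, lookup_symbol, locus) are hypothetical.

```python
import sqlite3


def iter_records(lines):
    """Yield one dict per record; a line starting with '>>' opens a record (assumed layout)."""
    record = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">>"):
            if record is not None:
                yield record
            record = {}
        elif record is not None and ": " in line:
            key, value = line.split(": ", 1)
            # Keep only the first value seen for a repeated key.
            record.setdefault(key, value)
    if record is not None:
        yield record


def build_index(lines, db_path=":memory:"):
    """Parse the flatfile once and load three fields into an indexed SQLite table."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS locus "
        "(locus_id INTEGER PRIMARY KEY, symbol TEXT, description TEXT)"
    )
    conn.execute("CREATE INDEX IF NOT EXISTS locus_symbol ON locus (symbol)")
    rows = (
        (int(r["LOCUSID"]), r.get("OFFICIAL_SYMBOL"), r.get("OFFICIAL_GENE_NAME"))
        for r in iter_records(lines)
        if "LOCUSID" in r  # skip records without an ID
    )
    conn.executemany("INSERT OR REPLACE INTO locus VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def lookup_symbol(conn, symbol):
    """Return all (locus_id, symbol, description) rows matching a gene symbol."""
    query = "SELECT locus_id, symbol, description FROM locus WHERE symbol = ?"
    return conn.execute(query, (symbol,)).fetchall()
```

For a real file one would pass an open file handle and an on-disk path, e.g. build_index(open("LL_tmpl"), "locuslink.db"), so the large file is read only once and later lookups go through the index rather than a fresh parse.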

Thoughts, anyone?  I'm not really a programmer, just a scripter, so I may be way off base here.
 
sagar



On Wed, 17 Jul 2002 10:19:56 +0200
Peter Slickers <piet@clondiag.com> wrote:

> Cayte wrote:
> > 
> >   I just did some experiments with LocusLink files and when I strip out the
> > html tags very little information is left.
> 
> Indeed, a LocusLink record contains only a few data fields, namely
> 
>     locusID        (Number)
>     symbol         (alphanumerical code, => genecards )
>     description    (text)
> 
> Furthermore, there is a list of related GenBank accessions for each LocusLink record.
> 
> 
> > For this reason I think I should use the same approach as UniGene.  Have you
> > checked out Record in UniGene? Is this what you want?
> > 
> 
> For accessing LocusLink and maybe also for UniGene I would
> recommend to download the whole database in ASCII flatfile
> format, and then parsing the flat files. In my opinion
> it is much easier to write parsers for these
> flatfiles, than for any HTML generated primarily for human
> readers.
> 
> 
> UniGene by FTP:
> 
>   ftp://ftp.ncbi.nih.gov/repository/UniGene/ 
>   ftp://ftp.ncbi.nih.gov/repository/UniGene/README 
> 
> 
> 
> LocusLink by ftp:
> 
>   ftp://ftp.ncbi.nih.gov/refseq/LocusLink/
>   ftp://ftp.ncbi.nih.gov/refseq/LocusLink/README
> 
> 
> 
> Peter
> -------------------------------------------------------------------
> Peter Slickers                             piet@clondiag.com
> Clondiag Chip Technologies                 http://www.clondiag.com/
> Löbstedter Str. 105
> 07749 Jena
> Germany
> 
> Fon:  03641/5947-65                        Fax:  03641/5947-20
> -------------------------------------------------------------------