[BioPython] UniGene parser

Wed, 17 Jul 2002 10:19:56 +0200

Cayte wrote:
> 
>   I just did some experiments with LocusLink files and when I strip out the
> html tags very little information is left.

Indeed, a LocusLink record contains only a few data fields, namely

    locusID        (Number)
    symbol         (alphanumeric code, => GeneCards)
    description    (text)

Furthermore, each LocusLink record has a list of related GenBank accessions.
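
To make the shape of such a record concrete, here is a minimal sketch in
Python. The class name LocusLinkRecord and its attribute names are my own
assumptions for illustration, not an existing Biopython API, and the values
in the usage line are placeholders rather than real database entries.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LocusLinkRecord:
    """Hypothetical container for the fields listed above."""
    locus_id: int        # locusID (number)
    symbol: str          # alphanumeric gene symbol
    description: str     # free-text description
    # Related GenBank accessions; empty list when none are known.
    accessions: List[str] = field(default_factory=list)


# Placeholder values only, not taken from a real LocusLink entry.
record = LocusLinkRecord(
    locus_id=1,
    symbol="EXAMPLE1",
    description="placeholder description",
    accessions=["ACC0001", "ACC0002"],
)
```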

> For this reason I think I should use the same approach as UniGene.  Have you
> checked out Record in
> Unigene? Is this what you want?
> 

For accessing LocusLink, and perhaps UniGene as well, I would
recommend downloading the whole database in ASCII flat-file
format and then parsing the flat files. In my opinion
it is much easier to write parsers for these
flat files than for HTML generated primarily for human
readers.
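
As a rough illustration of why flat files are easy to handle, here is a
small Python sketch of a line-oriented parser. It assumes a layout in which
every record starts with a line beginning with ">>" and the remaining lines
have the form "KEY: value". That layout, the function name parse_flatfile,
and the sample data are assumptions made for this example; the README files
on the FTP server describe the actual formats and should be checked first.

```python
import io
from typing import Dict, Iterator, List, TextIO


def parse_flatfile(handle: TextIO,
                   record_start: str = ">>") -> Iterator[Dict[str, List[str]]]:
    """Yield one dict per record, mapping each key to a list of values.

    Assumed layout (hypothetical): a line starting with `record_start`
    opens a new record; all other lines look like "KEY: value".
    """
    record = None
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(record_start):
            if record is not None:
                yield record          # emit the record just finished
            record = {}
            continue
        if record is None or ":" not in line:
            continue                  # skip headers and malformed lines
        key, _, value = line.partition(":")
        # Keys may repeat (e.g. several accessions), so collect lists.
        record.setdefault(key.strip(), []).append(value.strip())
    if record is not None:
        yield record                  # emit the last record


# Placeholder data in the assumed layout, not real database content.
sample = io.StringIO(
    ">>1\n"
    "LOCUSID: 1\n"
    "SYMBOL: EXAMPLE1\n"
    "ACCESSION: ACC0001\n"
    "ACCESSION: ACC0002\n"
    ">>2\n"
    "LOCUSID: 2\n"
    "SYMBOL: EXAMPLE2\n"
)
records = list(parse_flatfile(sample))
```

Because the function is a generator, a large download can be read one
record at a time without loading the whole file into memory.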

UniGene by FTP:

  ftp://ftp.ncbi.nih.gov/repository/UniGene/ 
  ftp://ftp.ncbi.nih.gov/repository/UniGene/README 

LocusLink by FTP:

  ftp://ftp.ncbi.nih.gov/refseq/LocusLink/
  ftp://ftp.ncbi.nih.gov/refseq/LocusLink/README

Peter
-------------------------------------------------------------------
Peter Slickers                             piet@clondiag.com
Clondiag Chip Technologies                 http://www.clondiag.com/
Löbstedter Str. 105
07749 Jena
Germany

Phone: 03641/5947-65                       Fax:  03641/5947-20
-------------------------------------------------------------------