[BioPython] LocusLink to Gene transition at NCBI

Joanne Adamkewicz jadamkew at exelixis.com
Mon Jan 24 18:32:33 EST 2005


Hi all,

The biotech company I work for, Exelixis, is a heavy user of the data provided by NCBI's LocusLink service.  We ftp the files (generally whitespace-delimited flat files) and parse them into gdbm indices, with a Python wrapper on top for local use.
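For anyone who hasn't built this kind of pipeline, here is a minimal sketch of the flat-file-to-dbm approach, not our actual code. It assumes a simple hypothetical layout where each line starts with a record key (e.g. a LocusLink ID) followed by whitespace-separated fields. The names `build_index` and `LocusIndex` are made up for illustration. It uses Python's generic `dbm` module, which picks whatever backend is installed; `dbm.gnu` can be substituted if you specifically want gdbm.

```python
import dbm
import os
import tempfile


def build_index(flat_path, index_path):
    """Parse a whitespace-delimited flat file into a dbm index.

    Hypothetical layout: the first field on each line is the key,
    and the remaining fields are stored joined by tabs.
    """
    # "n" always creates a fresh, empty database
    with open(flat_path) as src, dbm.open(index_path, "n") as db:
        for line in src:
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            db[fields[0]] = "\t".join(fields[1:])


class LocusIndex:
    """Thin read-only Python wrapper over the dbm file (hypothetical)."""

    def __init__(self, index_path):
        self._db = dbm.open(index_path, "r")

    def get(self, key):
        """Return the stored fields for key as a list, or None if absent."""
        raw = self._db.get(key)  # dbm values come back as bytes
        if raw is None:
            return None
        text = raw.decode()
        return text.split("\t") if text else []

    def close(self):
        self._db.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        flat = os.path.join(tmp, "loci.txt")
        with open(flat, "w") as fh:
            # made-up sample records, not real LocusLink data
            fh.write("101 SYMA sample_a\n102 SYMB sample_b\n")
        idx_path = os.path.join(tmp, "loci.idx")
        build_index(flat, idx_path)
        idx = LocusIndex(idx_path)
        print(idx.get("102"))
        idx.close()
```

Storing each record as one tab-joined value keeps lookups to a single key fetch. Real LocusLink records have many more fields, so a production wrapper would need a richer value encoding.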

As you are probably aware if you use this data source, NCBI is discontinuing LocusLink and providing the data via a new service called EntrezGene.  The transition is happening fairly soon: LocusLink data files will no longer be updated after March 1:
http://www.ncbi.nlm.nih.gov/entrez/query/static/help/LL2G.html

We are in the process of looking at the new data files, which are in ASN.1 format, and deciding how to extract the data we need.  I'm wondering whether anyone else out there is a big user of LocusLink data, and if so, what you are doing to prepare for the transition.  Specifically, have you had any luck converting the ASN.1 data into XML using NCBI's tools?  They have an asn2xml utility available here: ftp://ftp.ncbi.nih.gov/toolbox/xml/asn2xml/  It works fine on GenBank records, but not on Gene records.

I'm aware that BioPython supports LocusLink, with a Martel-based parser to read the flat files.  Is there any plan to develop Python functionality for handling the new ASN.1 Gene records?

I'm happy to continue any discussion off-list; please email me with any ideas or comments.
Thanks!
Joanne


=================================
Joanne I. Adamkewicz, PhD
Informatics Research Scientist II
Exelixis, Inc.
650-837-7151
jadamkew at exelixis.com


