[Biopython-dev] Fwd: [SO-devel] NCBI GFF3 support

Peter Cock p.j.a.cock at googlemail.com
Wed Mar 21 23:24:01 UTC 2012


Good news for GFF3!

The long-anticipated NCBI GFF3 corrections are happening now. Background:
http://blastedbio.blogspot.co.uk/2011/08/why-are-ncbi-gff3-files-still-broken.html

This should make putting together a good test suite for Brad's
GFF code much easier :)

(For anyone not aware, the Sequence Ontology (SO) developers mailing
list also serves as the GFF3 standard discussion mailing list.)

Peter

---------- Forwarded message ----------
From: Murphy, Terence (NIH/NLM/NCBI) [C] <murphyte at ncbi.nlm.nih.gov>
Date: Wed, Mar 21, 2012 at 6:15 PM
Subject: [SO-devel] NCBI GFF3 support
To: "SO developers (song-devel at lists.sourceforge.net)"
<song-devel at lists.sourceforge.net>

Hi All,

I’m pleased to announce that NCBI has updated their GFF3 export
software to the latest specifications (1.20), and is in the process of
updating files on the NCBI Genomes FTP site
(ftp://ftp.ncbi.nlm.nih.gov/genomes/). Files are now available for the
NCBI annotations of the latest assemblies for human, cow, dog, pig,
chicken, and many others, and will be provided as part of future
releases. See the README files in each species directory for further
details.

For example, the human GRCh37.p5 annotation in top level (chromosome)
coordinates is available at:

ftp://ftp.ncbi.nlm.nih.gov/genomes/H_sapiens/GFF/ref_GRCh37.p5_top_level.gff3.gz

Files in the /Bacteria, /Viruses, and other subdirectories are being
updated as part of rolling update cycles. Files with this header were
produced with the new writer:

##gff-version 3
#!gff-spec-version 1.20
#!processor NCBI annotwriter
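
As a rough illustration of how one might check for that header before
trusting a file, here is a minimal Python sketch. The function names
(read_gff3_pragmas, written_by_annotwriter) are made up for this
example and are not part of Biopython or any NCBI tool; it assumes the
pragma lines appear at the very top of the file, as shown above.

```python
import gzip
import io


def read_gff3_pragmas(handle):
    """Collect the leading '##' directives and '#!' pragmas into a dict.

    Reading stops at the first line that is not a comment, so only the
    header block at the top of the file is inspected.
    """
    pragmas = {}
    for line in handle:
        line = line.rstrip("\r\n")
        if not line.startswith("#"):
            break  # first feature (or blank) line: the header is over
        if line.startswith("##") or line.startswith("#!"):
            # '#!processor NCBI annotwriter' -> ('processor', 'NCBI annotwriter')
            key, _, value = line[2:].partition(" ")
            pragmas[key] = value.strip()
    return pragmas


def written_by_annotwriter(handle):
    """Return True if the header names 'NCBI annotwriter' as the processor."""
    return read_gff3_pragmas(handle).get("processor") == "NCBI annotwriter"


# Small in-memory demo; the feature line is invented sample data.
demo = io.StringIO(
    "##gff-version 3\n"
    "#!gff-spec-version 1.20\n"
    "#!processor NCBI annotwriter\n"
    "seq1\tRefSeq\tregion\t1\t100\t.\t+\t.\tID=id0\n"
)
print(written_by_annotwriter(demo))
```

For a downloaded .gff3.gz file, the same function could be given a
handle from gzip.open(path, "rt") instead of the in-memory demo.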

We’ve folded in a few bug fixes since we started using the new writer
in production, and are working to refresh all the files in the near
future. So you may see a few anomalies in files produced by
annotwriter earlier this year. Files produced in March or later should
be largely correct, with the exception that the ‘Is_circular=’ tag is
currently written with a lowercase 'i' as ‘is_circular=’ (thanks to
Peter for catching that so quickly).
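
Until the files are refreshed, a parser could work around the tag
issue by normalizing the attribute key on the fly. The GFF3
specification spells the reserved attribute 'Is_circular' with a
capital 'I'. The sketch below is only one possible workaround; the
function name fix_is_circular and the sample line are invented for
illustration.

```python
def fix_is_circular(line):
    """Rewrite a lowercase 'is_circular=' attribute key as 'Is_circular='.

    Comment lines and lines without exactly nine tab-separated columns
    are returned unchanged.
    """
    if line.startswith("#"):
        return line
    body = line.rstrip("\r\n")
    ending = line[len(body):]  # preserve the original line ending
    columns = body.split("\t")
    if len(columns) != 9:
        return line
    prefix = "is_circular="
    attributes = [
        "Is_circular=" + attr[len(prefix):] if attr.startswith(prefix) else attr
        for attr in columns[8].split(";")
    ]
    columns[8] = ";".join(attributes)
    return "\t".join(columns) + ending


# Invented sample feature line, not taken from a real NCBI file.
sample = "seq1\tRefSeq\tregion\t1\t100\t.\t+\t.\tID=id0;is_circular=true\n"
print(fix_is_circular(sample), end="")
```

Only the attribute key in column nine is touched, so values and the
other eight columns pass through as they were.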

annotwriter is available for download as part of the NCBI C++ Toolkit,
but the public toolkit isn’t updated very often so the current version
is missing many updates made in the last year. An updated version of
the toolkit is tentatively scheduled to be released in the next few
months, so I would wait for that before trying to use annotwriter
yourself for ASN to GFF3 conversion.

Please contact the NCBI Service Desk (info at ncbi.nlm.nih.gov) if you
have any questions or suggestions, or you can contact me directly or
through this listserv.

Enjoy!

-Terence

-----

Terence Murphy, Ph.D.
RefSeq Project
NCBI/NLM/NIH/DHHS
45 Center Drive, Room 4AS.37D-82
Bethesda, MD  20892-6510
Phone: 00-1-301-402-0990
e-mail: murphyte at ncbi.nlm.nih.gov



