[Bioperl-l] Fwd: [SO-devel] NCBI GFF3 support

Fields, Christopher J cjfields at illinois.edu
Wed Mar 21 18:21:27 UTC 2012


For those interested...

chris

Begin forwarded message:

From: "Murphy, Terence (NIH/NLM/NCBI) [C]" <murphyte at ncbi.nlm.nih.gov>
Subject: [SO-devel] NCBI GFF3 support
Date: March 21, 2012 1:15:29 PM CDT
To: "SO developers (song-devel at lists.sourceforge.net)" <song-devel at lists.sourceforge.net>
Reply-To: SO developers <song-devel at lists.sourceforge.net>

Hi All,

I’m pleased to announce that NCBI has updated their GFF3 export software to the latest specifications (1.20), and is in the process of updating files on the NCBI Genomes FTP site (ftp://ftp.ncbi.nlm.nih.gov/genomes/). Files are now available for the NCBI annotations of the latest assemblies for human, cow, dog, pig, chicken, and many others, and will be provided as part of future releases. See the README files in each species directory for further details.

For example, the human GRCh37.p5 annotation in top level (chromosome) coordinates is available at:
ftp://ftp.ncbi.nlm.nih.gov/genomes/H_sapiens/GFF/ref_GRCh37.p5_top_level.gff3.gz

Files in the /Bacteria, /Viruses, and other subdirectories are being updated as part of rolling update cycles. Files with this header were produced with the new writer:
##gff-version 3
#!gff-spec-version 1.20
#!processor NCBI annotwriter
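
A minimal Python sketch for checking whether a downloaded file carries that header. The function names are hypothetical, and it assumes the three header lines appear, in any order, among the comment lines before the first feature line:

```python
import gzip

# The three header lines quoted in the announcement.
EXPECTED_PRAGMAS = (
    "##gff-version 3",
    "#!gff-spec-version 1.20",
    "#!processor NCBI annotwriter",
)


def leading_comment_lines(lines):
    """Yield the '#' lines that precede the first non-comment line."""
    for line in lines:
        line = line.rstrip()
        if not line.startswith("#"):
            break
        yield line


def written_by_annotwriter(lines):
    """Return True if all three expected header lines are present."""
    header = set(leading_comment_lines(lines))
    return all(pragma in header for pragma in EXPECTED_PRAGMAS)


def file_written_by_annotwriter(path):
    """Same check for a gzip-compressed GFF3 file on disk."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return written_by_annotwriter(handle)
```

For a local copy of one of the .gff3.gz files, file_written_by_annotwriter(path) returns True only when the full header is found.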

We’ve folded in a few bug fixes since we started using the new writer in production, and are working to refresh all the files in the near future. So you may see a few anomalies in files produced by annotwriter earlier this year. Files produced in March or later should be fine, with the exception of a problem with the ‘is_circular=’ tag starting with a lowercase 'i' (thanks to Peter for catching that so quickly).
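
Until the files are refreshed, the lowercase tag can be patched on the reading side. A rough Python sketch, assuming the only problem is the attribute key in column 9 and that the intended spelling is the GFF3 reserved attribute Is_circular (the function name is hypothetical):

```python
LOWERCASE_KEY = "is_circular="
RESERVED_KEY = "Is_circular="


def fix_is_circular(line):
    """Rewrite a lowercase is_circular= key in column 9 of a feature line."""
    if line.startswith("#"):
        return line  # pragma and comment lines are left alone
    body = line.rstrip("\r\n")
    ending = line[len(body):]  # keep the original line ending
    columns = body.split("\t")
    if len(columns) != 9:
        return line  # not a feature line; pass through unchanged
    attributes = columns[8].split(";")
    columns[8] = ";".join(
        RESERVED_KEY + attr[len(LOWERCASE_KEY):]
        if attr.startswith(LOWERCASE_KEY)
        else attr
        for attr in attributes
    )
    return "\t".join(columns) + ending
```

Lines that already use the capitalized key, and anything that is not a nine-column feature line, come back unchanged.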

annotwriter is available for download as part of the NCBI C++ Toolkit, but the public toolkit isn’t updated very often so the current version is missing many updates made in the last year. An updated version of the toolkit is tentatively scheduled to be released in the next few months, so I would wait for that before trying to use annotwriter yourself for ASN to GFF3 conversion.

Please contact the NCBI Service Desk (info at ncbi.nlm.nih.gov) if you have any questions or suggestions, or you can contact me directly or through this listserv.

Enjoy!

-Terence

-----
Terence Murphy, Ph.D.
RefSeq Project
NCBI/NLM/NIH/DHHS
45 Center Drive, Room 4AS.37D-82
Bethesda, MD  20892-6510
Phone: 00-1-301-402-0990
e-mail: murphyte at ncbi.nlm.nih.gov

