[Bioperl-l] genbank2gff3 for prokaryotes?

Chris Fields cjfields at illinois.edu
Sun Aug 16 02:47:36 UTC 2009


On Aug 15, 2009, at 4:25 PM, Robert Buels wrote:

> Chris Fields wrote:
> > In fact, seeing as we're refactoring GFF and other aspects of Features
> > in bioperl, this may be the best time to add something in.
>
> Reading that thread, it sounds like most of the issues revolve  
> around when and how to use the unflattener.  Perhaps just adding  
> another command line switch or two to the script would be appropriate?
>
> Editorializing a bit, it's really disheartening that Genbank stores  
> features in such a lossy way.
>
> Rob

Just remembered: NCBI does supply GFF3 files for bacterial genomes,  
but I'm not sure how well they correspond to the GFF3 specification.   
For example:

ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Aquifex_aeolicus/NC_000918.gff

At a quick glance it looks okay, but the file doesn't include the FASTA sequence.
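
To go a bit beyond a quick glance, here is a rough Python sketch (not a full validator, and not part of BioPerl) that checks the things mentioned above: whether the file starts with the "##gff-version 3" pragma, whether each feature line has the nine tab-separated columns the GFF3 specification requires, and whether a "##FASTA" section is present. The function name check_gff3 and the layout of the report are made up for illustration.

```python
def check_gff3(text):
    """Return a small report on how well `text` matches basic GFF3 rules.

    Hypothetical helper for illustration only; it checks structure,
    not the semantics of feature types or attributes.
    """
    report = {"version_header": False, "has_fasta": False, "bad_lines": []}
    lines = text.splitlines()

    # The specification requires the version pragma on the first line.
    if lines and lines[0].startswith("##gff-version 3"):
        report["version_header"] = True

    for number, line in enumerate(lines, start=1):
        if line.startswith("##FASTA"):
            # Everything after this directive is sequence, not features.
            report["has_fasta"] = True
            break
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and other directives

        fields = line.split("\t")
        if len(fields) != 9:
            report["bad_lines"].append(number)
            continue

        start, end, strand = fields[3], fields[4], fields[6]
        coords_ok = start.isdigit() and end.isdigit() and int(start) <= int(end)
        if not coords_ok or strand not in ("+", "-", ".", "?"):
            report["bad_lines"].append(number)

    return report
```

Reading the downloaded file and passing its contents to check_gff3() gives a dictionary; a False under "has_fasta" would match the observation that the NCBI files carry no sequence, and any line numbers under "bad_lines" are places to look at by hand.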

I think much of the problem with NCBI/GenBank comes from a lack of
curation of how submissions are made (lots of inconsistencies).  I'm
not sure how easy those will be to deal with, but the only way we can
deal with them is by looking at examples of problematic data (IIRC the
Sulfolobus solfataricus genome GB file was a mess, so maybe that's
worth a look).

chris


