[Bioperl-l] Genbank file : bad features (tag) order with /translation

Brian Osborne bosborne11 at verizon.net
Wed Aug 3 17:06:05 UTC 2011


Peter,

I currently use BioPerl and SeqIO::genbank to create the *gbf files for NCBI submission, and they have always been accepted. In fact, I don't think NCBI even uses them; I believe they use the *tbl, *fsa, and *agp files and the ASN file as data sources.
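
If the qualifier order still matters to you for readability, one option is to post-process the written file rather than change the writer. Below is a minimal sketch in Python, not something BioPerl provides. It assumes the standard GenBank flat-file layout, where qualifier lines are indented 21 spaces and begin with "/", and the function name move_translation_last is made up for illustration. It would misfire on a wrapped continuation line that happens to start with "/", so treat it as a starting point only.

```python
QUAL_INDENT = " " * 21  # assumption: qualifiers start at column 22


def move_translation_last(feature_lines):
    """Return one feature's lines with any /translation qualifier moved last.

    feature_lines: the key/location line(s) of a single feature followed by
    its qualifier lines, exactly as they appear in the flat file.
    """
    head = []        # feature key line plus any wrapped location lines
    qualifiers = []  # one list per qualifier: first line + continuation lines
    for line in feature_lines:
        if line.startswith(QUAL_INDENT + "/"):
            qualifiers.append([line])        # a new qualifier begins here
        elif qualifiers:
            qualifiers[-1].append(line)      # continuation of the previous one
        else:
            head.append(line)                # still in the key/location part

    def is_translation(block):
        return block[0].lstrip().startswith("/translation=")

    # Stable partition: other qualifiers keep their order, /translation goes last.
    others = [q for q in qualifiers if not is_translation(q)]
    translations = [q for q in qualifiers if is_translation(q)]
    return head + [line for q in others + translations for line in q]


if __name__ == "__main__":
    cds = [
        "     CDS".ljust(21) + "1..9",
        QUAL_INDENT + '/translation="MKV"',
        QUAL_INDENT + '/product="hypothetical protein"',
        QUAL_INDENT + '/note="example only"',
    ]
    print("\n".join(move_translation_last(cds)))
```

You would still need to split the FEATURES block into individual features before calling this; that part is left out here.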

Brian O

On Aug 3, 2011, at 12:52 PM, Chris Fields wrote:

> On Aug 3, 2011, at 11:00 AM, Peter Cock wrote:
> 
>> 2011/8/3 Maxime Déraspe <maximilien1er at gmail.com>:
>>>> 
>>>> Why do you care about the order?
>>>> 
>>> 
>>> Hi Peter,
>>> 
>>> I care about the order for the submission to ncbi.
>> 
>> Do the NCBI have some guidelines which ask for a particular order?
> 
> No. Beyond the feature table specification, I'm not aware of any specification that requires a particular order.  Submitted data is tabular; Sequin is a nicer GUI for getting data into a useful format for submission to NCBI, where I believe the data is converted to ASN.1.
> 
>>> But I guess they
>>> will reformat the file before getting it in their database.
>> 
>> They seem to generate the official GenBank files from their
>> database - so I doubt the input order matters.
> 
> Yep, that's correct.  If NCBI ruled the world everyone would be using ASN.1 (b/c that's what they use internally).
> 
>>> It's also
>>> visually better when the translation of the protein comes in the end
>>> of the annotation for the CDS and not before /product, /note ....
>> 
>> I do see your point, but if that were the only motivation I wouldn't
>> want to make generating GenBank output any more complicated
>> than it already is.
> ...
>>> Anyway maybe I'll reformat the file in sequin table for a direct
>>> submission to ncbi with sequin.
>>> 
>>> Thank you.
>>> 
>>> Max
>> 
>> Peter
> 
> 
> Maxime, I find most users try to avoid using GenBank format except when absolutely needed.  There is a very good reason Sequin and tbl2asn are used by NCBI for submissions; they end up generating simple tabular data that is easier to feed into their internal ASN.1 format.  GenBank is a nice human-readable format, but structure-wise I find it's a pain to deal with, not to mention the variant third-party 'genbank' data that users want us to handle.
> 
> We try to support generation of output within reason, but that's never been our primary goal.  As long as the output generated is capable of being re-read by our parsers with the data intact and generates sane data we're pretty happy.
> 
> That said, any additions to deal with this are perfectly welcome (I pointed out one mechanism that could be used), but they would have to address the concerns Peter and I alluded to previously, and it would be nice to evaluate how any changes affect performance.  You are more than welcome to submit this as a feature request using our Redmine server (including patches if you do this yourself):
> 
> https://redmine.open-bio.org/
> 
> chris