[Bioperl-l] Genbank file : bad features (tag) order with /translation
Brian Osborne
bosborne11 at verizon.net
Wed Aug 3 17:06:05 UTC 2011
Peter,
I currently use BioPerl and SeqIO::genbank to create the *gbf files for NCBI submission, and they've always been accepted. In fact, I don't think NCBI even uses them; I believe they use the *tbl, *fsa, and *agp files and the ASN file as data sources.
Brian O
On Aug 3, 2011, at 12:52 PM, Chris Fields wrote:
> On Aug 3, 2011, at 11:00 AM, Peter Cock wrote:
>
>> 2011/8/3 Maxime Déraspe <maximilien1er at gmail.com>:
>>>>
>>>> Why do you care about the order?
>>>>
>>>
>>> Hi Peter,
>>>
>>> I care about the order for the submission to NCBI.
>>
>> Do the NCBI have some guidelines which ask for a particular order?
>
> No; beyond the feature table, there is no specification that I am aware of that requires a particular order. Submitted data is tabular; Sequin is a nicer GUI for getting data into a useful format for submission to NCBI, where I believe the data is converted to ASN.1.
>
>>> But I guess they
>>> will reformat the file before getting it in their database.
>>
>> They seem to generate the official GenBank files from their
>> database - so I doubt the input order matters.
>
> Yep, that's correct. If NCBI ruled the world everyone would be using ASN.1 (b/c that's what they use internally).
>
>>> It's also
>>> visually better when the translation of the protein comes at the end
>>> of the annotation for the CDS and not before /product, /note ....
>>
>> I do see your point, but if that were the only motivation I wouldn't
>> want to make generating GenBank output any more complicated
>> than it already is.
> ...
>>> Anyway, maybe I'll reformat the file as a Sequin table for direct
>>> submission to NCBI with Sequin.
>>>
>>> Thank you.
>>>
>>> Max
>>
>> Peter
>
>
> Maxime, I find most users try to avoid using GenBank format except when absolutely needed. There is a very good reason Sequin and tbl2asn are used by NCBI for submissions: they end up generating simple tabular data that is easier to feed into NCBI's internal ASN.1 format. GenBank is a nice human-readable format, but structure-wise I find it a pain to deal with, not to mention the variant third-party 'genbank' data that users want us to handle.
>
> We try to support generation of output within reason, but that's never been our primary goal. As long as the generated output can be re-read by our parsers with the data intact and yields sane data, we're pretty happy.
>
> That said, any additions to deal with this are perfectly welcome (I pointed out one mechanism that could be used), but they would have to address the concerns Peter and I alluded to previously, and it would be nice to evaluate how any changes affect performance. You are more than welcome to submit this as a feature request on our Redmine server (including patches if you do this yourself):
>
> https://redmine.open-bio.org/
>
> chris