[Bioperl-l] Genbank file : bad features (tag) order with /translation
Chris Fields
cjfields at illinois.edu
Wed Aug 3 17:10:31 UTC 2011
IMHO, I find GenBank too unwieldy, but it's nice to know the output works for NCBI submission.
chris
On Aug 3, 2011, at 12:06 PM, Brian Osborne wrote:
> Peter,
>
> I currently use BioPerl and SeqIO::genbank to create the *gbf files for NCBI submission, and they've always accepted them. In fact, I don't think they even use them; I believe they use the *tbl, *fsa, and *agp files and the ASN file as data sources.
>
> Brian O
>
> On Aug 3, 2011, at 12:52 PM, Chris Fields wrote:
>
>> On Aug 3, 2011, at 11:00 AM, Peter Cock wrote:
>>
>>> 2011/8/3 Maxime Déraspe <maximilien1er at gmail.com>:
>>>>>
>>>>> Why do you care about the order?
>>>>>
>>>>
>>>> Hi Peter,
>>>>
>>>> I care about the order for the submission to NCBI.
>>>
>>> Do the NCBI have some guidelines which ask for a particular order?
>>
>> No. Beyond the feature table, there is no specification that I am aware of that requires a particular order. Submitted data is tabular; Sequin is a nicer GUI for getting data into a useful format for submission to NCBI, where I believe it is converted to ASN.1.
>>
>>>> But I guess they
>>>> will reformat the file before putting it in their database.
>>>
>>> They seem to generate the official GenBank files from their
>>> database - so I doubt the input order matters.
>>
>> Yep, that's correct. If NCBI ruled the world, everyone would be using ASN.1 (b/c that's what they use internally).
>>
>>>> It's also
>>>> visually better when the translation of the protein comes at the end
>>>> of the annotation for the CDS and not before /product, /note ...
>>>
>>> I do see your point, but if that were the only motivation I wouldn't
>>> want to make generating GenBank output any more complicated
>>> than it already is.
>> ...
>>>> Anyway, maybe I'll reformat the file as a Sequin table for direct
>>>> submission to NCBI with Sequin.
>>>>
>>>> Thank you.
>>>>
>>>> Max
>>>
>>> Peter
>>
>>
>> Maxime, I find most users try to avoid using GenBank format except when absolutely needed. There is a very good reason NCBI uses Sequin and tbl2asn for submissions: they work from simple tabular data that is easier to feed into NCBI's internal ASN.1 format. GenBank is a nice human-readable format, but structure-wise I find it a pain to deal with, not to mention the variant third-party 'GenBank' data that users want us to handle.
>>
>> We try to support generation of output within reason, but that's never been our primary goal. As long as the generated output can be re-read by our parsers with the data intact, and the data are sane, we're pretty happy.
>>
>> That said, any additions to deal with this are perfectly welcome (I pointed out one mechanism that could be used), but they would have to address the concerns Peter and I raised previously, and it would be nice to evaluate how any changes affect performance. You are more than welcome to submit this as a feature request on our Redmine server (including patches if you implement it yourself):
>>
>> https://redmine.open-bio.org/
>>
>> chris
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>