[Bioperl-l] Genbank file : bad features (tag) order with /translation

Chris Fields cjfields at illinois.edu
Wed Aug 3 14:08:33 UTC 2011


On Aug 3, 2011, at 8:46 AM, Peter Cock wrote:

> 2011/8/3 Maxime Déraspe <maximilien1er at gmail.com>:
>> Hi,
>> 
>> when I parse a GenBank file, no matter what I do, the
>> /translation="MKAV.." tag value of a CDS never appears in the last place
>> as it should. Other tags like /note= and /product come after
>> /translation, which is not the usual practice with GenBank files. Does
>> anyone have an idea how to deal with this, i.e. how to put the /translation
>> tag value in the last place when I write the GenBank file?
>> 
>> Thank you !
>> 
>> Max
> 
> Hi Max,
> 
> I'm not aware of anything in the feature table specification
> about the order of the feature qualifiers (the "tags" like /note
> and /product). See http://www.ncbi.nlm.nih.gov/collab/FT/
> 
> I suspect BioPerl is using a hash (Biopython uses a dictionary)
> for the feature qualifiers, which would discard the order.
> 
> Why do you care about the order?
> 
> Peter

Yes, it uses a hash keyed on the feature tags, so the original tag order is not preserved.  I'm not sure how Biopython handles it, but my guess is something similar (Peter?).

The output order was never a chief concern of ours.  To tell the truth, our main focus has never been simple conversion, except to transform data into a format that is more manageable/normalized.

For those interested in making this change, all the code for printing features is in one method in Bio::SeqIO::genbank, _print_GenBank_FTHelper().  The best way to handle this would be to allow an optional coderef/callback that takes the feature (or the tags) and allows custom sorting and printing.  I don't want to get into messy semantics on how to specifically sort tags; it is best to let the user decide.
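
To make the callback idea concrete, here is a minimal sketch.  It is written in Python rather than Perl and is not tied to BioPerl's actual code: format_qualifiers(), tag_sorter and translation_last() are hypothetical names used only for illustration, and the qualifier formatting is deliberately simplified.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical callback type: receives the qualifier mapping and returns
# the tags in the order they should be printed.
TagSorter = Callable[[Dict[str, List[str]]], List[str]]


def format_qualifiers(
    qualifiers: Dict[str, List[str]],
    tag_sorter: Optional[TagSorter] = None,
) -> List[str]:
    """Render qualifiers as '/tag="value"' lines.

    Simplified on purpose: real GenBank output leaves some values unquoted
    and wraps long lines, which this sketch does not attempt.
    """
    # Without a callback, fall back to whatever order the mapping yields.
    tags = tag_sorter(qualifiers) if tag_sorter else list(qualifiers)
    lines = []
    for tag in tags:
        for value in qualifiers[tag]:
            lines.append(f'/{tag}="{value}"')
    return lines


def translation_last(qualifiers: Dict[str, List[str]]) -> List[str]:
    """Example user callback: keep the existing order, but move /translation last."""
    # sorted() is stable, so all other tags keep their relative order.
    return sorted(qualifiers, key=lambda tag: tag == "translation")


cds = {
    "translation": ["MKAV"],
    "note": ["example note"],
    "product": ["hypothetical protein"],
}

for line in format_qualifiers(cds, tag_sorter=translation_last):
    print(line)
```

The point of the design is that the writer only asks the callback for an ordering, so the policy for sorting tags stays entirely in user code.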

chris


