[Biopython-dev] Genbank structured comments

Brian Osborne bosborne11 at verizon.net
Thu Sep 10 19:52:05 UTC 2015


Chris,

BioPerl does what you might call a compromise (in bioperl-live, not in any CPAN release). If a structured comment appears in COMMENT, it’s still part of the comment (a string), but no line breaks are removed, so it stays tabular. Thus it’s easy to detect and parse.

Yes, if there is a ‘structured_comment’ dict it could have a primary key and secondary keys. This was my first thought. So something like:

defaultdict(<class 'dict'>, {'Assembly-Data': {'a': 1, 'b': 2, 'c': 3}})
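
A minimal sketch of how such a nested dict might be built from a comment string that keeps its line breaks. It assumes the "##Name-START##" / "##Name-END##" markers and the "::" field separator seen in GenBank structured comments. The function name parse_structured_comment and the sample comment text are hypothetical, made up for illustration, and are not BioPerl or Biopython code.

```python
from collections import defaultdict

# Assumed markers for GenBank structured comments.
START = "-START##"
END = "-END##"
DELIM = "::"


def parse_structured_comment(comment):
    """Return {block name: {field: value}} for each structured block found.

    Hypothetical helper for illustration; not an existing library function.
    """
    structured = defaultdict(dict)
    current = None  # name of the block we are inside, if any
    for line in comment.splitlines():
        line = line.strip()
        if line.startswith("##") and line.endswith(START):
            # "##Assembly-Data-START##" -> "Assembly-Data"
            current = line[2:-len(START)]
        elif line.startswith("##") and line.endswith(END):
            current = None
        elif current is not None and DELIM in line:
            key, value = line.split(DELIM, 1)
            structured[current][key.strip()] = value.strip()
    return structured


# Made-up sample comment: one free-text line plus one structured block.
comment = """Free-text remark that stays in the ordinary comment.
##Assembly-Data-START##
Assembly Method       :: Newbler v. 2.3
Sequencing Technology :: 454
##Assembly-Data-END##"""

result = parse_structured_comment(comment)
```

Here the header string ("Assembly-Data") becomes the primary key and the tabular fields become the secondary keys; a comment with no structured block simply yields an empty dict.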

Brian O.


> On Sep 10, 2015, at 10:06 AM, Fields, Christopher J <cjfields at illinois.edu> wrote:
> 
> This is very similar to the issue bioperl had with nested annotations; namely, that some annotation data from SwissProt (GENE NAME, I believe) had a hierarchical structure. Seems a bit thornier in this case, as the annotation would have both a standard comment field and a named collection of metadata tied together.
> 
> Brian, how is this implemented in BioPerl? 
> 
> chris
> 
>> On Sep 10, 2015, at 10:47 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>> 
>> Good question...
>> 
>> e.g. http://www.ncbi.nlm.nih.gov/nuccore/291609868
>> and http://www.ncbi.nlm.nih.gov/nuccore/FJ966082
>> 
>> It almost makes me wonder if that should have top level
>> keys of MIENS-Data and FluData - or is that too nested?
>> 
>> Peter
>> 
>> On Thu, Sep 10, 2015 at 4:37 PM, Brian Osborne <bosborne11 at verizon.net> wrote:
>>> Peter,
>>> 
>>> Another question, maybe the last one: what do we do with the “header” and “footer” strings, things like “FluData”, “GISAID_EpiFlu(TM)Data”, and “Assembly-Data”?
>>> 
>>> They could also be keys in the dict, of course. Values are ‘’?
>>> 
>>> Thanks again,
>>> 
>>> Brian O.
>>> 
>>> 
>>>> On Sep 10, 2015, at 1:25 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>>>> 
>>>> On Wed, Sep 9, 2015 at 11:37 PM, Brian Osborne <bosborne11 at verizon.net> wrote:
>>>>> Chris,
>>>>> 
>>>>> This is the documentation I’m familiar with, but there may be more:
>>>>> 
>>>>> http://www.ncbi.nlm.nih.gov/genbank/structuredcomment
>>>>> 
>>>>> Peter, I can definitely separate these using ‘comment’ and
>>>>> ‘structured_comment’ keys in the record.annotations dict.
>>>>> 
>>>>> If there’s no structured comment in the Genbank file, would
>>>>> there simply be an empty dict in the SeqRecord?
>>>>> 
>>>>> E.g.
>>>>> 
>>>>> >>> record.annotations['structured_comment']
>>>>> {}
>>>> 
>>>> That makes sense - equally no entry in the annotation dictionary
>>>> would be reasonable.
>>>> 
>>>> Peter
>>> 
> 
