[Biopython-dev] Genbank structured comments

Brian Osborne bosborne11 at verizon.net
Wed Sep 9 22:37:49 UTC 2015


Chris,

This is the documentation I’m familiar with, but there may be more:

http://www.ncbi.nlm.nih.gov/genbank/structuredcomment

Peter, I can definitely separate these using ‘comment’ and ‘structured_comment’ keys in the record.annotations dict.

If there’s no structured comment in the Genbank file, would there simply be an empty dict in the SeqRecord? 

E.g.

>>> record.annotations['structured_comment']
{}
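
A minimal sketch of that split, written in plain Python rather than against Biopython's actual parser. The helper name split_comment and the free-text line in the sample are made up for illustration; the ##Name-START## / ##Name-END## markers and the "::" separator follow the example quoted further down. It assumes a line inside a table that lacks "::" is a wrapped continuation of the previous value.

```python
import re

# Hypothetical helper, not part of Biopython: split one GenBank COMMENT block
# into free text and structured-comment tables.
START = re.compile(r"^##(.+)-START##$")
END = re.compile(r"^##(.+)-END##$")
SEP = "::"


def split_comment(lines):
    """Return (plain_text, structured) for the lines of one COMMENT block.

    structured maps each table name (e.g. "FluData") to a dict of tag/value
    pairs. It is an empty dict when the block has no structured comment.
    """
    plain = []
    structured = {}
    current = None   # name of the table we are inside, if any
    last_tag = None  # most recent tag, used for wrapped values
    for raw in lines:
        line = raw.strip()
        if line.startswith("COMMENT"):
            # Drop the leading GenBank keyword on the first line.
            line = line[len("COMMENT"):].strip()
        start = START.match(line)
        if start:
            current = start.group(1)
            structured[current] = {}
            last_tag = None
        elif END.match(line):
            current = None
        elif current is not None:
            if SEP in line:
                tag, value = line.split(SEP, 1)
                last_tag = tag.strip()
                structured[current][last_tag] = value.strip()
            elif line and last_tag is not None:
                # Assumption: a line without "::" continues the previous value.
                structured[current][last_tag] += " " + line
        elif line:
            plain.append(line)
    return "\n".join(plain), structured


example = """\
COMMENT     Free text that precedes the table.
            ##FluData-START##
            NAME                  :: A/California/07/2009
            TYPE                  :: H1N1
            ##FluData-END##
"""
comment, structured = split_comment(example.splitlines())
annotations = {"comment": comment, "structured_comment": structured}
```

With a COMMENT that holds only plain text, the second return value is {}, which is the empty-dict case asked about above. Keying the outer dict by table name is one possible design; it keeps two tables in the same COMMENT from colliding.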

Brian O.

> On Sep 9, 2015, at 11:18 AM, Fields, Christopher J <cjfields at illinois.edu> wrote:
> 
> Is there any particular standard for these comment types?  I see these within WGS master records all the time (denotes basic metadata on assembly).
> 
> http://www.ncbi.nlm.nih.gov/nuccore/635626163
> http://www.ncbi.nlm.nih.gov/nuccore/AFTI00000000.1
> 
> (you can’t tell I’m working on a mussel assembly, right?)
> 
> chris
> 
>> On Sep 9, 2015, at 12:08 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>> 
>> That's ... nasty. How about record.annotations["comment"] for the old style
>> plain text (as a Python string) and record.annotations["structured_comment"]
>> as a (sorted) Python dict?
>> 
>> This might make GenBank output easier too...
>> 
>> Peter
>> 
>> On Wed, Sep 9, 2015 at 6:01 PM, Brian Osborne <bosborne11 at verizon.net> wrote:
>>> Peter,
>>> 
>>> That is an interesting idea. What would be returned if the COMMENT has both plain and “structured comments” in it? Here’s one:
>>> 
>>> http://www.ncbi.nlm.nih.gov/nuccore/FJ966082
>>> 
>>> Thanks again,
>>> 
>>> Brian O.
>>> 
>>> 
>>> 
>>> 
>>>> On Sep 9, 2015, at 7:27 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>>>> 
>>>> This sounds good - would you turn these into a Python dict?
>>>> 
>>>> Peter
>>>> 
>>>> On Wed, Sep 9, 2015 at 2:56 PM, Brian Osborne <bosborne11 at verizon.net> wrote:
>>>>> All,
>>>>> 
>>>>> I noticed that Biopython, like the versions of BioPerl in CPAN, does not
>>>>> handle GenBank structured comments
>>>>> (http://www.ncbi.nlm.nih.gov/genbank/structuredcomment) in the ideal way.
>>>>> Here’s an example structured comment:
>>>>> 
>>>>> COMMENT     ##FluData-START##
>>>>>          EPI_ISOLATE_ID        :: EPI_ISL_77637
>>>>>          NAME                  :: A/California/07/2009
>>>>>          TYPE                  :: H1N1
>>>>>          Segment_name          :: M'
>>>>>          HOST_AGE              :: 54
>>>>>          HOST_GENDER           :: F'
>>>>>          PASSAGE               :: M1/C1 (2009-04-24)
>>>>>          LOCATION              :: United States / California'
>>>>>          COLLECT_DATE          :: 09-Apr-2009
>>>>>          Lineage               :: A(H1N1)pdm09
>>>>>          RESIST_TO_ADAMANTANES :: Resistant'
>>>>>          RESIST_TO_OSELTAMIVIR :: Sensitive'
>>>>>          RESIST_TO_ZANAMVIR    :: Sensitive'
>>>>>          SPECIMEN_ID           :: H13596
>>>>>          SENDER_LAB            :: Naval Health Research Center'
>>>>>          SEQLAB_SAMPLE_ID      :: 2009712111
>>>>>          EPI_SEQUENCE_ID       :: EPI273604
>>>>>          ##FluData-END##
>>>>> 
>>>>> Or here: http://www.ncbi.nlm.nih.gov/nuccore/291609868
>>>>> 
>>>>> A table, with tag/value pairs. A fair number of bacterial genomes in GenBank
>>>>> use the structured comment to hold MIGS/MIMS data. The comment() method
>>>>> should return something like this, which is easily parsed:
>>>>> 
>>>>> ##FluData-START##
>>>>> EPI_ISOLATE_ID        :: EPI_ISL_77637
>>>>> NAME                  :: A/California/07/2009
>>>>> TYPE                  :: H1N1
>>>>> Segment_name          :: M'
>>>>> HOST_AGE              :: 54
>>>>> HOST_GENDER           :: F'
>>>>> PASSAGE               :: M1/C1 (2009-04-24)
>>>>> LOCATION              :: United States / California'
>>>>> COLLECT_DATE          :: 09-Apr-2009
>>>>> Lineage               :: A(H1N1)pdm09
>>>>> RESIST_TO_ADAMANTANES :: Resistant'
>>>>> RESIST_TO_OSELTAMIVIR :: Sensitive'
>>>>> RESIST_TO_ZANAMVIR    :: Sensitive'
>>>>> SPECIMEN_ID           :: H13596
>>>>> SENDER_LAB            :: Naval Health Research Center'
>>>>> SEQLAB_SAMPLE_ID      :: 2009712111
>>>>> EPI_SEQUENCE_ID       :: EPI273604
>>>>> ##FluData-END##
>>>>> 
>>>>> Rather than this, which is what it currently returns:
>>>>> 
>>>>> ##FluData-START## EPI_ISOLATE_ID        :: EPI_ISL_77637 NAME
>>>>> :: A/California/07/2009 TYPE                  :: H1N1 Segment_name
>>>>> :: M' HOST_AGE              :: 54 HOST_GENDER           :: F' PASSAGE
>>>>> :: M1/C1 (2009-04-24) LOCATION              :: United States / California'
>>>>> COLLECT_DATE          :: 09-Apr-2009 Lineage               :: A(H1N1)pdm09
>>>>> RESIST_TO_ADAMANTANES :: Resistant' RESIST_TO_OSELTAMIVIR :: Sensitive'
>>>>> RESIST_TO_ZANAMVIR    :: Sensitive' SPECIMEN_ID           :: H13596
>>>>> SENDER_LAB            :: Naval Health Research Center' SEQLAB_SAMPLE_ID
>>>>> :: 2009712111 EPI_SEQUENCE_ID       :: EPI273604 ##FluData-END##
>>>>> 
>>>>> Are there any objections to me putting in a pull request with this change? I
>>>>> made this same fix in BioPerl. Of course, if the comment is a “normal” one,
>>>>> it will be treated the same as it is treated now. In other words, the vast
>>>>> majority of comments stay the same.
>>>>> 
>>>>> I’ll also add tests.
>>>>> 
>>>>> Thanks again,
>>>>> 
>>>>> Brian O.
>>>>> 
>>>>> _______________________________________________
>>>>> Biopython-dev mailing list
>>>>> Biopython-dev at mailman.open-bio.org
>>>>> http://mailman.open-bio.org/mailman/listinfo/biopython-dev
