[Biopython-dev] Genbank structured comments

Peter Cock p.j.a.cock at googlemail.com
Wed Sep 9 17:08:10 UTC 2015


That's ... nasty. How about record.annotations["comment"] for the old style
plain text (as a Python string) and record.annotations["structured_comment"]
as a (sorted) Python dict?

This might make GenBank output easier too...
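
A rough sketch of that split, in case it helps the discussion. This is
not existing Biopython code: the function name split_comment, the
regular expressions, and the choice of a nested OrderedDict keyed by the
block name (e.g. "FluData") are all assumptions made for illustration.
It takes the already-stripped lines of one COMMENT block and returns the
plain text and the structured tag/value pairs separately, which would
also cover a record that mixes both styles.

```python
import re
from collections import OrderedDict

# Assumed delimiters, based on the ##FluData-START## / ##FluData-END##
# example quoted in this thread.
START = re.compile(r"^##(.+)-START##$")
END = re.compile(r"^##(.+)-END##$")
SEP = "::"


def split_comment(lines):
    """Split COMMENT lines into (plain_text, structured) (hypothetical helper)."""
    plain = []
    structured = OrderedDict()
    current = None  # the dict being filled while inside a START/END block
    for raw in lines:
        line = raw.strip()
        start = START.match(line)
        if start:
            current = structured.setdefault(start.group(1), OrderedDict())
            continue
        if END.match(line):
            current = None
            continue
        if current is None:
            # Outside any structured block: old-style free text.
            plain.append(line)
        elif SEP in line:
            key, value = line.split(SEP, 1)
            current[key.strip()] = value.strip()
        elif line and current:
            # No separator: treat as a wrapped continuation of the last value.
            last = next(reversed(current))
            current[last] += " " + line
    return "\n".join(plain), structured


example = [
    "Free text that precedes the table.",
    "##FluData-START##",
    "NAME                  :: A/California/07/2009",
    "TYPE                  :: H1N1",
    "##FluData-END##",
]
plain_text, structured = split_comment(example)
print(plain_text)
print(dict(structured["FluData"]))
```

Keeping insertion order means a writer could emit the tags in the same
order they were read, which is the part that might simplify output.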

Peter

On Wed, Sep 9, 2015 at 6:01 PM, Brian Osborne <bosborne11 at verizon.net> wrote:
> Peter,
>
> That is an interesting idea. What would be returned if the COMMENT has both plain and “structured comments” in it? Here’s one:
>
> http://www.ncbi.nlm.nih.gov/nuccore/FJ966082
>
> Thanks again,
>
> Brian O.
>
>
>
>
>> On Sep 9, 2015, at 7:27 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>>
>> This sounds good - would you turn these into a Python dict?
>>
>> Peter
>>
>> On Wed, Sep 9, 2015 at 2:56 PM, Brian Osborne <bosborne11 at verizon.net> wrote:
>>> All,
>>>
>>> I noticed that Biopython, like the versions of BioPerl in CPAN, does not
>>> handle GenBank structured comments
>>> (http://www.ncbi.nlm.nih.gov/genbank/structuredcomment) in the ideal way.
>>> Here’s an example structured comment:
>>>
>>> COMMENT     ##FluData-START##
>>>           EPI_ISOLATE_ID        :: EPI_ISL_77637
>>>           NAME                  :: A/California/07/2009
>>>           TYPE                  :: H1N1
>>>           Segment_name          :: M'
>>>           HOST_AGE              :: 54
>>>           HOST_GENDER           :: F'
>>>           PASSAGE               :: M1/C1 (2009-04-24)
>>>           LOCATION              :: United States / California'
>>>           COLLECT_DATE          :: 09-Apr-2009
>>>           Lineage               :: A(H1N1)pdm09
>>>           RESIST_TO_ADAMANTANES :: Resistant'
>>>           RESIST_TO_OSELTAMIVIR :: Sensitive'
>>>           RESIST_TO_ZANAMVIR    :: Sensitive'
>>>           SPECIMEN_ID           :: H13596
>>>           SENDER_LAB            :: Naval Health Research Center'
>>>           SEQLAB_SAMPLE_ID      :: 2009712111
>>>           EPI_SEQUENCE_ID       :: EPI273604
>>>           ##FluData-END##
>>>
>>> Or here: http://www.ncbi.nlm.nih.gov/nuccore/291609868
>>>
>>> A table, with tag/value pairs. A fair number of bacterial genomes in GenBank
>>> use the structured comment to hold MIGS/MIMS data. The comment() method
>>> should return something like this, which is easily parsed:
>>>
>>> ##FluData-START##
>>> EPI_ISOLATE_ID        :: EPI_ISL_77637
>>> NAME                  :: A/California/07/2009
>>> TYPE                  :: H1N1
>>> Segment_name          :: M'
>>> HOST_AGE              :: 54
>>> HOST_GENDER           :: F'
>>> PASSAGE               :: M1/C1 (2009-04-24)
>>> LOCATION              :: United States / California'
>>> COLLECT_DATE          :: 09-Apr-2009
>>> Lineage               :: A(H1N1)pdm09
>>> RESIST_TO_ADAMANTANES :: Resistant'
>>> RESIST_TO_OSELTAMIVIR :: Sensitive'
>>> RESIST_TO_ZANAMVIR    :: Sensitive'
>>> SPECIMEN_ID           :: H13596
>>> SENDER_LAB            :: Naval Health Research Center'
>>> SEQLAB_SAMPLE_ID      :: 2009712111
>>> EPI_SEQUENCE_ID       :: EPI273604
>>> ##FluData-END##
>>>
>>> Rather than this, which is what it currently returns:
>>>
>>> ##FluData-START## EPI_ISOLATE_ID        :: EPI_ISL_77637 NAME
>>> :: A/California/07/2009 TYPE                  :: H1N1 Segment_name
>>> :: M' HOST_AGE              :: 54 HOST_GENDER           :: F' PASSAGE
>>> :: M1/C1 (2009-04-24) LOCATION              :: United States / California'
>>> COLLECT_DATE          :: 09-Apr-2009 Lineage               :: A(H1N1)pdm09
>>> RESIST_TO_ADAMANTANES :: Resistant' RESIST_TO_OSELTAMIVIR :: Sensitive'
>>> RESIST_TO_ZANAMVIR    :: Sensitive' SPECIMEN_ID           :: H13596
>>> SENDER_LAB            :: Naval Health Research Center' SEQLAB_SAMPLE_ID
>>> :: 2009712111 EPI_SEQUENCE_ID       :: EPI273604 ##FluData-END##
>>>
>>> Are there any objections to me putting in a pull request with this change? I
>>> made this same fix in BioPerl. Of course, if the comment is a “normal” one,
>>> it will be treated the same as it is treated now. In other words, the vast
>>> majority of comments stay the same.
>>>
>>> I’ll also add tests.
>>>
>>> Thanks again,
>>>
>>> Brian O.
>>>
>>> _______________________________________________
>>> Biopython-dev mailing list
>>> Biopython-dev at mailman.open-bio.org
>>> http://mailman.open-bio.org/mailman/listinfo/biopython-dev
>
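
For reference, the difference between the two return values quoted
above comes down to how the comment lines are joined. The following is
only a sketch under assumptions: join_comment is a hypothetical helper,
not a Biopython or BioPerl function, and it assumes that ordinary
comments are currently joined with single spaces, as the quoted
"currently returns" output suggests. Structured comments keep one
tag/value pair per line; everything else is left unchanged.

```python
def join_comment(lines):
    """Join the lines of one COMMENT block (hypothetical helper).

    Lines are stripped first. If any line looks like a ##...-START##
    marker, the line breaks are preserved so each tag/value pair stays
    on its own line; otherwise the lines are joined with spaces.
    """
    stripped = [line.strip() for line in lines]
    is_structured = any(
        s.startswith("##") and s.endswith("-START##") for s in stripped
    )
    separator = "\n" if is_structured else " "
    return separator.join(stripped)


print(join_comment(["##FluData-START##", "TYPE :: H1N1", "##FluData-END##"]))
print(join_comment(["An ordinary comment that", "wraps over two lines."]))
```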


