[Biopython-dev] Genbank structured comments
Brian Osborne
bosborne11 at verizon.net
Wed Sep 9 22:37:49 UTC 2015
Chris,
This is the documentation I’m familiar with, but there may be more:
http://www.ncbi.nlm.nih.gov/genbank/structuredcomment
Peter, I can definitely separate these using ‘comment’ and ‘structured_comment’ keys in the record.annotations dict.
If there’s no structured comment in the Genbank file, would there simply be an empty dict in the SeqRecord?
E.g.
>>> record.annotations['structured_comment']
{}
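To make the question concrete, here is a rough sketch of the split I have in mind. It is only an illustration, not the Biopython parser: split_comment is a hypothetical helper, the free-text line in the example is made up, and I'm assuming that an empty dict is what comes back when there is no structured block.

```python
import re

# Delimiters of a structured comment block, e.g. ##FluData-START## / ##FluData-END##
START = re.compile(r"^##(.+)-START##$")
END = re.compile(r"^##(.+)-END##$")


def split_comment(lines):
    """Split COMMENT lines into plain text and structured tag/value pairs.

    Hypothetical helper: returns a dict with a 'comment' string and a
    'structured_comment' dict (empty if no structured block was found).
    """
    plain = []
    structured = {}
    inside = False
    for raw in lines:
        line = raw.strip()
        if START.match(line):
            inside = True
        elif END.match(line):
            inside = False
        elif inside and "::" in line:
            # Tag and value are separated by the first '::' on the line
            tag, value = line.split("::", 1)
            structured[tag.strip()] = value.strip()
        elif line:
            # Anything else (including lines without '::') is kept as plain text
            plain.append(line)
    return {"comment": " ".join(plain), "structured_comment": structured}


example = [
    "Some free text before the table.",  # made-up plain comment line
    "##FluData-START##",
    "TYPE                  :: H1N1",
    "HOST_AGE              :: 54",
    "COLLECT_DATE          :: 09-Apr-2009",
    "##FluData-END##",
]
annotations = split_comment(example)
no_table = split_comment(["Just a plain comment."])
```

This covers the mixed case (plain text plus a table in one COMMENT) but does not handle values that wrap onto a continuation line; those would need extra logic.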
Brian O.
> On Sep 9, 2015, at 11:18 AM, Fields, Christopher J <cjfields at illinois.edu> wrote:
>
> Is there any particular standard for these comment types? I see these within WGS master records all the time (denotes basic metadata on assembly).
>
> http://www.ncbi.nlm.nih.gov/nuccore/635626163
> http://www.ncbi.nlm.nih.gov/nuccore/AFTI00000000.1
>
> (you can’t tell I’m working on a mussel assembly, right?)
>
> chris
>
>> On Sep 9, 2015, at 12:08 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>>
>> That's ... nasty. How about record.annotations["comment"] for the old style
>> plain text (as a Python string) and record.annotations["structured_comment"]
>> as a (sorted) Python dict?
>>
>> This might make GenBank output easier too...
>>
>> Peter
>>
>> On Wed, Sep 9, 2015 at 6:01 PM, Brian Osborne <bosborne11 at verizon.net> wrote:
>>> Peter,
>>>
>>> That is an interesting idea. What would be returned if the COMMENT has both plain and “structured comments” in it? Here’s one:
>>>
>>> http://www.ncbi.nlm.nih.gov/nuccore/FJ966082
>>>
>>> Thanks again,
>>>
>>> Brian O.
>>>
>>>
>>>
>>>
>>>> On Sep 9, 2015, at 7:27 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>>>>
>>>> This sounds good - would you turn these into a Python dict?
>>>>
>>>> Peter
>>>>
>>>> On Wed, Sep 9, 2015 at 2:56 PM, Brian Osborne <bosborne11 at verizon.net> wrote:
>>>>> All,
>>>>>
>>>>> I noticed that Biopython, like the versions of BioPerl in CPAN, does not
>>>>> handle GenBank structured comments
>>>>> (http://www.ncbi.nlm.nih.gov/genbank/structuredcomment) in the ideal way.
>>>>> Here’s an example structured comment:
>>>>>
>>>>> COMMENT ##FluData-START##
>>>>> EPI_ISOLATE_ID :: EPI_ISL_77637
>>>>> NAME :: A/California/07/2009
>>>>> TYPE :: H1N1
>>>>> Segment_name :: M'
>>>>> HOST_AGE :: 54
>>>>> HOST_GENDER :: F'
>>>>> PASSAGE :: M1/C1 (2009-04-24)
>>>>> LOCATION :: United States / California'
>>>>> COLLECT_DATE :: 09-Apr-2009
>>>>> Lineage :: A(H1N1)pdm09
>>>>> RESIST_TO_ADAMANTANES :: Resistant'
>>>>> RESIST_TO_OSELTAMIVIR :: Sensitive'
>>>>> RESIST_TO_ZANAMVIR :: Sensitive'
>>>>> SPECIMEN_ID :: H13596
>>>>> SENDER_LAB :: Naval Health Research Center'
>>>>> SEQLAB_SAMPLE_ID :: 2009712111
>>>>> EPI_SEQUENCE_ID :: EPI273604
>>>>> ##FluData-END##
>>>>>
>>>>> Or here: http://www.ncbi.nlm.nih.gov/nuccore/291609868
>>>>>
>>>>> A table, with tag/value pairs. A fair number of bacterial genomes in GenBank
>>>>> use the structured comment to hold MIGS/MIMS data. The comment() method
>>>>> should return something like this, which is easily parsed:
>>>>>
>>>>> ##FluData-START##
>>>>> EPI_ISOLATE_ID :: EPI_ISL_77637
>>>>> NAME :: A/California/07/2009
>>>>> TYPE :: H1N1
>>>>> Segment_name :: M'
>>>>> HOST_AGE :: 54
>>>>> HOST_GENDER :: F'
>>>>> PASSAGE :: M1/C1 (2009-04-24)
>>>>> LOCATION :: United States / California'
>>>>> COLLECT_DATE :: 09-Apr-2009
>>>>> Lineage :: A(H1N1)pdm09
>>>>> RESIST_TO_ADAMANTANES :: Resistant'
>>>>> RESIST_TO_OSELTAMIVIR :: Sensitive'
>>>>> RESIST_TO_ZANAMVIR :: Sensitive'
>>>>> SPECIMEN_ID :: H13596
>>>>> SENDER_LAB :: Naval Health Research Center'
>>>>> SEQLAB_SAMPLE_ID :: 2009712111
>>>>> EPI_SEQUENCE_ID :: EPI273604
>>>>> ##FluData-END##
>>>>>
>>>>> Rather than this, which is what it currently returns:
>>>>>
>>>>> ##FluData-START## EPI_ISOLATE_ID :: EPI_ISL_77637 NAME
>>>>> :: A/California/07/2009 TYPE :: H1N1 Segment_name
>>>>> :: M' HOST_AGE :: 54 HOST_GENDER :: F' PASSAGE
>>>>> :: M1/C1 (2009-04-24) LOCATION :: United States / California'
>>>>> COLLECT_DATE :: 09-Apr-2009 Lineage :: A(H1N1)pdm09
>>>>> RESIST_TO_ADAMANTANES :: Resistant' RESIST_TO_OSELTAMIVIR :: Sensitive'
>>>>> RESIST_TO_ZANAMVIR :: Sensitive' SPECIMEN_ID :: H13596
>>>>> SENDER_LAB :: Naval Health Research Center' SEQLAB_SAMPLE_ID
>>>>> :: 2009712111 EPI_SEQUENCE_ID :: EPI273604 ##FluData-END##
>>>>>
>>>>> Are there any objections to me putting in a pull request with this change? I
>>>>> made this same fix in BioPerl. Of course, if the comment is a “normal” one,
>>>>> it will be treated the same as it is treated now. In other words, the vast
>>>>> majority of comments stay the same.
>>>>>
>>>>> I’ll also add tests.
>>>>>
>>>>> Thanks again,
>>>>>
>>>>> Brian O.
>>>>>
>>>>> _______________________________________________
>>>>> Biopython-dev mailing list
>>>>> Biopython-dev at mailman.open-bio.org
>>>>> http://mailman.open-bio.org/mailman/listinfo/biopython-dev
>>>
>>
>