[Biopython-dev] Genbank structured comments

Brian Osborne bosborne11 at verizon.net
Wed Sep 9 17:01:37 UTC 2015


Peter,

That is an interesting idea. What would be returned if the COMMENT has both plain and “structured comments” in it? Here’s one:

http://www.ncbi.nlm.nih.gov/nuccore/FJ966082
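
To make the question concrete, here is a rough sketch of one possible answer: return the plain text and the structured tables separately, with each table as a dict keyed by the name found between "##" and "-START##". This is only an illustration under those assumptions. The split_comment name and the free-text example line are hypothetical and nothing here is an existing Biopython function.

```python
import re

# Hypothetical helper for discussion only; not part of Biopython's API.
START = re.compile(r"^##(.+)-START##$")
END = re.compile(r"^##(.+)-END##$")


def split_comment(lines):
    """Split COMMENT lines into (plain_text, structured_tables).

    structured_tables maps a table name (e.g. "FluData") to a dict of
    tag/value pairs. Lines outside any START/END block, and lines inside
    a block that lack a "::" separator, are kept as plain text.
    """
    plain = []
    structured = {}
    current = None  # name of the table being read, if any
    for raw in lines:
        line = raw.strip()
        match = START.match(line)
        if match:
            current = match.group(1)
            structured[current] = {}
            continue
        if current is not None and END.match(line):
            current = None
            continue
        if current is not None and "::" in line:
            key, value = line.split("::", 1)
            structured[current][key.strip()] = value.strip()
        else:
            plain.append(line)
    return "\n".join(plain), structured


# The first line is made-up free text; the rest is taken from the
# FluData example quoted below.
example = [
    "Free-text remark before the table.",
    "##FluData-START##",
    "EPI_ISOLATE_ID        :: EPI_ISL_77637",
    "TYPE                  :: H1N1",
    "##FluData-END##",
]
text, tables = split_comment(example)
```

With something like this, a record that mixes both kinds of comment would give back the free text unchanged plus one dict per structured block, and a record with only a "normal" comment would give back an empty dict.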

Thanks again,

Brian O.




> On Sep 9, 2015, at 7:27 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> 
> This sounds good - would you turn these into a Python dict?
> 
> Peter
> 
> On Wed, Sep 9, 2015 at 2:56 PM, Brian Osborne <bosborne11 at verizon.net> wrote:
>> All,
>> 
>> I noticed that Biopython, like the versions of BioPerl in CPAN, does not
>> handle GenBank structured comments
>> (http://www.ncbi.nlm.nih.gov/genbank/structuredcomment) in the ideal way.
>> Here’s an example structured comment:
>> 
>> COMMENT     ##FluData-START##
>>           EPI_ISOLATE_ID        :: EPI_ISL_77637
>>           NAME                  :: A/California/07/2009
>>           TYPE                  :: H1N1
>>           Segment_name          :: M'
>>           HOST_AGE              :: 54
>>           HOST_GENDER           :: F'
>>           PASSAGE               :: M1/C1 (2009-04-24)
>>           LOCATION              :: United States / California'
>>           COLLECT_DATE          :: 09-Apr-2009
>>           Lineage               :: A(H1N1)pdm09
>>           RESIST_TO_ADAMANTANES :: Resistant'
>>           RESIST_TO_OSELTAMIVIR :: Sensitive'
>>           RESIST_TO_ZANAMVIR    :: Sensitive'
>>           SPECIMEN_ID           :: H13596
>>           SENDER_LAB            :: Naval Health Research Center'
>>           SEQLAB_SAMPLE_ID      :: 2009712111
>>           EPI_SEQUENCE_ID       :: EPI273604
>>           ##FluData-END##
>> 
>> Or here: http://www.ncbi.nlm.nih.gov/nuccore/291609868
>> 
>> A structured comment is a table of tag/value pairs. A fair number of
>> bacterial genomes in GenBank use the structured comment to hold MIGS/MIMS
>> data. The comment() method should return something like this, with one
>> tag/value pair per line, which is easily parsed:
>> 
>> ##FluData-START##
>> EPI_ISOLATE_ID        :: EPI_ISL_77637
>> NAME                  :: A/California/07/2009
>> TYPE                  :: H1N1
>> Segment_name          :: M'
>> HOST_AGE              :: 54
>> HOST_GENDER           :: F'
>> PASSAGE               :: M1/C1 (2009-04-24)
>> LOCATION              :: United States / California'
>> COLLECT_DATE          :: 09-Apr-2009
>> Lineage               :: A(H1N1)pdm09
>> RESIST_TO_ADAMANTANES :: Resistant'
>> RESIST_TO_OSELTAMIVIR :: Sensitive'
>> RESIST_TO_ZANAMVIR    :: Sensitive'
>> SPECIMEN_ID           :: H13596
>> SENDER_LAB            :: Naval Health Research Center'
>> SEQLAB_SAMPLE_ID      :: 2009712111
>> EPI_SEQUENCE_ID       :: EPI273604
>> ##FluData-END##
>> 
>> Rather than this, which is what it currently returns (all the lines joined
>> into a single string, so the row boundaries are lost):
>> 
>> ##FluData-START## EPI_ISOLATE_ID        :: EPI_ISL_77637 NAME
>> :: A/California/07/2009 TYPE                  :: H1N1 Segment_name
>> :: M' HOST_AGE              :: 54 HOST_GENDER           :: F' PASSAGE
>> :: M1/C1 (2009-04-24) LOCATION              :: United States / California'
>> COLLECT_DATE          :: 09-Apr-2009 Lineage               :: A(H1N1)pdm09
>> RESIST_TO_ADAMANTANES :: Resistant' RESIST_TO_OSELTAMIVIR :: Sensitive'
>> RESIST_TO_ZANAMVIR    :: Sensitive' SPECIMEN_ID           :: H13596
>> SENDER_LAB            :: Naval Health Research Center' SEQLAB_SAMPLE_ID
>> :: 2009712111 EPI_SEQUENCE_ID       :: EPI273604 ##FluData-END##
>> 
>> Are there any objections to me putting in a pull request with this change? I
>> made this same fix in BioPerl. Of course, if the comment is a “normal” one,
>> it will be treated the same as it is treated now. In other words, the vast
>> majority of comments stay the same.
>> 
>> I’ll also add tests.
>> 
>> Thanks again,
>> 
>> Brian O.
>> 
>> _______________________________________________
>> Biopython-dev mailing list
>> Biopython-dev at mailman.open-bio.org
>> http://mailman.open-bio.org/mailman/listinfo/biopython-dev



