[Biopython-dev] Genbank structured comments

Fields, Christopher J cjfields at illinois.edu
Wed Sep 9 18:18:08 UTC 2015


Is there any particular standard for these comment types?  I see these within WGS master records all the time (they hold basic metadata on the assembly).

http://www.ncbi.nlm.nih.gov/nuccore/635626163
http://www.ncbi.nlm.nih.gov/nuccore/AFTI00000000.1

(you can’t tell I’m working on a mussel assembly, right?)

chris

On Sep 9, 2015, at 12:08 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

That's ... nasty. How about record.annotations["comment"] for the old style
plain text (as a Python string) and record.annotations["structured_comment"]
as a (sorted) Python dict?

This might make GenBank output easier too...
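
To make the output side concrete, here is a rough Python sketch of writing such a dict back out as an aligned table. The function name format_structured_comment and its indent parameter are hypothetical, not existing Biopython API, and the 12-column indent is an assumption based on the usual GenBank flat file layout.

```python
def format_structured_comment(name, fields, indent=12):
    """Render a dict as aligned "TAG :: value" rows under a COMMENT header.

    Hypothetical helper for illustration only. Assumes `fields` maps tag
    strings to value strings and that rows follow the dict's own order.
    """
    # Pad every tag to the width of the longest one so the "::" line up.
    width = max((len(tag) for tag in fields), default=0)
    rows = ["##%s-START##" % name]
    rows.extend("%s :: %s" % (tag.ljust(width), value)
                for tag, value in fields.items())
    rows.append("##%s-END##" % name)
    # The first row shares its line with the COMMENT keyword; the rest
    # are indented to the same column.
    lines = ["COMMENT".ljust(indent) + rows[0]]
    lines.extend(" " * indent + row for row in rows[1:])
    return "\n".join(lines)


output = format_structured_comment("FluData", {"TYPE": "H1N1", "HOST_AGE": "54"})
print(output)
```

Since Python 3.7 a plain dict keeps insertion order, so the rows come out in the order the tags were added.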

Peter

On Wed, Sep 9, 2015 at 6:01 PM, Brian Osborne <bosborne11 at verizon.net> wrote:
Peter,

That is an interesting idea. What would be returned if the COMMENT has both plain and “structured comments” in it? Here’s one:

http://www.ncbi.nlm.nih.gov/nuccore/FJ966082
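
For a COMMENT that mixes both kinds, one option is to hand back the free text and the tables separately. The Python sketch below shows the idea. The function name split_comment is hypothetical, the sample data is made up, and it assumes the parser has kept the original line breaks.

```python
def split_comment(comment):
    """Return (plain_text, structured) for a multi-line COMMENT string.

    Hypothetical helper for illustration only. Assumes tables are
    delimited by ##<name>-START## and ##<name>-END## lines and that each
    row inside a table looks like "TAG :: value". Rows inside a table
    without "::" (for example wrapped values) are ignored here.
    """
    plain_lines = []
    structured = {}
    current = None  # name of the table being read, or None outside a table
    for line in comment.splitlines():
        stripped = line.strip()
        if stripped.startswith("##") and stripped.endswith("-START##"):
            current = stripped[2:-len("-START##")]
            structured[current] = {}
        elif stripped.startswith("##") and stripped.endswith("-END##"):
            current = None
        elif current is not None and "::" in stripped:
            tag, value = stripped.split("::", 1)
            structured[current][tag.strip()] = value.strip()
        elif current is None and stripped:
            plain_lines.append(stripped)
    return " ".join(plain_lines), structured


# Made-up input, not taken from a real record.
mixed = "\n".join([
    "Free text before the table.",
    "##Example-Data-START##",
    "Tag_A :: value one",
    "Tag_B :: value two",
    "##Example-Data-END##",
    "Free text after the table.",
])
plain, structured = split_comment(mixed)
```

Here plain holds only the two free-text sentences joined with a space, and structured holds one inner dict per table, keyed by the table name.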

Thanks again,

Brian O.




On Sep 9, 2015, at 7:27 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

This sounds good - would you turn these into a Python dict?

Peter

On Wed, Sep 9, 2015 at 2:56 PM, Brian Osborne <bosborne11 at verizon.net> wrote:
All,

I noticed that Biopython, like the versions of BioPerl in CPAN, does not
handle GenBank structured comments
(http://www.ncbi.nlm.nih.gov/genbank/structuredcomment) in the ideal way.
Here’s an example structured comment:

COMMENT     ##FluData-START##
         EPI_ISOLATE_ID        :: EPI_ISL_77637
         NAME                  :: A/California/07/2009
         TYPE                  :: H1N1
         Segment_name          :: M'
         HOST_AGE              :: 54
         HOST_GENDER           :: F'
         PASSAGE               :: M1/C1 (2009-04-24)
         LOCATION              :: United States / California'
         COLLECT_DATE          :: 09-Apr-2009
         Lineage               :: A(H1N1)pdm09
         RESIST_TO_ADAMANTANES :: Resistant'
         RESIST_TO_OSELTAMIVIR :: Sensitive'
         RESIST_TO_ZANAMVIR    :: Sensitive'
         SPECIMEN_ID           :: H13596
         SENDER_LAB            :: Naval Health Research Center'
         SEQLAB_SAMPLE_ID      :: 2009712111
         EPI_SEQUENCE_ID       :: EPI273604
         ##FluData-END##

Or here: http://www.ncbi.nlm.nih.gov/nuccore/291609868

A table, with tag/value pairs. A fair number of bacterial genomes in GenBank
use the structured comment to hold MIGS/MIMS data. The comment() method
should return something like this, which is easily parsed:

##FluData-START##
EPI_ISOLATE_ID        :: EPI_ISL_77637
NAME                  :: A/California/07/2009
TYPE                  :: H1N1
Segment_name          :: M'
HOST_AGE              :: 54
HOST_GENDER           :: F'
PASSAGE               :: M1/C1 (2009-04-24)
LOCATION              :: United States / California'
COLLECT_DATE          :: 09-Apr-2009
Lineage               :: A(H1N1)pdm09
RESIST_TO_ADAMANTANES :: Resistant'
RESIST_TO_OSELTAMIVIR :: Sensitive'
RESIST_TO_ZANAMVIR    :: Sensitive'
SPECIMEN_ID           :: H13596
SENDER_LAB            :: Naval Health Research Center'
SEQLAB_SAMPLE_ID      :: 2009712111
EPI_SEQUENCE_ID       :: EPI273604
##FluData-END##

Rather than this, which is what it currently returns:

##FluData-START## EPI_ISOLATE_ID        :: EPI_ISL_77637 NAME
:: A/California/07/2009 TYPE                  :: H1N1 Segment_name
:: M' HOST_AGE              :: 54 HOST_GENDER           :: F' PASSAGE
:: M1/C1 (2009-04-24) LOCATION              :: United States / California'
COLLECT_DATE          :: 09-Apr-2009 Lineage               :: A(H1N1)pdm09
RESIST_TO_ADAMANTANES :: Resistant' RESIST_TO_OSELTAMIVIR :: Sensitive'
RESIST_TO_ZANAMVIR    :: Sensitive' SPECIMEN_ID           :: H13596
SENDER_LAB            :: Naval Health Research Center' SEQLAB_SAMPLE_ID
:: 2009712111 EPI_SEQUENCE_ID       :: EPI273604 ##FluData-END##
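
To illustrate why the line-preserving form is the easier one to work with, here is a minimal Python sketch that turns it into a dict. The function name parse_structured_comment is hypothetical (it is not part of Biopython or BioPerl), and it assumes one "TAG :: value" pair per line between the START and END markers.

```python
def parse_structured_comment(text):
    """Parse one structured comment block into (name, fields).

    Hypothetical helper for illustration only. Assumes the line breaks
    are intact and every row has the form "TAG :: value".
    """
    name = None
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("##") and line.endswith("-START##"):
            # "##FluData-START##" -> "FluData"
            name = line[2:-len("-START##")]
        elif line.startswith("##") and line.endswith("-END##"):
            break
        elif name is not None and "::" in line:
            # Split on the first "::" only, in case a value contains one.
            tag, value = line.split("::", 1)
            fields[tag.strip()] = value.strip()
    return name, fields


# A few rows from the example in this message.
example = """\
##FluData-START##
EPI_ISOLATE_ID        :: EPI_ISL_77637
NAME                  :: A/California/07/2009
TYPE                  :: H1N1
##FluData-END##
"""

name, fields = parse_structured_comment(example)
print(name, fields["TYPE"])
```

With the flattened form, the same split is ambiguous: once the line breaks are gone, nothing marks where a value with spaces ends and the next tag begins.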

Are there any objections to me putting in a pull request with this change? I
made this same fix in BioPerl. Of course, if the comment is a “normal” one,
it will be treated the same as it is treated now. In other words, the vast
majority of comments stay the same.

I’ll also add tests.

Thanks again,

Brian O.

_______________________________________________
Biopython-dev mailing list
Biopython-dev at mailman.open-bio.org
http://mailman.open-bio.org/mailman/listinfo/biopython-dev

