[Biopython] Issues parsing genbank files

Kevin Bonham kevbonham at gmail.com
Tue Oct 3 21:36:33 UTC 2017


Hi David,

Unfortunately, this is an Entrez question rather than a Biopython question. I
feel your pain, but blame NCBI for their often confusing choices and poor
documentation.

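Check out this table of valid rettype and retmode values for EFetch:
<https://www.ncbi.nlm.nih.gov/books/NBK25499/table/chapter4.T._valid_values_of__retmode_and/?report=objectonly>
For a record this large, rettype="gb" appears to return only a summary
(no full feature table or sequence), whereas "gbwithparts" includes both.
I think what you want is db="nuccore" and rettype="gbwithparts":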

> from Bio import Entrez, SeqIO
> Entrez.email = "A.N.Other@example.com"  # NCBI asks for a contact address
> with Entrez.efetch(db="nuccore", rettype="gbwithparts", retmode="text", id="NC_003197") as handle:
>     seq_record = SeqIO.read(handle, "gb")  # using "gb" as an alias for "genbank"
> print("%s with %i features" % (seq_record.id, len(seq_record.features)))

NC_003197.2 with 14045 features
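
If you want to script this, here is a minimal sketch of the same idea with a
sanity check for the symptoms David describes (a single feature, or a run of
Ns). The helper names efetch_params, looks_truncated and fetch_genbank are
made up for illustration and are not part of Biopython; only Entrez.efetch
and SeqIO.read are real library calls. The "one feature or all Ns" test is a
rough heuristic of mine, not an NCBI rule.

```python
def efetch_params(accession, full_record=True):
    """Build keyword arguments for Bio.Entrez.efetch.

    full_record=True asks for rettype="gbwithparts"; False asks for the
    plain "gb" rettype used in the original question.
    """
    return {
        "db": "nuccore",
        "id": accession,
        "rettype": "gbwithparts" if full_record else "gb",
        "retmode": "text",
    }


def looks_truncated(n_features, seq_text):
    """Heuristic: a lone feature or an all-N sequence suggests a stub record."""
    all_n = bool(seq_text) and set(seq_text.upper()) == {"N"}
    return n_features <= 1 or all_n


def fetch_genbank(accession, email, full_record=True):
    """Fetch one GenBank record from NCBI (needs Biopython and network access)."""
    # Imported here so the two helpers above can be used offline.
    from Bio import Entrez, SeqIO

    Entrez.email = email  # NCBI asks for a contact address
    with Entrez.efetch(**efetch_params(accession, full_record)) as handle:
        return SeqIO.read(handle, "gb")
```

Calling fetch_genbank("NC_003197", "A.N.Other@example.com") and passing
len(record.features) plus a short stretch of sequence text to looks_truncated
should flag a record fetched with the wrong rettype. Depending on the
Biopython version, converting an undefined sequence to text may raise an
error rather than return Ns, so treat that as the same symptom.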

Cheers!
Kevin


On Tue, Oct 3, 2017 at 4:30 PM David Martin (Staff) <
d.m.a.martin at dundee.ac.uk> wrote:

> Hi Jocelyne,
>
>
> Firstly apologies for missing the 'e' in your name before.
>
>
> The record being retrieved is a single sequence record - it is
> a bacterial chromosome. It should have many features, most corresponding to
> genes encoded within the chromosome.
>
>
> ..d
>
>
> Dr David Martin
> Senior Lecturer in Bioinformatics
> College of Life Sciences
> University of Dundee
>
>
>
> ------------------------------
> *From:* Jocelyne <jocelyne at gmail.com>
> *Sent:* 03 October 2017 19:57
> *To:* David Martin (Staff)
> *Cc:* biopython at lists.open-bio.org
> *Subject:* Re: [Biopython] Issues parsing genbank files
>
> Hi David:
> I think if you are searching by id, you should only get 1 record.
> The questions you are asking sound to me like Entrez / NCBI databases
> questions, not necessarily Biopython questions. Unless someone else has
> time to dive into your specific example, I suggest you look at this
> documentation:
> https://www.ncbi.nlm.nih.gov/Class/MLACourse/Original8Hour/Entrez/
>
>
> https://www.ncbi.nlm.nih.gov/books/NBK25501/
> Jocelyne
>
>
>
>
>
> On Tue, Oct 3, 2017 at 2:39 AM, David Martin (Staff) <
> d.m.a.martin at dundee.ac.uk> wrote:
>
>> Hi folks,
>>
>>
>>
>> I’m trying to parse some bacterial genomes. I’ve lifted the following
>> code from the biopython tutorial but it seems to be giving issues.
>>
>>
>>
>> from Bio import Entrez
>> from Bio import SeqIO
>> Entrez.email = "A.N.Other@example.com"
>> with Entrez.efetch(db="nucleotide", rettype="gb", retmode="text", id="nc_003197") as handle:
>>     seq_record = SeqIO.read(handle, "gb")  # using "gb" as an alias for "genbank"
>> print("%s with %i features" % (seq_record.id, len(seq_record.features)))
>>
>>
>>
>> I get one feature instead of the thousands expected.
>>
>>
>>
>> When I try to extract a single gene, I get a run of Ns instead of sequence.
>>
>>
>>
>> Thoughts: The record seems to be retrieved initially as a set of
>> annotations with no sequence. Is there a way to ensure Entrez retrieves
>> the full data?
>>
>>
>>
>> ..d
>>
>> *Dr David M A Martin PhD FRSB*
>> Senior Lecturer in Bioinformatics
>> School of Life Sciences, University of Dundee
>> +44(0)1382 388704 | d.m.a.martin at dundee.ac.uk
>>
>>
>> The University of Dundee is a registered Scottish Charity, No: SC015096
>>
>> _______________________________________________
>> Biopython mailing list  -  Biopython at mailman.open-bio.org
>> http://mailman.open-bio.org/mailman/listinfo/biopython
>>
>
>