[Biopython] Issues parsing genbank files

Jocelyne jocelyne at gmail.com
Tue Oct 3 20:23:38 UTC 2017


Hi David:
If you are sure it's a bug, you should file an issue on the GitHub
project so that a contributor can take a look. Peter Cock is usually very
responsive.

That said, I submitted your query to Entrez directly:
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=nc_003197&rettype=gb&retmode=text
and attached the file I got.
It also contains only 1 feature.
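
One quick way to see what came back is to inspect the flat file directly.
The sketch below is illustrative only: it uses the standard library rather
than Biopython, `summarize_genbank` is a made-up helper name, and the sample
text is a hand-written stand-in, not the contents of the attached file. It
counts feature keys in the FEATURES table and reports whether the record
carries a CONTIG line rather than an ORIGIN sequence block.

```python
import re

# Feature keys sit after exactly five leading spaces in the FEATURES table;
# qualifier lines are indented further, so they do not match.
FEATURE_KEY = re.compile(r"^ {5}(\S+)\s+\S")


def summarize_genbank(text):
    """Return (feature_count, has_contig, has_origin) for one flat-file record."""
    in_features = False
    feature_count = 0
    has_contig = False
    has_origin = False
    for line in text.splitlines():
        if line.startswith("FEATURES"):
            in_features = True
            continue
        if line.startswith("CONTIG"):
            has_contig = True
            in_features = False
        elif line.startswith("ORIGIN"):
            has_origin = True
            in_features = False
        elif in_features and FEATURE_KEY.match(line):
            feature_count += 1
    return feature_count, has_contig, has_origin


# Hand-written stand-in for a short record (hypothetical, heavily abridged).
SAMPLE = """\
LOCUS       EXAMPLE              100 bp    DNA     circular CON 01-JAN-2000
FEATURES             Location/Qualifiers
     source          1..100
                     /organism="Example organism"
CONTIG      join(EXAMPLE_PART.1:1..100)
//
"""

print(summarize_genbank(SAMPLE))  # (1, True, False)
```

A result with one feature, a CONTIG line and no ORIGIN block would suggest
that the server sent an abbreviated record, rather than the parser dropping
anything.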

I believe genes are in a different database (the 'gene' database) and
you'll have to do the proper querying through eutils.
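
As a starting point for that querying, here is a minimal sketch that rebuilds
the efetch URL above from its parameters, using only the standard library.
`efetch_url` is a hypothetical helper, not part of Biopython or of NCBI's
tooling. The second call swaps in `rettype="gbwithparts"`, which NCBI's
efetch documentation lists for GenBank flat files with the full sequence;
whether that alone brings back the missing features for this record is an
untested assumption here.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def efetch_url(db, record_id, rettype="gb", retmode="text"):
    """Build an efetch URL; parameter order matches the URL quoted above."""
    params = {"db": db, "id": record_id, "rettype": rettype, "retmode": retmode}
    return EUTILS_BASE + "efetch.fcgi?" + urlencode(params)


# Same request as the one submitted by hand.
print(efetch_url("nucleotide", "nc_003197"))

# Assumption: ask for the record with its parts expanded instead.
print(efetch_url("nucleotide", "nc_003197", rettype="gbwithparts"))
```

With Biopython, the equivalent change would be passing
`rettype="gbwithparts"` to `Entrez.efetch` in place of `rettype="gb"`.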

I'm not a developer on Biopython, and I didn't look into your issue closely
so I could be wrong. Just trying to give you pointers.

Jocelyne


On Tue, Oct 3, 2017 at 12:12 PM, David Martin (Staff) <
d.m.a.martin at dundee.ac.uk> wrote:

> Hi Jocelyne,
>
>
> Firstly apologies for missing the 'e' in your name before.
>
>
> The record being retrieved is a single sequence record - it is
> a bacterial chromosome. It should have many features, most corresponding to
> genes encoded within the chromosome.
>
>
> ..d
>
>
> Dr David Martin
> Senior Lecturer in Bioinformatics
> College of Life Sciences
> University of Dundee
>
>
>
> ------------------------------
> *From:* Jocelyne <jocelyne at gmail.com>
> *Sent:* 03 October 2017 19:57
> *To:* David Martin (Staff)
> *Cc:* biopython at lists.open-bio.org
> *Subject:* Re: [Biopython] Issues parsing genbank files
>
> Hi David:
> I think if you are searching by id, you should only get 1 record.
> The questions you are asking sound to me like Entrez / NCBI database
> questions, not necessarily Biopython questions. Unless someone else has
> time to dive into your specific example, I suggest you look at this
> documentation:
> https://www.ncbi.nlm.nih.gov/Class/MLACourse/Original8Hour/Entrez/
>
>
> https://www.ncbi.nlm.nih.gov/books/NBK25501/
> Jocelyne
>
>
>
>
>
> On Tue, Oct 3, 2017 at 2:39 AM, David Martin (Staff) <
> d.m.a.martin at dundee.ac.uk> wrote:
>
>> Hi folks,
>>
>>
>>
>> I’m trying to parse some bacterial genomes. I’ve lifted the following
>> code from the Biopython tutorial, but it is not behaving as expected.
>>
>>
>>
>> from Bio import Entrez
>> from Bio import SeqIO
>>
>> Entrez.email = "A.N.Other@example.com"
>> with Entrez.efetch(db="nucleotide", rettype="gb", retmode="text",
>>                    id="nc_003197") as handle:
>>     # "gb" is an alias for "genbank"
>>     seq_record = SeqIO.read(handle, "gb")
>> print("%s with %i features" % (seq_record.id, len(seq_record.features)))
>>
>>
>>
>> I get one feature instead of the thousands expected.
>>
>>
>>
>> When I try to extract a single gene, I get a run of Ns instead of sequence.
>>
>>
>>
>> Thoughts: the record seems to be retrieved as a set of annotations with no
>> sequence. Is there a way to ensure Entrez retrieves the full data?
>>
>>
>>
>> ..d
>>
>> *Dr David M A Martin PhD FRSB*
>> Senior Lecturer in Bioinformatics
>> School of Life Sciences, University of Dundee
>> +44(0)1382 388704 | d.m.a.martin at dundee.ac.uk
>>
>> The University of Dundee is a registered Scottish Charity, No: SC015096
>>
>> _______________________________________________
>> Biopython mailing list  -  Biopython at mailman.open-bio.org
>> http://mailman.open-bio.org/mailman/listinfo/biopython
>>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sequence.gb
Type: application/octet-stream
Size: 4523 bytes
Desc: not available
URL: <http://mailman.open-bio.org/pipermail/biopython/attachments/20171003/d5cb1c2e/attachment-0001.obj>

