[Biopython] Issues parsing genbank files

David Martin (Staff) d.m.a.martin at dundee.ac.uk
Tue Oct 3 19:10:12 UTC 2017


Hi Jocelyne,

This is the example given in the Biopython tutorial, with just the id changed. If it is not returning the correct output (a seq_record with a sequence and the full set of features, around 1403) then it is a Biopython issue.


Downloading the entry as GenBank (full) allows it to be opened properly. It is therefore a tutorial issue, as the commands given are not retrieving the full record.
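
For anyone hitting the same thing, here is a minimal sketch of what I believe is the programmatic equivalent of the "GenBank (full)" download: asking efetch for rettype="gbwithparts" instead of rettype="gb". The helper names efetch_kwargs and fetch_record are my own, for illustration only, and are not from the tutorial. I am also assuming NCBI serves the complete record (features plus sequence) for that rettype.

```python
def efetch_kwargs(accession, full=True):
    """Build keyword arguments for Entrez.efetch (hypothetical helper).

    With full=True the request uses rettype="gbwithparts", which is
    assumed here to return the features and sequence; with full=False
    it uses rettype="gb" as in the tutorial example.
    """
    return {
        "db": "nucleotide",
        "id": accession,
        "rettype": "gbwithparts" if full else "gb",
        "retmode": "text",
    }


def fetch_record(accession, email, full=True):
    """Fetch one GenBank record from NCBI and parse it (hypothetical helper)."""
    # Imported here so efetch_kwargs can be inspected without Biopython
    # installed; this function needs Biopython and network access.
    from Bio import Entrez, SeqIO

    Entrez.email = email  # NCBI asks for a contact address
    with Entrez.efetch(**efetch_kwargs(accession, full)) as handle:
        # Searching by a single id should give exactly one record.
        return SeqIO.read(handle, "gb")
```

Calling fetch_record("NC_003197", "A.N.Other@example.com") should then give a seq_record whose feature count can be checked with len(seq_record.features). I have not quoted an expected count here.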


..d


Dr David Martin
Senior Lecturer in Bioinformatics
College of Life Sciences
University of Dundee



________________________________
From: Jocelyne <jocelyne at gmail.com>
Sent: 03 October 2017 19:57
To: David Martin (Staff)
Cc: biopython at lists.open-bio.org
Subject: Re: [Biopython] Issues parsing genbank files

Hi David:
I think if you are searching by id, you should only get 1 record.
The questions you are asking sound to me like Entrez / NCBI database questions, not necessarily Biopython questions. Unless someone else has time to dive into your specific example, I suggest you look at this documentation:
https://www.ncbi.nlm.nih.gov/Class/MLACourse/Original8Hour/Entrez/
https://www.ncbi.nlm.nih.gov/books/NBK25501/
Jocelyne





On Tue, Oct 3, 2017 at 2:39 AM, David Martin (Staff) <d.m.a.martin at dundee.ac.uk> wrote:
Hi folks,

I’m trying to parse some bacterial genomes. I’ve lifted the following code from the Biopython tutorial, but it is not returning the full record.

from Bio import Entrez
from Bio import SeqIO
Entrez.email = "A.N.Other@example.com"
with Entrez.efetch(db="nucleotide", rettype="gb", retmode="text", id="nc_003197") as handle:
    seq_record = SeqIO.read(handle, "gb") #using "gb" as an alias for "genbank"
print("%s with %i features" % (seq_record.id, len(seq_record.features)))

I get one feature instead of the thousands expected.

Trying to extract a single gene, I get a run of Ns instead of sequence.

Thoughts: the record seems to be retrieved initially as a set of annotations with no sequence. Is there a way to ensure Entrez retrieves the full data?

..d

Dr David M A Martin PhD FRSB
Senior Lecturer in Bioinformatics
School of Life Sciences, University of Dundee
+44(0)1382 388704 | d.m.a.martin at dundee.ac.uk


The University of Dundee is a registered Scottish Charity, No: SC015096

_______________________________________________
Biopython mailing list  -  Biopython at mailman.open-bio.org
http://mailman.open-bio.org/mailman/listinfo/biopython

