[Biopython] Issues parsing genbank files

David Martin (Staff) d.m.a.martin at dundee.ac.uk
Tue Oct 3 20:56:58 UTC 2017


OK, I've had a play. The return type (rettype) needs to be "gbwithparts" rather than "gb".


Perhaps the tutorial could be updated to reflect this?


from Bio import Entrez
from Bio import SeqIO

Entrez.email = "A.N.Other@example.com"

with Entrez.efetch(db="nucleotide", rettype="gbwithparts", retmode="text", id="nc_003197") as handle:
    seq_record = SeqIO.read(handle, "gb")  # using "gb" as an alias for "genbank"

print("%s with %i features" % (seq_record.id, len(seq_record.features)))


Dr David Martin
Senior Lecturer in Bioinformatics
College of Life Sciences
University of Dundee



________________________________
From: Jocelyne <jocelyne at gmail.com>
Sent: 03 October 2017 21:33
To: David Martin (Staff)
Cc: biopython at lists.open-bio.org
Subject: Re: [Biopython] Issues parsing genbank files

Hi David:

All Biopython does is call the E-utilities interface. The links I gave you earlier should be a good starting point on how to use E-utilities to build the correct query.

Jocelyne

On Tue, Oct 3, 2017 at 1:27 PM, David Martin (Staff) <d.m.a.martin at dundee.ac.uk> wrote:

If you enter the accession on the NCBI website, the default GenBank view is the same record that your query returns. The full record, however, is only shown in the GenBank (full) view.


The question, then, is what the correct syntax is for the Entrez.efetch() command to retrieve the full record. It is also worth noting that the example given in the tutorial will not retrieve the full record.


..d


Dr David Martin
Senior Lecturer in Bioinformatics
College of Life Sciences
University of Dundee



________________________________
From: Jocelyne <jocelyne at gmail.com>
Sent: 03 October 2017 21:23
To: David Martin (Staff)
Cc: biopython at lists.open-bio.org
Subject: Re: [Biopython] Issues parsing genbank files

Hi David:
If you are sure it's a Biopython problem, you should file an issue on the GitHub project so that a contributor can take a look. Peter Cock is usually very responsive.

However, I submitted your query to Entrez:
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=nc_003197&rettype=gb&retmode=text
and attached the file I got.
I only got 1 feature.
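
For reference, the query URL above can be assembled with the Python standard library. This is only a sketch: the helper name efetch_url is made up for illustration, and the "gbwithparts" value is taken from the fix reported at the top of this thread rather than from anything checked here.

```python
from urllib.parse import urlencode

# Base address of the efetch endpoint, as it appears in the URL above.
EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(db, record_id, rettype, retmode="text"):
    """Build an efetch query URL (hypothetical helper, not part of Biopython)."""
    query = urlencode(
        {"db": db, "id": record_id, "rettype": rettype, "retmode": retmode}
    )
    return f"{EFETCH_BASE}?{query}"


# The query that returned a single feature.
print(efetch_url("nucleotide", "nc_003197", "gb"))
# The same query with the rettype reported to return the full record.
print(efetch_url("nucleotide", "nc_003197", "gbwithparts"))
```

Pasting either URL into a browser makes it easy to compare what NCBI returns for each rettype before involving any parser.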

I believe genes are in a different database (the 'gene' database) and you'll have to do the proper querying through eutils.

I'm not a developer on Biopython, and I didn't look into your issue closely so I could be wrong. Just trying to give you pointers.

Jocelyne


On Tue, Oct 3, 2017 at 12:12 PM, David Martin (Staff) <d.m.a.martin at dundee.ac.uk> wrote:

Hi Jocelyne,


Firstly apologies for missing the 'e' in your name before.


The record being retrieved is a single sequence record: it is a bacterial chromosome. It should have many features, most corresponding to genes encoded within the chromosome.


..d


Dr David Martin
Senior Lecturer in Bioinformatics
College of Life Sciences
University of Dundee



________________________________
From: Jocelyne <jocelyne at gmail.com>
Sent: 03 October 2017 19:57
To: David Martin (Staff)
Cc: biopython at lists.open-bio.org
Subject: Re: [Biopython] Issues parsing genbank files

Hi David:
I think if you are searching by ID, you should only get 1 record.
The questions you are asking sound to me like Entrez/NCBI database questions, not necessarily Biopython questions. Unless someone else has time to dive into your specific example, I suggest you look at this documentation:
https://www.ncbi.nlm.nih.gov/Class/MLACourse/Original8Hour/Entrez/
https://www.ncbi.nlm.nih.gov/books/NBK25501/
Jocelyne





On Tue, Oct 3, 2017 at 2:39 AM, David Martin (Staff) <d.m.a.martin at dundee.ac.uk> wrote:
Hi folks,

I’m trying to parse some bacterial genomes. I’ve lifted the following code from the Biopython tutorial, but it is not behaving as expected.

from Bio import Entrez
from Bio import SeqIO
Entrez.email = "A.N.Other@example.com"
with Entrez.efetch(db="nucleotide", rettype="gb", retmode="text", id="nc_003197") as handle:
    seq_record = SeqIO.read(handle, "gb") #using "gb" as an alias for "genbank"
print("%s with %i features" % (seq_record.id, len(seq_record.features)))

I get one feature instead of the thousands expected.

When I try to extract a single gene, I get a run of Ns instead of sequence.

Thoughts: This is initially retrieved as a set of annotations but no sequence. Is there a way to ensure Entrez retrieves the full data?
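
One way to look at this symptom without going through Biopython is to inspect the flat file text directly. The sketch below counts feature-table entries and reports whether the record has an ORIGIN block (inline sequence) or only a CONTIG line. It assumes the short record carries a CONTIG line in place of sequence, which is an assumption and is not confirmed anywhere in this thread; the helper name summarize_genbank and the sample record are made up for illustration.

```python
import re

# A feature key starts in column 6 of the feature table; qualifier lines
# are indented further, so they do not match this pattern.
FEATURE_KEY = re.compile(r"^ {5}(\S+)\s+\S")

# Made-up miniature record, shaped like a summary-style GenBank entry.
SAMPLE_RECORD = """\
LOCUS       EXAMPLE             4857450 bp    DNA     circular CON 01-JAN-2000
FEATURES             Location/Qualifiers
     source          1..4857450
                     /organism="Example organism"
CONTIG      join(EXAMPLE_PART.1:1..4857450)
//
"""


def summarize_genbank(text):
    """Count feature entries and note which sequence sections are present."""
    in_features = False
    feature_count = 0
    has_origin = False
    has_contig = False
    for line in text.splitlines():
        if line.startswith("FEATURES"):
            in_features = True
        elif line.startswith("ORIGIN"):
            in_features = False
            has_origin = True
        elif line.startswith("CONTIG"):
            in_features = False
            has_contig = True
        elif in_features and FEATURE_KEY.match(line):
            feature_count += 1
    return {
        "features": feature_count,
        "has_origin": has_origin,
        "has_contig": has_contig,
    }


print(summarize_genbank(SAMPLE_RECORD))
```

If a downloaded file reports a single feature and no ORIGIN block, the problem lies in what was fetched rather than in how it was parsed.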

..d

Dr David M A Martin PhD FRSB
Senior Lecturer in Bioinformatics
School of Life Sciences, University of Dundee
+44(0)1382 388704 | d.m.a.martin at dundee.ac.uk


The University of Dundee is a registered Scottish Charity, No: SC015096

_______________________________________________
Biopython mailing list  -  Biopython at mailman.open-bio.org
http://mailman.open-bio.org/mailman/listinfo/biopython

