[Biopython] Issues parsing genbank files

Jocelyne jocelyne at gmail.com
Tue Oct 3 20:33:31 UTC 2017


Hi David:

All Biopython does is call the E-utilities interface. The links I gave you
earlier should be a good starting point for learning how to use E-utilities to
build the correct query.

Jocelyne
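Because Biopython's Entrez module only builds and sends E-utilities requests, the difference between the standard and full GenBank views most likely comes down to the `rettype` parameter passed to EFetch. The sketch below uses only the Python standard library to build the same kind of EFetch URL Jocelyne tested. It assumes the `gbwithparts` rettype, which the NCBI EFetch documentation lists for nucleotide records as the GenBank flat file with full sequence; `efetch_url` and `fetch_genbank_text` are hypothetical helper names, not part of Biopython.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base EFetch endpoint, as used in the query URL earlier in this thread.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession, rettype="gbwithparts", db="nucleotide", email=None):
    """Build an EFetch URL (hypothetical helper).

    Assumption: rettype="gbwithparts" asks for the full GenBank record,
    while rettype="gb" may return a record without features or sequence.
    """
    params = {"db": db, "id": accession, "rettype": rettype, "retmode": "text"}
    if email:
        # NCBI asks clients to identify themselves with an email address.
        params["email"] = email
    return EFETCH + "?" + urlencode(params)


def fetch_genbank_text(accession, rettype="gbwithparts", email=None):
    """Download the flat-file text over the network (not called here)."""
    with urlopen(efetch_url(accession, rettype, email=email)) as response:
        return response.read().decode("utf-8")


print(efetch_url("NC_003197"))
print(efetch_url("NC_003197", rettype="gb"))
```

With Biopython, the equivalent change would presumably be to pass `rettype="gbwithparts"` instead of `rettype="gb"` to `Entrez.efetch()`, keeping `retmode="text"` and parsing with `SeqIO.read(handle, "genbank")`. Treat this as an assumption to check against the current tutorial rather than a confirmed fix.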

On Tue, Oct 3, 2017 at 1:27 PM, David Martin (Staff) <
d.m.a.martin at dundee.ac.uk> wrote:

> If you enter the accession on the NCBI website, the standard GenBank view
> shows the same record that your query returned. The full record, however,
> is only available in the GenBank (full) view.
>
>
> The question, then, is what syntax to use with Entrez.efetch() to retrieve
> the full record. Note also that the example given in the tutorial will not
> retrieve the full record.
>
>
> ..d
>
>
> Dr David Martin
> Senior Lecturer in Bioinformatics
> College of Life Sciences
> University of Dundee
>
>
>
> ------------------------------
> *From:* Jocelyne <jocelyne at gmail.com>
> *Sent:* 03 October 2017 21:23
> *To:* David Martin (Staff)
> *Cc:* biopython at lists.open-bio.org
> *Subject:* Re: [Biopython] Issues parsing genbank files
>
> Hi David:
> If you are sure it's a bug, you should open an issue on the GitHub
> project so that a contributor can take a look. Peter Cock is usually very
> responsive.
>
> However, I submitted your query to Entrez:
> https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=nc_003197&rettype=gb&retmode=text
> and attached the file I got. It contains only 1 feature.
>
> I believe genes are stored in a different database (the 'gene' database),
> so you'll have to query that properly through E-utilities.
>
> I'm not a Biopython developer, and I didn't look into your issue closely,
> so I could be wrong. I'm just trying to give you pointers.
>
> Jocelyne
>
>
> On Tue, Oct 3, 2017 at 12:12 PM, David Martin (Staff) <
> d.m.a.martin at dundee.ac.uk> wrote:
>
>> Hi Jocelyne,
>>
>>
>> Firstly, apologies for missing the 'e' in your name earlier.
>>
>>
>> The record being retrieved is a single sequence record: a bacterial
>> chromosome. It should have many features, most of them corresponding to
>> genes encoded on the chromosome.
>>
>>
>> ..d
>>
>> ------------------------------
>> *From:* Jocelyne <jocelyne at gmail.com>
>> *Sent:* 03 October 2017 19:57
>> *To:* David Martin (Staff)
>> *Cc:* biopython at lists.open-bio.org
>> *Subject:* Re: [Biopython] Issues parsing genbank files
>>
>> Hi David:
>> I think if you are searching by id, you should only get 1 record.
>> The questions you are asking sound to me like questions about Entrez and
>> the NCBI databases, not necessarily Biopython questions. Unless someone
>> else has time to dive into your specific example, I suggest you look at
>> this documentation:
>> https://www.ncbi.nlm.nih.gov/Class/MLACourse/Original8Hour/Entrez/
>>
>>
>> https://www.ncbi.nlm.nih.gov/books/NBK25501/
>> Jocelyne
>>
>>
>>
>>
>>
>> On Tue, Oct 3, 2017 at 2:39 AM, David Martin (Staff) <
>> d.m.a.martin at dundee.ac.uk> wrote:
>>
>>> Hi folks,
>>>
>>>
>>>
>>> I’m trying to parse some bacterial genomes. I’ve taken the following
>>> code from the Biopython tutorial, but it doesn’t behave as expected.
>>>
>>>
>>>
>>> from Bio import Entrez
>>> from Bio import SeqIO
>>>
>>> Entrez.email = "A.N.Other@example.com"
>>> with Entrez.efetch(db="nucleotide", rettype="gb", retmode="text",
>>>                    id="nc_003197") as handle:
>>>     seq_record = SeqIO.read(handle, "gb")  # using "gb" as an alias for "genbank"
>>> print("%s with %i features" % (seq_record.id, len(seq_record.features)))
>>>
>>>
>>>
>>> I get one feature instead of the thousands expected.
>>>
>>>
>>>
>>> When I try to extract a single gene, I get a run of Ns instead of the sequence.
>>>
>>>
>>>
>>> It looks as though the record is retrieved as a set of annotations with
>>> no sequence. Is there a way to ensure Entrez retrieves the full data?
>>>
>>>
>>>
>>> ..d
>>>
>>> *Dr David M A Martin PhD FRSB*
>>> Senior Lecturer in Bioinformatics
>>> School of Life Sciences, University of Dundee
>>> +44(0)1382 388704 | d.m.a.martin at dundee.ac.uk
>>>
>>> The University of Dundee is a registered Scottish Charity, No: SC015096
>>>
>>> _______________________________________________
>>> Biopython mailing list  -  Biopython at mailman.open-bio.org
>>> http://mailman.open-bio.org/mailman/listinfo/biopython
>>>
>>
>>
>
>
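David's symptoms (a single feature, and a run of Ns when extracting a gene) look consistent with a record that carries a CONTIG line in place of an ORIGIN sequence block; this is an assumption, not something confirmed in the thread. As a rough check on whatever text EFetch returns, the hypothetical helper below scans a GenBank flat file, counts the entries in the FEATURES table and reports whether sequence data is present. It relies only on the flat-file layout (feature keys start in column 6, section keywords in column 1). The two sample records are synthetic and abbreviated, not real NCBI output.

```python
def summarize_genbank_text(text):
    """Summarise a GenBank flat file: feature count and sequence presence.

    Hypothetical helper; it relies only on the flat-file layout, where
    feature keys start in column 6 and section keywords in column 1.
    """
    n_features = 0
    in_features = False
    has_origin = False
    has_contig = False
    for line in text.splitlines():
        if line.startswith("FEATURES"):
            in_features = True
            continue
        if line and not line.startswith(" "):
            # Any column-1 keyword (ORIGIN, CONTIG, BASE COUNT, //) ends the table.
            in_features = False
            if line.startswith("ORIGIN"):
                has_origin = True
            elif line.startswith("CONTIG"):
                has_contig = True
        elif in_features and line.startswith("     ") and line[5:6].strip():
            # A feature key such as "source", "gene" or "CDS" in column 6;
            # qualifier lines are indented further and are skipped.
            n_features += 1
    return {
        "features": n_features,
        "has_sequence": has_origin,
        "contig_only": has_contig and not has_origin,
    }


# Synthetic, abbreviated records for illustration only.
summary_record = """LOCUS       EXAMPLE1                1000 bp    DNA     circular CON
FEATURES             Location/Qualifiers
     source          1..1000
                     /organism="Example organism"
CONTIG      join(EX000001.1:1..1000)
//
"""

full_record = """LOCUS       EXAMPLE2                  60 bp    DNA     linear
FEATURES             Location/Qualifiers
     source          1..60
     gene            1..30
                     /gene="exaA"
     CDS             1..30
                     /gene="exaA"
ORIGIN
        1 atgaaacgca ttagcaccac cattaccacc accatcacca ttaccaccac aggtgcgtaa
//
"""

print(summarize_genbank_text(summary_record))
# {'features': 1, 'has_sequence': False, 'contig_only': True}
print(summarize_genbank_text(full_record))
# {'features': 3, 'has_sequence': True, 'contig_only': False}
```

A result like the first one would suggest the request returned the contig-style view rather than the full record, which is the situation the rest of the thread is trying to resolve.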