[Biopython] Issues parsing genbank files

Patrick Kunzmann padix.kleber at gmail.com
Tue Oct 3 23:39:26 UTC 2017


Hi,
If you want to retrieve the list of features, I think you have to change
the 'db' parameter to 'nuccore'.
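As Jocelyne points out further down, Entrez.efetch() essentially passes its keyword arguments on to NCBI's EFetch service. You can therefore check the effect of changing db (or rettype) by building the equivalent request URL directly. The sketch below uses only the Python standard library, and the helper name efetch_url is hypothetical, not part of Biopython. NCBI's EFetch documentation also lists rettype="gbwithparts" (GenBank flat file with full sequence) for nuccore. Whether this record needs that setting or the db change is an assumption worth testing.

```python
from urllib.parse import urlencode

# Base URL of NCBI's EFetch service (the same endpoint as Jocelyne's query).
EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession, db="nuccore", rettype="gb", retmode="text", email=None):
    """Build an EFetch URL (hypothetical helper, for illustration only).

    Roughly mirrors Entrez.efetch(db=..., id=..., rettype=..., retmode=...);
    Biopython itself also adds its own tool/email parameters.
    """
    params = {"db": db, "id": accession, "rettype": rettype, "retmode": retmode}
    if email is not None:
        # NCBI asks clients to identify themselves with an email address.
        params["email"] = email
    return EFETCH_BASE + "?" + urlencode(params)


# The query from the original post, then the two variations under discussion.
original = efetch_url("NC_003197", db="nucleotide")
nuccore = efetch_url("NC_003197", db="nuccore")
with_parts = efetch_url("NC_003197", db="nuccore", rettype="gbwithparts")

for url in (original, nuccore, with_parts):
    print(url)
```

You can open each printed URL in a browser to compare the outputs. The last one roughly corresponds to Entrez.efetch(db="nuccore", id="NC_003197", rettype="gbwithparts", retmode="text").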

Best regards,
Patrick

On 04.10.2017 at 1:25 AM, "Jocelyne" <jocelyne at gmail.com> wrote:

> Hi David:
>
> All Biopython does is call the E-utilities interface. The links I gave you
> earlier should be a good starting point for learning how to use E-utilities
> to build the correct query.
>
> Jocelyne
>
> On Tue, Oct 3, 2017 at 1:27 PM, David Martin (Staff) <
> d.m.a.martin at dundee.ac.uk> wrote:
>
>> If you enter the accession on the NCBI website, the standard GenBank view
>> gives you the same file as the query you used. The full record, however,
>> is in the GenBank (full) view.
>>
>>
>> The question, then, is what the correct syntax is for the Entrez.efetch()
>> command to retrieve the full record. It is also worth noting that the
>> example given in the tutorial will not retrieve the full record.
>>
>>
>> ..d
>>
>>
>> Dr David Martin
>> Senior Lecturer in Bioinformatics
>> College of Life Sciences
>> University of Dundee
>>
>>
>>
>> ------------------------------
>> *From:* Jocelyne <jocelyne at gmail.com>
>> *Sent:* 03 October 2017 21:23
>>
>> *To:* David Martin (Staff)
>> *Cc:* biopython at lists.open-bio.org
>> *Subject:* Re: [Biopython] Issues parsing genbank files
>>
>> Hi David:
>> If you are sure it's an issue, you should file it on the GitHub project
>> so that a contributor can take a look. Peter Cock is usually very
>> responsive.
>>
>> However, I submitted your query to Entrez:
>> https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=nc_003197&rettype=gb&retmode=text
>> and attached the file I got.
>> I only got 1 feature.
>>
>> I believe genes are in a different database (the 'gene' database) and
>> you'll have to do the proper querying through eutils.
>>
>> I'm not a developer on Biopython, and I didn't look into your issue
>> closely so I could be wrong. Just trying to give you pointers.
>>
>> Jocelyne
>>
>>
>> On Tue, Oct 3, 2017 at 12:12 PM, David Martin (Staff) <
>> d.m.a.martin at dundee.ac.uk> wrote:
>>
>>> Hi Jocelyne,
>>>
>>>
>>> Firstly apologies for missing the 'e' in your name before.
>>>
>>>
>>> The record being retrieved is a single sequence record - it is
>>> a bacterial chromosome. It should have many features, most corresponding to
>>> genes encoded within the chromosome.
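To see how many features a download actually contains, and of which types, you can tally the raw GenBank text before it reaches the parser. The following minimal sketch assumes the standard GenBank column layout, where feature keys are indented by five spaces and qualifiers further. count_feature_keys is a hypothetical helper, not a Biopython function. For a record that has already been parsed, Counter(f.type for f in seq_record.features) should give the same tally.

```python
from collections import Counter


def count_feature_keys(genbank_text):
    """Tally feature keys (source, gene, CDS, ...) in a GenBank flat file.

    Minimal sketch: assumes each feature key is indented by exactly five
    spaces and that qualifier/continuation lines are indented further.
    """
    counts = Counter()
    in_features = False
    for line in genbank_text.splitlines():
        if line.startswith("FEATURES"):
            in_features = True
            continue
        if not in_features:
            continue
        if line and not line.startswith(" "):
            # The next unindented keyword (ORIGIN, CONTIG, //, ...) ends the table.
            break
        if line.startswith("     ") and len(line) > 5 and line[5] != " ":
            counts[line.split()[0]] += 1
    return counts


# Made-up example resembling a record that carries only a "source" feature.
example = """LOCUS       EXAMPLE                  12 bp    DNA     linear   BCT
FEATURES             Location/Qualifiers
     source          1..12
                     /organism="Example organism"
ORIGIN
        1 atgaaacccg gg
//
"""
print(dict(count_feature_keys(example)))  # {'source': 1}
```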
>>>
>>>
>>> ..d
>>>
>>>
>>>
>>>
>>>
>>> ------------------------------
>>> *From:* Jocelyne <jocelyne at gmail.com>
>>> *Sent:* 03 October 2017 19:57
>>> *To:* David Martin (Staff)
>>> *Cc:* biopython at lists.open-bio.org
>>> *Subject:* Re: [Biopython] Issues parsing genbank files
>>>
>>> Hi David:
>>> I think if you are searching by id, you should only get 1 record.
>>> The questions you are asking sound to me like Entrez / NCBI databases
>>> questions, not necessarily Biopython questions. Unless someone else has
>>> time to dive into your specific example, I suggest you look at this
>>> documentation:
>>> https://www.ncbi.nlm.nih.gov/Class/MLACourse/Original8Hour/Entrez/
>>>
>>>
>>> https://www.ncbi.nlm.nih.gov/books/NBK25501/
>>> Jocelyne
>>>
>>>
>>>
>>>
>>>
>>> On Tue, Oct 3, 2017 at 2:39 AM, David Martin (Staff) <
>>> d.m.a.martin at dundee.ac.uk> wrote:
>>>
>>>> Hi folks,
>>>>
>>>>
>>>>
>>>> I’m trying to parse some bacterial genomes. I’ve lifted the following
>>>> code from the Biopython tutorial, but it doesn't seem to work as expected.
>>>>
>>>>
>>>>
>>>> from Bio import Entrez
>>>> from Bio import SeqIO
>>>>
>>>> Entrez.email = "A.N.Other@example.com"  # replace with your own address
>>>> with Entrez.efetch(db="nucleotide", rettype="gb", retmode="text",
>>>>                    id="nc_003197") as handle:
>>>>     seq_record = SeqIO.read(handle, "gb")  # "gb" is an alias for "genbank"
>>>> print("%s with %i features" % (seq_record.id, len(seq_record.features)))
>>>>
>>>>
>>>>
>>>> I get one feature instead of the thousands expected.
>>>>
>>>>
>>>>
>>>> When I try to extract a single gene, I get a run of Ns instead of sequence.
>>>>
>>>>
>>>>
>>>> Thoughts: the record seems to be retrieved as a set of annotations with no
>>>> sequence. Is there a way to ensure Entrez retrieves the full data?
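The two symptoms (a single feature, and Ns where the sequence should be) are consistent with NCBI returning a contig-style (CON) record. In such a record, a CONTIG line describing how the sequence is assembled from other entries takes the place of the ORIGIN sequence block. That diagnosis is an assumption rather than something confirmed in this thread, but it is easy to check on the raw text. The sketch below uses two hypothetical helpers, sequence_status and is_placeholder_sequence.

```python
def sequence_status(genbank_text):
    """Classify the sequence part of one GenBank flat-file record.

    Returns "sequence" if the ORIGIN block contains sequence lines,
    "contig-only" if there is a CONTIG line but no sequence, and
    "missing" if neither is present.
    """
    has_contig = False
    has_sequence = False
    in_origin = False
    for line in genbank_text.splitlines():
        if line.startswith("CONTIG"):
            has_contig = True
        elif line.startswith("ORIGIN"):
            in_origin = True
        elif line.startswith("//"):
            in_origin = False  # end of record
        elif in_origin and line.strip():
            has_sequence = True
    if has_sequence:
        return "sequence"
    return "contig-only" if has_contig else "missing"


def is_placeholder_sequence(seq):
    """True if a non-empty sequence consists only of N characters."""
    text = str(seq).upper()
    return bool(text) and set(text) == {"N"}


# Made-up contig-style record: a CONTIG line instead of ORIGIN sequence.
con_record = """LOCUS       EXAMPLE                 100 bp    DNA     linear   CON
FEATURES             Location/Qualifiers
     source          1..100
CONTIG      join(XX000001.1:1..100)
//
"""
print(sequence_status(con_record))  # contig-only
print(is_placeholder_sequence("NNNNNNNN"))  # True
```

If the text returned by the original query classifies as "contig-only", the next thing to try is re-requesting it with a full-sequence return type, such as EFetch's rettype="gbwithparts".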
>>>>
>>>>
>>>>
>>>> ..d
>>>>
>>>> *Dr David M A Martin PhD FRSB*
>>>> Senior Lecturer in Bioinformatics
>>>> School of Life Sciences, University of Dundee
>>>> +44(0)1382 388704 | d.m.a.martin at dundee.ac.uk
>>>>
>>>>
>>>>
>>>>
>>>> The University of Dundee is a registered Scottish Charity, No: SC015096
>>>>
>>>> _______________________________________________
>>>> Biopython mailing list  -  Biopython at mailman.open-bio.org
>>>> http://mailman.open-bio.org/mailman/listinfo/biopython
>>>>
>>>
>>>
>>>
>>
>>
>>
>
>
>

