[Biopython] Access Entrez gene DB using rettype 'gb'

Peter biopython at maubp.freeserve.co.uk
Fri Dec 3 20:32:21 UTC 2010


On Fri, Dec 3, 2010 at 7:38 PM, David Jacobs wrote:
> Thanks for the info. There seems to be a huge disconnect between what I want
> to do and what this library is letting me do. It seems like there should be
> a really simple way to look up bacterial gene sequences by their names, and
> it's disappointing that that's not the case.
>
> Every workaround I've tried has also failed.
>
> For example, I've downloaded the full CT18 genome from the FTP server and
> parsed it using SeqIO. The problem is that SeqRecord doesn't give me an
> accessor to the "name" attribute of the sequence, as it would appear in the
> gene database.

You'll have to give me more to go on - what did you download by FTP,
a FASTA file, GenBank? How about giving the URL and an example
of the "name" you want to use.

> What's more, if I search the gene database for a name, I do,
> in fact, get an ID back. But that ID has no information about the start and
> stop indices for my sequence, so I can't use that information in conjunction
> with my downloaded genome.

Have you looked at ELink? It is for cross referencing between the different
Entrez databases.
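
For cross referencing, the E-utility that returns links between databases
is ELink (Entrez.elink in Biopython). Something along these lines might get
you from a gene ID to the linked nucleotide record IDs. This is only a rough
sketch: the helper names are made up for illustration, the gene ID and email
are whatever you supply, and it does not by itself give you start/stop
coordinates.

```python
def linked_ids(elink_result):
    """Collect the target IDs from a parsed ELink result.

    Expects the nested list/dict structure that Entrez.read() returns
    for an ELink query.
    """
    ids = []
    for linkset in elink_result:
        for linkset_db in linkset.get("LinkSetDb", []):
            for link in linkset_db.get("Link", []):
                ids.append(link["Id"])
    return ids


def nucleotide_ids_for_gene(gene_id, email):
    """Hypothetical helper: look up nuccore IDs linked to an Entrez gene ID."""
    from Bio import Entrez  # needs Biopython and network access

    Entrez.email = email  # NCBI asks for a contact address
    handle = Entrez.elink(dbfrom="gene", db="nuccore", id=gene_id)
    try:
        result = Entrez.read(handle)
    finally:
        handle.close()
    return linked_ids(result)
```

The parsing is split out from the network call so you can check it on a
saved result without hitting the NCBI servers each time.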

> Further still, if I try to query the gene
> database for my gene's full information (using the ID that I grabbed from
> esearch(db=gene ...)), I get back data formatted in a way that BioPython
> can't parse.

Are you talking about using EFetch here? Which database? The valid
combinations of retmode and rettype change according to this. See e.g.:
http://www.ncbi.nlm.nih.gov/corehtml/query/static/efetchseq_help.html

>
> This is a touch aggravating.
>
> What am I missing?
>

The NCBI Entrez documentation is definitely sparse :(

If all you want to do is get the nucleotide sequence for bacterial
genes then I do suspect working with the FASTA or GenBank files
would be easier than using Entrez (as Brad suggested earlier).
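
To make that concrete, here is a minimal sketch of the GenBank route,
assuming the file annotates its CDS features with a /gene qualifier holding
the name you are searching for (that depends on the record, so check yours).
The function names and the file path are placeholders of mine, not anything
provided by Biopython.

```python
def find_features_by_gene_name(record, gene_name, feature_type="CDS"):
    """Return features of feature_type whose /gene qualifier equals gene_name."""
    matches = []
    for feature in record.features:
        if feature.type != feature_type:
            continue
        # The GenBank parser stores qualifier values as lists of strings.
        if gene_name in feature.qualifiers.get("gene", []):
            matches.append(feature)
    return matches


def gene_sequences_from_genbank(path, gene_name):
    """Hypothetical helper: nucleotide sequences of matching CDS features."""
    from Bio import SeqIO  # needs Biopython

    sequences = []
    for record in SeqIO.parse(path, "genbank"):
        for feature in find_features_by_gene_name(record, gene_name):
            # extract() applies the feature location (including strand)
            # to the parent sequence.
            sequences.append(feature.extract(record.seq))
    return sequences
```

You would call it as gene_sequences_from_genbank("your_file.gbk", "geneName"),
where both arguments are stand-ins for your own file and gene name. Each
feature's location also gives you the start and end positions if you need
them.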

Can you give a specific example - a couple of gene names you want,
and the desired answer (the sequence you want to find for them)? Sean did
ask earlier - this really would help, and we'd be better able to assist you.

Peter


