[Biopython] downloading genome Protein table
Sheila the angel
from.d.putto at gmail.com
Thu Oct 27 10:47:04 UTC 2011
The problem is that I only have the RefSeq ID (e.g. NC_008390), not the
protein table ID (in this case CP000441.ptt), so I can't build the FTP URL
for the .ptt file, e.g.
ftp://ftp.ncbi.nih.gov/genbank/genomes/Bacteria/Burkholderia_ambifaria_AMMD_uid13490/CP000441.ptt
Also, not all of my RefSeq IDs belong to 'Bacteria'. For NC_004314 (just
an example) I would have to change the FTP URL to
ftp://ftp.ncbi.nih.gov/genomes/Protozoa/Plasmodium_falciparum/NC_004314.ptt
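
A minimal sketch of one way around the unknown directory name. It assumes
the all.ptt.tar.gz archive mentioned in the quoted reply below has already
been downloaded to the local disk, and that its members are named
<accession>.ptt (not verified here). The function name find_ptt_member and
the local file name are hypothetical. That archive covers only the Bacteria
directory, so an ID such as NC_004314 would presumably not be found this way.

```python
import posixpath
import tarfile


def find_ptt_member(tar_path, accession):
    """Return the text of <accession>.ptt from a local tarball, or None.

    Assumption: the member's base name is exactly '<accession>.ptt',
    whatever directory it sits in inside the archive.
    """
    wanted = accession + ".ptt"
    with tarfile.open(tar_path, "r:*") as tar:
        for member in tar:
            # Compare only the file name, so the species directory
            # does not need to be known in advance.
            if member.isfile() and posixpath.basename(member.name) == wanted:
                data = tar.extractfile(member).read()
                return data.decode("utf-8", "replace")
    return None
```

Usage would look like find_ptt_member("all.ptt.tar.gz", "NC_008390"), which
returns None when no matching member exists.
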
Downloading the *.gbk file may be an option (although I would later need to
convert it into a protein table), so I tried this:
from Bio import Entrez
Entrez.email = "from.d.putto at gmail.com"
handle = Entrez.efetch(db="genome", id="NC_008390", rettype="gbk")
print handle.read()
The output is 'Nothing has been found'.
I am not sure which database I should search for an ID like NC_008390.
Moreover, I would later need to convert the 'gbk' file to .ptt (or extract
the protein information from it).
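
A hedged sketch of one possible approach. RefSeq accessions such as
NC_008390 are nucleotide records, so db="nuccore" is probably a better fit
than db="genome". The code assumes Biopython is installed and NCBI is
reachable; the helper names fetch_genbank and cds_rows are hypothetical, and
the tuple layout is only loosely modelled on a protein table, not the exact
.ptt format.

```python
def cds_rows(record):
    """Yield one tuple per CDS feature of a parsed GenBank record:
    (location, strand, length_aa, protein_id, gene, locus_tag, product).
    """
    for feature in record.features:
        if feature.type != "CDS":
            continue
        quals = feature.qualifiers
        start = int(feature.location.start) + 1  # Biopython starts are 0-based
        end = int(feature.location.end)
        strand = "-" if feature.location.strand == -1 else "+"
        # CDS features without a /translation (e.g. pseudogenes) get length 0.
        translation = quals.get("translation", [""])[0]
        yield (
            "%d..%d" % (start, end),
            strand,
            len(translation),
            quals.get("protein_id", ["-"])[0],
            quals.get("gene", ["-"])[0],
            quals.get("locus_tag", ["-"])[0],
            quals.get("product", ["-"])[0],
        )


def fetch_genbank(accession, email):
    """Download one GenBank record from the nucleotide database."""
    from Bio import Entrez, SeqIO  # imported here; requires Biopython

    Entrez.email = email
    handle = Entrez.efetch(db="nuccore", id=accession,
                           rettype="gbwithparts", retmode="text")
    try:
        return SeqIO.read(handle, "genbank")
    finally:
        handle.close()
```

For example, cds_rows(fetch_genbank("NC_008390", "you@example.org")) would
yield one row per annotated CDS, which could then be written out
tab-separated.
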
On Wed, Oct 26, 2011 at 5:27 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Wed, Oct 26, 2011 at 4:11 PM, Sheila the angel
> <from.d.putto at gmail.com> wrote:
> > Hi All,
> >
> > I am facing some problems downloading the genome and other information.
> > For example, I did a query on NCBI Genome for NC_008390.
> > Clicking the result leads to the following link:
> >
> >
> http://www.ncbi.nlm.nih.gov/sites/entrez?Db=genome&Cmd=ShowDetailView&TermToSearch=19840
> > In my web browser I can save this page via File > Save as > out.html
> >
> > I also want to download the protein table:
> >
> http://www.ncbi.nlm.nih.gov/sites/entrez?Db=genome&Cmd=Retrieve&dopt=Protein+Table&list_uids=19840
> >
> > I want to do this for many IDs. Is there a simple way to do it in Biopython?
> >
> > Thanks in Advance
>
> Hmm, some of that might be available by Bio.Entrez, not sure though.
>
> For the protein table I would personally work with the *.ptt files from
> the NCBI FTP site, e.g.
>
>
> ftp://ftp.ncbi.nih.gov/genbank/genomes/Bacteria/Burkholderia_ambifaria_AMMD_uid13490/CP000441.ptt
>
> or:
>
>
> ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Burkholderia_ambifaria_AMMD_uid58303/NC_008391.ptt
>
> The FTP links are on the page of the first URL you gave. You can download
> all the "bacteria" *.ptt files as a tar ball,
>
> ftp://ftp.ncbi.nih.gov/genomes/Bacteria/all.ptt.tar.gz
>
> Typically I work from the GenBank files instead (*.gbk rather than
> *.ptt)
>
> Peter
>