[Biopython] WP_XXXXXXX RefSeq records

Ivan Erill ivan.erill at gmail.com
Wed Oct 15 20:14:33 UTC 2014


For bacteria, NCBI RefSeq is progressively adopting the non-redundant
protein sequence standard. These protein records aggregate all bacterial
genes that code for exactly the same protein sequence (a "detailed"
description of the implementation is available at
ftp://ftp.ncbi.nlm.nih.gov/refseq/release/announcements/WP-proteins-06.10.2013.pdf ).

Essentially, all traditional NP_XXXXXXX and YP_XXXXXXX protein records in
bacterial RefSeq now map to a unique WP_XXXXXXX record. While this is
possibly a good idea, it can create problems if one is trying to get back
to (at least one of) the nucleotide sequences coding for the WP_XXXXXXX
record.

For NP_XXXXXXX and YP_XXXXXXX records, one can easily fetch the protein
record with Entrez.efetch, and then use the "coded_by" qualifier of its
GenBank CDS feature to access the corresponding nucleotide record.
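For reference, a minimal sketch of that NP_/YP_ workflow in Biopython, assuming Biopython is installed and NCBI is reachable. The names fetch_coded_by and coded_by_accessions are hypothetical helpers, not Biopython API, and the regular expression is an assumption about the usual "ACCESSION:start..end" form of the coded_by value (optionally wrapped in complement(...) or join(...)).

```python
import re

# Assumed shape of a /coded_by value: "NC_000913.3:190..255",
# "complement(NC_000913.3:337..2799)" or "join(A.1:1..10,B.1:20..30)".
# Captures the accession (with optional version) that precedes each ":".
_CODED_BY_ACC = re.compile(r"([A-Za-z0-9_]+(?:\.\d+)?):")


def coded_by_accessions(coded_by):
    """Return the nucleotide accessions in a /coded_by value, in order, once each."""
    accessions = []
    for acc in _CODED_BY_ACC.findall(coded_by):
        if acc not in accessions:
            accessions.append(acc)
    return accessions


def fetch_coded_by(protein_id, email):
    """Fetch a protein record (GenPept text) and return its CDS /coded_by values."""
    # Imported here so coded_by_accessions() also works without Biopython.
    from Bio import Entrez, SeqIO

    Entrez.email = email  # NCBI asks E-utilities users for a contact address
    handle = Entrez.efetch(db="protein", id=protein_id, rettype="gb", retmode="text")
    try:
        record = SeqIO.read(handle, "genbank")
    finally:
        handle.close()

    values = []
    for feature in record.features:
        # Qualifier values are stored by Biopython as lists of strings.
        if feature.type == "CDS":
            values.extend(feature.qualifiers.get("coded_by", []))
    return values
```

Each value returned by fetch_coded_by can be passed to coded_by_accessions to get accessions to fetch from db="nuccore". For a WP_ record the list would be expected to come back empty, since per the specification WP_ records do not carry this information.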

According to the specification of the new WP format
(ftp://ftp.ncbi.nlm.nih.gov/refseq/release/announcements/WP-proteins-06.10.2013.pdf):
- WP_ records will not include information about the corresponding
Nucleotide sequences on the sequence record.
- WP_ records will have links to Nucleotide in the Related Information
section of the page display. Links in this section are available through
NCBI’s E-utilities API.

I have been trying, unsuccessfully, to access the nucleotide record for WP_
proteins using Entrez.elink.

Here is an example:

protein ID: WP_027579184
http://www.ncbi.nlm.nih.gov/protein/653545797

Clicking on "Genomic records" in "Related information" will bring up the
GenBank record containing the CDS for this protein, but I have been unable
to use Entrez.elink to get to the information on the "Related information"
panel. If I query:

from Bio import Entrez
handle = Entrez.elink(dbfrom="protein", id="653545797")

I essentially get an empty LinkSetDb. Any clues?
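One thing worth checking, sketched below under the assumption that Biopython is installed: name the destination database explicitly (db="nuccore") and list every LinkName that comes back, instead of looking at a single LinkSetDb. Whether elink actually exposes a protein-to-nuccore link for WP_ records is not established here; nuccore_links and link_ids_by_name are hypothetical helper names.

```python
def link_ids_by_name(linksets):
    """Group linked UIDs by LinkName from the parsed output of an elink call.

    `linksets` is what Entrez.read() returns for an elink handle: a list of
    LinkSet dicts, each with a "LinkSetDb" list whose entries carry a
    "LinkName" and a "Link" list of {"Id": ...} dicts.
    """
    grouped = {}
    for linkset in linksets:
        for linksetdb in linkset.get("LinkSetDb", []):
            ids = [link["Id"] for link in linksetdb.get("Link", [])]
            grouped.setdefault(linksetdb["LinkName"], []).extend(ids)
    return grouped


def nuccore_links(protein_uid, email):
    """Ask elink for nuccore records linked to one protein UID."""
    # Imported here so link_ids_by_name() also works without Biopython.
    from Bio import Entrez

    Entrez.email = email  # NCBI asks E-utilities users for a contact address
    handle = Entrez.elink(dbfrom="protein", db="nuccore", id=protein_uid)
    try:
        linksets = Entrez.read(handle)
    finally:
        handle.close()
    return link_ids_by_name(linksets)
```

For example, nuccore_links("653545797", "you@example.org") (the address is a placeholder) returns a dict mapping each LinkName to its nuccore UIDs; an empty dict corresponds to the empty LinkSetDb reported above.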

Ivan