[Bioperl-l] Uniprot/Swiss accessions?
Smithies, Russell
Russell.Smithies at agresearch.co.nz
Mon May 18 21:52:31 UTC 2009
Hi guys,
Thanx for your suggestions.
With the magic of awk and comm, I split the amalgamated accessions and created lists of swissprot IDs for both the file from NCBI and the file from Uniprot.
sp_ncbi_accessions.txt 458,377 ids
sp_uniprot_accessions.txt 466,739 ids
* The NCBI file has 95 ids that don't appear in the Uniprot list
* The Uniprot file has 8,457 ids that don't appear in the NCBI list
* There are 458,282 ids that appear on both lists.
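For anyone who wants to repeat the comparison without awk and comm, here is a minimal Python sketch of the same three-way split using set operations. It assumes each file holds one accession per line. The helper names read_ids and compare_id_sets are made up for illustration, and the toy accessions in the demo are placeholders, not real data from either file.

```python
def read_ids(path):
    """Return the set of non-empty, whitespace-stripped lines in a file."""
    with open(path) as handle:
        return {line.strip() for line in handle if line.strip()}


def compare_id_sets(ids_a, ids_b):
    """Mimic the three columns of `comm`: (only in a, only in b, in both)."""
    return ids_a - ids_b, ids_b - ids_a, ids_a & ids_b


if __name__ == "__main__":
    # Toy placeholder ids; in practice these would come from
    # read_ids("sp_ncbi_accessions.txt") and read_ids("sp_uniprot_accessions.txt").
    ncbi = {"ID_A", "ID_B", "ID_C"}
    uniprot = {"ID_B", "ID_C", "ID_D"}
    only_ncbi, only_uniprot, both = compare_id_sets(ncbi, uniprot)
    print(len(only_ncbi), len(only_uniprot), len(both))
```

Unlike comm, sets need no pre-sorted input, at the cost of holding both lists in memory, which should be fine for lists of roughly half a million ids.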
I checked a quick random sample of the 8,457 ids unique to Uniprot. None could be found in the "protein" database at NCBI, but all were in the "gene" database as "reference sequences that belong to a specific genome build", and all belonged to recently sequenced bacterial genomes. As none are in the "protein" database, they don't have GI numbers.
The 95 ids that were at NCBI but not in Uniprot were usually (random sample again) described as "putative protein" (or "very putative protein" in one case) and are the result of gene predictions, e.g. http://www.ncbi.nlm.nih.gov/protein/48429254
So I'll use the NCBI database, add the extra 8,457 ids unique to Uniprot, and assign them fake GI numbers so I can run formatdb on them with the "-o T" option.
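A rough sketch of how the fake GI numbers might be added, under several assumptions: the extra Uniprot sequences are in FASTA format, their headers look like ">sp|ACCESSION|NAME description", and the starting number is an arbitrary placeholder that would need to be chosen so it cannot collide with any real GI in the NCBI file. The function name add_fake_gi is hypothetical.

```python
def add_fake_gi(fasta_lines, start_gi=900000000):
    """Yield FASTA lines with a placeholder 'gi|N|' prefixed to each header.

    start_gi is an arbitrary assumed value; pick one outside the range of
    real GI numbers present in the database being built.
    """
    next_gi = start_gi
    for line in fasta_lines:
        if line.startswith(">"):
            # Keep the original identifier and description after the fake GI.
            yield ">gi|%d|%s" % (next_gi, line[1:])
            next_gi += 1
        else:
            # Sequence lines pass through unchanged.
            yield line


if __name__ == "__main__":
    example = [">sp|X00001|EXAMPLE_ID placeholder record\n", "MKVLA\n"]
    for out_line in add_fake_gi(example):
        print(out_line, end="")
```

The rewritten records could then be appended to the NCBI FASTA file before indexing, for example with something like formatdb -i combined.fasta -p T -o T, where combined.fasta is a made-up file name.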
Thanx again for your help,
Russell Smithies
Bioinformatics Applications Developer
T +64 3 489 9085
E russell.smithies at agresearch.co.nz
Invermay Research Centre
Puddle Alley,
Mosgiel,
New Zealand
T +64 3 489 3809
F +64 3 489 9174
www.agresearch.co.nz
Toitu te whenua, Toitu te tangata
Sustain the land, Sustain the people