[Bioperl-l] Uniprot/Swiss accessions?
bill at genenformics.com
Mon May 18 20:19:53 EDT 2009
Hi, Smithies,
Using an integer local ID should work as well.
A defline would then look like '>lcl|12345 ...'
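
For illustration, here is a minimal Python sketch of rewriting FASTA deflines that way. The function name add_local_ids, the starting number, and the sample record are all made up for the example, and I have not run its output through formatdb, so treat it as a starting point rather than a tested recipe.

```python
def add_local_ids(fasta_lines, start=1):
    """Prefix each FASTA defline with a sequential integer local ID.

    Hypothetical helper: the original defline text is kept after the
    new 'lcl|N' identifier so the accession is still visible.
    """
    next_id = start
    out = []
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # Defline: insert the local ID in front of the old header text.
            out.append(f">lcl|{next_id} {line[1:].strip()}")
            next_id += 1
        else:
            # Sequence line: pass through unchanged.
            out.append(line)
    return out


if __name__ == "__main__":
    # Made-up record, for illustration only.
    sample = [">EXAMPLE_ACC some description\n", "MKVLAA\n"]
    for row in add_local_ids(sample, start=12345):
        print(row)
```

Keeping the original header text after the local ID means the accession can still be recovered from the BLAST output.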
Bill
> Hi guys,
> Thanx for your suggestions.
>
> With the magic of awk and comm, I split the amalgamated accessions and
> created lists of swissprot IDs for both the file from NCBI and the file
> from Uniprot.
>
> sp_ncbi_accessions.txt 458,377 ids
> sp_uniprot_accessions.txt 466,739 ids
>
> * The NCBI file has 95 ids that don't appear in the Uniprot list
> * The Uniprot file has 8,457 ids that don't appear in the NCBI list
> * There are 458,282 ids that appear on both lists.
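
The three-way comparison quoted above (what comm reports for two sorted lists) could also be sketched with Python sets. This is only an illustration of the idea, not the awk/comm commands that were actually run; the function name compare_id_lists and the sample IDs are invented.

```python
def compare_id_lists(ncbi_ids, uniprot_ids):
    """Split two ID collections into NCBI-only, Uniprot-only, and shared.

    Hypothetical helper mirroring the three columns that comm prints.
    """
    ncbi = set(ncbi_ids)
    uniprot = set(uniprot_ids)
    return {
        "ncbi_only": sorted(ncbi - uniprot),     # IDs missing from Uniprot
        "uniprot_only": sorted(uniprot - ncbi),  # IDs missing from NCBI
        "both": sorted(ncbi & uniprot),          # IDs on both lists
    }


if __name__ == "__main__":
    # Invented IDs, for illustration only.
    result = compare_id_lists(["A1", "B2", "C3"], ["B2", "C3", "D4", "E5"])
    for name, ids in result.items():
        print(name, len(ids), ids)
```

As a sanity check on the counts quoted above, 458,377 - 95 and 466,739 - 8,457 both give 458,282, which matches the number of shared IDs.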
>
> I did a quick random sample of the 8,457 ids unique to Uniprot and none
> could be found in the "protein" database at NCBI but all were in the
> "gene" database as "reference sequences that belong to a specific genome
> build" and all belonged to recently sequenced bacterial genomes. As none
> are in the "protein" database, they don't have GI numbers.
>
> The 95 ids that were at NCBI but not in Uniprot were usually (random
> sample again) described as "putative protein" (or "very putative protein"
> in one case) and are the result of gene predictions, e.g.
> http://www.ncbi.nlm.nih.gov/protein/48429254
>
>
> So what I'll do is use the NCBI database and add in the extra 8,457 ids
> unique to Uniprot and assign them fake GI numbers so I can formatdb them
> with the " -o T" option.
>
>
> Thanx again for your help,
>
>
>
> Russell Smithies
> Bioinformatics Applications Developer
> T +64 3 489 9085
> E russell.smithies at agresearch.co.nz
> Invermay Research Centre
> Puddle Alley,
> Mosgiel,
> New Zealand
> T +64 3 489 3809
> F +64 3 489 9174
> www.agresearch.co.nz
>
>
> Toitu te whenua, Toitu te tangata
> Sustain the land, Sustain the people