[Bioperl-l] Uniprot/Swiss accessions?

Smithies, Russell Russell.Smithies at agresearch.co.nz
Mon May 18 19:11:40 EDT 2009


As far as I can see, none of the FASTA files at ftp://ftp.uniprot.org/pub/databases/uniprot_datafiles_by_format/fasta/ will format correctly with formatdb's "-o T" option. This is with the latest version of blast (2.2.20 [Feb-08-2009]).
If you run formatdb on uniprot_sprot.fasta or uniprot_trembl.fasta from the above link, the required files are created successfully but the descriptions in the blast results are truncated.
NCBI say it's not their fault and EBI don't answer their email.

A quick hack of prepending fake GI numbers to each accession gets the files formatted correctly and allows sequence retrieval but it's not an ideal solution.
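
For anyone wanting to try the same hack, here is a minimal sketch of the idea in Python. It is not the exact script I used: the function name, the starting number and the sample records are all made up for illustration, and it assumes the UniProt headers look like ">sp|ACCESSION|NAME description". The numbers are placeholders only and do not correspond to real NCBI GIs.

```python
import io


def prepend_fake_gi(lines, start=1):
    """Yield FASTA lines, adding a placeholder gi|N| prefix to each header.

    The counter is arbitrary (an assumption of this sketch); it only gives
    formatdb a gi field to index and is NOT a real NCBI GI number.
    """
    counter = start
    for line in lines:
        if line.startswith(">"):
            # Keep the original identifier and description after the fake gi.
            yield ">gi|%d|%s" % (counter, line[1:])
            counter += 1
        else:
            # Sequence lines pass through unchanged.
            yield line


# Hypothetical two-record input standing in for uniprot_sprot.fasta.
sample = io.StringIO(
    ">sp|P12345|AATM_RABIT Aspartate aminotransferase\n"
    "MALLHSARVL\n"
    ">sp|Q6GZX4|001R_FRG3G Putative transcription factor\n"
    "MAFSAEDVLK\n"
)
result = list(prepend_fake_gi(sample))
```

To use it on a real file, pass an open file handle instead of the sample and write each yielded line to a new FASTA file, then run formatdb on that. Any mapping back to real GIs still has to come from somewhere else.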


--Russell 


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of granjeau at tagc.univ-mrs.fr
> Sent: Tuesday, 19 May 2009 9:39 a.m.
> To: "Cook, Malcolm" <mec at stowers.org>
> Cc: 'BioPerl List'
> Subject: Re: [Bioperl-l] Uniprot/Swiss accessions?
> 
> Maybe you could try the PICR service at EBI
> http://www.ebi.ac.uk/Tools/picr/
> or some other ID converter (for example, some Gene Ontology tools) or
> even SRS.
> 
> I think there could be more than one gi per sp (it's not clear to me if
> you are looking at SwissProt or UniProtKB, i.e. SP+TrEMBL).
> 
> Let us know your solution.
> 
> Regards,
> Samuel
> 
> > If you need to retain mapping between acc => gi it gets a little more
> > complicated; most procedures to NCBI return a 'bag' of gi's w/o any
> > relation to their original accession.  You can grab them via esummary,
> > though, but you'll have to iterate through them.
> >
> > The other option is LiveLists (has both nuc and protein acc => gi).
> > I'm assuming this would have the swissprot accessions included (famous
> > last words):
> >
> > ftp://ftp.ncbi.nih.gov/genbank/livelists/README.genbank.livelists
> >
> > chris
> >
> >
> >
> > On May 18, 2009, at 9:34 AM, Cook, Malcolm wrote:
> >
> >> you could:
> >>
> >> 1) Use eutils search with -database protein
> >>  -term "srcdb swiss prot"[Properties]
> >>  If you use a retmax of 100000 it should only take a few seconds to
> >> download the 458,445 GI numbers.
> >>  I just did it.
> >>
> >> 2) use fastacmd to extract the fasta from nr for these gis, and
> >> parse the defline.
> >>  (assuming you have a copy of nr)
> >>
> >>
> >> Does this work for you?
> >>
> >>
> >> Malcolm Cook
> >> Stowers Institute for Medical Research - Kansas City, Missouri
> >>
> >>
> >>> -----Original Message-----
> >>> From: bioperl-l-bounces at lists.open-bio.org
> >>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
> >>> Smithies, Russell
> >>> Sent: Sunday, May 17, 2009 11:53 PM
> >>> To: 'BioPerl List'
> >>> Subject: [Bioperl-l] Uniprot/Swiss accessions?
> >>>
> >>> Does anyone know of a way to get GI numbers for
> >>> Uniprot/Swissprot accessions?
> >>>
> >>> Fasta from Uniprot's FTP site doesn't formatdb correctly
> >>> (with the -o T option) as it's missing the gi number in the
> >>> fasta header.
> >>> NCBI won't let you use SwissProt ids in batch-entrez and I
> >>> don't want to have to look up all 466,739 of them.
> >>> I could use Bio::DB::Eutilities and query each id but even at
> >>> 10 queries/second (the limit changed recently) it would take too
> >>> long.
> >>>
> >>> Any ideas?
> >>> Is there a swissprot2gi list somewhere?
> >>>
> >>> Thanx,
> >>>
> >>>
> >>> Russell Smithies
> >>>
> >>> Bioinformatics Applications Developer
> >>> T +64 3 489 9085
> >>> E  russell.smithies at agresearch.co.nz
> >>>
> >>> Invermay  Research Centre
> >>> Puddle Alley,
> >>> Mosgiel,
> >>> New Zealand
> >>> T  +64 3 489 3809
> >>> F  +64 3 489 9174
> >>> www.agresearch.co.nz
> >>>
> >>>
> 