[Biopython] IPI fetching
Peter
biopython at maubp.freeserve.co.uk
Tue Sep 8 09:41:40 EDT 2009
On Tue, Sep 8, 2009 at 1:39 PM, Yvan Strahm <yvan.strahm at bccs.uib.no> wrote:
>>
>> Can you give us a specific example of an IPI number and the FASTA
>> record you want back?
>
> IPI00109764
>
> >ipi|IPI00109764|IPI00109764.2 DNA TOPOISOMERASE 1.
> MSGDHLHNDSQIEADFRLNDSHKHKDKHKD...YEF
>
> This particular entry has this UniProt accession number: Q04750
So if you can work out the UniProt accession number, you can use the
Bio.ExPASy.get_sprot_raw() function to download the record in
SwissProt/UniProt plain text format, e.g.
>>> from Bio import ExPASy
>>> from Bio import SeqIO
>>> record = SeqIO.read(ExPASy.get_sprot_raw("Q04750"), "swiss")
>>> print(record.format("fasta"))
>Q04750 RecName: Full=DNA topoisomerase 1; EC=5.99.1.2; AltName: Full=DNA topoisomerase I;
MSGDHLHNDSQIEADFRLNDSHKHKDKHKD...YEF
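As a rough sketch of that two-step route (IPI identifier, then UniProt accession, then Bio.ExPASy.get_sprot_raw()), the helpers below assume you already have some IPI-to-UniProt cross-reference table. The dictionary IPI_TO_UNIPROT and both function names are hypothetical; its only entry, IPI00109764 to Q04750, is the pairing quoted in this thread. Biopython is imported inside the fetching function, so the lookup part works without it.

```python
# Hypothetical IPI -> UniProt cross-reference table. The single entry is the
# pairing quoted in this thread; a real table would have to come from elsewhere.
IPI_TO_UNIPROT = {"IPI00109764": "Q04750"}


def uniprot_for_ipi(ipi, xref=IPI_TO_UNIPROT):
    """Return the UniProt accession for an IPI identifier (hypothetical helper)."""
    base = ipi.split(".")[0]  # drop a version suffix such as ".2"
    try:
        return xref[base]
    except KeyError:
        raise KeyError("no UniProt accession known for %s" % ipi) from None


def fasta_for_ipi(ipi, xref=IPI_TO_UNIPROT):
    """Fetch the SwissProt record via ExPASy and return it as FASTA text.

    Needs Biopython and network access. The import is deferred so that
    uniprot_for_ipi() can be used without Biopython installed.
    """
    from Bio import ExPASy, SeqIO

    accession = uniprot_for_ipi(ipi, xref)
    with ExPASy.get_sprot_raw(accession) as handle:
        record = SeqIO.read(handle, "swiss")
    return record.format("fasta")
```

With network access, print(fasta_for_ipi("IPI00109764.2")) would go through the same get_sprot_raw() call as the interactive example above.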
It looks like you should be able to get the sequence directly from
the EBI using the International Protein Index (IPI) identifier, IPI00109764; see
http://www.ebi.ac.uk/IPI/IPIhelp.html
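The FASTA header quoted earlier appears to follow an ipi|ACCESSION|ACCESSION.VERSION DESCRIPTION layout. Assuming that layout (inferred from this single example, not from an IPI specification), a small regular expression can split it into accession, version and description. The function name parse_ipi_header is hypothetical.

```python
import re

# Assumed layout, inferred from the one header quoted in this thread:
#   >ipi|IPI00109764|IPI00109764.2 DNA TOPOISOMERASE 1.
# The third field must repeat the accession from the second, plus ".version".
_IPI_HEADER = re.compile(
    r"^>?ipi\|(?P<acc>IPI\d+)\|(?P=acc)\.(?P<version>\d+)(?:\s+(?P<desc>.*))?$"
)


def parse_ipi_header(line):
    """Split an IPI FASTA header into (accession, version, description)."""
    match = _IPI_HEADER.match(line.strip())
    if match is None:
        raise ValueError("not an IPI header: %r" % line)
    return match.group("acc"), int(match.group("version")), match.group("desc") or ""
```

For the header above this gives ("IPI00109764", 2, "DNA TOPOISOMERASE 1."); the unversioned "IPI00109764" is what the EBI queries below use.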
As per that old thread you referenced, Biopython should be able
to parse the "swiss" (SwissProt plain text) output from IPI. How about a
quick-and-dirty URL hack to access the EBI's SRS?
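>>> from io import TextIOWrapper
>>> from urllib.request import urlopen
>>> from Bio import SeqIO
>>> ipi = "IPI00109764"
>>> url = "http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-e+[IPI-acc:%s]+-ascii" % ipi
>>> # SeqIO expects a text handle, so wrap the binary HTTP response
>>> record = SeqIO.read(TextIOWrapper(urlopen(url), encoding="utf-8"), "swiss")
>>> print(record.format("fasta"))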
>IPI00109764 DNA TOPOISOMERASE 1.
MSGDHLHNDSQIEADFRLNDSHKHKDKHKDRE...YEF
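To reuse this URL hack for other identifiers, it may be worth checking the accession before pasting it into the query string. This sketch assumes an IPI accession is "IPI" followed by digits, with an optional ".version" suffix that is dropped; the function name srs_url_for_ipi is hypothetical, and the URL template is copied unchanged from the session above, so it is only as reliable as that SRS service.

```python
import re

# URL template copied from the interactive example above.
SRS_TEMPLATE = "http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-e+[IPI-acc:%s]+-ascii"

# Assumption: an IPI accession is "IPI" plus digits, optionally ".version".
_IPI_ACC = re.compile(r"^(IPI\d+)(?:\.\d+)?$")


def srs_url_for_ipi(ipi):
    """Build the SRS query URL for one IPI accession (hypothetical helper)."""
    match = _IPI_ACC.match(ipi.strip())
    if match is None:
        # Reject anything that could alter the query, e.g. stray "]" or "+".
        raise ValueError("unexpected IPI accession: %r" % ipi)
    return SRS_TEMPLATE % match.group(1)  # version suffix is not used
```

The resulting URL can be passed to urlopen() exactly as in the session above.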
Done? With a little tweaking to the URL, you can download this directly as
FASTA if you like, which saves some bandwidth.
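If you do fetch FASTA directly, SeqIO.read(handle, "fasta") will parse it. For comparison, here is a minimal stdlib-only reader. It is a sketch that assumes well-formed input (every sequence preceded by a ">" header line), and the name read_fasta is hypothetical.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from FASTA text; header excludes '>'."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data before the first '>' header")
        else:
            chunks.append(line)  # sequence lines may be wrapped; join them
    if header is not None:
        yield header, "".join(chunks)


# Usage on a short made-up two-record string (not real IPI data):
example = io.StringIO(">seq1 first\nMSGD\nHLHN\n>seq2 second\nYEF\n")
print(list(read_fasta(example)))  # [('seq1 first', 'MSGDHLHN'), ('seq2 second', 'YEF')]
```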
Peter