[Biopython] IPI fetching

Peter biopython at maubp.freeserve.co.uk
Tue Sep 8 09:41:40 EDT 2009

On Tue, Sep 8, 2009 at 1:39 PM, Yvan Strahm<yvan.strahm at bccs.uib.no> wrote:
>> Can you give us a specific example of an IPI number and the FASTA
>> record you want back?
> IPI00109764
>> ipi|IPI00109764|IPI00109764.2 DNA TOPOISOMERASE 1.
> This particular entry has this Uniprot accession number:Q04750

So if you can work out the UniProt accession number, you can use
the Bio.ExPASy.get_sprot_raw() function to download the record in the
SwissProt/UniProt plain text format, e.g.

>>> from Bio import ExPASy
>>> from Bio import SeqIO
>>> record = SeqIO.read(ExPASy.get_sprot_raw("Q04750"), "swiss")
>>> print(record.format("fasta"))
>Q04750 RecName: Full=DNA topoisomerase 1; EC=; AltName: Full=DNA topoisomerase I;
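
In case it helps to see what that FASTA text looks like, here is a rough
standard-library sketch of the layout: a ">" header line with the identifier
and description, then the sequence wrapped at 60 characters per line, which
is what Biopython's FASTA output uses by default. The function name to_fasta
and the placeholder sequence are made up for illustration; this is not the
real Q04750 sequence and not Biopython's own code.

```python
def to_fasta(identifier, description, sequence, width=60):
    """Format one record as FASTA text (hypothetical helper, not Biopython)."""
    header = ">%s %s" % (identifier, description)
    # Cut the sequence into fixed-width chunks, one chunk per output line.
    chunks = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([header] + chunks) + "\n"


# Placeholder sequence, invented purely to show the line wrapping.
placeholder = "ACDEFGHIKLMNPQRSTVWY" * 4
print(to_fasta("Q04750", "placeholder description", placeholder))
```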

It looks like you should be able to get the sequence directly from
the EBI using the International Protein Index (IPI) identifier, IPI00109764.

As per that old thread you referenced, Biopython should be able
to parse the "swiss" output from IPI. How about a quick and dirty
URL hack to access the EBI's SRS?

>>> import urllib
>>> from Bio import SeqIO
>>> ipi = "IPI00109764"
>>> url = "http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-e+[IPI-acc:%s]+-ascii" % ipi
>>> record = SeqIO.read(urllib.urlopen(url), "swiss")
>>> print(record.format("fasta"))
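
If you want to do this for several identifiers, the URL building step can be
pulled out into a small function. This is only a sketch: the name
build_srs_url is mine, and the check that an IPI accession is "IPI" followed
by eight digits is an assumption based on the example above. The URL pattern
itself is the one used in the snippet.

```python
import re

# URL pattern copied from the example above; only the accession varies.
SRS_TEMPLATE = "http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-e+[IPI-acc:%s]+-ascii"

# Assumption: accessions look like "IPI" plus eight digits, e.g. IPI00109764.
IPI_PATTERN = re.compile(r"IPI\d{8}")


def build_srs_url(ipi):
    """Return the SRS query URL for one IPI accession (hypothetical helper)."""
    ipi = ipi.strip()
    if not IPI_PATTERN.fullmatch(ipi):
        raise ValueError("not an IPI accession: %r" % ipi)
    return SRS_TEMPLATE % ipi


print(build_srs_url("IPI00109764"))
```

On Python 3 the download call would be urllib.request.urlopen rather than
urllib.urlopen, and the handle it returns yields bytes, so it will probably
need wrapping (for example in io.TextIOWrapper) before being passed to
SeqIO.read.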

Done? With a little tweaking to the URL you can download this directly as
FASTA if you like (saves some bandwidth).
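
If you do go the FASTA route, I would expect the header line to look like the
one you quoted (ipi|IPI00109764|IPI00109764.2 DNA TOPOISOMERASE 1.). Pulling
the accession and versioned identifier out of it could be sketched as below.
The function name parse_ipi_header is made up, and the field layout is
assumed from that single example, so treat it as a starting point only.

```python
def parse_ipi_header(line):
    """Split an IPI-style FASTA header into its parts (hypothetical helper).

    Returns (accession, versioned_id, description).
    """
    text = line.strip().lstrip(">").strip()
    # Assumed layout: database tag | accession | "ID.version description"
    db, accession, rest = text.split("|", 2)
    if db.lower() != "ipi":
        raise ValueError("not an IPI header: %r" % line)
    # The versioned identifier ends at the first space; the rest is free text.
    versioned_id, _, description = rest.partition(" ")
    return accession, versioned_id, description


header = ">ipi|IPI00109764|IPI00109764.2 DNA TOPOISOMERASE 1."
print(parse_ipi_header(header))
```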

