[Biopython] Dealing with Non-RefSeq IDs / InParanoid

Matthew Strand stran104 at chapman.edu
Sun Jun 21 01:54:42 UTC 2009


Hello Biopython users,
I am in the process of building lists of orthologous protein sequences
between several species. InParanoid <inparanoid.sbc.su.se> provides
excellent ortholog detection using a clustering algorithm. The website
prefers to receive queries and report results using what I assume to be the
ID assigned by the original publishing database (e.g. FlyBase FBpp0073215
instead of RefSeq NP_523929). They also provide alternative IDs when
possible, but these are not comprehensive.


I have 3 questions:
1. Has anyone had success using Biopython with InParanoid? Perhaps someone
has a nice wrapper class to share? :-)

2. Is it possible to convert from a RefSeq ID to the publishing database's
ID (FlyBase, WormBase, Ensembl)? Sometimes the original ID is available in
the /db_xref section of an Entrez report, but not always.

3. Is there a way to retrieve a sequence given an ID from the original
database without writing wrappers for every database?
(e.g. WormBase CE23997, FlyBase FBpp0149695, Ensembl ENSCINP00000014675)
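
To make question 2 concrete, here is a rough sketch of the /db_xref lookup I
have in mind. It uses only the Python standard library and assumes the report
has already been downloaded as GenBank-style flat-file text. The function
name collect_db_xrefs, the SAMPLE fragment, and the GeneID value are all
hypothetical and written by hand for illustration; they are not real Entrez
output.

```python
import re

# Matches GenBank-style qualifiers such as /db_xref="FLYBASE:FBpp0073215".
# The database name is everything before the first colon.
DB_XREF = re.compile(r'/db_xref="([^":]+):([^"]+)"')


def collect_db_xrefs(flatfile_text):
    """Return {database: [ids]} for every /db_xref qualifier in the text."""
    xrefs = {}
    for db, ident in DB_XREF.findall(flatfile_text):
        xrefs.setdefault(db, []).append(ident)
    return xrefs


# Hypothetical hand-written fragment (assumption: not copied from a real record).
SAMPLE = '''
     CDS             1..100
                     /db_xref="FLYBASE:FBpp0073215"
                     /db_xref="GeneID:12345"
'''

if __name__ == "__main__":
    found = collect_db_xrefs(SAMPLE)
    # A missing database simply yields an empty list, which is the
    # "not always available" case described in question 2.
    print(found.get("FLYBASE", []))
    print(found.get("WormBase", []))
```

This only helps when the cross-reference is actually present in the record,
which is exactly the gap I am asking about.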


Any information would be appreciated.

Many thanks,
Matthew Strand
Chapman University


