[BioPython] Need help to get Fasta sequence of Gis !
Andrew Dalke
dalke at dalkescientific.com
Thu Apr 8 17:15:37 EDT 2004
Jonathan Boulais:
> Hi everyone !
> I'm a newbie to Biopython
Welcome!
> and I would like to get the fasta sequences of a huge list of Gis. Any
> suggestions ?
How huge? At some point it's better to just download GenBank and get
the data straight from there.
If it's small enough (10,000 or fewer records?), then look at the
Bio.EUtils client.
>>> from Bio import EUtils
>>> from Bio.EUtils import ThinClient
>>> client = ThinClient.ThinClient()
>>> dbids = EUtils.DBIds("protein", ["914034", "5263173", "1769808",
...                                  "1060883"])
>>> f = client.efetch_using_dbids(dbids, retmode = "text",
...                               rettype = "fasta")
>>> print f.read()
>gi|914034|gb|AAB32951.1| cruxrhodopsin-2 [Haloarcula]
MLQSGMSTYVPGGESIFLWVGTAGMFLGMLYFIARGWSVSDQRRQKFYIATIMIAAIAFVNYLSMALGFG
VTTIELGGEERAIYWARYTDWLFTTPLLLYDLALLAGADRNTIYSLVGLDVLMIGTGALATLSAGSGVLP
AGAERLVWWGISTGFLLVLLYFLFSNLTDRASELSGDLQSKFSTLRNLVLVLWLVYPVLWLVGTEGLGLV
GLPIETAAFMVLDLTAKIGFGIILLQSHAVLDEGQTASEGAAVAD
>gi|5263173|dbj|BAA81816.1| cruxrhodopsin [Haloarcula japonica]
MPEPGSEAIWLWLGTAGMFLGMLYFIGRGWGETDSRRQKFYIATILITAIAFVNYLAMALGFGLTIVEFA
GEEHPIYWARYSDWLFTTPLLLYDLGLLAGADRNTIASLVSLDVLMIGTGLVATLSAGSGVLSAGAERLV
WWGISTAFLLVLLYFLFSSLSGRVADLPSDTRSTFKTLRNLVTVVWLVYPVWWLIGTEGLGLVGIGIETA
GFMVIDLTAKVGFGIILLRSHGVLDGAAETTGAGATATAD
>gi|1769808|dbj|BAA06680.1| cruxrhodopsin-3 [Haloarcula vallismortis]
MPAPEGEAIWLWLGTAGMFLGMLYFIARGWGETDSRRQKFYIATILITAIAFVNYLAMALGFGLTIVEIA
GEQRPIYWARYSDWLFTTPLLLYDLGLLAGADRNTISSLVSLDVLMIGTGLVATLSAGSGVLSAGAERLV
WWGISTAFLLVLLYFLFSSLSGRVADLPSDTRSTFKTLRNLVTVVWLVYPVWWLVGTEGIGLVGIGIETA
GFMVIDLVAKVGFGIILLRSHGVLDGAAETTGAGATATAD
>gi|1060883|dbj|BAA06678.1| cruxrhodopsin-1 [Haloarcula argentinensis]
MPEPGSEAIWLWLGTAGMFLGMLYFIARGWGETDSRRQKFYIATILITAIAFVNYLAMALGFGLTIVEFA
GEEHPIYWARYSDWLFTTPLLLYDLGLLAGADRNTITSLVSLDVLMIGTGLVATLSPGSGVLSAGAERLV
WWGISTAFLLVLLYFLFSSLSGRVADLPSDTRSTFKTLRNLVTVVWLVYPVWWLIGTEGIGLVGIGIETA
GFMVIDLTAKVGFGIILLRSHGVLDGAAETTGTGATPADD
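For a list that runs into the thousands you probably don't want to send
every GI in a single request. Here is a minimal sketch of splitting the
list into batches. It assumes a function fetch_batch (hypothetical, not
part of EUtils) that wraps the efetch_using_dbids call above and returns
the FASTA text for one batch. The default batch size of 500 is my guess,
not a documented server limit.

```python
def chunked(ids, size):
    """Yield successive lists of at most `size` ids."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def fetch_all(ids, fetch_batch, size=500):
    """Fetch `ids` in batches and return the concatenated text.

    fetch_batch is a caller-supplied function (an assumption of this
    sketch): it takes a list of ids and returns the FASTA text for them.
    """
    parts = []
    for batch in chunked(ids, size):
        parts.append(fetch_batch(batch))
    return "".join(parts)
```

Passing the fetch function in keeps the batching logic separate from the
network code, so it can be tried out with a fake function before
pointing it at the real server.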
I'm working on a cleanup of EUtils to make some of the machinery
disappear. I expect the result will let you do
import EUtils
f = EUtils.efetch("protein", ["914034", "5263173", "1769808", "1060883"],
                  format = "fasta")
print f.read()
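Either way, f.read() hands back plain FASTA text like the output shown
earlier. If you need the individual records rather than one big string,
something along these lines should do. It is only a sketch: parse_fasta
and gi_of are names made up for this illustration, not EUtils functions,
and gi_of assumes titles of the "gi|number|db|accession|" form seen in
the output above.

```python
def parse_fasta(text):
    """Return a list of (title, sequence) pairs from FASTA-formatted text."""
    records = []
    title = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new title line closes the previous record, if there was one.
            if title is not None:
                records.append((title, "".join(chunks)))
            title = line[1:]
            chunks = []
        elif title is not None:
            # Sequence lines are joined without the line breaks.
            chunks.append(line)
    if title is not None:
        records.append((title, "".join(chunks)))
    return records


def gi_of(title):
    """Return the GI number from a 'gi|914034|gb|...' title, or None."""
    fields = title.split("|")
    if len(fields) >= 2 and fields[0] == "gi":
        return fields[1]
    return None
```

With the GI pulled out of each title you can check which of the
requested ids actually came back.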
Is anyone here using EUtils? I would like to see some code which
uses it, to make sure I don't break things and to see if I can
improve the API.
Andrew
dalke at dalkescientific.com