From mike at maibaum.org Tue Nov 2 12:56:38 2004
From: mike at maibaum.org (Michael Maibaum)
Date: Tue Nov 2 12:58:07 2004
Subject: [BioPython] GenBank parsing errors
Message-ID: <8B56F325-2CF8-11D9-AB74-000A95BA5F0A@maibaum.org>

Hi,

I'm trying to use biopython to parse genbank files and it is working
happily on some genbank files, but not many others. So far the
pattern appears to be

Prokaryotic complete genome => OK
Eukaryotic complete genome => failure.

The failures are typically very early in the file and don't have
wonderfully useful information in the traceback. It falls over in the
Martel Parser giving the error

Martel.Parser.ParserPositionException: error parsing at or beyond
character 191.
As this genome is a bit large to attach, I've just included the +/- 10
lines around 191.

The full file, should you want it, is at:

Does anyone have any ideas why this is failing? Is it just the joy of
tracking NCBI record formats, so that I need to start looking at the
internals for a fix (or use something else)?

thanks

Michael
-- 
Dr Michael Maibaum
Department of Biochemistry and Molecular Biology, UCL
email: maibaum@biochemistry.ucl.ac.uk

From mike at maibaum.org Tue Nov 2 13:05:30 2004
From: mike at maibaum.org (Michael Maibaum)
Date: Tue Nov 2 13:07:02 2004
Subject: [BioPython] GenBank parsing errors
In-Reply-To: <8B56F325-2CF8-11D9-AB74-000A95BA5F0A@maibaum.org>
References: <8B56F325-2CF8-11D9-AB74-000A95BA5F0A@maibaum.org>
Message-ID:

On 2 Nov 2004, at 17:56, Michael Maibaum wrote:

> Hi,
>
> I'm trying to use biopython to parse genbank files and it is working
> happily on some genbank files, but not many others. So far the
> pattern appears to be
>
> Prokaryotic complete genome => OK
> Eukaryotic complete genome => failure.
>
> The failures are typically very early in the file and don't have
> wonderfully useful information in the traceback. It falls over in the
> Martel Parser giving the error
>
> Martel.Parser.ParserPositionException: error parsing at or beyond
> character 191. As this genome is a bit large to attach, I've just
> included the +/- 10 lines around 191.
>
> The full file, should you want it, is at:
> Tetraodon_nigroviridis.0.dat.gz>
>
> Does anyone have any ideas why this is failing? Is it just the joy of
> tracking NCBI record formats, so that I need to start looking at the
> internals for a fix (or use something else)?

hmmph, forgot some probably pertinent details....

biopython 1.30
python 2.3
Mac OS X 10.3.5

From djkojeti at unity.ncsu.edu Tue Nov 2 16:14:47 2004
From: djkojeti at unity.ncsu.edu (Douglas Kojetin)
Date: Tue Nov 2 16:13:27 2004
Subject: [BioPython] PDB -> FASTA
Message-ID: <398735D0-2D14-11D9-8885-000A9597278C@unity.ncsu.edu>

Hi All-

I'm a beginner @ biopython (and I'm 'switching' from perl to python
...). First off, many thanks for the structural biopython FAQ ...
very helpful! My question: can anyone help me with some ideas on how
to whip up a quick PDB->FASTA (sequence) script?

From the structural biopython faq, I've been able to extract residue
information in the form of:

I take it I would just need to grab the MET (split the residue object
and grab the r[1] index?) and convert into M, then append to a
sequence string ....

but I didn't know if biopython had something that did an
autoconversion of MET->M, or vice versa (M->MET).

Thanks for the input,
Doug

From idoerg at burnham.org Tue Nov 2 16:51:05 2004
From: idoerg at burnham.org (Iddo)
Date: Tue Nov 2 16:59:40 2004
Subject: [BioPython] PDB -> FASTA
In-Reply-To: <398735D0-2D14-11D9-8885-000A9597278C@unity.ncsu.edu>
References: <398735D0-2D14-11D9-8885-000A9597278C@unity.ncsu.edu>
Message-ID: <41880149.5060004@burnham.org>

Welcome aboard, and I am glad we managed to save one Jedi from the
dark side of bioinformatics...;)

As to your question: see the attached class. I should get that into
Biopython, but I keep not doing that....

./I

Douglas Kojetin wrote:

> Hi All-
>
> I'm a beginner @ biopython (and I'm 'switching' from perl to python
> ...). First off, many thanks for the structural biopython FAQ ...
> very helpful! My question: can anyone help me with some ideas on how
> to whip up a quick PDB->FASTA (sequence) script?
>
> From the structural biopython faq, I've been able to extract residue
> information in the form of:
>
> I take it I would just need to grab the MET (split the residue object
> and grab the r[1] index?)
> and convert into M, then append to a
> sequence string ....
>
> but I didn't know if biopython had something that did an
> autoconversion of MET->M, or vice versa (M->MET).
>
> Thanks for the input,
> Doug
>
> _______________________________________________
> BioPython mailing list - BioPython@biopython.org
> http://biopython.org/mailman/listinfo/biopython

-- 
Iddo Friedberg, Ph.D.
The Burnham Institute
10901 North Torrey Pines Road
La Jolla, CA 92037
USA
T: (858) 646 3100 x3516
F: (858) 713 9930
http://ffas.ljcrf.edu/~iddo
-------------- next part --------------
A non-text attachment was scrubbed...
Name: aaCode.py
Type: text/x-python
Size: 2069 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/biopython/attachments/20041102/83a76eb2/aaCode.py

From idoerg at burnham.org Tue Nov 2 18:34:20 2004
From: idoerg at burnham.org (Iddo)
Date: Tue Nov 2 18:33:05 2004
Subject: PDB to Fasta (was Re: [BioPython] PDB -> FASTA but the spam filter hates me)
In-Reply-To:
References: <398735D0-2D14-11D9-8885-000A9597278C@unity.ncsu.edu> <41880149.5060004@burnham.org>
Message-ID: <4188197C.8000004@burnham.org>

Trouble is that the conversion here might not be good for some
purposes, as structure -> sequence conversion applications usually
want (1) a unique mapping and (2) a 20-letter alphabet + 'X' for
everything else.

Gavin Crooks wrote:

> There is also a longer three letter code conversion table in
> Bio/SCOP/Raf.py
> The PDB contains a whole bunch of weird 3 letter codes for different
> chemically modified amino acids.
>
> Another possibility is to get the fasta sequences directly from the
> ASTRAL database, since they have already grubbed around and done the
> conversion.
>
> Gavin
>
>
> # This table is taken from the RAF release notes, and includes the
> # undocumented mapping "UNK" -> "X"
> to_one_letter_code= {
> 'ALA':'A', 'VAL':'V', 'PHE':'F', 'PRO':'P', 'MET':'M',
> 'ILE':'I', 'LEU':'L', 'ASP':'D', 'GLU':'E', 'LYS':'K',
> 'ARG':'R', 'SER':'S', 'THR':'T', 'TYR':'Y', 'HIS':'H',
> 'CYS':'C', 'ASN':'N', 'GLN':'Q', 'TRP':'W', 'GLY':'G',
> '2AS':'D', '3AH':'H', '5HP':'E', 'ACL':'R', 'AIB':'A',
> 'ALM':'A', 'ALO':'T', 'ALY':'K', 'ARM':'R', 'ASA':'D',
> 'ASB':'D', 'ASK':'D', 'ASL':'D', 'ASQ':'D', 'AYA':'A',
> 'BCS':'C', 'BHD':'D', 'BMT':'T', 'BNN':'A', 'BUC':'C',
> 'BUG':'L', 'C5C':'C', 'C6C':'C', 'CCS':'C', 'CEA':'C',
> 'CHG':'A', 'CLE':'L', 'CME':'C', 'CSD':'A', 'CSO':'C',
> 'CSP':'C', 'CSS':'C', 'CSW':'C', 'CXM':'M', 'CY1':'C',
> 'CY3':'C', 'CYG':'C', 'CYM':'C', 'CYQ':'C', 'DAH':'F',
> 'DAL':'A', 'DAR':'R', 'DAS':'D', 'DCY':'C', 'DGL':'E',
> 'DGN':'Q', 'DHA':'A', 'DHI':'H', 'DIL':'I', 'DIV':'V',
> 'DLE':'L', 'DLY':'K', 'DNP':'A', 'DPN':'F', 'DPR':'P',
> 'DSN':'S', 'DSP':'D', 'DTH':'T', 'DTR':'W', 'DTY':'Y',
> 'DVA':'V', 'EFC':'C', 'FLA':'A', 'FME':'M', 'GGL':'E',
> 'GLZ':'G', 'GMA':'E', 'GSC':'G', 'HAC':'A', 'HAR':'R',
> 'HIC':'H', 'HIP':'H', 'HMR':'R', 'HPQ':'F', 'HTR':'W',
> 'HYP':'P', 'IIL':'I', 'IYR':'Y', 'KCX':'K', 'LLP':'K',
> 'LLY':'K', 'LTR':'W', 'LYM':'K', 'LYZ':'K', 'MAA':'A',
> 'MEN':'N', 'MHS':'H', 'MIS':'S', 'MLE':'L', 'MPQ':'G',
> 'MSA':'G', 'MSE':'M', 'MVA':'V', 'NEM':'H', 'NEP':'H',
> 'NLE':'L', 'NLN':'L', 'NLP':'L', 'NMC':'G', 'OAS':'S',
> 'OCS':'C', 'OMT':'M', 'PAQ':'Y', 'PCA':'E', 'PEC':'C',
> 'PHI':'F', 'PHL':'F', 'PR3':'C', 'PRR':'A', 'PTR':'Y',
> 'SAC':'S', 'SAR':'G', 'SCH':'C', 'SCS':'C', 'SCY':'C',
> 'SEL':'S', 'SEP':'S', 'SET':'S', 'SHC':'C', 'SHR':'K',
> 'SOC':'C', 'STY':'Y', 'SVA':'S', 'TIH':'A', 'TPL':'W',
> 'TPO':'T', 'TPQ':'A', 'TRG':'K', 'TRO':'W', 'TYB':'Y',
> 'TYQ':'Y', 'TYS':'Y', 'TYY':'Y', 'AGM':'R', 'GL3':'G',
> 'SMC':'C', 'ASX':'B', 'CGU':'E', 'CSX':'C', 'GLX':'Z',
> 'UNK':'X'
> }
>
> Gavin E. Crooks
> Postdoctoral Fellow tel: (510) 642-9614
> 461 Koshland Hall aim:notastring
> University of California http://threeplusone.com/
> Berkeley, CA 94720-3102, USA gec@compbio.berkeley.edu

-- 
Iddo Friedberg, Ph.D.
The Burnham Institute
10901 North Torrey Pines Road
La Jolla, CA 92037
USA
T: (858) 646 3100 x3516
F: (858) 713 9930
http://ffas.ljcrf.edu/~iddo

From djkojeti at unity.ncsu.edu Tue Nov 2 18:46:57 2004
From: djkojeti at unity.ncsu.edu (Douglas Kojetin)
Date: Tue Nov 2 18:45:46 2004
Subject: PDB to Fasta (was Re: [BioPython] PDB -> FASTA but the spam filter hates me)
In-Reply-To: <4188197C.8000004@burnham.org>
References: <398735D0-2D14-11D9-8885-000A9597278C@unity.ncsu.edu> <41880149.5060004@burnham.org> <4188197C.8000004@burnham.org>
Message-ID: <7BEA0401-2D29-11D9-AD31-000A9597278C@unity.ncsu.edu>

Thanks for all the quick and detailed responses! Quite a motivation
to stick with (Bio)Python!

I was unaware of the ASTRAL database -- that might be useful in the
future (I was looking to do this particular conversion on an
experimentally determined PDB ... not yet submitted). Can BioPython
interact directly w/ the (remote) ASTRAL database? Or is it something
I would need to have locally?

Thanks again,
Doug

On Nov 2, 2004, at 6:34 PM, Iddo wrote:

> Trouble is that the conversion here might not be good for some
> purposes, as structure -> sequence conversion applications usually
> want (1) a unique mapping and (2) a 20-letter alphabet + 'X' for
> everything else.

From idoerg at burnham.org Tue Nov 2 19:36:50 2004
From: idoerg at burnham.org (Iddo)
Date: Tue Nov 2 19:35:23 2004
Subject: PDB to Fasta (was Re: [BioPython] PDB -> FASTA but the spam filter hates me)
In-Reply-To: <7BEA0401-2D29-11D9-AD31-000A9597278C@unity.ncsu.edu>
References: <398735D0-2D14-11D9-8885-000A9597278C@unity.ncsu.edu> <41880149.5060004@burnham.org> <4188197C.8000004@burnham.org> <7BEA0401-2D29-11D9-AD31-000A9597278C@unity.ncsu.edu>
Message-ID: <41882822.4010905@burnham.org>

I believe it is for local Astral, not for over-the-web.

./I

Douglas Kojetin wrote:

> Thanks for all the quick and detailed responses! Quite a motivation
> to stick with (Bio)Python!
>
> I was unaware of the ASTRAL database -- that might be useful in the
> future (I was looking to do this particular conversion on an
> experimentally determined PDB ... not yet submitted). Can BioPython
> interact directly w/ the (remote) ASTRAL database? Or is it something
> I would need to have locally?
>
> Thanks again,
> Doug

-- 
Iddo Friedberg, Ph.D.
The Burnham Institute
10901 North Torrey Pines Road
La Jolla, CA 92037
USA
T: (858) 646 3100 x3516
F: (858) 713 9930
http://ffas.ljcrf.edu/~iddo

From idoerg at burnham.org Tue Nov 2 17:25:56 2004
From: idoerg at burnham.org (Iddo)
Date: Wed Nov 3 10:53:23 2004
Subject: [BioPython] PDB -> FASTA
In-Reply-To: <398735D0-2D14-11D9-8885-000A9597278C@unity.ncsu.edu>
References: <398735D0-2D14-11D9-8885-000A9597278C@unity.ncsu.edu>
Message-ID: <41880974.6060900@burnham.org>

Douglas Kojetin wrote:

> Hi All-
>
> I'm a beginner @ biopython (and I'm 'switching' from perl to python
> ...). First off, many thanks for the structural biopython FAQ ...
> very helpful! My question: can anyone help me with some ideas on how
> to whip up a quick PDB->FASTA (sequence) script?
>
> From the structural biopython faq, I've been able to extract residue
> information in the form of:
>
> I take it I would just need to grab the MET (split the residue object
> and grab the r[1] index?) and convert into M, then append to a
> sequence string ....
>
> but I didn't know if biopython had something that did an
> autoconversion of MET->M, or vice versa (M->MET).
>
> Thanks for the input,
> Doug

-- 
Iddo Friedberg, Ph.D.
The Burnham Institute
10901 North Torrey Pines Road
La Jolla, CA 92037
USA
T: (858) 646 3100 x3516
F: (858) 713 9930
http://ffas.ljcrf.edu/~iddo
-------------- next part --------------
A non-text attachment was scrubbed...
Name: aaCode.py
Type: text/x-python
Size: 2108 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/biopython/attachments/20041102/56b1d8ec/aaCode-0001.py

From djkojeti at unity.ncsu.edu Wed Nov 3 14:22:07 2004
From: djkojeti at unity.ncsu.edu (Douglas Kojetin)
Date: Wed Nov 3 14:20:32 2004
Subject: [BioPython] first,last sequence # in PDB file
Message-ID:

Hi All-

I've tried (for a while) to figure this out ... can anyone tell me how
to extract the first and last sequence number in a PDB file? If the
PDB started at 1, I could figure it out easily from the sequence
length (code from a recent post). But what about, for example, when
the PDB starts with residue 5 and ends on residue 120?

I /tried/ to dig through the documentation for a reference to
sequence numbers ... it's probably just a problem with my eyes!

Thanks,
Doug

From gca500 at york.ac.uk Thu Nov 4 07:04:06 2004
From: gca500 at york.ac.uk (Atkinson, GC)
Date: Thu Nov 4 07:03:13 2004
Subject: [BioPython] Parsing BLAST results
Message-ID: <418A1AB6.5040205@york.ac.uk>

Hi Everyone,

You've probably had this same query hundreds of times before and are
sick of answering it, but I'm stuck and need your help!

I'm new to Biopython, and I'm trying to run BLAST from a script like
the example in the Cookbook:

from Bio import Fasta
from Bio.Blast import NCBIWWW
fasta= open('m_cold.fasta','r')
f_iterator=Fasta.Iterator(fasta)
record=f_iterator.next()
b_results=NCBIWWW.blast('blastn','nr',record)
save_file=open('cold.out','w')
results=b_results.read()
save_file.write(results)
save_file.close()

All I get in the results file is the HTML page that comes up when
waiting for results ("... This page will be automatically updated in
14 seconds until search is done").

I realise Blast output keeps changing, so the code needs to be
updated, but I just don't know how!

Gemma

From gca500 at york.ac.uk Thu Nov 4 13:02:39 2004
From: gca500 at york.ac.uk (Atkinson, GC)
Date: Thu Nov 4 13:01:10 2004
Subject: [BioPython] qblast problems
Message-ID: <418A6EBF.70802@york.ac.uk>

Hi again,

I've downloaded the updated version of NCBIWWW to try to stop the
unexpected end of stream error when parsing blast results by using
qblast. However, even though I definitely have the new qblast
function in the NCBIWWW module, I get the error

"AttributeError: 'module' object has no attribute 'qblast'"

Please help!

Gemma
gca500@york.ac.uk

From e.picardi at unical.it Fri Nov 5 07:24:12 2004
From: e.picardi at unical.it (Ernesto)
Date: Fri Nov 5 07:58:10 2004
Subject: [BioPython] Node problem
Message-ID: <042101c4c332$67a479f0$572561a0@Travelmate>

Hi all,

I have a list of nodes such as:

Nodo1=Nodo("A",0.1,None,None,1)
Nodo2=Nodo("B",0.2,None,None,1)
Nodo3=Nodo("C",0.3,None,None,1)
Nodo4=Nodo("D",0.4,None,None,1)
Nodo6=Nodo("nodo6",0.2,Nodo1,Nodo2,0)
Nodo7=Nodo("nodo7",0.3,Nodo3,Nodo4,0)
Nodo8=Nodo("nodo8",0.1,Nodo6,Nodo7,0)
Nodo5=Nodo("E",0.5,None,None,1)
Nodo9=Nodo("Root",None,Nodo8,Nodo5,None)

where Nodo9 is the root.
I would like to substitute each node with its specific value, starting
from the root, in order to obtain a tree like the following:

Tree=Nodo(9,None,Nodo(8,0.01,Nodo(6,0.01,Nodo(1,0.01,None,None,1),Nodo(2,0.01,None,None,1),0),Nodo(7,0.01,Nodo(3,0.01,None,None,1),Nodo(4,0.01,None,None,1),0),0),Nodo(5,0.01,None,None,1),None)

The main difficulty is that the value of each node is an instance of
the following class:

class Nodo:
    '''define the node structure'''
    def __init__(self, contenuto, br=None, sinistra=None, destra=None, tip=None, seq=None):
        self.c = contenuto  # node name
        self.b = br         # branch length
        self.s = sinistra   # left
        self.d = destra     # right
        self.t = tip        # is 1 for a tip or 0 for other nodes
        self.seq = seq

Could you suggest how I could obtain Tree? I could use recursion, but
I don't know how.

Thanks in advance for your help

Ernesto

From JBonis at imim.es Fri Nov 5 09:17:39 2004
From: JBonis at imim.es (Bonis Sanz, Julio)
Date: Fri Nov 5 09:16:24 2004
Subject: [BioPython] Problems with NCBIDict and EUtils
Message-ID: <66373AD054447F47851FCC5EB49B36110610B0@basquet.imim.es>

Hi!

I have spent several hours going around in circles on a problem and
can't find the solution.

I want to retrieve a gene with its SNPs, as can be done with this URL:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=10835174&retmode=html&rettype=gb&extrafeat=1

(note the last parameter: 'extrafeat=1')

Using NCBIDict, by default, extrafeat=1 is not added, so SNPs are not
shown.

What I want is to add the extrafeat=1 parameter to my NCBIDict['id']
calls.

Any idea? I am going crazy!
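
For reference, the efetch URL quoted above can be assembled from its
parameters with nothing but the standard library. This is only a
sketch in present-day Python 3: `efetch_url` is a hypothetical helper
name (it is not part of Biopython or EUtils), the parameter names and
values are copied from the URL in the message, and no request is
actually sent.

```python
from urllib.parse import urlencode

# Base address copied from the URL quoted in the message.
EFETCH_BASE = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(gi, extrafeat=True):
    """Hypothetical helper: build an efetch URL for a nucleotide GI.

    The parameters mirror the URL in the message; extrafeat=1 is the
    one the poster reports is needed for the SNP features to appear.
    """
    params = [
        ("db", "nucleotide"),
        ("id", str(gi)),
        ("retmode", "html"),
        ("rettype", "gb"),
    ]
    if extrafeat:
        params.append(("extrafeat", "1"))
    # A list of pairs keeps the parameters in the order given.
    return EFETCH_BASE + "?" + urlencode(params)


url = efetch_url(10835174)
print(url)
```

Building the query from a list of pairs makes it easy to switch the
extra parameter on or off per call, which is the behaviour asked for
from NCBIDict.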

Julio

From mike at maibaum.org Mon Nov 8 05:28:29 2004
From: mike at maibaum.org (Michael Maibaum)
Date: Mon Nov 8 05:29:23 2004
Subject: [BioPython] GenBank Parser Errors (Repost)
Message-ID:

Hi,

(I'm sorry if you get this twice, but I sent it to the list last week
and didn't get a reply, so I'm hoping someone with a suggestion will
see it this time, thanks.)

I'm trying to use biopython to parse genbank files and it is working
happily on some genbank files, but not many others.
So far the pattern appears to be Prokaryotic complete genome => OK Eukaryotic complete genome =>failure. The failures are typically very early in the file and don't have wonderfully useful information in the traceback. It falls over in the Martel Parser giving the error Martel.Parser.ParserPositionException: error parsing at or beyond character 191. As this genome is a bit large to attatch I've just included the +/- 10 lines around 191 The full file, should you want it is at: Does anyone have any ideas why this is failing, is it just the joy of tracking NCBI record formats and I need to start looking at the internals for a fix (or use something else) or? Is it worth trying biopython cvs? Mac OS X 10.3.5 Python 2.3.4, up to date biopython thanks Michael -- Dr Michael Maibaum Department of Biochemistry and Molecular Biology, UCL email: maibaum@biochemistry.ucl.ac.uk From JBonis at imim.es Tue Nov 9 06:19:52 2004 From: JBonis at imim.es (Bonis Sanz, Julio) Date: Tue Nov 9 06:18:19 2004 Subject: [BioPython] EUtils and SNPs Message-ID: <66373AD054447F47851FCC5EB49B36110610B3@basquet.imim.es> Hi, I posted this question some time ago, but get no answer... will send again (with a partial solution I found!) I am trying to get all the variations (SNPs) from a given gene... In old versions of biopython I used GenBank.NCBIDictionary(parser = GenBank.FeatureParser) .... And I get the variations. But now it doesnt work. The reason is that in EUtils (PubMed) now it is needed to add a parameter... "extrafeat=1" to get the SNPs (variations)..... The URL would be: http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=10835174&retmode=html&rettype=gb&extrafeat=1 I have "fix" the problem touching the code in the script /EUtils/ThinClient.py class ThinClient .... 
    def efetch_using_dbids(self, dbids,
                           retmode = None,
                           rettype = None,
                           # sequence only
                           seq_start = None,
                           seq_stop = None,
                           strand = None,
                           complexity = None,
                           ):
        id_string = _dbids_to_id_string(dbids)
        return self._get(program = "efetch.fcgi",
                         query = {"id": id_string,
                                  "db": dbids.db,
                                  "extrafeat": '1',  # I HAVE ADDED THIS
                                  # "retmax": len(dbids.ids), # needed?
                                  "retmode": retmode,
                                  "rettype": rettype,
                                  "seq_start": seq_start,
                                  "seq_stop": seq_stop,
                                  "strand": strand,
                                  "complexity": complexity,
                                  })

So, by adding one parameter ("extrafeat": '1') to the query dictionary passed to self._get(), the efetch_using_dbids() function now fetches all the variations (SNPs included) from PubMed. This fix is not the best one (I would like to integrate it into the whole library, so that this extrafeat param could be set from NCBIDict calls), but it works for me, as I always need the variations in my scripts. Maybe the biopython developers' team will want to include this in some way in future releases.

Regards,
Julio Bonis Sanz MD
http://www.juliobonis.com/portal/

From JBonis at imim.es Tue Nov 9 11:58:55 2004
From: JBonis at imim.es (Bonis Sanz, Julio)
Date: Tue Nov 9 11:57:12 2004
Subject: [BioPython] EUtils strange behaviour
Message-ID: <66373AD054447F47851FCC5EB49B36110610B8@basquet.imim.es>

Hi!

I am playing with EUtils again and have detected a strange behaviour. Let me focus on the human HTR2A receptor gene:

http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=10835174

What I want to do is retrieve, in a gb_record (GenBank python library), all the SNPs in that gene, and it is getting more complex than I guessed. I have passed these parameters to EUtils efetch:

db=nucleotide
id=10835174 (the gi for my mRNA: HTR2A)
retmode=html
rettype=gb (GenBank mode)
extrafeat=1 (to get the SNPs; if you don't send extrafeat=1, EUtils will not show you any SNP or variation)
This is the URL:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=10835174&retmode=html&rettype=gb&extrafeat=1

Well, there are 7 variations. 4 of them are outside the limits of the CDS (so are not translated). I am only interested in the SNPs of the exons. Finally I get these 3 SNPs:

variation 219
          /gene="HTR2A"
          /replace="a"
          /replace="c"
          /db_xref="dbSNP:1805055"
variation 734
          /gene="HTR2A"
          /replace="a"
          /replace="g"
          /db_xref="dbSNP:6304"
variation 1407
          /gene="HTR2A"
          /replace="c"
          /replace="t"
          /db_xref="dbSNP:1058576"

BUT.... here is the problem. If you go to LocusLink for the HTR2A gene you will find it:

http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?locusId=3356&mrna=NM_000621&ctg=NT_024524&prot=NP_000612&orien=reverse&view+rs+=view+rs+&chooseRs=coding

This is the list of SNPs for the exons in that HTR2A mRNA:

rs6314
rs6308
rs1058576
rs6304
rs6305
rs6313
rs1805055

Compared with the list we get from EUtils, that is:

LocusLink               EUtils
rs6314
rs6308
rs1058576  =========  1058576
rs6304     =========  6304
rs6305
rs6313
rs1805055  =========  1805055

You can see that there are 4 missing SNPs. Why? I have an explanation for two of them: rs6305 and rs6313 are synonymous SNPs (no change in the protein sequence). But what about rs6308 and rs6314? Where have they gone?

I need to solve this mystery in order to build a script that retrieves on the fly (from EUtils) all the SNPs of a given mRNA. I don't want to take all the contigs or FTP the whole RefSeq human genome database. :S Of course, I need to retrieve synonymous SNPs too (in fact, one of the most clinically relevant SNPs in HTR2A is the synonymous SNP rs6313, and nobody seems to know why).

Any comment, suggestion, idea (apart from RTFM... I promise I have read and read!!!!)
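
The comparison between the two sources can be reproduced with a few lines of set arithmetic. This is only a sketch: the identifiers are copied from the two lists in the message above, and the variable names are made up for illustration.

```python
# rs IDs listed on the LocusLink/dbSNP page for the HTR2A mRNA (from the message).
locuslink = {"rs6314", "rs6308", "rs1058576", "rs6304",
             "rs6305", "rs6313", "rs1805055"}

# /db_xref values of the variation features returned by efetch with extrafeat=1.
efetch_xrefs = ["dbSNP:1805055", "dbSNP:6304", "dbSNP:1058576"]

# Normalise "dbSNP:6304" to "rs6304" so both sources use the same key.
efetch = {"rs" + xref.split(":", 1)[1] for xref in efetch_xrefs}

# IDs present on the LocusLink page but absent from the efetch output,
# ordered by their numeric part.
missing = sorted(locuslink - efetch, key=lambda rs: int(rs[2:]))
print(missing)
```

Keeping both sides as sets makes it easy to rerun the check for another gene once the two lists are collected.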
From JBonis at imim.es Wed Nov 10 06:24:44 2004
From: JBonis at imim.es (Bonis Sanz, Julio)
Date: Wed Nov 10 06:23:17 2004
Subject: [BioPython] RV: variations in a mRNA (was EUtils)
Message-ID: <66373AD054447F47851FCC5EB49B36110610BB@basquet.imim.es>

I sent a message to the NCBI help desk and they responded with this. Not much help, but I will now try to understand how biopython works with elink to try to solve my problem.

-----Original Message-----
From: Gabrielian, Andrei (NIH/NLM/NCBI) [mailto:gabrieli@ncbi.nlm.nih.gov]
Sent: Wednesday, 10 November 2004 1:19
To: Bonis Sanz, Julio
Subject: RE: variations in a mRNA

Dear NCBI user,

if you click on Links -> SNP for this sequence, you'll be linked to 310 SNPs:

http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=nucleotide&cmd=Display&dopt=nucleotide_snp&from_uid=10835174

the same result you'll get if you do this via e-utilities call:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=nucleotide&db=snp&id=10835174

Hope this helps,
NCBI Help desk

------------- Begin Forwarded Message -------------
Subject: variations in a mRNA
Date: Tue, 9 Nov 2004 17:08:26 +0100
From: "Bonis Sanz, Julio" < >
To:

Dear Sirs,

I am building an application that tries to retrieve all the SNPs contained in a given gene. I am using this URI:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=10835174&rettype=gb&extrafeat=1

that is the gene for the human HTR2A receptor. I only get 7 variations (SNPs)...
But if you go to

http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?locusId=3356&mrna=NM_000621&ctg=NT_024524&prot=NP_000612&orien=reverse&view+rs+=view+rs+&chooseRs=all

you can see that there are several more SNPs (intronic and synonymous ones, for example). What I want is to retrieve ALL the SNPs in a given gene through EUtils, but I am not sure which parameters to send. Any help will be appreciated.

Julio Bonis Sanz MD
Research Group on Biomedical Informatics
GRIB: http://www.imim.es/grib/
------------- End Forwarded Message -------------

From JBonis at imim.es Wed Nov 10 08:56:18 2004
From: JBonis at imim.es (Bonis Sanz, Julio)
Date: Wed Nov 10 08:54:30 2004
Subject: [BioPython] EUtils strange behaviour
Message-ID: <66373AD054447F47851FCC5EB49B36110610BD@basquet.imim.es>

Hi all, hi Andrew,

I have been working on the problem of SNPs.

BTW: I found the "extrafeat" param by emailing NCBI staff. It is not a documented parameter! (Really, the documentation of EUtils is very poor.) Something curious about extrafeat is that with extrafeat=0 you don't get any feature.variation and with extrafeat=1 you get them, but (and this is the curious thing) if you send extrafeat=3 or 5 or 7 or 9 you get the same results as for extrafeat=1 :D Only God knows how that param works internally in the Entrez logic :)

Returning to the original problem: one indirect way to solve it is by using elink, in this way. You send to elink.fcgi:

dbfrom=nucleotide
db=snp
id= __GI of the mRNA of your interest__

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=nucleotide&db=snp&id=10835174

so you get:

nucleotide 10835174 snp nucleotide_snp 633737 9534508 4142900

Then in some way you have to "extract" the rs IDs into a list, rsList = []. I have not worked on it, but it should not be hard. Is there any class or method already implemented in biopython to do this task? Then you can make a kind of for rs in rsList: ....
and for each rs do the following:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=snp&id=6313&rettype=flt&retmode=html

sending to efetch.fcgi the following:

db = snp
id = __the rs__
rettype = flt
retmode = html

so you get this:

1: rs6313 [Homo sapiens]
rs6313 | Hs | 9606 | snp | genotype=NO | submitterlink=YES | updated 08/05/2004 16:40:00
ss7941 | WIAF-CSNP | WIAF-10853 | orient=+
ss4928286 | YUSUKE | IMS-JST093413 | orient=-
ss11087375 | BCM_SSAHASNP | chr13.NT_024524.12_16044432 | orient=-
ss13329237 | SC_SNP | NT_024524.12_16044432 | orient=-
ss19284906 | CSHL-HAPMAP | CSHL-HuDD-200402.chr13.NT_024524.13_28449941 | orient=-
ss21114665 | SSAHASNP | WGSA-200403-chr13.chr13.NT_024524.13_28449941 | orient=-
ss22887275 | IMCJ-GDT | IMCJ-HTR2A_4-CT | orient=+
SNP | alleles='C/T' | het=0.499997 | se(het)=0.00128609
VAL | validated=YES | min_prob=0 | max_prob=? | notwithdrawn
MAP | ncbi_num_chr=1 | ncbi_num_ctg=1 | ncbi_num_sec_loc=1 | ncbi_weight=1
CTG | chr=13 | chr-pos=45267941 | Hs13_24680_34:13 | ctg-start=28449941 | ctg-end=28449941 | loctype=2 | orient=-
LOC | HTR2A | locus_id=3356 | fxn-class=coding-synon | allele=T | frame=3 | residue=S | aa_position=34
LOC | HTR2A | locus_id=3356 | fxn-class=reference | allele=C | frame=3 | residue=S | aa_position=34
GBL | HTR2A | locus_id=3356 | fxn-class=coding-synon
SEQ | AF498982:1 | source-db=gb-mrna | seq-pos=102 | orient=+
SEQ | AL160397:1 | source-db=hgs-finish | seq-pos=45162 | orient=-
SEQ | G28536:1 | source-db=gb-sts | seq-pos=247 | orient=+
SEQ | M86841:1 | source-db=gb-mrna | seq-pos=78 | orient=+
SEQ | NM_000621:1 | source-db=ref-mrna | seq-pos=247 | orient=+
SEQ | S42165:1 | source-db=hgs-finish | seq-pos=102 | orient=+
SEQ | S71229:1 | source-db=gb-mrna | seq-pos=152 | orient=+
SEQ | X57830:1 | source-db=gb-mrna | seq-pos=247 | orient=+

Again, some kind of script should extract the information you are interested in... and that's all! I will work on it and send my script soon...
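
The extraction step mentioned above could look roughly like the following. It is a minimal sketch, not the script promised in the message: it assumes the flat-file record has one " | "-separated entry per line, as in the efetch output shown, and the function names and the shape of the summary dict are invented for illustration. The sample text is a subset of the rs6313 record and nothing is fetched from the network.

```python
SAMPLE = """\
rs6313 | Hs | 9606 | snp | genotype=NO | submitterlink=YES | updated 08/05/2004 16:40:00
SNP | alleles='C/T' | het=0.499997 | se(het)=0.00128609
LOC | HTR2A | locus_id=3356 | fxn-class=coding-synon | allele=T | frame=3 | residue=S | aa_position=34
LOC | HTR2A | locus_id=3356 | fxn-class=reference | allele=C | frame=3 | residue=S | aa_position=34
SEQ | NM_000621:1 | source-db=ref-mrna | seq-pos=247 | orient=+
"""


def parse_flt_record(text):
    """Split each line into (tag, bare fields, key=value fields)."""
    rows = []
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 2:
            continue  # skip blank lines and the "1: rs6313 [...]" banner
        tag = fields[0]
        bare = [f for f in fields[1:] if "=" not in f]
        pairs = dict(f.split("=", 1) for f in fields[1:] if "=" in f)
        rows.append((tag, bare, pairs))
    return rows


def summarize(rows):
    """Collect the rs ID, alleles, function classes and RefSeq mRNA position."""
    summary = {"rs": None, "alleles": None, "fxn_classes": [], "mrna_pos": {}}
    for tag, bare, pairs in rows:
        if tag.startswith("rs") and summary["rs"] is None:
            summary["rs"] = tag
        elif tag == "SNP":
            summary["alleles"] = pairs.get("alleles", "").strip("'")
        elif tag == "LOC":
            summary["fxn_classes"].append(pairs.get("fxn-class"))
        elif tag == "SEQ" and pairs.get("source-db") == "ref-mrna":
            summary["mrna_pos"][bare[0]] = int(pairs["seq-pos"])
    return summary


summary = summarize(parse_flt_record(SAMPLE))
print(summary)
```

The fxn-class values on the LOC lines are what would tell a synonymous SNP such as rs6313 apart from the others.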
cheers,
Julio Bonis Sanz MD
http://www.juliobonis.com/portal/

-----Original Message-----
From: Andrew Dalke [mailto:dalke@dalkescientific.com]
Sent: Wednesday, 10 November 2004 9:06
To: Bonis Sanz, Julio
CC: biopython@biopython.org
Subject: Re: [BioPython] EUtils strange behaviour

Hi again Julio,

> Any comment, suggestion, idea (apart from RTFM... I promise I have
> read and read!!!!)

No doubt you were as frustrated as I was with the documentation. It's opaque and incomplete. I spent a lot of time doing what you were doing, testing things and seeing if I could make sense of it all. I didn't figure out the SNPs interface. I just don't know enough about that domain and there's pretty much no documentation for that. You should try the NCBI EUtils help desk at eutilities@ncbi.nlm.nih.gov .

Andrew
dalke@dalkescientific.com

From JBonis at imim.es Wed Nov 10 09:03:00 2004
From: JBonis at imim.es (Bonis Sanz, Julio)
Date: Wed Nov 10 09:01:27 2004
Subject: [BioPython] EUtils and SNPs
Message-ID: <66373AD054447F47851FCC5EB49B36110610BE@basquet.imim.es>

By the way: I want to send this to EUtils:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=snp&id=6313&rettype=flt&retmode=html

but snp is not defined as a valid db in DBIds. Any fast solution? Where are the DBIds databases defined?

regards,
Julio Bonis Sanz

-----Original Message-----
From: Andrew Dalke [mailto:dalke@dalkescientific.com]
Sent: Wednesday, 10 November 2004 8:46
To: Bonis Sanz, Julio
CC: biopython@biopython.org
Subject: Re: [BioPython] EUtils and SNPs

Hi Julio,

> But now it doesnt work. The reason is that in EUtils (PubMed) now it
> is needed to add a parameter... "extrafeat=1" to get the SNPs
> (variations).....

Interesting. That isn't documented that I know about. How did you find out about it?

> I have "fix" the problem touching the code in the script
> /EUtils/ThinClient.py

Okay, I'll add that to the EUtils v.2 package I announced here a couple months ago.
Haven't gotten any feedback on it. Anyone tried it?

> Maybe te biopython developers team want to include this in someway for
> future releases.

Mmm, so the question is how to pass extra parameters from the NCBIDict to the EUtils package. I've been working on other projects since that preview release. I'll bump up the priority on it.

Andrew
dalke@dalkescientific.com

From JBonis at imim.es Wed Nov 10 09:20:27 2004
From: JBonis at imim.es (Bonis Sanz, Julio)
Date: Wed Nov 10 09:18:38 2004
Subject: Nevermind....RE: [BioPython] EUtils and SNPs
Message-ID: <66373AD054447F47851FCC5EB49B36110610BF@basquet.imim.es>

Sorry.... I found the answer to my last question myself: :)

>>> eutils = ThinClient.ThinClient()
>>> dbids = DBIds("snp",["6313"])
>>> infile = eutils.efetch_using_dbids(dbids,rettype='flt',retmode='html')

_______________________________________________
BioPython mailing list - BioPython@biopython.org
http://biopython.org/mailman/listinfo/biopython

From gca500 at york.ac.uk Wed Nov 10 12:01:00 2004
From: gca500 at york.ac.uk (Atkinson, GC)
Date: Wed Nov 10 11:59:45 2004
Subject: [BioPython] Blast descriptions and alignments
Message-ID: <4192494C.9000200@york.ac.uk>

Hi,

I'm doing a qblast and I'm having problems retrieving 500 alignments and descriptions. I'm using:

b_results=NCBIWWW.qblast('blastp','nr',record, descriptions = 500, alignments = 500)

It just retrieves about 100 results, though doing a blast search without my Python script gets the full 500. Any ideas?

Gemma Atkinson

From aaron at atroxen.com Fri Nov 12 00:21:53 2004
From: aaron at atroxen.com (Aaron Zschau)
Date: Fri Nov 12 00:20:10 2004
Subject: [BioPython] genbank protein lookup issues
Message-ID:

Something seems to have broken since I last ran this program a few weeks ago, my line:

gi_list = GenBank.search_for(form["genename"].value, database="protein")

with values for form["genename"].value such as "YAL038w" but it seems to give the error no matter what search string I use. I haven't made any changes to my code since it last worked so I'm wondering if genbank changed something.
I get the following errors in my logs from running: [Fri Nov 12 00:14:20 2004] [error] [client 10.0.10.22] Traceback (most recent call last):, referer: http://serval.atroxen.com:8080/interface.html [Fri Nov 12 00:14:20 2004] [error] [client 10.0.10.22] File "/var/www/cgi-bin/cluster.py", line 78, in ?, referer: http://serval.atroxen.com:8080/interface.html [Fri Nov 12 00:14:20 2004] [error] [client 10.0.10.22] gi_list = GenBank.search_for(form["genename"].value, database="protein"), referer: http://serval.atroxen.com:8080/interface.html [Fri Nov 12 00:14:20 2004] [error] [client 10.0.10.22] File "/tmp/biopython-1.30/build/lib.linux-i586-2.2/Bio/GenBank/__init__.py", line 1398, in search_for, referer: http://serval.atroxen.com:8080/interface.html [Fri Nov 12 00:14:20 2004] [error] [client 10.0.10.22] retstart = start_id, retmax = max_ids), referer: http://serval.atroxen.com:8080/interface.html [Fri Nov 12 00:14:20 2004] [error] [client 10.0.10.22] File "/usr/lib/python2.2/site-packages/Bio/EUtils/DBIdsClient.py", line 294, in search, referer: http://serval.atroxen.com:8080/interface.html [Fri Nov 12 00:14:20 2004] [error] [client 10.0.10.22] searchinfo = parse.parse_search(infile, [None]), referer: http://serval.atroxen.com:8080/interface.html [Fri Nov 12 00:14:20 2004] [error] [client 10.0.10.22] File "/tmp/biopython-1.30/build/lib.linux-i586-2.2/Bio/EUtils/parse.py", line 219, in parse_search, referer: http://serval.atroxen.com:8080/interface.html [Fri Nov 12 00:14:20 2004] [error] [client 10.0.10.22] raise TypeError("Unknown OP code: %r" % (s,)), referer: http://serval.atroxen.com:8080/interface.html [Fri Nov 12 00:14:20 2004] [error] [client 10.0.10.22] TypeError: Unknown OP code: u'GROUP', referer: http://serval.atroxen.com:8080/interface.html any help would be appreciated, Aaron Zschau From aaron at atroxen.com Sat Nov 13 14:03:51 2004 From: aaron at atroxen.com (Aaron Zschau) Date: Sat Nov 13 14:02:05 2004 Subject: [BioPython] genbank protein lookup issues 
In-Reply-To: References: Message-ID: I've managed to confirm that the problem line is as follows: gi_list = GenBank.search_for("YAL038w",database='protein') if I just take that one line out and run it by hand from the python prompt (with the necessary imports first), I get the following error: >>> gi_list = GenBank.search_for("YAL038w",database='protein') Traceback (most recent call last): File "", line 1, in ? File "/tmp/biopython-1.30/build/lib.linux-i586-2.2/Bio/GenBank/__init__.py", line 1398, in search_for retstart = start_id, retmax = max_ids) File "/usr/lib/python2.2/site-packages/Bio/EUtils/DBIdsClient.py", line 294, in search searchinfo = parse.parse_search(infile, [None]) File "/tmp/biopython-1.30/build/lib.linux-i586-2.2/Bio/EUtils/parse.py", line 219, in parse_search raise TypeError("Unknown OP code: %r" % (s,)) TypeError: Unknown OP code: u'GROUP' Does anybody know why this would be happening or if it is a known problem with the genbank servers/parsing? thanks, Aaron Zschau On Nov 12, 2004, at 12:21 AM, Aaron Zschau wrote: > Something seems to have broken since I last ran this program a few > weeks ago, my line: > > gi_list = GenBank.search_for(form["genename"].value, > database="protein") > > with values for form["genename"].value such as "YAL038w" but it seems > to give the error no matter what search string I use. I haven't made > any changes to my code since it last worked so I'm wondering if > genbank changed something. 
> > > I get the following errors in my logs from running: > > [Fri Nov 12 00:14:20 2004] [error] [client 10.0.10.22] Traceback (most > recent call last):, referer: > http://serval.atroxen.com:8080/interface.html > [Fri Nov 12 00:14:20 2004] [error] [client 10.0.10.22] File > "/var/www/cgi-bin/cluster.py", line 78, in ?, referer: > http://serval.atroxen.com:8080/interface.html > [Fri Nov 12 00:14:20 2004] [error] [client 10.0.10.22] gi_list = > GenBank.search_for(form["genename"].value, database="protein"), > referer: http://serval.atroxen.com:8080/interface.html > [Fri Nov 12 00:14:20 2004] [error] [client 10.0.10.22] File > "/tmp/biopython-1.30/build/lib.linux-i586-2.2/Bio/GenBank/ > __init__.py", line 1398, in search_for, referer: > http://serval.atroxen.com:8080/interface.html > [Fri Nov 12 00:14:20 2004] [error] [client 10.0.10.22] retstart = > start_id, retmax = max_ids), referer: > http://serval.atroxen.com:8080/interface.html > [Fri Nov 12 00:14:20 2004] [error] [client 10.0.10.22] File > "/usr/lib/python2.2/site-packages/Bio/EUtils/DBIdsClient.py", line > 294, in search, referer: http://serval.atroxen.com:8080/interface.html > [Fri Nov 12 00:14:20 2004] [error] [client 10.0.10.22] searchinfo > = parse.parse_search(infile, [None]), referer: > http://serval.atroxen.com:8080/interface.html > [Fri Nov 12 00:14:20 2004] [error] [client 10.0.10.22] File > "/tmp/biopython-1.30/build/lib.linux-i586-2.2/Bio/EUtils/parse.py", > line 219, in parse_search, referer: > http://serval.atroxen.com:8080/interface.html > [Fri Nov 12 00:14:20 2004] [error] [client 10.0.10.22] raise > TypeError("Unknown OP code: %r" % (s,)), referer: > http://serval.atroxen.com:8080/interface.html > [Fri Nov 12 00:14:20 2004] [error] [client 10.0.10.22] TypeError: > Unknown OP code: u'GROUP', referer: > http://serval.atroxen.com:8080/interface.html > > > any help would be appreciated, > > Aaron Zschau > > _______________________________________________ > BioPython mailing list - 
BioPython@biopython.org
> http://biopython.org/mailman/listinfo/biopython

From raghunath at jhmi.edu Sun Nov 14 15:47:05 2004
From: raghunath at jhmi.edu (Raghunath Reddy)
Date: Sun Nov 14 12:40:17 2004
Subject: [BioPython] BLAST Problem
Message-ID:

I have been trying the biopython Blast module, but I'm getting the following error:

INFO: Failure to set-up BLAST search with database 'refseq'

Any suggestions?

Thanks
Raghunath

From biopython at maubp.freeserve.co.uk Fri Nov 19 06:39:01 2004
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri Nov 19 06:38:02 2004
Subject: [BioPython] GenBank parsing errors
Message-ID: <419DDB55.2040907@maubp.freeserve.co.uk>

I have been trying to use the GenBank parser and have had some trouble. I notice from the archives that Michael Maibaum has also had difficulties:

http://portal.open-bio.org/pipermail/biopython/2004-November/002457.html

Michael wrote:
> I'm trying to use biopython to parse genbank files and it is working
> happily on some genbank files, but not many others. So far the
> pattern appears to be
>
> Prokaryotic complete genome => OK
> Eukaryotic complete genome =>failure

I have not tried any eukaryotes, but I have tried several prokaryotes without any success. While I do recall having seen Martel parser errors (probably like Michael had), I generally have a different problem. For example, this small sample of code fails using E. coli K12, file NC_000913.gbk (about 10MB), available from here:

ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/Escherichia_coli_K12/

from Bio import GenBank
gb_handle = open('NC_000913.gbk', 'r')
feature_parser = GenBank.FeatureParser()
gb_iterator = GenBank.Iterator(gb_handle, feature_parser)
print 'So far so good'
cur_record = gb_iterator.next()
print 'Done'

I see CPU usage at almost 100%, and memory usage for Python goes steadily up. At about 200 or 300MB the CPU usage drops, and my system becomes very sluggish. I normally kill the process at this point.
Windows XP
BioPython 1.30
Python 2.3

Has anyone got the GenBank parser to work on a bacterial genome?

Thank you
Peter

From jonathan.taylor at utoronto.ca Fri Nov 19 19:39:59 2004
From: jonathan.taylor at utoronto.ca (Jonathan Taylor)
Date: Fri Nov 19 19:38:41 2004
Subject: [BioPython] Biopython and Quixote preventing use of Bio.GenBank.NCBIDictionary
Message-ID: <1100911199.18277.21.camel@dallas.bbc.botany.utoronto.ca>

Hi,

I am using quixote, a web application server, to develop a web based application. I have a genbank id that I want to get the taxonomical description of. I can do this easily via my util.py script from the command line. When I run it from inside my web application I get the error below. Any help is greatly appreciated.

Thanks
Jon.

Traceback (most recent call last):
  File "/usr/lib/python2.3/site-packages/quixote/publish.py", line 522, in process_request
    output = self.try_publish(request, env.get('PATH_INFO', ''))
  File "/usr/lib/python2.3/site-packages/quixote/publish.py", line 457, in try_publish
    output = object(request)
  File "/home/jtaylor/projects/fungid/fungid/web/input.ptl", line 93, in _q_index
    return form.action(request, 'Submit', values)
  File "/home/jtaylor/projects/fungid/fungid/web/input.ptl", line 67, in action
    seq = blast.add_sequence(record)
  File "/home/jtaylor/projects/fungid/fungid/lib/blast.py", line 100, in add_sequence
    name = util.lookup_name_from_gi(gi)
  File "/home/jtaylor/projects/fungid/fungid/lib/util.py", line 8, in lookup_name_from_gi
    ncbi_dict = GenBank.NCBIDictionary('nucleotide', 'genbank', parser = feature_parser)
  File "/home/jtaylor/lib/python/Bio/GenBank/__init__.py", line 1320, in __init__
    self.db = db["nucleotide-genbank-eutils"]
  File "/home/jtaylor/lib/python/Bio/config/Registry.py", line 93, in __getitem__
    return self._name_table[name] # raises KeyError for unknown entries
KeyError: 'nucleotide-genbank-eutils'

Here is util.py:

from Bio import GenBank

def lookup_name_from_gi(gi):
    feature_parser = GenBank.RecordParser()
    ncbi_dict = GenBank.NCBIDictionary('nucleotide', 'genbank', parser = feature_parser)
    record = ncbi_dict[str(gi)]
    # may want to use the taxonomy list to help here
    return record.source

if __name__ == '__main__':
    print lookup_name_from_gi(47499221)

From biopython at maubp.freeserve.co.uk Wed Nov 24 14:35:20 2004
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed Nov 24 14:30:29 2004
Subject: [BioPython] GenBank parsing errors
In-Reply-To: <6.0.3.0.0.20041124091100.01c3e070@mail.xbioinformatics.org>
References: <419DDB55.2040907@maubp.freeserve.co.uk> <6.0.3.0.0.20041124091100.01c3e070@mail.xbioinformatics.org>
Message-ID: <41A4E278.7040103@maubp.freeserve.co.uk>

Peter wrote:
>> For example, this small sample of code fails using E. coli K12,
>> file NC_000913.gbk (about 10MB) available from here: (code removed)
>> I see CPU usage at almost 100%, and memory usage for Python
>> goes steadily up. At about 200 or 300MB the CPU usage drops,
>> and my system becomes very sluggish. I normally kill the
>> process at this point.

Admin wrote:
> I have tried to use the Genbank or bacterial genomes in the past
> but I had to abandon it because it thrashes around in memory as
> you have described, as the sequence is just too large for the
> API. I was splicing out cds features from the records.
>
> I had to write a custom parser to get the job done

Good to know it's not just me or my computer :)

I have also resorted to writing my own custom script to do the job. For each gene I wanted the translated sequence and the CDS information (i.e. position in genome), which are both fairly easy to get from the GenBank file. I wrote some code to convert the GenBank file into a custom .faa FASTA file with the CDS location (and a few other properties like the product) encoded into the FASTA title records. This lets me use all the BioPython FASTA support, and get the additional information by parsing the sequence title/description.
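
One way such a title encoding could work is sketched below. The message does not give the actual layout, so the "|"-separated key=value format, the function names and the example gene are all assumptions made for illustration; the sketch also assumes the product text contains no "|" character.

```python
def make_title(gene, start, end, strand, product):
    """Build a FASTA title line carrying the CDS location and product."""
    return ">%s|loc=%d..%d|strand=%s|product=%s" % (gene, start, end, strand, product)


def parse_title(title):
    """Recover the fields encoded by make_title() from a FASTA title line."""
    parts = title.lstrip(">").split("|")
    info = {"gene": parts[0]}
    for part in parts[1:]:
        key, value = part.split("=", 1)
        info[key] = value
    # Turn "100..400" back into two integers.
    start, end = info.pop("loc").split("..")
    info["start"], info["end"] = int(start), int(end)
    return info


# Made-up example values, not taken from a real GenBank record.
title = make_title("geneX", 100, 400, "+", "hypothetical protein")
print(title)
print(parse_title(title))
```

Because everything lives in the title line, any FASTA reader that exposes the title string can hand it to parse_title() unchanged.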
Peter

From JBonis at imim.es Thu Nov 25 10:39:40 2004
From: JBonis at imim.es (Bonis Sanz, Julio)
Date: Thu Nov 25 10:37:31 2004
Subject: [BioPython] Bug in GenBank.FeatureParser detected
Message-ID: <66373AD054447F47851FCC5EB49B36110610D4@basquet.imim.es>

Hi,

I was trying to parse a genbank record obtained by using eutils:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=51511729&rettype=gbwithparts&retmode=text&seq_start=46267941&seq_stop=46467941

That is, human chromosome 13, from 46267941 to 46467941. My idea is to give a position and a chromosome and then get the surrounding genes/mRNAs/CDS/SNPs in the area.

I have found that GenBank.FeatureParser() raises an error on this header:

LOCUS NC_000013 200001 bp DNA linear CON 25-OCT-2004
DEFINITION Homo sapiens chromosome 13, complete sequence.
ACCESSION NC_000013 REGION: 46267941..46467941
VERSION NC_000013.9 GI:51511729
KEYWORDS HTG.

Specifically on the line:

ACCESSION NC_000013 REGION: 46267941..46467941

If you remove 'REGION: 46267941..46467941' then the parser works fine. I guess it should be easy to fix in the Biopython code, but I get lost when trying to find where. Can someone help? I suggest adding the fix as soon as possible.

Regards,
Julio Bonis Sanz MD
http://www.juliobonis.com/portal/
Research Group on Biomedical Informatics
http://www.imim.es/grib/
Barcelona - Spain

From JBonis at imim.es Mon Nov 29 06:55:05 2004
From: JBonis at imim.es (Bonis Sanz, Julio)
Date: Mon Nov 29 06:52:43 2004
Subject: [BioPython] How is a join stored in the gb_record?
Message-ID: <66373AD054447F47851FCC5EB49B36110610D9@basquet.imim.es>

hi!

I have parsed a "genome" GenBank record with biopython. Let's say:

mRNA join(1870..1959,2070..2106,8593..8646,9327..9399,
20023..20132,25649..25786,30723..30800,38444..40397)

What I need is to retrieve from the gb_record the subsegments (1870 to 1959, 2070 to 2106....)
But I have only worked out how to retrieve the start and end points (1870 and 40397) by:

gb_record.features[x].location._start.position
gb_record.features[x].location._end.position

Any idea?

regards,
Julio Bonis Sanz

From JBonis at imim.es Tue Nov 30 10:13:35 2004
From: JBonis at imim.es (Bonis Sanz, Julio)
Date: Tue Nov 30 10:11:17 2004
Subject: [BioPython] TypeError: Unknown OP code: u'GROUP' and other issues
Message-ID: <66373AD054447F47851FCC5EB49B36110610E0@basquet.imim.es>

Hi all,

Working with biopython I have found some bugs. For example, when using the GenBank.search_for() function:

>> GenBank.search_for("HTR2A[GENE]")

it returns the error:

>> TypeError: Unknown OP code: u'GROUP'

This is because the XML from NCBI returns a "new" operator, named GROUP, that is not included in the biopython code:

HTR2A[GENE] GENE 27 Y GROUP

I don't know what those operators mean, but touching the biopython code (Bio/EUtils/parse.py) like this:

        elif s == "NOT":
            stack[-2:] = [Datatypes.Not(stack[-2], stack[-1])]
        #added by jbonis
        elif s == "GROUP":
            garbage = s
        #end of added by jbonis
        else:
            raise TypeError("Unknown OP code: %r" % (s,))

it works.

But there is another problem: GenBank.search_for() then returns:

"Error: Sequence Viewer does not have any Presentations for code='gi_text'"

The solution I have found is to change the search_for() function in Bio/GenBank/__init__.py like this:

>> for db_id in db_ids:
>>     #ids.append(db_id.dbids.ids[0]) #removed by jbonis
>>     ids.append(int(db_id.records_dbids.ids[0])) #jbonis
>> return ids

Hope it helps others with the same problem, and helps the people from biopython to improve the next release.
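
The idea behind the GROUP patch can be shown with a small self-contained sketch. It does not use Bio.EUtils: plain tuples stand in for the Datatypes objects, the (kind, value) input format is invented for illustration, and treating GROUP as a marker that leaves the stack untouched is an assumption about what NCBI means by it.

```python
BINARY_OPS = {"AND", "OR", "NOT"}


def eval_translation_stack(items):
    """Reduce a postfix list of search terms and operators to expression tuples."""
    stack = []
    for kind, value in items:
        if kind == "term":
            stack.append(value)
        elif value in BINARY_OPS:
            # Binary operators combine the two most recent entries.
            right = stack.pop()
            left = stack.pop()
            stack.append((value, left, right))
        elif value == "GROUP":
            # Assumed to be a grouping marker only: nothing to combine.
            continue
        else:
            raise TypeError("Unknown OP code: %r" % (value,))
    return stack


single = eval_translation_stack([("term", "HTR2A[GENE]"), ("op", "GROUP")])
combined = eval_translation_stack([
    ("term", "HTR2A[GENE]"),
    ("term", "human[ORGN]"),
    ("op", "AND"),
    ("op", "GROUP"),
])
print(single)
print(combined)
```

Skipping the operator explicitly, instead of assigning it to a throwaway variable, keeps the final else branch available for operators that really are unknown.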
regards,
Julio Bonis Sanz MD
http://www.juliobonis.com/portal/

From JBonis at imim.es Tue Nov 30 10:27:36 2004
From: JBonis at imim.es (Bonis Sanz, Julio)
Date: Tue Nov 30 10:25:09 2004
Subject: mistake: RE: [BioPython] TypeError: Unknown OP code: u'GROUP' and other issues
Message-ID: <66373AD054447F47851FCC5EB49B36110610E2@basquet.imim.es>

There was an error in my code. Please replace this:

>> for db_id in db_ids:
>>     #ids.append(db_id.dbids.ids[0]) #removed by jbonis
>>     ids.append(int(db_id.records_dbids.ids[0])) #jbonis
>> return ids

with this:

>> for db_id in db_ids:
>>     #ids.append(db_id.dbids.ids[0]) #removed by jbonis
>>     ids.append(str(int(db_id.records_dbids.ids[0]))) #jbonis
>> return ids

(add a str() inside ids.append()); it should work fine now.