From chapmanb at arches.uga.edu Sun Jul 1 13:40:51 2001
From: chapmanb at arches.uga.edu (Brad Chapman)
Date: Sat Mar 5 14:43:00 2005
Subject: [Biopython-dev] Biopython 1.00a2 release
In-Reply-To:
References:
Message-ID: <15167.24739.293433.283046@taxus.athen1.ga.home.com>

Jeff:
> It smells like time to put together another release to get some bug
> fixes and new functionality out to the public.

Sounds like a great idea! Thanks for pushing for it to happen.

> I currently have the
> middle of next week in mind, although that's still up for debate.

Fine with me.

> Here's my undoubtedly incomplete list of stuff that's been updated
> since the last release (on Mar 3!):
[...]

A few other things I can think of:

o Can output GenBank.Record objects in GenBank format
o Fixes and updates in SubsMat code.

> 1) you're currently working on something and really want to hold
> off until it's done.

Iddo and I were working on fixes and additions he suggested for the
Align stuff. He just sent me a revised version today, so I think we
should be able to have it in within the deadline without a problem.

> - dynamic programming code (Brad, where's yours? :)

Um...my cat ate it.

Jeff:
> Stuff I'd like, but may not get done:
> - PDB parser

Andrew:
> I use UPDB to generate Martel format definitions. However, it's
> not really useful unless the parser can build real data structures,

Thomas:
> I think, this month I can spend significantly more time on the biopython
> project than the last months - so is anything I mentioned worth to pull in
> the next release ?

Don't know if this can make it in the next release, but it seems like
there is:

1. Interest in a PDB parser.
2. Code available from Andrew for a partial PDB parser, that needs
   someone to help finish and integrate it.
3. Someone named Thomas with extra time on his hands :-)

Don't know if this is a good match with your interests, Thomas, but
it's at least an idea (man, and I don't get them very often).
In general, sounds good about the release. Please let me know if there
are specific things I can do to help out.

Brad

From biopython-bugs at bioperl.org Sun Jul 1 13:56:27 2001
From: biopython-bugs at bioperl.org (biopython-bugs@bioperl.org)
Date: Sat Mar 5 14:43:00 2005
Subject: [Biopython-dev] Notification: incoming/34
Message-ID: <200107011756.f61HuQ402618@pw600a.bioperl.org>

JitterBug notification

chapmanb moved PR#34 from incoming to trash

Message summary for PR#34
From:
Subject: specials of the day
Date: Mon, 28 May 2001 15:02:24
0 replies 0 followups

From biopython-bugs at bioperl.org Sun Jul 1 13:56:27 2001
From: biopython-bugs at bioperl.org (biopython-bugs@bioperl.org)
Date: Sat Mar 5 14:43:00 2005
Subject: [Biopython-dev] Notification: incoming/36
Message-ID: <200107011756.f61HuR402622@pw600a.bioperl.org>

JitterBug notification

chapmanb moved PR#36 from incoming to trash

Message summary for PR#36
From:
Subject: toner supplies
Date: Fri, 29 Jun 2001 04:13:00
0 replies 0 followups

From idoerg at cc.huji.ac.il Mon Jul 2 02:15:12 2001
From: idoerg at cc.huji.ac.il (Iddo Friedberg)
Date: Sat Mar 5 14:43:00 2005
Subject: [Biopython-dev] Biopython 1.00a2 release
In-Reply-To: <15167.24739.293433.283046@taxus.athen1.ga.home.com>
Message-ID:

Hi,

Jeff:
:
: > Here's my undoubtedly incomplete list of stuff that's been updated
: > since the last release (on Mar 3!):
: [...]
:
: A few other things I can think of:
:
: o Can output GenBank.Record objects in GenBank format
: o Fixes and updates in SubsMat code.

I fixed the SubsMat code, committed it + regression tests, and updated
the Wiki docs. AFAIK, should be OK. Let me know if not so.

Also, I added a bit of functionality to FSSP.FSSPTools

Brad:
: Iddo and I were working on fixes and additions he suggested for the
: Align stuff. He just sent me a revised version today, so I think we
: should be able to have it in within the deadline without a problem.

Yep. I think I can finish this by Tuesday.
: Jeff:
: > Stuff I'd like, but may not get done:
: > - PDB parser
:
: Andrew:
: > I use UPDB to generate Martel format definitions. However, it's
: > not really useful unless the parser can build real data structures,
:

Is anybody familiar with Konrad Hinsen's PDB module which is used, I
believe as part of MMTK? I downloaded it a couple of years ago, and it
seems to stand out well on its own. It might be worth checking out:

http://starship.python.net/crew/hinsen/MMTK/

(Haven't really looked at it for a while, though).

Iddo

--
Iddo Friedberg | Tel: +972-2-6758647
Dept. of Molecular Genetics and Biotechnology | Fax: +972-2-6757308
The Hebrew University - Hadassah Medical School | email: idoerg@cc.huji.ac.il
POB 12272, Jerusalem 91120 |
Israel | http://bioinfo.md.huji.ac.il/marg/people-home/iddo/

From idoerg at cc.huji.ac.il Wed Jul 4 12:19:46 2001
From: idoerg at cc.huji.ac.il (Iddo Friedberg)
Date: Sat Mar 5 14:43:01 2005
Subject: [Biopython-dev] Biopython 1.00a2 release
In-Reply-To:
Message-ID:

Hi,

FSSP, Align, and SubsMat have all been committed for the new version. Inc.
updated regression tests.

Iddo

--
Iddo Friedberg | Tel: +972-2-6758647
Dept. of Molecular Genetics and Biotechnology | Fax: +972-2-6757308
The Hebrew University - Hadassah Medical School | email: idoerg@cc.huji.ac.il
POB 12272, Jerusalem 91120 |
Israel | http://bioinfo.md.huji.ac.il/marg/people-home/iddo/

From chapmanb at arches.uga.edu Wed Jul 4 16:06:54 2001
From: chapmanb at arches.uga.edu (Brad Chapman)
Date: Sat Mar 5 14:43:01 2005
Subject: [Biopython-dev] Biopython 1.00a2 release
In-Reply-To:
References:
Message-ID: <15171.30558.888426.472221@taxus.athen1.ga.home.com>

Hi Iddo;

> FSSP, Align, and SubsMat have all been committed for the new version. Inc.
> updated regression tests.

Sweet. Thanks for the fixes and updates.
Can you also update the generated output in Tests/output for
test_FSSP, test_align and test_SubsMat if you've verified it to be
correct (you can do 'python run_tests.py -g test_FSSP' to get the
output updated). Right now running python run_tests.py in the Tests
directory gives the following errors:

======================================================================
ERROR: test_FSSP
----------------------------------------------------------------------
Traceback (most recent call last):
  File "run_tests.py", line 155, in runTest
    raise IOError, "Warning: Can't open %s for test %s" % \
IOError: Warning: Can't open ./output/test_FSSP for test test_FSSP
======================================================================
FAIL: test_SubsMat
----------------------------------------------------------------------
Traceback (most recent call last):
  File "run_tests.py", line 153, in runTest
    expected_handle)
  File "run_tests.py", line 247, in compare_output
    assert expected_line == output_line, \
AssertionError:
Output : '\n'
Expected: 'A 1.60\n'
======================================================================
FAIL: test_align
----------------------------------------------------------------------
Traceback (most recent call last):
  File "run_tests.py", line 153, in runTest
    expected_handle)
  File "run_tests.py", line 247, in compare_output
    assert expected_line == output_line, \
AssertionError:
Output : 'part of alignment: 88.4230990854\n'
Expected: 'part of alignment: 1.57690091462\n'
----------------------------------------------------------------------

Thanks again for the updates!

Brad

From idoerg at cc.huji.ac.il Thu Jul 5 04:56:15 2001
From: idoerg at cc.huji.ac.il (Iddo Friedberg)
Date: Sat Mar 5 14:43:01 2005
Subject: [Biopython-dev] Biopython 1.00a2 release
In-Reply-To: <15171.30558.888426.472221@taxus.athen1.ga.home.com>
Message-ID:

Hi,

Incidentally: Happy (belated) 4th of July!
On Wed, 4 Jul 2001, Brad Chapman wrote:

:
: Can you also update the generated output in Tests/output for
: test_FSSP, test_align and test_SubsMat if you've verified it to be
: correct (you can do 'python run_tests.py -g test_FSSP' to get the
: output updated).

OK, done. Sorry about that.

Iddo

--
Iddo Friedberg | Tel: +972-2-6758647
Dept. of Molecular Genetics and Biotechnology | Fax: +972-2-6757308
The Hebrew University - Hadassah Medical School | email: idoerg@cc.huji.ac.il
POB 12272, Jerusalem 91120 |
Israel | http://bioinfo.md.huji.ac.il/marg/people-home/iddo/

From jchang at SMI.Stanford.EDU Thu Jul 5 15:57:29 2001
From: jchang at SMI.Stanford.EDU (Jeffrey Chang)
Date: Sat Mar 5 14:43:01 2005
Subject: [Biopython-dev] Biopython 1.00a2 release
In-Reply-To:
References:
Message-ID:

test_FSSP is still failing the regression tests for me. It seems to
be caused by:

    f.write("\nRecords filtered in %s\n" % sum_ge_15.keys())

Since dictionaries are unordered, it is outputting them in a
different order than they did when the output was created. Thus, I
added a little code so that it outputs the sorted keys, which should
be the same every time:

    k = sum_ge_15.keys()
    k.sort()
    f.write("\nRecords filtered in %s\n" % k)

I've committed this change and updated the output. I'm going to start
putting together the release today.

Jeff

From chapmanb at arches.uga.edu Thu Jul 5 16:44:05 2001
From: chapmanb at arches.uga.edu (Brad Chapman)
Date: Sat Mar 5 14:43:01 2005
Subject: [Biopython-dev] Biopython 1.00a2 release
In-Reply-To:
References:
Message-ID: <15172.53653.811395.427800@taxus.athen1.ga.home.com>

Jeff:
> test_FSSP is still failing the regression tests for me. It seems to
> be caused by:
> f.write("\nRecords filtered in %s\n" % sum_ge_15.keys())
>
> Since dictionaries are unordered, it is outputting them in a
> different order than they did when the output was created.

Ya, I also did this myself on some tests.
It is very easy to get tricked on this (well, at least it was easy
for me to get tricked :-) because for the same dictionary in multiple
tests, python will give you the same order. It is when you switch
versions of python that the ordering will start to change and you'll
realize it. Sneaky!

> I'm going to start
> putting together the release today.

Sweet! Give a heads up when you're done and I can build the windows
installer and update the HappyDoc documentation.

Also, can we try to get the pdf documentation in the release this
time? :-). You can grab it from the normal place:

http://www.bioinformatics.org/bradstuff/bp/tut/Tutorial.pdf

If you put it in the Doc directory it would automagically be included
(I hope!).

Thanks!

Brad

From idoerg at cc.huji.ac.il Thu Jul 5 18:12:01 2001
From: idoerg at cc.huji.ac.il (Iddo Friedberg)
Date: Sat Mar 5 14:43:01 2005
Subject: [Biopython-dev] Biopython 1.00a2 release
In-Reply-To: <15172.53653.811395.427800@taxus.athen1.ga.home.com>
Message-ID:

Hi,

On Thu, 5 Jul 2001, Brad Chapman wrote:

: Jeff:
: > test_FSSP is still failing the regression tests for me. It seems to
: > be caused by:
: > f.write("\nRecords filtered in %s\n" % sum_ge_15.keys())
: >
: > Since dictionaries are unordered, it is outputting them in a
: > different order than they did when the output was created.
:
: Ya, I also did this myself on some tests. It is very easy to get
: tricked on this (well, at least it was easy for me to get tricked :-)

Same here. Didn't know that, (though I should have thought about it).
Jeff's fix is fine. Thanks for taking the time to do this.

One more thing: can the following correction please be inserted in the
manual?

(Section 3.5.5)

Amend the paragraph beginning with "Qi - " to:

Qi - is automatically assigned to 0.05 for a protein alphabet, and 0.25
for a nucleic acid alphabet. This is for getting the information content
without any assumption of prior distributions.
When assuming priors, or when using a non-standard alphabet, the user
should supply the values for Qi.

Scratch the whole paragraph (just following) about gap characters,
including the 2nd equation. We're not doing it that way anyway.

Thanks,

Iddo

--
Iddo Friedberg | Tel: +972-2-6758647
Dept. of Molecular Genetics and Biotechnology | Fax: +972-2-6757308
The Hebrew University - Hadassah Medical School | email: idoerg@cc.huji.ac.il
POB 12272, Jerusalem 91120 |
Israel | http://bioinfo.md.huji.ac.il/marg/people-home/iddo/

From chapmanb at arches.uga.edu Thu Jul 5 20:41:25 2001
From: chapmanb at arches.uga.edu (Brad Chapman)
Date: Sat Mar 5 14:43:01 2005
Subject: [Biopython-dev] Biopython 1.00a2 release
In-Reply-To:
References: <15172.53653.811395.427800@taxus.athen1.ga.home.com>
Message-ID: <15173.2357.484152.295717@taxus.athen1.ga.home.com>

Iddo:
> One more thing: can the following correction please be inserted in the
> manual?

Shorely! All fixed in CVS and on the web site documentation. Thanks!

Brad

From jchang at SMI.Stanford.EDU Thu Jul 5 22:24:17 2001
From: jchang at SMI.Stanford.EDU (Jeffrey Chang)
Date: Sat Mar 5 14:43:01 2005
Subject: [Biopython-dev] Biopython 1.00a2 release
In-Reply-To: <15172.53653.811395.427800@taxus.athen1.ga.home.com>
References: <15172.53653.811395.427800@taxus.athen1.ga.home.com>
Message-ID:

> > I'm going to start
>> putting together the release today.
>
>Sweet! Give a heads up when you're done and I can build the windows
>installer and update the HappyDoc documentation.

Yep, it's all there now!

>Also, can we try to get the pdf documentation in the release this
>time? :-). You can grab it from the normal place:
>
>http://www.bioinformatics.org/bradstuff/bp/tut/Tutorial.pdf

Thanks. I had meant to include it last time, but there was a failure
in the build process and it accidentally got left out. Sorry about
that!

Jeff

From jchang at SMI.Stanford.EDU Thu Jul 5 22:26:46 2001
From: jchang at SMI.Stanford.EDU (Jeffrey Chang)
Date: Sat Mar 5 14:43:01 2005
Subject: [Biopython-dev] build process online
Message-ID:

Hello everybody,

I've written down the build process I use to make biopython releases
and put them online:

http://www.biopython.org/wiki/html/BioPython/BuildProcess.html

This is a reference to make sure everything gets done for the builds!

Jeff

From chapmanb at arches.uga.edu Sat Jul 7 14:48:04 2001
From: chapmanb at arches.uga.edu (Brad Chapman)
Date: Sat Mar 5 14:43:01 2005
Subject: [Biopython-dev] Biopython 1.00a2 release
In-Reply-To:
References: <15172.53653.811395.427800@taxus.athen1.ga.home.com>
Message-ID: <15175.22884.286737.563372@taxus.athen1.ga.home.com>

[Jeff on the new release]
> Yep, it's all there now!

I've added windows installers for python2.0 and python2.1 to the
download page. They seem to install fine for me, but if anyone here
uses windows and wants to test them that would be excellent!

I'm probably just going to lay off on making rpms right now, unless
people complain.

[pdf docs]
> Thanks. I had meant to include it last time, but there was a failure
> in the build process and it accidentally got left out. Sorry about
> that!

No problem -- I just thought it was funny that we've been trying to
put it in for the last 12 releases or so and keep forgetting about it
somehow :-). Thanks for getting it in there!

Also, the build process wiki page is quite interesting reading --
thanks for adding it!

Brad

From johann at egenetics.com Mon Jul 9 09:34:26 2001
From: johann at egenetics.com (Johann Visagie)
Date: Sat Mar 5 14:43:01 2005
Subject: [Biopython-dev] Biopython 1.00a2 release
In-Reply-To: ; from jchang@SMI.Stanford.EDU on Wed, Jun 27, 2001 at 11:19:32PM -0700
References:
Message-ID: <20010709153426.E9831@fling.sanbi.ac.za>

FYI, I've just updated the FreeBSD port of Biopython to 1.00a2, and
committed the changes back to the FreeBSD CVS tree.

http://www.freebsd.org/cgi/cvsweb.cgi/ports/biology/py-biopython/

-- Johann

From biopython-bugs at bioperl.org Tue Jul 10 10:40:46 2001
From: biopython-bugs at bioperl.org (biopython-bugs@bioperl.org)
Date: Sat Mar 5 14:43:01 2005
Subject: [Biopython-dev] Notification: incoming/37
Message-ID: <200107101440.f6AEek416252@pw600a.bioperl.org>

JitterBug notification

new message incoming/37

Message summary for PR#37
From: idoerg@cc.huji.ac.il
Subject: MutableSeq
Date: Tue, 10 Jul 2001 10:40:46 -0400
0 replies 0 followups

====> ORIGINAL MESSAGE FOLLOWS <====

>From idoerg@cc.huji.ac.il Tue Jul 10 10:40:46 2001
Received: from localhost (localhost [127.0.0.1])
    by pw600a.bioperl.org (8.11.2/8.11.2) with ESMTP id f6AEej416246
    for ; Tue, 10 Jul 2001 10:40:46 -0400
Date: Tue, 10 Jul 2001 10:40:46 -0400
Message-Id: <200107101440.f6AEej416246@pw600a.bioperl.org>
From: idoerg@cc.huji.ac.il
To: biopython-bugs@bioperl.org
Subject: MutableSeq

Full_Name: Iddo Friedberg
Module: CVS
Version:
OS: RedHat 7.0
Submission from: nv5.huji.ac.il (62.0.54.61)

>>> from Bio import Seq
>>> l=Seq.Seq('ACDEFGHIKL')phabet())
>>> j=l.tomutable()
>>> print j
MutableSeq('ACDEFGHIKL', Alphabet())
>>> j
Segmentation fault
(back to shell prompt)

BTW, I patched my RH 7.0 system up.. I don't believe this is due to
the overflow problems that are known in this system.

Iddo

From idoerg at cc.huji.ac.il Wed Jul 11 03:49:51 2001
From: idoerg at cc.huji.ac.il (Iddo Friedberg)
Date: Sat Mar 5 14:43:01 2005
Subject: [Biopython-dev] Re: bug 37 (fwd)
Message-ID:

Hi Andrew & all,

Sorry, I mangled the thing during cut/paste. Seems like you reproduced
the code correctly, Andrew.

Ok. Here's the minimal code. It causes a segfault with python 2.0 /
2.0.1 on Linux RH 7.0 and RH 6.2

idoerg@arrakis:noh/sw> python
Python 2.0 (#1, Oct 16 2000, 18:10:03)
[GCC 2.95.2 19991024 (release)] on linux2
Type "copyright", "credits" or "license" for more information.
imported cPickle, math, string, os
imported dirs
>>> from Bio import Seq
>>> l=Seq.MutableSeq('ACDEFGHIKL')
>>> print j
MutableSeq('ACDEFG', Alphabet())
>>> j
Segmentation fault (core dumped)

Boo Hoo!!

Iddo

--
Iddo Friedberg | Tel: +972-2-6758647
Dept. of Molecular Genetics and Biotechnology | Fax: +972-2-6757308
The Hebrew University - Hadassah Medical School | email: idoerg@cc.huji.ac.il
POB 12272, Jerusalem 91120 |
Israel | http://bioinfo.md.huji.ac.il/marg/people-home/iddo/

From dalke at dalkescientific.com Wed Jul 11 04:59:39 2001
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Sat Mar 5 14:43:01 2005
Subject: [Biopython-dev] Re: bug 37 (fwd)
Message-ID: <006f01c109e7$d1108260$c1c63ec1@josiah.ebi.ac.uk>

That can't be right, since there's no 'j':

Iddo:
> >>> from Bio import Seq
> >>> l=Seq.MutableSeq('ACDEFGHIKL')
> >>> print j
> MutableSeq('ACDEFG', Alphabet())
> >>> j
>Segmentation fault (core dumped)

[dalke@pw600a biopython]$ python
Python 2.0 (#4, Dec 8 2000, 21:23:00)
[GCC egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)] on linux2
Type "copyright", "credits" or "license" for more information.
>>> from Bio import Seq
>>> l = Seq.MutableSeq('ACDEFGHIKL')
>>> print l
MutableSeq('ACDEFGHIKL', Alphabet())
>>> l
MutableSeq(array('c', 'ACDEFGHIKL'), Alphabet())
>>>

If I were to make a guess, what happens when you do

>>> import array
>>> a = array.array('c', 'ACDEFGHIKL')
>>> a
array('c', 'ACDEFGHIKL')
>>>

?

Andrew

From idoerg at cc.huji.ac.il Wed Jul 11 05:16:57 2001
From: idoerg at cc.huji.ac.il (Iddo Friedberg)
Date: Sat Mar 5 14:43:01 2005
Subject: [Biopython-dev] Re: bug 37 (fwd)
In-Reply-To: <006f01c109e7$d1108260$c1c63ec1@josiah.ebi.ac.uk>
Message-ID:

On Wed, 11 Jul 2001, Andrew Dalke wrote:

: That can't be right, since there's no 'j':

Of course it's wrong! Silly me...
Here's the real thing:

>>> from Bio import Seq
>>> l=Seq.MutableSeq('ACDEFGHIKL')
>>> print l
MutableSeq('ACDEFG', Alphabet())
>>> l
(segfault)

Here's the summary (for now):

Problem appears on python2.0, 2.0.1
Linux RH 6.2, RH 7.0

Does not appear on python2.1, which I have just installed.

[Andrew]
: If I were to make a guess, what happens when you do
:
: >>> import array
: >>> a = array.array('c', 'ACDEFGHIKL')
: >>> a
: array('c', 'ACDEFGHIKL')
: >>>

This bit of code works fine on python2.0

Can anyone reproduce the segfault bug on python2.0?

Iddo

--
Iddo Friedberg | Tel: +972-2-6758647
Dept. of Molecular Genetics and Biotechnology | Fax: +972-2-6757308
The Hebrew University - Hadassah Medical School | email: idoerg@cc.huji.ac.il
POB 12272, Jerusalem 91120 |
Israel | http://bioinfo.md.huji.ac.il/marg/people-home/iddo/

From dalke at dalkescientific.com Wed Jul 11 05:31:03 2001
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Sat Mar 5 14:43:01 2005
Subject: [Biopython-dev] Re: bug 37 (fwd)
Message-ID: <008401c109ec$34367300$c1c63ec1@josiah.ebi.ac.uk>

Iddo:
>Can anyone reproduce the segfault bug on python2.0?

The biopython.org machine is an Alpha machine running Linux
and Python 2.0 - no segfault:

[dalke@pw600a ~]$ uname -a
Linux pw600a.bioperl.org 2.2.14-6.0 #1 Tue Mar 28 16:56:56 EST 2000 alpha unknown
[dalke@pw600a ~]$ python
Python 2.0 (#4, Dec 8 2000, 21:23:00)
[GCC egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)] on linux2
Type "copyright", "credits" or "license" for more information.
>>> from Bio import Seq
>>> l = Seq.MutableSeq('ACDEFGHIKL')
>>> print l
MutableSeq('ACDEFGHIKL', Alphabet())
>>> l
MutableSeq(array('c', 'ACDEFGHIKL'), Alphabet())
>>>

Andrew

From chapmanb at arches.uga.edu Wed Jul 11 05:44:58 2001
From: chapmanb at arches.uga.edu (Brad Chapman)
Date: Sat Mar 5 14:43:01 2005
Subject: [Biopython-dev] Re: bug 37 (fwd)
In-Reply-To: <008401c109ec$34367300$c1c63ec1@josiah.ebi.ac.uk>
References: <008401c109ec$34367300$c1c63ec1@josiah.ebi.ac.uk>
Message-ID: <15180.8218.54340.15962@taxus.athen1.ga.home.com>

> Iddo:
> >Can anyone reproduce the segfault bug on python2.0?

Andrew:
> The biopython.org machine is an Alpha machine running Linux
> and Python 2.0 - no segfault:

I do get the segfault with Python 2.0 on FreeBSD:

[chapmanb]$ uname -a
FreeBSD insomniac.athen1.ga.home.com 4.3-STABLE FreeBSD 4.3-STABLE #5: Fri Jun 8 01:44:22 EDT 2001 chapmanb@insomniac.athen1.ga.home.com:/usr/src/sys/compile/INSOMNIAC i386
[chapmanb]$ python
Python 2.0 (#2, Oct 31 2000, 15:45:46)
[GCC 2.95.2 19991024 (release)] on freebsd4
Type "copyright", "credits" or "license" for more information.
>>> from Bio import Seq
>>> l = Seq.MutableSeq('ACDEFG')
>>> print l
MutableSeq('ACDEFG', Alphabet())
>>> l
Segmentation fault (core dumped)

but I don't see it with NetBSD/Python2.1:

[chapmanb]$ uname -a
NetBSD taxus.athen1.ga.home.com 1.5.1 NetBSD 1.5.1 (TAXUS) #1: Tue Jun 12 09:13:48 EDT 2001 chapmanb@taxus:/usr/src/sys/arch/macppc/compile/TAXUS macppc
[chapmanb]$ python
Python 2.1 (#6, Jul 8 2001, 17:18:01)
[GCC egcs-2.91.66 19990314 (egcs-1.1.2 release)] on netbsd1
Type "copyright", "credits" or "license" for more information.
>>> from Bio import Seq
>>> l = Seq.MutableSeq('ACDEFGHIKL')
>>> print l
MutableSeq('ACDEFGHIKL', Alphabet())
>>> l
MutableSeq(array('c', 'ACDEFGHIKL'), Alphabet())

I'd suspect this is a tricky bug in 2.0 that got fixed somewhere along
the line in 2.1.

only-use-old-versions-of-software-if-you-like-to-run-into-fixed-bugs-ly
yr's,

Brad

From idoerg at cc.huji.ac.il Wed Jul 11 06:03:01 2001
From: idoerg at cc.huji.ac.il (Iddo Friedberg)
Date: Sat Mar 5 14:43:01 2005
Subject: [Biopython-dev] Re: bug 37 (fwd)
In-Reply-To: <15180.8218.54340.15962@taxus.athen1.ga.home.com>
Message-ID:

Hi,

You're up bright and early, Brad!

OK, I'm out of ideas. This should be documented somewhere (where do we
document known bugs?), but I'm out of ideas as to a fix. Besides a
python2.1 update.

Iddo

On Wed, 11 Jul 2001, Brad Chapman wrote:

: > Iddo:
: > >Can anyone reproduce the segfault bug on python2.0?
:
: Andrew:
: > The biopython.org machine is an Alpha machine running Linux
: > and Python 2.0 - no segfault:
:
: I do get the segfault with Python 2.0 on FreeBSD:
:
: [chapmanb]$ uname -a
: FreeBSD insomniac.athen1.ga.home.com 4.3-STABLE FreeBSD 4.3-STABLE #5: Fri Jun 8 01:44:22 EDT 2001 chapmanb@insomniac.athen1.ga.home.com:/usr/src/sys/compile/INSOMNIAC i386
: [chapmanb]$ python
: Python 2.0 (#2, Oct 31 2000, 15:45:46)
: [GCC 2.95.2 19991024 (release)] on freebsd4
: Type "copyright", "credits" or "license" for more information.
: >>> from Bio import Seq
: >>> l = Seq.MutableSeq('ACDEFG')
: >>> print l
: MutableSeq('ACDEFG', Alphabet())
: >>> l
: Segmentation fault (core dumped)
:
: but I don't see it with NetBSD/Python2.1:
:
: [chapmanb]$ uname -a
: NetBSD taxus.athen1.ga.home.com 1.5.1 NetBSD 1.5.1 (TAXUS) #1: Tue Jun 12 09:13:48 EDT 2001 chapmanb@taxus:/usr/src/sys/arch/macppc/compile/TAXUS macppc
: [chapmanb]$ python
: Python 2.1 (#6, Jul 8 2001, 17:18:01)
: [GCC egcs-2.91.66 19990314 (egcs-1.1.2 release)] on netbsd1
: Type "copyright", "credits" or "license" for more information.
: >>> from Bio import Seq : >>> l = Seq.MutableSeq('ACDEFGHIKL') : >>> print l : MutableSeq('ACDEFGHIKL', Alphabet()) : >>> l : MutableSeq(array('c', 'ACDEFGHIKL'), Alphabet()) : : I'd suspect this is a tricky bug in 2.0 that got fixed somewhere along : the line in 2.1. : : only-use-old-versions-of-software-if-you-like-to-run-into-fixed-bugs-ly : yr's, : : Brad : : _______________________________________________ : Biopython-dev mailing list : Biopython-dev@biopython.org : http://biopython.org/mailman/listinfo/biopython-dev : -- Iddo Friedberg | Tel: +972-2-6758647 Dept. of Molecular Genetics and Biotechnology | Fax: +972-2-6757308 The Hebrew University - Hadassah Medical School | email: idoerg@cc.huji.ac.il POB 12272, Jerusalem 91120 | Israel | http://bioinfo.md.huji.ac.il/marg/people-home/iddo/ From chapmanb at arches.uga.edu Wed Jul 11 06:15:42 2001 From: chapmanb at arches.uga.edu (Brad Chapman) Date: Sat Mar 5 14:43:01 2005 Subject: [Biopython-dev] Re: bug 37 (fwd) In-Reply-To: References: <15180.8218.54340.15962@taxus.athen1.ga.home.com> Message-ID: <15180.10062.710172.68319@taxus.athen1.ga.home.com> Hi Iddo; > You're up bright and early, Brad! :-). Up late actually; ah, the fun of trying to finish a project update that you start at the very last moment! > OK, I'm out of ideas. This should be documented somewhere (where do we > document known bugs?), but I'm out of ideas as to a fix. Besides a > python2.1 update. I can add something to the Tutorial about it (I can add a "known bugs" section). I don't think this problem should affect anyones scripts or anything, but it is good to bring it up so that we know it's happening. Brad From dalke at dalkescientific.com Wed Jul 11 10:00:57 2001 From: dalke at dalkescientific.com (Andrew Dalke) Date: Sat Mar 5 14:43:01 2005 Subject: [Biopython-dev] Martel now supports attributes Message-ID: <001901c10a11$e8877dc0$c1c63ec1@josiah.ebi.ac.uk> I finally got a chance to do something I proposed a long time ago. 
Martel now supports attributes for the XML events.

[dalke@pw600a biopython]$ python
Python 2.0 (#4, Dec 8 2000, 21:23:00)
[GCC egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)] on linux2
Type "copyright", "credits" or "license" for more information.
>>> from Martel import *
>>> format = Group("seq", Re("[ATCG]+"), {"type": "dna"}) + AnyEol()
>>> from xml.sax import saxutils
>>> gen = saxutils.XMLGenerator()
>>> parser = format.make_parser()
>>> parser.setContentHandler(gen)
>>> parser.parseString("GATTACA\n")
GATTACA
>>>

So the new part here is the optional 3rd arg to Martel.Group, which
is the dictionary to use for the attributes. The result is shown in
the tag, which now includes the attribute 'type="dna"'.

The regular expression pattern language was modified to allow
persisting the attributes to/from the ?P<> group name.

>>> str(format)
'(?P[ACGT]+)(\\n|\\r\\n?)'
>>>

This is actually encoded like the query component of a URL, so the
following is allowed

(?P...)

and corresponds to a startElement of:

The reason for this change is to make it easier to support different
formats and versions. Currently I've been using tags like ... Now I
can do:

ID 100K_RAT
AC Q12345;
...
SQ EKLADWERDN ADEDLE

and if tag names are chosen consistently across the databases then
something like a FASTA conversion can be made very generic - just get
the 'id type="id"' and fields of each .

I've also added the old Martel-specific regression tests back to the
main biopython CVS tree. This doesn't have the format-specific tests
(like for PIR, BLAST, etc.), excepting SWISS-PROT.

I changed the 'sre_parse.py' and 'sre_constants.py' files to be
'msre_parse.py' and 'msre_constants.py' because I was all too often
running into conflicts between those files and the ones in the now
standard Python distribution.
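
The "encoded like the query component of a URL" idea can be illustrated with a minimal sketch. This is not Martel's code, and it uses modern Python 3 rather than the Python 2.0 of the session above. The helper names encode_group_name and decode_group_name are hypothetical; the only assumption is the standard library's urllib.parse.

```python
from urllib.parse import urlencode, parse_qsl


def encode_group_name(name, attrs=None):
    """Pack a tag name and its attributes into one string, query-string style.

    Hypothetical helper: it mirrors the idea in the message, not Martel's API.
    """
    if not attrs:
        return name
    # Sort the items so the encoded name is stable across runs.
    return name + "?" + urlencode(sorted(attrs.items()))


def decode_group_name(encoded):
    """Split an encoded group name back into (name, attrs)."""
    name, _, query = encoded.partition("?")
    return name, dict(parse_qsl(query, keep_blank_values=True))


encoded = encode_group_name("seq", {"type": "dna"})
print(encoded)                     # seq?type=dna
print(decode_group_name(encoded))  # ('seq', {'type': 'dna'})
```

Because the attribute values are percent-encoded, characters such as spaces or '&' survive the round trip, which is presumably part of the appeal of borrowing the URL convention.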
Andrew dalke@acm.org From dalke at dalkescientific.com Thu Jul 12 20:13:48 2001 From: dalke at dalkescientific.com (Andrew Dalke) Date: Sat Mar 5 14:43:01 2005 Subject: [Biopython-dev] bioperl-db Message-ID: <003401c10b30$b0831180$c1c63ec1@josiah.ebi.ac.uk> As I think I've mentioned before, I'm visiting EBI before going to BOSC next week. Part of my hope while here is to get better integration between biopython and bioperl. This afternoon (and evening) I worked on being able to use a MySQL database loaded from the bioperl-db schema. Here's some of what it looks like, with comments starting with ">>> #" josiah> python Python 2.1a2 (#1, Feb 11 2001, 00:48:26) [GCC 2.95.3 19991030 (prerelease)] on linux2 Type "copyright", "credits" or "license" for more information. >>> import BioSeqDatabase >>> server = BioSeqDatabase.open_database(user = "root", db="minidb") >>> server.keys() ['genbank'] >>> db = server["genbank"] >>> # can also do 'display_id' and 'primary_id' >>> record = db.lookup(accession = "L26462") >>> # size of the sequence >>> len(record) 3002 >>> # This is akin to the bioperl way to access the sequence record >>> seq = record.primary_seq.seq >>> # Except that biopython's Seq is a first-class object >>> # In bioperl this is a primary_seq.subseq(0, 1) >>> seq[0] 'A' >>> # The first fetches the full string from the database, >>> # then shows the 1st 5 chars; the second only fetches 5 characters >>> # from the database >>> seq.tostring()[:5], seq[:5].tostring() ('ACCTC', 'ACCTC') >>> # More showing off :) >>> seq[-1] 'C' >>> seq.tostring()[-5:], seq[-5:].tostring() ('CTATC', 'CTATC') >>> # Yep, subslices can be subsliced >>> seq.tostring()[-5:], seq[-10:][5:].tostring() ('CTATC', 'CTATC') >>> from Bio import utils >>> # This shows that the Alphabet is correct >>> utils.translate_to_stop(record.primary_seq.seq) Seq('TSYLTPLITPLIVTLWVVSDFLFICIFDCIKRSLVFYLLFPKT', IUPACProtein()) >>> # Added a new 'Species' object for this, based on bioperl's >>> record.species 
>>> str(record.species) 'Eukaryota Metazoa Chordata Craniata Vertebrata Mammalia Eutheria Primates Catarrhini Hominidae Homo sapiens' >>> record.species.binomial() 'Homo sapiens' >>> # Loaded sequence features into the biopython Bio.SeqFeature.SeqFeature >>> len(record.seq_features) 26 >>> for feature in record.seq_features[:4]: ... print str(feature) ... type: source location: (0..3002) ref: None:None strand: 1 qualifiers: Key: db_xref, Value: taxon:9606 Key: haplotype, Value: C4 Key: note, Value: sequence found in a Melanesian population Key: organism, Value: Homo sapiens type: variation location: (110..111) ref: None:None strand: 1 qualifiers: Key: replace, Value: t type: variation location: (262..263) ref: None:None strand: 1 qualifiers: Key: note, Value: Rsa I polymorphism Key: replace, Value: t type: variation location: (272..273) ref: None:None strand: 1 qualifiers: Key: replace, Value: c >>> # Much more is supported - see the source! >>> If you are interested, the files are at http://www.biopython.org/~dalke/BioSeqDatabase.py http://www.biopython.org/~dalke/BioSeq.py It may be somewhat harder reading than usual because - this is prototype code - I've not used the bioperl object layer before - I've not used the bioperl-db schema before - I've not used MySQL before - there are almost no comments in the 550+ LOC I found out some things in the process: 1) The biopython SeqFeature currently must be used like: feature = SeqFeature() feature.type = "variation" ... I would much rather prefer allowing the values to be set through the constructor, as in feature = SeqFeature(type = "variation", ...) This is part of my design philosophy, which is to minimize modification of a data structure after it has been created. 2) is strand part of the feature or the location? In bioperl-db and in bioperl, the strand information is part of the Range. In SeqFeature, the strand is part of the SeqFeature and not of the FeatureLocation (which is our equivalent to the Range). 
This is a problem when a SeqFeature has several subfeatures. In the current scheme, the parent SeqFeature keeps track of a global start/end location for all its children. It also needs a strand. What strand is used if the subchildren are on different strands? (Does that possibility make sense?) In any case, I'm not sure enough about how to use a SeqFeature that I can't figure out if it's really wrong or not. 3) Related to that, what's the type used when there are subfeatures? The SeqFeature documentation says in the case of a join( ... ) the subfeatures have a typename of parent.type + "_span" > CDS join(1..10,30..40,50..60) > > The the top level feature would be a CDS from 1 to 60, and the sub > features would be of 'CDS_span' type and would be from 1 to 10, 30 to > 40 and 50 to 60, respectively but in Genbank._FeatureConsumer it says: > # add _join or _order to the name to make the type clear I don't like either one. Why does the SeqFeature need a type at all? 4) feature qualifiers shouldn't really be a dictionary The SeqFeature keeps track of the qualifiers through a dictionary. That's a problem because if there are two qualifiers with the same name then there's a collision, and one gets over written. (For example, LOCUS HUM2C18X01 1668 bp DNA PRI 24-AUG-1993 DEFINITION Homo sapiens cytochrome P4502C18 (CYP2C18) gene, 5' flank and exon 1. ACCESSION L16869 VERSION L16869.1 GI:291599 has two /citations exon <1270..1437 /gene="CYP2C18" /note="transcription start site not determined" /citation=[1] /citation=[3] /number=1 /evidence=experimental ) To get around this problem, the GenBank parser has: > If there are multiple qualifier keys with the same name we > would lose some info in the dictionary, so we append a unique > number to the end of the name in case of conflicts. 
> """ > # if we've got a key from before, add it to the dictionary of > # qualifiers > if self._cur_qualifier_key: > # get a unique name > unique_name = self._cur_qualifier_key > counter = 1 > while self._cur_feature.qualifiers.has_key(unique_name): > unique_name = self._cur_qualifier_key + str(counter) > counter = counter + 1 > > self._cur_feature.qualifiers[unique_name] = \ > self._cur_qualifier_value which means the qualifier key has a number on the end of it. That's messing with user visible data in a way that I definitely don't regard as kosher. (In my new bioperl-db integration code I still keep workaround but minimize it somewhat by not appending a 0 the first time around. That isn't the right solution either.) One way I've solved this problem before is to make a hybrid dict/list class, where you can refer to obj["key"] to get the value associated with the named key - a string - or obj[4] to get the value in the 5th position. This only works if the key can never be a string. I'm not suggesting it as a solution, only pointing out that other alternatives exist. - What's the numbering system of the FeatureLocation? When the GenBank parser uses it there is no conversion from GenBank's numbering system (where [i,j] means i<=position<=j and position == 1 is the first term) to Python's (where [i,j] means i<=position; from dalke@dalkescientific.com on Fri, Jul 13, 2001 at 01:13:48AM +0100 References: <003401c10b30$b0831180$c1c63ec1@josiah.ebi.ac.uk> Message-ID: <20010713102720.J66477@fling.sanbi.ac.za> Andrew Dalke on 2001-07-13 (Fri) at 01:13:48 +0100: > > - I've not used MySQL before Accessing MySQL via Andy Dustman's MySQLdb module is a strange twilight experience that is and yet isn't quite unlike most other MySQL APIs. On the one hand, MySQLdb implements many of the functions of the standard MySQL C API as methods. 
On the other, the emulated cursor class (which satisfies the Python DB
API spec) makes using it very different from any other MySQL API I've
used, and more in line with what you'd expect of an "enterprise"
RDBMS.

Somewhat strange... but overall MySQLdb is extremely Pythonic and
quite clean.

This is all just an aside and FYI and all that...

-- V

From dalke at dalkescientific.com Sat Jul 14 20:06:53 2001
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Sat Mar 5 14:43:01 2005
Subject: [Biopython-dev] selenocysteines
Message-ID: <003d01c10cc2$0e9f6380$c1c63ec1@josiah.ebi.ac.uk>

Hey all,

Just noticed ftp://ftp.ncbi.nlm.nih.gov/genbank/gbrel.txt says:

> Selenocysteine residues within the protein translations of coding
> region features have been represented in GenBank via the letter 'X'
> and a /transl_except qualifier. At the May 1999 DDBJ/EMBL/GenBank
> collaborative meeting, it was learned that IUPAC plans to adopt the
> letter 'U' for selenocysteine.

Does anyone know whether that has occurred?

Also, I noticed the GenBank parser is using the generic DNA, RNA and
protein alphabets when it looks like it should use the IUPAC versions.
Even if there are a few places where it fails, it should be more
useful than what there is now. I'll go ahead and change it but if
there are complaints (Brad? You did that code) I'll change it back.

Andrew

From idoerg at cc.huji.ac.il Sun Jul 15 03:43:28 2001
From: idoerg at cc.huji.ac.il (Iddo Friedberg)
Date: Sat Mar 5 14:43:01 2005
Subject: [Biopython-dev] bug in SeqIO.FASTA?
Message-ID:

Hi,

Line 80 in SeqIO.FASTA:

Method write_records is defined as:

def write_records(records):
. . .

Needs to accept a list of SeqRecords, as far as I can tell. Seems to
abort though:

fasta_1.write_records(rec_list_1)
TypeError: write_records() takes exactly 1 argument (2 given)

Shouldn't the definition line be:

def write_records(self,records):

?????

Iddo

--
Iddo Friedberg | Tel: +972-2-6758647
Dept.
of Molecular Genetics and Biotechnology | Fax: +972-2-6757308
The Hebrew University - Hadassah Medical School | email: idoerg@cc.huji.ac.il
POB 12272, Jerusalem 91120 |
Israel | http://bioinfo.md.huji.ac.il/marg/people-home/iddo/

From dalke at dalkescientific.com Sun Jul 15 07:26:49 2001
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Sat Mar 5 14:43:01 2005
Subject: [Biopython-dev] bug in SeqIO.FASTA?
Message-ID: <004601c10d21$09f641a0$c1c63ec1@josiah.ebi.ac.uk>

Iddo:
>Shouldn't the definition line be:
>def write_records(self,records):

Yep. I updated CVS.

Andrew

From idoerg at cc.huji.ac.il Sun Jul 15 10:04:34 2001
From: idoerg at cc.huji.ac.il (Iddo Friedberg)
Date: Sat Mar 5 14:43:01 2005
Subject: [Biopython-dev] selenocysteines
In-Reply-To: <003d01c10cc2$0e9f6380$c1c63ec1@josiah.ebi.ac.uk>
Message-ID:

On Sun, 15 Jul 2001, Andrew Dalke wrote:

: Hey all,
:
: Just noticed ftp://ftp.ncbi.nlm.nih.gov/genbank/gbrel.txt says:
:
:
: >Selenocysteine residues within the protein translations of coding
: > region features have been represented in GenBank via the letter 'X'
: > and a /transl_except qualifier. At the May 1999 DDBJ/EMBL/GenBank
: > collaborative meeting, it was learned that IUPAC plans to adopt the
: > letter 'U' for selenocysteine.
:
: Any knowledge on if that has occured.
:

Well, I looked at Swiss-Prot's FDHF_ECOLI, which contains
selenocysteine in position 140. (Just to make things even happier:
SwissProt marks this with a "C"). Moseying on to M13563, one of the 3
GenBank entries it points to, I indeed found an 'X' in the position of
the selenocysteine. Same thing for the E.coli chromosome file (U0006)
and (what appears to be) a contig.

Guess the switch hasn't happened yet.

Iddo

--
Iddo Friedberg | Tel: +972-2-6758647
Dept.
of Molecular Genetics and Biotechnology | Fax: +972-2-6757308
The Hebrew University - Hadassah Medical School | email: idoerg@cc.huji.ac.il
POB 12272, Jerusalem 91120 |
Israel | http://bioinfo.md.huji.ac.il/marg/people-home/iddo/

From dalke at dalkescientific.com Tue Jul 17 05:27:06 2001
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Sat Mar 5 14:43:01 2005
Subject: [Biopython-dev] selenocysteines
Message-ID: <00c801c10ea2$a57ce0c0$c1c63ec1@josiah.ebi.ac.uk>

Looks like U for selenocysteines is the right thing to do, so I added
that to the extended protein alphabet. That also allowed me to add 'X'
as the symbol for unknown protein.

Anything which uses X for selenocysteines should translate it to U on
import.

SWISS-PROT is using 'C' + /translation comment for selenocysteines.
Since 'C' != 'X' it doesn't fall under that previous paragraph. :)

Andrew

From biopython-bugs at bioperl.org Tue Jul 17 23:30:33 2001
From: biopython-bugs at bioperl.org (biopython-bugs@bioperl.org)
Date: Sat Mar 5 14:43:01 2005
Subject: [Biopython-dev] Notification: incoming/38
Message-ID: <200107180330.f6I3UXw20222@pw600a.bioperl.org>

JitterBug notification

new message incoming/38

Message summary for PR#38
From: katel@worldpath.net
Subject: skipping Martel fields
Date: Tue, 17 Jul 2001 23:30:31 -0400
0 replies 0 followups

====> ORIGINAL MESSAGE FOLLOWS <====

>From katel@worldpath.net Tue Jul 17 23:30:31 2001
Received: from localhost (localhost [127.0.0.1])
by pw600a.bioperl.org (8.11.2/8.11.2) with ESMTP id f6I3UVw20215
for ; Tue, 17 Jul 2001 23:30:31 -0400
Date: Tue, 17 Jul 2001 23:30:31 -0400
Message-Id: <200107180330.f6I3UVw20215@pw600a.bioperl.org>
From: katel@worldpath.net
To: biopython-bugs@bioperl.org
Subject: skipping Martel fields

Full_Name: Katharine Lindner
Module:
Version: 1.00a
OS: Windows 98
Submission from: (NULL) (207.3.148.253)

Set up a test case with the format at the end.
The parser skips the line after a short codon.

Examples:

SEQTPA 92 83 act THR T
SEQTPA 93 84 cc
SEQTPA 94 85 gag GLU E
SEQTPA 95 86 ggc GLY G

Skips residue 94

Format:

alpha = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'
amino_1_letter_codes = 'ACDEFGHIKLMNPQRSTVWY-'
nucleotides = 'gcat-'
amino_alts = map( Martel.Str, amino_3_letter_codes )
codon = Martel.Group( "codon", Martel.MaxRepeat( Martel.Any( nucleotides ), 1, 3 ) )
amino_3_letter_code = Martel.Group( "amino_3_letter_code", \
    reduce( Martel.Alt, amino_alts ) )
amino_1_letter_code = Martel.Group( "amino_1_letter_code", \
    Martel.Any( amino_1_letter_codes ) )
amino_acid = Martel.Group( "amino_acid", blank_space + amino_3_letter_code +
    blank_space + amino_1_letter_code )
residue = Martel.Group( "residue", blank_space + codon + Martel.Opt( amino_acid ) )

kabatid = Martel.Group("kabatid", Martel.Rep1(Martel.Integer()))
pubmed_num = Martel.Group("pubmed_num", Martel.Rep1(Martel.Integer()))
residue_num = Martel.Group("residue_num", Martel.Rep1(Martel.Integer()))
kabat_num = Martel.Group("kabat_num", Martel.Rep1(Martel.Integer()) +
    Martel.Opt( Martel.Any( alpha ) ) )

id_line = Martel.Group("id_line", Martel.Str("KADBID") + blank_space +
    kabatid + Martel.ToEol() )
residue_line = Martel.Group( "residue_line", Martel.Str( "SEQTPA" ) + blank_space +
    residue_num + blank_space + kabat_num + Martel.Opt( residue ) + floop )
kabat_record_end_line = Martel.Group( "record_end", Martel.Str( "RECEND" ) +
    Martel.ToEol() + Martel.ToEol() + Martel.ToEol())
residue_lines = Martel.Group( "residue_lines", Martel.Rep( residue_line ) )
kabat_record = id_line + residue_lines + kabat_record_end_line

From katel at worldpath.net Sat Jul 21 18:35:11 2001
From: katel at worldpath.net (Cayte)
Date: Sat Mar 5 14:43:01 2005
Subject: [Biopython-dev] WIT database
Message-ID: <001701c11235$68b5dc00$010a0a0a@cadence.com>

After some cleanup of old parsers, I plan to write a parser for the WIT database.
It should be interesting because it is a database of metabolic steps instead of sequences. Cayte From tarjei at genome.wi.mit.edu Sat Jul 21 16:14:36 2001 From: tarjei at genome.wi.mit.edu (Tarjei S Mikkelsen) Date: Sat Mar 5 14:43:01 2005 Subject: [Biopython-dev] WIT database In-Reply-To: <001701c11235$68b5dc00$010a0a0a@cadence.com> Message-ID: Hi, I'm new to this project, but I'm currently in the process of writing a BioPython parser for the KEGG database, which is very similar to WIT. I'll soon be posting a module for parsing the KEGG/Ligand database here, which I hope you'll integrate into BioPython. I've also been toying with the idea of writing a parser for the pathway section of KEGG, but I think that to maximize the usefulness of such a module there should first be a set of objects for representing reactions and pathways (a Bio.Pathway module). This would make it possible to manipulate information from databases like KEGG and WIT in a uniform manner - just like all sequence information is parsed into a Bio.Seq object. If there is interest I'd be happy to collaborate, or even take the lead, in developing a Bio.Pathway module. As Cayte says below, this is an interesting challenge because, as far as I know, this is not a functionality that is currently present in any of the other bio* projects. Thanks, Tarjei Mikkelsen tarjei@genome.wi.mit.edu > -----Original Message----- > From: biopython-dev-admin@biopython.org > [mailto:biopython-dev-admin@biopython.org]On Behalf Of Cayte > Sent: Saturday, July 21, 2001 6:35 PM > To: biopython-dev@biopython.org > Subject: [Biopython-dev] WIT database > > > After some cleanup of old parsers, I plan to write a parser for the WIT > database. It should be interesting because it is a database of metabolic > steps instead of sequences. 
> > Cayte > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev@biopython.org > http://biopython.org/mailman/listinfo/biopython-dev > From katel at worldpath.net Sat Jul 21 20:23:41 2001 From: katel at worldpath.net (Cayte) Date: Sat Mar 5 14:43:01 2005 Subject: [Biopython-dev] WIT database References: Message-ID: <002501c11244$90bd84a0$010a0a0a@cadence.com> > > I've also been toying with the idea of writing a parser for the > pathway section of KEGG, but I think that to maximize the usefulness > of such a module there should first be a set of objects for representing > reactions and pathways (a Bio.Pathway module). This would make it > possible to manipulate information from databases like KEGG and WIT > in a uniform manner - just like all sequence information is parsed into > a Bio.Seq object. > I think its a great idea! My first step is usually just to poke around the database and get ideas. Hopefully,other biopythons will be interested and after some brainstorming the ideas will converge to some reasonable Bio.Pathways objects. Cayte From tarjei at genome.wi.mit.edu Mon Jul 23 02:36:42 2001 From: tarjei at genome.wi.mit.edu (Tarjei S Mikkelsen) Date: Sat Mar 5 14:43:01 2005 Subject: [Biopython-dev] Bio.KEGG Message-ID: Hi, as I mentioned earlier this weekend I have begun writing a BioPython compatible parser for the KEGG database that I would be happy to contribute to the project. Currently only the Ligand/Enzyme section is supported, but the Ligand/Compound and the Genes section should follow soon. Since I don't have CVS access I've attached the code here. It is organized to plug directly into the current BioPython distribution. The source is heavily inspired by the Bio.GenBank module (which makes sense since KEGG has (unfortunately) modeled its flatfile format on the GenBank format). thanks, Tarjei Mikkelsen tarjei@genome.wi.mit.edu -------------- next part -------------- A non-text attachment was scrubbed... 
Name: bio.kegg.v0.1a.tar.gz
Type: application/x-gzip
Size: 14261 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/biopython-dev/attachments/20010723/b784991b/bio.kegg.v0.1a.tar.bin

From marchign at di.unipi.it Mon Jul 23 09:08:18 2001
From: marchign at di.unipi.it (Davide Marchignoli)
Date: Sat Mar 5 14:43:01 2005
Subject: [Biopython-dev] "Features" of Bio.Clustalw
Message-ID:

I wanted to point out some (minor) problems I had using Bio.Clustalw.

In an alignment run clustalw generates two files:
- the output file (with default extension aln), and
- the tree file (with default extension dnd)
The MultipleAlignCL class lets you change the output file with the
set_output method, but there is no way to change the tree file.

Is it possible to add a method set_tree that adds to the command line
the -newtree option and the -align option (at the moment I am
subclassing MultipleAlignCL)?

The do_alignment function executes
print "executing %s..." % command_line
and I think not everybody wants it printed (I do not). Yes, one could
temporarily redirect sys.stdout, but why?

Even if one sets the alignment type to PROTEIN (via the set_type
method of MultipleAlignCL) the resulting alignment has a DNA alphabet.

Thanks for your attention,

Davide Marchignoli

PS: Is it possible to submit patches to biopython? How can I contribute?

From thomas at genome.cbs.dtu.dk Tue Jul 24 09:09:04 2001
From: thomas at genome.cbs.dtu.dk (Thomas Sicheritz-Ponten)
Date: Sat Mar 5 14:43:01 2005
Subject: [Biopython-dev] New: sequtils.py
Message-ID:

Hej All,

After the Biopython BoF meeting at ISMB01 in Copenhagen we decided to
temporarily collect sequence utilities/functions in Bio/sequtils.py.

Cessie (our new biopython member) and I started by collecting some
functions (some of them are just aliases to existing - but deeply
hidden - functions).
Currently included:

ProteinX, makeTableX for error free translation of ambiguous DNA
complement, reverse, antiparallel and translate
nice six_frame_translations ala DNA Strider/XBBtools
GC, GC123, GC_skew, Accumulated_GC_skew
fasta_uniqids for getting unique identifiers in the FASTA file
(useful for using clustalw)
quick_FASTA_reader for reading huge FASTA files (e.g. genomes)
apply_on_multi_fasta: use any function (e.g. GC) and apply it on all
entries in a multiple FASTA file

Questions:

1) should we move ProteinX and makeTableX somewhere else ?

2) we included a quick_FASTA_reader hack, the FASTA parser is cool and nice
but because of all the checking it takes ages for e.g. a complete genome
Should we create a faster alternative ? (compatible with the normal one)

3) some functions exist in utils.py. Could we move sequence based functions
to sequtils.py and use utils.py for other non-sequence based functions ?
(e.g. I'd like to put my hyper-geometric distribution code there for
expression data)

4) anyone got a hangover from yesterday's banquet ?

cheers
-thomas

Sicheritz-Ponten Thomas, Ph.D     CBS, Department of Biotechnology
thomas@biopython.org              The Technical University of Denmark
CBS: +45 45 252489                Building 208, DK-2800 Lyngby
Fax +45 45 931585                 http://www.cbs.dtu.dk/thomas

De Chelonian Mobile ... The Turtle Moves ...

From jchang at SMI.Stanford.EDU Wed Jul 25 10:17:53 2001
From: jchang at SMI.Stanford.EDU (Jeffrey Chang)
Date: Sat Mar 5 14:43:01 2005
Subject: [Biopython-dev] Bio.KEGG
In-Reply-To:
Message-ID:

Hi Tarjei,

Thanks very much. I'm in Copenhagen right now, so I'll take a look at
it next week after I get back.

Jeff

On Mon, 23 Jul 2001, Tarjei S Mikkelsen wrote:

>
> Hi,
>
> as I mentioned earlier this weekend I have begun writing a BioPython
> compatible parser for the KEGG database that I would be happy to contribute
> to the project. Currently only the Ligand/Enzyme section is supported,
> but the Ligand/Compound and the Genes section should follow soon.
> > Since I don't have CVS access I've attached the code here. It is organized > to plug directly into the current BioPython distribution. The source is > heavily inspired by the Bio.GenBank module (which makes sense since KEGG > has (unfortunately) modeled its flatfile format on the GenBank format). > > thanks, > > Tarjei Mikkelsen > tarjei@genome.wi.mit.edu > > From jchang at SMI.Stanford.EDU Sat Jul 28 12:40:55 2001 From: jchang at SMI.Stanford.EDU (Jeffrey Chang) Date: Sat Mar 5 14:43:01 2005 Subject: [Biopython-dev] "Features" of Bio.Clustalw In-Reply-To: References: Message-ID: Hi Davide, Thanks for the comments. We're all getting back from ISMB and catching up on email, so the proper developer will comment soon. >PS: Is it possible to submit patches to biopython? How can I contribute? Yes! We accept patches and new modules. Pathes are preferably done against a current CVS repository. Jeff From chapmanb at arches.uga.edu Sat Jul 28 15:03:10 2001 From: chapmanb at arches.uga.edu (Brad Chapman) Date: Sat Mar 5 14:43:01 2005 Subject: [Biopython-dev] "Features" of Bio.Clustalw In-Reply-To: References: Message-ID: <15203.3182.97701.271322@taxus.athen1.ga.home.com> Hi Davide; > I wanted to point out some (minor) problem I had using Bio.Clustalw. Thanks for sending the comments! Jeff: > Thanks for the comments. We're all getting back from ISMB and > catching up on email, so the proper developer will comment soon. I think I'm the "proper developer" since I wrote the Clustalw stuff, although this is the first time I've had someone refer to me like that :-) > In an alignment run clustalw generates two files: > - the output file (with default extension aln), and > - the tree file (with default extension dnd) > the MultipleAlignCL permits to change the output file with method > set_output but there is no way to change the tree file. Yup, this is bad. Thanks for catching it! 
> Is it possible to add a method set_tree that add to the command line the > -newtree option and the -align option? Definately -- this sounds exactly right. > The do_alignment function executes > print "executing %s..." % command_line > and I think not everybody want it printed (I do not). Yes, one could > temporary redirect sys.stdout, but why? Ooop, another good catch. It is definately bad for libraries to print things. > Even if one sets the alignment type to PROTEIN (via set_type method of > MultipleAlignCL) the resulting alignment has DNA alphabet. Bleah. You caught all of my bugs :-). Bad me! > PS: Is it possible to submit patches to biopython? How can I contribute? Definately. We loooooove contributions. If you have fixes for the above problems, the best way to submit the patch is to send a context or unified diff ('diff -c' or 'diff -u') to this list, and I'll be happy to apply it and make sure things are worked out. You can check out the CVS tree anonymously (instructions for this are at http://cvs.biopython.org/ ), and submitting patches against this would be extra-great. Thanks for your feedback on this. If you can send patches for whatever you can fix, I'll take care of the rest of the bugs, and beef up the test suite to catch them. Thanks again! Brad From katel at worldpath.net Mon Jul 30 19:10:37 2001 From: katel at worldpath.net (Cayte) Date: Sat Mar 5 14:43:01 2005 Subject: [Biopython-dev] Pathway Module Message-ID: <002101c1194c$d997d5e0$010a0a0a@cadence.com> I'm posting some sketchy ideas about what the Pathway modules may contain, mostly to get the discussion going. I used arrays because most of the relations are many to many. Maybe reaction should have a pathway array? Step is separate from reaction, because a reaction could occur in more than one pathway. These classes are scaffolding, so please don't hesitate to suggest better ideas. 
There may be other information associated with a reaction, like
temperature, but I haven't come across it yet in the WIT or EMP
databases. It looks like a lot of the meat is in the enzyme
structures. In EMP, the structure of an Enzyme looks straightforward
because it is a flat two-column file with the name of a field in the
first column and its contents in the second.

Cayte

class Reaction:
    def __init__(self):
        self.substrates = []
        self.products = []
        self.enzymes = []
        self.factors = []

class PathStep:
    def __init__(self):
        self.reaction = None
        self.inlinks = []
        self.outlinks = []

class Pathway:
    def __init__(self):
        self.name = ''
        self.organisms = []
        self.steps = []

From davide at biodec.com Mon Jul 30 19:22:00 2001
From: davide at biodec.com (Davide Marchignoli)
Date: Sat Mar 5 14:43:01 2005
Subject: [Biopython-dev] "Features" of Bio.Clustalw
In-Reply-To: <15203.3182.97701.271322@taxus.athen1.ga.home.com>
Message-ID:

On Sat, 28 Jul 2001, Brad Chapman wrote:

> Hi Davide;
>
> Thanks for your feedback on this. If you can send patches for whatever
> you can fix, I'll take care of the rest of the bugs, and beef up the
> test suite to catch them.
>
> Thanks again!
> Brad
>

Hi Brad,

Here are the patches I was able to cook up. These are only minor
changes, but I hope they help.

I think that having a class like MultipleAlignCL is superior to
passing the alignment arguments to a function, as is done for blastpgp
or blastall.

First, it lets you see which command will be executed (which can be
useful for debugging) simply by applying str to the object. It also
lets you build one set of parameters and reuse it on different files.
Finally, it is a general mechanism and could be used to give a uniform
interface to functions invoking external programs.

Do you think you would be interested in a patch implementing such
behaviour? I think one could also retain compatibility with the
current interface.
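
The three advantages listed above (inspecting the command via str, reusing one parameter set on several files, and a uniform interface for external programs) can be sketched roughly as follows. This is only an illustration in modern Python 3, not the proposed patch and not Biopython's MultipleAlignCL: the CommandLine class and its methods are hypothetical, and the option spellings are illustrative rather than a checked clustalw command line.

```python
import shlex


class CommandLine:
    """Hypothetical generic builder for an external program's command line."""

    def __init__(self, program):
        self.program = program
        self.options = {}  # option name -> value, or None for a bare flag

    def set_option(self, name, value=None):
        """Record an option; returns self so calls can be chained."""
        self.options[name] = value
        return self

    def copy(self):
        """Return an independent copy, so one parameter set can be reused."""
        clone = CommandLine(self.program)
        clone.options = dict(self.options)
        return clone

    def __str__(self):
        # str(obj) gives the exact command that would be run.
        parts = [shlex.quote(self.program)]
        for name, value in self.options.items():
            if value is None:
                parts.append("-%s" % name)
            else:
                parts.append("-%s=%s" % (name, shlex.quote(str(value))))
        return " ".join(parts)


# One shared parameter set, applied to two different input files.
base = CommandLine("clustalw").set_option("align").set_option("type", "PROTEIN")
for path in ("a.fasta", "b.fasta"):
    print(base.copy().set_option("infile", path))
```

A function that actually runs the program would then only need str(command_line), which would keep it compatible with code that already passes a command-line object around.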

Thanks again,

Davide Marchignoli

From jchang at SMI.Stanford.EDU Mon Jul 30 19:34:28 2001
From: jchang at SMI.Stanford.EDU (Jeffrey Chang)
Date: Sat Mar 5 14:43:01 2005
Subject: [Biopython-dev] New: sequtils.py
In-Reply-To:
References:
Message-ID:

Hi Thomas,

>2) we included a quick_fasta_reader hack, the FASTA parser is cool and nice
>   but because of all the checking it takes ages for e.g. a complete genome
>   Should we create a faster alternative ? (compatible with the normal one)

Yes! It looks like the current Fasta readers are implemented in Python
and might even have some overhead in parsing the description lines. It
looks like we should update our FASTA handling to 1) run faster and
2) allow optional in-depth parsing of description lines. Perhaps it's
time to redo it in Martel.

>3) some functions exist in utils.py. Could we move sequence based functions
>   to sequtils.py and use utils.py for other non-sequence based functions ?
>   (e.g. I'd like to put my hyper-geometric distribution code there
>for expression data)

Sure. However, I'm not sure hyper-geometric distributions should go into
utils.py. Perhaps we should start a new package to handle statistics and
probability-type stuff.

>4) anyone got a hangover from yesterday's banquet ?

Looks like it... :)

Jeff

From davide at biodec.com Mon Jul 30 19:45:40 2001
From: davide at biodec.com (Davide Marchignoli)
Date: Sat Mar 5 14:43:01 2005
Subject: [Biopython-dev] "Features" of Bio.Clustalw (fwd)
Message-ID:

Oops, I forgot the patch.

Bye,

Davide Marchignoli
-------------- next part --------------
diff -Nacr /usr/src/biopython-300701/Bio/Blast/Record.py Bio/Blast/Record.py
*** /usr/src/biopython-300701/Bio/Blast/Record.py Mon Jul 30 23:14:13 2001
--- Bio/Blast/Record.py Tue Jul 31 00:22:22 2001
***************
*** 291,294 ****
          DatabaseReport.__init__(self)
          Parameters.__init__(self)
          self.rounds = []
!         converged = 0
--- 291,294 ----
          DatabaseReport.__init__(self)
          Parameters.__init__(self)
          self.rounds = []
!         self.converged = 0
diff -Nacr /usr/src/biopython-300701/Bio/Clustalw/__init__.py Bio/Clustalw/__init__.py
*** /usr/src/biopython-300701/Bio/Clustalw/__init__.py Mon Jul 30 23:14:13 2001
--- Bio/Clustalw/__init__.py Tue Jul 31 00:14:24 2001
***************
*** 62,79 ****
      return align_handler.align


! def do_alignment(command_line):
      """Perform an alignment with the given command line.

      Arguments:
      o command_line - A command line object that can give out
      the command line we will input into clustalw.

      Returns:
      o A clustal alignment object corresponding to the created alignment.
      If the alignment type was not a clustal object, None is returned.
      """
-     print "executing %s..." % command_line
      run_clust = os.popen(str(command_line))
      value = run_clust.close()

--- 62,81 ----
      return align_handler.align


! def do_alignment(command_line, alphabet=None):
      """Perform an alignment with the given command line.

      Arguments:
      o command_line - A command line object that can give out
      the command line we will input into clustalw.
+     o alphabet - the alphabet to use in the created alignment. If not
+     specified IUPAC.unambiguous_dna and IUPAC.protein will be used for
+     dna and protein alignment respectively.

      Returns:
      o A clustal alignment object corresponding to the created alignment.
      If the alignment type was not a clustal object, None is returned.
      """
      run_clust = os.popen(str(command_line))
      value = run_clust.close()

***************
*** 107,113 ****
          return None
      # otherwise parse it into a ClustalAlignment object
      else:
!         return parse_file(out_file)


  class ClustalAlignment(Alignment):
--- 109,118 ----
          return None
      # otherwise parse it into a ClustalAlignment object
      else:
!         if not alphabet:
!             alphabet = (IUPAC.unambiguous_dna, IUPAC.protein)[
!                 command_line.type == 'PROTEIN']
!         return parse_file(out_file, alphabet)


  class ClustalAlignment(Alignment):
***************
*** 361,366 ****
--- 366,372 ----

          # 2. a guide tree to use
          self.guide_tree = None
+         self.new_tree = None

          # 3. matrices
          self.protein_matrix = None
***************
*** 369,375 ****
          # 4. type of residues
          self.type = None

!     def __repr__(self):
          """Write out the command line as a string."""
          cline = self.command + " " + self.sequence_file

--- 375,381 ----
          # 4. type of residues
          self.type = None

!     def __str__(self):
          """Write out the command line as a string."""
          cline = self.command + " " + self.sequence_file

***************
*** 392,397 ****
--- 398,406 ----
              cline = cline + " -CASE=" + self.change_case
          if self.add_seqnos:
              cline = cline + " -SEQNOS=" + self.add_seqnos
+         if self.new_tree:
+             # clustal does not work if -align is written -ALIGN
+             cline = cline + " -NEWTREE=" + self.new_tree + " -align"

          # multiple alignment options
          if self.guide_tree:
***************
*** 478,483 ****
--- 487,497 ----
          else:
              self.guide_tree = tree_file

+     def set_new_guide_tree(self, tree_file):
+         """Set the name of the guide tree file generated in the alignment.
+         """
+         self.new_tree = tree_file
+
      def set_protein_matrix(self, protein_matrix):
          """Set the type of protein matrix to use.


From jchang at SMI.Stanford.EDU Tue Jul 31 12:56:04 2001
From: jchang at SMI.Stanford.EDU (Jeffrey Chang)
Date: Sat Mar 5 14:43:01 2005
Subject: [Biopython-dev] Re: [BioPython] tests failing
In-Reply-To: <3B66BE5E.7020204@herts.ac.uk>
References: <3B66BE5E.7020204@herts.ac.uk>
Message-ID:

Hey Mark,

Thanks for letting us know about these. I'm moving this thread onto the
"biopython-dev" list, as it's probably more appropriate there.

>Failure: test_SubsMat
>
>AssertionError:
>output: 'M0.00 0.40 0.70 0.80 1.00\n'
>Expected: 'M -0.00 0.40 0.70 0.80 1.00\n'

It looks like this is from a difference in how Windows and Iddo's OS
handle 0's. It's probably not serious, but should be fixed. Iddo, can
you write some code that will check for this?
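
One way such a check might look, as a rough sketch only: it assumes the
mismatch comes from a tiny negative value (or a negative zero) being
printed as -0.00 on one platform and 0.00 on another, which the thread
suspects but does not confirm. format_score is a hypothetical helper,
not the actual SubsMat code.

```python
def format_score(value, digits=2):
    """Format a float with fixed decimals, never printing a negative zero."""
    # round() can return -0.0 for tiny negative inputs; adding 0.0 turns an
    # IEEE 754 negative zero into positive zero and leaves other values alone.
    rounded = round(value, digits) + 0.0
    return "%.*f" % (digits, rounded)


# Hypothetical row of matrix values, with a tiny negative first entry.
row = [-0.001, 0.4, 0.7, 0.8, 1.0]
line = " ".join(format_score(x) for x in row)
print(line)
```

Normalizing the value before formatting keeps the printed matrix
identical across platforms, so the expected test output no longer
depends on how the C library prints a negative zero.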

>Error: test_gobase
>
>from Bio import Sequence
>ImportError: cannot import name Sequence
>
>Error: test_rebase
>
>from Bio import Sequence
>ImportError: cannot import name Sequence

These seem to be from some legacy code that hasn't been cleaned up. It's
now fixed in the CVS and will be incorporated into the next release.

>Failure: test_prodoc
>
>AssertionError:
>Output: 'J. \n'
>Expected: 'J. \n'

Brad, this looks pretty odd. Is it a newline problem?

Jeff

From jchang at SMI.Stanford.EDU Tue Jul 31 13:12:55 2001
From: jchang at SMI.Stanford.EDU (Jeffrey Chang)
Date: Sat Mar 5 14:43:01 2005
Subject: [Biopython-dev] biopython-dev BOSC BoF redux
Message-ID:

Hello everybody,

Here is a summary of the Developer's BoF at BOSC. It'll provide a good
roadmap of things that need to be done in the next few months.

Jeff


Andrew's work at EBI
- Andrew met bioperl and biojava developers
- Andrew wrote code to connect biopython objects to Ewan's bioperl-db

Documentation
- Brad needs some feedback, what's important to work on?
- we need documentation for Martel
- manual and tutorial should be separate
- A cookbook with mini blocks of working code is a good idea. Brad will
  investigate
- general dissatisfaction with Wiki. Organization problems?

Reorganizing modules

Currently, modules are organized:

Bio/
    [databases]
    Data/
    Tools/
        Classification/
        Clustering/
        Transcribe.py
        Translate.py
        listfns.py
        mathfns.py
        stringfns.py
        MultiProc/
    Seq.py
    SeqFeature.py
    utils.py
Martel/

Discussion points:
- want a broad tree rather than deep tree -- stuff easier to find
- packages for end users should be on top, rather than buried deep inside
- should try to keep stuff in Bio namespace to avoid clashes with other
  packages (Martel is an exception)
- directories within Tools/ should be moved up one level
- should add sequtils.py to include general sequence calculations

Johann's stuff
- has a DAS server, almost done
- has code to work with controlled vocabularies on expression data
- needs to get approval to release to biopython

Integrating current parsing API with Martel

We considered two alternatives to building the parsing API. Both are
compatible with Martel:

Current API:
    parser = XXXParser()
    obj = parser.parse(handle)

Alternative API:
    p = format.make_parser()
    p.setContentHandler(obj)
    p.parseFile(handle)
    return obj.built

We agreed to map Martel into the current API, because it is simpler and
less tied to the Martel/SAX way of doing things.

Biopython.com

Technically, we talked about this in the general BoF, but I'm including
it here because of its importance. Andrew had some proposals for what to
do with the biopython.com domain name. We could 1) allow him to use it
for his consulting company, or we can 2) reserve it for companies that
provide support and/or services for biopython. We agreed that the second
would be the better choice, and trust him to maintain the domain for
those purposes.
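
Returning to the parsing API item above, the mapping of an event-driven
parser into the current parser.parse(handle) style could be sketched
roughly as below. Only the method names make_parser, setContentHandler,
parseFile and the built attribute come from the summary; LineFormat,
LineEventParser, RecordBuilder, RecordParser, the field callback and the
two-letter-tag line format are hypothetical stand-ins, not Martel's
actual API.

```python
import io


class RecordBuilder:
    """Hypothetical content handler that collects (tag, text) pairs."""

    def __init__(self):
        self.built = []

    def field(self, tag, text):
        self.built.append((tag, text))


class LineEventParser:
    """Hypothetical stand-in for an event-driven (SAX-style) parser."""

    def __init__(self):
        self._handler = None

    def setContentHandler(self, handler):
        self._handler = handler

    def parseFile(self, handle):
        # Emit one event per non-empty line: first word is the tag.
        for line in handle:
            line = line.rstrip("\n")
            if line:
                tag, _, text = line.partition(" ")
                self._handler.field(tag, text.strip())


class LineFormat:
    """Hypothetical format object that hands out parsers."""

    def make_parser(self):
        return LineEventParser()


class RecordParser:
    """Current-style API: parse(handle) hides the event machinery."""

    def __init__(self, fmt=None):
        self._format = fmt or LineFormat()

    def parse(self, handle):
        builder = RecordBuilder()
        p = self._format.make_parser()
        p.setContentHandler(builder)
        p.parseFile(handle)
        return builder.built


parser = RecordParser()
obj = parser.parse(io.StringIO("ID  P12345\nDE  kinase\n"))
print(obj)
```

The caller only sees the two-line current API, while the handler wiring
from the alternative API lives inside parse.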