From chapmanb at arches.uga.edu Tue Jun 5 03:29:00 2001
From: chapmanb at arches.uga.edu (Brad Chapman)
Date: Sat Mar 5 14:43:00 2005
Subject: [Biopython-dev] New stuff in CVS
Message-ID: <15132.35388.457962.154037@taxus.athen1.ga.home.com>

Hey Jeff, et al;

I noticed a new c-extension was added to the setup.py file, which is
supposed to be living in:

Bio/Tools/Statistics/cstathelpermodule.c

but it doesn't look like it actually got checked in. Not sure if this
is just an oversight, or if it just isn't ready yet...

Also, I noticed the download_many function added to Bio.GenBank. Very
useful stuff. If you want to work more on some of the stuff you
mention in the comments, like checking that all GIs are valid and
stuff like that, I have some code which does sort of similar stuff,
although I do it in a different way -- my code was meant to start with
a list of accession numbers and retrieve all of the records in GenBank
from that (I use it to automatically update my local copy o' the
Arabidopsis genome). So it doesn't start directly with GIs like
yours. If you want to check it out/steal some code or whatever, it's
at:

http://bioinformatics.org/cgi-bin/cvsweb.cgi/biopy-pgml/Bio/PGML/Organize/FileFactory.py?rev=1.3&content-type=text/x-cvsweb-markup

(sorry about the long URL).

Also, along the same lines, I added some other KeyError conditions to
NCBIDictionary which I ran into while I was working on this. The two
other things I ran into were getting back:

Please try again later. Server error for GI "7212005"

and:

The sequence has been intentionally withdrawn : GI "9993999"

The first just happens randomly (whenever NCBI isn't happy, I guess),
and the second will happen if there are some cross-refs between the
old record and the new one. You'd get back the withdrawn GI first and
the new, good GI second, so if there wasn't an error then you'll end
up with the withdrawn "record" instead of the good one.
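
To make the idea concrete, here is a minimal sketch of that kind of
check. It is not the actual NCBIDictionary code: the helper name
check_ncbi_response and the ERROR_MARKERS tuple are made up for
illustration, and the marker strings are just the two messages quoted
above.

```python
# Hypothetical helper, not the real Bio.GenBank.NCBIDictionary code.
# The marker strings are taken from the two error pages quoted above.
ERROR_MARKERS = (
    "Please try again later",
    "Server error for GI",
    "The sequence has been intentionally withdrawn",
)


def check_ncbi_response(gi, text):
    """Return text unchanged, or raise KeyError if it looks like an error page."""
    for marker in ERROR_MARKERS:
        if marker in text:
            # Treat an error page as "no record for this key", so the
            # withdrawn "record" is never handed back as if it were good.
            raise KeyError("GI %s: %s" % (gi, marker))
    return text
```

A caller that walks a list of GIs could catch the KeyError and move on
to the next GI, which is one way to skip a withdrawn GI and pick up the
good one that follows it.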

Just thought I'd mention it in case you see any other stuff along
these lines. Thanks again for the new code!

Brad

From nissim at math.ufl.edu Sat Jun 9 07:11:21 2001
From: nissim at math.ufl.edu (Nissim Broudo)
Date: Sat Mar 5 14:43:00 2005
Subject: [Biopython-dev] Biopython for Windows NT
Message-ID: <003201c0f0d4$ebe28d80$9553e30a@computer>

I downloaded the self-installing executable

biopython-1.00a1.win32-py2.0.exe

to my PC running Windows NT. When I go ahead and run it, I'm
prompted for the 'installation directory', but the text box
won't accept any characters. Any ideas?

Thanks,

Nissim Broudo

From chapmanb at arches.uga.edu Fri Jun 15 14:42:00 2001
From: chapmanb at arches.uga.edu (Brad Chapman)
Date: Sat Mar 5 14:43:00 2005
Subject: [Biopython-dev] Biopython for Windows NT
In-Reply-To: <003201c0f0d4$ebe28d80$9553e30a@computer>
References: <003201c0f0d4$ebe28d80$9553e30a@computer>
Message-ID: <15146.22264.221938.111829@taxus.athen1.ga.home.com>

Hi Nissim;

Thanks for writing -- apologies for the delay in getting back with
you. Our windows sysadmin just got back today so that I could get
myself permissions to test this out and try installing on 2000
machines. Thanks for waiting.

> I downloaded the self-installing executable
>
> biopython-1.00a1.win32-py2.0.exe
>
> to my PC running Windows NT. When I go ahead and run it, I'm
> prompted for the 'installation directory', but the text box
> won't accept any characters. Any ideas?

I'm not sure if I exactly know what box you are talking about, but I'm
thinking this is the initial box where it tries to find your Python
installation. I hope that is right and the rest of this mail helps.

After playing around with this for a while on both 98 and Windows 2000
(sorry that we don't have an NT box at my lab to try it on), I've come
to grips with some reasons why this might not be working. Basically,
it appears that the installer looks in the registry for a Python
installation, and pulls out all of the applicable installs. If it
doesn't find an installation, the type-in box will be useless (as you
found out). The installer was built using distutils, so this is a
limitation there.

The best thing to do is to make sure it finds your python
installation. I can think of the following reasons why it might not
find python:

-> You don't have python installed.

-> You don't have the right version of python installed. The installer
requires python 2.0. If you need a version for 2.1, I can make one for
you.

-> The installation of python is not in the registry for all
users. Under Windows 2000, if you install python without Administrator
permissions, the registry info for python will only be available for
the user who installed it. The fix here is to reinstall python with
Administrator permissions.

Does this help the problem? Don't hesitate to ask again if you have
more problems getting it running.

Brad

From nissim at math.ufl.edu Fri Jun 15 13:05:19 2001
From: nissim at math.ufl.edu (Nissim Broudo)
Date: Sat Mar 5 14:43:00 2005
Subject: [Biopython-dev] Biopython for Windows NT
References: <003201c0f0d4$ebe28d80$9553e30a@computer> <15146.22264.221938.111829@taxus.athen1.ga.home.com>
Message-ID: <002d01c0f5bd$5e3ffdd0$8953e30a@computer>

Hi Brad,

Thanks for writing back. You were correct to guess that the text box I'm
being prompted with is requesting my Python installation.

I looked at the options you listed and by process of elimination, the
problem is that I am running Python 2.1. Should I go ahead and install
Python 2.0 or should I wait for a biopython version that relies on 2.1?
Either way is fine with me.

Nissim

From chapmanb at arches.uga.edu Fri Jun 15 21:44:42 2001
From: chapmanb at arches.uga.edu (Brad Chapman)
Date: Sat Mar 5 14:43:00 2005
Subject: [Biopython-dev] Biopython for Windows NT
In-Reply-To: <002d01c0f5bd$5e3ffdd0$8953e30a@computer>
References: <003201c0f0d4$ebe28d80$9553e30a@computer> <15146.22264.221938.111829@taxus.athen1.ga.home.com> <002d01c0f5bd$5e3ffdd0$8953e30a@computer>
Message-ID: <15146.47626.353756.614708@taxus.athen1.ga.home.com>

Hi Nissim;

> Thanks for writing back.

No problem, sorry again about the delay.

> You were correct to guess that the text box I'm
> being prompted with is requesting my Python installation.

Whoo hoo! I'm so happy I was not just rambling pointlessly :-)

> I looked at the options you listed and by process of elimination, the
> problem is that I am running Python 2.1. Should I go ahead and install
> Python 2.0 or should I wait for a biopython version that relies on 2.1?
> Either way is fine with me.

Well, either way is fine with me too -- I just created and uploaded an
installer for Python-2.1. Biopython works just fine with 2.1, but that
version just hadn't been released yet last time we were getting our
biopython release ready, so I hadn't prepared a Windows installer.

So, you can choose either to downgrade to Python-2.0, or just use the
new installer I uploaded. There's more than 1 way to do it :-).

Let us know if you have any other problems with anything!

Brad

From biopython-bugs at bioperl.org Tue Jun 19 10:57:43 2001
From: biopython-bugs at bioperl.org (biopython-bugs@bioperl.org)
Date: Sat Mar 5 14:43:00 2005
Subject: [Biopython-dev] Notification: incoming/35
Message-ID: <200106191457.f5JEvh826278@pw600a.bioperl.org>

JitterBug notification

new message incoming/35

Message summary for PR#35
From: tarjei@mit.edu
Subject: NCBIStandalone.BlastParser bug
Date: Tue, 19 Jun 2001 10:57:42 -0400
0 replies 0 followups

====> ORIGINAL MESSAGE FOLLOWS <====

>From tarjei@mit.edu Tue Jun 19 10:57:42 2001
Received: from localhost (localhost [127.0.0.1])
by pw600a.bioperl.org (8.11.2/8.11.2) with ESMTP id f5JEvg826272
for ; Tue, 19 Jun 2001 10:57:42 -0400
Date: Tue, 19 Jun 2001 10:57:42 -0400
Message-Id: <200106191457.f5JEvg826272@pw600a.bioperl.org>
From: tarjei@mit.edu
To: biopython-bugs@bioperl.org
Subject: NCBIStandalone.BlastParser bug

Full_Name: Tarjei Mikkelsen
Module: Bio.Blast.NCBIStandalone.BlastParser
Version: 1.00a
OS: Dec/Alpha OSF1
Submission from: incognito.mit.edu (18.246.0.239)

The standalone BLAST record parser (Bio.Blast.NCBIStandalone.BlastParser)
fails with a SyntaxError when the (path)name of the database spans more
than one line.

The following code stub/BLAST output will reproduce the bug: (Even though this
example is from BLAST 2.0.5 the same thing happens in newer versions)

<<<<>>>>
from Bio.Blast import NCBIStandalone

blast_out = open("blast_parser_bug.out", "r")
blast_parser = NCBIStandalone.BlastParser()
blast_record = blast_parser.parse(blast_out)
<<<<>>>>

<<<<>>>>
BLASTP 2.0.5 [May-5-1998]


Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs", Nucleic Acids Res. 25:3389-3402.

Query= eco:b1416
(83 letters)

Database: /home/strontium/tarjei/pathway/src/Bio/Pathway/data/2.7.1.11
.fa
39 sequences; 18,779 total letters

Searching......................................done

                                                                   Score     E
Sequences producing significant alignments:                        (bits)  Value

spy:SPy1283                                                            20  0.64
lla:L0002                                                              20  0.84

>spy:SPy1283
Length = 337

Score = 20.4 bits (41), Expect = 0.64
Identities = 10/26 (38%), Positives = 17/26 (64%), Gaps = 1/26 (3%)

Query: 21  GYTDEEIVSSDIIG-SHFGSVFDATQ 45
           G  +EE+V S I+G +  G++F  T+
Sbjct: 287 GIHNEELVESPILGTAEEGALFSLTE 312


>lla:L0002
Length = 340

Score = 20.0 bits (40), Expect = 0.84
Identities = 10/25 (40%), Positives = 16/25 (64%), Gaps = 1/25 (4%)

Query: 21  GYTDEEIVSSDIIG-SHFGSVFDAT 44
           G  +EE+V S I+G +  G++F  T
Sbjct: 286 GIRNEELVESPILGTAEEGALFSLT 310


Score = 18.8 bits (37), Expect = 1.9
Identities = 9/29 (31%), Positives = 17/29 (58%), Gaps = 1/29 (3%)

Query: 28  VSSDIIGSHFGSVFD-ATQTEITAVGDLQ 55
           + +DI+G+ F   FD A  T + A+  ++
Sbjct: 126 IDNDIVGTDFTIGFDTAVSTVVDALDKIR 154


Database: /home/strontium/tarjei/pathway/src/Bio/Pathway/data/2.7.1.
11.fa
Posted date: Jun 18, 2001 1:19 PM
Number of letters in database: 18,779
Number of sequences in database: 39

Lambda     K      H
0.313    0.129    0.352

Gapped
Lambda     K      H
0.270   0.0470    0.230


Matrix: BLOSUM62
Gap Penalties: Existence: 11, Extension: 1
Number of Hits to DB: 2788
Number of Sequences: 39
Number of extensions: 119
Number of successful extensions: 3
Number of sequences better than 10: 2
Number of HSP's better than 10.0 without gapping: 2
Number of HSP's successfully gapped in prelim test: 0
Number of HSP's that attempted gapping in prelim test: 0
Number of HSP's gapped (non-prelim): 3
length of query: 83
length of database: 18779
effective HSP length: 33
effective length of query: 50
effective length of database: 17492
effective search space: 874600
T: 11
A: 40
X1: 16 ( 7.2 bits)
X2: 38 (14.8 bits)
X3: 64 (24.9 bits)
S1: 34 (18.3 bits)
S2: 31 (16.5 bits)
<<<<>>>>

From jchang at SMI.Stanford.EDU Tue Jun 19 11:25:05 2001
From: jchang at SMI.Stanford.EDU (Jeffrey Chang)
Date: Sat Mar 5 14:43:00 2005
Subject: [Biopython-dev] Notification: incoming/35
In-Reply-To: <200106191457.f5JEvh826278@pw600a.bioperl.org>
Message-ID:

Good catch! Yes, the parser is assuming 1 line for the database. I've
gone through and fixed this in the NCBIStandalone.py file. Please
install this over your previous one and let me know if it works.

I'm not sure if there are more outstanding issues with the formatting
of BLAST 2.0.5. If there are, please send the offending output to me
directly as an attachment -- the jitterbug remailer reformats text so
I can't tell exactly what the original output looks like.
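
For readers who want to see the shape of the problem, here is a rough
sketch of reading a "Database:" header whose name wraps onto extra
lines. It is not the fix that went into NCBIStandalone.py: the function
name read_database_header is made up, and it assumes the name simply
continues until the "N sequences; M total letters" line and that wrapped
pieces join without a space (as with the ".fa" suffix in the report).

```python
import re

# Matches the summary line that ends the database header,
# e.g. "39 sequences; 18,779 total letters".
_COUNTS = re.compile(r"^\s*([\d,]+) sequences; ([\d,]+) total letters")


def read_database_header(lines):
    """Return (name, num_sequences, num_letters) from a BLAST header.

    `lines` is a list of lines starting at the "Database:" line.
    Hypothetical helper for illustration only.
    """
    if not lines or not lines[0].startswith("Database:"):
        raise ValueError("expected a 'Database:' line")
    name_parts = [lines[0][len("Database:"):].strip()]
    for line in lines[1:]:
        match = _COUNTS.match(line)
        if match:
            # Assumption: wrapped pieces of the name join with no space.
            name = "".join(name_parts)
            num_sequences = int(match.group(1).replace(",", ""))
            num_letters = int(match.group(2).replace(",", ""))
            return name, num_sequences, num_letters
        # Anything before the counts line is treated as more of the name.
        name_parts.append(line.strip())
    raise ValueError("no 'sequences; ... total letters' line found")
```

The point of the design is that the parser stops on the counts line
rather than assuming the name occupies exactly one line, so a one-line
name and a wrapped name go through the same path.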

Thanks,
Jeff

From jchang at SMI.Stanford.EDU Thu Jun 28 02:19:32 2001
From: jchang at SMI.Stanford.EDU (Jeffrey Chang)
Date: Sat Mar 5 14:43:00 2005
Subject: [Biopython-dev] Biopython 1.00a2 release
Message-ID:

Hey happy developers!

It smells like time to put together another release to get some bug
fixes and new functionality out to the public. I currently have the
middle of next week in mind, although that's still up for debate.

Here's my undoubtedly incomplete list of stuff that's been updated
since the last release (on Mar 3!):

deprecated old regression testing frameworks
deprecated Sequence.py (still needs to be done)
Swiss-Prot parser bug fixes
GenBank parser bug fixes
can now download many sequences at a time from GenBank
kMeans clustering algorithm
support for Kabat
support for FSSP
numerous updates in alignment code
fixed memory leak in listfns
Martel bundled and part of the install procedure

Please let me know if:
1) you're currently working on something and really want to hold
off until it's done.
2) there's new, fixed stuff, or deleted stuff that I overlooked
3) something I said is done is not ready
4) I'm out of my mind to be releasing something now

We're still marching towards a 1.0 release sometime. Before that goes
out, we'll need to:
- stabilize the APIs
- integrate Martel parsers, deprecate old ones
- flesh out the regression tests

Stuff I'd like, but may not get done:
- PDB parser
- internet-aware regression tester
- dynamic programming code (Brad, where's yours? :)

Let me know what you think. Core developers, I will wait until I hear
from each of you before I move forward.

Thanks,
Jeff

From dalke at acm.org Thu Jun 28 02:29:20 2001
From: dalke at acm.org (Andrew Dalke)
Date: Sat Mar 5 14:43:00 2005
Subject: [Biopython-dev] Biopython 1.00a2 release
Message-ID: <09d601c0ff9b$aac19140$6401a8c0@josiah.dalkescientific.com>

Jeff:
> ...

Looks good with me.

BTW all, I'm going to be visiting EBI before going to BOSC, starting
next Friday. My plan is to work a lot on biopython - getting it up to
date with some of the other projects, finishing off and testing Martel
(they have a lot more databases at EBI than I do!), propagandizing
Python in the den of Perlers :) and generally having fun.

I don't plan to do any API changes, so none of it should affect a 1.0
final release. But there might be a lot of new code.

> Stuff I'd like, but may not get done:
> - PDB parser

Not going to happen from me soon. It's tricky stuff. On the other
hand, if you just want support for the SEQRES, ATOM, HETATOM, TER,
BOND and MODEL/ENDMDL cards then it's a bit easier. Still lots of
trickiness (which ATOM formats? 1.x or 2.x? XPLOR-style?)

Andrew
dalke@acm.org

From thomas at cbs.dtu.dk Thu Jun 28 03:27:09 2001
From: thomas at cbs.dtu.dk (Thomas Sicheritz-Ponten)
Date: Sat Mar 5 14:43:00 2005
Subject: [Biopython-dev] Biopython 1.00a2 release
In-Reply-To: Jeffrey Chang's message of "Wed, 27 Jun 2001 23:19:32 -0700"
References:
Message-ID:

Jeffrey Chang writes:
> Hey happy developers!
>
> It smells like time to put together another release to get some bug
> fixes and new functionality out to the public. I currently have the
> middle of next week in mind, although that's still up for debate.
>
> Here's my undoubtedly incomplete list of stuff that's been updated
> since the last release (on Mar 3!):
>
> deprecated old regression testing frameworks
> deprecated Sequence.py (still needs to be done)
> Swiss-Prot parser bug fixes
> GenBank parser bug fixes
> can now download many sequences at a time from GenBank
> kMeans clustering algorithm

Cool .... Arghhh, I should have spent more time following all development
postings, I spent 2 days implementing kMeans in python ... :-)

- one observation for calculating the euclidean distance, looping over the
  vectors seems to be faster than the vector based operation .... the most
  frequently used function in the kMeans algorithm is the euclidean distance
  measure, so we gain significant speed ... (at least on my machine)
  (any thoughts about implementing the bi-secting kmeans algorithm ?)

# this is slow
def EucDist2(v1, v2):
    return sqrt(sum((v1-v2)**2))

# this is faster
def EucDist1(v1, v2):
    sum = 0
    for i in range(0,len(v1)):
        sum += (v1[i] -v2[i])**2
    return sqrt(sum)

> support for Kabat
> support for FSSP
> numerous updates in alignment code
> fixed memory leak in listfns
> Martel bundled and part of the install procedure
>
> Please let me know if:
> 1) you're currently working on something and really want to hold
> off until it's done.

I have small things I always wanted to include but I never did ...
Where should we include sequence based calculations like GC-content,
GC3 etc. - should we put that in the sequence class (not a good idea
IMHO) or create a new module e.g. seqstat ?

What's the status of an antiparallel or complementary function - do we
still lack one ?

> Let me know what you think. Core developers, I will wait until I hear
> from each of you before I move forward.

I think, this month I can spend significantly more time on the
biopython project than the last months - so is anything I mentioned
worth pulling into the next release ?

cheers
-thomas

--
Sicheritz-Ponten Thomas, Ph.D    CBS, Department of Biotechnology
thomas@biopython.org             The Technical University of Denmark
CBS: +45 45 252489               Building 208, DK-2800 Lyngby
Fax +45 45 931585                http://www.cbs.dtu.dk/thomas

De Chelonian Mobile ... The Turtle Moves ...

From dalke at acm.org Thu Jun 28 03:42:21 2001
From: dalke at acm.org (Andrew Dalke)
Date: Sat Mar 5 14:43:00 2005
Subject: [Biopython-dev] Biopython 1.00a2 release
Message-ID: <0a1101c0ffa5$ddd9d060$6401a8c0@josiah.dalkescientific.com>

Thomas Sicheritz-Ponten :
># this is slow
>def EucDist2(v1, v2):
>    return sqrt(sum((v1-v2)**2))
>
># this is faster
>def EucDist1(v1, v2):
>    sum = 0
>    for i in range(0,len(v1)):
>        sum += (v1[i] -v2[i])**2
>    return sqrt(sum)

The first does more work than the second. It has to find that v1-v2
uses a "__sub__" method call, which then does the same as
v1[i] - v2[i], except with the method call overhead. Ditto with **
defining "__pow__". It also makes intermediate objects for every call.

(C++ used to have that problem. We worked on a system with a lot of
overloaded 3-vectors. Got a huge performance boost turning the calls
into 3-arg form. OTOH, the overloaded vector form was much easier to
write and debug. Nowadays C++ people use expression templates.)

The only thing I can suggest you change is to get rid of the "0, " in
the range call.

Out of curiosity, I tried

    for a1, a2 in zip(v1, v2):
        sum += (a1-a2) ** 2

The 'zip' version was about 3 times slower. Here's my test harness.

def main():
    for n in range(1, 6):
        v1 = range(0, 10**n)
        v2 = range(n, 10**n+n)
        t1 = time.time()
        d1 = EucDist1(v1, v2)
        t2 = time.time()
        d2 = EucDist3(v1, v2)
        t3 = time.time()
        assert d1 == d2, (d1, d2)
        print n, t2-t1, t3-t2

Andrew

From jchang at SMI.Stanford.EDU Thu Jun 28 14:28:36 2001
From: jchang at SMI.Stanford.EDU (Jeffrey Chang)
Date: Sat Mar 5 14:43:00 2005
Subject: [Biopython-dev] Biopython 1.00a2 release
In-Reply-To:
References:
Message-ID:

At 9:27 AM +0200 6/28/01, Thomas Sicheritz-Ponten wrote:
>- one observation for calculating the euclidean distance, looping over the
>  vectors seems to be faster than the vector based operation .... the most
>  frequently used function in the kMeans algorithm is the euclidean distance
>  measure, so we gain significant speed ... (at least on my machine)
>  (any thoughts about implementing the bi-secting kmeans algorithm ?)

I'm not familiar with the bisecting algorithm, but it sounds cool! ;)

># this is slow
>def EucDist2(v1, v2):
>    return sqrt(sum((v1-v2)**2))
>
># this is faster
>def EucDist1(v1, v2):
>    sum = 0
>    for i in range(0,len(v1)):
>        sum += (v1[i] -v2[i])**2
>    return sqrt(sum)

Wow, that's surprising. In EucDist2, the (v1-v2) work is getting
pushed down into Numeric, which loops through the array in C code.
However, there is some overhead because the result of v1-v2 gets
instantiated as a new list, and then squared, and then summed.
EucDist1 has less overhead because the calculations are done on the
fly, although in Python code. Indexing, i.e. v1[i], does generate a
new object, but because of the semantics of Numeric, doesn't copy the
values from the original array. However, I would have expected the
overhead from looping through the python code to offset that.

On my system, EucDist2 is about 10x faster than EucDist1.

[krusty:~] jchang% python test.py
EucDist1 time: 14.2610449791
EucDist1 31622.7766017
EucDist2 time: 1.84816789627
EucDist2 31622.7766017
[krusty:~] jchang%

This is running on Darwin-ppc, Python 2.1, Numeric 20.0.0. I get
similar results on solaris-sparc, Python 2.1, Numeric 17.3. So, the
behavior you're seeing must be system dependent, which means we should
provide both implementations of euclidean distance and let people
choose which one to use.

Jeff

[krusty:~] jchang% cat test.py
import time
from Numeric import *

# this is slow
def EucDist2(v1, v2):
    return sqrt(sum((v1-v2)**2))

# this is faster
def EucDist1(v1, v2):
    sum = 0
    for i in range(0,len(v1)):
        sum += (v1[i] -v2[i])**2
    return sqrt(sum)

v1 = map(float, range(1000))
v2 = map(float, range(1000, 2000))
av1 = array(v1, Float32)
av2 = array(v2, Float32)

NTIMES = 1000

start = time.time()
for i in range(NTIMES):
    EucDist1(av1, av2)
t = time.time() - start
print "EucDist1 time:", t
print "EucDist1", EucDist1(av1, av2)

start = time.time()
for i in range(NTIMES):
    EucDist2(av1, av2)
t = time.time() - start
print "EucDist2 time:", t
print "EucDist2", EucDist2(av1, av2)

From dalke at acm.org Thu Jun 28 14:32:24 2001
From: dalke at acm.org (Andrew Dalke)
Date: Sat Mar 5 14:43:00 2005
Subject: [Biopython-dev] Biopython 1.00a2 release
Message-ID: <0b6201c10000$ad88aca0$6401a8c0@josiah.dalkescientific.com>

Jeff:
>Yeah, I've been there before. What happened with UPDB? Can the PDB
>definition parser be reused to generate Martel-type definitions?

I use UPDB to generate Martel format definitions.
However, it's not really useful unless the parser can build real data
structures, including, eg, convert the string "1.234" into the floating
point number, or turn the atom index of "A0000" into the integer 100000.

Andrew

From biopython-bugs at bioperl.org Fri Jun 29 04:20:04 2001
From: biopython-bugs at bioperl.org (biopython-bugs@bioperl.org)
Date: Sat Mar 5 14:43:00 2005
Subject: [Biopython-dev] Notification: incoming/36
Message-ID: <200106290820.f5T8K4810281@pw600a.bioperl.org>

JitterBug notification

new message incoming/36

Message summary for PR#36
From:
Subject: toner supplies
Date: Fri, 29 Jun 2001 04:13:00
0 replies 0 followups