From chapmanb at arches.uga.edu  Tue Jun  5 03:29:00 2001
From: chapmanb at arches.uga.edu (Brad Chapman)
Date: Sat Mar  5 14:43:00 2005
Subject: [Biopython-dev] New stuff in CVS
Message-ID: <15132.35388.457962.154037@taxus.athen1.ga.home.com>

Hey Jeff, et al;
I noticed a new c-extension was added to the setup.py file, which is
supposed to be living in:

Bio/Tools/Statistics/cstathelpermodule.c

but it doesn't look like it actually got checked in. Not sure if this
is just an oversight, or if it just isn't ready yet...

Also, I noticed the download_many function added to Bio.GenBank. Very
useful stuff. If you want to work more on some of the stuff you
mention in the comments, like checking that all GIs are valid and
stuff like that, I have some code which does sort of similar stuff,
although I do it in a different way -- my code was meant to start with 
a list of accession numbers and retrieve all of the records in GenBank 
from that (I use it to automatically update my local copy o' the
Arabidopsis genome). So it doesn't start directly with GIs like yours.

If you want to check it out/steal some code or whatever, it's at:

http://bioinformatics.org/cgi-bin/cvsweb.cgi/biopy-pgml/Bio/PGML/Organize/FileFactory.py?rev=1.3&content-type=text/x-cvsweb-markup

(sorry about the long URL).

Also, along the same lines, I added some other KeyError conditions to
NCBIDictionary which I  ran into while I was working on this. The two
other things I ran into were getting back:

Please try again later. Server error  for GI "7212005"

and:

The sequence has been intentionally withdrawn : GI "9993999"

The first just happens randomly (whenever NCBI isn't happy, I guess),
and the second will happen if there are some cross-refs between the
old record and the new one. You'd get back the withdrawn GI first and
the new, good GI second, so if no error were raised, you'd end
up with the withdrawn "record" instead of the good one.
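
A minimal sketch of how those two responses might be turned into
KeyErrors. The function name check_ncbi_response and the marker tuple are
made up for illustration; only the two message strings come from the
responses quoted above, and this is not the actual NCBIDictionary code.

```python
# Hypothetical helper: the marker strings are copied from the two NCBI
# messages quoted above; everything else is an assumption.
ERROR_MARKERS = (
    "Please try again later",
    "The sequence has been intentionally withdrawn",
)


def check_ncbi_response(gi, text):
    """Return text unchanged, or raise KeyError if it looks like an error page."""
    for marker in ERROR_MARKERS:
        if marker in text:
            raise KeyError("GI %s: %s" % (gi, marker))
    return text
```

Raising on the withdrawn message means the caller can skip that GI and
move on to the next one instead of keeping the withdrawn "record".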

Just thought I'd mention it in case you see any other stuff along
these lines.

Thanks again for the new code!
Brad


From nissim at math.ufl.edu  Sat Jun  9 07:11:21 2001
From: nissim at math.ufl.edu (Nissim Broudo)
Date: Sat Mar  5 14:43:00 2005
Subject: [Biopython-dev] Biopython for Windows NT
Message-ID: <003201c0f0d4$ebe28d80$9553e30a@computer>

I downloaded the self-installing executable 

biopython-1.00a1.win32-py2.0.exe

to my PC running Windows NT.  When I go ahead and run it, I'm prompted for the 'installation directory', but the text box won't accept any characters.  Any ideas ?  

Thanks,
Nissim Broudo
From chapmanb at arches.uga.edu  Fri Jun 15 14:42:00 2001
From: chapmanb at arches.uga.edu (Brad Chapman)
Date: Sat Mar  5 14:43:00 2005
Subject: [Biopython-dev] Biopython for Windows NT
In-Reply-To: <003201c0f0d4$ebe28d80$9553e30a@computer>
References: <003201c0f0d4$ebe28d80$9553e30a@computer>
Message-ID: <15146.22264.221938.111829@taxus.athen1.ga.home.com>

Hi Nissim;
Thanks for writing -- apologies for the delay in getting back to
you. Our Windows sysadmin just got back today, so I could finally get
myself permissions to test this out and try installing on Windows 2000
machines. Thanks for waiting.

> I downloaded the self-installing executable 
> 
> biopython-1.00a1.win32-py2.0.exe
> 
> to my PC running Windows NT.  When I go ahead and run it, I'm
> prompted for the 'installation directory', but the text box 
> won't accept any characters.  Any ideas ?  

I'm not sure if I exactly know what box you are talking about, but I'm
thinking this is the initial box where it tries to find your Python
installation. I hope that is right and the rest of this mail helps.

After playing around with this for a while on both Windows 98 and
Windows 2000 (sorry that we don't have an NT box at my lab to try it
on), I've found some reasons why this might not be working. Basically,
it appears that the installer looks in the registry for a Python
installation, and pulls out all of the applicable installs. If it
doesn't find an installation, the type-in box will be useless (as you
found out). The installer was built using distutils, so this is a
limitation there.

The best thing to do is to make sure it finds your python
installation. I can think of the following reasons why it might not
find python:

-> You don't have python installed. 

-> You don't have the right version of python installed. The installer
requires python 2.0. If you need a version for 2.1, I can make one for
you.

-> The installation of python is not in the registry for all
users. Under Windows 2000, if you install python without Administrator
permissions, the registry info for python will only be available for
the user who installed it. The fix here is to reinstall python with
Administrator permissions.
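
The version mismatch in the second case can be checked before running
the installer. A minimal sketch, assuming the installer name always ends
in "-pyX.Y.exe" as in biopython-1.00a1.win32-py2.0.exe; both helper
names are made up for illustration and are not part of distutils.

```python
import re
import sys


def installer_python_version(filename):
    """Extract (major, minor) from a name like 'pkg.win32-py2.0.exe'."""
    match = re.search(r"-py(\d+)\.(\d+)\.exe$", filename)
    if match is None:
        raise ValueError("no Python version in %r" % filename)
    return int(match.group(1)), int(match.group(2))


def matches_running_python(filename):
    """True if the installer targets the interpreter running this script."""
    return installer_python_version(filename) == tuple(sys.version_info[:2])
```

This only compares version numbers; it says nothing about whether the
registry entry is visible to the current user (the third case above).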

Does this help the problem? Don't hesitate to ask again if you have
more problems getting it running.

Brad


From nissim at math.ufl.edu  Fri Jun 15 13:05:19 2001
From: nissim at math.ufl.edu (Nissim Broudo)
Date: Sat Mar  5 14:43:00 2005
Subject: [Biopython-dev] Biopython for Windows NT
References: <003201c0f0d4$ebe28d80$9553e30a@computer> <15146.22264.221938.111829@taxus.athen1.ga.home.com>
Message-ID: <002d01c0f5bd$5e3ffdd0$8953e30a@computer>

Hi Brad,

Thanks for writing back.  You were correct to guess that the text box I'm
being prompted with is requesting my Python installation.

I looked at the options you listed and by process of elimination, the
problem is that I am running Python 2.1.  Should I go ahead and install
Python 2.0 or should I wait for a biopython version that relies on 2.1?
Either way is fine with me.

Nissim



From chapmanb at arches.uga.edu  Fri Jun 15 21:44:42 2001
From: chapmanb at arches.uga.edu (Brad Chapman)
Date: Sat Mar  5 14:43:00 2005
Subject: [Biopython-dev] Biopython for Windows NT
In-Reply-To: <002d01c0f5bd$5e3ffdd0$8953e30a@computer>
References: <003201c0f0d4$ebe28d80$9553e30a@computer>
	<15146.22264.221938.111829@taxus.athen1.ga.home.com>
	<002d01c0f5bd$5e3ffdd0$8953e30a@computer>
Message-ID: <15146.47626.353756.614708@taxus.athen1.ga.home.com>

Hi Nissim;

> Thanks for writing back.  

No problem, sorry again about the delay.

> You were correct to guess that the text box I'm
> being prompted with is requesting my Python installation.
 
Whoo hoo! I'm so happy I was not just rambling pointlessly :-)

> I looked at the options you listed and by process of elimination, the
> problem is that I am running Python 2.1.  Should I go ahead and install
> Python 2.0 or should I wait for a biopython versions that relies on 2.1 ?
> Either way is fine with me.

Well, either way is fine with me too -- I just created and uploaded an
installer for Python-2.1. Biopython works just fine with 2.1, but that
version just hadn't been released yet last time we were getting our
biopython release ready, so I hadn't prepared a Windows installer.

So, you can choose either to downgrade to Python-2.0, or just use the
new installer I uploaded. There's more than 1 way to do it :-). 

Let us know if you have any other problems with anything!

Brad


From biopython-bugs at bioperl.org  Tue Jun 19 10:57:43 2001
From: biopython-bugs at bioperl.org (biopython-bugs@bioperl.org)
Date: Sat Mar  5 14:43:00 2005
Subject: [Biopython-dev] Notification: incoming/35
Message-ID: <200106191457.f5JEvh826278@pw600a.bioperl.org>

JitterBug notification

new message incoming/35

Message summary for PR#35
	From: tarjei@mit.edu
	Subject: NCBIStandalone.BlastParser bug
	Date: Tue, 19 Jun 2001 10:57:42 -0400
	0 replies 	0 followups

====> ORIGINAL MESSAGE FOLLOWS <====

>From tarjei@mit.edu Tue Jun 19 10:57:42 2001
Received: from localhost (localhost [127.0.0.1])
	by pw600a.bioperl.org (8.11.2/8.11.2) with ESMTP id f5JEvg826272
	for <biopython-bugs@pw600a.bioperl.org>; Tue, 19 Jun 2001 10:57:42 -0400
Date: Tue, 19 Jun 2001 10:57:42 -0400
Message-Id: <200106191457.f5JEvg826272@pw600a.bioperl.org>
From: tarjei@mit.edu
To: biopython-bugs@bioperl.org
Subject: NCBIStandalone.BlastParser bug

Full_Name: Tarjei Mikkelsen
Module: Bio.Blast.NCBIStandalone.BlastParser
Version: 1.00a
OS: Dec/Alpha OSF1
Submission from: incognito.mit.edu (18.246.0.239)


The standalone BLAST record parser (Bio.Blast.NCBIStandalone.BlastParser) fails
with a SyntaxError when the (path)name of the database spans more than one
line.

The following code stub/BLAST output will reproduce the bug: (Even though this
example is from BLAST 2.0.5 the same thing happens in newer versions)

<<<<<CUT: blast_parser_bug.py>>>>>
from Bio.Blast import NCBIStandalone

blast_out = open("blast_parser_bug.out", "r")
blast_parser = NCBIStandalone.BlastParser()
blast_record = blast_parser.parse(blast_out)
<<<<<CUT>>>>>

<<<<<CUT: blast_parser_bug.out>>>>>
BLASTP 2.0.5 [May-5-1998]


Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, 
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), 
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs",  Nucleic Acids Res. 25:3389-3402.

Query= eco:b1416
         (83 letters)

Database: /home/strontium/tarjei/pathway/src/Bio/Pathway/data/2.7.1.11
.fa
           39 sequences; 18,779 total letters

Searching......................................done

                                                                   Score     E
Sequences producing significant alignments:                        (bits) 
Value

spy:SPy1283                                                           20  0.64
lla:L0002                                                             20  0.84

>spy:SPy1283
           Length = 337
           
 Score = 20.4 bits (41), Expect = 0.64
 Identities = 10/26 (38%), Positives = 17/26 (64%), Gaps = 1/26 (3%)

Query: 21  GYTDEEIVSSDIIG-SHFGSVFDATQ 45
           G  +EE+V S I+G +  G++F  T+
Sbjct: 287 GIHNEELVESPILGTAEEGALFSLTE 312


>lla:L0002
           Length = 340
           
 Score = 20.0 bits (40), Expect = 0.84
 Identities = 10/25 (40%), Positives = 16/25 (64%), Gaps = 1/25 (4%)

Query: 21  GYTDEEIVSSDIIG-SHFGSVFDAT 44
           G  +EE+V S I+G +  G++F  T
Sbjct: 286 GIRNEELVESPILGTAEEGALFSLT 310


 Score = 18.8 bits (37), Expect = 1.9
 Identities = 9/29 (31%), Positives = 17/29 (58%), Gaps = 1/29 (3%)

Query: 28  VSSDIIGSHFGSVFD-ATQTEITAVGDLQ 55
           + +DI+G+ F   FD A  T + A+  ++
Sbjct: 126 IDNDIVGTDFTIGFDTAVSTVVDALDKIR 154


  Database: /home/strontium/tarjei/pathway/src/Bio/Pathway/data/2.7.1.
  11.fa
    Posted date:  Jun 18, 2001  1:19 PM
  Number of letters in database: 18,779
  Number of sequences in database:  39
  
Lambda     K      H
   0.313    0.129    0.352 

Gapped
Lambda     K      H
   0.270   0.0470    0.230 


Matrix: BLOSUM62
Gap Penalties: Existence: 11, Extension: 1
Number of Hits to DB: 2788
Number of Sequences: 39
Number of extensions: 119
Number of successful extensions: 3
Number of sequences better than 10: 2
Number of HSP's better than 10.0 without gapping: 2
Number of HSP's successfully gapped in prelim test: 0
Number of HSP's that attempted gapping in prelim test: 0
Number of HSP's gapped (non-prelim): 3
length of query: 83
length of database: 18779
effective HSP length: 33
effective length of query: 50
effective length of database: 17492
effective search space:   874600
T: 11
A: 40
X1: 16 ( 7.2 bits)
X2: 38 (14.8 bits)
X3: 64 (24.9 bits)
S1: 34 (18.3 bits)
S2: 31 (16.5 bits)
<<<<<CUT>>>>>


From jchang at SMI.Stanford.EDU  Tue Jun 19 11:25:05 2001
From: jchang at SMI.Stanford.EDU (Jeffrey Chang)
Date: Sat Mar  5 14:43:00 2005
Subject: [Biopython-dev] Notification: incoming/35
In-Reply-To: <200106191457.f5JEvh826278@pw600a.bioperl.org>
Message-ID: <B754BCE0.2392%jchang@smi.stanford.edu>

Good catch!  Yes, the parser is assuming 1 line for the database.  I've gone
through and fixed this in the NCBIStandalone.py file.  Please install this
over your previous one and let me know if it works.  I'm not sure if there
are more outstanding issues with the formatting of BLAST 2.0.5.  If there
are, please send the offending output to me directly as an attachment --
the jitterbug remailer reformats text so I can't tell exactly what the
original output looks like.
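
One way to cope with a wrapped database name is to keep reading until
the sequence-count line turns up. The following is only a sketch under
that assumption, not the change actually made to NCBIStandalone.py; the
function name read_database_header and the regular expression are
invented for illustration.

```python
import re

# Matches the line that closes the header, e.g.
# "           39 sequences; 18,779 total letters"
COUNTS = re.compile(r"^\s*([\d,]+) sequences; ([\d,]+) total letters")


def read_database_header(lines):
    """Return (name, num_sequences, num_letters) from BLAST header lines."""
    lines = iter(lines)
    for line in lines:
        if line.startswith("Database:"):
            parts = [line[len("Database:"):].strip()]
            break
    else:
        raise ValueError("no 'Database:' line found")
    # Every following line belongs to the name until the counts line appears.
    for line in lines:
        match = COUNTS.match(line)
        if match:
            name = "".join(parts)  # assumption: the wrap split the name mid-word
            sequences = int(match.group(1).replace(",", ""))
            letters = int(match.group(2).replace(",", ""))
            return name, sequences, letters
        parts.append(line.strip())
    raise ValueError("no sequence count line after 'Database:'")
```

Joining the pieces without a separator suits a wrapped path like the
one in the report; a name that wrapped at a space would lose that space.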

Thanks,
Jeff


From jchang at SMI.Stanford.EDU  Thu Jun 28 02:19:32 2001
From: jchang at SMI.Stanford.EDU (Jeffrey Chang)
Date: Sat Mar  5 14:43:00 2005
Subject: [Biopython-dev] Biopython 1.00a2 release
Message-ID: <a0510100fb76079968dd4@[192.168.0.4]>

Hey happy developers!

It smells like time to put together another release to get some bug 
fixes and new functionality out to the public.  I currently have the 
middle of next week in mind, although that's still up for debate.

Here's my undoubtedly incomplete list of stuff that's been updated 
since the last release (on Mar 3!):
   deprecated old regression testing frameworks
   deprecated Sequence.py  (still needs to be done)
   Swiss-Prot parser bug fixes
   GenBank parser bug fixes
   can now download many sequences at a time from GenBank
   kMeans clustering algorithm
   support for Kabat
   support for FSSP
   numerous updates in alignment code
   fixed memory leak in listfns
   Martel bundled and part of the install procedure

Please let me know if:
   1) you're currently working on something and really want to hold 
off until it's done.
   2) there's new, fixed stuff, or deleted stuff that I overlooked
   3) something I said is done is not ready
   4) I'm out of my mind to be releasing something now


We're still marching towards a 1.0 release sometime.  Before that 
goes out, we'll need to:
   - stabilize the APIs
   - integrate Martel parsers, deprecate old ones
   - flesh out the regression tests

Stuff I'd like, but may not get done:
   - PDB parser
   - internet-aware regression tester
   - dynamic programming code (Brad, where's yours? :)


Let me know what you think.  Core developers, I will wait until I 
hear from each of you before I move forward.

Thanks,
Jeff

From dalke at acm.org  Thu Jun 28 02:29:20 2001
From: dalke at acm.org (Andrew Dalke)
Date: Sat Mar  5 14:43:00 2005
Subject: [Biopython-dev] Biopython 1.00a2 release
Message-ID: <09d601c0ff9b$aac19140$6401a8c0@josiah.dalkescientific.com>

Jeff:
> ...

Looks good with me.

BTW all, I'm going to be visiting EBI before going to BOSC,
starting next Friday.  My plan is to work a lot on biopython -
getting it up to date with some of the other projects, finishing
off and testing Martel (they have a lot more databases at EBI
than I do!), propagandizing Python in the den of Perlers :)
and generally having fun.

I don't plan to do any API changes, so none of it should affect
a 1.0 final release.  But there might be a lot of new code.

> Stuff I'd like, but may not get done:
>    - PDB parser

Not going to happen from me soon.  It's tricky stuff.  On the
other hand, if you just want support for the SEQRES, ATOM, HETATOM,
TER, BOND and MODEL/ENDMDL cards then it's a bit easier.  Still
lots of trickiness (which ATOM formats? 1.x or 2.x?  XPLOR-style?)

                    Andrew
                    dalke@acm.org


From thomas at cbs.dtu.dk  Thu Jun 28 03:27:09 2001
From: thomas at cbs.dtu.dk (Thomas Sicheritz-Ponten)
Date: Sat Mar  5 14:43:00 2005
Subject: [Biopython-dev] Biopython 1.00a2 release
In-Reply-To: Jeffrey Chang's message of "Wed, 27 Jun 2001 23:19:32 -0700"
References: <a0510100fb76079968dd4@[192.168.0.4]>
Message-ID: <y9vu211uks2.fsf@genome.cbs.dtu.dk>

Jeffrey Chang <jchang@SMI.Stanford.EDU> writes:

> Hey happy developers!
> 
> It smells like time to put together another release to get some bug
> fixes and new functionality out to the public.  I currently have the
> middle of next week in mind, although that's still up for debate.
> 
> Here's my undoubtedly incomplete list of stuff that's been updated
> since the last release (on Mar 3!):
> 
>    deprecated old regression testing frameworks
>    deprecated Sequence.py  (still needs to be done)
>    Swiss-Prot parser bug fixes
>    GenBank parser bug fixes
>    can now download many sequences at a time from GenBank
>    kMeans clustering algorithm

Cool .... Arghhh, I should have spent more time following all the development
postings, I spent 2 days implementing kMeans in Python ... :-)
- one observation: for calculating the Euclidean distance, looping over the
  vectors seems to be faster than the vector-based operation .... the most
  frequently used function in the kMeans algorithm is the Euclidean distance
  measure, so we gain significant speed ... (at least on my machine)
 (any thoughts about implementing the bisecting k-means algorithm?)

# this is slow
def EucDist2(v1, v2):
    return sqrt(sum((v1-v2)**2))

# this is faster
def EucDist1(v1, v2):
    sum = 0
    for i in range(0,len(v1)):
        sum += (v1[i] -v2[i])**2
    return sqrt(sum)


>    support for Kabat
>    support for FSSP
>    numerous updates in alignment code
>    fixed memory leak in listfns
>    Martel bundled and part of the install procedure
> 
> Please let me know if:
>    1) you're currently working on something and really want to hold
>    off until it's done.

I have small things I always wanted to include but I never did ...  
Where should we include sequence based calculations like GC-content, GC3
etc. - should we put that in the sequence class (not a good idea IMHO)
or create a new module e.g. seqstat ?
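
For concreteness, a sketch of what two such calculations could look like
as plain functions in a separate module. The names gc_content and
gc3_content are made up here, the input is assumed to be a plain string
of unambiguous DNA letters, and GC3 is taken to mean the GC fraction at
third codon positions with the reading frame starting at the first base.

```python
def gc_content(seq):
    """Fraction of G and C bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / float(len(seq))


def gc3_content(seq):
    """GC fraction at third codon positions, reading frame starting at 0."""
    # seq[2::3] picks every third base, starting with the third one.
    return gc_content(seq[2::3])
```

Ambiguity codes such as N or S are simply counted as non-GC in this
sketch.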

What's the status of an antiparallel or complementary function - do we still
lack one?
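
A minimal sketch of such a pair of functions on plain strings, assuming
unambiguous DNA only. The names complement and antiparallel and the
lookup table are illustrative, not an existing Biopython API.

```python
# Complement table for unambiguous DNA only (an assumption of this sketch).
_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G",
               "a": "t", "t": "a", "g": "c", "c": "g"}


def complement(seq):
    """Complement each base, keeping the original order and case."""
    return "".join([_COMPLEMENT[base] for base in seq])


def antiparallel(seq):
    """Reverse complement: the complement read in the opposite direction."""
    return complement(seq)[::-1]
```

Any letter outside the table raises a KeyError, which at least makes
ambiguity codes visible instead of passing them through silently.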

> Let me know what you think.  Core developers, I will wait until I hear
> from each of you before I move forward.

I think this month I can spend significantly more time on the biopython
project than in the last months - so is anything I mentioned worth pulling
into the next release?

cheers
-thomas

-- 
Sicheritz-Ponten Thomas, Ph.D  CBS, Department of Biotechnology
thomas@biopython.org           The Technical University of Denmark
CBS:  +45 45 252489            Building 208, DK-2800 Lyngby
Fax   +45 45 931585            http://www.cbs.dtu.dk/thomas

	De Chelonian Mobile ... The Turtle Moves ...

From dalke at acm.org  Thu Jun 28 03:42:21 2001
From: dalke at acm.org (Andrew Dalke)
Date: Sat Mar  5 14:43:00 2005
Subject: [Biopython-dev] Biopython 1.00a2 release
Message-ID: <0a1101c0ffa5$ddd9d060$6401a8c0@josiah.dalkescientific.com>

Thomas Sicheritz-Ponten <thomas@cbs.dtu.dk>:
># this is slow
>def EucDist2(v1, v2):
>    return sqrt(sum((v1-v2)**2))
>
># this is faster
>def EucDist1(v1, v2):
>    sum = 0
>    for i in range(0,len(v1)):
>        sum += (v1[i] -v2[i])**2
>    return sqrt(sum)

The first does more work than the second.  v1-v2 goes through a
"__sub__" method call, which then does the same as v1[i] - v2[i],
except with the method call overhead.  Ditto with ** and "__pow__".
It also makes intermediate objects for every call.  (C++ used to have
that problem.  We worked on a system with a lot of overloaded
3-vectors.  Got a huge performance boost turning the calls into
3-arg form.  OTOH, the overloaded vector form was much easier to
write and debug.  Nowadays C++ people use expression templates.)

The only thing I can suggest you change is to get rid of the "0, "
in the range call.

Out of curiosity, I tried

   for a1, a2 in zip(v1, v2):
     sum += (a1-a2) ** 2

The 'zip' version was about 3 times slower.  Here's my test
harness.

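import time
from math import sqrt

# EucDist1 is Thomas's loop version (quoted above); EucDist3 is the 'zip' one.
def EucDist3(v1, v2):
   sum = 0
   for a1, a2 in zip(v1, v2):
     sum += (a1-a2) ** 2
   return sqrt(sum)

def main():
   for n in range(1, 6):
      v1 = range(0, 10**n)
      v2 = range(n, 10**n+n)
      t1 = time.time()
      d1 = EucDist1(v1, v2)
      t2 = time.time()
      d2 = EucDist3(v1, v2)
      t3 = time.time()

      assert d1 == d2, (d1, d2)
      print n, t2-t1, t3-t2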
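
For anyone repeating this comparison on a current Python, here is a
self-contained sketch using the standard timeit module. The function
names euc_dist_index and euc_dist_zip and the compare helper are made up
for this example, and the timings depend on the machine and interpreter,
so no particular ratio should be expected.

```python
import timeit
from math import sqrt


def euc_dist_index(v1, v2):
    """Euclidean distance using an explicit index loop."""
    total = 0.0
    for i in range(len(v1)):
        total += (v1[i] - v2[i]) ** 2
    return sqrt(total)


def euc_dist_zip(v1, v2):
    """Euclidean distance pairing the elements with zip."""
    total = 0.0
    for a1, a2 in zip(v1, v2):
        total += (a1 - a2) ** 2
    return sqrt(total)


def compare(n=1000, repeats=200):
    """Return (index_seconds, zip_seconds) for vectors of length n."""
    v1 = [float(x) for x in range(n)]
    v2 = [float(x) for x in range(n, 2 * n)]
    # Both versions add the same terms in the same order.
    assert euc_dist_index(v1, v2) == euc_dist_zip(v1, v2)
    t_index = timeit.timeit(lambda: euc_dist_index(v1, v2), number=repeats)
    t_zip = timeit.timeit(lambda: euc_dist_zip(v1, v2), number=repeats)
    return t_index, t_zip


if __name__ == "__main__":
    print("index loop: %.4fs  zip: %.4fs" % compare())
```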

                    Andrew


From jchang at SMI.Stanford.EDU  Thu Jun 28 14:28:36 2001
From: jchang at SMI.Stanford.EDU (Jeffrey Chang)
Date: Sat Mar  5 14:43:00 2005
Subject: [Biopython-dev] Biopython 1.00a2 release
In-Reply-To: <y9vu211uks2.fsf@genome.cbs.dtu.dk>
References: <a0510100fb76079968dd4@[192.168.0.4]>
 <y9vu211uks2.fsf@genome.cbs.dtu.dk>
Message-ID: <a05101002b76123ddce05@[171.65.33.250]>

At 9:27 AM +0200 6/28/01, Thomas Sicheritz-Ponten wrote:

>- one observation for calculating the euclidean distance, looping over the
>   vectors seems to be faster than the vector based operation .... the most
>   frequent used function in the kMean algoritm is the euclidean distance
>   measure, so we gain significant speed ... (at least on my machine)
>  (any thoughts about implementing the bi-secting kmeans algoritm ?)

I'm not familiar with the bisecting algorithm, but it sounds cool!  ;)


># this is slow
>def EucDist2(v1, v2):
>     return sqrt(sum((v1-v2)**2))
>
># this is faster
>def EucDist1(v1, v2):
>     sum = 0
>     for i in range(0,len(v1)):
>         sum += (v1[i] -v2[i])**2
>     return sqrt(sum)


Wow, that's surprising.  In EucDist2, the (v1-v2) work is getting
pushed down into Numeric, which loops through the array in C code.
However, there is some overhead because the result of v1-v2 gets
instantiated as a new array, which is then squared, and then summed.

EucDist1 has less overhead because the calculations are done on the
fly, although in Python code.  Indexing, i.e. v1[i], does generate a
new object, but because of the semantics of Numeric, doesn't copy the
values from the original array.  However, I would have expected the
overhead from looping through the python code to offset that.

On my system, EucDist2 is about 10x faster than EucDist1.

[krusty:~] jchang% python test.py
EucDist1 time: 14.2610449791
EucDist1 31622.7766017
EucDist2 time: 1.84816789627
EucDist2 31622.7766017
[krusty:~] jchang%

This is running on Darwin-ppc, Python 2.1, Numeric 20.0.0.  I get
similar results on solaris-sparc, Python 2.1, Numeric 17.3.

So, the behavior you're seeing must be system dependent, which means
we should provide both implementations of euclidean distance and let
people choose which one to use.

Jeff


[krusty:~] jchang% cat test.py
import time
from Numeric import *

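# vector-based version (Thomas's "slow" one; the faster one in this test)
def EucDist2(v1, v2):
     return sqrt(sum((v1-v2)**2))

# explicit loop (Thomas's "faster" one; the slower one in this test)
def EucDist1(v1, v2):
     sum = 0
     for i in range(0,len(v1)):
         sum += (v1[i] -v2[i])**2
     return sqrt(sum)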

v1 = map(float, range(1000))
v2 = map(float, range(1000, 2000))
av1 = array(v1, Float32)
av2 = array(v2, Float32)

NTIMES = 1000

start = time.time()
for i in range(NTIMES):
     EucDist1(av1, av2)
t = time.time() - start
print "EucDist1 time:", t
print "EucDist1", EucDist1(av1, av2)

start = time.time()
for i in range(NTIMES):
     EucDist2(av1, av2)
t = time.time() - start
print "EucDist2 time:", t
print "EucDist2", EucDist2(av1, av2)


From dalke at acm.org  Thu Jun 28 14:32:24 2001
From: dalke at acm.org (Andrew Dalke)
Date: Sat Mar  5 14:43:00 2005
Subject: [Biopython-dev] Biopython 1.00a2 release
Message-ID: <0b6201c10000$ad88aca0$6401a8c0@josiah.dalkescientific.com>

Jeff:
>Yeah, I've been there before.  What happened with UPDB?  Can the PDB
>definition parser be reused to generate Martel-type definitions?

I use UPDB to generate Martel format definitions.  However, it's
not really useful unless the parser can build real data structures,
including, eg, convert the string "1.234" into the floating point
number, or turn the atom index of "A0000" into the integer 100000.
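
A rough sketch of that kind of field conversion. The function names are
invented for illustration, and the letter rule is an assumption rather
than anything taken from a PDB specification: a leading uppercase letter
is read as A=10, B=11, and so on, which is one reading consistent with
"A0000" meaning 100000.

```python
import string


def parse_float_field(text):
    """Convert a fixed-width field such as ' 1.234' to a float."""
    return float(text.strip())


def parse_atom_index(text):
    """Convert an atom serial field to an int.

    Assumption (not from any PDB specification): a leading uppercase
    letter continues the count past 99999, with A=10, B=11, ... so that
    'A0000' -> 100000 and 'B0000' -> 110000.
    """
    text = text.strip()
    first = text[0]
    if first in string.ascii_uppercase:
        lead = 10 + string.ascii_uppercase.index(first)
        rest = text[1:]
        return lead * 10 ** len(rest) + int(rest)
    return int(text)
```

Files written by other programs may use a different overflow scheme, so
the letter rule would need checking against real data before use.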

                    Andrew

