From bugzilla-daemon at portal.open-bio.org  Sun Jul  1 01:54:55 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 1 Jul 2007 01:54:55 -0400
Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64
In-Reply-To: <bug-2323-42@http.bugzilla.open-bio.org/>
Message-ID: <200707010554.l615stgK032500@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2323


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |


------- Comment #17 from mdehoon at ims.u-tokyo.ac.jp  2007-07-01 01:54 EST -------
Sorry for the mistake.

With the code for Python >= 2.4 kept in a separate module, we still get an
error message when installing Biopython, because Python attempts to
byte-compile each module. It is not serious, because the error is otherwise
ignored. However, how about this code for Python >= 2.4:

from itertools import cycle, imap

return sum(imap(lambda n,c: n*ord(c.upper()), cycle(range(1,58)),seq)) % 10000

It is almost as fast as the code you now have for Python >= 2.4, but avoids
having to create a separate module gcg24.py.
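
For readers on Python 3, where itertools.imap no longer exists, the same idea can be sketched with the built-in zip and a generator expression. This is only an illustration of the one-liner above, not the code that was committed to Biopython; the function name gcg_checksum is made up for this example.

```python
from itertools import cycle


def gcg_checksum(seq):
    """Weighted character sum with weights cycling 1..57, modulo 10000."""
    # zip() stops as soon as seq is exhausted, so the endless cycle() is safe.
    weights = cycle(range(1, 58))
    return sum(n * ord(c.upper()) for n, c in zip(weights, seq)) % 10000


print(gcg_checksum("ACGT"))
```

Because the generator expression consumes seq one character at a time, no copy of the sequence is made.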


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Sun Jul  1 07:02:47 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 1 Jul 2007 07:02:47 -0400
Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64
In-Reply-To: <bug-2323-42@http.bugzilla.open-bio.org/>
Message-ID: <200707011102.l61B2lHg029279@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2323


------- Comment #18 from mdehoon at ims.u-tokyo.ac.jp  2007-07-01 07:02 EST -------
Btw, I am finding that the code for Python < 2.3 is faster than the code for
Python >= 2.4. The former uses more memory, as it makes a copy of seq, but even
if we avoid copying seq, I still find that it is faster than the code for
Python >= 2.4.



From mdehoon at c2b2.columbia.edu  Sun Jul  1 08:01:00 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Sun, 01 Jul 2007 21:01:00 +0900
Subject: [Biopython-dev] TempFastaWriter,
	TempFastaWriterSingle in Bio/GFF/easy.py
In-Reply-To: <320fb6e00706301214p41c33329o98126738d77fad19@mail.gmail.com>
References: <4685FCCA.4090904@c2b2.columbia.edu>
	<320fb6e00706301214p41c33329o98126738d77fad19@mail.gmail.com>
Message-ID: <4687977C.70903@c2b2.columbia.edu>

Peter wrote:
>> Is anybody using the classes TempFastaWriter, TempFastaWriterSingle in
>> Bio/GFF/easy.py? They are currently using the old Fasta writer in
>> Bio.SeqIO.FASTA instead of the new one in Bio.SeqIO.FastaIO. We can
>> either update them to use the new Fasta writer, or simply remove them,
>> since currently these classes are not used anywhere in Biopython.
> 
> This is for Bug 2284 right?
> http://bugzilla.open-bio.org/show_bug.cgi?id=2284
> 
> I'm inclined to remove classes TempFastaWriter and TempFastaWriterSingle
> 
Actually I hadn't noticed bug 2284. I looked into this because the 
Biopython tests are causing DeprecationWarnings. If no users of these 
classes step forward, I am in favor of removing them.

--Michiel.

From bugzilla-daemon at portal.open-bio.org  Sun Jul  1 10:13:29 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 1 Jul 2007 10:13:29 -0400
Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64
In-Reply-To: <bug-2323-42@http.bugzilla.open-bio.org/>
Message-ID: <200707011413.l61EDTF3012907@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2323


------- Comment #19 from sbassi at gmail.com  2007-07-01 10:13 EST -------
(In reply to comment #18)
> Btw, I am finding that the code for Python < 2.3 is faster than the code for
> Python >= 2.4. The former uses more memory, as it makes a copy of seq, but even
> if we avoid copying seq, I still find that it is faster than the code for
> Python >= 2.4.

OK, so leave it w/o the check for python version and use just the 2.3 code.
Best,
SB.



From bugzilla-daemon at portal.open-bio.org  Sun Jul  1 18:38:55 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 1 Jul 2007 18:38:55 -0400
Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64
In-Reply-To: <bug-2323-42@http.bugzilla.open-bio.org/>
Message-ID: <200707012238.l61Mct1k007379@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2323


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|                            |FIXED


------- Comment #20 from mdehoon at ims.u-tokyo.ac.jp  2007-07-01 18:38 EST -------
Updated in CVS, using the 2.3 code without copying seq.



From bugzilla-daemon at portal.open-bio.org  Sun Jul  1 19:42:14 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 1 Jul 2007 19:42:14 -0400
Subject: [Biopython-dev] [Bug 2327] New: test_Cluster takes too long
Message-ID: <bug-2327-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2327

           Summary: test_Cluster takes too long
           Product: Biopython
           Version: 1.43
          Platform: Other
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: idoerg at burnham.org


When running the biopython test suite, test_Cluster takes too long. I gave up
after 2 minutes.



From bugzilla-daemon at portal.open-bio.org  Sun Jul  1 19:55:34 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 1 Jul 2007 19:55:34 -0400
Subject: [Biopython-dev] [Bug 2327] test_Cluster takes too long
In-Reply-To: <bug-2327-42@http.bugzilla.open-bio.org/>
Message-ID: <200707012355.l61NtYcR012177@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2327


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |DUPLICATE


------- Comment #1 from mdehoon at ims.u-tokyo.ac.jp  2007-07-01 19:55 EST -------


*** This bug has been marked as a duplicate of bug 2268 ***



From bugzilla-daemon at portal.open-bio.org  Sun Jul  1 19:55:36 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 1 Jul 2007 19:55:36 -0400
Subject: [Biopython-dev] [Bug 2268] Cluster unit test suite runs indefinitely
In-Reply-To: <bug-2268-42@http.bugzilla.open-bio.org/>
Message-ID: <200707012355.l61NtaNW012196@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2268


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |idoerg at gmail.com


------- Comment #22 from mdehoon at ims.u-tokyo.ac.jp  2007-07-01 19:55 EST -------
*** Bug 2327 has been marked as a duplicate of this bug. ***



From bugzilla-daemon at portal.open-bio.org  Tue Jul  3 07:03:40 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 3 Jul 2007 07:03:40 -0400
Subject: [Biopython-dev] [Bug 2328] New: NCBIStandalone.blastall chokes on
	integer argument
Message-ID: <bug-2328-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2328

           Summary: NCBIStandalone.blastall chokes on integer argument
           Product: Biopython
           Version: 1.43
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: grunberg at embl.de
                CC: grunberg at embl.de


Unlike previous versions, the current NCBIStandalone.blastall and blastpgp
expect that the argument align_view is given as a string rather than an
integer. So the following call worked with previous versions but now fails::

   results, err = NCBIStandalone.blastall( settings.blast_bin,
                                           method, db, seqFile,
                                           expectation=e,
                                           align_view=7, ## XML output
                                           **kw)

The error is raised here::

  NCBIStandalone: 1788 (blastall) 
     w, r, e = os.popen3(" ".join([blastcmd] + params))

because align_view escapes the str conversion of the other parameters in this
line::

   params.extend([att2param['align_view'], align_view])

This line should rather look like this::

   params.extend([att2param['align_view'], str(align_view)])

I am going to attach a patch to this bugreport.
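
The failure described above can be reproduced without BLAST installed: str.join raises a TypeError as soon as one list item is not a string. The sketch below is only an illustration; build_command is a hypothetical helper, and "-m" as the flag behind align_view is an assumption made for the example.

```python
def build_command(blastcmd, params):
    """Join a command and its parameters, converting every item to str first."""
    return " ".join([blastcmd] + [str(p) for p in params])


# Assumption for this example: "-m" is the flag that align_view maps to.
params = ["-m", 7]

try:
    " ".join(["blastall"] + params)  # mimics the failing join
except TypeError as exc:
    print("join failed:", exc)

print(build_command("blastall", params))
```

Converting every parameter at the point where the command line is assembled protects all arguments, not only align_view.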

Greetings,
Raik



From bugzilla-daemon at portal.open-bio.org  Tue Jul  3 07:05:37 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 3 Jul 2007 07:05:37 -0400
Subject: [Biopython-dev] [Bug 2328] NCBIStandalone.blastall chokes on
	integer argument
In-Reply-To: <bug-2328-42@http.bugzilla.open-bio.org/>
Message-ID: <200707031105.l63B5bAP013190@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2328


------- Comment #1 from grunberg at embl.de  2007-07-03 07:05 EST -------
Created an attachment (id=698)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=698&action=view)
patch for  Bug 2328 (NCBIStandalone.blastall / blastpgp)

The patch is described in my bug report.
Cheers,
Raik



From bugzilla-daemon at portal.open-bio.org  Tue Jul  3 19:26:15 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 3 Jul 2007 19:26:15 -0400
Subject: [Biopython-dev] [Bug 2328] NCBIStandalone.blastall chokes on
	integer argument
In-Reply-To: <bug-2328-42@http.bugzilla.open-bio.org/>
Message-ID: <200707032326.l63NQFBB022873@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2328


------- Comment #2 from mdehoon at ims.u-tokyo.ac.jp  2007-07-03 19:26 EST -------
> Unlike previous versions, the current NCBIStandalone.blastall and blastpgp
> expect that the argument align_view is given as a string rather than an
> integer. So the following call worked with previous versions but now fails::

In which previous version of Biopython did this work? Your patch looks fine,
but I'd like to find out how this bug entered Biopython.



From bugzilla-daemon at portal.open-bio.org  Thu Jul  5 09:30:32 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Jul 2007 09:30:32 -0400
Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64
In-Reply-To: <bug-2323-42@http.bugzilla.open-bio.org/>
Message-ID: <200707051330.l65DUW2k004459@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2323


------- Comment #21 from dalloliogm at gmail.com  2007-07-05 09:30 EST -------
(In reply to comment #1)
> Created an attachment (id=689)
>  --> (http://bugzilla.open-bio.org/attachment.cgi?id=689&action=view)
> Proposed functions (CRC64 and GCG checksum)
> 
> This could be in utils.py, but I am not sure.


Maybe it could be useful to add a 'GCG checksum' attribute to the BioPython Seq
object.

Checksums could be used to quickly check whether two sequences are the same;
but in the documentation you should state very clearly that two sequences which
differ in even a single symbol (e.g. AAANAAA and AAAAAAA) have different
values.
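
As a small illustration of this point, here is a sketch using a plain-loop GCG-style checksum (weights cycling 1..57, result modulo 10000). The function is written for this example and is not copied from Biopython. It also shows the opposite caveat: two different sequences can share a checksum, so equal values are a hint rather than proof of identity.

```python
def gcg(seq):
    """GCG-style checksum: position weights cycle 1..57, result modulo 10000."""
    total = 0
    for index, char in enumerate(seq):
        total += (index % 57 + 1) * ord(char.upper())
    return total % 10000


# One differing symbol changes the checksum in this example.
print(gcg("AAAAAAA"), gcg("AAANAAA"))

# Different sequences can still collide on the same value.
print(gcg("AB") == gcg("CA"))
```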



From bugzilla-daemon at portal.open-bio.org  Sat Jul  7 05:28:56 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 7 Jul 2007 05:28:56 -0400
Subject: [Biopython-dev] [Bug 2328] NCBIStandalone.blastall chokes on
	integer argument
In-Reply-To: <bug-2328-42@http.bugzilla.open-bio.org/>
Message-ID: <200707070928.l679SuTJ010432@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2328


------- Comment #3 from grunberg at embl.de  2007-07-07 05:28 EST -------
(In reply to comment #2)
> > Unlike previous versions, the current NCBIStandalone.blastall and blastpgp
> > expect that the argument align_view is given as a string rather than an
> > integer. So the following call worked with previous versions but now fails::
> 
> In which previous version of Biopython did this work? Your patch looks fine,
> but I'd like to find out how this bug entered Biopython.
> 

Sorry about the late reply... My previous Biopython installation (which didn't
have the glitch) was version 1.42.
Greetings
Raik



From bugzilla-daemon at portal.open-bio.org  Sun Jul  8 00:20:12 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 8 Jul 2007 00:20:12 -0400
Subject: [Biopython-dev] [Bug 2328] NCBIStandalone.blastall chokes on
	integer argument
In-Reply-To: <bug-2328-42@http.bugzilla.open-bio.org/>
Message-ID: <200707080420.l684KCSq031646@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2328


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #4 from mdehoon at ims.u-tokyo.ac.jp  2007-07-08 00:20 EST -------
Fixed in CVS (see biopython/Bio/Blast/NCBIStandalone.py revision 1.68).



From chengsoon.ong at tuebingen.mpg.de  Mon Jul  9 06:15:50 2007
From: chengsoon.ong at tuebingen.mpg.de (Cheng Soon Ong)
Date: Mon, 9 Jul 2007 12:15:50 +0200
Subject: [Biopython-dev] Bio.Blast.NCBIWWW.qblast
Message-ID: <CEE0EEC1-BF4A-4A07-9389-6FA2C6E4ACC3@tuebingen.mpg.de>

Hi,

I've just written a small extension to the qblast function. The
current version only passes a subset of parameters to NCBI. My
code passes all the parameters that the qblast API at NCBI accepts.

Is anyone interested in merging this into the blast module of
Biopython? Sorry, I do not know the protocol here for getting code
into Biopython.

Cheng


From mdehoon at c2b2.columbia.edu  Mon Jul  9 07:40:23 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Mon, 09 Jul 2007 20:40:23 +0900
Subject: [Biopython-dev] Bio.Blast.NCBIWWW.qblast
In-Reply-To: <CEE0EEC1-BF4A-4A07-9389-6FA2C6E4ACC3@tuebingen.mpg.de>
References: <CEE0EEC1-BF4A-4A07-9389-6FA2C6E4ACC3@tuebingen.mpg.de>
Message-ID: <46921EA7.2080106@c2b2.columbia.edu>

Dear Cheng,

Thank you for your contribution.

The "official" way to contribute code to Biopython is to open a new bug
report at http://bugzilla.open-bio.org/ and attach your code to it.

For your qblast code, you can also just send it to me (not to the list), 
then I can merge it into Biopython.

--Michiel.


From biopython-dev at maubp.freeserve.co.uk  Tue Jul 10 15:31:55 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Tue, 10 Jul 2007 20:31:55 +0100
Subject: [Biopython-dev] FDist: more Population Genetics code
In-Reply-To: <6d941f120701030516m1adb3daeh6e4645121ba8679d@mail.gmail.com>
References: <6d941f120701030516m1adb3daeh6e4645121ba8679d@mail.gmail.com>
Message-ID: <4693DEAB.8000900@maubp.freeserve.co.uk>

Hi Tiago,

Have you had any feedback (off the mailing list)?

Ralph - did you have a chance to look over Tiago's code or discuss this 
with him?

It would be a shame if nothing came from this...

Peter

Tiago Antão wrote:
> Hi!
> 
> I have submitted another enhancement bug, with support for FDist. It
> can generate and parse FDist files and control FDist
> applications. There are also a couple of utility functions. FDist is a
> niche application (mainly used to detect selection in animal
> genetics). Not the most fundamental one to support, but it is
> currently one that I am working on, thus, the code.
> 
> Regarding my submitted code for GenePop, I have submitted a different
> version on bugzilla.  The main difference is that I moved everything
> from Bio to Bio.PopGen.
> 
> Before I continue putting code on bugzilla I would like to know if it
> is worthwhile doing it... Any opinions on the code submitted or if any
> changes are required? I would really like to continue converting my
> code to BioPython, but only if it has any possibility of ending up
> being useful/included in distribution somewhere in the future... ;)
> 
> I am currently working on code related to SimCoal2, Arlequin and
> general statistics (Fst, heterozygosity, ...), which will probably be
> ready quite soon (i.e. in the next two weeks). This is more mainstream
> than FDist.
> 
> I have some other code lying around mainly related to HapMap, but I
> will only submit it after reviewing and reusing it again. This is more
> distant future ... like a couple of months.
> 
> Tiago


From biopython-dev at maubp.freeserve.co.uk  Tue Jul 10 17:12:44 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Tue, 10 Jul 2007 22:12:44 +0100
Subject: [Biopython-dev] FDist: more Population Genetics code
In-Reply-To: <Pine.OSX.4.64.0707101652021.415@emeraldii.local>
References: <6d941f120701030516m1adb3daeh6e4645121ba8679d@mail.gmail.com>
	<4693DEAB.8000900@maubp.freeserve.co.uk>
	<Pine.OSX.4.64.0707101652021.415@emeraldii.local>
Message-ID: <4693F64C.9050403@maubp.freeserve.co.uk>

Ralph Haygood wrote:
> Peter,
> 
> I haven't received any code from Tiago to review.
> 
> Ralph

He's put some on Bugzilla:
http://bugzilla.open-bio.org/show_bug.cgi?id=2170

Peter


From rhaygood at duke.edu  Tue Jul 10 23:45:56 2007
From: rhaygood at duke.edu (Ralph Haygood)
Date: Tue, 10 Jul 2007 23:45:56 -0400 (EDT)
Subject: [Biopython-dev] FDist: more Population Genetics code
In-Reply-To: <4693DEAB.8000900@maubp.freeserve.co.uk>
References: <6d941f120701030516m1adb3daeh6e4645121ba8679d@mail.gmail.com>
	<4693DEAB.8000900@maubp.freeserve.co.uk>
Message-ID: <Pine.OSX.4.64.0707102320420.658@emeraldii.local>

Peter and Tiago,

Hello.  No, I haven't done anything with Tiago's code.  I'm afraid
it's pretty far from what I'm working on these days.

I still think it would be good for BioPython to include methods for
computing basic population-genetical statistics (Watterson's theta,
Tajima's D, etc.) from DNA alignments.  I have in mind something like
BioPerl's PopGen (http://www.bioperl.org/wiki/HOWTO:PopGen).  My own
code is easy to use with a Bio.Align.Generic.Alignment, but it doesn't
conform to BioPython's standards for style, testing, or documentation,
and I don't know when I'll have time to standardize it.

Ralph


From tiagoantao at gmail.com  Wed Jul 11 06:05:21 2007
From: tiagoantao at gmail.com (Tiago Antão)
Date: Wed, 11 Jul 2007 12:05:21 +0200
Subject: [Biopython-dev] FDist: more Population Genetics code
In-Reply-To: <Pine.OSX.4.64.0707102320420.658@emeraldii.local>
References: <6d941f120701030516m1adb3daeh6e4645121ba8679d@mail.gmail.com>
	<4693DEAB.8000900@maubp.freeserve.co.uk>
	<Pine.OSX.4.64.0707102320420.658@emeraldii.local>
Message-ID: <6d941f120707110305n75c75e77y7426280477cf23ca@mail.gmail.com>

Hi,

I had no feedback and it seemed that there was no interest, so I
decided to start a Python Population Genetics project on Google, which
is going ahead, but is still in the alpha stage:
http://code.google.com/p/pypopgen/
I am doing this on a personal basis for now (I did not even announce
it anywhere), so it is advancing at my own pace and is designed
according to my needs.
I have already used it (or a tiny part of it) in a published
application ( http://popgen.eu/soft/m4s2 ).
I am still willing to integrate this into BioPython, but for that some
interest and feedback would be needed... That would have to happen
fairly soon, as the code will have to be adapted to BioPython
standards and namespace; once there is a lot of code, that will be
difficult in practice (and after going public it will really be
impossible).

The "strangest" code that I am writing (and that would need more
discussion) does asynchronous computation (to be easy to use on
multicore computers and grids).

Regards,
Tiago


From bugzilla-daemon at portal.open-bio.org  Fri Jul 13 07:08:07 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 13 Jul 2007 07:08:07 -0400
Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64
In-Reply-To: <bug-2323-42@http.bugzilla.open-bio.org/>
Message-ID: <200707131108.l6DB87xm027778@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2323


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |biopython-bugzilla at maubp.freeserve.co.uk
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |


------- Comment #22 from biopython-bugzilla at maubp.freeserve.co.uk  2007-07-13 07:08 EST -------
I've started to write a test case for the code now in Bio/SeqUtils/CheckSum.py
and noticed that while crc64, gcg and seguid will cope with both strings and
Seq objects, crc32 will only cope with strings.

Any objections to me fixing this like so:

Old:

from binascii import crc32

New:

from binascii import crc32 as _crc32

def crc32(seq):
    """Return the crc32 checksum for a sequence (string or Seq object)."""
    try:
        # Assume it's a Seq object
        return _crc32(seq.tostring())
    except AttributeError:
        # Assume it's a string
        return _crc32(seq)

--
Peter



From bugzilla-daemon at portal.open-bio.org  Fri Jul 13 07:18:30 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 13 Jul 2007 07:18:30 -0400
Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64
In-Reply-To: <bug-2323-42@http.bugzilla.open-bio.org/>
Message-ID: <200707131118.l6DBIUOS028425@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2323


------- Comment #23 from biopython-bugzilla at maubp.freeserve.co.uk  2007-07-13 07:18 EST -------
Created an attachment (id=703)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=703&action=view)
Initial unit test for Bio/SeqUtils/CheckSum

If the crc32 function could accept a Seq object then the "try/except" at the
end isn't needed.



From bugzilla-daemon at portal.open-bio.org  Fri Jul 13 10:38:52 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 13 Jul 2007 10:38:52 -0400
Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64
In-Reply-To: <bug-2323-42@http.bugzilla.open-bio.org/>
Message-ID: <200707131438.l6DEcqn5008339@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2323


------- Comment #24 from mdehoon at ims.u-tokyo.ac.jp  2007-07-13 10:38 EST -------
A better solution would be for Seq to inherit from str, instead of Seq having
str as a member. Then we don't have to modify crc32, and other code in
Biopython will also become simpler.
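
A minimal sketch of what inheriting from str could look like is given here. StrSeq and its alphabet argument are hypothetical names for this example and do not describe the real Seq class; the code is written for Python 3.

```python
from binascii import crc32


class StrSeq(str):
    """Hypothetical sequence type that inherits from str."""

    def __new__(cls, data, alphabet=None):
        obj = super().__new__(cls, data)
        obj.alphabet = alphabet  # extra attribute stored next to the string data
        return obj


seq = StrSeq("ACGT", alphabet="DNA")

# Code written for str accepts the subclass without any wrapper.
print(isinstance(seq, str), seq.lower(), len(seq))
print(crc32(seq.encode("ascii")) == crc32(b"ACGT"))
```

One trade-off of this design: inherited methods such as lower() return a plain str, so the alphabet is lost unless those methods are overridden.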



From bugzilla-daemon at portal.open-bio.org  Fri Jul 13 11:17:59 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 13 Jul 2007 11:17:59 -0400
Subject: [Biopython-dev] [Bug 2269] Shebang (hashbang) lines need cleanup
In-Reply-To: <bug-2269-42@http.bugzilla.open-bio.org/>
Message-ID: <200707131517.l6DFHxY4010647@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2269


------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk  2007-07-13 11:17 EST -------
I have just fixed a few in CVS; here is a list of the remaining abnormal
shebang/hashbang lines:

biopython/Bio/EUtils/POM.py     '#!/usr/bin/python -i\n'
biopython/Bio/EUtils/DTDs/LinkOut.py    '#!/usr/bin/python\n'
biopython/Bio/EUtils/DTDs/__init__.py   '#!/usr/bin/python\n'
biopython/Bio/EUtils/DTDs/eInfo_020511.py       '#!/usr/bin/python\n'
biopython/Bio/EUtils/DTDs/eLink_020511.py       '#!/usr/bin/python\n'
biopython/Bio/EUtils/DTDs/ePost_020511.py       '#!/usr/bin/python\n'
biopython/Bio/EUtils/DTDs/eSearch_020511.py     '#!/usr/bin/python\n'
biopython/Bio/EUtils/DTDs/eSummary_020511.py    '#!/usr/bin/python\n'
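
A listing like the one above could be produced with a short script along
these lines. This is only a sketch: the EXPECTED value is an assumption
about which shebang form counts as "normal", and abnormal_shebangs is a
hypothetical helper name, not something in the Biopython tree.

```python
import os

# Assumption: this is the shebang form treated as "normal".
EXPECTED = "#!/usr/bin/env python"


def abnormal_shebangs(root):
    """Yield (path, first_line) for .py files whose shebang differs from EXPECTED."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(".py"):
                continue
            path = os.path.join(dirpath, name)
            # Read as bytes so an odd encoding in the file cannot break the scan.
            with open(path, "rb") as handle:
                first = handle.readline().decode("latin-1").rstrip("\r\n")
            # Files without any shebang line are skipped, not reported.
            if first.startswith("#!") and first != EXPECTED:
                yield path, first
```

Calling list(abnormal_shebangs("biopython")) from a checkout would give the
paths and their first lines; files with no shebang at all are left out.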

The biopython/Bio/EUtils/*.py examples are interesting in that many of those
files are autogenerated from DTD files (using the dtd2py.py script I think -
but it doesn't seem to work on all of them).

Also, I don't think all the files under Bio/Restriction/*.py need a shebang,
and a large proportion of the unit tests have shebangs (but less than half).



From tiagoantao at gmail.com  Fri Jul 13 11:23:03 2007
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Fri, 13 Jul 2007 16:23:03 +0100
Subject: [Biopython-dev] FDist: more Population Genetics code
In-Reply-To: <6d941f120707110305n75c75e77y7426280477cf23ca@mail.gmail.com>
References: <6d941f120701030516m1adb3daeh6e4645121ba8679d@mail.gmail.com>
	<4693DEAB.8000900@maubp.freeserve.co.uk>
	<Pine.OSX.4.64.0707102320420.658@emeraldii.local>
	<6d941f120707110305n75c75e77y7426280477cf23ca@mail.gmail.com>
Message-ID: <6d941f120707130823i6b62478fl9ac589eb5c02ca9d@mail.gmail.com>

I just want to add that I followed precisely the procedure that was
suggested to me at that time, i.e. to open bugzilla issues, but I got no
answer or follow-up from it. I also had some very useful mail
exchanges with Ralph at that time, but no code was floated around.

I reiterate my interest in supplying the code (currently supporting
fdist, simcoal2, genepop, hapmap, ucsc table browser - in varying
degrees of quality). You can have a look at the google url supplied
(svn repository in it). I would still take the necessary time to
convert it to BioPython namespace and format.

If in one week I see no interest at all (interest in the form of
proactively moving things forward), then I will consider this a closed
issue and will not spend more time trying any form of integration,
since I have done all that was requested here and really got no
feedback.

Tiago

On 7/11/07, Tiago Antão <tiagoantao at gmail.com> wrote:
> Hi,
>
> I had no feedback and it seemed that there was no interest, so I
> decided to start a Python Population Genetics project on google, which
> is going ahead, but still on alpha stages:
> http://code.google.com/p/pypopgen/
> I am doing this on a personal basis for now (I did not even announce
> it anywhere), and so it is advancing at my personal pace and design
> according to my needs.
> I have used it already (or a tiny part of it) on a published
> application ( http://popgen.eu/soft/m4s2 ).
> I am still willing to integrate this on BioPython, but for that some
> interest and feedback would be needed... That would have to happen
> somewhat soon, as the code will have to be adapted to BioPython
> standards and namespace, and once there is a lot of code that will
> be difficult in practice (and after going public it will be
> impossible really).
>
> The "strangest" code that I am doing (and that would need more
> discussion) is one to do asynchronous computation (to be easy to use on
> multicore computers and grids).
>
> Regards,
> Tiago
>
> On 7/11/07, Ralph Haygood <rhaygood at duke.edu> wrote:
> > Peter and Tiago,
> >
> > Hello.  No, I haven't done anything with Tiago's code.  I'm afraid
> > it's pretty far from what I'm working on these days.
> >
> > I still think it would be good for BioPython to include methods for
> > computing basic population-genetical statistics (Watterson's theta,
> > Tajima's D, etc.) from DNA alignments.  I have in mind something like
> > BioPerl's PopGen (http://www.bioperl.org/wiki/HOWTO:PopGen).  My own
> > code is easy to use with a Bio.Align.Generic.Alignment, but it doesn't
> > conform to BioPython's standards for style, testing, or documentation,
> > and I don't know when I'll have time to standardize it.
> >
> > Ralph
> >
> > On Tue, 10 Jul 2007, Peter wrote:
> >
> > > Hi Tiago,
> > >
> > > Have you had any feedback (off the mailing list)?
> > >
> > > Ralph - did you have a chance to look over Tiago's code or discuss this with
> > > him?
> > >
> > > It would be a shame if nothing came from this...
> > >
> > > Peter
> > >
> > > Tiago Antão wrote:
> > >> Hi!
> > >>
> > >> I have submitted another enhancement bug, with support for FDist. It
> > >> allows to generate and parse Fdist files and to control fdist
> > >> applications. There are also a couple of utility functions. FDist is a
> > >> niche application (mainly used to detect selection in animal
> > >> genetics). Not the most fundamental one to support, but it is
> > >> currently one that I am working on, thus, the code.
> > >>
> > >> Regarding my submitted code for GenePop, I have submitted a different
> > >> version on bugzilla.  The main difference is that I moved everything
> > >> from Bio to Bio.PopGen.
> > >>
> > >> Before I continue putting code on bugzilla I would like to know if it
> > >> is worthwhile doing it... Any opinions on the code submitted or if any
> > >> changes are required? I would really like to continue converting my
> > >> code to BioPython, but only if it has any possibility of ending up
> > >> being useful/included in distribution somewhere in the future... ;)
> > >>
> > >> I am currently working on code related to SimCoal2, Arlequin and
> > >> general statistics (Fst, heterozygosity, ...). Which will probably be
> > >> ready quite soon (ie, next two weeks). This is more mainstream than
> > >> FDist
> > >>
> > >> I have some other code lying around mainly related to HapMap, but I
> > >> will only submit it after reviewing and reusing it again. This is more
> > >> distant future ... like a couple of months.
> > >>
> > >> Tiago
> > >
> > >
> > >
>


From bugzilla-daemon at portal.open-bio.org  Fri Jul 13 11:25:32 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 13 Jul 2007 11:25:32 -0400
Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64
In-Reply-To: <bug-2323-42@http.bugzilla.open-bio.org/>
Message-ID: <200707131525.l6DFPWMa011025@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2323


------- Comment #25 from biopython-bugzilla at maubp.freeserve.co.uk  2007-07-13 11:25 EST -------
Changing the Seq object to be a subclass of string might be nice... but perhaps
rather confusing for minority alphabets where the "letters" are not single
characters(*).  More importantly, wouldn't this dramatic change break a lot of
existing scripts? Probably something for the mailing list!

(*) I've never done it, but one example is storing three letter protein
sequences, nice if you have any post translational modifications which cannot
be represented using the single letter scheme.
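
To illustrate the concern with a minimal sketch (StrSeq and split_residues
are hypothetical names for this example only): a str subclass counts and
indexes characters, so a three letter sequence would need extra handling
to behave as a sequence of residues.

```python
class StrSeq(str):
    """Hypothetical Seq-like class that inherits from str."""


def split_residues(text, width=3):
    # Cut a string into fixed-width "letters" (three letter codes here).
    return [text[i:i + width] for i in range(0, len(text), width)]


three_letter = StrSeq("MetAlaGly")
n_characters = len(three_letter)         # counts characters, not residues
first_item = three_letter[0]             # a single character, not "Met"
residues = split_residues(three_letter)  # the three intended "letters"
```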



From biopython-dev at maubp.freeserve.co.uk  Sat Jul 14 06:22:06 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Sat, 14 Jul 2007 11:22:06 +0100
Subject: [Biopython-dev] Bug 2292 - TER lines from Bio.PDBIO
Message-ID: <4698A3CE.7020907@maubp.freeserve.co.uk>

Hi Thomas,

Could you have a look at Biopython Bug 2292 and the suggested patch from 
   Michal Gajda to write TER records in line with the spec:

http://bugzilla.open-bio.org/show_bug.cgi?id=2292

Thanks

Peter


From tiagoantao at gmail.com  Sat Jul 14 12:32:43 2007
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Sat, 14 Jul 2007 17:32:43 +0100
Subject: [Biopython-dev] Population Genetics code
Message-ID: <6d941f120707140932u356c84bel6a9322a2767e6da7@mail.gmail.com>

Hi!

Firstly I would like to thank everybody that answered so positively to
my "rant" about submitting population genetics code to Biopython.

I have a few suggestions on how to progress in a safe and constructive
way with a possible Population Genetics part for biopython.
First of all, the starting point:
1. None of the core developers is working actively in
population genetics
2. Point 1 entails that any code submissions (made by biopython
newbies like me) will not be able to be completely reviewed by
seasoned biopython developers
3. Initially there will only be me submitting code (please correct me
if I am wrong, especially Ralph...)
4. There is already some popgen statistical code in python lying
around e.g. http://www.pypop.org/

Therefore I suggest starting out by doing a small, "safe" project
around a not widely used application (Mark Beaumont's Fdist program
http://www.rubic.rdg.ac.uk/~mab/software.html ). This code is already
done and tested (by myself). I also have test cases (in BioPython
format) for parts of it. The major issue is that it is currently
outside of the Bio.PopGen namespace, so it's not really very major...
I would provide parsers, configuration file generators and utilities
to run the suite of fdist programs.
Why start with such a simple and less relevant application?
1. It's safer to start with something less grand (if it's poorly done it
won't be that serious).
2. There is no python fdist code lying around, so there is no overlap
at all with existing projects
3. This code is already done and being used...

I will provide code, test code, and documentation (probably by adding
stuff to the wiki). Then other people could evaluate what was done,
and we would continue from there to other, more used applications
(Genepop, arlequin, simcoal2, ...) and databases (HapMap,
TableBrowser).

Is this an acceptable way of going ahead? If other people would like
to participate, that would be fantastic...

If my suggestion is rubbish, please also say ;)

Many thanks,
Tiago

From biopython-dev at maubp.freeserve.co.uk  Mon Jul 16 14:27:40 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Mon, 16 Jul 2007 19:27:40 +0100
Subject: [Biopython-dev] Biopython usage figures
Message-ID: <469BB89C.8010904@maubp.freeserve.co.uk>

A little last minute I know, but would anyone have access to the website
download statistics? I'd like to include rough figures for the number of
downloads of the recent releases in the BOSC 2007 talk.

A list of developers with CVS access would be nice too - but I can just
trawl through the logs to spot active people ;)

Peter


From biopython-dev at maubp.freeserve.co.uk  Mon Jul 16 14:50:49 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Mon, 16 Jul 2007 19:50:49 +0100
Subject: [Biopython-dev] Is Bio.Crystal obsolete?
Message-ID: <469BBE09.1000005@maubp.freeserve.co.uk>

I just had a look at the Bio.Crystal module by Katharine Lindner (2002), 
consisting of the single file Bio/Crystal/__init__.py whose preamble states:

 > Hetero, Crystal and Chain exist to represent the NDB Atlas
 > structure.  Atlas is a minimal subset of the PDB format.  Heteo
 > supports a 3 alphameric code. The NDB web interface is located at
 > ...

The old link should probably be updated as it doesn't work, perhaps:
http://ndbserver.rutgers.edu/atlas/index.html

As far as I can see, they now provide their downloads in PDB, CIF and an 
XML file format - and the PDB files look like the full thing to me at first 
glance rather than a minimal subset.

There is a unit test, Tests/test_Crystal.py but no example input files.

This module looks obsolete to me - can we mark it as deprecated after 
checking on the main list that no one uses it (as done for Bio.Kabat back in 
March 2007)?

Peter


From tiagoantao at gmail.com  Wed Jul 18 06:29:08 2007
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Wed, 18 Jul 2007 11:29:08 +0100
Subject: [Biopython-dev] PopGen code
Message-ID: <6d941f120707180329u6bf60c50o8e4868e5a470de2c@mail.gmail.com>

Hi!

Starting today I will begin putting code on CVS regarding Population
Genetics stuff.
I will start by checking in a GenePop parser and test code.
Very soon FDist code will follow.
After that Simcoal stuff, more GenePop stuff, HapMap and UCSC table
browser will follow.
I was not able to read the dev.open-bio.org suggestions as it seems to
have been down for some time.
If any of the senior Biopython developers finds that I am doing
anything seriously wrong, please don't hesitate to contact me
immediately.
I will be putting everything below a PopGen directory in Bio.
Everything except tests, of course ;)

Regards,
Tiago

From biopython-dev at maubp.freeserve.co.uk  Wed Jul 18 17:37:46 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Wed, 18 Jul 2007 22:37:46 +0100
Subject: [Biopython-dev] PopGen code
In-Reply-To: <6d941f120707180329u6bf60c50o8e4868e5a470de2c@mail.gmail.com>
References: <6d941f120707180329u6bf60c50o8e4868e5a470de2c@mail.gmail.com>
Message-ID: <320fb6e00707181437l22b1aecdh8ab5c2fa2aea7380@mail.gmail.com>

Tiago Antão wrote:
> Hi!
>
> Starting today I will begin putting code on CVS regarding Population
> Genetics stuff...
> I will be putting everything below a PopGen directory in Bio.
> Everything except tests, of course ;)

Sounds good :)

If you can write some introductory text to add to the
cookbook/tutorial that would be even better.  If you are not familiar
with LaTeX, then just write it up in plain text and I could add that
to the tutorial with suitable mark-up/formatting on your behalf.

This may be easier to do in chunks as you add new code, or in a large
batch later on - up to you.

Peter


From tiagoantao at gmail.com  Wed Jul 18 18:46:19 2007
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Wed, 18 Jul 2007 23:46:19 +0100
Subject: [Biopython-dev] PopGen code
In-Reply-To: <320fb6e00707181437l22b1aecdh8ab5c2fa2aea7380@mail.gmail.com>
References: <6d941f120707180329u6bf60c50o8e4868e5a470de2c@mail.gmail.com>
	<320fb6e00707181437l22b1aecdh8ab5c2fa2aea7380@mail.gmail.com>
Message-ID: <6d941f120707181546y34e17038nb07106dacae533db@mail.gmail.com>

Hi!

On 7/18/07, Peter <biopython-dev at maubp.freeserve.co.uk> wrote:
> If you can write some introductory text to add to the
> cookbook/tutorial that would be even better.  If you are not familiar
> with LaTeX, then just write it up in plain text and I could add that
> to the tutorial with suitable mark-up/formatting on your behalf.

I agree; in fact it is what I intend to do after having the FDist code in.
I will write mostly in parallel with committing. So the doc should be
more or less aligned with what is being put in CVS...

Regards,
Tiago

From tiagoantao at gmail.com  Thu Jul 19 09:09:29 2007
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Thu, 19 Jul 2007 14:09:29 +0100
Subject: [Biopython-dev] PopGen Documentation
Message-ID: <6d941f120707190609o4f5f7412x94851295865ba22b@mail.gmail.com>

Hi All,

Following Peter's suggestion, I had a closer look at the
documentation, and, if nobody opposes, I would like to add a new
subsection between PDB and Miscellaneous in the cookbook chapter, like
this:

4.10  Going 3D: The PDB module
4.11  PopGen: Population genetics (and genomics)
4.12  Miscellaneous

Tiago


On 7/18/07, Peter <biopython-dev at maubp.freeserve.co.uk> wrote:
> Tiago Antão wrote:
> > Hi!
> >
> > Starting today I will begin putting code on CVS regarding Population
> > Genetics stuff...
> > I will be putting everything below a PopGen directory in Bio.
> > Everything except tests, of course ;)
>
> Sounds good :)
>
> If you can write some introductory text to add to the
> cookbook/tutorial that would be even better.  If you are not familiar
> with LaTeX, then just write it up in plain text and I could add that
> to the tutorial with suitable mark-up/formatting on your behalf.
>
> This may be easier to do in chunks as you add new code, or in a large
> batch later on - up to you.
>
> Peter
>


From bugzilla-daemon at portal.open-bio.org  Sat Jul 21 11:28:49 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 21 Jul 2007 11:28:49 -0400
Subject: [Biopython-dev] [Bug 2269] Shebang (hashbang) lines need cleanup
In-Reply-To: <bug-2269-42@http.bugzilla.open-bio.org/>
Message-ID: <200707211528.l6LFSnBk031498@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2269


------- Comment #8 from mdehoon at ims.u-tokyo.ac.jp  2007-07-21 11:28 EST -------
In biopython/Bio/EUtils, the shebang line was specified in sourcegen.py. I
fixed that line and the shebang lines in the other *.py files under
biopython/Bio/EUtils. Can we close this bug?



From bugzilla-daemon at portal.open-bio.org  Sat Jul 21 11:47:32 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 21 Jul 2007 11:47:32 -0400
Subject: [Biopython-dev] [Bug 2291] __init__.py missing in the Bio.PDB.mmCIF
	folder after the install
In-Reply-To: <bug-2291-42@http.bugzilla.open-bio.org/>
Message-ID: <200707211547.l6LFlWxU032394@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2291


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #1 from mdehoon at ims.u-tokyo.ac.jp  2007-07-21 11:47 EST -------
I have modified setup.py so that Bio.PDB.mmCIF is always a module (but not
necessarily with the MMCIFlex module; users still need to modify setup.py to
include it). With Bio.PDB.mmCIF always present, the Bio/PDB/mmCIF/__init__.py
file is no longer lost.



From bugzilla-daemon at portal.open-bio.org  Sun Jul 22 04:30:11 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 22 Jul 2007 04:30:11 -0400
Subject: [Biopython-dev] [Bug 2269] Shebang (hashbang) lines need cleanup
In-Reply-To: <bug-2269-42@http.bugzilla.open-bio.org/>
Message-ID: <200707220830.l6M8UB6d006746@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2269


------- Comment #9 from biopython-bugzilla at maubp.freeserve.co.uk  2007-07-22 04:30 EST -------
Regarding comment 8, after changing sourcegen.py were you able to regenerate
all the biopython/Bio/EUtils/*.py files? Or did you just fix them by hand?

Anyway - that should leave us with consistent shebang/hashbang lines :)

Unless we also want to remove any surplus lines, and decide if all or none of
the unit tests should have them, then this bug looks done.



From bugzilla-daemon at portal.open-bio.org  Sun Jul 22 05:53:46 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 22 Jul 2007 05:53:46 -0400
Subject: [Biopython-dev] [Bug 2269] Shebang (hashbang) lines need cleanup
In-Reply-To: <bug-2269-42@http.bugzilla.open-bio.org/>
Message-ID: <200707220953.l6M9rkap010929@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2269


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #10 from mdehoon at ims.u-tokyo.ac.jp  2007-07-22 05:53 EST -------
> Regarding comment 8, after changing sourcegen.py were you able to regenerate
> all the biopython/Bio/EUtils/*.py files? Or did you just fix them by hand?

I fixed them by hand. The fixed sourcegen.py should result in the same
biopython/Bio/EUtils/*.py files as I created by hand. I tried regenerating
these files automatically, but that didn't work for me. At some point, somebody
should figure out how the biopython/Bio/EUtils code works.

> Unless we also want to remove any surplus lines, and decide if all or none of
> the unit tests should have them, then this bug looks done.

Since Python itself does not seem to have a clear rule as to which files should
have a shebang line, it is not obvious which Biopython files should have one.
If somebody really wants to fix this, it's probably better to discuss such an
issue on the mailing list first. As the issue raised by the original bug report
has been resolved, I am closing this bug.



From mdehoon at c2b2.columbia.edu  Sun Jul 22 06:28:22 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Sun, 22 Jul 2007 19:28:22 +0900
Subject: [Biopython-dev] Bio.AlignIO (was Re: [BioPython] Bio.SeqIO and
	files with one record)
In-Reply-To: <4693E5FE.708@maubp.freeserve.co.uk>
References: <4693E5FE.708@maubp.freeserve.co.uk>
Message-ID: <46A33146.7030405@c2b2.columbia.edu>

Peter wrote:
> P.S. Any comments on the Bio.AlignIO ideas I raised back in May 2007?
> http://lists.open-bio.org/pipermail/biopython/2007-May/003472.html
> 
Let's discuss the Bio.Align.Alignment class first, and then decide how 
to parse alignment files.

Currently, the alignment class holds a list of SeqRecord objects:


class Alignment:
     ...
     def __init__(self, alphabet):
         ...
         # hold everything at a list of seq record objects
         self._records = []

To get access to self._records, the Alignment class has some accessor 
functions:

     def get_all_seqs(self):
         ...
         return self._records


     def get_seq_by_num(self, number):
         ...
         return self._records[number].seq

A cleaner way to do this is to let the class Alignment inherit from 
list. This also allows us to use all list methods on Alignment objects. 
For example, we can iterate over them, as suggested in this bug report:

http://bugzilla.open-bio.org/show_bug.cgi?id=1944
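
As a rough sketch of what inheriting from list would look like (Record is
a hypothetical stand-in for SeqRecord, and ListAlignment is an illustrative
name, not the existing Bio.Align class):

```python
class Record:
    """Hypothetical stand-in for SeqRecord: just an id and a sequence."""

    def __init__(self, id, seq):
        self.id = id
        self.seq = seq


class ListAlignment(list):
    """Sketch of an alignment that is itself a list of records."""

    def __init__(self, alphabet, records=()):
        list.__init__(self, records)
        self._alphabet = alphabet

    def get_seq_by_num(self, number):
        # The old accessor reduces to ordinary list indexing.
        return self[number].seq


align = ListAlignment("DNA")
align.append(Record("a", "ACGT"))
align.append(Record("b", "AC-T"))

# Iteration, len() and append() all come from list.
ids = [record.id for record in align]
```

One point to keep in mind with this design: slicing a list subclass gives
back a plain list unless __getitem__ is overridden, and methods such as
sort() and reverse() are inherited whether or not they make sense for an
alignment.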

Any objections against letting Alignment inherit from list?


--Michiel

From salish at picasso.ucsf.edu  Sun Jul 22 14:27:58 2007
From: salish at picasso.ucsf.edu (Howard Salis)
Date: Sun, 22 Jul 2007 11:27:58 -0700
Subject: [Biopython-dev] Bio.AlignIO (was Re: [BioPython] Bio.SeqIO and
	files with one record)
In-Reply-To: <46A33146.7030405@c2b2.columbia.edu>
References: <4693E5FE.708@maubp.freeserve.co.uk>
	<46A33146.7030405@c2b2.columbia.edu>
Message-ID: <9fa7e98e0707221127v5b7b2a85x38978fd647e18931@mail.gmail.com>

Hello all,


To get this same behavior, you can also create the __iter__ and next()
methods in Alignment itself.
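
A minimal sketch of that alternative, keeping the list as a member
(Record and the add() helper are hypothetical, for illustration only).
Here __iter__ simply hands back the iterator of the underlying list, so no
hand-written next() method is needed:

```python
class Record:
    """Hypothetical stand-in for SeqRecord: just an id and a sequence."""

    def __init__(self, id, seq):
        self.id = id
        self.seq = seq


class Alignment:
    """Sketch of an alignment that holds its records in a private list."""

    def __init__(self, alphabet):
        self._alphabet = alphabet
        self._records = []

    def add(self, record):
        # Hypothetical helper used only to fill the example.
        self._records.append(record)

    def __iter__(self):
        # Delegate to the list's own iterator.
        return iter(self._records)

    def __len__(self):
        return len(self._records)


align = Alignment("DNA")
align.add(Record("a", "ACGT"))
align.add(Record("b", "AC-T"))
ids = [record.id for record in align]
```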

-Howard Salis

On 7/22/07, Michiel de Hoon <mdehoon at c2b2.columbia.edu> wrote:
> Peter wrote:
> > P.S. Any comments on the Bio.AlignIO ideas I raised back in May 2007?
> > http://lists.open-bio.org/pipermail/biopython/2007-May/003472.html
> >
> Let's discuss the Bio.Align.Alignment class first, and then decide how
> to parse alignment files.
>
> Currently, the alignment class holds a list of SeqRecord objects:
>
>
> class Alignment:
>      ...
>      def __init__(self, alphabet):
>          ...
>          # hold everything at a list of seq record objects
>          self._records = []
>
> To get access to self._records, the Alignment class has some accessor
> functions:
>
>      def get_all_seqs(self):
>          ...
>          return self._records
>
>
>      def get_seq_by_num(self, number):
>          ...
>          return self._records[number].seq
>
> A cleaner way to do this is to let the class Alignment inherit from
> list. This also allows us to use all list methods on Alignment objects.
> For example, we can iterate over them, as suggested in this bug report:
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=1944
>
> Any objections against letting Alignment inherit from list?
>
>
> --Michiel
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>

From mdehoon at c2b2.columbia.edu  Wed Jul 25 09:17:33 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Wed, 25 Jul 2007 22:17:33 +0900
Subject: [Biopython-dev] Bio.AlignIO (was Re: [BioPython] Bio.SeqIO and
 files with one record)
In-Reply-To: <9fa7e98e0707221127v5b7b2a85x38978fd647e18931@mail.gmail.com>
References: <4693E5FE.708@maubp.freeserve.co.uk>	<46A33146.7030405@c2b2.columbia.edu>
	<9fa7e98e0707221127v5b7b2a85x38978fd647e18931@mail.gmail.com>
Message-ID: <46A74D6D.9020309@c2b2.columbia.edu>

Sure, that is possible, but that means we'd be adding methods to 
Alignment in order for it to behave like a list, whereas we can get 
that for free by letting the Alignment class inherit from list.

--Michiel.

Howard Salis wrote:
> Hello all,
> 
> 
> To get this same behavior, you can also create the __iter__ and next()
> methods in Alignment itself.
> 
> -Howard Salis
> 
> On 7/22/07, Michiel de Hoon <mdehoon at c2b2.columbia.edu> wrote:
>> Peter wrote:
>>> P.S. Any comments on the Bio.AlignIO ideas I raised back in May 2007?
>>> http://lists.open-bio.org/pipermail/biopython/2007-May/003472.html
>>>
>> Let's discuss the Bio.Align.Alignment class first, and then decide how
>> to parse alignment files.
>>
>> Currently, the alignment class holds a list of SeqRecord objects:
>>
>>
>> class Alignment:
>>      ...
>>      def __init__(self, alphabet):
>>          ...
>>          # hold everything at a list of seq record objects
>>          self._records = []
>>
>> To get access to self._records, the Alignment class has some accessor
>> functions:
>>
>>      def get_all_seqs(self):
>>          ...
>>          return self._records
>>
>>
>>      def get_seq_by_num(self, number):
>>          ...
>>          return self._records[number].seq
>>
>> A cleaner way to do this is to let the class Alignment inherit from
>> list. This also allows us to use all list methods on Alignment objects.
>> For example, we can iterate over them, as suggested in this bug report:
>>
>> http://bugzilla.open-bio.org/show_bug.cgi?id=1944
>>
>> Any objections against letting Alignment inherit from list?
>>
>>
>> --Michiel


From biopython-dev at maubp.freeserve.co.uk  Wed Jul 25 09:34:02 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Wed, 25 Jul 2007 14:34:02 +0100
Subject: [Biopython-dev] Bio.AlignIO
In-Reply-To: <46A74D6D.9020309@c2b2.columbia.edu>
References: <4693E5FE.708@maubp.freeserve.co.uk>	<46A33146.7030405@c2b2.columbia.edu>	<9fa7e98e0707221127v5b7b2a85x38978fd647e18931@mail.gmail.com>
	<46A74D6D.9020309@c2b2.columbia.edu>
Message-ID: <46A7514A.1090405@maubp.freeserve.co.uk>

Michiel de Hoon wrote:
> Sure, that is possible, but that means we'd be adding methods to 
> Alignment in order for it to behave like a list, whereas we can get 
> that for free by letting the Alignment class inherit from list.
> 
> --Michiel.

Personally I see an alignment as both an array of characters (i.e. amino 
acid residues or nucleotides), and a list of sequences.

In the same way that a Numeric or NumPy array lets you iterate over 
rows, yet also access individual elements, we could allow iteration of 
SeqRecords and also allow access to individual letters.

Peter

From mdehoon at c2b2.columbia.edu  Wed Jul 25 10:44:56 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Wed, 25 Jul 2007 23:44:56 +0900
Subject: [Biopython-dev] Bio.AlignIO
In-Reply-To: <46A7514A.1090405@maubp.freeserve.co.uk>
References: <4693E5FE.708@maubp.freeserve.co.uk>	<46A33146.7030405@c2b2.columbia.edu>	<9fa7e98e0707221127v5b7b2a85x38978fd647e18931@mail.gmail.com>
	<46A74D6D.9020309@c2b2.columbia.edu>
	<46A7514A.1090405@maubp.freeserve.co.uk>
Message-ID: <46A761E8.5080909@c2b2.columbia.edu>

Peter wrote:
> Personally I see an alignment as both an array of characters (i.e. amino 
> acid residues or nucleotides), and a list of sequences.
> 
> In the same way that a Numeric or NumPy array lets you iterate over 
> rows, yet also access individual elements, we could allow iteration of 
> SeqRecords and also allow access to individual letters.

How about the following:

-Iterators iterate over the SeqRecords in the alignment

-An index of the form [xxx] returns the corresponding SeqRecord

-An index of the form [xxx:yyy:zzz] returns an Alignment object 
containing the SeqRecords in rows [xxx:yyy:zzz]
(compare to the current method get_all_seqs()).

-An index of the form [xxx,:] returns the Seq object of the SeqRecord at 
xxx (this is currently done by the get_seq_by_num() method).

-An index of the form [xxx:yyy:zzz,:] returns a list of Seq objects

-An index of the form [:,www] returns a string containing the characters 
  at column www (which is currently done by the get_column method)

-An index of the form [xxx:yyy:zzz,www] returns a string containing the 
characters at column www using only the rows xxx:yyy:zzz.

-An index of the form [xxx,www] returns a string containing the 
character of the sequence in row xxx at column www.

This is more-or-less how Numerical Python arrays work, except that we'll 
be returning SeqRecord/Seq/string objects depending on the indices.
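
The rules above could be sketched roughly as follows. This is only an
illustration under some assumptions: Record is a hypothetical stand-in for
SeqRecord, its seq is a plain string rather than a Seq object, and a column
slice other than ":" (which the list above does not cover) is simply applied
to each sequence.

```python
class Record:
    """Hypothetical stand-in for SeqRecord; seq is a plain string here."""

    def __init__(self, id, seq):
        self.id = id
        self.seq = seq


class Alignment:
    """Sketch of the proposed indexing, not the Bio.Align implementation."""

    def __init__(self, records=()):
        self._records = list(records)

    def __iter__(self):
        return iter(self._records)

    def __getitem__(self, index):
        if not isinstance(index, tuple):
            if isinstance(index, slice):
                return Alignment(self._records[index])  # [xxx:yyy:zzz]
            return self._records[index]                 # [xxx]
        row, col = index
        if isinstance(row, slice):
            rows = self._records[row]
            if isinstance(col, slice):
                return [r.seq[col] for r in rows]       # [xxx:yyy:zzz, :]
            # [:, www] and [xxx:yyy:zzz, www]
            return "".join(r.seq[col] for r in rows)
        # [xxx, :] gives the whole sequence, [xxx, www] one character
        return self._records[row].seq[col]


align = Alignment([Record("a", "ACGT"),
                   Record("b", "AC-T"),
                   Record("c", "A-GT")])
column = align[:, 2]      # one column as a string
first_two = align[0:2]    # a smaller Alignment
```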

--Michiel.

From biopython-dev at maubp.freeserve.co.uk  Wed Jul 25 12:10:43 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Wed, 25 Jul 2007 17:10:43 +0100
Subject: [Biopython-dev] Improving the Alignment object. Was Bio.AlignIO
In-Reply-To: <46A761E8.5080909@c2b2.columbia.edu>
References: <4693E5FE.708@maubp.freeserve.co.uk>	<46A33146.7030405@c2b2.columbia.edu>	<9fa7e98e0707221127v5b7b2a85x38978fd647e18931@mail.gmail.com>	<46A74D6D.9020309@c2b2.columbia.edu>	<46A7514A.1090405@maubp.freeserve.co.uk>
	<46A761E8.5080909@c2b2.columbia.edu>
Message-ID: <46A77603.1030101@maubp.freeserve.co.uk>

Michiel de Hoon wrote:
> Peter wrote:
>> Personally I see an alignment as both an array of characters (i.e. amino 
>> acid residues or nucleotides), and a list of sequences.
>>
>> In the same way that a Numeric or NumPy array lets you iterate over 
>> rows, yet also access individual elements, we could allow iteration of 
>> SeqRecords and also allow access to individual letters.
> 
> How about the following:
> 
> -Iterators iterate over the SeqRecords in the alignment

I agree. And this is trivial to implement without needing the element 
access/slicing support.

As to element access, we've been thinking along similar lines :)
It's just that with all the different special cases, there are lots of 
different possible return types!

> -An index of the form [xxx] returns the corresponding SeqRecord
> -An index of the form [xxx:yyy:zzz] returns an Alignment object 
>  containing the SeqRecords in rows [xxx:yyy:zzz]
>  (compare to the current method get_all_seqs()).

I agree. This is essential to make an alignment act like a list of 
SeqRecord objects when only a one-dimensional index is given.

> -An index of the form [xxx,:] returns the Seq object of the SeqRecord at 
> xxx (this is currently done by the get_seq_by_num() method).
> -An index of the form [xxx:yyy:zzz,:] returns a list of Seq objects

I'm not immediately convinced about returning Seq objects here.  I might 
expect indices like [xxx,:] to return a SeqRecord (not a Seq) and 
[xxx:yyy:zzz,:] to return a sub-alignment (not a list of Seq objects).

> -An index of the form [:,www] returns a string containing the characters 
>  at column www (which is currently done by the get_column method)
> -An index of the form [xxx,www] returns a string containing the 
>  character of the sequence in row xxx at column www.

Those look fine - however we might want to return Seq objects rather 
than strings.

> -An index of the form [xxx:yyy:zzz,www] returns a string containing
>  the characters at column www using only the rows xxx:yyy:zzz.

Or a sub alignment? See later...

> This is more-or-less how Numerical Python arrays work, except that we'll 
> be returning SeqRecord/Seq/string objects depending on the indices.

For comparison, this is what I had been thinking:
* [r,c] means one element is requested, return a single character string
* [r] or [r,:] means one row is requested, return a SeqRecord
* [:,c] means one column is requested, return a string (or Seq object?)
* Otherwise returns a (sub)alignment. Note that [:] or [:,:] would 
return a copy of the alignment.

This would cover slicing of the column index by returning a 
sub-alignment. i.e. indexes of the form [rrr, xxx:yyy:zzz] or 
[rrr:ppp:qqq, xxx:yyy:zzz]

I'm not sure if requests for part of a single row or column like [rrr, 
xxx:yyy:zzz] and [rrr:ppp:qqq, xxx] are best handled by returning 
sub-alignments or as special cases (strings/Seq and Seq/SeqRecord 
respectively?).
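
As a rough illustration of the four rules listed above, here is a minimal sketch of how a single __getitem__ could dispatch on the index types. MiniAlignment is a hypothetical toy class (rows are plain (id, sequence string) tuples standing in for SeqRecord objects); it is not the Bio.Align.Generic code. The two undecided cases are simply treated as sub-alignments here, which is only one of the options discussed.

```python
FULL = slice(None)  # what a bare ":" becomes inside [...]


class MiniAlignment(object):
    """Hypothetical toy alignment: a list of (id, sequence string) rows."""

    def __init__(self, records):
        self.records = list(records)

    def __getitem__(self, index):
        # aln[r, c] arrives as a tuple; aln[r] arrives as a bare int or slice.
        if isinstance(index, tuple):
            row, col = index
        else:
            row, col = index, FULL

        if isinstance(row, int):
            rec_id, seq = self.records[row]
            if isinstance(col, int):
                return seq[col]                  # [r,c]: single character
            if col == FULL:
                return (rec_id, seq)             # [r] or [r,:]: one record
            return MiniAlignment([(rec_id, seq[col])])  # part of one row

        selected = self.records[row]
        if isinstance(col, int) and row == FULL:
            # [:,c]: one whole column as a string
            return "".join(seq[col] for rec_id, seq in selected)
        # Everything else: a (sub)alignment, including a copy for [:] and [:,:]
        return MiniAlignment([(rec_id, seq[col]) for rec_id, seq in selected])


aln = MiniAlignment([("seq1", "ATCGTTGC"),
                     ("seq2", "ATCCTTGC"),
                     ("seq3", "ATCCGTGC")])
print(aln[0, 3])
print(aln[:, 4])
print(aln[0:2, 1:3].records)
```

Keeping all the type decisions in one method makes the list of special cases easy to audit, at the cost of a return type that depends on the index.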

Peter

From bugzilla-daemon at portal.open-bio.org  Thu Jul 26 10:52:38 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 26 Jul 2007 10:52:38 -0400
Subject: [Biopython-dev] [Bug 2340] New: SProt.py fails to parse the current
	Swiss-Prot version 54.0
Message-ID: <bug-2340-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2340

           Summary: SProt.py fails to parse the current Swiss-Prot version
                    54.0
           Product: Biopython
           Version: 1.43
          Platform: All
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: gould at embl.de


Hi, 

I'm running Python 2.3.4 on a Red Hat Linux box and am trying to parse
Swiss-Prot records, but the parser fails on every record without reporting where
it fails. I'm guessing this is related to UniProt Release 54.0 of 24-Jul-07,
which added the new PE line type?



From bugzilla-daemon at portal.open-bio.org  Thu Jul 26 11:46:36 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 26 Jul 2007 11:46:36 -0400
Subject: [Biopython-dev] [Bug 2340] SProt.py fails to parse the current
	Swiss-Prot version 54.0
In-Reply-To: <bug-2340-42@http.bugzilla.open-bio.org/>
Message-ID: <200707261546.l6QFkaGq022472@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2340


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2007-07-26 11:46 EST -------
Hi Kate,

Could you give us the URL of one or two specific SwissProt files you're having
trouble with?

Also how are you trying to read the SwissProt files? e.g. with
Bio.SeqIO.parse()?

If you could include the python error too, that could be helpful. Thanks.

Peter



From bugzilla-daemon at portal.open-bio.org  Thu Jul 26 12:06:15 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 26 Jul 2007 12:06:15 -0400
Subject: [Biopython-dev] [Bug 2340] SProt.py fails to parse the current
	Swiss-Prot version 54.0
In-Reply-To: <bug-2340-42@http.bugzilla.open-bio.org/>
Message-ID: <200707261606.l6QG6FkE023264@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2340


------- Comment #2 from gould at embl.de  2007-07-26 12:06 EST -------
Hi,

The following snippet of code is where the error occurs (this used to work
without problems until something changed in the last day or two, I guess):

def getSequence(self, acc):
    """This method retrieves the most recent annotated sequence from the
    ExPASy server for a given accession number."""

    from Bio.WWW import ExPASy
    from Bio.SwissProt import SProt
    from Bio import File

    if acc != '':
        try:
            results = ExPASy.get_sprot_raw(acc.strip()).read()
            sp_parser = SProt.RecordParser()
            sp_iterator = SProt.Iterator(File.StringHandle(results),
                                         sp_parser)
            Record = sp_iterator.next()
            return Record.sequence.strip()
        except:
            return -1
    else:
        return acc


It breaks at the line "Record = sp_iterator.next()" but doesn't print any error
to the terminal.
Some examples of the accession numbers used are: P01100, P12522 etc.

thanks
Kate



From bugzilla-daemon at portal.open-bio.org  Thu Jul 26 12:32:31 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 26 Jul 2007 12:32:31 -0400
Subject: [Biopython-dev] [Bug 2340] SProt.py fails to parse the current
	Swiss-Prot version 54.0
In-Reply-To: <bug-2340-42@http.bugzilla.open-bio.org/>
Message-ID: <200707261632.l6QGWVrC024560@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2340


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED


------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2007-07-26 12:32 EST -------
Confirming bug - it is due to the new PE line (protein evidence).

The reason you didn't see the error is that, in your example, the parser call is
wrapped in a try ... except ... clause which catches every exception and
returns -1.
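
To illustrate the point, here is a minimal, self-contained sketch. parse_record is a hypothetical stand-in that rejects an unknown line type (it is not the SProt parser), and the sample text is made up. The first wrapper hides the failure the same way the reported snippet does; the second one prints the traceback and re-raises, so the failing line is visible.

```python
import traceback


def parse_record(text):
    """Hypothetical parser stand-in: only accepts ID and SQ lines."""
    for line in text.splitlines():
        if line[:2] not in ("ID", "SQ"):
            raise ValueError("unknown line type: %r" % line[:2])
    return "parsed"


def quiet(text):
    try:
        return parse_record(text)
    except:  # bare except: every error is swallowed, the caller only sees -1
        return -1


def noisy(text):
    try:
        return parse_record(text)
    except Exception:
        traceback.print_exc()  # show where the parser actually failed
        raise


sample = "ID   EXAMPLE\nPE   1: Evidence at protein level;"
print(quiet(sample))
```

Catching only the exceptions you expect (or at least logging the traceback) keeps this kind of format change from failing silently.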



From bugzilla-daemon at portal.open-bio.org  Thu Jul 26 12:51:45 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 26 Jul 2007 12:51:45 -0400
Subject: [Biopython-dev] [Bug 2340] SProt.py fails to parse the current
	Swiss-Prot version 54.0
In-Reply-To: <bug-2340-42@http.bugzilla.open-bio.org/>
Message-ID: <200707261651.l6QGpja8025622@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2340


------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk  2007-07-26 12:51 EST -------
I think I have fixed this - at least your example code now works.

You'll need to update the file Bio/SwissProt/SProt.py to revision 1.38 from
CVS, which you can download here:

http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/SwissProt/SProt.py?cvsroot=biopython

Don't forget to back up the old Bio/SwissProt/SProt.py first, in case you want
to put things back.

Please test this and report back.

NOTE - The fix just makes the parser aware of the new PE line, and ignores it. 
It doesn't (yet) do anything useful with the information it contains!
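
For readers wondering why one new line type can stop a parser, here is a rough sketch of the general idea. It is not the code in SProt.py: the scan function, the abridged set of line codes and the sample lines are all assumptions made for illustration. A strict scanner raises on any two-letter code it does not know, so recognising the new code and skipping it is enough to get parsing going again.

```python
# Abridged, illustrative set of line codes; the real format has many more.
KNOWN_LINE_TYPES = set(["ID", "AC", "DE", "SQ"])
# Codes that are recognised but deliberately skipped.
IGNORED_LINE_TYPES = set(["PE"])


def scan(lines, ignored=IGNORED_LINE_TYPES):
    """Return the lines the scanner understands, skipping ignored codes."""
    kept = []
    for line in lines:
        code = line[:2]
        if code in ignored:
            continue  # aware of the line type, but nothing is done with it
        if code not in KNOWN_LINE_TYPES:
            raise ValueError("unexpected line type %r" % code)
        kept.append(line)
    return kept


record = ["ID   EXAMPLE",
          "AC   X00000;",
          "PE   1: Evidence at protein level;",
          "SQ   SEQUENCE"]
print(scan(record))
```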



From bugzilla-daemon at portal.open-bio.org  Fri Jul 27 02:46:35 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 27 Jul 2007 02:46:35 -0400
Subject: [Biopython-dev] [Bug 2340] SProt.py fails to parse the current
	Swiss-Prot version 54.0
In-Reply-To: <bug-2340-42@http.bugzilla.open-bio.org/>
Message-ID: <200707270646.l6R6kZaI001699@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2340


------- Comment #5 from gould at embl.de  2007-07-27 02:46 EST -------
Yes, it has done the trick and everything works OK again. Thanks.



From bugzilla-daemon at portal.open-bio.org  Fri Jul 27 03:54:14 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 27 Jul 2007 03:54:14 -0400
Subject: [Biopython-dev] [Bug 2340] SProt.py fails to parse the current
	Swiss-Prot version 54.0
In-Reply-To: <bug-2340-42@http.bugzilla.open-bio.org/>
Message-ID: <200707270754.l6R7sEnm007432@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2340


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED


------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk  2007-07-27 03:54 EST -------
Great :)



From kosa at genesilico.pl  Fri Jul 27 06:47:10 2007
From: kosa at genesilico.pl (Jan Kosinski)
Date: Fri, 27 Jul 2007 12:47:10 +0200
Subject: [Biopython-dev] syntax of indices for future Alignment object
Message-ID: <46A9CD2E.6080402@genesilico.pl>

Hi,

As end users, we would like the Python Alignment object to behave
externally like an array, so that we could get slices, columns,
sequences, their fragments, or whatever else we want. The most intuitive
and clear syntax for the user (certainly much clearer than indexes like
[xxx:yyy:zzz]) is the following.

[A:B][X:Y] - general syntax of indices. This supports almost everything.

Several examples of usage and proposed outputs:

[:][:] - returns an alignment or its copy (as Alignment object)

[:][x:y] - returns slice of the alignment (as Alignment object; aln of
all sequences and residues corresponding to columns from x to y)

[a:b][:] - returns the aln of seqs from a to b (as Alignment object)

[a:b][x:y] - returns the slice and subalignment (as Alignment object)

[a:a][x:y] - returns slice of the single sequence (residues x to y of
sequence a) (as Alignment object)
[a][x:y] - returns slice of the single sequence (residues x to y of
sequence a) (as a String)

[a:][x:y] and similar combinations - returns the slice and subalignment,
sequences from a to the last are included (as Alignment object)

[:][x] - returns single column (as a String object? string here could be
very useful)

[:][x:x] - returns single column (as Alignment object)

[a] - returns single sequence (as a SeqRecord object)
[a:a] and [a:a][:] - returns single sequence (as Alignment object)

[m][n] - returns n-th element of sequence m (as a String)

One disputable point is that similar sets of indices return different
types of objects (e.g. [:][x] would return a column as a string while
[:][x:x] would return a column as an Alignment object), but in my
opinion this just extends the usability.

The only problem is the implementation of such calls, which depends on 
what type of object the Alignment object will be.

What do you think?

Cheers,
Jan Kosinski
Grzegorz Papaj




From bugzilla-daemon at portal.open-bio.org  Fri Jul 27 08:51:10 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 27 Jul 2007 08:51:10 -0400
Subject: [Biopython-dev] [Bug 1944] Align.Generic adding iterator and more
In-Reply-To: <bug-1944-42@http.bugzilla.open-bio.org/>
Message-ID: <200707271251.l6RCpAIg025706@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1944


------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk  2007-07-27 08:51 EST -------
Created an attachment (id=721)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=721&action=view)
Patch for Bio/Align/Generic.py to add __getitem__ method

This patch adds a __getitem__ method, a small "mini test" when running the
module directly, and updates the doc strings.  This gives SeqRecord iteration
"for free" (without an explicit __iter__ method).

As discussed on the mailing list, this allows an Alignment object to be treated
as a list of SeqRecord objects or as an array of character strings - plus
extract whole columns as strings.

Quoting the proposed __getitem__ doc string:

        Depending on the indices, you can get a SeqRecord object
        (representing a single row), a string (for a single column or
        a single character) or another alignment (representing all or
        part of the alignment).

        align[r,c] gives a single character as a string
        align[r] gives a SeqRecord
        align[:,c] gives a column as a string
        align[:] and align[:,:] give a copy of the alignment

        Anything else gives a sub alignment, e.g.
        align[0:2] or align[0:2,:] uses only row 0 and 1
        align[:,1:3] uses only columns 1 and 2
        align[0:2,1:3] uses only rows 0 & 1 and only cols 1 & 2

Feedback welcome - either here, or on the developers' mailing list.  Thanks



From biopython-dev at maubp.freeserve.co.uk  Fri Jul 27 08:18:21 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Fri, 27 Jul 2007 13:18:21 +0100
Subject: [Biopython-dev] syntax of indices for future Alignment object
In-Reply-To: <46A9CD2E.6080402@genesilico.pl>
References: <46A9CD2E.6080402@genesilico.pl>
Message-ID: <46A9E28D.40609@maubp.freeserve.co.uk>

Jan Kosinski wrote:
> Hi,
> 
>  From the viewpoint of the enduser we would like python Alignment object
> to behave outside as an array so we could get slices, columns,
> sequences, their fragments, whatever we want etc. The most intuitive and
> clear (certainly much better than not very clear indexes like
> [xxx:yyy:zzz]) for the  user is the following.
> 
> [A:B][X:Y] - general syntax of indices. This supports almost everything.

I think Michiel and I were suggesting [A:B,X:Y] or rather [A:B:C,X:Y:Z] 
to be fully general, rather than [A:B][X:Y] or [A:B:C][X:Y:Z]

i.e. [arg1, arg2] rather than [arg1][arg2]

This is an important point, as in the first case the __getitem__ method 
of the alignment is called once (with both arguments). In the second 
case, the __getitem__ method is called with arg1, and may return a 
SeqRecord or an alignment - and this object's __getitem__ method is 
called with arg2.

As written, many of your cases appear to be impossible - but using the 
[arg1,arg2] we can get close.
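
The difference between the two spellings can be seen with a small recording class. Recorder is a hypothetical helper written for this illustration only; it logs every __getitem__ call and the key it received.

```python
class Recorder(object):
    """Hypothetical helper: logs each __getitem__ call it receives."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def __getitem__(self, key):
        self.log.append((self.name, key))
        # Return another recorder so that a chained [..][..] can be observed.
        return Recorder(self.name + "[...]", self.log)


log = []
aln = Recorder("aln", log)

aln[1:3, 0:5]    # one call, the key is a tuple of two slice objects
aln[1:3][0:5]    # two calls, the second goes to whatever the first returned

for entry in log:
    print(entry)
```

With [arg1,arg2] the alignment sees both indices at once and can decide what to return; with [arg1][arg2] the second index is handled by the object returned for the first, which never knows it was part of a two-dimensional request.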

I've got a working bit of code put together now which I'll attach to 
bug 1944 soon.

http://bugzilla.open-bio.org/show_bug.cgi?id=1944

Peter


From kosa at genesilico.pl  Fri Jul 27 10:13:24 2007
From: kosa at genesilico.pl (Jan Kosinski)
Date: Fri, 27 Jul 2007 16:13:24 +0200
Subject: [Biopython-dev] syntax of indices for future Alignment object
In-Reply-To: <46A9E28D.40609@maubp.freeserve.co.uk>
References: <46A9CD2E.6080402@genesilico.pl>
	<46A9E28D.40609@maubp.freeserve.co.uk>
Message-ID: <46A9FD84.4080502@genesilico.pl>

Hi,

Ok, I agree that [A:B][X:Y] syntax is not possible here. [A:B,X:Y] is 
fine. However, I would recommend not using [A:B:C,X:Y:Z] since it is not 
clear while the [A:B][X:Y] syntax is clear and sufficient.

We had another discussion in the lab, in which it was suggested that the 
Alignment object should store records not in a list but rather in a 
dictionary (while keeping information about the sequence order). What is 
your reasoning for making the Alignment object a list of SeqRecord objects?
One should think carefully about the design of the Alignment class since it 
will influence all further steps. As the class is now in its infancy, 
this is a very good moment to think about what the Alignment class is for 
and what it should support. For instance, the Alignment object should 
support changing characters in the alignment without needing to copy 
it (using aln[a][x] = "D"). Can this be done now with an Alignment which is 
a list of SeqRecord objects with sequences implemented as immutable Seq 
objects?

Cheers,
Jan Kosinski




From kosa at genesilico.pl  Fri Jul 27 10:35:15 2007
From: kosa at genesilico.pl (Jan Kosinski)
Date: Fri, 27 Jul 2007 16:35:15 +0200
Subject: [Biopython-dev] syntax of indices for future Alignment object
In-Reply-To: <46A9FD84.4080502@genesilico.pl>
References: <46A9CD2E.6080402@genesilico.pl>
	<46A9E28D.40609@maubp.freeserve.co.uk>
	<46A9FD84.4080502@genesilico.pl>
Message-ID: <46AA02A3.30000@genesilico.pl>

Hi,

Sorry for a typo ;-) Of course it should read:
"... while the [A:B,X:Y] syntax is clear and sufficient."

Cheers,
Janek



From biopython-dev at maubp.freeserve.co.uk  Fri Jul 27 13:11:03 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Fri, 27 Jul 2007 18:11:03 +0100
Subject: [Biopython-dev] Improving the Alignment object
In-Reply-To: <46A9FD84.4080502@genesilico.pl>
References: <46A9CD2E.6080402@genesilico.pl>	<46A9E28D.40609@maubp.freeserve.co.uk>
	<46A9FD84.4080502@genesilico.pl>
Message-ID: <46AA2727.103@maubp.freeserve.co.uk>

Jan Kosinski wrote:
> We had another discussion in the lab about that Alignment object should 
> not store records in the list but rather in a dictionary (but keeping 
> information about sequence order ) or so.  What is you reasoning for 
> making Alignment object a list of SeqRecord objects?

In a sense the Bio.Align.Generic.Alignment object always was a list of 
SeqRecords (if you look at the internal implementation that is), and I 
hadn't stopped to really question it. I like having list like behaviour 
and exploit this in a lot of my code dealing with alignments.

There are some nice things about having dictionary like behaviour in an 
alignment class, but unless a notional sequence order is preserved, this 
breaks the array of characters model.

Also, using a dictionary like alignment would force the user to specify 
unique keys for each record (e.g. the record.id) which is something the 
current list-like-alignment does not require.

Perhaps we could have a "dictionary like" sub class of Alignment where 
the __getitem__ method would allow a record identifier in place of a row 
index:

print aln["P3454"]
print aln["P3454", 20]

instead of, or as well as:

print aln[10]
print aln[10, 20]

> One should carefully think about design of the Alignment class since it 
> will influence all further steps. As now the class is in its infancy 
> there is a very good moment for thinking what the Alignment class is for 
> and what it should support.

I had viewed the new __getitem__ method as a backwards compatible 
enhancement of the existing stable (but rather limited) 
Bio.Align.Generic.Alignment class. That's not to say we can't design a new 
class from scratch - I just prefer gradual improvements without breaking 
existing usage.

I am particularly keen to allow slicing of alignments. For example, you 
could select the conserved core of an alignment by removing the leftmost 
10 columns and the rightmost 10 columns:

align_core = aln[:,10:-10]

> For instance, the Alignment object should
> support changing characters in the alignment without a need of copying 
> it (using  aln[a,x] = "D"). Can it be done now with Alignment which is 
> a list of SeqRecord objects with sequences implemented as immutable Seq 
> objects ?

No, right now you can't easily edit sequences in a 
Bio.Align.Generic.Alignment (even with the proposed change) as it is 
implemented using immutable Seq objects. I personally haven't needed to 
edit an alignment like this.  Is this something you want to do often?

To me the obvious way to handle this is to have a MutableAlignment 
sub-class, where editing individual elements with aln[r,c] = "D" would 
be supported (possibly implemented using the MutableSeq class internally 
rather than the immutable Seq class).

On a related point, I was planning to raise the following suggestion in 
the future - adding alignments, like this:

combined_aln = aln1 + aln2

e.g. aln1 had 5 rows of length 10, and aln2 had 5 rows of length 15, 
then the result of aln1+aln2 would have 5 rows of length 25.

Alignment addition would only be defined for alignments with the same 
number of rows (perhaps also restricted to the same sequence type, and 
row weights?). The result would contain the same number of rows, where 
each sequence was the concatenation of the corresponding two rows in the 
input alignments. I'd suggest concatenating the record.id's (if 
different) however one could argue that it would be better to insist the 
user had made sure the two alignments had consistent identifiers.

An example of where this could be used is taking alignments of multiple 
sets of homologous genes, sorting them to use the same species order, 
and then creating a concatenated alignment for robust phylogenetic tree 
construction.
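
A minimal sketch of the proposed addition, under stated assumptions: ConcatAlignment is a hypothetical toy class whose rows are (id, sequence string) tuples, differing ids are joined with "/" (the separator is an arbitrary choice, the proposal does not specify one), and alphabets and row weights are not modelled at all.

```python
class ConcatAlignment(object):
    """Hypothetical toy alignment supporting row-wise concatenation."""

    def __init__(self, records):
        self.records = list(records)  # (id, sequence string) tuples

    def __add__(self, other):
        if len(self.records) != len(other.records):
            raise ValueError("alignments must have the same number of rows")
        combined = []
        for (id1, seq1), (id2, seq2) in zip(self.records, other.records):
            # Keep a shared id as it is; join differing ids (assumed "/").
            new_id = id1 if id1 == id2 else id1 + "/" + id2
            combined.append((new_id, seq1 + seq2))
        return ConcatAlignment(combined)


names = ["sp1", "sp2", "sp3", "sp4", "sp5"]
aln1 = ConcatAlignment([(name, "A" * 10) for name in names])
aln2 = ConcatAlignment([(name, "C" * 15) for name in names])

combined_aln = aln1 + aln2
print(len(combined_aln.records))
print(len(combined_aln.records[0][1]))
```

Rows are paired purely by position here, which is why the two alignments would need to be sorted into the same species order first.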

Peter


From mdehoon at c2b2.columbia.edu  Fri Jul 27 22:57:05 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Sat, 28 Jul 2007 11:57:05 +0900
Subject: [Biopython-dev] syntax of indices for future Alignment object
In-Reply-To: <46A9FD84.4080502@genesilico.pl>
References: <46A9CD2E.6080402@genesilico.pl>	<46A9E28D.40609@maubp.freeserve.co.uk>
	<46A9FD84.4080502@genesilico.pl>
Message-ID: <46AAB081.30609@c2b2.columbia.edu>

Jan Kosinski wrote:
> Hi,
> 
> Ok, I agree that [A:B][X:Y] syntax is not possible here. [A:B,X:Y] is 
> fine. However, I would recommend not using [A:B:C,X:Y:Z] since it is not 
> clear while the [A:B][X:Y] syntax is clear and sufficient.

Python lists, tuples, and strings support [A:B:C], and Numerical Python 
2D arrays support [A:B:C,X:Y:Z]. I don't see why the Alignment should 
not support this format.

--Michiel.

From mdehoon at c2b2.columbia.edu  Fri Jul 27 23:10:06 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Sat, 28 Jul 2007 12:10:06 +0900
Subject: [Biopython-dev] Improving the Alignment object
In-Reply-To: <46AA2727.103@maubp.freeserve.co.uk>
References: <46A9CD2E.6080402@genesilico.pl>	<46A9E28D.40609@maubp.freeserve.co.uk>	<46A9FD84.4080502@genesilico.pl>
	<46AA2727.103@maubp.freeserve.co.uk>
Message-ID: <46AAB38E.50009@c2b2.columbia.edu>

Peter wrote:
> Perhaps we could have a "dictionary like" sub class of Alignment where 
> the __getitem__ method would allow a record identifier in place of a row 
> index:
> 
> print aln["P3454"]
> print aln["P3454", 20]
> 
> instead or as well as:
> 
> print aln[10]
> print aln[10, 20]

"as well as" would break if a user decides to use an integer as a key in 
the dictionary. A safer approach would be to define a method 
specifically for dictionary-like access. Something like:

print aln[10]
print aln[10,20]

for list-like access, and

print aln.get("P3454")

for dictionary-like access.
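
A small sketch of that split, assuming a hypothetical KeyedAlignment class with (id, sequence string) rows. The default argument and the "first match wins" behaviour for duplicate ids are assumptions borrowed from dict.get, not something proposed in the thread. The second record deliberately uses the integer 0 as its id to show why mixing both kinds of lookup in [] would be ambiguous.

```python
class KeyedAlignment(object):
    """Hypothetical toy alignment: [] is positional, get() is by identifier."""

    def __init__(self, records):
        self._records = list(records)  # (id, sequence string) tuples

    def __getitem__(self, row):
        return self._records[row]      # always a row number (or slice)

    def get(self, rec_id, default=None):
        for record in self._records:
            if record[0] == rec_id:
                return record          # first record with a matching id
        return default


aln = KeyedAlignment([("P3454", "ATCGTTGC"), (0, "ATCCTTGC")])
print(aln[0])        # row 0, whatever its id is
print(aln.get(0))    # the record whose id is 0, which sits in row 1
print(aln.get("P3454"))
```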

--Michiel.

From mdehoon at c2b2.columbia.edu  Sat Jul 28 00:11:03 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Sat, 28 Jul 2007 13:11:03 +0900
Subject: [Biopython-dev] syntax of indices for future Alignment object
In-Reply-To: <46A9E28D.40609@maubp.freeserve.co.uk>
References: <46A9CD2E.6080402@genesilico.pl>
	<46A9E28D.40609@maubp.freeserve.co.uk>
Message-ID: <46AAC1D7.8030208@c2b2.columbia.edu>

Peter wrote:
> I've got a working bit of code put together now which I'll attached to 
> bug 1944 soon.
> 
> http://bugzilla.open-bio.org/show_bug.cgi?id=1944
> 
For the most part, I agree with the functionality in this patch. I have 
three suggestions though:

 >>> aln = Alignment(alphabet)
# Suggestion 1: We should allow creating an Alignment without specifying 
an alphabet

 >>> aln.add_sequence("seq1", "ATCGTTGC")
 >>> aln.add_sequence("seq2", "ATCCTTGC")
 >>> aln.add_sequence("seq3", "ATCCGTGC")
 >>> aln[0]
SeqRecord(seq=Seq('ATCGTTGC', Alphabet()), id='<unknown id>', 
name='<unknown name>', description='seq1', dbxrefs=[])
# Suggestion 2: I would expect "seq1" as the id rather than the description

 >>> aln[:2]
<Bio.Align.Generic.Alignment instance at 0x10aaeb8>
# OK
 >>> aln[:,4]
'TTG'
# OK
 >>> aln[2,:]
<Bio.Align.Generic.Alignment instance at 0x105efd0>
# Suggestion 3: Here, I would expect "ATCCGTGC" instead. An alignment 
consisting of a single sequence doesn't make much sense.

--Michiel.

From mdehoon at c2b2.columbia.edu  Sat Jul 28 00:20:24 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Sat, 28 Jul 2007 13:20:24 +0900
Subject: [Biopython-dev] Improving the Alignment object
In-Reply-To: <46AA2727.103@maubp.freeserve.co.uk>
References: <46A9CD2E.6080402@genesilico.pl>	<46A9E28D.40609@maubp.freeserve.co.uk>	<46A9FD84.4080502@genesilico.pl>
	<46AA2727.103@maubp.freeserve.co.uk>
Message-ID: <46AAC408.2050703@c2b2.columbia.edu>

Peter wrote:
>> For instance, the Alignment object should
>> support changing characters in the alignment without a need of copying 
>> it (using  aln[a,x] = "D"). Can it be done now with Alignment which is 
>> a list of SeqRecord objects with sequences implemented as immutable Seq 
>> objects ?
> 
....
> 
> To me the obvious way to handle this is to have a MutableAlignment 
> sub-class, where editing individual elements with aln[r,c] = "D" would 
> be supported (possibly implemented using the MutableSeq class internally 
> rather than the immutable Seq class).
> 
I don't think we'd need a separate MutableAlignment for that. An 
Alignment is a list of sequences and is therefore mutable. If we add a 
__setitem__ method to the Alignment class, then this method can take 
care of constructing a new sequence and put it in the appropriate row.
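
A minimal sketch of that idea, assuming a hypothetical EditableAlignment class in which plain Python strings stand in for the immutable Seq objects. The __setitem__ method builds a new string and stores it in the row; the old string is never modified.

```python
class EditableAlignment(object):
    """Hypothetical toy alignment whose rows hold immutable strings."""

    def __init__(self, records):
        self.records = list(records)  # (id, sequence string) tuples

    def __getitem__(self, index):
        row, col = index
        return self.records[row][1][col]

    def __setitem__(self, index, letter):
        row, col = index
        rec_id, seq = self.records[row]
        if len(letter) != 1:
            raise ValueError("expected a single character")
        if col < 0:
            col += len(seq)           # allow negative column indices
        if not 0 <= col < len(seq):
            raise IndexError("column index out of range")
        # Build a new immutable sequence and put it in the same row.
        self.records[row] = (rec_id, seq[:col] + letter + seq[col + 1:])


original = "ATCCTTGC"
aln = EditableAlignment([("seq1", "ATCGTTGC"), ("seq2", original)])
aln[1, 3] = "D"
print(aln.records[1])
print(original)      # the old string object is untouched
```

The row list is the only thing mutated, so the alignment stays the same length and code holding a reference to the old sequence still sees the old value.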

--Michiel.

From biopython-dev at maubp.freeserve.co.uk  Sat Jul 28 06:04:04 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Sat, 28 Jul 2007 11:04:04 +0100
Subject: [Biopython-dev] syntax of indices for future Alignment object
In-Reply-To: <46AAC1D7.8030208@c2b2.columbia.edu>
References: <46A9CD2E.6080402@genesilico.pl>	<46A9E28D.40609@maubp.freeserve.co.uk>
	<46AAC1D7.8030208@c2b2.columbia.edu>
Message-ID: <46AB1494.301@maubp.freeserve.co.uk>

Michiel de Hoon wrote:
> Peter wrote:
>> I've got a working bit of code put together now which I'll attach to 
>> bug 1944 soon.
>>
>> http://bugzilla.open-bio.org/show_bug.cgi?id=1944
>>
> For the most part, I agree with the functionality in this patch. I have 
> three suggestions though:
> 
>  >>> aln = Alignment(alphabet)
> # Suggestion 1: We should allow creating an Alignment without specifying 
> an alphabet

That would mean changing the existing __init__ from:

def __init__(self, alphabet):

to something like:

def __init__(self, alphabet=single_letter_alphabet):

with this import statement added:

from Bio.Alphabet import single_letter_alphabet

This seems like a good idea, and shouldn't break any existing code either.
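
A small self-contained sketch of why a default argument keeps existing callers working. The Alphabet class and single_letter_alphabet object below are stand-ins defined only for this illustration, so the snippet does not depend on Bio.Alphabet:

```python
class Alphabet:
    """Stand-in for an alphabet class, for illustration only."""


# Stand-in for the module-level default alphabet object.
single_letter_alphabet = Alphabet()


class SketchAlignment:
    """Hypothetical stand-in for Alignment."""

    def __init__(self, alphabet=single_letter_alphabet):
        self._alphabet = alphabet
        self._records = []


# New style: no alphabet given, the default is used.
default_aln = SketchAlignment()

# Old style: callers that pass an alphabet explicitly are unaffected.
explicit_aln = SketchAlignment(Alphabet())
```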

>  >>> aln.add_sequence("seq1", "ATCGTTGC")
>  >>> aln.add_sequence("seq2", "ATCCTTGC")
>  >>> aln.add_sequence("seq3", "ATCCGTGC")
>  >>> aln[0]
> SeqRecord(seq=Seq('ATCGTTGC', Alphabet()), id='<unknown id>', 
> name='<unknown name>', description='seq1', dbxrefs=[])
> # Suggestion 2: I would expect "seq1" as the id rather than the description

I agree with you here - this is the historic behaviour of the
add_sequence method which actually creates a SeqRecord from the strings
it is given. I would suggest it populate the record.id but for backwards
compatibility still populate the record.description in case anyone is
still using that.
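
Roughly, the change would look like the sketch below. SketchRecord is a hypothetical stand-in for SeqRecord with only the fields needed here, and the real add_sequence takes further optional arguments that are left out:

```python
from dataclasses import dataclass


@dataclass
class SketchRecord:
    """Hypothetical stand-in for SeqRecord."""
    seq: str
    id: str = "<unknown id>"
    description: str = "<unknown description>"


class SketchAlignment:
    """Hypothetical stand-in for Alignment."""

    def __init__(self):
        self._records = []

    def add_sequence(self, descriptor, sequence):
        # Populate the id (new behaviour) and keep populating the
        # description (historic behaviour) for backwards compatibility.
        record = SketchRecord(sequence, id=descriptor, description=descriptor)
        self._records.append(record)


aln = SketchAlignment()
aln.add_sequence("seq1", "ATCGTTGC")
print(aln._records[0].id)           # seq1
print(aln._records[0].description)  # seq1
```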

We also could add an add_record method to the alignment object which
takes a SeqRecord, plus optional weight (and start and end?). Marc
Colosimo also made this point on bug 1944 (although I don't like his
mixed case method name).

>  >>> aln[:2]
> <Bio.Align.Generic.Alignment instance at 0x10aaeb8>
> # OK
>  >>> aln[:,4]
> 'TTG'
> # OK
>  >>> aln[2,:]
> <Bio.Align.Generic.Alignment instance at 0x105efd0>
> # Suggestion 3: Here, I would expect "ATCCGTGC" instead. An alignment 
> consisting of a single sequence doesn't make much sense.

I'll have a closer look, but as aln[2] returns a single SeqRecord maybe
aln[2,:] should do that too - rather than returning a string.

Peter


From biopython-dev at maubp.freeserve.co.uk  Sat Jul 28 09:14:43 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Sat, 28 Jul 2007 14:14:43 +0100
Subject: [Biopython-dev] Improving the Alignment object
In-Reply-To: <46AAC408.2050703@c2b2.columbia.edu>
References: <46A9CD2E.6080402@genesilico.pl>	<46A9E28D.40609@maubp.freeserve.co.uk>	<46A9FD84.4080502@genesilico.pl>	<46AA2727.103@maubp.freeserve.co.uk>
	<46AAC408.2050703@c2b2.columbia.edu>
Message-ID: <46AB4143.5070406@maubp.freeserve.co.uk>

Michiel de Hoon wrote:
> Peter wrote:
>>> For instance, the Alignment object should
>>> support changing characters in the alignment without a need of copying 
>>> it (using  aln[a,x] = "D"). Can it be done now with Alignment which is 
>>> a list of SeqRecord objects with sequences implemented as immutable Seq 
>>> objects ?
> ....
>> To me the obvious way to handle this is to have a MutableAlignment 
>> sub-class, where editing individual elements with aln[r,c] = "D" would 
>> be supported (possibly implemented using the MutableSeq class internally 
>> rather than the immutable Seq class).
>>
> I don't think we'd need a separate MutableAlignment for that. An 
> Alignment is a list of sequences and is therefore mutable. If we add a 
> __setitem__ method to the Alignment class, then this method can take 
> care of constructing a new sequence and put it in the appropriate row.
> 
So rather than editing one character of a MutableSeq, we could replace 
one immutable Seq object with a new immutable Seq object where one 
character was different? That would work - sounds a little slow, but 
certainly possible.

Peter


From mdehoon at c2b2.columbia.edu  Sat Jul 28 11:15:49 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Sun, 29 Jul 2007 00:15:49 +0900
Subject: [Biopython-dev] syntax of indices for future Alignment object
In-Reply-To: <46AB1494.301@maubp.freeserve.co.uk>
References: <46A9CD2E.6080402@genesilico.pl>	<46A9E28D.40609@maubp.freeserve.co.uk>
	<46AAC1D7.8030208@c2b2.columbia.edu>
	<46AB1494.301@maubp.freeserve.co.uk>
Message-ID: <46AB5DA5.6050604@c2b2.columbia.edu>

# Current method to add a row to the alignment:
>>> aln.add_sequence("seq1", "ATCGTTGC")
...

Peter wrote:
> We also could add an add_record method to the alignment object which
> takes a SeqRecord, plus optional weight (and start and end?). Marc
> Colosimo also made this point on bug 1944 (although I don't like his
> mixed case method name).

This is Marc Colosimo's suggestion for adding a SeqRecord:
     def addSeqRecord(self, seqRec):
         """Add a Sequence Record to the Alignment

         @param seqRec: a sequence record (SeqRecord) to add.
         """
         if isinstance(seqRec, SeqRecord):
             self._records.append(seqRec)
         else:
             raise TypeError("sequence is NOT a SeqRecord Object")

Since an Alignment is essentially a list of SeqRecords, I propose that 
we call the method to add a row to this list "append". In addition, this 
method should be able to take a SeqRecord, a Seq object, or a plain 
string. Something like this:

     def append(self, sequence):
         if isinstance(sequence, SeqRecord):
             self._records.append(sequence)
         elif isinstance(sequence, Seq):
             self._records.append(SeqRecord(sequence))
         elif isinstance(sequence, str):
             self._records.append(SeqRecord(Seq(sequence)))
         else:
             raise TypeError("sequence should be a string, a Seq Object, "
                             "or a SeqRecord object")

This method can be generalized to allow a descriptor, weight, start, and 
end, just like in the current add_sequence method. Then we can replace 
add_sequence and addSeqRecord by a single append method.

--Michiel.

From mdehoon at c2b2.columbia.edu  Sat Jul 28 11:17:52 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Sun, 29 Jul 2007 00:17:52 +0900
Subject: [Biopython-dev] syntax of indices for future Alignment object
In-Reply-To: <46AB1494.301@maubp.freeserve.co.uk>
References: <46A9CD2E.6080402@genesilico.pl>	<46A9E28D.40609@maubp.freeserve.co.uk>
	<46AAC1D7.8030208@c2b2.columbia.edu>
	<46AB1494.301@maubp.freeserve.co.uk>
Message-ID: <46AB5E20.5090605@c2b2.columbia.edu>

Peter wrote:
> Michiel de Hoon wrote:
>>  >>> aln.add_sequence("seq1", "ATCGTTGC")
>>  >>> aln[0]
>> SeqRecord(seq=Seq('ATCGTTGC', Alphabet()), id='<unknown id>', 
>> name='<unknown name>', description='seq1', dbxrefs=[])
>> # Suggestion 2: I would expect "seq1" as the id rather than the 
>> description
> 
> I agree with you here - this is the historic behaviour of the
> add_sequence method which actually creates a SeqRecord from the strings
> it is given. I would suggest it populate the record.id but for backwards
> compatibility still populate the record.description in case anyone is
> still using that.
> 
That sounds good to me.

--Michiel.

From mdehoon at c2b2.columbia.edu  Sat Jul 28 11:23:51 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Sun, 29 Jul 2007 00:23:51 +0900
Subject: [Biopython-dev] Improving the Alignment object
In-Reply-To: <46AB4143.5070406@maubp.freeserve.co.uk>
References: <46A9CD2E.6080402@genesilico.pl>	<46A9E28D.40609@maubp.freeserve.co.uk>	<46A9FD84.4080502@genesilico.pl>	<46AA2727.103@maubp.freeserve.co.uk>
	<46AAC408.2050703@c2b2.columbia.edu>
	<46AB4143.5070406@maubp.freeserve.co.uk>
Message-ID: <46AB5F87.1090506@c2b2.columbia.edu>

Peter wrote:
> Michiel de Hoon wrote:
>> Peter wrote:
>>>> For instance, the Alignment object should
>>>> support changing characters in the alignment without a need of 
>>>> copying it (using  aln[a,x] = "D"). Can it be done now with 
>>>> Alignment which is a list of SeqRecord objects with sequences 
>>>> implemented as immutable Seq objects ?
>> ....
>>> To me the obvious way to handle this is to have a MutableAlignment 
>>> sub-class, where editing individual elements with aln[r,c] = "D" 
>>> would be supported (possibly implemented using the MutableSeq class 
>>> internally rather than the immutable Seq class).
>>>
>> I don't think we'd need a separate MutableAlignment for that. An 
>> Alignment is a list of sequences and is therefore mutable. If we add a 
>> __setitem__ method to the Alignment class, then this method can take 
>> care of constructing a new sequence and put it in the appropriate row.
>>
> So rather than editing one character of a MutableSeq, we could replace 
> one immutable Seq object with a new immutable Seq object where one 
> character was different? That would work - sounds a little slow, but 
> certainly possible.
> 
At first, I also thought that that would be slow, especially for long 
sequences. But in practice, it's surprisingly fast. Unless somebody 
wants to edit an alignment of chromosome-size sequences, we probably 
won't run into a speed problem.
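
One way to check this on your own machine is a quick timing with timeit. This is only a sketch using plain Python strings in place of Seq objects; the sequence length and repeat count are arbitrary choices, and no particular timing is claimed:

```python
import timeit


def replace_char(sequence, position, letter):
    """Return a new string with one character replaced."""
    return sequence[:position] + letter + sequence[position + 1:]


# One million characters, far longer than a typical alignment row.
sequence = "ACGT" * 250_000
repeats = 100

elapsed = timeit.timeit(
    lambda: replace_char(sequence, 500_000, "D"), number=repeats
)
print("%.3f ms per replacement" % (elapsed / repeats * 1000))
```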

--Michiel.

From biopython-dev at maubp.freeserve.co.uk  Sat Jul 28 12:00:34 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Sat, 28 Jul 2007 17:00:34 +0100
Subject: [Biopython-dev] adding rows to an alignment object
In-Reply-To: <46AB5DA5.6050604@c2b2.columbia.edu>
References: <46A9CD2E.6080402@genesilico.pl>	<46A9E28D.40609@maubp.freeserve.co.uk>	<46AAC1D7.8030208@c2b2.columbia.edu>	<46AB1494.301@maubp.freeserve.co.uk>
	<46AB5DA5.6050604@c2b2.columbia.edu>
Message-ID: <46AB6822.6090706@maubp.freeserve.co.uk>

Michiel de Hoon wrote:
> Since an Alignment is essentially a list of SeqRecords, I propose that 
> we call the method to add a row to this list "append".

Sounds very sensible.

 > In addition, this method should be able to take a SeqRecord, a Seq
 > object, or a plain string.

Do you really think we should complicate things like this? I would just 
accept SeqRecord objects (with optional start/end/weight).

> Something like this:
> 
>      def append(self, sequence):
>          if isinstance(sequence, SeqRecord):
>              self._records.append(sequence)
>          elif isinstance(sequence, Seq):
>              self._records.append(SeqRecord(sequence))
>          elif isinstance(sequence, str):
>              self._records.append(SeqRecord(Seq(sequence)))
>          else:
>              raise TypeError("sequence should be a string, a Seq Object, "
>                              "or a SeqRecord object")

One minor point - we should use the alignment's alphabet when building a 
Seq object from a string. Perhaps we should even check the alphabet when 
asked to append a SeqRecord or Seq object...

 > This method can be generalized to allow a descriptor, weight, start,
 > end, just like in the current add_sequence method.

Where the descriptor is expected for Seq and string input, and used as 
the SeqRecord's id?

I would personally check the length matches the rest of the alignment 
(something the current add_sequence method doesn't do), otherwise it's 
very easy to get a malformed alignment where some sequences are longer 
than others.
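
A sketch of an append method with such a length check, under the assumption that a descriptor is only used when a plain string is given. SketchRecord and SketchAlignment are hypothetical stand-ins, and the alphabet handling mentioned above is left out:

```python
class SketchRecord:
    """Hypothetical stand-in for SeqRecord."""

    def __init__(self, seq, id="<unknown id>"):
        self.seq = seq
        self.id = id


class SketchAlignment:
    """Hypothetical stand-in for Alignment."""

    def __init__(self):
        self._records = []

    def append(self, sequence, descriptor="<unknown id>"):
        if isinstance(sequence, SketchRecord):
            record = sequence
        elif isinstance(sequence, str):
            # The descriptor becomes the id of the new record.
            record = SketchRecord(sequence, id=descriptor)
        else:
            raise TypeError("sequence should be a string or a record")
        # Refuse rows whose length differs from the existing rows.
        if self._records and len(record.seq) != len(self._records[0].seq):
            raise ValueError("sequence length does not match the alignment")
        self._records.append(record)


aln = SketchAlignment()
aln.append("ATCGTTGC", "seq1")
aln.append(SketchRecord("ATCCTTGC", id="seq2"))
try:
    aln.append("ATC", "too_short")
except ValueError as error:
    print("rejected:", error)
```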

Also, I would leave the existing .add_sequence() method in place, but 
update its docstring to encourage use of .append() instead.

Peter


From biopython-dev at maubp.freeserve.co.uk  Sat Jul 28 11:49:11 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Sat, 28 Jul 2007 16:49:11 +0100
Subject: [Biopython-dev] syntax of indices for future Alignment object
In-Reply-To: <46AB5E20.5090605@c2b2.columbia.edu>
References: <46A9CD2E.6080402@genesilico.pl>	<46A9E28D.40609@maubp.freeserve.co.uk>	<46AAC1D7.8030208@c2b2.columbia.edu>	<46AB1494.301@maubp.freeserve.co.uk>
	<46AB5E20.5090605@c2b2.columbia.edu>
Message-ID: <46AB6577.6050708@maubp.freeserve.co.uk>

Michiel de Hoon wrote:
> Peter wrote:
>> Michiel de Hoon wrote:
>>>  >>> aln.add_sequence("seq1", "ATCGTTGC")
>>>  >>> aln[0]
>>> SeqRecord(seq=Seq('ATCGTTGC', Alphabet()), id='<unknown id>', 
>>> name='<unknown name>', description='seq1', dbxrefs=[])
>>> # Suggestion 2: I would expect "seq1" as the id rather than the 
>>> description
>> I agree with you here - this is the historic behaviour of the
>> add_sequence method which actually creates a SeqRecord from the strings
>> it is given. I would suggest it populate the record.id but for backwards
>> compatibility still populate the record.description in case anyone is
>> still using that.
>>
> That sounds good to me.

Good. Done, CVS revision 1.6 of file Bio/Align/Generic.py

Peter


From kosa at genesilico.pl  Sat Jul 28 12:53:04 2007
From: kosa at genesilico.pl (Jan Kosinski)
Date: Sat, 28 Jul 2007 18:53:04 +0200
Subject: [Biopython-dev] syntax of indices for future Alignment object
In-Reply-To: <46AAB081.30609@c2b2.columbia.edu>
References: <46A9CD2E.6080402@genesilico.pl>	<46A9E28D.40609@maubp.freeserve.co.uk>
	<46A9FD84.4080502@genesilico.pl> <46AAB081.30609@c2b2.columbia.edu>
Message-ID: <46AB7470.6010006@genesilico.pl>

Hi,

I just do not see what [A:B:C,X:Y:Z] adds to [A:B,X:Y] in case of 
alignments. Isn't [A:B,X:Y] sufficient?

Janek


Michiel de Hoon wrote:
> Jan Kosinski wrote:
>> Hi,
>>
>> Ok, I agree that [A:B][X:Y] syntax is not possible here. [A:B,X:Y] is 
>> fine. However, I would recommend not using [A:B:C,X:Y:Z] since it is 
>> not clear while the [A:B][X:Y] syntax is clear and sufficient.
>
> Python lists, tuples, and strings support [A:B:C], and Numerical 
> Python 2D arrays support [A:B:C,X:Y:Z]. I don't see why the Alignment 
> should not support this format.
>
> --Michiel.
>
> :.
>




From kosa at genesilico.pl  Sat Jul 28 12:55:33 2007
From: kosa at genesilico.pl (Jan Kosinski)
Date: Sat, 28 Jul 2007 18:55:33 +0200
Subject: [Biopython-dev] Improving the Alignment object
In-Reply-To: <46AAC408.2050703@c2b2.columbia.edu>
References: <46A9CD2E.6080402@genesilico.pl>	<46A9E28D.40609@maubp.freeserve.co.uk>	<46A9FD84.4080502@genesilico.pl>
	<46AA2727.103@maubp.freeserve.co.uk>
	<46AAC408.2050703@c2b2.columbia.edu>
Message-ID: <46AB7505.30302@genesilico.pl>

Hi,

I think the same, an alignment should be mutable and there is no need 
for making two classes, mutable and not mutable.

Janek

Michiel de Hoon wrote:
> Peter wrote:
>>> For instance, the Alignment object should
>>> support changing characters in the alignment without a need of 
>>> copying it (using  aln[a,x] = "D"). Can it be done now with 
>>> Alignment which is a list of SeqRecord objects with sequences 
>>> implemented as immutable Seq objects ?
>>
> ....
>>
>> To me the obvious way to handle this is to have a MutableAlignment 
>> sub-class, where editing individual elements with aln[r,c] = "D" 
>> would be supported (possibly implemented using the MutableSeq class 
>> internally rather than the immutable Seq class).
>>
> I don't think we'd need a separate MutableAlignment for that. An 
> Alignment is a list of sequences and is therefore mutable. If we add a 
> __setitem__ method to the Alignment class, then this method can take 
> care of constructing a new sequence and put it in the appropriate row.
>
> --Michiel.
>
> :.
>




From mdehoon at c2b2.columbia.edu  Sun Jul 29 00:38:28 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Sun, 29 Jul 2007 13:38:28 +0900
Subject: [Biopython-dev] syntax of indices for future Alignment object
In-Reply-To: <46AB7470.6010006@genesilico.pl>
References: <46A9CD2E.6080402@genesilico.pl>	<46A9E28D.40609@maubp.freeserve.co.uk>
	<46A9FD84.4080502@genesilico.pl> <46AAB081.30609@c2b2.columbia.edu>
	<46AB7470.6010006@genesilico.pl>
Message-ID: <46AC19C4.1000102@c2b2.columbia.edu>

Jan Kosinski wrote:
> I just do not see what [A:B:C,X:Y:Z] adds to [A:B,X:Y] in case of 
> alignments. Isn't [A:B,X:Y] sufficient?
> 
[A:B,X:Y] may be sufficient, but does not agree with Python indices for 
other objects (lists, tuples, strings). In addition, since allowing 
[A:B,X:Y] only is different from usual Python usage, we'd actually end 
up writing more code to specifically disallow [A:B:C,X:Y:Z].

Note also that [A:B:C,X:Y:Z] includes [A:B,X:Y] as a special case. So if 
the Alignment class is written to deal with [A:B:C,X:Y:Z], but I told 
you that it expects [A:B,X:Y], you wouldn't notice any difference 
until you tried [A:B:C,X:Y:Z] and found out that it works too.
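
To illustrate the point with a sketch (plain strings as rows; SketchAlignment is a hypothetical stand-in and the return types are simplified): Python hands __getitem__ a tuple of int or slice objects, and passing the slices straight through to the underlying list and strings gives the step for free.

```python
class SketchAlignment:
    """Hypothetical stand-in for Alignment; rows are plain strings."""

    def __init__(self, rows):
        self._rows = list(rows)

    def __getitem__(self, index):
        row_index, col_index = index
        if isinstance(row_index, int):
            # Single row: an int or a slice of that row's string.
            return self._rows[row_index][col_index]
        # Several rows: the slice object (with or without a step)
        # is applied as-is, so no extra code handles the step.
        selected = [row[col_index] for row in self._rows[row_index]]
        if isinstance(col_index, int):
            return "".join(selected)  # a single column as a string
        return selected


aln = SketchAlignment(["ATCGTTGC", "ATCCTTGC", "ATCCGTGC"])
print(aln[:, 4])          # TTG
print(aln[0:2, 2:6])      # ['CGTT', 'CCTT']
print(aln[0:3:2, 0:8:3])  # ['AGG', 'ACG']
```

Rejecting the three-part form would need an explicit check on each slice's step attribute, which is the extra code referred to above.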

--Michiel.

From mdehoon at c2b2.columbia.edu  Tue Jul 31 21:50:05 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Tue, 31 Jul 2007 21:50:05 -0400
Subject: [Biopython-dev] Improving the Alignment object
References: <46A9CD2E.6080402@genesilico.pl>	<46A9E28D.40609@maubp.freeserve.co.uk>	<46A9FD84.4080502@genesilico.pl>
	<46AA2727.103@maubp.freeserve.co.uk>
	<46AAC408.2050703@c2b2.columbia.edu>
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B5FD@mail2.exch.c2b2.columbia.edu>

Peter wrote:
> I'm not sure if requests for part of a single row or column like
> [rrr, xxx:yyy:zzz] and [rrr:ppp:qqq, xxx] are best handled by returning
> sub-alignments or as special cases (strings/Seq and Seq/SeqRecord
> respectively?).

Jan wrote:
> For instance, the Alignment object should
> support changing characters in the alignment without a need of copying 
> it (using  aln[a,x] = "D"). Can it be done now with Alignment which is 
> a list of SeqRecord objects with sequences implemented as immutable Seq 
> objects ?
>

If we allow
>>> aln[a,x] = "D"

then we should also allow
>>> aln[a,x:x+4] = "DEFG"
>>> aln[a:a+5,x] = "KLMNO"
and perhaps even
>>> aln[a:a+5,x:x+3] = ["KLMNO","PQRST","UVWXY"]


For consistency, I feel that then aln[a,x:y] and aln[a:b,x] should both
return a string.
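
A sketch of how a single __setitem__ could cover the first three forms, again with plain strings as rows. SketchAlignment and _replace are hypothetical names; the block form is deliberately left out, since the row/column layout of its right-hand side would have to be decided first:

```python
class SketchAlignment:
    """Hypothetical stand-in for Alignment; rows are immutable strings."""

    def __init__(self, rows):
        self._rows = list(rows)

    def _replace(self, row, cols, letters):
        # Rebuild one row with the given columns replaced.
        if len(cols) != len(letters):
            raise ValueError("replacement has the wrong length")
        chars = list(self._rows[row])
        for col, letter in zip(cols, letters):
            chars[col] = letter
        self._rows[row] = "".join(chars)

    def __setitem__(self, index, value):
        row_index, col_index = index
        if isinstance(row_index, int) and isinstance(col_index, int):
            self._replace(row_index, [col_index], value)    # aln[a, x] = "D"
        elif isinstance(row_index, int):
            cols = range(len(self._rows[row_index]))[col_index]
            self._replace(row_index, cols, value)           # aln[a, x:y] = "DEFG"
        elif isinstance(col_index, int):
            rows = range(len(self._rows))[row_index]
            if len(rows) != len(value):
                raise ValueError("replacement has the wrong length")
            for row, letter in zip(rows, value):            # aln[a:b, x] = "KLM"
                self._replace(row, [col_index], letter)
        else:
            raise TypeError("block assignment is not covered by this sketch")


aln = SketchAlignment(["ATCGTTGC", "ATCCTTGC", "ATCCGTGC"])
aln[0, 2:6] = "DEFG"
aln[0:3, 0] = "KLM"
print(aln._rows)  # ['KTDEFGGC', 'LTCCTTGC', 'MTCCGTGC']
```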

--Michiel

Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032


From bugzilla-daemon at portal.open-bio.org  Sun Jul  1 02:55:31 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 30 Jun 2007 22:55:31 -0400
Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64
In-Reply-To: <bug-2323-42@http.bugzilla.open-bio.org/>
Message-ID: <200707010255.l612tVwN022655@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2323


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #15 from mdehoon at ims.u-tokyo.ac.jp  2007-06-30 22:55 EST -------
I've added the functions gcg and seguid to Bio/SeqUtils/CheckSum.py.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Sun Jul  1 03:23:02 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 30 Jun 2007 23:23:02 -0400
Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64
In-Reply-To: <bug-2323-42@http.bugzilla.open-bio.org/>
Message-ID: <200707010323.l613N24V023919@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2323


------- Comment #16 from sbassi at gmail.com  2007-06-30 23:23 EST -------
(In reply to comment #15)
> I've added the functions gcg and seguid to Bio/SeqUtils/CheckSum.py.
> 

This code won't run on Python 2.3:
=============================================
sbassi at hp:~/bioinfo$ python
Python 2.3.4 (#2, Jun 16 2005, 18:52:31)
[GCC 3.3.5 (Debian 1:3.3.5-3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import CheckSum
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "CheckSum.py", line 50
    return sum(n*ord(c.upper()) for (n,c) in izip(cycle(range(1,58)),seq)) %
10000
                                  ^
SyntaxError: invalid syntax
==========================================
That is why I made a separate module for Python 2.4+




From bugzilla-daemon at portal.open-bio.org  Sun Jul  1 05:54:55 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 1 Jul 2007 01:54:55 -0400
Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64
In-Reply-To: <bug-2323-42@http.bugzilla.open-bio.org/>
Message-ID: <200707010554.l615stgK032500@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2323


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |


------- Comment #17 from mdehoon at ims.u-tokyo.ac.jp  2007-07-01 01:54 EST -------
Sorry for the mistake.

With the code for Python >= 2.4 separately, we still get an error message when
installing Biopython, because Python attempts to byte-compile each module. It
is not so serious, because this error is otherwise ignored. However, how about
this code for Python >= 2.4:

from itertools import cycle, imap

return sum(imap(lambda n,c: n*ord(c.upper()), cycle(range(1,58)),seq)) % 10000

It is almost as fast as the code you now have for Python >= 2.4, but avoids
having to create a separate module gcg24.py.
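
For readers on Python 3, where itertools.imap no longer exists, the same weighted sum can be sketched as below. The function name gcg follows the one mentioned in comment #15; the body is an illustration of the formula above, not the code committed to Bio/SeqUtils/CheckSum.py:

```python
from itertools import cycle


def gcg(seq):
    """GCG-style checksum: weights 1..57 repeat along the sequence."""
    weights = cycle(range(1, 58))
    # zip stops at the end of seq, so the endless cycle is safe here.
    return sum(n * ord(c.upper()) for n, c in zip(weights, seq)) % 10000


print(gcg("ACGT"))  # 748
```

For "ACGT" the sum is 1*65 + 2*67 + 3*71 + 4*84 = 748, and the result is case-insensitive because each letter is upper-cased first.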




From bugzilla-daemon at portal.open-bio.org  Sun Jul  1 11:02:47 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 1 Jul 2007 07:02:47 -0400
Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64
In-Reply-To: <bug-2323-42@http.bugzilla.open-bio.org/>
Message-ID: <200707011102.l61B2lHg029279@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2323


------- Comment #18 from mdehoon at ims.u-tokyo.ac.jp  2007-07-01 07:02 EST -------
Btw, I am finding that the code for Python < 2.4 is faster than the code for
Python >= 2.4. The former uses more memory, as it makes a copy of seq, but even
if we avoid copying seq, I still find that it is faster than the code for
Python >= 2.4.




From mdehoon at c2b2.columbia.edu  Sun Jul  1 12:01:00 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Sun, 01 Jul 2007 21:01:00 +0900
Subject: [Biopython-dev] TempFastaWriter,
	TempFastaWriterSingle in Bio/GFF/easy.py
In-Reply-To: <320fb6e00706301214p41c33329o98126738d77fad19@mail.gmail.com>
References: <4685FCCA.4090904@c2b2.columbia.edu>
	<320fb6e00706301214p41c33329o98126738d77fad19@mail.gmail.com>
Message-ID: <4687977C.70903@c2b2.columbia.edu>

Peter wrote:
>> Is anybody using the classes TempFastaWriter, TempFastaWriterSingle in
>> Bio/GFF/easy.py? They are currently using the old Fasta writer in
>> Bio.SeqIO.FASTA instead of the new one in Bio.SeqIO.FastaIO. We can
>> either update them to use the new Fasta writer, or simply remove them,
>> since currently these classes are not used anywhere in Biopython.
> 
> This is for Bug 2284 right?
> http://bugzilla.open-bio.org/show_bug.cgi?id=2284
> 
> I'm inclined to remove classes TempFastaWriter and TempFastaWriterSingle
> 
Actually I hadn't noticed bug 2284. I looked into this because the 
Biopython tests are causing DeprecationWarnings. If no users of these 
classes step forward, I am in favor of removing them.

--Michiel.


From bugzilla-daemon at portal.open-bio.org  Sun Jul  1 14:13:29 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 1 Jul 2007 10:13:29 -0400
Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64
In-Reply-To: <bug-2323-42@http.bugzilla.open-bio.org/>
Message-ID: <200707011413.l61EDTF3012907@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2323


------- Comment #19 from sbassi at gmail.com  2007-07-01 10:13 EST -------
(In reply to comment #18)
> Btw, I am finding that the code for Python < 2.4 is faster than the code for
> Python >= 2.4. The former uses more memory, as it makes a copy of seq, but even
> if we avoid copying seq, I still find that it is faster than the code for
> Python >= 2.4.

OK, so leave it w/o the check for python version and use just the 2.3 code.
Best,
SB.




From bugzilla-daemon at portal.open-bio.org  Sun Jul  1 22:38:55 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 1 Jul 2007 18:38:55 -0400
Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64
In-Reply-To: <bug-2323-42@http.bugzilla.open-bio.org/>
Message-ID: <200707012238.l61Mct1k007379@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2323


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|                            |FIXED


------- Comment #20 from mdehoon at ims.u-tokyo.ac.jp  2007-07-01 18:38 EST -------
Updated in CVS, using the 2.3 code without copying seq.




From bugzilla-daemon at portal.open-bio.org  Sun Jul  1 23:42:14 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 1 Jul 2007 19:42:14 -0400
Subject: [Biopython-dev] [Bug 2327] New: test_Cluster takes too long
Message-ID: <bug-2327-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2327

           Summary: test_Cluster takes too long
           Product: Biopython
           Version: 1.43
          Platform: Other
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: idoerg at burnham.org


When running the biopython test suite, test_Cluster takes too long. I gave up
after 2 minutes.




From bugzilla-daemon at portal.open-bio.org  Sun Jul  1 23:55:34 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 1 Jul 2007 19:55:34 -0400
Subject: [Biopython-dev] [Bug 2327] test_Cluster takes too long
In-Reply-To: <bug-2327-42@http.bugzilla.open-bio.org/>
Message-ID: <200707012355.l61NtYcR012177@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2327


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |DUPLICATE


------- Comment #1 from mdehoon at ims.u-tokyo.ac.jp  2007-07-01 19:55 EST -------


*** This bug has been marked as a duplicate of bug 2268 ***




From bugzilla-daemon at portal.open-bio.org  Sun Jul  1 23:55:36 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 1 Jul 2007 19:55:36 -0400
Subject: [Biopython-dev] [Bug 2268] Cluster unit test suite runs indefinitely
In-Reply-To: <bug-2268-42@http.bugzilla.open-bio.org/>
Message-ID: <200707012355.l61NtaNW012196@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2268


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |idoerg at gmail.com


------- Comment #22 from mdehoon at ims.u-tokyo.ac.jp  2007-07-01 19:55 EST -------
*** Bug 2327 has been marked as a duplicate of this bug. ***




From bugzilla-daemon at portal.open-bio.org  Tue Jul  3 11:03:40 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 3 Jul 2007 07:03:40 -0400
Subject: [Biopython-dev] [Bug 2328] New: NCBIStandalone.blastall chokes on
	integer argument
Message-ID: <bug-2328-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2328

           Summary: NCBIStandalone.blastall chokes on integer argument
           Product: Biopython
           Version: 1.43
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: grunberg at embl.de
                CC: grunberg at embl.de


Unlike previous versions, the current NCBIStandalone.blastall and blastpgp
expect that the argument align_view is given as a string rather than an
integer. So the following call worked with previous versions but now fails::

   results, err = NCBIStandalone.blastall( settings.blast_bin,
                                           method, db, seqFile,
                                           expectation=e,
                                           align_view=7, ## XML output
                                           **kw)

The error is raised here::

  NCBIStandalone: 1788 (blastall) 
     w, r, e = os.popen3(" ".join([blastcmd] + params))

because align_view escapes the str conversion of the other parameters in this
line::

   params.extend([att2param['align_view'], align_view])

This line should rather look like this::

   params.extend([att2param['align_view'], str(align_view)])
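
The failure can be reproduced without Biopython, since str.join refuses non-string items. In this sketch build_command is a hypothetical helper and "-m" is used only as an illustrative flag; converting every parameter at join time is an alternative to patching the single align_view line:

```python
def build_command(blastcmd, params):
    """Join a command and its parameters, converting each to a string."""
    return " ".join([blastcmd] + [str(p) for p in params])


params = ["-m", 7]  # an integer, as in align_view=7

try:
    " ".join(["blastall"] + params)
except TypeError as error:
    print("plain join fails:", error)

print(build_command("blastall", params))  # blastall -m 7
```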

I am going to attach a patch to this bugreport.

Greetings,
Raik




From bugzilla-daemon at portal.open-bio.org  Tue Jul  3 11:05:37 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 3 Jul 2007 07:05:37 -0400
Subject: [Biopython-dev] [Bug 2328] NCBIStandalone.blastall chokes on
	integer argument
In-Reply-To: <bug-2328-42@http.bugzilla.open-bio.org/>
Message-ID: <200707031105.l63B5bAP013190@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2328


------- Comment #1 from grunberg at embl.de  2007-07-03 07:05 EST -------
Created an attachment (id=698)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=698&action=view)
patch for  Bug 2328 (NCBIStandalone.blastall / blastpgp)

The patch is described in my bug report.
Cheers,
Raik




From bugzilla-daemon at portal.open-bio.org  Tue Jul  3 23:26:15 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 3 Jul 2007 19:26:15 -0400
Subject: [Biopython-dev] [Bug 2328] NCBIStandalone.blastall chokes on
	integer argument
In-Reply-To: <bug-2328-42@http.bugzilla.open-bio.org/>
Message-ID: <200707032326.l63NQFBB022873@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2328


------- Comment #2 from mdehoon at ims.u-tokyo.ac.jp  2007-07-03 19:26 EST -------
> Unlike previous versions, the current NCBIStandalone.blastall and blastpgp
> expect that the argument align_view is given as a string rather than an
> integer. So the following call worked with previous versions but now fails::

In which previous version of Biopython did this work? Your patch looks fine,
but I'd like to find out how this bug entered Biopython.




From bugzilla-daemon at portal.open-bio.org  Thu Jul  5 13:30:32 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Jul 2007 09:30:32 -0400
Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64
In-Reply-To: <bug-2323-42@http.bugzilla.open-bio.org/>
Message-ID: <200707051330.l65DUW2k004459@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2323


------- Comment #21 from dalloliogm at gmail.com  2007-07-05 09:30 EST -------
(In reply to comment #1)
> Created an attachment (id=689)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=689&action=view) [details]
> Proposed functions (CRC64 and GCG checksum)
> 
> This could be in utils.py, but I am not sure.


Maybe it could be useful to add a 'GCG checksum' attribute to the BioPython Seq
object.

Checksums could be used to quickly check whether two sequences are the same;
but in the documentation you should state very clearly that two sequences which
differ by even a single symbol (e.g. AAANAAA and AAAAAAA) have different
values.
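
As a rough illustration, the sketch below uses the GCG checksum formula posted earlier on this bug, rewritten for current Python (zip instead of imap). The function name gcg and the test sequences other than the two above are assumptions. It also shows the caveat in the other direction: two different sequences can share a checksum, so equal values only suggest, and do not prove, that the sequences are the same.

```python
from itertools import cycle


def gcg(seq):
    """GCG checksum: position weights 1..57 (repeating) times character codes."""
    weights = cycle(range(1, 58))
    return sum(n * ord(c.upper()) for n, c in zip(weights, seq)) % 10000


# A single substituted symbol changes the checksum.
same = gcg("AAAAAAA")
changed = gcg("AAANAAA")
print(same, changed, same != changed)

# Different sequences can still collide, so equality is only a hint.
print(gcg("AC") == gcg("EA"))
```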




From bugzilla-daemon at portal.open-bio.org  Sat Jul  7 09:28:56 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 7 Jul 2007 05:28:56 -0400
Subject: [Biopython-dev] [Bug 2328] NCBIStandalone.blastall chokes on
	integer argument
In-Reply-To: <bug-2328-42@http.bugzilla.open-bio.org/>
Message-ID: <200707070928.l679SuTJ010432@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2328


------- Comment #3 from grunberg at embl.de  2007-07-07 05:28 EST -------
(In reply to comment #2)
> > Unlike previous versions, the current NCBIStandalone.blastall and blastpgp
> > expect that the argument align_view is given as a string rather than an
> > integer. So the following call worked with previous versions but now fails::
> 
> In which previous version of Biopython did this work? Your patch looks fine,
> but I'd like to find out how this bug entered Biopython.
> 

Sorry about the late reply... My previous Biopython installation (which didn't
have the glitch) was version 1.42.
Greetings
Raik




From bugzilla-daemon at portal.open-bio.org  Sun Jul  8 04:20:12 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 8 Jul 2007 00:20:12 -0400
Subject: [Biopython-dev] [Bug 2328] NCBIStandalone.blastall chokes on
	integer argument
In-Reply-To: <bug-2328-42@http.bugzilla.open-bio.org/>
Message-ID: <200707080420.l684KCSq031646@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2328


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #4 from mdehoon at ims.u-tokyo.ac.jp  2007-07-08 00:20 EST -------
Fixed in CVS (see biopython/Bio/Blast/NCBIStandalone.py revision 1.68).




From chengsoon.ong at tuebingen.mpg.de  Mon Jul  9 10:15:50 2007
From: chengsoon.ong at tuebingen.mpg.de (Cheng Soon Ong)
Date: Mon, 9 Jul 2007 12:15:50 +0200
Subject: [Biopython-dev] Bio.Blast.NCBIWWW.qblast
Message-ID: <CEE0EEC1-BF4A-4A07-9389-6FA2C6E4ACC3@tuebingen.mpg.de>

Hi,

I've just written a small extension to the qblast function. The
current version only passes a subset of parameters to NCBI. My
code passes all the parameters that the qblast API at NCBI accepts.

Is anyone interested in merging this into the blast module of
Biopython? Sorry, I do not know the protocol here for getting code
into Biopython.

Cheng


From mdehoon at c2b2.columbia.edu  Mon Jul  9 11:40:23 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Mon, 09 Jul 2007 20:40:23 +0900
Subject: [Biopython-dev] Bio.Blast.NCBIWWW.qblast
In-Reply-To: <CEE0EEC1-BF4A-4A07-9389-6FA2C6E4ACC3@tuebingen.mpg.de>
References: <CEE0EEC1-BF4A-4A07-9389-6FA2C6E4ACC3@tuebingen.mpg.de>
Message-ID: <46921EA7.2080106@c2b2.columbia.edu>

Dear Cheng,

Thank you for your contribution.

The "official" way to contribute code to Biopython is to open a new bug
report at http://bugzilla.open-bio.org/ and attach your code to it.

For your qblast code, you can also just send it to me (not to the list), 
then I can merge it into Biopython.

--Michiel.

Cheng Soon Ong wrote:
> Hi,
> 
> I've just written a small extension to the qblast function. The
> current version only passes a subset of parameters to NCBI. My
> code passes all the parameters that the qblast API at NCBI accepts.
> 
> Is anyone interested in merging this into the blast module of
> Biopython? Sorry, I do not know the protocol here for getting code
> into Biopython.
> 


From biopython-dev at maubp.freeserve.co.uk  Tue Jul 10 19:31:55 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Tue, 10 Jul 2007 20:31:55 +0100
Subject: [Biopython-dev] FDist: more Population Genetics code
In-Reply-To: <6d941f120701030516m1adb3daeh6e4645121ba8679d@mail.gmail.com>
References: <6d941f120701030516m1adb3daeh6e4645121ba8679d@mail.gmail.com>
Message-ID: <4693DEAB.8000900@maubp.freeserve.co.uk>

Hi Tiago,

Have you had any feedback (off the mailing list)?

Ralph - did you have a chance to look over Tiago's code or discuss this 
with him?

It would be a shame if nothing came from this...

Peter

Tiago Ant?o wrote:
> Hi!
> 
> I have submitted another enhancement bug, with support for FDist. It
> can generate and parse Fdist files and control the fdist
> applications. There are also a couple of utility functions. FDist is a
> niche application (mainly used to detect selection in animal
> genetics). Not the most fundamental one to support, but it is
> currently one that I am working on, thus, the code.
> 
> Regarding my submitted code for GenePop, I have submitted a different
> version on bugzilla.  The main difference is that I moved everything
> from Bio to Bio.PopGen.
> 
> Before I continue putting code on bugzilla I would like to know if it
> is worthwhile doing it... Any opinions on the code submitted or if any
> changes are required? I would really like to continue converting my
> code to BioPython, but only if it has any possibility of ending up
> being useful/included in distribution somewhere in the future... ;)
> 
> I am currently working on code related to SimCoal2, Arlequin and
> general statistics (Fst, heterozygosity, ...), which will probably be
> ready quite soon (i.e., within the next two weeks). This is more
> mainstream than FDist.
> 
> I have some other code lying around mainly related to HapMap, but I
> will only submit it after reviewing and reusing it again. This is more
> distant future ... like a couple of months.
> 
> Tiago


From biopython-dev at maubp.freeserve.co.uk  Tue Jul 10 21:12:44 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Tue, 10 Jul 2007 22:12:44 +0100
Subject: [Biopython-dev] FDist: more Population Genetics code
In-Reply-To: <Pine.OSX.4.64.0707101652021.415@emeraldii.local>
References: <6d941f120701030516m1adb3daeh6e4645121ba8679d@mail.gmail.com>
	<4693DEAB.8000900@maubp.freeserve.co.uk>
	<Pine.OSX.4.64.0707101652021.415@emeraldii.local>
Message-ID: <4693F64C.9050403@maubp.freeserve.co.uk>

Ralph Haygood wrote:
> Peter,
> 
> I haven't received any code from Tiago to review.
> 
> Ralph

He's put some on Bugzilla:
http://bugzilla.open-bio.org/show_bug.cgi?id=2170

Peter


From rhaygood at duke.edu  Wed Jul 11 03:45:56 2007
From: rhaygood at duke.edu (Ralph Haygood)
Date: Tue, 10 Jul 2007 23:45:56 -0400 (EDT)
Subject: [Biopython-dev] FDist: more Population Genetics code
In-Reply-To: <4693DEAB.8000900@maubp.freeserve.co.uk>
References: <6d941f120701030516m1adb3daeh6e4645121ba8679d@mail.gmail.com>
	<4693DEAB.8000900@maubp.freeserve.co.uk>
Message-ID: <Pine.OSX.4.64.0707102320420.658@emeraldii.local>

Peter and Tiago,

Hello.  No, I haven't done anything with Tiago's code.  I'm afraid
it's pretty far from what I'm working on these days.

I still think it would be good for BioPython to include methods for
computing basic population-genetical statistics (Watterson's theta,
Tajima's D, etc.) from DNA alignments.  I have in mind something like
BioPerl's PopGen (http://www.bioperl.org/wiki/HOWTO:PopGen).  My own
code is easy to use with a Bio.Align.Generic.Alignment, but it doesn't
conform to BioPython's standards for style, testing, or documentation,
and I don't know when I'll have time to standardize it.
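
To give a sense of scale, Watterson's theta can be sketched in a few lines: it is the number of segregating sites S divided by a_n = 1 + 1/2 + ... + 1/(n-1) for n sequences. This is not the code mentioned above; the function names and the list-of-equal-length-strings input are assumptions, and gaps or ambiguous symbols are not treated specially.

```python
def segregating_sites(alignment):
    """Count alignment columns that contain more than one distinct symbol."""
    return sum(1 for column in zip(*alignment) if len(set(column)) > 1)


def watterson_theta(alignment):
    """Watterson's estimator: S / a_n, with a_n = sum of 1/i for i = 1..n-1."""
    n = len(alignment)
    if n < 2:
        raise ValueError("need at least two sequences")
    a_n = sum(1.0 / i for i in range(1, n))
    return segregating_sites(alignment) / a_n


# Three toy sequences: columns 3 and 4 vary, so S = 2 and a_n = 1.5.
toy = ["ACGT", "ACGA", "ACCT"]
print(segregating_sites(toy), watterson_theta(toy))
```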

Ralph


From tiagoantao at gmail.com  Wed Jul 11 10:05:21 2007
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Wed, 11 Jul 2007 12:05:21 +0200
Subject: [Biopython-dev] FDist: more Population Genetics code
In-Reply-To: <Pine.OSX.4.64.0707102320420.658@emeraldii.local>
References: <6d941f120701030516m1adb3daeh6e4645121ba8679d@mail.gmail.com>
	<4693DEAB.8000900@maubp.freeserve.co.uk>
	<Pine.OSX.4.64.0707102320420.658@emeraldii.local>
Message-ID: <6d941f120707110305n75c75e77y7426280477cf23ca@mail.gmail.com>

Hi,

I had no feedback and it seemed that there was no interest, so I
decided to start a Python Population Genetics project on Google, which
is going ahead but is still at the alpha stage:
http://code.google.com/p/pypopgen/
I am doing this on a personal basis for now (I did not even announce
it anywhere), so it is advancing at my own pace and its design
follows my needs.
I have already used it (or a tiny part of it) in a published
application ( http://popgen.eu/soft/m4s2 ).
I am still willing to integrate this into BioPython, but for that some
interest and feedback would be needed... That would have to happen
somewhat soon, as the code will have to be adapted to BioPython
standards and namespace; once there is a lot of code, that will be
difficult in practice (and after going public it will really be
impossible).

The "strangest" code that I am writing (and that would need more
discussion) does asynchronous computation (to make it easy to use
multicore computers and grids).

Regards,
Tiago



From bugzilla-daemon at portal.open-bio.org  Fri Jul 13 11:08:07 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 13 Jul 2007 07:08:07 -0400
Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64
In-Reply-To: <bug-2323-42@http.bugzilla.open-bio.org/>
Message-ID: <200707131108.l6DB87xm027778@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2323


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |biopython-
                   |                            |bugzilla at maubp.freeserve.co.
                   |                            |uk
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |


------- Comment #22 from biopython-bugzilla at maubp.freeserve.co.uk  2007-07-13 07:08 EST -------
I've started to write a test case for the code now in Bio/SeqUtils/CheckSum.py
and noticed that while crc64, gcg and seguid will cope with both strings and
Seq objects, crc32 will only cope with strings.

Any objections to me fixing this like so:

Old:

from binascii import crc32

New:

from binascii import crc32 as _crc32

def crc32(seq):
    """Returns the crc32 checksum for a sequence (string or Seq object)."""
    try:
        # Assume it's a Seq object
        return _crc32(seq.tostring())
    except AttributeError:
        # Assume it's a string
        return _crc32(seq)
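
On current Python, binascii.crc32 takes bytes rather than text, so the same idea might look like the sketch below. SeqLike is a hypothetical stand-in for a Seq object, and relying on str() plus ASCII encoding is an assumption, not what was committed to CVS.

```python
from binascii import crc32 as _crc32


class SeqLike:
    """Hypothetical stand-in for a Seq object that wraps a string."""

    def __init__(self, data):
        self._data = data

    def __str__(self):
        return self._data


def crc32(seq):
    """Return the crc32 checksum for a sequence (string or Seq-like object)."""
    # str() leaves plain strings untouched and unwraps Seq-like objects.
    return _crc32(str(seq).encode("ascii"))


print(crc32("ACGT") == crc32(SeqLike("ACGT")))
```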

--
Peter




From bugzilla-daemon at portal.open-bio.org  Fri Jul 13 11:18:30 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 13 Jul 2007 07:18:30 -0400
Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64
In-Reply-To: <bug-2323-42@http.bugzilla.open-bio.org/>
Message-ID: <200707131118.l6DBIUOS028425@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2323


------- Comment #23 from biopython-bugzilla at maubp.freeserve.co.uk  2007-07-13 07:18 EST -------
Created an attachment (id=703)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=703&action=view)
Initial unit test for Bio/SeqUtils/CheckSum

If the crc32 function could accept a Seq object then the "try/except" at the
end isn't needed.




From bugzilla-daemon at portal.open-bio.org  Fri Jul 13 14:38:52 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 13 Jul 2007 10:38:52 -0400
Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64
In-Reply-To: <bug-2323-42@http.bugzilla.open-bio.org/>
Message-ID: <200707131438.l6DEcqn5008339@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2323


------- Comment #24 from mdehoon at ims.u-tokyo.ac.jp  2007-07-13 10:38 EST -------
A better solution would be for Seq to inherit from str, instead of Seq having
str as a member. Then we don't have to modify crc32, and other code in
Biopython will also become simpler.
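
A minimal sketch of what inheriting from str could look like, on current Python. StrSeq and its alphabet argument are hypothetical names used only for illustration; this is not the Biopython Seq class.

```python
from binascii import crc32


class StrSeq(str):
    """Hypothetical sequence class that is itself a string."""

    def __new__(cls, data, alphabet=None):
        # str is immutable, so the text has to be set in __new__.
        self = super().__new__(cls, data)
        self.alphabet = alphabet
        return self


seq = StrSeq("ACGT", alphabet="DNA")

# String-based functions accept the object directly, with no wrapper needed.
checksum = crc32(seq.encode("ascii"))
print(isinstance(seq, str), seq.alphabet, checksum)

# Caveat: inherited methods return plain str, so the alphabet is lost.
print(type(seq.upper()) is str)
```

The last line hints at the cost: every string method that should return a sequence would need to be overridden to keep the alphabet.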




From bugzilla-daemon at portal.open-bio.org  Fri Jul 13 15:17:59 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 13 Jul 2007 11:17:59 -0400
Subject: [Biopython-dev] [Bug 2269] Shebang (hashbang) lines need cleanup
In-Reply-To: <bug-2269-42@http.bugzilla.open-bio.org/>
Message-ID: <200707131517.l6DFHxY4010647@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2269


------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk  2007-07-13 11:17 EST -------
I have just fixed a few in CVS; here is a list of the remaining abnormal
shebang/hashbang lines:

biopython/Bio/EUtils/POM.py     '#!/usr/bin/python -i\n'
biopython/Bio/EUtils/DTDs/LinkOut.py    '#!/usr/bin/python\n'
biopython/Bio/EUtils/DTDs/__init__.py   '#!/usr/bin/python\n'
biopython/Bio/EUtils/DTDs/eInfo_020511.py       '#!/usr/bin/python\n'
biopython/Bio/EUtils/DTDs/eLink_020511.py       '#!/usr/bin/python\n'
biopython/Bio/EUtils/DTDs/ePost_020511.py       '#!/usr/bin/python\n'
biopython/Bio/EUtils/DTDs/eSearch_020511.py     '#!/usr/bin/python\n'
biopython/Bio/EUtils/DTDs/eSummary_020511.py    '#!/usr/bin/python\n'

The biopython/Bio/EUtils/*.py examples are interesting in that many of those
files are autogenerated from DTD files (using the dtd2py.py script I think -
but it doesn't seem to work on all of them).

Also, I don't think all the files under Bio/Restriction/*.py need a shebang,
and a large proportion of the unit tests have shebangs (but less than half).
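
A listing like the one above can be produced with a short script. The sketch below is one way to do it on current Python; the function name abnormal_shebangs is made up, and treating "#!/usr/bin/env python" as the only normal form is an assumption rather than something settled on this bug.

```python
import os

# Assumed "normal" form; anything else starting with "#!" is reported.
EXPECTED = "#!/usr/bin/env python"


def abnormal_shebangs(root, expected=EXPECTED):
    """Return (path, first_line) for .py files with an unexpected shebang."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # walk subdirectories in a stable order
        for name in sorted(filenames):
            if not name.endswith(".py"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, "r", errors="replace") as handle:
                first = handle.readline()
            if first.startswith("#!") and first.rstrip("\r\n") != expected:
                found.append((path, first))
    return found


if __name__ == "__main__":
    for path, first in abnormal_shebangs("."):
        print("%s\t%r" % (path, first))
```

Files without any shebang are skipped, so the question of which modules should have one at all is left to a human.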




From tiagoantao at gmail.com  Fri Jul 13 15:23:03 2007
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Fri, 13 Jul 2007 16:23:03 +0100
Subject: [Biopython-dev] FDist: more Population Genetics code
In-Reply-To: <6d941f120707110305n75c75e77y7426280477cf23ca@mail.gmail.com>
References: <6d941f120701030516m1adb3daeh6e4645121ba8679d@mail.gmail.com>
	<4693DEAB.8000900@maubp.freeserve.co.uk>
	<Pine.OSX.4.64.0707102320420.658@emeraldii.local>
	<6d941f120707110305n75c75e77y7426280477cf23ca@mail.gmail.com>
Message-ID: <6d941f120707130823i6b62478fl9ac589eb5c02ca9d@mail.gmail.com>

I just want to add that I followed precisely the procedure that was
suggested to me at the time, i.e. to open bugzilla issues, but I got no
answer or follow-up from it. I also had some very useful mail
exchanges with Ralph at that time, but no code was exchanged.

I reiterate my interest in supplying the code (currently supporting
fdist, simcoal2, genepop, hapmap, ucsc table browser - in varying
degrees of quality). You can have a look at the google url supplied
(svn repository in it). I would still take the necessary time to
convert it to BioPython namespace and format.

If within one week I see no interest at all (interest in the form of
proactively moving things forward), then I will consider this a closed
issue and will not spend more time on any form of
integration, since I have done all that was requested here
and really got no feedback.

Tiago



From bugzilla-daemon at portal.open-bio.org  Fri Jul 13 15:25:32 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 13 Jul 2007 11:25:32 -0400
Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64
In-Reply-To: <bug-2323-42@http.bugzilla.open-bio.org/>
Message-ID: <200707131525.l6DFPWMa011025@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2323


------- Comment #25 from biopython-bugzilla at maubp.freeserve.co.uk  2007-07-13 11:25 EST -------
Changing the Seq object to be a subclass of string might be nice... but perhaps
rather confusing for minority alphabets where the "letters" are not single
characters(*).  More importantly, wouldn't this dramatic change break a lot of
existing scripts? Probably something for the mailing list!

(*) I've never done it, but one example is storing three letter protein
sequences, nice if you have any post translational modifications which cannot
be represented using the single letter scheme.
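
The concern can be made concrete with plain strings. In this sketch the three-letter sequence and the fixed-width split are assumptions for illustration only:

```python
# A three-residue peptide written with three-letter codes.
three_letter = "AlaGlySer"

# Treated as a plain string, every character counts as a "letter".
print(len(three_letter), three_letter[0])

# What a three-letter alphabet would actually want: one item per residue.
residues = [three_letter[i:i + 3] for i in range(0, len(three_letter), 3)]
print(len(residues), residues[0])
```

A str subclass would inherit the character-based len() and indexing shown in the first print, which is where the confusion would come from.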




From biopython-dev at maubp.freeserve.co.uk  Sat Jul 14 10:22:06 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Sat, 14 Jul 2007 11:22:06 +0100
Subject: [Biopython-dev] Bug 2292 - TER lines from Bio.PDBIO
Message-ID: <4698A3CE.7020907@maubp.freeserve.co.uk>

Hi Thomas,

Could you have a look at Biopython Bug 2292 and the suggested patch from
Michal Gajda to write TER records in line with the spec:

http://bugzilla.open-bio.org/show_bug.cgi?id=2292

Thanks

Peter


From tiagoantao at gmail.com  Sat Jul 14 16:32:43 2007
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Sat, 14 Jul 2007 17:32:43 +0100
Subject: [Biopython-dev] Population Genetics code
Message-ID: <6d941f120707140932u356c84bel6a9322a2767e6da7@mail.gmail.com>

Hi!

Firstly I would like to thank everybody that answered so positively to
my "rant" about submitting population genetics code to Biopython.

I have a few suggestions on how to progress in a safe and constructive
way with a possible Population Genetics part for biopython.
First of all, the starting point:
1. None of the core developers is actively working in
population genetics
2. Point 1 entails that any code submissions (made by biopython
newbies like me) will not be able to be completely reviewed by
seasoned biopython developers
3. Initially there will only be me submitting code (please correct me
if I am wrong, especially Ralph...)
4. There is already some popgen statistical code in python lying
around e.g. http://www.pypop.org/

Therefore I suggest starting out with a small, "safe" project
around a little-used application (Mark Beaumont's Fdist program
http://www.rubic.rdg.ac.uk/~mab/software.html ). This code is already
written and tested (by myself). I also have test cases (in BioPython
format) for parts of it. The major issue is that it is currently
outside of the Bio.PopGen namespace, so it's not really very major...
I would provide parsers, configuration file generators and utilities
to run the suite of fdist programs.
Why start with such a simple and less relevant application:
1. It's safer to start with something less grand (if it's poorly done it
won't be that serious).
2. There is no python fdist code lying around, so there is no overlap
at all with existing projects
3. This code is already done and being used...

I will provide code, test code, and documentation (probably by adding
stuff to the wiki). Then other people could evaluate what was done,
and we would continue from there to other, more used applications
(Genepop, arlequin, simcoal2, ...) and databases (HapMap,
TableBrowser).

Is this an acceptable way of going ahead? If other people would like
to participate, that would be fantastic...

If my suggestion is rubbish, please also say ;)

Many thanks,
Tiago


From biopython-dev at maubp.freeserve.co.uk  Mon Jul 16 18:27:40 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Mon, 16 Jul 2007 19:27:40 +0100
Subject: [Biopython-dev] Biopython usage figures
Message-ID: <469BB89C.8010904@maubp.freeserve.co.uk>

A little last minute I know, but would anyone have access to the website
download statistics? I'd like to include rough figures for the number of
downloads of the recent releases in the BOSC 2007 talk.

A list of developers with CVS access would be nice too - but I can just
trawl through the logs to spot active people ;)

Peter


From biopython-dev at maubp.freeserve.co.uk  Mon Jul 16 18:50:49 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Mon, 16 Jul 2007 19:50:49 +0100
Subject: [Biopython-dev] Is Bio.Crystal obsolete?
Message-ID: <469BBE09.1000005@maubp.freeserve.co.uk>

I just had a look at the Bio.Crystal module by Katharine Lindner (2002), 
consisting of the single file Bio/Crystal/__init__.py whose preamble states:

 > Hetero, Crystal and Chain exist to represent the NDB Atlas
 > structure.  Atlas is a minimal subset of the PDB format.  Heteo
 > supports a 3 alphameric code. The NDB web interface is located at
 > ...

The old link should probably be updated as it doesn't work, perhaps:
http://ndbserver.rutgers.edu/atlas/index.html

As far as I can see, they now provide their downloads in PDB, CIF and an 
XML file format - and the PDB files look like the full thing to me at first 
glance rather than a minimal subset.

There is a unit test, Tests/test_Crystal.py but no example input files.

This module looks obsolete to me - can we mark it as deprecated after 
checking on the main list no one uses it (as done for Bio.Kabat back in 
March 2007)?

Peter


From tiagoantao at gmail.com  Wed Jul 18 10:29:08 2007
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Wed, 18 Jul 2007 11:29:08 +0100
Subject: [Biopython-dev] PopGen code
Message-ID: <6d941f120707180329u6bf60c50o8e4868e5a470de2c@mail.gmail.com>

Hi!

Starting today I will begin putting code on CVS regarding Population
Genetics stuff.
I will start by checking in a GenePop parser and test code.
Very soon FDist code will follow.
After that Simcoal stuff, more GenePop stuff, HapMap and UCSC table
browser will follow.
I was not able to read dev.open-bio.org suggestions as it seems to be
down for some time.
If any of the senior Biopython developers finds that I am doing
anything seriously wrong, please don't hesitate to contact me
immediately.
I will be putting everything below a PopGen directory in Bio.
Everything except tests, of course ;)

Regards,
Tiago


From biopython-dev at maubp.freeserve.co.uk  Wed Jul 18 21:37:46 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Wed, 18 Jul 2007 22:37:46 +0100
Subject: [Biopython-dev] PopGen code
In-Reply-To: <6d941f120707180329u6bf60c50o8e4868e5a470de2c@mail.gmail.com>
References: <6d941f120707180329u6bf60c50o8e4868e5a470de2c@mail.gmail.com>
Message-ID: <320fb6e00707181437l22b1aecdh8ab5c2fa2aea7380@mail.gmail.com>

Tiago Ant?o wrote:
> Hi!
>
> Starting today I will begin putting code on CVS regarding Population
> Genetics stuff...
> I will be putting everything below a PopGen directory in Bio.
> Everything except tests, of course ;)

Sounds good :)

If you can write some introductory text to add to the
cookbook/tutorial that would be even better.  If you are not familiar
with LaTeX, then just write it up in plain text and I could add that
to the tutorial with suitable mark-up/formatting on your behalf.

This may be easier to do in chunks as you add new code, or in a large
batch later on - up to you.

Peter


From tiagoantao at gmail.com  Wed Jul 18 22:46:19 2007
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Wed, 18 Jul 2007 23:46:19 +0100
Subject: [Biopython-dev] PopGen code
In-Reply-To: <320fb6e00707181437l22b1aecdh8ab5c2fa2aea7380@mail.gmail.com>
References: <6d941f120707180329u6bf60c50o8e4868e5a470de2c@mail.gmail.com>
	<320fb6e00707181437l22b1aecdh8ab5c2fa2aea7380@mail.gmail.com>
Message-ID: <6d941f120707181546y34e17038nb07106dacae533db@mail.gmail.com>

Hi!

On 7/18/07, Peter <biopython-dev at maubp.freeserve.co.uk> wrote:
> If you can write some introductory text to add to the
> cookbook/tutorial that would be even better.  If you are not familiar
> with LaTeX, then just write it up in plain text and I could add that
> to the tutorial with suitable mark-up/formatting on your behalf.

I agree, in fact it is what I intend to do after having the FDist code in.
I will write mostly in parallel with committing. So the doc should be
more or less aligned with what is being put in CVS...

Regards,
Tiago


From tiagoantao at gmail.com  Thu Jul 19 13:09:29 2007
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Thu, 19 Jul 2007 14:09:29 +0100
Subject: [Biopython-dev] PopGen Documentation
Message-ID: <6d941f120707190609o4f5f7412x94851295865ba22b@mail.gmail.com>

Hi All,

Following Peter's suggestion, I had a closer look at the
documentation, and, if nobody opposes, I would like to add a new
subsection between PDB and Miscellaneous in the cookbook chapter, like
this:

4.10  Going 3D: The PDB module
4.11  PopGen: Population genetics (and genomics)
4.12  Miscellaneous

Tiago


From bugzilla-daemon at portal.open-bio.org  Sat Jul 21 15:28:49 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 21 Jul 2007 11:28:49 -0400
Subject: [Biopython-dev] [Bug 2269] Shebang (hashbang) lines need cleanup
In-Reply-To: <bug-2269-42@http.bugzilla.open-bio.org/>
Message-ID: <200707211528.l6LFSnBk031498@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2269


------- Comment #8 from mdehoon at ims.u-tokyo.ac.jp  2007-07-21 11:28 EST -------
In biopython/Bio/EUtils, the shebang line was specified in sourcegen.py. I
fixed that line and the shebang lines in the other *.py files under
biopython/Bio/EUtils. Can we close this bug?




From bugzilla-daemon at portal.open-bio.org  Sat Jul 21 15:47:32 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 21 Jul 2007 11:47:32 -0400
Subject: [Biopython-dev] [Bug 2291] __init__.py missing in the Bio.PDB.mmCIF
	folder after the install
In-Reply-To: <bug-2291-42@http.bugzilla.open-bio.org/>
Message-ID: <200707211547.l6LFlWxU032394@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2291


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #1 from mdehoon at ims.u-tokyo.ac.jp  2007-07-21 11:47 EST -------
I have modified setup.py so that Bio.PDB.mmCIF is always a module (but not
necessarily with the MMCIFlex module; users still need to modify setup.py to
include it). With Bio.PDB.mmCIF always present, the Bio/PDB/mmCIF/__init__.py
file is no longer lost.




From bugzilla-daemon at portal.open-bio.org  Sun Jul 22 08:30:11 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 22 Jul 2007 04:30:11 -0400
Subject: [Biopython-dev] [Bug 2269] Shebang (hashbang) lines need cleanup
In-Reply-To: <bug-2269-42@http.bugzilla.open-bio.org/>
Message-ID: <200707220830.l6M8UB6d006746@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2269


------- Comment #9 from biopython-bugzilla at maubp.freeserve.co.uk  2007-07-22 04:30 EST -------
Regarding comment 8, after changing sourcegen.py were you able to regenerate
all the biopython/Bio/EUtils/*.py files? Or did you just fix them by hand?

Anyway - that should leave us with consistent shebang/hashbang lines :)

Unless we also want to remove any surplus lines, and decide if all or none of
the unit tests should have them, then this bug looks done.




From bugzilla-daemon at portal.open-bio.org  Sun Jul 22 09:53:46 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 22 Jul 2007 05:53:46 -0400
Subject: [Biopython-dev] [Bug 2269] Shebang (hashbang) lines need cleanup
In-Reply-To: <bug-2269-42@http.bugzilla.open-bio.org/>
Message-ID: <200707220953.l6M9rkap010929@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2269


mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #10 from mdehoon at ims.u-tokyo.ac.jp  2007-07-22 05:53 EST -------
> Regarding comment 8, after changing sourcegen.py were you able to regenerate
> all the biopython/Bio/EUtils/*.py files? Or did you just fix them by hand?

I fixed them by hand. The fixed sourcegen.py should result in the same
biopython/Bio/EUtils/*.py files as I created by hand. I tried regenerating
these files automatically, but that didn't work for me. At some point, somebody
should figure out how the biopython/Bio/EUtils code works.

> Unless we also want to remove any surplus lines, and decide if all or none of
> the unit tests should have them, then this bug looks done.

Since Python itself does not seem to have a clear rule as to which files should
have a shebang line, it is not obvious which Biopython files should have one.
If somebody really wants to fix this, it's probably better to discuss such an
issue on the mailing list first. As the issue raised by the original bug report
has been resolved, I am closing this bug.




From mdehoon at c2b2.columbia.edu  Sun Jul 22 10:28:22 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Sun, 22 Jul 2007 19:28:22 +0900
Subject: [Biopython-dev] Bio.AlignIO (was Re: [BioPython] Bio.SeqIO and
	files with one record)
In-Reply-To: <4693E5FE.708@maubp.freeserve.co.uk>
References: <4693E5FE.708@maubp.freeserve.co.uk>
Message-ID: <46A33146.7030405@c2b2.columbia.edu>

Peter wrote:
> P.S. Any comments on the Bio.AlignIO ideas I raised back in May 2007?
> http://lists.open-bio.org/pipermail/biopython/2007-May/003472.html
> 
Let's discuss the Bio.Align.Alignment class first, and then decide how 
to parse alignment files.

Currently, the alignment class holds a list of SeqRecord objects:


class Alignment:
     ...
     def __init__(self, alphabet):
         ...
         # hold everything at a list of seq record objects
         self._records = []

To get access to self._records, the Alignment class has some accessor 
functions:

     def get_all_seqs(self):
         ...
         return self._records


     def get_seq_by_num(self, number):
         ...
         return self._records[number].seq

A cleaner way to do this is to let the class Alignment inherit from 
list. This also allows us to use all list methods on Alignment objects. 
For example, we can iterate over them, as suggested in this bug report:

http://bugzilla.open-bio.org/show_bug.cgi?id=1944
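
A minimal sketch of the inheritance idea, written for a current Python
interpreter. The Record type is a hypothetical stand-in for SeqRecord, and
only the two accessor names come from the excerpt above; everything else is
an assumption made for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in for Bio.SeqRecord.SeqRecord, used only in this sketch.
Record = namedtuple("Record", ["id", "seq"])


class Alignment(list):
    """Toy alignment that is itself the list of records."""

    def __init__(self, alphabet=None):
        list.__init__(self)
        self.alphabet = alphabet

    def get_all_seqs(self):
        # The list itself replaces the private self._records attribute.
        return self

    def get_seq_by_num(self, number):
        return self[number].seq


alignment = Alignment()
alignment.append(Record("alpha", "ACGT-A"))
alignment.append(Record("beta", "ACG-TA"))

# Iteration, len(), append() and so on all come from list.
ids = [record.id for record in alignment]
```

With this layout the old accessors become thin wrappers, so existing callers
could keep working while new code uses the list interface directly.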

Any objections against letting Alignment inherit from list?


--Michiel


From salish at picasso.ucsf.edu  Sun Jul 22 18:27:58 2007
From: salish at picasso.ucsf.edu (Howard Salis)
Date: Sun, 22 Jul 2007 11:27:58 -0700
Subject: [Biopython-dev] Bio.AlignIO (was Re: [BioPython] Bio.SeqIO and
	files with one record)
In-Reply-To: <46A33146.7030405@c2b2.columbia.edu>
References: <4693E5FE.708@maubp.freeserve.co.uk>
	<46A33146.7030405@c2b2.columbia.edu>
Message-ID: <9fa7e98e0707221127v5b7b2a85x38978fd647e18931@mail.gmail.com>

Hello all,


To get this same behavior, you can also create the __iter__ and next()
methods in Alignment itself.
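
A rough sketch of that alternative, assuming the records stay in a private
list. In current Python, __iter__ can simply hand back the list's own
iterator, so no hand-written next() is needed; add_record is a hypothetical
helper that exists only for this example.

```python
class Alignment:
    """Toy alignment that wraps a list instead of inheriting from it."""

    def __init__(self):
        self._records = []

    def add_record(self, record):  # hypothetical helper for this sketch
        self._records.append(record)

    def __iter__(self):
        # Delegate to the underlying list's iterator.
        return iter(self._records)

    def __len__(self):
        return len(self._records)


alignment = Alignment()
alignment.add_record("ACGT-A")
alignment.add_record("ACG-TA")
rows = [row for row in alignment]
```

Each further list-like behaviour (indexing, slicing, and so on) would need
its own method with this approach.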

-Howard Salis


From mdehoon at c2b2.columbia.edu  Wed Jul 25 13:17:33 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Wed, 25 Jul 2007 22:17:33 +0900
Subject: [Biopython-dev] Bio.AlignIO (was Re: [BioPython] Bio.SeqIO and
 files with one record)
In-Reply-To: <9fa7e98e0707221127v5b7b2a85x38978fd647e18931@mail.gmail.com>
References: <4693E5FE.708@maubp.freeserve.co.uk>	<46A33146.7030405@c2b2.columbia.edu>
	<9fa7e98e0707221127v5b7b2a85x38978fd647e18931@mail.gmail.com>
Message-ID: <46A74D6D.9020309@c2b2.columbia.edu>

Sure, that is possible, but that means we'd be adding methods to 
Alignment in order for it to behave like a list, whereas we can get 
that for free by letting the Alignment class inherit from list.

--Michiel.

Howard Salis wrote:
> Hello all,
> 
> 
> To get this same behavior, you can also create the __iter__ and next()
> methods in Alignment itself.
> 
> -Howard Salis


From biopython-dev at maubp.freeserve.co.uk  Wed Jul 25 13:34:02 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Wed, 25 Jul 2007 14:34:02 +0100
Subject: [Biopython-dev] Bio.AlignIO
In-Reply-To: <46A74D6D.9020309@c2b2.columbia.edu>
References: <4693E5FE.708@maubp.freeserve.co.uk>	<46A33146.7030405@c2b2.columbia.edu>	<9fa7e98e0707221127v5b7b2a85x38978fd647e18931@mail.gmail.com>
	<46A74D6D.9020309@c2b2.columbia.edu>
Message-ID: <46A7514A.1090405@maubp.freeserve.co.uk>

Michiel de Hoon wrote:
> Sure, that is possible, but that means we'd be adding methods to 
> Alignment in order for it to behave like a list, whereas we can get 
> that for free by letting the Alignment class inherit from list.
> 
> --Michiel.

Personally I see an alignment as both an array of characters (i.e. amino 
acid residues or nucleotides), and a list of sequences.

In the same way that a Numeric or NumPy array lets you iterate over 
rows, yet also access individual elements, we could allow iteration of 
SeqRecords and also allow access to individual letters.

Peter


From mdehoon at c2b2.columbia.edu  Wed Jul 25 14:44:56 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Wed, 25 Jul 2007 23:44:56 +0900
Subject: [Biopython-dev] Bio.AlignIO
In-Reply-To: <46A7514A.1090405@maubp.freeserve.co.uk>
References: <4693E5FE.708@maubp.freeserve.co.uk>	<46A33146.7030405@c2b2.columbia.edu>	<9fa7e98e0707221127v5b7b2a85x38978fd647e18931@mail.gmail.com>
	<46A74D6D.9020309@c2b2.columbia.edu>
	<46A7514A.1090405@maubp.freeserve.co.uk>
Message-ID: <46A761E8.5080909@c2b2.columbia.edu>

Peter wrote:
> Personally I see an alignment as both an array of characters (i.e. amino 
> acid residues or nucleotides), and a list of sequences.
> 
> In the same way that a Numeric or NumPy array lets you iterate over 
> rows, yet also access individual elements, we could allow iteration of 
> SeqRecords and also allow access to individual letters.

How about the following:

-Iterators iterate over the SeqRecords in the alignment

-An index of the form [xxx] returns the corresponding SeqRecord

-An index of the form [xxx:yyy:zzz] returns an Alignment object 
containing the SeqRecords in rows [xxx:yyy:zzz]
(compare to the current method get_all_seqs()).

-An index of the form [xxx,:] returns the Seq object of the SeqRecord at 
xxx (this is currently done by the get_seq_by_num() method).

-An index of the form [xxx:yyy:zzz,:] returns a list of Seq objects

-An index of the form [:,www] returns a string containing the characters 
  at column www (which is currently done by the get_column method)

-An index of the form [xxx:yyy:zzz,www] returns a string containing the 
characters at column www using only the rows xxx:yyy:zzz.

-An index of the form [xxx,www] returns a string containing the 
character of the sequence in row xxx at column www.

This is more-or-less how Numerical Python arrays work, except that we'll 
be returning SeqRecord/Seq/string objects depending on the indices.
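
The rules above could be sketched roughly as follows. This is an
illustration under assumptions, not Biopython code: Record is a hypothetical
stand-in for SeqRecord, a plain str stands in for Seq, and column slices
(which the list above does not mention) are rejected.

```python
from collections import namedtuple

# Hypothetical stand-in for SeqRecord; seq is a plain str standing in for Seq.
Record = namedtuple("Record", ["id", "seq"])


class Alignment(list):
    def __getitem__(self, index):
        if isinstance(index, slice):
            # [xxx:yyy:zzz] -> Alignment holding the selected rows
            return Alignment(list.__getitem__(self, index))
        if not isinstance(index, tuple):
            # [xxx] -> the record itself
            return list.__getitem__(self, index)
        rows, col = index
        whole_row = (col == slice(None))
        if isinstance(col, slice) and not whole_row:
            raise TypeError("column slices are not covered by this sketch")
        if isinstance(rows, slice):
            selected = list.__getitem__(self, rows)
            if whole_row:
                # [xxx:yyy:zzz,:] -> list of sequences
                return [record.seq for record in selected]
            # [:,www] and [xxx:yyy:zzz,www] -> string of column characters
            return "".join(record.seq[col] for record in selected)
        record = list.__getitem__(self, rows)
        # [xxx,:] -> the sequence; [xxx,www] -> a single character
        return record.seq if whole_row else record.seq[col]


aln = Alignment([Record("a", "ACGT"), Record("b", "AC-T"), Record("c", "A-GT")])
column = aln[:, 2]       # "G-G"
partial = aln[0:2, 2]    # "G-"
letter = aln[2, 1]       # "-"
```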

--Michiel.


From biopython-dev at maubp.freeserve.co.uk  Wed Jul 25 16:10:43 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Wed, 25 Jul 2007 17:10:43 +0100
Subject: [Biopython-dev] Improving the Alignment object. Was Bio.AlignIO
In-Reply-To: <46A761E8.5080909@c2b2.columbia.edu>
References: <4693E5FE.708@maubp.freeserve.co.uk>	<46A33146.7030405@c2b2.columbia.edu>	<9fa7e98e0707221127v5b7b2a85x38978fd647e18931@mail.gmail.com>	<46A74D6D.9020309@c2b2.columbia.edu>	<46A7514A.1090405@maubp.freeserve.co.uk>
	<46A761E8.5080909@c2b2.columbia.edu>
Message-ID: <46A77603.1030101@maubp.freeserve.co.uk>

Michiel de Hoon wrote:
> Peter wrote:
>> Personally I see an alignment as both an array of characters (i.e. amino 
>> acid residues or nucleotides), and a list of sequences.
>>
>> In the same way that a Numeric or NumPy array lets you iterate over 
>> rows, yet also access individual elements, we could allow iteration of 
>> SeqRecords and also allow access to individual letters.
> 
> How about the following:
> 
> -Iterators iterate for the SeqRecords in the alignment

I agree. And this is trivial to implement without needing the element 
access/slicing support.

As to element access, we've been thinking along similar lines :)
It's just that with all the different special cases, there are lots of 
different possible return types!

> -An index of the form [xxx] returns the corresponding SeqRecord
> -An index of the form [xxx:yyy:zzz] returns an Alignment object 
>  containing the SeqRecords in rows [xxx:yyy:zzz]
>  (compare to the current method get_all_seqs()).

I agree. This is essential to make an alignment act like a list of 
SeqRecord objects when only a one-dimensional index is given.

> -An index of the form [xxx,:] returns the Seq object of the SeqRecord at 
> xxx (this is currently done by the get_seq_by_num() method).
> -An index of the form [xxx:yyy:zzz,:] returns a list of Seq objects

I'm not immediately convinced about returning Seq objects here.  I might 
expect indices like [xxx,:] to return a SeqRecord (not a Seq) and 
[xxx:yyy:zzz,:] to return a sub-alignment (not a list of Seq objects).

> -An index of the form [:,www] returns a string containing the characters 
>  at column www (which is currently done by the get_column method)
> -An index of the form [xxx,www] returns a string containing the 
>  character of the sequence in row xxx at column www.

Those look fine - however we might want to return Seq objects rather 
than strings.

 > -An index of the form [xxx:yyy:zzz,www] returns a string containing
 >  the characters at column www using only the rows xxx:yyy:zzz.

Or a sub alignment? See later...

> This is more-or-less how Numerical Python arrays work, except that we'll 
> be returning SeqRecord/Seq/string objects depending on the indices.

For comparison, this is what I had been thinking:
* [r,c] means one element is requested, return a single character string
* [r] or [r,:] means one row is requested, return a SeqRecord
* [:,c] means one column is requested, return a string (or Seq object?)
* Otherwise returns a (sub)alignment. Note that [:] or [:,:] would 
return a copy of the alignment.

This would cover slicing of the column index by returning a 
sub-alignment. i.e. indexes of the form [rrr, xxx:yyy:zzz] or 
[rrr:ppp:qqq, xxx:yyy:zzz]

I'm not sure if requests for part of a single row or column like [rrr, 
xxx:yyy:zzz] and [rrr:ppp:qqq, xxx] are best handled by returning 
sub-alignments or as special cases (strings/Seq and Seq/SeqRecord 
respectively?).
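
For comparison, a rough sketch of this second scheme. Record is again a
hypothetical stand-in for SeqRecord with a plain str sequence. The two
undecided cases are resolved arbitrarily here: [rrr, xxx:yyy:zzz] gives a
one-row sub-alignment and [rrr:ppp:qqq, xxx] gives a string, purely to keep
the example short.

```python
from collections import namedtuple

# Hypothetical stand-in for SeqRecord; seq is a plain str.
Record = namedtuple("Record", ["id", "seq"])


class Alignment(list):
    def __getitem__(self, index):
        if isinstance(index, tuple):
            rows, cols = index
        else:
            rows, cols = index, slice(None)
        one_row = not isinstance(rows, slice)
        one_col = not isinstance(cols, slice)
        if one_row and one_col:
            # [r,c] -> single character
            return list.__getitem__(self, rows).seq[cols]
        if one_row and cols == slice(None):
            # [r] or [r,:] -> the record
            return list.__getitem__(self, rows)
        if one_row:
            # Part of one row: kept as a one-row sub-alignment (arbitrary choice).
            selected = [list.__getitem__(self, rows)]
        else:
            selected = list.__getitem__(self, rows)
        if one_col:
            # [:,c] -> string; a row slice is treated the same way (arbitrary choice).
            return "".join(record.seq[cols] for record in selected)
        # Everything else -> sub-alignment, so [:] and [:,:] give a copy.
        return Alignment(Record(r.id, r.seq[cols]) for r in selected)


aln = Alignment([Record("a", "ACGT"), Record("b", "AC-T"), Record("c", "A-GT")])
block = aln[0:2, 1:3]
copy = aln[:]
```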

Peter


From bugzilla-daemon at portal.open-bio.org  Thu Jul 26 14:52:38 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 26 Jul 2007 10:52:38 -0400
Subject: [Biopython-dev] [Bug 2340] New: SProt.py fails to parse the current
	Swiss-Prot version 54.0
Message-ID: <bug-2340-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2340

           Summary: SProt.py fails to parse the current Swiss-Prot version
                    54.0
           Product: Biopython
           Version: 1.43
          Platform: All
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: gould at embl.de


Hi, 

I'm running on a red hat linux box on python 2.3.4 and am trying to parse any
swiss-prot record but the parser just seems to bomb out not throwing an error
of where it actually fails. I'm guessing it has to do with the Release 54.0 of
24-Jul-07 of UniPROT with the addition of the new line type PE??




From bugzilla-daemon at portal.open-bio.org  Thu Jul 26 15:46:36 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 26 Jul 2007 11:46:36 -0400
Subject: [Biopython-dev] [Bug 2340] SProt.py fails to parse the current
	Swiss-Prot version 54.0
In-Reply-To: <bug-2340-42@http.bugzilla.open-bio.org/>
Message-ID: <200707261546.l6QFkaGq022472@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2340


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2007-07-26 11:46 EST -------
Hi Kate,

Could you give us the URL of one or two specific SwissProt files you're having
trouble with.

Also how are you trying to read the SwissProt files? e.g. with
Bio.SeqIO.parse()?

If you could include the python error too, that could be helpful. Thanks.

Peter




From bugzilla-daemon at portal.open-bio.org  Thu Jul 26 16:06:15 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 26 Jul 2007 12:06:15 -0400
Subject: [Biopython-dev] [Bug 2340] SProt.py fails to parse the current
	Swiss-Prot version 54.0
In-Reply-To: <bug-2340-42@http.bugzilla.open-bio.org/>
Message-ID: <200707261606.l6QG6FkE023264@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2340


------- Comment #2 from gould at embl.de  2007-07-26 12:06 EST -------
(In reply to comment #0)
> Hi, 
> 
> I'm running on a red hat linux box on python 2.3.4 and am trying to parse any
> swiss-prot record but the parser just seems to bomb out not throwing an error
> of where it actually fails. I'm guessing it has to do with the Release 54.0 of
> 24-Jul-07 of UniPROT with the addition of the new line type PE??
> 

(In reply to comment #1)
> Hi Kate,
> 
> Could you give us the URL of one or two specific SwissProt files you're having
> trouble with.
> 
> Also how are you trying to read the SwissProt files? e.g. with
> Bio.SeqIO.parse()?
> 
> If you could include the python error too, that could be helpful. Thanks.
> 
> Peter
> 

hi 
the following snippet of code is where the error occurs (this used to work no
problem before something changed in the last day or two I guess)

def getSequence(self, acc):
    """ This method retrieves the most recent annotated sequence from the ExPASy
    server for a given accession number. """

    from Bio.WWW import ExPASy
    from Bio.SwissProt import SProt
    from Bio import File

    if acc != '':
        try:
            results = ExPASy.get_sprot_raw(acc.strip()).read()
            sp_parser = SProt.RecordParser()
            sp_iterator = SProt.Iterator(File.StringHandle(results),
                                         sp_parser)
            Record = sp_iterator.next()
            return Record.sequence.strip()
        except:
            return -1
    else:
        return acc


breaks at line : Record = sp_iterator.next() but doesn't print any error to
terminal....
some examples of accessions nrs used are: P01100, P12522 etc

thanks
Kate




From bugzilla-daemon at portal.open-bio.org  Thu Jul 26 16:32:31 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 26 Jul 2007 12:32:31 -0400
Subject: [Biopython-dev] [Bug 2340] SProt.py fails to parse the current
	Swiss-Prot version 54.0
In-Reply-To: <bug-2340-42@http.bugzilla.open-bio.org/>
Message-ID: <200707261632.l6QGWVrC024560@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2340


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED


------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2007-07-26 12:32 EST -------
Confirming bug - it is due to the new PE line (protein evidence).

The reason you didn't see the error is in your example the parser is wrapped in
a try ... except ... clause.
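
A small illustration of how a bare except hides a parser error, and one way
to keep the traceback visible. parse_record and the toy record are
hypothetical stand-ins for the real parser and data, not Biopython code.

```python
import logging


def parse_record(text):
    """Hypothetical parser that rejects any line type it does not know."""
    for line in text.splitlines():
        if line[:2] not in ("ID", "SQ", "//"):
            raise ValueError("unexpected line type: %r" % line[:2])
    return "parsed"


def get_sequence_silent(text):
    try:
        return parse_record(text)
    except:  # bare except: the ValueError vanishes without a trace
        return -1


def get_sequence_logged(text):
    try:
        return parse_record(text)
    except ValueError:
        # Report the traceback, then let the caller see the failure.
        logging.exception("could not parse record")
        raise


bad_record = "ID   TOY\nPE   1\n//"
silent_result = get_sequence_silent(bad_record)
```

Catching only the expected exception type, and logging or re-raising it,
keeps the failing line visible when an upstream format changes.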




From bugzilla-daemon at portal.open-bio.org  Thu Jul 26 16:51:45 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 26 Jul 2007 12:51:45 -0400
Subject: [Biopython-dev] [Bug 2340] SProt.py fails to parse the current
	Swiss-Prot version 54.0
In-Reply-To: <bug-2340-42@http.bugzilla.open-bio.org/>
Message-ID: <200707261651.l6QGpja8025622@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2340


------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk  2007-07-26 12:51 EST -------
I think I have fixed this - at least your example code now works.

You'll need to update the file Bio/SwissProt/SProt.py to revision 1.38 from
CVS, which you can download here:

http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/SwissProt/SProt.py?cvsroot=biopython

Don't forget to backup the old Bio/SwissProt/SProt.py first, in case you want
to put things back.

Please test this and report back.

NOTE - The fix just makes the parser aware of the new PE line, and ignores it. 
It doesn't (yet) do anything useful with the information it contains!
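
The general idea of "recognise the line type, then skip it" can be sketched
as below. This is not the code in SProt.py; collect_sequence, its set of
handled line codes, and the toy record are all assumptions made for
illustration.

```python
def collect_sequence(record_text, ignored=("PE",)):
    """Toy line-based reader: return the sequence, skipping ignored line types."""
    handled = ("ID", "SQ", "  ", "//")
    sequence = []
    for line in record_text.splitlines():
        code = line[:2]
        if code in ignored:
            continue  # known but unused line type: accept it and move on
        if code not in handled:
            raise ValueError("unknown line type: %r" % code)
        if code == "  ":
            # Sequence lines start with blanks; drop all spacing.
            sequence.append(line.replace(" ", ""))
    return "".join(sequence)


# Made-up record, not real Swiss-Prot data.
toy_record = (
    "ID   TOY_EXAMPLE\n"
    "PE   1: Evidence at protein level;\n"
    "SQ   SEQUENCE\n"
    "     MKVL AAGI\n"
    "//"
)
sequence = collect_sequence(toy_record)
```

Calling collect_sequence(toy_record, ignored=()) raises ValueError on the PE
line, which mirrors the failure mode described in this bug.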




From bugzilla-daemon at portal.open-bio.org  Fri Jul 27 06:46:35 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 27 Jul 2007 02:46:35 -0400
Subject: [Biopython-dev] [Bug 2340] SProt.py fails to parse the current
	Swiss-Prot version 54.0
In-Reply-To: <bug-2340-42@http.bugzilla.open-bio.org/>
Message-ID: <200707270646.l6R6kZaI001699@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2340


------- Comment #5 from gould at embl.de  2007-07-27 02:46 EST -------
(In reply to comment #4)
> I think I have fixed this - at least your example code now works.
> 
> You'll need to update the file Bio/SwissProt/SProt.py to revision 1.38 from
> CVS, which you can download here:
> 
> http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/SwissProt/SProt.py?cvsroot=biopython
> 
> Don't forget to backup the old Bio/SwissProt/SProt.py first, in case you want
> to put things back.
> 
> Please test this and report back.
> 
> NOTE - The fix just makes the parser aware of the new PE line, and ignores it. 
> It doesn't (yet) do anything useful with the information it contains!
> 

Yes it has done the trick and all works OK again. thanks




From bugzilla-daemon at portal.open-bio.org  Fri Jul 27 07:54:14 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 27 Jul 2007 03:54:14 -0400
Subject: [Biopython-dev] [Bug 2340] SProt.py fails to parse the current
	Swiss-Prot version 54.0
In-Reply-To: <bug-2340-42@http.bugzilla.open-bio.org/>
Message-ID: <200707270754.l6R7sEnm007432@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2340


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED


------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk  2007-07-27 03:54 EST -------
Great :)


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From kosa at genesilico.pl  Fri Jul 27 10:47:10 2007
From: kosa at genesilico.pl (Jan Kosinski)
Date: Fri, 27 Jul 2007 12:47:10 +0200
Subject: [Biopython-dev] syntax of indices for future Alignment object
Message-ID: <46A9CD2E.6080402@genesilico.pl>

Hi,

 From the viewpoint of the end user, we would like the Python Alignment
object to behave like an array, so that we can get slices, columns,
sequences, fragments of sequences, and so on. The most intuitive and
clear syntax for the user (certainly much clearer than indexes like
[xxx:yyy:zzz]) is the following.

[A:B][X:Y] - general syntax of indices. This supports almost everything.

Several examples of usage and proposed outputs:

[:][:] - returns an alignment or its copy (as Alignment object)

[:][x:y] - returns slice of the alignment (as Alignment object; aln of
all sequences and residues corresponding to columns x to y)

[a:b][:] - returns the aln of seqs from a to b (as Alignment object)

[a:b][x:y] - returns the slice and subalignment (as Alignment object)

[a:a][x:y] - returns slice of the single sequence (residues x to y of
sequence a) (as Alignment object)
[a][x:y] - returns slice of the single sequence (residues x to y of
sequence a) (as a String)

[a:][x:y] and similar combinations - returns the slice and subalignment,
sequences from a to the last are included (as Alignment object)

[:][x] - returns single column (as a String object? string here could be
very useful)

[:][x:x] - returns single column (as Alignment object)

[a] - returns single sequence (as a SeqRecord object)
[a:a] and [a:a][:] - returns single sequence (as Alignment object)

[m][n] - returns n-th element of sequence m (as a String)

It is debatable whether similar sets of indices should return
different types of objects (e.g. [:][x] would return a column as a string
while [:][x:x] would return a column as an Alignment object), but in my
opinion this would just extend the usability.

The only problem is the implementation of such calls, but that depends on
what type of object the Alignment object will be.

What do you think?

Cheers,
Jan Kosinski
Grzegorz Papaj




From bugzilla-daemon at portal.open-bio.org  Fri Jul 27 12:51:10 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 27 Jul 2007 08:51:10 -0400
Subject: [Biopython-dev] [Bug 1944] Align.Generic adding iterator and more
In-Reply-To: <bug-1944-42@http.bugzilla.open-bio.org/>
Message-ID: <200707271251.l6RCpAIg025706@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1944


------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk  2007-07-27 08:51 EST -------
Created an attachment (id=721)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=721&action=view)
Patch for Bio/Align/Generic.py to add __getitem__ method

This patch adds a __getitem__ method, a small "mini test" when running the
module directly, and updates the doc strings.  This gives SeqRecord iteration
"for free" (without an explicit __iter__ method).

As discussed on the mailing list, this allows an Alignment object to be treated
as a list of SeqRecord objects or as an array of character strings - plus
extract whole columns as strings.

Quoting the proposed __getitem__ doc string:

        Depending on the indices, you can get a SeqRecord object
        (representing a single row), a string (for a single column or
        a single character) or another alignment (representing some or
        part of the alignment).

        align[r,c] gives a single character as a string
        align[r] gives a SeqRecord
        align[:,c] gives a column as a string
        align[:] and align[:,:] give a copy of the alignment

        Anything else gives a sub alignment, e.g.
        align[0:2] or align[0:2,:] uses only row 0 and 1
        align[:,1:3] uses only columns 1 and 2
        align[0:2,1:3] uses only rows 0 & 1 and only cols 1 & 2
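
The rules quoted above can be sketched roughly as follows. This is only
an illustration and not the code in the attached patch: ToyAlignment is
a hypothetical class whose rows are plain strings, so a row comes back
as a string where the real class would return a SeqRecord.

```python
class ToyAlignment:
    """Hypothetical stand-in: rows are plain strings of equal length."""

    def __init__(self, rows):
        self.rows = list(rows)

    def __getitem__(self, key):
        if not isinstance(key, tuple):
            if isinstance(key, int):
                return self.rows[key]            # stands in for a SeqRecord
            return ToyAlignment(self.rows[key])  # align[0:2], align[:]
        if len(key) != 2:
            raise TypeError("expected a row index and a column index")
        row_key, col_key = key
        if isinstance(row_key, int) and isinstance(col_key, int):
            return self.rows[row_key][col_key]   # single character
        # Treat an integer row index as a one-row selection.
        if isinstance(row_key, int):
            rows = [self.rows[row_key]]
        else:
            rows = self.rows[row_key]
        if isinstance(col_key, int):
            return "".join(row[col_key] for row in rows)  # column as string
        return ToyAlignment([row[col_key] for row in rows])


align = ToyAlignment(["ATCGTTGC", "ATCCTTGC", "ATCCGTGC"])
print(align[1, 3])           # single character
print(align[:, 4])           # whole column as a string
print(align[0:2, 1:3].rows)  # rows 0 and 1, columns 1 and 2
```

In this sketch an integer row with a column slice, e.g. align[2,:],
gives a one-row sub alignment, following the "anything else" rule.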

Feedback welcome - either here, or on the developers' mailing list.  Thanks


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From biopython-dev at maubp.freeserve.co.uk  Fri Jul 27 12:18:21 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Fri, 27 Jul 2007 13:18:21 +0100
Subject: [Biopython-dev] syntax of indices for future Alignment object
In-Reply-To: <46A9CD2E.6080402@genesilico.pl>
References: <46A9CD2E.6080402@genesilico.pl>
Message-ID: <46A9E28D.40609@maubp.freeserve.co.uk>

Jan Kosinski wrote:
> Hi,
> 
>  From the viewpoint of the enduser we would like python Alignment object
> to behave outside as an array so we could get slices, columns,
> sequences, their fragments, whatever we want etc. The most intuitive and
> clear (certainly much better than not very clear indexes like
> [xxx:yyy:zzz]) for the  user is the following.
> 
> [A:B][X:Y] - general syntax of indices. This supports almost everything.

I think Michiel and I were suggesting [A:B,X:Y] or rather [A:B:C,X:Y:Z] 
to be fully general, rather than [A:B][X:Y] or [A:B:C][X:Y:Z]

i.e. [arg1, arg2] rather than [arg1][arg2]

This is an important point, as in the first case the __getitem__ method 
of the alignment is called once (with both arguments). In the second 
case, the __getitem__ method is called with arg1, and may return a 
SeqRecord or an alignment - and this object's __getitem__ method is 
called with arg2.

As written, many of your cases appear to be impossible - but using the 
[arg1,arg2] we can get close.
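
To illustrate the difference, here is a minimal sketch. The Probe class
is purely hypothetical (it is not part of Biopython) and only records
what its __getitem__ method receives in each case:

```python
class Probe:
    """Hypothetical helper: remembers every key passed to __getitem__."""

    def __init__(self, log=None):
        self.log = [] if log is None else log

    def __getitem__(self, key):
        self.log.append(key)
        # Return another Probe sharing the same log, so that chained
        # indexing such as p[a][b] can be observed as well.
        return Probe(self.log)


p = Probe()
p[1:3, 4:6]          # one call; the key is a tuple of two slices
single_call = list(p.log)

q = Probe()
q[1:3][4:6]          # two calls; each receives one slice
chained_calls = list(q.log)

print(single_call)
print(chained_calls)
```

With [arg1,arg2] the alignment sees both arguments at once, so it can
decide what to return. With [arg1][arg2] the second index is handled by
whatever the first call returned.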

I've got a working bit of code put together now which I'll attach to 
bug 1944 soon.

http://bugzilla.open-bio.org/show_bug.cgi?id=1944

Peter


From kosa at genesilico.pl  Fri Jul 27 14:13:24 2007
From: kosa at genesilico.pl (Jan Kosinski)
Date: Fri, 27 Jul 2007 16:13:24 +0200
Subject: [Biopython-dev] syntax of indices for future Alignment object
In-Reply-To: <46A9E28D.40609@maubp.freeserve.co.uk>
References: <46A9CD2E.6080402@genesilico.pl>
	<46A9E28D.40609@maubp.freeserve.co.uk>
Message-ID: <46A9FD84.4080502@genesilico.pl>

Hi,

Ok, I agree that [A:B][X:Y] syntax is not possible here. [A:B,X:Y] is 
fine. However, I would recommend not using [A:B:C,X:Y:Z] since it is not 
clear while the [A:B][X:Y] syntax is clear and sufficient.

We had another discussion in the lab about whether the Alignment object 
should store records not in a list but rather in a dictionary (while 
keeping information about sequence order) or similar.  What is your 
reasoning for making the Alignment object a list of SeqRecord objects?
One should think carefully about the design of the Alignment class since 
it will influence all further steps. As the class is now in its infancy, 
this is a very good moment to think about what the Alignment class is for 
and what it should support. For instance, the Alignment object should 
support changing characters in the alignment without the need to copy 
it (using  aln[a][x] = "D"). Can this be done now with an Alignment which 
is a list of SeqRecord objects with sequences implemented as immutable 
Seq objects?

Cheers,
Jan Kosinski




From kosa at genesilico.pl  Fri Jul 27 14:35:15 2007
From: kosa at genesilico.pl (Jan Kosinski)
Date: Fri, 27 Jul 2007 16:35:15 +0200
Subject: [Biopython-dev] syntax of indices for future Alignment object
In-Reply-To: <46A9FD84.4080502@genesilico.pl>
References: <46A9CD2E.6080402@genesilico.pl>
	<46A9E28D.40609@maubp.freeserve.co.uk>
	<46A9FD84.4080502@genesilico.pl>
Message-ID: <46AA02A3.30000@genesilico.pl>

Hi,

Sorry for the typo ;-) Of course it should read:
"... while the [A:B,X:Y] syntax is clear and sufficient."

Cheers,
Janek



From biopython-dev at maubp.freeserve.co.uk  Fri Jul 27 17:11:03 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Fri, 27 Jul 2007 18:11:03 +0100
Subject: [Biopython-dev] Improving the Alignment object
In-Reply-To: <46A9FD84.4080502@genesilico.pl>
References: <46A9CD2E.6080402@genesilico.pl>	<46A9E28D.40609@maubp.freeserve.co.uk>
	<46A9FD84.4080502@genesilico.pl>
Message-ID: <46AA2727.103@maubp.freeserve.co.uk>

Jan Kosinski wrote:
> We had another discussion in the lab about that Alignment object should 
> not store records in the list but rather in a dictionary (but keeping 
> information about sequence order ) or so.  What is you reasoning for 
> making Alignment object a list of SeqRecord objects?

In a sense the Bio.Align.Generic.Alignment object always was a list of 
SeqRecords (if you look at the internal implementation that is), and I 
hadn't stopped to really question it. I like having list like behaviour 
and exploit this in a lot of my code dealing with alignments.

There are some nice things about having dictionary like behaviour in an 
alignment class, but unless a notional sequence order is preserved, this 
breaks the array of characters model.

Also, using a dictionary like alignment would force the user to specify 
unique keys for each record (e.g. the record.id) which is something the 
current list-like-alignment does not require.

Perhaps we could have a "dictionary like" sub class of Alignment where 
the __getitem__ method would allow a record identifier in place of a row 
index:

print aln["P3454"]
print aln["P3454", 20]

instead of, or as well as:

print aln[10]
print aln[10, 20]

> One should carefully think about design of the Alignment class since it 
> will influence all further steps. As now the class is in its infancy 
> there is a very good moment for thinking what the Alignment class is for 
> and what it should support.

I had viewed the new __getitem__ method as a backwards compatible 
enhancement of the existing stable (but rather limited) 
Bio.Align.Generic.Alignment class. That's not to say we can't design a 
new class from scratch - I just prefer gradual improvements without 
breaking existing usage.

I am particularly keen to allow slicing of alignments. For example, you 
could select the conserved core of an alignment by removing the 
leftmost 10 columns and the rightmost 10 columns:

align_core = aln[:,10:-10]

 > For instance, the Alignment object should
> support changing characters in the alignment without a need of copying 
> it (using  aln[a,x] = "D"). Can it be done now with Alignment which is 
> a list of SeqRecord objects with sequences implemented as immutable Seq 
> objects ?

No, right now you can't easily edit sequences in a 
Bio.Align.Generic.Alignment (even with the proposed change) as it is 
implemented using immutable Seq objects. I personally haven't needed to 
edit an alignment like this.  Is this something you want to do often?

To me the obvious way to handle this is to have a MutableAlignment 
sub-class, where editing individual elements with aln[r,c] = "D" would 
be supported (possibly implemented using the MutableSeq class internally 
rather than the immutable Seq class).

On a related point, I was planning to raise the following suggestion in 
the future - adding alignments, like this:

combined_aln = aln1 + aln2

e.g. aln1 had 5 rows of length 10, and aln2 had 5 rows of length 15, 
then the result of aln1+aln2 would have 5 rows of length 25.

Alignment addition would only be defined for alignments with the same 
number of rows (perhaps also restricted to the same sequence type, and 
row weights?). The result would contain the same number of rows, where 
each sequence was the concatenation of the corresponding two rows in the 
input alignments. I'd suggest concatenating the record.id's (if 
different) however one could argue that it would be better to insist the 
user had made sure the two alignments had consistent identifiers.

An example of where this could be used is taking alignments of multiple 
sets of homologous genes, sorting them to use the same species order, 
and then creating a concatenated alignment for robust phylogenetic tree 
construction.
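
A rough sketch of the idea, using plain (identifier, sequence) pairs
rather than real Alignment objects. The concatenate function and the
"+" used to join differing identifiers are assumptions made for this
illustration only:

```python
def concatenate(rows1, rows2):
    """Join two alignments row by row (hypothetical helper).

    Each alignment is a list of (identifier, sequence) pairs.
    """
    if len(rows1) != len(rows2):
        raise ValueError("alignments must have the same number of rows")
    combined = []
    for (id1, seq1), (id2, seq2) in zip(rows1, rows2):
        # Keep a shared identifier; otherwise join the two.
        new_id = id1 if id1 == id2 else id1 + "+" + id2
        combined.append((new_id, seq1 + seq2))
    return combined


aln1 = [("human", "ATG"), ("mouse", "ATA")]
aln2 = [("human", "CCGT"), ("mus", "CCGA")]
combined_aln = concatenate(aln1, aln2)
print(combined_aln)
```

In a real class this logic would presumably live in an __add__ method,
so that aln1 + aln2 works as written above.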

Peter


From mdehoon at c2b2.columbia.edu  Sat Jul 28 02:57:05 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Sat, 28 Jul 2007 11:57:05 +0900
Subject: [Biopython-dev] syntax of indices for future Alignment object
In-Reply-To: <46A9FD84.4080502@genesilico.pl>
References: <46A9CD2E.6080402@genesilico.pl>	<46A9E28D.40609@maubp.freeserve.co.uk>
	<46A9FD84.4080502@genesilico.pl>
Message-ID: <46AAB081.30609@c2b2.columbia.edu>

Jan Kosinski wrote:
> Hi,
> 
> Ok, I agree that [A:B][X:Y] syntax is not possible here. [A:B,X:Y] is 
> fine. However, I would recommend not using [A:B:C,X:Y:Z] since it is not 
> clear while the [A:B][X:Y] syntax is clear and sufficient.

Python lists, tuples, and strings support [A:B:C], and Numerical Python 
2D arrays support [A:B:C,X:Y:Z]. I don't see why the Alignment should 
not support this format.
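
As an illustration of what the step argument could be used for, here is
a small sketch with plain strings standing in for alignment rows (the
sequences are made-up toy data):

```python
rows = ["ATGGCCAAA", "ATGGCAAAG", "ATGTCCAAA"]

# Equivalent of aln[:, ::3]: every third column, i.e. the first
# position of each codon if the alignment is in frame.
first_positions = [row[::3] for row in rows]

# Equivalent of aln[:, 2::3]: the third position of each codon.
third_positions = [row[2::3] for row in rows]

# Equivalent of aln[::2, :]: every second row.
alternate_rows = rows[::2]

print(first_positions)
print(third_positions)
print(alternate_rows)
```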

--Michiel.


From mdehoon at c2b2.columbia.edu  Sat Jul 28 03:10:06 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Sat, 28 Jul 2007 12:10:06 +0900
Subject: [Biopython-dev] Improving the Alignment object
In-Reply-To: <46AA2727.103@maubp.freeserve.co.uk>
References: <46A9CD2E.6080402@genesilico.pl>	<46A9E28D.40609@maubp.freeserve.co.uk>	<46A9FD84.4080502@genesilico.pl>
	<46AA2727.103@maubp.freeserve.co.uk>
Message-ID: <46AAB38E.50009@c2b2.columbia.edu>

Peter wrote:
> Perhaps we could have a "dictionary like" sub class of Alignment where 
> the __getitem__ method would allow a record identifier in place of a row 
> index:
> 
> print aln["P3454"]
> print aln["P3454", 20]
> 
> instead or as well as:
> 
> print aln[10]
> print aln[10, 20]

"as well as" would break if a user decides to use an integer as a key in 
the dictionary. A safer approach would be to define a method 
specifically for dictionary-like access. Something like:

print aln[10]
print aln[10,20]

for list-like access, and

print aln.get("P3454")

for dictionary-like access.
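
A minimal sketch of this separation follows. KeyedAlignment is a
hypothetical class with plain strings as rows, and its append signature
is an assumption for the example, not the existing Alignment API:

```python
class KeyedAlignment:
    """Hypothetical sketch: list-like indexing plus a separate get()."""

    def __init__(self):
        self._ids = []
        self._rows = []

    def append(self, identifier, sequence):
        self._ids.append(identifier)
        self._rows.append(sequence)

    def __getitem__(self, index):
        # Integers (and slices) always mean row positions.
        return self._rows[index]

    def get(self, identifier, default=None):
        # Identifiers are looked up separately, so an integer used as
        # an identifier can never be confused with a row position.
        try:
            return self._rows[self._ids.index(identifier)]
        except ValueError:
            return default


aln = KeyedAlignment()
aln.append("P3454", "ATCGTTGC")
aln.append(0, "ATCCTTGC")   # an integer identifier, to show the ambiguity

print(aln[0])            # row position 0
print(aln.get(0))        # record whose identifier is 0
print(aln.get("P3454"))
```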

--Michiel.


From mdehoon at c2b2.columbia.edu  Sat Jul 28 04:11:03 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Sat, 28 Jul 2007 13:11:03 +0900
Subject: [Biopython-dev] syntax of indices for future Alignment object
In-Reply-To: <46A9E28D.40609@maubp.freeserve.co.uk>
References: <46A9CD2E.6080402@genesilico.pl>
	<46A9E28D.40609@maubp.freeserve.co.uk>
Message-ID: <46AAC1D7.8030208@c2b2.columbia.edu>

Peter wrote:
> I've got a working bit of code put together now which I'll attached to 
> bug 1944 soon.
> 
> http://bugzilla.open-bio.org/show_bug.cgi?id=1944
> 
For the most part, I agree with the functionality in this patch. I have 
three suggestions though:

 >>> aln = Alignment(alphabet)
# Suggestion 1: We should allow creating an Alignment without specifying 
an alphabet


 >>> aln.add_sequence("seq1", "ATCGTTGC")
 >>> aln.add_sequence("seq2", "ATCCTTGC")
 >>> aln.add_sequence("seq3", "ATCCGTGC")
 >>> aln[0]
SeqRecord(seq=Seq('ATCGTTGC', Alphabet()), id='<unknown id>', 
name='<unknown name>', description='seq1', dbxrefs=[])
# Suggestion 2: I would expect "seq1" as the id rather than the description

 >>> aln[:2]
<Bio.Align.Generic.Alignment instance at 0x10aaeb8>
# OK
 >>> aln[:,4]
'TTG'
# OK
 >>> aln[2,:]
<Bio.Align.Generic.Alignment instance at 0x105efd0>
# Suggestion 3: Here, I would expect "ATCCGTGC" instead. An alignment
# consisting of a single sequence doesn't make much sense.

--Michiel.


From mdehoon at c2b2.columbia.edu  Sat Jul 28 04:20:24 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Sat, 28 Jul 2007 13:20:24 +0900
Subject: [Biopython-dev] Improving the Alignment object
In-Reply-To: <46AA2727.103@maubp.freeserve.co.uk>
References: <46A9CD2E.6080402@genesilico.pl>	<46A9E28D.40609@maubp.freeserve.co.uk>	<46A9FD84.4080502@genesilico.pl>
	<46AA2727.103@maubp.freeserve.co.uk>
Message-ID: <46AAC408.2050703@c2b2.columbia.edu>

Peter wrote:
>> For instance, the Alignment object should
>> support changing characters in the alignment without a need of copying 
>> it (using  aln[a,x] = "D"). Can it be done now with Alignment which is 
>> a list of SeqRecord objects with sequences implemented as immutable Seq 
>> objects ?
> 
....
> 
> To me the obvious way to handle this is to have a MutableAlignment 
> sub-class, where editing individual elements with aln[r,c] = "D" would 
> be supported (possibly implemented using the MutableSeq class internally 
> rather than the immutable Seq class).
> 
I don't think we'd need a separate MutableAlignment for that. An 
Alignment is a list of sequences and is therefore mutable. If we add a 
__setitem__ method to the Alignment class, then this method can take 
care of constructing a new sequence and putting it in the appropriate row.
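
Roughly like this sketch, in which EditableRows is a hypothetical class
and plain strings stand in for the immutable Seq objects (negative
column indices are left out to keep it short):

```python
class EditableRows:
    """Hypothetical sketch: rows are immutable strings, the list is not."""

    def __init__(self, rows):
        self._rows = list(rows)

    def __getitem__(self, key):
        row, col = key
        return self._rows[row][col]

    def __setitem__(self, key, letter):
        row, col = key
        if len(letter) != 1:
            raise ValueError("expected a single character")
        old = self._rows[row]
        if not 0 <= col < len(old):
            raise IndexError("column index out of range")
        # Build a new immutable sequence and swap it into the row.
        self._rows[row] = old[:col] + letter + old[col + 1:]


aln = EditableRows(["ATCGTTGC", "ATCCTTGC"])
aln[0, 3] = "D"
print(aln[0, 3])
```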

--Michiel.


From biopython-dev at maubp.freeserve.co.uk  Sat Jul 28 10:04:04 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Sat, 28 Jul 2007 11:04:04 +0100
Subject: [Biopython-dev] syntax of indices for future Alignment object
In-Reply-To: <46AAC1D7.8030208@c2b2.columbia.edu>
References: <46A9CD2E.6080402@genesilico.pl>	<46A9E28D.40609@maubp.freeserve.co.uk>
	<46AAC1D7.8030208@c2b2.columbia.edu>
Message-ID: <46AB1494.301@maubp.freeserve.co.uk>

Michiel de Hoon wrote:
> Peter wrote:
>> I've got a working bit of code put together now which I'll attached to 
>> bug 1944 soon.
>>
>> http://bugzilla.open-bio.org/show_bug.cgi?id=1944
>>
> For the most part, I agree with the functionality in this patch. I have 
> three suggestions though:
> 
>  >>> aln = Alignment(alphabet)
> # Suggestion 1: We should allow creating an Alignment without specifying 
> an alphabet

That would mean changing the existing __init__ from:

def __init__(self, alphabet):

to something like:

def __init__(self, alphabet=single_letter_alphabet):

with this import statement added:

from Bio.Alphabet import single_letter_alphabet

This seems like a good idea, and shouldn't break any existing code either.

>  >>> aln.add_sequence("seq1", "ATCGTTGC")
>  >>> aln.add_sequence("seq2", "ATCCTTGC")
>  >>> aln.add_sequence("seq3", "ATCCGTGC")
>  >>> aln[0]
> SeqRecord(seq=Seq('ATCGTTGC', Alphabet()), id='<unknown id>', 
> name='<unknown name>', description='seq1', dbxrefs=[])
> # Suggestion 2: I would expect "seq1" as the id rather than the description

I agree with you here - this is the historic behaviour of the
add_sequence method which actually creates a SeqRecord from the strings
it is given. I would suggest it populate the record.id but for backwards
compatibility still populate the record.description in case anyone is
still using that.

We also could add an add_record method to the alignment object which
takes a SeqRecord, plus optional weight (and start and end?). Marc
Colosimo also made this point on bug 1944 (although I don't like his
mixed case method name).

>  >>> aln[:2]
> <Bio.Align.Generic.Alignment instance at 0x10aaeb8>
> # OK
>  >>> aln[:,4]
> 'TTG'
> # OK
>  >>> aln[2,:]
> <Bio.Align.Generic.Alignment instance at 0x105efd0>
> # Suggestion 3: Here, I would expect "ATCCGTGC" instead. An alignment 
> consisting of a single sequence doesn't make much sense.

I'll have a closer look, but as aln[2] returns a single SeqRecord maybe
aln[2,:] should do that too - rather than returning a string.

Peter


From biopython-dev at maubp.freeserve.co.uk  Sat Jul 28 13:14:43 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Sat, 28 Jul 2007 14:14:43 +0100
Subject: [Biopython-dev] Improving the Alignment object
In-Reply-To: <46AAC408.2050703@c2b2.columbia.edu>
References: <46A9CD2E.6080402@genesilico.pl>	<46A9E28D.40609@maubp.freeserve.co.uk>	<46A9FD84.4080502@genesilico.pl>	<46AA2727.103@maubp.freeserve.co.uk>
	<46AAC408.2050703@c2b2.columbia.edu>
Message-ID: <46AB4143.5070406@maubp.freeserve.co.uk>

Michiel de Hoon wrote:
> Peter wrote:
>>> For instance, the Alignment object should
>>> support changing characters in the alignment without a need of copying 
>>> it (using  aln[a,x] = "D"). Can it be done now with Alignment which is 
>>> a list of SeqRecord objects with sequences implemented as immutable Seq 
>>> objects ?
> ....
>> To me the obvious way to handle this is to have a MutableAlignment 
>> sub-class, where editing individual elements with aln[r,c] = "D" would 
>> be supported (possibly implemented using the MutableSeq class internally 
>> rather than the immutable Seq class).
>>
> I don't think we'd need a separate MutableAlignment for that. An 
> Alignment is a list of sequences and is therefore mutable. If we add a 
> __setitem__ method to the Alignment class, then this method can take 
> care of constructing a new sequence and put it in the appropriate row.
> 
So rather than editing one character of a MutableSeq, we could replace 
one immutable Seq object with a new immutable Seq object where one 
character was different? That would work - sounds a little slow, but 
certainly possible.

Peter


From mdehoon at c2b2.columbia.edu  Sat Jul 28 15:15:49 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Sun, 29 Jul 2007 00:15:49 +0900
Subject: [Biopython-dev] syntax of indices for future Alignment object
In-Reply-To: <46AB1494.301@maubp.freeserve.co.uk>
References: <46A9CD2E.6080402@genesilico.pl>	<46A9E28D.40609@maubp.freeserve.co.uk>
	<46AAC1D7.8030208@c2b2.columbia.edu>
	<46AB1494.301@maubp.freeserve.co.uk>
Message-ID: <46AB5DA5.6050604@c2b2.columbia.edu>

# Current method to add a row to the alignment:
>>> aln.add_sequence("seq1", "ATCGTTGC")
...

Peter wrote:
> We also could add an add_record method to the alignment object which
> takes a SeqRecord, plus optional weight (and start and end?). Marc
> Colosimo also made this point on bug 1944 (although I don't like his
> mixed case method name).

This is Marc Colosimo's suggestion for adding a SeqRecord:
     def addSeqRecord(self, seqRec):
         """Add a Sequence Record to the Alignment

         @param seqRec: a sequence record (SeqRecord) to add.
         """
         if isinstance(seqRec, SeqRecord):
             self._records.append(seqRec)
         else:
             raise TypeError("sequence is NOT a SeqRecord Object")

Since an Alignment is essentially a list of SeqRecords, I propose that 
we call the method to add a row to this list "append". In addition, this 
method should be able to take a SeqRecord, a Seq object, or a plain 
string. Something like this:

     def append(self, sequence):
         if isinstance(sequence, SeqRecord):
             self._records.append(sequence)
         elif isinstance(sequence, Seq):
             self._records.append(SeqRecord(sequence))
         elif isinstance(sequence, str):
             self._records.append(SeqRecord(Seq(sequence)))
         else:
             raise TypeError("sequence should be a string, a Seq object, "
                             "or a SeqRecord object")

This method can be generalized to allow a descriptor, weight, start, and 
end, just like in the current add_sequence method. Then we can replace 
add_sequence and addSeqRecord by a single append method.

--Michiel.


From mdehoon at c2b2.columbia.edu  Sat Jul 28 15:17:52 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Sun, 29 Jul 2007 00:17:52 +0900
Subject: [Biopython-dev] syntax of indices for future Alignment object
In-Reply-To: <46AB1494.301@maubp.freeserve.co.uk>
References: <46A9CD2E.6080402@genesilico.pl>	<46A9E28D.40609@maubp.freeserve.co.uk>
	<46AAC1D7.8030208@c2b2.columbia.edu>
	<46AB1494.301@maubp.freeserve.co.uk>
Message-ID: <46AB5E20.5090605@c2b2.columbia.edu>

Peter wrote:
> Michiel de Hoon wrote:
>>  >>> aln.add_sequence("seq1", "ATCGTTGC")
>>  >>> aln[0]
>> SeqRecord(seq=Seq('ATCGTTGC', Alphabet()), id='<unknown id>', 
>> name='<unknown name>', description='seq1', dbxrefs=[])
>> # Suggestion 2: I would expect "seq1" as the id rather than the 
>> description
> 
> I agree with you here - this is the historic behaviour of the
> add_sequence method which actually creates a SeqRecord from the strings
> it is given. I would suggest it populate the record.id but for backwards
> compatibility still populate the record.description in case anyone is
> still using that.
> 
That sounds good to me.

--Michiel.


From mdehoon at c2b2.columbia.edu  Sat Jul 28 15:23:51 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Sun, 29 Jul 2007 00:23:51 +0900
Subject: [Biopython-dev] Improving the Alignment object
In-Reply-To: <46AB4143.5070406@maubp.freeserve.co.uk>
References: <46A9CD2E.6080402@genesilico.pl>	<46A9E28D.40609@maubp.freeserve.co.uk>	<46A9FD84.4080502@genesilico.pl>	<46AA2727.103@maubp.freeserve.co.uk>
	<46AAC408.2050703@c2b2.columbia.edu>
	<46AB4143.5070406@maubp.freeserve.co.uk>
Message-ID: <46AB5F87.1090506@c2b2.columbia.edu>

Peter wrote:
> Michiel de Hoon wrote:
>> Peter wrote:
>>>> For instance, the Alignment object should
>>>> support changing characters in the alignment without a need of 
>>>> copying it (using  aln[a,x] = "D"). Can it be done now with 
>>>> Alignment which is a list of SeqRecord objects with sequences 
>>>> implemented as immutable Seq objects ?
>> ....
>>> To me the obvious way to handle this is to have a MutableAlignment 
>>> sub-class, where editing individual elements with aln[r,c] = "D" 
>>> would be supported (possibly implemented using the MutableSeq class 
>>> internally rather than the immutable Seq class).
>>>
>> I don't think we'd need a separate MutableAlignment for that. An 
>> Alignment is a list of sequences and is therefore mutable. If we add a 
>> __setitem__ method to the Alignment class, then this method can take 
>> care of constructing a new sequence and put it in the appropriate row.
>>
> So rather than editing one character of a MutableSeq, we could replace 
> one immutable Seq object with a new immutable Seq object where one 
> character was different? That would work - sounds a little slow, but 
> certainly possible.
> 
At first, I also thought that that would be slow, especially for long 
sequences. But in practice, it's surprisingly fast. Unless somebody 
wants to edit an alignment of chromosome-size sequences, we probably 
won't run into a speed problem.
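
For anyone who wants to measure this on their own machine, a rough
timing sketch with a plain string standing in for a Seq object. The
sequence length and repeat count are arbitrary choices for the example,
not figures from my tests:

```python
import timeit

sequence = "A" * 100000   # arbitrary test length


def replace_letter(seq, position, letter):
    """Return a copy of seq with one character replaced."""
    return seq[:position] + letter + seq[position + 1:]


edited = replace_letter(sequence, 50000, "D")
seconds = timeit.timeit(
    lambda: replace_letter(sequence, 50000, "D"), number=1000
)
print("%.6f seconds per edit" % (seconds / 1000))
```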

--Michiel.


From biopython-dev at maubp.freeserve.co.uk  Sat Jul 28 16:00:34 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Sat, 28 Jul 2007 17:00:34 +0100
Subject: [Biopython-dev] adding rows to an alignment object
In-Reply-To: <46AB5DA5.6050604@c2b2.columbia.edu>
References: <46A9CD2E.6080402@genesilico.pl>	<46A9E28D.40609@maubp.freeserve.co.uk>	<46AAC1D7.8030208@c2b2.columbia.edu>	<46AB1494.301@maubp.freeserve.co.uk>
	<46AB5DA5.6050604@c2b2.columbia.edu>
Message-ID: <46AB6822.6090706@maubp.freeserve.co.uk>

Michiel de Hoon wrote:
> Since an Alignment is essentially a list of SeqRecords, I propose that 
> we call the method to add a row to this list "append".

Sounds very sensible.

 > In addition, this method should be able to take a SeqRecord, a Seq
 > object, or a plain string.

Do you really think we should complicate things like this? I would just 
accept SeqRecord objects (with optional start/end/weight).

> Something like this:
> 
>      def append(self, sequence):
>          if isinstance(sequence, SeqRecord):
>              self._records.append(sequence)
>          elif isinstance(sequence, Seq):
>              self._records.append(SeqRecord(sequence))
>          elif isinstance(sequence, str):
>              self._records.append(SeqRecord(Seq(sequence)))
>          else:
>              raise TypeError("sequence should be a string, a Seq Object, 
> or a SeqRecord object")

One minor point - we should use the alignment's alphabet when building a 
Seq object from a string. Perhaps we should even check the alphabet when 
asked to append a SeqRecord or Seq object...

 > This method can be generalized to allow a descriptor, weight, start,
 > end, just like in the current add_sequence method.

Where the descriptor is expected for Seq and string input, and used as 
the SeqRecord's id?

I would personally check the length matches the rest of the alignment 
(something the current add_sequence method doesn't do) otherwise it's 
very easy to get a malformed alignment where some sequences are longer 
than others.

Also, I would leave the existing .add_sequence() method in place, but 
update its docstring to encourage use of .append() instead.
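
To make the length check concrete, a minimal sketch. CheckedAlignment is
a hypothetical class that stores (identifier, sequence) pairs of plain
strings rather than SeqRecord objects:

```python
class CheckedAlignment:
    """Hypothetical sketch: refuses rows of the wrong length."""

    def __init__(self):
        self._records = []

    def append(self, identifier, sequence):
        # Every new row must match the length of the first row.
        if self._records and len(sequence) != len(self._records[0][1]):
            raise ValueError(
                "sequence %r has length %d, expected %d"
                % (identifier, len(sequence), len(self._records[0][1]))
            )
        self._records.append((identifier, sequence))

    def __len__(self):
        return len(self._records)


aln = CheckedAlignment()
aln.append("seq1", "ATCGTTGC")
aln.append("seq2", "ATCCTTGC")
try:
    aln.append("seq3", "ATCC")      # too short, so it is rejected
except ValueError as error:
    message = str(error)
    print(message)
```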

Peter


From biopython-dev at maubp.freeserve.co.uk  Sat Jul 28 15:49:11 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Sat, 28 Jul 2007 16:49:11 +0100
Subject: [Biopython-dev] syntax of indices for future Alignment object
In-Reply-To: <46AB5E20.5090605@c2b2.columbia.edu>
References: <46A9CD2E.6080402@genesilico.pl>	<46A9E28D.40609@maubp.freeserve.co.uk>	<46AAC1D7.8030208@c2b2.columbia.edu>	<46AB1494.301@maubp.freeserve.co.uk>
	<46AB5E20.5090605@c2b2.columbia.edu>
Message-ID: <46AB6577.6050708@maubp.freeserve.co.uk>

Michiel de Hoon wrote:
> Peter wrote:
>> Michiel de Hoon wrote:
>>>  >>> aln.add_sequence("seq1", "ATCGTTGC")
>>>  >>> aln[0]
>>> SeqRecord(seq=Seq('ATCGTTGC', Alphabet()), id='<unknown id>', 
>>> name='<unknown name>', description='seq1', dbxrefs=[])
>>> # Suggestion 2: I would expect "seq1" as the id rather than the 
>>> description
>> I agree with you here - this is the historic behaviour of the
>> add_sequence method which actually creates a SeqRecord from the strings
>> it is given. I would suggest it populate the record.id but for backwards
>> compatibility still populate the record.description in case anyone is
>> still using that.
>>
> That sounds good to me.

Good. Done, CVS revision 1.6 of file Bio/Align/Generic.py
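
In outline, the agreed behaviour amounts to something like the sketch
below. This is an illustration only, not the code committed in that
revision, and the SeqRecord class is a simplified stand-in for the real
Bio.SeqRecord class:

```python
class SeqRecord:
    """Hypothetical minimal stand-in for Bio.SeqRecord.SeqRecord."""

    def __init__(self, seq, id="<unknown id>",
                 description="<unknown description>"):
        self.seq = seq
        self.id = id
        self.description = description


class Alignment:
    def __init__(self):
        self._records = []

    def add_sequence(self, descriptor, sequence):
        # Use the descriptor as the id, and also keep it as the
        # description for code relying on the historic behaviour.
        record = SeqRecord(sequence, id=descriptor, description=descriptor)
        self._records.append(record)

    def __getitem__(self, index):
        return self._records[index]
```

After aln.add_sequence("seq1", "ATCGTTGC"), both aln[0].id and
aln[0].description hold "seq1".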

Peter


From kosa at genesilico.pl  Sat Jul 28 16:53:04 2007
From: kosa at genesilico.pl (Jan Kosinski)
Date: Sat, 28 Jul 2007 18:53:04 +0200
Subject: [Biopython-dev] syntax of indices for future Alignment object
In-Reply-To: <46AAB081.30609@c2b2.columbia.edu>
References: <46A9CD2E.6080402@genesilico.pl>	<46A9E28D.40609@maubp.freeserve.co.uk>
	<46A9FD84.4080502@genesilico.pl> <46AAB081.30609@c2b2.columbia.edu>
Message-ID: <46AB7470.6010006@genesilico.pl>

Hi,

I just do not see what [A:B:C,X:Y:Z] adds to [A:B,X:Y] in the case of 
alignments. Isn't [A:B,X:Y] sufficient?

Janek


Michiel de Hoon wrote:
> Jan Kosinski wrote:
>> Hi,
>>
>> Ok, I agree that [A:B][X:Y] syntax is not possible here. [A:B,X:Y] is 
>> fine. However, I would recommend not using [A:B:C,X:Y:Z] since it is 
>> not clear while the [A:B][X:Y] syntax is clear and sufficient.
>
> Python lists, tuples, and strings support [A:B:C], and Numerical 
> Python 2D arrays support [A:B:C,X:Y:Z]. I don't see why the Alignment 
> should not support this format.
>
> --Michiel.



From kosa at genesilico.pl  Sat Jul 28 16:55:33 2007
From: kosa at genesilico.pl (Jan Kosinski)
Date: Sat, 28 Jul 2007 18:55:33 +0200
Subject: [Biopython-dev] Improving the Alignment object
In-Reply-To: <46AAC408.2050703@c2b2.columbia.edu>
References: <46A9CD2E.6080402@genesilico.pl>	<46A9E28D.40609@maubp.freeserve.co.uk>	<46A9FD84.4080502@genesilico.pl>
	<46AA2727.103@maubp.freeserve.co.uk>
	<46AAC408.2050703@c2b2.columbia.edu>
Message-ID: <46AB7505.30302@genesilico.pl>

Hi,

I think the same: an alignment should be mutable, and there is no need 
to make two classes, one mutable and one immutable.
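
A single mutable class along these lines could be sketched as below. It
is a simplified illustration, not the Biopython implementation: rows are
stored as plain Python strings, which are immutable and so stand in for
the immutable Seq class, and __setitem__ replaces the whole row with a
newly built string:

```python
class Alignment:
    """Toy alignment whose rows are immutable strings (stand-ins for Seq)."""

    def __init__(self, rows):
        self._rows = list(rows)

    def __getitem__(self, index):
        row, col = index
        return self._rows[row][col]

    def __setitem__(self, index, letter):
        row, col = index
        old = self._rows[row]
        if len(letter) != 1:
            raise ValueError("expected a single character")
        if not -len(old) <= col < len(old):
            raise IndexError("column index out of range")
        col %= len(old)  # turn a negative column into a positive one
        # The row itself is never modified; a new string replaces it.
        self._rows[row] = old[:col] + letter + old[col + 1:]
```

So aln[1, 2] = "D" works on the one Alignment class even though no
individual sequence is ever edited in place.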

Janek

Michiel de Hoon wrote:
> Peter wrote:
>>> For instance, the Alignment object should
>>> support changing characters in the alignment without a need of 
>>> copying it (using  aln[a,x] = "D"). Can it be done now with 
>>> Alignment which is a list of SeqRecord objects with sequences 
>>> implemented as immutable Seq objects ?
>>
> ....
>>
>> To me the obvious way to handle this is to have a MutableAlignment 
>> sub-class, where editing individual elements with aln[r,c] = "D" 
>> would be supported (possibly implemented using the MutableSeq class 
>> internally rather than the immutable Seq class).
>>
> I don't think we'd need a separate MutableAlignment for that. An 
> Alignment is a list of sequences and is therefore mutable. If we add a 
> __setitem__ method to the Alignment class, then this method can take 
> care of constructing a new sequence and put it in the appropriate row.
>
> --Michiel.



From mdehoon at c2b2.columbia.edu  Sun Jul 29 04:38:28 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Sun, 29 Jul 2007 13:38:28 +0900
Subject: [Biopython-dev] syntax of indices for future Alignment object
In-Reply-To: <46AB7470.6010006@genesilico.pl>
References: <46A9CD2E.6080402@genesilico.pl>	<46A9E28D.40609@maubp.freeserve.co.uk>
	<46A9FD84.4080502@genesilico.pl> <46AAB081.30609@c2b2.columbia.edu>
	<46AB7470.6010006@genesilico.pl>
Message-ID: <46AC19C4.1000102@c2b2.columbia.edu>

Jan Kosinski wrote:
> I just do not see what [A:B:C,X:Y:Z] adds to [A:B,X:Y] in case of 
> alignments. Isn't [A:B,X:Y] sufficient?
> 
[A:B,X:Y] may be sufficient, but does not agree with Python indices for 
other objects (lists, tuples, strings). In addition, since allowing 
[A:B,X:Y] only is different from usual Python usage, we'd actually end 
up writing more code to specifically disallow [A:B:C,X:Y:Z].

Note also that [A:B:C,X:Y:Z] includes [A:B,X:Y] as a special case. So if 
the Alignment class were written to deal with [A:B:C,X:Y:Z], and I told 
you that it expects [A:B,X:Y], you wouldn't notice any difference until 
you tried [A:B:C,X:Y:Z] and found that it works too.
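
To make this concrete, here is a minimal sketch of such a __getitem__.
It is not the actual Alignment code: rows are plain strings rather than
SeqRecord objects, and only a pair of slices is handled. The point is
that Python hands the method a tuple of slice objects, and passing them
straight through to the list and the strings gives the step for free:

```python
class Alignment:
    """Toy alignment: a list of equal-length strings."""

    def __init__(self, rows):
        self._rows = list(rows)

    def __getitem__(self, index):
        # aln[A:B:C, X:Y:Z] arrives as the tuple (slice(A, B, C), slice(X, Y, Z));
        # aln[A:B, X:Y] is the same thing with both steps set to None.
        # Integer indices would need extra handling not shown here.
        rows, cols = index
        return Alignment([row[cols] for row in self._rows[rows]])


aln = Alignment(["ABCDEF", "GHIJKL", "MNOPQR"])
sub = aln[0:2, 1:4]         # rows 0-1, columns 1-3
strided = aln[0:3:2, 0:6:2]  # every other row, every other column
```

Nothing in the method mentions the step at all; rejecting
[A:B:C,X:Y:Z] would take an explicit check on rows.step and cols.step.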

--Michiel.