From bugzilla-daemon at portal.open-bio.org Sun Jul 1 01:54:55 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 1 Jul 2007 01:54:55 -0400
Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64
In-Reply-To:
Message-ID: <200707010554.l615stgK032500@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2323
mdehoon at ims.u-tokyo.ac.jp changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|RESOLVED |REOPENED
Resolution|FIXED |
------- Comment #17 from mdehoon at ims.u-tokyo.ac.jp 2007-07-01 01:54 EST -------
Sorry for the mistake.
Even with the code for Python >= 2.4 in a separate module, we still get an error
message when installing Biopython, because Python attempts to byte-compile each
module. This is not so serious, because the error is otherwise ignored. However,
how about this code for Python >= 2.4:
    from itertools import cycle, imap

    def gcg(seq):
        """Returns the GCG checksum (int) for a sequence (string)"""
        return sum(imap(lambda n, c: n * ord(c.upper()), cycle(range(1, 58)), seq)) % 10000
It is almost as fast as the code you now have for Python >= 2.4, but avoids
having to create a separate module gcg24.py.
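For readers on Python 3, where `itertools.imap` no longer exists, the same idea can be sketched with the built-in `map`. This is a minimal sketch, assuming (as the snippet above does) that the GCG checksum weights each upper-cased residue's character code by a position weight cycling through 1..57 and takes the total modulo 10000; the name `gcg_checksum` is hypothetical.

```python
from itertools import cycle


def gcg_checksum(seq):
    """Return the GCG checksum of a sequence string (sketch, Python 3).

    Each residue is upper-cased, its character code is multiplied by a
    position weight cycling through 1..57, and the total is taken mod 10000.
    """
    weights = cycle(range(1, 58))  # 1, 2, ..., 57, 1, 2, ...
    # map() with two iterables stops at the shorter one (seq), as imap did
    return sum(map(lambda n, c: n * ord(c.upper()), weights, seq)) % 10000


print(gcg_checksum("AC"))  # 1*65 + 2*67 = 199
```

Because `cycle` is infinite, `seq` alone determines how many terms are summed, so no explicit length handling is needed.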
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Sun Jul 1 07:02:47 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 1 Jul 2007 07:02:47 -0400
Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64
In-Reply-To:
Message-ID: <200707011102.l61B2lHg029279@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2323
------- Comment #18 from mdehoon at ims.u-tokyo.ac.jp 2007-07-01 07:02 EST -------
Btw, I am finding that the code for Python <= 2.3 is faster than the code for
Python >= 2.4. The former uses more memory, as it makes a copy of seq, but even
if we avoid copying seq, I still find that it is faster than the code for
Python >= 2.4.
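A rough way to reproduce this kind of comparison is to time both styles with `timeit`. The sketch below assumes the pre-2.4 code is a plain loop over `seq` that resets the position weight after 57 (the CVS code itself is not quoted in this thread), and both function names are hypothetical. Timings vary by machine and Python version, so no numbers are claimed here.

```python
import timeit
from itertools import cycle


def gcg_loop(seq):
    """Plain loop in the spirit of the pre-2.4 code; seq is not copied."""
    index = checksum = 0
    for char in seq:
        index += 1
        checksum += index * ord(char.upper())
        if index == 57:
            index = 0  # weights run 1..57, then restart at 1
    return checksum % 10000


def gcg_itertools(seq):
    """Iterator-based version using cycle() and map()."""
    return sum(map(lambda n, c: n * ord(c.upper()), cycle(range(1, 58)), seq)) % 10000


if __name__ == "__main__":
    test_seq = "ACGTN" * 2000
    # Both versions must agree before their speed is worth comparing
    assert gcg_loop(test_seq) == gcg_itertools(test_seq)
    for func in (gcg_loop, gcg_itertools):
        seconds = timeit.timeit(lambda: func(test_seq), number=10)
        print(func.__name__, round(seconds, 3))
```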
From mdehoon at c2b2.columbia.edu Sun Jul 1 08:01:00 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Sun, 01 Jul 2007 21:01:00 +0900
Subject: [Biopython-dev] TempFastaWriter,
TempFastaWriterSingle in Bio/GFF/easy.py
In-Reply-To: <320fb6e00706301214p41c33329o98126738d77fad19@mail.gmail.com>
References: <4685FCCA.4090904@c2b2.columbia.edu>
<320fb6e00706301214p41c33329o98126738d77fad19@mail.gmail.com>
Message-ID: <4687977C.70903@c2b2.columbia.edu>
Peter wrote:
>> Is anybody using the classes TempFastaWriter, TempFastaWriterSingle in
>> Bio/GFF/easy.py? They are currently using the old Fasta writer in
>> Bio.SeqIO.FASTA instead of the new one in Bio.SeqIO.FastaIO. We can
>> either update them to use the new Fasta writer, or simply remove them,
>> since currently these classes are not used anywhere in Biopython.
>
> This is for Bug 2284 right?
> http://bugzilla.open-bio.org/show_bug.cgi?id=2284
>
> I'm inclined to remove classes TempFastaWriter and TempFastaWriterSingle
>
Actually I hadn't noticed bug 2284. I looked into this because the
Biopython tests are causing DeprecationWarnings. If no users of these
classes step forward, I am in favor of removing them.
--Michiel.
From bugzilla-daemon at portal.open-bio.org Sun Jul 1 10:13:29 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 1 Jul 2007 10:13:29 -0400
Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64
In-Reply-To:
Message-ID: <200707011413.l61EDTF3012907@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2323
------- Comment #19 from sbassi at gmail.com 2007-07-01 10:13 EST -------
(In reply to comment #18)
> Btw, I am finding that the code for Python <= 2.3 is faster than the code for
> Python >= 2.4. The former uses more memory, as it makes a copy of seq, but even
> if we avoid copying seq, I still find that it is faster than the code for
> Python >= 2.4.
OK, so leave it w/o the check for python version and use just the 2.3 code.
Best,
SB.
From bugzilla-daemon at portal.open-bio.org Sun Jul 1 18:38:55 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 1 Jul 2007 18:38:55 -0400
Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64
In-Reply-To:
Message-ID: <200707012238.l61Mct1k007379@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2323
mdehoon at ims.u-tokyo.ac.jp changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|REOPENED |RESOLVED
Resolution| |FIXED
------- Comment #20 from mdehoon at ims.u-tokyo.ac.jp 2007-07-01 18:38 EST -------
Updated in CVS, using the 2.3 code without copying seq.
From bugzilla-daemon at portal.open-bio.org Sun Jul 1 19:42:14 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 1 Jul 2007 19:42:14 -0400
Subject: [Biopython-dev] [Bug 2327] New: test_Cluster takes too long
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2327
Summary: test_Cluster takes too long
Product: Biopython
Version: 1.43
Platform: Other
OS/Version: Linux
Status: NEW
Severity: normal
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: idoerg at burnham.org
When running the biopython test suite, test_Cluster takes too long. I gave up
after 2 minutes.
From bugzilla-daemon at portal.open-bio.org Sun Jul 1 19:55:34 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 1 Jul 2007 19:55:34 -0400
Subject: [Biopython-dev] [Bug 2327] test_Cluster takes too long
In-Reply-To:
Message-ID: <200707012355.l61NtYcR012177@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2327
mdehoon at ims.u-tokyo.ac.jp changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |DUPLICATE
------- Comment #1 from mdehoon at ims.u-tokyo.ac.jp 2007-07-01 19:55 EST -------
*** This bug has been marked as a duplicate of bug 2268 ***
From bugzilla-daemon at portal.open-bio.org Sun Jul 1 19:55:36 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 1 Jul 2007 19:55:36 -0400
Subject: [Biopython-dev] [Bug 2268] Cluster unit test suite runs indefinitely
In-Reply-To:
Message-ID: <200707012355.l61NtaNW012196@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2268
mdehoon at ims.u-tokyo.ac.jp changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |idoerg at gmail.com
------- Comment #22 from mdehoon at ims.u-tokyo.ac.jp 2007-07-01 19:55 EST -------
*** Bug 2327 has been marked as a duplicate of this bug. ***
From bugzilla-daemon at portal.open-bio.org Tue Jul 3 07:03:40 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 3 Jul 2007 07:03:40 -0400
Subject: [Biopython-dev] [Bug 2328] New: NCBIStandalone.blastall chokes on
integer argument
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2328
Summary: NCBIStandalone.blastall chokes on integer argument
Product: Biopython
Version: 1.43
Platform: PC
OS/Version: Linux
Status: NEW
Severity: normal
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: grunberg at embl.de
CC: grunberg at embl.de
Unlike previous versions, the current NCBIStandalone.blastall and blastpgp
expect that the argument align_view is given as a string rather than an
integer. So the following call worked with previous versions but now fails::
    results, err = NCBIStandalone.blastall(settings.blast_bin,
                                           method, db, seqFile,
                                           expectation=e,
                                           align_view=7,  ## XML output
                                           **kw)
The error is raised here::
    NCBIStandalone: 1788 (blastall)
        w, r, e = os.popen3(" ".join([blastcmd] + params))
because, unlike the other parameters, align_view is not converted to a string
in this line::
    params.extend([att2param['align_view'], align_view])
This line should rather look like this::
    params.extend([att2param['align_view'], str(align_view)])
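The failure is easy to reproduce without running BLAST: `str.join` accepts only strings, so a single integer in the parameter list raises `TypeError`. The sketch below converts every value with `str()` before joining. The helper name `build_command` is hypothetical and not part of NCBIStandalone; the flags `-p`, `-e`, `-m` and `-F` are legacy blastall options used here only as example data.

```python
def build_command(blastcmd, options):
    """Build a blastall-style command line from (flag, value) pairs.

    Every value is passed through str(), so integers such as align_view=7
    are accepted just like strings; options set to None are skipped.
    """
    params = []
    for flag, value in options:
        if value is not None:  # skip options the caller did not set
            params.extend([flag, str(value)])
    return " ".join([blastcmd] + params)


# Without str(), the join fails on the integer:
try:
    " ".join(["blastall", "-m", 7])
except TypeError as err:
    print("TypeError:", err)

print(build_command("blastall", [("-p", "blastn"), ("-e", 0.001), ("-m", 7), ("-F", None)]))
# blastall -p blastn -e 0.001 -m 7
```

Converting at the single point where the command line is assembled avoids having to remember `str()` for each option individually, which is how one parameter slipped through here.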
I am going to attach a patch to this bug report.
Greetings,
Raik
From bugzilla-daemon at portal.open-bio.org Tue Jul 3 07:05:37 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 3 Jul 2007 07:05:37 -0400
Subject: [Biopython-dev] [Bug 2328] NCBIStandalone.blastall chokes on
integer argument
In-Reply-To:
Message-ID: <200707031105.l63B5bAP013190@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2328
------- Comment #1 from grunberg at embl.de 2007-07-03 07:05 EST -------
Created an attachment (id=698)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=698&action=view)
patch for Bug 2328 (NCBIStandalone.blastall / blastpgp)
The patch is described in my bug report.
Cheers,
Raik
From bugzilla-daemon at portal.open-bio.org Tue Jul 3 19:26:15 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 3 Jul 2007 19:26:15 -0400
Subject: [Biopython-dev] [Bug 2328] NCBIStandalone.blastall chokes on
integer argument
In-Reply-To:
Message-ID: <200707032326.l63NQFBB022873@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2328
------- Comment #2 from mdehoon at ims.u-tokyo.ac.jp 2007-07-03 19:26 EST -------
> Unlike previous versions, the current NCBIStandalone.blastall and blastpgp
> expect that the argument align_view is given as a string rather than an
> integer. So the following call worked with previous versions but now fails::
In which previous version of Biopython did this work? Your patch looks fine,
but I'd like to find out how this bug entered Biopython.
From bugzilla-daemon at portal.open-bio.org Thu Jul 5 09:30:32 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Jul 2007 09:30:32 -0400
Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64
In-Reply-To:
Message-ID: <200707051330.l65DUW2k004459@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2323
------- Comment #21 from dalloliogm at gmail.com 2007-07-05 09:30 EST -------
(In reply to comment #1)
> Created an attachment (id=689)
> --> (http://bugzilla.open-bio.org/attachment.cgi?id=689&action=view)
> Proposed functions (CRC64 and GCG checksum)
>
> This could be in utils.py, but I am not sure.
Maybe it would be useful to add a 'GCG checksum' attribute to the BioPython Seq
object.
Checksums could be used to quickly check whether two sequences are the same, but
the documentation should state very clearly that two sequences which differ in
even a single symbol (e.g. AAANAAA and AAAAAAA) have different values.
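A sketch of how such a checksum might be used as a fast first test. Only a *difference* in checksums proves two sequences differ; equal checksums can in principle come from different sequences (a collision), so a direct comparison is still needed to confirm a match. The helpers `crc32_of` and `same_sequence` are hypothetical and use the standard library's `zlib.crc32` rather than Biopython code. The speed benefit appears only when checksums are computed once and stored, for example as an attribute, and then compared many times.

```python
import zlib


def crc32_of(seq):
    """CRC32 of a sequence string, upper-cased so case does not matter."""
    return zlib.crc32(seq.upper().encode("ascii"))


def same_sequence(seq1, seq2):
    """Compare two sequences, using the checksum as a fast first test.

    Different checksums prove the sequences differ; equal checksums are
    only strong evidence, so the sequences are then compared directly.
    """
    if len(seq1) != len(seq2) or crc32_of(seq1) != crc32_of(seq2):
        return False
    return seq1.upper() == seq2.upper()


print(same_sequence("AAANAAA", "AAAAAAA"))  # False: checksums differ
print(same_sequence("acgt", "ACGT"))        # True
```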
From bugzilla-daemon at portal.open-bio.org Sat Jul 7 05:28:56 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 7 Jul 2007 05:28:56 -0400
Subject: [Biopython-dev] [Bug 2328] NCBIStandalone.blastall chokes on
integer argument
In-Reply-To:
Message-ID: <200707070928.l679SuTJ010432@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2328
------- Comment #3 from grunberg at embl.de 2007-07-07 05:28 EST -------
(In reply to comment #2)
> > Unlike previous versions, the current NCBIStandalone.blastall and blastpgp
> > expect that the argument align_view is given as a string rather than an
> > integer. So the following call worked with previous versions but now fails::
>
> In which previous version of Biopython did this work? Your patch looks fine,
> but I'd like to find out how this bug entered Biopython.
>
Sorry about the late reply... My previous Biopython installation (which didn't
have the glitch) was version 1.42.
Greetings
Raik
From bugzilla-daemon at portal.open-bio.org Sun Jul 8 00:20:12 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 8 Jul 2007 00:20:12 -0400
Subject: [Biopython-dev] [Bug 2328] NCBIStandalone.blastall chokes on
integer argument
In-Reply-To:
Message-ID: <200707080420.l684KCSq031646@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2328
mdehoon at ims.u-tokyo.ac.jp changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |FIXED
------- Comment #4 from mdehoon at ims.u-tokyo.ac.jp 2007-07-08 00:20 EST -------
Fixed in CVS (see biopython/Bio/Blast/NCBIStandalone.py revision 1.68).
From chengsoon.ong at tuebingen.mpg.de Mon Jul 9 06:15:50 2007
From: chengsoon.ong at tuebingen.mpg.de (Cheng Soon Ong)
Date: Mon, 9 Jul 2007 12:15:50 +0200
Subject: [Biopython-dev] Bio.Blast.NCBIWWW.qblast
Message-ID:
Hi,
I've just written a small extension to the qblast function. The
current version only passes a subset of parameters to NCBI; my
code passes all the parameters that the qblast API at NCBI accepts.
Is anyone interested in merging this into the blast module of
Biopython? Sorry, I do not know the protocol here for getting code
into Biopython.
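One way to pass all parameters without hard-coding each one is to accept arbitrary keyword arguments and URL-encode whichever are set. The sketch below only builds the request body and sends nothing. The parameter names (`CMD`, `PROGRAM`, `DATABASE`, `QUERY`, `EXPECT`, `HITLIST_SIZE`) are assumed to follow NCBI's BLAST URL API, and `build_qblast_query` is a hypothetical helper, not the signature of Biopython's `qblast`.

```python
from urllib.parse import urlencode


def build_qblast_query(program, database, sequence, **extra):
    """Return a URL-encoded 'Put' request body for a BLAST search.

    Any extra keyword argument is forwarded as an upper-cased parameter;
    arguments set to None are dropped rather than sent as empty values.
    """
    params = [("CMD", "Put"), ("PROGRAM", program),
              ("DATABASE", database), ("QUERY", sequence)]
    for name, value in sorted(extra.items()):  # sorted for a stable order
        if value is not None:
            params.append((name.upper(), value))
    return urlencode(params)


print(build_qblast_query("blastn", "nr", "ACGTACGT", expect=10, hitlist_size=50, filter=None))
# CMD=Put&PROGRAM=blastn&DATABASE=nr&QUERY=ACGTACGT&EXPECT=10&HITLIST_SIZE=50
```

Forwarding unknown keywords keeps the function usable when NCBI adds parameters, at the cost of not catching misspelled names locally.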
Cheng
From mdehoon at c2b2.columbia.edu Mon Jul 9 07:40:23 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Mon, 09 Jul 2007 20:40:23 +0900
Subject: [Biopython-dev] Bio.Blast.NCBIWWW.qblast
In-Reply-To:
References:
Message-ID: <46921EA7.2080106@c2b2.columbia.edu>
Dear Cheng,
Thank you for your contribution.
The "official" way to contribute code to Biopython is to go to
http://bugzilla.open-bio.org/, open a new bug report, and attach
your code to it.
For your qblast code, you can also just send it to me (not to the list),
then I can merge it into Biopython.
--Michiel.
From biopython-dev at maubp.freeserve.co.uk Tue Jul 10 15:31:55 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Tue, 10 Jul 2007 20:31:55 +0100
Subject: [Biopython-dev] FDist: more Population Genetics code
In-Reply-To: <6d941f120701030516m1adb3daeh6e4645121ba8679d@mail.gmail.com>
References: <6d941f120701030516m1adb3daeh6e4645121ba8679d@mail.gmail.com>
Message-ID: <4693DEAB.8000900@maubp.freeserve.co.uk>
Hi Tiago,
Have you had any feedback (off the mailing list)?
Ralph - did you have a chance to look over Tiago's code or discuss this
with him?
It would be a shame if nothing came from this...
Peter
Tiago Antão wrote:
> Hi!
>
> I have submitted another enhancement bug, with support for FDist. It
> allows one to generate and parse FDist files and to control fdist
> applications. There are also a couple of utility functions. FDist is a
> niche application (mainly used to detect selection in animal
> genetics). Not the most fundamental one to support, but it is
> currently one that I am working on, hence the code.
>
> Regarding my submitted code for GenePop, I have submitted a different
> version on bugzilla. The main difference is that I moved everything
> from Bio to Bio.PopGen.
>
> Before I continue putting code on bugzilla I would like to know if it
> is worthwhile... Any opinions on the submitted code, or on whether any
> changes are required? I would really like to continue converting my
> code to BioPython, but only if it has some chance of ending up
> useful/included in the distribution at some point in the future... ;)
>
> I am currently working on code related to SimCoal2, Arlequin and
> general statistics (Fst, heterozygosity, ...), which will probably be
> ready quite soon (i.e., within the next two weeks). This is more
> mainstream than FDist.
>
> I have some other code lying around, mainly related to HapMap, but I
> will only submit it after reviewing and reusing it again. That is the
> more distant future... a couple of months or so.
>
> Tiago
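As an illustration of the general statistics mentioned above, expected heterozygosity at one locus is H_e = 1 - sum(p_i^2), where p_i are the allele frequencies; the variant below can also apply Nei's small-sample correction n/(n-1), with n the number of sampled allele copies. This is a minimal sketch with a hypothetical function name, not Tiago's submitted code.

```python
from collections import Counter


def expected_heterozygosity(alleles, unbiased=True):
    """Expected heterozygosity from a list of sampled alleles at one locus.

    H_e = 1 - sum(p_i ** 2); with unbiased=True the value is multiplied by
    n / (n - 1), where n is the number of sampled allele copies.
    """
    n = len(alleles)
    if n < 2:
        raise ValueError("need at least two allele copies")
    counts = Counter(alleles)
    h = 1.0 - sum((c / n) ** 2 for c in counts.values())
    if unbiased:
        h *= n / (n - 1)
    return h


sample = ["A", "A", "B", "B"]
print(expected_heterozygosity(sample, unbiased=False))  # 0.5
print(expected_heterozygosity(sample))                  # 0.666...
```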
From biopython-dev at maubp.freeserve.co.uk Tue Jul 10 17:12:44 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Tue, 10 Jul 2007 22:12:44 +0100
Subject: [Biopython-dev] FDist: more Population Genetics code
In-Reply-To:
References: <6d941f120701030516m1adb3daeh6e4645121ba8679d@mail.gmail.com>
<4693DEAB.8000900@maubp.freeserve.co.uk>
Message-ID: <4693F64C.9050403@maubp.freeserve.co.uk>
Ralph Haygood wrote:
> Peter,
>
> I haven't received any code from Tiago to review.
>
> Ralph
He's put some on Bugzilla:
http://bugzilla.open-bio.org/show_bug.cgi?id=2170
Peter
From rhaygood at duke.edu Tue Jul 10 23:45:56 2007
From: rhaygood at duke.edu (Ralph Haygood)
Date: Tue, 10 Jul 2007 23:45:56 -0400 (EDT)
Subject: [Biopython-dev] FDist: more Population Genetics code
In-Reply-To: <4693DEAB.8000900@maubp.freeserve.co.uk>
References: <6d941f120701030516m1adb3daeh6e4645121ba8679d@mail.gmail.com>
<4693DEAB.8000900@maubp.freeserve.co.uk>
Message-ID:
Peter and Tiago,
Hello. No, I haven't done anything with Tiago's code. I'm afraid
it's pretty far from what I'm working on these days.
I still think it would be good for BioPython to include methods for
computing basic population-genetical statistics (Watterson's theta,
Tajima's D, etc.) from DNA alignments. I have in mind something like
BioPerl's PopGen (http://www.bioperl.org/wiki/HOWTO:PopGen). My own
code is easy to use with a Bio.Align.Generic.Alignment, but it doesn't
conform to BioPython's standards for style, testing, or documentation,
and I don't know when I'll have time to standardize it.
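For reference, Watterson's estimator is theta_W = S / a_n, where S is the number of segregating (polymorphic) sites in an alignment of n sequences and a_n = sum over i = 1..n-1 of 1/i. The minimal sketch below works on plain equal-length strings rather than a Bio.Align.Generic.Alignment, to stay self-contained; gap and ambiguity handling is deliberately left out, and the function name is hypothetical.

```python
def watterson_theta(sequences):
    """Watterson's theta (per alignment, not per site) for aligned sequences.

    S counts columns containing more than one distinct character; gaps and
    ambiguity codes are not treated specially in this sketch.
    """
    n = len(sequences)
    if n < 2:
        raise ValueError("need at least two sequences")
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned (equal length)")
    segregating = sum(1 for column in zip(*sequences) if len(set(column)) > 1)
    a_n = sum(1.0 / i for i in range(1, n))
    return segregating / a_n


print(watterson_theta(["AAAA", "AAAT", "AAAT"]))  # S = 1, a_3 = 1.5 -> 0.666...
```

Dividing the result by the alignment length gives the per-site value usually reported alongside Tajima's D.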
Ralph
From tiagoantao at gmail.com Wed Jul 11 06:05:21 2007
From: tiagoantao at gmail.com (Tiago Antão)
Date: Wed, 11 Jul 2007 12:05:21 +0200
Subject: [Biopython-dev] FDist: more Population Genetics code
In-Reply-To:
References: <6d941f120701030516m1adb3daeh6e4645121ba8679d@mail.gmail.com>
<4693DEAB.8000900@maubp.freeserve.co.uk>
Message-ID: <6d941f120707110305n75c75e77y7426280477cf23ca@mail.gmail.com>
Hi,
I had no feedback and it seemed that there was no interest, so I
decided to start a Python Population Genetics project on Google Code,
which is progressing but is still in the alpha stage:
http://code.google.com/p/pypopgen/
I am doing this on a personal basis for now (I have not even announced
it anywhere), so it is advancing at my personal pace and is designed
according to my needs.
I have already used it (or a tiny part of it) in a published
application ( http://popgen.eu/soft/m4s2 ).
I am still willing to integrate this into BioPython, but for that some
interest and feedback would be needed... That would have to happen
fairly soon, as the code will have to be adapted to BioPython
standards and namespace; once there is a lot of code, that will be
difficult in practice (and after going public it will really be
impossible).
The "strangest" code that I am writing (and that would need more
discussion) is for asynchronous computation (to make it easy to use on
multicore computers and grids).
Regards,
Tiago
From bugzilla-daemon at portal.open-bio.org Fri Jul 13 07:08:07 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 13 Jul 2007 07:08:07 -0400
Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64
In-Reply-To:
Message-ID: <200707131108.l6DB87xm027778@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2323
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |biopython-bugzilla at maubp.freeserve.co.uk
Status|RESOLVED |REOPENED
Resolution|FIXED |
------- Comment #22 from biopython-bugzilla at maubp.freeserve.co.uk 2007-07-13 07:08 EST -------
I've started to write a test case for the code now in Bio/SeqUtils/CheckSum.py
and noticed that while crc64, gcg and seguid will cope with both strings and
Seq objects, crc32 will only cope with strings.
Any objections to me fixing this like so:
Old:
    from binascii import crc32
New:
    from binascii import crc32 as _crc32

    def crc32(seq):
        """Returns the crc32 checksum for a sequence (string or Seq object)"""
        try:
            # Assume it's a Seq object
            return _crc32(seq.tostring())
        except AttributeError:
            # Assume it's a string
            return _crc32(seq)
--
Peter
From bugzilla-daemon at portal.open-bio.org Fri Jul 13 07:18:30 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 13 Jul 2007 07:18:30 -0400
Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64
In-Reply-To:
Message-ID: <200707131118.l6DBIUOS028425@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2323
------- Comment #23 from biopython-bugzilla at maubp.freeserve.co.uk 2007-07-13 07:18 EST -------
Created an attachment (id=703)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=703&action=view)
Initial unit test for Bio/SeqUtils/CheckSum
If the crc32 function could accept a Seq object, then the "try/except" at the
end wouldn't be needed.
From bugzilla-daemon at portal.open-bio.org Fri Jul 13 10:38:52 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 13 Jul 2007 10:38:52 -0400
Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64
In-Reply-To:
Message-ID: <200707131438.l6DEcqn5008339@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2323
------- Comment #24 from mdehoon at ims.u-tokyo.ac.jp 2007-07-13 10:38 EST -------
A better solution would be for Seq to inherit from str, instead of Seq having
str as a member. Then we don't have to modify crc32, and other code in
Biopython will also become simpler.
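A minimal sketch of what inheriting from `str` could look like, using a hypothetical `StrSeq` class that only adds an `alphabet` attribute. It shows both sides of the trade-off: string functions and methods accept the object directly, but inherited `str` methods such as `upper()` return a plain `str`, dropping the extra attribute unless they are overridden.

```python
class StrSeq(str):
    """Hypothetical Seq-like class that *is* a string, plus an alphabet."""

    def __new__(cls, data, alphabet="generic"):
        obj = super().__new__(cls, data)
        obj.alphabet = alphabet  # str subclasses can carry instance attributes
        return obj

    def __repr__(self):
        return "StrSeq(%s, %r)" % (str.__repr__(self), self.alphabet)


s = StrSeq("acgt", alphabet="DNA")
print(isinstance(s, str), len(s), s.alphabet)  # True 4 DNA
print(s.count("g"), s.find("gt"))              # 1 2
print(type(s.upper()).__name__)                # str: the alphabet is lost
```

In Python 2, where `str` was a byte string, `binascii.crc32` would accept such an object directly; in Python 3 it would still need `.encode()` first.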
From bugzilla-daemon at portal.open-bio.org Fri Jul 13 11:17:59 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 13 Jul 2007 11:17:59 -0400
Subject: [Biopython-dev] [Bug 2269] Shebang (hashbang) lines need cleanup
In-Reply-To:
Message-ID: <200707131517.l6DFHxY4010647@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2269
------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2007-07-13 11:17 EST -------
I have just fixed a few in CVS; here is a list of the remaining abnormal
shebang/hashbang lines:
biopython/Bio/EUtils/POM.py '#!/usr/bin/python -i\n'
biopython/Bio/EUtils/DTDs/LinkOut.py '#!/usr/bin/python\n'
biopython/Bio/EUtils/DTDs/__init__.py '#!/usr/bin/python\n'
biopython/Bio/EUtils/DTDs/eInfo_020511.py '#!/usr/bin/python\n'
biopython/Bio/EUtils/DTDs/eLink_020511.py '#!/usr/bin/python\n'
biopython/Bio/EUtils/DTDs/ePost_020511.py '#!/usr/bin/python\n'
biopython/Bio/EUtils/DTDs/eSearch_020511.py '#!/usr/bin/python\n'
biopython/Bio/EUtils/DTDs/eSummary_020511.py '#!/usr/bin/python\n'
The biopython/Bio/EUtils/*.py examples are interesting in that many of those
files are autogenerated from DTD files (using the dtd2py.py script I think -
but it doesn't seem to work on all of them).
Also, I don't think all the files under Bio/Restriction/*.py need a shebang,
and a large proportion of the unit tests have shebangs (but fewer than half).
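A list like the one above can be regenerated with a short script that walks the source tree and reports each `.py` file whose first line is a shebang other than an expected one. The expected value `#!/usr/bin/env python` and the function name are assumptions for illustration; the thread does not say which form counts as normal.

```python
import os


def abnormal_shebangs(root, expected="#!/usr/bin/env python\n"):
    """Yield (path, first_line) for .py files whose shebang is unexpected.

    Files without any shebang are skipped; only first lines starting with
    '#!' that differ from `expected` are reported.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the walk order deterministic
        for name in sorted(filenames):
            if not name.endswith(".py"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="latin-1") as handle:
                first_line = handle.readline()
            if first_line.startswith("#!") and first_line != expected:
                yield path, first_line


if __name__ == "__main__":
    for path, line in abnormal_shebangs("biopython/Bio"):
        print(path, repr(line))
```

Printing `repr(line)` keeps the trailing newline and flags such as `-i` visible, matching the format of the list above.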
From tiagoantao at gmail.com Fri Jul 13 11:23:03 2007
From: tiagoantao at gmail.com (Tiago Antão)
Date: Fri, 13 Jul 2007 16:23:03 +0100
Subject: [Biopython-dev] FDist: more Population Genetics code
In-Reply-To: <6d941f120707110305n75c75e77y7426280477cf23ca@mail.gmail.com>
References: <6d941f120701030516m1adb3daeh6e4645121ba8679d@mail.gmail.com>
<4693DEAB.8000900@maubp.freeserve.co.uk>
<6d941f120707110305n75c75e77y7426280477cf23ca@mail.gmail.com>
Message-ID: <6d941f120707130823i6b62478fl9ac589eb5c02ca9d@mail.gmail.com>
I just want to add that I followed precisely the procedure that was
suggested to me at the time, i.e. opening bugzilla issues, but I got no
answer or follow-up from it. I also had some very useful email
exchanges with Ralph at that time, but no code was circulated.
I reiterate my interest in supplying the code (currently supporting
fdist, simcoal2, genepop, hapmap, ucsc table browser - in varying
degrees of quality). You can have a look at the Google Code URL supplied
(the svn repository is there). I would still take the necessary time to
convert it to the BioPython namespace and format.
If in one week I see no interest at all (interest in the form of
proactively making things go forward), I will consider this a closed
issue and will not spend more time trying any form of integration,
since I have done all that was requested here and really got no
feedback.
Tiago
On 7/11/07, Tiago Ant?o wrote:
> Hi,
>
> I had no feedback and it seemed that there was no interest, so I
> decided to start a Python Population Genetics project on Google, which
> is going ahead, but still in alpha stages:
> http://code.google.com/p/pypopgen/
> I am doing this on a personal basis for now (I did not even announce
> it anywhere), so it is advancing at my personal pace and designed
> according to my needs.
> I have already used it (or a tiny part of it) in a published
> application ( http://popgen.eu/soft/m4s2 ).
> I am still willing to integrate this into BioPython, but for that some
> interest and feedback would be needed... That would have to happen
> fairly soon, as the code will have to be adapted to BioPython
> standards and namespace, and once there is a lot of code that will
> be difficult in practice (and after going public it will really be
> impossible).
>
> The "strangest" code that I am writing (and that would need more
> discussion) is code for asynchronous computation (to make it easy to use on
> multicore computers and grids).
>
> Regards,
> Tiago
>
> On 7/11/07, Ralph Haygood wrote:
> > Peter and Tiago,
> >
> > Hello. No, I haven't done anything with Tiago's code. I'm afraid
> > it's pretty far from what I'm working on these days.
> >
> > I still think it would be good for BioPython to include methods for
> > computing basic population-genetical statistics (Watterson's theta,
> > Tajima's D, etc.) from DNA alignments. I have in mind something like
> > BioPerl's PopGen (http://www.bioperl.org/wiki/HOWTO:PopGen). My own
> > code is easy to use with a Bio.Align.Generic.Alignment, but it doesn't
> > conform to BioPython's standards for style, testing, or documentation,
> > and I don't know when I'll have time to standardize it.
> >
> > Ralph
> >
> > On Tue, 10 Jul 2007, Peter wrote:
> >
> > > Hi Tiago,
> > >
> > > Have you had any feedback (off the mailing list)?
> > >
> > > Ralph - did you have a chance to look over Tiago's code or discuss this with
> > > him?
> > >
> > > It would be a shame if nothing came from this...
> > >
> > > Peter
> > >
> > > Tiago Antão wrote:
> > >> Hi!
> > >>
> > >> I have submitted another enhancement bug, with support for FDist. It
> > >> allows generating and parsing FDist files and controlling fdist
> > >> applications. There are also a couple of utility functions. FDist is a
> > >> niche application (mainly used to detect selection in animal
> > >> genetics). Not the most fundamental one to support, but it is
> > >> currently one that I am working on, hence the code.
> > >>
> > >> Regarding my submitted code for GenePop, I have submitted a different
> > >> version on Bugzilla. The main difference is that I moved everything
> > >> from Bio to Bio.PopGen.
> > >>
> > >> Before I continue putting code on Bugzilla I would like to know if it
> > >> is worthwhile doing it... Any opinions on the code submitted, or are any
> > >> changes required? I would really like to continue converting my
> > >> code to BioPython, but only if it has some chance of ending up
> > >> being useful/included in the distribution at some point... ;)
> > >>
> > >> I am currently working on code related to SimCoal2, Arlequin and
> > >> general statistics (Fst, heterozygosity, ...), which will probably be
> > >> ready quite soon (i.e. within the next two weeks). This is more
> > >> mainstream than FDist.
> > >>
> > >> I have some other code lying around mainly related to HapMap, but I
> > >> will only submit it after reviewing and reusing it again. This is more
> > >> distant future ... like a couple of months.
> > >>
> > >> Tiago
> > >
> > >
> > >
>
From bugzilla-daemon at portal.open-bio.org Fri Jul 13 11:25:32 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 13 Jul 2007 11:25:32 -0400
Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64
In-Reply-To:
Message-ID: <200707131525.l6DFPWMa011025@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2323
------- Comment #25 from biopython-bugzilla at maubp.freeserve.co.uk 2007-07-13 11:25 EST -------
Changing the Seq object to be a subclass of string might be nice... but perhaps
rather confusing for minority alphabets where the "letters" are not single
characters (*). More importantly, wouldn't this dramatic change break a lot of
existing scripts? Probably something for the mailing list!
(*) I've never done it, but one example is storing three-letter protein
sequences, which is useful if you have any post-translational modifications
that cannot be represented using the single-letter scheme.
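A quick illustration of the concern, assuming a hypothetical StrSeq class that simply subclasses str (this is not how Biopython's Seq is implemented). len() and indexing work on characters rather than residues, and slicing a str subclass returns a plain str, so an extra attribute such as an alphabet is silently lost.

```python
class StrSeq(str):
    """Hypothetical Seq-as-str subclass, for illustration only (not Biopython)."""

    def __new__(cls, data, alphabet=None):
        obj = super().__new__(cls, data)
        obj.alphabet = alphabet  # extra attribute stored next to the text
        return obj


# Three residues written with three-letter codes, nine characters in total
three_letter = StrSeq("AlaGlyLys", alphabet="three-letter protein")

# len() and indexing see characters, not residues
n_chars = len(three_letter)
first_char = three_letter[0]

# Residues have to be recovered by hand, in steps of three characters
residues = [three_letter[i:i + 3] for i in range(0, len(three_letter), 3)]

# Slicing a str subclass gives back a plain str, dropping .alphabet
fragment = three_letter[0:3]
```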
From biopython-dev at maubp.freeserve.co.uk Sat Jul 14 06:22:06 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Sat, 14 Jul 2007 11:22:06 +0100
Subject: [Biopython-dev] Bug 2292 - TER lines from Bio.PDBIO
Message-ID: <4698A3CE.7020907@maubp.freeserve.co.uk>
Hi Thomas,
Could you have a look at Biopython Bug 2292 and the suggested patch from
Michal Gajda to write TER records in line with the spec:
http://bugzilla.open-bio.org/show_bug.cgi?id=2292
Thanks
Peter
From tiagoantao at gmail.com Sat Jul 14 12:32:43 2007
From: tiagoantao at gmail.com (Tiago Antão)
Date: Sat, 14 Jul 2007 17:32:43 +0100
Subject: [Biopython-dev] Population Genetics code
Message-ID: <6d941f120707140932u356c84bel6a9322a2767e6da7@mail.gmail.com>
Hi!
Firstly I would like to thank everybody who answered so positively to
my "rant" about submitting population genetics code to Biopython.
I have a few suggestions on how to progress in a safe and constructive
way with a possible Population Genetics part for Biopython.
First of all, the starting point:
1. None of the core developers is actively working on
population genetics.
2. Point 1 entails that any code submissions (made by Biopython
newbies like me) cannot be completely reviewed by
seasoned Biopython developers.
3. Initially there will only be me submitting code (please correct me
if I am wrong, especially Ralph...)
4. There is already some popgen statistical code in Python lying
around, e.g. http://www.pypop.org/
Therefore I suggest starting out with a small, "safe" project
around a not very widely used application (Mark Beaumont's Fdist program
http://www.rubic.rdg.ac.uk/~mab/software.html ). This code is already
done and tested (by myself). I also have test cases (in BioPython
format) for parts of it. The biggest remaining issue is that it currently
lives outside the Bio.PopGen namespace, so it's not really very major...
I would provide parsers, configuration file generators and utilities
to run the suite of fdist programs.
Why start with such a simple and less relevant application:
1. It's safer to start with something less grand (if it's poorly done it
won't be that serious).
2. There is no Python fdist code lying around, so there is no overlap
at all with existing projects.
3. This code is already done and being used...
I will provide code, test code, and documentation (probably by adding
stuff to the wiki). Then other people could evaluate what was done,
and we would continue from there to other, more used applications
(Genepop, arlequin, simcoal2, ...) and databases (HapMap,
TableBrowser).
Is this an acceptable way of going ahead? If other people would like
to participate, that would be fantastic...
If my suggestion is rubbish, please also say ;)
Many thanks,
Tiago
From biopython-dev at maubp.freeserve.co.uk Mon Jul 16 14:27:40 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Mon, 16 Jul 2007 19:27:40 +0100
Subject: [Biopython-dev] Biopython usage figures
Message-ID: <469BB89C.8010904@maubp.freeserve.co.uk>
A little last-minute, I know, but does anyone have access to the website
download statistics? I'd like to include rough figures for the number of
downloads of the recent releases in the BOSC 2007 talk.
A list of developers with CVS access would be nice too, but I can just
trawl through the logs to spot active people ;)
Peter
From biopython-dev at maubp.freeserve.co.uk Mon Jul 16 14:50:49 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Mon, 16 Jul 2007 19:50:49 +0100
Subject: [Biopython-dev] Is Bio.Crystal obsolete?
Message-ID: <469BBE09.1000005@maubp.freeserve.co.uk>
I just had a look at the Bio.Crystal module by Katharine Lindner (2002),
consisting of the single file Bio/Crystal/__init__.py whose preamble states:
> Hetero, Crystal and Chain exist to represent the NDB Atlas
> structure. Atlas is a minimal subset of the PDB format. Heteo
> supports a 3 alphameric code. The NDB web interface is located at
> ...
The old link should probably be updated as it doesn't work, perhaps:
http://ndbserver.rutgers.edu/atlas/index.html
As far as I can see, they now provide their downloads in PDB, CIF and an
XML file format, and at first glance the PDB files look like the full thing
to me rather than a minimal subset.
There is a unit test, Tests/test_Crystal.py, but no example input files.
This module looks obsolete to me. Can we mark it as deprecated after
checking on the main list that no one uses it (as was done for Bio.Kabat
back in March 2007)?
Peter
From tiagoantao at gmail.com Wed Jul 18 06:29:08 2007
From: tiagoantao at gmail.com (Tiago Antão)
Date: Wed, 18 Jul 2007 11:29:08 +0100
Subject: [Biopython-dev] PopGen code
Message-ID: <6d941f120707180329u6bf60c50o8e4868e5a470de2c@mail.gmail.com>
Hi!
Starting today I will begin putting code on CVS regarding Population
Genetics stuff.
I will start by checking in a GenePop parser and test code.
Very soon FDist code will follow.
After that Simcoal stuff, more GenePop stuff, HapMap and UCSC table
browser will follow.
I was not able to read the dev.open-bio.org suggestions as it seems to
have been down for some time.
If any of the senior Biopython developers finds that I am doing
anything seriously wrong, please don't hesitate to contact me
immediately.
I will be putting everything below a PopGen directory in Bio.
Everything except tests, of course ;)
Regards,
Tiago
From biopython-dev at maubp.freeserve.co.uk Wed Jul 18 17:37:46 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Wed, 18 Jul 2007 22:37:46 +0100
Subject: [Biopython-dev] PopGen code
In-Reply-To: <6d941f120707180329u6bf60c50o8e4868e5a470de2c@mail.gmail.com>
References: <6d941f120707180329u6bf60c50o8e4868e5a470de2c@mail.gmail.com>
Message-ID: <320fb6e00707181437l22b1aecdh8ab5c2fa2aea7380@mail.gmail.com>
Tiago Antão wrote:
> Hi!
>
> Starting today I will begin putting code on CVS regarding Population
> Genetics stuff...
> I will be putting everything below a PopGen directory in Bio.
> Everything except tests, of course ;)
Sounds good :)
If you can write some introductory text to add to the
cookbook/tutorial that would be even better. If you are not familiar
with LaTeX, then just write it up in plain text and I could add that
to the tutorial with suitable mark-up/formatting on your behalf.
This may be easier to do in chunks as you add new code, or in a large
batch later on - up to you.
Peter
From tiagoantao at gmail.com Wed Jul 18 18:46:19 2007
From: tiagoantao at gmail.com (Tiago Antão)
Date: Wed, 18 Jul 2007 23:46:19 +0100
Subject: [Biopython-dev] PopGen code
In-Reply-To: <320fb6e00707181437l22b1aecdh8ab5c2fa2aea7380@mail.gmail.com>
References: <6d941f120707180329u6bf60c50o8e4868e5a470de2c@mail.gmail.com>
<320fb6e00707181437l22b1aecdh8ab5c2fa2aea7380@mail.gmail.com>
Message-ID: <6d941f120707181546y34e17038nb07106dacae533db@mail.gmail.com>
Hi!
On 7/18/07, Peter wrote:
> If you can write some introductory text to add to the
> cookbook/tutorial that would be even better. If you are not familiar
> with LaTeX, then just write it up in plain text and I could add that
> to the tutorial with suitable mark-up/formatting on your behalf.
I agree; in fact it is what I intend to do after getting the FDist code in.
I will write mostly in parallel with committing, so the doc should be
more or less aligned with what is being put in CVS...
Regards,
Tiago
From tiagoantao at gmail.com Thu Jul 19 09:09:29 2007
From: tiagoantao at gmail.com (Tiago Antão)
Date: Thu, 19 Jul 2007 14:09:29 +0100
Subject: [Biopython-dev] PopGen Documentation
Message-ID: <6d941f120707190609o4f5f7412x94851295865ba22b@mail.gmail.com>
Hi All,
Following Peter's suggestion, I had a closer look at the
documentation and, if nobody objects, I would like to add a new
subsection between PDB and Miscellaneous in the cookbook chapter, like
this:
4.10 Going 3D: The PDB module
4.11 PopGen: Population genetics (and genomics)
4.12 Miscellaneous
Tiago
On 7/18/07, Peter wrote:
> Tiago Antão wrote:
> > Hi!
> >
> > Starting today I will begin putting code on CVS regarding Population
> > Genetics stuff...
> > I will be putting everything below a PopGen directory in Bio.
> > Everything except tests, of course ;)
>
> Sounds good :)
>
> If you can write some introductory text to add to the
> cookbook/tutorial that would be even better. If you are not familiar
> with LaTeX, then just write it up in plain text and I could add that
> to the tutorial with suitable mark-up/formatting on your behalf.
>
> This may be easier to do in chunks as you add new code, or in a large
> batch later on - up to you.
>
> Peter
>
From bugzilla-daemon at portal.open-bio.org Sat Jul 21 11:28:49 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 21 Jul 2007 11:28:49 -0400
Subject: [Biopython-dev] [Bug 2269] Shebang (hashbang) lines need cleanup
In-Reply-To:
Message-ID: <200707211528.l6LFSnBk031498@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2269
------- Comment #8 from mdehoon at ims.u-tokyo.ac.jp 2007-07-21 11:28 EST -------
In biopython/Bio/EUtils, the shebang line was specified in sourcegen.py. I
fixed that line and the shebang lines in the other *.py files under
biopython/Bio/EUtils. Can we close this bug?
From bugzilla-daemon at portal.open-bio.org Sat Jul 21 11:47:32 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 21 Jul 2007 11:47:32 -0400
Subject: [Biopython-dev] [Bug 2291] __init__.py missing in the Bio.PDB.mmCIF
folder after the install
In-Reply-To:
Message-ID: <200707211547.l6LFlWxU032394@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2291
mdehoon at ims.u-tokyo.ac.jp changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |FIXED
------- Comment #1 from mdehoon at ims.u-tokyo.ac.jp 2007-07-21 11:47 EST -------
I have modified setup.py so that Bio.PDB.mmCIF is always a module (but not
necessarily with the MMCIFlex module; users still need to modify setup.py to
include it). With Bio.PDB.mmCIF always present, the Bio/PDB/mmCIF/__init__.py
file is no longer lost.
From bugzilla-daemon at portal.open-bio.org Sun Jul 22 04:30:11 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 22 Jul 2007 04:30:11 -0400
Subject: [Biopython-dev] [Bug 2269] Shebang (hashbang) lines need cleanup
In-Reply-To:
Message-ID: <200707220830.l6M8UB6d006746@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2269
------- Comment #9 from biopython-bugzilla at maubp.freeserve.co.uk 2007-07-22 04:30 EST -------
Regarding comment 8, after changing sourcegen.py were you able to regenerate
all the biopython/Bio/EUtils/*.py files? Or did you just fix them by hand?
Anyway - that should leave us with consistent shebang/hashbang lines :)
Unless we also want to remove any surplus lines, and decide if all or none of
the unit tests should have them, then this bug looks done.
From bugzilla-daemon at portal.open-bio.org Sun Jul 22 05:53:46 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 22 Jul 2007 05:53:46 -0400
Subject: [Biopython-dev] [Bug 2269] Shebang (hashbang) lines need cleanup
In-Reply-To:
Message-ID: <200707220953.l6M9rkap010929@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2269
mdehoon at ims.u-tokyo.ac.jp changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |FIXED
------- Comment #10 from mdehoon at ims.u-tokyo.ac.jp 2007-07-22 05:53 EST -------
> Regarding comment 8, after changing sourcegen.py were you able to regenerate
> all the biopython/Bio/EUtils/*.py files? Or did you just fix them by hand?
I fixed them by hand. The fixed sourcegen.py should result in the same
biopython/Bio/EUtils/*.py files as I created by hand. I tried regenerating
these files automatically, but that didn't work for me. At some point, somebody
should figure out how the biopython/Bio/EUtils code works.
> Unless we also want to remove any surplus lines, and decide if all or none of
> the unit tests should have them, then this bug looks done.
Since Python itself does not seem to have a clear rule as to which files should
have a shebang line, it is not obvious which Biopython files should have one.
If somebody really wants to fix this, it's probably better to discuss such an
issue on the mailing list first. As the issue raised by the original bug report
has been resolved, I am closing this bug.
From mdehoon at c2b2.columbia.edu Sun Jul 22 06:28:22 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Sun, 22 Jul 2007 19:28:22 +0900
Subject: [Biopython-dev] Bio.AlignIO (was Re: [BioPython] Bio.SeqIO and
files with one record)
In-Reply-To: <4693E5FE.708@maubp.freeserve.co.uk>
References: <4693E5FE.708@maubp.freeserve.co.uk>
Message-ID: <46A33146.7030405@c2b2.columbia.edu>
Peter wrote:
> P.S. Any comments on the Bio.AlignIO ideas I raised back in May 2007?
> http://lists.open-bio.org/pipermail/biopython/2007-May/003472.html
>
Let's discuss the Bio.Align.Alignment class first, and then decide how
to parse alignment files.
Currently, the alignment class holds a list of SeqRecord objects:

    class Alignment:
        ...
        def __init__(self, alphabet):
            ...
            # hold everything as a list of seq record objects
            self._records = []

To get access to self._records, the Alignment class has some accessor
functions:

        def get_all_seqs(self):
            ...
            return self._records

        def get_seq_by_num(self, number):
            ...
            return self._records[number].seq

A cleaner way to do this is to let the class Alignment inherit from
list. This also allows us to use all list methods on Alignment objects.
For example, we can iterate over them, as suggested in this bug report:
http://bugzilla.open-bio.org/show_bug.cgi?id=1944
Any objections against letting Alignment inherit from list?
--Michiel
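A minimal sketch of the proposal, assuming a namedtuple stand-in for SeqRecord (Record, with id and seq fields) so that it runs without Biopython. The two old accessors are kept as thin wrappers so existing scripts calling them would keep working.

```python
from collections import namedtuple

# Stand-in for Bio.SeqRecord.SeqRecord with just the fields this sketch uses
Record = namedtuple("Record", ["id", "seq"])


class Alignment(list):
    """Sketch of an Alignment that is itself a list of SeqRecord-like objects."""

    def __init__(self, alphabet, records=()):
        list.__init__(self, records)
        self._alphabet = alphabet

    # Old accessors kept as thin wrappers for backwards compatibility
    def get_all_seqs(self):
        return self  # the alignment itself is now the list of records

    def get_seq_by_num(self, number):
        return self[number].seq


aln = Alignment("DNA", [Record("seq1", "ACGT"), Record("seq2", "AC-T")])
ids = [record.id for record in aln]  # iteration comes for free from list
```

One caveat of subclassing list: slicing such as aln[0:1] still returns a plain list rather than an Alignment unless __getitem__ is overridden as well.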
From salish at picasso.ucsf.edu Sun Jul 22 14:27:58 2007
From: salish at picasso.ucsf.edu (Howard Salis)
Date: Sun, 22 Jul 2007 11:27:58 -0700
Subject: [Biopython-dev] Bio.AlignIO (was Re: [BioPython] Bio.SeqIO and
files with one record)
In-Reply-To: <46A33146.7030405@c2b2.columbia.edu>
References: <4693E5FE.708@maubp.freeserve.co.uk>
<46A33146.7030405@c2b2.columbia.edu>
Message-ID: <9fa7e98e0707221127v5b7b2a85x38978fd647e18931@mail.gmail.com>
Hello all,
To get this same behavior, you can also create the __iter__ and next()
methods in Alignment itself.
-Howard Salis
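For comparison, a sketch of this alternative, in which Alignment keeps its private list and only gains an __iter__ method (add_record is a hypothetical helper for the example, not an existing Biopython method). Because __iter__ returns an iterator over the internal list, Alignment itself does not need a separate next() method.

```python
class Alignment:
    """Sketch: keep the private record list, but make Alignment iterable."""

    def __init__(self, alphabet):
        self._alphabet = alphabet
        self._records = []

    def add_record(self, record):
        # Hypothetical helper, only here to populate the example
        self._records.append(record)

    def __iter__(self):
        # Delegating to the list's own iterator is enough for for-loops;
        # Alignment does not need a next() method of its own.
        return iter(self._records)

    def __len__(self):
        return len(self._records)


aln = Alignment("protein")
aln.add_record("record 1")
aln.add_record("record 2")
seen = [record for record in aln]
```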
On 7/22/07, Michiel de Hoon wrote:
> Peter wrote:
> > P.S. Any comments on the Bio.AlignIO ideas I raised back in May 2007?
> > http://lists.open-bio.org/pipermail/biopython/2007-May/003472.html
> >
> Let's discuss the Bio.Align.Alignment class first, and then decide how
> to parse alignment files.
>
> Currently, the alignment class holds a list of SeqRecord objects:
>
>
> class Alignment:
> ...
> def __init__(self, alphabet):
> ...
> # hold everything at a list of seq record objects
> self._records = []
>
> To get access to self_record, the Alignment class has some accessor
> functions:
>
> def get_all_seqs(self):
> ...
> return self._records
>
>
> def get_seq_by_num(self, number):
> ...
> return self._records[number].seq
>
> A cleaner way to do this is to let the class Alignment inherit from
> list. This also allows us to use all list methods on Alignment objects.
> For example, we can iterate over them, as suggested in this bug report:
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=1944
>
> Any objections against letting Alignment inherit from list?
>
>
> --Michiel
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>
From mdehoon at c2b2.columbia.edu Wed Jul 25 09:17:33 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Wed, 25 Jul 2007 22:17:33 +0900
Subject: [Biopython-dev] Bio.AlignIO (was Re: [BioPython] Bio.SeqIO and
files with one record)
In-Reply-To: <9fa7e98e0707221127v5b7b2a85x38978fd647e18931@mail.gmail.com>
References: <4693E5FE.708@maubp.freeserve.co.uk> <46A33146.7030405@c2b2.columbia.edu>
<9fa7e98e0707221127v5b7b2a85x38978fd647e18931@mail.gmail.com>
Message-ID: <46A74D6D.9020309@c2b2.columbia.edu>
Sure, that is possible, but that means we'd be adding methods to
Alignment in order for it to behave like a list, whereas we can get
that for free by letting the Alignment class inherit from list.
--Michiel.
Howard Salis wrote:
> Hello all,
>
>
> To get this same behavior, you can also create the __iter__ and next()
> methods in Alignment itself.
>
> -Howard Salis
>
From biopython-dev at maubp.freeserve.co.uk Wed Jul 25 09:34:02 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Wed, 25 Jul 2007 14:34:02 +0100
Subject: [Biopython-dev] Bio.AlignIO
In-Reply-To: <46A74D6D.9020309@c2b2.columbia.edu>
References: <4693E5FE.708@maubp.freeserve.co.uk> <46A33146.7030405@c2b2.columbia.edu> <9fa7e98e0707221127v5b7b2a85x38978fd647e18931@mail.gmail.com>
<46A74D6D.9020309@c2b2.columbia.edu>
Message-ID: <46A7514A.1090405@maubp.freeserve.co.uk>
Michiel de Hoon wrote:
> Sure, that is possible, but that means we'd be adding methods to
> Alignment in order for it to behave like a list, whereas we can get
> that for free by letting the Alignment class inherit from list.
>
> --Michiel.
Personally I see an alignment as both an array of characters (i.e. amino
acid residues or nucleotides), and a list of sequences.
In the same way that a Numeric or NumPy array lets you iterate over
rows, yet also access individual elements, we could allow iteration of
SeqRecords and also allow access to individual letters.
Peter
From mdehoon at c2b2.columbia.edu Wed Jul 25 10:44:56 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Wed, 25 Jul 2007 23:44:56 +0900
Subject: [Biopython-dev] Bio.AlignIO
In-Reply-To: <46A7514A.1090405@maubp.freeserve.co.uk>
References: <4693E5FE.708@maubp.freeserve.co.uk> <46A33146.7030405@c2b2.columbia.edu> <9fa7e98e0707221127v5b7b2a85x38978fd647e18931@mail.gmail.com>
<46A74D6D.9020309@c2b2.columbia.edu>
<46A7514A.1090405@maubp.freeserve.co.uk>
Message-ID: <46A761E8.5080909@c2b2.columbia.edu>
Peter wrote:
> Personally I see an alignment as both an array of characters (i.e. amino
> acid residues or nucleotides), and a list of sequences.
>
> In the same way that a Numeric or NumPy array lets you iterate over
> rows, yet also access individual elements, we could allow iteration of
> SeqRecords and also allow access to individual letters.
How about the following:
- Iterators iterate over the SeqRecords in the alignment
- An index of the form [xxx] returns the corresponding SeqRecord
- An index of the form [xxx:yyy:zzz] returns an Alignment object
  containing the SeqRecords in rows [xxx:yyy:zzz]
  (compare to the current method get_all_seqs()).
- An index of the form [xxx,:] returns the Seq object of the SeqRecord at
  xxx (this is currently done by the get_seq_by_num() method).
- An index of the form [xxx:yyy:zzz,:] returns a list of Seq objects
- An index of the form [:,www] returns a string containing the characters
  at column www (which is currently done by the get_column method)
- An index of the form [xxx:yyy:zzz,www] returns a string containing the
  characters at column www using only the rows xxx:yyy:zzz.
- An index of the form [xxx,www] returns a string containing the
  character of the sequence in row xxx at column www.
This is more-or-less how Numerical Python arrays work, except that we'll
be returning SeqRecord/Seq/string objects depending on the indices.
--Michiel.
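A rough sketch of how __getitem__ could dispatch on the index types in this proposal. It assumes a namedtuple stand-in for SeqRecord and plain strings in place of Seq objects, and it raises NotImplementedError for column slices other than ':', which the proposal does not cover.

```python
from collections import namedtuple

# Stand-ins: Record for SeqRecord, and plain strings for Seq objects
Record = namedtuple("Record", ["id", "seq"])
FULL = slice(None, None, None)  # what Python passes for a bare ':'


class Alignment:
    def __init__(self, records):
        self._records = list(records)

    def __len__(self):
        return len(self._records)

    def __iter__(self):
        # Iterators iterate over the SeqRecords (rows)
        return iter(self._records)

    def __getitem__(self, index):
        if isinstance(index, int):
            return self._records[index]                # [xxx] -> SeqRecord
        if isinstance(index, slice):
            return Alignment(self._records[index])     # [xxx:yyy:zzz] -> Alignment
        row, col = index                               # two-dimensional index
        if isinstance(col, int):
            if isinstance(row, int):
                return self._records[row].seq[col]     # [xxx,www] -> one character
            # [:,www] or [xxx:yyy:zzz,www] -> string for one column
            return "".join(rec.seq[col] for rec in self._records[row])
        if col == FULL:
            if isinstance(row, int):
                return self._records[row].seq          # [xxx,:] -> Seq
            return [rec.seq for rec in self._records[row]]  # [xxx:yyy:zzz,:] -> list of Seq
        raise NotImplementedError("column slices other than ':' are not covered")
```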
From biopython-dev at maubp.freeserve.co.uk Wed Jul 25 12:10:43 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Wed, 25 Jul 2007 17:10:43 +0100
Subject: [Biopython-dev] Improving the Alignment object. Was Bio.AlignIO
In-Reply-To: <46A761E8.5080909@c2b2.columbia.edu>
References: <4693E5FE.708@maubp.freeserve.co.uk> <46A33146.7030405@c2b2.columbia.edu> <9fa7e98e0707221127v5b7b2a85x38978fd647e18931@mail.gmail.com> <46A74D6D.9020309@c2b2.columbia.edu> <46A7514A.1090405@maubp.freeserve.co.uk>
<46A761E8.5080909@c2b2.columbia.edu>
Message-ID: <46A77603.1030101@maubp.freeserve.co.uk>
Michiel de Hoon wrote:
> Peter wrote:
>> Personally I see an alignment as both an array of characters (i.e. amino
>> acid residues or nucleotides), and a list of sequences.
>>
>> In the same way that a Numeric or NumPy array lets you iterate over
>> rows, yet also access individual elements, we could allow iteration of
>> SeqRecords and also allow access to individual letters.
>
> How about the following:
>
> -Iterators iterate for the SeqRecords in the alignment
I agree. And this is trivial to implement without needing the element
access/slicing support.
As to element access, we've been thinking along similar lines :)
It's just that with all the different special cases, there are lots of
different possible return types!
> -An index of the form [xxx] returns the corresponding SeqRecord
> -An index of the form [xxx:yyy:zzz] returns an Alignment object
> containing the SeqRecords in rows [xxx:yyy:zzz]
> (compare to the current method get_all_seqs()).
I agree. This is essential to make an alignment act like a list of
SeqRecord objects when only a one-dimensional index is given.
> -An index of the form [xxx,:] returns the Seq object of the SeqRecord at
> xxx (this is currently done by the get_seq_by_num() method).
> -An index of the form [xxx:yyy:zzz,:] returns a list of Seq objects
I'm not immediately convinced about returning Seq objects here. I might
expect indices like [xxx,:] to return a SeqRecord (not a Seq) and
[xxx:yyy:zzz,:] to return a sub-alignment (not a list of Seq objects).
> -An index of the form [:,www] returns a string containing the characters
> at column www (which is currently done by the get_column method)
> -An index of the form [xxx,www] returns a string containing the
> character of the sequence in row xxx at column www.
Those look fine - however we might want to return Seq objects rather
than strings.
> -An index of the form [xxx:yyy:zzz,www] returns a string containing
> the characters at column www using only the rows xxx:yyy:zzz.
Or a sub alignment? See later...
> This is more-or-less how Numerical Python arrays work, except that we'll
> be returning SeqRecord/Seq/string objects depending on the indices.
For comparison, that is what I had been thinking:
* [r,c] means one element is requested, return a single character string
* [r] or [r,:] means one row is requested, return a SeqRecord
* [:,c] means one column is requested, return a string (or Seq object?)
* Otherwise returns a (sub)alignment. Note that [:] or [:,:] would
return a copy of the alignment.
This would cover slicing of the column index by returning a
sub-alignment. i.e. indexes of the form [rrr, xxx:yyy:zzz] or
[rrr:ppp:qqq, xxx:yyy:zzz]
I'm not sure if requests for part of a single row or column, like
[rrr, xxx:yyy:zzz] and [rrr:ppp:qqq, xxx], are best handled by returning
sub-alignments or as special cases (strings/Seq and Seq/SeqRecord
respectively?).
Peter
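A sketch of this variant, under the same assumptions (a namedtuple stand-in for SeqRecord, plain strings for Seq). For the undecided cases such as [rrr, xxx:yyy:zzz] and [rrr:ppp:qqq, xxx] it simply falls through to the "otherwise return a (sub)alignment" rule; that is one possible choice, not a settled design.

```python
from collections import namedtuple

Record = namedtuple("Record", ["id", "seq"])  # stand-in for SeqRecord
FULL = slice(None, None, None)  # what Python passes for a bare ':'


def _as_slice(index):
    """Turn an integer index into an equivalent one-element slice."""
    if isinstance(index, slice):
        return index
    return slice(index, index + 1 or None)  # index -1 becomes slice(-1, None)


class Alignment:
    def __init__(self, records):
        self._records = list(records)

    def __len__(self):
        return len(self._records)

    def __iter__(self):
        return iter(self._records)

    def __getitem__(self, index):
        # A one-dimensional index [r] is treated as [r,:]
        row, col = index if isinstance(index, tuple) else (index, FULL)
        if isinstance(row, int) and isinstance(col, int):
            return self._records[row].seq[col]        # [r,c] -> one character
        if isinstance(row, int) and col == FULL:
            return self._records[row]                 # [r] or [r,:] -> SeqRecord
        if row == FULL and isinstance(col, int):
            return "".join(rec.seq[col] for rec in self._records)  # [:,c] -> column
        # Everything else, including [:] and [:,:], gives a new (sub)alignment
        cols = _as_slice(col)
        return Alignment(Record(rec.id, rec.seq[cols])
                         for rec in self._records[_as_slice(row)])
```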
From bugzilla-daemon at portal.open-bio.org Thu Jul 26 10:52:38 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 26 Jul 2007 10:52:38 -0400
Subject: [Biopython-dev] [Bug 2340] New: SProt.py fails to parse the current
Swiss-Prot version 54.0
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2340
Summary: SProt.py fails to parse the current Swiss-Prot version
54.0
Product: Biopython
Version: 1.43
Platform: All
OS/Version: Linux
Status: NEW
Severity: normal
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: gould at embl.de
Hi,
I'm running Python 2.3.4 on a Red Hat Linux box and am trying to parse any
Swiss-Prot record, but the parser just seems to bomb out without reporting
where it actually fails. I'm guessing it has to do with UniProt Release 54.0
of 24-Jul-07 and its addition of the new PE line type?
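If the reporter's guess is right and a new line type is what trips the parser, one defensive pattern is to split each record line on its two-letter line code and set aside codes the parser does not recognise, instead of failing. The helper below is hypothetical and is not how Bio.SwissProt.SProt works; it assumes the usual flat-file layout of a two-character code, three spaces, then the data from column 6.

```python
def split_swissprot_lines(lines, known_codes=None):
    """Split Swiss-Prot flat-file lines into (code, value) pairs.

    Hypothetical helper: each line starts with a two-letter line code
    (ID, AC, DE, ...). Codes not in known_codes are collected in a
    separate list instead of raising, so an unexpected line type does
    not stop the parse.
    """
    pairs, unknown = [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        code, value = line[:2], line[5:]  # code, 3 spaces, then the data
        if known_codes is not None and code not in known_codes:
            unknown.append(code)
            continue
        pairs.append((code, value))
    return pairs, unknown
```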
From bugzilla-daemon at portal.open-bio.org Thu Jul 26 11:46:36 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 26 Jul 2007 11:46:36 -0400
Subject: [Biopython-dev] [Bug 2340] SProt.py fails to parse the current
Swiss-Prot version 54.0
In-Reply-To:
Message-ID: <200707261546.l6QFkaGq022472@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2340
------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2007-07-26 11:46 EST -------
Hi Kate,
Could you give us the URL of one or two specific SwissProt files you're having
trouble with.
Also how are you trying to read the SwissProt files? e.g. with
Bio.SeqIO.parse()?
If you could include the python error too, that could be helpful. Thanks.
Peter
From bugzilla-daemon at portal.open-bio.org Thu Jul 26 12:06:15 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 26 Jul 2007 12:06:15 -0400
Subject: [Biopython-dev] [Bug 2340] SProt.py fails to parse the current
Swiss-Prot version 54.0
In-Reply-To:
Message-ID: <200707261606.l6QG6FkE023264@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2340
------- Comment #2 from gould at embl.de 2007-07-26 12:06 EST -------
(In reply to comment #1)
hi
the following snippet of code is where the error occurs (this used to work with
no problem until something changed in the last day or two, I guess):
def getSequence(self,acc):
    """ This method retrieves the most recent annotated sequence from the ExPASy
    server for a given accession number. """
    from Bio.WWW import ExPASy
    from Bio.SwissProt import SProt
    from Bio import File
    if acc != '':
        try:
            results = ExPASy.get_sprot_raw(acc.strip()).read()
            sp_parser = SProt.RecordParser()
            sp_iterator = SProt.Iterator(File.StringHandle(results),
                                         sp_parser)
            Record = sp_iterator.next()
            return Record.sequence.strip()
        except:
            return -1
    else:
        return acc
It breaks at the line Record = sp_iterator.next() but doesn't print any error
to the terminal.
Some examples of the accession numbers used are: P01100, P12522, etc.
thanks
Kate
From bugzilla-daemon at portal.open-bio.org Thu Jul 26 12:32:31 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 26 Jul 2007 12:32:31 -0400
Subject: [Biopython-dev] [Bug 2340] SProt.py fails to parse the current
Swiss-Prot version 54.0
In-Reply-To:
Message-ID: <200707261632.l6QGWVrC024560@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2340
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |ASSIGNED
------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2007-07-26 12:32 EST -------
Confirming bug - it is due to the new PE line (protein existence).
The reason you didn't see the error is that in your example the parser is
wrapped in a try ... except ... clause.
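To illustrate why the error went unnoticed, here is a minimal sketch. It uses a made-up stand-in parser (`parse_record`) and a made-up record, not SProt itself: a bare `except:` turns any parse failure into a silent `-1`, while catching only the expected exception, reporting it and re-raising keeps the cause visible.

```python
import sys

def parse_record(text):
    """Hypothetical stand-in parser that rejects unknown line codes."""
    known = {"ID", "AC", "SQ", "  ", "//"}
    for line in text.splitlines():
        if line[:2] not in known:
            raise ValueError("Unrecognised line type %r" % line[:2])
    return text

def get_sequence_silent(text):
    # Same shape as the reported code: every error becomes -1.
    try:
        return parse_record(text)
    except:  # deliberately bare, to show the problem
        return -1

def get_sequence_reporting(text):
    # Catch only the expected error, report it, then let it propagate.
    try:
        return parse_record(text)
    except ValueError as err:
        print("Parsing failed: %s" % err, file=sys.stderr)
        raise

# Made-up record containing the new PE line type.
record_text = ("ID   TEST_HUMAN\nAC   P00000;\n"
               "PE   1: Evidence at protein level;\n"
               "SQ   SEQUENCE\n     MKLV\n//")
```

With the second version, the traceback points directly at the unrecognised PE line instead of disappearing into a `-1` return value.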
From bugzilla-daemon at portal.open-bio.org Thu Jul 26 12:51:45 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 26 Jul 2007 12:51:45 -0400
Subject: [Biopython-dev] [Bug 2340] SProt.py fails to parse the current
Swiss-Prot version 54.0
In-Reply-To:
Message-ID: <200707261651.l6QGpja8025622@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2340
------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2007-07-26 12:51 EST -------
I think I have fixed this - at least your example code now works.
You'll need to update the file Bio/SwissProt/SProt.py to revision 1.38 from
CVS, which you can download here:
http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/SwissProt/SProt.py?cvsroot=biopython
Don't forget to back up the old Bio/SwissProt/SProt.py first, in case you want
to put things back.
Please test this and report back.
NOTE - The fix just makes the parser aware of the new PE line, and ignores it.
It doesn't (yet) do anything useful with the information it contains!
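The NOTE above describes the fix as making the parser recognise the PE line and then skip it. Below is a minimal sketch of that idea for a toy parser. It is not the actual SProt.py code, and `parse_minimal`, `IGNORED_LINE_TYPES` and the sample record are hypothetical. The parser keeps an explicit set of line codes that are known but not yet used, so genuinely unexpected lines still raise an error.

```python
# Line codes this sketch recognises but deliberately skips.
IGNORED_LINE_TYPES = {"PE"}

def parse_minimal(text):
    """Parse ID, AC and sequence lines from one Swiss-Prot-style record."""
    record = {"id": None, "accessions": [], "sequence": ""}
    in_sequence = False
    for line in text.splitlines():
        code, data = line[:2], line[5:].strip()
        if code == "//":
            break  # end of record
        if in_sequence:
            # Sequence lines are indented and split into blocks by spaces.
            record["sequence"] += data.replace(" ", "")
        elif code == "ID":
            record["id"] = data.split()[0]
        elif code == "AC":
            record["accessions"].extend(
                acc for acc in data.rstrip(";").split("; ") if acc)
        elif code == "SQ":
            in_sequence = True
        elif code in IGNORED_LINE_TYPES:
            continue  # known line type; its information is not used (yet)
        else:
            raise ValueError("Unrecognised line type %r" % code)
    return record

# Made-up example record.
sample = ("ID   TEST_HUMAN   Reviewed;   7 AA.\n"
          "AC   P00000; Q00000;\n"
          "PE   1: Evidence at protein level;\n"
          "SQ   SEQUENCE   7 AA;\n"
          "     MKLV TTG\n"
          "//")
```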
From bugzilla-daemon at portal.open-bio.org Fri Jul 27 02:46:35 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 27 Jul 2007 02:46:35 -0400
Subject: [Biopython-dev] [Bug 2340] SProt.py fails to parse the current
Swiss-Prot version 54.0
In-Reply-To:
Message-ID: <200707270646.l6R6kZaI001699@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2340
------- Comment #5 from gould at embl.de 2007-07-27 02:46 EST -------
(In reply to comment #4)
Yes, it has done the trick and everything works OK again. Thanks.
From bugzilla-daemon at portal.open-bio.org Fri Jul 27 03:54:14 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 27 Jul 2007 03:54:14 -0400
Subject: [Biopython-dev] [Bug 2340] SProt.py fails to parse the current
Swiss-Prot version 54.0
In-Reply-To:
Message-ID: <200707270754.l6R7sEnm007432@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2340
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|ASSIGNED |RESOLVED
Resolution| |FIXED
------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk 2007-07-27 03:54 EST -------
Great :)
From kosa at genesilico.pl Fri Jul 27 06:47:10 2007
From: kosa at genesilico.pl (Jan Kosinski)
Date: Fri, 27 Jul 2007 12:47:10 +0200
Subject: [Biopython-dev] syntax of indices for future Alignment object
Message-ID: <46A9CD2E.6080402@genesilico.pl>
Hi,
From the end user's viewpoint, we would like the Python Alignment object
to behave externally like an array, so that we could get slices, columns,
sequences, their fragments, or whatever else we want. The most intuitive and
clear syntax for the user (certainly much clearer than indices like
[xxx:yyy:zzz]) is the following.
[A:B][X:Y] - general syntax of indices. This supports almost everything.
Several examples of usage and proposed outputs:
[:][:] - returns an alignment or its copy (as Alignment object)
[:][x:y] - returns a slice of the alignment (as Alignment object; aln of
all sequences and residues corresponding to columns x to y)
[a:b][:] - returns the aln of seqs from a to b (as Alignment object)
[a:b][x:y] - returns the slice and subalignment (as Alignment object)
[a:a][x:y] - returns slice of the single sequence (residues x to y of
sequence a) (as Alignment object)
[a][x:y] - returns slice of the single sequence (residues x to y of
sequence a) (as a String)
[a:][x:y] and similar combinations - returns the slice and subalignment,
sequences from a to the last are included (as Alignment object)
[:][x] - returns single column (as a String object? string here could be
very useful)
[:][x:x] - returns single column (as Alignment object)
[a] - returns single sequence (as a SeqRecord object)
[a:a] and [a:a][:] - returns single sequence (as Alignment object)
[m][n] - returns n-th element of sequence m (as a String)
It is debatable whether different but similar sets of indices should return
different types of objects (e.g. [:][x] would return a column as a string
while [:][x:x] would return a column as an Alignment object), but in my
opinion this would just extend the usability.
The only problem is implementing such calls, which depends on
what type of object the Alignment object will be.
What do you think?
Cheers,
Jan Kosinski
Grzegorz Papaj
From bugzilla-daemon at portal.open-bio.org Fri Jul 27 08:51:10 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 27 Jul 2007 08:51:10 -0400
Subject: [Biopython-dev] [Bug 1944] Align.Generic adding iterator and more
In-Reply-To:
Message-ID: <200707271251.l6RCpAIg025706@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=1944
------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2007-07-27 08:51 EST -------
Created an attachment (id=721)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=721&action=view)
Patch for Bio/Align/Generic.py to add __getitem__ method
This patch adds a __getitem__ method, a small "mini test" when running the
module directly, and updates the doc strings. This gives SeqRecord iteration
"for free" (without an explicit __iter__ method).
As discussed on the mailing list, this allows an Alignment object to be treated
as a list of SeqRecord objects or as an array of character strings - plus
extract whole columns as strings.
Quoting the proposed __getitem__ doc string:
Depending on the indices, you can get a SeqRecord object
(representing a single row), a string (for a single column or
a single character) or another alignment (representing all or
part of the alignment).
align[r,c] gives a single character as a string
align[r] gives a SeqRecord
align[:,c] gives a column as a string
align[:] and align[:,:] give a copy of the alignment
Anything else gives a sub alignment, e.g.
align[0:2] or align[0:2,:] uses only row 0 and 1
align[:,1:3] uses only columns 1 and 2
align[0:2,1:3] uses only rows 0 & 1 and only cols 1 & 2
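The indexing rules in that doc string can be captured in a few lines. This is a toy sketch, not the attached patch. Rows are a hypothetical `Row(id, seq)` namedtuple standing in for SeqRecord. A row slice combined with an integer column (e.g. `align[0:2, 3]`) is treated like `align[:, c]` and returns a string; the doc string does not specify this case, so that behaviour is a choice made here.

```python
from collections import namedtuple

# Hypothetical stand-in for SeqRecord: an identifier plus a plain string.
Row = namedtuple("Row", ["id", "seq"])

class TinyAlignment:
    """Toy alignment following the indexing rules in the doc string."""

    def __init__(self, rows):
        self._rows = list(rows)

    def __len__(self):
        return len(self._rows)

    def __getitem__(self, index):
        if isinstance(index, int):
            # align[r] -> a single row; the IndexError raised past the end
            # also makes plain iteration work without an __iter__ method.
            return self._rows[index]
        if isinstance(index, slice):
            # align[a:b] -> sub-alignment; align[:] -> copy
            return TinyAlignment(self._rows[index])
        row_index, col_index = index
        if isinstance(row_index, int) and isinstance(col_index, int):
            return self._rows[row_index].seq[col_index]  # align[r, c]
        if isinstance(row_index, int):
            rows = [self._rows[row_index]]
        else:
            rows = self._rows[row_index]
        if isinstance(col_index, int):
            # align[:, c] -> column as a string
            return "".join(row.seq[col_index] for row in rows)
        # Anything else -> sub-alignment restricted to the selected columns
        return TinyAlignment(Row(row.id, row.seq[col_index]) for row in rows)


aln = TinyAlignment([Row("seq1", "ATCGTTGC"),
                     Row("seq2", "ATCCTTGC"),
                     Row("seq3", "ATCCGTGC")])
core = aln[:, 2:-2]  # drop two columns from each end
```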
Feedback welcome - either here, or on the developers' mailing list. Thanks
From biopython-dev at maubp.freeserve.co.uk Fri Jul 27 08:18:21 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Fri, 27 Jul 2007 13:18:21 +0100
Subject: [Biopython-dev] syntax of indices for future Alignment object
In-Reply-To: <46A9CD2E.6080402@genesilico.pl>
References: <46A9CD2E.6080402@genesilico.pl>
Message-ID: <46A9E28D.40609@maubp.freeserve.co.uk>
Jan Kosinski wrote:
> Hi,
>
> From the viewpoint of the enduser we would like python Alignment object
> to behave outside as an array so we could get slices, columns,
> sequences, their fragments, whatever we want etc. The most intuitive and
> clear (certainly much better than not very clear indexes like
> [xxx:yyy:zzz]) for the user is the following.
>
> [A:B][X:Y] - general syntax of indices. This supports almost everything.
I think Michiel and I were suggesting [A:B,X:Y] or rather [A:B:C,X:Y:Z]
to be fully general, rather than [A:B][X:Y] or [A:B:C][X:Y:Z]
i.e. [arg1, arg2] rather than [arg1][arg2]
This is an important point, as in the first case the __getitem__ method
of the alignment is called once (with both arguments). In the second
case, the __getitem__ method is called with arg1, and may return a
SeqRecord or an alignment - and this object's __getitem__ method is
called with arg2.
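This distinction is easy to check directly. In the sketch below, `Probe` is a made-up class whose `__getitem__` just records the key it receives. `obj[0:2, 1:3]` arrives as one call with a tuple of two slice objects. `obj[0:2][1:3]` is two calls, and the second call is made on whatever the first one returned: here the same object, but for an alignment it could be a SeqRecord.

```python
class Probe:
    """Records every key passed to __getitem__."""

    def __init__(self):
        self.calls = []

    def __getitem__(self, key):
        self.calls.append(key)
        return self  # return self so a chained [...] is recorded as well


one_call = Probe()
_ = one_call[0:2, 1:3]   # a single call with a tuple of two slices

two_calls = Probe()
_ = two_calls[0:2][1:3]  # two separate calls, one slice each
```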
As written, many of your cases appear to be impossible - but using the
[arg1,arg2] syntax we can get close.
I've got a working bit of code put together now which I'll attach to
bug 1944 soon.
http://bugzilla.open-bio.org/show_bug.cgi?id=1944
Peter
From kosa at genesilico.pl Fri Jul 27 10:13:24 2007
From: kosa at genesilico.pl (Jan Kosinski)
Date: Fri, 27 Jul 2007 16:13:24 +0200
Subject: [Biopython-dev] syntax of indices for future Alignment object
In-Reply-To: <46A9E28D.40609@maubp.freeserve.co.uk>
References: <46A9CD2E.6080402@genesilico.pl>
<46A9E28D.40609@maubp.freeserve.co.uk>
Message-ID: <46A9FD84.4080502@genesilico.pl>
Hi,
Ok, I agree that [A:B][X:Y] syntax is not possible here. [A:B,X:Y] is
fine. However, I would recommend not using [A:B:C,X:Y:Z] since it is not
clear while the [A:B][X:Y] syntax is clear and sufficient.
We had another discussion in the lab about whether the Alignment object
should store records not in a list but rather in a dictionary (while still
keeping information about sequence order) or similar. What is your reasoning
for making the Alignment object a list of SeqRecord objects?
One should think carefully about the design of the Alignment class, since
it will influence all further steps. As the class is still in its infancy,
now is a very good moment to think about what the Alignment class is for
and what it should support. For instance, the Alignment object should
support changing characters in the alignment without needing to copy
it (using aln[a][x] = "D"). Can this be done now, with an Alignment that is
a list of SeqRecord objects whose sequences are implemented as immutable Seq
objects?
Cheers,
Jan Kosinski
From kosa at genesilico.pl Fri Jul 27 10:35:15 2007
From: kosa at genesilico.pl (Jan Kosinski)
Date: Fri, 27 Jul 2007 16:35:15 +0200
Subject: [Biopython-dev] syntax of indices for future Alignment object
In-Reply-To: <46A9FD84.4080502@genesilico.pl>
References: <46A9CD2E.6080402@genesilico.pl>
<46A9E28D.40609@maubp.freeserve.co.uk>
<46A9FD84.4080502@genesilico.pl>
Message-ID: <46AA02A3.30000@genesilico.pl>
Hi,
Sorry for a typo ;-) Of course it should read:
"... while the [A:B,X:Y] syntax is clear and sufficient."
Cheers,
Janek
From biopython-dev at maubp.freeserve.co.uk Fri Jul 27 13:11:03 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Fri, 27 Jul 2007 18:11:03 +0100
Subject: [Biopython-dev] Improving the Alignment object
In-Reply-To: <46A9FD84.4080502@genesilico.pl>
References: <46A9CD2E.6080402@genesilico.pl> <46A9E28D.40609@maubp.freeserve.co.uk>
<46A9FD84.4080502@genesilico.pl>
Message-ID: <46AA2727.103@maubp.freeserve.co.uk>
Jan Kosinski wrote:
> We had another discussion in the lab about that Alignment object should
> not store records in the list but rather in a dictionary (but keeping
> information about sequence order ) or so. What is your reasoning for
> making Alignment object a list of SeqRecord objects?
In a sense the Bio.Align.Generic.Alignment object always was a list of
SeqRecords (if you look at the internal implementation, that is), and I
hadn't stopped to really question it. I like having list-like behaviour
and exploit this in a lot of my code dealing with alignments.
There are some nice things about having dictionary-like behaviour in an
alignment class, but unless a notional sequence order is preserved, this
breaks the array-of-characters model.
Also, using a dictionary-like alignment would force the user to specify
unique keys for each record (e.g. the record.id), which is something the
current list-like alignment does not require.
Perhaps we could have a "dictionary-like" subclass of Alignment where
the __getitem__ method would allow a record identifier in place of a row
index:
print aln["P3454"]
print aln["P3454", 20]
instead of, or as well as:
print aln[10]
print aln[10, 20]
> One should carefully think about design of the Alignment class since it
> will influence all further steps. As now the class is in its infancy
> there is a very good moment for thinking what the Alignment class is for
> and what it should support.
I had viewed the new __getitem__ method as a backwards-compatible
enhancement of the existing stable (but rather limited)
Bio.Align.Generic.Alignment class. That's not to say we can't design a new
class from scratch - I just prefer gradual improvements that don't break
existing usage.
I am particularly keen to allow slicing of alignments. For example, you
could select the conserved core of an alignment by removing the leftmost
ten columns and the rightmost ten columns:
align_core = aln[:,10:-10]
> For instance, the Alignment object should
> support changing characters in the alignment without a need of copying
> it (using aln[a,x] = "D"). Can it be done now with Alignment which is
> a list of SeqRecord objects with sequences implemented as immutable Seq
> objects ?
No, right now you can't easily edit sequences in a Bio.Align.Generic.Alignment
(even with the proposed change), as it is implemented using immutable Seq
objects. I personally haven't needed to edit an alignment like this. Is
this something you want to do often?
To me the obvious way to handle this is to have a MutableAlignment
sub-class, where editing individual elements with aln[r,c] = "D" would
be supported (possibly implemented using the MutableSeq class internally
rather than the immutable Seq class).
On a related point, I was planning to raise the following suggestion in
the future - adding alignments, like this:
combined_aln = aln1 + aln2
e.g. if aln1 had 5 rows of length 10, and aln2 had 5 rows of length 15,
then the result of aln1+aln2 would have 5 rows of length 25.
Alignment addition would only be defined for alignments with the same
number of rows (perhaps also restricted to the same sequence type, and
row weights?). The result would contain the same number of rows, where
each sequence was the concatenation of the corresponding two rows in the
input alignments. I'd suggest concatenating the record.ids (if
different); however, one could argue that it would be better to insist the
user had made sure the two alignments had consistent identifiers.
An example of where this could be used is taking alignments of multiple
sets of homologous genes, sorting them to use the same species order,
and then creating a concatenated alignment for robust phylogenetic tree
construction.
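Here is a minimal sketch of the proposed `aln1 + aln2`, assuming a hypothetical `RowAlignment` class that holds plain `(id, sequence)` string pairs rather than SeqRecords. Rows are paired by position, and identifiers are joined only when they differ; the `/` separator is an arbitrary choice here. Alphabet and weight checks are left out.

```python
class RowAlignment:
    """Hypothetical alignment of (id, sequence) string pairs."""

    def __init__(self, rows):
        self.rows = [tuple(row) for row in rows]

    def __add__(self, other):
        if not isinstance(other, RowAlignment):
            return NotImplemented
        if len(self.rows) != len(other.rows):
            raise ValueError("alignments must have the same number of rows")
        combined = []
        for (id1, seq1), (id2, seq2) in zip(self.rows, other.rows):
            # Keep a shared identifier; otherwise record both.
            new_id = id1 if id1 == id2 else "%s/%s" % (id1, id2)
            combined.append((new_id, seq1 + seq2))
        return RowAlignment(combined)


# Same species order in both gene alignments (5 rows each).
gene_a = RowAlignment([("sp%d" % i, "A" * 10) for i in range(5)])
gene_b = RowAlignment([("sp%d" % i, "C" * 15) for i in range(5)])
concatenated = gene_a + gene_b  # 5 rows of length 25
```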
Peter
From mdehoon at c2b2.columbia.edu Fri Jul 27 22:57:05 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Sat, 28 Jul 2007 11:57:05 +0900
Subject: [Biopython-dev] syntax of indices for future Alignment object
In-Reply-To: <46A9FD84.4080502@genesilico.pl>
References: <46A9CD2E.6080402@genesilico.pl> <46A9E28D.40609@maubp.freeserve.co.uk>
<46A9FD84.4080502@genesilico.pl>
Message-ID: <46AAB081.30609@c2b2.columbia.edu>
Jan Kosinski wrote:
> Hi,
>
> Ok, I agree that [A:B][X:Y] syntax is not possible here. [A:B,X:Y] is
> fine. However, I would recommend not using [A:B:C,X:Y:Z] since it is not
> clear while the [A:B][X:Y] syntax is clear and sufficient.
Python lists, tuples, and strings support [A:B:C], and Numerical Python
2D arrays support [A:B:C,X:Y:Z]. I don't see why the Alignment should
not support this format.
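Python passes a tuple of slice objects straight to `__getitem__`, so supporting steps costs nothing extra if each slice is simply forwarded to the underlying list and strings. Here is a sketch with a hypothetical `SliceableAlignment` holding plain strings. It handles only slice pairs:

```python
class SliceableAlignment:
    """Hypothetical alignment of equal-length strings."""

    def __init__(self, seqs):
        self.seqs = list(seqs)

    def __getitem__(self, index):
        if not isinstance(index, tuple):
            index = (index, slice(None))  # aln[A:B:C] means all columns
        rows, cols = index
        if not (isinstance(rows, slice) and isinstance(cols, slice)):
            raise TypeError("this sketch only handles slice pairs")
        # Steps (including negative ones) come for free from list/str slicing.
        return SliceableAlignment(seq[cols] for seq in self.seqs[rows])


aln = SliceableAlignment(["ATCGTTGC", "ATCCTTGC", "ATCCGTGC", "AACCGTGC"])
every_other = aln[::2, ::3]  # rows 0 and 2, columns 0, 3 and 6
```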
--Michiel.
From mdehoon at c2b2.columbia.edu Fri Jul 27 23:10:06 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Sat, 28 Jul 2007 12:10:06 +0900
Subject: [Biopython-dev] Improving the Alignment object
In-Reply-To: <46AA2727.103@maubp.freeserve.co.uk>
References: <46A9CD2E.6080402@genesilico.pl> <46A9E28D.40609@maubp.freeserve.co.uk> <46A9FD84.4080502@genesilico.pl>
<46AA2727.103@maubp.freeserve.co.uk>
Message-ID: <46AAB38E.50009@c2b2.columbia.edu>
Peter wrote:
> Perhaps we could have a "dictionary like" sub class of Alignment where
> the __getitem__ method would allow a record identifier in place of a row
> index:
>
> print aln["P3454"]
> print aln["P3454", 20]
>
> instead of, or as well as:
>
> print aln[10]
> print aln[10, 20]
"as well as" would break if a user decides to use an integer as a key in
the dictionary. A safer approach would be to define a method
specifically for dictionary-like access. Something like:
print aln[10]
print aln[10,20]
for list-like access, and
print aln.get("P3454")
for dictionary-like access.
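Here is a sketch of this suggestion, assuming a hypothetical `KeyedAlignment` that keeps its rows in a list (preserving order) plus an id-to-position dictionary. Square brackets are always positional, and `get()` is the only key-based lookup, so an integer record identifier can never be confused with a row index:

```python
class KeyedAlignment:
    """Hypothetical list-backed alignment with a separate id lookup."""

    def __init__(self):
        self._rows = []   # ordered (id, sequence) pairs
        self._by_id = {}  # id -> position in self._rows

    def __len__(self):
        return len(self._rows)

    def append(self, record_id, seq):
        if record_id in self._by_id:
            raise ValueError("duplicate id %r" % (record_id,))
        self._by_id[record_id] = len(self._rows)
        self._rows.append((record_id, seq))

    def __getitem__(self, index):
        # Always positional: aln[r] or aln[r, c].
        if isinstance(index, tuple):
            r, c = index
            return self._rows[r][1][c]
        return self._rows[index]

    def get(self, record_id, default=None):
        # Dictionary-like access by record identifier only.
        pos = self._by_id.get(record_id)
        return default if pos is None else self._rows[pos]


aln = KeyedAlignment()
aln.append(1, "ACGT")  # integer ids are allowed...
aln.append(0, "AGGT")  # ...without clashing with row positions
```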
--Michiel.
From mdehoon at c2b2.columbia.edu Sat Jul 28 00:11:03 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Sat, 28 Jul 2007 13:11:03 +0900
Subject: [Biopython-dev] syntax of indices for future Alignment object
In-Reply-To: <46A9E28D.40609@maubp.freeserve.co.uk>
References: <46A9CD2E.6080402@genesilico.pl>
<46A9E28D.40609@maubp.freeserve.co.uk>
Message-ID: <46AAC1D7.8030208@c2b2.columbia.edu>
Peter wrote:
> I've got a working bit of code put together now which I'll attach to
> bug 1944 soon.
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=1944
>
For the most part, I agree with the functionality in this patch. I have
three suggestions though:
>>> aln = Alignment(alphabet)
# Suggestion 1: We should allow creating an Alignment without specifying
# an alphabet
>>> aln.add_sequence("seq1", "ATCGTTGC")
>>> aln.add_sequence("seq2", "ATCCTTGC")
>>> aln.add_sequence("seq3", "ATCCGTGC")
>>> aln[0]
SeqRecord(seq=Seq('ATCGTTGC', Alphabet()), id='',
name='', description='seq1', dbxrefs=[])
# Suggestion 2: I would expect "seq1" as the id rather than the description
>>> aln[:2]
# OK
>>> aln[:,4]
'TTG'
# OK
>>> aln[2,:]
# Suggestion 3: Here, I would expect "ATCCGTGC" instead. An alignment
# consisting of a single sequence doesn't make much sense.
--Michiel.
From mdehoon at c2b2.columbia.edu Sat Jul 28 00:20:24 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Sat, 28 Jul 2007 13:20:24 +0900
Subject: [Biopython-dev] Improving the Alignment object
In-Reply-To: <46AA2727.103@maubp.freeserve.co.uk>
References: <46A9CD2E.6080402@genesilico.pl> <46A9E28D.40609@maubp.freeserve.co.uk> <46A9FD84.4080502@genesilico.pl>
<46AA2727.103@maubp.freeserve.co.uk>
Message-ID: <46AAC408.2050703@c2b2.columbia.edu>
Peter wrote:
>> For instance, the Alignment object should
>> support changing characters in the alignment without a need of copying
>> it (using aln[a,x] = "D"). Can it be done now with Alignment which is
>> a list of SeqRecord objects with sequences implemented as immutable Seq
>> objects ?
>
....
>
> To me the obvious way to handle this is to have a MutableAlignment
> sub-class, where editing individual elements with aln[r,c] = "D" would
> be supported (possibly implemented using the MutableSeq class internally
> rather than the immutable Seq class).
>
I don't think we'd need a separate MutableAlignment for that. An
Alignment is a list of sequences and is therefore mutable. If we add a
__setitem__ method to the Alignment class, then this method can take
care of constructing a new sequence and put it in the appropriate row.
--Michiel.
From biopython-dev at maubp.freeserve.co.uk Sat Jul 28 06:04:04 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Sat, 28 Jul 2007 11:04:04 +0100
Subject: [Biopython-dev] syntax of indices for future Alignment object
In-Reply-To: <46AAC1D7.8030208@c2b2.columbia.edu>
References: <46A9CD2E.6080402@genesilico.pl> <46A9E28D.40609@maubp.freeserve.co.uk>
<46AAC1D7.8030208@c2b2.columbia.edu>
Message-ID: <46AB1494.301@maubp.freeserve.co.uk>
Michiel de Hoon wrote:
> Peter wrote:
>> I've got a working bit of code put together now which I'll attach to
>> bug 1944 soon.
>>
>> http://bugzilla.open-bio.org/show_bug.cgi?id=1944
>>
> For the most part, I agree with the functionality in this patch. I have
> three suggestions though:
>
> >>> aln = Alignment(alphabet)
> # Suggestion 1: We should allow creating an Alignment without specifying
> an alphabet
That would mean changing the existing __init__ from:
def __init__(self, alphabet):
to something like:
def __init__(self, alphabet=single_letter_alphabet):
with this import statement added:
from Bio.Alphabet import single_letter_alphabet
This seems like a good idea, and shouldn't break any existing code either.
> >>> aln.add_sequence("seq1", "ATCGTTGC")
> >>> aln.add_sequence("seq2", "ATCCTTGC")
> >>> aln.add_sequence("seq3", "ATCCGTGC")
> >>> aln[0]
> SeqRecord(seq=Seq('ATCGTTGC', Alphabet()), id='',
> name='', description='seq1', dbxrefs=[])
> # Suggestion 2: I would expect "seq1" as the id rather than the description
I agree with you here - this is the historic behaviour of the
add_sequence method which actually creates a SeqRecord from the strings
it is given. I would suggest it populate the record.id but for backwards
compatibility still populate the record.description in case anyone is
still using that.
We could also add an add_record method to the alignment object which
takes a SeqRecord, plus an optional weight (and start and end?). Marc
Colosimo also made this point on bug 1944 (although I don't like his
mixed-case method name).
> >>> aln[:2]
>
> # OK
> >>> aln[:,4]
> 'TTG'
> # OK
> >>> aln[2,:]
>
> # Suggestion 3: Here, I would expect "ATCCGTGC" instead. An alignment
> consisting of a single sequence doesn't make much sense.
I'll have a closer look, but as aln[2] returns a single SeqRecord maybe
aln[2,:] should do that too - rather than returning a string.
Peter
From biopython-dev at maubp.freeserve.co.uk Sat Jul 28 09:14:43 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Sat, 28 Jul 2007 14:14:43 +0100
Subject: [Biopython-dev] Improving the Alignment object
In-Reply-To: <46AAC408.2050703@c2b2.columbia.edu>
References: <46A9CD2E.6080402@genesilico.pl> <46A9E28D.40609@maubp.freeserve.co.uk> <46A9FD84.4080502@genesilico.pl> <46AA2727.103@maubp.freeserve.co.uk>
<46AAC408.2050703@c2b2.columbia.edu>
Message-ID: <46AB4143.5070406@maubp.freeserve.co.uk>
Michiel de Hoon wrote:
> Peter wrote:
>>> For instance, the Alignment object should
>>> support changing characters in the alignment without a need of copying
>>> it (using aln[a,x] = "D"). Can it be done now with Alignment which is
>>> a list of SeqRecord objects with sequences implemented as immutable Seq
>>> objects ?
> ....
>> To me the obvious way to handle this is to have a MutableAlignment
>> sub-class, where editing individual elements with aln[r,c] = "D" would
>> be supported (possibly implemented using the MutableSeq class internally
>> rather than the immutable Seq class).
>>
> I don't think we'd need a separate MutableAlignment for that. An
> Alignment is a list of sequences and is therefore mutable. If we add a
> __setitem__ method to the Alignment class, then this method can take
> care of constructing a new sequence and put it in the appropriate row.
>
So rather than editing one character of a MutableSeq, we could replace
one immutable Seq object with a new immutable Seq object where one
character was different? That would work - sounds a little slow, but
certainly possible.
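Here is a sketch of that approach, assuming a hypothetical `EditableAlignment` whose rows are immutable Python strings standing in for Seq objects. `aln[r, c] = "D"` builds a new row with one character replaced and stores it back in the (mutable) list. Each edit takes time proportional to the row length, which is the slowness mentioned above; a MutableSeq-backed row would avoid the copy.

```python
class EditableAlignment:
    """Hypothetical alignment whose rows are immutable strings."""

    def __init__(self, seqs):
        self._seqs = list(seqs)  # the list is mutable; each str row is not

    def __getitem__(self, index):
        if isinstance(index, tuple):
            r, c = index
            return self._seqs[r][c]
        return self._seqs[index]

    def __setitem__(self, index, value):
        if not (isinstance(index, tuple) and len(index) == 2
                and all(isinstance(i, int) for i in index)):
            raise TypeError("this sketch only supports aln[r, c] = char")
        if len(value) != 1:
            raise ValueError("expected exactly one character")
        r, c = index
        old = self._seqs[r]
        c = range(len(old))[c]  # normalise negative c; IndexError if out of range
        # Replace the whole row with a new immutable string.
        self._seqs[r] = old[:c] + value + old[c + 1:]


aln = EditableAlignment(["MKV-LA", "MKVQLA"])
aln[0, 3] = "D"
```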
Peter
From mdehoon at c2b2.columbia.edu Sat Jul 28 11:15:49 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Sun, 29 Jul 2007 00:15:49 +0900
Subject: [Biopython-dev] syntax of indices for future Alignment object
In-Reply-To: <46AB1494.301@maubp.freeserve.co.uk>
References: <46A9CD2E.6080402@genesilico.pl> <46A9E28D.40609@maubp.freeserve.co.uk>
<46AAC1D7.8030208@c2b2.columbia.edu>
<46AB1494.301@maubp.freeserve.co.uk>
Message-ID: <46AB5DA5.6050604@c2b2.columbia.edu>
# Current method to add a row to the alignment:
>>> aln.add_sequence("seq1", "ATCGTTGC")
...
Peter wrote:
> We also could add an add_record method to the alignment object which
> takes a SeqRecord, plus optional weight (and start and end?). Marc
> Colosimo also made this point on bug 1944 (although I don't like his
> mixed case method name).
This is Marc Colosimo's suggestion for adding a SeqRecord:
def addSeqRecord(self, seqRec):
    """Add a Sequence Record to the Alignment
    @param seqRec: a sequence record (SeqRecord) to add.
    """
    if isinstance(seqRec, SeqRecord):
        self._records.append(seqRec)
    else:
        raise TypeError("sequence is NOT a SeqRecord Object")
Since an Alignment is essentially a list of SeqRecords, I propose that
we call the method to add a row to this list "append". In addition, this
method should be able to take a SeqRecord, a Seq object, or a plain
string. Something like this:
def append(self, sequence):
    if isinstance(sequence, SeqRecord):
        self._records.append(sequence)
    elif isinstance(sequence, Seq):
        self._records.append(SeqRecord(sequence))
    elif isinstance(sequence, str):
        self._records.append(SeqRecord(Seq(sequence)))
    else:
        raise TypeError("sequence should be a string, a Seq object, "
                        "or a SeqRecord object")
This method can be generalized to allow a descriptor, weight, start, and
end, just like in the current add_sequence method. Then we can replace
add_sequence and addSeqRecord by a single append method.
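A self-contained sketch of the type dispatch described above. To run without Biopython installed it uses hypothetical stand-in classes ToySeq, ToyRecord and ToyAlignment instead of the real Seq, SeqRecord and Alignment; their constructors here are assumptions, not Biopython's signatures.

```python
class ToySeq:
    """Hypothetical stand-in for Bio.Seq.Seq: an immutable wrapper around a string."""

    def __init__(self, data):
        self._data = str(data)

    def __str__(self):
        return self._data

    def __len__(self):
        return len(self._data)


class ToyRecord:
    """Hypothetical stand-in for Bio.SeqRecord.SeqRecord."""

    def __init__(self, seq, id=None):
        self.seq = seq
        self.id = id


class ToyAlignment:
    """Hypothetical alignment holding a list of records."""

    def __init__(self):
        self._records = []

    def append(self, sequence):
        # Accept a record as-is; wrap a sequence object or a plain string
        # in a new record first.
        if isinstance(sequence, ToyRecord):
            record = sequence
        elif isinstance(sequence, ToySeq):
            record = ToyRecord(sequence)
        elif isinstance(sequence, str):
            record = ToyRecord(ToySeq(sequence))
        else:
            raise TypeError("sequence should be a string, a Seq object, "
                            "or a SeqRecord object")
        self._records.append(record)

    def __len__(self):
        return len(self._records)

    def __getitem__(self, index):
        return self._records[index]


aln = ToyAlignment()
aln.append("ATCGTTGC")
aln.append(ToySeq("ATCGTAGC"))
aln.append(ToyRecord(ToySeq("ATGGTAGC"), id="seq3"))
print(len(aln), str(aln[0].seq), aln[2].id)  # 3 ATCGTTGC seq3
```

Building the record first and appending once keeps a single append call site, so later checks (length, alphabet) would only need to be added in one place.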
--Michiel.
From mdehoon at c2b2.columbia.edu Sat Jul 28 11:17:52 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Sun, 29 Jul 2007 00:17:52 +0900
Subject: [Biopython-dev] syntax of indices for future Alignment object
In-Reply-To: <46AB1494.301@maubp.freeserve.co.uk>
References: <46A9CD2E.6080402@genesilico.pl> <46A9E28D.40609@maubp.freeserve.co.uk>
<46AAC1D7.8030208@c2b2.columbia.edu>
<46AB1494.301@maubp.freeserve.co.uk>
Message-ID: <46AB5E20.5090605@c2b2.columbia.edu>
Peter wrote:
> Michiel de Hoon wrote:
>> >>> aln.add_sequence("seq1", "ATCGTTGC")
>> >>> aln[0]
>> SeqRecord(seq=Seq('ATCGTTGC', Alphabet()), id='',
>> name='', description='seq1', dbxrefs=[])
>> # Suggestion 2: I would expect "seq1" as the id rather than the
>> description
>
> I agree with you here - this is the historic behaviour of the
> add_sequence method which actually creates a SeqRecord from the strings
> it is given. I would suggest it populate the record.id but for backwards
> compatibility still populate the record.description in case anyone is
> still using that.
>
That sounds good to me.
--Michiel.
From mdehoon at c2b2.columbia.edu Sat Jul 28 11:23:51 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Sun, 29 Jul 2007 00:23:51 +0900
Subject: [Biopython-dev] Improving the Alignment object
In-Reply-To: <46AB4143.5070406@maubp.freeserve.co.uk>
References: <46A9CD2E.6080402@genesilico.pl> <46A9E28D.40609@maubp.freeserve.co.uk> <46A9FD84.4080502@genesilico.pl> <46AA2727.103@maubp.freeserve.co.uk>
<46AAC408.2050703@c2b2.columbia.edu>
<46AB4143.5070406@maubp.freeserve.co.uk>
Message-ID: <46AB5F87.1090506@c2b2.columbia.edu>
Peter wrote:
> Michiel de Hoon wrote:
>> Peter wrote:
>>>> For instance, the Alignment object should
>>>> support changing characters in the alignment without a need of
>>>> copying it (using aln[a,x] = "D"). Can it be done now with
>>>> Alignment which is a list of SeqRecord objects with sequences
>>>> implemented as immutable Seq objects ?
>> ....
>>> To me the obvious way to handle this is to have a MutableAlignment
>>> sub-class, where editing individual elements with aln[r,c] = "D"
>>> would be supported (possibly implemented using the MutableSeq class
>>> internally rather than the immutable Seq class).
>>>
>> I don't think we'd need a separate MutableAlignment for that. An
>> Alignment is a list of sequences and is therefore mutable. If we add a
>> __setitem__ method to the Alignment class, then this method can take
>> care of constructing a new sequence and put it in the appropriate row.
>>
> So rather than editing one character of a MutableSeq, we could replace
> one immutable Seq object with a new immutable Seq object where one
> character was different? That would work - sounds a little slow, but
> certainly possible.
>
At first, I also thought that that would be slow, especially for long
sequences. But in practice, it's surprisingly fast. Unless somebody
wants to edit an alignment of chromosome-size sequences, we probably
won't run into a speed problem.
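A minimal sketch of the approach discussed above: each row is stored as an immutable Python str (standing in for the immutable Seq), and aln[r, c] = "D" builds a new string with one character changed and puts it back into the row list. ToyAlignment and its row() helper are hypothetical; each edit copies one row, which is the cost Peter was worried about.

```python
class ToyAlignment:
    """Hypothetical alignment storing each row as an immutable string."""

    def __init__(self, rows):
        self._rows = list(rows)  # the list is mutable, the rows are not

    def __getitem__(self, index):
        row, col = index
        return self._rows[row][col]

    def __setitem__(self, index, value):
        # Only the single-character form aln[r, c] = "D" is handled here.
        row, col = index
        if len(value) != 1:
            raise ValueError("expected a single character")
        old = self._rows[row]
        if col < 0:
            col += len(old)
        if not 0 <= col < len(old):
            raise IndexError("column index out of range")
        # Strings are immutable, so build a new one and swap it into the list.
        self._rows[row] = old[:col] + value + old[col + 1:]

    def row(self, index):
        return self._rows[index]


aln = ToyAlignment(["ACDEFG", "ACDQFG"])
aln[1, 3] = "E"
print(aln.row(1))  # ACDEFG
```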
--Michiel.
From biopython-dev at maubp.freeserve.co.uk Sat Jul 28 12:00:34 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Sat, 28 Jul 2007 17:00:34 +0100
Subject: [Biopython-dev] adding rows to an alignment object
In-Reply-To: <46AB5DA5.6050604@c2b2.columbia.edu>
References: <46A9CD2E.6080402@genesilico.pl> <46A9E28D.40609@maubp.freeserve.co.uk> <46AAC1D7.8030208@c2b2.columbia.edu> <46AB1494.301@maubp.freeserve.co.uk>
<46AB5DA5.6050604@c2b2.columbia.edu>
Message-ID: <46AB6822.6090706@maubp.freeserve.co.uk>
Michiel de Hoon wrote:
> Since an Alignment is essentially a list of SeqRecords, I propose that
> we call the method to add a row to this list "append".
Sounds very sensible.
> In addition, this method should be able to take a SeqRecord, a Seq
> object, or a plain string.
Do you really think we should complicate things like this? I would just
accept SeqRecord objects (with optional start/end/weight).
> Something like this:
>
> def append(self, sequence):
>     if isinstance(sequence, SeqRecord):
>         self._records.append(sequence)
>     elif isinstance(sequence, Seq):
>         self._records.append(SeqRecord(sequence))
>     elif isinstance(sequence, str):
>         self._records.append(SeqRecord(Seq(sequence)))
>     else:
>         raise TypeError("sequence should be a string, a Seq object, "
>                         "or a SeqRecord object")
One minor point - we should use the alignment's alphabet when building a
Seq object from a string. Perhaps we should even check the alphabet when
asked to append a SeqRecord or Seq object...
> This method can be generalized to allow a descriptor, weight, start,
> end, just like in the current add_sequence method.
Where the descriptor is expected for Seq and string input, and used as
the SeqRecord's id?
I would personally check that the length matches the rest of the alignment
(something the current add_sequence method doesn't do); otherwise it's
very easy to get a malformed alignment where some sequences are longer
than others.
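A sketch of the length check suggested above, assuming plain strings as rows and a hypothetical ToyAlignment class; the alphabet check mentioned earlier is not sketched. The first row fixes the alignment length and every later row must match it.

```python
class ToyAlignment:
    """Hypothetical alignment that refuses rows of the wrong length."""

    def __init__(self):
        self._rows = []

    def alignment_length(self):
        # An empty alignment has no length constraint yet.
        return len(self._rows[0]) if self._rows else 0

    def append(self, row):
        if self._rows and len(row) != self.alignment_length():
            raise ValueError(
                "sequence length %i does not match alignment length %i"
                % (len(row), self.alignment_length()))
        self._rows.append(row)

    def __len__(self):
        return len(self._rows)


aln = ToyAlignment()
aln.append("ATCGTTGC")
aln.append("ATCG-TGC")
try:
    aln.append("ATCG")
except ValueError as err:
    print(err)  # sequence length 4 does not match alignment length 8
```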
Also, I would leave the existing .add_sequence() method in place, but
update its docstring to encourage use of .append() instead.
Peter
From biopython-dev at maubp.freeserve.co.uk Sat Jul 28 11:49:11 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Sat, 28 Jul 2007 16:49:11 +0100
Subject: [Biopython-dev] syntax of indices for future Alignment object
In-Reply-To: <46AB5E20.5090605@c2b2.columbia.edu>
References: <46A9CD2E.6080402@genesilico.pl> <46A9E28D.40609@maubp.freeserve.co.uk> <46AAC1D7.8030208@c2b2.columbia.edu> <46AB1494.301@maubp.freeserve.co.uk>
<46AB5E20.5090605@c2b2.columbia.edu>
Message-ID: <46AB6577.6050708@maubp.freeserve.co.uk>
Michiel de Hoon wrote:
> Peter wrote:
>> Michiel de Hoon wrote:
>>> >>> aln.add_sequence("seq1", "ATCGTTGC")
>>> >>> aln[0]
>>> SeqRecord(seq=Seq('ATCGTTGC', Alphabet()), id='',
>>> name='', description='seq1', dbxrefs=[])
>>> # Suggestion 2: I would expect "seq1" as the id rather than the
>>> description
>> I agree with you here - this is the historic behaviour of the
>> add_sequence method which actually creates a SeqRecord from the strings
>> it is given. I would suggest it populate the record.id but for backwards
>> compatibility still populate the record.description in case anyone is
>> still using that.
>>
> That sounds good to me.
Good. Done, CVS revision 1.6 of file Bio/Align/Generic.py
Peter
From kosa at genesilico.pl Sat Jul 28 12:53:04 2007
From: kosa at genesilico.pl (Jan Kosinski)
Date: Sat, 28 Jul 2007 18:53:04 +0200
Subject: [Biopython-dev] syntax of indices for future Alignment object
In-Reply-To: <46AAB081.30609@c2b2.columbia.edu>
References: <46A9CD2E.6080402@genesilico.pl> <46A9E28D.40609@maubp.freeserve.co.uk>
<46A9FD84.4080502@genesilico.pl> <46AAB081.30609@c2b2.columbia.edu>
Message-ID: <46AB7470.6010006@genesilico.pl>
Hi,
I just do not see what [A:B:C,X:Y:Z] adds to [A:B,X:Y] in the case of
alignments. Isn't [A:B,X:Y] sufficient?
Janek
Michiel de Hoon wrote:
> Jan Kosinski wrote:
>> Hi,
>>
>> Ok, I agree that [A:B][X:Y] syntax is not possible here. [A:B,X:Y] is
>> fine. However, I would recommend not using [A:B:C,X:Y:Z] since it is
>> not clear, whereas the [A:B][X:Y] syntax is clear and sufficient.
>
> Python lists, tuples, and strings support [A:B:C], and Numerical
> Python 2D arrays support [A:B:C,X:Y:Z]. I don't see why the Alignment
> should not support this format.
>
> --Michiel.
From kosa at genesilico.pl Sat Jul 28 12:55:33 2007
From: kosa at genesilico.pl (Jan Kosinski)
Date: Sat, 28 Jul 2007 18:55:33 +0200
Subject: [Biopython-dev] Improving the Alignment object
In-Reply-To: <46AAC408.2050703@c2b2.columbia.edu>
References: <46A9CD2E.6080402@genesilico.pl> <46A9E28D.40609@maubp.freeserve.co.uk> <46A9FD84.4080502@genesilico.pl>
<46AA2727.103@maubp.freeserve.co.uk>
<46AAC408.2050703@c2b2.columbia.edu>
Message-ID: <46AB7505.30302@genesilico.pl>
Hi,
I think the same: an alignment should be mutable, and there is no need
for two classes, one mutable and one immutable.
Janek
Michiel de Hoon wrote:
> Peter wrote:
>>> For instance, the Alignment object should
>>> support changing characters in the alignment without a need of
>>> copying it (using aln[a,x] = "D"). Can it be done now with
>>> Alignment which is a list of SeqRecord objects with sequences
>>> implemented as immutable Seq objects ?
>>
> ....
>>
>> To me the obvious way to handle this is to have a MutableAlignment
>> sub-class, where editing individual elements with aln[r,c] = "D"
>> would be supported (possibly implemented using the MutableSeq class
>> internally rather than the immutable Seq class).
>>
> I don't think we'd need a separate MutableAlignment for that. An
> Alignment is a list of sequences and is therefore mutable. If we add a
> __setitem__ method to the Alignment class, then this method can take
> care of constructing a new sequence and put it in the appropriate row.
>
> --Michiel.
From mdehoon at c2b2.columbia.edu Sun Jul 29 00:38:28 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Sun, 29 Jul 2007 13:38:28 +0900
Subject: [Biopython-dev] syntax of indices for future Alignment object
In-Reply-To: <46AB7470.6010006@genesilico.pl>
References: <46A9CD2E.6080402@genesilico.pl> <46A9E28D.40609@maubp.freeserve.co.uk>
<46A9FD84.4080502@genesilico.pl> <46AAB081.30609@c2b2.columbia.edu>
<46AB7470.6010006@genesilico.pl>
Message-ID: <46AC19C4.1000102@c2b2.columbia.edu>
Jan Kosinski wrote:
> I just do not see what [A:B:C,X:Y:Z] adds to [A:B,X:Y] in the case of
> alignments. Isn't [A:B,X:Y] sufficient?
>
[A:B,X:Y] may be sufficient, but it does not agree with Python indexing
of other objects (lists, tuples, strings). In addition, since allowing only
[A:B,X:Y] differs from usual Python usage, we'd actually end up writing
more code to specifically disallow [A:B:C,X:Y:Z].
Note also that [A:B:C,X:Y:Z] includes [A:B,X:Y] as a special case. So if
the Alignment class were written to handle [A:B:C,X:Y:Z] but I told you
that it expects [A:B,X:Y], you wouldn't notice any difference until you
tried [A:B:C,X:Y:Z] and found out that it works too.
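A small sketch of this point, using a hypothetical ToyAlignment whose rows are plain strings: because Python's own list and str slicing already understand the optional step, [A:B,X:Y] and [A:B:C,X:Y:Z] go through exactly the same code, and rejecting the step form would need extra code.

```python
class ToyAlignment:
    """Hypothetical alignment indexed as aln[rows, columns]."""

    def __init__(self, rows):
        self._rows = list(rows)

    def __getitem__(self, index):
        rows, cols = index
        if not isinstance(rows, slice) or not isinstance(cols, slice):
            raise TypeError("this sketch only handles aln[A:B:C, X:Y:Z]")
        # List slicing picks the rows, str slicing picks the columns;
        # a step of None (the [A:B,X:Y] case) is just the default.
        return ToyAlignment(row[cols] for row in self._rows[rows])

    def rows(self):
        return list(self._rows)


aln = ToyAlignment(["ACGTAC", "CGTACG", "GTACGT", "TACGTA"])
print(aln[1:3, 0:4].rows())  # ['CGTA', 'GTAC']
print(aln[::2, ::3].rows())  # ['AT', 'GC']
```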
--Michiel.
From mdehoon at c2b2.columbia.edu Tue Jul 31 21:50:05 2007
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Tue, 31 Jul 2007 21:50:05 -0400
Subject: [Biopython-dev] Improving the Alignment object
References: <46A9CD2E.6080402@genesilico.pl> <46A9E28D.40609@maubp.freeserve.co.uk> <46A9FD84.4080502@genesilico.pl>
<46AA2727.103@maubp.freeserve.co.uk>
<46AAC408.2050703@c2b2.columbia.edu>
Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B5FD@mail2.exch.c2b2.columbia.edu>
Peter wrote:
> I'm not sure if requests for part of a single row or column like
> [rrr, xxx:yyy:zzz] and [rrr:ppp:qqq, xxx] are best handled by returning
> sub-alignments or as special cases (strings/Seq and Seq/SeqRecord
> respectively?).
Jan wrote:
> For instance, the Alignment object should
> support changing characters in the alignment without a need of copying
> it (using aln[a,x] = "D"). Can it be done now with Alignment which is
> a list of SeqRecord objects with sequences implemented as immutable Seq
> objects ?
>
If we allow
>>> aln[a,x] = "D"
then we should also allow
>>> aln[a,x:x+4] = "DEFG"
>>> aln[a:a+5,x] = "KLMNO"
and perhaps even
>>> aln[a:a+5,x:x+3] = ["KLMNO","PQRST","UVWXY"]
For consistency, I feel that aln[a,x:y] and aln[a:b,x] should then both
return a string.
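A sketch of the return types proposed here, again assuming a hypothetical ToyAlignment with plain-string rows: an integer row with a column slice returns part of that row, and a row slice with an integer column returns part of that column, both as strings. The two-slice case (a sub-alignment) is left out.

```python
class ToyAlignment:
    """Hypothetical alignment where mixed int/slice indices return strings."""

    def __init__(self, rows):
        self._rows = list(rows)

    def __getitem__(self, index):
        rows, cols = index
        if isinstance(rows, int):
            # aln[a, x:y] (or aln[a, x]): part of one row
            return self._rows[rows][cols]
        if isinstance(cols, int):
            # aln[a:b, x]: part of one column, read top to bottom
            return "".join(row[cols] for row in self._rows[rows])
        raise TypeError("aln[A:B, X:Y] would return a sub-alignment; "
                        "not sketched here")


aln = ToyAlignment(["ACGTAC", "CGTACG", "GTACGT"])
print(aln[0, 1:4])  # CGT
print(aln[0:3, 2])  # GTA
```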
--Michiel
Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032
From bugzilla-daemon at portal.open-bio.org Sun Jul 1 02:55:31 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 30 Jun 2007 22:55:31 -0400
Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64
In-Reply-To:
Message-ID: <200707010255.l612tVwN022655@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2323
mdehoon at ims.u-tokyo.ac.jp changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |FIXED
------- Comment #15 from mdehoon at ims.u-tokyo.ac.jp 2007-06-30 22:55 EST -------
I've added the functions gcg and seguid to Bio/SeqUtils/CheckSum.py.
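For reference, a SEGUID-style checksum can be sketched with the standard library alone, assuming the usual definition (Base64 encoding of the SHA-1 digest of the upper-cased sequence, with the trailing "=" padding removed). This is an illustration, not the code committed to Bio/SeqUtils/CheckSum.py.

```python
import base64
import hashlib


def seguid(seq):
    """SEGUID-style checksum of a sequence string (sketch)."""
    # Upper-case first so that "acgt" and "ACGT" give the same checksum.
    digest = hashlib.sha1(seq.upper().encode("ascii")).digest()
    # A 20-byte digest encodes to 28 Base64 characters ending in one "=".
    return base64.b64encode(digest).decode("ascii").rstrip("=")


print(seguid("ACGT"))
```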
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Sun Jul 1 03:23:02 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 30 Jun 2007 23:23:02 -0400
Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64
In-Reply-To:
Message-ID: <200707010323.l613N24V023919@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2323
------- Comment #16 from sbassi at gmail.com 2007-06-30 23:23 EST -------
(In reply to comment #15)
> I've added the functions gcg and seguid to Bio/SeqUtils/CheckSum.py.
>
This code won't run on Python 2.3:
=============================================
sbassi at hp:~/bioinfo$ python
Python 2.3.4 (#2, Jun 16 2005, 18:52:31)
[GCC 3.3.5 (Debian 1:3.3.5-3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import CheckSum
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "CheckSum.py", line 50
    return sum(n*ord(c.upper()) for (n,c) in izip(cycle(range(1,58)),seq)) % 10000
^
SyntaxError: invalid syntax
==========================================
That is why I made a separate module for Python 2.4+
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Sun Jul 1 05:54:55 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 1 Jul 2007 01:54:55 -0400
Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64
In-Reply-To:
Message-ID: <200707010554.l615stgK032500@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2323
mdehoon at ims.u-tokyo.ac.jp changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|RESOLVED |REOPENED
Resolution|FIXED |
------- Comment #17 from mdehoon at ims.u-tokyo.ac.jp 2007-07-01 01:54 EST -------
Sorry for the mistake.
Even with the Python >= 2.4 code in a separate module, we still get an error
message when installing Biopython, because Python attempts to byte-compile
every module. This is not serious, since the error is otherwise ignored.
However, how about this code for Python >= 2.4:
from itertools import cycle, imap
return sum(imap(lambda n,c: n*ord(c.upper()), cycle(range(1,58)),seq)) % 10000
It is almost as fast as the code you now have for Python >= 2.4, but avoids
having to create a separate module gcg24.py.
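itertools.imap and itertools.izip no longer exist in Python 3, so for readers on a current Python here is a sketch of the two styles weighed in this thread: the cycle/zip one-liner and an explicit loop that walks seq without copying it. Both weight position i by 1, 2, ..., 57 (then restart at 1), multiply by the character code of the upper-cased residue, and reduce modulo 10000. The function names are illustrative.

```python
from itertools import cycle


def gcg_cycle(seq):
    # Python 3 spelling of the one-liner: zip replaces izip/imap.
    return sum(n * ord(c.upper()) for n, c in zip(cycle(range(1, 58)), seq)) % 10000


def gcg_loop(seq):
    # Explicit loop with the same weighting; seq is iterated, not copied.
    index = checksum = 0
    for char in seq:
        index += 1
        checksum += index * ord(char.upper())
        if index == 57:
            index = 0
    return checksum % 10000


print(gcg_cycle("ACGT"), gcg_loop("ACGT"))  # 748 748
```

Which one is faster depends on the interpreter; the thread reports the loop style winning on the Python versions of the time, and that is worth re-measuring rather than assuming.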
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Sun Jul 1 11:02:47 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 1 Jul 2007 07:02:47 -0400
Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64
In-Reply-To:
Message-ID: <200707011102.l61B2lHg029279@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2323
------- Comment #18 from mdehoon at ims.u-tokyo.ac.jp 2007-07-01 07:02 EST -------
Btw, I am finding that the code for Python < 2.3 is faster than the code for
Python >= 2.4. The former uses more memory, as it makes a copy of seq, but even
if we avoid copying seq, I still find that it is faster than the code for
Python >= 2.4.
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From mdehoon at c2b2.columbia.edu Sun Jul 1 12:01:00 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Sun, 01 Jul 2007 21:01:00 +0900
Subject: [Biopython-dev] TempFastaWriter,
TempFastaWriterSingle in Bio/GFF/easy.py
In-Reply-To: <320fb6e00706301214p41c33329o98126738d77fad19@mail.gmail.com>
References: <4685FCCA.4090904@c2b2.columbia.edu>
<320fb6e00706301214p41c33329o98126738d77fad19@mail.gmail.com>
Message-ID: <4687977C.70903@c2b2.columbia.edu>
Peter wrote:
>> Is anybody using the classes TempFastaWriter, TempFastaWriterSingle in
>> Bio/GFF/easy.py? They are currently using the old Fasta writer in
>> Bio.SeqIO.FASTA instead of the new one in Bio.SeqIO.FastaIO. We can
>> either update them to use the new Fasta writer, or simply remove them,
>> since currently these classes are not used anywhere in Biopython.
>
> This is for Bug 2284 right?
> http://bugzilla.open-bio.org/show_bug.cgi?id=2284
>
> I'm inclined to remove classes TempFastaWriter and TempFastaWriterSingle
>
Actually I hadn't noticed bug 2284. I looked into this because the
Biopython tests are causing DeprecationWarnings. If no users of these
classes step forward, I am in favor of removing them.
--Michiel.
From bugzilla-daemon at portal.open-bio.org Sun Jul 1 14:13:29 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 1 Jul 2007 10:13:29 -0400
Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64
In-Reply-To:
Message-ID: <200707011413.l61EDTF3012907@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2323
------- Comment #19 from sbassi at gmail.com 2007-07-01 10:13 EST -------
(In reply to comment #18)
> Btw, I am finding that the code for Python < 2.3 is faster than the code for
> Python >= 2.4. The former uses more memory, as it makes a copy of seq, but even
> if we avoid copying seq, I still find that it is faster than the code for
> Python >= 2.4.
OK, so leave it w/o the check for python version and use just the 2.3 code.
Best,
SB.
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Sun Jul 1 22:38:55 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 1 Jul 2007 18:38:55 -0400
Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64
In-Reply-To:
Message-ID: <200707012238.l61Mct1k007379@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2323
mdehoon at ims.u-tokyo.ac.jp changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|REOPENED |RESOLVED
Resolution| |FIXED
------- Comment #20 from mdehoon at ims.u-tokyo.ac.jp 2007-07-01 18:38 EST -------
Updated in CVS, using the 2.3 code without copying seq.
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Sun Jul 1 23:42:14 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 1 Jul 2007 19:42:14 -0400
Subject: [Biopython-dev] [Bug 2327] New: test_Cluster takes too long
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2327
Summary: test_Cluster takes too long
Product: Biopython
Version: 1.43
Platform: Other
OS/Version: Linux
Status: NEW
Severity: normal
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: idoerg at burnham.org
When running the biopython test suite, test_Cluster takes too long. I gave up
after 2 minutes.
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Sun Jul 1 23:55:34 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 1 Jul 2007 19:55:34 -0400
Subject: [Biopython-dev] [Bug 2327] test_Cluster takes too long
In-Reply-To:
Message-ID: <200707012355.l61NtYcR012177@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2327
mdehoon at ims.u-tokyo.ac.jp changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |DUPLICATE
------- Comment #1 from mdehoon at ims.u-tokyo.ac.jp 2007-07-01 19:55 EST -------
*** This bug has been marked as a duplicate of bug 2268 ***
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Sun Jul 1 23:55:36 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 1 Jul 2007 19:55:36 -0400
Subject: [Biopython-dev] [Bug 2268] Cluster unit test suite runs indefinitely
In-Reply-To:
Message-ID: <200707012355.l61NtaNW012196@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2268
mdehoon at ims.u-tokyo.ac.jp changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |idoerg at gmail.com
------- Comment #22 from mdehoon at ims.u-tokyo.ac.jp 2007-07-01 19:55 EST -------
*** Bug 2327 has been marked as a duplicate of this bug. ***
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Tue Jul 3 11:03:40 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 3 Jul 2007 07:03:40 -0400
Subject: [Biopython-dev] [Bug 2328] New: NCBIStandalone.blastall chokes on
integer argument
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2328
Summary: NCBIStandalone.blastall chokes on integer argument
Product: Biopython
Version: 1.43
Platform: PC
OS/Version: Linux
Status: NEW
Severity: normal
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: grunberg at embl.de
CC: grunberg at embl.de
Unlike previous versions, the current NCBIStandalone.blastall and blastpgp
expect that the argument align_view is given as a string rather than an
integer. So the following call worked with previous versions but now fails::
results, err = NCBIStandalone.blastall(settings.blast_bin,
                                       method, db, seqFile,
                                       expectation=e,
                                       align_view=7,  ## XML output
                                       **kw)
The error is raised here::
NCBIStandalone: 1788 (blastall)
w, r, e = os.popen3(" ".join([blastcmd] + params))
because align_view, unlike the other parameters, is not converted with str()
in this line::
params.extend([att2param['align_view'], align_view])
This line should rather look like this::
params.extend([att2param['align_view'], str(align_view)])
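The failure and the fix can be reproduced without BLAST installed. The sketch below uses a hypothetical build_command helper (not the NCBIStandalone code) that passes every option value through str() before joining; the flags -p, -e and -m are blastall's program, expectation and alignment-view options, and -m 7 requests XML.

```python
def build_command(blastcmd, options):
    """Hypothetical helper: turn {flag: value} pairs into a command line.

    Every value goes through str(), so align_view=7 works as well as "7".
    """
    params = []
    for flag, value in options.items():
        if value is None:
            continue  # option not given; leave it to blastall's default
        params.extend([flag, str(value)])
    return " ".join([blastcmd] + params)


# Without str(), joining an integer fails with TypeError:
try:
    " ".join(["blastall", "-m", 7])
except TypeError as err:
    print("without str():", err)

print(build_command("blastall", {"-p": "blastn", "-e": 0.001, "-m": 7}))
# blastall -p blastn -e 0.001 -m 7
```

Converting every value in one place, rather than only the options that happen to be strings, avoids the same bug reappearing for the next integer option.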
I am going to attach a patch to this bugreport.
Greetings,
Raik
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Tue Jul 3 11:05:37 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 3 Jul 2007 07:05:37 -0400
Subject: [Biopython-dev] [Bug 2328] NCBIStandalone.blastall chokes on
integer argument
In-Reply-To:
Message-ID: <200707031105.l63B5bAP013190@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2328
------- Comment #1 from grunberg at embl.de 2007-07-03 07:05 EST -------
Created an attachment (id=698)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=698&action=view)
patch for Bug 2328 (NCBIStandalone.blastall / blastpgp)
The patch is described in my bug report.
Cheers,
Raik
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Tue Jul 3 23:26:15 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 3 Jul 2007 19:26:15 -0400
Subject: [Biopython-dev] [Bug 2328] NCBIStandalone.blastall chokes on
integer argument
In-Reply-To:
Message-ID: <200707032326.l63NQFBB022873@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2328
------- Comment #2 from mdehoon at ims.u-tokyo.ac.jp 2007-07-03 19:26 EST -------
> Unlike previous versions, the current NCBIStandalone.blastall and blastpgp
> expect that the argument align_view is given as a string rather than an
> integer. So the following call worked with previous versions but now fails::
In which previous version of Biopython did this work? Your patch looks fine,
but I'd like to find out how this bug entered Biopython.
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Thu Jul 5 13:30:32 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 5 Jul 2007 09:30:32 -0400
Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64
In-Reply-To:
Message-ID: <200707051330.l65DUW2k004459@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2323
------- Comment #21 from dalloliogm at gmail.com 2007-07-05 09:30 EST -------
(In reply to comment #1)
> Created an attachment (id=689)
> --> (http://bugzilla.open-bio.org/attachment.cgi?id=689&action=view)
> Proposed functions (CRC64 and GCG checksum)
>
> This could be in utils.py, but I am not sure.
Maybe it would be useful to add a 'GCG checksum' attribute to the Biopython Seq
object.
Checksums could be used to quickly check whether two sequences are the same,
but the documentation should state very clearly that two sequences which
differ by even a single symbol (e.g. AAANAAA and AAAAAAA) have different
values.
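A sketch of how a checksum could serve as a quick test, assuming the GCG checksum as discussed in this bug. Different checksums prove the sequences differ; equal checksums only suggest equality, so the hypothetical same_sequence helper then compares the strings themselves. Note that the checksum upper-cases each character, so it cannot tell "acgt" from "ACGT"; the helper compares case-insensitively to match.

```python
from itertools import cycle


def gcg(seq):
    # Position weights 1..57 (repeating) times upper-cased character codes, mod 10000.
    return sum(n * ord(c.upper()) for n, c in zip(cycle(range(1, 58)), seq)) % 10000


def same_sequence(a, b):
    """Hypothetical helper: cheap checksum test first, full comparison second."""
    if gcg(a) != gcg(b):
        return False  # a checksum mismatch is conclusive
    return a.upper() == b.upper()  # a match is only a hint, so confirm it


print(gcg("AAANAAA"), gcg("AAAAAAA"))  # 1872 1820
```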
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Sat Jul 7 09:28:56 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 7 Jul 2007 05:28:56 -0400
Subject: [Biopython-dev] [Bug 2328] NCBIStandalone.blastall chokes on
integer argument
In-Reply-To:
Message-ID: <200707070928.l679SuTJ010432@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2328
------- Comment #3 from grunberg at embl.de 2007-07-07 05:28 EST -------
(In reply to comment #2)
> > Unlike previous versions, the current NCBIStandalone.blastall and blastpgp
> > expect that the argument align_view is given as a string rather than an
> > integer. So the following call worked with previous versions but now fails::
>
> In which previous version of Biopython did this work? Your patch looks fine,
> but I'd like to find out how this bug entered Biopython.
>
Sorry about the late reply... My previous Biopython installation (which didn't
have the glitch) was version 1.42.
Greetings
Raik
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Sun Jul 8 04:20:12 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 8 Jul 2007 00:20:12 -0400
Subject: [Biopython-dev] [Bug 2328] NCBIStandalone.blastall chokes on
integer argument
In-Reply-To:
Message-ID: <200707080420.l684KCSq031646@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2328
mdehoon at ims.u-tokyo.ac.jp changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |FIXED
------- Comment #4 from mdehoon at ims.u-tokyo.ac.jp 2007-07-08 00:20 EST -------
Fixed in CVS (see biopython/Bio/Blast/NCBIStandalone.py revision 1.68).
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From chengsoon.ong at tuebingen.mpg.de Mon Jul 9 10:15:50 2007
From: chengsoon.ong at tuebingen.mpg.de (Cheng Soon Ong)
Date: Mon, 9 Jul 2007 12:15:50 +0200
Subject: [Biopython-dev] Bio.Blast.NCBIWWW.qblast
Message-ID:
Hi,
I've just written a small extension to the qblast function. The
current version only passes a subset of parameters to NCBI; my code
passes all the parameters that the qblast API at NCBI accepts.
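Cheng's code is not shown here, but one way to pass through every parameter is to accept arbitrary keyword arguments and URL-encode them. The sketch below is hypothetical: build_qblast_query is an illustrative name, and CMD, PROGRAM, DATABASE, QUERY and EXPECT are assumed to be NCBI BLAST URL API parameter names. Options set to None are dropped so that NCBI's defaults apply.

```python
from urllib.parse import urlencode


def build_qblast_query(program, database, sequence, **extra):
    """Hypothetical helper: build the body of a BLAST URL API 'Put' request.

    Any keyword in extra is passed through unchanged, so a new NCBI
    parameter needs no code change; options set to None are omitted.
    """
    params = {"CMD": "Put", "PROGRAM": program, "DATABASE": database,
              "QUERY": sequence}
    params.update({key: value for key, value in extra.items() if value is not None})
    return urlencode(params)


print(build_qblast_query("blastn", "nr", "ACGTACGT", EXPECT=10, FILTER=None))
# CMD=Put&PROGRAM=blastn&DATABASE=nr&QUERY=ACGTACGT&EXPECT=10
```

The trade-off of a pure pass-through is that misspelled parameter names are sent to NCBI silently instead of being caught locally.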
Is anyone interested in merging this into the blast module of
Biopython? Sorry, I do not know the protocol here for getting code
into Biopython.
Cheng
From mdehoon at c2b2.columbia.edu Mon Jul 9 11:40:23 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Mon, 09 Jul 2007 20:40:23 +0900
Subject: [Biopython-dev] Bio.Blast.NCBIWWW.qblast
In-Reply-To:
References:
Message-ID: <46921EA7.2080106@c2b2.columbia.edu>
Dear Cheng,
Thank you for your contribution.
The "official" way to contribute code to Biopython is to go to
http://bugzilla.open-bio.org/, open a new bug report, and attach
your code to it.
For your qblast code, you can also just send it to me (not to the list),
then I can merge it into Biopython.
--Michiel.
Cheng Soon Ong wrote:
> Hi,
>
> I've just written a small extension to the qblast function. The
> current version of only passes a subset of parameters to NCBI. I've
> just written some code such that it passes all the parameters that
> the qblast API at NCBI accepts.
>
> Is anyone interested to merge this into the blast module of
> Biopython? Sorry, I do not know the protocol here for getting code
> into Biopython.
>
From biopython-dev at maubp.freeserve.co.uk Tue Jul 10 19:31:55 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Tue, 10 Jul 2007 20:31:55 +0100
Subject: [Biopython-dev] FDist: more Population Genetics code
In-Reply-To: <6d941f120701030516m1adb3daeh6e4645121ba8679d@mail.gmail.com>
References: <6d941f120701030516m1adb3daeh6e4645121ba8679d@mail.gmail.com>
Message-ID: <4693DEAB.8000900@maubp.freeserve.co.uk>
Hi Tiago,
Have you had any feedback (off the mailing list)?
Ralph - did you have a chance to look over Tiago's code or discuss this
with him?
It would be a shame if nothing came from this...
Peter
Tiago Antão wrote:
> Hi!
>
> I have submitted another enhancement bug, with support for FDist. It
> allows generating and parsing FDist files and controlling fdist
> applications. There are also a couple of utility functions. FDist is a
> niche application (mainly used to detect selection in animal
> genetics). Not the most fundamental one to support, but it is
> currently one that I am working on, thus, the code.
>
> Regarding my submitted code for GenePop, I have submitted a different
> version on bugzilla. The main difference is that I moved everything
> from Bio to Bio.PopGen.
>
> Before I continue putting code on bugzilla I would like to know if it
> is worthwhile doing it... Any opinions on the code submitted or if any
> changes are required? I would really like to continue converting my
> code to BioPython, but only if it has any possibility of ending up
> being useful/included in distribution somewhere in the future... ;)
>
> I am currently working on code related to SimCoal2, Arlequin and
> general statistics (Fst, heterozygosity, ...). Which will probably be
> ready quite soon (ie, next two weeks). This is more mainstream than
> FDist
>
> I have some other code lying around mainly related to HapMap, but I
> will only submit it after reviewing and reusing it again. This is more
> distant future ... like a couple of months.
>
> Tiago
From biopython-dev at maubp.freeserve.co.uk Tue Jul 10 21:12:44 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Tue, 10 Jul 2007 22:12:44 +0100
Subject: [Biopython-dev] FDist: more Population Genetics code
In-Reply-To:
References: <6d941f120701030516m1adb3daeh6e4645121ba8679d@mail.gmail.com>
<4693DEAB.8000900@maubp.freeserve.co.uk>
Message-ID: <4693F64C.9050403@maubp.freeserve.co.uk>
Ralph Haygood wrote:
> Peter,
>
> I haven't received any code from Tiago to review.
>
> Ralph
He's put some on Bugzilla:
http://bugzilla.open-bio.org/show_bug.cgi?id=2170
Peter
From rhaygood at duke.edu Wed Jul 11 03:45:56 2007
From: rhaygood at duke.edu (Ralph Haygood)
Date: Tue, 10 Jul 2007 23:45:56 -0400 (EDT)
Subject: [Biopython-dev] FDist: more Population Genetics code
In-Reply-To: <4693DEAB.8000900@maubp.freeserve.co.uk>
References: <6d941f120701030516m1adb3daeh6e4645121ba8679d@mail.gmail.com>
<4693DEAB.8000900@maubp.freeserve.co.uk>
Message-ID:
Peter and Tiago,
Hello. No, I haven't done anything with Tiago's code. I'm afraid
it's pretty far from what I'm working on these days.
I still think it would be good for BioPython to include methods for
computing basic population-genetical statistics (Watterson's theta,
Tajima's D, etc.) from DNA alignments. I have in mind something like
BioPerl's PopGen (http://www.bioperl.org/wiki/HOWTO:PopGen). My own
code is easy to use with a Bio.Align.Generic.Alignment, but it doesn't
conform to BioPython's standards for style, testing, or documentation,
and I don't know when I'll have time to standardize it.
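The sketch below is a rough illustration of the kind of statistic mentioned here. It is written in modern Python 3 and is neither Ralph's code nor a Biopython API. It computes Watterson's estimator, theta_W = S / a_n, where S is the number of segregating sites in an alignment of n sequences and a_n = 1 + 1/2 + ... + 1/(n-1).

It makes two assumptions. First, the sequences are equal-length aligned strings. Second, gaps and ambiguity codes are simply ignored within a column. The function names are hypothetical.

```python
def segregating_sites(seqs):
    """Count alignment columns showing more than one distinct nucleotide.

    Characters other than A, C, G, T (gaps, N, ...) are ignored within a column.
    """
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("sequences must be aligned (equal length)")
    count = 0
    for column in zip(*seqs):
        bases = {c.upper() for c in column if c.upper() in "ACGT"}
        if len(bases) > 1:
            count += 1
    return count


def watterson_theta(seqs):
    """Watterson's estimator: S / a_n, with a_n = sum of 1/i for i = 1 .. n-1."""
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    a_n = sum(1.0 / i for i in range(1, n))
    return segregating_sites(seqs) / a_n


alignment = ["ACGTA", "ACGTT", "ACCTA", "ACGTA"]
print(segregating_sites(alignment))  # 2 (third and fifth columns)
print(watterson_theta(alignment))    # 2 / (1 + 1/2 + 1/3)
```

This gives theta per sequence. Dividing by the number of aligned sites would give a per-site estimate.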
Ralph
From tiagoantao at gmail.com Wed Jul 11 10:05:21 2007
From: tiagoantao at gmail.com (Tiago Antão)
Date: Wed, 11 Jul 2007 12:05:21 +0200
Subject: [Biopython-dev] FDist: more Population Genetics code
In-Reply-To:
References: <6d941f120701030516m1adb3daeh6e4645121ba8679d@mail.gmail.com>
<4693DEAB.8000900@maubp.freeserve.co.uk>
Message-ID: <6d941f120707110305n75c75e77y7426280477cf23ca@mail.gmail.com>
Hi,
I had no feedback and it seemed that there was no interest, so I
decided to start a Python Population Genetics project on Google, which
is progressing, but is still at an alpha stage:
http://code.google.com/p/pypopgen/
I am doing this on a personal basis for now (I did not even announce
it anywhere), so it is advancing at my personal pace and is designed
according to my needs.
I have already used it (or a tiny part of it) in a published
application ( http://popgen.eu/soft/m4s2 ).
I am still willing to integrate this into BioPython, but for that some
interest and feedback would be needed... That would have to happen
fairly soon, as the code will have to be adapted to BioPython
standards and namespace; once there is a lot of code, that will be
difficult in practice (and after going public it will really be
impossible).
The "strangest" code that I am writing (and that would need more
discussion) is for asynchronous computation (to make it easy to use on
multicore computers and grids).
Regards,
Tiago
From bugzilla-daemon at portal.open-bio.org Fri Jul 13 11:08:07 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 13 Jul 2007 07:08:07 -0400
Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64
In-Reply-To:
Message-ID: <200707131108.l6DB87xm027778@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2323
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |biopython-bugzilla at maubp.freeserve.co.uk
Status|RESOLVED |REOPENED
Resolution|FIXED |
------- Comment #22 from biopython-bugzilla at maubp.freeserve.co.uk 2007-07-13 07:08 EST -------
I've started to write a test case for the code now in Bio/SeqUtils/CheckSum.py
and noticed that while crc64, gcg and seguid will cope with both strings and
Seq objects, crc32 will only cope with strings.
Any objections to me fixing this like so:
Old:
from binascii import crc32
New:
from binascii import crc32 as _crc32

def crc32(seq) :
    """Returns the crc32 checksum for a sequence (string or Seq object)"""
    try :
        #Assume it's a Seq object
        return _crc32(seq.tostring())
    except AttributeError :
        #Assume it's a string
        return _crc32(seq)
--
Peter
From bugzilla-daemon at portal.open-bio.org Fri Jul 13 11:18:30 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 13 Jul 2007 07:18:30 -0400
Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64
In-Reply-To:
Message-ID: <200707131118.l6DBIUOS028425@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2323
------- Comment #23 from biopython-bugzilla at maubp.freeserve.co.uk 2007-07-13 07:18 EST -------
Created an attachment (id=703)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=703&action=view)
Initial unit test for Bio/SeqUtils/CheckSum
If the crc32 function could accept a Seq object then the "try/except" at the
end wouldn't be needed.
From bugzilla-daemon at portal.open-bio.org Fri Jul 13 14:38:52 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 13 Jul 2007 10:38:52 -0400
Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64
In-Reply-To:
Message-ID: <200707131438.l6DEcqn5008339@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2323
------- Comment #24 from mdehoon at ims.u-tokyo.ac.jp 2007-07-13 10:38 EST -------
A better solution would be for Seq to inherit from str, instead of Seq having
str as a member. Then we don't have to modify crc32, and other code in
Biopython will also become simpler.
From bugzilla-daemon at portal.open-bio.org Fri Jul 13 15:17:59 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 13 Jul 2007 11:17:59 -0400
Subject: [Biopython-dev] [Bug 2269] Shebang (hashbang) lines need cleanup
In-Reply-To:
Message-ID: <200707131517.l6DFHxY4010647@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2269
------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2007-07-13 11:17 EST -------
I have just fixed a few in CVS; here is a list of the remaining abnormal
shebang/hashbang lines:
biopython/Bio/EUtils/POM.py '#!/usr/bin/python -i\n'
biopython/Bio/EUtils/DTDs/LinkOut.py '#!/usr/bin/python\n'
biopython/Bio/EUtils/DTDs/__init__.py '#!/usr/bin/python\n'
biopython/Bio/EUtils/DTDs/eInfo_020511.py '#!/usr/bin/python\n'
biopython/Bio/EUtils/DTDs/eLink_020511.py '#!/usr/bin/python\n'
biopython/Bio/EUtils/DTDs/ePost_020511.py '#!/usr/bin/python\n'
biopython/Bio/EUtils/DTDs/eSearch_020511.py '#!/usr/bin/python\n'
biopython/Bio/EUtils/DTDs/eSummary_020511.py '#!/usr/bin/python\n'
The biopython/Bio/EUtils/*.py examples are interesting in that many of those
files are autogenerated from DTD files (using the dtd2py.py script I think -
but it doesn't seem to work on all of them).
Also, I don't think all the files under Bio/Restriction/*.py need a shebang,
and a sizeable proportion of the unit tests have shebangs (though fewer than half).
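A listing like the one above can be produced with a short script. This is a minimal sketch in modern Python 3, not the script actually used. It assumes that "#!/usr/bin/env python" is the preferred form, since the comment does not say which form counts as normal. The function name is hypothetical.

```python
import os

# Assumed "normal" shebang; adjust to whatever the project settles on
EXPECTED = "#!/usr/bin/env python"


def find_abnormal_shebangs(root):
    """Yield (path, first_line) for .py files whose shebang differs from EXPECTED."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(".py"):
                continue
            path = os.path.join(dirpath, name)
            # latin-1 never fails to decode, so odd bytes cannot crash the scan
            with open(path, encoding="latin-1") as handle:
                first = handle.readline()
            # Files without a "#!" first line are skipped, not reported
            if first.startswith("#!") and first.rstrip("\r\n") != EXPECTED:
                yield path, first
```

For example, `for path, line in find_abnormal_shebangs("biopython/Bio"): print(path, repr(line))` prints entries in the same style as the list above.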
From tiagoantao at gmail.com Fri Jul 13 15:23:03 2007
From: tiagoantao at gmail.com (Tiago Antão)
Date: Fri, 13 Jul 2007 16:23:03 +0100
Subject: [Biopython-dev] FDist: more Population Genetics code
In-Reply-To: <6d941f120707110305n75c75e77y7426280477cf23ca@mail.gmail.com>
References: <6d941f120701030516m1adb3daeh6e4645121ba8679d@mail.gmail.com>
<4693DEAB.8000900@maubp.freeserve.co.uk>
<6d941f120707110305n75c75e77y7426280477cf23ca@mail.gmail.com>
Message-ID: <6d941f120707130823i6b62478fl9ac589eb5c02ca9d@mail.gmail.com>
I just want to add that I followed precisely the procedure that was
suggested to me at that time, i.e. to open bugzilla issues, but I got no
answer or follow-up from it. I also had some very useful mail
exchanges with Ralph at that time, but no code was floated around.
I reiterate my interest in supplying the code (currently supporting
fdist, simcoal2, genepop, hapmap, ucsc table browser - in varying
degrees of quality). You can have a look at the Google URL supplied
(svn repository in it). I would still take the necessary time to
convert it to the BioPython namespace and format.
If in one week I see no interest at all (interest in the form of
proactively making things go forward), then I will consider this a closed
issue and will not spend more time trying any form of
integration, in the sense that I have done all that was requested here
and really got no feedback.
Tiago
From bugzilla-daemon at portal.open-bio.org Fri Jul 13 15:25:32 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 13 Jul 2007 11:25:32 -0400
Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64
In-Reply-To:
Message-ID: <200707131525.l6DFPWMa011025@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2323
------- Comment #25 from biopython-bugzilla at maubp.freeserve.co.uk 2007-07-13 11:25 EST -------
Changing the Seq object to be a subclass of string might be nice... but perhaps
rather confusing for minority alphabets where the "letters" are not single
characters(*). More importantly, wouldn't this dramatic change break a lot of
existing scripts? Probably something for the mailing list!
(*) I've never done it, but one example is storing three-letter protein
sequences, which is useful if you have any post-translational modifications
that cannot be represented using the single-letter scheme.
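The minimal sketch below makes the trade-off concrete, in modern Python 3, using a hypothetical Seq that subclasses str. The class name StrSeq and its alphabet attribute are illustrative only, not Biopython's API. It shows three things:

- String methods, and functions that expect a str, accept the object directly.
- len() counts characters rather than residues for a three-letter encoding.
- Inherited str methods such as lower() return a plain str, which silently drops the alphabet.

```python
class StrSeq(str):
    """Hypothetical sequence type that *is* a str and carries an alphabet label."""

    def __new__(cls, data, alphabet="generic"):
        obj = super().__new__(cls, data)
        obj.alphabet = alphabet  # str subclasses allow extra instance attributes
        return obj


dna = StrSeq("ACGT", alphabet="DNA")
print(isinstance(dna, str), dna.alphabet)  # True DNA
print(dna.count("A"))                      # 1: inherited str method works directly

lower = dna.lower()
print(type(lower).__name__)                # str: the alphabet label is lost

# Three-letter protein encoding: len() counts characters, not residues
prot = StrSeq("AlaGlyCys", alphabet="protein-3letter")
print(len(prot))                           # 9, although there are only 3 residues
```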
From biopython-dev at maubp.freeserve.co.uk Sat Jul 14 10:22:06 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Sat, 14 Jul 2007 11:22:06 +0100
Subject: [Biopython-dev] Bug 2292 - TER lines from Bio.PDBIO
Message-ID: <4698A3CE.7020907@maubp.freeserve.co.uk>
Hi Thomas,
Could you have a look at Biopython Bug 2292 and the suggested patch from
Michal Gajda to write TER records in line with the spec:
http://bugzilla.open-bio.org/show_bug.cgi?id=2292
Thanks
Peter
From tiagoantao at gmail.com Sat Jul 14 16:32:43 2007
From: tiagoantao at gmail.com (Tiago Antão)
Date: Sat, 14 Jul 2007 17:32:43 +0100
Subject: [Biopython-dev] Population Genetics code
Message-ID: <6d941f120707140932u356c84bel6a9322a2767e6da7@mail.gmail.com>
Hi!
Firstly I would like to thank everybody that answered so positively to
my "rant" about submitting population genetics code to Biopython.
I have a few suggestions on how to progress in a safe and constructive
way with a possible Population Genetics part for Biopython.
First of all, the starting point:
1. None of the core developers is working actively in
population genetics
2. Point 1 entails that any code submissions (made by Biopython
newbies like me) cannot be completely reviewed by
seasoned Biopython developers
3. Initially there will only be me submitting code (please correct me
if I am wrong, especially Ralph...)
4. There is already some popgen statistical code in Python lying
around, e.g. http://www.pypop.org/
Therefore I suggest starting out with a small, "safe" project
around a not very widely used application (Mark Beaumont's Fdist program
http://www.rubic.rdg.ac.uk/~mab/software.html ). This code is already
done and tested (by myself). I also have test cases (in Biopython
format) for parts of it. The only remaining issue is that it is currently
outside of the Bio.PopGen namespace, which is not really a major one...
I would provide parsers, configuration file generators and utilities
to run the suite of fdist programs.
Why start with such a simple and less relevant application:
1. It's safer to start with something less grand (if it's poorly done it
won't be that serious).
2. There is no Python fdist code lying around, so there is no overlap
at all with existing projects
3. This code is already done and being used...
I will provide code, test code, and documentation (probably by adding
stuff to the wiki). Then other people could evaluate what was done,
and we would continue from there to other, more used applications
(Genepop, arlequin, simcoal2, ...) and databases (HapMap,
TableBrowser).
Is this an acceptable way of going ahead? If other people would like
to participate, that would be fantastic...
If my suggestion is rubbish, please also say ;)
Many thanks,
Tiago
From biopython-dev at maubp.freeserve.co.uk Mon Jul 16 18:27:40 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Mon, 16 Jul 2007 19:27:40 +0100
Subject: [Biopython-dev] Biopython usage figures
Message-ID: <469BB89C.8010904@maubp.freeserve.co.uk>
A little last-minute, I know, but would anyone have access to the website
download statistics? I'd like to include rough figures for the number of
downloads of the recent releases in the BOSC 2007 talk.
A list of developers with CVS access would be nice too - but I can just
trawl through the logs to spot active people ;)
Peter
From biopython-dev at maubp.freeserve.co.uk Mon Jul 16 18:50:49 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Mon, 16 Jul 2007 19:50:49 +0100
Subject: [Biopython-dev] Is Bio.Crystal obsolete?
Message-ID: <469BBE09.1000005@maubp.freeserve.co.uk>
I just had a look at the Bio.Crystal module by Katharine Lindner (2002),
consisting of the single file Bio/Crystal/__init__.py whose preamble states:
> Hetero, Crystal and Chain exist to represent the NDB Atlas
> structure. Atlas is a minimal subset of the PDB format. Heteo
> supports a 3 alphameric code. The NDB web interface is located at
> ...
The old link should probably be updated as it doesn't work, perhaps:
http://ndbserver.rutgers.edu/atlas/index.html
As far as I can see, they now provide their downloads in PDB, CIF and an
XML file format - and the PDB files look like the full thing to me at first
glance rather than a minimal subset.
There is a unit test, Tests/test_Crystal.py but no example input files.
This module looks obsolete to me - can we mark it as deprecated after
checking on the main list no one uses it (as done for Bio.Kabat back in
March 2007)?
Peter
From tiagoantao at gmail.com Wed Jul 18 10:29:08 2007
From: tiagoantao at gmail.com (Tiago Antão)
Date: Wed, 18 Jul 2007 11:29:08 +0100
Subject: [Biopython-dev] PopGen code
Message-ID: <6d941f120707180329u6bf60c50o8e4868e5a470de2c@mail.gmail.com>
Hi!
Starting today I will begin putting code on CVS regarding Population
Genetics stuff.
I will start by checking in a GenePop parser and test code.
Very soon FDist code will follow.
After that Simcoal stuff, more GenePop stuff, HapMap and UCSC table
browser will follow.
I was not able to read the dev.open-bio.org suggestions, as the site seems
to have been down for some time.
If any of the senior Biopython developers finds that I am doing
anything seriously wrong, please don't hesitate to contact me
immediately.
I will be putting everything below a PopGen directory in Bio.
Everything except tests, of course ;)
Regards,
Tiago
From biopython-dev at maubp.freeserve.co.uk Wed Jul 18 21:37:46 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Wed, 18 Jul 2007 22:37:46 +0100
Subject: [Biopython-dev] PopGen code
In-Reply-To: <6d941f120707180329u6bf60c50o8e4868e5a470de2c@mail.gmail.com>
References: <6d941f120707180329u6bf60c50o8e4868e5a470de2c@mail.gmail.com>
Message-ID: <320fb6e00707181437l22b1aecdh8ab5c2fa2aea7380@mail.gmail.com>
Tiago Antão wrote:
> Hi!
>
> Starting today I will begin putting code on CVS regarding Population
> Genetics stuff...
> I will be putting everything below a PopGen directory in Bio.
> Everything except tests, of course ;)
Sounds good :)
If you can write some introductory text to add to the
cookbook/tutorial that would be even better. If you are not familiar
with LaTeX, then just write it up in plain text and I could add that
to the tutorial with suitable mark-up/formatting on your behalf.
This may be easier to do in chunks as you add new code, or in a large
batch later on - up to you.
Peter
From tiagoantao at gmail.com Wed Jul 18 22:46:19 2007
From: tiagoantao at gmail.com (Tiago Antão)
Date: Wed, 18 Jul 2007 23:46:19 +0100
Subject: [Biopython-dev] PopGen code
In-Reply-To: <320fb6e00707181437l22b1aecdh8ab5c2fa2aea7380@mail.gmail.com>
References: <6d941f120707180329u6bf60c50o8e4868e5a470de2c@mail.gmail.com>
<320fb6e00707181437l22b1aecdh8ab5c2fa2aea7380@mail.gmail.com>
Message-ID: <6d941f120707181546y34e17038nb07106dacae533db@mail.gmail.com>
Hi!
On 7/18/07, Peter wrote:
> If you can write some introductory text to add to the
> cookbook/tutorial that would be even better. If you are not familiar
> with LaTeX, then just write it up in plain text and I could add that
> to the tutorial with suitable mark-up/formatting on your behalf.
I agree, in fact it is what I intend to do after having the FDist code in.
I will write mostly in parallel with committing, so the docs should be
more or less aligned with what is being put in CVS...
Regards,
Tiago
From tiagoantao at gmail.com Thu Jul 19 13:09:29 2007
From: tiagoantao at gmail.com (Tiago Antão)
Date: Thu, 19 Jul 2007 14:09:29 +0100
Subject: [Biopython-dev] PopGen Documentation
Message-ID: <6d941f120707190609o4f5f7412x94851295865ba22b@mail.gmail.com>
Hi All,
Following Peter's suggestion, I had a closer look at the
documentation and, if nobody objects, I would like to add a new
subsection between PDB and Miscellaneous in the cookbook chapter, like
this:
4.10 Going 3D: The PDB module
4.11 PopGen: Population genetics (and genomics)
4.12 Miscellaneous
Tiago
From bugzilla-daemon at portal.open-bio.org Sat Jul 21 15:28:49 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 21 Jul 2007 11:28:49 -0400
Subject: [Biopython-dev] [Bug 2269] Shebang (hashbang) lines need cleanup
In-Reply-To:
Message-ID: <200707211528.l6LFSnBk031498@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2269
------- Comment #8 from mdehoon at ims.u-tokyo.ac.jp 2007-07-21 11:28 EST -------
In biopython/Bio/EUtils, the shebang line was specified in sourcegen.py. I
fixed that line and the shebang lines in the other *.py files under
biopython/Bio/EUtils. Can we close this bug?
From bugzilla-daemon at portal.open-bio.org Sat Jul 21 15:47:32 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 21 Jul 2007 11:47:32 -0400
Subject: [Biopython-dev] [Bug 2291] __init__.py missing in the Bio.PDB.mmCIF
folder after the install
In-Reply-To:
Message-ID: <200707211547.l6LFlWxU032394@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2291
mdehoon at ims.u-tokyo.ac.jp changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |FIXED
------- Comment #1 from mdehoon at ims.u-tokyo.ac.jp 2007-07-21 11:47 EST -------
I have modified setup.py so that Bio.PDB.mmCIF is always a module (but not
necessarily with the MMCIFlex module; users still need to modify setup.py to
include it). With Bio.PDB.mmCIF always present, the Bio/PDB/mmCIF/__init__.py
file is no longer lost.
From bugzilla-daemon at portal.open-bio.org Sun Jul 22 08:30:11 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 22 Jul 2007 04:30:11 -0400
Subject: [Biopython-dev] [Bug 2269] Shebang (hashbang) lines need cleanup
In-Reply-To:
Message-ID: <200707220830.l6M8UB6d006746@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2269
------- Comment #9 from biopython-bugzilla at maubp.freeserve.co.uk 2007-07-22 04:30 EST -------
Regarding comment 8, after changing sourcegen.py were you able to regenerate
all the biopython/Bio/EUtils/*.py files? Or did you just fix them by hand?
Anyway - that should leave us with consistent shebang/hashbang lines :)
Unless we also want to remove any surplus lines, and decide if all or none of
the unit tests should have them, then this bug looks done.
From bugzilla-daemon at portal.open-bio.org Sun Jul 22 09:53:46 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 22 Jul 2007 05:53:46 -0400
Subject: [Biopython-dev] [Bug 2269] Shebang (hashbang) lines need cleanup
In-Reply-To:
Message-ID: <200707220953.l6M9rkap010929@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2269
mdehoon at ims.u-tokyo.ac.jp changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |FIXED
------- Comment #10 from mdehoon at ims.u-tokyo.ac.jp 2007-07-22 05:53 EST -------
> Regarding comment 8, after changing sourcegen.py were you able to regenerate
> all the biopython/Bio/EUtils/*.py files? Or did you just fix them by hand?
I fixed them by hand. The fixed sourcegen.py should result in the same
biopython/Bio/EUtils/*.py files as I created by hand. I tried regenerating
these files automatically, but that didn't work for me. At some point, somebody
should figure out how the biopython/Bio/EUtils code works.
> Unless we also want to remove any surplus lines, and decide if all or none of
> the unit tests should have them, then this bug looks done.
Since Python itself does not seem to have a clear rule as to which files should
have a shebang line, it is not obvious which Biopython files should have one.
If somebody really wants to fix this, it's probably better to discuss such an
issue on the mailing list first. As the issue raised by the original bug report
has been resolved, I am closing this bug.
From mdehoon at c2b2.columbia.edu Sun Jul 22 10:28:22 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Sun, 22 Jul 2007 19:28:22 +0900
Subject: [Biopython-dev] Bio.AlignIO (was Re: [BioPython] Bio.SeqIO and
files with one record)
In-Reply-To: <4693E5FE.708@maubp.freeserve.co.uk>
References: <4693E5FE.708@maubp.freeserve.co.uk>
Message-ID: <46A33146.7030405@c2b2.columbia.edu>
Peter wrote:
> P.S. Any comments on the Bio.AlignIO ideas I raised back in May 2007?
> http://lists.open-bio.org/pipermail/biopython/2007-May/003472.html
>
Let's discuss the Bio.Align.Alignment class first, and then decide how
to parse alignment files.
Currently, the alignment class holds a list of SeqRecord objects:
    class Alignment:
        ...
        def __init__(self, alphabet):
            ...
            # hold everything as a list of SeqRecord objects
            self._records = []
To get access to self._records, the Alignment class has some accessor
functions:
    def get_all_seqs(self):
        ...
        return self._records
    def get_seq_by_num(self, number):
        ...
        return self._records[number].seq
A cleaner way to do this is to let the class Alignment inherit from
list. This also allows us to use all list methods on Alignment objects.
For example, we can iterate over them, as suggested in this bug report:
http://bugzilla.open-bio.org/show_bug.cgi?id=1944
Any objections against letting Alignment inherit from list?
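Below is a minimal sketch of what inheriting from list could look like. It assumes a hypothetical SeqRecordStub class in place of the real SeqRecord, with only .id and .seq and with plain strings standing in for Seq objects. ListAlignment is an illustrative name, not existing Biopython code.

```python
class SeqRecordStub:
    """Hypothetical stand-in for a SeqRecord: just a sequence and an id."""

    def __init__(self, seq, id):
        self.seq = seq
        self.id = id


class ListAlignment(list):
    """Sketch of an alignment that *is* a list of records."""

    def __init__(self, alphabet, records=()):
        list.__init__(self, records)
        self.alphabet = alphabet

    # The old accessors become thin wrappers around inherited list behaviour.
    def get_all_seqs(self):
        return list(self)

    def get_seq_by_num(self, number):
        return self[number].seq


aln = ListAlignment("DNA", [SeqRecordStub("ACGT", "a"), SeqRecordStub("AC-T", "b")])
for record in aln:  # iteration, len(), append() etc. all come from list
    print(record.id, record.seq)
```

One caveat with this design is slicing. Slicing a list subclass, e.g. aln[0:1], returns a plain list rather than a ListAlignment, so .alphabet is lost. Slicing would need overriding if sub-alignments are wanted.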
--Michiel
From salish at picasso.ucsf.edu Sun Jul 22 18:27:58 2007
From: salish at picasso.ucsf.edu (Howard Salis)
Date: Sun, 22 Jul 2007 11:27:58 -0700
Subject: [Biopython-dev] Bio.AlignIO (was Re: [BioPython] Bio.SeqIO and
files with one record)
In-Reply-To: <46A33146.7030405@c2b2.columbia.edu>
References: <4693E5FE.708@maubp.freeserve.co.uk>
<46A33146.7030405@c2b2.columbia.edu>
Message-ID: <9fa7e98e0707221127v5b7b2a85x38978fd647e18931@mail.gmail.com>
Hello all,
To get this same behavior, you can also create the __iter__ and next()
methods in Alignment itself.
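Here is a sketch of this alternative that keeps the private list and delegates to it. WrappedAlignment, Record and add_record are hypothetical names used only for illustration. If __iter__ returns iter(self._records), the class does not need its own next() method (spelled __next__ in Python 3), and every for loop gets a fresh iterator.

```python
class Record:
    """Hypothetical SeqRecord stand-in."""

    def __init__(self, id, seq):
        self.id = id
        self.seq = seq


class WrappedAlignment:
    def __init__(self, alphabet):
        self._alphabet = alphabet
        # hold everything as a list of record objects
        self._records = []

    def add_record(self, record):  # hypothetical helper for the example
        self._records.append(record)

    def __len__(self):
        return len(self._records)

    def __iter__(self):
        # Delegate to the list's own iterator. Each call returns a new one,
        # so nested or repeated loops do not interfere with each other.
        return iter(self._records)
```

Suppose the class instead returned self from __iter__ and kept a position counter for next(). Two loops over the same alignment would then share that counter, which is why delegation is usually the simpler choice.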
-Howard Salis
From mdehoon at c2b2.columbia.edu Wed Jul 25 13:17:33 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Wed, 25 Jul 2007 22:17:33 +0900
Subject: [Biopython-dev] Bio.AlignIO (was Re: [BioPython] Bio.SeqIO and
files with one record)
In-Reply-To: <9fa7e98e0707221127v5b7b2a85x38978fd647e18931@mail.gmail.com>
References: <4693E5FE.708@maubp.freeserve.co.uk> <46A33146.7030405@c2b2.columbia.edu>
<9fa7e98e0707221127v5b7b2a85x38978fd647e18931@mail.gmail.com>
Message-ID: <46A74D6D.9020309@c2b2.columbia.edu>
Sure, that is possible, but that means we'd be adding methods to
Alignment in order for it to behave like a list, whereas we can get
that for free by letting the Alignment class inherit from list.
--Michiel.
From biopython-dev at maubp.freeserve.co.uk Wed Jul 25 13:34:02 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Wed, 25 Jul 2007 14:34:02 +0100
Subject: [Biopython-dev] Bio.AlignIO
In-Reply-To: <46A74D6D.9020309@c2b2.columbia.edu>
References: <4693E5FE.708@maubp.freeserve.co.uk> <46A33146.7030405@c2b2.columbia.edu> <9fa7e98e0707221127v5b7b2a85x38978fd647e18931@mail.gmail.com>
<46A74D6D.9020309@c2b2.columbia.edu>
Message-ID: <46A7514A.1090405@maubp.freeserve.co.uk>
Michiel de Hoon wrote:
> Sure, that is possible, but that means we'd be adding methods to
> Alignment in order for it to behave like a list, whereas we can get
> that for free by letting the Alignment class inherit from list.
>
> --Michiel.
Personally I see an alignment as both an array of characters (i.e. amino
acid residues or nucleotides), and a list of sequences.
In the same way that a Numeric or NumPy array lets you iterate over
rows, yet also access individual elements, we could allow iteration of
SeqRecords and also allow access to individual letters.
Peter
From mdehoon at c2b2.columbia.edu Wed Jul 25 14:44:56 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Wed, 25 Jul 2007 23:44:56 +0900
Subject: [Biopython-dev] Bio.AlignIO
In-Reply-To: <46A7514A.1090405@maubp.freeserve.co.uk>
References: <4693E5FE.708@maubp.freeserve.co.uk> <46A33146.7030405@c2b2.columbia.edu> <9fa7e98e0707221127v5b7b2a85x38978fd647e18931@mail.gmail.com>
<46A74D6D.9020309@c2b2.columbia.edu>
<46A7514A.1090405@maubp.freeserve.co.uk>
Message-ID: <46A761E8.5080909@c2b2.columbia.edu>
Peter wrote:
> Personally I see an alignment as both an array of characters (i.e. amino
> acid residues or nucleotides), and a list of sequences.
>
> In the same way that a Numeric or NumPy array lets you iterate over
> rows, yet also access individual elements, we could allow iteration of
> SeqRecords and also allow access to individual letters.
How about the following:
-Iterators iterate over the SeqRecords in the alignment
-An index of the form [xxx] returns the corresponding SeqRecord
-An index of the form [xxx:yyy:zzz] returns an Alignment object
containing the SeqRecords in rows [xxx:yyy:zzz]
(compare to the current method get_all_seqs()).
-An index of the form [xxx,:] returns the Seq object of the SeqRecord at
xxx (this is currently done by the get_seq_by_num() method).
-An index of the form [xxx:yyy:zzz,:] returns a list of Seq objects
-An index of the form [:,www] returns a string containing the characters
at column www (which is currently done by the get_column method)
-An index of the form [xxx:yyy:zzz,www] returns a string containing the
characters at column www using only the rows xxx:yyy:zzz.
-An index of the form [xxx,www] returns a string containing the
character of the sequence in row xxx at column www.
This is more-or-less how Numerical Python arrays work, except that we'll
be returning SeqRecord/Seq/string objects depending on the indices.
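The following is a minimal sketch of the indexing rules proposed above. It assumes hypothetical Rec objects whose .seq is a plain string standing in for a Seq. Python passes aln[x, y] to __getitem__ as a single tuple, so the method can dispatch on the types of its parts. The proposal does not cover column slices other than ':', so this sketch simply raises IndexError for them. ProposedAlignment is an illustrative name.

```python
FULL = slice(None)  # what Python passes for a bare ":"


class Rec:
    """Hypothetical SeqRecord stand-in; .seq is a plain str standing in for Seq."""

    def __init__(self, id, seq):
        self.id = id
        self.seq = seq


class ProposedAlignment:
    def __init__(self, records):
        self._records = list(records)

    def __iter__(self):
        # Iterators iterate over the records.
        return iter(self._records)

    def __getitem__(self, index):
        if not isinstance(index, tuple):
            if isinstance(index, slice):
                # [xxx:yyy:zzz] -> Alignment of those rows
                return ProposedAlignment(self._records[index])
            # [xxx] -> the SeqRecord itself
            return self._records[index]
        rows, col = index
        if col == FULL:
            if isinstance(rows, slice):
                # [xxx:yyy:zzz,:] -> list of Seq objects
                return [rec.seq for rec in self._records[rows]]
            # [xxx,:] -> Seq of one record (like get_seq_by_num)
            return self._records[rows].seq
        if isinstance(col, int):
            if isinstance(rows, slice):
                # [:,www] or [xxx:yyy:zzz,www] -> column as a string
                return "".join(rec.seq[col] for rec in self._records[rows])
            # [xxx,www] -> a single character
            return self._records[rows].seq[col]
        raise IndexError("column slices other than ':' are not part of this proposal")
```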
--Michiel.
From biopython-dev at maubp.freeserve.co.uk Wed Jul 25 16:10:43 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Wed, 25 Jul 2007 17:10:43 +0100
Subject: [Biopython-dev] Improving the Alignment object. Was Bio.AlignIO
In-Reply-To: <46A761E8.5080909@c2b2.columbia.edu>
References: <4693E5FE.708@maubp.freeserve.co.uk> <46A33146.7030405@c2b2.columbia.edu> <9fa7e98e0707221127v5b7b2a85x38978fd647e18931@mail.gmail.com> <46A74D6D.9020309@c2b2.columbia.edu> <46A7514A.1090405@maubp.freeserve.co.uk>
<46A761E8.5080909@c2b2.columbia.edu>
Message-ID: <46A77603.1030101@maubp.freeserve.co.uk>
Michiel de Hoon wrote:
> Peter wrote:
>> Personally I see an alignment as both an array of characters (i.e. amino
>> acid residues or nucleotides), and a list of sequences.
>>
>> In the same way that a Numeric or NumPy array lets you iterate over
>> rows, yet also access individual elements, we could allow iteration of
>> SeqRecords and also allow access to individual letters.
>
> How about the following:
>
> -Iterators iterate for the SeqRecords in the alignment
I agree. And this is trivial to implement without needing the element
access/slicing support.
As to element access, we've been thinking along similar lines :)
It's just that with all the different special cases, there are lots of
different possible return types!
> -An index of the form [xxx] returns the corresponding SeqRecord
> -An index of the form [xxx:yyy:zzz] returns an Alignment object
> containing the SeqRecords in rows [xxx:yyy:zzz]
> (compare to the current method get_all_seqs()).
I agree. This is essential to make an alignment act like a list of
SeqRecord objects when only a one-dimensional index is given.
> -An index of the form [xxx,:] returns the Seq object of the SeqRecord at
> xxx (this is currently done by the get_seq_by_num() method).
> -An index of the form [xxx:yyy:zzz,:] returns a list of Seq objects
I'm not immediately convinced about returning Seq objects here. I might
expect indices like [xxx,:] to return a SeqRecord (not a Seq) and
[xxx:yyy:zzz,:] to return a sub-alignment (not a list of Seq objects).
> -An index of the form [:,www] returns a string containing the characters
> at column www (which is currently done by the get_column method)
> -An index of the form [xxx,www] returns a string containing the
> character of the sequence in row xxx at column www.
Those look fine - however we might want to return Seq objects rather
than strings.
> -An index of the form [xxx:yyy:zzz,www] returns a string containing
> the characters at column www using only the rows xxx:yyy:zzz.
Or a sub alignment? See later...
> This is more-or-less how Numerical Python arrays work, except that we'll
> be returning SeqRecord/Seq/string objects depending on the indices.
For comparison, this is what I had been thinking:
* [r,c] means one element is requested, return a single character string
* [r] or [r,:] means one row is requested, return a SeqRecord
* [:,c] means one column is requested, return a string (or Seq object?)
* Otherwise returns a (sub)alignment. Note that [:] or [:,:] would
return a copy of the alignment.
This would cover slicing of the column index by returning a
sub-alignment. i.e. indexes of the form [rrr, xxx:yyy:zzz] or
[rrr:ppp:qqq, xxx:yyy:zzz]
I'm not sure if requests for part of a single row or column like [rrr,
xxx:yyy:zzz] and [rrr:ppp:qqq, xxx] are best handled by returning
sub-alignments or as special cases (strings/Seq and Seq/SeqRecord
respectively?).
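Here is a sketch of these rules under similar assumptions: hypothetical Rec objects with plain-string sequences, and an illustrative class name. The undecided cases, such as [rrr, xxx:yyy:zzz] and [rrr:ppp:qqq, xxx], fall through to the sub-alignment branch here. That is just one of the two options mentioned above.

```python
FULL = slice(None)  # what Python passes for a bare ":"


class Rec:
    """Hypothetical SeqRecord stand-in with a plain-string sequence."""

    def __init__(self, id, seq):
        self.id = id
        self.seq = seq


class SketchAlignment:
    def __init__(self, records):
        self._records = list(records)

    def __len__(self):
        return len(self._records)

    def __getitem__(self, index):
        rows, cols = index if isinstance(index, tuple) else (index, FULL)
        if isinstance(rows, int):
            if isinstance(cols, int):
                return self._records[rows].seq[cols]  # [r,c] -> one character
            if cols == FULL:
                return self._records[rows]  # [r] or [r,:] -> SeqRecord
        elif rows == FULL and isinstance(cols, int):
            return "".join(rec.seq[cols] for rec in self._records)  # [:,c] -> column
        # Anything else -> sub-alignment; [:] and [:,:] give a copy.
        selected = [self._records[rows]] if isinstance(rows, int) else self._records[rows]
        return SketchAlignment(Rec(rec.id, rec.seq[cols]) for rec in selected)
```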
Peter
From bugzilla-daemon at portal.open-bio.org Thu Jul 26 14:52:38 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 26 Jul 2007 10:52:38 -0400
Subject: [Biopython-dev] [Bug 2340] New: SProt.py fails to parse the current
Swiss-Prot version 54.0
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2340
Summary: SProt.py fails to parse the current Swiss-Prot version
54.0
Product: Biopython
Version: 1.43
Platform: All
OS/Version: Linux
Status: NEW
Severity: normal
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: gould at embl.de
Hi,
I'm running Python 2.3.4 on a Red Hat Linux box and am trying to parse any
Swiss-Prot record, but the parser just seems to bomb out without reporting
where it actually fails. I'm guessing it has to do with Release 54.0 of
UniProt (24-Jul-07) and the addition of the new line type PE?
From bugzilla-daemon at portal.open-bio.org Thu Jul 26 15:46:36 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 26 Jul 2007 11:46:36 -0400
Subject: [Biopython-dev] [Bug 2340] SProt.py fails to parse the current
Swiss-Prot version 54.0
In-Reply-To:
Message-ID: <200707261546.l6QFkaGq022472@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2340
------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2007-07-26 11:46 EST -------
Hi Kate,
Could you give us the URL of one or two specific SwissProt files you're having
trouble with?
Also how are you trying to read the SwissProt files? e.g. with
Bio.SeqIO.parse()?
If you could include the python error too, that could be helpful. Thanks.
Peter
From bugzilla-daemon at portal.open-bio.org Thu Jul 26 16:06:15 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 26 Jul 2007 12:06:15 -0400
Subject: [Biopython-dev] [Bug 2340] SProt.py fails to parse the current
Swiss-Prot version 54.0
In-Reply-To:
Message-ID: <200707261606.l6QG6FkE023264@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2340
------- Comment #2 from gould at embl.de 2007-07-26 12:06 EST -------
(In reply to comment #0)
> Hi,
>
> I'm running on a red hat linux box on python 2.3.4 and am trying to parse any
> swiss-prot record but the parser just seems to bomb out not throwing an error
> of where it actually fails. I'm guessing it has to do with the Release 54.0 of
> 24-Jul-07 of UniPROT with the addition of the new line type PE??
>
(In reply to comment #1)
> Hi Kate,
>
> Could you give us the URL of one or two specific SwissProt files you're having
> trouble with.
>
> Also how are you trying to read the SwissProt files? e.g. with
> Bio.SeqIO.parse()?
>
> If you could include the python error too, that could be helpful. Thanks.
>
> Peter
>
hi
The following snippet of code is where the error occurs (this used to work
with no problem until something changed in the last day or two, I guess):
    def getSequence(self, acc):
        """ This method retrieves the most recent annotated sequence from the ExPASy
        server for a given accession number. """
        from Bio.WWW import ExPASy
        from Bio.SwissProt import SProt
        from Bio import File
        if acc != '':
            try:
                results = ExPASy.get_sprot_raw(acc.strip()).read()
                sp_parser = SProt.RecordParser()
                sp_iterator = SProt.Iterator(File.StringHandle(results),
                                             sp_parser)
                Record = sp_iterator.next()
                return Record.sequence.strip()
            except:
                return -1
        else:
            return acc
It breaks at the line "Record = sp_iterator.next()" but doesn't print any error
to the terminal.
Some example accession numbers used are: P01100, P12522, etc.
thanks
Kate
From bugzilla-daemon at portal.open-bio.org Thu Jul 26 16:32:31 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 26 Jul 2007 12:32:31 -0400
Subject: [Biopython-dev] [Bug 2340] SProt.py fails to parse the current
Swiss-Prot version 54.0
In-Reply-To:
Message-ID: <200707261632.l6QGWVrC024560@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2340
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |ASSIGNED
------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2007-07-26 12:32 EST -------
Confirming bug - it is due to the new PE line (protein evidence).
The reason you didn't see the error is that in your example the parser is
wrapped in a try ... except ... clause.
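This small, self-contained illustration shows why no error appeared. A bare except: catches every exception, so the caller only ever sees -1. Catching a specific exception and calling traceback.print_exc() keeps the fallback but still reports the failure. parse_record is a hypothetical stand-in for the real parser and, like the unpatched one, chokes on a PE line.

```python
import traceback

KNOWN_CODES = ("ID", "AC", "SQ", "  ", "//")


def parse_record(text):
    """Hypothetical strict parser: rejects any line type it does not know."""
    for line in text.splitlines():
        if line[:2] not in KNOWN_CODES:
            raise ValueError("Unrecognised line: %r" % line)
    return "parsed"


def get_sequence_silent(text):
    try:
        return parse_record(text)
    except:  # bare except: the ValueError (and its message) is lost
        return -1


def get_sequence_reporting(text):
    try:
        return parse_record(text)
    except ValueError:
        traceback.print_exc()  # print the traceback to stderr, then fall back
        return -1
```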
From bugzilla-daemon at portal.open-bio.org Thu Jul 26 16:51:45 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 26 Jul 2007 12:51:45 -0400
Subject: [Biopython-dev] [Bug 2340] SProt.py fails to parse the current
Swiss-Prot version 54.0
In-Reply-To:
Message-ID: <200707261651.l6QGpja8025622@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2340
------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2007-07-26 12:51 EST -------
I think I have fixed this - at least your example code now works.
You'll need to update the file Bio/SwissProt/SProt.py to revision 1.38 from
CVS, which you can download here:
http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/SwissProt/SProt.py?cvsroot=biopython
Don't forget to back up the old Bio/SwissProt/SProt.py first, in case you want
to put things back.
Please test this and report back.
NOTE - The fix just makes the parser aware of the new PE line, and ignores it.
It doesn't (yet) do anything useful with the information it contains!
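Here is a rough sketch of the "recognise PE and ignore it" approach. It assumes a hypothetical line-by-line parser that dispatches on the two-letter code at the start of each Swiss-Prot line. This is not the code in SProt.py. It only shows why an unrecognised code makes a strict parser fail, and how listing that code as known-but-ignored fixes it. Only ID, AC, SQ, sequence data, // and PE lines are handled.

```python
def parse_swissprot_lines(lines):
    """Hypothetical sketch of a strict, line-code-driven Swiss-Prot parser."""
    record = {"id": None, "accessions": [], "sequence": ""}
    in_sequence = False
    for line in lines:
        code, data = line[:2], line[5:].strip()
        if code == "ID":
            record["id"] = data.split()[0]
        elif code == "AC":
            record["accessions"].extend(a.strip() for a in data.rstrip(";").split(";"))
        elif code == "SQ":
            in_sequence = True
        elif code == "//":
            break
        elif in_sequence:
            # Sequence data lines start with blanks; drop the spacing.
            record["sequence"] += line.replace(" ", "")
        elif code == "PE":
            pass  # known line type, recognised but deliberately ignored
        else:
            raise ValueError("Unknown line type %r" % code)
    return record
```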
From bugzilla-daemon at portal.open-bio.org Fri Jul 27 06:46:35 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 27 Jul 2007 02:46:35 -0400
Subject: [Biopython-dev] [Bug 2340] SProt.py fails to parse the current
Swiss-Prot version 54.0
In-Reply-To:
Message-ID: <200707270646.l6R6kZaI001699@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2340
------- Comment #5 from gould at embl.de 2007-07-27 02:46 EST -------
(In reply to comment #4)
> I think I have fixed this - at least your example code now works.
>
> You'll need to update the file Bio/SwissProt/SProt.py to revision 1.38 from
> CVS, which you can download here:
>
> http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/SwissProt/SProt.py?cvsroot=biopython
>
> Don't forget to backup the old Bio/SwissProt/SProt.py first, in case you want
> to put things back.
>
> Please test this and report back.
>
> NOTE - The fix just makes the parser aware of the new PE line, and ignores it.
> It doesn't (yet) do anything useful with the information it contains!
>
Yes, it has done the trick and all works OK again. Thanks.
From bugzilla-daemon at portal.open-bio.org Fri Jul 27 07:54:14 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 27 Jul 2007 03:54:14 -0400
Subject: [Biopython-dev] [Bug 2340] SProt.py fails to parse the current
Swiss-Prot version 54.0
In-Reply-To:
Message-ID: <200707270754.l6R7sEnm007432@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2340
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|ASSIGNED |RESOLVED
Resolution| |FIXED
------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk 2007-07-27 03:54 EST -------
Great :)
From kosa at genesilico.pl Fri Jul 27 10:47:10 2007
From: kosa at genesilico.pl (Jan Kosinski)
Date: Fri, 27 Jul 2007 12:47:10 +0200
Subject: [Biopython-dev] syntax of indices for future Alignment object
Message-ID: <46A9CD2E.6080402@genesilico.pl>
Hi,
As end users, we would like the Python Alignment object to behave like
an array from the outside, so that we can get slices, columns,
sequences, fragments of them, or whatever else we want. The syntax that
is most intuitive and clear for the user (certainly much clearer than
indexes like [xxx:yyy:zzz]) is the following.
[A:B][X:Y] - general syntax of indices. This supports almost everything.
Several examples of usage and proposed outputs:
[:][:] - returns an alignment or its copy (as Alignment object)
[:][x:y] - returns slice of the alignment (as Alignment object; aln of
all sequences and residues corresponding to columns x to y)
[a:b][:] - returns the aln of seqs from a to b (as Alignment object)
[a:b][x:y] - returns the slice and subalignment (as Alignment object)
[a:a][x:y] - returns slice of the single sequence (residues x to y of
sequence a) (as Alignment object)
[a][x:y] - returns slice of the single sequence (residues x to y of
sequence a) (as a String)
[a:][x:y] and similar combinations - returns the slice and subalignment,
sequences from a to the last are included (as Alignment object)
[:][x] - returns single column (as a String object? string here could be
very useful)
[:][x:x] - returns single column (as Alignment object)
[a] - returns single sequence (as a SeqRecord object)
[a:a] and [a:a][:] - returns single sequence (as Alignment object)
[m][n] - returns n-th element of sequence m (as a String)
It is debatable whether different but similar sets of indices should
return different types of objects (e.g. [:][x] would return a column as
a string while [:][x:x] would return a column as an Alignment object),
but in my opinion this would just extend the usability.
The only problem is implementing such calls, which depends on what type
of object the Alignment object will be.
What do you think?
Cheers,
Jan Kosinski
Grzegorz Papaj
:.
From bugzilla-daemon at portal.open-bio.org Fri Jul 27 12:51:10 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 27 Jul 2007 08:51:10 -0400
Subject: [Biopython-dev] [Bug 1944] Align.Generic adding iterator and more
In-Reply-To:
Message-ID: <200707271251.l6RCpAIg025706@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=1944
------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2007-07-27 08:51 EST -------
Created an attachment (id=721)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=721&action=view)
Patch for Bio/Align/Generic.py to add __getitem__ method
This patch adds a __getitem__ method, a small "mini test" when running the
module directly, and updates the doc strings. This gives SeqRecord iteration
"for free" (without an explicit __iter__ method).
As discussed on the mailing list, this allows an Alignment object to be treated
as a list of SeqRecord objects or as an array of character strings - plus
extract whole columns as strings.
Quoting the proposed __getitem__ doc string:
Depending on the indices, you can get a SeqRecord object
(representing a single row), strings (for a single column or
single characters) or another alignment (representing some or
all of the alignment).
align[r,c] gives a single character as a string
align[r] gives a SeqRecord
align[:,c] gives a column as a string
align[:] and align[:,:] give a copy of the alignment
Anything else gives a sub alignment, e.g.
align[0:2] or align[0:2,:] uses only row 0 and 1
align[:,1:3] uses only columns 1 and 2
align[0:2,1:3] uses only rows 0 & 1 and only cols 1 & 2
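The "for free" iteration relies on Python's fallback iteration protocol. If a class has no __iter__, iter() calls __getitem__(0), __getitem__(1), ... until an IndexError is raised. Here is a minimal sketch with a hypothetical class:

```python
class GetItemOnly:
    """Hypothetical minimal alignment that defines __getitem__ but no __iter__."""

    def __init__(self, records):
        self._records = list(records)

    def __getitem__(self, index):
        # An out-of-range integer makes the list raise IndexError, which is
        # the signal that ends Python's fallback iteration.
        return self._records[index]


aln = GetItemOnly(["rec0", "rec1", "rec2"])
for record in aln:  # calls aln[0], aln[1], aln[2], then aln[3] -> IndexError -> stop
    print(record)
```

The flip side is that a __getitem__ which never raises IndexError for large integers would make such loops run forever. An explicit __iter__ can therefore still be worth adding.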
Feedback welcome - either here, or on the developers' mailing list. Thanks
From biopython-dev at maubp.freeserve.co.uk Fri Jul 27 12:18:21 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Fri, 27 Jul 2007 13:18:21 +0100
Subject: [Biopython-dev] syntax of indices for future Alignment object
In-Reply-To: <46A9CD2E.6080402@genesilico.pl>
References: <46A9CD2E.6080402@genesilico.pl>
Message-ID: <46A9E28D.40609@maubp.freeserve.co.uk>
Jan Kosinski wrote:
> Hi,
>
> From the viewpoint of the enduser we would like python Alignment object
> to behave outside as an array so we could get slices, columns,
> sequences, their fragments, whatever we want etc. The most intuitive and
> clear (certainly much better than not very clear indexes like
> [xxx:yyy:zzz]) for the user is the following.
>
> [A:B][X:Y] - general syntax of indices. This supports almost everything.
I think Michiel and I were suggesting [A:B,X:Y] or rather [A:B:C,X:Y:Z]
to be fully general, rather than [A:B][X:Y] or [A:B:C][X:Y:Z]
i.e. [arg1, arg2] rather than [arg1][arg2]
This is an important point, as in the first case the __getitem__ method
of the alignment is called once (with both arguments). In the second
case, the __getitem__ method is called with arg1, and may return a
SeqRecord or an alignment - and this object's __getitem__ method is
called with arg2.
As written, many of your cases appear to be impossible - but using the
[arg1,arg2] syntax we can get close.
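This small demonstration uses a hypothetical Recorder class that logs every key passed to __getitem__. With obj[a:b, x:y], both indices arrive as one tuple in a single call. With obj[a:b][x:y], there are two separate calls, and the second one is made on whatever object the first returned. This is also why [:][x] cannot mean "column x": the [x] is applied to the copy returned by [:], so it selects row x.

```python
class Recorder:
    """Hypothetical class that logs every key passed to __getitem__."""

    def __init__(self, log):
        self.log = log

    def __getitem__(self, key):
        self.log.append(key)
        # Return a new Recorder so that a chained [..][..] can be followed.
        return Recorder(self.log)


log = []
obj = Recorder(log)
obj[0:2, 3:5]  # one call, key is a tuple of two slices
obj[0:2][3:5]  # two calls, the second on the object returned by the first
print(log)
# [(slice(0, 2, None), slice(3, 5, None)), slice(0, 2, None), slice(3, 5, None)]
```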
I've got a working bit of code put together now which I'll attach to
bug 1944 soon.
http://bugzilla.open-bio.org/show_bug.cgi?id=1944
Peter
From kosa at genesilico.pl Fri Jul 27 14:13:24 2007
From: kosa at genesilico.pl (Jan Kosinski)
Date: Fri, 27 Jul 2007 16:13:24 +0200
Subject: [Biopython-dev] syntax of indices for future Alignment object
In-Reply-To: <46A9E28D.40609@maubp.freeserve.co.uk>
References: <46A9CD2E.6080402@genesilico.pl>
<46A9E28D.40609@maubp.freeserve.co.uk>
Message-ID: <46A9FD84.4080502@genesilico.pl>
Hi,
Ok, I agree that [A:B][X:Y] syntax is not possible here. [A:B,X:Y] is
fine. However, I would recommend not using [A:B:C,X:Y:Z] since it is not
clear while the [A:B][X:Y] syntax is clear and sufficient.
We had another discussion in the lab about whether the Alignment object
should store records not in a list but rather in a dictionary (while
keeping information about sequence order) or similar. What is your
reasoning for making the Alignment object a list of SeqRecord objects?
One should think carefully about the design of the Alignment class,
since it will influence all further steps. As the class is still in its
infancy, now is a very good moment to think about what the Alignment
class is for and what it should support. For instance, the Alignment
object should support changing characters in the alignment without
needing to copy it (using aln[a][x] = "D"). Can this be done now, with
an Alignment that is a list of SeqRecord objects whose sequences are
implemented as immutable Seq objects?
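Here is one way to picture the trade-off, as a sketch only. Python strings cannot be changed in place, so editing one residue in a string-backed row means rebuilding that row. Storing each row as a list of characters instead allows aln[a, x] = "D" in place via __setitem__. The sketch uses the [row, col] tuple form because aln[a][x] = "D" would first call __getitem__ with a alone. MutableAlignment, row_string and set_char_in_str_rows are hypothetical names.

```python
def set_char_in_str_rows(rows, row, col, char):
    """Immutable strings: the whole row string has to be rebuilt.

    Assumes 0 <= col < len(rows[row]).
    """
    old = rows[row]
    rows[row] = old[:col] + char + old[col + 1:]


class MutableAlignment:
    """Hypothetical sketch: each row stored as a list of one-character strings."""

    def __init__(self, rows):
        self._rows = [list(r) for r in rows]

    def __getitem__(self, index):
        row, col = index
        return self._rows[row][col]

    def __setitem__(self, index, char):
        row, col = index
        if len(char) != 1:
            raise ValueError("expected a single character")
        self._rows[row][col] = char  # changed in place, nothing else is copied

    def row_string(self, row):
        return "".join(self._rows[row])
```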
Cheers,
Jan Kosinski
:.
From kosa at genesilico.pl Fri Jul 27 14:35:15 2007
From: kosa at genesilico.pl (Jan Kosinski)
Date: Fri, 27 Jul 2007 16:35:15 +0200
Subject: [Biopython-dev] syntax of indices for future Alignment object
In-Reply-To: <46A9FD84.4080502@genesilico.pl>
References: <46A9CD2E.6080402@genesilico.pl>
<46A9E28D.40609@maubp.freeserve.co.uk>
<46A9FD84.4080502@genesilico.pl>
Message-ID: <46AA02A3.30000@genesilico.pl>
Hi,
Sorry for a typo ;-) Of course it should read:
"... while the [A:B,X:Y] syntax is clear and sufficient."
Cheers,
Janek
Jan Kosinski wrote:
> Hi,
>
> Ok, I agree that [A:B][X:Y] syntax is not possible here. [A:B,X:Y] is
> fine. However, I would recommend not using [A:B:C,X:Y:Z] since it is
> not clear while the [A:B][X:Y] syntax is clear and sufficient.
>
> We had another discussion in the lab about that Alignment object
> should not store records in the list but rather in a dictionary (but
> keeping information about sequence order ) or so. What is you
> reasoning for making Alignment object a list of SeqRecord objects?
> One should carefully think about design of the Alignment class since
> it will influence all further steps. As now the class is in its
> infancy there is a very good moment for thinking what the Alignment
> class is for and what it should support. For instance, the Alignment
> object should support changing characters in the alignment without a
> need of copying it (using aln[a][x] = "D"). Can it be done now with
> Alignment which is a list of SeqRecord objects with sequences
> implemented as immutable Seq objects ?
>
> Cheers,
> Jan Kosinski
>
>
> Peter wrote:
>> Jan Kosinski wrote:
>>> Hi,
>>>
>>> From the viewpoint of the enduser we would like python Alignment
>>> object
>>> to behave outside as an array so we could get slices, columns,
>>> sequences, their fragments, whatever we want etc. The most intuitive
>>> and
>>> clear (certainly much better than not very clear indexes like
>>> [xxx:yyy:zzz]) for the user is the following.
>>>
>>> [A:B][X:Y] - general syntax of indices. This supports almost
>>> everything.
>>
>> I think Michiel and I were suggesting [A:B,X:Y] or rather
>> [A:B:C,X:Y:Z] to be fully general, rather than [A:B][X:Y] or
>> [A:B:C][X:Y:Z]
>>
>> i.e. [arg1, arg2] rather than [arg1][arg2]
>>
>> This is an important point, as in the first case the __getitem__
>> method of the alignment is called once (with both arguments). In the
>> second case, the __getitem__ method is called with arg1, and may
>> return a SeqRecord or an alignment - and this object's __getitem__
>> method is called with arg2.
>>
>> As written, many of your cases appear to be impossible - but using
>> the [arg1,arg2] we can get close.
>>
>> I've got a working bit of code put together now which I'll attached
>> to bug 1944 soon.
>>
>> http://bugzilla.open-bio.org/show_bug.cgi?id=1944
>>
>> Peter
From biopython-dev at maubp.freeserve.co.uk Fri Jul 27 17:11:03 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Fri, 27 Jul 2007 18:11:03 +0100
Subject: [Biopython-dev] Improving the Alignment object
In-Reply-To: <46A9FD84.4080502@genesilico.pl>
References: <46A9CD2E.6080402@genesilico.pl> <46A9E28D.40609@maubp.freeserve.co.uk>
<46A9FD84.4080502@genesilico.pl>
Message-ID: <46AA2727.103@maubp.freeserve.co.uk>
Jan Kosinski wrote:
> We had another discussion in the lab about whether the Alignment object
> should store records in a dictionary rather than a list (while still
> keeping information about sequence order). What is your reasoning for
> making the Alignment object a list of SeqRecord objects?
In a sense the Bio.Align.Generic.Alignment object always was a list of
SeqRecords (if you look at the internal implementation that is), and I
hadn't stopped to really question it. I like having list like behaviour
and exploit this in a lot of my code dealing with alignments.
There are some nice things about having dictionary-like behaviour in an
alignment class, but unless a notional sequence order is preserved, this
breaks the array-of-characters model.
Also, using a dictionary like alignment would force the user to specify
unique keys for each record (e.g. the record.id) which is something the
current list-like-alignment does not require.
Perhaps we could have a "dictionary like" sub class of Alignment where
the __getitem__ method would allow a record identifier in place of a row
index:
print aln["P3454"]
print aln["P3454", 20]
instead or as well as:
print aln[10]
print aln[10, 20]
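Here is a minimal sketch of the "dictionary like" subclass idea, written in modern Python 3. `Record`, `ListAlignment` and `DictLikeAlignment` are hypothetical stand-ins and are not the Bio.Align.Generic API. Each record holds only an `id` and a plain-string `seq`. Rows can be addressed by integer position or string id, but slices are not handled.

```python
class Record:
    """Hypothetical stand-in for a SeqRecord: an id plus a plain string."""

    def __init__(self, id, seq):
        self.id = id
        self.seq = seq


class ListAlignment:
    """List-like alignment: rows are looked up by integer position."""

    def __init__(self, records):
        self._records = list(records)

    def _row(self, key):
        return self._records[key]

    def __getitem__(self, index):
        if isinstance(index, tuple):
            # aln[row, col] -> a single character
            row, col = index
            return self._row(row).seq[col]
        # aln[row] -> the whole record
        return self._row(index)


class DictLikeAlignment(ListAlignment):
    """Also accepts a record id (a string) wherever a row index is allowed."""

    def _row(self, key):
        if isinstance(key, str):
            for record in self._records:
                if record.id == key:
                    return record
            raise KeyError(key)
        return super()._row(key)


aln = DictLikeAlignment([Record("P3454", "MKTAYIAK"), Record("Q9999", "MKSAYIAR")])
print(aln["P3454"].seq)  # MKTAYIAK
print(aln["Q9999", 2])   # S
print(aln[1, 2])         # S
```

Integers always mean row positions here, so a record whose id happened to be an integer could never be looked up by that id.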
> One should think carefully about the design of the Alignment class, as it
> will influence all further steps. Since the class is still in its infancy,
> now is a very good moment to think about what the Alignment class is for
> and what it should support.
I had viewed the new __getitem__ method as a backwards-compatible
enhancement of the existing stable (but rather limited)
Bio.Align.Generic.Alignment class. That's not to say we can't design a new
class from scratch - I just prefer gradual improvements that don't break
existing usage.
I am particularly keen to allow slicing of alignments. For example, you
could select the conserved core of an alignment by removing the leftmost
10 columns and the rightmost 10 columns:
align_core = aln[:,10:-10]
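The sketch below shows one way a tuple-aware `__getitem__` could support this kind of slicing. It is modern Python 3 and uses hypothetical stand-in classes, with plain strings in place of Seq objects. A single index reaches `__getitem__` as an int or a slice. `aln[rows, cols]` arrives as one tuple, which is why the `[A:B, X:Y]` form can be handled in a single call. Returning a plain string for `aln[2, :]` is one of the options debated in this thread, not settled behaviour.

```python
class Record:
    """Hypothetical stand-in for a SeqRecord: an id plus a plain string."""

    def __init__(self, id, seq):
        self.id = id
        self.seq = seq


class Alignment:
    def __init__(self, records):
        self._records = list(records)

    def __len__(self):
        return len(self._records)

    def __getitem__(self, index):
        if not isinstance(index, tuple):
            if isinstance(index, slice):
                # aln[a:b] -> a new alignment holding whole rows
                return Alignment(self._records[index])
            # aln[i] -> a single record
            return self._records[index]
        rows, cols = index
        if isinstance(rows, int):
            # aln[i, x] or aln[i, x:y] -> part of one row, as a string
            return self._records[rows].seq[cols]
        if isinstance(cols, int):
            # aln[a:b, x] -> one column across the selected rows, as a string
            return "".join(r.seq[cols] for r in self._records[rows])
        # aln[a:b, x:y] -> a new alignment with each selected row trimmed
        return Alignment(Record(r.id, r.seq[cols]) for r in self._records[rows])


aln = Alignment([Record("seq1", "ATCGTTGC"),
                 Record("seq2", "ATCCTTGC"),
                 Record("seq3", "ATCCGTGC")])
print(aln[:, 4])                       # TTG
print(aln[2, :])                       # ATCCGTGC
core = aln[:, 2:-2]                    # drop two columns at each end
print([r.seq for r in core._records])  # ['CGTT', 'CCTT', 'CCGT']
```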
> For instance, the Alignment object should
> support changing characters in the alignment without the need to copy
> it (using aln[a,x] = "D"). Can this be done now with an Alignment that is
> a list of SeqRecord objects with sequences implemented as immutable Seq
> objects?
No, right now you can't easily edit sequences in a Bio.Align.Generic.Alignment
(even with the proposed change), as it is implemented using immutable Seq
objects. I personally haven't needed to edit an alignment like this. Is
this something you want to do often?
To me the obvious way to handle this is to have a MutableAlignment
sub-class, where editing individual elements with aln[r,c] = "D" would
be supported (possibly implemented using the MutableSeq class internally
rather than the immutable Seq class).
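Here is a minimal Python 3 sketch of the MutableAlignment idea. As an assumption, each row is stored as a Python list of characters standing in for a MutableSeq. That way `aln[r, c] = "D"` changes one element in place without rebuilding the row. The class and its `row_string` helper are hypothetical.

```python
class MutableAlignment:
    """Hypothetical alignment whose rows are mutable lists of characters."""

    def __init__(self, sequences):
        # One list per row, standing in for a MutableSeq
        self._rows = [list(seq) for seq in sequences]

    def __getitem__(self, index):
        row, col = index
        return self._rows[row][col]

    def __setitem__(self, index, value):
        row, col = index
        if len(value) != 1:
            raise ValueError("expected a single character")
        # Edit in place: no copy of the row is made
        self._rows[row][col] = value

    def row_string(self, row):
        return "".join(self._rows[row])


aln = MutableAlignment(["ACDEF", "ACKEF"])
aln[1, 2] = "D"
print(aln.row_string(1))  # ACDEF
```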
On a related point, I was planning to raise the following suggestion in
the future - adding alignments, like this:
combined_aln = aln1 + aln2
For example, if aln1 had 5 rows of length 10 and aln2 had 5 rows of length
15, then the result of aln1+aln2 would have 5 rows of length 25.
Alignment addition would only be defined for alignments with the same
number of rows (perhaps also restricted to the same sequence type, and
row weights?). The result would contain the same number of rows, where
each sequence was the concatenation of the corresponding two rows in the
input alignments. I'd suggest concatenating the record.id's (if
different) however one could argue that it would be better to insist the
user had made sure the two alignments had consistent identifiers.
An example of where this could be used is taking alignments of multiple
sets of homologous genes, sorting them to use the same species order,
and then creating a concatenated alignment for robust phylogenetic tree
construction.
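The proposed `aln1 + aln2` could be sketched as an `__add__` method, as below. This is modern Python 3 with hypothetical stand-in classes. It enforces the same-number-of-rows rule and joins differing ids with an underscore; that separator is an arbitrary assumption. The alphabet and row-weight checks mentioned above are left out.

```python
class Record:
    """Hypothetical stand-in for a SeqRecord: an id plus a plain string."""

    def __init__(self, id, seq):
        self.id = id
        self.seq = seq


class Alignment:
    def __init__(self, records):
        self._records = list(records)

    def __len__(self):
        return len(self._records)

    def __add__(self, other):
        if len(self) != len(other):
            raise ValueError("alignments must have the same number of rows")
        combined = []
        for left, right in zip(self._records, other._records):
            # Keep a shared id; otherwise concatenate the two ids
            new_id = left.id if left.id == right.id else left.id + "_" + right.id
            combined.append(Record(new_id, left.seq + right.seq))
        return Alignment(combined)


species = ["human", "mouse", "rat", "dog", "cow"]
aln1 = Alignment(Record(s, "A" * 10) for s in species)
aln2 = Alignment(Record(s, "C" * 15) for s in species)
combined_aln = aln1 + aln2
print(len(combined_aln), len(combined_aln._records[0].seq))  # 5 25
```

Both inputs here already use the same species order, as the text assumes for the concatenated-tree use case.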
Peter
From mdehoon at c2b2.columbia.edu Sat Jul 28 02:57:05 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Sat, 28 Jul 2007 11:57:05 +0900
Subject: [Biopython-dev] syntax of indices for future Alignment object
In-Reply-To: <46A9FD84.4080502@genesilico.pl>
References: <46A9CD2E.6080402@genesilico.pl> <46A9E28D.40609@maubp.freeserve.co.uk>
<46A9FD84.4080502@genesilico.pl>
Message-ID: <46AAB081.30609@c2b2.columbia.edu>
Jan Kosinski wrote:
> Hi,
>
> Ok, I agree that [A:B][X:Y] syntax is not possible here. [A:B,X:Y] is
> fine. However, I would recommend not using [A:B:C,X:Y:Z], since it is not
> clear, whereas the [A:B][X:Y] syntax is clear and sufficient.
Python lists, tuples, and strings support [A:B:C], and Numerical Python
2D arrays support [A:B:C,X:Y:Z]. I don't see why the Alignment should
not support this format.
--Michiel.
From mdehoon at c2b2.columbia.edu Sat Jul 28 03:10:06 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Sat, 28 Jul 2007 12:10:06 +0900
Subject: [Biopython-dev] Improving the Alignment object
In-Reply-To: <46AA2727.103@maubp.freeserve.co.uk>
References: <46A9CD2E.6080402@genesilico.pl> <46A9E28D.40609@maubp.freeserve.co.uk> <46A9FD84.4080502@genesilico.pl>
<46AA2727.103@maubp.freeserve.co.uk>
Message-ID: <46AAB38E.50009@c2b2.columbia.edu>
Peter wrote:
> Perhaps we could have a "dictionary like" sub class of Alignment where
> the __getitem__ method would allow a record identifier in place of a row
> index:
>
> print aln["P3454"]
> print aln["P3454", 20]
>
> instead or as well as:
>
> print aln[10]
> print aln[10, 20]
"as well as" would break if a user decides to use an integer as a key in
the dictionary. A safer approach would be to define a method
specifically for dictionary-like access. Something like:
print aln[10]
print aln[10,20]
for list-like access, and
print aln.get("P3454")
for dictionary-like access.
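A separate lookup method might look like the Python 3 sketch below, again with hypothetical stand-in classes. Indexing stays purely positional, so a record whose id is the integer 10 cannot be confused with row 10. The optional `default` argument mirrors `dict.get` and is an assumption.

```python
class Record:
    """Hypothetical stand-in for a SeqRecord: an id plus a plain string."""

    def __init__(self, id, seq):
        self.id = id
        self.seq = seq


class Alignment:
    def __init__(self, records):
        self._records = list(records)

    def __getitem__(self, index):
        # List-like access only: index is always a row position
        return self._records[index]

    def get(self, record_id, default=None):
        # Dictionary-like access: look the record up by its id
        for record in self._records:
            if record.id == record_id:
                return record
        return default


aln = Alignment([Record(10, "ACGT"), Record("P3454", "ACGA")])
print(aln[1].id)           # P3454 (row position 1)
print(aln.get(10).seq)     # ACGT (record whose id is 10)
print(aln.get("missing"))  # None
```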
--Michiel.
From mdehoon at c2b2.columbia.edu Sat Jul 28 04:11:03 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Sat, 28 Jul 2007 13:11:03 +0900
Subject: [Biopython-dev] syntax of indices for future Alignment object
In-Reply-To: <46A9E28D.40609@maubp.freeserve.co.uk>
References: <46A9CD2E.6080402@genesilico.pl>
<46A9E28D.40609@maubp.freeserve.co.uk>
Message-ID: <46AAC1D7.8030208@c2b2.columbia.edu>
Peter wrote:
> I've got a working bit of code put together now, which I'll attach to
> bug 1944 soon.
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=1944
>
For the most part, I agree with the functionality in this patch. I have
three suggestions though:
>>> aln = Alignment(alphabet)
# Suggestion 1: We should allow creating an Alignment without specifying
# an alphabet
>>> aln.add_sequence("seq1", "ATCGTTGC")
>>> aln.add_sequence("seq2", "ATCCTTGC")
>>> aln.add_sequence("seq3", "ATCCGTGC")
>>> aln[0]
SeqRecord(seq=Seq('ATCGTTGC', Alphabet()), id='',
name='', description='seq1', dbxrefs=[])
# Suggestion 2: I would expect "seq1" as the id rather than the description
>>> aln[:2]
# OK
>>> aln[:,4]
'TTG'
# OK
>>> aln[2,:]
# Suggestion 3: Here, I would expect "ATCCGTGC" instead. An alignment
# consisting of a single sequence doesn't make much sense.
--Michiel.
From mdehoon at c2b2.columbia.edu Sat Jul 28 04:20:24 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Sat, 28 Jul 2007 13:20:24 +0900
Subject: [Biopython-dev] Improving the Alignment object
In-Reply-To: <46AA2727.103@maubp.freeserve.co.uk>
References: <46A9CD2E.6080402@genesilico.pl> <46A9E28D.40609@maubp.freeserve.co.uk> <46A9FD84.4080502@genesilico.pl>
<46AA2727.103@maubp.freeserve.co.uk>
Message-ID: <46AAC408.2050703@c2b2.columbia.edu>
Peter wrote:
>> For instance, the Alignment object should
>> support changing characters in the alignment without the need to copy
>> it (using aln[a,x] = "D"). Can this be done now with an Alignment that is
>> a list of SeqRecord objects with sequences implemented as immutable Seq
>> objects?
>
....
>
> To me the obvious way to handle this is to have a MutableAlignment
> sub-class, where editing individual elements with aln[r,c] = "D" would
> be supported (possibly implemented using the MutableSeq class internally
> rather than the immutable Seq class).
>
I don't think we'd need a separate MutableAlignment for that. An
Alignment is a list of sequences and is therefore mutable. If we add a
__setitem__ method to the Alignment class, then this method can take
care of constructing a new sequence and putting it in the appropriate row.
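That approach could look roughly like the Python 3 sketch below. It uses a hypothetical `Record` stand-in whose `seq` is an immutable Python string in place of a Seq object. `__setitem__` builds a new string with one character swapped and stores it back in the row.

```python
class Record:
    """Hypothetical stand-in for a SeqRecord: an id plus a plain string."""

    def __init__(self, id, seq):
        self.id = id
        self.seq = seq  # immutable str, standing in for a Seq object


class Alignment:
    def __init__(self, records):
        self._records = list(records)

    def __getitem__(self, index):
        row, col = index
        return self._records[row].seq[col]

    def __setitem__(self, index, value):
        row, col = index
        if len(value) != 1:
            raise ValueError("expected a single character")
        record = self._records[row]
        seq = record.seq
        if col < 0:
            col += len(seq)
        if not 0 <= col < len(seq):
            raise IndexError("column index out of range")
        # Strings cannot be edited, so construct a new one for this row
        record.seq = seq[:col] + value + seq[col + 1:]


aln = Alignment([Record("seq1", "ATCGTTGC")])
aln[0, 3] = "D"
print(aln._records[0].seq)  # ATCDTTGC
```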
--Michiel.
From biopython-dev at maubp.freeserve.co.uk Sat Jul 28 10:04:04 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Sat, 28 Jul 2007 11:04:04 +0100
Subject: [Biopython-dev] syntax of indices for future Alignment object
In-Reply-To: <46AAC1D7.8030208@c2b2.columbia.edu>
References: <46A9CD2E.6080402@genesilico.pl> <46A9E28D.40609@maubp.freeserve.co.uk>
<46AAC1D7.8030208@c2b2.columbia.edu>
Message-ID: <46AB1494.301@maubp.freeserve.co.uk>
Michiel de Hoon wrote:
> Peter wrote:
>> I've got a working bit of code put together now, which I'll attach to
>> bug 1944 soon.
>>
>> http://bugzilla.open-bio.org/show_bug.cgi?id=1944
>>
> For the most part, I agree with the functionality in this patch. I have
> three suggestions though:
>
> >>> aln = Alignment(alphabet)
> # Suggestion 1: We should allow creating an Alignment without specifying
> # an alphabet
That would mean changing the existing __init__ from:
def __init__(self, alphabet):
to something like:
def __init__(self, alphabet=single_letter_alphabet):
with this import statement added:
from Bio.Alphabet import single_letter_alphabet
This seems like a good idea, and shouldn't break any existing code either.
> >>> aln.add_sequence("seq1", "ATCGTTGC")
> >>> aln.add_sequence("seq2", "ATCCTTGC")
> >>> aln.add_sequence("seq3", "ATCCGTGC")
> >>> aln[0]
> SeqRecord(seq=Seq('ATCGTTGC', Alphabet()), id='',
> name='', description='seq1', dbxrefs=[])
> # Suggestion 2: I would expect "seq1" as the id rather than the description
I agree with you here - this is the historic behaviour of the
add_sequence method which actually creates a SeqRecord from the strings
it is given. I would suggest it populate the record.id but for backwards
compatibility still populate the record.description in case anyone is
still using that.
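The suggested change to `add_sequence` might look roughly like this. It is a hypothetical Python 3 sketch with a stand-in `Record` class, not the actual CVS change. The descriptor goes into `id` and is also copied into `description` for backwards compatibility.

```python
class Record:
    """Hypothetical stand-in for a SeqRecord."""

    def __init__(self, seq, id="", description=""):
        self.seq = seq
        self.id = id
        self.description = description


class Alignment:
    def __init__(self):
        self._records = []

    def add_sequence(self, descriptor, sequence):
        # New behaviour: the descriptor becomes the id, and it is still
        # stored as the description for older code that relies on it.
        self._records.append(Record(sequence, id=descriptor, description=descriptor))


aln = Alignment()
aln.add_sequence("seq1", "ATCGTTGC")
print(aln._records[0].id, aln._records[0].description)  # seq1 seq1
```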
We also could add an add_record method to the alignment object which
takes a SeqRecord, plus optional weight (and start and end?). Marc
Colosimo also made this point on bug 1944 (although I don't like his
mixed case method name).
> >>> aln[:2]
>
> # OK
> >>> aln[:,4]
> 'TTG'
> # OK
> >>> aln[2,:]
>
> # Suggestion 3: Here, I would expect "ATCCGTGC" instead. An alignment
> # consisting of a single sequence doesn't make much sense.
I'll have a closer look, but as aln[2] returns a single SeqRecord maybe
aln[2,:] should do that too - rather than returning a string.
Peter
From biopython-dev at maubp.freeserve.co.uk Sat Jul 28 13:14:43 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Sat, 28 Jul 2007 14:14:43 +0100
Subject: [Biopython-dev] Improving the Alignment object
In-Reply-To: <46AAC408.2050703@c2b2.columbia.edu>
References: <46A9CD2E.6080402@genesilico.pl> <46A9E28D.40609@maubp.freeserve.co.uk> <46A9FD84.4080502@genesilico.pl> <46AA2727.103@maubp.freeserve.co.uk>
<46AAC408.2050703@c2b2.columbia.edu>
Message-ID: <46AB4143.5070406@maubp.freeserve.co.uk>
Michiel de Hoon wrote:
> Peter wrote:
>>> For instance, the Alignment object should
>>> support changing characters in the alignment without the need to copy
>>> it (using aln[a,x] = "D"). Can this be done now with an Alignment that is
>>> a list of SeqRecord objects with sequences implemented as immutable Seq
>>> objects?
> ....
>> To me the obvious way to handle this is to have a MutableAlignment
>> sub-class, where editing individual elements with aln[r,c] = "D" would
>> be supported (possibly implemented using the MutableSeq class internally
>> rather than the immutable Seq class).
>>
> I don't think we'd need a separate MutableAlignment for that. An
> Alignment is a list of sequences and is therefore mutable. If we add a
> __setitem__ method to the Alignment class, then this method can take
> care of constructing a new sequence and put it in the appropriate row.
>
So rather than editing one character of a MutableSeq, we could replace
one immutable Seq object with a new immutable Seq object where one
character was different? That would work - sounds a little slow, but
certainly possible.
Peter
From mdehoon at c2b2.columbia.edu Sat Jul 28 15:15:49 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Sun, 29 Jul 2007 00:15:49 +0900
Subject: [Biopython-dev] syntax of indices for future Alignment object
In-Reply-To: <46AB1494.301@maubp.freeserve.co.uk>
References: <46A9CD2E.6080402@genesilico.pl> <46A9E28D.40609@maubp.freeserve.co.uk>
<46AAC1D7.8030208@c2b2.columbia.edu>
<46AB1494.301@maubp.freeserve.co.uk>
Message-ID: <46AB5DA5.6050604@c2b2.columbia.edu>
# Current method to add a row to the alignment:
>>> aln.add_sequence("seq1", "ATCGTTGC")
...
Peter wrote:
> We also could add an add_record method to the alignment object which
> takes a SeqRecord, plus optional weight (and start and end?). Marc
> Colosimo also made this point on bug 1944 (although I don't like his
> mixed case method name).
This is Marc Colosimo's suggestion for adding a SeqRecord:
def addSeqRecord(self, seqRec):
    """Add a Sequence Record to the Alignment
    @param seqRec: a sequence record (SeqRecord) to add.
    """
    if isinstance(seqRec, SeqRecord):
        self._records.append(seqRec)
    else:
        raise TypeError("sequence is NOT a SeqRecord Object")
Since an Alignment is essentially a list of SeqRecords, I propose that
we call the method to add a row to this list "append". In addition, this
method should be able to take a SeqRecord, a Seq object, or a plain
string. Something like this:
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Method of the Alignment class:
def append(self, sequence):
    if isinstance(sequence, SeqRecord):
        self._records.append(sequence)
    elif isinstance(sequence, Seq):
        self._records.append(SeqRecord(sequence))
    elif isinstance(sequence, str):
        self._records.append(SeqRecord(Seq(sequence)))
    else:
        raise TypeError("sequence should be a string, a Seq object, "
                        "or a SeqRecord object")
This method can be generalized to allow a descriptor, weight, start, and
end, just like in the current add_sequence method. Then we can replace
add_sequence and addSeqRecord with a single append method.
--Michiel.
From mdehoon at c2b2.columbia.edu Sat Jul 28 15:17:52 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Sun, 29 Jul 2007 00:17:52 +0900
Subject: [Biopython-dev] syntax of indices for future Alignment object
In-Reply-To: <46AB1494.301@maubp.freeserve.co.uk>
References: <46A9CD2E.6080402@genesilico.pl> <46A9E28D.40609@maubp.freeserve.co.uk>
<46AAC1D7.8030208@c2b2.columbia.edu>
<46AB1494.301@maubp.freeserve.co.uk>
Message-ID: <46AB5E20.5090605@c2b2.columbia.edu>
Peter wrote:
> Michiel de Hoon wrote:
>> >>> aln.add_sequence("seq1", "ATCGTTGC")
>> >>> aln[0]
>> SeqRecord(seq=Seq('ATCGTTGC', Alphabet()), id='',
>> name='', description='seq1', dbxrefs=[])
>> # Suggestion 2: I would expect "seq1" as the id rather than the
>> # description
>
> I agree with you here - this is the historic behaviour of the
> add_sequence method which actually creates a SeqRecord from the strings
> it is given. I would suggest it populate the record.id but for backwards
> compatibility still populate the record.description in case anyone is
> still using that.
>
That sounds good to me.
--Michiel.
From mdehoon at c2b2.columbia.edu Sat Jul 28 15:23:51 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Sun, 29 Jul 2007 00:23:51 +0900
Subject: [Biopython-dev] Improving the Alignment object
In-Reply-To: <46AB4143.5070406@maubp.freeserve.co.uk>
References: <46A9CD2E.6080402@genesilico.pl> <46A9E28D.40609@maubp.freeserve.co.uk> <46A9FD84.4080502@genesilico.pl> <46AA2727.103@maubp.freeserve.co.uk>
<46AAC408.2050703@c2b2.columbia.edu>
<46AB4143.5070406@maubp.freeserve.co.uk>
Message-ID: <46AB5F87.1090506@c2b2.columbia.edu>
Peter wrote:
> Michiel de Hoon wrote:
>> Peter wrote:
>>>> For instance, the Alignment object should
>>>> support changing characters in the alignment without the need to
>>>> copy it (using aln[a,x] = "D"). Can this be done now with an
>>>> Alignment that is a list of SeqRecord objects with sequences
>>>> implemented as immutable Seq objects?
>> ....
>>> To me the obvious way to handle this is to have a MutableAlignment
>>> sub-class, where editing individual elements with aln[r,c] = "D"
>>> would be supported (possibly implemented using the MutableSeq class
>>> internally rather than the immutable Seq class).
>>>
>> I don't think we'd need a separate MutableAlignment for that. An
>> Alignment is a list of sequences and is therefore mutable. If we add a
>> __setitem__ method to the Alignment class, then this method can take
>> care of constructing a new sequence and put it in the appropriate row.
>>
> So rather than editing one character of a MutableSeq, we could replace
> one immutable Seq object with a new immutable Seq object where one
> character was different? That would work - sounds a little slow, but
> certainly possible.
>
At first, I also thought that that would be slow, especially for long
sequences. But in practice, it's surprisingly fast. Unless somebody
wants to edit an alignment of chromosome-size sequences, we probably
won't run into a speed problem.
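To check the speed claim yourself, you could time the replace-one-character operation with the standard `timeit` module for a few sequence lengths. This is a minimal Python 3 sketch. No results are quoted, since timings depend on the machine.

```python
import timeit


def replace_char(seq, col, value):
    """Return a copy of seq with the character at col replaced by value."""
    return seq[:col] + value + seq[col + 1:]


def time_edit(length, repeats=100):
    """Average seconds per single-character edit on a string of this length."""
    seq = "A" * length
    total = timeit.timeit(lambda: replace_char(seq, length // 2, "D"), number=repeats)
    return total / repeats


if __name__ == "__main__":
    for length in (1_000, 100_000, 1_000_000):
        print(f"{length:>9} columns: {time_edit(length):.2e} s per edit")
```

Each edit copies the whole row, so the cost should grow roughly linearly with sequence length. That matches the caveat about chromosome-size sequences.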
--Michiel.
From biopython-dev at maubp.freeserve.co.uk Sat Jul 28 16:00:34 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Sat, 28 Jul 2007 17:00:34 +0100
Subject: [Biopython-dev] adding rows to an alignment object
In-Reply-To: <46AB5DA5.6050604@c2b2.columbia.edu>
References: <46A9CD2E.6080402@genesilico.pl> <46A9E28D.40609@maubp.freeserve.co.uk> <46AAC1D7.8030208@c2b2.columbia.edu> <46AB1494.301@maubp.freeserve.co.uk>
<46AB5DA5.6050604@c2b2.columbia.edu>
Message-ID: <46AB6822.6090706@maubp.freeserve.co.uk>
Michiel de Hoon wrote:
> Since an Alignment is essentially a list of SeqRecords, I propose that
> we call the method to add a row to this list "append".
Sounds very sensible.
> In addition, this method should be able to take a SeqRecord, a Seq
> object, or a plain string.
Do you really think we should complicate things like this? I would just
accept SeqRecord objects (with optional start/end/weight).
> Something like this:
>
> def append(self, sequence):
>     if isinstance(sequence, SeqRecord):
>         self._records.append(sequence)
>     elif isinstance(sequence, Seq):
>         self._records.append(SeqRecord(sequence))
>     elif isinstance(sequence, str):
>         self._records.append(SeqRecord(Seq(sequence)))
>     else:
>         raise TypeError("sequence should be a string, a Seq object, "
>                         "or a SeqRecord object")
One minor point - we should use the alignment's alphabet when building a
Seq object from a string. Perhaps we should even check the alphabet when
asked to append a SeqRecord or Seq object...
> This method can be generalized to allow a descriptor, weight, start,
> end, just like in the current add_sequence method.
Where the descriptor is expected for Seq and string input, and used as
the SeqRecord's id?
I would personally check that the length matches the rest of the alignment
(something the current add_sequence method doesn't do); otherwise it's
very easy to get a malformed alignment where some sequences are longer
than others.
Also, I would leave the existing .add_sequence() method in place, but
update its docstring to encourage use of .append() instead.
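The Python 3 sketch below combines these points. `append` takes a record, or a plain string as Michiel suggested. It refuses rows whose length differs from the existing ones. The old `add_sequence` becomes a thin wrapper. The `Record` class and `_length` helper are hypothetical, and the alphabet check and optional weight/start/end arguments are left out.

```python
class Record:
    """Hypothetical stand-in for a SeqRecord."""

    def __init__(self, seq, id=""):
        self.seq = seq
        self.id = id


class Alignment:
    def __init__(self):
        self._records = []

    def _length(self):
        # Length of the existing rows (0 for an empty alignment)
        return len(self._records[0].seq) if self._records else 0

    def append(self, record):
        if isinstance(record, str):
            record = Record(record)
        elif not isinstance(record, Record):
            raise TypeError("expected a Record or a string")
        if self._records and len(record.seq) != self._length():
            raise ValueError("sequence length does not match the alignment")
        self._records.append(record)

    def add_sequence(self, descriptor, sequence):
        """Kept for existing code; new code should call append() instead."""
        self.append(Record(sequence, id=descriptor))


aln = Alignment()
aln.append(Record("ATCGTTGC", id="seq1"))
aln.add_sequence("seq2", "ATCCTTGC")
try:
    aln.append("ATCC")  # too short, so it is rejected
except ValueError as err:
    print(err)
```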
Peter
From biopython-dev at maubp.freeserve.co.uk Sat Jul 28 15:49:11 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Sat, 28 Jul 2007 16:49:11 +0100
Subject: [Biopython-dev] syntax of indices for future Alignment object
In-Reply-To: <46AB5E20.5090605@c2b2.columbia.edu>
References: <46A9CD2E.6080402@genesilico.pl> <46A9E28D.40609@maubp.freeserve.co.uk> <46AAC1D7.8030208@c2b2.columbia.edu> <46AB1494.301@maubp.freeserve.co.uk>
<46AB5E20.5090605@c2b2.columbia.edu>
Message-ID: <46AB6577.6050708@maubp.freeserve.co.uk>
Michiel de Hoon wrote:
> Peter wrote:
>> Michiel de Hoon wrote:
>>> >>> aln.add_sequence("seq1", "ATCGTTGC")
>>> >>> aln[0]
>>> SeqRecord(seq=Seq('ATCGTTGC', Alphabet()), id='',
>>> name='', description='seq1', dbxrefs=[])
>>> # Suggestion 2: I would expect "seq1" as the id rather than the
>>> # description
>> I agree with you here - this is the historic behaviour of the
>> add_sequence method which actually creates a SeqRecord from the strings
>> it is given. I would suggest it populate the record.id but for backwards
>> compatibility still populate the record.description in case anyone is
>> still using that.
>>
> That sounds good to me.
Good. Done, CVS revision 1.6 of file Bio/Align/Generic.py
Peter
From kosa at genesilico.pl Sat Jul 28 16:53:04 2007
From: kosa at genesilico.pl (Jan Kosinski)
Date: Sat, 28 Jul 2007 18:53:04 +0200
Subject: [Biopython-dev] syntax of indices for future Alignment object
In-Reply-To: <46AAB081.30609@c2b2.columbia.edu>
References: <46A9CD2E.6080402@genesilico.pl> <46A9E28D.40609@maubp.freeserve.co.uk>
<46A9FD84.4080502@genesilico.pl> <46AAB081.30609@c2b2.columbia.edu>
Message-ID: <46AB7470.6010006@genesilico.pl>
Hi,
I just do not see what [A:B:C,X:Y:Z] adds to [A:B,X:Y] in the case of
alignments. Isn't [A:B,X:Y] sufficient?
Janek
Michiel de Hoon wrote:
> Jan Kosinski wrote:
>> Hi,
>>
>> Ok, I agree that [A:B][X:Y] syntax is not possible here. [A:B,X:Y] is
>> fine. However, I would recommend not using [A:B:C,X:Y:Z], since it is
>> not clear, whereas the [A:B][X:Y] syntax is clear and sufficient.
>
> Python lists, tuples, and strings support [A:B:C], and Numerical
> Python 2D arrays support [A:B:C,X:Y:Z]. I don't see why the Alignment
> should not support this format.
>
> --Michiel.
From kosa at genesilico.pl Sat Jul 28 16:55:33 2007
From: kosa at genesilico.pl (Jan Kosinski)
Date: Sat, 28 Jul 2007 18:55:33 +0200
Subject: [Biopython-dev] Improving the Alignment object
In-Reply-To: <46AAC408.2050703@c2b2.columbia.edu>
References: <46A9CD2E.6080402@genesilico.pl> <46A9E28D.40609@maubp.freeserve.co.uk> <46A9FD84.4080502@genesilico.pl>
<46AA2727.103@maubp.freeserve.co.uk>
<46AAC408.2050703@c2b2.columbia.edu>
Message-ID: <46AB7505.30302@genesilico.pl>
Hi,
I agree: an alignment should be mutable, and there is no need to make
two classes, one mutable and one immutable.
Janek
Michiel de Hoon wrote:
> Peter wrote:
>>> For instance, the Alignment object should
>>> support changing characters in the alignment without the need to
>>> copy it (using aln[a,x] = "D"). Can this be done now with an
>>> Alignment that is a list of SeqRecord objects with sequences
>>> implemented as immutable Seq objects?
>>
> ....
>>
>> To me the obvious way to handle this is to have a MutableAlignment
>> sub-class, where editing individual elements with aln[r,c] = "D"
>> would be supported (possibly implemented using the MutableSeq class
>> internally rather than the immutable Seq class).
>>
> I don't think we'd need a separate MutableAlignment for that. An
> Alignment is a list of sequences and is therefore mutable. If we add a
> __setitem__ method to the Alignment class, then this method can take
> care of constructing a new sequence and put it in the appropriate row.
>
> --Michiel.
From mdehoon at c2b2.columbia.edu Sun Jul 29 04:38:28 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Sun, 29 Jul 2007 13:38:28 +0900
Subject: [Biopython-dev] syntax of indices for future Alignment object
In-Reply-To: <46AB7470.6010006@genesilico.pl>
References: <46A9CD2E.6080402@genesilico.pl> <46A9E28D.40609@maubp.freeserve.co.uk>
<46A9FD84.4080502@genesilico.pl> <46AAB081.30609@c2b2.columbia.edu>
<46AB7470.6010006@genesilico.pl>
Message-ID: <46AC19C4.1000102@c2b2.columbia.edu>
Jan Kosinski wrote:
> I just do not see what [A:B:C,X:Y:Z] adds to [A:B,X:Y] in the case of
> alignments. Isn't [A:B,X:Y] sufficient?
>
[A:B,X:Y] may be sufficient, but does not agree with Python indices for
other objects (lists, tuples, strings). In addition, since allowing
[A:B,X:Y] only is different from usual Python usage, we'd actually end
up writing more code to specifically disallow [A:B:C,X:Y:Z].
Note also that [A:B:C,X:Y:Z] includes [A:B,X:Y] as a special case. So if
the Alignment class is written to deal with [A:B:C,X:Y:Z], but I told
you that it expects [A:B,X:Y], you wouldn't notice any difference,
until you tried [A:B:C,X:Y:Z] and found out that that works too.
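Here is a small Python 3 sketch of this point. If `__getitem__` simply passes the slice objects it receives to the underlying rows, steps work without any extra code. Forbidding them takes explicit checks. `block`, `block_without_steps` and this `Alignment` class are hypothetical illustrations on plain strings, and only slices are handled for the row index.

```python
def block(rows, index):
    """Select rows[A:B:C], then cut each selected row with [X:Y:Z]."""
    row_key, col_key = index
    # Python's own slice objects are passed straight through
    return [seq[col_key] for seq in rows[row_key]]


def block_without_steps(rows, index):
    """Same as block(), but rejects [A:B:C, X:Y:Z]: extra code is needed."""
    for key in index:
        if isinstance(key, slice) and key.step not in (None, 1):
            raise ValueError("steps are not supported")
    return block(rows, index)


class Alignment:
    def __init__(self, rows):
        self.rows = list(rows)

    def __getitem__(self, index):
        # Expects two indices, as in aln[A:B, X:Y] or aln[A:B:C, X:Y:Z]
        return block(self.rows, index)


aln = Alignment(["ATCGTTGC", "ATCCTTGC", "ATCCGTGC"])
print(aln[0:2, 1:4])  # ['TCG', 'TCC']
print(aln[::2, ::2])  # ['ACTG', 'ACGG']
```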
--Michiel.