From bugzilla-daemon at portal.open-bio.org Sun Jul 1 01:54:55 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 1 Jul 2007 01:54:55 -0400
Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64
In-Reply-To:
Message-ID: <200707010554.l615stgK032500@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2323

mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |

------- Comment #17 from mdehoon at ims.u-tokyo.ac.jp 2007-07-01 01:54 EST -------
Sorry for the mistake. With the code for Python >= 2.4 separately, we still get
an error message when installing Biopython, because Python attempts to
byte-compile each module. It is not so serious, because this error is otherwise
ignored. However, how about this code for Python >= 2.4:

    from itertools import cycle, imap
    return sum(imap(lambda n,c: n*ord(c.upper()), cycle(range(1,58)),seq)) % 10000

It is almost as fast as the code you now have for Python >= 2.4, but avoids
having to create a separate module gcg24.py.

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Sun Jul 1 07:02:47 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 1 Jul 2007 07:02:47 -0400
Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64
In-Reply-To:
Message-ID: <200707011102.l61B2lHg029279@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2323

------- Comment #18 from mdehoon at ims.u-tokyo.ac.jp 2007-07-01 07:02 EST -------
Btw, I am finding that the code for Python < 2.3 is faster than the code for Python >= 2.4.
The former uses more memory, as it makes a copy of seq, but even if we avoid copying seq, I still find that it is faster than the code for Python >= 2.4. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From mdehoon at c2b2.columbia.edu Sun Jul 1 08:01:00 2007 From: mdehoon at c2b2.columbia.edu (Michiel de Hoon) Date: Sun, 01 Jul 2007 21:01:00 +0900 Subject: [Biopython-dev] TempFastaWriter, TempFastaWriterSingle in Bio/GFF/easy.py In-Reply-To: <320fb6e00706301214p41c33329o98126738d77fad19@mail.gmail.com> References: <4685FCCA.4090904@c2b2.columbia.edu> <320fb6e00706301214p41c33329o98126738d77fad19@mail.gmail.com> Message-ID: <4687977C.70903@c2b2.columbia.edu> Peter wrote: >> Is anybody using the classes TempFastaWriter, TempFastaWriterSingle in >> Bio/GFF/easy.py? They are currently using the old Fasta writer in >> Bio.SeqIO.FASTA instead of the new one in Bio.SeqIO.FastaIO. We can >> either update them to use the new Fasta writer, or simply remove them, >> since currently these classes are not used anywhere in Biopython. > > This is for Bug 2284 right? > http://bugzilla.open-bio.org/show_bug.cgi?id=2284 > > I'm inclined to remove classes TempFastaWriter and TempFastaWriterSingle > Actually I hadn't noticed bug 2284. I looked into this because the Biopython tests are causing DeprecationWarnings. If no users of these classes step forward, I am in favor of removing them. --Michiel. 
From bugzilla-daemon at portal.open-bio.org Sun Jul 1 10:13:29 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 1 Jul 2007 10:13:29 -0400 Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64 In-Reply-To: Message-ID: <200707011413.l61EDTF3012907@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2323 ------- Comment #19 from sbassi at gmail.com 2007-07-01 10:13 EST ------- (In reply to comment #18) > Btw, I am finding that the code for Python < 2.3 is faster than the code for > Python >= 2.4. The former uses more memory, as it makes a copy of seq, but even > if we avoid copying seq, I still find that it is faster than the code for > Python >= 2.4. OK, so leave it w/o the check for python version and use just the 2.3 code. Best, SB. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sun Jul 1 18:38:55 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 1 Jul 2007 18:38:55 -0400 Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64 In-Reply-To: Message-ID: <200707012238.l61Mct1k007379@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2323 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Status|REOPENED |RESOLVED Resolution| |FIXED ------- Comment #20 from mdehoon at ims.u-tokyo.ac.jp 2007-07-01 18:38 EST ------- Updated in CVS, using the 2.3 code without copying seq. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
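
For readers following the Bug 2323 thread, here is a minimal sketch of the weighted-sum checksum that the one-liner in comment #17 computes, written for Python 3 (where `imap` no longer exists and a generator expression with `zip` does the same job). The function name `gcg_checksum` is an assumption made for illustration. This is not the code committed to CVS, which per comment #20 uses the 2.3 variant that is not shown in the thread.

```python
from itertools import cycle


def gcg_checksum(seq):
    """Weighted sum of upper-cased character codes, modulo 10000.

    The weights run 1, 2, ..., 57 and then wrap around to 1 again,
    mirroring cycle(range(1,58)) in the one-liner from comment #17.
    """
    weights = cycle(range(1, 58))
    # zip stops as soon as seq is exhausted, so the endless cycle is safe.
    return sum(w * ord(c.upper()) for w, c in zip(weights, seq)) % 10000


print(gcg_checksum("AAAAAAA"))
print(gcg_checksum("AAANAAA"))
```

Because the function only iterates over `seq`, it does not copy the sequence, and it should work for any iterable of single characters, not just plain strings.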
From bugzilla-daemon at portal.open-bio.org Sun Jul 1 19:42:14 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 1 Jul 2007 19:42:14 -0400 Subject: [Biopython-dev] [Bug 2327] New: test_Cluster takes too long Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2327 Summary: test_Cluster takes too long Product: Biopython Version: 1.43 Platform: Other OS/Version: Linux Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: idoerg at burnham.org When running the biopython test suite, test_Cluster takes too long. I gave up after 2 minutes. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sun Jul 1 19:55:34 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 1 Jul 2007 19:55:34 -0400 Subject: [Biopython-dev] [Bug 2327] test_Cluster takes too long In-Reply-To: Message-ID: <200707012355.l61NtYcR012177@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2327 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |DUPLICATE ------- Comment #1 from mdehoon at ims.u-tokyo.ac.jp 2007-07-01 19:55 EST ------- *** This bug has been marked as a duplicate of bug 2268 *** -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Sun Jul 1 19:55:36 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 1 Jul 2007 19:55:36 -0400 Subject: [Biopython-dev] [Bug 2268] Cluster unit test suite runs indefinitely In-Reply-To: Message-ID: <200707012355.l61NtaNW012196@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2268 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |idoerg at gmail.com ------- Comment #22 from mdehoon at ims.u-tokyo.ac.jp 2007-07-01 19:55 EST ------- *** Bug 2327 has been marked as a duplicate of this bug. *** -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Jul 3 07:03:40 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 3 Jul 2007 07:03:40 -0400 Subject: [Biopython-dev] [Bug 2328] New: NCBIStandalone.blastall chokes on integer argument Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2328 Summary: NCBIStandalone.blastall chokes on integer argument Product: Biopython Version: 1.43 Platform: PC OS/Version: Linux Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: grunberg at embl.de CC: grunberg at embl.de Unlike previous versions, the current NCBIStandalone.blastall and blastpgp expect that the argument align_view is given as a string rather than an integer. 
So the following call worked with previous versions but now fails::

    results, err = NCBIStandalone.blastall( settings.blast_bin,
                                            method, db, seqFile,
                                            expectation=e,
                                            align_view=7, ## XML output
                                            **kw)

The error is raised here::

    NCBIStandalone: 1788 (blastall)
        w, r, e = os.popen3(" ".join([blastcmd] + params))

because align_view escapes the str conversion of the other parameters in this
line::

    params.extend([att2param['align_view'], align_view])

This line should rather look like this::

    params.extend([att2param['align_view'], str(align_view)])

I am going to attach a patch to this bugreport.

Greetings,
Raik

From bugzilla-daemon at portal.open-bio.org Tue Jul 3 07:05:37 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 3 Jul 2007 07:05:37 -0400
Subject: [Biopython-dev] [Bug 2328] NCBIStandalone.blastall chokes on integer argument
In-Reply-To:
Message-ID: <200707031105.l63B5bAP013190@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2328

------- Comment #1 from grunberg at embl.de 2007-07-03 07:05 EST -------
Created an attachment (id=698)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=698&action=view)
patch for Bug 2328 (NCBIStandalone.blastall / blastpgp)

The patch is described in my bug report.

Cheers,
Raik
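
The failure described in Bug 2328 comes down to `" ".join(...)` refusing non-string items. The sketch below illustrates that and the `str()` fix from the report. It is a standalone illustration, not the NCBIStandalone code: the helper `build_params` and the contents of the `att2param` mapping are assumptions chosen for the example.

```python
def build_params(att2param, **options):
    """Turn keyword options into a flat list of command-line tokens.

    Every value goes through str(), so integers such as align_view=7
    cannot reach " ".join() unconverted.
    """
    params = []
    for name, value in options.items():
        params.extend([att2param[name], str(value)])
    return params


# Hypothetical option-name to flag mapping, for illustration only.
att2param = {"expectation": "-e", "align_view": "-m"}

params = build_params(att2param, expectation=0.001, align_view=7)
command = " ".join(["blastall"] + params)
print(command)

# Without the str() conversion the join itself raises TypeError.
try:
    " ".join(["-m", 7])
except TypeError as err:
    print("join failed:", err)
```

Converting at the point where the list is built keeps the calling convention unchanged, so callers can pass either `align_view=7` or `align_view="7"`.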
From bugzilla-daemon at portal.open-bio.org Tue Jul 3 19:26:15 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 3 Jul 2007 19:26:15 -0400 Subject: [Biopython-dev] [Bug 2328] NCBIStandalone.blastall chokes on integer argument In-Reply-To: Message-ID: <200707032326.l63NQFBB022873@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2328 ------- Comment #2 from mdehoon at ims.u-tokyo.ac.jp 2007-07-03 19:26 EST ------- > Unlike previous versions, the current NCBIStandalone.blastall and blastpgp > expect that the argument align_view is given as a string rather than an > integer. So the following call worked with previous versions but now fails:: In which previous version of Biopython did this work? Your patch looks fine, but I'd like to find out how this bug entered Biopython. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Jul 5 09:30:32 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 5 Jul 2007 09:30:32 -0400 Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64 In-Reply-To: Message-ID: <200707051330.l65DUW2k004459@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2323 ------- Comment #21 from dalloliogm at gmail.com 2007-07-05 09:30 EST ------- (In reply to comment #1) > Created an attachment (id=689) --> (http://bugzilla.open-bio.org/attachment.cgi?id=689&action=view) [details] > Proposed functions (CRC64 and GCG checksum) > > This could be in utils.py, but I am not sure. Maybe it could be useful to add a 'GCG checksum' attribute to the BioPython Seq object. 
Checksums could be used to quickly compare if two sequences are the same; but in the documentation you should state very clearly that two sequences which differ even for a single symbol (ex. AAANAAA and AAAAAAA) have different values. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sat Jul 7 05:28:56 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 7 Jul 2007 05:28:56 -0400 Subject: [Biopython-dev] [Bug 2328] NCBIStandalone.blastall chokes on integer argument In-Reply-To: Message-ID: <200707070928.l679SuTJ010432@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2328 ------- Comment #3 from grunberg at embl.de 2007-07-07 05:28 EST ------- (In reply to comment #2) > > Unlike previous versions, the current NCBIStandalone.blastall and blastpgp > > expect that the argument align_view is given as a string rather than an > > integer. So the following call worked with previous versions but now fails:: > > In which previous version of Biopython did this work? Your patch looks fine, > but I'd like to find out how this bug entered Biopython. > Sorry about the late reply... My previous Biopython installation (which didn't have the glitch) was version 1.42. Greetings Raik -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Sun Jul 8 00:20:12 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 8 Jul 2007 00:20:12 -0400 Subject: [Biopython-dev] [Bug 2328] NCBIStandalone.blastall chokes on integer argument In-Reply-To: Message-ID: <200707080420.l684KCSq031646@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2328 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #4 from mdehoon at ims.u-tokyo.ac.jp 2007-07-08 00:20 EST ------- Fixed in CVS (see biopython/Bio/Blast/NCBIStandalone.py revision 1.68). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From chengsoon.ong at tuebingen.mpg.de Mon Jul 9 06:15:50 2007 From: chengsoon.ong at tuebingen.mpg.de (Cheng Soon Ong) Date: Mon, 9 Jul 2007 12:15:50 +0200 Subject: [Biopython-dev] Bio.Blast.NCBIWWW.qblast Message-ID: Hi, I've just written a small extension to the qblast function. The current version of only passes a subset of parameters to NCBI. I've just written some code such that it passes all the parameters that the qblast API at NCBI accepts. Is anyone interested to merge this into the blast module of Biopython? Sorry, I do not know the protocol here for getting code into Biopython. Cheng From mdehoon at c2b2.columbia.edu Mon Jul 9 07:40:23 2007 From: mdehoon at c2b2.columbia.edu (Michiel de Hoon) Date: Mon, 09 Jul 2007 20:40:23 +0900 Subject: [Biopython-dev] Bio.Blast.NCBIWWW.qblast In-Reply-To: References: Message-ID: <46921EA7.2080106@c2b2.columbia.edu> Dear Cheng, Thank you for your contribution. 
The "official" way to contribute code to Biopython is to go to
http://bugzilla.open-bio.org/, open a new bug report, and add your code
to it. For your qblast code, you can also just send it to me (not to the
list), then I can merge it into Biopython.

--Michiel.

Cheng Soon Ong wrote:
> Hi,
>
> I've just written a small extension to the qblast function. The
> current version of only passes a subset of parameters to NCBI. I've
> just written some code such that it passes all the parameters that
> the qblast API at NCBI accepts.
>
> Is anyone interested to merge this into the blast module of
> Biopython? Sorry, I do not know the protocol here for getting code
> into Biopython.
>

From biopython-dev at maubp.freeserve.co.uk Tue Jul 10 15:31:55 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Tue, 10 Jul 2007 20:31:55 +0100
Subject: [Biopython-dev] FDist: more Population Genetics code
In-Reply-To: <6d941f120701030516m1adb3daeh6e4645121ba8679d@mail.gmail.com>
References: <6d941f120701030516m1adb3daeh6e4645121ba8679d@mail.gmail.com>
Message-ID: <4693DEAB.8000900@maubp.freeserve.co.uk>

Hi Tiago,

Have you had any feedback (off the mailing list)?

Ralph - did you have a chance to look over Tiago's code or discuss this
with him?

It would be a shame if nothing came from this...

Peter

Tiago Antão wrote:
> Hi!
>
> I have submitted another enhancement bug, with support for FDist. It
> allows to generate and parse Fdist files and to control fdist
> applications. There are also a couple of utility functions. FDist is a
> niche application (mainly used to detect selection in animal
> genetics). Not the most fundamental one to support, but it is
> currently one that I am working on, thus, the code.
>
> Regarding my submitted code for GenePop, I have submitted a different
> version on bugzilla. The main difference, is that I moved everything
> from Bio to Bio.PopGen.
> > Before I continue putting code on bugzilla I would like to know if it > is worthwhile doing it... Any opinions on the code submitted or if any > changes are required? I would really like to continue converting my > code to BioPython, but only if it has any possibility of ending up > being useful/included in distribution somewhere in the future... ;) > > I am currently working on code related to SimCoal2, Arlequin and > general statistics (Fst, heterozygosity, ...). Which will probably be > ready quite soon (ie, next two weeks). This is more mainstream than > FDist > > I have some other code lying around mainly related to HapMap, but I > will only submit it after reviewing and reusing it again. This is more > distant future ... like a couple of months. > > Tiago From biopython-dev at maubp.freeserve.co.uk Tue Jul 10 17:12:44 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Tue, 10 Jul 2007 22:12:44 +0100 Subject: [Biopython-dev] FDist: more Population Genetics code In-Reply-To: References: <6d941f120701030516m1adb3daeh6e4645121ba8679d@mail.gmail.com> <4693DEAB.8000900@maubp.freeserve.co.uk> Message-ID: <4693F64C.9050403@maubp.freeserve.co.uk> Ralph Haygood wrote: > Peter, > > I haven't received any code from Tiago to review. > > Ralph He's put some on Bugzilla: http://bugzilla.open-bio.org/show_bug.cgi?id=2170 Peter From rhaygood at duke.edu Tue Jul 10 23:45:56 2007 From: rhaygood at duke.edu (Ralph Haygood) Date: Tue, 10 Jul 2007 23:45:56 -0400 (EDT) Subject: [Biopython-dev] FDist: more Population Genetics code In-Reply-To: <4693DEAB.8000900@maubp.freeserve.co.uk> References: <6d941f120701030516m1adb3daeh6e4645121ba8679d@mail.gmail.com> <4693DEAB.8000900@maubp.freeserve.co.uk> Message-ID: Peter and Tiago, Hello. No, I haven't done anything with Tiago's code. I'm afraid it's pretty far from what I'm working on these days. 
I still think it would be good for BioPython to include methods for computing basic population-genetical statistics (Watterson's theta, Tajima's D, etc.) from DNA alignments. I have in mind something like BioPerl's PopGen (http://www.bioperl.org/wiki/HOWTO:PopGen). My own code is easy to use with a Bio.Align.Generic.Alignment, but it doesn't conform to BioPython's standards for style, testing, or documentation, and I don't know when I'll have time to standardize it. Ralph On Tue, 10 Jul 2007, Peter wrote: > Hi Tiago, > > Have you had any feedback (off the mailing list)? > > Ralph - did you have a chance to look over Tiago's code or discuss this with > him? > > It would be a shame if nothing came from this... > > Peter > > Tiago Ant?o wrote: >> Hi! >> >> I have submitted another enhancement bug, with support for FDist. It >> allows to generate and parse Fdist files and to control fdist >> applications. There are also a couple of utility functions. FDist is a >> niche application (mainly used to detect selection in animal >> genetics). Not the most fundamental one to support, but it is >> currently one that I am working on, thus, the code. >> >> Regarding my summited code for GenePop, I have summited a different >> version on bugzilla. The main difference, is that I moved everything >> from Bio to Bio.PopGen. >> >> Before I continue putting code on bugzilla I would like to know if it >> is worthwhile doing it... Any opinions on the code submitted or if any >> changes are required? I would really like to continue converting my >> code to BioPython, but only if it has any possibility of ending up >> being useful/included in distribution somewhere in the future... ;) >> >> I am currently working on code related to SimCoal2, Arlequin and >> general statistics (Fst, heterozygosity, ...). Which will probably be >> ready quite soon (ie, next two weeks). 
This is more mainstream than
>> FDist
>>
>> I have some other code lying around mainly related to HapMap, but I
>> will only submit it after reviewing and reusing it again. This is more
>> distant future ... like a couple of months.
>>
>> Tiago
>
>
>

From tiagoantao at gmail.com Wed Jul 11 06:05:21 2007
From: tiagoantao at gmail.com (Tiago Antão)
Date: Wed, 11 Jul 2007 12:05:21 +0200
Subject: [Biopython-dev] FDist: more Population Genetics code
In-Reply-To:
References: <6d941f120701030516m1adb3daeh6e4645121ba8679d@mail.gmail.com> <4693DEAB.8000900@maubp.freeserve.co.uk>
Message-ID: <6d941f120707110305n75c75e77y7426280477cf23ca@mail.gmail.com>

Hi,

I had no feedback and it seemed that there was no interest, so I
decided to start a Python Population Genetics project on Google, which
is going ahead but is still at the alpha stage:
http://code.google.com/p/pypopgen/
I am doing this on a personal basis for now (I did not even announce
it anywhere), so it is advancing at my personal pace and is designed
according to my needs.
I have already used it (or a tiny part of it) in a published
application ( http://popgen.eu/soft/m4s2 ).
I am still willing to integrate this into BioPython, but for that some
interest and feedback would be needed... That would have to happen
fairly soon, as the code will have to be adapted to BioPython
standards and namespace; once there is a lot of code, that will be
difficult in practice (and after going public it will really be
impossible).

The "strangest" code that I am writing (and that would need more
discussion) does asynchronous computation (to be easy to use on
multicore computers and grids).

Regards,
Tiago

On 7/11/07, Ralph Haygood wrote:
> Peter and Tiago,
>
> Hello. No, I haven't done anything with Tiago's code. I'm afraid
> it's pretty far from what I'm working on these days.
> > I still think it would be good for BioPython to include methods for > computing basic population-genetical statistics (Watterson's theta, > Tajima's D, etc.) from DNA alignments. I have in mind something like > BioPerl's PopGen (http://www.bioperl.org/wiki/HOWTO:PopGen). My own > code is easy to use with a Bio.Align.Generic.Alignment, but it doesn't > conform to BioPython's standards for style, testing, or documentation, > and I don't know when I'll have time to standardize it. > > Ralph > > On Tue, 10 Jul 2007, Peter wrote: > > > Hi Tiago, > > > > Have you had any feedback (off the mailing list)? > > > > Ralph - did you have a chance to look over Tiago's code or discuss this with > > him? > > > > It would be a shame if nothing came from this... > > > > Peter > > > > Tiago Ant?o wrote: > >> Hi! > >> > >> I have submitted another enhancement bug, with support for FDist. It > >> allows to generate and parse Fdist files and to control fdist > >> applications. There are also a couple of utility functions. FDist is a > >> niche application (mainly used to detect selection in animal > >> genetics). Not the most fundamental one to support, but it is > >> currently one that I am working on, thus, the code. > >> > >> Regarding my summited code for GenePop, I have summited a different > >> version on bugzilla. The main difference, is that I moved everything > >> from Bio to Bio.PopGen. > >> > >> Before I continue putting code on bugzilla I would like to know if it > >> is worthwhile doing it... Any opinions on the code submitted or if any > >> changes are required? I would really like to continue converting my > >> code to BioPython, but only if it has any possibility of ending up > >> being useful/included in distribution somewhere in the future... ;) > >> > >> I am currently working on code related to SimCoal2, Arlequin and > >> general statistics (Fst, heterozygosity, ...). Which will probably be > >> ready quite soon (ie, next two weeks). 
This is more mainstream than
> >> FDist
> >>
> >> I have some other code lying around mainly related to HapMap, but I
> >> will only submit it after reviewing and reusing it again. This is more
> >> distant future ... like a couple of months.
> >>
> >> Tiago
> >
> >
> >

From bugzilla-daemon at portal.open-bio.org Fri Jul 13 07:08:07 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 13 Jul 2007 07:08:07 -0400
Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64
In-Reply-To:
Message-ID: <200707131108.l6DB87xm027778@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2323

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |biopython-
                   |                            |bugzilla at maubp.freeserve.co.
                   |                            |uk
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |

------- Comment #22 from biopython-bugzilla at maubp.freeserve.co.uk 2007-07-13 07:08 EST -------
I've started to write a test case for the code now in Bio/SeqUtils/CheckSum.py
and noticed that while crc64, gcg and seguid will cope with both strings and
Seq objects, crc32 will only cope with strings.

Any objections to me fixing this like so:

Old:

    from binascii import crc32

New:

    from binascii import crc32 as _crc32
    def crc32(seq) :
        """Returns the crc32 checksum for a sequence (string or Seq object)"""
        try :
            #Assume its a Seq object
            return _crc32(seq.tostring())
        except AttributeError :
            #Assume its a string
            return _crc32(seq)

-- Peter
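
As a companion to the wrapper proposed in comment #22, here is a hedged Python 3 sketch of the same idea. It differs from the proposal in two ways that are choices made for this example: it converts with `str()` rather than calling `seq.tostring()`, and it encodes to bytes because `binascii.crc32` in Python 3 only accepts bytes-like input. The `FakeSeq` class is a hypothetical stand-in for a Seq-like object and is not part of Biopython.

```python
from binascii import crc32 as _crc32


def crc32(seq):
    """Return the CRC32 checksum for a sequence.

    Accepts bytes, a plain string, or any object whose str() gives the
    sequence letters (an assumption about Seq-like objects).
    """
    if isinstance(seq, (bytes, bytearray)):
        data = bytes(seq)
    else:
        data = str(seq).encode("ascii")
    return _crc32(data)


class FakeSeq:
    """Hypothetical minimal stand-in for a sequence object."""

    def __init__(self, data):
        self.data = data

    def __str__(self):
        return self.data


print(crc32("ACGT") == crc32(FakeSeq("ACGT")))
```

Checking the type up front, instead of catching `AttributeError`, avoids masking an unrelated `AttributeError` raised inside the conversion itself.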
From bugzilla-daemon at portal.open-bio.org Fri Jul 13 07:18:30 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 13 Jul 2007 07:18:30 -0400 Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64 In-Reply-To: Message-ID: <200707131118.l6DBIUOS028425@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2323 ------- Comment #23 from biopython-bugzilla at maubp.freeserve.co.uk 2007-07-13 07:18 EST ------- Created an attachment (id=703) --> (http://bugzilla.open-bio.org/attachment.cgi?id=703&action=view) Initial unit test for Bio/SeqUtils/CheckSum If the crc32 function could accept a Seq object then the "try/except" at the end isn't needed. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Jul 13 10:38:52 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 13 Jul 2007 10:38:52 -0400 Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64 In-Reply-To: Message-ID: <200707131438.l6DEcqn5008339@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2323 ------- Comment #24 from mdehoon at ims.u-tokyo.ac.jp 2007-07-13 10:38 EST ------- A better solution would be for Seq to inherit from str, instead of Seq having str as a member. Then we don't have to modify crc32, and other code in Biopython will also become simpler. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Fri Jul 13 11:17:59 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 13 Jul 2007 11:17:59 -0400
Subject: [Biopython-dev] [Bug 2269] Shebang (hashbang) lines need cleanup
In-Reply-To:
Message-ID: <200707131517.l6DFHxY4010647@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2269

------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2007-07-13 11:17 EST -------
I have just fixed a few in CVS; here is a list of the remaining abnormal
shebang/hashbang lines:

biopython/Bio/EUtils/POM.py '#!/usr/bin/python -i\n'
biopython/Bio/EUtils/DTDs/LinkOut.py '#!/usr/bin/python\n'
biopython/Bio/EUtils/DTDs/__init__.py '#!/usr/bin/python\n'
biopython/Bio/EUtils/DTDs/eInfo_020511.py '#!/usr/bin/python\n'
biopython/Bio/EUtils/DTDs/eLink_020511.py '#!/usr/bin/python\n'
biopython/Bio/EUtils/DTDs/ePost_020511.py '#!/usr/bin/python\n'
biopython/Bio/EUtils/DTDs/eSearch_020511.py '#!/usr/bin/python\n'
biopython/Bio/EUtils/DTDs/eSummary_020511.py '#!/usr/bin/python\n'

The biopython/Bio/EUtils/*.py examples are interesting in that many of those
files are autogenerated from DTD files (using the dtd2py.py script I think -
but it doesn't seem to work on all of them).

Also, I don't think all the files under Bio/Restriction/*.py need a shebang,
and a large proportion of the unit tests have shebangs (but less than half).
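
A listing like the one in comment #7 could be produced with a short script along the following lines. This is only a sketch: the thread does not say how the list was generated, and the `expected` shebang used as the "normal" form is an assumption, as are the function names.

```python
import os


def find_shebangs(root):
    """Yield (path, first_line) for each .py file under root starting with '#!'."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(".py"):
                continue
            path = os.path.join(dirpath, name)
            # Read bytes so odd encodings in old files cannot break the scan.
            with open(path, "rb") as handle:
                first = handle.readline()
            if first.startswith(b"#!"):
                yield path, first.decode("ascii", "replace")


def abnormal_shebangs(root, expected="#!/usr/bin/env python\n"):
    """Return the shebang lines under root that differ from the expected form."""
    # The default for `expected` is an assumption, not taken from the thread.
    return [(path, line) for path, line in find_shebangs(root) if line != expected]


if __name__ == "__main__":
    for path, line in abnormal_shebangs("."):
        print(path, repr(line))
```

Printing with `repr()` gives output in the same quoted form as the list above, including the trailing `\n`.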
From tiagoantao at gmail.com Fri Jul 13 11:23:03 2007 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Fri, 13 Jul 2007 16:23:03 +0100 Subject: [Biopython-dev] FDist: more Population Genetics code In-Reply-To: <6d941f120707110305n75c75e77y7426280477cf23ca@mail.gmail.com> References: <6d941f120701030516m1adb3daeh6e4645121ba8679d@mail.gmail.com> <4693DEAB.8000900@maubp.freeserve.co.uk> <6d941f120707110305n75c75e77y7426280477cf23ca@mail.gmail.com> Message-ID: <6d941f120707130823i6b62478fl9ac589eb5c02ca9d@mail.gmail.com> I just want to add that I followed precisely the procedure that I was suggested at that time, ie to open bugzilla issues, but I got no answer or follow up from it. I also had some very useful mail exchanges with Ralph at that time, but no code was floated around. I reiterate my interest in supplying the code (currently supporting fdist, simcoal2, genepop, hapmap, ucsc table browser - in varying degrees of quality). You can have a look at the google url supplied (svn repository in it). I would still take the necessary time to convert it to BioPython namespace and format. If in one week I see no interest (interest in the form of pro actively making things go forward) at all then I will consider this a closed issue and will not spend more time with trying any form of integration, in the sense that I have done all that was requested here and really got no feedback. Tiago On 7/11/07, Tiago Ant?o wrote: > Hi, > > I had no feedback and it seemed that there was no interest, so I > decided to start a Python Population Genetics project on google, which > is going ahead, but still on alpha stages: > http://code.google.com/p/pypopgen/ > I am doing this on a personal basis for now (I did not even announce > it anywhere), and so it is advancing at my personal pace and design > according to me needs > I have used it already (or a tiny part of it) on a published > aplication ( http://popgen.eu/soft/m4s2 ). 
> I am still willing to integrate this on BioPython, but for that some > interest and feedback would be needed... That would have to happen > somewhat soon as the code will have to be adapted to BioPython > standards and namespace, and when, in a future, there is a lot of code > that will be in practice difficult (and after going public it will be > impossible really). > > The "strangest" code that I am doing (and that would need more > discussion) is one to do asyncronous computation (to be easy to use on > multicore computers and grids). > > Regards, > Tiago > > On 7/11/07, Ralph Haygood wrote: > > Peter and Tiago, > > > > Hello. No, I haven't done anything with Tiago's code. I'm afraid > > it's pretty far from what I'm working on these days. > > > > I still think it would be good for BioPython to include methods for > > computing basic population-genetical statistics (Watterson's theta, > > Tajima's D, etc.) from DNA alignments. I have in mind something like > > BioPerl's PopGen (http://www.bioperl.org/wiki/HOWTO:PopGen). My own > > code is easy to use with a Bio.Align.Generic.Alignment, but it doesn't > > conform to BioPython's standards for style, testing, or documentation, > > and I don't know when I'll have time to standardize it. > > > > Ralph > > > > On Tue, 10 Jul 2007, Peter wrote: > > > > > Hi Tiago, > > > > > > Have you had any feedback (off the mailing list)? > > > > > > Ralph - did you have a chance to look over Tiago's code or discuss this with > > > him? > > > > > > It would be a shame if nothing came from this... > > > > > > Peter > > > > > > Tiago Ant?o wrote: > > >> Hi! > > >> > > >> I have submitted another enhancement bug, with support for FDist. It > > >> allows to generate and parse Fdist files and to control fdist > > >> applications. There are also a couple of utility functions. FDist is a > > >> niche application (mainly used to detect selection in animal > > >> genetics). 
Not the most fundamental one to support, but it is > > >> currently one that I am working on, thus, the code. > > >> > > >> Regarding my summited code for GenePop, I have summited a different > > >> version on bugzilla. The main difference, is that I moved everything > > >> from Bio to Bio.PopGen. > > >> > > >> Before I continue putting code on bugzilla I would like to know if it > > >> is worthwhile doing it... Any opinions on the code submitted or if any > > >> changes are required? I would really like to continue converting my > > >> code to BioPython, but only if it has any possibility of ending up > > >> being useful/included in distribution somewhere in the future... ;) > > >> > > >> I am currently working on code related to SimCoal2, Arlequin and > > >> general statistics (Fst, heterozygosity, ...). Which will probably be > > >> ready quite soon (ie, next two weeks). This is more mainstream than > > >> FDist > > >> > > >> I have some other code lying around mainly related to HapMap, but I > > >> will only submit it after reviewing and reusing it again. This is more > > >> distant future ... like a couple of months. > > >> > > >> Tiago > > > > > > > > > > From bugzilla-daemon at portal.open-bio.org Fri Jul 13 11:25:32 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 13 Jul 2007 11:25:32 -0400 Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64 In-Reply-To: Message-ID: <200707131525.l6DFPWMa011025@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2323 ------- Comment #25 from biopython-bugzilla at maubp.freeserve.co.uk 2007-07-13 11:25 EST ------- Changing the Seq object to be a subclass of string might be nice... but perhaps rather confusing for minority alphabets where the "letters" are not single characters(*). More importantly, wouldn't this dramatic change break a lot of existing scripts? Probably something for the mailing list! 
(*) I've never done it, but one example is storing three letter protein sequences, nice if you have any post translational modifications which cannot be represented using the single letter scheme. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython-dev at maubp.freeserve.co.uk Sat Jul 14 06:22:06 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Sat, 14 Jul 2007 11:22:06 +0100 Subject: [Biopython-dev] Bug 2292 - TER lines from Bio.PDBIO Message-ID: <4698A3CE.7020907@maubp.freeserve.co.uk> Hi Thomas, Could you have a look at Biopython Bug 2292 and the suggested patch from Michal Gajda to write TER records in line with the spec: http://bugzilla.open-bio.org/show_bug.cgi?id=2292 Thanks Peter From tiagoantao at gmail.com Sat Jul 14 12:32:43 2007 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Sat, 14 Jul 2007 17:32:43 +0100 Subject: [Biopython-dev] Population Genetics code Message-ID: <6d941f120707140932u356c84bel6a9322a2767e6da7@mail.gmail.com> Hi! Firstly I would like to thank everybody that answered so positively to my "rant" about submitting population genetics code to Biopython. I have a few suggestions on how to progress in a safe in constructive way with a possible Population Genetics part for biopython. First of all, the starting point: 1. There is none in the core developers that is working actively in populations genetics 2. Point 1 entails that any code submissions (made by biopython newbies like me) will not be able to be completely reviewed by seasoned biopython developers 3. Initially there will only be me submitting code (please correct me if I am wrong, especially Ralph...) 4. There is already some popgen statistical code in python lying around e.g. 
http://www.pypop.org/

Therefore I suggest starting out by doing a small, "safe" project around a
little-used application (Mark Beaumont's Fdist program
http://www.rubic.rdg.ac.uk/~mab/software.html ).
This code is already done and tested (by myself). I also have test cases (in
BioPython format) for parts of it. The major issue is that it is currently
outside of the Bio.PopGen namespace, so it's not really very major...
I would provide parsers, configuration file generators and utilities to run
the suite of fdist programs.

Why start with such a simple and less relevant application:
1. It's safer to start with something less grand (if it's poorly done it
won't be that serious).
2. There is no python fdist code lying around, so there is no overlap at all
with existing projects.
3. This code is already done and being used...

I will provide code, test code, and documentation (probably by adding stuff
to the wiki). Then other people could evaluate what was done, and we would
continue from there to other, more widely used applications (Genepop,
arlequin, simcoal2, ...) and databases (HapMap, TableBrowser).

Is this an acceptable way of going ahead? If other people would like to
participate, that would be fantastic... If my suggestion is rubbish, please
also say ;)

Many thanks,
Tiago

From biopython-dev at maubp.freeserve.co.uk Mon Jul 16 14:27:40 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Mon, 16 Jul 2007 19:27:40 +0100
Subject: [Biopython-dev] Biopython usage figures
Message-ID: <469BB89C.8010904@maubp.freeserve.co.uk>

A little last minute I know, but would anyone have access to the website
download statistics? I'd like to include rough figures for the number of
downloads of the recent releases in the BOSC 2007 talk.
A list of developers with CVS access would be nice too - but I can just trawl though the logs to spot active people ;) Peter From biopython-dev at maubp.freeserve.co.uk Mon Jul 16 14:50:49 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Mon, 16 Jul 2007 19:50:49 +0100 Subject: [Biopython-dev] Is Bio.Crystal obsolete? Message-ID: <469BBE09.1000005@maubp.freeserve.co.uk> I just had a look at the Bio.Crystal module by Katharine Lindner (2002), consisting of the single file Bio/Crystal/__init__.py whose preamble states: > Hetero, Crystal and Chain exist to represent the NDB Atlas > structure. Atlas is a minimal subset of the PDB format. Heteo > supports a 3 alphameric code. The NDB web interface is located at > ... The old link should probably be updated as it doesn't work, perhaps: http://ndbserver.rutgers.edu/atlas/index.html As far as I can see, they now provide their downloads in PDB, CIF and an XML file format - and the PDB files look like full thing to me at first glance rather than a minimal subset. There is a unit test, Tests/test_Crystal.py but no example input files. This module looks obsolete to me - can we mark it as deprecated after checking on the main list no one uses it (as done for Bio.Kabat back in March 2007)? Peter From tiagoantao at gmail.com Wed Jul 18 06:29:08 2007 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Wed, 18 Jul 2007 11:29:08 +0100 Subject: [Biopython-dev] PopGen code Message-ID: <6d941f120707180329u6bf60c50o8e4868e5a470de2c@mail.gmail.com> Hi! Starting today I will begin putting code on CVS regarding Population Genetics stuff. I will start by checking in a GenePop parser and test code. Very soon FDist code will follow. After that Simcoal stuff, more GenePop stuff, HapMap and UCSC table browser will follow. I was not able to read dev.open-bio.org suggestions as it seems to be down for a some time. 
If any of the senior Biopython developers finds that I am doing anything seriously wrong, please don't hesitate to contact me immediately. I will be putting everything below a PopGen directory in Bio. Everything except tests, of course ;) Regards, Tiago From biopython-dev at maubp.freeserve.co.uk Wed Jul 18 17:37:46 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Wed, 18 Jul 2007 22:37:46 +0100 Subject: [Biopython-dev] PopGen code In-Reply-To: <6d941f120707180329u6bf60c50o8e4868e5a470de2c@mail.gmail.com> References: <6d941f120707180329u6bf60c50o8e4868e5a470de2c@mail.gmail.com> Message-ID: <320fb6e00707181437l22b1aecdh8ab5c2fa2aea7380@mail.gmail.com> Tiago Ant?o wrote: > Hi! > > Starting today I will begin putting code on CVS regarding Population > Genetics stuff... > I will be putting everything below a PopGen directory in Bio. > Everything except tests, of course ;) Sounds good :) If you can write some introductory text to add to the cookbook/tutorial that would be even better. If you are not familiar with LaTeX, then just write it up in plain text and I could add that to the tutorial with suitable mark-up/formatting on your behalf. This may be easier to do in chunks as you add new code, or in a large batch later on - up to you. Peter From tiagoantao at gmail.com Wed Jul 18 18:46:19 2007 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Wed, 18 Jul 2007 23:46:19 +0100 Subject: [Biopython-dev] PopGen code In-Reply-To: <320fb6e00707181437l22b1aecdh8ab5c2fa2aea7380@mail.gmail.com> References: <6d941f120707180329u6bf60c50o8e4868e5a470de2c@mail.gmail.com> <320fb6e00707181437l22b1aecdh8ab5c2fa2aea7380@mail.gmail.com> Message-ID: <6d941f120707181546y34e17038nb07106dacae533db@mail.gmail.com> Hi! On 7/18/07, Peter wrote: > If you can write some introductory text to add to the > cookbook/tutorial that would be even better. 
If you are not familiar > with LaTeX, then just write it up in plain text and I could add that > to the tutorial with suitable mark-up/formatting on your behalf. I agree, in fact it is what I intend to do after having the FDist code in. I will write mostly in parallel with commiting. So the doc should be more or less aligned with what is being put in CVS... Regards, Tiago From tiagoantao at gmail.com Thu Jul 19 09:09:29 2007 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Thu, 19 Jul 2007 14:09:29 +0100 Subject: [Biopython-dev] PopGen Documentation Message-ID: <6d941f120707190609o4f5f7412x94851295865ba22b@mail.gmail.com> Hi All, Following Peter's suggestion, I had a closer look at the documentation, and, if nobody opposes, I would like to add a new subsection between PDB and Miscellaneous on the cookbook chapter, Like this 4.10 Going 3D: The PDB module 4.11 PopGen: Population genetics (and genomics) 4.12 Miscellaneous Tiago On 7/18/07, Peter wrote: > Tiago Ant?o wrote: > > Hi! > > > > Starting today I will begin putting code on CVS regarding Population > > Genetics stuff... > > I will be putting everything below a PopGen directory in Bio. > > Everything except tests, of course ;) > > Sounds good :) > > If you can write some introductory text to add to the > cookbook/tutorial that would be even better. If you are not familiar > with LaTeX, then just write it up in plain text and I could add that > to the tutorial with suitable mark-up/formatting on your behalf. > > This may be easier to do in chunks as you add new code, or in a large > batch later on - up to you. 
> > Peter > From bugzilla-daemon at portal.open-bio.org Sat Jul 21 11:28:49 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 21 Jul 2007 11:28:49 -0400 Subject: [Biopython-dev] [Bug 2269] Shebang (hashbang) lines need cleanup In-Reply-To: Message-ID: <200707211528.l6LFSnBk031498@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2269 ------- Comment #8 from mdehoon at ims.u-tokyo.ac.jp 2007-07-21 11:28 EST ------- In biopython/Bio/EUtils, the shebang line was specified in sourcegen.py. I fixed that line and the shebang lines in the other *.py files under biopython/Bio/EUtils. Can we close this bug? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sat Jul 21 11:47:32 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 21 Jul 2007 11:47:32 -0400 Subject: [Biopython-dev] [Bug 2291] __init__.py missing in the Bio.PDB.mmCIF folder after the install In-Reply-To: Message-ID: <200707211547.l6LFlWxU032394@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2291 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #1 from mdehoon at ims.u-tokyo.ac.jp 2007-07-21 11:47 EST ------- I have modified setup.py so that Bio.PDB.mmCIF is always a module (but not necessarily with the MMCIFlex module; users still need to modify setup.py to include it). With Bio.PDB.mmCIF always present, the Bio/PDB/mmCIF/__init__.py file is no longer lost. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Sun Jul 22 04:30:11 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 22 Jul 2007 04:30:11 -0400 Subject: [Biopython-dev] [Bug 2269] Shebang (hashbang) lines need cleanup In-Reply-To: Message-ID: <200707220830.l6M8UB6d006746@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2269 ------- Comment #9 from biopython-bugzilla at maubp.freeserve.co.uk 2007-07-22 04:30 EST ------- Regarding comment 8, after changing sourcegen.py were you able to regenerate all the biopython/Bio/EUtils/*.py files? Or did you just fix them by hand? Anyway - that should leave us with consistent shebang/hashbang lines :) Unless we also want to remove any surplus lines, and decide if all or none of the unit tests should have them, then this bug looks done. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sun Jul 22 05:53:46 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 22 Jul 2007 05:53:46 -0400 Subject: [Biopython-dev] [Bug 2269] Shebang (hashbang) lines need cleanup In-Reply-To: Message-ID: <200707220953.l6M9rkap010929@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2269 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #10 from mdehoon at ims.u-tokyo.ac.jp 2007-07-22 05:53 EST ------- > Regarding comment 8, after changing sourcegen.py were you able to regenerate > all the biopython/Bio/EUtils/*.py files? Or did you just fix them by hand? I fixed them by hand. The fixed sourcegen.py should result in the same biopython/Bio/EUtils/*.py files as I created by hand. 
I tried regenerating these files automatically, but that didn't work for me.
At some point, somebody should figure out how the biopython/Bio/EUtils code
works.

> Unless we also want to remove any surplus lines, and decide if all or none of
> the unit tests should have them, then this bug looks done.

Since Python itself does not seem to have a clear rule as to which files
should have a shebang line, it is not obvious which Biopython files should
have one. If somebody really wants to fix this, it's probably better to
discuss such an issue on the mailing list first. As the issue raised by the
original bug report has been resolved, I am closing this bug.

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From mdehoon at c2b2.columbia.edu Sun Jul 22 06:28:22 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Sun, 22 Jul 2007 19:28:22 +0900
Subject: [Biopython-dev] Bio.AlignIO (was Re: [BioPython] Bio.SeqIO and files with one record)
In-Reply-To: <4693E5FE.708@maubp.freeserve.co.uk>
References: <4693E5FE.708@maubp.freeserve.co.uk>
Message-ID: <46A33146.7030405@c2b2.columbia.edu>

Peter wrote:
> P.S. Any comments on the Bio.AlignIO ideas I raised back in May 2007?
> http://lists.open-bio.org/pipermail/biopython/2007-May/003472.html
>
Let's discuss the Bio.Align.Alignment class first, and then decide how
to parse alignment files.

Currently, the alignment class holds a list of SeqRecord objects:

class Alignment:
    ...
    def __init__(self, alphabet):
        ...
        # hold everything at a list of seq record objects
        self._records = []

To get access to self._records, the Alignment class has some accessor
functions:

    def get_all_seqs(self):
        ...
        return self._records

    def get_seq_by_num(self, number):
        ...
        return self._records[number].seq

A cleaner way to do this is to let the class Alignment inherit from list.
This also allows us to use all list methods on Alignment objects. For example, we can iterate over them, as suggested in this bug report: http://bugzilla.open-bio.org/show_bug.cgi?id=1944 Any objections against letting Alignment inherit from list? --Michiel From salish at picasso.ucsf.edu Sun Jul 22 14:27:58 2007 From: salish at picasso.ucsf.edu (Howard Salis) Date: Sun, 22 Jul 2007 11:27:58 -0700 Subject: [Biopython-dev] Bio.AlignIO (was Re: [BioPython] Bio.SeqIO and files with one record) In-Reply-To: <46A33146.7030405@c2b2.columbia.edu> References: <4693E5FE.708@maubp.freeserve.co.uk> <46A33146.7030405@c2b2.columbia.edu> Message-ID: <9fa7e98e0707221127v5b7b2a85x38978fd647e18931@mail.gmail.com> Hello all, To get this same behavior, you can also create the __iter__ and next() methods in Alignment itself. -Howard Salis On 7/22/07, Michiel de Hoon wrote: > Peter wrote: > > P.S. Any comments on the Bio.AlignIO ideas I raised back in May 2007? > > http://lists.open-bio.org/pipermail/biopython/2007-May/003472.html > > > Let's discuss the Bio.Align.Alignment class first, and then decide how > to parse alignment files. > > Currently, the alignment class holds a list of SeqRecord objects: > > > class Alignment: > ... > def __init__(self, alphabet): > ... > # hold everything at a list of seq record objects > self._records = [] > > To get access to self_record, the Alignment class has some accessor > functions: > > def get_all_seqs(self): > ... > return self._records > > > def get_seq_by_num(self, number): > ... > return self._records[number].seq > > A cleaner way to do this is to let the class Alignment inherit from > list. This also allows us to use all list methods on Alignment objects. > For example, we can iterate over them, as suggested in this bug report: > > http://bugzilla.open-bio.org/show_bug.cgi?id=1944 > > Any objections against letting Alignment inherit from list? 
> > > --Michiel > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From mdehoon at c2b2.columbia.edu Wed Jul 25 09:17:33 2007 From: mdehoon at c2b2.columbia.edu (Michiel de Hoon) Date: Wed, 25 Jul 2007 22:17:33 +0900 Subject: [Biopython-dev] Bio.AlignIO (was Re: [BioPython] Bio.SeqIO and files with one record) In-Reply-To: <9fa7e98e0707221127v5b7b2a85x38978fd647e18931@mail.gmail.com> References: <4693E5FE.708@maubp.freeserve.co.uk> <46A33146.7030405@c2b2.columbia.edu> <9fa7e98e0707221127v5b7b2a85x38978fd647e18931@mail.gmail.com> Message-ID: <46A74D6D.9020309@c2b2.columbia.edu> Sure, that is possible, but that means we'd be adding methods to Alignment in order for it to behave like a list, whereas we can get that for free by letting the Alignment class inherit from list. --Michiel. Howard Salis wrote: > Hello all, > > > To get this same behavior, you can also create the __iter__ and next() > methods in Alignment itself. > > -Howard Salis > > On 7/22/07, Michiel de Hoon wrote: >> Peter wrote: >>> P.S. Any comments on the Bio.AlignIO ideas I raised back in May 2007? >>> http://lists.open-bio.org/pipermail/biopython/2007-May/003472.html >>> >> Let's discuss the Bio.Align.Alignment class first, and then decide how >> to parse alignment files. >> >> Currently, the alignment class holds a list of SeqRecord objects: >> >> >> class Alignment: >> ... >> def __init__(self, alphabet): >> ... >> # hold everything at a list of seq record objects >> self._records = [] >> >> To get access to self_record, the Alignment class has some accessor >> functions: >> >> def get_all_seqs(self): >> ... >> return self._records >> >> >> def get_seq_by_num(self, number): >> ... >> return self._records[number].seq >> >> A cleaner way to do this is to let the class Alignment inherit from >> list. This also allows us to use all list methods on Alignment objects. 
>> For example, we can iterate over them, as suggested in this bug report: >> >> http://bugzilla.open-bio.org/show_bug.cgi?id=1944 >> >> Any objections against letting Alignment inherit from list? >> >> >> --Michiel >> _______________________________________________ >> Biopython-dev mailing list >> Biopython-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biopython-dev >> > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From biopython-dev at maubp.freeserve.co.uk Wed Jul 25 09:34:02 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Wed, 25 Jul 2007 14:34:02 +0100 Subject: [Biopython-dev] Bio.AlignIO In-Reply-To: <46A74D6D.9020309@c2b2.columbia.edu> References: <4693E5FE.708@maubp.freeserve.co.uk> <46A33146.7030405@c2b2.columbia.edu> <9fa7e98e0707221127v5b7b2a85x38978fd647e18931@mail.gmail.com> <46A74D6D.9020309@c2b2.columbia.edu> Message-ID: <46A7514A.1090405@maubp.freeserve.co.uk> Michiel de Hoon wrote: > Sure, that is possible, but that means we'd be adding methods to > Alignment in order for it to behave like a list, whereas we can get > that for free by letting the Alignment class inherit from list. > > --Michiel. Personally I see an alignment as both an array of characters (i.e. amino acid residues or nucleotides), and a list of sequences. In the same way that a Numeric or NumPy array lets you iterate over rows, yet also access individual elements, we could allow iteration of SeqRecords and also allow access to individual letters. 
Peter

From mdehoon at c2b2.columbia.edu Wed Jul 25 10:44:56 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Wed, 25 Jul 2007 23:44:56 +0900
Subject: [Biopython-dev] Bio.AlignIO
In-Reply-To: <46A7514A.1090405@maubp.freeserve.co.uk>
References: <4693E5FE.708@maubp.freeserve.co.uk> <46A33146.7030405@c2b2.columbia.edu> <9fa7e98e0707221127v5b7b2a85x38978fd647e18931@mail.gmail.com> <46A74D6D.9020309@c2b2.columbia.edu> <46A7514A.1090405@maubp.freeserve.co.uk>
Message-ID: <46A761E8.5080909@c2b2.columbia.edu>

Peter wrote:
> Personally I see an alignment as both an array of characters (i.e. amino
> acid residues or nucleotides), and a list of sequences.
>
> In the same way that a Numeric or NumPy array lets you iterate over
> rows, yet also access individual elements, we could allow iteration of
> SeqRecords and also allow access to individual letters.

How about the following:

-Iterators iterate over the SeqRecords in the alignment
-An index of the form [xxx] returns the corresponding SeqRecord
-An index of the form [xxx:yyy:zzz] returns an Alignment object
 containing the SeqRecords in rows [xxx:yyy:zzz]
 (compare to the current method get_all_seqs()).
-An index of the form [xxx,:] returns the Seq object of the SeqRecord at
 xxx (this is currently done by the get_seq_by_num() method).
-An index of the form [xxx:yyy:zzz,:] returns a list of Seq objects
-An index of the form [:,www] returns a string containing the characters
 at column www (which is currently done by the get_column method)
-An index of the form [xxx:yyy:zzz,www] returns a string containing
 the characters at column www using only the rows xxx:yyy:zzz.
-An index of the form [xxx,www] returns a string containing the
 character of the sequence in row xxx at column www.

This is more-or-less how Numerical Python arrays work, except that we'll
be returning SeqRecord/Seq/string objects depending on the indices.

--Michiel.
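To make the indexing proposal above a little more concrete, here is a minimal sketch of how the different index forms might be dispatched in __getitem__. It is not Biopython code: the class name SketchAlignment is invented for illustration, plain strings stand in for SeqRecord/Seq objects so the example is self-contained, and it targets a current Python 3 interpreter. Accepting a general slice in the column position goes slightly beyond the proposal, which only mentions ":" there.

```python
class SketchAlignment:
    """Toy alignment; rows are plain strings standing in for SeqRecords."""

    def __init__(self, rows):
        self._rows = list(rows)

    def __len__(self):
        return len(self._rows)

    def __iter__(self):
        # Iterating yields the rows (SeqRecords in the proposal).
        return iter(self._rows)

    def __getitem__(self, index):
        if isinstance(index, tuple):
            if len(index) != 2:
                raise IndexError("expected at most two indices")
            row, col = index
            if isinstance(row, slice):
                selected = self._rows[row]
                if isinstance(col, slice):
                    # [x:y:z, :] -> list of sequences
                    return [seq[col] for seq in selected]
                # [x:y:z, w] or [:, w] -> string of the characters in column w
                return "".join(seq[col] for seq in selected)
            # [x, :] -> the whole sequence; [x, w] -> a single character
            return self._rows[row][col]
        if isinstance(index, slice):
            # [x:y:z] -> a new alignment with only those rows
            return SketchAlignment(self._rows[index])
        # [x] -> one row
        return self._rows[index]
```

With three aligned rows "ACGT", "A-GT" and "ACGA", the expression aln[:, 1] collects the second column as a string, while aln[0:2] gives back another SketchAlignment. Which return types are best for the mixed cases is exactly the open question in this thread, so the choices here simply follow the list above.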
From biopython-dev at maubp.freeserve.co.uk Wed Jul 25 12:10:43 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Wed, 25 Jul 2007 17:10:43 +0100 Subject: [Biopython-dev] Improving the Alignment object. Was Bio.AlignIO In-Reply-To: <46A761E8.5080909@c2b2.columbia.edu> References: <4693E5FE.708@maubp.freeserve.co.uk> <46A33146.7030405@c2b2.columbia.edu> <9fa7e98e0707221127v5b7b2a85x38978fd647e18931@mail.gmail.com> <46A74D6D.9020309@c2b2.columbia.edu> <46A7514A.1090405@maubp.freeserve.co.uk> <46A761E8.5080909@c2b2.columbia.edu> Message-ID: <46A77603.1030101@maubp.freeserve.co.uk> Michiel de Hoon wrote: > Peter wrote: >> Personally I see an alignment as both an array of characters (i.e. amino >> acid residues or nucleotides), and a list of sequences. >> >> In the same way that a Numeric or NumPy array lets you iterate over >> rows, yet also access individual elements, we could allow iteration of >> SeqRecords and also allow access to individual letters. > > How about the following: > > -Iterators iterate for the SeqRecords in the alignment I Agree. And this is trivial to implement without needing the element access/splicing support. As to element access, we've been thinking along similar lines :) Its just that with all the different special cases, there are lots of different possible return types! > -An index of the form [xxx] returns the corresponding SeqRecord > -An index of the form [xxx:yyy:zzz] returns an Alignment object > containing the SeqRecords in rows [xxx:yyy:zzz] > (compare to the current method get_all_seqs()). I agree. This is essential to make an alignment act like a list of SeqRecord objects when only a one-dimensional index is given. > -An index of the form [xxx,:] returns the Seq object of the SeqRecord at > xxx (this is currently done by the get_seq_by_num() method). > -An index of the form [xxx:yyy:zzz,:] returns a list of Seq objects I'm not immediately convinced about returning Seq objects here. 
I might expect indices like [xxx,:] to return a SeqRecord (not a Seq) and [xxx:yyy:zzz,:] to return a sub-alignment (not a list of Seq objects). > -An index of the form [:,www] returns a string containing the characters > at column www (which is currently done by the get_column method) > -An index of the form [xxx,www] returns a string containing the > character of the sequence in row xxx at column www. Those look fine - however we might want to return Seq objects rather than strings. > -An index of the form [xxx:yyy:zzz,www] returns a string containing > the characters at column www using only the rows xxx:yyy:zzz. Or a sub alignment? See later... > This is more-or-less how Numerical Python arrays work, except that we'll > be returning SeqRecord/Seq/string objects depending on the indices. For comparison, that is what I had been thinking: * [r,c] means one element is requested, return a single character string * [r] or [r,:] means one row is requested, return a SeqRecord * [:,c] means one column is requested, return a string (or Seq object?) * Otherwise returns a (sub)alignment. Note that [:] or [:,:] would return a copy of the alignment. This would cover slicing of the column index by returning a sub-alignment. i.e. indexes of the form [rrr, xxx:yyy:zzz] or [rrr:ppp:qqq, xxx:yyy:zzz] I'm not sure if requests for part of a single row or column like [rrr, xxx:yyy:zzz] and [rrr:ppp:qqq, xxx] are best handled by returning sub-alignments or as special cases (strings/Seq and Seq/SeqRecord respectively?). 
Peter From bugzilla-daemon at portal.open-bio.org Thu Jul 26 10:52:38 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 26 Jul 2007 10:52:38 -0400 Subject: [Biopython-dev] [Bug 2340] New: SProt.py fails to parse the current Swiss-Prot version 54.0 Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2340 Summary: SProt.py fails to parse the current Swiss-Prot version 54.0 Product: Biopython Version: 1.43 Platform: All OS/Version: Linux Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: gould at embl.de Hi, I'm running on a red hat linux box on python 2.3.4 and am trying to parse any swiss-prot record but the parser just seems to bomb out not throwing an error of where it actually fails. I'm guessing it has to do with the Release 54.0 of 24-Jul-07 of UniPROT with the addition of the new line type PE?? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Jul 26 11:46:36 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 26 Jul 2007 11:46:36 -0400 Subject: [Biopython-dev] [Bug 2340] SProt.py fails to parse the current Swiss-Prot version 54.0 In-Reply-To: Message-ID: <200707261546.l6QFkaGq022472@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2340 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2007-07-26 11:46 EST ------- Hi Kate, Could you give us the URL of one or two specific SwissProt files you're having trouble with. Also how are you trying to read the SwissProt files? e.g. with Bio.SeqIO.parse()? If you could include the python error too, that could be helpful. Thanks. 
Peter

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Thu Jul 26 12:06:15 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 26 Jul 2007 12:06:15 -0400
Subject: [Biopython-dev] [Bug 2340] SProt.py fails to parse the current Swiss-Prot version 54.0
In-Reply-To:
Message-ID: <200707261606.l6QG6FkE023264@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2340

------- Comment #2 from gould at embl.de 2007-07-26 12:06 EST -------
(In reply to comment #0)
> Hi,
>
> I'm running on a red hat linux box on python 2.3.4 and am trying to parse any
> swiss-prot record but the parser just seems to bomb out not throwing an error
> of where it actually fails. I'm guessing it has to do with the Release 54.0 of
> 24-Jul-07 of UniPROT with the addition of the new line type PE??
>

(In reply to comment #1)
> Hi Kate,
>
> Could you give us the URL of one or two specific SwissProt files you're having
> trouble with.
>
> Also how are you trying to read the SwissProt files? e.g. with
> Bio.SeqIO.parse()?
>
> If you could include the python error too, that could be helpful. Thanks.
>
> Peter
>

hi

the following snippet of code is where the error occurs (this used to work no
problem before something changed in the last day or two I guess)

def getSequence(self,acc):
    """
    This method retrieves the most recent annotated sequence from the
    ExPASy server for a given accession number.
    """
    from Bio.WWW import ExPASy
    from Bio.SwissProt import SProt
    from Bio import File
    if acc != '':
        try:
            results = ExPASy.get_sprot_raw(acc.strip()).read()
            sp_parser = SProt.RecordParser()
            sp_iterator = SProt.Iterator(File.StringHandle(results), sp_parser)
            Record = sp_iterator.next()
            return Record.sequence.strip()
        except:
            return -1
    else:
        return acc

breaks at line : Record = sp_iterator.next()
but doesn't print any error to terminal....

some examples of accession nrs used are: P01100, P12522 etc

thanks
Kate

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Thu Jul 26 12:32:31 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 26 Jul 2007 12:32:31 -0400
Subject: [Biopython-dev] [Bug 2340] SProt.py fails to parse the current Swiss-Prot version 54.0
In-Reply-To:
Message-ID: <200707261632.l6QGWVrC024560@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2340

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED

------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2007-07-26 12:32 EST -------
Confirming bug - it is due to the new PE line (protein evidence).

The reason you didn't see the error is that in your example the parser is
wrapped in a try ... except ... clause.

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Thu Jul 26 12:51:45 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 26 Jul 2007 12:51:45 -0400 Subject: [Biopython-dev] [Bug 2340] SProt.py fails to parse the current Swiss-Prot version 54.0 In-Reply-To: Message-ID: <200707261651.l6QGpja8025622@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2340 ------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2007-07-26 12:51 EST ------- I think I have fixed this - at least your example code now works. You'll need to update the file Bio/SwissProt/SProt.py to revision 1.38 from CVS, which you can download here: http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/SwissProt/SProt.py?cvsroot=biopython Don't forget to backup the old Bio/SwissProt/SProt.py first, in case you want to put things back. Please test this and report back. NOTE - The fix just makes the parser aware of the new PE line, and ignores it. It doesn't (yet) do anything useful with the information it contains! -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Jul 27 02:46:35 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 27 Jul 2007 02:46:35 -0400 Subject: [Biopython-dev] [Bug 2340] SProt.py fails to parse the current Swiss-Prot version 54.0 In-Reply-To: Message-ID: <200707270646.l6R6kZaI001699@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2340 ------- Comment #5 from gould at embl.de 2007-07-27 02:46 EST ------- (In reply to comment #4) > I think I have fixed this - at least your example code now works. 
> > You'll need to update the file Bio/SwissProt/SProt.py to revision 1.38 from > CVS, which you can download here: > > http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/SwissProt/SProt.py?cvsroot=biopython > > Don't forget to backup the old Bio/SwissProt/SProt.py first, in case you want > to put things back. > > Please test this and report back. > > NOTE - The fix just makes the parser aware of the new PE line, and ignores it. > It doesn't (yet) do anything useful with the information it contains! > Yes it has done the trick and all works OK again. thanks -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Jul 27 03:54:14 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 27 Jul 2007 03:54:14 -0400 Subject: [Biopython-dev] [Bug 2340] SProt.py fails to parse the current Swiss-Prot version 54.0 In-Reply-To: Message-ID: <200707270754.l6R7sEnm007432@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2340 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |RESOLVED Resolution| |FIXED ------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk 2007-07-27 03:54 EST ------- Great :) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
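
The diagnosis in comment #3 of this bug can be reproduced with a small sketch: a bare except swallows the parser's exception, so the caller only ever sees -1. The parse_record function and its line-type check below are hypothetical stand-ins for illustration, not the SProt parser. Catching a specific exception and reporting it is one possible way to make the failure visible.

```python
def parse_record(text):
    # Hypothetical stand-in for a parser that rejects an unknown line type.
    for line in text.splitlines():
        if line[:2] not in ("ID", "SQ", "//"):
            raise ValueError("unrecognised line type %r" % line[:2])
    return "parsed"


def get_sequence_silent(text):
    try:
        return parse_record(text)
    except:  # bare except: the ValueError is swallowed, as in the report
        return -1


def get_sequence_verbose(text):
    try:
        return parse_record(text)
    except ValueError as err:
        # Report the failure instead of hiding it.
        return "parse failed: %s" % err


# A made-up record containing a line type the toy parser does not know.
record = "ID   TEST\nPE   1: Evidence at protein level;\n//"
silent_result = get_sequence_silent(record)
verbose_result = get_sequence_verbose(record)
print(silent_result)
print(verbose_result)
```

With the bare except removed or narrowed, the traceback or message would have pointed at the unknown line type directly.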
From kosa at genesilico.pl Fri Jul 27 06:47:10 2007
From: kosa at genesilico.pl (Jan Kosinski)
Date: Fri, 27 Jul 2007 12:47:10 +0200
Subject: [Biopython-dev] syntax of indices for future Alignment object
Message-ID: <46A9CD2E.6080402@genesilico.pl>

Hi,

From the viewpoint of the enduser we would like python Alignment object
to behave outside as an array so we could get slices, columns,
sequences, their fragments, whatever we want etc. The most intuitive and
clear (certainly much better than not very clear indexes like
[xxx:yyy:zzz]) for the user is the following.

[A:B][X:Y] - general syntax of indices. This supports almost everything.

Several examples of usage and proposed outputs:

[:][:] - returns an alignment or its copy (as Alignment object)
[:][x:y] - returns slice of the alignment (as Alignment object; aln of
  all sequences and residues corresponding to columns from x and y)
[a:b][:] - returns the aln of seqs from a to b (as Alignment object)
[a:b][x:y] - returns the slice and subalignment (as Alignment object)
[a:a][x:y] - returns slice of the single sequence (residues x to y of
  sequence a) (as Alignment object)
[a][x:y] - returns slice of the single sequence (residues x to y of
  sequence a) (as a String)
[a:][x:y] and similar combinations - returns the slice and subalignment,
  sequences from a to the last are included (as Alignment object)
[:][x] - returns single column (as a String object? string here could be
  very useful)
[:][x:x] - returns single column (as Alignment object)
[a] - returns single sequence (as a SeqRecord object)
[a:a] and [a:a][:] - returns single sequence (as Alignment object)
[m][n] - returns n-th element of sequence m (as a String)

Disputable could be that different but similar sets of indices return
different types of objects (ex. [:][x] would return a column as string
while [:][x:x] would return a column as Alignment object, but in my
opinion it would just extend the usability).
The only problem is an implementation of such calls but it depends on
what type of object the Alignment object will be.

What do you think?

Cheers,
Jan Kosinski
Grzegorz Papaj

From bugzilla-daemon at portal.open-bio.org Fri Jul 27 08:51:10 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 27 Jul 2007 08:51:10 -0400
Subject: [Biopython-dev] [Bug 1944] Align.Generic adding iterator and more
In-Reply-To:
Message-ID: <200707271251.l6RCpAIg025706@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1944

------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2007-07-27 08:51 EST -------
Created an attachment (id=721)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=721&action=view)
Patch for Bio/Align/Generic.py to add __getitem__ method

This patch adds a __getitem__ method, a small "mini test" when running the
module directly, and updates the doc strings. This gives SeqRecord iteration
"for free" (without an explicit __iter__ method).

As discussed on the mailing list, this allows an Alignment object to be treated
as a list of SeqRecord objects or as an array of character strings - plus
extract whole columns as strings.

Quoting the proposed __getitem__ doc string:

Depending on the indices, you can get a SeqRecord object (representing a
single row), a string (for a single column or a single character) or another
alignment (representing some or part of the alignment).

align[r,c] gives a single character as a string
align[r] gives a SeqRecord
align[:,c] gives a column as a string

align[:] and align[:,:] give a copy of the alignment

Anything else gives a sub alignment, e.g.
align[0:2] or align[0:2,:] uses only row 0 and 1
align[:,1:3] uses only columns 1 and 2
align[0:2,1:3] uses only rows 0 & 1 and only cols 1 & 2

Feedback welcome - either here, or on the developers' mailing list.
Thanks -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython-dev at maubp.freeserve.co.uk Fri Jul 27 08:18:21 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Fri, 27 Jul 2007 13:18:21 +0100 Subject: [Biopython-dev] syntax of indices for future Alignment object In-Reply-To: <46A9CD2E.6080402@genesilico.pl> References: <46A9CD2E.6080402@genesilico.pl> Message-ID: <46A9E28D.40609@maubp.freeserve.co.uk> Jan Kosinski wrote: > Hi, > > From the viewpoint of the enduser we would like python Alignment object > to behave outside as an array so we could get slices, columns, > sequences, their fragments, whatever we want etc. The most intuitive and > clear (certainly much better than not very clear indexes like > [xxx:yyy:zzz]) for the user is the following. > > [A:B][X:Y] - general syntax of indices. This supports almost everything. I think Michiel and I were suggesting [A:B,X:Y] or rather [A:B:C,X:Y:Z] to be fully general, rather than [A:B][X:Y] or [A:B:C][X:Y:Z] i.e. [arg1, arg2] rather than [arg1][arg2] This is an important point, as in the first case the __getitem__ method of the alignment is called once (with both arguments). In the second case, the __getitem__ method is called with arg1, and may return a SeqRecord or an alignment - and this object's __getitem__ method is called with arg2. As written, many of your cases appear to be impossible - but using the [arg1,arg2] we can get close. I've got a working bit of code put together now which I'll attached to bug 1944 soon. 
http://bugzilla.open-bio.org/show_bug.cgi?id=1944 Peter From kosa at genesilico.pl Fri Jul 27 10:13:24 2007 From: kosa at genesilico.pl (Jan Kosinski) Date: Fri, 27 Jul 2007 16:13:24 +0200 Subject: [Biopython-dev] syntax of indices for future Alignment object In-Reply-To: <46A9E28D.40609@maubp.freeserve.co.uk> References: <46A9CD2E.6080402@genesilico.pl> <46A9E28D.40609@maubp.freeserve.co.uk> Message-ID: <46A9FD84.4080502@genesilico.pl> Hi, Ok, I agree that [A:B][X:Y] syntax is not possible here. [A:B,X:Y] is fine. However, I would recommend not using [A:B:C,X:Y:Z] since it is not clear while the [A:B][X:Y] syntax is clear and sufficient. We had another discussion in the lab about that Alignment object should not store records in the list but rather in a dictionary (but keeping information about sequence order ) or so. What is you reasoning for making Alignment object a list of SeqRecord objects? One should carefully think about design of the Alignment class since it will influence all further steps. As now the class is in its infancy there is a very good moment for thinking what the Alignment class is for and what it should support. For instance, the Alignment object should support changing characters in the alignment without a need of copying it (using aln[a][x] = "D"). Can it be done now with Alignment which is a list of SeqRecord objects with sequences implemented as immutable Seq objects ? Cheers, Jan Kosinski Peter wrote: > Jan Kosinski wrote: >> Hi, >> >> From the viewpoint of the enduser we would like python Alignment object >> to behave outside as an array so we could get slices, columns, >> sequences, their fragments, whatever we want etc. The most intuitive and >> clear (certainly much better than not very clear indexes like >> [xxx:yyy:zzz]) for the user is the following. >> >> [A:B][X:Y] - general syntax of indices. This supports almost everything. 
> > I think Michiel and I were suggesting [A:B,X:Y] or rather > [A:B:C,X:Y:Z] to be fully general, rather than [A:B][X:Y] or > [A:B:C][X:Y:Z] > > i.e. [arg1, arg2] rather than [arg1][arg2] > > This is an important point, as in the first case the __getitem__ > method of the alignment is called once (with both arguments). In the > second case, the __getitem__ method is called with arg1, and may > return a SeqRecord or an alignment - and this object's __getitem__ > method is called with arg2. > > As written, many of your cases appear to be impossible - but using the > [arg1,arg2] we can get close. > > I've got a working bit of code put together now which I'll attached to > bug 1944 soon. > > http://bugzilla.open-bio.org/show_bug.cgi?id=1944 > > Peter > > > :. > :. From kosa at genesilico.pl Fri Jul 27 10:35:15 2007 From: kosa at genesilico.pl (Jan Kosinski) Date: Fri, 27 Jul 2007 16:35:15 +0200 Subject: [Biopython-dev] syntax of indices for future Alignment object In-Reply-To: <46A9FD84.4080502@genesilico.pl> References: <46A9CD2E.6080402@genesilico.pl> <46A9E28D.40609@maubp.freeserve.co.uk> <46A9FD84.4080502@genesilico.pl> Message-ID: <46AA02A3.30000@genesilico.pl> Hi, Sorry for a typo ;-) Of course it should read: ... while the [A:B,X:Y] syntax is clear and sufficient." Cheers, Janek Jan Kosinski wrote: > Hi, > > Ok, I agree that [A:B][X:Y] syntax is not possible here. [A:B,X:Y] is > fine. However, I would recommend not using [A:B:C,X:Y:Z] since it is > not clear while the [A:B][X:Y] syntax is clear and sufficient. > > We had another discussion in the lab about that Alignment object > should not store records in the list but rather in a dictionary (but > keeping information about sequence order ) or so. What is you > reasoning for making Alignment object a list of SeqRecord objects? > One should carefully think about design of the Alignment class since > it will influence all further steps. 
As now the class is in its > infancy there is a very good moment for thinking what the Alignment > class is for and what it should support. For instance, the Alignment > object should support changing characters in the alignment without a > need of copying it (using aln[a][x] = "D"). Can it be done now with > Alignment which is a list of SeqRecord objects with sequences > implemented as immutable Seq objects ? > > Cheers, > Jan Kosinski > > > Peter wrote: >> Jan Kosinski wrote: >>> Hi, >>> >>> From the viewpoint of the enduser we would like python Alignment >>> object >>> to behave outside as an array so we could get slices, columns, >>> sequences, their fragments, whatever we want etc. The most intuitive >>> and >>> clear (certainly much better than not very clear indexes like >>> [xxx:yyy:zzz]) for the user is the following. >>> >>> [A:B][X:Y] - general syntax of indices. This supports almost >>> everything. >> >> I think Michiel and I were suggesting [A:B,X:Y] or rather >> [A:B:C,X:Y:Z] to be fully general, rather than [A:B][X:Y] or >> [A:B:C][X:Y:Z] >> >> i.e. [arg1, arg2] rather than [arg1][arg2] >> >> This is an important point, as in the first case the __getitem__ >> method of the alignment is called once (with both arguments). In the >> second case, the __getitem__ method is called with arg1, and may >> return a SeqRecord or an alignment - and this object's __getitem__ >> method is called with arg2. >> >> As written, many of your cases appear to be impossible - but using >> the [arg1,arg2] we can get close. >> >> I've got a working bit of code put together now which I'll attached >> to bug 1944 soon. >> >> http://bugzilla.open-bio.org/show_bug.cgi?id=1944 >> >> Peter >> >> >> :. >> > > :. 
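
The difference between the two index styles discussed in this thread can be checked with a toy class. IndexProbe is a hypothetical name used only for illustration. It records what Python passes to __getitem__ and returns itself so that chained indexing keeps working. In a real Alignment the second call would instead go to whatever object the first call returned.

```python
class IndexProbe:
    """Toy object that records every key passed to __getitem__."""

    def __init__(self):
        self.calls = []

    def __getitem__(self, key):
        self.calls.append(key)
        return self  # returning self lets the chained form be observed too


comma = IndexProbe()
comma[0:2, 1:3]      # one call, key is a tuple of two slice objects

chained = IndexProbe()
chained[0:2][1:3]    # two separate calls, one slice each

print(comma.calls)
print(chained.calls)
```

Because the [arg1][arg2] form only ever sees arg1 in the first call, it cannot tell whether a column index will follow, which is why the [arg1,arg2] form was preferred in the replies.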
From biopython-dev at maubp.freeserve.co.uk Fri Jul 27 13:11:03 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Fri, 27 Jul 2007 18:11:03 +0100 Subject: [Biopython-dev] Improving the Alignment object In-Reply-To: <46A9FD84.4080502@genesilico.pl> References: <46A9CD2E.6080402@genesilico.pl> <46A9E28D.40609@maubp.freeserve.co.uk> <46A9FD84.4080502@genesilico.pl> Message-ID: <46AA2727.103@maubp.freeserve.co.uk> Jan Kosinski wrote: > We had another discussion in the lab about that Alignment object should > not store records in the list but rather in a dictionary (but keeping > information about sequence order ) or so. What is you reasoning for > making Alignment object a list of SeqRecord objects? In a sense the Bio.Align.Generic.Alignment object always was a list of SeqRecords (if you look at the internal implementation that is), and I hadn't stopped to really question it. I like having list like behaviour and exploit this in a lot of my code dealing with alignments. The are some nice things about having dictionary like behaviour in an alignment class, but unless a notional sequence order is preserved, this breaks the array of characters model. Also, using a dictionary like alignment would force the user to specify unique keys for each record (e.g. the record.id) which is something the current list-like-alignment does not require. Perhaps we could have a "dictionary like" sub class of Alignment where the __getitem__ method would allow a record identifier in place of a row index: print aln["P3454"] print aln["P3454", 20] instead or as well as: print aln[10] print aln[10, 20] > One should carefully think about design of the Alignment class since it > will influence all further steps. As now the class is in its infancy > there is a very good moment for thinking what the Alignment class is for > and what it should support. 
I had viewed the new __getitem__ method as a backwards compatible enhancement of the existing stable (but rather limited) Bio.Generic.Alignment class. That's not to say we can't design a new class from scratch - I just prefer gradual improvements without breaking existing usage. I am particularly keen to allow splicing of alignments. For example, you could select the conserved core of an alignment by removing the left most 10 columns and the right most ten columns: align_core = aln[:,10:-10] > For instance, the Alignment object should > support changing characters in the alignment without a need of copying > it (using aln[a,x] = "D"). Can it be done now with Alignment which is > a list of SeqRecord objects with sequences implemented as immutable Seq > objects ? No, right now you can't easily edit sequences in a Bio.Generic.Alignment (even with the proposed change) as it is implemented using immutable Seq objects. I personally haven't needed to edit an alignment like this. Is this something you want to do often? To me the obvious way to handle this is to have a MutableAlignment sub-class, where editing individual elements with aln[r,c] = "D" would be supported (possibly implemented using the MutableSeq class internally rather than the immutable Seq class). On a related point, I was planning to raise the following suggestion in the future - adding alignments, like this: combined_aln = aln1 + aln2 e.g. aln1 had 5 rows of length 10, and aln2 had 5 rows of length 15, then the result of aln1+aln2 would have 5 rows of length 25. Alignment addition would only be defined for alignments with the same number of rows (perhaps also restricted to the same sequence type, and row weights?). The result would contain the same number of rows, where each sequence was the concatenation of the corresponding two rows in the input alignments. 
I'd suggest concatenating the record.id's (if different) however one could argue that it would be better to insist the user had made sure the two alignments had consistent identifiers. An example of where this could be used is taking alignments of multiple sets of homologous genes, sorting them to use the same species order, and then creating a concatenated alignment for robust phylogenetic tree construction. Peter From mdehoon at c2b2.columbia.edu Fri Jul 27 22:57:05 2007 From: mdehoon at c2b2.columbia.edu (Michiel de Hoon) Date: Sat, 28 Jul 2007 11:57:05 +0900 Subject: [Biopython-dev] syntax of indices for future Alignment object In-Reply-To: <46A9FD84.4080502@genesilico.pl> References: <46A9CD2E.6080402@genesilico.pl> <46A9E28D.40609@maubp.freeserve.co.uk> <46A9FD84.4080502@genesilico.pl> Message-ID: <46AAB081.30609@c2b2.columbia.edu> Jan Kosinski wrote: > Hi, > > Ok, I agree that [A:B][X:Y] syntax is not possible here. [A:B,X:Y] is > fine. However, I would recommend not using [A:B:C,X:Y:Z] since it is not > clear while the [A:B][X:Y] syntax is clear and sufficient. Python lists, tuples, and strings support [A:B:C], and Numerical Python 2D arrays support [A:B:C,X:Y:Z]. I don't see why the Alignment should not support this format. --Michiel. 
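
Extended slices with a step, as in [A:B:C,X:Y:Z], need little extra work if the slice objects that Python hands to __getitem__ are simply forwarded to the underlying rows. The sketch below is a toy under stated assumptions, not the patch attached to bug 1944: ToyAlignment is a hypothetical class whose rows are plain strings rather than SeqRecord objects, and a single row index returns a string here only because the toy has no record type.

```python
class ToyAlignment:
    """Toy alignment whose rows are plain strings of equal length."""

    def __init__(self, rows):
        self.rows = list(rows)

    def __getitem__(self, key):
        # aln[r] arrives as a bare key, aln[r, c] as a tuple.
        if isinstance(key, tuple):
            row_key, col_key = key
        else:
            row_key, col_key = key, slice(None)
        if isinstance(row_key, int):
            # Single row: a string (one character if col_key is an int).
            return self.rows[row_key][col_key]
        rows = self.rows[row_key]  # row_key is a slice, steps included
        if isinstance(col_key, int):
            # Single column across the selected rows, as a string.
            return "".join(row[col_key] for row in rows)
        # Slice of rows and slice of columns: a sub alignment.
        return ToyAlignment(row[col_key] for row in rows)


aln = ToyAlignment(["ATCGTTGC", "ATCCTTGC", "ATCCGTGC"])
print(aln[:, 4])            # one column as a string
print(aln[0:2, 1:3].rows)   # rows 0 and 1, columns 1 and 2
print(aln[::2, ::2].rows)   # every second row and every second column
```

The same forwarding also covers negative bounds such as aln[:, 10:-10], since the string slicing does the work.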
From mdehoon at c2b2.columbia.edu Fri Jul 27 23:10:06 2007 From: mdehoon at c2b2.columbia.edu (Michiel de Hoon) Date: Sat, 28 Jul 2007 12:10:06 +0900 Subject: [Biopython-dev] Improving the Alignment object In-Reply-To: <46AA2727.103@maubp.freeserve.co.uk> References: <46A9CD2E.6080402@genesilico.pl> <46A9E28D.40609@maubp.freeserve.co.uk> <46A9FD84.4080502@genesilico.pl> <46AA2727.103@maubp.freeserve.co.uk> Message-ID: <46AAB38E.50009@c2b2.columbia.edu> Peter wrote: > Perhaps we could have a "dictionary like" sub class of Alignment where > the __getitem__ method would allow a record identifier in place of a row > index: > > print aln["P3454"] > print aln["P3454", 20] > > instead or as well as: > > print aln[10] > print aln[10, 20] "as well as" would break if a user decides to use an integer as a key in the dictionary. A safer approach would be to define a method specifically for dictionary-like access. Something like: print aln[10] print aln[10,20] for list-like access, and print aln.get("P3454") for dictionary-like access. --Michiel. From mdehoon at c2b2.columbia.edu Sat Jul 28 00:11:03 2007 From: mdehoon at c2b2.columbia.edu (Michiel de Hoon) Date: Sat, 28 Jul 2007 13:11:03 +0900 Subject: [Biopython-dev] syntax of indices for future Alignment object In-Reply-To: <46A9E28D.40609@maubp.freeserve.co.uk> References: <46A9CD2E.6080402@genesilico.pl> <46A9E28D.40609@maubp.freeserve.co.uk> Message-ID: <46AAC1D7.8030208@c2b2.columbia.edu> Peter wrote: > I've got a working bit of code put together now which I'll attached to > bug 1944 soon. > > http://bugzilla.open-bio.org/show_bug.cgi?id=1944 > For the most part, I agree with the functionality in this patch. 
I have three suggestions though:

>>> aln = Alignment(alphabet)
# Suggestion 1: We should allow creating an Alignment without specifying
# an alphabet
>>> aln.add_sequence("seq1", "ATCGTTGC")
>>> aln.add_sequence("seq2", "ATCCTTGC")
>>> aln.add_sequence("seq3", "ATCCGTGC")
>>> aln[0]
SeqRecord(seq=Seq('ATCGTTGC', Alphabet()), id='',
name='', description='seq1', dbxrefs=[])
# Suggestion 2: I would expect "seq1" as the id rather than the description
>>> aln[:2]
# OK
>>> aln[:,4]
'TTG'
# OK
>>> aln[2,:]
# Suggestion 3: Here, I would expect "ATCCGTGC" instead. An alignment
# consisting of a single sequence doesn't make much sense.

--Michiel.

From mdehoon at c2b2.columbia.edu Sat Jul 28 00:20:24 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Sat, 28 Jul 2007 13:20:24 +0900
Subject: [Biopython-dev] Improving the Alignment object
In-Reply-To: <46AA2727.103@maubp.freeserve.co.uk>
References: <46A9CD2E.6080402@genesilico.pl> <46A9E28D.40609@maubp.freeserve.co.uk> <46A9FD84.4080502@genesilico.pl> <46AA2727.103@maubp.freeserve.co.uk>
Message-ID: <46AAC408.2050703@c2b2.columbia.edu>

Peter wrote:
>> For instance, the Alignment object should
>> support changing characters in the alignment without a need of copying
>> it (using aln[a,x] = "D"). Can it be done now with Alignment which is
>> a list of SeqRecord objects with sequences implemented as immutable Seq
>> objects ?
> ....
>
> To me the obvious way to handle this is to have a MutableAlignment
> sub-class, where editing individual elements with aln[r,c] = "D" would
> be supported (possibly implemented using the MutableSeq class internally
> rather than the immutable Seq class).
>
I don't think we'd need a separate MutableAlignment for that. An
Alignment is a list of sequences and is therefore mutable. If we add a
__setitem__ method to the Alignment class, then this method can take
care of constructing a new sequence and put it in the appropriate row.

--Michiel.
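
The __setitem__ idea above can be sketched with a toy class. This is only an illustration under stated assumptions: EditableToyAlignment is a hypothetical name, and its rows are immutable Python strings standing in for immutable Seq objects. The method builds a new row and stores it in place of the old one, leaving the old object untouched.

```python
class EditableToyAlignment:
    """Toy alignment whose rows are immutable strings."""

    def __init__(self, rows):
        self.rows = list(rows)

    def __setitem__(self, key, letter):
        row, col = key  # expects aln[row, col] = letter
        if len(letter) != 1:
            raise ValueError("expected a single character")
        old = self.rows[row]
        if not 0 <= col < len(old):
            raise IndexError("column index out of range")
        # Build a new immutable row and put it in the list;
        # the list is mutable even though each row is not.
        self.rows[row] = old[:col] + letter + old[col + 1:]


aln = EditableToyAlignment(["ATCGTTGC", "ATCCTTGC", "ATCCGTGC"])
original_row = aln.rows[1]
aln[1, 3] = "D"
print(aln.rows[1])
print(original_row)
```

Each assignment copies one row, so the cost grows with the row length rather than with the size of the whole alignment.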
From biopython-dev at maubp.freeserve.co.uk Sat Jul 28 06:04:04 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Sat, 28 Jul 2007 11:04:04 +0100 Subject: [Biopython-dev] syntax of indices for future Alignment object In-Reply-To: <46AAC1D7.8030208@c2b2.columbia.edu> References: <46A9CD2E.6080402@genesilico.pl> <46A9E28D.40609@maubp.freeserve.co.uk> <46AAC1D7.8030208@c2b2.columbia.edu> Message-ID: <46AB1494.301@maubp.freeserve.co.uk> Michiel de Hoon wrote: > Peter wrote: >> I've got a working bit of code put together now which I'll attached to >> bug 1944 soon. >> >> http://bugzilla.open-bio.org/show_bug.cgi?id=1944 >> > For the most part, I agree with the functionality in this patch. I have > three suggestions though: > > >>> aln = Alignment(alphabet) > # Suggestion 1: We should allow creating an Alignment without specifying > an alphabet That would mean changing the existing __init__ from: def __init__(self, alphabet): to something like: def __init__(self, alphabet=single_letter_alphabet): with this import statement added: from Bio.Alphabet import single_letter_alphabet This seems like a good idea, and shouldn't break any existing code either. > >>> aln.add_sequence("seq1", "ATCGTTGC") > >>> aln.add_sequence("seq2", "ATCCTTGC") > >>> aln.add_sequence("seq3", "ATCCGTGC") > >>> aln[0] > SeqRecord(seq=Seq('ATCGTTGC', Alphabet()), id='', > name='', description='seq1', dbxrefs=[]) > # Suggestion 2: I would expect "seq1" as the id rather than the description I agree with you here - this is the historic behaviour of the add_sequence method which actually creates a SeqRecord from the strings it is given. I would suggest it populate the record.id but for backwards compatibility still populate the record.description in case anyone is still using that. We also could add an add_record method to the alignment object which takes a SeqRecord, plus optional weight (and start and end?). 
Marc Colosimo also made this point on bug 1944 (although I don't like his mixed case method name). > >>> aln[:2] > > # OK > >>> aln[:,4] > 'TTG' > # OK > >>> aln[2,:] > > # Suggestion 3: Here, I would expect "ATCCGTGC" instead. An alignment > consisting of a single sequence doesn't make much sense. I'll have a closer look, but as aln[2] returns a single SeqRecord maybe aln[2,:] should do that too - rather than returning a string. Peter From biopython-dev at maubp.freeserve.co.uk Sat Jul 28 09:14:43 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Sat, 28 Jul 2007 14:14:43 +0100 Subject: [Biopython-dev] Improving the Alignment object In-Reply-To: <46AAC408.2050703@c2b2.columbia.edu> References: <46A9CD2E.6080402@genesilico.pl> <46A9E28D.40609@maubp.freeserve.co.uk> <46A9FD84.4080502@genesilico.pl> <46AA2727.103@maubp.freeserve.co.uk> <46AAC408.2050703@c2b2.columbia.edu> Message-ID: <46AB4143.5070406@maubp.freeserve.co.uk> Michiel de Hoon wrote: > Peter wrote: >>> For instance, the Alignment object should >>> support changing characters in the alignment without a need of copying >>> it (using aln[a,x] = "D"). Can it be done now with Alignment which is >>> a list of SeqRecord objects with sequences implemented as immutable Seq >>> objects ? > .... >> To me the obvious way to handle this is to have a MutableAlignment >> sub-class, where editing individual elements with aln[r,c] = "D" would >> be supported (possibly implemented using the MutableSeq class internally >> rather than the immutable Seq class). >> > I don't think we'd need a separate MutableAlignment for that. An > Alignment is a list of sequences and is therefore mutable. If we add a > __setitem__ method to the Alignment class, then this method can take > care of constructing a new sequence and put it in the appropriate row. > So rather than editing one character of a MutableSeq, we could replace one immutable Seq object with a new immutable Seq object where one character was different? 
That would work - sounds a little slow, but certainly possible.

Peter

From mdehoon at c2b2.columbia.edu Sat Jul 28 11:15:49 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Sun, 29 Jul 2007 00:15:49 +0900
Subject: [Biopython-dev] syntax of indices for future Alignment object
In-Reply-To: <46AB1494.301@maubp.freeserve.co.uk>
References: <46A9CD2E.6080402@genesilico.pl> <46A9E28D.40609@maubp.freeserve.co.uk> <46AAC1D7.8030208@c2b2.columbia.edu> <46AB1494.301@maubp.freeserve.co.uk>
Message-ID: <46AB5DA5.6050604@c2b2.columbia.edu>

# Current method to add a row to the alignment:
>>> aln.add_sequence("seq1", "ATCGTTGC")
...

Peter wrote:
> We also could add an add_record method to the alignment object which
> takes a SeqRecord, plus optional weight (and start and end?). Marc
> Colosimo also made this point on bug 1944 (although I don't like his
> mixed case method name).

This is Marc Colosimo's suggestion for adding a SeqRecord:

def addSeqRecord(self, seqRec):
    """Add a Sequence Record to the Alignment

    @param seqRec: a sequence record (SeqRecord) to add.
    """
    if isinstance(seqRec, SeqRecord):
        self._records.append(seqRec)
    else:
        raise TypeError("sequence is NOT a SeqRecord Object")

Since an Alignment is essentially a list of SeqRecords, I propose that
we call the method to add a row to this list "append". In addition, this
method should be able to take a SeqRecord, a Seq object, or a plain
string. Something like this:

def append(self, sequence):
    if isinstance(sequence, SeqRecord):
        self._records.append(sequence)
    elif isinstance(sequence, Seq):
        self._records.append(SeqRecord(sequence))
    elif isinstance(sequence, str):
        self._records.append(SeqRecord(Seq(sequence)))
    else:
        raise TypeError("sequence should be a string, a Seq Object, "
                        "or a SeqRecord object")

This method can be generalized to allow a descriptor, weight, start, and
end, just like in the current add_sequence method. Then we can replace
add_sequence and addSeqRecord by a single append method.

--Michiel.
From mdehoon at c2b2.columbia.edu Sat Jul 28 11:17:52 2007 From: mdehoon at c2b2.columbia.edu (Michiel de Hoon) Date: Sun, 29 Jul 2007 00:17:52 +0900 Subject: [Biopython-dev] syntax of indices for future Alignment object In-Reply-To: <46AB1494.301@maubp.freeserve.co.uk> References: <46A9CD2E.6080402@genesilico.pl> <46A9E28D.40609@maubp.freeserve.co.uk> <46AAC1D7.8030208@c2b2.columbia.edu> <46AB1494.301@maubp.freeserve.co.uk> Message-ID: <46AB5E20.5090605@c2b2.columbia.edu> Peter wrote: > Michiel de Hoon wrote: >> >>> aln.add_sequence("seq1", "ATCGTTGC") >> >>> aln[0] >> SeqRecord(seq=Seq('ATCGTTGC', Alphabet()), id='', >> name='', description='seq1', dbxrefs=[]) >> # Suggestion 2: I would expect "seq1" as the id rather than the >> description > > I agree with you here - this is the historic behaviour of the > add_sequence method which actually creates a SeqRecord from the strings > it is given. I would suggest it populate the record.id but for backwards > compatibility still populate the record.description in case anyone is > still using that. > That sounds good to me. --Michiel. From mdehoon at c2b2.columbia.edu Sat Jul 28 11:23:51 2007 From: mdehoon at c2b2.columbia.edu (Michiel de Hoon) Date: Sun, 29 Jul 2007 00:23:51 +0900 Subject: [Biopython-dev] Improving the Alignment object In-Reply-To: <46AB4143.5070406@maubp.freeserve.co.uk> References: <46A9CD2E.6080402@genesilico.pl> <46A9E28D.40609@maubp.freeserve.co.uk> <46A9FD84.4080502@genesilico.pl> <46AA2727.103@maubp.freeserve.co.uk> <46AAC408.2050703@c2b2.columbia.edu> <46AB4143.5070406@maubp.freeserve.co.uk> Message-ID: <46AB5F87.1090506@c2b2.columbia.edu> Peter wrote: > Michiel de Hoon wrote: >> Peter wrote: >>>> For instance, the Alignment object should >>>> support changing characters in the alignment without a need of >>>> copying it (using aln[a,x] = "D"). Can it be done now with >>>> Alignment which is a list of SeqRecord objects with sequences >>>> implemented as immutable Seq objects ? >> .... 
>>> To me the obvious way to handle this is to have a MutableAlignment >>> sub-class, where editing individual elements with aln[r,c] = "D" >>> would be supported (possibly implemented using the MutableSeq class >>> internally rather than the immutable Seq class). >>> >> I don't think we'd need a separate MutableAlignment for that. An >> Alignment is a list of sequences and is therefore mutable. If we add a >> __setitem__ method to the Alignment class, then this method can take >> care of constructing a new sequence and put it in the appropriate row. >> > So rather than editing one character of a MutableSeq, we could replace > one immutable Seq object with a new immutable Seq object where one > character was different? That would work - sounds a little slow, but > certainly possible. > At first, I also thought that that would be slow, especially for long sequences. But in practice, it's surprisingly fast. Unless somebody wants to edit an alignment of chromosome-size sequences, we probably won't run into a speed problem. --Michiel. From biopython-dev at maubp.freeserve.co.uk Sat Jul 28 12:00:34 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Sat, 28 Jul 2007 17:00:34 +0100 Subject: [Biopython-dev] adding rows to an alignment object In-Reply-To: <46AB5DA5.6050604@c2b2.columbia.edu> References: <46A9CD2E.6080402@genesilico.pl> <46A9E28D.40609@maubp.freeserve.co.uk> <46AAC1D7.8030208@c2b2.columbia.edu> <46AB1494.301@maubp.freeserve.co.uk> <46AB5DA5.6050604@c2b2.columbia.edu> Message-ID: <46AB6822.6090706@maubp.freeserve.co.uk> Michiel de Hoon wrote: > Since an Alignment is essentially a list of SeqRecords, I propose that > we call the method to add a row to this list "append". Sounds very sensible. > In addition, this method should be able to take a SeqRecord, a Seq > object, or a plain string. Do you really think we should complicate things like this? I would just accept SeqRecord objects (with optional start/end/weight). 
> Something like this: > > def append(self, sequence): > if isinstance(sequence, SeqRecord): > self._records.append(sequence) > elif isinstance(sequence, Seq): > self._records.append(SeqRecord(sequence)) > elif isinstance(sequence, str): > self._records.append(SeqRecord(Seq(sequence))) > else: > raise TypeError("sequence should be a string, a Seq Object, > or a SeqRecord object") One minor point - we should use the alignment's alphabet when building a Seq object from a string. Perhaps we should even check the alphabet when asked to append a SeqRecord or Seq object... > This method can be generalized to allow a descriptor, weight, start, > end, just like in the current add_sequence method. Where the descriptor is expected for Seq and string input, and used as the SeqRecord's id? I would personally check the length matches the rest of the alignment (something the current add_sequence method doesn't do) otherwise its very easy to get a malformed alignment where some sequences are longer than others. Also, I would leave the existing .add_sequence() method in place, but update its docstring to encourage use of .append() instead. 
Peter From biopython-dev at maubp.freeserve.co.uk Sat Jul 28 11:49:11 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Sat, 28 Jul 2007 16:49:11 +0100 Subject: [Biopython-dev] syntax of indices for future Alignment object In-Reply-To: <46AB5E20.5090605@c2b2.columbia.edu> References: <46A9CD2E.6080402@genesilico.pl> <46A9E28D.40609@maubp.freeserve.co.uk> <46AAC1D7.8030208@c2b2.columbia.edu> <46AB1494.301@maubp.freeserve.co.uk> <46AB5E20.5090605@c2b2.columbia.edu> Message-ID: <46AB6577.6050708@maubp.freeserve.co.uk> Michiel de Hoon wrote: > Peter wrote: >> Michiel de Hoon wrote: >>> >>> aln.add_sequence("seq1", "ATCGTTGC") >>> >>> aln[0] >>> SeqRecord(seq=Seq('ATCGTTGC', Alphabet()), id='', >>> name='', description='seq1', dbxrefs=[]) >>> # Suggestion 2: I would expect "seq1" as the id rather than the >>> description >> I agree with you here - this is the historic behaviour of the >> add_sequence method which actually creates a SeqRecord from the strings >> it is given. I would suggest it populate the record.id but for backwards >> compatibility still populate the record.description in case anyone is >> still using that. >> > That sounds good to me. Good. Done, CVS revision 1.6 of file Bio/Align/Generic.py Peter From kosa at genesilico.pl Sat Jul 28 12:53:04 2007 From: kosa at genesilico.pl (Jan Kosinski) Date: Sat, 28 Jul 2007 18:53:04 +0200 Subject: [Biopython-dev] syntax of indices for future Alignment object In-Reply-To: <46AAB081.30609@c2b2.columbia.edu> References: <46A9CD2E.6080402@genesilico.pl> <46A9E28D.40609@maubp.freeserve.co.uk> <46A9FD84.4080502@genesilico.pl> <46AAB081.30609@c2b2.columbia.edu> Message-ID: <46AB7470.6010006@genesilico.pl> Hi, I just do not see what [A:B:C,X:Y:Z] adds to [A:B,X:Y] in case of alignments. Ins't [A:B,X:Y] sufficient? Janek Michiel de Hoon wrote: > Jan Kosinski wrote: >> Hi, >> >> Ok, I agree that [A:B][X:Y] syntax is not possible here. [A:B,X:Y] is >> fine. 
However, I would recommend not using [A:B:C,X:Y:Z] since it is >> not clear while the [A:B][X:Y] syntax is clear and sufficient. > > Python lists, tuples, and strings support [A:B:C], and Numerical > Python 2D arrays support [A:B:C,X:Y:Z]. I don't see why the Alignment > should not support this format. > > --Michiel. > > :. > :. From kosa at genesilico.pl Sat Jul 28 12:55:33 2007 From: kosa at genesilico.pl (Jan Kosinski) Date: Sat, 28 Jul 2007 18:55:33 +0200 Subject: [Biopython-dev] Improving the Alignment object In-Reply-To: <46AAC408.2050703@c2b2.columbia.edu> References: <46A9CD2E.6080402@genesilico.pl> <46A9E28D.40609@maubp.freeserve.co.uk> <46A9FD84.4080502@genesilico.pl> <46AA2727.103@maubp.freeserve.co.uk> <46AAC408.2050703@c2b2.columbia.edu> Message-ID: <46AB7505.30302@genesilico.pl> Hi, I think the same, an alignment should be mutable and there is no need for making two classes, mutable and not mutable. Janek Michiel de Hoon wrote: > Peter wrote: >>> For instance, the Alignment object should >>> support changing characters in the alignment without a need of >>> copying it (using aln[a,x] = "D"). Can it be done now with >>> Alignment which is a list of SeqRecord objects with sequences >>> implemented as immutable Seq objects ? >> > .... >> >> To me the obvious way to handle this is to have a MutableAlignment >> sub-class, where editing individual elements with aln[r,c] = "D" >> would be supported (possibly implemented using the MutableSeq class >> internally rather than the immutable Seq class). >> > I don't think we'd need a separate MutableAlignment for that. An > Alignment is a list of sequences and is therefore mutable. If we add a > __setitem__ method to the Alignment class, then this method can take > care of constructing a new sequence and put it in the appropriate row. > > --Michiel. > > :. > :. 
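
To make the two points in this thread concrete (the [A:B:C,X:Y:Z] index form, and a __setitem__ that swaps in a new immutable row), here is a small sketch. It is an assumption-laden toy, not the Biopython Alignment: TinyAlignment is a hypothetical class that stores rows as plain Python strings, and the return types it uses were still under discussion on the list.

```python
class TinyAlignment:
    """Toy alignment: a list of equal-length, immutable strings (hypothetical)."""

    def __init__(self, rows):
        self._rows = list(rows)

    def __getitem__(self, index):
        rows, cols = index                    # expects aln[rows, cols]
        if isinstance(rows, int):
            return self._rows[rows][cols]     # str indexing already handles X:Y:Z
        # Row slice (A:B:C): apply the column index to each selected row.
        return [row[cols] for row in self._rows[rows]]

    def __setitem__(self, index, letter):
        row, col = index                      # this sketch only handles aln[row, col] = "D"
        if not isinstance(col, int) or len(letter) != 1:
            raise ValueError("only single-character assignment is sketched here")
        old = self._rows[row]
        col = range(len(old))[col]            # normalise negative indices, check bounds
        # Strings are immutable, so build a new row and replace the old one.
        self._rows[row] = old[:col] + letter + old[col + 1:]


aln = TinyAlignment(["ATCGTTGC", "ATCGATGC", "ATCGTTGA"])
aln[0, 3] = "D"
first_four = aln[0, 0:4]          # "ATCD"
every_other = aln[0:2, 0:8:2]     # ["ACTG", "ACAG"]
column = aln[0:3:2, 3]            # ["D", "G"]
```

Because Python passes aln[A:B:C, X:Y:Z] to __getitem__ as a tuple of two slice objects, supporting the step comes for free once the two-slice form is handled.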
From mdehoon at c2b2.columbia.edu Sun Jul 29 00:38:28 2007 From: mdehoon at c2b2.columbia.edu (Michiel de Hoon) Date: Sun, 29 Jul 2007 13:38:28 +0900 Subject: [Biopython-dev] syntax of indices for future Alignment object In-Reply-To: <46AB7470.6010006@genesilico.pl> References: <46A9CD2E.6080402@genesilico.pl> <46A9E28D.40609@maubp.freeserve.co.uk> <46A9FD84.4080502@genesilico.pl> <46AAB081.30609@c2b2.columbia.edu> <46AB7470.6010006@genesilico.pl> Message-ID: <46AC19C4.1000102@c2b2.columbia.edu> Jan Kosinski wrote: > I just do not see what [A:B:C,X:Y:Z] adds to [A:B,X:Y] in case of > alignments. Ins't [A:B,X:Y] sufficient? > [A:B,X:Y] may be sufficient, but does not agree with Python indices for other objects (lists, tuples, strings). In addition, since allowing [A:B,X:Y] only is different from usual Python usage, we'd actually end up writing more code to specifically disallow [A:B:C,X:Y:Z]. Note also that [A:B:C,X:Y:Z] includes [A:B,X:Y] as a special case. So if the Alignment class is written to deal with [A:B:C,X:Y:Z], but I'd tell you that it expects [A:B,X:Y], then you wouldn't notice any difference. Until you'd try [A:B:C,X:Y:Z] and you find out that that works too. --Michiel. From mdehoon at c2b2.columbia.edu Tue Jul 31 21:50:05 2007 From: mdehoon at c2b2.columbia.edu (Michiel De Hoon) Date: Tue, 31 Jul 2007 21:50:05 -0400 Subject: [Biopython-dev] Improving the Alignment object References: <46A9CD2E.6080402@genesilico.pl> <46A9E28D.40609@maubp.freeserve.co.uk> <46A9FD84.4080502@genesilico.pl> <46AA2727.103@maubp.freeserve.co.uk> <46AAC408.2050703@c2b2.columbia.edu> Message-ID: <6243BAA9F5E0D24DA41B27997D1FD14402B5FD@mail2.exch.c2b2.columbia.edu> Peter wrote: > I'm not sure if requests for part of a single row or column like > [rrr, xxx:yyy:zzz] and [rrr:ppp:qqq, xxx] are best handled by returning > sub-alignments or as special cases (strings/Seq and Seq/SeqRecord > respectively?). 
Jan wrote: > For instance, the Alignment object should > support changing characters in the alignment without a need of copying > it (using aln[a,x] = "D"). Can it be done now with Alignment which is > a list of SeqRecord objects with sequences implemented as immutable Seq > objects ? > If we allow >>> aln[a,x] = "D" then we should also allow >>> aln[a,x:x+4] = "DEFG" >>> aln[a:a+5,x] = "KLMNO" and perhaps even >>> aln[a:a+5,x:x+3] = ["KLMNO","PQRST","UVWXY"] For consistency, I feel that then aln[a,x:y] and aln[a:b,x] should both return a string. --Michiel Michiel de Hoon Center for Computational Biology and Bioinformatics Columbia University 1150 St Nicholas Avenue New York, NY 10032 From bugzilla-daemon at portal.open-bio.org Sun Jul 1 02:55:31 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 30 Jun 2007 22:55:31 -0400 Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64 In-Reply-To: Message-ID: <200707010255.l612tVwN022655@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2323 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #15 from mdehoon at ims.u-tokyo.ac.jp 2007-06-30 22:55 EST ------- I've added the functions gcg and seguid to Bio/SeqUtils/CheckSum.py. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Sun Jul 1 03:23:02 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 30 Jun 2007 23:23:02 -0400 Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64 In-Reply-To: Message-ID: <200707010323.l613N24V023919@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2323 ------- Comment #16 from sbassi at gmail.com 2007-06-30 23:23 EST ------- (In reply to comment #15) > I've added the functions gcg and seguid to Bio/SeqUtils/CheckSum.py. > This code won't run on Python 2.3: ============================================= sbassi at hp:~/bioinfo$ python Python 2.3.4 (#2, Jun 16 2005, 18:52:31) [GCC 3.3.5 (Debian 1:3.3.5-3)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import CheckSum Traceback (most recent call last): File "", line 1, in ? File "CheckSum.py", line 50 return sum(n*ord(c.upper()) for (n,c) in izip(cycle(range(1,58)),seq)) % 10000 ^ SyntaxError: invalid syntax ========================================== That is why I made a separate module for Python 2.4+ -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
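
For readers who want to try the checksum formula shown in the traceback above, here is a minimal sketch for current Python, where the built-in zip takes the place of izip. The function name gcg follows the thread; the code is a rewrite of the quoted one-liner for illustration, not a copy of what is in Bio/SeqUtils/CheckSum.py.

```python
from itertools import cycle


def gcg(seq):
    """GCG-style checksum: weights cycle through 1..57, sum taken modulo 10000."""
    weights = cycle(range(1, 58))
    # zip stops at the end of seq, so the endless cycle is safe here.
    return sum(n * ord(c.upper()) for n, c in zip(weights, seq)) % 10000


checksum = gcg("ACGT")   # 1*65 + 2*67 + 3*71 + 4*84 = 748
```

The generator expression is the construct that Python 2.3 rejects with the SyntaxError shown above; it was only introduced in Python 2.4.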
From bugzilla-daemon at portal.open-bio.org Sun Jul 1 14:13:29 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 1 Jul 2007 10:13:29 -0400 Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64 In-Reply-To: Message-ID: <200707011413.l61EDTF3012907@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2323 ------- Comment #19 from sbassi at gmail.com 2007-07-01 10:13 EST ------- (In reply to comment #18) > Btw, I am finding that the code for Python < 2.3 is faster than the code for > Python >= 2.4. The former uses more memory, as it makes a copy of seq, but even > if we avoid copying seq, I still find that it is faster than the code for > Python >= 2.4. OK, so leave it w/o the check for python version and use just the 2.3 code. Best, SB. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sun Jul 1 22:38:55 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 1 Jul 2007 18:38:55 -0400 Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64 In-Reply-To: Message-ID: <200707012238.l61Mct1k007379@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2323 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Status|REOPENED |RESOLVED Resolution| |FIXED ------- Comment #20 from mdehoon at ims.u-tokyo.ac.jp 2007-07-01 18:38 EST ------- Updated in CVS, using the 2.3 code without copying seq. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Sun Jul 1 23:42:14 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 1 Jul 2007 19:42:14 -0400 Subject: [Biopython-dev] [Bug 2327] New: test_Cluster takes too long Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2327 Summary: test_Cluster takes too long Product: Biopython Version: 1.43 Platform: Other OS/Version: Linux Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: idoerg at burnham.org When running the biopython test suite, test_Cluster takes too long. I gave up after 2 minutes. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sun Jul 1 23:55:34 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 1 Jul 2007 19:55:34 -0400 Subject: [Biopython-dev] [Bug 2327] test_Cluster takes too long In-Reply-To: Message-ID: <200707012355.l61NtYcR012177@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2327 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |DUPLICATE ------- Comment #1 from mdehoon at ims.u-tokyo.ac.jp 2007-07-01 19:55 EST ------- *** This bug has been marked as a duplicate of bug 2268 *** -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Sun Jul 1 23:55:36 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 1 Jul 2007 19:55:36 -0400 Subject: [Biopython-dev] [Bug 2268] Cluster unit test suite runs indefinitely In-Reply-To: Message-ID: <200707012355.l61NtaNW012196@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2268 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |idoerg at gmail.com ------- Comment #22 from mdehoon at ims.u-tokyo.ac.jp 2007-07-01 19:55 EST ------- *** Bug 2327 has been marked as a duplicate of this bug. *** -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Jul 3 11:03:40 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 3 Jul 2007 07:03:40 -0400 Subject: [Biopython-dev] [Bug 2328] New: NCBIStandalone.blastall chokes on integer argument Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2328 Summary: NCBIStandalone.blastall chokes on integer argument Product: Biopython Version: 1.43 Platform: PC OS/Version: Linux Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: grunberg at embl.de CC: grunberg at embl.de Unlike previous versions, the current NCBIStandalone.blastall and blastpgp expect that the argument align_view is given as a string rather than an integer. 
So the following call worked with previous versions but now fails::

    results, err = NCBIStandalone.blastall( settings.blast_bin,
                                            method, db, seqFile,
                                            expectation=e,
                                            align_view=7, ## XML output
                                            **kw)

The error is raised here::

    NCBIStandalone: 1788 (blastall)
    w, r, e = os.popen3(" ".join([blastcmd] + params))

because align_view escapes the str conversion of the other parameters in
this line::

    params.extend([att2param['align_view'], align_view])

This line should rather look like this::

    params.extend([att2param['align_view'], str(align_view)])

I am going to attach a patch to this bugreport.

Greetings,
Raik

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Tue Jul 3 11:05:37 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 3 Jul 2007 07:05:37 -0400
Subject: [Biopython-dev] [Bug 2328] NCBIStandalone.blastall chokes on integer argument
In-Reply-To:
Message-ID: <200707031105.l63B5bAP013190@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2328

------- Comment #1 from grunberg at embl.de 2007-07-03 07:05 EST -------
Created an attachment (id=698)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=698&action=view)
patch for Bug 2328 (NCBIStandalone.blastall / blastpgp)

The patch is described in my bug report.

Cheers,
Raik

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
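
The failure described in this report can be reproduced without BLAST installed, because it comes from str.join rejecting non-string items. The sketch below is illustrative only: build_command is a hypothetical helper invented for this example, not a function in NCBIStandalone, and the flag name passed to it is just sample data.

```python
def build_command(blastcmd, **options):
    """Join a command and its options into one shell string (illustrative helper)."""
    params = []
    for flag, value in options.items():
        # str() lets integer values such as 7 pass through the join below.
        params.extend(["-" + flag, str(value)])
    return " ".join([blastcmd] + params)


# Without str(), joining a list that contains an int raises TypeError.
try:
    " ".join(["blastall", "-m", 7])
    join_failed = False
except TypeError:
    join_failed = True

command = build_command("blastall", m=7)   # "blastall -m 7"
```

Converting every value at the point where the parameter list is built, rather than per option, avoids the situation in the report where one option slipped past the conversion applied to the others.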
From bugzilla-daemon at portal.open-bio.org Tue Jul 3 23:26:15 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 3 Jul 2007 19:26:15 -0400 Subject: [Biopython-dev] [Bug 2328] NCBIStandalone.blastall chokes on integer argument In-Reply-To: Message-ID: <200707032326.l63NQFBB022873@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2328 ------- Comment #2 from mdehoon at ims.u-tokyo.ac.jp 2007-07-03 19:26 EST ------- > Unlike previous versions, the current NCBIStandalone.blastall and blastpgp > expect that the argument align_view is given as a string rather than an > integer. So the following call worked with previous versions but now fails:: In which previous version of Biopython did this work? Your patch looks fine, but I'd like to find out how this bug entered Biopython. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Jul 5 13:30:32 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 5 Jul 2007 09:30:32 -0400 Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64 In-Reply-To: Message-ID: <200707051330.l65DUW2k004459@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2323 ------- Comment #21 from dalloliogm at gmail.com 2007-07-05 09:30 EST ------- (In reply to comment #1) > Created an attachment (id=689) --> (http://bugzilla.open-bio.org/attachment.cgi?id=689&action=view) [details] > Proposed functions (CRC64 and GCG checksum) > > This could be in utils.py, but I am not sure. Maybe it could be useful to add a 'GCG checksum' attribute to the BioPython Seq object. 
Checksums could be used to quickly compare if two sequences are the same; but in the documentation you should state very clearly that two sequences which differ even for a single symbol (ex. AAANAAA and AAAAAAA) have different values. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sat Jul 7 09:28:56 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 7 Jul 2007 05:28:56 -0400 Subject: [Biopython-dev] [Bug 2328] NCBIStandalone.blastall chokes on integer argument In-Reply-To: Message-ID: <200707070928.l679SuTJ010432@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2328 ------- Comment #3 from grunberg at embl.de 2007-07-07 05:28 EST ------- (In reply to comment #2) > > Unlike previous versions, the current NCBIStandalone.blastall and blastpgp > > expect that the argument align_view is given as a string rather than an > > integer. So the following call worked with previous versions but now fails:: > > In which previous version of Biopython did this work? Your patch looks fine, > but I'd like to find out how this bug entered Biopython. > Sorry about the late reply... My previous Biopython installation (which didn't have the glitch) was version 1.42. Greetings Raik -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Sun Jul 8 04:20:12 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 8 Jul 2007 00:20:12 -0400 Subject: [Biopython-dev] [Bug 2328] NCBIStandalone.blastall chokes on integer argument In-Reply-To: Message-ID: <200707080420.l684KCSq031646@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2328 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #4 from mdehoon at ims.u-tokyo.ac.jp 2007-07-08 00:20 EST ------- Fixed in CVS (see biopython/Bio/Blast/NCBIStandalone.py revision 1.68). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From chengsoon.ong at tuebingen.mpg.de Mon Jul 9 10:15:50 2007 From: chengsoon.ong at tuebingen.mpg.de (Cheng Soon Ong) Date: Mon, 9 Jul 2007 12:15:50 +0200 Subject: [Biopython-dev] Bio.Blast.NCBIWWW.qblast Message-ID: Hi, I've just written a small extension to the qblast function. The current version of only passes a subset of parameters to NCBI. I've just written some code such that it passes all the parameters that the qblast API at NCBI accepts. Is anyone interested to merge this into the blast module of Biopython? Sorry, I do not know the protocol here for getting code into Biopython. Cheng From mdehoon at c2b2.columbia.edu Mon Jul 9 11:40:23 2007 From: mdehoon at c2b2.columbia.edu (Michiel de Hoon) Date: Mon, 09 Jul 2007 20:40:23 +0900 Subject: [Biopython-dev] Bio.Blast.NCBIWWW.qblast In-Reply-To: References: Message-ID: <46921EA7.2080106@c2b2.columbia.edu> Dear Cheng, Thank you for your contribution. 
The "official" way to contribute code to Biopython is to open a bug report at http://bugzilla.open-bio.org/, open a new bug report, and add your code to it. For your qblast code, you can also just send it to me (not to the list), then I can merge it into Biopython. --Michiel. Cheng Soon Ong wrote: > Hi, > > I've just written a small extension to the qblast function. The > current version of only passes a subset of parameters to NCBI. I've > just written some code such that it passes all the parameters that > the qblast API at NCBI accepts. > > Is anyone interested to merge this into the blast module of > Biopython? Sorry, I do not know the protocol here for getting code > into Biopython. > From biopython-dev at maubp.freeserve.co.uk Tue Jul 10 19:31:55 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Tue, 10 Jul 2007 20:31:55 +0100 Subject: [Biopython-dev] FDist: more Population Genetics code In-Reply-To: <6d941f120701030516m1adb3daeh6e4645121ba8679d@mail.gmail.com> References: <6d941f120701030516m1adb3daeh6e4645121ba8679d@mail.gmail.com> Message-ID: <4693DEAB.8000900@maubp.freeserve.co.uk> Hi Tiago, Have you had any feedback (off the mailing list)? Ralph - did you have a chance to look over Tiago's code or discuss this with him? It would be a shame if nothing came from this... Peter Tiago Ant?o wrote: > Hi! > > I have submitted another enhancement bug, with support for FDist. It > allows to generate and parse Fdist files and to control fdist > applications. There are also a couple of utility functions. FDist is a > niche application (mainly used to detect selection in animal > genetics). Not the most fundamental one to support, but it is > currently one that I am working on, thus, the code. > > Regarding my summited code for GenePop, I have summited a different > version on bugzilla. The main difference, is that I moved everything > from Bio to Bio.PopGen. 
> > Before I continue putting code on bugzilla I would like to know if it > is worthwhile doing it... Any opinions on the code submitted or if any > changes are required? I would really like to continue converting my > code to BioPython, but only if it has any possibility of ending up > being useful/included in distribution somewhere in the future... ;) > > I am currently working on code related to SimCoal2, Arlequin and > general statistics (Fst, heterozygosity, ...). Which will probably be > ready quite soon (ie, next two weeks). This is more mainstream than > FDist > > I have some other code lying around mainly related to HapMap, but I > will only submit it after reviewing and reusing it again. This is more > distant future ... like a couple of months. > > Tiago From biopython-dev at maubp.freeserve.co.uk Tue Jul 10 21:12:44 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Tue, 10 Jul 2007 22:12:44 +0100 Subject: [Biopython-dev] FDist: more Population Genetics code In-Reply-To: References: <6d941f120701030516m1adb3daeh6e4645121ba8679d@mail.gmail.com> <4693DEAB.8000900@maubp.freeserve.co.uk> Message-ID: <4693F64C.9050403@maubp.freeserve.co.uk> Ralph Haygood wrote: > Peter, > > I haven't received any code from Tiago to review. > > Ralph He's put some on Bugzilla: http://bugzilla.open-bio.org/show_bug.cgi?id=2170 Peter From rhaygood at duke.edu Wed Jul 11 03:45:56 2007 From: rhaygood at duke.edu (Ralph Haygood) Date: Tue, 10 Jul 2007 23:45:56 -0400 (EDT) Subject: [Biopython-dev] FDist: more Population Genetics code In-Reply-To: <4693DEAB.8000900@maubp.freeserve.co.uk> References: <6d941f120701030516m1adb3daeh6e4645121ba8679d@mail.gmail.com> <4693DEAB.8000900@maubp.freeserve.co.uk> Message-ID: Peter and Tiago, Hello. No, I haven't done anything with Tiago's code. I'm afraid it's pretty far from what I'm working on these days. 
I still think it would be good for BioPython to include methods for computing basic population-genetical statistics (Watterson's theta, Tajima's D, etc.) from DNA alignments. I have in mind something like BioPerl's PopGen (http://www.bioperl.org/wiki/HOWTO:PopGen). My own code is easy to use with a Bio.Align.Generic.Alignment, but it doesn't conform to BioPython's standards for style, testing, or documentation, and I don't know when I'll have time to standardize it. Ralph On Tue, 10 Jul 2007, Peter wrote: > Hi Tiago, > > Have you had any feedback (off the mailing list)? > > Ralph - did you have a chance to look over Tiago's code or discuss this with > him? > > It would be a shame if nothing came from this... > > Peter > > Tiago Ant?o wrote: >> Hi! >> >> I have submitted another enhancement bug, with support for FDist. It >> allows to generate and parse Fdist files and to control fdist >> applications. There are also a couple of utility functions. FDist is a >> niche application (mainly used to detect selection in animal >> genetics). Not the most fundamental one to support, but it is >> currently one that I am working on, thus, the code. >> >> Regarding my summited code for GenePop, I have summited a different >> version on bugzilla. The main difference, is that I moved everything >> from Bio to Bio.PopGen. >> >> Before I continue putting code on bugzilla I would like to know if it >> is worthwhile doing it... Any opinions on the code submitted or if any >> changes are required? I would really like to continue converting my >> code to BioPython, but only if it has any possibility of ending up >> being useful/included in distribution somewhere in the future... ;) >> >> I am currently working on code related to SimCoal2, Arlequin and >> general statistics (Fst, heterozygosity, ...). Which will probably be >> ready quite soon (ie, next two weeks). 
This is more mainstream than >> FDist >> >> I have some other code lying around mainly related to HapMap, but I >> will only submit it after reviewing and reusing it again. This is more >> distant future ... like a couple of months. >> >> Tiago > > > From tiagoantao at gmail.com Wed Jul 11 10:05:21 2007 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Wed, 11 Jul 2007 12:05:21 +0200 Subject: [Biopython-dev] FDist: more Population Genetics code In-Reply-To: References: <6d941f120701030516m1adb3daeh6e4645121ba8679d@mail.gmail.com> <4693DEAB.8000900@maubp.freeserve.co.uk> Message-ID: <6d941f120707110305n75c75e77y7426280477cf23ca@mail.gmail.com> Hi, I had no feedback and it seemed that there was no interest, so I decided to start a Python Population Genetics project on Google, which is going ahead, but is still at the alpha stage: http://code.google.com/p/pypopgen/ I am doing this on a personal basis for now (I did not even announce it anywhere), and so it is advancing at my personal pace and is designed according to my needs. I have used it already (or a tiny part of it) in a published application ( http://popgen.eu/soft/m4s2 ). I am still willing to integrate this into BioPython, but for that some interest and feedback would be needed... That would have to happen somewhat soon, as the code will have to be adapted to BioPython standards and namespace, and once there is a lot of code that will be difficult in practice (and after going public it will be impossible really). The "strangest" code that I am doing (and that would need more discussion) is code for asynchronous computation (to be easy to use on multicore computers and grids). Regards, Tiago On 7/11/07, Ralph Haygood wrote: > Peter and Tiago, > > Hello. No, I haven't done anything with Tiago's code. I'm afraid > it's pretty far from what I'm working on these days.
> > I still think it would be good for BioPython to include methods for > computing basic population-genetical statistics (Watterson's theta, > Tajima's D, etc.) from DNA alignments. I have in mind something like > BioPerl's PopGen (http://www.bioperl.org/wiki/HOWTO:PopGen). My own > code is easy to use with a Bio.Align.Generic.Alignment, but it doesn't > conform to BioPython's standards for style, testing, or documentation, > and I don't know when I'll have time to standardize it. > > Ralph > > On Tue, 10 Jul 2007, Peter wrote: > > > Hi Tiago, > > > > Have you had any feedback (off the mailing list)? > > > > Ralph - did you have a chance to look over Tiago's code or discuss this with > > him? > > > > It would be a shame if nothing came from this... > > > > Peter > > > > Tiago Ant?o wrote: > >> Hi! > >> > >> I have submitted another enhancement bug, with support for FDist. It > >> allows to generate and parse Fdist files and to control fdist > >> applications. There are also a couple of utility functions. FDist is a > >> niche application (mainly used to detect selection in animal > >> genetics). Not the most fundamental one to support, but it is > >> currently one that I am working on, thus, the code. > >> > >> Regarding my summited code for GenePop, I have summited a different > >> version on bugzilla. The main difference, is that I moved everything > >> from Bio to Bio.PopGen. > >> > >> Before I continue putting code on bugzilla I would like to know if it > >> is worthwhile doing it... Any opinions on the code submitted or if any > >> changes are required? I would really like to continue converting my > >> code to BioPython, but only if it has any possibility of ending up > >> being useful/included in distribution somewhere in the future... ;) > >> > >> I am currently working on code related to SimCoal2, Arlequin and > >> general statistics (Fst, heterozygosity, ...). Which will probably be > >> ready quite soon (ie, next two weeks). 
This is more mainstream than > >> FDist > >> > >> I have some other code lying around mainly related to HapMap, but I > >> will only submit it after reviewing and reusing it again. This is more > >> distant future ... like a couple of months. > >> > >> Tiago > > > > > > From bugzilla-daemon at portal.open-bio.org Fri Jul 13 11:08:07 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 13 Jul 2007 07:08:07 -0400 Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64 In-Reply-To: Message-ID: <200707131108.l6DB87xm027778@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2323

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |biopython-bugzilla at maubp.freeserve.co.uk
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |

------- Comment #22 from biopython-bugzilla at maubp.freeserve.co.uk 2007-07-13 07:08 EST -------
I've started to write a test case for the code now in Bio/SeqUtils/CheckSum.py and noticed that while crc64, gcg and seguid will cope with both strings and Seq objects, crc32 will only cope with strings. Any objections to me fixing this like so:

Old:

from binascii import crc32

New:

from binascii import crc32 as _crc32

def crc32(seq) :
    """Returns the crc32 checksum for a sequence (string or Seq object)"""
    try :
        #Assume its a Seq object
        return _crc32(seq.tostring())
    except AttributeError :
        #Assume its a string
        return _crc32(seq)

-- Peter

-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Fri Jul 13 11:18:30 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 13 Jul 2007 07:18:30 -0400 Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64 In-Reply-To: Message-ID: <200707131118.l6DBIUOS028425@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2323 ------- Comment #23 from biopython-bugzilla at maubp.freeserve.co.uk 2007-07-13 07:18 EST ------- Created an attachment (id=703) --> (http://bugzilla.open-bio.org/attachment.cgi?id=703&action=view) Initial unit test for Bio/SeqUtils/CheckSum If the crc32 function could accept a Seq object then the "try/except" at the end isn't needed. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Jul 13 14:38:52 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 13 Jul 2007 10:38:52 -0400 Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64 In-Reply-To: Message-ID: <200707131438.l6DEcqn5008339@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2323 ------- Comment #24 from mdehoon at ims.u-tokyo.ac.jp 2007-07-13 10:38 EST ------- A better solution would be for Seq to inherit from str, instead of Seq having str as a member. Then we don't have to modify crc32, and other code in Biopython will also become simpler. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Fri Jul 13 15:17:59 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 13 Jul 2007 11:17:59 -0400 Subject: [Biopython-dev] [Bug 2269] Shebang (hashbang) lines need cleanup In-Reply-To: Message-ID: <200707131517.l6DFHxY4010647@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2269 ------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2007-07-13 11:17 EST ------- I have just fixed a few in CVS, here a list of remaining abnormal shebang/hashbang lines: biopython/Bio/EUtils/POM.py '#!/usr/bin/python -i\n' biopython/Bio/EUtils/DTDs/LinkOut.py '#!/usr/bin/python\n' biopython/Bio/EUtils/DTDs/__init__.py '#!/usr/bin/python\n' biopython/Bio/EUtils/DTDs/eInfo_020511.py '#!/usr/bin/python\n' biopython/Bio/EUtils/DTDs/eLink_020511.py '#!/usr/bin/python\n' biopython/Bio/EUtils/DTDs/ePost_020511.py '#!/usr/bin/python\n' biopython/Bio/EUtils/DTDs/eSearch_020511.py '#!/usr/bin/python\n' biopython/Bio/EUtils/DTDs/eSummary_020511.py '#!/usr/bin/python\n' The biopython/Bio/EUtils/*.py examples are interesting in that many of those files are autogenerated from DTD files (using the dtd2py.py script I think - but it doesn't seem to work on all of them). Also, I don't think all the files under Bio/Restriction/*.py need a shebang, and a large proportion of the unit tests have shebangs (but less than half). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
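A list like the one in comment #7 can be produced with a short script. The following is only a sketch, not the tool that was actually used. The function name find_abnormal_shebangs is made up for illustration, and the choice of "#!/usr/bin/env python" as the expected form is an assumption, since the thread does not state which form counts as normal.

```python
import os

# Assumption: this is the shebang form considered "normal".
EXPECTED = "#!/usr/bin/env python"


def find_abnormal_shebangs(root, expected=EXPECTED):
    """Yield (path, first_line) for .py files whose shebang differs from expected.

    Files without any shebang line are skipped; deciding whether they should
    have one is a separate question.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(".py"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, "r", encoding="utf-8", errors="replace") as handle:
                first_line = handle.readline()
            if first_line.startswith("#!") and first_line.rstrip() != expected:
                yield path, first_line


if __name__ == "__main__":
    # Print in the same style as the list in the bug comment.
    for path, line in find_abnormal_shebangs("."):
        print(path, repr(line))
```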
From tiagoantao at gmail.com Fri Jul 13 15:23:03 2007 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Fri, 13 Jul 2007 16:23:03 +0100 Subject: [Biopython-dev] FDist: more Population Genetics code In-Reply-To: <6d941f120707110305n75c75e77y7426280477cf23ca@mail.gmail.com> References: <6d941f120701030516m1adb3daeh6e4645121ba8679d@mail.gmail.com> <4693DEAB.8000900@maubp.freeserve.co.uk> <6d941f120707110305n75c75e77y7426280477cf23ca@mail.gmail.com> Message-ID: <6d941f120707130823i6b62478fl9ac589eb5c02ca9d@mail.gmail.com> I just want to add that I followed precisely the procedure that I was suggested at that time, ie to open bugzilla issues, but I got no answer or follow up from it. I also had some very useful mail exchanges with Ralph at that time, but no code was floated around. I reiterate my interest in supplying the code (currently supporting fdist, simcoal2, genepop, hapmap, ucsc table browser - in varying degrees of quality). You can have a look at the google url supplied (svn repository in it). I would still take the necessary time to convert it to BioPython namespace and format. If in one week I see no interest (interest in the form of pro actively making things go forward) at all then I will consider this a closed issue and will not spend more time with trying any form of integration, in the sense that I have done all that was requested here and really got no feedback. Tiago On 7/11/07, Tiago Ant?o wrote: > Hi, > > I had no feedback and it seemed that there was no interest, so I > decided to start a Python Population Genetics project on google, which > is going ahead, but still on alpha stages: > http://code.google.com/p/pypopgen/ > I am doing this on a personal basis for now (I did not even announce > it anywhere), and so it is advancing at my personal pace and design > according to me needs > I have used it already (or a tiny part of it) on a published > aplication ( http://popgen.eu/soft/m4s2 ). 
> I am still willing to integrate this on BioPython, but for that some > interest and feedback would be needed... That would have to happen > somewhat soon as the code will have to be adapted to BioPython > standards and namespace, and when, in a future, there is a lot of code > that will be in practice difficult (and after going public it will be > impossible really). > > The "strangest" code that I am doing (and that would need more > discussion) is one to do asyncronous computation (to be easy to use on > multicore computers and grids). > > Regards, > Tiago > > On 7/11/07, Ralph Haygood wrote: > > Peter and Tiago, > > > > Hello. No, I haven't done anything with Tiago's code. I'm afraid > > it's pretty far from what I'm working on these days. > > > > I still think it would be good for BioPython to include methods for > > computing basic population-genetical statistics (Watterson's theta, > > Tajima's D, etc.) from DNA alignments. I have in mind something like > > BioPerl's PopGen (http://www.bioperl.org/wiki/HOWTO:PopGen). My own > > code is easy to use with a Bio.Align.Generic.Alignment, but it doesn't > > conform to BioPython's standards for style, testing, or documentation, > > and I don't know when I'll have time to standardize it. > > > > Ralph > > > > On Tue, 10 Jul 2007, Peter wrote: > > > > > Hi Tiago, > > > > > > Have you had any feedback (off the mailing list)? > > > > > > Ralph - did you have a chance to look over Tiago's code or discuss this with > > > him? > > > > > > It would be a shame if nothing came from this... > > > > > > Peter > > > > > > Tiago Ant?o wrote: > > >> Hi! > > >> > > >> I have submitted another enhancement bug, with support for FDist. It > > >> allows to generate and parse Fdist files and to control fdist > > >> applications. There are also a couple of utility functions. FDist is a > > >> niche application (mainly used to detect selection in animal > > >> genetics). 
Not the most fundamental one to support, but it is > > >> currently one that I am working on, thus, the code. > > >> > > >> Regarding my summited code for GenePop, I have summited a different > > >> version on bugzilla. The main difference, is that I moved everything > > >> from Bio to Bio.PopGen. > > >> > > >> Before I continue putting code on bugzilla I would like to know if it > > >> is worthwhile doing it... Any opinions on the code submitted or if any > > >> changes are required? I would really like to continue converting my > > >> code to BioPython, but only if it has any possibility of ending up > > >> being useful/included in distribution somewhere in the future... ;) > > >> > > >> I am currently working on code related to SimCoal2, Arlequin and > > >> general statistics (Fst, heterozygosity, ...). Which will probably be > > >> ready quite soon (ie, next two weeks). This is more mainstream than > > >> FDist > > >> > > >> I have some other code lying around mainly related to HapMap, but I > > >> will only submit it after reviewing and reusing it again. This is more > > >> distant future ... like a couple of months. > > >> > > >> Tiago > > > > > > > > > > From bugzilla-daemon at portal.open-bio.org Fri Jul 13 15:25:32 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 13 Jul 2007 11:25:32 -0400 Subject: [Biopython-dev] [Bug 2323] New functions: GCG Checksum and CRC64 In-Reply-To: Message-ID: <200707131525.l6DFPWMa011025@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2323 ------- Comment #25 from biopython-bugzilla at maubp.freeserve.co.uk 2007-07-13 11:25 EST ------- Changing the Seq object to be a subclass of string might be nice... but perhaps rather confusing for minority alphabets where the "letters" are not single characters(*). More importantly, wouldn't this dramatic change break a lot of existing scripts? Probably something for the mailing list! 
(*) I've never done it, but one example is storing three letter protein sequences, nice if you have any post translational modifications which cannot be represented using the single letter scheme. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython-dev at maubp.freeserve.co.uk Sat Jul 14 10:22:06 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Sat, 14 Jul 2007 11:22:06 +0100 Subject: [Biopython-dev] Bug 2292 - TER lines from Bio.PDBIO Message-ID: <4698A3CE.7020907@maubp.freeserve.co.uk> Hi Thomas, Could you have a look at Biopython Bug 2292 and the suggested patch from Michal Gajda to write TER records in line with the spec: http://bugzilla.open-bio.org/show_bug.cgi?id=2292 Thanks Peter From tiagoantao at gmail.com Sat Jul 14 16:32:43 2007 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Sat, 14 Jul 2007 17:32:43 +0100 Subject: [Biopython-dev] Population Genetics code Message-ID: <6d941f120707140932u356c84bel6a9322a2767e6da7@mail.gmail.com> Hi! Firstly I would like to thank everybody that answered so positively to my "rant" about submitting population genetics code to Biopython. I have a few suggestions on how to progress in a safe in constructive way with a possible Population Genetics part for biopython. First of all, the starting point: 1. There is none in the core developers that is working actively in populations genetics 2. Point 1 entails that any code submissions (made by biopython newbies like me) will not be able to be completely reviewed by seasoned biopython developers 3. Initially there will only be me submitting code (please correct me if I am wrong, especially Ralph...) 4. There is already some popgen statistical code in python lying around e.g. 
http://www.pypop.org/ Therefore I suggest starting out by doing a small, "safe", project around a not very used application (Mark Beaumont's Fdist program http://www.rubic.rdg.ac.uk/~mab/software.html ). This code is already done and tested (by myself). I also have test cases (in BioPython format) for parts of it. The major issue is that it is currently outside of Bio.PopGen namespace, so its not really very major... I would provide parsers, configuration file generators and utilities to run the suite of fdist programs. Why start with such a simple and less relevant application: 1. Its safer to start with something less grand (if its poorly done it won't be that serious). 2. There is no python fdist code lying around, so there is no overlap at all with existing projects 3. This code is already done and being used... I will provide code, test code, and documentation (probably by adding stuff to the wiki). Then other people could evaluate what was done, and we would continue from there to other, more used applications (Genepop, arlequin, simcoal2, ...) and databases (HapMap, TableBrowser). Is this an acceptable way of going ahead? If other people would like to participate, that would be fantastic... If my suggestion is rubbish, please also say ;) Many thanks, Tiago From biopython-dev at maubp.freeserve.co.uk Mon Jul 16 18:27:40 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Mon, 16 Jul 2007 19:27:40 +0100 Subject: [Biopython-dev] Biopython usage figures Message-ID: <469BB89C.8010904@maubp.freeserve.co.uk> A little last minute I know, but would anyone have access to the website download statistics? I'd like to include rough figures for the number of downloads of the recent releases in the BOSC 2007 talk. 
A list of developers with CVS access would be nice too - but I can just trawl through the logs to spot active people ;) Peter From biopython-dev at maubp.freeserve.co.uk Mon Jul 16 18:50:49 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Mon, 16 Jul 2007 19:50:49 +0100 Subject: [Biopython-dev] Is Bio.Crystal obsolete? Message-ID: <469BBE09.1000005@maubp.freeserve.co.uk> I just had a look at the Bio.Crystal module by Katharine Lindner (2002), consisting of the single file Bio/Crystal/__init__.py whose preamble states: > Hetero, Crystal and Chain exist to represent the NDB Atlas > structure. Atlas is a minimal subset of the PDB format. Heteo > supports a 3 alphameric code. The NDB web interface is located at > ... The old link should probably be updated as it doesn't work, perhaps: http://ndbserver.rutgers.edu/atlas/index.html As far as I can see, they now provide their downloads in PDB, CIF and an XML file format - and the PDB files look like the full thing to me at first glance rather than a minimal subset. There is a unit test, Tests/test_Crystal.py, but no example input files. This module looks obsolete to me - can we mark it as deprecated after checking on the main list that no one uses it (as done for Bio.Kabat back in March 2007)? Peter From tiagoantao at gmail.com Wed Jul 18 10:29:08 2007 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Wed, 18 Jul 2007 11:29:08 +0100 Subject: [Biopython-dev] PopGen code Message-ID: <6d941f120707180329u6bf60c50o8e4868e5a470de2c@mail.gmail.com> Hi! Starting today I will begin putting code on CVS regarding Population Genetics stuff. I will start by checking in a GenePop parser and test code. Very soon FDist code will follow. After that Simcoal stuff, more GenePop stuff, HapMap and UCSC table browser will follow. I was not able to read the dev.open-bio.org suggestions as it seems to have been down for some time.
If any of the senior Biopython developers finds that I am doing anything seriously wrong, please don't hesitate to contact me immediately. I will be putting everything below a PopGen directory in Bio. Everything except tests, of course ;) Regards, Tiago From biopython-dev at maubp.freeserve.co.uk Wed Jul 18 21:37:46 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Wed, 18 Jul 2007 22:37:46 +0100 Subject: [Biopython-dev] PopGen code In-Reply-To: <6d941f120707180329u6bf60c50o8e4868e5a470de2c@mail.gmail.com> References: <6d941f120707180329u6bf60c50o8e4868e5a470de2c@mail.gmail.com> Message-ID: <320fb6e00707181437l22b1aecdh8ab5c2fa2aea7380@mail.gmail.com> Tiago Ant?o wrote: > Hi! > > Starting today I will begin putting code on CVS regarding Population > Genetics stuff... > I will be putting everything below a PopGen directory in Bio. > Everything except tests, of course ;) Sounds good :) If you can write some introductory text to add to the cookbook/tutorial that would be even better. If you are not familiar with LaTeX, then just write it up in plain text and I could add that to the tutorial with suitable mark-up/formatting on your behalf. This may be easier to do in chunks as you add new code, or in a large batch later on - up to you. Peter From tiagoantao at gmail.com Wed Jul 18 22:46:19 2007 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Wed, 18 Jul 2007 23:46:19 +0100 Subject: [Biopython-dev] PopGen code In-Reply-To: <320fb6e00707181437l22b1aecdh8ab5c2fa2aea7380@mail.gmail.com> References: <6d941f120707180329u6bf60c50o8e4868e5a470de2c@mail.gmail.com> <320fb6e00707181437l22b1aecdh8ab5c2fa2aea7380@mail.gmail.com> Message-ID: <6d941f120707181546y34e17038nb07106dacae533db@mail.gmail.com> Hi! On 7/18/07, Peter wrote: > If you can write some introductory text to add to the > cookbook/tutorial that would be even better. 
If you are not familiar > with LaTeX, then just write it up in plain text and I could add that > to the tutorial with suitable mark-up/formatting on your behalf. I agree, in fact it is what I intend to do after having the FDist code in. I will write mostly in parallel with commiting. So the doc should be more or less aligned with what is being put in CVS... Regards, Tiago From tiagoantao at gmail.com Thu Jul 19 13:09:29 2007 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Thu, 19 Jul 2007 14:09:29 +0100 Subject: [Biopython-dev] PopGen Documentation Message-ID: <6d941f120707190609o4f5f7412x94851295865ba22b@mail.gmail.com> Hi All, Following Peter's suggestion, I had a closer look at the documentation, and, if nobody opposes, I would like to add a new subsection between PDB and Miscellaneous on the cookbook chapter, Like this 4.10 Going 3D: The PDB module 4.11 PopGen: Population genetics (and genomics) 4.12 Miscellaneous Tiago On 7/18/07, Peter wrote: > Tiago Ant?o wrote: > > Hi! > > > > Starting today I will begin putting code on CVS regarding Population > > Genetics stuff... > > I will be putting everything below a PopGen directory in Bio. > > Everything except tests, of course ;) > > Sounds good :) > > If you can write some introductory text to add to the > cookbook/tutorial that would be even better. If you are not familiar > with LaTeX, then just write it up in plain text and I could add that > to the tutorial with suitable mark-up/formatting on your behalf. > > This may be easier to do in chunks as you add new code, or in a large > batch later on - up to you. 
> > Peter > From bugzilla-daemon at portal.open-bio.org Sat Jul 21 15:28:49 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 21 Jul 2007 11:28:49 -0400 Subject: [Biopython-dev] [Bug 2269] Shebang (hashbang) lines need cleanup In-Reply-To: Message-ID: <200707211528.l6LFSnBk031498@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2269 ------- Comment #8 from mdehoon at ims.u-tokyo.ac.jp 2007-07-21 11:28 EST ------- In biopython/Bio/EUtils, the shebang line was specified in sourcegen.py. I fixed that line and the shebang lines in the other *.py files under biopython/Bio/EUtils. Can we close this bug? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sat Jul 21 15:47:32 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 21 Jul 2007 11:47:32 -0400 Subject: [Biopython-dev] [Bug 2291] __init__.py missing in the Bio.PDB.mmCIF folder after the install In-Reply-To: Message-ID: <200707211547.l6LFlWxU032394@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2291 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #1 from mdehoon at ims.u-tokyo.ac.jp 2007-07-21 11:47 EST ------- I have modified setup.py so that Bio.PDB.mmCIF is always a module (but not necessarily with the MMCIFlex module; users still need to modify setup.py to include it). With Bio.PDB.mmCIF always present, the Bio/PDB/mmCIF/__init__.py file is no longer lost. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Sun Jul 22 08:30:11 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 22 Jul 2007 04:30:11 -0400 Subject: [Biopython-dev] [Bug 2269] Shebang (hashbang) lines need cleanup In-Reply-To: Message-ID: <200707220830.l6M8UB6d006746@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2269 ------- Comment #9 from biopython-bugzilla at maubp.freeserve.co.uk 2007-07-22 04:30 EST ------- Regarding comment 8, after changing sourcegen.py were you able to regenerate all the biopython/Bio/EUtils/*.py files? Or did you just fix them by hand? Anyway - that should leave us with consistent shebang/hashbang lines :) Unless we also want to remove any surplus lines, and decide if all or none of the unit tests should have them, then this bug looks done. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sun Jul 22 09:53:46 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 22 Jul 2007 05:53:46 -0400 Subject: [Biopython-dev] [Bug 2269] Shebang (hashbang) lines need cleanup In-Reply-To: Message-ID: <200707220953.l6M9rkap010929@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2269 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #10 from mdehoon at ims.u-tokyo.ac.jp 2007-07-22 05:53 EST ------- > Regarding comment 8, after changing sourcegen.py were you able to regenerate > all the biopython/Bio/EUtils/*.py files? Or did you just fix them by hand? I fixed them by hand. The fixed sourcegen.py should result in the same biopython/Bio/EUtils/*.py files as I created by hand. 
I tried regenerating these files automatically, but that didn't work for me. At some point, somebody should figure out how the biopython/Bio/EUtils code works. > Unless we also want to remove any surplus lines, and decide if all or none of > the unit tests should have them, then this bug looks done. Since Python itself does not seem to have a clear rule as to which files should have a shebang line, it is not obvious which Biopython files should have one. If somebody really wants to fix this, it's probably better to discuss such an issue on the mailing list first. As the issue raised by the original bug report has been resolved, I am closing this bug. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From mdehoon at c2b2.columbia.edu Sun Jul 22 10:28:22 2007 From: mdehoon at c2b2.columbia.edu (Michiel de Hoon) Date: Sun, 22 Jul 2007 19:28:22 +0900 Subject: [Biopython-dev] Bio.AlignIO (was Re: [BioPython] Bio.SeqIO and files with one record) In-Reply-To: <4693E5FE.708@maubp.freeserve.co.uk> References: <4693E5FE.708@maubp.freeserve.co.uk> Message-ID: <46A33146.7030405@c2b2.columbia.edu> Peter wrote: > P.S. Any comments on the Bio.AlignIO ideas I raised back in May 2007? > http://lists.open-bio.org/pipermail/biopython/2007-May/003472.html > Let's discuss the Bio.Align.Alignment class first, and then decide how to parse alignment files. Currently, the alignment class holds a list of SeqRecord objects:

class Alignment:
    ...
    def __init__(self, alphabet):
        ...
        # hold everything at a list of seq record objects
        self._records = []

To get access to self._records, the Alignment class has some accessor functions:

    def get_all_seqs(self):
        ...
        return self._records

    def get_seq_by_num(self, number):
        ...
        return self._records[number].seq

A cleaner way to do this is to let the class Alignment inherit from list.
This also allows us to use all list methods on Alignment objects.
For example, we can iterate over them, as suggested in this bug report:

http://bugzilla.open-bio.org/show_bug.cgi?id=1944

Any objections against letting Alignment inherit from list?

--Michiel

From salish at picasso.ucsf.edu Sun Jul 22 18:27:58 2007
From: salish at picasso.ucsf.edu (Howard Salis)
Date: Sun, 22 Jul 2007 11:27:58 -0700
Subject: [Biopython-dev] Bio.AlignIO (was Re: [BioPython] Bio.SeqIO and files with one record)
In-Reply-To: <46A33146.7030405@c2b2.columbia.edu>
References: <4693E5FE.708@maubp.freeserve.co.uk> <46A33146.7030405@c2b2.columbia.edu>
Message-ID: <9fa7e98e0707221127v5b7b2a85x38978fd647e18931@mail.gmail.com>

Hello all,

To get this same behavior, you can also create the __iter__ and next()
methods in Alignment itself.

-Howard Salis

On 7/22/07, Michiel de Hoon wrote:
> Peter wrote:
> > P.S. Any comments on the Bio.AlignIO ideas I raised back in May 2007?
> > http://lists.open-bio.org/pipermail/biopython/2007-May/003472.html
>
> Let's discuss the Bio.Align.Alignment class first, and then decide how
> to parse alignment files.
>
> Currently, the alignment class holds a list of SeqRecord objects:
>
> class Alignment:
>     ...
>     def __init__(self, alphabet):
>         ...
>         # hold everything at a list of seq record objects
>         self._records = []
>
> To get access to self._records, the Alignment class has some accessor
> functions:
>
>     def get_all_seqs(self):
>         ...
>         return self._records
>
>     def get_seq_by_num(self, number):
>         ...
>         return self._records[number].seq
>
> A cleaner way to do this is to let the class Alignment inherit from
> list. This also allows us to use all list methods on Alignment objects.
> For example, we can iterate over them, as suggested in this bug report:
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=1944
>
> Any objections against letting Alignment inherit from list?
>
> --Michiel

From mdehoon at c2b2.columbia.edu Wed Jul 25 13:17:33 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Wed, 25 Jul 2007 22:17:33 +0900
Subject: [Biopython-dev] Bio.AlignIO (was Re: [BioPython] Bio.SeqIO and files with one record)
In-Reply-To: <9fa7e98e0707221127v5b7b2a85x38978fd647e18931@mail.gmail.com>
References: <4693E5FE.708@maubp.freeserve.co.uk> <46A33146.7030405@c2b2.columbia.edu> <9fa7e98e0707221127v5b7b2a85x38978fd647e18931@mail.gmail.com>
Message-ID: <46A74D6D.9020309@c2b2.columbia.edu>

Sure, that is possible, but that means we'd be adding methods to
Alignment in order for it to behave like a list, whereas we can get
that for free by letting the Alignment class inherit from list.

--Michiel.

Howard Salis wrote:
> Hello all,
>
> To get this same behavior, you can also create the __iter__ and next()
> methods in Alignment itself.
>
> -Howard Salis

From biopython-dev at maubp.freeserve.co.uk Wed Jul 25 13:34:02 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Wed, 25 Jul 2007 14:34:02 +0100
Subject: [Biopython-dev] Bio.AlignIO
In-Reply-To: <46A74D6D.9020309@c2b2.columbia.edu>
References: <4693E5FE.708@maubp.freeserve.co.uk> <46A33146.7030405@c2b2.columbia.edu> <9fa7e98e0707221127v5b7b2a85x38978fd647e18931@mail.gmail.com> <46A74D6D.9020309@c2b2.columbia.edu>
Message-ID: <46A7514A.1090405@maubp.freeserve.co.uk>

Michiel de Hoon wrote:
> Sure, that is possible, but that means we'd be adding methods to
> Alignment in order for it to behave like a list, whereas we can get
> that for free by letting the Alignment class inherit from list.
>
> --Michiel.

Personally I see an alignment as both an array of characters (i.e. amino
acid residues or nucleotides), and a list of sequences.

In the same way that a Numeric or NumPy array lets you iterate over
rows, yet also access individual elements, we could allow iteration of
SeqRecords and also allow access to individual letters.

Peter

From mdehoon at c2b2.columbia.edu Wed Jul 25 14:44:56 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Wed, 25 Jul 2007 23:44:56 +0900
Subject: [Biopython-dev] Bio.AlignIO
In-Reply-To: <46A7514A.1090405@maubp.freeserve.co.uk>
References: <4693E5FE.708@maubp.freeserve.co.uk> <46A33146.7030405@c2b2.columbia.edu> <9fa7e98e0707221127v5b7b2a85x38978fd647e18931@mail.gmail.com> <46A74D6D.9020309@c2b2.columbia.edu> <46A7514A.1090405@maubp.freeserve.co.uk>
Message-ID: <46A761E8.5080909@c2b2.columbia.edu>

Peter wrote:
> Personally I see an alignment as both an array of characters (i.e. amino
> acid residues or nucleotides), and a list of sequences.
>
> In the same way that a Numeric or NumPy array lets you iterate over
> rows, yet also access individual elements, we could allow iteration of
> SeqRecords and also allow access to individual letters.

How about the following:

- Iterators iterate over the SeqRecords in the alignment
- An index of the form [xxx] returns the corresponding SeqRecord
- An index of the form [xxx:yyy:zzz] returns an Alignment object
  containing the SeqRecords in rows [xxx:yyy:zzz]
  (compare to the current method get_all_seqs()).
- An index of the form [xxx,:] returns the Seq object of the SeqRecord at
  xxx (this is currently done by the get_seq_by_num() method).
- An index of the form [xxx:yyy:zzz,:] returns a list of Seq objects
- An index of the form [:,www] returns a string containing the characters
  at column www (which is currently done by the get_column method)
- An index of the form [xxx:yyy:zzz,www] returns a string containing
  the characters at column www using only the rows xxx:yyy:zzz.
- An index of the form [xxx,www] returns a string containing the
  character of the sequence in row xxx at column www.

This is more-or-less how Numerical Python arrays work, except that we'll
be returning SeqRecord/Seq/string objects depending on the indices.

--Michiel.

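[Editorial sketch] Every index form in the list above reaches a single
__getitem__ call; what differs is the type of the key Python passes in.
The small probe below only reports that type. IndexProbe is a
hypothetical name used for illustration and is not part of Biopython.

```python
class IndexProbe(object):
    """Hypothetical helper: reports how Python hands an index to __getitem__."""

    def __getitem__(self, key):
        # A comma inside the brackets makes Python pass one tuple.
        if isinstance(key, tuple):
            return "tuple of " + ", ".join(type(k).__name__ for k in key)
        # Otherwise the key is a single int or a single slice object.
        return type(key).__name__


probe = IndexProbe()
print(probe[3])         # the [xxx] form
print(probe[1:5:2])     # the [xxx:yyy:zzz] form
print(probe[3, :])      # the [xxx,:] form
print(probe[1:5:2, 7])  # the [xxx:yyy:zzz,www] form
```

A real Alignment.__getitem__ would branch on these same key types to
decide whether to return a SeqRecord, a string or an Alignment.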
From biopython-dev at maubp.freeserve.co.uk Wed Jul 25 16:10:43 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Wed, 25 Jul 2007 17:10:43 +0100
Subject: [Biopython-dev] Improving the Alignment object. Was Bio.AlignIO
In-Reply-To: <46A761E8.5080909@c2b2.columbia.edu>
References: <4693E5FE.708@maubp.freeserve.co.uk> <46A33146.7030405@c2b2.columbia.edu> <9fa7e98e0707221127v5b7b2a85x38978fd647e18931@mail.gmail.com> <46A74D6D.9020309@c2b2.columbia.edu> <46A7514A.1090405@maubp.freeserve.co.uk> <46A761E8.5080909@c2b2.columbia.edu>
Message-ID: <46A77603.1030101@maubp.freeserve.co.uk>

Michiel de Hoon wrote:
> Peter wrote:
>> Personally I see an alignment as both an array of characters (i.e. amino
>> acid residues or nucleotides), and a list of sequences.
>>
>> In the same way that a Numeric or NumPy array lets you iterate over
>> rows, yet also access individual elements, we could allow iteration of
>> SeqRecords and also allow access to individual letters.
>
> How about the following:
>
> -Iterators iterate over the SeqRecords in the alignment

I agree. And this is trivial to implement without needing the element
access/slicing support.

As to element access, we've been thinking along similar lines :)

It's just that with all the different special cases, there are lots of
different possible return types!

> -An index of the form [xxx] returns the corresponding SeqRecord
> -An index of the form [xxx:yyy:zzz] returns an Alignment object
> containing the SeqRecords in rows [xxx:yyy:zzz]
> (compare to the current method get_all_seqs()).

I agree. This is essential to make an alignment act like a list of
SeqRecord objects when only a one-dimensional index is given.

> -An index of the form [xxx,:] returns the Seq object of the SeqRecord at
> xxx (this is currently done by the get_seq_by_num() method).
> -An index of the form [xxx:yyy:zzz,:] returns a list of Seq objects

I'm not immediately convinced about returning Seq objects here.
I might expect indices like [xxx,:] to return a SeqRecord (not a Seq)
and [xxx:yyy:zzz,:] to return a sub-alignment (not a list of Seq
objects).

> -An index of the form [:,www] returns a string containing the characters
> at column www (which is currently done by the get_column method)
> -An index of the form [xxx,www] returns a string containing the
> character of the sequence in row xxx at column www.

Those look fine - however we might want to return Seq objects rather
than strings.

> -An index of the form [xxx:yyy:zzz,www] returns a string containing
> the characters at column www using only the rows xxx:yyy:zzz.

Or a sub-alignment? See later...

> This is more-or-less how Numerical Python arrays work, except that we'll
> be returning SeqRecord/Seq/string objects depending on the indices.

For comparison, this is what I had been thinking:

* [r,c] means one element is requested, return a single character string
* [r] or [r,:] means one row is requested, return a SeqRecord
* [:,c] means one column is requested, return a string (or Seq object?)
* Otherwise returns a (sub)alignment. Note that [:] or [:,:] would
  return a copy of the alignment.

This would cover slicing of the column index by returning a
sub-alignment, i.e. indexes of the form [rrr, xxx:yyy:zzz] or
[rrr:ppp:qqq, xxx:yyy:zzz]

I'm not sure if requests for part of a single row or column like
[rrr, xxx:yyy:zzz] and [rrr:ppp:qqq, xxx] are best handled by returning
sub-alignments or as special cases (strings/Seq and Seq/SeqRecord
respectively?).

Peter From bugzilla-daemon at portal.open-bio.org Thu Jul 26 14:52:38 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 26 Jul 2007 10:52:38 -0400 Subject: [Biopython-dev] [Bug 2340] New: SProt.py fails to parse the current Swiss-Prot version 54.0 Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2340 Summary: SProt.py fails to parse the current Swiss-Prot version 54.0 Product: Biopython Version: 1.43 Platform: All OS/Version: Linux Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: gould at embl.de Hi, I'm running on a red hat linux box on python 2.3.4 and am trying to parse any swiss-prot record but the parser just seems to bomb out not throwing an error of where it actually fails. I'm guessing it has to do with the Release 54.0 of 24-Jul-07 of UniPROT with the addition of the new line type PE?? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Jul 26 15:46:36 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 26 Jul 2007 11:46:36 -0400 Subject: [Biopython-dev] [Bug 2340] SProt.py fails to parse the current Swiss-Prot version 54.0 In-Reply-To: Message-ID: <200707261546.l6QFkaGq022472@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2340 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2007-07-26 11:46 EST ------- Hi Kate, Could you give us the URL of one or two specific SwissProt files you're having trouble with. Also how are you trying to read the SwissProt files? e.g. with Bio.SeqIO.parse()? If you could include the python error too, that could be helpful. Thanks. 
Peter

From bugzilla-daemon at portal.open-bio.org Thu Jul 26 16:06:15 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 26 Jul 2007 12:06:15 -0400
Subject: [Biopython-dev] [Bug 2340] SProt.py fails to parse the current Swiss-Prot version 54.0
In-Reply-To:
Message-ID: <200707261606.l6QG6FkE023264@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2340

------- Comment #2 from gould at embl.de 2007-07-26 12:06 EST -------
(In reply to comment #0)
> Hi,
>
> I'm running on a red hat linux box on python 2.3.4 and am trying to parse any
> swiss-prot record but the parser just seems to bomb out not throwing an error
> of where it actually fails. I'm guessing it has to do with the Release 54.0 of
> 24-Jul-07 of UniPROT with the addition of the new line type PE??
>

(In reply to comment #1)
> Hi Kate,
>
> Could you give us the URL of one or two specific SwissProt files you're having
> trouble with.
>
> Also how are you trying to read the SwissProt files? e.g. with
> Bio.SeqIO.parse()?
>
> If you could include the python error too, that could be helpful. Thanks.
>
> Peter
>

Hi,

The following snippet of code is where the error occurs (this used to
work with no problem before something changed in the last day or two,
I guess):

    def getSequence(self,acc):
        """
        This method retrieves the most recent annotated sequence from the
        ExPASy server for a given accession number.
        """
        from Bio.WWW import ExPASy
        from Bio.SwissProt import SProt
        from Bio import File
        if acc != '':
            try:
                results = ExPASy.get_sprot_raw(acc.strip()).read()
                sp_parser = SProt.RecordParser()
                sp_iterator = SProt.Iterator(File.StringHandle(results), sp_parser)
                Record = sp_iterator.next()
                return Record.sequence.strip()
            except:
                return -1
        else:
            return acc

It breaks at the line:

    Record = sp_iterator.next()

but doesn't print any error to the terminal.

Some examples of the accession numbers used are: P01100, P12522 etc.

thanks
Kate

From bugzilla-daemon at portal.open-bio.org Thu Jul 26 16:32:31 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 26 Jul 2007 12:32:31 -0400
Subject: [Biopython-dev] [Bug 2340] SProt.py fails to parse the current Swiss-Prot version 54.0
In-Reply-To:
Message-ID: <200707261632.l6QGWVrC024560@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2340

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED

------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2007-07-26 12:32 EST -------
Confirming bug - it is due to the new PE line (protein evidence).

The reason you didn't see the error is that in your example the parser
is wrapped in a try ... except ... clause.

From bugzilla-daemon at portal.open-bio.org Thu Jul 26 16:51:45 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 26 Jul 2007 12:51:45 -0400 Subject: [Biopython-dev] [Bug 2340] SProt.py fails to parse the current Swiss-Prot version 54.0 In-Reply-To: Message-ID: <200707261651.l6QGpja8025622@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2340 ------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2007-07-26 12:51 EST ------- I think I have fixed this - at least your example code now works. You'll need to update the file Bio/SwissProt/SProt.py to revision 1.38 from CVS, which you can download here: http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/SwissProt/SProt.py?cvsroot=biopython Don't forget to backup the old Bio/SwissProt/SProt.py first, in case you want to put things back. Please test this and report back. NOTE - The fix just makes the parser aware of the new PE line, and ignores it. It doesn't (yet) do anything useful with the information it contains! -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Jul 27 06:46:35 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 27 Jul 2007 02:46:35 -0400 Subject: [Biopython-dev] [Bug 2340] SProt.py fails to parse the current Swiss-Prot version 54.0 In-Reply-To: Message-ID: <200707270646.l6R6kZaI001699@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2340 ------- Comment #5 from gould at embl.de 2007-07-27 02:46 EST ------- (In reply to comment #4) > I think I have fixed this - at least your example code now works. 
> > You'll need to update the file Bio/SwissProt/SProt.py to revision 1.38 from > CVS, which you can download here: > > http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/SwissProt/SProt.py?cvsroot=biopython > > Don't forget to backup the old Bio/SwissProt/SProt.py first, in case you want > to put things back. > > Please test this and report back. > > NOTE - The fix just makes the parser aware of the new PE line, and ignores it. > It doesn't (yet) do anything useful with the information it contains! > Yes it has done the trick and all works OK again. thanks -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Jul 27 07:54:14 2007 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 27 Jul 2007 03:54:14 -0400 Subject: [Biopython-dev] [Bug 2340] SProt.py fails to parse the current Swiss-Prot version 54.0 In-Reply-To: Message-ID: <200707270754.l6R7sEnm007432@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2340 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |RESOLVED Resolution| |FIXED ------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk 2007-07-27 03:54 EST ------- Great :) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
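[Editorial sketch] Comment #3 on bug 2340 above explains why no error
reached the terminal: the parser call sat inside a bare try/except. The
example below reproduces that effect with a made-up stand-in parser.
parse_record, get_sequence_silent and get_sequence_verbose are
hypothetical names, the set of accepted line types is invented for the
demonstration, and the code is written for current Python rather than
the Python 2.3 used in the report.

```python
def parse_record(text):
    """Stand-in parser (hypothetical): rejects any line type it does not know."""
    for line in text.splitlines():
        if line[:2] not in ("ID", "AC", "SQ"):
            raise ValueError("unknown line type: %r" % line[:2])
    return "parsed"


def get_sequence_silent(text):
    try:
        return parse_record(text)
    except:  # bare except: every failure collapses into -1
        return -1


def get_sequence_verbose(text):
    try:
        return parse_record(text)
    except ValueError as err:  # catch only what is expected, keep the message
        return "failed: %s" % err


record = "ID   TEST\nPE   1: Evidence at protein level;"
print(get_sequence_silent(record))
print(get_sequence_verbose(record))
```

The first call prints only -1, while the second keeps the parser's own
message, which names the offending line type.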
From kosa at genesilico.pl Fri Jul 27 10:47:10 2007
From: kosa at genesilico.pl (Jan Kosinski)
Date: Fri, 27 Jul 2007 12:47:10 +0200
Subject: [Biopython-dev] syntax of indices for future Alignment object
Message-ID: <46A9CD2E.6080402@genesilico.pl>

Hi,

From the viewpoint of the end user, we would like the Python Alignment
object to behave outwardly as an array, so we could get slices, columns,
sequences, their fragments, whatever we want etc. The most intuitive and
clear syntax for the user (certainly much better than not very clear
indexes like [xxx:yyy:zzz]) is the following.

[A:B][X:Y] - general syntax of indices. This supports almost everything.

Several examples of usage and proposed outputs:

[:][:] - returns an alignment or its copy (as Alignment object)
[:][x:y] - returns slice of the alignment (as Alignment object; aln of
  all sequences and residues corresponding to columns from x to y)
[a:b][:] - returns the aln of seqs from a to b (as Alignment object)
[a:b][x:y] - returns the slice and subalignment (as Alignment object)
[a:a][x:y] - returns slice of the single sequence (residues x to y of
  sequence a) (as Alignment object)
[a][x:y] - returns slice of the single sequence (residues x to y of
  sequence a) (as a String)
[a:][x:y] and similar combinations - returns the slice and subalignment,
  sequences from a to the last are included (as Alignment object)
[:][x] - returns single column (as a String object? string here could be
  very useful)
[:][x:x] - returns single column (as Alignment object)
[a] - returns single sequence (as a SeqRecord object)
[a:a] and [a:a][:] - returns single sequence (as Alignment object)
[m][n] - returns n-th element of sequence m (as a String)

It could be disputed that different but similar sets of indices return
different types of objects (e.g. [:][x] would return a column as a
string while [:][x:x] would return a column as an Alignment object), but
in my opinion it would just extend the usability.

The only problem is the implementation of such calls, but that depends
on what type of object the Alignment object will be.

What do you think?

Cheers,
Jan Kosinski
Grzegorz Papaj
:.

From bugzilla-daemon at portal.open-bio.org Fri Jul 27 12:51:10 2007
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 27 Jul 2007 08:51:10 -0400
Subject: [Biopython-dev] [Bug 1944] Align.Generic adding iterator and more
In-Reply-To:
Message-ID: <200707271251.l6RCpAIg025706@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=1944

------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2007-07-27 08:51 EST -------
Created an attachment (id=721)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=721&action=view)
Patch for Bio/Align/Generic.py to add __getitem__ method

This patch adds a __getitem__ method, a small "mini test" when running
the module directly, and updates the doc strings.

This gives SeqRecord iteration "for free" (without an explicit __iter__
method).

As discussed on the mailing list, this allows an Alignment object to be
treated as a list of SeqRecord objects or as an array of character
strings - plus extract whole columns as strings.

Quoting the proposed __getitem__ doc string:

    Depending on the indices, you can get a SeqRecord object
    (representing a single row), a string (for a single column or a
    single character) or another alignment (representing some or part
    of the alignment).

    align[r,c] gives a single character as a string
    align[r] gives a SeqRecord
    align[:,c] gives a column as a string

    align[:] and align[:,:] give a copy of the alignment

    Anything else gives a sub alignment, e.g.
    align[0:2] or align[0:2,:] uses only row 0 and 1
    align[:,1:3] uses only columns 1 and 2
    align[0:2,1:3] uses only rows 0 & 1 and only cols 1 & 2

Feedback welcome - either here, or on the developers' mailing list.

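[Editorial sketch] The dispatch rules in the quoted doc string can be
tried out on a toy class. This is not the patch attached to bug 1944:
ToyAlignment is a hypothetical name, rows are plain strings rather than
SeqRecord objects, and returning a column string for any row slice
combined with an integer column is an assumption made here for
simplicity.

```python
class ToyAlignment(object):
    """Hypothetical stand-in: rows are plain strings, not SeqRecord objects."""

    def __init__(self, rows):
        self.rows = list(rows)

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, key):
        if isinstance(key, int):
            return self.rows[key]                # align[r]: one row
        if isinstance(key, slice):
            return ToyAlignment(self.rows[key])  # align[r1:r2]: sub-alignment
        if not (isinstance(key, tuple) and len(key) == 2):
            raise TypeError("index must be an int, a slice, or a (row, column) pair")
        row, col = key
        if isinstance(row, int) and isinstance(col, int):
            return self.rows[row][col]           # align[r, c]: one character
        # Normalise the row selection to a list of row strings.
        selected = [self.rows[row]] if isinstance(row, int) else self.rows[row]
        if isinstance(col, int):
            return "".join(r[col] for r in selected)     # align[:, c]: column string
        return ToyAlignment([r[col] for r in selected])  # anything else: sub-alignment


aln = ToyAlignment(["ATCGTTGC", "ATCCTTGC", "ATCCGTGC"])
print(aln[0])
print(aln[1, 3])
print(aln[:, 4])
print(aln[0:2, 1:3].rows)
print(list(aln))  # iteration works through __getitem__ alone
```

Because __getitem__ raises IndexError past the last row, Python's
fallback iteration protocol can loop over the rows without an explicit
__iter__ method, which is the "for free" behaviour the comment mentions.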
Thanks

From biopython-dev at maubp.freeserve.co.uk Fri Jul 27 12:18:21 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Fri, 27 Jul 2007 13:18:21 +0100
Subject: [Biopython-dev] syntax of indices for future Alignment object
In-Reply-To: <46A9CD2E.6080402@genesilico.pl>
References: <46A9CD2E.6080402@genesilico.pl>
Message-ID: <46A9E28D.40609@maubp.freeserve.co.uk>

Jan Kosinski wrote:
> Hi,
>
> From the viewpoint of the enduser we would like python Alignment object
> to behave outside as an array so we could get slices, columns,
> sequences, their fragments, whatever we want etc. The most intuitive and
> clear (certainly much better than not very clear indexes like
> [xxx:yyy:zzz]) for the user is the following.
>
> [A:B][X:Y] - general syntax of indices. This supports almost everything.

I think Michiel and I were suggesting [A:B,X:Y] or rather [A:B:C,X:Y:Z]
to be fully general, rather than [A:B][X:Y] or [A:B:C][X:Y:Z]

i.e. [arg1, arg2] rather than [arg1][arg2]

This is an important point, as in the first case the __getitem__ method
of the alignment is called once (with both arguments). In the second
case, the __getitem__ method is called with arg1, and may return a
SeqRecord or an alignment - and this object's __getitem__ method is
called with arg2.

As written, many of your cases appear to be impossible - but using the
[arg1,arg2] form we can get close.

I've got a working bit of code put together now which I'll attach to
bug 1944 soon.

http://bugzilla.open-bio.org/show_bug.cgi?id=1944

Peter

From kosa at genesilico.pl Fri Jul 27 14:13:24 2007
From: kosa at genesilico.pl (Jan Kosinski)
Date: Fri, 27 Jul 2007 16:13:24 +0200
Subject: [Biopython-dev] syntax of indices for future Alignment object
In-Reply-To: <46A9E28D.40609@maubp.freeserve.co.uk>
References: <46A9CD2E.6080402@genesilico.pl> <46A9E28D.40609@maubp.freeserve.co.uk>
Message-ID: <46A9FD84.4080502@genesilico.pl>

Hi,

Ok, I agree that [A:B][X:Y] syntax is not possible here. [A:B,X:Y] is
fine. However, I would recommend not using [A:B:C,X:Y:Z] since it is not
clear while the [A:B][X:Y] syntax is clear and sufficient.

We had another discussion in the lab about whether the Alignment object
should store records not in a list but rather in a dictionary (while
keeping information about sequence order) or similar. What is your
reasoning for making the Alignment object a list of SeqRecord objects?

One should think carefully about the design of the Alignment class since
it will influence all further steps. As the class is now in its infancy,
this is a very good moment for thinking about what the Alignment class
is for and what it should support. For instance, the Alignment object
should support changing characters in the alignment without a need to
copy it (using aln[a][x] = "D"). Can it be done now with an Alignment
which is a list of SeqRecord objects with sequences implemented as
immutable Seq objects?

Cheers,
Jan Kosinski

From kosa at genesilico.pl Fri Jul 27 14:35:15 2007
From: kosa at genesilico.pl (Jan Kosinski)
Date: Fri, 27 Jul 2007 16:35:15 +0200
Subject: [Biopython-dev] syntax of indices for future Alignment object
In-Reply-To: <46A9FD84.4080502@genesilico.pl>
References: <46A9CD2E.6080402@genesilico.pl> <46A9E28D.40609@maubp.freeserve.co.uk> <46A9FD84.4080502@genesilico.pl>
Message-ID: <46AA02A3.30000@genesilico.pl>

Hi,

Sorry for a typo ;-)
Of course it should read:
"... while the [A:B,X:Y] syntax is clear and sufficient."

Cheers,
Janek

From biopython-dev at maubp.freeserve.co.uk Fri Jul 27 17:11:03 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Fri, 27 Jul 2007 18:11:03 +0100
Subject: [Biopython-dev] Improving the Alignment object
In-Reply-To: <46A9FD84.4080502@genesilico.pl>
References: <46A9CD2E.6080402@genesilico.pl> <46A9E28D.40609@maubp.freeserve.co.uk> <46A9FD84.4080502@genesilico.pl>
Message-ID: <46AA2727.103@maubp.freeserve.co.uk>

Jan Kosinski wrote:
> We had another discussion in the lab about that Alignment object should
> not store records in the list but rather in a dictionary (but keeping
> information about sequence order ) or so. What is you reasoning for
> making Alignment object a list of SeqRecord objects?

In a sense the Bio.Align.Generic.Alignment object always was a list of
SeqRecords (if you look at the internal implementation that is), and I
hadn't stopped to really question it. I like having list-like behaviour
and exploit this in a lot of my code dealing with alignments.

There are some nice things about having dictionary-like behaviour in an
alignment class, but unless a notional sequence order is preserved, this
breaks the array of characters model.

Also, using a dictionary-like alignment would force the user to specify
unique keys for each record (e.g. the record.id) which is something the
current list-like alignment does not require.

Perhaps we could have a "dictionary like" sub class of Alignment where
the __getitem__ method would allow a record identifier in place of a row
index:

    print aln["P3454"]
    print aln["P3454", 20]

instead of, or as well as:

    print aln[10]
    print aln[10, 20]

> One should carefully think about design of the Alignment class since it
> will influence all further steps. As now the class is in its infancy
> there is a very good moment for thinking what the Alignment class is for
> and what it should support.

I had viewed the new __getitem__ method as a backwards compatible
enhancement of the existing stable (but rather limited)
Bio.Align.Generic.Alignment class. That's not to say we can't design a
new class from scratch - I just prefer gradual improvements without
breaking existing usage.

I am particularly keen to allow slicing of alignments. For example, you
could select the conserved core of an alignment by removing the leftmost
10 columns and the rightmost 10 columns:

    align_core = aln[:,10:-10]

> For instance, the Alignment object should
> support changing characters in the alignment without a need of copying
> it (using aln[a,x] = "D"). Can it be done now with Alignment which is
> a list of SeqRecord objects with sequences implemented as immutable Seq
> objects ?

No, right now you can't easily edit sequences in a
Bio.Align.Generic.Alignment (even with the proposed change) as it is
implemented using immutable Seq objects. I personally haven't needed to
edit an alignment like this. Is this something you want to do often?

To me the obvious way to handle this is to have a MutableAlignment
sub-class, where editing individual elements with aln[r,c] = "D" would
be supported (possibly implemented using the MutableSeq class internally
rather than the immutable Seq class).

On a related point, I was planning to raise the following suggestion in
the future - adding alignments, like this:

    combined_aln = aln1 + aln2

e.g. if aln1 had 5 rows of length 10, and aln2 had 5 rows of length 15,
then the result of aln1+aln2 would have 5 rows of length 25.

Alignment addition would only be defined for alignments with the same
number of rows (perhaps also restricted to the same sequence type, and
row weights?). The result would contain the same number of rows, where
each sequence was the concatenation of the corresponding two rows in the
input alignments.

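[Editorial sketch] The row-wise addition described above can be checked
with plain lists of strings. concatenate_alignments is a hypothetical
helper written for this illustration, not a Biopython function, and it
ignores record identifiers, sequence types and row weights.

```python
def concatenate_alignments(left, right):
    """Join two alignments row by row; each is a list of equal-length strings."""
    if len(left) != len(right):
        raise ValueError("alignments must have the same number of rows")
    # Row i of the result is row i of left followed by row i of right.
    return [a + b for a, b in zip(left, right)]


aln1 = ["ACGTACGTAC"] * 5       # 5 rows of length 10
aln2 = ["TTGCATTGCATTGCA"] * 5  # 5 rows of length 15
combined = concatenate_alignments(aln1, aln2)
print(len(combined), len(combined[0]))
```

With the sizes from the example in the message, the result has 5 rows
of length 25.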
I'd suggest concatenating the record.id's (if different) however one
could argue that it would be better to insist the user had made sure the
two alignments had consistent identifiers.

An example of where this could be used is taking alignments of multiple
sets of homologous genes, sorting them to use the same species order,
and then creating a concatenated alignment for robust phylogenetic tree
construction.

Peter

From mdehoon at c2b2.columbia.edu Sat Jul 28 02:57:05 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Sat, 28 Jul 2007 11:57:05 +0900
Subject: [Biopython-dev] syntax of indices for future Alignment object
In-Reply-To: <46A9FD84.4080502@genesilico.pl>
References: <46A9CD2E.6080402@genesilico.pl> <46A9E28D.40609@maubp.freeserve.co.uk> <46A9FD84.4080502@genesilico.pl>
Message-ID: <46AAB081.30609@c2b2.columbia.edu>

Jan Kosinski wrote:
> Hi,
>
> Ok, I agree that [A:B][X:Y] syntax is not possible here. [A:B,X:Y] is
> fine. However, I would recommend not using [A:B:C,X:Y:Z] since it is not
> clear while the [A:B][X:Y] syntax is clear and sufficient.

Python lists, tuples, and strings support [A:B:C], and Numerical Python
2D arrays support [A:B:C,X:Y:Z]. I don't see why the Alignment should
not support this format.

--Michiel.

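[Editorial sketch] The three-part [A:B:C] slice mentioned above already
works on built-in lists and strings, so a two-axis [A:B:C,X:Y:Z] index
on an alignment would only combine two familiar operations. The example
uses plain Python with no array library; the sequences are the ones
that appear elsewhere in this thread.

```python
seqs = ["ATCGTTGC", "ATCCTTGC", "ATCCGTGC"]

# One axis: every second row (start 0, stop 3, step 2).
every_other_row = seqs[0:3:2]
print(every_other_row)

# The same slice syntax on a string: every third letter from position 1 up to 7.
print(seqs[0][1:7:3])

# What a two-axis index [0:3:2, 1:7:3] would select, spelled out by hand.
print([s[1:7:3] for s in seqs[0:3:2]])
```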
From mdehoon at c2b2.columbia.edu Sat Jul 28 03:10:06 2007 From: mdehoon at c2b2.columbia.edu (Michiel de Hoon) Date: Sat, 28 Jul 2007 12:10:06 +0900 Subject: [Biopython-dev] Improving the Alignment object In-Reply-To: <46AA2727.103@maubp.freeserve.co.uk> References: <46A9CD2E.6080402@genesilico.pl> <46A9E28D.40609@maubp.freeserve.co.uk> <46A9FD84.4080502@genesilico.pl> <46AA2727.103@maubp.freeserve.co.uk> Message-ID: <46AAB38E.50009@c2b2.columbia.edu> Peter wrote: > Perhaps we could have a "dictionary like" sub class of Alignment where > the __getitem__ method would allow a record identifier in place of a row > index: > > print aln["P3454"] > print aln["P3454", 20] > > instead or as well as: > > print aln[10] > print aln[10, 20] "as well as" would break if a user decides to use an integer as a key in the dictionary. A safer approach would be to define a method specifically for dictionary-like access. Something like: print aln[10] print aln[10,20] for list-like access, and print aln.get("P3454") for dictionary-like access. --Michiel. From mdehoon at c2b2.columbia.edu Sat Jul 28 04:11:03 2007 From: mdehoon at c2b2.columbia.edu (Michiel de Hoon) Date: Sat, 28 Jul 2007 13:11:03 +0900 Subject: [Biopython-dev] syntax of indices for future Alignment object In-Reply-To: <46A9E28D.40609@maubp.freeserve.co.uk> References: <46A9CD2E.6080402@genesilico.pl> <46A9E28D.40609@maubp.freeserve.co.uk> Message-ID: <46AAC1D7.8030208@c2b2.columbia.edu> Peter wrote: > I've got a working bit of code put together now which I'll attached to > bug 1944 soon. > > http://bugzilla.open-bio.org/show_bug.cgi?id=1944 > For the most part, I agree with the functionality in this patch. 
I have three suggestions though:

>>> aln = Alignment(alphabet)
# Suggestion 1: We should allow creating an Alignment without specifying an alphabet
>>> aln.add_sequence("seq1", "ATCGTTGC")
>>> aln.add_sequence("seq2", "ATCCTTGC")
>>> aln.add_sequence("seq3", "ATCCGTGC")
>>> aln[0]
SeqRecord(seq=Seq('ATCGTTGC', Alphabet()), id='', name='', description='seq1', dbxrefs=[])
# Suggestion 2: I would expect "seq1" as the id rather than the description
>>> aln[:2]
# OK
>>> aln[:,4]
'TTG'
# OK
>>> aln[2,:]
# Suggestion 3: Here, I would expect "ATCCGTGC" instead. An alignment consisting of a single sequence doesn't make much sense.

--Michiel.

From mdehoon at c2b2.columbia.edu Sat Jul 28 04:20:24 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Sat, 28 Jul 2007 13:20:24 +0900
Subject: [Biopython-dev] Improving the Alignment object
In-Reply-To: <46AA2727.103@maubp.freeserve.co.uk>
References: <46A9CD2E.6080402@genesilico.pl> <46A9E28D.40609@maubp.freeserve.co.uk> <46A9FD84.4080502@genesilico.pl> <46AA2727.103@maubp.freeserve.co.uk>
Message-ID: <46AAC408.2050703@c2b2.columbia.edu>

Peter wrote:
>> For instance, the Alignment object should
>> support changing characters in the alignment without a need of copying
>> it (using aln[a,x] = "D"). Can it be done now with Alignment which is
>> a list of SeqRecord objects with sequences implemented as immutable Seq
>> objects ?
> ....
>
> To me the obvious way to handle this is to have a MutableAlignment
> sub-class, where editing individual elements with aln[r,c] = "D" would
> be supported (possibly implemented using the MutableSeq class internally
> rather than the immutable Seq class).
>
I don't think we'd need a separate MutableAlignment for that. An
Alignment is a list of sequences and is therefore mutable. If we add a
__setitem__ method to the Alignment class, then this method can take
care of constructing a new sequence and put it in the appropriate row.

--Michiel.
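The __setitem__ idea above can be sketched roughly as follows. This is an illustration under stated assumptions, not the Biopython implementation: ToyAlignment is a hypothetical class, and plain Python strings stand in for immutable Seq objects.

```python
class ToyAlignment:
    """Hypothetical stand-in: a mutable list of immutable row strings."""

    def __init__(self, rows):
        self._rows = list(rows)

    def __getitem__(self, index):
        row, col = index
        return self._rows[row][col]

    def __setitem__(self, index, letter):
        row, col = index
        if len(letter) != 1:
            raise ValueError("expected a single character")
        old = self._rows[row]
        # Normalise negative columns; raises IndexError when out of range.
        col = range(len(old))[col]
        # The row itself is never edited in place: a new string is built
        # and stored in the same position of the (mutable) list.
        self._rows[row] = old[:col] + letter + old[col + 1:]


aln = ToyAlignment(["ATCGTTGC", "ATCCTTGC", "ATCCGTGC"])
aln[0, 3] = "D"
print(aln[0, 3])
```

Each assignment copies one row, so the cost grows with the row length rather than with the size of the whole alignment.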
From biopython-dev at maubp.freeserve.co.uk Sat Jul 28 10:04:04 2007
From: biopython-dev at maubp.freeserve.co.uk (Peter)
Date: Sat, 28 Jul 2007 11:04:04 +0100
Subject: [Biopython-dev] syntax of indices for future Alignment object
In-Reply-To: <46AAC1D7.8030208@c2b2.columbia.edu>
References: <46A9CD2E.6080402@genesilico.pl> <46A9E28D.40609@maubp.freeserve.co.uk> <46AAC1D7.8030208@c2b2.columbia.edu>
Message-ID: <46AB1494.301@maubp.freeserve.co.uk>

Michiel de Hoon wrote:
> Peter wrote:
>> I've got a working bit of code put together now which I'll attached to
>> bug 1944 soon.
>>
>> http://bugzilla.open-bio.org/show_bug.cgi?id=1944
>>
> For the most part, I agree with the functionality in this patch. I have
> three suggestions though:
>
> >>> aln = Alignment(alphabet)
> # Suggestion 1: We should allow creating an Alignment without specifying
> an alphabet

That would mean changing the existing __init__ from:

    def __init__(self, alphabet):

to something like:

    def __init__(self, alphabet=single_letter_alphabet):

with this import statement added:

    from Bio.Alphabet import single_letter_alphabet

This seems like a good idea, and shouldn't break any existing code either.

> >>> aln.add_sequence("seq1", "ATCGTTGC")
> >>> aln.add_sequence("seq2", "ATCCTTGC")
> >>> aln.add_sequence("seq3", "ATCCGTGC")
> >>> aln[0]
> SeqRecord(seq=Seq('ATCGTTGC', Alphabet()), id='',
> name='', description='seq1', dbxrefs=[])
> # Suggestion 2: I would expect "seq1" as the id rather than the description

I agree with you here - this is the historic behaviour of the
add_sequence method which actually creates a SeqRecord from the strings
it is given. I would suggest it populate the record.id but for backwards
compatibility still populate the record.description in case anyone is
still using that.

We also could add an add_record method to the alignment object which
takes a SeqRecord, plus optional weight (and start and end?).
Marc Colosimo also made this point on bug 1944 (although I don't like his mixed case method name). > >>> aln[:2] > > # OK > >>> aln[:,4] > 'TTG' > # OK > >>> aln[2,:] > > # Suggestion 3: Here, I would expect "ATCCGTGC" instead. An alignment > consisting of a single sequence doesn't make much sense. I'll have a closer look, but as aln[2] returns a single SeqRecord maybe aln[2,:] should do that too - rather than returning a string. Peter From biopython-dev at maubp.freeserve.co.uk Sat Jul 28 13:14:43 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Sat, 28 Jul 2007 14:14:43 +0100 Subject: [Biopython-dev] Improving the Alignment object In-Reply-To: <46AAC408.2050703@c2b2.columbia.edu> References: <46A9CD2E.6080402@genesilico.pl> <46A9E28D.40609@maubp.freeserve.co.uk> <46A9FD84.4080502@genesilico.pl> <46AA2727.103@maubp.freeserve.co.uk> <46AAC408.2050703@c2b2.columbia.edu> Message-ID: <46AB4143.5070406@maubp.freeserve.co.uk> Michiel de Hoon wrote: > Peter wrote: >>> For instance, the Alignment object should >>> support changing characters in the alignment without a need of copying >>> it (using aln[a,x] = "D"). Can it be done now with Alignment which is >>> a list of SeqRecord objects with sequences implemented as immutable Seq >>> objects ? > .... >> To me the obvious way to handle this is to have a MutableAlignment >> sub-class, where editing individual elements with aln[r,c] = "D" would >> be supported (possibly implemented using the MutableSeq class internally >> rather than the immutable Seq class). >> > I don't think we'd need a separate MutableAlignment for that. An > Alignment is a list of sequences and is therefore mutable. If we add a > __setitem__ method to the Alignment class, then this method can take > care of constructing a new sequence and put it in the appropriate row. > So rather than editing one character of a MutableSeq, we could replace one immutable Seq object with a new immutable Seq object where one character was different? 
That would work - sounds a little slow, but certainly possible.

Peter

From mdehoon at c2b2.columbia.edu Sat Jul 28 15:15:49 2007
From: mdehoon at c2b2.columbia.edu (Michiel de Hoon)
Date: Sun, 29 Jul 2007 00:15:49 +0900
Subject: [Biopython-dev] syntax of indices for future Alignment object
In-Reply-To: <46AB1494.301@maubp.freeserve.co.uk>
References: <46A9CD2E.6080402@genesilico.pl> <46A9E28D.40609@maubp.freeserve.co.uk> <46AAC1D7.8030208@c2b2.columbia.edu> <46AB1494.301@maubp.freeserve.co.uk>
Message-ID: <46AB5DA5.6050604@c2b2.columbia.edu>

# Current method to add a row to the alignment:
>>> aln.add_sequence("seq1", "ATCGTTGC")
...

Peter wrote:
> We also could add an add_record method to the alignment object which
> takes a SeqRecord, plus optional weight (and start and end?). Marc
> Colosimo also made this point on bug 1944 (although I don't like his
> mixed case method name).

This is Marc Colosimo's suggestion for adding a SeqRecord:

    def addSeqRecord(self, seqRec):
        """Add a Sequence Record to the Alignment

        @param seqRec: a sequence record (SeqRecord) to add.
        """
        if isinstance(seqRec, SeqRecord):
            self._records.append(seqRec)
        else:
            raise TypeError("sequence is NOT a SeqRecord Object")

Since an Alignment is essentially a list of SeqRecords, I propose that
we call the method to add a row to this list "append". In addition, this
method should be able to take a SeqRecord, a Seq object, or a plain
string. Something like this:

    def append(self, sequence):
        if isinstance(sequence, SeqRecord):
            self._records.append(sequence)
        elif isinstance(sequence, Seq):
            self._records.append(SeqRecord(sequence))
        elif isinstance(sequence, str):
            self._records.append(SeqRecord(Seq(sequence)))
        else:
            raise TypeError("sequence should be a string, a Seq Object, or a SeqRecord object")

This method can be generalized to allow a descriptor, weight, start,
end, just like in the current add_sequence method. Then we can replace
add_sequence and addSeqRecord by a single append method.

--Michiel.
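A self-contained sketch of the proposed append method, generalized with a descriptor argument, might look like the following. Everything here is an assumption made for illustration: ToySeq, ToySeqRecord and ToyAlignment are hypothetical stand-ins for Seq, SeqRecord and Alignment so that the example runs without Biopython, and using the descriptor as the record id is only one possible reading of the proposal.

```python
from dataclasses import dataclass


@dataclass
class ToySeq:
    """Hypothetical stand-in for Bio.Seq.Seq."""
    data: str


@dataclass
class ToySeqRecord:
    """Hypothetical stand-in for Bio.SeqRecord.SeqRecord."""
    seq: ToySeq
    id: str = ""


class ToyAlignment:
    """Hypothetical stand-in for the Alignment class under discussion."""

    def __init__(self):
        self._records = []

    def append(self, sequence, descriptor=""):
        # Normalise every accepted input type to a record before storing it.
        if isinstance(sequence, ToySeqRecord):
            record = sequence  # already a record; the descriptor is ignored
        elif isinstance(sequence, ToySeq):
            record = ToySeqRecord(sequence, descriptor)
        elif isinstance(sequence, str):
            record = ToySeqRecord(ToySeq(sequence), descriptor)
        else:
            raise TypeError(
                "sequence should be a string, a Seq object, or a SeqRecord object"
            )
        self._records.append(record)


aln = ToyAlignment()
aln.append("ATCGTTGC", "seq1")
aln.append(ToySeq("ATCCTTGC"), "seq2")
aln.append(ToySeqRecord(ToySeq("ATCCGTGC"), "seq3"))
print([record.id for record in aln._records])
```

Whatever the caller passes in, the alignment only ever stores records, so the rest of the class can assume a single row type.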
From mdehoon at c2b2.columbia.edu Sat Jul 28 15:17:52 2007 From: mdehoon at c2b2.columbia.edu (Michiel de Hoon) Date: Sun, 29 Jul 2007 00:17:52 +0900 Subject: [Biopython-dev] syntax of indices for future Alignment object In-Reply-To: <46AB1494.301@maubp.freeserve.co.uk> References: <46A9CD2E.6080402@genesilico.pl> <46A9E28D.40609@maubp.freeserve.co.uk> <46AAC1D7.8030208@c2b2.columbia.edu> <46AB1494.301@maubp.freeserve.co.uk> Message-ID: <46AB5E20.5090605@c2b2.columbia.edu> Peter wrote: > Michiel de Hoon wrote: >> >>> aln.add_sequence("seq1", "ATCGTTGC") >> >>> aln[0] >> SeqRecord(seq=Seq('ATCGTTGC', Alphabet()), id='', >> name='', description='seq1', dbxrefs=[]) >> # Suggestion 2: I would expect "seq1" as the id rather than the >> description > > I agree with you here - this is the historic behaviour of the > add_sequence method which actually creates a SeqRecord from the strings > it is given. I would suggest it populate the record.id but for backwards > compatibility still populate the record.description in case anyone is > still using that. > That sounds good to me. --Michiel. From mdehoon at c2b2.columbia.edu Sat Jul 28 15:23:51 2007 From: mdehoon at c2b2.columbia.edu (Michiel de Hoon) Date: Sun, 29 Jul 2007 00:23:51 +0900 Subject: [Biopython-dev] Improving the Alignment object In-Reply-To: <46AB4143.5070406@maubp.freeserve.co.uk> References: <46A9CD2E.6080402@genesilico.pl> <46A9E28D.40609@maubp.freeserve.co.uk> <46A9FD84.4080502@genesilico.pl> <46AA2727.103@maubp.freeserve.co.uk> <46AAC408.2050703@c2b2.columbia.edu> <46AB4143.5070406@maubp.freeserve.co.uk> Message-ID: <46AB5F87.1090506@c2b2.columbia.edu> Peter wrote: > Michiel de Hoon wrote: >> Peter wrote: >>>> For instance, the Alignment object should >>>> support changing characters in the alignment without a need of >>>> copying it (using aln[a,x] = "D"). Can it be done now with >>>> Alignment which is a list of SeqRecord objects with sequences >>>> implemented as immutable Seq objects ? >> .... 
>>> To me the obvious way to handle this is to have a MutableAlignment >>> sub-class, where editing individual elements with aln[r,c] = "D" >>> would be supported (possibly implemented using the MutableSeq class >>> internally rather than the immutable Seq class). >>> >> I don't think we'd need a separate MutableAlignment for that. An >> Alignment is a list of sequences and is therefore mutable. If we add a >> __setitem__ method to the Alignment class, then this method can take >> care of constructing a new sequence and put it in the appropriate row. >> > So rather than editing one character of a MutableSeq, we could replace > one immutable Seq object with a new immutable Seq object where one > character was different? That would work - sounds a little slow, but > certainly possible. > At first, I also thought that that would be slow, especially for long sequences. But in practice, it's surprisingly fast. Unless somebody wants to edit an alignment of chromosome-size sequences, we probably won't run into a speed problem. --Michiel. From biopython-dev at maubp.freeserve.co.uk Sat Jul 28 16:00:34 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Sat, 28 Jul 2007 17:00:34 +0100 Subject: [Biopython-dev] adding rows to an alignment object In-Reply-To: <46AB5DA5.6050604@c2b2.columbia.edu> References: <46A9CD2E.6080402@genesilico.pl> <46A9E28D.40609@maubp.freeserve.co.uk> <46AAC1D7.8030208@c2b2.columbia.edu> <46AB1494.301@maubp.freeserve.co.uk> <46AB5DA5.6050604@c2b2.columbia.edu> Message-ID: <46AB6822.6090706@maubp.freeserve.co.uk> Michiel de Hoon wrote: > Since an Alignment is essentially a list of SeqRecords, I propose that > we call the method to add a row to this list "append". Sounds very sensible. > In addition, this method should be able to take a SeqRecord, a Seq > object, or a plain string. Do you really think we should complicate things like this? I would just accept SeqRecord objects (with optional start/end/weight). 
> Something like this:
>
> def append(self, sequence):
>     if isinstance(sequence, SeqRecord):
>         self._records.append(sequence)
>     elif isinstance(sequence, Seq):
>         self._records.append(SeqRecord(sequence))
>     elif isinstance(sequence, str):
>         self._records.append(SeqRecord(Seq(sequence)))
>     else:
>         raise TypeError("sequence should be a string, a Seq Object, or a SeqRecord object")

One minor point - we should use the alignment's alphabet when building a
Seq object from a string. Perhaps we should even check the alphabet when
asked to append a SeqRecord or Seq object...

> This method can be generalized to allow a descriptor, weight, start,
> end, just like in the current add_sequence method.

Where the descriptor is expected for Seq and string input, and used as
the SeqRecord's id?

I would personally check the length matches the rest of the alignment
(something the current add_sequence method doesn't do) otherwise it's
very easy to get a malformed alignment where some sequences are longer
than others.

Also, I would leave the existing .add_sequence() method in place, but
update its docstring to encourage use of .append() instead.
Peter From biopython-dev at maubp.freeserve.co.uk Sat Jul 28 15:49:11 2007 From: biopython-dev at maubp.freeserve.co.uk (Peter) Date: Sat, 28 Jul 2007 16:49:11 +0100 Subject: [Biopython-dev] syntax of indices for future Alignment object In-Reply-To: <46AB5E20.5090605@c2b2.columbia.edu> References: <46A9CD2E.6080402@genesilico.pl> <46A9E28D.40609@maubp.freeserve.co.uk> <46AAC1D7.8030208@c2b2.columbia.edu> <46AB1494.301@maubp.freeserve.co.uk> <46AB5E20.5090605@c2b2.columbia.edu> Message-ID: <46AB6577.6050708@maubp.freeserve.co.uk> Michiel de Hoon wrote: > Peter wrote: >> Michiel de Hoon wrote: >>> >>> aln.add_sequence("seq1", "ATCGTTGC") >>> >>> aln[0] >>> SeqRecord(seq=Seq('ATCGTTGC', Alphabet()), id='', >>> name='', description='seq1', dbxrefs=[]) >>> # Suggestion 2: I would expect "seq1" as the id rather than the >>> description >> I agree with you here - this is the historic behaviour of the >> add_sequence method which actually creates a SeqRecord from the strings >> it is given. I would suggest it populate the record.id but for backwards >> compatibility still populate the record.description in case anyone is >> still using that. >> > That sounds good to me. Good. Done, CVS revision 1.6 of file Bio/Align/Generic.py Peter From kosa at genesilico.pl Sat Jul 28 16:53:04 2007 From: kosa at genesilico.pl (Jan Kosinski) Date: Sat, 28 Jul 2007 18:53:04 +0200 Subject: [Biopython-dev] syntax of indices for future Alignment object In-Reply-To: <46AAB081.30609@c2b2.columbia.edu> References: <46A9CD2E.6080402@genesilico.pl> <46A9E28D.40609@maubp.freeserve.co.uk> <46A9FD84.4080502@genesilico.pl> <46AAB081.30609@c2b2.columbia.edu> Message-ID: <46AB7470.6010006@genesilico.pl> Hi, I just do not see what [A:B:C,X:Y:Z] adds to [A:B,X:Y] in case of alignments. Ins't [A:B,X:Y] sufficient? Janek Michiel de Hoon wrote: > Jan Kosinski wrote: >> Hi, >> >> Ok, I agree that [A:B][X:Y] syntax is not possible here. [A:B,X:Y] is >> fine. 
However, I would recommend not using [A:B:C,X:Y:Z] since it is >> not clear while the [A:B][X:Y] syntax is clear and sufficient. > > Python lists, tuples, and strings support [A:B:C], and Numerical > Python 2D arrays support [A:B:C,X:Y:Z]. I don't see why the Alignment > should not support this format. > > --Michiel. > > :. > :. From kosa at genesilico.pl Sat Jul 28 16:55:33 2007 From: kosa at genesilico.pl (Jan Kosinski) Date: Sat, 28 Jul 2007 18:55:33 +0200 Subject: [Biopython-dev] Improving the Alignment object In-Reply-To: <46AAC408.2050703@c2b2.columbia.edu> References: <46A9CD2E.6080402@genesilico.pl> <46A9E28D.40609@maubp.freeserve.co.uk> <46A9FD84.4080502@genesilico.pl> <46AA2727.103@maubp.freeserve.co.uk> <46AAC408.2050703@c2b2.columbia.edu> Message-ID: <46AB7505.30302@genesilico.pl> Hi, I think the same, an alignment should be mutable and there is no need for making two classes, mutable and not mutable. Janek Michiel de Hoon wrote: > Peter wrote: >>> For instance, the Alignment object should >>> support changing characters in the alignment without a need of >>> copying it (using aln[a,x] = "D"). Can it be done now with >>> Alignment which is a list of SeqRecord objects with sequences >>> implemented as immutable Seq objects ? >> > .... >> >> To me the obvious way to handle this is to have a MutableAlignment >> sub-class, where editing individual elements with aln[r,c] = "D" >> would be supported (possibly implemented using the MutableSeq class >> internally rather than the immutable Seq class). >> > I don't think we'd need a separate MutableAlignment for that. An > Alignment is a list of sequences and is therefore mutable. If we add a > __setitem__ method to the Alignment class, then this method can take > care of constructing a new sequence and put it in the appropriate row. > > --Michiel. > > :. > :. 
From mdehoon at c2b2.columbia.edu Sun Jul 29 04:38:28 2007 From: mdehoon at c2b2.columbia.edu (Michiel de Hoon) Date: Sun, 29 Jul 2007 13:38:28 +0900 Subject: [Biopython-dev] syntax of indices for future Alignment object In-Reply-To: <46AB7470.6010006@genesilico.pl> References: <46A9CD2E.6080402@genesilico.pl> <46A9E28D.40609@maubp.freeserve.co.uk> <46A9FD84.4080502@genesilico.pl> <46AAB081.30609@c2b2.columbia.edu> <46AB7470.6010006@genesilico.pl> Message-ID: <46AC19C4.1000102@c2b2.columbia.edu> Jan Kosinski wrote: > I just do not see what [A:B:C,X:Y:Z] adds to [A:B,X:Y] in case of > alignments. Ins't [A:B,X:Y] sufficient? > [A:B,X:Y] may be sufficient, but does not agree with Python indices for other objects (lists, tuples, strings). In addition, since allowing [A:B,X:Y] only is different from usual Python usage, we'd actually end up writing more code to specifically disallow [A:B:C,X:Y:Z]. Note also that [A:B:C,X:Y:Z] includes [A:B,X:Y] as a special case. So if the Alignment class is written to deal with [A:B:C,X:Y:Z], but I'd tell you that it expects [A:B,X:Y], then you wouldn't notice any difference. Until you'd try [A:B:C,X:Y:Z] and you find out that that works too. --Michiel.
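The point that [A:B,X:Y] is just a special case of [A:B:C,X:Y:Z] can be illustrated with a small sketch. SliceableRows is a hypothetical toy class over plain strings, not the Alignment class from bug 1944. Its __getitem__ never mentions the step: it passes each slice straight to list and string indexing, which already handle a missing step.

```python
class SliceableRows:
    """Hypothetical toy alignment: rows of equal-length strings."""

    def __init__(self, rows):
        self._rows = list(rows)

    def __getitem__(self, index):
        row_key, col_key = index
        if isinstance(row_key, int):
            # Single row: return a letter or a sub-string of that row.
            return self._rows[row_key][col_key]
        selected = self._rows[row_key]  # list slicing handles any step
        if isinstance(col_key, int):
            # Single column across the selected rows, as a string.
            return "".join(row[col_key] for row in selected)
        # String slicing handles any step as well.
        return SliceableRows(row[col_key] for row in selected)


aln = SliceableRows(["ATCGTTGC", "ATCCTTGC", "ATCCGTGC"])

without_step = aln[0:2, 2:6]     # step omitted on both axes
with_step = aln[0:3:2, 0:8:3]    # every 2nd row, every 3rd column
column = aln[:, 4]

print(without_step._rows)
print(with_step._rows)
print(column)
```

Rejecting the three-part form would need an additional explicit check on each slice's step attribute, which is the extra code referred to in the message above.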