From bugzilla-daemon at portal.open-bio.org Sat Nov 1 00:02:49 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 1 Nov 2008 00:02:49 -0400 Subject: [Biopython-dev] [Bug 2627] Updated Bio.MarkovModel to remove oldnumeric and listfns imports In-Reply-To: Message-ID: <200811010402.mA142nUi010329@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2627 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #3 from mdehoon at ims.u-tokyo.ac.jp 2008-11-01 00:02 EST ------- I made some changes to this patch and committed it to CVS; see MarkovModel.py revision 1.9. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sat Nov 1 01:38:41 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 1 Nov 2008 01:38:41 -0400 Subject: [Biopython-dev] [Bug 2631] Updated Bio.MaxEntropy to remove listfns import In-Reply-To: Message-ID: <200811010538.mA15cfGM016656@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2631 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #5 from mdehoon at ims.u-tokyo.ac.jp 2008-11-01 01:38 EST ------- Committed to CVS with some changes; see MaxEntropy.py versions 1.8 and 1.9. I added your example at the bottom of Bio/MaxEntropy.py. Next time, instead of the complete new code for a module, please attach a patch instead. Thanks! 
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sat Nov 1 02:59:40 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 1 Nov 2008 02:59:40 -0400 Subject: [Biopython-dev] [Bug 2629] Updated Bio.NaiveBayes to listfns import In-Reply-To: Message-ID: <200811010659.mA16xedF020106@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2629 ------- Comment #3 from mdehoon at ims.u-tokyo.ac.jp 2008-11-01 02:59 EST ------- I committed part of this patch to CVS; see NaiveBayes.py revision 1.9. Could you check your classify function? It seems to contain some debugging statements. Also, do we need the classifyprob function? If you send in a new version of this code, please attach it as a patch to the current version of NaiveBayes.py in CVS. Thanks! -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Sat Nov 1 17:22:53 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 1 Nov 2008 17:22:53 -0400
Subject: [Biopython-dev] [Bug 2592] numpy migration for Bio.PDB.Vector
In-Reply-To:
Message-ID: <200811012122.mA1LMrf6021694@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2592


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-01 17:22 EST -------
Fixed in CVS, see Bio/PDB/Vector.py revision 1.45


From bugzilla-daemon at portal.open-bio.org Sat Nov 1 18:11:47 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 1 Nov 2008 18:11:47 -0400
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq)
In-Reply-To:
Message-ID: <200811012211.mA1MBl3b026482@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381


------- Comment #26 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-01 18:11 EST -------
Here is an example of how the updated Seq object might be used (taken from
the new edition of the tutorial in CVS):

>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import IUPAC
>>> coding_dna = Seq("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG", IUPAC.unambiguous_dna)
>>> coding_dna.translate()
Seq('MAIVMGR*KGAR*', HasStopCodon(IUPACProtein(), '*'))
>>> coding_dna.translate(to_stop=True)
Seq('MAIVMGR', IUPACProtein())

Using the Vertebrate Mitochondrial table instead:

>>> coding_dna.translate(table="Vertebrate Mitochondrial")
Seq('MAIVMGRWKGAR*', HasStopCodon(IUPACProtein(), '*'))
>>> coding_dna.translate(table=2)
Seq('MAIVMGRWKGAR*', HasStopCodon(IUPACProtein(), '*'))
>>> coding_dna.translate(table=2, to_stop=True)
Seq('MAIVMGRWKGAR', IUPACProtein())

As I said in comment 24, the name "to_stop" and its behaviour are taken from
the old (now obsolete) Bio.Translate module.

-------------------------------------------------------------

I'm also considering adding another boolean argument (see comment 22):

> Validate the first codon is a valid start codon, and translate
> it as M (even if going on the genetic code it would normally be
> say L). This should be a boolean argument defaulting to False,
> possible names "start", "check_start", "from_start", ...

I would prefer to avoid calling this argument "start" given the existing
meaning associated with "start" and "end" used in Python strings (for
specifying a sub-sequence to be translated - discussed earlier on this bug).

Such an argument would be especially useful for translating a gene/CDS
sequence into protein, where making sure a non-standard start codon is
translated as "M" is non-trivial.
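The effect of "to_stop" and of the table choice in comment 26 can be reproduced without Biopython. What follows is only a minimal sketch, not the Bio.Seq code: `toy_translate` and `build_table` are hypothetical names, the two amino acid strings are typed in from the NCBI genetic code listings for tables 1 and 2, table names such as "Vertebrate Mitochondrial" are not handled, and alphabets are ignored entirely.

```python
from itertools import product

BASES = "TCAG"

# Amino acids in NCBI order: codons enumerated as TTT, TTC, TTA, TTG, TCT, ...
STANDARD_AAS = (
    "FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR" "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG"
)
# Table 2 differs from table 1 at TGA (W), ATA (M) and AGA/AGG (stop).
VERT_MITO_AAS = (
    "FFLLSSSSYY**CCWW" "LLLLPPPPHHQQRRRR" "IIMMTTTTNNKKSS**" "VVVVAAAADDEEGGGG"
)


def build_table(amino_acids):
    """Map each of the 64 codons to a one-letter amino acid code."""
    codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]
    return dict(zip(codons, amino_acids))


TABLES = {1: build_table(STANDARD_AAS), 2: build_table(VERT_MITO_AAS)}


def toy_translate(dna, table=1, to_stop=False):
    """Translate an unambiguous DNA string (hypothetical helper).

    With to_stop=True, translation ends at the first stop codon and the
    stop symbol itself is left out of the result.
    """
    codon_table = TABLES[table]
    protein = []
    # Ignore any trailing partial codon.
    for i in range(0, len(dna) - len(dna) % 3, 3):
        amino_acid = codon_table[dna[i:i + 3].upper()]
        if to_stop and amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)
```

For the sequence in the comment, TGA is a stop codon under table 1 but codes for tryptophan under table 2, which is why the table 2 translation reads through to the final TAG.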
From bugzilla-daemon at portal.open-bio.org Mon Nov 3 06:17:59 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 3 Nov 2008 06:17:59 -0500
Subject: [Biopython-dev] [Bug 2638] New: test_PopGen_SimCoal_nodepend.py fails on Windows do to newline issue
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2638

           Summary: test_PopGen_SimCoal_nodepend.py fails on Windows do to
                    newline issue
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Windows
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Unit Tests
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk


This unit test attempts to regenerate a plain text SimCoal file, and
currently fails on Windows (but passes on Linux and Mac OS X). Patch to
follow.


From bugzilla-daemon at portal.open-bio.org Mon Nov 3 06:22:16 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 3 Nov 2008 06:22:16 -0500
Subject: [Biopython-dev] [Bug 2638] test_PopGen_SimCoal_nodepend.py fails on Windows do to newline issue
In-Reply-To:
Message-ID: <200811031122.mA3BMGwX013481@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2638


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-03 06:22 EST -------
Created an attachment (id=1030)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1030&action=view)
Patch to the PopGen/SimCoal/Template.py and the unit test

Looking at the code, it writes \r\n rather than using \n to mean a
platform-aware new line. This does not always give a CR LF: on Windows the
\n is itself expanded, so you get CR CR LF instead.

Also, are the template files in CVS as plain text files or binary files? I
haven't double checked but I think they may be checked in as binary files
with DOS/Windows new lines...

I haven't committed this as I don't have SIMCOAL installed to check there
are no side effects.


From bugzilla-daemon at portal.open-bio.org Mon Nov 3 06:22:53 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 3 Nov 2008 06:22:53 -0500
Subject: [Biopython-dev] [Bug 2638] test_PopGen_SimCoal_nodepend.py fails on Windows, newline issue
In-Reply-To:
Message-ID: <200811031122.mA3BMr8B013540@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2638


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|test_PopGen_SimCoal_nodepend|test_PopGen_SimCoal_nodepend
                   |.py fails on Windows do to  |.py fails on Windows,
                   |newline issue               |newline issue


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-03 06:22 EST -------
Removed typo in the bug summary (title).
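The CR CR LF effect described in comment 1 of Bug 2638, and a newline-tolerant way to compare the two .par files, can be sketched as below. This is an illustration only, not the attached patch: it uses the `newline` argument of the modern Python 3 `open()` (not available in the Python versions discussed in this thread) to mimic Windows text mode on any platform, and `same_lines` plus the file names are made up for the example.

```python
import os
import tempfile


def same_lines(path_a, path_b):
    """Return True if two text files hold the same lines, whatever the newline style."""
    # newline=None enables universal newlines on reading, so "\n", "\r\n"
    # and "\r" all arrive as "\n" before the comparison.
    with open(path_a, newline=None) as handle_a, open(path_b, newline=None) as handle_b:
        return handle_a.read().splitlines() == handle_b.read().splitlines()


workdir = tempfile.mkdtemp()
unix_path = os.path.join(workdir, "unix.par")
dos_path = os.path.join(workdir, "dos.par")
doubled_path = os.path.join(workdir, "doubled.par")

# The same two lines, once with Unix and once with DOS/Windows line endings.
with open(unix_path, "wb") as handle:
    handle.write(b"line one\nline two\n")
with open(dos_path, "wb") as handle:
    handle.write(b"line one\r\nline two\r\n")

# newline="\r\n" mimics Windows text mode: every "\n" written becomes "\r\n",
# so an explicit "\r\n" in the data ends up on disk as CR CR LF.
with open(doubled_path, "w", newline="\r\n") as handle:
    handle.write("line one\r\n")
with open(doubled_path, "rb") as handle:
    doubled_bytes = handle.read()

# A size-based check sees two different files although the lines are identical.
sizes_differ = os.path.getsize(unix_path) != os.path.getsize(dos_path)
```

Comparing line lists rather than file sizes is one way a test could stay independent of the platform's newline convention.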
From biopython at maubp.freeserve.co.uk Mon Nov 3 06:48:06 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 3 Nov 2008 11:48:06 +0000
Subject: [Biopython-dev] New line issues in the source zip or tarballs
In-Reply-To: <320fb6e00809080514u5df6d9dej144c783076cbe467@mail.gmail.com>
References: <320fb6e00809060304h429f1085r301170aa93d4eb73@mail.gmail.com>
	<6d941f120809080442r1797666eu70e35c60353c5462@mail.gmail.com>
	<320fb6e00809080514u5df6d9dej144c783076cbe467@mail.gmail.com>
Message-ID: <320fb6e00811030348vb7b6068v549ebfab9f6ec76b@mail.gmail.com>

On Mon, Sep 8 Peter wrote:
> Tiago wrote:
>> Peter wrote:
>>> In the case of test_PopGen_SimCoal_nodepend.py the failure is
>>> expecting simple.par and simple_100_30.par to be exactly the same size
>>> (in class TemplateTest, line 47). This is not going to be true
>>> when the input file uses Unix new lines but the generated file uses
>>> Windows new lines. Perhaps using a simple bit of code to load the
>>> files line by line and compare them would work here?
>>
>> I am currently at a workshop (I belong to the organization committee, so I
>> don't have much time), but I will try to sort this in the next couple of
>> days.
>
> This new line issue has probably been there since Biopython 1.45
> without anyone else spotting it, so I don't see fixing it as urgent.
> Hopefully we can resolve this for the next release instead.

I've filed Bug 2638 on this with a possible patch. Could you take a
look at this please?

I just tried installing SIMCOAL2 on my Mac, but failed. To be fair,
they do only appear to support Linux and Windows...
Thanks

Peter

From biopython at maubp.freeserve.co.uk Mon Nov 3 07:43:22 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 3 Nov 2008 12:43:22 +0000
Subject: [Biopython-dev] Bio.PopGen and SIMCOAL2 installation
Message-ID: <320fb6e00811030443w4d620c83w64c83fdafb9afa96@mail.gmail.com>

Hi Tiago,

I've just compiled SIMCOAL2 on a Linux machine from
http://cmpg.unibe.ch/software/simcoal2/ (version 2.1.2). If anyone
else tries this, it required the use of -fpermissive on g++ 4.1.2 to
compile (and gave lots of deprecation warnings, plus some trivial ones
about header files which didn't end with a newline).

The makefile specifies the executable name as simcoal2_1_2, however
it does not include an install target, so it is up to the user where
to put the binary (e.g. I used ~/bin/ rather than system wide) and
perhaps what to call it. The provided pre-compiled binary is also
called simcoal2_1_2.

However, Bio.PopGen.SimCoal.Controller seems to assume the executable
will be called just simcoal2 (or simcoal2.exe on Windows), and thus
fails to detect a binary called simcoal2_1_2. The unit test however is
more flexible and looks for any binary on the path whose name starts
with simcoal2. Ideally these two should be consistent.

I can make test_PopGen_SimCoal.py pass by installing SIMCOAL2 as
simcoal2 rather than simcoal2_1_2, but is this a SIMCOAL2 installation
issue or a bug in Bio.PopGen.SimCoal.Controller? In my experience it
is not normal for a Linux tool to include the full version in the
executable name - using just simcoal2 does make more sense.
Thanks,

Peter

From bugzilla-daemon at portal.open-bio.org Mon Nov 3 12:16:41 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 3 Nov 2008 12:16:41 -0500
Subject: [Biopython-dev] [Bug 2639] New: SeqRecord.init doesn't check for arguments to their types
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2639

           Summary: SeqRecord.init doesn't check for arguments to their
                    types
           Product: Biopython
           Version: 1.47
          Platform: All
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P3
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: dalloliogm at gmail.com


SeqRecord doesn't check whether description is a string when creating
SeqRecord objects. This causes an error later, when you print the record in
a format like FASTA.

>>> from Bio.Seq import Seq
>>> from Bio.SeqRecord import SeqRecord
>>> sr = SeqRecord(Seq('aaa'), description = [1, 2, 3])    # should give an error here!
>>> print sr.fasta
: 'list' object has no attribute 'replace'

Looking at the SeqRecord.__init__ code, none of the arguments is checked for
its type. This is a minor bug, but if you want to solve it, you just have to
add some isinstance() checks in the init function.
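The isinstance() checks suggested in Bug 2639 could look roughly like the sketch below. It is not the SeqRecord code: `ToyRecord` and `to_fasta` are hypothetical stand-ins, raising TypeError is an assumption about how a failed check would be reported, and under the Python 2 releases discussed here the test would be against `basestring` rather than `str`.

```python
class ToyRecord(object):
    """A stripped-down stand-in for a sequence record (hypothetical)."""

    def __init__(self, seq, id="<unknown id>", description="<unknown description>"):
        # Fail at construction time, where the mistake was made, rather
        # than much later when the record is written out.
        for name, value in (("id", id), ("description", description)):
            if not isinstance(value, str):
                raise TypeError(
                    "%s should be a string, not %s" % (name, type(value).__name__)
                )
        self.seq = seq
        self.id = id
        self.description = description

    def to_fasta(self):
        """Format the record as FASTA; relies on description being a string."""
        title = "%s %s" % (self.id, self.description.replace("\n", " "))
        return ">%s\n%s\n" % (title.strip(), self.seq)
```

With a check like this, passing `description=[1, 2, 3]` is rejected when the record is created, rather than surfacing as the "'list' object has no attribute 'replace'" message at output time.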
From bugzilla-daemon at portal.open-bio.org Mon Nov 3 13:47:59 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 3 Nov 2008 13:47:59 -0500 Subject: [Biopython-dev] [Bug 2639] SeqRecord.init doesn't check for arguments to their types In-Reply-To: Message-ID: <200811031847.mA3IlxuE025247@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2639 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-03 13:47 EST ------- Fixed in CVS, although there is a small chance this will break existing scripts which relied on the old lax behaviour. Peter P.S. Assuming you are using an unmodified Biopython, the last line of your example wouldn't work: >>> print sr.fasta Try: >>> print sr.format("fasta") -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Nov 3 14:33:39 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 3 Nov 2008 14:33:39 -0500 Subject: [Biopython-dev] [Bug 2629] Updated Bio.NaiveBayes to listfns import In-Reply-To: Message-ID: <200811031933.mA3JXdcZ028123@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2629 ------- Comment #4 from bsouthey at gmail.com 2008-11-03 14:33 EST ------- (In reply to comment #3) > I committed part of this patch to CVS; see NaiveBayes.py revision 1.9. > Could you check your classify function? It seems to contain some debugging > statements. Also, do we need the classifyprob function? > If you send in a new version of this code, please attach it as a patch to the > current version of NaiveBayes.py in CVS. > Thanks! 
> Yes, there is a print statement at the end of the 'classify' function (line 125 of attached file) that should be removed (as with any print statements that are commented out). These were to check that the values were the same as the original code. The classifyprob function can be dropped with not problems. I just wanted to return the probability but I also recognize that it is not very useful. I noticed you are using set (line 145 in the new cvs file) which is not compatible with Python2.3. How should this be addressed? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From tiagoantao at gmail.com Mon Nov 3 14:34:36 2008 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Mon, 3 Nov 2008 19:34:36 +0000 Subject: [Biopython-dev] Bio.PopGen and SIMCOAL2 installation In-Reply-To: <320fb6e00811030443w4d620c83w64c83fdafb9afa96@mail.gmail.com> References: <320fb6e00811030443w4d620c83w64c83fdafb9afa96@mail.gmail.com> Message-ID: <6d941f120811031134p4c0f1756k5ded879de7555dad@mail.gmail.com> Hi, On Mon, Nov 3, 2008 at 12:43 PM, Peter wrote: > However, Bio.PopGen.SimCoal.Controller seems to assume the executable > will be called just simcoal2 (or simcoal2.exe on Windows), and thus > fails detect a binary called simcoal2_1_2. The unit test however is > more flexible and looks for any binary on the path whose name starts > with simcoal2. Ideally these two should be consistent. I am aware of this, in fact, this issue is documented in the tutorial (9.5.2.2). The idea is that the binary should be called simcoal2 as documented. This can be changed of course. My preference would be to change just the test code. Is this ok with you? > I can make test_PopGen_SimCoal.py pass by installing SIMCOAL2 as > simcoal2 rather than simcoal2_1_2, but is this a SIMCOAL2 installation > issue or a bug in Bio.PopGen.SimCoal.Controller? 
In my experience it
> is not normal for a Linux tool to include the full version in the
> executable name - using just simcoal2 does make more sense.

Agree. And, again, this is documented in the tutorial. I can go ahead
and change the test code (please just confirm).

From tiagoantao at gmail.com Mon Nov 3 14:56:05 2008
From: tiagoantao at gmail.com (Tiago Antão)
Date: Mon, 3 Nov 2008 19:56:05 +0000
Subject: [Biopython-dev] Statistics in population genetics module - Part I
In-Reply-To: <5aa3b3570811030736g7d7a0893x759777252c8d1828@mail.gmail.com>
References: <6d941f120810301658wec8678ald332abb8ddbdf80d@mail.gmail.com>
	<5aa3b3570811030736g7d7a0893x759777252c8d1828@mail.gmail.com>
Message-ID: <6d941f120811031156s2f634c1aq4252b17308ecf24a@mail.gmail.com>

Hi,

On Mon, Nov 3, 2008 at 3:36 PM, Giovanni Marco Dall'Olio wrote:
> For how much time do you think a biopython module should be kept compatible
> with older versions, more or less?

That is an interesting discussion. My view is that biopython is fairly
conservative in that regard. I am not saying that I agree/disagree.
There seems to be a certain policy in place, and I respect it. But the
point is: Bio.PopGen has to have the same policy as the rest.

> It will take a long time to develop the module, and it is sure that we will
> make some mistakes. So, what is the best way to proceed? What if we create a

I will try to offer my view about this as soon as possible (in the next days).

> At the moment I am working with a separated git repository for all the
> popgen modules. The problem is that I didn't include all biopython modules
> in the repository, so, if any of my changes breaks something in biopython, I
> won't know it until I'll merge everything with biopython code.

It probably won't break anything as long as you don't change existing
code. If you are only doing your parser I suppose it will be very
easily accepted (don't forget test cases and documentation).
Regarding Statistics we need to discuss it.

> p.s. When python3000 will be released, it will be probably necessary to
> rewrite large portions of biopython, if not creating a 'biopython 2' version
> (I think they were discussing something like this in bioperl's list).

Peter's and Michiel's opinions on this topic are fundamental (they do
most of the work maintaining biopython). But I suppose retro
compatibility is a must.

> I thought that maybe, even if we make some 'mistakes' in this version of
> biopython, we will be able to fix them in a later version.

Mistakes should not break existing code. That is really something we
should try to avoid.

> I think that a good idea would be starting collecting use cases to have an
> idea how many things we'll have to implement in this module.

This might sound elitist, but most people doing population genetics
don't really have any idea of what they should expect from software.
While for the "business of sequences and alignment" there is a large,
mature software community, the same doesn't happen in population
genetics. Or to put it another way: you don't want to imagine the
type of questions that arrive in my private mailbox ;) .

> I sent that mail to the Open::Bio::I last week, but still haven't received
> many replies... I will send a message to the various Bio.* mailing list in
> the next days.

OBF, in my view, is a bit slow and bureaucratic. Anyway, I think that
anybody's views will get more importance in proportion to the quantity
of code submitted and time devoted to maintenance of the whole thing.
Tiago

From bugzilla-daemon at portal.open-bio.org Mon Nov 3 17:58:11 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 3 Nov 2008 17:58:11 -0500
Subject: [Biopython-dev] [Bug 2629] Updated Bio.NaiveBayes to listfns import
In-Reply-To:
Message-ID: <200811032258.mA3MwBoH008744@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2629


------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-03 17:58 EST -------
(In reply to comment #4)
>
> I noticed you are using set (line 145 in the new cvs file) which is not
> compatible with Python2.3. How should this be addressed?
>

I've been using something like this elsewhere in Biopython:

#TODO - Remove this work around once we drop python 2.3 support
try:
    set
except NameError:
    from sets import Set as set

Peter


From biopython at maubp.freeserve.co.uk Mon Nov 3 18:08:44 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 3 Nov 2008 23:08:44 +0000
Subject: [Biopython-dev] Bio.PopGen and SIMCOAL2 installation
In-Reply-To: <6d941f120811031134p4c0f1756k5ded879de7555dad@mail.gmail.com>
References: <320fb6e00811030443w4d620c83w64c83fdafb9afa96@mail.gmail.com>
	<6d941f120811031134p4c0f1756k5ded879de7555dad@mail.gmail.com>
Message-ID: <320fb6e00811031508xfef548dm1a0673b7dba70567@mail.gmail.com>

On Mon, Nov 3, 2008 at 7:34 PM, Tiago Antão wrote:
> Hi,
>
> On Mon, Nov 3, 2008 at 12:43 PM, Peter wrote:
>> However, Bio.PopGen.SimCoal.Controller seems to assume the executable
>> will be called just simcoal2 (or simcoal2.exe on Windows), and thus
>> fails detect a binary called simcoal2_1_2. The unit test however is
>> more flexible and looks for any binary on the path whose name starts
>> with simcoal2. Ideally these two should be consistent.
> > I am aware of this, in fact, this issue is documented in the tutorial > (9.5.2.2). The idea is that the binary should be called simcoal2 as > documented. This can be changed of course. My preference would be to > change just the test code. Is this ok with you? > >> I can make test_PopGen_SimCoal.py pass by installing SIMCOAL2 as >> simcoal2 rather than simcoal2_1_2, but is this a SIMCOAL2 installation >> issue or a bug in Bio.PopGen.SimCoal.Controller? In my experience it >> is not normal for a Linux tool to include the full version in the >> executable name - using just simcoal2 does make more sense. > > Agree. And, again, this is documented in the tutorial. I can go ahead > and change the test code (please just confirm). I had skimmed over the tutorial, but missed this bit - sorry. Hopefully anyone interested in using SIMCOAL would have read this more carefully, but perhaps it could be made more prominent? e.g. try to include a few more keywords like install/installation and executable as well as binary (which I did not think to search for at the time). Let's just change test_PopGen_SimCoal.py to look for simcoal2 (or simcoal2.exe on Windows) so it is consistent with Bio.PopGen.SimCoal.Controller, and I would also mention what the binary should be called in the SimCoalController __init__ docstring. Peter From bugzilla-daemon at portal.open-bio.org Tue Nov 4 04:31:19 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 4 Nov 2008 04:31:19 -0500 Subject: [Biopython-dev] [Bug 2639] SeqRecord.init doesn't check for arguments to their types In-Reply-To: Message-ID: <200811040931.mA49VJOT019957@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2639 ------- Comment #2 from dalloliogm at gmail.com 2008-11-04 04:31 EST ------- I have tested the cvs code, it seems to work. Maybe you can allow ids to be integers, also. 
If you are afraid of causing problems to older scripts, you could str() the arguments if they are not strings. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Nov 4 04:39:18 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 4 Nov 2008 04:39:18 -0500 Subject: [Biopython-dev] [Bug 2443] Specifying the alphabet in Bio.SeqIO and Bio.AlignIO In-Reply-To: Message-ID: <200811040939.mA49dIQ9021075@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2443 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-04 04:39 EST ------- Marking as fixed - unit tests updated, and the new argument is mentioned in the tutorial as well. A more extensive example would be nice, perhaps using Bio.AlignIO with the Bio.Align.AlignInfo module... -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Nov 4 05:06:40 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 4 Nov 2008 05:06:40 -0500 Subject: [Biopython-dev] [Bug 2628] Have Bio.SeqIO.write(...) and Bio.AlignIO.write(...) 
return number of records In-Reply-To: Message-ID: <200811041006.mA4A6eAt024777@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2628 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-04 05:06 EST ------- Patch checked in, marking as fixed. Checking in Bio/SeqIO/Interfaces.py; /home/repository/biopython/biopython/Bio/SeqIO/Interfaces.py,v <-- Interfaces.py new revision: 1.11; previous revision: 1.10 done Checking in Bio/SeqIO/__init__.py; /home/repository/biopython/biopython/Bio/SeqIO/__init__.py,v <-- __init__.py new revision: 1.44; previous revision: 1.43 done Checking in Bio/AlignIO/Interfaces.py; /home/repository/biopython/biopython/Bio/AlignIO/Interfaces.py,v <-- Interfaces.py new revision: 1.7; previous revision: 1.6 done Checking in Bio/AlignIO/NexusIO.py; /home/repository/biopython/biopython/Bio/AlignIO/NexusIO.py,v <-- NexusIO.py new revision: 1.7; previous revision: 1.6 done Checking in Bio/AlignIO/__init__.py; /home/repository/biopython/biopython/Bio/AlignIO/__init__.py,v <-- __init__.py new revision: 1.19; previous revision: 1.18 done Checking in Tests/test_SeqIO.py; /home/repository/biopython/biopython/Tests/test_SeqIO.py,v <-- test_SeqIO.py new revision: 1.44; previous revision: 1.43 done Checking in Tests/test_AlignIO.py; /home/repository/biopython/biopython/Tests/test_AlignIO.py,v <-- test_AlignIO.py new revision: 1.17; previous revision: 1.16 done Checking in Tutorial.tex; /home/repository/biopython/biopython/Doc/Tutorial.tex,v <-- Tutorial.tex new revision: 1.183; previous revision: 1.182 done -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
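Bug 2628 asks for the write functions to return the number of records written. The idea can be shown with a stand-alone sketch; `write_fasta` below is a hypothetical helper working on plain (title, sequence) pairs, not Bio.SeqIO.write, whose actual signature and implementation are in the files listed in the commit log above.

```python
import io


def write_fasta(records, handle):
    """Write (title, sequence) pairs as FASTA and return how many were written."""
    count = 0
    for title, sequence in records:
        handle.write(">%s\n%s\n" % (title, sequence))
        count += 1
    return count


# An iterator has no len(), so the return value is the only cheap way for
# the caller to learn how many records went out.
handle = io.StringIO()
written = write_fasta(iter([("alpha", "ACGT"), ("beta", "GGCC")]), handle)
```

Returning the count is mostly a convenience when the records come from a generator or a filter, where counting them beforehand would consume the input.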
From bugzilla-daemon at portal.open-bio.org Tue Nov 4 05:51:23 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 4 Nov 2008 05:51:23 -0500
Subject: [Biopython-dev] [Bug 2640] New: Proposal: doctest for SeqRecord/biopython
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2640

           Summary: Proposal: doctest for SeqRecord/biopython
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P3
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: dalloliogm at gmail.com


I would like to propose using doctest tests in biopython.
I have found them very useful for understanding how a script should be used,
and they can also act as unit tests.

Here is the main documentation for doctest:
- http://www.python.org/doc/2.5.2/lib/module-doctest.html

Usually, you add a _test() function to every module, which calls the doctest
library, and run it when __name__ == '__main__'.
The most significant example is added to the documentation string of every
module/function and tested with doctest.testmod(); later, you add more tests
in a separate file and run them with doctest.testfile().
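The `_test()` convention described in Bug 2640 can be sketched as follows. The function `reverse_complement` is a made-up example rather than Biopython code, and the sketch is written for current Python 3 (it uses `str.maketrans`); only `doctest.testmod()` and the `__name__ == "__main__"` idiom come from the proposal itself.

```python
import doctest


def reverse_complement(dna):
    """Return the reverse complement of an unambiguous DNA string.

    The examples below double as documentation and as tests:

    >>> reverse_complement("ATGC")
    'GCAT'
    >>> reverse_complement("")
    ''
    """
    complement = str.maketrans("ACGT", "TGCA")
    return dna.translate(complement)[::-1]


def _test():
    """Run every docstring example in this module and report failures."""
    return doctest.testmod()


if __name__ == "__main__":
    _test()
```

Running the file directly executes the docstring examples and prints nothing when they all pass; `python file.py -v` lists each example as it is checked.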
From bugzilla-daemon at portal.open-bio.org Tue Nov 4 05:52:21 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 4 Nov 2008 05:52:21 -0500
Subject: [Biopython-dev] [Bug 2640] Proposal: doctest for SeqRecord/biopython
In-Reply-To:
Message-ID: <200811041052.mA4AqLGX028185@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2640


------- Comment #1 from dalloliogm at gmail.com 2008-11-04 05:52 EST -------
Created an attachment (id=1031)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1031&action=view)
patch to add doctest to SeqRecord.py

Here is a patch to add doctest documentation to Bio/SeqRecord.py


From bugzilla-daemon at portal.open-bio.org Tue Nov 4 06:23:12 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 4 Nov 2008 06:23:12 -0500
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq)
In-Reply-To:
Message-ID: <200811041123.mA4BNCQ0030388@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381


------- Comment #27 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-04 06:23 EST -------
Created an attachment (id=1032)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1032&action=view)
Patch to Bio/Seq.py to add start codon handling to translation

Patch adds a new boolean argument to the translate method and function,
called "init" (rather than my earlier suggestions like "from_start" or
"check_start" which could be considered misleading).

Docstring:

init - Boolean, defaults to False.  Should translation check the first
       codon is a valid initiation (start) codon and translate it as
       methionine (M)?  If False, nothing special is done with the first
       codon.

Example usage of the translate function,

>>> from Bio.Seq import translate
>>> translate("TTGAAACCCTAG")
'LKP*'
>>> translate("TTGAAACCCTAG", init=True, to_stop=True)
'MKP'
>>> translate("TTGAAACCCTAG", init=True)
'MKP*'
>>> translate("TTGAAACCCTAG", to_stop=True)
'LKP'

Using the Seq method,

>>> from Bio.Seq import Seq
>>> my_seq = Seq("TTGAAACCCTAG")
>>> my_seq.translate()
Seq('LKP*', HasStopCodon(ExtendedIUPACProtein(), '*'))
>>> my_seq.translate(init=True, to_stop=True)
Seq('MKP', ExtendedIUPACProtein())
>>> my_seq.translate(init=True)
Seq('MKP*', HasStopCodon(ExtendedIUPACProtein(), '*'))
>>> my_seq.translate(to_stop=True)
Seq('LKP', ExtendedIUPACProtein())

Comments please.


From bugzilla-daemon at portal.open-bio.org Tue Nov 4 06:23:39 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 4 Nov 2008 06:23:39 -0500
Subject: [Biopython-dev] [Bug 2640] Proposal: doctest for SeqRecord/biopython
In-Reply-To:
Message-ID: <200811041123.mA4BNdAS030439@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2640


dalloliogm at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1031 is|0                           |1
           obsolete|                            |


------- Comment #2 from dalloliogm at gmail.com 2008-11-04 06:23 EST -------
Created an attachment (id=1033)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1033&action=view)
patch to add doctest to SeqRecord.py

This patch is maybe clearer than the previous one - it adds an example on
adding annotations to a SeqRecord.
From biopython at maubp.freeserve.co.uk Tue Nov 4 06:36:50 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 4 Nov 2008 11:36:50 +0000 Subject: [Biopython-dev] Preparing for Biopython 1.49 (beta) Message-ID: <320fb6e00811040336k12a834b9o2fa103b8fabf7ec1@mail.gmail.com> Dear all, The Numeric to numpy migration is done now, and we are also looking good for python 2.6. After a little off list discussion, its probably time to prepare the next release. However, given the number of changes, and therefore the higher risk that we've broken something, we'll call this a beta release. Are there any bugs or issues people think should block this release? I would like to check in my initiation/start codon argument patch for translation (see Bug 2381), but would like a little discussion on this first (in particular the argument naming). I'd like to try and do the Biopython 1.49 "beta" release at the end of this week (with a follow up Biopython 1.49 "final" release say one week later if needed to deal with any issues from the beta). If this schedule is realistic, then Tiago should be OK to add his next set of PopGen code in about two weeks time (for what would become Biopython 1.50). Peter From bugzilla-daemon at portal.open-bio.org Tue Nov 4 06:48:53 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 4 Nov 2008 06:48:53 -0500 Subject: [Biopython-dev] [Bug 2640] Proposal: doctest for SeqRecord/biopython In-Reply-To: Message-ID: <200811041148.mA4Bmrag032109@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2640 ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-04 06:48 EST ------- I think we would need to integrate this into the existing test framework so that any new doctests are actually used. For an example of this on a module by module basis, see test_Wise.py and test_psw.py (although these don't interact well with our test framework on Python 2.3, see bug 2613). 
If a large number of Biopython modules have doctests then a more automated system could be designed (searching all non-deprecated modules for doctests). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Nov 4 07:04:54 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 4 Nov 2008 07:04:54 -0500 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200811041204.mA4C4sHS000823@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 ------- Comment #28 from dalloliogm at gmail.com 2008-11-04 07:04 EST ------- (In reply to comment #27) > Created an attachment (id=1032) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1032&action=view) [details] > Patch to Bio/Seq.py to add start codon handling to translation > > Patch adds a new boolean argument to the translate method and function, called > "init" (rather than my earlier suggestions like "from_start" or "check_start" > which could be considered misleading). > > Docstring: > > init - Boolean, defaults to False. Should translation check the > first codon is a valid initiation (start) codon and translate > it as methionine (M)? If False, nothing special is done with > the first codon. I don't like the name 'init' :( it would be better to use an argument with the word 'force' in it. E.g.: force_has_coding, force_first_position, etc.. If you didn't have read this discussion in this bug report, it is not very clear what happens when init=True and why. You should add a description of why there is this options in the docstring. 
> > Example usage of the translate function, > > >>> from Bio.Seq import translate > >>> translate("TTGAAACCCTAG") > 'LKP*' > >>> translate("TTGAAACCCTAG", init=True, to_stop=True) > 'MKP' Without having read the discussion in this bug report, I was expecting an exception here.. why does it forces a Methionine to be in the first position? It loses the information of a Leu in the first position. > >>> translate("TTGAAACCCTAG", init=True) > 'MKP*' > >>> translate("TTGAAACCCTAG", to_stop=True) > 'LKP' > You could add a check for non coding aminoacids: >>> translate("UAACAGTGCAT") ExceptionError: Non coding aminoacid in the first position -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Nov 4 07:28:56 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 4 Nov 2008 07:28:56 -0500 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200811041228.mA4CSuvT002892@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 ------- Comment #29 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-04 07:28 EST ------- (In reply to comment #28) > (In reply to comment #27) > > Created an attachment (id=1032) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1032&action=view) [details] [details] > > Docstring: > > > > init - Boolean, defaults to False. Should translation check the > > first codon is a valid initiation (start) codon and translate > > it as methionine (M)? If False, nothing special is done with > > the first codon. > > I don't like the name 'init' :( it would be better to use an argument with the > word 'force' in it. E.g.: force_has_coding, force_first_position, etc.. 
Maybe - but I don't think force_has_coding, force_first_position are any clearer, and they are very long. Do you like "with_start_codon" or "with_init_codon"? Note that I used "init" rather than "initiation (codon)" because Python already uses init as shorthand for initiation/initialisation. > If you didn't have read this discussion in this bug report, it is not very > clear what happens when init=True and why. If it had been called "start" or "from_start" or "start_codon" the meaning wouldn't be clear either - you might expect "start" or "from_start" to take an integer location, and "start_codon" to take a three-letter string. > You should add a description of why there is this options in the docstring. OK - That makes sense. > > > > Example usage of the translate function, > > > > >>> from Bio.Seq import translate > > >>> translate("TTGAAACCCTAG") > > 'LKP*' > > >>> translate("TTGAAACCCTAG", init=True, to_stop=True) > > 'MKP' > > Without having read the discussion in this bug report, I was expecting an > exception here.. why does it forces a Methionine to be in the first position? > It loses the information of a Leu in the first position. Because if this was a CDS using an alternative start codon of TTG it would be translated as a methionine and NOT as a leucine (because instead of a typical tRNA-Leu, an initiation tRNA is used). This is the whole point of this optional argument. If you want TTG translated blindly as L, don't use the init argument (or set it to False). See also http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi which explicitly lists these alternative codons as giving M when used as starts, e.g. 
  AAs    = FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
  Starts = ---M---------------M---------------M----------------------------
  Base1  = TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
  Base2  = TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
  Base3  = TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Nov 4 08:41:51 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 4 Nov 2008 08:41:51 -0500 Subject: [Biopython-dev] [Bug 2629] Updated Bio.NaiveBayes to listfns import In-Reply-To: Message-ID: <200811041341.mA4DfpYD009210@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2629 ------- Comment #6 from mdehoon at ims.u-tokyo.ac.jp 2008-11-04 08:41 EST ------- I've committed Peter's fix for the set import to CVS. About the replacement for listfns.contents in the modified NaiveBayes code: Did you do any timings to compare the new code to the old code? Since listfns.contents is implemented in C, it may be (much) faster than the replacement code. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
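The timing comparison asked for in comment #6 on Bug 2629 could be done with the standard timeit module, roughly as sketched here. Everything in this block is an assumption for illustration: neither function is the C implementation of listfns.contents or the code from the patch, and the sketch assumes that contents maps each distinct item to its fraction of the list. The helper best_time is a hypothetical name.

```python
import timeit
from collections import Counter


def contents_loop(items):
    """Map each distinct item to its fraction of the list (plain dict loop)."""
    counts = {}
    for item in items:
        counts[item] = counts.get(item, 0) + 1
    total = float(len(items))
    return dict((item, n / total) for item, n in counts.items())


def contents_counter(items):
    """Same result as contents_loop, built on collections.Counter."""
    total = float(len(items))
    return dict((item, n / total) for item, n in Counter(items).items())


def best_time(func, data, number=200, repeat=3):
    """Best total time in seconds for `number` calls of func(data)."""
    timer = timeit.Timer(lambda: func(data))
    return min(timer.repeat(repeat=repeat, number=number))


if __name__ == "__main__":
    data = list("ACGT" * 250)
    for func in (contents_loop, contents_counter):
        print("%s: %.4f s" % (func.__name__, best_time(func, data)))
```

Taking the minimum over several repeats reduces noise from other processes; to compare against a compiled implementation, the same best_time call can be pointed at that function instead.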
From bugzilla-daemon at portal.open-bio.org Tue Nov 4 08:57:48 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 4 Nov 2008 08:57:48 -0500 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200811041357.mA4Dvm2B010202@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 ------- Comment #30 from lpritc at scri.sari.ac.uk 2008-11-04 08:57 EST ------- (In reply to comment #29) > (In reply to comment #28) > > (In reply to comment #27) > > > Created an attachment (id=1032) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1032&action=view) [details] [details] [details] > > > Docstring: > > > > > > init - Boolean, defaults to False. Should translation check the > > > first codon is a valid initiation (start) codon and translate > > > it as methionine (M)? If False, nothing special is done with > > > the first codon. > > > > I don't like the name 'init' :( it would be better to use an argument with the > > word 'force' in it. E.g.: force_has_coding, force_first_position, etc.. > > Maybe - but I don't think force_has_coding, force_first_position are any > clearer, and they are very long. Do you like "with_start_codon" or > "with_init_codon"? I think that there are two key things that are going on as a result of this setting being True: 1) The first codon (starting at position 0) of the nucleotide sequence is being checked as a valid initiation codon 2) If it is such a valid codon, the translated aa is Met (because this is what happens biologically). It's quite a complicated concept, and if we wanted to be completely explicit, an option called 'assert_first_codon_is_initiation_and_translate_to_met' would be clear, but would be far too long to be sensible. 
Most other shorter options are either ambiguous, misleading, or ambiguously misleading - largely because people will assume that the term means what they want it to mean instead of what it does, as described below: > If it have been called "start" or "from_start" or "start_codon" the meaning > isn't clear either - you might "start" or expect "from_start" to take an > integer location, and start_codon to take a three letter string. I am not too worried about long arguments, so 'assert_first_codon_init' would be fine for me (though does this mean that the first codon of the sequence should be an initiation codon, or that translation should start from the first initiation codon?), but I see the drive for, and value of, brevity. If there's a short, unambiguous option name that you can think of, I'm all for it. An option name that is a little cryptic, but not misleading, such as 'init', also works for me. I would have to go to the minor effort of typing help(seq.translate) to find out what it meant, but it's not very much of a chore. Also, people learn all kinds of non-standard uses for cryptic terms, all the time. For example, what on earth does 'popen3' mean? Why not open_pipes_with_stdin_stdout_stderr? 'popen3' is short, unambiguous (if not immediately obvious), and if you want to know what it means, then help() or a dip in the documentation will tell you. I think the same will be true of 'init', so long as no-one is likely to confuse it with some other meaning. L. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
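The two behaviours listed in comment #30 (check that the first codon is a valid initiation codon, then translate it as Met) can be sketched in plain Python. This is an illustration only, not the code in attachment 1032: the name translate_sketch is hypothetical, raising ValueError for an invalid first codon is an assumption, and only the standard table is covered, using the AAs string and the three M positions of the Starts line from the NCBI page cited in comment #29.

```python
from itertools import product

# Standard genetic code in NCBI order (Base1, Base2, Base3 each cycle T, C, A, G).
AAS = ("FFLLSSSSYY**CC*W"
       "LLLLPPPPHHQQRRRR"
       "IIIMTTTTNNKKSSRR"
       "VVVVAAAADDEEGGGG")
CODONS = ["".join(bases) for bases in product("TCAG", repeat=3)]
TABLE = dict(zip(CODONS, AAS))

# Codons marked M on the Starts line of the standard table.
START_CODONS = set(["TTG", "CTG", "ATG"])


def translate_sketch(seq, init=False, to_stop=False):
    """Translate a nucleotide string; optionally treat codon 1 as a start."""
    seq = seq.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    protein = []
    for pos in range(0, usable, 3):
        codon = seq[pos:pos + 3]
        if pos == 0 and init:
            if codon not in START_CODONS:
                # Assumption: an invalid start codon is reported as an error.
                raise ValueError("%s is not a valid initiation codon" % codon)
            amino_acid = "M"  # an initiator tRNA gives Met, even for TTG or CTG
        else:
            amino_acid = TABLE[codon]
        if amino_acid == "*" and to_stop:
            break
        protein.append(amino_acid)
    return "".join(protein)
```

With the sequence from comment #27, translate_sketch("TTGAAACCCTAG") gives 'LKP*' and translate_sketch("TTGAAACCCTAG", init=True, to_stop=True) gives 'MKP', matching the examples in that comment.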
From bugzilla-daemon at portal.open-bio.org Tue Nov 4 08:58:21 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 4 Nov 2008 08:58:21 -0500 Subject: [Biopython-dev] [Bug 2640] Proposal: doctest for SeqRecord/biopython In-Reply-To: Message-ID: <200811041358.mA4DwLiK010266@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2640 ------- Comment #4 from dalloliogm at gmail.com 2008-11-04 08:58 EST ------- (In reply to comment #3) > I think we would need to integrate this into the existing test framework so > that any new doctests are actually used. For an example of this on a module by > module basis, see test_Wise.py and test_psw.py (although these don't interact > well with our test framework on Python 2.3, see bug 2613). > > If a large number of Biopython modules have doctests then a more automated > system could be designed (searching all non-deprecated modules for doctests). > If you think it would be useful, I can write other doctests for other modules in the following days. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
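One way to make doctests part of a unittest-based run, as asked for in comment #3, is the standard library's doctest.DocTestSuite, which wraps a module's docstring examples as unittest test cases. The sketch below assumes nothing about Biopython's own test runner: build_doctest_suite, make_toy_module and run_toy_suite are hypothetical names, and the toy module stands in for a real module such as Bio.SeqRecord.

```python
import doctest
import types
import unittest


def build_doctest_suite(modules):
    """Collect the doctests of every module in `modules` into one suite."""
    suite = unittest.TestSuite()
    for module in modules:
        # Note: older Python versions raise ValueError here for a module
        # that contains no doctests at all.
        suite.addTests(doctest.DocTestSuite(module))
    return suite


def make_toy_module():
    """Build a throwaway module with one doctest (stand-in for a Bio module)."""
    module = types.ModuleType("toy_seq")
    source = '''
def gc_count(seq):
    """Count G and C letters in a sequence string.

    >>> gc_count("ATGC")
    2
    """
    return seq.count("G") + seq.count("C")
'''
    exec(source, module.__dict__)
    return module


def run_toy_suite():
    """Run the collected doctests and return the unittest result object."""
    suite = build_doctest_suite([make_toy_module()])
    result = unittest.TestResult()
    suite.run(result)
    return result
```

In a real setup the list passed to build_doctest_suite would be the imported modules that carry doctests, and the suite would be handed to whatever runner the existing tests already use.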
From bugzilla-daemon at portal.open-bio.org Tue Nov 4 09:44:15 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 4 Nov 2008 09:44:15 -0500 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200811041444.mA4EiFv5013693@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 ------- Comment #31 from dalloliogm at gmail.com 2008-11-04 09:44 EST ------- (In reply to comment #30) > (In reply to comment #29) > > (In reply to comment #28) > > > (In reply to comment #27) > > > > Created an attachment (id=1032) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1032&action=view) [details] [details] [details] [details] > > It's quite a complicated concept, and if we wanted to be completely explicit, > an option called 'assert_first_codon_is_initiation_and_translate_to_met' would > be clear, but would be far too long to be sensible. Most other shorter options > are either ambiguous, misleading, or ambiguously misleading - largely because > people will assume that the term means what they want it to mean instead of > what it does, as described below: > > > If it have been called "start" or "from_start" or "start_codon" the meaning > > isn't clear either - you might "start" or expect "from_start" to take an > > integer location, and start_codon to take a three letter string. > > I am not too worried about long arguments, so 'assert_first_codon_init' would > be fine for me (though does this mean that the first codon of the sequence > should be an initiation codon, or that translation should start from the first > initiation codon?), but I see the drive for, and value of, brevity. If there's > a short, unambiguous option name that you can think of, I'm all for it. An > option name that is a little cryptic, but not misleading, such as 'init', also > works for me. 
When I saw 'init' for the first time, I thought it was some kind of complicated calculation associated with the translate function, which init=False was meant to skip in order to get a faster but less accurate translation. > I would have to go to the minor effort of typing > help(seq.translate) to find out what it meant, but it's not very much of a > chore. It is also a matter of code readability; I don't think many people would understand what init is meant for just by looking at a script. If I use this option in one of my scripts, and a colleague reads it, I want to be sure that he will easily understand that I am forcing the first position to be a Methionine. Otherwise, the risk is that he won't properly understand my results. In which of these examples do you understand that the first position is being forced to a Methionine?
>>> translate("TTGAAACCCTAG", init=True, to_stop=True)
>>> translate("TTGAAACCCTAG", force_as_translating=True, to_stop=True)
>>> translate("TTGAAACCCTAG", force_methionine=True, to_stop=True)
>>> translate("TTGAAACCCTAG", force_methionine=True, force_stop=True)
>>> translate("TTGAAACCCTAG", alt_start=True, alt_stop=True)
Also, I don't think this option will be used very often. So, it shouldn't be a problem if its name is long to type, and it would be better if it is easy to understand. > > Also, people learn all kinds of non-standard uses for cryptic terms, all the > time. For example, what on earth does 'popen3' mean? Why not > open_pipes_with_stdin_stdout_stderr? 'popen3' is short, unambiguous (if not > immediately obvious), When I was a Python newbie, I really hated the name popen3 :) > and if you want to know what it means, then help() or a > dip in the documentation will tell you. I think the same will be true of > 'init', so long as no-one is likely to confuse it with some other meaning. > > L. 
> -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Nov 4 09:45:17 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 4 Nov 2008 09:45:17 -0500 Subject: [Biopython-dev] [Bug 2640] Proposal: doctest for SeqRecord/biopython In-Reply-To: Message-ID: <200811041445.mA4EjHg4013777@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2640 ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-04 09:45 EST ------- (In reply to comment #4) > > If you think it would be useful, I can write other doctests for other modules > in the following days. > I think adding more doctests would be useful, but they MUST get run by our existing test suite. Otherwise they'll just be human readable documentation (which is still nice) but will not get regularly validated. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Nov 4 10:39:42 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 4 Nov 2008 10:39:42 -0500 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200811041539.mA4Fdgc8017798@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 ------- Comment #32 from lpritc at scri.sari.ac.uk 2008-11-04 10:39 EST ------- (In reply to comment #31) > (In reply to comment #30) > It is also a matter of code readibility; I don't think many people would > understand that init is meant for that by looking at a script. 
True enough, but if someone's already used it, and you don't know what it means when reading their script, looking it up isn't hard. What's hard is guessing which option you need to invoke, and calling help() is one way to do that, too... Not that I want to extend this argument to single-letter options with *no* relevance to their intent ;) seq.translate(a=True, b='GUG', c=9) > If I use this option in one of my scripts, and a colleague reads it, I want to > be sure that he will be easily understand that I am forcing the first position > to be a Methionine. > Otherwise, the risk is that he won't understand properly my results. Maybe put it in a comment-line? Even if the colleague understands from the code *that* you've translated an alternative start to a methionine, they may not understand *why* - and the comment line is essential, then. > In which of these examples do you understand that the first position is being > forced to a Methionine? None are particularly clear, but only one of them doesn't give me the wrong idea... > >>> translate("TTGAAACCCTAG", init=True, to_stop=True) Because I've read this thread (or looked at the docs) - I understand this one ;) > >>> translate("TTGAAACCCTAG", force_as_translating=True, to_stop=True) I don't intuitively understand this. Does it mean that the sequence should be translatable? > >>> translate("TTGAAACCCTAG", force_methionine=True, to_stop=True) Does this mean that the sequence will be translated from the first methionine the method finds? > >>> translate("TTGAAACCCTAG", force_methionine=True, force_stop=True) As above, and does force_stop mean that you add a '*' to the end of the translation? Or that you stop at a stop codon? > >>> translate("TTGAAACCCTAG", alt_start=True, alt_stop=True) 'alt_start' I would think referred to allowing translation from alternative start codons. I don't know what alt_stop would mean... > Also, I don't think this option will be used very often. Maybe not. 
The first use case that comes to mind is QA on CDS-finding: # Check if sequence is CDS: assert candidate_cds.translate(init=True) # Check if reported CDS start is valid assert est[37:].translate(init=True) A second use case is slower in presenting itself... > So, it shouldn't be a problem if its name is too long to type, and it would be > better if it is easy to understand. That's a fair argument, I think. On the whole, though, I would favour a short, unambiguous, slightly cryptic name over a very long, unambiguous name, over an ambiguous name of any length. > When I was a python newbie, I really hated the name popen3 :) At least we have subprocess, now. L. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Nov 4 10:44:47 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 4 Nov 2008 10:44:47 -0500 Subject: [Biopython-dev] [Bug 2640] Proposal: doctest for SeqRecord/biopython In-Reply-To: Message-ID: <200811041544.mA4FilLH018113@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2640 ------- Comment #6 from dalloliogm at gmail.com 2008-11-04 10:44 EST ------- (In reply to comment #5) > (In reply to comment #4) > > > > If you think it would be useful, I can write other doctests for other modules > > in the following days. > > > > I think adding more doctests would be useful, but they MUST get run by our > existing test suite. Otherwise they'll just be human readable documentation > (which is still nice) but will not get regularly validated. There are a few ways to do it, but it is not too difficult to implement. The easiest thing is to use 'doctest.testmod' in the test files. 
For example, you can add to test_SeqRecord.py the following lines:

    import doctest
    from Bio import SeqRecord  # import the module, not SeqRecord.SeqRecord
    print("testing with doctest...")
    (failures, tests) = doctest.testmod(SeqRecord)
    if failures == 0:
        print('ok')
    else:
        print('some test has failed')

or you can launch the '_test' function in every module (see my patch), but this would require importing doctest multiple times.

    >>> SeqRecord._test()

I will write some more doctests over the following days/weeks and post them here as patches, and you can decide. Anyway, do you think they will make Biopython's documentation nicer? Do you like them? Sometimes, doctests make the docstrings a bit messy, so some people don't like them. But it is really a matter of how you write them. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Nov 4 11:11:49 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 4 Nov 2008 11:11:49 -0500 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200811041611.mA4GBnuW020154@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 ------- Comment #33 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-04 11:11 EST ------- (In reply to comment #32) > > In which of these examples do you understand that the first position is > > being forced to a Methionine? With my suggested code, you would not just be forcing the first codon to be a methionine. You would also be asking for the first codon to be validated as a start codon (initiation codon). > None are particularly clear, but only one of them doesn't give me the wrong > idea... 
In some cases I seem to have guessed different possible meanings for some of these suggested names - so those are probably unclear. > > >>> translate("TTGAAACCCTAG", init=True, to_stop=True) > > Because I've read this thread (or looked at the docs) - I understand this one > ;) To me this suggests something special is happening with the initialisation of the translation - but I agree its not clear what without checking the documentation. > > >>> translate("TTGAAACCCTAG", force_as_translating=True, to_stop=True) > > I don't intuitively understand this. Does it mean that the sequence should be > translatable? Ditto - an argument called force_as_translating means nothing to me. You're calling a translation method so what can forcing a translation mean? > > >>> translate("TTGAAACCCTAG", force_methionine=True, to_stop=True) > > Does this mean that the sequence will be translated from the first methionine > the method finds? I would have guessed force_methionine would ignore the value of the first three nucleotides in order to treat them as a methionine (even if they are not a start codon). > > >>> translate("TTGAAACCCTAG", force_methionine=True, force_stop=True) > > As above, and does force_stop mean that you add a '*' to the end of the > translation? Or that you stop at a stop codon? Like Leighton, I would be confused by "force_stop". It could mean add a stop symbol to the end of the amino acid sequence even if there isn't one there already. > > >>> translate("TTGAAACCCTAG", alt_start=True, alt_stop=True) > > 'alt_start' I would think referred to allowing translation from alternative > start codons. I don't know what alt_stop would mean... I think "alt_start" would be misleading for the intended dual functionality. Consider the typical use case for this option - translating a CDS, which most of the time will use the typical start codon AUG / ATG (but not all ways). We'd want the start codon validated - and it often won't be an alternative start codon. 
So calling the argument "alt_start" is confusing. > > Also, I don't think this option will be used very often. > > Maybe not. The first use case that comes to mind is QA on CDS-finding: > > # Check if sequence is CDS: > assert candidate_cds.translate(init=True) > # Check if reported CDS start is valid > assert est[37:].translate(init=True) > > A second use case is slower in presenting itself... I think translating a CDS is quite a common task - so a very long argument would be bad. Instead of the "init" start codon option in attachment 1032, I'd also be happy with a single boolean argument which does start codon validation, treats this as a methionine, checks the sequence is a multiple of three in length, checks for a final stop codon, and checks for no additional stop codons. We'd ruled out calling this "complete", but maybe "cds" would be better? > > So, it shouldn't be a problem if its name is too long to type, and it would > > be better if it is easy to understand. > > That's a fair argument, I think. On the whole, though, I would favour a > short, unambiguous, slightly cryptic name over a very long, unambiguous > name, over an ambiguous name of any length. There is a lot of subjectiveness in argument naming - clearly we have not come up with a perfect suggestion yet. Unfortunately "init" can be misunderstood (I'm not 100% sure what you were trying to say in comment 31, but I think you thought from the name "init" could be some sort of optional optimisation initialisation). How about "cds_start" instead of "init"? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Tue Nov 4 12:43:53 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 4 Nov 2008 12:43:53 -0500 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200811041743.mA4Hhrcc026138@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 ------- Comment #34 from bsouthey at gmail.com 2008-11-04 12:43 EST ------- (In reply to comment #33) > (In reply to comment #32) > > > In which of these examples do you understand that the first position is > > > being forced to a Methionine? > > With my suggested code, you would not just be forcing the first codon to be a > methionine. You would also be asking for the first codon to be validated as a > start codon (initialisation codon). > > > None are particularly clear, but only one of them doesn't give me the wrong > > idea... > > In some cases I seem to have guessed different possible meanings for some of > these suggested names - so those are probably unclear. > > > > >>> translate("TTGAAACCCTAG", init=True, to_stop=True) > > > > Because I've read this thread (or looked at the docs) - I understand this one > > ;) > > To me this suggests something special is happening with the initialisation of > the translation - but I agree its not clear what without checking the > documentation. > > > > >>> translate("TTGAAACCCTAG", force_as_translating=True, to_stop=True) > > > > I don't intuitively understand this. Does it mean that the sequence should be > > translatable? > > Ditto - an argument called force_as_translating means nothing to me. You're > calling a translation method so what can forcing a translation mean? > > > > >>> translate("TTGAAACCCTAG", force_methionine=True, to_stop=True) > > > > Does this mean that the sequence will be translated from the first methionine > > the method finds? 
> > I would have guessed force_methionine would ignore the value of the first three > nucleotides in order to treat them as a methionine (even if they are not a > start codon). > > > > >>> translate("TTGAAACCCTAG", force_methionine=True, force_stop=True) > > > > As above, and does force_stop mean that you add a '*' to the end of the > > translation? Or that you stop at a stop codon? > > Like Leighton, I would be confused by "force_stop". It could mean add a stop > symbol to the end of the amino acid sequence even if there isn't one there > already. > > > > >>> translate("TTGAAACCCTAG", alt_start=True, alt_stop=True) > > > > 'alt_start' I would think referred to allowing translation from alternative > > start codons. I don't know what alt_stop would mean... > > I think "alt_start" would be misleading for the intended dual functionality. > Consider the typical use case for this option - translating a CDS, which most > of the time will use the typical start codon AUG / ATG (but not all ways). > We'd want the start codon validated - and it often won't be an alternative > start codon. So calling the argument "alt_start" is confusing. > > > > Also, I don't think this option will be used very often. > > > > Maybe not. The first use case that comes to mind is QA on CDS-finding: > > > > # Check if sequence is CDS: > > assert candidate_cds.translate(init=True) > > # Check if reported CDS start is valid > > assert est[37:].translate(init=True) > > > > A second use case is slower in presenting itself... > > I think translating a CDS is quite a common task - so a very long argument > would be bad. > > Instead of the "init" start codon option in attachment 1032 [details], I'd also be happy > with a single boolean argument which does start codon validation, treats this > as a methionine, checks the sequence is a multiple of three in length, checks > for a final stop codon, and checks for no additional stop codons. 
We'd ruled > out calling this "complete", but maybe "cds" would be better? > > > > So, it shouldn't be a problem if its name is too long to type, and it would > > > be better if it is easy to understand. > > > > That's a fair argument, I think. On the whole, though, I would favour a > > short, unambiguous, slightly cryptic name over a very long, unambiguous > > name, over an ambiguous name of any length. > > There is a lot of subjectiveness in argument naming - clearly we have not come > up with a perfect suggestion yet. > > Unfortunately "init" can be misunderstood (I'm not 100% sure what you were > trying to say in comment 31, but I think you thought from the name "init" could > be some sort of optional optimisation initialisation). > > How about "cds_start" instead of "init"? > As I think about this and the various comments, I do that you must apply the same reasoning to non-standard translation as was applied to the ORF finding comments. From that I understand that you want a basic translation function so function arguments like to_stop or cds_start would be inappropriate. Also, even if it was possible, I do not see that validating all known start codons under all genetic codes fits here. Rather I think the various comments reflect various combinations of three major steps: 1) Identify the region to be translated like NCBI's sequence viewer: range from 'begin' to 'end' to denote the region to be viewed. Under this view, start_from or begin_at could be the position to start or the first occurrence of a start codon. Likewise to_end or end_at could be a position or the first occurrence of a stop codon. I also note this also implies frame but I think that has a separate meaning. 2) Having defined the region to be translated, translate that region as defined by the frame and selected table. A question here is that if region is defined then should the frame be set to one or not. 3) Address any non-standard codons to the translated sequence. 
If you are going to allow non-standard start codons, you also need to handle
selenocysteine (http://en.wikipedia.org/wiki/Selenocysteine) and, less so,
pyrrolysine (http://en.wikipedia.org/wiki/Pyrrolysine). Technically, you can
argue the table used for translation in 2) should reflect this, but I consider
it a separate issue. Also, the handling of a stop codon would likewise need to
change.

The non-standard codon usages are rare, and I do question whether these are
really part of the Seq object translate function or belong elsewhere. I really
feel that if the user already knows that it is a non-AUG start codon then they
can replace the first amino acid with Met rather than rely on the translate
function. For example, the CDS field in the GenBank record for Mouse
Neuropeptide W (NM_001099664) has:
/exception="alternative start codon"
/note="non-AUG (CUG) translation initiation codon".
So if the user looked at the record then they would know it would need to be
changed.

If some form of the non-standard codons is included, I would think some variant
of Leighton's assert idea should be preferred, such as using an
assert_nonstandard argument (or just nonstandard). This would be a string, list
or tuple to denote the changes to be made, such as 'Met1' or 'M1', where the
letters are the three- or single-letter code of the desired amino acid and the
number is the location within the amino acid sequence to be changed. So Met1
would mean replacing the amino acid at position one with methionine (M). But I
recognize this is not sufficient to handle other non-standard cases with stop
codons.

Bruce
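The 'Met1' / 'M1' notation suggested in comment #34 could be interpreted along these lines. This is only a sketch of one possible reading: the function name apply_nonstandard and the small THREE_TO_ONE lookup are hypothetical and are not part of Bio.Seq or of any attached patch.

```python
import re

# Hypothetical lookup for three-letter codes; only a few entries are listed
# to keep the sketch short.
THREE_TO_ONE = {"Met": "M", "Sec": "U", "Pyl": "O"}


def apply_nonstandard(protein, changes):
    """Return protein with the requested residues substituted.

    changes is a string such as 'Met1' or 'M1', or a list/tuple of such
    strings. Positions are 1-based and refer to the amino acid sequence.
    """
    if isinstance(changes, str):
        changes = [changes]
    residues = list(protein)
    for change in changes:
        match = re.match(r"^([A-Za-z]+)(\d+)$", change)
        if match is None:
            raise ValueError("cannot parse change %r" % change)
        name, position = match.group(1), int(match.group(2))
        if len(name) == 1:
            letter = name.upper()
        elif name.capitalize() in THREE_TO_ONE:
            letter = THREE_TO_ONE[name.capitalize()]
        else:
            raise ValueError("unknown amino acid code %r" % name)
        if not 1 <= position <= len(residues):
            raise ValueError("position %i is outside the sequence" % position)
        residues[position - 1] = letter
    return "".join(residues)


print(apply_nonstandard("VKKMQ", "Met1"))
```

As comment #34 itself notes, a substitution list like this does nothing about stop codons, so it would only cover the start codon case.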
From bugzilla-daemon at portal.open-bio.org Tue Nov 4 13:28:19 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 4 Nov 2008 13:28:19 -0500 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200811041828.mA4ISJAd028961@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 ------- Comment #35 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-04 13:28 EST ------- (In reply to comment #34) > As I think about this and the various comments, I do that you must apply the > same reasoning to non-standard translation as was applied to the ORF finding > comments. From that I understand that you want a basic translation function so > function arguments like to_stop or cds_start would be inappropriate. There is certainly an argument that the Bio.Seq translate function/methods should be kept as simple as possible while providing widely useful functionality. Perhaps given the lack of immediate agreement we are at that point already? Or perhaps this is a reflection of the different types of organisms people work with and thus the relative frequencies of non-standard start codons. > Also, even if it was possible, I do not see that validating all known start > codons under all genetic codes fits here. We have the valid start codons in the CodonTable objects derived from the NCBI, so it is possible to check them. > ... Address any non-standard codons to the translated sequence. If you are > going to allow non-standard start codons, you also need to handle > selenocysteine (http://en.wikipedia.org/wiki/Selenocysteine) and less so > pyrrolysine (http://en.wikipedia.org/wiki/Pyrrolysine). Why? Non-standard codons are pretty common in prokaryotes and the rules for translating them are simple (once the start codon is identified). 
On the other hand selenocysteine and pyrrolysine are very rare, and we can't define a computer rule to deal with them - so we don't even try. > The non-standard codon usages are rare and I do really question if these are > really part of the Seq object translate function or belong elsewhere. I really > feel that if the user already knows that it is a non-AUG start codon then they > can replace the first amino acid with Met rather than rely on the translate > function. For example, the CDS field in the Genbank record for Mouse > Neuropeptide W (NM_001099664) has: > /exception="alternative start codon" > /note="non-AUG (CUG) translation initiation codon". > So if the user looked at the record then then would know it would need to be > changed. Non-standard start codons are not that rare in prokaryotes (and I would not expect them to be annotated like your mouse example). When translating a well annotated sequence, the location itself should be enough. [I'm assuming we're not talking about the other meaning of the phrase "alternative start codons" - where a gene may have multiple valid start codons giving proteins of different lengths but the same C-terminal region.] > If some form of the non-standard codons is included I would think some > variantof Leighton's assert idea should be preferred such as using an > assert_nonstandard argument (or just nonstandard). This would be a string, > list or tuple to denote the changes to be made such as say 'Met1' or 'M1' > where three or single letter code of the desired amino acid and the number > is the location within the amino acid sequence to be changed. So Met1 would > mean changing the amino acid at position one with Methionine (M). But I > recognize this is not sufficient to handle other non-standard cases with > stop codons. I thought Leighton was just proposing another name for a boolean argument which I had called "init" in attachment 1032. I'm afraid I don't understand your idea of a complicated list argument. 
=============================================================================

Here is a concrete example: there are 418 annotated genes in E. coli K12 with
non-standard start codons, which you might want to translate into proteins.

#Using ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_K12_substr__MG1655/NC_000913.ffn
>>> from Bio import SeqIO
>>> odd = [record for record in SeqIO.parse(open("NC_000913.ffn"),"fasta") \
           if str(record.seq[:3]) != "ATG"]
>>> print "There are %i genes not starting ATG" % len(odd)
There are 481 genes not starting ATG
>>> record = odd[0]
>>> print record.format("fasta")
>ref|NC_000913.2|:5234-5530
GTGAAAAAGATGCAATCTATCGTACTCGCACTTTCCCTGGTTCTGGTCGCTCCCATGGCA
GCACAGGCTGCGGAAATTACGTTAGTCCCGTCAGTAAAATTACAGATAGGCGATCGTGAT
AATCGTGGCTATTACTGGGATGGAGGTCACTGGCGCGACCACGGCTGGTGGAAACAACAT
TATGAATGGCGAGGCAATCGCTGGCACCTACACGGACCGCCGCCACCGCCGCGCCACCAT
AAGAAAGCTCCTCATGATCATCACGGCGGTCATGGTCCAGGCAAACATCACCGCTAA

This starts GTG, which is a valid bacterial start codon. I'd like to translate
this and get the actual biologically relevant protein as given in the GenBank
file NC_000913.gbk (maybe with or without the stop symbol at the end).
See: CDS 5234..5530 /gene="yaaX" /locus_tag="b0005" /codon_start=1 /transl_table=11 /product="predicted protein" /protein_id="NP_414546.1" /db_xref="ASAP:ABE-0000015" /db_xref="UniProtKB/Swiss-Prot:P75616" /db_xref="GI:16127999" /db_xref="ECOCYC:G6081" /db_xref="EcoGene:EG14384" /db_xref="GeneID:944747" /translation="MKKMQSIVLALSLVLVAPMAAQAAEITLVPSVKLQIGDRDNRGY YWDGGHWRDHGWWKQHYEWRGNRWHLHGPPPPPRHHKKAPHDHHGGHGPGKHHR" Without any non-standard start codon support, my translations start with a V: >>> print record.seq.translate(table=11) VKKMQSIVLALSLVLVAPMAAQAAEITLVPSVKLQIGDRDNRGYYWDGGHWRDHGWWKQHYEWRGNRWHLHGPPPPPRHHKKAPHDHHGGHGPGKHHR* >>> print record.seq.translate(table=11, to_stop=True) VKKMQSIVLALSLVLVAPMAAQAAEITLVPSVKLQIGDRDNRGYYWDGGHWRDHGWWKQHYEWRGNRWHLHGPPPPPRHHKKAPHDHHGGHGPGKHHR With this proposed functionality I can obtain the desired results (both with and without the terminator stop symbol): >>> print record.seq.translate(table=11, to_stop=True, init=True) MKKMQSIVLALSLVLVAPMAAQAAEITLVPSVKLQIGDRDNRGYYWDGGHWRDHGWWKQHYEWRGNRWHLHGPPPPPRHHKKAPHDHHGGHGPGKHHR >>> print record.seq.translate(table=11, init=True) MKKMQSIVLALSLVLVAPMAAQAAEITLVPSVKLQIGDRDNRGYYWDGGHWRDHGWWKQHYEWRGNRWHLHGPPPPPRHHKKAPHDHHGGHGPGKHHR* I think that wanting to translate a CDS like this is a fairly common operation. Perhaps not as common as translation of a partial sequence, or translating whole genomes or contigs where we want to translate through the stop codons -- but nevertheless, a common need. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
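The behaviour argued for in comment #35 can be sketched without Biopython. This is a minimal illustration and not the code in attachment 1032: the name translate_sketch is hypothetical, the amino acid string is the standard NCBI ordering (shared by tables 1 and 11), and the START_CODONS set is my understanding of the table 11 start codons, which should be checked against the CodonTable objects before relying on it.

```python
from itertools import product

BASES = "TCAG"
# Amino acids in NCBI order (first base varies slowest, third fastest).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
FORWARD_TABLE = dict(
    zip(("".join(codon) for codon in product(BASES, repeat=3)), AMINO_ACIDS)
)
# Assumed start codons for the bacterial table (table 11).
START_CODONS = set(["TTG", "CTG", "ATT", "ATC", "ATA", "ATG", "GTG"])


def translate_sketch(seq, init=False, to_stop=False):
    """Translate an unambiguous DNA string.

    With init=True the first codon must be a valid start codon and is
    reported as M whatever it normally encodes. With to_stop=True the
    result is cut before the first stop symbol, if there is one.
    """
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    protein = [FORWARD_TABLE[codon] for codon in codons]
    if init:
        if not codons or codons[0] not in START_CODONS:
            raise ValueError("first codon is not a valid start codon")
        protein[0] = "M"
    result = "".join(protein)
    if to_stop:
        result = result.split("*", 1)[0]
    return result


print(translate_sketch("TTGAAACCCTAG", init=True, to_stop=True))
```

Raising an exception for an invalid first codon is one design choice; the thread leaves open what should happen in that case.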
From bugzilla-daemon at portal.open-bio.org Tue Nov 4 17:47:02 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 4 Nov 2008 17:47:02 -0500 Subject: [Biopython-dev] [Bug 2629] Updated Bio.NaiveBayes to listfns import In-Reply-To: Message-ID: <200811042247.mA4Ml2At014897@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2629 ------- Comment #7 from bsouthey at gmail.com 2008-11-04 17:47 EST ------- (In reply to comment #6) > I've committed Peter's fix for the set import to CVS. > > About the replacement for listfns.contents in the modified NaiveBayes code: Did > you do any timings to compare the new code to the old code? Since > listfns.contents is implemented in C, it may be (much) faster than the > replacement code. > (Hopefully I created a patch correctly.) The purpose of listfns.contents() is to compute the frequency of each class and return it as a dictionary. There is a difference but it is very small between the different versions (1/100ths of second) for what I have looked at (which is more than the actual listfns.contents function). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Tue Nov 4 17:48:12 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 4 Nov 2008 17:48:12 -0500 Subject: [Biopython-dev] [Bug 2631] Updated Bio.MaxEntropy to remove listfns import In-Reply-To: Message-ID: <200811042248.mA4MmCiZ015012@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2631 ------- Comment #6 from bsouthey at gmail.com 2008-11-04 17:48 EST ------- Created an attachment (id=1036) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1036&action=view) Patch to NaiveBayes -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Nov 4 21:33:32 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 4 Nov 2008 21:33:32 -0500 Subject: [Biopython-dev] [Bug 2631] Updated Bio.MaxEntropy to remove listfns import In-Reply-To: Message-ID: <200811050233.mA52XWrB025772@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2631 ------- Comment #7 from bsouthey at gmail.com 2008-11-04 21:33 EST ------- (In reply to comment #6) > Created an attachment (id=1036) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1036&action=view) [details] > Patch to NaiveBayes > Sorry about this as I do not know how this ended up here. Please just ignore it. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Tue Nov 4 21:35:53 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 4 Nov 2008 21:35:53 -0500 Subject: [Biopython-dev] [Bug 2629] Updated Bio.NaiveBayes to listfns import In-Reply-To: Message-ID: <200811050235.mA52Zr0b025894@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2629 bsouthey at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1014 is|0 |1 obsolete| | ------- Comment #8 from bsouthey at gmail.com 2008-11-04 21:35 EST ------- Created an attachment (id=1037) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1037&action=view) Patch to update NaiveBayes Hopefully I got this correct, if not just let me know. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Nov 5 05:24:15 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 5 Nov 2008 05:24:15 -0500 Subject: [Biopython-dev] [Bug 2629] Updated Bio.NaiveBayes to listfns import In-Reply-To: Message-ID: <200811051024.mA5AOF60024355@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2629 ------- Comment #9 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-05 05:24 EST ------- (In reply to comment #8) > Created an attachment (id=1037) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1037&action=view) [details] > Patch to update NaiveBayes > > Hopefully I got this correct, if not just let me know. > At first glance it looks like this patch would remove the Python 2.3 set work around. Easily fixed. 
Also, I would have called the new get_content_freq function _get_content_freq
(leading underscore denoting private) as this is an implementation detail that
doesn't need to be part of the public API.

I'm curious what your other implementations looked like, as this one does not
look that clear to me at first read:

    p_contents=1.0/len(contents)
    content_freqs={}
    for cval in contents:
        vcount=content_freqs.get(cval,0)+p_contents
        content_freqs.update({cval:vcount})

In particular, why use the dict update method?

Given the possible rounding issues, does doing the rescaling (dividing by the
number of elements) at the start make a big time saving (over dividing each
total at the end)? I would feel happier with the division at the end (as done
in the listfns code).

From bugzilla-daemon at portal.open-bio.org Wed Nov 5 07:06:04 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 5 Nov 2008 07:06:04 -0500
Subject: [Biopython-dev] [Bug 2640] Proposal: doctest for SeqRecord/biopython
In-Reply-To:
Message-ID: <200811051206.mA5C64Pg030176@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2640

------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-05 07:06 EST -------
I've updated Bio.Seq, Bio.SeqIO and Bio.AlignIO so my existing docstring
examples can be used with doctest.

Adding code via the __main__ trick to allow each module's test to be run
individually might be worthwhile.

The rest of this message is a possible "test_docstrings.py" file for our unit
tests, which would require manual updating whenever we want to test an
additional module. This is probably a neat short term solution while only a
relatively small proportion of Biopython uses doctests.
-----------------------------------------------------------------
#!/usr/bin/env python
# This code is part of the Biopython distribution and governed by its
# license. Please see the LICENSE file that should have been included
# as part of this package.

import doctest, unittest
from Bio import Seq, SeqRecord, SeqIO, AlignIO
test_modules = [Seq, SeqRecord, SeqIO, AlignIO]

test_suite = unittest.TestSuite((doctest.DocTestSuite(module) \
                                 for module in test_modules))

#Using sys.stdout prevent this working nicely when run from idle:
#runner = unittest.TextTestRunner(sys.stdout, verbosity = 0)

#Using verbosity = 0 means we won't have to regenerate the unit
#test output file used by the run_tests.py framework whenever a
#new module or doctest is added.
runner = unittest.TextTestRunner(verbosity = 0)
runner.run(test_suite)

From bugzilla-daemon at portal.open-bio.org Wed Nov 5 08:12:28 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 5 Nov 2008 08:12:28 -0500
Subject: [Biopython-dev] [Bug 2622] Parsing between position locations like 5933^5934 in GenBank/EMBL files
In-Reply-To:
Message-ID: <200811051312.mA5DCSYZ004411@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2622

chapmanb at 50mail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED

------- Comment #4 from chapmanb at 50mail.com 2008-11-05 08:12 EST -------
Fixed with Bio/GenBank/__init__.py 1.93, Bio/SeqFeature.py 1.14. Coordinates
are now passed correctly with Peter's suggested fix. The empty slice issue is
resolved by adding this as a special case to FeatureLocation nofuzzy attribute
retrieval.
For standard retrieval the classes are fully available to the user and they would need to make the distinction about how they would like to treat them. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Nov 5 08:14:51 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 5 Nov 2008 08:14:51 -0500 Subject: [Biopython-dev] [Bug 2225] Do something with the PROJECT line in GenBank files In-Reply-To: Message-ID: <200811051314.mA5DEpVe004918@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2225 chapmanb at 50mail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #1 from chapmanb at 50mail.com 2008-11-05 08:14 EST ------- Fixed with Bio/GenBank/__init__.py 1.93, Bio/GenBank/Record.py 1.11 and Bio/GenBank/Scanner.py 1.24 The PROJECT line is parsed as a list of projects for both SeqIO and Record based parsing, for consistency. Output of PROJECT line also added. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Wed Nov 5 08:18:22 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 5 Nov 2008 08:18:22 -0500 Subject: [Biopython-dev] [Bug 2629] Updated Bio.NaiveBayes to listfns import In-Reply-To: Message-ID: <200811051318.mA5DIMPJ005649@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2629 ------- Comment #10 from mdehoon at ims.u-tokyo.ac.jp 2008-11-05 08:18 EST ------- See http://coreygoldberg.blogspot.com/2008/07/python-counting-items-in-list.html for some timings of this operation. I think Bruce's approach is most suitable, except for the dict update method; I would use content_freqs[cval] = content_freqs.get(cval,0)+p_contents instead. Depending on the contents of the list, sometimes it runs even faster than the implementation in listfns. > > Given the possible rounding issues, does doing the rescaling (dividing by the > number of elements) at the start make a big time saving (over dividing each > total at the end)? I would feel happier with the division at the end (as done > in the listfns code). > I think the rescaling at the start is a good thing. If the list contains many different objects, rescaling at the end can take a long time. Probably that is not the typical use case here, but on the other hand I don't see a good reason not to save time here. Maybe just my nitpicking, but I think the get_content_freq function will be more readable if we use different variable names inside this function. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Wed Nov 5 08:31:49 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 5 Nov 2008 08:31:49 -0500 Subject: [Biopython-dev] [Bug 2225] Do something with the PROJECT line in GenBank files In-Reply-To: Message-ID: <200811051331.mA5DVnNI007802@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2225 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|RESOLVED |REOPENED Resolution|FIXED | ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-05 08:31 EST ------- Do you think we have to worry about multiple project lines, or project entries spanning multiple lines? This would require a slight difference to the parsing (to append new project entries instead of replacing any prior entries), and to the output from the record object (including line wrapping). HOWEVER, reading the latest ftp://ftp.ncbi.nih.gov/genbank/gbrel.txt it seems the PROJECT line will be replaced with a DBLINK line next year. With that in mind, I would now suggest we parse the PROJECT and/or DBLINK lines and store them in the record.dbxrefs list (rather than in the annotations). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
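The append-rather-than-replace handling raised in comment #2 could look something like the following. It is a sketch under stated assumptions: the helper names are hypothetical, the example lines are made up, and it assumes each PROJECT line carries whitespace-separated identifiers. It does not deal with wrapped continuation lines or with the future DBLINK format.

```python
def parse_project_line(line):
    """Split one PROJECT line into a list of project identifiers."""
    fields = line.split()
    if not fields or fields[0] != "PROJECT":
        raise ValueError("not a PROJECT line: %r" % line)
    return fields[1:]


def add_projects(dbxrefs, lines):
    """Append identifiers from each PROJECT line, keeping earlier entries."""
    for line in lines:
        for project in parse_project_line(line):
            if project not in dbxrefs:
                dbxrefs.append(project)
    return dbxrefs


# Hypothetical input lines, for illustration only.
example = ["PROJECT     GenomeProject:100", "PROJECT     GenomeProject:200"]
print(add_projects([], example))
```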
From bugzilla-daemon at portal.open-bio.org Wed Nov 5 08:34:41 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 5 Nov 2008 08:34:41 -0500 Subject: [Biopython-dev] [Bug 2622] Parsing between position locations like 5933^5934 in GenBank/EMBL files In-Reply-To: Message-ID: <200811051334.mA5DYfWx008228@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2622 ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-05 08:34 EST ------- Hi Brad, Looking back on this I may have been out by one on the extension calculation, i.e. I'm not 100% sure position.high.val-position.low.val is appropriate. I'll try and look at this later... Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Nov 5 11:51:07 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 5 Nov 2008 11:51:07 -0500 Subject: [Biopython-dev] [Bug 2629] Updated Bio.NaiveBayes to listfns import In-Reply-To: Message-ID: <200811051651.mA5Gp7R6003323@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2629 ------- Comment #11 from bsouthey at gmail.com 2008-11-05 11:51 EST ------- (In reply to comment #10) > See > http://coreygoldberg.blogspot.com/2008/07/python-counting-items-in-list.html > for some timings of this operation. I think Bruce's approach is most suitable, > except for the dict update method; I would use > content_freqs[cval] = content_freqs.get(cval,0)+p_contents > instead. Depending on the contents of the list, sometimes it runs even faster > than the implementation in listfns. > > > > Given the possible rounding issues, does doing the rescaling (dividing by the > > number of elements) at the start make a big time saving (over dividing each > > total at the end)? 
> > I would feel happier with the division at the end (as done
> > in the listfns code).
> >
> I think the rescaling at the start is a good thing. If the list contains many
> different objects, rescaling at the end can take a long time. Probably that is
> not the typical use case here, but on the other hand I don't see a good reason
> not to save time here.
>
> Maybe just my nitpicking, but I think the get_content_freq function will be
> more readable if we use different variable names inside this function.
>

(In reply to comment #10)
> See
> http://coreygoldberg.blogspot.com/2008/07/python-counting-items-in-list.html
> for some timings of this operation. I think Bruce's approach is most suitable,
> except for the dict update method; I would use
> content_freqs[cval] = content_freqs.get(cval,0)+p_contents
> instead. Depending on the contents of the list, sometimes it runs even faster
> than the implementation in listfns.

Basically the goal is to find the frequency of each class and store it in a
dictionary, with the keys being the classes and the values being the
frequencies. So you could count up all observations in each class (essentially
adding one to the appropriate class sum) and then divide each count by the
total number of observations, as implemented in the dictget approach. Being
more cryptic, we can avoid the final division step by adding one/number of
observations instead of one to the appropriate class sum, as implemented in
get_content_freq.

Thanks for the link, I created a timing code for random lists.

get_content_freq is the one I put in the patch
get_content_freq2 is the modified version
ternary is based on the Corey code, modified to give frequencies rather than counts
dictget is using a dictionary to count then get the frequencies
listfns.contents is the Biopython Python version without the C code import.
clistfns.contents is the direct import of the Biopython module that uses C code

My system is running 64-bit Fedora on Linux with Python 2.5.2.
The number of observations is not important (the difference is very small). I
used 1000000 random integers and measured doing it once, then repeated the test
5 times with 1000000 executions and took the minimum time, i.e.
min(timeit.repeat(5, 1000000)). Also, this function is not called that much in
NaiveBayes, so these are rather extreme cases.

Range of ints between one and two:
                     once (s)             best of 5 (s)
get_content_freq     1.90734863281e-05    8.11614704132
get_content_freq2    8.10623168945e-06    4.39126110077
ternary file         1.59740447998e-05    9.42879796028
dictget file         1.4066696167e-05     10.468517065
listfns.contents     1.28746032715e-05    7.50778198242
clistfns.contents    6.91413879395e-06    2.71360707283

Range of ints between one and ten:
                     once (s)             best of 5 (s)
get_content_freq     1.90734863281e-05    7.97784090042
get_content_freq2    7.15255737305e-06    4.21833491325
ternary file         1.69277191162e-05    9.18815684319
dictget file         1.50203704834e-05    10.2242910862
listfns.contents     1.50203704834e-05    7.25569987297
clistfns.contents    8.10623168945e-06    2.6411280632

Range of ints between one and one hundred:
                     once (s)             best of 5 (s)
get_content_freq     2.00271606445e-05    7.99760317802
get_content_freq2    7.86781311035e-06    4.20446300507
ternary file         1.71661376953e-05    9.26767396927
dictget file         1.4066696167e-05     10.2449028492
listfns.contents     1.4066696167e-05     7.34166693687
clistfns.contents    7.15255737305e-06    2.63198709488

So this is not dependent on the number of classes. For the most part these
numbers are showing more system overheads than major differences between the
actual approaches. Therefore I would clearly go with Michiel's version.
> >
> > Given the possible rounding issues, does doing the rescaling (dividing by the
> > number of elements) at the start make a big time saving (over dividing each
> > total at the end)? I would feel happier with the division at the end (as done
> > in the listfns code).
> >
> I think the rescaling at the start is a good thing. If the list contains many
> different objects, rescaling at the end can take a long time. Probably that is
> not the typical use case here, but on the other hand I don't see a good reason
> not to save time here.

From the two-class scenario above, the get_content_freq methods result in:
{1: 0.49978999999354606, 2: 0.50020999999354643}
and the others result in:
{1: 0.49979000000000001, 2: 0.50021000000000004}

On my 64-bit Linux system the numerical error is small and within
expectations. It may be worse on a 32-bit system or OS. I really wanted to draw
attention to this because tiny differences can be important (not to mention
people who don't understand enough about numerical precision).

>
> Maybe just my nitpicking, but I think the get_content_freq function will be
> more readable if we use different variable names inside this function.
>

Please rename as necessary.

Bruce
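The rounding difference reported in comment #11 is easy to reproduce. The following is a small sketch rather than the attached timing script: the two function names are hypothetical, the list is shorter than the one used above, and both functions assume a non-empty list.

```python
import random


def freq_scale_first(items):
    """Add 1/len(items) per occurrence (rescaling at the start)."""
    weight = 1.0 / len(items)
    freqs = {}
    for item in items:
        freqs[item] = freqs.get(item, 0.0) + weight
    return freqs


def freq_scale_last(items):
    """Count occurrences as integers, then divide once per distinct item."""
    counts = {}
    for item in items:
        counts[item] = counts.get(item, 0) + 1
    total = float(len(items))
    return dict((key, count / total) for key, count in counts.items())


random.seed(0)
data = [random.randint(1, 2) for _ in range(100000)]
first = freq_scale_first(data)
last = freq_scale_last(data)
largest_gap = max(abs(first[key] - last[key]) for key in last)
print("largest difference between the two approaches: %g" % largest_gap)
```

Dividing at the end involves a single rounding per class, whereas adding a small weight many times lets the rounding error accumulate, which is where the trailing digits in the first pair of results above come from.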
From bugzilla-daemon at portal.open-bio.org Wed Nov 5 12:00:42 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 5 Nov 2008 12:00:42 -0500
Subject: [Biopython-dev] [Bug 2629] Updated Bio.NaiveBayes to listfns import
In-Reply-To:
Message-ID: <200811051700.mA5H0gxV003976@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2629

------- Comment #12 from bsouthey at gmail.com 2008-11-05 12:00 EST -------
Created an attachment (id=1038)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1038&action=view)
timing different implementations of listfns.content

This is my timing code for different implementations of listfns.content. It assumes that there is a local version of listfns.py without the import clistfns statement at the end, and that the clistfns function from Bio is available.

From bugzilla-daemon at portal.open-bio.org Wed Nov 5 15:30:46 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 5 Nov 2008 15:30:46 -0500
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq)
In-Reply-To:
Message-ID: <200811052030.mA5KUklP023725@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381

------- Comment #36 from bsouthey at gmail.com 2008-11-05 15:30 EST -------
(In reply to comment #35)

Okay, this is what I think of the main uses for translation. All these can be easily achieved by the translate arguments table='Standard' and stop_symbol='*' with very little code. So I do not see any need for any extra arguments except for convenience. (I have these uses in a file that I will upload after this.)
So really my only issue left is what is the expected behaviour for:
a) to_stop_codon=True if there are no valid stop codons (my understanding of to_stop).
b) from_start_codon=True (or init=True etc) if there are no valid start codons

1) Translation in some given forward frame - reverse frames should be obvious. Looping over these will give all three frames, but that could return multiple Seq objects.

2) Translation between any range of locations. From Peter's example, extracting the region from 5234 to 5530 in the complete sequence will give the yaaX gene CDS that can be translated into the protein sequence.

3a) Translate to the first valid stop codon. Perhaps not as expected because it should respect the frame, so try:

3b) Translate to the first valid stop codon with respect to the selected frame.

3c) Alternatively use the to_stop=True argument of translate. Here translation is to the first valid stop codon OR the end of the sequence. This second aspect is not documented.

4a) Start translation at the first start codon. Again, this does not respect the frame, so try:

4b) Start translation at the first valid start codon with respect to the selected frame.

In both cases of 4) the very first codon must be checked against the defined start_codon list in the appropriate CodonTable.

Obviously 3) and 4) should raise exceptions if stop or start codons are not found, because of the specific request to stop or start translation. But, as in 3c), this could be relaxed to include the end of the sequence. I am not sure what the behaviour should be if there is no valid start codon.

Also some variation of 3a) and 4a) could be used to find possible open reading frames (from a start codon to a stop codon). But this could return more than one Seq object.
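Uses 1), 3b) and 3c) above can be sketched in plain Python, so that the example runs without Biopython. This is only an illustration of the behaviour under discussion: the function names and the strict flag are assumptions made here, not part of Bio.Seq, and the table is the standard genetic code written in NCBI base order (T, C, A, G).

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, frame=0):
    """Translate the whole codons of dna, starting at offset frame (0, 1 or 2)."""
    # Drop any trailing partial codon for this frame.
    end = len(dna) - (len(dna) - frame) % 3
    return "".join(STANDARD_TABLE[dna[i:i + 3]] for i in range(frame, end, 3))


def translate_to_stop(dna, frame=0, strict=False):
    """Translate up to the first in-frame stop codon.

    With strict=False a missing stop codon means "translate to the end of
    the sequence"; with strict=True it raises ValueError instead.
    """
    protein = translate(dna, frame)
    head, stop, _ = protein.partition("*")
    if strict and not stop:
        raise ValueError("no in-frame stop codon found")
    return head


dna = "ATGGCCTAAGGG"
for frame in range(3):
    print(frame, translate(dna, frame), translate_to_stop(dna, frame))
```

The strict flag makes the open question in a) explicit: the caller chooses between the relaxed behaviour of 3c) and an exception when no stop codon is present in the selected frame.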
From bugzilla-daemon at portal.open-bio.org Wed Nov 5 15:33:52 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 5 Nov 2008 15:33:52 -0500 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200811052033.mA5KXqqJ023824@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 ------- Comment #37 from bsouthey at gmail.com 2008-11-05 15:33 EST ------- Created an attachment (id=1039) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1039&action=view) examples of possible uses of translate -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Nov 5 17:12:13 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 5 Nov 2008 17:12:13 -0500 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200811052212.mA5MCDhY028649@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 ------- Comment #38 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-05 17:12 EST ------- (In reply to comment #36) > (In reply to comment #35) > Okay, this is what I think of the main uses for translation. > All these can be easily achieved by the translate arguments > table='Standard' and stop_symbol='*' with very little code. > So I do not see any need for any extra arguments except > for convenience. (I have these uses in file that I will > upload after this.) Most of your examples seem to relate to open reading frame searches, looking for start/stop codons etc. I agree this kind of thing isn't needed in the basic translate method/function. 
Doing a CDS translation however is more fiddly due to the methionine at the start, and I think this warrents another option in the basic translate method/function. > So really my only issue left is what is the expected behaviour for: > a) to_stop_codon=True if there are no valid stop codons (my understanding of > to_stop). If you are asking about the current to_stop argument in CVS right now, if there is no in frame stop codon it will translate all the sequence (to_stop has no effect). I've just updated the docstring to make this more explicit (see Bio/Seq.py CVS revision 1.55). Do you think "to_stop_codon" is a clearer argument name than "to_stop"? > b) from_start_codon=True (or init=True etc) if there are no valid start codons As written in attachment 1032, if the sequence does not start with a valid start codon an exception is raised. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Nov 5 18:09:01 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 5 Nov 2008 18:09:01 -0500 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200811052309.mA5N91aO031273@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 ------- Comment #39 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-05 18:09 EST ------- Created an attachment (id=1040) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1040&action=view) Patch to Bio/Seq.py for complete CDS translation. 
(In reply to comment #33) > Instead of the "init" start codon option in attachment 1032, > I'd also be happy with a single boolean argument which does > start codon validation, treats this as a methionine, checks > the sequence is a multiple of three in length, checks for a > final stop codon, and checks for no additional stop codons. > We'd ruled out calling this "complete", but maybe "cds" > would be better? This patch adds this functionality via a "complete_cds" boolean argument. Here is how it could be applied to translate the CDS used as an example in my comment 35, the yaaX gene in E. coli K12: >>> from Bio.Seq import Seq >>> my_cds = Seq("GTGAAAAAGATGCAATCTATCGTACTCGCACTTTCCCTGGTTCTGGTCGCTCCCATGGCAGCACAGGCTGCGGAAATTACGTTAGTCCCGTCAGTAAAATTACAGATAGGCGATCGTGATAATCGTGGCTATTACTGGGATGGAGGTCACTGGCGCGACCACGGCTGGTGGAAACAACATTATGAATGGCGAGGCAATCGCTGGCACCTACACGGACCGCCGCCACCGCCGCGCCACCATAAGAAAGCTCCTCATGATCATCACGGCGGTCATGGTCCAGGCAAACATCACCGCTAA") >>> my_cds.translate(table=11) Seq('VKKMQSIVLALSLVLVAPMAAQAAEITLVPSVKLQIGDRDNRGYYWDGGHWRDH...HR*', HasStopCodon(ExtendedIUPACProtein(), '*')) >>> my_cds.translate(table=11, to_stop=True) Seq('VKKMQSIVLALSLVLVAPMAAQAAEITLVPSVKLQIGDRDNRGYYWDGGHWRDH...HHR', ExtendedIUPACProtein()) >>> my_cds.translate(table=11, complete_cds=True) Seq('MKKMQSIVLALSLVLVAPMAAQAAEITLVPSVKLQIGDRDNRGYYWDGGHWRDH...HHR', ExtendedIUPACProtein()) I would be happy with EITHER of these options, as both can be used to translate a complete coding sequence: (1) the "init" argument (under another name, maybe "cds_start"?) illustrated in attachment 1032. This would check the start codon is valid AND translate it as a methionine. (2) the "complete_cds" argument (perhaps under another name, maybe "cds"?) illustrated in this patch. This would check the start codon is valid AND translate it as a methionine AND check there are a whole number of codons AND check it ends with a stop codon AND check there are no extra in-frame stop codons. 
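The checks listed under option (2) can be written out as a minimal sketch in plain Python, so it runs without Biopython. The function name translate_complete_cds, the start_codons parameter and the small set of start codons passed in the usage lines are assumptions for illustration; this is not the code in attachment 1040. The amino acid assignments are those of the standard table, which bacterial table 11 shares.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_complete_cds(dna, start_codons=("ATG",)):
    """Translate a complete CDS, raising ValueError if any check fails."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("length is not a multiple of three")
    if dna[:3] not in start_codons:
        raise ValueError("first codon is not a valid start codon")
    protein = "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))
    if not protein.endswith("*"):
        raise ValueError("final codon is not a stop codon")
    if "*" in protein[:-1]:
        raise ValueError("extra in-frame stop codon")
    # The start codon is read as methionine, and the stop symbol is dropped.
    return "M" + protein[1:-1]


# Illustrative subset of start codons, supplied by the caller (an assumption).
bacterial_starts = {"ATG", "GTG", "TTG"}
short_cds = "GTGAAAAAGATGCAATCTTAA"  # first codons of the example above, plus a stop
print(translate_complete_cds(short_cds, start_codons=bacterial_starts))
```

Passing the start codons in explicitly keeps the sketch independent of any particular codon table object; a real implementation would presumably take them from the selected CodonTable.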
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Nov 6 06:14:07 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 6 Nov 2008 06:14:07 -0500 Subject: [Biopython-dev] [Bug 2639] SeqRecord.init doesn't check for arguments to their types In-Reply-To: Message-ID: <200811061114.mA6BE7jk002000@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2639 ------- Comment #3 from dalloliogm at gmail.com 2008-11-06 06:14 EST ------- Created an attachment (id=1041) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1041&action=view) add a check for the seq argument in seqrecord, to be a Seq object and not None This patch adds a check for the seq argument in SeqRecord. If seq is None (by default), it raises a ValueError Exception. If it is a Seq objects, it saves it as self.seq. If it is another kind of object (string, list, integer), it is converted to a string, and then used to instantiate a seq object. I thought that someone could use an integer (e.g.: 010100010101101) as a sequence, and in this case, the integer is first converted to a string (otherwise Seq() would return an error). Please, take care with this patch: I have messed a bit with cvs and patches :(, so, this patch contains also a doctest example that I have added for my self (see bug report 2640). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
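The behaviour described for this patch could be sketched roughly as below. The Seq class here is a small stand-in so that the example runs without Biopython, and coerce_seq is a hypothetical helper name; neither is the code in attachment 1041.

```python
class Seq:
    """Stand-in for Bio.Seq.Seq, holding the sequence as a plain string."""

    def __init__(self, data):
        if not isinstance(data, str):
            raise TypeError("Seq data must be a string")
        self.data = data

    def __str__(self):
        return self.data


def coerce_seq(seq):
    """Return a Seq for the given argument, following the patch description."""
    if seq is None:
        raise ValueError("seq must not be None")
    if isinstance(seq, Seq):
        return seq  # already a Seq object: keep it unchanged
    # Anything else is converted to a string first, then wrapped in a Seq.
    return Seq(str(seq))


for value in ("ACGT", 101, ["A", "C"]):
    print(repr(value), "->", str(coerce_seq(value)))
```

Two side effects of the str() conversion are worth noting. A list such as ["A", "C"] becomes the text of its repr rather than "AC", and an integer written with leading zeros, like the 010100010101101 example, is read as an octal literal by Python 2 (and rejected by Python 3), so its digits would not survive the conversion.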
From bugzilla-daemon at portal.open-bio.org Thu Nov 6 06:31:57 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 6 Nov 2008 06:31:57 -0500
Subject: [Biopython-dev] [Bug 2643] New: Proposal: fastPhaseOutputIO for SeqIO
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2643

           Summary: Proposal: fastPhaseOutputIO for SeqIO
           Product: Biopython
           Version: Not Applicable
          Platform: PC
               URL: http://github.com/dalloliogm/biopython---popgen/tree/master/src/PopGen/Gio/fastPhaseOutputIO.py
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: dalloliogm at gmail.com
                CC: tiagoantao at gmail.com

Hi,
fastPHASE is software for haplotype reconstruction and missing genotype estimation from population genetic SNP data.
- http://stephenslab.uchicago.edu/software.html

It is commonly used by some population genetics bioinformaticians.

I had to convert the output from a fastPHASE run to FASTA, so I wrote a module that reads a fastPHASE output file and returns SeqRecord objects.

fastPHASE output contains information about SNPs and genotyping, and would probably be supported by the PopGen module that is being written for Biopython. However, my module is intended only to read the sequence information from the output file and to create SeqRecord objects, ignoring any other kind of information. So, in the future we could have two fastPhaseOutputIterator-like modules: one that creates SeqRecord objects, and another to be used in PopGen.

The module has been tested with doctest. I'll attach a file with the tests along with the module.
From bugzilla-daemon at portal.open-bio.org Thu Nov 6 06:40:17 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 6 Nov 2008 06:40:17 -0500
Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO
In-Reply-To:
Message-ID: <200811061140.mA6BeHwc003465@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2643

------- Comment #1 from dalloliogm at gmail.com 2008-11-06 06:40 EST -------
Created an attachment (id=1042)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1042&action=view)
fastPhase output iterator, for SeqIO

If invoked directly, this module tries to call doctest.testfile on a file called test_fastPhaseOutputIO.py (I will post it in 5 minutes). You should edit this module to point it to the right file path on your computer.

This module is intended to be used with SeqIO. You should modify SeqIO.__init__.py and add it to the _FormatToIterator dictionary.

I didn't write a Writer handler, because you are not supposed to create fastPhaseOutput files manually (even if it could be useful for testing purposes).

You can see the git history of this module here:
- http://github.com/dalloliogm/biopython---popgen/tree/master/src/PopGen/Gio/fastPhaseOutputIO.py
From bugzilla-daemon at portal.open-bio.org Thu Nov 6 06:42:55 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 6 Nov 2008 06:42:55 -0500
Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO
In-Reply-To:
Message-ID: <200811061142.mA6Bgt77003705@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2643

------- Comment #2 from dalloliogm at gmail.com 2008-11-06 06:42 EST -------
Created an attachment (id=1043)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1043&action=view)
this is a doctest file to test fastPhaseOutputIterator

This file is called by fastPhaseOutputIO when __name__ == '__main__'.

From bugzilla-daemon at portal.open-bio.org Thu Nov 6 06:44:55 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 6 Nov 2008 06:44:55 -0500
Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO
In-Reply-To:
Message-ID: <200811061144.mA6BitTU003910@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2643

------- Comment #3 from dalloliogm at gmail.com 2008-11-06 06:44 EST -------
Created an attachment (id=1044)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1044&action=view)
adds fastPhaseOutput support to SeqIO

This patch adds fastPhaseOutput support to SeqIO (not tested).
From bugzilla-daemon at portal.open-bio.org Thu Nov 6 06:50:39 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 6 Nov 2008 06:50:39 -0500 Subject: [Biopython-dev] [Bug 2639] SeqRecord.init doesn't check for arguments to their types In-Reply-To: Message-ID: <200811061150.mA6Bod9J004289@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2639 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|RESOLVED |REOPENED Resolution|FIXED | ------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-06 06:50 EST ------- (In reply to comment #3) > Created an attachment (id=1041) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1041&action=view) [details] > add a check for the seq argument in seqrecord, to be a Seq object and not None > > This patch adds a check for the seq argument in SeqRecord. > If seq is None (by default), it raises a ValueError Exception. > If it is a Seq objects, it saves it as self.seq. > If it is another kind of object (string, list, integer), it is converted to a > string, and then used to instantiate a seq object. I was deliberately not checking the seq argument. There are several reasonable use cases: * a Seq object (normal) or a subclass of it. * a MutableSeq object (seems reasonable, note this is not a subclass of Seq) * None (seems a good way to handle sequence records where we don't know the sequence - for example some GenBank files). * a user defined sequence object which implements the Seq API but does not subclass Seq or MutableSeq (this is more difficult to check). > I thought that someone could use an integer (e.g.: 010100010101101) as a > sequence, and in this case, the integer is first converted to a string > (otherwise Seq() would return an error). 
Note that if someone did want to use some weird numerical sequence, then the SeqRecord object should NOT be trying to do anything special (guessing what is intended). The user should create a suitable Seq object themselves (ideally with a numerical alphabet object). Explicit rather than implicit (Zen of python). -- Note that I'm not 100% happy with the type checking we've just added. See "duck-typing" and interfaces versus types, http://www.python.org/doc/2.5.2/tut/node18.html#l2h-46 The checks I've added shouldn't be too constraining - but maybe they should use using interface checking instead (or just revert back to no checking). Any comments from other people? This should be being CC'd to the dev mailing list. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Nov 6 07:14:04 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 6 Nov 2008 07:14:04 -0500 Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO In-Reply-To: Message-ID: <200811061214.mA6CE4PD005743@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2643 ------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-06 07:14 EST ------- Hi Marco, This looks interesting :) Could you attach the individual valid sample fastPHASE files as separate attachments (so they can be integrated into the existing unit tests). You seem to have picked very small files in order to use them as doctests; a larger more realistic example would be better for the unit tests (a few 5kb in size should be OK - not too big). Do you have URL for the file format documentation? Are they always DNA for example, or is RNA also possible? If you want to include a fastPHASE parser in Bio.SeqIO it should ideally cope with any valid fastPHASE output. 
In the doctests you have an example: ... BEGIN GENOTYPES ... Ind1 # subpop. label: 6 (internally 1) ... T ... T C ... Ind2 # subpop. label: 6 (internally 1) ... C ... T ... END GENOTYPES You're treating this as an error - "Two chromosomes with different length". Why isn't it parsed as four short sequences (of different lengths): "T", "TC", "C", "T"? Similarly, the final example: ... BEGIN GENOTYPES ... Ind1 # subpop. label: 6 (internally 1) ... T T T T T G A A A C C A A A G A C G C T G C G T C A G C C T G C A A T C T G ... Ind2 # subpop. label: 6 (internally 1) ... C T T T T G C C C T C A A A A G T G C T G T G C C A G T C T A C G G C C T G ... T T T T T G A A A C C A A A G A C G C T T C G T C A G T A T A C G A T C T A ... END GENOTYPES Again, you raised an error - "Missing sequence in input file". If this is a valid file shouldn't it be parsed as three sequences? On the other hand, are these hand edited files which deliberately break the rules? If fastPHASE files SHOULD always come in allele groups (of the same length), then it would be better to integrate the parser into Bio.AlignIO giving pairwise alignments (and you would be able to read it via Bio.SeqIO automatically as well). P.S. Your suggested format name "fastPhaseOutput" breaks the lower case rule. Would "fastphase" be OK, or is there more than one format? e.g. an input format which might be confused with this? Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
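The question of whether these blocks are sequence pairs or independent sequences can be explored with a small reader like the one below. It is a sketch under assumptions drawn only from the examples quoted above: an individual's line is taken to be any line containing "#", every other line inside the block is taken to be one haplotype with space-separated alleles, and the name parse_genotypes is made up here. It is not the parser in attachment 1042.

```python
def parse_genotypes(lines):
    """Yield (individual_id, [haplotype, ...]) from a BEGIN/END GENOTYPES block."""
    in_block = False
    current_id, haplotypes = None, []
    for line in lines:
        line = line.strip()
        if line == "BEGIN GENOTYPES":
            in_block = True
        elif line == "END GENOTYPES":
            break
        elif not in_block or not line:
            continue
        elif "#" in line:
            # Assumed: a line with "#" starts a new individual.
            if current_id is not None:
                yield current_id, haplotypes
            current_id, haplotypes = line.split("#", 1)[0].strip(), []
        else:
            # Assumed: alleles are separated by spaces; join them into one string.
            haplotypes.append("".join(line.split()))
    if current_id is not None:
        yield current_id, haplotypes


sample = """BEGIN GENOTYPES
Ind1 # subpop. label: 6 (internally 1)
T
T C
Ind2 # subpop. label: 6 (internally 1)
C
T
END GENOTYPES
"""

records = list(parse_genotypes(sample.splitlines()))
for name, haps in records:
    # A "pair" here means exactly two haplotypes of equal length.
    is_pair = len(haps) == 2 and len(haps[0]) == len(haps[1])
    print(name, haps, "pair" if is_pair else "not a pair")
```

Keeping the pairing test outside the reader leaves both options open: a SeqIO-style iterator could emit every haplotype as its own record, while an AlignIO-style reader could reject any individual that is not a pair.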
From bugzilla-daemon at portal.open-bio.org Thu Nov 6 07:21:09 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 6 Nov 2008 07:21:09 -0500 Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO In-Reply-To: Message-ID: <200811061221.mA6CL9e8006180@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2643 ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-06 07:21 EST ------- (In reply to comment #4) > You seem to have picked very small files in order to use them as > doctests; a larger more realistic example would be better for the > unit tests (a few 5kb in size should be OK - not too big). Sorry - that was a typo. I meant a few kb in size (5kb should be OK). I don't have a feel for the typical size of real fastPHASE output, but a few interesting real examples (e.g. covering a range of fastPHASE command line options) would be better than a single large file. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Nov 6 07:25:42 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 6 Nov 2008 07:25:42 -0500 Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO In-Reply-To: Message-ID: <200811061225.mA6CPgsn006472@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2643 ------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-06 07:25 EST ------- P.S. The module's docstring needs some work - your introduction for this bug might be a good start. 
We should include the URL http://stephenslab.uchicago.edu/software.html and the reference in the module's docstring:

Scheet, P and Stephens, M (2006) "A fast and flexible statistical model for large-scale population genotype data: applications to inferring missing genotypes and haplotypic phase." Am J Hum Genet 78(4):629-44.

From tiagoantao at gmail.com Thu Nov 6 08:18:54 2008
From: tiagoantao at gmail.com (Tiago Antão)
Date: Thu, 6 Nov 2008 13:18:54 +0000
Subject: [Biopython-dev] Preparing for Biopython 1.49 (beta)
In-Reply-To: <320fb6e00811040336k12a834b9o2fa103b8fabf7ec1@mail.gmail.com>
References: <320fb6e00811040336k12a834b9o2fa103b8fabf7ec1@mail.gmail.com>
Message-ID: <6d941f120811060518w388bd471g129aafdaf02381d4@mail.gmail.com>

On Tue, Nov 4, 2008 at 11:36 AM, Peter wrote:
> If this schedule is realistic, then Tiago should be OK to add his next
> set of PopGen code in about two weeks time (for what would become
> Biopython 1.50).

I am working on documentation and test cases for LDNe and extra GenePop support (this is more or less orthogonal to the ongoing discussion on statistics); the code has been done for weeks. I will start to upload it as soon as you unfreeze CVS after 1.49.
From bugzilla-daemon at portal.open-bio.org Thu Nov 6 09:24:12 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 6 Nov 2008 09:24:12 -0500 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200811061424.mA6EOCcB015073@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 ------- Comment #40 from bsouthey at gmail.com 2008-11-06 09:24 EST ------- (In reply to comment #38) > (In reply to comment #36) > > (In reply to comment #35) > > Okay, this is what I think of the main uses for translation. > > All these can be easily achieved by the translate arguments > > table='Standard' and stop_symbol='*' with very little code. > > So I do not see any need for any extra arguments except > > for convenience. (I have these uses in file that I will > > upload after this.) > > Most of your examples seem to relate to open reading frame searches, looking > for start/stop codons etc. I agree this kind of thing isn't needed in the > basic translate method/function. > > Doing a CDS translation however is more fiddly due to the methionine at the > start, and I think this warrents another option in the basic translate > method/function. > > > So really my only issue left is what is the expected behaviour for: > > a) to_stop_codon=True if there are no valid stop codons (my understanding of > > to_stop). > > If you are asking about the current to_stop argument in CVS right now, if there > is no in frame stop codon it will translate all the sequence (to_stop has no > effect). I've just updated the docstring to make this more explicit (see > Bio/Seq.py CVS revision 1.55). > > Do you think "to_stop_codon" is a clearer argument name than "to_stop"? > I think to_end because end does mean the end of the translation due to a stop codon or end of a sequence. 
> > b) from_start_codon=True (or init=True etc) if there are no valid start codons > > As written in attachment 1032 [details], if the sequence does not start with a valid > start codon an exception is raised. > Okay. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Nov 6 09:35:40 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 6 Nov 2008 09:35:40 -0500 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200811061435.mA6EZe5F015831@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 ------- Comment #41 from lpritc at scri.sari.ac.uk 2008-11-06 09:35 EST ------- (In reply to comment #40) > > If you are asking about the current to_stop argument in CVS right now, if there > > is no in frame stop codon it will translate all the sequence (to_stop has no > > effect). I've just updated the docstring to make this more explicit (see > > Bio/Seq.py CVS revision 1.55). > > > > Do you think "to_stop_codon" is a clearer argument name than "to_stop"? > > > I think to_end because end does mean the end of the translation due to a stop > codon or end of a sequence. I would take 'to_end' to mean 'to the end of the passed sequence, ignoring all stop codons along the way'. 'to_first_stop' is clearer, to my mind, and even that leaves out the potential (and hopefully redundant) qualifier 'in-frame' ;) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Thu Nov 6 09:46:48 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 6 Nov 2008 09:46:48 -0500 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200811061446.mA6Ekmfj016554@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 ------- Comment #42 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-06 09:46 EST ------- Peter in comment #40 >>> If you are asking about the current to_stop argument in CVS right now, >>> if there is no in frame stop codon it will translate all the sequence >>> (to_stop has no effect). I've just updated the docstring to make this >>> more explicit (see Bio/Seq.py CVS revision 1.55). >>> >>> Do you think "to_stop_codon" is a clearer argument name than "to_stop"? >>> Bruce in comment #41: >> I think to_end because end does mean the end of the translation >> due to a stop codon or end of a sequence. >> Leighton in comment #42: > I would take 'to_end' to mean 'to the end of the passed sequence, > ignoring all stop codons along the way'. 'to_first_stop' is > clearer, to my mind, and even that leaves out the potential (and > hopefully redundant) qualifier 'in-frame' ;) > I agree with Leighton here, "to_end" sounds like "to the end of the sequence given". I quite like "to_first_stop", but it is longer than "to_stop". -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Thu Nov 6 10:07:06 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 6 Nov 2008 10:07:06 -0500 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200811061507.mA6F76PK018513@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 ------- Comment #43 from bsouthey at gmail.com 2008-11-06 10:07 EST ------- (In reply to comment #39) > Created an attachment (id=1040) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1040&action=view) [details] > Patch to Bio/Seq.py for complete CDS translation. > > (In reply to comment #33) > > Instead of the "init" start codon option in attachment 1032 [details], > > I'd also be happy with a single boolean argument which does > > start codon validation, treats this as a methionine, checks > > the sequence is a multiple of three in length, checks for a > > final stop codon, and checks for no additional stop codons. > > We'd ruled out calling this "complete", but maybe "cds" > > would be better? > > This patch adds this functionality via a "complete_cds" boolean argument. > > Here is how it could be applied to translate the CDS used as an example in my > comment 35, the yaaX gene in E. 
coli K12: > > >>> from Bio.Seq import Seq > >>> my_cds = Seq("GTGAAAAAGATGCAATCTATCGTACTCGCACTTTCCCTGGTTCTGGTCGCTCCCATGGCAGCACAGGCTGCGGAAATTACGTTAGTCCCGTCAGTAAAATTACAGATAGGCGATCGTGATAATCGTGGCTATTACTGGGATGGAGGTCACTGGCGCGACCACGGCTGGTGGAAACAACATTATGAATGGCGAGGCAATCGCTGGCACCTACACGGACCGCCGCCACCGCCGCGCCACCATAAGAAAGCTCCTCATGATCATCACGGCGGTCATGGTCCAGGCAAACATCACCGCTAA") > >>> my_cds.translate(table=11) > Seq('VKKMQSIVLALSLVLVAPMAAQAAEITLVPSVKLQIGDRDNRGYYWDGGHWRDH...HR*', > HasStopCodon(ExtendedIUPACProtein(), '*')) > >>> my_cds.translate(table=11, to_stop=True) > Seq('VKKMQSIVLALSLVLVAPMAAQAAEITLVPSVKLQIGDRDNRGYYWDGGHWRDH...HHR', > ExtendedIUPACProtein()) > >>> my_cds.translate(table=11, complete_cds=True) > Seq('MKKMQSIVLALSLVLVAPMAAQAAEITLVPSVKLQIGDRDNRGYYWDGGHWRDH...HHR', > ExtendedIUPACProtein()) > > I would be happy with EITHER of these options, as both can be used to translate > a complete coding sequence: > > (1) the "init" argument (under another name, maybe "cds_start"?) illustrated in > attachment 1032 [details]. This would check the start codon is valid AND translate it as > a methionine. > > (2) the "complete_cds" argument (perhaps under another name, maybe "cds"?) > illustrated in this patch. This would check the start codon is valid AND > translate it as a methionine AND check there are a whole number of codons AND > check it ends with a stop codon AND check there are no extra in-frame stop > codons. > I support (1) but strongly disagree with (2) because 'cds' refers to a complete DNA sequence not just if the sequence starts with M. http://www.yeastgenome.org/help/glossary.html "CDS: CoDing Sequence, region of nucleotides that corresponds to the sequence of amino acids in the predicted protein. The CDS includes start and stop codons, therefore coding sequences begin with an "ATG" and end with a stop codon. In SGD, unexpressed sequences, including the 5'-UTR, the 3'-UTR, introns, or bases not expressed due to frameshifting, are not included within a CDS. 
Note that the CDS does not correspond to the actual mRNA sequence." However, I do like being able to obtain the translation of the actual CDS - just not here. I do not support the name 'init' because of reasons discussed. I do not support the name 'cds_start' because of the DNA interpretation and that many Genbank records include the upstream and downstream non-coding regions. In such cases, I would have to find the actual start codon, then I might as well do the translation after that start codon than rely on a check that might be wrong. Perhaps some variant of: a) Similar cases in Python: has_met or has_met1 get_met or get_met1 b) More direct meaning: starts_with_methionine, starts_with_met, starts_with_m -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Nov 6 10:08:17 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 6 Nov 2008 10:08:17 -0500 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200811061508.mA6F8HRo018696@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 ------- Comment #44 from bsouthey at gmail.com 2008-11-06 10:08 EST ------- (In reply to comment #42) > Peter in comment #40 > >>> If you are asking about the current to_stop argument in CVS right now, > >>> if there is no in frame stop codon it will translate all the sequence > >>> (to_stop has no effect). I've just updated the docstring to make this > >>> more explicit (see Bio/Seq.py CVS revision 1.55). > >>> > >>> Do you think "to_stop_codon" is a clearer argument name than "to_stop"? > >>> > > Bruce in comment #41: > >> I think to_end because end does mean the end of the translation > >> due to a stop codon or end of a sequence. 
> >>
> >
> > Leighton in comment #42:
> > I would take 'to_end' to mean 'to the end of the passed sequence,
> > ignoring all stop codons along the way'. 'to_first_stop' is
> > clearer, to my mind, and even that leaves out the potential (and
> > hopefully redundant) qualifier 'in-frame' ;)
> >
>
> I agree with Leighton here, "to_end" sounds like "to the end of the sequence
> given". I quite like "to_first_stop", but it is longer than "to_stop".
>

Either is fine with me.

From bugzilla-daemon at portal.open-bio.org Thu Nov 6 10:11:38 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 6 Nov 2008 10:11:38 -0500
Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO
In-Reply-To:
Message-ID: <200811061511.mA6FBcAY019165@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2643

------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-06 10:11 EST -------
I've now had a quick look at the fastPHASE documentation, and I have the
impression that the sequences should always come in pairs:

"Output files for inferred haplotypes or imputed genotypes contain two lines
per given diploid individual, with the order of individuals corresponding to
that supplied in the input file."

Assuming the paired sequences are always the same length, this does suggest the
format should be integrated into Bio.AlignIO (giving pairwise alignments)
rather than Bio.SeqIO.

Have you tried not estimating the haplotypes (by supplying a negative integer
following -H), and does this alter the sequence output?

Finally could you try the -Z command line argument for the simplified output
format (described as two lines per individual, without "id" lines,
subpopulation labels or summary information from the run).
Does this have the sequences? If so this may be a more parser friendly set of output to parse for Bio.SeqIO and/or Bio.AlignIO. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Nov 6 10:27:07 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 6 Nov 2008 10:27:07 -0500 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200811061527.mA6FR7TQ021259@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 ------- Comment #45 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-06 10:27 EST ------- (In reply to comment #43) > (In reply to comment #39) > > I would be happy with EITHER of these options, as both can be used to > > translate a complete coding sequence: > > > > (1) the "init" argument (under another name, maybe "cds_start"?) > > illustrated in attachment 1032. This would check the start > > codon is valid AND translate it as a methionine. > > > > (2) the "complete_cds" argument (perhaps under another name, maybe "cds"?) > > illustrated in this patch. This would check the start codon is valid AND > > translate it as a methionine AND check there are a whole number of codons > > AND check it ends with a stop codon AND check there are no extra in-frame > > stop codons. > > > > > I support (1) but strongly disagree with (2) because 'cds' refers to > a complete DNA sequence not just if the sequence starts with M. > http://www.yeastgenome.org/help/glossary.html > "CDS: CoDing Sequence, region of nucleotides that corresponds to the > sequence of amino acids in the predicted protein. The CDS includes start and > stop codons, therefore coding sequences begin with an "ATG" and end with a > stop codon. 
In SGD, unexpressed sequences, including the 5'-UTR, the 3'-UTR, > introns, or bases not expressed due to frameshifting, are not included within > a CDS. Note that the CDS does not correspond to the actual mRNA sequence." Starting with that definition but being aware of atypical start codons gives: "The CDS includes start and stop codons, therefore coding sequences begin with an "ATG" [or other valid start codon] and end with a stop codon." This then fits exactly with what I'm doing in the "complete_cds" option (attachment 1040). So why the disagreement? > However, I do like being able to obtain the translation of the actual > CDS - just not here. Back in comment 11, I previously mooted having separate methods like translate_to_stop, and translate_cds - but we currently seem to be leaning towards one method with some options. > I do not support the name 'init' because of reasons discussed. I think that is settled, "init" is too ambiguous. > I do not support the name 'cds_start' because of the DNA interpretation and > that many Genbank records include the upstream and downstream non-coding > regions. In such cases, I would have to find the actual start codon, then I > might as well do the translation after that start codon than rely on a check > that might be wrong. In such cases, if your sequence might includes upstream and downstream non-coding regions, then you shouldn't be trying to use the "init"/"cds_start" option (or the "complete_cds" option). By the nature of your uncertain dataset, you'll have to do some extra work to find the start/stop. I don't see how this is an argument against providing an option useful for when you do know where the CDS starts (or do already have the CDS). 
> Perhaps some variant of: > a) Similar cases in Python: > has_met or has_met1 > get_met or get_met1 > b) More direct meaning: > starts_with_methionine, starts_with_met, starts_with_m > I'd been avoiding names with methionine in them, preferring to focus on initiation or start codon based names. I guess "starts_with_met" is OK. Or maybe "start_met"? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Nov 6 10:28:20 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 6 Nov 2008 10:28:20 -0500 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200811061528.mA6FSKMv021486@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 ------- Comment #46 from lpritc at scri.sari.ac.uk 2008-11-06 10:28 EST ------- (In reply to comment #43) > > (2) the "complete_cds" argument (perhaps under another name, maybe "cds"?) > > illustrated in this patch. This would check the start codon is valid AND > > translate it as a methionine AND check there are a whole number of codons AND > > check it ends with a stop codon AND check there are no extra in-frame stop > > codons. > I support (1) but strongly disagree with (2) because 'cds' refers to a complete > DNA sequence not just if the sequence starts with M. > http://www.yeastgenome.org/help/glossary.html > "CDS: CoDing Sequence, region of nucleotides that corresponds to the > sequence of amino acids in the predicted protein. The CDS includes start and > stop codons, therefore coding sequences begin with an "ATG" and end with a stop > codon. In SGD, unexpressed sequences, including the 5'-UTR, the 3'-UTR, > introns, or bases not expressed due to frameshifting, are not included within a > CDS. 
> Note that the CDS does not correspond to the actual mRNA sequence."

That definition seems to correspond exactly to (2), above; not that web-based
definitions have any particular authority ;)

"Begin with an ATG" is a eukaryote-specific statement; "Begin with a (valid)
start codon" covers this. "End with a stop codon", implying the *first
in-frame* stop codon, is the same in both cases. Where do you see that they
differ?

> I do not support the name 'cds_start' because of the DNA interpretation and
> that many Genbank records include the upstream and downstream non-coding
> regions. In such cases, I would have to find the actual start codon, then I
> might as well do the translation after that start codon than rely on a check
> that might be wrong.

I don't think that the argument is proposed for that particular use-case,
which is why I don't think it's valid, there. If, say, you knew that the
5' UTR ran to base 17, then you could check with

seq[17:].translate(complete_cds=True)

or some such arrangement - but that's not the problem that's being solved with
that method argument, I think.

> Perhaps some variant of:
> a) Similar cases in Python:
> has_met or has_met1
> get_met or get_met1
> b) More direct meaning:
> starts_with_methionine, starts_with_met, starts_with_m

I quite like this way of checking sequence properties, and would prefer an
is_cds() (or, to be pedantic, is_conceptual_cds()) method that returns a
Boolean, but otherwise implements the sort of behaviour described above.

If you only wanted the conceptual translations of sequences that fit the
criteria for a CDS, then a one-liner to replace

[seq.translate(cds=True) for seq in seqlist]

might be

[seq.translate() for seq in seqlist if seq.is_cds()]

I prefer the second option, for readability, but YMMV.
From bugzilla-daemon at portal.open-bio.org Thu Nov 6 11:06:46 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 6 Nov 2008 11:06:46 -0500 Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO In-Reply-To: Message-ID: <200811061606.mA6G6kL7028787@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2643 ------- Comment #8 from dalloliogm at gmail.com 2008-11-06 11:06 EST ------- (In reply to comment #4) > Hi Marco, Hi!! :) > This looks interesting :) > > Could you attach the individual valid sample fastPHASE files as separate > attachments (so they can be integrated into the existing unit tests). You seem > to have picked very small files in order to use them as doctests; a larger more > realistic example would be better for the unit tests (a few 5kb in size should > be OK - not too big). ok Actually I have been using files which come from our laboratory analysis, and I would like to ask if I include them here and how first. > Do you have URL for the file format documentation? The fastphase format seems to be described only in fastphase's manual, which is only accessible after accepting a license agreement. I could contact the authors of the program to ask them to publish the format specifications publicly. It would be in their interest, as otherwise the format could be considered as a not standard. I'll let you know.. > Are they always DNA for example, or is RNA also possible? They should be DNA, In principle they could be also genes, or other kind of characters, but this software is designed for the purpose of reconstructing haplotypes from SNPs/microsatellites. Maybe Tiago has some more experience in this.. > If you want to include a fastPHASE parser in Bio.SeqIO it should ideally cope > with any valid fastPHASE output. In the doctests you have an example: > > ... BEGIN GENOTYPES > ... Ind1 # subpop. label: 6 (internally 1) > ... T > ... T C > ... Ind2 # subpop. 
label: 6 (internally 1) > ... C > ... T > ... END GENOTYPES > You're treating this as an error - "Two chromosomes with different length". > Why isn't it parsed as four short sequences (of different lengths): "T", "TC", > "C", "T"? You should not have a file in which a chromosome is longer than the other one... instead, you should have a '?' indicating data that the program could not infer. > Similarly, the final example: > > ... BEGIN GENOTYPES > ... Ind1 # subpop. label: 6 (internally 1) > ... T T T T T G A A A C C A A A G A C G C T G C G T C A G C C T G C A A T C T G > ... Ind2 # subpop. label: 6 (internally 1) > ... C T T T T G C C C T C A A A A G T G C T G T G C C A G T C T A C G G C C T G > ... T T T T T G A A A C C A A A G A C G C T T C G T C A G T A T A C G A T C T A > ... END GENOTYPES > > Again, you raised an error - "Missing sequence in input file". If this is a > valid file shouldn't it be parsed as three sequences? Because that would mean that one individual has only a chromosome. It doesn't make sense to run fastPhase on an haploid individual. > On the other hand, are these hand edited files which deliberately break the > rules? Yes. Usually you shouldn't have neither of the two cases. But I find it useful when a script tells me if there are weird things in my files (I could have modified them accidentally). This could be refactored in a check_fileformat function. > If fastPHASE files SHOULD always come in allele groups (of the same > length), then it would be better to integrate the parser into Bio.AlignIO > giving pairwise alignments (and you would be able to read it via Bio.SeqIO > automatically as well). This is good idea, I didn't think of it. But how should I modify the module to produce AlignIO objects? > P.S. Your suggested format name "fastPhaseOutput" breaks the lower case rule. > Would "fastphase" be OK, or is there more than one format? e.g. an input > format which might be confused with this? I agree.. 
I wasn't sure of Biopython's naming conventions.

>
> Peter
> Scheet and Stephens (2006)

From bugzilla-daemon at portal.open-bio.org Thu Nov 6 11:12:15 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 6 Nov 2008 11:12:15 -0500
Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO
In-Reply-To:
Message-ID: <200811061612.mA6GCFHq029869@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2643

------- Comment #9 from dalloliogm at gmail.com 2008-11-06 11:12 EST -------
(In reply to comment #7)
> I've now had a quick look at the fastPHASE documentation, and I have the
> impression that the sequences should always come in pairs:

right!

> "Output files for inferred haplotypes or imputed genotypes contain two lines
> per given diploid individual, with the order of individuals corresponding to
> that supplied in the input file."
>
> Assuming the paired sequences are always the same length, this does suggest the
> format should be integrated into Bio.AlignIO (giving pairwise alignments)
> rather than Bio.SeqIO.

> Have you tried not estimating the haplotypes (by supplying a negative integer
> following -H), and does this alter the sequence output?

I will try it, ok.

> Finally could you try the -Z command line argument for the simplified output
> format (described as two lines per individual, without "id" lines,
> subpopulation labels or summary information from the run). Does this have the
> sequences? If so this may be a more parser friendly set of output to parse for
> Bio.SeqIO and/or Bio.AlignIO.

ok, I can try to implement both formats, but for the moment I would
prefer to concentrate on one.
From bugzilla-daemon at portal.open-bio.org Thu Nov 6 12:11:26 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 6 Nov 2008 12:11:26 -0500
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq)
In-Reply-To:
Message-ID: <200811061711.mA6HBQN5007343@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381

------- Comment #47 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-06 12:11 EST -------
(In reply to comment #46)
> If you only wanted the conceptual translations of sequences that fit the
> criteria for a CDS, then a one-liner to replace
>
> [seq.translate(cds=True) for seq in seqlist]
>
> might be
>
> [seq.translate() for seq in seqlist if seq.is_cds()]
>
> I prefer the second option, for readability, but YMMV.
>

Note the above wouldn't give you translations starting with methionine, you'd
need something like:

[seq.translate(cds_start=True) for seq in seqlist if seq.is_cds()]

(assuming we call the "init" option "cds_start")

Or, going with the complete_cds option you could build a list of translations
of valid CDSs like this:

proteins = []
for seq in seqlist:
    try:
        proteins.append(seq.translate(complete_cds=True))
    except ValueError:
        # Not a valid CDS, excluded
        pass

Not a one liner, but I think in a real situation you'd want to do something
with the invalid CDSs anyway (even if just logging them).
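The checks that the thread attributes to the proposed "complete_cds" option (valid start codon translated as methionine, whole number of codons, final stop codon, no extra in-frame stop codons) can be sketched in plain Python. This is only an illustration and not the code from attachment 1040: the function name translate_complete_cds is hypothetical, the start codons are those of NCBI table 11 hard-coded as an assumption, and only unambiguous A/C/G/T input is handled.

```python
from itertools import product

# Standard/bacterial codon-to-amino-acid mapping in the usual TCAG order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
# Assumption: start codons of NCBI translation table 11 (bacterial).
START_CODONS = frozenset(["TTG", "CTG", "ATT", "ATC", "ATA", "ATG", "GTG"])


def translate_complete_cds(dna, start_codons=START_CODONS):
    """Translate a complete CDS, raising ValueError if any check fails.

    Hypothetical helper, not part of Biopython.
    """
    dna = dna.upper()
    if not set(dna) <= set(BASES):
        raise ValueError("only unambiguous A, C, G, T are supported here")
    if len(dna) < 6 or len(dna) % 3 != 0:
        raise ValueError("sequence is not a whole number of codons")
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    if codons[0] not in start_codons:
        raise ValueError("first codon %r is not a start codon" % codons[0])
    if CODON_TABLE[codons[-1]] != "*":
        raise ValueError("final codon %r is not a stop codon" % codons[-1])
    body = "".join(CODON_TABLE[codon] for codon in codons[1:-1])
    if "*" in body:
        raise ValueError("extra in-frame stop codon found")
    # The start codon is always read as methionine, whatever it encodes.
    return "M" + body


# Toy input: one valid CDS followed by three sequences that fail a check.
seqlist = ["GTGAAAAAGATGTAA", "GTGAAATA", "AAAAAATAA", "GTGTAAAAATAA"]
proteins, rejected = [], []
for seq in seqlist:
    try:
        proteins.append(translate_complete_cds(seq))
    except ValueError as err:
        rejected.append((seq, str(err)))  # keep the reason for logging
```

Collecting the rejected sequences together with the error message follows the remark above that, in a real situation, the invalid CDSs would probably be logged rather than silently dropped.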
From bugzilla-daemon at portal.open-bio.org Thu Nov 6 12:32:52 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 6 Nov 2008 12:32:52 -0500 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200811061732.mA6HWqE7009337@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 ------- Comment #48 from lpritc at scri.sari.ac.uk 2008-11-06 12:32 EST ------- (In reply to comment #47) > (In reply to comment #46) > > [seq.translate() for seq in seqlist if seq.is_cds()] > > > > I prefer the second option, for readability, but YMMV. > > Note the above wouldn't give you translations starting with methionine, you'd > need something like: > > [seq.translate(cds_start=True) for seq in seqlist if seq.is_cds()] > > (assuming we call the "init" option "cds_start") Fair point... my focus was on putting that filter into the list comprehension. > Or, going with the complete_cds option you could build a list of translations > of valid CDSs like this: > > proteins = [] > for seq in seqlist : > try : > proteins.append(seq.translate(complete_cds=True)) > except ValueError : > #Not a valid CDS, excluded > pass > > Not a one liner, but I think in a real situation you'd want to do something > with the invalid CDSs anyway (even if just logging them). True enough. It comes down in part to a preference of style, as the same could be achieved with proteins = [] for seq in seqlist : if seq.is_cds(): proteins.append(seq.translate(complete_cds=True)) else: #Not a valid CDS, excluded pass I think the clarity of this arrangement to my eyes comes from 'is/is not a cds' being - naturally-speaking - a property or attribute of the sequence itself. 
The 'cds_start' argument in your example is then an instruction to treat the
translation as though you have a CDS, and implement some specialised behaviour
that is appropriate under that circumstance, rather than to implement a test
that raises an error if it fails. By separating the 'is_cds()' call from the
'cds_start' argument, you gain the ability to translate the sequence with
either the methionine or the coded amino acid, without losing the test of the
sequence being a CDS.

Of course, using the 'cds_start=True' argument could force a call to
self.is_cds(), anyway. Your non-one-liner could then be as you originally
wrote:

proteins = []
for seq in seqlist:
    try:
        proteins.append(seq.translate(complete_cds=True))
    except ValueError:
        # Not a valid CDS, excluded
        pass

The two advantages I see to having the is_cds() method as a separate call are
that it separates the determination of the sequence's CDS status from its
translation, and that it provides a filter that is more readable than
attempting to translate the sequence to find out if it's a valid CDS.

If the 'cds_start' argument forces a self.is_cds() test, then the usage can
be - I think - exactly as you've been proposing throughout the thread.
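The separate-predicate style argued for above can be sketched as a free function that returns a Boolean and is used as a filter. This is a hedged illustration only: is_cds here is a hypothetical stand-alone function (the thread proposes a method on Seq, which is not assumed to exist), and the start and stop codon sets are hard-coded for NCBI table 11 as an assumption.

```python
# Assumption: start and stop codons of NCBI translation table 11.
START_CODONS = frozenset(["TTG", "CTG", "ATT", "ATC", "ATA", "ATG", "GTG"])
STOP_CODONS = frozenset(["TAA", "TAG", "TGA"])


def is_cds(dna, start_codons=START_CODONS, stop_codons=STOP_CODONS):
    """Return True if dna looks like a complete CDS (hypothetical helper).

    Checks: whole number of codons, valid start codon, final stop codon,
    and no additional in-frame stop codons.
    """
    dna = dna.upper()
    if len(dna) < 6 or len(dna) % 3 != 0:
        return False
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    return (
        codons[0] in start_codons
        and codons[-1] in stop_codons
        and not any(codon in stop_codons for codon in codons[1:-1])
    )


# Toy input: the predicate works as a filter in a list comprehension,
# independently of how (or whether) the sequences are then translated.
seqlist = ["ATGAAATAA", "ATGTAAAAATAA", "AAAAAATAA", "GTGAAATGA"]
cds_only = [seq for seq in seqlist if is_cds(seq)]
```

Because the predicate never raises, the caller decides what to do with sequences that fail it, which is the readability trade-off being discussed against the try/except form.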
From bugzilla-daemon at portal.open-bio.org Thu Nov 6 12:33:12 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 6 Nov 2008 12:33:12 -0500 Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO In-Reply-To: Message-ID: <200811061733.mA6HXCuE009403@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2643 ------- Comment #10 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-06 12:33 EST ------- (In reply to comment #8) > > ok > Actually I have been using files which come from our laboratory analysis, > and I would like to ask if I include them here and how first. If you can get permission to include a real example (and its not too big) that would be great. Ideally something with at least three alleles. > > Do you have URL for the file format documentation? > > The fastphase format seems to be described only in fastphase's manual, > which is only accessible after accepting a license agreement. > I could contact the authors of the program to ask them to publish the format > specifications publicly. It would be in their interest, as otherwise the > format could be considered as a not standard. I'll let you know. It's not very open, is it :( Are there any other tools that output this file format? Do you think the author might be willing to just add an option to output the sequences in another format (e.g. FASTA, or better an alignment format designed for more than one alignment). This would be a neater solution in the long run (and would benefit anyone using fastPhase - not just Biopython). > > Are they always DNA for example, or is RNA also possible? > > They should be DNA, In principle they could be also genes, or other kind of > characters, but this software is designed for the purpose of reconstructing > haplotypes from SNPs/microsatellites. > Maybe Tiago has some more experience in this.. 
If it is for DNA only, the sequences/alignments returned should ideally specify
a DNA alphabet.

> ...
> Because that would mean that one individual has only a chromosome.
> It doesn't make sense to run fastPhase on an haploid individual.

Is fastPhase only for haploids? Could it be used with polyploidy (e.g.
plants)?

> > On the other hand, are these hand edited files which deliberately break the
> > rules?
>
> Yes. Usually you shouldn't have neither of the two cases. But I find it
> useful when a script tells me if there are weird things in my files (I
> could have modified them accidentally).

Yes - negative test cases are good. However, having them as a doctest made the
docstring rather confusing.

> > If fastPHASE files SHOULD always come in allele groups (of the same
> > length), then it would be better to integrate the parser into Bio.AlignIO
> > giving pairwise alignments (and you would be able to read it via Bio.SeqIO
> > automatically as well).
>
> This is good idea, I didn't think of it.
> But how should I modify the module to produce AlignIO objects?

Essentially, instead of:

    yield record_one
    yield record_two

you'd do something like this:

    alignment = Alignment(generic_dna)
    alignment.add_sequence(id_one, seq_one)
    alignment.add_sequence(id_two, seq_two)
    yield alignment

> > P.S. Your suggested format name "fastPhaseOutput" breaks the lower case
> > rule. Would "fastphase" be OK, or is there more than one format? e.g.
> > an input format which might be confused with this?
>
> I agree.. I wasn't sure of biopython's naming conventions.
>

This is written down elsewhere - but the format name is a lowercase string (and
this is enforced in the API), and the same names are used in both SeqIO and
AlignIO. Where possible we use the same name as BioPerl's SeqIO and EMBOSS.

(In reply to comment #9)
> (In reply to comment #7)
> > Finally could you try the -Z command line argument for the simplified output
> > format (described as two lines per individual, without "id"
lines, > > subpopulation labels or summary information from the run). Does this have > > the sequences? If so this may be a more parser friendly set of output to > > parse for Bio.SeqIO and/or Bio.AlignIO. > > ok, I can try to implement both of the two formats, but for the moment I will > prefer to concetrate on one. I was actually thinking the -Z format might be much simpler to deal with (I didn't mean to suggest supporting both). On the other hand, the documentation does say the -Z is "not intended for general use". Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From dalloliogm at gmail.com Thu Nov 6 13:09:55 2008 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Thu, 6 Nov 2008 19:09:55 +0100 Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO In-Reply-To: <200811061733.mA6HXCuE009403@portal.open-bio.org> References: <200811061733.mA6HXCuE009403@portal.open-bio.org> Message-ID: <5aa3b3570811061009i29bb2faflb456978dacbf5218@mail.gmail.com> On Thu, Nov 6, 2008 at 6:33 PM, wrote: > > > > > ------- Comment #10 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-06 12:33 EST ------- > (In reply to comment #8) >> >> ok >> Actually I have been using files which come from our laboratory analysis, >> and I would like to ask if I include them here and how first. > > If you can get permission to include a real example (and its not too big) that > would be great. Ideally something with at least three alleles. ok.. >> > Do you have URL for the file format documentation? >> >> The fastphase format seems to be described only in fastphase's manual, >> which is only accessible after accepting a license agreement. >> I could contact the authors of the program to ask them to publish the format >> specifications publicly. 
It would be in their interest, as otherwise the >> format could be considered as a not standard. I'll let you know. > > It's not very open, is it :( > > Are there any other tools that output this file format? Do you think the > author might be willing to just add an option to output the sequences in > another format (e.g. FASTA, or better an alignment format designed for more > than one alignment). This would be a neater solution in the long run (and > would benefit anyone using fastPhase - not just Biopython). Not for my knowledge. Anyway, consider that a fastPhase run could take days for medium/big samples. In some situations it could be faster to convert its output to fasta (or other ones) directly, instead of re-calculating the results. >> > Are they always DNA for example, or is RNA also possible? >> >> They should be DNA, In principle they could be also genes, or other kind of >> characters, but this software is designed for the purpose of reconstructing >> haplotypes from SNPs/microsatellites. >> Maybe Tiago has some more experience in this.. > > If it is for DNA only, the sequences/alignments returned should ideally specify > a DNA alphabet. mmm ok... Basically it could be used also with characters like genes and other markers.. but in that case, it would not make sense to parse it as a sequence, so nobody would try to do it. >> Because that would mean that one individual has only a chromosome. >> It doesn't make sense to run fastPhase on an haploid individual. > > Is fastPhase only for haploids? Could it be used with polyploidy (e.g. > plants)? I think not... It would be another class of problem. What fastPhase does, is trying to infer haplotypes from genotype data. Humans and most eukaryotes are diploid, so they have two copies of each chromosome; when you genotype markers, for every individuals, you get two informations for each (e.g. 'AC' for a SNP). 
Let's say you are studying two SNPs in a single individual: you will have 'AC'
for the first marker and 'GT' for the second (you already know that they are
on the same chromosome). You want to know the haplotypes, that is, whether the
'A' from the first SNP is on the same molecule as the 'G' from the second SNP,
and so on. For example, you could have one chromosome with 'AG' and the other
with 'CT', or one chromosome with 'AT' and the other with 'CG', and fastPhase
tries to calculate which is the most likely (I won't be able to explain all
the details properly). Moreover, fastPhase (there are other programs) can
infer missing genotype data, which is useful when you have big collections of
SNPs.

That said, I don't know if it is able to infer haplotypes in polyploid
organisms, but I don't think so, as it would be a different (more complex)
class of problem. I thought that the best thing to do is not to support
polyploidy; if someone who uses fastPhase for that comes along, it would be
easy to adapt the module for it (it would just require adding an option).

>> > On the other hand, are these hand edited files which deliberately break the
>> > rules?
>>
>> Yes. Usually you shouldn't have neither of the two cases. But I find it
>> useful when a script tells me if there are weird things in my files (I
>> could have modified them accidentally).
>
> Yes - negative test cases are good. However, having them as a doctest made the
> docstring rather confusing.

mmm I know, that doctest could be refactored. I have started using tests
recently... I find it is a lot better.

>
>> > If fastPHASE files SHOULD always come in allele groups (of the same
>> > length), then it would be better to integrate the parser into Bio.AlignIO
>> > giving pairwise alignments (and you would be able to read it via Bio.SeqIO
>> > automatically as well).
>>
>> This is good idea, I didn't think of it.
>> But how should I modify the module to produce AlignIO objects?
> > Essentially Instead of: > > yield record_one > yield record_two > > you'd do something like this: > > alignment = Alignment(generic_dna) > alignment.add_sequence(id_one, seq_one) > alignment.add_sequence(id_two, seq_two) > yield alignment sounds easy :) > >> > P.S. Your suggested format name "fastPhaseOutput" breaks the lower case >> > rule. Would "fastphase" be OK, or is there more than one format? e.g. >> > an input format which might be confused with this? >> >> I agree.. I wasn't sure of biopython's naming conventions. >> > > This is written down elsewhere - but the format name is a lowercase string (and > this is enforced in the API), and the same names are used in both SeqIO and > AlignIO. Where possible we use the same name as BioPerl's SeqIO and EMBOSS. > > (In reply to comment #9) >> (In reply to comment #7) >> > Finally could you try the -Z command line argument for the simplified output >> > format (described as two lines per individual, without "id" lines, >> > subpopulation labels or summary information from the run). Does this have >> > the sequences? If so this may be a more parser friendly set of output to >> > parse for Bio.SeqIO and/or Bio.AlignIO. >> >> ok, I can try to implement both of the two formats, but for the moment I will >> prefer to concetrate on one. > > I was actually thinking the -Z format might be much simpler to deal with (I > didn't mean to suggest supporting both). On the other hand, the documentation > does say the -Z is "not intended for general use". The problem is that it could take days to run a fastPhase... most of the times you want the longer format, and then proceed to parse it. Anyway, it should not be a big problem to implement it (I am just putting all of that information in SeqRecord.description) > > Peter > > > -- > Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email > ------- You are receiving this mail because: ------- > You are the assignee for the bug, or are watching the assignee. 
> > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > > -- ----------------------------------------------------------- My Blog on Bioinformatics (italian): http://bioinfoblog.it From bugzilla-daemon at portal.open-bio.org Thu Nov 6 13:20:20 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 6 Nov 2008 13:20:20 -0500 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200811061820.mA6IKK31012133@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 ------- Comment #49 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-06 13:20 EST ------- OK - thank you all for your input thus far. Unfortunately it is clear that we haven't reached a consensus about translating sequences which begin with the start codon (or the more special case of translating a CDS sequence). However, I hope we are all happy with how things look in CVS right now, which offers a blind translation continuing over any stop codon, and the "to_stop" option which will terminate translation at the first in frame stop codon: See http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Seq.py?cvsroot=biopython for the full code, but in summary: class Seq(object): ... def translate(self, table="Standard", stop_symbol="*", to_stop=False): """Turns a nucleotide sequence into a protein sequence. New Seq object. Trying to back-transcribe a protein sequence raises an exception. This method will translate DNA or RNA sequences. Trying to translate a protein sequence raises an exception. table - Which codon table to use? This can be either a name (string) or an NCBI identifier (integer). This defaults to the "Standard" table. stop_symbol - Single character string, what to use for terminators. This defaults to the asterisk, "*". 
to_stop - Boolean, defaults to False meaning do a full translation continuing on past any stop codons (translated as the specified stop_symbol). If True, translation is terminated at the first in frame stop codon (and the stop_symbol is not appended to the returned protein sequence). ... With the module level function taking the same arguments: def translate(sequence, table="Standard", stop_symbol="*", to_stop=False): """Translate a nucleotide sequence into amino acids. If given a string, returns a new string object. Given a Seq or MutableSeq, returns a Seq object with a protein alphabet. ... I think everyone is content with the naming of the "to_stop" argument. I'm planning to prepare the Biopython 1.49 beta release tomorrow, so I'm proposing we leave translation like this for Biopython 1.49 (and close this bug), and revisit translation after that is done (hopefully in less than two weeks time). The code in CVS is still a big improvement in terms of writing object orientated code. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Nov 6 13:34:03 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 6 Nov 2008 13:34:03 -0500 Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO In-Reply-To: Message-ID: <200811061834.mA6IY3ra013125@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2643 ------- Comment #11 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-06 13:34 EST ------- Replying to Marco's email on the dev mailing list: >> Are there any other tools that output this file format? Do you think the >> author might be willing to just add an option to output the sequences in >> another format (e.g. FASTA, or better an alignment format designed for more >> than one alignment). 
This would be a neater solution in the long run (and >> would benefit anyone using fastPhase - not just Biopython). > > Not for my knowledge. > Anyway, consider that a fastPhase run could take days for medium/big samples. > In some situations it could be faster to convert its output to fasta > (or other ones) directly, instead of re-calculating the results. OK - I had not appreciated the run time involved. Clearly it would not be sensible to have to repeat a long analysis just to get the results in another format (e.g. as FASTA, or the simplified -Z output whatever that looks like). >> If it is for DNA only, the sequences/alignments returned should ideally >> specify a DNA alphabet. > > mmm ok... > Basically it could be used also with characters like genes and other > markers.. but in that case, it would not make sense to parse it as a > sequence, so nobody would try to do it. That's interesting, and means assuming DNA wouldn't be safe. Just use the single letter alphabet then (rather than defaulting to the completely generic base alphabet). >>> Because that would mean that one individual has only a chromosome. >>> It doesn't make sense to run fastPhase on an haploid individual. >> >> Is fastPhase only for haploids? Could it be used with polyploidy (e.g. >> plants)? > > I think not... It would be another class of problem. > What fastPhase does, is trying to infer haplotypes from genotype data. OK - you can probably tell I'm not a population biologist from the questions ;) >> I was actually thinking the -Z format might be much simpler to deal >> with (I didn't mean to suggest supporting both). On the other hand, >> the documentation does say the -Z is "not intended for general use". > > The problem is that it could take days to run a fastPhase... most of > the times you want the longer format, and then proceed to parse it. 
> Anyway, it should not be a big problem to implement it OK (as I wrote above), I can see now that using the simplified -Z output is not sensible. > (I am just putting all of that information in SeqRecord.description) If we know the meaning of some of these fields, then ideally they should go in the annotations dictionary, rather than just in the SeqRecord description. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Nov 6 14:00:59 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 6 Nov 2008 14:00:59 -0500 Subject: [Biopython-dev] [Bug 2640] Proposal: doctest for SeqRecord/biopython In-Reply-To: Message-ID: <200811061900.mA6J0xi3015085@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2640 ------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-06 14:00 EST ------- I've added a few doctests to SeqRecord.py in CVS revision 1.24, plus the simple unit test from comment 7 to make sure these get validated as part of the Biopython test suite. How does that look to you Marco? I've kept the __init__ example short, not doing anything with annotations. Do you think we should also have the __main__ trick in all modules with doctests? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Thu Nov 6 14:41:44 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 6 Nov 2008 14:41:44 -0500
Subject: [Biopython-dev] [Bug 2640] Proposal: doctest for SeqRecord/biopython
In-Reply-To:
Message-ID: <200811061941.mA6JfiHM019925@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2640

------- Comment #9 from dalloliogm at gmail.com 2008-11-06 14:41 EST -------
(In reply to comment #8)
> I've added a few doctests to SeqRecord.py in CVS revision 1.24, plus the simple
> unit test from comment 7 to make sure these get validated as part of the
> Biopython test suite.
>
> How does that look to you Marco? I've kept the __init__ example short, not
> doing anything with annotations.

I think they look ok.. to me, they seem good examples of how to use the module.

> Do you think we should also have the __main__ trick in all modules with
> doctests?

I am not really experienced in managing such big projects... but I think it
could be ok, at least for now.
I would personally keep the __main__ trick for every module, because it would
make it easier to test a single module while you are still writing it.
But to test many modules in one go, the code you posted in #7 is the way to go.

so... in short, I don't know!! :)

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
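
[Editorial sketch, not part of the original thread.] For readers unfamiliar with the "__main__ trick" discussed in comments #8 and #9 above, here is a minimal illustration. It is not the code committed to SeqRecord.py. The function reverse_complement_str and the helper _test are hypothetical names chosen only for this example, and the sketch assumes the trick simply means running a module's own doctests when the file is executed directly.

```python
"""Toy module illustrating doctests run via the __main__ trick."""
import doctest


def reverse_complement_str(dna):
    """Return the reverse complement of an unambiguous DNA string.

    (Hypothetical helper for illustration; not a Biopython function.)

    >>> reverse_complement_str("ACGT")
    'ACGT'
    >>> reverse_complement_str("AAC")
    'GTT'
    """
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(dna))


def _test():
    """Run the doctests of the module being executed as a script."""
    return doctest.testmod()


if __name__ == "__main__":
    # Runs only when the file is executed directly, not when it is imported,
    # so importing the module stays free of side effects.
    _test()
```

Running the file directly is silent when every example passes; passing -v on the command line makes doctest print each example as it is checked. This suits testing one module while writing it, which is the use Marco describes, whereas a suite-level test is still needed to cover many modules at once.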
From bugzilla-daemon at portal.open-bio.org Thu Nov 6 15:34:36 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 6 Nov 2008 15:34:36 -0500 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200811062034.mA6KYa6b026157@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 ------- Comment #50 from bsouthey at gmail.com 2008-11-06 15:34 EST ------- (In reply to comment #48) > (In reply to comment #47) > > (In reply to comment #46) > > > > [seq.translate() for seq in seqlist if seq.is_cds()] > > > > > > I prefer the second option, for readability, but YMMV. > > > > Note the above wouldn't give you translations starting with methionine, you'd > > need something like: > > > > [seq.translate(cds_start=True) for seq in seqlist if seq.is_cds()] > > > > (assuming we call the "init" option "cds_start") > > Fair point... my focus was on putting that filter into the list comprehension. > > > Or, going with the complete_cds option you could build a list of translations > > of valid CDSs like this: > > > > proteins = [] > > for seq in seqlist : > > try : > > proteins.append(seq.translate(complete_cds=True)) > > except ValueError : > > #Not a valid CDS, excluded > > pass > > > > Not a one liner, but I think in a real situation you'd want to do something > > with the invalid CDSs anyway (even if just logging them). > > True enough. It comes down in part to a preference of style, as the same could > be achieved with > > proteins = [] > for seq in seqlist : > if seq.is_cds(): > proteins.append(seq.translate(complete_cds=True)) > else: > #Not a valid CDS, excluded > pass > > I think the clarity of this arrangement to my eyes comes from 'is/is not a cds' > being - naturally-speaking - a property or attribute of the sequence itself. 
> The 'cds_start' argument in your example is then an instruction to treat the > translation as though you have a CDS, and implement some specialised behaviour > that is appropriate under that circumstance, rather than to implement a test > that raises an error if it is failed. By separating the 'is_cds()' call from > the 'cds_start' argument, you gain the ability to translate the sequence with > either the methionine or the coded amino acid, without losing the test of the > sequence being a CDS. > > Of course, using the 'cds_start=True' argument could force a call to > self.is_cds(), anyway. Your non-one-liner could then be as you originally > wrote: > > proteins = [] > for seq in seqlist : > try: > proteins.append(seq.translate(complete_cds=True)) > except ValueError: > #Not a valid CDS, excluded > pass > > The two advantages I see to having the is_cds() method as a separate call are > that it permits separation of the determining the CDS status of the sequence, > and that it provides a filter that is more readable than attempting to > translate the sequence to find out if it's a valid CDS. If the 'cds_start' > argument forces a self.is_cds() test, then the usage can be - I think - exactly > as you've been proposing throughout the thread. > The use of 'cds' alone is wrong because cds refer to DNA not translation and not to protein sequences. The use of cds is confusing or at least vague until you determine how it works. Also it could be wrong in the sense it is a valid cds (see the GUG initiation in mammalian NAT1 example at the NCBI link) just not allowed by the table in Bio.Data.CodonTable. I don't object to the purpose, rather I do object to the name. My overriding issue here is that 'cds_start' does not convey the purpose of this argument and this is likely to remain for some time in the API. One interpretation that also comes to mind is that it is the location of the start of the cds in the sequence (cds start at...). 
I really feel that the name must clearly reflect that it invokes a test that the first codon are in the 'start_codon' list (defined by the selected table from Bio.Data.CodonTable). This is not a check that it is the start of a cds rather it is a check for a possible open reading frame (as not all open reading frames are cds). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Nov 6 23:46:08 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 6 Nov 2008 23:46:08 -0500 Subject: [Biopython-dev] [Bug 2629] Updated Bio.NaiveBayes to listfns import In-Reply-To: Message-ID: <200811070446.mA74k8Js031975@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2629 ------- Comment #13 from mdehoon at ims.u-tokyo.ac.jp 2008-11-06 23:46 EST ------- (In reply to comment #12) I have uploaded a fixed version of Bio.NaiveBayes to CVS. Can you check to see if you're happy with this version? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From sbassi at gmail.com Fri Nov 7 01:56:48 2008 From: sbassi at gmail.com (Sebastian Bassi) Date: Fri, 7 Nov 2008 04:56:48 -0200 Subject: [Biopython-dev] Possible problem with NCBIStandalone.blastall Message-ID: When I run a command line blast with these parameters: /root/blast-2.2.18/bin/blastall -p blastn -d /var/www/blast/db/UniVec -q -5 -G 3 -E 3 -F "m D" -e 700 -Y 1.75e12 -i tmpsq I find a match (with evalue of 18). 
But when I do it from Biopython I can't find any match:

    rh, eh = NCBIStandalone.blastall(blast_exe, "blastn", db,
                                     fin, nuc_mismatch='-5',
                                     gap_open = '3',
                                     gap_extend = '3',
                                     search_length = '1.75e12',
                                     expectation='20')

Here is the input sequence:

>C07SpCP042I015.P5A02.R. [Clone-lib=pCLD 04541]
NNNCCCCCCCTCGAGGTCGACNNNNNNNNTAAGCTTGAAATTCTATGATATGCAGTTAGT
TGCTNCTNGTTTAGCATTGGTTGGTTAACTTAAAACCTTTTCCTGCAATAATTATATGGA
TAATATTACTTTACTTNNNNNNNTATTGCCTTCACTAATTTTTAGGATCTATTTTCTGTT
AAATGTTATCTCTTGTTCTTGAGAAGTGCTTTGGAGATCATTTTTCCATCGTATTAACAA
AAAGTGAAATAACTACTTGTGCAATCAGGCTTTTCCTACACCAGGGGATAAGGCAAATAA
ACTATTCACCTCCTTTAATTAGCTCCCCCCCCCCCCCCTCCCCTTCTTTTCTCTTCATTC
CTGANNNANTTAGCTAGTACGCACCATTCAATCAATTATTTCTGTTCCATTTTGTGCTAA
ATATGTTTTCAAATGTTTAATATAGTTCTGAAGACAGCAGTTTAATGTTTTGTCTGGCTA
ACTGCTATTCTAAGCTCATTGTTTCAGCTTGCAGTTTTGCAGCAAAACCTGTCTGCTGTC
CATGAAATCTGGAAGGAATGTAGTAAATTTTACAGTCTCAGCCTTCTATCTCTGAGGAAG
TTTATATGGTCCTTCACGGAGCTGAGAGATCTGAATTCAGCCCACACAGCCTTACAGCAC
ATGGTGAGATTGGCTTTTACGGAAAACTCTTACATTAGTAGAACTGCTGAGGGGAGGTTT
TGTGATTTAAGATTGGATATTCCAGCACCTTCCTCTGGCAATTGGAGTTTCATCGATGTA
TCTGTCGACACCGCGGGTAGCAGCAATTTTGATATGGAAAGACAAAGTCTTGGCAGAAAA
ACA

and here is the database:
ftp://ftp.ncbi.nih.gov/pub/UniVec/UniVec

(I got the parameters from
http://www.ncbi.nlm.nih.gov/VecScreen/VecScreen_docs.html#Parameters)

Best,
SB.

--
Island for sale: http://www.genesdigitales.com/isla/
Molecular biology course for programmers: http://tinyurl.com/2vv8w6
Bioinformatics news: http://www.bioinformatica.info
Free Python tutorial: http://tinyurl.com/2az5d5

"It is pitch black. You are likely to be eaten by a grue."
-- Zork

From bugzilla-daemon at portal.open-bio.org Fri Nov 7 04:37:23 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 7 Nov 2008 04:37:23 -0500
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq)
In-Reply-To:
Message-ID: <200811070937.mA79bNh9020433@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381

------- Comment #51 from lpritc at scri.sari.ac.uk 2008-11-07 04:37 EST -------
Just to perpetuate, what I suggest is (in pseudocode, and with argument names
up for, well, argument):

    class Seq:
        [...]

        def startswith_startcodon(self):
            """ Returns True if the first three bases of the sequence are a
                valid start codon in the sequence's codon table, returns
                False otherwise
            """

        def endswith_stopcodon(self):
            """ Returns True if the length of the sequence is a multiple of
                three, and the last three bases are a valid stop codon in the
                sequence's codon table, returns False otherwise
            """

        def is_cds(self):
            """ Returns True if the sequence meets the criteria for a CDS,
                False otherwise. The criteria are:
                i) The very first three bases of the sequence are a valid
                   start codon
                ii) The sequence length is a multiple of three
                iii) The final three bases of the sequence are a valid stop
                   codon
                iv) There are no in-frame stop codons, other than the final
                   stop codon
            """
            if not self.startswith_startcodon():
                return False
            if not self.endswith_stopcodon():
                return False
            # Test for in-frame stop codon, return True if none is found,
            # return False otherwise

        def translate(self, [...], assert_cds=False, assert_cdsfirstcodon=False):
            """ Returns a new Seq object with the protein translation.

                If assert_cds is True, but the sequence is not a CDS as
                determined by self.is_cds(), then an error is thrown.
                Otherwise, the sequence is translated with the first codon
                read as a methionine, rather than the amino acid which it
                would encode at any other position.

                If assert_cdsfirstcodon is True, but the sequence doesn't
                start with a valid start codon, then an error is thrown.
                Otherwise, the sequence is translated with the first codon
                read as a methionine, as above.
            """
            # Translate away as normal, here
            [...]
            if assert_cds:
                if not self.is_cds():
                    raise ValueError("WTF? This is no CDS, my good fellow human!")
                else:
                    # Make the first amino acid of the translated sequence a Met
            if assert_cdsfirstcodon:
                if not self.startswith_startcodon():
                    raise ValueError("Hey! Stop playing around, this sequence "
                                     "doesn't start with a start codon")
                else:
                    # Make the first amino acid of the translated sequence a Met
            # Then continue as normal

This approach provides the following behaviour (assuming things about argument
names that can be thrashed out later):

    # I want to translate some nt sequence, and don't care about stops,
    # starts, or any other stuff
    aaseq = ntseq.translate()

    # I want to translate my nt sequence to the first in-frame stop codon,
    # and no further
    aaseq = ntseq.translate(to_stop=True)

    # I want to know if my nt sequence is a (putative) CDS
    ntseq.is_cds()

    # I want to know if my nt sequence starts with a start codon
    ntseq.startswith_startcodon()

    # I want to know if my nt sequence ends with an in-frame stop codon
    # Note that this is a different question to asking whether there is
    # *any* in-frame stop codon
    ntseq.endswith_stopcodon()

    # I want to translate my nt sequence, which I know is a CDS,
    # but not convert the first codon to a methionine
    aaseq = ntseq.translate()

    # I want to translate my nt sequence, which I know is a CDS,
    # and convert the first codon to a methionine
    aaseq = ntseq.translate(assert_cds=True)

    # OK, my sequence isn't a *real* CDS, but it still starts with a valid
    # start codon (I checked already with ntseq.startswith_startcodon()),
    # and I'd like to convert the first codon as if it was really a CDS.
    # You don't need to know why, I just do. I'm wacky that way.
    aaseq = ntseq.translate(assert_cdsfirstcodon=True)

    # I'd like a list of all my sequences that are valid CDS
    seqlist = [s for s in myntseqs if s.is_cds()]

    # I'd like translations of all my sequences that are valid CDS
    tlist1 = [s.translate() for s in seqlist]
    tlist2 = [s.translate() for s in myntseqs if s.is_cds()]

In terms of nomenclature:

The default behaviour of translate() as Peter proposed - read through in-frame
and translate with the appropriate codon table - is fine in nearly all
circumstances. Most other circumstances are covered by stopping at the first
in-frame stop codon, which Peter has implemented, and is an option we all seem
to agree on.

Biologically-speaking, this behaviour is not always correct for CDS in
prokaryotes, where alternative start codons may occur a significant minority of
the time. These will be mistranslated if no provision is made for them. I
think a useful biological sequence object should at least try to mimic actual
biology, so we should provide an option to handle this.

We should not assume that a sequence is a CDS unless it is specified by the
user. It seems reasonable to me that the term 'cds' should occur in any such
argument from the user.

We have at least two options for how to proceed with a CDS: i) we can provide a
strict CDS-type translation, which requires confirmation that the sequence is,
in fact, a CDS; ii) we can provide a weak CDS-type translation, which only
modifies the way the start codon is translated. In both cases, behaviour is
specific to CDS, and so having 'cds' in the argument name *somewhere* seems
obvious, and entirely reasonable.

I think that 'assert_cds' makes clear that we are asserting that the sequence
is a valid CDS - no internal stops and everything else that comes with that
status.
I think that 'assert_cdsfirstcodon' avoids any ambiguity over the word 'start', and also conveys that we are asserting that the first (rather than start) codon has some relationship to a CDS; in this case the relationship is that the first codon of the sequence meets the criteria for a CDS. But that's kind of a long argument name ;) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Nov 7 04:48:18 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 7 Nov 2008 04:48:18 -0500 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200811070948.mA79mIRl021035@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 ------- Comment #52 from lpritc at scri.sari.ac.uk 2008-11-07 04:48 EST ------- (In reply to comment #50) > The use of 'cds' alone is wrong because cds refer to DNA not translation and > not to protein sequences. The use of cds is confusing or at least vague until > you determine how it works. I think that translate() also refers only to nucleotide sequences, and therefore the association of 'cds' is not inherently confusing on that count. I think that it can be an appropriate term in an argument name (see above). > Also it could be wrong in the sense it is a valid > cds (see the GUG initiation in mammalian NAT1 example at the NCBI link) just > not allowed by the table in Bio.Data.CodonTable. It's up to the user to use the correct codon table for their purpose, I think. Otherwise, how would you propose to correct for their error? > [...] 'cds_start' [...] One interpretation that also > comes to mind is that it is the location of the start of the cds in the > sequence (cds start at...). I agree with this. 
It has the potential to be confusing.

> This is not a check that it is the start of a cds
> rather it is a check for a possible open reading frame (as not all open reading
> frames are cds).

It is true that not all ORFs are CDS (indeed, by far the majority are not).
However, open reading frames do not have to start with - or even contain - a
start codon. They just do not contain an in-frame stop codon. We've been over
this definition before (comment #21).

L.

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From biopython at maubp.freeserve.co.uk Fri Nov 7 05:13:21 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 7 Nov 2008 10:13:21 +0000
Subject: [Biopython-dev] Possible problem with NCBIStandalone.blastall
In-Reply-To:
References:
Message-ID: <320fb6e00811070213i4aa5955arf233180d6a047de0@mail.gmail.com>

On Fri, Nov 7, 2008 at 6:56 AM, Sebastian Bassi wrote:
> When I run a command line blast with these parameters:
>
> /root/blast-2.2.18/bin/blastall -p blastn -d /var/www/blast/db/UniVec
> -q -5 -G 3 -E 3 -F "m D" -e 700 -Y 1.75e12 -i tmpsq
>
> I find a match (with evalue of 18).
> But when I do it from Biopython I can't find any match:
>
> rh, eh = NCBIStandalone.blastall(blast_exe, "blastn", db,
> fin, nuc_mismatch='-5',
> gap_open = '3',
> gap_extend = '3',
> search_length = '1.75e12',
> expectation='20')

You are not using exactly the same arguments, so it's not surprising
you get different results:

-q -5      => nuc_mismatch = -5 (or as a string)
-G 3       => gap_open = 3 (or as a string)
-E 3       => gap_extend = 3 (or as a string)
-F "m D"   => filter="m D" (MISSING!)
-e 700     => expectation=700 (or as a string)
-Y 1.75e12 => search_length = '1.75e12' (or as a float)

Your expectation cut-off is more generous in the command line version
(700) than in the Biopython version (20), but since the match you
found has an e-value of 18, which is below both thresholds, that
wouldn't explain the difference.
It's probably due to omitting the filter option (-F).
If that doesn't resolve the difference then there is something very
strange going on...

Peter

From bugzilla-daemon at portal.open-bio.org Fri Nov 7 06:14:13 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 7 Nov 2008 06:14:13 -0500
Subject: [Biopython-dev] [Bug 2622] Parsing between position locations like 5933^5934 in GenBank/EMBL files
In-Reply-To:
Message-ID: <200811071114.mA7BED84026709@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2622

------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-07 06:14 EST -------
I've updated CVS to treat a between position like 3^4 (one based counting) as a
zero length slice 3:3.

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Fri Nov 7 06:19:12 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 7 Nov 2008 06:19:12 -0500
Subject: [Biopython-dev] [Bug 2640] Proposal: doctest for SeqRecord/biopython
In-Reply-To:
Message-ID: <200811071119.mA7BJCjd027093@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2640

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

------- Comment #10 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-07 06:19 EST -------
Marking as fixed - I've updated SeqRecord.py in CVS revision 1.25 to call the
doctests via the __main__ trick, with similar changes for Bio.Seq, Bio.SeqIO
and Bio.AlignIO (the latter are complicated due to finding the input files).
Thanks for the encouragement Marco - hopefully this has also made the docstring documentation more useful, and will also improve the API docs too: http://biopython.org/DIST/docs/api/ (updated for each release) Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Nov 7 06:52:50 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 7 Nov 2008 06:52:50 -0500 Subject: [Biopython-dev] [Bug 2613] test_Wise and test_psw fail under Python 2.3 In-Reply-To: Message-ID: <200811071152.mA7BqoKj029425@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2613 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-07 06:52 EST ------- "Fixed" by skipping these tests (and the recently added test_docstrings.py) if run on Python 2.3. Python 2.3 doctest uses slightly different formatting. It also doesn't support some features like -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Fri Nov 7 07:32:33 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 7 Nov 2008 12:32:33 +0000 Subject: [Biopython-dev] CVS freeze for Biopython 1.49 (beta) Message-ID: <320fb6e00811070432x123e806foa06b7f3d94bdb068@mail.gmail.com> Hi all, I've been going over a few little things on the unit tests (e.g. python 2.3's doctest isn't quite the same), and think I am ready to prepare Biopython 1.49 (beta). 
I plan to make the Windows installers for Python 2.3, 2.4 and 2.5 against numpy 1.1.1 Currently there is no Windows version of numpy for python 2.6, so we won't be able to ship a Windows installer for python 2.6 for Biopython either. So, its CVS freeze time. Once the beta is out (hopefully later today), we can start using CVS for documentation updates or fixing any bugs reported in the beta. Then in about a week's time I hope to do the Biopython 1.49 "final" release. Peter From bugzilla-daemon at portal.open-bio.org Fri Nov 7 10:18:47 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 7 Nov 2008 10:18:47 -0500 Subject: [Biopython-dev] [Bug 2629] Updated Bio.NaiveBayes to listfns import In-Reply-To: Message-ID: <200811071518.mA7FIlHb012537@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2629 ------- Comment #14 from bsouthey at gmail.com 2008-11-07 10:18 EST ------- (In reply to comment #13) > (In reply to comment #12) > I have uploaded a fixed version of Bio.NaiveBayes to CVS. Can you check to see > if you're happy with this version? > Yes! -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From sbassi at gmail.com Fri Nov 7 11:30:34 2008 From: sbassi at gmail.com (Sebastian Bassi) Date: Fri, 7 Nov 2008 14:30:34 -0200 Subject: [Biopython-dev] Possible problem with NCBIStandalone.blastall In-Reply-To: <320fb6e00811070213i4aa5955arf233180d6a047de0@mail.gmail.com> References: <320fb6e00811070213i4aa5955arf233180d6a047de0@mail.gmail.com> Message-ID: On Fri, Nov 7, 2008 at 8:13 AM, Peter wrote: > -q -5 =>nuc_mismatch = -5 (or as a string) > -G 3 => gap_open = 3 (or as a string) > -E 3 => gap_extend = 3 (or as a string) > -F "m D" => filter="m D" (MISSING!) I will try with this. 
> -e 700 => expectation=700 (or as a string) > -Y = 1.75e12 => search_length = '1.75e12' (or as a float) I used string since I have the biopython version with the bug that doesn't allow me to enter non iterable values. > the difference. Its probably due to omitting the filter option (-F). > If that doesn't resolve the difference then there is something very > strange going on... OK, I will check it and get back with the results. Thank you. Best, SB. From biopython at maubp.freeserve.co.uk Fri Nov 7 11:53:58 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 7 Nov 2008 16:53:58 +0000 Subject: [Biopython-dev] CVS freeze for Biopython 1.49 (beta) In-Reply-To: <320fb6e00811070432x123e806foa06b7f3d94bdb068@mail.gmail.com> References: <320fb6e00811070432x123e806foa06b7f3d94bdb068@mail.gmail.com> Message-ID: <320fb6e00811070853w77cd415dn68b1889c09388fb6@mail.gmail.com> > Once the beta is out (hopefully later today), we can start using CVS > for documentation updates or fixing any bugs reported in the beta. > Then in about a week's time I hope to do the Biopython 1.49 "final" > release. OK - Biopython 1.49 beta is done, available on the website now :) Please don't do any new code checkins for the next week. Additional documentation and unit tests should be fine - and any bug fixes after discussion. 
I've done a news post, which I can edit if anyone spots anything wrong or has suggestion for improvement, but it will be a good basis for the announcement email: http://news.open-bio.org/news/2008/11/biopython-149-beta-released/ Peter From bugzilla-daemon at portal.open-bio.org Fri Nov 7 11:55:22 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 7 Nov 2008 11:55:22 -0500 Subject: [Biopython-dev] [Bug 2629] Updated Bio.NaiveBayes to listfns import In-Reply-To: Message-ID: <200811071655.mA7GtM6F018980@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2629 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #15 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-07 11:55 EST ------- Grand - this bug seems to be fixed then (and in time for Biopython 1.49 beta). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sat Nov 8 21:56:59 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 8 Nov 2008 21:56:59 -0500 Subject: [Biopython-dev] [Bug 2225] Do something with the PROJECT line in GenBank files In-Reply-To: Message-ID: <200811090256.mA92uxgL025316@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2225 chapmanb at 50mail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Status|REOPENED |RESOLVED Resolution| |FIXED ------- Comment #3 from chapmanb at 50mail.com 2008-11-08 21:56 EST ------- Thanks Peter for the heads up on the future changes. 
Fixed this with respect to the offered suggestions with Bio/GenBank/Record.py 1.12; Bio/GenBank/Scanner.py 1.25 and Bio/GenBank/__init__.py 1.95. I left PROJECT output as shown in our example as it was not clear from the GenBank documentation whether they would be on multiple or single lines. DBLINK was output over multiple line as defined in the documentation. When files with DBLINKs are released we should include a test case. For feature parsing, both DBLINK and PROJECT will be stored as dbxrefs as suggested. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sun Nov 9 10:04:09 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 9 Nov 2008 10:04:09 -0500 Subject: [Biopython-dev] [Bug 2225] Do something with the PROJECT line in GenBank files In-Reply-To: Message-ID: <200811091504.mA9F49hU030667@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2225 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|RESOLVED |REOPENED Resolution|FIXED | ------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-09 10:04 EST ------- You've got a minor bug in there Brad... def dblink(self, content): """Store DBLINK cross references as dbxrefs in our record object. """ dblinks = [l for l in content.split() if l] self.data.dbxrefs.extend(projects) Should be: self.data.dbxrefs.extend(dblinks) However, based on the example DBLINK line, we shouldn't be splitting on spaces at all - for example this transition example for when the PROJECT line and DBLINK lines are present: LOCUS CP000964 5641239 bp DNA circular BCT 24-SEP-2008 DEFINITION Klebsiella pneumoniae 342, complete genome. 
ACCESSION CP000964 VERSION CP000964.1 GI:206564770 PROJECT GenomeProject:28471 DBLINK Project:28471 Trace Assembly Archive:123456 .... Note that "Trace Assembly Archive:123456" should be a single cross reference. I'll attach a patch for CVS in a moment. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sun Nov 9 10:07:30 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 9 Nov 2008 10:07:30 -0500 Subject: [Biopython-dev] [Bug 2225] Do something with the PROJECT line in GenBank files In-Reply-To: Message-ID: <200811091507.mA9F7U0N030977@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2225 ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-09 10:07 EST ------- Created an attachment (id=1045) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1045&action=view) Patch to Bio/GenBank/*.py This patch against CVS assumes DBLINK lines contain one cross reference per line. Also maps "GenomeProject:" to "Project:" so that we'll be consistent when the NCBI change this as part of the PROJECT line to DBLINK line switch. Should avoid duplicate entries in the dbxrefs list (especially during the transition period where both PROJECT and DBLINK lines are used). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Sun Nov 9 10:16:50 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sun, 9 Nov 2008 15:16:50 +0000 Subject: [Biopython-dev] Biopython 1.49 beta released Message-ID: <320fb6e00811090716v58637d55o470246df4175464e@mail.gmail.com> Dear Biopythoneers, We are pleased to announce a beta release of Biopython 1.49. 
There have been some significant changes since Biopython 1.48 was released two months ago, which is why we are initially releasing a beta for wider testing.

As previously announced, the big news is that Biopython now uses NumPy rather than its precursor Numeric (the original Numerical Python library).

As in the previous releases, Biopython 1.49 beta supports Python 2.3, 2.4 and 2.5 but should now also work fine on Python 2.6. Please note that we intend to drop support for Python 2.3 in a couple of releases' time.

We also have some new functionality, starting with the basic sequence object (the Seq class) which now has more methods. This encourages a more object orientated coding style, and makes basic biological operations like transcription and translation more accessible and discoverable.

Our BioSQL interface can now optionally fetch the NCBI taxonomy on demand when loading sequences (via Bio.Entrez) allowing you to populate the taxon/taxon_name tables gradually. Also, BioSQL should now work with the psycopg2 driver for PostgreSQL (as well as the older psycopg driver).

Finally, our old parsing infrastructure (Martel and Bio.Mindy) is now considered to be deprecated, meaning mxTextTools is no longer required to use Biopython. This should not affect any of the typically used parsers (e.g. Bio.SeqIO and Bio.AlignIO).

So, if you are feeling brave and know the risks, please try out Biopython 1.49 beta, and let us know on the mailing lists if it works, or more importantly if something doesn't.

We'd also like feedback on the updated Biopython Tutorial and Cookbook:
http://biopython.org/DIST/docs/tutorial/Tutorial.html
http://biopython.org/DIST/docs/tutorial/Tutorial.pdf

Source distributions and Windows installers are available from the Biopython website:
http://biopython.org/wiki/Download

Thanks!

-Peter on behalf of the Biopython developers

P.S. Those of you subscribed to our news feed would have seen this announcement already.
For RSS links etc, see: http://biopython.org/wiki/News From bugzilla-daemon at portal.open-bio.org Sun Nov 9 11:00:39 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 9 Nov 2008 11:00:39 -0500 Subject: [Biopython-dev] [Bug 2640] Proposal: doctest for SeqRecord/biopython In-Reply-To: Message-ID: <200811091600.mA9G0dZ6003494@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2640 ------- Comment #11 from dalloliogm at gmail.com 2008-11-09 11:00 EST ------- (In reply to comment #10) > Marking as fixed - I've updated SeqRecord.py in CVS revision 1.25 to call the > doctests via the __main__ trick, with similar changes for Bio.Seq, Bio.SeqIO > and Bio.AlignIO (the later are complicated due to finding the input files). > > Thanks for the encouragement Marco - hopefully this has also made the docstring > documentation more useful, and will also improve the API docs too: > http://biopython.org/DIST/docs/api/ (updated for each release) Thanks to you!! :) I am really happy you accepted my patch. I'll see if I can contribute something else. > > Peter > -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
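For readers unfamiliar with the "__main__ trick" mentioned in the quoted comment #10, here is a minimal sketch of the general pattern using only the standard doctest module. The function reverse_complement and the helper _run_doctests are made up for illustration; this is not the code that was checked into SeqRecord.py.

```python
import doctest


def reverse_complement(sequence):
    """Return the reverse complement of an unambiguous DNA string.

    The examples below double as documentation and as tests:

    >>> reverse_complement("ATGC")
    'GCAT'
    >>> reverse_complement("")
    ''
    """
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(sequence))


def _run_doctests():
    """Run every docstring example in the main module and return the results."""
    return doctest.testmod()


# The "trick": the doctests only run when the file is executed directly
# (python thisfile.py), never when it is imported as a module.
if __name__ == "__main__":
    results = _run_doctests()
    print("%i of %i doctests failed" % (results.failed, results.attempted))
```

Running the file directly exercises the docstring examples, while importing the module is unaffected. Modules whose examples need input files need extra care about the working directory, as the quoted comment notes for Bio.SeqIO and Bio.AlignIO.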
From biopython at maubp.freeserve.co.uk Sun Nov 9 11:10:59 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sun, 9 Nov 2008 16:10:59 +0000
Subject: [Biopython-dev] Sequences and simple plots
In-Reply-To:
References: <320fb6e00810150709u2aed9855kb8cf91318f287765@mail.gmail.com>
Message-ID: <320fb6e00811090810s342e78f1n3eb45bba051d236f@mail.gmail.com>

Getting back to simpler plot examples using pylab, Andrew Dalke wrote up some nice examples plotting Kyte & Doolittle hydrophobicities of protein sequences:
http://www.dalkescientific.com/writings/NBN/plotting.html

Something based on this idea (but probably leaving out most of the complicated smoothing stuff and labelling the helices) could make a short and sweet line plot example for the Biopython tutorial.

Peter

From bugzilla-daemon at portal.open-bio.org Sun Nov 9 12:29:34 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 9 Nov 2008 12:29:34 -0500
Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO
In-Reply-To:
Message-ID: <200811091729.mA9HTYF1011072@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2643

dalloliogm at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1042 is|0                           |1
           obsolete|                            |

------- Comment #12 from dalloliogm at gmail.com 2008-11-09 12:29 EST -------
Created an attachment (id=1046)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1046&action=view)
fastPhase output iterator (returns Alignment objects)

This is the rewritten fastphaseoutputIO, which returns an Alignment object instead of SeqRecord objects.
It can still return SeqRecord objects if a 'ret = seqrecord' parameter is passed, but Alignment objects are returned by default.
Moreover, I have de-capitalized (.lower()) the name of the function, and added a link to the fastPhase article in the documentation (although I think the doc would need more work).

From bugzilla-daemon at portal.open-bio.org Sun Nov 9 12:30:25 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 9 Nov 2008 12:30:25 -0500
Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO
In-Reply-To:
Message-ID: <200811091730.mA9HUP6J011190@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2643

dalloliogm at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1046 is|0                           |1
           obsolete|                            |

------- Comment #13 from dalloliogm at gmail.com 2008-11-09 12:30 EST -------
Created an attachment (id=1047)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1047&action=view)
a doctest file to test fastPhaseOutputIterator
From bugzilla-daemon at portal.open-bio.org Sun Nov 9 12:34:19 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 9 Nov 2008 12:34:19 -0500 Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO In-Reply-To: Message-ID: <200811091734.mA9HYJ7I011664@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2643 ------- Comment #14 from dalloliogm at gmail.com 2008-11-09 12:34 EST ------- Created an attachment (id=1048) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1048&action=view) use cases/description for fastphaseoutputIO This is a collection of use cases/examples about fastPhaseOutputIO. I thought it could be useful to understand how this module will be used and by who, or just to remind me why I wrote this module later :) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sun Nov 9 12:41:26 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 9 Nov 2008 12:41:26 -0500 Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO In-Reply-To: Message-ID: <200811091741.mA9HfQlr012379@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2643 ------- Comment #15 from dalloliogm at gmail.com 2008-11-09 12:41 EST ------- (In reply to comment #10) > (In reply to comment #8) > > > If fastPHASE files SHOULD always come in allele groups (of the same > > > length), then it would be better to integrate the parser into Bio.AlignIO > > > giving pairwise alignments (and you would be able to read it via Bio.SeqIO > > > automatically as well). > > > > This is good idea, I didn't think of it. > > But how should I modify the module to produce AlignIO objects? 
> > Essentially Instead of: > > yield record_one > yield record_two > > you'd do something like this: > > alignment = Alignment(generic_dna) > alignment.add_sequence(id_one, seq_one) > alignment.add_sequence(id_two, seq_two) > yield alignment I have modified the module so it returns Alignment objects instead of SeqRecords. The problem is that Alignment.add_sequence doesn't support SeqRecords objects as inputs; it only requires an id and the sequence. This causes that some information is lost: to be more precise, everything I was putting in 'description' (subpop. label: 6 (internally 1)) is lost, because there is not a way to store it in the Alignment object. Moreover, now the parser only returns a single Alignment object per file (I think it is not supposed to be possible to have two fastphase outputs in the same file), because I thought it was the most useful thing. However, I left an option to have SeqRecord objects returned instead of Alignments (unfortunately I removed them from the doctests :(). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sun Nov 9 12:46:13 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 9 Nov 2008 12:46:13 -0500 Subject: [Biopython-dev] [Bug 2554] Creating an Alignment from a list of SeqRecord objects In-Reply-To: Message-ID: <200811091746.mA9HkDPr012817@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2554 ------- Comment #3 from dalloliogm at gmail.com 2008-11-09 12:46 EST ------- (In reply to comment #0) > It would be nice to be able to supply a list (or iterator) of SeqRecord objects > when creating an alignment object. This would also make the > Bio.SeqIO.to_alignment() function obsolete. 
I agree with this request; see http://bugzilla.open-bio.org/show_bug.cgi?id=2643#c15 -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sun Nov 9 12:52:48 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 9 Nov 2008 12:52:48 -0500 Subject: [Biopython-dev] [Bug 2640] Proposal: doctest for SeqRecord/biopython In-Reply-To: Message-ID: <200811091752.mA9HqmqQ013518@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2640 ------- Comment #12 from dalloliogm at gmail.com 2008-11-09 12:52 EST ------- Created an attachment (id=1049) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1049&action=view) add doctests to Bio.Align.Generic.Alignment This is a patch to add doctest to Bio.Align.Generic.Alignment. I just wrote it for myself to understand how this class works.. if you think it could be useful, here it is. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sun Nov 9 16:35:25 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 9 Nov 2008 16:35:25 -0500 Subject: [Biopython-dev] [Bug 2225] Do something with the PROJECT line in GenBank files In-Reply-To: Message-ID: <200811092135.mA9LZPBG004563@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2225 ------- Comment #6 from chapmanb at 50mail.com 2008-11-09 16:35 EST ------- Peter -- thanks for the bug catch and suggestion. Working into the future and trying to predict if NCBI is going to do what they plan is always fun. Your fix looks great to me -- commit away and we can close this out. 
If things are different when they actually make the change we can always adjust then but this looks very sensible.

From bugzilla-daemon at portal.open-bio.org Mon Nov 10 03:58:52 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 10 Nov 2008 03:58:52 -0500
Subject: [Biopython-dev] [Bug 2639] SeqRecord.init doesn't check for arguments for their types
In-Reply-To:
Message-ID: <200811100858.mAA8wq2i007149@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2639

dalloliogm at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|SeqRecord.init doesn't check|SeqRecord.init doesn't check
                   |for arguments to their types|for arguments for their
                   |                            |types

------- Comment #5 from dalloliogm at gmail.com 2008-11-10 03:58 EST -------
(In reply to comment #4)
> (In reply to comment #3)
> > Created an attachment (id=1041)
> > --> (http://bugzilla.open-bio.org/attachment.cgi?id=1041&action=view) [details] [details]
> > add a check for the seq argument in seqrecord, to be a Seq object and not None
> >
> > This patch adds a check for the seq argument in SeqRecord.
> > If seq is None (by default), it raises a ValueError Exception.
> > If it is a Seq objects, it saves it as self.seq.
> > If it is another kind of object (string, list, integer), it is converted to a
> > string, and then used to instantiate a seq object.
>
> I was deliberately not checking the seq argument.

Ok, understood. I didn't think of these cases.

However, not having a Seq causes errors that are difficult to understand in other functions that use SeqRecord.
For example, if you do:
>>> a = SeqRecord(id = '1')
>>> a.format('fasta')

you get the error:
AttributeError: 'NoneType' object has no attribute 'tostring'

This could scare a Biopython newcomer; an exception like 'error - current SeqRecord object doesn't have a Seq' could be better.

What do you think about creating a 'NullSeq' object, which represents a Seq with no value, and using it as a default for SeqRecord?
Later we could modify the other functions like .format and Seq.translate to intercept these objects and return the right error message.

> There are several reasonable
> use cases:
>
> * a Seq object (normal) or a subclass of it.
> * a MutableSeq object (seems reasonable, note this is not a subclass of Seq)
> * None (seems a good way to handle sequence records where we don't know the
> sequence - for example some GenBank files).
> * a user defined sequence object which implements the Seq API but does not
> subclass Seq or MutableSeq (this is more difficult to check).
>
> > I thought that someone could use an integer (e.g.: 010100010101101) as a
> > sequence, and in this case, the integer is first converted to a string
> > (otherwise Seq() would return an error).
>
> Note that if someone did want to use some weird numerical sequence, then the
> SeqRecord object should NOT be trying to do anything special (guessing what is
> intended). The user should create a suitable Seq object themselves (ideally
> with a numerical alphabet object). Explicit rather than implicit (Zen of
> python).
>
> --
>
> Note that I'm not 100% happy with the type checking we've just added. See
> "duck-typing" and interfaces versus types,
> http://www.python.org/doc/2.5.2/tut/node18.html#l2h-46
>
> The checks I've added shouldn't be too constraining - but maybe they should use
> using interface checking instead (or just revert back to no checking).
>
> Any comments from other people? This should be being CC'd to the dev mailing
> list.
> -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Nov 10 04:09:42 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 10 Nov 2008 04:09:42 -0500 Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO In-Reply-To: Message-ID: <200811100909.mAA99g8S008678@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2643 dalloliogm at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1043 is|0 |1 obsolete| | -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Nov 10 05:16:14 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 10 Nov 2008 05:16:14 -0500 Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO In-Reply-To: Message-ID: <200811101016.mAAAGERI012974@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2643 ------- Comment #16 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-10 05:16 EST ------- (In reply to comment #15) > I have modified the module so it returns Alignment objects instead of > SeqRecords. > The problem is that Alignment.add_sequence doesn't support SeqRecords objects > as inputs; it only requires an id and the sequence. This causes that some > information is lost: to be more precise, everything I was > putting in 'description' (subpop. label: 6 (internally 1)) is lost, because > there is not a way to store it in the Alignment object. Adding a SeqRecord to an alignment would be enhancement request Bug 2553. 
I see you've just spotted enhancement request Bug 2554 which would also solve this issue nicely. As a short term solution until one of these bugs is implemented, some of the Bio.AlignIO parsers "cheat" and bypass the public API to use alignment._records directly (this is just a list of SeqRecord objects). > Moreover, now the parser only returns a single Alignment object per file (I > think it is not supposed to be possible to have two fastphase outputs in the > same file), because I thought it was the most useful thing. Bio.AlignIO uses generators/iterators just like Bio.SeqIO - so that in general you can return multiple alignments for use with Bio.AlignIO.parse(). However, if the file format really does just return one pairwise alignment, then just yield one alignment (this happens on the Nexus file format). > However, I left an option to have SeqRecord objects returned instead of > Alignments (unfortunately I removed them from the doctests :(). If you want this as part of Bio.AlignIO / Bio.SeqIO you don't need to do this. Once a parser is added to Bio.AlignIO, the file format can also be used from Bio.SeqIO to get SeqRecord objects (the rows of all the alignments). Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
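The generator convention described in comment #16 can be sketched in plain Python as below. Everything here is a stand-in assumed for illustration: SimpleAlignment is not Bio.Align.Generic.Alignment, the one-"id sequence"-pair-per-line input layout is invented and is not the fastPHASE output format, and records_from_alignments only mimics the idea that a record-level reader can be built on top of an alignment-level parser.

```python
class SimpleAlignment(object):
    """Hypothetical stand-in for an alignment: a list of (id, sequence) rows."""

    def __init__(self):
        self.rows = []

    def add_sequence(self, identifier, sequence):
        # Every row of an alignment must have the same number of columns.
        if self.rows and len(sequence) != len(self.rows[0][1]):
            raise ValueError("All rows in an alignment must have the same length")
        self.rows.append((identifier, sequence))


def pairwise_alignment_iterator(lines):
    """Yield one two-row alignment for each consecutive pair of input lines.

    Each non-blank line is assumed to look like 'identifier sequence'.
    """
    pending = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        identifier, sequence = line.split(None, 1)
        pending.append((identifier, sequence))
        if len(pending) == 2:
            alignment = SimpleAlignment()
            for row_id, row_seq in pending:
                alignment.add_sequence(row_id, row_seq)
            yield alignment
            pending = []
    if pending:
        raise ValueError("Unpaired record at end of input")


def records_from_alignments(alignments):
    """Flatten alignments into their rows, as a record-level reader would."""
    for alignment in alignments:
        for row in alignment.rows:
            yield row
```

A format that only ever holds one alignment per file would simply yield once; the calling code does not need to change.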
From bugzilla-daemon at portal.open-bio.org Mon Nov 10 05:45:34 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 10 Nov 2008 05:45:34 -0500 Subject: [Biopython-dev] [Bug 2225] Do something with the PROJECT line in GenBank files In-Reply-To: Message-ID: <200811101045.mAAAjYJ6015314@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2225 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|REOPENED |RESOLVED Resolution| |FIXED ------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-10 05:45 EST ------- (In reply to comment #3) > When files with DBLINKs are released we should include a test case. Definitely. We might be able to just update an existing test case, like the one added for between locations. (In reply to comment #6) > Peter -- thanks for the bug catch and suggestion. Working into the future > and trying to predict if NCBI is going to do what they plan is always fun. Well - they've got about six months to change their mind ;) > Your fix looks great to me -- commit away and we can close this out. Checked in. > If things are different when the actually make the change we can always > adjust then but this looks very sensible. OK. Thanks! Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
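For anyone reading this thread later, the behaviour described for the patch in comment #5 (one cross reference per DBLINK line, "GenomeProject:" mapped to "Project:", no duplicate entries) might look roughly like the sketch below. This is not the code committed to Bio/GenBank; the function names add_dbxref and collect_dbxrefs are hypothetical, and splitting the PROJECT content on whitespace is an assumption made only for this example.

```python
def add_dbxref(dbxrefs, reference):
    """Append one cross reference, normalising the prefix and skipping duplicates."""
    reference = reference.strip()
    if not reference:
        return
    # Use the new-style "Project:" prefix for old-style PROJECT entries.
    if reference.startswith("GenomeProject:"):
        reference = "Project:" + reference[len("GenomeProject:"):]
    if reference not in dbxrefs:
        dbxrefs.append(reference)


def collect_dbxrefs(project_content, dblink_lines):
    """Build a dbxrefs list from PROJECT content and a list of DBLINK lines."""
    dbxrefs = []
    # Assumption: several PROJECT entries may share a line, separated by spaces.
    for reference in project_content.split():
        add_dbxref(dbxrefs, reference)
    # Each DBLINK line is one cross reference and may itself contain spaces,
    # so it must not be split.
    for line in dblink_lines:
        add_dbxref(dbxrefs, line)
    return dbxrefs


if __name__ == "__main__":
    print(collect_dbxrefs("GenomeProject:28471",
                          ["Project:28471", "Trace Assembly Archive:123456"]))
```

With the transition example from comment #4 this gives two entries, "Project:28471" and "Trace Assembly Archive:123456", so the PROJECT and DBLINK lines do not produce a duplicate.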
From biopython at maubp.freeserve.co.uk Mon Nov 10 06:28:00 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 10 Nov 2008 11:28:00 +0000 Subject: [Biopython-dev] [BioPython] annotations in an Alignment object In-Reply-To: <5aa3b3570811100304o4655fe60o4ecabf41e054c211@mail.gmail.com> References: <5aa3b3570811100304o4655fe60o4ecabf41e054c211@mail.gmail.com> Message-ID: <320fb6e00811100328j1a565c36t7f3522344e7c95c0@mail.gmail.com> On Mon, Nov 10, 2008 at 11:04 AM, Giovanni Marco Dall'Olio wrote: > Is there any way to store some annotations in an Alignment object?? > For example: the alignment tool used, its parameters, its version, the > date, and the nature of the sequence aligned. Not officially, no. This is on my mental list of things to do with the alignment object (after Biopython 1.49 is done). I've CC'd the dev-mailing list which is probably a better place to discuss the details. If you look at Bio/AlignIO/StockholmIO.py or the Bio/AlignIO/FastaIO.py code you'll see I've recorded this kind of information in a private dictionary, i.e. alignment._annotations. This makes the data available if anyone really needs it, but signals that this is not part of the public API and is likely to change. As part of an alignment annotation enhancement, we should try and establish some agreed standards for naming annotation entries (and also counting systems). > I am asking this because I would like to write a module to create > ldhat input files from an alignment program. > A ldhat file (http://www.stats.ox.ac.uk/~mcvean/LDhat/instructions.html) > is very similar to a fasta file; the only difference is that in its > first line, it contains three numbers, one of which can't always be > inferred by the data. Why go to the trouble of making a new Bio.AlignIO module? 
For this example from the LDhat manual, it looks like a FASTA file with an extra header: 4 10 1 >SampleA TCCGC??RTT >SampleB TACGC??GTA >SampleC TC?-CTTGTA >SampleD TCC-CTTGTT Rather than writing support for a whole new file format, wouldn't it be easier to do something like this: alignment = ... number_a = 4 number_b = 10 number_c = 1 handle = open("example.txt","w") handle.write("%i %i %i\n" % (number_a, number_b, number_c)) handle.write(alignment.format("fasta")) handle.close() Peter From dalloliogm at gmail.com Mon Nov 10 06:42:31 2008 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Mon, 10 Nov 2008 12:42:31 +0100 Subject: [Biopython-dev] [BioPython] annotations in an Alignment object In-Reply-To: <320fb6e00811100328j1a565c36t7f3522344e7c95c0@mail.gmail.com> References: <5aa3b3570811100304o4655fe60o4ecabf41e054c211@mail.gmail.com> <320fb6e00811100328j1a565c36t7f3522344e7c95c0@mail.gmail.com> Message-ID: <5aa3b3570811100342t7c23c0fl2b101be3fd352159@mail.gmail.com> On Mon, Nov 10, 2008 at 12:28 PM, Peter wrote: > On Mon, Nov 10, 2008 at 11:04 AM, Giovanni Marco Dall'Olio > wrote: >> Is there any way to store some annotations in an Alignment object?? >> For example: the alignment tool used, its parameters, its version, the >> date, and the nature of the sequence aligned. > > Not officially, no. This is on my mental list of things to do with > the alignment object (after Biopython 1.49 is done). I've CC'd the > dev-mailing list which is probably a better place to discuss the > details. > > If you look at Bio/AlignIO/StockholmIO.py or the > Bio/AlignIO/FastaIO.py code you'll see I've recorded this kind of > information in a private dictionary, i.e. alignment._annotations. > This makes the data available if anyone really needs it, but signals > that this is not part of the public API and is likely to change. 
> > As part of an alignment annotation enhancement, we should try and > establish some agreed standards for naming annotation entries (and > also counting systems). ok... I will use the private dictionary for my own implementation. Unfortunately I don't have any useful suggestion for this.. >> I am asking this because I would like to write a module to create >> ldhat input files from an alignment program. >> A ldhat file (http://www.stats.ox.ac.uk/~mcvean/LDhat/instructions.html) >> is very similar to a fasta file; the only difference is that in its >> first line, it contains three numbers, one of which can't always be >> inferred by the data. > > Why go to the trouble of making a new Bio.AlignIO module? For this > example from the LDhat manual, it looks like a FASTA file with an > extra header: Yeah.. of course :) Let's say I am simply playing with biopython's code, to better understand it. Since I am going to use this function many times, I will have to write a module for it any way. The first number in the ldhat file is the number of sequences, the second is their length, and the third should be usually one in an alignment object, I suppose. > > 4 10 1 >>SampleA > TCCGC??RTT >>SampleB > TACGC??GTA >>SampleC > TC?-CTTGTA >>SampleD > TCC-CTTGTT > > Rather than writing support for a whole new file format, wouldn't it > be easier to do something like this: > > alignment = ... 
> number_a = 4 > number_b = 10 > number_c = 1 > > handle = open("example.txt","w") > handle.write("%i %i %i\n" % (number_a, number_b, number_c)) > handle.write(alignment.format("fasta")) > handle.close() > > Peter > -- ----------------------------------------------------------- My Blog on Bioinformatics (italian): http://bioinfoblog.it From bugzilla-daemon at portal.open-bio.org Mon Nov 10 06:48:08 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 10 Nov 2008 06:48:08 -0500 Subject: [Biopython-dev] [Bug 2640] Proposal: doctest for SeqRecord/biopython In-Reply-To: Message-ID: <200811101148.mAABm8WO019854@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2640 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1033 is|0 |1 obsolete| | ------- Comment #13 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-10 06:48 EST ------- (From update of attachment 1033) Something similar was checked into CVS. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Mon Nov 10 07:02:12 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 10 Nov 2008 07:02:12 -0500 Subject: [Biopython-dev] [Bug 2640] Proposal: doctest for SeqRecord/biopython In-Reply-To: Message-ID: <200811101202.mAAC2CV4020912@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2640 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1049 is|0 |1 obsolete| | ------- Comment #14 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-10 07:02 EST ------- (From update of attachment 1049) I've checked in something similar to CVS - thanks Marco. I've not added a doctest for the format method using "clustal" because I think the bits make the documentation nasty to read. Instead I've just used "fasta" and "phylip". -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Nov 10 07:14:28 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 10 Nov 2008 07:14:28 -0500 Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO In-Reply-To: Message-ID: <200811101214.mAACESXB021859@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2643 ------- Comment #17 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-10 07:14 EST ------- (In reply to comment #16) > (In reply to comment #15) > > I have modified the module so it returns Alignment objects instead of > > SeqRecords. > > The problem is that Alignment.add_sequence doesn't support SeqRecords > > objects as inputs; it only requires an id and the sequence. 
This > > causes that some information is lost: to be more precise, everything > > I was putting in 'description' (subpop. label: 6 (internally 1)) is > > lost, because there is not a way to store it in the Alignment object. > > Adding a SeqRecord to an alignment would be enhancement request Bug 2553. I > see you've just spotted enhancement request Bug 2554 which would also solve > this issue nicely. As a short term solution until one of these bugs is > implemented, some of the Bio.AlignIO parsers "cheat" and bypass the public API > to use alignment._records directly (this is just a list of SeqRecord objects). Or, for another approach which at least avoids private properties but instead makes an assumption that added sequences are always put at the end of the alignment: alignment = Alignment(generic_dna) alignment.add_sequence(id_one, seq_one) assert alignment[-1].id == id_one alignment[-1].description = desrc_one alignment[-1].annotations["label"] = label_one ... alignment.add_sequence(id_two, seq_two) assert alignment[-1].id == id_two alignment[-1].description = desrc_two alignment[-1].annotations["label"] = label_two ... yield alignment However, I agree with you, the best solution is to pass SeqRecord objects to the alignment directly (i.e. Bug 2553 and/or Bug 2554). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
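[Editorial sketch] The second approach in comment #17 (add the row through the public method, then annotate the record that was just appended) can be illustrated without Biopython. `ToyRecord`, `ToyAlignment` and `add_annotated` below are hypothetical stand-ins written for this note, not Biopython classes, and the sequences are shortened from the thread's example; the real Alignment object may behave differently.

```python
class ToyRecord:
    """Hypothetical stand-in for a SeqRecord: id, sequence, description, annotations."""

    def __init__(self, id, seq):
        self.id = id
        self.seq = seq
        self.description = ""
        self.annotations = {}


class ToyAlignment:
    """Hypothetical stand-in for an alignment that only accepts (id, sequence) pairs."""

    def __init__(self):
        self._records = []

    def add_sequence(self, id, seq):
        # Assumption taken from comment #17: new rows are always appended at the end.
        self._records.append(ToyRecord(id, seq))

    def __getitem__(self, index):
        return self._records[index]

    def __len__(self):
        return len(self._records)


def add_annotated(alignment, id, seq, description, label):
    """Add a row via the public method, then annotate the last row."""
    alignment.add_sequence(id, seq)
    record = alignment[-1]
    # Check that the "appended at the end" assumption actually held.
    assert record.id == id
    record.description = description
    record.annotations["label"] = label
    return record


alignment = ToyAlignment()
add_annotated(alignment, "Ind1_all1", "TTTTTGAAAC", "subpop. label: 6 (internally 1)", 6)
add_annotated(alignment, "Ind1_all2", "TTTTTGCCCC", "subpop. label: 6 (internally 1)", 6)
```

The assert is what keeps this pattern honest: if rows were ever inserted anywhere other than the end, it fails immediately rather than annotating the wrong record.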
From bugzilla-daemon at portal.open-bio.org Mon Nov 10 11:04:06 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 10 Nov 2008 11:04:06 -0500 Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO In-Reply-To: Message-ID: <200811101604.mAAG46Cj008024@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2643 ------- Comment #18 from dalloliogm at gmail.com 2008-11-10 11:04 EST ------- (In reply to comment #17) > > Or, for another approach which at least avoids private properties but instead > makes an assumption that added sequences are always put at the end of the > alignment: > > alignment = Alignment(generic_dna) > > alignment.add_sequence(id_one, seq_one) > assert alignment[-1].id == id_one > alignment[-1].description = desrc_one > alignment[-1].annotations["label"] = label_one > ... > > alignment.add_sequence(id_two, seq_two) > assert alignment[-1].id == id_two > alignment[-1].description = desrc_two > alignment[-1].annotations["label"] = label_two > ... > yield alignment > Ok!! I ended up using the first method, but I left a comment in the code to remind me that. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Mon Nov 10 11:06:49 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 10 Nov 2008 11:06:49 -0500 Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO In-Reply-To: Message-ID: <200811101606.mAAG6nDL008314@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2643 dalloliogm at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1044 is|0 |1 obsolete| | ------- Comment #19 from dalloliogm at gmail.com 2008-11-10 11:06 EST ------- Created an attachment (id=1050) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1050&action=view) fastPhase output iterator (returns an Alignment object with SeqRecords) This version returns an Alignment object with valid SeqRecord objects, using the Alignment._records.append trick. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Mon Nov 10 11:07:27 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 10 Nov 2008 11:07:27 -0500 Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO In-Reply-To: Message-ID: <200811101607.mAAG7RLr008403@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2643 dalloliogm at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1047 is|0 |1 obsolete| | ------- Comment #20 from dalloliogm at gmail.com 2008-11-10 11:07 EST ------- Created an attachment (id=1051) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1051&action=view) 1047: a doctest file to test fastPhaseOutputIterator updated for attachment 1050 -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Nov 10 11:34:34 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 10 Nov 2008 11:34:34 -0500 Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO In-Reply-To: Message-ID: <200811101634.mAAGYYbi010826@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2643 ------- Comment #21 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-10 11:34 EST ------- Hi Marco, Looking at your example, the important part of the file is this bit: ... BEGIN GENOTYPES Ind1 # subpop. label: 6 (internally 1) T T T T T G A A A C C A A A G A C G C T G C G T C A G C C T G C A A T C T G T T T T T G C C C C C A A A A G C G C G T C G T C A G T C T A A G A C C T A Ind2 # subpop. 
label: 6 (internally 1) C T T T T G C C C T C A A A A G T G C T G T G C C A G T C T A C G G C C T G T T T T T G A A A C C A A A G A C G C T T C G T C A G T A T A C G A T C T A END GENOTYPES Quoting the manual again, "Output files for inferred haplotypes or imputed genotypes contain two lines per given diploid individual, with the order of individuals corresponding to that supplied in the input file." In this example we have two individuals, Ind1 and Ind2 (presumably with automatically assigned names). In a real world example, how many individuals would you expect to use? Does it make more sense to return a pairwise alignment for each individual, rather than one large combined alignment? One of the main points for using iterators/generators is they allow us to deal with very large files by not having to keep everything in memory. Now I don't have a feel for what sized files fastPhase could output - maybe a single large alignment is fine. i.e. One combined alignment: IUPACUnambiguousDNA() alignment with 4 rows and 38 columns TTTTTGAAACCAAAGACGCTGCGTCAGCCTGCAATCTG Ind1_all1 TTTTTGCCCCCAAAAGCGCGTCGTCAGTCTAAGACCTA Ind1_all2 CTTTTGCCCTCAAAAGTGCTGTGCCAGTCTACGGCCTG Ind2_all1 TTTTTGAAACCAAAGACGCTTCGTCAGTATACGATCTA Ind2_all2 versus one pairwise alignment per individual: IUPACUnambiguousDNA() alignment with 2 rows and 38 columns TTTTTGAAACCAAAGACGCTGCGTCAGCCTGCAATCTG Ind1_all1 TTTTTGCCCCCAAAAGCGCGTCGTCAGTCTAAGACCTA Ind1_all2 IUPACUnambiguousDNA() alignment with 2 rows and 38 columns CTTTTGCCCTCAAAAGTGCTGTGCCAGTCTACGGCCTG Ind2_all1 TTTTTGAAACCAAAGACGCTTCGTCAGTATACGATCTA Ind2_all2 I think you'll have to decide this (unless anyone else following this has a view - Tiago maybe?) P.S. Have you tried with and without the -n option to automatically name the individuals? What happens if the name includes a hash character (#)? I would hope fastPhase would treat this as an error, but it could end up in the output file and confuse the parser. P.P.S. 
Based on the examples in the manual, typical output might use lower case nucleotides (a, t, c, g) or numbers (0, 1). I presume upper case nucleotides are also fine, but defaulting to this is a bad idea. Please default to Bio.Alphabet.single_letter_alphabet which seems to be the safest choice (we shouldn't guess). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Nov 10 14:19:15 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 10 Nov 2008 14:19:15 -0500 Subject: [Biopython-dev] [Bug 2649] New: Bio.KDTree expects numpy array with dtype="float32" on 64 bit machines. Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2649 Summary: Bio.KDTree expects numpy array with dtype="float32" on 64 bit machines. Product: Biopython Version: 1.49b Platform: PC OS/Version: Linux Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: paul at rudin.co.uk Bio.KDTree expects numpy array with dtype="float32" on 64 bit machines. The numpy default for floats is "float64" on 64 bit machines and this would seem to be a more natural and practical choice. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Mon Nov 10 17:25:33 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 10 Nov 2008 17:25:33 -0500 Subject: [Biopython-dev] [Bug 2651] New: Error from test_GAQueens.py Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2651 Summary: Error from test_GAQueens.py Product: Biopython Version: 1.49b Platform: PC OS/Version: Linux Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: bsouthey at gmail.com I got this error with Python2.5 but it is extremely rare. I think that I seen it before but have never reproduced it. It indicates some bugs are lurking other than the obvious bug with Seq.py that are being triggered by the test. ====================================================================== ERROR: test_GAQueens ---------------------------------------------------------------------- Traceback (most recent call last): File "run_tests.py", line 125, in runTest self.runSafeTest() File "run_tests.py", line 142, in runSafeTest cur_test.run_tests([]) File "test_GAQueens.py", line 42, in run_tests main(arguments) File "test_GAQueens.py", line 76, in main evolved_pop = evolver.evolve(queens_solved) File "/home/bsouthey/python/biopython-1.49b/build/lib.linux-x86_64-2.5/Bio/GA/Evolver.py", line 56, in evolve self._population = self._selector.select(self._population) File "/home/bsouthey/python/biopython-1.49b/build/lib.linux-x86_64-2.5/Bio/GA/Selection/Tournament.py", line 77, in select new_orgs[1]) File "/home/bsouthey/python/biopython-1.49b/build/lib.linux-x86_64-2.5/Bio/GA/Selection/Abstract.py", line 53, in mutate_and_crossover final_org_1 = self._repairer.repair(final_org_1) File "test_GAQueens.py", line 234, in repair duplicated_items = self._get_duplicates(organism.genome) File "test_GAQueens.py", line 203, in _get_duplicates if genome.count(item) > 1: File 
"/home/bsouthey/python/biopython-1.49b/build/lib.linux-x86_64-2.5/Bio/Seq.py", line 796, in count if len(search) == 1 : TypeError: object of type 'int' has no len() ---------------------------------------------------------------------- -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Nov 10 18:28:26 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 10 Nov 2008 18:28:26 -0500 Subject: [Biopython-dev] [Bug 2651] Error from test_GAQueens.py In-Reply-To: Message-ID: <200811102328.mAANSQiJ032135@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2651 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Severity|normal |minor Component|Main Distribution |Unit Tests ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-10 18:28 EST ------- What bug in Seq? Trying to call the count method with an integer argument instead of string or another Seq should fail - try it on a string for comparison: >>> "123456".count(1) Traceback (most recent call last): File "", line 1, in ? TypeError: expected a character buffer object I would agree that the TypeError message could be better, "object of type 'int' has no len()" is a little misleading. Are you suggesting that be changed? Genetic algorithms (with a random seed at least) are non deterministic - I've seen some of the GA unit tests fail every so often (but I'm not sure off hand if its just test_GAQueens or not). Rerunning the test will usually be fine. The traceback looks familiar so its probably the same issue, but I haven't had the time or desire to trace through the code to try and work out what is going wrong. 
I would guess it fails far less than 10% of time, but maybe 1% or 2%. I guess a quick shell script would answer this ;) Maybe we should catch the error condition and issue a runtime error saying "Didn't converge" or whatever would be appropriate terminology. Or automatically restart the test? Or, maybe we can solve the unit test failure by specifying a random seed - that might be a neat solution. N.B. Refiling under unit tests. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Nov 10 21:30:46 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 10 Nov 2008 21:30:46 -0500 Subject: [Biopython-dev] [Bug 2651] Error from test_GAQueens.py In-Reply-To: Message-ID: <200811110230.mAB2Ukq2020297@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2651 ------- Comment #2 from bsouthey at gmail.com 2008-11-10 21:30 EST ------- (In reply to comment #1) > What bug in Seq? Trying to call the count method with an integer argument > instead of string or another Seq should fail - try it on a string for > comparison: > > >>> "123456".count(1) > Traceback (most recent call last): > File "", line 1, in ? > TypeError: expected a character buffer object > > I would agree that the TypeError message could be better, "object of type 'int' > has no len()" is a little misleading. Are you suggesting that be changed? That is an 'obvious' bug (in light of the error) because there is no check for that 'sub' is a string. 
Using the example from the docstring: my_mseq = MutableSeq("AAAATGA") my_mseq.count(1) Traceback (most recent call last): File "", line 1, in File "/usr/lib64/python2.5/site-packages/Bio/Seq.py", line 722, in count if len(search) == 1 : TypeError: object of type 'int' has no len() Note that using a dict or list work but perhaps these should not. I think you need to check that 'search' is a string (isinstance(search,basestring)). If not, then fail with some more informative message. > > Genetic algorithms (with a random seed at least) are non deterministic - I've > seen some of the GA unit tests fail every so often (but I'm not sure off hand > if its just test_GAQueens or not). Rerunning the test will usually be fine. > The traceback looks familiar so its probably the same issue, but I haven't had > the time or desire to trace through the code to try and work out what is going > wrong. I would guess it fails far less than 10% of time, but maybe 1% or 2%. > I guess a quick shell script would answer this ;) > > Maybe we should catch the error condition and issue a runtime error saying > "Didn't converge" or whatever would be appropriate terminology. Or > automatically restart the test? Or, maybe we can solve the unit test failure > by specifying a random seed - that might be a neat solution. > > N.B. Refiling under unit tests. > I agree with doing one or more of these at least until the source is identified (hopefully a known case). But I do agree that this is not easy to find and I do not know anything to help. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
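[Editorial sketch] The type check suggested in comment #2 might look roughly like the following. `count_occurrences` is a hypothetical helper written for this note, not the actual `Seq.count` code, and it uses Python 3's `str` where the 2008 suggestion used `basestring`.

```python
def count_occurrences(sequence, search):
    """Count non-overlapping occurrences of `search` in `sequence` (both plain strings)."""
    # Fail early with an informative message instead of letting len() raise
    # "object of type 'int' has no len()" further down.
    if not isinstance(search, str):
        raise TypeError(
            "expected a string to search for, got %s" % type(search).__name__
        )
    return sequence.count(search)
```

With this guard, passing an integer raises a TypeError that names the offending type, which is closer to what the built-in string method does.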
From bugzilla-daemon at portal.open-bio.org Tue Nov 11 05:10:45 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 11 Nov 2008 05:10:45 -0500 Subject: [Biopython-dev] [Bug 2651] Error from test_GAQueens.py In-Reply-To: Message-ID: <200811111010.mABAAjQq029851@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2651 ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-11 05:10 EST ------- (In reply to comment #2) >(In reply to comment #1) >> What bug in Seq? Trying to call the count method with an integer argument >> instead of string or another Seq should fail - try it on a string for >> comparison: >> >> >>> "123456".count(1) >> Traceback (most recent call last): >> File "", line 1, in ? >> TypeError: expected a character buffer object >> >> I would agree that the TypeError message could be better, "object of type >> 'int' has no len()" is a little misleading. Are you suggesting that be >> changed? > > That is an 'obvious' bug (in light of the error) because there is no check for > that 'sub' is a string. Using the example from the docstring: > my_mseq = MutableSeq("AAAATGA") > my_mseq.count(1) > Traceback (most recent call last): > File "", line 1, in > File "/usr/lib64/python2.5/site-packages/Bio/Seq.py", line 722, in count > if len(search) == 1 : > TypeError: object of type 'int' has no len() > > Note that using a dict or list work but perhaps these should not. I think you > need to check that 'search' is a string (isinstance(search,basestring)). If > not, then fail with some more informative message. That's done in CVS. Leaving this bug open to cover the test_GAQueens.py issue. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
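[Editorial sketch] The random-seed idea floated in comment #1 can be illustrated with the standard library alone. `toy_stochastic_search` is a hypothetical stand-in for a randomised test, not the test_GAQueens code; the only point is that a private, seeded generator makes a run repeatable.

```python
import random


def toy_stochastic_search(rng, size=8, attempts=50):
    """Hypothetical randomised search: best of several random permutations."""
    best = None
    for _ in range(attempts):
        candidate = list(range(size))
        rng.shuffle(candidate)
        # Toy objective: number of positions where the value equals its index.
        score = sum(1 for index, value in enumerate(candidate) if index == value)
        if best is None or score > best[0]:
            best = (score, candidate)
    return best


# Two runs with the same seed follow exactly the same random path.
first = toy_stochastic_search(random.Random(42))
second = toy_stochastic_search(random.Random(42))
```

A fixed seed would make any failure reproducible rather than remove its cause, so it fits best alongside the other options raised in the thread, not as a replacement for them.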
From bugzilla-daemon at portal.open-bio.org Tue Nov 11 06:30:16 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 11 Nov 2008 06:30:16 -0500 Subject: [Biopython-dev] [Bug 2652] New: Bio.Fasta.Iterator fails with IndexError when opening empty fasta files Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2652 Summary: Bio.Fasta.Iterator fails with IndexError when opening empty fasta files Product: Biopython Version: Not Applicable Platform: PC OS/Version: Linux Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: rjalves at igc.gulbenkian.pt Instead of IndexError a better error handling or at least a more explicit error message. At the first look it's not obvious what is causing the error. Example: In [1]: from Bio import Fasta In [2]: Fasta.Iterator(open("empty.fasta")) --------------------------------------------------------------------------- IndexError Traceback (most recent call last) /var/lib/python-support/python2.5/Bio/Fasta/__init__.pyc in __init__(self, handle, parser, debug) 65 while True : 66 line = handle.readline() ---> 67 if line[0] == ">" : 68 break 69 if debug : print "Skipping: " + line IndexError: string index out of range -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
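[Editorial sketch] The traceback above comes from indexing an empty string: `readline()` returns an empty string at end of file, so `line[0]` fails on an empty file. A minimal guarded version of that loop, using a hypothetical helper name and Python 3 syntax in place of the original Bio.Fasta code:

```python
import io


def skip_to_first_record(handle):
    """Return the first line starting with '>', or '' if the handle has no records."""
    while True:
        line = handle.readline()
        # Test for end of file before indexing, so an empty file cannot
        # raise IndexError.
        if not line or line[0] == ">":
            return line


empty_result = skip_to_first_record(io.StringIO(""))
normal_result = skip_to_first_record(io.StringIO("junk\n>seq1\nACGT\n"))
```

Here an empty handle yields an empty string and a handle with leading junk yields the first header line, so the caller can decide how to treat a file with no records.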
From bugzilla-daemon at portal.open-bio.org Tue Nov 11 06:30:45 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 11 Nov 2008 06:30:45 -0500 Subject: [Biopython-dev] [Bug 2652] Bio.Fasta.Iterator fails with IndexError when opening empty fasta files In-Reply-To: Message-ID: <200811111130.mABBUjf8003203@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2652 rjalves at igc.gulbenkian.pt changed: What |Removed |Added ---------------------------------------------------------------------------- Version|Not Applicable |1.45 -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Nov 11 06:55:07 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 11 Nov 2008 06:55:07 -0500 Subject: [Biopython-dev] [Bug 2652] Bio.Fasta.Iterator fails with IndexError when opening empty fasta files In-Reply-To: Message-ID: <200811111155.mABBt7Hf005132@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2652 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-11 06:55 EST ------- Hi Renato, This bug in Bio.Fasta with empty files was fixed in Biopython 1.49b, see Bio/Fasta/__init__.py revision 1.19. http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Fasta/__init__.py?cvsroot=biopython#rev1.19 I would encourage you to try Biopython 1.49b, but if you have a reason for running an old version like Biopython 1.45, you could probably update just this one file instead. 
Ask if you would like specific instructions, but essentially its a one line change, from: if line[0] == ">" : to: if not line or line[0] == ">" : Please note that Bio.Fasta is considered to be obsolete (and was explicitly documented as such as of Biopython 1.48), and may one day be deprecated. However, given this was the main FASTA parsing code in Biopython for some years, we're not going to deprecate it just yet, so you should be OK continuing to use Bio.Fasta in old scripts for a while yet. For new code, we encourage people to use Bio.SeqIO instead, described in the current tutorial and on the wiki: http://biopython.org/DIST/docs/tutorial/Tutorial.html http://biopython.org/DIST/docs/tutorial/Tutorial.pdf http://biopython.org/wiki/SeqIO Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Nov 11 07:08:37 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 11 Nov 2008 07:08:37 -0500 Subject: [Biopython-dev] [Bug 2649] Bio.KDTree expects numpy array with dtype="float32" on 64 bit machines. In-Reply-To: Message-ID: <200811111208.mABC8bHw006251@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2649 ------- Comment #1 from mdehoon at ims.u-tokyo.ac.jp 2008-11-11 07:08 EST ------- I've uploaded a fixed version to CVS; see KDTree.py and KDTreemodule.c at http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/KDTree/?cvsroot=biopython Could you try with these files and see if they work for you? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From biopython at maubp.freeserve.co.uk Tue Nov 11 08:02:18 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 11 Nov 2008 13:02:18 +0000 Subject: [Biopython-dev] [BioPython] Cannot __add__ two DBSeq objects In-Reply-To: <7265d4f0811110439h6c18e111te97d23070565cca2@mail.gmail.com> References: <7265d4f0811110439h6c18e111te97d23070565cca2@mail.gmail.com> Message-ID: <320fb6e00811110502y624cf6c1r52c316d61a1f7228@mail.gmail.com> On Tue, Nov 11, 2008 at 12:39 PM, Cymon Cox wrote: > Hi All, > > Two DBSeq objects cannot be concatenated, although the DBSeq object inherits > __add__ from Seq. Interesting point - not something I'd considered (nor anyone else until now!) > It tries to init a new DBSeq object rather than returning a Seq object as would be expected. > ... > Presumably, DBSeq needs to overide Seq.__add__ > (Using CVS as of yesterday...) Clearly we can't create a new DBSeq object (there wouldn't be any suitable sequence in the database to point to), and returning a Seq object is sensible. We should probably continue this discussion on the dev mailing list (CC'd). Either we have the DBSeq override the __add__ method (and __radd__), or we could make the base Seq class always use new Seq objects in __add__ etc. This would affect anyone writing their own Seq subclass... On balance, I think you're right and its DBSeq which needs to be changed. Would you like to tackle this, or should I? We'd also want to extend the BioSQL unit test to cover adding DBSeq+DBSeq, DBSeq+Seq, Seq+DBSeq, DBSeq+MutableSeq, MutableSeq+DBSeq, etc. 
Peter From bugzilla-daemon at portal.open-bio.org Tue Nov 11 09:48:14 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 11 Nov 2008 09:48:14 -0500 Subject: [Biopython-dev] [Bug 2652] Bio.Fasta.Iterator fails with IndexError when opening empty fasta files In-Reply-To: Message-ID: <200811111448.mABEmEba019180@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2652 ------- Comment #2 from rjalves at igc.gulbenkian.pt 2008-11-11 09:48 EST ------- Hi Peter, I am using the Biopython package from the debian-lenny repository (which is 1.45), I guess they haven't updated in part due to the change to the Numpy. I will checkout the svn version then. As for why I'm using Bio.Fasta, I'm not using it directly. Bio.SeqUtils.CodonUsage.CodonAdaptationIndex.cai_for_gene() calls it. Renato -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Tue Nov 11 09:53:32 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 11 Nov 2008 14:53:32 +0000 Subject: [Biopython-dev] [BioPython] Cannot __add__ two DBSeq objects In-Reply-To: <320fb6e00811110502y624cf6c1r52c316d61a1f7228@mail.gmail.com> References: <7265d4f0811110439h6c18e111te97d23070565cca2@mail.gmail.com> <320fb6e00811110502y624cf6c1r52c316d61a1f7228@mail.gmail.com> Message-ID: <320fb6e00811110653u63e85bc6k572d5fa42ede8280@mail.gmail.com> On Tue, Nov 11, 2008 at 1:02 PM, Peter wrote: > On Tue, Nov 11, 2008 at 12:39 PM, Cymon Cox wrote: >> Hi All, >> >> Two DBSeq objects cannot be concatenated, although the DBSeq object inherits >> __add__ from Seq. > > Interesting point - not something I'd considered (nor anyone else until now!) > >> It tries to init a new DBSeq object rather than returning a Seq object as would be expected. >> ... 
>> Presumably, DBSeq needs to overide Seq.__add__ >> (Using CVS as of yesterday...) > > Clearly we can't create a new DBSeq object (there wouldn't be any > suitable sequence in the database to point to), and returning a Seq > object is sensible. We should probably continue this discussion on > the dev mailing list (CC'd). Fixed in CVS by implementing the __add__ and __radd__ methods in the DBSeq object, and having these simply off load the work to the Seq class. See: BioSQL/BioSeq.py revision: 1.28 Tests/test_BioSQL.py revision: 1.26 Tests/output/test_BioSQL revision: 1.2 Peter From bugzilla-daemon at portal.open-bio.org Tue Nov 11 10:28:20 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 11 Nov 2008 10:28:20 -0500 Subject: [Biopython-dev] [Bug 2652] Bio.Fasta.Iterator fails with IndexError when opening empty fasta files In-Reply-To: Message-ID: <200811111528.mABFSK8A022517@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2652 ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-11 10:28 EST ------- (In reply to comment #2) > I am using the Biopython package from the debian-lenny repository (which is > 1.45), I guess they haven't updated in part due to the change to the Numpy. I > will checkout the svn version then. Debian sid is using Biopython 1.47, I think lenny is just very conservative. If you don't mind installing NumPy and trying to install Biopython from source, then you could either try getting the latest Biopython code from CVS, or try Biopython 1.49 beta which was released just a few days ago. Ask on the mailing list if you get stuck. > As for why I'm using Bio.Fasta, I'm not using it directly. > Bio.SeqUtils.CodonUsage.CodonAdaptationIndex.cai_for_gene() calls it. Oh - thanks for that. I've just updated Bio/SeqUtils/CodonUsage.py to use Bio.SeqIO instead of Bio.Fasta (plus added a basic check of this module to our unit tests). 
Peter [Leaving this bug as resolved fixed] -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Nov 11 10:43:05 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 11 Nov 2008 10:43:05 -0500 Subject: [Biopython-dev] [Bug 2652] Bio.Fasta.Iterator fails with IndexError when opening empty fasta files In-Reply-To: Message-ID: <200811111543.mABFh5x8023530@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2652 ------- Comment #4 from rjalves at igc.gulbenkian.pt 2008-11-11 10:43 EST ------- Thanks Biopython 1.49b installed without any problems -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Nov 11 10:43:15 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 11 Nov 2008 10:43:15 -0500 Subject: [Biopython-dev] [Bug 2652] Bio.Fasta.Iterator fails with IndexError when opening empty fasta files In-Reply-To: Message-ID: <200811111543.mABFhFBp023551@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2652 rjalves at igc.gulbenkian.pt changed: What |Removed |Added ---------------------------------------------------------------------------- Status|RESOLVED |CLOSED -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
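
[Editorial sketch for the bug 2652 thread above.] The report was an IndexError when iterating over an empty FASTA file. The snippet below is a minimal stand-alone illustration of an iterator that treats empty input as "zero records" instead of failing. It is not the Bio.Fasta or Bio.SeqIO implementation; the name iter_fasta is hypothetical and only the standard library is assumed.

```python
from io import StringIO


def iter_fasta(handle):
    """Yield (title, sequence) tuples from a FASTA handle.

    Hypothetical helper for illustration only. An empty handle simply
    yields nothing, because no record is emitted until a '>' line is seen.
    """
    title = None
    chunks = []
    for line in handle:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if title is not None:
                yield title, "".join(chunks)
            title = line[1:]
            chunks = []
        elif title is not None:
            # Sequence lines before the first header are ignored.
            chunks.append(line.strip())
    if title is not None:
        yield title, "".join(chunks)


empty_records = list(iter_fasta(StringIO("")))
two_records = list(iter_fasta(StringIO(">a desc\nAC\nGT\n>b\nTT\n")))
```

The design point is that the loop never indexes into a line that might not exist, so an empty file needs no special case.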
From bugzilla-daemon at portal.open-bio.org Tue Nov 11 10:46:13 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 11 Nov 2008 10:46:13 -0500 Subject: [Biopython-dev] [Bug 2653] New: Bio.SeqUtils.CodonUsage is not translation table aware Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2653 Summary: Bio.SeqUtils.CodonUsage is not translation table aware Product: Biopython Version: Not Applicable Platform: All OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk Looking at Bio/SeqUtils/CodonUsage.py there is a hard coded dictionary SynonymousCodons, presumably for the standard genetic code. Ideally Bio.SeqUtils.CodonUsage should support any of the genetic code tables defined in Bio.Data.CodonTable, perhaps via an optional initiation argument to the CodonAdaptationIndex object. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Nov 11 13:09:20 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 11 Nov 2008 13:09:20 -0500 Subject: [Biopython-dev] [Bug 2653] Bio.SeqUtils.CodonUsage is not translation table aware In-Reply-To: Message-ID: <200811111809.mABI9KXq004974@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2653 rjalves at igc.gulbenkian.pt changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |rjalves at igc.gulbenkian.pt ------- Comment #1 from rjalves at igc.gulbenkian.pt 2008-11-11 13:09 EST ------- Thanks for the heads up Peter. Also related to the reference codon table used... 
There is the possibility of a codon being completely absent in all given sequences. In this case the CodonAdaptationIndex.generate_index() function fails with a ZeroDivisionError on line 90. The resource at http://phenotype.biosci.umbc.edu/index.php?page=What_is_CAI might give some good indications on how to work around this and also other (improved?) implementations of CAI. Obviously if you use a different SynonymousCodons table the picture may change. Renato. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Nov 12 06:14:27 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 12 Nov 2008 06:14:27 -0500 Subject: [Biopython-dev] [Bug 2640] Proposal: doctest for SeqRecord/biopython In-Reply-To: Message-ID: <200811121114.mACBER3k002184@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2640 ------- Comment #15 from dalloliogm at gmail.com 2008-11-12 06:14 EST ------- (In reply to comment #13) > (From update of attachment 1033 [details]) > Something similar was checked into CVS. > (In reply to comment #13) > (From update of attachment 1033 [details]) > Something similar was checked into CVS. > I saw the changes now! ok.. But I would prefer to put the doctest in the main __doc__ of the function instead of __init__ and __repr__. This is because otherwise they wouldn't be accessible by the users with the help function. Usually you do help(SeqRecord), not help(SeqRecord.__init__). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Wed Nov 12 06:47:25 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 12 Nov 2008 06:47:25 -0500 Subject: [Biopython-dev] [Bug 2640] Proposal: doctest for SeqRecord/biopython In-Reply-To: Message-ID: <200811121147.mACBlP4T005886@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2640 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|RESOLVED |REOPENED Resolution|FIXED | ------- Comment #16 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-12 06:47 EST ------- (In reply to comment #15) > I saw the changes now! The CVS website is updated once an hour, you track this on http://biopython.org/wiki/Tracking_CVS_commits which displays the RSS feed, http://biopython.open-bio.org/CVS2RSS/biopython.rss (this works great apart from the links when more than one file is changed). > ok.. But I would prefer to put the doctest in the main __doc__ of > the function instead of __init__ and __repr__. > This is because otherwise they wouldn't be accessible by the users with the > help function. Usually you do help(SeqRecord), not help(SeqRecord.__init__). If you do help(object) it shows you the main docstring followed by all the methods and their docstrings (including __init__). On the other hand all the special methods like __init__, __str__, __repr__ etc are going to be confusing for a beginner. On balance, a short example in the main docstring (covering __init__) does seem sensible, and perhaps the __init__ example is then redundant. Does anyone else want to comment? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
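
[Editorial sketch for the bug 2640 discussion above.] The suggestion on the table is a short example in the class docstring that also covers __init__ and __repr__, so that help(SomeClass) shows it first. A rough illustration follows, assuming a made-up Record class (not Bio.SeqRecord) and a hypothetical helper run_class_doctest built on the standard doctest module.

```python
import doctest


class Record(object):
    """Hypothetical stand-in for a record class (not Bio.SeqRecord).

    The example sits in the class docstring and exercises both
    __init__ and __repr__:

    >>> rec = Record("ACGT", id="demo")
    >>> rec
    Record('ACGT', id='demo')
    >>> len(rec)
    4
    """

    def __init__(self, seq, id="<unknown id>"):
        self.seq = seq
        self.id = id

    def __repr__(self):
        return "%s(%r, id=%r)" % (self.__class__.__name__, self.seq, self.id)

    def __len__(self):
        return len(self.seq)


def run_class_doctest(cls):
    """Run only the examples in cls.__doc__; return (failed, attempted)."""
    parser = doctest.DocTestParser()
    # The class is made available to the examples under its own name.
    test = parser.get_doctest(cls.__doc__, {cls.__name__: cls},
                              cls.__name__, None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    result = runner.run(test)
    return result.failed, result.attempted


doctest_result = run_class_doctest(Record)
```

With this layout the special methods need no separate doctests of their own, which is the redundancy Peter mentions for the __init__ example.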
From cymon.cox at googlemail.com Wed Nov 12 05:57:12 2008 From: cymon.cox at googlemail.com (Cymon Cox) Date: Wed, 12 Nov 2008 10:57:12 +0000 Subject: [Biopython-dev] BioSQL buglets Message-ID: <7265d4f0811120257y241f67fl514b77cb03712552@mail.gmail.com> All, Selects on the seqfeature_qualifier_value and dbxref tables were not being ordered by rank. This caused multiple qualifier values to be out of order which in turn caused the tests to fail - see comment in http://bugzilla.open-bio.org/show_bug.cgi?id=2616 This also solves a TODO in the test_BioSQL_SeqIO.py: 85 +#TODO - Pin down the "Duplicate entry" IntegrityError from this: 86 +# ("genbank",False, 'GenBank/cor6_6.gb', 6), This test now works and Ive generated new output. In test_BioSQL.py create_database(), postgres returns an error string that 'find's on index 0 when the the database doesnt exist. The comparision therefore needs to be >= 0 rather than >0. All tests now pass OK with postgresql/psycopg2. Patch attached. Cheers, C. -- -------------- next part -------------- A non-text attachment was scrubbed... Name: biosql.patch Type: text/x-patch Size: 5105 bytes Desc: not available URL: From bugzilla-daemon at portal.open-bio.org Wed Nov 12 08:12:24 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 12 Nov 2008 08:12:24 -0500 Subject: [Biopython-dev] [Bug 2616] BioSQL support for Psycopg2 In-Reply-To: Message-ID: <200811121312.mACDCOdj011669@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2616 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #11 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-12 08:12 EST ------- (In reply to comment #10) > > We still need to sort out the feature qualifiers loss of ordering... 
>
Fixed in CVS with another patch from Cymon (via the mailing list).

-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From biopython at maubp.freeserve.co.uk Wed Nov 12 08:13:16 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 12 Nov 2008 13:13:16 +0000
Subject: [Biopython-dev] BioSQL buglets
In-Reply-To: <7265d4f0811120257y241f67fl514b77cb03712552@mail.gmail.com>
References: <7265d4f0811120257y241f67fl514b77cb03712552@mail.gmail.com>
Message-ID: <320fb6e00811120513p3be878b8pe0c5a48fa3945ff5@mail.gmail.com>

On Wed, Nov 12, 2008 at 10:57 AM, Cymon Cox wrote:
> All,
>
> Selects on the seqfeature_qualifier_value and dbxref tables were not being
> ordered by rank. This caused multiple qualifier values to be out of order
> which in turn caused the tests to fail - see comment in
> http://bugzilla.open-bio.org/show_bug.cgi?id=2616
>
> This also solves a TODO in the test_BioSQL_SeqIO.py:
>
> 85 +#TODO - Pin down the "Duplicate entry" IntegrityError from this:
> 86 +# ("genbank",False, 'GenBank/cor6_6.gb', 6),
>
> This test now works and I've generated new output.
>
> In test_BioSQL.py create_database(), postgres returns an error string that
> 'find's on index 0 when the database doesn't exist. The comparison
> therefore needs to be >= 0 rather than >0.
>
> All tests now pass OK with postgresql/psycopg2.
> Patch attached.
>
> Cheers, C.

Excellent - that patch made perfect sense and I've checked it in
(almost as is - I tweaked the find index bit slightly). Thank you!

At this rate you'll be co-opted as an official maintainer for the
BioSQL module ;)

Peter

P.S. It might have been better to upload the patch to Bug 2616 (or a
new Bug) rather than sending it to everyone on the mailing list.
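
[Editorial sketch for the "find index" point above.] str.find returns the index of the first match, or -1 when there is none, so a match at the very start of the string has index 0 and is missed by a "> 0" test. The snippet below shows the difference. The error text and both function names are made up for illustration and are not taken from test_BioSQL.py or from PostgreSQL.

```python
# Made-up error text; the real server message may differ.
ERROR_TEXT = 'database "demo" does not exist'
MARKER = 'database "demo"'


def mentions_marker_buggy(message, marker):
    """Misses a match at index 0, because 0 > 0 is False."""
    return message.find(marker) > 0


def mentions_marker_fixed(message, marker):
    """find() returns -1 only when the marker is absent, so test >= 0."""
    return message.find(marker) >= 0


buggy = mentions_marker_buggy(ERROR_TEXT, MARKER)
fixed = mentions_marker_fixed(ERROR_TEXT, MARKER)
# The 'in' operator expresses the same membership test without an index.
idiomatic = MARKER in ERROR_TEXT
```

Where only a yes/no answer is needed, the `in` form avoids the off-by-one question altogether.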
From bugzilla-daemon at portal.open-bio.org Wed Nov 12 10:35:54 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 12 Nov 2008 10:35:54 -0500 Subject: [Biopython-dev] [Bug 2640] Proposal: doctest for SeqRecord/biopython In-Reply-To: Message-ID: <200811121535.mACFZsMl021458@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2640 ------- Comment #17 from dalloliogm at gmail.com 2008-11-12 10:35 EST ------- (In reply to comment #16) > (In reply to comment #15) > > I saw the changes now! > > The CVS website is updated once an hour, you track this on > http://biopython.org/wiki/Tracking_CVS_commits which displays the RSS feed, > http://biopython.open-bio.org/CVS2RSS/biopython.rss (this works great apart > from the links when more than one file is changed). > > > ok.. But I would prefer to put the doctest in the main __doc__ of > > the function instead of __init__ and __repr__. > > This is because otherwise they wouldn't be accessible by the users with the > > help function. Usually you do help(SeqRecord), not help(SeqRecord.__init__). > > If you do help(object) it shows you the main docstring followed by all the > methods and their docstrings (including __init__). > > On the other hand all the special methods like __init__, __str__, __repr__ etc > are going to be confusing for a beginner. > > On balance, a short example in the main docstring (covering __init__) does seem > sensible, and perhaps the __init__ example is then redundant. well, I was saying that maybe it would be better to move the doctests in __init__ and __repr__ to the main __doc__ of the module. So it will be visible by people using help(module). Moreover, you can to test __repr__ and __init__ from there, without having to repeat the 'from Bio.ALign.Generic import Alignment' stuff and similar every time. 
as for a few comments you added in Bio.Align.Generic: > #A doctest for __repr__ would be nice, but __class__ comes out differently > #if run via the __main__ trick. maybe you can use the '+ELLIPSIS' directive and about this comment: #A doctest would be nice, but the stuff is very ugly! #The "tab" format is possible, but tabs don't seem to work nicely in doctests. you could use the directive NORMALIZE_WHITESPACE in a similar way. I am attaching a file just to give you an example of how it could be with +ELLIPSIS > Does anyone else want to comment? > -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Nov 12 10:36:37 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 12 Nov 2008 10:36:37 -0500 Subject: [Biopython-dev] [Bug 2640] Proposal: doctest for SeqRecord/biopython In-Reply-To: Message-ID: <200811121536.mACFabdk021517@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2640 ------- Comment #18 from dalloliogm at gmail.com 2008-11-12 10:36 EST ------- Created an attachment (id=1052) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1052&action=view) example of ellipsis directive Example of doctest with ellipsis directive to test Alignment.__repr__ -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From dalloliogm at gmail.com Wed Nov 12 11:25:47 2008 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Wed, 12 Nov 2008 17:25:47 +0100 Subject: [Biopython-dev] a sequence set object in biopython? Message-ID: <5aa3b3570811120825y6ed11c00y384751e8f0f7adff@mail.gmail.com> Hi, I think it could be useful to add a generic SequenceSet object in biopython. 
Such an object would represent a generic set of sequences, and could have some useful methods like .format('fasta') or .align('alignment_tool'). Is there something similar available already? I have noticed that the actual Generic.Alignment is very similar to such an object. However, it would be better to be able to work with a separated class, because sometimes you want to deal with sequences that are not aligned. Some use cases: - a set of sequences that represents all introns in a particular gene, on which I want to calculate the conservation of the splicing regulatory sites. - all genes sequences in an organisms, which I want to convert in EMBL format - a set of seqs to be aligned or used as input for other tools etc.. -- ----------------------------------------------------------- My Blog on Bioinformatics (italian): http://bioinfoblog.it From bugzilla-daemon at portal.open-bio.org Wed Nov 12 11:29:07 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 12 Nov 2008 11:29:07 -0500 Subject: [Biopython-dev] [Bug 2552] Adding alignments In-Reply-To: Message-ID: <200811121629.mACGT7gs025634@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2552 cymon.cox at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |cymon.cox at gmail.com ------- Comment #1 from cymon.cox at gmail.com 2008-11-12 11:29 EST ------- (In reply to comment #0) > This is related to the very broad alignment bug 1944. > > Given two alignments, it can make sense to talk about adding them together. Actually, this is a very common procedure in phylogenetic analyses, where multiple genes/loci are combined into a "super" matrix for a set of taxa. Although, in this case, adding by column, if a taxon/row/identifier was missing in a particular (sub-)alignment it would be filled by "-" (missing data) in the combined matrix. 
Anyway, I think this would be a very useful enhancement. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Wed Nov 12 12:53:35 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 12 Nov 2008 17:53:35 +0000 Subject: [Biopython-dev] [BioPython] a sequence set object in biopython? In-Reply-To: <5aa3b3570811120825y6ed11c00y384751e8f0f7adff@mail.gmail.com> References: <5aa3b3570811120825y6ed11c00y384751e8f0f7adff@mail.gmail.com> Message-ID: <320fb6e00811120953t57c206e7nd0c8151b92361d5a@mail.gmail.com> On Wed, Nov 12, 2008 at 4:25 PM, Giovanni Marco Dall'Olio wrote: > Hi, > I think it could be useful to add a generic SequenceSet object in biopython. > Such an object would represent a generic set of sequences, and could > have some useful methods like .format('fasta') or > .align('alignment_tool'). > Is there something similar available already? Given your example to turn the SequenceSet into a FASTA file, then clearly you are thinking of a collection of SeqRecord objects rather than just Seq objects. For this kind of thing I personally just use a list of SeqRecord objects. If I want to turn a list of SeqRecord objects into a FASTA file, I can pass the list to the Bio.SeqIO.write() function. Once I've made a FASTA file, I can call an external tool to align them - and then load them in again using Bio.AlignIO or Bio.SeqIO depending on what I plan to do next. > I have noticed that the actual Generic.Alignment is very similar to > such an object. However, it would be better to be able to work with a > separated class, because sometimes you want to deal with sequences > that are not aligned. Yes, the generic alignment is basically a list of SeqRecord objects plus some extra functionality like column access. 
> Some use cases: > - a set of sequences that represents all introns in a particular gene, > on which I want to calculate the conservation of the splicing > regulatory sites. > - all genes sequences in an organisms, which I want to convert in EMBL format > - a set of seqs to be aligned or used as input for other tools > etc.. All sensible use cases - but all seem to be covered by a simple python list of SeqRecord objects, or in some cases a list of Seq objects (e.g. the introns example, as I doube the introns have names). Peter From tiagoantao at gmail.com Wed Nov 12 13:02:11 2008 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Wed, 12 Nov 2008 18:02:11 +0000 Subject: [Biopython-dev] PopGen status and new developments Message-ID: <6d941f120811121002k75c8ab43g54ebeb968342648b@mail.gmail.com> Hi, This an email with the status of current PopGen developments. In some points, advice is especially welcome. A. Platform support As Peter noticed there is no Simcoal for the Mac. In a couple of weeks I hope to have access to a Mac in order to try to compile it. In any case I wont be able to distribute it without getting permission from the authors, so the problem might remain... I am now preparing support for LDNe, an application to estimate Ne (effective population size) from LD. This application is Dos(Windows) only. Source code is not available to the public (but the app is free as free beer). I've had access to the source and compiled a Linux version, again, I don't know if the author will let me distribute it. Question: How do people feel about supporting an application like this? Any strong feelings against? B. New developments 1. The above LDNe module is fully coded, and being tested by a few people (not just me). Test code and documentation TBD but easy. 2. Genepop application support (no confusion with file format support, which is done). Partially done and informally tested. Plan to start with just partial support. 3. Fstat parser. Coded. C. 
Statistics An ongoing interesting discussion started on statistics. I am delayed with doing a proposal to handle statistical processing (my bad, but I will have some free time in the next couple of weeks and I will try to recover). My current existing code on the subject is available on Github (by Giovanni), but I think it will need some change (not in the functionality, but in the architecture). From biopython at maubp.freeserve.co.uk Wed Nov 12 13:06:19 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 12 Nov 2008 18:06:19 +0000 Subject: [Biopython-dev] [BioPython] a sequence set object in biopython? In-Reply-To: <320fb6e00811120953t57c206e7nd0c8151b92361d5a@mail.gmail.com> References: <5aa3b3570811120825y6ed11c00y384751e8f0f7adff@mail.gmail.com> <320fb6e00811120953t57c206e7nd0c8151b92361d5a@mail.gmail.com> Message-ID: <320fb6e00811121006mbe32efar2fca638d1a5fe2ef@mail.gmail.com> On Wed, Nov 12, 2008 at 5:53 PM, Peter wrote: > On Wed, Nov 12, 2008 at 4:25 PM, Giovanni Marco Dall'Olio > wrote: >> Hi, >> I think it could be useful to add a generic SequenceSet object in biopython. >> Such an object would represent a generic set of sequences, and could >> have some useful methods like .format('fasta') or >> .align('alignment_tool'). >> Is there something similar available already? > > Given your example to turn the SequenceSet into a FASTA file, then > clearly you are thinking of a collection of SeqRecord objects rather > than just Seq objects. For this kind of thing I personally just use a > list of SeqRecord objects. > > If I want to turn a list of SeqRecord objects into a FASTA file, I can > pass the list to the Bio.SeqIO.write() function. Once I've made a > FASTA file, I can call an external tool to align them - and then load > them in again using Bio.AlignIO or Bio.SeqIO depending on what I plan > to do next. 
If you really want a list like object with a format method in your
code, how about something like this:

class SeqRecordList(list) :
    """Subclass of the python list, to hold SeqRecord objects only."""
    #TODO - Override the list methods to make sure all the items
    #are indeed SeqRecord objects

    def format(self, format) :
        """Returns a string of all the records in a requested file format.

        The argument format should be any file format supported by
        the Bio.SeqIO.write() function. This must be a lower case string.
        """
        from Bio import SeqIO
        from StringIO import StringIO
        handle = StringIO()
        SeqIO.write(self, handle, format)
        handle.seek(0)
        return handle.read()

if __name__ == "__main__" :
    print "Loading records..."
    from Bio import SeqIO
    my_list = SeqRecordList(SeqIO.parse(open("ls_orchid.gbk"),"genbank"))
    print len(my_list)
    for format in ["fasta","tab"] :
        print
        print format
        print "="*len(format)
        print my_list.format(format)

Peter

From biopython at maubp.freeserve.co.uk Wed Nov 12 13:11:30 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 12 Nov 2008 18:11:30 +0000
Subject: [Biopython-dev] PopGen status and new developments
In-Reply-To: <6d941f120811121002k75c8ab43g54ebeb968342648b@mail.gmail.com>
References: <6d941f120811121002k75c8ab43g54ebeb968342648b@mail.gmail.com>
Message-ID: <320fb6e00811121011q26665967tce65a0e125b3e032@mail.gmail.com>

Tiago Antão wrote:
> A. Platform support
>
> As Peter noticed there is no Simcoal for the Mac. In a couple of weeks
> I hope to have access to a Mac in order to try to compile it. In any
> case I wont be able to distribute it without getting permission from
> the authors, so the problem might remain...
> I am now preparing support for LDNe, an application to estimate Ne
> (effective population size) from LD. This application is Dos(Windows)
> only. Source code is not available to the public (but the app is free
> as free beer).
I've had access to the source and compiled a Linux > version, again, I don't know if the author will let me distribute it. > Question: How do people feel about supporting an application like > this? Any strong feelings against? Assuming the tools are useful, then I have no objection to including command line wrappers for them in Biopython. I'm not 100% sure what you meant by "supporting an application like this", but if you are asking about supporting these cross-platform ports of the actual command line tools, then I don't see that as something Biopython should be doing. Peter From tiagoantao at gmail.com Wed Nov 12 13:16:06 2008 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Wed, 12 Nov 2008 18:16:06 +0000 Subject: [Biopython-dev] PopGen status and new developments In-Reply-To: <320fb6e00811121011q26665967tce65a0e125b3e032@mail.gmail.com> References: <6d941f120811121002k75c8ab43g54ebeb968342648b@mail.gmail.com> <320fb6e00811121011q26665967tce65a0e125b3e032@mail.gmail.com> Message-ID: <6d941f120811121016q17451c83u12b2233eba625944@mail.gmail.com> On Wed, Nov 12, 2008 at 6:11 PM, Peter wrote: > I'm not 100% sure what you meant by "supporting an application like > this", but if you are asking about supporting these cross-platform > ports of the actual command line tools, then I don't see that as > something Biopython should be doing. Sorry, I was not clear: I was just asking about supporting applications that dont have the source available and that don't support all common platforms (the case of LDNe). From dalloliogm at gmail.com Wed Nov 12 13:17:48 2008 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Wed, 12 Nov 2008 19:17:48 +0100 Subject: [Biopython-dev] [BioPython] a sequence set object in biopython? 
In-Reply-To: <320fb6e00811120953t57c206e7nd0c8151b92361d5a@mail.gmail.com> References: <5aa3b3570811120825y6ed11c00y384751e8f0f7adff@mail.gmail.com> <320fb6e00811120953t57c206e7nd0c8151b92361d5a@mail.gmail.com> Message-ID: <5aa3b3570811121017u72eb7552v94275368cb23cf48@mail.gmail.com> On Wed, Nov 12, 2008 at 6:53 PM, Peter wrote: > On Wed, Nov 12, 2008 at 4:25 PM, Giovanni Marco Dall'Olio > wrote: >> Hi, >> I think it could be useful to add a generic SequenceSet object in biopython. >> Such an object would represent a generic set of sequences, and could >> have some useful methods like .format('fasta') or >> .align('alignment_tool'). >> Is there something similar available already? > > Given your example to turn the SequenceSet into a FASTA file, then > clearly you are thinking of a collection of SeqRecord objects rather > than just Seq objects. For this kind of thing I personally just use a > list of SeqRecord objects. > > If I want to turn a list of SeqRecord objects into a FASTA file, I can > pass the list to the Bio.SeqIO.write() function. Once I've made a > FASTA file, I can call an external tool to align them - and then load > them in again using Bio.AlignIO or Bio.SeqIO depending on what I plan > to do next. > >> Some use cases: >> - a set of sequences that represents all introns in a particular gene, >> on which I want to calculate the conservation of the splicing >> regulatory sites. >> - all genes sequences in an organisms, which I want to convert in EMBL format >> - a set of seqs to be aligned or used as input for other tools >> etc.. > > All sensible use cases - but all seem to be covered by a simple python > list of SeqRecord objects, or in some cases a list of Seq objects > (e.g. the introns example, as I doube the introns have names). > Not always. For example, if I have a set of genes in an organism, sometimes I would need to access to only some of them, by their id; so, a __getattribute__ method to make it work as a dictionary could also be useful. 
The fact is that I think that such an object would be so widely used, that maybe it would be useful to implement it in biopython. What I would do, honestly, is to create a GenericSeqRecordSet class from which to derive Alignment, specifying that in an alignment all the sequences should have the same lenght. It would not require much work and it would change the interface. very tiny little minusculus p.s. if you need help for implement such a thing or anything else I can volounteer :). > Peter > -- ----------------------------------------------------------- My Blog on Bioinformatics (italian): http://bioinfoblog.it From dalloliogm at gmail.com Wed Nov 12 13:19:50 2008 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Wed, 12 Nov 2008 19:19:50 +0100 Subject: [Biopython-dev] PopGen status and new developments In-Reply-To: <6d941f120811121002k75c8ab43g54ebeb968342648b@mail.gmail.com> References: <6d941f120811121002k75c8ab43g54ebeb968342648b@mail.gmail.com> Message-ID: <5aa3b3570811121019k3a0710f1n2add599ce0b4f56a@mail.gmail.com> On Wed, Nov 12, 2008 at 7:02 PM, Tiago Ant?o wrote: > Hi, > > This an email with the status of current PopGen developments. In some > points, advice is especially welcome. Hi Tiago!! Have you noticed (I thought it wasn't directly related to PopGen so I didn't tell you directly) about this parser for fastPhaseOutput? - http://bugzilla.open-bio.org/show_bug.cgi?id=2643 > > > A. Platform support > > As Peter noticed there is no Simcoal for the Mac. In a couple of weeks > I hope to have access to a Mac in order to try to compile it. In any > case I wont be able to distribute it without getting permission from > the authors, so the problem might remain... > I am now preparing support for LDNe, an application to estimate Ne > (effective population size) from LD. This application is Dos(Windows) > only. Source code is not available to the public (but the app is free > as free beer). 
I've had access to the source and compiled a Linux > version, again, I don't know if the author will let me distribute it. > Question: How do people feel about supporting an application like > this? Any strong feelings against? > > > B. New developments > > 1. The above LDNe module is fully coded, and being tested by a few > people (not just me). Test code and documentation TBD but easy. > 2. Genepop application support (no confusion with file format support, > which is done). Partially done and informally tested. Plan to start > with just partial support. > 3. Fstat parser. Coded. > > > C. Statistics > > An ongoing interesting discussion started on statistics. I am delayed > with doing a proposal to handle statistical processing (my bad, but I > will have some free time in the next couple of weeks and I will try to > recover). My current existing code on the subject is available on > Github (by Giovanni), but I think it will need some change (not in the > functionality, but in the architecture). > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > -- ----------------------------------------------------------- My Blog on Bioinformatics (italian): http://bioinfoblog.it From biopython at maubp.freeserve.co.uk Wed Nov 12 13:36:11 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 12 Nov 2008 18:36:11 +0000 Subject: [Biopython-dev] [BioPython] a sequence set object in biopython? 
In-Reply-To: <5aa3b3570811121017u72eb7552v94275368cb23cf48@mail.gmail.com> References: <5aa3b3570811120825y6ed11c00y384751e8f0f7adff@mail.gmail.com> <320fb6e00811120953t57c206e7nd0c8151b92361d5a@mail.gmail.com> <5aa3b3570811121017u72eb7552v94275368cb23cf48@mail.gmail.com> Message-ID: <320fb6e00811121036w17e0d2acv6723c751350f1893@mail.gmail.com> Giovanni Marco Dall'Olio wrote: >> All sensible use cases - but all seem to be covered by a simple python >> list of SeqRecord objects, or in some cases a list of Seq objects >> (e.g. the introns example, as I doube the introns have names). > > Not always. > For example, if I have a set of genes in an organism, sometimes I > would need to access to only some of them, by their id; so, a > __getattribute__ method to make it work as a dictionary could also be > useful. OK, then use a dict of SeqRecords for this, as shown in the tutorial chapter for Bio.SeqIO and the wiki. We even have a helper function Bio.SeqIO.to_dict() to do this and check for duplicate keys. If you need an order preserving dictionary, there are examples of this on the net and there is even PEP372 for adding this to python itself: http://www.python.org/dev/peps/pep-0372/ > The fact is that I think that such an object would be so widely used, > that maybe it would be useful to implement it in biopython. > What I would do, honestly, is to create a GenericSeqRecordSet class > from which to derive Alignment, specifying that in an alignment all > the sequences should have the same lenght. It would not require much > work and it would change the interface. I agree that IF we added some sort of "GenericSeqRecordSet class", it might be sensible for the alignment objects to subclass it - especially if you want it to behave list a python list primarily. Note that in python sets are not order preserving. > very tiny little minusculus p.s. if you need help for implement such a > thing or anything else I can volounteer :). 
That's good to hear :) However, we'd have to establish the need for this new object first - but so far we've only had two people's view so its too early to form a consensus. I don't see a strong reason for adding yet another object, when the core language provides lists, sets and dict which seem to be enough. Peter From jflatow at gmail.com Wed Nov 12 13:52:35 2008 From: jflatow at gmail.com (Jared Flatow) Date: Wed, 12 Nov 2008 12:52:35 -0600 Subject: [Biopython-dev] [BioPython] a sequence set object in biopython? In-Reply-To: <320fb6e00811121036w17e0d2acv6723c751350f1893@mail.gmail.com> References: <5aa3b3570811120825y6ed11c00y384751e8f0f7adff@mail.gmail.com> <320fb6e00811120953t57c206e7nd0c8151b92361d5a@mail.gmail.com> <5aa3b3570811121017u72eb7552v94275368cb23cf48@mail.gmail.com> <320fb6e00811121036w17e0d2acv6723c751350f1893@mail.gmail.com> Message-ID: On Nov 12, 2008, at 12:36 PM, Peter wrote: > However, we'd have to establish the need for this new object first - > but so far we've only had two people's view so its too early to form a > consensus. I don't see a strong reason for adding yet another object, > when the core language provides lists, sets and dict which seem to be > enough. I totally agree with you Peter, that's what the basic container types are for. If someone wants to create a subclass of these containers for a specific purpose it is simple enough to do. IMO its kind of silly to try and make sequence specific containers that satisfy everyone's needs. jared From bsouthey at gmail.com Wed Nov 12 13:58:05 2008 From: bsouthey at gmail.com (Bruce Southey) Date: Wed, 12 Nov 2008 12:58:05 -0600 Subject: [Biopython-dev] PopGen status and new developments In-Reply-To: <320fb6e00811121011q26665967tce65a0e125b3e032@mail.gmail.com> References: <6d941f120811121002k75c8ab43g54ebeb968342648b@mail.gmail.com> <320fb6e00811121011q26665967tce65a0e125b3e032@mail.gmail.com> Message-ID: <491B273D.9020404@gmail.com> Peter wrote: > Tiago Ant?o wrote: > >> A. 
Platform support >> >> As Peter noticed there is no Simcoal for the Mac. In a couple of weeks >> I hope to have access to a Mac in order to try to compile it. In any >> case I wont be able to distribute it without getting permission from >> the authors, so the problem might remain... >> I am now preparing support for LDNe, an application to estimate Ne >> (effective population size) from LD. This application is Dos(Windows) >> only. Source code is not available to the public (but the app is free >> as free beer). I've had access to the source and compiled a Linux >> version, again, I don't know if the author will let me distribute it. >> Question: How do people feel about supporting an application like >> this? Any strong feelings against? >> > > Assuming the tools are useful, then I have no objection to including > command line wrappers for them in Biopython. > > I'm not 100% sure what you meant by "supporting an application like > this", but if you are asking about supporting these cross-platform > ports of the actual command line tools, then I don't see that as > something Biopython should be doing. > > Peter > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > > Hi, I do have concerns about usefulness with regards to Biopython. How widespread is the application? What platforms is it released under (DOS only or some version of windows version like XP or Vista or Windows 7)? Is the application well supported and will it continue to be supported? Under what terms is the application 'free'? How does this integrate into your ideas for Popgen? Would it work like say clustalw where you output something from Biopython, run the application and perhaps import something back into Biopython? If the application requires major data formatting then you would have to determine if it is easier to support the application or integrate it into Biopython. 
Obviously, this latter requires a clean room implementation of the application or the essential algorithm. Also, you can only provide the specification and can not be involved the actual implementation. Bruce From tiagoantao at gmail.com Wed Nov 12 15:09:31 2008 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Wed, 12 Nov 2008 20:09:31 +0000 Subject: [Biopython-dev] PopGen status and new developments In-Reply-To: <491B273D.9020404@gmail.com> References: <6d941f120811121002k75c8ab43g54ebeb968342648b@mail.gmail.com> <320fb6e00811121011q26665967tce65a0e125b3e032@mail.gmail.com> <491B273D.9020404@gmail.com> Message-ID: <6d941f120811121209n75dfb0cfh1fb4e57a98011ed0@mail.gmail.com> Hi, On Wed, Nov 12, 2008 at 6:58 PM, Bruce Southey wrote: > I do have concerns about usefulness with regards to Biopython. It is important to notice that having this application support has no big impact on deployment of biopython. The only visible thing is some tests reporting that the application doesn't exist. This is different from adding a dependency on, say, scipy. I don't think that this imposes any maintenance/installation hurdle at large. I think, this is actually a non-problem on the deployment stage, at least. > How widespread is the application? The application is fairly new (genepop, on the other hand is widely used and old). I cannot answer that question. I know of some people using it, but it is my small, biased, universe. I would guess that currently the number is small. Is there a policy to only support widespread applications? > What platforms is it released under (DOS only or some version of windows > version like XP or Vista or Windows 7)? There is a Dos and Windows frontend. I actually asked the code to the authors and they gave me access to it. I have compiled a Linux version, but I don't know if they are going to make it available. > Is the application well supported and will it continue to be supported? 
Regarding current support, I can subjectively say that the authors answer my queries rather fast. Regarding the future, I dont know. > Under what terms is the application 'free'? Much software available in this field is made available without no regards for licensing issues. This is already the case for the supported Fdist application (source available, no license). This is problem in the field, where people make things available without much concern for licensing issues. Some people don't care that much about that, they just "make things available". So, if there is a policy to only support applications for which there is a clear license, then this one is out (and some code has to be removed from the current PopGen module, by the way). I never link the code in, I just invoke it (these are mostly wrappers), so there should be no legal issues in any case, I suspect. There is a chicken and egg problem here that needs to be fought: In population genetics there is no widespread tradition of making things open (not because people want closed solutions, but mostly because people don't think about these issues). There is also little tradition in coding (people want ready made solutions. The coding people is relatively few and mostly R based) than in other areas. As an example: i don't know of many direct users of fdist code, but know lots of people which use applications made on top of that code. By the way, Simcoal is GPL (and there are more examples of open code in population genetics, of course). > How does this integrate into your ideas for Popgen? Very well. I have this stated philosophy, from the beginning, of using existing applications and not reinvent the wheel. That being said, I agree that a core statistic implementation should be done (even if there are alternatives). But, mostly, for now, what is available in Bio.PopGen are intelligent wrappers. 
> Would it work like say clustalw where you output something from Biopython, > run the application and perhaps import something back into Biopython? Yep, it accepts genepop files and the output is fully parsed back. This is still not the case, by the way, with simcoal where the output is not usable (arlequin is needed to analyze the results). I need to do an arlequin parser, that would solve the problem. > If the application requires major data formatting then you would have to It doesn't require any formatting at all as the de facto standard format in the area (genepop) is supported and the results are parsed back. Tiago From dalloliogm at gmail.com Wed Nov 12 19:16:44 2008 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Thu, 13 Nov 2008 01:16:44 +0100 Subject: [Biopython-dev] [BioPython] a sequence set object in biopython? In-Reply-To: <320fb6e00811121036w17e0d2acv6723c751350f1893@mail.gmail.com> References: <5aa3b3570811120825y6ed11c00y384751e8f0f7adff@mail.gmail.com> <320fb6e00811120953t57c206e7nd0c8151b92361d5a@mail.gmail.com> <5aa3b3570811121017u72eb7552v94275368cb23cf48@mail.gmail.com> <320fb6e00811121036w17e0d2acv6723c751350f1893@mail.gmail.com> Message-ID: <5aa3b3570811121616u5f95cc8du9f0d91e4743f067f@mail.gmail.com> On Wed, Nov 12, 2008 at 7:36 PM, Peter wrote: > Giovanni Marco Dall'Olio wrote: >>> All sensible use cases - but all seem to be covered by a simple python >>> list of SeqRecord objects, or in some cases a list of Seq objects >>> (e.g. the introns example, as I doube the introns have names). >> >> Not always. >> For example, if I have a set of genes in an organism, sometimes I >> would need to access to only some of them, by their id; so, a >> __getattribute__ method to make it work as a dictionary could also be >> useful. > > OK, then use a dict of SeqRecords for this, as shown in the tutorial > chapter for Bio.SeqIO and the wiki. We even have a helper function > Bio.SeqIO.to_dict() to do this and check for duplicate keys. 
I would prefer a SeqRecordSet object with a to_dict method :)

> If you need an order preserving dictionary, there are examples of this
> on the net and there is even PEP372 for adding this to python itself:
> http://www.python.org/dev/peps/pep-0372/

>> The fact is that I think that such an object would be so widely used,
>> that maybe it would be useful to implement it in biopython.
>> What I would do, honestly, is to create a GenericSeqRecordSet class
>> from which to derive Alignment, specifying that in an alignment all
>> the sequences should have the same lenght. It would not require much
>> work and it would change the interface.
>
> I agree that IF we added some sort of "GenericSeqRecordSet class", it
> might be sensible for the alignment objects to subclass it -
> especially if you want it to behave list a python list primarily.

Let's see it from another point of view.
In biopython, if you want to print a set of sequences in fasta format,
you have to do the following:

>>> s1 = SeqRecord(Seq('cacacac'))
>>> s2 = SeqRecord(Seq('cacacac'))
>>> seqs = s1, s2
>>> out = ''
>>> for seq in seqs:
...     # a "print seq.format('fasta')" statement won't work properly here, because of blank lines
...     out += seq.format('fasta')
>>> print out

On the other hand, printing an alignment in fasta format is a lot simpler:

>>> al = Alignment(SingleLetterAlphabet)
>>> al.add_sequence('s1', 'cacaca')
>>> al.add_sequence('s2', 'cacaca')
>>> print al.format('fasta')

I work more often with sets of sequences than with alignments.
So, why is it more difficult to print some unrelated sequences in a
certain format than aligned sequences?
I would end up using Alignment objects also for sequences that are not aligned.

I am also thinking about the many format parsers.
Wouldn't it be easier:

>>> seqs = Bio.SeqIO.parse(filehandler, 'fasta')
>>> record_dict = seqs.to_dict()

than invoking SeqIO twice?

> Note that in python sets are not order preserving.
>
>> very tiny little minusculus p.s.
if you need help for implement such a >> thing or anything else I can volounteer :). > > That's good to hear :) > > However, we'd have to establish the need for this new object first - > but so far we've only had two people's view so its too early to form a > consensus. I don't see a strong reason for adding yet another object, > when the core language provides lists, sets and dict which seem to be > enough. Take for example this code you wrote for me before: > class SeqRecordList(list) : > """Subclass of the python list, to hold SeqRecord objects only.""" > #TODO - Override the list methods to make sure all the items > #are indeed SeqRecord objects > > def format(self, format) : > """Returns a string of all the records in a requested file format. > > The argument format should be any file format supported by > the Bio.SeqIO.write() function. This must be a lower case string. > """ > from Bio import SeqIO > from StringIO import StringIO > handle = StringIO() > SeqIO.write(self, handle, format) > handle.seek(0) > return handle.read() It's very useful, but I don't think a python/biopython newbie would be able to write it. That's why I think it should be included. Last year, I was in another laboratory and I didn't have much experience with biopython, and I was missing such a kind of object. > Peter > Goodnight!! 
-- ----------------------------------------------------------- My Blog on Bioinformatics (italian): http://bioinfoblog.it From bugzilla-daemon at portal.open-bio.org Thu Nov 13 02:16:02 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 13 Nov 2008 02:16:02 -0500 Subject: [Biopython-dev] [Bug 2552] Adding alignments In-Reply-To: Message-ID: <200811130716.mAD7G2pw008200@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2552 ------- Comment #2 from fkauff at biologie.uni-kl.de 2008-11-13 02:16 EST ------- The Nexus module in Bio.Nexus has a function (not a method) 'combine' that can combine Nexus objects. It takes care of missing taxa, taxon sets, etc. Usage is something like: nex1=Nexus.Nexus('myfirstalignment.nex') nex2=Nexus.Nexus('mysecondalignment.nex') combined=Nexus.combine([('fancyname1',nex1),('fancyname2',nex2)]) It looks fairly straightforward to add this to a SeqRecord object. Cheers, Frank (Hi Cymon) (In reply to comment #1) > (In reply to comment #0) > > This is related to the very broad alignment bug 1944. > > > > Given two alignments, it can make sense to talk about adding them together. > > Actually, this is a very common procedure in phylogenetic analyses, where > multiple genes/loci are combined into a "super" matrix for a set of taxa. > Although, in this case, adding by column, if a taxon/row/identifier was missing > in a particular (sub-)alignment it would be filled by "-" (missing data) in the > combined matrix. > > Anyway, I think this would be a very useful enhancement. > -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
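As a rough illustration of the "combine" idea discussed in the comment above (joining several gene alignments into one matrix and padding taxa that are missing from a gene), here is a minimal pure-Python sketch. It is not the Bio.Nexus code: the function name combine_by_taxon, the dict-of-strings representation and the default "-" fill character are all assumptions made for this example.

```python
def combine_by_taxon(matrices, missing="-"):
    """Concatenate alignments column-wise, keyed by taxon name.

    matrices is a list of (name, alignment) pairs; each alignment is a
    dict mapping taxon name -> aligned sequence string. This is a
    hypothetical stand-in for parsed Nexus matrices, not the Bio.Nexus API.
    """
    # Collect every taxon, in order of first appearance.
    taxa = []
    for _, alignment in matrices:
        for taxon in alignment:
            if taxon not in taxa:
                taxa.append(taxon)

    combined = {taxon: "" for taxon in taxa}
    partitions = {}
    offset = 0
    for name, alignment in matrices:
        lengths = {len(seq) for seq in alignment.values()}
        if len(lengths) != 1:
            raise ValueError("rows of %r differ in length" % name)
        width = lengths.pop()
        for taxon in taxa:
            # A taxon absent from this gene gets a run of missing characters.
            combined[taxon] += alignment.get(taxon, missing * width)
        # Remember which columns came from which input matrix.
        partitions[name] = (offset, offset + width)
        offset += width
    return combined, partitions


gene1 = {"t1": "ACGT", "t2": "ACGA"}
gene2 = {"t2": "GG", "t3": "GC"}
combined, partitions = combine_by_taxon(
    [("fancyname1", gene1), ("fancyname2", gene2)]
)
```

The returned partitions dict plays the role of the character sets mentioned above: it records the column range contributed by each named input.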
From bugzilla-daemon at portal.open-bio.org Thu Nov 13 05:19:29 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 13 Nov 2008 05:19:29 -0500 Subject: [Biopython-dev] [Bug 2552] Adding alignments In-Reply-To: Message-ID: <200811131019.mADAJTxs024880@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2552 ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-13 05:19 EST ------- (In reply to comment #1) > (In reply to comment #0) > > This is related to the very broad alignment bug 1944. > > > > Given two alignments, it can make sense to talk about adding them together. > > Actually, this is a very common procedure in phylogenetic analyses, where > multiple genes/loci are combined into a "super" matrix for a set of taxa. This was one of the use cases I originally had in mind here (with hindsight I should have mentioned this in the original proposal). Another potentially use for this is in combination with extracting sub-alignments by column (see Bug 2551) - for example to remove some middle region of an alignment by selecting the two end regions and adding them together, e.g. new_align = align[:,:10] + align[:,20:] to remove the region from columns 10 to 20. As described in my original proposal, adding two alignments "by column" would require they have the same number of rows, and the same IDs (possibly in a different order - this is not essential as making the user think about their preferred sort order seem fine to me). I suppose using any common subset of shared names is also well defined, or automatically including null sequences for missing entries (as Frank suggested in comment 2), but I would much prefer to keep any alignment addition simple and explicit - no "magic". More generally you could consider adding any two alignments "by column" if they have the same number of rows, but first we'd have to talk about adding SeqRecord objects. 
This means doing something sensible with the annotation, in particular the id and name. I was hoping to avoid this. Once Biopython 1.49 is out, dealing with this bug is certainly on my todo list, especially now that we have some positive responses. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From mjldehoon at yahoo.com Thu Nov 13 05:27:57 2008 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Thu, 13 Nov 2008 02:27:57 -0800 (PST) Subject: [Biopython-dev] [BioPython] a sequence set object in biopython? In-Reply-To: <5aa3b3570811121616u5f95cc8du9f0d91e4743f067f@mail.gmail.com> Message-ID: <25667.98653.qm@web62408.mail.re1.yahoo.com> Adding new classes to Biopython should be done very carefully ... once they're in, it's difficult to remove them again. In the past, removing classes that turned out to be less than ideal was a real headache. Right now I don't see a clear need for a sequence set object ... read on. --- On Wed, 11/12/08, Giovanni Marco Dall'Olio > > > > OK, then use a dict of SeqRecords for this, as shown > > in the tutorial chapter for Bio.SeqIO and the wiki. > > We even have a helper function > > Bio.SeqIO.to_dict() to do this and check for duplicate > > keys. > > I would prefer a SeqRecordSet object with a to_dict method > Wouldn't it be easier: > >>> seqs = Bio.SeqIO.parse(filehandler, > 'fasta') > >>> record_dict = seqs.to_dict() > > than invoking SeqIO twice? Maybe, yes, but it's just a matter of typing and I don't think that by itself it is a good enough reason for a SeqRecordSet class. > Let's see it from another point of view. 
> In biopython, if you want to print a set of sequences in > fasta format, > you have to do the following: > >>> s1 = SeqRecord(Seq('cacacac')) > >>> s2 = SeqRecord(Seq('cacacac')) > >>> seqs = s1, s2 > >>> out = '' > >>> for seq in seqs: > # a "print seq.format('fasta')" statement won't work > # properly here, because of blank lines > out += seq.format('fasta') > >>> print out I don't quite understand why "print seq.format('fasta')" won't work. > Take for example this code you wrote for me before: > > > class SeqRecordList(list) : > > def format(self, format) : > > from Bio import SeqIO > > from StringIO import StringIO > > handle = StringIO() > > SeqIO.write(self, handle, format) > > handle.seek(0) > > return handle.read() > > It's very useful, but I don't think a > python/biopython newbie would be > able to write it. I agree that this is too complicated. What if we redefine SeqIO.write as def write(self, handle=sys.stdout, format='fasta'): ... So by default SeqIO.write prints to the screen. Then you can do SeqIO.write(records) where records are a list of SeqRecord's. --Michiel. From bugzilla-daemon at portal.open-bio.org Thu Nov 13 06:06:20 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 13 Nov 2008 06:06:20 -0500 Subject: [Biopython-dev] [Bug 2628] Have Bio.SeqIO.write(...) and Bio.AlignIO.write(...) return number of records In-Reply-To: Message-ID: <200811131106.mADB6Ki7030741@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2628 ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-13 06:06 EST ------- Note - now that we return the count, this does block a previous suggestion by Michiel that if the handle were omitted the write function could default to returning a string (handled via StringIO internally). I wasn't keen on this idea at the time because it would have given the write function very different behaviour depending on the arguments. 
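The trade-off in the comment above (a write function that always returns a record count, versus one that sometimes returns a string) can be sketched with a toy FASTA writer. This is only an illustration under stated assumptions: write_fasta and format_fasta are hypothetical names, records are plain (identifier, sequence) pairs, and none of this is the Bio.SeqIO implementation.

```python
import io
import sys


def write_fasta(records, handle):
    """Write (identifier, sequence) pairs as FASTA; return the number written."""
    count = 0
    for identifier, sequence in records:
        handle.write(">%s\n%s\n" % (identifier, sequence))
        count += 1
    return count


def format_fasta(records):
    """Return the FASTA text as a string.

    Kept as a separate helper so that write_fasta always returns an
    integer, whatever arguments it is given.
    """
    handle = io.StringIO()
    write_fasta(records, handle)
    return handle.getvalue()


records = [("Alpha", "ACGT"), ("Beta", "GTGC")]
count = write_fasta(records, sys.stdout)  # prints to the screen
text = format_fasta(records)              # same output, captured as a string
```

Keeping the string-returning variant as its own function means the return type of the writer never depends on which arguments were passed.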
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Thu Nov 13 06:11:10 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 13 Nov 2008 11:11:10 +0000 Subject: [Biopython-dev] [BioPython] a sequence set object in biopython? In-Reply-To: <25667.98653.qm@web62408.mail.re1.yahoo.com> References: <5aa3b3570811121616u5f95cc8du9f0d91e4743f067f@mail.gmail.com> <25667.98653.qm@web62408.mail.re1.yahoo.com> Message-ID: <320fb6e00811130311t4e813a8fqeb21504fd5696bf1@mail.gmail.com> Michiel wrote: >Marco wrote: >> Take for example this code you [Peter] wrote for me before: >> >> > class SeqRecordList(list) : >> > def format(self, format) : >> > from Bio import SeqIO >> > from StringIO import StringIO >> > handle = StringIO() >> > SeqIO.write(self, handle, format) >> > handle.seek(0) >> > return handle.read() >> >> It's very useful, but I don't think a >> python/biopython newbie would be >> able to write it. > > I agree that this is too complicated. This wasn't aimed at a beginner, but rather for Marco if he really wants to use this kind of object in his own code, or as a basis for further discussion. > What if we redefine SeqIO.write as > > def write(self, handle=sys.stdout, format='fasta'): > ... > > So by default SeqIO.write prints to the screen. Then you can do > > SeqIO.write(records) > > where records are a list of SeqRecord's. 
We could certainly include something like this in the documentation:

#Just an example to create some records:
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
records = [SeqRecord(Seq("ACGT"),"Alpha"),
           SeqRecord(Seq("GTGC"),"Beta")]

#One way to "print" records to screen,
import sys
from Bio import SeqIO
SeqIO.write(records, sys.stdout, "fasta")

I'm not so keen on making the handle default to standard out, but this
is nicer than the suggestion you made some time ago that if the handle
were omitted a string be returned (no longer an option since Bug 2628
was committed).

Any other votes for the standard out default?

Peter

From bugzilla-daemon at portal.open-bio.org Thu Nov 13 06:18:01 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 13 Nov 2008 06:18:01 -0500
Subject: [Biopython-dev] [Bug 2552] Adding alignments
In-Reply-To:
Message-ID: <200811131118.mADBI1of031964@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2552

------- Comment #4 from fkauff at biologie.uni-kl.de 2008-11-13 06:18 EST -------
(In reply to comment #3)
> > >
> > Actually, this is a very common procedure in phylogenetic analyses, where
> > multiple genes/loci are combined into a "super" matrix for a set of taxa.
>
> This was one of the use cases I originally had in mind here (with hindsight I
> should have mentioned this in the original proposal). Another potentially use
> for this is in combination with extracting sub-alignments by column (see Bug
> 2551) - for example to remove some middle region of an alignment by selecting
> the two end regions and adding them together, e.g. new_align = align[:,:10] +
> align[:,20:] to remove the region from columns 10 to 20.
Nexus parser can already handle this by rewriting the data set >> nexobject.write_nexus_data(filename='new.nex',exclude=[range(10,21)],delete=['list','of','taxa','two','delete']) where the indices of remaining character sets and character partitions get recalculated. > > As described in my original proposal, adding two alignments "by column" would > require they have the same number of rows, and the same IDs (possibly in a > different order - this is not essential as making the user think about their > preferred sort order seem fine to me). > > I suppose using any common subset of shared names is also well defined, or > automatically including null sequences for missing entries (as Frank suggested > in comment 2), but I would much prefer to keep any alignment addition simple > and explicit - no "magic". > Yes, missing names are given missing character entries > More generally you could consider adding any two alignments "by column" if they > have the same number of rows, but first we'd have to talk about adding > SeqRecord objects. This means doing something sensible with the annotation, in > particular the id and name. I was hoping to avoid this. > > Once Biopython 1.49 is out, dealing with this bug is certainly on my todo list, > especially now that we have some positive responses. > -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
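The strict "no magic" addition described in the quoted proposal (same IDs required, row order taken from the left-hand alignment) can be sketched in plain Python. The names add_by_column and columns are hypothetical, alignments are represented as lists of (id, sequence) pairs, and this is not the Biopython Alignment class.

```python
def columns(alignment, start, end):
    """Return the sub-alignment made of columns start:end."""
    return [(identifier, seq[start:end]) for identifier, seq in alignment]


def add_by_column(left, right):
    """Join two alignments side by side.

    Both arguments are lists of (id, sequence) pairs. The IDs must match
    exactly (order may differ); nothing is padded or guessed.
    """
    left_ids = [identifier for identifier, _ in left]
    right_rows = dict(right)
    if len(set(left_ids)) != len(left) or len(right_rows) != len(right):
        raise ValueError("duplicate IDs are not supported")
    if set(left_ids) != set(right_rows):
        raise ValueError("alignments must contain the same IDs")
    # Row order follows the left-hand alignment.
    return [(identifier, seq + right_rows[identifier]) for identifier, seq in left]


align = [
    ("a", "AAAAAAAAAACCCCCCCCCCGGGG"),
    ("b", "TTTTTTTTTTCCCCCCCCCCAAAA"),
]
# Equivalent in spirit to align[:, :10] + align[:, 20:] from the proposal:
trimmed = add_by_column(columns(align, None, 10), columns(align, 20, None))
```

Raising an error on mismatched IDs, rather than padding, keeps the operation explicit; padding with missing characters would be a separate, opt-in step.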
From bugzilla-daemon at portal.open-bio.org Thu Nov 13 07:14:21 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 13 Nov 2008 07:14:21 -0500
Subject: [Biopython-dev] [Bug 2654] New: Bio.Blast.NCBIStandalone does not support the output file argument
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2654

           Summary: Bio.Blast.NCBIStandalone does not support the output file argument
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk

The NCBI blastall tool defaults to writing its output to standard out, but can
be told to write to a file instead:

  -o  BLAST report Output File [File Out]  Optional

Currently Bio.Blast.NCBIStandalone.blastall() does not support this optional
argument, meaning that if the user wants to save the output they must do this
manually from the standard out handle.

This also applies to rpsblast and blastpgp.

-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From eric.pruitt at gmail.com Thu Nov 13 08:00:36 2008
From: eric.pruitt at gmail.com (James Pruitt)
Date: Thu, 13 Nov 2008 07:00:36 -0600
Subject: [Biopython-dev] Lowess Smooth Improvement
Message-ID: <171e8a410811130500o71c455f6mda64ab19c138e48f@mail.gmail.com>

I made some changes to the Lowess smoothing method and have written a unit
test for it. On my machine, it runs around 37% faster in my unit tests
compared to the original lowess method, and that is using the numpy.median
function, so it would probably run even faster with the Bio.Cluster median
function.

How do I go about proposing my code to be included in Biopython?
-- 
-Jimmy

From biopython at maubp.freeserve.co.uk Thu Nov 13 08:27:51 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 13 Nov 2008 13:27:51 +0000
Subject: [Biopython-dev] Lowess Smooth Improvement
In-Reply-To: <171e8a410811130500o71c455f6mda64ab19c138e48f@mail.gmail.com>
References: <171e8a410811130500o71c455f6mda64ab19c138e48f@mail.gmail.com>
Message-ID: <320fb6e00811130527m41238780n9fe7f9c6de1a2d0a@mail.gmail.com>

On Thu, Nov 13, 2008 at 1:00 PM, James Pruitt wrote:
> I made some changes to the Lowess smoothing method as well as written a unit
> test for it. On my machine, it runs around 37% faster in my unit tests
> compared to the original lowess method and that is using the numpy.median
> function so it would probably run even faster with the Bio.Cluster median
> function.

Presumably this is an update for Bio/Statistics/lowess.py?

I'm a little confused - this code already uses Bio.Cluster.median if
it can, falling back on numpy.median. Maybe you're working from an
older version of Biopython?

> How do I go about proposing my code to be included in Bio.Python?

First file an enhancement Bug, then once the bug is filed you can
attach a patch against CVS. If you have any example scripts or unit
tests to go with it, even better.

Thanks,

Peter

From bugzilla-daemon at portal.open-bio.org Thu Nov 13 10:25:56 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 13 Nov 2008 10:25:56 -0500
Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO
In-Reply-To:
Message-ID: <200811131525.mADFPuvi029137@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2643

------- Comment #22 from dalloliogm at gmail.com 2008-11-13 10:25 EST -------
Created an attachment (id=1053)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1053&action=view)
test files for fastPhaseOutput

I put the fastPhaseOutput files used in the tests in separate files, as
asked.
-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Thu Nov 13 10:59:02 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 13 Nov 2008 10:59:02 -0500
Subject: [Biopython-dev] [Bug 2655] New: Sorting sub-features in BioSeq.py can return corrupted feature
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2655

           Summary: Sorting sub-features in BioSeq.py can return corrupted feature
           Product: Biopython
           Version: 1.49b
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: major
          Priority: P2
         Component: BioSQL
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: cymon.cox at gmail.com

BioSeq.py retrieves SeqFeatures from a BioSQL database and sorts both the
features and any subfeatures. The first sort is superfluous and the second sort
is an error that can lead to a feature being returned corrupted, with the
sub-features in an incorrect order. So I've marked this major...

I've been trying to implement the feature/sub-feature locations test in
test_BioSQL_SeqIO. Here's my solution (attached as patch1):

"""
# Compare sub-feature Locations:
#
# BioSQL currently does not store fuzzy locations, but instead stores
# them as FeatureLocation.nofuzzy_start FeatureLocation.nofuzzy_end.
# Hence, the old_sub from SeqIO.parse() will have fuzzy location while
# new_sub locations from BioSQL will not be fuzzy.
# The vast majority of cases will be comparisons of ExactPosition
# class locations, so we'll try that first and catch the exceptions.
try:
    assert str(old_sub.location) == str(new_sub.location), \
       "%s -> %s" % (str(old_sub.location), str(new_sub.location))
except AssertionError, e:
    if isinstance(old_sub.location.start, ExactPosition) and \
        isinstance(new_sub.location.start, ExactPosition) and \
        isinstance(old_sub.location.end, ExactPosition) and \
        isinstance(new_sub.location.end, ExactPosition):
        # Its not a problem with fuzzy locations, re-raise
        raise AssertionError, e
    else:
        #At least one location is fuzzy
        assert old_sub.location.nofuzzy_start == new_sub.location.nofuzzy_start, \
           "%s -> %s" % (old_sub.location.nofuzzy_start, new_sub.location.nofuzzy_start)
        assert old_sub.location.nofuzzy_end == new_sub.location.nofuzzy_end, \
           "%s -> %s" % (old_sub.location.nofuzzy_end, new_sub.location.nofuzzy_end)
"""

This test causes errors in 3 of the test cases:
GenBank/extra_keywords.gb
GenBank/one_of.gb
GFF/NC_001422.gbk

e.g:
Testing loading from genbank format file GenBank/extra_keywords.gb
 - TCCAGGGGATTCACGCGCA...TTG [Gp6GqZ3Q9foPG0HvyXguIGSJN8U] len 154329, AL138972.1
 - Retrieving by name/display_id 'DMBR25B3',
Traceback (most recent call last):
  File "test_BioSQL_SeqIO.py", line 371, in
    compare_records(record, db_rec)
  File "test_BioSQL_SeqIO.py", line 280, in compare_records
    compare_features(old_f, new_f)
  File "test_BioSQL_SeqIO.py", line 185, in compare_features
    raise AssertionError, e
AssertionError: [153489:154269] -> [40:610]

This is because each of these records has a peculiar join(...)
for the above record:
join(153490..154269,AL121804.2:41..610,

(an aside how does the user know that returned feature location is a join with
a separate accession? How does BioSQL/biopython deal with this?)
The error is caused by BioSeq.py _retrieve_features() sorting the sub-features by start position:

BioSeq.py:
249     sub_feature_list.append((start, subfeature))
250     sub_feature_list.sort()
251     feature.sub_features = [sub_feature[1]
252                             for sub_feature in sub_feature_list]

This is an error because it returns the sub-features out of order. Besides, this sub-feature sort and the SeqFeature sort are both unnecessary, because the features and sub-features are stored in BioSQL by rank and retrieved by rank, so they should be in the correct order anyway.

Attached BioSeq.py patch to remove both sort()'s - patch2

With these patches applied, test_BioSQL_SeqIO and test_BioSQL pass:

[cymon at chara Tests]$ python test_BioSQL_SeqIO.py > test_output
[cymon at chara Tests]$ diff -ruN test_output output/test_BioSQL_SeqIO
--- test_output 2008-11-13 15:39:20.000000000 +0000
+++ output/test_BioSQL_SeqIO 2008-11-12 13:06:19.000000000 +0000
@@ -1,3 +1,4 @@
+test_BioSQL_SeqIO
 Connecting to database
 Removing existing sub-database 'biosql-seqio-test' (if exists)
 (Re)creating empty sub-database 'biosql-seqio-test'
[cymon at chara Tests]$ python run_tests.py test_BioSQL_SeqIO.py
test_BioSQL_SeqIO ... ok
----------------------------------------------------------------------
Ran 1 test in 15.928s

OK
[cymon at chara Tests]$ python run_tests.py test_BioSQL.py
test_BioSQL ... ok
----------------------------------------------------------------------
Ran 1 test in 25.255s

OK
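
The ordering problem in this report can be reproduced without a database. What follows is a minimal sketch, assuming each sub-feature row is a (rank, start, end, source) tuple; that layout and the helper names by_rank and by_start are illustrative only and are not BioSQL or Biopython API. The coordinates are the ones from the AssertionError in the report.

```python
# Illustrative stand-in for stored sub-feature rows: (rank, start, end, source).
# The second part of the join lives on another accession, so its start
# coordinate is smaller even though it comes second in the feature.
rows = [
    (1, 153489, 154269, "local"),
    (2, 40, 610, "AL121804.2"),
]


def by_rank(rows):
    """Return (start, end) pairs in stored rank order."""
    return [(start, end) for rank, start, end, source in sorted(rows)]


def by_start(rows):
    """Return (start, end) pairs after sorting on start position."""
    pairs = [(start, (start, end)) for rank, start, end, source in rows]
    pairs.sort()
    return [pair[1] for pair in pairs]


rank_order = by_rank(rows)
start_order = by_start(rows)
print(rank_order)
print(start_order)
```

Sorting on start moves the remote part of the join to the front, which is consistent with the "[153489:154269] -> [40:610]" mismatch seen in the failing test.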
From bugzilla-daemon at portal.open-bio.org Thu Nov 13 11:00:02 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 13 Nov 2008 11:00:02 -0500 Subject: [Biopython-dev] [Bug 2655] Sorting sub-features in BioSeq.py can return corrupted feature In-Reply-To: Message-ID: <200811131600.mADG02lb002140@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2655 ------- Comment #1 from cymon.cox at gmail.com 2008-11-13 11:00 EST ------- Created an attachment (id=1054) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1054&action=view) patch1 to test_BioSQL_SeqIO -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Nov 13 11:00:35 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 13 Nov 2008 11:00:35 -0500 Subject: [Biopython-dev] [Bug 2655] Sorting sub-features in BioSeq.py can return corrupted feature In-Reply-To: Message-ID: <200811131600.mADG0Zhi002264@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2655 ------- Comment #2 from cymon.cox at gmail.com 2008-11-13 11:00 EST ------- Created an attachment (id=1055) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1055&action=view) patch2 to BioSQL/BioSeq.py -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Thu Nov 13 11:28:48 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 13 Nov 2008 11:28:48 -0500 Subject: [Biopython-dev] [Bug 2655] Sorting sub-features in BioSeq.py can return corrupted feature In-Reply-To: Message-ID: <200811131628.mADGSmmf007542@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2655 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-13 11:28 EST ------- Another sensible improvement - checked in with only minor changes (fixed an assert in the unit test, and removed an old comment about sorting for subfeatures). Checking in BioSQL/BioSeq.py; /home/repository/biopython/biopython/BioSQL/BioSeq.py,v <-- BioSeq.py new revision: 1.30; previous revision: 1.29 done Checking in Tests/test_BioSQL_SeqIO.py; /home/repository/biopython/biopython/Tests/test_BioSQL_SeqIO.py,v <-- test_BioSQL_SeqIO.py new revision: 1.25; previous revision: 1.24 done Thanks Cymon, Peter. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From biopython at maubp.freeserve.co.uk Thu Nov 13 11:33:43 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 13 Nov 2008 16:33:43 +0000 Subject: [Biopython-dev] Lowess Smooth Improvement In-Reply-To: <171e8a410811130825x5732bd99o252e26f2bafa8e13@mail.gmail.com> References: <171e8a410811130500o71c455f6mda64ab19c138e48f@mail.gmail.com> <320fb6e00811130527m41238780n9fe7f9c6de1a2d0a@mail.gmail.com> <171e8a410811130825x5732bd99o252e26f2bafa8e13@mail.gmail.com> Message-ID: <320fb6e00811130833y3413eb36p92be13ca0ee1ed9a@mail.gmail.com> On Thu, Nov 13, 2008 at 4:25 PM, James Pruitt wrote: > I removed the Bio.Cluster reference because the system the code would run on > would not have acccess to it so the code was vestigial but on the version I > will submit, I reincluded the Bio.Cluster median function. Yes-- this is an > update for Bio/Statistics/lowess.py OK - file the enhancement bug, upload the code (ideally as a patch) and we'll take a look :) Peter From bugzilla-daemon at portal.open-bio.org Thu Nov 13 12:09:37 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 13 Nov 2008 12:09:37 -0500 Subject: [Biopython-dev] [Bug 2655] Sorting sub-features in BioSeq.py can return corrupted feature In-Reply-To: Message-ID: <200811131709.mADH9blO013661@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2655 ------- Comment #4 from cymon.cox at gmail.com 2008-11-13 12:09 EST ------- (In reply to comment #3) > Another sensible improvement - checked in with only minor changes (fixed an > assert in the unit test, Thanks Peter :) > and removed an old comment about sorting for > subfeatures). If the comment stays in, you'll need to remove these two lines of nonsense as well: test_BioSQL_SeqIO.py: 171 # Hence, the old_sub from SeqIO.parse() will have fuzzy location while 172 # new_sub locations from BioSQL will be fuzzy. Sorry about that. C. 
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Nov 13 12:17:15 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 13 Nov 2008 12:17:15 -0500 Subject: [Biopython-dev] [Bug 2655] Sorting sub-features in BioSeq.py can return corrupted feature In-Reply-To: Message-ID: <200811131717.mADHHFpR015244@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2655 ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-13 12:17 EST ------- $ cvs commit -m "Removing two redundant comment lines (see Bug 2655)" test_BioSQL_SeqIO.py =========================================== dev.open-bio.org - Authorized Access Only =========================================== peterc at dev.open-bio.org's password: Checking in test_BioSQL_SeqIO.py; /home/repository/biopython/biopython/Tests/test_BioSQL_SeqIO.py,v <-- test_BioSQL_SeqIO.py new revision: 1.26; previous revision: 1.25 done -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Thu Nov 13 20:23:26 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 13 Nov 2008 20:23:26 -0500
Subject: [Biopython-dev] [Bug 2657] New: Improved Bio/Statistics/lowess.py
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2657

Summary: Improved Bio/Statistics/lowess.py
Product: Biopython
Version: 1.49b
Platform: PC
URL: http://pastebin.ca/1255734
OS/Version: All
Status: NEW
Severity: normal
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: eric.pruitt at gmail.com

I noticed several calculations were done repeatedly when each could be saved in a single variable and reused throughout. Then I realized that, since the matrix has a static size, it would be faster to just hard code solving the matrix into the function.

From bugzilla-daemon at portal.open-bio.org Fri Nov 14 04:32:36 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 14 Nov 2008 04:32:36 -0500
Subject: [Biopython-dev] [Bug 2657] Improved Bio/Statistics/lowess.py
In-Reply-To:
Message-ID: <200811140932.mAE9Wa1f001445@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2657

------- Comment #1 from dalloliogm at gmail.com 2008-11-14 04:32 EST -------
ok, but consider that all posts on pastebin disappear after 30 days... You should add an attachment by clicking on 'Create a New Attachment' from this page (you can only do that after opening the bug report).

p.s. what about adding some doctest to this module? Just to show an example on how to run it.
Something like this:

"""
>>> import numpy
>>> x = numpy.array([1, 2, 3, 4, 5])
>>> y = numpy.array([1, 2, 3, 4, 6])
>>> lowess(x, y)
expected result
"""

- http://docs.python.org/library/doctest.html
- http://bugzilla.open-bio.org/show_bug.cgi?id=2640

From bugzilla-daemon at portal.open-bio.org Fri Nov 14 05:41:31 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 14 Nov 2008 05:41:31 -0500
Subject: [Biopython-dev] [Bug 2657] Improved Bio/Statistics/lowess.py
In-Reply-To:
Message-ID: <200811141041.mAEAfVQO007220@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2657

------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-14 05:41 EST -------
Created an attachment (id=1057)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1057&action=view)
The updated lowess.py from http://pastebin.ca/raw/1255734

Attaching James' new file here so it doesn't just expire at pastebin.

From bugzilla-daemon at portal.open-bio.org Fri Nov 14 06:11:26 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 14 Nov 2008 06:11:26 -0500
Subject: [Biopython-dev] [Bug 2657] Improved Bio/Statistics/lowess.py
In-Reply-To:
Message-ID: <200811141111.mAEBBQJm010925@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2657

------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-14 06:11 EST -------
I've updated CVS to use standard four space indentation, add a doctest and the copyright statement etc.
James' code makes two code changes (shown against CVS revision 1.9).

67,68c67,68
< h = [numpy.sort(abs(x-x[i]))[r] for i in range(n)]
< w = numpy.clip(abs(([x]-numpy.transpose([x]))/h),0.0,1.0)
---
> h = [numpy.sort(numpy.abs(x-x[i]))[r] for i in range(n)]
> w = numpy.clip(numpy.abs(([x]-numpy.transpose([x]))/h),0.0,1.0)

Due to the historic usage "from Numeric import *" this code did once use Numeric.abs here, so it makes sense to use numpy.abs now. Probably just an oversight from the recent Numeric/numpy conversion. This is another reminder that using "from XXX import *" is a bad idea.

76,80c76,82
< b = numpy.array([sum(weights*y), sum(weights*y*x)])
< A = numpy.array([[sum(weights), sum(weights*x)],
<                  [sum(weights*x), sum(weights*x*x)]])
< beta = numpy.linalg.solve(A,b)
< yest[i] = beta[0] + beta[1]*x[i]
---
> theta = weights*x
> b_top = sum(weights*y)
> b_bot = sum(theta*y)
> a = sum(weights)
> b = sum(theta)
> d = sum(theta*x)
> yest[i] = (d*b_top-b*b_bot+(a*b_bot-b*b_top)*x[i])/(a*d-b**2)

I can see the point of calculating and caching these:
weights*y
weights*x
sum(weights*x)

Was there a good reason for the name theta for weights*x?

I personally think using an explicit matrix solver is much nicer to read than that complex hand coded version. Does it really save much time?

My suggestion is just:
76,78c76,81
< b = numpy.array([sum(weights*y), sum(weights*y*x)])
< A = numpy.array([[sum(weights), sum(weights*x)],
<                  [sum(weights*x), sum(weights*x*x)]])
---
> weights_x = weights*x
> weights_y = weights*y
> sum_weights_x = sum(weights_x)
> b = numpy.array([sum(weights_y), sum(weights_y*x)])
> A = numpy.array([[sum(weights), sum_weights_x],
>                  [sum_weights_x, sum(weights_x*x)]])

However, I'm going to leave this for Michiel to resolve (given he wrote the code in the first place).
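
The hand-coded expression in the patch is Cramer's rule applied to the same 2x2 weighted least-squares system that the matrix solver handles in revision 1.9. A minimal sketch of that equivalence follows, in plain Python so it runs without numpy. The function names are illustrative, the weights are arbitrary positive numbers rather than the weights lowess actually computes, and the elimination routine only stands in for a general solver.

```python
def weighted_sums(x, y, weights):
    """Return the five sums that define the 2x2 system."""
    wx = [w * xi for w, xi in zip(weights, x)]
    a = sum(weights)
    b = sum(wx)
    d = sum(wxi * xi for wxi, xi in zip(wx, x))
    b_top = sum(w * yi for w, yi in zip(weights, y))
    b_bot = sum(wxi * yi for wxi, yi in zip(wx, y))
    return a, b, d, b_top, b_bot


def fit_by_elimination(x, y, weights, x0):
    """Solve [[a, b], [b, d]] * beta = [b_top, b_bot], then evaluate at x0."""
    a, b, d, b_top, b_bot = weighted_sums(x, y, weights)
    factor = b / a
    beta1 = (b_bot - factor * b_top) / (d - factor * b)
    beta0 = (b_top - b * beta1) / a
    return beta0 + beta1 * x0


def fit_closed_form(x, y, weights, x0):
    """Evaluate the fitted line at x0 using the expression from the patch."""
    a, b, d, b_top, b_bot = weighted_sums(x, y, weights)
    return (d * b_top - b * b_bot + (a * b_bot - b * b_top) * x0) / (a * d - b ** 2)


x = [0.0, 1.0, 4.0, 7.0]
y = [0.0, 1.0, 16.0, 49.0]
weights = [1.0, 0.8, 0.3, 0.05]  # arbitrary values, for illustration only

differences = [
    abs(fit_by_elimination(x, y, weights, x0) - fit_closed_form(x, y, weights, x0))
    for x0 in x
]
print(max(differences))
```

Both routes give the same fitted value up to rounding, so the choice discussed in the thread is between readability and speed, not between different results.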
From bugzilla-daemon at portal.open-bio.org Fri Nov 14 06:15:09 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 14 Nov 2008 06:15:09 -0500
Subject: [Biopython-dev] [Bug 2657] Improved Bio/Statistics/lowess.py
In-Reply-To:
Message-ID: <200811141115.mAEBF9Gi011416@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2657

------- Comment #4 from eric.pruitt at gmail.com 2008-11-14 06:15 EST -------
Created an attachment (id=1058)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1058&action=view)
Unit test for lowess.py

File will need to have the import statements adjusted for the Biopython structure.

From p.j.a.cock at googlemail.com Fri Nov 14 06:18:43 2008
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 14 Nov 2008 11:18:43 +0000
Subject: [Biopython-dev] [BioPython] Problems with Emboss.Primer3
In-Reply-To: <001001c94644$eeaf5c00$1022a8c0@ipkgatersleben.de>
References: <000801c94598$fd183f20$1022a8c0@ipkgatersleben.de> <320fb6e00811130643p357092f6y8e6d983a11909003@mail.gmail.com> <001001c94644$eeaf5c00$1022a8c0@ipkgatersleben.de>
Message-ID: <320fb6e00811140318s452f9a5aj76eb7d505a98b6ee@mail.gmail.com>

On Fri, Nov 14, 2008 at 10:37 AM, Stefanie Lück wrote:
> Thanks for the hints!
> ...
> It gives as well as at the command line:
>
> "
> Command line:
> eprimer3 -sequence p3input.txt -outfile out.pr3 -target 50,500
> Return code:
> 1
> Errors:
>
> EMBOSS An error in ajnam.c at line 1991:
>
> EMBOSSWIN environment variable not defined
>
> Messages
>
> "
> Any suggestions?

This doesn't seem to be a Biopython problem, but an EMBOSS installation or configuration problem. What version of EMBOSS do you have? Maybe try upgrading to version 6?
Peter From bugzilla-daemon at portal.open-bio.org Fri Nov 14 06:28:36 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 14 Nov 2008 06:28:36 -0500 Subject: [Biopython-dev] [Bug 2657] Improved Bio/Statistics/lowess.py In-Reply-To: Message-ID: <200811141128.mAEBSaSb013641@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2657 eric.pruitt at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |eric.pruitt at gmail.com ------- Comment #5 from eric.pruitt at gmail.com 2008-11-14 06:28 EST ------- (In reply to comment #3) > I've updated CVS to use standard four space indentation, add a doctest and the > copyright statement etc. > > James' code makes two code changes (shown against CVS revision 1.9). > > 67,68c67,68 > < h = [numpy.sort(abs(x-x[i]))[r] for i in range(n)] > < w = numpy.clip(abs(([x]-numpy.transpose([x]))/h),0.0,1.0) > --- > > h = [numpy.sort(numpy.abs(x-x[i]))[r] for i in range(n)] > > w = numpy.clip(numpy.abs(([x]-numpy.transpose([x]))/h),0.0,1.0) > > Due to the historic usage "from Numeric import *" this code did once use > Numeric.abs here, so it makes sense to use numpy.abs now. Probably just an > oversight from the recent Numeric/numpy conversion. This is another reminder > that using "from XXX import *" is a bad idea. > > 76,80c76,82 > < b = numpy.array([sum(weights*y), sum(weights*y*x)]) > < A = numpy.array([[sum(weights), sum(weights*x)], > < [sum(weights*x), sum(weights*x*x)]]) > < beta = numpy.linalg.solve(A,b) > < yest[i] = beta[0] + beta[1]*x[i] > --- > > theta = weights*x > > b_top = sum(weights*y) > > b_bot = sum(theta*y) > > a = sum(weights) > > b = sum(theta) > > d = sum(theta*x) > > yest[i] = (d*b_top-b*b_bot+(a*b_bot-b*b_top)*x[i])/(a*d-b**2) > > I can see the point of calculating and caching these: > weights*y > weights*x > sum(weights*x) > > Was there a good reason for the name theta for weights*x? 
> > I personally think using an explicit matrix solver is much nicer to read than > that complex hand coded version. Does it really save much time? > > My suggestion is just: > 76,78c76,81 > < b = numpy.array([sum(weights*y), sum(weights*y*x)]) > < A = numpy.array([[sum(weights), sum(weights*x)], > < [sum(weights*x), sum(weights*x*x)]]) > --- > > weights_x = weights*x > > weights_y = weights*y > > sum_weights_x = sum(weights_x) > > b = numpy.array([sum(weights_y), sum(weights_y*x)]) > > A = numpy.array([[sum(weights), sum_weights_x], > > [sum_weights_x, sum(weights_x*x)]]) > > However, I'm going to leave this for Michiel to resolve (given he wrote the > code in the first place). > Yes-- replacing numpy saves quite a bit of time. When I replaced the variable so they werent recalculated every single time, it reduced unit test time 17% compared to the original then repaklcing numpy receduced it to a net 38% from the original so huge difference. Also, I suggest changing something if you all decided to keep numpy. Minor but just a suggestion. > weights_x = weights*x > sum_weights_x = sum(weights_x) > b = numpy.array([sum(weights*y), sum(weights_x*y)]) > A = numpy.array([[sum(weights), sum_weights_x], > [sum_weights_x, sum(weights_x*x)]]) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Fri Nov 14 06:32:39 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 14 Nov 2008 06:32:39 -0500
Subject: [Biopython-dev] [Bug 2657] Improved Bio/Statistics/lowess.py
In-Reply-To:
Message-ID: <200811141132.mAEBWdlC014111@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2657

------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-14 06:32 EST -------
(In reply to comment #4)
> Created an attachment (id=1058)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1058&action=view) [details]
> Unit test for lowess.py
>
> File will need to have the import statements adjusted for the Biopython
> structure.
>

You're also using scipy and rpy (not Biopython dependencies), so if we wanted to include these tests they would have to be made conditional on these external dependencies (so that the test framework knows when it can skip them).

Removing them effectively leaves one simple test:

from numpy import array
from Bio.Statistics.lowess import lowess

hand_iterations = 1
hand_f = 2./3.
hand_x = array([0.0,1.0,4.0,7.0])
hand_y = array([0.0,1.0,16.0,49.0])
#Was there a typo in the original, 18.85086... versus 18.5086...?
#hand_out = [-1.333391371257, 2.802858739, 18.850860916, 48.302727]
hand_out = [ -1.33338941, 2.80323154, 18.50860916, 48.30274834]
method_out = lowess(hand_x,hand_y,hand_f,hand_iterations)
for a,b in zip(method_out, hand_out) :
    assert abs(a-b) < 0.00001
print "Done"
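
One way to make a test conditional on an optional dependency, as suggested in the comment above, is to turn the ImportError into a dedicated exception that the runner treats as "skip". The sketch below is only an assumption about how that could look: MissingDependency, require, run_optional_test and close_enough are hypothetical names, not part of Biopython's test framework. The tolerance check mirrors the abs(a-b) < 0.00001 comparison in the simplified test.

```python
import importlib


class MissingDependency(Exception):
    """Hypothetical stand-in for the exception a test runner treats as 'skip'."""


def require(module_name):
    """Import and return module_name, or raise MissingDependency."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        raise MissingDependency("Install %s to run this test." % module_name)


def run_optional_test(module_name, test):
    """Run test(module); report 'skipped' when the dependency is absent."""
    try:
        module = require(module_name)
    except MissingDependency:
        return "skipped"
    test(module)
    return "ok"


def close_enough(observed, expected, tolerance=0.00001):
    """Compare two sequences of floats element by element within a tolerance."""
    return all(abs(a - b) < tolerance for a, b in zip(observed, expected))


def sqrt_test(math_module):
    """Example test body that uses the imported module."""
    assert close_enough([math_module.sqrt(4.0)], [2.0])


print(run_optional_test("math", sqrt_test))
print(run_optional_test("module_that_is_not_installed_xyz", sqrt_test))
```

Keeping the import inside require means a missing optional library is reported once, by name, instead of surfacing as an unexplained failure.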
From bugzilla-daemon at portal.open-bio.org Fri Nov 14 06:35:44 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 14 Nov 2008 06:35:44 -0500 Subject: [Biopython-dev] [Bug 2657] Improved Bio/Statistics/lowess.py In-Reply-To: Message-ID: <200811141135.mAEBZiCO014367@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2657 ------- Comment #7 from eric.pruitt at gmail.com 2008-11-14 06:35 EST ------- (In reply to comment #6) > (In reply to comment #4) > > Created an attachment (id=1058) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1058&action=view) [details] [details] > > Unit test for lowess.py > > > > File will need to have the import statements adjsuted for the Bio.Python > > structure. > > > > You're also using scipy and rpy (not Biopython dependencies), so if we wanted > to include these tests they would have to be made conditional on these external > dependencies (so that the test framework knows when it can skip them). > > Removing them effectivly leaves one simple test: > > from numpy import array > from Bio.Statistics.lowess import lowess > > hand_iterations = 1 > hand_f = 2./3. > hand_x = array([0.0,1.0,4.0,7.0]) > hand_y = array([0.0,1.0,16.0,49.0]) > #Was there a typo in the original, 18.85086... versus 18.5086...? > #hand_out = [-1.333391371257, 2.802858739, 18.850860916, 48.302727] > hand_out = [ -1.33338941, 2.80323154, 18.50860916, 48.30274834] > method_out = lowess(hand_x,hand_y,hand_f,hand_iterations) > for a,b in zip(method_out, hand_out) : > assert abs(a-b) < 0.00001 > print "Done" > When I did the hand calculations, I used a TI-84+ which uses decimal math eliminating the binary error inherent in most python implementations. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Fri Nov 14 06:38:51 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 14 Nov 2008 06:38:51 -0500 Subject: [Biopython-dev] [Bug 2657] Improved Bio/Statistics/lowess.py In-Reply-To: Message-ID: <200811141138.mAEBcpNd014578@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2657 ------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-14 06:38 EST ------- (In reply to comment #5) >> I personally think using an explicit matrix solver is much nicer to read >> than that complex hand coded version. Does it really save much time? >> ... >> However, I'm going to leave this for Michiel to resolve (given he wrote >> the code in the first place). >> > > Yes-- replacing numpy saves quite a bit of time. When I replaced the variable > so they werent recalculated every single time, it reduced unit test time 17% > compared to the original then repaklcing numpy receduced it to a net 38% from > the original so huge difference. OK - so its clarity versus what sounds like a big speed difference. > Also, I suggest changing something if you all > decided to keep numpy. Minor but just a suggestion. > > > weights_x = weights*x > > sum_weights_x = sum(weights_x) > > b = numpy.array([sum(weights*y), sum(weights_x*y)]) > > A = numpy.array([[sum(weights), sum_weights_x], > > [sum_weights_x, sum(weights_x*x)]]) > I see, in defining b, sum(weights*y*x) can be done as sum(weights_x*y) which avoids creating the temp variable weights_y = weights*y, that does look better. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Fri Nov 14 06:41:05 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 14 Nov 2008 06:41:05 -0500
Subject: [Biopython-dev] [Bug 2657] Improved Bio/Statistics/lowess.py
In-Reply-To:
Message-ID: <200811141141.mAEBf5IS014888@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2657

eric.pruitt at gmail.com changed:

What |Removed |Added
----------------------------------------------------------------------------
CC |eric.pruitt at gmail.com |

From bugzilla-daemon at portal.open-bio.org Fri Nov 14 06:48:07 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 14 Nov 2008 06:48:07 -0500
Subject: [Biopython-dev] [Bug 2658] New: 1.49b version of PDB Neighborsearch still based on Numeric
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2658

Summary: 1.49b version of PDB Neighborsearch still based on Numeric
Product: Biopython
Version: 1.49b
Platform: Macintosh
OS/Version: Mac OS
Status: NEW
Severity: normal
Priority: P3
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: rbickerton at gmail.com

Using python 2.52, running:

python ./lib/python2.5/site-packages/Bio/PDB/NeighborSearch.py

gives:

Traceback (most recent call last):
  File "./lib/python2.5/site-packages/Bio/PDB/NeighborSearch.py", line 138, in
    ns=NeighborSearch(al)
  File "./lib/python2.5/site-packages/Bio/PDB/NeighborSearch.py", line 41, in __init__
    assert(self.coords.typecode()=="f")
AttributeError: 'numpy.ndarray' object has no attribute 'typecode'
Exit 1

A bit of google digging suggested that .typecode()=="f" is a Numarray function that should be updated to its Numpy equivalent.
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Nov 14 07:06:28 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 14 Nov 2008 07:06:28 -0500 Subject: [Biopython-dev] [Bug 2658] 1.49b version of PDB Neighborsearch still based on Numeric In-Reply-To: Message-ID: <200811141206.mAEC6SEp016723@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2658 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- OS/Version|Mac OS |All ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-14 07:06 EST ------- Yes, that does look like an oversight in the Numeric to NumPy migration. See also Bug 2649 for a related but different issue in Bio.KDTree -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Nov 14 07:18:25 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 14 Nov 2008 07:18:25 -0500 Subject: [Biopython-dev] [Bug 2634] PAM30 Matrix doesn't work with qblast In-Reply-To: Message-ID: <200811141218.mAECIPRT017833@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2634 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |INVALID ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-14 07:18 EST ------- Hi Nick, I hope you got your blast to work. 
I don't think we have an issue with Biopython itself, so I'm going to close this bug. It would be nice to somehow improve the error handling, but that doesn't look straight forward. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Nov 14 07:24:16 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 14 Nov 2008 07:24:16 -0500 Subject: [Biopython-dev] [Bug 2604] test_Restriction failure with Python 2.6 (also cause error in test_CAPS) In-Reply-To: Message-ID: <200811141224.mAECOGMN018266@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2604 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-14 07:24 EST ------- I'm going to mark this as fixed given it seem to be OK. Please reopen this if there are any issues. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Fri Nov 14 07:27:23 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 14 Nov 2008 12:27:23 +0000 Subject: [Biopython-dev] Biopython 1.49 beta released In-Reply-To: <320fb6e00811090716v58637d55o470246df4175464e@mail.gmail.com> References: <320fb6e00811090716v58637d55o470246df4175464e@mail.gmail.com> Message-ID: <320fb6e00811140427u50b3d42bn9514a8352d936960@mail.gmail.com> On Sun, Nov 9, 2008 at 3:16 PM, Peter wrote: > Dear Biopythoneers, > > We are pleased to announce a beta release of Biopython 1.49. 
There are > been some significant changes since Biopython 1.48 was released two > months ago, which is why we are initially releasing a beta for wider > testing. > > As previously announced, the big news is that Biopython now uses NumPy > rather than its precursor Numeric (the original Numerical Python > library). We've had a few Numeric -> NumPy bugs reported, http://bugzilla.open-bio.org/show_bug.cgi?id=2658 Bug 2658 - Bio.PDB.Neighborsearch http://bugzilla.open-bio.org/show_bug.cgi?id=2649 Bug 2649 - Bio.KDTree (probably fixed) I don't think we should release Biopython 1.49 final until these are resolved - but if there was interest I could put out a second beta. Peter From bugzilla-daemon at portal.open-bio.org Fri Nov 14 08:17:39 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 14 Nov 2008 08:17:39 -0500 Subject: [Biopython-dev] [Bug 2638] test_PopGen_SimCoal_nodepend.py fails on Windows, newline issue In-Reply-To: Message-ID: <200811141317.mAEDHdWo021804@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2638 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-14 08:17 EST ------- Patch checked in after testing with SIMCOAL2 on Windows XP. Marking as fixed. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Fri Nov 14 10:16:12 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 14 Nov 2008 10:16:12 -0500 Subject: [Biopython-dev] [Bug 2640] Proposal: doctest for SeqRecord/biopython In-Reply-To: Message-ID: <200811141516.mAEFGClF031759@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2640 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|REOPENED |RESOLVED Resolution| |FIXED ------- Comment #19 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-14 10:16 EST ------- I've added a general example doctest to the main docstring for the SeqRecord object. Marking as fixed. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Nov 14 10:35:18 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 14 Nov 2008 10:35:18 -0500 Subject: [Biopython-dev] [Bug 2524] Handle missing libraries like numpy or reportlab in run_tests.py In-Reply-To: Message-ID: <200811141535.mAEFZIP8001033@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2524 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-14 10:35 EST ------- Fixed the numpy test cases (they were getting annoying with python 2.6 on Windows where numpy isn't yet available). The reportlab tests already fail gracefully. 
I ended up going down this route: > (b) Modify all the tests using these semi-optional libraries to catch > the ImportError and raise MissingExternalDependencyError instead. As > the tests themselves generally don't directly import the external > library this is perhaps messy. Marking this bug as fixed. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bsouthey at gmail.com Fri Nov 14 10:39:00 2008 From: bsouthey at gmail.com (Bruce Southey) Date: Fri, 14 Nov 2008 09:39:00 -0600 Subject: [Biopython-dev] Biopython 1.49 beta released In-Reply-To: <320fb6e00811140427u50b3d42bn9514a8352d936960@mail.gmail.com> References: <320fb6e00811090716v58637d55o470246df4175464e@mail.gmail.com> <320fb6e00811140427u50b3d42bn9514a8352d936960@mail.gmail.com> Message-ID: <491D9B94.9050805@gmail.com> Peter wrote: > On Sun, Nov 9, 2008 at 3:16 PM, Peter wrote: > >> Dear Biopythoneers, >> >> We are pleased to announce a beta release of Biopython 1.49. There are >> been some significant changes since Biopython 1.48 was released two >> months ago, which is why we are initially releasing a beta for wider >> testing. >> >> As previously announced, the big news is that Biopython now uses NumPy >> rather than its precursor Numeric (the original Numerical Python >> library). >> > > We've had a few Numeric -> NumPy bugs reported, > > http://bugzilla.open-bio.org/show_bug.cgi?id=2658 > Bug 2658 - Bio.PDB.Neighborsearch > > http://bugzilla.open-bio.org/show_bug.cgi?id=2649 > Bug 2649 - Bio.KDTree (probably fixed) > > I don't think we should release Biopython 1.49 final until these are > resolved - but if there was interest I could put out a second beta. 
> > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > > I noticed that Bio.PDB.Neighborsearch is not being tested. Is there someway to identify which functions are not getting tested? I know it is considerable effort but it would allow the development of tests that at the very least exercise all the Biopython code. (Hopefully this is not as bad as the Numpy documentation marathon.) Bruce From biopython at maubp.freeserve.co.uk Fri Nov 14 10:46:34 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 14 Nov 2008 15:46:34 +0000 Subject: [Biopython-dev] Biopython 1.49 beta released In-Reply-To: <491D9B94.9050805@gmail.com> References: <320fb6e00811090716v58637d55o470246df4175464e@mail.gmail.com> <320fb6e00811140427u50b3d42bn9514a8352d936960@mail.gmail.com> <491D9B94.9050805@gmail.com> Message-ID: <320fb6e00811140746m119a040dv778163e0ab034a2@mail.gmail.com> On Fri, Nov 14, 2008 at 3:39 PM, Bruce Southey wrote: > Peter wrote: >> We've had a few Numeric -> NumPy bugs reported, >> >> http://bugzilla.open-bio.org/show_bug.cgi?id=2658 >> Bug 2658 - Bio.PDB.Neighborsearch >> >> http://bugzilla.open-bio.org/show_bug.cgi?id=2649 >> Bug 2649 - Bio.KDTree (probably fixed) >> >> ... > > I noticed that Bio.PDB.Neighborsearch is not being tested. > That fact that we didn't spot Bug 2658 from the unit tests makes that very clear ;) > > Is there someway to identify which functions are not getting tested? > I can't think of an easy way - the best bet might be a quick script to scan all the unit tests and pull out import lines, and from this build a list of all modules which have some coverage. This wouldn't tell us about how much of each module is tested, but it would be better than nothing. > I know it is considerable effort but it would allow the development of tests > that at the very least exercise all the Biopython code. 
(Hopefully this is > not as bad as the Numpy documentation marathon.) I've written plenty of tests myself, including for existing modules - my gut feeling is full test coverage would be quite a marathon. Compared to the early years of the project, I've probably tried to be a bit stricter about making sure we have test cases and documentation before accepting new code. In some cases this has worked out pretty well (e.g. Tiago's PopGen stuff is covered in the tutorial and has unit tests). In other cases it could put people off contributing code. Peter From biopython at maubp.freeserve.co.uk Fri Nov 14 12:24:33 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 14 Nov 2008 17:24:33 +0000 Subject: [Biopython-dev] Test coverage Message-ID: <320fb6e00811140924g26cc0703r2629380540a5b667@mail.gmail.com> Bruce: >> >> Is there someway to identify which functions are not getting tested? >> Peter: > I can't think of an easy way - the best bet might be a quick script to > scan all the unit tests and pull out import lines, and from this build > a list of all modules which have some coverage. This wouldn't tell us > about how much of each module is tested, but it would be better than > nothing. I've done a very crude script to try and answer this, and can point out a few modules in need of tests: Bio.Affy Bio.AlignAce Bio.EZRetrieve Bio.Emboss (everything except the primer parsers) Bio.Encodings (obsolete?) Bio.FilteredReader (obsolete?) Bio.MaxEntropy Bio.NMR Bio.NaiveBayes Bio.NetCatch (obsolete?)
Peter From bugzilla-daemon at portal.open-bio.org Fri Nov 14 13:06:49 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 14 Nov 2008 13:06:49 -0500 Subject: [Biopython-dev] [Bug 2659] New: Typo in tutorial section "2.1 General overview of what Biopython provides" Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2659 Summary: Typo in tutorial section "2.1 General overview of what Biopython provides" Product: Biopython Version: Not Applicable Platform: PC OS/Version: Linux Status: NEW Severity: trivial Priority: P2 Component: Documentation AssignedTo: biopython-dev at biopython.org ReportedBy: wilcoxjg at gmail.com Sentence reads: "To me, this can be frustrating since I often WAY to just know the one right way to do something." Should be: "To me, this can be frustrating since I often WANT to just know the one right way to do something." -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Nov 14 13:16:18 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 14 Nov 2008 13:16:18 -0500 Subject: [Biopython-dev] [Bug 2660] New: Typo in tutorial section "2.2 Working with sequences" Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2660 Summary: Typo in tutorial section "2.2 Working with sequences" Product: Biopython Version: Not Applicable Platform: PC OS/Version: Linux Status: NEW Severity: minor Priority: P2 Component: Documentation AssignedTo: biopython-dev at biopython.org ReportedBy: wilcoxjg at gmail.com Sentence reads: "What we have here is a sequence object with a generic alphabet - reflecting the fact WE HAVE SPECIFIED if this is a DNA or protein sequence (okay, a protein with a lot of Alanines, Glycines, Cysteines and Threonines!)." 
Should read: "What we have here is a sequence object with a generic alphabet - reflecting the fact we have NOT specified if this is a DNA or protein sequence (okay, a protein with a lot of Alanines, Glycines, Cysteines and Threonines!)." -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Nov 14 13:28:12 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 14 Nov 2008 13:28:12 -0500 Subject: [Biopython-dev] [Bug 2659] Typo in tutorial section "2.1 General overview of what Biopython provides" In-Reply-To: Message-ID: <200811141828.mAEISCmZ013084@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2659 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-14 13:28 EST ------- Thanks :) That's fixed in CVS now, see Doc/Tutorial.tex revision 1.185, which you can view online here (updated every hour): http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Doc/Tutorial.tex?cvsroot=biopython We'll update the HTML and PDF on the website as part of the next release (Biopython 1.49). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Fri Nov 14 13:34:34 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 14 Nov 2008 13:34:34 -0500 Subject: [Biopython-dev] [Bug 2661] New: Typo in: "2.3 A usage example" Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2661 Summary: Typo in: "2.3 A usage example" Product: Biopython Version: Not Applicable Platform: PC OS/Version: Linux Status: NEW Severity: trivial Priority: P2 Component: Documentation AssignedTo: biopython-dev at biopython.org ReportedBy: wilcoxjg at gmail.com Sentence reads: "We'll start with sequence parsing in Section 2.4, but the orchids will be back later on as well - for example WE'LL EXTRA DATA FROM Swiss-Prot from certain orchid proteins in Section 6.1, search PubMed for papers about orchids in Section 6.2, extract sequence data from GenBank in Section 6.3.1, and work with ClustalW multiple sequence alignments of orchid proteins in Section 6.4.1." Capitalized phrase should contain some modifier like "we'll NEED extra", or "we'll GET extra". -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Fri Nov 14 13:34:49 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 14 Nov 2008 13:34:49 -0500 Subject: [Biopython-dev] [Bug 2660] Typo in tutorial section "2.2 Working with sequences" In-Reply-To: Message-ID: <200811141834.mAEIYnm6013826@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2660 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-14 13:34 EST ------- The tutorial on the website (matching Biopython 1.49b) is fine: http://biopython.org/DIST/docs/tutorial/Tutorial.html http://biopython.org/DIST/docs/tutorial/Tutorial.pdf Which version of Biopython are you using (you didn't fill this in on the bug report), or where are you reading this? Looking over CVS this text was only like this in Biopython 1.44, so I'm a little confused. Thanks, Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Nov 14 13:38:06 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 14 Nov 2008 13:38:06 -0500 Subject: [Biopython-dev] [Bug 2661] Typo in: "2.3 A usage example" In-Reply-To: Message-ID: <200811141838.mAEIc6Qo014131@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2661 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-14 13:38 EST ------- As per Bug 2660, which version of Biopython are you using (you didn't fill this in on the bug report), or where are you reading this? 
This has already been fixed to say "extract" instead of "extra" (but I'm not going to check exactly when this was corrected). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Nov 14 13:40:28 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 14 Nov 2008 13:40:28 -0500 Subject: [Biopython-dev] [Bug 2660] Typo in tutorial section "2.2 Working with sequences" In-Reply-To: Message-ID: <200811141840.mAEIeSsm014238@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2660 wilcoxjg at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Version|Not Applicable |1.44 -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Nov 14 13:41:47 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 14 Nov 2008 13:41:47 -0500 Subject: [Biopython-dev] [Bug 2661] Typo in: "2.3 A usage example" In-Reply-To: Message-ID: <200811141841.mAEIfll7014298@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2661 wilcoxjg at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Version|Not Applicable |1.44 -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Fri Nov 14 13:47:28 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 14 Nov 2008 13:47:28 -0500 Subject: [Biopython-dev] [Bug 2660] Typo in tutorial section "2.2 Working with sequences" In-Reply-To: Message-ID: <200811141847.mAEIlS8Y014586@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2660 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-14 13:47 EST ------- Hi Josh, If you were reading the tutorial shipped with Biopython 1.44 this makes sense. I certainly don't want to put you off reporting any other typos, but if you find any more please first check against the (almost completely) up to date version before reporting them: http://biopython.org/DIST/docs/tutorial/Tutorial.html http://biopython.org/DIST/docs/tutorial/Tutorial.pdf Note that some of the things covered in the current tutorial will not apply to Biopython 1.44, which is now a year old. I'd encourage you to upgrade if possible. Thanks, Peter P.S. Marking this bug as fixed. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From mhampton at d.umn.edu Fri Nov 14 14:48:42 2008 From: mhampton at d.umn.edu (Marshall Hampton) Date: Fri, 14 Nov 2008 13:48:42 -0600 (CST) Subject: [Biopython-dev] coverage of function testing Message-ID: Hi, I noticed some discussion of the coverage and automation of testing for functions in biopython, and thought I would suggest folks check out the testing and coverage tools in Sage (www.sagemath.org). 
Testing of functions in Sage is done by testing examples in their docstrings - there are comments to opt out of testing or to indicate if they will take a long time. They also have scripts for checking which functions have at least one such testable example. So you can do something like this: sage -coverage PATH_TO_SAGE/sage/geometry/polyhedra.py and get SCORE /Volumes/D/sage-3.2.alpha0/devel/sage-main/sage/geometry/polyhedra.py: 100% (21 of 21) to see if anything is untested. Now that biopython is converting to numpy, I will start arguing for its inclusion as a standard part of Sage (right now it is an optional package). Cheers, Marshall Hampton Integrated Biosciences Program and Department of Mathematics and Statistics University of Minnesota, Duluth From bugzilla-daemon at portal.open-bio.org Fri Nov 14 15:27:12 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 14 Nov 2008 15:27:12 -0500 Subject: [Biopython-dev] [Bug 2662] New: Typo in tutorial "Chapter 3 Sequence objects " Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2662 Summary: Typo in tutorial "Chapter 3 Sequence objects " Product: Biopython Version: 1.49b Platform: PC OS/Version: Linux Status: NEW Severity: trivial Priority: P2 Component: Documentation AssignedTo: biopython-dev at biopython.org ReportedBy: wilcoxjg at gmail.com Sentence reads: "First of all the Seq object has a slightly different set of METHODS TO A PLAIN python string (for example, reverse_complement() and translate() methods used for nucleotide sequences)." Should be: "methods THAN a plain python string" -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
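The docstring-coverage check Marshall describes for Sage could be approximated for an ordinary Python module roughly as follows. This is only a sketch under stated assumptions: `doctest_coverage` is a hypothetical helper, it is not how `sage -coverage` is implemented, and it only looks at top-level functions (not classes or methods), counting a function as covered when its docstring contains a `>>>` prompt.

```python
import inspect


def doctest_coverage(module):
    """Return (covered, total, missing) for functions defined in module.

    A function counts as covered when its docstring contains a '>>>' prompt.
    """
    missing = []
    total = 0
    for name, obj in inspect.getmembers(module, inspect.isfunction):
        if obj.__module__ != module.__name__:
            continue  # skip functions merely imported into the module
        total += 1
        if ">>>" not in (inspect.getdoc(obj) or ""):
            missing.append(name)
    return total - len(missing), total, missing
```

Calling it on an imported module gives a count in the same spirit as the "21 of 21" score above, plus the names of the functions that still lack a testable example.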
From biopython at maubp.freeserve.co.uk Fri Nov 14 15:29:16 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 14 Nov 2008 20:29:16 +0000 Subject: [Biopython-dev] coverage of function testing In-Reply-To: References: Message-ID: <320fb6e00811141229j3aa3a7b6ra3a064842e8f007c@mail.gmail.com> On Fri, Nov 14, 2008 at 7:48 PM, Marshall Hampton wrote: > Hi, > > I noticed some discussion of the coverage and automation of testing for > functions in biopython, and thought I would suggest folks check out the > testing and coverage tools in Sage (www.sagemath.org). Testing of functions > in Sage is done by testing examples in their docstrings - there are comments > to opt out of testing or to indicate if they will take a long time. They > also have scripts for checking which functions have at least one such > testable example. So you can do something like this: > > sage -coverage PATH_TO_SAGE/sage/geometry/polyhedra.py > > and get > > SCORE > /Volumes/D/sage-3.2.alpha0/devel/sage-main/sage/geometry/polyhedra.py: > 100% (21 of 21) > > to see if anything is untested. That may be worth a go, but there are two sides to this: (1) Making a list of the code that needs testing (pretty much the same for any python library) (2) Working out what is already tested (and here, that means going over Biopython's test framework which is based on unit test, but also includes some use of doctests). This is probably trickier... > Now that biopython is converting to numpy, I will start arguing for its > inclusion as a standard part of Sage (right now it is an optional package). That sounds good - but I have no knowledge of the Sage system and how they divide things up. 
Peter From bugzilla-daemon at portal.open-bio.org Fri Nov 14 18:15:57 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 14 Nov 2008 18:15:57 -0500 Subject: [Biopython-dev] [Bug 2662] Typo in tutorial "Chapter 3 Sequence objects " In-Reply-To: Message-ID: <200811142315.mAENFvNc000930@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2662 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-14 18:15 EST ------- (In reply to comment #0) > Sentence reads: > > "First of all the Seq object has a slightly different set of METHODS TO A > PLAIN python string (for example, reverse_complement() and translate() > methods used for nucleotide sequences)." There's nothing wrong with that (and I got a second opinion on this too). The only thing I think that might need changing is adding a comma: "First of all, the Seq object...". > Should be: > "methods THAN a plain python string" Why exactly? Are you an American? ;) There is also the possible option of "... different ... from ...", but that doesn't flow as nicely here. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Nov 14 18:47:16 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 14 Nov 2008 18:47:16 -0500 Subject: [Biopython-dev] [Bug 2657] Improved Bio/Statistics/lowess.py In-Reply-To: Message-ID: <200811142347.mAENlG5D003824@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2657 ------- Comment #9 from eric.pruitt at gmail.com 2008-11-14 18:47 EST ------- Created an attachment (id=1059) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1059&action=view) Test for speed comparison I wrote a short program to compare the speed of the original lowess function to my version. 
I thought the way the unit test was written might have affected results. On my system, the new version ran an average of 15 seconds per test as opposed to 19 for the old one, so not the boost I originally purported but closer to 27%. Posting the program so someone else can compare it. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Nov 14 21:06:49 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 14 Nov 2008 21:06:49 -0500 Subject: [Biopython-dev] [Bug 2658] 1.49b version of PDB Neighborsearch still based on Numeric In-Reply-To: Message-ID: <200811150206.mAF26nhu013792@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2658 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #2 from mdehoon at ims.u-tokyo.ac.jp 2008-11-14 21:06 EST ------- Fixed in CVS; see Bio/PDB/NeighborSearch.py revision 1.21. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Nov 14 22:59:22 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 14 Nov 2008 22:59:22 -0500 Subject: [Biopython-dev] [Bug 2609] Gcc 4.3.2 'initialization from incompatible pointer type' warning with triemodule.c In-Reply-To: Message-ID: <200811150359.mAF3xM8D020801@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2609 ------- Comment #1 from mdehoon at ims.u-tokyo.ac.jp 2008-11-14 22:59 EST ------- This warning is due to the introduction of Py_ssize_t in Python 2.5.
The best solution for this bug depends on which Python versions will be supported by Biopython. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Nov 14 23:04:00 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 14 Nov 2008 23:04:00 -0500 Subject: [Biopython-dev] [Bug 2657] Improved Bio/Statistics/lowess.py In-Reply-To: Message-ID: <200811150404.mAF4403S021350@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2657 ------- Comment #10 from mdehoon at ims.u-tokyo.ac.jp 2008-11-14 23:04 EST ------- A few comments: 1) Is there a reason to use numpy.abs instead of Python's built-in abs? Timing these two functions suggests that they are equally fast. 2) I have no objection against James' suggestion to speed up the code. The original call to numpy.linalg.solve was probably overkill. 3) Can you submit a unit test that does not use scipy and rpy? We should avoid adding additional dependencies to Biopython. 4) In the long run, I am not sure whether Biopython is the right place for the lowess function. Probably NumPy or Matplotlib would be better. (that shouldn't stop us from improving the code here, though). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Sat Nov 15 02:16:11 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 15 Nov 2008 02:16:11 -0500 Subject: [Biopython-dev] [Bug 2609] Gcc 4.3.2 'initialization from incompatible pointer type' warning with triemodule.c In-Reply-To: Message-ID: <200811150716.mAF7GB1r002223@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2609 ------- Comment #2 from mdehoon at ims.u-tokyo.ac.jp 2008-11-15 02:16 EST ------- I have uploaded a fixed version to CVS. Could you try it? Bio/triemodule.c, revision 1.7. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sat Nov 15 11:29:53 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 15 Nov 2008 11:29:53 -0500 Subject: [Biopython-dev] [Bug 2657] Improved Bio/Statistics/lowess.py In-Reply-To: Message-ID: <200811151629.mAFGTrgj008598@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2657 ------- Comment #11 from eric.pruitt at gmail.com 2008-11-15 11:29 EST ------- (In reply to comment #10) > A few comments: > > 1) Is there a reason to use numpy.abs instead of Python's built-int abs? Timing > these two functions suggests that they are equally fast. > 2) I have no objection against James' suggestion to speed up the code. The > original call to numpy.linalg.solve was probably overkill. > 3) Can you submit a unit test that does not use scipy and rpy? We should avoid > adding additional dependencies to Biopython. > 4) In the long run, I am not sure whether Biopython is the right place for the > lowess function. Probably NumPy or Matplotlib would be better. (that shouldn't > stop us from improving the code here, though). 
> Yes, I only had the scipy and rpy dependencies in my unit test because I wanted to have something to compare your function to when I was going to first use it in my code and to make sure it worked after I made changes to it. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sat Nov 15 12:07:36 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 15 Nov 2008 12:07:36 -0500 Subject: [Biopython-dev] [Bug 2657] Improved Bio/Statistics/lowess.py In-Reply-To: Message-ID: <200811151707.mAFH7aZM010885@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2657 eric.pruitt at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1057 is|0 |1 obsolete| | ------- Comment #12 from eric.pruitt at gmail.com 2008-11-15 12:07 EST ------- Created an attachment (id=1060) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1060&action=view) Updated lowess.py Renamed "theta" to a more logical name, "weighted_mul_x." Replaced numpy.abs with regular abs statement (Actually lead to a very slight but still there speed increase). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Sat Nov 15 12:08:15 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 15 Nov 2008 12:08:15 -0500 Subject: [Biopython-dev] [Bug 2657] Improved Bio/Statistics/lowess.py In-Reply-To: Message-ID: <200811151708.mAFH8F6n010936@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2657 eric.pruitt at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1058 is|0 |1 obsolete| | ------- Comment #13 from eric.pruitt at gmail.com 2008-11-15 12:08 EST ------- Created an attachment (id=1061) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1061&action=view) Unit test for lowess.py removing scipy and rpy dependencies -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Nov 17 03:36:32 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 17 Nov 2008 03:36:32 -0500 Subject: [Biopython-dev] [Bug 2657] Improved Bio/Statistics/lowess.py In-Reply-To: Message-ID: <200811170836.mAH8aWoY027949@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2657 ------- Comment #14 from mdehoon at ims.u-tokyo.ac.jp 2008-11-17 03:36 EST ------- I have uploaded the new code and the unit test with some modifications to CVS. Could you have a look at it to see if you're happy with the result? I am using numpy.dot(x,y) instead of sum(x*y) wherever possible; this gave an additional speedup. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
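To illustrate the substitution mentioned in comment #14: for one-dimensional arrays, numpy.dot(x, y) computes the same weighted sum as sum(x*y), but in a single library call instead of building a temporary array and looping over it with the built-in sum. The arrays below are made-up values for illustration only and are not taken from lowess.py.

```python
import numpy

# Hypothetical weights and data points, chosen only for this example.
w = numpy.array([0.1, 0.2, 0.3, 0.4])
x = numpy.array([1.0, 2.0, 3.0, 4.0])

# Original style: element-wise product, then Python's built-in sum.
slow = sum(w * x)

# Replacement: a single dot product over the two 1-D arrays.
fast = numpy.dot(w, x)

# Both give the same weighted sum, up to floating-point rounding.
difference = abs(slow - fast)
```

How much faster the dot product is depends on the array sizes and the NumPy build, so any speedup should be measured rather than assumed.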
From bugzilla-daemon at portal.open-bio.org Mon Nov 17 05:33:37 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 17 Nov 2008 05:33:37 -0500
Subject: [Biopython-dev] [Bug 2609] Gcc 4.3.2 'initialization from incompatible pointer type' warning with triemodule.c
In-Reply-To:
Message-ID: <200811171033.mAHAXbbS003922@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2609

------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-17 05:33 EST -------
I haven't tried this on Linux yet.

===================================

I've just updated to CVS and rebuilt on Windows with mingw32 (gcc 3.4.4 cygming special), using Python 2.3, 2.4, 2.5 and 2.6 - no warnings from the Bio.Trie code. I should have checked for any warnings BEFORE updating to CVS, but didn't.

===================================

However, on Mac OS X 10.5 "Leopard" I now get a lot of pointer warnings:

building 'Bio.trie' extension
creating build/temp.macosx-10.3-i386-2.5
creating build/temp.macosx-10.3-i386-2.5/Bio
gcc -arch ppc -arch i386 -isysroot /Developer/SDKs/MacOSX10.4u.sdk -fno-strict-aliasing -Wno-long-double -no-cpp-precomp -mno-fused-madd -fno-common -dynamic -DNDEBUG -g -O3 -IBio -I/Library/Frameworks/Python.framework/Versions/2.5/include/python2.5 -c Bio/triemodule.c -o build/temp.macosx-10.3-i386-2.5/Bio/triemodule.o
Bio/triemodule.c: In function '_write_value_to_handle':
Bio/triemodule.c:498: warning: passing argument 3 of 'PyString_AsStringAndSize' from incompatible pointer type
Bio/triemodule.c: In function '_write_value_to_handle':
Bio/triemodule.c:498: warning: passing argument 3 of 'PyString_AsStringAndSize' from incompatible pointer type
gcc -arch ppc -arch i386 -isysroot /Developer/SDKs/MacOSX10.4u.sdk -fno-strict-aliasing -Wno-long-double -no-cpp-precomp -mno-fused-madd -fno-common -dynamic -DNDEBUG -g -O3 -IBio -I/Library/Frameworks/Python.framework/Versions/2.5/include/python2.5 -c Bio/trie.c -o build/temp.macosx-10.3-i386-2.5/Bio/trie.o
Bio/trie.c: In function 'Trie_set':
Bio/trie.c:103: warning: pointer targets in passing argument 1 of 'strdup' differ in signedness
Bio/trie.c:156: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness
Bio/trie.c:162: warning: pointer targets in passing argument 1 of 'strncpy' differ in signedness
Bio/trie.c:162: warning: pointer targets in passing argument 2 of 'strncpy' differ in signedness
Bio/trie.c:164: warning: pointer targets in passing argument 1 of 'strdup' differ in signedness
Bio/trie.c: In function 'Trie_set':
Bio/trie.c:103: warning: pointer targets in passing argument 1 of 'strdup' differ in signedness
Bio/trie.c:156: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness
Bio/trie.c:162: warning: pointer targets in passing argument 1 of 'strncpy' differ in signedness
Bio/trie.c:162: warning: pointer targets in passing argument 2 of 'strncpy' differ in signedness
Bio/trie.c:164: warning: pointer targets in passing argument 1 of 'strdup' differ in signedness
Bio/trie.c: In function 'Trie_get':
Bio/trie.c:229: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness
Bio/trie.c:229: warning: pointer targets in passing argument 1 of 'strncmp' differ in signedness
Bio/trie.c:229: warning: pointer targets in passing argument 2 of 'strncmp' differ in signedness
Bio/trie.c:235: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness
Bio/trie.c: In function '_get_approximate_transition':
Bio/trie.c:268: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness
Bio/trie.c:272: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness
Bio/trie.c:272: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness
Bio/trie.c:284: warning: pointer targets in passing argument 1 of 'strncat' differ in signedness
Bio/trie.c:284: warning: pointer targets in passing argument 2 of 'strncat' differ in signedness
Bio/trie.c: In function 'Trie_get':
Bio/trie.c:229: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness
Bio/trie.c:229: warning: pointer targets in passing argument 1 of 'strncmp' differ in signedness
Bio/trie.c:229: warning: pointer targets in passing argument 2 of 'strncmp' differ in signedness
Bio/trie.c:235: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness
Bio/trie.c: In function '_get_approximate_transition':
Bio/trie.c:268: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness
Bio/trie.c:272: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness
Bio/trie.c:272: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness
Bio/trie.c: In function '_get_approximate_trie':
Bio/trie.c:353: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness
Bio/trie.c:355: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness
Bio/trie.c:284: warning: pointer targets in passing argument 1 of 'strncat' differ in signedness
Bio/trie.c:356: warning: pointer targets in passing argument 1 of 'strcat' differ in signedness
Bio/trie.c:284: warning: pointer targets in passing argument 2 of 'strncat' differ in signedness
Bio/trie.c:356: warning: pointer targets in passing argument 2 of 'strcat' differ in signedness
Bio/trie.c:367: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness
Bio/trie.c:369: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness
Bio/trie.c: In function '_get_approximate_trie':
Bio/trie.c:353: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness
Bio/trie.c:355: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness
Bio/trie.c: In function 'Trie_has_prefix':
Bio/trie.c:356: warning: pointer targets in passing argument 1 of 'strcat' differ in signedness
Bio/trie.c:440: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness
Bio/trie.c:356: warning: pointer targets in passing argument 2 of 'strcat' differ in signedness
Bio/trie.c:441: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness
Bio/trie.c:443: warning: pointer targets in passing argument 1 of 'strncmp' differ in signedness
Bio/trie.c:443: warning: pointer targets in passing argument 2 of 'strncmp' differ in signedness
Bio/trie.c:367: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness
Bio/trie.c:369: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness
Bio/trie.c: In function '_iterate_helper':
Bio/trie.c:468: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness
Bio/trie.c:470: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness
Bio/trie.c:475: warning: pointer targets in passing argument 1 of 'strcat' differ in signedness
Bio/trie.c:475: warning: pointer targets in passing argument 2 of 'strcat' differ in signedness
Bio/trie.c: In function 'Trie_has_prefix':
Bio/trie.c:440: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness
Bio/trie.c:441: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness
Bio/trie.c: In function '_with_prefix_helper':
Bio/trie.c:521: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness
Bio/trie.c:443: warning: pointer targets in passing argument 1 of 'strncmp' differ in signedness
Bio/trie.c:522: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness
Bio/trie.c:443: warning: pointer targets in passing argument 2 of 'strncmp' differ in signedness
Bio/trie.c:524: warning: pointer targets in passing argument 1 of 'strncmp' differ in signedness
Bio/trie.c:524: warning: pointer targets in passing argument 2 of 'strncmp' differ in signedness
Bio/trie.c:530: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness
Bio/trie.c:536: warning: pointer targets in passing argument 1 of 'strncat' differ in signedness
Bio/trie.c:536: warning: pointer targets in passing argument 2 of 'strncat' differ in signedness
Bio/trie.c: In function '_iterate_helper':
Bio/trie.c:468: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness
Bio/trie.c:470: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness
Bio/trie.c:475: warning: pointer targets in passing argument 1 of 'strcat' differ in signedness
Bio/trie.c:475: warning: pointer targets in passing argument 2 of 'strcat' differ in signedness
Bio/trie.c: In function '_with_prefix_helper':
Bio/trie.c:521: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness
Bio/trie.c:522: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness
Bio/trie.c: In function '_serialize_transition':
Bio/trie.c:524: warning: pointer targets in passing argument 1 of 'strncmp' differ in signedness
Bio/trie.c:621: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness
Bio/trie.c:524: warning: pointer targets in passing argument 2 of 'strncmp' differ in signedness
Bio/trie.c:530: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness
Bio/trie.c:536: warning: pointer targets in passing argument 1 of 'strncat' differ in signedness
Bio/trie.c:536: warning: pointer targets in passing argument 2 of 'strncat' differ in signedness
Bio/trie.c: In function '_serialize_transition':
Bio/trie.c:621: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness
Bio/trie.c: In function '_deserialize_transition':
Bio/trie.c:708: warning: pointer targets in passing argument 1 of 'strdup' differ in signedness
Bio/trie.c: In function 'test':
Bio/trie.c:752: warning: pointer targets in passing argument 2 of 'Trie_set' differ in signedness
Bio/trie.c:753: warning: pointer targets in passing argument 2 of 'Trie_set' differ in signedness
Bio/trie.c:754: warning: pointer targets in passing argument 2 of 'Trie_set' differ in signedness
Bio/trie.c:755: warning: pointer targets in passing argument 2 of 'Trie_set' differ in signedness
Bio/trie.c:757: warning: pointer targets in passing argument 2 of 'Trie_get' differ in signedness
Bio/trie.c:758: warning: pointer targets in passing argument 2 of 'Trie_get' differ in signedness
Bio/trie.c:759: warning: pointer targets in passing argument 2 of 'Trie_get' differ in signedness
Bio/trie.c: In function '_deserialize_transition':
Bio/trie.c:708: warning: pointer targets in passing argument 1 of 'strdup' differ in signedness
Bio/trie.c:760: warning: pointer targets in passing argument 2 of 'Trie_get' differ in signedness
Bio/trie.c:762: warning: pointer targets in passing argument 2 of 'Trie_set' differ in signedness
Bio/trie.c:763: warning: pointer targets in passing argument 2 of 'Trie_get' differ in signedness
Bio/trie.c:765: warning: pointer targets in passing argument 2 of 'Trie_get' differ in signedness
Bio/trie.c:768: warning: pointer targets in passing argument 2 of 'Trie_set' differ in signedness
Bio/trie.c:769: warning: pointer targets in passing argument 2 of 'Trie_get' differ in signedness
Bio/trie.c: In function 'test':
Bio/trie.c:752: warning: pointer targets in passing argument 2 of 'Trie_set' differ in signedness
Bio/trie.c:753: warning: pointer targets in passing argument 2 of 'Trie_set' differ in signedness
Bio/trie.c:754: warning: pointer targets in passing argument 2 of 'Trie_set' differ in signedness
Bio/trie.c:755: warning: pointer targets in passing argument 2 of 'Trie_set' differ in signedness
Bio/trie.c:757: warning: pointer targets in passing argument 2 of 'Trie_get' differ in signedness
Bio/trie.c:758: warning: pointer targets in passing argument 2 of 'Trie_get' differ in signedness
Bio/trie.c:759: warning: pointer targets in passing argument 2 of 'Trie_get' differ in signedness
Bio/trie.c:760: warning: pointer targets in passing argument 2 of 'Trie_get' differ in signedness
Bio/trie.c:762: warning: pointer targets in passing argument 2 of 'Trie_set' differ in signedness
Bio/trie.c:763: warning: pointer targets in passing argument 2 of 'Trie_get' differ in signedness
Bio/trie.c:765: warning: pointer targets in passing argument 2 of 'Trie_get' differ in signedness
Bio/trie.c:768: warning: pointer targets in passing argument 2 of 'Trie_set' differ in signedness
Bio/trie.c:769: warning: pointer targets in passing argument 2 of 'Trie_get' differ in signedness
gcc -arch i386 -arch ppc -isysroot /Developer/SDKs/MacOSX10.4u.sdk -g -bundle -undefined dynamic_lookup build/temp.macosx-10.3-i386-2.5/Bio/triemodule.o build/temp.macosx-10.3-i386-2.5/Bio/trie.o -o build/lib.macosx-10.3-i386-2.5/Bio/trie.so

$ python
Python 2.5.2 (r252:60911, Feb 22 2008, 07:57:53)
[GCC 4.0.1 (Apple Computer, Inc. build 5363)] on darwin
Type "help", "copyright", "credits" or "license" for more information.

$ gcc -v
Using built-in specs.
Target: i686-apple-darwin9
Configured with: /var/tmp/gcc/gcc-5465~16/src/configure --disable-checking -enable-werror --prefix=/usr --mandir=/share/man --enable-languages=c,objc,c++,obj-c++ --program-transform-name=/^[cg][^.-]*$/s/$/-4.0/ --with-gxx-include-dir=/include/c++/4.0.0 --with-slibdir=/usr/lib --build=i686-apple-darwin9 --with-arch=apple --with-tune=generic --host=i686-apple-darwin9 --target=i686-apple-darwin9
Thread model: posix
gcc version 4.0.1 (Apple Inc. build 5465)

Note that this gcc is only 4.0.1, while Bruce reported this bug on 4.3.2.

The good news is test_trie.py and test_triefind.py still pass.

From bugzilla-daemon at portal.open-bio.org Mon Nov 17 05:41:35 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 17 Nov 2008 05:41:35 -0500
Subject: [Biopython-dev] [Bug 2666] New: Bio.PDB.NeighborSearch self test often fails with MemoryError
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2666

           Summary: Bio.PDB.NeighborSearch self test often fails with MemoryError
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Mac OS
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk

From the Biopython source code (from CVS), in the Bio/PDB folder, running NeighborSearch.py does a quick self test. This is a random test, and sometimes it is fine:

$ python NeighborSearch.py
Found 1
Found 4
Found 3
Found 2
Found 2
Found 2
Found 3
Found 3
Found 1
Found 5
Found 2
Found 3
Found 2
Found 2
Found 2
Found 6
Found 3
Found 2
Found 3
Found 1

However, about 50% of the time I get something like this:

$ python NeighborSearch.py
Found 2
Found 1
Found 2
Found 1
Found 1
Found 1
Found 4
Found Traceback (most recent call last):
  File "NeighborSearch.py", line 139, in <module>
    print "Found ", len(ns.search_all(5.0))
  File "NeighborSearch.py", line 104, in search_all
    self.kdt.all_search(radius)
  File "/Users/pjcock/repositories/biopython/build/lib.macosx-10.3-i386-2.5/Bio/KDTree/KDTree.py", line 198, in all_search
    self.neighbors = self.kdt.neighbor_search(radius)
MemoryError: calculation failed due to lack of memory

I've tried this on a Mac which had over 4 GB of RAM free at the time, so I don't believe this really is a MemoryError. I've also tried this on a less powerful Windows machine, which fails in the same way (it can finish the test, but possibly with a lower success rate).

[As an aside, I'm planning to use this self test to create an actual Biopython unit test for the Bio.PDB.NeighborSearch module.]

From bugzilla-daemon at portal.open-bio.org Mon Nov 17 06:42:24 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 17 Nov 2008 06:42:24 -0500
Subject: [Biopython-dev] [Bug 2666] Bio.PDB.NeighborSearch self test often fails with KDTree MemoryError
In-Reply-To:
Message-ID: <200811171142.mAHBgOD9008929@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2666

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Bio.PDB.NeighborSearch self |Bio.PDB.NeighborSearch self
                   |test often fails with       |test often fails with KDTree
                   |MemoryError                 |MemoryError

------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-17 06:42 EST -------
I suspect this is failing when there are NO entries found within the specified radius. Changing this line:

    print "Found ", len(ns.search_all(5.0))

to use a larger search radius seems to "fix" the test, e.g.

    print "Found ", len(ns.search_all(10.0))

Similarly, dropping it to radius 2.0 makes it fail almost every time. Judging from the traceback, I suspect something is amiss in the KDTree C code.

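The behaviour suspected in comment #1 can be illustrated with a brute-force stand-in. The sketch below is not the Bio.KDTree C code and does not use Bio.PDB; `neighbor_pairs` and the sample coordinates are hypothetical. It only shows the expectation behind the report: when no pair of points lies within the radius, an all-pairs search should hand back an empty result rather than raise.

```python
import itertools
import math


def neighbor_pairs(points, radius):
    """Return index pairs (i, j), i < j, whose points lie within radius."""
    pairs = []
    for (i, p), (j, q) in itertools.combinations(enumerate(points), 2):
        if math.dist(p, q) <= radius:
            pairs.append((i, j))
    return pairs


# Two hypothetical atoms 100 units apart along the x axis.
far_apart = [(0.0, 0.0, 0.0), (100.0, 0.0, 0.0)]

nothing_found = neighbor_pairs(far_apart, 5.0)     # radius too small
both_found = neighbor_pairs(far_apart, 200.0)      # radius large enough
```

With the small radius the list is simply empty, so `len()` of the result is 0 and the "Found" line of a self test like the one above would still print.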
From bugzilla-daemon at portal.open-bio.org Mon Nov 17 06:44:45 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 17 Nov 2008 06:44:45 -0500
Subject: [Biopython-dev] [Bug 2609] Gcc 4.3.2 'initialization from incompatible pointer type' warning with triemodule.c
In-Reply-To:
Message-ID: <200811171144.mAHBijrj009171@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2609

------- Comment #4 from mdehoon at ims.u-tokyo.ac.jp 2008-11-17 06:44 EST -------
(In reply to comment #3)
Yes I know; that is bug #2608.

From bugzilla-daemon at portal.open-bio.org Mon Nov 17 07:09:15 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 17 Nov 2008 07:09:15 -0500
Subject: [Biopython-dev] [Bug 2666] Bio.PDB.NeighborSearch self test often fails with KDTree MemoryError
In-Reply-To:
Message-ID: <200811171209.mAHC9FUF010799@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2666

------- Comment #2 from mdehoon at ims.u-tokyo.ac.jp 2008-11-17 07:09 EST -------
I fixed Bio.KDTree and committed it to CVS; please give it a try.

From bugzilla-daemon at portal.open-bio.org Mon Nov 17 07:14:19 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 17 Nov 2008 07:14:19 -0500
Subject: [Biopython-dev] [Bug 2609] Gcc 4.3.2 'initialization from incompatible pointer type' warning with triemodule.c
In-Reply-To:
Message-ID: <200811171214.mAHCEJa0011060@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2609

------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-17 07:14 EST -------
(In reply to comment #4)
> (In reply to comment #3)
> Yes I know; that is bug #2608.
>
Oh. Sorry - I had seen Bug 2608 but hadn't made the connection.

I've just confirmed Linux with gcc 4.1.2 is still happy. Over to Bruce to test with gcc 4.3.2 then...

From bugzilla-daemon at portal.open-bio.org Mon Nov 17 07:25:21 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 17 Nov 2008 07:25:21 -0500
Subject: [Biopython-dev] [Bug 2666] Bio.PDB.NeighborSearch self test often fails with KDTree MemoryError
In-Reply-To:
Message-ID: <200811171225.mAHCPLmC011729@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2666

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-17 07:25 EST -------
That's fixed it - thanks!

I've also updated test_PDB.py to include a quick test of this code, based on the Bio/PDB/NeighborSearch.py self test code.

From tiagoantao at gmail.com Mon Nov 17 08:27:51 2008
From: tiagoantao at gmail.com (Tiago Antão)
Date: Mon, 17 Nov 2008 13:27:51 +0000
Subject: [Biopython-dev] PopGen.Stats
Message-ID: <6d941f120811170527g752c28a7j48b42569c947853d@mail.gmail.com>

After too much thinking and too much delaying (delaying in two distinct senses: delaying the proposal, and taking more than a year over the module), here is my proposal on how to proceed.

Remembering a few fundamental points:
1. Statistics is the core of population genetics. Bio.PopGen will never be relevant without it.
2. The framework should be future proof.
3. The API should be for general use (i.e. not only based on the cases developers know of).
4. It is very difficult to have a broad view of how an API like this can be used (uses vary from population genetics of cancer with microarrays and lots of data to conservation genetics of species with a few samples and a small number of loci).

A waterfall approach to development is not only outdated, it would also be quite counterproductive. So I have no bureaucratic design document to provide. My proposal is to choose a bunch of statistics and tests that are representative of what people might use and implement them. During the implementation, through refactoring, a reasonable API should take form.

What statistics should be chosen then? What are representative statistics? I was able to find a list of classifications to start. This list got some inspiration from the very good Arlequin manual. Here are the different dimensions that I found:

1. Intra-population versus inter-population statistics. Say expected heterozygosity versus Fst.
2. Marker dependent vs marker independent. Say allelic range (for microsatellites only) versus Fis.
3. Data type: haplotypic, genotypic phase unknown, genotypic phase known, genotypic dominant, frequency only. Say for expected heterozygosity frequencies are enough, for observed heterozygosity genotypic phase unknown data is necessary.
4. Single locus (e.g. allelic richness, ExpHe, Fst) versus multi-loci (e.g. number of polymorphic sites, LD or EHH).
5. Temporal/longitudinal vs single point in time. Say temporal-Fst versus Fst.
6. Population versus landscape. This issue I suggest we abandon for now.

So, the idea is to choose a set of statistics that elucidate these points; with a good subset we will get a feeling for how everything fits together. We implement them and then iterate until the API "feels good".

A suggestion of statistics:

ExpHz: non-temporal, intra, single-locus, marker independent, genotypic - gametic unk
ObsHz: non-temporal, intra, single-locus, independent, genotypic - gametic kn
Fst(CW): non-temporal, inter, single-locus, indep, genotypic - gametic unk
temporal-Fst: temporal, intra, single-locus, indep, genotypic - gametic unk
LD(D'): non-temporal, intra, multi-locus, indep, haplo/geno
Fk: temporal, intra, single-locus, indep, geno
S (polymorphic sites): non-temporal, intra, multi-locus, indep, haplo/geno
Allelic range: nt, intra, single-locus, microsat, haplo/geno
EHH: nt, positional
Tajima D: nt, intra, single-locus, sequence/rflp

There is still the issue of tests (say Hardy-Weinberg deviation), but that can be thought about while the rest is being done.

The good news is that half of the above is already implemented (the exceptions are allelic range, S, Tajima D, EHH - presented in increasing order of implementation difficulty). I propose implementing the remaining ones (I can do that, unless anyone else wants to give it a try) and then iterating on the API until there is rough agreement. This can be done on GIT (BTW, my username there is tiagoantao). I propose that the ability to influence policy is roughly proportional to the time spent coding/effort done ;) .

PS - I am assuming a sequence is a single locus in my reasoning. Of course it can be seen (and sometimes is) as a sequence of loci (SNPs).

From bugzilla-daemon at portal.open-bio.org Mon Nov 17 13:29:08 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 17 Nov 2008 13:29:08 -0500
Subject: [Biopython-dev] [Bug 2609] Gcc 4.3.2 'initialization from incompatible pointer type' warning with triemodule.c
In-Reply-To:
Message-ID: <200811171829.mAHIT8u9006711@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2609

------- Comment #6 from bsouthey at gmail.com 2008-11-17 13:29 EST -------
> Over to Bruce to test with gcc 4.3.2 then...
>
Still the same warning for Python 2.5 and 2.6:

Bio/triemodule.c: In function '_write_value_to_handle':
Bio/triemodule.c:498: warning: passing argument 3 of 'PyString_AsStringAndSize' from incompatible pointer type

See PEP 353 (http://www.python.org/dev/peps/pep-0353/), which suggests including:

#if PY_VERSION_HEX < 0x02050000 && !defined(PY_SSIZE_T_MIN)
typedef int Py_ssize_t;
#define PY_SSIZE_T_MAX INT_MAX
#define PY_SSIZE_T_MIN INT_MIN
#endif

I did not get the warning after I added it to Bio.trie.h (as I thought that this would be the appropriate location for it) and changed the declaration of length in _write_value_to_handle to:

Py_ssize_t length;

But while this is fine for Python 2.3 and Python 2.4, I get this error with Python 2.5 and Python 2.6:

[snip]
test_trie ... ERROR
test_triefind ... ok
======================================================================
ERROR: test_trie
----------------------------------------------------------------------
Traceback (most recent call last):
  File "run_tests.py", line 125, in runTest
    self.runSafeTest()
  File "run_tests.py", line 138, in runSafeTest
    cur_test = __import__(self.test_name)
  File "test_trie.py", line 87, in <module>
    trieobj3 = trie.load(h)
ValueError: bad marshal data

From bsouthey at gmail.com Mon Nov 17 13:35:05 2008
From: bsouthey at gmail.com (Bruce Southey)
Date: Mon, 17 Nov 2008 12:35:05 -0600
Subject: [Biopython-dev] test_GASelection hangs
In-Reply-To:
References:
Message-ID: <4921B959.2080706@gmail.com>

Hi,
I was just running the test under a very fresh cvs version and under Python2.3 the test was hanging with test_GASelection. Of course, there was no problem after killing it and rerunning the test. I think this also pertains to bug 2651 so I thought I would ask if there was a way to examine this further before doing anything else. I understand that this is a problem with randomization involved, but it does indicate a more subtle problem is present. I would really like to track down the source of the problem.

Does anyone have any ideas on how I could try to examine this further?

Thanks
Bruce

From biopython at maubp.freeserve.co.uk Mon Nov 17 13:50:14 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 17 Nov 2008 18:50:14 +0000
Subject: [Biopython-dev] test_GASelection hangs
In-Reply-To: <4921B959.2080706@gmail.com>
References: <4921B959.2080706@gmail.com>
Message-ID: <320fb6e00811171050v541106d8n371d92f9b7f6c595@mail.gmail.com>

On Mon, Nov 17, 2008 at 6:35 PM, Bruce Southey wrote:
> Hi,
> I was just running the test under a very fresh cvs version and under
> Python2.3 the test was hanging with test_GASelection. Of course, there was
> no problem after killing it and rerunning the test. I think this also
> pertains to bug 2651 so I thought I would ask if there was a way to examine
> this further before doing anything else. I understand that this is a problem
> with randomization involved, but it does indicate a more subtle problem is
> present. I would really like to track down the source of the problem.
>
> Does anyone have any ideas on how I could try to examine this further?

If you have installed CVS (or indeed any recent version of Biopython, as the GA stuff hasn't changed recently IIRC), then in the Tests directory you can just run:

$ python test_GASelection.py

You'll find sometimes it gets stuck. I tried modifying the file so that the end reads as follows:

if __name__ == "__main__":
    #sys.exit(run_tests(sys.argv))

    ALL_TESTS = [DiversitySelectionTest, TournamentSelectionTest,
                 RouletteWheelSelectionTest]

    runner = unittest.TextTestRunner(sys.stdout, verbosity = 2)
    test_loader = unittest.TestLoader()
    test_loader.testMethodPrefix = 't_'

    test=ALL_TESTS[1] #Edit me: 0, 1 or 2
    cur_suite = test_loader.loadTestsFromTestCase(test)
    count = 0
    while True :
        count += 1
        print "#"*50, count
        runner.run(cur_suite)

On my machine, DiversitySelectionTest and RouletteWheelSelectionTest seem safe - the tests just run and run until you interrupt them with ctrl+c.

However, this clearly gets stuck in TournamentSelectionTest - so we've narrowed this down a bit. Reading that bit of code, there is an apparent risk of an infinite loop if by chance org_1 happens to be the worst organism in the population. Perhaps we could add a simple counter to break out of the loop if, after 1000 tries, org_1 is still the worst - but I'm not sure what to do then.

Peter

From bugzilla-daemon at portal.open-bio.org Mon Nov 17 13:59:26 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 17 Nov 2008 13:59:26 -0500
Subject: [Biopython-dev] [Bug 2651] Error from test_GAQueens.py
In-Reply-To:
Message-ID: <200811171859.mAHIxQgZ009193@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2651

------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-17 13:59 EST -------
This is a quick hack to help pin-point the problem. Assuming you have the CVS or a recent version of Biopython installed, modify the end of test_GAQueens.py as follows:

if __name__ == "__main__":
    #sys.exit(main(sys.argv))
    count = 0
    while True :
        count +=1
        print "#"*50, count
        run_tests([])

This just repeats the test until it fails:

$ python test_GAQueens.py
...
################################################## 7
Calculating for 5 queens...
Generating an initial population of 1000 organisms...
Evolving the population and searching for a solution...
Traceback (most recent call last):
  File "test_GAQueens.py", line 405, in <module>
    run_tests([])
  File "test_GAQueens.py", line 42, in run_tests
    main(arguments)
  File "test_GAQueens.py", line 76, in main
    evolved_pop = evolver.evolve(queens_solved)
  File "/Users/xxx/Downloads/Software/biopython-1.49b/build/lib.macosx-10.3-i386-2.5/Bio/GA/Evolver.py", line 56, in evolve
    self._population = self._selector.select(self._population)
  File "/Users/xxx/Downloads/Software/biopython-1.49b/build/lib.macosx-10.3-i386-2.5/Bio/GA/Selection/Tournament.py", line 77, in select
    new_orgs[1])
  File "/Users/xxx/Downloads/Software/biopython-1.49b/build/lib.macosx-10.3-i386-2.5/Bio/GA/Selection/Abstract.py", line 53, in mutate_and_crossover
    final_org_1 = self._repairer.repair(final_org_1)
  File "test_GAQueens.py", line 234, in repair
    duplicated_items = self._get_duplicates(organism.genome)
  File "test_GAQueens.py", line 203, in _get_duplicates
    if genome.count(item) > 1:
  File "/Users/xxx/repositories/biopython/build/lib.macosx-10.3-i386-2.5/Bio/Seq.py", line 886, in count
    raise TypeError("expected a string, Seq or MutableSeq")
TypeError: expected a string, Seq or MutableSeq

i.e. the same traceback as in Bruce's original report (allowing for the update to the Seq object's count method), but easier to reproduce.

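The TypeError at the bottom of that traceback comes from calling count() with an item that is not a string. A plain Python string behaves the same way, which the sketch below shows. It is only an illustration under stated assumptions: the integer `genome` list is hypothetical, the code does not use Bio.Seq, and it is not claimed to be the change made to the test.

```python
# Hypothetical genome of integers (one column index per queen, assumed here).
genome = [3, 1, 4, 1, 5]

# A string's count method refuses an integer argument.
try:
    "31415".count(1)
    string_count_failed = False
except TypeError:
    string_count_failed = True

# Counting on a plain list of integers works as expected.
duplicates = sorted({item for item in genome if genome.count(item) > 1})

# Converting each item to text first also works on the string form.
as_text = "".join(str(item) for item in genome)
text_count = as_text.count(str(1))
```

Either counting on a list or converting each item with str() before the call avoids the error; which one suits the test depends on how the genome is stored.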
From bugzilla-daemon at portal.open-bio.org Mon Nov 17 14:18:24 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 17 Nov 2008 14:18:24 -0500 Subject: [Biopython-dev] [Bug 2651] Error from test_GAQueens.py In-Reply-To: Message-ID: <200811171918.mAHJIO5t010436@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2651 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-17 14:18 EST ------- Solved with Tests/test_GAQueens.py revision 1.3 in CVS. When test_GAQueens.py was written, a Seq object would accept an integer argument. Since Biopython 1.45, or to be exact Bio/Seq.py CVS revision 1.20 (see Bug 2386), the Seq object's count method will not accept an integer argument. This wasn't deliberate, but is consistent with a python string. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bsouthey at gmail.com Mon Nov 17 15:03:54 2008 From: bsouthey at gmail.com (Bruce Southey) Date: Mon, 17 Nov 2008 14:03:54 -0600 Subject: [Biopython-dev] test_GASelection hangs In-Reply-To: <320fb6e00811171050v541106d8n371d92f9b7f6c595@mail.gmail.com> References: <4921B959.2080706@gmail.com> <320fb6e00811171050v541106d8n371d92f9b7f6c595@mail.gmail.com> Message-ID: <4921CE2A.3090606@gmail.com> Peter wrote: > On Mon, Nov 17, 2008 at 6:35 PM, Bruce Southey wrote: > >> Hi, >> I was just running the test under a very fresh cvs version and under >> Python2.3 the test was hanging with test_GASelection. Of course, there was >> no problem after killing it and rerunning the test. 
I think this also >> pertains to bug 2651 so I thought I would ask if there was a way to examine >> this further before doing anything else. I understand that this is problem >> with randomization involved, but it does indicate a more subtle problem is >> present. I would really like to track down the source of the problem. >> >> Does anyone have any ideas on how I could try to examine this further? >> > > If you have installed CVS (or indeed any recent version of Biopython, > as the GA stuff hasn't changed recently IIRC), then in the Tests > directory you can just run: > > $ python test_GASelection.py > > You'll find sometimes it gets stuck. I tried modifying the file so > that the end reads as follows: > > if __name__ == "__main__": > #sys.exit(run_tests(sys.argv)) > > ALL_TESTS = [DiversitySelectionTest, TournamentSelectionTest, > RouletteWheelSelectionTest] > > runner = unittest.TextTestRunner(sys.stdout, verbosity = 2) > test_loader = unittest.TestLoader() > test_loader.testMethodPrefix = 't_' > > test=ALL_TESTS[1] #Edit me: 0, 1 or 2 > cur_suite = test_loader.loadTestsFromTestCase(test) > count = 0 > while True : > count += 1 > print "#"*50, count > runner.run(cur_suite) > > On my machine, DiversitySelectionTest and RouletteWheelSelectionTest > seem safe - the tests just run and run until you interrupt them with > ctrl+c. > > However, this clearly gets stuck in TournamentSelectionTest - so we've > narrowed this down a bit. Reading that bit of code, there is an > apparent risk of an infinite loop if by chance org_1 happens to be the > worst organism in the population. Perhaps adding a simple counter to > break out of the loop if after 1000 tries org_1 is still the worst - > but I'm not sure what to do then. > > Peter > > Hi, I ran the test multiple times using a bash loop and I think I tracked down this specific problem to within the actual test code, specifically the function TournamentSelectionTest.t_select_best(). I think this what Peter noticed. 
This is how I understand things, which I hope is sufficiently correct. The test simulates a genome that has 3 locations with the 4 bases coded as '0', '1', '2', and '3' for an 'organism'. (Note that the 3 locations are hard coded into the random_genome function.) The fitness of an organism is just the integer value of the coded genome, so the first position is hundreds, the second is tens and the last is ones. In TournamentSelectionTest.t_select_best, a second organism is simulated that must have a lower fitness than the first. The problem comes when the simulated genome of the first organism is '000', because its fitness is zero. This creates an infinite loop because the line: if org_2.fitness < org_1.fitness: will always be false, but it must eventually be true to break the loop. Given that there are only three locations (64 possible genomes), this should happen rather frequently. Is it sufficient to use the condition '<='? Alternatively, is there some way to fix the genome of the first organism rather than using a random one? For example, instead of calling random_organism(), declare it as, say: org_1=Organism('100', test_fitness) Bruce From biopython at maubp.freeserve.co.uk Mon Nov 17 16:49:02 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 17 Nov 2008 21:49:02 +0000 Subject: [Biopython-dev] test_GASelection hangs In-Reply-To: <4921CE2A.3090606@gmail.com> References: <4921B959.2080706@gmail.com> <320fb6e00811171050v541106d8n371d92f9b7f6c595@mail.gmail.com> <4921CE2A.3090606@gmail.com> Message-ID: <320fb6e00811171349j3bb2757epa7e52e5e55ac0c95@mail.gmail.com> Bruce wrote: > Peter wrote: >> However, this clearly gets stuck in TournamentSelectionTest - so we've >> narrowed this down a bit. Reading that bit of code, there is an >> apparent risk of an infinite loop if by chance org_1 happens to be the >> worst organism in the population.
Perhaps adding a simple counter to >> break out of the loop if after 1000 tries org_1 is still the worst - >> but I'm not sure what to do then. >> >> Peter > > Hi, > I ran the test multiple times using a bash loop and I think I tracked down > this specific problem to within the actual test code, specifically the > function TournamentSelectionTest.t_select_best(). I think this what Peter > noticed. Yes, this was what I was describing. > This is how I understand things which I hope is sufficient correct to > understand it. > > The test simulates a genome that has 3 locations with the 4 bases coded > as '0', '1', '2', and '3' for an 'organism'. (Note the 3 locations is hard > coded into the random_genome function.) The calculation of fitness of an > organism is just the integer of the coded values do the first position is > hundreds, the second is tens and last is ones. > > In the TournamentSelectionTest.t_select_best, a second organism is simulated > that must have a better fitness than the first. The problem comes is when > the simulated genome of the first organism is '000' because the fitness is > zero. This creates an infinite loop because the line : > if org_2.fitness < org_1.fitness: > will always to false but eventually this must be true to break the loop. > Obviously this loop becomes infinite and, given that there are only three > locations, it should be rather frequent. Yes. > Is it sufficient to use the condition '<='? No, I don't think so. The point of the setup seems to be to look for a pair of organisms where one is measurably fitter than the other (and make sure the better one is indeed selected). > Alternatively, is there someway to fix the genome of the first organism > rather than a random one? 
> For example, instead of the random_organism() declare it as say: > org_1=Organism('100', test_fitness) We could do something like: #Choose anything except the worst organism, "000", while True : org_1=random_organism() if test_fitness(org_1) > 0 : break [Not tested yet] This at least is more or less random. Peter From bugzilla-daemon at portal.open-bio.org Mon Nov 17 17:10:27 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 17 Nov 2008 17:10:27 -0500 Subject: [Biopython-dev] [Bug 2657] Improved Bio/Statistics/lowess.py In-Reply-To: Message-ID: <200811172210.mAHMARax021977@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2657 ------- Comment #15 from eric.pruitt at gmail.com 2008-11-17 17:10 EST ------- (In reply to comment #14) > I have uploaded the new code and the unit test with some modifications to CVS. > Could you have a look at it to see if you're happy with the result? I am using > numpy.dot(x,y) instead of sum(x*y) whereever possible; this gave an additional > speedup. > That worked really well; I'm happy with the results. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
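Returning to the test_GASelection hang discussed above: a minimal Python 3 sketch of the proposed approach, assuming the setup described in the thread (3 positions, bases coded '0' to '3', fitness equal to the genome read as an integer). Organism and random_organism follow the names used in the thread but are simplified stand-ins, genome_fitness takes the place of the thread's test_fitness, and pick_pair is a hypothetical helper. This is not the code committed to CVS.

```python
import random


class Organism:
    """Simplified stand-in for the organism used in test_GASelection.py."""

    def __init__(self, genome, fitness_calculator):
        self.genome = genome
        self.fitness = fitness_calculator(genome)


def genome_fitness(genome):
    # The genome read as a decimal integer, e.g. "213" -> 213.
    return int(genome)


def random_organism(rng):
    # Three positions, each one of the four coded bases.
    genome = "".join(rng.choice("0123") for _ in range(3))
    return Organism(genome, genome_fitness)


def pick_pair(rng):
    """Return (org_1, org_2) with org_2 strictly less fit than org_1."""
    # Reject "000" for org_1: nothing can be less fit than zero, so the
    # second loop below would otherwise never terminate.
    while True:
        org_1 = random_organism(rng)
        if org_1.fitness > 0:
            break
    # At least "000" now has a lower fitness, so this loop must end.
    while True:
        org_2 = random_organism(rng)
        if org_2.fitness < org_1.fitness:
            break
    return org_1, org_2
```

Rejecting only the worst genome keeps org_1 close to random, whereas hard coding '100' would test a single fixed case.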
From bugzilla-daemon at portal.open-bio.org Mon Nov 17 17:22:52 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 17 Nov 2008 17:22:52 -0500 Subject: [Biopython-dev] [Bug 2657] Improved Bio/Statistics/lowess.py In-Reply-To: Message-ID: <200811172222.mAHMMq6F022720@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2657 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #16 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-17 17:22 EST ------- (In reply to comment #15) > > That worked really well; I'm happy with the results. > Excellent - thanks James & Michiel! Marking this bug as fixed. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bsouthey at gmail.com Mon Nov 17 17:49:19 2008 From: bsouthey at gmail.com (Bruce Southey) Date: Mon, 17 Nov 2008 16:49:19 -0600 Subject: [Biopython-dev] test_GASelection hangs In-Reply-To: <320fb6e00811171349j3bb2757epa7e52e5e55ac0c95@mail.gmail.com> References: <4921B959.2080706@gmail.com> <320fb6e00811171050v541106d8n371d92f9b7f6c595@mail.gmail.com> <4921CE2A.3090606@gmail.com> <320fb6e00811171349j3bb2757epa7e52e5e55ac0c95@mail.gmail.com> Message-ID: <4921F4EF.4030005@gmail.com> Peter wrote: [snip] > >> Alternatively, is there someway to fix the genome of the first organism >> rather than a random one? 
>> For example, instead of the random_organism() declare it as say: >> org_1=Organism('100', test_fitness) >> > > We could do something like: > > #Choose anything except the worst organism, "000", > while True : > org_1=random_organism() > if test_fitness(org_1) > 0 : break > This needs to be: if org_1.fitness > 0 : break Also, when looping the test, I occasionally get: Test not getting an organism already in the new population. ... FAIL Test basic selection on a small population. ... ok ====================================================================== FAIL: Test not getting an organism already in the new population. ---------------------------------------------------------------------- Traceback (most recent call last): File "test_GASelection.py", line 130, in t_no_retrive_organism assert new_org != org, "Got organism already in the new population." AssertionError: Got organism already in the new population. I'll try to look at it tomorrow. Bruce PS Thanks for fixing test_GAQueens.py; I have not got it to fail even after running it 10000 times. From biopython at maubp.freeserve.co.uk Mon Nov 17 18:18:12 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 17 Nov 2008 23:18:12 +0000 Subject: [Biopython-dev] test_GASelection hangs In-Reply-To: <4921F4EF.4030005@gmail.com> References: <4921B959.2080706@gmail.com> <320fb6e00811171050v541106d8n371d92f9b7f6c595@mail.gmail.com> <4921CE2A.3090606@gmail.com> <320fb6e00811171349j3bb2757epa7e52e5e55ac0c95@mail.gmail.com> <4921F4EF.4030005@gmail.com> Message-ID: <320fb6e00811171518p78a3c25cq527c2ef338692ad2@mail.gmail.com> > This needs to be: > if org_1.fitness > 0 : break Yeah. I've checked in a fix based on this approach. Could you try test_GASelection.py revision 1.3, just to make sure I've not done something silly? > Also, when looping the test, I occasionally get > Test not getting an organism already in the new population. ... FAIL > Test basic selection on a small population. ...
ok > > ====================================================================== > FAIL: Test not getting an organism already in the new population. > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "test_GASelection.py", line 130, in t_no_retrive_organism > assert new_org != org, "Got organism already in the new population." > AssertionError: Got organism already in the new population. Confirmed - when I was just looking for the hanging sub-test, I didn't spot this. From my reading of the GA code there is no guarantee that DiversitySelection will return a completely new organism. If it has to generate one at random, there is a small chance it will match something already in the population. i.e. the test itself is flawed. We could try this say 10 times, but even then the test could fail. I've fixed this in test_GASelection.py revision 1.4 by simply commenting out the assert in DiversitySelectionTest.t_no_retrive_organism. However, maybe the underlying Bio.GA.Selection.Diversity code could be altered instead to guarantee this possibly desirable behaviour? Peter From bugzilla-daemon at portal.open-bio.org Tue Nov 18 06:13:31 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 18 Nov 2008 06:13:31 -0500 Subject: [Biopython-dev] [Bug 2670] New: Populate seqfeature.display_name Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2670 Summary: Populate seqfeature.display_name Product: Biopython Version: Not Applicable Platform: All OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: BioSQL AssignedTo: biopython-dev at biopython.org ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk The seqfeature table has a display_name text field, currently left blank by Biopython's loader, but is populated by BioPerl.
This field is used in GBrowse for example: http://gmod.org/wiki/GBrowse We could use the protein_id, locus_tag, etc depending on what annotation is available (ideally use the same as BioPerl). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Nov 18 10:06:06 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 18 Nov 2008 10:06:06 -0500 Subject: [Biopython-dev] [Bug 2671] New: Including GenomeDiagram in the main Biopython distribution Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2671 Summary: Including GenomeDiagram in the main Biopython distribution Product: Biopython Version: Not Applicable Platform: All OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: lpritc at scri.sari.ac.uk Thanks largely to the efforts of Robert Cadena, we have modified GenomeDiagram so that it plays nicely with the current CVS of Biopython and would like to propose its inclusion as part of the main distribution. GenomeDiagram is described in a Bioinformatics publication (http://dx.doi.org/10.1093/bioinformatics/btk021), and is useful for construction of circular and linear images of biological sequence data, with a specific domain of visualisation of large-scale genomic, comparative genomic and other data with reference to a single chromosome or other biological sequence as publication-quality vector graphics. It's based on the Reportlab backend, and can be used to produce rastered and streamed image output, too. 
The major changes that have been made to the version previously available at http://bioinf.scri.ac.uk/lp are: Class names have been changed and no longer have the GD prefix References to 'colour' have been changed to 'color', but both spellings are still permitted in function calls, for backwards-compatibility The default font has been changed to 'Vera', which is shipped with Reportlab, to avoid some problems with unavailable fonts Code for wx widgets has been removed, although the Observer/Observable code remains, allowing user widgets to hook into the code, if that's desirable. Some test code is included, testing colour translation and the ability to produce PDF output in circular and linear diagram formats. Other minor changes to reduce deprecation warnings (those in Reportlab proper remain, however), and to remove code that caused font issues. There are known issues, still. Writing to a raster format, such as PNG, uses Reportlab's renderPM code, which defaults to using fonts that are not installed by Reportlab itself, anymore. This is a Reportlab issue and doesn't affect production of PDF output, so testing currently only checks the ability to generate PDF output. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
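The report above says both 'colour' and 'color' remain accepted in function calls, but does not show how. One way such backwards compatibility could be done is sketched below in Python 3. The function resolve_color and its default value are hypothetical and are not taken from GenomeDiagram.

```python
def resolve_color(color=None, colour=None, default="black"):
    """Return the value given under either spelling of the keyword.

    Hypothetical helper: a drawing function could accept both keyword
    spellings and pass them through this function before use.
    """
    if color is not None and colour is not None and color != colour:
        # Both spellings given with different values: refuse to guess.
        raise ValueError("'color' and 'colour' were given different values")
    if color is not None:
        return color
    if colour is not None:
        return colour
    return default
```

With this helper, resolve_color(colour="red") and resolve_color(color="red") both return "red", so older calling code would keep working after the rename.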
From bugzilla-daemon at portal.open-bio.org Tue Nov 18 10:12:32 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 18 Nov 2008 10:12:32 -0500 Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main Biopython distribution In-Reply-To: Message-ID: <200811181512.mAIFCWJY023516@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2671 ------- Comment #1 from lpritc at scri.sari.ac.uk 2008-11-18 10:12 EST ------- Created an attachment (id=1063) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1063&action=view) GenomeDiagram code, ready to drop into Biopython CVS Contains GenomeDiagram code under Bio.Graphics.GenomeDiagram, and test code with examples. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Nov 18 10:44:29 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 18 Nov 2008 10:44:29 -0500 Subject: [Biopython-dev] [Bug 2672] New: test_lowess and test_docstrings fail to check if numpy is installed Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2672 Summary: test_lowess and test_docstrings fail to check if numpy is installed Product: Biopython Version: Not Applicable Platform: PC OS/Version: Linux Status: NEW Severity: minor Priority: P5 Component: Unit Tests AssignedTo: biopython-dev at biopython.org ReportedBy: bsouthey at gmail.com I used the cvs version with a version Python 2.5 that does not have numpy installed. Both test_lowess and test_docstring need to have checks for the presence of Numpy like other tests that require NumPy. These tests should also be skipped with messages like: test_kNN ... skipping. Install NumPy if you want to use Bio.kNN. 
====================================================================== ERROR: test_docstrings ---------------------------------------------------------------------- Traceback (most recent call last): File "run_tests.py", line 125, in runTest self.runSafeTest() File "run_tests.py", line 138, in runSafeTest cur_test = __import__(self.test_name) File "test_docstrings.py", line 18, in import Bio.Statistics.lowess File "/home/bsouthey/python/biopython_cvs/biopython/build/lib.linux-x86_64-2.5/Bio/Statistics/lowess.py", line 23, in import numpy ImportError: No module named numpy ====================================================================== ERROR: test_lowess ---------------------------------------------------------------------- Traceback (most recent call last): File "run_tests.py", line 125, in runTest self.runSafeTest() File "run_tests.py", line 138, in runSafeTest cur_test = __import__(self.test_name) File "test_lowess.py", line 1, in from Bio.Statistics.lowess import lowess File "/home/bsouthey/python/biopython_cvs/biopython/build/lib.linux-x86_64-2.5/Bio/Statistics/lowess.py", line 23, in import numpy ImportError: No module named numpy -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
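A minimal Python 3 sketch of the kind of check requested in the report above: try the import first and turn a failure into a skip with a readable message, rather than letting the ImportError surface as a test error. It uses the standard library's unittest.SkipTest as a stand-in; the mechanism that Biopython's own run_tests.py uses is not shown in this thread, and require_module is a hypothetical helper name.

```python
import importlib
import unittest


def require_module(name, message):
    """Import and return the named module, or skip the test with message."""
    try:
        return importlib.import_module(name)
    except ImportError:
        # Reported by unittest as a skip rather than as an error.
        raise unittest.SkipTest(message) from None
```

A test module could call require_module("numpy", "Install NumPy if you want to use Bio.Statistics.lowess.") before importing the code under test; the message text here is only an example modelled on the test_kNN message quoted above.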
From bugzilla-daemon at portal.open-bio.org Tue Nov 18 10:56:01 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 18 Nov 2008 10:56:01 -0500 Subject: [Biopython-dev] [Bug 2672] test_lowess and test_docstrings fail to check if numpy is installed In-Reply-To: Message-ID: <200811181556.mAIFu1o1026838@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2672 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-18 10:56 EST ------- I've fixed test_lowess.py with CVS revision 1.2 to check for numpy as in Bug 2534 For test_docstring.py, I think we could split this in two: test_docstring.py - no numpy dependence test_docstring_numpy.py - for modules which need numpy Or, have some code within test_docstring.py to adjust the list of tests according to if numpy is installed or not. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Tue Nov 18 11:05:29 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 18 Nov 2008 11:05:29 -0500 Subject: [Biopython-dev] [Bug 2672] test_lowess and test_docstrings fail to check if numpy is installed In-Reply-To: Message-ID: <200811181605.mAIG5TjK027987@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2672 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-18 11:05 EST ------- (In reply to comment #1) > For test_docstring.py, I think we could split this in two: > > test_docstring.py - no numpy dependence > test_docstring_numpy.py - for modules which need numpy > > Or, have some code within test_docstring.py to adjust the list of tests > according to if numpy is installed or not. I've gone for the second approach, see test_docstring.py CVS revision 1.6 Marking as fixed. Thanks Bruce :) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
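The second approach mentioned above (adjusting the list of tested modules according to whether numpy is installed) could look roughly like the Python 3 sketch below. This is not the code from test_docstring.py revision 1.6; modules_to_doctest and its parameters are hypothetical names.

```python
import importlib.util


def modules_to_doctest(always, needs_dependency, dependency="numpy"):
    """Return the module names to test, given which dependency is available.

    always           -- names tested unconditionally
    needs_dependency -- names only tested when the dependency can be imported
    """
    modules = list(always)
    # find_spec returns None when a top-level module cannot be found.
    if importlib.util.find_spec(dependency) is not None:
        modules.extend(needs_dependency)
    return modules
```

For example, modules_to_doctest(["Bio.Seq"], ["Bio.Statistics.lowess"]) would include Bio.Statistics.lowess only on a machine where numpy is importable.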
From bugzilla-daemon at portal.open-bio.org Tue Nov 18 11:08:54 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 18 Nov 2008 11:08:54 -0500 Subject: [Biopython-dev] [Bug 2607] Gcc "differ in signedness" warning with cstringfnsmodule.c In-Reply-To: Message-ID: <200811181608.mAIG8ss2028159@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2607 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |WONTFIX ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-18 11:08 EST ------- Since this bug was filed, we've declared this module obsolete for Biopython 1.49, and assuming we press ahead and deprecate it in Biopython 1.50 then I don't see any point in fixing this compiler warning. Marking as "won't fix". -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Nov 18 13:35:25 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 18 Nov 2008 13:35:25 -0500 Subject: [Biopython-dev] [Bug 2609] Gcc 4.3.2 'initialization from incompatible pointer type' warning with triemodule.c In-Reply-To: Message-ID: <200811181835.mAIIZPgc004892@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2609 ------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-18 13:35 EST ------- (In reply to comment #6) > Still the same warning for Python 2.5 and 2.6: > > Bio/triemodule.c: In function '_write_value_to_handle': > Bio/triemodule.c:498: warning: passing argument 3 of > 'PyString_AsStringAndSize'
from incompatible pointer type It looks like PyString_AsStringAndSize expects a Py_ssize_t length (a type available from Python 2.5 onwards), and not just an int length. Suggested patch: Index: triemodule.c =================================================================== RCS file: /home/repository/biopython/biopython/Bio/triemodule.c,v retrieving revision 1.7 diff -r1.7 triemodule.c 486a487,489 > #if PY_VERSION_HEX >= 0x02050000 > Py_ssize_t length; > #else 487a491 > #endif i.e. in function _write_value_to_handle, at line 486 replace this: int length; with this: #if PY_VERSION_HEX >= 0x02050000 Py_ssize_t length; #else int length; #endif This still compiles for me on Python 2.5.2 with gcc 4.0.1 on a Mac. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Nov 18 21:11:34 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 18 Nov 2008 21:11:34 -0500 Subject: [Biopython-dev] [Bug 2609] Gcc 4.3.2 'initialization from incompatible pointer type' warning with triemodule.c In-Reply-To: Message-ID: <200811190211.mAJ2BYpO031573@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2609 ------- Comment #8 from mdehoon at ims.u-tokyo.ac.jp 2008-11-18 21:11 EST ------- I've uploaded a slightly different version to CVS (there were more Py_ssize_t / int issues). Could you try that one? Bio/triemodule.c, revision 1.8. We should also see if the unit test still passes on 64 bit platforms. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Tue Nov 18 22:08:43 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 18 Nov 2008 22:08:43 -0500 Subject: [Biopython-dev] [Bug 2609] Gcc 4.3.2 'initialization from incompatible pointer type' warning with triemodule.c In-Reply-To: Message-ID: <200811190308.mAJ38hkI003686@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2609 ------- Comment #9 from bsouthey at gmail.com 2008-11-18 22:08 EST ------- I quickly build the cvs version and the associated tests passed with the various Python versions 2.3, 2.4, 2.5 (with and without numpy) and 2.6 on my system. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Nov 19 03:45:52 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 19 Nov 2008 03:45:52 -0500 Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main Biopython distribution In-Reply-To: Message-ID: <200811190845.mAJ8jqv4023408@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2671 ------- Comment #2 from lpritc at scri.sari.ac.uk 2008-11-19 03:45 EST ------- The copyright/credit section at the top of each file still needs to be changed. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Wed Nov 19 05:14:57 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 19 Nov 2008 05:14:57 -0500 Subject: [Biopython-dev] [Bug 2609] Gcc 4.3.2 'initialization from incompatible pointer type' warning with triemodule.c In-Reply-To: Message-ID: <200811191014.mAJAEv6m032436@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2609 ------- Comment #10 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-19 05:14 EST ------- (In reply to comment #8) > I've uploaded a slightly different version to CVS (there were more Py_ssize_t > / int issues). Could you try that one? Bio/triemodule.c, revision 1.8. We > should also see if the unit test still passes on 64 bit platforms. > CVS version compiles triemodule with no warnings using Python 2.5.2 with gcc 4.0.1 on a Mac. Unit tests pass. CVS version compiles triemodule with no warnings using Python 2.5 with gcc 4.1.2 on Linux (i686 so 32 bit). Unit tests pass. CVS version compiles triemodule with no warnings using Python 2.4.3 with gcc 3.4.6 on Linux (x86_64 so 64 bit). Unit tests pass. It sounds like Bruce has checked all python versions with gcc 4.3.2 on Linux. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Wed Nov 19 07:17:23 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 19 Nov 2008 07:17:23 -0500 Subject: [Biopython-dev] [Bug 2609] Gcc 4.3.2 'initialization from incompatible pointer type' warning with triemodule.c In-Reply-To: Message-ID: <200811191217.mAJCHN21008817@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2609 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #11 from mdehoon at ims.u-tokyo.ac.jp 2008-11-19 07:17 EST ------- I tried several Windows versions and a 64 bit unix platform. Everything seems to be OK. Closing this bug. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Nov 19 09:38:33 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 19 Nov 2008 09:38:33 -0500 Subject: [Biopython-dev] [Bug 2674] New: test_kNN: Removal of from numpy import * Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2674 Summary: test_kNN: Removal of from numpy import * Product: Biopython Version: Not Applicable Platform: PC OS/Version: Linux Status: NEW Severity: enhancement Priority: P2 Component: Unit Tests AssignedTo: biopython-dev at biopython.org ReportedBy: bsouthey at gmail.com This test contains a import numpy statement to check numpy is available. Therefore it is sufficient just to say 'import numpy'. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Wed Nov 19 09:39:52 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 19 Nov 2008 09:39:52 -0500 Subject: [Biopython-dev] [Bug 2674] test_kNN: Removal of from numpy import * In-Reply-To: Message-ID: <200811191439.mAJEdqkH019174@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2674 ------- Comment #1 from bsouthey at gmail.com 2008-11-19 09:39 EST ------- Created an attachment (id=1064) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1064&action=view) patch to change import numpy statement Just for completeness. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Nov 19 09:42:27 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 19 Nov 2008 09:42:27 -0500 Subject: [Biopython-dev] [Bug 2675] New: Use import numpy in kNN Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2675 Summary: Use import numpy in kNN Product: Biopython Version: Not Applicable Platform: PC OS/Version: Linux Status: NEW Severity: enhancement Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: bsouthey at gmail.com Replacing the 'from numpy import *' statement with import numpy. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Wed Nov 19 09:43:12 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 19 Nov 2008 09:43:12 -0500 Subject: [Biopython-dev] [Bug 2675] Use import numpy in kNN In-Reply-To: Message-ID: <200811191443.mAJEhCXu019472@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2675 ------- Comment #1 from bsouthey at gmail.com 2008-11-19 09:43 EST ------- Created an attachment (id=1065) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1065&action=view) patch to change import numpy statement Changes the way numpy is imported. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Nov 19 09:53:31 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 19 Nov 2008 09:53:31 -0500 Subject: [Biopython-dev] [Bug 2676] New: LogisticRegression: changed the way numpy is imported Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2676 Summary: LogisticRegression: changed the way numpy is imported Product: Biopython Version: Not Applicable Platform: PC OS/Version: Linux Status: NEW Severity: enhancement Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: bsouthey at gmail.com A patch to remove the usage of 'from numpy import *'. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
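Editorial illustration for bugs 2674, 2675 and 2676: one common reason to replace 'from numpy import *' with 'import numpy' is that a star import can silently rebind builtin names in the importing module. The sketch below is only an assumption about the motivation, not something stated in the reports. The helper name shadowed_builtins is made up, the code is written for Python 3 rather than the Python 2 of the thread, and 'os' is used as a stand-in because it ships with Python.

```python
import builtins
import importlib


def shadowed_builtins(module_name):
    """List the builtin names that 'from <module_name> import *' would rebind."""
    module = importlib.import_module(module_name)
    # A star import copies the names listed in __all__, or every public
    # name when the module does not define __all__.
    exported = getattr(module, "__all__", None)
    if exported is None:
        exported = [name for name in dir(module) if not name.startswith("_")]
    return sorted(set(exported) & set(dir(builtins)))


# 'os' is a stand-in for illustration: 'from os import *' replaces the
# builtin open() with os.open(), so 'open' appears in the result.
print(shadowed_builtins("os"))
```

If numpy is installed, calling shadowed_builtins("numpy") shows which builtins a star import of that particular numpy version would replace. With 'import numpy' nothing is rebound, because every numpy name has to be written as numpy.<name>.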
From bugzilla-daemon at portal.open-bio.org Wed Nov 19 09:54:10 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 19 Nov 2008 09:54:10 -0500 Subject: [Biopython-dev] [Bug 2676] LogisticRegression: changed the way numpy is imported In-Reply-To: Message-ID: <200811191454.mAJEsAeg020318@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2676 ------- Comment #1 from bsouthey at gmail.com 2008-11-19 09:54 EST ------- Created an attachment (id=1066) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1066&action=view) patch to change import numpy statement -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Nov 19 10:04:39 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 19 Nov 2008 10:04:39 -0500 Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main Biopython distribution In-Reply-To: Message-ID: <200811191504.mAJF4diO021040@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2671 chapmanb at 50mail.com changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |biopython-dev at biopython.org AssignedTo|biopython-dev at biopython.org |chapmanb at 50mail.com ------- Comment #3 from chapmanb at 50mail.com 2008-11-19 10:04 EST ------- Leighton; This is great; thanks for getting it together. 
I took a look at this last night and have a couple of quick comments: - on the licensing front, the current GPL is not compatible with the Biopython license; it would be nice to have you explicitly say you are okay with re-licensing this version under the Biopython license (http://www.biopython.org/DIST/LICENSE) - Would it be possible to update the GenomeDiagram documentation from here (http://bioinf.scri.ac.uk/lp/downloads/programs/genomediagram/userguide.pdf) to reflect the new namespace and class name changes? Mentioning some of the gotchas you have below, possibly to replace the installation section, would also be nice. I would like Peter and anyone else interested to weigh in, but I can work on getting this in after the next release. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is. You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Nov 19 10:13:46 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 19 Nov 2008 10:13:46 -0500 Subject: [Biopython-dev] [Bug 2674] test_kNN: Removal of from numpy import * In-Reply-To: Message-ID: <200811191513.mAJFDkuO021701@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2674 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-19 10:13 EST ------- Fixed. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Wed Nov 19 10:17:28 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 19 Nov 2008 10:17:28 -0500 Subject: [Biopython-dev] [Bug 2675] Use import numpy in kNN In-Reply-To: Message-ID: <200811191517.mAJFHSID022021@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2675 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-19 10:17 EST ------- Fixed in CVS, Thanks. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Nov 19 10:21:41 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 19 Nov 2008 10:21:41 -0500 Subject: [Biopython-dev] [Bug 2676] LogisticRegression: changed the way numpy is imported In-Reply-To: Message-ID: <200811191521.mAJFLf8a022292@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2676 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-19 10:21 EST ------- Fixed in CVS, thanks! -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Wed Nov 19 10:29:25 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 19 Nov 2008 10:29:25 -0500 Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main Biopython distribution In-Reply-To: Message-ID: <200811191529.mAJFTPhW022858@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2671 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1062 is|0 |1 obsolete| | ------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-19 10:29 EST ------- (From update of attachment 1062) This attachment seems to have been removed (or failed to upload?). See attachment 1063 instead. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is. From bugzilla-daemon at portal.open-bio.org Wed Nov 19 10:29:50 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 19 Nov 2008 10:29:50 -0500 Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main Biopython distribution In-Reply-To: Message-ID: <200811191529.mAJFTon7022928@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2671 ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-19 10:29 EST ------- (In reply to comment #3) > > I would like Peter and anyone one else interested to weigh in, but > I can work on getting this in after the next release. > I'm all for adding GenomeDiagram to Biopython (as stated on the mailing list). 
I haven't actually looked at this revised code base yet - but as I've used GD before and know Leighton "in real life" it might be easier for me to shepherd this into CVS - but the more eyes the better ;) We might also consider getting Leighton CVS access (provisionally use with this module only). Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is. From bugzilla-daemon at portal.open-bio.org Wed Nov 19 11:07:24 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 19 Nov 2008 11:07:24 -0500 Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main Biopython distribution In-Reply-To: Message-ID: <200811191607.mAJG7OcJ025581@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2671 ------- Comment #6 from lpritc at scri.sari.ac.uk 2008-11-19 11:07 EST ------- (In reply to comment #5) > Leighton; > This is great; thanks for getting it together. I took a look at this last night > and have a couple of quick comments: No problem. Robert Cadena deserves the bulk of the credit - he made most of the changes. > - on the licensing front, the current GPL is not compatible with the Biopython > license; it would be nice to have you explicitly say you are okay with > re-licensing this version under the Biopython license > (http://www.biopython.org/DIST/LICENSE) I am perfectly happy with re-licensing the GD code under the Biopython license. If you need a gpg-signed document to say so, I can provide one ;) > - Would it be possible to update the GenomeDiagram documentation from here > (http://bioinf.scri.ac.uk/lp/downloads/programs/genomediagram/userguide.pdf) to > reflect the new namespace and class name changes? Yep - I'll do that, next. > Mentioning some of the > gotchas you have below, possibly to replace the installation section, would > also be nice. 
Definitely. Most of the gotchas are Reportlab-related, but they definitely have a place under Installation in the docs. > I would like Peter and anyone one else interested to weigh in, but I can work > on getting this in after the next release. The more, the merrier... it's not my little baby anymore it's out in the big world ;) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is. From bugzilla-daemon at portal.open-bio.org Wed Nov 19 16:49:48 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 19 Nov 2008 16:49:48 -0500 Subject: [Biopython-dev] [Bug 2677] New: BioSQL seqfeature enhancements Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2677 Summary: BioSQL seqfeature enhancements Product: Biopython Version: Not Applicable Platform: PC OS/Version: Linux Status: NEW Severity: enhancement Priority: P2 Component: BioSQL AssignedTo: biopython-dev at biopython.org ReportedBy: cymon.cox at gmail.com Cleaned-up (sub-)seqFeature locations, and strand. Added location_operator storage and test. Added remote location storage for sub-features, and test. Ive used the "Sequence Keys" ontology for the location operator and stored loc op in the location_qualifier_value table - not sure this is right... Patches attached. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Wed Nov 19 16:51:53 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 19 Nov 2008 16:51:53 -0500 Subject: [Biopython-dev] [Bug 2677] BioSQL seqfeature enhancements In-Reply-To: Message-ID: <200811192151.mAJLprRP024242@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2677 ------- Comment #1 from cymon.cox at gmail.com 2008-11-19 16:51 EST ------- Created an attachment (id=1072) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1072&action=view) Patch for BioSQL/BioSeq.py and Loader.py -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Nov 19 16:52:46 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 19 Nov 2008 16:52:46 -0500 Subject: [Biopython-dev] [Bug 2677] BioSQL seqfeature enhancements In-Reply-To: Message-ID: <200811192152.mAJLqk91024384@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2677 ------- Comment #2 from cymon.cox at gmail.com 2008-11-19 16:52 EST ------- Created an attachment (id=1073) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1073&action=view) Patch for BioSQL test cases -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Thu Nov 20 05:17:17 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 20 Nov 2008 05:17:17 -0500 Subject: [Biopython-dev] [Bug 2677] BioSQL seqfeature enhancements In-Reply-To: Message-ID: <200811201017.mAKAHHA8027467@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2677 ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-20 05:17 EST ------- (In reply to comment #0) > Cleaned-up (sub-)seqFeature locations, and strand. Added location_operator > storage and test. Added remote location storage for sub-features, and test. > Excellent - I see you've removed the naive min/max to find the parent feature's location when dealing with sub-features. This should fix the special case where a feature spans the origin on a circular genome. That should take care of many of my "TODO" entries in test_BioSQL_SeqIO.py :) > > Ive used the "Sequence Keys" ontology for the location operator and stored > loc op in the location_qualifier_value table - not sure this is right... > I'm not sure off hand either, but would like us to check before committing this. In the short term, what ever BioPerl does is "right" as I'm treating that as the BioSQL reference implementation. > > Patches attached. > I've scanned over them quickly, and they look fine. The comments do help :) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
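Editorial illustration of the min/max remark in comment #3: taking the smallest start and the largest end of the sub-features goes wrong when a feature crosses the origin of a circular genome. The sketch below uses made-up coordinates and hypothetical helper names; it is not the BioSQL loader code, only a small demonstration of the problem, written for Python 3.

```python
def naive_span(parts):
    """Parent span as (min start, max end) over all sub-feature locations."""
    return min(start for start, _ in parts), max(end for _, end in parts)


def covered_length(parts):
    """Number of bases actually covered, using 1-based inclusive coordinates."""
    return sum(end - start + 1 for start, end in parts)


# Hypothetical feature on a 6000 bp circular genome, written in GenBank
# style as join(5900..6000,1..100): it crosses the origin.
genome_length = 6000
parts = [(5900, 6000), (1, 100)]

start, end = naive_span(parts)
print("naive span:", (start, end))                                  # (1, 6000)
print("spans whole genome:", end - start + 1 == genome_length)      # True
print("bases actually covered:", covered_length(parts))             # 201
```

The naive parent location claims the entire genome although the feature covers only 201 bases, which is why storing the sub-feature locations themselves (plus the location operator) is the safer record of what the feature occupies.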
From bugzilla-daemon at portal.open-bio.org Thu Nov 20 05:53:19 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 20 Nov 2008 05:53:19 -0500 Subject: [Biopython-dev] [Bug 2662] Typo in tutorial "Chapter 3 Sequence objects " In-Reply-To: Message-ID: <200811201053.mAKArJsp029436@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2662 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |INVALID ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-20 05:53 EST ------- Unless anyone else wants to weigh in on Josh's side, I'm not going to change this. Closing bug - but thanks for reporting it anyway Josh. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Thu Nov 20 05:55:57 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 20 Nov 2008 10:55:57 +0000 Subject: [Biopython-dev] Biopython 1.49 beta released In-Reply-To: <320fb6e00811140427u50b3d42bn9514a8352d936960@mail.gmail.com> References: <320fb6e00811090716v58637d55o470246df4175464e@mail.gmail.com> <320fb6e00811140427u50b3d42bn9514a8352d936960@mail.gmail.com> Message-ID: <320fb6e00811200255x5325a7d4kf4d118350a9e7e65@mail.gmail.com> OK, Progress since Biopython 1.49 beta was released: > We've had a few Numeric -> NumPy bugs reported, > > http://bugzilla.open-bio.org/show_bug.cgi?id=2658 > Bug 2658 - Bio.PDB.Neighborsearch Fixed. > http://bugzilla.open-bio.org/show_bug.cgi?id=2649 > Bug 2649 - Bio.KDTree (probably fixed) No confirmation from the original reporter, but looks OK. 
> I don't think we should release Biopython 1.49 final until these are > resolved - but if there was interest I could put out a second beta. No-one seems to want a second beta, which saves me some time :) There have been a few other bugs reported and fixed in the meantime, right now the only thing I think holding up the release of Biopython 1.49 is: http://bugzilla.open-bio.org/show_bug.cgi?id=2677 Bug 2677 - BioSQL seqfeature enhancements Is there anything else? Peter From bugzilla-daemon at portal.open-bio.org Thu Nov 20 09:19:39 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 20 Nov 2008 09:19:39 -0500 Subject: [Biopython-dev] [Bug 2662] Typo in tutorial "Chapter 3 Sequence objects " In-Reply-To: Message-ID: <200811201419.mAKEJcW6011296@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2662 ------- Comment #3 from mdehoon at ims.u-tokyo.ac.jp 2008-11-20 09:19 EST ------- I am not a native English speaker, but I do agree with Josh that the original phrase "... different set of methods TO a plain python string" sounds strange to me. I would suggest something along the lines of "the set of methods of a Seq object are slightly different from those of a plain python string." But again, that may be Double Dutch. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Thu Nov 20 09:34:25 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 20 Nov 2008 09:34:25 -0500 Subject: [Biopython-dev] [Bug 2662] Typo in tutorial "Chapter 3 Sequence objects " In-Reply-To: Message-ID: <200811201434.mAKEYPOh015951@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2662 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|RESOLVED |REOPENED Resolution|INVALID | ------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-20 09:34 EST ------- (In reply to comment #3) > I am not a native English speaker, but I do agree with Josh that the original > phrase "... different set of methods TO a plain python string" sounds strange > to me. As a native English speaker I'm happy with this as is, but concede international usage may vary - and I do want the Tutorial to be as accessible as possible. > I would suggest something along the lines of "the set of methods of a > Seq object are slightly different from those of a plain python string." > But again, that may be Double Dutch. I would say a "set of methods" is singular, but the rest of this sentence is plural. How about completely rephrasing: First of all, they have some different methods (for example, Seq objects have reverse_complement() and translate() methods used for nucleotide sequences). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bsouthey at gmail.com Thu Nov 20 10:09:42 2008 From: bsouthey at gmail.com (Bruce Southey) Date: Thu, 20 Nov 2008 09:09:42 -0600 Subject: [Biopython-dev] Biopython 1.49 beta released In-Reply-To: <320fb6e00811200255x5325a7d4kf4d118350a9e7e65@mail.gmail.com> References: <320fb6e00811090716v58637d55o470246df4175464e@mail.gmail.com> <320fb6e00811140427u50b3d42bn9514a8352d936960@mail.gmail.com> <320fb6e00811200255x5325a7d4kf4d118350a9e7e65@mail.gmail.com> Message-ID: <49257DB6.5080902@gmail.com>

Hi, In connection with Peter's email on the forthcoming release, I was wondering what to do about certain modules that do not seem to be used. I started to look at the examples that lack test coverage in case one could do something for the Biopython 1.49 release. But this should not provide any reason to delay the release, and the work may stretch beyond it.

Given the potential long term impact and the spirit of the people who donated the code, I was thinking that the release notes could denote which modules are unsupported and need some usage feedback. In future releases the use of these modules would raise a warning about being unsupported or obsolete. Please note that I am not against any of these modules, except for the requirement to maintain them and develop suitable tests.

The possible modules are those that Peter previously mentioned that had no tests:
Bio.Affy
Bio.AlignAce
Bio.EZRetrieve
Bio.Emboss (everything except the primer parsers)
Bio.Encodings (obsolete?)
Bio.FilteredReader (obsolete?)
Bio.MaxEntropy
Bio.NMR
Bio.NaiveBayes
Bio.NetCatch (obsolete?)

I think that Bio.MaxEntropy and Bio.NaiveBayes are useful and I did provide an example that is included in the code. However, I am not confident enough in these methods to maintain them, mainly due to my lack of knowledge. Similarly for Bio.Affy, I currently work a lot with two-dye systems but not Affy. 
I find that Bio.Affy provides insufficient functionality because it really only reads the intensities and misses other important information in version 3 of the Affy format. I do recognize that it could be a base for Affy stuff that may be useful for users such as the PopGen users that use Affy SNP arrays. Bruce Peter wrote: > OK, > > Progress since Biopython 1.49 beta was released: > > >> We've had a few Numeric -> NumPy bugs reported, >> >> http://bugzilla.open-bio.org/show_bug.cgi?id=2658 >> Bug 2658 - Bio.PDB.Neighborsearch >> > > Fixed. > > >> http://bugzilla.open-bio.org/show_bug.cgi?id=2649 >> Bug 2649 - Bio.KDTree (probably fixed) >> > > No confirmation from the original reporter, but looks OK. > > >> I don't think we should release Biopython 1.49 final until these are >> resolved - but if there was interest I could put out a second beta. >> > > No-one seems to want a second beta, which saves me some time :) > > There have been a few other bugs reported and fixed in the meantime, > right now the only thing I think holding up the release of Biopython > 1.49 is: > > http://bugzilla.open-bio.org/show_bug.cgi?id=2677 > Bug 2677 - BioSQL seqfeature enhancements > > Is there anything else? > > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > > From bsouthey at gmail.com Thu Nov 20 11:26:40 2008 From: bsouthey at gmail.com (Bruce Southey) Date: Thu, 20 Nov 2008 10:26:40 -0600 Subject: [Biopython-dev] Bio.EZRetrieve appears to be obsolete or redunant Message-ID: <49258FC0.10703@gmail.com> Hi, The Bio.EZRetrieve module retrieves a single nucleotide sequence from EZRetrieve website: http://siriusb.umdnj.edu:18080/EZRetrieve/single_r.jsp It requires a human, rat or mouse nucleic GenBank, UniGene, LocusLink, or IMAGE ID. No other genomes are supported. 
Although it appears faster than a Bio.GenBank query, I do not see that this module provides any functionality beyond that already provided by Bio.GenBank and similar. So I think this module is obsolete and redundant.

Notes:
1) Obviously LocusLink has been superseded by Entrez Gene.
2) The documented genome builds date from 2003 (e.g. human BUILD.34 at 11/04/2003), but it is not known whether these have been updated since.
3) The start of the sequence is zero. You can use from_='start' instead, but then you cannot mix it with a numerical ending.
4) The actual website provides additional information including NCBI links (LocusLink and Nucleic) and does base counting.
5) There are other functions provided by the website like multiple retrievals.

The website example is for 'homeobox B6 [Homo sapiens]':

import Bio.EZRetrieve
seq=Bio.EZRetrieve.retrieve_single('BC014651', 1, 20)
print seq

Gives:

>BC014651:HOXB6
ACCACACCTAGGTCGGAGCA

Bruce

From bugzilla-daemon at portal.open-bio.org Thu Nov 20 12:05:22 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 20 Nov 2008 12:05:22 -0500 Subject: [Biopython-dev] [Bug 2678] New: Entrez.esearch does not always retrieve or find DTD files Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2678 Summary: Entrez.esearch does not always retrieve or find DTD files Product: Biopython Version: 1.49b Platform: Macintosh OS/Version: Mac OS Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: lpritc at scri.sari.ac.uk When using Entrez.esearch, I have observed an intermittent failure to recover DTD files. These are not being cached on successful search attempts. It may be worth including them in the distribution. 
Traceback:

/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/Bio/Entrez/Parser.py:279: UserWarning: DTD file xhtml1-strict.dtd not found in Biopython installation; trying to retrieve it from NCBI
  warnings.warn("DTD file %s not found in Biopython installation; trying to retrieve it from NCBI" % filename)
/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/Bio/Entrez/Parser.py:279: UserWarning: DTD file xhtml-lat1.ent not found in Biopython installation; trying to retrieve it from NCBI
  warnings.warn("DTD file %s not found in Biopython installation; trying to retrieve it from NCBI" % filename)
Traceback (most recent call last):
  File "./get_entrez_ests.py", line 158, in <module>
    main()
  File "./get_entrez_ests.py", line 45, in main
    options.verbose)
  File "./get_entrez_ests.py", line 76, in get_entrez_session
    results = Entrez.read(handle)
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/Bio/Entrez/__init__.py", line 286, in read
    record = handler.run(handle)
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/Bio/Entrez/Parser.py", line 95, in run
    self.parser.ParseFile(handle)
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/Bio/Entrez/Parser.py", line 283, in external_entity_ref_handler
    parser.ParseFile(handle)
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/Bio/Entrez/Parser.py", line 280, in external_entity_ref_handler
    handle = urllib.urlopen(systemId)
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib.py", line 87, in urlopen
    return opener.open(url)
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib.py", line 203, in open
    return getattr(self, name)(url)
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib.py", line 461, in open_file
    return self.open_local_file(url)
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib.py", line 475, in open_local_file
    raise IOError(e.errno, e.strerror, e.filename)
IOError: [Errno 2] No such file or directory: 'xhtml-lat1.ent'

-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.

From biopython at maubp.freeserve.co.uk Thu Nov 20 12:06:34 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 20 Nov 2008 17:06:34 +0000 Subject: [Biopython-dev] Bio.EZRetrieve appears to be obsolete or redunant In-Reply-To: <49258FC0.10703@gmail.com> References: <49258FC0.10703@gmail.com> Message-ID: <320fb6e00811200906p4b8ba2b9jca212a39ec8f972c@mail.gmail.com> On Thu, Nov 20, 2008 at 4:26 PM, Bruce Southey wrote: > Hi, > The Bio.EZRetrieve module retrieves a single nucleotide sequence from > EZRetrieve website: > http://siriusb.umdnj.edu:18080/EZRetrieve/single_r.jsp > It requires a human, rat or mouse nucleic GenBank, UniGene, LocusLink, or > IMAGE ID. No other genomes are supported. > > Although it appears faster than a Bio.GenBank query, I do not see that this > module provides any special functionality than that already provided by > Bio.GenBank and similar. So I think this module is obsolete and redundant. Note the online bits of Bio.GenBank are considered obsoleted by Bio.Entrez anyway. Maybe we should actually deprecate these for Biopython 1.49... I would agree in some ways Bio.EZRetrieve module is also obsolete and redundant, see also: http://lists.open-bio.org/pipermail/biopython-dev/2008-March/003503.html Unless anyone wants to defend Bio.EZRetrieve, let's ask on the main list about declaring it obsolete for Biopython 1.49 (documentation change only) and deprecating it in the next release (adding a warning only). 
Peter

From bugzilla-daemon at portal.open-bio.org Thu Nov 20 12:06:37 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 20 Nov 2008 12:06:37 -0500 Subject: [Biopython-dev] [Bug 2678] Entrez.esearch does not always retrieve or find DTD files In-Reply-To: Message-ID: <200811201706.mAKH6b1r006648@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2678 ------- Comment #1 from lpritc at scri.sari.ac.uk 2008-11-20 12:06 EST ------- And this time, more usefully, traceback with problem code:

>>> handle = Entrez.einfo()
>>> record = Entrez.read(handle)
/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/Bio/Entrez/Parser.py:279: UserWarning: DTD file xhtml1-strict.dtd not found in Biopython installation; trying to retrieve it from NCBI
  warnings.warn("DTD file %s not found in Biopython installation; trying to retrieve it from NCBI" % filename)
/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/Bio/Entrez/Parser.py:279: UserWarning: DTD file xhtml-lat1.ent not found in Biopython installation; trying to retrieve it from NCBI
  warnings.warn("DTD file %s not found in Biopython installation; trying to retrieve it from NCBI" % filename)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/Bio/Entrez/__init__.py", line 286, in read
    record = handler.run(handle)
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/Bio/Entrez/Parser.py", line 95, in run
    self.parser.ParseFile(handle)
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/Bio/Entrez/Parser.py", line 283, in external_entity_ref_handler
    parser.ParseFile(handle)
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/Bio/Entrez/Parser.py", line 280, in external_entity_ref_handler
    handle = urllib.urlopen(systemId)
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib.py", line 87, in urlopen
    return opener.open(url)
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib.py", line 203, in open
    return getattr(self, name)(url)
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib.py", line 461, in open_file
    return self.open_local_file(url)
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib.py", line 475, in open_local_file
    raise IOError(e.errno, e.strerror, e.filename)
IOError: [Errno 2] No such file or directory: 'xhtml-lat1.ent'

-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Thu Nov 20 12:07:40 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 20 Nov 2008 12:07:40 -0500 Subject: [Biopython-dev] [Bug 2678] Bio.Entrez module does not always retrieve or find DTD files In-Reply-To: Message-ID: <200811201707.mAKH7ej9006714@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2678 lpritc at scri.sari.ac.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Entrez.esearch does not     |Bio.Entrez module does not
                   |always retrieve or find DTD |always retrieve or find DTD
                   |files                       |files

-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Thu Nov 20 12:14:35 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 20 Nov 2008 12:14:35 -0500 Subject: [Biopython-dev] [Bug 2677] BioSQL seqfeature enhancements In-Reply-To: Message-ID: <200811201714.mAKHEZj4007097@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2677 ------- Comment #4 from cymon.cox at gmail.com 2008-11-20 12:14 EST ------- (In reply to comment #3) > (In reply to comment #0) > > Ive used the "Sequence Keys" ontology for the location operator and stored > > loc op in the location_qualifier_value table - not sure this is right... > > > > I'm not sure off hand either, but would like us to check before committing > this. In the short term, what ever BioPerl does is "right" as I'm treating > that as the BioSQL reference implementation. I don't read Perl - but I grep'ed through the source and only found one ref to the location_qualifier_value, and that was in the docs. So maybe they don't store it there... Sorry I can be of more help, C. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Nov 20 17:01:13 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 20 Nov 2008 17:01:13 -0500 Subject: [Biopython-dev] [Bug 2678] Bio.Entrez module does not always retrieve or find DTD files In-Reply-To: Message-ID: <200811202201.mAKM1Dce030238@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2678 ------- Comment #2 from mdehoon at ims.u-tokyo.ac.jp 2008-11-20 17:01 EST ------- Could you make a list of the missing DTDs? You add the missing ones to Bio/Entrez/DTDs and reinstall Biopython. 
It looks like only xhtml1-strict.dtd and xhtml-lat1.ent are missing, but after adding these to Bio/Entrez/DTDs you may find other missing DTDs. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Nov 21 03:54:00 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 21 Nov 2008 03:54:00 -0500 Subject: [Biopython-dev] [Bug 2678] Bio.Entrez module does not always retrieve or find DTD files In-Reply-To: Message-ID: <200811210854.mAL8s0Dt009861@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2678 ------- Comment #3 from lpritc at scri.sari.ac.uk 2008-11-21 03:53 EST ------- (In reply to comment #2) > Could you make a list of the missing DTDs? You add the missing ones to > Bio/Entrez/DTDs and reinstall Biopython. It looks like only xhtml1-strict.dtd > and xhtml-lat1.ent are missing, but after adding these to Bio/Entrez/DTDs you > may find other missing DTDs. I'll add the DTDs that I noted above, but the problem is intermittent and I haven't seen the issue arise again at all, this morning. If I see anything else give an error, I'll make a note here. This may be something to keep in mind if other, similar errors are reported from future Entrez searches, but if the problem is the result of excessive server load, or timeouts, it may not be reliably repeatable. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
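One way to build the list requested in comment #2 is to compare the DTD names seen in the warnings against the contents of the local DTD directory. The following is only a minimal sketch, assuming Python 3; the helper name `missing_dtds` is hypothetical (it is not part of Biopython), and the directory path would have to be pointed at the installed Bio/Entrez/DTDs folder.

```python
import os


def missing_dtds(dtd_dir, wanted):
    """Return the names in `wanted` that are not present in `dtd_dir`.

    Hypothetical helper for illustration; a missing directory is treated
    as an empty one, so every wanted name is reported.
    """
    present = set(os.listdir(dtd_dir)) if os.path.isdir(dtd_dir) else set()
    return sorted(name for name in wanted if name not in present)


# The two file names reported in the warnings in this bug.
wanted = ["xhtml1-strict.dtd", "xhtml-lat1.ent"]

# Assumed location relative to a source checkout; adjust as needed.
print(missing_dtds(os.path.join("Bio", "Entrez", "DTDs"), wanted))
```

Because the problem is described as intermittent, such a check only shows what is absent locally; it says nothing about why the download from the remote server failed.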
From bugzilla-daemon at portal.open-bio.org Fri Nov 21 05:52:17 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 21 Nov 2008 05:52:17 -0500 Subject: [Biopython-dev] [Bug 2677] BioSQL seqfeature enhancements In-Reply-To: Message-ID: <200811211052.mALAqHel020569@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2677 ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-21 05:52 EST ------- (In reply to comment #4) > (In reply to comment #3) > > (In reply to comment #0) > > > Ive used the "Sequence Keys" ontology for the location operator and stored > > > loc op in the location_qualifier_value table - not sure this is right... > > > > > > > I'm not sure off hand either, but would like us to check before committing > > this. In the short term, what ever BioPerl does is "right" as I'm treating > > that as the BioSQL reference implementation. > > I don't read Perl - but I grep'ed through the source and only found one ref to > the location_qualifier_value, and that was in the docs. So maybe they don't > store it there... > > Sorry I can be of more help, C. > I tried browsing and searching the BioPerl-db source, but couldn't find the answer, so I tried the direct route and used their load_seqdatabase.pl script to import a GenBank file (with at least one join location) and inspected the tables. The answer is that location.term_id is always left as NULL, so there is no ontology to worry about. Doing something sensible with ontologies (e.g. support for existing strict ontologies like SO or SOFA) rather than the current ad-hoc relaxed approach (adding new ontology terms on the fly) taken by BioPerl and Biopython is a possible future enhancement. I'm going to look at modifying you patch to leave location.term_id as NULL, with the aim of committing that today and then doing the Biopython 1.49 release. 
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Nov 21 06:54:18 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 21 Nov 2008 06:54:18 -0500 Subject: [Biopython-dev] [Bug 2677] BioSQL seqfeature enhancements In-Reply-To: Message-ID: <200811211154.mALBsIcR025739@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2677 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1073 is|0 |1 obsolete| | -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Nov 21 06:59:08 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 21 Nov 2008 06:59:08 -0500 Subject: [Biopython-dev] [Bug 2677] BioSQL seqfeature enhancements In-Reply-To: Message-ID: <200811211159.mALBx89Z026099@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2677 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1072 is|0 |1 obsolete| | ------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-21 06:59 EST ------- (From update of attachment 1072) Hi Cymon, I've just checked in something based on your patches: Checking in BioSQL/Loader.py; /home/repository/biopython/biopython/BioSQL/Loader.py,v <-- Loader.py new revision: 1.37; previous revision: 1.36 done Checking in BioSQL/BioSeq.py; /home/repository/biopython/biopython/BioSQL/BioSeq.py,v <-- BioSeq.py new revision: 1.31; 
previous revision: 1.30 done Checking in Tests/test_BioSQL_SeqIO.py; /home/repository/biopython/biopython/Tests/test_BioSQL_SeqIO.py,v <-- test_BioSQL_SeqIO.py new revision: 1.27; previous revision: 1.26 done This should fix the strand, feature db ref in locations, and importantly the start/end with sub-features. I am avoiding the ontology question by leaving location.term_id as NULL (following BioPerl usage). I'd like to do the same with location_qualifier_value.term_id but the schema does not allow NULL here. Interestingly BioPerl does not seem to use this table, so I assume they (like Biopython) have been assuming "join". I think this is still a big improvement, but that the (sub)feature.location_operator issue could wait. We'll need to discuss on the BioSQL mailing list how this should be handled consistently. Leaving this bug open. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Nov 21 07:04:39 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 21 Nov 2008 07:04:39 -0500 Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO In-Reply-To: Message-ID: <200811211204.mALC4dUW026607@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2643 dalloliogm at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1048|application/octet-stream |text/plain mime type| | ------- Comment #23 from dalloliogm at gmail.com 2008-11-21 07:04 EST ------- (From update of attachment 1048) changed mime type -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Fri Nov 21 07:18:35 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 21 Nov 2008 07:18:35 -0500
Subject: [Biopython-dev] [Bug 2662] Typo in tutorial "Chapter 3 Sequence objects "
In-Reply-To:
Message-ID: <200811211218.mALCIZds027946@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2662

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|                            |FIXED

------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-21 07:18 EST -------
Fixed in CVS revision 1.187 of biopython/Doc/Tutorial.tex by completely
rephrasing to avoid the contentious sentence structure. See:
http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Doc/Tutorial.tex?cvsroot=biopython

Now reads:

> There are two important differences between Seq objects and standard
> python strings. First of all, they have different methods. Although
> the Seq object supports many of the same methods as a plain string,
> its translate() method differs by doing biological translation, and
> there are also additional biologically relevant methods like
> reverse_complement(). Secondly, the Seq object has an important
> attribute, alphabet, which is an object describing what the individual
> characters making up the sequence string "mean", and how they should
> be interpreted. For example, is AGTACACTGGT a DNA sequence, or just
> a protein sequence that happens to be rich in Alanines, Glycines,
> Cysteines and Threonines?

Peter

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
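To make the distinction in the reworded tutorial text concrete, here is a plain-Python sketch that deliberately does not use Biopython. It shows that a string's own translate() is only a per-character mapping, which happens to be enough to build a reverse complement, something a plain string has no method for. The function name is made up for illustration, Python 3 is assumed, and only unambiguous DNA letters are handled.

```python
# Per-character complement table for unambiguous DNA letters only.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Complement each base with str.translate(), then reverse the result."""
    return dna.translate(_COMPLEMENT)[::-1]


seq = "AGTACACTGGT"  # the example sequence quoted in the tutorial text
print(reverse_complement(seq))
```

Biological translation (codons to amino acids) is a different operation again, which is why the tutorial stresses that the Seq object's translate() differs from the string method of the same name.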
From biopython at maubp.freeserve.co.uk Fri Nov 21 07:38:07 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 21 Nov 2008 12:38:07 +0000 Subject: [Biopython-dev] CVS freeze for Biopython 1.49 Message-ID: <320fb6e00811210438v272d32afta03497a846716df6@mail.gmail.com> On Nov 20, Peter wrote: > No-one seems to want a second beta, which saves me some time :) > > There have been a few other bugs reported and fixed in the meantime, > right now the only thing I think holding up the release of Biopython > 1.49 is: > > http://bugzilla.open-bio.org/show_bug.cgi?id=2677 > Bug 2677 - BioSQL seqfeature enhancements I've committed most of this bug fix to CVS, I think the remaining issue can wait until after Biopython 1.49 is out. > Is there anything else? If there are no last minute objections, my plan is to do the Biopython 1.49 release this afternoon, hopefully starting after lunch - in about one hour's time. Please **consider CVS frozen from now**. Hopefully I'll have the build done within the next 12 hours, including the Windows installers. Once the release is out, we'll give it a few days just in case there are any issues to force a re-release, and then reopen CVS. Tiago has some more PopGen code waiting, and there is also GenomeDiagram to look forward too (Bug 2671). 
Peter From bugzilla-daemon at portal.open-bio.org Fri Nov 21 09:46:29 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 21 Nov 2008 09:46:29 -0500 Subject: [Biopython-dev] [Bug 2680] New: Bio.AlignAce.Parser.py need to import string Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2680 Summary: Bio.AlignAce.Parser.py need to import string Product: Biopython Version: Not Applicable Platform: PC OS/Version: Linux Status: NEW Severity: trivial Priority: P4 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: bsouthey at gmail.com The file Bio.AlignAce.Parser.py needs to 'import string' because it uses the function 'string.atof()'. Also, please note that string.atof() is a depreciated function (since Python 2.0) but it will not get removed until Python 3. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Nov 21 09:57:47 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 21 Nov 2008 09:57:47 -0500 Subject: [Biopython-dev] [Bug 2680] Bio.AlignAce.Parser.py need to import string In-Reply-To: Message-ID: <200811211457.mALEvlR5009727@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2680 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-21 09:57 EST ------- This used to work via the "from Bio.ParserSupport import *", as up until Biopython 1.48 that imported string. Fixed in Bio/AlignAce/Parser.py revision 1.4 by importing string (this will be included in Biopython 1.49). I'm leaving this bug open as I would rather not use the string module here at all - probably we can just use float() instead of string.atof() but that can wait until after Biopython 1.49 is out. 
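The float() substitution suggested here is mechanical, and the same pattern applies to the other old string-module functions. A short sketch in Python 3 syntax, where those functions no longer exist and therefore appear only in comments:

```python
import string

# Old string-module function        Built-in or method replacement
# string.atof("3.5")             -> float("3.5")
# string.atoi("42")              -> int("42")
# string.join(["a", "b"], "-")   -> "-".join(["a", "b"])
value = float("3.5")
count = int("42")
joined = "-".join(["a", "b"])

# Constants such as string.hexdigits are not deprecated functions, so code
# using them still needs "import string". Note that hexdigits contains both
# lower and upper case letters, so "f" and "F" are found at different indices.
lower_index = string.hexdigits.find("f")
upper_index = string.hexdigits.find("F")
print(value, count, joined, lower_index, upper_index)
```

Modules that only use the functions in the first group can drop the import entirely once converted.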
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bsouthey at gmail.com Fri Nov 21 10:19:22 2008 From: bsouthey at gmail.com (Bruce Southey) Date: Fri, 21 Nov 2008 09:19:22 -0600 Subject: [Biopython-dev] Use of depreciated string functions Message-ID: <4926D17A.8080101@gmail.com> Hi, There are a number of files in Bio that import string. Many of these use depreciated functions (since Version 2) that are now string methods mainly string.atof(), string.atoi() and string.join(). The only real advantage of modifying these is to remove an import statement because these will not be removed until Python 3. Perhaps the one exception is in HotRand.py: hex_digit = string.hexdigits.find( letter ) There are about 23 unique files that I identified via grep and many have more than one usage. While changing these is busy work, please let me know if you would like me to create patches for the next version of Biopython (ie 1.50) or just ignore this. Thanks Bruce From biopython at maubp.freeserve.co.uk Fri Nov 21 10:26:52 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 21 Nov 2008 15:26:52 +0000 Subject: [Biopython-dev] Use of depreciated string functions In-Reply-To: <4926D17A.8080101@gmail.com> References: <4926D17A.8080101@gmail.com> Message-ID: <320fb6e00811210726n94e277ex359d93de0855045e@mail.gmail.com> On Fri, Nov 21, 2008 at 3:19 PM, Bruce Southey wrote: > Hi, > There are a number of files in Bio that import string. Many of these use > depreciated functions (since Version 2) that are now string methods mainly > string.atof(), string.atoi() and string.join(). The only real advantage of > modifying these is to remove an import statement because these will not be > removed until Python 3. 
> > Perhaps the one exception is in HotRand.py: hex_digit = > string.hexdigits.find( letter ) > > There are about 23 unique files that I identified via grep and many have > more than one usage. While changing these is busy work, please let me know > if you would like me to create patches for the next version of Biopython (ie > 1.50) or just ignore this. As you say, there isn't much benefit from doing this other than removing an import and making another small step towards Python 3.0 compatibility. We have gradually been phasing out "import string" already, usually when working on a module which used it. Once I've dealt with Biopython 1.49, I'd be happy to look at a patch to remove more "import string" usage from non-obsolete, non-deprecated code. It would be a little risky doing this to modules without unit tests, but that's another area you've shown some interest in anyway... Thanks, Peter From bartek at rezolwenta.eu.org Fri Nov 21 10:32:02 2008 From: bartek at rezolwenta.eu.org (Bartek Wilczynski) Date: Fri, 21 Nov 2008 16:32:02 +0100 Subject: [Biopython-dev] [Bug 2680] Bio.AlignAce.Parser.py need to import string In-Reply-To: <200811211457.mALEvlR5009727@portal.open-bio.org> References: <200811211457.mALEvlR5009727@portal.open-bio.org> Message-ID: <8b34ec180811210732o4266a87ey2a4c14a7ddc5ead5@mail.gmail.com> Hello, I fixed the bug (changed both uses of string.atof() to float() ), and commited to CVS, although I cannot close it in Bugzilla (my dev.open-bio account does not seem to work for bugzilla). cheers Bartek Wilczynski On Fri, Nov 21, 2008 at 3:57 PM, wrote: > http://bugzilla.open-bio.org/show_bug.cgi?id=2680 > > > > > > ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-21 09:57 EST ------- > This used to work via the "from Bio.ParserSupport import *", as up until > Biopython 1.48 that imported string. > > Fixed in Bio/AlignAce/Parser.py revision 1.4 by importing string (this will be > included in Biopython 1.49). 
> > I'm leaving this bug open as I would rather not use the string module here at > all - probably we can just use float() instead of string.atof() but that can > wait until after Biopython 1.49 is out. > > > -- > Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email > ------- You are receiving this mail because: ------- > You are the assignee for the bug, or are watching the assignee. > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > > -- Bartek Wilczynski ================== Postdoctoral fellow EMBL, Furlong group Meyerhoffstrasse 1, 69012 Heidelberg, Germany tel: +49 6221 387 8433 From bugzilla-daemon at portal.open-bio.org Fri Nov 21 10:41:54 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 21 Nov 2008 10:41:54 -0500 Subject: [Biopython-dev] [Bug 2680] Bio.AlignAce.Parser.py need to import string In-Reply-To: Message-ID: <200811211541.mALFfsDM013508@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2680 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-21 10:41 EST ------- Bartek's email: > Hello, > > I fixed the bug (changed both uses of string.atof() to float() ), > and commited to CVS, although I cannot close it in Bugzilla (my > dev.open-bio account does not seem to work for bugzilla). > > cheers > Bartek Wilczynski Marking this as fixed. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From biopython at maubp.freeserve.co.uk Fri Nov 21 10:45:42 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 21 Nov 2008 15:45:42 +0000 Subject: [Biopython-dev] [Bug 2680] Bio.AlignAce.Parser.py need to import string In-Reply-To: <8b34ec180811210732o4266a87ey2a4c14a7ddc5ead5@mail.gmail.com> References: <200811211457.mALEvlR5009727@portal.open-bio.org> <8b34ec180811210732o4266a87ey2a4c14a7ddc5ead5@mail.gmail.com> Message-ID: <320fb6e00811210745yc8e796ei9bc04a2e2cebda8b@mail.gmail.com> On Fri, Nov 21, 2008 at 3:32 PM, Bartek Wilczynski wrote: > Hello, > > I fixed the bug (changed both uses of string.atof() to float() ), and > commited to CVS, although I cannot close it in Bugzilla (my > dev.open-bio account does not seem to work for bugzilla). > > cheers > Bartek Wilczynski Thanks Bartek, I was partway through the build process for the Biopython 1.49 release, but I've got that latest Bio/AliceAce/Parser.py file now. I've closed Bug 2680 - I'm not sure how the permissions work on Bugzilla exactly... On a related note - could you write a unit test for Bio.AlignAce please? Thanks, Peter From biopython at maubp.freeserve.co.uk Fri Nov 21 11:07:00 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 21 Nov 2008 16:07:00 +0000 Subject: [Biopython-dev] Warnings from epydoc Message-ID: <320fb6e00811210807xed03553x24e3abc571e9f20a@mail.gmail.com> Hi all, Something that I could have mentioned when I built the beta is there are a lot of warnings from epydoc. Ignoring a few from deprecated modules etc, there is a whole class as follows: Warning: Module Bio.KDTree.KDTree is shadowed by a variable with the same name. Warning: Module Bio.PDB.DSSP is shadowed by a variable with the same name. Warning: Module Bio.PDB.FragmentMapper is shadowed by a variable with the same name. Warning: Module Bio.PDB.NeighborSearch is shadowed by a variable with the same name. Warning: Module Bio.PDB.PDBIO is shadowed by a variable with the same name. 
Warning: Module Bio.PDB.PDBList is shadowed by a variable with the same name. Warning: Module Bio.PDB.PDBParser is shadowed by a variable with the same name. Warning: Module Bio.PDB.ResidueDepth is shadowed by a variable with the same name. Warning: Module Bio.PDB.StructureAlignment is shadowed by a variable with the same name. Warning: Module Bio.PDB.Superimposer is shadowed by a variable with the same name. Warning: Module Bio.PDB.Vector is shadowed by a variable with the same name. Warning: Module Bio.PDB.parse_pdb_header is shadowed by a variable with the same name. Warning: Module Bio.SVDSuperimposer.SVDSuperimposer is shadowed by a variable with the same name. Warning: Module Bio.SCOP.Residues is shadowed by a variable with the same name. One visible side effect of this in the epydoc output is these modules get shown with an apostrophe suffix for disambiguation. On another point, I think some of the imports used in Bio.PopGen are making epydoc unhappy: +------------------------------------------------------------------------------------------------- | In /usr/local/lib/python2.5/site-packages/Bio/PopGen/SimCoal/Cache.py: | Import failed (but source code parsing was successful). | Error: ImportError: No module named PopGen.SimCoal.Controller (line 14) | +------------------------------------------------------------------------------------------------- | In /usr/local/lib/python2.5/site-packages/Bio/PopGen/SimCoal/Async.py: | Import failed (but source code parsing was successful). | Error: ImportError: No module named PopGen.SimCoal.Controller (line 16) | Taking Bio/PopGen/SimCoal/Cache.py as an example, currently this has: from PopGen.SimCoal.Controller import SimCoalController from PopGen import Config Perhaps this should be changed to either local imports: from Controller import SimCoalController import Config or full imports: from Bio.PopGen.SimCoal.Controller import SimCoalController from Bio.PopGen import Config (Neither tested yet). 
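On the "shadowed by a variable with the same name" warnings listed earlier in this message: they most likely arise when a package's __init__.py imports a class whose name matches the module that defines it, so the package attribute refers to the class rather than the submodule. The sketch below reproduces the situation with a throwaway package. The names `demo_pkg` and `Widget` are invented for illustration, and Python 3 syntax is assumed, hence the explicit relative import.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway package on disk: demo_pkg/Widget.py defines class Widget,
# and demo_pkg/__init__.py re-exports it under the same name as the module.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_pkg")
os.mkdir(pkg_dir)
with open(os.path.join(pkg_dir, "Widget.py"), "w") as fh:
    fh.write("class Widget(object):\n    pass\n")
with open(os.path.join(pkg_dir, "__init__.py"), "w") as fh:
    fh.write("from .Widget import Widget\n")

sys.path.insert(0, root)
importlib.invalidate_caches()
demo_pkg = importlib.import_module("demo_pkg")

# The attribute on the package is the class, not the submodule...
print(type(demo_pkg.Widget))
# ...while the submodule itself is still reachable through sys.modules.
print(type(sys.modules["demo_pkg.Widget"]))
```

This is harmless at run time, but a documentation tool that walks the package has two different objects competing for the one dotted name, which would explain the apostrophe suffix used for disambiguation.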
I don't know if the current imports have any downsides (apart from upsetting epydoc), as the current code works and the unit tests pass. Peter From bsouthey at gmail.com Fri Nov 21 11:15:29 2008 From: bsouthey at gmail.com (Bruce Southey) Date: Fri, 21 Nov 2008 10:15:29 -0600 Subject: [Biopython-dev] [Bug 2680] Bio.AlignAce.Parser.py need to import string In-Reply-To: <320fb6e00811210745yc8e796ei9bc04a2e2cebda8b@mail.gmail.com> References: <200811211457.mALEvlR5009727@portal.open-bio.org> <8b34ec180811210732o4266a87ey2a4c14a7ddc5ead5@mail.gmail.com> <320fb6e00811210745yc8e796ei9bc04a2e2cebda8b@mail.gmail.com> Message-ID: <4926DEA1.7020405@gmail.com> Peter wrote: > On Fri, Nov 21, 2008 at 3:32 PM, Bartek Wilczynski > wrote: > >> Hello, >> >> I fixed the bug (changed both uses of string.atof() to float() ), and >> commited to CVS, although I cannot close it in Bugzilla (my >> dev.open-bio account does not seem to work for bugzilla). >> >> cheers >> Bartek Wilczynski >> > > Thanks Bartek, > > I was partway through the build process for the Biopython 1.49 > release, but I've got that latest Bio/AliceAce/Parser.py file now. > I've closed Bug 2680 - I'm not sure how the permissions work on > Bugzilla exactly... > > On a related note - could you write a unit test for Bio.AlignAce please? > > Thanks, > > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > > Hi Bartek, I just started on working through understanding the functionality of the code so it would be really great to the tests and a tutorial section on AlignAce. So far I know that there needs to be at least two tests for AlignAce: 1) Running Bio.AlignAce.AlignAceStandalone 2) Parsing the output from AlignAce There needs to be similar tests for CompareAce. Also, could you please add the following lines to your AlignAce2004 code (I downloaded it from your site yesterday) to standard.h? 
#include #include I needed these to compile AlignAce under Linux with gcc version 4.3.2. I would also suggest not to include binaries because they are statically linked to old C++ libraries. Running just './AlignACE' gives the error: ./AlignACE: error while loading shared libraries: libstdc++.so.5: cannot open shared object file: No such file or directory Thanks Bruce From biopython at maubp.freeserve.co.uk Fri Nov 21 11:59:08 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 21 Nov 2008 16:59:08 +0000 Subject: [Biopython-dev] Biopython 1.49 released Message-ID: <320fb6e00811210859n2d128fd6nc21ad1012e1d93bf@mail.gmail.com> Dear Biopythoneers, We are pleased to announce the release of Biopython 1.49. There have been some significant changes since Biopython 1.48 was released a few months ago, which is why we initially released a beta for wider testing. Thank you to all those who tried this and reported the minor problems uncovered. As previously announced, the big news is that Biopython now uses NumPy rather than its precursor Numeric (the original Numerical Python library). As in the previous releases, Biopython 1.49 supports Python 2.3, 2.4 and 2.5 but should now also work fine on Python 2.6. Please note that we intend to drop support for Python 2.3 in a couple of releases time. We also have some new functionality, starting with the basic sequence object (the Seq class) which now has more methods. This encourages a more object orientated coding style, and makes basic biological operations like transcription and translation more accessible and discoverable. Our BioSQL interface can now optionally fetch the NCBI taxonomy on demand when loading sequences (via Bio.Entrez) allowing you to populate the taxon/taxon_name tables gradually. Also, BioSQL should now work with the psycopg2 driver for PostgreSQL (as well as the older psycopg driver), and the handling of feature locations has also been improved. 
We've also updated the Biopython Tutorial and Cookbook (also available in PDF). http://biopython.org/DIST/docs/tutorial/Tutorial.html http://biopython.org/DIST/docs/tutorial/Tutorial.pdf Finally, our old parsing infrastructure (Martel and Bio.Mindy) is now considered to be deprecated, meaning mxTextTools is no longer required to use Biopython. This should not affect any of the typically used parsers (e.g. Bio.SeqIO and Bio.AlignIO). Given there have been more changes than in recent Biopython releases, please do check your old scripts still work fine, and let us know on the mailing list or file a bug if there is anything wrong. Source distributions and Windows installers are available from the Biopython website: http://biopython.org/wiki/Download Thanks! -Peter on behalf of the Biopython developers P.S. You may wish to subscribe to our news feed. For RSS links etc, see: http://biopython.org/wiki/News From biopython at maubp.freeserve.co.uk Fri Nov 21 12:05:46 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 21 Nov 2008 17:05:46 +0000 Subject: [Biopython-dev] CVS freeze for Biopython 1.49 In-Reply-To: <320fb6e00811210438v272d32afta03497a846716df6@mail.gmail.com> References: <320fb6e00811210438v272d32afta03497a846716df6@mail.gmail.com> Message-ID: <320fb6e00811210905i4835819bvb4955b05658ef535@mail.gmail.com> > If there are no last minute objections, my plan is to do the Biopython > 1.49 release this afternoon, hopefully starting after lunch - in about > one hour's time. > > Please **consider CVS frozen from now**. Hopefully I'll have the > build done within the next 12 hours, including the Windows installers. OK, the release is out. Thanks everyone! I haven't sat down and counted, but it feels like there were more people involved and taking an interest than for Biopython 1.48, which is great. > Once the release is out, we'll give it a few days just in case there > are any issues to force a re-release, and then reopen CVS. 
The CVS "freeze" is over, but for the next couple of days, please only commit small bug fixes and documentation improvements. Baring any surprises, we can expect to start looking at adding new code mid next week: > Tiago has some more PopGen code waiting, and there is also > GenomeDiagram to look forward too (Bug 2671). Have a good weekend, Regards, Peter From bugzilla-daemon at portal.open-bio.org Fri Nov 21 12:24:55 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 21 Nov 2008 12:24:55 -0500 Subject: [Biopython-dev] [Bug 2678] Bio.Entrez module does not always retrieve or find DTD files In-Reply-To: Message-ID: <200811211724.mALHOt8x003395@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2678 ------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-21 12:24 EST ------- Looking at the code for the external_entity_ref_handler function in Bio/Entrez/Parser.py is doesn't actually attempt to cache missing DTD files. Would this be a worthwhile enhancement? We would have to cope with the fact that the process may not have permissions to write to the DTD directory, perhaps by falling back on the system temp folder? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Nov 21 14:22:36 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 21 Nov 2008 14:22:36 -0500 Subject: [Biopython-dev] [Bug 2591] GenBank files misparsed for long organism names In-Reply-To: Message-ID: <200811211922.mALJMa8Q011752@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2591 ------- Comment #3 from joelb at lanl.gov 2008-11-21 14:22 EST ------- I never heard back from info at genbank, so I found a different contact there and I just re-sent the problem. 
I'll follow up when I hear something. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Nov 21 14:31:26 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 21 Nov 2008 14:31:26 -0500 Subject: [Biopython-dev] [Bug 2681] New: BioSQL: record annotations enhancements Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2681 Summary: BioSQL: record annotations enhancements Product: Biopython Version: Not Applicable Platform: PC OS/Version: Linux Status: NEW Severity: enhancement Priority: P2 Component: BioSQL AssignedTo: biopython-dev at biopython.org ReportedBy: cymon.cox at gmail.com BioSQL storage and retrieval of record annotations. See also bug 2396. Patch fixes 3 annotations: 1) Fixed date/dates typo. 2) comment's were being stored by not retrieved - fixed with test. 3) A 'reference' annotation, even if an empty list, was being retrieved in a DBSeqRecord. Fixed so that if there are no references there is no annotation in DBSeqRecord. Other annotations: 'date', 'ncbi_taxid', 'gi', and 'contig' are the only annotations we are not handling correctly in the test suite. 'date' can be ignored if present in DBSeqRecord but absent in SeqRecord because the current date is entered into table if a date is not present in the record. Annotation 'ncbi_taxid' will be present in the DBSeqRecords even when not present in the loaded SeqRecord as they are grabbed from the taxon table. We can therefore ignore this specific comparision: old record absent, new record present. Some swiss prot SeqRecords have ncbi_taxid and they are retrieved correctly by DBSeqRecord. TODO: others have ncbi_taxid that is missing from the retrieved DBSeqRecord: sp012, sp014, Swissprot, fasta, and EMBL SeqRecords dont have a gi annotation, retrieved DBSeqRecords do. 
Loader uses the 'record_id' (line 522) as the identifier in bioentry, if the gi annotation is missing, which is pulled as the gi annotation. So the swissprot, fasta, and embl DBSeqRecords return the accession as the gi (GenBank identifier). I think this is misleading; annotation 'gi' in the DBSeqRecord should really be named a more generic 'identifier'... What to do here? 'contig' is ignored by loader because it's a SeqFeature object. Is there any reason it couldnt be loaded and retrieved? (record is GenBank/NT_019265.gb) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Nov 21 14:32:43 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 21 Nov 2008 14:32:43 -0500 Subject: [Biopython-dev] [Bug 2681] BioSQL: record annotations enhancements In-Reply-To: Message-ID: <200811211932.mALJWhXP012653@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2681 ------- Comment #1 from cymon.cox at gmail.com 2008-11-21 14:32 EST ------- Created an attachment (id=1074) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1074&action=view) BioSQL patch for enhancements to record annotations -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Fri Nov 21 17:41:16 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 21 Nov 2008 17:41:16 -0500 Subject: [Biopython-dev] [Bug 2681] BioSQL: record annotations enhancements In-Reply-To: Message-ID: <200811212241.mALMfGT8026797@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2681 ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-21 17:41 EST ------- (In reply to comment #0) > 1) Fixed date/dates typo. Why is it a typo? Change not checked in. > 2) comment's were being stored by not retrieved - fixed with test. Looks good, except for returning an empty list if there were no comments. > 3) A 'reference' annotation, even if an empty list, was being retrieved in a > DBSeqRecord. Fixed so that if there are no references there is no annotation > in DBSeqRecord. I agree, but preferred a smaller change for this: Checking in BioSQL/BioSeq.py; /home/repository/biopython/biopython/BioSQL/BioSeq.py,v <-- BioSeq.py new revision: 1.33; previous revision: 1.32 done Checking in Tests/test_BioSQL_SeqIO.py; /home/repository/biopython/biopython/Tests/test_BioSQL_SeqIO.py,v <-- test_BioSQL_SeqIO.py new revision: 1.29; previous revision: 1.28 done This was based closely on your patch, so thank you! You are making steady progress through the remaining "TODO" notes I left when writing test_BioSQL_SeqIO.py :) > Some swiss prot SeqRecords have ncbi_taxid and they are retrieved > correctly by DBSeqRecord. TODO: others have ncbi_taxid that is missing > from the retrieved DBSeqRecord: sp012, sp014, Note some swiss prot records may be multi-species, which the BioSQL schema can't cope with. Not sure if that applies here. > Swissprot, fasta, and EMBL SeqRecords dont have a gi annotation, retrieved > DBSeqRecords do. Loader uses the 'record_id' (line 522) as the identifier in > bioentry, if the gi annotation is missing, which is pulled as the gi > annotation. 
There probably is something not quite right here. Are you talking about the bioentry.identifier entry in the database? Perhaps an explicit example might help. As an aside, I think "gi" (GeneIndex used by NCBI) might be better stored in the record.dbxrefs, but that could be a parser change... > 'contig' is ignored by loader because it's a SeqFeature object. Is there any > reason it couldnt be loaded and retrieved? (record is GenBank/NT_019265.gb) I couldn't even say off hand how the CONTIG line in that example would be parsed, let alone how it gets dealt with when loading into BioSQL. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Nov 21 17:42:33 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 21 Nov 2008 17:42:33 -0500 Subject: [Biopython-dev] [Bug 2681] BioSQL: record annotations enhancements In-Reply-To: Message-ID: <200811212242.mALMgXAN026914@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2681 ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-21 17:42 EST ------- P.S. For a little background, see Bug 2396. Looking back I can see why I missed the comments annotation at the time (being stored in a different table). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Fri Nov 21 18:47:13 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 21 Nov 2008 18:47:13 -0500 Subject: [Biopython-dev] [Bug 2678] Bio.Entrez module does not always retrieve or find DTD files In-Reply-To: Message-ID: <200811212347.mALNlDsF030565@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2678 ------- Comment #5 from mdehoon at ims.u-tokyo.ac.jp 2008-11-21 18:47 EST ------- (In reply to comment #4) > Looking at the code for the external_entity_ref_handler function in > Bio/Entrez/Parser.py is doesn't actually attempt to cache missing DTD files. > > Would this be a worthwhile enhancement? We would have to cope with the fact > that the process may not have permissions to write to the DTD directory, > perhaps by falling back on the system temp folder? > I think that there is an easier solution, which is to include all missing DTDs with the Biopython installation. The number of DTDs is limited; I tried to identify all of them but apparently I missed some. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Nov 21 18:49:27 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 21 Nov 2008 18:49:27 -0500 Subject: [Biopython-dev] [Bug 2678] Bio.Entrez module does not always retrieve or find DTD files In-Reply-To: Message-ID: <200811212349.mALNnRMn030720@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2678 ------- Comment #6 from mdehoon at ims.u-tokyo.ac.jp 2008-11-21 18:49 EST ------- > I'll add the DTDs that I noted above, but the problem is intermittent and I > haven't seen the issue arise again at all, this morning. If I see anything > else give an error, I'll make a note here. 
> If the DTD is available locally in Bio/Entrez/DTDs, then Bio.Entrez will read it from there. If not, it tries to download it. This may fail if the servers are busy. If the needed DTDs are saved in Bio/Entrez/DTDs (and installed when Biopython is installed), you won't run into this problem. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sun Nov 23 10:16:53 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 23 Nov 2008 10:16:53 -0500 Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main Biopython distribution In-Reply-To: Message-ID: <200811231516.mANFGraa019222@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2671 ------- Comment #7 from dalloliogm at gmail.com 2008-11-23 10:16 EST ------- (In reply to comment #0) > The major changes that have been made to the version previously available at > http://bioinf.scri.ac.uk/lp are: That's a very nice contribution, thank you!!! This link is wrong, I think you mean http://bioinf.scri.ac.uk/lp/programs.php#genomediagram > -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is. 
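Editorial illustration for Bug 2678 (comments #4 to #6 above): the lookup order described there (local DTD directory first, then download) together with the suggested fall-back to the system temp folder could be sketched roughly as follows. This is not the code in Bio/Entrez/Parser.py; `find_dtd`, its parameters, and the `dtd_cache` directory name are hypothetical names chosen for this sketch.

```python
import os
import tempfile
import urllib.request


def find_dtd(filename, url, local_dir, cache_dir=None, fetch=None):
    """Return the DTD contents as bytes, preferring a local copy.

    Hypothetical helper for illustration only; not part of Bio.Entrez.
    """
    if cache_dir is None:
        # Assumed fall-back location when local_dir is not writable.
        cache_dir = os.path.join(tempfile.gettempdir(), "dtd_cache")
    if fetch is None:
        def fetch(address):
            with urllib.request.urlopen(address) as handle:
                return handle.read()

    # 1. Look for an installed or previously cached copy.
    for directory in (local_dir, cache_dir):
        path = os.path.join(directory, filename)
        if os.path.isfile(path):
            with open(path, "rb") as handle:
                return handle.read()

    # 2. Download; this may raise if the server is busy.
    data = fetch(url)

    # 3. Try to save a copy, falling back to the temp folder if the
    #    DTD directory cannot be written to.
    for directory in (local_dir, cache_dir):
        try:
            os.makedirs(directory, exist_ok=True)
            with open(os.path.join(directory, filename), "wb") as handle:
                handle.write(data)
            break
        except OSError:
            continue
    return data
```

Shipping every known DTD with the installation, as proposed in comment #5, makes step 2 unnecessary in the common case; a cache of this kind would only matter for DTDs that were missed.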
From dalloliogm at gmail.com Sun Nov 23 12:33:54 2008 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Sun, 23 Nov 2008 18:33:54 +0100 Subject: [Biopython-dev] blog article on GenomeDiagram in Biopython Message-ID: <5aa3b3570811230933n2de8af3lf31d3c4b962930a3@mail.gmail.com> Hi people, I thought that the inclusion of GenomeDiagrams in biopython is such interesting news that I wrote a blog post on it: - http://bioinfoblog.it/2008/11/genome-diagrams-included-in-biopython-150/ I have used images from some tutorials without asking, I hope it is not a problem. Cheers! :) On Sun, Nov 23, 2008 at 4:16 PM, wrote: > http://bugzilla.open-bio.org/show_bug.cgi?id=2671 > > -- ----------------------------------------------------------- My Blog on Bioinformatics (italian): http://bioinfoblog.it From mjldehoon at yahoo.com Mon Nov 24 01:44:13 2008 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Sun, 23 Nov 2008 22:44:13 -0800 (PST) Subject: [Biopython-dev] Rethinking Biopython's testing framework Message-ID: <871524.42970.qm@web62403.mail.re1.yahoo.com> Hi everybody, Biopython's testing framework is built on top of Python's unit testing framework. Python's unit testing framework makes use of assertion statements to compare the result of a command to the expected result. Biopython uses test scripts that print output to stdout, together with an output file that contains the correct output. After running each test script, it compares the generated output with the correct output to see if the test was successful. This approach can be useful for modules that deal with different file formats. For example, you can read in a file in one format, write it out in a different format, and compare it with the expected result.
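Editorial illustration of the two testing styles just described: a minimal sketch, not taken from Biopython's test suite. `reverse_complement` is a hypothetical stand-in for the code under test, and `EXPECTED_OUTPUT` stands in for the separate expected-output file.

```python
import io
import unittest
from contextlib import redirect_stdout


def reverse_complement(seq):
    """Hypothetical function under test, used only for this illustration."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq))


# Style 1: print-and-compare. The test script only prints; a runner
# captures stdout and compares it line by line with stored output.
def print_style_test():
    print("test_example")
    print(reverse_complement("AACG"))


EXPECTED_OUTPUT = "test_example\nCGTT\n"  # stands in for the output file


def run_print_style():
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        print_style_test()
    return buffer.getvalue().splitlines() == EXPECTED_OUTPUT.splitlines()


# Style 2: unittest. The expected value sits next to the call it checks,
# so no separate output file is needed.
class TestReverseComplement(unittest.TestCase):
    def test_reverse_complement(self):
        self.assertEqual(reverse_complement("AACG"), "CGTT")


def run_unittest_style():
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(TestReverseComplement)
    result = unittest.TestResult()
    suite.run(result)
    return result.wasSuccessful()
```

In the first style a failure shows up as a mismatch between two blocks of text; in the second it points at the specific assertion that failed.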
However, more than half of Biopython's tests do not actually make use of this testing framework: test_BioSQL test_CAPS test_Cluster test_CodonTable test_Compass test_Crystal test_DocSQL test_EmbossPrimer test_Entrez test_Fasta test_GACrossover test_GAMutation test_GAOrganism test_GAQueens test_GARepair test_GASelection test_GFF test_GFF2 test_GraphicsChromosome test_GraphicsDistribution test_GraphicsGeneral test_HMMCasino test_HMMGeneral test_HotRand test_KDTree test_KeyWList test_LogisticRegression test_Medline test_NNExclusiveOr test_NNGene test_NNGeneral test_Pathway test_PopGen_FDist test_PopGen_FDist_nodepend test_PopGen_SimCoal test_PopGen_SimCoal_nodepend test_Registry test_Restriction test_SCOP_Astral test_SCOP_Cla test_SCOP_Des test_SCOP_Dom test_SCOP_Hie test_SCOP_Raf test_SCOP_Residues test_SCOP_Scop test_Wise test_docstrings test_kNN test_lowess test_psw These tests have trivial output, for example test_Cluster: test_Cluster test_clusterdistance (test_Cluster.TestCluster) ... ok test_distancematrix_kmedoids (test_Cluster.TestCluster) ... ok test_kcluster (test_Cluster.TestCluster) ... ok test_matrix_parse (test_Cluster.TestCluster) ... ok test_median_mean (test_Cluster.TestCluster) ... ok test_somcluster (test_Cluster.TestCluster) ... ok test_treecluster (test_Cluster.TestCluster) ... ok ---------------------------------------------------------------------- Ran 7 tests in 0.015s OK I suspect that for many of the remaining tests Biopython's unit testing framework doesn't bring any real advantage, but is used anyway solely because it currently is the standard in Biopython. Personally, I find Python's unit testing framework easier to understand than Biopython's testing framework. It doesn't need a separate output file, and it is easier to match each line of code with the correct behavior. I would therefore like to suggest to move from Biopython's testing framework to Python's testing framework. 
This also relieves us of the task of explaining Biopython's testing framework to contributors, and allows us to make better use of what Python already provides. Comparing output line-by-line, as Biopython's testing framework currently does, can still be used by test scripts that need this functionality. Comments, suggestions, anybody? --Michiel. From dalloliogm at gmail.com Mon Nov 24 04:04:08 2008 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Mon, 24 Nov 2008 10:04:08 +0100 Subject: [Biopython-dev] Rethinking Biopython's testing framework In-Reply-To: <871524.42970.qm@web62403.mail.re1.yahoo.com> References: <871524.42970.qm@web62403.mail.re1.yahoo.com> Message-ID: <5aa3b3570811240104m1442e5dfkd0c0f92c6fa772f9@mail.gmail.com> On Mon, Nov 24, 2008 at 7:44 AM, Michiel de Hoon wrote: > Hi everybody, > > Biopython's testing framework is built on top of Python's unit testing framework. Python's unit testing framework makes use of assertion statements to compare the result of a command to the expected result. Hi, I was also proposing to use the doctest framework for some of the modules, and for enhancing documentation. - http://bugzilla.open-bio.org/show_bug.cgi?id=2640 > Biopython uses test scripts that print output to stdout, together with an output file that contains the > correct output. After running each test script, it compares the generated output with the correct > output to see if the test was successful. > > This approach can be useful for modules that deal with different file formats. For example, you can read in a file in one format, write it out in a different format, and compare it with the expected result. > > However, more than half of Biopython's tests do not actually make use of this testing framework: > Do you need help in re-organizing all of these modules?
> test_BioSQL > test_CAPS > test_Cluster > test_CodonTable > test_Compass > test_Crystal > test_DocSQL > test_EmbossPrimer > test_Entrez > test_Fasta > test_GACrossover > test_GAMutation > test_GAOrganism > test_GAQueens > test_GARepair > test_GASelection > test_GFF > test_GFF2 > test_GraphicsChromosome > test_GraphicsDistribution > test_GraphicsGeneral > test_HMMCasino > test_HMMGeneral > test_HotRand > test_KDTree > test_KeyWList > test_LogisticRegression > test_Medline > test_NNExclusiveOr > test_NNGene > test_NNGeneral > test_Pathway > test_PopGen_FDist > test_PopGen_FDist_nodepend > test_PopGen_SimCoal > test_PopGen_SimCoal_nodepend > test_Registry > test_Restriction > test_SCOP_Astral > test_SCOP_Cla > test_SCOP_Des > test_SCOP_Dom > test_SCOP_Hie > test_SCOP_Raf > test_SCOP_Residues > test_SCOP_Scop > test_Wise > test_docstrings > test_kNN > test_lowess > test_psw > > These tests have trivial output, for example test_Cluster: > > test_Cluster > test_clusterdistance (test_Cluster.TestCluster) ... ok > test_distancematrix_kmedoids (test_Cluster.TestCluster) ... ok > test_kcluster (test_Cluster.TestCluster) ... ok > test_matrix_parse (test_Cluster.TestCluster) ... ok > test_median_mean (test_Cluster.TestCluster) ... ok > test_somcluster (test_Cluster.TestCluster) ... ok > test_treecluster (test_Cluster.TestCluster) ... ok > > ---------------------------------------------------------------------- > Ran 7 tests in 0.015s > > OK > > I suspect that for many of the remaining tests Biopython's unit testing framework doesn't bring any real advantage, but is used anyway solely because it currently is the standard in Biopython. > > Personally, I find Python's unit testing framework easier to understand than Biopython's testing framework. It doesn't need a separate output file, and it is easier to match each line of code with the correct behavior. > > I would therefore like to suggest to move from Biopython's testing framework to Python's testing framework. 
This also relieves us of the task of explaining Biopython's testing framework to contributors, and allows us to make better use of what Python already provides. Comparing output line-by-line, as Biopython's testing framework currently does, can still be used by test scripts that need this functionality. > > Comments, suggestions, anybody? > > --Michiel. > > > > > > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > -- ----------------------------------------------------------- My Blog on Bioinformatics (italian): http://bioinfoblog.it From bartek at rezolwenta.eu.org Mon Nov 24 07:45:52 2008 From: bartek at rezolwenta.eu.org (Bartek Wilczynski) Date: Mon, 24 Nov 2008 13:45:52 +0100 Subject: [Biopython-dev] [Bug 2680] Bio.AlignAce.Parser.py need to import string In-Reply-To: <320fb6e00811210745yc8e796ei9bc04a2e2cebda8b@mail.gmail.com> References: <200811211457.mALEvlR5009727@portal.open-bio.org> <8b34ec180811210732o4266a87ey2a4c14a7ddc5ead5@mail.gmail.com> <320fb6e00811210745yc8e796ei9bc04a2e2cebda8b@mail.gmail.com> Message-ID: <8b34ec180811240445w3e6e97d8k38c1740e84372184@mail.gmail.com> On Fri, Nov 21, 2008 at 4:45 PM, Peter wrote: > > On a related note - could you write a unit test for Bio.AlignAce please? > Hi Peter, I do not have much experience with writing unit tests but I would like to do it (treating it as an opportunity to learn more on unit tests). There are two issues which are somewhat related to this: - I have some more code related to sequence motif analysis which I'm using myself and could contribute as an extension to BIo.AlignACE. If people are interested in having this in biopython, it would be sensible to think about refactoring Bio.AlignACE and Bio.MEME which both provide a Motif class with largely overlapping functionality. I could do that and at the same time write unit tests for the new version. 
For that it would be cool to get input from all current or potential users of this functionality. I'll think about it a little and maybe write to biopython-users list. - The other issue is connected with the type of the tests I should write. Since Michiel brought this topic up recently, I'd like to know whether I should do it in the python (doctest) or biopython way. cheers Bartek -- Bartek Wilczynski ================== Postdoctoral fellow EMBL, Furlong group Meyerhoffstrasse 1, 69012 Heidelberg, Germany tel: +49 6221 387 8433 From bartek at rezolwenta.eu.org Mon Nov 24 09:51:12 2008 From: bartek at rezolwenta.eu.org (Bartek Wilczynski) Date: Mon, 24 Nov 2008 15:51:12 +0100 Subject: [Biopython-dev] Refactoring motif analysis code Message-ID: <8b34ec180811240651k45c11563p9e3dd18ba128f0ac@mail.gmail.com> Hello All, Currently, there are two packages dealing with motif analysis in biopython : Bio.AlignAce (written by me) and Bio.MEME (written by Jason Hackney). Both of them are quite old and they were developed independently so the functionality is largely overlapping. Particularly the files AlignAce/Motif.py and MEME/Motif.py contain almost identical functionality useful for anyone interested in motif analysis of writing a parser for yet another motif searching tool. I'd like to change this and create a new library called Bio.Motif, which would contain: -Motif class for all general functionality concerning motif objects: i/o, comparisons, sequence scanning -AlignAce Parser -MEME Parser When this is completed, we could deprecate the AlignAce and MEME modules. For AlignAce I have most of the code already written, I need to rewrite portions of MEME parser to work with different motif implementation (not a major pain). Then I just need to polish it a bit and provide tests and a short tutorial. After this rather long intro I'd like to ask about several things: - Are there many Bio.AlignAce or Bio.MEME users who would be unhappy about deprecating them? 
- Are there any features which people would find valuable in Bio.Motif - Both MEME and AlignAce are DNA-oriented, I've never worked on Protein motifs myself, but I'd like to know whether anyone is interested in using Bio.Motif for that Any comments/ideas are welcome cheers Bartek -- Bartek Wilczynski ================== Postdoctoral fellow EMBL, Furlong group Meyerhoffstrasse 1, 69012 Heidelberg, Germany tel: +49 6221 387 8433 From dalloliogm at gmail.com Mon Nov 24 10:25:23 2008 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Mon, 24 Nov 2008 16:25:23 +0100 Subject: [Biopython-dev] Refactoring motif analysis code In-Reply-To: <8b34ec180811240651k45c11563p9e3dd18ba128f0ac@mail.gmail.com> References: <8b34ec180811240651k45c11563p9e3dd18ba128f0ac@mail.gmail.com> Message-ID: <5aa3b3570811240725n54f7f624oc1db5fe0b88e3f5a@mail.gmail.com> On Mon, Nov 24, 2008 at 3:51 PM, Bartek Wilczynski wrote: > Hello All, > > Currently, there are two packages dealing with motif analysis in biopython : > Bio.AlignAce (written by me) and Bio.MEME (written by Jason Hackney). Hi, I asked a question about motifs one year ago on this list. Here it is the thread: - http://lists.open-bio.org/pipermail/biopython/2007-September/003727.html I would just like to tell you that I have tried the TAMO framework you suggested me, and found it very useful. I am not using it anymore because I don't need it, but I remember that I liked: - the methods to represent motifs as matrixes of frequencies/occurrencies etc.. - the fact that it was easy to create a motif from an alignment of sequences - the integration it had with this website: http://weblogo.berkeley.edu/logo.cgi. I would suggest you to provide integration with this other web service, which enable to plot the difference between two sequence logos: http://www.twosamplelogo.org/examples.html. Maybe you should contact TAMO's author to ask him if he wants to contribute, because I remember that its framework was really complete. 
> > Both of them are quite old and they were developed independently so > the functionality is largely overlapping. > Particularly the files AlignAce/Motif.py and MEME/Motif.py contain > almost identical functionality useful for > anyone interested in motif analysis of writing a parser for yet > another motif searching tool. > > I'd like to change this and create a new library called Bio.Motif, > which would contain: > -Motif class for all general functionality concerning motif objects: > i/o, comparisons, sequence scanning > -AlignAce Parser > -MEME Parser > > When this is completed, we could deprecate the AlignAce and MEME > modules. For AlignAce I have most of the code > already written, I need to rewrite portions of MEME parser to work > with different motif implementation (not a major pain). > Then I just need to polish it a bit and provide tests and a short tutorial. > > After this rather long intro I'd like to ask about several things: > - Are there many Bio.AlignAce or Bio.MEME users who would be unhappy > about deprecating them? 
> - Are there any features which people would find valuable in Bio.Motif > - Both MEME and AlignAce are DNA-oriented, I've never worked on > Protein motifs myself, but I'd like to know whether anyone is > interested in using Bio.Motif for that > > Any comments/ideas are welcome > > cheers > Bartek > > -- > Bartek Wilczynski > ================== > Postdoctoral fellow > EMBL, Furlong group > Meyerhoffstrasse 1, > 69012 Heidelberg, > Germany > tel: +49 6221 387 8433 > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > -- ----------------------------------------------------------- My Blog on Bioinformatics (italian): http://bioinfoblog.it From bsouthey at gmail.com Mon Nov 24 10:54:32 2008 From: bsouthey at gmail.com (Bruce Southey) Date: Mon, 24 Nov 2008 09:54:32 -0600 Subject: [Biopython-dev] Refactoring motif analysis code In-Reply-To: <8b34ec180811240651k45c11563p9e3dd18ba128f0ac@mail.gmail.com> References: <8b34ec180811240651k45c11563p9e3dd18ba128f0ac@mail.gmail.com> Message-ID: <492ACE38.1090301@gmail.com> Bartek Wilczynski wrote: > Hello All, > > Currently, there are two packages dealing with motif analysis in biopython : > Bio.AlignAce (written by me) and Bio.MEME (written by Jason Hackney). > Actually I am not that thrilled with the licenses for these packages and similar packages because these are free only for academic use. To me this clashes with the spirit of an open-sourced project especially a BSD-licensed one. But if there is a need for such modules then these modules should be included. > Both of them are quite old and they were developed independently so > the functionality is largely overlapping. > Particularly the files AlignAce/Motif.py and MEME/Motif.py contain > almost identical functionality useful for > anyone interested in motif analysis of writing a parser for yet > another motif searching tool. 
> > I'd like to change this and create a new library called Bio.Motif, > which would contain: > -Motif class for all general functionality concerning motif objects: > i/o, comparisons, sequence scanning > -AlignAce Parser > -MEME Parser > > While it is only free for academic use, have you seen TAMO? *TAMO: a flexible, object-oriented framework for analyzing transcriptional regulation using DNA-sequence motifs. * Bioinformatics. 2005 Jul 15;21(14):3164-5. http://fraenkel.mit.edu/TAMO/ > When this is completed, we could deprecate the AlignAce and MEME > modules. For AlignAce I have most of the code > already written, I need to rewrite portions of MEME parser to work > with different motif implementation (not a major pain). > Then I just need to polish it a bit and provide tests and a short tutorial. > > After this rather long intro I'd like to ask about several things: > - Are there many Bio.AlignAce or Bio.MEME users who would be unhappy > about deprecating them? > Well, I am not sure how many used Bio.AlignAce given the Parser.py bug :-) Based on the CVS, both have been untouched for about three years. Also, what species are these used for? One of the papers of AlignAce indicate that the base composition was set for yeast. > - Are there any features which people would find valuable in Bio.Motif > - Both MEME and AlignAce are DNA-oriented, I've never worked on > Protein motifs myself, but I'd like to know whether anyone is > interested in using Bio.Motif for that > > Any comments/ideas are welcome > > cheers > Bartek > > Personally I would be interested in a general protein motif finding module because of my current research. However, I do have a different view with respect to the Biopython community as indicated above with the licenses. 
Bruce From bsouthey at gmail.com Mon Nov 24 12:47:21 2008 From: bsouthey at gmail.com (Bruce Southey) Date: Mon, 24 Nov 2008 11:47:21 -0600 Subject: [Biopython-dev] Use of depreciated string functions In-Reply-To: <320fb6e00811210726n94e277ex359d93de0855045e@mail.gmail.com> References: <4926D17A.8080101@gmail.com> <320fb6e00811210726n94e277ex359d93de0855045e@mail.gmail.com> Message-ID: <492AE8A9.1000406@gmail.com> Peter wrote: > On Fri, Nov 21, 2008 at 3:19 PM, Bruce Southey wrote: > >> Hi, >> There are a number of files in Bio that import string. Many of these use >> deprecated functions (since Version 2) that are now string methods, mainly >> string.atof(), string.atoi() and string.join(). The only real advantage of >> modifying these is to remove an import statement because these will not be >> removed until Python 3. >> >> Perhaps the one exception is in HotRand.py: hex_digit = >> string.hexdigits.find( letter ) >> >> There are about 23 unique files that I identified via grep and many have >> more than one usage. While changing these is busy work, please let me know >> if you would like me to create patches for the next version of Biopython (ie >> 1.50) or just ignore this. >> > > As you say, there isn't much benefit from doing this other than > removing an import and making another small step towards Python 3.0 > compatibility. We have gradually been phasing out "import string" > already, usually when working on a module which used it. > > Once I've dealt with Biopython 1.49, I'd be happy to look at a patch > to remove more "import string" usage from non-obsolete, non-deprecated > code. It would be a little risky doing this to modules without unit > tests, but that's another area you've shown some interest in anyway... > > Thanks, > > Peter > > Hi, I was planning to get started on these depending on what time I have available. So just a quick question: Do you want one bug report per patch per file? Or just let me know if there is another way.
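Editorial illustration of the change under discussion: a minimal sketch of how calls to the old `string` module functions map onto built-ins and string methods. It is not taken from any Biopython patch; `parse_fields` and `hex_value` are hypothetical names.

```python
import string  # still needed below, but only for the hexdigits constant

# Old style, using functions from the 'string' module:
#   count = string.atoi(fields[0])
#   score = string.atof(fields[1])
#   label = string.join(fields[2:], "-")
# Equivalent code using built-ins and string methods:


def parse_fields(fields):
    count = int(fields[0])        # replaces string.atoi
    score = float(fields[1])      # replaces string.atof
    label = "-".join(fields[2:])  # replaces string.join
    return count, score, label


# The HotRand.py case is different: string.hexdigits is a constant,
# not a deprecated function, so a lookup like this keeps its import.
def hex_value(letter):
    return string.hexdigits.find(letter)
```

For example, `parse_fields(["42", "3.5", "a", "b"])` gives `(42, 3.5, "a-b")`. Note that `string.hexdigits` lists lowercase letters before uppercase ones, so `find` returns different positions for `"a"` and `"A"`.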
Thanks Bruce From biopython at maubp.freeserve.co.uk Mon Nov 24 13:42:08 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 24 Nov 2008 18:42:08 +0000 Subject: [Biopython-dev] Use of depreciated string functions In-Reply-To: <492AE8A9.1000406@gmail.com> References: <4926D17A.8080101@gmail.com> <320fb6e00811210726n94e277ex359d93de0855045e@mail.gmail.com> <492AE8A9.1000406@gmail.com> Message-ID: <320fb6e00811241042g646ff65fq61d3751537c882b1@mail.gmail.com> On Mon, Nov 24, 2008 at 5:47 PM, Bruce Southey wrote: >> Once I've dealt with Biopython 1.49, I'd be happy to look at a patch >> to remove more "import string" usage from non-obsolete, non-deprecated >> code. It would be a little risky doing this to modules without unit >> tests, but that's another area you've shown some interest in anyway... >> >> Thanks, >> >> Peter > > Hi, > I was planning to get started on with these depending on what time I have > available. So just a quick question: > Do you want one bug report per patch per file? > Or just let me know if there is another way. I'd suggest one general bug, and uploading one patch per module - that way they can be evaluated on a case by case basis (a single huge multi-file patch would be more difficult, and could become out of date). Personally however, I would prioritise more unit test coverage over this, but on the other hand it's the kind of short task you can handle when you have the odd spare 10 minutes. Up to you.
Peter From bugzilla-daemon at portal.open-bio.org Mon Nov 24 15:40:49 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 24 Nov 2008 15:40:49 -0500 Subject: [Biopython-dev] [Bug 2681] BioSQL: record annotations enhancements In-Reply-To: Message-ID: <200811242040.mAOKenEi002020@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2681 ------- Comment #4 from cymon.cox at gmail.com 2008-11-24 15:40 EST ------- (In reply to comment #2) > (In reply to comment #0) > > 1) Fixed date/dates typo. > > Why is it a typo? Change not checked in. The function _load_bioentry_date in Loader.py inserts the annotation 'date', if present, or the current date if not, into the bioentry_qualifier_value table. This is pulled by BioSeq.py _retrieve_qualifier_value and stored as the attribute 'dates'. Hence I considered line 307 in BioSeq.py to be a typo, which should be 'date' and not 'dates'. Also, because Loader.py handles dates separately, they should not be handled by the function load_annotations. > > 2) comment's were being stored by not retrieved - fixed with test. > > Looks good, except for returning an empty list if there were no comments. > > > 3) A 'reference' annotation, even if an empty list, was being retrieved in a > > DBSeqRecord. Fixed so that if there are no references there is no annotation > > in DBSeqRecord. 
> > I agree, but preferred a smaller change for this: > > Checking in BioSQL/BioSeq.py; > /home/repository/biopython/biopython/BioSQL/BioSeq.py,v <-- BioSeq.py > new revision: 1.33; previous revision: 1.32 > done > Checking in Tests/test_BioSQL_SeqIO.py; > /home/repository/biopython/biopython/Tests/test_BioSQL_SeqIO.py,v <-- > test_BioSQL_SeqIO.py > new revision: 1.29; previous revision: 1.28 > done Actually, your version of _retrieve_comment never returns comments ;-) On the wider issue: perhaps, it's best if DBSeqRecord's always have the same set of attributes, even if comments and references are empty lists. Trying to regenerate the attributes present in the loaded SeqRecord is, I think, not the way to go, and not possible (or at least currently not attempted) for fasta records. Perhaps we should be coding around the issue in the test suite rather than changing the attributes of the DBSeqRecord so that it passes the test... > > Some swiss prot SeqRecords have ncbi_taxid and they are retrieved > > correctly by DBSeqRecord. TODO: others have ncbi_taxid that is missing > > from the retrieved DBSeqRecord: sp012, sp014, > > Note some swiss prot records may be multi-species, which the BioSQL schema > can't cope with. Not sure if that applies here. Yep, thats exactly what was causing the problem. Currently the code refuses to load an ncbi_taxid, which I think is correct, after all which one should be loaded? Anyway, I'll look into this a bit more... > > Swissprot, fasta, and EMBL SeqRecords dont have a gi annotation, retrieved > > DBSeqRecords do. Loader uses the 'record_id' (line 522) as the identifier in > > bioentry, if the gi annotation is missing, which is pulled as the gi > > annotation. > > There probably is something not quite right here. Are you talking about the > bioentry.identifier entry in the database? Perhaps an explicit example might > help. 
As an aside, I think "gi" (GeneIndex used by NCBI) might be better > stored in the record.dbxrefs, but that could be a parser change... Ah, OK, will look further into this as well... > > 'contig' is ignored by loader because it's a SeqFeature object. Is there any > > reason it couldnt be loaded and retrieved? (record is GenBank/NT_019265.gb) > > I couldn't even say off hand how the CONTIG line in that example would be > parsed, let alone how it gets dealt with when loading into BioSQL. Well, the parser correctly deals with it as a SeqFeature (with a whole bunch of sub_features) but it never gets loaded: it's not dealt with at all and falls off the bottom of the function; I can't see any reason not to load it... C. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Nov 24 16:40:24 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 24 Nov 2008 16:40:24 -0500 Subject: [Biopython-dev] [Bug 2681] BioSQL: record annotations enhancements In-Reply-To: Message-ID: <200811242140.mAOLeO8n008996@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2681 ------- Comment #5 from cymon.cox at gmail.com 2008-11-24 16:40 EST ------- (In reply to comment #2) > (In reply to comment #0) > > Swissprot, fasta, and EMBL SeqRecords dont have a gi annotation, retrieved > > DBSeqRecords do. Loader uses the 'record_id' (line 522) as the identifier in > > bioentry, if the gi annotation is missing, which is pulled as the gi > > annotation. > > There probably is something not quite right here. Are you talking about the > bioentry.identifier entry in the database? Perhaps an explicit example might > help. As an aside, I think "gi" (GeneIndex used by NCBI) might be better > stored in the record.dbxrefs, but that could be a parser change...

The "gi" annotation of a parsed GenBank record refers to this GenInfo Identifier:

From NCBI: http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html#GInB
"""
"GenInfo Identifier" sequence identification number, in this case, for the nucleotide sequence. If a sequence changes in any way, a new GI number will be assigned. GI sequence identifiers run parallel to the new accession.version system of sequence identifiers.
"""

This is stored in bioentry.identifier. However, "gi"'s are not present in swissprot, fasta, and embl records, instead the following couplet loads the record.id into the identifier slot:

Loader.py:
519         if "gi" in record.annotations :
520             identifier = record.annotations["gi"]
521         else :
522             identifier = record.id

But of course, the record.id is not the "gi" - so perhaps the bioentry.identifier should be left NULL if the "gi" number is missing. Or we might consider calling the DBSeqRecord attribute "identifier" rather than "gi"...

Here's an example of an EMBL file where the record.id becomes the "gi":

Testing loading from embl format file EMBL/TRBG361.embl
 - AAACAAACCAAATATGGAT...AAA [jfp/7BKv3jTJAU/4jVMrSftEq20] len 1859, X56734.1
 - Retrieving by name/display_id 'X56734',
old annos diff: set([])
new annos diff: set(['dates', 'ncbi_taxid', 'gi'])
OLD:
taxonomy = ['Eukaryota', 'Viridiplantae', 'Streptophyta', 'Embryophyta', 'Tracheophyta', 'Spermatophyta', 'Magnoliophyta', 'eudicotyledons', 'core eudicotyledons', 'rosids', 'eurosids I', 'Fabales', 'Fabaceae', 'Papilionoideae', 'Trifolieae', 'Trifolium']
references = [, ]
accessions = ['X56734', 'S46826']
data_file_division = PLN
organism = Trifolium repens (white clover)
sequence_version = 1
NEW:
dates = ['24-NOV-2008']
ncbi_taxid = 3899
references = [, ]
accessions = ['X56734', 'S46826']
data_file_division = PLN
taxonomy = ['Trifolium repens (white clover)']
gi = X56734.1
organism = Trifolium repens (white clover)
sequence_version = ['1']
ncbi_taxid: 3899

C.
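
The two options raised in this comment (fall back to record.id, or leave bioentry.identifier NULL when there is no "gi") can be sketched side by side. This is only an illustration: choose_identifier and its fall_back_to_id flag are hypothetical names, not part of BioSQL/Loader.py, and None stands in for a NULL column value.

```python
def choose_identifier(annotations, record_id, fall_back_to_id=True):
    """Pick a value for bioentry.identifier (hypothetical helper)."""
    if "gi" in annotations:
        return annotations["gi"]
    if fall_back_to_id:
        # Behaviour described in the comment: reuse record.id.
        return record_id
    # Alternative discussed in the comment: store nothing (NULL).
    return None


print(choose_identifier({"gi": "12345"}, "X56734.1"))
print(choose_identifier({}, "X56734.1"))
print(choose_identifier({}, "X56734.1", fall_back_to_id=False))
```

With the fallback disabled, a record without a "gi" no longer gets its record.id reported back as a "gi" annotation on retrieval.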
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Nov 24 17:51:37 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 24 Nov 2008 17:51:37 -0500 Subject: [Biopython-dev] [Bug 2683] New: Modules with unused string modules Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2683 Summary: Modules with unused string modules Product: Biopython Version: Not Applicable Platform: PC OS/Version: Linux Status: NEW Severity: trivial Priority: P5 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: bsouthey at gmail.com This is a trivial general bug for any Biopython modules that import the string module but do not use it. A different bug will be used for those modules that actually use any depreciated string functions. Please attach any similar modules to this report. AlignAce modules: Bio/AlignAce/AlignAceStandalone.py Bio/AlignAce/CompareAceStandalone.py -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Nov 24 18:05:27 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 24 Nov 2008 18:05:27 -0500 Subject: [Biopython-dev] [Bug 2681] BioSQL: record annotations enhancements In-Reply-To: Message-ID: <200811242305.mAON5Rs2017499@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2681 ------- Comment #6 from cymon.cox at gmail.com 2008-11-24 18:05 EST ------- (In reply to comment #4) > (In reply to comment #2) > > (In reply to comment #0) > > > Some swiss prot SeqRecords have ncbi_taxid and they are retrieved > > > correctly by DBSeqRecord. 
TODO: others have ncbi_taxid that is missing
> > > from the retrieved DBSeqRecord: sp012, sp014,
> >
> > Note some swiss prot records may be multi-species, which the BioSQL schema
> > can't cope with. Not sure if that applies here.
>
> Yep, thats exactly what was causing the problem. Currently the code refuses to
> load an ncbi_taxid, which I think is correct, after all which one should be
> loaded? Anyway, I'll look into this a bit more...

So, how best to handle records with multiple taxa:

SwissProt/sp014 has 10 organisms which are currently loaded directly into the taxon_name table:

biosql_test=# select name, name_class from taxon_name where taxon_id = 94;
 name | name_class
------------------------------------------------------------------------------
 Oryza sativa (Rice), Nicotiana tabacum (Common tobacco)
 Hordeum vulgare (Barley), Triticum aestivum (Wheat)
 Secale cereale (Rye), Zea mays (Maize), Pisum sativum (Garden pea)
 Spinacia oleracea (Spinach), Capsicum annuum (Bell pepper)
 Mesembryanthemum crys | scientific name
(1 row)

That's clearly not a scientific name...

The record has the ncbi_taxon_ids:

OX NCBI_TaxID=4530, 4097, 4513, 4565, 4550, 4577, 3888, 3562, 4072, 3544, 19 OX 3555, 3696;

Which are currently not stored because there is more than one:

Loader.py (lines 150-157):
        ncbi_taxon_id = None
        if "ncbi_taxid" in record.annotations :
            #Could be a list of IDs.
            if isinstance(record.annotations["ncbi_taxid"],list) :
                if len(record.annotations["ncbi_taxid"])==1 :
                    ncbi_taxon_id = record.annotations["ncbi_taxid"][0]
            else :
                ncbi_taxon_id = record.annotations["ncbi_taxid"]

BioSQL is clearly not designed to store records from multiple taxa: one bioentry has one taxon_id. Should biopython be refusing to load such records if the scientific name is not a binomial? What does perl do?

C.
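
The rule described in this comment (use the taxon id only when there is exactly one) can be written as a small standalone function. This is a sketch that mirrors the quoted Loader.py excerpt; single_taxon_id is a hypothetical name and the plain dictionaries stand in for record.annotations.

```python
def single_taxon_id(annotations):
    """Return one NCBI taxon id, or None if it is absent or ambiguous."""
    value = annotations.get("ncbi_taxid")
    if value is None:
        return None
    if isinstance(value, list):
        # A multi-species record carries several ids; refuse to pick one.
        return value[0] if len(value) == 1 else None
    return value


print(single_taxon_id({"ncbi_taxid": ["3899"]}))
print(single_taxon_id({"ncbi_taxid": ["4530", "4097", "4513"]}))
print(single_taxon_id({}))
```

A multi-species record such as sp014 therefore ends up with no taxon id at all, which is the behaviour the comment asks about.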
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From mjldehoon at yahoo.com Mon Nov 24 23:08:18 2008 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Mon, 24 Nov 2008 20:08:18 -0800 (PST) Subject: [Biopython-dev] Rethinking Biopython's testing framework In-Reply-To: <5aa3b3570811240104m1442e5dfkd0c0f92c6fa772f9@mail.gmail.com> Message-ID: <199296.58154.qm@web62402.mail.re1.yahoo.com> > > However, more than half of Biopython's tests do > > not actually make use of this testing framework: > > Do you need help in re-organizing all of these modules? That would be helpful, but let's see first if there are any objections to my proposal. We'll also have to decide the pathway to change the tests without breaking anything. For the unit tests I listed, the changes should be trivial, but still we need to check if any problems show up. Thanks! --Michiel. From bugzilla-daemon at portal.open-bio.org Tue Nov 25 09:31:18 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 25 Nov 2008 09:31:18 -0500 Subject: [Biopython-dev] [Bug 2683] Modules with unused string modules In-Reply-To: Message-ID: <200811251431.mAPEVIYj014396@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2683 ------- Comment #1 from bsouthey at gmail.com 2008-11-25 09:31 EST ------- Bio/Crystal/__init__.py imports but does not appear to use the following modules: array string Seq MutableSeq -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Tue Nov 25 09:40:23 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 25 Nov 2008 09:40:23 -0500 Subject: [Biopython-dev] [Bug 2683] Modules with unused string modules In-Reply-To: Message-ID: <200811251440.mAPEeN8f015160@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2683 ------- Comment #2 from barwil at gmail.com 2008-11-25 09:40 EST ------- > AlignAce modules: > Bio/AlignAce/AlignAceStandalone.py > Bio/AlignAce/CompareAceStandalone.py > Fixed in CVS now. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From chapmanb at 50mail.com Tue Nov 25 09:40:41 2008 From: chapmanb at 50mail.com (Brad Chapman) Date: Tue, 25 Nov 2008 09:40:41 -0500 Subject: [Biopython-dev] Rethinking Biopython's testing framework In-Reply-To: <871524.42970.qm@web62403.mail.re1.yahoo.com> References: <871524.42970.qm@web62403.mail.re1.yahoo.com> Message-ID: <20081125144041.GC83220@sobchak.mgh.harvard.edu> Hi Michiel; Good thoughts on this; my comments are below. > Biopython's testing framework is built on top of Python's unit testing > framewerk. Python's unit testing framework makes use of assertion > statements to compare the result of a command to the expected result. > Biopython uses test scripts that print output to stdout, together with > an output file that contains the correct output. After running each > test script, it compares the generated output with the correct output > to see if the test was successful. Agreed with the distinction between the unit tests and the "dump lots of text and compare" approach. I've written both and do think the unit testing/assertion model is more robust since you can go back and actually get some insight into what someone was thinking when they wrote an assertion. 
> However, more than half of Biopython's tests do not actually make use of this testing framework: [...] > These tests have trivial output, for example test_Cluster: > > test_Cluster > test_clusterdistance (test_Cluster.TestCluster) ... ok > test_distancematrix_kmedoids (test_Cluster.TestCluster) ... ok > test_kcluster (test_Cluster.TestCluster) ... ok > test_matrix_parse (test_Cluster.TestCluster) ... ok > test_median_mean (test_Cluster.TestCluster) ... ok > test_somcluster (test_Cluster.TestCluster) ... ok > test_treecluster (test_Cluster.TestCluster) ... ok They really do make use of the framework, but at a higher level. I agree that if you run a single test it makes little difference whether you use 'run_tests.py test_Cluster' or just run 'test_Cluster.py' directly. However, when you are running all the tests as is regular done in development or before pushing releases, this comparison is important. It will will pick out if you get a line like: test_clusterdistance (test_Cluster.TestCluster) ... ERROR instead of the expected ok and report this in the summary for all of the tests. Otherwise this is likely to get lost in all of the results. > Personally, I find Python's unit testing framework easier to > understand than Biopython's testing framework. It doesn't need a > separate output file, and it is easier to match each line of code with > the correct behavior. > > I would therefore like to suggest to move from Biopython's testing > framework to Python's testing framework. This also relieves us of the > task of explaining Biopython's testing framework to contributors, > and allows us to make better use of what Python already provides. > Comparing output line-by-line, as Biopython's testing framework > currently does, can still be used by test scripts that need this > functionality. Is the testing framework you are proposing different from the unit tests used the individual tests? 
How does your proposal manage the higher level functionality of checking if all sub-tests within one of the test suites pass? Brad From bugzilla-daemon at portal.open-bio.org Tue Nov 25 10:24:33 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 25 Nov 2008 10:24:33 -0500 Subject: [Biopython-dev] [Bug 2683] Modules with unused string modules In-Reply-To: Message-ID: <200811251524.mAPFOXe2019581@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2683 ------- Comment #3 from bsouthey at gmail.com 2008-11-25 10:24 EST ------- Bio/FilteredReader.py imports but does not appear to use the following modules: os string copy from File import UndoHandle -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Nov 25 11:13:01 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 25 Nov 2008 11:13:01 -0500 Subject: [Biopython-dev] [Bug 2677] BioSQL seqfeature enhancements In-Reply-To: Message-ID: <200811251613.mAPGD1FG024870@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2677 ------- Comment #7 from cymon.cox at gmail.com 2008-11-25 11:13 EST ------- (In reply to comment #6) > (From update of attachment 1072 [details]) > I think this is still a big improvement, but that the > (sub)feature.location_operator issue could wait. We'll need to discuss on the > BioSQL mailing list how this should be handled consistently. > > Leaving this bug open. Further to the "where to put the (sub)feature.location_operator" (eg.
"join", "order") question, this comment appears in the BioPerl MySQL schema for the location_qualifier_value table: -- location qualifiers - mainly intended for fuzzies but anything -- can go in here -- some controlled vocab terms have slots; So, this would seem a suitable place to store the attribute. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Nov 25 11:13:07 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 25 Nov 2008 11:13:07 -0500 Subject: [Biopython-dev] [Bug 2684] New: GenBank/__init__.py: Removing loop over string.whitespace Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2684 Summary: GenBank/__init__.py: Removing loop over string.whitespace Product: Biopython Version: Not Applicable Platform: PC OS/Version: Linux Status: NEW Severity: enhancement Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: bsouthey at gmail.com The function '_clean_location' in GenBank/__init__.py uses a 'for' loop over string.whitespace that removes whitespace from string. A simpler way is to just split the string on whitespace and rejoin it as a single line: location_line=''.join(location_string.split()) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Tue Nov 25 11:14:19 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 25 Nov 2008 11:14:19 -0500 Subject: [Biopython-dev] [Bug 2684] GenBank/__init__.py: Removing loop over string.whitespace In-Reply-To: Message-ID: <200811251614.mAPGEJvT025100@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2684 ------- Comment #1 from bsouthey at gmail.com 2008-11-25 11:14 EST ------- Created an attachment (id=1083) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1083&action=view) Removal of unnecessary loop over string.whitespace -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Nov 25 11:30:01 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 25 Nov 2008 11:30:01 -0500 Subject: [Biopython-dev] [Bug 2685] New: HotRand provides an unnecessary function to convert hex to integer Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2685 Summary: HotRand provides an unnecessary function to convert hex to integer Product: Biopython Version: Not Applicable Platform: PC OS/Version: Linux Status: NEW Severity: enhancement Priority: P5 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: bsouthey at gmail.com The file Bio/HotRand.py defines the function hex_convert that converts a hex number to an integer number. This functionality is provided by the builtin int() with appropriate radix, i.e. int(hex_number, 16) This function could be removed or replaced to avoid using the string module. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
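
Bugs 2684 and 2685 both propose replacing a hand-written loop with something Python already provides. The sketch below compares the two styles on made-up inputs. The functions strip_whitespace_loop, strip_whitespace_split and hex_convert_by_hand are hypothetical stand-ins written for this illustration, not the actual code in GenBank/__init__.py or Bio/HotRand.py.

```python
import string


def strip_whitespace_loop(text):
    """Remove whitespace by looping over string.whitespace (older style)."""
    for ws in string.whitespace:
        text = text.replace(ws, "")
    return text


def strip_whitespace_split(text):
    """Split on runs of whitespace and rejoin with no separator."""
    return "".join(text.split())


def hex_convert_by_hand(text):
    """Convert a hex string to an integer one digit at a time."""
    value = 0
    for letter in text.lower():
        digit = "0123456789abcdef".find(letter)
        if digit < 0:
            raise ValueError("not a hex digit: %r" % letter)
        value = value * 16 + digit
    return value


location = "join(1..10,\n      20..30,\t40..50)"
assert strip_whitespace_loop(location) == strip_whitespace_split(location)
print(strip_whitespace_split(location))

for hex_number in ("0", "ff", "FF", "1a2b"):
    assert hex_convert_by_hand(hex_number) == int(hex_number, 16)
print(int("ff", 16))
```

For plain ASCII input the two whitespace versions agree. Note that str.split() with no argument also splits on non-ASCII whitespace, which the loop over string.whitespace would leave in place.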
From bugzilla-daemon at portal.open-bio.org Tue Nov 25 11:31:09 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 25 Nov 2008 11:31:09 -0500 Subject: [Biopython-dev] [Bug 2685] HotRand provides an unnecessary function to convert hex to integer In-Reply-To: Message-ID: <200811251631.mAPGV91O027180@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2685 ------- Comment #1 from bsouthey at gmail.com 2008-11-25 11:31 EST ------- Created an attachment (id=1084) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1084&action=view) Replaces hex_convert() with int() -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Nov 25 11:52:12 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 25 Nov 2008 11:52:12 -0500 Subject: [Biopython-dev] [Bug 2685] HotRand provides an unnecessary function to convert hex to integer In-Reply-To: Message-ID: <200811251652.mAPGqCMt029684@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2685 bsouthey at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1084 is|0 |1 obsolete| | ------- Comment #2 from bsouthey at gmail.com 2008-11-25 11:52 EST ------- Created an attachment (id=1085) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1085&action=view) Messed up the first patch -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Tue Nov 25 11:53:41 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 25 Nov 2008 11:53:41 -0500 Subject: [Biopython-dev] [Bug 2685] HotRand provides an unnecessary function to convert hex to integer In-Reply-To: Message-ID: <200811251653.mAPGrfPk029811@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2685 bsouthey at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1085 is|0 |1 obsolete| | ------- Comment #3 from bsouthey at gmail.com 2008-11-25 11:53 EST ------- Created an attachment (id=1086) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1086&action=view) Sorry wrong version -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Nov 25 13:18:59 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 25 Nov 2008 13:18:59 -0500 Subject: [Biopython-dev] [Bug 2683] Modules with unused string modules In-Reply-To: Message-ID: <200811251818.mAPIIxQt006109@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2683 ------- Comment #4 from bsouthey at gmail.com 2008-11-25 13:18 EST ------- These are the last files that I have found in Bio that import the string module but are not used: IntelliGenetics/__init__.py IntelliGenetics/intelligenetics_format.py IntelliGenetics/Record.py NetCatch.py SCOP/__init__.py PDB/PSEA.py (imports upper) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
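
Lists like the one in this comment can be cross-checked mechanically. The sketch below is one possible approach using the standard ast module, not the method used in the thread (which mentions grep); unused_imports is a hypothetical helper. It is a heuristic: it ignores names re-exported on purpose, and ast.parse only accepts source that is valid for the interpreter running it.

```python
import ast


def unused_imports(source):
    """Return the imported names that are never referenced in source."""
    imported = set()
    used = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "import a.b" binds the top-level name "a".
                imported.add(alias.asname or alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                if alias.name != "*":  # star imports cannot be checked
                    imported.add(alias.asname or alias.name)
        elif isinstance(node, ast.Name):
            # Covers attribute access too: "string.digits" contains Name "string".
            used.add(node.id)
    return sorted(imported - used)


example = (
    "import os\n"
    "import string\n"
    "from copy import deepcopy\n"
    "print(os.sep)\n"
)
print(unused_imports(example))  # ['deepcopy', 'string']
```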
From bugzilla-daemon at portal.open-bio.org Tue Nov 25 17:18:41 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 25 Nov 2008 17:18:41 -0500 Subject: [Biopython-dev] [Bug 2381] translate and transcribe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200811252218.mAPMIfFX029455@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 mmokrejs at ribosome.natur.cuni.cz changed: What |Removed |Added ---------------------------------------------------------------------------- Summary|translate and transcibe |translate and transcribe |methods for the Seq object |methods for the Seq object |(in Bio.Seq) |(in Bio.Seq) ------- Comment #53 from mmokrejs at ribosome.natur.cuni.cz 2008-11-25 17:18 EST ------- (In reply to comment #27) > Created an attachment (id=1032) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1032&action=view) [details] > Patch to Bio/Seq.py to add start codon handling to translation > > Patch adds a new boolean argument to the translate method and function, called > "init" (rather than my earlier suggestions like "from_start" or "check_start" > which could be considered misleading). > > Docstring: > > init - Boolean, defaults to False. Should translation check the > first codon is a valid initiation (start) codon and translate > it as methionine (M)? If False, nothing special is done with > the first codon. What kind of check is it doing? I think it just forces the first letter to be 'M'. > > > Example usage of the translate function, > > >>> from Bio.Seq import translate > >>> translate("TTGAAACCCTAG") > 'LKP*' > >>> translate("TTGAAACCCTAG", init=True, to_stop=True) > 'MKP' > >>> translate("TTGAAACCCTAG", init=True) > 'MKP*' > >>> translate("TTGAAACCCTAG", to_stop=True) > 'LKP' I don't like the "init" argument either. I would call it force_initiator_Met instead. BTW, non-canonical initiator codon is CUG, where did you find UUG?
Sorry, I got overloaded by many other tasks so haven't read any other follow-ups, I just hit the email from bugzilla by luck. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Nov 26 10:57:05 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 26 Nov 2008 10:57:05 -0500 Subject: [Biopython-dev] [Bug 2688] New: Removal of depreciated string functions Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2688 Summary: Removal of depreciated string functions Product: Biopython Version: Not Applicable Platform: PC OS/Version: Linux Status: NEW Severity: minor Priority: P5 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: bsouthey at gmail.com This is a general bug to remove any deprecated string functions from Biopython modules. I apologize in advance for the noise this creates especially due to my mistakes. I have tested and validated the subsequent patches on my Linux system with Python versions 2.3, 2.4, 2.5 and 2.6. However, I do recognize that patches may be in code not used by the tests.

The following files require importing the string module and are thus excluded (although deprecated functions may still be used):
Bio/Decode.py - maketrans()
Bio/EUtils/POM.py - maketrans()
Bio/Prosite/Pattern.py - maketrans()
Bio/Seq.py - maketrans()
triefind.py - defines string.punctuation + string.whitespace

The following files have alternative reports:
GenBank/__init__.py
HotRand.py

The following files are deprecated and are excluded:
Emboss/Primer.py
stringfns.py
MetaTool/__init__.py
MetaTool/metatool_format.py
MetaTool/Record.py
NBRF/__init__.py
Ndb/__init__.py
Transcribe.py

The following files import but do not use the string module:
AlignAce/AlignAceStandalone.py (fixed)
AlignAce/CompareAceStandalone.py (fixed)
Crystal/__init__.py
IntelliGenetics/__init__.py
IntelliGenetics/intelligenetics_format.py
IntelliGenetics/Record.py
NetCatch.py
SCOP/__init__.py

The following files are known to use the string module and have patches:
Align/AlignInfo.py
Blast/ParseBlastTable.py
FSSP/__init__.py
NMR/NOEtools.py
NMR/xpktools.py
PDB/MMCIFParser.py
SubsMat/__init__.py
Blast/Record.py
Compass/__init__.py
Data/CodonTable.py
Eutils/sourcegen.py
Eutils/tests/unittest.py
Fasta/FastaAlign.py
FilteredReader.py
GFF/easy.py
HMM/Utilities.py
Index.py
MEME/Parser.py
NeuralNetwork/Gene/Pattern.py
NeuralNetwork/Gene/Schema.py
Parsers/spark.py
PDB/parse_pdb_header.py
PDB/PDBList.py
PDB/PDBParser.py
PDB/PSEA.py
SCOP/__init__.py
utils.py

I did not see a trivial resolution for the functions in:
SubsMat/FreqTable.py
So I rewrote the functions to avoid using map.

-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
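
For readers unfamiliar with the change being patched in, the kind of rewrite discussed in this thread (string.atoi(), string.atof() and string.join() replaced by built-ins and string methods) looks roughly like the sketch below. The functions and the sample line are invented for illustration and are not taken from any of the listed files. The code is written for Python 3, whereas the patches in the report target Python 2.3 to 2.6; the last part shows str.maketrans, which in Python 3 covers the maketrans() case without importing the string module.

```python
# Old module-level style, Python 2 only (shown as comments):
#   count = string.atoi(fields[1])
#   score = string.atof(fields[2])
#   line = string.join(fields, " ")


def parse_fields(line):
    """Split a whitespace-delimited line into (name, count, score)."""
    name, count, score = line.split()
    return name, int(count), float(score)  # instead of atoi() / atof()


def format_fields(fields):
    """Join fields with single spaces using the str.join method."""
    return " ".join(str(field) for field in fields)  # instead of string.join()


# Python 3 only: maketrans is a static method on str.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

record = parse_fields("seq1 42 3.5")
print(record)
print(format_fields(record))
print("AACG".translate(COMPLEMENT))
```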
From bugzilla-daemon at portal.open-bio.org Wed Nov 26 10:58:03 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 26 Nov 2008 10:58:03 -0500 Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions In-Reply-To: Message-ID: <200811261558.mAQFw3wc029231@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2688 ------- Comment #1 from bsouthey at gmail.com 2008-11-26 10:58 EST ------- Created an attachment (id=1088) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1088&action=view) Remove depreciated string functions -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Nov 26 10:59:27 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 26 Nov 2008 10:59:27 -0500 Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions In-Reply-To: Message-ID: <200811261559.mAQFxR5t029522@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2688 ------- Comment #2 from bsouthey at gmail.com 2008-11-26 10:59 EST ------- Created an attachment (id=1089) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1089&action=view) Blast/Record.py patch -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Wed Nov 26 11:01:30 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 11:01:30 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To:
Message-ID: <200811261601.mAQG1U4h029894@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688

------- Comment #3 from bsouthey at gmail.com 2008-11-26 11:01 EST -------
Created an attachment (id=1090)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1090&action=view)
Compass/__init__.py depreciated string functions

From bugzilla-daemon at portal.open-bio.org Wed Nov 26 11:02:26 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 11:02:26 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To:
Message-ID: <200811261602.mAQG2Qlx030068@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688

------- Comment #4 from bsouthey at gmail.com 2008-11-26 11:02 EST -------
Created an attachment (id=1091)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1091&action=view)
Data/CodonTable.py
From bugzilla-daemon at portal.open-bio.org Wed Nov 26 11:03:14 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 11:03:14 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To:
Message-ID: <200811261603.mAQG3ETM030188@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688

------- Comment #5 from bsouthey at gmail.com 2008-11-26 11:03 EST -------
Created an attachment (id=1092)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1092&action=view)
Eutils/sourcegen.py

From bugzilla-daemon at portal.open-bio.org Wed Nov 26 11:04:07 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 11:04:07 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To:
Message-ID: <200811261604.mAQG47K1030328@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688

------- Comment #6 from bsouthey at gmail.com 2008-11-26 11:04 EST -------
Created an attachment (id=1093)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1093&action=view)
Eutils/tests/unittest.py
From bugzilla-daemon at portal.open-bio.org Wed Nov 26 11:05:14 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 11:05:14 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To:
Message-ID: <200811261605.mAQG5EUu030457@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688

------- Comment #7 from bsouthey at gmail.com 2008-11-26 11:05 EST -------
Created an attachment (id=1094)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1094&action=view)
Fasta/FastaAlign.py

From bugzilla-daemon at portal.open-bio.org Wed Nov 26 11:06:35 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 11:06:35 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To:
Message-ID: <200811261606.mAQG6ZqF030610@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688

------- Comment #8 from bsouthey at gmail.com 2008-11-26 11:06 EST -------
Created an attachment (id=1095)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1095&action=view)
FSSP/__init__.py
From bugzilla-daemon at portal.open-bio.org Wed Nov 26 11:09:26 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 11:09:26 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To:
Message-ID: <200811261609.mAQG9QMf030939@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688

bsouthey at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1095 is|0                           |1
           obsolete|                            |

------- Comment #9 from bsouthey at gmail.com 2008-11-26 11:09 EST -------
Created an attachment (id=1096)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1096&action=view)
FSSP/__init__.py corrected

Got the files in the wrong order.

From bugzilla-daemon at portal.open-bio.org Wed Nov 26 11:10:25 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 11:10:25 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To:
Message-ID: <200811261610.mAQGAP10031066@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688

------- Comment #10 from bsouthey at gmail.com 2008-11-26 11:10 EST -------
Created an attachment (id=1097)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1097&action=view)
GFF/easy.py
From bugzilla-daemon at portal.open-bio.org Wed Nov 26 11:11:19 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 11:11:19 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To:
Message-ID: <200811261611.mAQGBJ28031191@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688

------- Comment #11 from bsouthey at gmail.com 2008-11-26 11:11 EST -------
Created an attachment (id=1098)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1098&action=view)
HMM/Utilities.py

From bugzilla-daemon at portal.open-bio.org Wed Nov 26 11:31:52 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 11:31:52 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To:
Message-ID: <200811261631.mAQGVqef001363@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688

------- Comment #12 from bsouthey at gmail.com 2008-11-26 11:31 EST -------
Created an attachment (id=1099)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1099&action=view)
Index.py
From bugzilla-daemon at portal.open-bio.org Wed Nov 26 11:32:37 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 11:32:37 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To:
Message-ID: <200811261632.mAQGWbYF001446@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688

------- Comment #13 from bsouthey at gmail.com 2008-11-26 11:32 EST -------
Created an attachment (id=1100)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1100&action=view)
MEME/Parser.py

From bugzilla-daemon at portal.open-bio.org Wed Nov 26 11:33:41 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 11:33:41 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To:
Message-ID: <200811261633.mAQGXfww001564@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688

------- Comment #14 from bsouthey at gmail.com 2008-11-26 11:33 EST -------
Created an attachment (id=1101)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1101&action=view)
NeuralNetwork/Gene/Pattern.py
From bugzilla-daemon at portal.open-bio.org Wed Nov 26 11:34:41 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 11:34:41 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To:
Message-ID: <200811261634.mAQGYf0u001687@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688

------- Comment #15 from bsouthey at gmail.com 2008-11-26 11:34 EST -------
Created an attachment (id=1102)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1102&action=view)
NeuralNetwork/Gene/Schema.py

From bugzilla-daemon at portal.open-bio.org Wed Nov 26 11:35:35 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 11:35:35 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To:
Message-ID: <200811261635.mAQGZZno001826@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688

------- Comment #16 from bsouthey at gmail.com 2008-11-26 11:35 EST -------
Created an attachment (id=1103)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1103&action=view)
NMR/NOEtools.py
From bugzilla-daemon at portal.open-bio.org Wed Nov 26 11:36:19 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 11:36:19 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To:
Message-ID: <200811261636.mAQGaJXQ001918@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688

------- Comment #17 from bsouthey at gmail.com 2008-11-26 11:36 EST -------
Created an attachment (id=1104)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1104&action=view)
NMR/xpktools.py

From bugzilla-daemon at portal.open-bio.org Wed Nov 26 11:37:14 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 11:37:14 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To:
Message-ID: <200811261637.mAQGbEX0002035@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688

------- Comment #18 from bsouthey at gmail.com 2008-11-26 11:37 EST -------
Created an attachment (id=1105)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1105&action=view)
Parsers/spark.py
From bugzilla-daemon at portal.open-bio.org Wed Nov 26 11:38:42 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 11:38:42 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To:
Message-ID: <200811261638.mAQGcgvH002293@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688

------- Comment #19 from bsouthey at gmail.com 2008-11-26 11:38 EST -------
Created an attachment (id=1106)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1106&action=view)
Blast/ParseBlastTable.py

From bugzilla-daemon at portal.open-bio.org Wed Nov 26 11:39:37 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 11:39:37 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To:
Message-ID: <200811261639.mAQGdbdC002442@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688

------- Comment #20 from bsouthey at gmail.com 2008-11-26 11:39 EST -------
Created an attachment (id=1107)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1107&action=view)
PDB/MMCIFParser.py
From bugzilla-daemon at portal.open-bio.org Wed Nov 26 11:40:56 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 11:40:56 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To:
Message-ID: <200811261640.mAQGeuHm002669@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688

------- Comment #21 from bsouthey at gmail.com 2008-11-26 11:40 EST -------
Created an attachment (id=1108)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1108&action=view)
PDB/parse_pdb_header.py

From bugzilla-daemon at portal.open-bio.org Wed Nov 26 11:41:56 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 11:41:56 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To:
Message-ID: <200811261641.mAQGfuJj002827@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688

------- Comment #22 from bsouthey at gmail.com 2008-11-26 11:41 EST -------
Created an attachment (id=1109)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1109&action=view)
PDB/PDBList.py
From bugzilla-daemon at portal.open-bio.org Wed Nov 26 11:42:41 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 11:42:41 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To:
Message-ID: <200811261642.mAQGgfiH002929@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688

------- Comment #23 from bsouthey at gmail.com 2008-11-26 11:42 EST -------
Created an attachment (id=1110)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1110&action=view)
PDB/PDBParser.py

From bugzilla-daemon at portal.open-bio.org Wed Nov 26 11:43:28 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 11:43:28 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To:
Message-ID: <200811261643.mAQGhSbJ003019@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688

------- Comment #24 from bsouthey at gmail.com 2008-11-26 11:43 EST -------
Created an attachment (id=1111)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1111&action=view)
SubsMat/__init__.py
From bugzilla-daemon at portal.open-bio.org Wed Nov 26 11:46:00 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 11:46:00 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To:
Message-ID: <200811261646.mAQGk0id003484@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688

------- Comment #25 from bsouthey at gmail.com 2008-11-26 11:46 EST -------
Created an attachment (id=1112)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1112&action=view)
SubsMat/FreqTable.py

The two functions involved were rewritten because of the use of map().

From bugzilla-daemon at portal.open-bio.org Wed Nov 26 11:49:58 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 11:49:58 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To:
Message-ID: <200811261649.mAQGnwds003938@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688

------- Comment #26 from bsouthey at gmail.com 2008-11-26 11:49 EST -------
Created an attachment (id=1113)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1113&action=view)
utils.py
From bugzilla-daemon at portal.open-bio.org Wed Nov 26 11:55:45 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 11:55:45 -0500
Subject: [Biopython-dev] [Bug 2685] HotRand provides an unnecessary function to convert hex to integer
In-Reply-To:
Message-ID: <200811261655.mAQGtjPA004778@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2685

bsouthey at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1086 is|0                           |1
           obsolete|                            |

------- Comment #4 from bsouthey at gmail.com 2008-11-26 11:55 EST -------
Created an attachment (id=1115)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1115&action=view)
Modified HotRand.hex_convert() function

Hopefully the last attempt to get the right version as a patch!

From bsouthey at gmail.com Wed Nov 26 12:10:57 2008
From: bsouthey at gmail.com (Bruce Southey)
Date: Wed, 26 Nov 2008 11:10:57 -0600
Subject: [Biopython-dev] Use of depreciated string functions
In-Reply-To: <320fb6e00811241042g646ff65fq61d3751537c882b1@mail.gmail.com>
References: <4926D17A.8080101@gmail.com> <320fb6e00811210726n94e277ex359d93de0855045e@mail.gmail.com> <492AE8A9.1000406@gmail.com> <320fb6e00811241042g646ff65fq61d3751537c882b1@mail.gmail.com>
Message-ID: <492D8321.2060301@gmail.com>

Peter wrote:
> On Mon, Nov 24, 2008 at 5:47 PM, Bruce Southey wrote:
>
>>> Once I've dealt with Biopython 1.49, I'd be happy to look at a patch
>>> to remove more "import string" usage from non-obsolete, non-deprecated
>>> code. It would be a little risky doing this to modules without unit
>>> tests, but that's another area you've shown some interest in anyway...
>>>
>>> Thanks,
>>>
>>> Peter
>>>
>> Hi,
>> I was planning to get started on these depending on what time I have
>> available. So just a quick question:
>> Do you want one bug report per patch per file?
>> Or just let me know if there is another way.
>>
>
> I'd suggest one general bug, and uploading one patch per module - that
> way they can be evaluated on a case by case basis (a single huge
> multi-file patch would be more difficult, and could become out of
> date).
>
> Personally however, I would prioritise more unit test coverage over
> this, but on the other hand it's the kind of short task you can handle
> when you have the odd spare 10 minutes. Up to you.
>
> Peter
>
Hi,
I have filed Bug 2688 as a general bug for the files in the Bio module that use the deprecated string functions. I listed all the files that I identified that imported string and whether or not I provided a patch for each. Bug 2683 lists those files that import string but do not use it. There is one attachment for each file (excluding mistakes).

In addition, Bugs 2684 and 2685 were created because these involve rewritten code that was related to this. I probably should have created a separate one for SubsMat/FreqTable.py, although the reason directly involves the string module.

Regards
Bruce

From bugzilla-daemon at portal.open-bio.org Wed Nov 26 20:23:32 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 20:23:32 -0500
Subject: [Biopython-dev] [Bug 2685] HotRand provides an unnecessary function to convert hex to integer
In-Reply-To:
Message-ID: <200811270123.mAR1NWWu011079@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2685

------- Comment #5 from mdehoon at ims.u-tokyo.ac.jp 2008-11-26 20:23 EST -------
As far as I can tell, the HotRand.hex_convert function is not used any more in Bio.HotRand or anywhere else in Biopython; its usage was lost in revision 1.3 of Bio.HotRand.
So I think that we can simply deprecate this function. If there are no objections, I'll add a DeprecationWarning and use Bruce's code in the meantime until the function is removed.

From bugzilla-daemon at portal.open-bio.org Wed Nov 26 22:06:59 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 22:06:59 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To:
Message-ID: <200811270306.mAR36xuB020451@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688

mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1088 is|0                           |1
           obsolete|                            |

------- Comment #27 from mdehoon at ims.u-tokyo.ac.jp 2008-11-26 22:06 EST -------
(From update of attachment 1088)
Committed to CVS.
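Returning to the HotRand.hex_convert discussion in bug 2685 above: the proposal there is to keep the function for now, have it emit a DeprecationWarning, and simplify its body. The sketch below shows one way that could look. The signature and the warning text are assumptions made for illustration; neither the real Bio.HotRand code nor Bruce's patch is reproduced here. The built-in int() with base 16 is what makes a hand-written hex converter unnecessary.

```python
import warnings


def hex_convert(hex_text):
    """Convert a hexadecimal string to an integer (hypothetical stand-in).

    Assumed signature: takes a string such as "ff" and returns an int.
    """
    # Tell callers the function is on its way out, pointing the warning
    # at the caller's line rather than at this function.
    warnings.warn(
        "hex_convert is deprecated; use int(text, 16) instead.",
        DeprecationWarning,
        stacklevel=2,
    )
    # The built-in int() already parses hexadecimal text.
    return int(hex_text, 16)
```

Callers would still get the same result as before, and anyone running with deprecation warnings enabled would be told to call int(text, 16) directly.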
From bugzilla-daemon at portal.open-bio.org Wed Nov 26 23:16:43 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 23:16:43 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To:
Message-ID: <200811270416.mAR4Gh40027250@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688

mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1089 is|0                           |1
           obsolete|                            |

------- Comment #28 from mdehoon at ims.u-tokyo.ac.jp 2008-11-26 23:16 EST -------
(From update of attachment 1089)
Committed to CVS

From bugzilla-daemon at portal.open-bio.org Wed Nov 26 23:29:01 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 23:29:01 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To:
Message-ID: <200811270429.mAR4T1tn027991@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688

mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1090 is|0                           |1
           obsolete|                            |

------- Comment #29 from mdehoon at ims.u-tokyo.ac.jp 2008-11-26 23:29 EST -------
(From update of attachment 1090)
Committed to CVS
From bugzilla-daemon at portal.open-bio.org Wed Nov 26 23:45:40 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 26 Nov 2008 23:45:40 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To:
Message-ID: <200811270445.mAR4jeph029067@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688

mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1091 is|0                           |1
           obsolete|                            |

------- Comment #30 from mdehoon at ims.u-tokyo.ac.jp 2008-11-26 23:45 EST -------
(From update of attachment 1091)
Committed to CVS

From bugzilla-daemon at portal.open-bio.org Thu Nov 27 01:54:12 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 27 Nov 2008 01:54:12 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To:
Message-ID: <200811270654.mAR6sC92005762@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688

mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1092 is|0                           |1
           obsolete|                            |

------- Comment #31 from mdehoon at ims.u-tokyo.ac.jp 2008-11-27 01:54 EST -------
(From update of attachment 1092)
Committed to CVS
From bugzilla-daemon at portal.open-bio.org Thu Nov 27 04:35:50 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 27 Nov 2008 04:35:50 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main Biopython distribution
In-Reply-To:
Message-ID: <200811270935.mAR9Zoxj019658@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671

------- Comment #8 from lpritc at scri.sari.ac.uk 2008-11-27 04:35 EST -------
(In reply to comment #7)
> (In reply to comment #0)
>
> > The major changes that have been made to the version previously available at
> > http://bioinf.scri.ac.uk/lp are:
>
> That's a very nice contribution, thank you!!!
> This link is wrong, I think you mean
> http://bioinf.scri.ac.uk/lp/programs.php#genomediagram

Thanks, Marco. You're absolutely correct - and people ought to be able to navigate to there from the link I gave. Thanks for posting the accurate link.

From lpritc at scri.ac.uk Thu Nov 27 04:33:43 2008
From: lpritc at scri.ac.uk (Leighton Pritchard)
Date: Thu, 27 Nov 2008 09:33:43 +0000
Subject: [Biopython-dev] blog article on GenomeDiagram in Biopython
In-Reply-To: <5aa3b3570811230933n2de8af3lf31d3c4b962930a3@mail.gmail.com>
Message-ID:

Thanks Giovanni,

On 23/11/2008 17:33, "Giovanni Marco Dall'Olio" wrote:

> I thought that the inclusion of GenomeDiagrams in biopython is such
> interesting news, that I wrote a blog post on it:
> - http://bioinfoblog.it/2008/11/genome-diagrams-included-in-biopython-150/

I left a comment there ;)

> I have used images from some tutorials without asking, I hope it is
> not a problem.

No problem at all - I think the old license covered it, and I'm pretty sure that the Biopython license will, too.
Even if they didn't, as the original copyright holder, I approve ;)

L.

--
Dr Leighton Pritchard MRSC
D131, Plant Pathology Programme, SCRI
Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
e:lpritc at scri.ac.uk w:http://www.scri.ac.uk/staff/leightonpritchard
gpg/pgp: 0xFEFC205C tel:+44(0)1382 562731 x2405

From bugzilla-daemon at portal.open-bio.org Thu Nov 27 04:57:00 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 27 Nov 2008 04:57:00 -0500
Subject: [Biopython-dev] [Bug 2381] translate and transcribe methods for the Seq object (in Bio.Seq)
In-Reply-To:
Message-ID: <200811270957.mAR9v0i0021623@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381

------- Comment #54 from lpritc at scri.sari.ac.uk 2008-11-27 04:56 EST -------
(In reply to comment #53)
> (In reply to comment #27)
> > Created an attachment (id=1032)
> > --> (http://bugzilla.open-bio.org/attachment.cgi?id=1032&action=view)
> > Patch to Bio/Seq.py to add start codon handling to translation
> >
> > Patch adds a new boolean argument to the translate method and function, called
> > "init" (rather than my earlier suggestions like "from_start" or "check_start"
> > which could be considered misleading). [...]
> I don't like the "init" argument either. I would call it force_initiator_Met
> instead. BTW, non-canonical initiator codon is CUG, where did you find UUG?

This may clarify things:

From the E. coli K-12 sequencing paper (http://dx.doi.org/10.1126/science.277.5331.1453):

"The distribution of start codons is as follows: ATG, 3542; GTG, 612; and TTG, 130. There is also one ATT and possibly a CTG"

It's not that unusual an occurrence, and there are a small number of known alternative start codons.

'Forcing' a Met start imposes the result that the first codon is a methionine, rather than checking that the first codon *could be* a methionine. I prefer the second behaviour.

L.
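The distinction drawn in comment #54 between forcing and checking can be sketched as below. This is not the Bio.Seq patch under discussion: the function names are hypothetical, and the set of accepted start codons is simply the three counted in the quoted E. coli K-12 figures (the single ATT and the possible CTG are left out).

```python
# Start codon counts quoted in comment #54 for E. coli K-12.
# Using them as the accepted set is an assumption made for this sketch.
START_CODON_COUNTS = {"ATG": 3542, "GTG": 612, "TTG": 130}


def could_be_start(cds, start_codons=START_CODON_COUNTS):
    """Return True if the first codon is one of the accepted start codons."""
    return cds[:3].upper() in start_codons


def first_residue_checked(cds):
    """'Check' behaviour: report M only when the first codon could initiate."""
    if not could_be_start(cds):
        raise ValueError("%r is not a recognised start codon" % cds[:3])
    return "M"


def first_residue_forced(cds):
    """'Force' behaviour: always report M, whatever the first codon is."""
    return "M"


def fraction_non_atg(counts=START_CODON_COUNTS):
    """Fraction of the counted start codons that are not ATG."""
    total = sum(counts.values())
    return (total - counts["ATG"]) / float(total)
```

With the quoted counts, 742 of 4284 start codons (roughly 17%) are not ATG. The checked version rejects a sequence such as "CCCAAA", while the forced version would silently report a methionine for it, which is the behaviour the comment argues against.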
From bugzilla-daemon at portal.open-bio.org Thu Nov 27 05:41:18 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 27 Nov 2008 05:41:18 -0500 Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions In-Reply-To: Message-ID: <200811271041.mARAfITj025395@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2688 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1093 is|0 |1 obsolete| | ------- Comment #32 from mdehoon at ims.u-tokyo.ac.jp 2008-11-27 05:41 EST ------- (From update of attachment 1093) Committed to CVS -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Nov 27 05:46:57 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 27 Nov 2008 05:46:57 -0500 Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions In-Reply-To: Message-ID: <200811271046.mARAkv9t025868@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2688 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1094 is|0 |1 obsolete| | ------- Comment #33 from mdehoon at ims.u-tokyo.ac.jp 2008-11-27 05:46 EST ------- (From update of attachment 1094) Committed to CVS -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Thu Nov 27 06:08:30 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 27 Nov 2008 06:08:30 -0500 Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions In-Reply-To: Message-ID: <200811271108.mARB8U6n027821@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2688 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1096 is|0 |1 obsolete| | ------- Comment #34 from mdehoon at ims.u-tokyo.ac.jp 2008-11-27 06:08 EST ------- (From update of attachment 1096) Fixed in CVS -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Nov 27 06:14:18 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 27 Nov 2008 06:14:18 -0500 Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions In-Reply-To: Message-ID: <200811271114.mARBEI5w028329@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2688 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1097 is|0 |1 obsolete| | ------- Comment #35 from mdehoon at ims.u-tokyo.ac.jp 2008-11-27 06:14 EST ------- (From update of attachment 1097) Committed to CVS -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From mjldehoon at yahoo.com Thu Nov 27 08:09:43 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Thu, 27 Nov 2008 05:09:43 -0800 (PST)
Subject: [Biopython-dev] Rethinking Biopython's testing framework
In-Reply-To: <20081125144041.GC83220@sobchak.mgh.harvard.edu>
Message-ID: <45956.75241.qm@web62406.mail.re1.yahoo.com>

> > However, more than half of Biopython's tests do
> > not actually make use of this testing framework:
> > [...]
> > These tests have trivial output, for example test_Cluster:
> >
> > test_Cluster
> > test_clusterdistance (test_Cluster.TestCluster) ... ok
> > test_distancematrix_kmedoids (test_Cluster.TestCluster) ... ok
> > test_kcluster (test_Cluster.TestCluster) ... ok
> > test_matrix_parse (test_Cluster.TestCluster) ... ok
> > test_median_mean (test_Cluster.TestCluster) ... ok
> > test_somcluster (test_Cluster.TestCluster) ... ok
> > test_treecluster (test_Cluster.TestCluster) ... ok
>
> They really do make use of the framework, but at a higher
> level. I agree that if you run a single test it makes little
> difference whether you use 'run_tests.py test_Cluster' or just
> run 'test_Cluster.py' directly. However, when you are
> running all the tests as is regularly done in development
> or before pushing releases, this comparison is important. It
> will pick out if you get a line like:
>
> test_clusterdistance (test_Cluster.TestCluster) ... ERROR
>
> instead of the expected ok and report this in the summary
> for all of the tests. Otherwise this is likely to get lost
> in all of the results.

Actually, I never use the summary produced by run_tests.py. I just check which
tests failed, and then fix them one by one by running the individual test
scripts.

> > I would therefore like to suggest to move from
> > Biopython's testing framework to Python's testing
> > framework.
> > This also relieves us of the
> > task of explaining Biopython's testing framework
> > to contributors, and allows us to make better use
> > of what Python already provides.
...
> Is the testing framework you are proposing different from
> the unit tests used in the individual tests?

I am proposing to use the regular Python unit testing framework as it is. This
means that most Biopython tests do not change at all (or only trivially). The
run_tests.py script will need to be modified though to remove the requirement
of having an output file for each individual test.

> How does your proposal
> manage the higher level functionality of checking if all sub-tests
> within one of the test suites pass?

If one of the sub-tests fails, Python's unit testing framework will tell us so,
though (perhaps) not exactly which sub-test fails. However, that is easy to
figure out just by running the individual test script by itself.

--Michiel

From bugzilla-daemon at portal.open-bio.org Thu Nov 27 08:33:46 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 27 Nov 2008 08:33:46 -0500
Subject: [Biopython-dev] [Bug 2683] Modules with unused string modules
In-Reply-To:
Message-ID: <200811271333.mARDXkHx009514@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2683

mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

------- Comment #5 from mdehoon at ims.u-tokyo.ac.jp 2008-11-27 08:33 EST -------
Fixed in CVS, thanks

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
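Returning to the testing thread above, and the question of whether Python's
framework can say which sub-test failed: a small sketch, using only the
standard library, of how a runner could collect the names of failing sub-tests
from a unittest result object. The names `make_example_case` and
`failing_test_ids` are made up for illustration, and this is not how
run_tests.py currently works.

```python
import io
import unittest


def make_example_case():
    """Build a throwaway TestCase with one passing and one failing sub-test."""

    class ExampleCase(unittest.TestCase):
        def test_passes(self):
            self.assertEqual(1 + 1, 2)

        def test_fails(self):
            self.assertEqual(1 + 1, 3)

    return ExampleCase


def failing_test_ids(test_case_class):
    """Run one TestCase class and return the ids of failed or errored sub-tests."""
    suite = unittest.TestLoader().loadTestsFromTestCase(test_case_class)
    stream = io.StringIO()  # keep the runner's report out of stdout
    result = unittest.TextTestRunner(stream=stream, verbosity=2).run(suite)
    # result.failures and result.errors are lists of (test, traceback) pairs
    return [test.id() for test, _traceback in result.failures + result.errors]
```

Calling `failing_test_ids(make_example_case())` gives a one-item list whose
entry ends in `test_fails`, so a summary script could print sub-test names
without an expected-output file.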
From bugzilla-daemon at portal.open-bio.org Thu Nov 27 09:38:04 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 27 Nov 2008 09:38:04 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main Biopython distribution
In-Reply-To:
Message-ID: <200811271438.mAREc4IG018238@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671

------- Comment #9 from lpritc at scri.sari.ac.uk 2008-11-27 09:38 EST -------
The revised color/colour code in AbstractDrawer.py causes all bar charts in
linear diagrams to be the default colour of light green. A fixed version of
AbstractDrawer is provided as an attachment.

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.

From bugzilla-daemon at portal.open-bio.org Thu Nov 27 09:39:37 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 27 Nov 2008 09:39:37 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main Biopython distribution
In-Reply-To:
Message-ID: <200811271439.mAREdbXp018415@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671

------- Comment #10 from lpritc at scri.sari.ac.uk 2008-11-27 09:39 EST -------
Created an attachment (id=1121) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1121&action=view)
Revised AbstractDrawer.py

This revision fixes a behaviour where bar charts for linear diagrams cannot be
changed from their default colour.

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.
From bugzilla-daemon at portal.open-bio.org Thu Nov 27 20:33:56 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 27 Nov 2008 20:33:56 -0500 Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions In-Reply-To: Message-ID: <200811280133.mAS1XuXq002406@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2688 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1098 is|0 |1 obsolete| | ------- Comment #36 from mdehoon at ims.u-tokyo.ac.jp 2008-11-27 20:33 EST ------- (From update of attachment 1098) Fixed in CVS -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Nov 27 20:52:10 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 27 Nov 2008 20:52:10 -0500 Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions In-Reply-To: Message-ID: <200811280152.mAS1qAR3003698@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2688 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1099 is|0 |1 obsolete| | ------- Comment #37 from mdehoon at ims.u-tokyo.ac.jp 2008-11-27 20:52 EST ------- (From update of attachment 1099) Fixed in CVS -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Thu Nov 27 21:27:29 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 27 Nov 2008 21:27:29 -0500 Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions In-Reply-To: Message-ID: <200811280227.mAS2RTea005795@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2688 ------- Comment #38 from mdehoon at ims.u-tokyo.ac.jp 2008-11-27 21:27 EST ------- (From update of attachment 1100) Fixed in CVS -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Nov 27 21:27:47 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 27 Nov 2008 21:27:47 -0500 Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions In-Reply-To: Message-ID: <200811280227.mAS2RlEg005835@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2688 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1100 is|0 |1 obsolete| | -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Thu Nov 27 21:55:11 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 27 Nov 2008 21:55:11 -0500 Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions In-Reply-To: Message-ID: <200811280255.mAS2tBTL007510@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2688 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1101 is|0 |1 obsolete| | ------- Comment #39 from mdehoon at ims.u-tokyo.ac.jp 2008-11-27 21:55 EST ------- (From update of attachment 1101) Fixed in CVS -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Nov 27 22:02:25 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 27 Nov 2008 22:02:25 -0500 Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions In-Reply-To: Message-ID: <200811280302.mAS32Pxh008177@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2688 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1102 is|0 |1 obsolete| | ------- Comment #40 from mdehoon at ims.u-tokyo.ac.jp 2008-11-27 22:02 EST ------- (From update of attachment 1102) Fixed in CVS -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Thu Nov 27 23:08:57 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 27 Nov 2008 23:08:57 -0500 Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions In-Reply-To: Message-ID: <200811280408.mAS48vaq012054@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2688 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1103 is|0 |1 obsolete| | ------- Comment #41 from mdehoon at ims.u-tokyo.ac.jp 2008-11-27 23:08 EST ------- (From update of attachment 1103) Fixed in CVS -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Nov 27 23:16:29 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 27 Nov 2008 23:16:29 -0500 Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions In-Reply-To: Message-ID: <200811280416.mAS4GThb012692@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2688 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1104 is|0 |1 obsolete| | ------- Comment #42 from mdehoon at ims.u-tokyo.ac.jp 2008-11-27 23:16 EST ------- (From update of attachment 1104) Fixed in CVS -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Thu Nov 27 23:22:37 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 27 Nov 2008 23:22:37 -0500 Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions In-Reply-To: Message-ID: <200811280422.mAS4MbVR013025@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2688 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1105 is|0 |1 obsolete| | ------- Comment #43 from mdehoon at ims.u-tokyo.ac.jp 2008-11-27 23:22 EST ------- (From update of attachment 1105) Fixed in CVS -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Nov 27 23:50:59 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 27 Nov 2008 23:50:59 -0500 Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions In-Reply-To: Message-ID: <200811280450.mAS4oxjC014450@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2688 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1106 is|0 |1 obsolete| | ------- Comment #44 from mdehoon at ims.u-tokyo.ac.jp 2008-11-27 23:50 EST ------- (From update of attachment 1106) Fixed in CVS -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Fri Nov 28 00:07:15 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 28 Nov 2008 00:07:15 -0500 Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions In-Reply-To: Message-ID: <200811280507.mAS57F3P015386@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2688 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1107 is|0 |1 obsolete| | ------- Comment #45 from mdehoon at ims.u-tokyo.ac.jp 2008-11-28 00:07 EST ------- (From update of attachment 1107) Fixed in CVS -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Nov 28 03:48:30 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 28 Nov 2008 03:48:30 -0500 Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions In-Reply-To: Message-ID: <200811280848.mAS8mUmr028058@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2688 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1108 is|0 |1 obsolete| | ------- Comment #46 from mdehoon at ims.u-tokyo.ac.jp 2008-11-28 03:47 EST ------- (From update of attachment 1108) Fixed in CVS -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Fri Nov 28 05:07:05 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 28 Nov 2008 05:07:05 -0500 Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions In-Reply-To: Message-ID: <200811281007.mASA751F001103@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2688 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1109 is|0 |1 obsolete| | ------- Comment #47 from mdehoon at ims.u-tokyo.ac.jp 2008-11-28 05:07 EST ------- (From update of attachment 1109) Fixed in CVS -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Nov 28 05:22:13 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 28 Nov 2008 05:22:13 -0500 Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions In-Reply-To: Message-ID: <200811281022.mASAMDwt002023@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2688 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1110 is|0 |1 obsolete| | ------- Comment #48 from mdehoon at ims.u-tokyo.ac.jp 2008-11-28 05:22 EST ------- (From update of attachment 1110) Fixed in CVS -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Fri Nov 28 05:29:16 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 28 Nov 2008 05:29:16 -0500 Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions In-Reply-To: Message-ID: <200811281029.mASATGhi002380@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2688 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1111 is|0 |1 obsolete| | ------- Comment #49 from mdehoon at ims.u-tokyo.ac.jp 2008-11-28 05:29 EST ------- (From update of attachment 1111) Fixed in CVS -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Nov 28 05:29:39 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 28 Nov 2008 05:29:39 -0500 Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions In-Reply-To: Message-ID: <200811281029.mASATdU5002440@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2688 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1112 is|0 |1 obsolete| | ------- Comment #50 from mdehoon at ims.u-tokyo.ac.jp 2008-11-28 05:29 EST ------- (From update of attachment 1112) Fixed in CVS -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Fri Nov 28 05:30:23 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 28 Nov 2008 05:30:23 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To:
Message-ID: <200811281030.mASAUNDX002501@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688

mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1113 is|0                           |1
           obsolete|                            |

------- Comment #51 from mdehoon at ims.u-tokyo.ac.jp 2008-11-28 05:30 EST -------
(From update of attachment 1113)
Fixed in CVS

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From biopython at maubp.freeserve.co.uk Fri Nov 28 06:09:30 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 28 Nov 2008 11:09:30 +0000
Subject: [Biopython-dev] Rethinking Biopython's testing framework
In-Reply-To: <45956.75241.qm@web62406.mail.re1.yahoo.com>
References: <20081125144041.GC83220@sobchak.mgh.harvard.edu> <45956.75241.qm@web62406.mail.re1.yahoo.com>
Message-ID: <320fb6e00811280309w7b5f0fc6m38795c4dc61c8744@mail.gmail.com>

Hello all,

Sorry for not replying earlier - I've been travelling and didn't get to check
my email as often as I had hoped. I'm going to reply to several points in this
one email...

Marco wrote:
> I was also proposing to use the doctest framework for some of the
> modules, and for enhancing documentation.
> http://bugzilla.open-bio.org/show_bug.cgi?id=2640

As Marco points out, there is also the option of using doctest, which we're
doing in some of the unit tests (e.g. test_wise.py). I like the idea of using
doctest where we want to include examples in the docstrings anyway.
Marco wasn't suggesting this, but just to be clear, I don't think we should use
JUST doctest for all our unit tests. Many test cases would make misleading
documentation, and having lots and lots of doctest examples would also hide
the important parts of the documentation. Additionally, doctests using input
files are not straightforward due to path issues.

Brad wrote:
> Agreed with the distinction between the unit tests and the "dump
> lots of text and compare" approach. I've written both and do think
> the unit testing/assertion model is more robust since you can go
> back and actually get some insight into what someone was thinking
> when they wrote an assertion.

I have probably written more of the "dump lots of text and compare" style
tests. I think these have a number of advantages:

(1) Easier for beginners to write a test: you can almost take any example
script and use that. You don't have to learn the unit test framework.
(2) Debugging a failing test in IDLE is much easier - using unit tests you have
all that framework between you and the local scope where the error happens.
(3) For many broad tests, manually setting up the expected output for an assert
is extremely tedious (e.g. parsing sequences and checking their checksums).

We could discuss a modification to run_tests.py so that if there is no expected
output file output/test_XXX for test_XXX.py we just run test_XXX.py and check
its return value (I think Michiel had previously suggested something like
this). Perhaps for more robustness, capture the output and compare it to a
predefined list of regular expressions covering the typical outputs. For
example, looking at output/test_Cluster, the first line is the test name, but
the rest follows the pattern "test_... ok". I imagine only a few output styles
exist.

With such a change, half the unit tests (e.g. test_Cluster.py) wouldn't need
their output file in CVS (output/test_Cluster).
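The regular expression idea could look roughly like the sketch below. The two
patterns and the function name `output_looks_ok` are assumptions for
illustration, based only on the output/test_Cluster sample quoted earlier in
this thread; other tests may well print other styles.

```python
import re

# Assumed layout: a first line naming the test module, then one line per
# sub-test in unittest's verbose style, e.g.
#   test_kcluster (test_Cluster.TestCluster) ... ok
HEADER = re.compile(r"^test_\w+$")
SUBTEST_OK = re.compile(r"^test_\w+ \([\w.]+\) \.\.\. ok$")


def output_looks_ok(output):
    """Return True if captured output is a header followed only by 'ok' lines."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    if not lines or not HEADER.match(lines[0]):
        return False
    # Note: output with a header and no sub-test lines is accepted as well.
    return all(SUBTEST_OK.match(line) for line in lines[1:])
```

A line ending in ERROR or FAIL does not match the second pattern, so the whole
output would be reported as a failure without needing a stored output file.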
Michiel de Hoon wrote:
> If one of the sub-tests fails, Python's unit testing framework will tell us so,
> though (perhaps) not exactly which sub-test fails. However, that is easy to
> figure out just by running the individual test script by itself.

That won't always work. Consider intermittent network problems, or tests using
random data - in general it really is worthwhile having run_tests.py report a
little more than just which test_XXX.py module failed.

Peter

From bugzilla-daemon at portal.open-bio.org Fri Nov 28 06:53:36 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 28 Nov 2008 06:53:36 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To:
Message-ID: <200811281153.mASBra4q008163@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688

------- Comment #52 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-28 06:53 EST -------
Although I had offered to look over the patches, it looks like Michiel has
reviewed and committed them all while I was away, so I don't have to ;)

Thank you both! Can we close this bug now?

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Fri Nov 28 06:57:35 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 28 Nov 2008 06:57:35 -0500 Subject: [Biopython-dev] [Bug 2685] HotRand provides an unnecessary function to convert hex to integer In-Reply-To: Message-ID: <200811281157.mASBvZ6A008475@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2685 ------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-28 06:57 EST ------- (In reply to comment #5) > As far as I can tell, the HotRand.hex_convert function is not used any more in > Bio.HotRand or anywhere else in Biopython; its usage was lost in revision 1.3 > of Bio.HotRand. So I think that we can simply deprecate this function. If there > are no objections, I'll add a DeprecationWarning and use Bruce's code in the > mean time until the function is removed. +1 on this plan. (I was going to say we should deprecate this function rather than removing it, but you'd already covered that). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Nov 28 07:05:14 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 28 Nov 2008 07:05:14 -0500 Subject: [Biopython-dev] [Bug 2677] BioSQL seqfeature enhancements In-Reply-To: Message-ID: <200811281205.mASC5EY8009077@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2677 ------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-28 07:05 EST ------- (In reply to comment #7) > (In reply to comment #6) > > (From update of attachment 1072 [details] [details]) > > I think this is still a big improvement, but that the > > (sub)feature.location_operator issue could wait. 
> > We'll need to discuss on the
> > BioSQL mailing list how this should be handled consistently.
> >
> > Leaving this bug open.
>
> Further to the "where to put the (sub)feature.location_operator" (eg. "join",
> "order") question, this comment appears in the BioPerl MySQL schema for the
> location_qualifier_value table:
>
> -- location qualifiers - mainly intended for fuzzies but anything
> -- can go in here
> -- some controlled vocab terms have slots;
>
> So, this would seem a suitable place to store the attribute.
>

Yes, but if we record something in the location_qualifier_value table we can't
use a NULL term_id (possibly a schema limitation). We therefore need to use a
particular ontology, which is where some co-ordination with the other BioSQL
projects is needed (so that we all default to the same ontology).

I'd meant to send off an email about this to the BioSQL mailing list but didn't
get it done before I had to leave for a trip.

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Fri Nov 28 07:24:19 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 28 Nov 2008 07:24:19 -0500
Subject: [Biopython-dev] [Bug 2684] GenBank/__init__.py: Removing loop over string.whitespace
In-Reply-To:
Message-ID: <200811281224.mASCOJSg010226@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2684

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-28 07:24 EST -------
Marking as fixed - I've checked in a simplified version of your patch. See
Bio/GenBank/__init__.py revision 1.98 in CVS.
http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/GenBank/__init__.py?cvsroot=biopython Thanks Bruce. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Fri Nov 28 07:37:04 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 28 Nov 2008 12:37:04 +0000 Subject: [Biopython-dev] [BioPython] PubMed find_related In-Reply-To: <580790.81356.qm@web62404.mail.re1.yahoo.com> References: <580790.81356.qm@web62404.mail.re1.yahoo.com> Message-ID: <320fb6e00811280437w8f9f3d2t84716f7a554b913@mail.gmail.com> On Tue, Nov 25, 2008 at 4:05 AM, Michiel de Hoon wrote:
>>>> from Bio import Entrez
>>>> handle = Entrez.elink(dbfrom='pubmed',id=12345)
>>>> record = Entrez.read(handle)
>
> Feel free to write a section about Entrez.elink for the Biopython documentation :-).
> Currently, this section is almost empty.

This does need a little love, doesn't it. Here is a slightly longer example which could form the basis of a tutorial entry:

>>> from Bio import Entrez
>>> Entrez.email = "A.N.Other@example.com"
>>> pmid = "12230038"
>>> handle = Entrez.elink(dbfrom='pubmed', id=pmid)
>>> result = Entrez.read(handle)
>>> for link in result[0]["LinkSetDb"][0]['Link'] :
...     print link

The deeply nested nature of the XML results does suggest that a helper function in Bio.Entrez would be useful here. Maybe something like:

def find_related(dbfrom, id) :
    # Returns a list of dictionaries containing Score and ID matched
    result = read(elink(dbfrom=dbfrom, id=id))
    return result[0]["LinkSetDb"][0]['Link']

It might make more sense to return just a list of ID strings, but the score may be interesting.
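As a rough sketch of the "just a list of ID strings" variant mentioned above: the function below works on plain nested lists and dictionaries shaped like the result indexed in the example. The name related_ids, the "Id" key, and the sample IDs and scores are assumptions made for illustration; this is not an existing Bio.Entrez function, and no network call is made.

```python
def related_ids(result):
    """Return the Id of each link in the first LinkSetDb of the first link set.

    `result` is assumed to look like the parsed elink output shown above:
    a list of dictionaries, each with a "LinkSetDb" list whose entries
    carry a "Link" list of dictionaries.
    """
    link_set_dbs = result[0].get("LinkSetDb", [])
    if not link_set_dbs:
        # No related records were reported for this query.
        return []
    return [link["Id"] for link in link_set_dbs[0]["Link"]]


# Hypothetical data standing in for Entrez.read(handle); IDs and scores are made up.
sample = [{"LinkSetDb": [{"Link": [{"Id": "111", "Score": "90"},
                                   {"Id": "222", "Score": "75"}]}]}]
print(related_ids(sample))
```

Returning an empty list when there is no LinkSetDb entry avoids an IndexError for records without related links, which the one-line indexing above would raise.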
Peter From biopython at maubp.freeserve.co.uk Fri Nov 28 08:05:38 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 28 Nov 2008 13:05:38 +0000 Subject: [Biopython-dev] Bio.Entrez batched downloads Message-ID: <320fb6e00811280505m3b065877r99785f306a356aa@mail.gmail.com> This is returning to a topic we've discussed in the past - the NCBI Entrez API is quite low level, and the Bio.Entrez module reflects this. As a result certain "typical" tasks require more code than one might expect. In particular, batched downloads of a large result set. The tutorial covers using Bio.Entrez.efetch in a loop to download a result set in batches, for example writing out a MedLine or FASTA format file. This seems like a common need - starting either from a list of IDs, or better from a history webenv and query_key. I think there is a use for a Bio.Entrez.batched_efetch or download_many function to save people re-implementing their own batched downloader (even just as a copy and paste from the tutorial). If the NCBI ever give any explicit guidance on batch sizes then we can update Biopython centrally - rather than individual scripts requiring changes everywhere. We might also be able to include some basic error checking too (e.g. empty or partial downloads). One catch is that downloading and concatenating batches as XML files does not give a valid XML file - but this is safe for MedLine, FASTA, GenBank etc. This proposed function could raise an exception if used with XML to avoid this issue. In terms of the API for getting the data back, there are several options:
* Take an output handle as an argument (which would be written to as each batch was downloaded)
* Return a handle - the implementation would be a bit more complicated as we should avoid holding everything in memory, but would then be very similar to the existing Bio.Entrez.efetch function in its usage.
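A minimal sketch of the first option (take an output handle), under stated assumptions: batched_fetch and its parameters are hypothetical names, not part of Bio.Entrez, and the actual download is passed in as a callable so the example runs without network access. In real use that callable might wrap Bio.Entrez.efetch; here a fake one returns FASTA text.

```python
import io


def batched_fetch(fetch, ids, out_handle, batch_size=100, rettype="fasta"):
    """Write the records for `ids` to `out_handle`, one batch at a time.

    `fetch(batch, rettype)` is a caller-supplied function returning text
    for a list of IDs (an assumption of this sketch).
    """
    if rettype.lower() == "xml":
        # Concatenated XML batches would not form one valid XML document.
        raise ValueError("XML output cannot be concatenated batch by batch")
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        data = fetch(batch, rettype)
        if not data:
            # Basic error check for an empty download.
            raise IOError("empty download for batch starting at index %d" % start)
        out_handle.write(data)


def fake_fetch(batch, rettype):
    # Stand-in for a real download; returns one tiny FASTA record per ID.
    return "".join(">%s\nACGT\n" % record_id for record_id in batch)


out = io.StringIO()
batched_fetch(fake_fetch, ["1", "2", "3"], out, batch_size=2)
print(out.getvalue())
```

Only one batch is held in memory at a time, which is the memory concern raised against returning a single string.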
Other options which I don't like:
* Take an output filename (less flexible than just taking an output handle)
* Return the data as a string (memory concerns with large downloads)
Note that related functions like the deprecated Bio.PubMed.download_many (and early versions of Bio.GenBank.download_many) used a complicated function callback mechanism (which required knowing the file format in advance and having a parser for it). This doesn't seem sensible for a generic function. Currently Bio.GenBank.download_many (obsolete, soon to be deprecated) just makes a single call to Bio.Entrez.efetch, regardless of the number of records / amount of data expected. Peter From biopython at maubp.freeserve.co.uk Fri Nov 28 12:26:45 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 28 Nov 2008 17:26:45 +0000 Subject: [Biopython-dev] Deprecation and removal policy Message-ID: <320fb6e00811280926v16454fa6t891fcc74e4fa4729@mail.gmail.com> Back on 27 June 2008, in preparation for what became Biopython 1.47, Michiel wrote: > In recent releases, we have been using the rule of thumb to remove all > modules from a new Biopython release that were deprecated two > releases ago. I was thinking that when we made releases about six months apart, this rule of thumb effectively gave a year's warning. Recently we've made releases roughly every three months, which translates to only about six months' warning, so I think we should be a little more restrained in removing deprecated code in future. As an example, Bio.EUtils was deprecated in favour of Bio.Entrez in Release 1.48 (Sept 2008). Under the old rule of thumb, we could remove this module from CVS now (as the deprecation was present in Biopython 1.48 and 1.49). If we release Biopython 1.50 in January or February 2009 (for the sake of argument), that means the deprecation would have been in place for only four or five months - which seems too rash.
How about a new policy that after adding a deprecation warning, deprecated modules/functions are kept for at least two public releases AND at least 12 months (counting from the first release when they are deprecated - not the date of the CVS change) before being removed? Peter From bugzilla-daemon at portal.open-bio.org Fri Nov 28 15:10:43 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 28 Nov 2008 15:10:43 -0500 Subject: [Biopython-dev] [Bug 2677] BioSQL seqfeature enhancements In-Reply-To: Message-ID: <200811282010.mASKAhuK012846@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2677 ------- Comment #9 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-28 15:10 EST ------- (In reply to comment #8) > Yes, but if we record something in the location_qualifier_value table we can't > use a NULL term_id (possibly a schema limitation). We therefore need to use a > particular ontology, which is where some co-ordination with the other BioSQL > projects is needed (so that we all default to the same ontology). I'd meant > to send of an email about this to the BioSQL mailing list but didn't get it > done before I had to leave for a trip. I've started a discussion on the BioSQL mailing list, see this thread: http://lists.open-bio.org/pipermail/biosql-l/2008-November/001412.html - me http://lists.open-bio.org/pipermail/biosql-l/2008-November/001414.html - Richard from BioJava http://lists.open-bio.org/pipermail/biosql-l/2008-November/001413.html - me etc. Cymon - if you haven't already done so, I would encourage you to sign up to the BioSQL mailing list. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Fri Nov 28 23:48:46 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 28 Nov 2008 23:48:46 -0500 Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions In-Reply-To: Message-ID: <200811290448.mAT4mkmI008416@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2688 ------- Comment #53 from mdehoon at ims.u-tokyo.ac.jp 2008-11-28 23:48 EST ------- (In reply to comment #52) > Can we close this bug now? > Not yet, there are a few more things to consider in the original description. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sat Nov 29 00:01:12 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 29 Nov 2008 00:01:12 -0500 Subject: [Biopython-dev] [Bug 2685] HotRand provides an unnecessary function to convert hex to integer In-Reply-To: Message-ID: <200811290501.mAT51ClZ009532@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2685 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #7 from mdehoon at ims.u-tokyo.ac.jp 2008-11-29 00:01 EST ------- I used Bruce's patch and added a DeprecationWarning to the hex_convert function, and modified the unit test accordingly. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
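For readers unfamiliar with the mechanism, a minimal sketch of a deprecated hex-to-integer helper follows. The signature and body are assumptions for illustration (the committed Bio.HotRand code and Bruce's patch are not shown in this thread); the only facts relied on are that the built-in int() accepts a base argument and that warnings.warn can issue a DeprecationWarning.

```python
import warnings


def hex_convert(hex_string):
    """Convert a hexadecimal string to an integer (deprecated sketch)."""
    warnings.warn(
        "hex_convert is deprecated; call int(text, 16) directly instead",
        DeprecationWarning,
        stacklevel=2,  # point the warning at the caller, not at this function
    )
    return int(hex_string, 16)


# Capture the warning explicitly, much as a unit test might.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    value = hex_convert("ff")

print(value, [str(w.message) for w in caught])
```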
From mjldehoon at yahoo.com Sat Nov 29 00:13:33 2008 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 28 Nov 2008 21:13:33 -0800 (PST) Subject: [Biopython-dev] Bio.Entrez batched downloads In-Reply-To: <320fb6e00811280505m3b065877r99785f306a356aa@mail.gmail.com> Message-ID: <432417.5854.qm@web62405.mail.re1.yahoo.com> Sorry, but I am -1 on this. This sounds like software bloat to me. The reason that the NCBI Entrez API is low level is that they are unable to predict how users will want to use the NCBI Entrez. We as Biopython know little more than NCBI, except that our users want to access NCBI Entrez via Python, so we provide a Python interface to NCBI Entrez. Also, I don't think that the current situation is unsatisfactory. The Bio.Entrez API is extremely simple, and with an example in the tutorial it should be very easy to use; I don't see a problem with copying and pasting from the tutorial, provided that sufficient information is available there. --Michiel. --- On Fri, 11/28/08, Peter wrote: > From: Peter > Subject: [Biopython-dev] Bio.Entrez batched downloads > To: "BioPython-Dev Mailing List" > Date: Friday, November 28, 2008, 8:05 AM > This is returning to a topic we've discussed in the past > - the NCBI > Entrez API is quite low level, and the Bio.Entrez module > reflects > this. As a result certain "typical" tasks > require more code than one > might expect. In particular, batched downloads of a large > result set. > > The tutorial covers using Bio.Entrez.efetch in a loop to > download a > result set in a batch, for example writing out a MedLine or > FASTA > format file. This seems like a common need - starting > either from a > list of IDs, or better from a history webenv and query_key. > I think > there is a use for a Bio.Entrez.batched_efetch or > download_many > function to save people re-implementing their own batched > downloader > (even just as a copy and paste from the tutorial). 
> > If the NCBI every give any explicit guidance on batch sizes > then we > can update Biopython centrally - rather than individual > scripts > requiring changes everywhere. We might also be able to > include some > basic error checking to (e.g. empty or partial downloads). > One catch > is that downloading and concatenating batches as XML files > does not > give a valid XML file - but this is safe for MedLine, > FASTA, GenBank > etc. This proposed function could raise an exception if > used with XML > to avoid this issue. > > In terms of the API for getting the data back, there are > several options > * Take an output handle as an argument (which would be > written to as > each batch was downloaded) > * Return a handle - the implementation would be a bit more > complicated > as we should avoid holding everything in memory, but would > then be > very similar to the existing Bio.Entrez.efetch function in > its usage. > > Other options which I don't like: > * Take an output filename (less flexible than just taking > an output handle) > * Return the data as a string (memory concerns with large > downloads) > > Note that related functions like the deprecated > Bio.PubMed.download_many (and early versions of > Bio.GenBank.download_many) used a complicated function call > back > mechanism (which required knowing the file format in > advance and > having a parser for it). This doesn't seem sensible > for a generic > function. Currently Bio.GenBank.download_many (obsolete, > soon to be > deprecated) just makes a single call to Bio.Entrez.efetch, > regardless > of the number of records / amount of data expected. 
> > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From mjldehoon at yahoo.com Sat Nov 29 00:22:10 2008 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 28 Nov 2008 21:22:10 -0800 (PST) Subject: [Biopython-dev] [BioPython] PubMed find_related In-Reply-To: <320fb6e00811280437w8f9f3d2t84716f7a554b913@mail.gmail.com> Message-ID: <246349.44664.qm@web62404.mail.re1.yahoo.com> > The deeply nested nature of the XML results do suggest that > a helper function in Bio.Entrez would be useful here. Maybe > something like: > > def find_related(dbfrom, id) : > #Returns a list of dictionaries containing Score and ID > # matched > result = read(elink(dbfrom=dbfrom, id=id)) > return result[0]["LinkSetDb"][0]['Link'] > > It might make more sense to return just a list of ID > strings, but the score may be interesting. > The problem this user encountered was that the DeprecationWarning in PubMed.find_related function contained very little information and did not mention that Entrez.elink is the appropriate function to use: "Find related articles in PubMed, returns an ID list (DEPRECATED). Please use Bio.Entrez instead as described in the Biopython Tutorial." and in addition that currently the description of Bio.Entrez.elink in the tutorial is almost empty. Instead of adding a function to Bio.Entrez that helps this particular user, we should improve our documentation to enable all users to use Bio.Entrez appropriately. The set of helper functions to Bio.Entrez that we could write is virtually endless; we should not go down that path. --Michiel. 
From bugzilla-daemon at portal.open-bio.org Sat Nov 29 01:02:01 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 29 Nov 2008 01:02:01 -0500 Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions In-Reply-To: Message-ID: <200811290602.mAT621Lc012846@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2688 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #54 from mdehoon at ims.u-tokyo.ac.jp 2008-11-29 01:02 EST ------- All fixed now; I hope I didn't screw up anything. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From mjldehoon at yahoo.com Sat Nov 29 02:04:33 2008 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 28 Nov 2008 23:04:33 -0800 (PST) Subject: [Biopython-dev] [BioPython] PubMed find_related In-Reply-To: <320fb6e00811280437w8f9f3d2t84716f7a554b913@mail.gmail.com> Message-ID: <652169.76582.qm@web62406.mail.re1.yahoo.com> I've expanded your example a bit and added it to the documentation of Entrez.elink. Thanks! --Michiel. --- On Fri, 11/28/08, Peter wrote: > From: Peter > Subject: Re: [BioPython] PubMed find_related > To: mjldehoon at yahoo.com > Cc: "BioPython-Dev Mailing List" > Date: Friday, November 28, 2008, 7:37 AM > On Tue, Nov 25, 2008 at 4:05 AM, Michiel de Hoon > wrote: > >>>> from Bio import Entrez > >>>> handle = > Entrez.elink(dbfrom='pubmed',id=12345) > >>>> record = Entrez.read(handle) > > > > Feel free to write a section about Entrez.elink for > the Biopython documentation :-). > > Currently, this section is almost empty. > > This does need a little love, doesn't it. 
Here is a > slightly longer > example which could form the basis of a tutorial entry: > > >>> from Bio import Entrez > >>> Entrez.email = > "A.N.Other at example.com" > >>> pmid = "12230038" > >>> handle = > Entrez.elink(dbfrom='pubmed', id=pmid) > >>> result = Entrez.read(handle) > >>> for link in > result[0]["LinkSetDb"][0]['Link'] : > ... print link > > The deeply nested nature of the XML results do suggest that > a helper > function in Bio.Entrez would be useful here. Maybe > something like: > > def find_related(dbfrom, id) : > #Returns a list of dictionaries containing Score and ID > matched > result = read(elink(dbfrom=dbfrom, id=id)) > return > result[0]["LinkSetDb"][0]['Link'] > > It might make more sense to return just a list of ID > strings, but the > score may be interesting. > > Peter From biopython at maubp.freeserve.co.uk Sat Nov 29 08:36:16 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sat, 29 Nov 2008 13:36:16 +0000 Subject: [Biopython-dev] [BioPython] PubMed find_related In-Reply-To: <246349.44664.qm@web62404.mail.re1.yahoo.com> References: <320fb6e00811280437w8f9f3d2t84716f7a554b913@mail.gmail.com> <246349.44664.qm@web62404.mail.re1.yahoo.com> Message-ID: <320fb6e00811290536n7fe25b0fxfe78d52b16014a92@mail.gmail.com> On Sat, Nov 29, 2008 at 5:22 AM, Michiel de Hoon wrote: > > The problem this user encountered was that the DeprecationWarning in > PubMed.find_related function contained very little information and did > not mention that Entrez.elink is the appropriate function to use: > > "Find related articles in PubMed, returns an ID list (DEPRECATED). > Please use Bio.Entrez instead as described in the Biopython Tutorial." We could make the deprecation warnings from Bio.PubMed (and the online bits of Bio.GenBank) a little more explicit about which bits of Bio.Entrez to use. I made a start on updating Bio/PubMed.py on my work computer on Friday, so I'll try to remember to finish this off on Monday. 
> and in addition that currently the description of Bio.Entrez.elink in the > tutorial is almost empty. Instead of adding a function to Bio.Entrez > that helps this particular user, we should improve our documentation > to enable all users to use Bio.Entrez appropriately. The tutorial update for elink looks good (see below). > The set of helper functions to Bio.Entrez that we could write is > virtually endless; we should not go down that path. I take your point - there are lots of possible helper functions we could consider. As long as we cover the typical use cases in the tutorial that should be enough. On Sat, Nov 29, 2008 at 7:04 AM, Michiel de Hoon wrote: > I've expanded your example a bit and added it to the documentation of Entrez.elink. Thanks! > > --Michiel. That looks good - and tries to explain the nested result structure too. Peter From bsouthey at gmail.com Sun Nov 30 21:37:05 2008 From: bsouthey at gmail.com (Bruce Southey) Date: Sun, 30 Nov 2008 20:37:05 -0600 Subject: [Biopython-dev] Deprecation and removal policy In-Reply-To: <320fb6e00811280926v16454fa6t891fcc74e4fa4729@mail.gmail.com> References: <320fb6e00811280926v16454fa6t891fcc74e4fa4729@mail.gmail.com> Message-ID: On Fri, Nov 28, 2008 at 11:26 AM, Peter wrote: > Back on 27 June 2008, in preparation for what became Biopython 1.47, > Michiel wrote: >> In recent releases, we have been using the rule of thumb to remove all >> modules from a new Biopython release that were deprecated two >> releases ago. > > I was thinking that when we made releases about six months apart, this > rule of thumb effectively gave a year's warning. Recently we're made > releases roughly every three months, which translates to only about > six months warning, so I think we should be a little more restrained > in removing deprecated code in future. > > As an example, Bio.EUtils was deprecated in favour of Bio.Entrez in > Release 1.48 (Sept 2009). 
Under the old rule of thumb, we could > remove this module from CVS now (as the deprecation was present in > Biopython 1.48 and 1.49). If we release Biopython 1.50 in January or > February 2009 (for the sake of argument), that means the deprecation > would have been in place for only four or five months - which seems > too rash. > > How about a new policy that after adding a deprecation warning, > deprecated modules/functions are kept for at least two public releases > AND at least 12 months (counting from the first release when they are > deprecated - not the date of the CVS change) before being removed? > > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > Hi, Generally I would agree with the idea for code that is under active development. For certain code that has not really been touched for a few years except for trivial changes (like removing string functions), I think 12 months is perhaps too long if it passes two releases. Regardless of how it is done, Python 3 will need to be supported (the final release is due soon) and I do not see a reason to port deprecated modules or functions just because of some policy. So I would add the provision that deprecated code will not be ported to the Python 3 compatible Biopython branch.
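To make the proposed rule concrete (at least two public releases AND at least 12 months, counted from the first release carrying the deprecation), here is a small sketch. The function name may_remove and the first-of-month release dates are illustrative placeholders, not an actual Biopython tool or exact release dates.

```python
from datetime import date


def may_remove(deprecated_in, releases, today, min_releases=2, min_months=12):
    """Check the proposed removal rule for something deprecated in `deprecated_in`.

    `releases` is a chronological list of (version, release_date) pairs.
    """
    versions = [version for version, _ in releases]
    index = versions.index(deprecated_in)
    # Number of public releases that have shipped with the deprecation warning.
    releases_with_warning = len(releases) - index
    first_date = releases[index][1]
    # Whole months elapsed since the first release carrying the deprecation.
    months = (today.year - first_date.year) * 12 + (today.month - first_date.month)
    if today.day < first_date.day:
        months -= 1
    return releases_with_warning >= min_releases and months >= min_months


# Placeholder dates (first of the month), for illustration only.
releases = [("1.47", date(2008, 7, 1)),
            ("1.48", date(2008, 9, 1)),
            ("1.49", date(2008, 11, 1))]

old_rule = may_remove("1.48", releases, date(2008, 11, 28), min_months=0)
new_rule = may_remove("1.48", releases, date(2008, 11, 28))
a_year_on = may_remove("1.48", releases, date(2009, 9, 1))
print(old_rule, new_rule, a_year_on)
```

With these placeholder dates the old two-release rule already allows removal in late November 2008, while the proposed rule does not until twelve months after the 1.48 release.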
Bruce
From bugzilla-daemon at portal.open-bio.org Sat Nov 1 21:22:53 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 1 Nov 2008 17:22:53 -0400 Subject: [Biopython-dev] [Bug 2592] numpy migration for Bio.PDB.Vector In-Reply-To: Message-ID: <200811012122.mA1LMrf6021694@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2592 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-01 17:22 EST ------- Fixed in CVS, see Bio/PDB/Vector.py revision 1.45 -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sat Nov 1 22:11:47 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 1 Nov 2008 18:11:47 -0400 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200811012211.mA1MBl3b026482@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 ------- Comment #26 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-01 18:11 EST ------- Here is an example of how the updated Seq object might be used (taken from the new edition of the tutorial in CVS): >>> from Bio.Seq import Seq >>> from Bio.Alphabet import IUPAC >>> coding_dna = Seq("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG", IUPAC.unambiguous_dna) >>> coding_dna.translate() Seq('MAIVMGR*KGAR*', HasStopCodon(IUPACProtein(), '*')) >>> coding_dna.translate(to_stop=True) Seq('MAIVMGR', IUPACProtein()) Using the Vertebrate Mitochondrial table instead: >>> coding_dna.translate(table="Vertebrate Mitochondrial") Seq('MAIVMGRWKGAR*', 
HasStopCodon(IUPACProtein(), '*')) >>> coding_dna.translate(table=2) Seq('MAIVMGRWKGAR*', HasStopCodon(IUPACProtein(), '*')) >>> coding_dna.translate(table=2, to_stop=True) Seq('MAIVMGRWKGAR', IUPACProtein()) As I said in comment 24, the name "to_stop" and its behaviour are taken from the old (now obsolete) Bio.Translate module. ------------------------------------------------------------- I'm also considering adding an additional boolean argument too (see comment 22): > Validate the first codon is a valid start codon, and translate > it as M (even if going on the genetic code it would normally be > say L). This should be a boolean argument defaulting to False, > possible names "start", "check_start", "from_start", ... I would prefer to avoid calling this argument "start" given the existing meaning associated with "start" and "end" used in python strings (for specifying a sub-sequence to be translated - discussed earlier on this bug). This would be especially useful for translating a gene/CDS sequence into protein where making sure a non-standard start codon is translated as "M" is non-trivial. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Mon Nov 3 11:17:59 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 3 Nov 2008 06:17:59 -0500 Subject: [Biopython-dev] [Bug 2638] New: test_PopGen_SimCoal_nodepend.py fails on Windows do to newline issue Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2638 Summary: test_PopGen_SimCoal_nodepend.py fails on Windows do to newline issue Product: Biopython Version: Not Applicable Platform: PC OS/Version: Windows Status: NEW Severity: normal Priority: P2 Component: Unit Tests AssignedTo: biopython-dev at biopython.org ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk This unit test attempts to regenerate a plain text SimCoal file, and currently fails on Windows (but passes on Linux and Mac OS X). Patch to follow. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Nov 3 11:22:16 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 3 Nov 2008 06:22:16 -0500 Subject: [Biopython-dev] [Bug 2638] test_PopGen_SimCoal_nodepend.py fails on Windows do to newline issue In-Reply-To: Message-ID: <200811031122.mA3BMGwX013481@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2638 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-03 06:22 EST ------- Created an attachment (id=1030) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1030&action=view) Patch to the PopGen/SimCoal/Template.py and the unit test Looking at the code, rather than using \n to mean a platform aware new line, \r\n is used (this doesn't always give a CR LF, but on Windows you get CR CR LF instead). Also, are the template files in CVS as plain text files or binary files? 
I haven't double checked but I think they may be checked in as binary files with DOS/Windows new lines... I haven't committed this as I don't have SIMCOAL installed to check there are no side effects. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Nov 3 11:22:53 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 3 Nov 2008 06:22:53 -0500 Subject: [Biopython-dev] [Bug 2638] test_PopGen_SimCoal_nodepend.py fails on Windows, newline issue In-Reply-To: Message-ID: <200811031122.mA3BMr8B013540@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2638 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Summary|test_PopGen_SimCoal_nodepend|test_PopGen_SimCoal_nodepend |.py fails on Windows do to |.py fails on Windows, |newline issue |newline issue ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-03 06:22 EST ------- Removed typo in the bug summary (title). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From biopython at maubp.freeserve.co.uk Mon Nov 3 11:48:06 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 3 Nov 2008 11:48:06 +0000 Subject: [Biopython-dev] New line issues in the source zip or tarballs In-Reply-To: <320fb6e00809080514u5df6d9dej144c783076cbe467@mail.gmail.com> References: <320fb6e00809060304h429f1085r301170aa93d4eb73@mail.gmail.com> <6d941f120809080442r1797666eu70e35c60353c5462@mail.gmail.com> <320fb6e00809080514u5df6d9dej144c783076cbe467@mail.gmail.com> Message-ID: <320fb6e00811030348vb7b6068v549ebfab9f6ec76b@mail.gmail.com> On Mon, Sep 8 Peter wrote: > Tiago wrote: >> Peter wrote: >>> In the case of test_PopGen_SimCoal_nodepend.py the failure comes from >>> expecting simple.par and simple_100_30.par to be exactly the same size >>> (in class TemplateTest, line 47). This is not going to be true >>> when the input file uses Unix new lines but the generated file uses >>> Windows new lines. Perhaps using a simple bit of code to load the >>> files line by line and compare them would work here? >> >> I am currently at a workshop (I belong to the organization committee, so I >> don't have much time), but I will try to sort this in the next couple of >> days. > > This new line issue has probably been there since Biopython 1.45 > without anyone else spotting it, so I don't see fixing it as urgent. > Hopefully we can resolve this for the next release instead. I've filed Bug 2638 on this with a possible patch. Could you take a look at this please? I just tried installing SIMCOAL2 on my Mac, but failed. To be fair, they do only appear to support Linux and Windows...
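The "load the files line by line and compare them" idea quoted above could look roughly like this. It is a sketch only, not the patch attached to Bug 2638; the function name same_text and the file names are made up, and the files are written to a temporary directory so nothing from the test suite is needed.

```python
import os
import tempfile


def same_text(path_a, path_b):
    """Compare two files line by line, ignoring the newline convention."""
    with open(path_a, "rb") as handle_a, open(path_b, "rb") as handle_b:
        # bytes.splitlines() treats LF, CR LF and CR as line boundaries.
        return handle_a.read().splitlines() == handle_b.read().splitlines()


tmp = tempfile.mkdtemp()
contents = {
    "unix.par": b"line one\nline two\n",         # Unix new lines
    "dos.par": b"line one\r\nline two\r\n",      # Windows new lines
    "doubled.par": b"line one\r\r\nline two\r\r\n",  # the CR CR LF case
}
for name, data in contents.items():
    with open(os.path.join(tmp, name), "wb") as handle:
        handle.write(data)

unix_vs_dos = same_text(os.path.join(tmp, "unix.par"), os.path.join(tmp, "dos.par"))
unix_vs_doubled = same_text(os.path.join(tmp, "unix.par"), os.path.join(tmp, "doubled.par"))
print(unix_vs_dos, unix_vs_doubled)
```

A size comparison fails for the Unix/Windows pair even though the text matches, whereas this comparison accepts it and still flags the doubled carriage return as a real difference.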
Thanks Peter From biopython at maubp.freeserve.co.uk Mon Nov 3 12:43:22 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 3 Nov 2008 12:43:22 +0000 Subject: [Biopython-dev] Bio.PopGen and SIMCOAL2 installation Message-ID: <320fb6e00811030443w4d620c83w64c83fdafb9afa96@mail.gmail.com> Hi Tiago, I've just compiled SIMCOAL2 on a Linux machine from http://cmpg.unibe.ch/software/simcoal2/ (version 2.1.2). If anyone else tries this, it required the use of -fpermissive on g++ 4.1.2 to compile (and gave lots of deprecation warnings, plus some trivial ones about header files which didn't end with a newline). The makefile specifies the executable name as simcoal2_1_2; however, it does not include an install target, so it is up to the user where to put the binary (e.g. I used ~/bin/ rather than system wide) and perhaps what to call it. The provided pre-compiled binary is also called simcoal2_1_2. However, Bio.PopGen.SimCoal.Controller seems to assume the executable will be called just simcoal2 (or simcoal2.exe on Windows), and thus fails to detect a binary called simcoal2_1_2. The unit test however is more flexible and looks for any binary on the path whose name starts with simcoal2. Ideally these two should be consistent. I can make test_PopGen_SimCoal.py pass by installing SIMCOAL2 as simcoal2 rather than simcoal2_1_2, but is this a SIMCOAL2 installation issue or a bug in Bio.PopGen.SimCoal.Controller? In my experience it is not normal for a Linux tool to include the full version in the executable name - using just simcoal2 does make more sense. 
Thanks, Peter From bugzilla-daemon at portal.open-bio.org Mon Nov 3 17:16:41 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 3 Nov 2008 12:16:41 -0500 Subject: [Biopython-dev] [Bug 2639] New: SeqRecord.init doesn't check for arguments to their types Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2639 Summary: SeqRecord.init doesn't check for arguments to their types Product: Biopython Version: 1.47 Platform: All OS/Version: Linux Status: NEW Severity: normal Priority: P3 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: dalloliogm at gmail.com SeqRecord doesn't check if description is a string when creating SeqRecord objects. This causes an error later, when you print the record in formats like fasta.
>>> from Bio.Seq import Seq
>>> from Bio.SeqRecord import SeqRecord
>>> sr = SeqRecord(Seq('aaa'), description = [1, 2, 3]) # should give an error here!
>>> print sr.fasta
AttributeError: 'list' object has no attribute 'replace'
Looking at the SeqRecord.__init__ code, none of the arguments is checked for its type. This is a minor bug, but if you want to solve it, you just have to add some isinstance() checks in the init function. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
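The isinstance() checks suggested in Bug 2639 could look roughly like the sketch below. The Record class and its format_fasta method are hypothetical stand-ins written for this illustration; this is not the code of Bio.SeqRecord, nor the fix that was later committed.

```python
class Record(object):
    """Hypothetical stand-in for a sequence record; not Bio.SeqRecord."""

    def __init__(self, seq, id="<unknown id>", description="<unknown description>"):
        # Reject non-string metadata at construction time, so the error
        # appears where the mistake is made rather than later on output.
        if not isinstance(id, str):
            raise TypeError("id argument should be a string")
        if not isinstance(description, str):
            raise TypeError("description argument should be a string")
        self.seq = seq
        self.id = id
        self.description = description

    def format_fasta(self):
        """Return the record as a single FASTA entry."""
        return ">%s %s\n%s\n" % (self.id, self.description, self.seq)


try:
    Record("aaa", description=[1, 2, 3])
except TypeError as error:
    print("rejected: %s" % error)
```

Raising TypeError is one possible choice; converting the value with str() instead would keep older scripts working, at the cost of hiding the mistake.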
From bugzilla-daemon at portal.open-bio.org Mon Nov 3 18:47:59 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 3 Nov 2008 13:47:59 -0500 Subject: [Biopython-dev] [Bug 2639] SeqRecord.init doesn't check for arguments to their types In-Reply-To: Message-ID: <200811031847.mA3IlxuE025247@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2639 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-03 13:47 EST ------- Fixed in CVS, although there is a small chance this will break existing scripts which relied on the old lax behaviour. Peter P.S. Assuming you are using an unmodified Biopython, the last line of your example wouldn't work: >>> print sr.fasta Try: >>> print sr.format("fasta") -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Nov 3 19:33:39 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 3 Nov 2008 14:33:39 -0500 Subject: [Biopython-dev] [Bug 2629] Updated Bio.NaiveBayes to listfns import In-Reply-To: Message-ID: <200811031933.mA3JXdcZ028123@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2629 ------- Comment #4 from bsouthey at gmail.com 2008-11-03 14:33 EST ------- (In reply to comment #3) > I committed part of this patch to CVS; see NaiveBayes.py revision 1.9. > Could you check your classify function? It seems to contain some debugging > statements. Also, do we need the classifyprob function? > If you send in a new version of this code, please attach it as a patch to the > current version of NaiveBayes.py in CVS. > Thanks! 
> Yes, there is a print statement at the end of the 'classify' function (line 125 of attached file) that should be removed (as with any print statements that are commented out). These were to check that the values were the same as the original code. The classifyprob function can be dropped with no problems. I just wanted to return the probability but I also recognize that it is not very useful. I noticed you are using set (line 145 in the new cvs file) which is not compatible with Python 2.3. How should this be addressed? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From tiagoantao at gmail.com Mon Nov 3 19:34:36 2008 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Mon, 3 Nov 2008 19:34:36 +0000 Subject: [Biopython-dev] Bio.PopGen and SIMCOAL2 installation In-Reply-To: <320fb6e00811030443w4d620c83w64c83fdafb9afa96@mail.gmail.com> References: <320fb6e00811030443w4d620c83w64c83fdafb9afa96@mail.gmail.com> Message-ID: <6d941f120811031134p4c0f1756k5ded879de7555dad@mail.gmail.com> Hi, On Mon, Nov 3, 2008 at 12:43 PM, Peter wrote: > However, Bio.PopGen.SimCoal.Controller seems to assume the executable > will be called just simcoal2 (or simcoal2.exe on Windows), and thus > fails to detect a binary called simcoal2_1_2. The unit test however is > more flexible and looks for any binary on the path whose name starts > with simcoal2. Ideally these two should be consistent. I am aware of this; in fact, this issue is documented in the tutorial (9.5.2.2). The idea is that the binary should be called simcoal2 as documented. This can be changed of course. My preference would be to change just the test code. Is this ok with you? > I can make test_PopGen_SimCoal.py pass by installing SIMCOAL2 as > simcoal2 rather than simcoal2_1_2, but is this a SIMCOAL2 installation > issue or a bug in Bio.PopGen.SimCoal.Controller? 
In my experience it > is not normal for a Linux tool to include the full version in the > executable name - using just simcoal2 does make more sense. Agreed. And, again, this is documented in the tutorial. I can go ahead and change the test code (please just confirm). From tiagoantao at gmail.com Mon Nov 3 19:56:05 2008 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Mon, 3 Nov 2008 19:56:05 +0000 Subject: [Biopython-dev] Statistics in population genetics module - Part I In-Reply-To: <5aa3b3570811030736g7d7a0893x759777252c8d1828@mail.gmail.com> References: <6d941f120810301658wec8678ald332abb8ddbdf80d@mail.gmail.com> <5aa3b3570811030736g7d7a0893x759777252c8d1828@mail.gmail.com> Message-ID: <6d941f120811031156s2f634c1aq4252b17308ecf24a@mail.gmail.com> Hi, On Mon, Nov 3, 2008 at 3:36 PM, Giovanni Marco Dall'Olio wrote: > For how much time do you think a biopython module should be kept compatible > with older versions, more or less? That is an interesting discussion. My view is that biopython is fairly conservative in that regard. I am not saying that I agree/disagree. There seems to be a certain policy in place, and I respect it. But the point is: Bio.PopGen has to have the same policy as the rest. > It will take a long time to develop the module, and it is sure that we will > make some mistakes. So, what is the best way to proceed? What if we create a I will try to offer my view about this as soon as possible (in the next few days). > At the moment I am working with a separated git repository for all the > popgen modules. The problem is that I didn't include all biopython modules > in the repository, so, if any of my changes breaks something in biopython, I > won't know it until I'll merge everything with biopython code. It probably won't break anything as long as you don't change existing code. If you are only adding your parser, I suppose it will be accepted very easily (don't forget test cases and documentation). 
Regarding Statistics, we need to discuss it. > p.s. When python3000 will be released, it will be probably necessary to > rewrite large portions of biopython, if not creating a 'biopython 2' version > (I think they were discussing something like this in bioperl's list). Peter's and Michiel's opinions on this topic are fundamental (they do most of the work maintaining biopython). But I suppose backward compatibility is a must. > I thought that maybe, even if we make some 'mistakes' in this version of > biopython, we will be able to fix them in a later version. Mistakes should not break existing code. That is really something we should try to avoid. > I think that a good idea would be starting collecting use cases to have an > idea how many things we'll have to implement in this module. This might sound elitist, but most people doing population genetics don't really have any idea of what they should expect from software. While for the "business of sequences and alignment" there is a large, mature software community, the same is not true in population genetics. Or to put it another way: you don't want to imagine the type of questions that arrive in my private mailbox ;) . > I sent that mail to the Open::Bio::I last week, but still haven't received > many replies... I will send a message to the various Bio.* mailing list in > the next days. OBF, in my view, is a bit slow and bureaucratic. Anyway, I think that anybody's views will carry weight in proportion to the quantity of code submitted and the time devoted to maintenance of the whole thing. 
Tiago From bugzilla-daemon at portal.open-bio.org Mon Nov 3 22:58:11 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 3 Nov 2008 17:58:11 -0500 Subject: [Biopython-dev] [Bug 2629] Updated Bio.NaiveBayes to listfns import In-Reply-To: Message-ID: <200811032258.mA3MwBoH008744@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2629 ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-03 17:58 EST ------- (In reply to comment #4) > > I noticed you are using set (line 145 in the new cvs file) which is not > compatible with Python 2.3. How should this be addressed? > I've been using something like this elsewhere in Biopython:
#TODO - Remove this work around once we drop python 2.3 support
try:
    set
except NameError:
    from sets import Set as set
Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Mon Nov 3 23:08:44 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 3 Nov 2008 23:08:44 +0000 Subject: [Biopython-dev] Bio.PopGen and SIMCOAL2 installation In-Reply-To: <6d941f120811031134p4c0f1756k5ded879de7555dad@mail.gmail.com> References: <320fb6e00811030443w4d620c83w64c83fdafb9afa96@mail.gmail.com> <6d941f120811031134p4c0f1756k5ded879de7555dad@mail.gmail.com> Message-ID: <320fb6e00811031508xfef548dm1a0673b7dba70567@mail.gmail.com> On Mon, Nov 3, 2008 at 7:34 PM, Tiago Antão wrote: > Hi, > > On Mon, Nov 3, 2008 at 12:43 PM, Peter wrote: >> However, Bio.PopGen.SimCoal.Controller seems to assume the executable >> will be called just simcoal2 (or simcoal2.exe on Windows), and thus >> fails to detect a binary called simcoal2_1_2. The unit test however is >> more flexible and looks for any binary on the path whose name starts >> with simcoal2. Ideally these two should be consistent. 
> > I am aware of this, in fact, this issue is documented in the tutorial > (9.5.2.2). The idea is that the binary should be called simcoal2 as > documented. This can be changed of course. My preference would be to > change just the test code. Is this ok with you? > >> I can make test_PopGen_SimCoal.py pass by installing SIMCOAL2 as >> simcoal2 rather than simcoal2_1_2, but is this a SIMCOAL2 installation >> issue or a bug in Bio.PopGen.SimCoal.Controller? In my experience it >> is not normal for a Linux tool to include the full version in the >> executable name - using just simcoal2 does make more sense. > > Agree. And, again, this is documented in the tutorial. I can go ahead > and change the test code (please just confirm). I had skimmed over the tutorial, but missed this bit - sorry. Hopefully anyone interested in using SIMCOAL would have read this more carefully, but perhaps it could be made more prominent? e.g. try to include a few more keywords like install/installation and executable as well as binary (which I did not think to search for at the time). Let's just change test_PopGen_SimCoal.py to look for simcoal2 (or simcoal2.exe on Windows) so it is consistent with Bio.PopGen.SimCoal.Controller, and I would also mention what the binary should be called in the SimCoalController __init__ docstring. Peter From bugzilla-daemon at portal.open-bio.org Tue Nov 4 09:31:19 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 4 Nov 2008 04:31:19 -0500 Subject: [Biopython-dev] [Bug 2639] SeqRecord.init doesn't check for arguments to their types In-Reply-To: Message-ID: <200811040931.mA49VJOT019957@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2639 ------- Comment #2 from dalloliogm at gmail.com 2008-11-04 04:31 EST ------- I have tested the cvs code, it seems to work. Maybe you can allow ids to be integers, also. 
If you are afraid of causing problems to older scripts, you could str() the arguments if they are not strings. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Nov 4 09:39:18 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 4 Nov 2008 04:39:18 -0500 Subject: [Biopython-dev] [Bug 2443] Specifying the alphabet in Bio.SeqIO and Bio.AlignIO In-Reply-To: Message-ID: <200811040939.mA49dIQ9021075@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2443 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-04 04:39 EST ------- Marking as fixed - unit tests updated, and the new argument is mentioned in the tutorial as well. A more extensive example would be nice, perhaps using Bio.AlignIO with the Bio.Align.AlignInfo module... -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Nov 4 10:06:40 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 4 Nov 2008 05:06:40 -0500 Subject: [Biopython-dev] [Bug 2628] Have Bio.SeqIO.write(...) and Bio.AlignIO.write(...) 
return number of records In-Reply-To: Message-ID: <200811041006.mA4A6eAt024777@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2628 biopython-bugzilla at maubp.freeserve.co.uk changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED
------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-04 05:06 EST ------- Patch checked in, marking as fixed.
Checking in Bio/SeqIO/Interfaces.py;
/home/repository/biopython/biopython/Bio/SeqIO/Interfaces.py,v <-- Interfaces.py
new revision: 1.11; previous revision: 1.10
done
Checking in Bio/SeqIO/__init__.py;
/home/repository/biopython/biopython/Bio/SeqIO/__init__.py,v <-- __init__.py
new revision: 1.44; previous revision: 1.43
done
Checking in Bio/AlignIO/Interfaces.py;
/home/repository/biopython/biopython/Bio/AlignIO/Interfaces.py,v <-- Interfaces.py
new revision: 1.7; previous revision: 1.6
done
Checking in Bio/AlignIO/NexusIO.py;
/home/repository/biopython/biopython/Bio/AlignIO/NexusIO.py,v <-- NexusIO.py
new revision: 1.7; previous revision: 1.6
done
Checking in Bio/AlignIO/__init__.py;
/home/repository/biopython/biopython/Bio/AlignIO/__init__.py,v <-- __init__.py
new revision: 1.19; previous revision: 1.18
done
Checking in Tests/test_SeqIO.py;
/home/repository/biopython/biopython/Tests/test_SeqIO.py,v <-- test_SeqIO.py
new revision: 1.44; previous revision: 1.43
done
Checking in Tests/test_AlignIO.py;
/home/repository/biopython/biopython/Tests/test_AlignIO.py,v <-- test_AlignIO.py
new revision: 1.17; previous revision: 1.16
done
Checking in Tutorial.tex;
/home/repository/biopython/biopython/Doc/Tutorial.tex,v <-- Tutorial.tex
new revision: 1.183; previous revision: 1.182
done
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
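The idea behind Bug 2628, a write function that reports how many records it wrote, can be illustrated with a small stand-alone sketch. The function write_fasta below is hypothetical and works on plain (name, sequence) pairs; it is not Bio.SeqIO.write and makes no claim about how the committed patch is implemented.

```python
import io


def write_fasta(records, handle):
    """Write (name, sequence) pairs as FASTA and return how many were written."""
    count = 0
    for name, sequence in records:
        handle.write(">%s\n%s\n" % (name, sequence))
        count += 1
    return count


# A generator cannot be measured with len(), so the return value is the
# only cheap way for the caller to learn how many records went out.
records = ((name, seq) for name, seq in [("a", "ACGT"), ("b", "GGCC")])
handle = io.StringIO()
written = write_fasta(records, handle)
print("wrote %i records" % written)
```

The count is mainly useful when the input is an iterator, where the caller would otherwise have to keep a separate counter.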
From bugzilla-daemon at portal.open-bio.org Tue Nov 4 10:51:23 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 4 Nov 2008 05:51:23 -0500 Subject: [Biopython-dev] [Bug 2640] New: Proposal: doctest for SeqRecord/biopython Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2640 Summary: Proposal: doctest for SeqRecord/biopython Product: Biopython Version: Not Applicable Platform: PC OS/Version: All Status: NEW Severity: enhancement Priority: P3 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: dalloliogm at gmail.com I would like to propose using doctests in biopython. I found them very useful to understand how a script should be used, and moreover they can act as unit tests. Here is the main documentation for doctest: - http://www.python.org/doc/2.5.2/lib/module-doctest.html Usually, you add a _test() function to every module, which calls the doctest library, and launch it when __name__ == '__main__'. The most significant example is added to the documentation string of every module/function, and tested with doctest.testmod(); later, you add more tests in a separate file, and launch them with doctest.testfile(). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
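The layout proposed in Bug 2640 (examples in the docstring, a _test() function, and a __main__ guard) can be sketched as below. The function gc_fraction is a hypothetical example chosen for this illustration and is not part of Biopython; only doctest.testmod() is taken from the proposal.

```python
import doctest


def gc_fraction(sequence):
    """Return the fraction of G and C letters in a sequence.

    >>> gc_fraction("GGCC")
    1.0
    >>> gc_fraction("ATGC")
    0.5
    >>> gc_fraction("")
    0.0
    """
    if not sequence:
        return 0.0
    upper = sequence.upper()
    return (upper.count("G") + upper.count("C")) / float(len(upper))


def _test():
    """Run every docstring example found in this module."""
    return doctest.testmod()


if __name__ == "__main__":
    _test()
```

Running the file directly executes the examples and prints nothing when they all pass; adding -v on the command line makes doctest report each example.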
From bugzilla-daemon at portal.open-bio.org Tue Nov 4 10:52:21 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 4 Nov 2008 05:52:21 -0500 Subject: [Biopython-dev] [Bug 2640] Proposal: doctest for SeqRecord/biopython In-Reply-To: Message-ID: <200811041052.mA4AqLGX028185@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2640 ------- Comment #1 from dalloliogm at gmail.com 2008-11-04 05:52 EST ------- Created an attachment (id=1031) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1031&action=view) patch to add doctest to SeqRecord.py here it is a patch to add doctest documentation to Bio/SeqRecord.py -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Nov 4 11:23:12 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 4 Nov 2008 06:23:12 -0500 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200811041123.mA4BNCQ0030388@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 ------- Comment #27 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-04 06:23 EST ------- Created an attachment (id=1032) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1032&action=view) Patch to Bio/Seq.py to add start codon handling to translation Patch adds a new boolean argument to the translate method and function, called "init" (rather than my earlier suggestions like "from_start" or "check_start" which could be considered misleading). Docstring: init - Boolean, defaults to False. Should translation check the first codon is a valid initiation (start) codon and translate it as methionine (M)? If False, nothing special is done with the first codon. 
Example usage of the translate function,
>>> from Bio.Seq import translate
>>> translate("TTGAAACCCTAG")
'LKP*'
>>> translate("TTGAAACCCTAG", init=True, to_stop=True)
'MKP'
>>> translate("TTGAAACCCTAG", init=True)
'MKP*'
>>> translate("TTGAAACCCTAG", to_stop=True)
'LKP'
Using the Seq method,
>>> from Bio.Seq import Seq
>>> my_seq = Seq("TTGAAACCCTAG")
>>> my_seq.translate()
Seq('LKP*', HasStopCodon(ExtendedIUPACProtein(), '*'))
>>> my_seq.translate(init=True, to_stop=True)
Seq('MKP', ExtendedIUPACProtein())
>>> my_seq.translate(init=True)
Seq('MKP*', HasStopCodon(ExtendedIUPACProtein(), '*'))
>>> my_seq.translate(to_stop=True)
Seq('LKP', ExtendedIUPACProtein())
Comments please. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Nov 4 11:23:39 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 4 Nov 2008 06:23:39 -0500 Subject: [Biopython-dev] [Bug 2640] Proposal: doctest for SeqRecord/biopython In-Reply-To: Message-ID: <200811041123.mA4BNdAS030439@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2640 dalloliogm at gmail.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1031 is|0                           |1
           obsolete|                            |
------- Comment #2 from dalloliogm at gmail.com 2008-11-04 06:23 EST ------- Created an attachment (id=1033) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1033&action=view) patch to add doctest to SeqRecord.py This patch is maybe clearer than the previous one - it adds an example on adding annotations to a SeqRecord. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From biopython at maubp.freeserve.co.uk Tue Nov 4 11:36:50 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 4 Nov 2008 11:36:50 +0000 Subject: [Biopython-dev] Preparing for Biopython 1.49 (beta) Message-ID: <320fb6e00811040336k12a834b9o2fa103b8fabf7ec1@mail.gmail.com> Dear all, The Numeric to numpy migration is done now, and we are also looking good for python 2.6. After a little off-list discussion, it's probably time to prepare the next release. However, given the number of changes, and therefore the higher risk that we've broken something, we'll call this a beta release. Are there any bugs or issues people think should block this release? I would like to check in my initiation/start codon argument patch for translation (see Bug 2381), but would like a little discussion on this first (in particular the argument naming). I'd like to try and do the Biopython 1.49 "beta" release at the end of this week (with a follow-up Biopython 1.49 "final" release say one week later if needed to deal with any issues from the beta). If this schedule is realistic, then Tiago should be OK to add his next set of PopGen code in about two weeks' time (for what would become Biopython 1.50). Peter From bugzilla-daemon at portal.open-bio.org Tue Nov 4 11:48:53 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 4 Nov 2008 06:48:53 -0500 Subject: [Biopython-dev] [Bug 2640] Proposal: doctest for SeqRecord/biopython In-Reply-To: Message-ID: <200811041148.mA4Bmrag032109@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2640 ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-04 06:48 EST ------- I think we would need to integrate this into the existing test framework so that any new doctests are actually used. For an example of this on a module by module basis, see test_Wise.py and test_psw.py (although these don't interact well with our test framework on Python 2.3, see bug 2613). 
If a large number of Biopython modules have doctests then a more automated system could be designed (searching all non-deprecated modules for doctests). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Nov 4 12:04:54 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 4 Nov 2008 07:04:54 -0500 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200811041204.mA4C4sHS000823@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 ------- Comment #28 from dalloliogm at gmail.com 2008-11-04 07:04 EST ------- (In reply to comment #27) > Created an attachment (id=1032) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1032&action=view) [details] > Patch to Bio/Seq.py to add start codon handling to translation > > Patch adds a new boolean argument to the translate method and function, called > "init" (rather than my earlier suggestions like "from_start" or "check_start" > which could be considered misleading). > > Docstring: > > init - Boolean, defaults to False. Should translation check the > first codon is a valid initiation (start) codon and translate > it as methionine (M)? If False, nothing special is done with > the first codon. I don't like the name 'init' :( it would be better to use an argument with the word 'force' in it. E.g.: force_has_coding, force_first_position, etc.. If you haven't read the discussion in this bug report, it is not very clear what happens when init=True and why. You should add a description of why this option exists to the docstring. 
> > Example usage of the translate function, > > >>> from Bio.Seq import translate > >>> translate("TTGAAACCCTAG") > 'LKP*' > >>> translate("TTGAAACCCTAG", init=True, to_stop=True) > 'MKP' Without having read the discussion in this bug report, I was expecting an exception here.. why does it forces a Methionine to be in the first position? It loses the information of a Leu in the first position. > >>> translate("TTGAAACCCTAG", init=True) > 'MKP*' > >>> translate("TTGAAACCCTAG", to_stop=True) > 'LKP' > You could add a check for non coding aminoacids: >>> translate("UAACAGTGCAT") ExceptionError: Non coding aminoacid in the first position -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Nov 4 12:28:56 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 4 Nov 2008 07:28:56 -0500 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200811041228.mA4CSuvT002892@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 ------- Comment #29 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-04 07:28 EST ------- (In reply to comment #28) > (In reply to comment #27) > > Created an attachment (id=1032) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1032&action=view) [details] [details] > > Docstring: > > > > init - Boolean, defaults to False. Should translation check the > > first codon is a valid initiation (start) codon and translate > > it as methionine (M)? If False, nothing special is done with > > the first codon. > > I don't like the name 'init' :( it would be better to use an argument with the > word 'force' in it. E.g.: force_has_coding, force_first_position, etc.. 
Maybe - but I don't think force_has_coding, force_first_position are any clearer, and they are very long. Do you like "with_start_codon" or "with_init_codon"? Note that I used "init" rather than "initiation (codon)" because python already uses init as shorthand for initiation/initialisation. > If you haven't read the discussion in this bug report, it is not very > clear what happens when init=True and why. If it had been called "start" or "from_start" or "start_codon" the meaning wouldn't be clear either - you might expect "start" or "from_start" to take an integer location, and "start_codon" to take a three-letter string. > You should add a description of why this option exists to the docstring. OK - That makes sense. > > > > Example usage of the translate function, > > > > >>> from Bio.Seq import translate > > >>> translate("TTGAAACCCTAG") > > 'LKP*' > > >>> translate("TTGAAACCCTAG", init=True, to_stop=True) > > 'MKP' > > Without having read the discussion in this bug report, I was expecting an > exception here.. why does it forces a Methionine to be in the first position? > It loses the information of a Leu in the first position. Because if this was a CDS using an alternative start codon of TTG it would be translated as a methionine and NOT as a leucine (because instead of a typical tRNA-Leu, an initiation tRNA is used). This is the whole point of this optional argument. If you want TTG translated blindly as L, don't use the init argument (or set it to False). See also http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi which explicitly lists these alternative codons as giving M when used as starts, e.g. 
  AAs    = FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
  Starts = ---M---------------M---------------M----------------------------
  Base1  = TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
  Base2  = TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
  Base3  = TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Nov 4 13:41:51 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 4 Nov 2008 08:41:51 -0500 Subject: [Biopython-dev] [Bug 2629] Updated Bio.NaiveBayes to listfns import In-Reply-To: Message-ID: <200811041341.mA4DfpYD009210@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2629 ------- Comment #6 from mdehoon at ims.u-tokyo.ac.jp 2008-11-04 08:41 EST ------- I've committed Peter's fix for the set import to CVS. About the replacement for listfns.contents in the modified NaiveBayes code: Did you do any timings to compare the new code to the old code? Since listfns.contents is implemented in C, it may be (much) faster than the replacement code. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
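A timing comparison of the kind asked for in Comment #6 could be set up with the standard timeit module, roughly as sketched here. The two functions are hypothetical pure-Python stand-ins that compute item frequencies in different ways; neither is the Biopython code, and the sketch makes no claim about what listfns.contents does or how fast it is.

```python
import timeit


def contents_loop(items):
    """Map each item to its relative frequency, using one counting pass."""
    counts = {}
    for item in items:
        counts[item] = counts.get(item, 0) + 1
    total = float(len(items))
    return dict((item, n / total) for item, n in counts.items())


def contents_count(items):
    """Same result, calling list.count() once per distinct item."""
    total = float(len(items))
    return dict((item, items.count(item) / total) for item in set(items))


data = list("ACGT" * 250)
for func in (contents_loop, contents_count):
    # Time 200 calls of each candidate on the same input.
    seconds = timeit.timeit(lambda: func(data), number=200)
    print("%s: %.4f s for 200 calls" % (func.__name__, seconds))
```

Checking that both candidates return the same dictionary before timing them avoids comparing the speed of two functions that do different things.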
From bugzilla-daemon at portal.open-bio.org Tue Nov 4 13:57:48 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 4 Nov 2008 08:57:48 -0500 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200811041357.mA4Dvm2B010202@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 ------- Comment #30 from lpritc at scri.sari.ac.uk 2008-11-04 08:57 EST ------- (In reply to comment #29) > (In reply to comment #28) > > (In reply to comment #27) > > > Created an attachment (id=1032) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1032&action=view) [details] [details] [details] > > > Docstring: > > > > > > init - Boolean, defaults to False. Should translation check the > > > first codon is a valid initiation (start) codon and translate > > > it as methionine (M)? If False, nothing special is done with > > > the first codon. > > > > I don't like the name 'init' :( it would be better to use an argument with the > > word 'force' in it. E.g.: force_has_coding, force_first_position, etc.. > > Maybe - but I don't think force_has_coding, force_first_position are any > clearer, and they are very long. Do you like "with_start_codon" or > "with_init_codon"? I think that there are two key things that are going on as a result of this setting being True: 1) The first codon (starting at position 0) of the nucleotide sequence is being checked as a valid initiation codon 2) If it is such a valid codon, the translated aa is Met (because this is what happens biologically). It's quite a complicated concept, and if we wanted to be completely explicit, an option called 'assert_first_codon_is_initiation_and_translate_to_met' would be clear, but would be far too long to be sensible. 
Most other shorter options are either ambiguous, misleading, or ambiguously misleading - largely because people will assume that the term means what they want it to mean instead of what it does, as described below: > If it have been called "start" or "from_start" or "start_codon" the meaning > isn't clear either - you might "start" or expect "from_start" to take an > integer location, and start_codon to take a three letter string. I am not too worried about long arguments, so 'assert_first_codon_init' would be fine for me (though does this mean that the first codon of the sequence should be an initiation codon, or that translation should start from the first initiation codon?), but I see the drive for, and value of, brevity. If there's a short, unambiguous option name that you can think of, I'm all for it. An option name that is a little cryptic, but not misleading, such as 'init', also works for me. I would have to go to the minor effort of typing help(seq.translate) to find out what it meant, but it's not very much of a chore. Also, people learn all kinds of non-standard uses for cryptic terms, all the time. For example, what on earth does 'popen3' mean? Why not open_pipes_with_stdin_stdout_stderr? 'popen3' is short, unambiguous (if not immediately obvious), and if you want to know what it means, then help() or a dip in the documentation will tell you. I think the same will be true of 'init', so long as no-one is likely to confuse it with some other meaning. L. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Tue Nov 4 13:58:21 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 4 Nov 2008 08:58:21 -0500 Subject: [Biopython-dev] [Bug 2640] Proposal: doctest for SeqRecord/biopython In-Reply-To: Message-ID: <200811041358.mA4DwLiK010266@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2640 ------- Comment #4 from dalloliogm at gmail.com 2008-11-04 08:58 EST ------- (In reply to comment #3) > I think we would need to integrate this into the existing test framework so > that any new doctests are actually used. For an example of this on a module by > module basis, see test_Wise.py and test_psw.py (although these don't interact > well with our test framework on Python 2.3, see bug 2613). > > If a large number of Biopython modules have doctests then a more automated > system could be designed (searching all non-deprecated modules for doctests). > If you think it would be useful, I can write other doctests for other modules in the following days. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
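One way the docstring examples discussed in bug 2640 could be counted from an ordinary test script is sketched below. This is only an illustration under stated assumptions: gc_fraction and count_doctest_failures are hypothetical names, not Biopython functions, and the code is written for current Python 3 rather than the Python 2.3 mentioned in the thread. It relies only on doctest.DocTestParser and doctest.DocTestRunner from the standard library.

```python
import doctest


def gc_fraction(seq):
    """Return the fraction of G and C letters in a sequence string.

    (Hypothetical helper, used here only to carry two doctests.)

    >>> gc_fraction("GGCC")
    1.0
    >>> gc_fraction("ATGC")
    0.5
    """
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def count_doctest_failures(func):
    """Run the examples in func's docstring; return (failed, attempted)."""
    parser = doctest.DocTestParser()
    # The globals dict makes the function visible to its own examples.
    test = parser.get_doctest(
        func.__doc__, {func.__name__: func}, func.__name__, None, 0
    )
    runner = doctest.DocTestRunner(verbose=False)
    results = runner.run(test)
    return results.failed, results.attempted


failed, attempted = count_doctest_failures(gc_fraction)
print("%d of %d docstring examples failed" % (failed, attempted))
```

A test script built this way can turn a non-zero failure count into a test failure, so the docstring examples are validated along with the rest of the suite instead of being documentation only.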
From bugzilla-daemon at portal.open-bio.org Tue Nov 4 14:44:15 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 4 Nov 2008 09:44:15 -0500 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200811041444.mA4EiFv5013693@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 ------- Comment #31 from dalloliogm at gmail.com 2008-11-04 09:44 EST ------- (In reply to comment #30) > (In reply to comment #29) > > (In reply to comment #28) > > > (In reply to comment #27) > > > > Created an attachment (id=1032) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1032&action=view) [details] [details] [details] [details] > > It's quite a complicated concept, and if we wanted to be completely explicit, > an option called 'assert_first_codon_is_initiation_and_translate_to_met' would > be clear, but would be far too long to be sensible. Most other shorter options > are either ambiguous, misleading, or ambiguously misleading - largely because > people will assume that the term means what they want it to mean instead of > what it does, as described below: > > > If it have been called "start" or "from_start" or "start_codon" the meaning > > isn't clear either - you might "start" or expect "from_start" to take an > > integer location, and start_codon to take a three letter string. > > I am not too worried about long arguments, so 'assert_first_codon_init' would > be fine for me (though does this mean that the first codon of the sequence > should be an initiation codon, or that translation should start from the first > initiation codon?), but I see the drive for, and value of, brevity. If there's > a short, unambiguous option name that you can think of, I'm all for it. An > option name that is a little cryptic, but not misleading, such as 'init', also > works for me. 
When I saw 'init' for the first time, I thought there was some kind of complicated calculation associated with the translate function, which init=False was meant to skip in order to get a faster but less accurate translation. > I would have to go to the minor effort of typing > help(seq.translate) to find out what it meant, but it's not very much of a > chore. It is also a matter of code readability; I don't think many people would understand what init is meant for by looking at a script. If I use this option in one of my scripts, and a colleague reads it, I want to be sure that he will easily understand that I am forcing the first position to be a Methionine. Otherwise, the risk is that he won't properly understand my results. In which of these examples do you understand that the first position is being forced to a Methionine? >>> translate("TTGAAACCCTAG", init=True, to_stop=True) >>> translate("TTGAAACCCTAG", force_as_translating=True, to_stop=True) >>> translate("TTGAAACCCTAG", force_methionine=True, to_stop=True) >>> translate("TTGAAACCCTAG", force_methionine=True, force_stop=True) >>> translate("TTGAAACCCTAG", alt_start=True, alt_stop=True) Also, I don't think this option will be used very often. So, it shouldn't be a problem if its name is long to type, and it would be better if it is easy to understand. > > Also, people learn all kinds of non-standard uses for cryptic terms, all the > time. For example, what on earth does 'popen3' mean? Why not > open_pipes_with_stdin_stdout_stderr? 'popen3' is short, unambiguous (if not > immediately obvious), When I was a Python newbie, I really hated the name popen3 :) > and if you want to know what it means, then help() or a > dip in the documentation will tell you. I think the same will be true of > 'init', so long as no-one is likely to confuse it with some other meaning. > > L. 
> -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Nov 4 14:45:17 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 4 Nov 2008 09:45:17 -0500 Subject: [Biopython-dev] [Bug 2640] Proposal: doctest for SeqRecord/biopython In-Reply-To: Message-ID: <200811041445.mA4EjHg4013777@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2640 ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-04 09:45 EST ------- (In reply to comment #4) > > If you think it would be useful, I can write other doctests for other modules > in the following days. > I think adding more doctests would be useful, but they MUST get run by our existing test suite. Otherwise they'll just be human readable documentation (which is still nice) but will not get regularly validated. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Nov 4 15:39:42 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 4 Nov 2008 10:39:42 -0500 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200811041539.mA4Fdgc8017798@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 ------- Comment #32 from lpritc at scri.sari.ac.uk 2008-11-04 10:39 EST ------- (In reply to comment #31) > (In reply to comment #30) > It is also a matter of code readibility; I don't think many people would > understand that init is meant for that by looking at a script. 
True enough, but if someone's already used it, and you don't know what it means when reading their script, looking it up isn't hard. What's hard is guessing which option you need to invoke, and calling help() is one way to do that, too... Not that I want to extend this argument to single-letter options with *no* relevance to their intent ;) seq.translate(a=True, b='GUG', c=9) > If I use this option in one of my scripts, and a colleague reads it, I want to > be sure that he will be easily understand that I am forcing the first position > to be a Methionine. > Otherwise, the risk is that he won't understand properly my results. Maybe put it in a comment-line? Even if the colleague understands from the code *that* you've translated an alternative start to a methionine, they may not understand *why* - and the comment line is essential, then. > In which of these examples do you understand that the first position is being > forced to a Methionine? None are particularly clear, but only one of them doesn't give me the wrong idea... > >>> translate("TTGAAACCCTAG", init=True, to_stop=True) Because I've read this thread (or looked at the docs) - I understand this one ;) > >>> translate("TTGAAACCCTAG", force_as_translating=True, to_stop=True) I don't intuitively understand this. Does it mean that the sequence should be translatable? > >>> translate("TTGAAACCCTAG", force_methionine=True, to_stop=True) Does this mean that the sequence will be translated from the first methionine the method finds? > >>> translate("TTGAAACCCTAG", force_methionine=True, force_stop=True) As above, and does force_stop mean that you add a '*' to the end of the translation? Or that you stop at a stop codon? > >>> translate("TTGAAACCCTAG", alt_start=True, alt_stop=True) 'alt_start' I would think referred to allowing translation from alternative start codons. I don't know what alt_stop would mean... > Also, I don't think this option will be used very often. Maybe not. 
The first use case that comes to mind is QA on CDS-finding: # Check if sequence is CDS: assert candidate_cds.translate(init=True) # Check if reported CDS start is valid assert est[37:].translate(init=True) A second use case is slower in presenting itself... > So, it shouldn't be a problem if its name is too long to type, and it would be > better if it is easy to understand. That's a fair argument, I think. On the whole, though, I would favour a short, unambiguous, slightly cryptic name over a very long, unambiguous name, over an ambiguous name of any length. > When I was a python newbie, I really hated the name popen3 :) At least we have subprocess, now. L. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Nov 4 15:44:47 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 4 Nov 2008 10:44:47 -0500 Subject: [Biopython-dev] [Bug 2640] Proposal: doctest for SeqRecord/biopython In-Reply-To: Message-ID: <200811041544.mA4FilLH018113@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2640 ------- Comment #6 from dalloliogm at gmail.com 2008-11-04 10:44 EST ------- (In reply to comment #5) > (In reply to comment #4) > > > > If you think it would be useful, I can write other doctests for other modules > > in the following days. > > > > I think adding more doctests would be useful, but they MUST get run by our > existing test suite. Otherwise they'll just be human readable documentation > (which is still nice) but will not get regularly validated. There are a few ways to do it, but it is not too difficult to implement. The easiest thing is to use 'doctest.testmod' in the test files. 
For example, you can add to test_SeqRecord.py the following lines: import doctest from Bio import SeqRecord # import the module, not SeqRecord.SeqRecord print "testing with doctest..." (failures, tests) = doctest.testmod(SeqRecord) if failures == 0: print 'ok' else: print 'some test has failed' or you can launch the '_test' function in every module (see my patch), but this would require importing doctest multiple times. >>> SeqRecord._test() I will write some other doctests in the following days/weeks and post them here as patches, and you will decide. Anyway, do you think they will make biopython's documentation nicer? Do you like them? Sometimes, doctests make the doc strings a bit messy, so some people don't like them. But it is really a matter of how you write them. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Nov 4 16:11:49 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 4 Nov 2008 11:11:49 -0500 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200811041611.mA4GBnuW020154@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 ------- Comment #33 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-04 11:11 EST ------- (In reply to comment #32) > > In which of these examples do you understand that the first position is > > being forced to a Methionine? With my suggested code, you would not just be forcing the first codon to be a methionine. You would also be asking for the first codon to be validated as a start codon (initialisation codon). > None are particularly clear, but only one of them doesn't give me the wrong > idea... 
In some cases I seem to have guessed different possible meanings for some of these suggested names - so those are probably unclear. > > >>> translate("TTGAAACCCTAG", init=True, to_stop=True) > > Because I've read this thread (or looked at the docs) - I understand this one > ;) To me this suggests something special is happening with the initialisation of the translation - but I agree its not clear what without checking the documentation. > > >>> translate("TTGAAACCCTAG", force_as_translating=True, to_stop=True) > > I don't intuitively understand this. Does it mean that the sequence should be > translatable? Ditto - an argument called force_as_translating means nothing to me. You're calling a translation method so what can forcing a translation mean? > > >>> translate("TTGAAACCCTAG", force_methionine=True, to_stop=True) > > Does this mean that the sequence will be translated from the first methionine > the method finds? I would have guessed force_methionine would ignore the value of the first three nucleotides in order to treat them as a methionine (even if they are not a start codon). > > >>> translate("TTGAAACCCTAG", force_methionine=True, force_stop=True) > > As above, and does force_stop mean that you add a '*' to the end of the > translation? Or that you stop at a stop codon? Like Leighton, I would be confused by "force_stop". It could mean add a stop symbol to the end of the amino acid sequence even if there isn't one there already. > > >>> translate("TTGAAACCCTAG", alt_start=True, alt_stop=True) > > 'alt_start' I would think referred to allowing translation from alternative > start codons. I don't know what alt_stop would mean... I think "alt_start" would be misleading for the intended dual functionality. Consider the typical use case for this option - translating a CDS, which most of the time will use the typical start codon AUG / ATG (but not all ways). We'd want the start codon validated - and it often won't be an alternative start codon. 
So calling the argument "alt_start" is confusing. > > Also, I don't think this option will be used very often. > > Maybe not. The first use case that comes to mind is QA on CDS-finding: > > # Check if sequence is CDS: > assert candidate_cds.translate(init=True) > # Check if reported CDS start is valid > assert est[37:].translate(init=True) > > A second use case is slower in presenting itself... I think translating a CDS is quite a common task - so a very long argument would be bad. Instead of the "init" start codon option in attachment 1032, I'd also be happy with a single boolean argument which does start codon validation, treats this as a methionine, checks the sequence is a multiple of three in length, checks for a final stop codon, and checks for no additional stop codons. We'd ruled out calling this "complete", but maybe "cds" would be better? > > So, it shouldn't be a problem if its name is too long to type, and it would > > be better if it is easy to understand. > > That's a fair argument, I think. On the whole, though, I would favour a > short, unambiguous, slightly cryptic name over a very long, unambiguous > name, over an ambiguous name of any length. There is a lot of subjectiveness in argument naming - clearly we have not come up with a perfect suggestion yet. Unfortunately "init" can be misunderstood (I'm not 100% sure what you were trying to say in comment 31, but I think you thought from the name "init" could be some sort of optional optimisation initialisation). How about "cds_start" instead of "init"? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Tue Nov 4 17:43:53 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 4 Nov 2008 12:43:53 -0500 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200811041743.mA4Hhrcc026138@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 ------- Comment #34 from bsouthey at gmail.com 2008-11-04 12:43 EST ------- (In reply to comment #33) > (In reply to comment #32) > > > In which of these examples do you understand that the first position is > > > being forced to a Methionine? > > With my suggested code, you would not just be forcing the first codon to be a > methionine. You would also be asking for the first codon to be validated as a > start codon (initialisation codon). > > > None are particularly clear, but only one of them doesn't give me the wrong > > idea... > > In some cases I seem to have guessed different possible meanings for some of > these suggested names - so those are probably unclear. > > > > >>> translate("TTGAAACCCTAG", init=True, to_stop=True) > > > > Because I've read this thread (or looked at the docs) - I understand this one > > ;) > > To me this suggests something special is happening with the initialisation of > the translation - but I agree its not clear what without checking the > documentation. > > > > >>> translate("TTGAAACCCTAG", force_as_translating=True, to_stop=True) > > > > I don't intuitively understand this. Does it mean that the sequence should be > > translatable? > > Ditto - an argument called force_as_translating means nothing to me. You're > calling a translation method so what can forcing a translation mean? > > > > >>> translate("TTGAAACCCTAG", force_methionine=True, to_stop=True) > > > > Does this mean that the sequence will be translated from the first methionine > > the method finds? 
> > I would have guessed force_methionine would ignore the value of the first three > nucleotides in order to treat them as a methionine (even if they are not a > start codon). > > > > >>> translate("TTGAAACCCTAG", force_methionine=True, force_stop=True) > > > > As above, and does force_stop mean that you add a '*' to the end of the > > translation? Or that you stop at a stop codon? > > Like Leighton, I would be confused by "force_stop". It could mean add a stop > symbol to the end of the amino acid sequence even if there isn't one there > already. > > > > >>> translate("TTGAAACCCTAG", alt_start=True, alt_stop=True) > > > > 'alt_start' I would think referred to allowing translation from alternative > > start codons. I don't know what alt_stop would mean... > > I think "alt_start" would be misleading for the intended dual functionality. > Consider the typical use case for this option - translating a CDS, which most > of the time will use the typical start codon AUG / ATG (but not all ways). > We'd want the start codon validated - and it often won't be an alternative > start codon. So calling the argument "alt_start" is confusing. > > > > Also, I don't think this option will be used very often. > > > > Maybe not. The first use case that comes to mind is QA on CDS-finding: > > > > # Check if sequence is CDS: > > assert candidate_cds.translate(init=True) > > # Check if reported CDS start is valid > > assert est[37:].translate(init=True) > > > > A second use case is slower in presenting itself... > > I think translating a CDS is quite a common task - so a very long argument > would be bad. > > Instead of the "init" start codon option in attachment 1032 [details], I'd also be happy > with a single boolean argument which does start codon validation, treats this > as a methionine, checks the sequence is a multiple of three in length, checks > for a final stop codon, and checks for no additional stop codons. 
We'd ruled > out calling this "complete", but maybe "cds" would be better? > > > > So, it shouldn't be a problem if its name is too long to type, and it would > > > be better if it is easy to understand. > > > > That's a fair argument, I think. On the whole, though, I would favour a > > short, unambiguous, slightly cryptic name over a very long, unambiguous > > name, over an ambiguous name of any length. > > There is a lot of subjectiveness in argument naming - clearly we have not come > up with a perfect suggestion yet. > > Unfortunately "init" can be misunderstood (I'm not 100% sure what you were > trying to say in comment 31, but I think you thought from the name "init" could > be some sort of optional optimisation initialisation). > > How about "cds_start" instead of "init"? > As I think about this and the various comments, I do think that you must apply the same reasoning to non-standard translation as was applied to the ORF finding comments. From that I understand that you want a basic translation function so function arguments like to_stop or cds_start would be inappropriate. Also, even if it were possible, I do not see that validating all known start codons under all genetic codes fits here. Rather I think the various comments reflect various combinations of three major steps: 1) Identify the region to be translated like NCBI's sequence viewer: range from 'begin' to 'end' to denote the region to be viewed. Under this view, start_from or begin_at could be the position to start or the first occurrence of a start codon. Likewise to_end or end_at could be a position or the first occurrence of a stop codon. I note this also implies frame but I think that has a separate meaning. 2) Having defined the region to be translated, translate that region as defined by the frame and selected table. A question here is whether, if the region is defined, the frame should be set to one or not. 3) Address any non-standard codons in the translated sequence. 
If you are going to allow non-standard start codons, you also need to handle selenocysteine (http://en.wikipedia.org/wiki/Selenocysteine) and, less so, pyrrolysine (http://en.wikipedia.org/wiki/Pyrrolysine). Technically, you can argue the table used for translation in 2) should reflect this but I consider it a separate issue. Also, the occurrence of a stop codon would likewise need to change. The non-standard codon usages are rare and I really do question whether these are part of the Seq object translate function or belong elsewhere. I really feel that if the user already knows that it is a non-AUG start codon then they can replace the first amino acid with Met rather than rely on the translate function. For example, the CDS field in the Genbank record for Mouse Neuropeptide W (NM_001099664) has: /exception="alternative start codon" /note="non-AUG (CUG) translation initiation codon". So if the user looked at the record then they would know it would need to be changed. If some form of the non-standard codons is included I would think some variant of Leighton's assert idea should be preferred such as using an assert_nonstandard argument (or just nonstandard). This would be a string, list or tuple to denote the changes to be made such as say 'Met1' or 'M1', where the three or single letter code gives the desired amino acid and the number is the location within the amino acid sequence to be changed. So Met1 would mean replacing the amino acid at position one with Methionine (M). But I recognize this is not sufficient to handle other non-standard cases with stop codons. Bruce -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
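The 'Met1' / 'M1' notation proposed in comment #34 can be made concrete with a small sketch. Everything here is an assumption made for illustration: apply_nonstandard is a hypothetical helper (nothing of this name exists in Biopython), it works on an already translated protein string, and the three letter lookup covers only the residues named in the thread.

```python
import re

# Three letter codes for the residues mentioned in the thread only
# (U and O are the usual one letter codes for Sec and Pyl).
THREE_TO_ONE = {"Met": "M", "Sec": "U", "Pyl": "O"}


def apply_nonstandard(protein, changes):
    """Apply changes such as 'Met1' or 'M1' to a protein string.

    Hypothetical helper: each change names an amino acid (one or three
    letter code) and a 1-based position in the protein to overwrite.
    """
    if isinstance(changes, str):
        changes = [changes]
    residues = list(protein)
    for change in changes:
        match = re.fullmatch(r"([A-Za-z]{3}|[A-Za-z])(\d+)", change)
        if match is None:
            raise ValueError("Cannot parse change %r" % change)
        code, position = match.group(1), int(match.group(2))
        if len(code) == 3:
            if code.capitalize() not in THREE_TO_ONE:
                raise ValueError("Unknown amino acid code %r" % code)
            amino = THREE_TO_ONE[code.capitalize()]
        else:
            amino = code.upper()
        if not 1 <= position <= len(residues):
            raise ValueError("Position %d is outside the protein" % position)
        residues[position - 1] = amino
    return "".join(residues)


print(apply_nonstandard("LKP", "Met1"))
```

Unlike a boolean option, this form performs no check that the first codon is a valid start codon; it simply overwrites the named positions.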
From bugzilla-daemon at portal.open-bio.org Tue Nov 4 18:28:19 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 4 Nov 2008 13:28:19 -0500 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200811041828.mA4ISJAd028961@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 ------- Comment #35 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-04 13:28 EST ------- (In reply to comment #34) > As I think about this and the various comments, I do that you must apply the > same reasoning to non-standard translation as was applied to the ORF finding > comments. From that I understand that you want a basic translation function so > function arguments like to_stop or cds_start would be inappropriate. There is certainly an argument that the Bio.Seq translate function/methods should be kept as simple as possible while providing widely useful functionality. Perhaps given the lack of immediate agreement we are at that point already? Or perhaps this is a reflection of the different types of organisms people work with and thus the relative frequencies of non-standard start codons. > Also, even if it was possible, I do not see that validating all known start > codons under all genetic codes fits here. We have the valid start codons in the CodonTable objects derived from the NCBI, so it is possible to check them. > ... Address any non-standard codons to the translated sequence. If you are > going to allow non-standard start codons, you also need to handle > selenocysteine (http://en.wikipedia.org/wiki/Selenocysteine) and less so > pyrrolysine (http://en.wikipedia.org/wiki/Pyrrolysine). Why? Non-standard codons are pretty common in prokaryotes and the rules for translating them are simple (once the start codon is identified). 
On the other hand selenocysteine and pyrrolysine are very rare, and we can't define a computer rule to deal with them - so we don't even try. > The non-standard codon usages are rare and I do really question if these are > really part of the Seq object translate function or belong elsewhere. I really > feel that if the user already knows that it is a non-AUG start codon then they > can replace the first amino acid with Met rather than rely on the translate > function. For example, the CDS field in the Genbank record for Mouse > Neuropeptide W (NM_001099664) has: > /exception="alternative start codon" > /note="non-AUG (CUG) translation initiation codon". > So if the user looked at the record then then would know it would need to be > changed. Non-standard start codons are not that rare in prokaryotes (and I would not expect them to be annotated like your mouse example). When translating a well annotated sequence, the location itself should be enough. [I'm assuming we're not talking about the other meaning of the phrase "alternative start codons" - where a gene may have multiple valid start codons giving proteins of different lengths but the same C-terminal region.] > If some form of the non-standard codons is included I would think some > variantof Leighton's assert idea should be preferred such as using an > assert_nonstandard argument (or just nonstandard). This would be a string, > list or tuple to denote the changes to be made such as say 'Met1' or 'M1' > where three or single letter code of the desired amino acid and the number > is the location within the amino acid sequence to be changed. So Met1 would > mean changing the amino acid at position one with Methionine (M). But I > recognize this is not sufficient to handle other non-standard cases with > stop codons. I thought Leighton was just proposing another name for a boolean argument which I had called "init" in attachment 1032. I'm afraid I don't understand your idea of a complicated list argument. 
=============================================================================
Here is a concrete example: there are 418 annotated genes in E. coli K12 with
non-standard start codons - which you might want to translate into proteins.

#Using ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_K12_substr__MG1655/NC_000913.ffn
>>> from Bio import SeqIO
>>> odd = [record for record in SeqIO.parse(open("NC_000913.ffn"),"fasta") \
           if str(record.seq[:3]) != "ATG"]
>>> print "There are %i genes not starting ATG" % len(odd)
There are 481 genes not starting ATG
>>> record = odd[0]
>>> print record.format("fasta")
>ref|NC_000913.2|:5234-5530
GTGAAAAAGATGCAATCTATCGTACTCGCACTTTCCCTGGTTCTGGTCGCTCCCATGGCA
GCACAGGCTGCGGAAATTACGTTAGTCCCGTCAGTAAAATTACAGATAGGCGATCGTGAT
AATCGTGGCTATTACTGGGATGGAGGTCACTGGCGCGACCACGGCTGGTGGAAACAACAT
TATGAATGGCGAGGCAATCGCTGGCACCTACACGGACCGCCGCCACCGCCGCGCCACCAT
AAGAAAGCTCCTCATGATCATCACGGCGGTCATGGTCCAGGCAAACATCACCGCTAA

This starts GTG which is a valid bacterial start codon. I'd like to translate
this and get the actual biologically relevant protein as given in the GenBank
file NC_000913.gbk (maybe with or without the stop symbol at the end).
See: CDS 5234..5530 /gene="yaaX" /locus_tag="b0005" /codon_start=1 /transl_table=11 /product="predicted protein" /protein_id="NP_414546.1" /db_xref="ASAP:ABE-0000015" /db_xref="UniProtKB/Swiss-Prot:P75616" /db_xref="GI:16127999" /db_xref="ECOCYC:G6081" /db_xref="EcoGene:EG14384" /db_xref="GeneID:944747" /translation="MKKMQSIVLALSLVLVAPMAAQAAEITLVPSVKLQIGDRDNRGY YWDGGHWRDHGWWKQHYEWRGNRWHLHGPPPPPRHHKKAPHDHHGGHGPGKHHR" Without any non-standard start codon support, my translations start with a V: >>> print record.seq.translate(table=11) VKKMQSIVLALSLVLVAPMAAQAAEITLVPSVKLQIGDRDNRGYYWDGGHWRDHGWWKQHYEWRGNRWHLHGPPPPPRHHKKAPHDHHGGHGPGKHHR* >>> print record.seq.translate(table=11, to_stop=True) VKKMQSIVLALSLVLVAPMAAQAAEITLVPSVKLQIGDRDNRGYYWDGGHWRDHGWWKQHYEWRGNRWHLHGPPPPPRHHKKAPHDHHGGHGPGKHHR With this proposed functionality I can obtain the desired results (both with and without the terminator stop symbol): >>> print record.seq.translate(table=11, to_stop=True, init=True) MKKMQSIVLALSLVLVAPMAAQAAEITLVPSVKLQIGDRDNRGYYWDGGHWRDHGWWKQHYEWRGNRWHLHGPPPPPRHHKKAPHDHHGGHGPGKHHR >>> print record.seq.translate(table=11, init=True) MKKMQSIVLALSLVLVAPMAAQAAEITLVPSVKLQIGDRDNRGYYWDGGHWRDHGWWKQHYEWRGNRWHLHGPPPPPRHHKKAPHDHHGGHGPGKHHR* I think that wanting to translate a CDS like this is a fairly common operation. Perhaps not as common as translation of a partial sequence, or translating whole genomes or contigs where we want to translate through the stop codons -- but nevertheless, a common need. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Tue Nov 4 22:47:02 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 4 Nov 2008 17:47:02 -0500 Subject: [Biopython-dev] [Bug 2629] Updated Bio.NaiveBayes to listfns import In-Reply-To: Message-ID: <200811042247.mA4Ml2At014897@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2629 ------- Comment #7 from bsouthey at gmail.com 2008-11-04 17:47 EST ------- (In reply to comment #6) > I've committed Peter's fix for the set import to CVS. > > About the replacement for listfns.contents in the modified NaiveBayes code: Did > you do any timings to compare the new code to the old code? Since > listfns.contents is implemented in C, it may be (much) faster than the > replacement code. > (Hopefully I created a patch correctly.) The purpose of listfns.contents() is to compute the frequency of each class and return it as a dictionary. There is a difference but it is very small between the different versions (1/100ths of second) for what I have looked at (which is more than the actual listfns.contents function). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Tue Nov 4 22:48:12 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 4 Nov 2008 17:48:12 -0500 Subject: [Biopython-dev] [Bug 2631] Updated Bio.MaxEntropy to remove listfns import In-Reply-To: Message-ID: <200811042248.mA4MmCiZ015012@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2631 ------- Comment #6 from bsouthey at gmail.com 2008-11-04 17:48 EST ------- Created an attachment (id=1036) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1036&action=view) Patch to NaiveBayes -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Nov 5 02:33:32 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 4 Nov 2008 21:33:32 -0500 Subject: [Biopython-dev] [Bug 2631] Updated Bio.MaxEntropy to remove listfns import In-Reply-To: Message-ID: <200811050233.mA52XWrB025772@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2631 ------- Comment #7 from bsouthey at gmail.com 2008-11-04 21:33 EST ------- (In reply to comment #6) > Created an attachment (id=1036) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1036&action=view) [details] > Patch to NaiveBayes > Sorry about this as I do not know how this ended up here. Please just ignore it. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Wed Nov 5 02:35:53 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 4 Nov 2008 21:35:53 -0500 Subject: [Biopython-dev] [Bug 2629] Updated Bio.NaiveBayes to listfns import In-Reply-To: Message-ID: <200811050235.mA52Zr0b025894@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2629 bsouthey at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1014 is|0 |1 obsolete| | ------- Comment #8 from bsouthey at gmail.com 2008-11-04 21:35 EST ------- Created an attachment (id=1037) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1037&action=view) Patch to update NaiveBayes Hopefully I got this correct, if not just let me know. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Nov 5 10:24:15 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 5 Nov 2008 05:24:15 -0500 Subject: [Biopython-dev] [Bug 2629] Updated Bio.NaiveBayes to listfns import In-Reply-To: Message-ID: <200811051024.mA5AOF60024355@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2629 ------- Comment #9 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-05 05:24 EST ------- (In reply to comment #8) > Created an attachment (id=1037) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1037&action=view) [details] > Patch to update NaiveBayes > > Hopefully I got this correct, if not just let me know. > At first glance it looks like this patch would remove the Python 2.3 set work around. Easily fixed. 
Also, I would have called the new get_content_freq function _get_content_freq (leading underscore denoting private) as this is an implementation detail that doesn't need to be part of the public API.

I'm curious what your other implementations looked like, as this one does not look that clear to me at first read:

p_contents=1.0/len(contents)
content_freqs={}
for cval in contents:
    vcount=content_freqs.get(cval,0)+p_contents
    content_freqs.update({cval:vcount})

In particular, why use the dict update method?

Given the possible rounding issues, does doing the rescaling (dividing by the number of elements) at the start make a big time saving (over dividing each total at the end)? I would feel happier with the division at the end (as done in the listfns code).

-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Wed Nov 5 12:06:04 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 5 Nov 2008 07:06:04 -0500
Subject: [Biopython-dev] [Bug 2640] Proposal: doctest for SeqRecord/biopython
In-Reply-To:
Message-ID: <200811051206.mA5C64Pg030176@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2640

------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-05 07:06 EST -------
I've updated Bio.Seq, Bio.SeqIO and Bio.AlignIO so my existing docstring examples can be used with doctest. Adding code via the __main__ trick to allow each module's test to be run individually might be worthwhile.

The rest of this message is a possible "test_docstrings.py" file for our unit tests, which would require manual updating whenever we want to test an additional module. This is probably a neat short term solution while only a relatively small proportion of Biopython uses doctests.
----------------------------------------------------------------- #!/usr/bin/env python # This code is part of the Biopython distribution and governed by its # license. Please see the LICENSE file that should have been included # as part of this package. import doctest, unittest from Bio import Seq, SeqRecord, SeqIO, AlignIO test_modules = [Seq, SeqRecord, SeqIO, AlignIO] test_suite = unittest.TestSuite((doctest.DocTestSuite(module) \ for module in test_modules)) #Using sys.stdout prevent this working nicely when run from idle: #runner = unittest.TextTestRunner(sys.stdout, verbosity = 0) #Using verbosity = 0 means we won't have to regenerate the unit #test output file used by the run_tests.py framework whenever a #new module or doctest is added. runner = unittest.TextTestRunner(verbosity = 0) runner.run(test_suite) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Nov 5 13:12:28 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 5 Nov 2008 08:12:28 -0500 Subject: [Biopython-dev] [Bug 2622] Parsing between position locations like 5933^5934 in GenBank/EMBL files In-Reply-To: Message-ID: <200811051312.mA5DCSYZ004411@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2622 chapmanb at 50mail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |RESOLVED Resolution| |FIXED ------- Comment #4 from chapmanb at 50mail.com 2008-11-05 08:12 EST ------- Fixed with Bio/GenBank/__init__.py 1.93, Bio/SeqFeature.py 1.14. Coordinates are now passed correctly with Peter's suggested fix. The empty slice issue is resolved by adding this as a special case to FeatureLocation nofuzzy attribute retrieval. 
For standard retrieval the classes are fully available to the user and they would need to make the distinction about how they would like to treat them. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Nov 5 13:14:51 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 5 Nov 2008 08:14:51 -0500 Subject: [Biopython-dev] [Bug 2225] Do something with the PROJECT line in GenBank files In-Reply-To: Message-ID: <200811051314.mA5DEpVe004918@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2225 chapmanb at 50mail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #1 from chapmanb at 50mail.com 2008-11-05 08:14 EST ------- Fixed with Bio/GenBank/__init__.py 1.93, Bio/GenBank/Record.py 1.11 and Bio/GenBank/Scanner.py 1.24 The PROJECT line is parsed as a list of projects for both SeqIO and Record based parsing, for consistency. Output of PROJECT line also added. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Wed Nov 5 13:18:22 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 5 Nov 2008 08:18:22 -0500 Subject: [Biopython-dev] [Bug 2629] Updated Bio.NaiveBayes to listfns import In-Reply-To: Message-ID: <200811051318.mA5DIMPJ005649@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2629 ------- Comment #10 from mdehoon at ims.u-tokyo.ac.jp 2008-11-05 08:18 EST ------- See http://coreygoldberg.blogspot.com/2008/07/python-counting-items-in-list.html for some timings of this operation. I think Bruce's approach is most suitable, except for the dict update method; I would use content_freqs[cval] = content_freqs.get(cval,0)+p_contents instead. Depending on the contents of the list, sometimes it runs even faster than the implementation in listfns. > > Given the possible rounding issues, does doing the rescaling (dividing by the > number of elements) at the start make a big time saving (over dividing each > total at the end)? I would feel happier with the division at the end (as done > in the listfns code). > I think the rescaling at the start is a good thing. If the list contains many different objects, rescaling at the end can take a long time. Probably that is not the typical use case here, but on the other hand I don't see a good reason not to save time here. Maybe just my nitpicking, but I think the get_content_freq function will be more readable if we use different variable names inside this function. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Wed Nov 5 13:31:49 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 5 Nov 2008 08:31:49 -0500 Subject: [Biopython-dev] [Bug 2225] Do something with the PROJECT line in GenBank files In-Reply-To: Message-ID: <200811051331.mA5DVnNI007802@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2225 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|RESOLVED |REOPENED Resolution|FIXED | ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-05 08:31 EST ------- Do you think we have to worry about multiple project lines, or project entries spanning multiple lines? This would require a slight difference to the parsing (to append new project entries instead of replacing any prior entries), and to the output from the record object (including line wrapping). HOWEVER, reading the latest ftp://ftp.ncbi.nih.gov/genbank/gbrel.txt it seems the PROJECT line will be replaced with a DBLINK line next year. With that in mind, I would now suggest we parse the PROJECT and/or DBLINK lines and store them in the record.dbxrefs list (rather than in the annotations). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Wed Nov 5 13:34:41 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 5 Nov 2008 08:34:41 -0500 Subject: [Biopython-dev] [Bug 2622] Parsing between position locations like 5933^5934 in GenBank/EMBL files In-Reply-To: Message-ID: <200811051334.mA5DYfWx008228@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2622 ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-05 08:34 EST ------- Hi Brad, Looking back on this I may have been out by one on the extension calculation, i.e. I'm not 100% sure position.high.val-position.low.val is appropriate. I'll try and look at this later... Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Nov 5 16:51:07 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 5 Nov 2008 11:51:07 -0500 Subject: [Biopython-dev] [Bug 2629] Updated Bio.NaiveBayes to listfns import In-Reply-To: Message-ID: <200811051651.mA5Gp7R6003323@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2629 ------- Comment #11 from bsouthey at gmail.com 2008-11-05 11:51 EST ------- (In reply to comment #10) > See > http://coreygoldberg.blogspot.com/2008/07/python-counting-items-in-list.html > for some timings of this operation. I think Bruce's approach is most suitable, > except for the dict update method; I would use > content_freqs[cval] = content_freqs.get(cval,0)+p_contents > instead. Depending on the contents of the list, sometimes it runs even faster > than the implementation in listfns. > > > > Given the possible rounding issues, does doing the rescaling (dividing by the > > number of elements) at the start make a big time saving (over dividing each > > total at the end)? 
I would feel happier with the division at the end (as done
> > in the listfns code).
> >
> I think the rescaling at the start is a good thing. If the list contains many
> different objects, rescaling at the end can take a long time. Probably that is
> not the typical use case here, but on the other hand I don't see a good reason
> not to save time here.
>
> Maybe just my nitpicking, but I think the get_content_freq function will be
> more readable if we use different variable names inside this function.
>

(In reply to comment #10)
> See
> http://coreygoldberg.blogspot.com/2008/07/python-counting-items-in-list.html
> for some timings of this operation. I think Bruce's approach is most suitable,
> except for the dict update method; I would use
> content_freqs[cval] = content_freqs.get(cval,0)+p_contents
> instead. Depending on the contents of the list, sometimes it runs even faster
> than the implementation in listfns.

Basically the goal is to find the frequency of each class and store it in a dictionary with the keys being each class and the value being the frequency. So you could count up all observations in each class (essentially adding one to the appropriate class sum) and then divide each count by the total number of observations - as implemented in the dictget approach. Being more cryptic, we can avoid the second division by adding one/number of observations instead of one to the appropriate class sum, as implemented in get_content_freq.

Thanks for the link, I created a timing code for random lists.

get_content_freq is the one I put in the patch
get_content_freq2 is the modified version
ternary is based on Corey's code, modified to give frequencies rather than counts
dictget is using a dictionary to count then get the frequencies
listfns.contents is the Biopython Python version without the C code import.
clistfns.contents is the direct import of Biopython module that uses C code

My system is running 64-bit Fedora on Linux with Python 2.5.2.
The number of observations is not important (the difference is very small). I used 1000000 random integers and measured just doing it once, then repeated the test 5 times with 1000000 executions and took the minimum time, i.e. min(timeit.repeat(5, 1000000)). Also, this function is not called that much in NaiveBayes, so these are rather extreme cases.

Range of ints between one and two:
get_content_freq once: 1.90734863281e-05 best of 5: 8.11614704132
get_content_freq2 once: 8.10623168945e-06 best of 5: 4.39126110077
ternary file once: 1.59740447998e-05 best of 5: 9.42879796028
dictget file once: 1.4066696167e-05 best of 5: 10.468517065
listfns.contents once: 1.28746032715e-05 best of 5: 7.50778198242
clistfns.contents once: 6.91413879395e-06 best of 5: 2.71360707283

Range of ints between one and ten:
get_content_freq once: 1.90734863281e-05 best of 5: 7.97784090042
get_content_freq2 once: 7.15255737305e-06 best of 5: 4.21833491325
ternary file once: 1.69277191162e-05 best of 5: 9.18815684319
dictget file once: 1.50203704834e-05 best of 5: 10.2242910862
listfns.contents once: 1.50203704834e-05 best of 5: 7.25569987297
clistfns.contents once: 8.10623168945e-06 best of 5: 2.6411280632

Range of ints between one and one hundred:
get_content_freq once: 2.00271606445e-05 best of 5: 7.99760317802
get_content_freq2 once: 7.86781311035e-06 best of 5: 4.20446300507
ternary file once: 1.71661376953e-05 best of 5: 9.26767396927
dictget file once: 1.4066696167e-05 best of 5: 10.2449028492
listfns.contents once: 1.4066696167e-05 best of 5: 7.34166693687
clistfns.contents once: 7.15255737305e-06 best of 5: 2.63198709488

So this is not dependent on the number of classes. For the most part these numbers are showing more system overheads than major differences between the actual approaches. Therefore I would clearly go with Michiel's version.
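For readers who want to repeat this kind of comparison, here is a minimal sketch in present-day Python. It is not the timing script from the attachment: the names freq_scaled, freq_counted and best_time are hypothetical, and the list size and repeat counts are chosen only to keep the run short. It compares adding 1/n per observation against counting first and dividing at the end.

```python
import random
import timeit


def freq_scaled(items):
    """Add 1/n per observation (rescale at the start)."""
    weight = 1.0 / len(items)
    freqs = {}
    for value in items:
        freqs[value] = freqs.get(value, 0) + weight
    return freqs


def freq_counted(items):
    """Count observations first, then divide each count once at the end."""
    counts = {}
    for value in items:
        counts[value] = counts.get(value, 0) + 1
    total = float(len(items))
    return {value: n / total for value, n in counts.items()}


def best_time(func, data, repeat=5, number=10):
    """Return the minimum over `repeat` runs of `number` calls each."""
    timer = timeit.Timer(lambda: func(data))
    return min(timer.repeat(repeat=repeat, number=number))


random.seed(0)  # fixed seed so every run times the same list
data = [random.randint(1, 10) for _ in range(10000)]
for func in (freq_scaled, freq_counted):
    print("%-14s best of 5: %.6f s" % (func.__name__, best_time(func, data)))
```

The absolute numbers depend on the machine and Python version, so only the relative ordering is worth reading off.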
>
> >
> > Given the possible rounding issues, does doing the rescaling (dividing by the
> > number of elements) at the start make a big time saving (over dividing each
> > total at the end)? I would feel happier with the division at the end (as done
> > in the listfns code).
> >
> I think the rescaling at the start is a good thing. If the list contains many
> different objects, rescaling at the end can take a long time. Probably that is
> not the typical use case here, but on the other hand I don't see a good reason
> not to save time here.

From the two-case scenario above, the get_content_freq methods result in:
{1: 0.49978999999354606, 2: 0.50020999999354643}
and the others result in:
{1: 0.49979000000000001, 2: 0.50021000000000004}

On my 64-bit Linux system the numerical error is small but within expectations. It may be worse on a 32-bit system or OS. I really wanted to draw attention to this because tiny differences can be important (not to mention people who don't understand enough about numerical precision).

>
> Maybe just my nitpicking, but I think the get_content_freq function will be
> more readable if we use different variable names inside this function.
>

Please rename as necessary.

Bruce

-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
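The rounding effect discussed in this comment can be illustrated in isolation. The sketch below is an assumption-laden reconstruction, not the original test: the counts 499790 and 500210 are inferred from the reported frequencies for 1000000 observations, and the variable names are made up. It accumulates 1/n once per observation and compares that with a single division at the end.

```python
# Counts inferred from the frequencies reported for the two-value case
# (an assumption: 1,000,000 observations split 499790 / 500210).
n_ones, n_twos = 499790, 500210
total = n_ones + n_twos

# Rescale first: add 1/n once per observation, rounding at every step.
weight = 1.0 / total
accumulated = 0.0
for _ in range(n_ones):
    accumulated += weight

# Count first: a single division, hence a single rounding.
divided = n_ones / float(total)

print(repr(accumulated))
print(repr(divided))
print(abs(accumulated - divided))
```

The size of the discrepancy depends on the order and number of additions, so the exact digits may differ from those quoted in the thread.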
From bugzilla-daemon at portal.open-bio.org Wed Nov 5 17:00:42 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 5 Nov 2008 12:00:42 -0500
Subject: [Biopython-dev] [Bug 2629] Updated Bio.NaiveBayes to listfns import
In-Reply-To:
Message-ID: <200811051700.mA5H0gxV003976@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2629

------- Comment #12 from bsouthey at gmail.com 2008-11-05 12:00 EST -------
Created an attachment (id=1038)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1038&action=view)
timing different implementions of listfns.content

This is my timing code for different implementations of listfns.content. It assumes that there is a local version of listfns.py without the import clistfns statement at the end, and the clistfns function from Bio.

-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Wed Nov 5 20:30:46 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 5 Nov 2008 15:30:46 -0500
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq)
In-Reply-To:
Message-ID: <200811052030.mA5KUklP023725@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381

------- Comment #36 from bsouthey at gmail.com 2008-11-05 15:30 EST -------
(In reply to comment #35)

Okay, this is what I think of the main uses for translation. All these can be easily achieved by the translate arguments table='Standard' and stop_symbol='*' with very little code. So I do not see any need for any extra arguments except for convenience. (I have these uses in a file that I will upload after this.)
So really my only issue left is what is the expected behaviour for: a) to_stop_codon=True if there are no valid stop codons (my understanding of to_stop). b) from_start_codon=True (or init=True etc) if there are no valid start codons 1) Translation in some given forward frame - reverse frames should be obvious. Looping over these will give all three frames but that could return multiple Seq objects. 2) Translation between any range of locations. From Peter's example, extracting the region between 5234 to 5530 in the complete sequence will give the yaaX gene CDS that can be translated into the protein sequence. 3a) Translate to the first valid stop codon. Perhaps not as expected because it should respect the frame so try: 3b) Translate to the first valid stop codon with respect to selected frame. 3c) Alternatively use to_stop=True argument of the translate. Here translation is to the first valid stop codon OR the end of the sequence. This second aspect is not documented. 4a) Start translation at first start codon. Again, does not respect frame so try: 4b) Translate to the first valid start codon with respect to selected frame. In both cases of 4) the very first codon must be checked against the defined start_codon list in the appropriate CodonTable. Obviously 3) and 4) should raise exceptions if stop or start codons are not found because of the specific request to stop or start translation. But, as in 3c), this could be relaxed to include the end of the sequence. I am not sure the behaviour if there is no valid start codon. Also some variation of 3a) and 4a) could be used to find possible open reading frames (from a start codon to stop codon). But this could return more than one Seq object. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
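To make uses 1) and 3c) above concrete, here is a minimal, self-contained sketch in present-day Python. It does not use Biopython: the translate function and its frame argument are hypothetical, the codon table is built from the standard genetic code, and only upper-case unambiguous DNA is handled. With to_stop=True it stops at the first in-frame stop codon, or at the end of the sequence if there is none.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq, frame=0, to_stop=False, stop_symbol="*"):
    """Translate seq in forward frame 0, 1 or 2 (hypothetical helper).

    Trailing bases that do not form a complete codon are ignored.
    """
    protein = []
    for i in range(frame, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            if to_stop:
                break  # stop before the first in-frame stop codon
            aa = stop_symbol
        protein.append(aa)
    return "".join(protein)


seq = "ATGGCCTAAGGG"
for frame in range(3):
    print(frame, translate(seq, frame), translate(seq, frame, to_stop=True))
```

Whether a missing stop codon should instead raise an exception is exactly the open question in the comment above; this sketch simply follows the behaviour described in 3c).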
From bugzilla-daemon at portal.open-bio.org Wed Nov 5 20:33:52 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 5 Nov 2008 15:33:52 -0500 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200811052033.mA5KXqqJ023824@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 ------- Comment #37 from bsouthey at gmail.com 2008-11-05 15:33 EST ------- Created an attachment (id=1039) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1039&action=view) examples of possible uses of translate -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Nov 5 22:12:13 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 5 Nov 2008 17:12:13 -0500 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200811052212.mA5MCDhY028649@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 ------- Comment #38 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-05 17:12 EST ------- (In reply to comment #36) > (In reply to comment #35) > Okay, this is what I think of the main uses for translation. > All these can be easily achieved by the translate arguments > table='Standard' and stop_symbol='*' with very little code. > So I do not see any need for any extra arguments except > for convenience. (I have these uses in file that I will > upload after this.) Most of your examples seem to relate to open reading frame searches, looking for start/stop codons etc. I agree this kind of thing isn't needed in the basic translate method/function. 
Doing a CDS translation, however, is more fiddly due to the methionine at the start, and I think this warrants another option in the basic translate method/function.

> So really my only issue left is what is the expected behaviour for:
> a) to_stop_codon=True if there are no valid stop codons (my understanding of
> to_stop).

If you are asking about the current to_stop argument in CVS right now, if there is no in frame stop codon it will translate all the sequence (to_stop has no effect). I've just updated the docstring to make this more explicit (see Bio/Seq.py CVS revision 1.55).

Do you think "to_stop_codon" is a clearer argument name than "to_stop"?

> b) from_start_codon=True (or init=True etc) if there are no valid start codons

As written in attachment 1032, if the sequence does not start with a valid start codon an exception is raised.

-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Wed Nov 5 23:09:01 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 5 Nov 2008 18:09:01 -0500
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq)
In-Reply-To:
Message-ID: <200811052309.mA5N91aO031273@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381

------- Comment #39 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-05 18:09 EST -------
Created an attachment (id=1040)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1040&action=view)
Patch to Bio/Seq.py for complete CDS translation.
(In reply to comment #33) > Instead of the "init" start codon option in attachment 1032, > I'd also be happy with a single boolean argument which does > start codon validation, treats this as a methionine, checks > the sequence is a multiple of three in length, checks for a > final stop codon, and checks for no additional stop codons. > We'd ruled out calling this "complete", but maybe "cds" > would be better? This patch adds this functionality via a "complete_cds" boolean argument. Here is how it could be applied to translate the CDS used as an example in my comment 35, the yaaX gene in E. coli K12: >>> from Bio.Seq import Seq >>> my_cds = Seq("GTGAAAAAGATGCAATCTATCGTACTCGCACTTTCCCTGGTTCTGGTCGCTCCCATGGCAGCACAGGCTGCGGAAATTACGTTAGTCCCGTCAGTAAAATTACAGATAGGCGATCGTGATAATCGTGGCTATTACTGGGATGGAGGTCACTGGCGCGACCACGGCTGGTGGAAACAACATTATGAATGGCGAGGCAATCGCTGGCACCTACACGGACCGCCGCCACCGCCGCGCCACCATAAGAAAGCTCCTCATGATCATCACGGCGGTCATGGTCCAGGCAAACATCACCGCTAA") >>> my_cds.translate(table=11) Seq('VKKMQSIVLALSLVLVAPMAAQAAEITLVPSVKLQIGDRDNRGYYWDGGHWRDH...HR*', HasStopCodon(ExtendedIUPACProtein(), '*')) >>> my_cds.translate(table=11, to_stop=True) Seq('VKKMQSIVLALSLVLVAPMAAQAAEITLVPSVKLQIGDRDNRGYYWDGGHWRDH...HHR', ExtendedIUPACProtein()) >>> my_cds.translate(table=11, complete_cds=True) Seq('MKKMQSIVLALSLVLVAPMAAQAAEITLVPSVKLQIGDRDNRGYYWDGGHWRDH...HHR', ExtendedIUPACProtein()) I would be happy with EITHER of these options, as both can be used to translate a complete coding sequence: (1) the "init" argument (under another name, maybe "cds_start"?) illustrated in attachment 1032. This would check the start codon is valid AND translate it as a methionine. (2) the "complete_cds" argument (perhaps under another name, maybe "cds"?) illustrated in this patch. This would check the start codon is valid AND translate it as a methionine AND check there are a whole number of codons AND check it ends with a stop codon AND check there are no extra in-frame stop codons. 
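The checks listed under option (2) can be spelled out as a small stand-alone sketch in present-day Python. This is not the code in the attached patch: translate_complete_cds is a hypothetical name, the amino acid assignments are those of the standard code (which NCBI table 11 shares), and the start codon set is the one NCBI lists for table 11. Only upper-case unambiguous DNA is handled.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
# Start codons of NCBI translation table 11 (bacterial, archaeal, plastid).
START_CODONS = {"TTG", "CTG", "ATT", "ATC", "ATA", "ATG", "GTG"}


def translate_complete_cds(seq):
    """Translate a complete CDS, validating it first (hypothetical helper)."""
    if len(seq) % 3:
        raise ValueError("length is not a multiple of three")
    if seq[:3] not in START_CODONS:
        raise ValueError("first codon %r is not a start codon" % seq[:3])
    if CODON_TABLE.get(seq[-3:]) != "*":
        raise ValueError("final codon %r is not a stop codon" % seq[-3:])
    # Translate everything between the start codon and the final stop codon.
    body = [CODON_TABLE[seq[i:i + 3]] for i in range(3, len(seq) - 3, 3)]
    if "*" in body:
        raise ValueError("extra in-frame stop codon found")
    return "M" + "".join(body)  # the start codon is always read as methionine


# First four codons of the yaaX example followed by a stop codon.
print(translate_complete_cds("GTGAAAAAGATGTAA"))
```

Option (1) corresponds to keeping only the start codon check and the leading "M", leaving the other validation to the caller.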
-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Thu Nov 6 11:14:07 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 6 Nov 2008 06:14:07 -0500
Subject: [Biopython-dev] [Bug 2639] SeqRecord.init doesn't check for arguments to their types
In-Reply-To:
Message-ID: <200811061114.mA6BE7jk002000@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2639

------- Comment #3 from dalloliogm at gmail.com 2008-11-06 06:14 EST -------
Created an attachment (id=1041)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1041&action=view)
add a check for the seq argument in seqrecord, to be a Seq object and not None

This patch adds a check for the seq argument in SeqRecord. If seq is None (the default), it raises a ValueError exception. If it is a Seq object, it is saved as self.seq. If it is another kind of object (string, list, integer), it is converted to a string and then used to instantiate a Seq object.

I thought that someone could use an integer (e.g.: 010100010101101) as a sequence, and in this case, the integer is first converted to a string (otherwise Seq() would return an error).

Please take care with this patch: I have messed a bit with cvs and patches :(, so this patch also contains a doctest example that I have added for myself (see bug report 2640).

-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
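The behaviour described for this patch can be sketched as follows. This is only an illustration in present-day Python, not the attached patch: Seq and SeqRecord here are minimal stand-in classes, not the Biopython ones, and the error messages are made up.

```python
class Seq:
    """Stand-in for Bio.Seq.Seq that only accepts strings (an assumption)."""

    def __init__(self, data):
        if not isinstance(data, str):
            raise TypeError("Seq data must be a string")
        self.data = data


class SeqRecord:
    """Stand-in record showing the argument check described in the patch."""

    def __init__(self, seq=None, id="<unknown id>"):
        if seq is None:
            raise ValueError("seq must not be None")
        if not isinstance(seq, Seq):
            # Anything else (string, list, integer) goes through str() first.
            seq = Seq(str(seq))
        self.seq = seq
        self.id = id


print(SeqRecord("ACGT").seq.data)
print(SeqRecord(10110).seq.data)
```

One caveat with integers: str() cannot recover leading zeros, so a value written with leading zeros would not round-trip to the same sequence string.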
From bugzilla-daemon at portal.open-bio.org Thu Nov 6 11:31:57 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 6 Nov 2008 06:31:57 -0500
Subject: [Biopython-dev] [Bug 2643] New: Proposal: fastPhaseOutputIO for SeqIO
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2643

           Summary: Proposal: fastPhaseOutputIO for SeqIO
           Product: Biopython
           Version: Not Applicable
          Platform: PC
               URL: http://github.com/dalloliogm/biopython---popgen/tree/master/src/PopGen/Gio/fastPhaseOutputIO.py
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: dalloliogm at gmail.com
                CC: tiagoantao at gmail.com

Hi,
fastPHASE is software for haplotype reconstruction and missing genotype estimation from population genetic SNP data.
- http://stephenslab.uchicago.edu/software.html

It is commonly used by some population genetics bioinformaticians.

I had to convert the output from a fastPHASE run to FASTA, so I wrote a module that reads a fastPHASE output file and returns SeqRecord objects.

fastPHASE output contains information about SNPs and genotyping, and would probably be supported by the PopGen module that is being written for Biopython. However, my module is intended only to read the sequence information from the output file and to create SeqRecord objects, ignoring any other kind of information. So, in the future we could have two fastPhaseOutputIterator-like modules: one that creates SeqRecord objects, and another to be used in PopGen.

The module has been tested with doctest. I'll attach a file with the tests along with the module.
From bugzilla-daemon at portal.open-bio.org Thu Nov 6 11:40:17 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 6 Nov 2008 06:40:17 -0500
Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO
In-Reply-To:
Message-ID: <200811061140.mA6BeHwc003465@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2643

------- Comment #1 from dalloliogm at gmail.com 2008-11-06 06:40 EST -------
Created an attachment (id=1042)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1042&action=view)
fastPhase output iterator, for SeqIO

If invoked directly, this module tries to call doctest.testfile on a file called test_fastPhaseOutputIO.py (I will post it in 5 minutes). You should edit this module to point it to the right file path on your computer.

This module is meant to be used with SeqIO. You should modify SeqIO.__init__.py and add it to the _FormatToIterator dictionary.

I didn't write a Writer handler, because you are not supposed to create fastPhaseOutput files manually (even if it could be useful for testing purposes).

You can see the git history of this module here:
- http://github.com/dalloliogm/biopython---popgen/tree/master/src/PopGen/Gio/fastPhaseOutputIO.py
From bugzilla-daemon at portal.open-bio.org Thu Nov 6 11:42:55 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 6 Nov 2008 06:42:55 -0500
Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO
In-Reply-To:
Message-ID: <200811061142.mA6Bgt77003705@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2643

------- Comment #2 from dalloliogm at gmail.com 2008-11-06 06:42 EST -------
Created an attachment (id=1043)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1043&action=view)
this is a doctest file to test fastPhaseOutputIterator

This file is called by fastPhaseOutputIO when __name__ == '__main__' (that is, when the module is run directly).

From bugzilla-daemon at portal.open-bio.org Thu Nov 6 11:44:55 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 6 Nov 2008 06:44:55 -0500
Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO
In-Reply-To:
Message-ID: <200811061144.mA6BitTU003910@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2643

------- Comment #3 from dalloliogm at gmail.com 2008-11-06 06:44 EST -------
Created an attachment (id=1044)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1044&action=view)
adds fastPhaseOutput support to SeqIO

This patch adds fastPhaseOutput support to SeqIO (not tested).
From bugzilla-daemon at portal.open-bio.org Thu Nov 6 11:50:39 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 6 Nov 2008 06:50:39 -0500 Subject: [Biopython-dev] [Bug 2639] SeqRecord.init doesn't check for arguments to their types In-Reply-To: Message-ID: <200811061150.mA6Bod9J004289@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2639 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|RESOLVED |REOPENED Resolution|FIXED | ------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-06 06:50 EST ------- (In reply to comment #3) > Created an attachment (id=1041) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1041&action=view) [details] > add a check for the seq argument in seqrecord, to be a Seq object and not None > > This patch adds a check for the seq argument in SeqRecord. > If seq is None (by default), it raises a ValueError Exception. > If it is a Seq objects, it saves it as self.seq. > If it is another kind of object (string, list, integer), it is converted to a > string, and then used to instantiate a seq object. I was deliberately not checking the seq argument. There are several reasonable use cases: * a Seq object (normal) or a subclass of it. * a MutableSeq object (seems reasonable, note this is not a subclass of Seq) * None (seems a good way to handle sequence records where we don't know the sequence - for example some GenBank files). * a user defined sequence object which implements the Seq API but does not subclass Seq or MutableSeq (this is more difficult to check). > I thought that someone could use an integer (e.g.: 010100010101101) as a > sequence, and in this case, the integer is first converted to a string > (otherwise Seq() would return an error). 
Note that if someone did want to use some weird numerical sequence, then the SeqRecord object should NOT be trying to do anything special (guessing what is intended). The user should create a suitable Seq object themselves (ideally with a numerical alphabet object). Explicit rather than implicit (Zen of python). -- Note that I'm not 100% happy with the type checking we've just added. See "duck-typing" and interfaces versus types, http://www.python.org/doc/2.5.2/tut/node18.html#l2h-46 The checks I've added shouldn't be too constraining - but maybe they should use using interface checking instead (or just revert back to no checking). Any comments from other people? This should be being CC'd to the dev mailing list. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Nov 6 12:14:04 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 6 Nov 2008 07:14:04 -0500 Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO In-Reply-To: Message-ID: <200811061214.mA6CE4PD005743@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2643 ------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-06 07:14 EST ------- Hi Marco, This looks interesting :) Could you attach the individual valid sample fastPHASE files as separate attachments (so they can be integrated into the existing unit tests). You seem to have picked very small files in order to use them as doctests; a larger more realistic example would be better for the unit tests (a few 5kb in size should be OK - not too big). Do you have URL for the file format documentation? Are they always DNA for example, or is RNA also possible? If you want to include a fastPHASE parser in Bio.SeqIO it should ideally cope with any valid fastPHASE output. 
In the doctests you have an example: ... BEGIN GENOTYPES ... Ind1 # subpop. label: 6 (internally 1) ... T ... T C ... Ind2 # subpop. label: 6 (internally 1) ... C ... T ... END GENOTYPES You're treating this as an error - "Two chromosomes with different length". Why isn't it parsed as four short sequences (of different lengths): "T", "TC", "C", "T"? Similarly, the final example: ... BEGIN GENOTYPES ... Ind1 # subpop. label: 6 (internally 1) ... T T T T T G A A A C C A A A G A C G C T G C G T C A G C C T G C A A T C T G ... Ind2 # subpop. label: 6 (internally 1) ... C T T T T G C C C T C A A A A G T G C T G T G C C A G T C T A C G G C C T G ... T T T T T G A A A C C A A A G A C G C T T C G T C A G T A T A C G A T C T A ... END GENOTYPES Again, you raised an error - "Missing sequence in input file". If this is a valid file shouldn't it be parsed as three sequences? On the other hand, are these hand edited files which deliberately break the rules? If fastPHASE files SHOULD always come in allele groups (of the same length), then it would be better to integrate the parser into Bio.AlignIO giving pairwise alignments (and you would be able to read it via Bio.SeqIO automatically as well). P.S. Your suggested format name "fastPhaseOutput" breaks the lower case rule. Would "fastphase" be OK, or is there more than one format? e.g. an input format which might be confused with this? Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
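To make the "allele groups" reading concrete, here is a minimal sketch of a parser that treats each individual as a pair of equal-length haplotypes. It is based only on the doctest excerpts quoted in this comment, not on the fastPHASE manual, so the layout is an assumption: an identifier line containing "#", followed by two lines of space-separated single-character alleles. The function name is hypothetical, and the error messages reuse the wording from the doctests.

```python
def parse_fastphase_genotypes(lines):
    """Return a list of (name, haplotype1, haplotype2) tuples.

    Assumed layout (taken from the doctest excerpts only): an identifier
    line containing '#', then two lines of space-separated alleles.
    """
    individuals = []  # each entry is [name, [haplotype, ...]]
    inside = False
    for raw in lines:
        line = raw.strip()
        if line == "BEGIN GENOTYPES":
            inside = True
        elif line == "END GENOTYPES":
            break
        elif inside and line:
            if "#" in line:
                # Identifier line: keep the text before the '#'.
                individuals.append([line.split("#")[0].strip(), []])
            elif not individuals:
                raise ValueError("Haplotype line before any individual")
            else:
                # Haplotype line: drop the spaces between alleles.
                individuals[-1][1].append("".join(line.split()))
    pairs = []
    for name, haplotypes in individuals:
        if len(haplotypes) != 2:
            raise ValueError("Missing sequence in input file (%s)" % name)
        if len(haplotypes[0]) != len(haplotypes[1]):
            raise ValueError("Two chromosomes with different length (%s)" % name)
        pairs.append((name, haplotypes[0], haplotypes[1]))
    return pairs


example = """BEGIN GENOTYPES
Ind1 # subpop. label: 6 (internally 1)
T T C
T C C
Ind2 # subpop. label: 6 (internally 1)
C T A
T T A
END GENOTYPES
""".splitlines()

for name, first, second in parse_fastphase_genotypes(example):
    print(name, first, second)
```

Each returned pair would map naturally onto a two-row alignment, which is the Bio.AlignIO route suggested above.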
From bugzilla-daemon at portal.open-bio.org Thu Nov 6 12:21:09 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 6 Nov 2008 07:21:09 -0500 Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO In-Reply-To: Message-ID: <200811061221.mA6CL9e8006180@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2643 ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-06 07:21 EST ------- (In reply to comment #4) > You seem to have picked very small files in order to use them as > doctests; a larger more realistic example would be better for the > unit tests (a few 5kb in size should be OK - not too big). Sorry - that was a typo. I meant a few kb in size (5kb should be OK). I don't have a feel for the typical size of real fastPHASE output, but a few interesting real examples (e.g. covering a range of fastPHASE command line options) would be better than a single large file. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Nov 6 12:25:42 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 6 Nov 2008 07:25:42 -0500 Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO In-Reply-To: Message-ID: <200811061225.mA6CPgsn006472@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2643 ------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-06 07:25 EST ------- P.S. The module's docstring needs some work - your introduction for this bug might be a good start. 
We should include the URL http://stephenslab.uchicago.edu/software.html and the reference in the module's docstring:

Scheet, P and Stephens, M (2006) "A fast and flexible statistical model for large-scale population genotype data: applications to inferring missing genotypes and haplotypic phase." Am J Hum Genet 78(4):629-44.

From tiagoantao at gmail.com Thu Nov 6 13:18:54 2008
From: tiagoantao at gmail.com (Tiago Antão)
Date: Thu, 6 Nov 2008 13:18:54 +0000
Subject: [Biopython-dev] Preparing for Biopython 1.49 (beta)
In-Reply-To: <320fb6e00811040336k12a834b9o2fa103b8fabf7ec1@mail.gmail.com>
References: <320fb6e00811040336k12a834b9o2fa103b8fabf7ec1@mail.gmail.com>
Message-ID: <6d941f120811060518w388bd471g129aafdaf02381d4@mail.gmail.com>

On Tue, Nov 4, 2008 at 11:36 AM, Peter wrote:
> If this schedule is realistic, then Tiago should be OK to add his next
> set of PopGen code in about two weeks time (for what would become
> Biopython 1.50).

I am working on documentation and test cases for LDNe and extra GenePop support (this is more or less orthogonal to the ongoing discussion on statistics); the code has been done for weeks. I will start uploading it as soon as you unfreeze CVS after 1.49.
From bugzilla-daemon at portal.open-bio.org Thu Nov 6 14:24:12 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 6 Nov 2008 09:24:12 -0500 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200811061424.mA6EOCcB015073@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 ------- Comment #40 from bsouthey at gmail.com 2008-11-06 09:24 EST ------- (In reply to comment #38) > (In reply to comment #36) > > (In reply to comment #35) > > Okay, this is what I think of the main uses for translation. > > All these can be easily achieved by the translate arguments > > table='Standard' and stop_symbol='*' with very little code. > > So I do not see any need for any extra arguments except > > for convenience. (I have these uses in file that I will > > upload after this.) > > Most of your examples seem to relate to open reading frame searches, looking > for start/stop codons etc. I agree this kind of thing isn't needed in the > basic translate method/function. > > Doing a CDS translation however is more fiddly due to the methionine at the > start, and I think this warrents another option in the basic translate > method/function. > > > So really my only issue left is what is the expected behaviour for: > > a) to_stop_codon=True if there are no valid stop codons (my understanding of > > to_stop). > > If you are asking about the current to_stop argument in CVS right now, if there > is no in frame stop codon it will translate all the sequence (to_stop has no > effect). I've just updated the docstring to make this more explicit (see > Bio/Seq.py CVS revision 1.55). > > Do you think "to_stop_codon" is a clearer argument name than "to_stop"? > I think to_end because end does mean the end of the translation due to a stop codon or end of a sequence. 
> > b) from_start_codon=True (or init=True etc) if there are no valid start codons > > As written in attachment 1032 [details], if the sequence does not start with a valid > start codon an exception is raised. > Okay. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Nov 6 14:35:40 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 6 Nov 2008 09:35:40 -0500 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200811061435.mA6EZe5F015831@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 ------- Comment #41 from lpritc at scri.sari.ac.uk 2008-11-06 09:35 EST ------- (In reply to comment #40) > > If you are asking about the current to_stop argument in CVS right now, if there > > is no in frame stop codon it will translate all the sequence (to_stop has no > > effect). I've just updated the docstring to make this more explicit (see > > Bio/Seq.py CVS revision 1.55). > > > > Do you think "to_stop_codon" is a clearer argument name than "to_stop"? > > > I think to_end because end does mean the end of the translation due to a stop > codon or end of a sequence. I would take 'to_end' to mean 'to the end of the passed sequence, ignoring all stop codons along the way'. 'to_first_stop' is clearer, to my mind, and even that leaves out the potential (and hopefully redundant) qualifier 'in-frame' ;) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Thu Nov 6 14:46:48 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 6 Nov 2008 09:46:48 -0500 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200811061446.mA6Ekmfj016554@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 ------- Comment #42 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-06 09:46 EST ------- Peter in comment #40 >>> If you are asking about the current to_stop argument in CVS right now, >>> if there is no in frame stop codon it will translate all the sequence >>> (to_stop has no effect). I've just updated the docstring to make this >>> more explicit (see Bio/Seq.py CVS revision 1.55). >>> >>> Do you think "to_stop_codon" is a clearer argument name than "to_stop"? >>> Bruce in comment #41: >> I think to_end because end does mean the end of the translation >> due to a stop codon or end of a sequence. >> Leighton in comment #42: > I would take 'to_end' to mean 'to the end of the passed sequence, > ignoring all stop codons along the way'. 'to_first_stop' is > clearer, to my mind, and even that leaves out the potential (and > hopefully redundant) qualifier 'in-frame' ;) > I agree with Leighton here, "to_end" sounds like "to the end of the sequence given". I quite like "to_first_stop", but it is longer than "to_stop". -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
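Whatever the argument ends up being called, the behaviour under discussion (stop at the first in-frame stop codon; if there is none, translate the whole sequence) can be sketched in plain Python. This is an illustration only, not the code in Bio/Seq.py: the names are hypothetical, only unambiguous DNA is handled, and a trailing partial codon is silently ignored, which is a simplification made for this sketch.

```python
# Illustrative sketch only; names are hypothetical, not Biopython API.
BASES = "TCAG"
# Amino acids for the 64 codons in TCAG order (standard genetic code).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(
    (a + b + c for a in BASES for b in BASES for c in BASES),
    AMINO_ACIDS))


def translate(dna, to_stop=False):
    """Translate DNA; with to_stop=True, end at the first in-frame stop."""
    dna = dna.upper()
    protein = []
    # Walk codon by codon, ignoring any trailing partial codon.
    for i in range(0, len(dna) - len(dna) % 3, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if to_stop and amino_acid == "*":
            break  # first in-frame stop codon ends the translation
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGAAATAAGGG"))                # stop shown as '*', translation continues
print(translate("ATGAAATAAGGG", to_stop=True))  # ends before the first stop
print(translate("ATGAAAGGG", to_stop=True))     # no stop codon: to_stop has no effect
```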
From bugzilla-daemon at portal.open-bio.org Thu Nov 6 15:07:06 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 6 Nov 2008 10:07:06 -0500 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200811061507.mA6F76PK018513@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 ------- Comment #43 from bsouthey at gmail.com 2008-11-06 10:07 EST ------- (In reply to comment #39) > Created an attachment (id=1040) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1040&action=view) [details] > Patch to Bio/Seq.py for complete CDS translation. > > (In reply to comment #33) > > Instead of the "init" start codon option in attachment 1032 [details], > > I'd also be happy with a single boolean argument which does > > start codon validation, treats this as a methionine, checks > > the sequence is a multiple of three in length, checks for a > > final stop codon, and checks for no additional stop codons. > > We'd ruled out calling this "complete", but maybe "cds" > > would be better? > > This patch adds this functionality via a "complete_cds" boolean argument. > > Here is how it could be applied to translate the CDS used as an example in my > comment 35, the yaaX gene in E. 
coli K12: > > >>> from Bio.Seq import Seq > >>> my_cds = Seq("GTGAAAAAGATGCAATCTATCGTACTCGCACTTTCCCTGGTTCTGGTCGCTCCCATGGCAGCACAGGCTGCGGAAATTACGTTAGTCCCGTCAGTAAAATTACAGATAGGCGATCGTGATAATCGTGGCTATTACTGGGATGGAGGTCACTGGCGCGACCACGGCTGGTGGAAACAACATTATGAATGGCGAGGCAATCGCTGGCACCTACACGGACCGCCGCCACCGCCGCGCCACCATAAGAAAGCTCCTCATGATCATCACGGCGGTCATGGTCCAGGCAAACATCACCGCTAA") > >>> my_cds.translate(table=11) > Seq('VKKMQSIVLALSLVLVAPMAAQAAEITLVPSVKLQIGDRDNRGYYWDGGHWRDH...HR*', > HasStopCodon(ExtendedIUPACProtein(), '*')) > >>> my_cds.translate(table=11, to_stop=True) > Seq('VKKMQSIVLALSLVLVAPMAAQAAEITLVPSVKLQIGDRDNRGYYWDGGHWRDH...HHR', > ExtendedIUPACProtein()) > >>> my_cds.translate(table=11, complete_cds=True) > Seq('MKKMQSIVLALSLVLVAPMAAQAAEITLVPSVKLQIGDRDNRGYYWDGGHWRDH...HHR', > ExtendedIUPACProtein()) > > I would be happy with EITHER of these options, as both can be used to translate > a complete coding sequence: > > (1) the "init" argument (under another name, maybe "cds_start"?) illustrated in > attachment 1032 [details]. This would check the start codon is valid AND translate it as > a methionine. > > (2) the "complete_cds" argument (perhaps under another name, maybe "cds"?) > illustrated in this patch. This would check the start codon is valid AND > translate it as a methionine AND check there are a whole number of codons AND > check it ends with a stop codon AND check there are no extra in-frame stop > codons. > I support (1) but strongly disagree with (2) because 'cds' refers to a complete DNA sequence not just if the sequence starts with M. http://www.yeastgenome.org/help/glossary.html "CDS: CoDing Sequence, region of nucleotides that corresponds to the sequence of amino acids in the predicted protein. The CDS includes start and stop codons, therefore coding sequences begin with an "ATG" and end with a stop codon. In SGD, unexpressed sequences, including the 5'-UTR, the 3'-UTR, introns, or bases not expressed due to frameshifting, are not included within a CDS. 
Note that the CDS does not correspond to the actual mRNA sequence." However, I do like being able to obtain the translation of the actual CDS - just not here. I do not support the name 'init' because of reasons discussed. I do not support the name 'cds_start' because of the DNA interpretation and that many Genbank records include the upstream and downstream non-coding regions. In such cases, I would have to find the actual start codon, then I might as well do the translation after that start codon than rely on a check that might be wrong. Perhaps some variant of: a) Similar cases in Python: has_met or has_met1 get_met or get_met1 b) More direct meaning: starts_with_methionine, starts_with_met, starts_with_m -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Nov 6 15:08:17 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 6 Nov 2008 10:08:17 -0500 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200811061508.mA6F8HRo018696@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 ------- Comment #44 from bsouthey at gmail.com 2008-11-06 10:08 EST ------- (In reply to comment #42) > Peter in comment #40 > >>> If you are asking about the current to_stop argument in CVS right now, > >>> if there is no in frame stop codon it will translate all the sequence > >>> (to_stop has no effect). I've just updated the docstring to make this > >>> more explicit (see Bio/Seq.py CVS revision 1.55). > >>> > >>> Do you think "to_stop_codon" is a clearer argument name than "to_stop"? > >>> > > Bruce in comment #41: > >> I think to_end because end does mean the end of the translation > >> due to a stop codon or end of a sequence. 
> >>
> Leighton in comment #42:
> > I would take 'to_end' to mean 'to the end of the passed sequence,
> > ignoring all stop codons along the way'. 'to_first_stop' is
> > clearer, to my mind, and even that leaves out the potential (and
> > hopefully redundant) qualifier 'in-frame' ;)
> >
>
> I agree with Leighton here, "to_end" sounds like "to the end of the sequence
> given". I quite like "to_first_stop", but it is longer than "to_stop".
>

Either is fine with me.

From bugzilla-daemon at portal.open-bio.org Thu Nov 6 15:11:38 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 6 Nov 2008 10:11:38 -0500
Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO
In-Reply-To:
Message-ID: <200811061511.mA6FBcAY019165@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2643

------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-06 10:11 EST -------
I've now had a quick look at the fastPHASE documentation, and I have the impression that the sequences should always come in pairs:

"Output files for inferred haplotypes or imputed genotypes contain two lines per given diploid individual, with the order of individuals corresponding to that supplied in the input file."

Assuming the paired sequences are always the same length, this does suggest the format should be integrated into Bio.AlignIO (giving pairwise alignments) rather than Bio.SeqIO.

Have you tried not estimating the haplotypes (by supplying a negative integer following -H), and does this alter the sequence output?

Finally could you try the -Z command line argument for the simplified output format (described as two lines per individual, without "id" lines, subpopulation labels or summary information from the run).
Does this have the sequences? If so this may be a more parser friendly set of output to parse for Bio.SeqIO and/or Bio.AlignIO. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Nov 6 15:27:07 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 6 Nov 2008 10:27:07 -0500 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200811061527.mA6FR7TQ021259@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 ------- Comment #45 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-06 10:27 EST ------- (In reply to comment #43) > (In reply to comment #39) > > I would be happy with EITHER of these options, as both can be used to > > translate a complete coding sequence: > > > > (1) the "init" argument (under another name, maybe "cds_start"?) > > illustrated in attachment 1032. This would check the start > > codon is valid AND translate it as a methionine. > > > > (2) the "complete_cds" argument (perhaps under another name, maybe "cds"?) > > illustrated in this patch. This would check the start codon is valid AND > > translate it as a methionine AND check there are a whole number of codons > > AND check it ends with a stop codon AND check there are no extra in-frame > > stop codons. > > > > > I support (1) but strongly disagree with (2) because 'cds' refers to > a complete DNA sequence not just if the sequence starts with M. > http://www.yeastgenome.org/help/glossary.html > "CDS: CoDing Sequence, region of nucleotides that corresponds to the > sequence of amino acids in the predicted protein. The CDS includes start and > stop codons, therefore coding sequences begin with an "ATG" and end with a > stop codon. 
In SGD, unexpressed sequences, including the 5'-UTR, the 3'-UTR, > introns, or bases not expressed due to frameshifting, are not included within > a CDS. Note that the CDS does not correspond to the actual mRNA sequence." Starting with that definition but being aware of atypical start codons gives: "The CDS includes start and stop codons, therefore coding sequences begin with an "ATG" [or other valid start codon] and end with a stop codon." This then fits exactly with what I'm doing in the "complete_cds" option (attachment 1040). So why the disagreement? > However, I do like being able to obtain the translation of the actual > CDS - just not here. Back in comment 11, I previously mooted having separate methods like translate_to_stop, and translate_cds - but we currently seem to be leaning towards one method with some options. > I do not support the name 'init' because of reasons discussed. I think that is settled, "init" is too ambiguous. > I do not support the name 'cds_start' because of the DNA interpretation and > that many Genbank records include the upstream and downstream non-coding > regions. In such cases, I would have to find the actual start codon, then I > might as well do the translation after that start codon than rely on a check > that might be wrong. In such cases, if your sequence might includes upstream and downstream non-coding regions, then you shouldn't be trying to use the "init"/"cds_start" option (or the "complete_cds" option). By the nature of your uncertain dataset, you'll have to do some extra work to find the start/stop. I don't see how this is an argument against providing an option useful for when you do know where the CDS starts (or do already have the CDS). 
> Perhaps some variant of:
> a) Similar cases in Python:
> has_met or has_met1
> get_met or get_met1
> b) More direct meaning:
> starts_with_methionine, starts_with_met, starts_with_m

I'd been avoiding names with methionine in them, preferring to focus on initiation or start codon based names. I guess "starts_with_met" is OK. Or maybe "start_met"?

From bugzilla-daemon at portal.open-bio.org Thu Nov 6 15:28:20 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 6 Nov 2008 10:28:20 -0500
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq)
In-Reply-To:
Message-ID: <200811061528.mA6FSKMv021486@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381

------- Comment #46 from lpritc at scri.sari.ac.uk 2008-11-06 10:28 EST -------
(In reply to comment #43)
> > (2) the "complete_cds" argument (perhaps under another name, maybe "cds"?)
> > illustrated in this patch. This would check the start codon is valid AND
> > translate it as a methionine AND check there are a whole number of codons AND
> > check it ends with a stop codon AND check there are no extra in-frame stop
> > codons.
> I support (1) but strongly disagree with (2) because 'cds' refers to a complete
> DNA sequence not just if the sequence starts with M.
> http://www.yeastgenome.org/help/glossary.html
> "CDS: CoDing Sequence, region of nucleotides that corresponds to the
> sequence of amino acids in the predicted protein. The CDS includes start and
> stop codons, therefore coding sequences begin with an "ATG" and end with a stop
> codon. In SGD, unexpressed sequences, including the 5'-UTR, the 3'-UTR,
> introns, or bases not expressed due to frameshifting, are not included within a
> CDS. Note that the CDS does not correspond to the actual mRNA sequence."

That definition seems to correspond exactly to (2), above; not that web-based definitions have any particular authority ;)

"Begin with an ATG" is a eukaryote-specific statement; "Begin with a (valid) start codon" covers this. "End with a stop codon", implying the *first in-frame* stop codon, is the same in both cases. Where do you see that they differ?

> I do not support the name 'cds_start' because of the DNA interpretation and
> that many Genbank records include the upstream and downstream non-coding
> regions. In such cases, I would have to find the actual start codon, then I
> might as well do the translation after that start codon than rely on a check
> that might be wrong.

I don't think that the argument is proposed for that particular use-case, which is why I don't think it's valid, there. If, say, you knew that the 5'-UTR ran to base 17, then you could check with

seq[17:].translate(complete_cds=True)

or some such arrangement - but that's not the problem that's being solved with that method argument, I think.

> Perhaps some variant of:
> a) Similar cases in Python:
> has_met or has_met1
> get_met or get_met1
> b) More direct meaning:
> starts_with_methionine, starts_with_met, starts_with_m

I quite like this way of checking sequence properties, and would prefer an is_cds() (or, to be pedantic, is_conceptual_cds()) method that returns a Boolean, but otherwise implements the sort of behaviour described above. If you only wanted the conceptual translations of sequences that fit the criteria for a CDS, then a one-liner to replace

[seq.translate(cds=True) for seq in seqlist]

might be

[seq.translate() for seq in seqlist if seq.is_cds()]

I prefer the second option, for readability, but YMMV.
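The Boolean check proposed in comment #46 can be sketched as a standalone function. This is only a minimal illustration working on plain strings: `is_cds` here is a hypothetical helper, not an existing Biopython method, and the start and stop codon sets are an assumption (those of the standard genetic code, NCBI table 1).

```python
# Hypothetical helper illustrating the checks listed in the thread; this is
# not an existing Biopython method.

# Assumption: start and stop codons of the standard genetic code (NCBI table 1).
START_CODONS = {"ATG", "TTG", "CTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_cds(dna):
    """Return True if dna looks like a complete conceptual CDS."""
    dna = dna.upper()
    # A whole number of codons, and at least a start plus a stop codon.
    if len(dna) < 6 or len(dna) % 3 != 0:
        return False
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    if codons[0] not in START_CODONS:
        return False
    if codons[-1] not in STOP_CODONS:
        return False
    # No extra in-frame stop codons before the final one.
    return not any(codon in STOP_CODONS for codon in codons[1:-1])


seqlist = ["ATGGCCTAA", "ATGTAAGCCTAA", "GCCATGTAA", "ATGGCCT"]
print([s for s in seqlist if is_cds(s)])  # ['ATGGCCTAA']
```

Used this way the predicate works as a filter in a list comprehension, which is the readability argument made above; it says nothing about how the first codon should then be translated.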
From bugzilla-daemon at portal.open-bio.org Thu Nov 6 16:06:46 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 6 Nov 2008 11:06:46 -0500 Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO In-Reply-To: Message-ID: <200811061606.mA6G6kL7028787@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2643 ------- Comment #8 from dalloliogm at gmail.com 2008-11-06 11:06 EST ------- (In reply to comment #4) > Hi Marco, Hi!! :) > This looks interesting :) > > Could you attach the individual valid sample fastPHASE files as separate > attachments (so they can be integrated into the existing unit tests). You seem > to have picked very small files in order to use them as doctests; a larger more > realistic example would be better for the unit tests (a few 5kb in size should > be OK - not too big). ok Actually I have been using files which come from our laboratory analysis, and I would like to ask if I include them here and how first. > Do you have URL for the file format documentation? The fastphase format seems to be described only in fastphase's manual, which is only accessible after accepting a license agreement. I could contact the authors of the program to ask them to publish the format specifications publicly. It would be in their interest, as otherwise the format could be considered as a not standard. I'll let you know.. > Are they always DNA for example, or is RNA also possible? They should be DNA, In principle they could be also genes, or other kind of characters, but this software is designed for the purpose of reconstructing haplotypes from SNPs/microsatellites. Maybe Tiago has some more experience in this.. > If you want to include a fastPHASE parser in Bio.SeqIO it should ideally cope > with any valid fastPHASE output. In the doctests you have an example: > > ... BEGIN GENOTYPES > ... Ind1 # subpop. label: 6 (internally 1) > ... T > ... T C > ... Ind2 # subpop. 
label: 6 (internally 1) > ... C > ... T > ... END GENOTYPES > You're treating this as an error - "Two chromosomes with different length". > Why isn't it parsed as four short sequences (of different lengths): "T", "TC", > "C", "T"? You should not have a file in which a chromosome is longer than the other one... instead, you should have a '?' indicating data that the program could not infer. > Similarly, the final example: > > ... BEGIN GENOTYPES > ... Ind1 # subpop. label: 6 (internally 1) > ... T T T T T G A A A C C A A A G A C G C T G C G T C A G C C T G C A A T C T G > ... Ind2 # subpop. label: 6 (internally 1) > ... C T T T T G C C C T C A A A A G T G C T G T G C C A G T C T A C G G C C T G > ... T T T T T G A A A C C A A A G A C G C T T C G T C A G T A T A C G A T C T A > ... END GENOTYPES > > Again, you raised an error - "Missing sequence in input file". If this is a > valid file shouldn't it be parsed as three sequences? Because that would mean that one individual has only a chromosome. It doesn't make sense to run fastPhase on an haploid individual. > On the other hand, are these hand edited files which deliberately break the > rules? Yes. Usually you shouldn't have neither of the two cases. But I find it useful when a script tells me if there are weird things in my files (I could have modified them accidentally). This could be refactored in a check_fileformat function. > If fastPHASE files SHOULD always come in allele groups (of the same > length), then it would be better to integrate the parser into Bio.AlignIO > giving pairwise alignments (and you would be able to read it via Bio.SeqIO > automatically as well). This is good idea, I didn't think of it. But how should I modify the module to produce AlignIO objects? > P.S. Your suggested format name "fastPhaseOutput" breaks the lower case rule. > Would "fastphase" be OK, or is there more than one format? e.g. an input > format which might be confused with this? I agree.. 
I wasn't sure of Biopython's naming conventions.

>
> Peter
> Scheet and Stephens (2006)

From bugzilla-daemon at portal.open-bio.org Thu Nov 6 16:12:15 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 6 Nov 2008 11:12:15 -0500
Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO
In-Reply-To:
Message-ID: <200811061612.mA6GCFHq029869@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2643

------- Comment #9 from dalloliogm at gmail.com 2008-11-06 11:12 EST -------
(In reply to comment #7)
> I've now had a quick look at the fastPHASE documentation, and I have the
> impression that the sequences should always come in pairs:

right!

> "Output files for inferred haplotypes or imputed genotypes contain two lines
> per given diploid individual, with the order of individuals corresponding to
> that supplied in the input file."
>
> Assuming the paired sequences are always the same length, this does suggest the
> format should be integrated into Bio.AlignIO (giving pairwise alignments)
> rather than Bio.SeqIO.

> Have you tried not estimating the haplotypes (by supplying a negative integer
> following -H), and does this alter the sequence output?

I will try it, ok.

> Finally could you try the -Z command line argument for the simplified output
> format (described as two lines per individual, without "id" lines,
> subpopulation labels or summary information from the run). Does this have the
> sequences? If so this may be a more parser friendly set of output to parse for
> Bio.SeqIO and/or Bio.AlignIO.

ok, I can try to implement both formats, but for the moment I would prefer to concentrate on one.
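The pairing rule discussed in comments #8 and #9 (two haplotype lines of equal length per diploid individual) can be sketched as a small generator. This is not the parser attached to the bug, only an illustration based on the doctest snippets quoted in the thread; `parse_genotypes` and `_checked_pair` are hypothetical names, and treating any line containing "#" as an individual's id line is an assumption.

```python
def _checked_pair(individual_id, haplotypes):
    """Validate one individual's haplotypes and return them as a tuple."""
    if len(haplotypes) != 2:
        raise ValueError("Missing sequence in input file")
    if len(haplotypes[0]) != len(haplotypes[1]):
        raise ValueError("Two chromosomes with different length")
    return individual_id, haplotypes[0], haplotypes[1]


def parse_genotypes(lines):
    """Yield (individual_id, haplotype_one, haplotype_two) tuples."""
    in_block = False
    current_id = None
    haplotypes = []
    for line in lines:
        line = line.strip()
        if line == "BEGIN GENOTYPES":
            in_block = True
        elif line == "END GENOTYPES":
            break
        elif not in_block or not line:
            continue
        elif "#" in line:
            # Assumption: id lines look like "Ind1 # subpop. label: ...".
            if current_id is not None:
                yield _checked_pair(current_id, haplotypes)
            current_id = line.split("#")[0].strip()
            haplotypes = []
        else:
            # Alleles are separated by spaces; "?" marks an uninferred site.
            haplotypes.append(line.replace(" ", ""))
    if current_id is not None:
        yield _checked_pair(current_id, haplotypes)


example = """\
BEGIN GENOTYPES
Ind1 # subpop. label: 6 (internally 1)
T T C
T C C
Ind2 # subpop. label: 6 (internally 1)
C T ?
T T C
END GENOTYPES
"""

for record in parse_genotypes(example.splitlines()):
    print(record)
```

Because each individual always yields exactly two equal-length strings, the same loop could build one pairwise alignment per individual instead of a tuple, which is the Bio.AlignIO route suggested in the thread.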
From bugzilla-daemon at portal.open-bio.org Thu Nov 6 17:11:26 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 6 Nov 2008 12:11:26 -0500
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq)
In-Reply-To:
Message-ID: <200811061711.mA6HBQN5007343@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381

------- Comment #47 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-06 12:11 EST -------
(In reply to comment #46)
> If you only wanted the conceptual translations of sequences that fit the
> criteria for a CDS, then a one-liner to replace
>
> [seq.translate(cds=True) for seq in seqlist]
>
> might be
>
> [seq.translate() for seq in seqlist if seq.is_cds()]
>
> I prefer the second option, for readability, but YMMV.
>

Note the above wouldn't give you translations starting with methionine, you'd need something like:

[seq.translate(cds_start=True) for seq in seqlist if seq.is_cds()]

(assuming we call the "init" option "cds_start")

Or, going with the complete_cds option you could build a list of translations of valid CDSs like this:

proteins = []
for seq in seqlist :
    try :
        proteins.append(seq.translate(complete_cds=True))
    except ValueError :
        #Not a valid CDS, excluded
        pass

Not a one liner, but I think in a real situation you'd want to do something with the invalid CDSs anyway (even if just logging them).
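The try/except pattern in comment #47, extended to keep the rejected sequences instead of silently passing, might look like the sketch below. It does not use Biopython: `translate_complete_cds` is a hypothetical stand-in for the proposed `seq.translate(complete_cds=True)`, and the codon table is an assumption (standard genetic code, unambiguous DNA only).

```python
# Assumption: standard genetic code only (NCBI table 1), unambiguous DNA.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
START_CODONS = {"ATG", "TTG", "CTG"}


def translate_complete_cds(dna):
    """Hypothetical stand-in for seq.translate(complete_cds=True)."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("not a whole number of codons")
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    if not codons or codons[0] not in START_CODONS:
        raise ValueError("does not begin with a valid start codon")
    if CODON_TABLE[codons[-1]] != "*":
        raise ValueError("does not end with a stop codon")
    body = "".join(CODON_TABLE[codon] for codon in codons[1:-1])
    if "*" in body:
        raise ValueError("extra in-frame stop codon")
    # The start codon is always read as methionine, whatever it encodes.
    return "M" + body


proteins = []
rejected = []
for dna in ["ATGGCCTAA", "TTGAAATGA", "ATGTAAGCCTAA", "GCCATGTAA"]:
    try:
        proteins.append(translate_complete_cds(dna))
    except ValueError as err:
        # Keep the reason so the invalid entries can be logged or inspected.
        rejected.append((dna, str(err)))

print(proteins)  # ['MA', 'MK']
for dna, reason in rejected:
    print(dna, "->", reason)
```

Carrying the reason in the exception is what the exception-based design offers over a bare Boolean test: the caller learns which of the CDS criteria failed.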
From bugzilla-daemon at portal.open-bio.org Thu Nov 6 17:32:52 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 6 Nov 2008 12:32:52 -0500
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq)
In-Reply-To:
Message-ID: <200811061732.mA6HWqE7009337@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381

------- Comment #48 from lpritc at scri.sari.ac.uk 2008-11-06 12:32 EST -------
(In reply to comment #47)
> (In reply to comment #46)
> > [seq.translate() for seq in seqlist if seq.is_cds()]
> >
> > I prefer the second option, for readability, but YMMV.
>
> Note the above wouldn't give you translations starting with methionine, you'd
> need something like:
>
> [seq.translate(cds_start=True) for seq in seqlist if seq.is_cds()]
>
> (assuming we call the "init" option "cds_start")

Fair point... my focus was on putting that filter into the list comprehension.

> Or, going with the complete_cds option you could build a list of translations
> of valid CDSs like this:
>
> proteins = []
> for seq in seqlist :
>     try :
>         proteins.append(seq.translate(complete_cds=True))
>     except ValueError :
>         #Not a valid CDS, excluded
>         pass
>
> Not a one liner, but I think in a real situation you'd want to do something
> with the invalid CDSs anyway (even if just logging them).

True enough. It comes down in part to a preference of style, as the same could be achieved with

proteins = []
for seq in seqlist :
    if seq.is_cds():
        proteins.append(seq.translate(complete_cds=True))
    else:
        #Not a valid CDS, excluded
        pass

I think the clarity of this arrangement to my eyes comes from 'is/is not a cds' being - naturally-speaking - a property or attribute of the sequence itself.

The 'cds_start' argument in your example is then an instruction to treat the translation as though you have a CDS, and implement some specialised behaviour that is appropriate under that circumstance, rather than to implement a test that raises an error if it is failed. By separating the 'is_cds()' call from the 'cds_start' argument, you gain the ability to translate the sequence with either the methionine or the coded amino acid, without losing the test of the sequence being a CDS.

Of course, using the 'cds_start=True' argument could force a call to self.is_cds(), anyway. Your non-one-liner could then be as you originally wrote:

proteins = []
for seq in seqlist :
    try:
        proteins.append(seq.translate(complete_cds=True))
    except ValueError:
        #Not a valid CDS, excluded
        pass

The two advantages I see to having the is_cds() method as a separate call are that it separates determining the CDS status of the sequence from translating it, and that it provides a filter that is more readable than attempting to translate the sequence to find out if it's a valid CDS. If the 'cds_start' argument forces a self.is_cds() test, then the usage can be - I think - exactly as you've been proposing throughout the thread.
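The distinction drawn in comment #48, between reading the first codon as methionine and reading it as the amino acid it normally encodes, can be shown with an alternative start codon such as TTG. The sketch below is an assumption-laden illustration on plain strings: `translate` and its `cds_start` flag are hypothetical and only mimic the option being debated, using the standard genetic code.

```python
# Assumption: standard genetic code only (NCBI table 1), unambiguous DNA.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
START_CODONS = {"ATG", "TTG", "CTG"}


def translate(dna, cds_start=False):
    """Translate dna; with cds_start=True read the first codon as M."""
    dna = dna.upper()
    # Any trailing partial codon is ignored in this sketch.
    usable = len(dna) - len(dna) % 3
    codons = [dna[i:i + 3] for i in range(0, usable, 3)]
    protein = "".join(CODON_TABLE[codon] for codon in codons)
    if cds_start:
        if not codons or codons[0] not in START_CODONS:
            raise ValueError("first codon is not a valid start codon")
        protein = "M" + protein[1:]
    return protein


print(translate("TTGAAATGA"))                  # LK*
print(translate("TTGAAATGA", cds_start=True))  # MK*
```

Here the flag changes only how the first codon is read, so a caller can combine it with a separate validity test or leave that test out.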
From bugzilla-daemon at portal.open-bio.org Thu Nov 6 17:33:12 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 6 Nov 2008 12:33:12 -0500 Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO In-Reply-To: Message-ID: <200811061733.mA6HXCuE009403@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2643 ------- Comment #10 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-06 12:33 EST ------- (In reply to comment #8) > > ok > Actually I have been using files which come from our laboratory analysis, > and I would like to ask if I include them here and how first. If you can get permission to include a real example (and its not too big) that would be great. Ideally something with at least three alleles. > > Do you have URL for the file format documentation? > > The fastphase format seems to be described only in fastphase's manual, > which is only accessible after accepting a license agreement. > I could contact the authors of the program to ask them to publish the format > specifications publicly. It would be in their interest, as otherwise the > format could be considered as a not standard. I'll let you know. It's not very open, is it :( Are there any other tools that output this file format? Do you think the author might be willing to just add an option to output the sequences in another format (e.g. FASTA, or better an alignment format designed for more than one alignment). This would be a neater solution in the long run (and would benefit anyone using fastPhase - not just Biopython). > > Are they always DNA for example, or is RNA also possible? > > They should be DNA, In principle they could be also genes, or other kind of > characters, but this software is designed for the purpose of reconstructing > haplotypes from SNPs/microsatellites. > Maybe Tiago has some more experience in this.. 
If it is for DNA only, the sequences/alignments returned should ideally specify a DNA alphabet. > ... > Because that would mean that one individual has only a chromosome. > It doesn't make sense to run fastPhase on an haploid individual. Is fastPhase only for haploids? Could it be used with polyploidy (e.g. plants)? > > On the other hand, are these hand edited files which deliberately break the > > rules? > > Yes. Usually you shouldn't have neither of the two cases. But I find it > useful when a script tells me if there are weird things in my files (I > could have modified them accidentally). Yes - negative test cases are good. However, having them as a doctest made the docstring rather confusing. > > If fastPHASE files SHOULD always come in allele groups (of the same > > length), then it would be better to integrate the parser into Bio.AlignIO > > giving pairwise alignments (and you would be able to read it via Bio.SeqIO > > automatically as well). > > This is good idea, I didn't think of it. > But how should I modify the module to produce AlignIO objects? Essentially Instead of: yield record_one yield record_two you'd do something like this: alignment = Alignment(generic_dna) alignment.add_sequence(id_one, seq_one) alignment.add_sequence(id_two, seq_two) yield alignment > > P.S. Your suggested format name "fastPhaseOutput" breaks the lower case > > rule. Would "fastphase" be OK, or is there more than one format? e.g. > > an input format which might be confused with this? > > I agree.. I wasn't sure of biopython's naming conventions. > This is written down elsewhere - but the format name is a lowercase string (and this is enforced in the API), and the same names are used in both SeqIO and AlignIO. Where possible we use the same name as BioPerl's SeqIO and EMBOSS. (In reply to comment #9) > (In reply to comment #7) > > Finally could you try the -Z command line argument for the simplified output > > format (described as two lines per individual, without ???id??? 
lines, > > subpopulation labels or summary information from the run). Does this have > > the sequences? If so this may be a more parser friendly set of output to > > parse for Bio.SeqIO and/or Bio.AlignIO. > > ok, I can try to implement both of the two formats, but for the moment I will > prefer to concetrate on one. I was actually thinking the -Z format might be much simpler to deal with (I didn't mean to suggest supporting both). On the other hand, the documentation does say the -Z is "not intended for general use". Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From dalloliogm at gmail.com Thu Nov 6 18:09:55 2008 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Thu, 6 Nov 2008 19:09:55 +0100 Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO In-Reply-To: <200811061733.mA6HXCuE009403@portal.open-bio.org> References: <200811061733.mA6HXCuE009403@portal.open-bio.org> Message-ID: <5aa3b3570811061009i29bb2faflb456978dacbf5218@mail.gmail.com> On Thu, Nov 6, 2008 at 6:33 PM, wrote: > > > > > ------- Comment #10 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-06 12:33 EST ------- > (In reply to comment #8) >> >> ok >> Actually I have been using files which come from our laboratory analysis, >> and I would like to ask if I include them here and how first. > > If you can get permission to include a real example (and its not too big) that > would be great. Ideally something with at least three alleles. ok.. >> > Do you have URL for the file format documentation? >> >> The fastphase format seems to be described only in fastphase's manual, >> which is only accessible after accepting a license agreement. >> I could contact the authors of the program to ask them to publish the format >> specifications publicly. 
It would be in their interest, as otherwise the >> format could be considered as a not standard. I'll let you know. > > It's not very open, is it :( > > Are there any other tools that output this file format? Do you think the > author might be willing to just add an option to output the sequences in > another format (e.g. FASTA, or better an alignment format designed for more > than one alignment). This would be a neater solution in the long run (and > would benefit anyone using fastPhase - not just Biopython). Not for my knowledge. Anyway, consider that a fastPhase run could take days for medium/big samples. In some situations it could be faster to convert its output to fasta (or other ones) directly, instead of re-calculating the results. >> > Are they always DNA for example, or is RNA also possible? >> >> They should be DNA, In principle they could be also genes, or other kind of >> characters, but this software is designed for the purpose of reconstructing >> haplotypes from SNPs/microsatellites. >> Maybe Tiago has some more experience in this.. > > If it is for DNA only, the sequences/alignments returned should ideally specify > a DNA alphabet. mmm ok... Basically it could be used also with characters like genes and other markers.. but in that case, it would not make sense to parse it as a sequence, so nobody would try to do it. >> Because that would mean that one individual has only a chromosome. >> It doesn't make sense to run fastPhase on an haploid individual. > > Is fastPhase only for haploids? Could it be used with polyploidy (e.g. > plants)? I think not... It would be another class of problem. What fastPhase does, is trying to infer haplotypes from genotype data. Humans and most eukaryotes are diploid, so they have two copies of each chromosome; when you genotype markers, for every individuals, you get two informations for each (e.g. 'AC' for a SNP). 
Let's say you are studying two SNPs in an single individual: you will have 'AC' for the first marker, and 'GT' for the second (you already know that they are in the same chromosome). You want to know which are the haplotypes, which means, if the 'A' from the first SNP is on the same molecule of the 'G' from the second SNP, and so on. For example, you could have a chromosome with 'AG' and the other with 'CT'; or a chromosome with 'AT' and the other with 'CG', and fastPhase tries to calculate which is the most likely (I won't be able to explain all the details properly). Moreover, fastPhase (there are other programs) can infer missing genotype data, which is useful when you have big collections of SNPs. That said, I don't know if it is able to infer haplotypes in polyploid organisms, but I don't think so, as it would be a different class of problem (more complex). I thought that the best thing to do is to do not support poliploidy, and if someone else that uses fastPhase to calculate that comes, it would be easy to adapt the module for it (it would require to just add an option) >> > On the other hand, are these hand edited files which deliberately break the >> > rules? >> >> Yes. Usually you shouldn't have neither of the two cases. But I find it >> useful when a script tells me if there are weird things in my files (I >> could have modified them accidentally). > > Yes - negative test cases are good. However, having them as a doctest made the > docstring rather confusing. mmm I know, that doctest could be refactored. I have started using test recently... I find it is a lot better. > >> > If fastPHASE files SHOULD always come in allele groups (of the same >> > length), then it would be better to integrate the parser into Bio.AlignIO >> > giving pairwise alignments (and you would be able to read it via Bio.SeqIO >> > automatically as well). >> >> This is good idea, I didn't think of it. >> But how should I modify the module to produce AlignIO objects? 
> > Essentially Instead of: > > yield record_one > yield record_two > > you'd do something like this: > > alignment = Alignment(generic_dna) > alignment.add_sequence(id_one, seq_one) > alignment.add_sequence(id_two, seq_two) > yield alignment sounds easy :) > >> > P.S. Your suggested format name "fastPhaseOutput" breaks the lower case >> > rule. Would "fastphase" be OK, or is there more than one format? e.g. >> > an input format which might be confused with this? >> >> I agree.. I wasn't sure of biopython's naming conventions. >> > > This is written down elsewhere - but the format name is a lowercase string (and > this is enforced in the API), and the same names are used in both SeqIO and > AlignIO. Where possible we use the same name as BioPerl's SeqIO and EMBOSS. > > (In reply to comment #9) >> (In reply to comment #7) >> > Finally could you try the -Z command line argument for the simplified output >> > format (described as two lines per individual, without "id" lines, >> > subpopulation labels or summary information from the run). Does this have >> > the sequences? If so this may be a more parser friendly set of output to >> > parse for Bio.SeqIO and/or Bio.AlignIO. >> >> ok, I can try to implement both of the two formats, but for the moment I will >> prefer to concetrate on one. > > I was actually thinking the -Z format might be much simpler to deal with (I > didn't mean to suggest supporting both). On the other hand, the documentation > does say the -Z is "not intended for general use". The problem is that it could take days to run a fastPhase... most of the times you want the longer format, and then proceed to parse it. Anyway, it should not be a big problem to implement it (I am just putting all of that information in SeqRecord.description) > > Peter > > > -- > Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email > ------- You are receiving this mail because: ------- > You are the assignee for the bug, or are watching the assignee. 
> > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > > -- ----------------------------------------------------------- My Blog on Bioinformatics (italian): http://bioinfoblog.it From bugzilla-daemon at portal.open-bio.org Thu Nov 6 18:20:20 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 6 Nov 2008 13:20:20 -0500 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200811061820.mA6IKK31012133@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 ------- Comment #49 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-06 13:20 EST ------- OK - thank you all for your input thus far. Unfortunately it is clear that we haven't reached a consensus about translating sequences which begin with the start codon (or the more special case of translating a CDS sequence). However, I hope we are all happy with how things look in CVS right now, which offers a blind translation continuing over any stop codon, and the "to_stop" option which will terminate translation at the first in frame stop codon: See http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Seq.py?cvsroot=biopython for the full code, but in summary: class Seq(object): ... def translate(self, table="Standard", stop_symbol="*", to_stop=False): """Turns a nucleotide sequence into a protein sequence. New Seq object. Trying to back-transcribe a protein sequence raises an exception. This method will translate DNA or RNA sequences. Trying to translate a protein sequence raises an exception. table - Which codon table to use? This can be either a name (string) or an NCBI identifier (integer). This defaults to the "Standard" table. stop_symbol - Single character string, what to use for terminators. This defaults to the asterisk, "*". 
to_stop - Boolean, defaults to False meaning do a full translation continuing on past any stop codons (translated as the specified stop_symbol). If True, translation is terminated at the first in frame stop codon (and the stop_symbol is not appended to the returned protein sequence). ... With the module level function taking the same arguments: def translate(sequence, table="Standard", stop_symbol="*", to_stop=False): """Translate a nucleotide sequence into amino acids. If given a string, returns a new string object. Given a Seq or MutableSeq, returns a Seq object with a protein alphabet. ... I think everyone is content with the naming of the "to_stop" argument. I'm planning to prepare the Biopython 1.49 beta release tomorrow, so I'm proposing we leave translation like this for Biopython 1.49 (and close this bug), and revisit translation after that is done (hopefully in less than two weeks time). The code in CVS is still a big improvement in terms of writing object orientated code. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Nov 6 18:34:03 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 6 Nov 2008 13:34:03 -0500 Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO In-Reply-To: Message-ID: <200811061834.mA6IY3ra013125@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2643 ------- Comment #11 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-06 13:34 EST ------- Replying to Marco's email on the dev mailing list: >> Are there any other tools that output this file format? Do you think the >> author might be willing to just add an option to output the sequences in >> another format (e.g. FASTA, or better an alignment format designed for more >> than one alignment). 
This would be a neater solution in the long run (and >> would benefit anyone using fastPhase - not just Biopython). > > Not for my knowledge. > Anyway, consider that a fastPhase run could take days for medium/big samples. > In some situations it could be faster to convert its output to fasta > (or other ones) directly, instead of re-calculating the results. OK - I had not appreciated the run time involved. Clearly it would not be sensible to have to repeat a long analysis just to get the results in another format (e.g. as FASTA, or the simplified -Z output whatever that looks like). >> If it is for DNA only, the sequences/alignments returned should ideally >> specify a DNA alphabet. > > mmm ok... > Basically it could be used also with characters like genes and other > markers.. but in that case, it would not make sense to parse it as a > sequence, so nobody would try to do it. That's interesting, and means assuming DNA wouldn't be safe. Just use the single letter alphabet then (rather than defaulting to the completely generic base alphabet). >>> Because that would mean that one individual has only a chromosome. >>> It doesn't make sense to run fastPhase on an haploid individual. >> >> Is fastPhase only for haploids? Could it be used with polyploidy (e.g. >> plants)? > > I think not... It would be another class of problem. > What fastPhase does, is trying to infer haplotypes from genotype data. OK - you can probably tell I'm not a population biologist from the questions ;) >> I was actually thinking the -Z format might be much simpler to deal >> with (I didn't mean to suggest supporting both). On the other hand, >> the documentation does say the -Z is "not intended for general use". > > The problem is that it could take days to run a fastPhase... most of > the times you want the longer format, and then proceed to parse it. 
> Anyway, it should not be a big problem to implement it

OK (as I wrote above), I can see now that using the simplified -Z output is not sensible.

> (I am just putting all of that information in SeqRecord.description)

If we know the meaning of some of these fields, then ideally they should go in the annotations dictionary, rather than just in the SeqRecord description.

Peter

From bugzilla-daemon at portal.open-bio.org Thu Nov 6 19:00:59 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 6 Nov 2008 14:00:59 -0500
Subject: [Biopython-dev] [Bug 2640] Proposal: doctest for SeqRecord/biopython
In-Reply-To:
Message-ID: <200811061900.mA6J0xi3015085@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2640

------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-06 14:00 EST -------
I've added a few doctests to SeqRecord.py in CVS revision 1.24, plus the simple unit test from comment 7 to make sure these get validated as part of the Biopython test suite.

How does that look to you Marco? I've kept the __init__ example short, not doing anything with annotations.

Do you think we should also have the __main__ trick in all modules with doctests?
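For readers unfamiliar with the "__main__ trick" mentioned in comment #8, the usual pattern with the standard library doctest module is sketched below. The function is a made-up example, not code from SeqRecord.py.

```python
def reverse_complement(dna):
    """Return the reverse complement of an unambiguous DNA string.

    >>> reverse_complement("ATGC")
    'GCAT'
    >>> reverse_complement("")
    ''
    """
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(dna.upper()))


if __name__ == "__main__":
    import doctest

    # Run every docstring example in this module; silent when all pass.
    doctest.testmod()
```

Running the file directly then checks that one module's docstring examples, while a separate unit test can still collect the doctests of many modules in one go.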
From bugzilla-daemon at portal.open-bio.org Thu Nov 6 19:41:44 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 6 Nov 2008 14:41:44 -0500
Subject: [Biopython-dev] [Bug 2640] Proposal: doctest for SeqRecord/biopython
In-Reply-To:
Message-ID: <200811061941.mA6JfiHM019925@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2640

------- Comment #9 from dalloliogm at gmail.com 2008-11-06 14:41 EST -------
(In reply to comment #8)
> I've added a few doctests to SeqRecord.py in CVS revision 1.24, plus the simple
> unit test from comment 7 to make sure these get validated as part of the
> Biopython test suite.
>
> How does that look to you Marco? I've kept the __init__ example short, not
> doing anything with annotations.

I think they look OK; to me, they seem good examples of how to use the module.

> Do you think we should also have the __main__ trick in all modules with
> doctests?

I am not really experienced in managing such big projects, but I think it could be OK, at least for now.
I would personally keep the __main__ trick in every module, because it makes it easier to test a single module while you are still writing it.
But to test many modules in one go, the code you posted in comment #7 is the way to do it.

So, in short, I don't know! :)
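The second approach discussed above, testing many modules in one run, can be sketched with the standard library's doctest.DocTestSuite. This is not the code from comment #7, which is not shown in this thread; the module names demo_seq and demo_record and their contents are hypothetical stand-ins, built in memory so the example is self-contained.

```python
import doctest
import types
import unittest

# Hypothetical module sources standing in for real modules that carry doctests.
SOURCES = {
    "demo_seq": '''
def complement(seq):
    """Complement a DNA string.

    >>> complement("ATGC")
    'TACG'
    """
    table = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(table[base] for base in seq)
''',
    "demo_record": '''
def describe(identifier, length):
    """Format a one-line summary.

    >>> describe("seq1", 12)
    'seq1 (12 bp)'
    """
    return "%s (%d bp)" % (identifier, length)
''',
}


def load_demo_modules():
    """Create throwaway module objects from the source strings above."""
    modules = []
    for name, source in sorted(SOURCES.items()):
        module = types.ModuleType(name)
        exec(source, module.__dict__)
        modules.append(module)
    return modules


def build_suite(modules):
    """Collect the doctests of every module into a single unittest suite."""
    suite = unittest.TestSuite()
    for module in modules:
        suite.addTests(doctest.DocTestSuite(module))
    return suite


def run_all():
    """Run the combined suite and return the unittest result object."""
    runner = unittest.TextTestRunner(verbosity=0)
    return runner.run(build_suite(load_demo_modules()))
```

In a real project the list passed to build_suite would hold imported modules rather than in-memory ones; the per-module __main__ trick and a central suite like this can coexist.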
From bugzilla-daemon at portal.open-bio.org Thu Nov 6 20:34:36 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 6 Nov 2008 15:34:36 -0500 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200811062034.mA6KYa6b026157@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 ------- Comment #50 from bsouthey at gmail.com 2008-11-06 15:34 EST ------- (In reply to comment #48) > (In reply to comment #47) > > (In reply to comment #46) > > > > [seq.translate() for seq in seqlist if seq.is_cds()] > > > > > > I prefer the second option, for readability, but YMMV. > > > > Note the above wouldn't give you translations starting with methionine, you'd > > need something like: > > > > [seq.translate(cds_start=True) for seq in seqlist if seq.is_cds()] > > > > (assuming we call the "init" option "cds_start") > > Fair point... my focus was on putting that filter into the list comprehension. > > > Or, going with the complete_cds option you could build a list of translations > > of valid CDSs like this: > > > > proteins = [] > > for seq in seqlist : > > try : > > proteins.append(seq.translate(complete_cds=True)) > > except ValueError : > > #Not a valid CDS, excluded > > pass > > > > Not a one liner, but I think in a real situation you'd want to do something > > with the invalid CDSs anyway (even if just logging them). > > True enough. It comes down in part to a preference of style, as the same could > be achieved with > > proteins = [] > for seq in seqlist : > if seq.is_cds(): > proteins.append(seq.translate(complete_cds=True)) > else: > #Not a valid CDS, excluded > pass > > I think the clarity of this arrangement to my eyes comes from 'is/is not a cds' > being - naturally-speaking - a property or attribute of the sequence itself. 
> The 'cds_start' argument in your example is then an instruction to treat the > translation as though you have a CDS, and implement some specialised behaviour > that is appropriate under that circumstance, rather than to implement a test > that raises an error if it is failed. By separating the 'is_cds()' call from > the 'cds_start' argument, you gain the ability to translate the sequence with > either the methionine or the coded amino acid, without losing the test of the > sequence being a CDS. > > Of course, using the 'cds_start=True' argument could force a call to > self.is_cds(), anyway. Your non-one-liner could then be as you originally > wrote: > > proteins = [] > for seq in seqlist : > try: > proteins.append(seq.translate(complete_cds=True)) > except ValueError: > #Not a valid CDS, excluded > pass > > The two advantages I see to having the is_cds() method as a separate call are > that it permits separation of the determining the CDS status of the sequence, > and that it provides a filter that is more readable than attempting to > translate the sequence to find out if it's a valid CDS. If the 'cds_start' > argument forces a self.is_cds() test, then the usage can be - I think - exactly > as you've been proposing throughout the thread. > The use of 'cds' alone is wrong because cds refer to DNA not translation and not to protein sequences. The use of cds is confusing or at least vague until you determine how it works. Also it could be wrong in the sense it is a valid cds (see the GUG initiation in mammalian NAT1 example at the NCBI link) just not allowed by the table in Bio.Data.CodonTable. I don't object to the purpose, rather I do object to the name. My overriding issue here is that 'cds_start' does not convey the purpose of this argument and this is likely to remain for some time in the API. One interpretation that also comes to mind is that it is the location of the start of the cds in the sequence (cds start at...). 
I really feel that the name must clearly reflect that it invokes a test that the first codon are in the 'start_codon' list (defined by the selected table from Bio.Data.CodonTable). This is not a check that it is the start of a cds rather it is a check for a possible open reading frame (as not all open reading frames are cds). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Nov 7 04:46:08 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 6 Nov 2008 23:46:08 -0500 Subject: [Biopython-dev] [Bug 2629] Updated Bio.NaiveBayes to listfns import In-Reply-To: Message-ID: <200811070446.mA74k8Js031975@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2629 ------- Comment #13 from mdehoon at ims.u-tokyo.ac.jp 2008-11-06 23:46 EST ------- (In reply to comment #12) I have uploaded a fixed version of Bio.NaiveBayes to CVS. Can you check to see if you're happy with this version? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From sbassi at gmail.com Fri Nov 7 06:56:48 2008 From: sbassi at gmail.com (Sebastian Bassi) Date: Fri, 7 Nov 2008 04:56:48 -0200 Subject: [Biopython-dev] Possible problem with NCBIStandalone.blastall Message-ID: When I run a command line blast with these parameters: /root/blast-2.2.18/bin/blastall -p blastn -d /var/www/blast/db/UniVec -q -5 -G 3 -E 3 -F "m D" -e 700 -Y 1.75e12 -i tmpsq I find a match (with evalue of 18). 
But when I do it from biopyhon I can't find any match: rh, eh = NCBIStandalone.blastall(blast_exe, "blastn", db, fin, nuc_mismatch='-5', gap_open = '3', gap_extend = '3', search_length = '1.75e12', expectation='20') Here is the input sequence: >C07SpCP042I015.P5A02.R. [Clone-lib=pCLD 04541] NNNCCCCCCCTCGAGGTCGACNNNNNNNNTAAGCTTGAAATTCTATGATATGCAGTTAGT TGCTNCTNGTTTAGCATTGGTTGGTTAACTTAAAACCTTTTCCTGCAATAATTATATGGA TAATATTACTTTACTTNNNNNNNTATTGCCTTCACTAATTTTTAGGATCTATTTTCTGTT AAATGTTATCTCTTGTTCTTGAGAAGTGCTTTGGAGATCATTTTTCCATCGTATTAACAA AAAGTGAAATAACTACTTGTGCAATCAGGCTTTTCCTACACCAGGGGATAAGGCAAATAA ACTATTCACCTCCTTTAATTAGCTCCCCCCCCCCCCCCTCCCCTTCTTTTCTCTTCATTC CTGANNNANTTAGCTAGTACGCACCATTCAATCAATTATTTCTGTTCCATTTTGTGCTAA ATATGTTTTCAAATGTTTAATATAGTTCTGAAGACAGCAGTTTAATGTTTTGTCTGGCTA ACTGCTATTCTAAGCTCATTGTTTCAGCTTGCAGTTTTGCAGCAAAACCTGTCTGCTGTC CATGAAATCTGGAAGGAATGTAGTAAATTTTACAGTCTCAGCCTTCTATCTCTGAGGAAG TTTATATGGTCCTTCACGGAGCTGAGAGATCTGAATTCAGCCCACACAGCCTTACAGCAC ATGGTGAGATTGGCTTTTACGGAAAACTCTTACATTAGTAGAACTGCTGAGGGGAGGTTT TGTGATTTAAGATTGGATATTCCAGCACCTTCCTCTGGCAATTGGAGTTTCATCGATGTA TCTGTCGACACCGCGGGTAGCAGCAATTTTGATATGGAAAGACAAAGTCTTGGCAGAAAA ACA and here is the database: ftp://ftp.ncbi.nih.gov/pub/UniVec/UniVec (I got the parameter from http://www.ncbi.nlm.nih.gov/VecScreen/VecScreen_docs.html#Parameters) Best, SB. -- Vendo isla: http://www.genesdigitales.com/isla/ Curso Biologia Molecular para programadores: http://tinyurl.com/2vv8w6 Bioinformatics news: http://www.bioinformatica.info Tutorial libre de Python: http://tinyurl.com/2az5d5 "It is pitch black. You are likely to be eaten by a grue." 
-- Zork

From bugzilla-daemon at portal.open-bio.org Fri Nov 7 09:37:23 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 7 Nov 2008 04:37:23 -0500
Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq)
In-Reply-To:
Message-ID: <200811070937.mA79bNh9020433@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2381

------- Comment #51 from lpritc at scri.sari.ac.uk 2008-11-07 04:37 EST -------
Just to perpetuate, what I suggest is (in pseudocode, and with argument names up for, well, argument):

class Seq:
    [...]
    def startswith_startcodon(self):
        """ Returns True if the first three bases of the sequence are a valid
            start codon in the sequence's codon table, returns False otherwise
        """

    def endswith_stopcodon(self):
        """ Returns True if the length of the sequence is a multiple of three,
            and the last three bases are a valid stop codon in the sequence's
            codon table, returns False otherwise
        """

    def is_cds(self):
        """ Returns True if the sequence meets the criteria for a CDS,
            False otherwise. The criteria are:
            i) The very first three bases of the sequence are a valid start codon
            ii) The sequence length is a multiple of three
            iii) The final three bases of the sequence are a valid stop codon
            iv) There are no in-frame stop codons, other than the final stop codon
        """
        if not self.startswith_startcodon():
            return False
        if not self.endswith_stopcodon():
            return False
        # Test for in-frame stop codon, return True if none is found,
        # return False otherwise

    def translate(self, [...], assert_cds=False, assert_cdsfirstcodon=False):
        """ Returns a new Seq object with the protein translation.

            If assert_cds is True, but the sequence is not a CDS as determined
            by self.is_cds(), then an error is thrown. Otherwise, the sequence
            is translated with the first codon read as a methionine, rather
            than the amino acid which it would encode at any other position.

            If assert_cdsfirstcodon is True, but the sequence doesn't start
            with a valid start codon, then an error is thrown. Otherwise, the
            sequence is translated with the first codon read as a methionine,
            as above.
        """
        # Translate away as normal, here
        [...]
        if assert_cds:
            if not self.is_cds():
                raise ValueError("WTF? This is no CDS, my good fellow human!")
            else:
                # Make the first amino acid of the translated sequence a Met
        if assert_cdsfirstcodon:
            if not self.startswith_startcodon():
                raise ValueError("Hey! Stop playing around, this sequence doesn't start with a start codon")
            else:
                # Make the first amino acid of the translated sequence a Met
        # Then continue as normal

This approach provides the following behaviour (assuming things about argument names that can be thrashed out later):

# I want to translate some nt sequence, and don't care about stops, starts, or any other stuff
aaseq = ntseq.translate()

# I want to translate my nt sequence to the first in-frame stop codon, and no further
aaseq = ntseq.translate(to_stop=True)

# I want to know if my nt sequence is a (putative) CDS
ntseq.is_cds()

# I want to know if my nt sequence starts with a start codon
ntseq.startswith_startcodon()

# I want to know if my nt sequence ends with an in-frame stop codon
# Note that this is a different question to asking whether there is *any* in-frame stop codon
ntseq.endswith_stopcodon()

# I want to translate my nt sequence, which I know is a CDS,
# but not convert the first codon to a methionine
aaseq = ntseq.translate()

# I want to translate my nt sequence, which I know is a CDS,
# and convert the first codon to a methionine
aaseq = ntseq.translate(assert_cds=True)

# OK, my sequence isn't a *real* CDS, but it still starts with a valid start codon
# (I checked already with ntseq.startswith_startcodon()), and I'd like to convert the first
# codon as if it was really a CDS. You don't need to know why, I just do. I'm wacky that way.
aaseq = ntseq.translate(assert_cdsfirstcodon=True)

# I'd like a list of all my sequences that are valid CDS
seqlist = [s for s in myntseqs if s.is_cds()]

# I'd like translations of all my sequences that are valid CDS
tlist1 = [s.translate() for s in seqlist]
tlist2 = [s.translate() for s in myntseqs if s.is_cds()]

In terms of nomenclature:

The default behaviour of translate() as Peter proposed (read through in-frame and translate with the appropriate codon table) is fine in nearly all circumstances. Most other circumstances are covered by stopping at the first in-frame stop codon, which Peter has implemented, and is an option we all seem to agree on.

Biologically speaking, this behaviour is not always correct for CDS in prokaryotes, where alternative start codons may occur a significant minority of the time. These will be mistranslated if no provision is made for them. I think a useful biological sequence object should at least try to mimic actual biology, so we should provide an option to handle this.

We should not assume that a sequence is a CDS unless it is specified by the user. It seems reasonable to me that the term 'cds' should occur in any such argument from the user.

We have at least two options for how to proceed with a CDS:
i) we can provide a strict CDS-type translation, which requires confirmation that the sequence is, in fact, a CDS;
ii) we can provide a weak CDS-type translation, which only modifies the way the start codon is translated.
In both cases, behaviour is specific to CDS, and so having 'cds' in the argument name *somewhere* seems obvious, and entirely reasonable.

I think that 'assert_cds' makes clear that we are asserting that the sequence is a valid CDS - no internal stops and everything else that comes with that status.
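As a rough illustration of the three checks proposed in this comment, here is a minimal sketch written as plain functions on strings rather than as Seq methods. It is not Biopython code: the codon lists are an assumption (the start and stop codons of the NCBI standard genetic code, table 1), and a real implementation would take them from the sequence's own codon table.

```python
# Assumed values: start and stop codons of the NCBI standard table (table 1).
START_CODONS = ("ATG", "CTG", "TTG")
STOP_CODONS = ("TAA", "TAG", "TGA")


def startswith_startcodon(seq):
    """True if the first three bases are a start codon in the assumed table."""
    return seq[:3].upper() in START_CODONS


def endswith_stopcodon(seq):
    """True if the length is a multiple of three and the last codon is a stop."""
    return len(seq) >= 3 and len(seq) % 3 == 0 and seq[-3:].upper() in STOP_CODONS


def is_cds(seq):
    """True if seq meets criteria i) to iv) listed in the comment above."""
    if not startswith_startcodon(seq):
        return False
    if not endswith_stopcodon(seq):
        return False
    upper = seq.upper()
    # Every in-frame codon except the final one must not be a stop codon.
    internal = [upper[i:i + 3] for i in range(0, len(upper) - 3, 3)]
    return not any(codon in STOP_CODONS for codon in internal)


# Filtering a list, in the style of the list comprehensions above.
candidates = ["ATGGCCTAA", "ATGTAAGCCTAA", "GCCGCCTAA", "TTGGCCTGA"]
valid = [s for s in candidates if is_cds(s)]
```

Keeping the test separate from translation is what allows it to be used as a filter without catching exceptions, which is the readability argument made earlier in the thread.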
I think that 'assert_cdsfirstcodon' avoids any ambiguity over the word 'start', and also conveys that we are asserting that the first (rather than start) codon has some relationship to a CDS; in this case the relationship is that the first codon of the sequence meets the criteria for a CDS. But that's kind of a long argument name ;) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Nov 7 09:48:18 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 7 Nov 2008 04:48:18 -0500 Subject: [Biopython-dev] [Bug 2381] translate and transcibe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200811070948.mA79mIRl021035@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 ------- Comment #52 from lpritc at scri.sari.ac.uk 2008-11-07 04:48 EST ------- (In reply to comment #50) > The use of 'cds' alone is wrong because cds refer to DNA not translation and > not to protein sequences. The use of cds is confusing or at least vague until > you determine how it works. I think that translate() also refers only to nucleotide sequences, and therefore the association of 'cds' is not inherently confusing on that count. I think that it can be an appropriate term in an argument name (see above). > Also it could be wrong in the sense it is a valid > cds (see the GUG initiation in mammalian NAT1 example at the NCBI link) just > not allowed by the table in Bio.Data.CodonTable. It's up to the user to use the correct codon table for their purpose, I think. Otherwise, how would you propose to correct for their error? > [...] 'cds_start' [...] One interpretation that also > comes to mind is that it is the location of the start of the cds in the > sequence (cds start at...). I agree with this. 
It has the potential to be confusing.

> This is not a check that it is the start of a cds
> rather it is a check for a possible open reading frame (as not all open reading
> frames are cds).

It is true that not all ORFs are CDS (indeed, by far the majority are not). However, open reading frames do not have to start with - or even contain - a start codon. They just do not contain an in-frame stop codon. We've been over this definition before (comment #21).

L.

From biopython at maubp.freeserve.co.uk Fri Nov 7 10:13:21 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 7 Nov 2008 10:13:21 +0000
Subject: [Biopython-dev] Possible problem with NCBIStandalone.blastall
In-Reply-To:
References:
Message-ID: <320fb6e00811070213i4aa5955arf233180d6a047de0@mail.gmail.com>

On Fri, Nov 7, 2008 at 6:56 AM, Sebastian Bassi wrote:
> When I run a command line blast with these parameters:
>
> /root/blast-2.2.18/bin/blastall -p blastn -d /var/www/blast/db/UniVec
> -q -5 -G 3 -E 3 -F "m D" -e 700 -Y 1.75e12 -i tmpsq
>
> I find a match (with evalue of 18).
> But when I do it from biopyhon I can't find any match:
>
> rh, eh = NCBIStandalone.blastall(blast_exe, "blastn", db,
>                                  fin, nuc_mismatch='-5',
>                                  gap_open = '3',
>                                  gap_extend = '3',
>                                  search_length = '1.75e12',
>                                  expectation='20')

You are not using exactly the same arguments, so it's not surprising you get different results:

-q -5       => nuc_mismatch = -5 (or as a string)
-G 3        => gap_open = 3 (or as a string)
-E 3        => gap_extend = 3 (or as a string)
-F "m D"    => filter = "m D" (MISSING!)
-e 700      => expectation = 700 (or as a string)
-Y 1.75e12  => search_length = '1.75e12' (or as a float)

Your expectation cutoff is more generous in the command-line version (700) than in the Biopython version (20), but that wouldn't explain the difference, since the match you found has an e-value of 18. It's probably due to omitting the filter option (-F). If that doesn't resolve the difference then there is something very strange going on...

Peter

From bugzilla-daemon at portal.open-bio.org Fri Nov 7 11:14:13 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 7 Nov 2008 06:14:13 -0500
Subject: [Biopython-dev] [Bug 2622] Parsing between position locations like 5933^5934 in GenBank/EMBL files
In-Reply-To:
Message-ID: <200811071114.mA7BED84026709@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2622

------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-07 06:14 EST -------
I've updated CVS to treat a between position like 3^4 (one-based counting) as a zero-length slice 3:3.

From bugzilla-daemon at portal.open-bio.org Fri Nov 7 11:19:12 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 7 Nov 2008 06:19:12 -0500
Subject: [Biopython-dev] [Bug 2640] Proposal: doctest for SeqRecord/biopython
In-Reply-To:
Message-ID: <200811071119.mA7BJCjd027093@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2640

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

------- Comment #10 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-07 06:19 EST -------
Marking as fixed - I've updated SeqRecord.py in CVS revision 1.25 to call the doctests via the __main__ trick, with similar changes for Bio.Seq, Bio.SeqIO and Bio.AlignIO (the latter are complicated due to finding the input files).
Thanks for the encouragement Marco - hopefully this has also made the docstring documentation more useful, and will also improve the API docs too: http://biopython.org/DIST/docs/api/ (updated for each release) Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Nov 7 11:52:50 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 7 Nov 2008 06:52:50 -0500 Subject: [Biopython-dev] [Bug 2613] test_Wise and test_psw fail under Python 2.3 In-Reply-To: Message-ID: <200811071152.mA7BqoKj029425@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2613 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-07 06:52 EST ------- "Fixed" by skipping these tests (and the recently added test_docstrings.py) if run on Python 2.3. Python 2.3 doctest uses slightly different formatting. It also doesn't support some features like -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Fri Nov 7 12:32:33 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 7 Nov 2008 12:32:33 +0000 Subject: [Biopython-dev] CVS freeze for Biopython 1.49 (beta) Message-ID: <320fb6e00811070432x123e806foa06b7f3d94bdb068@mail.gmail.com> Hi all, I've been going over a few little things on the unit tests (e.g. python 2.3's doctest isn't quite the same), and think I am ready to prepare Biopython 1.49 (beta). 
I plan to make the Windows installers for Python 2.3, 2.4 and 2.5 against numpy 1.1.1 Currently there is no Windows version of numpy for python 2.6, so we won't be able to ship a Windows installer for python 2.6 for Biopython either. So, its CVS freeze time. Once the beta is out (hopefully later today), we can start using CVS for documentation updates or fixing any bugs reported in the beta. Then in about a week's time I hope to do the Biopython 1.49 "final" release. Peter From bugzilla-daemon at portal.open-bio.org Fri Nov 7 15:18:47 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 7 Nov 2008 10:18:47 -0500 Subject: [Biopython-dev] [Bug 2629] Updated Bio.NaiveBayes to listfns import In-Reply-To: Message-ID: <200811071518.mA7FIlHb012537@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2629 ------- Comment #14 from bsouthey at gmail.com 2008-11-07 10:18 EST ------- (In reply to comment #13) > (In reply to comment #12) > I have uploaded a fixed version of Bio.NaiveBayes to CVS. Can you check to see > if you're happy with this version? > Yes! -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From sbassi at gmail.com Fri Nov 7 16:30:34 2008 From: sbassi at gmail.com (Sebastian Bassi) Date: Fri, 7 Nov 2008 14:30:34 -0200 Subject: [Biopython-dev] Possible problem with NCBIStandalone.blastall In-Reply-To: <320fb6e00811070213i4aa5955arf233180d6a047de0@mail.gmail.com> References: <320fb6e00811070213i4aa5955arf233180d6a047de0@mail.gmail.com> Message-ID: On Fri, Nov 7, 2008 at 8:13 AM, Peter wrote: > -q -5 =>nuc_mismatch = -5 (or as a string) > -G 3 => gap_open = 3 (or as a string) > -E 3 => gap_extend = 3 (or as a string) > -F "m D" => filter="m D" (MISSING!) I will try with this. 
> -e 700 => expectation=700 (or as a string) > -Y = 1.75e12 => search_length = '1.75e12' (or as a float) I used string since I have the biopython version with the bug that doesn't allow me to enter non iterable values. > the difference. Its probably due to omitting the filter option (-F). > If that doesn't resolve the difference then there is something very > strange going on... OK, I will check it and get back with the results. Thank you. Best, SB. From biopython at maubp.freeserve.co.uk Fri Nov 7 16:53:58 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 7 Nov 2008 16:53:58 +0000 Subject: [Biopython-dev] CVS freeze for Biopython 1.49 (beta) In-Reply-To: <320fb6e00811070432x123e806foa06b7f3d94bdb068@mail.gmail.com> References: <320fb6e00811070432x123e806foa06b7f3d94bdb068@mail.gmail.com> Message-ID: <320fb6e00811070853w77cd415dn68b1889c09388fb6@mail.gmail.com> > Once the beta is out (hopefully later today), we can start using CVS > for documentation updates or fixing any bugs reported in the beta. > Then in about a week's time I hope to do the Biopython 1.49 "final" > release. OK - Biopython 1.49 beta is done, available on the website now :) Please don't do any new code checkins for the next week. Additional documentation and unit tests should be fine - and any bug fixes after discussion. 
I've done a news post, which I can edit if anyone spots anything wrong or has suggestion for improvement, but it will be a good basis for the announcement email: http://news.open-bio.org/news/2008/11/biopython-149-beta-released/ Peter From bugzilla-daemon at portal.open-bio.org Fri Nov 7 16:55:22 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 7 Nov 2008 11:55:22 -0500 Subject: [Biopython-dev] [Bug 2629] Updated Bio.NaiveBayes to listfns import In-Reply-To: Message-ID: <200811071655.mA7GtM6F018980@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2629 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #15 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-07 11:55 EST ------- Grand - this bug seems to be fixed then (and in time for Biopython 1.49 beta). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sun Nov 9 02:56:59 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 8 Nov 2008 21:56:59 -0500 Subject: [Biopython-dev] [Bug 2225] Do something with the PROJECT line in GenBank files In-Reply-To: Message-ID: <200811090256.mA92uxgL025316@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2225 chapmanb at 50mail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Status|REOPENED |RESOLVED Resolution| |FIXED ------- Comment #3 from chapmanb at 50mail.com 2008-11-08 21:56 EST ------- Thanks Peter for the heads up on the future changes. 
Fixed this with respect to the offered suggestions with Bio/GenBank/Record.py 1.12; Bio/GenBank/Scanner.py 1.25 and Bio/GenBank/__init__.py 1.95. I left PROJECT output as shown in our example as it was not clear from the GenBank documentation whether they would be on multiple or single lines. DBLINK was output over multiple line as defined in the documentation. When files with DBLINKs are released we should include a test case. For feature parsing, both DBLINK and PROJECT will be stored as dbxrefs as suggested. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sun Nov 9 15:04:09 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 9 Nov 2008 10:04:09 -0500 Subject: [Biopython-dev] [Bug 2225] Do something with the PROJECT line in GenBank files In-Reply-To: Message-ID: <200811091504.mA9F49hU030667@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2225 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|RESOLVED |REOPENED Resolution|FIXED | ------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-09 10:04 EST ------- You've got a minor bug in there Brad... def dblink(self, content): """Store DBLINK cross references as dbxrefs in our record object. """ dblinks = [l for l in content.split() if l] self.data.dbxrefs.extend(projects) Should be: self.data.dbxrefs.extend(dblinks) However, based on the example DBLINK line, we shouldn't be splitting on spaces at all - for example this transition example for when the PROJECT line and DBLINK lines are present: LOCUS CP000964 5641239 bp DNA circular BCT 24-SEP-2008 DEFINITION Klebsiella pneumoniae 342, complete genome. 
ACCESSION CP000964 VERSION CP000964.1 GI:206564770 PROJECT GenomeProject:28471 DBLINK Project:28471 Trace Assembly Archive:123456 .... Note that "Trace Assembly Archive:123456" should be a single cross reference. I'll attach a patch for CVS in a moment. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sun Nov 9 15:07:30 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 9 Nov 2008 10:07:30 -0500 Subject: [Biopython-dev] [Bug 2225] Do something with the PROJECT line in GenBank files In-Reply-To: Message-ID: <200811091507.mA9F7U0N030977@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2225 ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-09 10:07 EST ------- Created an attachment (id=1045) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1045&action=view) Patch to Bio/GenBank/*.py This patch against CVS assumes DBLINK lines contain one cross reference per line. Also maps "GenomeProject:" to "Project:" so that we'll be consistent when the NCBI change this as part of the PROJECT line to DBLINK line switch. Should avoid duplicate entries in the dbxrefs list (especially during the transition period where both PROJECT and DBLINK lines are used). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Sun Nov 9 15:16:50 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sun, 9 Nov 2008 15:16:50 +0000 Subject: [Biopython-dev] Biopython 1.49 beta released Message-ID: <320fb6e00811090716v58637d55o470246df4175464e@mail.gmail.com> Dear Biopythoneers, We are pleased to announce a beta release of Biopython 1.49. 
There have been some significant changes since Biopython 1.48 was released two months ago, which is why we are initially releasing a beta for wider testing.

As previously announced, the big news is that Biopython now uses NumPy rather than its precursor Numeric (the original Numerical Python library).

As in the previous releases, Biopython 1.49 beta supports Python 2.3, 2.4 and 2.5 but should now also work fine on Python 2.6. Please note that we intend to drop support for Python 2.3 in a couple of releases' time.

We also have some new functionality, starting with the basic sequence object (the Seq class) which now has more methods. This encourages a more object-orientated coding style, and makes basic biological operations like transcription and translation more accessible and discoverable.

Our BioSQL interface can now optionally fetch the NCBI taxonomy on demand when loading sequences (via Bio.Entrez), allowing you to populate the taxon/taxon_name tables gradually. Also, BioSQL should now work with the psycopg2 driver for PostgreSQL (as well as the older psycopg driver).

Finally, our old parsing infrastructure (Martel and Bio.Mindy) is now considered to be deprecated, meaning mxTextTools is no longer required to use Biopython. This should not affect any of the typically used parsers (e.g. Bio.SeqIO and Bio.AlignIO).

So, if you are feeling brave and know the risks, please try out Biopython 1.49 beta, and let us know on the mailing lists if it works, or more importantly if something doesn't. We'd also like feedback on the updated Biopython Tutorial and Cookbook:

http://biopython.org/DIST/docs/tutorial/Tutorial.html
http://biopython.org/DIST/docs/tutorial/Tutorial.pdf

Source distributions and Windows installers are available from the Biopython website:

http://biopython.org/wiki/Download

Thanks!

-Peter on behalf of the Biopython developers

P.S. Those of you subscribed to our news feed will have seen this announcement already.
For RSS links etc, see: http://biopython.org/wiki/News From bugzilla-daemon at portal.open-bio.org Sun Nov 9 16:00:39 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 9 Nov 2008 11:00:39 -0500 Subject: [Biopython-dev] [Bug 2640] Proposal: doctest for SeqRecord/biopython In-Reply-To: Message-ID: <200811091600.mA9G0dZ6003494@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2640 ------- Comment #11 from dalloliogm at gmail.com 2008-11-09 11:00 EST ------- (In reply to comment #10) > Marking as fixed - I've updated SeqRecord.py in CVS revision 1.25 to call the > doctests via the __main__ trick, with similar changes for Bio.Seq, Bio.SeqIO > and Bio.AlignIO (the later are complicated due to finding the input files). > > Thanks for the encouragement Marco - hopefully this has also made the docstring > documentation more useful, and will also improve the API docs too: > http://biopython.org/DIST/docs/api/ (updated for each release) Thanks to you!! :) I am really happy you accepted my patch. I'll see if I can contribute something else. > > Peter > -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From biopython at maubp.freeserve.co.uk Sun Nov 9 16:10:59 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sun, 9 Nov 2008 16:10:59 +0000 Subject: [Biopython-dev] Sequences and simple plots In-Reply-To: References: <320fb6e00810150709u2aed9855kb8cf91318f287765@mail.gmail.com> Message-ID: <320fb6e00811090810s342e78f1n3eb45bba051d236f@mail.gmail.com> Getting back to simpler plot examples using pylab, Andrew Dalke wrote up some nice examples plotting Kyte & Doolittle hydrophobicities of protein sequences: http://www.dalkescientific.com/writings/NBN/plotting.html Something based on this idea (but probably leaving out most of the complicated smoothing stuff and labelling the helices) could make a short and sweet line plot example for the Biopython tutorial. Peter From bugzilla-daemon at portal.open-bio.org Sun Nov 9 17:29:34 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 9 Nov 2008 12:29:34 -0500 Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO In-Reply-To: Message-ID: <200811091729.mA9HTYF1011072@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2643 dalloliogm at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1042 is|0 |1 obsolete| | ------- Comment #12 from dalloliogm at gmail.com 2008-11-09 12:29 EST ------- Created an attachment (id=1046) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1046&action=view) fastPhase output iterator (returns Alignment objects) This is the rewritten fastphaseoutputIO, which returns an Alignment object instead of SeqRecord objects. It can still return SeqRecord objects if a 'ret = seqrecord' parameter is passed, but Alignment objects are returned by default.
Moreover, I have de-capitalized (.lower()) the name of the function, and added a link to the fastPhase article in the documentation (although I think the doc would need more work) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sun Nov 9 17:30:25 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 9 Nov 2008 12:30:25 -0500 Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO In-Reply-To: Message-ID: <200811091730.mA9HUP6J011190@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2643 dalloliogm at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1046 is|0 |1 obsolete| | ------- Comment #13 from dalloliogm at gmail.com 2008-11-09 12:30 EST ------- Created an attachment (id=1047) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1047&action=view) a doctest file to test fastPhaseOutputIterator -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Sun Nov 9 17:34:19 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 9 Nov 2008 12:34:19 -0500 Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO In-Reply-To: Message-ID: <200811091734.mA9HYJ7I011664@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2643 ------- Comment #14 from dalloliogm at gmail.com 2008-11-09 12:34 EST ------- Created an attachment (id=1048) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1048&action=view) use cases/description for fastphaseoutputIO This is a collection of use cases/examples about fastPhaseOutputIO. I thought it could be useful to understand how this module will be used and by who, or just to remind me why I wrote this module later :) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sun Nov 9 17:41:26 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 9 Nov 2008 12:41:26 -0500 Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO In-Reply-To: Message-ID: <200811091741.mA9HfQlr012379@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2643 ------- Comment #15 from dalloliogm at gmail.com 2008-11-09 12:41 EST ------- (In reply to comment #10) > (In reply to comment #8) > > > If fastPHASE files SHOULD always come in allele groups (of the same > > > length), then it would be better to integrate the parser into Bio.AlignIO > > > giving pairwise alignments (and you would be able to read it via Bio.SeqIO > > > automatically as well). > > > > This is good idea, I didn't think of it. > > But how should I modify the module to produce AlignIO objects? 
> > Essentially Instead of: > > yield record_one > yield record_two > > you'd do something like this: > > alignment = Alignment(generic_dna) > alignment.add_sequence(id_one, seq_one) > alignment.add_sequence(id_two, seq_two) > yield alignment I have modified the module so it returns Alignment objects instead of SeqRecords. The problem is that Alignment.add_sequence doesn't support SeqRecords objects as inputs; it only requires an id and the sequence. This causes that some information is lost: to be more precise, everything I was putting in 'description' (subpop. label: 6 (internally 1)) is lost, because there is not a way to store it in the Alignment object. Moreover, now the parser only returns a single Alignment object per file (I think it is not supposed to be possible to have two fastphase outputs in the same file), because I thought it was the most useful thing. However, I left an option to have SeqRecord objects returned instead of Alignments (unfortunately I removed them from the doctests :(). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sun Nov 9 17:46:13 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 9 Nov 2008 12:46:13 -0500 Subject: [Biopython-dev] [Bug 2554] Creating an Alignment from a list of SeqRecord objects In-Reply-To: Message-ID: <200811091746.mA9HkDPr012817@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2554 ------- Comment #3 from dalloliogm at gmail.com 2008-11-09 12:46 EST ------- (In reply to comment #0) > It would be nice to be able to supply a list (or iterator) of SeqRecord objects > when creating an alignment object. This would also make the > Bio.SeqIO.to_alignment() function obsolete. 
I agree with this request; see http://bugzilla.open-bio.org/show_bug.cgi?id=2643#c15 -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sun Nov 9 17:52:48 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 9 Nov 2008 12:52:48 -0500 Subject: [Biopython-dev] [Bug 2640] Proposal: doctest for SeqRecord/biopython In-Reply-To: Message-ID: <200811091752.mA9HqmqQ013518@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2640 ------- Comment #12 from dalloliogm at gmail.com 2008-11-09 12:52 EST ------- Created an attachment (id=1049) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1049&action=view) add doctests to Bio.Align.Generic.Alignment This is a patch to add doctest to Bio.Align.Generic.Alignment. I just wrote it for myself to understand how this class works.. if you think it could be useful, here it is. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sun Nov 9 21:35:25 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 9 Nov 2008 16:35:25 -0500 Subject: [Biopython-dev] [Bug 2225] Do something with the PROJECT line in GenBank files In-Reply-To: Message-ID: <200811092135.mA9LZPBG004563@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2225 ------- Comment #6 from chapmanb at 50mail.com 2008-11-09 16:35 EST ------- Peter -- thanks for the bug catch and suggestion. Working into the future and trying to predict if NCBI is going to do what they plan is always fun. Your fix looks great to me -- commit away and we can close this out. 
If things are different when they actually make the change we can always adjust then but this looks very sensible. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Nov 10 08:58:52 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 10 Nov 2008 03:58:52 -0500 Subject: [Biopython-dev] [Bug 2639] SeqRecord.init doesn't check for arguments for their types In-Reply-To: Message-ID: <200811100858.mAA8wq2i007149@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2639 dalloliogm at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Summary|SeqRecord.init doesn't check|SeqRecord.init doesn't check |for arguments to their types|for arguments for their | |types ------- Comment #5 from dalloliogm at gmail.com 2008-11-10 03:58 EST ------- (In reply to comment #4) > (In reply to comment #3) > > Created an attachment (id=1041) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1041&action=view) [details] [details] > > add a check for the seq argument in seqrecord, to be a Seq object and not None > > > > This patch adds a check for the seq argument in SeqRecord. > > If seq is None (by default), it raises a ValueError Exception. > > If it is a Seq object, it saves it as self.seq. > > If it is another kind of object (string, list, integer), it is converted to a > > string, and then used to instantiate a seq object. > > I was deliberately not checking the seq argument. Ok, understood. I hadn't thought of these cases. However, not having a Seq causes errors that are difficult to understand in other functions that use SeqRecord.
For example, if you do: >>> a = SeqRecord(id = '1') >>> a.format('fasta') you get the error: 'NoneType' object has no attribute 'tostring' This could scare a Biopython newcomer; an exception like 'error - current SeqRecord object doesn't have a Seq' would be better. What do you think about creating a 'NullSeq' object, which represents a Seq with no value, and using it as a default for SeqRecord? Later we could modify the other functions like .format and Seq.translate to intercept these objects and return the right error message. > There are several reasonable > use cases: > > * a Seq object (normal) or a subclass of it. > * a MutableSeq object (seems reasonable, note this is not a subclass of Seq) > * None (seems a good way to handle sequence records where we don't know the > sequence - for example some GenBank files). > * a user defined sequence object which implements the Seq API but does not > subclass Seq or MutableSeq (this is more difficult to check). > > > I thought that someone could use an integer (e.g.: 010100010101101) as a > > sequence, and in this case, the integer is first converted to a string > > (otherwise Seq() would return an error). > > Note that if someone did want to use some weird numerical sequence, then the > SeqRecord object should NOT be trying to do anything special (guessing what is > intended). The user should create a suitable Seq object themselves (ideally > with a numerical alphabet object). Explicit rather than implicit (Zen of > python). > > -- > > Note that I'm not 100% happy with the type checking we've just added. See > "duck-typing" and interfaces versus types, > http://www.python.org/doc/2.5.2/tut/node18.html#l2h-46 > > The checks I've added shouldn't be too constraining - but maybe they should be > using interface checking instead (or just revert back to no checking). > > Any comments from other people? This should be being CC'd to the dev mailing > list.
> -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Nov 10 09:09:42 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 10 Nov 2008 04:09:42 -0500 Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO In-Reply-To: Message-ID: <200811100909.mAA99g8S008678@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2643 dalloliogm at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1043 is|0 |1 obsolete| | -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Nov 10 10:16:14 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 10 Nov 2008 05:16:14 -0500 Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO In-Reply-To: Message-ID: <200811101016.mAAAGERI012974@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2643 ------- Comment #16 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-10 05:16 EST ------- (In reply to comment #15) > I have modified the module so it returns Alignment objects instead of > SeqRecords. > The problem is that Alignment.add_sequence doesn't support SeqRecords objects > as inputs; it only requires an id and the sequence. This causes that some > information is lost: to be more precise, everything I was > putting in 'description' (subpop. label: 6 (internally 1)) is lost, because > there is not a way to store it in the Alignment object. Adding a SeqRecord to an alignment would be enhancement request Bug 2553. 
I see you've just spotted enhancement request Bug 2554 which would also solve this issue nicely. As a short term solution until one of these bugs is implemented, some of the Bio.AlignIO parsers "cheat" and bypass the public API to use alignment._records directly (this is just a list of SeqRecord objects). > Moreover, now the parser only returns a single Alignment object per file (I > think it is not supposed to be possible to have two fastphase outputs in the > same file), because I thought it was the most useful thing. Bio.AlignIO uses generators/iterators just like Bio.SeqIO - so that in general you can return multiple alignments for use with Bio.AlignIO.parse(). However, if the file format really does just return one pairwise alignment, then just yield one alignment (this happens on the Nexus file format). > However, I left an option to have SeqRecord objects returned instead of > Alignments (unfortunately I removed them from the doctests :(). If you want this as part of Bio.AlignIO / Bio.SeqIO you don't need to do this. Once a parser is added to Bio.AlignIO, the file format can also be used from Bio.SeqIO to get SeqRecord objects (the rows of all the alignments). Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
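A minimal sketch of the generator approach described in comment #16, offered only as an illustration. It assumes the genotype block layout quoted later in this thread (comment #21): a BEGIN GENOTYPES / END GENOTYPES block in which each individual is an identifier line followed by two haplotype lines. It does not use Biopython, and `iter_individuals` and its tuple layout are hypothetical names, not the attached parser.

```python
def iter_individuals(lines):
    """Yield (name, description, haplotype_one, haplotype_two) per individual.

    Hypothetical helper: assumes each individual is one identifier line
    followed by exactly two haplotype lines inside the GENOTYPES block.
    """
    in_block = False
    pending = None   # (name, description) of the individual being read
    haplotypes = []  # haplotype strings collected for that individual
    for raw in lines:
        line = raw.strip()
        if line == "BEGIN GENOTYPES":
            in_block = True
            continue
        if line == "END GENOTYPES":
            break
        if not in_block or not line:
            continue
        if pending is None:
            # Identifier line, e.g. "Ind1 # subpop. label: 6 (internally 1)"
            name, _, description = line.partition("#")
            pending = (name.strip(), description.strip())
        else:
            # Haplotype line: alleles separated by spaces
            haplotypes.append("".join(line.split()))
            if len(haplotypes) == 2:
                yield pending[0], pending[1], haplotypes[0], haplotypes[1]
                pending, haplotypes = None, []


# Shortened, made-up input in the assumed layout
example = """\
BEGIN GENOTYPES
Ind1 # subpop. label: 6 (internally 1)
T T G A
T C G A
Ind2 # subpop. label: 6 (internally 1)
C T G A
T T G C
END GENOTYPES
""".splitlines()

for name, description, hap_one, hap_two in iter_individuals(example):
    print(name, hap_one, hap_two)
```

Because the function is a generator, a caller can either consume one individual at a time or collect everything into a single combined alignment, so the choice between the two layouts can be made outside the parser.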
From bugzilla-daemon at portal.open-bio.org Mon Nov 10 10:45:34 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 10 Nov 2008 05:45:34 -0500 Subject: [Biopython-dev] [Bug 2225] Do something with the PROJECT line in GenBank files In-Reply-To: Message-ID: <200811101045.mAAAjYJ6015314@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2225 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|REOPENED |RESOLVED Resolution| |FIXED ------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-10 05:45 EST ------- (In reply to comment #3) > When files with DBLINKs are released we should include a test case. Definitely. We might be able to just update an existing test case, like the one added for between locations. (In reply to comment #6) > Peter -- thanks for the bug catch and suggestion. Working into the future > and trying to predict if NCBI is going to do what they plan is always fun. Well - they've got about six months to change their mind ;) > Your fix looks great to me -- commit away and we can close this out. Checked in. > If things are different when they actually make the change we can always > adjust then but this looks very sensible. OK. Thanks! Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
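The cross-reference handling discussed in this bug (one reference per DBLINK line, "GenomeProject:" mapped to "Project:", no duplicate entries) can be sketched roughly as below. This is an illustration only and not the code checked into Bio/GenBank; `collect_dbxrefs` is a hypothetical helper, and it assumes the PROJECT value and the DBLINK lines have already been extracted from the record.

```python
def collect_dbxrefs(project_value, dblink_lines):
    """Build a de-duplicated list of cross references.

    Hypothetical helper: project_value is the text after PROJECT (or None),
    dblink_lines are the DBLINK entries, one cross reference per line.
    """
    candidates = []
    if project_value:
        candidates.append(project_value.strip())
    candidates.extend(line.strip() for line in dblink_lines)

    dbxrefs = []
    for ref in candidates:
        if not ref:
            continue
        # Normalise the old PROJECT-style prefix to the DBLINK-style one
        if ref.startswith("GenomeProject:"):
            ref = "Project:" + ref[len("GenomeProject:"):]
        # Keep the whole line as one reference, skipping repeats
        if ref not in dbxrefs:
            dbxrefs.append(ref)
    return dbxrefs


refs = collect_dbxrefs(
    "GenomeProject:28471",
    ["Project:28471", "Trace Assembly Archive:123456"],
)
print(refs)
```

With the example header quoted in this bug, the PROJECT and DBLINK entries for 28471 collapse into one item and "Trace Assembly Archive:123456" stays a single cross reference.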
From biopython at maubp.freeserve.co.uk Mon Nov 10 11:28:00 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 10 Nov 2008 11:28:00 +0000 Subject: [Biopython-dev] [BioPython] annotations in an Alignment object In-Reply-To: <5aa3b3570811100304o4655fe60o4ecabf41e054c211@mail.gmail.com> References: <5aa3b3570811100304o4655fe60o4ecabf41e054c211@mail.gmail.com> Message-ID: <320fb6e00811100328j1a565c36t7f3522344e7c95c0@mail.gmail.com> On Mon, Nov 10, 2008 at 11:04 AM, Giovanni Marco Dall'Olio wrote: > Is there any way to store some annotations in an Alignment object?? > For example: the alignment tool used, its parameters, its version, the > date, and the nature of the sequence aligned. Not officially, no. This is on my mental list of things to do with the alignment object (after Biopython 1.49 is done). I've CC'd the dev-mailing list which is probably a better place to discuss the details. If you look at Bio/AlignIO/StockholmIO.py or the Bio/AlignIO/FastaIO.py code you'll see I've recorded this kind of information in a private dictionary, i.e. alignment._annotations. This makes the data available if anyone really needs it, but signals that this is not part of the public API and is likely to change. As part of an alignment annotation enhancement, we should try and establish some agreed standards for naming annotation entries (and also counting systems). > I am asking this because I would like to write a module to create > ldhat input files from an alignment program. > A ldhat file (http://www.stats.ox.ac.uk/~mcvean/LDhat/instructions.html) > is very similar to a fasta file; the only difference is that in its > first line, it contains three numbers, one of which can't always be > inferred by the data. Why go to the trouble of making a new Bio.AlignIO module? 
For this example from the LDhat manual, it looks like a FASTA file with an extra header: 4 10 1 >SampleA TCCGC??RTT >SampleB TACGC??GTA >SampleC TC?-CTTGTA >SampleD TCC-CTTGTT Rather than writing support for a whole new file format, wouldn't it be easier to do something like this: alignment = ... number_a = 4 number_b = 10 number_c = 1 handle = open("example.txt","w") handle.write("%i %i %i\n" % (number_a, number_b, number_c)) handle.write(alignment.format("fasta")) handle.close() Peter From dalloliogm at gmail.com Mon Nov 10 11:42:31 2008 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Mon, 10 Nov 2008 12:42:31 +0100 Subject: [Biopython-dev] [BioPython] annotations in an Alignment object In-Reply-To: <320fb6e00811100328j1a565c36t7f3522344e7c95c0@mail.gmail.com> References: <5aa3b3570811100304o4655fe60o4ecabf41e054c211@mail.gmail.com> <320fb6e00811100328j1a565c36t7f3522344e7c95c0@mail.gmail.com> Message-ID: <5aa3b3570811100342t7c23c0fl2b101be3fd352159@mail.gmail.com> On Mon, Nov 10, 2008 at 12:28 PM, Peter wrote: > On Mon, Nov 10, 2008 at 11:04 AM, Giovanni Marco Dall'Olio > wrote: >> Is there any way to store some annotations in an Alignment object?? >> For example: the alignment tool used, its parameters, its version, the >> date, and the nature of the sequence aligned. > > Not officially, no. This is on my mental list of things to do with > the alignment object (after Biopython 1.49 is done). I've CC'd the > dev-mailing list which is probably a better place to discuss the > details. > > If you look at Bio/AlignIO/StockholmIO.py or the > Bio/AlignIO/FastaIO.py code you'll see I've recorded this kind of > information in a private dictionary, i.e. alignment._annotations. > This makes the data available if anyone really needs it, but signals > that this is not part of the public API and is likely to change. 
> > As part of an alignment annotation enhancement, we should try and > establish some agreed standards for naming annotation entries (and > also counting systems). ok... I will use the private dictionary for my own implementation. Unfortunately I don't have any useful suggestion for this.. >> I am asking this because I would like to write a module to create >> ldhat input files from an alignment program. >> A ldhat file (http://www.stats.ox.ac.uk/~mcvean/LDhat/instructions.html) >> is very similar to a fasta file; the only difference is that in its >> first line, it contains three numbers, one of which can't always be >> inferred by the data. > > Why go to the trouble of making a new Bio.AlignIO module? For this > example from the LDhat manual, it looks like a FASTA file with an > extra header: Yeah.. of course :) Let's say I am simply playing with biopython's code, to better understand it. Since I am going to use this function many times, I will have to write a module for it any way. The first number in the ldhat file is the number of sequences, the second is their length, and the third should be usually one in an alignment object, I suppose. > > 4 10 1 >>SampleA > TCCGC??RTT >>SampleB > TACGC??GTA >>SampleC > TC?-CTTGTA >>SampleD > TCC-CTTGTT > > Rather than writing support for a whole new file format, wouldn't it > be easier to do something like this: > > alignment = ... 
> number_a = 4 > number_b = 10 > number_c = 1 > > handle = open("example.txt","w") > handle.write("%i %i %i\n" % (number_a, number_b, number_c)) > handle.write(alignment.format("fasta")) > handle.close() > > Peter > -- ----------------------------------------------------------- My Blog on Bioinformatics (italian): http://bioinfoblog.it From bugzilla-daemon at portal.open-bio.org Mon Nov 10 11:48:08 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 10 Nov 2008 06:48:08 -0500 Subject: [Biopython-dev] [Bug 2640] Proposal: doctest for SeqRecord/biopython In-Reply-To: Message-ID: <200811101148.mAABm8WO019854@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2640 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1033 is|0 |1 obsolete| | ------- Comment #13 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-10 06:48 EST ------- (From update of attachment 1033) Something similar was checked into CVS. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
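Going back to the LDhat discussion quoted above, a rough sketch of a reusable writer is given here. It assumes, as suggested in the thread, that the first header number is the number of sequences and the second is their length; the third value is simply passed in by the caller. `write_ldhat` is a hypothetical name, and plain (name, sequence) tuples stand in for an alignment object so the example runs without Biopython.

```python
import io


def write_ldhat(records, handle, third_field=1):
    """Write (name, sequence) pairs as FASTA with an LDhat-style header line.

    Hypothetical helper: third_field is supplied by the caller because it
    cannot always be inferred from the sequences themselves.
    """
    records = list(records)
    if not records:
        raise ValueError("need at least one sequence")
    length = len(records[0][1])
    for name, seq in records:
        if len(seq) != length:
            raise ValueError("sequence %s has a different length" % name)
    # Header: number of sequences, sequence length, caller-supplied value
    handle.write("%i %i %i\n" % (len(records), length, third_field))
    for name, seq in records:
        handle.write(">%s\n%s\n" % (name, seq))


# The four sample sequences from the LDhat manual excerpt quoted in the thread
samples = [
    ("SampleA", "TCCGC??RTT"),
    ("SampleB", "TACGC??GTA"),
    ("SampleC", "TC?-CTTGTA"),
    ("SampleD", "TCC-CTTGTT"),
]
buffer = io.StringIO()
write_ldhat(samples, buffer)
print(buffer.getvalue())
```

Deriving the first two numbers from the data avoids hard-coding them, while the length check catches input that is not a proper alignment.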
From bugzilla-daemon at portal.open-bio.org Mon Nov 10 12:02:12 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 10 Nov 2008 07:02:12 -0500 Subject: [Biopython-dev] [Bug 2640] Proposal: doctest for SeqRecord/biopython In-Reply-To: Message-ID: <200811101202.mAAC2CV4020912@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2640 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1049 is|0 |1 obsolete| | ------- Comment #14 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-10 07:02 EST ------- (From update of attachment 1049) I've checked in something similar to CVS - thanks Marco. I've not added a doctest for the format method using "clustal" because I think the bits make the documentation nasty to read. Instead I've used just "fasta" and "phylip". -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Nov 10 12:14:28 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 10 Nov 2008 07:14:28 -0500 Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO In-Reply-To: Message-ID: <200811101214.mAACESXB021859@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2643 ------- Comment #17 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-10 07:14 EST ------- (In reply to comment #16) > (In reply to comment #15) > > I have modified the module so it returns Alignment objects instead of > > SeqRecords. > > The problem is that Alignment.add_sequence doesn't support SeqRecords > > objects as inputs; it only requires an id and the sequence.
This > > causes that some information is lost: to be more precise, everything > > I was putting in 'description' (subpop. label: 6 (internally 1)) is > > lost, because there is not a way to store it in the Alignment object. > > Adding a SeqRecord to an alignment would be enhancement request Bug 2553. I > see you've just spotted enhancement request Bug 2554 which would also solve > this issue nicely. As a short term solution until one of these bugs is > implemented, some of the Bio.AlignIO parsers "cheat" and bypass the public API > to use alignment._records directly (this is just a list of SeqRecord objects). Or, for another approach which at least avoids private properties but instead makes an assumption that added sequences are always put at the end of the alignment: alignment = Alignment(generic_dna) alignment.add_sequence(id_one, seq_one) assert alignment[-1].id == id_one alignment[-1].description = desrc_one alignment[-1].annotations["label"] = label_one ... alignment.add_sequence(id_two, seq_two) assert alignment[-1].id == id_two alignment[-1].description = desrc_two alignment[-1].annotations["label"] = label_two ... yield alignment However, I agree with you, the best solution is to pass SeqRecord objects to the alignment directly (i.e. Bug 2553 and/or Bug 2554). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Mon Nov 10 16:04:06 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 10 Nov 2008 11:04:06 -0500 Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO In-Reply-To: Message-ID: <200811101604.mAAG46Cj008024@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2643 ------- Comment #18 from dalloliogm at gmail.com 2008-11-10 11:04 EST ------- (In reply to comment #17) > > Or, for another approach which at least avoids private properties but instead > makes an assumption that added sequences are always put at the end of the > alignment: > > alignment = Alignment(generic_dna) > > alignment.add_sequence(id_one, seq_one) > assert alignment[-1].id == id_one > alignment[-1].description = desrc_one > alignment[-1].annotations["label"] = label_one > ... > > alignment.add_sequence(id_two, seq_two) > assert alignment[-1].id == id_two > alignment[-1].description = desrc_two > alignment[-1].annotations["label"] = label_two > ... > yield alignment > Ok!! I ended up using the first method, but I left a comment in the code to remind me that. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Mon Nov 10 16:06:49 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 10 Nov 2008 11:06:49 -0500 Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO In-Reply-To: Message-ID: <200811101606.mAAG6nDL008314@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2643 dalloliogm at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1044 is|0 |1 obsolete| | ------- Comment #19 from dalloliogm at gmail.com 2008-11-10 11:06 EST ------- Created an attachment (id=1050) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1050&action=view) fastPhase output iterator (returns an Alignment object with SeqRecords) This version returns an Alignment object with valid SeqRecord objects, using the Alignment._records.append trick. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Mon Nov 10 16:07:27 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 10 Nov 2008 11:07:27 -0500 Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO In-Reply-To: Message-ID: <200811101607.mAAG7RLr008403@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2643 dalloliogm at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1047 is|0 |1 obsolete| | ------- Comment #20 from dalloliogm at gmail.com 2008-11-10 11:07 EST ------- Created an attachment (id=1051) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1051&action=view) 1047: a doctest file to test fastPhaseOutputIterator updated for attachment 1050 -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Nov 10 16:34:34 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 10 Nov 2008 11:34:34 -0500 Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO In-Reply-To: Message-ID: <200811101634.mAAGYYbi010826@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2643 ------- Comment #21 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-10 11:34 EST ------- Hi Marco, Looking at your example, the important part of the file is this bit: ... BEGIN GENOTYPES Ind1 # subpop. label: 6 (internally 1) T T T T T G A A A C C A A A G A C G C T G C G T C A G C C T G C A A T C T G T T T T T G C C C C C A A A A G C G C G T C G T C A G T C T A A G A C C T A Ind2 # subpop. 
label: 6 (internally 1) C T T T T G C C C T C A A A A G T G C T G T G C C A G T C T A C G G C C T G T T T T T G A A A C C A A A G A C G C T T C G T C A G T A T A C G A T C T A END GENOTYPES Quoting the manual again, "Output files for inferred haplotypes or imputed genotypes contain two lines per given diploid individual, with the order of individuals corresponding to that supplied in the input file." In this example we have two individuals, Ind1 and Ind2 (presumably with automatically assigned names). In a real world example, how many individuals would you expect to use? Does it make more sense to return a pairwise alignment for each individual, rather than one large combined alignment? One of the main points for using iterators/generators is they allow us to deal with very large files by not having to keep everything in memory. Now I don't have a feel for what sized files fastPhase could output - maybe a single large alignment is fine. i.e. One combined alignment: IUPACUnambiguousDNA() alignment with 4 rows and 38 columns TTTTTGAAACCAAAGACGCTGCGTCAGCCTGCAATCTG Ind1_all1 TTTTTGCCCCCAAAAGCGCGTCGTCAGTCTAAGACCTA Ind1_all2 CTTTTGCCCTCAAAAGTGCTGTGCCAGTCTACGGCCTG Ind2_all1 TTTTTGAAACCAAAGACGCTTCGTCAGTATACGATCTA Ind2_all2 versus one pairwise alignment per individual: IUPACUnambiguousDNA() alignment with 2 rows and 38 columns TTTTTGAAACCAAAGACGCTGCGTCAGCCTGCAATCTG Ind1_all1 TTTTTGCCCCCAAAAGCGCGTCGTCAGTCTAAGACCTA Ind1_all2 IUPACUnambiguousDNA() alignment with 2 rows and 38 columns CTTTTGCCCTCAAAAGTGCTGTGCCAGTCTACGGCCTG Ind2_all1 TTTTTGAAACCAAAGACGCTTCGTCAGTATACGATCTA Ind2_all2 I think you'll have to decide this (unless anyone else following this has a view - Tiago maybe?) P.S. Have you tried with and without the -n option to automatically name the individuals? What happens if the name includes a hash character (#)? I would hope fastPhase would treat this as an error, but it could end up in the output file and confuse the parser. P.P.S. 
Based on the examples in the manual, typical output might use lower case nucleotides (a, t, c, g) or numbers (0, 1). I presume upper case nucleotides are also fine, but defaulting to this is a bad idea. Please default to Bio.Alphabet.single_letter_alphabet which seems to be the safest choice (we shouldn't guess). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Nov 10 19:19:15 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 10 Nov 2008 14:19:15 -0500 Subject: [Biopython-dev] [Bug 2649] New: Bio.KDTree expects numpy array with dtype="float32" on 64 bit machines. Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2649 Summary: Bio.KDTree expects numpy array with dtype="float32" on 64 bit machines. Product: Biopython Version: 1.49b Platform: PC OS/Version: Linux Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: paul at rudin.co.uk Bio.KDTree expects numpy array with dtype="float32" on 64 bit machines. The numpy default for floats is "float64" on 64 bit machines and this would seem to be a more natural and practical choice. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Mon Nov 10 22:25:33 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 10 Nov 2008 17:25:33 -0500 Subject: [Biopython-dev] [Bug 2651] New: Error from test_GAQueens.py Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2651 Summary: Error from test_GAQueens.py Product: Biopython Version: 1.49b Platform: PC OS/Version: Linux Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: bsouthey at gmail.com I got this error with Python2.5 but it is extremely rare. I think that I seen it before but have never reproduced it. It indicates some bugs are lurking other than the obvious bug with Seq.py that are being triggered by the test. ====================================================================== ERROR: test_GAQueens ---------------------------------------------------------------------- Traceback (most recent call last): File "run_tests.py", line 125, in runTest self.runSafeTest() File "run_tests.py", line 142, in runSafeTest cur_test.run_tests([]) File "test_GAQueens.py", line 42, in run_tests main(arguments) File "test_GAQueens.py", line 76, in main evolved_pop = evolver.evolve(queens_solved) File "/home/bsouthey/python/biopython-1.49b/build/lib.linux-x86_64-2.5/Bio/GA/Evolver.py", line 56, in evolve self._population = self._selector.select(self._population) File "/home/bsouthey/python/biopython-1.49b/build/lib.linux-x86_64-2.5/Bio/GA/Selection/Tournament.py", line 77, in select new_orgs[1]) File "/home/bsouthey/python/biopython-1.49b/build/lib.linux-x86_64-2.5/Bio/GA/Selection/Abstract.py", line 53, in mutate_and_crossover final_org_1 = self._repairer.repair(final_org_1) File "test_GAQueens.py", line 234, in repair duplicated_items = self._get_duplicates(organism.genome) File "test_GAQueens.py", line 203, in _get_duplicates if genome.count(item) > 1: File 
"/home/bsouthey/python/biopython-1.49b/build/lib.linux-x86_64-2.5/Bio/Seq.py", line 796, in count if len(search) == 1 : TypeError: object of type 'int' has no len() ---------------------------------------------------------------------- -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Nov 10 23:28:26 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 10 Nov 2008 18:28:26 -0500 Subject: [Biopython-dev] [Bug 2651] Error from test_GAQueens.py In-Reply-To: Message-ID: <200811102328.mAANSQiJ032135@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2651 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Severity|normal |minor Component|Main Distribution |Unit Tests ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-10 18:28 EST ------- What bug in Seq? Trying to call the count method with an integer argument instead of string or another Seq should fail - try it on a string for comparison: >>> "123456".count(1) Traceback (most recent call last): File "", line 1, in ? TypeError: expected a character buffer object I would agree that the TypeError message could be better, "object of type 'int' has no len()" is a little misleading. Are you suggesting that be changed? Genetic algorithms (with a random seed at least) are non deterministic - I've seen some of the GA unit tests fail every so often (but I'm not sure off hand if its just test_GAQueens or not). Rerunning the test will usually be fine. The traceback looks familiar so its probably the same issue, but I haven't had the time or desire to trace through the code to try and work out what is going wrong. 
I would guess it fails far less than 10% of time, but maybe 1% or 2%. I guess a quick shell script would answer this ;) Maybe we should catch the error condition and issue a runtime error saying "Didn't converge" or whatever would be appropriate terminology. Or automatically restart the test? Or, maybe we can solve the unit test failure by specifying a random seed - that might be a neat solution. N.B. Refiling under unit tests. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Nov 11 02:30:46 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 10 Nov 2008 21:30:46 -0500 Subject: [Biopython-dev] [Bug 2651] Error from test_GAQueens.py In-Reply-To: Message-ID: <200811110230.mAB2Ukq2020297@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2651 ------- Comment #2 from bsouthey at gmail.com 2008-11-10 21:30 EST ------- (In reply to comment #1) > What bug in Seq? Trying to call the count method with an integer argument > instead of string or another Seq should fail - try it on a string for > comparison: > > >>> "123456".count(1) > Traceback (most recent call last): > File "", line 1, in ? > TypeError: expected a character buffer object > > I would agree that the TypeError message could be better, "object of type 'int' > has no len()" is a little misleading. Are you suggesting that be changed? That is an 'obvious' bug (in light of the error) because there is no check for that 'sub' is a string. 
Using the example from the docstring: my_mseq = MutableSeq("AAAATGA") my_mseq.count(1) Traceback (most recent call last): File "", line 1, in File "/usr/lib64/python2.5/site-packages/Bio/Seq.py", line 722, in count if len(search) == 1 : TypeError: object of type 'int' has no len() Note that using a dict or list work but perhaps these should not. I think you need to check that 'search' is a string (isinstance(search,basestring)). If not, then fail with some more informative message. > > Genetic algorithms (with a random seed at least) are non deterministic - I've > seen some of the GA unit tests fail every so often (but I'm not sure off hand > if its just test_GAQueens or not). Rerunning the test will usually be fine. > The traceback looks familiar so its probably the same issue, but I haven't had > the time or desire to trace through the code to try and work out what is going > wrong. I would guess it fails far less than 10% of time, but maybe 1% or 2%. > I guess a quick shell script would answer this ;) > > Maybe we should catch the error condition and issue a runtime error saying > "Didn't converge" or whatever would be appropriate terminology. Or > automatically restart the test? Or, maybe we can solve the unit test failure > by specifying a random seed - that might be a neat solution. > > N.B. Refiling under unit tests. > I agree with doing one or more of these at least until the source is identified (hopefully a known case). But I do agree that this is not easy to find and I do not know anything to help. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
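The type check suggested in comment #2 can be sketched as follows. This is only an illustration, not the code that went into CVS: TinySeq is a hypothetical stand-in for the Bio.Seq classes, it accepts plain strings only (the real count method is also expected to accept another Seq), and it is written for Python 3, where str takes the place that basestring had in Python 2.5.

```python
class TinySeq:
    """Hypothetical stand-in for a sequence class (not Bio.Seq itself)."""

    def __init__(self, data):
        self.data = data

    def count(self, sub):
        # Reject non-string arguments up front, so the caller sees a clear
        # message instead of "object of type 'int' has no len()".
        if not isinstance(sub, str):
            raise TypeError(
                "count() expects a string, got %s" % type(sub).__name__
            )
        # Delegate to str.count, which counts non-overlapping occurrences.
        return self.data.count(sub)
```

With this check, TinySeq("AAAATGA").count(1) fails immediately with a message that names the offending type, and a list or dict argument is rejected in the same way.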
From bugzilla-daemon at portal.open-bio.org Tue Nov 11 10:10:45 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 11 Nov 2008 05:10:45 -0500 Subject: [Biopython-dev] [Bug 2651] Error from test_GAQueens.py In-Reply-To: Message-ID: <200811111010.mABAAjQq029851@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2651 ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-11 05:10 EST ------- (In reply to comment #2) >(In reply to comment #1) >> What bug in Seq? Trying to call the count method with an integer argument >> instead of string or another Seq should fail - try it on a string for >> comparison: >> >> >>> "123456".count(1) >> Traceback (most recent call last): >> File "", line 1, in ? >> TypeError: expected a character buffer object >> >> I would agree that the TypeError message could be better, "object of type >> 'int' has no len()" is a little misleading. Are you suggesting that be >> changed? > > That is an 'obvious' bug (in light of the error) because there is no check for > that 'sub' is a string. Using the example from the docstring: > my_mseq = MutableSeq("AAAATGA") > my_mseq.count(1) > Traceback (most recent call last): > File "", line 1, in > File "/usr/lib64/python2.5/site-packages/Bio/Seq.py", line 722, in count > if len(search) == 1 : > TypeError: object of type 'int' has no len() > > Note that using a dict or list work but perhaps these should not. I think you > need to check that 'search' is a string (isinstance(search,basestring)). If > not, then fail with some more informative message. That's done in CVS. Leaving this bug open to cover the test_GAQueens.py issue. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
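Of the options raised in comment #1 for the intermittent test_GAQueens.py failure, fixing the random seed is the easiest to illustrate. The sketch below uses only the standard library random module; shuffled_genome is a hypothetical helper, and nothing here reflects how Bio.GA actually draws its random numbers.

```python
import random


def shuffled_genome(seed, size=8):
    """Return a permutation of range(size) drawn from a seeded generator."""
    # A private generator keeps the global random state untouched.
    rng = random.Random(seed)
    genome = list(range(size))
    rng.shuffle(genome)
    return genome
```

Calling shuffled_genome twice with the same seed gives the same permutation, so a failure seen once could be replayed with the seed that triggered it rather than waiting for it to recur by chance.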
From bugzilla-daemon at portal.open-bio.org Tue Nov 11 11:30:16 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 11 Nov 2008 06:30:16 -0500 Subject: [Biopython-dev] [Bug 2652] New: Bio.Fasta.Iterator fails with IndexError when opening empty fasta files Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2652 Summary: Bio.Fasta.Iterator fails with IndexError when opening empty fasta files Product: Biopython Version: Not Applicable Platform: PC OS/Version: Linux Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: rjalves at igc.gulbenkian.pt Instead of IndexError a better error handling or at least a more explicit error message. At the first look it's not obvious what is causing the error. Example: In [1]: from Bio import Fasta In [2]: Fasta.Iterator(open("empty.fasta")) --------------------------------------------------------------------------- IndexError Traceback (most recent call last) /var/lib/python-support/python2.5/Bio/Fasta/__init__.pyc in __init__(self, handle, parser, debug) 65 while True : 66 line = handle.readline() ---> 67 if line[0] == ">" : 68 break 69 if debug : print "Skipping: " + line IndexError: string index out of range -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
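The IndexError in this report follows from readline() returning an empty string at end of file, so indexing line[0] on an empty file fails. A minimal sketch of a guarded header scan, using only the standard library: skip_to_first_record is a hypothetical helper written for this illustration, not part of Bio.Fasta.

```python
import io


def skip_to_first_record(handle):
    """Return the first line starting with '>', or '' if the file has none."""
    while True:
        line = handle.readline()
        # readline() returns '' at end of file; test for that before
        # indexing, otherwise line[0] raises IndexError on an empty file.
        if not line or line[0] == ">":
            return line


empty = skip_to_first_record(io.StringIO(""))
header = skip_to_first_record(io.StringIO("comment\n>seq1\nACGT\n"))
```

Here empty is the empty string and header is the ">seq1" line, so the caller can decide what to report for a file with no records.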
From bugzilla-daemon at portal.open-bio.org Tue Nov 11 11:30:45 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 11 Nov 2008 06:30:45 -0500 Subject: [Biopython-dev] [Bug 2652] Bio.Fasta.Iterator fails with IndexError when opening empty fasta files In-Reply-To: Message-ID: <200811111130.mABBUjf8003203@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2652 rjalves at igc.gulbenkian.pt changed: What |Removed |Added ---------------------------------------------------------------------------- Version|Not Applicable |1.45 -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Nov 11 11:55:07 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 11 Nov 2008 06:55:07 -0500 Subject: [Biopython-dev] [Bug 2652] Bio.Fasta.Iterator fails with IndexError when opening empty fasta files In-Reply-To: Message-ID: <200811111155.mABBt7Hf005132@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2652 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-11 06:55 EST ------- Hi Renato, This bug in Bio.Fasta with empty files was fixed in Biopython 1.49b, see Bio/Fasta/__init__.py revision 1.19. http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Fasta/__init__.py?cvsroot=biopython#rev1.19 I would encourage you to try Biopython 1.49b, but if you have a reason for running an old version like Biopython 1.45, you could probably update just this one file instead. 
Ask if you would like specific instructions, but essentially its a one line change, from: if line[0] == ">" : to: if not line or line[0] == ">" : Please note that Bio.Fasta is considered to be obsolete (and was explicitly documented as such as of Biopython 1.48), and may one day be deprecated. However, given this was the main FASTA parsing code in Biopython for some years, we're not going to deprecate it just yet, so you should be OK continuing to use Bio.Fasta in old scripts for a while yet. For new code, we encourage people to use Bio.SeqIO instead, described in the current tutorial and on the wiki: http://biopython.org/DIST/docs/tutorial/Tutorial.html http://biopython.org/DIST/docs/tutorial/Tutorial.pdf http://biopython.org/wiki/SeqIO Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Nov 11 12:08:37 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 11 Nov 2008 07:08:37 -0500 Subject: [Biopython-dev] [Bug 2649] Bio.KDTree expects numpy array with dtype="float32" on 64 bit machines. In-Reply-To: Message-ID: <200811111208.mABC8bHw006251@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2649 ------- Comment #1 from mdehoon at ims.u-tokyo.ac.jp 2008-11-11 07:08 EST ------- I've uploaded a fixed version to CVS; see KDTree.py and KDTreemodule.c at http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/KDTree/?cvsroot=biopython Could you try with these files and see if they work for you? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From biopython at maubp.freeserve.co.uk Tue Nov 11 13:02:18 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 11 Nov 2008 13:02:18 +0000 Subject: [Biopython-dev] [BioPython] Cannot __add__ two DBSeq objects In-Reply-To: <7265d4f0811110439h6c18e111te97d23070565cca2@mail.gmail.com> References: <7265d4f0811110439h6c18e111te97d23070565cca2@mail.gmail.com> Message-ID: <320fb6e00811110502y624cf6c1r52c316d61a1f7228@mail.gmail.com> On Tue, Nov 11, 2008 at 12:39 PM, Cymon Cox wrote: > Hi All, > > Two DBSeq objects cannot be concatenated, although the DBSeq object inherits > __add__ from Seq. Interesting point - not something I'd considered (nor anyone else until now!) > It tries to init a new DBSeq object rather than returning a Seq object as would be expected. > ... > Presumably, DBSeq needs to overide Seq.__add__ > (Using CVS as of yesterday...) Clearly we can't create a new DBSeq object (there wouldn't be any suitable sequence in the database to point to), and returning a Seq object is sensible. We should probably continue this discussion on the dev mailing list (CC'd). Either we have the DBSeq override the __add__ method (and __radd__), or we could make the base Seq class always use new Seq objects in __add__ etc. This would affect anyone writing their own Seq subclass... On balance, I think you're right and its DBSeq which needs to be changed. Would you like to tackle this, or should I? We'd also want to extend the BioSQL unit test to cover adding DBSeq+DBSeq, DBSeq+Seq, Seq+DBSeq, DBSeq+MutableSeq, MutableSeq+DBSeq, etc. 
Peter From bugzilla-daemon at portal.open-bio.org Tue Nov 11 14:48:14 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 11 Nov 2008 09:48:14 -0500 Subject: [Biopython-dev] [Bug 2652] Bio.Fasta.Iterator fails with IndexError when opening empty fasta files In-Reply-To: Message-ID: <200811111448.mABEmEba019180@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2652 ------- Comment #2 from rjalves at igc.gulbenkian.pt 2008-11-11 09:48 EST ------- Hi Peter, I am using the Biopython package from the debian-lenny repository (which is 1.45), I guess they haven't updated in part due to the change to the Numpy. I will checkout the svn version then. As for why I'm using Bio.Fasta, I'm not using it directly. Bio.SeqUtils.CodonUsage.CodonAdaptationIndex.cai_for_gene() calls it. Renato -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Tue Nov 11 14:53:32 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 11 Nov 2008 14:53:32 +0000 Subject: [Biopython-dev] [BioPython] Cannot __add__ two DBSeq objects In-Reply-To: <320fb6e00811110502y624cf6c1r52c316d61a1f7228@mail.gmail.com> References: <7265d4f0811110439h6c18e111te97d23070565cca2@mail.gmail.com> <320fb6e00811110502y624cf6c1r52c316d61a1f7228@mail.gmail.com> Message-ID: <320fb6e00811110653u63e85bc6k572d5fa42ede8280@mail.gmail.com> On Tue, Nov 11, 2008 at 1:02 PM, Peter wrote: > On Tue, Nov 11, 2008 at 12:39 PM, Cymon Cox wrote: >> Hi All, >> >> Two DBSeq objects cannot be concatenated, although the DBSeq object inherits >> __add__ from Seq. > > Interesting point - not something I'd considered (nor anyone else until now!) > >> It tries to init a new DBSeq object rather than returning a Seq object as would be expected. >> ... 
>> Presumably, DBSeq needs to overide Seq.__add__ >> (Using CVS as of yesterday...) > > Clearly we can't create a new DBSeq object (there wouldn't be any > suitable sequence in the database to point to), and returning a Seq > object is sensible. We should probably continue this discussion on > the dev mailing list (CC'd). Fixed in CVS by implementing the __add__ and __radd__ methods in the DBSeq object, and having these simply off load the work to the Seq class. See: BioSQL/BioSeq.py revision: 1.28 Tests/test_BioSQL.py revision: 1.26 Tests/output/test_BioSQL revision: 1.2 Peter From bugzilla-daemon at portal.open-bio.org Tue Nov 11 15:28:20 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 11 Nov 2008 10:28:20 -0500 Subject: [Biopython-dev] [Bug 2652] Bio.Fasta.Iterator fails with IndexError when opening empty fasta files In-Reply-To: Message-ID: <200811111528.mABFSK8A022517@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2652 ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-11 10:28 EST ------- (In reply to comment #2) > I am using the Biopython package from the debian-lenny repository (which is > 1.45), I guess they haven't updated in part due to the change to the Numpy. I > will checkout the svn version then. Debian sid is using Biopython 1.47, I think lenny is just very conservative. If you don't mind installing NumPy and trying to install Biopython from source, then you could either try getting the latest Biopython code from CVS, or try Biopython 1.49 beta which was released just a few days ago. Ask on the mailing list if you get stuck. > As for why I'm using Bio.Fasta, I'm not using it directly. > Bio.SeqUtils.CodonUsage.CodonAdaptationIndex.cai_for_gene() calls it. Oh - thanks for that. I've just updated Bio/SeqUtils/CodonUsage.py to use Bio.SeqIO instead of Bio.Fasta (plus added a basic check of this module to our unit tests). 
Peter [Leaving this bug as resolved fixed] -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Nov 11 15:43:05 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 11 Nov 2008 10:43:05 -0500 Subject: [Biopython-dev] [Bug 2652] Bio.Fasta.Iterator fails with IndexError when opening empty fasta files In-Reply-To: Message-ID: <200811111543.mABFh5x8023530@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2652 ------- Comment #4 from rjalves at igc.gulbenkian.pt 2008-11-11 10:43 EST ------- Thanks Biopython 1.49b installed without any problems -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Nov 11 15:43:15 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 11 Nov 2008 10:43:15 -0500 Subject: [Biopython-dev] [Bug 2652] Bio.Fasta.Iterator fails with IndexError when opening empty fasta files In-Reply-To: Message-ID: <200811111543.mABFhFBp023551@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2652 rjalves at igc.gulbenkian.pt changed: What |Removed |Added ---------------------------------------------------------------------------- Status|RESOLVED |CLOSED -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
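For the DBSeq concatenation problem discussed earlier in this digest, the general pattern can be sketched without BioSQL. Both classes below are hypothetical stand-ins (BaseSeq for Seq, LazySeq for DBSeq), and the sketch only assumes what the thread states: the base __add__ builds its result with the class of the left operand, and the subclass constructor needs arguments that a sum cannot supply.

```python
class BaseSeq:
    """Hypothetical stand-in for a plain in-memory sequence class."""

    def __init__(self, data):
        self.data = data

    def __str__(self):
        return self.data

    def __add__(self, other):
        # Builds the result with the class of the left operand, which is
        # what breaks for subclasses with a different constructor.
        return self.__class__(self.data + str(other))

    def __radd__(self, other):
        return self.__class__(str(other) + self.data)


class LazySeq(BaseSeq):
    """Hypothetical subclass tied to an external record (like a database row)."""

    def __init__(self, data, primary_id):
        BaseSeq.__init__(self, data)
        self.primary_id = primary_id

    def __add__(self, other):
        # Hand the work to the base class so the result is a plain BaseSeq.
        return BaseSeq(self.data) + other

    def __radd__(self, other):
        return other + BaseSeq(self.data)
```

Without the two overrides, LazySeq("ACGT", 1) + LazySeq("TTAA", 2) would call LazySeq with a single argument and raise a TypeError; with them, every combination of the two classes and plain strings returns a BaseSeq.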
From bugzilla-daemon at portal.open-bio.org Tue Nov 11 15:46:13 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 11 Nov 2008 10:46:13 -0500 Subject: [Biopython-dev] [Bug 2653] New: Bio.SeqUtils.CodonUsage is not translation table aware Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2653 Summary: Bio.SeqUtils.CodonUsage is not translation table aware Product: Biopython Version: Not Applicable Platform: All OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk Looking at Bio/SeqUtils/CodonUsage.py there is a hard coded dictionary SynonymousCodons, presumably for the standard genetic code. Ideally Bio.SeqUtils.CodonUsage should support any of the genetic code tables defined in Bio.Data.CodonTable, perhaps via an optional initiation argument to the CodonAdaptationIndex object. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Nov 11 18:09:20 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 11 Nov 2008 13:09:20 -0500 Subject: [Biopython-dev] [Bug 2653] Bio.SeqUtils.CodonUsage is not translation table aware In-Reply-To: Message-ID: <200811111809.mABI9KXq004974@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2653 rjalves at igc.gulbenkian.pt changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |rjalves at igc.gulbenkian.pt ------- Comment #1 from rjalves at igc.gulbenkian.pt 2008-11-11 13:09 EST ------- Thanks for the heads up Peter. Also related to the reference codon table used... 
There is the possibility of a codon being completely absent in all given sequences. In this case the CodonAdaptationIndex.generate_index() function fails with a ZeroDivisionError on line 90. The resource at http://phenotype.biosci.umbc.edu/index.php?page=What_is_CAI might give some good indications on how to work around this and also other (improved?) implementations of CAI. Obviously if you use a different SynonymousCodons table the picture may change. Renato. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Nov 12 11:14:27 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 12 Nov 2008 06:14:27 -0500 Subject: [Biopython-dev] [Bug 2640] Proposal: doctest for SeqRecord/biopython In-Reply-To: Message-ID: <200811121114.mACBER3k002184@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2640 ------- Comment #15 from dalloliogm at gmail.com 2008-11-12 06:14 EST ------- (In reply to comment #13) > (From update of attachment 1033 [details]) > Something similar was checked into CVS. > I saw the changes now! ok.. But I would prefer to put the doctest in the main __doc__ of the function instead of __init__ and __repr__. This is because otherwise they wouldn't be accessible by the users with the help function. Usually you do help(SeqRecord), not help(SeqRecord.__init__). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Wed Nov 12 11:47:25 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 12 Nov 2008 06:47:25 -0500 Subject: [Biopython-dev] [Bug 2640] Proposal: doctest for SeqRecord/biopython In-Reply-To: Message-ID: <200811121147.mACBlP4T005886@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2640 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|RESOLVED |REOPENED Resolution|FIXED | ------- Comment #16 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-12 06:47 EST ------- (In reply to comment #15) > I saw the changes now! The CVS website is updated once an hour, you track this on http://biopython.org/wiki/Tracking_CVS_commits which displays the RSS feed, http://biopython.open-bio.org/CVS2RSS/biopython.rss (this works great apart from the links when more than one file is changed). > ok.. But I would prefer to put the doctest in the main __doc__ of > the function instead of __init__ and __repr__. > This is because otherwise they wouldn't be accessible by the users with the > help function. Usually you do help(SeqRecord), not help(SeqRecord.__init__). If you do help(object) it shows you the main docstring followed by all the methods and their docstrings (including __init__). On the other hand all the special methods like __init__, __str__, __repr__ etc are going to be confusing for a beginner. On balance, a short example in the main docstring (covering __init__) does seem sensible, and perhaps the __init__ example is then redundant. Does anyone else want to comment? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
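The suggestion in comments #15 and #16, a short example in the class-level docstring that exercises both __init__ and __repr__, might look like the sketch below. Record is a hypothetical stand-in rather than SeqRecord, and run_class_doctests is a helper written for this illustration with the standard doctest module.

```python
import doctest


class Record:
    """Hypothetical stand-in for a record class.

    The example covers both construction and the repr:

    >>> rec = Record("ACGT", id="demo")
    >>> rec
    Record(seq='ACGT', id='demo')
    """

    def __init__(self, seq, id="<unknown id>"):
        self.seq = seq
        self.id = id

    def __repr__(self):
        return "Record(seq=%r, id=%r)" % (self.seq, self.id)


def run_class_doctests(cls):
    """Run only the examples in the class docstring; return (failed, attempted)."""
    parser = doctest.DocTestParser()
    # The class itself is the only name the examples need in their namespace.
    test = parser.get_doctest(cls.__doc__, {cls.__name__: cls}, cls.__name__, None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    results = runner.run(test)
    return results.failed, results.attempted
```

An example placed here is the first thing help(Record) displays, and one docstring tests both special methods without repeating the import and set-up lines in each of them.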
From cymon.cox at googlemail.com Wed Nov 12 10:57:12 2008 From: cymon.cox at googlemail.com (Cymon Cox) Date: Wed, 12 Nov 2008 10:57:12 +0000 Subject: [Biopython-dev] BioSQL buglets Message-ID: <7265d4f0811120257y241f67fl514b77cb03712552@mail.gmail.com> All, Selects on the seqfeature_qualifier_value and dbxref tables were not being ordered by rank. This caused multiple qualifier values to be out of order which in turn caused the tests to fail - see comment in http://bugzilla.open-bio.org/show_bug.cgi?id=2616 This also solves a TODO in the test_BioSQL_SeqIO.py: 85 +#TODO - Pin down the "Duplicate entry" IntegrityError from this: 86 +# ("genbank",False, 'GenBank/cor6_6.gb', 6), This test now works and Ive generated new output. In test_BioSQL.py create_database(), postgres returns an error string that 'find's on index 0 when the the database doesnt exist. The comparision therefore needs to be >= 0 rather than >0. All tests now pass OK with postgresql/psycopg2. Patch attached. Cheers, C. -- -------------- next part -------------- A non-text attachment was scrubbed... Name: biosql.patch Type: text/x-patch Size: 5105 bytes Desc: not available URL: From bugzilla-daemon at portal.open-bio.org Wed Nov 12 13:12:24 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 12 Nov 2008 08:12:24 -0500 Subject: [Biopython-dev] [Bug 2616] BioSQL support for Psycopg2 In-Reply-To: Message-ID: <200811121312.mACDCOdj011669@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2616 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #11 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-12 08:12 EST ------- (In reply to comment #10) > > We still need to sort out the feature qualifiers loss of ordering... 
> Fixed in CVS with another patch from Cymon (via the mailing list). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Wed Nov 12 13:13:16 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 12 Nov 2008 13:13:16 +0000 Subject: [Biopython-dev] BioSQL buglets In-Reply-To: <7265d4f0811120257y241f67fl514b77cb03712552@mail.gmail.com> References: <7265d4f0811120257y241f67fl514b77cb03712552@mail.gmail.com> Message-ID: <320fb6e00811120513p3be878b8pe0c5a48fa3945ff5@mail.gmail.com> On Wed, Nov 12, 2008 at 10:57 AM, Cymon Cox wrote: > All, > > Selects on the seqfeature_qualifier_value and dbxref tables were not being > ordered by rank. This caused multiple qualifier values to be out of order, > which in turn caused the tests to fail - see comment in > http://bugzilla.open-bio.org/show_bug.cgi?id=2616 > > This also solves a TODO in test_BioSQL_SeqIO.py: > > 85 +#TODO - Pin down the "Duplicate entry" IntegrityError from this: > 86 +# ("genbank",False, 'GenBank/cor6_6.gb', 6), > > This test now works and I've generated new output. > > In test_BioSQL.py create_database(), postgres returns an error string for > which find() returns index 0 when the database doesn't exist. The comparison > therefore needs to be >= 0 rather than > 0. > > All tests now pass OK with postgresql/psycopg2. > Patch attached. > > Cheers, C. Excellent - that patch made perfect sense and I've checked it in (almost as is - I tweaked the find index bit slightly). Thank you! At this rate you'll be co-opted as an official maintainer for the BioSQL module ;) Peter P.S. It might have been better to upload the patch to Bug 2616 (or a new Bug) rather than sending it to everyone on the mailing list. 
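The find() index issue from the BioSQL buglets message is easy to reproduce in isolation. The sketch below is not the code in test_BioSQL.py; the function names and the error text are made up for illustration. It only shows why a `> 0` test misses a match that starts at the very beginning of the string.

```python
def error_mentions(message, needle):
    """Return True if needle occurs anywhere in message, including index 0."""
    # str.find returns the index of the first match, or -1 if there is none,
    # so a match at the very start of the string yields 0.
    return message.find(needle) >= 0


def buggy_error_mentions(message, needle):
    """The off-by-one variant: a match at index 0 is treated as 'not found'."""
    return message.find(needle) > 0


# Hypothetical error text, chosen so that the match starts at index 0.
message = 'database "biosql_test" does not exist'
```

With this message, `error_mentions(message, "database")` is true while `buggy_error_mentions(message, "database")` is false. Where only a yes/no answer is needed, `needle in message` says the same thing without any index comparison.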
From bugzilla-daemon at portal.open-bio.org Wed Nov 12 15:35:54 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 12 Nov 2008 10:35:54 -0500 Subject: [Biopython-dev] [Bug 2640] Proposal: doctest for SeqRecord/biopython In-Reply-To: Message-ID: <200811121535.mACFZsMl021458@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2640 ------- Comment #17 from dalloliogm at gmail.com 2008-11-12 10:35 EST ------- (In reply to comment #16) > (In reply to comment #15) > > I saw the changes now! > > The CVS website is updated once an hour, you track this on > http://biopython.org/wiki/Tracking_CVS_commits which displays the RSS feed, > http://biopython.open-bio.org/CVS2RSS/biopython.rss (this works great apart > from the links when more than one file is changed). > > > ok.. But I would prefer to put the doctest in the main __doc__ of > > the function instead of __init__ and __repr__. > > This is because otherwise they wouldn't be accessible by the users with the > > help function. Usually you do help(SeqRecord), not help(SeqRecord.__init__). > > If you do help(object) it shows you the main docstring followed by all the > methods and their docstrings (including __init__). > > On the other hand all the special methods like __init__, __str__, __repr__ etc > are going to be confusing for a beginner. > > On balance, a short example in the main docstring (covering __init__) does seem > sensible, and perhaps the __init__ example is then redundant. Well, I was saying that maybe it would be better to move the doctests in __init__ and __repr__ to the main __doc__ of the module, so they will be visible to people using help(module). Moreover, you can test __repr__ and __init__ from there, without having to repeat the 'from Bio.Align.Generic import Alignment' import and similar lines every time. 
as for a few comments you added in Bio.Align.Generic: > #A doctest for __repr__ would be nice, but __class__ comes out differently > #if run via the __main__ trick. maybe you can use the '+ELLIPSIS' directive and about this comment: #A doctest would be nice, but the stuff is very ugly! #The "tab" format is possible, but tabs don't seem to work nicely in doctests. you could use the directive NORMALIZE_WHITESPACE in a similar way. I am attaching a file just to give you an example of how it could be with +ELLIPSIS > Does anyone else want to comment? > -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Nov 12 15:36:37 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 12 Nov 2008 10:36:37 -0500 Subject: [Biopython-dev] [Bug 2640] Proposal: doctest for SeqRecord/biopython In-Reply-To: Message-ID: <200811121536.mACFabdk021517@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2640 ------- Comment #18 from dalloliogm at gmail.com 2008-11-12 10:36 EST ------- Created an attachment (id=1052) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1052&action=view) example of ellipsis directive Example of doctest with ellipsis directive to test Alignment.__repr__ -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From dalloliogm at gmail.com Wed Nov 12 16:25:47 2008 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Wed, 12 Nov 2008 17:25:47 +0100 Subject: [Biopython-dev] a sequence set object in biopython? Message-ID: <5aa3b3570811120825y6ed11c00y384751e8f0f7adff@mail.gmail.com> Hi, I think it could be useful to add a generic SequenceSet object in biopython. 
Such an object would represent a generic set of sequences, and could have some useful methods like .format('fasta') or .align('alignment_tool'). Is there something similar available already? I have noticed that the current Generic.Alignment is very similar to such an object. However, it would be better to be able to work with a separate class, because sometimes you want to deal with sequences that are not aligned. Some use cases: - a set of sequences that represents all introns in a particular gene, on which I want to calculate the conservation of the splicing regulatory sites. - all gene sequences in an organism, which I want to convert to EMBL format - a set of seqs to be aligned or used as input for other tools etc. -- ----------------------------------------------------------- My Blog on Bioinformatics (italian): http://bioinfoblog.it From bugzilla-daemon at portal.open-bio.org Wed Nov 12 16:29:07 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 12 Nov 2008 11:29:07 -0500 Subject: [Biopython-dev] [Bug 2552] Adding alignments In-Reply-To: Message-ID: <200811121629.mACGT7gs025634@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2552 cymon.cox at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |cymon.cox at gmail.com ------- Comment #1 from cymon.cox at gmail.com 2008-11-12 11:29 EST ------- (In reply to comment #0) > This is related to the very broad alignment bug 1944. > > Given two alignments, it can make sense to talk about adding them together. Actually, this is a very common procedure in phylogenetic analyses, where multiple genes/loci are combined into a "super" matrix for a set of taxa. In this case the alignments are added by column, and if a taxon/row/identifier is missing from a particular (sub-)alignment, it is filled with "-" (missing data) in the combined matrix. 
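As a rough illustration of the column-wise "super matrix" idea in this comment, here is a minimal sketch that uses plain dictionaries rather than Biopython alignment objects. The function name `concatenate_alignments`, the dict-of-strings representation, and the taxon names are assumptions made for the example; nothing like this is being quoted from Biopython.

```python
def concatenate_alignments(alignments, missing="-"):
    """Join alignments column-wise into one matrix keyed by taxon id.

    Each alignment is a dict {taxon_id: aligned_sequence}; all sequences
    within one alignment must share the same length.
    """
    # Collect every taxon seen in any alignment, keeping first-seen order.
    taxa = []
    for alignment in alignments:
        for taxon in alignment:
            if taxon not in taxa:
                taxa.append(taxon)

    combined = {taxon: "" for taxon in taxa}
    for alignment in alignments:
        lengths = set(len(seq) for seq in alignment.values())
        if len(lengths) > 1:
            raise ValueError("sequences in an alignment must have equal length")
        width = lengths.pop() if lengths else 0
        for taxon in taxa:
            # A taxon absent from this alignment gets a block of missing data.
            combined[taxon] += alignment.get(taxon, missing * width)
    return combined


gene1 = {"human": "ACGT", "mouse": "AC-T"}
gene2 = {"human": "GG", "chimp": "GA"}
supermatrix = concatenate_alignments([gene1, gene2])
```

Here "mouse" has no entry for the second gene and "chimp" none for the first, so each is padded with "-" over the columns of the block it lacks.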
Anyway, I think this would be a very useful enhancement. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Wed Nov 12 17:53:35 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 12 Nov 2008 17:53:35 +0000 Subject: [Biopython-dev] [BioPython] a sequence set object in biopython? In-Reply-To: <5aa3b3570811120825y6ed11c00y384751e8f0f7adff@mail.gmail.com> References: <5aa3b3570811120825y6ed11c00y384751e8f0f7adff@mail.gmail.com> Message-ID: <320fb6e00811120953t57c206e7nd0c8151b92361d5a@mail.gmail.com> On Wed, Nov 12, 2008 at 4:25 PM, Giovanni Marco Dall'Olio wrote: > Hi, > I think it could be useful to add a generic SequenceSet object in biopython. > Such an object would represent a generic set of sequences, and could > have some useful methods like .format('fasta') or > .align('alignment_tool'). > Is there something similar available already? Given your example to turn the SequenceSet into a FASTA file, then clearly you are thinking of a collection of SeqRecord objects rather than just Seq objects. For this kind of thing I personally just use a list of SeqRecord objects. If I want to turn a list of SeqRecord objects into a FASTA file, I can pass the list to the Bio.SeqIO.write() function. Once I've made a FASTA file, I can call an external tool to align them - and then load them in again using Bio.AlignIO or Bio.SeqIO depending on what I plan to do next. > I have noticed that the actual Generic.Alignment is very similar to > such an object. However, it would be better to be able to work with a > separated class, because sometimes you want to deal with sequences > that are not aligned. Yes, the generic alignment is basically a list of SeqRecord objects plus some extra functionality like column access. 
> Some use cases: > - a set of sequences that represents all introns in a particular gene, > on which I want to calculate the conservation of the splicing > regulatory sites. > - all gene sequences in an organism, which I want to convert to EMBL format > - a set of seqs to be aligned or used as input for other tools > etc. All sensible use cases - but all seem to be covered by a simple python list of SeqRecord objects, or in some cases a list of Seq objects (e.g. the introns example, as I doubt the introns have names). Peter From tiagoantao at gmail.com Wed Nov 12 18:02:11 2008 From: tiagoantao at gmail.com (Tiago Antão) Date: Wed, 12 Nov 2008 18:02:11 +0000 Subject: [Biopython-dev] PopGen status and new developments Message-ID: <6d941f120811121002k75c8ab43g54ebeb968342648b@mail.gmail.com> Hi, This is an email with the status of current PopGen developments. In some points, advice is especially welcome. A. Platform support As Peter noticed there is no Simcoal for the Mac. In a couple of weeks I hope to have access to a Mac in order to try to compile it. In any case I won't be able to distribute it without getting permission from the authors, so the problem might remain... I am now preparing support for LDNe, an application to estimate Ne (effective population size) from LD. This application is DOS (Windows) only. Source code is not available to the public (but the app is free as in free beer). I've had access to the source and compiled a Linux version; again, I don't know if the author will let me distribute it. Question: How do people feel about supporting an application like this? Any strong feelings against? B. New developments 1. The above LDNe module is fully coded, and being tested by a few people (not just me). Test code and documentation TBD but easy. 2. Genepop application support (not to be confused with file format support, which is done). Partially done and informally tested. Plan to start with just partial support. 3. Fstat parser. Coded. C. 
Statistics An ongoing interesting discussion started on statistics. I am delayed with doing a proposal to handle statistical processing (my bad, but I will have some free time in the next couple of weeks and I will try to recover). My current existing code on the subject is available on Github (by Giovanni), but I think it will need some change (not in the functionality, but in the architecture). From biopython at maubp.freeserve.co.uk Wed Nov 12 18:06:19 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 12 Nov 2008 18:06:19 +0000 Subject: [Biopython-dev] [BioPython] a sequence set object in biopython? In-Reply-To: <320fb6e00811120953t57c206e7nd0c8151b92361d5a@mail.gmail.com> References: <5aa3b3570811120825y6ed11c00y384751e8f0f7adff@mail.gmail.com> <320fb6e00811120953t57c206e7nd0c8151b92361d5a@mail.gmail.com> Message-ID: <320fb6e00811121006mbe32efar2fca638d1a5fe2ef@mail.gmail.com> On Wed, Nov 12, 2008 at 5:53 PM, Peter wrote: > On Wed, Nov 12, 2008 at 4:25 PM, Giovanni Marco Dall'Olio > wrote: >> Hi, >> I think it could be useful to add a generic SequenceSet object in biopython. >> Such an object would represent a generic set of sequences, and could >> have some useful methods like .format('fasta') or >> .align('alignment_tool'). >> Is there something similar available already? > > Given your example to turn the SequenceSet into a FASTA file, then > clearly you are thinking of a collection of SeqRecord objects rather > than just Seq objects. For this kind of thing I personally just use a > list of SeqRecord objects. > > If I want to turn a list of SeqRecord objects into a FASTA file, I can > pass the list to the Bio.SeqIO.write() function. Once I've made a > FASTA file, I can call an external tool to align them - and then load > them in again using Bio.AlignIO or Bio.SeqIO depending on what I plan > to do next. 
If you really want a list-like object with a format method in your code, how about something like this:

class SeqRecordList(list):
    """Subclass of the python list, to hold SeqRecord objects only."""
    # TODO - Override the list methods to make sure all the items
    # are indeed SeqRecord objects

    def format(self, format):
        """Returns a string of all the records in a requested file format.

        The argument format should be any file format supported by
        the Bio.SeqIO.write() function. This must be a lower case string.
        """
        from Bio import SeqIO
        try:
            from StringIO import StringIO  # Python 2
        except ImportError:
            from io import StringIO  # Python 3
        # Write into an in-memory handle and hand back its contents.
        handle = StringIO()
        SeqIO.write(self, handle, format)
        return handle.getvalue()


if __name__ == "__main__":
    print("Loading records...")
    from Bio import SeqIO
    my_list = SeqRecordList(SeqIO.parse(open("ls_orchid.gbk"), "genbank"))
    print(len(my_list))
    for format in ["fasta", "tab"]:
        print("")
        print(format)
        print("=" * len(format))
        print(my_list.format(format))

Peter From biopython at maubp.freeserve.co.uk Wed Nov 12 18:11:30 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 12 Nov 2008 18:11:30 +0000 Subject: [Biopython-dev] PopGen status and new developments In-Reply-To: <6d941f120811121002k75c8ab43g54ebeb968342648b@mail.gmail.com> References: <6d941f120811121002k75c8ab43g54ebeb968342648b@mail.gmail.com> Message-ID: <320fb6e00811121011q26665967tce65a0e125b3e032@mail.gmail.com> Tiago Antão wrote: > A. Platform support > > As Peter noticed there is no Simcoal for the Mac. In a couple of weeks > I hope to have access to a Mac in order to try to compile it. In any > case I wont be able to distribute it without getting permission from > the authors, so the problem might remain... > I am now preparing support for LDNe, an application to estimate Ne > (effective population size) from LD. This application is Dos(Windows) > only. Source code is not available to the public (but the app is free > as free beer). 
I've had access to the source and compiled a Linux > version, again, I don't know if the author will let me distribute it. > Question: How do people feel about supporting an application like > this? Any strong feelings against? Assuming the tools are useful, then I have no objection to including command line wrappers for them in Biopython. I'm not 100% sure what you meant by "supporting an application like this", but if you are asking about supporting these cross-platform ports of the actual command line tools, then I don't see that as something Biopython should be doing. Peter From tiagoantao at gmail.com Wed Nov 12 18:16:06 2008 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Wed, 12 Nov 2008 18:16:06 +0000 Subject: [Biopython-dev] PopGen status and new developments In-Reply-To: <320fb6e00811121011q26665967tce65a0e125b3e032@mail.gmail.com> References: <6d941f120811121002k75c8ab43g54ebeb968342648b@mail.gmail.com> <320fb6e00811121011q26665967tce65a0e125b3e032@mail.gmail.com> Message-ID: <6d941f120811121016q17451c83u12b2233eba625944@mail.gmail.com> On Wed, Nov 12, 2008 at 6:11 PM, Peter wrote: > I'm not 100% sure what you meant by "supporting an application like > this", but if you are asking about supporting these cross-platform > ports of the actual command line tools, then I don't see that as > something Biopython should be doing. Sorry, I was not clear: I was just asking about supporting applications that dont have the source available and that don't support all common platforms (the case of LDNe). From dalloliogm at gmail.com Wed Nov 12 18:17:48 2008 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Wed, 12 Nov 2008 19:17:48 +0100 Subject: [Biopython-dev] [BioPython] a sequence set object in biopython? 
In-Reply-To: <320fb6e00811120953t57c206e7nd0c8151b92361d5a@mail.gmail.com> References: <5aa3b3570811120825y6ed11c00y384751e8f0f7adff@mail.gmail.com> <320fb6e00811120953t57c206e7nd0c8151b92361d5a@mail.gmail.com> Message-ID: <5aa3b3570811121017u72eb7552v94275368cb23cf48@mail.gmail.com> On Wed, Nov 12, 2008 at 6:53 PM, Peter wrote: > On Wed, Nov 12, 2008 at 4:25 PM, Giovanni Marco Dall'Olio > wrote: >> Hi, >> I think it could be useful to add a generic SequenceSet object in biopython. >> Such an object would represent a generic set of sequences, and could >> have some useful methods like .format('fasta') or >> .align('alignment_tool'). >> Is there something similar available already? > > Given your example to turn the SequenceSet into a FASTA file, then > clearly you are thinking of a collection of SeqRecord objects rather > than just Seq objects. For this kind of thing I personally just use a > list of SeqRecord objects. > > If I want to turn a list of SeqRecord objects into a FASTA file, I can > pass the list to the Bio.SeqIO.write() function. Once I've made a > FASTA file, I can call an external tool to align them - and then load > them in again using Bio.AlignIO or Bio.SeqIO depending on what I plan > to do next. > >> Some use cases: >> - a set of sequences that represents all introns in a particular gene, >> on which I want to calculate the conservation of the splicing >> regulatory sites. >> - all genes sequences in an organisms, which I want to convert in EMBL format >> - a set of seqs to be aligned or used as input for other tools >> etc.. > > All sensible use cases - but all seem to be covered by a simple python > list of SeqRecord objects, or in some cases a list of Seq objects > (e.g. the introns example, as I doube the introns have names). > Not always. For example, if I have a set of genes in an organism, sometimes I would need to access to only some of them, by their id; so, a __getattribute__ method to make it work as a dictionary could also be useful. 
The fact is that I think that such an object would be so widely used that maybe it would be useful to implement it in biopython. What I would do, honestly, is to create a GenericSeqRecordSet class from which to derive Alignment, specifying that in an alignment all the sequences should have the same length. It would not require much work and it would change the interface. very tiny little minusculus p.s. if you need help to implement such a thing or anything else I can volunteer :). > Peter > -- ----------------------------------------------------------- My Blog on Bioinformatics (italian): http://bioinfoblog.it From dalloliogm at gmail.com Wed Nov 12 18:19:50 2008 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Wed, 12 Nov 2008 19:19:50 +0100 Subject: [Biopython-dev] PopGen status and new developments In-Reply-To: <6d941f120811121002k75c8ab43g54ebeb968342648b@mail.gmail.com> References: <6d941f120811121002k75c8ab43g54ebeb968342648b@mail.gmail.com> Message-ID: <5aa3b3570811121019k3a0710f1n2add599ce0b4f56a@mail.gmail.com> On Wed, Nov 12, 2008 at 7:02 PM, Tiago Antão wrote: > Hi, > > This an email with the status of current PopGen developments. In some > points, advice is especially welcome. Hi Tiago!! Have you seen this parser for fastPhaseOutput? (I thought it wasn't directly related to PopGen so I didn't tell you directly.) - http://bugzilla.open-bio.org/show_bug.cgi?id=2643 > > > A. Platform support > > As Peter noticed there is no Simcoal for the Mac. In a couple of weeks > I hope to have access to a Mac in order to try to compile it. In any > case I wont be able to distribute it without getting permission from > the authors, so the problem might remain... > I am now preparing support for LDNe, an application to estimate Ne > (effective population size) from LD. This application is Dos(Windows) > only. Source code is not available to the public (but the app is free > as free beer). 
I've had access to the source and compiled a Linux > version, again, I don't know if the author will let me distribute it. > Question: How do people feel about supporting an application like > this? Any strong feelings against? > > > B. New developments > > 1. The above LDNe module is fully coded, and being tested by a few > people (not just me). Test code and documentation TBD but easy. > 2. Genepop application support (no confusion with file format support, > which is done). Partially done and informally tested. Plan to start > with just partial support. > 3. Fstat parser. Coded. > > > C. Statistics > > An ongoing interesting discussion started on statistics. I am delayed > with doing a proposal to handle statistical processing (my bad, but I > will have some free time in the next couple of weeks and I will try to > recover). My current existing code on the subject is available on > Github (by Giovanni), but I think it will need some change (not in the > functionality, but in the architecture). > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > -- ----------------------------------------------------------- My Blog on Bioinformatics (italian): http://bioinfoblog.it From biopython at maubp.freeserve.co.uk Wed Nov 12 18:36:11 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 12 Nov 2008 18:36:11 +0000 Subject: [Biopython-dev] [BioPython] a sequence set object in biopython? 
In-Reply-To: <5aa3b3570811121017u72eb7552v94275368cb23cf48@mail.gmail.com> References: <5aa3b3570811120825y6ed11c00y384751e8f0f7adff@mail.gmail.com> <320fb6e00811120953t57c206e7nd0c8151b92361d5a@mail.gmail.com> <5aa3b3570811121017u72eb7552v94275368cb23cf48@mail.gmail.com> Message-ID: <320fb6e00811121036w17e0d2acv6723c751350f1893@mail.gmail.com> Giovanni Marco Dall'Olio wrote: >> All sensible use cases - but all seem to be covered by a simple python >> list of SeqRecord objects, or in some cases a list of Seq objects >> (e.g. the introns example, as I doubt the introns have names). > > Not always. > For example, if I have a set of genes in an organism, sometimes I > would need to access to only some of them, by their id; so, a > __getattribute__ method to make it work as a dictionary could also be > useful. OK, then use a dict of SeqRecords for this, as shown in the tutorial chapter for Bio.SeqIO and the wiki. We even have a helper function Bio.SeqIO.to_dict() to do this and check for duplicate keys. If you need an order preserving dictionary, there are examples of this on the net and there is even PEP 372 for adding this to python itself: http://www.python.org/dev/peps/pep-0372/ > The fact is that I think that such an object would be so widely used, > that maybe it would be useful to implement it in biopython. > What I would do, honestly, is to create a GenericSeqRecordSet class > from which to derive Alignment, specifying that in an alignment all > the sequences should have the same length. It would not require much > work and it would change the interface. I agree that IF we added some sort of "GenericSeqRecordSet class", it might be sensible for the alignment objects to subclass it - especially if you want it to behave like a python list primarily. Note that in python sets are not order preserving. > very tiny little minusculus p.s. if you need help for implement such a > thing or anything else I can volounteer :). 
That's good to hear :) However, we'd have to establish the need for this new object first - but so far we've only had two people's views, so it's too early to form a consensus. I don't see a strong reason for adding yet another object, when the core language provides lists, sets and dicts, which seem to be enough. Peter From jflatow at gmail.com Wed Nov 12 18:52:35 2008 From: jflatow at gmail.com (Jared Flatow) Date: Wed, 12 Nov 2008 12:52:35 -0600 Subject: [Biopython-dev] [BioPython] a sequence set object in biopython? In-Reply-To: <320fb6e00811121036w17e0d2acv6723c751350f1893@mail.gmail.com> References: <5aa3b3570811120825y6ed11c00y384751e8f0f7adff@mail.gmail.com> <320fb6e00811120953t57c206e7nd0c8151b92361d5a@mail.gmail.com> <5aa3b3570811121017u72eb7552v94275368cb23cf48@mail.gmail.com> <320fb6e00811121036w17e0d2acv6723c751350f1893@mail.gmail.com> Message-ID: On Nov 12, 2008, at 12:36 PM, Peter wrote: > However, we'd have to establish the need for this new object first - > but so far we've only had two people's view so its too early to form a > consensus. I don't see a strong reason for adding yet another object, > when the core language provides lists, sets and dict which seem to be > enough. I totally agree with you Peter, that's what the basic container types are for. If someone wants to create a subclass of these containers for a specific purpose it is simple enough to do. IMO it's kind of silly to try and make sequence-specific containers that satisfy everyone's needs. jared From bsouthey at gmail.com Wed Nov 12 18:58:05 2008 From: bsouthey at gmail.com (Bruce Southey) Date: Wed, 12 Nov 2008 12:58:05 -0600 Subject: [Biopython-dev] PopGen status and new developments In-Reply-To: <320fb6e00811121011q26665967tce65a0e125b3e032@mail.gmail.com> References: <6d941f120811121002k75c8ab43g54ebeb968342648b@mail.gmail.com> <320fb6e00811121011q26665967tce65a0e125b3e032@mail.gmail.com> Message-ID: <491B273D.9020404@gmail.com> Peter wrote: > Tiago Antão wrote: > >> A. 
Platform support >> >> As Peter noticed there is no Simcoal for the Mac. In a couple of weeks >> I hope to have access to a Mac in order to try to compile it. In any >> case I wont be able to distribute it without getting permission from >> the authors, so the problem might remain... >> I am now preparing support for LDNe, an application to estimate Ne >> (effective population size) from LD. This application is Dos(Windows) >> only. Source code is not available to the public (but the app is free >> as free beer). I've had access to the source and compiled a Linux >> version, again, I don't know if the author will let me distribute it. >> Question: How do people feel about supporting an application like >> this? Any strong feelings against? >> > > Assuming the tools are useful, then I have no objection to including > command line wrappers for them in Biopython. > > I'm not 100% sure what you meant by "supporting an application like > this", but if you are asking about supporting these cross-platform > ports of the actual command line tools, then I don't see that as > something Biopython should be doing. > > Peter > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > > Hi, I do have concerns about usefulness with regards to Biopython. How widespread is the application? What platforms is it released under (DOS only or some version of windows version like XP or Vista or Windows 7)? Is the application well supported and will it continue to be supported? Under what terms is the application 'free'? How does this integrate into your ideas for Popgen? Would it work like say clustalw where you output something from Biopython, run the application and perhaps import something back into Biopython? If the application requires major data formatting then you would have to determine if it is easier to support the application or integrate it into Biopython. 
Obviously, the latter requires a clean room implementation of the application or the essential algorithm. Also, you can only provide the specification and cannot be involved in the actual implementation. Bruce From tiagoantao at gmail.com Wed Nov 12 20:09:31 2008 From: tiagoantao at gmail.com (Tiago Antão) Date: Wed, 12 Nov 2008 20:09:31 +0000 Subject: [Biopython-dev] PopGen status and new developments In-Reply-To: <491B273D.9020404@gmail.com> References: <6d941f120811121002k75c8ab43g54ebeb968342648b@mail.gmail.com> <320fb6e00811121011q26665967tce65a0e125b3e032@mail.gmail.com> <491B273D.9020404@gmail.com> Message-ID: <6d941f120811121209n75dfb0cfh1fb4e57a98011ed0@mail.gmail.com> Hi, On Wed, Nov 12, 2008 at 6:58 PM, Bruce Southey wrote: > I do have concerns about usefulness with regards to Biopython. It is important to notice that having this application support has no big impact on deployment of biopython. The only visible thing is some tests reporting that the application doesn't exist. This is different from adding a dependency on, say, scipy. I don't think that this imposes any maintenance/installation hurdle at large. I think this is actually a non-problem at the deployment stage, at least. > How widespread is the application? The application is fairly new (genepop, on the other hand, is widely used and old). I cannot answer that question. I know of some people using it, but it is my small, biased universe. I would guess that currently the number is small. Is there a policy to only support widespread applications? > What platforms is it released under (DOS only or some version of windows > version like XP or Vista or Windows 7)? There is a DOS and a Windows frontend. I actually asked the authors for the code and they gave me access to it. I have compiled a Linux version, but I don't know if they are going to make it available. > Is the application well supported and will it continue to be supported? 
Regarding current support, I can subjectively say that the authors answer my queries rather fast. Regarding the future, I don't know. > Under what terms is the application 'free'? Much software available in this field is made available without any regard for licensing issues. This is already the case for the supported Fdist application (source available, no license). This is a problem in the field, where people make things available without much concern for licensing issues. Some people don't care that much about that, they just "make things available". So, if there is a policy to only support applications for which there is a clear license, then this one is out (and some code has to be removed from the current PopGen module, by the way). I never link the code in, I just invoke it (these are mostly wrappers), so there should be no legal issues in any case, I suspect. There is a chicken and egg problem here that needs to be fought: In population genetics there is no widespread tradition of making things open (not because people want closed solutions, but mostly because people don't think about these issues). There is also less of a coding tradition than in other areas (people want ready-made solutions; the people who code are relatively few and mostly R based). As an example: I don't know of many direct users of fdist code, but know lots of people who use applications made on top of that code. By the way, Simcoal is GPL (and there are more examples of open code in population genetics, of course). > How does this integrate into your ideas for Popgen? Very well. I have had this stated philosophy, from the beginning, of using existing applications and not reinventing the wheel. That being said, I agree that a core statistics implementation should be done (even if there are alternatives). But, mostly, for now, what is available in Bio.PopGen are intelligent wrappers. 
> Would it work like say clustalw where you output something from Biopython, > run the application and perhaps import something back into Biopython? Yep, it accepts genepop files and the output is fully parsed back. This is still not the case, by the way, with simcoal where the output is not usable (arlequin is needed to analyze the results). I need to do an arlequin parser, that would solve the problem. > If the application requires major data formatting then you would have to It doesn't require any formatting at all as the de facto standard format in the area (genepop) is supported and the results are parsed back. Tiago From dalloliogm at gmail.com Thu Nov 13 00:16:44 2008 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Thu, 13 Nov 2008 01:16:44 +0100 Subject: [Biopython-dev] [BioPython] a sequence set object in biopython? In-Reply-To: <320fb6e00811121036w17e0d2acv6723c751350f1893@mail.gmail.com> References: <5aa3b3570811120825y6ed11c00y384751e8f0f7adff@mail.gmail.com> <320fb6e00811120953t57c206e7nd0c8151b92361d5a@mail.gmail.com> <5aa3b3570811121017u72eb7552v94275368cb23cf48@mail.gmail.com> <320fb6e00811121036w17e0d2acv6723c751350f1893@mail.gmail.com> Message-ID: <5aa3b3570811121616u5f95cc8du9f0d91e4743f067f@mail.gmail.com> On Wed, Nov 12, 2008 at 7:36 PM, Peter wrote: > Giovanni Marco Dall'Olio wrote: >>> All sensible use cases - but all seem to be covered by a simple python >>> list of SeqRecord objects, or in some cases a list of Seq objects >>> (e.g. the introns example, as I doube the introns have names). >> >> Not always. >> For example, if I have a set of genes in an organism, sometimes I >> would need to access to only some of them, by their id; so, a >> __getattribute__ method to make it work as a dictionary could also be >> useful. > > OK, then use a dict of SeqRecords for this, as shown in the tutorial > chapter for Bio.SeqIO and the wiki. We even have a helper function > Bio.SeqIO.to_dict() to do this and check for duplicate keys. 
I would prefer a SeqRecordSet object with a to_dict method :)

> If you need an order preserving dictionary, there are examples of this
> on the net and there is even PEP372 for adding this to python itself:
> http://www.python.org/dev/peps/pep-0372/

>> The fact is that I think that such an object would be so widely used,
>> that maybe it would be useful to implement it in biopython.
>> What I would do, honestly, is to create a GenericSeqRecordSet class
>> from which to derive Alignment, specifying that in an alignment all
>> the sequences should have the same length. It would not require much
>> work and it would change the interface.
>
> I agree that IF we added some sort of "GenericSeqRecordSet class", it
> might be sensible for the alignment objects to subclass it -
> especially if you want it to behave like a python list primarily.

Let's see it from another point of view.
In biopython, if you want to print a set of sequences in fasta format, you have to do the following:

>>> s1 = SeqRecord(Seq('cacacac'))
>>> s2 = SeqRecord(Seq('cacacac'))
>>> seqs = s1, s2
>>> out = ''
>>> for seq in seqs:
...     # a "print seq.format('fasta')" statement won't work properly here, because of blank lines
...     out += seq.format('fasta')
>>> print out

On the other hand, printing an alignment in fasta format is a lot simpler:

>>> al = Alignment(SingleLetterAlphabet)
>>> al.add_sequence('s1', 'cacaca')
>>> al.add_sequence('s2', 'cacaca')
>>> print al.format('fasta')

I work more often with sets of sequences than with alignments. So why is it more difficult to print some unrelated sequences in a certain format than aligned sequences? I would end up using Alignment objects also for sequences that are not aligned.

I am also thinking about the many format parsers. Wouldn't it be easier to write:

>>> seqs = Bio.SeqIO.parse(filehandler, 'fasta')
>>> record_dict = seqs.to_dict()

than to invoke SeqIO twice?

> Note that in python sets are not order preserving.
>
>> very tiny little minusculus p.s.
if you need help for implement such a >> thing or anything else I can volounteer :). > > That's good to hear :) > > However, we'd have to establish the need for this new object first - > but so far we've only had two people's view so its too early to form a > consensus. I don't see a strong reason for adding yet another object, > when the core language provides lists, sets and dict which seem to be > enough. Take for example this code you wrote for me before: > class SeqRecordList(list) : > """Subclass of the python list, to hold SeqRecord objects only.""" > #TODO - Override the list methods to make sure all the items > #are indeed SeqRecord objects > > def format(self, format) : > """Returns a string of all the records in a requested file format. > > The argument format should be any file format supported by > the Bio.SeqIO.write() function. This must be a lower case string. > """ > from Bio import SeqIO > from StringIO import StringIO > handle = StringIO() > SeqIO.write(self, handle, format) > handle.seek(0) > return handle.read() It's very useful, but I don't think a python/biopython newbie would be able to write it. That's why I think it should be included. Last year, I was in another laboratory and I didn't have much experience with biopython, and I was missing such a kind of object. > Peter > Goodnight!! 
-- ----------------------------------------------------------- My Blog on Bioinformatics (italian): http://bioinfoblog.it From bugzilla-daemon at portal.open-bio.org Thu Nov 13 07:16:02 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 13 Nov 2008 02:16:02 -0500 Subject: [Biopython-dev] [Bug 2552] Adding alignments In-Reply-To: Message-ID: <200811130716.mAD7G2pw008200@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2552 ------- Comment #2 from fkauff at biologie.uni-kl.de 2008-11-13 02:16 EST ------- The Nexus module in Bio.Nexus has a function (not a method) 'combine' that can combine Nexus objects. It takes care of missing taxa, taxon sets, etc. Usage is something like: nex1=Nexus.Nexus('myfirstalignment.nex') nex2=Nexus.Nexus('mysecondalignment.nex') combined=Nexus.combine([('fancyname1',nex1),('fancyname2',nex2)]) It looks fairly straightforward to add this to a SeqRecord object. Cheers, Frank (Hi Cymon) (In reply to comment #1) > (In reply to comment #0) > > This is related to the very broad alignment bug 1944. > > > > Given two alignments, it can make sense to talk about adding them together. > > Actually, this is a very common procedure in phylogenetic analyses, where > multiple genes/loci are combined into a "super" matrix for a set of taxa. > Although, in this case, adding by column, if a taxon/row/identifier was missing > in a particular (sub-)alignment it would be filled by "-" (missing data) in the > combined matrix. > > Anyway, I think this would be a very useful enhancement. > -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
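A minimal sketch of the "super matrix" combination discussed in this bug, where a taxon missing from one sub-alignment is padded with "-". It uses only the Python standard library; the function name combine_by_taxon and the dict-of-strings representation are assumptions made for illustration, and this is not the implementation of Nexus.combine in Bio.Nexus.

```python
def combine_by_taxon(alignments, missing="-"):
    """Concatenate alignments column-wise, padding taxa absent from a block.

    Each alignment is a dict mapping a taxon name to its aligned sequence
    (hypothetical representation, all sequences in one dict equally long).
    """
    # Collect taxon names in order of first appearance.
    taxa = []
    for alignment in alignments:
        for taxon in alignment:
            if taxon not in taxa:
                taxa.append(taxon)

    combined = dict((taxon, "") for taxon in taxa)
    for alignment in alignments:
        if not alignment:
            continue  # an empty block contributes no columns
        width = len(next(iter(alignment.values())))
        for taxon in taxa:
            # Taxa missing from this block get a run of the missing symbol.
            combined[taxon] += alignment.get(taxon, missing * width)
    return combined


gene1 = {"taxonA": "ACGT", "taxonB": "ACGA"}
gene2 = {"taxonA": "TT", "taxonC": "TG"}
supermatrix = combine_by_taxon([gene1, gene2])
```

Here taxonB has no data for the second block and taxonC has none for the first, so both are padded to the full combined width.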
From bugzilla-daemon at portal.open-bio.org Thu Nov 13 10:19:29 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 13 Nov 2008 05:19:29 -0500 Subject: [Biopython-dev] [Bug 2552] Adding alignments In-Reply-To: Message-ID: <200811131019.mADAJTxs024880@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2552 ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-13 05:19 EST ------- (In reply to comment #1) > (In reply to comment #0) > > This is related to the very broad alignment bug 1944. > > > > Given two alignments, it can make sense to talk about adding them together. > > Actually, this is a very common procedure in phylogenetic analyses, where > multiple genes/loci are combined into a "super" matrix for a set of taxa. This was one of the use cases I originally had in mind here (with hindsight I should have mentioned this in the original proposal). Another potentially use for this is in combination with extracting sub-alignments by column (see Bug 2551) - for example to remove some middle region of an alignment by selecting the two end regions and adding them together, e.g. new_align = align[:,:10] + align[:,20:] to remove the region from columns 10 to 20. As described in my original proposal, adding two alignments "by column" would require they have the same number of rows, and the same IDs (possibly in a different order - this is not essential as making the user think about their preferred sort order seem fine to me). I suppose using any common subset of shared names is also well defined, or automatically including null sequences for missing entries (as Frank suggested in comment 2), but I would much prefer to keep any alignment addition simple and explicit - no "magic". More generally you could consider adding any two alignments "by column" if they have the same number of rows, but first we'd have to talk about adding SeqRecord objects. 
This means doing something sensible with the annotation, in particular the id and name. I was hoping to avoid this. Once Biopython 1.49 is out, dealing with this bug is certainly on my todo list, especially now that we have some positive responses. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From mjldehoon at yahoo.com Thu Nov 13 10:27:57 2008 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Thu, 13 Nov 2008 02:27:57 -0800 (PST) Subject: [Biopython-dev] [BioPython] a sequence set object in biopython? In-Reply-To: <5aa3b3570811121616u5f95cc8du9f0d91e4743f067f@mail.gmail.com> Message-ID: <25667.98653.qm@web62408.mail.re1.yahoo.com> Adding new classes to Biopython should be done very carefully ... once they're in, it's difficult to remove them again. In the past, removing classes that turned out to be less than ideal was a real headache. Right now I don't see a clear need for a sequence set object ... read on. --- On Wed, 11/12/08, Giovanni Marco Dall'Olio > > > > OK, then use a dict of SeqRecords for this, as shown > > in the tutorial chapter for Bio.SeqIO and the wiki. > > We even have a helper function > > Bio.SeqIO.to_dict() to do this and check for duplicate > > keys. > > I would prefer a SeqRecordSet object with a to_dict method > Wouldn't it be easier: > >>> seqs = Bio.SeqIO.parse(filehandler, > 'fasta') > >>> record_dict = seqs.to_dict() > > than invoking SeqIO twice? Maybe, yes, but it's just a matter of typing and I don't think that by itself it is a good enough reason for a SeqRecordSet class. > Let's see it from another point of view. 
> In biopython, if you want to print a set of sequences in > fasta format, > you have to do the following: > >>> s1 = SeqRecord(Seq('cacacac')) > >>> s2 = SeqRecord(Seq('cacacac')) > >>> seqs = s1, s2 > >>> out = '' > >>> for seq in seqs: > # a "print seq.format('fasta')" statement won't work > # properly here, because of blank lines > out += seq.format('fasta') > >>> print out I don't quite understand why "print seq.format('fasta')" won't work. > Take for example this code you wrote for me before: > > > class SeqRecordList(list) : > > def format(self, format) : > > from Bio import SeqIO > > from StringIO import StringIO > > handle = StringIO() > > SeqIO.write(self, handle, format) > > handle.seek(0) > > return handle.read() > > It's very useful, but I don't think a > python/biopython newbie would be > able to write it. I agree that this is too complicated. What if we redefine SeqIO.write as def write(self, handle=sys.stdout, format='fasta'): ... So by default SeqIO.write prints to the screen. Then you can do SeqIO.write(records) where records are a list of SeqRecord's. --Michiel. From bugzilla-daemon at portal.open-bio.org Thu Nov 13 11:06:20 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 13 Nov 2008 06:06:20 -0500 Subject: [Biopython-dev] [Bug 2628] Have Bio.SeqIO.write(...) and Bio.AlignIO.write(...) return number of records In-Reply-To: Message-ID: <200811131106.mADB6Ki7030741@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2628 ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-13 06:06 EST ------- Note - now that we return the count, this does block a previous suggestion by Michiel that if the handle were omitted the write function could default to returning a string (handled via StringIO internally). I wasn't keen on this idea at the time because it would have given the write function very different behaviour depending on the arguments. 
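A minimal sketch of the trade-off described in this comment, using only the Python standard library rather than Biopython itself. The names write_fasta and format_fasta are hypothetical, and records are plain (identifier, sequence) tuples instead of SeqRecord objects. The idea is that the write function keeps a single behaviour (write to a handle, return the record count), while a separate helper builds a string through an in-memory handle.

```python
from io import StringIO


def write_fasta(records, handle):
    """Write (identifier, sequence) pairs to handle; return how many were written."""
    count = 0
    for identifier, sequence in records:
        handle.write(">%s\n%s\n" % (identifier, sequence))
        count += 1
    return count


def format_fasta(records):
    """Return the same FASTA text as a string, via an in-memory handle."""
    handle = StringIO()
    write_fasta(records, handle)
    return handle.getvalue()


records = [("Alpha", "ACGT"), ("Beta", "GTGC")]
buffer = StringIO()
written = write_fasta(records, buffer)  # the return value is the record count
text = format_fasta(records)            # the string comes from a separate helper
```

Keeping the two functions apart avoids a single function whose return type depends on whether a handle was passed.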
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Thu Nov 13 11:11:10 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 13 Nov 2008 11:11:10 +0000 Subject: [Biopython-dev] [BioPython] a sequence set object in biopython? In-Reply-To: <25667.98653.qm@web62408.mail.re1.yahoo.com> References: <5aa3b3570811121616u5f95cc8du9f0d91e4743f067f@mail.gmail.com> <25667.98653.qm@web62408.mail.re1.yahoo.com> Message-ID: <320fb6e00811130311t4e813a8fqeb21504fd5696bf1@mail.gmail.com> Michiel wrote: >Marco wrote: >> Take for example this code you [Peter] wrote for me before: >> >> > class SeqRecordList(list) : >> > def format(self, format) : >> > from Bio import SeqIO >> > from StringIO import StringIO >> > handle = StringIO() >> > SeqIO.write(self, handle, format) >> > handle.seek(0) >> > return handle.read() >> >> It's very useful, but I don't think a >> python/biopython newbie would be >> able to write it. > > I agree that this is too complicated. This wasn't aimed at a beginner, but rather for Marco if he really wants to use this kind of object in his own code, or as a basis for further discussion. > What if we redefine SeqIO.write as > > def write(self, handle=sys.stdout, format='fasta'): > ... > > So by default SeqIO.write prints to the screen. Then you can do > > SeqIO.write(records) > > where records are a list of SeqRecord's. 
We could certainly include something like this in the documentation: #Just an example to create some records: from Bio.Seq import Seq from Bio.SeqRecord import SeqRecord records = [SeqRecord(Seq("ACGT"),"Alpha"), SeqRecord(Seq("GTGC"),"Beta")] #One way to "print" records to screen, import sys from Bio import SeqIO SeqIO.write(records, sys.stdout, "fasta") I'm not so keen on making the handle default to standard out, but this is nicer than the suggestion you made some time ago that if the handle were omitted a string be returned (no longer an option since Bug 2628 was committed). Any other votes for the standard out default? Peter From bugzilla-daemon at portal.open-bio.org Thu Nov 13 11:18:01 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 13 Nov 2008 06:18:01 -0500 Subject: [Biopython-dev] [Bug 2552] Adding alignments In-Reply-To: Message-ID: <200811131118.mADBI1of031964@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2552 ------- Comment #4 from fkauff at biologie.uni-kl.de 2008-11-13 06:18 EST ------- (In reply to comment #3) > > > > > Actually, this is a very common procedure in phylogenetic analyses, where > > multiple genes/loci are combined into a "super" matrix for a set of taxa. > > This was one of the use cases I originally had in mind here (with hindsight I > should have mentioned this in the original proposal). Another potentially use > for this is in combination with extracting sub-alignments by column (see Bug > 2551) - for example to remove some middle region of an alignment by selecting > the two end regions and adding them together, e.g. new_align = align[:,:10] + > align[:,20:] to remove the region from columns 10 to 20. 
Nexus parser can already handle this by rewriting the data set >> nexobject.write_nexus_data(filename='new.nex',exclude=[range(10,21)],delete=['list','of','taxa','two','delete']) where the indices of remaining character sets and character partitions get recalculated. > > As described in my original proposal, adding two alignments "by column" would > require they have the same number of rows, and the same IDs (possibly in a > different order - this is not essential as making the user think about their > preferred sort order seem fine to me). > > I suppose using any common subset of shared names is also well defined, or > automatically including null sequences for missing entries (as Frank suggested > in comment 2), but I would much prefer to keep any alignment addition simple > and explicit - no "magic". > Yes, missing names are given missing character entries > More generally you could consider adding any two alignments "by column" if they > have the same number of rows, but first we'd have to talk about adding > SeqRecord objects. This means doing something sensible with the annotation, in > particular the id and name. I was hoping to avoid this. > > Once Biopython 1.49 is out, dealing with this bug is certainly on my todo list, > especially now that we have some positive responses. > -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
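A minimal sketch of the strict "by column" addition proposed in this bug, assuming an alignment is just a list of (identifier, sequence) tuples. The helpers slice_columns and add_by_column are hypothetical and are not part of Biopython. Both alignments must contain the same identifiers, and nothing is padded.

```python
def slice_columns(alignment, start, stop):
    """Return the sub-alignment covering columns start (inclusive) to stop (exclusive)."""
    return [(identifier, sequence[start:stop]) for identifier, sequence in alignment]


def add_by_column(left, right):
    """Join two alignments column-wise, keeping the row order of the left one."""
    right_rows = dict(right)
    left_ids = [identifier for identifier, _ in left]
    if len(left) != len(right) or sorted(left_ids) != sorted(right_rows):
        raise ValueError("alignments must contain the same identifiers")
    return [(identifier, sequence + right_rows[identifier])
            for identifier, sequence in left]


align = [
    ("s1", "A" * 10 + "C" * 10 + "G" * 10),
    ("s2", "T" * 10 + "C" * 10 + "A" * 10),
]
# Equivalent in spirit to align[:, :10] + align[:, 20:], dropping columns 10 to 19.
trimmed = add_by_column(slice_columns(align, None, 10), slice_columns(align, 20, None))
```

Raising an error on mismatched identifiers keeps the operation explicit, in line with the "no magic" preference stated in comment 3.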
From bugzilla-daemon at portal.open-bio.org Thu Nov 13 12:14:21 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 13 Nov 2008 07:14:21 -0500
Subject: [Biopython-dev] [Bug 2654] New: Bio.Blast.NCBIStandalone does not support the output file argument
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2654

           Summary: Bio.Blast.NCBIStandalone does not support the output file argument
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk

The NCBI blastall tool defaults to writing its output to standard out, but can be told to write to a file instead:

  -o  BLAST report Output File [File Out]  Optional

Currently Bio.Blast.NCBIStandalone.blastall() does not support this optional argument, meaning that if the user wants to save the output they must do this manually from the standard out handle.

This also applies to rpsblast and blastpgp.

From eric.pruitt at gmail.com Thu Nov 13 13:00:36 2008
From: eric.pruitt at gmail.com (James Pruitt)
Date: Thu, 13 Nov 2008 07:00:36 -0600
Subject: [Biopython-dev] Lowess Smooth Improvement
Message-ID: <171e8a410811130500o71c455f6mda64ab19c138e48f@mail.gmail.com>

I made some changes to the Lowess smoothing method and wrote a unit test for it. On my machine, it runs around 37% faster in my unit tests than the original lowess method, and that is using the numpy.median function, so it would probably run even faster with the Bio.Cluster median function. How do I go about proposing my code for inclusion in Biopython?
--
-Jimmy

From biopython at maubp.freeserve.co.uk Thu Nov 13 13:27:51 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 13 Nov 2008 13:27:51 +0000
Subject: [Biopython-dev] Lowess Smooth Improvement
In-Reply-To: <171e8a410811130500o71c455f6mda64ab19c138e48f@mail.gmail.com>
References: <171e8a410811130500o71c455f6mda64ab19c138e48f@mail.gmail.com>
Message-ID: <320fb6e00811130527m41238780n9fe7f9c6de1a2d0a@mail.gmail.com>

On Thu, Nov 13, 2008 at 1:00 PM, James Pruitt wrote:
> I made some changes to the Lowess smoothing method as well as written a unit
> test for it. On my machine, it runs around 37% faster in my unit tests
> compared to the original lowess method and that is using the numpy.median
> function so it would probably run even faster with the Bio.Cluster median
> function.

Presumably this is an update for Bio/Statistics/lowess.py? I'm a little confused - this code already uses Bio.Cluster.median if it can, falling back on numpy.median. Maybe you're working from an older version of Biopython?

> How do I go about proposing my code to be included in Bio.Python?

First file an enhancement bug, then once the bug is filed you can attach a patch against CVS. If you have any example scripts or unit tests to go with it, even better.

Thanks,

Peter

From bugzilla-daemon at portal.open-bio.org Thu Nov 13 15:25:56 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 13 Nov 2008 10:25:56 -0500
Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO
In-Reply-To:
Message-ID: <200811131525.mADFPuvi029137@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2643

------- Comment #22 from dalloliogm at gmail.com 2008-11-13 10:25 EST -------
Created an attachment (id=1053)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1053&action=view)
test files for fastPhaseOutput

I put the fastPhaseOutput files used in the tests in separate files, as asked.
From bugzilla-daemon at portal.open-bio.org Thu Nov 13 15:59:02 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 13 Nov 2008 10:59:02 -0500
Subject: [Biopython-dev] [Bug 2655] New: Sorting sub-features in BioSeq.py can return corrupted feature
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2655

           Summary: Sorting sub-features in BioSeq.py can return corrupted feature
           Product: Biopython
           Version: 1.49b
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: major
          Priority: P2
         Component: BioSQL
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: cymon.cox at gmail.com

BioSeq.py retrieves SeqFeatures from a BioSQL database and sorts both the features and any sub-features. The first sort is superfluous and the second sort is an error that can lead to a feature being returned corrupted, with the sub-features in an incorrect order. So I've marked this major...

I've been trying to implement the feature/sub-feature locations test in test_BioSQL_SeqIO. Here's my solution (attached as patch1):

"""
# Compare sub-feature Locations:
#
# BioSQL currently does not store fuzzy locations, but instead stores
# them as FeatureLocation.nofuzzy_start FeatureLocation.nofuzzy_end.
# Hence, the old_sub from SeqIO.parse() will have fuzzy location while
# new_sub locations from BioSQL will be fuzzy.
# The vast majority of cases will be comparisons of ExactPosition
# class locations, so we'll try that first and catch the exceptions.

try:
    assert str(old_sub.location) == str(new_sub.location), \
        "%s -> %s" % (str(old_sub.location), str(new_sub.location))
except AssertionError, e:
    if isinstance(old_sub.location.start, ExactPosition) and \
       isinstance(new_sub.location.start, ExactPosition) and \
       isinstance(old_sub.location.end, ExactPosition) and \
       isinstance(new_sub.location.end, ExactPosition):
        # It's not a problem with fuzzy locations, re-raise
        raise AssertionError, e
    else:
        # At least one location is fuzzy
        assert old_sub.location.nofuzzy_start == new_sub.location.nofuzzy_start, \
            "%s -> %s" % (old_sub.location.nofuzzy_start, new_sub.location.nofuzzy_start)
        assert old_sub.location.nofuzzy_end == new_sub.location.nofuzzy_end, \
            "%s -> %s" % (old_sub.location.nofuzzy_end, new_sub.location.nofuzzy_end)
"""

This test causes errors in 3 of the test cases:

GenBank/extra_keywords.gb
GenBank/one_of.gb
GFF/NC_001422.gbk

e.g.:

Testing loading from genbank format file GenBank/extra_keywords.gb
 - TCCAGGGGATTCACGCGCA...TTG [Gp6GqZ3Q9foPG0HvyXguIGSJN8U] len 154329, AL138972.1
 - Retrieving by name/display_id 'DMBR25B3',
Traceback (most recent call last):
  File "test_BioSQL_SeqIO.py", line 371, in
    compare_records(record, db_rec)
  File "test_BioSQL_SeqIO.py", line 280, in compare_records
    compare_features(old_f, new_f)
  File "test_BioSQL_SeqIO.py", line 185, in compare_features
    raise AssertionError, e
AssertionError: [153489:154269] -> [40:610]

This is because each of these records has a peculiar join(...); for the above record:

join(153490..154269,AL121804.2:41..610,

(As an aside, how does the user know that the returned feature location is a join with a separate accession? How does BioSQL/Biopython deal with this?)
The error is caused by BioSeq.py _retrieve_features() sorting the sub-features first by sorting on start position: BioSeq.py: 249 sub_feature_list.append((start, subfeature)) 250 sub_feature_list.sort() 251 feature.sub_features = [sub_feature[1] 252 for sub_feature in sub_feature_list] This is an error because it returns the sub-features out of order. Besides this sub-feature sort, and the seqFeature sort, are both unnecessary because the features and sub-features are stored in BioSQL by rank and retrieved by rank, so they should be in the correct order anyway. Attached BioSeq.py patch to remove both sort()'s - patch2 With these patches applied the test_BioSQL_SeqIO and test_BioSQL pass: [cymon at chara Tests]$ python test_BioSQL_SeqIO.py > test_output [cymon at chara Tests]$ diff -ruN test_output output/test_BioSQL_SeqIO --- test_output 2008-11-13 15:39:20.000000000 +0000 +++ output/test_BioSQL_SeqIO 2008-11-12 13:06:19.000000000 +0000 @@ -1,3 +1,4 @@ +test_BioSQL_SeqIO Connecting to database Removing existing sub-database 'biosql-seqio-test' (if exists) (Re)creating empty sub-database 'biosql-seqio-test' [cymon at chara Tests]$ python run_tests.py test_BioSQL_SeqIO.py test_BioSQL_SeqIO ... ok ---------------------------------------------------------------------- Ran 1 test in 15.928s OK [cymon at chara Tests]$ python run_tests.py test_BioSQL.py test_BioSQL ... ok ---------------------------------------------------------------------- Ran 1 test in 25.255s OK -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
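A minimal sketch of why sorting on start position can reorder the parts of such a join, using the coordinates from the failing record above. The tuple layout and the helper names are assumptions made for illustration; this is not the BioSQL schema or the BioSeq.py code.

```python
# Each tuple is (rank, start, end, reference). reference is None for the
# record itself and an accession for a part located on another sequence.
# The values mirror join(153490..154269,AL121804.2:41..610, ...) as
# zero-based start positions.
stored_parts = [
    (1, 153489, 154269, None),
    (2, 40, 610, "AL121804.2"),
]


def by_rank(parts):
    """Order parts as stored: by rank, i.e. their position in the join."""
    return sorted(parts, key=lambda part: part[0])


def by_start(parts):
    """Order parts by start coordinate, as the removed sort effectively did."""
    return sorted(parts, key=lambda part: part[1])


rank_order = [(start, end) for _, start, end, _ in by_rank(stored_parts)]
start_order = [(start, end) for _, start, end, _ in by_start(stored_parts)]
```

Because the second part lies on a different accession, its small start coordinate moves it to the front when sorting by start, which matches the "[153489:154269] -> [40:610]" mismatch in the traceback.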
From bugzilla-daemon at portal.open-bio.org Thu Nov 13 16:00:02 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 13 Nov 2008 11:00:02 -0500 Subject: [Biopython-dev] [Bug 2655] Sorting sub-features in BioSeq.py can return corrupted feature In-Reply-To: Message-ID: <200811131600.mADG02lb002140@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2655 ------- Comment #1 from cymon.cox at gmail.com 2008-11-13 11:00 EST ------- Created an attachment (id=1054) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1054&action=view) patch1 to test_BioSQL_SeqIO -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Nov 13 16:00:35 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 13 Nov 2008 11:00:35 -0500 Subject: [Biopython-dev] [Bug 2655] Sorting sub-features in BioSeq.py can return corrupted feature In-Reply-To: Message-ID: <200811131600.mADG0Zhi002264@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2655 ------- Comment #2 from cymon.cox at gmail.com 2008-11-13 11:00 EST ------- Created an attachment (id=1055) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1055&action=view) patch2 to BioSQL/BioSeq.py -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Thu Nov 13 16:28:48 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 13 Nov 2008 11:28:48 -0500 Subject: [Biopython-dev] [Bug 2655] Sorting sub-features in BioSeq.py can return corrupted feature In-Reply-To: Message-ID: <200811131628.mADGSmmf007542@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2655 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-13 11:28 EST ------- Another sensible improvement - checked in with only minor changes (fixed an assert in the unit test, and removed an old comment about sorting for subfeatures). Checking in BioSQL/BioSeq.py; /home/repository/biopython/biopython/BioSQL/BioSeq.py,v <-- BioSeq.py new revision: 1.30; previous revision: 1.29 done Checking in Tests/test_BioSQL_SeqIO.py; /home/repository/biopython/biopython/Tests/test_BioSQL_SeqIO.py,v <-- test_BioSQL_SeqIO.py new revision: 1.25; previous revision: 1.24 done Thanks Cymon, Peter. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From biopython at maubp.freeserve.co.uk Thu Nov 13 16:33:43 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 13 Nov 2008 16:33:43 +0000 Subject: [Biopython-dev] Lowess Smooth Improvement In-Reply-To: <171e8a410811130825x5732bd99o252e26f2bafa8e13@mail.gmail.com> References: <171e8a410811130500o71c455f6mda64ab19c138e48f@mail.gmail.com> <320fb6e00811130527m41238780n9fe7f9c6de1a2d0a@mail.gmail.com> <171e8a410811130825x5732bd99o252e26f2bafa8e13@mail.gmail.com> Message-ID: <320fb6e00811130833y3413eb36p92be13ca0ee1ed9a@mail.gmail.com> On Thu, Nov 13, 2008 at 4:25 PM, James Pruitt wrote: > I removed the Bio.Cluster reference because the system the code would run on > would not have acccess to it so the code was vestigial but on the version I > will submit, I reincluded the Bio.Cluster median function. Yes-- this is an > update for Bio/Statistics/lowess.py OK - file the enhancement bug, upload the code (ideally as a patch) and we'll take a look :) Peter From bugzilla-daemon at portal.open-bio.org Thu Nov 13 17:09:37 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 13 Nov 2008 12:09:37 -0500 Subject: [Biopython-dev] [Bug 2655] Sorting sub-features in BioSeq.py can return corrupted feature In-Reply-To: Message-ID: <200811131709.mADH9blO013661@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2655 ------- Comment #4 from cymon.cox at gmail.com 2008-11-13 12:09 EST ------- (In reply to comment #3) > Another sensible improvement - checked in with only minor changes (fixed an > assert in the unit test, Thanks Peter :) > and removed an old comment about sorting for > subfeatures). If the comment stays in, you'll need to remove these two lines of nonsense as well: test_BioSQL_SeqIO.py: 171 # Hence, the old_sub from SeqIO.parse() will have fuzzy location while 172 # new_sub locations from BioSQL will be fuzzy. Sorry about that. C. 
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Nov 13 17:17:15 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 13 Nov 2008 12:17:15 -0500 Subject: [Biopython-dev] [Bug 2655] Sorting sub-features in BioSeq.py can return corrupted feature In-Reply-To: Message-ID: <200811131717.mADHHFpR015244@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2655 ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-13 12:17 EST ------- $ cvs commit -m "Removing two redundant comment lines (see Bug 2655)" test_BioSQL_SeqIO.py =========================================== dev.open-bio.org - Authorized Access Only =========================================== peterc at dev.open-bio.org's password: Checking in test_BioSQL_SeqIO.py; /home/repository/biopython/biopython/Tests/test_BioSQL_SeqIO.py,v <-- test_BioSQL_SeqIO.py new revision: 1.26; previous revision: 1.25 done -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Fri Nov 14 01:23:26 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 13 Nov 2008 20:23:26 -0500 Subject: [Biopython-dev] [Bug 2657] New: Improved Bio/Statistics/lowess.py Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2657 Summary: Improved Bio/Statistics/lowess.py Product: Biopython Version: 1.49b Platform: PC URL: http://pastebin.ca/1255734 OS/Version: All Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: eric.pruitt at gmail.com I noticed several calculations were done repeatedly when it could be saved as a single variable and used throughout. Then, I realized that it would be faster since the matrix was a statics size to just hard code solving the matrix into the function. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Nov 14 09:32:36 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 14 Nov 2008 04:32:36 -0500 Subject: [Biopython-dev] [Bug 2657] Improved Bio/Statistics/lowess.py In-Reply-To: Message-ID: <200811140932.mAE9Wa1f001445@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2657 ------- Comment #1 from dalloliogm at gmail.com 2008-11-14 04:32 EST ------- ok, but consider that all posts on pastebin disappear after 30 days... You should add an attachment by clicking on 'Create a New Attachment' from this page (you can only do that after opening the bug report). p.s. what about adding some doctest to this module? Just to show an example on how to run it. 
Something like this: """ >>> import numpy >>> x = numpy.array([1, 2, 3, 4, 5]) >>> y = numpy.array([1, 2, 3, 4, 6]) >>> lowess(x, y) expected result """ - http://docs.python.org/library/doctest.html - http://bugzilla.open-bio.org/show_bug.cgi?id=2640 -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Nov 14 10:41:31 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 14 Nov 2008 05:41:31 -0500 Subject: [Biopython-dev] [Bug 2657] Improved Bio/Statistics/lowess.py In-Reply-To: Message-ID: <200811141041.mAEAfVQO007220@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2657 ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-14 05:41 EST ------- Created an attachment (id=1057) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1057&action=view) The updated lowess.py from http://pastebin.ca/raw/1255734 Attaching James' new file here so it doesn't just expire at pastebin. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Nov 14 11:11:26 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 14 Nov 2008 06:11:26 -0500 Subject: [Biopython-dev] [Bug 2657] Improved Bio/Statistics/lowess.py In-Reply-To: Message-ID: <200811141111.mAEBBQJm010925@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2657 ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-14 06:11 EST ------- I've updated CVS to use standard four space indentation, add a doctest and the copyright statement etc. 
James' code makes two code changes (shown against CVS revision 1.9). 67,68c67,68 < h = [numpy.sort(abs(x-x[i]))[r] for i in range(n)] < w = numpy.clip(abs(([x]-numpy.transpose([x]))/h),0.0,1.0) --- > h = [numpy.sort(numpy.abs(x-x[i]))[r] for i in range(n)] > w = numpy.clip(numpy.abs(([x]-numpy.transpose([x]))/h),0.0,1.0) Due to the historic usage "from Numeric import *" this code did once use Numeric.abs here, so it makes sense to use numpy.abs now. Probably just an oversight from the recent Numeric/numpy conversion. This is another reminder that using "from XXX import *" is a bad idea. 76,80c76,82 < b = numpy.array([sum(weights*y), sum(weights*y*x)]) < A = numpy.array([[sum(weights), sum(weights*x)], < [sum(weights*x), sum(weights*x*x)]]) < beta = numpy.linalg.solve(A,b) < yest[i] = beta[0] + beta[1]*x[i] --- > theta = weights*x > b_top = sum(weights*y) > b_bot = sum(theta*y) > a = sum(weights) > b = sum(theta) > d = sum(theta*x) > yest[i] = (d*b_top-b*b_bot+(a*b_bot-b*b_top)*x[i])/(a*d-b**2) I can see the point of calculating and caching these: weights*y weights*x sum(weights*x) Was there a good reason for the name theta for weights*x? I personally think using an explicit matrix solver is much nicer to read than that complex hand coded version. Does it really save much time? My suggestion is just: 76,78c76,81 < b = numpy.array([sum(weights*y), sum(weights*y*x)]) < A = numpy.array([[sum(weights), sum(weights*x)], < [sum(weights*x), sum(weights*x*x)]]) --- > weights_x = weights*x > weights_y = weights*y > sum_weights_x = sum(weights_x) > b = numpy.array([sum(weights_y), sum(weights_y*x)]) > A = numpy.array([[sum(weights), sum_weights_x], > [sum_weights_x, sum(weights_x*x)]]) However, I'm going to leave this for Michiel to resolve (given he wrote the code in the first place). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
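For readers comparing the two versions in comment #3, the following is a minimal sketch, not the code in CVS, showing that the hand-coded expression is the closed-form solution of the same 2x2 system that numpy.linalg.solve handles. The function names fit_with_solver and fit_closed_form and the sample weights are made up for illustration; only the variable names inside them follow the diffs above.

```python
import numpy


def fit_with_solver(x, y, weights, x0):
    """Weighted straight-line fit evaluated at x0, via numpy.linalg.solve."""
    weights_x = weights * x          # cached instead of recomputed
    sum_weights_x = weights_x.sum()
    b = numpy.array([(weights * y).sum(), (weights_x * y).sum()])
    A = numpy.array([[weights.sum(), sum_weights_x],
                     [sum_weights_x, (weights_x * x).sum()]])
    beta = numpy.linalg.solve(A, b)
    return beta[0] + beta[1] * x0


def fit_closed_form(x, y, weights, x0):
    """Same fit, with the 2x2 system inverted by hand (Cramer's rule)."""
    theta = weights * x
    b_top = (weights * y).sum()
    b_bot = (theta * y).sum()
    a = weights.sum()
    b = theta.sum()
    d = (theta * x).sum()
    # The inverse of [[a, b], [b, d]] is [[d, -b], [-b, a]] / (a*d - b**2).
    return (d * b_top - b * b_bot + (a * b_bot - b * b_top) * x0) / (a * d - b ** 2)


# Illustrative data only; the weights are arbitrary, not lowess weights.
x = numpy.array([0.0, 1.0, 4.0, 7.0])
y = numpy.array([0.0, 1.0, 16.0, 49.0])
weights = numpy.array([1.0, 0.5, 0.25, 0.1])
```

Both functions should agree to within floating point rounding for any x0, so the choice between them is about readability and speed rather than about the result.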
From bugzilla-daemon at portal.open-bio.org Fri Nov 14 11:15:09 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 14 Nov 2008 06:15:09 -0500 Subject: [Biopython-dev] [Bug 2657] Improved Bio/Statistics/lowess.py In-Reply-To: Message-ID: <200811141115.mAEBF9Gi011416@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2657 ------- Comment #4 from eric.pruitt at gmail.com 2008-11-14 06:15 EST ------- Created an attachment (id=1058) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1058&action=view) Unit test for lowess.py File will need to have the import statements adjusted for the Biopython structure. From p.j.a.cock at googlemail.com Fri Nov 14 11:18:43 2008 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 14 Nov 2008 11:18:43 +0000 Subject: [Biopython-dev] [BioPython] Problems with Emboss.Primer3 In-Reply-To: <001001c94644$eeaf5c00$1022a8c0@ipkgatersleben.de> References: <000801c94598$fd183f20$1022a8c0@ipkgatersleben.de> <320fb6e00811130643p357092f6y8e6d983a11909003@mail.gmail.com> <001001c94644$eeaf5c00$1022a8c0@ipkgatersleben.de> Message-ID: <320fb6e00811140318s452f9a5aj76eb7d505a98b6ee@mail.gmail.com> On Fri, Nov 14, 2008 at 10:37 AM, Stefanie Lück wrote: > Thanks for the hints! > ... > It gives as well as at the command line: > > " > Command line: > eprimer3 -sequence p3input.txt -outfile out.pr3 -target 50,500 > Return code: > 1 > Errors: > > EMBOSS An error in ajnam.c at line 1991: > > EMBOSSWIN environment variable not defined > > Messages > > " > Any suggestions? This doesn't seem to be a Biopython problem, but an EMBOSS installation or configuration problem. What version of EMBOSS do you have? Maybe try upgrading to version 6? 
Peter From bugzilla-daemon at portal.open-bio.org Fri Nov 14 11:28:36 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 14 Nov 2008 06:28:36 -0500 Subject: [Biopython-dev] [Bug 2657] Improved Bio/Statistics/lowess.py In-Reply-To: Message-ID: <200811141128.mAEBSaSb013641@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2657 eric.pruitt at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |eric.pruitt at gmail.com ------- Comment #5 from eric.pruitt at gmail.com 2008-11-14 06:28 EST ------- (In reply to comment #3) > I've updated CVS to use standard four space indentation, add a doctest and the > copyright statement etc. > > James' code makes two code changes (shown against CVS revision 1.9). > > 67,68c67,68 > < h = [numpy.sort(abs(x-x[i]))[r] for i in range(n)] > < w = numpy.clip(abs(([x]-numpy.transpose([x]))/h),0.0,1.0) > --- > > h = [numpy.sort(numpy.abs(x-x[i]))[r] for i in range(n)] > > w = numpy.clip(numpy.abs(([x]-numpy.transpose([x]))/h),0.0,1.0) > > Due to the historic usage "from Numeric import *" this code did once use > Numeric.abs here, so it makes sense to use numpy.abs now. Probably just an > oversight from the recent Numeric/numpy conversion. This is another reminder > that using "from XXX import *" is a bad idea. > > 76,80c76,82 > < b = numpy.array([sum(weights*y), sum(weights*y*x)]) > < A = numpy.array([[sum(weights), sum(weights*x)], > < [sum(weights*x), sum(weights*x*x)]]) > < beta = numpy.linalg.solve(A,b) > < yest[i] = beta[0] + beta[1]*x[i] > --- > > theta = weights*x > > b_top = sum(weights*y) > > b_bot = sum(theta*y) > > a = sum(weights) > > b = sum(theta) > > d = sum(theta*x) > > yest[i] = (d*b_top-b*b_bot+(a*b_bot-b*b_top)*x[i])/(a*d-b**2) > > I can see the point of calculating and caching these: > weights*y > weights*x > sum(weights*x) > > Was there a good reason for the name theta for weights*x? 
> > I personally think using an explicit matrix solver is much nicer to read than > that complex hand coded version. Does it really save much time? > > My suggestion is just: > 76,78c76,81 > < b = numpy.array([sum(weights*y), sum(weights*y*x)]) > < A = numpy.array([[sum(weights), sum(weights*x)], > < [sum(weights*x), sum(weights*x*x)]]) > --- > > weights_x = weights*x > > weights_y = weights*y > > sum_weights_x = sum(weights_x) > > b = numpy.array([sum(weights_y), sum(weights_y*x)]) > > A = numpy.array([[sum(weights), sum_weights_x], > > [sum_weights_x, sum(weights_x*x)]]) > > However, I'm going to leave this for Michiel to resolve (given he wrote the > code in the first place). > Yes, replacing the numpy solver saves quite a bit of time. When I cached the variables so they weren't recalculated every single time, unit test time dropped 17% compared to the original; replacing numpy then brought the net reduction to 38% from the original, so a huge difference. Also, I suggest one change if you all decide to keep numpy. Minor, but just a suggestion. > weights_x = weights*x > sum_weights_x = sum(weights_x) > b = numpy.array([sum(weights*y), sum(weights_x*y)]) > A = numpy.array([[sum(weights), sum_weights_x], > [sum_weights_x, sum(weights_x*x)]]) 
From bugzilla-daemon at portal.open-bio.org Fri Nov 14 11:32:39 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 14 Nov 2008 06:32:39 -0500 Subject: [Biopython-dev] [Bug 2657] Improved Bio/Statistics/lowess.py In-Reply-To: Message-ID: <200811141132.mAEBWdlC014111@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2657 ------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-14 06:32 EST ------- (In reply to comment #4) > Created an attachment (id=1058) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1058&action=view) [details] > Unit test for lowess.py > > File will need to have the import statements adjsuted for the Bio.Python > structure. > You're also using scipy and rpy (not Biopython dependencies), so if we wanted to include these tests they would have to be made conditional on these external dependencies (so that the test framework knows when it can skip them). Removing them effectivly leaves one simple test: from numpy import array from Bio.Statistics.lowess import lowess hand_iterations = 1 hand_f = 2./3. hand_x = array([0.0,1.0,4.0,7.0]) hand_y = array([0.0,1.0,16.0,49.0]) #Was there a typo in the original, 18.85086... versus 18.5086...? #hand_out = [-1.333391371257, 2.802858739, 18.850860916, 48.302727] hand_out = [ -1.33338941, 2.80323154, 18.50860916, 48.30274834] method_out = lowess(hand_x,hand_y,hand_f,hand_iterations) for a,b in zip(method_out, hand_out) : assert abs(a-b) < 0.00001 print "Done" -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
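As a rough illustration of the point in comment #6 about making tests conditional on external dependencies, here is a sketch using the standard library only. It assumes a modern Python 3 unittest, which is not what the Biopython test framework of the time used, and the helper have_module and the test class are hypothetical names, not Biopython code.

```python
import importlib.util
import unittest


def have_module(name):
    """Return True if the named top-level module can be found for import."""
    return importlib.util.find_spec(name) is not None


class OptionalDependencyTests(unittest.TestCase):
    """Hypothetical test case: one test needs scipy, one needs nothing extra."""

    @unittest.skipUnless(have_module("scipy"), "scipy is not installed")
    def test_needs_scipy(self):
        # Only runs when scipy is available; reported as skipped otherwise.
        import scipy
        self.assertTrue(hasattr(scipy, "__version__"))

    def test_always_runs(self):
        # No external dependency, so this is never skipped.
        self.assertAlmostEqual(2.0 / 3.0, 0.6666667, places=6)
```

With this arrangement a missing optional library shows up as a skipped test rather than a failure, which is the behaviour the comment asks for.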
From bugzilla-daemon at portal.open-bio.org Fri Nov 14 11:35:44 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 14 Nov 2008 06:35:44 -0500 Subject: [Biopython-dev] [Bug 2657] Improved Bio/Statistics/lowess.py In-Reply-To: Message-ID: <200811141135.mAEBZiCO014367@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2657 ------- Comment #7 from eric.pruitt at gmail.com 2008-11-14 06:35 EST ------- (In reply to comment #6) > (In reply to comment #4) > > Created an attachment (id=1058) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1058&action=view) [details] [details] > > Unit test for lowess.py > > > > File will need to have the import statements adjsuted for the Bio.Python > > structure. > > > > You're also using scipy and rpy (not Biopython dependencies), so if we wanted > to include these tests they would have to be made conditional on these external > dependencies (so that the test framework knows when it can skip them). > > Removing them effectivly leaves one simple test: > > from numpy import array > from Bio.Statistics.lowess import lowess > > hand_iterations = 1 > hand_f = 2./3. > hand_x = array([0.0,1.0,4.0,7.0]) > hand_y = array([0.0,1.0,16.0,49.0]) > #Was there a typo in the original, 18.85086... versus 18.5086...? > #hand_out = [-1.333391371257, 2.802858739, 18.850860916, 48.302727] > hand_out = [ -1.33338941, 2.80323154, 18.50860916, 48.30274834] > method_out = lowess(hand_x,hand_y,hand_f,hand_iterations) > for a,b in zip(method_out, hand_out) : > assert abs(a-b) < 0.00001 > print "Done" > When I did the hand calculations, I used a TI-84+ which uses decimal math eliminating the binary error inherent in most python implementations. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Fri Nov 14 11:38:51 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 14 Nov 2008 06:38:51 -0500 Subject: [Biopython-dev] [Bug 2657] Improved Bio/Statistics/lowess.py In-Reply-To: Message-ID: <200811141138.mAEBcpNd014578@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2657 ------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-14 06:38 EST ------- (In reply to comment #5) >> I personally think using an explicit matrix solver is much nicer to read >> than that complex hand coded version. Does it really save much time? >> ... >> However, I'm going to leave this for Michiel to resolve (given he wrote >> the code in the first place). >> > > Yes-- replacing numpy saves quite a bit of time. When I replaced the variable > so they weren't recalculated every single time, it reduced unit test time 17% > compared to the original then replacing numpy reduced it to a net 38% from > the original so huge difference. OK - so it's clarity versus what sounds like a big speed difference. > Also, I suggest changing something if you all > decided to keep numpy. Minor but just a suggestion. > > > weights_x = weights*x > > sum_weights_x = sum(weights_x) > > b = numpy.array([sum(weights*y), sum(weights_x*y)]) > > A = numpy.array([[sum(weights), sum_weights_x], > > [sum_weights_x, sum(weights_x*x)]]) > I see: in defining b, sum(weights*y*x) can be done as sum(weights_x*y), which avoids creating the temp variable weights_y = weights*y. That does look better. 
From bugzilla-daemon at portal.open-bio.org Fri Nov 14 11:41:05 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 14 Nov 2008 06:41:05 -0500 Subject: [Biopython-dev] [Bug 2657] Improved Bio/Statistics/lowess.py In-Reply-To: Message-ID: <200811141141.mAEBf5IS014888@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2657 eric.pruitt at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- CC|eric.pruitt at gmail.com | -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Nov 14 11:48:07 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 14 Nov 2008 06:48:07 -0500 Subject: [Biopython-dev] [Bug 2658] New: 1.49b version of PDB Neighborsearch still based on Numeric Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2658 Summary: 1.49b version of PDB Neighborsearch still based on Numeric Product: Biopython Version: 1.49b Platform: Macintosh OS/Version: Mac OS Status: NEW Severity: normal Priority: P3 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: rbickerton at gmail.com Using python 2.52, running: python ./lib/python2.5/site-packages/Bio/PDB/NeighborSearch.py gives: Traceback (most recent call last): File "./lib/python2.5/site-packages/Bio/PDB/NeighborSearch.py", line 138, in ns=NeighborSearch(al) File "./lib/python2.5/site-packages/Bio/PDB/NeighborSearch.py", line 41, in __init__ assert(self.coords.typecode()=="f") AttributeError: 'numpy.ndarray' object has no attribute 'typecode' Exit 1 A bit of google digging suggested that .typecode()=="f" is a Numarray function that should be updated to its Numpy equivalent. 
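For context on the traceback above: Numeric arrays had a typecode() method, while a NumPy array exposes the same single-character code as dtype.char. The snippet below is only a sketch of that equivalence, not the fix that was committed for this bug, and the helper name is_single_precision is made up here.

```python
import numpy


def is_single_precision(array):
    """NumPy spelling of the old Numeric check array.typecode() == "f"."""
    return array.dtype.char == "f"


# Illustrative coordinates stored as 32-bit floats (typecode "f").
coords = numpy.array([[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]], dtype="f")
```

An equivalent and arguably clearer test is coords.dtype == numpy.float32.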
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Nov 14 12:06:28 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 14 Nov 2008 07:06:28 -0500 Subject: [Biopython-dev] [Bug 2658] 1.49b version of PDB Neighborsearch still based on Numeric In-Reply-To: Message-ID: <200811141206.mAEC6SEp016723@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2658 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- OS/Version|Mac OS |All ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-14 07:06 EST ------- Yes, that does look like an oversight in the Numeric to NumPy migration. See also Bug 2649 for a related but different issue in Bio.KDTree -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Nov 14 12:18:25 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 14 Nov 2008 07:18:25 -0500 Subject: [Biopython-dev] [Bug 2634] PAM30 Matrix doesn't work with qblast In-Reply-To: Message-ID: <200811141218.mAECIPRT017833@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2634 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |INVALID ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-14 07:18 EST ------- Hi Nick, I hope you got your blast to work. 
I don't think we have an issue with Biopython itself, so I'm going to close this bug. It would be nice to somehow improve the error handling, but that doesn't look straight forward. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Nov 14 12:24:16 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 14 Nov 2008 07:24:16 -0500 Subject: [Biopython-dev] [Bug 2604] test_Restriction failure with Python 2.6 (also cause error in test_CAPS) In-Reply-To: Message-ID: <200811141224.mAECOGMN018266@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2604 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-14 07:24 EST ------- I'm going to mark this as fixed given it seem to be OK. Please reopen this if there are any issues. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Fri Nov 14 12:27:23 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 14 Nov 2008 12:27:23 +0000 Subject: [Biopython-dev] Biopython 1.49 beta released In-Reply-To: <320fb6e00811090716v58637d55o470246df4175464e@mail.gmail.com> References: <320fb6e00811090716v58637d55o470246df4175464e@mail.gmail.com> Message-ID: <320fb6e00811140427u50b3d42bn9514a8352d936960@mail.gmail.com> On Sun, Nov 9, 2008 at 3:16 PM, Peter wrote: > Dear Biopythoneers, > > We are pleased to announce a beta release of Biopython 1.49. 
There are > been some significant changes since Biopython 1.48 was released two > months ago, which is why we are initially releasing a beta for wider > testing. > > As previously announced, the big news is that Biopython now uses NumPy > rather than its precursor Numeric (the original Numerical Python > library). We've had a few Numeric -> NumPy bugs reported, http://bugzilla.open-bio.org/show_bug.cgi?id=2658 Bug 2658 - Bio.PDB.Neighborsearch http://bugzilla.open-bio.org/show_bug.cgi?id=2649 Bug 2649 - Bio.KDTree (probably fixed) I don't think we should release Biopython 1.49 final until these are resolved - but if there was interest I could put out a second beta. Peter From bugzilla-daemon at portal.open-bio.org Fri Nov 14 13:17:39 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 14 Nov 2008 08:17:39 -0500 Subject: [Biopython-dev] [Bug 2638] test_PopGen_SimCoal_nodepend.py fails on Windows, newline issue In-Reply-To: Message-ID: <200811141317.mAEDHdWo021804@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2638 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-14 08:17 EST ------- Patch checked in after testing with SIMCOAL2 on Windows XP. Marking as fixed. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Fri Nov 14 15:16:12 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 14 Nov 2008 10:16:12 -0500 Subject: [Biopython-dev] [Bug 2640] Proposal: doctest for SeqRecord/biopython In-Reply-To: Message-ID: <200811141516.mAEFGClF031759@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2640 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|REOPENED |RESOLVED Resolution| |FIXED ------- Comment #19 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-14 10:16 EST ------- I've added a general example doctest to the main docstring for the SeqRecord object. Marking as fixed. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Nov 14 15:35:18 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 14 Nov 2008 10:35:18 -0500 Subject: [Biopython-dev] [Bug 2524] Handle missing libraries like numpy or reportlab in run_tests.py In-Reply-To: Message-ID: <200811141535.mAEFZIP8001033@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2524 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-14 10:35 EST ------- Fixed the numpy test cases (they were getting annoying with python 2.6 on Windows where numpy isn't yet available). The reportlab tests already fail gracefully. 
I ended up going down this route: > (b) Modify all the tests using these semi-optional libraries to catch > the ImportError and raise MissingExternalDependencyError instead. As > the tests themselves generally don't directly import the external > library this is perhaps messy. Marking this bug as fixed. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bsouthey at gmail.com Fri Nov 14 15:39:00 2008 From: bsouthey at gmail.com (Bruce Southey) Date: Fri, 14 Nov 2008 09:39:00 -0600 Subject: [Biopython-dev] Biopython 1.49 beta released In-Reply-To: <320fb6e00811140427u50b3d42bn9514a8352d936960@mail.gmail.com> References: <320fb6e00811090716v58637d55o470246df4175464e@mail.gmail.com> <320fb6e00811140427u50b3d42bn9514a8352d936960@mail.gmail.com> Message-ID: <491D9B94.9050805@gmail.com> Peter wrote: > On Sun, Nov 9, 2008 at 3:16 PM, Peter wrote: > >> Dear Biopythoneers, >> >> We are pleased to announce a beta release of Biopython 1.49. There are >> been some significant changes since Biopython 1.48 was released two >> months ago, which is why we are initially releasing a beta for wider >> testing. >> >> As previously announced, the big news is that Biopython now uses NumPy >> rather than its precursor Numeric (the original Numerical Python >> library). >> > > We've had a few Numeric -> NumPy bugs reported, > > http://bugzilla.open-bio.org/show_bug.cgi?id=2658 > Bug 2658 - Bio.PDB.Neighborsearch > > http://bugzilla.open-bio.org/show_bug.cgi?id=2649 > Bug 2649 - Bio.KDTree (probably fixed) > > I don't think we should release Biopython 1.49 final until these are > resolved - but if there was interest I could put out a second beta. 
> > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > > I noticed that Bio.PDB.Neighborsearch is not being tested. Is there someway to identify which functions are not getting tested? I know it is considerable effort but it would allow the development of tests that at the very least exercise all the Biopython code. (Hopefully this is not as bad as the Numpy documentation marathon.) Bruce From biopython at maubp.freeserve.co.uk Fri Nov 14 15:46:34 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 14 Nov 2008 15:46:34 +0000 Subject: [Biopython-dev] Biopython 1.49 beta released In-Reply-To: <491D9B94.9050805@gmail.com> References: <320fb6e00811090716v58637d55o470246df4175464e@mail.gmail.com> <320fb6e00811140427u50b3d42bn9514a8352d936960@mail.gmail.com> <491D9B94.9050805@gmail.com> Message-ID: <320fb6e00811140746m119a040dv778163e0ab034a2@mail.gmail.com> On Fri, Nov 14, 2008 at 3:39 PM, Bruce Southey wrote: > Peter wrote: >> We've had a few Numeric -> NumPy bugs reported, >> >> http://bugzilla.open-bio.org/show_bug.cgi?id=2658 >> Bug 2658 - Bio.PDB.Neighborsearch >> >> http://bugzilla.open-bio.org/show_bug.cgi?id=2649 >> Bug 2649 - Bio.KDTree (probably fixed) >> >> ... > > I noticed that Bio.PDB.Neighborsearch is not being tested. > That fact that we didn't spot Bug 2658 from the unit tests makes that very clear ;) > > Is there someway to identify which functions are not getting tested? > I can't think of an easy way - the best bet might be a quick script to scan all the unit tests and pull out import lines, and from this build a list of all modules which have some coverage. This wouldn't tell us about how much of each module is tested, but it would be better than nothing. > I know it is considerable effort but it would allow the development of tests > that at the very least exercise all the Biopython code. 
(Hopefully this is > not as bad as the Numpy documentation marathon.) I've written plenty of tests myself, including for existing modules - my gut feeling is full test coverage would be quite a marathon. Compared to the early years of the project, I've probably tried to be a bit stricter about making sure we have test cases and documentation before accepting new code. In some cases this has worked out pretty well (e.g. Tiago's PopGen stuff is covered in the tutorial and has unit tests). In other cases it could put people off contributing code. Peter From biopython at maubp.freeserve.co.uk Fri Nov 14 17:24:33 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 14 Nov 2008 17:24:33 +0000 Subject: [Biopython-dev] Test coverage Message-ID: <320fb6e00811140924g26cc0703r2629380540a5b667@mail.gmail.com> Bruce: >> >> Is there some way to identify which functions are not getting tested? >> Peter: > I can't think of an easy way - the best bet might be a quick script to > scan all the unit tests and pull out import lines, and from this build > a list of all modules which have some coverage. This wouldn't tell us > about how much of each module is tested, but it would be better than > nothing. I've done a very crude script to try and answer this, and can point out a few modules in need of tests: Bio.Affy Bio.AlignAce Bio.EZRetrieve Bio.Emboss (everything except the primer parsers) Bio.Encodings (obsolete?) Bio.FilteredReader (obsolete?) Bio.MaxEntropy Bio.NMR Bio.NaiveBayes Bio.NetCatch (obsolete?) 
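The crude script itself was not posted, so the following is only a guess at the approach described above: scan each test file for import lines and collect the Bio modules they name. The function names and the test_*.py file pattern are assumptions. A line such as "from Bio import SeqIO" is recorded only as "Bio", so the result undercounts coverage.

```python
import re
from pathlib import Path

# Matches "import Bio.X.Y" and "from Bio.X.Y import ..." at the start of a line
# (after optional indentation) and captures the dotted module name.
IMPORT_RE = re.compile(r"^[ \t]*(?:from|import)[ \t]+(Bio(?:\.\w+)*)\b", re.MULTILINE)


def modules_imported(source):
    """Return the set of Bio module names imported in a piece of source text."""
    return set(IMPORT_RE.findall(source))


def covered_modules(test_dir):
    """Collect Bio modules imported by any test_*.py file in test_dir."""
    covered = set()
    for path in sorted(Path(test_dir).glob("test_*.py")):
        covered |= modules_imported(path.read_text(errors="replace"))
    return covered
```

Comparing the returned set against the list of modules under Bio/ would give a list like the one above. A regular expression is used rather than a real parser so that the scan does not depend on the test files being valid for the Python version running the script.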
Peter From bugzilla-daemon at portal.open-bio.org Fri Nov 14 18:06:49 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 14 Nov 2008 13:06:49 -0500 Subject: [Biopython-dev] [Bug 2659] New: Typo in tutorial section "2.1 General overview of what Biopython provides" Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2659 Summary: Typo in tutorial section "2.1 General overview of what Biopython provides" Product: Biopython Version: Not Applicable Platform: PC OS/Version: Linux Status: NEW Severity: trivial Priority: P2 Component: Documentation AssignedTo: biopython-dev at biopython.org ReportedBy: wilcoxjg at gmail.com Sentence reads: "To me, this can be frustrating since I often WAY to just know the one right way to do something." Should be: "To me, this can be frustrating since I often WANT to just know the one right way to do something." -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Nov 14 18:16:18 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 14 Nov 2008 13:16:18 -0500 Subject: [Biopython-dev] [Bug 2660] New: Typo in tutorial section "2.2 Working with sequences" Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2660 Summary: Typo in tutorial section "2.2 Working with sequences" Product: Biopython Version: Not Applicable Platform: PC OS/Version: Linux Status: NEW Severity: minor Priority: P2 Component: Documentation AssignedTo: biopython-dev at biopython.org ReportedBy: wilcoxjg at gmail.com Sentence reads: "What we have here is a sequence object with a generic alphabet - reflecting the fact WE HAVE SPECIFIED if this is a DNA or protein sequence (okay, a protein with a lot of Alanines, Glycines, Cysteines and Threonines!)." 
Should read: "What we have here is a sequence object with a generic alphabet - reflecting the fact we have NOT specified if this is a DNA or protein sequence (okay, a protein with a lot of Alanines, Glycines, Cysteines and Threonines!)." -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Nov 14 18:28:12 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 14 Nov 2008 13:28:12 -0500 Subject: [Biopython-dev] [Bug 2659] Typo in tutorial section "2.1 General overview of what Biopython provides" In-Reply-To: Message-ID: <200811141828.mAEISCmZ013084@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2659 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-14 13:28 EST ------- Thanks :) That's fixed in CVS now, see Doc/Tutorial.tex revision 1.185, which you can view online here (updated every hour): http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Doc/Tutorial.tex?cvsroot=biopython We'll update the HTML and PDF on the website as part of the next release (Biopython 1.49). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Fri Nov 14 18:34:34 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 14 Nov 2008 13:34:34 -0500 Subject: [Biopython-dev] [Bug 2661] New: Typo in: "2.3 A usage example" Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2661 Summary: Typo in: "2.3 A usage example" Product: Biopython Version: Not Applicable Platform: PC OS/Version: Linux Status: NEW Severity: trivial Priority: P2 Component: Documentation AssignedTo: biopython-dev at biopython.org ReportedBy: wilcoxjg at gmail.com Sentence reads: "We'll start with sequence parsing in Section 2.4, but the orchids will be back later on as well - for example WE'LL EXTRA DATA FROM Swiss-Prot from certain orchid proteins in Section 6.1, search PubMed for papers about orchids in Section 6.2, extract sequence data from GenBank in Section 6.3.1, and work with ClustalW multiple sequence alignments of orchid proteins in Section 6.4.1." Capitalized phrase should contain some modifier like "we'll NEED extra", or "we'll GET extra". -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Fri Nov 14 18:34:49 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 14 Nov 2008 13:34:49 -0500 Subject: [Biopython-dev] [Bug 2660] Typo in tutorial section "2.2 Working with sequences" In-Reply-To: Message-ID: <200811141834.mAEIYnm6013826@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2660 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-14 13:34 EST ------- The tutorial on the website (matching Biopython 1.49b) is fine: http://biopython.org/DIST/docs/tutorial/Tutorial.html http://biopython.org/DIST/docs/tutorial/Tutorial.pdf Which version of Biopython are you using (you didn't fill this in on the bug report), or where are you reading this? Looking over CVS this text was only like this in Biopython 1.44, so I'm a little confused. Thanks, Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Nov 14 18:38:06 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 14 Nov 2008 13:38:06 -0500 Subject: [Biopython-dev] [Bug 2661] Typo in: "2.3 A usage example" In-Reply-To: Message-ID: <200811141838.mAEIc6Qo014131@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2661 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-14 13:38 EST ------- As per Bug 2660, which version of Biopython are you using (you didn't fill this in on the bug report), or where are you reading this? 
This has already been fixed to say "extract" instead of "extra" (but I'm not going to check exactly when this was corrected). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Nov 14 18:40:28 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 14 Nov 2008 13:40:28 -0500 Subject: [Biopython-dev] [Bug 2660] Typo in tutorial section "2.2 Working with sequences" In-Reply-To: Message-ID: <200811141840.mAEIeSsm014238@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2660 wilcoxjg at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Version|Not Applicable |1.44 -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Nov 14 18:41:47 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 14 Nov 2008 13:41:47 -0500 Subject: [Biopython-dev] [Bug 2661] Typo in: "2.3 A usage example" In-Reply-To: Message-ID: <200811141841.mAEIfll7014298@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2661 wilcoxjg at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Version|Not Applicable |1.44 -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Fri Nov 14 18:47:28 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 14 Nov 2008 13:47:28 -0500 Subject: [Biopython-dev] [Bug 2660] Typo in tutorial section "2.2 Working with sequences" In-Reply-To: Message-ID: <200811141847.mAEIlS8Y014586@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2660 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-14 13:47 EST ------- Hi Josh, If you were reading the tutorial shipped with Biopython 1.44 this makes sense. I certainly don't want to put you off reporting any other typos, but if you find any more please first check against the (almost completely) up to date version before reporting them: http://biopython.org/DIST/docs/tutorial/Tutorial.html http://biopython.org/DIST/docs/tutorial/Tutorial.pdf Note that some of the things covered in the current tutorial will not apply to Biopython 1.44, which is now a year old. I'd encourage you to upgrade if possible. Thanks, Peter P.S. Marking this bug as fixed. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From mhampton at d.umn.edu Fri Nov 14 19:48:42 2008 From: mhampton at d.umn.edu (Marshall Hampton) Date: Fri, 14 Nov 2008 13:48:42 -0600 (CST) Subject: [Biopython-dev] coverage of function testing Message-ID: Hi, I noticed some discussion of the coverage and automation of testing for functions in biopython, and thought I would suggest folks check out the testing and coverage tools in Sage (www.sagemath.org). 
Testing of functions in Sage is done by testing examples in their docstrings - there are comments to opt out of testing or to indicate if they will take a long time. They also have scripts for checking which functions have at least one such testable example. So you can do something like this: sage -coverage PATH_TO_SAGE/sage/geometry/polyhedra.py and get SCORE /Volumes/D/sage-3.2.alpha0/devel/sage-main/sage/geometry/polyhedra.py: 100% (21 of 21) to see if anything is untested. Now that biopython is converting to numpy, I will start arguing for its inclusion as a standard part of Sage (right now it is an optional package). Cheers, Marshall Hampton Integrated Biosciences Program and Department of Mathematics and Statistics University of Minnesota, Duluth From bugzilla-daemon at portal.open-bio.org Fri Nov 14 20:27:12 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 14 Nov 2008 15:27:12 -0500 Subject: [Biopython-dev] [Bug 2662] New: Typo in tutorial "Chapter 3 Sequence objects " Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2662 Summary: Typo in tutorial "Chapter 3 Sequence objects " Product: Biopython Version: 1.49b Platform: PC OS/Version: Linux Status: NEW Severity: trivial Priority: P2 Component: Documentation AssignedTo: biopython-dev at biopython.org ReportedBy: wilcoxjg at gmail.com Sentence reads: "First of all the Seq object has a slightly different set of METHODS TO A PLAIN python string (for example, reverse_complement() and translate() methods used for nucleotide sequences)." Should be: "methods THAN a plain python string" -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
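For plain Python modules, the docstring-example check described in the "coverage of function testing" message above can be approximated with the standard library alone. The sketch below is not Sage's -coverage implementation; the function name doctest_coverage and its return format are assumptions. It counts the functions defined in a module whose docstring contains at least one ">>>" example.

```python
import doctest
import inspect


def doctest_coverage(module):
    """Return (with_examples, total, missing_names) for a module's own functions."""
    parser = doctest.DocTestParser()
    total = 0
    missing = []
    for name, obj in inspect.getmembers(module, inspect.isfunction):
        if obj.__module__ != module.__name__:
            continue  # skip functions imported from other modules
        total += 1
        doc = inspect.getdoc(obj) or ""
        # get_examples() returns one Example per ">>>" prompt in the docstring.
        if not parser.get_examples(doc):
            missing.append(name)
    return total - len(missing), total, missing
```

Calling it on an imported module gives a score similar in spirit to the "21 of 21" line shown above, plus the names of the functions that still lack an example. Methods on classes are not visited in this sketch.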
From biopython at maubp.freeserve.co.uk Fri Nov 14 20:29:16 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 14 Nov 2008 20:29:16 +0000 Subject: [Biopython-dev] coverage of function testing In-Reply-To: References: Message-ID: <320fb6e00811141229j3aa3a7b6ra3a064842e8f007c@mail.gmail.com> On Fri, Nov 14, 2008 at 7:48 PM, Marshall Hampton wrote: > Hi, > > I noticed some discussion of the coverage and automation of testing for > functions in biopython, and thought I would suggest folks check out the > testing and coverage tools in Sage (www.sagemath.org). Testing of functions > in Sage is done by testing examples in their docstrings - there are comments > to opt out of testing or to indicate if they will take a long time. They > also have scripts for checking which functions have at least one such > testable example. So you can do something like this: > > sage -coverage PATH_TO_SAGE/sage/geometry/polyhedra.py > > and get > > SCORE > /Volumes/D/sage-3.2.alpha0/devel/sage-main/sage/geometry/polyhedra.py: > 100% (21 of 21) > > to see if anything is untested. That may be worth a go, but there are two sides to this: (1) Making a list of the code that needs testing (pretty much the same for any python library) (2) Working out what is already tested (and here, that means going over Biopython's test framework which is based on unit test, but also includes some use of doctests). This is probably trickier... > Now that biopython is converting to numpy, I will start arguing for its > inclusion as a standard part of Sage (right now it is an optional package). That sounds good - but I have no knowledge of the Sage system and how they divide things up. 
Peter From bugzilla-daemon at portal.open-bio.org Fri Nov 14 23:15:57 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 14 Nov 2008 18:15:57 -0500 Subject: [Biopython-dev] [Bug 2662] Typo in tutorial "Chapter 3 Sequence objects " In-Reply-To: Message-ID: <200811142315.mAENFvNc000930@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2662 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-14 18:15 EST ------- (In reply to comment #0) > Sentence reads: > > "First of all the Seq object has a slightly different set of METHODS TO A > PLAIN python string (for example, reverse_complement() and translate() > methods used for nucleotide sequences)." There's nothing wrong with that (and I got a second opinion on this too). The only thing I think that might need changing is adding a comma: "First of all, the Seq object...". > Should be: > "methods THAN a plain python string" Why exactly? Are you an American? ;) There is also the possible option of "... different ... from ...", but that doesn't flow as nicely here. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Nov 14 23:47:16 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 14 Nov 2008 18:47:16 -0500 Subject: [Biopython-dev] [Bug 2657] Improved Bio/Statistics/lowess.py In-Reply-To: Message-ID: <200811142347.mAENlG5D003824@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2657 ------- Comment #9 from eric.pruitt at gmail.com 2008-11-14 18:47 EST ------- Created an attachment (id=1059) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1059&action=view) Test for speed comparison I wrote a short program to compare the speed of the original lowess function to my version. 
I thought the way the unit test was written might have affected the results. On my system, the new version averaged 15 seconds per test as opposed to 19 for the old one, so the boost is not what I originally claimed but closer to 27%. I am posting the program so someone else can compare. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sat Nov 15 02:06:49 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 14 Nov 2008 21:06:49 -0500 Subject: [Biopython-dev] [Bug 2658] 1.49b version of PDB Neighborsearch still based on Numeric In-Reply-To: Message-ID: <200811150206.mAF26nhu013792@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2658 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #2 from mdehoon at ims.u-tokyo.ac.jp 2008-11-14 21:06 EST ------- Fixed in CVS; see Bio/PDB/NeighborSearch.py revision 1.21. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sat Nov 15 03:59:22 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 14 Nov 2008 22:59:22 -0500 Subject: [Biopython-dev] [Bug 2609] Gcc 4.3.2 'initialization from incompatible pointer type' warning with triemodule.c In-Reply-To: Message-ID: <200811150359.mAF3xM8D020801@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2609 ------- Comment #1 from mdehoon at ims.u-tokyo.ac.jp 2008-11-14 22:59 EST ------- This warning is due to the introduction of Py_ssize_t in Python 2.5.
The best solution for this bug depends on which Python versions will be supported by Biopython. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sat Nov 15 04:04:00 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 14 Nov 2008 23:04:00 -0500 Subject: [Biopython-dev] [Bug 2657] Improved Bio/Statistics/lowess.py In-Reply-To: Message-ID: <200811150404.mAF4403S021350@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2657 ------- Comment #10 from mdehoon at ims.u-tokyo.ac.jp 2008-11-14 23:04 EST ------- A few comments: 1) Is there a reason to use numpy.abs instead of Python's built-in abs? Timing these two functions suggests that they are equally fast. 2) I have no objection against James' suggestion to speed up the code. The original call to numpy.linalg.solve was probably overkill. 3) Can you submit a unit test that does not use scipy and rpy? We should avoid adding additional dependencies to Biopython. 4) In the long run, I am not sure whether Biopython is the right place for the lowess function. Probably NumPy or Matplotlib would be better. (that shouldn't stop us from improving the code here, though). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
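The timings reported in comment #9 of Bug 2657 (15 seconds versus 19 seconds per test) can be read in two ways, and the quoted 27% corresponds to only one of them. The arithmetic is sketched below; the helper names are made up for illustration.

```python
def speedup_percent(old_seconds, new_seconds):
    """How much longer the old code takes, relative to the new run time."""
    return (old_seconds / new_seconds - 1.0) * 100.0


def time_saved_percent(old_seconds, new_seconds):
    """How much of the old run time the new code saves."""
    return (old_seconds - new_seconds) / old_seconds * 100.0


print(round(speedup_percent(19.0, 15.0), 1))     # 26.7
print(round(time_saved_percent(19.0, 15.0), 1))  # 21.1
```

So "closer to 27%" matches the first reading (the old version takes about 27% longer than the new one), while the run time itself drops by about 21%.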
From bugzilla-daemon at portal.open-bio.org Sat Nov 15 07:16:11 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 15 Nov 2008 02:16:11 -0500 Subject: [Biopython-dev] [Bug 2609] Gcc 4.3.2 'initialization from incompatible pointer type' warning with triemodule.c In-Reply-To: Message-ID: <200811150716.mAF7GB1r002223@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2609 ------- Comment #2 from mdehoon at ims.u-tokyo.ac.jp 2008-11-15 02:16 EST ------- I have uploaded a fixed version to CVS. Could you try it? Bio/triemodule.c, revision 1.7. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sat Nov 15 16:29:53 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 15 Nov 2008 11:29:53 -0500 Subject: [Biopython-dev] [Bug 2657] Improved Bio/Statistics/lowess.py In-Reply-To: Message-ID: <200811151629.mAFGTrgj008598@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2657 ------- Comment #11 from eric.pruitt at gmail.com 2008-11-15 11:29 EST ------- (In reply to comment #10) > A few comments: > > 1) Is there a reason to use numpy.abs instead of Python's built-in abs? Timing > these two functions suggests that they are equally fast. > 2) I have no objection against James' suggestion to speed up the code. The > original call to numpy.linalg.solve was probably overkill. > 3) Can you submit a unit test that does not use scipy and rpy? We should avoid > adding additional dependencies to Biopython. > 4) In the long run, I am not sure whether Biopython is the right place for the > lowess function. Probably NumPy or Matplotlib would be better. (that shouldn't > stop us from improving the code here, though).
> Yes, I only had the scipy and rpy dependencies in my unit test because I wanted something to compare your function against when I first used it in my code, and to make sure it still worked after I made changes to it. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sat Nov 15 17:07:36 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 15 Nov 2008 12:07:36 -0500 Subject: [Biopython-dev] [Bug 2657] Improved Bio/Statistics/lowess.py In-Reply-To: Message-ID: <200811151707.mAFH7aZM010885@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2657 eric.pruitt at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1057 is|0 |1 obsolete| | ------- Comment #12 from eric.pruitt at gmail.com 2008-11-15 12:07 EST ------- Created an attachment (id=1060) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1060&action=view) Updated lowess.py Renamed "theta" to a more logical name, "weighted_mul_x." Replaced numpy.abs with the regular abs function (this actually led to a very slight, but still noticeable, speed increase). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Sat Nov 15 17:08:15 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 15 Nov 2008 12:08:15 -0500 Subject: [Biopython-dev] [Bug 2657] Improved Bio/Statistics/lowess.py In-Reply-To: Message-ID: <200811151708.mAFH8F6n010936@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2657 eric.pruitt at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1058 is|0 |1 obsolete| | ------- Comment #13 from eric.pruitt at gmail.com 2008-11-15 12:08 EST ------- Created an attachment (id=1061) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1061&action=view) Unit test for lowess.py removing scipy and rpy dependencies -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Nov 17 08:36:32 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 17 Nov 2008 03:36:32 -0500 Subject: [Biopython-dev] [Bug 2657] Improved Bio/Statistics/lowess.py In-Reply-To: Message-ID: <200811170836.mAH8aWoY027949@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2657 ------- Comment #14 from mdehoon at ims.u-tokyo.ac.jp 2008-11-17 03:36 EST ------- I have uploaded the new code and the unit test with some modifications to CVS. Could you have a look at it to see if you're happy with the result? I am using numpy.dot(x,y) instead of sum(x*y) wherever possible; this gave an additional speedup. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
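The numpy.dot(x,y) versus sum(x*y) change mentioned in comment #14 can be tried out with a small timing sketch like the one below. This is not the code in lowess.py; the function names and array sizes are assumptions chosen for illustration, and the actual timings will depend on the machine and the NumPy version.

```python
import timeit

import numpy


def weighted_sum_builtin(x, y):
    # The builtin sum() walks over the product array one element at a time.
    return sum(x * y)


def weighted_sum_dot(x, y):
    # numpy.dot computes the same inner product in compiled code.
    return numpy.dot(x, y)


x = numpy.linspace(0.0, 1.0, 10000)
y = numpy.linspace(1.0, 2.0, 10000)

# Both forms agree up to floating-point rounding.
assert numpy.allclose(weighted_sum_builtin(x, y), weighted_sum_dot(x, y))

for func in (weighted_sum_builtin, weighted_sum_dot):
    seconds = timeit.timeit(lambda: func(x, y), number=100)
    print(func.__name__, seconds)
```

The same pattern (wrap each variant in a function, check that the results agree, then time both) also works for comparing abs with numpy.abs on the values used inside the fit.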
From bugzilla-daemon at portal.open-bio.org Mon Nov 17 10:33:37 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 17 Nov 2008 05:33:37 -0500 Subject: [Biopython-dev] [Bug 2609] Gcc 4.3.2 'initialization from incompatible pointer type' warning with triemodule.c In-Reply-To: Message-ID: <200811171033.mAHAXbbS003922@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2609 ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-17 05:33 EST ------- I haven't tried this on Linux yet. =================================== I've just updated to CVS and rebuilt on Windows with mingw32 (gcc 3.4.4 cygming special), using Python 2.3, 2.4, 2.5 and 2.6 - no warnings from the Bio.Trie code. I should have checked for any warnings BEFORE updating to CVS, but didn't. =================================== However, on Mac OS X 10.5 "Leopard" with I now get a lot of pointer warnings: building 'Bio.trie' extension creating build/temp.macosx-10.3-i386-2.5 creating build/temp.macosx-10.3-i386-2.5/Bio gcc -arch ppc -arch i386 -isysroot /Developer/SDKs/MacOSX10.4u.sdk -fno-strict-aliasing -Wno-long-double -no-cpp-precomp -mno-fused-madd -fno-common -dynamic -DNDEBUG -g -O3 -IBio -I/Library/Frameworks/Python.framework/Versions/2.5/include/python2.5 -c Bio/triemodule.c -o build/temp.macosx-10.3-i386-2.5/Bio/triemodule.o Bio/triemodule.c: In function '_write_value_to_handle': Bio/triemodule.c:498: warning: passing argument 3 of 'PyString_AsStringAndSize' from incompatible pointer type Bio/triemodule.c: In function '_write_value_to_handle': Bio/triemodule.c:498: warning: passing argument 3 of 'PyString_AsStringAndSize'
from incompatible pointer type gcc -arch ppc -arch i386 -isysroot /Developer/SDKs/MacOSX10.4u.sdk -fno-strict-aliasing -Wno-long-double -no-cpp-precomp -mno-fused-madd -fno-common -dynamic -DNDEBUG -g -O3 -IBio -I/Library/Frameworks/Python.framework/Versions/2.5/include/python2.5 -c Bio/trie.c -o build/temp.macosx-10.3-i386-2.5/Bio/trie.o Bio/trie.c: In function 'Trie_set': Bio/trie.c:103: warning: pointer targets in passing argument 1 of 'strdup' differ in signedness Bio/trie.c:156: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness Bio/trie.c:162: warning: pointer targets in passing argument 1 of 'strncpy' differ in signedness Bio/trie.c:162: warning: pointer targets in passing argument 2 of 'strncpy' differ in signedness Bio/trie.c:164: warning: pointer targets in passing argument 1 of 'strdup' differ in signedness Bio/trie.c: In function 'Trie_set': Bio/trie.c:103: warning: pointer targets in passing argument 1 of 'strdup' differ in signedness Bio/trie.c:156: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness Bio/trie.c:162: warning: pointer targets in passing argument 1 of 'strncpy' differ in signedness Bio/trie.c:162: warning: pointer targets in passing argument 2 of 'strncpy' differ in signedness Bio/trie.c:164: warning: pointer targets in passing argument 1 of 'strdup' differ in signedness Bio/trie.c: In function 'Trie_get': Bio/trie.c:229: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness Bio/trie.c:229: warning: pointer targets in passing argument 1 of 'strncmp' differ in signedness Bio/trie.c:229: warning: pointer targets in passing argument 2 of 'strncmp' differ in signedness Bio/trie.c:235: warning: pointer targets in passing argument 1 of 'strlen'
differ in signedness Bio/trie.c: In function '_get_approximate_transition': Bio/trie.c:268: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness Bio/trie.c:272: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness Bio/trie.c:272: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness Bio/trie.c:284: warning: pointer targets in passing argument 1 of 'strncat' differ in signedness Bio/trie.c:284: warning: pointer targets in passing argument 2 of 'strncat' differ in signedness Bio/trie.c: In function 'Trie_get': Bio/trie.c:229: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness Bio/trie.c:229: warning: pointer targets in passing argument 1 of 'strncmp' differ in signedness Bio/trie.c:229: warning: pointer targets in passing argument 2 of 'strncmp' differ in signedness Bio/trie.c:235: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness Bio/trie.c: In function '_get_approximate_transition': Bio/trie.c:268: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness Bio/trie.c:272: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness Bio/trie.c:272: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness Bio/trie.c: In function '_get_approximate_trie': Bio/trie.c:353: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness Bio/trie.c:355: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness Bio/trie.c:284: warning: pointer targets in passing argument 1 of 'strncat' differ in signedness Bio/trie.c:356: warning: pointer targets in passing argument 1 of 'strcat' differ in signedness Bio/trie.c:284: warning: pointer targets in passing argument 2 of 'strncat'
differ in signedness Bio/trie.c:356: warning: pointer targets in passing argument 2 of 'strcat' differ in signedness Bio/trie.c:367: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness Bio/trie.c:369: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness Bio/trie.c: In function '_get_approximate_trie': Bio/trie.c:353: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness Bio/trie.c:355: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness Bio/trie.c: In function 'Trie_has_prefix': Bio/trie.c:356: warning: pointer targets in passing argument 1 of 'strcat' differ in signedness Bio/trie.c:440: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness Bio/trie.c:356: warning: pointer targets in passing argument 2 of 'strcat' differ in signedness Bio/trie.c:441: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness Bio/trie.c:443: warning: pointer targets in passing argument 1 of 'strncmp' differ in signedness Bio/trie.c:443: warning: pointer targets in passing argument 2 of 'strncmp' differ in signedness Bio/trie.c:367: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness Bio/trie.c:369: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness Bio/trie.c: In function '_iterate_helper': Bio/trie.c:468: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness Bio/trie.c:470: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness Bio/trie.c:475: warning: pointer targets in passing argument 1 of 'strcat' differ in signedness Bio/trie.c:475: warning: pointer targets in passing argument 2 of 'strcat' differ in signedness Bio/trie.c: In function 'Trie_has_prefix': Bio/trie.c:440: warning: pointer targets in passing argument 1 of 'strlen'
differ in signedness Bio/trie.c:441: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness Bio/trie.c: In function '_with_prefix_helper': Bio/trie.c:521: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness Bio/trie.c:443: warning: pointer targets in passing argument 1 of 'strncmp' differ in signedness Bio/trie.c:522: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness Bio/trie.c:443: warning: pointer targets in passing argument 2 of 'strncmp' differ in signedness Bio/trie.c:524: warning: pointer targets in passing argument 1 of 'strncmp' differ in signedness Bio/trie.c:524: warning: pointer targets in passing argument 2 of 'strncmp' differ in signedness Bio/trie.c:530: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness Bio/trie.c:536: warning: pointer targets in passing argument 1 of 'strncat' differ in signedness Bio/trie.c:536: warning: pointer targets in passing argument 2 of 'strncat' differ in signedness Bio/trie.c: In function '_iterate_helper': Bio/trie.c:468: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness Bio/trie.c:470: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness Bio/trie.c:475: warning: pointer targets in passing argument 1 of 'strcat' differ in signedness Bio/trie.c:475: warning: pointer targets in passing argument 2 of 'strcat' differ in signedness Bio/trie.c: In function '_with_prefix_helper': Bio/trie.c:521: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness Bio/trie.c:522: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness Bio/trie.c: In function '_serialize_transition': Bio/trie.c:524: warning: pointer targets in passing argument 1 of 'strncmp'
differ in signedness Bio/trie.c:621: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness Bio/trie.c:524: warning: pointer targets in passing argument 2 of 'strncmp' differ in signedness Bio/trie.c:530: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness Bio/trie.c:536: warning: pointer targets in passing argument 1 of 'strncat' differ in signedness Bio/trie.c:536: warning: pointer targets in passing argument 2 of 'strncat' differ in signedness Bio/trie.c: In function '_serialize_transition': Bio/trie.c:621: warning: pointer targets in passing argument 1 of 'strlen' differ in signedness Bio/trie.c: In function '_deserialize_transition': Bio/trie.c:708: warning: pointer targets in passing argument 1 of 'strdup' differ in signedness Bio/trie.c: In function 'test': Bio/trie.c:752: warning: pointer targets in passing argument 2 of 'Trie_set' differ in signedness Bio/trie.c:753: warning: pointer targets in passing argument 2 of 'Trie_set' differ in signedness Bio/trie.c:754: warning: pointer targets in passing argument 2 of 'Trie_set' differ in signedness Bio/trie.c:755: warning: pointer targets in passing argument 2 of 'Trie_set' differ in signedness Bio/trie.c:757: warning: pointer targets in passing argument 2 of 'Trie_get' differ in signedness Bio/trie.c:758: warning: pointer targets in passing argument 2 of 'Trie_get' differ in signedness Bio/trie.c:759: warning: pointer targets in passing argument 2 of 'Trie_get' differ in signedness Bio/trie.c: In function '_deserialize_transition': Bio/trie.c:708: warning: pointer targets in passing argument 1 of 'strdup' differ in signedness Bio/trie.c:760: warning: pointer targets in passing argument 2 of 'Trie_get' differ in signedness Bio/trie.c:762: warning: pointer targets in passing argument 2 of 'Trie_set'
differ in signedness Bio/trie.c:763: warning: pointer targets in passing argument 2 of 'Trie_get' differ in signedness Bio/trie.c:765: warning: pointer targets in passing argument 2 of 'Trie_get' differ in signedness Bio/trie.c:768: warning: pointer targets in passing argument 2 of 'Trie_set' differ in signedness Bio/trie.c:769: warning: pointer targets in passing argument 2 of 'Trie_get' differ in signedness Bio/trie.c: In function 'test': Bio/trie.c:752: warning: pointer targets in passing argument 2 of 'Trie_set' differ in signedness Bio/trie.c:753: warning: pointer targets in passing argument 2 of 'Trie_set' differ in signedness Bio/trie.c:754: warning: pointer targets in passing argument 2 of 'Trie_set' differ in signedness Bio/trie.c:755: warning: pointer targets in passing argument 2 of 'Trie_set' differ in signedness Bio/trie.c:757: warning: pointer targets in passing argument 2 of 'Trie_get' differ in signedness Bio/trie.c:758: warning: pointer targets in passing argument 2 of 'Trie_get' differ in signedness Bio/trie.c:759: warning: pointer targets in passing argument 2 of 'Trie_get' differ in signedness Bio/trie.c:760: warning: pointer targets in passing argument 2 of 'Trie_get' differ in signedness Bio/trie.c:762: warning: pointer targets in passing argument 2 of 'Trie_set' differ in signedness Bio/trie.c:763: warning: pointer targets in passing argument 2 of 'Trie_get' differ in signedness Bio/trie.c:765: warning: pointer targets in passing argument 2 of 'Trie_get' differ in signedness Bio/trie.c:768: warning: pointer targets in passing argument 2 of 'Trie_set' differ in signedness Bio/trie.c:769: warning: pointer targets in passing argument 2 of 'Trie_get'
differ in signedness gcc -arch i386 -arch ppc -isysroot /Developer/SDKs/MacOSX10.4u.sdk -g -bundle -undefined dynamic_lookup build/temp.macosx-10.3-i386-2.5/Bio/triemodule.o build/temp.macosx-10.3-i386-2.5/Bio/trie.o -o build/lib.macosx-10.3-i386-2.5/Bio/trie.so $ python Python 2.5.2 (r252:60911, Feb 22 2008, 07:57:53) [GCC 4.0.1 (Apple Computer, Inc. build 5363)] on darwin Type "help", "copyright", "credits" or "license" for more information. $ gcc -v Using built-in specs. Target: i686-apple-darwin9 Configured with: /var/tmp/gcc/gcc-5465~16/src/configure --disable-checking -enable-werror --prefix=/usr --mandir=/share/man --enable-languages=c,objc,c++,obj-c++ --program-transform-name=/^[cg][^.-]*$/s/$/-4.0/ --with-gxx-include-dir=/include/c++/4.0.0 --with-slibdir=/usr/lib --build=i686-apple-darwin9 --with-arch=apple --with-tune=generic --host=i686-apple-darwin9 --target=i686-apple-darwin9 Thread model: posix gcc version 4.0.1 (Apple Inc. build 5465) Note that this gcc is only 4.0.1, while Bruce reported this bug on 4.3.2. The good news is test_trie.py and test_triefind.py still pass. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Mon Nov 17 10:41:35 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 17 Nov 2008 05:41:35 -0500
Subject: [Biopython-dev] [Bug 2666] New: Bio.PDB.NeighborSearch self test often fails with MemoryError
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2666

           Summary: Bio.PDB.NeighborSearch self test often fails with MemoryError
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Mac OS
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk

From the Biopython source code (from CVS), in the Bio/PDB folder, running NeighborSearch.py does a quick self test. This is a random test, and sometimes it runs fine:

$ python NeighborSearch.py
Found 1
Found 4
Found 3
Found 2
Found 2
Found 2
Found 3
Found 3
Found 1
Found 5
Found 2
Found 3
Found 2
Found 2
Found 2
Found 6
Found 3
Found 2
Found 3
Found 1

However, about 50% of the time I get something like this:

$ python NeighborSearch.py
Found 2
Found 1
Found 2
Found 1
Found 1
Found 1
Found 4
Found
Traceback (most recent call last):
  File "NeighborSearch.py", line 139, in <module>
    print "Found ", len(ns.search_all(5.0))
  File "NeighborSearch.py", line 104, in search_all
    self.kdt.all_search(radius)
  File "/Users/pjcock/repositories/biopython/build/lib.macosx-10.3-i386-2.5/Bio/KDTree/KDTree.py", line 198, in all_search
    self.neighbors = self.kdt.neighbor_search(radius)
MemoryError: calculation failed due to lack of memory

I've tried this on a Mac which had over 4GB of RAM free at the time, so I don't believe this really is a MemoryError. I've also tried this on a less powerful Windows machine, which fails in the same way (it can finish the test, but possibly with a lower success rate).

[As an aside, I'm planning to use this self test to create an actual Biopython unit test for the Bio.PDB.NeighborSearch module.]

From bugzilla-daemon at portal.open-bio.org Mon Nov 17 11:42:24 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 17 Nov 2008 06:42:24 -0500
Subject: [Biopython-dev] [Bug 2666] Bio.PDB.NeighborSearch self test often fails with KDTree MemoryError
In-Reply-To:
Message-ID: <200811171142.mAHBgOD9008929@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2666

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Bio.PDB.NeighborSearch self |Bio.PDB.NeighborSearch self
                   |test often fails with       |test often fails with KDTree
                   |MemoryError                 |MemoryError

------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-17 06:42 EST -------
I suspect this is failing when there are NO entries found within the specified radius. Changing this line:

print "Found ", len(ns.search_all(5.0))

to use a larger search radius seems to "fix" the test, e.g.

print "Found ", len(ns.search_all(10.0))

Similarly, dropping it to radius 2.0 makes it fail almost every time. Judging from the traceback, I suspect something is amiss in the KDTree C code.
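
The diagnosis in comment #1 (a failure whenever no pair of points lies within the search radius) can be illustrated with a minimal brute-force sketch in plain Python. This is not the Bio.KDTree implementation; the helper name `all_pairs_within` and the sample coordinates are made up for illustration. The only point is that "nothing found" is a legitimate answer and should come back as an empty list rather than an error.

```python
import math
from itertools import combinations


def all_pairs_within(points, radius):
    """Return index pairs (i, j), i < j, whose points lie within `radius`.

    Hypothetical brute-force helper, O(n^2); not the Bio.KDTree algorithm.
    """
    pairs = []
    for (i, p), (j, q) in combinations(enumerate(points), 2):
        # Euclidean distance between the two coordinate tuples
        distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
        if distance <= radius:
            pairs.append((i, j))
    return pairs


# Made-up coordinates: two points 3 units apart, one far away.
points = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 40.0, 0.0)]
print("Found", len(all_pairs_within(points, 5.0)))  # Found 1
print("Found", len(all_pairs_within(points, 2.0)))  # Found 0 (empty list, no error)
```

A regression test for this bug would presumably want a case like the second call, where the radius is small enough that the result is empty.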

From bugzilla-daemon at portal.open-bio.org Mon Nov 17 11:44:45 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 17 Nov 2008 06:44:45 -0500
Subject: [Biopython-dev] [Bug 2609] Gcc 4.3.2 'initialization from incompatible pointer type' warning with triemodule.c
In-Reply-To:
Message-ID: <200811171144.mAHBijrj009171@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2609

------- Comment #4 from mdehoon at ims.u-tokyo.ac.jp 2008-11-17 06:44 EST -------
(In reply to comment #3)

Yes I know; that is bug #2608.

From bugzilla-daemon at portal.open-bio.org Mon Nov 17 12:09:15 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 17 Nov 2008 07:09:15 -0500
Subject: [Biopython-dev] [Bug 2666] Bio.PDB.NeighborSearch self test often fails with KDTree MemoryError
In-Reply-To:
Message-ID: <200811171209.mAHC9FUF010799@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2666

------- Comment #2 from mdehoon at ims.u-tokyo.ac.jp 2008-11-17 07:09 EST -------
I fixed Bio.KDTree and committed it to CVS; please give it a try.

From bugzilla-daemon at portal.open-bio.org Mon Nov 17 12:14:19 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 17 Nov 2008 07:14:19 -0500
Subject: [Biopython-dev] [Bug 2609] Gcc 4.3.2 'initialization from incompatible pointer type' warning with triemodule.c
In-Reply-To:
Message-ID: <200811171214.mAHCEJa0011060@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2609

------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-17 07:14 EST -------
(In reply to comment #4)
> (In reply to comment #3)
> Yes I know; that is bug #2608.
>

Oh. Sorry - I had seen Bug 2608 but hadn't made the connection.

I've just confirmed Linux with gcc 4.1.2 is still happy. Over to Bruce to test with gcc 4.3.2 then...

From bugzilla-daemon at portal.open-bio.org Mon Nov 17 12:25:21 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 17 Nov 2008 07:25:21 -0500
Subject: [Biopython-dev] [Bug 2666] Bio.PDB.NeighborSearch self test often fails with KDTree MemoryError
In-Reply-To:
Message-ID: <200811171225.mAHCPLmC011729@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2666

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-17 07:25 EST -------
That's fixed it - thanks!

I've also updated test_PDB.py to include a quick test of this code, based on the Bio/PDB/NeighborSearch.py self test code.

From tiagoantao at gmail.com Mon Nov 17 13:27:51 2008
From: tiagoantao at gmail.com (Tiago Antão)
Date: Mon, 17 Nov 2008 13:27:51 +0000
Subject: [Biopython-dev] PopGen.Stats
Message-ID: <6d941f120811170527g752c28a7j48b42569c947853d@mail.gmail.com>

After too much thinking and too much delaying (delaying in two distinct senses: delaying this proposal, and putting off work on the module for more than a year), here is my proposal on how to proceed.

First, a few fundamental points to keep in mind:

1. Statistics is the core of population genetics. Bio.PopGen will never be relevant without it.
2. The framework should be future proof.
3. The API should be for general use (i.e. not only based on the cases developers know of).
4. It is very difficult to have a broad view of how an API like this can be used (uses vary from population genetics of cancer with microarrays and lots of data, to conservation genetics of species with few samples and a small number of loci).

A waterfall approach to development is not only outdated, it would also be quite counterproductive. So I have no bureaucratic design document to provide.

My proposal is to choose a bunch of statistics and tests that are representative of what people might use and implement them. During the implementation, a reasonable API should take form through refactoring.

What statistics should be chosen then? What are representative statistics? I was able to put together a list of classifications to start with. This list got some inspiration from the very good Arlequin manual. Here are the different dimensions that I found:

1. Intra-population versus inter-population statistics. Say, expected heterozygosity versus Fst.
2. Marker dependent versus marker independent. Say, allelic range (for microsatellites only) versus Fis.
3. Data type: haplotypic, genotypic phase unknown, genotypic phase known, genotypic dominant, frequency only. Say, for expected heterozygosity frequencies are enough; for observed heterozygosity genotypic phase unknown data is necessary.
4. Single locus (e.g. allelic richness, ExpHe, Fst) versus multi-loci (e.g. number of polymorphic sites, LD or EHH).
5. Temporal/longitudinal versus single point in time. Say, temporal-Fst versus Fst.
6. Population versus landscape. I suggest abandoning this issue for now.

So, the idea is to choose a set of statistics that elucidate these points; with a good subset we will have a feeling for how everything fits together. We implement them and then iterate until the API "feels good".

A suggestion of statistics:

ExpHz                  non-temporal, intra, single-locus, marker independent, genotypic - gametic unk
ObsHz                  non-temporal, intra, single-locus, independent, genotypic - gametic kn
Fst(CW)                non-temporal, inter, single-locus, indep, genotypic - gametic unk
temporal-Fst           temporal, intra, single-locus, indep, genotypic - gametic unk
LD(D')                 non-temporal, intra, multi-locus, indep, haplo/geno
Fk                     temporal, intra, single-locus, indep, geno
S (polymorphic sites)  non-temporal, intra, multi-locus, indep, haplo/geno
Allelic range          nt, intra, single-locus, microsat, haplo/geno
EHH                    nt, positional
Tajima D               nt, intra, single-locus, sequence/rflp

There is still the issue of tests (say, Hardy-Weinberg deviation), but that can be thought about while the rest is being done.

The good news is that half of the above is already implemented (the exceptions are allelic range, S, Tajima D and EHH, listed in increasing order of implementation difficulty).

I propose implementing the remaining ones (I can do that, unless anyone else wants to give it a try) and then iterating on the API until there is rough agreement. This can be done on GIT (BTW, my username there is tiagoantao). I propose that the ability to influence policy be roughly proportional to the time spent coding/effort done ;) .
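
The data-type dimension above (frequencies are enough for expected heterozygosity, while observed heterozygosity needs genotypes) can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical and are not part of Bio.PopGen, genotypes are assumed diploid, and the expected heterozygosity uses the simple form 1 - sum(p_i^2) with no sample-size correction.

```python
from collections import Counter


def expected_heterozygosity(allele_freqs):
    """Expected heterozygosity, 1 - sum(p_i^2), from allele frequencies alone."""
    return 1.0 - sum(p * p for p in allele_freqs.values())


def observed_heterozygosity(genotypes):
    """Fraction of diploid genotypes (allele pairs) that are heterozygous."""
    heterozygotes = sum(1 for a, b in genotypes if a != b)
    return heterozygotes / len(genotypes)


def allele_frequencies(genotypes):
    """Allele frequencies counted from diploid genotypes (phase not needed)."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


# Made-up single-locus sample of four diploid individuals.
genotypes = [("A", "a"), ("A", "A"), ("a", "a"), ("A", "a")]
freqs = allele_frequencies(genotypes)       # A: 0.5, a: 0.5
print(expected_heterozygosity(freqs))       # 0.5
print(observed_heterozygosity(genotypes))   # 0.5
```

Note that `expected_heterozygosity` never sees a genotype, which is the sense in which a frequency-only data type would suffice for it, whereas `observed_heterozygosity` cannot be computed from `freqs`.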

PS - I am assuming a sequence is a single locus in my reasoning. Of course it can be seen (and sometimes is) as a sequence of loci (SNPs).

From bugzilla-daemon at portal.open-bio.org Mon Nov 17 18:29:08 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 17 Nov 2008 13:29:08 -0500
Subject: [Biopython-dev] [Bug 2609] Gcc 4.3.2 'initialization from incompatible pointer type' warning with triemodule.c
In-Reply-To:
Message-ID: <200811171829.mAHIT8u9006711@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2609

------- Comment #6 from bsouthey at gmail.com 2008-11-17 13:29 EST -------
> Over to Bruce to test with gcc 4.3.2 then...
>

Still the same warning for Python 2.5 and 2.6:

Bio/triemodule.c: In function '_write_value_to_handle':
Bio/triemodule.c:498: warning: passing argument 3 of 'PyString_AsStringAndSize' from incompatible pointer type

See PEP 353 (http://www.python.org/dev/peps/pep-0353/), which suggests including:

#if PY_VERSION_HEX < 0x02050000 && !defined(PY_SSIZE_T_MIN)
typedef int Py_ssize_t;
#define PY_SSIZE_T_MAX INT_MAX
#define PY_SSIZE_T_MIN INT_MIN
#endif

I did not get the warning after I added it to Bio/trie.h (as I thought that this would be the appropriate location for it) and changed the declaration of length in _write_value_to_handle to:

Py_ssize_t length;

But while this is fine for Python 2.3 and Python 2.4, I get this error with Python 2.5 and Python 2.6:

[snip]
test_trie ... ERROR
test_triefind ... ok
======================================================================
ERROR: test_trie
----------------------------------------------------------------------
Traceback (most recent call last):
  File "run_tests.py", line 125, in runTest
    self.runSafeTest()
  File "run_tests.py", line 138, in runSafeTest
    cur_test = __import__(self.test_name)
  File "test_trie.py", line 87, in <module>
    trieobj3 = trie.load(h)
ValueError: bad marshal data

From bsouthey at gmail.com Mon Nov 17 18:35:05 2008
From: bsouthey at gmail.com (Bruce Southey)
Date: Mon, 17 Nov 2008 12:35:05 -0600
Subject: [Biopython-dev] test_GASelection hangs
In-Reply-To:
References:
Message-ID: <4921B959.2080706@gmail.com>

Hi,
I was just running the tests on a very fresh CVS checkout, and under Python 2.3 the run hung in test_GASelection. Of course, there was no problem after killing it and rerunning the test. I think this also pertains to bug 2651, so I thought I would ask if there was a way to examine this further before doing anything else. I understand that this is a problem involving randomization, but it does indicate that a more subtle problem is present. I would really like to track down the source of the problem.

Does anyone have any ideas on how I could try to examine this further?

Thanks
Bruce

From biopython at maubp.freeserve.co.uk Mon Nov 17 18:50:14 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 17 Nov 2008 18:50:14 +0000
Subject: [Biopython-dev] test_GASelection hangs
In-Reply-To: <4921B959.2080706@gmail.com>
References: <4921B959.2080706@gmail.com>
Message-ID: <320fb6e00811171050v541106d8n371d92f9b7f6c595@mail.gmail.com>

On Mon, Nov 17, 2008 at 6:35 PM, Bruce Southey wrote:
> Hi,
> I was just running the test under a very fresh cvs version and under
> Python2.3 the test was hanging with test_GASelection. Of course, there was
> no problem after killing it and rerunning the test. I think this also
> pertains to bug 2651 so I thought I would ask if there was a way to examine
> this further before doing anything else. I understand that this is problem
> with randomization involved, but it does indicate a more subtle problem is
> present. I would really like to track down the source of the problem.
>
> Does anyone have any ideas on how I could try to examine this further?

If you have installed CVS (or indeed any recent version of Biopython, as the GA stuff hasn't changed recently IIRC), then in the Tests directory you can just run:

$ python test_GASelection.py

You'll find sometimes it gets stuck. I tried modifying the file so that the end reads as follows:

if __name__ == "__main__":
    #sys.exit(run_tests(sys.argv))

    ALL_TESTS = [DiversitySelectionTest, TournamentSelectionTest,
                 RouletteWheelSelectionTest]

    runner = unittest.TextTestRunner(sys.stdout, verbosity = 2)
    test_loader = unittest.TestLoader()
    test_loader.testMethodPrefix = 't_'

    test=ALL_TESTS[1] #Edit me: 0, 1 or 2
    cur_suite = test_loader.loadTestsFromTestCase(test)
    count = 0
    while True :
        count += 1
        print "#"*50, count
        runner.run(cur_suite)

On my machine, DiversitySelectionTest and RouletteWheelSelectionTest seem safe - the tests just run and run until you interrupt them with ctrl+c.

However, this clearly gets stuck in TournamentSelectionTest - so we've narrowed this down a bit. Reading that bit of code, there is an apparent risk of an infinite loop if by chance org_1 happens to be the worst organism in the population. Perhaps we could add a simple counter to break out of the loop if after 1000 tries org_1 is still the worst, but I'm not sure what to do then.

Peter

From bugzilla-daemon at portal.open-bio.org Mon Nov 17 18:59:26 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 17 Nov 2008 13:59:26 -0500
Subject: [Biopython-dev] [Bug 2651] Error from test_GAQueens.py
In-Reply-To:
Message-ID: <200811171859.mAHIxQgZ009193@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2651

------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-17 13:59 EST -------
This is a quick hack to help pinpoint the problem. Assuming you have CVS or a recent version of Biopython installed, modify the end of test_GAQueens.py as follows:

if __name__ == "__main__":
    #sys.exit(main(sys.argv))
    count = 0
    while True :
        count +=1
        print "#"*50, count
        run_tests([])

This just repeats the test until it fails:

$ python test_GAQueens.py
...
################################################## 7
Calculating for 5 queens...
Generating an initial population of 1000 organisms...
Evolving the population and searching for a solution...
Traceback (most recent call last):
  File "test_GAQueens.py", line 405, in <module>
    run_tests([])
  File "test_GAQueens.py", line 42, in run_tests
    main(arguments)
  File "test_GAQueens.py", line 76, in main
    evolved_pop = evolver.evolve(queens_solved)
  File "/Users/xxx/Downloads/Software/biopython-1.49b/build/lib.macosx-10.3-i386-2.5/Bio/GA/Evolver.py", line 56, in evolve
    self._population = self._selector.select(self._population)
  File "/Users/xxx/Downloads/Software/biopython-1.49b/build/lib.macosx-10.3-i386-2.5/Bio/GA/Selection/Tournament.py", line 77, in select
    new_orgs[1])
  File "/Users/xxx/Downloads/Software/biopython-1.49b/build/lib.macosx-10.3-i386-2.5/Bio/GA/Selection/Abstract.py", line 53, in mutate_and_crossover
    final_org_1 = self._repairer.repair(final_org_1)
  File "test_GAQueens.py", line 234, in repair
    duplicated_items = self._get_duplicates(organism.genome)
  File "test_GAQueens.py", line 203, in _get_duplicates
    if genome.count(item) > 1:
  File "/Users/xxx/repositories/biopython/build/lib.macosx-10.3-i386-2.5/Bio/Seq.py", line 886, in count
    raise TypeError("expected a string, Seq or MutableSeq")
TypeError: expected a string, Seq or MutableSeq

i.e. the same traceback as in Bruce's original report (allowing for the update to the Seq object's count method), but easier to reproduce.
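
The "repeat until it fails" hack above can be made reproducible by giving every run its own seeded random generator, so that a failing run can be replayed. The sketch below is an illustration only: `repeat_until_failure` and `flaky_test` are hypothetical names, not part of Biopython or its test suite, and the toy test simply fails when it draws the genome "000".

```python
import random


def repeat_until_failure(test, max_runs=1000, base_seed=0):
    """Run test(rng) with seeds base_seed, base_seed + 1, ...

    Returns (seed, exception) for the first failing run, or None if every
    run passed. Hypothetical helper, not part of Biopython.
    """
    for offset in range(max_runs):
        seed = base_seed + offset
        rng = random.Random(seed)  # private generator, reproducible per seed
        try:
            test(rng)
        except Exception as exc:
            return seed, exc
    return None


def flaky_test(rng):
    """Toy randomized test: fails whenever the random genome is '000'."""
    genome = "".join(rng.choice("0123") for _ in range(3))
    if genome == "000":
        raise ValueError("drew the worst genome")


result = repeat_until_failure(flaky_test, max_runs=5000)
if result is not None:
    seed, exc = result
    print("failed with seed", seed, "->", repr(exc))
```

Re-running `flaky_test(random.Random(seed))` with the reported seed replays the same failure, which is the part the plain `while True` loop does not give you.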

From bugzilla-daemon at portal.open-bio.org Mon Nov 17 19:18:24 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 17 Nov 2008 14:18:24 -0500
Subject: [Biopython-dev] [Bug 2651] Error from test_GAQueens.py
In-Reply-To:
Message-ID: <200811171918.mAHJIO5t010436@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2651

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-17 14:18 EST -------
Solved with Tests/test_GAQueens.py revision 1.3 in CVS.

When test_GAQueens.py was written, a Seq object's count method would accept an integer argument. Since Biopython 1.45, or to be exact Bio/Seq.py CVS revision 1.20 (see Bug 2386), the Seq object's count method will not accept an integer argument. This wasn't deliberate, but is consistent with a Python string.

From bsouthey at gmail.com Mon Nov 17 20:03:54 2008
From: bsouthey at gmail.com (Bruce Southey)
Date: Mon, 17 Nov 2008 14:03:54 -0600
Subject: [Biopython-dev] test_GASelection hangs
In-Reply-To: <320fb6e00811171050v541106d8n371d92f9b7f6c595@mail.gmail.com>
References: <4921B959.2080706@gmail.com> <320fb6e00811171050v541106d8n371d92f9b7f6c595@mail.gmail.com>
Message-ID: <4921CE2A.3090606@gmail.com>

Peter wrote:
> On Mon, Nov 17, 2008 at 6:35 PM, Bruce Southey wrote:
>
>> Hi,
>> I was just running the test under a very fresh cvs version and under
>> Python2.3 the test was hanging with test_GASelection. Of course, there was
>> no problem after killing it and rerunning the test. I think this also
>> pertains to bug 2651 so I thought I would ask if there was a way to examine
>> this further before doing anything else. I understand that this is problem
>> with randomization involved, but it does indicate a more subtle problem is
>> present. I would really like to track down the source of the problem.
>>
>> Does anyone have any ideas on how I could try to examine this further?
>>
>
> If you have installed CVS (or indeed any recent version of Biopython,
> as the GA stuff hasn't changed recently IIRC), then in the Tests
> directory you can just run:
>
> $ python test_GASelection.py
>
> You'll find sometimes it gets stuck. I tried modifying the file so
> that the end reads as follows:
>
> if __name__ == "__main__":
>     #sys.exit(run_tests(sys.argv))
>
>     ALL_TESTS = [DiversitySelectionTest, TournamentSelectionTest,
>                  RouletteWheelSelectionTest]
>
>     runner = unittest.TextTestRunner(sys.stdout, verbosity = 2)
>     test_loader = unittest.TestLoader()
>     test_loader.testMethodPrefix = 't_'
>
>     test=ALL_TESTS[1] #Edit me: 0, 1 or 2
>     cur_suite = test_loader.loadTestsFromTestCase(test)
>     count = 0
>     while True :
>         count += 1
>         print "#"*50, count
>         runner.run(cur_suite)
>
> On my machine, DiversitySelectionTest and RouletteWheelSelectionTest
> seem safe - the tests just run and run until you interrupt them with
> ctrl+c.
>
> However, this clearly gets stuck in TournamentSelectionTest - so we've
> narrowed this down a bit. Reading that bit of code, there is an
> apparent risk of an infinite loop if by chance org_1 happens to be the
> worst organism in the population. Perhaps adding a simple counter to
> break out of the loop if after 1000 tries org_1 is still the worst -
> but I'm not sure what to do then.
>
> Peter
>
>

Hi,
I ran the test multiple times using a bash loop and I think I tracked down this specific problem to within the actual test code, specifically the function TournamentSelectionTest.t_select_best(). I think this is what Peter noticed.

This is how I understand things, which I hope is sufficiently correct.

The test simulates a genome that has 3 locations with the 4 bases coded as '0', '1', '2', and '3' for an 'organism'. (Note that the 3 locations are hard coded into the random_genome function.) The fitness of an organism is just the integer value of the coded genome, so the first position is hundreds, the second is tens and the last is ones.

In TournamentSelectionTest.t_select_best, a second organism is simulated that must have a lower fitness than the first. The problem comes when the simulated genome of the first organism is '000', because its fitness is zero. This creates an infinite loop because the condition:

if org_2.fitness < org_1.fitness:

will always be false, but it must eventually be true to break the loop. Given that there are only three locations (so only 4^3 = 64 possible genomes), this should happen rather frequently.

Is it sufficient to use the condition '<='?

Alternatively, is there some way to fix the genome of the first organism rather than using a random one? For example, instead of calling random_organism(), declare it as, say:

org_1=Organism('100', test_fitness)

Bruce

From biopython at maubp.freeserve.co.uk Mon Nov 17 21:49:02 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 17 Nov 2008 21:49:02 +0000
Subject: [Biopython-dev] test_GASelection hangs
In-Reply-To: <4921CE2A.3090606@gmail.com>
References: <4921B959.2080706@gmail.com> <320fb6e00811171050v541106d8n371d92f9b7f6c595@mail.gmail.com> <4921CE2A.3090606@gmail.com>
Message-ID: <320fb6e00811171349j3bb2757epa7e52e5e55ac0c95@mail.gmail.com>

Bruce wrote:
> Peter wrote:
>> However, this clearly gets stuck in TournamentSelectionTest - so we've
>> narrowed this down a bit. Reading that bit of code, there is an
>> apparent risk of an infinite loop if by chance org_1 happens to be the
>> worst organism in the population. Perhaps adding a simple counter to
>> break out of the loop if after 1000 tries org_1 is still the worst -
>> but I'm not sure what to do then.
>>
>> Peter
>
> Hi,
> I ran the test multiple times using a bash loop and I think I tracked down
> this specific problem to within the actual test code, specifically the
> function TournamentSelectionTest.t_select_best(). I think this what Peter
> noticed.

Yes, this was what I was describing.

> This is how I understand things which I hope is sufficient correct to
> understand it.
>
> The test simulates a genome that has 3 locations with the 4 bases coded
> as '0', '1', '2', and '3' for an 'organism'. (Note the 3 locations is hard
> coded into the random_genome function.) The calculation of fitness of an
> organism is just the integer of the coded values do the first position is
> hundreds, the second is tens and last is ones.
>
> In the TournamentSelectionTest.t_select_best, a second organism is simulated
> that must have a better fitness than the first. The problem comes is when
> the simulated genome of the first organism is '000' because the fitness is
> zero. This creates an infinite loop because the line :
> if org_2.fitness < org_1.fitness:
> will always to false but eventually this must be true to break the loop.
> Obviously this loop becomes infinite and, given that there are only three
> locations, it should be rather frequent.

Yes.

> Is it sufficient to use the condition '<='?

No, I don't think so. The point of the setup seems to be to look for a pair of organisms where one is measurably fitter than the other (and make sure the better one is indeed selected).

> Alternatively, is there someway to fix the genome of the first organism
> rather than a random one?
> For example, instead of the random_organism() declare it as say:
> org_1=Organism('100', test_fitness)

We could do something like:

#Choose anything except the worst organism, "000",
while True :
    org_1=random_organism()
    if test_fitness(org_1) > 0 : break

[Not tested yet]

This at least is more or less random.

Peter

From bugzilla-daemon at portal.open-bio.org Mon Nov 17 22:10:27 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 17 Nov 2008 17:10:27 -0500
Subject: [Biopython-dev] [Bug 2657] Improved Bio/Statistics/lowess.py
In-Reply-To:
Message-ID: <200811172210.mAHMARax021977@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2657

------- Comment #15 from eric.pruitt at gmail.com 2008-11-17 17:10 EST -------
(In reply to comment #14)
> I have uploaded the new code and the unit test with some modifications to CVS.
> Could you have a look at it to see if you're happy with the result? I am using
> numpy.dot(x,y) instead of sum(x*y) wherever possible; this gave an additional
> speedup.
>

That worked really well; I'm happy with the results.
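
Returning to the test_GASelection thread above: the hang and the proposed fix can be sketched in a few lines. This is a stand-alone illustration, not the code in test_GASelection.py or Bio.GA. The `Organism` class, `genome_fitness` (standing in for the thread's test_fitness), `random_organism` and `pick_pair` are simplified stand-ins written for this example, and genomes are assumed to be drawn uniformly.

```python
import random


def genome_fitness(genome):
    """Fitness is the integer value of the coded genome, e.g. '103' -> 103."""
    return int(genome)


class Organism:
    """Simplified stand-in for an organism: a genome plus its fitness."""

    def __init__(self, genome, fitness_function):
        self.genome = genome
        self.fitness = fitness_function(genome)


def random_organism(rng):
    """Random 3-position genome over the alphabet '0123' (64 possibilities)."""
    genome = "".join(rng.choice("0123") for _ in range(3))
    return Organism(genome, genome_fitness)


def pick_pair(rng):
    """Return (org_1, org_2) with org_2 strictly less fit than org_1."""
    # Choose anything except the worst organism, "000". Without this guard,
    # org_1 == "000" makes the second loop spin forever, because no organism
    # can have a fitness below zero.
    while True:
        org_1 = random_organism(rng)
        if org_1.fitness > 0:
            break
    # Now at least "000" is strictly worse than org_1, so this loop ends.
    while True:
        org_2 = random_organism(rng)
        if org_2.fitness < org_1.fitness:
            break
    return org_1, org_2


org_1, org_2 = pick_pair(random.Random(0))
print(org_1.genome, ">", org_2.genome)
```

The guard keeps org_1 random while excluding the one genome that makes the search for a worse partner impossible; under the uniform assumption that genome comes up once in 64 draws.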
From bugzilla-daemon at portal.open-bio.org Mon Nov 17 22:22:52 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 17 Nov 2008 17:22:52 -0500 Subject: [Biopython-dev] [Bug 2657] Improved Bio/Statistics/lowess.py In-Reply-To: Message-ID: <200811172222.mAHMMq6F022720@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2657 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #16 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-17 17:22 EST ------- (In reply to comment #15) > > That worked really well; I'm happy with the results. > Excellent - thanks James & Michiel! Marking this bug as fixed. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bsouthey at gmail.com Mon Nov 17 22:49:19 2008 From: bsouthey at gmail.com (Bruce Southey) Date: Mon, 17 Nov 2008 16:49:19 -0600 Subject: [Biopython-dev] test_GASelection hangs In-Reply-To: <320fb6e00811171349j3bb2757epa7e52e5e55ac0c95@mail.gmail.com> References: <4921B959.2080706@gmail.com> <320fb6e00811171050v541106d8n371d92f9b7f6c595@mail.gmail.com> <4921CE2A.3090606@gmail.com> <320fb6e00811171349j3bb2757epa7e52e5e55ac0c95@mail.gmail.com> Message-ID: <4921F4EF.4030005@gmail.com> Peter wrote: [snip] > >> Alternatively, is there someway to fix the genome of the first organism >> rather than a random one? 
>> For example, instead of the random_organism() declare it as say:
>> org_1=Organism('100', test_fitness)
>>
>
> We could do something like:
>
> #Choose anything except the worst organism, "000",
> while True :
>     org_1=random_organism()
>     if test_fitness(org_1) > 0 : break
>

This needs to be:
    if org_1.fitness > 0 : break

Also, when looping the test, I occasionally get

Test not getting an organism already in the new population. ... FAIL
Test basic selection on a small population. ... ok

======================================================================
FAIL: Test not getting an organism already in the new population.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_GASelection.py", line 130, in t_no_retrive_organism
    assert new_org != org, "Got organism already in the new population."
AssertionError: Got organism already in the new population.

I'll try to look at it tomorrow.

Bruce

PS thanks for fixing test_GAQueens.py as I have not got it to error even
running it 10000 times.

From biopython at maubp.freeserve.co.uk Mon Nov 17 23:18:12 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 17 Nov 2008 23:18:12 +0000
Subject: [Biopython-dev] test_GASelection hangs
In-Reply-To: <4921F4EF.4030005@gmail.com>
References: <4921B959.2080706@gmail.com>
 <320fb6e00811171050v541106d8n371d92f9b7f6c595@mail.gmail.com>
 <4921CE2A.3090606@gmail.com>
 <320fb6e00811171349j3bb2757epa7e52e5e55ac0c95@mail.gmail.com>
 <4921F4EF.4030005@gmail.com>
Message-ID: <320fb6e00811171518p78a3c25cq527c2ef338692ad2@mail.gmail.com>

> This needs to be:
> if org_1.fitness > 0 : break

Yeah. I've checked in a fix based on this approach, could you try
test_GASelection.py revision 1.3 just to make sure I've not done
something silly.

> Also, when looping the test, I occasionally get
> Test not getting an organism already in the new population. ... FAIL
> Test basic selection on a small population. ...
ok
>
> ======================================================================
> FAIL: Test not getting an organism already in the new population.
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "test_GASelection.py", line 130, in t_no_retrive_organism
>     assert new_org != org, "Got organism already in the new population."
> AssertionError: Got organism already in the new population.

Confirmed - when I was just looking for the hanging sub-test, I didn't
spot this.

My reading of the GA code is that there is no guarantee that
DiversitySelection will return a completely new organism. If it has to
generate one at random, there is a small chance it will match something
already in the population, i.e. the test itself is flawed. We could try
this say 10 times, but even then the test could fail.

I've fixed this in test_GASelection.py revision 1.4 by simply
commenting out the assert in
DiversitySelectionTest.t_no_retrive_organism.

However, maybe the underlying Bio.GA.Selection.Diversity code could be
altered instead to guarantee this possibly desirable behaviour?

Peter

From bugzilla-daemon at portal.open-bio.org Tue Nov 18 11:13:31 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 18 Nov 2008 06:13:31 -0500
Subject: [Biopython-dev] [Bug 2670] New: Populate seqfeature.display_name
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2670

           Summary: Populate seqfeature.display_name
           Product: Biopython
           Version: Not Applicable
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: BioSQL
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk

The seqfeature table has a display_name text field, currently left blank by
Biopython's loader, but is populated by BioPerl.
This field is used in GBrowse for example: http://gmod.org/wiki/GBrowse We could use the protein_id, locus_tag, etc depending on what annotation is available (ideally use the same as BioPerl). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Nov 18 15:06:06 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 18 Nov 2008 10:06:06 -0500 Subject: [Biopython-dev] [Bug 2671] New: Including GenomeDiagram in the main Biopython distribution Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2671 Summary: Including GenomeDiagram in the main Biopython distribution Product: Biopython Version: Not Applicable Platform: All OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: lpritc at scri.sari.ac.uk Thanks largely to the efforts of Robert Cadena, we have modified GenomeDiagram so that it plays nicely with the current CVS of Biopython and would like to propose its inclusion as part of the main distribution. GenomeDiagram is described in a Bioinformatics publication (http://dx.doi.org/10.1093/bioinformatics/btk021), and is useful for construction of circular and linear images of biological sequence data, with a specific domain of visualisation of large-scale genomic, comparative genomic and other data with reference to a single chromosome or other biological sequence as publication-quality vector graphics. It's based on the Reportlab backend, and can be used to produce rastered and streamed image output, too. 
The major changes that have been made to the version previously available at
http://bioinf.scri.ac.uk/lp are:

- Class names have been changed and no longer have the GD prefix
- References to 'colour' have been changed to 'color', but both spellings are
  still permitted in function calls, for backwards-compatibility
- The default font has been changed to 'Vera', which is shipped with
  Reportlab, to avoid some problems with unavailable fonts
- Code for wx widgets has been removed, although the Observer/Observable code
  remains, allowing user widgets to hook into the code, if that's desirable.
- Some test code is included, testing colour translation and the ability to
  produce PDF output in circular and linear diagram formats.
- Other minor changes to reduce deprecation warnings (those in Reportlab
  proper remain, however), and to remove code that caused font issues.

There are known issues, still. Writing to a raster format, such as PNG, uses
Reportlab's renderPM code, which defaults to using fonts that are not
installed by Reportlab itself, anymore. This is a Reportlab issue and doesn't
affect production of PDF output, so testing currently only checks the ability
to generate PDF output.

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Tue Nov 18 15:12:32 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 18 Nov 2008 10:12:32 -0500 Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main Biopython distribution In-Reply-To: Message-ID: <200811181512.mAIFCWJY023516@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2671 ------- Comment #1 from lpritc at scri.sari.ac.uk 2008-11-18 10:12 EST ------- Created an attachment (id=1063) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1063&action=view) GenomeDiagram code, ready to drop into Biopython CVS Contains GenomeDiagram code under Bio.Graphics.GenomeDiagram, and test code with examples. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Nov 18 15:44:29 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 18 Nov 2008 10:44:29 -0500 Subject: [Biopython-dev] [Bug 2672] New: test_lowess and test_docstrings fail to check if numpy is installed Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2672 Summary: test_lowess and test_docstrings fail to check if numpy is installed Product: Biopython Version: Not Applicable Platform: PC OS/Version: Linux Status: NEW Severity: minor Priority: P5 Component: Unit Tests AssignedTo: biopython-dev at biopython.org ReportedBy: bsouthey at gmail.com I used the cvs version with a version Python 2.5 that does not have numpy installed. Both test_lowess and test_docstring need to have checks for the presence of Numpy like other tests that require NumPy. These tests should also be skipped with messages like: test_kNN ... skipping. Install NumPy if you want to use Bio.kNN. 
======================================================================
ERROR: test_docstrings
----------------------------------------------------------------------
Traceback (most recent call last):
  File "run_tests.py", line 125, in runTest
    self.runSafeTest()
  File "run_tests.py", line 138, in runSafeTest
    cur_test = __import__(self.test_name)
  File "test_docstrings.py", line 18, in <module>
    import Bio.Statistics.lowess
  File "/home/bsouthey/python/biopython_cvs/biopython/build/lib.linux-x86_64-2.5/Bio/Statistics/lowess.py", line 23, in <module>
    import numpy
ImportError: No module named numpy

======================================================================
ERROR: test_lowess
----------------------------------------------------------------------
Traceback (most recent call last):
  File "run_tests.py", line 125, in runTest
    self.runSafeTest()
  File "run_tests.py", line 138, in runSafeTest
    cur_test = __import__(self.test_name)
  File "test_lowess.py", line 1, in <module>
    from Bio.Statistics.lowess import lowess
  File "/home/bsouthey/python/biopython_cvs/biopython/build/lib.linux-x86_64-2.5/Bio/Statistics/lowess.py", line 23, in <module>
    import numpy
ImportError: No module named numpy

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
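The behaviour asked for in this report (skip with a message instead of failing with an ImportError) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the Biopython test runner: MissingDependencyError, require() and run_test() are hypothetical names, and the real run_tests.py may structure this differently. The skip message format is modelled on the test_kNN line quoted in the report.

```python
import importlib


class MissingDependencyError(Exception):
    """Hypothetical marker exception that a test runner treats as 'skip'."""


def require(module_name, feature):
    """Import module_name, or signal that tests for `feature` should be skipped."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        raise MissingDependencyError(
            "Install %s if you want to use %s." % (module_name, feature)
        )


def run_test(test_name, body):
    """Run one test callable and report 'ok' or a skip message."""
    try:
        body()
    except MissingDependencyError as err:
        return "%s ... skipping. %s" % (test_name, err)
    return "%s ... ok" % test_name
```

A test module would call require() at import time, before touching the code under test, so that a missing dependency surfaces as a skip rather than as the ImportError tracebacks shown above.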
From bugzilla-daemon at portal.open-bio.org Tue Nov 18 15:56:01 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 18 Nov 2008 10:56:01 -0500 Subject: [Biopython-dev] [Bug 2672] test_lowess and test_docstrings fail to check if numpy is installed In-Reply-To: Message-ID: <200811181556.mAIFu1o1026838@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2672 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-18 10:56 EST ------- I've fixed test_lowess.py with CVS revision 1.2 to check for numpy as in Bug 2534 For test_docstring.py, I think we could split this in two: test_docstring.py - no numpy dependence test_docstring_numpy.py - for modules which need numpy Or, have some code within test_docstring.py to adjust the list of tests according to if numpy is installed or not. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Tue Nov 18 16:05:29 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 18 Nov 2008 11:05:29 -0500 Subject: [Biopython-dev] [Bug 2672] test_lowess and test_docstrings fail to check if numpy is installed In-Reply-To: Message-ID: <200811181605.mAIG5TjK027987@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2672 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-18 11:05 EST ------- (In reply to comment #1) > For test_docstring.py, I think we could split this in two: > > test_docstring.py - no numpy dependence > test_docstring_numpy.py - for modules which need numpy > > Or, have some code within test_docstring.py to adjust the list of tests > according to if numpy is installed or not. I've gone for the second approach, see test_docstring.py CVS revision 1.6 Marking as fixed. Thanks Bruce :) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
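The second approach mentioned here, adjusting the list of docstring tests according to whether numpy is installed, might look something like the sketch below. It is not the code from test_docstring.py revision 1.6. The helper names are hypothetical, and the list of numpy-free modules is purely illustrative; only Bio.Statistics.lowess is known from the traceback in this bug to need numpy.

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(name) is not None


def docstring_modules(have_numpy):
    """Build the list of module names whose docstrings should be tested."""
    # Illustrative entries only; the real list lives in the test file.
    modules = ["Bio.Seq", "Bio.SeqRecord"]
    if have_numpy:
        # Only attempt these when numpy can actually be imported.
        modules.append("Bio.Statistics.lowess")
    return modules


modules_to_test = docstring_modules(module_available("numpy"))
```

Keeping the availability check separate from the list building makes the selection logic easy to exercise with both True and False, whichever way numpy happens to be installed on the machine running the tests.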
From bugzilla-daemon at portal.open-bio.org Tue Nov 18 16:08:54 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 18 Nov 2008 11:08:54 -0500
Subject: [Biopython-dev] [Bug 2607] Gcc "differ in signedness" warning with cstringfnsmodule.c
In-Reply-To:
Message-ID: <200811181608.mAIG8ss2028159@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2607

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |WONTFIX

------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-18 11:08 EST -------
Since this bug was filed, we've declared this module obsolete for Biopython
1.49, and assuming we press ahead and deprecate it in Biopython 1.50 then I
don't see any point in fixing this compiler warning.

Marking as "won't fix".

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Tue Nov 18 18:35:25 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 18 Nov 2008 13:35:25 -0500
Subject: [Biopython-dev] [Bug 2609] Gcc 4.3.2 'initialization from incompatible pointer type' warning with triemodule.c
In-Reply-To:
Message-ID: <200811181835.mAIIZPgc004892@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2609

------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-18 13:35 EST -------
(In reply to comment #6)
> Still the same warning for Python 2.5 and 2.6:
>
> Bio/triemodule.c: In function '_write_value_to_handle':
> Bio/triemodule.c:498: warning: passing argument 3 of
> 'PyString_AsStringAndSize' from incompatible pointer type

It looks like PyString_AsStringAndSize will expect a Py_ssize_t length, and
not just an int length.

Suggested patch:

Index: triemodule.c
===================================================================
RCS file: /home/repository/biopython/biopython/Bio/triemodule.c,v
retrieving revision 1.7
diff -r1.7 triemodule.c
486a487,489
> #if PY_VERSION_HEX >= 0x02050000
>     Py_ssize_t length;
> #else
487a491
> #endif

i.e. in function _write_value_to_handle, at line 486 replace this:

    int length;

with this:

#if PY_VERSION_HEX >= 0x02050000
    Py_ssize_t length;
#else
    int length;
#endif

This still compiles for me on Python 2.5.2 with gcc 4.0.1 on a Mac.

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Wed Nov 19 02:11:34 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 18 Nov 2008 21:11:34 -0500
Subject: [Biopython-dev] [Bug 2609] Gcc 4.3.2 'initialization from incompatible pointer type' warning with triemodule.c
In-Reply-To:
Message-ID: <200811190211.mAJ2BYpO031573@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2609

------- Comment #8 from mdehoon at ims.u-tokyo.ac.jp 2008-11-18 21:11 EST -------
I've uploaded a slightly different version to CVS (there were more Py_ssize_t
/ int issues). Could you try that one? Bio/triemodule.c, revision 1.8. We
should also see if the unit test still passes on 64 bit platforms.

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Wed Nov 19 03:08:43 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 18 Nov 2008 22:08:43 -0500 Subject: [Biopython-dev] [Bug 2609] Gcc 4.3.2 'initialization from incompatible pointer type' warning with triemodule.c In-Reply-To: Message-ID: <200811190308.mAJ38hkI003686@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2609 ------- Comment #9 from bsouthey at gmail.com 2008-11-18 22:08 EST ------- I quickly build the cvs version and the associated tests passed with the various Python versions 2.3, 2.4, 2.5 (with and without numpy) and 2.6 on my system. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Nov 19 08:45:52 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 19 Nov 2008 03:45:52 -0500 Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main Biopython distribution In-Reply-To: Message-ID: <200811190845.mAJ8jqv4023408@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2671 ------- Comment #2 from lpritc at scri.sari.ac.uk 2008-11-19 03:45 EST ------- The copyright/credit section at the top of each file still needs to be changed. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Wed Nov 19 10:14:57 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 19 Nov 2008 05:14:57 -0500 Subject: [Biopython-dev] [Bug 2609] Gcc 4.3.2 'initialization from incompatible pointer type' warning with triemodule.c In-Reply-To: Message-ID: <200811191014.mAJAEv6m032436@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2609 ------- Comment #10 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-19 05:14 EST ------- (In reply to comment #8) > I've uploaded a slightly different version to CVS (there were more Py_ssize_t > / int issues). Could you try that one? Bio/triemodule.c, revision 1.8. We > should also see if the unit test still passes on 64 bit platforms. > CVS version compiles triemodule with no warnings using Python 2.5.2 with gcc 4.0.1 on a Mac. Unit tests pass. CVS version compiles triemodule with no warnings using Python 2.5 with gcc 4.1.2 on Linux (i686 so 32 bit). Unit tests pass. CVS version compiles triemodule with no warnings using Python 2.4.3 with gcc 3.4.6 on Linux (x86_64 so 64 bit). Unit tests pass. It sounds like Bruce has checked all python versions with gcc 4.3.2 on Linux. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Wed Nov 19 12:17:23 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 19 Nov 2008 07:17:23 -0500 Subject: [Biopython-dev] [Bug 2609] Gcc 4.3.2 'initialization from incompatible pointer type' warning with triemodule.c In-Reply-To: Message-ID: <200811191217.mAJCHN21008817@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2609 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #11 from mdehoon at ims.u-tokyo.ac.jp 2008-11-19 07:17 EST ------- I tried several Windows versions and a 64 bit unix platform. Everything seems to be OK. Closing this bug. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Nov 19 14:38:33 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 19 Nov 2008 09:38:33 -0500 Subject: [Biopython-dev] [Bug 2674] New: test_kNN: Removal of from numpy import * Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2674 Summary: test_kNN: Removal of from numpy import * Product: Biopython Version: Not Applicable Platform: PC OS/Version: Linux Status: NEW Severity: enhancement Priority: P2 Component: Unit Tests AssignedTo: biopython-dev at biopython.org ReportedBy: bsouthey at gmail.com This test contains a import numpy statement to check numpy is available. Therefore it is sufficient just to say 'import numpy'. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Wed Nov 19 14:39:52 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 19 Nov 2008 09:39:52 -0500 Subject: [Biopython-dev] [Bug 2674] test_kNN: Removal of from numpy import * In-Reply-To: Message-ID: <200811191439.mAJEdqkH019174@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2674 ------- Comment #1 from bsouthey at gmail.com 2008-11-19 09:39 EST ------- Created an attachment (id=1064) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1064&action=view) patch to change import numpy statement Just for completeness. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Nov 19 14:42:27 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 19 Nov 2008 09:42:27 -0500 Subject: [Biopython-dev] [Bug 2675] New: Use import numpy in kNN Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2675 Summary: Use import numpy in kNN Product: Biopython Version: Not Applicable Platform: PC OS/Version: Linux Status: NEW Severity: enhancement Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: bsouthey at gmail.com Replacing the 'from numpy import *' statement with import numpy. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Wed Nov 19 14:43:12 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 19 Nov 2008 09:43:12 -0500 Subject: [Biopython-dev] [Bug 2675] Use import numpy in kNN In-Reply-To: Message-ID: <200811191443.mAJEhCXu019472@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2675 ------- Comment #1 from bsouthey at gmail.com 2008-11-19 09:43 EST ------- Created an attachment (id=1065) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1065&action=view) patch to change import numpy statement Changes the way numpy is imported. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Nov 19 14:53:31 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 19 Nov 2008 09:53:31 -0500 Subject: [Biopython-dev] [Bug 2676] New: LogisticRegression: changed the way numpy is imported Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2676 Summary: LogisticRegression: changed the way numpy is imported Product: Biopython Version: Not Applicable Platform: PC OS/Version: Linux Status: NEW Severity: enhancement Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: bsouthey at gmail.com A patch to remove the usage of 'from numpy import *' usage. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
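The reports above do not spell out why the star import is being replaced. One commonly cited reason, offered here as an assumption rather than as the reporter's stated motivation, is that "from module import *" binds every public name of the module in the importing namespace, where it can shadow builtins. The sketch below shows the effect with the standard library's math module standing in for numpy, so that it runs without numpy installed; names_bound_by() is a hypothetical helper written for this illustration.

```python
import math


def names_bound_by(import_statement):
    """Run an import statement in a fresh namespace and return what it binds."""
    namespace = {}
    exec(import_statement, namespace)
    # exec() adds __builtins__ itself; drop it so only imported names remain.
    namespace.pop("__builtins__", None)
    return namespace


star = names_bound_by("from math import *")
plain = names_bound_by("import math")

# The plain import binds exactly one name, the module itself.
only_name = sorted(plain)

# The star import binds many names, including one that shadows a builtin:
# after it, a bare pow() call means math.pow (float result), not builtins.pow.
shadowed_pow = star["pow"]
```

With "import math" (or "import numpy") every use is written as math.pow or numpy.sum, so a reader can tell at a glance which function is being called.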
From bugzilla-daemon at portal.open-bio.org Wed Nov 19 14:54:10 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 19 Nov 2008 09:54:10 -0500 Subject: [Biopython-dev] [Bug 2676] LogisticRegression: changed the way numpy is imported In-Reply-To: Message-ID: <200811191454.mAJEsAeg020318@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2676 ------- Comment #1 from bsouthey at gmail.com 2008-11-19 09:54 EST ------- Created an attachment (id=1066) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1066&action=view) patch to change import numpy statement -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Nov 19 15:04:39 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 19 Nov 2008 10:04:39 -0500 Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main Biopython distribution In-Reply-To: Message-ID: <200811191504.mAJF4diO021040@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2671 chapmanb at 50mail.com changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |biopython-dev at biopython.org AssignedTo|biopython-dev at biopython.org |chapmanb at 50mail.com ------- Comment #3 from chapmanb at 50mail.com 2008-11-19 10:04 EST ------- Leighton; This is great; thanks for getting it together. 
I took a look at this last night and have a couple of quick comments: - on the licensing front, the current GPL is not compatible with the Biopython license; it would be nice to have you explicitly say you are okay with re-licensing this version under the Biopython license (http://www.biopython.org/DIST/LICENSE) - Would it be possible to update the GenomeDiagram documentation from here (http://bioinf.scri.ac.uk/lp/downloads/programs/genomediagram/userguide.pdf) to reflect the new namespace and class name changes? Mentioning some of the gotchas you have below, possibly to replace the installation section, would also be nice. I would like Peter and anyone one else interested to weigh in, but I can work on getting this in after the next release. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is. You are the assignee for the bug, or are watching the assignee. You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Nov 19 15:13:46 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 19 Nov 2008 10:13:46 -0500 Subject: [Biopython-dev] [Bug 2674] test_kNN: Removal of from numpy import * In-Reply-To: Message-ID: <200811191513.mAJFDkuO021701@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2674 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-19 10:13 EST ------- Fixed. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Wed Nov 19 15:17:28 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 19 Nov 2008 10:17:28 -0500 Subject: [Biopython-dev] [Bug 2675] Use import numpy in kNN In-Reply-To: Message-ID: <200811191517.mAJFHSID022021@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2675 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-19 10:17 EST ------- Fixed in CVS, Thanks. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Nov 19 15:21:41 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 19 Nov 2008 10:21:41 -0500 Subject: [Biopython-dev] [Bug 2676] LogisticRegression: changed the way numpy is imported In-Reply-To: Message-ID: <200811191521.mAJFLf8a022292@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2676 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-19 10:21 EST ------- Fixed in CVS, thanks! -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Wed Nov 19 15:29:25 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 19 Nov 2008 10:29:25 -0500 Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main Biopython distribution In-Reply-To: Message-ID: <200811191529.mAJFTPhW022858@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2671 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1062 is|0 |1 obsolete| | ------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-19 10:29 EST ------- (From update of attachment 1062) This attachment seems to have been removed (or failed to upload?). See attachment 1063 instead. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is. From bugzilla-daemon at portal.open-bio.org Wed Nov 19 15:29:50 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 19 Nov 2008 10:29:50 -0500 Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main Biopython distribution In-Reply-To: Message-ID: <200811191529.mAJFTon7022928@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2671 ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-19 10:29 EST ------- (In reply to comment #3) > > I would like Peter and anyone one else interested to weigh in, but > I can work on getting this in after the next release. > I'm all for adding GenomeDiagram to Biopython (as stated on the mailing list). 
I haven't actually looked at this revised code base yet - but as I've used GD before and know Leighton "in real life" it might be easier for me to shepherd this into CVS - but the more eyes the better ;) We might also consider getting Leighton CVS access (provisionally use with this module only). Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is. From bugzilla-daemon at portal.open-bio.org Wed Nov 19 16:07:24 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 19 Nov 2008 11:07:24 -0500 Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main Biopython distribution In-Reply-To: Message-ID: <200811191607.mAJG7OcJ025581@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2671 ------- Comment #6 from lpritc at scri.sari.ac.uk 2008-11-19 11:07 EST ------- (In reply to comment #5) > Leighton; > This is great; thanks for getting it together. I took a look at this last night > and have a couple of quick comments: No problem. Robert Cadena deserves the bulk of the credit - he made most of the changes. > - on the licensing front, the current GPL is not compatible with the Biopython > license; it would be nice to have you explicitly say you are okay with > re-licensing this version under the Biopython license > (http://www.biopython.org/DIST/LICENSE) I am perfectly happy with re-licensing the GD code under the Biopython license. If you need a gpg-signed document to say so, I can provide one ;) > - Would it be possible to update the GenomeDiagram documentation from here > (http://bioinf.scri.ac.uk/lp/downloads/programs/genomediagram/userguide.pdf) to > reflect the new namespace and class name changes? Yep - I'll do that, next. > Mentioning some of the > gotchas you have below, possibly to replace the installation section, would > also be nice. 
Definitely. Most of the gotchas are Reportlab-related, but they definitely have a place under Installation in the docs. > I would like Peter and anyone one else interested to weigh in, but I can work > on getting this in after the next release. The more, the merrier... it's not my little baby anymore it's out in the big world ;) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is. From bugzilla-daemon at portal.open-bio.org Wed Nov 19 21:49:48 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 19 Nov 2008 16:49:48 -0500 Subject: [Biopython-dev] [Bug 2677] New: BioSQL seqfeature enhancements Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2677 Summary: BioSQL seqfeature enhancements Product: Biopython Version: Not Applicable Platform: PC OS/Version: Linux Status: NEW Severity: enhancement Priority: P2 Component: BioSQL AssignedTo: biopython-dev at biopython.org ReportedBy: cymon.cox at gmail.com Cleaned-up (sub-)seqFeature locations, and strand. Added location_operator storage and test. Added remote location storage for sub-features, and test. Ive used the "Sequence Keys" ontology for the location operator and stored loc op in the location_qualifier_value table - not sure this is right... Patches attached. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Wed Nov 19 21:51:53 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 19 Nov 2008 16:51:53 -0500 Subject: [Biopython-dev] [Bug 2677] BioSQL seqfeature enhancements In-Reply-To: Message-ID: <200811192151.mAJLprRP024242@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2677 ------- Comment #1 from cymon.cox at gmail.com 2008-11-19 16:51 EST ------- Created an attachment (id=1072) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1072&action=view) Patch for BioSQL/BioSeq.py and Loader.py -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Nov 19 21:52:46 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 19 Nov 2008 16:52:46 -0500 Subject: [Biopython-dev] [Bug 2677] BioSQL seqfeature enhancements In-Reply-To: Message-ID: <200811192152.mAJLqk91024384@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2677 ------- Comment #2 from cymon.cox at gmail.com 2008-11-19 16:52 EST ------- Created an attachment (id=1073) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1073&action=view) Patch for BioSQL test cases -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Thu Nov 20 10:17:17 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 20 Nov 2008 05:17:17 -0500 Subject: [Biopython-dev] [Bug 2677] BioSQL seqfeature enhancements In-Reply-To: Message-ID: <200811201017.mAKAHHA8027467@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2677 ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-20 05:17 EST ------- (In reply to comment #0) > Cleaned-up (sub-)seqFeature locations, and strand. Added location_operator > storage and test. Added remote location storage for sub-features, and test. > Excellent - I see you've removed the naive min/max to find the parent feature's location when dealing with sub-features. This should fix the special case where a feature spans the origin on a circular genome. That should take care of many of my "TODO" entries in test_BioSQL_SeqIO.py :) > > Ive used the "Sequence Keys" ontology for the location operator and stored > loc op in the location_qualifier_value table - not sure this is right... > I'm not sure off hand either, but would like us to check before committing this. In the short term, what ever BioPerl does is "right" as I'm treating that as the BioSQL reference implementation. > > Patches attached. > I've scanned over them quickly, and they look fine. The comments do help :) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
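Illustration of the origin-spanning case mentioned in comment #3 above. This is a minimal sketch with hypothetical coordinates (1-based, inclusive, on an assumed 10000 bp circular genome); the function names are made up for the example and this is not the code from the attached patches. It only shows why taking min/max over the sub-feature locations misreports a feature that crosses the origin, and contrasts it with one possible alternative that respects the join order.

```python
# Hypothetical sub-feature locations (start, end), 1-based inclusive,
# listed in join order on an assumed circular genome of 10000 bp.
origin_spanning = [(9500, 10000), (1, 300)]


def naive_parent_span(sub_locations):
    """Naive approach: parent span is (min start, max end) of all parts."""
    starts = [start for start, end in sub_locations]
    ends = [end for start, end in sub_locations]
    return min(starts), max(ends)


def ordered_parent_span(sub_locations):
    """Assumed alternative: start of the first part, end of the last part."""
    return sub_locations[0][0], sub_locations[-1][1]


# The naive span covers the whole genome instead of the 801 bp feature.
print(naive_parent_span(origin_spanning))    # (1, 10000)
# Keeping join order preserves the wrap-around (start > end).
print(ordered_parent_span(origin_spanning))  # (9500, 300)
```

For a feature that does not cross the origin, both functions give the same answer, so the difference only shows up in the special case described above.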
From bugzilla-daemon at portal.open-bio.org Thu Nov 20 10:53:19 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 20 Nov 2008 05:53:19 -0500 Subject: [Biopython-dev] [Bug 2662] Typo in tutorial "Chapter 3 Sequence objects " In-Reply-To: Message-ID: <200811201053.mAKArJsp029436@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2662 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |INVALID ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-20 05:53 EST ------- Unless anyone else wants to weigh in on Josh's side, I'm not going to change this. Closing bug - but thanks for reporting it anyway Josh. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Thu Nov 20 10:55:57 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 20 Nov 2008 10:55:57 +0000 Subject: [Biopython-dev] Biopython 1.49 beta released In-Reply-To: <320fb6e00811140427u50b3d42bn9514a8352d936960@mail.gmail.com> References: <320fb6e00811090716v58637d55o470246df4175464e@mail.gmail.com> <320fb6e00811140427u50b3d42bn9514a8352d936960@mail.gmail.com> Message-ID: <320fb6e00811200255x5325a7d4kf4d118350a9e7e65@mail.gmail.com> OK, Progress since Biopython 1.49 beta was released: > We've had a few Numeric -> NumPy bugs reported, > > http://bugzilla.open-bio.org/show_bug.cgi?id=2658 > Bug 2658 - Bio.PDB.Neighborsearch Fixed. > http://bugzilla.open-bio.org/show_bug.cgi?id=2649 > Bug 2649 - Bio.KDTree (probably fixed) No confirmation from the original reporter, but looks OK. 
> I don't think we should release Biopython 1.49 final until these are > resolved - but if there was interest I could put out a second beta. No-one seems to want a second beta, which saves me some time :) There have been a few other bugs reported and fixed in the meantime, right now the only thing I think holding up the release of Biopython 1.49 is: http://bugzilla.open-bio.org/show_bug.cgi?id=2677 Bug 2677 - BioSQL seqfeature enhancements Is there anything else? Peter From bugzilla-daemon at portal.open-bio.org Thu Nov 20 14:19:39 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 20 Nov 2008 09:19:39 -0500 Subject: [Biopython-dev] [Bug 2662] Typo in tutorial "Chapter 3 Sequence objects " In-Reply-To: Message-ID: <200811201419.mAKEJcW6011296@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2662 ------- Comment #3 from mdehoon at ims.u-tokyo.ac.jp 2008-11-20 09:19 EST ------- I am not a native English speaker, but I do agree with Josh that the original phrase "... different set of methods TO a plain python string" sounds strange to me. I would suggest something along the lines of "the set of methods of a Seq object are slightly different from those of a plain python string." But again, that may be Double Dutch. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Thu Nov 20 14:34:25 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 20 Nov 2008 09:34:25 -0500 Subject: [Biopython-dev] [Bug 2662] Typo in tutorial "Chapter 3 Sequence objects " In-Reply-To: Message-ID: <200811201434.mAKEYPOh015951@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2662 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|RESOLVED |REOPENED Resolution|INVALID | ------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-20 09:34 EST ------- (In reply to comment #3) > I am not a native English speaker, but I do agree with Josh that the original > phrase "... different set of methods TO a plain python string" sounds strange > to me. As a native English speaker I'm happy with this as is, but concede international usage may vary - and I do want the Tutorial to be as accessible as possible. > I would suggest something along the lines of "the set of methods of a > Seq object are slightly different from those of a plain python string." > But again, that may be Double Dutch. I would say a "set of methods" is singular, but the rest of this sentence is plural. How about completely rephrasing: First of all, they have some different methods (for example, Seq objects have reverse_complement() and translate() methods used for nucleotide sequences). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bsouthey at gmail.com Thu Nov 20 15:09:42 2008 From: bsouthey at gmail.com (Bruce Southey) Date: Thu, 20 Nov 2008 09:09:42 -0600 Subject: [Biopython-dev] Biopython 1.49 beta released In-Reply-To: <320fb6e00811200255x5325a7d4kf4d118350a9e7e65@mail.gmail.com> References: <320fb6e00811090716v58637d55o470246df4175464e@mail.gmail.com> <320fb6e00811140427u50b3d42bn9514a8352d936960@mail.gmail.com> <320fb6e00811200255x5325a7d4kf4d118350a9e7e65@mail.gmail.com> Message-ID: <49257DB6.5080902@gmail.com> Hi, In connection with Peter's email on the forthcoming release, I was wondering what to do about certain modules that do not seem to be used. I started to look at the examples that lack test coverage in case one could do something for the Biopython 1.49 release. But this should not provide any reason for delaying the release and may stretch beyond it. Given the potential long term impact and spirit of people who donated the code, I was thinking that the release notes could denote which modules are unsupported and need some usage feedback. In future releases the use of these modules would raise a warning about being unsupported or obsolete. Please note that I am not against any of these modules except for the requirement to maintain them and develop suitable tests. The possible modules are those that Peter previously mentioned that had no tests: Bio.Affy Bio.AlignAce Bio.EZRetrieve Bio.Emboss (everything except the primer parsers) Bio.Encodings (obsolete?) Bio.FilteredReader (obsolete?) Bio.MaxEntropy Bio.NMR Bio.NaiveBayes Bio.NetCatch (obsolete?) I think that Bio.MaxEntropy and Bio.NaiveBayes are useful and I did provide an example that is included in the code. However, I am not confident enough in these methods to maintain them, mainly due to my lack of knowledge. Similarly for Bio.Affy, I currently work a lot with two-dye systems but not Affy. 
I find that Bio.Affy provides insufficient functionality because it really only reads the intensities and misses other important information in version 3 of the Affy format. I do recognize that it could be a base for Affy stuff that may be useful for users such as the PopGen users that use Affy SNP arrays. Bruce Peter wrote: > OK, > > Progress since Biopython 1.49 beta was released: > > >> We've had a few Numeric -> NumPy bugs reported, >> >> http://bugzilla.open-bio.org/show_bug.cgi?id=2658 >> Bug 2658 - Bio.PDB.Neighborsearch >> > > Fixed. > > >> http://bugzilla.open-bio.org/show_bug.cgi?id=2649 >> Bug 2649 - Bio.KDTree (probably fixed) >> > > No confirmation from the original reporter, but looks OK. > > >> I don't think we should release Biopython 1.49 final until these are >> resolved - but if there was interest I could put out a second beta. >> > > No-one seems to want a second beta, which saves me some time :) > > There have been a few other bugs reported and fixed in the meantime, > right now the only thing I think holding up the release of Biopython > 1.49 is: > > http://bugzilla.open-bio.org/show_bug.cgi?id=2677 > Bug 2677 - BioSQL seqfeature enhancements > > Is there anything else? > > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > > From bsouthey at gmail.com Thu Nov 20 16:26:40 2008 From: bsouthey at gmail.com (Bruce Southey) Date: Thu, 20 Nov 2008 10:26:40 -0600 Subject: [Biopython-dev] Bio.EZRetrieve appears to be obsolete or redunant Message-ID: <49258FC0.10703@gmail.com> Hi, The Bio.EZRetrieve module retrieves a single nucleotide sequence from EZRetrieve website: http://siriusb.umdnj.edu:18080/EZRetrieve/single_r.jsp It requires a human, rat or mouse nucleic GenBank, UniGene, LocusLink, or IMAGE ID. No other genomes are supported. 
Although it appears faster than a Bio.GenBank query, I do not see that this module provides any special functionality beyond that already provided by Bio.GenBank and similar. So I think this module is obsolete and redundant. Notes: 1) Obviously LocusLink has been superseded by Entrez Gene. 2) The documented genome builds are 2003 (eg human BUILD.34 at 11/04/2003) but it is not known whether these have been updated since. 3) The start of the sequence is zero. You can use from_='start' instead but you cannot then mix it with a numerical ending. 4) The actual website provides additional information including NCBI links (LocusLink and Nucleic) and does base counting. 5) There are other functions provided by the website like multiple retrievals. The website example is for 'homeobox B6 [/Homo sapiens/]': import Bio.EZRetrieve seq=Bio.EZRetrieve.retrieve_single('BC014651', 1, 20) print seq Gives: >BC014651:HOXB6 ACCACACCTAGGTCGGAGCA Bruce From bugzilla-daemon at portal.open-bio.org Thu Nov 20 17:05:22 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 20 Nov 2008 12:05:22 -0500 Subject: [Biopython-dev] [Bug 2678] New: Entrez.esearch does not always retrieve or find DTD files Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2678 Summary: Entrez.esearch does not always retrieve or find DTD files Product: Biopython Version: 1.49b Platform: Macintosh OS/Version: Mac OS Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: lpritc at scri.sari.ac.uk When using Entrez.esearch, I have observed an intermittent failure to recover DTD files. These are not being cached on successful search attempts. It may be worth including them in the distribution. 
Traceback: /Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/Bio/Entrez/Parser.py:279: UserWarning: DTD file xhtml1-strict.dtd not found in Biopython installation; trying to retrieve it from NCBI warnings.warn("DTD file %s not found in Biopython installation; trying to retrieve it from NCBI" % filename) /Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/Bio/Entrez/Parser.py:279: UserWarning: DTD file xhtml-lat1.ent not found in Biopython installation; trying to retrieve it from NCBI warnings.warn("DTD file %s not found in Biopython installation; trying to retrieve it from NCBI" % filename) Traceback (most recent call last): File "./get_entrez_ests.py", line 158, in main() File "./get_entrez_ests.py", line 45, in main options.verbose) File "./get_entrez_ests.py", line 76, in get_entrez_session results = Entrez.read(handle) File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/Bio/Entrez/__init__.py", line 286, in read record = handler.run(handle) File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/Bio/Entrez/Parser.py", line 95, in run self.parser.ParseFile(handle) File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/Bio/Entrez/Parser.py", line 283, in external_entity_ref_handler parser.ParseFile(handle) File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/Bio/Entrez/Parser.py", line 280, in external_entity_ref_handler handle = urllib.urlopen(systemId) File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib.py", line 87, in urlopen return opener.open(url) File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib.py", line 203, in open return getattr(self, name)(url) File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib.py", line 461, in open_file return self.open_local_file(url) File 
"/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib.py", line 475, in open_local_file raise IOError(e.errno, e.strerror, e.filename) IOError: [Errno 2] No such file or directory: 'xhtml-lat1.ent' -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Thu Nov 20 17:06:34 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 20 Nov 2008 17:06:34 +0000 Subject: [Biopython-dev] Bio.EZRetrieve appears to be obsolete or redunant In-Reply-To: <49258FC0.10703@gmail.com> References: <49258FC0.10703@gmail.com> Message-ID: <320fb6e00811200906p4b8ba2b9jca212a39ec8f972c@mail.gmail.com> On Thu, Nov 20, 2008 at 4:26 PM, Bruce Southey wrote: > Hi, > The Bio.EZRetrieve module retrieves a single nucleotide sequence from > EZRetrieve website: > http://siriusb.umdnj.edu:18080/EZRetrieve/single_r.jsp > It requires a human, rat or mouse nucleic GenBank, UniGene, LocusLink, or > IMAGE ID. No other genomes are supported. > > Although it appears faster than a Bio.GenBank query, I do not see that this > module provides any special functionality than that already provided by > Bio.GenBank and similar. So I think this module is obsolete and redundant. Note the online bits of Bio.GenBank are considered obsoleted by Bio.Entrez anyway. Maybe we should actually deprecate these for Biopython 1.49... I would agree in some ways Bio.EZRetrieve module is also obsolete and redundant, see also: http://lists.open-bio.org/pipermail/biopython-dev/2008-March/003503.html Unless anyone wants to defend Bio.EZRetrieve, let's ask on the main list about declaring it obsolete for Biopython 1.49 (documentation change only) and deprecating it in the next release (adding a warning only). 
Peter From bugzilla-daemon at portal.open-bio.org Thu Nov 20 17:06:37 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 20 Nov 2008 12:06:37 -0500 Subject: [Biopython-dev] [Bug 2678] Entrez.esearch does not always retrieve or find DTD files In-Reply-To: Message-ID: <200811201706.mAKH6b1r006648@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2678 ------- Comment #1 from lpritc at scri.sari.ac.uk 2008-11-20 12:06 EST ------- And this time, more usefully, traceback with problem code: >>> handle = Entrez.einfo() >>> record = Entrez.read(handle) /Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/Bio/Entrez/Parser.py:279: UserWarning: DTD file xhtml1-strict.dtd not found in Biopython installation; trying to retrieve it from NCBI warnings.warn("DTD file %s not found in Biopython installation; trying to retrieve it from NCBI" % filename) /Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/Bio/Entrez/Parser.py:279: UserWarning: DTD file xhtml-lat1.ent not found in Biopython installation; trying to retrieve it from NCBI warnings.warn("DTD file %s not found in Biopython installation; trying to retrieve it from NCBI" % filename) Traceback (most recent call last): File "", line 1, in File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/Bio/Entrez/__init__.py", line 286, in read record = handler.run(handle) File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/Bio/Entrez/Parser.py", line 95, in run self.parser.ParseFile(handle) File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/Bio/Entrez/Parser.py", line 283, in external_entity_ref_handler parser.ParseFile(handle) File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/Bio/Entrez/Parser.py", line 280, in external_entity_ref_handler handle = urllib.urlopen(systemId) File 
"/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib.py", line 87, in urlopen return opener.open(url) File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib.py", line 203, in open return getattr(self, name)(url) File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib.py", line 461, in open_file return self.open_local_file(url) File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib.py", line 475, in open_local_file raise IOError(e.errno, e.strerror, e.filename) IOError: [Errno 2] No such file or directory: 'xhtml-lat1.ent' -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Nov 20 17:07:40 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 20 Nov 2008 12:07:40 -0500 Subject: [Biopython-dev] [Bug 2678] Bio.Entrez module does not always retrieve or find DTD files In-Reply-To: Message-ID: <200811201707.mAKH7ej9006714@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2678 lpritc at scri.sari.ac.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Summary|Entrez.esearch does not |Bio.Entrez module does not |always retrieve or find DTD |always retrieve or find DTD |files |files -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Thu Nov 20 17:14:35 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 20 Nov 2008 12:14:35 -0500 Subject: [Biopython-dev] [Bug 2677] BioSQL seqfeature enhancements In-Reply-To: Message-ID: <200811201714.mAKHEZj4007097@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2677 ------- Comment #4 from cymon.cox at gmail.com 2008-11-20 12:14 EST ------- (In reply to comment #3) > (In reply to comment #0) > > Ive used the "Sequence Keys" ontology for the location operator and stored > > loc op in the location_qualifier_value table - not sure this is right... > > > > I'm not sure off hand either, but would like us to check before committing > this. In the short term, what ever BioPerl does is "right" as I'm treating > that as the BioSQL reference implementation. I don't read Perl - but I grep'ed through the source and only found one ref to the location_qualifier_value, and that was in the docs. So maybe they don't store it there... Sorry I can't be of more help, C. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Nov 20 22:01:13 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 20 Nov 2008 17:01:13 -0500 Subject: [Biopython-dev] [Bug 2678] Bio.Entrez module does not always retrieve or find DTD files In-Reply-To: Message-ID: <200811202201.mAKM1Dce030238@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2678 ------- Comment #2 from mdehoon at ims.u-tokyo.ac.jp 2008-11-20 17:01 EST ------- Could you make a list of the missing DTDs? You add the missing ones to Bio/Entrez/DTDs and reinstall Biopython. 
It looks like only xhtml1-strict.dtd and xhtml-lat1.ent are missing, but after adding these to Bio/Entrez/DTDs you may find other missing DTDs. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Nov 21 08:54:00 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 21 Nov 2008 03:54:00 -0500 Subject: [Biopython-dev] [Bug 2678] Bio.Entrez module does not always retrieve or find DTD files In-Reply-To: Message-ID: <200811210854.mAL8s0Dt009861@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2678 ------- Comment #3 from lpritc at scri.sari.ac.uk 2008-11-21 03:53 EST ------- (In reply to comment #2) > Could you make a list of the missing DTDs? You add the missing ones to > Bio/Entrez/DTDs and reinstall Biopython. It looks like only xhtml1-strict.dtd > and xhtml-lat1.ent are missing, but after adding these to Bio/Entrez/DTDs you > may find other missing DTDs. I'll add the DTDs that I noted above, but the problem is intermittent and I haven't seen the issue arise again at all, this morning. If I see anything else give an error, I'll make a note here. This may be something to keep in mind if other, similar errors are reported from future Entrez searches, but if the problem is the result of excessive server load, or timeouts, it may not be reliably repeatable. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
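As a rough aid for the "make a list of the missing DTDs" request in comment #2 above, here is a minimal sketch that reports which of a set of expected files are absent from a local DTD directory. The helper name missing_dtds and the relative path used in the example call are assumptions for illustration; only the two file names come from the warnings quoted in this bug, and the real location of Bio/Entrez/DTDs depends on the installation.

```python
import os


def missing_dtds(dtd_dir, wanted):
    """Return the names in `wanted` that are not regular files in `dtd_dir`."""
    return [name for name in wanted
            if not os.path.isfile(os.path.join(dtd_dir, name))]


# File names taken from the UserWarning messages in this bug report.
reported = ["xhtml1-strict.dtd", "xhtml-lat1.ent"]

# Assumed location relative to a source checkout; adjust to the actual
# installed Bio/Entrez/DTDs directory.
dtd_dir = os.path.join("Bio", "Entrez", "DTDs")
print(missing_dtds(dtd_dir, reported))
```

Since the report describes the failure as intermittent, a check like this only shows what is absent locally; it says nothing about why a download from NCBI failed on a given attempt.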
From bugzilla-daemon at portal.open-bio.org Fri Nov 21 10:52:17 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 21 Nov 2008 05:52:17 -0500 Subject: [Biopython-dev] [Bug 2677] BioSQL seqfeature enhancements In-Reply-To: Message-ID: <200811211052.mALAqHel020569@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2677 ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-21 05:52 EST ------- (In reply to comment #4) > (In reply to comment #3) > > (In reply to comment #0) > > > Ive used the "Sequence Keys" ontology for the location operator and stored > > > loc op in the location_qualifier_value table - not sure this is right... > > > > > > > I'm not sure off hand either, but would like us to check before committing > > this. In the short term, what ever BioPerl does is "right" as I'm treating > > that as the BioSQL reference implementation. > > I don't read Perl - but I grep'ed through the source and only found one ref to > the location_qualifier_value, and that was in the docs. So maybe they don't > store it there... > > Sorry I can be of more help, C. > I tried browsing and searching the BioPerl-db source, but couldn't find the answer, so I tried the direct route and used their load_seqdatabase.pl script to import a GenBank file (with at least one join location) and inspected the tables. The answer is that location.term_id is always left as NULL, so there is no ontology to worry about. Doing something sensible with ontologies (e.g. support for existing strict ontologies like SO or SOFA) rather than the current ad-hoc relaxed approach (adding new ontology terms on the fly) taken by BioPerl and Biopython is a possible future enhancement. I'm going to look at modifying your patch to leave location.term_id as NULL, with the aim of committing that today and then doing the Biopython 1.49 release. 
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Nov 21 11:54:18 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 21 Nov 2008 06:54:18 -0500 Subject: [Biopython-dev] [Bug 2677] BioSQL seqfeature enhancements In-Reply-To: Message-ID: <200811211154.mALBsIcR025739@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2677 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1073 is|0 |1 obsolete| | -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Nov 21 11:59:08 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 21 Nov 2008 06:59:08 -0500 Subject: [Biopython-dev] [Bug 2677] BioSQL seqfeature enhancements In-Reply-To: Message-ID: <200811211159.mALBx89Z026099@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2677 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1072 is|0 |1 obsolete| | ------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-21 06:59 EST ------- (From update of attachment 1072) Hi Cymon, I've just checked in something based on your patches: Checking in BioSQL/Loader.py; /home/repository/biopython/biopython/BioSQL/Loader.py,v <-- Loader.py new revision: 1.37; previous revision: 1.36 done Checking in BioSQL/BioSeq.py; /home/repository/biopython/biopython/BioSQL/BioSeq.py,v <-- BioSeq.py new revision: 1.31; 
previous revision: 1.30 done Checking in Tests/test_BioSQL_SeqIO.py; /home/repository/biopython/biopython/Tests/test_BioSQL_SeqIO.py,v <-- test_BioSQL_SeqIO.py new revision: 1.27; previous revision: 1.26 done This should fix the strand, feature db ref in locations, and importantly the start/end with sub-features. I am avoiding the ontology question by leaving location.term_id as NULL (following BioPerl usage). I'd like to do the same with location_qualifier_value.term_id but the schema does not allow NULL here. Interestingly BioPerl does not seem to use this table, so I assume they (like Biopython) have been assuming "join". I think this is still a big improvement, but that the (sub)feature.location_operator issue could wait. We'll need to discuss on the BioSQL mailing list how this should be handled consistently. Leaving this bug open. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Nov 21 12:04:39 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 21 Nov 2008 07:04:39 -0500 Subject: [Biopython-dev] [Bug 2643] Proposal: fastPhaseOutputIO for SeqIO In-Reply-To: Message-ID: <200811211204.mALC4dUW026607@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2643 dalloliogm at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1048|application/octet-stream |text/plain mime type| | ------- Comment #23 from dalloliogm at gmail.com 2008-11-21 07:04 EST ------- (From update of attachment 1048) changed mime type -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Fri Nov 21 12:18:35 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 21 Nov 2008 07:18:35 -0500 Subject: [Biopython-dev] [Bug 2662] Typo in tutorial "Chapter 3 Sequence objects " In-Reply-To: Message-ID: <200811211218.mALCIZds027946@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2662 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|REOPENED |RESOLVED Resolution| |FIXED ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-21 07:18 EST ------- Fixed in CVS revision 1.187 of biopython/Doc/Tutorial.tex by completely rephrasing to avoid the contentious sentence structure. See: http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Doc/Tutorial.tex?cvsroot=biopython Now reads: > There are two important differences between Seq objects and standard > python strings. First of all, they have different methods. Although > the Seq object supports many of the same methods as a plain string, > its translate() method differs by doing biological translation, and > there are also additional biologically relevant methods like > reverse_complement(). Secondly, the Seq object has an important > attribute, alphabet, which is an object describing what the individual > characters making up the sequence string "mean", and how they should > be interpreted. For example, is AGTACACTGGT a DNA sequence, or just > a protein sequence that happens to be rich in Alanines, Glycines, > Cysteines and Threonines? Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From biopython at maubp.freeserve.co.uk Fri Nov 21 12:38:07 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 21 Nov 2008 12:38:07 +0000 Subject: [Biopython-dev] CVS freeze for Biopython 1.49 Message-ID: <320fb6e00811210438v272d32afta03497a846716df6@mail.gmail.com> On Nov 20, Peter wrote: > No-one seems to want a second beta, which saves me some time :) > > There have been a few other bugs reported and fixed in the meantime, > right now the only thing I think holding up the release of Biopython > 1.49 is: > > http://bugzilla.open-bio.org/show_bug.cgi?id=2677 > Bug 2677 - BioSQL seqfeature enhancements I've committed most of this bug fix to CVS; I think the remaining issue can wait until after Biopython 1.49 is out. > Is there anything else? If there are no last minute objections, my plan is to do the Biopython 1.49 release this afternoon, hopefully starting after lunch - in about one hour's time. Please **consider CVS frozen from now**. Hopefully I'll have the build done within the next 12 hours, including the Windows installers. Once the release is out, we'll give it a few days just in case there are any issues to force a re-release, and then reopen CVS. Tiago has some more PopGen code waiting, and there is also GenomeDiagram to look forward to (Bug 2671). 
Peter From bugzilla-daemon at portal.open-bio.org Fri Nov 21 14:46:29 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 21 Nov 2008 09:46:29 -0500 Subject: [Biopython-dev] [Bug 2680] New: Bio.AlignAce.Parser.py need to import string Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2680 Summary: Bio.AlignAce.Parser.py need to import string Product: Biopython Version: Not Applicable Platform: PC OS/Version: Linux Status: NEW Severity: trivial Priority: P4 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: bsouthey at gmail.com The file Bio.AlignAce.Parser.py needs to 'import string' because it uses the function 'string.atof()'. Also, please note that string.atof() is a deprecated function (since Python 2.0) but it will not get removed until Python 3. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Nov 21 14:57:47 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 21 Nov 2008 09:57:47 -0500 Subject: [Biopython-dev] [Bug 2680] Bio.AlignAce.Parser.py need to import string In-Reply-To: Message-ID: <200811211457.mALEvlR5009727@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2680 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-21 09:57 EST ------- This used to work via the "from Bio.ParserSupport import *", as up until Biopython 1.48 that imported string. Fixed in Bio/AlignAce/Parser.py revision 1.4 by importing string (this will be included in Biopython 1.49). I'm leaving this bug open as I would rather not use the string module here at all - probably we can just use float() instead of string.atof() but that can wait until after Biopython 1.49 is out. 
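For illustration only, here is a minimal sketch of the float() replacement mentioned above, assuming the parser just needs to turn a numeric text field into a float. The helper name parse_score and the sample value are hypothetical and are not taken from Bio/AlignAce/Parser.py:

```python
def parse_score(text):
    """Convert a numeric text field to a float.

    Hypothetical helper, for illustration only. The built-in float()
    parses decimal strings directly, so the string module (and its
    atof() function) is not needed.
    """
    # float() ignores leading and trailing whitespace on its own.
    return float(text)


if __name__ == "__main__":
    print(parse_score(" 12.5 "))
```

The same pattern would apply to string.atoi(), which the built-in int() replaces.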
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bsouthey at gmail.com Fri Nov 21 15:19:22 2008 From: bsouthey at gmail.com (Bruce Southey) Date: Fri, 21 Nov 2008 09:19:22 -0600 Subject: [Biopython-dev] Use of depreciated string functions Message-ID: <4926D17A.8080101@gmail.com> Hi, There are a number of files in Bio that import string. Many of these use depreciated functions (since Version 2) that are now string methods mainly string.atof(), string.atoi() and string.join(). The only real advantage of modifying these is to remove an import statement because these will not be removed until Python 3. Perhaps the one exception is in HotRand.py: hex_digit = string.hexdigits.find( letter ) There are about 23 unique files that I identified via grep and many have more than one usage. While changing these is busy work, please let me know if you would like me to create patches for the next version of Biopython (ie 1.50) or just ignore this. Thanks Bruce From biopython at maubp.freeserve.co.uk Fri Nov 21 15:26:52 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 21 Nov 2008 15:26:52 +0000 Subject: [Biopython-dev] Use of depreciated string functions In-Reply-To: <4926D17A.8080101@gmail.com> References: <4926D17A.8080101@gmail.com> Message-ID: <320fb6e00811210726n94e277ex359d93de0855045e@mail.gmail.com> On Fri, Nov 21, 2008 at 3:19 PM, Bruce Southey wrote: > Hi, > There are a number of files in Bio that import string. Many of these use > depreciated functions (since Version 2) that are now string methods mainly > string.atof(), string.atoi() and string.join(). The only real advantage of > modifying these is to remove an import statement because these will not be > removed until Python 3. 
> > Perhaps the one exception is in HotRand.py: hex_digit = > string.hexdigits.find( letter ) > > There are about 23 unique files that I identified via grep and many have > more than one usage. While changing these is busy work, please let me know > if you would like me to create patches for the next version of Biopython (ie > 1.50) or just ignore this. As you say, there isn't much benefit from doing this other than removing an import and making another small step towards Python 3.0 compatibility. We have gradually been phasing out "import string" already, usually when working on a module which used it. Once I've dealt with Biopython 1.49, I'd be happy to look at a patch to remove more "import string" usage from non-obsolete, non-deprecated code. It would be a little risky doing this to modules without unit tests, but that's another area you've shown some interest in anyway... Thanks, Peter From bartek at rezolwenta.eu.org Fri Nov 21 15:32:02 2008 From: bartek at rezolwenta.eu.org (Bartek Wilczynski) Date: Fri, 21 Nov 2008 16:32:02 +0100 Subject: [Biopython-dev] [Bug 2680] Bio.AlignAce.Parser.py need to import string In-Reply-To: <200811211457.mALEvlR5009727@portal.open-bio.org> References: <200811211457.mALEvlR5009727@portal.open-bio.org> Message-ID: <8b34ec180811210732o4266a87ey2a4c14a7ddc5ead5@mail.gmail.com> Hello, I fixed the bug (changed both uses of string.atof() to float() ), and commited to CVS, although I cannot close it in Bugzilla (my dev.open-bio account does not seem to work for bugzilla). cheers Bartek Wilczynski On Fri, Nov 21, 2008 at 3:57 PM, wrote: > http://bugzilla.open-bio.org/show_bug.cgi?id=2680 > > > > > > ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-21 09:57 EST ------- > This used to work via the "from Bio.ParserSupport import *", as up until > Biopython 1.48 that imported string. > > Fixed in Bio/AlignAce/Parser.py revision 1.4 by importing string (this will be > included in Biopython 1.49). 
> > I'm leaving this bug open as I would rather not use the string module here at > all - probably we can just use float() instead of string.atof() but that can > wait until after Biopython 1.49 is out. > > > -- > Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email > ------- You are receiving this mail because: ------- > You are the assignee for the bug, or are watching the assignee. > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > > -- Bartek Wilczynski ================== Postdoctoral fellow EMBL, Furlong group Meyerhoffstrasse 1, 69012 Heidelberg, Germany tel: +49 6221 387 8433 From bugzilla-daemon at portal.open-bio.org Fri Nov 21 15:41:54 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 21 Nov 2008 10:41:54 -0500 Subject: [Biopython-dev] [Bug 2680] Bio.AlignAce.Parser.py need to import string In-Reply-To: Message-ID: <200811211541.mALFfsDM013508@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2680 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-21 10:41 EST ------- Bartek's email: > Hello, > > I fixed the bug (changed both uses of string.atof() to float() ), > and commited to CVS, although I cannot close it in Bugzilla (my > dev.open-bio account does not seem to work for bugzilla). > > cheers > Bartek Wilczynski Marking this as fixed. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From biopython at maubp.freeserve.co.uk Fri Nov 21 15:45:42 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 21 Nov 2008 15:45:42 +0000 Subject: [Biopython-dev] [Bug 2680] Bio.AlignAce.Parser.py need to import string In-Reply-To: <8b34ec180811210732o4266a87ey2a4c14a7ddc5ead5@mail.gmail.com> References: <200811211457.mALEvlR5009727@portal.open-bio.org> <8b34ec180811210732o4266a87ey2a4c14a7ddc5ead5@mail.gmail.com> Message-ID: <320fb6e00811210745yc8e796ei9bc04a2e2cebda8b@mail.gmail.com> On Fri, Nov 21, 2008 at 3:32 PM, Bartek Wilczynski wrote: > Hello, > > I fixed the bug (changed both uses of string.atof() to float() ), and > committed to CVS, although I cannot close it in Bugzilla (my > dev.open-bio account does not seem to work for bugzilla). > > cheers > Bartek Wilczynski Thanks Bartek, I was partway through the build process for the Biopython 1.49 release, but I've got that latest Bio/AlignAce/Parser.py file now. I've closed Bug 2680 - I'm not sure how the permissions work on Bugzilla exactly... On a related note - could you write a unit test for Bio.AlignAce please? Thanks, Peter From biopython at maubp.freeserve.co.uk Fri Nov 21 16:07:00 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 21 Nov 2008 16:07:00 +0000 Subject: [Biopython-dev] Warnings from epydoc Message-ID: <320fb6e00811210807xed03553x24e3abc571e9f20a@mail.gmail.com> Hi all, Something that I could have mentioned when I built the beta is there are a lot of warnings from epydoc. Ignoring a few from deprecated modules etc, there is a whole class as follows: Warning: Module Bio.KDTree.KDTree is shadowed by a variable with the same name. Warning: Module Bio.PDB.DSSP is shadowed by a variable with the same name. Warning: Module Bio.PDB.FragmentMapper is shadowed by a variable with the same name. Warning: Module Bio.PDB.NeighborSearch is shadowed by a variable with the same name. Warning: Module Bio.PDB.PDBIO is shadowed by a variable with the same name. 
Warning: Module Bio.PDB.PDBList is shadowed by a variable with the same name. Warning: Module Bio.PDB.PDBParser is shadowed by a variable with the same name. Warning: Module Bio.PDB.ResidueDepth is shadowed by a variable with the same name. Warning: Module Bio.PDB.StructureAlignment is shadowed by a variable with the same name. Warning: Module Bio.PDB.Superimposer is shadowed by a variable with the same name. Warning: Module Bio.PDB.Vector is shadowed by a variable with the same name. Warning: Module Bio.PDB.parse_pdb_header is shadowed by a variable with the same name. Warning: Module Bio.SVDSuperimposer.SVDSuperimposer is shadowed by a variable with the same name. Warning: Module Bio.SCOP.Residues is shadowed by a variable with the same name. One visible side effect of this in the epydoc output is these modules get shown with an apostrophe suffix for disambiguation. On another point, I think some of the imports used in Bio.PopGen are making epydoc unhappy: +------------------------------------------------------------------------------------------------- | In /usr/local/lib/python2.5/site-packages/Bio/PopGen/SimCoal/Cache.py: | Import failed (but source code parsing was successful). | Error: ImportError: No module named PopGen.SimCoal.Controller (line 14) | +------------------------------------------------------------------------------------------------- | In /usr/local/lib/python2.5/site-packages/Bio/PopGen/SimCoal/Async.py: | Import failed (but source code parsing was successful). | Error: ImportError: No module named PopGen.SimCoal.Controller (line 16) | Taking Bio/PopGen/SimCoal/Cache.py as an example, currently this has: from PopGen.SimCoal.Controller import SimCoalController from PopGen import Config Perhaps this should be changed to either local imports: from Controller import SimCoalController import Config or full imports: from Bio.PopGen.SimCoal.Controller import SimCoalController from Bio.PopGen import Config (Neither tested yet). 
I don't know if the current imports have any downsides (apart from upsetting epydoc), as the current code works and the unit tests pass. Peter From bsouthey at gmail.com Fri Nov 21 16:15:29 2008 From: bsouthey at gmail.com (Bruce Southey) Date: Fri, 21 Nov 2008 10:15:29 -0600 Subject: [Biopython-dev] [Bug 2680] Bio.AlignAce.Parser.py need to import string In-Reply-To: <320fb6e00811210745yc8e796ei9bc04a2e2cebda8b@mail.gmail.com> References: <200811211457.mALEvlR5009727@portal.open-bio.org> <8b34ec180811210732o4266a87ey2a4c14a7ddc5ead5@mail.gmail.com> <320fb6e00811210745yc8e796ei9bc04a2e2cebda8b@mail.gmail.com> Message-ID: <4926DEA1.7020405@gmail.com> Peter wrote: > On Fri, Nov 21, 2008 at 3:32 PM, Bartek Wilczynski > wrote: > >> Hello, >> >> I fixed the bug (changed both uses of string.atof() to float() ), and >> commited to CVS, although I cannot close it in Bugzilla (my >> dev.open-bio account does not seem to work for bugzilla). >> >> cheers >> Bartek Wilczynski >> > > Thanks Bartek, > > I was partway through the build process for the Biopython 1.49 > release, but I've got that latest Bio/AliceAce/Parser.py file now. > I've closed Bug 2680 - I'm not sure how the permissions work on > Bugzilla exactly... > > On a related note - could you write a unit test for Bio.AlignAce please? > > Thanks, > > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > > Hi Bartek, I just started on working through understanding the functionality of the code so it would be really great to the tests and a tutorial section on AlignAce. So far I know that there needs to be at least two tests for AlignAce: 1) Running Bio.AlignAce.AlignAceStandalone 2) Parsing the output from AlignAce There needs to be similar tests for CompareAce. Also, could you please add the following lines to your AlignAce2004 code (I downloaded it from your site yesterday) to standard.h? 
#include #include I needed these to compile AlignAce under Linux with gcc version 4.3.2. I would also suggest not to include binaries because they are statically linked to old C++ libraries. Running just './AlignACE' gives the error: ./AlignACE: error while loading shared libraries: libstdc++.so.5: cannot open shared object file: No such file or directory Thanks Bruce From biopython at maubp.freeserve.co.uk Fri Nov 21 16:59:08 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 21 Nov 2008 16:59:08 +0000 Subject: [Biopython-dev] Biopython 1.49 released Message-ID: <320fb6e00811210859n2d128fd6nc21ad1012e1d93bf@mail.gmail.com> Dear Biopythoneers, We are pleased to announce the release of Biopython 1.49. There have been some significant changes since Biopython 1.48 was released a few months ago, which is why we initially released a beta for wider testing. Thank you to all those who tried this and reported the minor problems uncovered. As previously announced, the big news is that Biopython now uses NumPy rather than its precursor Numeric (the original Numerical Python library). As in the previous releases, Biopython 1.49 supports Python 2.3, 2.4 and 2.5 but should now also work fine on Python 2.6. Please note that we intend to drop support for Python 2.3 in a couple of releases time. We also have some new functionality, starting with the basic sequence object (the Seq class) which now has more methods. This encourages a more object orientated coding style, and makes basic biological operations like transcription and translation more accessible and discoverable. Our BioSQL interface can now optionally fetch the NCBI taxonomy on demand when loading sequences (via Bio.Entrez) allowing you to populate the taxon/taxon_name tables gradually. Also, BioSQL should now work with the psycopg2 driver for PostgreSQL (as well as the older psycopg driver), and the handling of feature locations has also been improved. 
We've also updated the Biopython Tutorial and Cookbook (also available in PDF). http://biopython.org/DIST/docs/tutorial/Tutorial.html http://biopython.org/DIST/docs/tutorial/Tutorial.pdf Finally, our old parsing infrastructure (Martel and Bio.Mindy) is now considered to be deprecated, meaning mxTextTools is no longer required to use Biopython. This should not affect any of the typically used parsers (e.g. Bio.SeqIO and Bio.AlignIO). Given there have been more changes than in recent Biopython releases, please do check your old scripts still work fine, and let us know on the mailing list or file a bug if there is anything wrong. Source distributions and Windows installers are available from the Biopython website: http://biopython.org/wiki/Download Thanks! -Peter on behalf of the Biopython developers P.S. You may wish to subscribe to our news feed. For RSS links etc, see: http://biopython.org/wiki/News From biopython at maubp.freeserve.co.uk Fri Nov 21 17:05:46 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 21 Nov 2008 17:05:46 +0000 Subject: [Biopython-dev] CVS freeze for Biopython 1.49 In-Reply-To: <320fb6e00811210438v272d32afta03497a846716df6@mail.gmail.com> References: <320fb6e00811210438v272d32afta03497a846716df6@mail.gmail.com> Message-ID: <320fb6e00811210905i4835819bvb4955b05658ef535@mail.gmail.com> > If there are no last minute objections, my plan is to do the Biopython > 1.49 release this afternoon, hopefully starting after lunch - in about > one hour's time. > > Please **consider CVS frozen from now**. Hopefully I'll have the > build done within the next 12 hours, including the Windows installers. OK, the release is out. Thanks everyone! I haven't sat down and counted, but it feels like there were more people involved and taking an interest than for Biopython 1.48, which is great. > Once the release is out, we'll give it a few days just in case there > are any issues to force a re-release, and then reopen CVS. 
The CVS "freeze" is over, but for the next couple of days, please only commit small bug fixes and documentation improvements. Barring any surprises, we can expect to start looking at adding new code mid next week: > Tiago has some more PopGen code waiting, and there is also > GenomeDiagram to look forward to (Bug 2671). Have a good weekend, Regards, Peter From bugzilla-daemon at portal.open-bio.org Fri Nov 21 17:24:55 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 21 Nov 2008 12:24:55 -0500 Subject: [Biopython-dev] [Bug 2678] Bio.Entrez module does not always retrieve or find DTD files In-Reply-To: Message-ID: <200811211724.mALHOt8x003395@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2678 ------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-21 12:24 EST ------- Looking at the code for the external_entity_ref_handler function in Bio/Entrez/Parser.py it doesn't actually attempt to cache missing DTD files. Would this be a worthwhile enhancement? We would have to cope with the fact that the process may not have permissions to write to the DTD directory, perhaps by falling back on the system temp folder? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Nov 21 19:22:36 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 21 Nov 2008 14:22:36 -0500 Subject: [Biopython-dev] [Bug 2591] GenBank files misparsed for long organism names In-Reply-To: Message-ID: <200811211922.mALJMa8Q011752@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2591 ------- Comment #3 from joelb at lanl.gov 2008-11-21 14:22 EST ------- I never heard back from info at genbank, so I found a different contact there and I just re-sent the problem. 
I'll follow up when I hear something. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Nov 21 19:31:26 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 21 Nov 2008 14:31:26 -0500 Subject: [Biopython-dev] [Bug 2681] New: BioSQL: record annotations enhancements Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2681 Summary: BioSQL: record annotations enhancements Product: Biopython Version: Not Applicable Platform: PC OS/Version: Linux Status: NEW Severity: enhancement Priority: P2 Component: BioSQL AssignedTo: biopython-dev at biopython.org ReportedBy: cymon.cox at gmail.com BioSQL storage and retrieval of record annotations. See also bug 2396. Patch fixes 3 annotations: 1) Fixed date/dates typo. 2) comment's were being stored by not retrieved - fixed with test. 3) A 'reference' annotation, even if an empty list, was being retrieved in a DBSeqRecord. Fixed so that if there are no references there is no annotation in DBSeqRecord. Other annotations: 'date', 'ncbi_taxid', 'gi', and 'contig' are the only annotations we are not handling correctly in the test suite. 'date' can be ignored if present in DBSeqRecord but absent in SeqRecord because the current date is entered into table if a date is not present in the record. Annotation 'ncbi_taxid' will be present in the DBSeqRecords even when not present in the loaded SeqRecord as they are grabbed from the taxon table. We can therefore ignore this specific comparision: old record absent, new record present. Some swiss prot SeqRecords have ncbi_taxid and they are retrieved correctly by DBSeqRecord. TODO: others have ncbi_taxid that is missing from the retrieved DBSeqRecord: sp012, sp014, Swissprot, fasta, and EMBL SeqRecords dont have a gi annotation, retrieved DBSeqRecords do. 
Loader uses the 'record_id' (line 522) as the identifier in bioentry, if the gi annotation is missing, which is pulled as the gi annotation. So the swissprot, fasta, and embl DBSeqRecords return the accession as the gi (GenBank identifier). I think this is misleading; annotation 'gi' in the DBSeqRecord should really be named a more generic 'identifier'... What to do here? 'contig' is ignored by loader because it's a SeqFeature object. Is there any reason it couldnt be loaded and retrieved? (record is GenBank/NT_019265.gb) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Nov 21 19:32:43 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 21 Nov 2008 14:32:43 -0500 Subject: [Biopython-dev] [Bug 2681] BioSQL: record annotations enhancements In-Reply-To: Message-ID: <200811211932.mALJWhXP012653@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2681 ------- Comment #1 from cymon.cox at gmail.com 2008-11-21 14:32 EST ------- Created an attachment (id=1074) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1074&action=view) BioSQL patch for enhancements to record annotations -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Fri Nov 21 22:41:16 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 21 Nov 2008 17:41:16 -0500 Subject: [Biopython-dev] [Bug 2681] BioSQL: record annotations enhancements In-Reply-To: Message-ID: <200811212241.mALMfGT8026797@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2681 ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-21 17:41 EST ------- (In reply to comment #0) > 1) Fixed date/dates typo. Why is it a typo? Change not checked in. > 2) comment's were being stored by not retrieved - fixed with test. Looks good, except for returning an empty list if there were no comments. > 3) A 'reference' annotation, even if an empty list, was being retrieved in a > DBSeqRecord. Fixed so that if there are no references there is no annotation > in DBSeqRecord. I agree, but preferred a smaller change for this: Checking in BioSQL/BioSeq.py; /home/repository/biopython/biopython/BioSQL/BioSeq.py,v <-- BioSeq.py new revision: 1.33; previous revision: 1.32 done Checking in Tests/test_BioSQL_SeqIO.py; /home/repository/biopython/biopython/Tests/test_BioSQL_SeqIO.py,v <-- test_BioSQL_SeqIO.py new revision: 1.29; previous revision: 1.28 done This was based closely on your patch, so thank you! You are making steady progress through the remaining "TODO" notes I left when writing test_BioSQL_SeqIO.py :) > Some swiss prot SeqRecords have ncbi_taxid and they are retrieved > correctly by DBSeqRecord. TODO: others have ncbi_taxid that is missing > from the retrieved DBSeqRecord: sp012, sp014, Note some swiss prot records may be multi-species, which the BioSQL schema can't cope with. Not sure if that applies here. > Swissprot, fasta, and EMBL SeqRecords dont have a gi annotation, retrieved > DBSeqRecords do. Loader uses the 'record_id' (line 522) as the identifier in > bioentry, if the gi annotation is missing, which is pulled as the gi > annotation. 
There probably is something not quite right here. Are you talking about the bioentry.identifier entry in the database? Perhaps an explicit example might help. As an aside, I think "gi" (GeneIndex used by NCBI) might be better stored in the record.dbxrefs, but that could be a parser change... > 'contig' is ignored by loader because it's a SeqFeature object. Is there any > reason it couldnt be loaded and retrieved? (record is GenBank/NT_019265.gb) I couldn't even say off hand how the CONTIG line in that example would be parsed, let alone how it gets dealt with when loading into BioSQL. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Nov 21 22:42:33 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 21 Nov 2008 17:42:33 -0500 Subject: [Biopython-dev] [Bug 2681] BioSQL: record annotations enhancements In-Reply-To: Message-ID: <200811212242.mALMgXAN026914@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2681 ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-21 17:42 EST ------- P.S. For a little background, see Bug 2396. Looking back I can see why I missed the comments annotation at the time (being stored in a different table). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Fri Nov 21 23:47:13 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 21 Nov 2008 18:47:13 -0500 Subject: [Biopython-dev] [Bug 2678] Bio.Entrez module does not always retrieve or find DTD files In-Reply-To: Message-ID: <200811212347.mALNlDsF030565@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2678 ------- Comment #5 from mdehoon at ims.u-tokyo.ac.jp 2008-11-21 18:47 EST ------- (In reply to comment #4) > Looking at the code for the external_entity_ref_handler function in > Bio/Entrez/Parser.py is doesn't actually attempt to cache missing DTD files. > > Would this be a worthwhile enhancement? We would have to cope with the fact > that the process may not have permissions to write to the DTD directory, > perhaps by falling back on the system temp folder? > I think that there is an easier solution, which is to include all missing DTDs with the Biopython installation. The number of DTDs is limited; I tried to identify all of them but apparently I missed some. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Nov 21 23:49:27 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 21 Nov 2008 18:49:27 -0500 Subject: [Biopython-dev] [Bug 2678] Bio.Entrez module does not always retrieve or find DTD files In-Reply-To: Message-ID: <200811212349.mALNnRMn030720@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2678 ------- Comment #6 from mdehoon at ims.u-tokyo.ac.jp 2008-11-21 18:49 EST ------- > I'll add the DTDs that I noted above, but the problem is intermittent and I > haven't seen the issue arise again at all, this morning. If I see anything > else give an error, I'll make a note here. 
> If the DTD is available locally in Bio/Entrez/DTDs, then Bio.Entrez will read it from there. If not, it tries to download it. This may fail if the servers are busy. If the needed DTDs are saved in Bio/Entrez/DTDs (and installed when Biopython is installed), you won't run into this problem. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sun Nov 23 15:16:53 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 23 Nov 2008 10:16:53 -0500 Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main Biopython distribution In-Reply-To: Message-ID: <200811231516.mANFGraa019222@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2671 ------- Comment #7 from dalloliogm at gmail.com 2008-11-23 10:16 EST ------- (In reply to comment #0) > The major changes that have been made to the version previously available at > http://bioinf.scri.ac.uk/lp are: That's a very nice contribution, thank you!!! This link is wrong, I think you mean http://bioinf.scri.ac.uk/lp/programs.php#genomediagram > -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is. 
From dalloliogm at gmail.com Sun Nov 23 17:33:54 2008 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Sun, 23 Nov 2008 18:33:54 +0100 Subject: [Biopython-dev] blog article on GenomeDiagram in Biopython Message-ID: <5aa3b3570811230933n2de8af3lf31d3c4b962930a3@mail.gmail.com> Hi people, I thought that the inclusion of GenomeDiagram in Biopython is such interesting news that I wrote a blog post on it: - http://bioinfoblog.it/2008/11/genome-diagrams-included-in-biopython-150/ I have used images from some tutorials without asking; I hope that is not a problem. Cheers! :) On Sun, Nov 23, 2008 at 4:16 PM, wrote: > http://bugzilla.open-bio.org/show_bug.cgi?id=2671 > > -- ----------------------------------------------------------- My Blog on Bioinformatics (italian): http://bioinfoblog.it From mjldehoon at yahoo.com Mon Nov 24 06:44:13 2008 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Sun, 23 Nov 2008 22:44:13 -0800 (PST) Subject: [Biopython-dev] Rethinking Biopython's testing framework Message-ID: <871524.42970.qm@web62403.mail.re1.yahoo.com> Hi everybody, Biopython's testing framework is built on top of Python's unit testing framework. Python's unit testing framework makes use of assertion statements to compare the result of a command to the expected result. Biopython uses test scripts that print output to stdout, together with an output file that contains the correct output. After running each test script, it compares the generated output with the correct output to see if the test was successful. This approach can be useful for modules that deal with different file formats. For example, you can read in a file in one format, write it out in a different format, and compare it with the expected result.
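For readers comparing the two styles described here, a minimal sketch in present-day Python (not the Python 2 of this thread) might look as follows. The function reverse_complement, the test class, and the expected strings are made-up stand-ins, not Biopython code, and the "output file" is replaced by an in-memory string to keep the example self-contained.

```python
import contextlib
import io
import unittest


def reverse_complement(seq):
    """Toy function under test (hypothetical, not Biopython's implementation)."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq))


# Style 1: print-and-compare. The test script only prints; the captured
# text is then compared with a stored copy of the correct output.
def print_and_compare_script():
    print(reverse_complement("AACG"))
    print(reverse_complement("TTTT"))


# Stands in for the separate file that holds the correct output.
EXPECTED_OUTPUT = "CGTT\nAAAA\n"


def run_print_and_compare():
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        print_and_compare_script()
    return buffer.getvalue() == EXPECTED_OUTPUT


# Style 2: Python's unittest. Each expected value sits next to the call
# it checks, so no separate output file is needed.
class TestReverseComplement(unittest.TestCase):
    def test_mixed(self):
        self.assertEqual(reverse_complement("AACG"), "CGTT")

    def test_homopolymer(self):
        self.assertEqual(reverse_complement("TTTT"), "AAAA")


def run_unittest_style():
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(TestReverseComplement)
    # Send the runner's report to a throwaway stream to keep stdout clean.
    result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
    return result.wasSuccessful()
```

In the first style a failure shows up only as a difference between two blocks of text; in the second, the failing assertion names the exact call and the value it expected.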
However, more than half of Biopython's tests do not actually make use of this testing framework: test_BioSQL test_CAPS test_Cluster test_CodonTable test_Compass test_Crystal test_DocSQL test_EmbossPrimer test_Entrez test_Fasta test_GACrossover test_GAMutation test_GAOrganism test_GAQueens test_GARepair test_GASelection test_GFF test_GFF2 test_GraphicsChromosome test_GraphicsDistribution test_GraphicsGeneral test_HMMCasino test_HMMGeneral test_HotRand test_KDTree test_KeyWList test_LogisticRegression test_Medline test_NNExclusiveOr test_NNGene test_NNGeneral test_Pathway test_PopGen_FDist test_PopGen_FDist_nodepend test_PopGen_SimCoal test_PopGen_SimCoal_nodepend test_Registry test_Restriction test_SCOP_Astral test_SCOP_Cla test_SCOP_Des test_SCOP_Dom test_SCOP_Hie test_SCOP_Raf test_SCOP_Residues test_SCOP_Scop test_Wise test_docstrings test_kNN test_lowess test_psw These tests have trivial output, for example test_Cluster: test_Cluster test_clusterdistance (test_Cluster.TestCluster) ... ok test_distancematrix_kmedoids (test_Cluster.TestCluster) ... ok test_kcluster (test_Cluster.TestCluster) ... ok test_matrix_parse (test_Cluster.TestCluster) ... ok test_median_mean (test_Cluster.TestCluster) ... ok test_somcluster (test_Cluster.TestCluster) ... ok test_treecluster (test_Cluster.TestCluster) ... ok ---------------------------------------------------------------------- Ran 7 tests in 0.015s OK I suspect that for many of the remaining tests Biopython's unit testing framework doesn't bring any real advantage, but is used anyway solely because it currently is the standard in Biopython. Personally, I find Python's unit testing framework easier to understand than Biopython's testing framework. It doesn't need a separate output file, and it is easier to match each line of code with the correct behavior. I would therefore like to suggest to move from Biopython's testing framework to Python's testing framework. 
This also relieves us of the task of explaining Biopython's testing framework to contributors, and allows us to make better use of what Python already provides. Comparing output line-by-line, as Biopython's testing framework currently does, can still be used by test scripts that need this functionality. Comments, suggestions, anybody? --Michiel. From dalloliogm at gmail.com Mon Nov 24 09:04:08 2008 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Mon, 24 Nov 2008 10:04:08 +0100 Subject: [Biopython-dev] Rethinking Biopython's testing framework In-Reply-To: <871524.42970.qm@web62403.mail.re1.yahoo.com> References: <871524.42970.qm@web62403.mail.re1.yahoo.com> Message-ID: <5aa3b3570811240104m1442e5dfkd0c0f92c6fa772f9@mail.gmail.com> On Mon, Nov 24, 2008 at 7:44 AM, Michiel de Hoon wrote: > Hi everybody, > > Biopython's testing framework is built on top of Python's unit testing framewerk. Python's unit testing framework makes use of assertion statements to compare the result of a command to the expected result. Hi, I was also proposing to use the doctest framework for some of the modules, and for enhancing documentation. - http://bugzilla.open-bio.org/show_bug.cgi?id=2640 > Biopython uses test scripts that print output to stdout, together with an output file that contains the > correct output. After running each test script, it compares the generated output with the correct > output to see if the test was successful. > > This approach can be useful for modules that deal with different file formats. For example, you can read in a file in one format, write it out in a different format, and compare it with the expected result. > > However, more than half of Biopython's tests do not actually make use of this testing framework: > Do you need help in re-organizing all of these modules? 
> test_BioSQL > test_CAPS > test_Cluster > test_CodonTable > test_Compass > test_Crystal > test_DocSQL > test_EmbossPrimer > test_Entrez > test_Fasta > test_GACrossover > test_GAMutation > test_GAOrganism > test_GAQueens > test_GARepair > test_GASelection > test_GFF > test_GFF2 > test_GraphicsChromosome > test_GraphicsDistribution > test_GraphicsGeneral > test_HMMCasino > test_HMMGeneral > test_HotRand > test_KDTree > test_KeyWList > test_LogisticRegression > test_Medline > test_NNExclusiveOr > test_NNGene > test_NNGeneral > test_Pathway > test_PopGen_FDist > test_PopGen_FDist_nodepend > test_PopGen_SimCoal > test_PopGen_SimCoal_nodepend > test_Registry > test_Restriction > test_SCOP_Astral > test_SCOP_Cla > test_SCOP_Des > test_SCOP_Dom > test_SCOP_Hie > test_SCOP_Raf > test_SCOP_Residues > test_SCOP_Scop > test_Wise > test_docstrings > test_kNN > test_lowess > test_psw > > These tests have trivial output, for example test_Cluster: > > test_Cluster > test_clusterdistance (test_Cluster.TestCluster) ... ok > test_distancematrix_kmedoids (test_Cluster.TestCluster) ... ok > test_kcluster (test_Cluster.TestCluster) ... ok > test_matrix_parse (test_Cluster.TestCluster) ... ok > test_median_mean (test_Cluster.TestCluster) ... ok > test_somcluster (test_Cluster.TestCluster) ... ok > test_treecluster (test_Cluster.TestCluster) ... ok > > ---------------------------------------------------------------------- > Ran 7 tests in 0.015s > > OK > > I suspect that for many of the remaining tests Biopython's unit testing framework doesn't bring any real advantage, but is used anyway solely because it currently is the standard in Biopython. > > Personally, I find Python's unit testing framework easier to understand than Biopython's testing framework. It doesn't need a separate output file, and it is easier to match each line of code with the correct behavior. > > I would therefore like to suggest to move from Biopython's testing framework to Python's testing framework. 
This also relieves us of the task of explaining Biopython's testing framework to contributors, and allows us to make better use of what Python already provides. Comparing output line-by-line, as Biopython's testing framework currently does, can still be used by test scripts that need this functionality. > > Comments, suggestions, anybody? > > --Michiel. > > > > > > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > -- ----------------------------------------------------------- My Blog on Bioinformatics (italian): http://bioinfoblog.it From bartek at rezolwenta.eu.org Mon Nov 24 12:45:52 2008 From: bartek at rezolwenta.eu.org (Bartek Wilczynski) Date: Mon, 24 Nov 2008 13:45:52 +0100 Subject: [Biopython-dev] [Bug 2680] Bio.AlignAce.Parser.py need to import string In-Reply-To: <320fb6e00811210745yc8e796ei9bc04a2e2cebda8b@mail.gmail.com> References: <200811211457.mALEvlR5009727@portal.open-bio.org> <8b34ec180811210732o4266a87ey2a4c14a7ddc5ead5@mail.gmail.com> <320fb6e00811210745yc8e796ei9bc04a2e2cebda8b@mail.gmail.com> Message-ID: <8b34ec180811240445w3e6e97d8k38c1740e84372184@mail.gmail.com> On Fri, Nov 21, 2008 at 4:45 PM, Peter wrote: > > On a related note - could you write a unit test for Bio.AlignAce please? > Hi Peter, I do not have much experience with writing unit tests but I would like to do it (treating it as an opportunity to learn more on unit tests). There are two issues which are somewhat related to this: - I have some more code related to sequence motif analysis which I'm using myself and could contribute as an extension to BIo.AlignACE. If people are interested in having this in biopython, it would be sensible to think about refactoring Bio.AlignACE and Bio.MEME which both provide a Motif class with largely overlapping functionality. I could do that and at the same time write unit tests for the new version. 
For that it would be cool to get input from all current or potential users of this functionality. I'll think about it a little and maybe write to biopython-users list. - The other issue is connected with the type of the tests I should write. Since Michiel brought this topic up recently, I'd like to know whether I should do it in the python (doctest) or biopython way. cheers Bartek -- Bartek Wilczynski ================== Postdoctoral fellow EMBL, Furlong group Meyerhoffstrasse 1, 69012 Heidelberg, Germany tel: +49 6221 387 8433 From bartek at rezolwenta.eu.org Mon Nov 24 14:51:12 2008 From: bartek at rezolwenta.eu.org (Bartek Wilczynski) Date: Mon, 24 Nov 2008 15:51:12 +0100 Subject: [Biopython-dev] Refactoring motif analysis code Message-ID: <8b34ec180811240651k45c11563p9e3dd18ba128f0ac@mail.gmail.com> Hello All, Currently, there are two packages dealing with motif analysis in biopython : Bio.AlignAce (written by me) and Bio.MEME (written by Jason Hackney). Both of them are quite old and they were developed independently so the functionality is largely overlapping. Particularly the files AlignAce/Motif.py and MEME/Motif.py contain almost identical functionality useful for anyone interested in motif analysis of writing a parser for yet another motif searching tool. I'd like to change this and create a new library called Bio.Motif, which would contain: -Motif class for all general functionality concerning motif objects: i/o, comparisons, sequence scanning -AlignAce Parser -MEME Parser When this is completed, we could deprecate the AlignAce and MEME modules. For AlignAce I have most of the code already written, I need to rewrite portions of MEME parser to work with different motif implementation (not a major pain). Then I just need to polish it a bit and provide tests and a short tutorial. After this rather long intro I'd like to ask about several things: - Are there many Bio.AlignAce or Bio.MEME users who would be unhappy about deprecating them? 
- Are there any features which people would find valuable in Bio.Motif - Both MEME and AlignAce are DNA-oriented, I've never worked on Protein motifs myself, but I'd like to know whether anyone is interested in using Bio.Motif for that Any comments/ideas are welcome cheers Bartek -- Bartek Wilczynski ================== Postdoctoral fellow EMBL, Furlong group Meyerhoffstrasse 1, 69012 Heidelberg, Germany tel: +49 6221 387 8433 From dalloliogm at gmail.com Mon Nov 24 15:25:23 2008 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Mon, 24 Nov 2008 16:25:23 +0100 Subject: [Biopython-dev] Refactoring motif analysis code In-Reply-To: <8b34ec180811240651k45c11563p9e3dd18ba128f0ac@mail.gmail.com> References: <8b34ec180811240651k45c11563p9e3dd18ba128f0ac@mail.gmail.com> Message-ID: <5aa3b3570811240725n54f7f624oc1db5fe0b88e3f5a@mail.gmail.com> On Mon, Nov 24, 2008 at 3:51 PM, Bartek Wilczynski wrote: > Hello All, > > Currently, there are two packages dealing with motif analysis in biopython : > Bio.AlignAce (written by me) and Bio.MEME (written by Jason Hackney). Hi, I asked a question about motifs one year ago on this list. Here it is the thread: - http://lists.open-bio.org/pipermail/biopython/2007-September/003727.html I would just like to tell you that I have tried the TAMO framework you suggested me, and found it very useful. I am not using it anymore because I don't need it, but I remember that I liked: - the methods to represent motifs as matrixes of frequencies/occurrencies etc.. - the fact that it was easy to create a motif from an alignment of sequences - the integration it had with this website: http://weblogo.berkeley.edu/logo.cgi. I would suggest you to provide integration with this other web service, which enable to plot the difference between two sequence logos: http://www.twosamplelogo.org/examples.html. Maybe you should contact TAMO's author to ask him if he wants to contribute, because I remember that its framework was really complete. 
> > Both of them are quite old and they were developed independently so > the functionality is largely overlapping. > Particularly the files AlignAce/Motif.py and MEME/Motif.py contain > almost identical functionality useful for > anyone interested in motif analysis of writing a parser for yet > another motif searching tool. > > I'd like to change this and create a new library called Bio.Motif, > which would contain: > -Motif class for all general functionality concerning motif objects: > i/o, comparisons, sequence scanning > -AlignAce Parser > -MEME Parser > > When this is completed, we could deprecate the AlignAce and MEME > modules. For AlignAce I have most of the code > already written, I need to rewrite portions of MEME parser to work > with different motif implementation (not a major pain). > Then I just need to polish it a bit and provide tests and a short tutorial. > > After this rather long intro I'd like to ask about several things: > - Are there many Bio.AlignAce or Bio.MEME users who would be unhappy > about deprecating them? 
> - Are there any features which people would find valuable in Bio.Motif > - Both MEME and AlignAce are DNA-oriented, I've never worked on > Protein motifs myself, but I'd like to know whether anyone is > interested in using Bio.Motif for that > > Any comments/ideas are welcome > > cheers > Bartek > > -- > Bartek Wilczynski > ================== > Postdoctoral fellow > EMBL, Furlong group > Meyerhoffstrasse 1, > 69012 Heidelberg, > Germany > tel: +49 6221 387 8433 > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > -- ----------------------------------------------------------- My Blog on Bioinformatics (italian): http://bioinfoblog.it From bsouthey at gmail.com Mon Nov 24 15:54:32 2008 From: bsouthey at gmail.com (Bruce Southey) Date: Mon, 24 Nov 2008 09:54:32 -0600 Subject: [Biopython-dev] Refactoring motif analysis code In-Reply-To: <8b34ec180811240651k45c11563p9e3dd18ba128f0ac@mail.gmail.com> References: <8b34ec180811240651k45c11563p9e3dd18ba128f0ac@mail.gmail.com> Message-ID: <492ACE38.1090301@gmail.com> Bartek Wilczynski wrote: > Hello All, > > Currently, there are two packages dealing with motif analysis in biopython : > Bio.AlignAce (written by me) and Bio.MEME (written by Jason Hackney). > Actually I am not that thrilled with the licenses for these packages and similar packages because these are free only for academic use. To me this clashes with the spirit of an open-sourced project especially a BSD-licensed one. But if there is a need for such modules then these modules should be included. > Both of them are quite old and they were developed independently so > the functionality is largely overlapping. > Particularly the files AlignAce/Motif.py and MEME/Motif.py contain > almost identical functionality useful for > anyone interested in motif analysis of writing a parser for yet > another motif searching tool. 
> > I'd like to change this and create a new library called Bio.Motif, > which would contain: > -Motif class for all general functionality concerning motif objects: > i/o, comparisons, sequence scanning > -AlignAce Parser > -MEME Parser > > While it is only free for academic use, have you seen TAMO? *TAMO: a flexible, object-oriented framework for analyzing transcriptional regulation using DNA-sequence motifs. * Bioinformatics. 2005 Jul 15;21(14):3164-5. http://fraenkel.mit.edu/TAMO/ > When this is completed, we could deprecate the AlignAce and MEME > modules. For AlignAce I have most of the code > already written, I need to rewrite portions of MEME parser to work > with different motif implementation (not a major pain). > Then I just need to polish it a bit and provide tests and a short tutorial. > > After this rather long intro I'd like to ask about several things: > - Are there many Bio.AlignAce or Bio.MEME users who would be unhappy > about deprecating them? > Well, I am not sure how many used Bio.AlignAce given the Parser.py bug :-) Based on the CVS, both have been untouched for about three years. Also, what species are these used for? One of the papers of AlignAce indicate that the base composition was set for yeast. > - Are there any features which people would find valuable in Bio.Motif > - Both MEME and AlignAce are DNA-oriented, I've never worked on > Protein motifs myself, but I'd like to know whether anyone is > interested in using Bio.Motif for that > > Any comments/ideas are welcome > > cheers > Bartek > > Personally I would be interested in a general protein motif finding module because of my current research. However, I do have a different view with respect to the Biopython community as indicated above with the licenses. 
Bruce From bsouthey at gmail.com Mon Nov 24 17:47:21 2008 From: bsouthey at gmail.com (Bruce Southey) Date: Mon, 24 Nov 2008 11:47:21 -0600 Subject: [Biopython-dev] Use of depreciated string functions In-Reply-To: <320fb6e00811210726n94e277ex359d93de0855045e@mail.gmail.com> References: <4926D17A.8080101@gmail.com> <320fb6e00811210726n94e277ex359d93de0855045e@mail.gmail.com> Message-ID: <492AE8A9.1000406@gmail.com> Peter wrote: > On Fri, Nov 21, 2008 at 3:19 PM, Bruce Southey wrote: > >> Hi, >> There are a number of files in Bio that import string. Many of these use >> depreciated functions (since Version 2) that are now string methods mainly >> string.atof(), string.atoi() and string.join(). The only real advantage of >> modifying these is to remove an import statement because these will not be >> removed until Python 3. >> >> Perhaps the one exception is in HotRand.py: hex_digit = >> string.hexdigits.find( letter ) >> >> There are about 23 unique files that I identified via grep and many have >> more than one usage. While changing these is busy work, please let me know >> if you would like me to create patches for the next version of Biopython (ie >> 1.50) or just ignore this. >> > > As you say, there isn't much benefit from doing this other than > removing an import and making another small step towards Python 3.0 > compatibility. We have gradually been phasing out "import string" > already, usually when working on a module which used it. > > Once I've dealt with Biopython 1.49, I'd be happy to look at a patch > to remove more "import string" usage from non-obsolete, non-deprecated > code. It would be a little risky doing this to modules without unit > tests, but that's another area you've shown some interest in anyway... > > Thanks, > > Peter > > Hi, I was planning to get started on with these depending on what time I have available. So just a quick question: Do you want one bug report per patch per file? Or just let me know if there is another way. 
Thanks Bruce From biopython at maubp.freeserve.co.uk Mon Nov 24 18:42:08 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 24 Nov 2008 18:42:08 +0000 Subject: [Biopython-dev] Use of depreciated string functions In-Reply-To: <492AE8A9.1000406@gmail.com> References: <4926D17A.8080101@gmail.com> <320fb6e00811210726n94e277ex359d93de0855045e@mail.gmail.com> <492AE8A9.1000406@gmail.com> Message-ID: <320fb6e00811241042g646ff65fq61d3751537c882b1@mail.gmail.com> On Mon, Nov 24, 2008 at 5:47 PM, Bruce Southey wrote: >> Once I've dealt with Biopython 1.49, I'd be happy to look at a patch >> to remove more "import string" usage from non-obsolete, non-deprecated >> code. It would be a little risky doing this to modules without unit >> tests, but that's another area you've shown some interest in anyway... >> >> Thanks, >> >> Peter > > Hi, > I was planning to get started on with these depending on what time I have > available. So just a quick question: > Do you want one bug report per patch per file? > Or just let me know if there is another way. I'd suggest one general bug, and uploading one patch per module - that way they can be evaluated on a case by case basis (a single huge multi-file patch would be more difficult, and could become out of date). Personally however, I would prioritise more unit test coverage over this, but on the other hand it's the kind of short task you can handle when you have the odd spare 10 minutes. Up to you.
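As a rough illustration of the kind of change such a patch would contain, the sketch below pairs each old string-module call named in this thread with its method or built-in equivalent. It is written for present-day Python, where the old functions no longer exist, so they appear only in comments; the variable names and sample values are invented for the example.

```python
import string

# Old module-level call (Python 2 only)   ->  equivalent used below
# string.atoi("42")                       ->  int("42")
# string.atof("3.5")                      ->  float("3.5")
# string.join(["a", "b"], "-")            ->  "-".join(["a", "b"])

count = int("42")
weight = float("3.5")
label = "-".join(["a", "b"])

# string.hexdigits is a constant, not a function, so a lookup like the one
# quoted from HotRand.py keeps working and still needs the import.
hex_digit = string.hexdigits.find("f")
```

Modules whose only use of the string module is of the first kind can drop the import once the calls are replaced; a module that reads constants such as string.hexdigits cannot.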
Peter From bugzilla-daemon at portal.open-bio.org Mon Nov 24 20:40:49 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 24 Nov 2008 15:40:49 -0500 Subject: [Biopython-dev] [Bug 2681] BioSQL: record annotations enhancements In-Reply-To: Message-ID: <200811242040.mAOKenEi002020@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2681 ------- Comment #4 from cymon.cox at gmail.com 2008-11-24 15:40 EST ------- (In reply to comment #2) > (In reply to comment #0) > > 1) Fixed date/dates typo. > > Why is it a typo? Change not checked in. The function _load_bioentry_date in Loader.py inserts the annotation 'date', if present, or the current date if not, into the bioentry_qualifier_value table. This is pulled by BioSeq.py _retrieve_qualifier_value and stored as the attribute 'dates'. Hence I considered line 307 in BioSeq.py to be a typo, which should be 'date' and not 'dates'. Also, because Loader.py handles dates separately, they should not be handled by the function load_annotations. > > 2) comment's were being stored by not retrieved - fixed with test. > > Looks good, except for returning an empty list if there were no comments. > > > 3) A 'reference' annotation, even if an empty list, was being retrieved in a > > DBSeqRecord. Fixed so that if there are no references there is no annotation > > in DBSeqRecord. 
> > I agree, but preferred a smaller change for this: > > Checking in BioSQL/BioSeq.py; > /home/repository/biopython/biopython/BioSQL/BioSeq.py,v <-- BioSeq.py > new revision: 1.33; previous revision: 1.32 > done > Checking in Tests/test_BioSQL_SeqIO.py; > /home/repository/biopython/biopython/Tests/test_BioSQL_SeqIO.py,v <-- > test_BioSQL_SeqIO.py > new revision: 1.29; previous revision: 1.28 > done Actually, your version of _retrieve_comment never returns comments ;-) On the wider issue: perhaps, it's best if DBSeqRecord's always have the same set of attributes, even if comments and references are empty lists. Trying to regenerate the attributes present in the loaded SeqRecord is, I think, not the way to go, and not possible (or at least currently not attempted) for fasta records. Perhaps we should be coding around the issue in the test suite rather than changing the attributes of the DBSeqRecord so that it passes the test... > > Some swiss prot SeqRecords have ncbi_taxid and they are retrieved > > correctly by DBSeqRecord. TODO: others have ncbi_taxid that is missing > > from the retrieved DBSeqRecord: sp012, sp014, > > Note some swiss prot records may be multi-species, which the BioSQL schema > can't cope with. Not sure if that applies here. Yep, thats exactly what was causing the problem. Currently the code refuses to load an ncbi_taxid, which I think is correct, after all which one should be loaded? Anyway, I'll look into this a bit more... > > Swissprot, fasta, and EMBL SeqRecords dont have a gi annotation, retrieved > > DBSeqRecords do. Loader uses the 'record_id' (line 522) as the identifier in > > bioentry, if the gi annotation is missing, which is pulled as the gi > > annotation. > > There probably is something not quite right here. Are you talking about the > bioentry.identifier entry in the database? Perhaps an explicit example might > help. 
As an aside, I think "gi" (GeneIndex used by NCBI) might be better > stored in the record.dbxrefs, but that could be a parser change... Ah, OK, will look further into this as well... > > 'contig' is ignored by loader because it's a SeqFeature object. Is there any > > reason it couldnt be loaded and retrieved? (record is GenBank/NT_019265.gb) > > I couldn't even say off hand how the CONTIG line in that example would be > parsed, let alone how it gets dealt with when loading into BioSQL. Well, the parser correctly deals with it as a SeqFeature (with a whole bunch of sub_features), but it never gets loaded: it's not dealt with at all and falls off the bottom of the function. I can't see any reason not to load it... C. From bugzilla-daemon at portal.open-bio.org Mon Nov 24 21:40:24 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 24 Nov 2008 16:40:24 -0500 Subject: [Biopython-dev] [Bug 2681] BioSQL: record annotations enhancements In-Reply-To: Message-ID: <200811242140.mAOLeO8n008996@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2681 ------- Comment #5 from cymon.cox at gmail.com 2008-11-24 16:40 EST ------- (In reply to comment #2) > (In reply to comment #0) > > Swissprot, fasta, and EMBL SeqRecords dont have a gi annotation, retrieved > > DBSeqRecords do. Loader uses the 'record_id' (line 522) as the identifier in > > bioentry, if the gi annotation is missing, which is pulled as the gi > > annotation. > > There probably is something not quite right here. Are you talking about the > bioentry.identifier entry in the database? Perhaps an explicit example might > help. As an aside, I think "gi" (GeneIndex used by NCBI) might be better > stored in the record.dbxrefs, but that could be a parser change...
The "gi" annotation of a parsed GenBank record refers to this GenInfo Identifier:

From NCBI: http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html#GInB
"""
"GenInfo Identifier" sequence identification number, in this case, for the nucleotide sequence. If a sequence changes in any way, a new GI number will be assigned. GI sequence identifiers run parallel to the new accession.version system of sequence identifiers.
"""

This is stored in bioentry.identifier.

However, "gi"'s are not present in swissprot, fasta, and embl records, instead the following couplet loads the record.id into the identifier slot:

Loader.py:
519         if "gi" in record.annotations :
520             identifier = record.annotations["gi"]
521         else :
522             identifier = record.id

But of course, the record.id is not the "gi" - so perhaps the bioentry.identifier should be left NULL if the "gi" number is missing. Or we might consider calling the DBSeqRecord attribute "identifier" rather than "gi"...

Here's an example of an EMBL file where the record.id becomes the "gi":

Testing loading from embl format file EMBL/TRBG361.embl
 - AAACAAACCAAATATGGAT...AAA [jfp/7BKv3jTJAU/4jVMrSftEq20] len 1859, X56734.1
 - Retrieving by name/display_id 'X56734',
old annos diff: set([])
new annos diff: set(['dates', 'ncbi_taxid', 'gi'])
OLD:
taxonomy = ['Eukaryota', 'Viridiplantae', 'Streptophyta', 'Embryophyta', 'Tracheophyta', 'Spermatophyta', 'Magnoliophyta', 'eudicotyledons', 'core eudicotyledons', 'rosids', 'eurosids I', 'Fabales', 'Fabaceae', 'Papilionoideae', 'Trifolieae', 'Trifolium']
references = [, ]
accessions = ['X56734', 'S46826']
data_file_division = PLN
organism = Trifolium repens (white clover)
sequence_version = 1
NEW:
dates = ['24-NOV-2008']
ncbi_taxid = 3899
references = [, ]
accessions = ['X56734', 'S46826']
data_file_division = PLN
taxonomy = ['Trifolium repens (white clover)']
gi = X56734.1
organism = Trifolium repens (white clover)
sequence_version = ['1']
ncbi_taxid: 3899

C.
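The two behaviours weighed in this comment can be put side by side in a short sketch. The helper name choose_identifier and its fall_back_to_id flag are invented for illustration and are not part of Loader.py; the first branch mirrors the quoted lines 519-522, while the NULL alternative is only the suggestion made here.

```python
def choose_identifier(annotations, record_id, fall_back_to_id=True):
    """Pick the value for bioentry.identifier (hypothetical helper).

    With fall_back_to_id=True this mirrors the quoted Loader.py lines:
    use the "gi" annotation if present, otherwise the record id.
    With fall_back_to_id=False a missing "gi" yields None, i.e. the
    database column would be left NULL.
    """
    if "gi" in annotations:
        return annotations["gi"]
    return record_id if fall_back_to_id else None


# An EMBL-style record without a "gi" annotation, as in the example output.
embl_annotations = {"accessions": ["X56734", "S46826"]}
current = choose_identifier(embl_annotations, "X56734.1")
proposed = choose_identifier(embl_annotations, "X56734.1", fall_back_to_id=False)
```

With the current behaviour the record id X56734.1 ends up in the identifier column and is later reported as a "gi"; with the alternative the column would simply stay empty for such records.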
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Nov 24 22:51:37 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 24 Nov 2008 17:51:37 -0500 Subject: [Biopython-dev] [Bug 2683] New: Modules with unused string modules Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2683 Summary: Modules with unused string modules Product: Biopython Version: Not Applicable Platform: PC OS/Version: Linux Status: NEW Severity: trivial Priority: P5 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: bsouthey at gmail.com This is a trivial general bug for any Biopython modules that import the string module but do not use it. A different bug will be used for those modules that actually use any depreciated string functions. Please attach any similar modules to this report. AlignAce modules: Bio/AlignAce/AlignAceStandalone.py Bio/AlignAce/CompareAceStandalone.py -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Nov 24 23:05:27 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 24 Nov 2008 18:05:27 -0500 Subject: [Biopython-dev] [Bug 2681] BioSQL: record annotations enhancements In-Reply-To: Message-ID: <200811242305.mAON5Rs2017499@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2681 ------- Comment #6 from cymon.cox at gmail.com 2008-11-24 18:05 EST ------- (In reply to comment #4) > (In reply to comment #2) > > (In reply to comment #0) > > > Some swiss prot SeqRecords have ncbi_taxid and they are retrieved > > > correctly by DBSeqRecord. 
Others have ncbi_taxid that is missing > > > from the retrieved DBSeqRecord: sp012, sp014, > > > > Note some swiss prot records may be multi-species, which the BioSQL schema > > can't cope with. Not sure if that applies here. > > Yep, that's exactly what was causing the problem. Currently the code refuses to > load an ncbi_taxid, which I think is correct, after all which one should be > loaded? Anyway, I'll look into this a bit more... So, how best to handle records with multiple taxa: SwissProt/sp014 has 10 organisms which are currently loaded directly into the taxon_name table: biosql_test=# select name, name_class from taxon_name where taxon_id = 94; name | name_class ------------------------------------------------------------------------------ Oryza sativa (Rice), Nicotiana tabacum (Common tobacco) Hordeum vulgare (Barley), Triticum aestivum (Wheat) Secale cereale (Rye), Zea mays (Maize), Pisum sativum (Garden pea) Spinacia oleracea (Spinach), Capsicum annuum (Bell pepper) Mesembryanthemum crys | scientific name (1 row) That's clearly not a scientific name... The record has the ncbi_taxon_ids: OX NCBI_TaxID=4530, 4097, 4513, 4565, 4550, 4577, 3888, 3562, 4072, 3544, 19 OX 3555, 3696; Which are currently not stored because there is more than one:
Loader.py:
150 ncbi_taxon_id = None
151 if "ncbi_taxid" in record.annotations :
152     #Could be a list of IDs.
153     if isinstance(record.annotations["ncbi_taxid"],list) :
154         if len(record.annotations["ncbi_taxid"])==1 :
155             ncbi_taxon_id = record.annotations["ncbi_taxid"][0]
156     else :
157         ncbi_taxon_id = record.annotations["ncbi_taxid"]
BioSQL is clearly not designed to store records from multiple taxa: one bioentry has one taxon_id. Should biopython be refusing to load such records if the scientific name is not a binomial? What does perl do? C.
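The behaviour described above (store a taxon ID only when it is unambiguous) can be restated as a small standalone function. This is a sketch under the assumption that the `else` on Loader.py line 156 pairs with the `isinstance` test; `single_taxon_id` is a hypothetical name, and it takes a plain annotations dictionary rather than a SeqRecord.

```python
def single_taxon_id(annotations):
    """Return one NCBI taxon ID, or None if it is missing or ambiguous."""
    taxid = annotations.get("ncbi_taxid")
    if isinstance(taxid, list):
        if len(taxid) == 1:
            # A one-element list is unambiguous.
            return taxid[0]
        # Several (or zero) IDs: one bioentry has one taxon_id,
        # so there is nothing sensible to store.
        return None
    # Not a list: either a single value or None when absent.
    return taxid


print(single_taxon_id({"ncbi_taxid": ["3899"]}))
print(single_taxon_id({"ncbi_taxid": ["4530", "4097", "4513"]}))
print(single_taxon_id({}))
```

A multi-species record such as sp014 falls into the second case, which matches the observation that its ncbi_taxid is absent from the retrieved DBSeqRecord.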
From mjldehoon at yahoo.com Tue Nov 25 04:08:18 2008 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Mon, 24 Nov 2008 20:08:18 -0800 (PST) Subject: [Biopython-dev] Rethinking Biopython's testing framework In-Reply-To: <5aa3b3570811240104m1442e5dfkd0c0f92c6fa772f9@mail.gmail.com> Message-ID: <199296.58154.qm@web62402.mail.re1.yahoo.com> > > However, more than half of Biopython's tests do > > not actually make use of this testing framework: > > Do you need help in re-organizing all of these modules? That would be helpful, but let's see first if there are any objections to my proposal. We'll also have to decide the pathway to change the tests without breaking anything. For the unit tests I listed, the changes should be trivial, but still we need to check if any problems show up. Thanks! --Michiel.
From bugzilla-daemon at portal.open-bio.org Tue Nov 25 14:31:18 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 25 Nov 2008 09:31:18 -0500 Subject: [Biopython-dev] [Bug 2683] Modules with unused string modules In-Reply-To: Message-ID: <200811251431.mAPEVIYj014396@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2683 ------- Comment #1 from bsouthey at gmail.com 2008-11-25 09:31 EST ------- Bio/Crystal/__init__.py imports but does not appear to use the following modules: array string Seq MutableSeq
From bugzilla-daemon at portal.open-bio.org Tue Nov 25 14:40:23 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 25 Nov 2008 09:40:23 -0500 Subject: [Biopython-dev] [Bug 2683] Modules with unused string modules In-Reply-To: Message-ID: <200811251440.mAPEeN8f015160@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2683 ------- Comment #2 from barwil at gmail.com 2008-11-25 09:40 EST ------- > AlignAce modules: > Bio/AlignAce/AlignAceStandalone.py > Bio/AlignAce/CompareAceStandalone.py > Fixed in CVS now.
From chapmanb at 50mail.com Tue Nov 25 14:40:41 2008 From: chapmanb at 50mail.com (Brad Chapman) Date: Tue, 25 Nov 2008 09:40:41 -0500 Subject: [Biopython-dev] Rethinking Biopython's testing framework In-Reply-To: <871524.42970.qm@web62403.mail.re1.yahoo.com> References: <871524.42970.qm@web62403.mail.re1.yahoo.com> Message-ID: <20081125144041.GC83220@sobchak.mgh.harvard.edu> Hi Michiel; Good thoughts on this; my comments are below. > Biopython's testing framework is built on top of Python's unit testing > framework. Python's unit testing framework makes use of assertion > statements to compare the result of a command to the expected result. > Biopython uses test scripts that print output to stdout, together with > an output file that contains the correct output. After running each > test script, it compares the generated output with the correct output > to see if the test was successful. Agreed with the distinction between the unit tests and the "dump lots of text and compare" approach. I've written both and do think the unit testing/assertion model is more robust since you can go back and actually get some insight into what someone was thinking when they wrote an assertion.
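For readers less familiar with the assertion style being contrasted with "print and compare", a minimal sketch using only the standard library follows. `gc_count` is a hypothetical function under test, not Biopython code; the point is only that each expected result sits next to the call that produces it.

```python
import unittest


def gc_count(sequence):
    """Hypothetical function under test: count G and C letters."""
    upper = sequence.upper()
    return upper.count("G") + upper.count("C")


class TestGCCount(unittest.TestCase):
    def test_mixed_case(self):
        # The expectation is stated in the test, not in a separate output file.
        self.assertEqual(gc_count("GgCcAT"), 4)

    def test_empty(self):
        self.assertEqual(gc_count(""), 0)


def run_suite():
    """Run the test case and return the unittest result object."""
    suite = unittest.TestLoader().loadTestsFromTestCase(TestGCCount)
    return unittest.TextTestRunner(verbosity=2).run(suite)


if __name__ == "__main__":
    run_suite()
```

Run with verbosity 2, this prints one "... ok" line per test method, which is the same kind of output shown for test_Cluster in this thread.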
> However, more than half of Biopython's tests do not actually make use of this testing framework: [...] > These tests have trivial output, for example test_Cluster: > > test_Cluster > test_clusterdistance (test_Cluster.TestCluster) ... ok > test_distancematrix_kmedoids (test_Cluster.TestCluster) ... ok > test_kcluster (test_Cluster.TestCluster) ... ok > test_matrix_parse (test_Cluster.TestCluster) ... ok > test_median_mean (test_Cluster.TestCluster) ... ok > test_somcluster (test_Cluster.TestCluster) ... ok > test_treecluster (test_Cluster.TestCluster) ... ok They really do make use of the framework, but at a higher level. I agree that if you run a single test it makes little difference whether you use 'run_tests.py test_Cluster' or just run 'test_Cluster.py' directly. However, when you are running all the tests as is regularly done in development or before pushing releases, this comparison is important. It will pick out if you get a line like: test_clusterdistance (test_Cluster.TestCluster) ... ERROR instead of the expected ok and report this in the summary for all of the tests. Otherwise this is likely to get lost in all of the results. > Personally, I find Python's unit testing framework easier to > understand than Biopython's testing framework. It doesn't need a > separate output file, and it is easier to match each line of code with > the correct behavior. > > I would therefore like to suggest to move from Biopython's testing > framework to Python's testing framework. This also relieves us of the > task of explaining Biopython's testing framework to contributors, > and allows us to make better use of what Python already provides. > Comparing output line-by-line, as Biopython's testing framework > currently does, can still be used by test scripts that need this > functionality. Is the testing framework you are proposing different from the unit tests used in the individual tests?
How does your proposal manage the higher level functionality of checking if all sub-tests within one of the test suites pass? Brad
From bugzilla-daemon at portal.open-bio.org Tue Nov 25 15:24:33 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 25 Nov 2008 10:24:33 -0500 Subject: [Biopython-dev] [Bug 2683] Modules with unused string modules In-Reply-To: Message-ID: <200811251524.mAPFOXe2019581@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2683 ------- Comment #3 from bsouthey at gmail.com 2008-11-25 10:24 EST ------- Bio/FilteredReader.py imports but does not appear to use the following modules: os string copy from File import UndoHandle
From bugzilla-daemon at portal.open-bio.org Tue Nov 25 16:13:01 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 25 Nov 2008 11:13:01 -0500 Subject: [Biopython-dev] [Bug 2677] BioSQL seqfeature enhancements In-Reply-To: Message-ID: <200811251613.mAPGD1FG024870@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2677 ------- Comment #7 from cymon.cox at gmail.com 2008-11-25 11:13 EST ------- (In reply to comment #6) > (From update of attachment 1072 [details]) > I think this is still a big improvement, but that the > (sub)feature.location_operator issue could wait. We'll need to discuss on the > BioSQL mailing list how this should be handled consistently. > > Leaving this bug open. Further to the "where to put the (sub)feature.location_operator" (eg.
"join", "order") question, this comment appears in the BioPerl MySQL schema for the location_qualifier_value table: -- location qualifiers - mainly intended for fuzzies but anything -- can go in here -- some controlled vocab terms have slots; So, this would seem a suitable place to store the attribute. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Nov 25 16:13:07 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 25 Nov 2008 11:13:07 -0500 Subject: [Biopython-dev] [Bug 2684] New: GenBank/__init__.py: Removing loop over string.whitespace Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2684 Summary: GenBank/__init__.py: Removing loop over string.whitespace Product: Biopython Version: Not Applicable Platform: PC OS/Version: Linux Status: NEW Severity: enhancement Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: bsouthey at gmail.com The function '_clean_location' in GenBank/__init__.py uses a 'for' loop over string.whitespace that removes whitespace from string. A simpler way is to just split the string on whitespace and rejoin it as a single line: location_line=''.join(location_string.split()) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Tue Nov 25 16:14:19 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 25 Nov 2008 11:14:19 -0500 Subject: [Biopython-dev] [Bug 2684] GenBank/__init__.py: Removing loop over string.whitespace In-Reply-To: Message-ID: <200811251614.mAPGEJvT025100@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2684 ------- Comment #1 from bsouthey at gmail.com 2008-11-25 11:14 EST ------- Created an attachment (id=1083) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1083&action=view) Removal of unnecessary loop over string.whitespace
From bugzilla-daemon at portal.open-bio.org Tue Nov 25 16:30:01 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 25 Nov 2008 11:30:01 -0500 Subject: [Biopython-dev] [Bug 2685] New: HotRand provides an unnecessary function to convert hex to integer Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2685 Summary: HotRand provides an unnecessary function to convert hex to integer Product: Biopython Version: Not Applicable Platform: PC OS/Version: Linux Status: NEW Severity: enhancement Priority: P5 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: bsouthey at gmail.com The file Bio/HotRand.py defines the function hex_convert that converts a hex number to an integer number. This functionality is provided by the builtin int() with appropriate radix, i.e. int(hex_number, 16) This function could be removed or replaced to avoid using the string module.
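The two simplifications proposed in bugs 2684 and 2685 can be sketched side by side. This is an illustration only: `clean_with_loop` is a guess at the general shape of the loop being replaced, not a copy of `_clean_location`, and the sample location string is made up.

```python
import string


def clean_with_loop(location_string):
    """Strip whitespace one character at a time (assumed older style)."""
    location_line = location_string
    for ws in string.whitespace:
        location_line = location_line.replace(ws, "")
    return location_line


def clean_with_split(location_string):
    """Split on runs of whitespace and rejoin, as proposed in bug 2684."""
    return "".join(location_string.split())


raw = "join(1..10,\n     20..30)"
print(clean_with_loop(raw))
print(clean_with_split(raw))

# Bug 2685: the builtin int() already parses hexadecimal text.
print(int("ff", 16))    # 255
print(int("0x1A", 16))  # 26
```

For ordinary spaces, tabs and newlines the two cleaning functions give the same result, and the split/join form needs no import of the string module.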
From bugzilla-daemon at portal.open-bio.org Tue Nov 25 16:31:09 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 25 Nov 2008 11:31:09 -0500 Subject: [Biopython-dev] [Bug 2685] HotRand provides an unnecessary function to convert hex to integer In-Reply-To: Message-ID: <200811251631.mAPGV91O027180@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2685 ------- Comment #1 from bsouthey at gmail.com 2008-11-25 11:31 EST ------- Created an attachment (id=1084) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1084&action=view) Replaces hex_convert() with int()
From bugzilla-daemon at portal.open-bio.org Tue Nov 25 16:52:12 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 25 Nov 2008 11:52:12 -0500 Subject: [Biopython-dev] [Bug 2685] HotRand provides an unnecessary function to convert hex to integer In-Reply-To: Message-ID: <200811251652.mAPGqCMt029684@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2685 bsouthey at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1084 is|0 |1 obsolete| | ------- Comment #2 from bsouthey at gmail.com 2008-11-25 11:52 EST ------- Created an attachment (id=1085) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1085&action=view) Messed up the first patch
From bugzilla-daemon at portal.open-bio.org Tue Nov 25 16:53:41 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 25 Nov 2008 11:53:41 -0500 Subject: [Biopython-dev] [Bug 2685] HotRand provides an unnecessary function to convert hex to integer In-Reply-To: Message-ID: <200811251653.mAPGrfPk029811@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2685 bsouthey at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1085 is|0 |1 obsolete| | ------- Comment #3 from bsouthey at gmail.com 2008-11-25 11:53 EST ------- Created an attachment (id=1086) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1086&action=view) Sorry wrong version
From bugzilla-daemon at portal.open-bio.org Tue Nov 25 18:18:59 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 25 Nov 2008 13:18:59 -0500 Subject: [Biopython-dev] [Bug 2683] Modules with unused string modules In-Reply-To: Message-ID: <200811251818.mAPIIxQt006109@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2683 ------- Comment #4 from bsouthey at gmail.com 2008-11-25 13:18 EST ------- These are the last files that I have found in Bio that import the string module but do not use it: IntelliGenetics/__init__.py IntelliGenetics/intelligenetics_format.py IntelliGenetics/Record.py NetCatch.py SCOP/__init__.py PDB/PSEA.py (imports upper)
From bugzilla-daemon at portal.open-bio.org Tue Nov 25 22:18:41 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 25 Nov 2008 17:18:41 -0500 Subject: [Biopython-dev] [Bug 2381] translate and transcribe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200811252218.mAPMIfFX029455@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 mmokrejs at ribosome.natur.cuni.cz changed: What |Removed |Added ---------------------------------------------------------------------------- Summary|translate and transcibe |translate and transcribe |methods for the Seq object |methods for the Seq object |(in Bio.Seq) |(in Bio.Seq) ------- Comment #53 from mmokrejs at ribosome.natur.cuni.cz 2008-11-25 17:18 EST ------- (In reply to comment #27) > Created an attachment (id=1032) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1032&action=view) [details] > Patch to Bio/Seq.py to add start codon handling to translation > > Patch adds a new boolean argument to the translate method and function, called > "init" (rather than my earlier suggestions like "from_start" or "check_start" > which could be considered misleading). > > Docstring: > > init - Boolean, defaults to False. Should translation check the > first codon is a valid initiation (start) codon and translate > it as methionine (M)? If False, nothing special is done with > the first codon. What kind of check is it doing? I think it just forces the first letter to be 'M'. > > > Example usage of the translate function, > > >>> from Bio.Seq import translate > >>> translate("TTGAAACCCTAG") > 'LKP*' > >>> translate("TTGAAACCCTAG", init=True, to_stop=True) > 'MKP' > >>> translate("TTGAAACCCTAG", init=True) > 'MKP*' > >>> translate("TTGAAACCCTAG", to_stop=True) > 'LKP' I don't like the "init" argument either. I would call it force_initiator_Met instead. BTW, non-canonical initiator codon is CUG, where did you find UUG?
Sorry, I got overloaded by many other tasks so haven't read any other follow-ups, I just hit the email from bugzilla by luck.
From bugzilla-daemon at portal.open-bio.org Wed Nov 26 15:57:05 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 26 Nov 2008 10:57:05 -0500 Subject: [Biopython-dev] [Bug 2688] New: Removal of depreciated string functions Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2688 Summary: Removal of depreciated string functions Product: Biopython Version: Not Applicable Platform: PC OS/Version: Linux Status: NEW Severity: minor Priority: P5 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: bsouthey at gmail.com This is a general bug to remove any deprecated string functions from Biopython modules. I apologize in advance for the noise this creates especially due to my mistakes. I have tested and validated the subsequent patches on my Linux system with Python versions 2.3, 2.4, 2.5 and 2.6. However, I do recognize that patches may be in code not used by the tests.
The following files require importing the string module and are thus excluded (although deprecated functions may still be used): Bio/Decode.py - maketrans() Bio/EUtils/POM.py - maketrans() Bio/Prosite/Pattern.py - maketrans() Bio/Seq.py - maketrans() triefind.py - defines string.punctuation + string.whitespace The following files have alternative reports GenBank/__init__.py HotRand.py The following files are deprecated and are excluded: Emboss/Primer.py stringfns.py MetaTool/__init__.py MetaTool/metatool_format.py MetaTool/Record.py NBRF/__init__.py Ndb/__init__.py Transcribe.py The following files import but do not use the string module AlignAce/AlignAceStandalone.py (fixed) AlignAce/CompareAceStandalone.py (fixed) Crystal/__init__.py IntelliGenetics/__init__.py IntelliGenetics/intelligenetics_format.py IntelliGenetics/Record.py NetCatch.py SCOP/__init__.py The following files are known to use string module and have patches: Align/AlignInfo.py Blast/ParseBlastTable.py FSSP/__init__.py NMR/NOEtools.py NMR/xpktools.py PDB/MMCIFParser.py SubsMat/__init__.py Blast/Record.py Compass/__init__.py Data/CodonTable.py Eutils/sourcegen.py Eutils/tests/unittest.py Fasta/FastaAlign.py FilteredReader.py GFF/easy.py HMM/Utilities.py Index.py MEME/Parser.py NeuralNetwork/Gene/Pattern.py NeuralNetwork/Gene/Schema.py Parsers/spark.py PDB/parse_pdb_header.py PDB/PDBList.py PDB/PDBParser.py PDB/PSEA.py SCOP/__init__.py utils.py I did not see a trivial resolution for the functions in: SubsMat/FreqTable.py So I rewrote the functions to avoid using map.
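As a rough illustration of the kind of change these patches make, the sketch below pairs some string-module functions from Python 2 with their string-method equivalents. Which functions each patch actually touches is not stated in the report, so the selection here is an assumption; the old calls appear only in comments so the snippet also runs on interpreters where they no longer exist.

```python
text = "  Bio/Seq.py  Bio/Index.py  "

stripped = text.strip()          # was: string.strip(text)
fields = stripped.split()        # was: string.split(stripped)
joined = ", ".join(fields)       # was: string.join(fields, ", ")
shouted = "psea".upper()         # was: string.upper("psea")
lowered = "PSEA".lower()         # was: string.lower("PSEA")
number = int("42")               # was: string.atoi("42")
position = joined.find("Index")  # was: string.find(joined, "Index")

print(fields)
print(joined)
print(shouted, lowered, number, position)
```

The method forms need no import at all, which is why modules whose only use of string was of this kind can drop the import entirely.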
From bugzilla-daemon at portal.open-bio.org Wed Nov 26 15:58:03 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 26 Nov 2008 10:58:03 -0500 Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions In-Reply-To: Message-ID: <200811261558.mAQFw3wc029231@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2688 ------- Comment #1 from bsouthey at gmail.com 2008-11-26 10:58 EST ------- Created an attachment (id=1088) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1088&action=view) Remove deprecated string functions
From bugzilla-daemon at portal.open-bio.org Wed Nov 26 15:59:27 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 26 Nov 2008 10:59:27 -0500 Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions In-Reply-To: Message-ID: <200811261559.mAQFxR5t029522@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2688 ------- Comment #2 from bsouthey at gmail.com 2008-11-26 10:59 EST ------- Created an attachment (id=1089) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1089&action=view) Blast/Record.py patch
From bugzilla-daemon at portal.open-bio.org Wed Nov 26 16:01:30 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 26 Nov 2008 11:01:30 -0500 Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions In-Reply-To: Message-ID: <200811261601.mAQG1U4h029894@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2688 ------- Comment #3 from bsouthey at gmail.com 2008-11-26 11:01 EST ------- Created an attachment (id=1090) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1090&action=view) Compass/__init__.py deprecated string functions
From bugzilla-daemon at portal.open-bio.org Wed Nov 26 16:02:26 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 26 Nov 2008 11:02:26 -0500 Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions In-Reply-To: Message-ID: <200811261602.mAQG2Qlx030068@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2688 ------- Comment #4 from bsouthey at gmail.com 2008-11-26 11:02 EST ------- Created an attachment (id=1091) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1091&action=view) Data/CodonTable.py
From bugzilla-daemon at portal.open-bio.org Wed Nov 26 16:03:14 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 26 Nov 2008 11:03:14 -0500 Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions In-Reply-To: Message-ID: <200811261603.mAQG3ETM030188@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2688 ------- Comment #5 from bsouthey at gmail.com 2008-11-26 11:03 EST ------- Created an attachment (id=1092) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1092&action=view) Eutils/sourcegen.py
From bugzilla-daemon at portal.open-bio.org Wed Nov 26 16:04:07 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 26 Nov 2008 11:04:07 -0500 Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions In-Reply-To: Message-ID: <200811261604.mAQG47K1030328@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2688 ------- Comment #6 from bsouthey at gmail.com 2008-11-26 11:04 EST ------- Created an attachment (id=1093) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1093&action=view) Eutils/tests/unittest.py
From bugzilla-daemon at portal.open-bio.org Wed Nov 26 16:05:14 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 26 Nov 2008 11:05:14 -0500 Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions In-Reply-To: Message-ID: <200811261605.mAQG5EUu030457@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2688 ------- Comment #7 from bsouthey at gmail.com 2008-11-26 11:05 EST ------- Created an attachment (id=1094) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1094&action=view) Fasta/FastaAlign.py
From bugzilla-daemon at portal.open-bio.org Wed Nov 26 16:06:35 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 26 Nov 2008 11:06:35 -0500 Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions In-Reply-To: Message-ID: <200811261606.mAQG6ZqF030610@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2688 ------- Comment #8 from bsouthey at gmail.com 2008-11-26 11:06 EST ------- Created an attachment (id=1095) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1095&action=view) FSSP/__init__.py
From bugzilla-daemon at portal.open-bio.org Wed Nov 26 16:09:26 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 26 Nov 2008 11:09:26 -0500 Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions In-Reply-To: Message-ID: <200811261609.mAQG9QMf030939@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2688 bsouthey at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1095 is|0 |1 obsolete| | ------- Comment #9 from bsouthey at gmail.com 2008-11-26 11:09 EST ------- Created an attachment (id=1096) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1096&action=view) FSSP/__init__.py corrected Got the files in the wrong order.
From bugzilla-daemon at portal.open-bio.org Wed Nov 26 16:10:25 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 26 Nov 2008 11:10:25 -0500 Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions In-Reply-To: Message-ID: <200811261610.mAQGAP10031066@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2688 ------- Comment #10 from bsouthey at gmail.com 2008-11-26 11:10 EST ------- Created an attachment (id=1097) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1097&action=view) GFF/easy.py
From bugzilla-daemon at portal.open-bio.org Wed Nov 26 16:11:19 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 26 Nov 2008 11:11:19 -0500 Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions In-Reply-To: Message-ID: <200811261611.mAQGBJ28031191@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2688 ------- Comment #11 from bsouthey at gmail.com 2008-11-26 11:11 EST ------- Created an attachment (id=1098) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1098&action=view) HMM/Utilities.py
From bugzilla-daemon at portal.open-bio.org Wed Nov 26 16:31:52 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 26 Nov 2008 11:31:52 -0500 Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions In-Reply-To: Message-ID: <200811261631.mAQGVqef001363@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2688 ------- Comment #12 from bsouthey at gmail.com 2008-11-26 11:31 EST ------- Created an attachment (id=1099) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1099&action=view) Index.py
From bugzilla-daemon at portal.open-bio.org Wed Nov 26 16:32:37 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 26 Nov 2008 11:32:37 -0500 Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions In-Reply-To: Message-ID: <200811261632.mAQGWbYF001446@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2688 ------- Comment #13 from bsouthey at gmail.com 2008-11-26 11:32 EST ------- Created an attachment (id=1100) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1100&action=view) MEME/Parser.py
From bugzilla-daemon at portal.open-bio.org Wed Nov 26 16:33:41 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 26 Nov 2008 11:33:41 -0500 Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions In-Reply-To: Message-ID: <200811261633.mAQGXfww001564@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2688 ------- Comment #14 from bsouthey at gmail.com 2008-11-26 11:33 EST ------- Created an attachment (id=1101) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1101&action=view) NeuralNetwork/Gene/Pattern.py
From bugzilla-daemon at portal.open-bio.org Wed Nov 26 16:34:41 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 26 Nov 2008 11:34:41 -0500 Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions In-Reply-To: Message-ID: <200811261634.mAQGYf0u001687@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2688 ------- Comment #15 from bsouthey at gmail.com 2008-11-26 11:34 EST ------- Created an attachment (id=1102) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1102&action=view) NeuralNetwork/Gene/Schema.py -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Nov 26 16:35:35 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 26 Nov 2008 11:35:35 -0500 Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions In-Reply-To: Message-ID: <200811261635.mAQGZZno001826@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2688 ------- Comment #16 from bsouthey at gmail.com 2008-11-26 11:35 EST ------- Created an attachment (id=1103) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1103&action=view) NMR/NOEtools.py -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Wed Nov 26 16:36:19 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 26 Nov 2008 11:36:19 -0500 Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions In-Reply-To: Message-ID: <200811261636.mAQGaJXQ001918@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2688 ------- Comment #17 from bsouthey at gmail.com 2008-11-26 11:36 EST ------- Created an attachment (id=1104) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1104&action=view) NMR/xpktools.py -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Nov 26 16:37:14 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 26 Nov 2008 11:37:14 -0500 Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions In-Reply-To: Message-ID: <200811261637.mAQGbEX0002035@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2688 ------- Comment #18 from bsouthey at gmail.com 2008-11-26 11:37 EST ------- Created an attachment (id=1105) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1105&action=view) Parsers/spark.py -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Wed Nov 26 16:38:42 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 26 Nov 2008 11:38:42 -0500 Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions In-Reply-To: Message-ID: <200811261638.mAQGcgvH002293@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2688 ------- Comment #19 from bsouthey at gmail.com 2008-11-26 11:38 EST ------- Created an attachment (id=1106) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1106&action=view) Blast/ParseBlastTable.py -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Nov 26 16:39:37 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 26 Nov 2008 11:39:37 -0500 Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions In-Reply-To: Message-ID: <200811261639.mAQGdbdC002442@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2688 ------- Comment #20 from bsouthey at gmail.com 2008-11-26 11:39 EST ------- Created an attachment (id=1107) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1107&action=view) PDB/MMCIFParser.py -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Wed Nov 26 16:40:56 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 26 Nov 2008 11:40:56 -0500 Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions In-Reply-To: Message-ID: <200811261640.mAQGeuHm002669@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2688 ------- Comment #21 from bsouthey at gmail.com 2008-11-26 11:40 EST ------- Created an attachment (id=1108) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1108&action=view) PDB/parse_pdb_header.py -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Nov 26 16:41:56 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 26 Nov 2008 11:41:56 -0500 Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions In-Reply-To: Message-ID: <200811261641.mAQGfuJj002827@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2688 ------- Comment #22 from bsouthey at gmail.com 2008-11-26 11:41 EST ------- Created an attachment (id=1109) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1109&action=view) PDB/PDBList.py -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Wed Nov 26 16:42:41 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 26 Nov 2008 11:42:41 -0500 Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions In-Reply-To: Message-ID: <200811261642.mAQGgfiH002929@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2688 ------- Comment #23 from bsouthey at gmail.com 2008-11-26 11:42 EST ------- Created an attachment (id=1110) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1110&action=view) PDB/PDBParser.py -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Nov 26 16:43:28 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 26 Nov 2008 11:43:28 -0500 Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions In-Reply-To: Message-ID: <200811261643.mAQGhSbJ003019@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2688 ------- Comment #24 from bsouthey at gmail.com 2008-11-26 11:43 EST ------- Created an attachment (id=1111) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1111&action=view) SubsMat/__init__.py -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Wed Nov 26 16:46:00 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 26 Nov 2008 11:46:00 -0500 Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions In-Reply-To: Message-ID: <200811261646.mAQGk0id003484@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2688 ------- Comment #25 from bsouthey at gmail.com 2008-11-26 11:46 EST ------- Created an attachment (id=1112) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1112&action=view) SubsMat/FreqTable.py The two functions involved were rewritten because of the use of map(). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Nov 26 16:49:58 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 26 Nov 2008 11:49:58 -0500 Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions In-Reply-To: Message-ID: <200811261649.mAQGnwds003938@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2688 ------- Comment #26 from bsouthey at gmail.com 2008-11-26 11:49 EST ------- Created an attachment (id=1113) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1113&action=view) utils.py -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
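For readers following Bug 2688: the attachments listed above replace calls to the deprecated functions of the `string` module with the equivalent methods on string objects. The sketch below shows the general kind of substitution involved. It uses made-up data and a hypothetical function name (`parse_fields`), not code from any of the patched files. Reading the `map()` remark in comment #25 as referring to calls such as `map(string.strip, items)` is an assumption.

```python
# Old style (Python 2 only), shown as comments because these
# functions no longer exist in the string module under Python 3:
#   import string
#   fields = string.split(line, ",")
#   fields = map(string.strip, fields)
#   fields[0] = string.upper(fields[0])

def parse_fields(line):
    """Split a comma-separated line and upper-case the first field."""
    # String methods replace string.split() and map(string.strip, ...)
    fields = [field.strip() for field in line.split(",")]
    # The .upper() method replaces string.upper()
    fields[0] = fields[0].upper()
    return fields

print(parse_fields("ala , 0.25, 0.75"))
```

The list comprehension avoids `map()` entirely, which sidesteps the question of which callable to pass once `string.strip` is gone.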
From bugzilla-daemon at portal.open-bio.org Wed Nov 26 16:55:45 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 26 Nov 2008 11:55:45 -0500 Subject: [Biopython-dev] [Bug 2685] HotRand provides an unnecessary function to convert hex to integer In-Reply-To: Message-ID: <200811261655.mAQGtjPA004778@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2685 bsouthey at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1086 is|0 |1 obsolete| | ------- Comment #4 from bsouthey at gmail.com 2008-11-26 11:55 EST ------- Created an attachment (id=1115) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1115&action=view) Modified HotRand.hex_convert() function Hopefully the last attempt to get the right version as a patch! -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bsouthey at gmail.com Wed Nov 26 17:10:57 2008 From: bsouthey at gmail.com (Bruce Southey) Date: Wed, 26 Nov 2008 11:10:57 -0600 Subject: [Biopython-dev] Use of depreciated string functions In-Reply-To: <320fb6e00811241042g646ff65fq61d3751537c882b1@mail.gmail.com> References: <4926D17A.8080101@gmail.com> <320fb6e00811210726n94e277ex359d93de0855045e@mail.gmail.com> <492AE8A9.1000406@gmail.com> <320fb6e00811241042g646ff65fq61d3751537c882b1@mail.gmail.com> Message-ID: <492D8321.2060301@gmail.com> Peter wrote: > On Mon, Nov 24, 2008 at 5:47 PM, Bruce Southey wrote: > >>> Once I've dealt with Biopython 1.49, I'd be happy to look at a patch >>> to remove more "import string" usage from non-obsolete, non-deprecated >>> code. It would be a little risky doing this to modules without unit >>> tests, but that's another area you've shown some interest in anyway... 
>>> >>> Thanks, >>> >>> Peter >>> >> Hi, >> I was planning to get started on with these depending on what time I have >> available. So just a quick question: >> Do you want one bug report per patch per file? >> Or just let me know if there is another way. >> > > I'd suggest one general bug, and uploading one patch per module - that > way the can be evaluated on a case by case basis (a single huge > multi-file patch would be more difficult, and could become out of > date). > > Personally however, I would prioritise more unit test coverage over > this, but on the other hand its the kind of short task you can handle > when you have the odd spare 10 minutes. Up to you. > > Peter > Hi, I have filed Bug 2688 as a general bug for the files in the Bio module that use the depreciated string functions. I listed all the files that I identified that imported string and whether or not I provided a patch for it. Bug 2683 lists those files that import string but do not use it. There is one attachment for each file (excluding mistakes). In addition, Bugs 2684 and 2685 were created because these involve rewritten code that was related to this. I probably should have created a separate one for SubsMat/FreqTable.py although the reason directly involves the string module. Regards Bruce From bugzilla-daemon at portal.open-bio.org Thu Nov 27 01:23:32 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 26 Nov 2008 20:23:32 -0500 Subject: [Biopython-dev] [Bug 2685] HotRand provides an unnecessary function to convert hex to integer In-Reply-To: Message-ID: <200811270123.mAR1NWWu011079@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2685 ------- Comment #5 from mdehoon at ims.u-tokyo.ac.jp 2008-11-26 20:23 EST ------- As far as I can tell, the HotRand.hex_convert function is not used any more in Bio.HotRand or anywhere else in Biopython; its usage was lost in revision 1.3 of Bio.HotRand. 
So I think that we can simply deprecate this function. If there are no objections, I'll add a DeprecationWarning and use Bruce's code in the mean time until the function is removed. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Nov 27 03:06:59 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 26 Nov 2008 22:06:59 -0500 Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions In-Reply-To: Message-ID: <200811270306.mAR36xuB020451@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2688 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1088 is|0 |1 obsolete| | ------- Comment #27 from mdehoon at ims.u-tokyo.ac.jp 2008-11-26 22:06 EST ------- (From update of attachment 1088) Committed to CVS. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
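A sketch of what the deprecation proposed in comment #5 on Bug 2685 could look like. Neither the real signature of `HotRand.hex_convert` nor Bruce's patch appears in this thread, so the function below is a stand-in: it assumes the input is a string of hexadecimal digits and relies on the built-in `int(text, 16)` for the conversion.

```python
import warnings

def hex_convert(hex_text):
    """Stand-in for a deprecated hex-to-integer helper (hypothetical)."""
    warnings.warn(
        "hex_convert is deprecated; use int(text, 16) instead",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller's line
    )
    return int(hex_text, 16)

# Demonstrate that the call still works and that the warning is raised.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    value = hex_convert("ff")

print(value, caught[0].category.__name__)
```

Keeping the function working while it warns gives any remaining callers time to switch before it is removed.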
From bugzilla-daemon at portal.open-bio.org Thu Nov 27 04:16:43 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 26 Nov 2008 23:16:43 -0500 Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions In-Reply-To: Message-ID: <200811270416.mAR4Gh40027250@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2688 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1089 is|0 |1 obsolete| | ------- Comment #28 from mdehoon at ims.u-tokyo.ac.jp 2008-11-26 23:16 EST ------- (From update of attachment 1089) Committed to CVS -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Nov 27 04:29:01 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 26 Nov 2008 23:29:01 -0500 Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions In-Reply-To: Message-ID: <200811270429.mAR4T1tn027991@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2688 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1090 is|0 |1 obsolete| | ------- Comment #29 from mdehoon at ims.u-tokyo.ac.jp 2008-11-26 23:29 EST ------- (From update of attachment 1090) Committed to CVS -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Thu Nov 27 04:45:40 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 26 Nov 2008 23:45:40 -0500 Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions In-Reply-To: Message-ID: <200811270445.mAR4jeph029067@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2688 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1091 is|0 |1 obsolete| | ------- Comment #30 from mdehoon at ims.u-tokyo.ac.jp 2008-11-26 23:45 EST ------- (From update of attachment 1091) Committed to CVS -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Nov 27 06:54:12 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 27 Nov 2008 01:54:12 -0500 Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions In-Reply-To: Message-ID: <200811270654.mAR6sC92005762@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2688 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1092 is|0 |1 obsolete| | ------- Comment #31 from mdehoon at ims.u-tokyo.ac.jp 2008-11-27 01:54 EST ------- (From update of attachment 1092) Committed to CVS -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Thu Nov 27 09:35:50 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 27 Nov 2008 04:35:50 -0500 Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main Biopython distribution In-Reply-To: Message-ID: <200811270935.mAR9Zoxj019658@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2671 ------- Comment #8 from lpritc at scri.sari.ac.uk 2008-11-27 04:35 EST ------- (In reply to comment #7) > (In reply to comment #0) > > > The major changes that have been made to the version previously available at > > http://bioinf.scri.ac.uk/lp are: > > That's a very nice contribution, thank you!!! > This link is wrong, I think you mean > http://bioinf.scri.ac.uk/lp/programs.php#genomediagram Thanks, Marco. You're absolutely correct - and people ought to be able to navigate to there from the link I gave. Thanks for posting the accurate link. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is. From lpritc at scri.ac.uk Thu Nov 27 09:33:43 2008 From: lpritc at scri.ac.uk (Leighton Pritchard) Date: Thu, 27 Nov 2008 09:33:43 +0000 Subject: [Biopython-dev] blog article on GenomeDiagram in Biopython In-Reply-To: <5aa3b3570811230933n2de8af3lf31d3c4b962930a3@mail.gmail.com> Message-ID: Thanks Giovanni, On 23/11/2008 17:33, "Giovanni Marco Dall'Olio" wrote: > I thought that the inclusion of GenomeDiagrams in biopython is such an > interesting news, that I wrote a blog post on it: > - http://bioinfoblog.it/2008/11/genome-diagrams-included-in-biopython-150/ I left a comment there ;) > I have used images from some tutorials without asking, I hope it is > not a problem. No problem at all - I think the old license covered it, and I'm pretty sure that the Biopython license will, too. 
Even if they didn't, as the original copyright holder, I approve ;) L. -- Dr Leighton Pritchard MRSC D131, Plant Pathology Programme, SCRI Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA e:lpritc at scri.ac.uk w:http://www.scri.ac.uk/staff/leightonpritchard gpg/pgp: 0xFEFC205C tel:+44(0)1382 562731 x2405
From bugzilla-daemon at portal.open-bio.org Thu Nov 27 09:57:00 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 27 Nov 2008 04:57:00 -0500 Subject: [Biopython-dev] [Bug 2381] translate and transcribe methods for the Seq object (in Bio.Seq) In-Reply-To: Message-ID: <200811270957.mAR9v0i0021623@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2381 ------- Comment #54 from lpritc at scri.sari.ac.uk 2008-11-27 04:56 EST ------- (In reply to comment #53) > (In reply to comment #27) > > Created an attachment (id=1032) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1032&action=view) [details] [details] > > Patch to Bio/Seq.py to add start codon handling to translation > > > > Patch adds a new boolean argument to the translate method and function, called > > "init" (rather than my earlier suggestions like "from_start" or "check_start" > > which could be considered misleading). [...] > I don't like the "init" argument either. I would call it force_initiator_Met > instead. BTW, non-canonical initiator codon is CUG, where did you found UUG? This may clarify things: From the E. coli K-12 sequencing paper (http://dx.doi.org/10.1126/science.277.5331.1453): "The distribution of start codons is as follows: ATG, 3542; GTG, 612; and TTG, 130. There is also one ATT and possibly a CTG" It's not that unusual an occurrence, and there are a small number of known alternative start codons. 'Forcing' a Met start imposes the result that the first codon is a methionine, rather than checking that the first codon *could be* a methionine. I prefer the second behaviour. L. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
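The distinction drawn in comment #54 (forcing a Met versus checking that the first codon could encode one) can be illustrated with a small sketch. This is not the code from attachment 1032. The function names are hypothetical, and the start codon set holds only the three common E. coli start codons from the quoted paper, so it is deliberately incomplete.

```python
# Start codons taken from the E. coli K-12 counts quoted above;
# an incomplete set used only for illustration.
START_CODONS = {"ATG", "GTG", "TTG"}

def could_start(seq):
    """Return True if the first codon could be an initiator codon."""
    return seq[:3].upper() in START_CODONS

def forced_first_residue(seq):
    """'Forcing' behaviour: always report Met, whatever the codon is."""
    return "M"

def checked_first_residue(seq):
    """'Checking' behaviour: report Met only for a valid start codon."""
    if not could_start(seq):
        raise ValueError("First codon %r is not a start codon" % seq[:3])
    return "M"

print(could_start("GTGAAATAA"))            # GTG is in the set above
print(forced_first_residue("CCCAAATAA"))   # forcing hides the bad codon
```

With the checking variant, a sequence starting with CCC raises a `ValueError` instead of silently yielding a methionine.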
From bugzilla-daemon at portal.open-bio.org Thu Nov 27 10:41:18 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 27 Nov 2008 05:41:18 -0500 Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions In-Reply-To: Message-ID: <200811271041.mARAfITj025395@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2688 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1093 is|0 |1 obsolete| | ------- Comment #32 from mdehoon at ims.u-tokyo.ac.jp 2008-11-27 05:41 EST ------- (From update of attachment 1093) Committed to CVS -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Nov 27 10:46:57 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 27 Nov 2008 05:46:57 -0500 Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions In-Reply-To: Message-ID: <200811271046.mARAkv9t025868@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2688 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1094 is|0 |1 obsolete| | ------- Comment #33 from mdehoon at ims.u-tokyo.ac.jp 2008-11-27 05:46 EST ------- (From update of attachment 1094) Committed to CVS -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Thu Nov 27 11:08:30 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 27 Nov 2008 06:08:30 -0500 Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions In-Reply-To: Message-ID: <200811271108.mARB8U6n027821@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2688 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1096 is|0 |1 obsolete| | ------- Comment #34 from mdehoon at ims.u-tokyo.ac.jp 2008-11-27 06:08 EST ------- (From update of attachment 1096) Fixed in CVS -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Nov 27 11:14:18 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 27 Nov 2008 06:14:18 -0500 Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions In-Reply-To: Message-ID: <200811271114.mARBEI5w028329@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2688 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1097 is|0 |1 obsolete| | ------- Comment #35 from mdehoon at ims.u-tokyo.ac.jp 2008-11-27 06:14 EST ------- (From update of attachment 1097) Committed to CVS -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From mjldehoon at yahoo.com Thu Nov 27 13:09:43 2008 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Thu, 27 Nov 2008 05:09:43 -0800 (PST) Subject: [Biopython-dev] Rethinking Biopython's testing framework In-Reply-To: <20081125144041.GC83220@sobchak.mgh.harvard.edu> Message-ID: <45956.75241.qm@web62406.mail.re1.yahoo.com> > > However, more than half of Biopython's tests do > > not actually make use of this testing framework: > > [...] > > These tests have trivial output, for example > test_Cluster: > > > > test_Cluster > > test_clusterdistance (test_Cluster.TestCluster) ... ok > > test_distancematrix_kmedoids > > (test_Cluster.TestCluster) ... ok > > test_kcluster (test_Cluster.TestCluster) ... ok > > test_matrix_parse (test_Cluster.TestCluster) ... ok > > test_median_mean (test_Cluster.TestCluster) ... ok > > test_somcluster (test_Cluster.TestCluster) ... ok > > test_treecluster (test_Cluster.TestCluster) ... ok > > They really do make use of the framework, but at a higher > level. I agree that if you run a single test it makes little > difference whether you use 'run_tests.py test_Cluster' or just > run 'test_Cluster.py' directly. However, when you are > running all the tests as is regular done in development > or before pushing releases, this comparison is important. It > will pick out if you get a line like: > > test_clusterdistance (test_Cluster.TestCluster) ... ERROR > > instead of the expected ok and report this in the summary > for all of the tests. Otherwise this is likely to get lost > in all of the results. Actually, I never use the summary produced by run_tests.py. I just check which tests failed, and then fix them one by one by running the individual test scripts. > > I would therefore like to suggest to move from > > Biopython's testing framework to Python's testing > > framework. 
This also relieves us of the > > task of explaining Biopython's testing framework > > to contributors, and allows us to make better use > > of what Python already provides. ... > Is the testing framework you are proposing different from > the unit tests used the individual tests? I am proposing to use the regular Python unit testing framework as it is. This means that most Biopython tests do not change at all (or only trivially). The run_tests.py script will need to be modified though to remove the requirement of having an output file for each individual test. > How does your proposed > manage the higher level functionality of checking if all sub-tests > within one of the test suites passes? If one of the sub-tests fails, Python's unit testing framework will tell us so, though (perhaps) not exactly which sub-test fails. However, that is easy to figure out just by running the individual test script by itself. --Michiel From bugzilla-daemon at portal.open-bio.org Thu Nov 27 13:33:46 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 27 Nov 2008 08:33:46 -0500 Subject: [Biopython-dev] [Bug 2683] Modules with unused string modules In-Reply-To: Message-ID: <200811271333.mARDXkHx009514@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2683 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #5 from mdehoon at ims.u-tokyo.ac.jp 2008-11-27 08:33 EST ------- Fixed in CVS, thanks -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
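For contributors unfamiliar with it, a test written for Python's standard `unittest` framework, as proposed in Michiel's message above, looks roughly like the sketch below. The test class and the `mean` helper are invented for illustration and are not taken from test_Cluster.py. The point is that pass or fail is reported by the framework itself, with no expected-output file to compare against.

```python
import unittest

def mean(values):
    """Toy function under test (hypothetical, not part of Biopython)."""
    return sum(values) / float(len(values))

class TestMean(unittest.TestCase):
    def test_mean_of_integers(self):
        self.assertEqual(mean([1, 2, 3]), 2.0)

    def test_empty_input_raises(self):
        self.assertRaises(ZeroDivisionError, mean, [])

# Load and run the tests explicitly; verbosity=2 prints one line per
# test method, ending in "ok", "FAIL" or "ERROR".
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestMean)
result = unittest.TextTestRunner(verbosity=2).run(suite)
print(result.wasSuccessful())
```

A driver script along the lines of run_tests.py could collect the `result` objects from each test module and summarise the failures, which is one possible way to keep the overview Brad asks about without per-test output files.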
From bugzilla-daemon at portal.open-bio.org Thu Nov 27 14:38:04 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 27 Nov 2008 09:38:04 -0500 Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main Biopython distribution In-Reply-To: Message-ID: <200811271438.mAREc4IG018238@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2671 ------- Comment #9 from lpritc at scri.sari.ac.uk 2008-11-27 09:38 EST ------- The revised color/colour code in AbstractDrawer.py causes all bar charts in linear diagrams to be the default colour of light green. A fixed version of AbstractDrawer is provided as an attachment. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is. From bugzilla-daemon at portal.open-bio.org Thu Nov 27 14:39:37 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 27 Nov 2008 09:39:37 -0500 Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main Biopython distribution In-Reply-To: Message-ID: <200811271439.mAREdbXp018415@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2671 ------- Comment #10 from lpritc at scri.sari.ac.uk 2008-11-27 09:39 EST ------- Created an attachment (id=1121) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1121&action=view) Revised AbstractDrawer.py This revision fixes a behaviour where bar charts for linear diagrams cannot be changed from their default colour. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is. 
From bugzilla-daemon at portal.open-bio.org Fri Nov 28 01:33:56 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 27 Nov 2008 20:33:56 -0500 Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions In-Reply-To: Message-ID: <200811280133.mAS1XuXq002406@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2688 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1098 is|0 |1 obsolete| | ------- Comment #36 from mdehoon at ims.u-tokyo.ac.jp 2008-11-27 20:33 EST ------- (From update of attachment 1098) Fixed in CVS -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Nov 28 01:52:10 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 27 Nov 2008 20:52:10 -0500 Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions In-Reply-To: Message-ID: <200811280152.mAS1qAR3003698@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2688 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1099 is|0 |1 obsolete| | ------- Comment #37 from mdehoon at ims.u-tokyo.ac.jp 2008-11-27 20:52 EST ------- (From update of attachment 1099) Fixed in CVS -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Fri Nov 28 02:27:29 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 27 Nov 2008 21:27:29 -0500 Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions In-Reply-To: Message-ID: <200811280227.mAS2RTea005795@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2688 ------- Comment #38 from mdehoon at ims.u-tokyo.ac.jp 2008-11-27 21:27 EST ------- (From update of attachment 1100) Fixed in CVS -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Nov 28 02:27:47 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 27 Nov 2008 21:27:47 -0500 Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions In-Reply-To: Message-ID: <200811280227.mAS2RlEg005835@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2688 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1100 is|0 |1 obsolete| | -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Fri Nov 28 02:55:11 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 27 Nov 2008 21:55:11 -0500 Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions In-Reply-To: Message-ID: <200811280255.mAS2tBTL007510@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2688 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1101 is|0 |1 obsolete| | ------- Comment #39 from mdehoon at ims.u-tokyo.ac.jp 2008-11-27 21:55 EST ------- (From update of attachment 1101) Fixed in CVS -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Nov 28 03:02:25 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 27 Nov 2008 22:02:25 -0500 Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions In-Reply-To: Message-ID: <200811280302.mAS32Pxh008177@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2688 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1102 is|0 |1 obsolete| | ------- Comment #40 from mdehoon at ims.u-tokyo.ac.jp 2008-11-27 22:02 EST ------- (From update of attachment 1102) Fixed in CVS -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Fri Nov 28 04:08:57 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 27 Nov 2008 23:08:57 -0500 Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions In-Reply-To: Message-ID: <200811280408.mAS48vaq012054@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2688 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1103 is|0 |1 obsolete| | ------- Comment #41 from mdehoon at ims.u-tokyo.ac.jp 2008-11-27 23:08 EST ------- (From update of attachment 1103) Fixed in CVS -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Nov 28 04:16:29 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 27 Nov 2008 23:16:29 -0500 Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions In-Reply-To: Message-ID: <200811280416.mAS4GThb012692@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2688 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1104 is|0 |1 obsolete| | ------- Comment #42 from mdehoon at ims.u-tokyo.ac.jp 2008-11-27 23:16 EST ------- (From update of attachment 1104) Fixed in CVS -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Fri Nov 28 04:22:37 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 27 Nov 2008 23:22:37 -0500 Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions In-Reply-To: Message-ID: <200811280422.mAS4MbVR013025@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2688 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1105 is|0 |1 obsolete| | ------- Comment #43 from mdehoon at ims.u-tokyo.ac.jp 2008-11-27 23:22 EST ------- (From update of attachment 1105) Fixed in CVS -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Nov 28 04:50:59 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 27 Nov 2008 23:50:59 -0500 Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions In-Reply-To: Message-ID: <200811280450.mAS4oxjC014450@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2688 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1106 is|0 |1 obsolete| | ------- Comment #44 from mdehoon at ims.u-tokyo.ac.jp 2008-11-27 23:50 EST ------- (From update of attachment 1106) Fixed in CVS -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Fri Nov 28 05:07:15 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 28 Nov 2008 00:07:15 -0500 Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions In-Reply-To: Message-ID: <200811280507.mAS57F3P015386@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2688 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1107 is|0 |1 obsolete| | ------- Comment #45 from mdehoon at ims.u-tokyo.ac.jp 2008-11-28 00:07 EST ------- (From update of attachment 1107) Fixed in CVS -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Nov 28 08:48:30 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 28 Nov 2008 03:48:30 -0500 Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions In-Reply-To: Message-ID: <200811280848.mAS8mUmr028058@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2688 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1108 is|0 |1 obsolete| | ------- Comment #46 from mdehoon at ims.u-tokyo.ac.jp 2008-11-28 03:47 EST ------- (From update of attachment 1108) Fixed in CVS -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Fri Nov 28 10:07:05 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 28 Nov 2008 05:07:05 -0500 Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions In-Reply-To: Message-ID: <200811281007.mASA751F001103@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2688 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1109 is|0 |1 obsolete| | ------- Comment #47 from mdehoon at ims.u-tokyo.ac.jp 2008-11-28 05:07 EST ------- (From update of attachment 1109) Fixed in CVS -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Nov 28 10:22:13 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 28 Nov 2008 05:22:13 -0500 Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions In-Reply-To: Message-ID: <200811281022.mASAMDwt002023@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2688 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1110 is|0 |1 obsolete| | ------- Comment #48 from mdehoon at ims.u-tokyo.ac.jp 2008-11-28 05:22 EST ------- (From update of attachment 1110) Fixed in CVS -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Fri Nov 28 10:29:16 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 28 Nov 2008 05:29:16 -0500 Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions In-Reply-To: Message-ID: <200811281029.mASATGhi002380@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2688 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1111 is|0 |1 obsolete| | ------- Comment #49 from mdehoon at ims.u-tokyo.ac.jp 2008-11-28 05:29 EST ------- (From update of attachment 1111) Fixed in CVS -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Nov 28 10:29:39 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 28 Nov 2008 05:29:39 -0500 Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions In-Reply-To: Message-ID: <200811281029.mASATdU5002440@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2688 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1112 is|0 |1 obsolete| | ------- Comment #50 from mdehoon at ims.u-tokyo.ac.jp 2008-11-28 05:29 EST ------- (From update of attachment 1112) Fixed in CVS -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Fri Nov 28 10:30:23 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 28 Nov 2008 05:30:23 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To:
Message-ID: <200811281030.mASAUNDX002501@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688

mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1113 is|0                           |1
           obsolete|                            |

------- Comment #51 from mdehoon at ims.u-tokyo.ac.jp 2008-11-28 05:30 EST -------
(From update of attachment 1113)
Fixed in CVS

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From biopython at maubp.freeserve.co.uk Fri Nov 28 11:09:30 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 28 Nov 2008 11:09:30 +0000
Subject: [Biopython-dev] Rethinking Biopython's testing framework
In-Reply-To: <45956.75241.qm@web62406.mail.re1.yahoo.com>
References: <20081125144041.GC83220@sobchak.mgh.harvard.edu>
	<45956.75241.qm@web62406.mail.re1.yahoo.com>
Message-ID: <320fb6e00811280309w7b5f0fc6m38795c4dc61c8744@mail.gmail.com>

Hello all,

Sorry for not replying earlier - I've been travelling and didn't get
to check my email as often as I had hoped. I'm going to reply to
several points in this one email...

Marco wrote:
> I was also proposing to use the doctest framework for some of the
> modules, and for enhancing documentation.
> http://bugzilla.open-bio.org/show_bug.cgi?id=2640

As Marco points out, there is also the option of using doctest, which
we're already doing in some of the unit tests (e.g. test_wise.py). I like
the idea of using doctest where we want to include examples in the
docstrings anyway.

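To make the doctest idea above concrete, here is a minimal sketch of a docstring whose examples serve as both documentation and test. The function gc_fraction_sketch and the helper run_docstring_examples are hypothetical names invented for this illustration; they are not part of Biopython, and this is not how test_wise.py is written.

```python
import doctest


def gc_fraction_sketch(sequence):
    """Return the fraction of G and C letters in a sequence string.

    Hypothetical example function (not part of Biopython). The examples
    below are documentation and test at the same time:

    >>> gc_fraction_sketch("GGCC")
    1.0
    >>> gc_fraction_sketch("ATGC")
    0.5
    >>> gc_fraction_sketch("")
    0.0
    """
    if not sequence:
        return 0.0
    upper = sequence.upper()
    return (upper.count("G") + upper.count("C")) / float(len(upper))


def run_docstring_examples():
    """Run the docstring examples above; return (failed, attempted)."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    # Supply the globals explicitly so the examples can see the function
    # regardless of how this file was loaded.
    globs = {"gc_fraction_sketch": gc_fraction_sketch}
    for test in finder.find(gc_fraction_sketch, "gc_fraction_sketch", globs=globs):
        runner.run(test)
    results = runner.summarize(verbose=False)
    return results.failed, results.attempted
```

In a module that is run directly, calling doctest.testmod() under an `if __name__ == "__main__":` guard would do the same job with less code; the explicit runner is used here only so that the pass/fail counts can be inspected.
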
Marco wasn't suggesting this, but just to be clear, I don't think we
should use JUST doctest for all our unit tests. Many test cases would
make misleading documentation, and having lots and lots of doctest
examples would also hide the important parts of the documentation.
Additionally, doctests using input files are not straightforward due
to path issues.

Brad wrote:
> Agreed with the distinction between the unit tests and the "dump
> lots of text and compare" approach. I've written both and do think
> the unit testing/assertion model is more robust since you can go
> back and actually get some insight into what someone was thinking
> when they wrote an assertion.

I have probably written more of the "dump lots of text and compare"
style tests. I think these have a number of advantages:
(1) Easier for beginners to write a test: you can take almost any
example script and use that. You don't have to learn the unit test
framework.
(2) Debugging a failing test in IDLE is much easier - using unit tests
you have all that framework between you and the local scope where the
error happens.
(3) For many broad tests, manually setting up the expected output for
an assert is extremely tedious (e.g. parsing sequences and checking
their checksums).

We could discuss a modification to run_tests.py so that if there is no
expected output file output/test_XXX for test_XXX.py we just run
test_XXX.py and check its return value (I think Michiel had previously
suggested something like this). Perhaps for more robustness, capture
the output and compare it to a predefined list of regular expressions
covering the typical outputs. For example, looking at
output/test_Cluster, the first line is the test name, but the rest
follows the pattern "test_... ok". I imagine only a few output styles
exist.

With such a change, half the unit tests (e.g. test_Cluster.py)
wouldn't need their output file in CVS (output/test_Cluster).

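As a rough sketch of the regular-expression check suggested above (not the actual run_tests.py code), the captured output of a test could be validated like this. The function name output_looks_ok and the exact line format are assumptions made for illustration; the only detail taken from the discussion is that the first line is the test name and the remaining lines follow a "test_... ok" pattern.

```python
import re

# Assumed format: "test_something ... ok", optionally with extra text
# (such as a class name) between the test name and the dots.
OK_LINE = re.compile(r"^test_\S+( .*)? \.\.\. ok$")


def output_looks_ok(test_name, captured_output):
    """Return True if the output is the test name followed only by 'ok' lines."""
    # Ignore blank lines and trailing whitespace.
    lines = [line.rstrip() for line in captured_output.splitlines() if line.strip()]
    if not lines or lines[0] != test_name:
        return False
    return all(OK_LINE.match(line) for line in lines[1:])
```

A check like this would only replace the stored output file for tests whose output is entirely of this form; tests that dump lots of text would still need their expected output in CVS.
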
Michiel de Hoon wrote: > If one of the sub-tests fails, Python's unit testing framework will tell us so, > though (perhaps) not exactly which sub-test fails. However, that is easy to > figure out just by running the individual test script by itself. That won't always work. Consider intermittent network problems, or tests using random data - in general it really is worthwhile having run_tests.py report a little more than just which test_XXX.py module failed. Peter From bugzilla-daemon at portal.open-bio.org Fri Nov 28 11:53:36 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 28 Nov 2008 06:53:36 -0500 Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions In-Reply-To: Message-ID: <200811281153.mASBra4q008163@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2688 ------- Comment #52 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-28 06:53 EST ------- Although I had offered to look over the patches, it looks like Michiel has reviewed and committed them all while I was away, so I don't have to ;) Thank you both! Can we close this bug now? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Fri Nov 28 11:57:35 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 28 Nov 2008 06:57:35 -0500 Subject: [Biopython-dev] [Bug 2685] HotRand provides an unnecessary function to convert hex to integer In-Reply-To: Message-ID: <200811281157.mASBvZ6A008475@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2685 ------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-28 06:57 EST ------- (In reply to comment #5) > As far as I can tell, the HotRand.hex_convert function is not used any more in > Bio.HotRand or anywhere else in Biopython; its usage was lost in revision 1.3 > of Bio.HotRand. So I think that we can simply deprecate this function. If there > are no objections, I'll add a DeprecationWarning and use Bruce's code in the > mean time until the function is removed. +1 on this plan. (I was going to say we should deprecate this function rather than removing it, but you'd already covered that). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Nov 28 12:05:14 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 28 Nov 2008 07:05:14 -0500 Subject: [Biopython-dev] [Bug 2677] BioSQL seqfeature enhancements In-Reply-To: Message-ID: <200811281205.mASC5EY8009077@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2677 ------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-28 07:05 EST ------- (In reply to comment #7) > (In reply to comment #6) > > (From update of attachment 1072 [details] [details]) > > I think this is still a big improvement, but that the > > (sub)feature.location_operator issue could wait. 
We'll > > need to discuss on the > > BioSQL mailing list how this should be handled consistently. > > > > Leaving this bug open. > > Further to the "where to put the (sub)feature.location_operator" (eg. "join", > "order") question, this comment appears in the BioPerl MySQL schema for the > location_qualifier_value table: > > -- location qualifiers - mainly intended for fuzzies but anything > -- can go in here > -- some controlled vocab terms have slots; > > So, this would seem a suitable place to store the attribute. > Yes, but if we record something in the location_qualifier_value table we can't use a NULL term_id (possibly a schema limitation). We therefore need to use a particular ontology, which is where some co-ordination with the other BioSQL projects is needed (so that we all default to the same ontology). I'd meant to send of an email about this to the BioSQL mailing list but didn't get it done before I had to leave for a trip. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Nov 28 12:24:19 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 28 Nov 2008 07:24:19 -0500 Subject: [Biopython-dev] [Bug 2684] GenBank/__init__.py: Removing loop over string.whitespace In-Reply-To: Message-ID: <200811281224.mASCOJSg010226@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2684 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-28 07:24 EST ------- Marking as fixed - I've checked in a simplified version of your patch. See Bio/GenBank/__init__.py revision 1.98 in CVS. 
http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/GenBank/__init__.py?cvsroot=biopython

Thanks Bruce.

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From biopython at maubp.freeserve.co.uk Fri Nov 28 12:37:04 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 28 Nov 2008 12:37:04 +0000
Subject: [Biopython-dev] [BioPython] PubMed find_related
In-Reply-To: <580790.81356.qm@web62404.mail.re1.yahoo.com>
References: <580790.81356.qm@web62404.mail.re1.yahoo.com>
Message-ID: <320fb6e00811280437w8f9f3d2t84716f7a554b913@mail.gmail.com>

On Tue, Nov 25, 2008 at 4:05 AM, Michiel de Hoon wrote:
> >>> from Bio import Entrez
> >>> handle = Entrez.elink(dbfrom='pubmed',id=12345)
> >>> record = Entrez.read(handle)
>
> Feel free to write a section about Entrez.elink for the Biopython documentation :-).
> Currently, this section is almost empty.

This does need a little love, doesn't it. Here is a slightly longer
example which could form the basis of a tutorial entry:

>>> from Bio import Entrez
>>> Entrez.email = "A.N.Other@example.com"
>>> pmid = "12230038"
>>> handle = Entrez.elink(dbfrom='pubmed', id=pmid)
>>> result = Entrez.read(handle)
>>> for link in result[0]["LinkSetDb"][0]['Link']:
...     print(link)

The deeply nested nature of the XML results does suggest that a helper
function in Bio.Entrez would be useful here. Maybe something like:

def find_related(dbfrom, id):
    """Return a list of dictionaries, each holding the Id and Score of a match."""
    # Written as it would appear inside Bio.Entrez itself, where read
    # and elink are module-level functions.
    result = read(elink(dbfrom=dbfrom, id=id))
    return result[0]["LinkSetDb"][0]['Link']

It might make more sense to return just a list of ID strings, but the
score may be interesting.

Peter

From biopython at maubp.freeserve.co.uk Fri Nov 28 13:05:38 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 28 Nov 2008 13:05:38 +0000
Subject: [Biopython-dev] Bio.Entrez batched downloads
Message-ID: <320fb6e00811280505m3b065877r99785f306a356aa@mail.gmail.com>

This is returning to a topic we've discussed in the past - the NCBI
Entrez API is quite low level, and the Bio.Entrez module reflects
this. As a result certain "typical" tasks require more code than one
might expect. In particular, batched downloads of a large result set.

The tutorial covers using Bio.Entrez.efetch in a loop to download a
result set in batches, for example writing out a MedLine or FASTA
format file. This seems like a common need - starting either from a
list of IDs, or better from a history webenv and query_key. I think
there is a use for a Bio.Entrez.batched_efetch or download_many
function to save people re-implementing their own batched downloader
(even just as a copy and paste from the tutorial).

If the NCBI ever gives any explicit guidance on batch sizes then we
can update Biopython centrally - rather than individual scripts
requiring changes everywhere. We might also be able to include some
basic error checking too (e.g. for empty or partial downloads). One
catch is that downloading and concatenating batches as XML files does
not give a valid XML file - but this is safe for MedLine, FASTA,
GenBank etc. This proposed function could raise an exception if used
with XML to avoid this issue.

In terms of the API for getting the data back, there are several options:
* Take an output handle as an argument (which would be written to as
each batch was downloaded)
* Return a handle - the implementation would be a bit more complicated
as we should avoid holding everything in memory, but would then be
very similar to the existing Bio.Entrez.efetch function in its usage.

Other options which I don't like:
* Take an output filename (less flexible than just taking an output handle)
* Return the data as a string (memory concerns with large downloads)

Note that related functions like the deprecated
Bio.PubMed.download_many (and early versions of
Bio.GenBank.download_many) used a complicated function callback
mechanism (which required knowing the file format in advance and
having a parser for it). This doesn't seem sensible for a generic
function. Currently Bio.GenBank.download_many (obsolete, soon to be
deprecated) just makes a single call to Bio.Entrez.efetch, regardless
of the number of records / amount of data expected.

Peter

From biopython at maubp.freeserve.co.uk Fri Nov 28 17:26:45 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 28 Nov 2008 17:26:45 +0000
Subject: [Biopython-dev] Deprecation and removal policy
Message-ID: <320fb6e00811280926v16454fa6t891fcc74e4fa4729@mail.gmail.com>

Back on 27 June 2008, in preparation for what became Biopython 1.47,
Michiel wrote:
> In recent releases, we have been using the rule of thumb to remove all
> modules from a new Biopython release that were deprecated two
> releases ago.

I was thinking that when we made releases about six months apart, this
rule of thumb effectively gave a year's warning. Recently we've made
releases roughly every three months, which translates to only about
six months warning, so I think we should be a little more restrained
in removing deprecated code in future.

As an example, Bio.EUtils was deprecated in favour of Bio.Entrez in
Release 1.48 (Sept 2008). Under the old rule of thumb, we could remove
this module from CVS now (as the deprecation was present in Biopython
1.48 and 1.49). If we release Biopython 1.50 in January or February
2009 (for the sake of argument), that means the deprecation would have
been in place for only four or five months - which seems too rash.

How about a new policy that after adding a deprecation warning, deprecated modules/functions are kept for at least two public releases AND at least 12 months (counting from the first release when they are deprecated - not the date of the CVS change) before being removed? Peter From bugzilla-daemon at portal.open-bio.org Fri Nov 28 20:10:43 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 28 Nov 2008 15:10:43 -0500 Subject: [Biopython-dev] [Bug 2677] BioSQL seqfeature enhancements In-Reply-To: Message-ID: <200811282010.mASKAhuK012846@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2677 ------- Comment #9 from biopython-bugzilla at maubp.freeserve.co.uk 2008-11-28 15:10 EST ------- (In reply to comment #8) > Yes, but if we record something in the location_qualifier_value table we can't > use a NULL term_id (possibly a schema limitation). We therefore need to use a > particular ontology, which is where some co-ordination with the other BioSQL > projects is needed (so that we all default to the same ontology). I'd meant > to send of an email about this to the BioSQL mailing list but didn't get it > done before I had to leave for a trip. I've started a discussion on the BioSQL mailing list, see this thread: http://lists.open-bio.org/pipermail/biosql-l/2008-November/001412.html - me http://lists.open-bio.org/pipermail/biosql-l/2008-November/001414.html - Richard from BioJava http://lists.open-bio.org/pipermail/biosql-l/2008-November/001413.html - me etc. Cymon - if you haven't already done so, I would encourage you to sign up to the BioSQL mailing list. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Sat Nov 29 04:48:46 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 28 Nov 2008 23:48:46 -0500 Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions In-Reply-To: Message-ID: <200811290448.mAT4mkmI008416@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2688 ------- Comment #53 from mdehoon at ims.u-tokyo.ac.jp 2008-11-28 23:48 EST ------- (In reply to comment #52) > Can we close this bug now? > Not yet, there are a few more things to consider in the original description. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sat Nov 29 05:01:12 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 29 Nov 2008 00:01:12 -0500 Subject: [Biopython-dev] [Bug 2685] HotRand provides an unnecessary function to convert hex to integer In-Reply-To: Message-ID: <200811290501.mAT51ClZ009532@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2685 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #7 from mdehoon at ims.u-tokyo.ac.jp 2008-11-29 00:01 EST ------- I used Bruce's patch and added a DeprecationWarning to the hex_convert function, and modified the unit test accordingly. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From mjldehoon at yahoo.com Sat Nov 29 05:13:33 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 28 Nov 2008 21:13:33 -0800 (PST)
Subject: [Biopython-dev] Bio.Entrez batched downloads
In-Reply-To: <320fb6e00811280505m3b065877r99785f306a356aa@mail.gmail.com>
Message-ID: <432417.5854.qm@web62405.mail.re1.yahoo.com>

Sorry, but I am -1 on this. This sounds like software bloat to me. The
reason that the NCBI Entrez API is low level is that they are unable to
predict how users will want to use the NCBI Entrez. We as Biopython know
little more than NCBI, except that our users want to access NCBI Entrez
via Python, so we provide a Python interface to NCBI Entrez.

Also, I don't think that the current situation is unsatisfactory. The
Bio.Entrez API is extremely simple, and with an example in the tutorial
it should be very easy to use; I don't see a problem with copying and
pasting from the tutorial, provided that sufficient information is
available there.

--Michiel.

--- On Fri, 11/28/08, Peter wrote:

> From: Peter
> Subject: [Biopython-dev] Bio.Entrez batched downloads
> To: "BioPython-Dev Mailing List"
> Date: Friday, November 28, 2008, 8:05 AM
> This is returning to a topic we've discussed in the past - the NCBI
> Entrez API is quite low level, and the Bio.Entrez module reflects
> this. As a result certain "typical" tasks require more code than one
> might expect. In particular, batched downloads of a large result set.
>
> The tutorial covers using Bio.Entrez.efetch in a loop to download a
> result set in a batch, for example writing out a MedLine or FASTA
> format file. This seems like a common need - starting either from a
> list of IDs, or better from a history webenv and query_key. I think
> there is a use for a Bio.Entrez.batched_efetch or download_many
> function to save people re-implementing their own batched downloader
> (even just as a copy and paste from the tutorial).
>
> If the NCBI ever give any explicit guidance on batch sizes then we
> can update Biopython centrally - rather than individual scripts
> requiring changes everywhere. We might also be able to include some
> basic error checking (e.g. for empty or partial downloads). One catch
> is that downloading and concatenating batches as XML files does not
> give a valid XML file - but this is safe for MedLine, FASTA, GenBank
> etc. This proposed function could raise an exception if used with XML
> to avoid this issue.
>
> In terms of the API for getting the data back, there are several options:
> * Take an output handle as an argument (which would be written to as
>   each batch was downloaded)
> * Return a handle - the implementation would be a bit more complicated
>   as we should avoid holding everything in memory, but would then be
>   very similar to the existing Bio.Entrez.efetch function in its usage.
>
> Other options which I don't like:
> * Take an output filename (less flexible than just taking an output handle)
> * Return the data as a string (memory concerns with large downloads)
>
> Note that related functions like the deprecated
> Bio.PubMed.download_many (and early versions of
> Bio.GenBank.download_many) used a complicated function call back
> mechanism (which required knowing the file format in advance and
> having a parser for it). This doesn't seem sensible for a generic
> function. Currently Bio.GenBank.download_many (obsolete, soon to be
> deprecated) just makes a single call to Bio.Entrez.efetch, regardless
> of the number of records / amount of data expected.
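For readers following the thread, the "output handle" option described above might look roughly like the sketch below. It is only an illustration under stated assumptions, not the proposed Bio.Entrez function: the names batched_download, fetch and fake_fetch are hypothetical, and fetch stands in for something like Bio.Entrez.efetch with its db, rettype, webenv and query_key arguments already bound, so the example runs without contacting the NCBI.

```python
import io


def batched_download(fetch, total, out_handle, batch_size=100):
    """Write `total` records to out_handle, fetching batch_size at a time.

    `fetch` is a hypothetical callable (an assumption of this sketch) that
    accepts retstart and retmax keyword arguments and returns a readable
    handle, much as Bio.Entrez.efetch would with its other arguments bound.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, total, batch_size):
        size = min(batch_size, total - start)
        handle = fetch(retstart=start, retmax=size)
        try:
            data = handle.read()
        finally:
            handle.close()
        # Basic error check: an empty batch suggests a failed download.
        if not data:
            raise IOError("empty batch at offset %i" % start)
        out_handle.write(data)


def fake_fetch(retstart, retmax):
    """Offline stand-in returning FASTA-like text instead of calling the NCBI."""
    records = [">seq%i\nACGT\n" % i for i in range(retstart, retstart + retmax)]
    return io.StringIO("".join(records))


output = io.StringIO()
batched_download(fake_fetch, total=7, out_handle=output, batch_size=3)
```

With real data one would presumably pass something like functools.partial(Entrez.efetch, db=..., rettype=..., webenv=..., query_key=...) as fetch. The sketch does not guard against XML output, which the message above notes cannot simply be concatenated.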
>
> Peter

From mjldehoon at yahoo.com Sat Nov 29 05:22:10 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 28 Nov 2008 21:22:10 -0800 (PST)
Subject: [Biopython-dev] [BioPython] PubMed find_related
In-Reply-To: <320fb6e00811280437w8f9f3d2t84716f7a554b913@mail.gmail.com>
Message-ID: <246349.44664.qm@web62404.mail.re1.yahoo.com>

> The deeply nested nature of the XML results does suggest that
> a helper function in Bio.Entrez would be useful here. Maybe
> something like:
>
> def find_related(dbfrom, id) :
>     #Returns a list of dictionaries containing Score and ID matched
>     result = read(elink(dbfrom=dbfrom, id=id))
>     return result[0]["LinkSetDb"][0]['Link']
>
> It might make more sense to return just a list of ID
> strings, but the score may be interesting.
>

The problem this user encountered was that the DeprecationWarning in
the PubMed.find_related function contained very little information and
did not mention that Entrez.elink is the appropriate function to use:

"Find related articles in PubMed, returns an ID list (DEPRECATED).
Please use Bio.Entrez instead as described in the Biopython Tutorial."

and in addition that currently the description of Bio.Entrez.elink in
the tutorial is almost empty. Instead of adding a function to Bio.Entrez
that helps this particular user, we should improve our documentation to
enable all users to use Bio.Entrez appropriately.

The set of helper functions to Bio.Entrez that we could write is
virtually endless; we should not go down that path.

--Michiel.
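As an illustration of the point above about uninformative deprecation messages, a warning that names the replacement function could be written along these lines. This is a minimal sketch using only the standard warnings module; the find_related function here is a hypothetical stand-in, not the actual Bio.PubMed code, and the message wording is an assumption.

```python
import warnings


def find_related(pmid):
    """Hypothetical stand-in for a deprecated function such as Bio.PubMed.find_related."""
    warnings.warn(
        "Bio.PubMed.find_related is deprecated; please use "
        "Bio.Entrez.elink(dbfrom='pubmed', id=...) instead, as described "
        "in the Biopython Tutorial.",
        DeprecationWarning,
        stacklevel=2,  # report the caller's line rather than this one
    )
    return []  # placeholder: the real lookup is omitted in this sketch


# Capture the warning so that the message text can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    find_related("12230038")

message = str(caught[0].message)
```

Naming the specific replacement call in the message means a user who sees the warning does not have to search the tutorial to find out which Bio.Entrez function applies.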
From bugzilla-daemon at portal.open-bio.org Sat Nov 29 06:02:01 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 29 Nov 2008 01:02:01 -0500
Subject: [Biopython-dev] [Bug 2688] Removal of depreciated string functions
In-Reply-To:
Message-ID: <200811290602.mAT621Lc012846@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2688

mdehoon at ims.u-tokyo.ac.jp changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

------- Comment #54 from mdehoon at ims.u-tokyo.ac.jp 2008-11-29 01:02 EST -------
All fixed now; I hope I didn't screw up anything.

From mjldehoon at yahoo.com Sat Nov 29 07:04:33 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 28 Nov 2008 23:04:33 -0800 (PST)
Subject: [Biopython-dev] [BioPython] PubMed find_related
In-Reply-To: <320fb6e00811280437w8f9f3d2t84716f7a554b913@mail.gmail.com>
Message-ID: <652169.76582.qm@web62406.mail.re1.yahoo.com>

I've expanded your example a bit and added it to the documentation of
Entrez.elink. Thanks!

--Michiel.

--- On Fri, 11/28/08, Peter wrote:

> From: Peter
> Subject: Re: [BioPython] PubMed find_related
> To: mjldehoon at yahoo.com
> Cc: "BioPython-Dev Mailing List"
> Date: Friday, November 28, 2008, 7:37 AM
> On Tue, Nov 25, 2008 at 4:05 AM, Michiel de Hoon wrote:
> > >>> from Bio import Entrez
> > >>> handle = Entrez.elink(dbfrom='pubmed',id=12345)
> > >>> record = Entrez.read(handle)
> >
> > Feel free to write a section about Entrez.elink for the Biopython
> > documentation :-). Currently, this section is almost empty.
>
> This does need a little love, doesn't it.
> Here is a slightly longer
> example which could form the basis of a tutorial entry:
>
> >>> from Bio import Entrez
> >>> Entrez.email = "A.N.Other@example.com"
> >>> pmid = "12230038"
> >>> handle = Entrez.elink(dbfrom='pubmed', id=pmid)
> >>> result = Entrez.read(handle)
> >>> for link in result[0]["LinkSetDb"][0]['Link'] :
> ...     print(link)
>
> The deeply nested nature of the XML results does suggest that a helper
> function in Bio.Entrez would be useful here. Maybe something like:
>
> def find_related(dbfrom, id) :
>     #Returns a list of dictionaries containing Score and ID matched
>     result = read(elink(dbfrom=dbfrom, id=id))
>     return result[0]["LinkSetDb"][0]['Link']
>
> It might make more sense to return just a list of ID strings, but the
> score may be interesting.
>
> Peter

From biopython at maubp.freeserve.co.uk Sat Nov 29 13:36:16 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sat, 29 Nov 2008 13:36:16 +0000
Subject: [Biopython-dev] [BioPython] PubMed find_related
In-Reply-To: <246349.44664.qm@web62404.mail.re1.yahoo.com>
References: <320fb6e00811280437w8f9f3d2t84716f7a554b913@mail.gmail.com> <246349.44664.qm@web62404.mail.re1.yahoo.com>
Message-ID: <320fb6e00811290536n7fe25b0fxfe78d52b16014a92@mail.gmail.com>

On Sat, Nov 29, 2008 at 5:22 AM, Michiel de Hoon wrote:
>
> The problem this user encountered was that the DeprecationWarning in
> PubMed.find_related function contained very little information and did
> not mention that Entrez.elink is the appropriate function to use:
>
> "Find related articles in PubMed, returns an ID list (DEPRECATED).
> Please use Bio.Entrez instead as described in the Biopython Tutorial."

We could make the deprecation warnings from Bio.PubMed (and the online
bits of Bio.GenBank) a little more explicit about which bits of
Bio.Entrez to use. I made a start on updating Bio/PubMed.py on my work
computer on Friday, so I'll try to remember to finish this off on
Monday.
> and in addition that currently the description of Bio.Entrez.elink in the
> tutorial is almost empty. Instead of adding a function to Bio.Entrez
> that helps this particular user, we should improve our documentation
> to enable all users to use Bio.Entrez appropriately.

The tutorial update for elink looks good (see below).

> The set of helper functions to Bio.Entrez that we could write is
> virtually endless; we should not go down that path.

I take your point - there are lots of possible helper functions we
could consider. As long as we cover the typical use cases in the
tutorial that should be enough.

On Sat, Nov 29, 2008 at 7:04 AM, Michiel de Hoon wrote:
> I've expanded your example a bit and added it to the documentation of
> Entrez.elink. Thanks!
>
> --Michiel.

That looks good - and tries to explain the nested result structure too.

Peter
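The nested result structure mentioned in this thread can be illustrated without network access. The sketch below walks a hand-made structure shaped like result[0]["LinkSetDb"][0]["Link"] from the earlier example. The sample IDs and scores are invented, the helper name related_ids is hypothetical, and the "Id" and "Score" key names are an assumption based on the thread's description of "dictionaries containing Score and ID".

```python
def related_ids(result):
    """Return the linked IDs from the first LinkSetDb of a parsed elink result."""
    link_sets = result[0]["LinkSetDb"]
    if not link_sets:
        return []  # no links were reported for this ID
    return [link["Id"] for link in link_sets[0]["Link"]]


# Invented data, shaped like the nested result discussed in the thread.
sample_result = [
    {
        "DbFrom": "pubmed",
        "IdList": ["12230038"],
        "LinkSetDb": [
            {
                "DbTo": "pubmed",
                "LinkName": "pubmed_pubmed",
                "Link": [
                    {"Id": "11111111", "Score": "100"},
                    {"Id": "22222222", "Score": "90"},
                ],
            }
        ],
    }
]

ids = related_ids(sample_result)
```

In real use the argument would be the value returned by Entrez.read(Entrez.elink(dbfrom='pubmed', id=pmid)), as in the quoted tutorial example.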