From biopython at maubp.freeserve.co.uk Mon Dec 1 07:56:12 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 1 Dec 2008 12:56:12 +0000 Subject: [Biopython-dev] Deprecation and removal policy In-Reply-To: References: <320fb6e00811280926v16454fa6t891fcc74e4fa4729@mail.gmail.com> Message-ID: <320fb6e00812010456r9ae1a66p66032d02377003db@mail.gmail.com> Peter wrote: >> ... >> How about a new policy that after adding a deprecation warning, >> deprecated modules/functions are kept for at least two public releases >> AND at least 12 months (counting from the first release when they are >> deprecated - not the date of the CVS change) before being removed? Bruce wrote: > > Hi, > Generally I would agree with idea for code that is under active > development. For certain code that has not really been touched for a > few years except for trivial changes (like removing string functions), > I think 12 months is perhaps too long if it passes two releases. Just because some (deprecated) code hasn't been changed in several years doesn't mean no-one is using it. Giving less warning for removing such old but stable code isn't fair. > Regardless of how it is done, Python 3 will need to be supported (the > final release is due soon) and I do not see a reason to port > depreciated modules or functions just because of some policy. So I > would add the provision that depreciated code will not be ported to > the Python 3 compatible Biopython branch. I disagree - dropping old modules is changing the API, counter to Guido and other's recommendation/request: "Don't change your APIs incompatibly when porting to Py3k." http://www.artima.com/weblogs/viewpost.jsp?thread=227041 If porting any particular deprecated module or piece of code to Python 3 proved too difficult, then maybe we might drop that code (for example, due to third party dependencies on an obsolete version of mxTextTools, I don't think we'll port Martel/Mindy to Python 3). 
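The rule proposed above (at least two public releases AND at least 12 months, counted from the first release carrying the deprecation warning) can be written as a small check. This is only a sketch: the name may_remove, the 365-day reading of "12 months", the choice to count the deprecating release as the first of the two, and the dates in any call are assumptions for illustration, not an agreed Biopython rule.

```python
from datetime import date, timedelta


def may_remove(deprecating_release, later_releases, today):
    """Return True if deprecated code may be removed under the proposed policy.

    deprecating_release: date of the first public release with the warning.
    later_releases: dates of public releases made after that one.
    today: the date on which removal is being considered.
    """
    # Assumption: the deprecating release counts as the first of the two.
    releases_with_warning = 1 + sum(1 for d in later_releases if d <= today)
    # Assumption: "12 months" is read as 365 days.
    old_enough = today - deprecating_release >= timedelta(days=365)
    return releases_with_warning >= 2 and old_enough
```

Both conditions have to hold, so a module that has sat through two quick releases is still kept until the 12 months are up, which is the point made in the reply above.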
Peter From bugzilla-daemon at portal.open-bio.org Mon Dec 1 10:36:33 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 1 Dec 2008 10:36:33 -0500 Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main Biopython distribution In-Reply-To: Message-ID: <200812011536.mB1FaXWF003857@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2671 ------- Comment #11 from biopython-bugzilla at maubp.freeserve.co.uk 2008-12-01 10:36 EST ------- Unit Test ========= The unit test included, test_GenomeDiagram.py adds yet another GenBank file to the test suite, NC_005213.gb (Nanoarchaeum equitans, 490885 bp) which at 1.2 MB is best avoided. I would prefer we used existing GenBank files already included in Biopython which would serve just as well. e.g. GenBank/NC_005816.gb file (Yersinia pestis biovar Microtus str. 91001 plasmid pPCP1) which is circular. 9609 bp. GenBank/arab1.gb (Arabidopsis thaliana BAC T25K16 from chromosome I) which is linear. 86436 bp. Also, the code to parse the GenBank file does so via Bio.GenBank, and I would prefer to use Bio.SeqIO here. I'll attach a revised version shortly... -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is. 
From bugzilla-daemon at portal.open-bio.org Mon Dec 1 10:40:22 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 1 Dec 2008 10:40:22 -0500 Subject: [Biopython-dev] [Bug 2677] BioSQL seqfeature enhancements In-Reply-To: Message-ID: <200812011540.mB1FeMWx004105@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2677 ------- Comment #10 from biopython-bugzilla at maubp.freeserve.co.uk 2008-12-01 10:40 EST ------- Bio.Graphics.GenomeDiagram.Utilities ==================================== This is a collection of utilities for getting information useful for graph values. From the docstring, o apply_to_window (sequence, window_size, function, step=None) Apply a passed function to fragments of the passed sequence of size window_size, with each window separated by the passed step. o calc_gc_content (sequence) Returns the %GC content of a passed sequence o calc_at_content (sequence) Returns the %AT content of a passed sequence o calc_gc_skew (sequence) Returns the GC skew of a passed sequence o calc_at_skew (sequence) Returns the AT skew of a passed sequence o gc_content (sequence, window_size, step=None) Returns the %GC content of a passed sequence in windows of the passed size, separated by the passed step size o at_content (sequence, window_size, step=None) Returns the %AT content of a passed sequence in windows of the passed size, separated by the passed step size o gc_skew (sequence, window_size, step=None) Returns the GC skew of a passed sequence in windows of the passed size, separated by the passed step size o at_skew (sequence, window_size, step=None) Returns the AT skew of a passed sequence in windows of the passed size, separated by the passed step size I can see why these were useful when GenomeDiagram was a separate package, but I don't think we should add this file to Biopython as it is unnecessary code duplication. 
If we do lack any of this functionality, putting it somewhere under Bio.SeqUtils makes more sense than under Bio.Graphics. I have not looked at any implications this may have for the existing documentation or the GenomeDiagram unit test. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Dec 1 10:47:01 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 1 Dec 2008 10:47:01 -0500 Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main Biopython distribution In-Reply-To: Message-ID: <200812011547.mB1Fl1qY004683@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2671 ------- Comment #12 from biopython-bugzilla at maubp.freeserve.co.uk 2008-12-01 10:47 EST ------- Bio.Graphics.GenomeDiagram.DrawAll ================================== According to the comments, this is a script to walk a directory structure below the directory passed, and draw images of each .gbk file found there. While useful, I don't think this belongs in the core library. Maybe rename it and move it into our scripts or example directory instead... Bio.Graphics.GenomeDiagram.Utilities ==================================== This is a collection of utilities for getting information useful for graph values. From the docstring, o apply_to_window (sequence, window_size, function, step=None) Apply a passed function to fragments of the passed sequence of size window_size, with each window separated by the passed step. 
o calc_gc_content (sequence) Returns the %GC content of a passed sequence o calc_at_content (sequence) Returns the %AT content of a passed sequence o calc_gc_skew (sequence) Returns the GC skew of a passed sequence o calc_at_skew (sequence) Returns the AT skew of a passed sequence o gc_content (sequence, window_size, step=None) Returns the %GC content of a passed sequence in windows of the passed size, separated by the passed step size o at_content (sequence, window_size, step=None) Returns the %AT content of a passed sequence in windows of the passed size, separated by the passed step size o gc_skew (sequence, window_size, step=None) Returns the GC skew of a passed sequence in windows of the passed size, separated by the passed step size o at_skew (sequence, window_size, step=None) Returns the AT skew of a passed sequence in windows of the passed size, separated by the passed step size I can see why these were useful when GenomeDiagram was a separate package, but I don't think we should add this file to Biopython as it is unnecessary code duplication. If we do lack any of this functionality, putting it somewhere under Bio.SeqUtils makes more sense than under Bio.Graphics. I have not looked at any implications this may have for the existing documentation or the GenomeDiagram unit test. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is. 
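For reference, the per-sequence quantities named in the docstring have short definitions. The sketch below is not the GenomeDiagram.Utilities code; it assumes the usual definitions (%GC as 100 * (G + C) / length, GC skew as (G - C) / (G + C)) and returns 0.0 when the denominator would be zero, which is a choice made here.

```python
def calc_gc_content(sequence):
    """Percentage of G and C in the sequence (0.0 for an empty sequence)."""
    seq = str(sequence).upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def calc_gc_skew(sequence):
    """GC skew (G - C) / (G + C); 0.0 when the sequence has no G or C."""
    seq = str(sequence).upper()
    g, c = seq.count("G"), seq.count("C")
    if g + c == 0:
        return 0.0
    return float(g - c) / (g + c)
```

The AT versions follow the same pattern with A and T in place of G and C.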
From bugzilla-daemon at portal.open-bio.org Mon Dec 1 10:49:14 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 1 Dec 2008 10:49:14 -0500 Subject: [Biopython-dev] [Bug 2677] BioSQL seqfeature enhancements In-Reply-To: Message-ID: <200812011549.mB1FnEB8004888@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2677 ------- Comment #11 from biopython-bugzilla at maubp.freeserve.co.uk 2008-12-01 10:49 EST ------- (In reply to comment #10) > Bio.Graphics.GenomeDiagram.Utilities > ==================================== > This is a collection of utilities for getting information useful for graph > values. From the docstring, ... Sorry - ignore this comment, it should have been on Bug 2671. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Dec 1 10:51:19 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 1 Dec 2008 10:51:19 -0500 Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main Biopython distribution In-Reply-To: Message-ID: <200812011551.mB1FpJNU005019@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2671 ------- Comment #13 from lpritc at scri.sari.ac.uk 2008-12-01 10:51 EST ------- (In reply to comment #11) > Unit Test > ========= > The unit test included, test_GenomeDiagram.py adds yet another GenBank file to > the test suite, NC_005213.gb (Nanoarchaeum equitans, 490885 bp) which at 1.2 MB > is best avoided. I would prefer we used existing GenBank files already > included in Biopython which would serve just as well. That's a good idea. > Also, the code to parse the GenBank file does so via Bio.GenBank, and I would > prefer to use Bio.SeqIO here. 
I noticed that in revising the documentation, but hadn't got around to doing anything about it, except in the example code. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is. From bugzilla-daemon at portal.open-bio.org Mon Dec 1 10:59:35 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 1 Dec 2008 10:59:35 -0500 Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main Biopython distribution In-Reply-To: Message-ID: <200812011559.mB1FxZwH005670@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2671 ------- Comment #14 from lpritc at scri.sari.ac.uk 2008-12-01 10:59 EST ------- (In reply to comment #12) > Bio.Graphics.GenomeDiagram.DrawAll > ================================== > According to the comments, this is a script to walk a directory structure below > the directory passed, and draw images of each .gbk file found there. > > While useful, I don't think this belongs in the core library. Maybe rename it > and move it into our scripts or example directory instead... Ah. I thought I'd left that one out. I was picturing perhaps having a Utilities.py module containing a function with that behaviour, and/or functions that drew a standard representation of a GenBank file, so that those who are not interested in the minutiae of the API/drawing their diagrams could still get a fair amount of function for little effort. On reflection, these functions are perhaps better suited to living in __init__.py. What do you think? 
> Bio.Graphics.GenomeDiagram.Utilities > ==================================== > This is a collection of utilities for getting information useful for graph > values, > I can see why these were useful when GenomeDiagram was a separate package, but > I don't think we should add this file to Biopython as it is unnecessary code > duplication. If we do lack any of this functionality, putting it somewhere > under Bio.SeqUtils makes more sense than under Bio.Graphics. Where there is repetition of function here, I'm happy to go with established Biopython code in preference. For graph data, GenomeDiagram expects a list of (position, value) tuples, which the functions in Utilities.py supply directly. There will be a level of user-processing required in moving to the Biopython versions. Perhaps the inclusion of similar functions in __init__ that wrap the Biopython versions to produce the appropriate format for graphs would be useful here? > I have not looked at any implications this may have for the existing > documentation or the GenomeDiagram unit test. Removing Utilities.py outright will affect both the documentation and the unit test. Both require those functions (or something similar) to generate test/example graph data. I would be happy to replace the existing functions with wrapped Biopython functions in __init__ - does this seem like a sensible option? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is. 
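The wrapper idea in comment #14 (take an existing per-sequence function and produce the list of (position, value) tuples that GenomeDiagram expects for graph data) might look roughly like this. It is a sketch only: the signature follows the docstring quoted above, but the default step, the use of the window midpoint as the position, and the helper gc_fraction (standing in for whichever Biopython function would be wrapped) are assumptions.

```python
def apply_to_window(sequence, window_size, function, step=None):
    """Apply function to successive windows; return [(position, value), ...].

    The position reported for each window is its midpoint (an assumption).
    """
    if step is None:
        # Assumed default: half-overlapping windows.
        step = max(1, window_size // 2)
    results = []
    for start in range(0, len(sequence) - window_size + 1, step):
        window = sequence[start:start + window_size]
        results.append((start + window_size // 2, function(window)))
    return results


def gc_fraction(window):
    """Fraction of G and C in a window; a stand-in for a library function."""
    window = str(window).upper()
    return float(window.count("G") + window.count("C")) / len(window)
```

With wrappers of this shape the windowing and tuple formatting live in one place, and the per-window calculation can be swapped for an existing function without touching the drawing code.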
From bugzilla-daemon at portal.open-bio.org Mon Dec 1 11:59:50 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 1 Dec 2008 11:59:50 -0500 Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main Biopython distribution In-Reply-To: Message-ID: <200812011659.mB1GxoGa009013@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2671 biopython-bugzilla at maubp.freeserve.co.uk changed (What: Removed -> Added): Attachment #1063 is obsolete: 0 -> 1; Attachment #1121 is obsolete: 0 -> 1. ------- Comment #15 from biopython-bugzilla at maubp.freeserve.co.uk 2008-12-01 11:59 EST ------- Created an attachment (id=1132) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1132&action=view) Zip of python files to go under Bio/Graphics/GenomeDiagram This attachment is just the main python files, omitting DrawAll.py and Utilities.py (see comment 12 and comment 14). The unit test needs updating to match (but then passes, updated version to follow). (In reply to comment #0) > Code for wx widgets has been removed, although the Observer/Observable code > remains, allowing user widgets to hook into the code, if that's desirable. There was a tiny bit of wx stuff still there in Diagram.py which I have removed in this version. After discussion with Leighton directly, due to possible uncertainty over the licensing of the Observer/Observable code (originally based on an example by Peter Norvig) this has been removed, together with the associated "set" methods in Diagram.py etc. This code was intended to assist using GenomeDiagram within a GUI. Note that if we later want to reintroduce this functionality, using python's property feature (with get/set functions) would allow the set function to update the observer. 
Leighton's old code would only update the observer if the set method was used explicitly (and not if the object property were updated directly). (In reply to comment #6) > I am perfectly happy with re-licensing the GD code under the Biopython > license. If you need a gpg-signed document to say so, I can provide one ;) I've updated the header of each file to reflect the Biopython license. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is. From bugzilla-daemon at portal.open-bio.org Mon Dec 1 12:20:57 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 1 Dec 2008 12:20:57 -0500 Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main Biopython distribution In-Reply-To: Message-ID: <200812011720.mB1HKvIJ010157@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2671 ------- Comment #16 from biopython-bugzilla at maubp.freeserve.co.uk 2008-12-01 12:20 EST ------- Created an attachment (id=1133) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1133&action=view) Revised test_GenomeDiagram.py This uses the existing GenBank/arab1.gb file for input. It also includes a (slightly modified) copy of the GenomeDiagram.Utilities functions as a short term solution to the issues raised in comment 12 and comment 14. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is. 
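The property-based approach mentioned in comment #15 could be sketched as follows. The class Track, its attach method and the callback signature are hypothetical names chosen for this illustration; none of them come from GenomeDiagram. The point is only that a property setter runs on plain attribute assignment, so observers are notified without an explicit set method being called.

```python
class Track(object):
    """Toy object whose name attribute notifies observers on every change."""

    def __init__(self, name):
        self._observers = []
        self._name = name

    def attach(self, callback):
        """Register a callable taking (attribute_name, new_value)."""
        self._observers.append(callback)

    @property
    def name(self):
        return self._name

    @name.setter
    def name(self, value):
        # Runs for "track.name = ..." as well as for any explicit setter call.
        self._name = value
        for callback in self._observers:
            callback("name", value)


events = []
track = Track("genes")
track.attach(lambda attr, value: events.append((attr, value)))
track.name = "CDS features"  # plain assignment still reaches the observer
```

The decorator form of the setter needs Python 2.6 or later; on older versions the same effect is available with name = property(get_name, set_name).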
From bugzilla-daemon at portal.open-bio.org Mon Dec 1 15:01:44 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 1 Dec 2008 15:01:44 -0500 Subject: [Biopython-dev] [Bug 2693] New: LogisticRegression convergence criterion is too lenient Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2693 Summary: LogisticRegression convergence criterion is too lenient Product: Biopython Version: Not Applicable Platform: PC OS/Version: Linux Status: NEW Severity: normal Priority: P3 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: bsouthey at gmail.com In R and SAS, the example in the code and tutorial provides the following parameters: Intercept = 18.9622 x1 = -0.0714 x2 = 0.0444 By default, Bio/LogisticRegression.py defines the following parameters MAX_ITERATIONS = 500 CONVERGE_THRESHOLD = 0.01 The convergence threshold is too lenient so the iterations terminate before the expected values are obtained. Using more stringent criteria (CONVERGE_THRESHOLD = 0.000000001) permits convergence to the R/SAS values provided MAX_ITERATIONS is greater than 7761 with my system. MAX_ITERATIONS and CONVERGE_THRESHOLD are fixed within the Bio/LogisticRegression.py module but should be part of the API for the train function, such as: def train(xs, ys, update_fn=None, typecode=None, CONVERGE_THRESHOLD = 0.000000001, MAX_ITERATIONS=10000): Note the algorithm used requires a large number of iterations and the train function does not display the degree of convergence attained when MAX_ITERATIONS is exceeded. Jeffrey Whitaker provides Python code using an alternative algorithm: http://www.cdc.noaa.gov/people/jeffrey.s.whitaker/python/logistic_regression.py Furthermore, the update_fn should also pass the previous likelihood or difference in likelihood so the actual convergence can be seen. 
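To illustrate the two requests in this report (threshold and iteration limit as arguments to train, and a callback that also receives the previous log-likelihood), here is a self-contained sketch. It is not the Bio.LogisticRegression implementation and not the attached patch: it fits a single predictor by plain Newton-Raphson, the lowercase parameter names are chosen here, and the data in the usage section are made up.

```python
import math


def _sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def log_likelihood(beta, xs, ys):
    """Log-likelihood of (intercept, slope) beta for 0/1 outcomes ys."""
    total = 0.0
    for x, y in zip(xs, ys):
        z = beta[0] + beta[1] * x
        # y*z - log(1 + exp(z)), with the log term computed stably
        total += y * z - (max(z, 0.0) + math.log1p(math.exp(-abs(z))))
    return total


def train(xs, ys, update_fn=None, converge_threshold=1e-9,
          max_iterations=10000):
    """Fit a one-predictor logistic model.

    Returns (beta, log_likelihood, iterations, converged) so the caller can
    see whether the iteration limit was hit before convergence.
    """
    beta = [0.0, 0.0]
    old_llh = log_likelihood(beta, xs, ys)
    for iteration in range(1, max_iterations + 1):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = _sigmoid(beta[0] + beta[1] * x)
            w = p * (1.0 - p)
            g0 += y - p          # gradient, intercept
            g1 += (y - p) * x    # gradient, slope
            h00 += w             # negative Hessian entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (negative Hessian) * d = gradient
        beta = [beta[0] + (h11 * g0 - h01 * g1) / det,
                beta[1] + (h00 * g1 - h01 * g0) / det]
        llh = log_likelihood(beta, xs, ys)
        if update_fn is not None:
            update_fn(iteration, old_llh, llh)
        if abs(llh - old_llh) < converge_threshold:
            return beta, llh, iteration, True
        old_llh = llh
    return beta, old_llh, max_iterations, False


def show_progress(iteration, old_llh, llh):
    print("Iteration: %d  old: %.9f  new: %.9f  diff: %.3g"
          % (iteration, old_llh, llh, llh - old_llh))


if __name__ == "__main__":
    # Made-up, non-separable data purely for demonstration.
    xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
    ys = [0, 0, 1, 0, 1, 0, 1, 1]
    print(train(xs, ys, update_fn=show_progress))
```

Returning the converged flag alongside the estimates addresses the complaint that the caller cannot tell how far the fit had got when the iteration limit was reached.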
Really the update_fn should be more general than this and be able to display more information, but the attached patch provides the previous llh (old_llik).

from Bio import LogisticRegression

# Three-argument callback; this form requires the attached patch.
def show_progress(iteration, old_llh, loglikelihood):
    print("Iteration: %s Old: %s Log-likelihood function: %s Diff: %s"
          % (iteration, old_llh, loglikelihood, old_llh - loglikelihood))

model = LogisticRegression.train(xs, ys, update_fn=show_progress)

-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Dec 1 15:03:27 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 1 Dec 2008 15:03:27 -0500 Subject: [Biopython-dev] [Bug 2693] LogisticRegression convergence criterion is too lenient In-Reply-To: Message-ID: <200812012003.mB1K3Rqg017974@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2693 ------- Comment #1 from bsouthey at gmail.com 2008-12-01 15:03 EST ------- Created an attachment (id=1134) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1134&action=view) Improvements to LogisticRegression.py Addresses certain problems with LogisticRegression.py and enhances the module. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bartek at rezolwenta.eu.org Mon Dec 1 15:53:59 2008 From: bartek at rezolwenta.eu.org (Bartek Wilczynski) Date: Mon, 1 Dec 2008 21:53:59 +0100 Subject: [Biopython-dev] [BioPython] Refactoring motif analysis code In-Reply-To: <492ACE38.1090301@gmail.com> References: <8b34ec180811240651k45c11563p9e3dd18ba128f0ac@mail.gmail.com> <492ACE38.1090301@gmail.com> Message-ID: <8b34ec180812011253p28a08a0bv43cd72369062b39b@mail.gmail.com> Hi all, I've done some work regarding the motif analysis in Biopython. 
I've done the following stuff: - refactored the Bio.AlignAce and Bio.MEME to use one common motif object - Put all of the refactored code in the Bio.Motif directory - Added more code (from my attic) to do motif comparisons and computing thresholds (this was actually written by my colleague Norbert Dojer, but I adapted it and I have his permission to contribute the code) - written a short tutorial on the usage of Bio.Motif (that's where I'd put it). - Written a basic test suite for the new motif. I haven't added it to cvs yet, but posted it as an attachment to the enhancement proposal in bugzilla: http://bugzilla.open-bio.org/show_bug.cgi?id=2694 I have cvs access, so I can commit the changes myself, but I'd like to wait for an "OK" from someone more involved in the release process. Since Giovanni and Bruce have responded to my previous call for comments, I'll try to answer them below: On Mon, Nov 24, 2008 at 4:54 PM, Bruce Southey wrote: > > Actually I am not that thrilled with the licenses for these packages and > similar packages because these are free only for academic use. To me this > clashes with the spirit of an open-sourced project especially a BSD-licensed > one. But if there is a need for such modules then these modules should be > included. > I have similar feelings about the "academic-use-only" licenses. On the other hand, since most of the biopython users are in academia, then I don't see it as a big problem. Also, since I don't have any truly open and free replacement for these programs, I think it's better to keep them. In fact the new Bio.Motif package provides some methods for motif comparisons, which at least to some extent can be used as a replacement for the respective functions of CompareACE and MAST. As a side note, I think that there is no point in providing parsers for every single motif finder that comes out, and I don't think that AlignAce and MEME are the best or the most representative ones. 
It just happened that these parsers were written "to scratch someone's itch". I think that the other functionality (motif searching, comparisons, weblogo) might be more useful to people. > While it is only free for academic use, have you seen TAMO? > *TAMO: a flexible, object-oriented framework for analyzing transcriptional > regulation using DNA-sequence motifs. * > Bioinformatics. 2005 Jul 15;21(14):3164-5. > > > http://fraenkel.mit.edu/TAMO/ Yes, I've seen it and I've even recommended it on the biopython mailing list when there was no replacement in biopython. However, their library is free only for academia and AFAIK it's not using biopython datastructures, so it takes some work to integrate with TAMO if you are using Biopython. Bio.Motif is meant to provide free software for Motif analysis. > Well, I am not sure how many used Bio.AlignAce given the Parser.py bug :-) > Based on the CVS, both have been untouched for about three years. > Well, I've not used it myself for a while... I'm no longer doing de-novo motif discovery. However, it still works so it's potentially useful. I think this is largely due to the lack of documentation for the Bio.AlignAce and Bio.MEME tools (partially my fault). Hopefully people will start using this if they read the tutorial. > Also, what species are these used for? > One of the papers of AlignAce indicate that the base composition was set for > yeast. > They're both general purpose, you can set the gc content for alignAce and even an HMM for MEME. > > Personally I would be interested in a general protein motif finding module > because of my current research. However, I do have a different view with > respect to the Biopython community as indicated above with the licenses. Both MEME and AlignAce can be used to find motifs in proteins, but that has little to do with Bio.Motif, which does not provide any motif-finding capabilities by itself. 
In general Bio.Motif should be able to deal with protein motifs, but I've never tested it (I'm mostly using it for DNA motifs), so I'll be happy to help if you find bugs. On Mon, Nov 24, 2008 at 4:25 PM, Giovanni Marco Dall'Olio wrote: > > I would just like to tell you that I have tried the TAMO framework you > suggested me, and found it very useful. Yes, I remember, but the problem is with the TAMO license. I think that the Motif object might be still useful since it is free, allows to read motifs from databases like JASPAR to scan sequences and/or compare them with "your" motifs. > I am not using it anymore because I don't need it, but I remember that I liked: > - the methods to represent motifs as matrixes of frequencies/occurrencies etc.. done > - the fact that it was easy to create a motif from an alignment of sequences depending on your definition of easy, it's there > - the integration it had with this website: > http://weblogo.berkeley.edu/logo.cgi. done > I would suggest you to provide integration with this other web > service, which enable to plot the difference between two sequence > logos: http://www.twosamplelogo.org/examples.html. This I haven't done yet, but I'll try to provide functionality for that (shouldn't take too long). 
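Two of the features discussed above (building a motif from aligned sequences and representing it as a matrix of counts or frequencies) come down to a small amount of bookkeeping. The sketch below shows the idea in plain Python; it is not the Bio.Motif API, the function names and the pseudocount argument are hypothetical, and letters outside the given alphabet raise a KeyError.

```python
def count_matrix(instances, alphabet="ACGT"):
    """Per-column letter counts for equal-length motif instances."""
    length = len(instances[0])
    if any(len(inst) != length for inst in instances):
        raise ValueError("all instances must have the same length")
    counts = dict((letter, [0] * length) for letter in alphabet)
    for inst in instances:
        for position, letter in enumerate(inst.upper()):
            counts[letter][position] += 1
    return counts


def frequency_matrix(counts, pseudocount=0.0):
    """Turn counts into per-column frequencies, with optional pseudocounts."""
    letters = sorted(counts)
    length = len(counts[letters[0]])
    freqs = dict((letter, []) for letter in letters)
    for position in range(length):
        total = sum(counts[l][position] + pseudocount for l in letters)
        for l in letters:
            freqs[l].append((counts[l][position] + pseudocount) / float(total))
    return freqs
```

Passing a protein alphabet string instead of "ACGT" gives the same matrices for protein motifs, which is the case raised earlier in the thread.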
-- Bartek Wilczynski ================== Postdoctoral fellow EMBL, Furlong group Meyerhoffstrasse 1, 69012 Heidelberg, Germany tel: +49 6221 387 8433 From dalloliogm at gmail.com Mon Dec 1 16:07:08 2008 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Mon, 1 Dec 2008 22:07:08 +0100 Subject: [Biopython-dev] [BioPython] Refactoring motif analysis code In-Reply-To: <8b34ec180812011253p28a08a0bv43cd72369062b39b@mail.gmail.com> References: <8b34ec180811240651k45c11563p9e3dd18ba128f0ac@mail.gmail.com> <492ACE38.1090301@gmail.com> <8b34ec180812011253p28a08a0bv43cd72369062b39b@mail.gmail.com> Message-ID: <5aa3b3570812011307q710cab78q2fbae061f5dd5eff@mail.gmail.com> On Mon, Dec 1, 2008 at 9:53 PM, Bartek Wilczynski wrote: > On Mon, Nov 24, 2008 at 4:25 PM, Giovanni Marco Dall'Olio > wrote: >> >> I would just like to tell you that I have tried the TAMO framework you >> suggested me, and found it very useful. > > Yes, I remember, but the problem is with the TAMO license. I think > that the Motif object might be still > useful since it is free, allows to read motifs from databases like > JASPAR to scan sequences and/or > compare them with "your" motifs. Thanks for all these changes. I remember that I wrote a mail to TAMO's authors when I was using it. They seemed to be interested in integrating the code with biopython, so maybe the license issue could be overcome. It's up to you, whether you want to reimplement all the functions they have or not. 
-- My blog on bioinformatics (now in English): http://bioinfoblog.it From bartek at rezolwenta.eu.org Tue Dec 2 04:39:37 2008 From: bartek at rezolwenta.eu.org (Bartek Wilczynski) Date: Tue, 2 Dec 2008 10:39:37 +0100 Subject: [Biopython-dev] Refactoring motif analysis code In-Reply-To: <8b34ec180812020118t1c5bc551t4b1e241427755517@mail.gmail.com> References: <8b34ec180811240651k45c11563p9e3dd18ba128f0ac@mail.gmail.com> <492ACE38.1090301@gmail.com> <8b34ec180812011253p28a08a0bv43cd72369062b39b@mail.gmail.com> <5aa3b3570812011307q710cab78q2fbae061f5dd5eff@mail.gmail.com> <8b34ec180812020118t1c5bc551t4b1e241427755517@mail.gmail.com> Message-ID: <8b34ec180812020139y18feadf6s5d2ce23ec95b79d1@mail.gmail.com> On Mon, Dec 1, 2008 at 10:07 PM, Giovanni Marco Dall'Olio wrote: > Thanks for all these changes. > I remember that I wrote a mail to TAMO's authors when I was using it. > They seemed to be interested in integrating the code with biopython, > so maybe the license issue could be superated. > It's up to you, whether you want to reimplement all the functions they > have or not. I have to say I haven't done anything yet towards integrating TAMO with biopython. So far, my own code was doing the job for me, and since there was a certain learning curve to get into TAMO, I didn't look closely into it. I've looked more carefully now at it and I have two general thoughts: - There is a number of features in TAMO, for which there is no counterpart in Bio.Motif. Just by looking at module names I've found: - MDscan parser - their own EM motif finding scheme (some kind of EM method) - several motif comparison functions from MotifCompare - a lot of nice little methods for motifs like textLogo, giflogo, etc. - There is quite an overlap between biopython and TAMO. They implemented their own Sequence handling, FASTA Parser, clustering module etc. 
There will be some gruntwork with integrating their code into Biopython (finding and reconciling the overlaps) I also have to say, that I'm a bit scared by copyright statements in the TAMO code, saying it belongs to the Whitehead institute. I don't want to be overly pessimistic, but the process of releasing this code under biopython license might be slow. What I think is the best way to go is to clean up the current mess with Bio.Alignace and Bio.MEME, and then ask people for contributions. If TAMO developers would be willing to contribute I'll be happy to help with integration into biopython. It will take some time anyway, so I wouldn't delay the inclusion of Bio.Motif into Biopython. cheers Bartek -- Bartek Wilczynski ================== Postdoctoral fellow EMBL, Furlong group Meyerhoffstrasse 1, 69012 Heidelberg, Germany tel: +49 6221 387 8433 From timothyham at gmail.com Tue Dec 2 19:19:48 2008 From: timothyham at gmail.com (Timothy Ham) Date: Tue, 2 Dec 2008 16:19:48 -0800 Subject: [Biopython-dev] Parsing malformed genbank files (e.g. VectorNTI) Message-ID: <632cdbf70812021619i7e652a05nd801dd408ba9aad4@mail.gmail.com> Hi everyone, The current biopython GenBank parser dies while parsing VectorNTI generated files. For example, until recently, BioPython did not accept an empty SOURCE field. It still does not handle empty VERSION or ACCESSION fields (consumer.data.id never gets filled), which is the default for user generated vector maps via VectorNTI. Now, it is easy enough to change the GenBank parser to handle malformed genbank files, (I can submit patches) but the real question becomes: > Should BioPython handle malformed genbank files at all? I would like to be practical and say yes, since VectorNTI is a very common, widely used format, but I wanted to ask the community before submitting my patches. 
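One tolerant behaviour for the empty VERSION and ACCESSION case described above would be to fall back to another header field instead of failing. The sketch below shows that idea on raw header lines only; it is not the Bio.GenBank parser or the patches mentioned, and the function name, the fallback order (VERSION, then ACCESSION, then the LOCUS name) and the placeholder string are all assumptions.

```python
def record_id_from_header(header_lines, fallback="<unknown id>"):
    """Pick a record identifier from GenBank header lines.

    Empty fields are skipped rather than treated as errors.
    """
    fields = {}
    for line in header_lines:
        parts = line.split()
        if parts and parts[0] in ("LOCUS", "ACCESSION", "VERSION"):
            # Keep the first token after the keyword, or "" if the field is empty.
            fields.setdefault(parts[0], parts[1] if len(parts) > 1 else "")
    for key in ("VERSION", "ACCESSION", "LOCUS"):
        if fields.get(key):
            return fields[key]
    return fallback
```

Whether a silent fallback like this is desirable, or whether a warning should be issued for each missing field, is exactly the policy question being put to the list.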
Thanks for the great work, Tim From bsouthey at gmail.com Tue Dec 2 21:33:26 2008 From: bsouthey at gmail.com (Bruce Southey) Date: Tue, 2 Dec 2008 20:33:26 -0600 Subject: [Biopython-dev] Refactoring motif analysis code In-Reply-To: <8b34ec180812020139y18feadf6s5d2ce23ec95b79d1@mail.gmail.com> References: <8b34ec180811240651k45c11563p9e3dd18ba128f0ac@mail.gmail.com> <492ACE38.1090301@gmail.com> <8b34ec180812011253p28a08a0bv43cd72369062b39b@mail.gmail.com> <5aa3b3570812011307q710cab78q2fbae061f5dd5eff@mail.gmail.com> <8b34ec180812020118t1c5bc551t4b1e241427755517@mail.gmail.com> <8b34ec180812020139y18feadf6s5d2ce23ec95b79d1@mail.gmail.com> Message-ID: On Tue, Dec 2, 2008 at 3:39 AM, Bartek Wilczynski wrote: > On Mon, Dec 1, 2008 at 10:07 PM, Giovanni Marco Dall'Olio > wrote: > >> Thanks for all these changes. >> I remember that I wrote a mail to TAMO's authors when I was using it. >> They seemed to be interested in integrating the code with biopython, >> so maybe the license issue could be superated. >> It's up to you, whether you want to reimplement all the functions they >> have or not. > > I have to say I haven't done anything yet towards integrating TAMO > with biopython. > So far, my own code was doing the job for me, and since there was a > certain learning curve to get into TAMO, > I didn't look closely into it. I've looked more carefully now at it > and I have two general thoughts: > - There is a number of features in TAMO, for which there is no > counterpart in Bio.Motif. Just by looking at module names I've found: > - MDscan parser > - their own EM motif finding scheme (some kind of EM method) > - several motif comparison functions from MotifCompare > - a lot of nice little methods for motifs like textLogo, giflogo, etc. > - There is quite an overlap between biopython and TAMO. They > implemented their own Sequence handling, FASTA Parser, clustering > module etc. 
> There will be some gruntwork with integrating their code
> into Biopython (finding and reconciling the overlaps)
>
> I also have to say, that I'm a bit scared by copyright statements in
> the TAMO code, saying it belongs to the Whitehead institute. I don't
> want to be overly pessimistic, but the process of releasing this code
> under biopython license might be slow.
>
> What I think is the best way to go is to clean up current mess with
> Bio.Alignace and Bio.MEME, and then ask people for contributions.
> If TAMO developers would be willing to contribute I'll be happy to
> help with integration into biopython. It will take some time anyway,
> so I wouldn't delay the inclusion of Bio.Motif into Biopython.
>
> cheers
> Bartek
>
>
>
> --
> Bartek Wilczynski
> ==================
> Postdoctoral fellow
> EMBL, Furlong group
> Meyerhoffstrasse 1,
> 69012 Heidelberg,
> Germany
> tel: +49 6221 387 8433
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>

I would agree that you should ignore TAMO and just focus on developing
a suitable framework to integrate Alignace and MEME as you have
indicated. I would presume that the other motif finding applications
will also fit into that framework.

Unless the TAMO code is under a BSD-style or equivalent license that is
compatible with Biopython, you must stop looking at it. I know it is
hard to avoid as it comes up on Google with a simple search. If the
TAMO code gets suitably licensed, then fine, but until then it can cause
major problems that can involve the whole Biopython project (even
including GPLed code can do this).
Bruce From biopython at maubp.freeserve.co.uk Wed Dec 3 16:10:49 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 3 Dec 2008 21:10:49 +0000 Subject: [Biopython-dev] Fwd: [Utilities-announce] PubMed Entrez Utility 2009 DTD changes In-Reply-To: <7B6F170840CA6C4DA63EE0C8A7BB43EC03A0001F@NIHCESMLBX15.nih.gov> References: <7B6F170840CA6C4DA63EE0C8A7BB43EC03A0001F@NIHCESMLBX15.nih.gov> Message-ID: <320fb6e00812031310s43124c68n988838af3837638d@mail.gmail.com> This email from the NCBI will be of interest for Bio.Entrez - we may need to add a few DTD files to Bio.Entrez in preparation for this... see also Bug 2678. Peter ---------- Forwarded message ---------- From: Date: Wed, Dec 3, 2008 at 8:57 PM Subject: [Utilities-announce] PubMed Entrez Utility 2009 DTD changes To: utilities-announce at ncbi.nlm.nih.gov PubMed Entrez Utility Users, We anticipate switching to the updated PubMed 2009 DTDs on December 15, 2008. 2009 DTDs are available from the Entrez DTD page: http://eutils.ncbi.nlm.nih.gov/entrez/query/DTD/index.html The DTD changes for the 2009 production year, as noted in the Revision Notes section near the top of each DTD, are: NLMMedline DTD (used for MEDLINE/PubMed) a. Changed entity reference from "nlmmedlinecitation_080101.dtd" to: "nlmmedlinecitation_090101.dtd" b. CHANGE WITHDRAWN FOR V.2: Deleted entity NlmDcmsID.Ref and NlmDcmsID element [Edited 10/16/08] c. FOR V.3: Added GrantCountry.Ref entity [Edited 10/30/08] NLMMedlineCitation DTD (used for MEDLINE/PubMed data) a. Changed entity reference from "nlmsharedcatcit_080101.dtd" to: "nlmsharedcatcit_090101.dtd" b. Moved entity Type to nlmcommon dtd c. Added NLM value to entity Source d. CHANGE WITHDRAWN FOR V.2: Deleted entity NlmDcmsID.Ref [Edited 10/16/08] NLMSharedCatCit DTD (used for MEDLINE/PubMed, CatfilePlus, and Serfile) a. Changed entity reference from "nlmcommon_080101.dtd" to "nlmcommon_090101.dtd" b. 
Moved OtherAbstract element from nlmsharedcatcit dtd to nlmcommon dtd NLMCommon DTD (used for MEDLINE/PubMed, CatfilePlus, and Serfile) a. Added ValidYN attribute to Investigator element b. Moved OtherAbstract element from nlmsharedcatcit to nlmcommon dtd c. Added OtherAbstract element to NCBIArticle element d. Moved entity Type from nlmmedlinecitation to nlmcommon dtd e. Added Publisher value to entity Type f. Deleted Consumer value from entity Type g. Added Country element to Grant element h. FOR V.2: Changed Country value to GrantCountry.Ref in Grant Element [Edited 10/30/08] NLMCatalogRecord DTD (used for CatfilePlus and Serfile in XML format): a. Changed entity reference from "nlmsharedcatcit_080101.dtd" to: "nlmsharedcatcit_090101.dtd" b. Added PrecedingInPart, SupersedesInPart, SucceedingInPart, SupersededInPartBy values to entity TitleType _______________________________________________ Utilities-announce mailing list http://www.ncbi.nlm.nih.gov/mailman/listinfo/utilities-announce From biopython at maubp.freeserve.co.uk Thu Dec 4 05:26:39 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 4 Dec 2008 10:26:39 +0000 Subject: [Biopython-dev] Parsing malformed genbank files (e.g. VectorNTI) In-Reply-To: <632cdbf70812021619i7e652a05nd801dd408ba9aad4@mail.gmail.com> References: <632cdbf70812021619i7e652a05nd801dd408ba9aad4@mail.gmail.com> Message-ID: <320fb6e00812040226g117fe534g4523e8b58f7f28@mail.gmail.com> On Wed, Dec 3, 2008 at 12:19 AM, Timothy Ham wrote: > > Hi everyone, > > The current biopython GenBank parser dies while parsing VectorNTI > generated files. For example, until recently, BioPython did not > accept an empty SOURCE field. It still does not handle an empty > VERSION or ACCESSION fields (consumer.data.id never gets filled), > which is the default for user generated vector maps via VectorNTI. I fixed the SOURCE issue in Bio/GenBank/__init__.py CVS revision 1.97 after Tim contacted me offlist - there was no bug report. 
> Now, it is easy enough to change the GenBank parser to handle > malformed genbank files, (I can submit patches) but the real question > becomes: >> Should BioPython handle malformed genbank files at all? > I would like to be practical and say yes, since VectorNTI is a very > common, widely used format, but I wanted to ask the community before > submitting my patches. > > Thanks for the great work, > Tim As I'm the defacto maintainer for Bio.GenBank, I guess unless the list as a whole has a consensus this is my call. Reading the GenBank file format spec, the ACCESSION and VERSION lines are clearly intended to be mandatory. Note that for mandatory fields, IIRC, the NCBI will use a single dot/period as a place holder when there is no data. So I would argue that VectorNTI is producing invalid files, and you should write to the authors and encourage them to follow the spec more closely (even if we do change Biopython to cope). However, I'm willing to bend a little on out of spec GenBank files (in cases like this where there is no ambiguity about the parsing), but I would want a real example output file from VectorNTI to include for a unit test. This is important as we need to use something sensible for the SeqRecord's id property if the ACCESSION and VERSION are missing. Peter From mjldehoon at yahoo.com Thu Dec 4 07:32:18 2008 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Thu, 4 Dec 2008 04:32:18 -0800 (PST) Subject: [Biopython-dev] Rethinking Biopython's testing framework In-Reply-To: <320fb6e00811280309w7b5f0fc6m38795c4dc61c8744@mail.gmail.com> Message-ID: <442447.52362.qm@web62407.mail.re1.yahoo.com> > Michiel de Hoon wrote: > > If one of the sub-tests fails, Python's unit > > testing framework will tell us so, > > though (perhaps) not exactly which sub-test fails. > > However, that is easy to > > figure out just by running the individual test script > > by itself. > > That won't always work. 
Consider intermittent network > problems, or tests using random data - in general it > really is worthwhile having run_tests.py report a little > more than just which test_XXX.py module failed. > I wonder if Python's unit testing framework allows us to capture exactly which sub-test fails. I'll look into that. Ideally, it should be possible to have regular Python unit tests and Biopython-style print-and-compare tests side by side, and get information about failing sub-tests for both. --Michiel. From bsouthey at gmail.com Thu Dec 4 10:02:13 2008 From: bsouthey at gmail.com (Bruce Southey) Date: Thu, 04 Dec 2008 09:02:13 -0600 Subject: [Biopython-dev] Parsing malformed genbank files (e.g. VectorNTI) In-Reply-To: <320fb6e00812040226g117fe534g4523e8b58f7f28@mail.gmail.com> References: <632cdbf70812021619i7e652a05nd801dd408ba9aad4@mail.gmail.com> <320fb6e00812040226g117fe534g4523e8b58f7f28@mail.gmail.com> Message-ID: <4937F0F5.6070905@gmail.com> Peter wrote: > On Wed, Dec 3, 2008 at 12:19 AM, Timothy Ham wrote: > >> Hi everyone, >> >> The current biopython GenBank parser dies while parsing VectorNTI >> generated files. For example, until recently, BioPython did not >> accept an empty SOURCE field. It still does not handle an empty >> VERSION or ACCESSION fields (consumer.data.id never gets filled), >> which is the default for user generated vector maps via VectorNTI. >> > > I fixed the SOURCE issue in Bio/GenBank/__init__.py CVS revision 1.97 > after Tim contacted me offlist - there was no bug report. > > >> Now, it is easy enough to change the GenBank parser to handle >> malformed genbank files, (I can submit patches) but the real question >> becomes: >> >>> Should BioPython handle malformed genbank files at all? >>> >> I would like to be practical and say yes, since VectorNTI is a very >> common, widely used format, but I wanted to ask the community before >> submitting my patches. 
>> >> Thanks for the great work, >> Tim >> > > As I'm the defacto maintainer for Bio.GenBank, I guess unless the list > as a whole has a consensus this is my call. > > Reading the GenBank file format spec, the ACCESSION and VERSION lines > are clearly intended to be mandatory. Note that for mandatory fields, > IIRC, the NCBI will use a single dot/period as a place holder when > there is no data. So I would argue that VectorNTI is producing > invalid files, and you should write to the authors and encourage them > to follow the spec more closely (even if we do change Biopython to > cope). > > However, I'm willing to bend a little on out of spec GenBank files (in > cases like this where there is no ambiguity about the parsing), but I > would want a real example output file from VectorNTI to include for a > unit test. This is important as we need to use something sensible for > the SeqRecord's id property if the ACCESSION and VERSION are missing. > > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > At http://www.ncbi.nlm.nih.gov/Genbank/index.html there is a link to the 'complete release notes for the current version of GenBank'. From ftp://ftp.ncbi.nih.gov/genbank/gbrel.txt, it clearly states that ACCESSION and VERSION are mandatory and I interpret the '/' to mean 'with'. The relevant section is: 3.4.2 Entry Organization " The second part of each sequence entry record contains the information appropriate to its keyword, in positions 13 to 80 for keywords and positions 11 to 80 for the sequence. The following is a brief description of each entry field. Detailed information about each field may be found in Sections 3.4.4 to 3.4.15. LOCUS - A short mnemonic name for the entry, chosen to suggest the sequence's definition. Mandatory keyword/exactly one record. DEFINITION - A concise description of the sequence. 
Mandatory keyword/one or more records. ACCESSION - The primary accession number is a unique, unchanging identifier assigned to each GenBank sequence record. (Please use this identifier when citing information from GenBank.) Mandatory keyword/one or more records. VERSION - A compound identifier consisting of the primary accession number and a numeric version number associated with the current version of the sequence data in the record. This is followed by an integer key (a "GI") assigned to the sequence by NCBI. " Mandatory keyword/exactly one record. If these entries are missing then Biopython must raise an exception because the GenBank file is invalid. While I have not seen an example, does a VectorNTI output contain the LOCUS field that could be used an accession number? I think it is fairly common for the accession number to be part of the LOCUS field. Bruce From biopython at maubp.freeserve.co.uk Thu Dec 4 10:16:20 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 4 Dec 2008 15:16:20 +0000 Subject: [Biopython-dev] Parsing malformed genbank files (e.g. VectorNTI) In-Reply-To: <4937F0F5.6070905@gmail.com> References: <632cdbf70812021619i7e652a05nd801dd408ba9aad4@mail.gmail.com> <320fb6e00812040226g117fe534g4523e8b58f7f28@mail.gmail.com> <4937F0F5.6070905@gmail.com> Message-ID: <320fb6e00812040716h1fb4bfbflf5a37456102722cc@mail.gmail.com> On Thu, Dec 4, 2008 at 3:02 PM, Bruce Southey wrote: > Peter wrote: >> Reading the GenBank file format spec, the ACCESSION and VERSION lines >> are clearly intended to be mandatory. Note that for mandatory fields, >> IIRC, the NCBI will use a single dot/period as a place holder when >> there is no data. So I would argue that VectorNTI is producing >> invalid files, and you should write to the authors and encourage them >> to follow the spec more closely (even if we do change Biopython to >> cope). 
Bruce wrote: > At http://www.ncbi.nlm.nih.gov/Genbank/index.html there is a link to the > 'complete release notes for the current version of GenBank'. > From ftp://ftp.ncbi.nih.gov/genbank/gbrel.txt, it clearly states that > ACCESSION and VERSION are mandatory ... We agree on this, according to the current NCBI standard, a GenBank file missing the ACCESSION or VERSION line is technically invalid. Bruce: > If these entries are missing then Biopython must raise an exception because > the GenBank file is invalid. I see a difference between a GenBank parser, and a GenBank validator. While it would be nice to just say "your file is invalid", in many cases the meaning of the file isn't ambiguous and can still be safely parsed. From past experience, even the NCBI sometimes provide invalid files which break their own rules (e.g. Biopython Bug 2591). In my personal opinion, a strict parser which rejects any invalid GenBank file isn't actually that useful - there is a grey area where a little leniency is very helpful: Peter wrote: >> However, I'm willing to bend a little on out of spec GenBank files (in >> cases like this where there is no ambiguity about the parsing), but I >> would want a real example output file from VectorNTI to include for a >> unit test. This is important as we need to use something sensible for >> the SeqRecord's id property if the ACCESSION and VERSION are missing. Peter From biopython at maubp.freeserve.co.uk Thu Dec 4 17:15:26 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 4 Dec 2008 22:15:26 +0000 Subject: [Biopython-dev] Parsing malformed genbank files (e.g. 
VectorNTI)
In-Reply-To: <632cdbf70812041352oec43f5fh13bd35a1416d0fd2@mail.gmail.com>
References: <632cdbf70812021619i7e652a05nd801dd408ba9aad4@mail.gmail.com>
 <320fb6e00812040226g117fe534g4523e8b58f7f28@mail.gmail.com>
 <632cdbf70812041352oec43f5fh13bd35a1416d0fd2@mail.gmail.com>
Message-ID: <320fb6e00812041415t3fb22630xae4d34205e0562a3@mail.gmail.com>

Tim wrote:
> I have attached two representative example genbank outputs from
> VectorNTI. I don't know if the mailing list accepts attachments, but
> if it can't, is there a place where I can put it (maybe the biopython
> wiki?)

I got them, thanks. For future reference, it would have been better to
have filed a bug on bugzilla, and then (once the bug is filed) you can
attach files to it.

Earlier Tim wrote:
>>> The current biopython GenBank parser dies while parsing VectorNTI
>>> generated files. For example, until recently, BioPython did not
>>> accept an empty SOURCE field. It still does not handle an empty
>>> VERSION or ACCESSION fields (consumer.data.id never gets filled),
>>> which is the default for user generated vector maps via VectorNTI.

Now that I've got your two files, my copy of Biopython seems to read
them just fine. What exactly do you mean by the "parser dies"? Could
you show us a snippet of code and, if relevant, the exception error -
plus details of your OS, version of Python and Biopython etc?

Thanks

Peter

From timothyham at gmail.com Thu Dec 4 21:09:21 2008
From: timothyham at gmail.com (Timothy Ham)
Date: Thu, 4 Dec 2008 18:09:21 -0800
Subject: [Biopython-dev] Parsing malformed genbank files (e.g.
VectorNTI) In-Reply-To: <320fb6e00812041415t3fb22630xae4d34205e0562a3@mail.gmail.com> References: <632cdbf70812021619i7e652a05nd801dd408ba9aad4@mail.gmail.com> <320fb6e00812040226g117fe534g4523e8b58f7f28@mail.gmail.com> <632cdbf70812041352oec43f5fh13bd35a1416d0fd2@mail.gmail.com> <320fb6e00812041415t3fb22630xae4d34205e0562a3@mail.gmail.com> Message-ID: <632cdbf70812041809v1d4ed344q3cc03db3e310b2ab@mail.gmail.com> On Thu, Dec 4, 2008 at 2:15 PM, Peter wrote: > Now that I've got your two files, my copy of Biopython seem to read > them just fine. What exactly do you mean by the "parser dies"? Could > you show us a snippet of code and if relevant the exception error - > plus details of your OS, version of Python and Biopthon etc? > > Thanks > > Peter > Ah, my bad. I was running it against an old version. It looks like it was fixed as of /biopython/Bio/GenBank/__init__.py version 1.87 (biopython release 1.48). The current version does the right thing. Thanks much, Tim From biopython at maubp.freeserve.co.uk Fri Dec 5 05:19:12 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 5 Dec 2008 10:19:12 +0000 Subject: [Biopython-dev] Parsing malformed genbank files (e.g. VectorNTI) In-Reply-To: <632cdbf70812041809v1d4ed344q3cc03db3e310b2ab@mail.gmail.com> References: <632cdbf70812021619i7e652a05nd801dd408ba9aad4@mail.gmail.com> <320fb6e00812040226g117fe534g4523e8b58f7f28@mail.gmail.com> <632cdbf70812041352oec43f5fh13bd35a1416d0fd2@mail.gmail.com> <320fb6e00812041415t3fb22630xae4d34205e0562a3@mail.gmail.com> <632cdbf70812041809v1d4ed344q3cc03db3e310b2ab@mail.gmail.com> Message-ID: <320fb6e00812050219k376fdda2r969fe78a547b0ff6@mail.gmail.com> Tim wrote: > Ah, my bad. I was running it against an old version. It looks like it > was fixed as of > /biopython/Bio/GenBank/__init__.py version 1.87 (biopython release 1.48). > The current version does the right thing. 
Oh right - that was when I was testing parsing of the slightly non-standard GenBank output from the EMBOSS seqret tool. Anyway, problem solved :) Peter From bugzilla-daemon at portal.open-bio.org Fri Dec 5 06:59:07 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 5 Dec 2008 06:59:07 -0500 Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main Biopython distribution In-Reply-To: Message-ID: <200812051159.mB5Bx7TR009168@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2671 ------- Comment #17 from biopython-bugzilla at maubp.freeserve.co.uk 2008-12-05 06:59 EST ------- (In reply to comment #0) > The default font has been changed to 'Vera', which is shipped with Reportlab, > to avoid some problems with unavailable fonts On my Mac "Vera" doesn't work, and going back to the default of 'Helvetica' seems best on Unix in general. Also, Helvetica is one of the standard fonts which all PDF viewers should be able to render. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is. 
From bugzilla-daemon at portal.open-bio.org Fri Dec 5 11:44:10 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 5 Dec 2008 11:44:10 -0500
Subject: [Biopython-dev] [Bug 2697] New: MaxEntropy calculate function assumes integer values for class and convergence criteria is hard coded
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2697

           Summary: MaxEntropy calculate function assumes integer values for
                    class and convergence criteria is hard coded
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: minor
          Priority: P3
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: bsouthey at gmail.com

Bio.MaxEntropy.classify() assumes that the targets are integers starting
at zero. However, a model can be trained using character values. Handling
this requires a simple change in a loop in that function.

Also, the convergence criteria are hard coded into the file by the following
global definitions:

    MAX_IIS_ITERATIONS = 10000    # Maximum iterations for IIS.
    IIS_CONVERGE = 1E-5           # Convergence criteria for IIS.
    MAX_NEWTON_ITERATIONS = 100   # Maximum iterations on Newton's method.
    NEWTON_CONVERGE = 1E-10       # Convergence criteria for Newton's method.

This makes it impossible for the user to specify their own values without
changing the module itself. The patch changes this by passing these values
to the train function and its subfunctions.

Both of these are fixed in an attached patch.

-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
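A minimal sketch of the two ideas in the Bug 2697 report above: keep the
module-level constants as defaults but let callers override them, and look
class labels up in a list rather than assuming integers starting at zero.
This is not the attached patch and not the Bio.MaxEntropy API. The names
newton_sqrt and classify are hypothetical stand-ins, and the square-root
iteration is only there to give the tolerances something to control.

```python
# Hypothetical sketch; none of this code is taken from the attached patch.
MAX_NEWTON_ITERATIONS = 100   # default cap on Newton iterations
NEWTON_CONVERGE = 1e-10       # default convergence tolerance


def newton_sqrt(value, max_iterations=MAX_NEWTON_ITERATIONS,
                converge=NEWTON_CONVERGE):
    """Approximate sqrt(value) with Newton's method.

    The module constants are only defaults, so a caller can loosen or
    tighten the criteria without editing this file.
    """
    if value < 0:
        raise ValueError("value must be non-negative")
    if value == 0:
        return 0.0
    x = max(value, 1.0)  # starting guess at or above the true root
    for _ in range(max_iterations):
        new_x = 0.5 * (x + value / x)
        if abs(new_x - x) < converge:
            return new_x
        x = new_x
    raise RuntimeError("Newton's method did not converge")


def classify(classes, scores):
    """Return the label whose score is highest.

    Labels are looked up in `classes`, so they may be strings or any
    other objects, not just integers starting at zero.
    """
    best = max(range(len(scores)), key=lambda i: scores[i])
    return classes[best]
```

With this shape, newton_sqrt(2.0) uses the defaults, while
newton_sqrt(2.0, converge=1e-2) stops earlier with a rougher answer, and
classify(["exon", "intron"], [0.2, 0.8]) returns the string label.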
From bugzilla-daemon at portal.open-bio.org Fri Dec 5 11:47:15 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 5 Dec 2008 11:47:15 -0500 Subject: [Biopython-dev] [Bug 2697] MaxEntropy calculate function assumes integer values for class and convergence criteria is hard coded In-Reply-To: Message-ID: <200812051647.mB5GlFRQ020087@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2697 ------- Comment #1 from bsouthey at gmail.com 2008-12-05 11:47 EST ------- Created an attachment (id=1139) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1139&action=view) Fixes to MaxEntrophy 1) Fixes MaxEntrophy.calculate to use the target classes from the data 2) Permits the user to define their own convergence criterion -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Dec 5 11:59:51 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 5 Dec 2008 11:59:51 -0500 Subject: [Biopython-dev] [Bug 2698] New: Attempt at a unit test for MaxEntrophy Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2698 Summary: Attempt at a unit test for MaxEntrophy Product: Biopython Version: Not Applicable Platform: PC OS/Version: Linux Status: NEW Severity: enhancement Priority: P2 Component: Unit Tests AssignedTo: biopython-dev at biopython.org ReportedBy: bsouthey at gmail.com I used test_LogisticRegression.py to develop a test for MaxEntrophy. However, I could not get MaxEntrophy to train on that dataset. Indeed I have found it to be very sensitive to both data and functions making it extremely hard to develop bioinformatics-based data and associated test. So in the end I generated data based on some of my work. 
I trained the model outside the tests because I do not know how to avoid retraining the model for each test. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Dec 5 12:00:29 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 5 Dec 2008 12:00:29 -0500 Subject: [Biopython-dev] [Bug 2698] Attempt at a unit test for MaxEntrophy In-Reply-To: Message-ID: <200812051700.mB5H0Ted022044@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2698 ------- Comment #1 from bsouthey at gmail.com 2008-12-05 12:00 EST ------- Created an attachment (id=1140) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1140&action=view) Test for MaxEntrophy -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From timothyham at gmail.com Thu Dec 4 16:52:33 2008 From: timothyham at gmail.com (Timothy Ham) Date: Thu, 4 Dec 2008 13:52:33 -0800 Subject: [Biopython-dev] Parsing malformed genbank files (e.g. VectorNTI) In-Reply-To: <320fb6e00812040226g117fe534g4523e8b58f7f28@mail.gmail.com> References: <632cdbf70812021619i7e652a05nd801dd408ba9aad4@mail.gmail.com> <320fb6e00812040226g117fe534g4523e8b58f7f28@mail.gmail.com> Message-ID: <632cdbf70812041352oec43f5fh13bd35a1416d0fd2@mail.gmail.com> On Thu, Dec 4, 2008 at 2:26 AM, Peter wrote: > On Wed, Dec 3, 2008 at 12:19 AM, Timothy Ham wrote: >> >> Hi everyone, >> >> The current biopython GenBank parser dies while parsing VectorNTI >> generated files. For example, until recently, BioPython did not >> accept an empty SOURCE field. 
It still does not handle an empty >> VERSION or ACCESSION fields (consumer.data.id never gets filled), >> which is the default for user generated vector maps via VectorNTI. > > I fixed the SOURCE issue in Bio/GenBank/__init__.py CVS revision 1.97 > after Tim contacted me offlist - there was no bug report. > >> Now, it is easy enough to change the GenBank parser to handle >> malformed genbank files, (I can submit patches) but the real question >> becomes: >>> Should BioPython handle malformed genbank files at all? >> I would like to be practical and say yes, since VectorNTI is a very >> common, widely used format, but I wanted to ask the community before >> submitting my patches. >> >> Thanks for the great work, >> Tim > > As I'm the defacto maintainer for Bio.GenBank, I guess unless the list > as a whole has a consensus this is my call. > > Reading the GenBank file format spec, the ACCESSION and VERSION lines > are clearly intended to be mandatory. Note that for mandatory fields, > IIRC, the NCBI will use a single dot/period as a place holder when > there is no data. So I would argue that VectorNTI is producing > invalid files, and you should write to the authors and encourage them > to follow the spec more closely (even if we do change Biopython to > cope). > > However, I'm willing to bend a little on out of spec GenBank files (in > cases like this where there is no ambiguity about the parsing), but I > would want a real example output file from VectorNTI to include for a > unit test. This is important as we need to use something sensible for > the SeqRecord's id property if the ACCESSION and VERSION are missing. > > Peter > I have attached two representative example genbank outputs from VectorNTI. I don't know if the mailing list accepts attachments, but if it can't, is there a place where I can put it (maybe the biopython wiki?) Tim -------------- next part -------------- A non-text attachment was scrubbed... 
Name: vnti_example.zip
Type: application/zip
Size: 11716 bytes
Desc: not available
URL:

From bugzilla-daemon at portal.open-bio.org Tue Dec 9 09:55:05 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 9 Dec 2008 09:55:05 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main Biopython distribution
In-Reply-To:
Message-ID: <200812091455.mB9Et5iX017478@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Attachment #1132|application/octet-stream    |text/plain
          mime type|                            |
Attachment #1132 is|0                           |1
              patch|                            |

------- Comment #18 from biopython-bugzilla at maubp.freeserve.co.uk 2008-12-09 09:55 EST -------
(From update of attachment 1132)
Checked into CVS (with the font defaulting to Helvetica as discussed with
Leighton privately).

-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.
From bugzilla-daemon at portal.open-bio.org Tue Dec 9 09:55:56 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 9 Dec 2008 09:55:56 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main Biopython distribution
In-Reply-To:
Message-ID: <200812091455.mB9Etu7C017584@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1132 is|1                           |0
              patch|                            |
Attachment #1132 is|0                           |1
           obsolete|                            |

------- Comment #19 from biopython-bugzilla at maubp.freeserve.co.uk 2008-12-09 09:55 EST -------
(From update of attachment 1132)
This is now obsolete - checked into CVS (with the font defaulting to Helvetica
as discussed with Leighton privately).

-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.

From bugzilla-daemon at portal.open-bio.org Tue Dec 9 10:12:56 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 9 Dec 2008 10:12:56 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main Biopython distribution
In-Reply-To:
Message-ID: <200812091512.mB9FCusM019463@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671

------- Comment #20 from biopython-bugzilla at maubp.freeserve.co.uk 2008-12-09 10:12 EST -------
(In reply to comment #12)
>
> Bio.Graphics.GenomeDiagram.Utilities
> ====================================
> This is a collection of utilities for getting information useful for graph
> values. From the docstring,
>
> o apply_to_window (sequence, window_size, function, step=None) Apply a
>                    passed function to fragments of the passed sequence of
>                    size window_size, with each window separated by the
>                    passed step.

This windowing function is rather specific to GenomeDiagram by the nature of
how it returns the values and their positions. The handling of the end of the
sequence is also non-general.

Suppose we put apply_to_window somewhere under Bio.Graphics.GenomeDiagram. It
can then be used with any sequence analysis function which takes a
sequence/string and returns a float, returning the scores and window positions
as expected by GenomeDiagram for drawing graphical tracks.

That would leave the following general non-windowed functions from
Utilities.py:

calc_gc_content - returns a float in the range 0 to 1.
calc_at_content - returns a float in the range 0 to 1.
calc_gc_skew - returns a float, gives zero if there is no GC content.
calc_at_skew - returns a float, gives zero if there is no AT content.

Bio.SeqUtils already has several functions including:

GC - returns a float in the range 0 to 100 (i.e. 100 times the actual
     fraction).
GC_skew - returns a list of floats using a default window size of 100bp.
          Gives a floating point exception if any window has no GC content.

Personally I don't like the fact that the existing GC function returns a number
between 0 and 100, but otherwise this code is fine.

I don't think the current GC_skew function is intuitive, and it doesn't cover
the non-windowed use-case where you want the GC skew of the whole sequence
passed in. This is important if you want to do your own windowing (e.g.
comparing GC skew of individual genes to the whole genome).

Because they differ from the existing Bio.SeqUtils code, I think there is a
case for adding the four non-windowed functions from GenomeDiagram's
Utilities.py under Bio.SeqUtils. Perhaps under a sub module like
Bio.SeqUtils.Nucleotides or Bio.SeqUtils.NucUtils? The existing GC functions
in Bio.SeqUtils could be deprecated or at least declared obsolete.
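
A minimal sketch of the split described above, with plain whole-sequence
functions on one side and a separate windowing helper on the other. It is
not the GenomeDiagram code: the function names follow the ones listed above,
but the bodies, the (midpoint, value) return format, the default step, and
the choice to skip a trailing partial window are all assumptions made for
illustration.

```python
def calc_gc_content(sequence):
    """Return the GC fraction of the sequence as a float from 0 to 1."""
    seq = str(sequence).upper()
    if not seq:
        return 0.0  # assumption: an empty sequence scores zero
    return (seq.count("G") + seq.count("C")) / float(len(seq))


def calc_gc_skew(sequence):
    """Return (G - C) / (G + C), or 0.0 when there is no G or C at all."""
    seq = str(sequence).upper()
    g = seq.count("G")
    c = seq.count("C")
    if g + c == 0:
        return 0.0  # avoids dividing by zero on AT-only input
    return (g - c) / float(g + c)


def apply_to_window(sequence, window_size, function, step=None):
    """Apply function to successive windows of the sequence.

    Returns a list of (window midpoint, value) pairs. A trailing
    fragment shorter than window_size is skipped (an assumption,
    not necessarily what GenomeDiagram does).
    """
    if step is None:
        step = window_size  # assumed default: non-overlapping windows
    results = []
    for start in range(0, len(sequence) - window_size + 1, step):
        fragment = sequence[start:start + window_size]
        results.append((start + window_size // 2, function(fragment)))
    return results
```

Because the whole-sequence functions know nothing about windows, the same
calc_gc_skew can be called on a single gene, on the whole genome, or passed
to apply_to_window, which covers the "do your own windowing" use-case.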
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is. From bugzilla-daemon at portal.open-bio.org Tue Dec 9 10:19:23 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 9 Dec 2008 10:19:23 -0500 Subject: [Biopython-dev] [Bug 2704] New: Parser for the markx10 alignment format Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2704 Summary: Parser for the markx10 alignment format Product: Biopython Version: Not Applicable Platform: All OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: osvaldo.zagordi at bsse.ethz.ch Hi, I recently wrote some code to parse the Emboss alignment format markx10 (format explained at http://emboss.sourceforge.net/docs/themes/AlignFormats.html) Since it is slightly different from the Fasta m10 (not surprising, right?) I had to adapt FastaIO.py. I thought this might eventually be included in biopython. Important: I noticed that if the alignment program exits for some reason and does not close the alignment file with two lines like these #--------------------------------------- #--------------------------------------- bad things can happen (e.g., sucking all the memory of the system)). Could it be that a similar issue applies to FastaIO parser as well? Best, Osvaldo -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Tue Dec 9 10:35:57 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 9 Dec 2008 10:35:57 -0500 Subject: [Biopython-dev] [Bug 2704] Parser for the markx10 alignment format In-Reply-To: Message-ID: <200812091535.mB9FZvHG021117@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2704 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-12-09 10:35 EST ------- This sounds interesting Osvaldo, Now that you've filed this bug, you should be able to upload the python file (or a patch). Given EMBOSS's markx10 output is intended to be like FASTA's -m 10 output (but with the addition of EMBOSS style headers and footers), it *might* be nicer to have one parser for both. Right now I don't know how similar EMBOSS's output really is. If we do go for the simpler option of two separate parsers, it would certainly be a good idea in the long run for them to share some code. (In reply to comment #0) > Important: > I noticed that if the alignment program exits for some reason and > does not close the alignment file with two lines like these > #--------------------------------------- > #--------------------------------------- > bad things can happen (e.g., sucking all the memory of the system)). > Could it be that a similar issue applies to FastaIO parser as well? Does this happen if you create such a file by hand (lacking these lines) and try to read that? If so it should be easier to debug. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Tue Dec 9 10:43:19 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 9 Dec 2008 10:43:19 -0500 Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main Biopython distribution In-Reply-To: Message-ID: <200812091543.mB9FhJfV021598@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2671 ------- Comment #21 from lpritc at scri.sari.ac.uk 2008-12-09 10:43 EST ------- (In reply to comment #20) > (In reply to comment #12) > > > > Bio.Graphics.GenomeDiagram.Utilities > > ==================================== > > This is a collection of utilities for getting information useful for graph > > values. From the docstring, > > > > o apply_to_window (sequence, window_size, function, step=None) Apply a > > passed function to fragments of the passed sequence of > > size window_size, with each window separated by the > > passed step. > > This windowing function is rather specific to GenomeDiagram by the nature of > how it returns the values and their positions. The handling of the end of the > sequence is also non-general. Suppose we put apply_to_window somewhere under > Bio.Graphics.GenomeDiagram. It can then be used with any sequence analysis > function which takes a sequence/string and returns a float, returning the > scores and window positions as expected by GenomeDiagram for drawing graphical > tracks. That seems sensible, to me. I like the generality that would result from it, and it seems like apply_to_window could even be a useful convenience function addition to Bio.SeqUtils in its own right. [...] > Because they differ from the existing Bio.SeqUtils code, I think there is a > case for adding the four non-windowed functions from GenomeDiagram's > Utilities.py under Bio.SeqUtils. Perhaps under a sub module like > Bio.SeqUtils.Nucleotides or Bio.SeqUtils.NucUtils? 
The existing GC functions > in Bio.SeqUtils could be deprecated or at least declared obsolete. I think that there's value to be had in standardising to a floating-point 0..1 or -1..1 range for some of these kinds of functions, so I would support such a move on those grounds. Regarding my GC skew code (and the corresponding AT skew code): that the behaviour when there is no GC in the sequence is misleading (read: wrong ;) ). Strictly, a divide-by-zero error would be correct here, but I just lazily went for a zero value for ease of drawing, instead of doing something that properly indicated 'not a number'. I think that what needs to be done for GenomeDiagram is to modify the graphing code so that it does something appropriate for NaNs (however they may be indicated) - this should perhaps be to stop at the preceding point, and resume at the subsequent point, for line graphs; not to draw a box for the heat map; and not to draw a bar for the bar chart (not that this will always be distinguishable from a zero value...). The GenomeDiagram GC/AT skew code also needs to be modified to return None or some other NaN indicator before its behaviour can be considered correct. Apologies for propagating those shortcuts - my bad. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is. 
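The behaviour discussed in comments #20 and #21 can be sketched roughly as below. This is only an illustration, not the GenomeDiagram or Bio.SeqUtils code: the names gc_skew and windowed_scores are hypothetical, float("nan") is just one of the possible "not a number" indicators mentioned in the thread, and the choice to report the window midpoint and to skip a trailing partial window is an assumption.

```python
def gc_skew(seq):
    """Return (G - C) / (G + C) for the whole sequence.

    Hypothetical helper: returns NaN rather than zero when the
    sequence contains no G or C, so callers can tell "no data"
    apart from a genuine skew of zero.
    """
    s = str(seq).upper()  # cope with mixed case input
    g = s.count("G")
    c = s.count("C")
    if g + c == 0:
        return float("nan")
    return (g - c) / float(g + c)


def windowed_scores(seq, window_size, function, step=None):
    """Apply function to successive windows of seq.

    Returns a list of (midpoint, value) tuples. Only full windows
    are scored (an assumption made for this sketch). A scoring
    function that raises ZeroDivisionError is mapped to NaN.
    """
    if step is None:
        step = window_size
    results = []
    for start in range(0, len(seq) - window_size + 1, step):
        fragment = seq[start:start + window_size]
        try:
            value = function(fragment)
        except ZeroDivisionError:
            value = float("nan")
        results.append((start + window_size // 2, value))
    return results
```

A drawing routine could then test each value with value != value (true only for NaN) and leave a gap in the line graph, or omit the box or bar, as suggested above.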
From bugzilla-daemon at portal.open-bio.org Tue Dec 9 11:20:06 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 9 Dec 2008 11:20:06 -0500 Subject: [Biopython-dev] [Bug 2704] Parser for the markx10 alignment format In-Reply-To: Message-ID: <200812091620.mB9GK6Si024603@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2704 ------- Comment #2 from osvaldo.zagordi at bsse.ethz.ch 2008-12-09 11:20 EST ------- Created an attachment (id=1151) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1151&action=view) Class Markx10Iterator for markx10 alignment format Attached a simple example of using the code. Just running simple_test.py should be enough. If you remove the last two lines #------ from tmp_align.needle the program loops sucking more and more memory -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Dec 9 11:20:23 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 9 Dec 2008 11:20:23 -0500 Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main Biopython distribution In-Reply-To: Message-ID: <200812091620.mB9GKNCm024646@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2671 ------- Comment #22 from biopython-bugzilla at maubp.freeserve.co.uk 2008-12-09 11:20 EST ------- (In reply to comment #21) > Regarding my GC skew code (and the corresponding AT skew code): that the > behaviour when there is no GC in the sequence is misleading > (read: wrong ;) ). > Strictly, a divide-by-zero error would be correct here, but I just lazily went > for a zero value for ease of drawing, instead of doing something that properly > indicated 'not a number'. Yeah - you're right. 
Either we just allow the divide by zero to be raised, or return a NaN, maybe via float("nan") unless there is a better way without getting NumPy involved. > I think that what needs to be done for GenomeDiagram > is to modify the graphing code so that it does something appropriate for NaNs > (however they may be indicated) - this should perhaps be to stop at the > preceding point, and resume at the subsequent point, for line graphs; not to > draw a box for the heat map; and not to draw a bar for the bar chart (not that > this will always be distinguishable from a zero value...). OK. I can see why just using zero was a nice short cut here. > The GenomeDiagram GC/AT skew code also needs to be modified to return None or > some other NaN indicator before its behaviour can be considered correct. Or, if we accept that "sequence scoring functions" may raise a divide by zero error, then apply_to_window should also be able to cope and map this to an appropriate nan indicator (e.g. None or float("nan")). Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is. From bugzilla-daemon at portal.open-bio.org Tue Dec 9 11:39:27 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 9 Dec 2008 11:39:27 -0500 Subject: [Biopython-dev] [Bug 2704] Parser for the markx10 alignment format In-Reply-To: Message-ID: <200812091639.mB9GdRTJ026010@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2704 ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-12-09 11:39 EST ------- (In reply to comment #2) > If you remove the last two lines #------ from tmp_align.needle the program > loops sucking more and more memory You have an infinite loop, try modifying the bit near line 162 as follows:

    #Now should have the aligned query sequence with flanking region...
    while not (line.startswith(">") or ">>>" in line) and not line.startswith('#'):
        match_seq_parts.append(line.strip())
        line = handle.readline()
        if not line :
            #End of file
            return None

Also, your code is based on an out of date version of Bio/AlignIO/FastaIO.py - probably from Biopython 1.47, and lacks improvements which may also apply to the EMBOSS output. Given the object orientated nature of the current m10 parser, you/we should be able to subclass it and only override those bits dealing with the header and footer. This is probably the nicest way forward if we decide to treat the EMBOSS markx10 format as a new format in Bio.AlignIO. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Dec 9 11:59:21 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 9 Dec 2008 11:59:21 -0500 Subject: [Biopython-dev] [Bug 2705] New: Nicer GC and AT content and skew functions Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2705 Summary: Nicer GC and AT content and skew functions Product: Biopython Version: Not Applicable Platform: All OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk This bug started out as a discussion on Bug 2671, based on some nucleotide scoring functions in GenomeDiagram which were used for plotting sequence properties along a sequence using a sliding window. The basic underlying functions could make a nice addition under Bio.SeqUtils (rather than hiding them under Bio.Graphics.GenomeDiagram). In particular, GenomeDiagram's Utilities.py included the following (non-windowed) nucleotide composition functions: calc_gc_content - returns a float in the range 0 to 1.
calc_at_content - returns a float in the range 0 to 1. calc_gc_skew - returns a float [*] calc_at_skew - returns a float [*] [*] As discussed on Bug 2671, these currently give zero if there is no AT content, which was a reasonable shortcut given these functions were originally used for plotting only. They should instead raise an exception or return None or NaN instead. Also, as implemented in GenomeDiagram, these functions do not cope with mixed case sequences (easily rectified). Also, for GC and AT content these do not deal with ambiguous nucleotides (where we could follow the existing Bio.SeqUtils convention). Bio.SeqUtils already has several related functions including: GC - returns a float (a percentage in the range 0 to 100) GC123 - returns a tuple of four floats (percentages between 0 and 100) GC_skew - returns a list of floats using a default window size of 100bp. Gives a floating point exception if there is no GC content in any window. Personally I don't like the fact that the existing GC function returns a number between 0 and 100 (rather than 0 and 1). Leighton agreed. I don't think the current GC_skew function is intuitive and doesn't cover the non-windowed use-case where you want the GC_skew of the whole sequence passed in. This is important if you want to do your own windowing (e.g. comparing GC skew of individual genes to the whole genome). Because they differ from the existing Bio.SeqUtils code, I think there is a case for adding the four non-windowed functions from GenomeDiagram's Utilities.py under Bio.SeqUtils. Each would take a single argument, a sequence (coping with a string, Seq object or MutableSeq object). I have no particularly strong views on the naming of these functions. Perhaps they could be located under a sub module like Bio.SeqUtils.Nucleotides or Bio.SeqUtils.NucUtils? The existing GC functions in Bio.SeqUtils could be deprecated or at least declared obsolete. 
This would also be a good opportunity to explicitly specify what we expect to get back for the GC content when there are ambiguous nucleotides. e.g. Following Bio.SeqUtils.GC, only count C, G and S (which means C or G) (in either case) and divide by the length giving a lower bound. Here GC("ACGTN") is 40%. An alternative approach might be to treat an N as 50% GC, and H (which is A, C or T) as 33.3% GC etc, meaning GC("ACGTN") gives 50%. The same approach should be used for the AT percentage, for example the current lower bound approach would count only A, T and W characters (in either case). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Dec 9 12:04:15 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 9 Dec 2008 12:04:15 -0500 Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main Biopython distribution In-Reply-To: Message-ID: <200812091704.mB9H4F9C028063@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2671 ------- Comment #23 from biopython-bugzilla at maubp.freeserve.co.uk 2008-12-09 12:04 EST ------- I've filed Bug 2705 about adding these nucleotide sequence functions somewhere under Bio.SeqUtils - this should get more people reading it. Because this bug (Bug 2671) hasn't been assigned to the dev mailing list, I doubt many people are aware of it. For Bio.Graphics.GenomeDiagram we need to ensure the graphics tracks can cope with NAN/None missing values as outlined by Leighton in comment 21. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is.
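The two conventions for ambiguous nucleotides described in Bug 2705 could be compared with something like the sketch below. The function names gc_lower_bound and gc_weighted and the GC_WEIGHT table are hypothetical, made up for this illustration; the weights simply follow the standard IUPAC ambiguity codes, and both functions return a fraction in the range 0 to 1 as proposed in the bug, not a percentage. Treating unknown characters as contributing zero, and raising ValueError on an empty sequence, are assumptions.

```python
# Fraction of each IUPAC nucleotide code that is G or C
# (e.g. N = A/C/G/T gives 0.5, H = A/C/T gives 1/3).
GC_WEIGHT = {
    "A": 0.0, "C": 1.0, "G": 1.0, "T": 0.0, "U": 0.0,
    "S": 1.0, "W": 0.0,
    "R": 0.5, "Y": 0.5, "K": 0.5, "M": 0.5,
    "B": 2.0 / 3, "V": 2.0 / 3,
    "D": 1.0 / 3, "H": 1.0 / 3,
    "N": 0.5,
}


def gc_lower_bound(seq):
    """Count only C, G and S (either case), divided by the length."""
    s = str(seq).upper()
    if not s:
        raise ValueError("Cannot compute GC content of an empty sequence")
    return sum(s.count(letter) for letter in "CGS") / float(len(s))


def gc_weighted(seq):
    """Weight each ambiguous letter by its expected GC fraction.

    Characters missing from GC_WEIGHT count as zero (an assumption).
    """
    s = str(seq).upper()
    if not s:
        raise ValueError("Cannot compute GC content of an empty sequence")
    return sum(GC_WEIGHT.get(letter, 0.0) for letter in s) / len(s)
```

With the example from the bug report, gc_lower_bound("ACGTN") gives 0.4 while gc_weighted("ACGTN") gives 0.5, matching the 40% and 50% figures quoted above.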
From bugzilla-daemon at portal.open-bio.org Tue Dec 9 12:53:44 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 9 Dec 2008 12:53:44 -0500 Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main Biopython distribution In-Reply-To: Message-ID: <200812091753.mB9Hri42031692@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2671 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1133 is|0 |1 obsolete| | ------- Comment #24 from biopython-bugzilla at maubp.freeserve.co.uk 2008-12-09 12:53 EST ------- (From update of attachment 1133) I've checked something like this into CVS. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is. From bugzilla-daemon at portal.open-bio.org Wed Dec 10 11:46:35 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 10 Dec 2008 11:46:35 -0500 Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main Biopython distribution In-Reply-To: Message-ID: <200812101646.mBAGkZs1003825@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2671 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- BugsThisDependsOn| |2705 ------- Comment #25 from biopython-bugzilla at maubp.freeserve.co.uk 2008-12-10 11:46 EST ------- OK, GenomeDiagram is now in CVS, with some basic tests. Still to do: * Updating the existing GenomeDiagram manual to match (different imports, colour to color), which I think can stay as a separate PDF file. * A short introduction to Bio.Graphics including GenomeDiagram as part of a new chapter in the tutorial? 
* Dealing with Bug 2705 (for the AT and GC content and skew) and the window function to help plot these in GenomeDiagram. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is. From bugzilla-daemon at portal.open-bio.org Wed Dec 10 11:46:38 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 10 Dec 2008 11:46:38 -0500 Subject: [Biopython-dev] [Bug 2705] Nicer GC and AT content and skew functions In-Reply-To: Message-ID: <200812101646.mBAGkcGB003850@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2705 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- OtherBugsDependingO| |2671 nThis| | -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Dec 10 12:16:37 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 10 Dec 2008 12:16:37 -0500 Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main Biopython distribution In-Reply-To: Message-ID: <200812101716.mBAHGbGG006815@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2671 ------- Comment #26 from biopython-bugzilla at maubp.freeserve.co.uk 2008-12-10 12:16 EST ------- We already talked about "colour" vs "color" (UK vs USA), but I've just noticed the use of "centre" vs "center" where again I would prefer we follow computer language norms and take the USA spelling. Also, I'm not sure that the existing colour/color dual support works 100% of the time. 
I had an old script using colour where the feature colours specified ended up being the default of light green. Using "color" instead of "colour" in my script worked. I'll try and investigate this later. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is. From bugzilla-daemon at portal.open-bio.org Wed Dec 10 12:55:31 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 10 Dec 2008 12:55:31 -0500 Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main Biopython distribution In-Reply-To: Message-ID: <200812101755.mBAHtVJ7009870@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2671 ------- Comment #27 from biopython-bugzilla at maubp.freeserve.co.uk 2008-12-10 12:55 EST ------- This might be better off as a new enhancement bug, but here is a possible "arc-box" drawing function to go in the AbstractDrawer.py file, based on the existing draw_box function.

    def draw_arcbox(xcentre, ycentre, inner_radius, outer_radius,
                    startangle, endangle,
                    colour=colors.lightgreen, border=None, color=None) :
        """Returns a closed path object describing an arced box.

        Expects the angles to be in radians."""
        if color is None:
            color = colour
        if color == colors.white and border is None:   # Force black border on
            strokecolor = colors.black                 # white boxes with
        elif border is None:                           # undefined border, else
            strokecolor = color                        # use fill colour
        elif border is not None:
            strokecolor = border
        p = ArcPath(strokeColor=strokecolor, fillColor=color, strokewidth=0)
        p.addArc(xcentre, ycentre, outer_radius,
                 startangle * 180 / pi, endangle * 180 / pi, moveTo=True)
        p.addArc(xcentre, ycentre, inner_radius,
                 startangle * 180 / pi, endangle * 180 / pi, reverse=True)
        p.closePath()
        return p

This takes advantage of reportlab's built-in arc approximation code meaning we can simplify the CircularDrawer.py method to just something like this:

    def draw_arc(self, inner_radius, outer_radius, startangle, endangle,
                 color, border=None, colour=None):
        #Docstring here
        return draw_arcbox(self.xcentre, self.ycentre,
                           inner_radius, outer_radius, startangle, endangle,
                           colour, border, color)

Alternately, the code could just go in CircularDrawer.py directly. As far as I can tell from looking at their source code, even ReportLab_1_21_2 has ArcPath defined in reportlab.graphics.shapes so there shouldn't be any issue here with backwards compatibility. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is.
From bugzilla-daemon at portal.open-bio.org Thu Dec 11 03:40:23 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 11 Dec 2008 03:40:23 -0500 Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main Biopython distribution In-Reply-To: Message-ID: <200812110840.mBB8eNFs006984@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2671 ------- Comment #28 from lpritc at scri.sari.ac.uk 2008-12-11 03:40 EST ------- (In reply to comment #26) > We already talked about "colour" vs "color" (UK vs USA), but I've just noticed > the use of "centre" vs "center" where again I would prefer we follow computer > language norms and take the USA spelling. > > Also, I'm not sure that the existing colour/color dual support works 100% of > the time. I had an old script using colour where the feature colours specified > ended up being the default of light green. Using "color" instead of "colour" > in my script worked. I'll try and investigate this later. Is this related to my fix in comment #9? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is. From bugzilla-daemon at portal.open-bio.org Thu Dec 11 06:50:17 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 11 Dec 2008 06:50:17 -0500 Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main Biopython distribution In-Reply-To: Message-ID: <200812111150.mBBBoHej030149@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2671 ------- Comment #29 from biopython-bugzilla at maubp.freeserve.co.uk 2008-12-11 06:50 EST ------- (In reply to comment #28) > (In reply to comment #26) > > Also, I'm not sure that the existing colour/color dual support works 100% > > of the time. 
I had an old script using colour where the feature colours > > specified ended up being the default of light green. Using "color" > > instead of "colour" in my script worked. I'll try and investigate this > > later. > > Is this related to my fix in comment #9? Possibly - although I was already using that version of AbstractDrawer.py I've updated CVS to make it clear in the comments that "colour" arguments override "color" arguments (this is required for backwards compatibility with old scripts which would be using "colour"). I also had to fix the FeatureSet's add_feature method to handle the colour/color mapping (this was the root of the problem I had observed in comment 26). I propose that in Biopython 1.50 we support both "colour" and "color", but for Biopython 1.51 we add deprecation warnings when "colour" is used. We should probably do the same thing for "centre" and "center" as well... -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is. From bugzilla-daemon at portal.open-bio.org Thu Dec 11 06:52:41 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 11 Dec 2008 06:52:41 -0500 Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main Biopython distribution In-Reply-To: Message-ID: <200812111152.mBBBqfTQ030413@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2671 ------- Comment #30 from lpritc at scri.sari.ac.uk 2008-12-11 06:52 EST ------- (In reply to comment #29) > > I propose that in Biopython 1.50 we support both "colour" and "color", but for > Biopython 1.51 we add deprecation warnings when "colour" is used. > > We should probably do the same thing for "centre" and "center" as well... > I agree. We should encourage use of the US spelling in the documentation, to catch those new to GD. 
This approach provides a window for conversion of old GD scripts for previous users, which is a good thing. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is. From bugzilla-daemon at portal.open-bio.org Fri Dec 12 11:09:27 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 12 Dec 2008 11:09:27 -0500 Subject: [Biopython-dev] [Bug 2709] New: test_GenomeDiagram fails under Linux Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2709 Summary: test_GenomeDiagram fails under Linux Product: Biopython Version: Not Applicable Platform: PC OS/Version: Linux Status: NEW Severity: minor Priority: P4 Component: Unit Tests AssignedTo: biopython-dev at biopython.org ReportedBy: bsouthey at gmail.com Under my Linux 64-bit system test_GenomeDiagram fails but the other related tests 'pass' as reportlab is not available: test_GenomeDiagram ... ERROR test_GraphicsChromosome ... skipping. Install reportlab if you want to use Bio.Graphics. ok test_GraphicsDistribution ... skipping. Install reportlab if you want to use Bio.Graphics. ok test_GraphicsGeneral ... skipping. Install reportlab if you want to use Bio.Graphics.
ok ====================================================================== ERROR: test_GenomeDiagram ---------------------------------------------------------------------- Traceback (most recent call last): File "run_tests.py", line 125, in runTest self.runSafeTest() File "run_tests.py", line 138, in runSafeTest cur_test = __import__(self.test_name) File "test_GenomeDiagram.py", line 21, in raise MissingExternalDependencyError(\ NameError: name 'MissingExternalDependencyError' is not defined ---------------------------------------------------------------------- -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Dec 12 11:25:59 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 12 Dec 2008 11:25:59 -0500 Subject: [Biopython-dev] [Bug 2709] test_GenomeDiagram fails under Linux In-Reply-To: Message-ID: <200812121625.mBCGPxeQ031269@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2709 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-12-12 11:25 EST ------- It was trying to raise MissingExternalDependencyError when reportlab was missing (which would have skipped the test), but MissingExternalDependencyError hadn't been imported. Fixed in test_GenomeDiagram.py CVS revision 1.10 Thanks for reporting this. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
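The failure in Bug 2709 comes down to raising an exception class that was never imported. A minimal sketch of the intended pattern follows. To keep it self-contained, MissingExternalDependencyError is defined locally as a stand-in for the Biopython exception of the same name, and require_module is a hypothetical helper, not something taken from run_tests.py or test_GenomeDiagram.py.

```python
class MissingExternalDependencyError(Exception):
    """Local stand-in for Biopython's exception of the same name.

    In a real test module this would be imported before use;
    forgetting that import is what produced the NameError above.
    """


def require_module(name):
    """Import and return the named module, or signal a missing dependency.

    Hypothetical helper: converts ImportError into the exception the
    test runner is expected to treat as "skip this test".
    """
    try:
        return __import__(name)
    except ImportError:
        raise MissingExternalDependencyError(
            "Install %s if you want to use Bio.Graphics." % name)
```

A test module could call require_module("reportlab") at import time, so that a missing library leads to a skip message rather than an ERROR.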
From bugzilla-daemon at portal.open-bio.org Fri Dec 12 11:49:51 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 12 Dec 2008 11:49:51 -0500 Subject: [Biopython-dev] [Bug 2710] New: GenomeDiagram.py unnecessary requires the reportlab addon renderPM Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2710 Summary: GenomeDiagram.py unnecessary requires the reportlab addon renderPM Product: Biopython Version: Not Applicable Platform: PC OS/Version: Linux Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: bsouthey at gmail.com test_GenomeDiagram fails because the renderPM module is not part of standard install of reportlab, at least under Linux. I consider that the renderPM module should not be required so Graphics/GenomeDiagram/Diagram.py needs to be rewritten to avoid using the renderPM module when it is not available. The installation documentation needs to include something about needing the renderPM for JPG, BMP, GIF, PNG, TIFF or TIFF outputs. There must be a test for the presence of the renderPM module. test_GenomeDiagram ... ERROR test_GraphicsChromosome ... ok test_GraphicsDistribution ... ok test_GraphicsGeneral ... 
ok ====================================================================== ERROR: test_GenomeDiagram ---------------------------------------------------------------------- Traceback (most recent call last): File "run_tests.py", line 125, in runTest self.runSafeTest() File "run_tests.py", line 138, in runSafeTest cur_test = __import__(self.test_name) File "test_GenomeDiagram.py", line 30, in from Bio.Graphics.GenomeDiagram.FeatureSet import FeatureSet File "/home/bsouthey/python/biopython_cvs/biopython/build/lib.linux-x86_64-2.5/Bio/Graphics/GenomeDiagram/__init__.py", line 13, in from Bio.Graphics.GenomeDiagram.Diagram import Diagram File "/home/bsouthey/python/biopython_cvs/biopython/build/lib.linux-x86_64-2.5/Bio/Graphics/GenomeDiagram/Diagram.py", line 32, in from reportlab.graphics import renderPS, renderPDF, renderSVG, renderPM File "/usr/lib/python2.5/site-packages/reportlab/graphics/renderPM.py", line 28, in "see http://www.reportlab.org/rl_addons.html") ImportError: No module named _renderPM see http://www.reportlab.org/rl_addons.html ---------------------------------------------------------------------- -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
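One way to avoid a hard dependency on renderPM, as requested in Bug 2710, is to make its import optional. The sketch below is an assumption about how that might look, not the change that went into Diagram.py; the helper names bitmap_output_available and check_bitmap_support are hypothetical. It relies on the import raising ImportError when the add-on is absent, as in the traceback above, which may not hold for every ReportLab version.

```python
try:
    # renderPM needs a compiled add-on, so it may be missing even
    # when the rest of ReportLab is installed.
    from reportlab.graphics import renderPM
except ImportError:
    renderPM = None


def bitmap_output_available():
    """Return True if bitmap formats (JPG, BMP, GIF, PNG, TIFF) can be written."""
    return renderPM is not None


def check_bitmap_support(output):
    """Raise a clear error instead of failing at import time."""
    if not bitmap_output_available():
        raise RuntimeError(
            "Output format %r needs the ReportLab renderPM add-on" % output)
```

With this arrangement the vector formats (PS, PDF, SVG) remain usable, and only a request for a bitmap format reports the missing add-on.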
From bugzilla-daemon at portal.open-bio.org Fri Dec 12 12:43:49 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 12 Dec 2008 12:43:49 -0500 Subject: [Biopython-dev] [Bug 2711] New: GenomeDiagram.py: write() and write_to_string() are inefficient and don't check inputs Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2711 Summary: GenomeDiagram.py: write() and write_to_string() are inefficient and don't check inputs Product: Biopython Version: Not Applicable Platform: PC OS/Version: Linux Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: bsouthey at gmail.com While looking at GenomeDiagram.py I noticed some things that should be fixed. I do note that some of this stems from reportlab. In particular, reportlab doesn't appear to have a generic interface for different image formats. 1) Why are there two functions to output a diagram rather than just one generic function? In particular, why not let the caller pass a filename or omit it? Yes, I know that reportlab uses different functions but this just duplicates code. So this is more a comment than anything else. 2) I find the functions write() and write_to_string() just plain ugly. You define a local dictionary of modules every time these functions are called. But there is only one valid key so you then go back to find the input that you already knew. A nested list would be better and allow catching invalid inputs (see next point). 3) Neither write() nor write_to_string() checks that the output option is valid. These functions do not accept lowercase. Thus, output='ps' will crash with a KeyError, as will any invalid key. 4) I do not know the policy on module imports, but this line is only required for write() and write_to_string(): from reportlab.graphics import renderPS, renderPDF, renderSVG, renderPM Also renderPM is an addon.
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Dec 12 12:46:53 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 12 Dec 2008 12:46:53 -0500 Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and write_to_string() are inefficient and don't check inputs In-Reply-To: Message-ID: <200812121746.mBCHkrPi005835@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2711 ------- Comment #1 from bsouthey at gmail.com 2008-12-12 12:46 EST ------- Created an attachment (id=1156) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1156&action=view) Fix various issues with GenomeDIagram/Diagram.py Contains a couple of fixes including bug 2710. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Dec 12 12:54:21 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 12 Dec 2008 12:54:21 -0500 Subject: [Biopython-dev] [Bug 2710] GenomeDiagram.py unnecessary requires the reportlab addon renderPM In-Reply-To: Message-ID: <200812121754.mBCHsL4q006303@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2710 bsouthey at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |DUPLICATE ------- Comment #1 from bsouthey at gmail.com 2008-12-12 12:54 EST ------- The reason for this bug report was the import of renderPM. But closer look at the code shows a bigger issue with write() and writeToString() functions of Diagram.py. 
I am marking this as a duplicate because correctly fixing bug 2711 (see patch for Bug 2711) will also fix this one. *** This bug has been marked as a duplicate of bug 2711 *** -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Dec 12 12:54:34 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 12 Dec 2008 12:54:34 -0500 Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and write_to_string() are inefficient and don't check inputs In-Reply-To: Message-ID: <200812121754.mBCHsYgN006312@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2711 ------- Comment #2 from bsouthey at gmail.com 2008-12-12 12:54 EST ------- *** Bug 2710 has been marked as a duplicate of this bug. *** -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Dec 12 13:25:25 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 12 Dec 2008 13:25:25 -0500 Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and write_to_string() are inefficient and don't check inputs In-Reply-To: Message-ID: <200812121825.mBCIPPZq008484@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2711 ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-12-12 13:25 EST ------- I agree something needs to be done for this issue (in particular the bit originally covered by Bug 2710). Moving the imports into these function(s) would be another way to let us deal with the missing renderPM module if and when it is used (either leave the ImportError, or raise a missing external dependency error).
As an aside, I'd like write_to_string() to support a DPI argument like write() does. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Dec 12 14:23:06 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 12 Dec 2008 14:23:06 -0500 Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and write_to_string() are inefficient and don't check inputs In-Reply-To: Message-ID: <200812121923.mBCJN64B013046@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2711 bsouthey at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1156 is|0 |1 obsolete| | ------- Comment #4 from bsouthey at gmail.com 2008-12-12 14:23 EST ------- Created an attachment (id=1157) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1157&action=view) Corrected patch I blindly copied and pasted without correcting it. Also, added 'dpi' to write_to_string(). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Dec 12 14:29:37 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 12 Dec 2008 14:29:37 -0500 Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and write_to_string() are inefficient and don't check inputs In-Reply-To: Message-ID: <200812121929.mBCJTbtl013858@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2711 ------- Comment #5 from bsouthey at gmail.com 2008-12-12 14:29 EST ------- (In reply to comment #3) > > As an aside, I'd like write_to_string() to support a DPI argument like write() > does. 
> I added this to the patch as it was trivial. I would also think that exposing the other options (bg, configPIL, showBoundary) could be useful. But I do not know how these influence the GenomeDiagram. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Sat Dec 13 13:20:10 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sat, 13 Dec 2008 18:20:10 +0000 Subject: [Biopython-dev] [Utilities-announce] PubMed Entrez Utility 2009 DTD changes In-Reply-To: <320fb6e00812031310s43124c68n988838af3837638d@mail.gmail.com> References: <7B6F170840CA6C4DA63EE0C8A7BB43EC03A0001F@NIHCESMLBX15.nih.gov> <320fb6e00812031310s43124c68n988838af3837638d@mail.gmail.com> Message-ID: <320fb6e00812131020r4a2a02dtcc7d65e8cf495052@mail.gmail.com> On Wed, Dec 3, 2008 at 9:10 PM, Peter wrote: > This email from the NCBI will be of interest for Bio.Entrez - we may > need to add a few DTD files to Bio.Entrez in preparation for this... > see also Bug 2678. I've just added the following five DTD files to CVS, nlmcommon_090101.dtd nlmmedline_090101.dtd nlmmedlinecitation_090101.dtd nlmsharedcatcit_090101.dtd pubmed_090101.dtd All from http://www.ncbi.nlm.nih.gov/entrez/query/DTD/ Peter From bugzilla-daemon at portal.open-bio.org Sat Dec 13 15:19:15 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 13 Dec 2008 15:19:15 -0500 Subject: [Biopython-dev] [Bug 2678] Bio.Entrez module does not always retrieve or find DTD files In-Reply-To: Message-ID: <200812132019.mBDKJFkD005703@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2678 ------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2008-12-13 15:19 EST ------- (In reply to comment #6) > If the DTD is available locally in Bio/Entrez/DTDs, then Bio.Entrez will read > it from there. 
If not, it tries to download it. This may fail if the servers > are busy. If the needed DTDs are saved in Bio/Entrez/DTDs (and installed when > Biopython is installed), you won't run into this problem. I was just looking at this on my Windows XP Python 2.3 machine, and when it tried to download missing DTD files it was just using a filename as the URL. I've committed a fix to CVS which should resolve this: biopython/Bio/Entrez/Parser.py revision 1.3 I'll double check this on Linux/Mac next week. This may be related to Leighton's problem - although 'xhtml1-strict.dtd' and 'xhtml-lat1.ent' are not NCBI DTD files, but rather a part of the XML specification itself. Note that if I delete all the Bio/Entrez/DTDs/* files, then test_Entrez.py fails. I get warning messages about downloading missing DTD files, and the following failures: ====================================================================== ERROR: Test parsing pubmed links returned by ELink (fifth test) ---------------------------------------------------------------------- Traceback (most recent call last): File "test_Entrez.py", line 2523, in t_pubmed5 record = Entrez.read(input) File "c:\python23\Lib\site-packages\Bio\Entrez\__init__.py", line 286, in read record = handler.run(handle) File "c:\python23\Lib\site-packages\Bio\Entrez\Parser.py", line 95, in run self.parser.ParseFile(handle) File "c:\python23\Lib\site-packages\Bio\Entrez\Parser.py", line 131, in startE lement if object!="": UnboundLocalError: local variable 'object' referenced before assignment ====================================================================== ERROR: Test parsing XML returned by EFetch, PubMed database (first test) ---------------------------------------------------------------------- Traceback (most recent call last): File "test_Entrez.py", line 3058, in t_pubmed1 record = Entrez.read(input) File "c:\python23\Lib\site-packages\Bio\Entrez\__init__.py", line 286, in read record = handler.run(handle) File 
"c:\python23\Lib\site-packages\Bio\Entrez\Parser.py", line 95, in run self.parser.ParseFile(handle) File "c:\python23\Lib\site-packages\Bio\Entrez\Parser.py", line 294, in extern al_entity_ref_handler parser.ParseFile(handle) File "c:\python23\Lib\site-packages\Bio\Entrez\Parser.py", line 294, in extern al_entity_ref_handler parser.ParseFile(handle) ExpatError: syntax error: line 1, column 0 ====================================================================== ERROR: Test parsing XML returned by EFetch, PubMed database (second test) ---------------------------------------------------------------------- Traceback (most recent call last): File "test_Entrez.py", line 3261, in t_pubmed2 record = Entrez.read(input) File "c:\python23\Lib\site-packages\Bio\Entrez\__init__.py", line 286, in read record = handler.run(handle) File "c:\python23\Lib\site-packages\Bio\Entrez\Parser.py", line 95, in run self.parser.ParseFile(handle) File "c:\python23\Lib\site-packages\Bio\Entrez\Parser.py", line 294, in extern al_entity_ref_handler parser.ParseFile(handle) File "c:\python23\Lib\site-packages\Bio\Entrez\Parser.py", line 294, in extern al_entity_ref_handler parser.ParseFile(handle) ExpatError: syntax error: line 1, column 0 ====================================================================== FAIL: Test parsing pubmed links returned by ELink (sixth test) ---------------------------------------------------------------------- Traceback (most recent call last): File "test_Entrez.py", line 2697, in t_pubmed6 assert len(record[0]["IdCheckList"])==2 AssertionError ---------------------------------------------------------------------- (The rest of the Entrez tests pass even with the missing DTDs - they are now successfully downloaded on demand) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Sat Dec 13 18:56:02 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 13 Dec 2008 18:56:02 -0500 Subject: [Biopython-dev] [Bug 2649] Bio.KDTree expects numpy array with dtype="float32" on 64 bit machines. In-Reply-To: Message-ID: <200812132356.mBDNu2HE017869@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2649 ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-12-13 18:56 EST ------- Hi Paul, I'd like to close this bug now as we think it has been solved. Michiel's update was included with Biopython 1.49, so you don't need to mess about with CVS to check and confirm this now. Thanks, Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sat Dec 13 19:12:00 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 13 Dec 2008 19:12:00 -0500 Subject: [Biopython-dev] [Bug 2681] BioSQL: record annotations enhancements In-Reply-To: Message-ID: <200812140012.mBE0C0Yo018673@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2681 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |biopython- | |bugzilla at maubp.freeserve.co. | |uk ------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2008-12-13 19:11 EST ------- (In reply to comment #4) > (In reply to comment #2) > > (In reply to comment #0) > > > 1) Fixed date/dates typo. > > > > Why is it a typo? Change not checked in. > > The function _load_bioentry_date in Loader.py inserts the annotation 'date', > if present, or the current date if not, into the bioentry_qualifier_value > table. 
This is pulled by BioSeq.py _retrieve_qualifier_value and stored as > the attribute 'dates'. Hence I considered line 307 in BioSeq.py to be a typo, > which should be 'date' and not 'dates'. OK, that does make sense. However... > Also, because Loader.py handles dates separately, they should not be > handled by the function load_annotations. That would make sense if we make the above "dates"/"date" change. If we tested a record with a "date" annotation, I guess currently it would get recorded twice - once under "date_changed" by _load_bioentry_date (retrieved as "dates") and again but under "date" by _load_annotations (retrieved as "date"). Right now, I'm wondering why _load_bioentry_date exists in the first place ... perhaps this special annotation entry "date_changed" is to mimic BioPerl? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sat Dec 13 19:59:14 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 13 Dec 2008 19:59:14 -0500 Subject: [Biopython-dev] [Bug 2697] MaxEntropy calculate function assumes integer values for class and convergence criteria is hard coded In-Reply-To: Message-ID: <200812140059.mBE0xE0g021156@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2697 ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-12-13 19:59 EST ------- (In reply to comment #0) > Also, the convergence criteria is hard coded into the file by the following > gloable definitions: > MAX_IIS_ITERATIONS = 10000 # Maximum iterations for IIS. > IIS_CONVERGE = 1E-5 # Convergence criteria for IIS. > MAX_NEWTON_ITERATIONS = 100 # Maximum iterations on Newton's method. > NEWTON_CONVERGE = 1E-10 # Convergence criteria for Newton's method. 
> > This makes it impossible for the user to specify their own values without > changing the actual function. No, you can change them in your own code - they are just module level variable. For example: from Bio import MaxEntropy #Check the current limit, print MaxEntropy.MAX_NEWTON_ITERATIONS #Increase the iteration limit, MaxEntropy.MAX_NEWTON_ITERATIONS = 1000 One might argue these should be *optional* arguments to the functions. However, your suggested change adds new *required* arguments, which is not a backwards compatible API change. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sat Dec 13 21:20:37 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 13 Dec 2008 21:20:37 -0500 Subject: [Biopython-dev] [Bug 2697] MaxEntropy calculate function assumes integer values for class and convergence criteria is hard coded In-Reply-To: Message-ID: <200812140220.mBE2KbM1026093@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2697 ------- Comment #3 from bsouthey at gmail.com 2008-12-13 21:20 EST ------- (In reply to comment #2) > (In reply to comment #0) > > Also, the convergence criteria is hard coded into the file by the following > > gloable definitions: > > MAX_IIS_ITERATIONS = 10000 # Maximum iterations for IIS. > > IIS_CONVERGE = 1E-5 # Convergence criteria for IIS. > > MAX_NEWTON_ITERATIONS = 100 # Maximum iterations on Newton's method. > > NEWTON_CONVERGE = 1E-10 # Convergence criteria for Newton's method. > > > > This makes it impossible for the user to specify their own values without > > changing the actual function. > > No, you can change them in your own code - they are just module level variable. 
> For example: > > from Bio import MaxEntropy > #Check the current limit, > print MaxEntropy.MAX_NEWTON_ITERATIONS > #Increase the iteration limit, > MaxEntropy.MAX_NEWTON_ITERATIONS = 1000 > > One might argue these should be *optional* arguments to the functions. > However, your suggested change adds new *required* arguments, which is not a > backwards compatible API change. > > Peter > I strongly disagree on this because a user should not have to read the module source code to find these module level global variables and what values these actually are. But this is not my code. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sat Dec 13 23:27:16 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 13 Dec 2008 23:27:16 -0500 Subject: [Biopython-dev] [Bug 2697] MaxEntropy calculate function assumes integer values for class and convergence criteria is hard coded In-Reply-To: Message-ID: <200812140427.mBE4RGIE001073@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2697 ------- Comment #4 from mdehoon at ims.u-tokyo.ac.jp 2008-12-13 23:27 EST ------- (In reply to comment #3) > I strongly disagree on this because a user should not have to read the module > source code to find these module level global variables and what values these > actually are. But this is not my code. > I agree with Bruce that these variables should be arguments to the function, rather than module-level global variables. To keep the API backwards compatible, we can specify the current values for these variables as default values for these arguments. This will also make it easier for users that are not particularly interested in these variables. 
If you submit a revised patch, please do not just comment out unneeded code; it is better to actually remove code that is no longer needed. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sun Dec 14 08:17:47 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 14 Dec 2008 08:17:47 -0500 Subject: [Biopython-dev] [Bug 2697] MaxEntropy calculate function assumes integer values for class and convergence criteria is hard coded In-Reply-To: Message-ID: <200812141317.mBEDHla7021974@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2697 ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2008-12-14 08:17 EST ------- (In reply to comment #3) > (In reply to comment #2) > > No, you can change them in your own code - they are just module level > > variables > > ... > > One might argue these should be *optional* arguments to the functions. > > However, your suggested change adds new *required* arguments, which is not a > > backwards compatible API change. Sorry - you *did* use optional arguments for the train function. I was distracted by the private functions where the new arguments are required. > I strongly disagree on this because a user should not have to read the module > source code to find these module level global variables and what values these > actually are. But this is not my code. I'm not saying the current state of the code is elegant - just correcting your factual error that the end user couldn't change these parameters. They can. (In reply to comment #4) > I agree with Bruce that these variables should be arguments to the function, > rather than module-level global variables. 
To keep the API backwards > compatible, we can specify the current values for these variables as default > values for these arguments. This will also make it easier for users that are > not particularly interested in these variables. This is what I was implying, although less clearly. To be even more explicit, if we want to add these variables as arguments to the functions then they should default to the existing upper case module level variables. We shouldn't remove or rename the module level variables in case anyone was using them in the way I illustrated in comment 2. e.g. def train(training_set, results, feature_fns, update_fn=None): becomes something like this: def train(training_set, results, feature_fns, update_fn=None, max_iis_iterations = MAX_IIS_ITERATIONS, iis_converge = IIS_CONVERGE, max_newton_iterations = MAX_NEWTON_ITERATIONS, newton_converge = NEWTON_CONVERGE): #This function's code would then need updating to use #local variable max_iis_iterations instead of the #module level MAX_IIS_ITERATIONS. Note this does NOT use uppercase argument names as in Bruce's original patch - these would not be consistent with the rest of Biopython. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Dec 15 05:11:37 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 15 Dec 2008 05:11:37 -0500 Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and write_to_string() are inefficient and don't check inputs In-Reply-To: Message-ID: <200812151011.mBFABbqD007138@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2711 ------- Comment #6 from lpritc at scri.sari.ac.uk 2008-12-15 05:11 EST ------- (In reply to comment #2) > *** Bug 2710 has been marked as a duplicate of this bug.
*** > (In reply to comment #0) > test_GenomeDiagram fails because the renderPM module is not part of standard > install of reportlab, at least under Linux. That's odd - renderPM is in the source for ReportLab 2.2. Are you using an up-to-date version? It seems to install well enough on our 64-bit Linux box from the ReportLab source. > I consider that the renderPM module should not be required so > Graphics/GenomeDiagram/Diagram.py needs to be rewritten to avoid using the > renderPM module when it is not available. renderPM is how raster graphics are drawn, so is, I'm afraid, a necessary part of GenomeDiagram's functionality. I prefer your alternative suggestion of making it a 'dynamic' import, but even then I think that the inconvenience of preparing the diagram, only to find out at the last possible stage that you can't draw it because you're missing the library, is worse than getting the error message upfront. Not that this should be a problem, since renderPM is part of the main ReportLab source, now. YMMV though, and I'm happy for the code to conform to the Biopython house style. > The installation documentation needs to include something about needing the > renderPM for JPG, BMP, GIF, PNG, TIFF or TIFF outputs. > > There must be a test for the presence of the renderPM module. I'm not convinced of the value of this, as renderPM is part of the current ReportLab source installation. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
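A side note on the train() signature discussed for Bug 2697 earlier in this archive: Python evaluates default argument values once, when the function is defined. If the new keyword arguments default directly to the module-level constants, later reassignment of those constants (the usage shown in comment #2 of that bug) would no longer affect the defaults. The sketch below illustrates the difference with one constant; the two function names are hypothetical, and the `None` sentinel is just one possible way to keep the old behaviour, not something proposed in the thread.

```python
MAX_NEWTON_ITERATIONS = 100  # module-level default, value as quoted in the thread


def iterations_bound_at_def(max_newton_iterations=MAX_NEWTON_ITERATIONS):
    """The default is captured once, when this function is defined."""
    return max_newton_iterations


def iterations_looked_up_at_call(max_newton_iterations=None):
    """None means: read the module-level value at call time."""
    if max_newton_iterations is None:
        max_newton_iterations = MAX_NEWTON_ITERATIONS
    return max_newton_iterations


# A user raises the module-level limit, as in comment #2 of Bug 2697.
MAX_NEWTON_ITERATIONS = 1000

print(iterations_bound_at_def())       # prints 100: the definition-time value
print(iterations_looked_up_at_call())  # prints 1000: follows the module value
```

Either form keeps existing calls working; only the sentinel form also honours changes made to the module-level variables after import.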
From bugzilla-daemon at portal.open-bio.org Mon Dec 15 05:17:54 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 15 Dec 2008 05:17:54 -0500 Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and write_to_string() are inefficient and don't check inputs In-Reply-To: Message-ID: <200812151017.mBFAHs0K007630@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2711 ------- Comment #7 from lpritc at scri.sari.ac.uk 2008-12-15 05:17 EST ------- (In reply to comment #0) (from #2710) > test_GenomeDiagram fails because the renderPM module is not part of standard > install of reportlab, at least under Linux. renderPM is part of the source install of ReportLab 2.2, and installs correctly on our 64-bit Linux box. Are you using an up-to-date version of ReportLab? The version that your distro's installer uses may not be the most recent. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Mon Dec 15 05:41:13 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 15 Dec 2008 05:41:13 -0500 Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and write_to_string() are inefficient and don't check inputs In-Reply-To: Message-ID: <200812151041.mBFAfDI8010277@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2711 ------- Comment #8 from lpritc at scri.sari.ac.uk 2008-12-15 05:41 EST ------- (In reply to comment #0) > 1) Why are there two functions to output a diagram than just one generic > function? In particular, why not just pass a filename or not? When I wrote the libraries originally, I had one main use in mind: production of publication-quality images in vector format. Later on I decided that I needed streaming output for web display, and then bolted on the write_to_string() to look like the ReportLab interface, for consistency. That's why there are two methods: the write() method produces publication-quality (and bitmaps, if you ask), and the write_to_string() method produces the streaming output.
It should be possible to make write() do both jobs, so long as the intention is declared in the argument list. It might be nice to just be able to specify a stream or handle, rather than the filename. Both of these would be an API change. > 2) I find the functions write() and write_to_string() just plain ugly. > You define a local dictionary of modules every time these functions are called. That dictionary could be placed at the head of the script to be defined on import. But I think it's more explicit what's going on to have it in the method itself - the dictionary has restricted scope, and is garbage-collected after the function call. Also, I don't understand your nested list proposal: distribution dictionaries are not that uncommon. > 4) I do not know the policy on module imports, but this line is only required > for write() and write_to_string(): > from reportlab.graphics import renderPS, renderPDF, renderSVG, renderPM > Also renderPM is an addon. Apologies for repeating myself earlier about this one - Bugzilla was being flaky - but renderPM is now part of ReportLab 2.2. Whether we should continue to support/cater for installations of 1.21 without the add-ons is another question, I think. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Mon Dec 15 05:51:30 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 15 Dec 2008 05:51:30 -0500 Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and write_to_string() are inefficient and don't check inputs In-Reply-To: Message-ID: <200812151051.mBFApU9R011217@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2711 ------- Comment #9 from lpritc at scri.sari.ac.uk 2008-12-15 05:51 EST ------- (In reply to comment #3) >As an aside, I'd like write_to_string() to support a DPI argument like write() > does. The way I originally intended write_to_string() to be used - sending graphics to a browser - the DPI has no influence at all. DPI is only of any importance for printing graphics: the DPI translates the pixel size into the final printed size of the image. The image you see on screen (assuming no fancy browser scaling) is pixel-per-pixel. That's why I left it out. It may be that people have a sensible reason for writing their image output to string - rather than binary - encoding, for writing to a file. I'm not clear on what that would be, but it's possible. In that case, I think that an appropriate merging of the write() and write_to_string() methods could be: def write(self, filename=None, output=default_output, dpi=default_dpi, encoding=default_encoding): encoding could then be either 'binary' (default), or 'string' - which would emulate write_to_string()'s function. Where handle is not None, the resulting output would be sent to the passed handle - which could potentially include sys.stdout. Where handle is None, the method could return the encoded image directly, as write_to_string() does, now. Other than the obvious problem with ReportLab's drawToFile requiring a filename, rather than a handle - does this seem like a reasonable plan to others? 
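A minimal sketch of the merged method described above, written as a plain function rather than the real GenomeDiagram code. It assumes the argument is a handle (as in the prose) rather than a filename, and render_image() is a hypothetical stand-in for the ReportLab drawing step, not a ReportLab call.

```python
import io


def render_image(output):
    """Hypothetical stand-in for the ReportLab rendering step; returns bytes."""
    return ("<%s image data>" % output).encode("ascii")


def write(handle=None, output="PS"):
    """Send the image to handle, or return it if no handle is given."""
    data = render_image(output)
    if handle is None:
        # Behaves like write_to_string(): hand the encoded image back.
        return data
    # Behaves like write(): send the image to the caller's handle.
    handle.write(data)
    return None


# An in-memory handle captures the same data that the no-handle call returns.
buffer = io.BytesIO()
write(buffer, "PDF")
as_string = write(output="PDF")
```

Note that the return value depends on the arguments (None, or the image data). A handle-only method combined with an in-memory buffer, as with io.BytesIO here, would cover both uses with a single return type.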
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Dec 15 06:00:01 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 15 Dec 2008 06:00:01 -0500 Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and write_to_string() are inefficient and don't check inputs In-Reply-To: Message-ID: <200812151100.mBFB01fk011962@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2711 ------- Comment #10 from biopython-bugzilla at maubp.freeserve.co.uk 2008-12-15 06:00 EST ------- (In reply to comment #8) > > > 4) I do not know the policy on module imports, but this line is only > > required for write() and write_to_string(): > > from reportlab.graphics import renderPS, renderPDF, renderSVG, renderPM > > Also renderPM is an addon. > > Apologies for repeating myself earlier about this one - Bugzilla was being > flaky - but renderPM is now part of ReportLab 2.2. Whether we should continue > to support/cater for installations of 1.21 without the add-ons is another > question, I think. I thought I'd commented on this bug already, but I have committed a patch which fails gracefully if renderPM is missing. I must be running an older version of ReportLab on my Linux box at home, because it didn't have renderPM installed. However - this check is done when writing the file. This is good if you don't have renderPM but only want vector images. This is bad if you do want bitmap images, as the missing dependency error only appears at the very end. However, I don't think we can assume renderPM will be installed. Looking at the website for ReportLab 2.2, it's not clear if the Windows installers will include renderPM or not...
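A rough sketch of the "fail gracefully" idea, not the committed patch: the import of renderPM is attempted once, and the missing dependency is only reported if a bitmap format is actually requested. BITMAP_FORMATS and check_format() are names made up for this illustration, and ImportError is used here only for simplicity.

```python
try:
    from reportlab.graphics import renderPM
except ImportError:
    # ReportLab, or its renderPM module, is not available on this system.
    renderPM = None

# Assumed list of formats that need renderPM; vector formats do not.
BITMAP_FORMATS = ("PNG", "JPG", "TIFF", "GIF", "BMP")


def check_format(output):
    """Return the normalised format name, or complain if renderPM is needed."""
    fmt = output.upper()
    if fmt in BITMAP_FORMATS and renderPM is None:
        raise ImportError("renderPM is required for %s output" % fmt)
    return fmt
```

Calling check_format() when the diagram object is created, rather than when the file is written, would give the up-front warning while still allowing vector-only use.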
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Dec 15 06:02:35 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 15 Dec 2008 06:02:35 -0500 Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and write_to_string() are inefficient and don't check inputs In-Reply-To: Message-ID: <200812151102.mBFB2ZMq012237@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2711 ------- Comment #11 from lpritc at scri.sari.ac.uk 2008-12-15 06:02 EST ------- (In reply to comment #3) > I agree something needs to be done for this issue (in particular the bit > originally covered by Bug 2710. > > Moving the imports into these function(s) would be another way to let use deal > with the missing renderPM module if and when it is used (either leave the > ImportError, or raise a missing external dependency error). One issue with this approach is that, when working with the module interactively, a user might not be aware of the absence of the appropriate module until they attempted to produce their output - which might be after quite a bit of interactive work. Informing the user up-front that renderPM is not available - either by ImportError or friendly warning - avoids this. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Mon Dec 15 06:17:45 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 15 Dec 2008 06:17:45 -0500 Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and write_to_string() are inefficient and don't check inputs In-Reply-To: Message-ID: <200812151117.mBFBHjgn013463@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2711 ------- Comment #12 from biopython-bugzilla at maubp.freeserve.co.uk 2008-12-15 06:17 EST ------- (In reply to comment #9) > (In reply to comment #3) > > As an aside, I'd like write_to_string() to support a DPI argument like > > write() does. > > The way I originally intended write_to_string() to be used - sending graphics > to a browser - the DPI has no influence at all. DPI is only of any importance > for printing graphics ... OK, so it's less useful than I had expected. Rendering bitmaps to strings so they can be inserted into a database as blobs is one potential use-case. Another is a web-service where you expect the user to save and print the naked image (unusual, and how the DPI is treated is probably software dependent). > In that case, I think that an appropriate merging of the write() and > write_to_string() methods could be: > > def write(self, filename=None, output=default_output, dpi=default_dpi, > encoding=default_encoding): > > encoding could then be either 'binary' (default), or 'string' - which would > emulate write_to_string()'s function. > > Where handle is not None, the resulting output would be sent to the passed > handle - which could potentially include sys.stdout. Where handle is None, > the method could return the encoded image directly, as write_to_string() > does, now. > > Other than the obvious problem with ReportLab's drawToFile requiring a > filename, rather than a handle - does this seem like a reasonable plan to > others?
On the plus side, this would be backwards compatible (and we could deprecate the write_to_string function). However, I'm not so keen on this style personally - the return value is radically different depending on the arguments (nothing, or a string of data). If we were designing this from scratch, I would have suggested one write function which wrote to a handle - which would let you then write to a file or a string (using StringIO). On the other hand, this is perhaps a little low level. We've had similar discussions regarding Bio.SeqIO in the past. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Dec 15 15:33:51 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 15 Dec 2008 15:33:51 -0500 Subject: [Biopython-dev] [Bug 2591] GenBank files misparsed for long organism names In-Reply-To: Message-ID: <200812152033.mBFKXpp4005791@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2591 ------- Comment #4 from joelb at lanl.gov 2008-12-15 15:33 EST ------- I heard back from GenBank, and it seems they are saying the problem isn't theirs: >On Tue, December 9, 2008 10:30 am, gb-admin at ncbi.nlm.nih.gov wrote: >> Hi Joel, >> >> I heard back from our database folks on this one. Essentially we do >> allow the source line to line-wrap, but we never publicly announced >> it. We apologize for this oversight and will be putting something >> in the release notes regarding this. Hopefully BioPython and other >> companies will be able to pick up this change and adapt once it is >> announced in the release notes. >> >> thanks for pointing it out >> >> Linda I just wrote back with the followup question: > >OK, but then a followup question.
How does one distinguish, then, a >line-wrapped organism line from the multiline phylogeny that follows? >According to my reading of the specs (and most Bio* GenBank parser's >implementations) it seems that an equally-valid parsing of the following >ORGANISM record is that it belongs to the "AKU_12601 Bacteria" kingdom. >That is, there is no official way of signalling "this is the end of the >multiline organism name" or "this begins the multiline phylogeny record." > > ORGANISM Salmonella enterica subsp. enterica serovar Paratyphi A str. > AKU_12601 > Bacteria; Proteobacteria; Gammaproteobacteria;Enterobacteriales; > Enterobacteriaceae; Salmonella. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Dec 17 18:44:58 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 17 Dec 2008 18:44:58 -0500 Subject: [Biopython-dev] [Bug 2591] GenBank files misparsed for long organism names In-Reply-To: Message-ID: <200812172344.mBHNiwPt019616@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2591 ------- Comment #5 from joelb at lanl.gov 2008-12-17 18:44 EST ------- I received the following response to my followup. It now appears that the bug is with BioPython, since GenBank has changed its definition. It seems likely that all Bio* flatfile parsers will be affected. >I just received the wording that will appear in Section 3.4.2 of gbrel.txt >for this month's release: > > ORGANISM - Formal scientific name of the organism (first line) >and taxonomic classification levels (second and subsequent lines). >Mandatory subkeyword in all annotated entries/two or more records. 
> > In the event that the organism name exceeds 68 characters (80 - 13 + 1) > in length, it will be line-wrapped and continue on a second line, > prior to the taxonomic classification. Unfortunately, very long > organism names were not anticipated when the fixed-length GenBank > flatfile format was defined in the 1980s. The possibility of linewraps > makes the job of flatfile parsers more difficult : essentially, one > cannot be sure that the second line is truly a classification/lineage > unless it consists of multiple tokens, delimited by semi-colons. > The long-term solution to this problem is to introduce an additional > subkeyword, probably 'LINEAGE' . This might occur sometime in 2009 > or 2010. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Dec 18 06:07:16 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 18 Dec 2008 06:07:16 -0500 Subject: [Biopython-dev] [Bug 2591] GenBank files misparsed for long organism names In-Reply-To: Message-ID: <200812181107.mBIB7G97005964@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2591 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk 2008-12-18 06:07 EST ------- (In reply to comment #5) > I received the following response to my followup. It now appears that the bug > is with BioPython, since GenBank has changed its definition. It seems likely > that all Bio* flatfile parsers will be affected.
Thanks for chasing this up Joel :) > I just received the wording that will appear in Section 3.4.2 of gbrel.txt > for this month's release: > > > > ORGANISM - Formal scientific name of the organism (first line) > >and taxonomic classification levels (second and subsequent lines). > >Mandatory subkeyword in all annotated entries/two or more records. > > > > In the event that the organism name exceeds 68 characters (80-13+1) > > in length, it will be line-wrapped and continue on a second line, > > prior to the taxonomic classification. Unfortunately, very long > > organism names were not anticipated when the fixed-length GenBank > > flatfile format was defined in the 1980s. The possibility of linewraps > > makes the job of flatfile parsers more difficult : essentially, one > > cannot be sure that the second line is truly a classification/lineage > > unless it consists of multiple tokens, delimited by semi-colons. > > The long-term solution to this problem is to introduce an additional > > subkeyword, probably 'LINEAGE' . This might occur sometime in 2009 > > or 2010. It looks like my guess was right, see comment #1: > Let's wait and hear what the NCBI says - I expect they will have to change the > file format definition slightly. > > If they say this is a valid file, I hope they will also explain officially > how we should split up the species and its lineage. One option would be > some thing like looking for semi-colons in the following text as indicative > of the lineage (rather than as more of the ORGANISM). Now that we've had the NCBI recommend the semi-colon approach, I've fixed our parser in CVS: Bio/GenBank/Record.py revision 1.14 Bio/GenBank/Scanner.py revision 1.26 Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
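The semi-colon heuristic discussed in this bug can be sketched roughly as follows. This is an illustration only, not the code committed to Bio/GenBank/Scanner.py; split_organism() is a hypothetical helper that takes the ORGANISM line plus its continuation lines with the keyword and indentation already removed.

```python
def split_organism(lines):
    """Split ORGANISM lines into (organism name, list of lineage terms)."""
    stripped = [line.strip() for line in lines]
    name_parts = stripped[:1]
    index = 1
    # Continuation lines without a semi-colon are taken as wrapped name text.
    while index < len(stripped) and ";" not in stripped[index]:
        name_parts.append(stripped[index])
        index += 1
    # Everything from the first semi-colon line onwards is the lineage.
    lineage_text = " ".join(stripped[index:]).rstrip(".")
    lineage = [term.strip() for term in lineage_text.split(";") if term.strip()]
    return " ".join(name_parts), lineage


# The record quoted in comment #4 of this bug.
record = [
    "Salmonella enterica subsp. enterica serovar Paratyphi A str.",
    "AKU_12601",
    "Bacteria; Proteobacteria; Gammaproteobacteria;Enterobacteriales;",
    "Enterobacteriaceae; Salmonella.",
]
name, lineage = split_organism(record)
```

As the NCBI wording warns, a lineage consisting of a single token with no semi-colon cannot be told apart from a wrapped name, and this sketch would treat it as part of the name.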
From bugzilla-daemon at portal.open-bio.org Thu Dec 18 14:01:32 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 18 Dec 2008 14:01:32 -0500 Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main Biopython distribution In-Reply-To: Message-ID: <200812181901.mBIJ1W31019801@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2671 ------- Comment #31 from biopython-bugzilla at maubp.freeserve.co.uk 2008-12-18 14:01 EST ------- (In reply to comment #27) > This might be better off as a new enhancement bug, but here is a possible > "arc-box" drawing function to go in the AbstractDrawer.py file, based on the > existing draw_box function. > > ... There was an issue with different frames of reference in the initial code I was suggesting. > Alternately, the code could just go in CircularDrawer.py directly. This seemed simpler in the short term. > As far as I can tell from looking at their source code, even ReportLab_1_21_2 > has ArcPath defined in reportlab.graphics.shapes so there shouldn't be any > issue here with backwards compatibility. I've just checked in a patch based on this - see Bio/Graphics/GenomeDiagram/CircularDrawer.py revision 1.8 I've also updated the unit test to draw a circular diagram with some features in white (with an automatic black border). This now looks nice - with the old code using multiple boxes to fake the arced box, the whole feature ended up looking black. See Tests/test_GenomeDiagram.py revision 1.13 As a bonus, PDF output seems a little smaller now as well :) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is.
From bugzilla-daemon at portal.open-bio.org Mon Dec 22 11:19:51 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 22 Dec 2008 11:19:51 -0500 Subject: [Biopython-dev] [Bug 2375] Coalescent support through Simcoal2 In-Reply-To: Message-ID: <200812221619.mBMGJp6k013225@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2375 ------- Comment #25 from biopython-bugzilla at maubp.freeserve.co.uk 2008-12-22 11:19 EST ------- (In reply to comment #24) > I committed my patch to setup.py, as it seems to work fine with Python 2.3, > 2.4, and 2.5 on all platforms. Leaving this bug open, since we still need to > remove the workaround in Bio/PopGen/SimCoal/__init__.py. Editing Bio/PopGen/SimCoal/__init__.py to do just the following seems to work fine on Linux and MacOS (I've not tested on Windows yet):

    import os
    builtin_tpl_dir = os.path.abspath(os.path.join(os.path.dirname(__file__), "data"))

I *think* this directory is only used in one place in Bio/PopGen/SimCoal/Template.py so it might make more sense to put this code in that function (leaving the __init__.py file essentially empty). What do you think Tiago? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Mon Dec 22 12:20:46 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 22 Dec 2008 12:20:46 -0500 Subject: [Biopython-dev] [Bug 2532] Using IUPAC alphabets in mixed case Seq objects In-Reply-To: Message-ID: <200812221720.mBMHKkwo018936@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2532 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #961 is|0 |1 obsolete| | ------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk 2008-12-22 12:20 EST ------- (From update of attachment 961) This patch is now obsolete - I've checked in a variant of this into CVS. This will allow us to proceed with Bug 2597 ( Enforce alphabet letters in Seq objects) without having to first introduce mixed case variants of the IUPAC alphabets. If/when we have mixed case IUPAC alphabets, then Bio.Sequencing.PhD could use them. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Mon Dec 22 12:33:33 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 22 Dec 2008 12:33:33 -0500 Subject: [Biopython-dev] [Bug 2532] Using IUPAC alphabets in mixed case Seq objects In-Reply-To: Message-ID: <200812221733.mBMHXXjd020146@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2532 ------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2008-12-22 12:33 EST ------- Created an attachment (id=1174) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1174&action=view) Patch for Bio/Nexus/Nexus.py (non IUPAC) alphabet handling (In reply to comment #2) > I opt for (b): an easy one-time addition to Bio.Alphabets, easy to use for > everyone (instead creating their own uppercase-lowercase variants of those > terribly complicated biopython alphabet classes), and easy to change for all > other modules if lowercase-uppercase is what they want (or need). I'm not saying we shouldn't add mixed (and even lower) case variants of the IUPAC alphabets, however, even if we had them, NEXUS still uses extra characters like "-" for gaps (easily handled via a Gapped alphabet encoder) and "?" (for a missing character). Are there any other extra characters? Under the current alphabet schema, we'd have to use a (mixed case) IUPAC alphabet, then add a Gapped AlphabetEncoder (easy) then add a new alphabet encoder for any miscellaneous non-IUPAC characters like "?". This could be done with the generic AlphabetEncoder, or we could add additional encoder objects for special meanings. This starts to get complicated (dealing with AlphabetEncoders is nasty). This attached patch is a variation on my "plan (a)" from comment 0. It makes Bio.Nexus create its own alphabet objects (based on the generic DNA/RNA/Protein classes) with the precise list of valid letters required for that file. Using this patch should allow us to press ahead with Bug 2597.
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Dec 22 12:38:10 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 22 Dec 2008 12:38:10 -0500 Subject: [Biopython-dev] [Bug 2597] Enforce alphabet letters in Seq objects In-Reply-To: Message-ID: <200812221738.mBMHcA86020507@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2597 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-12-22 12:38 EST ------- Created an attachment (id=1175) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1175&action=view) Patch for Bio/Seq.py to check the alphabet letters This is a simple approach to checking the letters - probably not the fastest. I think it is important that the exception gives some clue about why the Seq object was not created - either listing the first invalid character (as in this patch) or listing all invalid characters (which could be done using sets). On the other hand, I'd like this check to be as fast as possible - perhaps even at the cost of a generic exception message like "Sequence contains letters which are not valid for the given alphabet". -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
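The set-based variant mentioned in this comment, which reports all invalid characters at once, might look roughly like this. It is a sketch working on plain strings, not the attached patch for Bio/Seq.py; invalid_letters() and check_sequence() are hypothetical names, and the alphabet is passed in as a string of allowed letters.

```python
def invalid_letters(sequence, letters):
    """Return a sorted list of characters in sequence not found in letters."""
    return sorted(set(sequence) - set(letters))


def check_sequence(sequence, letters):
    """Return sequence unchanged, or raise ValueError naming the bad letters."""
    bad = invalid_letters(sequence, letters)
    if bad:
        raise ValueError("Letters not valid for this alphabet: %s" % ", ".join(bad))
    return sequence


# Unambiguous DNA, upper case only; a gap or an N would be rejected.
check_sequence("GATTACA", "ACGT")
```

Building the set costs one pass over the sequence, so the informative message comes at roughly the same price as a bare yes/no check.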
From bugzilla-daemon at portal.open-bio.org Mon Dec 22 13:27:11 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 22 Dec 2008 13:27:11 -0500 Subject: [Biopython-dev] [Bug 2532] Using IUPAC alphabets in mixed case Seq objects In-Reply-To: Message-ID: <200812221827.mBMIRBme024497@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2532 ------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk 2008-12-22 13:27 EST ------- Created an attachment (id=1176) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1176&action=view) Adding lower and mixed case IUPAC Alphabets This needs reviewing by someone else - especially the multiple inheritance which tries to follow the existing pattern that the parent is a more general version of the child. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Dec 23 04:58:31 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 23 Dec 2008 04:58:31 -0500 Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and write_to_string() are inefficient and don't check inputs In-Reply-To: Message-ID: <200812230958.mBN9wVDK000340@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2711 ------- Comment #13 from bsouthey at gmail.com 2008-12-23 04:58 EST ------- (In reply to comment #6) > (In reply to comment #2) > > *** Bug 2710 has been marked as a duplicate of this bug. *** > > > > (In reply to comment #0) > > test_GenomeDiagram fails because the renderPM module is not part of standard > > install of reportlab, at least under Linux. > > That's odd - renderPM is in the source for ReportLab 2.2. Are you using an > up-to-date version? It seems to install well enough on our 64-bit Linux box > from the ReportLab source. 
I can not check this as I am away from my system. As I recall, the Python code for accessing this library is provided with the standard install as there is a renderPM.py file. But that is just a wrapper to some C code found in the rl_addons directory. So it is a big no that renderPM is available unless you actually build the C sources or download the binaries (only valid for Windows). According to the website http://www.reportlab.org/subversion.html " It will create subdirectories for reportlab, which is an importable python package, and rl_addons which contains the C extensions. The latter need building with the contained setup script, but can also be downloaded in pre-built form from our downloads page. They rarely change. " What did you actually install? In particular where was _renderPM built? Basically we need to document this as there appear to be different ways to install reportlab (may also be version or svn related). > > > I consider that the renderPM module should not be required so > > Graphics/GenomeDiagram/Diagram.py needs to be rewritten to avoid using the > > renderPM module when it is not available. > > renderPM is how raster graphics are drawn, so is, I'm afraid, a necessary part > of GenomeDiagram's functionality. No problem then, but you must provide a test for the presence and functionality of it in the actual code as well as the biopython tests. > > I prefer your alternative suggestion of making it a 'dynamic' import, but even > then I think that the inconvenience of preparing the diagram, only to find out > at the last possible stage that you can't draw it because you're missing the > library, is worse than getting the error message upfront. Not that this should > be a problem, since renderPM is part of the main ReportLab source, now. YMMV > though, and I'm happy for the code to conform to the Biopython house style.
> > > The installation documentation needs to include something about needing the > > renderPM for JPG, BMP, GIF, PNG, TIFF or TIFF outputs. > > > > There must be a test for the presence of the renderPM module. > > I'm not convinced of the value of this, as renderPM is part of the current > ReportLab source installation. > My understanding is that this statement is not completely true. But I would like confirmation either way. There may also be allowance for windows installations especially non-source ones but I can not check those. Bruce -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Dec 23 05:18:58 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 23 Dec 2008 05:18:58 -0500 Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and write_to_string() are inefficient and don't check inputs In-Reply-To: Message-ID: <200812231018.mBNAIwuq002193@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2711 ------- Comment #14 from bsouthey at gmail.com 2008-12-23 05:18 EST ------- (In reply to comment #12) > (In reply to comment #9) > > (In reply to comment #3) > > > As an aside, I'd like write_to_string() to support a DPI argument like > > > write() does. > > > > The way I originally intended write_to_string() to be used - sending graphics > > to a browser - the DPI has no influence at all. DPI is only of any importance > > for printing graphics ... > > OK, so its less useful than I had expected. Rending bitmaps to strings so they > can be inserted into a database as blobs is one potential use-case. Also for a > web-service where you expect the user to save and print the naked image > (unusual, and probably software dependent on how the DPI is treated). 
> Surely it is important because a user can write to a string and then save the string to a file rather than using write() a second time. What do these options do? bg, configPIL, showBoundary > > In that case, I think that an appropriate merging of the write() and > > write_to_string() methods could be: > > > > def write(self, filename=None, output=default_output, dpi=default_dpi, > > encoding=default_encoding): > > > > encoding could then be either 'binary' (default), or 'string' - which would > > emulate write_to_string()'s function. > > > > Where handle is not None, the resulting output would be sent to the passed > > handle - which could potentially include sys.stdout. Where handle is None, > > the method could return the encoded image directly, as write_to_string() > > does, now. > > > > Other than the obvious problem with ReportLab's drawToFile requiring a > > filename, rather than a handle - does this seem like a reasonable plan to > > others? > > On the plus side, this would be backwards compatible (and we could deprecate > the draw_to_string function). > > However, I'm not so keen on this style personally - the return value is > radically different depending on the arguments (nothing, or a string of data). > > If we were designing this from scratch, I would have suggested one write > function which wrote to a handle - which would let you then write to a file or > a string (using StringIO). On the other hand, this is perhaps a little low > level. We're had similar discussions regarding Bio.SeqIO in the past. > I agree and I am not very concerned about backwards compatibility since this is a very new function to Biopython. I think that is what is almost what write_to_string() does and python functions are very big. But this is not my code so please do as you want here. 
Bruce -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Dec 23 06:12:33 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 23 Dec 2008 06:12:33 -0500 Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and write_to_string() are inefficient and don't check inputs In-Reply-To: Message-ID: <200812231112.mBNBCXkt006916@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2711 ------- Comment #15 from biopython-bugzilla at maubp.freeserve.co.uk 2008-12-23 06:12 EST ------- (In reply to comment #14) > (In reply to comment #12) > > OK, so its less useful than I had expected. Rending bitmaps to strings so > > they can be inserted into a database as blobs is one potential use-case. > > Also for a web-service where you expect the user to save and print the > > naked image (unusual, and probably software dependent on how the DPI is > > treated). > > Surely it is important because a user can write to a string and then save the > string to a file rather than using write() a second time. I was talking about write to string with a DPI not being so useful. Using write to string is VERY useful, particularly for a webserver (which is why Leighton added it, and how I have used it). Setting the DPI isn't important for using images in webpages - HTML and CSS provide lots of ways to control the displayed and printed size. Even if the browser is pointed directly at the image (and not as part of a webpage) and you then print it, the browser may ignore the DPI setting (probably browser specific). i.e. The DPI will only matter if the user saves the image and opens it in DPI aware software. 
(In reply to comment #14) > (In reply to comment #12) > > However, I'm not so keen on this style personally - the return value is > > radically different depending on the arguments (nothing, or a string of > > data). > > > > If we were designing this from scratch, I would have suggested one write > > function which wrote to a handle - which would let you then write to a > > file or a string (using StringIO). On the other hand, this is perhaps a > > little low level. We're had similar discussions regarding Bio.SeqIO in > > the past. > > I agree and I am not very concerned about backwards compatibility since this > is a very new function to Biopython. I think that is what is almost what > write_to_string() does and python functions are very big. But this is not my > code so please do as you want here. GenomeDiagram is new to Biopython, but has been available independently for many years. There will be some existing users (not just me and Leighton), and the less they have to change to switch their code from using standalone GenomeDiagram to the one within Biopython the better (the import lines have to change for example). We do need to think about backwards compatibility a bit. Getting back to your original points, (1) Two functions write() and write_to_string() This follows the reportlab API, and they do actually return different encodings. From a backwards compatibility argument they should both stay, but that doesn't stop us providing a unified method and deprecating write_to_string(). (2) Coding style of write() and write_to_string() I don't have a problem with this - it works, its clear, its easily extended if ReportLab add more back ends. It doesn't strike me as ugly. Inevitably this is largely a matter of preference. (3) The KeyError exception with invalid arguments. This is fixed in CVS, for an invalid format argument you now get a ValueError which is standard python practice. 
(4) renderPM Fixed in CVS, in that you can now use GenomeDiagram without ReportLab renderPM, and have full functionality except for bitmap output. Given we don't seem to be able to assume renderPM will be installed and working, this seems a reasonable solution. If you try and render a bitmap without renderPM, then you get a MissingExternalDependencyError exception asking you to install renderPM. We will need to look into this further for the documentation. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Dec 23 07:45:55 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 23 Dec 2008 07:45:55 -0500 Subject: [Biopython-dev] [Bug 2718] New: Bio.Graphics and output file formats (PDF, EPS, SVG, and bitmaps) Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2718 Summary: Bio.Graphics and output file formats (PDF, EPS, SVG, and bitmaps) Product: Biopython Version: Not Applicable Platform: All OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk In addition to PDF and PS/EPS (encapsulated postscript), ReportLab can also do SVG, and with its optional renderPM module can do assorted bitmaps too (e.g. PNG, JPG, TIFF, GIF, BMP). Note that renderPM may not be installed (see Bug 2710). The recently added Bio.Graphics.GenomeDiagram module supports all of these formats - see Diagram.py with write (to filename or a handle) and write_to_string methods. 
Looking at the older Bio.Graphics code, it currently only supports PDF postscript, using a mixture of method names (which isn't very consistent): Bio.Graphics.Distribution has a DistributionPage object with a draw method (which writes to a filename or handle). Bio.Graphics.BasicChromosome has an Organism object with a write method (which writes to a filename or handle). Bio.Graphics.Comparative has a ComparativeScatterPlot object with a draw_to_file method (which writes to a filename or handle). I would like: (1) All the Bio.Graphics "write to file/handle" functions to accept any of the supported file formats (like Bio.Graphics.GenomeDiagram), which would require renderPM at run time for the bitmap formats (see Bug 2710). They should share some code for mapping format names to ReportLab rendering module. This would be easy to do without changing the existing mix of method names. (2) Update the docstrings for the "write to file/handle" functions to make it clear they can accept a filename OR a handle (a result of the underlying reportlab renderer's drawToFile function's behaviour - see note below). (3) Standardise on the method naming (and perhaps deprecate the old methods). Using "write" seems to be a sensible choice based on the current names used in Bio.Graphics. For reference/comparison, ReportLab's render modules have three related functions: * drawToString - Returns a string, calls drawToFile internally with a StringIO handle. * drawToFile - Takes a filename OR a handle (although their docstrings do not make this clear, this works as the Canvas object takes either). Calls the draw function internally. * draw - Takes a canvas object See also Bug 2711 which touched on these issues in the context of GenomeDiagram only. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
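
A minimal sketch of the shared "format name to ReportLab rendering module" lookup asked for in point (1) above. The table and the names resolve_format and load_renderer are hypothetical and are not the code in CVS. Treating format names case-insensitively is an assumption made for this sketch, and it is written for current Python 3 rather than the Python 2 of the time.

```python
import importlib

# Illustrative table: output format name -> ReportLab renderer module name.
# The formats are the ones listed in the bug report; the helper names are made up.
_FORMAT_TO_RENDERER = {
    "PDF": "renderPDF",
    "PS": "renderPS",
    "EPS": "renderPS",
    "SVG": "renderSVG",
    "PNG": "renderPM",
    "JPG": "renderPM",
    "TIFF": "renderPM",
    "GIF": "renderPM",
    "BMP": "renderPM",
}


def resolve_format(output):
    """Return (renderer_module_name, canonical_format) for a format name."""
    canonical = str(output).upper()  # assumption: accept "pdf" as well as "PDF"
    try:
        return _FORMAT_TO_RENDERER[canonical], canonical
    except KeyError:
        raise ValueError("Unsupported output format %r" % (output,)) from None


def load_renderer(output):
    """Import and return (renderer_module, canonical_format); needs ReportLab."""
    module_name, canonical = resolve_format(output)
    module = importlib.import_module("reportlab.graphics." + module_name)
    return module, canonical
```

Each of the existing draw, write and draw_to_file methods could then call one such helper without being renamed, which is the property point (1) asks for.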
From bugzilla-daemon at portal.open-bio.org Tue Dec 23 07:47:26 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 23 Dec 2008 07:47:26 -0500 Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and write_to_string() are inefficient and don't check inputs In-Reply-To: Message-ID: <200812231247.mBNClPt9017108@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2711 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #16 from biopython-bugzilla at maubp.freeserve.co.uk 2008-12-23 07:47 EST ------- In comment #12, I wrote: > If we were designing this from scratch, I would have suggested one write > function which wrote to a handle - which would let you then write to a file or > a string (using StringIO). On the other hand, this is perhaps a little low > level. We're had similar discussions regarding Bio.SeqIO in the past. The reportlab docstrings are very unclear, however, their renderer's drawToFile functions take either a filename OR a handle. This works because the underlying Canvas object can be created giving either a filename or a handle. As a result, GenomeDiagram's write() method should accept either a filename or a handle. We should update the docstring to say this (perhaps even renaming the argument?). (In reply to comment #15) > (1) Two functions write() and write_to_string() > This follows the reportlab API, and they do actually return different > encodings. I wrote this based on something Leighton had said to me. Going over the reportlab code, this isn't true - reportlab's drawToString just calls drawToFile with a cStringIO or StringIO handle. They write identical data. 
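
The relationship described above, where the "to string" variant simply calls the "to file" variant with an in-memory handle, can be sketched as follows. This is an illustration only, not the ReportLab or GenomeDiagram code: write_output and write_output_to_string are hypothetical names, and the data argument stands in for an already rendered image.

```python
import io


def write_output(data, target):
    """Write bytes to target, which is either a filename or a binary handle."""
    if hasattr(target, "write"):
        # Looks like a handle: write to it and leave closing to the caller.
        target.write(data)
    else:
        # Otherwise treat it as a filename and open it in binary mode.
        with open(target, "wb") as handle:
            handle.write(data)


def write_output_to_string(data):
    """Return the same bytes write_output would write, via an in-memory handle."""
    handle = io.BytesIO()
    write_output(data, handle)
    return handle.getvalue()
```

Because the second function goes through the first, both necessarily produce identical data, which matches the correction made in this comment.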
(In reply to comment #15) > Getting back to your original points, > > (1) Two functions write() and write_to_string() > This follows the reportlab API, and they do actually return different > encodings. From a backwards compatibility argument they should both stay, but > that doesn't stop us providing a unified method and deprecating > write_to_string(). I've filed Bug 2718 for the general issue of method naming for the Bio.Graphics modules output functionality. > (2) Coding style of write() and write_to_string() > I don't have a problem with this - it works, its clear, its easily extended if > ReportLab add more back ends. It doesn't strike me as ugly. Inevitably this > is largely a matter of preference. Leaving this as is - the code itself may end up handled via shared function for all of Bio.Graphics via Bug 2718. > (3) The KeyError exception with invalid arguments. > This is fixed in CVS, for an invalid format argument you now get a ValueError > which is standard python practice. > > (4) renderPM > Fixed in CVS, in that you can now use GenomeDiagram without ReportLab > renderPM and have full functionality except for bitmap output. Given we > don't seem to be able to assume renderPM will be installed and working, this > seems a reasonable solution. If you try and render a bitmap without > renderPM, then you get a MissingExternalDependencyError exception asking you > to install renderPM. We will need to look into this further for the > documentation. Marking this bug as FIXED. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
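
Point (4) above, keeping everything except bitmap output working when renderPM is missing, might look roughly like this. It is a sketch under assumptions: MissingDependencyError is a local stand-in for Biopython's MissingExternalDependencyError, load_bitmap_renderer is a hypothetical name, and the module path is passed in as a parameter so the behaviour can be tried without ReportLab installed.

```python
import importlib


class MissingDependencyError(Exception):
    """Local stand-in for Biopython's MissingExternalDependencyError."""


def load_bitmap_renderer(module_name="reportlab.graphics.renderPM"):
    """Import the bitmap renderer only when a bitmap is actually requested."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        raise MissingDependencyError(
            "Bitmap output requires %s; please install it." % module_name
        ) from None
```

Deferring the import to the point of use is what lets the vector formats (PDF, PS/EPS, SVG) keep working on installations where renderPM is absent or broken.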
From bugzilla-daemon at portal.open-bio.org Tue Dec 23 07:55:11 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 23 Dec 2008 07:55:11 -0500
Subject: [Biopython-dev] [Bug 2718] Bio.Graphics and output file formats (PDF, EPS, SVG, and bitmaps)
In-Reply-To:
Message-ID: <200812231255.mBNCtB1L017851@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2718

------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-12-23 07:55 EST -------
Example script showing the reportlab render modules producing output given a
filename, handle, or via a string:

from reportlab.pdfgen.canvas import Canvas
from reportlab.lib.units import cm
from reportlab.graphics import renderPS, renderPDF, renderPM
from reportlab.graphics.shapes import Drawing, String

width = 10*cm
height = 2*cm

print "Using canvas directly (PDF only)..."
c = Canvas("hello1.pdf", pagesize=(width, height))
c.drawString(1*cm, 1*cm, "Hello World!")
c.showPage()
c.save()

#Create very simple drawing object,
drawing = Drawing(width, height)
drawing.add(String(1*cm, 1*cm, "Hello World!"))

print "Using filenames..."
renderPDF.drawToFile(drawing, "hello2.pdf")
renderPM.drawToFile(drawing, "hello2.png", "PNG")

print "Using handles..."
handle = open("hello3.pdf","w")
renderPDF.drawToFile(drawing, handle)
handle.close()
handle = open("hello3.ps","w")
renderPS.drawToFile(drawing, handle)
handle.close()
handle = open("hello3.png","w")
renderPM.drawToFile(drawing, handle, "PNG")
handle.close()

print "Using strings..."
handle = open("hello4.pdf","w")
handle.write(renderPDF.drawToString(drawing))
handle.close()
handle = open("hello4.ps","w")
handle.write(renderPS.drawToString(drawing))
handle.close()
handle = open("hello4.png","w")
handle.write(renderPM.drawToString(drawing, "PNG"))
handle.close()

print "Done"

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Tue Dec 23 08:14:06 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 23 Dec 2008 08:14:06 -0500
Subject: [Biopython-dev] [Bug 2718] Bio.Graphics and output file formats (PDF, EPS, SVG, and bitmaps)
In-Reply-To:
Message-ID: <200812231314.mBNDE64X019775@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2718

------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-12-23 08:14 EST -------
(In reply to comment #0)
> (1) All the Bio.Graphics "write to file/handle" functions to accept any of the
> supported file formats (like Bio.Graphics.GenomeDiagram), which would require
> renderPM at run time for the bitmap formats (see Bug 2710). They should share
> some code for mapping format names to ReportLab rendering module. This would
> be easy to do without changing the existing mix of method names.

In addition, I notice that Bio.Graphics.BasicChromosome,
Bio.Graphics.Comparative and Bio.Graphics.Distribution expect lower case
formats (currently just pdf and eps) while Bio.Graphics.GenomeDiagram expects
upper case. We should be consistent, which for backwards compatibility would
mean accepting either case.

> (2) Update the docstrings for the "write to file/handle" functions to make it
> clear they can accept a filename OR a handle (a result of the underlying
> reportlab renderer's drawToFile function's behaviour - see note below).

I've updated the docstrings in CVS, Bio/Graphics/BasicChromosome.py revision 1.3 Bio/Graphics/Comparative.py revision 1.2 Bio/Graphics/Distribution.py revision 1.3 Bio/Graphics/GenomeDiagram/Diagram.py revision 1.3 -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From mjldehoon at yahoo.com Wed Dec 24 05:52:48 2008 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Wed, 24 Dec 2008 02:52:48 -0800 (PST) Subject: [Biopython-dev] Rethinking Biopython's testing framework In-Reply-To: <442447.52362.qm@web62407.mail.re1.yahoo.com> Message-ID: <451304.38587.qm@web62407.mail.re1.yahoo.com> Hi everybody, How about the following for Biopython tests: For Python's unittest-style test modules, Python's unittest documentation recommends to define a function in each test module that returns the test suite. Most Biopython tests that use the unittest framework already do this (the function is called "testing_suite". We could now do the following in run_tests.py: 1) import the testing module and save its output 2) try to call module.testing_suite 3) if it exists, then we're using Python's unittest framework. So we run the tests in the testing suite. 4) if it does not exist, then we're using the print-and-compare approach. So we compare the saved output from the test to the correct output. I think that this can be set up such that it looks like nothing has changed for the user, while the files containing the correct output are no longer needed for the unittest-based tests. Questions, comments, objections, anybody? --Michiel. 
--- On Thu, 12/4/08, Michiel de Hoon wrote: > From: Michiel de Hoon > Subject: Re: [Biopython-dev] Rethinking Biopython's testing framework > To: "Brad Chapman" , "Peter" > Cc: biopython-dev at lists.open-bio.org > Date: Thursday, December 4, 2008, 7:32 AM > > Michiel de Hoon wrote: > > > If one of the sub-tests fails, Python's unit > > > testing framework will tell us so, > > > though (perhaps) not exactly which sub-test > fails. > > > However, that is easy to > > > figure out just by running the individual test > script > > > by itself. > > > > That won't always work. Consider intermittent > network > > problems, or tests using random data - in general it > > really is worthwhile having run_tests.py report a > little > > more than just which test_XXX.py module failed. > > > I wonder if Python's unit testing framework allows us > to capture exactly which sub-test fails. I'll look into > that. Ideally, it should be possible to have regular Python > unit tests and Biopython-style print-and-compare tests side > by side, and get information about failing sub-tests for > both. > > --Michiel. 
From dalloliogm at gmail.com Thu Dec 25 14:22:04 2008 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Thu, 25 Dec 2008 20:22:04 +0100 Subject: [Biopython-dev] Rethinking Biopython's testing framework In-Reply-To: <451304.38587.qm@web62407.mail.re1.yahoo.com> References: <442447.52362.qm@web62407.mail.re1.yahoo.com> <451304.38587.qm@web62407.mail.re1.yahoo.com> Message-ID: <5aa3b3570812251122s43352380ke843c167e85569b5@mail.gmail.com> On Wed, Dec 24, 2008 at 11:52 AM, Michiel de Hoon wrote: > Hi everybody, > > How about the following for Biopython tests: > > For Python's unittest-style test modules, Python's unittest documentation recommends to define a function in each test module that returns the test suite. Most Biopython tests that use the unittest framework already do this (the function is called "testing_suite". Merry Christmas! Some people suggested me the nose python framework: - http://somethingaboutorange.com/mrl/projects/nose/ It is used by many other open source projects, like sqlalchemy and elixir. I haven't tried it but I think it does more or less everything you said automatically, we could try to adopt it. > > We could now do the following in run_tests.py: > > 1) import the testing module and save its output > 2) try to call module.testing_suite > 3) if it exists, then we're using Python's unittest framework. So we run the tests in the testing suite. > 4) if it does not exist, then we're using the print-and-compare approach. So we compare the saved output from the test to the correct output. > > I think that this can be set up such that it looks like nothing has changed for the user, while the files containing the correct output are no longer needed for the unittest-based tests. > > Questions, comments, objections, anybody? > > --Michiel. 
-- My blog on bioinformatics (now in English): http://bioinfoblog.it From mjldehoon at yahoo.com Fri Dec 26 09:32:02 2008 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 26 Dec 2008 06:32:02 -0800 (PST) Subject: [Biopython-dev] Rethinking Biopython's testing framework In-Reply-To: <5aa3b3570812251122s43352380ke843c167e85569b5@mail.gmail.com> Message-ID: <726361.18977.qm@web62402.mail.re1.yahoo.com> --- On Thu, 12/25/08, Giovanni Marco Dall'Olio wrote: > Some people suggested me the nose python framework: > - http://somethingaboutorange.com/mrl/projects/nose/ > > It is used by many other open source projects, like > sqlalchemy and elixir. > I haven't tried it but I think it does more or less > everything you > said automatically, we could try to adopt it. If we use nose, does that mean adding another dependency to Biopython? If so, I don't think it's worth it. If not, how does this work? --Michiel. 
From dalloliogm at gmail.com Fri Dec 26 12:52:58 2008 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Fri, 26 Dec 2008 18:52:58 +0100 Subject: [Biopython-dev] Rethinking Biopython's testing framework In-Reply-To: <726361.18977.qm@web62402.mail.re1.yahoo.com> References: <5aa3b3570812251122s43352380ke843c167e85569b5@mail.gmail.com> <726361.18977.qm@web62402.mail.re1.yahoo.com> Message-ID: <5aa3b3570812260952s5cc5fcc9k71f3e8c3a988e63c@mail.gmail.com> On Fri, Dec 26, 2008 at 3:32 PM, Michiel de Hoon wrote: > --- On Thu, 12/25/08, Giovanni Marco Dall'Olio wrote: >> Some people suggested me the nose python framework: >> - http://somethingaboutorange.com/mrl/projects/nose/ >> >> It is used by many other open source projects, like >> sqlalchemy and elixir. >> I haven't tried it but I think it does more or less >> everything you >> said automatically, we could try to adopt it. > > If we use nose, does that mean adding another dependency to Biopython? If so, I don't think it's worth it. If not, how does this work? nose is a testing framework, so it is a dependency only for developers. I have been able to install sqlalchemy and elixir (projects that make use of nose) without having to install this framework first. The docs on nose's website can explain its usage better than me. Basically, you have to install nose (easy_install nose) and then run it as a shell command (nosetests). It automatically reads all the files in the current directory and subdirectories, collects all the methods/classes/etc whose name begins or ends with 'test_' (_test), plus any unittest, and execute them. It can also read doctests, it is possible to write plugins and apply an high degree of customization. I tried to run it over the latest biopython cvs, and it already highlighted some problems (a few modules still using Martel, etc). 
I forgot to say that this project is also hosted on google/code:
- http://code.google.com/p/python-nose/

You can find more information in the docs:
- http://code.google.com/p/python-nose/wiki/FindingAndRunningTests

p.p.s. Even if it was a dependency, I think it is worth using it anyway, rather than rewriting existing code.

> --Michiel.
>
>
>

--
My blog on bioinformatics (now in English): http://bioinfoblog.it

From mjldehoon at yahoo.com Fri Dec 26 16:40:57 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 26 Dec 2008 13:40:57 -0800 (PST)
Subject: [Biopython-dev] Rethinking Biopython's testing framework
In-Reply-To: <5aa3b3570812260952s5cc5fcc9k71f3e8c3a988e63c@mail.gmail.com>
Message-ID: <590227.1906.qm@web62402.mail.re1.yahoo.com>

--- On Fri, 12/26/08, Giovanni Marco Dall'Olio wrote:
> > If we use nose, does that mean adding another
> dependency to Biopython? If so, I don't think it's
> worth it. If not, how does this work?
>
> nose is a testing framework, so it is a dependency only for
> developers.

If we use nose, can our users still run the Biopython tests (without having to install nose first)?

--Michiel.

From dalloliogm at gmail.com Sat Dec 27 03:48:09 2008
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Sat, 27 Dec 2008 09:48:09 +0100
Subject: [Biopython-dev] Rethinking Biopython's testing framework
In-Reply-To: <590227.1906.qm@web62402.mail.re1.yahoo.com>
References: <5aa3b3570812260952s5cc5fcc9k71f3e8c3a988e63c@mail.gmail.com>
 <590227.1906.qm@web62402.mail.re1.yahoo.com>
Message-ID: <5aa3b3570812270048x20c10c52h25c8e30a29697a45@mail.gmail.com>

On Fri, Dec 26, 2008 at 10:40 PM, Michiel de Hoon wrote:
> --- On Fri, 12/26/08, Giovanni Marco Dall'Olio wrote:
>> > If we use nose, does that mean adding another
>> dependency to Biopython? If so, I don't think it's
>> worth it. If not, how does this work?
>>
>> nose is a testing framework, so it is a dependency only for
>> developers.
>
> If we use nose, can our users still run the Biopython tests (without having to install nose first)?

Yes, but they will have to do it manually, or with a wrapper script (as it is now).
Basically, we will have to move every test into functions/classes with names beginning with 'test_'.
To be more precise, they should match the regular expression '(?:^|[b_.-])[Tt]est' (it is also possible to customize this regex).

So, if a test currently looks like this:

if __name__ == '__main__':
    seq = Seq('sadasda')
    assert seq.tostring() == 'sadasda'

we will have to refactor it like this:

def _test():
    """test description"""
    seq = Seq('sadasda')
    assert seq.tostring() == 'sadasda'

if __name__ == '__main__':
    _test() # this is optional

> --Michiel.
>
>
>

--
My blog on bioinformatics (now in English): http://bioinfoblog.it

From mjldehoon at yahoo.com Sun Dec 28 11:04:14 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sun, 28 Dec 2008 08:04:14 -0800 (PST)
Subject: [Biopython-dev] Rethinking Biopython's testing framework
In-Reply-To: <5aa3b3570812270048x20c10c52h25c8e30a29697a45@mail.gmail.com>
Message-ID: <877679.6134.qm@web62406.mail.re1.yahoo.com>

--- On Sat, 12/27/08, Giovanni Marco Dall'Olio wrote:
> >> > If we use nose, does that mean adding another
> >> > dependency to Biopython? If so, I don't think
> >> > it's worth it. If not, how does this work?
> >>
> >> nose is a testing framework, so it is a dependency
> >> only for developers.
> >
> > If we use nose, can our users still run the Biopython
> tests (without having to install nose first)?
>
> Yes, but they will have to do it manually, or with a
> wrapper script (as it is now).

By manually, do you mean running each test separately by hand?
If we use a wrapper script, then what is the difference between using nose and using Python's unittest framework?

--Michiel.

From biopython at maubp.freeserve.co.uk Sun Dec 28 11:51:58 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sun, 28 Dec 2008 16:51:58 +0000 Subject: [Biopython-dev] Rethinking Biopython's testing framework In-Reply-To: <451304.38587.qm@web62407.mail.re1.yahoo.com> References: <442447.52362.qm@web62407.mail.re1.yahoo.com> <451304.38587.qm@web62407.mail.re1.yahoo.com> Message-ID: <320fb6e00812280851y32450bb9le505ae257726f497@mail.gmail.com> On Wed, Dec 24, 2008 at 10:52 AM, Michiel de Hoon wrote: > > Hi everybody, > > How about the following for Biopython tests: > > For Python's unittest-style test modules, Python's unittest documentation > recommends to define a function in each test module that returns the > test suite. Most Biopython tests that use the unittest framework already > do this (the function is called "testing_suite". > > We could now do the following in run_tests.py: > > 1) import the testing module and save its output > 2) try to call module.testing_suite > 3) if it exists, then we're using Python's unittest framework. > So we run the tests in the testing suite. > 4) if it does not exist, then we're using the print-and-compare > approach. So we compare the saved output from the test to the correct output. > > I think that this can be set up such that it looks like nothing has > changed for the user, while the files containing the correct > output are no longer needed for the unittest-based tests. > > Questions, comments, objections, anybody? Sounds good to me - and doesn't add any new dependencies either. 
Peter

From dalloliogm at gmail.com Sun Dec 28 16:11:59 2008
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Sun, 28 Dec 2008 22:11:59 +0100
Subject: [Biopython-dev] Rethinking Biopython's testing framework
In-Reply-To: <877679.6134.qm@web62406.mail.re1.yahoo.com>
References: <5aa3b3570812270048x20c10c52h25c8e30a29697a45@mail.gmail.com>
 <877679.6134.qm@web62406.mail.re1.yahoo.com>
Message-ID: <5aa3b3570812281311t466e61bp99af198e918737d8@mail.gmail.com>

On Sun, Dec 28, 2008 at 5:04 PM, Michiel de Hoon wrote:
> --- On Sat, 12/27/08, Giovanni Marco Dall'Olio wrote:
>> >> > If we use nose, does that mean adding another
>> >> > dependency to Biopython? If so, I don't think
>> >> > it's worth it. If not, how does this work?
>> >>
>> >> nose is a testing framework, so it is a dependency
>> >> only for developers.
>> >
>> > If we use nose, can our users still run the Biopython
>> tests (without having to install nose first)?
>>
>> Yes, but they will have to do it manually, or with a
>> wrapper script (as it is now).

> If we use a wrapper script, then what is the difference between using nose and using Python's unittest framework?

The wrapper script won't be as efficient as using nose.
Writing a separate wrapper script will take a lot of time and it will be very difficult to keep up to date; moreover, you will have to test the wrapper script itself, to prove that it works and doesn't alter the results of the tests.

Nose is not a replacement for unittest: it is a tool that searches for every unittest and every script that looks like a test, and executes it.
It has a few more advantages, for example it enables global methods for setUp and tearDown, but it is not necessary to use them.

If you want to reorganize the biopython's testing infrastructure, then you should think about adopting a serious testing environment, whether it is nose or something else. You can't continue on relying on wrapper scripts, they are too difficult to maintain and they are not really scientifically valid.

The pygr project (another bioinformatics library in Python) makes use of nose, and they explain how in their documentation:
- http://bioinformatics.ucla.edu/pygr_0_7_b3/testing-doc.html

Please have a look at the pages I have posted before.

> By manually, do you mean running each test separately by hand?

I mean they will have to be run in the same way as they are now.

Maybe there is a way to use nose itself to create a wrapper script automatically.
In fact, what nose does is find all the functions that look like tests, and then execute them. It should be possible to just save the statements that are executed in a log file, which can then be used as a wrapper script.
If this option doesn't exist yet, we can just propose it to nose's developers.

In brief, I think it doesn't make sense to write a new testing framework just for biopython, when there are many already existing tools available and free to use.

> --Michiel.
>
>
>

--
My blog on bioinformatics (now in English): http://bioinfoblog.it

From biopython at maubp.freeserve.co.uk Sun Dec 28 19:18:22 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 29 Dec 2008 00:18:22 +0000
Subject: [Biopython-dev] Rethinking Biopython's testing framework
In-Reply-To: <5aa3b3570812281311t466e61bp99af198e918737d8@mail.gmail.com>
References: <5aa3b3570812270048x20c10c52h25c8e30a29697a45@mail.gmail.com>
 <877679.6134.qm@web62406.mail.re1.yahoo.com>
 <5aa3b3570812281311t466e61bp99af198e918737d8@mail.gmail.com>
Message-ID: <320fb6e00812281618r7ae4899g5aa1f1634bd1b217@mail.gmail.com>

Giovanni wrote:
>> nose is a testing framework, so it is a dependency
>> only for developers.

Requiring another external dependency does count against using nose - it is much nicer if anyone installing Biopython from source can run our test suite without having to install anything further.

Giovanni wrote: > If you want to reorganize the biopython's testing infrastructure, then > you should think about adopting a serious testing environment, whether > it is nose or something else. You can't continue on relying on wrapper > scripts, they are too difficult to mantain and they are not really > scientifically valid. I'm not sure I understand your point here (especially re difficult to maintain and not scientifically valid). I'm failry happy with the current test framework - I would rather see any effort be spent on writing more tests under the current framework than switching the framework itself. Giovanni wrote: > In brief, I think it doesn't make sense to write a new testingg > framework just for biopython, when there are many already existing > tool available and free to use. We haven't been talking about writing a new test frame work (which I agree isn't a good idea). Rather we're talking about a modification to the existing Biopython test framework (part of which uses the built in python unittest library). Michiel's proposal on 24th Dec seems like it will simplify working with unittest based tests (especially not having to track their trivial output in CVS/SVN). Peter From dalloliogm at gmail.com Mon Dec 29 04:53:51 2008 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Mon, 29 Dec 2008 10:53:51 +0100 Subject: [Biopython-dev] Rethinking Biopython's testing framework In-Reply-To: <320fb6e00812281618r7ae4899g5aa1f1634bd1b217@mail.gmail.com> References: <5aa3b3570812270048x20c10c52h25c8e30a29697a45@mail.gmail.com> <877679.6134.qm@web62406.mail.re1.yahoo.com> <5aa3b3570812281311t466e61bp99af198e918737d8@mail.gmail.com> <320fb6e00812281618r7ae4899g5aa1f1634bd1b217@mail.gmail.com> Message-ID: <5aa3b3570812290153k43e24a63nc0f27c90891adf7d@mail.gmail.com> On Mon, Dec 29, 2008 at 1:18 AM, Peter wrote: > Giovanni wrote: >>> nose is a testing framework, so it is a dependency >>> only for developers. 
> > Requiring another external dependency does count against using nose - > it is much nicer if anyone installing Biopython from source can run > our test suite without having to install anything further. As I was saying before, it will not be a dependency. It's an external tool that you can choose to use, or not, to execute the tests automatically. Also, it is not a replacement for unittest. It is comparable to using epydoc for the documentation. > Giovanni wrote: >> If you want to reorganize Biopython's testing infrastructure, then >> you should think about adopting a serious testing environment, whether >> it is nose or something else. You can't keep relying on wrapper >> scripts; they are too difficult to maintain and they are not really >> scientifically valid. > > I'm not sure I understand your point here (especially re difficult to > maintain and not scientifically valid). > The wrapper script itself is a program. Therefore, if you want to be paranoid, you will have to test it too :) It will be difficult to maintain because you will have to modify it every time to adapt it to new tests, etc. Many big open source Python projects make use of nose, and it has already been proven to work correctly; so the quality of Biopython would be comparable with those existing projects. Another project that makes use of nose is PyTables (an HDF5 format wrapper for Python). They say they have some billions of tests :). > I'm fairly happy with the current test framework - I would rather see > any effort spent on writing more tests under the current framework > than on switching the framework itself. > > Giovanni wrote: >> In brief, I think it doesn't make sense to write a new testing >> framework just for Biopython when there are many existing >> tools available and free to use. > > We haven't been talking about writing a new test framework (which I > agree isn't a good idea).
Rather we're talking about a modification > to the existing Biopython test framework (part of which uses the > built-in Python unittest library). Michiel's proposal on 24th Dec seems > like it will simplify working with unittest-based tests (especially > not having to track their trivial output in CVS/SVN). Then you will have to develop a way to execute only some of the tests (e.g. only those that don't use an internet connection, or only those that use a database). You will need to write some way of running setUp and tearDown methods globally. You will have to verify that your wrapper script works. In short, you will end up writing a tool very similar to nose. So, since this tool already exists, you will save a lot of time by using it. Michiel's proposal is good, but I am saying that there are already tools that do the same thing automatically. > > Peter > -- My blog on bioinformatics (now in English): http://bioinfoblog.it From biopython at maubp.freeserve.co.uk Mon Dec 29 13:21:33 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 29 Dec 2008 18:21:33 +0000 Subject: [Biopython-dev] Rethinking Biopython's testing framework In-Reply-To: <5aa3b3570812290153k43e24a63nc0f27c90891adf7d@mail.gmail.com> References: <5aa3b3570812270048x20c10c52h25c8e30a29697a45@mail.gmail.com> <877679.6134.qm@web62406.mail.re1.yahoo.com> <5aa3b3570812281311t466e61bp99af198e918737d8@mail.gmail.com> <320fb6e00812281618r7ae4899g5aa1f1634bd1b217@mail.gmail.com> <5aa3b3570812290153k43e24a63nc0f27c90891adf7d@mail.gmail.com> Message-ID: <320fb6e00812291021n297af797scaf7fd6ba1a7b048@mail.gmail.com> >> We haven't been talking about writing a new test framework (which I >> agree isn't a good idea). Rather we're talking about a modification >> to the existing Biopython test framework (part of which uses the >> built-in Python unittest library).
Michiel's proposal on 24th Dec seems >> like it will simplify working with unittest based tests (especially >> not having to track their trivial output in CVS/SVN). > > Then you will have to develop a way to execute only some of the tests > (e.g. only those who doesn't make use of internet connection, or only > those who make use of a database). ... We already have that in place and working for our current framework. > ... Michel's proposal is good, but I am saying that there are already > tools that do the same thing automatically. Well, let's go with Michiel's plan in the short term (a modification to the current Biopython test framework, see his email of 24th December). We will then have a clear divide into two styles of unit test: (1) Those where the output is captured and compared to the expected output (which will also be in CVS). These are easy to write as essentially any example Biopython script can be used. (2) Those using the python unittest framework. I think these are more complicated and require a bit more effort and thought to write (and debug), but make it very clear what exactly is being tested. Peter From mjldehoon at yahoo.com Tue Dec 30 05:06:08 2008 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Tue, 30 Dec 2008 02:06:08 -0800 (PST) Subject: [Biopython-dev] Rethinking Biopython's testing framework In-Reply-To: <5aa3b3570812270048x20c10c52h25c8e30a29697a45@mail.gmail.com> Message-ID: <620107.65178.qm@web62401.mail.re1.yahoo.com> --- On Sat, 12/27/08, Giovanni Marco Dall'Olio wrote: > Basically, we will have to move every test in > functions/classes with > names beginning with 'test_'. To be more precise, > they should match > the regular expression '(?:^|[b_.-])[Tt]est' (it is > also possible to > coustomize this regex). 
>
> So, if a test now is like this:
>
> if __name__ == '__main__':
>     seq = Seq('sadasda')
>     assert seq.tostring() == 'sadasda'
>
> we will have to refactor it like this:
>
> def _test():
>     """test description"""
>     seq = Seq('sadasda')
>     assert seq.tostring() == 'sadasda'
>
> if __name__ == '__main__':
>     _test() # this is optional

Probably I don't quite understand how nose works, but if we refactor the code in this way, is that sufficient to enable users to use nose if they want to? If so, it may be possible to write the test scripts in a nose-compliant way as a courtesy to nose users. The only problem I can see with this is that it will be difficult to maintain. Basically every new test will have to be written in this nose-compliant way, and users are likely to be unaware of this.

--Michiel

From dalloliogm at gmail.com Tue Dec 30 08:53:34 2008
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Tue, 30 Dec 2008 14:53:34 +0100
Subject: [Biopython-dev] Rethinking Biopython's testing framework
In-Reply-To: <620107.65178.qm@web62401.mail.re1.yahoo.com>
References: <5aa3b3570812270048x20c10c52h25c8e30a29697a45@mail.gmail.com> <620107.65178.qm@web62401.mail.re1.yahoo.com>
Message-ID: <5aa3b3570812300553v74c48cd1x66c1b7280a3f3319@mail.gmail.com>

On Tue, Dec 30, 2008 at 11:06 AM, Michiel de Hoon wrote:
>
>
>
> --- On Sat, 12/27/08, Giovanni Marco Dall'Olio wrote:
>> Basically, we will have to move every test in
>> functions/classes with
>> names beginning with 'test_'. To be more precise,
>> they should match
>> the regular expression '(?:^|[b_.-])[Tt]est' (it is
>> also possible to
>> customize this regex).
>>
>> So, if a test now is like this:
>>
>> if __name__ == '__main__':
>>     seq = Seq('sadasda')
>>     assert seq.tostring() == 'sadasda'
>>
>> we will have to refactor it like this:
>>
>> def _test():
>>     """test description"""
>>     seq = Seq('sadasda')
>>     assert seq.tostring() == 'sadasda'
>>
>> if __name__ == '__main__':
>>     _test() # this is optional
>
> Probably I don't quite understand how nose works, but if we refactor the code in this way, is that sufficient to enable users to use nose if they want to? If so, it may be possible to write the test scripts in a nose-compliant way as a courtesy to nose users. The only problem I can see with this is that it will be difficult to maintain. Basically every new test will have to be written in this nose-compliant way, and users are likely to be unaware of this.

Why do you find it difficult?
You just have to rename every test to make sure that its name starts or ends with 'test_'. That's all.
If you want to reorganize Biopython's testing framework, this is a good thing to do anyway.

In particular, every test function/class/script name should match the regular expression '(?:^|[b_.-])[Tt]est' (it can be customized). unittest modules and doctests will be recognized, too.

Note that nose already works if you run it over Biopython's CVS; but since I am not familiar with Biopython's code, I am not sure it recognizes every test.

Ehm, the example that I gave won't work with the default settings :/ it expects 'test_module' or something like this (anyway, the regex can be customized).
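To make the naming rule above concrete, here is a minimal sketch that checks a few names against the pattern exactly as it is quoted in this thread. It assumes the pattern is applied with search semantics (a match anywhere in the name); the helper name `looks_like_test` is hypothetical and is not part of nose, and nose may apply further rules beyond this pattern.

```python
import re

# Pattern copied verbatim from the message above; nose allows it to be customized.
TEST_NAME = re.compile(r"(?:^|[b_.-])[Tt]est")


def looks_like_test(name):
    """Return True if the quoted pattern matches somewhere in `name`.

    Hypothetical helper for illustration only; assumes search semantics.
    """
    return TEST_NAME.search(name) is not None


if __name__ == "__main__":
    # Names taken from this thread, plus two that should not match.
    for name in ["test_Cluster", "test_GenomeDiagram", "run_tests", "setup", "Seq"]:
        print(name, looks_like_test(name))
```

Under that assumption, a name such as run_tests also matches, because the pattern accepts "test" after an underscore as well as at the start of the name.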
> --Michiel > > > > -- My blog on bioinformatics (now in English): http://bioinfoblog.it From biopython at maubp.freeserve.co.uk Tue Dec 30 12:29:06 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 30 Dec 2008 17:29:06 +0000 Subject: [Biopython-dev] Rethinking Biopython's testing framework In-Reply-To: <5aa3b3570812300553v74c48cd1x66c1b7280a3f3319@mail.gmail.com> References: <5aa3b3570812270048x20c10c52h25c8e30a29697a45@mail.gmail.com> <620107.65178.qm@web62401.mail.re1.yahoo.com> <5aa3b3570812300553v74c48cd1x66c1b7280a3f3319@mail.gmail.com> Message-ID: <320fb6e00812300929j7fa767c7xce138912ae07d480@mail.gmail.com> > You just have to rename every test to make sure that its name starts > or end with 'test_'. That's all. > If you want to reorganize biopython's testing framework, this is a > good thing to do anyway. All the individual Biopython test scripts are named test_*.py anyway, so that should be fine. Those test scripts were we have to verify the output probably won't work in nose (this is handled via our run_test.py framework), but the rest of our test scripts being unittest based might already be fine with nose. Peter From dalloliogm at gmail.com Tue Dec 30 13:34:15 2008 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Tue, 30 Dec 2008 19:34:15 +0100 Subject: [Biopython-dev] Rethinking Biopython's testing framework In-Reply-To: <320fb6e00812300929j7fa767c7xce138912ae07d480@mail.gmail.com> References: <5aa3b3570812270048x20c10c52h25c8e30a29697a45@mail.gmail.com> <620107.65178.qm@web62401.mail.re1.yahoo.com> <5aa3b3570812300553v74c48cd1x66c1b7280a3f3319@mail.gmail.com> <320fb6e00812300929j7fa767c7xce138912ae07d480@mail.gmail.com> Message-ID: <5aa3b3570812301034i5c007d92k17a8e55c61b5715@mail.gmail.com> On Tue, Dec 30, 2008 at 6:29 PM, Peter wrote: >> You just have to rename every test to make sure that its name starts >> or end with 'test_'. That's all. 
>> If you want to reorganize biopython's testing framework, this is a >> good thing to do anyway. > > All the individual Biopython test scripts are named test_*.py anyway, > so that should be fine. Those test scripts were we have to verify the > output probably won't work in nose (this is handled via our > run_test.py framework), but the rest of our test scripts being > unittest based might already be fine with nose. I think it executes also the run_test.py scripts, because its name matches that regular expression. > Peter > -- My blog on bioinformatics (now in English): http://bioinfoblog.it From dalloliogm at gmail.com Tue Dec 30 13:34:45 2008 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Tue, 30 Dec 2008 19:34:45 +0100 Subject: [Biopython-dev] Rethinking Biopython's testing framework In-Reply-To: <320fb6e00811280309w7b5f0fc6m38795c4dc61c8744@mail.gmail.com> References: <20081125144041.GC83220@sobchak.mgh.harvard.edu> <45956.75241.qm@web62406.mail.re1.yahoo.com> <320fb6e00811280309w7b5f0fc6m38795c4dc61c8744@mail.gmail.com> Message-ID: <5aa3b3570812301034r3633ebe0k937e33c731e69ccd@mail.gmail.com> On Fri, Nov 28, 2008 at 12:09 PM, Peter wrote: > Brad wrote: >> Agreed with the distinction between the unit tests and the "dump >> lots of text and compare" approach. I've written both and do think >> the unit testing/assertion model is more robust since you can go >> back and actually get some insight into what someone was thinking >> when they wrote an assertion. > > I have probably written more of the "dump lots of text and compare" > style tests. I think these have a number of advantages: > (1) Easier for beginneers to write a test, you can almost take any > example script and use that. You don't have to learn the unit test > framework. I agree with what you say, but I think that all the 'dump and compare' tests should be organized in various functions. 
This will make them easier to use and understand, and they will be compatible with the nose framework. > (2) Debugging a failing test in IDLE is much easier - using unit tests > you have all that framework between you and the local scope where the > error happens. > (3) For many broad tests, manually setting up the expected output for > an assert is extremely tedious (e.g. parsing sequences and checking > their checksums). This is an interesting discussion if you want to talk about it a bit. An advantage of unittest is its pair of setUp and tearDown methods (fixtures). With those, you are sure that all the tests are run in the right environment and that all variables are dropped before executing a new test. Also, if you want to do a lot of dump and compare tests, consider writing some big doctest scripts. It will require a bit more work to write them, but they will be easier to understand, and they will also become good tutorials for the users. This is a tutorial we wrote for a small project not related to Biopython: - http://github.com/cswegger/datamatrix/tree/master/tutorial.txt As you can see, the text is both a tutorial and a test set (which makes use of a dump and compare approach) for the program. > We could discuss a modification to run_tests.py so that if there is no > expected output file output/test_XXX for test_XXX.py we just run > test_XXX.py and check its return value (I think Michiel had previously > suggested something like this). I think this should be done inside the test itself. All the tests should return only a boolean value (passed or not) and a description of the error. The tests that make use of an expected output file should open it and do the comparison themselves, not in run_tests.py. > Perhaps for more robustness, capture > the output and compare it to a predefined list of regular expressions > covering the typical outputs.
For example, looking at > output/test_Cluster, the first line is the test name, but rest follows > the patten "test_... ok". I imaging only a few output styles exist. mmm have you changed this file in the cvs recently? I can't find what you are referring to. > With such a change, half the unit test's (e.g. test_Cluster.py) > wouldn't need their output file in CVS (output/test_Cluster). > > Michiel de Hoon wrote: >> If one of the sub-tests fails, Python's unit testing framework will tell us so, >> though (perhaps) not exactly which sub-test fails. However, that is easy to >> figure out just by running the individual test script by itself. > > That won't always work. Consider intermittent network problems, or > tests using random data - in general it really is worthwhile having > run_tests.py report a little more than just which test_XXX.py module > failed. > > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > -- My blog on bioinformatics (now in English): http://bioinfoblog.it From biopython at maubp.freeserve.co.uk Tue Dec 30 18:33:16 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 30 Dec 2008 23:33:16 +0000 Subject: [Biopython-dev] Rethinking Biopython's testing framework In-Reply-To: <5aa3b3570812301034r3633ebe0k937e33c731e69ccd@mail.gmail.com> References: <20081125144041.GC83220@sobchak.mgh.harvard.edu> <45956.75241.qm@web62406.mail.re1.yahoo.com> <320fb6e00811280309w7b5f0fc6m38795c4dc61c8744@mail.gmail.com> <5aa3b3570812301034r3633ebe0k937e33c731e69ccd@mail.gmail.com> Message-ID: <320fb6e00812301533h55f5e9eehcec69cc1d5913420@mail.gmail.com> Brad wrote: >>> Agreed with the distinction between the unit tests and the "dump >>> lots of text and compare" approach. 
I've written both and do think >>> the unit testing/assertion model is more robust since you can go >>> back and actually get some insight into what someone was thinking >>> when they wrote an assertion. Peter wrote: >> I have probably written more of the "dump lots of text and compare" >> style tests. I think these have a number of advantages: >> (1) Easier for beginners to write a test, you can almost take any >> example script and use that. You don't have to learn the unit test >> framework. >> ... Giovanni wrote: > I agree with what you say, but I think that all the 'dump and compare' > tests should be organized in various functions. > This will make them easier to use and understand, and they will be > compatible with the nose framework. If we organise the "dump and compare" tests into various functions (e.g. using the unittest framework), and turn print statements into asserts etc, then yes they would become nose compatible. However, this is a lot of work for relatively little gain. Also, by doing so we lose the simplicity (e.g. my points made earlier) and make it harder for newcomers to write further tests. Nevertheless, we could regard Michiel's plan of 24 Dec as a step towards this, in that it simplifies writing unittest-based tests (they won't need an expected output file which must also be kept in CVS/SVN). I'm not sure what you meant by "This will make them easier to use and understand, ...". Switching the unit test coding style makes no difference from the end user's point of view: they run the test suite using "python setup.py test" (typically as part of installation from source) or, from the tests directory, using "python run_tests.py", and won't see any difference in how the tests work internally.
In terms of understanding the unit tests: If you are a beginner wanting to look at a unit test to get a feel for how to use the code, then frankly those of our unit tests which simply do some imports and print some output are MUCH easier to understand. By their nature they are essentially example Biopython scripts. On the other hand, those of our unit tests using the unittest framework have all these classes defined, and split up the setup/clean up into separate methods etc. In some senses this is "clutter" which is not helpful if you want to regard the unit test also as a usage example. >> (2) Debugging a failing test in IDLE is much easier - using unit tests >> you have all that framework between you and the local scope where the >> error happens. > >> (3) For many broad tests, manually setting up the expected output for >> an assert is extremely tedious (e.g. parsing sequences and checking >> their checksums). > > This is an interesting discussion if you want to talk about it a bit. It could be, but I don't want to get sidetracked from pressing ahead with Michiel's plan (the email of 24th Dec, or something similar) which seems to be a worthwhile small improvement to the current status. > An advantage of unittest is its pair of setUp and tearDown methods (fixtures). > With those, you are sure that all the tests are run in the right > environment and that all variables are dropped before executing a new > test. For some tests, yes, this is useful - in particular where there are lots of independent small things you want to test. In other situations you want to test a workflow, with a series of cumulative steps each building on the last. This would end up as a single large test function/method. > Also, if you want to do a lot of dump and compare tests, consider > writing some big doctest scripts.
> It will require a bit more of work to write them, but they will be > easier to understand, and they will also become good tutorials for the > users. Certainly some of the current simple "dump and compare" tests might be converted into doctests (and we could do this within the current Biopython framework). However, the requirements for good documentation and good test coverage differ - you'd want to include tests for atypical code which you would not want to encourage as good coding practice. I'm quite keen for further usage of doctests - but I see them primarily as an improvement to our documentation. Peter wrote: >> We could discuss a modification to run_tests.py so that if there is no >> expected output file output/test_XXX for test_XXX.py we just run >> test_XXX.py and check its return value (I think Michiel had previously >> suggested something like this). Note that Michiel's email of 24th Dec is another approach to this topic - either would work, but his plan makes the division between the two test types much more explicit. Giovanni wrote: > I think this should be done inside the test itself. > All the tests should return only a boolean value (passed or not) and a > description of the error. > The tests that make use of an expected output file, they should open > it and do the comparison by theirselves, not in run_tests.py. Your plan would work, but it means the simplicity of this style of unit test is lost. Rather than doing this change (which would be a moderate amount of tedious work), I would rather go all the way and make them unittest based like the rest of our test suite. >> Perhaps for more robustness, capture >> the output and compare it to a predefined list of regular expressions >> covering the typical outputs. For example, looking at >> output/test_Cluster, the first line is the test name, but rest follows >> the patten "test_... ok". I imaging only a few output styles exist. >> With such a change, half the unit test's (e.g. 
test_Cluster.py) >> wouldn't need their output file in CVS (output/test_Cluster). > > mmm have you changed this file in the cvs recently? I can't find what > you are referring to. For this example, the unit test Tests/test_Cluster.py is here: http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Tests/test_Cluster.py?cvsroot=biopython Its expected output file Tests/output/test_Cluster is here: http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Tests/output/test_Cluster?cvsroot=biopython Peter From bsouthey at gmail.com Mon Dec 1 02:37:05 2008 From: bsouthey at gmail.com (Bruce Southey) Date: Sun, 30 Nov 2008 20:37:05 -0600 Subject: [Biopython-dev] Deprecation and removal policy In-Reply-To: <320fb6e00811280926v16454fa6t891fcc74e4fa4729@mail.gmail.com> References: <320fb6e00811280926v16454fa6t891fcc74e4fa4729@mail.gmail.com> Message-ID: On Fri, Nov 28, 2008 at 11:26 AM, Peter wrote: > Back on 27 June 2008, in preparation for what became Biopython 1.47, > Michiel wrote: >> In recent releases, we have been using the rule of thumb to remove all >> modules from a new Biopython release that were deprecated two >> releases ago. > > I was thinking that when we made releases about six months apart, this > rule of thumb effectively gave a year's warning. Recently we've made > releases roughly every three months, which translates to only about > six months' warning, so I think we should be a little more restrained > in removing deprecated code in future. > > As an example, Bio.EUtils was deprecated in favour of Bio.Entrez in > Release 1.48 (Sept 2008). Under the old rule of thumb, we could > remove this module from CVS now (as the deprecation was present in > Biopython 1.48 and 1.49). If we release Biopython 1.50 in January or > February 2009 (for the sake of argument), that means the deprecation > would have been in place for only four or five months - which seems > too rash.
> > How about a new policy that after adding a deprecation warning, > deprecated modules/functions are kept for at least two public releases > AND at least 12 months (counting from the first release when they are > deprecated - not the date of the CVS change) before being removed? > > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > Hi, Generally I would agree with idea for code that is under active development. For certain code that has not really been touched for a few years except for trivial changes (like removing string functions), I think 12 months is perhaps too long if it passes two releases. Regardless of how it is done, Python 3 will need to be supported (the final release is due soon) and I do not see a reason to port depreciated modules or functions just because of some policy. So I would add the provision that depreciated code will not be ported to the Python 3 compatible Biopython branch. Bruce
From bugzilla-daemon at portal.open-bio.org Mon Dec 1 15:40:22 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 1 Dec 2008 10:40:22 -0500 Subject: [Biopython-dev] [Bug 2677] BioSQL seqfeature enhancements In-Reply-To: Message-ID: <200812011540.mB1FeMWx004105@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2677 ------- Comment #10 from biopython-bugzilla at maubp.freeserve.co.uk 2008-12-01 10:40 EST ------- Bio.Graphics.GenomeDiagram.Utilities ==================================== This is a collection of utilities for getting information useful for graph values. From the docstring, o apply_to_window (sequence, window_size, function, step=None) Apply a passed function to fragments of the passed sequence of size window_size, with each window separated by the passed step.
o calc_gc_content (sequence) Returns the %GC content of a passed sequence o calc_at_content (sequence) Returns the %AT content of a passed sequence o calc_gc_skew (sequence) Returns the GC skew of a passed sequence o calc_at_skew (sequence) Returns the AT skew of a passed sequence o gc_content (sequence, window_size, step=None) Returns the %GC content of a passed sequence in windows of the passed size, separated by the passed step size o at_content (sequence, window_size, step=None) Returns the %AT content of a passed sequence in windows of the passed size, separated by the passed step size o gc_skew (sequence, window_size, step=None) Returns the GC skew of a passed sequence in windows of the passed size, separated by the passed step size o at_skew (sequence, window_size, step=None) Returns the AT skew of a passed sequence in windows of the passed size, separated by the passed step size I can see why these were useful when GenomeDiagram was a separate package, but I don't think we should add this file to Biopython as it is unnecessary code duplication. If we do lack any of this functionality, putting it somewhere under Bio.SeqUtils makes more sense than under Bio.Graphics. I have not looked at any implications this may have for the existing documentation or the GenomeDiagram unit test. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Mon Dec 1 15:47:01 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 1 Dec 2008 10:47:01 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main Biopython distribution
In-Reply-To:
Message-ID: <200812011547.mB1Fl1qY004683@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671

------- Comment #12 from biopython-bugzilla at maubp.freeserve.co.uk 2008-12-01 10:47 EST -------
Bio.Graphics.GenomeDiagram.DrawAll
==================================
According to the comments, this is a script to walk a directory structure below the directory passed, and draw images of each .gbk file found there.

While useful, I don't think this belongs in the core library. Maybe rename it and move it into our scripts or example directory instead...

Bio.Graphics.GenomeDiagram.Utilities
====================================
This is a collection of utilities for getting information useful for graph values. From the docstring,

o apply_to_window (sequence, window_size, function, step=None)
  Apply a passed function to fragments of the passed sequence of size window_size, with each window separated by the passed step.

o calc_gc_content (sequence)
  Returns the %GC content of a passed sequence

o calc_at_content (sequence)
  Returns the %AT content of a passed sequence

o calc_gc_skew (sequence)
  Returns the GC skew of a passed sequence

o calc_at_skew (sequence)
  Returns the AT skew of a passed sequence

o gc_content (sequence, window_size, step=None)
  Returns the %GC content of a passed sequence in windows of the passed size, separated by the passed step size

o at_content (sequence, window_size, step=None)
  Returns the %AT content of a passed sequence in windows of the passed size, separated by the passed step size

o gc_skew (sequence, window_size, step=None)
  Returns the GC skew of a passed sequence in windows of the passed size, separated by the passed step size

o at_skew (sequence, window_size, step=None)
  Returns the AT skew of a passed sequence in windows of the passed size, separated by the passed step size

I can see why these were useful when GenomeDiagram was a separate package, but I don't think we should add this file to Biopython as it is unnecessary code duplication. If we do lack any of this functionality, putting it somewhere under Bio.SeqUtils makes more sense than under Bio.Graphics.

I have not looked at any implications this may have for the existing documentation or the GenomeDiagram unit test.

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.
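For readers unfamiliar with the windowed calculations listed in that docstring, the following plain-Python sketch shows the general idea. It is not the GenomeDiagram code: the function names merely mirror the docstring, and the non-overlapping default step, the (start, value) return format, and the (G - C) / (G + C) definition of skew are assumptions made for illustration.

```python
def calc_gc_content(sequence):
    """Percentage of G and C characters in `sequence` (0.0 if it is empty)."""
    seq = sequence.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def calc_gc_skew(sequence):
    """GC skew, taken here as (G - C) / (G + C); 0.0 if there is no G or C."""
    seq = sequence.upper()
    g, c = seq.count("G"), seq.count("C")
    if g + c == 0:
        return 0.0
    return (g - c) / float(g + c)


def apply_to_window(sequence, window_size, function, step=None):
    """Apply `function` to each full window; return (start, value) pairs."""
    if step is None:
        step = window_size  # assumption: windows do not overlap by default
    results = []
    for start in range(0, len(sequence) - window_size + 1, step):
        window = sequence[start:start + window_size]
        results.append((start, function(window)))
    return results


def gc_content(sequence, window_size, step=None):
    """Windowed %GC content, built from the two helpers above."""
    return apply_to_window(sequence, window_size, calc_gc_content, step)


if __name__ == "__main__":
    print(gc_content("GGCCAATT", 4))
    print(apply_to_window("GGGCAAAA", 4, calc_gc_skew))
```

Keeping the per-window statistic separate from the windowing loop means a new statistic only needs one small function, which is also why such helpers would sit naturally beside other sequence utilities.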
From bugzilla-daemon at portal.open-bio.org Mon Dec 1 15:49:14 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 1 Dec 2008 10:49:14 -0500 Subject: [Biopython-dev] [Bug 2677] BioSQL seqfeature enhancements In-Reply-To: Message-ID: <200812011549.mB1FnEB8004888@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2677 ------- Comment #11 from biopython-bugzilla at maubp.freeserve.co.uk 2008-12-01 10:49 EST ------- (In reply to comment #10) > Bio.Graphics.GenomeDiagram.Utilities > ==================================== > This is a collection of utilities for getting information useful for graph > values. From the docstring, ... Sorry - ignore this comment, it should have been on Bug 2671. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Dec 1 15:51:19 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 1 Dec 2008 10:51:19 -0500 Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main Biopython distribution In-Reply-To: Message-ID: <200812011551.mB1FpJNU005019@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2671 ------- Comment #13 from lpritc at scri.sari.ac.uk 2008-12-01 10:51 EST ------- (In reply to comment #11) > Unit Test > ========= > The unit test included, test_GenomeDiagram.py adds yet another GenBank file to > the test suite, NC_005213.gb (Nanoarchaeum equitans, 490885 bp) which at 1.2 MB > is best avoided. I would prefer we used existing GenBank files already > included in Biopython which would serve just as well. That's a good idea. > Also, the code to parse the GenBank file does so via Bio.GenBank, and I would > prefer to use Bio.SeqIO here. 
I noticed that in revising the documentation, but hadn't got around to doing anything about it, except in the example code. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is. From bugzilla-daemon at portal.open-bio.org Mon Dec 1 15:59:35 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 1 Dec 2008 10:59:35 -0500 Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main Biopython distribution In-Reply-To: Message-ID: <200812011559.mB1FxZwH005670@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2671 ------- Comment #14 from lpritc at scri.sari.ac.uk 2008-12-01 10:59 EST ------- (In reply to comment #12) > Bio.Graphics.GenomeDiagram.DrawAll > ================================== > According to the comments, this is a script to walk a directory structure below > the directory passed, and draw images of each .gbk file found there. > > While useful, I don't think this belongs in the core library. Maybe rename it > and move it into our scripts or example directory instead... Ah. I thought I'd left that one out. I was picturing perhaps having a Utilities.py module containing a function with that behaviour, and/or functions that drew a standard representation of a GenBank file, so that those who are not interested in the minutiae of the API/drawing their diagrams could still get a fair amount of function for little effort. On reflection, these functions are perhaps better suited to living in __init__.py. What do you think? 
> Bio.Graphics.GenomeDiagram.Utilities > ==================================== > This is a collection of utilities for getting information useful for graph > values, > I can see why these were useful when GenomeDiagram was a separate package, but > I don't think we should add this file to Biopython as it is unnecessary code > duplication. If we do lack any of this functionality, putting it somewhere > under Bio.SeqUtils makes more sense than under Bio.Graphics. Where there is repetition of function here, I'm happy to go with established Biopython code in preference. For graph data, GenomeDiagram expects a list of (position, value) tuples, which the functions in Utilities.py supply directly. There will be a level of user-processing required in moving to the Biopython versions. Perhaps the inclusion of similar functions in __init__ that wrap the Biopython versions to produce the appropriate format for graphs would be useful here? > I have not looked at any implications this may have for the existing > documentation or the GenomeDiagram unit test. Removing Utilities.py outright will affect both the documentation and the unit test. Both require those functions (or something similar) to generate test/example graph data. I would be happy to replace the existing functions with wrapped Biopython functions in __init__ - does this seem like a sensible option? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is. 
From bugzilla-daemon at portal.open-bio.org Mon Dec 1 16:59:50 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 1 Dec 2008 11:59:50 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main Biopython distribution
In-Reply-To:
Message-ID: <200812011659.mB1GxoGa009013@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1063 is|0                           |1
           obsolete|                            |
Attachment #1121 is|0                           |1
           obsolete|                            |

------- Comment #15 from biopython-bugzilla at maubp.freeserve.co.uk 2008-12-01 11:59 EST -------
Created an attachment (id=1132)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1132&action=view)
Zip of python files to go under Bio/Graphics/GenomeDiagram

This attachment is just the main python files, omitting DrawAll.py and
Utilities.py (see comment 12 and comment 14). The unit test needs updating to
match (but then passes, updated version to follow).

(In reply to comment #0)
> Code for wx widgets has been removed, although the Observer/Observable code
> remains, allowing user widgets to hook into the code, if that's desirable.

There was a tiny bit of wx stuff still there in Diagram.py which I have removed
in this version.

After discussion with Leighton directly, due to possible uncertainty over the
licensing of the Observer/Observable code (originally based on an example by
Peter Norvig) this has been removed, together with the associated "set" methods
in Diagram.py etc. This code was intended to assist using GenomeDiagram within
a GUI.

Note that if we later want to reintroduce this functionality, using python's
property feature (with get/set functions) would allow the set function to
update the observer.
Leighton's old code would only update the observer if the set method was used explicitly (and not if the object property were updated directly). (In reply to comment #6) > I am perfectly happy with re-licensing the GD code under the Biopython > license. If you need a gpg-signed document to say so, I can provide one ;) I've updated the header of each file to reflect the Biopython license. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is. From bugzilla-daemon at portal.open-bio.org Mon Dec 1 17:20:57 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 1 Dec 2008 12:20:57 -0500 Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main Biopython distribution In-Reply-To: Message-ID: <200812011720.mB1HKvIJ010157@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2671 ------- Comment #16 from biopython-bugzilla at maubp.freeserve.co.uk 2008-12-01 12:20 EST ------- Created an attachment (id=1133) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1133&action=view) Revised test_GenomeDiagram.py This uses the existing GenBank/arab1.gb file for input. It also includes a (slightly modified) copy of the GenomeDiagram.Utilities functions as a short term solution to the issues raised in comment 12 and comment 14. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is. 
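The property-based approach mentioned in comment #15 could look roughly like the sketch below. This is not the removed GenomeDiagram Observer/Observable code; the `Track` class, its `attach` method and the callback signature are hypothetical and exist only for illustration. The decorator form of `property` shown here is modern Python syntax. The point it illustrates is that a plain attribute assignment goes through the setter, so observers are notified even when no explicit "set" method is called.

```python
class Track:
    """Toy object whose `name` attribute notifies observers on change."""

    def __init__(self, name):
        self._name = name
        self._observers = []

    def attach(self, callback):
        """Register a callable taking (obj, attribute_name, new_value)."""
        self._observers.append(callback)

    @property
    def name(self):
        return self._name

    @name.setter
    def name(self, value):
        # Runs on every `track.name = ...`, not only on an explicit
        # set_name() call, so observers cannot be bypassed by accident.
        self._name = value
        for callback in self._observers:
            callback(self, "name", value)
```

A GUI layer could register a redraw function with `attach` and would then be called whenever `track.name` is assigned.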
From bugzilla-daemon at portal.open-bio.org Mon Dec 1 20:01:44 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 1 Dec 2008 15:01:44 -0500
Subject: [Biopython-dev] [Bug 2693] New: LogisticRegression convergence criterion is too lenient
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2693

           Summary: LogisticRegression convergence criterion is too lenient
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P3
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: bsouthey at gmail.com

In R and SAS, the example in the code and tutorial provides the following
parameters:
Intercept = 18.9622
x1 = -0.0714
x2 = 0.0444

By default, Bio/LogisticRegression.py defines the following parameters
MAX_ITERATIONS = 500
CONVERGE_THRESHOLD = 0.01

The convergence threshold is too lenient so the iterations terminate before the
expected values are obtained. Using more stringent criteria (CONVERGE_THRESHOLD
= 0.000000001) permits convergence to the R/SAS values provided MAX_ITERATIONS
is greater than 7761 with my system.

MAX_ITERATIONS and CONVERGE_THRESHOLD are fixed within the
Bio/LogisticRegression.py module but should be part of the API for the train
function such as:

def train(xs, ys, update_fn=None, typecode=None, CONVERGE_THRESHOLD = 0.000000001, MAX_ITERATIONS=10000):

Note the algorithm used requires a large number of iterations and the train
function does not display the degree of convergence attained when
MAX_ITERATIONS is exceeded. Jeffrey Whitaker provides Python code using an
alternative algorithm:
http://www.cdc.noaa.gov/people/jeffrey.s.whitaker/python/logistic_regression.py

Furthermore, the update_fn should also pass the previous likelihood or
difference in likelihood so the actual convergence can be seen.
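The API change proposed in this report (threshold and iteration limit as arguments, and a callback that also receives the previous log-likelihood) can be illustrated with a small self-contained sketch. This is not the Bio.LogisticRegression implementation and not the attached patch. It assumes a plain Newton-Raphson update without step halving, written in pure Python, and it assumes the stopping rule is "absolute change in log-likelihood below the threshold". The lowercase parameter names, the helper functions and the `(beta, converged)` return value are all choices made for this illustration.

```python
import math


def _sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def _log_likelihood(beta, rows, ys):
    llh = 0.0
    for row, y in zip(rows, ys):
        z = sum(b * v for b, v in zip(beta, row))
        # y*z - log(1 + exp(z)), with the log term in overflow-safe form
        llh += y * z - (max(z, 0.0) + math.log1p(math.exp(-abs(z))))
    return llh


def train(xs, ys, update_fn=None, converge_threshold=1e-9,
          max_iterations=10000):
    """Fit intercept + coefficients; returns (beta, converged)."""
    rows = [[1.0] + [float(v) for v in x] for x in xs]
    k = len(rows[0])
    beta = [0.0] * k
    old_llh = _log_likelihood(beta, rows, ys)
    for iteration in range(1, max_iterations + 1):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for row, y in zip(rows, ys):
            p = _sigmoid(sum(b * v for b, v in zip(beta, row)))
            for i in range(k):
                grad[i] += (y - p) * row[i]
                for j in range(k):
                    hess[i][j] += p * (1.0 - p) * row[i] * row[j]
        step = _solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        llh = _log_likelihood(beta, rows, ys)
        if update_fn is not None:
            # The callback sees both values, so progress can be displayed.
            update_fn(iteration, old_llh, llh)
        if abs(llh - old_llh) < converge_threshold:
            return beta, True
        old_llh = llh
    return beta, False
```

Returning a flag alongside the coefficients is one way to let the caller see that the iteration limit was reached before convergence; raising a warning would be another.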
Really the update_fn should be more general than this and be able to display
more information but the attached patch provides the previous llh (old_llik).

def show_progress(iteration, old_llh, loglikelihood):
    print "Iteration:", iteration, "Old", old_llh, "Log-likelihood function:", loglikelihood, "Diff:", (old_llh-loglikelihood)

model = LogisticRegression.train(xs, ys, update_fn=show_progress)

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Mon Dec 1 20:03:27 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 1 Dec 2008 15:03:27 -0500
Subject: [Biopython-dev] [Bug 2693] LogisticRegression convergence criterion is too lenient
In-Reply-To:
Message-ID: <200812012003.mB1K3Rqg017974@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2693

------- Comment #1 from bsouthey at gmail.com 2008-12-01 15:03 EST -------
Created an attachment (id=1134)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1134&action=view)
Improvements to LogisticRegression.py

Addresses certain problems with LogisticRegression.py and enhances the module.

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bartek at rezolwenta.eu.org Mon Dec 1 20:53:59 2008
From: bartek at rezolwenta.eu.org (Bartek Wilczynski)
Date: Mon, 1 Dec 2008 21:53:59 +0100
Subject: [Biopython-dev] [BioPython] Refactoring motif analysis code
In-Reply-To: <492ACE38.1090301@gmail.com>
References: <8b34ec180811240651k45c11563p9e3dd18ba128f0ac@mail.gmail.com>
    <492ACE38.1090301@gmail.com>
Message-ID: <8b34ec180812011253p28a08a0bv43cd72369062b39b@mail.gmail.com>

Hi all,

I've done some work regarding the motif analysis in Biopython.
I've done the following stuff:
- refactored the Bio.AlignAce and Bio.MEME to use one common motif object
- Put all of the refactored code in the Bio.Motif directory
- Added more code (from my attic) to do motif comparisons and computing
  thresholds (this was actually written by my colleague Norbert Dojer, but I
  adapted it and I have his permission to contribute the code)
- written a short tutorial on the usage of Bio.Motif (that's where I'd put it).
- Written a basic test suite for the new motif.

I haven't added it to cvs yet, but posted it as an attachment to the
enhancement proposal in bugzilla:
http://bugzilla.open-bio.org/show_bug.cgi?id=2694

I have cvs access, so I can commit the changes myself, but I'd like to wait for
an "OK" from someone more involved in the release process.

Since Giovanni and Bruce have responded to my previous call for comments, I'll
try to answer them below:

On Mon, Nov 24, 2008 at 4:54 PM, Bruce Southey wrote:
>
> Actually I am not that thrilled with the licenses for these packages and
> similar packages because these are free only for academic use. To me this
> clashes with the spirit of an open-sourced project especially a BSD-licensed
> one. But if there is a need for such modules then these modules should be
> included.
>

I have similar feelings about the "academic-use-only" licenses. On the other
hand, since most of the biopython users are in academia, I don't see it
as a big problem. Also, since I don't have any truly open and free replacement
for these programs, I think it's better to keep them. In fact the new Bio.Motif
package provides some methods for motif comparisons, which at least to some
extent can be used as a replacement for the respective functions of CompareACE
and MAST.

As a side note, I think that there is no point in providing parsers for every
single motif finder that comes out, and I don't think that AlignAce and MEME
are the best or the most representative ones.
It just happened that these parsers were written "to scratch someone's itch".
I think that the other functionality (motif searching, comparisons, weblogo)
might be more useful to people.

> While it is only free for academic use, have you seen TAMO?
> *TAMO: a flexible, object-oriented framework for analyzing transcriptional
> regulation using DNA-sequence motifs. *
> Bioinformatics. 2005 Jul 15;21(14):3164-5.
>
>
> http://fraenkel.mit.edu/TAMO/

Yes, I've seen it and I've even recommended it on the biopython mailing list
when there was no replacement in biopython. However, their library is free only
for academia and AFAIK it's not using biopython data structures, so it needs
some work to integrate TAMO if you are using Biopython. Bio.Motif is meant to
provide free software for Motif analysis.

> Well, I am not sure how many used Bio.AlignAce given the Parser.py bug :-)
> Based on the CVS, both have been untouched for about three years.
>

Well, I've not used it myself for a while... I'm no longer doing de-novo motif
discovery. However, it still works so it's potentially useful. I think this is
largely due to the lack of documentation for the Bio.AlignAce and Bio.MEME
tools (partially my fault). Hopefully people will start using this if they read
the tutorial.

> Also, what species are these used for?
> One of the papers of AlignAce indicate that the base composition was set for
> yeast.
>

They're both general purpose, you can set the gc content for alignAce and even
an HMM for MEME.

>
> Personally I would be interested in a general protein motif finding module
> because of my current research. However, I do have a different view with
> respect to the Biopython community as indicated above with the licenses.

Both MEME and AlignAce can be used to find motifs in proteins, but that has
little to do with Bio.Motif, since it does not provide any motif-finding
capabilities by itself.
In general Bio.Motif should be able to deal with protein motifs, but I've never
tested it (I'm mostly using it for DNA motifs), so I'll be happy to help if you
find bugs.

On Mon, Nov 24, 2008 at 4:25 PM, Giovanni Marco Dall'Olio wrote:
>
> I would just like to tell you that I have tried the TAMO framework you
> suggested me, and found it very useful.

Yes, I remember, but the problem is with the TAMO license. I think that the
Motif object might be still useful since it is free, allows to read motifs from
databases like JASPAR to scan sequences and/or compare them with "your" motifs.

> I am not using it anymore because I don't need it, but I remember that I liked:
> - the methods to represent motifs as matrixes of frequencies/occurrences etc..

done

> - the fact that it was easy to create a motif from an alignment of sequences

depending on your definition of easy, it's there

> - the integration it had with this website:
> http://weblogo.berkeley.edu/logo.cgi.

done

> I would suggest you to provide integration with this other web
> service, which enable to plot the difference between two sequence
> logos: http://www.twosamplelogo.org/examples.html.

This I haven't done yet, but I'll try to provide functionality for that
(shouldn't take too long).
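As an illustration of what "representing a motif as a matrix of occurrences or frequencies" built from aligned sequences means, here is a minimal pure-Python sketch. It does not use or describe the Bio.Motif API; `count_matrix` and `frequency_matrix` are hypothetical names, the default alphabet is DNA, and letters outside the alphabet are simply not handled (they raise a KeyError).

```python
def count_matrix(instances, alphabet="ACGT"):
    """Per-position letter counts for equal-length, aligned instances."""
    if not instances:
        return []
    length = len(instances[0])
    if any(len(inst) != length for inst in instances):
        raise ValueError("all instances must have the same length")
    counts = [dict.fromkeys(alphabet, 0) for _ in range(length)]
    for inst in instances:
        for pos, letter in enumerate(inst.upper()):
            counts[pos][letter] += 1
    return counts


def frequency_matrix(counts, pseudocount=0.0):
    """Turn a count matrix into per-position relative frequencies."""
    freqs = []
    for column in counts:
        total = sum(column.values()) + pseudocount * len(column)
        freqs.append({letter: (n + pseudocount) / total
                      for letter, n in column.items()})
    return freqs
```

A non-zero pseudocount avoids zero frequencies, which matters if the matrix is later converted to log-odds scores for scanning sequences.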
-- Bartek Wilczynski ================== Postdoctoral fellow EMBL, Furlong group Meyerhoffstrasse 1, 69012 Heidelberg, Germany tel: +49 6221 387 8433 From dalloliogm at gmail.com Mon Dec 1 21:07:08 2008 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Mon, 1 Dec 2008 22:07:08 +0100 Subject: [Biopython-dev] [BioPython] Refactoring motif analysis code In-Reply-To: <8b34ec180812011253p28a08a0bv43cd72369062b39b@mail.gmail.com> References: <8b34ec180811240651k45c11563p9e3dd18ba128f0ac@mail.gmail.com> <492ACE38.1090301@gmail.com> <8b34ec180812011253p28a08a0bv43cd72369062b39b@mail.gmail.com> Message-ID: <5aa3b3570812011307q710cab78q2fbae061f5dd5eff@mail.gmail.com> On Mon, Dec 1, 2008 at 9:53 PM, Bartek Wilczynski wrote: > On Mon, Nov 24, 2008 at 4:25 PM, Giovanni Marco Dall'Olio > wrote: >> >> I would just like to tell you that I have tried the TAMO framework you >> suggested me, and found it very useful. > > Yes, I remember, but the problem is with the TAMO license. I think > that the Motif object might be still > useful since it is free, allows to read motifs from databases like > JASPAR to scan sequences and/or > compare them with "your" motifs. Thanks for all these changes. I remember that I wrote a mail to TAMO's authors when I was using it. They seemed to be interested in integrating the code with biopython, so maybe the license issue could be superated. It's up to you, whether you want to reimplement all the functions they have or not. 
-- My blog on bioinformatics (now in English): http://bioinfoblog.it From bartek at rezolwenta.eu.org Tue Dec 2 09:39:37 2008 From: bartek at rezolwenta.eu.org (Bartek Wilczynski) Date: Tue, 2 Dec 2008 10:39:37 +0100 Subject: [Biopython-dev] Refactoring motif analysis code In-Reply-To: <8b34ec180812020118t1c5bc551t4b1e241427755517@mail.gmail.com> References: <8b34ec180811240651k45c11563p9e3dd18ba128f0ac@mail.gmail.com> <492ACE38.1090301@gmail.com> <8b34ec180812011253p28a08a0bv43cd72369062b39b@mail.gmail.com> <5aa3b3570812011307q710cab78q2fbae061f5dd5eff@mail.gmail.com> <8b34ec180812020118t1c5bc551t4b1e241427755517@mail.gmail.com> Message-ID: <8b34ec180812020139y18feadf6s5d2ce23ec95b79d1@mail.gmail.com> On Mon, Dec 1, 2008 at 10:07 PM, Giovanni Marco Dall'Olio wrote: > Thanks for all these changes. > I remember that I wrote a mail to TAMO's authors when I was using it. > They seemed to be interested in integrating the code with biopython, > so maybe the license issue could be superated. > It's up to you, whether you want to reimplement all the functions they > have or not. I have to say I haven't done anything yet towards integrating TAMO with biopython. So far, my own code was doing the job for me, and since there was a certain learning curve to get into TAMO, I didn't look closely into it. I've looked more carefully now at it and I have two general thoughts: - There is a number of features in TAMO, for which there is no counterpart in Bio.Motif. Just by looking at module names I've found: - MDscan parser - their own EM motif finding scheme (some kind of EM method) - several motif comparison functions from MotifCompare - a lot of nice little methods for motifs like textLogo, giflogo, etc. - There is quite an overlap between biopython and TAMO. They implemented their own Sequence handling, FASTA Parser, clustering module etc. 
There will be some gruntwork with integrating their code into Biopython
(finding and reconciling the overlaps)

I also have to say, that I'm a bit scared by copyright statements in the TAMO
code, saying it belongs to the Whitehead institute. I don't want to be overly
pessimistic, but the process of releasing this code under biopython license
might be slow.

What I think is the best way to go is to clean up current mess with
Bio.Alignace and Bio.MEME, and then ask people for contributions. If TAMO
developers would be willing to contribute I'll be happy to help with
integration into biopython. It will take some time anyway, so I wouldn't delay
the inclusion of Bio.Motif into Biopython.

cheers
Bartek

--
Bartek Wilczynski
==================
Postdoctoral fellow
EMBL, Furlong group
Meyerhoffstrasse 1,
69012 Heidelberg,
Germany
tel: +49 6221 387 8433

From timothyham at gmail.com Wed Dec 3 00:19:48 2008
From: timothyham at gmail.com (Timothy Ham)
Date: Tue, 2 Dec 2008 16:19:48 -0800
Subject: [Biopython-dev] Parsing malformed genbank files (e.g. VectorNTI)
Message-ID: <632cdbf70812021619i7e652a05nd801dd408ba9aad4@mail.gmail.com>

Hi everyone,

The current biopython GenBank parser dies while parsing VectorNTI generated
files. For example, until recently, BioPython did not accept an empty SOURCE
field. It still does not handle an empty VERSION or ACCESSION fields
(consumer.data.id never gets filled), which is the default for user generated
vector maps via VectorNTI.

Now, it is easy enough to change the GenBank parser to handle malformed genbank
files, (I can submit patches) but the real question becomes:

> Should BioPython handle malformed genbank files at all?

I would like to be practical and say yes, since VectorNTI is a very common,
widely used format, but I wanted to ask the community before submitting my
patches.
Thanks for the great work, Tim From bsouthey at gmail.com Wed Dec 3 02:33:26 2008 From: bsouthey at gmail.com (Bruce Southey) Date: Tue, 2 Dec 2008 20:33:26 -0600 Subject: [Biopython-dev] Refactoring motif analysis code In-Reply-To: <8b34ec180812020139y18feadf6s5d2ce23ec95b79d1@mail.gmail.com> References: <8b34ec180811240651k45c11563p9e3dd18ba128f0ac@mail.gmail.com> <492ACE38.1090301@gmail.com> <8b34ec180812011253p28a08a0bv43cd72369062b39b@mail.gmail.com> <5aa3b3570812011307q710cab78q2fbae061f5dd5eff@mail.gmail.com> <8b34ec180812020118t1c5bc551t4b1e241427755517@mail.gmail.com> <8b34ec180812020139y18feadf6s5d2ce23ec95b79d1@mail.gmail.com> Message-ID: On Tue, Dec 2, 2008 at 3:39 AM, Bartek Wilczynski wrote: > On Mon, Dec 1, 2008 at 10:07 PM, Giovanni Marco Dall'Olio > wrote: > >> Thanks for all these changes. >> I remember that I wrote a mail to TAMO's authors when I was using it. >> They seemed to be interested in integrating the code with biopython, >> so maybe the license issue could be superated. >> It's up to you, whether you want to reimplement all the functions they >> have or not. > > I have to say I haven't done anything yet towards integrating TAMO > with biopython. > So far, my own code was doing the job for me, and since there was a > certain learning curve to get into TAMO, > I didn't look closely into it. I've looked more carefully now at it > and I have two general thoughts: > - There is a number of features in TAMO, for which there is no > counterpart in Bio.Motif. Just by looking at module names I've found: > - MDscan parser > - their own EM motif finding scheme (some kind of EM method) > - several motif comparison functions from MotifCompare > - a lot of nice little methods for motifs like textLogo, giflogo, etc. > - There is quite an overlap between biopython and TAMO. They > implemented their own Sequence handling, FASTA Parser, clustering > module etc. 
There will be some gruntwork with integrating their code
> into Biopython (finding and reconciling the overlaps)
>
> I also have to say, that I'm a bit scared by copyright statements in
> the TAMO code, saying it belongs to the Whitehead institute. I don't
> want to be overly pessimistic, but the process of releasing this code
> under biopython license might be slow.
>
> What I think is the best way to go is to clean up current mess with
> Bio.Alignace and Bio.MEME, and then ask people for contributions.
> If TAMO developers would be willing to contribute I'll be happy to
> help with integration into biopython. It will take some time anyway,
> so I wouldn't delay the inclusion of Bio.Motif into Biopython.
>
> cheers
> Bartek
>
>
>
> --
> Bartek Wilczynski
> ==================
> Postdoctoral fellow
> EMBL, Furlong group
> Meyerhoffstrasse 1,
> 69012 Heidelberg,
> Germany
> tel: +49 6221 387 8433
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>

I would agree that you should ignore TAMO and just focus on developing a
suitable framework to integrate Alignace and MEME as you have indicated. I
would presume that the other motif finding applications will also fit into that
framework.

Unless the TAMO code is under a BSD-style or equivalent license that is
compatible with Biopython you must stop looking at it. I know it is hard to
avoid as it comes up on Google with a simple search. If the TAMO code gets
suitably licensed, then fine but until then it can cause major problems that
can involve the whole Biopython project (even including GPLed code can do
this).
Bruce From biopython at maubp.freeserve.co.uk Wed Dec 3 21:10:49 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 3 Dec 2008 21:10:49 +0000 Subject: [Biopython-dev] Fwd: [Utilities-announce] PubMed Entrez Utility 2009 DTD changes In-Reply-To: <7B6F170840CA6C4DA63EE0C8A7BB43EC03A0001F@NIHCESMLBX15.nih.gov> References: <7B6F170840CA6C4DA63EE0C8A7BB43EC03A0001F@NIHCESMLBX15.nih.gov> Message-ID: <320fb6e00812031310s43124c68n988838af3837638d@mail.gmail.com> This email from the NCBI will be of interest for Bio.Entrez - we may need to add a few DTD files to Bio.Entrez in preparation for this... see also Bug 2678. Peter ---------- Forwarded message ---------- From: Date: Wed, Dec 3, 2008 at 8:57 PM Subject: [Utilities-announce] PubMed Entrez Utility 2009 DTD changes To: utilities-announce at ncbi.nlm.nih.gov PubMed Entrez Utility Users, We anticipate switching to the updated PubMed 2009 DTDs on December 15, 2008. 2009 DTDs are available from the Entrez DTD page: http://eutils.ncbi.nlm.nih.gov/entrez/query/DTD/index.html The DTD changes for the 2009 production year, as noted in the Revision Notes section near the top of each DTD, are: NLMMedline DTD (used for MEDLINE/PubMed) a. Changed entity reference from "nlmmedlinecitation_080101.dtd" to: "nlmmedlinecitation_090101.dtd" b. CHANGE WITHDRAWN FOR V.2: Deleted entity NlmDcmsID.Ref and NlmDcmsID element [Edited 10/16/08] c. FOR V.3: Added GrantCountry.Ref entity [Edited 10/30/08] NLMMedlineCitation DTD (used for MEDLINE/PubMed data) a. Changed entity reference from "nlmsharedcatcit_080101.dtd" to: "nlmsharedcatcit_090101.dtd" b. Moved entity Type to nlmcommon dtd c. Added NLM value to entity Source d. CHANGE WITHDRAWN FOR V.2: Deleted entity NlmDcmsID.Ref [Edited 10/16/08] NLMSharedCatCit DTD (used for MEDLINE/PubMed, CatfilePlus, and Serfile) a. Changed entity reference from "nlmcommon_080101.dtd" to "nlmcommon_090101.dtd" b. 
Moved OtherAbstract element from nlmsharedcatcit dtd to nlmcommon dtd NLMCommon DTD (used for MEDLINE/PubMed, CatfilePlus, and Serfile) a. Added ValidYN attribute to Investigator element b. Moved OtherAbstract element from nlmsharedcatcit to nlmcommon dtd c. Added OtherAbstract element to NCBIArticle element d. Moved entity Type from nlmmedlinecitation to nlmcommon dtd e. Added Publisher value to entity Type f. Deleted Consumer value from entity Type g. Added Country element to Grant element h. FOR V.2: Changed Country value to GrantCountry.Ref in Grant Element [Edited 10/30/08] NLMCatalogRecord DTD (used for CatfilePlus and Serfile in XML format): a. Changed entity reference from "nlmsharedcatcit_080101.dtd" to: "nlmsharedcatcit_090101.dtd" b. Added PrecedingInPart, SupersedesInPart, SucceedingInPart, SupersededInPartBy values to entity TitleType _______________________________________________ Utilities-announce mailing list http://www.ncbi.nlm.nih.gov/mailman/listinfo/utilities-announce From biopython at maubp.freeserve.co.uk Thu Dec 4 10:26:39 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 4 Dec 2008 10:26:39 +0000 Subject: [Biopython-dev] Parsing malformed genbank files (e.g. VectorNTI) In-Reply-To: <632cdbf70812021619i7e652a05nd801dd408ba9aad4@mail.gmail.com> References: <632cdbf70812021619i7e652a05nd801dd408ba9aad4@mail.gmail.com> Message-ID: <320fb6e00812040226g117fe534g4523e8b58f7f28@mail.gmail.com> On Wed, Dec 3, 2008 at 12:19 AM, Timothy Ham wrote: > > Hi everyone, > > The current biopython GenBank parser dies while parsing VectorNTI > generated files. For example, until recently, BioPython did not > accept an empty SOURCE field. It still does not handle an empty > VERSION or ACCESSION fields (consumer.data.id never gets filled), > which is the default for user generated vector maps via VectorNTI. I fixed the SOURCE issue in Bio/GenBank/__init__.py CVS revision 1.97 after Tim contacted me offlist - there was no bug report. 
> Now, it is easy enough to change the GenBank parser to handle > malformed genbank files, (I can submit patches) but the real question > becomes: >> Should BioPython handle malformed genbank files at all? > I would like to be practical and say yes, since VectorNTI is a very > common, widely used format, but I wanted to ask the community before > submitting my patches. > > Thanks for the great work, > Tim As I'm the defacto maintainer for Bio.GenBank, I guess unless the list as a whole has a consensus this is my call. Reading the GenBank file format spec, the ACCESSION and VERSION lines are clearly intended to be mandatory. Note that for mandatory fields, IIRC, the NCBI will use a single dot/period as a place holder when there is no data. So I would argue that VectorNTI is producing invalid files, and you should write to the authors and encourage them to follow the spec more closely (even if we do change Biopython to cope). However, I'm willing to bend a little on out of spec GenBank files (in cases like this where there is no ambiguity about the parsing), but I would want a real example output file from VectorNTI to include for a unit test. This is important as we need to use something sensible for the SeqRecord's id property if the ACCESSION and VERSION are missing. Peter From mjldehoon at yahoo.com Thu Dec 4 12:32:18 2008 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Thu, 4 Dec 2008 04:32:18 -0800 (PST) Subject: [Biopython-dev] Rethinking Biopython's testing framework In-Reply-To: <320fb6e00811280309w7b5f0fc6m38795c4dc61c8744@mail.gmail.com> Message-ID: <442447.52362.qm@web62407.mail.re1.yahoo.com> > Michiel de Hoon wrote: > > If one of the sub-tests fails, Python's unit > > testing framework will tell us so, > > though (perhaps) not exactly which sub-test fails. > > However, that is easy to > > figure out just by running the individual test script > > by itself. > > That won't always work. 
Consider intermittent network > problems, or tests using random data - in general it > really is worthwhile having run_tests.py report a little > more than just which test_XXX.py module failed. > I wonder if Python's unit testing framework allows us to capture exactly which sub-test fails. I'll look into that. Ideally, it should be possible to have regular Python unit tests and Biopython-style print-and-compare tests side by side, and get information about failing sub-tests for both. --Michiel. From bsouthey at gmail.com Thu Dec 4 15:02:13 2008 From: bsouthey at gmail.com (Bruce Southey) Date: Thu, 04 Dec 2008 09:02:13 -0600 Subject: [Biopython-dev] Parsing malformed genbank files (e.g. VectorNTI) In-Reply-To: <320fb6e00812040226g117fe534g4523e8b58f7f28@mail.gmail.com> References: <632cdbf70812021619i7e652a05nd801dd408ba9aad4@mail.gmail.com> <320fb6e00812040226g117fe534g4523e8b58f7f28@mail.gmail.com> Message-ID: <4937F0F5.6070905@gmail.com> Peter wrote: > On Wed, Dec 3, 2008 at 12:19 AM, Timothy Ham wrote: > >> Hi everyone, >> >> The current biopython GenBank parser dies while parsing VectorNTI >> generated files. For example, until recently, BioPython did not >> accept an empty SOURCE field. It still does not handle an empty >> VERSION or ACCESSION fields (consumer.data.id never gets filled), >> which is the default for user generated vector maps via VectorNTI. >> > > I fixed the SOURCE issue in Bio/GenBank/__init__.py CVS revision 1.97 > after Tim contacted me offlist - there was no bug report. > > >> Now, it is easy enough to change the GenBank parser to handle >> malformed genbank files, (I can submit patches) but the real question >> becomes: >> >>> Should BioPython handle malformed genbank files at all? >>> >> I would like to be practical and say yes, since VectorNTI is a very >> common, widely used format, but I wanted to ask the community before >> submitting my patches. 
>> >> Thanks for the great work, >> Tim >> > > As I'm the defacto maintainer for Bio.GenBank, I guess unless the list > as a whole has a consensus this is my call. > > Reading the GenBank file format spec, the ACCESSION and VERSION lines > are clearly intended to be mandatory. Note that for mandatory fields, > IIRC, the NCBI will use a single dot/period as a place holder when > there is no data. So I would argue that VectorNTI is producing > invalid files, and you should write to the authors and encourage them > to follow the spec more closely (even if we do change Biopython to > cope). > > However, I'm willing to bend a little on out of spec GenBank files (in > cases like this where there is no ambiguity about the parsing), but I > would want a real example output file from VectorNTI to include for a > unit test. This is important as we need to use something sensible for > the SeqRecord's id property if the ACCESSION and VERSION are missing. > > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > At http://www.ncbi.nlm.nih.gov/Genbank/index.html there is a link to the 'complete release notes for the current version of GenBank'. From ftp://ftp.ncbi.nih.gov/genbank/gbrel.txt, it clearly states that ACCESSION and VERSION are mandatory and I interpret the '/' to mean 'with'. The relevant section is: 3.4.2 Entry Organization " The second part of each sequence entry record contains the information appropriate to its keyword, in positions 13 to 80 for keywords and positions 11 to 80 for the sequence. The following is a brief description of each entry field. Detailed information about each field may be found in Sections 3.4.4 to 3.4.15. LOCUS - A short mnemonic name for the entry, chosen to suggest the sequence's definition. Mandatory keyword/exactly one record. DEFINITION - A concise description of the sequence. 
Mandatory keyword/one or more records. ACCESSION - The primary accession number is a unique, unchanging identifier assigned to each GenBank sequence record. (Please use this identifier when citing information from GenBank.) Mandatory keyword/one or more records. VERSION - A compound identifier consisting of the primary accession number and a numeric version number associated with the current version of the sequence data in the record. This is followed by an integer key (a "GI") assigned to the sequence by NCBI. " Mandatory keyword/exactly one record. If these entries are missing then Biopython must raise an exception because the GenBank file is invalid. While I have not seen an example, does a VectorNTI output contain the LOCUS field that could be used as an accession number? I think it is fairly common for the accession number to be part of the LOCUS field. Bruce From biopython at maubp.freeserve.co.uk Thu Dec 4 15:16:20 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 4 Dec 2008 15:16:20 +0000 Subject: [Biopython-dev] Parsing malformed genbank files (e.g. VectorNTI) In-Reply-To: <4937F0F5.6070905@gmail.com> References: <632cdbf70812021619i7e652a05nd801dd408ba9aad4@mail.gmail.com> <320fb6e00812040226g117fe534g4523e8b58f7f28@mail.gmail.com> <4937F0F5.6070905@gmail.com> Message-ID: <320fb6e00812040716h1fb4bfbflf5a37456102722cc@mail.gmail.com> On Thu, Dec 4, 2008 at 3:02 PM, Bruce Southey wrote: > Peter wrote: >> Reading the GenBank file format spec, the ACCESSION and VERSION lines >> are clearly intended to be mandatory. Note that for mandatory fields, >> IIRC, the NCBI will use a single dot/period as a place holder when >> there is no data. So I would argue that VectorNTI is producing >> invalid files, and you should write to the authors and encourage them >> to follow the spec more closely (even if we do change Biopython to >> cope).
Bruce wrote: > At http://www.ncbi.nlm.nih.gov/Genbank/index.html there is a link to the > 'complete release notes for the current version of GenBank'. > From ftp://ftp.ncbi.nih.gov/genbank/gbrel.txt, it clearly states that > ACCESSION and VERSION are mandatory ... We agree on this, according to the current NCBI standard, a GenBank file missing the ACCESSION or VERSION line is technically invalid. Bruce: > If these entries are missing then Biopython must raise an exception because > the GenBank file is invalid. I see a difference between a GenBank parser, and a GenBank validator. While it would be nice to just say "your file is invalid", in many cases the meaning of the file isn't ambiguous and can still be safely parsed. From past experience, even the NCBI sometimes provide invalid files which break their own rules (e.g. Biopython Bug 2591). In my personal opinion, a strict parser which rejects any invalid GenBank file isn't actually that useful - there is a grey area where a little leniency is very helpful: Peter wrote: >> However, I'm willing to bend a little on out of spec GenBank files (in >> cases like this where there is no ambiguity about the parsing), but I >> would want a real example output file from VectorNTI to include for a >> unit test. This is important as we need to use something sensible for >> the SeqRecord's id property if the ACCESSION and VERSION are missing. Peter From biopython at maubp.freeserve.co.uk Thu Dec 4 22:15:26 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 4 Dec 2008 22:15:26 +0000 Subject: [Biopython-dev] Parsing malformed genbank files (e.g. 
VectorNTI) In-Reply-To: <632cdbf70812041352oec43f5fh13bd35a1416d0fd2@mail.gmail.com> References: <632cdbf70812021619i7e652a05nd801dd408ba9aad4@mail.gmail.com> <320fb6e00812040226g117fe534g4523e8b58f7f28@mail.gmail.com> <632cdbf70812041352oec43f5fh13bd35a1416d0fd2@mail.gmail.com> Message-ID: <320fb6e00812041415t3fb22630xae4d34205e0562a3@mail.gmail.com> Tim wrote: > I have attached two representative example genbank outputs from > VectorNTI. I don't know if the mailing list accepts attachments, but > if it can't, is there a place where I can put it (maybe the biopython > wiki?) I got them, thanks. For future reference, it would have been better to have filed a bug on bugzilla, and then (once the bug is filed) you can attach files to it. Earlier Tim wrote: >>> The current biopython GenBank parser dies while parsing VectorNTI >>> generated files. For example, until recently, BioPython did not >>> accept an empty SOURCE field. It still does not handle an empty >>> VERSION or ACCESSION fields (consumer.data.id never gets filled), >>> which is the default for user generated vector maps via VectorNTI. Now that I've got your two files, my copy of Biopython seems to read them just fine. What exactly do you mean by the "parser dies"? Could you show us a snippet of code and if relevant the exception error - plus details of your OS, version of Python and Biopython etc? Thanks Peter From timothyham at gmail.com Fri Dec 5 02:09:21 2008 From: timothyham at gmail.com (Timothy Ham) Date: Thu, 4 Dec 2008 18:09:21 -0800 Subject: [Biopython-dev] Parsing malformed genbank files (e.g.
VectorNTI) In-Reply-To: <320fb6e00812041415t3fb22630xae4d34205e0562a3@mail.gmail.com> References: <632cdbf70812021619i7e652a05nd801dd408ba9aad4@mail.gmail.com> <320fb6e00812040226g117fe534g4523e8b58f7f28@mail.gmail.com> <632cdbf70812041352oec43f5fh13bd35a1416d0fd2@mail.gmail.com> <320fb6e00812041415t3fb22630xae4d34205e0562a3@mail.gmail.com> Message-ID: <632cdbf70812041809v1d4ed344q3cc03db3e310b2ab@mail.gmail.com> On Thu, Dec 4, 2008 at 2:15 PM, Peter wrote: > Now that I've got your two files, my copy of Biopython seems to read > them just fine. What exactly do you mean by the "parser dies"? Could > you show us a snippet of code and if relevant the exception error - > plus details of your OS, version of Python and Biopython etc? > > Thanks > > Peter > Ah, my bad. I was running it against an old version. It looks like it was fixed as of /biopython/Bio/GenBank/__init__.py version 1.87 (biopython release 1.48). The current version does the right thing. Thanks much, Tim From biopython at maubp.freeserve.co.uk Fri Dec 5 10:19:12 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 5 Dec 2008 10:19:12 +0000 Subject: [Biopython-dev] Parsing malformed genbank files (e.g. VectorNTI) In-Reply-To: <632cdbf70812041809v1d4ed344q3cc03db3e310b2ab@mail.gmail.com> References: <632cdbf70812021619i7e652a05nd801dd408ba9aad4@mail.gmail.com> <320fb6e00812040226g117fe534g4523e8b58f7f28@mail.gmail.com> <632cdbf70812041352oec43f5fh13bd35a1416d0fd2@mail.gmail.com> <320fb6e00812041415t3fb22630xae4d34205e0562a3@mail.gmail.com> <632cdbf70812041809v1d4ed344q3cc03db3e310b2ab@mail.gmail.com> Message-ID: <320fb6e00812050219k376fdda2r969fe78a547b0ff6@mail.gmail.com> Tim wrote: > Ah, my bad. I was running it against an old version. It looks like it > was fixed as of > /biopython/Bio/GenBank/__init__.py version 1.87 (biopython release 1.48). > The current version does the right thing.
Oh right - that was when I was testing parsing of the slightly non-standard GenBank output from the EMBOSS seqret tool. Anyway, problem solved :) Peter From bugzilla-daemon at portal.open-bio.org Fri Dec 5 11:59:07 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 5 Dec 2008 06:59:07 -0500 Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main Biopython distribution In-Reply-To: Message-ID: <200812051159.mB5Bx7TR009168@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2671 ------- Comment #17 from biopython-bugzilla at maubp.freeserve.co.uk 2008-12-05 06:59 EST ------- (In reply to comment #0) > The default font has been changed to 'Vera', which is shipped with Reportlab, > to avoid some problems with unavailable fonts On my Mac "Vera" doesn't work, and going back to the default of 'Helvetica' seems best on Unix in general. Also, Helvetica is one of the standard fonts which all PDF viewers should be able to render. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is. 
From bugzilla-daemon at portal.open-bio.org Fri Dec 5 16:44:10 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 5 Dec 2008 11:44:10 -0500
Subject: [Biopython-dev] [Bug 2697] New: MaxEntropy calculate function assumes integer values for class and convergence criteria is hard coded
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2697

Summary: MaxEntropy calculate function assumes integer values for class and convergence criteria is hard coded
Product: Biopython
Version: Not Applicable
Platform: PC
OS/Version: Linux
Status: NEW
Severity: minor
Priority: P3
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: bsouthey at gmail.com

The Bio.MaxEntropy.classify() function assumes that the targets are integers starting at zero. However, a model can be trained by using character values. This requires a simple change in a loop in that function.

Also, the convergence criteria are hard coded into the file by the following global definitions:

MAX_IIS_ITERATIONS = 10000 # Maximum iterations for IIS.
IIS_CONVERGE = 1E-5 # Convergence criteria for IIS.
MAX_NEWTON_ITERATIONS = 100 # Maximum iterations on Newton's method.
NEWTON_CONVERGE = 1E-10 # Convergence criteria for Newton's method.

This makes it impossible for the user to specify their own values without changing the actual function. This is changed by passing these values to the train function and subfunctions.

Both of these are fixed in an attached patch.

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
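A minimal sketch of the second point in the report above, assuming a simplified stand-alone Newton solver rather than the actual Bio.MaxEntropy training code. The function newton_solve and its keyword names are hypothetical; only the two constant names come from the report. The idea is that the module-level values become defaults that a caller can override per call:

```python
# Module-level defaults, named after two of the constants quoted in the report.
MAX_NEWTON_ITERATIONS = 100
NEWTON_CONVERGE = 1e-10


def newton_solve(func, deriv, x0,
                 max_iterations=MAX_NEWTON_ITERATIONS,
                 tolerance=NEWTON_CONVERGE):
    """Hypothetical helper: find a root of func by Newton's method.

    The convergence settings are keyword arguments, so a caller can
    change them without editing the module-level constants.
    """
    x = x0
    for _ in range(max_iterations):
        step = func(x) / deriv(x)
        x -= step
        if abs(step) < tolerance:
            return x
    raise RuntimeError("Newton's method did not converge")


# Default settings, then a looser tolerance chosen by the caller.
root = newton_solve(lambda x: x * x - 2, lambda x: 2 * x, 1.0)
rough = newton_solve(lambda x: x * x - 2, lambda x: 2 * x, 1.0, tolerance=1e-2)
```

Existing callers that pass no keywords keep the old behaviour, which is why defaults taken from the constants are a low-risk way to expose such settings.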
From bugzilla-daemon at portal.open-bio.org Fri Dec 5 16:47:15 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 5 Dec 2008 11:47:15 -0500
Subject: [Biopython-dev] [Bug 2697] MaxEntropy calculate function assumes integer values for class and convergence criteria is hard coded
In-Reply-To:
Message-ID: <200812051647.mB5GlFRQ020087@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2697

------- Comment #1 from bsouthey at gmail.com 2008-12-05 11:47 EST -------
Created an attachment (id=1139)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=1139&action=view)
Fixes to MaxEntropy

1) Fixes MaxEntropy.calculate to use the target classes from the data
2) Permits the user to define their own convergence criterion

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Fri Dec 5 16:59:51 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 5 Dec 2008 11:59:51 -0500
Subject: [Biopython-dev] [Bug 2698] New: Attempt at a unit test for MaxEntrophy
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2698

Summary: Attempt at a unit test for MaxEntrophy
Product: Biopython
Version: Not Applicable
Platform: PC
OS/Version: Linux
Status: NEW
Severity: enhancement
Priority: P2
Component: Unit Tests
AssignedTo: biopython-dev at biopython.org
ReportedBy: bsouthey at gmail.com

I used test_LogisticRegression.py to develop a test for MaxEntropy. However, I could not get MaxEntropy to train on that dataset. Indeed I have found it to be very sensitive to both data and functions, making it extremely hard to develop bioinformatics-based data and associated tests. So in the end I generated data based on some of my work.
I trained the model outside the tests because I do not know how to avoid retraining the model for each test. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Dec 5 17:00:29 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 5 Dec 2008 12:00:29 -0500 Subject: [Biopython-dev] [Bug 2698] Attempt at a unit test for MaxEntrophy In-Reply-To: Message-ID: <200812051700.mB5H0Ted022044@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2698 ------- Comment #1 from bsouthey at gmail.com 2008-12-05 12:00 EST ------- Created an attachment (id=1140) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1140&action=view) Test for MaxEntrophy -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From timothyham at gmail.com Thu Dec 4 21:52:33 2008 From: timothyham at gmail.com (Timothy Ham) Date: Thu, 4 Dec 2008 13:52:33 -0800 Subject: [Biopython-dev] Parsing malformed genbank files (e.g. VectorNTI) In-Reply-To: <320fb6e00812040226g117fe534g4523e8b58f7f28@mail.gmail.com> References: <632cdbf70812021619i7e652a05nd801dd408ba9aad4@mail.gmail.com> <320fb6e00812040226g117fe534g4523e8b58f7f28@mail.gmail.com> Message-ID: <632cdbf70812041352oec43f5fh13bd35a1416d0fd2@mail.gmail.com> On Thu, Dec 4, 2008 at 2:26 AM, Peter wrote: > On Wed, Dec 3, 2008 at 12:19 AM, Timothy Ham wrote: >> >> Hi everyone, >> >> The current biopython GenBank parser dies while parsing VectorNTI >> generated files. For example, until recently, BioPython did not >> accept an empty SOURCE field. 
It still does not handle an empty >> VERSION or ACCESSION fields (consumer.data.id never gets filled), >> which is the default for user generated vector maps via VectorNTI. > > I fixed the SOURCE issue in Bio/GenBank/__init__.py CVS revision 1.97 > after Tim contacted me offlist - there was no bug report. > >> Now, it is easy enough to change the GenBank parser to handle >> malformed genbank files, (I can submit patches) but the real question >> becomes: >>> Should BioPython handle malformed genbank files at all? >> I would like to be practical and say yes, since VectorNTI is a very >> common, widely used format, but I wanted to ask the community before >> submitting my patches. >> >> Thanks for the great work, >> Tim > > As I'm the defacto maintainer for Bio.GenBank, I guess unless the list > as a whole has a consensus this is my call. > > Reading the GenBank file format spec, the ACCESSION and VERSION lines > are clearly intended to be mandatory. Note that for mandatory fields, > IIRC, the NCBI will use a single dot/period as a place holder when > there is no data. So I would argue that VectorNTI is producing > invalid files, and you should write to the authors and encourage them > to follow the spec more closely (even if we do change Biopython to > cope). > > However, I'm willing to bend a little on out of spec GenBank files (in > cases like this where there is no ambiguity about the parsing), but I > would want a real example output file from VectorNTI to include for a > unit test. This is important as we need to use something sensible for > the SeqRecord's id property if the ACCESSION and VERSION are missing. > > Peter > I have attached two representative example genbank outputs from VectorNTI. I don't know if the mailing list accepts attachments, but if it can't, is there a place where I can put it (maybe the biopython wiki?) Tim -------------- next part -------------- A non-text attachment was scrubbed... 
Name: vnti_example.zip Type: application/zip Size: 11716 bytes Desc: not available URL: From bugzilla-daemon at portal.open-bio.org Tue Dec 9 14:55:05 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 9 Dec 2008 09:55:05 -0500 Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main Biopython distribution In-Reply-To: Message-ID: <200812091455.mB9Et5iX017478@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2671 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1132|application/octet-stream |text/plain mime type| | Attachment #1132 is|0 |1 patch| | ------- Comment #18 from biopython-bugzilla at maubp.freeserve.co.uk 2008-12-09 09:55 EST ------- (From update of attachment 1132) Checked into CVS (with the font defaulting to Helvetica as discussed with Leighton privately). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is. 
From bugzilla-daemon at portal.open-bio.org Tue Dec 9 14:55:56 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 9 Dec 2008 09:55:56 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main Biopython distribution
In-Reply-To:
Message-ID: <200812091455.mB9Etu7C017584@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1132 is|1                           |0
              patch|                            |
Attachment #1132 is|0                           |1
           obsolete|                            |

------- Comment #19 from biopython-bugzilla at maubp.freeserve.co.uk 2008-12-09 09:55 EST -------
(From update of attachment 1132)
This is now obsolete - checked into CVS (with the font defaulting to Helvetica as discussed with Leighton privately).

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.

From bugzilla-daemon at portal.open-bio.org Tue Dec 9 15:12:56 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 9 Dec 2008 10:12:56 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main Biopython distribution
In-Reply-To:
Message-ID: <200812091512.mB9FCusM019463@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671

------- Comment #20 from biopython-bugzilla at maubp.freeserve.co.uk 2008-12-09 10:12 EST -------
(In reply to comment #12)
>
> Bio.Graphics.GenomeDiagram.Utilities
> ====================================
> This is a collection of utilities for getting information useful for graph
> values. From the docstring,
>
> o apply_to_window (sequence, window_size, function, step=None) Apply a
> passed function to fragments of the passed sequence of
> size window_size, with each window separated by the
> passed step.
This windowing function is rather specific to GenomeDiagram by the nature of how it returns the values and their positions. The handling of the end of the sequence is also non-general. Suppose we put apply_to_window somewhere under Bio.Graphics.GenomeDiagram. It can then be used with any sequence analysis function which takes a sequence/string and returns a float, returning the scores and window positions as expected by GenomeDiagram for drawing graphical tracks. That would leave the following general non-windowed functions from Utilities.py, calc_gc_content - returns a float in the range 0 to 1. calc_at_content - returns a float in the range 0 to 1. calc_gc_skew - returns a float, gives zero if there is no GC content. calc_at_skew - returns a float, gives zero if there is no AT content. Bio.SeqUtils already has several functions including: GC - returns a float in the range 0 to 100 (i.e. 100 times the actual fraction) GC_skew - returns a list of floats using a default window size of 100bp. Gives a floating point exception if there is no GC content in any window. Personally I don't like the fact that the existing GC function returns a number between 0 and 100, but otherwise this code is fine. I don't think the current GC_skew function is intuitive and doesn't cover the non-windowed use-case where you want the GC_skew of the whole sequence passed in. This is important if you want to do your own windowing (e.g. comparing GC skew of individual genes to the whole genome). Because they differ from the existing Bio.SeqUtils code, I think there is a case for adding the four non-windowed functions from GenomeDiagram's Utilities.py under Bio.SeqUtils. Perhaps under a sub module like Bio.SeqUtils.Nucleotides or Bio.SeqUtils.NucUtils? The existing GC functions in Bio.SeqUtils could be deprecated or at least declared obsolete. 
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is. From bugzilla-daemon at portal.open-bio.org Tue Dec 9 15:19:23 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 9 Dec 2008 10:19:23 -0500 Subject: [Biopython-dev] [Bug 2704] New: Parser for the markx10 alignment format Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2704 Summary: Parser for the markx10 alignment format Product: Biopython Version: Not Applicable Platform: All OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: osvaldo.zagordi at bsse.ethz.ch Hi, I recently wrote some code to parse the Emboss alignment format markx10 (format explained at http://emboss.sourceforge.net/docs/themes/AlignFormats.html) Since it is slightly different from the Fasta m10 (not surprising, right?) I had to adapt FastaIO.py. I thought this might eventually be included in biopython. Important: I noticed that if the alignment program exits for some reason and does not close the alignment file with two lines like these #--------------------------------------- #--------------------------------------- bad things can happen (e.g., sucking all the memory of the system)). Could it be that a similar issue applies to FastaIO parser as well? Best, Osvaldo -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Tue Dec 9 15:35:57 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 9 Dec 2008 10:35:57 -0500
Subject: [Biopython-dev] [Bug 2704] Parser for the markx10 alignment format
In-Reply-To:
Message-ID: <200812091535.mB9FZvHG021117@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2704

------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-12-09 10:35 EST -------
This sounds interesting Osvaldo,

Now that you've filed this bug, you should be able to upload the python file (or a patch).

Given EMBOSS's markx10 output is intended to be like FASTA's -m 10 output (but with the addition of EMBOSS style headers and footers), it *might* be nicer to have one parser for both. Right now I don't know how similar EMBOSS's output really is. If we do go for the simpler option of two separate parsers, it would certainly be a good idea in the long run for them to share some code.

(In reply to comment #0)
> Important:
> I noticed that if the alignment program exits for some reason and
> does not close the alignment file with two lines like these
> #---------------------------------------
> #---------------------------------------
> bad things can happen (e.g., sucking all the memory of the system)).
> Could it be that a similar issue applies to FastaIO parser as well?

Does this happen if you create such a file by hand (lacking these lines) and try to read that? If so it should be easier to debug.

Peter

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
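For context, here is a sketch of one plausible way a truncated file can make a line-based parser run away. It assumes the parser reads with handle.readline() and stops only when it sees a "#---" footer line. At end of file readline() returns an empty string, which never matches the footer test, so without an explicit check the loop keeps appending empty strings forever. The function read_until_footer is hypothetical and is not the code attached to this bug:

```python
from io import StringIO


def read_until_footer(handle):
    """Hypothetical helper: collect lines up to a '#---' footer line.

    Raises ValueError if the file ends before the footer is seen,
    instead of looping forever on the empty strings that readline()
    returns at end of file.
    """
    lines = []
    line = handle.readline()
    while not line.startswith("#---"):
        if not line:
            # readline() gives "" at end of file; stop here.
            raise ValueError("Premature end of file, footer line missing")
        lines.append(line.rstrip("\n"))
        line = handle.readline()
    return lines


# A well-formed input ends with the footer line.
complete = read_until_footer(StringIO("seq1\nseq2\n#---\n"))
```

Whether to raise an exception or return quietly at end of file is a design choice; raising makes a truncated alignment file visible to the caller.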
From bugzilla-daemon at portal.open-bio.org Tue Dec 9 15:43:19 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 9 Dec 2008 10:43:19 -0500 Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main Biopython distribution In-Reply-To: Message-ID: <200812091543.mB9FhJfV021598@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2671 ------- Comment #21 from lpritc at scri.sari.ac.uk 2008-12-09 10:43 EST ------- (In reply to comment #20) > (In reply to comment #12) > > > > Bio.Graphics.GenomeDiagram.Utilities > > ==================================== > > This is a collection of utilities for getting information useful for graph > > values. From the docstring, > > > > o apply_to_window (sequence, window_size, function, step=None) Apply a > > passed function to fragments of the passed sequence of > > size window_size, with each window separated by the > > passed step. > > This windowing function is rather specific to GenomeDiagram by the nature of > how it returns the values and their positions. The handling of the end of the > sequence is also non-general. Suppose we put apply_to_window somewhere under > Bio.Graphics.GenomeDiagram. It can then be used with any sequence analysis > function which takes a sequence/string and returns a float, returning the > scores and window positions as expected by GenomeDiagram for drawing graphical > tracks. That seems sensible, to me. I like the generality that would result from it, and it seems like apply_to_window could even be a useful convenience function addition to Bio.SeqUtils in its own right. [...] > Because they differ from the existing Bio.SeqUtils code, I think there is a > case for adding the four non-windowed functions from GenomeDiagram's > Utilities.py under Bio.SeqUtils. Perhaps under a sub module like > Bio.SeqUtils.Nucleotides or Bio.SeqUtils.NucUtils? 
The existing GC functions
> in Bio.SeqUtils could be deprecated or at least declared obsolete.

I think that there's value to be had in standardising to a floating-point 0..1 or -1..1 range for some of these kinds of functions, so I would support such a move on those grounds.

Regarding my GC skew code (and the corresponding AT skew code): the behaviour when there is no GC in the sequence is misleading (read: wrong ;) ). Strictly, a divide-by-zero error would be correct here, but I just lazily went for a zero value for ease of drawing, instead of doing something that properly indicated 'not a number'.

I think that what needs to be done for GenomeDiagram is to modify the graphing code so that it does something appropriate for NaNs (however they may be indicated) - this should perhaps be to stop at the preceding point, and resume at the subsequent point, for line graphs; not to draw a box for the heat map; and not to draw a bar for the bar chart (not that this will always be distinguishable from a zero value...).

The GenomeDiagram GC/AT skew code also needs to be modified to return None or some other NaN indicator before its behaviour can be considered correct. Apologies for propagating those shortcuts - my bad.

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.
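A minimal sketch of the behaviour discussed in this comment, under stated assumptions: gc_skew and windowed_scores below are hypothetical stand-ins, not the GenomeDiagram calc_gc_skew or apply_to_window code, and the default step (non-overlapping windows) and the (midpoint, score) return shape are guesses made for illustration. The skew function returns NaN when a sequence has no G or C, and the drawing side filters those points out rather than plotting a misleading zero:

```python
import math


def gc_skew(seq):
    """Return (G - C) / (G + C) for the whole sequence, in the range -1 to 1.

    Returns NaN when the sequence contains no G or C, rather than zero.
    """
    seq = seq.upper()
    g = seq.count("G")
    c = seq.count("C")
    if g + c == 0:
        return float("nan")
    return (g - c) / float(g + c)


def windowed_scores(seq, window_size, func, step=None):
    """Hypothetical windowing helper: list of (window midpoint, score)."""
    if step is None:
        step = window_size  # assumption: non-overlapping windows by default
    results = []
    for start in range(0, len(seq) - window_size + 1, step):
        fragment = seq[start:start + window_size]
        results.append((start + window_size // 2, func(fragment)))
    return results


scores = windowed_scores("GGCCATAT", 4, gc_skew)
# A plotting routine could skip undefined points instead of drawing zero.
drawable = [(pos, score) for pos, score in scores if not math.isnan(score)]
```

Because the scoring function is passed in, the same windowing helper works for any function that takes a sequence string and returns a float.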
From bugzilla-daemon at portal.open-bio.org Tue Dec 9 16:20:06 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 9 Dec 2008 11:20:06 -0500 Subject: [Biopython-dev] [Bug 2704] Parser for the markx10 alignment format In-Reply-To: Message-ID: <200812091620.mB9GK6Si024603@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2704 ------- Comment #2 from osvaldo.zagordi at bsse.ethz.ch 2008-12-09 11:20 EST ------- Created an attachment (id=1151) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1151&action=view) Class Markx10Iterator for markx10 alignment format Attached a simple example of using the code. Just running simple_test.py should be enough. If you remove the last two lines #------ from tmp_align.needle the program loops sucking more and more memory -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Dec 9 16:20:23 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 9 Dec 2008 11:20:23 -0500 Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main Biopython distribution In-Reply-To: Message-ID: <200812091620.mB9GKNCm024646@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2671 ------- Comment #22 from biopython-bugzilla at maubp.freeserve.co.uk 2008-12-09 11:20 EST ------- (In reply to comment #21) > Regarding my GC skew code (and the corresponding AT skew code): that the > behaviour when there is no GC in the sequence is misleading > (read: wrong ;) ). > Strictly, a divide-by-zero error would be correct here, but I just lazily went > for a zero value for ease of drawing, instead of doing something that properly > indicated 'not a number'. Yeah - you're right. 
Either we just allow the divide by zero to be raised, or return a NaN, maybe
via float("nan") unless there is a better way without getting NumPy involved.

> I think that what needs to be done for GenomeDiagram
> is to modify the graphing code so that it does something appropriate for NaNs
> (however they may be indicated) - this should perhaps be to stop at the
> preceding point, and resume at the subsequent point, for line graphs; not to
> draw a box for the heat map; and not to draw a bar for the bar chart (not that
> this will always be distinguishable from a zero value...).

OK. I can see why just using zero was a nice shortcut here.

> The GenomeDiagram GC/AT skew code also needs to be modified to return None or
> some other NaN indicator before its behaviour can be considered correct.

Or, if we accept that "sequence scoring functions" may raise a divide by zero
error, then apply_to_window should also be able to cope and map this to an
appropriate NaN indicator (e.g. None or float("nan")).

Peter

-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.

From bugzilla-daemon at portal.open-bio.org Tue Dec 9 16:39:27 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 9 Dec 2008 11:39:27 -0500
Subject: [Biopython-dev] [Bug 2704] Parser for the markx10 alignment format
In-Reply-To:
Message-ID: <200812091639.mB9GdRTJ026010@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2704

------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-12-09 11:39 EST -------
(In reply to comment #2)
> If you remove the last two lines #------ from tmp_align.needle the program
> loops sucking more and more memory

You have an infinite loop. Try modifying the bit near line 162 as follows:

    #Now should have the aligned query sequence with flanking region...
    while not (line.startswith(">") or ">>>" in line) and not line.startswith('#'):
        match_seq_parts.append(line.strip())
        line = handle.readline()
        if not line :
            #End of file
            return None

Also, your code is based on an out of date version of Bio/AlignIO/FastaIO.py -
probably from Biopython 1.47, and lacks improvements which may also apply to
the EMBOSS output.

Given the object orientated nature of the current m10 parser, you/we should be
able to subclass it and only override those bits dealing with the header and
footer. This is probably the nicest way forward if we decide to treat the
EMBOSS markx10 format as a new format in Bio.AlignIO.

-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Tue Dec 9 16:59:21 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 9 Dec 2008 11:59:21 -0500
Subject: [Biopython-dev] [Bug 2705] New: Nicer GC and AT content and skew functions
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2705

           Summary: Nicer GC and AT content and skew functions
           Product: Biopython
           Version: Not Applicable
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk

This bug started out as a discussion on Bug 2671, based on some nucleotide
scoring functions in GenomeDiagram which were used for plotting sequence
properties along a sequence using a sliding window. The basic underlying
functions could make a nice addition under Bio.SeqUtils (rather than hiding
them under Bio.Graphics.GenomeDiagram).

In particular, GenomeDiagram's Utilities.py included the following
(non-windowed) nucleotide composition functions:

calc_gc_content - returns a float in the range 0 to 1.
calc_at_content - returns a float in the range 0 to 1.
calc_gc_skew - returns a float [*]
calc_at_skew - returns a float [*]

[*] As discussed on Bug 2671, these currently give zero if there is no GC
content (for the GC skew) or no AT content (for the AT skew), which was a
reasonable shortcut given these functions were originally used for plotting
only. They should instead raise an exception or return None or NaN.

Also, as implemented in GenomeDiagram, these functions do not cope with mixed
case sequences (easily rectified). Also, for GC and AT content these do not
deal with ambiguous nucleotides (where we could follow the existing
Bio.SeqUtils convention).

Bio.SeqUtils already has several related functions including:

GC - returns a float (a percentage in the range 0 to 100)
GC123 - returns a tuple of four floats (percentages between 0 and 100)
GC_skew - returns a list of floats using a default window size of 100bp.
          Gives a floating point exception if there is no GC content in any
          window.

Personally I don't like the fact that the existing GC function returns a
number between 0 and 100 (rather than 0 and 1). Leighton agreed.

I don't think the current GC_skew function is intuitive, and it doesn't cover
the non-windowed use-case where you want the GC_skew of the whole sequence
passed in. This is important if you want to do your own windowing (e.g.
comparing GC skew of individual genes to the whole genome).

Because they differ from the existing Bio.SeqUtils code, I think there is a
case for adding the four non-windowed functions from GenomeDiagram's
Utilities.py under Bio.SeqUtils. Each would take a single argument, a sequence
(coping with a string, Seq object or MutableSeq object). I have no
particularly strong views on the naming of these functions. Perhaps they could
be located under a sub module like Bio.SeqUtils.Nucleotides or
Bio.SeqUtils.NucUtils?

The existing GC functions in Bio.SeqUtils could be deprecated or at least
declared obsolete.
This would also be a good opportunity to explicitly specify what we expect to
get back for the GC content when there are ambiguous nucleotides.

e.g. Following Bio.SeqUtils.GC, only count C, G and S (which means C or G)
(in either case) and divide by the length giving a lower bound. Here
GC("ACGTN") is 40%.

An alternative approach might be to treat an N as 50% GC, and H (which is A,
C or T) as 33.3% GC etc, meaning GC("ACGTN") gives 50%.

The same approach should be used for the AT percentage, for example the
current lower bound approach would count only A, T and W characters (in
either case).

-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Tue Dec 9 17:04:15 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 9 Dec 2008 12:04:15 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main Biopython distribution
In-Reply-To:
Message-ID: <200812091704.mB9H4F9C028063@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671

------- Comment #23 from biopython-bugzilla at maubp.freeserve.co.uk 2008-12-09 12:04 EST -------
I've filed Bug 2705 about adding these nucleotide sequence functions somewhere
under Bio.SeqUtils - this should get more people reading it. Because this bug
(Bug 2671) hasn't been assigned to the dev mailing list, I doubt many people
are aware of it.

For Bio.Graphics.GenomeDiagram we need to ensure the graphics tracks can cope
with NaN/None missing values as outlined by Leighton in comment 21.

-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.
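The two conventions for ambiguous nucleotides proposed in Bug 2705 can be compared with a small sketch. Everything named here (GC_WEIGHT, gc_lower_bound, gc_weighted) is hypothetical and not part of Bio.SeqUtils; the weights simply follow the IUPAC codes, so N counts as one half and H (A, C or T) as one third, since only C among those is a G/C base. Both functions return a fraction between 0 and 1 rather than a percentage.

```python
# Fraction of G/C implied by each IUPAC nucleotide code (illustrative table).
GC_WEIGHT = {
    "G": 1.0, "C": 1.0, "S": 1.0,            # always G or C
    "A": 0.0, "T": 0.0, "U": 0.0, "W": 0.0,  # never G or C
    "R": 0.5, "Y": 0.5, "K": 0.5, "M": 0.5, "N": 0.5,
    "B": 2.0 / 3, "V": 2.0 / 3,              # two of the three options are G/C
    "D": 1.0 / 3, "H": 1.0 / 3,              # one of the three options is G/C
}


def gc_lower_bound(seq):
    """Count only G, C and S (either case), divided by the full length."""
    seq = str(seq).upper()
    if not seq:
        return float("nan")  # undefined for an empty sequence
    return sum(seq.count(letter) for letter in "GCS") / float(len(seq))


def gc_weighted(seq):
    """Weight each ambiguous code by its expected G/C fraction.

    Characters missing from GC_WEIGHT (gaps, for instance) count as zero.
    """
    seq = str(seq).upper()
    if not seq:
        return float("nan")
    return sum(GC_WEIGHT.get(letter, 0.0) for letter in seq) / len(seq)
```

For "ACGTN" the lower bound is 2/5 and the weighted value is 2.5/5, matching the 40% and 50% figures quoted in the bug report. An AT equivalent would swap the table for one built around A, T and W.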
From bugzilla-daemon at portal.open-bio.org Tue Dec 9 17:53:44 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 9 Dec 2008 12:53:44 -0500 Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main Biopython distribution In-Reply-To: Message-ID: <200812091753.mB9Hri42031692@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2671 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1133 is|0 |1 obsolete| | ------- Comment #24 from biopython-bugzilla at maubp.freeserve.co.uk 2008-12-09 12:53 EST ------- (From update of attachment 1133) I've checked something like this into CVS. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is. From bugzilla-daemon at portal.open-bio.org Wed Dec 10 16:46:35 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 10 Dec 2008 11:46:35 -0500 Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main Biopython distribution In-Reply-To: Message-ID: <200812101646.mBAGkZs1003825@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2671 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- BugsThisDependsOn| |2705 ------- Comment #25 from biopython-bugzilla at maubp.freeserve.co.uk 2008-12-10 11:46 EST ------- OK, GenomeDiagram is now in CVS, with some basic tests. Still to do: * Updating the existing GenomeDiagram manual to match (different imports, colour to color), which I think can stay as a separate PDF file. * A short introduction to Bio.Graphics including GenomeDiagram as part of a new chapter in the tutorial? 
* Dealing with Bug 2705 (for the AT and GC content and skew) and the window function to help plot these in GenomeDiagram. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is. From bugzilla-daemon at portal.open-bio.org Wed Dec 10 16:46:38 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 10 Dec 2008 11:46:38 -0500 Subject: [Biopython-dev] [Bug 2705] Nicer GC and AT content and skew functions In-Reply-To: Message-ID: <200812101646.mBAGkcGB003850@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2705 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- OtherBugsDependingO| |2671 nThis| | -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Dec 10 17:16:37 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 10 Dec 2008 12:16:37 -0500 Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main Biopython distribution In-Reply-To: Message-ID: <200812101716.mBAHGbGG006815@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2671 ------- Comment #26 from biopython-bugzilla at maubp.freeserve.co.uk 2008-12-10 12:16 EST ------- We already talked about "colour" vs "color" (UK vs USA), but I've just noticed the use of "centre" vs "center" where again I would prefer we follow computer language norms and take the USA spelling. Also, I'm not sure that the existing colour/color dual support works 100% of the time. 
I had an old script using colour where the feature colours specified
ended up being the default of light green. Using "color" instead of "colour"
in my script worked. I'll try and investigate this later.

-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.

From bugzilla-daemon at portal.open-bio.org Wed Dec 10 17:55:31 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 10 Dec 2008 12:55:31 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main Biopython distribution
In-Reply-To:
Message-ID: <200812101755.mBAHtVJ7009870@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671

------- Comment #27 from biopython-bugzilla at maubp.freeserve.co.uk 2008-12-10 12:55 EST -------
This might be better off as a new enhancement bug, but here is a possible
"arc-box" drawing function to go in the AbstractDrawer.py file, based on the
existing draw_box function.

def draw_arcbox(xcentre, ycentre, inner_radius, outer_radius,
                startangle, endangle, colour=colors.lightgreen,
                border=None, color=None) :
    """Returns a closed path object describing an arced box.

    Expects the angles to be in radians."""
    if color is None:
        color = colour
    if color == colors.white and border is None:  # Force black border on
        strokecolor = colors.black                # white boxes with
    elif border is None:                          # undefined border, else
        strokecolor = color                       # use fill colour
    elif border is not None:
        strokecolor = border
    p = ArcPath(strokeColor=strokecolor,
                fillColor=color,
                strokewidth=0)
    p.addArc(xcentre, ycentre, outer_radius,
             startangle * 180 / pi, endangle * 180 / pi, moveTo=True)
    p.addArc(xcentre, ycentre, inner_radius,
             startangle * 180 / pi, endangle * 180 / pi, reverse=True)
    p.closePath()
    return p

This takes advantage of reportlab's built-in arc approximation code, meaning we
can simplify the CircularDrawer.py method to just something like this:

    def draw_arc(self, inner_radius, outer_radius, startangle, endangle,
                 color, border=None, colour=None):
        #Docstring here
        return draw_arcbox(self.xcentre, self.ycentre,
                           inner_radius, outer_radius,
                           startangle, endangle,
                           colour, border, color)

Alternately, the code could just go in CircularDrawer.py directly.

As far as I can tell from looking at their source code, even ReportLab_1_21_2
has ArcPath defined in reportlab.graphics.shapes so there shouldn't be any
issue here with backwards compatibility.

-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.
From bugzilla-daemon at portal.open-bio.org Thu Dec 11 08:40:23 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 11 Dec 2008 03:40:23 -0500 Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main Biopython distribution In-Reply-To: Message-ID: <200812110840.mBB8eNFs006984@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2671 ------- Comment #28 from lpritc at scri.sari.ac.uk 2008-12-11 03:40 EST ------- (In reply to comment #26) > We already talked about "colour" vs "color" (UK vs USA), but I've just noticed > the use of "centre" vs "center" where again I would prefer we follow computer > language norms and take the USA spelling. > > Also, I'm not sure that the existing colour/color dual support works 100% of > the time. I had an old script using colour where the feature colours specified > ended up being the default of light green. Using "color" instead of "colour" > in my script worked. I'll try and investigate this later. Is this related to my fix in comment #9? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is. From bugzilla-daemon at portal.open-bio.org Thu Dec 11 11:50:17 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 11 Dec 2008 06:50:17 -0500 Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main Biopython distribution In-Reply-To: Message-ID: <200812111150.mBBBoHej030149@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2671 ------- Comment #29 from biopython-bugzilla at maubp.freeserve.co.uk 2008-12-11 06:50 EST ------- (In reply to comment #28) > (In reply to comment #26) > > Also, I'm not sure that the existing colour/color dual support works 100% > > of the time. 
I had an old script using colour where the feature colours > > specified ended up being the default of light green. Using "color" > > instead of "colour" in my script worked. I'll try and investigate this > > later. > > Is this related to my fix in comment #9? Possibly - although I was already using that version of AbstractDrawer.py I've updated CVS to make it clear in the comments that "colour" arguments override "color" arguments (this is required for backwards compatibility with old scripts which would be using "colour"). I also had to fix the FeatureSet's add_feature method to handle the colour/color mapping (this was the root of the problem I had observed in comment 26). I propose that in Biopython 1.50 we support both "colour" and "color", but for Biopython 1.51 we add deprecation warnings when "colour" is used. We should probably do the same thing for "centre" and "center" as well... -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is. From bugzilla-daemon at portal.open-bio.org Thu Dec 11 11:52:41 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 11 Dec 2008 06:52:41 -0500 Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main Biopython distribution In-Reply-To: Message-ID: <200812111152.mBBBqfTQ030413@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2671 ------- Comment #30 from lpritc at scri.sari.ac.uk 2008-12-11 06:52 EST ------- (In reply to comment #29) > > I propose that in Biopython 1.50 we support both "colour" and "color", but for > Biopython 1.51 we add deprecation warnings when "colour" is used. > > We should probably do the same thing for "centre" and "center" as well... > I agree. We should encourage use of the US spelling in the documentation, to catch those new to GD. 
This approach provides a window for conversion of old GD scripts for
previous users, which is a good thing.

-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.

From bugzilla-daemon at portal.open-bio.org Fri Dec 12 16:09:27 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 12 Dec 2008 11:09:27 -0500
Subject: [Biopython-dev] [Bug 2709] New: test_GenomeDiagram fails under Linux
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2709

           Summary: test_GenomeDiagram fails under Linux
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: minor
          Priority: P4
         Component: Unit Tests
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: bsouthey at gmail.com

Under my Linux 64-bit system test_GenomeDiagram fails but the other related
tests 'pass' as reportlab is not available:

test_GenomeDiagram ... ERROR
test_GraphicsChromosome ... skipping. Install reportlab if you want to use Bio.Graphics.
ok
test_GraphicsDistribution ... skipping. Install reportlab if you want to use Bio.Graphics.
ok
test_GraphicsGeneral ... skipping. Install reportlab if you want to use Bio.Graphics.
ok ====================================================================== ERROR: test_GenomeDiagram ---------------------------------------------------------------------- Traceback (most recent call last): File "run_tests.py", line 125, in runTest self.runSafeTest() File "run_tests.py", line 138, in runSafeTest cur_test = __import__(self.test_name) File "test_GenomeDiagram.py", line 21, in raise MissingExternalDependencyError(\ NameError: name 'MissingExternalDependencyError' is not defined ---------------------------------------------------------------------- -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Dec 12 16:25:59 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 12 Dec 2008 11:25:59 -0500 Subject: [Biopython-dev] [Bug 2709] test_GenomeDiagram fails under Linux In-Reply-To: Message-ID: <200812121625.mBCGPxeQ031269@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2709 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-12-12 11:25 EST ------- It was trying to raise MissingExternalDependencyError when reportlab was missing (which would have skipped the test), but MissingExternalDependencyError hadn't been imported. Fixed in test_GenomeDiagram.py CVS revision 1.10 Thanks for reporting this. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
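The failure in Bug 2709 comes down to a guard that raised an exception which had never been imported. A rough sketch of the intended pattern follows. To keep it runnable without Biopython installed, MissingExternalDependencyError is defined locally as a stand-in (an assumption made for this sketch; in a Biopython test it would be imported from the Bio package instead), and require_module is a hypothetical helper name.

```python
import importlib


class MissingExternalDependencyError(Exception):
    """Local stand-in for Biopython's exception of the same name."""


def require_module(name, hint):
    """Import and return a module, or raise MissingExternalDependencyError.

    The test runner can then treat the error as "skip this test" rather
    than as a failure.
    """
    try:
        return importlib.import_module(name)
    except ImportError:
        raise MissingExternalDependencyError(hint)
```

A test module would call something like require_module("reportlab", "Install reportlab if you want to use Bio.Graphics.") before importing anything that depends on it.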
From bugzilla-daemon at portal.open-bio.org Fri Dec 12 16:49:51 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 12 Dec 2008 11:49:51 -0500 Subject: [Biopython-dev] [Bug 2710] New: GenomeDiagram.py unnecessary requires the reportlab addon renderPM Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2710 Summary: GenomeDiagram.py unnecessary requires the reportlab addon renderPM Product: Biopython Version: Not Applicable Platform: PC OS/Version: Linux Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: bsouthey at gmail.com test_GenomeDiagram fails because the renderPM module is not part of standard install of reportlab, at least under Linux. I consider that the renderPM module should not be required so Graphics/GenomeDiagram/Diagram.py needs to be rewritten to avoid using the renderPM module when it is not available. The installation documentation needs to include something about needing the renderPM for JPG, BMP, GIF, PNG, TIFF or TIFF outputs. There must be a test for the presence of the renderPM module. test_GenomeDiagram ... ERROR test_GraphicsChromosome ... ok test_GraphicsDistribution ... ok test_GraphicsGeneral ... 
ok ====================================================================== ERROR: test_GenomeDiagram ---------------------------------------------------------------------- Traceback (most recent call last): File "run_tests.py", line 125, in runTest self.runSafeTest() File "run_tests.py", line 138, in runSafeTest cur_test = __import__(self.test_name) File "test_GenomeDiagram.py", line 30, in from Bio.Graphics.GenomeDiagram.FeatureSet import FeatureSet File "/home/bsouthey/python/biopython_cvs/biopython/build/lib.linux-x86_64-2.5/Bio/Graphics/GenomeDiagram/__init__.py", line 13, in from Bio.Graphics.GenomeDiagram.Diagram import Diagram File "/home/bsouthey/python/biopython_cvs/biopython/build/lib.linux-x86_64-2.5/Bio/Graphics/GenomeDiagram/Diagram.py", line 32, in from reportlab.graphics import renderPS, renderPDF, renderSVG, renderPM File "/usr/lib/python2.5/site-packages/reportlab/graphics/renderPM.py", line 28, in "see http://www.reportlab.org/rl_addons.html") ImportError: No module named _renderPM see http://www.reportlab.org/rl_addons.html ---------------------------------------------------------------------- -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
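One way to avoid needing renderPM at import time is to look up and import the renderer only when a given output format is requested. The sketch below is illustrative only: RENDERERS, renderer_name and load_renderer are hypothetical names, and the format-to-module table is an assumption pieced together from the import line in the traceback and the list of bitmap formats in the report, not a copy of Diagram.py.

```python
import importlib

# Output format -> submodule of reportlab.graphics (assumed mapping).
RENDERERS = {
    "PS": "renderPS",
    "PDF": "renderPDF",
    "SVG": "renderSVG",
    # Bitmap formats need the optional renderPM add-on.
    "JPG": "renderPM",
    "BMP": "renderPM",
    "GIF": "renderPM",
    "PNG": "renderPM",
    "TIFF": "renderPM",
}


def renderer_name(output):
    """Map an output format (any case) to its renderer module name."""
    key = str(output).upper()
    try:
        return RENDERERS[key]
    except KeyError:
        raise ValueError("Unknown output format %r, expected one of: %s"
                         % (output, ", ".join(sorted(RENDERERS))))


def load_renderer(output):
    """Import the renderer only when it is actually needed.

    A missing renderPM then raises ImportError only for bitmap output,
    leaving PS, PDF and SVG usable.
    """
    return importlib.import_module("reportlab.graphics." + renderer_name(output))
```

With this arrangement an unsupported format gives a ValueError naming the valid choices, and lowercase input such as 'ps' is accepted.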
From bugzilla-daemon at portal.open-bio.org Fri Dec 12 17:43:49 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 12 Dec 2008 12:43:49 -0500
Subject: [Biopython-dev] [Bug 2711] New: GenomeDiagram.py: write() and write_to_string() are inefficient and don't check inputs
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2711

           Summary: GenomeDiagram.py: write() and write_to_string() are
                    inefficient and don't check inputs
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: bsouthey at gmail.com

While looking at GenomeDiagram.py I noticed some things that should be fixed.
I do note that some of this stems from reportlab. In particular, reportlab
doesn't appear to have a generic interface for different image formats.

1) Why are there two functions to output a diagram rather than just one
generic function? In particular, why not just pass a filename or not? Yes, I
know that reportlab uses different functions but this just duplicates code. So
this is more a comment than anything else.

2) I find the functions write() and write_to_string() just plain ugly. You
define a local dictionary of modules every time these functions are called.
But there is only one valid key so you then go back to find the input that you
already knew. A nested list would be better and allow catching invalid inputs
(see next point).

3) Neither write() nor write_to_string() checks that the output option is
valid. These functions do not accept lowercase. Thus, output='ps' will crash
with a key error, as will any invalid key.

4) I do not know the policy on module imports, but this line is only required
for write() and write_to_string():

from reportlab.graphics import renderPS, renderPDF, renderSVG, renderPM

Also renderPM is an addon.
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Dec 12 17:46:53 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 12 Dec 2008 12:46:53 -0500 Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and write_to_string() are inefficient and don't check inputs In-Reply-To: Message-ID: <200812121746.mBCHkrPi005835@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2711 ------- Comment #1 from bsouthey at gmail.com 2008-12-12 12:46 EST ------- Created an attachment (id=1156) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1156&action=view) Fix various issues with GenomeDIagram/Diagram.py Contains a couple of fixes including bug 2710. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Dec 12 17:54:21 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 12 Dec 2008 12:54:21 -0500 Subject: [Biopython-dev] [Bug 2710] GenomeDiagram.py unnecessary requires the reportlab addon renderPM In-Reply-To: Message-ID: <200812121754.mBCHsL4q006303@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2710 bsouthey at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |DUPLICATE ------- Comment #1 from bsouthey at gmail.com 2008-12-12 12:54 EST ------- The reason for this bug report was the import of renderPM. But closer look at the code shows a bigger issue with write() and writeToString() functions of Diagram.py. 
I am marking this as duplicate because correctly fixing bug 2711 (see patch
for Bug 2711) will also fix this one.

*** This bug has been marked as a duplicate of bug 2711 ***

-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Fri Dec 12 17:54:34 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 12 Dec 2008 12:54:34 -0500
Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and write_to_string() are inefficient and don't check inputs
In-Reply-To:
Message-ID: <200812121754.mBCHsYgN006312@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2711

------- Comment #2 from bsouthey at gmail.com 2008-12-12 12:54 EST -------
*** Bug 2710 has been marked as a duplicate of this bug. ***

-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Fri Dec 12 18:25:25 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 12 Dec 2008 13:25:25 -0500
Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and write_to_string() are inefficient and don't check inputs
In-Reply-To:
Message-ID: <200812121825.mBCIPPZq008484@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2711

------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-12-12 13:25 EST -------
I agree something needs to be done for this issue (in particular the bit
originally covered by Bug 2710). Moving the imports into these function(s)
would be another way to let us deal with the missing renderPM module if and
when it is used (either leave the ImportError, or raise a missing external
dependency error).
As an aside, I'd like write_to_string() to support a DPI argument like write() does. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Dec 12 19:23:06 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 12 Dec 2008 14:23:06 -0500 Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and write_to_string() are inefficient and don't check inputs In-Reply-To: Message-ID: <200812121923.mBCJN64B013046@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2711 bsouthey at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1156 is|0 |1 obsolete| | ------- Comment #4 from bsouthey at gmail.com 2008-12-12 14:23 EST ------- Created an attachment (id=1157) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1157&action=view) Corrected patch I blindly copied and pasted without correcting it. Also, added 'dpi' to write_to_string(). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Dec 12 19:29:37 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 12 Dec 2008 14:29:37 -0500 Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and write_to_string() are inefficient and don't check inputs In-Reply-To: Message-ID: <200812121929.mBCJTbtl013858@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2711 ------- Comment #5 from bsouthey at gmail.com 2008-12-12 14:29 EST ------- (In reply to comment #3) > > As an aside, I'd like write_to_string() to support a DPI argument like write() > does. 
> I added this to the patch as it was trivial. I would also think that exposing the other options (bg, configPIL, showBoundary) could be useful. But I do not know how these influence the GenomeDiagram. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Sat Dec 13 18:20:10 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sat, 13 Dec 2008 18:20:10 +0000 Subject: [Biopython-dev] [Utilities-announce] PubMed Entrez Utility 2009 DTD changes In-Reply-To: <320fb6e00812031310s43124c68n988838af3837638d@mail.gmail.com> References: <7B6F170840CA6C4DA63EE0C8A7BB43EC03A0001F@NIHCESMLBX15.nih.gov> <320fb6e00812031310s43124c68n988838af3837638d@mail.gmail.com> Message-ID: <320fb6e00812131020r4a2a02dtcc7d65e8cf495052@mail.gmail.com> On Wed, Dec 3, 2008 at 9:10 PM, Peter wrote: > This email from the NCBI will be of interest for Bio.Entrez - we may > need to add a few DTD files to Bio.Entrez in preparation for this... > see also Bug 2678. I've just added the following five DTD files to CVS, nlmcommon_090101.dtd nlmmedline_090101.dtd nlmmedlinecitation_090101.dtd nlmsharedcatcit_090101.dtd pubmed_090101.dtd All from http://www.ncbi.nlm.nih.gov/entrez/query/DTD/ Peter From bugzilla-daemon at portal.open-bio.org Sat Dec 13 20:19:15 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 13 Dec 2008 15:19:15 -0500 Subject: [Biopython-dev] [Bug 2678] Bio.Entrez module does not always retrieve or find DTD files In-Reply-To: Message-ID: <200812132019.mBDKJFkD005703@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2678 ------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2008-12-13 15:19 EST ------- (In reply to comment #6) > If the DTD is available locally in Bio/Entrez/DTDs, then Bio.Entrez will read > it from there. 
If not, it tries to download it. This may fail if the servers > are busy. If the needed DTDs are saved in Bio/Entrez/DTDs (and installed when > Biopython is installed), you won't run into this problem. I was just looking at this on my Windows XP Python 2.3 machine, and when it tried to download missing DTD files it was just using a filename as the URL. I've committed a fix to CVS which should resolve this: biopython/Bio/Entrez/Parser.py revision 1.3 I'll double check this on Linux/Mac next week. This may be related to Leighton's problem - although 'xhtml1-strict.dtd' and 'xhtml-lat1.ent' are not NCBI DTD files, but rather a part of the XML specification itself. Note that if I delete all the Bio/Entrez/DTDs/* files, then test_Entrez.py fails. I get warning messages about downloading missing DTD files, and the following failures: ====================================================================== ERROR: Test parsing pubmed links returned by ELink (fifth test) ---------------------------------------------------------------------- Traceback (most recent call last): File "test_Entrez.py", line 2523, in t_pubmed5 record = Entrez.read(input) File "c:\python23\Lib\site-packages\Bio\Entrez\__init__.py", line 286, in read record = handler.run(handle) File "c:\python23\Lib\site-packages\Bio\Entrez\Parser.py", line 95, in run self.parser.ParseFile(handle) File "c:\python23\Lib\site-packages\Bio\Entrez\Parser.py", line 131, in startE lement if object!="": UnboundLocalError: local variable 'object' referenced before assignment ====================================================================== ERROR: Test parsing XML returned by EFetch, PubMed database (first test) ---------------------------------------------------------------------- Traceback (most recent call last): File "test_Entrez.py", line 3058, in t_pubmed1 record = Entrez.read(input) File "c:\python23\Lib\site-packages\Bio\Entrez\__init__.py", line 286, in read record = handler.run(handle) File 
"c:\python23\Lib\site-packages\Bio\Entrez\Parser.py", line 95, in run self.parser.ParseFile(handle) File "c:\python23\Lib\site-packages\Bio\Entrez\Parser.py", line 294, in extern al_entity_ref_handler parser.ParseFile(handle) File "c:\python23\Lib\site-packages\Bio\Entrez\Parser.py", line 294, in extern al_entity_ref_handler parser.ParseFile(handle) ExpatError: syntax error: line 1, column 0 ====================================================================== ERROR: Test parsing XML returned by EFetch, PubMed database (second test) ---------------------------------------------------------------------- Traceback (most recent call last): File "test_Entrez.py", line 3261, in t_pubmed2 record = Entrez.read(input) File "c:\python23\Lib\site-packages\Bio\Entrez\__init__.py", line 286, in read record = handler.run(handle) File "c:\python23\Lib\site-packages\Bio\Entrez\Parser.py", line 95, in run self.parser.ParseFile(handle) File "c:\python23\Lib\site-packages\Bio\Entrez\Parser.py", line 294, in extern al_entity_ref_handler parser.ParseFile(handle) File "c:\python23\Lib\site-packages\Bio\Entrez\Parser.py", line 294, in extern al_entity_ref_handler parser.ParseFile(handle) ExpatError: syntax error: line 1, column 0 ====================================================================== FAIL: Test parsing pubmed links returned by ELink (sixth test) ---------------------------------------------------------------------- Traceback (most recent call last): File "test_Entrez.py", line 2697, in t_pubmed6 assert len(record[0]["IdCheckList"])==2 AssertionError ---------------------------------------------------------------------- (The rest of the Entrez tests pass even with the missing DTDs - they are now successfully downloaded on demand) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
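A minimal sketch of the lookup order described in the message above (local Bio/Entrez/DTDs copy first, download otherwise), and of why a bare filename cannot be used as the URL. This is not the code in Bio/Entrez/Parser.py: the function name locate_dtd and its arguments are hypothetical, and the base URL is simply the NCBI DTD directory quoted earlier in this thread. It is written for Python 3, unlike the Python 2.3 setup in the traceback.

```python
import os
from urllib.parse import urljoin

# Assumption: the NCBI DTD directory mentioned earlier in the thread.
NCBI_DTD_BASE = "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/"


def locate_dtd(system_id, local_dir, base_url=NCBI_DTD_BASE):
    """Hypothetical helper: return ("local", path) or ("remote", url).

    system_id is the DTD reference found in the XML, which may be a
    bare filename or a full URL.
    """
    # Only the final path component is used for the local lookup.
    filename = system_id.rsplit("/", 1)[-1]
    local_path = os.path.join(local_dir, filename)
    if os.path.isfile(local_path):
        return "local", local_path
    # A bare filename is not a usable URL, so resolve it against the
    # base. urljoin leaves an already absolute URL unchanged.
    return "remote", urljoin(base_url, system_id)
```

Resolving against a base URL keeps non-NCBI references such as the XHTML DTDs intact, since an absolute system identifier passes through unchanged.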
From bugzilla-daemon at portal.open-bio.org Sat Dec 13 23:56:02 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 13 Dec 2008 18:56:02 -0500 Subject: [Biopython-dev] [Bug 2649] Bio.KDTree expects numpy array with dtype="float32" on 64 bit machines. In-Reply-To: Message-ID: <200812132356.mBDNu2HE017869@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2649 ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-12-13 18:56 EST ------- Hi Paul, I'd like to close this bug now as we think it has been solved. Michiel's update was included with Biopython 1.49, so you don't need to mess about with CVS to check and confirm this now. Thanks, Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sun Dec 14 00:12:00 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 13 Dec 2008 19:12:00 -0500 Subject: [Biopython-dev] [Bug 2681] BioSQL: record annotations enhancements In-Reply-To: Message-ID: <200812140012.mBE0C0Yo018673@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2681 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |biopython- | |bugzilla at maubp.freeserve.co. | |uk ------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2008-12-13 19:11 EST ------- (In reply to comment #4) > (In reply to comment #2) > > (In reply to comment #0) > > > 1) Fixed date/dates typo. > > > > Why is it a typo? Change not checked in. > > The function _load_bioentry_date in Loader.py inserts the annotation 'date', > if present, or the current date if not, into the bioentry_qualifier_value > table. 
This is pulled by BioSeq.py _retrieve_qualifier_value and stored as > the attribute 'dates'. Hence I considered line 307 in BioSeq.py to be a typo, > which should be 'date' and not 'dates'. OK, that does make sense. However... > Also, because Loader.py handles dates separately, they should not be > handled by the function load_annotations. That would make sense if we make the above "dates"/"date" change. If we tested a record with a "date" annotation, I guess currently it would get recorded twice - once under "date_changed" by _load_bioentry_date (retrieved as "dates") and again but under "date" by _load_annotations (retrieved as "date"). Right now, I'm wondering why _load_bioentry_date exists in the first place ... perhaps this special annotation entry "date_changed" is to mimic BioPerl? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sun Dec 14 00:59:14 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 13 Dec 2008 19:59:14 -0500 Subject: [Biopython-dev] [Bug 2697] MaxEntropy calculate function assumes integer values for class and convergence criteria is hard coded In-Reply-To: Message-ID: <200812140059.mBE0xE0g021156@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2697 ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-12-13 19:59 EST ------- (In reply to comment #0) > Also, the convergence criteria is hard coded into the file by the following > gloable definitions: > MAX_IIS_ITERATIONS = 10000 # Maximum iterations for IIS. > IIS_CONVERGE = 1E-5 # Convergence criteria for IIS. > MAX_NEWTON_ITERATIONS = 100 # Maximum iterations on Newton's method. > NEWTON_CONVERGE = 1E-10 # Convergence criteria for Newton's method. 
> > This makes it impossible for the user to specify their own values without > changing the actual function. No, you can change them in your own code - they are just module level variable. For example: from Bio import MaxEntropy #Check the current limit, print MaxEntropy.MAX_NEWTON_ITERATIONS #Increase the iteration limit, MaxEntropy.MAX_NEWTON_ITERATIONS = 1000 One might argue these should be *optional* arguments to the functions. However, your suggested change adds new *required* arguments, which is not a backwards compatible API change. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sun Dec 14 02:20:37 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 13 Dec 2008 21:20:37 -0500 Subject: [Biopython-dev] [Bug 2697] MaxEntropy calculate function assumes integer values for class and convergence criteria is hard coded In-Reply-To: Message-ID: <200812140220.mBE2KbM1026093@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2697 ------- Comment #3 from bsouthey at gmail.com 2008-12-13 21:20 EST ------- (In reply to comment #2) > (In reply to comment #0) > > Also, the convergence criteria is hard coded into the file by the following > > gloable definitions: > > MAX_IIS_ITERATIONS = 10000 # Maximum iterations for IIS. > > IIS_CONVERGE = 1E-5 # Convergence criteria for IIS. > > MAX_NEWTON_ITERATIONS = 100 # Maximum iterations on Newton's method. > > NEWTON_CONVERGE = 1E-10 # Convergence criteria for Newton's method. > > > > This makes it impossible for the user to specify their own values without > > changing the actual function. > > No, you can change them in your own code - they are just module level variable. 
> For example: > > from Bio import MaxEntropy > #Check the current limit, > print MaxEntropy.MAX_NEWTON_ITERATIONS > #Increase the iteration limit, > MaxEntropy.MAX_NEWTON_ITERATIONS = 1000 > > One might argue these should be *optional* arguments to the functions. > However, your suggested change adds new *required* arguments, which is not a > backwards compatible API change. > > Peter > I strongly disagree on this because a user should not have to read the module source code to find these module level global variables and what values these actually are. But this is not my code. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sun Dec 14 04:27:16 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 13 Dec 2008 23:27:16 -0500 Subject: [Biopython-dev] [Bug 2697] MaxEntropy calculate function assumes integer values for class and convergence criteria is hard coded In-Reply-To: Message-ID: <200812140427.mBE4RGIE001073@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2697 ------- Comment #4 from mdehoon at ims.u-tokyo.ac.jp 2008-12-13 23:27 EST ------- (In reply to comment #3) > I strongly disagree on this because a user should not have to read the module > source code to find these module level global variables and what values these > actually are. But this is not my code. > I agree with Bruce that these variables should be arguments to the function, rather than module-level global variables. To keep the API backwards compatible, we can specify the current values for these variables as default values for these arguments. This will also make it easier for users that are not particularly interested in these variables. 
If you submit a revised patch, please do not just comment out unneeded code; it is better to actually remove code that is no longer needed. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sun Dec 14 13:17:47 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 14 Dec 2008 08:17:47 -0500 Subject: [Biopython-dev] [Bug 2697] MaxEntropy calculate function assumes integer values for class and convergence criteria is hard coded In-Reply-To: Message-ID: <200812141317.mBEDHla7021974@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2697 ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2008-12-14 08:17 EST ------- (In reply to comment #3) > (In reply to comment #2) > > No, you can change them in your own code - they are just module level > > variables > > ... > > One might argue these should be *optional* arguments to the functions. > > However, your suggested change adds new *required* arguments, which is not a > > backwards compatible API change. Sorry - you *did* use optional arguments for the train function. I was distracted by the private functions where the new arguments are required. > I strongly disagree on this because a user should not have to read the module > source code to find these module level global variables and what values these > actually are. But this is not my code. I'm not saying the current state of the code is elegant - just correcting your factual error that the end user couldn't change these parameters. They can. (In reply to comment #4) > I agree with Bruce that these variables should be arguments to the function, > rather than module-level global variables. 
To keep the API backwards > compatible, we can specify the current values for these variables as default > values for these arguments. This will also make it easier for users that are > not particularly interested in these variables. This is what I was implying, although less clearly. To be even more explicit, if we want to add these variables as arguments to the functions then they should default to the existing upper case module level variables. We shouldn't remove or rename the module level variables in case anyone was using them in the way I illustrated in comment 2. e.g.

def train(training_set, results, feature_fns, update_fn=None):

becomes something like this:

def train(training_set, results, feature_fns, update_fn=None,
          max_iis_iterations=MAX_IIS_ITERATIONS,
          iis_converge=IIS_CONVERGE,
          max_newton_iterations=MAX_NEWTON_ITERATIONS,
          newton_converge=NEWTON_CONVERGE):
    #This function's code would then need updating to use
    #local variable max_iis_iterations instead of the
    #module level MAX_IIS_ITERATIONS.

Note this does NOT use uppercase argument names as in Bruce's original patch - these would not be consistent with the rest of Biopython. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Dec 15 10:11:37 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 15 Dec 2008 05:11:37 -0500 Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and write_to_string() are inefficient and don't check inputs In-Reply-To: Message-ID: <200812151011.mBFABbqD007138@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2711 ------- Comment #6 from lpritc at scri.sari.ac.uk 2008-12-15 05:11 EST ------- (In reply to comment #2) > *** Bug 2710 has been marked as a duplicate of this bug. 
*** > (In reply to comment #0) > test_GenomeDiagram fails because the renderPM module is not part of standard > install of reportlab, at least under Linux. That's odd - renderPM is in the source for ReportLab 2.2. Are you using an up-to-date version? It seems to install well enough on our 64-bit Linux box from the ReportLab source. > I consider that the renderPM module should not be required so > Graphics/GenomeDiagram/Diagram.py needs to be rewritten to avoid using the > renderPM module when it is not available. renderPM is how raster graphics are drawn, so is, I'm afraid, a necessary part of GenomeDiagram's functionality. I prefer your alternative suggestion of making it a 'dynamic' import, but even then I think that the inconvenience of preparing the diagram, only to find out at the last possible stage that you can't draw it because you're missing the library, is worse than getting the error message upfront. Not that this should be a problem, since renderPM is part of the main ReportLab source, now. YMMV though, and I'm happy for the code to conform to the Biopython house style. > The installation documentation needs to include something about needing the > renderPM for JPG, BMP, GIF, PNG, TIFF or TIFF outputs. > > There must be a test for the presence of the renderPM module. I'm not convinced of the value of this, as renderPM is part of the current ReportLab source installation. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Mon Dec 15 10:17:54 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 15 Dec 2008 05:17:54 -0500 Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and write_to_string() are inefficient and don't check inputs In-Reply-To: Message-ID: <200812151017.mBFAHs0K007630@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2711 ------- Comment #7 from lpritc at scri.sari.ac.uk 2008-12-15 05:17 EST ------- (In reply to comment #0) (from #2710) > test_GenomeDiagram fails because the renderPM module is not part of standard > install of reportlab, at least under Linux. renderPM is part of the source install of ReportLab 2.2, and installs correctly on our 64-bit Linux box. Are you using an up-to-date version of ReportLab? The version that your distro's installer uses may not be the most recent. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Dec 15 10:41:13 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 15 Dec 2008 05:41:13 -0500 Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and write_to_string() are inefficient and don't check inputs In-Reply-To: Message-ID: <200812151041.mBFAfDI8010277@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2711 ------- Comment #8 from lpritc at scri.sari.ac.uk 2008-12-15 05:41 EST ------- (In reply to comment #0) > 1) Why are there two functions to output a diagram than just one generic > function? In particular, why not just pass a filename or not? When I wrote the libraries originally, I had one main use in mind: production of publication-quality images in vector format. Later on I decided that I needed streaming output for web display, and then bolted on the write_to_string() to look like the ReportLab interface, for consistency. That's why there are two methods: the write() method produces publication-quality (and bitmaps, if you ask), and the write_to_string() method produces the streaming output. 
It should be possible to make write() do both jobs, so long as the intention is declared in the argument list. It might be nice to just be able to specify a stream or handle, rather than the filename. Both of these would be an API change. > 2) I find the functions write() and write_to_string() just plain ugly. > You define a local dictionary of modules every time these functions are called. That dictionary could be placed at the head of the script to be defined on import. But I think it's more explicit what's going on to have it in the method itself - the dictionary has restricted scope, and is garbage-collected after the function call. Also, I don't understand your nested list proposal: distribution dictionaries are not that uncommon. > 4) I do not know the policy on module imports, but this line is only required > for write() and write_to_string(): > from reportlab.graphics import renderPS, renderPDF, renderSVG, renderPM > Also renderPM is an addon. Apologies for repeating myself earlier about this one - Bugzilla was being flaky - but renderPM is now part of ReportLab 2.2. Whether we should continue to support/cater for installations of 1.21 without the add-ons is another question, I think. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
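The "specify a stream or handle" idea in comment #8 could look roughly like the sketch below. It is only an illustration in Python 3: render_diagram is a hypothetical stand-in for the ReportLab rendering step and returns fixed bytes, and neither function reflects the actual GenomeDiagram API.

```python
import io


def render_diagram(fmt):
    """Hypothetical stand-in for the real renderer; returns image bytes."""
    if fmt != "SVG":
        raise ValueError("Unsupported format in this sketch: %r" % fmt)
    return b'<svg xmlns="http://www.w3.org/2000/svg"/>'


def write(handle, fmt="SVG"):
    """Write the rendered diagram to any object with a write() method."""
    handle.write(render_diagram(fmt))


def write_to_string(fmt="SVG"):
    """Return the rendered diagram as bytes, built on top of write()."""
    buffer = io.BytesIO()
    write(buffer, fmt)
    return buffer.getvalue()
```

With this shape, writing to a file is just a matter of opening it in binary mode and passing the handle to write(), and the in-memory variant needs no separate rendering path.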
From bugzilla-daemon at portal.open-bio.org Mon Dec 15 10:51:30 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 15 Dec 2008 05:51:30 -0500 Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and write_to_string() are inefficient and don't check inputs In-Reply-To: Message-ID: <200812151051.mBFApU9R011217@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2711 ------- Comment #9 from lpritc at scri.sari.ac.uk 2008-12-15 05:51 EST ------- (In reply to comment #3) >As an aside, I'd like write_to_string() to support a DPI argument like write() > does. The way I originally intended write_to_string() to be used - sending graphics to a browser - the DPI has no influence at all. DPI is only of any importance for printing graphics: the DPI translates the pixel size into the final printed size of the image. The image you see on screen (assuming no fancy browser scaling) is pixel-per-pixel. That's why I left it out. It may be that people have a sensible reason for writing their image output to string - rather than binary - encoding, for writing to a file. I'm not clear on what that would be, but it's possible. In that case, I think that an appropriate merging of the write() and write_to_string() methods could be: def write(self, filename=None, output=default_output, dpi=default_dpi, encoding=default_encoding): encoding could then be either 'binary' (default), or 'string' - which would emulate write_to_string()'s function. Where handle is not None, the resulting output would be sent to the passed handle - which could potentially include sys.stdout. Where handle is None, the method could return the encoded image directly, as write_to_string() does, now. Other than the obvious problem with ReportLab's drawToFile requiring a filename, rather than a handle - does this seem like a reasonable plan to others? 
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Dec 15 11:00:01 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 15 Dec 2008 06:00:01 -0500 Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and write_to_string() are inefficient and don't check inputs In-Reply-To: Message-ID: <200812151100.mBFB01fk011962@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2711 ------- Comment #10 from biopython-bugzilla at maubp.freeserve.co.uk 2008-12-15 06:00 EST ------- (In reply to comment #8) > > > 4) I do not know the policy on module imports, but this line is only > > required for write() and write_to_string(): > > from reportlab.graphics import renderPS, renderPDF, renderSVG, renderPM > > Also renderPM is an addon. > > Apologies for repeating myself earlier about this one - Bugzilla was being > flaky - but renderPM is now part of ReportLab 2.2. Whether we should continue > to support/cater for installations of 1.21 without the add-ons is another > question, I think. I thought I'd commented on this bug already, but I committed a patch which would fail gracefully if renderPM was missing. I must be running an older version of ReportLab on my Linux box at home, because it didn't have renderPM installed. However - this check is done when writing the file. This is good if you don't have renderPM but only want vector images. This is bad if you do want bitmap images, as the missing dependency error happens at the very end. However, I don't think we can assume renderPM will be installed. Looking at the website for reportlab 2.2, it's not clear if the Windows installers will include renderPM or not... 
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Dec 15 11:02:35 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 15 Dec 2008 06:02:35 -0500 Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and write_to_string() are inefficient and don't check inputs In-Reply-To: Message-ID: <200812151102.mBFB2ZMq012237@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2711 ------- Comment #11 from lpritc at scri.sari.ac.uk 2008-12-15 06:02 EST ------- (In reply to comment #3) > I agree something needs to be done for this issue (in particular the bit > originally covered by Bug 2710). > > Moving the imports into these function(s) would be another way to let us deal > with the missing renderPM module if and when it is used (either leave the > ImportError, or raise a missing external dependency error). One issue with this approach is that, when working with the module interactively, a user might not be aware of the absence of the appropriate module until they attempted to produce their output - which might be after quite a bit of interactive work. Informing the user up-front that renderPM is not available - either by ImportError or friendly warning - avoids this. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
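One way to get the up-front notice argued for in comment #11 while still allowing vector-only use is to attempt the import once at module load and warn, then refuse bitmap formats later. The sketch below assumes only the `from reportlab.graphics import renderPM` import quoted in this thread; check_format, the two format tuples and the warning text are hypothetical and not part of GenomeDiagram.

```python
import warnings

try:
    from reportlab.graphics import renderPM
except ImportError:
    # Tell the user at import time, but keep vector output usable.
    renderPM = None
    warnings.warn("ReportLab's renderPM is not available; "
                  "bitmap output is disabled.")

# Hypothetical groupings based on the formats named in the thread.
VECTOR_FORMATS = ("PS", "PDF", "SVG")
BITMAP_FORMATS = ("JPG", "BMP", "GIF", "PNG", "TIFF")


def check_format(fmt, bitmap_renderer=renderPM):
    """Return the normalised format name, or raise if it cannot be drawn."""
    fmt = fmt.upper()
    if fmt in VECTOR_FORMATS:
        return fmt
    if fmt in BITMAP_FORMATS:
        if bitmap_renderer is None:
            raise ImportError("Bitmap output (%s) needs ReportLab's "
                              "renderPM module" % fmt)
        return fmt
    raise ValueError("Unknown output format: %r" % fmt)
```

Calling check_format before any drawing work starts would surface the missing dependency early for bitmap requests, without penalising users who only want PDF, PS or SVG.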
From bugzilla-daemon at portal.open-bio.org Mon Dec 15 11:17:45 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 15 Dec 2008 06:17:45 -0500 Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and write_to_string() are inefficient and don't check inputs In-Reply-To: Message-ID: <200812151117.mBFBHjgn013463@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2711 ------- Comment #12 from biopython-bugzilla at maubp.freeserve.co.uk 2008-12-15 06:17 EST ------- (In reply to comment #9) > (In reply to comment #3) > > As an aside, I'd like write_to_string() to support a DPI argument like > > write() does. > > The way I originally intended write_to_string() to be used - sending graphics > to a browser - the DPI has no influence at all. DPI is only of any importance > for printing graphics ... OK, so it's less useful than I had expected. Rendering bitmaps to strings so they can be inserted into a database as blobs is one potential use-case. Also for a web-service where you expect the user to save and print the naked image (unusual, and probably software dependent on how the DPI is treated). > In that case, I think that an appropriate merging of the write() and > write_to_string() methods could be: > > def write(self, filename=None, output=default_output, dpi=default_dpi, > encoding=default_encoding): > > encoding could then be either 'binary' (default), or 'string' - which would > emulate write_to_string()'s function. > > Where handle is not None, the resulting output would be sent to the passed > handle - which could potentially include sys.stdout. Where handle is None, > the method could return the encoded image directly, as write_to_string() > does, now. > > Other than the obvious problem with ReportLab's drawToFile requiring a > filename, rather than a handle - does this seem like a reasonable plan to > others? 
On the plus side, this would be backwards compatible (and we could deprecate the write_to_string function). However, I'm not so keen on this style personally - the return value is radically different depending on the arguments (nothing, or a string of data). If we were designing this from scratch, I would have suggested one write function which wrote to a handle - which would let you then write to a file or a string (using StringIO). On the other hand, this is perhaps a little low level. We've had similar discussions regarding Bio.SeqIO in the past. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Dec 15 20:33:51 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 15 Dec 2008 15:33:51 -0500 Subject: [Biopython-dev] [Bug 2591] GenBank files misparsed for long organism names In-Reply-To: Message-ID: <200812152033.mBFKXpp4005791@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2591 ------- Comment #4 from joelb at lanl.gov 2008-12-15 15:33 EST ------- I heard back from GenBank, and it seems they are saying the problem isn't theirs: >On Tue, December 9, 2008 10:30 am, gb-admin at ncbi.nlm.nih.gov wrote: >> Hi Joel, >> >> I heard back from our database folks on this one. Essentially we do >> allow the source line to line-wrap, but we never publicly announced >> it. We apologize for this oversight and will be putting something >> in the release notes regarding this. Hopefully BioPython and other >> companies will be able to pick up this change and adapt once it is >> announced in the release notes. >> >> thanks for pointing it out >> >> Linda I just wrote back with the followup question: > >OK, but then a followup question. 
How does one distinguish, then, a >line-wrapped organism line from the multiline phylogeny that follows? >According to my reading of the specs (and most Bio* GenBank parser's >implementations) it seems that an equally-valid parsing of the following >ORGANISM record is that it belongs to the "AKU_12601 Bacteria" kingdom. >That is, there is no official way of signalling "this is the end of the >multiline organism name" or "this begins the multiline phylogeny record." > > ORGANISM Salmonella enterica subsp. enterica serovar Paratyphi A str. > AKU_12601 > Bacteria; Proteobacteria; Gammaproteobacteria;Enterobacteriales; > Enterobacteriaceae; Salmonella. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Dec 17 23:44:58 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 17 Dec 2008 18:44:58 -0500 Subject: [Biopython-dev] [Bug 2591] GenBank files misparsed for long organism names In-Reply-To: Message-ID: <200812172344.mBHNiwPt019616@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2591 ------- Comment #5 from joelb at lanl.gov 2008-12-17 18:44 EST ------- I received the following response to my followup. It now appears that the bug is with BioPython, since GenBank has changed its definition. It seems likely that all Bio* flatfile parsers will be affected. >I just received the wording that will appear in Section 3.4.2 of gbrel.txt >for this month's release: > > ORGANISM - Formal scientific name of the organism (first line) >and taxonomic classification levels (second and subsequent lines). >Mandatory subkeyword in all annotated entries/two or more records. 
> > In the event that the organism name exceeds 68 characters (80 - 13 + >1) > in length, it will be line-wrapped and continue on a second line, > prior to the taxonomic classification. Unfortunately, very long > organism names were not anticipated when the fixed-length GenBank > flatfile format was defined in the 1980s. The possibility of linewraps > makes the job of flatfile parsers more difficult : essentially, one > cannot be sure that the second line is truly a classification/lineage > unless it consists of multiple tokens, delimited by semi-colons. > The long-term solution to this problem is to introduce an additional > subkeyword, probably 'LINEAGE' . This might occur sometime in 2009 > or 2010. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Dec 18 11:07:16 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 18 Dec 2008 06:07:16 -0500 Subject: [Biopython-dev] [Bug 2591] GenBank files misparsed for long organism names In-Reply-To: Message-ID: <200812181107.mBIB7G97005964@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2591 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk 2008-12-18 06:07 EST ------- (In reply to comment #5) > I received the following response to my followup. It now appears that the bug > is with BioPython, since GenBank has changed its definition. It seems likely > that all Bio* flatfile parsers will be affected. 
Thanks for chasing this up Joel :) > I just received the wording that will appear in Section 3.4.2 of gbrel.txt > for this month's release: > > > > ORGANISM - Formal scientific name of the organism (first line) > >and taxonomic classification levels (second and subsequent lines). > >Mandatory subkeyword in all annotated entries/two or more records. > > > > In the event that the organism name exceeds 68 characters (80-13+1) > > in length, it will be line-wrapped and continue on a second line, > > prior to the taxonomic classification. Unfortunately, very long > > organism names were not anticipated when the fixed-length GenBank > > flatfile format was defined in the 1980s. The possibility of linewraps > > makes the job of flatfile parsers more difficult : essentially, one > > cannot be sure that the second line is truly a classification/lineage > > unless it consists of multiple tokens, delimited by semi-colons. > > The long-term solution to this problem is to introduce an additional > > subkeyword, probably 'LINEAGE' . This might occur sometime in 2009 > > or 2010. It looks like my guess was right, see comment #1: > Let's wait and hear what the NCBI says - I expect they will have to change the > file format definition slightly. > > If they say this is a valid file, I hope they will also explain officially > how we should split up the species and its lineage. One option would be > some thing like looking for semi-colons in the following text as indicative > of the lineage (rather than as more of the ORGANISM). Now that we've had the NCBI recommend the semi-colon approach, I've fixed our parser in CVS: Bio/GenBank/Record.py revision 1.14 Bio/GenBank/Scanner.py revision 1.26 Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
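For readers following the thread, the semi-colon heuristic discussed above can be sketched roughly as follows. This is a minimal illustration only, not the code checked in to Bio/GenBank/Scanner.py; the function name split_organism_block is hypothetical, and it assumes the ORGANISM lines have already been stripped of the keyword and leading indentation.

```python
def split_organism_block(lines):
    """Split ORGANISM lines into (organism_name, lineage_list).

    Hypothetical helper: the first line always belongs to the name, and
    later lines are appended to the name until one contains a semi-colon,
    which is taken as the start of the lineage.
    """
    name_parts = []
    lineage_start = len(lines)
    for index, line in enumerate(lines):
        text = line.strip()
        if index > 0 and ";" in text:
            # First line that looks like a lineage: stop collecting the name.
            lineage_start = index
            break
        name_parts.append(text)
    organism = " ".join(name_parts)
    lineage_text = " ".join(line.strip() for line in lines[lineage_start:])
    # Drop the trailing full stop, then split on semi-colons.
    lineage = [token.strip()
               for token in lineage_text.rstrip(".").split(";")
               if token.strip()]
    return organism, lineage


example = [
    "Salmonella enterica subsp. enterica serovar Paratyphi A str.",
    "AKU_12601",
    "Bacteria; Proteobacteria; Gammaproteobacteria;Enterobacteriales;",
    "Enterobacteriaceae; Salmonella.",
]
organism, lineage = split_organism_block(example)
```

As the NCBI wording itself warns, a lineage consisting of a single token with no semi-colon cannot be told apart from a wrapped name, and this sketch would treat such a line as part of the name.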
From bugzilla-daemon at portal.open-bio.org Thu Dec 18 19:01:32 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 18 Dec 2008 14:01:32 -0500 Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main Biopython distribution In-Reply-To: Message-ID: <200812181901.mBIJ1W31019801@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2671 ------- Comment #31 from biopython-bugzilla at maubp.freeserve.co.uk 2008-12-18 14:01 EST ------- (In reply to comment #27) > This might be better off as a new enhancement bug, but here is a possible > "arc-box" drawing function to go in the AbstractDrawer.py file, based on the > existing draw_box function. > > ... There was an issue with different frames of reference in the initial code I was suggesting. > Alternately, the code could just go in CircularDrawer.py directly. This seemed simpler in the short term. > As far as I can tell from looking at their source code, even ReportLab_1_21_2 > has ArcPath defined in reportlab.graphics.shapes so there shouldn't be any > issue here with backwards compatibility. I've just checked in a patch based on this - see Bio/Graphics/GenomeDiagram/CircularDrawer.py revision 1.8 I've also updated the unit test to draw a circular diagram with some features in white (with an automatic black border). This now looks nice - with the old code using multiple boxes to fake the arced box, the whole feature ended up looking black. See Tests/test_GenomeDiagram.py revision 1.13 As a bonus, PDF output seems a little smaller now as well :) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is. 
From bugzilla-daemon at portal.open-bio.org Mon Dec 22 16:19:51 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 22 Dec 2008 11:19:51 -0500 Subject: [Biopython-dev] [Bug 2375] Coalescent support through Simcoal2 In-Reply-To: Message-ID: <200812221619.mBMGJp6k013225@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2375 ------- Comment #25 from biopython-bugzilla at maubp.freeserve.co.uk 2008-12-22 11:19 EST ------- (In reply to comment #24) > I committed my patch to setup.py, as it seems to work fine with Python 2.3, > 2.4, and 2.5 on all platforms. Leaving this bug open, since we still need to > remove the workaround in Bio/PopGen/SimCoal/__init__.py.
Editing Bio/PopGen/SimCoal/__init__.py to do just the following seems to work fine on Linux and MacOS (I've not tested on Windows yet):

import os
builtin_tpl_dir = os.path.abspath(os.path.join(os.path.dirname(__file__), "data"))

I *think* this directory is only used in one place in Bio/PopGen/SimCoal/Template.py so it might make more sense to put this code in that function (leaving the __init__.py file essentially empty). What do you think Tiago? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Mon Dec 22 17:20:46 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 22 Dec 2008 12:20:46 -0500 Subject: [Biopython-dev] [Bug 2532] Using IUPAC alphabets in mixed case Seq objects In-Reply-To: Message-ID: <200812221720.mBMHKkwo018936@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2532 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #961 is|0 |1 obsolete| | ------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk 2008-12-22 12:20 EST ------- (From update of attachment 961) This patch is now obsolete - I've checked in a variant of this into CVS. This will allow us to proceed with Bug 2597 ( Enforce alphabet letters in Seq objects) without having to first introduce mixed case variants of the IUPAC alphabets. If/when we have mixed case IUPAC alphabets, then Bio.Sequencing.PhD could use them. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Mon Dec 22 17:33:33 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 22 Dec 2008 12:33:33 -0500 Subject: [Biopython-dev] [Bug 2532] Using IUPAC alphabets in mixed case Seq objects In-Reply-To: Message-ID: <200812221733.mBMHXXjd020146@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2532 ------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2008-12-22 12:33 EST ------- Created an attachment (id=1174) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1174&action=view) Patch for Bio/Nexus/Nexus.py (non IUPAC) alphabet handling (In reply to comment #2) > I opt for (b): an easy one-time addition to Bio.Alphabets, easy to use for > everyone (instead creating their own uppercase-lowercase variants of those > terribly complicated biopython alphabet classes), and easy to change for all > other modules if lowercase-uppercase is what they want (or need). I'm not saying we shouldn't add mixed (and even lower) case variants of the IUPAC alphabets, however, even if we had them, NEXUS still uses extra characters like "-" for gaps (easily handled via a Gapped alphabet encoder) and "?" (for a missing character). Are there any other extra characters? Under the current alphabet schema, we'd have to use a (mixed case) IUPAC alphabet, then add a Gapped AlphabetEncoder (easy) then add a new alphabet encoder for any misc letters non-IUPAC characters like "?". This could be done with the generic AlphabetEncoder, or we could add additional encoder objects for special meanings. This starts to get complicated (dealing with AlphabetEncoders is nasty). This attached patch is a variation on my "plan (a)" from comment 0. It makes Bio.Nexus create its own alphabet objects (based on the generic DNA/RNA/Protein classes) with the precise list of valid letters required for that file. Using this patch should allow us to press ahead with Bug 2597. 
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Dec 22 17:38:10 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 22 Dec 2008 12:38:10 -0500 Subject: [Biopython-dev] [Bug 2597] Enforce alphabet letters in Seq objects In-Reply-To: Message-ID: <200812221738.mBMHcA86020507@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2597 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-12-22 12:38 EST ------- Created an attachment (id=1175) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1175&action=view) Patch for Bio/Seq.py to check the alphabet letters This is a simple approach to checking the letters - probably not the fastest. I think it is important that the exception gives some clue about why the Seq object was not created - either listing the first invalid character (as in this patch) or listing all invalid characters (which could be done using sets). On the other hand, I'd like this check to be as fast as possible - perhaps even at the cost of a generic exception message like "Sequence contains letters which are not valid for the given alphabet". -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
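The trade-off described in the comment above (report the first invalid character, or all of them via sets) might look something like this. It is only a sketch under stated assumptions: the function names are hypothetical, the alphabet is represented by a plain string of letters such as "GATC", and this is not the patch attached to the bug.

```python
def first_invalid_letter(data, letters):
    """Return (index, letter) for the first letter not in letters, else None."""
    valid = set(letters)
    for index, letter in enumerate(data):
        if letter not in valid:
            return index, letter
    return None


def check_letters(data, letters):
    """Raise ValueError listing every invalid letter found in data."""
    # One set difference finds all offending characters at once.
    invalid = set(data).difference(letters)
    if invalid:
        raise ValueError(
            "Sequence contains letters which are not valid for the given "
            "alphabet: %s" % ", ".join(sorted(invalid))
        )
```

The set-based check does its work in a single pass implemented in C, which is one reason it may be a reasonable compromise between speed and a helpful message; actual timings would need measuring.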
From bugzilla-daemon at portal.open-bio.org Mon Dec 22 18:27:11 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 22 Dec 2008 13:27:11 -0500 Subject: [Biopython-dev] [Bug 2532] Using IUPAC alphabets in mixed case Seq objects In-Reply-To: Message-ID: <200812221827.mBMIRBme024497@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2532 ------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk 2008-12-22 13:27 EST ------- Created an attachment (id=1176) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1176&action=view) Adding lower and mixed case IUPAC Alphabets This needs reviewing by someone else - especially the multiple inheritance which tries to follow the existing pattern that the parent is a more general version of the child. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Dec 23 09:58:31 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 23 Dec 2008 04:58:31 -0500 Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and write_to_string() are inefficient and don't check inputs In-Reply-To: Message-ID: <200812230958.mBN9wVDK000340@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2711 ------- Comment #13 from bsouthey at gmail.com 2008-12-23 04:58 EST ------- (In reply to comment #6) > (In reply to comment #2) > > *** Bug 2710 has been marked as a duplicate of this bug. *** > > > > (In reply to comment #0) > > test_GenomeDiagram fails because the renderPM module is not part of standard > > install of reportlab, at least under Linux. > > That's odd - renderPM is in the source for ReportLab 2.2. Are you using an > up-to-date version? It seems to install well enough on our 64-bit Linux box > from the ReportLab source. 
I cannot check this as I am away from my system. As I recall, the Python code for accessing this library is provided with the standard install as there is a renderPM.py file. But that is just a wrapper to some C code found in the rl_addons directory. So renderPM is not available unless you actually build the C sources or download the binaries (only valid for Windows). According to the website http://www.reportlab.org/subversion.html " It will create subdirectories for reportlab, which is an importable python package, and rl_addons which contains the C extensions. The latter need building with the contained setup script, but can also be downloaded in pre-built form from our downloads page. They rarely change. " What did you actually install? In particular where was _renderPM built? Basically we need to document this as there appear to be different ways to install reportlab (may also be version or svn related). > > > I consider that the renderPM module should not be required so > > Graphics/GenomeDiagram/Diagram.py needs to be rewritten to avoid using the > > renderPM module when it is not available. > > renderPM is how raster graphics are drawn, so is, I'm afraid, a necessary part > of GenomeDiagram's functionality. No problem then, but you must provide a test for the presence and functionality of it in the actual code as well as the biopython tests. > > I prefer your alternative suggestion of making it a 'dynamic' import, but even > then I think that the inconvenience of preparing the diagram, only to find out > at the last possible stage that you can't draw it because you're missing the > library, is worse than getting the error message upfront. Not that this should > be a problem, since renderPM is part of the main ReportLab source, now. YMMV > though, and I'm happy for the code to conform to the Biopython house style. 
> > > The installation documentation needs to include something about needing the > > renderPM for JPG, BMP, GIF, PNG, TIFF or TIFF outputs. > > > > There must be a test for the presence of the renderPM module. > > I'm not convinced of the value of this, as renderPM is part of the current > ReportLab source installation. > My understanding is that this statement is not completely true. But I would like confirmation either way. There may also be allowance for windows installations especially non-source ones but I can not check those. Bruce -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Dec 23 10:18:58 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 23 Dec 2008 05:18:58 -0500 Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and write_to_string() are inefficient and don't check inputs In-Reply-To: Message-ID: <200812231018.mBNAIwuq002193@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2711 ------- Comment #14 from bsouthey at gmail.com 2008-12-23 05:18 EST ------- (In reply to comment #12) > (In reply to comment #9) > > (In reply to comment #3) > > > As an aside, I'd like write_to_string() to support a DPI argument like > > > write() does. > > > > The way I originally intended write_to_string() to be used - sending graphics > > to a browser - the DPI has no influence at all. DPI is only of any importance > > for printing graphics ... > > OK, so its less useful than I had expected. Rending bitmaps to strings so they > can be inserted into a database as blobs is one potential use-case. Also for a > web-service where you expect the user to save and print the naked image > (unusual, and probably software dependent on how the DPI is treated). 
> Surely it is important because a user can write to a string and then save the string to a file rather than using write() a second time. What do these options do? bg, configPIL, showBoundary > > In that case, I think that an appropriate merging of the write() and > > write_to_string() methods could be: > > > > def write(self, filename=None, output=default_output, dpi=default_dpi, > > encoding=default_encoding): > > > > encoding could then be either 'binary' (default), or 'string' - which would > > emulate write_to_string()'s function. > > > > Where handle is not None, the resulting output would be sent to the passed > > handle - which could potentially include sys.stdout. Where handle is None, > > the method could return the encoded image directly, as write_to_string() > > does, now. > > > > Other than the obvious problem with ReportLab's drawToFile requiring a > > filename, rather than a handle - does this seem like a reasonable plan to > > others? > > On the plus side, this would be backwards compatible (and we could deprecate > the draw_to_string function). > > However, I'm not so keen on this style personally - the return value is > radically different depending on the arguments (nothing, or a string of data). > > If we were designing this from scratch, I would have suggested one write > function which wrote to a handle - which would let you then write to a file or > a string (using StringIO). On the other hand, this is perhaps a little low > level. We're had similar discussions regarding Bio.SeqIO in the past. > I agree and I am not very concerned about backwards compatibility since this is a very new function to Biopython. I think that is what is almost what write_to_string() does and python functions are very big. But this is not my code so please do as you want here. 
Bruce -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Dec 23 11:12:33 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 23 Dec 2008 06:12:33 -0500 Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and write_to_string() are inefficient and don't check inputs In-Reply-To: Message-ID: <200812231112.mBNBCXkt006916@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2711 ------- Comment #15 from biopython-bugzilla at maubp.freeserve.co.uk 2008-12-23 06:12 EST ------- (In reply to comment #14) > (In reply to comment #12) > > OK, so its less useful than I had expected. Rending bitmaps to strings so > > they can be inserted into a database as blobs is one potential use-case. > > Also for a web-service where you expect the user to save and print the > > naked image (unusual, and probably software dependent on how the DPI is > > treated). > > Surely it is important because a user can write to a string and then save the > string to a file rather than using write() a second time. I was talking about write to string with a DPI not being so useful. Using write to string is VERY useful, particularly for a webserver (which is why Leighton added it, and how I have used it). Setting the DPI isn't important for using images in webpages - HTML and CSS provide lots of ways to control the displayed and printed size. Even if the browser is pointed directly at the image (and not as part of a webpage) and you then print it, the browser may ignore the DPI setting (probably browser specific). i.e. The DPI will only matter if the user saves the image and opens it in DPI aware software. 
(In reply to comment #14) > (In reply to comment #12) > > However, I'm not so keen on this style personally - the return value is > > radically different depending on the arguments (nothing, or a string of > > data). > > > > If we were designing this from scratch, I would have suggested one write > > function which wrote to a handle - which would let you then write to a > > file or a string (using StringIO). On the other hand, this is perhaps a > > little low level. We're had similar discussions regarding Bio.SeqIO in > > the past. > > I agree and I am not very concerned about backwards compatibility since this > is a very new function to Biopython. I think that is what is almost what > write_to_string() does and python functions are very big. But this is not my > code so please do as you want here. GenomeDiagram is new to Biopython, but has been available independently for many years. There will be some existing users (not just me and Leighton), and the less they have to change to switch their code from using standalone GenomeDiagram to the one within Biopython the better (the import lines have to change for example). We do need to think about backwards compatibility a bit. Getting back to your original points, (1) Two functions write() and write_to_string() This follows the reportlab API, and they do actually return different encodings. From a backwards compatibility argument they should both stay, but that doesn't stop us providing a unified method and deprecating write_to_string(). (2) Coding style of write() and write_to_string() I don't have a problem with this - it works, its clear, its easily extended if ReportLab add more back ends. It doesn't strike me as ugly. Inevitably this is largely a matter of preference. (3) The KeyError exception with invalid arguments. This is fixed in CVS, for an invalid format argument you now get a ValueError which is standard python practice. 
(4) renderPM Fixed in CVS, in that you can now use GenomeDiagram without ReportLab renderPM, and have full functionality except for bitmap output. Given we don't seem to be able to assume renderPM will be installed and working, this seems a reasonable solution. If you try and render a bitmap without renderPM, then you get a MissingExternalDependencyError exception asking you to install renderPM. We will need to look into this further for the documentation. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Dec 23 12:45:55 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 23 Dec 2008 07:45:55 -0500 Subject: [Biopython-dev] [Bug 2718] New: Bio.Graphics and output file formats (PDF, EPS, SVG, and bitmaps) Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2718 Summary: Bio.Graphics and output file formats (PDF, EPS, SVG, and bitmaps) Product: Biopython Version: Not Applicable Platform: All OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk In addition to PDF and PS/EPS (encapsulated postscript), ReportLab can also do SVG, and with its optional renderPM module can do assorted bitmaps too (e.g. PNG, JPG, TIFF, GIF, BMP). Note that renderPM may not be installed (see Bug 2710). The recently added Bio.Graphics.GenomeDiagram module supports all of these formats - see Diagram.py with write (to filename or a handle) and write_to_string methods. 
Looking at the older Bio.Graphics code, it currently only supports PDF and postscript, using a mixture of method names (which isn't very consistent):

Bio.Graphics.Distribution has a DistributionPage object with a draw method (which writes to a filename or handle).
Bio.Graphics.BasicChromosome has an Organism object with a write method (which writes to a filename or handle).
Bio.Graphics.Comparative has a ComparativeScatterPlot object with a draw_to_file method (which writes to a filename or handle).

I would like:

(1) All the Bio.Graphics "write to file/handle" functions to accept any of the supported file formats (like Bio.Graphics.GenomeDiagram), which would require renderPM at run time for the bitmap formats (see Bug 2710). They should share some code for mapping format names to ReportLab rendering module. This would be easy to do without changing the existing mix of method names.

(2) Update the docstrings for the "write to file/handle" functions to make it clear they can accept a filename OR a handle (a result of the underlying reportlab renderer's drawToFile function's behaviour - see note below).

(3) Standardise on the method naming (and perhaps deprecate the old methods). Using "write" seems to be a sensible choice based on the current names used in Bio.Graphics.

For reference/comparison, ReportLab's render modules have three related functions:

* drawToString - Returns a string, calls drawToFile internally with a StringIO handle.
* drawToFile - Takes a filename OR a handle (although their docstrings do not make this clear, this works as the Canvas object takes either). Calls the draw function internally.
* draw - Takes a canvas object

See also Bug 2711 which touched on these issues in the context of GenomeDiagram only. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
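The shared code for mapping format names to a ReportLab rendering module, as requested in this bug report, could be sketched along these lines. The helper names resolve_format and write_drawing are hypothetical, the exact list of bitmap formats is an assumption based on the formats named in the thread, and the lazy import is just one way to avoid requiring renderPM unless a bitmap is actually requested. Biopython itself reportedly raises MissingExternalDependencyError in that case; a plain ImportError is used here to keep the sketch self-contained.

```python
import importlib

# Assumed mapping from (upper case) format name to ReportLab render module.
_VECTOR_RENDERERS = {
    "PDF": "renderPDF",
    "PS": "renderPS",
    "EPS": "renderPS",
    "SVG": "renderSVG",
}
_BITMAP_FORMATS = frozenset(["JPG", "BMP", "GIF", "PNG", "TIFF"])


def resolve_format(output):
    """Return (format, render_module_name), accepting either case."""
    fmt = output.upper()
    if fmt in _VECTOR_RENDERERS:
        return fmt, _VECTOR_RENDERERS[fmt]
    if fmt in _BITMAP_FORMATS:
        return fmt, "renderPM"
    raise ValueError("Unsupported output format: %r" % output)


def write_drawing(drawing, filename_or_handle, output="PDF"):
    """Render a ReportLab Drawing to a filename or handle (sketch only)."""
    fmt, module_name = resolve_format(output)
    try:
        # Import lazily so renderPM is only needed for bitmap output.
        renderer = importlib.import_module("reportlab.graphics." + module_name)
    except ImportError:
        raise ImportError(
            "Output format %s needs ReportLab's %s module" % (fmt, module_name)
        )
    if module_name == "renderPM":
        # renderPM takes the bitmap format as an extra argument.
        renderer.drawToFile(drawing, filename_or_handle, fmt)
    else:
        renderer.drawToFile(drawing, filename_or_handle)
```

Normalising the case in one place would let the older modules keep accepting "pdf" and "eps" while GenomeDiagram keeps accepting "PDF" and "PNG".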
From bugzilla-daemon at portal.open-bio.org Tue Dec 23 12:47:26 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 23 Dec 2008 07:47:26 -0500 Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and write_to_string() are inefficient and don't check inputs In-Reply-To: Message-ID: <200812231247.mBNClPt9017108@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2711 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #16 from biopython-bugzilla at maubp.freeserve.co.uk 2008-12-23 07:47 EST ------- In comment #12, I wrote: > If we were designing this from scratch, I would have suggested one write > function which wrote to a handle - which would let you then write to a file or > a string (using StringIO). On the other hand, this is perhaps a little low > level. We're had similar discussions regarding Bio.SeqIO in the past. The reportlab docstrings are very unclear, however, their renderer's drawToFile functions take either a filename OR a handle. This works because the underlying Canvas object can be created giving either a filename or a handle. As a result, GenomeDiagram's write() method should accept either a filename or a handle. We should update the docstring to say this (perhaps even renaming the argument?). (In reply to comment #15) > (1) Two functions write() and write_to_string() > This follows the reportlab API, and they do actually return different > encodings. I wrote this based on something Leighton had said to me. Going over the reportlab code, this isn't true - reportlab's drawToString just calls drawToFile with a cStringIO or StringIO handle. They write identical data. 
(In reply to comment #15) > Getting back to your original points, > > (1) Two functions write() and write_to_string() > This follows the reportlab API, and they do actually return different > encodings. From a backwards compatibility argument they should both stay, but > that doesn't stop us providing a unified method and deprecating > write_to_string(). I've filed Bug 2718 for the general issue of method naming for the Bio.Graphics modules output functionality. > (2) Coding style of write() and write_to_string() > I don't have a problem with this - it works, its clear, its easily extended if > ReportLab add more back ends. It doesn't strike me as ugly. Inevitably this > is largely a matter of preference. Leaving this as is - the code itself may end up handled via shared function for all of Bio.Graphics via Bug 2718. > (3) The KeyError exception with invalid arguments. > This is fixed in CVS, for an invalid format argument you now get a ValueError > which is standard python practice. > > (4) renderPM > Fixed in CVS, in that you can now use GenomeDiagram without ReportLab > renderPM and have full functionality except for bitmap output. Given we > don't seem to be able to assume renderPM will be installed and working, this > seems a reasonable solution. If you try and render a bitmap without > renderPM, then you get a MissingExternalDependencyError exception asking you > to install renderPM. We will need to look into this further for the > documentation. Marking this bug as FIXED. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
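The point made in this comment, that writing to a string is just writing to an in-memory handle, can be illustrated without ReportLab at all. In this sketch write_output and write_to_string are hypothetical stand-ins, and the already-rendered image is represented by a plain bytes object.

```python
import io


def write_output(data, filename_or_handle):
    """Write bytes to an open handle, or to a new file if given a filename."""
    if hasattr(filename_or_handle, "write"):
        # Caller supplied a handle (file, socket wrapper, BytesIO, ...).
        filename_or_handle.write(data)
    else:
        with open(filename_or_handle, "wb") as handle:
            handle.write(data)


def write_to_string(data):
    """Return the output as bytes by writing to an in-memory handle."""
    handle = io.BytesIO()
    write_output(data, handle)
    return handle.getvalue()
```

With this shape, a single write method covers files, handles and in-memory output, and a separate string-returning method becomes a thin convenience wrapper.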
From bugzilla-daemon at portal.open-bio.org Tue Dec 23 12:55:11 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 23 Dec 2008 07:55:11 -0500 Subject: [Biopython-dev] [Bug 2718] Bio.Graphics and output file formats (PDF, EPS, SVG, and bitmaps) In-Reply-To: Message-ID: <200812231255.mBNCtB1L017851@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2718 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-12-23 07:55 EST -------
Example script showing the reportlab render modules producing output given a filename, handle, or via a string:

from reportlab.pdfgen.canvas import Canvas
from reportlab.lib.units import cm
from reportlab.graphics import renderPS, renderPDF, renderPM
from reportlab.graphics.shapes import Drawing, String

width = 10*cm
height = 2*cm

print "Using canvas directly (PDF only)..."
c = Canvas("hello1.pdf", pagesize=(width, height))
c.drawString(1*cm, 1*cm, "Hello World!")
c.showPage()
c.save()

#Create very simple drawing object,
drawing = Drawing(width, height)
drawing.add(String(1*cm, 1*cm, "Hello World!"))

print "Using filenames..."
renderPDF.drawToFile(drawing, "hello2.pdf")
renderPM.drawToFile(drawing, "hello2.png", "PNG")

print "Using handles..."
handle = open("hello3.pdf","w")
renderPDF.drawToFile(drawing, handle)
handle.close()
handle = open("hello3.ps","w")
renderPS.drawToFile(drawing, handle)
handle.close()
handle = open("hello3.png","w")
renderPM.drawToFile(drawing, handle, "PNG")
handle.close()

print "Using strings..."
handle = open("hello4.pdf","w")
handle.write(renderPDF.drawToString(drawing))
handle.close()
handle = open("hello4.ps","w")
handle.write(renderPS.drawToString(drawing))
handle.close()
handle = open("hello4.png","w")
handle.write(renderPM.drawToString(drawing, "PNG"))
handle.close()

print "Done"

-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Dec 23 13:14:06 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 23 Dec 2008 08:14:06 -0500 Subject: [Biopython-dev] [Bug 2718] Bio.Graphics and output file formats (PDF, EPS, SVG, and bitmaps) In-Reply-To: Message-ID: <200812231314.mBNDE64X019775@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2718 ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-12-23 08:14 EST ------- (In reply to comment #0) > (1) All the Bio.Graphics "write to file/handle" functions to accept any of the > supported file formats (like Bio.Graphics.GenomeDiagram), which would require > renderPM at run time for the bitmap formats (see Bug 2710). They should share > some code for mapping format names to ReportLab rendering module. This would > be easy to do without changing the existing mix of method names. In addition, I notice that Bio.Graphics.BasicChromosome, Bio.Graphics.Comparative and Bio.Graphics.Distribution expect lower case formats (currently just pdf and eps) while Bio.Graphics.GenomeDiagram expects upper case. We should be consistent, which for backwards compatibility would mean accepting either case. > (2) Update the docstrings for the "write to file/handle" functions to make it > clear they can accept a filename OR a handle (a result of the underlying > reportlab renderer's drawToFile function's behaviour - see note below). 
I've updated the docstrings in CVS,

Bio/Graphics/BasicChromosome.py revision 1.3
Bio/Graphics/Comparative.py revision 1.2
Bio/Graphics/Distribution.py revision 1.3
Bio/Graphics/GenomeDiagram/Diagram.py revision 1.3

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From mjldehoon at yahoo.com Wed Dec 24 10:52:48 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Wed, 24 Dec 2008 02:52:48 -0800 (PST)
Subject: [Biopython-dev] Rethinking Biopython's testing framework
In-Reply-To: <442447.52362.qm@web62407.mail.re1.yahoo.com>
Message-ID: <451304.38587.qm@web62407.mail.re1.yahoo.com>

Hi everybody,

How about the following for Biopython tests:

For Python's unittest-style test modules, Python's unittest documentation
recommends defining a function in each test module that returns the test
suite. Most Biopython tests that use the unittest framework already do this
(the function is called "testing_suite").

We could now do the following in run_tests.py:

1) import the testing module and save its output
2) try to call module.testing_suite
3) if it exists, then we're using Python's unittest framework. So we run the
   tests in the testing suite.
4) if it does not exist, then we're using the print-and-compare approach. So
   we compare the saved output from the test to the correct output.

I think that this can be set up such that it looks like nothing has changed
for the user, while the files containing the correct output are no longer
needed for the unittest-based tests.

Questions, comments, objections, anybody?

--Michiel.
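The four steps proposed for run_tests.py could be sketched roughly as below. This is only an illustration in present-day Python 3, not the actual run_tests.py: the function name `run_test_module`, its `expected_output` argument and the `importer` hook are hypothetical, and only the `testing_suite` name comes from the message above.

```python
import contextlib
import importlib
import io
import unittest


def run_test_module(name, expected_output=None, importer=importlib.import_module):
    """Run one test module and return (passed, details).

    Hypothetical helper: `importer` is only a hook so that the import step
    can be replaced when experimenting.
    """
    # Step 1: import the test module and save whatever it prints on import.
    captured = io.StringIO()
    with contextlib.redirect_stdout(captured):
        module = importer(name)

    # Step 2: look for the conventional suite factory.
    factory = getattr(module, "testing_suite", None)

    if factory is not None:
        # Step 3: unittest style, so run the suite the module returns.
        stream = io.StringIO()
        runner = unittest.TextTestRunner(stream=stream, verbosity=2)
        result = runner.run(factory())
        return result.wasSuccessful(), stream.getvalue()

    # Step 4: print-and-compare style, so check the saved output
    # against the expected output.
    output = captured.getvalue()
    return output == expected_output, output
```

With this shape a unittest-based module never needs an expected-output file, because `expected_output` is only consulted when no `testing_suite` function is found. The verbose runner output in `details` also names each sub-test, which may help with the question of reporting which sub-test failed.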
--- On Thu, 12/4/08, Michiel de Hoon wrote: > From: Michiel de Hoon > Subject: Re: [Biopython-dev] Rethinking Biopython's testing framework > To: "Brad Chapman" , "Peter" > Cc: biopython-dev at lists.open-bio.org > Date: Thursday, December 4, 2008, 7:32 AM > > Michiel de Hoon wrote: > > > If one of the sub-tests fails, Python's unit > > > testing framework will tell us so, > > > though (perhaps) not exactly which sub-test > fails. > > > However, that is easy to > > > figure out just by running the individual test > script > > > by itself. > > > > That won't always work. Consider intermittent > network > > problems, or tests using random data - in general it > > really is worthwhile having run_tests.py report a > little > > more than just which test_XXX.py module failed. > > > I wonder if Python's unit testing framework allows us > to capture exactly which sub-test fails. I'll look into > that. Ideally, it should be possible to have regular Python > unit tests and Biopython-style print-and-compare tests side > by side, and get information about failing sub-tests for > both. > > --Michiel. 
> > > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From dalloliogm at gmail.com Thu Dec 25 19:22:04 2008 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Thu, 25 Dec 2008 20:22:04 +0100 Subject: [Biopython-dev] Rethinking Biopython's testing framework In-Reply-To: <451304.38587.qm@web62407.mail.re1.yahoo.com> References: <442447.52362.qm@web62407.mail.re1.yahoo.com> <451304.38587.qm@web62407.mail.re1.yahoo.com> Message-ID: <5aa3b3570812251122s43352380ke843c167e85569b5@mail.gmail.com> On Wed, Dec 24, 2008 at 11:52 AM, Michiel de Hoon wrote: > Hi everybody, > > How about the following for Biopython tests: > > For Python's unittest-style test modules, Python's unittest documentation recommends to define a function in each test module that returns the test suite. Most Biopython tests that use the unittest framework already do this (the function is called "testing_suite". Merry Christmas! Some people suggested me the nose python framework: - http://somethingaboutorange.com/mrl/projects/nose/ It is used by many other open source projects, like sqlalchemy and elixir. I haven't tried it but I think it does more or less everything you said automatically, we could try to adopt it. > > We could now do the following in run_tests.py: > > 1) import the testing module and save its output > 2) try to call module.testing_suite > 3) if it exists, then we're using Python's unittest framework. So we run the tests in the testing suite. > 4) if it does not exist, then we're using the print-and-compare approach. So we compare the saved output from the test to the correct output. > > I think that this can be set up such that it looks like nothing has changed for the user, while the files containing the correct output are no longer needed for the unittest-based tests. > > Questions, comments, objections, anybody? > > --Michiel. 
> > > --- On Thu, 12/4/08, Michiel de Hoon wrote: > >> From: Michiel de Hoon >> Subject: Re: [Biopython-dev] Rethinking Biopython's testing framework >> To: "Brad Chapman" , "Peter" >> Cc: biopython-dev at lists.open-bio.org >> Date: Thursday, December 4, 2008, 7:32 AM >> > Michiel de Hoon wrote: >> > > If one of the sub-tests fails, Python's unit >> > > testing framework will tell us so, >> > > though (perhaps) not exactly which sub-test >> fails. >> > > However, that is easy to >> > > figure out just by running the individual test >> script >> > > by itself. >> > >> > That won't always work. Consider intermittent >> network >> > problems, or tests using random data - in general it >> > really is worthwhile having run_tests.py report a >> little >> > more than just which test_XXX.py module failed. >> > >> I wonder if Python's unit testing framework allows us >> to capture exactly which sub-test fails. I'll look into >> that. Ideally, it should be possible to have regular Python >> unit tests and Biopython-style print-and-compare tests side >> by side, and get information about failing sub-tests for >> both. >> >> --Michiel. 
>> >> >> >> _______________________________________________ >> Biopython-dev mailing list >> Biopython-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biopython-dev > > > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > -- My blog on bioinformatics (now in English): http://bioinfoblog.it From mjldehoon at yahoo.com Fri Dec 26 14:32:02 2008 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 26 Dec 2008 06:32:02 -0800 (PST) Subject: [Biopython-dev] Rethinking Biopython's testing framework In-Reply-To: <5aa3b3570812251122s43352380ke843c167e85569b5@mail.gmail.com> Message-ID: <726361.18977.qm@web62402.mail.re1.yahoo.com> --- On Thu, 12/25/08, Giovanni Marco Dall'Olio wrote: > Some people suggested me the nose python framework: > - http://somethingaboutorange.com/mrl/projects/nose/ > > It is used by many other open source projects, like > sqlalchemy and elixir. > I haven't tried it but I think it does more or less > everything you > said automatically, we could try to adopt it. If we use nose, does that mean adding another dependency to Biopython? If so, I don't think it's worth it. If not, how does this work? --Michiel. 
From dalloliogm at gmail.com Fri Dec 26 17:52:58 2008 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Fri, 26 Dec 2008 18:52:58 +0100 Subject: [Biopython-dev] Rethinking Biopython's testing framework In-Reply-To: <726361.18977.qm@web62402.mail.re1.yahoo.com> References: <5aa3b3570812251122s43352380ke843c167e85569b5@mail.gmail.com> <726361.18977.qm@web62402.mail.re1.yahoo.com> Message-ID: <5aa3b3570812260952s5cc5fcc9k71f3e8c3a988e63c@mail.gmail.com> On Fri, Dec 26, 2008 at 3:32 PM, Michiel de Hoon wrote: > --- On Thu, 12/25/08, Giovanni Marco Dall'Olio wrote: >> Some people suggested me the nose python framework: >> - http://somethingaboutorange.com/mrl/projects/nose/ >> >> It is used by many other open source projects, like >> sqlalchemy and elixir. >> I haven't tried it but I think it does more or less >> everything you >> said automatically, we could try to adopt it. > > If we use nose, does that mean adding another dependency to Biopython? If so, I don't think it's worth it. If not, how does this work? nose is a testing framework, so it is a dependency only for developers. I have been able to install sqlalchemy and elixir (projects that make use of nose) without having to install this framework first. The docs on nose's website can explain its usage better than me. Basically, you have to install nose (easy_install nose) and then run it as a shell command (nosetests). It automatically reads all the files in the current directory and subdirectories, collects all the methods/classes/etc whose name begins or ends with 'test_' (_test), plus any unittest, and execute them. It can also read doctests, it is possible to write plugins and apply an high degree of customization. I tried to run it over the latest biopython cvs, and it already highlighted some problems (a few modules still using Martel, etc). 
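As a point of comparison for the name-based collection described above, here is a small sketch using only the standard library's unittest loader, which already collects methods by their "test" prefix within a TestCase class. It is not nose and does not show nose's API; the class `SeqLikeTests` and its methods are made up for illustration, and a plain string stands in for a Seq object.

```python
import io
import unittest


class SeqLikeTests(unittest.TestCase):
    """Hypothetical test case; a plain str stands in for a sequence object."""

    def helper(self):
        # Not collected: the name does not start with "test".
        return "sadasda"

    def test_upper(self):
        self.assertEqual(self.helper().upper(), "SADASDA")

    def test_length(self):
        self.assertEqual(len(self.helper()), 7)


loader = unittest.TestLoader()

# Names the loader would collect from the class, based on the "test" prefix.
collected = loader.getTestCaseNames(SeqLikeTests)

# Build a suite from those names and run it, keeping the report in memory.
suite = loader.loadTestsFromTestCase(SeqLikeTests)
stream = io.StringIO()
result = unittest.TextTestRunner(stream=stream).run(suite)
```

Here `collected` holds only the two `test_` methods and `result.testsRun` counts the same two. As described in the message, nose applies this kind of name matching more widely, to functions, classes and files.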
I forgot to say that this project is also hosted on google/code: - http://code.google.com/p/python-nose/ You can find more information in the docs: - http://code.google.com/p/python-nose/wiki/FindingAndRunningTests p.p.s. Even if it was a dependency, I think it is worth to use it anyway, rather than rewriting existing code. > --Michiel. > > > > -- My blog on bioinformatics (now in English): http://bioinfoblog.it From mjldehoon at yahoo.com Fri Dec 26 21:40:57 2008 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 26 Dec 2008 13:40:57 -0800 (PST) Subject: [Biopython-dev] Rethinking Biopython's testing framework In-Reply-To: <5aa3b3570812260952s5cc5fcc9k71f3e8c3a988e63c@mail.gmail.com> Message-ID: <590227.1906.qm@web62402.mail.re1.yahoo.com> --- On Fri, 12/26/08, Giovanni Marco Dall'Olio wrote: > > If we use nose, does that mean adding another > dependency to Biopython? If so, I don't think it's > worth it. If not, how does this work? > > nose is a testing framework, so it is a dependency only for > developers. If we use nose, can our users still run the Biopython tests (without having to install nose first)? --Michiel. From dalloliogm at gmail.com Sat Dec 27 08:48:09 2008 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Sat, 27 Dec 2008 09:48:09 +0100 Subject: [Biopython-dev] Rethinking Biopython's testing framework In-Reply-To: <590227.1906.qm@web62402.mail.re1.yahoo.com> References: <5aa3b3570812260952s5cc5fcc9k71f3e8c3a988e63c@mail.gmail.com> <590227.1906.qm@web62402.mail.re1.yahoo.com> Message-ID: <5aa3b3570812270048x20c10c52h25c8e30a29697a45@mail.gmail.com> On Fri, Dec 26, 2008 at 10:40 PM, Michiel de Hoon wrote: > --- On Fri, 12/26/08, Giovanni Marco Dall'Olio wrote: >> > If we use nose, does that mean adding another >> dependency to Biopython? If so, I don't think it's >> worth it. If not, how does this work? >> >> nose is a testing framework, so it is a dependency only for >> developers. 
>
> If we use nose, can our users still run the Biopython tests (without having to install nose first)?

Yes, but they will have to do it manually, or with a wrapper script (as it is
now).

Basically, we will have to move every test into functions/classes with names
beginning with 'test_'. To be more precise, they should match the regular
expression '(?:^|[b_.-])[Tt]est' (it is also possible to customize this
regex).

So, if a test currently looks like this:

from Bio.Seq import Seq

if __name__ == '__main__':
    seq = Seq('sadasda')
    assert seq.tostring() == 'sadasda'

we will have to refactor it like this:

from Bio.Seq import Seq

def _test():
    """test description"""
    seq = Seq('sadasda')
    assert seq.tostring() == 'sadasda'

if __name__ == '__main__':
    _test()    # this is optional

> --Michiel.
>
>
>
>

--
My blog on bioinformatics (now in English): http://bioinfoblog.it

From mjldehoon at yahoo.com Sun Dec 28 16:04:14 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sun, 28 Dec 2008 08:04:14 -0800 (PST)
Subject: [Biopython-dev] Rethinking Biopython's testing framework
In-Reply-To: <5aa3b3570812270048x20c10c52h25c8e30a29697a45@mail.gmail.com>
Message-ID: <877679.6134.qm@web62406.mail.re1.yahoo.com>

--- On Sat, 12/27/08, Giovanni Marco Dall'Olio wrote:
> >> > If we use nose, does that mean adding another
> >> > dependency to Biopython? If so, I don't think
> >> > it's worth it. If not, how does this work?
> >>
> >> nose is a testing framework, so it is a dependency
> >> only for developers.
> >
> > If we use nose, can our users still run the Biopython
> tests (without having to install nose first)?
>
> Yes, but they will have to do it manually, or with a
> wrapper script (as it is now).

By manually, do you mean running each test separately by hand?
If we use a wrapper script, then what is the difference between using nose
and using Python's unittest framework?

--Michiel.
From biopython at maubp.freeserve.co.uk Sun Dec 28 16:51:58 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sun, 28 Dec 2008 16:51:58 +0000 Subject: [Biopython-dev] Rethinking Biopython's testing framework In-Reply-To: <451304.38587.qm@web62407.mail.re1.yahoo.com> References: <442447.52362.qm@web62407.mail.re1.yahoo.com> <451304.38587.qm@web62407.mail.re1.yahoo.com> Message-ID: <320fb6e00812280851y32450bb9le505ae257726f497@mail.gmail.com> On Wed, Dec 24, 2008 at 10:52 AM, Michiel de Hoon wrote: > > Hi everybody, > > How about the following for Biopython tests: > > For Python's unittest-style test modules, Python's unittest documentation > recommends to define a function in each test module that returns the > test suite. Most Biopython tests that use the unittest framework already > do this (the function is called "testing_suite". > > We could now do the following in run_tests.py: > > 1) import the testing module and save its output > 2) try to call module.testing_suite > 3) if it exists, then we're using Python's unittest framework. > So we run the tests in the testing suite. > 4) if it does not exist, then we're using the print-and-compare > approach. So we compare the saved output from the test to the correct output. > > I think that this can be set up such that it looks like nothing has > changed for the user, while the files containing the correct > output are no longer needed for the unittest-based tests. > > Questions, comments, objections, anybody? Sounds good to me - and doesn't add any new dependencies either. 
Peter

From dalloliogm at gmail.com Sun Dec 28 21:11:59 2008
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Sun, 28 Dec 2008 22:11:59 +0100
Subject: [Biopython-dev] Rethinking Biopython's testing framework
In-Reply-To: <877679.6134.qm@web62406.mail.re1.yahoo.com>
References: <5aa3b3570812270048x20c10c52h25c8e30a29697a45@mail.gmail.com> <877679.6134.qm@web62406.mail.re1.yahoo.com>
Message-ID: <5aa3b3570812281311t466e61bp99af198e918737d8@mail.gmail.com>

On Sun, Dec 28, 2008 at 5:04 PM, Michiel de Hoon wrote:
> --- On Sat, 12/27/08, Giovanni Marco Dall'Olio wrote:
>> >> > If we use nose, does that mean adding another
>> >> > dependency to Biopython? If so, I don't think
>> >> > it's worth it. If not, how does this work?
>> >>
>> >> nose is a testing framework, so it is a dependency
>> >> only for developers.
>> >
>> > If we use nose, can our users still run the Biopython
>> tests (without having to install nose first)?
>>
>> Yes, but they will have to do it manually, or with a
>> wrapper script (as it is now).

> If we use a wrapper script, then what is the difference between using nose and using Python's unittest framework?

The wrapper script won't be as efficient as using nose.
Writing a separate wrapper script will take a lot of time and it will be very
difficult to keep up to date; moreover, you will have to test the wrapper
script itself, to prove that it works and doesn't alter the results of the
tests.

Nose is not a replacement for unittest: it is a tool that searches for every
unittest and script that looks like a test, and executes it.
It has a few more advantages, for example it enables global methods for setUp
and tearDown, but it is not necessary to use them.

If you want to reorganize Biopython's testing infrastructure, then you should
think about adopting a serious testing environment, whether it is nose or
something else.
You can't go on relying on wrapper scripts: they are too difficult to
maintain and they are not really scientifically valid.

The pygr project (another bioinformatics library in Python) makes use of
nose, and they explain how in their documentation:
- http://bioinformatics.ucla.edu/pygr_0_7_b3/testing-doc.html

Please have a look at the pages I have posted before.

> By manually, do you mean running each test separately by hand?

I mean they will have to be run in the same way as they are now.
Maybe there is a way to use nose itself to create a wrapper script
automatically. In fact, what nose does is find all the functions that look
like tests, and then execute them. It should be possible to just save the
statements that are executed in a log file, which can be used as a wrapper
script. If this option doesn't exist yet, we can just propose it to nose's
developers.

In brief, I think it doesn't make sense to write a new testing framework just
for Biopython, when there are many existing tools available and free to use.

> --Michiel.
>
>
>
>

--
My blog on bioinformatics (now in English): http://bioinfoblog.it

From biopython at maubp.freeserve.co.uk Mon Dec 29 00:18:22 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 29 Dec 2008 00:18:22 +0000
Subject: [Biopython-dev] Rethinking Biopython's testing framework
In-Reply-To: <5aa3b3570812281311t466e61bp99af198e918737d8@mail.gmail.com>
References: <5aa3b3570812270048x20c10c52h25c8e30a29697a45@mail.gmail.com> <877679.6134.qm@web62406.mail.re1.yahoo.com> <5aa3b3570812281311t466e61bp99af198e918737d8@mail.gmail.com>
Message-ID: <320fb6e00812281618r7ae4899g5aa1f1634bd1b217@mail.gmail.com>

Giovanni wrote:
>> nose is a testing framework, so it is a dependency
>> only for developers.

Requiring another external dependency does count against using nose -
it is much nicer if anyone installing Biopython from source can run
our test suite without having to install anything further.
Giovanni wrote: > If you want to reorganize the biopython's testing infrastructure, then > you should think about adopting a serious testing environment, whether > it is nose or something else. You can't continue on relying on wrapper > scripts, they are too difficult to mantain and they are not really > scientifically valid. I'm not sure I understand your point here (especially re difficult to maintain and not scientifically valid). I'm failry happy with the current test framework - I would rather see any effort be spent on writing more tests under the current framework than switching the framework itself. Giovanni wrote: > In brief, I think it doesn't make sense to write a new testingg > framework just for biopython, when there are many already existing > tool available and free to use. We haven't been talking about writing a new test frame work (which I agree isn't a good idea). Rather we're talking about a modification to the existing Biopython test framework (part of which uses the built in python unittest library). Michiel's proposal on 24th Dec seems like it will simplify working with unittest based tests (especially not having to track their trivial output in CVS/SVN). Peter From dalloliogm at gmail.com Mon Dec 29 09:53:51 2008 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Mon, 29 Dec 2008 10:53:51 +0100 Subject: [Biopython-dev] Rethinking Biopython's testing framework In-Reply-To: <320fb6e00812281618r7ae4899g5aa1f1634bd1b217@mail.gmail.com> References: <5aa3b3570812270048x20c10c52h25c8e30a29697a45@mail.gmail.com> <877679.6134.qm@web62406.mail.re1.yahoo.com> <5aa3b3570812281311t466e61bp99af198e918737d8@mail.gmail.com> <320fb6e00812281618r7ae4899g5aa1f1634bd1b217@mail.gmail.com> Message-ID: <5aa3b3570812290153k43e24a63nc0f27c90891adf7d@mail.gmail.com> On Mon, Dec 29, 2008 at 1:18 AM, Peter wrote: > Giovanni wrote: >>> nose is a testing framework, so it is a dependency >>> only for developers. 
> > Requiring another external dependency does count against using nose - > it is much nicer if anyone installing Biopython from source can run > our test suite without having to install anything further. As I was saying before, it will be not a dependency. It's an external tool that you can use or not to execute the tests automatically. Also, it is not a replacement for unittest. It is comparable to using epydoc for the documentation. > Giovanni wrote: >> If you want to reorganize the biopython's testing infrastructure, then >> you should think about adopting a serious testing environment, whether >> it is nose or something else. You can't continue on relying on wrapper >> scripts, they are too difficult to mantain and they are not really >> scientifically valid. > > I'm not sure I understand your point here (especially re difficult to > maintain and not scientifically valid). > The wrapper script itself is a program. Therefore, if you want to be paranoid, you will have to test it too :) It will be difficult to mantain because everytime you will have to modify it to adapt to the new tests etc. Many big opensource python project make use of this framework, and it has already been proven to work correctly; so the quality of biopython would be comparable with those existing projects. Another projecty that make use of nose is pytables (hdf5 format wrapper for python). They say they have some billions of tests :). > I'm failry happy with the current test framework - I would rather see > any effort be spent on writing more tests under the current framework > than switching the framework itself. > > Giovanni wrote: >> In brief, I think it doesn't make sense to write a new testingg >> framework just for biopython, when there are many already existing >> tool available and free to use. > > We haven't been talking about writing a new test frame work (which I > agree isn't a good idea). 
Rather we're talking about a modification > to the existing Biopython test framework (part of which uses the built > in python unittest library). Michiel's proposal on 24th Dec seems > like it will simplify working with unittest based tests (especially > not having to track their trivial output in CVS/SVN). Then you will have to develop a way to execute only some of the tests (e.g. only those who doesn't make use of internet connection, or only those who make use of a database). You will need to write some methods for running some setUp and tearDown methods globally. You will have to verify your wrapper script works. In short, you will end up with writing a tool which will be really similar to nose. So, since this tool already exists now, you will save a lot of time by using it. Michel's proposal is good, but I am saying that there are already tools that do the same thing automatically. > > Peter > -- My blog on bioinformatics (now in English): http://bioinfoblog.it From biopython at maubp.freeserve.co.uk Mon Dec 29 18:21:33 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 29 Dec 2008 18:21:33 +0000 Subject: [Biopython-dev] Rethinking Biopython's testing framework In-Reply-To: <5aa3b3570812290153k43e24a63nc0f27c90891adf7d@mail.gmail.com> References: <5aa3b3570812270048x20c10c52h25c8e30a29697a45@mail.gmail.com> <877679.6134.qm@web62406.mail.re1.yahoo.com> <5aa3b3570812281311t466e61bp99af198e918737d8@mail.gmail.com> <320fb6e00812281618r7ae4899g5aa1f1634bd1b217@mail.gmail.com> <5aa3b3570812290153k43e24a63nc0f27c90891adf7d@mail.gmail.com> Message-ID: <320fb6e00812291021n297af797scaf7fd6ba1a7b048@mail.gmail.com> >> We haven't been talking about writing a new test frame work (which I >> agree isn't a good idea). Rather we're talking about a modification >> to the existing Biopython test framework (part of which uses the built >> in python unittest library). 
Michiel's proposal on 24th Dec seems >> like it will simplify working with unittest based tests (especially >> not having to track their trivial output in CVS/SVN). > > Then you will have to develop a way to execute only some of the tests > (e.g. only those who doesn't make use of internet connection, or only > those who make use of a database). ... We already have that in place and working for our current framework. > ... Michel's proposal is good, but I am saying that there are already > tools that do the same thing automatically. Well, let's go with Michiel's plan in the short term (a modification to the current Biopython test framework, see his email of 24th December). We will then have a clear divide into two styles of unit test: (1) Those where the output is captured and compared to the expected output (which will also be in CVS). These are easy to write as essentially any example Biopython script can be used. (2) Those using the python unittest framework. I think these are more complicated and require a bit more effort and thought to write (and debug), but make it very clear what exactly is being tested. Peter From mjldehoon at yahoo.com Tue Dec 30 10:06:08 2008 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Tue, 30 Dec 2008 02:06:08 -0800 (PST) Subject: [Biopython-dev] Rethinking Biopython's testing framework In-Reply-To: <5aa3b3570812270048x20c10c52h25c8e30a29697a45@mail.gmail.com> Message-ID: <620107.65178.qm@web62401.mail.re1.yahoo.com> --- On Sat, 12/27/08, Giovanni Marco Dall'Olio wrote: > Basically, we will have to move every test in > functions/classes with > names beginning with 'test_'. To be more precise, > they should match > the regular expression '(?:^|[b_.-])[Tt]est' (it is > also possible to > coustomize this regex). 
> > So, if a test now is it like this: > > if __name__ == '__main__': > seq = Seq('sadasda') > assert seq.tostring() == 'sadasda' > > we will have to refactor it like this: > > def _test(): > """test description""" > seq = Seq('sadasda') > assert seq.tostring() == 'sadasda' > > if __name__ == '__main__': > _test() # this is optional Probably I don't quite understand how nose works, but if we refactor the code in this way, is that sufficient to enable users to use nose if they want to? If so, it may be possible to write the test scripts in a nose-compliant way as a courtesy to nose users. The only problem I can see with this is that it will be difficult to maintain. Basically every new test will have to be written in this nose-compliant way, and users are likely to be unaware of this. --Michiel From dalloliogm at gmail.com Tue Dec 30 13:53:34 2008 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Tue, 30 Dec 2008 14:53:34 +0100 Subject: [Biopython-dev] Rethinking Biopython's testing framework In-Reply-To: <620107.65178.qm@web62401.mail.re1.yahoo.com> References: <5aa3b3570812270048x20c10c52h25c8e30a29697a45@mail.gmail.com> <620107.65178.qm@web62401.mail.re1.yahoo.com> Message-ID: <5aa3b3570812300553v74c48cd1x66c1b7280a3f3319@mail.gmail.com> On Tue, Dec 30, 2008 at 11:06 AM, Michiel de Hoon wrote: > > > > --- On Sat, 12/27/08, Giovanni Marco Dall'Olio wrote: >> Basically, we will have to move every test in >> functions/classes with >> names beginning with 'test_'. To be more precise, >> they should match >> the regular expression '(?:^|[b_.-])[Tt]est' (it is >> also possible to >> coustomize this regex). 
>> >> So, if a test now is it like this: >> >> if __name__ == '__main__': >> seq = Seq('sadasda') >> assert seq.tostring() == 'sadasda' >> >> we will have to refactor it like this: >> >> def _test(): >> """test description""" >> seq = Seq('sadasda') >> assert seq.tostring() == 'sadasda' >> >> if __name__ == '__main__': >> _test() # this is optional > > Probably I don't quite understand how nose works, but if we refactor the code in this way, is that sufficient to enable users to use nose if they want to? If so, it may be possible to write the test scripts in a nose-compliant way as a courtesy to nose users. The only problem I can see with this is that it will be difficult to maintain. Basically every new test will have to be written in this nose-compliant way, and users are likely to be unaware of this. Why do you find it difficult? You just have to rename every test to make sure that its name starts or end with 'test_'. That's all. If you want to reorganize biopython's testing framework, this is a good thing to do anyway. In particular, every test function/class/script name should match the regular expression '(?:^|[b_.-])[Tt]est' (it can be customized). Unittest modules and doctest will be recognized, too. Note that nose already works if you run it over biopython's cvs; but since I am not familiar with biopython's code, I am not sure it recognizes every test. Ehm, this example that I put won't work with the default settings :/ it expected 'test_module' or something like this (anyway, the regex can be customized). 
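To see which names the quoted regular expression would accept, a quick check with Python's `re` module might look like the sketch below. The pattern is copied exactly as it appears in this thread (the archive may have lost backslash escapes, so nose's real default could differ), and the helper `looks_like_test` and the sample names are made up for illustration.

```python
import re

# Pattern exactly as quoted in the thread. Assumption: this is close to the
# default nose uses; escapes may have been dropped by the list archive.
TEST_NAME = re.compile(r"(?:^|[b_.-])[Tt]est")


def looks_like_test(name):
    """Return True if the quoted pattern is found anywhere in `name`."""
    return TEST_NAME.search(name) is not None


# Hypothetical names, chosen to show both matches and non-matches.
names = ["test_Seq", "run_tests", "_test", "SeqIO", "contest"]
matches = {name: looks_like_test(name) for name in names}
```

Note that `_test` satisfies the pattern on its own, yet the message above reports that the `_test()` example does not work with the default settings, so the regular expression is evidently not the only rule applied. The match on `run_tests` is consistent with the later remark in the thread that the runner script's own name matches the expression.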
> --Michiel > > > > -- My blog on bioinformatics (now in English): http://bioinfoblog.it From biopython at maubp.freeserve.co.uk Tue Dec 30 17:29:06 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 30 Dec 2008 17:29:06 +0000 Subject: [Biopython-dev] Rethinking Biopython's testing framework In-Reply-To: <5aa3b3570812300553v74c48cd1x66c1b7280a3f3319@mail.gmail.com> References: <5aa3b3570812270048x20c10c52h25c8e30a29697a45@mail.gmail.com> <620107.65178.qm@web62401.mail.re1.yahoo.com> <5aa3b3570812300553v74c48cd1x66c1b7280a3f3319@mail.gmail.com> Message-ID: <320fb6e00812300929j7fa767c7xce138912ae07d480@mail.gmail.com> > You just have to rename every test to make sure that its name starts > or end with 'test_'. That's all. > If you want to reorganize biopython's testing framework, this is a > good thing to do anyway. All the individual Biopython test scripts are named test_*.py anyway, so that should be fine. Those test scripts were we have to verify the output probably won't work in nose (this is handled via our run_test.py framework), but the rest of our test scripts being unittest based might already be fine with nose. Peter From dalloliogm at gmail.com Tue Dec 30 18:34:15 2008 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Tue, 30 Dec 2008 19:34:15 +0100 Subject: [Biopython-dev] Rethinking Biopython's testing framework In-Reply-To: <320fb6e00812300929j7fa767c7xce138912ae07d480@mail.gmail.com> References: <5aa3b3570812270048x20c10c52h25c8e30a29697a45@mail.gmail.com> <620107.65178.qm@web62401.mail.re1.yahoo.com> <5aa3b3570812300553v74c48cd1x66c1b7280a3f3319@mail.gmail.com> <320fb6e00812300929j7fa767c7xce138912ae07d480@mail.gmail.com> Message-ID: <5aa3b3570812301034i5c007d92k17a8e55c61b5715@mail.gmail.com> On Tue, Dec 30, 2008 at 6:29 PM, Peter wrote: >> You just have to rename every test to make sure that its name starts >> or end with 'test_'. That's all. 
>> If you want to reorganize biopython's testing framework, this is a >> good thing to do anyway. > > All the individual Biopython test scripts are named test_*.py anyway, > so that should be fine. Those test scripts were we have to verify the > output probably won't work in nose (this is handled via our > run_test.py framework), but the rest of our test scripts being > unittest based might already be fine with nose. I think it executes also the run_test.py scripts, because its name matches that regular expression. > Peter > -- My blog on bioinformatics (now in English): http://bioinfoblog.it From dalloliogm at gmail.com Tue Dec 30 18:34:45 2008 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Tue, 30 Dec 2008 19:34:45 +0100 Subject: [Biopython-dev] Rethinking Biopython's testing framework In-Reply-To: <320fb6e00811280309w7b5f0fc6m38795c4dc61c8744@mail.gmail.com> References: <20081125144041.GC83220@sobchak.mgh.harvard.edu> <45956.75241.qm@web62406.mail.re1.yahoo.com> <320fb6e00811280309w7b5f0fc6m38795c4dc61c8744@mail.gmail.com> Message-ID: <5aa3b3570812301034r3633ebe0k937e33c731e69ccd@mail.gmail.com> On Fri, Nov 28, 2008 at 12:09 PM, Peter wrote: > Brad wrote: >> Agreed with the distinction between the unit tests and the "dump >> lots of text and compare" approach. I've written both and do think >> the unit testing/assertion model is more robust since you can go >> back and actually get some insight into what someone was thinking >> when they wrote an assertion. > > I have probably written more of the "dump lots of text and compare" > style tests. I think these have a number of advantages: > (1) Easier for beginneers to write a test, you can almost take any > example script and use that. You don't have to learn the unit test > framework. I agree with what you say, but I think that all the 'dump and compare' tests should be organized in various functions. 
This will make them easier to use and understand, and they will be compatible with the nose framework. > (2) Debugging a failing test in IDLE is much easier - using unit tests > you have all that framework between you and the local scope where the > error happens. > (3) For many broad tests, manually setting up the expected output for > an assert is extremely tedious (e.g. parsing sequences and checking > their checksums). This is an interesting discussion if you want to talk about it a bit. An advantage of unittest is the pair of setUp and tearDown methods (fixtures). With those, you are sure that all the tests are run with the right environment and that all variables are dropped before executing a new test. Also, if you want to do a lot of dump and compare tests, consider writing some big doctest scripts. It will require a bit more work to write them, but they will be easier to understand, and they will also become good tutorials for the users. This is a tutorial we wrote for a small project not related to biopython: - http://github.com/cswegger/datamatrix/tree/master/tutorial.txt As you can see, the text is both a tutorial and a test set (which makes use of a dump and compare approach) for the program. > We could discuss a modification to run_tests.py so that if there is no > expected output file output/test_XXX for test_XXX.py we just run > test_XXX.py and check its return value (I think Michiel had previously > suggested something like this). I think this should be done inside the test itself. All the tests should return only a boolean value (passed or not) and a description of the error. The tests that make use of an expected output file should open it and do the comparison themselves, not in run_tests.py. > Perhaps for more robustness, capture > the output and compare it to a predefined list of regular expressions > covering the typical outputs. 
For example, looking at > output/test_Cluster, the first line is the test name, but the rest follows > the pattern "test_... ok". I imagine only a few output styles exist. Hmm, have you changed this file in CVS recently? I can't find what you are referring to. > With such a change, half the unit tests (e.g. test_Cluster.py) > wouldn't need their output file in CVS (output/test_Cluster). > > Michiel de Hoon wrote: >> If one of the sub-tests fails, Python's unit testing framework will tell us so, >> though (perhaps) not exactly which sub-test fails. However, that is easy to >> figure out just by running the individual test script by itself. > > That won't always work. Consider intermittent network problems, or > tests using random data - in general it really is worthwhile having > run_tests.py report a little more than just which test_XXX.py module > failed. > > Peter > -- My blog on bioinformatics (now in English): http://bioinfoblog.it From biopython at maubp.freeserve.co.uk Tue Dec 30 23:33:16 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 30 Dec 2008 23:33:16 +0000 Subject: [Biopython-dev] Rethinking Biopython's testing framework In-Reply-To: <5aa3b3570812301034r3633ebe0k937e33c731e69ccd@mail.gmail.com> References: <20081125144041.GC83220@sobchak.mgh.harvard.edu> <45956.75241.qm@web62406.mail.re1.yahoo.com> <320fb6e00811280309w7b5f0fc6m38795c4dc61c8744@mail.gmail.com> <5aa3b3570812301034r3633ebe0k937e33c731e69ccd@mail.gmail.com> Message-ID: <320fb6e00812301533h55f5e9eehcec69cc1d5913420@mail.gmail.com> Brad wrote: >>> Agreed with the distinction between the unit tests and the "dump >>> lots of text and compare" approach. 
I've written both and do think >>> the unit testing/assertion model is more robust since you can go >>> back and actually get some insight into what someone was thinking >>> when they wrote an assertion. Peter wrote: >> I have probably written more of the "dump lots of text and compare" >> style tests. I think these have a number of advantages: >> (1) Easier for beginners to write a test, you can almost take any >> example script and use that. You don't have to learn the unit test >> framework. >> ... Giovanni wrote: > I agree with what you say, but I think that all the 'dump and compare' > tests should be organized in various functions. > This will make them easier to use and understand, and they will be > compatible with the nose framework. If we organise the "dump and compare" tests into various functions (e.g. using the unittest framework), and turn print statements into asserts etc, then yes they would become nose compatible. However, this is a lot of work, and for relatively little gain. Also, doing so we lose the simplicity (e.g. my points made earlier) and make it harder for newcomers to write further tests. Nevertheless, we could regard Michiel's plan of 24 Dec as a step towards this, in that it simplifies writing unittest based tests (in that they won't need an expected output file which must also be kept in CVS/SVN). I'm not sure what you meant by "This will make them easier to use and understand, ...". Switching the unit test coding style makes no difference from the end user's point of view, they run the test suite using "python setup.py test" (typically as part of installation from source, or from the tests directory using "python run_tests.py") and won't see any difference in how the tests work internally. 
In terms of understanding the unit tests: If you are a beginner wanting to look at a unit test to give a feel for how to use the code, then frankly those of our unit tests which simply do some imports and print some output are MUCH easier to understand. By their nature they are essentially example Biopython scripts. On the other hand, those of our unit tests using the unittest framework have all these extra object classes defined, and split up the setup/clean up into separate methods etc. In some senses this is "clutter" which is not helpful if you want to regard the unit test also as a usage example. >> (2) Debugging a failing test in IDLE is much easier - using unit tests >> you have all that framework between you and the local scope where the >> error happens. > >> (3) For many broad tests, manually setting up the expected output for >> an assert is extremely tedious (e.g. parsing sequences and checking >> their checksums). > > This is an interesting discussion if you want to talk about it a bit. It could be, but I don't want to get sidetracked (distracted) from pressing ahead with Michiel's plan (the email of 24th Dec, or something similar) which seems to be a worthwhile small improvement to the current status. > An advantage of unittest is the pair of setUp and tearDown methods (fixtures). > With those, you are sure that all the tests are run with the right > environment and that all variables are dropped before executing a new > test. For some tests, yes, this is useful - in particular where there are lots of independent small things you want to test. In other situations you want to test a work flow, with a series of cumulative steps each building on each other. This would end up as a single large test function/method. > Also, if you want to do a lot of dump and compare tests, consider > writing some big doctest scripts. 
> It will require a bit more work to write them, but they will be > easier to understand, and they will also become good tutorials for the > users. Certainly some of the current simple "dump and compare" tests might be converted into doctests (and we could do this within the current Biopython framework). However, the requirements for good documentation and good test coverage differ - you'd want to include tests for atypical code which you would not want to encourage as good coding practice. I'm quite keen for further usage of doctests - but I see them primarily as an improvement to our documentation. Peter wrote: >> We could discuss a modification to run_tests.py so that if there is no >> expected output file output/test_XXX for test_XXX.py we just run >> test_XXX.py and check its return value (I think Michiel had previously >> suggested something like this). Note that Michiel's email of 24th Dec is another approach to this topic - either would work, but his plan makes the division between the two test types much more explicit. Giovanni wrote: > I think this should be done inside the test itself. > All the tests should return only a boolean value (passed or not) and a > description of the error. > The tests that make use of an expected output file should open > it and do the comparison themselves, not in run_tests.py. Your plan would work, but it means the simplicity of this style of unit test is lost. Rather than doing this change (which would be a moderate amount of tedious work), I would rather go all the way and make them unittest based like the rest of our test suite. >> Perhaps for more robustness, capture >> the output and compare it to a predefined list of regular expressions >> covering the typical outputs. For example, looking at >> output/test_Cluster, the first line is the test name, but the rest follows >> the pattern "test_... ok". I imagine only a few output styles exist. >> With such a change, half the unit tests (e.g. 
test_Cluster.py) >> wouldn't need their output file in CVS (output/test_Cluster). > > Hmm, have you changed this file in CVS recently? I can't find what > you are referring to. For this example, the unit test Tests/test_Cluster.py is here: http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Tests/test_Cluster.py?cvsroot=biopython Its expected output file Tests/output/test_Cluster is here: http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Tests/output/test_Cluster?cvsroot=biopython Peter
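The run_tests.py modification discussed in this thread could look roughly like the sketch below. It is only an illustration and not code from Biopython: the function name check_output, the OK_LINE pattern and the expected_dir argument are hypothetical, and it is written for a current Python rather than the Python 2 of the time. If an expected output file exists, the captured output is compared against it line by line; otherwise the first line must be the test name and every remaining line must look like "test_... ok".

```python
import os
import re

# Hypothetical pattern for unittest-style result lines such as
# "test_kcluster ... ok" (an assumption based on the thread's description).
OK_LINE = re.compile(r"^test_\w+ \.\.\. ok$")


def check_output(name, output, expected_dir="output"):
    """Return True if the captured output of test `name` is acceptable.

    `name` is the test module name (e.g. "test_Cluster") and `output`
    is the text the test script printed.
    """
    lines = output.splitlines()
    expected_path = os.path.join(expected_dir, name)
    if os.path.isfile(expected_path):
        # "Dump and compare" style: output must equal the stored file.
        with open(expected_path) as handle:
            return lines == handle.read().splitlines()
    # No expected output file: the first line is the test name and
    # every following line must match the "test_... ok" pattern.
    return (
        bool(lines)
        and lines[0] == name
        and all(OK_LINE.match(line) is not None for line in lines[1:])
    )
```

With this split, a unittest based script such as test_Cluster.py would no longer need output/test_Cluster kept in CVS, while the print-and-compare scripts keep working unchanged.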