From biopython at maubp.freeserve.co.uk  Mon Dec  1 07:56:12 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 1 Dec 2008 12:56:12 +0000
Subject: [Biopython-dev] Deprecation and removal policy
In-Reply-To: <bbcd77d00811301837qf6e7909x18b09f423c55a800@mail.gmail.com>
References: <320fb6e00811280926v16454fa6t891fcc74e4fa4729@mail.gmail.com>
	<bbcd77d00811301837qf6e7909x18b09f423c55a800@mail.gmail.com>
Message-ID: <320fb6e00812010456r9ae1a66p66032d02377003db@mail.gmail.com>

Peter wrote:
>> ...
>> How about a new policy that after adding a deprecation warning,
>> deprecated modules/functions are kept for at least two public releases
>> AND at least 12 months (counting from the first release when they are
>> deprecated - not the date of the CVS change) before being removed?

Bruce wrote:
>
> Hi,
> Generally I would agree with the idea for code that is under active
> development. For certain code that has not really been touched for a
> few years except for trivial changes (like removing string functions),
> I think 12 months is perhaps too long if it passes two releases.

Just because some (deprecated) code hasn't been changed in several
years doesn't mean no-one is using it.  Giving less warning for
removing such old but stable code isn't fair.
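
As a rough illustration of the proposed policy (not existing Biopython
code), the sketch below shows a function that emits a deprecation
warning, plus a small check for the "two public releases AND 12 months"
rule. The names old_function, new_function and may_remove are
hypothetical, and counting releases from the first one that shipped the
warning is an assumption about how the policy would be read.

```python
import warnings
from datetime import date


def new_function():
    """Hypothetical replacement for the deprecated function."""
    return 42


def old_function():
    """Hypothetical deprecated function, kept until removal is allowed."""
    warnings.warn(
        "old_function is deprecated; use new_function instead.",
        DeprecationWarning,
        stacklevel=2,  # point the warning at the caller, not at this line
    )
    return new_function()


def may_remove(first_deprecated_release, releases_with_warning, today):
    """Return True once both conditions of the proposed policy hold.

    first_deprecated_release: date of the first public release that
        carried the deprecation warning (not the date of the CVS change).
    releases_with_warning: number of public releases that have shipped
        with the warning so far (assumed counting convention).
    """
    d = first_deprecated_release
    one_year_on = (d.year + 1, d.month, d.day)
    return (releases_with_warning >= 2
            and (today.year, today.month, today.day) >= one_year_on)
```

Comparing (year, month, day) tuples avoids building an invalid date when
the first deprecated release happens to fall on 29 February.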

> Regardless of how it is done, Python 3 will need to be supported (the
> final release is due soon) and I do not see a reason to port
> deprecated modules or functions just because of some policy.  So I
> would add the provision that deprecated code will not be ported to
> the Python 3 compatible Biopython branch.

I disagree - dropping old modules is changing the API, counter to
Guido and others' recommendation/request: "Don't change your APIs
incompatibly when porting to Py3k."
http://www.artima.com/weblogs/viewpost.jsp?thread=227041

If porting any particular deprecated module or piece of code to Python
3 proved too difficult, then maybe we might drop that code (for
example, due to third party dependencies on an obsolete version of
mxTextTools, I don't think we'll port Martel/Mindy to Python 3).

Peter

From bugzilla-daemon at portal.open-bio.org  Mon Dec  1 10:36:33 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 1 Dec 2008 10:36:33 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main
	Biopython distribution
In-Reply-To: <bug-2671-42@http.bugzilla.open-bio.org/>
Message-ID: <200812011536.mB1FaXWF003857@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671


------- Comment #11 from biopython-bugzilla at maubp.freeserve.co.uk  2008-12-01 10:36 EST -------
Unit Test
=========
The unit test included, test_GenomeDiagram.py adds yet another GenBank file to
the test suite, NC_005213.gb (Nanoarchaeum equitans, 490885 bp) which at 1.2 MB
is best avoided.  I would prefer we used existing GenBank files already
included in Biopython which would serve just as well.  e.g.

GenBank/NC_005816.gb file (Yersinia pestis biovar Microtus str. 91001 plasmid
pPCP1) which is circular.  9609 bp.

GenBank/arab1.gb (Arabidopsis thaliana BAC T25K16 from chromosome I) which is
linear.  86436 bp.

Also, the code to parse the GenBank file does so via Bio.GenBank, and I would
prefer to use Bio.SeqIO here.

I'll attach a revised version shortly...


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.

From bugzilla-daemon at portal.open-bio.org  Mon Dec  1 10:40:22 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 1 Dec 2008 10:40:22 -0500
Subject: [Biopython-dev] [Bug 2677] BioSQL seqfeature enhancements
In-Reply-To: <bug-2677-42@http.bugzilla.open-bio.org/>
Message-ID: <200812011540.mB1FeMWx004105@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2677


------- Comment #10 from biopython-bugzilla at maubp.freeserve.co.uk  2008-12-01 10:40 EST -------
Bio.Graphics.GenomeDiagram.Utilities
====================================
This is a collection of utilities for getting information useful for graph
values.  From the docstring,

    o apply_to_window (sequence, window_size, function, step=None)  Apply a
                        passed function to fragments of the passed sequence of
                        size window_size, with each window separated by the
                        passed step.

    o calc_gc_content (sequence)    Returns the %GC content of a passed
                    sequence

    o calc_at_content (sequence)    Returns the %AT content of a passed
                    sequence

    o calc_gc_skew (sequence)    Returns the GC skew of a passed sequence

    o calc_at_skew (sequence)    Returns the AT skew of a passed sequence

    o gc_content (sequence, window_size, step=None)    Returns the %GC content
                    of a passed sequence in windows of the passed size,
                    separated by the passed step size

    o at_content (sequence, window_size, step=None)    Returns the %AT content
                    of a passed sequence in windows of the passed size,
                    separated by the passed step size

    o gc_skew (sequence, window_size, step=None)    Returns the GC skew
                    of a passed sequence in windows of the passed size,
                    separated by the passed step size

    o at_skew (sequence, window_size, step=None)    Returns the AT skew
                    of a passed sequence in windows of the passed size,
                    separated by the passed step size

I can see why these were useful when GenomeDiagram was a separate package, but
I don't think we should add this file to Biopython as it is unnecessary code
duplication.  If we do lack any of this functionality, putting it somewhere
under Bio.SeqUtils makes more sense than under Bio.Graphics.

I have not looked at any implications this may have for the existing
documentation or the GenomeDiagram unit test.



From bugzilla-daemon at portal.open-bio.org  Mon Dec  1 10:47:01 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 1 Dec 2008 10:47:01 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main
	Biopython distribution
In-Reply-To: <bug-2671-42@http.bugzilla.open-bio.org/>
Message-ID: <200812011547.mB1Fl1qY004683@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671


------- Comment #12 from biopython-bugzilla at maubp.freeserve.co.uk  2008-12-01 10:47 EST -------
Bio.Graphics.GenomeDiagram.DrawAll
==================================
According to the comments, this is a script to walk a directory structure below
the directory passed, and draw images of each .gbk file found there.

While useful, I don't think this belongs in the core library.  Maybe rename it
and move it into our scripts or example directory instead...

Bio.Graphics.GenomeDiagram.Utilities
====================================
This is a collection of utilities for getting information useful for graph
values.  From the docstring,

    o apply_to_window (sequence, window_size, function, step=None)  Apply a
                        passed function to fragments of the passed sequence of
                        size window_size, with each window separated by the
                        passed step.

    o calc_gc_content (sequence)    Returns the %GC content of a passed
                    sequence

    o calc_at_content (sequence)    Returns the %AT content of a passed
                    sequence

    o calc_gc_skew (sequence)    Returns the GC skew of a passed sequence

    o calc_at_skew (sequence)    Returns the AT skew of a passed sequence

    o gc_content (sequence, window_size, step=None)    Returns the %GC content
                    of a passed sequence in windows of the passed size,
                    separated by the passed step size

    o at_content (sequence, window_size, step=None)    Returns the %AT content
                    of a passed sequence in windows of the passed size,
                    separated by the passed step size

    o gc_skew (sequence, window_size, step=None)    Returns the GC skew
                    of a passed sequence in windows of the passed size,
                    separated by the passed step size

    o at_skew (sequence, window_size, step=None)    Returns the AT skew
                    of a passed sequence in windows of the passed size,
                    separated by the passed step size

I can see why these were useful when GenomeDiagram was a separate package, but
I don't think we should add this file to Biopython as it is unnecessary code
duplication.  If we do lack any of this functionality, putting it somewhere
under Bio.SeqUtils makes more sense than under Bio.Graphics.

I have not looked at any implications this may have for the existing
documentation or the GenomeDiagram unit test.



From bugzilla-daemon at portal.open-bio.org  Mon Dec  1 10:49:14 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 1 Dec 2008 10:49:14 -0500
Subject: [Biopython-dev] [Bug 2677] BioSQL seqfeature enhancements
In-Reply-To: <bug-2677-42@http.bugzilla.open-bio.org/>
Message-ID: <200812011549.mB1FnEB8004888@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2677


------- Comment #11 from biopython-bugzilla at maubp.freeserve.co.uk  2008-12-01 10:49 EST -------
(In reply to comment #10)
> Bio.Graphics.GenomeDiagram.Utilities
> ====================================
> This is a collection of utilities for getting information useful for graph
> values.  From the docstring, ...

Sorry - ignore this comment, it should have been on Bug 2671.



From bugzilla-daemon at portal.open-bio.org  Mon Dec  1 10:51:19 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 1 Dec 2008 10:51:19 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main
	Biopython distribution
In-Reply-To: <bug-2671-42@http.bugzilla.open-bio.org/>
Message-ID: <200812011551.mB1FpJNU005019@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671


------- Comment #13 from lpritc at scri.sari.ac.uk  2008-12-01 10:51 EST -------
(In reply to comment #11)
> Unit Test
> =========
> The unit test included, test_GenomeDiagram.py adds yet another GenBank file to
> the test suite, NC_005213.gb (Nanoarchaeum equitans, 490885 bp) which at 1.2 MB
> is best avoided.  I would prefer we used existing GenBank files already
> included in Biopython which would serve just as well.

That's a good idea.

> Also, the code to parse the GenBank file does so via Bio.GenBank, and I would
> prefer to use Bio.SeqIO here.

I noticed that in revising the documentation, but hadn't got around to doing
anything about it, except in the example code.



From bugzilla-daemon at portal.open-bio.org  Mon Dec  1 10:59:35 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 1 Dec 2008 10:59:35 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main
	Biopython distribution
In-Reply-To: <bug-2671-42@http.bugzilla.open-bio.org/>
Message-ID: <200812011559.mB1FxZwH005670@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671


------- Comment #14 from lpritc at scri.sari.ac.uk  2008-12-01 10:59 EST -------
(In reply to comment #12)
> Bio.Graphics.GenomeDiagram.DrawAll
> ==================================
> According to the comments, this is a script to walk a directory structure below
> the directory passed, and draw images of each .gbk file found there.
> 
> While useful, I don't think this belongs in the core library.  Maybe rename it
> and move it into our scripts or example directory instead...

Ah.  I thought I'd left that one out.  I was picturing perhaps having a
Utilities.py module containing a function with that behaviour, and/or functions
that drew a standard representation of a GenBank file, so that those who are
not interested in the minutiae of the API/drawing their diagrams could still
get a fair amount of function for little effort.

On reflection, these functions are perhaps better suited to living in
__init__.py.  What do you think?

> Bio.Graphics.GenomeDiagram.Utilities
> ====================================
> This is a collection of utilities for getting information useful for graph
> values,

> I can see why these were useful when GenomeDiagram was a separate package, but
> I don't think we should add this file to Biopython as it is unnecessary code
> duplication.  If we do lack any of this functionality, putting it somewhere
> under Bio.SeqUtils makes more sense than under Bio.Graphics.

Where there is repetition of function here, I'm happy to go with established
Biopython code in preference.  For graph data, GenomeDiagram expects a list of
(position, value) tuples, which the functions in Utilities.py supply directly. 
There will be a level of user-processing required in moving to the Biopython
versions.  Perhaps the inclusion of similar functions in __init__ that wrap the
Biopython versions to produce the appropriate format for graphs would be useful
here?
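
The wrapping idea above could look roughly like the sketch below: a
window helper that applies any per-sequence function and returns the
list of (position, value) tuples that GenomeDiagram graphs expect. This
is only an illustration. gc_percent is a plain-Python stand-in for
whichever Bio.SeqUtils function would be wrapped, and reporting the
window midpoint as the position and defaulting step to window_size are
assumptions, not the behaviour of the original Utilities.py.

```python
def gc_percent(sequence):
    """Stand-in for a Biopython function: %GC of a plain string."""
    sequence = str(sequence).upper()
    if not sequence:
        return 0.0
    return 100.0 * (sequence.count("G") + sequence.count("C")) / len(sequence)


def apply_to_window(sequence, window_size, function, step=None):
    """Apply function to each window; return (position, value) tuples.

    The position is the window midpoint (assumed convention), and the
    step defaults to window_size, i.e. non-overlapping windows (also an
    assumption).
    """
    if step is None:
        step = window_size
    results = []
    for start in range(0, len(sequence) - window_size + 1, step):
        fragment = sequence[start:start + window_size]
        results.append((start + window_size // 2, function(fragment)))
    return results


# Example graph data for a short sequence, using overlapping windows.
graph_data = apply_to_window("GGCCAATT", 4, gc_percent, step=2)
```

Keeping the windowing separate from the per-window calculation means any
existing Biopython function could be passed in without duplicating it.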

> I have not looked at any implications this may have for the existing
> documentation or the GenomeDiagram unit test.

Removing Utilities.py outright will affect both the documentation and the unit
test.  Both require those functions (or something similar) to generate
test/example graph data.

I would be happy to replace the existing functions with wrapped Biopython
functions in __init__ - does this seem like a sensible option?



From bugzilla-daemon at portal.open-bio.org  Mon Dec  1 11:59:50 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 1 Dec 2008 11:59:50 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main
	Biopython distribution
In-Reply-To: <bug-2671-42@http.bugzilla.open-bio.org/>
Message-ID: <200812011659.mB1GxoGa009013@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1063 is|0                           |1
           obsolete|                            |
Attachment #1121 is|0                           |1
           obsolete|                            |


------- Comment #15 from biopython-bugzilla at maubp.freeserve.co.uk  2008-12-01 11:59 EST -------
Created an attachment (id=1132)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1132&action=view)
Zip of python files to go under Bio/Graphics/GenomeDiagram

This attachment is just the main python files, omitting DrawAll.py and
Utilities.py (see comment 12 and comment 14).  The unit test needs updating to
match (but then passes, updated version to follow).

(In reply to comment #0)
> Code for wx widgets has been removed, although the Observer/Observable code
> remains, allowing user widgets to hook into the code, if that's desirable.

There was a tiny bit of wx stuff still there in Diagram.py which I have removed
in this version.

After discussion with Leighton directly, due to possible uncertainty over the
licensing of the Observer/Observable code (originally based on an example by
Peter Norvig) this has been removed, together with the associated "set" methods
in Diagram.py etc.  This code was intended to assist using GenomeDiagram within
a GUI.

Note that if we later want to reintroduce this functionality, using python's
property feature (with get/set functions) would allow the set function to
update the observer.  Leighton's old code would only update the observer if the
set method was used explicitly (and not if the object property were updated
directly).
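
A minimal sketch of that property-based approach is below. It is not the
removed GenomeDiagram code: the class name ObservedTrack, the attach
method and the callback signature are all hypothetical, chosen only to
show that assigning to the attribute directly still notifies observers.

```python
class ObservedTrack(object):
    """Hypothetical object whose 'name' notifies observers when set."""

    def __init__(self, name):
        self._name = name
        self._observers = []

    def attach(self, callback):
        """Register a callable taking (attribute_name, new_value)."""
        self._observers.append(callback)

    def _get_name(self):
        return self._name

    def _set_name(self, value):
        self._name = value
        # Every assignment goes through here, so observers are always told.
        for callback in self._observers:
            callback("name", value)

    name = property(_get_name, _set_name)


events = []
track = ObservedTrack("Track 1")
track.attach(lambda attribute, value: events.append((attribute, value)))
track.name = "Genes"  # direct assignment, no explicit set method needed
```

Because the setter is the only route to the stored value, a GUI observer
cannot be bypassed by code that updates the attribute directly.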

(In reply to comment #6)
> I am perfectly happy with re-licensing the GD code under the Biopython
> license. If you need a gpg-signed document to say so, I can provide one ;)

I've updated the header of each file to reflect the Biopython license.



From bugzilla-daemon at portal.open-bio.org  Mon Dec  1 12:20:57 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 1 Dec 2008 12:20:57 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main
	Biopython distribution
In-Reply-To: <bug-2671-42@http.bugzilla.open-bio.org/>
Message-ID: <200812011720.mB1HKvIJ010157@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671


------- Comment #16 from biopython-bugzilla at maubp.freeserve.co.uk  2008-12-01 12:20 EST -------
Created an attachment (id=1133)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1133&action=view)
Revised test_GenomeDiagram.py

This uses the existing GenBank/arab1.gb file for input.

It also includes a (slightly modified) copy of the GenomeDiagram.Utilities
functions as a short term solution to the issues raised in comment 12 and
comment 14.



From bugzilla-daemon at portal.open-bio.org  Mon Dec  1 15:01:44 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 1 Dec 2008 15:01:44 -0500
Subject: [Biopython-dev] [Bug 2693] New: LogisticRegression convergence
	criterion is too lenient
Message-ID: <bug-2693-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2693

           Summary: LogisticRegression convergence criterion is too lenient
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P3
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: bsouthey at gmail.com


In R and SAS, the example in the code and tutorial provides the following
parameters:

Intercept =  18.9622
x1        =  -0.0714
x2        =   0.0444

By default, Bio/LogisticRegression.py defines the following parameters
    MAX_ITERATIONS = 500
    CONVERGE_THRESHOLD = 0.01

The convergence threshold is too lenient so the iterations terminate before the
expected values are obtained. Using more stringent criteria (CONVERGE_THRESHOLD
= 0.000000001) permits convergence to the R/SAS values provided MAX_ITERATIONS
is greater than 7761 with my system.

MAX_ITERATIONS and CONVERGE_THRESHOLD are fixed within the
Bio/LogisticRegression.py module but should be part of the API for the train
function, such as:
def train(xs, ys, update_fn=None, typecode=None,
          CONVERGE_THRESHOLD=0.000000001, MAX_ITERATIONS=10000):

Note the algorithm used requires a large number of iterations and the train
function does not display the degree of convergence attained when
MAX_ITERATIONS is exceeded.
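
To illustrate the effect of exposing these two settings, here is a
self-contained sketch. It is not the Bio.LogisticRegression code: plain
gradient ascent is swapped in for the module's own optimiser, and the
keyword names, the step size, the toy data and the returned
(beta, iterations, converged) tuple are all assumptions made for the
example.

```python
import math


def train(xs, ys, update_fn=None, converge_threshold=1e-9,
          max_iterations=10000, step=0.1):
    """Fit logistic regression by gradient ascent (illustrative only).

    Returns (beta, iterations_used, converged) so the caller can see
    whether the threshold was reached or the iteration cap was hit.
    """
    n_params = len(xs[0]) + 1            # intercept plus one per feature
    beta = [0.0] * n_params
    old_llh = None
    for iteration in range(1, max_iterations + 1):
        grad = [0.0] * n_params
        llh = 0.0
        for x, y in zip(xs, ys):
            row = [1.0] + list(x)
            z = sum(b * v for b, v in zip(beta, row))
            p = 1.0 / (1.0 + math.exp(-z))
            llh += y * math.log(p) + (1 - y) * math.log(1.0 - p)
            for j, v in enumerate(row):
                grad[j] += (y - p) * v
        if update_fn is not None:
            update_fn(iteration, old_llh, llh)
        if old_llh is not None and abs(llh - old_llh) < converge_threshold:
            return beta, iteration, True
        old_llh = llh
        beta = [b + step * g / len(xs) for b, g in zip(beta, grad)]
    return beta, max_iterations, False


# Toy, non-separable data (made up for this example).
xs = [[0.0], [1.0], [2.0], [3.0]]
ys = [0, 1, 0, 1]
loose = train(xs, ys, converge_threshold=0.01)
strict = train(xs, ys, converge_threshold=1e-9)
```

Returning the converged flag alongside the coefficients is one way to
report that the iteration cap was hit rather than stopping silently.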

Jeffrey Whitaker provides Python code using an alternative algorithm: 
http://www.cdc.noaa.gov/people/jeffrey.s.whitaker/python/logistic_regression.py

Furthermore, the update_fn should also be passed the previous likelihood or
the difference in likelihood so the actual convergence can be seen. Really the
update_fn should be more general than this and be able to display more
information, but the attached patch provides the previous llh (old_llik).
def show_progress(iteration, old_llh, loglikelihood):
    print("Iteration: %s Old: %s Log-likelihood function: %s Diff: %s"
          % (iteration, old_llh, loglikelihood, old_llh - loglikelihood))

model = LogisticRegression.train(xs, ys, update_fn=show_progress)



From bugzilla-daemon at portal.open-bio.org  Mon Dec  1 15:03:27 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 1 Dec 2008 15:03:27 -0500
Subject: [Biopython-dev] [Bug 2693] LogisticRegression convergence criterion
	is too lenient
In-Reply-To: <bug-2693-42@http.bugzilla.open-bio.org/>
Message-ID: <200812012003.mB1K3Rqg017974@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2693


------- Comment #1 from bsouthey at gmail.com  2008-12-01 15:03 EST -------
Created an attachment (id=1134)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1134&action=view)
Improvements to LogisticRegression.py

Addresses certain problems with LogisticRegression.py and enhances the module.



From bartek at rezolwenta.eu.org  Mon Dec  1 15:53:59 2008
From: bartek at rezolwenta.eu.org (Bartek Wilczynski)
Date: Mon, 1 Dec 2008 21:53:59 +0100
Subject: [Biopython-dev] [BioPython]  Refactoring motif analysis code
In-Reply-To: <492ACE38.1090301@gmail.com>
References: <8b34ec180811240651k45c11563p9e3dd18ba128f0ac@mail.gmail.com>
	<492ACE38.1090301@gmail.com>
Message-ID: <8b34ec180812011253p28a08a0bv43cd72369062b39b@mail.gmail.com>

Hi all,

I've done some work regarding the motif analysis in Biopython. I've
done the following:
- Refactored Bio.AlignAce and Bio.MEME to use one common motif object
- Put all of the refactored code in the Bio.Motif directory
- Added more code (from my attic) to do motif comparisons and
  compute thresholds (this was actually written by my colleague
  Norbert Dojer, but I adapted it and I have his permission to
  contribute the code)
- Written a short tutorial on the usage of Bio.Motif (that's where I'd put it)
- Written a basic test suite for the new motif code

I haven't added it to cvs yet, but posted it as an attachment to the
enhancement proposal in bugzilla:
http://bugzilla.open-bio.org/show_bug.cgi?id=2694

I have cvs access, so I can commit the changes myself, but I'd like to
wait for an "OK" from someone more involved in the release process.

Since Giovanni and Bruce have responded to my previous call for comments,
I'll  try to answer them below:

On Mon, Nov 24, 2008 at 4:54 PM, Bruce Southey <bsouthey at gmail.com> wrote:

>
> Actually I am not that thrilled with the licenses for these packages and
> similar packages because these are free only for academic use. To me this
> clashes with the spirit of an open-sourced project especially a BSD-licensed
> one. But if there is a need for such modules then these modules should be
> included.
>

I have similar feelings about the "academic-use-only" licenses. On the
other hand, since most of the biopython users are in academia, I don't
see it as a big problem. Also, since I don't have any truly open and
free replacement for these programs, I think it's better to keep them.
In fact the new Bio.Motif package provides some methods for motif
comparisons, which at least to some extent can be used as a replacement
for the respective functions of CompareACE and MAST.

As a side note, I think that there is no point in providing parsers for
every single motif finder that comes out, and I don't think that
AlignAce and MEME are the best or the most representative ones. It just
happened that these parsers were written "to scratch someone's itch". I
think that the other functionality (motif searching, comparisons,
weblogo) might be more useful to people.

> While it is only free for academic use, have you seen TAMO?
> *TAMO: a flexible, object-oriented framework for analyzing transcriptional
> regulation using DNA-sequence motifs. *
> Bioinformatics. 2005 Jul 15;21(14):3164-5.
> <http://bioinformatics.oxfordjournals.org/cgi/content/abstract/21/14/3164>
>
> http://fraenkel.mit.edu/TAMO/

Yes, I've seen it and I've even recommended it on the biopython mailing
list when there was no replacement in biopython. However, their library
is free only for academia and AFAIK it's not using biopython data
structures, so it needs some work to integrate TAMO if you are using
Biopython. Bio.Motif is meant to provide free software for motif
analysis.

> Well, I am not sure how many used Bio.AlignAce given the Parser.py bug :-)
> Based on the CVS, both have been untouched for about three years.
>
Well, I've not used it myself for a while... I'm no longer doing de-novo
motif discovery. However, it still works so it's potentially useful. I
think the low usage is largely due to the lack of documentation for the
Bio.AlignAce and Bio.MEME tools (partially my fault). Hopefully people
will start using this if they read the tutorial.

> Also, what species are these used for?
> One of the papers of AlignAce indicate that the base composition was set for
> yeast.
>
They're both general purpose, you can set the gc content for alignAce
and even an HMM for MEME.

>
> Personally I would be interested in a general protein motif finding module
> because of my current research. However, I do have a different view with
> respect to the Biopython community as indicated above with the licenses.

Both MEME and AlignAce can be used to find motifs in proteins, but that
has little to do with Bio.Motif, since it does not provide any
motif-finding capabilities by itself. In general Bio.Motif should be
able to deal with protein motifs, but I've never tested it (I'm mostly
using it for DNA motifs), so I'll be happy to help if you find bugs.

On Mon, Nov 24, 2008 at 4:25 PM, Giovanni Marco Dall'Olio
<dalloliogm at gmail.com> wrote:
>
> I would just like to tell you that I have tried the TAMO framework you
> suggested me, and found it very useful.

Yes, I remember, but the problem is with the TAMO license. I think that
the Motif object might still be useful since it is free and lets you
read motifs from databases like JASPAR, then scan sequences with them
and/or compare them with "your" motifs.


> I am not using it anymore because I don't need it, but I remember that I liked:
> - the methods to represent motifs as matrices of frequencies/occurrences etc..

Done.

> - the fact that it was easy to create a motif from an alignment of sequences

Depending on your definition of easy, it's there.

> - the integration it had with this website:
> http://weblogo.berkeley.edu/logo.cgi.

Done.

> I would suggest you to provide integration with this other web
> service, which enable to plot the difference between two sequence
> logos: http://www.twosamplelogo.org/examples.html.

This I haven't done yet, but I'll try to provide functionality for
that (shouldn't take too long).
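
For readers unfamiliar with the "matrix of frequencies/occurrences"
representation discussed above, here is a small plain-Python sketch. It
does not use or reproduce the Bio.Motif API. The function names
count_matrix and consensus are made up for the example, and it assumes
ungapped, equal-length DNA instances.

```python
def count_matrix(instances, alphabet="ACGT"):
    """Count how often each letter occurs at each aligned position."""
    length = len(instances[0])
    if any(len(instance) != length for instance in instances):
        raise ValueError("all motif instances must have the same length")
    counts = dict((letter, [0] * length) for letter in alphabet)
    for instance in instances:
        for position, letter in enumerate(instance.upper()):
            counts[letter][position] += 1  # KeyError if letter not in alphabet
    return counts


def consensus(counts):
    """Most frequent letter per column (ties go to the first letter A-Z)."""
    letters = sorted(counts)
    length = len(counts[letters[0]])
    return "".join(
        max(letters, key=lambda letter: counts[letter][position])
        for position in range(length)
    )


instances = ["ACGT", "ACGA", "ATGT"]
counts = count_matrix(instances)
```

Dividing each column by the number of instances would turn the counts
into the frequency form of the same matrix.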

-- 
Bartek Wilczynski
==================
Postdoctoral fellow
EMBL, Furlong group
Meyerhoffstrasse 1,
69012 Heidelberg,
Germany
tel: +49 6221 387 8433

From dalloliogm at gmail.com  Mon Dec  1 16:07:08 2008
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Mon, 1 Dec 2008 22:07:08 +0100
Subject: [Biopython-dev] [BioPython] Refactoring motif analysis code
In-Reply-To: <8b34ec180812011253p28a08a0bv43cd72369062b39b@mail.gmail.com>
References: <8b34ec180811240651k45c11563p9e3dd18ba128f0ac@mail.gmail.com>
	<492ACE38.1090301@gmail.com>
	<8b34ec180812011253p28a08a0bv43cd72369062b39b@mail.gmail.com>
Message-ID: <5aa3b3570812011307q710cab78q2fbae061f5dd5eff@mail.gmail.com>

On Mon, Dec 1, 2008 at 9:53 PM, Bartek Wilczynski
<bartek at rezolwenta.eu.org> wrote:

> On Mon, Nov 24, 2008 at 4:25 PM, Giovanni Marco Dall'Olio
> <dalloliogm at gmail.com> wrote:
>>
>> I would just like to tell you that I have tried the TAMO framework you
>> suggested me, and found it very useful.
>
> Yes, I remember, but the problem is with the TAMO license. I think
> that the Motif object might be still
> useful since it is free, allows to read motifs from databases like
> JASPAR to scan sequences  and/or
> compare them with "your" motifs.

Thanks for all these changes.
I remember that I wrote an email to TAMO's authors when I was using it.
They seemed to be interested in integrating the code with biopython,
so maybe the license issue could be overcome.
It's up to you whether you want to reimplement all the functions they
have or not.


-- 

My blog on bioinformatics (now in English): http://bioinfoblog.it

From bartek at rezolwenta.eu.org  Tue Dec  2 04:39:37 2008
From: bartek at rezolwenta.eu.org (Bartek Wilczynski)
Date: Tue, 2 Dec 2008 10:39:37 +0100
Subject: [Biopython-dev]  Refactoring motif analysis code
In-Reply-To: <8b34ec180812020118t1c5bc551t4b1e241427755517@mail.gmail.com>
References: <8b34ec180811240651k45c11563p9e3dd18ba128f0ac@mail.gmail.com>
	<492ACE38.1090301@gmail.com>
	<8b34ec180812011253p28a08a0bv43cd72369062b39b@mail.gmail.com>
	<5aa3b3570812011307q710cab78q2fbae061f5dd5eff@mail.gmail.com>
	<8b34ec180812020118t1c5bc551t4b1e241427755517@mail.gmail.com>
Message-ID: <8b34ec180812020139y18feadf6s5d2ce23ec95b79d1@mail.gmail.com>

On Mon, Dec 1, 2008 at 10:07 PM, Giovanni Marco Dall'Olio
<dalloliogm at gmail.com> wrote:

> Thanks for all these changes.
> I remember that I wrote an email to TAMO's authors when I was using it.
> They seemed to be interested in integrating the code with biopython,
> so maybe the license issue could be overcome.
> It's up to you whether you want to reimplement all the functions they
> have or not.

I have to say I haven't done anything yet towards integrating TAMO
with biopython. So far, my own code was doing the job for me, and since
there was a certain learning curve to get into TAMO, I didn't look
closely into it. I've looked at it more carefully now and I have two
general thoughts:
- There are a number of features in TAMO for which there is no
  counterpart in Bio.Motif. Just by looking at module names I've found:
  - an MDscan parser
  - their own EM-based motif finding scheme
  - several motif comparison functions from MotifCompare
  - a lot of nice little methods for motifs like textLogo, giflogo, etc.
- There is quite an overlap between biopython and TAMO. They
  implemented their own sequence handling, FASTA parser, clustering
  module etc. There will be some gruntwork in integrating their code
  into Biopython (finding and reconciling the overlaps).

I also have to say that I'm a bit scared by the copyright statements in
the TAMO code, saying it belongs to the Whitehead Institute. I don't
want to be overly pessimistic, but the process of releasing this code
under the biopython license might be slow.

What I think is the best way to go is to clean up the current mess with
Bio.AlignAce and Bio.MEME, and then ask people for contributions.
If the TAMO developers are willing to contribute, I'll be happy to
help with integration into biopython. It will take some time anyway,
so I wouldn't delay the inclusion of Bio.Motif into Biopython.

cheers
Bartek


-- 
Bartek Wilczynski
==================
Postdoctoral fellow
EMBL, Furlong group
Meyerhoffstrasse 1,
69012 Heidelberg,
Germany
tel: +49 6221 387 8433

From timothyham at gmail.com  Tue Dec  2 19:19:48 2008
From: timothyham at gmail.com (Timothy Ham)
Date: Tue, 2 Dec 2008 16:19:48 -0800
Subject: [Biopython-dev] Parsing malformed genbank files (e.g. VectorNTI)
Message-ID: <632cdbf70812021619i7e652a05nd801dd408ba9aad4@mail.gmail.com>

Hi everyone,

The current biopython GenBank parser dies while parsing VectorNTI
generated files.  For example, until recently, BioPython did not
accept an empty SOURCE field. It still does not handle an empty
VERSION or ACCESSION fields (consumer.data.id never gets filled),
which is the default for user generated vector maps via VectorNTI.

Now, it is easy enough to change the GenBank parser to handle
malformed genbank files, (I can submit patches) but the real question
becomes:
> Should BioPython handle malformed genbank files at all?
I would like to be practical and say yes, since VectorNTI is a very
common, widely used format, but I wanted to ask the community before
submitting my patches.

Thanks for the great work,
Tim

From bsouthey at gmail.com  Tue Dec  2 21:33:26 2008
From: bsouthey at gmail.com (Bruce Southey)
Date: Tue, 2 Dec 2008 20:33:26 -0600
Subject: [Biopython-dev] Refactoring motif analysis code
In-Reply-To: <8b34ec180812020139y18feadf6s5d2ce23ec95b79d1@mail.gmail.com>
References: <8b34ec180811240651k45c11563p9e3dd18ba128f0ac@mail.gmail.com>
	<492ACE38.1090301@gmail.com>
	<8b34ec180812011253p28a08a0bv43cd72369062b39b@mail.gmail.com>
	<5aa3b3570812011307q710cab78q2fbae061f5dd5eff@mail.gmail.com>
	<8b34ec180812020118t1c5bc551t4b1e241427755517@mail.gmail.com>
	<8b34ec180812020139y18feadf6s5d2ce23ec95b79d1@mail.gmail.com>
Message-ID: <bbcd77d00812021833w4ed8cb46m939faab31ffd780b@mail.gmail.com>

On Tue, Dec 2, 2008 at 3:39 AM, Bartek Wilczynski
<bartek at rezolwenta.eu.org> wrote:
> On Mon, Dec 1, 2008 at 10:07 PM, Giovanni Marco Dall'Olio
> <dalloliogm at gmail.com> wrote:
>
>> Thanks for all these changes.
>> I remember that I wrote a mail to TAMO's authors when I was using it.
>> They seemed to be interested in integrating the code with biopython,
>> so maybe the license issue could be overcome.
>> It's up to you, whether you want to reimplement all the functions they
>> have or not.
>
> I have to say I haven't done anything yet towards integrating TAMO
> with biopython.
> So far, my own code was doing the job for me, and since there was a
> certain learning curve to get into TAMO,
> I didn't look closely into it. I've looked more carefully now at it
> and I have two general thoughts:
> - There are a number of features in TAMO for which there is no
> counterpart in Bio.Motif. Just by looking at module names I've found:
>  - MDscan parser
>  - their own EM motif finding scheme (some kind of EM method)
>  - several motif comparison functions from MotifCompare
>  - a lot of nice little methods for motifs like textLogo, giflogo, etc.
> - There is quite an overlap between biopython and TAMO. They
> implemented their own Sequence handling, FASTA Parser, clustering
> module etc.  There will be some gruntwork with integrating their code
> into Biopython (finding and reconciling the overlaps)
>
> I also have to say that I'm a bit scared by the copyright statements in
> the TAMO code, saying it belongs to the Whitehead institute. I don't
> want to be overly pessimistic, but the process of releasing this code
> under biopython license might be slow.
>
> What I think is the best way to go is to clean up the current mess with
> Bio.Alignace and Bio.MEME, and then ask people for contributions.
> If TAMO developers would be willing to contribute I'll be happy to
> help with integration into biopython. It will take some time anyway,
> so I wouldn't delay the inclusion of Bio.Motif into Biopython.
>
> cheers
> Bartek

I would agree that you should ignore TAMO and just focus on developing
a suitable framework to integrate Alignace and MEME as you have
indicated. I would presume that the other motif finding applications
will also fit into that framework.

Unless the TAMO code is under a BSD-style or equivalent license that
is compatible with Biopython, you must stop looking at it. I know it is
hard to avoid as it comes up on Google with a simple search. If the
TAMO code gets suitably licensed, then fine, but until then it can
cause major problems that can involve the whole Biopython project
(even including GPLed code can do this).

Bruce

From biopython at maubp.freeserve.co.uk  Wed Dec  3 16:10:49 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 3 Dec 2008 21:10:49 +0000
Subject: [Biopython-dev] Fwd: [Utilities-announce] PubMed Entrez Utility
	2009 DTD changes
In-Reply-To: <7B6F170840CA6C4DA63EE0C8A7BB43EC03A0001F@NIHCESMLBX15.nih.gov>
References: <7B6F170840CA6C4DA63EE0C8A7BB43EC03A0001F@NIHCESMLBX15.nih.gov>
Message-ID: <320fb6e00812031310s43124c68n988838af3837638d@mail.gmail.com>

This email from the NCBI will be of interest for Bio.Entrez - we may
need to add a few DTD files to Bio.Entrez in preparation for this...
see also Bug 2678.

Peter

---------- Forwarded message ----------
From:  <utilities-announce at ncbi.nlm.nih.gov>
Date: Wed, Dec 3, 2008 at 8:57 PM
Subject: [Utilities-announce] PubMed Entrez Utility 2009 DTD changes
To: utilities-announce at ncbi.nlm.nih.gov


PubMed Entrez Utility Users,

We anticipate switching to the updated PubMed 2009 DTDs on December 15,
2008.

2009 DTDs are available from the Entrez DTD page:
http://eutils.ncbi.nlm.nih.gov/entrez/query/DTD/index.html

The DTD changes for the 2009 production year, as noted in the Revision
Notes section near the top of each DTD, are:

NLMMedline DTD (used for MEDLINE/PubMed)
a. Changed entity reference from "nlmmedlinecitation_080101.dtd" to:
"nlmmedlinecitation_090101.dtd"
b. CHANGE WITHDRAWN FOR V.2: Deleted entity NlmDcmsID.Ref and NlmDcmsID
element [Edited 10/16/08]
c. FOR V.3: Added GrantCountry.Ref entity [Edited 10/30/08]

NLMMedlineCitation DTD (used for MEDLINE/PubMed data)
a. Changed entity reference from "nlmsharedcatcit_080101.dtd" to:
"nlmsharedcatcit_090101.dtd"
b. Moved entity Type to nlmcommon dtd
c. Added NLM value to entity Source
d. CHANGE WITHDRAWN FOR V.2: Deleted entity NlmDcmsID.Ref [Edited
10/16/08]

NLMSharedCatCit DTD (used for MEDLINE/PubMed, CatfilePlus, and Serfile)
a.  Changed entity reference from "nlmcommon_080101.dtd"
to "nlmcommon_090101.dtd"
b.  Moved OtherAbstract element from nlmsharedcatcit dtd to nlmcommon
dtd

NLMCommon DTD (used for MEDLINE/PubMed, CatfilePlus, and Serfile)
a. Added ValidYN attribute to Investigator element
b. Moved OtherAbstract element from nlmsharedcatcit to nlmcommon dtd
c. Added OtherAbstract element to NCBIArticle element
d. Moved entity Type from nlmmedlinecitation to nlmcommon dtd
e. Added Publisher value to entity Type
f. Deleted Consumer value from entity Type
g. Added Country element to Grant element
h. FOR V.2: Changed Country value to GrantCountry.Ref in Grant Element
[Edited 10/30/08]

NLMCatalogRecord DTD (used for CatfilePlus and Serfile in XML format):
a.  Changed entity reference from "nlmsharedcatcit_080101.dtd"
to: "nlmsharedcatcit_090101.dtd"
b.  Added PrecedingInPart, SupersedesInPart, SucceedingInPart,
SupersededInPartBy values to entity TitleType


_______________________________________________
Utilities-announce mailing list
http://www.ncbi.nlm.nih.gov/mailman/listinfo/utilities-announce

From biopython at maubp.freeserve.co.uk  Thu Dec  4 05:26:39 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 4 Dec 2008 10:26:39 +0000
Subject: [Biopython-dev] Parsing malformed genbank files (e.g. VectorNTI)
In-Reply-To: <632cdbf70812021619i7e652a05nd801dd408ba9aad4@mail.gmail.com>
References: <632cdbf70812021619i7e652a05nd801dd408ba9aad4@mail.gmail.com>
Message-ID: <320fb6e00812040226g117fe534g4523e8b58f7f28@mail.gmail.com>

On Wed, Dec 3, 2008 at 12:19 AM, Timothy Ham <timothyham at gmail.com> wrote:
>
> Hi everyone,
>
> The current biopython GenBank parser dies while parsing VectorNTI
> generated files.  For example, until recently, BioPython did not
> accept an empty SOURCE field. It still does not handle an empty
> VERSION or ACCESSION fields (consumer.data.id never gets filled),
> which is the default for user generated vector maps via VectorNTI.

I fixed the SOURCE issue in Bio/GenBank/__init__.py CVS revision 1.97
after Tim contacted me offlist - there was no bug report.

> Now, it is easy enough to change the GenBank parser to handle
> malformed genbank files, (I can submit patches) but the real question
> becomes:
>> Should BioPython handle malformed genbank files at all?
> I would like to be practical and say yes, since VectorNTI is a very
> common, widely used format, but I wanted to ask the community before
> submitting my patches.
>
> Thanks for the great work,
> Tim

As I'm the de facto maintainer for Bio.GenBank, I guess that unless the
list as a whole has a consensus, this is my call.

Reading the GenBank file format spec, the ACCESSION and VERSION lines
are clearly intended to be mandatory.  Note that for mandatory fields,
IIRC, the NCBI will use a single dot/period as a place holder when
there is no data.  So I would argue that VectorNTI is producing
invalid files, and you should write to the authors and encourage them
to follow the spec more closely (even if we do change Biopython to
cope).

However, I'm willing to bend a little on out of spec GenBank files (in
cases like this where there is no ambiguity about the parsing), but I
would want a real example output file from VectorNTI to include for a
unit test.  This is important as we need to use something sensible for
the SeqRecord's id property if the ACCESSION and VERSION are missing.
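
To make the question of a sensible fallback concrete, here is a minimal
sketch in plain Python. It is not the Bio.GenBank implementation: the
function name, the default string, and the order of preference (VERSION,
then ACCESSION, then the LOCUS name) are all assumptions for illustration.
A lone "." is treated as empty, following the placeholder convention
mentioned above.

```python
def choose_record_id(version, accession, locus_name, default="<unknown id>"):
    """Pick an identifier, preferring VERSION, then ACCESSION, then LOCUS.

    Hypothetical helper: the preference order is an assumption, not
    something the GenBank specification or Biopython prescribes.
    """
    for candidate in (version, accession, locus_name):
        # Treat None, empty strings and the "." placeholder as missing.
        value = (candidate or "").strip()
        if value and value != ".":
            return value
    return default


# A record with empty VERSION and a placeholder ACCESSION falls back to
# the LOCUS name.
fallback_id = choose_record_id("", ".", "pUC19")
```

Whatever order is chosen, having a real VectorNTI file in the unit tests
would pin the behaviour down.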

Peter

From mjldehoon at yahoo.com  Thu Dec  4 07:32:18 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Thu, 4 Dec 2008 04:32:18 -0800 (PST)
Subject: [Biopython-dev] Rethinking Biopython's testing framework
In-Reply-To: <320fb6e00811280309w7b5f0fc6m38795c4dc61c8744@mail.gmail.com>
Message-ID: <442447.52362.qm@web62407.mail.re1.yahoo.com>

> Michiel de Hoon wrote:
> > If one of the sub-tests fails, Python's unit
> > testing framework will tell us so,
> > though (perhaps) not exactly which sub-test fails.
> > However, that is easy to
> > figure out just by running the individual test script
> > by itself.
> 
> That won't always work.  Consider intermittent network
> problems, or tests using random data - in general it 
> really is worthwhile having run_tests.py report a little
> more than just which test_XXX.py module failed.
>
I wonder if Python's unit testing framework allows us to capture exactly which sub-test fails. I'll look into that. Ideally, it should be possible to have regular Python unit tests and Biopython-style print-and-compare tests side by side, and get information about failing sub-tests for both.
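
As a rough sketch of what seems possible (an illustration only, not
Biopython's run_tests.py): a unittest.TestResult object records every
failing test method individually in its `failures` and `errors` lists, so
a runner could report the sub-test name rather than just the module. The
test case and helper names below are hypothetical.

```python
import unittest


def make_example_case():
    """Build a hypothetical TestCase with one passing and one failing test.

    The class is created inside a function so that importing this sketch
    does not expose a deliberately failing test to any test collector.
    """
    class ExampleTests(unittest.TestCase):
        def test_passes(self):
            self.assertEqual(1 + 1, 2)

        def test_fails(self):
            self.assertEqual(1 + 1, 3)

    return ExampleTests


def failing_subtests(test_case_class):
    """Run one TestCase class and return the ids of its failing sub-tests."""
    suite = unittest.TestLoader().loadTestsFromTestCase(test_case_class)
    result = unittest.TestResult()
    suite.run(result)
    # result.failures and result.errors hold (test, traceback_text) pairs.
    return [test.id() for test, _ in result.failures + result.errors]


failed = failing_subtests(make_example_case())
```

Each entry in `failed` is a dotted name ending in the test method, which
is the extra detail a top-level runner would need to print.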

--Michiel.


From bsouthey at gmail.com  Thu Dec  4 10:02:13 2008
From: bsouthey at gmail.com (Bruce Southey)
Date: Thu, 04 Dec 2008 09:02:13 -0600
Subject: [Biopython-dev] Parsing malformed genbank files (e.g. VectorNTI)
In-Reply-To: <320fb6e00812040226g117fe534g4523e8b58f7f28@mail.gmail.com>
References: <632cdbf70812021619i7e652a05nd801dd408ba9aad4@mail.gmail.com>
	<320fb6e00812040226g117fe534g4523e8b58f7f28@mail.gmail.com>
Message-ID: <4937F0F5.6070905@gmail.com>

Peter wrote:
> On Wed, Dec 3, 2008 at 12:19 AM, Timothy Ham <timothyham at gmail.com> wrote:
>   
>> Hi everyone,
>>
>> The current biopython GenBank parser dies while parsing VectorNTI
>> generated files.  For example, until recently, BioPython did not
>> accept an empty SOURCE field. It still does not handle an empty
>> VERSION or ACCESSION fields (consumer.data.id never gets filled),
>> which is the default for user generated vector maps via VectorNTI.
>>     
>
> I fixed the SOURCE issue in Bio/GenBank/__init__.py CVS revision 1.97
> after Tim contacted me offlist - there was no bug report.
>
>   
>> Now, it is easy enough to change the GenBank parser to handle
>> malformed genbank files, (I can submit patches) but the real question
>> becomes:
>>     
>>> Should BioPython handle malformed genbank files at all?
>>>       
>> I would like to be practical and say yes, since VectorNTI is a very
>> common, widely used format, but I wanted to ask the community before
>> submitting my patches.
>>
>> Thanks for the great work,
>> Tim
>>     
>
> As I'm the de facto maintainer for Bio.GenBank, I guess unless the list
> as a whole has a consensus this is my call.
>
> Reading the GenBank file format spec, the ACCESSION and VERSION lines
> are clearly intended to be mandatory.  Note that for mandatory fields,
> IIRC, the NCBI will use a single dot/period as a place holder when
> there is no data.  So I would argue that VectorNTI is producing
> invalid files, and you should write to the authors and encourage them
> to follow the spec more closely (even if we do change Biopython to
> cope).
>
> However, I'm willing to bend a little on out of spec GenBank files (in
> cases like this where there is no ambiguity about the parsing), but I
> would want a real example output file from VectorNTI to include for a
> unit test.  This is important as we need to use something sensible for
> the SeqRecord's id property if the ACCESSION and VERSION are missing.
>
> Peter
At http://www.ncbi.nlm.nih.gov/Genbank/index.html there is a link to the 
'complete release notes for the current version of GenBank'.
 From ftp://ftp.ncbi.nih.gov/genbank/gbrel.txt, it clearly states that 
ACCESSION and VERSION are mandatory and I interpret the '/' to mean 
'with'. The relevant section is:

3.4.2  Entry Organization
"
  The second part of each sequence entry record contains the information
appropriate to its keyword, in positions 13 to 80 for keywords and
positions 11 to 80 for the sequence.

  The following is a brief description of each entry field. Detailed
information about each field may be found in Sections 3.4.4 to 3.4.15.

LOCUS	- A short mnemonic name for the entry, chosen to suggest the
sequence's definition. Mandatory keyword/exactly one record.

DEFINITION	- A concise description of the sequence. Mandatory
keyword/one or more records.

ACCESSION	- The primary accession number is a unique, unchanging
identifier assigned to each GenBank sequence record. (Please use this
identifier when citing information from GenBank.) Mandatory keyword/one
or more records.

VERSION		- A compound identifier consisting of the primary
accession number and a numeric version number associated with the
current version of the sequence data in the record. This is followed
by an integer key (a "GI") assigned to the sequence by NCBI.
Mandatory keyword/exactly one record.
"

If these entries are missing then Biopython must raise an exception 
because the GenBank file is invalid.

While I have not seen an example, does a VectorNTI output contain a 
LOCUS field that could be used as an accession number?
I think it is fairly common for the accession number to be part of the 
LOCUS field.

Bruce


From biopython at maubp.freeserve.co.uk  Thu Dec  4 10:16:20 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 4 Dec 2008 15:16:20 +0000
Subject: [Biopython-dev] Parsing malformed genbank files (e.g. VectorNTI)
In-Reply-To: <4937F0F5.6070905@gmail.com>
References: <632cdbf70812021619i7e652a05nd801dd408ba9aad4@mail.gmail.com>
	<320fb6e00812040226g117fe534g4523e8b58f7f28@mail.gmail.com>
	<4937F0F5.6070905@gmail.com>
Message-ID: <320fb6e00812040716h1fb4bfbflf5a37456102722cc@mail.gmail.com>

On Thu, Dec 4, 2008 at 3:02 PM, Bruce Southey <bsouthey at gmail.com> wrote:
> Peter wrote:
>> Reading the GenBank file format spec, the ACCESSION and VERSION lines
>> are clearly intended to be mandatory.  Note that for mandatory fields,
>> IIRC, the NCBI will use a single dot/period as a place holder when
>> there is no data.  So I would argue that VectorNTI is producing
>> invalid files, and you should write to the authors and encourage them
>> to follow the spec more closely (even if we do change Biopython to
>> cope).

Bruce wrote:
> At http://www.ncbi.nlm.nih.gov/Genbank/index.html there is a link to the
> 'complete release notes for the current version of GenBank'.
> From ftp://ftp.ncbi.nih.gov/genbank/gbrel.txt, it clearly states that
> ACCESSION and VERSION are mandatory  ...

We agree on this: according to the current NCBI standard, a GenBank
file missing the ACCESSION or VERSION line is technically invalid.

Bruce:
> If these entries are missing then Biopython must raise an exception because
> the GenBank file is invalid.

I see a difference between a GenBank parser, and a GenBank validator.
While it would be nice to just say "your file is invalid", in many
cases the meaning of the file isn't ambiguous and can still be safely
parsed.  From past experience, even the NCBI sometimes provides invalid
files which break their own rules (e.g. Biopython Bug 2591).  In my
personal opinion, a strict parser which rejects any invalid GenBank
file isn't actually that useful - there is a grey area where a little
leniency is very helpful:

Peter wrote:
>> However, I'm willing to bend a little on out of spec GenBank files (in
>> cases like this where there is no ambiguity about the parsing), but I
>> would want a real example output file from VectorNTI to include for a
>> unit test.  This is important as we need to use something sensible for
>> the SeqRecord's id property if the ACCESSION and VERSION are missing.
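
The parser versus validator distinction could be sketched as a single
check with a strictness switch. This is an illustration under stated
assumptions, not Bio.GenBank code: the function name, the `strict` flag
and the dict-of-fields input are hypothetical, and the keyword list is
limited to the four keywords marked mandatory in the excerpt Bruce quoted.

```python
import warnings

# Keywords marked "Mandatory" in the quoted release notes excerpt.
MANDATORY_KEYWORDS = ("LOCUS", "DEFINITION", "ACCESSION", "VERSION")


def missing_keywords(fields, strict=False):
    """Return the mandatory keywords absent from ``fields``.

    ``fields`` is a hypothetical dict mapping keyword to its text. In
    strict mode this acts as a validator and raises ValueError; otherwise
    it warns once per missing keyword and lets parsing continue.
    """
    missing = [key for key in MANDATORY_KEYWORDS
               if not fields.get(key, "").strip()]
    if missing and strict:
        raise ValueError("Missing mandatory keyword(s): " + ", ".join(missing))
    for key in missing:
        warnings.warn("GenBank record has no %s line" % key)
    return missing
```

A lenient caller would use the default and carry on with whatever fields
are present; a validation tool would pass `strict=True`.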

Peter

From biopython at maubp.freeserve.co.uk  Thu Dec  4 17:15:26 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 4 Dec 2008 22:15:26 +0000
Subject: [Biopython-dev] Parsing malformed genbank files (e.g. VectorNTI)
In-Reply-To: <632cdbf70812041352oec43f5fh13bd35a1416d0fd2@mail.gmail.com>
References: <632cdbf70812021619i7e652a05nd801dd408ba9aad4@mail.gmail.com>
	<320fb6e00812040226g117fe534g4523e8b58f7f28@mail.gmail.com>
	<632cdbf70812041352oec43f5fh13bd35a1416d0fd2@mail.gmail.com>
Message-ID: <320fb6e00812041415t3fb22630xae4d34205e0562a3@mail.gmail.com>

Tim wrote:
> I have attached two representative example genbank outputs from
> VectorNTI. I don't know if the mailing list accepts attachments, but
> if it can't, is there a place where I can put it (maybe the biopython
> wiki?)

I got them, thanks.  For future reference, it would have been better
to have filed a bug on bugzilla, and then (once the bug is filed) you
can attach files to it.

Earlier Tim wrote:
>>> The current biopython GenBank parser dies while parsing VectorNTI
>>> generated files.  For example, until recently, BioPython did not
>>> accept an empty SOURCE field. It still does not handle an empty
>>> VERSION or ACCESSION fields (consumer.data.id never gets filled),
>>> which is the default for user generated vector maps via VectorNTI.

Now that I've got your two files, my copy of Biopython seems to read
them just fine.  What exactly do you mean by the "parser dies"?  Could
you show us a snippet of code and, if relevant, the exception error,
plus details of your OS, version of Python and Biopython etc?

Thanks

Peter

From timothyham at gmail.com  Thu Dec  4 21:09:21 2008
From: timothyham at gmail.com (Timothy Ham)
Date: Thu, 4 Dec 2008 18:09:21 -0800
Subject: [Biopython-dev] Parsing malformed genbank files (e.g. VectorNTI)
In-Reply-To: <320fb6e00812041415t3fb22630xae4d34205e0562a3@mail.gmail.com>
References: <632cdbf70812021619i7e652a05nd801dd408ba9aad4@mail.gmail.com>
	<320fb6e00812040226g117fe534g4523e8b58f7f28@mail.gmail.com>
	<632cdbf70812041352oec43f5fh13bd35a1416d0fd2@mail.gmail.com>
	<320fb6e00812041415t3fb22630xae4d34205e0562a3@mail.gmail.com>
Message-ID: <632cdbf70812041809v1d4ed344q3cc03db3e310b2ab@mail.gmail.com>

On Thu, Dec 4, 2008 at 2:15 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> Now that I've got your two files, my copy of Biopython seems to read
> them just fine.  What exactly do you mean by the "parser dies"?  Could
> you show us a snippet of code and if relevant the exception error -
> plus details of your OS, version of Python and Biopython etc?
>
> Thanks
>
> Peter
>

Ah, my bad. I was running it against an old version. It looks like it
was fixed as of
/biopython/Bio/GenBank/__init__.py version 1.87 (biopython release 1.48).
The current version does the right thing.

Thanks much,
Tim

From biopython at maubp.freeserve.co.uk  Fri Dec  5 05:19:12 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 5 Dec 2008 10:19:12 +0000
Subject: [Biopython-dev] Parsing malformed genbank files (e.g. VectorNTI)
In-Reply-To: <632cdbf70812041809v1d4ed344q3cc03db3e310b2ab@mail.gmail.com>
References: <632cdbf70812021619i7e652a05nd801dd408ba9aad4@mail.gmail.com>
	<320fb6e00812040226g117fe534g4523e8b58f7f28@mail.gmail.com>
	<632cdbf70812041352oec43f5fh13bd35a1416d0fd2@mail.gmail.com>
	<320fb6e00812041415t3fb22630xae4d34205e0562a3@mail.gmail.com>
	<632cdbf70812041809v1d4ed344q3cc03db3e310b2ab@mail.gmail.com>
Message-ID: <320fb6e00812050219k376fdda2r969fe78a547b0ff6@mail.gmail.com>

Tim wrote:
> Ah, my bad. I was running it against an old version. It looks like it
> was fixed as of
> /biopython/Bio/GenBank/__init__.py version 1.87 (biopython release 1.48).
> The current version does the right thing.

Oh right - that was when I was testing parsing of the slightly
non-standard GenBank output from the EMBOSS seqret tool.  Anyway,
problem solved :)

Peter

From bugzilla-daemon at portal.open-bio.org  Fri Dec  5 06:59:07 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 5 Dec 2008 06:59:07 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main
	Biopython distribution
In-Reply-To: <bug-2671-42@http.bugzilla.open-bio.org/>
Message-ID: <200812051159.mB5Bx7TR009168@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671


------- Comment #17 from biopython-bugzilla at maubp.freeserve.co.uk  2008-12-05 06:59 EST -------
(In reply to comment #0)
> The default font has been changed to 'Vera', which is shipped with Reportlab,
> to avoid some problems with unavailable fonts

On my Mac "Vera" doesn't work, and going back to the default of 'Helvetica'
seems best on Unix in general.  Also, Helvetica is one of the standard fonts
which all PDF viewers should be able to render.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.

From bugzilla-daemon at portal.open-bio.org  Fri Dec  5 11:44:10 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 5 Dec 2008 11:44:10 -0500
Subject: [Biopython-dev] [Bug 2697] New: MaxEntropy calculate function
	assumes integer values for class and convergence criteria is
	hard coded
Message-ID: <bug-2697-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2697

           Summary: MaxEntropy calculate function assumes integer values for
                    class and convergence criteria is hard coded
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: minor
          Priority: P3
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: bsouthey at gmail.com


The Bio.MaxEntropy.classify() function assumes that the targets are integers
starting at zero. However, a model can be trained using character values. This
requires a simple change in a loop in that function.

Also, the convergence criteria are hard coded into the file by the following
global definitions:
MAX_IIS_ITERATIONS = 10000    # Maximum iterations for IIS.
IIS_CONVERGE = 1E-5           # Convergence criteria for IIS.
MAX_NEWTON_ITERATIONS = 100   # Maximum iterations on Newton's method.
NEWTON_CONVERGE = 1E-10       # Convergence criteria for Newton's method.

This makes it impossible for users to specify their own values without
editing the module itself. The patch changes this by passing these values to
the train function and its subfunctions.

Both of these are fixed in an attached patch.
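
The general shape of both changes can be sketched as follows. This is not
the attached patch and not the Bio.MaxEntropy code: `newton_root` and
`score_classes` are hypothetical stand-ins that only illustrate the two
ideas, namely keeping the module constants as keyword defaults and
iterating over the class labels themselves instead of integer indices.

```python
MAX_NEWTON_ITERATIONS = 100   # Default maximum iterations on Newton's method.
NEWTON_CONVERGE = 1E-10       # Default convergence criterion for Newton's method.


def newton_root(f, df, x0, max_iterations=MAX_NEWTON_ITERATIONS,
                converge=NEWTON_CONVERGE):
    """Find a root of f by Newton's method with caller-adjustable limits."""
    x = x0
    for _ in range(max_iterations):
        step = f(x) / df(x)
        x -= step
        # Stop once the update is smaller than the requested tolerance.
        if abs(step) < converge:
            return x
    raise RuntimeError("Newton's method did not converge")


def score_classes(classes, score_fn):
    """Score each class label directly, so labels need not be 0, 1, 2, ..."""
    return dict((label, score_fn(label)) for label in classes)


# Existing callers keep the defaults; others can loosen or tighten them.
root = newton_root(lambda x: x * x - 2.0, lambda x: 2.0 * x, 1.0,
                   converge=1E-12)
```

Keeping the old constants as defaults means code that never passes the
new arguments behaves as before.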



From bugzilla-daemon at portal.open-bio.org  Fri Dec  5 11:47:15 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 5 Dec 2008 11:47:15 -0500
Subject: [Biopython-dev] [Bug 2697] MaxEntropy calculate function assumes
	integer values for class and convergence criteria is hard coded
In-Reply-To: <bug-2697-42@http.bugzilla.open-bio.org/>
Message-ID: <200812051647.mB5GlFRQ020087@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2697


------- Comment #1 from bsouthey at gmail.com  2008-12-05 11:47 EST -------
Created an attachment (id=1139)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1139&action=view)
Fixes to MaxEntrophy

1) Fixes MaxEntropy.calculate to use the target classes from the data
2) Permits the user to define their own convergence criterion



From bugzilla-daemon at portal.open-bio.org  Fri Dec  5 11:59:51 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 5 Dec 2008 11:59:51 -0500
Subject: [Biopython-dev] [Bug 2698] New: Attempt at a unit test for
	MaxEntrophy
Message-ID: <bug-2698-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2698

           Summary: Attempt at a unit test for MaxEntrophy
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Unit Tests
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: bsouthey at gmail.com


I used test_LogisticRegression.py to develop a test for MaxEntropy. However, I
could not get MaxEntropy to train on that dataset. Indeed, I have found it to
be very sensitive to both data and functions, making it extremely hard to
develop bioinformatics-based data and associated tests. So in the end I
generated data based on some of my work. 

I trained the model outside the tests because I do not know how to avoid
retraining the model for each test.



From bugzilla-daemon at portal.open-bio.org  Fri Dec  5 12:00:29 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 5 Dec 2008 12:00:29 -0500
Subject: [Biopython-dev] [Bug 2698] Attempt at a unit test for MaxEntrophy
In-Reply-To: <bug-2698-42@http.bugzilla.open-bio.org/>
Message-ID: <200812051700.mB5H0Ted022044@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2698


------- Comment #1 from bsouthey at gmail.com  2008-12-05 12:00 EST -------
Created an attachment (id=1140)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1140&action=view)
Test for MaxEntrophy



From timothyham at gmail.com  Thu Dec  4 16:52:33 2008
From: timothyham at gmail.com (Timothy Ham)
Date: Thu, 4 Dec 2008 13:52:33 -0800
Subject: [Biopython-dev] Parsing malformed genbank files (e.g. VectorNTI)
In-Reply-To: <320fb6e00812040226g117fe534g4523e8b58f7f28@mail.gmail.com>
References: <632cdbf70812021619i7e652a05nd801dd408ba9aad4@mail.gmail.com>
	<320fb6e00812040226g117fe534g4523e8b58f7f28@mail.gmail.com>
Message-ID: <632cdbf70812041352oec43f5fh13bd35a1416d0fd2@mail.gmail.com>

On Thu, Dec 4, 2008 at 2:26 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> On Wed, Dec 3, 2008 at 12:19 AM, Timothy Ham <timothyham at gmail.com> wrote:
>>
>> Hi everyone,
>>
>> The current biopython GenBank parser dies while parsing VectorNTI
>> generated files.  For example, until recently, BioPython did not
>> accept an empty SOURCE field. It still does not handle an empty
>> VERSION or ACCESSION fields (consumer.data.id never gets filled),
>> which is the default for user generated vector maps via VectorNTI.
>
> I fixed the SOURCE issue in Bio/GenBank/__init__.py CVS revision 1.97
> after Tim contacted me offlist - there was no bug report.
>
>> Now, it is easy enough to change the GenBank parser to handle
>> malformed genbank files, (I can submit patches) but the real question
>> becomes:
>>> Should BioPython handle malformed genbank files at all?
>> I would like to be practical and say yes, since VectorNTI is a very
>> common, widely used format, but I wanted to ask the community before
>> submitting my patches.
>>
>> Thanks for the great work,
>> Tim
>
> As I'm the de facto maintainer for Bio.GenBank, I guess unless the list
> as a whole has a consensus this is my call.
>
> Reading the GenBank file format spec, the ACCESSION and VERSION lines
> are clearly intended to be mandatory.  Note that for mandatory fields,
> IIRC, the NCBI will use a single dot/period as a place holder when
> there is no data.  So I would argue that VectorNTI is producing
> invalid files, and you should write to the authors and encourage them
> to follow the spec more closely (even if we do change Biopython to
> cope).
>
> However, I'm willing to bend a little on out of spec GenBank files (in
> cases like this where there is no ambiguity about the parsing), but I
> would want a real example output file from VectorNTI to include for a
> unit test.  This is important as we need to use something sensible for
> the SeqRecord's id property if the ACCESSION and VERSION are missing.
>
> Peter
>

I have attached two representative example genbank outputs from
VectorNTI. I don't know if the mailing list accepts attachments, but
if it can't, is there a place where I can put it (maybe the biopython
wiki?)

Tim
-------------- next part --------------
A non-text attachment was scrubbed...
Name: vnti_example.zip
Type: application/zip
Size: 11716 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/biopython-dev/attachments/20081204/15ebddc9/attachment-0001.zip>

From bugzilla-daemon at portal.open-bio.org  Tue Dec  9 09:55:05 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 9 Dec 2008 09:55:05 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main
	Biopython distribution
In-Reply-To: <bug-2671-42@http.bugzilla.open-bio.org/>
Message-ID: <200812091455.mB9Et5iX017478@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Attachment #1132|application/octet-stream    |text/plain
          mime type|                            |
Attachment #1132 is|0                           |1
              patch|                            |


------- Comment #18 from biopython-bugzilla at maubp.freeserve.co.uk  2008-12-09 09:55 EST -------
(From update of attachment 1132)
Checked into CVS (with the font defaulting to Helvetica as discussed with
Leighton privately).


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.

From bugzilla-daemon at portal.open-bio.org  Tue Dec  9 09:55:56 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 9 Dec 2008 09:55:56 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main
	Biopython distribution
In-Reply-To: <bug-2671-42@http.bugzilla.open-bio.org/>
Message-ID: <200812091455.mB9Etu7C017584@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1132 is|1                           |0
              patch|                            |
Attachment #1132 is|0                           |1
           obsolete|                            |


------- Comment #19 from biopython-bugzilla at maubp.freeserve.co.uk  2008-12-09 09:55 EST -------
(From update of attachment 1132)
This is now obsolete - checked into CVS (with the font defaulting to Helvetica
as discussed with Leighton privately).



From bugzilla-daemon at portal.open-bio.org  Tue Dec  9 10:12:56 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 9 Dec 2008 10:12:56 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main
	Biopython distribution
In-Reply-To: <bug-2671-42@http.bugzilla.open-bio.org/>
Message-ID: <200812091512.mB9FCusM019463@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671


------- Comment #20 from biopython-bugzilla at maubp.freeserve.co.uk  2008-12-09 10:12 EST -------
(In reply to comment #12)
> 
> Bio.Graphics.GenomeDiagram.Utilities
> ====================================
> This is a collection of utilities for getting information useful for graph
> values.  From the docstring,
> 
>     o apply_to_window (sequence, window_size, function, step=None)  Apply a
>                         passed function to fragments of the passed sequence of
>                         size window_size, with each window separated by the
>                         passed step.

This windowing function is rather specific to GenomeDiagram by the nature of
how it returns the values and their positions.  The handling of the end of the
sequence is also non-general.  Suppose we put apply_to_window somewhere under
Bio.Graphics.GenomeDiagram.  It can then be used with any sequence analysis
function which takes a sequence/string and returns a float, returning the
scores and window positions as expected by GenomeDiagram for drawing graphical
tracks.

That would leave the following general non-windowed functions from
Utilities.py,

calc_gc_content - returns a float in the range 0 to 1.
calc_at_content - returns a float in the range 0 to 1.
calc_gc_skew - returns a float, gives zero if there is no GC content.
calc_at_skew - returns a float, gives zero if there is no AT content.

Bio.SeqUtils already has several functions including:

GC - returns a float in the range 0 to 100 (i.e. 100 times the actual fraction)
GC_skew - returns a list of floats using a default window size of 100bp.  Gives
a floating point exception if there is no GC content in any window.

Personally I don't like the fact that the existing GC function returns a number
between 0 and 100, but otherwise this code is fine.

I don't think the current GC_skew function is intuitive, and it doesn't cover
the non-windowed use-case where you want the GC skew of the whole sequence
passed in.  This is important if you want to do your own windowing (e.g.
comparing GC skew of individual genes to the whole genome).

Because they differ from the existing Bio.SeqUtils code, I think there is a
case for adding the four non-windowed functions from GenomeDiagram's
Utilities.py under Bio.SeqUtils.  Perhaps under a sub module like
Bio.SeqUtils.Nucleotides or Bio.SeqUtils.NucUtils?  The existing GC functions
in Bio.SeqUtils could be deprecated or at least declared obsolete.



From bugzilla-daemon at portal.open-bio.org  Tue Dec  9 10:19:23 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 9 Dec 2008 10:19:23 -0500
Subject: [Biopython-dev] [Bug 2704] New: Parser for the markx10 alignment
	format
Message-ID: <bug-2704-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2704

           Summary: Parser for the markx10 alignment format
           Product: Biopython
           Version: Not Applicable
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: osvaldo.zagordi at bsse.ethz.ch


Hi,
I recently wrote some code to parse the Emboss alignment format
markx10 (format explained at
http://emboss.sourceforge.net/docs/themes/AlignFormats.html)
Since it is slightly different from the Fasta m10 (not surprising, right?) I
had to adapt FastaIO.py.
I thought this might eventually be included in biopython.
Important:
I noticed that if the alignment program exits for some reason and
does not close the alignment file with two lines like these
#---------------------------------------
#---------------------------------------
bad things can happen (e.g., sucking all the memory of the system).
Could it be that a similar issue applies to FastaIO parser as well?
Best,
        Osvaldo


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Tue Dec  9 10:35:57 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 9 Dec 2008 10:35:57 -0500
Subject: [Biopython-dev] [Bug 2704] Parser for the markx10 alignment format
In-Reply-To: <bug-2704-42@http.bugzilla.open-bio.org/>
Message-ID: <200812091535.mB9FZvHG021117@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2704


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2008-12-09 10:35 EST -------
This sounds interesting Osvaldo,

Now that you've filed this bug, you should be able to upload the python file
(or a patch).

Given EMBOSS's markx10 output is intended to be like FASTA's -m 10 output (but
with the addition of EMBOSS style headers and footers), it *might* be nicer to
have one parser for both.  Right now I don't know how similar EMBOSS's output
really is.

If we do go for the simpler option of two separate parsers, it would certainly
be a good idea in the long run for them to share some code.

(In reply to comment #0)
> Important:
> I noticed that if the alignment program exits for some reason and
> does not close the alignment file with two lines like these
> #---------------------------------------
> #---------------------------------------
> bad things can happen (e.g., sucking all the memory of the system)). 
> Could it be that a similar issue applies to FastaIO parser as well?

Does this happen if you create such a file by hand (lacking these lines) and try
to read that?  If so it should be easier to debug.

Peter



From bugzilla-daemon at portal.open-bio.org  Tue Dec  9 10:43:19 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 9 Dec 2008 10:43:19 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main
	Biopython distribution
In-Reply-To: <bug-2671-42@http.bugzilla.open-bio.org/>
Message-ID: <200812091543.mB9FhJfV021598@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671


------- Comment #21 from lpritc at scri.sari.ac.uk  2008-12-09 10:43 EST -------
(In reply to comment #20)
> (In reply to comment #12)
> > 
> > Bio.Graphics.GenomeDiagram.Utilities
> > ====================================
> > This is a collection of utilities for getting information useful for graph
> > values.  From the docstring,
> > 
> >     o apply_to_window (sequence, window_size, function, step=None)  Apply a
> >                         passed function to fragments of the passed sequence of
> >                         size window_size, with each window separated by the
> >                         passed step.
> 
> This windowing function is rather specific to GenomeDiagram by the nature of
> how it returns the values and their positions.  The handling of the end of the
> sequence is also non-general.  Suppose we put apply_to_window somewhere under
> Bio.Graphics.GenomeDiagram.  It can then be used with any sequence analysis
> function which takes a sequence/string and returns a float, returning the
> scores and window positions as expected by GenomeDiagram for drawing graphical
> tracks.

That seems sensible, to me.  I like the generality that would result from it,
and it seems like apply_to_window could even be a useful convenience function
addition to Bio.SeqUtils in its own right.

[...]

> Because they differ from the existing Bio.SeqUtils code, I think there is a
> case for adding the four non-windowed functions from GenomeDiagram's
> Utilities.py under Bio.SeqUtils.  Perhaps under a sub module like
> Bio.SeqUtils.Nucleotides or Bio.SeqUtils.NucUtils?  The existing GC functions
> in Bio.SeqUtils could be deprecated or at least declared obsolete.

I think that there's value to be had in standardising to a floating-point 0..1
or -1..1 range for some of these kinds of functions, so I would support such a
move on those grounds.

Regarding my GC skew code (and the corresponding AT skew code): the
behaviour when there is no GC in the sequence is misleading (read: wrong ;) ).
Strictly, a divide-by-zero error would be correct here, but I just lazily went
for a zero value for ease of drawing, instead of doing something that properly
indicated 'not a number'.  I think that what needs to be done for GenomeDiagram
is to modify the graphing code so that it does something appropriate for NaNs
(however they may be indicated) - this should perhaps be to stop at the
preceding point, and resume at the subsequent point, for line graphs; not to
draw a box for the heat map; and not to draw a bar for the bar chart (not that
this will always be distinguishable from a zero value...).

The GenomeDiagram GC/AT skew code also needs to be modified to return None or
some other NaN indicator before its behaviour can be considered correct.
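
For the line graph case, the "stop at the preceding point, resume at the
subsequent point" idea could be sketched roughly as below.  This is only an
illustration, not GenomeDiagram code: the function names and the
(position, value) pair layout are assumptions made for the example, and both
None and float NaN are treated as the missing-value marker.

```python
import math


def is_missing(value):
    """True for None or a float NaN (both treated as 'not a number' here)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def split_into_runs(points):
    """Split (position, value) pairs into runs of consecutive valid points.

    A line graph would draw each run as its own line, which leaves a gap
    wherever a missing value occurred.
    """
    runs = []
    current = []
    for position, value in points:
        if is_missing(value):
            # Close the current run (if any) and skip the missing point.
            if current:
                runs.append(current)
                current = []
        else:
            current.append((position, value))
    if current:
        runs.append(current)
    return runs


points = [(50, 0.1), (150, None), (250, -0.2), (350, 0.3), (450, float("nan"))]
runs = split_into_runs(points)
```

For the heat map and bar chart the same is_missing test would simply skip
drawing the box or bar for that window.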

Apologies for propagating those shortcuts - my bad.



From bugzilla-daemon at portal.open-bio.org  Tue Dec  9 11:20:06 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 9 Dec 2008 11:20:06 -0500
Subject: [Biopython-dev] [Bug 2704] Parser for the markx10 alignment format
In-Reply-To: <bug-2704-42@http.bugzilla.open-bio.org/>
Message-ID: <200812091620.mB9GK6Si024603@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2704


------- Comment #2 from osvaldo.zagordi at bsse.ethz.ch  2008-12-09 11:20 EST -------
Created an attachment (id=1151)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1151&action=view)
Class Markx10Iterator for markx10 alignment format

Attached a simple example of using the code. Just running simple_test.py should
be enough.
If you remove the last two lines #------ from tmp_align.needle the program
loops sucking more and more memory



From bugzilla-daemon at portal.open-bio.org  Tue Dec  9 11:20:23 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 9 Dec 2008 11:20:23 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main
	Biopython distribution
In-Reply-To: <bug-2671-42@http.bugzilla.open-bio.org/>
Message-ID: <200812091620.mB9GKNCm024646@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671


------- Comment #22 from biopython-bugzilla at maubp.freeserve.co.uk  2008-12-09 11:20 EST -------
(In reply to comment #21)
> Regarding my GC skew code (and the corresponding AT skew code): that the
> behaviour when there is no GC in the sequence is misleading
> (read: wrong ;) ). 
> Strictly, a divide-by-zero error would be correct here, but I just lazily went
> for a zero value for ease of drawing, instead of doing something that properly
> indicated 'not a number'.  

Yeah - you're right.  Either we just allow the divide by zero to be raised, or
return a NaN, maybe via float("nan") unless there is a better way without
getting NumPy involved.
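
A minimal sketch of the NaN option, assuming the usual (G - C) / (G + C)
definition of GC skew; the function name gc_skew and the upper-casing of the
input are choices made for this example rather than existing Biopython code:

```python
import math


def gc_skew(sequence):
    """Return (G - C) / (G + C) for the whole sequence.

    Returns float("nan") when the sequence has no G or C, instead of
    silently returning zero.
    """
    seq = str(sequence).upper()
    g = seq.count("G")
    c = seq.count("C")
    if g + c == 0:
        return float("nan")
    return (g - c) / (g + c)


skew = gc_skew("GGGC")      # three G, one C
no_gc = gc_skew("ATAT")     # no G or C at all
is_nan = math.isnan(no_gc)  # NaN never compares equal, so test with isnan
```

Dropping the "if g + c == 0" branch gives the other option, where the
ZeroDivisionError is simply allowed to propagate to the caller.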

> I think that what needs to be done for GenomeDiagram
> is to modify the graphing code so that it does something appropriate for NaNs
> (however they may be indicated) - this should perhaps be to stop at the
> preceding point, and resume at the subsequent point, for line graphs; not to
> draw a box for the heat map; and not to draw a bar for the bar chart (not that
> this will always be distinguishable from a zero value...).

OK.  I can see why just using zero was a nice short cut here.

> The GenomeDiagram GC/AT skew code also needs to be modified to return None or
> some other NaN indicator before its behaviour can be considered correct.

Or, if we accept that "sequence scoring functions" may raise a divide by zero
error, then apply_to_window should be able to cope and map this to an
appropriate NaN indicator (e.g. None or float("nan")).
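
Something along these lines, as a rough sketch only.  The signature follows
the docstring quoted in comment 12, but the default step, the use of window
midpoints as positions, and the dropping of any trailing partial window are
simplifications for this example and not what GenomeDiagram currently does:

```python
def apply_to_window(sequence, window_size, function, step=None):
    """Apply function to each full window, returning (midpoint, score) pairs.

    Windows for which the scoring function raises ZeroDivisionError are
    reported with a score of None.
    """
    if step is None:
        step = max(1, window_size // 2)  # assumed default for this sketch
    results = []
    for start in range(0, len(sequence) - window_size + 1, step):
        fragment = sequence[start:start + window_size]
        try:
            score = function(fragment)
        except ZeroDivisionError:
            score = None
        results.append((start + window_size // 2, score))
    return results


def raw_gc_skew(fragment):
    """Example scoring function which divides by zero if there is no G or C."""
    g = fragment.count("G")
    c = fragment.count("C")
    return (g - c) / (g + c)


scores = apply_to_window("GGCCATATGGGC", 4, raw_gc_skew, step=4)
```

Here the middle window ("ATAT") has no GC content, so its score comes back
as None rather than zero.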

Peter



From bugzilla-daemon at portal.open-bio.org  Tue Dec  9 11:39:27 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 9 Dec 2008 11:39:27 -0500
Subject: [Biopython-dev] [Bug 2704] Parser for the markx10 alignment format
In-Reply-To: <bug-2704-42@http.bugzilla.open-bio.org/>
Message-ID: <200812091639.mB9GdRTJ026010@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2704


------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2008-12-09 11:39 EST -------
(In reply to comment #2)
> If you remove the last two lines #------ from tmp_align.needle the program
> loops sucking more and more memory

You have an infinite loop, try modifying the bit near line 162 as follows:

        #Now should have the aligned query sequence with flanking region...
        while not (line.startswith(">") or ">>>" in line) and not line.startswith('#'):
            match_seq_parts.append(line.strip())
            line = handle.readline()
            if not line :
                #End of file
                return None 
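
The reason the extra check matters is that handle.readline() returns an
empty string at the end of the file, and an empty string never starts with
">" or "#", so the loop condition stays true forever.  A self-contained
sketch of the same guard, using io.StringIO as a stand-in for the real
file handle (read_until_marker is a made-up name for this example, not
part of FastaIO.py):

```python
import io


def read_until_marker(handle):
    """Collect stripped lines until one starts with '>' or '#'.

    Returns None if the file ends before such a marker line is seen.
    """
    parts = []
    line = handle.readline()
    while not (line.startswith(">") or line.startswith("#")):
        if not line:
            # readline() gives "" at end of file; without this check the
            # loop would keep appending empty strings and never terminate.
            return None
        parts.append(line.strip())
        line = handle.readline()
    return parts


complete = read_until_marker(io.StringIO("ACGT\nTTGA\n#-----\n"))
truncated = read_until_marker(io.StringIO("ACGT\nTTGA\n"))
```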

Also, your code is based on an out of date version of Bio/AlignIO/FastaIO.py -
probably from Biopython 1.47, and lacks improvements which may also apply to
the EMBOSS output.  Given the object orientated nature of the current m10
parser, you/we should be able to subclass it and only override those bits
dealing with the header and footer.  This is probably the nicest way forward if
we decide to treat the EMBOSS markx10 format as a new format in Bio.AlignIO.



From bugzilla-daemon at portal.open-bio.org  Tue Dec  9 11:59:21 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 9 Dec 2008 11:59:21 -0500
Subject: [Biopython-dev] [Bug 2705] New: Nicer GC and AT content and skew
	functions
Message-ID: <bug-2705-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2705

           Summary: Nicer GC and AT content and skew functions
           Product: Biopython
           Version: Not Applicable
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk


This bug started out as a discussion on Bug 2671, based on some nucleotide
scoring functions in GenomeDiagram which were used for plotting sequence
properties along a sequence using a sliding window.  The basic underlying
functions could make a nice addition under Bio.SeqUtils (rather than hiding
them under Bio.Graphics.GenomeDiagram).

In particular, GenomeDiagram's Utilities.py included the following
(non-windowed) nucleotide composition functions:

calc_gc_content - returns a float in the range 0 to 1.
calc_at_content - returns a float in the range 0 to 1.
calc_gc_skew - returns a float [*]
calc_at_skew - returns a float [*]

[*] As discussed on Bug 2671, these currently give zero if there is no GC (or
AT) content, which was a reasonable shortcut given these functions were
originally used for plotting only.  They should instead raise an exception or
return None or NaN.

Also, as implemented in GenomeDiagram, these functions do not cope with mixed
case sequences (easily rectified).  Also, for GC and AT content these do not
deal with ambiguous nucleotides (where we could follow the existing
Bio.SeqUtils convention).

Bio.SeqUtils already has several related functions including:

GC - returns a float (a percentage in the range 0 to 100)
GC123 - returns a tuple of four floats (percentages between 0 and 100)

GC_skew - returns a list of floats using a default window size of 100bp.  Gives
a floating point exception if there is no GC content in any window.

Personally I don't like the fact that the existing GC function returns a number
between 0 and 100 (rather than 0 and 1).  Leighton agreed.

I don't think the current GC_skew function is intuitive, and it doesn't cover
the non-windowed use-case where you want the GC skew of the whole sequence
passed in.  This is important if you want to do your own windowing (e.g.
comparing GC skew of individual genes to the whole genome).

Because they differ from the existing Bio.SeqUtils code, I think there is a
case for adding the four non-windowed functions from GenomeDiagram's
Utilities.py under Bio.SeqUtils.  Each would take a single argument, a sequence
(coping with a string, Seq object or MutableSeq object).  I have no
particularly strong views on the naming of these functions.  Perhaps they could
be located under a sub module like Bio.SeqUtils.Nucleotides or
Bio.SeqUtils.NucUtils?  The existing GC functions in Bio.SeqUtils could be
deprecated or at least declared obsolete.

This would also be a good opportunity to explicitly specify what we expect to
get back for the GC content when there are ambiguous nucleotides.

e.g. Following Bio.SeqUtils.GC, only count C, G and S (which means C or G) (in
either case) and divide by the length giving a lower bound.  Here GC("ACGTN")
is 40%.  An alternative approach might be to treat an N as 50% GC, and H (which
is A, C or T) as 33.3% GC etc, meaning GC("ACGTN") gives 50%.

The same approach should be used for the AT percentage, for example the current
lower bound approach would count only A, T and W characters (in either case).
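
To make the two conventions concrete, here is a rough sketch returning
fractions in the range 0 to 1.  The function names, the lookup table name
and the ValueError for an empty sequence are all choices made for this
example and not existing Bio.SeqUtils code; the table just records what
fraction of the bases covered by each IUPAC code is G or C.

```python
# Fraction of the possible bases for each IUPAC code which are G or C.
GC_FRACTION = {
    "A": 0.0, "C": 1.0, "G": 1.0, "T": 0.0, "U": 0.0,
    "S": 1.0, "W": 0.0,                          # S = C or G, W = A or T
    "R": 0.5, "Y": 0.5, "K": 0.5, "M": 0.5,      # two-base codes, one is G/C
    "B": 2 / 3, "V": 2 / 3,                      # B = C/G/T, V = A/C/G
    "D": 1 / 3, "H": 1 / 3,                      # D = A/G/T, H = A/C/T
    "N": 0.5,
}


def gc_lower_bound(sequence):
    """Count only C, G and S (either case) and divide by the length."""
    seq = str(sequence).upper()
    if not seq:
        raise ValueError("GC content is undefined for an empty sequence")
    return sum(seq.count(letter) for letter in "CGS") / len(seq)


def gc_weighted(sequence):
    """Weight each ambiguous letter by its expected G/C fraction."""
    seq = str(sequence).upper()
    if not seq:
        raise ValueError("GC content is undefined for an empty sequence")
    # Unknown characters (e.g. gaps) contribute zero in this sketch.
    return sum(GC_FRACTION.get(letter, 0.0) for letter in seq) / len(seq)


lower = gc_lower_bound("ACGTN")   # 2 of 5 letters are C or G
weighted = gc_weighted("ACGTN")   # N adds half a base, giving 2.5 of 5
```

The AT versions would be the mirror image, counting A, T and W for the
lower bound, or using one minus the table value for the weighted form.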



From bugzilla-daemon at portal.open-bio.org  Tue Dec  9 12:04:15 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 9 Dec 2008 12:04:15 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main
	Biopython distribution
In-Reply-To: <bug-2671-42@http.bugzilla.open-bio.org/>
Message-ID: <200812091704.mB9H4F9C028063@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671


------- Comment #23 from biopython-bugzilla at maubp.freeserve.co.uk  2008-12-09 12:04 EST -------
I've filed Bug 2705 about adding these nucleotide sequence functions somewhere
under Bio.SeqUtils - this should get more people reading it, because this bug
(Bug 2671) hasn't been assigned to the dev mailing list, so I doubt many people
are aware of it.

For Bio.Graphics.GenomeDiagram we need to ensure the graphics tracks can cope
with NAN/None missing values as outlined by Leighton in comment 21.



From bugzilla-daemon at portal.open-bio.org  Tue Dec  9 12:53:44 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 9 Dec 2008 12:53:44 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main
	Biopython distribution
In-Reply-To: <bug-2671-42@http.bugzilla.open-bio.org/>
Message-ID: <200812091753.mB9Hri42031692@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1133 is|0                           |1
           obsolete|                            |


------- Comment #24 from biopython-bugzilla at maubp.freeserve.co.uk  2008-12-09 12:53 EST -------
(From update of attachment 1133)
I've checked something like this into CVS.



From bugzilla-daemon at portal.open-bio.org  Wed Dec 10 11:46:35 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 10 Dec 2008 11:46:35 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main
	Biopython distribution
In-Reply-To: <bug-2671-42@http.bugzilla.open-bio.org/>
Message-ID: <200812101646.mBAGkZs1003825@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  BugsThisDependsOn|                            |2705


------- Comment #25 from biopython-bugzilla at maubp.freeserve.co.uk  2008-12-10 11:46 EST -------
OK, GenomeDiagram is now in CVS, with some basic tests.  Still to do:

* Updating the existing GenomeDiagram manual to match (different imports,
colour to color), which I think can stay as a separate PDF file.

* A short introduction to Bio.Graphics including GenomeDiagram as part of a new
chapter in the tutorial?

* Dealing with Bug 2705 (for the AT and GC content and skew) and the window
function to help plot these in GenomeDiagram.



From bugzilla-daemon at portal.open-bio.org  Wed Dec 10 11:46:38 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 10 Dec 2008 11:46:38 -0500
Subject: [Biopython-dev] [Bug 2705] Nicer GC and AT content and skew
	functions
In-Reply-To: <bug-2705-42@http.bugzilla.open-bio.org/>
Message-ID: <200812101646.mBAGkcGB003850@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2705


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
OtherBugsDependingO|                            |2671
              nThis|                            |



From bugzilla-daemon at portal.open-bio.org  Wed Dec 10 12:16:37 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 10 Dec 2008 12:16:37 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main
	Biopython distribution
In-Reply-To: <bug-2671-42@http.bugzilla.open-bio.org/>
Message-ID: <200812101716.mBAHGbGG006815@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671


------- Comment #26 from biopython-bugzilla at maubp.freeserve.co.uk  2008-12-10 12:16 EST -------
We already talked about "colour" vs "color" (UK vs USA), but I've just noticed
the use of "centre" vs "center" where again I would prefer we follow computer
language norms and take the USA spelling.

Also, I'm not sure that the existing colour/color dual support works 100% of
the time.  I had an old script using colour where the feature colours specified
ended up being the default of light green.  Using "color" instead of "colour"
in my script worked.  I'll try and investigate this later.



From bugzilla-daemon at portal.open-bio.org  Wed Dec 10 12:55:31 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 10 Dec 2008 12:55:31 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main
	Biopython distribution
In-Reply-To: <bug-2671-42@http.bugzilla.open-bio.org/>
Message-ID: <200812101755.mBAHtVJ7009870@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671


------- Comment #27 from biopython-bugzilla at maubp.freeserve.co.uk  2008-12-10 12:55 EST -------
This might be better off as a new enhancement bug, but here is a possible
"arc-box" drawing function to go in the AbstractDrawer.py file, based on the
existing draw_box function.

def draw_arcbox(xcentre, ycentre, inner_radius, outer_radius,
                startangle, endangle,
                colour=colors.lightgreen, border=None, color=None) :
    """Returns a closed path object describing an arced box.

    Expects the angles to be in radians."""
    if color is None:
        color = colour
    if color == colors.white and border is None:   # Force black border on 
        strokecolor = colors.black                 # white boxes with
    elif border is None:                           # undefined border, else
        strokecolor = color                        # use fill colour
    elif border is not None:
        strokecolor = border

    p = ArcPath(strokeColor=strokecolor,
                fillColor=color,
                strokewidth=0)
    p.addArc(xcentre, ycentre, outer_radius,
             startangle * 180 / pi, endangle * 180 / pi,
             moveTo=True)
    p.addArc(xcentre, ycentre, inner_radius,
             startangle * 180 / pi, endangle * 180 / pi,
             reverse=True)
    p.closePath()
    return p

This takes advantage of reportlab's built-in arc approximation code, meaning we
can simplify the CircularDrawer.py method to just something like this:

    def draw_arc(self, inner_radius, outer_radius,
                 startangle, endangle,
                 color, border=None, colour=None):
        #Docstring here
        return draw_arcbox(self.xcentre, self.ycentre,
                           inner_radius, outer_radius,
                           startangle, endangle,
                           colour, border, color)

Alternatively, the code could just go in CircularDrawer.py directly.

As far as I can tell from looking at their source code, even ReportLab_1_21_2
has ArcPath defined in reportlab.graphics.shapes so there shouldn't be any
issue here with backwards compatibility.



From bugzilla-daemon at portal.open-bio.org  Thu Dec 11 03:40:23 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 11 Dec 2008 03:40:23 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main
	Biopython distribution
In-Reply-To: <bug-2671-42@http.bugzilla.open-bio.org/>
Message-ID: <200812110840.mBB8eNFs006984@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671


------- Comment #28 from lpritc at scri.sari.ac.uk  2008-12-11 03:40 EST -------
(In reply to comment #26)
> We already talked about "colour" vs "color" (UK vs USA), but I've just noticed
> the use of "centre" vs "center" where again I would prefer we follow computer
> language norms and take the USA spelling.
> 
> Also, I'm not sure that the existing colour/color dual support works 100% of
> the time.  I had an old script using colour where the feature colours specified
> ended up being the default of light green.  Using "color" instead of "colour"
> in my script worked.  I'll try and investigate this later.

Is this related to my fix in comment #9?



From bugzilla-daemon at portal.open-bio.org  Thu Dec 11 06:50:17 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 11 Dec 2008 06:50:17 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main
	Biopython distribution
In-Reply-To: <bug-2671-42@http.bugzilla.open-bio.org/>
Message-ID: <200812111150.mBBBoHej030149@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671


------- Comment #29 from biopython-bugzilla at maubp.freeserve.co.uk  2008-12-11 06:50 EST -------
(In reply to comment #28)
> (In reply to comment #26)
> > Also, I'm not sure that the existing colour/color dual support works 100%
> > of the time.  I had an old script using colour where the feature colours
> > specified ended up being the default of light green.  Using "color"
> > instead of "colour" in my script worked.  I'll try and investigate this
> > later.
> 
> Is this related to my fix in comment #9?

Possibly - although I was already using that version of AbstractDrawer.py

I've updated CVS to make it clear in the comments that "colour" arguments
override "color" arguments (this is required for backwards compatibility with
old scripts which would be using "colour").  I also had to fix the FeatureSet's
add_feature method to handle the colour/color mapping (this was the root of the
problem I had observed in comment 26).

I propose that in Biopython 1.50 we support both "colour" and "color", but for
Biopython 1.51 we add deprecation warnings when "colour" is used.

We should probably do the same thing for "centre" and "center" as well...
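
A minimal sketch of the override rule and the proposed warning, written in
Python 3 syntax. The helper name resolve_color(), its "warn" switch and the
"lightgreen" default string are assumptions made for illustration; they are
not the actual GenomeDiagram code.

```python
import warnings


def resolve_color(color=None, colour=None, default="lightgreen", warn=False):
    """Pick the colour to draw with; the UK spelling wins if both are given.

    Hypothetical helper: 'colour' overrides 'color' so that old scripts
    keep working, and 'warn' switches on the proposed deprecation warning.
    """
    if colour is not None:
        if warn:
            warnings.warn("'colour' is deprecated; use 'color' instead",
                          DeprecationWarning, stacklevel=2)
        return colour
    if color is not None:
        return color
    return default
```

With warn left off this matches the dual support proposed for 1.50; turning
it on corresponds to the warning proposed for 1.51. The same shape would
apply to "centre" and "center".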



From bugzilla-daemon at portal.open-bio.org  Thu Dec 11 06:52:41 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 11 Dec 2008 06:52:41 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main
	Biopython distribution
In-Reply-To: <bug-2671-42@http.bugzilla.open-bio.org/>
Message-ID: <200812111152.mBBBqfTQ030413@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671


------- Comment #30 from lpritc at scri.sari.ac.uk  2008-12-11 06:52 EST -------
(In reply to comment #29)
> 
> I propose that in Biopython 1.50 we support both "colour" and "color", but for
> Biopython 1.51 we add deprecation warnings when "colour" is used.
> 
> We should probably do the same thing for "centre" and "center" as well...
> 

I agree.  We should encourage use of the US spelling in the documentation, to
catch those new to GD. This approach provides a window for conversion of old GD
scripts for previous users, which is a good thing.



From bugzilla-daemon at portal.open-bio.org  Fri Dec 12 11:09:27 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 12 Dec 2008 11:09:27 -0500
Subject: [Biopython-dev] [Bug 2709] New: test_GenomeDiagram fails under Linux
Message-ID: <bug-2709-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2709

           Summary: test_GenomeDiagram fails under Linux
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: minor
          Priority: P4
         Component: Unit Tests
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: bsouthey at gmail.com


Under my Linux 64-bit system test_GenomeDiagram fails but the other related
tests 'pass' as reportlab is not available:

test_GenomeDiagram ... ERROR                                                    
test_GraphicsChromosome ... skipping. Install reportlab if you want to use
Bio.Graphics.
ok
test_GraphicsDistribution ... skipping. Install reportlab if you want to use
Bio.Graphics.
ok
test_GraphicsGeneral ... skipping. Install reportlab if you want to use
Bio.Graphics.
ok


======================================================================
ERROR: test_GenomeDiagram                                             
----------------------------------------------------------------------
Traceback (most recent call last):                                    
  File "run_tests.py", line 125, in runTest                           
    self.runSafeTest()                                                
  File "run_tests.py", line 138, in runSafeTest                       
    cur_test = __import__(self.test_name)                             
  File "test_GenomeDiagram.py", line 21, in <module>                  
    raise MissingExternalDependencyError(\                            
NameError: name 'MissingExternalDependencyError' is not defined       

----------------------------------------------------------------------


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Fri Dec 12 11:25:59 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 12 Dec 2008 11:25:59 -0500
Subject: [Biopython-dev] [Bug 2709] test_GenomeDiagram fails under Linux
In-Reply-To: <bug-2709-42@http.bugzilla.open-bio.org/>
Message-ID: <200812121625.mBCGPxeQ031269@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2709


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2008-12-12 11:25 EST -------
It was trying to raise MissingExternalDependencyError when reportlab was
missing (which would have skipped the test), but MissingExternalDependencyError
hadn't been imported.

Fixed in test_GenomeDiagram.py CVS revision 1.10

Thanks for reporting this.
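
To illustrate why the missing import turned a skip into an ERROR, here is a
rough sketch in Python 3 syntax. The runner function run_safely() and the
two guard functions are hypothetical, and the exception class is a local
stand-in so the example runs without Biopython; this is not the code in
run_tests.py.

```python
class MissingExternalDependencyError(Exception):
    """Local stand-in for the Biopython exception of the same name."""


def run_safely(load_test):
    """Call load_test() and report 'ok', a skip message, or 'ERROR'."""
    try:
        load_test()
    except MissingExternalDependencyError as err:
        return "skipping. %s" % err
    except Exception:
        return "ERROR"
    return "ok"


def good_guard():
    # The exception name is defined, so the runner can recognise the skip.
    raise MissingExternalDependencyError(
        "Install reportlab if you want to use Bio.Graphics.")


def broken_guard():
    # This name was never imported or defined, so Python raises NameError
    # before any dependency error can be created.
    raise UndefinedDependencyError("Install reportlab ...")
```

Here run_safely(good_guard) reports a skip, while run_safely(broken_guard)
reports ERROR because the NameError is not the exception the runner expects.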



From bugzilla-daemon at portal.open-bio.org  Fri Dec 12 11:49:51 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 12 Dec 2008 11:49:51 -0500
Subject: [Biopython-dev] [Bug 2710] New: GenomeDiagram.py unnecessary
	requires the reportlab addon renderPM
Message-ID: <bug-2710-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2710

           Summary: GenomeDiagram.py unnecessary requires the reportlab
                    addon renderPM
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: bsouthey at gmail.com


test_GenomeDiagram fails because the renderPM module is not part of the
standard install of reportlab, at least under Linux.

I consider that the renderPM module should not be required so
Graphics/GenomeDiagram/Diagram.py needs to be rewritten to avoid using the
renderPM module when it is not available. 

The installation documentation needs to include something about needing
renderPM for JPG, BMP, GIF, PNG or TIFF outputs.

There must be a test for the presence of the renderPM module.


test_GenomeDiagram ... ERROR
test_GraphicsChromosome ... ok
test_GraphicsDistribution ... ok
test_GraphicsGeneral ... ok

======================================================================
ERROR: test_GenomeDiagram
----------------------------------------------------------------------
Traceback (most recent call last):
  File "run_tests.py", line 125, in runTest
    self.runSafeTest()
  File "run_tests.py", line 138, in runSafeTest
    cur_test = __import__(self.test_name)
  File "test_GenomeDiagram.py", line 30, in <module>
    from Bio.Graphics.GenomeDiagram.FeatureSet import FeatureSet
  File "/home/bsouthey/python/biopython_cvs/biopython/build/lib.linux-x86_64-2.5/Bio/Graphics/GenomeDiagram/__init__.py", line 13, in <module>
    from Bio.Graphics.GenomeDiagram.Diagram import Diagram
  File "/home/bsouthey/python/biopython_cvs/biopython/build/lib.linux-x86_64-2.5/Bio/Graphics/GenomeDiagram/Diagram.py", line 32, in <module>
    from reportlab.graphics import renderPS, renderPDF, renderSVG, renderPM
  File "/usr/lib/python2.5/site-packages/reportlab/graphics/renderPM.py", line 28, in <module>
    "see http://www.reportlab.org/rl_addons.html")
ImportError: No module named _renderPM
see http://www.reportlab.org/rl_addons.html

----------------------------------------------------------------------



From bugzilla-daemon at portal.open-bio.org  Fri Dec 12 12:43:49 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 12 Dec 2008 12:43:49 -0500
Subject: [Biopython-dev] [Bug 2711] New: GenomeDiagram.py: write() and
	write_to_string() are inefficient and don't check inputs
Message-ID: <bug-2711-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2711

           Summary: GenomeDiagram.py: write() and write_to_string() are
                    inefficient and don't check inputs
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: bsouthey at gmail.com


While looking at GenomeDiagram.py I noticed some things that should be fixed. I
do note that some of this stems from reportlab. In particular, reportlab
doesn't appear to have a generic interface for different image formats.

1) Why are there two functions to output a diagram rather than just one generic
function? In particular, why not just pass a filename or not? Yes, I know that
reportlab uses different functions but this just duplicates code. So this is
more a comment than anything else. 

2) I find the functions write() and write_to_string() just plain ugly. 
You define a local dictionary of modules every time these functions are called.
But there is only one valid key so you then go back to find the input that you
already knew. A nested list would be better and allow catching invalid inputs
(see next point).

3) Neither write() nor write_to_string() checks that the output option is valid.
These functions do not accept lowercase. Thus, output='ps' will crash with a
KeyError, as will any invalid key.

4) I do not know the policy on module imports, but this line is only required
for write() and write_to_string():
from reportlab.graphics import renderPS, renderPDF, renderSVG, renderPM
Also renderPM is an addon.
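
Points 2) and 3) above could be addressed along these lines. This is only a
sketch in Python 3 syntax: it uses a module-level dictionary rather than the
nested list suggested in point 2), and the names _RENDERERS and
renderer_name() are assumptions, not what is in Diagram.py or the attached
patch. The format names and reportlab module names are the ones mentioned in
these reports.

```python
# Built once at import time instead of on every call.
# Maps an upper-case format name to the reportlab.graphics module name.
_RENDERERS = {
    "PS": "renderPS",
    "PDF": "renderPDF",
    "SVG": "renderSVG",
    "JPG": "renderPM",
    "BMP": "renderPM",
    "GIF": "renderPM",
    "PNG": "renderPM",
    "TIFF": "renderPM",
}


def renderer_name(output):
    """Return the renderer module name for a format, ignoring case.

    Raises ValueError with the list of valid formats instead of letting
    a bare KeyError escape.
    """
    key = str(output).upper()
    try:
        return _RENDERERS[key]
    except KeyError:
        raise ValueError("Unknown output format %r; expected one of: %s"
                         % (output, ", ".join(sorted(_RENDERERS))))
```

Storing module names as strings means nothing from reportlab has to be
imported just to validate the requested format.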



From bugzilla-daemon at portal.open-bio.org  Fri Dec 12 12:46:53 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 12 Dec 2008 12:46:53 -0500
Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and
	write_to_string() are inefficient and don't check inputs
In-Reply-To: <bug-2711-42@http.bugzilla.open-bio.org/>
Message-ID: <200812121746.mBCHkrPi005835@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2711


------- Comment #1 from bsouthey at gmail.com  2008-12-12 12:46 EST -------
Created an attachment (id=1156)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1156&action=view)
Fix various issues with GenomeDiagram/Diagram.py

Contains a couple of fixes including bug 2710.



From bugzilla-daemon at portal.open-bio.org  Fri Dec 12 12:54:21 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 12 Dec 2008 12:54:21 -0500
Subject: [Biopython-dev] [Bug 2710] GenomeDiagram.py unnecessary requires
	the reportlab addon renderPM
In-Reply-To: <bug-2710-42@http.bugzilla.open-bio.org/>
Message-ID: <200812121754.mBCHsL4q006303@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2710


bsouthey at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |DUPLICATE


------- Comment #1 from bsouthey at gmail.com  2008-12-12 12:54 EST -------
The reason for this bug report was the import of renderPM. But a closer look at
the code shows a bigger issue with write() and writeToString() functions of
Diagram.py. I am marking this as duplicate because correctly fixing bug 2711
(see patch for Bug 2711) will also fix this one.

*** This bug has been marked as a duplicate of bug 2711 ***



From bugzilla-daemon at portal.open-bio.org  Fri Dec 12 12:54:34 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 12 Dec 2008 12:54:34 -0500
Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and
	write_to_string() are inefficient and don't check inputs
In-Reply-To: <bug-2711-42@http.bugzilla.open-bio.org/>
Message-ID: <200812121754.mBCHsYgN006312@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2711


------- Comment #2 from bsouthey at gmail.com  2008-12-12 12:54 EST -------
*** Bug 2710 has been marked as a duplicate of this bug. ***



From bugzilla-daemon at portal.open-bio.org  Fri Dec 12 13:25:25 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 12 Dec 2008 13:25:25 -0500
Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and
	write_to_string() are inefficient and don't check inputs
In-Reply-To: <bug-2711-42@http.bugzilla.open-bio.org/>
Message-ID: <200812121825.mBCIPPZq008484@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2711


------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2008-12-12 13:25 EST -------
I agree something needs to be done for this issue (in particular the bit
originally covered by Bug 2710).

Moving the imports into these function(s) would be another way to let us deal
with the missing renderPM module if and when it is used (either leave the
ImportError, or raise a missing external dependency error).

As an aside, I'd like write_to_string() to support a DPI argument like write()
does.
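
A rough sketch of the deferred import idea, in Python 3 syntax. The function
load_renderer() and its "package" parameter are hypothetical, and the
exception class is a local stand-in so the example runs without Biopython or
reportlab installed.

```python
import importlib


class MissingExternalDependencyError(Exception):
    """Local stand-in for the Biopython exception of the same name."""


def load_renderer(name, package="reportlab.graphics"):
    """Import package.name only when it is actually needed.

    A missing optional module (renderPM, say) then only affects callers
    that ask for it, rather than breaking the import of the whole package.
    """
    try:
        return importlib.import_module(package + "." + name)
    except ImportError as err:
        raise MissingExternalDependencyError(
            "Could not import %s.%s (%s)" % (package, name, err))
```

A write() style function could call load_renderer("renderPM") only for
bitmap formats, so PS, PDF and SVG output would keep working without the
addon.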



From bugzilla-daemon at portal.open-bio.org  Fri Dec 12 14:23:06 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 12 Dec 2008 14:23:06 -0500
Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and
	write_to_string() are inefficient and don't check inputs
In-Reply-To: <bug-2711-42@http.bugzilla.open-bio.org/>
Message-ID: <200812121923.mBCJN64B013046@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2711


bsouthey at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1156 is|0                           |1
           obsolete|                            |


------- Comment #4 from bsouthey at gmail.com  2008-12-12 14:23 EST -------
Created an attachment (id=1157)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1157&action=view)
Corrected patch

I blindly copied and pasted without correcting it. Also, added 'dpi' to
write_to_string().



From bugzilla-daemon at portal.open-bio.org  Fri Dec 12 14:29:37 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 12 Dec 2008 14:29:37 -0500
Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and
	write_to_string() are inefficient and don't check inputs
In-Reply-To: <bug-2711-42@http.bugzilla.open-bio.org/>
Message-ID: <200812121929.mBCJTbtl013858@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2711


------- Comment #5 from bsouthey at gmail.com  2008-12-12 14:29 EST -------
(In reply to comment #3)
> 
> As an aside, I'd like write_to_string() to support a DPI argument like write()
> does.
> 

I added this to the patch as it was trivial. I would also think that exposing
the other options (bg, configPIL, showBoundary) could be useful. But I do not
know how these influence the GenomeDiagram.



From biopython at maubp.freeserve.co.uk  Sat Dec 13 13:20:10 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sat, 13 Dec 2008 18:20:10 +0000
Subject: [Biopython-dev] [Utilities-announce] PubMed Entrez Utility 2009
	DTD changes
In-Reply-To: <320fb6e00812031310s43124c68n988838af3837638d@mail.gmail.com>
References: <7B6F170840CA6C4DA63EE0C8A7BB43EC03A0001F@NIHCESMLBX15.nih.gov>
	<320fb6e00812031310s43124c68n988838af3837638d@mail.gmail.com>
Message-ID: <320fb6e00812131020r4a2a02dtcc7d65e8cf495052@mail.gmail.com>

On Wed, Dec 3, 2008 at 9:10 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> This email from the NCBI will be of interest for Bio.Entrez - we may
> need to add a few DTD files to Bio.Entrez in preparation for this...
> see also Bug 2678.

I've just added the following five DTD files to CVS,

nlmcommon_090101.dtd
nlmmedline_090101.dtd
nlmmedlinecitation_090101.dtd
nlmsharedcatcit_090101.dtd
pubmed_090101.dtd

All from http://www.ncbi.nlm.nih.gov/entrez/query/DTD/

Peter

From bugzilla-daemon at portal.open-bio.org  Sat Dec 13 15:19:15 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 13 Dec 2008 15:19:15 -0500
Subject: [Biopython-dev] [Bug 2678] Bio.Entrez module does not always
	retrieve or find DTD files
In-Reply-To: <bug-2678-42@http.bugzilla.open-bio.org/>
Message-ID: <200812132019.mBDKJFkD005703@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2678


------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk  2008-12-13 15:19 EST -------
(In reply to comment #6)
> If the DTD is available locally in Bio/Entrez/DTDs, then Bio.Entrez will read
> it from there. If not, it tries to download it. This may fail if the servers
> are busy. If the needed DTDs are saved in Bio/Entrez/DTDs (and installed when
> Biopython is installed), you won't run into this problem.

I was just looking at this on my Windows XP Python 2.3 machine, and when it
tried to download missing DTD files it was just using a filename as the URL.
I've committed a fix to CVS which should resolve this:

biopython/Bio/Entrez/Parser.py revision 1.3

I'll double check this on Linux/Mac next week.
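
The local-first lookup with a download fallback can be sketched as follows
(Python 3 syntax). The function locate_dtd() and the idea of returning a
("local" or "remote", location) pair are assumptions for illustration, not
the code in Parser.py; the default base URL is the NCBI DTD directory quoted
earlier on this list. The point is that the fallback must be a full URL, not
a bare filename.

```python
import os
from urllib.parse import urljoin

# Assumed default location for NCBI DTD files.
NCBI_DTD_BASE = "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/"


def locate_dtd(system_id, dtd_dir, base_url=NCBI_DTD_BASE):
    """Decide where a DTD should be read from.

    Returns ("local", path) if a copy exists in dtd_dir, otherwise
    ("remote", url) where url is always absolute.
    """
    filename = system_id.rsplit("/", 1)[-1]
    local_path = os.path.join(dtd_dir, filename)
    if os.path.isfile(local_path):
        return ("local", local_path)
    # urljoin leaves an absolute system_id unchanged and resolves a bare
    # filename against base_url.
    return ("remote", urljoin(base_url, system_id))
```

The caller would open the local path directly, or pass the URL to something
like urllib.request.urlopen() and handle a failed download.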

This may be related to Leighton's problem - although 'xhtml1-strict.dtd' and
'xhtml-lat1.ent' are not NCBI DTD files, but rather a part of the XML
specification itself.

Note that if I delete all the Bio/Entrez/DTDs/* files, then test_Entrez.py
fails.  I get warning messages about downloading missing DTD files, and the
following failures:

======================================================================
ERROR: Test parsing pubmed links returned by ELink (fifth test)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_Entrez.py", line 2523, in t_pubmed5
    record = Entrez.read(input)
  File "c:\python23\Lib\site-packages\Bio\Entrez\__init__.py", line 286, in read
    record = handler.run(handle)
  File "c:\python23\Lib\site-packages\Bio\Entrez\Parser.py", line 95, in run
    self.parser.ParseFile(handle)
  File "c:\python23\Lib\site-packages\Bio\Entrez\Parser.py", line 131, in startElement
    if object!="":
UnboundLocalError: local variable 'object' referenced before assignment

======================================================================
ERROR: Test parsing XML returned by EFetch, PubMed database (first test)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_Entrez.py", line 3058, in t_pubmed1
    record = Entrez.read(input)
  File "c:\python23\Lib\site-packages\Bio\Entrez\__init__.py", line 286, in read
    record = handler.run(handle)
  File "c:\python23\Lib\site-packages\Bio\Entrez\Parser.py", line 95, in run
    self.parser.ParseFile(handle)
  File "c:\python23\Lib\site-packages\Bio\Entrez\Parser.py", line 294, in external_entity_ref_handler
    parser.ParseFile(handle)
  File "c:\python23\Lib\site-packages\Bio\Entrez\Parser.py", line 294, in external_entity_ref_handler
    parser.ParseFile(handle)
ExpatError: syntax error: line 1, column 0

======================================================================
ERROR: Test parsing XML returned by EFetch, PubMed database (second test)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_Entrez.py", line 3261, in t_pubmed2
    record = Entrez.read(input)
  File "c:\python23\Lib\site-packages\Bio\Entrez\__init__.py", line 286, in read
    record = handler.run(handle)
  File "c:\python23\Lib\site-packages\Bio\Entrez\Parser.py", line 95, in run
    self.parser.ParseFile(handle)
  File "c:\python23\Lib\site-packages\Bio\Entrez\Parser.py", line 294, in external_entity_ref_handler
    parser.ParseFile(handle)
  File "c:\python23\Lib\site-packages\Bio\Entrez\Parser.py", line 294, in external_entity_ref_handler
    parser.ParseFile(handle)
ExpatError: syntax error: line 1, column 0

======================================================================
FAIL: Test parsing pubmed links returned by ELink (sixth test)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_Entrez.py", line 2697, in t_pubmed6
    assert len(record[0]["IdCheckList"])==2
AssertionError

----------------------------------------------------------------------

(The rest of the Entrez tests pass even with the missing DTDs - they are now
successfully downloaded on demand)



From bugzilla-daemon at portal.open-bio.org  Sat Dec 13 18:56:02 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 13 Dec 2008 18:56:02 -0500
Subject: [Biopython-dev] [Bug 2649] Bio.KDTree expects numpy array with
	dtype="float32" on 64 bit machines.
In-Reply-To: <bug-2649-42@http.bugzilla.open-bio.org/>
Message-ID: <200812132356.mBDNu2HE017869@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2649


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2008-12-13 18:56 EST -------
Hi Paul,

I'd like to close this bug now as we think it has been solved.  Michiel's
update was included with Biopython 1.49, so you don't need to mess about with
CVS to check and confirm this now.

Thanks,

Peter



From bugzilla-daemon at portal.open-bio.org  Sat Dec 13 19:12:00 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 13 Dec 2008 19:12:00 -0500
Subject: [Biopython-dev] [Bug 2681] BioSQL: record annotations enhancements
In-Reply-To: <bug-2681-42@http.bugzilla.open-bio.org/>
Message-ID: <200812140012.mBE0C0Yo018673@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2681


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |biopython-
                   |                            |bugzilla at maubp.freeserve.co.
                   |                            |uk


------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk  2008-12-13 19:11 EST -------
(In reply to comment #4)
> (In reply to comment #2)
> > (In reply to comment #0)
> > > 1) Fixed date/dates typo.
> > 
> > Why is it a typo?  Change not checked in.
> 
> The function _load_bioentry_date in Loader.py inserts the annotation 'date',
> if present, or the current date if not, into the bioentry_qualifier_value
> table. This is pulled by BioSeq.py _retrieve_qualifier_value and stored as
> the attribute 'dates'. Hence I considered line 307 in BioSeq.py to be a typo,
> which should be 'date' and not 'dates'.

OK, that does make sense.  However...

> Also, because Loader.py handles dates separately, they should not be
> handled by the function load_annotations.

That would make sense if we make the above "dates"/"date" change.

If we tested a record with a "date" annotation, I guess currently it would get
recorded twice - once under "date_changed" by _load_bioentry_date (retrieved as
"dates") and again but under "date" by _load_annotations (retrieved as "date").

Right now, I'm wondering why _load_bioentry_date exists in the first place ...
perhaps this special annotation entry "date_changed" is to mimic BioPerl?



From bugzilla-daemon at portal.open-bio.org  Sat Dec 13 19:59:14 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 13 Dec 2008 19:59:14 -0500
Subject: [Biopython-dev] [Bug 2697] MaxEntropy calculate function assumes
	integer values for class and convergence criteria is hard coded
In-Reply-To: <bug-2697-42@http.bugzilla.open-bio.org/>
Message-ID: <200812140059.mBE0xE0g021156@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2697


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2008-12-13 19:59 EST -------
(In reply to comment #0)
> Also, the convergence criteria is hard coded into the file by the following
> global definitions:
> MAX_IIS_ITERATIONS = 10000    # Maximum iterations for IIS.
> IIS_CONVERGE = 1E-5           # Convergence criteria for IIS.
> MAX_NEWTON_ITERATIONS = 100   # Maximum iterations on Newton's method.
> NEWTON_CONVERGE = 1E-10       # Convergence criteria for Newton's method.
> 
> This makes it impossible for the user to specify their own values without
> changing the actual function.

No, you can change them in your own code - they are just module level variables.
For example:

from Bio import MaxEntropy
#Check the current limit,
print MaxEntropy.MAX_NEWTON_ITERATIONS
#Increase the iteration limit,
MaxEntropy.MAX_NEWTON_ITERATIONS = 1000

One might argue these should be *optional* arguments to the functions. 
However, your suggested change adds new *required* arguments, which is not a
backwards compatible API change.

Peter



From bugzilla-daemon at portal.open-bio.org  Sat Dec 13 21:20:37 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 13 Dec 2008 21:20:37 -0500
Subject: [Biopython-dev] [Bug 2697] MaxEntropy calculate function assumes
	integer values for class and convergence criteria is hard coded
In-Reply-To: <bug-2697-42@http.bugzilla.open-bio.org/>
Message-ID: <200812140220.mBE2KbM1026093@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2697


------- Comment #3 from bsouthey at gmail.com  2008-12-13 21:20 EST -------
(In reply to comment #2)
> (In reply to comment #0)
> > Also, the convergence criteria is hard coded into the file by the following
> > global definitions:
> > MAX_IIS_ITERATIONS = 10000    # Maximum iterations for IIS.
> > IIS_CONVERGE = 1E-5           # Convergence criteria for IIS.
> > MAX_NEWTON_ITERATIONS = 100   # Maximum iterations on Newton's method.
> > NEWTON_CONVERGE = 1E-10       # Convergence criteria for Newton's method.
> > 
> > This makes it impossible for the user to specify their own values without
> > changing the actual function.
> 
> No, you can change them in your own code - they are just module level variable.
> For example:
> 
> from Bio import MaxEntropy
> #Check the current limit,
> print MaxEntropy.MAX_NEWTON_ITERATIONS
> #Increase the iteration limit,
> MaxEntropy.MAX_NEWTON_ITERATIONS = 1000
> 
> One might argue these should be *optional* arguments to the functions. 
> However, your suggested change adds new *required* arguments, which is not a
> backwards compatible API change.
> 
> Peter
> 

I strongly disagree on this because a user should not have to read the module
source code to find these module level global variables and what values these
actually are. But this is not my code.



From bugzilla-daemon at portal.open-bio.org  Sat Dec 13 23:27:16 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 13 Dec 2008 23:27:16 -0500
Subject: [Biopython-dev] [Bug 2697] MaxEntropy calculate function assumes
	integer values for class and convergence criteria is hard coded
In-Reply-To: <bug-2697-42@http.bugzilla.open-bio.org/>
Message-ID: <200812140427.mBE4RGIE001073@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2697


------- Comment #4 from mdehoon at ims.u-tokyo.ac.jp  2008-12-13 23:27 EST -------
(In reply to comment #3)
> I strongly disagree on this because a user should not have to read the module
> source code to find these module level global variables and what values these
> actually are. But this is not my code.
> 
I agree with Bruce that these variables should be arguments to the function,
rather than module-level global variables. To keep the API backwards
compatible, we can specify the current values for these variables as default
values for these arguments. This will also make it easier for users that are
not particularly interested in these variables.

If you submit a revised patch, please do not just comment out unneeded code; it
is better to actually remove code that is no longer needed.
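
One way to sketch this, in Python 3 syntax, is shown below. The function
newton_sqrt() is a toy stand-in rather than anything in Bio.MaxEntropy, and
the use of None as a sentinel is an assumption of this sketch: a default
written as "max_newton_iterations=MAX_NEWTON_ITERATIONS" is evaluated once
when the function is defined, so later changes to the module level variable
would no longer have any effect. Looking the value up at call time keeps
both styles of use working.

```python
MAX_NEWTON_ITERATIONS = 100   # Maximum iterations on Newton's method.
NEWTON_CONVERGE = 1E-10       # Convergence criteria for Newton's method.


def newton_sqrt(value, max_newton_iterations=None, newton_converge=None):
    """Toy Newton iteration for sqrt(value), with value > 0.

    Returns (estimate, iterations_used). Arguments left as None fall
    back to the module level settings at call time.
    """
    if max_newton_iterations is None:
        max_newton_iterations = MAX_NEWTON_ITERATIONS
    if newton_converge is None:
        newton_converge = NEWTON_CONVERGE
    x = float(value)
    iterations = 0
    while iterations < max_newton_iterations:
        new_x = 0.5 * (x + value / x)
        iterations += 1
        converged = abs(new_x - x) < newton_converge
        x = new_x
        if converged:
            break
    return x, iterations
```

Existing callers see no change, scripts that assign to the module level
variables still work, and new code can pass the limits explicitly.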



From bugzilla-daemon at portal.open-bio.org  Sun Dec 14 08:17:47 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 14 Dec 2008 08:17:47 -0500
Subject: [Biopython-dev] [Bug 2697] MaxEntropy calculate function assumes
	integer values for class and convergence criteria is hard coded
In-Reply-To: <bug-2697-42@http.bugzilla.open-bio.org/>
Message-ID: <200812141317.mBEDHla7021974@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2697


------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk  2008-12-14 08:17 EST -------
(In reply to comment #3)
> (In reply to comment #2)
> > No, you can change them in your own code - they are just module level
> > variables
> > ...
> > One might argue these should be *optional* arguments to the functions. 
> > However, your suggested change adds new *required* arguments, which is not a
> > backwards compatible API change.

Sorry - you *did* use optional arguments for the train function. I was
distracted by the private functions where the new arguments are required.

> I strongly disagree on this because a user should not have to read the module
> source code to find these module level global variables and what values these
> actually are. But this is not my code.

I'm not saying the current state of the code is elegant - just correcting your
factual error that the end user couldn't change these parameters.  They can.

(In reply to comment #4)
> I agree with Bruce that these variables should be arguments to the function,
> rather than module-level global variables. To keep the API backwards
> compatible, we can specify the current values for these variables as default
> values for these arguments. This will also make it easier for users that are
> not particularly interested in these variables.

This is what I was implying, although less clearly.

To be even more explicit, if we want to add these variables as arguments to the
functions then they should default to the existing upper case module level
variables.  We shouldn't remove or rename the module level variables in case
anyone was using them in the way I illustrated in comment 2.

e.g.
def train(training_set, results, feature_fns, update_fn=None):

becomes something like this:

def train(training_set, results, feature_fns, update_fn=None,
          max_iis_iterations = MAX_IIS_ITERATIONS,
          iis_converge = IIS_CONVERGE,
          max_newton_iterations = MAX_NEWTON_ITERATIONS,
          newton_converge = NEWTON_CONVERGE):
#This function's code would then need updating to use
#local variable max_iis_iterations instead of the
#module level MAX_IIS_ITERATIONS.

Note this does NOT use uppercase argument names as in Bruce's original patch -
these would not be consistent with the rest of Biopython.
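
One caveat with the signature above: Python evaluates default argument values
once, when the function is defined, so a later assignment to the module level
variable would no longer be picked up. A minimal sketch of one way around this
follows, assuming a None sentinel that is resolved at call time. The toy
train() below only returns the settings it would use, and the numeric values
are placeholders rather than values confirmed from the MaxEntropy source.

```python
# Module level defaults. The names come from the thread; the values here
# are placeholders chosen for illustration only.
MAX_IIS_ITERATIONS = 10000
IIS_CONVERGE = 1e-5


def train(training_set, max_iis_iterations=None, iis_converge=None):
    """Toy stand-in for train(); returns the settings it would use."""
    # Resolve the defaults at call time, so that code which reassigns the
    # module level variables after import still has an effect.
    if max_iis_iterations is None:
        max_iis_iterations = MAX_IIS_ITERATIONS
    if iis_converge is None:
        iis_converge = IIS_CONVERGE
    return max_iis_iterations, iis_converge
```

Existing callers that pass no extra arguments get the module level values,
while new callers can override a single setting by keyword.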

Peter


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Mon Dec 15 05:11:37 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 15 Dec 2008 05:11:37 -0500
Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and
	write_to_string() are inefficient and don't check inputs
In-Reply-To: <bug-2711-42@http.bugzilla.open-bio.org/>
Message-ID: <200812151011.mBFABbqD007138@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2711


------- Comment #6 from lpritc at scri.sari.ac.uk  2008-12-15 05:11 EST -------
(In reply to comment #2)
> *** Bug 2710 has been marked as a duplicate of this bug. ***
> 

(In reply to comment #0)
> test_GenomeDiagram fails because the renderPM module is not part of standard
> install of reportlab, at least under Linux. 

That's odd - renderPM is in the source for ReportLab 2.2.  Are you using an
up-to-date version?  It seems to install well enough on our 64-bit Linux box
from the ReportLab source.

> I consider that the renderPM module should not be required so
> Graphics/GenomeDiagram/Diagram.py needs to be rewritten to avoid using the
> renderPM module when it is not available. 

renderPM is how raster graphics are drawn, so is, I'm afraid, a necessary part
of GenomeDiagram's functionality.

I prefer your alternative suggestion of making it a 'dynamic' import, but even
then I think that the inconvenience of preparing the diagram, only to find out
at the last possible stage that you can't draw it because you're missing the
library, is worse than getting the error message upfront.  Not that this should
be a problem, since renderPM is part of the main ReportLab source, now.  YMMV
though, and I'm happy for the code to conform to the Biopython house style.

> The installation documentation needs to include something about needing the
> renderPM for JPG, BMP, GIF, PNG, TIFF or TIFF outputs.
> 
> There must be a test for the presence of the renderPM module.

I'm not convinced of the value of this, as renderPM is part of the current
ReportLab source installation.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Mon Dec 15 05:17:54 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 15 Dec 2008 05:17:54 -0500
Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and
	write_to_string() are inefficient and don't check inputs
In-Reply-To: <bug-2711-42@http.bugzilla.open-bio.org/>
Message-ID: <200812151017.mBFAHs0K007630@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2711


------- Comment #7 from lpritc at scri.sari.ac.uk  2008-12-15 05:17 EST -------
(In reply to comment #0) (from #2710)
> test_GenomeDiagram fails because the renderPM module is not part of standard
> install of reportlab, at least under Linux. 

renderPM is part of the source install of ReportLab 2.2, and installs correctly
on our 64-bit Linux box.  Are you using an up-to-date version of ReportLab? 
The version that your distro's installer uses may not be the most recent.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Mon Dec 15 05:41:13 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 15 Dec 2008 05:41:13 -0500
Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and
	write_to_string() are inefficient and don't check inputs
In-Reply-To: <bug-2711-42@http.bugzilla.open-bio.org/>
Message-ID: <200812151041.mBFAfDI8010277@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2711


------- Comment #8 from lpritc at scri.sari.ac.uk  2008-12-15 05:41 EST -------
(In reply to comment #0)
> 1) Why are there two functions to output a diagram rather than just one generic
> function? In particular, why not just pass a filename or not? 

When I wrote the libraries originally, I had one main use in mind: production
of publication-quality images in vector format.  Later on I decided that I
needed streaming output for web display, and then bolted on the
write_to_string() to look like the ReportLab interface, for consistency. 
That's why there are two methods: the write() method produces
publication-quality (and bitmaps, if you ask), and the write_to_string() method
produces the streaming output.

It should be possible to make write() do both jobs, so long as the intention is
declared in the argument list.  It might be nice to just be able to specify a
stream or handle, rather than the filename.  Both of these would be an API
change.

> 2) I find the functions write() and write_to_string() just plain ugly. 
> You define a local dictionary of modules every time these functions are called.

That dictionary could be placed at the head of the script to be defined on
import.  But I think it's more explicit what's going on to have it in the
method itself - the dictionary has restricted scope, and is garbage-collected
after the function call.  Also, I don't understand your nested list proposal:
distribution dictionaries are not that uncommon.

> 4) I do not know the policy on module imports, but this line is only required
> for write() and write_to_string():
> from reportlab.graphics import renderPS, renderPDF, renderSVG, renderPM
> Also renderPM is an addon.

Apologies for repeating myself earlier about this one - Bugzilla was being
flaky - but renderPM is now part of ReportLab 2.2.  Whether we should continue
to support/cater for installations of 1.21 without the add-ons is another
question, I think.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Mon Dec 15 05:51:30 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 15 Dec 2008 05:51:30 -0500
Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and
	write_to_string() are inefficient and don't check inputs
In-Reply-To: <bug-2711-42@http.bugzilla.open-bio.org/>
Message-ID: <200812151051.mBFApU9R011217@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2711


------- Comment #9 from lpritc at scri.sari.ac.uk  2008-12-15 05:51 EST -------
(In reply to comment #3)
> As an aside, I'd like write_to_string() to support a DPI argument like write()
> does.

The way I originally intended write_to_string() to be used - sending graphics
to a browser - the DPI has no influence at all.  DPI is only of any importance
for printing graphics: the DPI translates the pixel size into the final printed
size of the image.  The image you see on screen (assuming no fancy browser
scaling) is pixel-per-pixel.  That's why I left it out.

It may be that people have a sensible reason for writing their image output to
string - rather than binary - encoding, for writing to a file.  I'm not clear
on what that would be, but it's possible.  In that case, I think that an
appropriate merging of the write() and write_to_string() methods could be:

def write(self, filename=None, output=default_output, dpi=default_dpi,
encoding=default_encoding):

encoding could then be either 'binary' (default), or 'string' - which would
emulate write_to_string()'s function.

Where handle is not None, the resulting output would be sent to the passed
handle - which could potentially include sys.stdout.  Where handle is None, the
method could return the encoded image directly, as write_to_string() does, now.

Other than the obvious problem with ReportLab's drawToFile requiring a
filename, rather than a handle - does this seem like a reasonable plan to
others?


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Mon Dec 15 06:00:01 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 15 Dec 2008 06:00:01 -0500
Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and
	write_to_string() are inefficient and don't check inputs
In-Reply-To: <bug-2711-42@http.bugzilla.open-bio.org/>
Message-ID: <200812151100.mBFB01fk011962@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2711


------- Comment #10 from biopython-bugzilla at maubp.freeserve.co.uk  2008-12-15 06:00 EST -------
(In reply to comment #8)
> 
> > 4) I do not know the policy on module imports, but this line is only
> > required for write() and write_to_string():
> > from reportlab.graphics import renderPS, renderPDF, renderSVG, renderPM
> > Also renderPM is an addon.
> 
> Apologies for repeating myself earlier about this one - Bugzilla was being
> flaky - but renderPM is now part of ReportLab 2.2.  Whether we should continue
> to support/cater for installations of 1.21 without the add-ons is another
> question, I think.

I thought I'd commented on this bug already but I committed a patch which would
fail gracefully if renderPM was missing.  I must be running an older version of
ReportLab on my Linux box at home, because it didn't have renderPM installed.  

However - this check is done when writing the file.  This is good if you don't
have renderPM but only want vector images.  This is bad if you do want bitmaps
images, as the missing dependency error happens at the very end.

However, I don't think we can assume renderPM will be installed.  Looking at
the website for reportlab 2.2, it's not clear if the Windows installers will
include renderPM or not...


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Mon Dec 15 06:02:35 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 15 Dec 2008 06:02:35 -0500
Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and
	write_to_string() are inefficient and don't check inputs
In-Reply-To: <bug-2711-42@http.bugzilla.open-bio.org/>
Message-ID: <200812151102.mBFB2ZMq012237@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2711


------- Comment #11 from lpritc at scri.sari.ac.uk  2008-12-15 06:02 EST -------
(In reply to comment #3)
> I agree something needs to be done for this issue (in particular the bit
> originally covered by Bug 2710).
> 
> Moving the imports into these function(s) would be another way to let us deal
> with the missing renderPM module if and when it is used (either leave the
> ImportError, or raise a missing external dependency error).

One issue with this approach is that, when working with the module
interactively, a user might not be aware of the absence of the appropriate
module until they attempted to produce their output - which might be after
quite a bit of interactive work.  Informing the user up-front that renderPM is
not available - either by ImportError or friendly warning - avoids this.
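
As an illustration of the "friendly warning up front" option, here is a
minimal sketch. The helper names optional_import() and
require_bitmap_support() are hypothetical and are not part of GenomeDiagram
or ReportLab; only the module path reportlab.graphics.renderPM comes from the
thread.

```python
import importlib
import warnings


def optional_import(module_name):
    """Return the named module, or None (with a warning) if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        warnings.warn(
            "Optional dependency %r is not installed; "
            "bitmap output will be unavailable." % module_name
        )
        return None


# Checked once, at import time, so an interactive user is told immediately.
renderPM = optional_import("reportlab.graphics.renderPM")


def require_bitmap_support():
    """Call this just before drawing a bitmap (hypothetical helper)."""
    if renderPM is None:
        raise RuntimeError("Bitmap output needs reportlab.graphics.renderPM")
```

Vector-only users can still import and use the module, while anyone who will
need bitmaps sees the warning before doing any interactive work.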


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Mon Dec 15 06:17:45 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 15 Dec 2008 06:17:45 -0500
Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and
	write_to_string() are inefficient and don't check inputs
In-Reply-To: <bug-2711-42@http.bugzilla.open-bio.org/>
Message-ID: <200812151117.mBFBHjgn013463@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2711


------- Comment #12 from biopython-bugzilla at maubp.freeserve.co.uk  2008-12-15 06:17 EST -------
(In reply to comment #9)
> (In reply to comment #3)
> > As an aside, I'd like write_to_string() to support a DPI argument like
> > write() does.
> 
> The way I originally intended write_to_string() to be used - sending graphics
> to a browser - the DPI has no influence at all.  DPI is only of any importance
> for printing graphics ...

OK, so it's less useful than I had expected.  Rendering bitmaps to strings so they
can be inserted into a database as blobs is one potential use-case.  Also for a
web-service where you expect the user to save and print the naked image
(unusual, and probably software dependent on how the DPI is treated).

> In that case, I think that an appropriate merging of the write() and
> write_to_string() methods could be:
> 
> def write(self, filename=None, output=default_output, dpi=default_dpi,
> encoding=default_encoding):
> 
> encoding could then be either 'binary' (default), or 'string' - which would
> emulate write_to_string()'s function.
> 
> Where handle is not None, the resulting output would be sent to the passed
> handle - which could potentially include sys.stdout.  Where handle is None,
> the method could return the encoded image directly, as write_to_string()
> does, now.
> 
> Other than the obvious problem with ReportLab's drawToFile requiring a
> filename, rather than a handle - does this seem like a reasonable plan to
> others?

On the plus side, this would be backwards compatible (and we could deprecate
the write_to_string function).

However, I'm not so keen on this style personally - the return value is
radically different depending on the arguments (nothing, or a string of data).

If we were designing this from scratch, I would have suggested one write
function which wrote to a handle - which would let you then write to a file or
a string (using StringIO).  On the other hand, this is perhaps a little low
level.  We've had similar discussions regarding Bio.SeqIO in the past.
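
A minimal sketch of the handle-based design, assuming a hypothetical Diagram
class that already holds its rendered bytes (this is not the GenomeDiagram
API). The thread mentions StringIO; on current Python the equivalent for
binary image data is io.BytesIO, which is what the sketch uses.

```python
import io


class Diagram:
    """Hypothetical stand-in that already holds its rendered image bytes."""

    def __init__(self, payload):
        self._payload = payload

    def write(self, handle):
        """Write the image to any object with a write() method."""
        handle.write(self._payload)

    def write_to_string(self):
        """Convenience wrapper: render into memory and return the bytes."""
        buffer = io.BytesIO()
        self.write(buffer)
        return buffer.getvalue()
```

With this split, write() always returns nothing and write_to_string() always
returns data, so neither method changes its return type based on arguments.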


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Mon Dec 15 15:33:51 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 15 Dec 2008 15:33:51 -0500
Subject: [Biopython-dev] [Bug 2591] GenBank files misparsed for long
	organism names
In-Reply-To: <bug-2591-42@http.bugzilla.open-bio.org/>
Message-ID: <200812152033.mBFKXpp4005791@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2591


------- Comment #4 from joelb at lanl.gov  2008-12-15 15:33 EST -------
I heard back from GenBank, and it seems they are saying the problem isn't
theirs:
>On Tue, December 9, 2008 10:30 am, gb-admin at ncbi.nlm.nih.gov wrote:
>> Hi Joel,
>>
>> I heard back from our database folks on this one.  Essentially we do
>> allow the source line to line-wrap, but we never publicly announced
>> it.  We apologize for this oversight and will be putting something
>> in the release notes regarding this.  Hopefully BioPython and other
>> companies will be able to pick up this change and adapt once it is
>> announced in the release notes.
>>
>> thanks for pointing it out
>>
>> Linda

I just wrote back with the followup question:
>

>OK, but then a followup question.  How does one distinguish, then, a
>line-wrapped organism line from the multiline phylogeny that follows? 
>According to my reading of the specs (and most Bio* GenBank parsers'
>implementations) it seems that an equally-valid parsing of the following
>ORGANISM record is that it belongs to the "AKU_12601 Bacteria" kingdom. 
>That is, there is no official way of signalling "this is the end of the
>multiline organism name" or "this begins the multiline phylogeny record."
>
>  ORGANISM  Salmonella enterica subsp. enterica serovar Paratyphi A str.
>            AKU_12601
>            Bacteria; Proteobacteria; Gammaproteobacteria;Enterobacteriales;
>            Enterobacteriaceae; Salmonella.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Wed Dec 17 18:44:58 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 17 Dec 2008 18:44:58 -0500
Subject: [Biopython-dev] [Bug 2591] GenBank files misparsed for long
	organism names
In-Reply-To: <bug-2591-42@http.bugzilla.open-bio.org/>
Message-ID: <200812172344.mBHNiwPt019616@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2591


------- Comment #5 from joelb at lanl.gov  2008-12-17 18:44 EST -------
I received the following response to my followup.  It now appears that the bug
is with BioPython, since GenBank has changed its definition.  It seems likely
that all Bio* flatfile parsers will be affected.

>I just received the wording that will appear in Section 3.4.2 of gbrel.txt 
>for this month's release:
>
>   ORGANISM     - Formal scientific name of the organism (first line)
>and taxonomic classification levels (second and subsequent lines).
>Mandatory subkeyword in all annotated entries/two or more records.
>
>   In the event that the organism name exceeds 68 characters (80 - 13 + 1)
>   in length, it will be line-wrapped and continue on a second line,
>   prior to the taxonomic classification. Unfortunately, very long 
>   organism names were not anticipated when the fixed-length GenBank
>   flatfile format was defined in the 1980s. The possibility of linewraps
>   makes the job of flatfile parsers more difficult : essentially, one
>   cannot be sure that the second line is truly a classification/lineage
>   unless it consists of multiple tokens, delimited by semi-colons.
>   The long-term solution to this problem is to introduce an additional
>   subkeyword, probably 'LINEAGE' . This might occur sometime in 2009
>   or 2010.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Thu Dec 18 06:07:16 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 18 Dec 2008 06:07:16 -0500
Subject: [Biopython-dev] [Bug 2591] GenBank files misparsed for long
	organism names
In-Reply-To: <bug-2591-42@http.bugzilla.open-bio.org/>
Message-ID: <200812181107.mBIB7G97005964@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2591


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk  2008-12-18 06:07 EST -------
(In reply to comment #5)
> I received the following response to my followup.  It now appears that the bug
> is with BioPython, since GenBank has changed its definition.  It seems likely
> that all Bio* flatfile parsers will be affected.

Thanks for chasing this up Joel :)

> I just received the wording that will appear in Section 3.4.2 of gbrel.txt 
> for this month's release:
> >
> >   ORGANISM     - Formal scientific name of the organism (first line)
> >and taxonomic classification levels (second and subsequent lines).
> >Mandatory subkeyword in all annotated entries/two or more records.
> >
> >   In the event that the organism name exceeds 68 characters (80-13+1)
> >   in length, it will be line-wrapped and continue on a second line,
> >   prior to the taxonomic classification. Unfortunately, very long 
> >   organism names were not anticipated when the fixed-length GenBank
> >   flatfile format was defined in the 1980s. The possibility of linewraps
> >   makes the job of flatfile parsers more difficult : essentially, one
> >   cannot be sure that the second line is truly a classification/lineage
> >   unless it consists of multiple tokens, delimited by semi-colons.
> >   The long-term solution to this problem is to introduce an additional
> >   subkeyword, probably 'LINEAGE' . This might occur sometime in 2009
> >   or 2010.


It looks like my guess was right, see comment #1:
> Let's wait and hear what the NCBI says - I expect they will have to change the
> file format definition slightly.
> 
> If they say this is a valid file, I hope they will also explain officially
> how we should split up the species and its lineage.  One option would be
> something like looking for semi-colons in the following text as indicative
> of the lineage (rather than as more of the ORGANISM).

Now that we've had the NCBI recommend the semi-colon approach, I've fixed our
parser in CVS:
Bio/GenBank/Record.py revision 1.14
Bio/GenBank/Scanner.py revision 1.26
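
For readers without CVS access, the semi-colon heuristic can be sketched
roughly as follows. This is an illustration only and not the code checked in
to Scanner.py; the function name split_organism() is hypothetical. It assumes
the first line is always part of the organism name and that the lineage
starts at the first later line containing a semi-colon, so a lineage made of
a single token would be misread as more of the name.

```python
def split_organism(lines):
    """Split ORGANISM text lines into (organism name, lineage list)."""
    organism_parts = [lines[0].strip()]
    lineage_parts = []
    in_lineage = False
    for line in lines[1:]:
        text = line.strip()
        # The first line with a semi-colon marks the start of the lineage.
        if not in_lineage and ";" in text:
            in_lineage = True
        if in_lineage:
            lineage_parts.append(text)
        else:
            organism_parts.append(text)
    organism = " ".join(organism_parts)
    # Drop the trailing full stop, then split the lineage on semi-colons.
    lineage_text = " ".join(lineage_parts).rstrip(".")
    lineage = [tok.strip() for tok in lineage_text.split(";") if tok.strip()]
    return organism, lineage
```

Applied to the record Joel quoted in comment #4, this keeps "AKU_12601" as
part of the organism name and starts the lineage at "Bacteria".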

Peter


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Thu Dec 18 14:01:32 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 18 Dec 2008 14:01:32 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main
	Biopython distribution
In-Reply-To: <bug-2671-42@http.bugzilla.open-bio.org/>
Message-ID: <200812181901.mBIJ1W31019801@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671


------- Comment #31 from biopython-bugzilla at maubp.freeserve.co.uk  2008-12-18 14:01 EST -------
(In reply to comment #27)
> This might be better off as a new enhancement bug, but here is a possible
> "arc-box" drawing function to go in the AbstractDrawer.py file, based on the
> existing draw_box function.
> 
> ...

There was an issue with different frames of reference in the initial code I was
suggesting.

> Alternately, the code could just go in CircularDrawer.py directly.

This seemed simpler in the short term.

> As far as I can tell from looking at their source code, even ReportLab_1_21_2
> has ArcPath defined in reportlab.graphics.shapes so there shouldn't be any
> issue here with backwards compatibility.

I've just checked in a patch based on this - see
Bio/Graphics/GenomeDiagram/CircularDrawer.py revision 1.8

I've also updated the unit test to draw a circular diagram with some features
in white (with an automatic black border).  This now looks nice - with the old
code using multiple boxes to fake the arced box, the whole feature ended up
looking black.  See Tests/test_GenomeDiagram.py revision 1.13

As a bonus, PDF output seems a little smaller now as well :)


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.

From bugzilla-daemon at portal.open-bio.org  Mon Dec 22 11:19:51 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 22 Dec 2008 11:19:51 -0500
Subject: [Biopython-dev] [Bug 2375] Coalescent support through Simcoal2
In-Reply-To: <bug-2375-42@http.bugzilla.open-bio.org/>
Message-ID: <200812221619.mBMGJp6k013225@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2375


------- Comment #25 from biopython-bugzilla at maubp.freeserve.co.uk  2008-12-22 11:19 EST -------
(In reply to comment #24)
> I committed my patch to setup.py, as it seems to work fine with Python 2.3,
> 2.4, and 2.5 on all platforms. Leaving this bug open, since we still need to
> remove the workaround in Bio/PopGen/SimCoal/__init__.py.

Editing Bio/PopGen/SimCoal/__init__.py to do just the following seems to work
fine on Linux and MacOS (I've not tested on Windows yet):

import os
builtin_tpl_dir = os.path.abspath(os.path.join(os.path.dirname(__file__),
"data"))

I *think* this directory is only used in one place in
Bio/PopGen/SimCoal/Template.py so it might make more sense to put this code in
that function (leaving the __init__.py file essentially empty).  What do you
think Tiago?


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Mon Dec 22 12:20:46 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 22 Dec 2008 12:20:46 -0500
Subject: [Biopython-dev] [Bug 2532] Using IUPAC alphabets in mixed case Seq
	objects
In-Reply-To: <bug-2532-42@http.bugzilla.open-bio.org/>
Message-ID: <200812221720.mBMHKkwo018936@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2532


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #961 is|0                           |1
           obsolete|                            |


------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk  2008-12-22 12:20 EST -------
(From update of attachment 961)
This patch is now obsolete - I've checked in a variant of this into CVS.

This will allow us to proceed with Bug 2597 (
Enforce alphabet letters in Seq objects) without having to first introduce
mixed case variants of the IUPAC alphabets.

If/when we have mixed case IUPAC alphabets, then Bio.Sequencing.PhD could use
them.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Mon Dec 22 12:33:33 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 22 Dec 2008 12:33:33 -0500
Subject: [Biopython-dev] [Bug 2532] Using IUPAC alphabets in mixed case Seq
	objects
In-Reply-To: <bug-2532-42@http.bugzilla.open-bio.org/>
Message-ID: <200812221733.mBMHXXjd020146@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2532


------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk  2008-12-22 12:33 EST -------
Created an attachment (id=1174)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1174&action=view)
Patch for Bio/Nexus/Nexus.py (non IUPAC) alphabet handling

(In reply to comment #2)
> I opt for (b): an easy one-time addition to Bio.Alphabets, easy to use for
> everyone (instead creating their own uppercase-lowercase variants of those
> terribly complicated biopython alphabet classes), and easy to change for all
> other modules if lowercase-uppercase is what they want (or need).

I'm not saying we shouldn't add mixed (and even lower) case variants of the
IUPAC alphabets, however, even if we had them, NEXUS still uses extra
characters like "-" for gaps (easily handled via a Gapped alphabet encoder) and
"?" (for a missing character).  Are there any other extra characters?

Under the current alphabet schema, we'd have to use a (mixed case) IUPAC
alphabet, then add a Gapped AlphabetEncoder (easy) then add a new alphabet
encoder for any miscellaneous non-IUPAC characters like "?".  This could be done
with the generic AlphabetEncoder, or we could add additional encoder objects
for special meanings.  This starts to get complicated (dealing with
AlphabetEncoders is nasty).

This attached patch is a variation on my "plan (a)" from comment 0. It makes
Bio.Nexus create its own alphabet objects (based on the generic DNA/RNA/Protein
classes) with the precise list of valid letters required for that file.  Using
this patch should allow us to press ahead with Bug 2597.
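
A minimal sketch of the "plan (a)" idea, assuming nothing about the actual
patch: the classes below are hypothetical stand-ins rather than the real
Bio.Alphabet classes, and make_file_alphabet is an illustrative name.  It
builds an alphabet whose letters are exactly those a given file allows, plus
the gap and missing characters.

```python
class Alphabet(object):
    """Hypothetical stand-in for a generic alphabet (not Bio.Alphabet)."""
    letters = None  # None means "any letter is allowed"


class DNAAlphabet(Alphabet):
    """Hypothetical stand-in for the generic DNA alphabet."""


def make_file_alphabet(base_class, valid_letters, gap="-", missing="?"):
    """Return an alphabet instance restricted to one file's valid letters."""
    alphabet = base_class()
    letters = []
    # Keep the original order, drop duplicates, append the special symbols.
    for letter in valid_letters + gap + missing:
        if letter not in letters:
            letters.append(letter)
    alphabet.letters = "".join(letters)
    alphabet.gap_char = gap
    alphabet.missing_char = missing
    return alphabet


nexus_dna = make_file_alphabet(DNAAlphabet, "ACGTacgt")
print(nexus_dna.letters)
```

The result is still an instance of the generic DNA class, so code which only
checks for "some kind of DNA alphabet" keeps working.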



From bugzilla-daemon at portal.open-bio.org  Mon Dec 22 12:38:10 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 22 Dec 2008 12:38:10 -0500
Subject: [Biopython-dev] [Bug 2597] Enforce alphabet letters in Seq objects
In-Reply-To: <bug-2597-42@http.bugzilla.open-bio.org/>
Message-ID: <200812221738.mBMHcA86020507@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2597


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2008-12-22 12:38 EST -------
Created an attachment (id=1175)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1175&action=view)
Patch for Bio/Seq.py to check the alphabet letters

This is a simple approach to checking the letters - probably not the fastest. 
I think it is important that the exception gives some clue about why the Seq
object was not created - either listing the first invalid character (as in this
patch) or listing all invalid characters (which could be done using sets).

On the other hand, I'd like this check to be as fast as possible - perhaps even
at the cost of a generic exception message like "Sequence contains letters
which are not valid for the given alphabet".
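
To make the trade-off concrete, here is a rough sketch of both reporting
options.  The function names are illustrative only and are not taken from the
attached patch; the alphabet is represented simply as a string of letters.

```python
def first_invalid_letter(data, letters):
    """Return the first character of data not in letters, or None."""
    valid = set(letters)
    for char in data:
        if char not in valid:
            return char
    return None


def all_invalid_letters(data, letters):
    """Return a sorted list of every distinct invalid character."""
    return sorted(set(data) - set(letters))


def check_sequence(data, letters):
    """Raise ValueError naming the invalid characters, if there are any."""
    bad = all_invalid_letters(data, letters)
    if bad:
        raise ValueError("Sequence contains letters which are not valid "
                         "for the given alphabet: %s"
                         % ", ".join(repr(char) for char in bad))


print(first_invalid_letter("ACGTXN", "ACGT"))
print(all_invalid_letters("ACGTXN", "ACGT"))
```

The set difference avoids a Python-level loop over the sequence, so it may
well be the faster of the two, although that would need timing on real data.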



From bugzilla-daemon at portal.open-bio.org  Mon Dec 22 13:27:11 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 22 Dec 2008 13:27:11 -0500
Subject: [Biopython-dev] [Bug 2532] Using IUPAC alphabets in mixed case Seq
	objects
In-Reply-To: <bug-2532-42@http.bugzilla.open-bio.org/>
Message-ID: <200812221827.mBMIRBme024497@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2532


------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk  2008-12-22 13:27 EST -------
Created an attachment (id=1176)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1176&action=view)
Adding lower and mixed case IUPAC Alphabets

This needs reviewing by someone else - especially the multiple inheritance
which tries to follow the existing pattern that the parent is a more general
version of the child.



From bugzilla-daemon at portal.open-bio.org  Tue Dec 23 04:58:31 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 23 Dec 2008 04:58:31 -0500
Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and
	write_to_string() are inefficient and don't check inputs
In-Reply-To: <bug-2711-42@http.bugzilla.open-bio.org/>
Message-ID: <200812230958.mBN9wVDK000340@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2711


------- Comment #13 from bsouthey at gmail.com  2008-12-23 04:58 EST -------
(In reply to comment #6)
> (In reply to comment #2)
> > *** Bug 2710 has been marked as a duplicate of this bug. ***
> > 
> 
> (In reply to comment #0)
> > test_GenomeDiagram fails because the renderPM module is not part of standard
> > install of reportlab, at least under Linux. 
> 
> That's odd - renderPM is in the source for ReportLab 2.2.  Are you using an
> up-to-date version?  It seems to install well enough on our 64-bit Linux box
> from the ReportLab source.


I cannot check this as I am away from my system. As I recall, the Python code
for accessing this library is provided with the standard install as there is a
renderPM.py file. But that is just a wrapper to some C code found in the
rl_addons directory. So renderPM is not actually available unless you build
the C sources or download the binaries (only valid for Windows).

According to the website
http://www.reportlab.org/subversion.html
"
It will create subdirectories for reportlab, which is an importable
python package, and rl_addons which contains the C extensions. The
latter need building with the contained setup script, but can also be
downloaded in pre-built form from our downloads page. They rarely
change.
"

What did you actually install?
In particular where was _renderPM built?
Basically we need to document this as there appear to be different ways to
install reportlab (may also be version or svn related).

> 
> > I consider that the renderPM module should not be required so
> > Graphics/GenomeDiagram/Diagram.py needs to be rewritten to avoid using the
> > renderPM module when it is not available. 
> 
> renderPM is how raster graphics are drawn, so is, I'm afraid, a necessary part
> of GenomeDiagram's functionality.

No problem then, but you must provide a test for the presence and functionality
of it in the actual code as well as the biopython tests.

> 
> I prefer your alternative suggestion of making it a 'dynamic' import, but even
> then I think that the inconvenience of preparing the diagram, only to find out
> at the last possible stage that you can't draw it because you're missing the
> library, is worse than getting the error message upfront.  Not that this should
> be a problem, since renderPM is part of the main ReportLab source, now.  YMMV
> though, and I'm happy for the code to conform to the Biopython house style.
> 
> > The installation documentation needs to include something about needing the
> > renderPM for JPG, BMP, GIF, PNG or TIFF outputs.
> > 
> > There must be a test for the presence of the renderPM module.
> 
> I'm not convinced of the value of this, as renderPM is part of the current
> ReportLab source installation.
> 

My understanding is that this statement is not completely true.  But I would
like confirmation either way. There may also be allowance for windows
installations especially non-source ones but I can not check those.


Bruce



From bugzilla-daemon at portal.open-bio.org  Tue Dec 23 05:18:58 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 23 Dec 2008 05:18:58 -0500
Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and
	write_to_string() are inefficient and don't check inputs
In-Reply-To: <bug-2711-42@http.bugzilla.open-bio.org/>
Message-ID: <200812231018.mBNAIwuq002193@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2711


------- Comment #14 from bsouthey at gmail.com  2008-12-23 05:18 EST -------
(In reply to comment #12)
> (In reply to comment #9)
> > (In reply to comment #3)
> > > As an aside, I'd like write_to_string() to support a DPI argument like
> > > write() does.
> > 
> > The way I originally intended write_to_string() to be used - sending graphics
> > to a browser - the DPI has no influence at all.  DPI is only of any importance
> > for printing graphics ...
> 
> OK, so it's less useful than I had expected.  Rendering bitmaps to strings so they
> can be inserted into a database as blobs is one potential use-case.  Also for a
> web-service where you expect the user to save and print the naked image
> (unusual, and probably software dependent on how the DPI is treated).
> 

Surely it is important because a user can write to a string and then save the
string to a file rather than using write() a second time. 

What do these options do?
bg, configPIL, showBoundary

> > In that case, I think that an appropriate merging of the write() and
> > write_to_string() methods could be:
> > 
> > def write(self, filename=None, output=default_output, dpi=default_dpi,
> > encoding=default_encoding):
> > 
> > encoding could then be either 'binary' (default), or 'string' - which would
> > emulate write_to_string()'s function.
> > 
> > Where handle is not None, the resulting output would be sent to the passed
> > handle - which could potentially include sys.stdout.  Where handle is None,
> > the method could return the encoded image directly, as write_to_string()
> > does, now.
> > 
> > Other than the obvious problem with ReportLab's drawToFile requiring a
> > filename, rather than a handle - does this seem like a reasonable plan to
> > others?
> 
> On the plus side, this would be backwards compatible (and we could deprecate
> the draw_to_string function).
> 
> However, I'm not so keen on this style personally - the return value is
> radically different depending on the arguments (nothing, or a string of data).
> 
> If we were designing this from scratch, I would have suggested one write
> function which wrote to a handle - which would let you then write to a file or
> a string (using StringIO).  On the other hand, this is perhaps a little low
> level.  We've had similar discussions regarding Bio.SeqIO in the past.
> 

I agree and I am not very concerned about backwards compatibility since this is
a very new function to Biopython. I think that is almost what
write_to_string() does and python functions are very big. But this is not my
code so please do as you want here.

Bruce



From bugzilla-daemon at portal.open-bio.org  Tue Dec 23 06:12:33 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 23 Dec 2008 06:12:33 -0500
Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and
	write_to_string() are inefficient and don't check inputs
In-Reply-To: <bug-2711-42@http.bugzilla.open-bio.org/>
Message-ID: <200812231112.mBNBCXkt006916@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2711


------- Comment #15 from biopython-bugzilla at maubp.freeserve.co.uk  2008-12-23 06:12 EST -------
(In reply to comment #14)
> (In reply to comment #12)
> > OK, so it's less useful than I had expected.  Rendering bitmaps to strings so
> > they can be inserted into a database as blobs is one potential use-case.
> > Also for a web-service where you expect the user to save and print the
> > naked image (unusual, and probably software dependent on how the DPI is
> > treated).
> 
> Surely it is important because a user can write to a string and then save the
> string to a file rather than using write() a second time. 

I was talking about write to string with a DPI not being so useful.

Using write to string is VERY useful, particularly for a webserver (which is
why Leighton added it, and how I have used it).  Setting the DPI isn't
important for using images in webpages - HTML and CSS provide lots of ways to
control the displayed and printed size.  Even if the browser is pointed
directly at the image (and not as part of a webpage) and you then print it, the
browser may ignore the DPI setting (probably browser specific).  i.e. The DPI
will only matter if the user saves the image and opens it in DPI aware
software.

(In reply to comment #14)
> (In reply to comment #12)
> > However, I'm not so keen on this style personally - the return value is
> > radically different depending on the arguments (nothing, or a string of
> > data).
> > 
> > If we were designing this from scratch, I would have suggested one write
> > function which wrote to a handle - which would let you then write to a
> > file or a string (using StringIO).  On the other hand, this is perhaps a
> > little low level.  We've had similar discussions regarding Bio.SeqIO in
> > the past.
> 
> I agree and I am not very concerned about backwards compatibility since this
> is a very new function to Biopython. I think that is almost what
> write_to_string() does and python functions are very big. But this is not my
> code so please do as you want here.

GenomeDiagram is new to Biopython, but has been available independently for
many years.  There will be some existing users (not just me and Leighton), and
the less they have to change to switch their code from using standalone
GenomeDiagram to the one within Biopython the better (the import lines have to
change for example).  We do need to think about backwards compatibility a bit.

Getting back to your original points,

(1) Two functions write() and write_to_string()
This follows the reportlab API, and they do actually return different
encodings.  From a backwards compatibility argument they should both stay, but
that doesn't stop us providing a unified method and deprecating 
write_to_string().

(2) Coding style of write() and write_to_string()
I don't have a problem with this - it works, it's clear, it's easily extended if
ReportLab add more back ends.  It doesn't strike me as ugly.  Inevitably this
is largely a matter of preference.

(3) The KeyError exception with invalid arguments.
This is fixed in CVS, for an invalid format argument you now get a ValueError
which is standard python practice.

(4) renderPM
Fixed in CVS, in that you can now use GenomeDiagram without ReportLab renderPM,
and have full functionality except for bitmap output.  Given we don't seem to
be able to assume renderPM will be installed and working, this seems a
reasonable solution.  If you try and render a bitmap without renderPM, then you
get a MissingExternalDependencyError exception asking you to install renderPM. 
We will need to look into this further for the documentation.
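
For illustration, the deferred check in (4) could look roughly like the sketch
below.  This is not the code committed to CVS: load_bitmap_renderer is a
made-up helper, the exception class is a local stand-in for Biopython's
MissingExternalDependencyError, and it assumes a missing renderPM shows up as
an ImportError at import time.

```python
import importlib


class MissingExternalDependencyError(Exception):
    """Local stand-in for the Biopython exception of the same name."""


def load_bitmap_renderer(module_name="reportlab.graphics.renderPM"):
    """Import the bitmap renderer only when bitmap output is requested."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        raise MissingExternalDependencyError(
            "Bitmap output requires the ReportLab renderPM module; "
            "please install it.")
```

Vector formats never call this helper, so they keep working on a system
without renderPM; only a request for PNG, JPG, etc. triggers the error.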

Peter



From bugzilla-daemon at portal.open-bio.org  Tue Dec 23 07:45:55 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 23 Dec 2008 07:45:55 -0500
Subject: [Biopython-dev] [Bug 2718] New: Bio.Graphics and output file
	formats (PDF, EPS, SVG, and bitmaps)
Message-ID: <bug-2718-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2718

           Summary: Bio.Graphics and output file formats (PDF, EPS, SVG, and
                    bitmaps)
           Product: Biopython
           Version: Not Applicable
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk


In addition to PDF and PS/EPS (encapsulated postscript), ReportLab can also do
SVG, and with its optional renderPM module can do assorted bitmaps too (e.g.
PNG, JPG, TIFF, GIF, BMP).  Note that renderPM may not be installed (see Bug
2710).

The recently added Bio.Graphics.GenomeDiagram module supports all of these
formats - see Diagram.py with write (to filename or a handle) and
write_to_string methods.

Looking at the older Bio.Graphics code, it currently only supports PDF and
postscript, using an inconsistent mixture of method names:

Bio.Graphics.Distribution has a DistributionPage object with a draw method
(which writes to a filename or handle).

Bio.Graphics.BasicChromosome has an Organism object with a write method (which
writes to a filename or handle).

Bio.Graphics.Comparative has a ComparativeScatterPlot object with a
draw_to_file method (which writes to a filename or handle).

I would like:

(1) All the Bio.Graphics "write to file/handle" functions to accept any of the
supported file formats (like Bio.Graphics.GenomeDiagram), which would require
renderPM at run time for the bitmap formats (see Bug 2710).  They should share
some code for mapping format names to ReportLab rendering module.  This would
be easy to do without changing the existing mix of method names.

(2) Update the docstrings for the "write to file/handle" functions to make it
clear they can accept a filename OR a handle (a result of the underlying
reportlab renderer's drawToFile function's behaviour - see note below).

(3) Standardise on the method naming (and perhaps deprecate the old methods). 
Using "write" seems to be a sensible choice based on the current names used in
Bio.Graphics.
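
As a sketch of what (3) might look like in practice, an old method name can be
kept as a thin wrapper which warns and then calls the new one.  The class
below is a hypothetical stand-in, not the real ComparativeScatterPlot, and its
write method only returns a description instead of drawing anything.

```python
import warnings


class ComparativePlot(object):
    """Hypothetical stand-in for a Bio.Graphics class."""

    def write(self, output_file, output_format="pdf"):
        """New, standardised method name (placeholder body)."""
        return "wrote %s as %s" % (output_file, output_format)

    def draw_to_file(self, output_file, output_format="pdf"):
        """Old method name, kept for backwards compatibility."""
        warnings.warn("draw_to_file is deprecated; use write instead.",
                      DeprecationWarning, stacklevel=2)
        return self.write(output_file, output_format)
```

Existing scripts calling draw_to_file keep working, and the warning gives
users notice before the old name is eventually removed.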

For reference/comparison, ReportLab's render modules have three related
functions:

* drawToString - Returns a string, calls drawToFile internally with a StringIO
handle.
* drawToFile - Takes a filename OR a handle (although their docstrings do not
make this clear, this works as the Canvas object takes either).  Calls the draw
function internally.
* draw - Takes a canvas object
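
The layering described above can be mimicked in a few lines.  This is only an
illustration of the pattern, not ReportLab's code: the three functions are
simplified stand-ins, the "canvas" is just a list of byte strings, and the
"drawing" is a plain text string.

```python
from io import BytesIO


def draw(drawing, canvas):
    """Lowest level: render the drawing onto a canvas (here, a list)."""
    canvas.append(drawing.encode("utf-8"))


def draw_to_file(drawing, target):
    """Render to target, which may be a filename OR an open handle."""
    canvas = []
    draw(drawing, canvas)
    data = b"".join(canvas)
    if hasattr(target, "write"):
        target.write(data)  # already a handle; the caller closes it
    else:
        with open(target, "wb") as handle:
            handle.write(data)


def draw_to_string(drawing):
    """Render to an in-memory handle and return its contents."""
    handle = BytesIO()
    draw_to_file(drawing, handle)
    return handle.getvalue()


print(draw_to_string("Hello World!"))
```

Because the string version is built on the file version, both necessarily
produce identical data.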

See also Bug 2711 which touched on these issues in the context of GenomeDiagram
only.



From bugzilla-daemon at portal.open-bio.org  Tue Dec 23 07:47:26 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 23 Dec 2008 07:47:26 -0500
Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and
	write_to_string() are inefficient and don't check inputs
In-Reply-To: <bug-2711-42@http.bugzilla.open-bio.org/>
Message-ID: <200812231247.mBNClPt9017108@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2711


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #16 from biopython-bugzilla at maubp.freeserve.co.uk  2008-12-23 07:47 EST -------
In comment #12, I wrote:
> If we were designing this from scratch, I would have suggested one write
> function which wrote to a handle - which would let you then write to a file or
> a string (using StringIO).  On the other hand, this is perhaps a little low
> level.  We've had similar discussions regarding Bio.SeqIO in the past.

The reportlab docstrings are very unclear, however, their renderer's drawToFile
functions take either a filename OR a handle.  This works because the
underlying Canvas object can be created giving either a filename or a handle.

As a result, GenomeDiagram's write() method should accept either a filename or
a handle.  We should update the docstring to say this (perhaps even renaming
the argument?).

(In reply to comment #15)
> (1) Two functions write() and write_to_string()
> This follows the reportlab API, and they do actually return different
> encodings.

I wrote this based on something Leighton had said to me.  Going over the
reportlab code, this isn't true - reportlab's drawToString just calls
drawToFile with a cStringIO or StringIO handle.  They write identical data.

(In reply to comment #15)
> Getting back to your original points,
> 
> (1) Two functions write() and write_to_string()
> This follows the reportlab API, and they do actually return different
> encodings.  From a backwards compatibility argument they should both stay, but
> that doesn't stop us providing a unified method and deprecating 
> write_to_string().

I've filed Bug 2718 for the general issue of method naming for the Bio.Graphics
modules output functionality.

> (2) Coding style of write() and write_to_string()
> I don't have a problem with this - it works, it's clear, it's easily extended if
> ReportLab add more back ends.  It doesn't strike me as ugly.  Inevitably this
> is largely a matter of preference.

Leaving this as is - the code itself may end up handled via shared function for
all of Bio.Graphics via Bug 2718.

> (3) The KeyError exception with invalid arguments.
> This is fixed in CVS, for an invalid format argument you now get a ValueError
> which is standard python practice.
> 
> (4) renderPM
> Fixed in CVS, in that you can now use GenomeDiagram without ReportLab
> renderPM and have full functionality except for bitmap output.  Given we 
> don't seem to be able to assume renderPM will be installed and working, this
> seems a reasonable solution.  If you try and render a bitmap without
> renderPM, then you get a MissingExternalDependencyError exception asking you
> to install renderPM.  We will need to look into this further for the
> documentation.

Marking this bug as FIXED.



From bugzilla-daemon at portal.open-bio.org  Tue Dec 23 07:55:11 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 23 Dec 2008 07:55:11 -0500
Subject: [Biopython-dev] [Bug 2718] Bio.Graphics and output file formats
	(PDF, EPS, SVG, and bitmaps)
In-Reply-To: <bug-2718-42@http.bugzilla.open-bio.org/>
Message-ID: <200812231255.mBNCtB1L017851@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2718


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2008-12-23 07:55 EST -------
Example script showing the reportlab render modules producing output given a
filename, handle, or via a string:

from reportlab.pdfgen.canvas import Canvas
from reportlab.lib.units import cm
from reportlab.graphics import renderPS, renderPDF, renderPM
from reportlab.graphics.shapes import Drawing, String

width = 10*cm
height = 2*cm

print("Using canvas directly (PDF only)...")
c = Canvas("hello1.pdf", pagesize=(width, height))
c.drawString(1*cm, 1*cm, "Hello World!")
c.showPage()
c.save()

# Create a very simple drawing object
drawing = Drawing(width, height)
drawing.add(String(1*cm, 1*cm, "Hello World!"))

print("Using filenames...")
renderPDF.drawToFile(drawing, "hello2.pdf")
renderPM.drawToFile(drawing, "hello2.png", "PNG")

# Handles are opened in binary mode, since the PDF and PNG output is not
# plain text (text mode can corrupt it, e.g. on Windows)
print("Using handles...")
handle = open("hello3.pdf","wb")
renderPDF.drawToFile(drawing, handle)
handle.close()
handle = open("hello3.ps","wb")
renderPS.drawToFile(drawing, handle)
handle.close()
handle = open("hello3.png","wb")
renderPM.drawToFile(drawing, handle, "PNG")
handle.close()

print("Using strings...")
handle = open("hello4.pdf","wb")
handle.write(renderPDF.drawToString(drawing))
handle.close()
handle = open("hello4.ps","wb")
handle.write(renderPS.drawToString(drawing))
handle.close()
handle = open("hello4.png","wb")
handle.write(renderPM.drawToString(drawing, "PNG"))
handle.close()

print("Done")



From bugzilla-daemon at portal.open-bio.org  Tue Dec 23 08:14:06 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 23 Dec 2008 08:14:06 -0500
Subject: [Biopython-dev] [Bug 2718] Bio.Graphics and output file formats
	(PDF, EPS, SVG, and bitmaps)
In-Reply-To: <bug-2718-42@http.bugzilla.open-bio.org/>
Message-ID: <200812231314.mBNDE64X019775@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2718


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2008-12-23 08:14 EST -------
(In reply to comment #0)
> (1) All the Bio.Graphics "write to file/handle" functions to accept any of the
> supported file formats (like Bio.Graphics.GenomeDiagram), which would require
> renderPM at run time for the bitmap formats (see Bug 2710).  They should share
> some code for mapping format names to ReportLab rendering module.  This would
> be easy to do without changing the existing mix of method names.

In addition, I notice that Bio.Graphics.BasicChromosome,
Bio.Graphics.Comparative and Bio.Graphics.Distribution expect lower case
formats (currently just pdf and eps) while Bio.Graphics.GenomeDiagram expects
upper case.  We should be consistent, which for backwards compatibility would
mean accepting either case.
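
A possible shape for the shared lookup, as a sketch only: the table and the
function name renderer_name are illustrative, and the values are the names of
the ReportLab render modules given as plain strings so that the example runs
without ReportLab installed.

```python
# Format name (upper case) -> name of the ReportLab render module to use.
_RENDERERS = {
    "PDF": "renderPDF",
    "PS": "renderPS",
    "EPS": "renderPS",
    "SVG": "renderSVG",
    # The bitmap formats all need the optional renderPM module.
    "PNG": "renderPM",
    "JPG": "renderPM",
    "TIFF": "renderPM",
    "GIF": "renderPM",
    "BMP": "renderPM",
}


def renderer_name(output_format):
    """Map a format name in either case to a render module name."""
    try:
        return _RENDERERS[output_format.upper()]
    except KeyError:
        raise ValueError("Unsupported format %r, expected one of: %s"
                         % (output_format, ", ".join(sorted(_RENDERERS))))


print(renderer_name("pdf"))
print(renderer_name("PNG"))
```

Normalising the case in one place means "pdf" and "PDF" behave the same in
every Bio.Graphics module, and an unknown format gives a ValueError rather
than a KeyError.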

> (2) Update the docstrings for the "write to file/handle" functions to make it
> clear they can accept a filename OR a handle (a result of the underlying
> reportlab renderer's drawToFile function's behaviour - see note below).

I've updated the docstrings in CVS,

Bio/Graphics/BasicChromosome.py revision 1.3
Bio/Graphics/Comparative.py revision 1.2
Bio/Graphics/Distribution.py revision 1.3
Bio/Graphics/GenomeDiagram/Diagram.py revision 1.3



From mjldehoon at yahoo.com  Wed Dec 24 05:52:48 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Wed, 24 Dec 2008 02:52:48 -0800 (PST)
Subject: [Biopython-dev] Rethinking Biopython's testing framework
In-Reply-To: <442447.52362.qm@web62407.mail.re1.yahoo.com>
Message-ID: <451304.38587.qm@web62407.mail.re1.yahoo.com>

Hi everybody,

How about the following for Biopython tests:

For Python's unittest-style test modules, Python's unittest documentation recommends defining a function in each test module that returns the test suite. Most Biopython tests that use the unittest framework already do this (the function is called "testing_suite").

We could now do the following in run_tests.py:

1) import the testing module and save its output
2) try to call module.testing_suite
3) if it exists, then we're using Python's unittest framework. So we run the tests in the testing suite.
4) if it does not exist, then we're using the print-and-compare approach. So we compare the saved output from the test to the correct output.

I think that this can be set up such that it looks like nothing has changed for the user, while the files containing the correct output are no longer needed for the unittest-based tests.

Questions, comments, objections, anybody?

--Michiel.


--- On Thu, 12/4/08, Michiel de Hoon <mjldehoon at yahoo.com> wrote:

> From: Michiel de Hoon <mjldehoon at yahoo.com>
> Subject: Re: [Biopython-dev] Rethinking Biopython's testing framework
> To: "Brad Chapman" <chapmanb at 50mail.com>, "Peter" <biopython at maubp.freeserve.co.uk>
> Cc: biopython-dev at lists.open-bio.org
> Date: Thursday, December 4, 2008, 7:32 AM
> > Michiel de Hoon wrote:
> > > If one of the sub-tests fails, Python's unit
> > > testing framework will tell us so,
> > > though (perhaps) not exactly which sub-test
> fails.
> > > However, that is easy to
> > > figure out just by running the individual test
> script
> > > by itself.
> > 
> > That won't always work.  Consider intermittent
> network
> > problems, or tests using random data - in general it 
> > really is worthwhile having run_tests.py report a
> little
> > more than just which test_XXX.py module failed.
> >
> I wonder if Python's unit testing framework allows us
> to capture exactly which sub-test fails. I'll look into
> that. Ideally, it should be possible to have regular Python
> unit tests and Biopython-style print-and-compare tests side
> by side, and get information about failing sub-tests for
> both.
> 
> --Michiel.
> 
> 
>       
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev


From dalloliogm at gmail.com  Thu Dec 25 14:22:04 2008
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Thu, 25 Dec 2008 20:22:04 +0100
Subject: [Biopython-dev] Rethinking Biopython's testing framework
In-Reply-To: <451304.38587.qm@web62407.mail.re1.yahoo.com>
References: <442447.52362.qm@web62407.mail.re1.yahoo.com>
	<451304.38587.qm@web62407.mail.re1.yahoo.com>
Message-ID: <5aa3b3570812251122s43352380ke843c167e85569b5@mail.gmail.com>

On Wed, Dec 24, 2008 at 11:52 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> Hi everybody,
>
> How about the following for Biopython tests:
>
> For Python's unittest-style test modules, Python's unittest documentation recommends defining a function in each test module that returns the test suite. Most Biopython tests that use the unittest framework already do this (the function is called "testing_suite").

Merry Christmas!
Some people suggested the nose Python framework to me:
- http://somethingaboutorange.com/mrl/projects/nose/

It is used by many other open source projects, like sqlalchemy and elixir.
I haven't tried it, but I think it does more or less everything you
described automatically; we could try to adopt it.




-- 

My blog on bioinformatics (now in English): http://bioinfoblog.it

From mjldehoon at yahoo.com  Fri Dec 26 09:32:02 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 26 Dec 2008 06:32:02 -0800 (PST)
Subject: [Biopython-dev] Rethinking Biopython's testing framework
In-Reply-To: <5aa3b3570812251122s43352380ke843c167e85569b5@mail.gmail.com>
Message-ID: <726361.18977.qm@web62402.mail.re1.yahoo.com>

--- On Thu, 12/25/08, Giovanni Marco Dall'Olio <dalloliogm at gmail.com> wrote:
> Some people suggested the nose Python framework to me:
> - http://somethingaboutorange.com/mrl/projects/nose/
> 
> It is used by many other open source projects, like
> sqlalchemy and elixir.
> I haven't tried it but I think it does more or less
> everything you
> said automatically, we could try to adopt it.

If we use nose, does that mean adding another dependency to Biopython? If so, I don't think it's worth it. If not, how does this work?

--Michiel.


From dalloliogm at gmail.com  Fri Dec 26 12:52:58 2008
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Fri, 26 Dec 2008 18:52:58 +0100
Subject: [Biopython-dev] Rethinking Biopython's testing framework
In-Reply-To: <726361.18977.qm@web62402.mail.re1.yahoo.com>
References: <5aa3b3570812251122s43352380ke843c167e85569b5@mail.gmail.com>
	<726361.18977.qm@web62402.mail.re1.yahoo.com>
Message-ID: <5aa3b3570812260952s5cc5fcc9k71f3e8c3a988e63c@mail.gmail.com>

On Fri, Dec 26, 2008 at 3:32 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> --- On Thu, 12/25/08, Giovanni Marco Dall'Olio <dalloliogm at gmail.com> wrote:
>> Some people suggested the nose Python framework to me:
>> - http://somethingaboutorange.com/mrl/projects/nose/
>>
>> It is used by many other open source projects, like
>> sqlalchemy and elixir.
>> I haven't tried it, but I think it does more or less
>> everything you said automatically, so we could try to
>> adopt it.
>
> If we use nose, does that mean adding another dependency to Biopython? If so, I don't think it's worth it. If not, how does this work?

nose is a testing framework, so it is a dependency only for developers.
I have been able to install sqlalchemy and elixir (projects that make
use of nose) without having to install this framework first.

The docs on nose's website can explain its usage better than I can.
Basically, you have to install nose (easy_install nose) and then run
it as a shell command (nosetests).
It automatically reads all the files in the current directory and
subdirectories, collects all the functions/classes/etc. whose names
begin or end with 'test_' ('_test'), plus any unittest test cases, and
executes them. It can also read doctests, and it is possible to write
plugins and apply a high degree of customization.
I tried running it over the latest Biopython CVS, and it already
highlighted some problems (a few modules still using Martel, etc.).

I forgot to say that this project is also hosted on google/code:
- http://code.google.com/p/python-nose/
You can find more information in the docs:
- http://code.google.com/p/python-nose/wiki/FindingAndRunningTests


p.p.s. Even if it were a dependency, I think it would be worth using
anyway, rather than rewriting existing code.

> --Michiel.
>
>
>
>


-- 

My blog on bioinformatics (now in English): http://bioinfoblog.it

From mjldehoon at yahoo.com  Fri Dec 26 16:40:57 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 26 Dec 2008 13:40:57 -0800 (PST)
Subject: [Biopython-dev] Rethinking Biopython's testing framework
In-Reply-To: <5aa3b3570812260952s5cc5fcc9k71f3e8c3a988e63c@mail.gmail.com>
Message-ID: <590227.1906.qm@web62402.mail.re1.yahoo.com>

--- On Fri, 12/26/08, Giovanni Marco Dall'Olio <dalloliogm at gmail.com> wrote:
> > If we use nose, does that mean adding another dependency to
> > Biopython? If so, I don't think it's worth it. If not, how does
> > this work?
> 
> nose is a testing framework, so it is a dependency only for
> developers.

If we use nose, can our users still run the Biopython tests (without having to install nose first)?

--Michiel.


From dalloliogm at gmail.com  Sat Dec 27 03:48:09 2008
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Sat, 27 Dec 2008 09:48:09 +0100
Subject: [Biopython-dev] Rethinking Biopython's testing framework
In-Reply-To: <590227.1906.qm@web62402.mail.re1.yahoo.com>
References: <5aa3b3570812260952s5cc5fcc9k71f3e8c3a988e63c@mail.gmail.com>
	<590227.1906.qm@web62402.mail.re1.yahoo.com>
Message-ID: <5aa3b3570812270048x20c10c52h25c8e30a29697a45@mail.gmail.com>

On Fri, Dec 26, 2008 at 10:40 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> --- On Fri, 12/26/08, Giovanni Marco Dall'Olio <dalloliogm at gmail.com> wrote:
>> > If we use nose, does that mean adding another dependency to
>> > Biopython? If so, I don't think it's worth it. If not, how does
>> > this work?
>>
>> nose is a testing framework, so it is a dependency only for
>> developers.
>
> If we use nose, can our users still run the Biopython tests (without having to install nose first)?

Yes, but they will have to do it manually, or with a wrapper script
(as it is now).

Basically, we will have to move every test into functions/classes with
names beginning with 'test_'. To be more precise, they should match
the regular expression '(?:^|[\b_.-])[Tt]est' (it is also possible to
customize this regex).

So, if a test currently looks like this:

if __name__ == '__main__':
    seq = Seq('sadasda')
    assert seq.tostring() == 'sadasda'

we will have to refactor it like this:

def _test():
    """test description"""
    seq = Seq('sadasda')
    assert seq.tostring() == 'sadasda'

if __name__ == '__main__':
    _test()   # this is optional


> --Michiel.
>
>
>
>


-- 

My blog on bioinformatics (now in English): http://bioinfoblog.it

From mjldehoon at yahoo.com  Sun Dec 28 11:04:14 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sun, 28 Dec 2008 08:04:14 -0800 (PST)
Subject: [Biopython-dev] Rethinking Biopython's testing framework
In-Reply-To: <5aa3b3570812270048x20c10c52h25c8e30a29697a45@mail.gmail.com>
Message-ID: <877679.6134.qm@web62406.mail.re1.yahoo.com>

--- On Sat, 12/27/08, Giovanni Marco Dall'Olio <dalloliogm at gmail.com> wrote:
> >> > If we use nose, does that mean adding another
> >> > dependency to Biopython? If so, I don't think 
> >> > it's worth it. If not, how does this work?
> >>
> >> nose is a testing framework, so it is a dependency
> >> only for developers.
> >
> > If we use nose, can our users still run the Biopython
> > tests (without having to install nose first)?
> 
> Yes, but they will have to do it manually, or with a
> wrapper script (as it is now).

By manually, do you mean running each test separately by hand? If we use a wrapper script, then what is the difference between using nose and using Python's unittest framework?

--Michiel.


From biopython at maubp.freeserve.co.uk  Sun Dec 28 11:51:58 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sun, 28 Dec 2008 16:51:58 +0000
Subject: [Biopython-dev] Rethinking Biopython's testing framework
In-Reply-To: <451304.38587.qm@web62407.mail.re1.yahoo.com>
References: <442447.52362.qm@web62407.mail.re1.yahoo.com>
	<451304.38587.qm@web62407.mail.re1.yahoo.com>
Message-ID: <320fb6e00812280851y32450bb9le505ae257726f497@mail.gmail.com>

On Wed, Dec 24, 2008 at 10:52 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
>
> Hi everybody,
>
> How about the following for Biopython tests:
>
> For Python's unittest-style test modules, Python's unittest documentation
> recommends defining a function in each test module that returns the
> test suite. Most Biopython tests that use the unittest framework already
> do this (the function is called "testing_suite").
>
> We could now do the following in run_tests.py:
>
> 1) import the testing module and save its output
> 2) try to call module.testing_suite
> 3) if it exists, then we're using Python's unittest framework.
> So we run the tests in the testing suite.
> 4) if it does not exist, then we're using the print-and-compare
> approach. So we compare the saved output from the test to the correct output.
>
> I think that this can be set up such that it looks like nothing has
> changed for the user, while the files containing the correct
> output are no longer needed for the unittest-based tests.
>
> Questions, comments, objections, anybody?

Sounds good to me - and doesn't add any new dependencies either.

Peter

From dalloliogm at gmail.com  Sun Dec 28 16:11:59 2008
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Sun, 28 Dec 2008 22:11:59 +0100
Subject: [Biopython-dev] Rethinking Biopython's testing framework
In-Reply-To: <877679.6134.qm@web62406.mail.re1.yahoo.com>
References: <5aa3b3570812270048x20c10c52h25c8e30a29697a45@mail.gmail.com>
	<877679.6134.qm@web62406.mail.re1.yahoo.com>
Message-ID: <5aa3b3570812281311t466e61bp99af198e918737d8@mail.gmail.com>

On Sun, Dec 28, 2008 at 5:04 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> --- On Sat, 12/27/08, Giovanni Marco Dall'Olio <dalloliogm at gmail.com> wrote:
>> >> > If we use nose, does that mean adding another
>> >> > dependency to Biopython? If so, I don't think
>> >> > it's worth it. If not, how does this work?
>> >>
>> >> nose is a testing framework, so it is a dependency
>> >> only for developers.
>> >
>> > If we use nose, can our users still run the Biopython
>> > tests (without having to install nose first)?
>>
>> Yes, but they will have to do it manually, or with a
>> wrapper script (as it is now).


> If we use a wrapper script, then what is the difference between using nose and using Python's unittest framework?

The wrapper script won't be as efficient as using nose.
Writing a separate wrapper script will take a lot of time and it will
be very difficult to keep up to date; moreover, you will have to test
the wrapper script itself, to prove that it works and doesn't alter
the results of the tests.

Nose is not a replacement for unittest: it is a tool that searches
for every unittest and script that looks like a test, and executes it.
It has a few more advantages, for example it enables global setUp and
tearDown methods, but it is not necessary to use them.


If you want to reorganize Biopython's testing infrastructure, then
you should think about adopting a serious testing environment, whether
it is nose or something else. You can't keep relying on wrapper
scripts: they are too difficult to maintain and they are not really
scientifically valid.

The pygr project (another bioinformatics library in Python) makes use
of nose, and they explain how in their documentation:
- http://bioinformatics.ucla.edu/pygr_0_7_b3/testing-doc.html

Please have a look at the pages I have posted before.


> By manually, do you mean running each test separately by hand?

I mean they will have to be run in the same way as it is now.

Maybe there is a way to use nose itself to create a wrapper script
automatically.
In fact, what nose does is find all the functions that look like
tests, and then execute them. It should be possible to just save the
statements that are executed in a log file, which can then be used as
a wrapper script.
If this option doesn't exist yet, we can just propose it to nose's developers.

In brief, I think it doesn't make sense to write a new testing
framework just for Biopython, when there are many existing tools
available and free to use.


> --Michiel.
>
>
>
>


-- 

My blog on bioinformatics (now in English): http://bioinfoblog.it

From biopython at maubp.freeserve.co.uk  Sun Dec 28 19:18:22 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 29 Dec 2008 00:18:22 +0000
Subject: [Biopython-dev] Rethinking Biopython's testing framework
In-Reply-To: <5aa3b3570812281311t466e61bp99af198e918737d8@mail.gmail.com>
References: <5aa3b3570812270048x20c10c52h25c8e30a29697a45@mail.gmail.com>
	<877679.6134.qm@web62406.mail.re1.yahoo.com>
	<5aa3b3570812281311t466e61bp99af198e918737d8@mail.gmail.com>
Message-ID: <320fb6e00812281618r7ae4899g5aa1f1634bd1b217@mail.gmail.com>

Giovanni wrote:
>> nose is a testing framework, so it is a dependency
>> only for developers.

Requiring another external dependency does count against using nose -
it is much nicer if anyone installing Biopython from source can run
our test suite without having to install anything further.

Giovanni wrote:
> If you want to reorganize Biopython's testing infrastructure, then
> you should think about adopting a serious testing environment, whether
> it is nose or something else. You can't keep relying on wrapper
> scripts: they are too difficult to maintain and they are not really
> scientifically valid.

I'm not sure I understand your point here (especially re difficult to
maintain and not scientifically valid).

I'm fairly happy with the current test framework - I would rather see
any effort be spent on writing more tests under the current framework
than switching the framework itself.

Giovanni wrote:
> In brief, I think it doesn't make sense to write a new testing
> framework just for Biopython, when there are many existing tools
> available and free to use.

We haven't been talking about writing a new test framework (which I
agree isn't a good idea).  Rather we're talking about a modification
to the existing Biopython test framework (part of which uses the built
in python unittest library).  Michiel's proposal on 24th Dec seems
like it will simplify working with unittest based tests (especially
not having to track their trivial output in CVS/SVN).

Peter

From dalloliogm at gmail.com  Mon Dec 29 04:53:51 2008
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Mon, 29 Dec 2008 10:53:51 +0100
Subject: [Biopython-dev] Rethinking Biopython's testing framework
In-Reply-To: <320fb6e00812281618r7ae4899g5aa1f1634bd1b217@mail.gmail.com>
References: <5aa3b3570812270048x20c10c52h25c8e30a29697a45@mail.gmail.com>
	<877679.6134.qm@web62406.mail.re1.yahoo.com>
	<5aa3b3570812281311t466e61bp99af198e918737d8@mail.gmail.com>
	<320fb6e00812281618r7ae4899g5aa1f1634bd1b217@mail.gmail.com>
Message-ID: <5aa3b3570812290153k43e24a63nc0f27c90891adf7d@mail.gmail.com>

On Mon, Dec 29, 2008 at 1:18 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> Giovanni wrote:
>>> nose is a testing framework, so it is a dependency
>>> only for developers.
>
> Requiring another external dependency does count against using nose -
> it is much nicer if anyone installing Biopython from source can run
> our test suite without having to install anything further.

As I was saying before, it will not be a dependency. It's an external
tool that you can use or not to execute the tests automatically.
Also, it is not a replacement for unittest. It is comparable to using
epydoc for the documentation.

> Giovanni wrote:
>> If you want to reorganize Biopython's testing infrastructure, then
>> you should think about adopting a serious testing environment, whether
>> it is nose or something else. You can't keep relying on wrapper
>> scripts: they are too difficult to maintain and they are not really
>> scientifically valid.
>
> I'm not sure I understand your point here (especially re difficult to
> maintain and not scientifically valid).
>

The wrapper script itself is a program. Therefore, if you want to be
paranoid, you will have to test it too :)
It will be difficult to maintain because you will have to modify it
every time to adapt it to new tests, etc.
Many big open source Python projects make use of this framework, and it
has already been proven to work correctly; so the quality of Biopython
would be comparable with those existing projects.
Another project that makes use of nose is pytables (an HDF5 format
wrapper for Python). They say they have some billions of tests :).

> I'm fairly happy with the current test framework - I would rather see
> any effort be spent on writing more tests under the current framework
> than switching the framework itself.
>
> Giovanni wrote:
>> In brief, I think it doesn't make sense to write a new testing
>> framework just for Biopython, when there are many existing tools
>> available and free to use.
>
> We haven't been talking about writing a new test framework (which I
> agree isn't a good idea).  Rather we're talking about a modification
> to the existing Biopython test framework (part of which uses the built
> in python unittest library).  Michiel's proposal on 24th Dec seems
> like it will simplify working with unittest based tests (especially
> not having to track their trivial output in CVS/SVN).

Then you will have to develop a way to execute only some of the tests
(e.g. only those that don't use an internet connection, or only
those that use a database).
You will need to write some way of running setUp and tearDown methods
globally.
You will have to verify that your wrapper script works.
In short, you will end up writing a tool which is really
similar to nose. So, since this tool already exists, you will save
a lot of time by using it.
Michiel's proposal is good, but I am saying that there are already
tools that do the same thing automatically.

>
> Peter
>


-- 

My blog on bioinformatics (now in English): http://bioinfoblog.it

From biopython at maubp.freeserve.co.uk  Mon Dec 29 13:21:33 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 29 Dec 2008 18:21:33 +0000
Subject: [Biopython-dev] Rethinking Biopython's testing framework
In-Reply-To: <5aa3b3570812290153k43e24a63nc0f27c90891adf7d@mail.gmail.com>
References: <5aa3b3570812270048x20c10c52h25c8e30a29697a45@mail.gmail.com>
	<877679.6134.qm@web62406.mail.re1.yahoo.com>
	<5aa3b3570812281311t466e61bp99af198e918737d8@mail.gmail.com>
	<320fb6e00812281618r7ae4899g5aa1f1634bd1b217@mail.gmail.com>
	<5aa3b3570812290153k43e24a63nc0f27c90891adf7d@mail.gmail.com>
Message-ID: <320fb6e00812291021n297af797scaf7fd6ba1a7b048@mail.gmail.com>

>> We haven't been talking about writing a new test framework (which I
>> agree isn't a good idea).  Rather we're talking about a modification
>> to the existing Biopython test framework (part of which uses the built
>> in python unittest library).  Michiel's proposal on 24th Dec seems
>> like it will simplify working with unittest based tests (especially
>> not having to track their trivial output in CVS/SVN).
>
> Then you will have to develop a way to execute only some of the tests
> (e.g. only those that don't use an internet connection, or only
> those that use a database). ...

We already have that in place and working for our current framework.

> ... Michiel's proposal is good, but I am saying that there are already
> tools that do the same thing automatically.

Well, let's go with Michiel's plan in the short term (a modification
to the current Biopython test framework, see his email of 24th
December).  We will then have a clear divide into two styles of unit
test:

(1) Those where the output is captured and compared to the expected
output (which will also be in CVS).  These are easy to write as
essentially any example Biopython script can be used.

(2) Those using the python unittest framework.  I think these are more
complicated and require a bit more effort and thought to write (and
debug), but make it very clear what exactly is being tested.

Peter

From mjldehoon at yahoo.com  Tue Dec 30 05:06:08 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Tue, 30 Dec 2008 02:06:08 -0800 (PST)
Subject: [Biopython-dev] Rethinking Biopython's testing framework
In-Reply-To: <5aa3b3570812270048x20c10c52h25c8e30a29697a45@mail.gmail.com>
Message-ID: <620107.65178.qm@web62401.mail.re1.yahoo.com>


--- On Sat, 12/27/08, Giovanni Marco Dall'Olio <dalloliogm at gmail.com> wrote:
> Basically, we will have to move every test into functions/classes
> with names beginning with 'test_'. To be more precise, they should
> match the regular expression '(?:^|[\b_.-])[Tt]est' (it is also
> possible to customize this regex).
> 
> So, if a test currently looks like this:
> 
> if __name__ == '__main__':
>     seq = Seq('sadasda')
>     assert seq.tostring() == 'sadasda'
> 
> we will have to refactor it like this:
> 
> def _test():
>     """test description"""
>     seq = Seq('sadasda')
>     assert seq.tostring() == 'sadasda'
> 
> if __name__ == '__main__':
>     _test()   # this is optional

Probably I don't quite understand how nose works, but if we refactor the code in this way, is that sufficient to enable users to use nose if they want to? If so, it may be possible to write the test scripts in a nose-compliant way as a courtesy to nose users. The only problem I can see with this is that it will be difficult to maintain. Basically every new test will have to be written in this nose-compliant way, and users are likely to be unaware of this.

--Michiel


From dalloliogm at gmail.com  Tue Dec 30 08:53:34 2008
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Tue, 30 Dec 2008 14:53:34 +0100
Subject: [Biopython-dev] Rethinking Biopython's testing framework
In-Reply-To: <620107.65178.qm@web62401.mail.re1.yahoo.com>
References: <5aa3b3570812270048x20c10c52h25c8e30a29697a45@mail.gmail.com>
	<620107.65178.qm@web62401.mail.re1.yahoo.com>
Message-ID: <5aa3b3570812300553v74c48cd1x66c1b7280a3f3319@mail.gmail.com>

On Tue, Dec 30, 2008 at 11:06 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
>
>
>
> --- On Sat, 12/27/08, Giovanni Marco Dall'Olio <dalloliogm at gmail.com> wrote:
>> Basically, we will have to move every test into functions/classes
>> with names beginning with 'test_'. To be more precise, they should
>> match the regular expression '(?:^|[\b_.-])[Tt]est' (it is also
>> possible to customize this regex).
>>
>> So, if a test currently looks like this:
>>
>> if __name__ == '__main__':
>>     seq = Seq('sadasda')
>>     assert seq.tostring() == 'sadasda'
>>
>> we will have to refactor it like this:
>>
>> def _test():
>>     """test description"""
>>     seq = Seq('sadasda')
>>     assert seq.tostring() == 'sadasda'
>>
>> if __name__ == '__main__':
>>     _test()   # this is optional
>
> Probably I don't quite understand how nose works, but if we refactor the code in this way, is that sufficient to enable users to use nose if they want to? If so, it may be possible to write the test scripts in a nose-compliant way as a courtesy to nose users. The only problem I can see with this is that it will be difficult to maintain. Basically every new test will have to be written in this nose-compliant way, and users are likely to be unaware of this.


Why do you find it difficult?
You just have to rename every test to make sure that its name starts
or ends with 'test_'. That's all.
If you want to reorganize Biopython's testing framework, this is a
good thing to do anyway.

In particular, every test function/class/script name should match the
regular expression '(?:^|[\b_.-])[Tt]est' (it can be customized).
Unittest modules and doctests will be recognized, too.
Note that nose already works if you run it over Biopython's CVS; but
since I am not familiar with Biopython's code, I am not sure it
recognizes every test.

Ehm, the example that I gave won't work with the default settings :/
nose expects 'test_module' or something like this (anyway, the regex
can be customized).
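
To make the naming rule concrete, here is a small sketch that mimics
the check in plain Python. It is not nose's own code: the helper name
looks_like_test is invented for this example, the pattern is the one
quoted above, and the leading-underscore rule is my understanding of
nose's default behaviour (which would explain why '_test' is skipped).

```python
import re

# Pattern quoted in this thread as nose's default; inside a character
# class, \b means a backspace character, not a word boundary.
TEST_MATCH = re.compile(r"(?:^|[\b_.-])[Tt]est")


def looks_like_test(name):
    """Approximate nose's default decision for a function name."""
    # Assumption: names starting with an underscore are treated as
    # private and skipped before the pattern is applied.
    if name.startswith("_"):
        return False
    return TEST_MATCH.search(name) is not None


if __name__ == "__main__":
    for name in ["test_seq", "seq_test", "_test", "run_tests", "contest"]:
        print(name, looks_like_test(name))
```

With this rule, renaming the function in my example from _test to
test_seq (or similar) should be enough for it to be collected.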


> --Michiel
>
>
>
>


-- 

My blog on bioinformatics (now in English): http://bioinfoblog.it

From biopython at maubp.freeserve.co.uk  Tue Dec 30 12:29:06 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 30 Dec 2008 17:29:06 +0000
Subject: [Biopython-dev] Rethinking Biopython's testing framework
In-Reply-To: <5aa3b3570812300553v74c48cd1x66c1b7280a3f3319@mail.gmail.com>
References: <5aa3b3570812270048x20c10c52h25c8e30a29697a45@mail.gmail.com>
	<620107.65178.qm@web62401.mail.re1.yahoo.com>
	<5aa3b3570812300553v74c48cd1x66c1b7280a3f3319@mail.gmail.com>
Message-ID: <320fb6e00812300929j7fa767c7xce138912ae07d480@mail.gmail.com>

> You just have to rename every test to make sure that its name starts
> or ends with 'test_'. That's all.
> If you want to reorganize biopython's testing framework, this is a
> good thing to do anyway.

All the individual Biopython test scripts are named test_*.py anyway,
so that should be fine.  Those test scripts where we have to verify the
output probably won't work in nose (this is handled via our
run_tests.py framework), but the rest of our test scripts, being
unittest based, might already be fine with nose.

Peter

From dalloliogm at gmail.com  Tue Dec 30 13:34:15 2008
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Tue, 30 Dec 2008 19:34:15 +0100
Subject: [Biopython-dev] Rethinking Biopython's testing framework
In-Reply-To: <320fb6e00812300929j7fa767c7xce138912ae07d480@mail.gmail.com>
References: <5aa3b3570812270048x20c10c52h25c8e30a29697a45@mail.gmail.com>
	<620107.65178.qm@web62401.mail.re1.yahoo.com>
	<5aa3b3570812300553v74c48cd1x66c1b7280a3f3319@mail.gmail.com>
	<320fb6e00812300929j7fa767c7xce138912ae07d480@mail.gmail.com>
Message-ID: <5aa3b3570812301034i5c007d92k17a8e55c61b5715@mail.gmail.com>

On Tue, Dec 30, 2008 at 6:29 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>> You just have to rename every test to make sure that its name starts
>> or ends with 'test_'. That's all.
>> If you want to reorganize biopython's testing framework, this is a
>> good thing to do anyway.
>
> All the individual Biopython test scripts are named test_*.py anyway,
> so that should be fine.  Those test scripts where we have to verify the
> output probably won't work in nose (this is handled via our
> run_tests.py framework), but the rest of our test scripts, being
> unittest based, might already be fine with nose.

I think it also executes the run_tests.py script, because its name
matches that regular expression.

> Peter
>


-- 

My blog on bioinformatics (now in English): http://bioinfoblog.it

From dalloliogm at gmail.com  Tue Dec 30 13:34:45 2008
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Tue, 30 Dec 2008 19:34:45 +0100
Subject: [Biopython-dev] Rethinking Biopython's testing framework
In-Reply-To: <320fb6e00811280309w7b5f0fc6m38795c4dc61c8744@mail.gmail.com>
References: <20081125144041.GC83220@sobchak.mgh.harvard.edu>
	<45956.75241.qm@web62406.mail.re1.yahoo.com>
	<320fb6e00811280309w7b5f0fc6m38795c4dc61c8744@mail.gmail.com>
Message-ID: <5aa3b3570812301034r3633ebe0k937e33c731e69ccd@mail.gmail.com>

On Fri, Nov 28, 2008 at 12:09 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:

> Brad wrote:
>> Agreed with the distinction between the unit tests and the "dump
>> lots of text and compare" approach. I've written both and do think
>> the unit testing/assertion model is more robust since you can go
>> back and actually get some insight into what someone was thinking
>> when they wrote an assertion.
>
> I have probably written more of the "dump lots of text and compare"
> style tests.  I think these have a number of advantages:
> (1) Easier for beginners to write a test, you can almost take any
> example script and use that.  You don't have to learn the unit test
> framework.

I agree with what you say, but I think that all the 'dump and compare'
tests should be organized into various functions.
This will make them easier to use and understand, and they will be
compatible with the nose framework.

> (2) Debugging a failing test in IDLE is much easier - using unit tests
> you have all that framework between you and the local scope where the
> error happens.

> (3) For many broad tests, manually setting up the expected output for
> an assert is extremely tedious (e.g. parsing sequences and checking
> their checksums).

This is an interesting discussion if you want to talk about it a bit.

An advantage of unittest is the pair of setUp and tearDown methods (fixtures).
With those, you are sure that all the tests are run in the right
environment and that all variables are dropped before executing a new
test.
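
As a minimal sketch of what I mean by fixtures (the class name and the
data are invented for this example, they are not from Biopython's test
suite): setUp runs before every test method, so a change made by one
test is not seen by the next one.

```python
import unittest


class SequenceListTests(unittest.TestCase):
    """Hypothetical test case showing per-test fixtures."""

    def setUp(self):
        # Runs before each test method: every test gets a fresh list.
        self.records = ["ACGT", "TTGA"]

    def tearDown(self):
        # Runs after each test method: drop the data again.
        self.records = None

    def test_append(self):
        self.records.append("GGCC")
        self.assertEqual(len(self.records), 3)

    def test_length_unchanged(self):
        # Not affected by test_append, whatever the execution order.
        self.assertEqual(len(self.records), 2)


def testing_suite():
    """Return the suite, following the testing_suite convention."""
    return unittest.TestLoader().loadTestsFromTestCase(SequenceListTests)


if __name__ == "__main__":
    unittest.TextTestRunner(verbosity=2).run(testing_suite())
```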

Also, if you want to do a lot of dump and compare tests, consider
writing some big doctest scripts.
They will require a bit more work to write, but they will be
easier to understand, and they will also become good tutorials for
users.

This is a tutorial we wrote for a small project not related to biopython:
- http://github.com/cswegger/datamatrix/tree/master/tutorial.txt
As you can see, the text is both a tutorial and a test set (which makes
use of a dump and compare approach) for the program.
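
For readers who have not used doctest, a tiny self-contained sketch
follows. The function reverse_complement is written only for this
example and is not Biopython code; the point is that the docstring is
both the usage example and the expected output that gets compared.

```python
import doctest


def reverse_complement(seq):
    """Return the reverse complement of a DNA string.

    The interpreter session below is the test: doctest runs each
    statement and compares what it prints with the text shown.

    >>> reverse_complement("AACG")
    'CGTT'
    >>> print(reverse_complement("ATGC"))
    GCAT
    """
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq))


if __name__ == "__main__":
    # Runs every docstring example in this module and reports failures.
    doctest.testmod()
```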

> We could discuss a modification to run_tests.py so that if there is no
> expected output file output/test_XXX for test_XXX.py we just run
> test_XXX.py and check its return value (I think Michiel had previously
> suggested something like this).

I think this should be done inside the test itself.
All the tests should return only a boolean value (passed or not) and a
description of the error.
The tests that make use of an expected output file should open
it and do the comparison themselves, not in run_tests.py.
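
Something along these lines is what I have in mind. It is just a
sketch: the name check_output and the (passed, message) return value
are my own invention for illustration, not an existing Biopython or
nose function.

```python
import io
from contextlib import redirect_stdout


def check_output(test_function, expected_path):
    """Run test_function and compare what it prints with a stored file.

    Returns a (passed, message) pair, where message is empty on success.
    """
    # Capture everything the test prints instead of showing it.
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        test_function()

    # The test itself opens its expected output file.
    with open(expected_path) as handle:
        expected = handle.read()

    if buffer.getvalue() == expected:
        return True, ""
    return False, "output differs from %s" % expected_path
```

Each print-and-compare test could call such a helper on its own
output file, so that run_tests.py only has to collect the results.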

> Perhaps for more robustness, capture
> the output and compare it to a predefined list of regular expressions
> covering the typical outputs.  For example, looking at
> output/test_Cluster, the first line is the test name, but the rest follows
> the pattern "test_... ok". I imagine only a few output styles exist.

mmm have you changed this file in the cvs recently? I can't find what
you are referring to.

> With such a change, half the unit tests (e.g. test_Cluster.py)
> wouldn't need their output file in CVS (output/test_Cluster).
>
> Michiel de Hoon wrote:
>> If one of the sub-tests fails, Python's unit testing framework will tell us so,
>> though (perhaps) not exactly which sub-test fails. However, that is easy to
>> figure out just by running the individual test script by itself.
>
> That won't always work.  Consider intermittent network problems, or
> tests using random data - in general it really is worthwhile having
> run_tests.py report a little more than just which test_XXX.py module
> failed.
>
> Peter


-- 

My blog on bioinformatics (now in English): http://bioinfoblog.it

From biopython at maubp.freeserve.co.uk  Tue Dec 30 18:33:16 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 30 Dec 2008 23:33:16 +0000
Subject: [Biopython-dev] Rethinking Biopython's testing framework
In-Reply-To: <5aa3b3570812301034r3633ebe0k937e33c731e69ccd@mail.gmail.com>
References: <20081125144041.GC83220@sobchak.mgh.harvard.edu>
	<45956.75241.qm@web62406.mail.re1.yahoo.com>
	<320fb6e00811280309w7b5f0fc6m38795c4dc61c8744@mail.gmail.com>
	<5aa3b3570812301034r3633ebe0k937e33c731e69ccd@mail.gmail.com>
Message-ID: <320fb6e00812301533h55f5e9eehcec69cc1d5913420@mail.gmail.com>

Brad wrote:
>>> Agreed with the distinction between the unit tests and the "dump
>>> lots of text and compare" approach. I've written both and do think
>>> the unit testing/assertion model is more robust since you can go
>>> back and actually get some insight into what someone was thinking
>>> when they wrote an assertion.

Peter wrote:
>> I have probably written more of the "dump lots of text and compare"
>> style tests.  I think these have a number of advantages:
>> (1) Easier for beginners to write a test, you can almost take any
>> example script and use that.  You don't have to learn the unit test
>> framework.
>> ...

Giovanni wrote:
> I agree with what you say, but I think that all the 'dump and compare'
> tests should be organized into various functions.
> This will make them easier to use and understand, and they will be
> compatible with the nose framework.

If we organise the "dump and compare" tests into various functions
(e.g. using the unittest framework), and turn print statements into
asserts etc, then yes they would become nose compatible.  However,
this is a lot of work, and for relatively little gain.  Also, doing so
we lose the simplicity (e.g. my points made earlier) and make it
harder for newcomers to write further tests.

Nevertheless, we could regard Michiel's plan of 24 Dec as a step
towards this, in that it simplifies writing unittest based tests (in
that they won't need an expected output file which must also be kept
in CVS/SVN).

I'm not sure what you meant by "This will make them easier to use and
understand, ...".  Switching the unit test coding style makes no
difference from the end user's point of view: they run the test suite
using "python setup.py test" (typically as part of installation from
source), or from the tests directory using "python run_tests.py", and
won't see any difference in how the tests work internally.

In terms of understanding the unit tests: If you are a beginner
wanting to look at a unit test to give a feel for how to use the code,
then frankly those of our unit tests which simply do some imports and
print some output are MUCH easier to understand.  By their nature they
are essentially example Biopython scripts.  On the other hand, those
of our unit tests using the unittest framework have all these test
classes defined, and split up the setup/clean up into separate
methods etc.  In some senses this is "clutter" which is not helpful if
you want to regard the unit test also as a usage example.
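
As a rough illustration of the two styles being compared (a sketch
only; reverse_complement is a toy helper written for this example, not
a Biopython function, and the test names are made up):

```python
import unittest


def reverse_complement(seq):
    """Toy helper used only for this illustration."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq))


# Style 1: "dump and compare".  The script just prints; the test runner
# compares what was printed against a stored expected output file.
def dump_style():
    for seq in ["ACGT", "AAGC"]:
        print(seq, "->", reverse_complement(seq))


# Style 2: unittest.  The expected values live in the test itself as
# assertions, so no separate expected output file is needed.
class ReverseComplementTest(unittest.TestCase):
    def test_palindrome(self):
        self.assertEqual(reverse_complement("ACGT"), "ACGT")

    def test_simple(self):
        self.assertEqual(reverse_complement("AAGC"), "GCTT")
```

The first style reads like an ordinary example script; the second
carries its own expectations but needs the class and method scaffolding.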

>> (2) Debugging a failing test in IDLE is much easier - using unit tests
>> you have all that framework between you and the local scope where the
>> error happens.
>
>> (3) For many broad tests, manually setting up the expected output for
>> an assert is extremely tedious (e.g. parsing sequences and checking
>> their checksums).
>
> This is an interesting discussion if you want to talk about it a bit.

It could be, but I don't want to get side tracked (distracted) from
pressing ahead with Michiel's plan (the email of 24th Dec, or
something similar) which seems to be a worthwhile small improvement to
the current status.

> An advantage of unittest are the two setUp and tearDown methods (fixtures).
> With those, you are sure that all the tests are run with the right
> environment and that all variables are dropped before executing a new
> test.

For some tests, yes, this is useful - in particular where there are
lots of independent small things you want to test.  In other
situations you want to test a work flow, with a series of cumulative
steps each building on each other.  This would end up as a single
large test function/method.
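
For the "lots of independent small things" case, the fixture idea might
look something like the sketch below.  The temporary file and the test
names are invented for the example; nothing here is taken from the
Biopython test suite.

```python
import os
import tempfile
import unittest


class TempFileTest(unittest.TestCase):
    def setUp(self):
        # Runs before every test method: each test gets a fresh, empty file.
        handle, self.path = tempfile.mkstemp(suffix=".txt")
        os.close(handle)

    def tearDown(self):
        # Runs after every test method, whether the test passed or failed.
        if os.path.exists(self.path):
            os.remove(self.path)

    def test_starts_empty(self):
        self.assertEqual(os.path.getsize(self.path), 0)

    def test_write_then_read(self):
        with open(self.path, "w") as handle:
            handle.write(">seq1\nACGT\n")
        with open(self.path) as handle:
            self.assertEqual(handle.readline(), ">seq1\n")
```

Neither test can see what the other wrote, which is exactly what you
want for independent checks and exactly what gets in the way when the
steps are meant to build on each other.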

> Also, if you want to do a lot of dump and compare tests, consider
> writing some big doctest scripts.
> It will require a bit more of work to write them, but they will be
> easier to understand, and they will also become good tutorials for the
> users.

Certainly some of the current simple "dump and compare" tests might be
converted into doctests (and we could do this within the current
Biopython framework).  However, the requirements for good
documentation and good test coverage differ - you'd want to include
tests for atypical code which you would not want to encourage as good
coding practice.  I'm quite keen for further usage of doctests - but I
see them primarily as an improvement to our documentation.
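
A small sketch of what such a doctest could look like (gc_fraction is a
hypothetical function made up for this example).  Note how the empty
sequence case is worth testing but is not really tutorial material:

```python
import doctest


def gc_fraction(seq):
    """Return the fraction of G and C letters in a sequence.

    >>> gc_fraction("GGCC")
    1.0
    >>> gc_fraction("ATGC")
    0.5

    An atypical input, included for test coverage rather than as
    recommended usage:

    >>> gc_fraction("")
    0.0
    """
    if not seq:
        return 0.0
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / float(len(seq))


if __name__ == "__main__":
    # Runs every example found in this module's docstrings.
    doctest.testmod()
```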

Peter wrote:
>> We could discuss a modification to run_tests.py so that if there is no
>> expected output file output/test_XXX for test_XXX.py we just run
>> test_XXX.py and check its return value (I think Michiel had previously
>> suggested something like this).

Note that Michiel's email of 24th Dec is another approach to this
topic - either would work, but his plan makes the division between the
two test types much more explicit.
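
To make the first idea concrete, here is a minimal sketch of the
dispatch logic.  It is not the real run_tests.py; run_one_test and its
arguments are names invented for the example.

```python
import os
import subprocess
import sys


def run_one_test(test_script, expected_dir="output"):
    """Return True if the test passes (hypothetical helper)."""
    name = os.path.splitext(os.path.basename(test_script))[0]
    expected_path = os.path.join(expected_dir, name)
    finished = subprocess.run(
        [sys.executable, test_script],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    if os.path.isfile(expected_path):
        # "Dump and compare" test: printed output must match the file.
        with open(expected_path) as handle:
            return finished.stdout == handle.read()
    # No expected output file: rely on the script's exit status instead.
    return finished.returncode == 0
```

The presence or absence of output/test_XXX decides how the result is
judged, so existing print-style tests would keep working unchanged.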

Giovanni wrote:
> I think this should be done inside the test itself.
> All the tests should return only a boolean value (passed or not) and a
> description of the error.
> The tests that make use of an expected output file should open
> it and do the comparison themselves, not in run_tests.py.

Your plan would work, but it means the simplicity of this style of
unit test is lost.  Rather than doing this change (which would be a
moderate amount of tedious work), I would rather go all the way and
make them unittest based like the rest of our test suite.

>> Perhaps for more robustness, capture
>> the output and compare it to a predefined list of regular expressions
>> covering the typical outputs.  For example, looking at
>> output/test_Cluster, the first line is the test name, but rest follows
>> the pattern "test_... ok". I imagine only a few output styles exist.
>> With such a change, half the unit tests (e.g. test_Cluster.py)
>> wouldn't need their output file in CVS (output/test_Cluster).
>
> mmm have you changed this file in the cvs recently? I can't find what
> you are referring to.

For this example, the unit test Tests/test_Cluster.py is here:
http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Tests/test_Cluster.py?cvsroot=biopython

Its expected output file Tests/output/test_Cluster is here:
http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Tests/output/test_Cluster?cvsroot=biopython
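
The regular expression idea quoted above could be sketched roughly as
follows.  The single pattern is an assumption about what a typical
unittest-style output line looks like; the real list would have to be
collected from the files under Tests/output, and the test names in the
usage note are made up.

```python
import re

# Assumed line shape, e.g. "test_something ... ok".
LINE_PATTERNS = [
    re.compile(r"^test_\w+ \.\.\. ok$"),
]


def output_looks_ok(test_name, output):
    """Check captured output without needing a stored expected file."""
    lines = output.splitlines()
    # The first line should simply be the test name.
    if not lines or lines[0] != test_name:
        return False
    # Every remaining line must match one of the known patterns.
    return all(
        any(pattern.match(line) for pattern in LINE_PATTERNS)
        for line in lines[1:]
    )
```

For instance, output_looks_ok("test_Cluster", "test_Cluster\ntest_a ... ok\n")
is accepted, while any line ending in "FAIL" is rejected.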

Peter

From bsouthey at gmail.com  Mon Dec  1 02:37:05 2008
From: bsouthey at gmail.com (Bruce Southey)
Date: Sun, 30 Nov 2008 20:37:05 -0600
Subject: [Biopython-dev] Deprecation and removal policy
In-Reply-To: <320fb6e00811280926v16454fa6t891fcc74e4fa4729@mail.gmail.com>
References: <320fb6e00811280926v16454fa6t891fcc74e4fa4729@mail.gmail.com>
Message-ID: <bbcd77d00811301837qf6e7909x18b09f423c55a800@mail.gmail.com>

On Fri, Nov 28, 2008 at 11:26 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> Back on 27 June 2008, in preparation for what became Biopython 1.47,
> Michiel wrote:
>> In recent releases, we have been using the rule of thumb to remove all
>> modules from a new Biopython release that were deprecated two
>> releases ago.
>
> I was thinking that when we made releases about six months apart, this
> rule of thumb effectively gave a year's warning.  Recently we've made
> releases roughly every three months, which translates to only about
> six months warning, so I think we should be a little more restrained
> in removing deprecated code in future.
>
> As an example, Bio.EUtils was deprecated in favour of Bio.Entrez in
> Release 1.48 (Sept 2008).  Under the old rule of thumb, we could
> remove this module from CVS now (as the deprecation was present in
> Biopython 1.48 and 1.49).  If we release Biopython 1.50 in January or
> February 2009 (for the sake of argument), that means the deprecation
> would have been in place for only four or five months - which seems
> too rash.
>
> How about a new policy that after adding a deprecation warning,
> deprecated modules/functions are kept for at least two public releases
> AND at least 12 months (counting from the first release when they are
> deprecated - not the date of the CVS change) before being removed?
>
> Peter

Hi,
Generally I would agree with the idea for code that is under active
development. For certain code that has not really been touched for a
few years except for trivial changes (like removing string functions),
I think 12 months is perhaps too long if it passes two releases.

Regardless of how it is done, Python 3 will need to be supported (the
final release is due soon) and I do not see a reason to port
deprecated modules or functions just because of some policy.  So I
would add the provision that deprecated code will not be ported to
the Python 3 compatible Biopython branch.

Bruce


From biopython at maubp.freeserve.co.uk  Mon Dec  1 12:56:12 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 1 Dec 2008 12:56:12 +0000
Subject: [Biopython-dev] Deprecation and removal policy
In-Reply-To: <bbcd77d00811301837qf6e7909x18b09f423c55a800@mail.gmail.com>
References: <320fb6e00811280926v16454fa6t891fcc74e4fa4729@mail.gmail.com>
	<bbcd77d00811301837qf6e7909x18b09f423c55a800@mail.gmail.com>
Message-ID: <320fb6e00812010456r9ae1a66p66032d02377003db@mail.gmail.com>

Peter wrote:
>> ...
>> How about a new policy that after adding a deprecation warning,
>> deprecated modules/functions are kept for at least two public releases
>> AND at least 12 months (counting from the first release when they are
>> deprecated - not the date of the CVS change) before being removed?
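
One way to read the proposed rule, as a sketch only: may_remove is a
hypothetical helper, "two public releases" is interpreted here as the
warning having shipped in at least two releases, and the dates in the
usage note are placeholders rather than actual release dates.

```python
from datetime import date


def may_remove(first_deprecated_release, later_releases, today):
    """Two public releases AND at least 12 months since the first one."""
    # Releases that shipped the warning: the first deprecating release
    # plus every later public release made up to today.
    releases_with_warning = 1 + sum(1 for d in later_releases if d <= today)
    try:
        one_year_on = first_deprecated_release.replace(
            year=first_deprecated_release.year + 1)
    except ValueError:
        # 29 February has no anniversary in a non-leap year; use 28th.
        one_year_on = first_deprecated_release.replace(
            year=first_deprecated_release.year + 1, day=28)
    return releases_with_warning >= 2 and today >= one_year_on
```

With a first deprecating release on date(2008, 9, 1) and one later
release on date(2008, 11, 1), the old rule of thumb would already allow
removal in December 2008, whereas may_remove only returns True from
September 2009 onwards.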

Bruce wrote:
>
> Hi,
> Generally I would agree with the idea for code that is under active
> development. For certain code that has not really been touched for a
> few years except for trivial changes (like removing string functions),
> I think 12 months is perhaps too long if it passes two releases.

Just because some (deprecated) code hasn't been changed in several
years doesn't mean no-one is using it.  Giving less warning for
removing such old but stable code isn't fair.

> Regardless of how it is done, Python 3 will need to be supported (the
> final release is due soon) and I do not see a reason to port
> deprecated modules or functions just because of some policy.  So I
> would add the provision that deprecated code will not be ported to
> the Python 3 compatible Biopython branch.

I disagree - dropping old modules is changing the API, counter to
Guido and others' recommendation/request: "Don't change your APIs
incompatibly when porting to Py3k."
http://www.artima.com/weblogs/viewpost.jsp?thread=227041

If porting any particular deprecated module or piece of code to Python
3 proved too difficult, then maybe we might drop that code (for
example, due to third party dependencies on an obsolete version of
mxTextTools, I don't think we'll port Martel/Mindy to Python 3).

Peter


From bugzilla-daemon at portal.open-bio.org  Mon Dec  1 15:36:33 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 1 Dec 2008 10:36:33 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main
	Biopython distribution
In-Reply-To: <bug-2671-42@http.bugzilla.open-bio.org/>
Message-ID: <200812011536.mB1FaXWF003857@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671


------- Comment #11 from biopython-bugzilla at maubp.freeserve.co.uk  2008-12-01 10:36 EST -------
Unit Test
=========
The unit test included, test_GenomeDiagram.py adds yet another GenBank file to
the test suite, NC_005213.gb (Nanoarchaeum equitans, 490885 bp) which at 1.2 MB
is best avoided.  I would prefer we used existing GenBank files already
included in Biopython which would serve just as well.  e.g.

GenBank/NC_005816.gb file (Yersinia pestis biovar Microtus str. 91001 plasmid
pPCP1) which is circular.  9609 bp.

GenBank/arab1.gb (Arabidopsis thaliana BAC T25K16 from chromosome I) which is
linear.  86436 bp.

Also, the code to parse the GenBank file does so via Bio.GenBank, and I would
prefer to use Bio.SeqIO here.

I'll attach a revised version shortly...


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.


From bugzilla-daemon at portal.open-bio.org  Mon Dec  1 15:40:22 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 1 Dec 2008 10:40:22 -0500
Subject: [Biopython-dev] [Bug 2677] BioSQL seqfeature enhancements
In-Reply-To: <bug-2677-42@http.bugzilla.open-bio.org/>
Message-ID: <200812011540.mB1FeMWx004105@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2677


------- Comment #10 from biopython-bugzilla at maubp.freeserve.co.uk  2008-12-01 10:40 EST -------
Bio.Graphics.GenomeDiagram.Utilities
====================================
This is a collection of utilities for getting information useful for graph
values.  From the docstring,

    o apply_to_window (sequence, window_size, function, step=None)  Apply a
                        passed function to fragments of the passed sequence of
                        size window_size, with each window separated by the
                        passed step.

    o calc_gc_content (sequence)    Returns the %GC content of a passed
sequence

    o calc_at_content (sequence)    Returns the %AT content of a passed
sequence

    o calc_gc_skew (sequence)    Returns the GC skew of a passed sequence

    o calc_at_skew (sequence)    Returns the AT skew of a passed sequence

    o gc_content (sequence, window_size, step=None)    Returns the %GC content
                    of a passed sequence in windows of the passed size,
                    separated by the passed step size

    o at_content (sequence, window_size, step=None)    Returns the %AT content
                    of a passed sequence in windows of the passed size,
                    separated by the passed step size

    o gc_skew (sequence, window_size, step=None)    Returns the GC skew
                    of a passed sequence in windows of the passed size,
                    separated by the passed step size

    o at_skew (sequence, window_size, step=None)    Returns the AT skew
                    of a passed sequence in windows of the passed size,
                    separated by the passed step size

I can see why these were useful when GenomeDiagram was a separate package, but
I don't think we should add this file to Biopython as it is unnecessary code
duplication.  If we do lack any of this functionality, putting it somewhere
under Bio.SeqUtils makes more sense than under Bio.Graphics.

I have not looked at any implications this may have for the existing
documentation or the GenomeDiagram unit test.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Mon Dec  1 15:47:01 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 1 Dec 2008 10:47:01 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main
	Biopython distribution
In-Reply-To: <bug-2671-42@http.bugzilla.open-bio.org/>
Message-ID: <200812011547.mB1Fl1qY004683@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671


------- Comment #12 from biopython-bugzilla at maubp.freeserve.co.uk  2008-12-01 10:47 EST -------
Bio.Graphics.GenomeDiagram.DrawAll
==================================
According to the comments, this is a script to walk a directory structure below
the directory passed, and draw images of each .gbk file found there.

While useful, I don't think this belongs in the core library.  Maybe rename it
and move it into our scripts or example directory instead...

Bio.Graphics.GenomeDiagram.Utilities
====================================
This is a collection of utilities for getting information useful for graph
values.  From the docstring,

    o apply_to_window (sequence, window_size, function, step=None)  Apply a
                        passed function to fragments of the passed sequence of
                        size window_size, with each window separated by the
                        passed step.

    o calc_gc_content (sequence)    Returns the %GC content of a passed
sequence

    o calc_at_content (sequence)    Returns the %AT content of a passed
sequence

    o calc_gc_skew (sequence)    Returns the GC skew of a passed sequence

    o calc_at_skew (sequence)    Returns the AT skew of a passed sequence

    o gc_content (sequence, window_size, step=None)    Returns the %GC content
                    of a passed sequence in windows of the passed size,
                    separated by the passed step size

    o at_content (sequence, window_size, step=None)    Returns the %AT content
                    of a passed sequence in windows of the passed size,
                    separated by the passed step size

    o gc_skew (sequence, window_size, step=None)    Returns the GC skew
                    of a passed sequence in windows of the passed size,
                    separated by the passed step size

    o at_skew (sequence, window_size, step=None)    Returns the AT skew
                    of a passed sequence in windows of the passed size,
                    separated by the passed step size

I can see why these were useful when GenomeDiagram was a separate package, but
I don't think we should add this file to Biopython as it is unnecessary code
duplication.  If we do lack any of this functionality, putting it somewhere
under Bio.SeqUtils makes more sense than under Bio.Graphics.
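
For reference, the single-sequence calculations described in that
docstring amount to something like the sketch below.  The function
names are reused from the docstring, but the bodies are written from
scratch for illustration (they are neither the GenomeDiagram source nor
the Bio.SeqUtils code), and returning 0.0 for degenerate input is an
assumption.

```python
def calc_gc_content(sequence):
    """Percentage of G and C letters (0.0 for an empty sequence)."""
    sequence = str(sequence).upper()
    if not sequence:
        return 0.0
    gc = sequence.count("G") + sequence.count("C")
    return 100.0 * gc / len(sequence)


def calc_gc_skew(sequence):
    """GC skew (G - C) / (G + C); 0.0 when there is no G or C at all."""
    sequence = str(sequence).upper()
    g = sequence.count("G")
    c = sequence.count("C")
    if g + c == 0:
        return 0.0
    return (g - c) / float(g + c)
```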

I have not looked at any implications this may have for the existing
documentation or the GenomeDiagram unit test.




From bugzilla-daemon at portal.open-bio.org  Mon Dec  1 15:49:14 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 1 Dec 2008 10:49:14 -0500
Subject: [Biopython-dev] [Bug 2677] BioSQL seqfeature enhancements
In-Reply-To: <bug-2677-42@http.bugzilla.open-bio.org/>
Message-ID: <200812011549.mB1FnEB8004888@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2677


------- Comment #11 from biopython-bugzilla at maubp.freeserve.co.uk  2008-12-01 10:49 EST -------
(In reply to comment #10)
> Bio.Graphics.GenomeDiagram.Utilities
> ====================================
> This is a collection of utilities for getting information useful for graph
> values.  From the docstring, ...

Sorry - ignore this comment, it should have been on Bug 2671.




From bugzilla-daemon at portal.open-bio.org  Mon Dec  1 15:51:19 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 1 Dec 2008 10:51:19 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main
	Biopython distribution
In-Reply-To: <bug-2671-42@http.bugzilla.open-bio.org/>
Message-ID: <200812011551.mB1FpJNU005019@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671


------- Comment #13 from lpritc at scri.sari.ac.uk  2008-12-01 10:51 EST -------
(In reply to comment #11)
> Unit Test
> =========
> The unit test included, test_GenomeDiagram.py adds yet another GenBank file to
> the test suite, NC_005213.gb (Nanoarchaeum equitans, 490885 bp) which at 1.2 MB
> is best avoided.  I would prefer we used existing GenBank files already
> included in Biopython which would serve just as well.

That's a good idea.

> Also, the code to parse the GenBank file does so via Bio.GenBank, and I would
> prefer to use Bio.SeqIO here.

I noticed that in revising the documentation, but hadn't got around to doing
anything about it, except in the example code.




From bugzilla-daemon at portal.open-bio.org  Mon Dec  1 15:59:35 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 1 Dec 2008 10:59:35 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main
	Biopython distribution
In-Reply-To: <bug-2671-42@http.bugzilla.open-bio.org/>
Message-ID: <200812011559.mB1FxZwH005670@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671


------- Comment #14 from lpritc at scri.sari.ac.uk  2008-12-01 10:59 EST -------
(In reply to comment #12)
> Bio.Graphics.GenomeDiagram.DrawAll
> ==================================
> According to the comments, this is a script to walk a directory structure below
> the directory passed, and draw images of each .gbk file found there.
> 
> While useful, I don't think this belongs in the core library.  Maybe rename it
> and move it into our scripts or example directory instead...

Ah.  I thought I'd left that one out.  I was picturing perhaps having a
Utilities.py module containing a function with that behaviour, and/or functions
that drew a standard representation of a GenBank file, so that those who are
not interested in the minutiae of the API/drawing their diagrams could still
get a fair amount of function for little effort.

On reflection, these functions are perhaps better suited to living in
__init__.py.  What do you think?

> Bio.Graphics.GenomeDiagram.Utilities
> ====================================
> This is a collection of utilities for getting information useful for graph
> values,

> I can see why these were useful when GenomeDiagram was a separate package, but
> I don't think we should add this file to Biopython as it is unnecessary code
> duplication.  If we do lack any of this functionality, putting it somewhere
> under Bio.SeqUtils makes more sense than under Bio.Graphics.

Where there is repetition of function here, I'm happy to go with established
Biopython code in preference.  For graph data, GenomeDiagram expects a list of
(position, value) tuples, which the functions in Utilities.py supply directly. 
There will be a level of user-processing required in moving to the Biopython
versions.  Perhaps the inclusion of similar functions in __init__ that wrap the
Biopython versions to produce the appropriate format for graphs would be useful
here?
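
A sketch of what such a wrapper might look like.  The default step and
the choice of reporting each value at the window midpoint are
assumptions made for this example, and percent_gc is a stand-in for
whichever existing Biopython function would actually be wrapped.

```python
def percent_gc(fragment):
    """Stand-in for an existing per-sequence function."""
    fragment = str(fragment).upper()
    if not fragment:
        return 0.0
    return 100.0 * (fragment.count("G") + fragment.count("C")) / len(fragment)


def apply_to_window(sequence, window_size, function, step=None):
    """Return [(position, value), ...] ready to use as graph data."""
    if step is None:
        step = window_size // 2  # assumed default for this sketch
    step = max(step, 1)
    results = []
    for start in range(0, len(sequence) - window_size + 1, step):
        fragment = sequence[start:start + window_size]
        # Report each value at the midpoint of its window.
        results.append((start + window_size // 2, function(fragment)))
    return results


def gc_content(sequence, window_size, step=None):
    """Windowed %GC as (position, value) tuples."""
    return apply_to_window(sequence, window_size, percent_gc, step)
```

For example, gc_content("GGGGAAAA", 4, step=4) gives
[(2, 100.0), (6, 0.0)].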

> I have not looked at any implications this may have for the existing
> documentation or the GenomeDiagram unit test.

Removing Utilities.py outright will affect both the documentation and the unit
test.  Both require those functions (or something similar) to generate
test/example graph data.

I would be happy to replace the existing functions with wrapped Biopython
functions in __init__ - does this seem like a sensible option?




From bugzilla-daemon at portal.open-bio.org  Mon Dec  1 16:59:50 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 1 Dec 2008 11:59:50 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main
	Biopython distribution
In-Reply-To: <bug-2671-42@http.bugzilla.open-bio.org/>
Message-ID: <200812011659.mB1GxoGa009013@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1063 is|0                           |1
           obsolete|                            |
Attachment #1121 is|0                           |1
           obsolete|                            |


------- Comment #15 from biopython-bugzilla at maubp.freeserve.co.uk  2008-12-01 11:59 EST -------
Created an attachment (id=1132)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1132&action=view)
Zip of python files to go under Bio/Graphics/GenomeDiagram

This attachment is just the main python files, omitting DrawAll.py and
Utilities.py (see comment 12 and comment 14).  The unit test needs updating to
match (but then passes, updated version to follow).

(In reply to comment #0)
> Code for wx widgets has been removed, although the Observer/Observable code
> remains, allowing user widgets to hook into the code, if that's desirable.

There was a tiny bit of wx stuff still there in Diagram.py which I have removed
in this version.

After discussion with Leighton directly, due to possible uncertainty over the
licensing of the Observer/Observable code (originally based on an example by
Peter Norvig) this has been removed, together with the associated "set" methods
in Diagram.py etc.  This code was intended to assist using GenomeDiagram within
a GUI.

Note that if we later want to reintroduce this functionality, using python's
property feature (with get/set functions) would allow the set function to
update the observer.  Leighton's old code would only update the observer if the
set method was used explicitly (and not if the object property were updated
directly).
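
A minimal sketch of the property idea, using a hypothetical Track class
and callback signature (not the GenomeDiagram code):

```python
class Track(object):
    """Hypothetical stand-in for an object a GUI might want to watch."""

    def __init__(self, name):
        self._name = name
        self._observers = []

    def attach(self, callback):
        """Register a callable to be told about attribute changes."""
        self._observers.append(callback)

    def _get_name(self):
        return self._name

    def _set_name(self, value):
        self._name = value
        # Plain assignment (track.name = ...) lands here, so observers
        # are notified without anyone calling an explicit set method.
        for callback in self._observers:
            callback("name", value)

    name = property(_get_name, _set_name)
```

With this, "track.name = 'genes'" triggers the callbacks just as an
explicit set method call would.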

(In reply to comment #6)
> I am perfectly happy with re-licensing the GD code under the Biopython
> license. If you need a gpg-signed document to say so, I can provide one ;)

I've updated the header of each file to reflect the Biopython license.




From bugzilla-daemon at portal.open-bio.org  Mon Dec  1 17:20:57 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 1 Dec 2008 12:20:57 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main
	Biopython distribution
In-Reply-To: <bug-2671-42@http.bugzilla.open-bio.org/>
Message-ID: <200812011720.mB1HKvIJ010157@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671


------- Comment #16 from biopython-bugzilla at maubp.freeserve.co.uk  2008-12-01 12:20 EST -------
Created an attachment (id=1133)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1133&action=view)
Revised test_GenomeDiagram.py

This uses the existing GenBank/arab1.gb file for input.

It also includes a (slightly modified) copy of the GenomeDiagram.Utilities
functions as a short term solution to the issues raised in comment 12 and
comment 14.




From bugzilla-daemon at portal.open-bio.org  Mon Dec  1 20:01:44 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 1 Dec 2008 15:01:44 -0500
Subject: [Biopython-dev] [Bug 2693] New: LogisticRegression convergence
	criterion is too lenient
Message-ID: <bug-2693-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2693

           Summary: LogisticRegression convergence criterion is too lenient
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P3
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: bsouthey at gmail.com


In R and SAS, the example in the code and tutorial provides the following
parameters:

Intercept =  18.9622
x1        =  -0.0714
x2        =   0.0444

By default, Bio/LogisticRegression.py defines the following parameters
    MAX_ITERATIONS = 500
    CONVERGE_THRESHOLD = 0.01

The convergence threshold is too lenient so the iterations terminate before the
expected values are obtained. Using more stringent criteria (CONVERGE_THRESHOLD
= 0.000000001) permits convergence to the R/SAS values provided MAX_ITERATIONS
is greater than 7761 with my system.

MAX_ITERATIONS and CONVERGE_THRESHOLD are fixed within
Bio/LogisticRegression.py module but should be part of the API for the train
function such as:
def train(xs, ys, update_fn=None, typecode=None, CONVERGE_THRESHOLD =
0.000000001, MAX_ITERATIONS=10000):

Note the algorithm used requires a large number of iterations and the train
function does not display the degree of convergence attained when
MAX_ITERATIONS is exceeded.

Jeffrey Whitaker provides Python code using an alternative algorithm: 
http://www.cdc.noaa.gov/people/jeffrey.s.whitaker/python/logistic_regression.py

Furthermore, the update_fn should also be passed the previous likelihood or
the difference in likelihood so the actual convergence can be seen. Really the
update_fn should be more general than this and be able to display more
information, but the attached patch provides the previous llh (old_llik).
def show_progress(iteration, old_llh, loglikelihood):
    print "Iteration:", iteration, "Old", old_llh, \
          "Log-likelihood function:", loglikelihood, \
          "Diff:", (old_llh - loglikelihood)

model = LogisticRegression.train(xs, ys, update_fn=show_progress)
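
To illustrate the effect of the stopping rule, here is a self-contained
toy sketch.  It uses plain gradient ascent on one predictor, which is
NOT the algorithm in Bio/LogisticRegression.py; the data, the step size
and the lower-case parameter names are all invented for the example.

```python
import math

# Toy, non-separable data made up for this sketch.
XS = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
YS = [0, 0, 1, 0, 1, 1]


def log_likelihood(beta, xs, ys):
    """Logistic log-likelihood: sum of y*z - log(1 + exp(z))."""
    total = 0.0
    for x, y in zip(xs, ys):
        z = beta[0] + beta[1] * x
        total += y * z - math.log1p(math.exp(z))
    return total


def train(xs, ys, update_fn=None, converge_threshold=1e-9,
          max_iterations=10000, step=0.1):
    """Return (beta, iterations_used, converged)."""
    beta = [0.0, 0.0]
    old_llh = log_likelihood(beta, xs, ys)
    for iteration in range(1, max_iterations + 1):
        grad0 = grad1 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(beta[0] + beta[1] * x)))
            grad0 += y - p
            grad1 += (y - p) * x
        beta = [beta[0] + step * grad0, beta[1] + step * grad1]
        llh = log_likelihood(beta, xs, ys)
        if update_fn is not None:
            # Pass the previous value too, so the caller can see the change.
            update_fn(iteration, old_llh, llh)
        if abs(llh - old_llh) < converge_threshold:
            return beta, iteration, True
        old_llh = llh
    # Ran out of iterations: report that convergence was not reached.
    return beta, max_iterations, False


def show_progress(iteration, old_llh, loglikelihood):
    print("Iteration:", iteration, "Old:", old_llh,
          "New:", loglikelihood, "Diff:", old_llh - loglikelihood)
```

Calling train(XS, YS, converge_threshold=0.01) stops after fewer
iterations than the default threshold and at a log-likelihood that is
no higher, which is the behaviour described in this report.  Returning
a converged flag is one possible way to tell the caller when
max_iterations was exhausted.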




From bugzilla-daemon at portal.open-bio.org  Mon Dec  1 20:03:27 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 1 Dec 2008 15:03:27 -0500
Subject: [Biopython-dev] [Bug 2693] LogisticRegression convergence criterion
	is too lenient
In-Reply-To: <bug-2693-42@http.bugzilla.open-bio.org/>
Message-ID: <200812012003.mB1K3Rqg017974@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2693


------- Comment #1 from bsouthey at gmail.com  2008-12-01 15:03 EST -------
Created an attachment (id=1134)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1134&action=view)
Improvements to LogisticRegression.py

Addresses certain problems with LogisticRegression.py and enhances the module.




From bartek at rezolwenta.eu.org  Mon Dec  1 20:53:59 2008
From: bartek at rezolwenta.eu.org (Bartek Wilczynski)
Date: Mon, 1 Dec 2008 21:53:59 +0100
Subject: [Biopython-dev] [BioPython]  Refactoring motif analysis code
In-Reply-To: <492ACE38.1090301@gmail.com>
References: <8b34ec180811240651k45c11563p9e3dd18ba128f0ac@mail.gmail.com>
	<492ACE38.1090301@gmail.com>
Message-ID: <8b34ec180812011253p28a08a0bv43cd72369062b39b@mail.gmail.com>

Hi all,

I've done some work regarding the motif analysis in Biopython. I've
done the following stuff:
- Refactored Bio.AlignAce and Bio.MEME to use one common motif object
- Put all of the refactored code in the Bio.Motif directory
- Added more code (from my attic) to do motif comparisons and
  compute thresholds
  (this was actually written by my colleague Norbert Dojer, but I
  adapted it and I have his permission to contribute the code)
- Wrote a short tutorial on the usage of Bio.Motif (that's where I'd put it)
- Wrote a basic test suite for the new motif code

I haven't added it to cvs yet, but posted it as an attachment to the
enhancement proposal in bugzilla:
http://bugzilla.open-bio.org/show_bug.cgi?id=2694

I have cvs access, so I can commit the changes myself, but I'd like to
wait for an "OK" from someone more involved in the release process.

Since Giovanni and Bruce have responded to my previous call for comments,
I'll  try to answer them below:

On Mon, Nov 24, 2008 at 4:54 PM, Bruce Southey <bsouthey at gmail.com> wrote:

>
> Actually I am not that thrilled with the licenses for these packages and
> similar packages because these are free only for academic use. To me this
> clashes with the spirit of an open-sourced project especially a BSD-licensed
> one. But if there is a need for such modules then these modules should be
> included.
>

I have similar feelings about the "academic-use-only" licenses. On the
other hand, since most of the biopython users are in academia, I don't
see it as a big problem. Also, since I don't have any truly open and
free replacement for these programs, I think it's better to keep them.
In fact the new Bio.Motif package provides some methods for motif
comparisons, which at least to some extent can be used as a replacement
for the respective functions of CompareACE and MAST.

As a side note, I think that there is no point in providing parsers for
every single motif finder that comes out, and I don't think that
AlignAce and MEME are the best or the most representative ones. It just
happened that these parsers were written "to scratch someone's itch". I
think that the other functionality (motif searching, comparisons,
weblogo) might be more useful to people.

> While it is only free for academic use, have you seen TAMO?
> *TAMO: a flexible, object-oriented framework for analyzing transcriptional
> regulation using DNA-sequence motifs. *
> Bioinformatics. 2005 Jul 15;21(14):3164-5.
> <http://bioinformatics.oxfordjournals.org/cgi/content/abstract/21/14/3164>
>
> http://fraenkel.mit.edu/TAMO/

Yes, I've seen it and I've even recommended it on the biopython mailing
list when there was no replacement in biopython. However, their library
is free only for academia and AFAIK it's not using biopython data
structures, so it takes some work to integrate TAMO if you are using
Biopython. Bio.Motif is meant to provide free software for motif
analysis.

> Well, I am not sure how many used Bio.AlignAce given the Parser.py bug :-)
> Based on the CVS, both have been untouched for about three years.
>
Well, I've not used it myself for a while... I'm no longer doing de-novo
motif discovery. However, it still works, so it's potentially useful. I
think the low usage is largely due to the lack of documentation for the
Bio.AlignAce and Bio.MEME tools (partially my fault). Hopefully people
will start using this if they read the tutorial.

> Also, what species are these used for?
> One of the papers of AlignAce indicate that the base composition was set for
> yeast.
>
They're both general purpose: you can set the GC content for AlignAce
and even an HMM for MEME.

>
> Personally I would be interested in a general protein motif finding module
> because of my current research. However, I do have a different view with
> respect to the Biopython community as indicated above with the licenses.

Both MEME and AlignAce can be used to find motifs in proteins, but that
has little to do with Bio.Motif, since it does not provide any
motif-finding capabilities by itself. In general Bio.Motif should be
able to deal with protein motifs, but I've never tested it (I'm mostly
using it for DNA motifs), so I'll be happy to help if you find bugs.

On Mon, Nov 24, 2008 at 4:25 PM, Giovanni Marco Dall'Olio
<dalloliogm at gmail.com> wrote:
>
> I would just like to tell you that I have tried the TAMO framework you
> suggested me, and found it very useful.

Yes, I remember, but the problem is with the TAMO license. I think that
the Motif object might still be useful since it is free and lets you
read motifs from databases like JASPAR, scan sequences with them, and/or
compare them with "your" motifs.


> I am not using it anymore because I don't need it, but I remember that I liked:
> - the methods to represent motifs as matrixes of frequencies/occurrencies etc..
done
> - the fact that it was easy to create a motif from an alignment of sequences
depending on your definition of easy, it's there
> - the integration it had with this website:
> http://weblogo.berkeley.edu/logo.cgi.
done
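
For readers who want a picture of what "motifs as matrices of
frequencies" built "from an alignment of sequences" means, here is a
minimal plain-Python sketch. It is not the Bio.Motif API; the function
names count_matrix and frequency_matrix are made up for illustration,
and it assumes ungapped DNA instances of equal length.

```python
def count_matrix(instances, alphabet="ACGT"):
    """Count letters per column for equal-length, ungapped motif instances.

    Returns a dict mapping each letter to a list of per-column counts.
    (Hypothetical helper, not part of Biopython.)
    """
    if not instances:
        raise ValueError("need at least one instance")
    length = len(instances[0])
    if any(len(seq) != length for seq in instances):
        raise ValueError("all instances must have the same length")
    matrix = {letter: [0] * length for letter in alphabet}
    for seq in instances:
        for pos, letter in enumerate(seq.upper()):
            if letter not in matrix:
                raise ValueError("unexpected letter %r" % letter)
            matrix[letter][pos] += 1
    return matrix


def frequency_matrix(instances, alphabet="ACGT"):
    """Turn the per-column counts into relative frequencies."""
    counts = count_matrix(instances, alphabet)
    total = float(len(instances))
    return {letter: [c / total for c in column]
            for letter, column in counts.items()}
```

Calling count_matrix(["ACGT", "ACGA", "TCGT"]) gives one list of counts
per letter, one entry per alignment column; frequency_matrix divides
each count by the number of instances.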

> I would suggest you to provide integration with this other web
> service, which enable to plot the difference between two sequence
> logos: http://www.twosamplelogo.org/examples.html.

This I haven't done yet, but I'll try to provide functionality for
that (shouldn't take too long).

-- 
Bartek Wilczynski
==================
Postdoctoral fellow
EMBL, Furlong group
Meyerhoffstrasse 1,
69012 Heidelberg,
Germany
tel: +49 6221 387 8433


From dalloliogm at gmail.com  Mon Dec  1 21:07:08 2008
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Mon, 1 Dec 2008 22:07:08 +0100
Subject: [Biopython-dev] [BioPython] Refactoring motif analysis code
In-Reply-To: <8b34ec180812011253p28a08a0bv43cd72369062b39b@mail.gmail.com>
References: <8b34ec180811240651k45c11563p9e3dd18ba128f0ac@mail.gmail.com>
	<492ACE38.1090301@gmail.com>
	<8b34ec180812011253p28a08a0bv43cd72369062b39b@mail.gmail.com>
Message-ID: <5aa3b3570812011307q710cab78q2fbae061f5dd5eff@mail.gmail.com>

On Mon, Dec 1, 2008 at 9:53 PM, Bartek Wilczynski
<bartek at rezolwenta.eu.org> wrote:

> On Mon, Nov 24, 2008 at 4:25 PM, Giovanni Marco Dall'Olio
> <dalloliogm at gmail.com> wrote:
>>
>> I would just like to tell you that I have tried the TAMO framework you
>> suggested me, and found it very useful.
>
> Yes, I remember, but the problem is with the TAMO license. I think
> that the Motif object might be still
> useful since it is free, allows to read motifs from databases like
> JASPAR to scan sequences  and/or
> compare them with "your" motifs.

Thanks for all these changes.
I remember that I wrote a mail to TAMO's authors when I was using it.
They seemed to be interested in integrating the code with biopython,
so maybe the license issue could be overcome.
It's up to you, whether you want to reimplement all the functions they
have or not.


-- 

My blog on bioinformatics (now in English): http://bioinfoblog.it


From bartek at rezolwenta.eu.org  Tue Dec  2 09:39:37 2008
From: bartek at rezolwenta.eu.org (Bartek Wilczynski)
Date: Tue, 2 Dec 2008 10:39:37 +0100
Subject: [Biopython-dev]  Refactoring motif analysis code
In-Reply-To: <8b34ec180812020118t1c5bc551t4b1e241427755517@mail.gmail.com>
References: <8b34ec180811240651k45c11563p9e3dd18ba128f0ac@mail.gmail.com>
	<492ACE38.1090301@gmail.com>
	<8b34ec180812011253p28a08a0bv43cd72369062b39b@mail.gmail.com>
	<5aa3b3570812011307q710cab78q2fbae061f5dd5eff@mail.gmail.com>
	<8b34ec180812020118t1c5bc551t4b1e241427755517@mail.gmail.com>
Message-ID: <8b34ec180812020139y18feadf6s5d2ce23ec95b79d1@mail.gmail.com>

On Mon, Dec 1, 2008 at 10:07 PM, Giovanni Marco Dall'Olio
<dalloliogm at gmail.com> wrote:

> Thanks for all these changes.
> I remember that I wrote a mail to TAMO's authors when I was using it.
> They seemed to be interested in integrating the code with biopython,
> so maybe the license issue could be overcome.
> It's up to you, whether you want to reimplement all the functions they
> have or not.

I have to say I haven't done anything yet towards integrating TAMO
with biopython.
So far, my own code was doing the job for me, and since there was a
certain learning curve to get into TAMO,
I didn't look closely into it. I've looked more carefully now at it
and I have two general thoughts:
- There is a number of features in TAMO, for which there is no
counterpart in Bio.Motif. Just by looking at module names I've found:
 - MDscan parser
 - their own EM motif finding scheme (some kind of EM method)
 - several motif comparison functions from MotifCompare
 - a lot of nice little methods for motifs like textLogo, giflogo, etc.
- There is quite an overlap between biopython and TAMO. They
implemented their own Sequence handling, FASTA Parser, clustering
module etc.  There will be some gruntwork with integrating their code
into Biopython (finding and reconciling the overlaps)

I also have to say that I'm a bit scared by the copyright statements in
the TAMO code, saying it belongs to the Whitehead Institute. I don't
want to be overly pessimistic, but the process of releasing this code
under the biopython license might be slow.

What I think is the best way to go is to clean up the current mess with
Bio.AlignAce and Bio.MEME, and then ask people for contributions.
If the TAMO developers are willing to contribute, I'll be happy to
help with integration into biopython. It will take some time anyway,
so I wouldn't delay the inclusion of Bio.Motif into Biopython.

cheers
Bartek




From timothyham at gmail.com  Wed Dec  3 00:19:48 2008
From: timothyham at gmail.com (Timothy Ham)
Date: Tue, 2 Dec 2008 16:19:48 -0800
Subject: [Biopython-dev] Parsing malformed genbank files (e.g. VectorNTI)
Message-ID: <632cdbf70812021619i7e652a05nd801dd408ba9aad4@mail.gmail.com>

Hi everyone,

The current biopython GenBank parser dies while parsing VectorNTI
generated files.  For example, until recently, BioPython did not
accept an empty SOURCE field. It still does not handle empty
VERSION or ACCESSION fields (consumer.data.id never gets filled),
which is the default for user generated vector maps via VectorNTI.

Now, it is easy enough to change the GenBank parser to handle
malformed genbank files, (I can submit patches) but the real question
becomes:
> Should BioPython handle malformed genbank files at all?
I would like to be practical and say yes, since VectorNTI is a very
common, widely used format, but I wanted to ask the community before
submitting my patches.

Thanks for the great work,
Tim


From bsouthey at gmail.com  Wed Dec  3 02:33:26 2008
From: bsouthey at gmail.com (Bruce Southey)
Date: Tue, 2 Dec 2008 20:33:26 -0600
Subject: [Biopython-dev] Refactoring motif analysis code
In-Reply-To: <8b34ec180812020139y18feadf6s5d2ce23ec95b79d1@mail.gmail.com>
References: <8b34ec180811240651k45c11563p9e3dd18ba128f0ac@mail.gmail.com>
	<492ACE38.1090301@gmail.com>
	<8b34ec180812011253p28a08a0bv43cd72369062b39b@mail.gmail.com>
	<5aa3b3570812011307q710cab78q2fbae061f5dd5eff@mail.gmail.com>
	<8b34ec180812020118t1c5bc551t4b1e241427755517@mail.gmail.com>
	<8b34ec180812020139y18feadf6s5d2ce23ec95b79d1@mail.gmail.com>
Message-ID: <bbcd77d00812021833w4ed8cb46m939faab31ffd780b@mail.gmail.com>

On Tue, Dec 2, 2008 at 3:39 AM, Bartek Wilczynski
<bartek at rezolwenta.eu.org> wrote:
> On Mon, Dec 1, 2008 at 10:07 PM, Giovanni Marco Dall'Olio
> <dalloliogm at gmail.com> wrote:
>
>> Thanks for all these changes.
>> I remember that I wrote a mail to TAMO's authors when I was using it.
>> They seemed to be interested in integrating the code with biopython,
>> so maybe the license issue could be overcome.
>> It's up to you, whether you want to reimplement all the functions they
>> have or not.
>
> I have to say I haven't done anything yet towards integrating TAMO
> with biopython.
> So far, my own code was doing the job for me, and since there was a
> certain learning curve to get into TAMO,
> I didn't look closely into it. I've looked more carefully now at it
> and I have two general thoughts:
> - There is a number of features in TAMO, for which there is no
> counterpart in Bio.Motif. Just by looking at module names I've found:
>  - MDscan parser
>  - their own EM motif finding scheme (some kind of EM method)
>  - several motif comparison functions from MotifCompare
>  - a lot of nice little methods for motifs like textLogo, giflogo, etc.
> - There is quite an overlap between biopython and TAMO. They
> implemented their own Sequence handling, FASTA Parser, clustering
> module etc.  There will be some gruntwork with integrating their code
> into Biopython (finding and reconciling the overlaps)
>
> I also have to say, that I'm a bit scared by copyright statements in
> the TAMO code, saying it belongs to the Whitehead institute. I don't
> want to be overly pessimistic, but the process of releasing this code
> under biopython license might be slow.
>
> What I think is the best way to go is to clean up current mess with
> Bio.Alignace and Bio.MEME, and then ask people for contributions.
> If TAMO developers would be willing to contribute I'll be happy to
> help with integration into biopython. It will take some time anyway,
> so I wouldn't delay the inclusion of Bio.Motif into Biopython.
>
> cheers
> Bartek

I would agree that you should ignore TAMO and just focus on developing
a suitable framework to integrate Alignace and MEME as you have
indicated. I would presume that the other motif finding applications
will also fit into that framework.

Unless the TAMO code is under a BSD-style or equivalent license that
is compatible with Biopython you must stop looking at it. I know it is
hard to avoid as it comes up on Google with a simple search. If the
TAMO code gets suitably licensed, then fine but until then it can
cause major problems that can involve the whole Biopython project
(even including GPLed code can do this).

Bruce


From biopython at maubp.freeserve.co.uk  Wed Dec  3 21:10:49 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 3 Dec 2008 21:10:49 +0000
Subject: [Biopython-dev] Fwd: [Utilities-announce] PubMed Entrez Utility
	2009 DTD changes
In-Reply-To: <7B6F170840CA6C4DA63EE0C8A7BB43EC03A0001F@NIHCESMLBX15.nih.gov>
References: <7B6F170840CA6C4DA63EE0C8A7BB43EC03A0001F@NIHCESMLBX15.nih.gov>
Message-ID: <320fb6e00812031310s43124c68n988838af3837638d@mail.gmail.com>

This email from the NCBI will be of interest for Bio.Entrez - we may
need to add a few DTD files to Bio.Entrez in preparation for this...
see also Bug 2678.
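
To illustrate why the set of bundled DTD files matters, here is a rough
sketch of how one could read the DTD file name out of an XML DOCTYPE
declaration and check it against a list of locally available DTDs. This
is not how Bio.Entrez is implemented; dtd_filename, is_known_dtd and
the example.org URL in the usage note are invented for illustration.
Only the file name nlmmedlinecitation_090101.dtd comes from the NCBI
announcement.

```python
import re

# Matches <!DOCTYPE root SYSTEM "uri"> and <!DOCTYPE root PUBLIC "id" "uri">,
# capturing the system identifier (the URI or file name of the DTD).
DOCTYPE_SYSTEM_ID = re.compile(
    r'<!DOCTYPE\s+\S+\s+(?:PUBLIC\s+"[^"]*"\s+|SYSTEM\s+)"([^"]+)"'
)


def dtd_filename(xml_text):
    """Return the DTD file name named in the DOCTYPE, or None if absent."""
    match = DOCTYPE_SYSTEM_ID.search(xml_text)
    if match is None:
        return None
    # Keep only the last path component of the system identifier.
    return match.group(1).rsplit("/", 1)[-1]


def is_known_dtd(xml_text, known_dtds):
    """True if the document names a DTD that is in known_dtds."""
    name = dtd_filename(xml_text)
    return name is not None and name in known_dtds
```

For example, a document whose DOCTYPE points at
"http://example.org/DTD/nlmmedlinecitation_090101.dtd" would be flagged
as unknown if only the 080101 files were available locally.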

Peter

---------- Forwarded message ----------
From:  <utilities-announce at ncbi.nlm.nih.gov>
Date: Wed, Dec 3, 2008 at 8:57 PM
Subject: [Utilities-announce] PubMed Entrez Utility 2009 DTD changes
To: utilities-announce at ncbi.nlm.nih.gov


PubMed Entrez Utility Users,

We anticipate switching to the updated PubMed 2009 DTDs on December 15,
2008.

2009 DTDs are available from the Entrez DTD page:
http://eutils.ncbi.nlm.nih.gov/entrez/query/DTD/index.html

The DTD changes for the 2009 production year, as noted in the Revision
Notes section near the top of each DTD, are:

NLMMedline DTD (used for MEDLINE/PubMed)
a. Changed entity reference from "nlmmedlinecitation_080101.dtd" to:
"nlmmedlinecitation_090101.dtd"
b. CHANGE WITHDRAWN FOR V.2: Deleted entity NlmDcmsID.Ref and NlmDcmsID
element [Edited 10/16/08]
c. FOR V.3: Added GrantCountry.Ref entity [Edited 10/30/08]

NLMMedlineCitation DTD (used for MEDLINE/PubMed data)
a. Changed entity reference from "nlmsharedcatcit_080101.dtd" to:
"nlmsharedcatcit_090101.dtd"
b. Moved entity Type to nlmcommon dtd
c. Added NLM value to entity Source
d. CHANGE WITHDRAWN FOR V.2: Deleted entity NlmDcmsID.Ref [Edited
10/16/08]

NLMSharedCatCit DTD (used for MEDLINE/PubMed, CatfilePlus, and Serfile)
a.  Changed entity reference from "nlmcommon_080101.dtd"
to "nlmcommon_090101.dtd"
b.  Moved OtherAbstract element from nlmsharedcatcit dtd to nlmcommon
dtd

NLMCommon DTD (used for MEDLINE/PubMed, CatfilePlus, and Serfile)
a. Added ValidYN attribute to Investigator element
b. Moved OtherAbstract element from nlmsharedcatcit to nlmcommon dtd
c. Added OtherAbstract element to NCBIArticle element
d. Moved entity Type from nlmmedlinecitation to nlmcommon dtd
e. Added Publisher value to entity Type
f. Deleted Consumer value from entity Type
g. Added Country element to Grant element
h. FOR V.2: Changed Country value to GrantCountry.Ref in Grant Element
[Edited 10/30/08]

NLMCatalogRecord DTD (used for CatfilePlus and Serfile in XML format):
a.  Changed entity reference from "nlmsharedcatcit_080101.dtd"
to: "nlmsharedcatcit_090101.dtd"
b.  Added PrecedingInPart, SupersedesInPart, SucceedingInPart,
SupersededInPartBy values to entity TitleType


_______________________________________________
Utilities-announce mailing list
http://www.ncbi.nlm.nih.gov/mailman/listinfo/utilities-announce


From biopython at maubp.freeserve.co.uk  Thu Dec  4 10:26:39 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 4 Dec 2008 10:26:39 +0000
Subject: [Biopython-dev] Parsing malformed genbank files (e.g. VectorNTI)
In-Reply-To: <632cdbf70812021619i7e652a05nd801dd408ba9aad4@mail.gmail.com>
References: <632cdbf70812021619i7e652a05nd801dd408ba9aad4@mail.gmail.com>
Message-ID: <320fb6e00812040226g117fe534g4523e8b58f7f28@mail.gmail.com>

On Wed, Dec 3, 2008 at 12:19 AM, Timothy Ham <timothyham at gmail.com> wrote:
>
> Hi everyone,
>
> The current biopython GenBank parser dies while parsing VectorNTI
> generated files.  For example, until recently, BioPython did not
> accept an empty SOURCE field. It still does not handle an empty
> VERSION or ACCESSION fields (consumer.data.id never gets filled),
> which is the default for user generated vector maps via VectorNTI.

I fixed the SOURCE issue in Bio/GenBank/__init__.py CVS revision 1.97
after Tim contacted me offlist - there was no bug report.

> Now, it is easy enough to change the GenBank parser to handle
> malformed genbank files, (I can submit patches) but the real question
> becomes:
>> Should BioPython handle malformed genbank files at all?
> I would like to be practical and say yes, since VectorNTI is a very
> common, widely used format, but I wanted to ask the community before
> submitting my patches.
>
> Thanks for the great work,
> Tim

As I'm the de facto maintainer for Bio.GenBank, I guess unless the list
as a whole has a consensus this is my call.

Reading the GenBank file format spec, the ACCESSION and VERSION lines
are clearly intended to be mandatory.  Note that for mandatory fields,
IIRC, the NCBI will use a single dot/period as a place holder when
there is no data.  So I would argue that VectorNTI is producing
invalid files, and you should write to the authors and encourage them
to follow the spec more closely (even if we do change Biopython to
cope).

However, I'm willing to bend a little on out of spec GenBank files (in
cases like this where there is no ambiguity about the parsing), but I
would want a real example output file from VectorNTI to include for a
unit test.  This is important as we need to use something sensible for
the SeqRecord's id property if the ACCESSION and VERSION are missing.

Peter


From mjldehoon at yahoo.com  Thu Dec  4 12:32:18 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Thu, 4 Dec 2008 04:32:18 -0800 (PST)
Subject: [Biopython-dev] Rethinking Biopython's testing framework
In-Reply-To: <320fb6e00811280309w7b5f0fc6m38795c4dc61c8744@mail.gmail.com>
Message-ID: <442447.52362.qm@web62407.mail.re1.yahoo.com>

> Michiel de Hoon wrote:
> > If one of the sub-tests fails, Python's unit
> > testing framework will tell us so,
> > though (perhaps) not exactly which sub-test fails.
> > However, that is easy to
> > figure out just by running the individual test script
> > by itself.
> 
> That won't always work.  Consider intermittent network
> problems, or tests using random data - in general it 
> really is worthwhile having run_tests.py report a little
> more than just which test_XXX.py module failed.
>
I wonder if Python's unit testing framework allows us to capture exactly which sub-test fails. I'll look into that. Ideally, it should be possible to have regular Python unit tests and Biopython-style print-and-compare tests side by side, and get information about failing sub-tests for both.
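
As a rough illustration of what the standard library offers here: the
result object returned by a unittest runner keeps a list of
(test, traceback) pairs for failures and for errors, and each test can
report its own id. The sketch below is written for a current Python
rather than the versions in use at the time of this thread, and the
helper names make_example_case and failing_test_ids are made up; it
says nothing about how run_tests.py works.

```python
import io
import unittest


def make_example_case():
    """Build a tiny TestCase with one passing and one failing sub-test."""
    class Example(unittest.TestCase):
        def test_passes(self):
            self.assertEqual(1 + 1, 2)

        def test_fails(self):
            self.assertEqual(1 + 1, 3)

    return Example


def failing_test_ids(test_case_class):
    """Run one TestCase class and return the ids of sub-tests that failed."""
    suite = unittest.TestLoader().loadTestsFromTestCase(test_case_class)
    # Send the runner's report to a buffer so only the ids are returned.
    runner = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0)
    result = runner.run(suite)
    # result.failures and result.errors are lists of (test, traceback text).
    return sorted(test.id() for test, _tb in result.failures + result.errors)
```

failing_test_ids(make_example_case()) returns a one-element list whose
entry ends in "test_fails", which is the kind of per-sub-test detail a
test driver could print.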

--Michiel.


From bsouthey at gmail.com  Thu Dec  4 15:02:13 2008
From: bsouthey at gmail.com (Bruce Southey)
Date: Thu, 04 Dec 2008 09:02:13 -0600
Subject: [Biopython-dev] Parsing malformed genbank files (e.g. VectorNTI)
In-Reply-To: <320fb6e00812040226g117fe534g4523e8b58f7f28@mail.gmail.com>
References: <632cdbf70812021619i7e652a05nd801dd408ba9aad4@mail.gmail.com>
	<320fb6e00812040226g117fe534g4523e8b58f7f28@mail.gmail.com>
Message-ID: <4937F0F5.6070905@gmail.com>

Peter wrote:
> On Wed, Dec 3, 2008 at 12:19 AM, Timothy Ham <timothyham at gmail.com> wrote:
>   
>> Hi everyone,
>>
>> The current biopython GenBank parser dies while parsing VectorNTI
>> generated files.  For example, until recently, BioPython did not
>> accept an empty SOURCE field. It still does not handle an empty
>> VERSION or ACCESSION fields (consumer.data.id never gets filled),
>> which is the default for user generated vector maps via VectorNTI.
>>     
>
> I fixed the SOURCE issue in Bio/GenBank/__init__.py CVS revision 1.97
> after Tim contacted me offlist - there was no bug report.
>
>   
>> Now, it is easy enough to change the GenBank parser to handle
>> malformed genbank files, (I can submit patches) but the real question
>> becomes:
>>     
>>> Should BioPython handle malformed genbank files at all?
>>>       
>> I would like to be practical and say yes, since VectorNTI is a very
>> common, widely used format, but I wanted to ask the community before
>> submitting my patches.
>>
>> Thanks for the great work,
>> Tim
>>     
>
> As I'm the defacto maintainer for Bio.GenBank, I guess unless the list
> as a whole has a consensus this is my call.
>
> Reading the GenBank file format spec, the ACCESSION and VERSION lines
> are clearly intended to be mandatory.  Note that for mandatory fields,
> IIRC, the NCBI will use a single dot/period as a place holder when
> there is no data.  So I would argue that VectorNTI is producing
> invalid files, and you should write to the authors and encourage them
> to follow the spec more closely (even if we do change Biopython to
> cope).
>
> However, I'm willing to bend a little on out of spec GenBank files (in
> cases like this where there is no ambiguity about the parsing), but I
> would want a real example output file from VectorNTI to include for a
> unit test.  This is important as we need to use something sensible for
> the SeqRecord's id property if the ACCESSION and VERSION are missing.
>
> Peter
At http://www.ncbi.nlm.nih.gov/Genbank/index.html there is a link to the 
'complete release notes for the current version of GenBank'.
 From ftp://ftp.ncbi.nih.gov/genbank/gbrel.txt, it clearly states that 
ACCESSION and VERSION are mandatory and I interpret the '/' to mean 
'with'. The relevant section is:

3.4.2  Entry Organization
"
  The second part of each sequence entry record contains the information
appropriate to its keyword, in positions 13 to 80 for keywords and
positions 11 to 80 for the sequence.

  The following is a brief description of each entry field. Detailed
information about each field may be found in Sections 3.4.4 to 3.4.15.

LOCUS	- A short mnemonic name for the entry, chosen to suggest the
sequence's definition. Mandatory keyword/exactly one record.

DEFINITION	- A concise description of the sequence. Mandatory
keyword/one or more records.

ACCESSION	- The primary accession number is a unique, unchanging
identifier assigned to each GenBank sequence record. (Please use this
identifier when citing information from GenBank.) Mandatory keyword/one
or more records.

VERSION		- A compound identifier consisting of the primary
accession number and a numeric version number associated with the
current version of the sequence data in the record. This is followed
by an integer key (a "GI") assigned to the sequence by NCBI.
Mandatory keyword/exactly one record.
"

If these entries are missing then Biopython must raise an exception 
because the GenBank file is invalid.

While I have not seen an example, does a VectorNTI output contain the 
LOCUS field that could be used as an accession number?
I think it is fairly common for the accession number to be part of the 
LOCUS field.
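
One way to picture the LOCUS idea is a small fallback rule for picking
a record identifier. This is only a sketch of a possible policy, not
what Bio.GenBank does: the function name choose_record_id, the order
VERSION, then ACCESSION, then LOCUS name, and the "<unknown id>"
default are all assumptions. The "." placeholder follows the convention
Peter recalled for empty mandatory fields.

```python
def choose_record_id(version, accession, locus_name, placeholder="."):
    """Pick the first usable identifier among VERSION, ACCESSION and LOCUS.

    A value is unusable if it is None, blank, or just the placeholder.
    (Hypothetical policy for illustration only.)
    """
    for candidate in (version, accession, locus_name):
        if candidate is None:
            continue
        candidate = candidate.strip()
        if candidate and candidate != placeholder:
            return candidate
    return "<unknown id>"
```

With an empty VERSION and a "." ACCESSION, choose_record_id("", ".",
"pBR322") falls back to the LOCUS name "pBR322"; a complete record
keeps its VERSION value.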

Bruce


From biopython at maubp.freeserve.co.uk  Thu Dec  4 15:16:20 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 4 Dec 2008 15:16:20 +0000
Subject: [Biopython-dev] Parsing malformed genbank files (e.g. VectorNTI)
In-Reply-To: <4937F0F5.6070905@gmail.com>
References: <632cdbf70812021619i7e652a05nd801dd408ba9aad4@mail.gmail.com>
	<320fb6e00812040226g117fe534g4523e8b58f7f28@mail.gmail.com>
	<4937F0F5.6070905@gmail.com>
Message-ID: <320fb6e00812040716h1fb4bfbflf5a37456102722cc@mail.gmail.com>

On Thu, Dec 4, 2008 at 3:02 PM, Bruce Southey <bsouthey at gmail.com> wrote:
> Peter wrote:
>> Reading the GenBank file format spec, the ACCESSION and VERSION lines
>> are clearly intended to be mandatory.  Note that for mandatory fields,
>> IIRC, the NCBI will use a single dot/period as a place holder when
>> there is no data.  So I would argue that VectorNTI is producing
>> invalid files, and you should write to the authors and encourage them
>> to follow the spec more closely (even if we do change Biopython to
>> cope).

Bruce wrote:
> At http://www.ncbi.nlm.nih.gov/Genbank/index.html there is a link to the
> 'complete release notes for the current version of GenBank'.
> From ftp://ftp.ncbi.nih.gov/genbank/gbrel.txt, it clearly states that
> ACCESSION and VERSION are mandatory  ...

We agree on this, according to the current NCBI standard, a GenBank
file missing the ACCESSION or VERSION line is technically invalid.

Bruce:
> If these entries are missing then Biopython must raise an exception because
> the GenBank file is invalid.

I see a difference between a GenBank parser, and a GenBank validator.
While it would be nice to just say "your file is invalid", in many
cases the meaning of the file isn't ambiguous and can still be safely
parsed.  From past experience, even the NCBI sometimes provide invalid
files which break their own rules (e.g. Biopython Bug 2591).  In my
personal opinion, a strict parser which rejects any invalid GenBank
file isn't actually that useful - there is a grey area where a little
leniency is very helpful:

Peter wrote:
>> However, I'm willing to bend a little on out of spec GenBank files (in
>> cases like this where there is no ambiguity about the parsing), but I
>> would want a real example output file from VectorNTI to include for a
>> unit test.  This is important as we need to use something sensible for
>> the SeqRecord's id property if the ACCESSION and VERSION are missing.
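
To make the parser/validator distinction concrete, here is a minimal
sketch of a separate check that reports which mandatory header keywords
are absent or empty, leaving it to the caller to decide whether to
reject the file or carry on. It does not use or describe Bio.GenBank;
the names MANDATORY, header_keywords and missing_mandatory are invented,
and the keyword list is just the four fields quoted from the release
notes in this thread.

```python
MANDATORY = ("LOCUS", "DEFINITION", "ACCESSION", "VERSION")


def header_keywords(record_text):
    """Map top-level header keywords to the text on their first line."""
    found = {}
    for line in record_text.splitlines():
        if line.startswith("FEATURES") or line.startswith("ORIGIN"):
            break  # end of the header section
        if not line.strip() or line[:1].isspace():
            continue  # blank lines, continuation lines and subkeywords
        parts = line.split(None, 1)
        value = parts[1].strip() if len(parts) > 1 else ""
        found.setdefault(parts[0], value)
    return found


def missing_mandatory(record_text):
    """List mandatory keywords that are absent or have no value."""
    found = header_keywords(record_text)
    return [key for key in MANDATORY if not found.get(key)]
```

A strict validator would raise as soon as missing_mandatory() returns
anything; a lenient parser could log the list and still return the
record.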

Peter


From biopython at maubp.freeserve.co.uk  Thu Dec  4 22:15:26 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 4 Dec 2008 22:15:26 +0000
Subject: [Biopython-dev] Parsing malformed genbank files (e.g. VectorNTI)
In-Reply-To: <632cdbf70812041352oec43f5fh13bd35a1416d0fd2@mail.gmail.com>
References: <632cdbf70812021619i7e652a05nd801dd408ba9aad4@mail.gmail.com>
	<320fb6e00812040226g117fe534g4523e8b58f7f28@mail.gmail.com>
	<632cdbf70812041352oec43f5fh13bd35a1416d0fd2@mail.gmail.com>
Message-ID: <320fb6e00812041415t3fb22630xae4d34205e0562a3@mail.gmail.com>

Tim wrote:
> I have attached two representative example genbank outputs from
> VectorNTI. I don't know if the mailing list accepts attachments, but
> if it can't, is there a place where I can put it (maybe the biopython
> wiki?)

I got them, thanks.  For future reference, it would have been better
to have filed a bug on bugzilla, and then (once the bug is filed) you
can attach files to it.

Earlier Tim wrote:
>>> The current biopython GenBank parser dies while parsing VectorNTI
>>> generated files.  For example, until recently, BioPython did not
>>> accept an empty SOURCE field. It still does not handle an empty
>>> VERSION or ACCESSION fields (consumer.data.id never gets filled),
>>> which is the default for user generated vector maps via VectorNTI.

Now that I've got your two files, my copy of Biopython seems to read
them just fine.  What exactly do you mean by the "parser dies"?  Could
you show us a snippet of code and if relevant the exception error -
plus details of your OS, version of Python and Biopython etc?

Thanks

Peter


From timothyham at gmail.com  Fri Dec  5 02:09:21 2008
From: timothyham at gmail.com (Timothy Ham)
Date: Thu, 4 Dec 2008 18:09:21 -0800
Subject: [Biopython-dev] Parsing malformed genbank files (e.g. VectorNTI)
In-Reply-To: <320fb6e00812041415t3fb22630xae4d34205e0562a3@mail.gmail.com>
References: <632cdbf70812021619i7e652a05nd801dd408ba9aad4@mail.gmail.com>
	<320fb6e00812040226g117fe534g4523e8b58f7f28@mail.gmail.com>
	<632cdbf70812041352oec43f5fh13bd35a1416d0fd2@mail.gmail.com>
	<320fb6e00812041415t3fb22630xae4d34205e0562a3@mail.gmail.com>
Message-ID: <632cdbf70812041809v1d4ed344q3cc03db3e310b2ab@mail.gmail.com>

On Thu, Dec 4, 2008 at 2:15 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> Now that I've got your two files, my copy of Biopython seems to read
> them just fine.  What exactly do you mean by the "parser dies"?  Could
> you show us a snippet of code and if relevant the exception error -
> plus details of your OS, version of Python and Biopython etc?
>
> Thanks
>
> Peter
>

Ah, my bad. I was running it against an old version. It looks like it
was fixed as of
/biopython/Bio/GenBank/__init__.py version 1.87 (biopython release 1.48).
The current version does the right thing.

Thanks much,
Tim


From biopython at maubp.freeserve.co.uk  Fri Dec  5 10:19:12 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 5 Dec 2008 10:19:12 +0000
Subject: [Biopython-dev] Parsing malformed genbank files (e.g. VectorNTI)
In-Reply-To: <632cdbf70812041809v1d4ed344q3cc03db3e310b2ab@mail.gmail.com>
References: <632cdbf70812021619i7e652a05nd801dd408ba9aad4@mail.gmail.com>
	<320fb6e00812040226g117fe534g4523e8b58f7f28@mail.gmail.com>
	<632cdbf70812041352oec43f5fh13bd35a1416d0fd2@mail.gmail.com>
	<320fb6e00812041415t3fb22630xae4d34205e0562a3@mail.gmail.com>
	<632cdbf70812041809v1d4ed344q3cc03db3e310b2ab@mail.gmail.com>
Message-ID: <320fb6e00812050219k376fdda2r969fe78a547b0ff6@mail.gmail.com>

Tim wrote:
> Ah, my bad. I was running it against an old version. It looks like it
> was fixed as of
> /biopython/Bio/GenBank/__init__.py version 1.87 (biopython release 1.48).
> The current version does the right thing.

Oh right - that was when I was testing parsing of the slightly
non-standard GenBank output from the EMBOSS seqret tool.  Anyway,
problem solved :)

Peter


From bugzilla-daemon at portal.open-bio.org  Fri Dec  5 11:59:07 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 5 Dec 2008 06:59:07 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main
	Biopython distribution
In-Reply-To: <bug-2671-42@http.bugzilla.open-bio.org/>
Message-ID: <200812051159.mB5Bx7TR009168@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671


------- Comment #17 from biopython-bugzilla at maubp.freeserve.co.uk  2008-12-05 06:59 EST -------
(In reply to comment #0)
> The default font has been changed to 'Vera', which is shipped with Reportlab,
> to avoid some problems with unavailable fonts

On my Mac "Vera" doesn't work, and going back to the default of 'Helvetica'
seems best on Unix in general.  Also, Helvetica is one of the standard fonts
which all PDF viewers should be able to render.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.


From bugzilla-daemon at portal.open-bio.org  Fri Dec  5 16:44:10 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 5 Dec 2008 11:44:10 -0500
Subject: [Biopython-dev] [Bug 2697] New: MaxEntropy calculate function
	assumes integer values for class and convergence criteria is
	hard coded
Message-ID: <bug-2697-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2697

           Summary: MaxEntropy calculate function assumes integer values for
                    class and convergence criteria is hard coded
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: minor
          Priority: P3
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: bsouthey at gmail.com


The Bio.MaxEntropy.classify() assumes that the targets are integers starting
at zero. However, a model can be trained by using character values. This
requires a simple change in a loop in that function.

Also, the convergence criteria are hard coded into the file by the following
global definitions:
MAX_IIS_ITERATIONS = 10000    # Maximum iterations for IIS.
IIS_CONVERGE = 1E-5           # Convergence criteria for IIS.
MAX_NEWTON_ITERATIONS = 100   # Maximum iterations on Newton's method.
NEWTON_CONVERGE = 1E-10       # Convergence criteria for Newton's method.

This makes it impossible for the user to specify their own values without
changing the actual function. This is changed by passing these values to the
train function and subfunctions. 

Both of these are fixed in an attached patch.
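
The general shape of both changes can be sketched independently of the
patch. The code below is not taken from the attachment or from
Bio.MaxEntropy: newton_root and index_classes are made-up stand-ins.
The first shows module-level constants kept as defaults for keyword
arguments, so callers can override the convergence settings without
editing the module; the second shows mapping arbitrary class labels to
integer indices.

```python
MAX_NEWTON_ITERATIONS = 100   # default cap on Newton iterations
NEWTON_CONVERGE = 1E-10       # default convergence threshold


def newton_root(func, derivative, start,
                max_iterations=MAX_NEWTON_ITERATIONS,
                converge=NEWTON_CONVERGE):
    """Find a root of func by Newton's method with caller-tunable limits."""
    x = start
    for _ in range(max_iterations):
        step = func(x) / derivative(x)
        x -= step
        if abs(step) < converge:
            return x
    raise RuntimeError("Newton's method did not converge")


def index_classes(labels):
    """Map arbitrary hashable, sortable labels to 0-based class indices."""
    classes = sorted(set(labels))
    lookup = {label: i for i, label in enumerate(classes)}
    return classes, [lookup[label] for label in labels]
```

Existing callers that pass no extra arguments keep the old behaviour,
while newton_root(f, df, 1.0, converge=1e-6) loosens the threshold for
a single call.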




From bugzilla-daemon at portal.open-bio.org  Fri Dec  5 16:47:15 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 5 Dec 2008 11:47:15 -0500
Subject: [Biopython-dev] [Bug 2697] MaxEntropy calculate function assumes
	integer values for class and convergence criteria is hard coded
In-Reply-To: <bug-2697-42@http.bugzilla.open-bio.org/>
Message-ID: <200812051647.mB5GlFRQ020087@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2697


------- Comment #1 from bsouthey at gmail.com  2008-12-05 11:47 EST -------
Created an attachment (id=1139)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1139&action=view)
Fixes to MaxEntrophy

1) Fixes MaxEntropy.calculate to use the target classes from the data
2) Permits the user to define their own convergence criterion




From bugzilla-daemon at portal.open-bio.org  Fri Dec  5 16:59:51 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 5 Dec 2008 11:59:51 -0500
Subject: [Biopython-dev] [Bug 2698] New: Attempt at a unit test for
	MaxEntrophy
Message-ID: <bug-2698-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2698

           Summary: Attempt at a unit test for MaxEntrophy
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Unit Tests
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: bsouthey at gmail.com


I used test_LogisticRegression.py to develop a test for MaxEntropy. However, I
could not get MaxEntropy to train on that dataset. Indeed I have found it to
be very sensitive to both data and functions, making it extremely hard to
develop bioinformatics-based data and associated tests. So in the end I
generated data based on some of my work.

I trained the model outside the tests because I do not know how to avoid
retraining the model for each test.
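
One way to train only once, sketched with the standard unittest module:
setUpClass runs a single time per test class, so the model can be built there
and shared. Note that setUpClass was only added in Python 2.7, and that
train_model below is a hypothetical stand-in for the real (slow) training
call, with a counter added purely to show how often it runs.

```python
import unittest

TRAINING_RUNS = 0


def train_model():
    """Stand-in for an expensive training step (hypothetical)."""
    global TRAINING_RUNS
    TRAINING_RUNS += 1
    return {"weights": [0.5, 1.5]}


class ModelTests(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Runs once for the whole class, not once per test method.
        cls.model = train_model()

    def test_number_of_weights(self):
        self.assertEqual(len(self.model["weights"]), 2)

    def test_weights_are_positive(self):
        self.assertTrue(all(w > 0 for w in self.model["weights"]))


suite = unittest.defaultTestLoader.loadTestsFromTestCase(ModelTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The tests must treat the shared model as read-only, otherwise they can
influence each other.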




From bugzilla-daemon at portal.open-bio.org  Fri Dec  5 17:00:29 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 5 Dec 2008 12:00:29 -0500
Subject: [Biopython-dev] [Bug 2698] Attempt at a unit test for MaxEntrophy
In-Reply-To: <bug-2698-42@http.bugzilla.open-bio.org/>
Message-ID: <200812051700.mB5H0Ted022044@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2698


------- Comment #1 from bsouthey at gmail.com  2008-12-05 12:00 EST -------
Created an attachment (id=1140)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1140&action=view)
Test for MaxEntrophy




From timothyham at gmail.com  Thu Dec  4 21:52:33 2008
From: timothyham at gmail.com (Timothy Ham)
Date: Thu, 4 Dec 2008 13:52:33 -0800
Subject: [Biopython-dev] Parsing malformed genbank files (e.g. VectorNTI)
In-Reply-To: <320fb6e00812040226g117fe534g4523e8b58f7f28@mail.gmail.com>
References: <632cdbf70812021619i7e652a05nd801dd408ba9aad4@mail.gmail.com>
	<320fb6e00812040226g117fe534g4523e8b58f7f28@mail.gmail.com>
Message-ID: <632cdbf70812041352oec43f5fh13bd35a1416d0fd2@mail.gmail.com>

On Thu, Dec 4, 2008 at 2:26 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> On Wed, Dec 3, 2008 at 12:19 AM, Timothy Ham <timothyham at gmail.com> wrote:
>>
>> Hi everyone,
>>
>> The current biopython GenBank parser dies while parsing VectorNTI
>> generated files.  For example, until recently, BioPython did not
>> accept an empty SOURCE field. It still does not handle an empty
>> VERSION or ACCESSION fields (consumer.data.id never gets filled),
>> which is the default for user generated vector maps via VectorNTI.
>
> I fixed the SOURCE issue in Bio/GenBank/__init__.py CVS revision 1.97
> after Tim contacted me offlist - there was no bug report.
>
>> Now, it is easy enough to change the GenBank parser to handle
>> malformed genbank files, (I can submit patches) but the real question
>> becomes:
>>> Should BioPython handle malformed genbank files at all?
>> I would like to be practical and say yes, since VectorNTI is a very
>> common, widely used format, but I wanted to ask the community before
>> submitting my patches.
>>
>> Thanks for the great work,
>> Tim
>
> As I'm the defacto maintainer for Bio.GenBank, I guess unless the list
> as a whole has a consensus this is my call.
>
> Reading the GenBank file format spec, the ACCESSION and VERSION lines
> are clearly intended to be mandatory.  Note that for mandatory fields,
> IIRC, the NCBI will use a single dot/period as a place holder when
> there is no data.  So I would argue that VectorNTI is producing
> invalid files, and you should write to the authors and encourage them
> to follow the spec more closely (even if we do change Biopython to
> cope).
>
> However, I'm willing to bend a little on out of spec GenBank files (in
> cases like this where there is no ambiguity about the parsing), but I
> would want a real example output file from VectorNTI to include for a
> unit test.  This is important as we need to use something sensible for
> the SeqRecord's id property if the ACCESSION and VERSION are missing.
>
> Peter
>

I have attached two representative example GenBank outputs from
VectorNTI. I don't know if the mailing list accepts attachments, but
if it doesn't, is there a place where I can put them (maybe the biopython
wiki?)

Tim
-------------- next part --------------
A non-text attachment was scrubbed...
Name: vnti_example.zip
Type: application/zip
Size: 11716 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/biopython-dev/attachments/20081204/15ebddc9/attachment-0002.zip>

From bugzilla-daemon at portal.open-bio.org  Tue Dec  9 14:55:05 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 9 Dec 2008 09:55:05 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main
	Biopython distribution
In-Reply-To: <bug-2671-42@http.bugzilla.open-bio.org/>
Message-ID: <200812091455.mB9Et5iX017478@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Attachment #1132|application/octet-stream    |text/plain
          mime type|                            |
Attachment #1132 is|0                           |1
              patch|                            |


------- Comment #18 from biopython-bugzilla at maubp.freeserve.co.uk  2008-12-09 09:55 EST -------
(From update of attachment 1132)
Checked into CVS (with the font defaulting to Helvetica as discussed with
Leighton privately).




From bugzilla-daemon at portal.open-bio.org  Tue Dec  9 14:55:56 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 9 Dec 2008 09:55:56 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main
	Biopython distribution
In-Reply-To: <bug-2671-42@http.bugzilla.open-bio.org/>
Message-ID: <200812091455.mB9Etu7C017584@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1132 is|1                           |0
              patch|                            |
Attachment #1132 is|0                           |1
           obsolete|                            |


------- Comment #19 from biopython-bugzilla at maubp.freeserve.co.uk  2008-12-09 09:55 EST -------
(From update of attachment 1132)
This is now obsolete - checked into CVS (with the font defaulting to Helvetica
as discussed with Leighton privately).




From bugzilla-daemon at portal.open-bio.org  Tue Dec  9 15:12:56 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 9 Dec 2008 10:12:56 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main
	Biopython distribution
In-Reply-To: <bug-2671-42@http.bugzilla.open-bio.org/>
Message-ID: <200812091512.mB9FCusM019463@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671


------- Comment #20 from biopython-bugzilla at maubp.freeserve.co.uk  2008-12-09 10:12 EST -------
(In reply to comment #12)
> 
> Bio.Graphics.GenomeDiagram.Utilities
> ====================================
> This is a collection of utilities for getting information useful for graph
> values.  From the docstring,
> 
>     o apply_to_window (sequence, window_size, function, step=None)  Apply a
>                         passed function to fragments of the passed sequence of
>                         size window_size, with each window separated by the
>                         passed step.

This windowing function is rather specific to GenomeDiagram by the nature of
how it returns the values and their positions.  The handling of the end of the
sequence is also non-general.  Suppose we put apply_to_window somewhere under
Bio.Graphics.GenomeDiagram.  It can then be used with any sequence analysis
function which takes a sequence/string and returns a float, returning the
scores and window positions as expected by GenomeDiagram for drawing graphical
tracks.
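
To make the idea concrete, here is a simplified sketch of such a windowing
helper. It is not GenomeDiagram's implementation: windowed_scores is a
hypothetical name, the default step of half a window is an assumption, and the
special handling of the end of the sequence mentioned above is left out (a
trailing partial window is simply skipped).

```python
def windowed_scores(sequence, window_size, function, step=None):
    """Return a list of (window midpoint, score) pairs."""
    if step is None:
        step = max(1, window_size // 2)   # assumed default, for illustration
    results = []
    for start in range(0, len(sequence) - window_size + 1, step):
        fragment = sequence[start:start + window_size]
        middle = start + window_size // 2
        results.append((middle, function(fragment)))
    return results


def gc_fraction(fragment):
    """GC content of a fragment as a float in the range 0 to 1."""
    fragment = fragment.upper()
    return (fragment.count("G") + fragment.count("C")) / float(len(fragment))


windows = windowed_scores("GGCCAATTGC", 4, gc_fraction, step=2)
```

Any function taking a sequence/string and returning a float can be passed in
place of gc_fraction.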

That would leave the following general non-windowed functions from
Utilities.py,

calc_gc_content - returns a float in the range 0 to 1.
calc_at_content - returns a float in the range 0 to 1.
calc_gc_skew - returns a float, gives zero if there is no GC content.
calc_at_skew - returns a float, gives zero if there is no AT content.

Bio.SeqUtils already has several functions including:

GC - returns a float in the range 0 to 100 (i.e. 100 times the actual fraction)
GC_skew - returns a list of floats using a default window size of 100bp.  Gives
a floating point exception if there is no GC content in any window.

Personally I don't like the fact that the existing GC function returns a number
between 0 and 100, but otherwise this code is fine.

I don't think the current GC_skew function is intuitive, and it doesn't cover
the non-windowed use-case where you want the GC_skew of the whole sequence
passed in.  This is important if you want to do your own windowing (e.g.
comparing GC skew of individual genes to the whole genome).

Because they differ from the existing Bio.SeqUtils code, I think there is a
case for adding the four non-windowed functions from GenomeDiagram's
Utilities.py under Bio.SeqUtils.  Perhaps under a sub module like
Bio.SeqUtils.Nucleotides or Bio.SeqUtils.NucUtils?  The existing GC functions
in Bio.SeqUtils could be deprecated or at least declared obsolete.




From bugzilla-daemon at portal.open-bio.org  Tue Dec  9 15:19:23 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 9 Dec 2008 10:19:23 -0500
Subject: [Biopython-dev] [Bug 2704] New: Parser for the markx10 alignment
	format
Message-ID: <bug-2704-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2704

           Summary: Parser for the markx10 alignment format
           Product: Biopython
           Version: Not Applicable
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: osvaldo.zagordi at bsse.ethz.ch


Hi,
I recently wrote some code to parse the Emboss alignment format
markx10 (format explained at
http://emboss.sourceforge.net/docs/themes/AlignFormats.html)
Since it is slightly different from the Fasta m10 (not surprising, right?) I
had to adapt FastaIO.py.
I thought this might eventually be included in biopython.
Important:
I noticed that if the alignment program exits for some reason and
does not close the alignment file with two lines like these
#---------------------------------------
#---------------------------------------
bad things can happen (e.g., sucking up all the system's memory).
Could it be that a similar issue applies to the FastaIO parser as well?
Best,
        Osvaldo




From bugzilla-daemon at portal.open-bio.org  Tue Dec  9 15:35:57 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 9 Dec 2008 10:35:57 -0500
Subject: [Biopython-dev] [Bug 2704] Parser for the markx10 alignment format
In-Reply-To: <bug-2704-42@http.bugzilla.open-bio.org/>
Message-ID: <200812091535.mB9FZvHG021117@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2704


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2008-12-09 10:35 EST -------
This sounds interesting Osvaldo,

Now that you've filed this bug, you should be able to upload the python file
(or a patch).

Given EMBOSS's markx10 output is intended to be like FASTA's -m 10 output (but
with the addition of EMBOSS style headers and footers), it *might* be nicer to
have one parser for both.  Right now I don't know how similar EMBOSS's output
really is.

If we do go for the simpler option of two separate parsers, it would certainly
be a good idea in the long run for them to share some code.

(In reply to comment #0)
> Important:
> I noticed that if the alignment program exits for some reason and
> does not close the alignment file with two lines like these
> #---------------------------------------
> #---------------------------------------
> bad things can happen (e.g., sucking all the memory of the system)). 
> Could it be that a similar issue applies to FastaIO parser as well?

Does this happen if you create such a file by hand (lacking these lines) and
try to read that?  If so it should be easier to debug.

Peter




From bugzilla-daemon at portal.open-bio.org  Tue Dec  9 15:43:19 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 9 Dec 2008 10:43:19 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main
	Biopython distribution
In-Reply-To: <bug-2671-42@http.bugzilla.open-bio.org/>
Message-ID: <200812091543.mB9FhJfV021598@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671


------- Comment #21 from lpritc at scri.sari.ac.uk  2008-12-09 10:43 EST -------
(In reply to comment #20)
> (In reply to comment #12)
> > 
> > Bio.Graphics.GenomeDiagram.Utilities
> > ====================================
> > This is a collection of utilities for getting information useful for graph
> > values.  From the docstring,
> > 
> >     o apply_to_window (sequence, window_size, function, step=None)  Apply a
> >                         passed function to fragments of the passed sequence of
> >                         size window_size, with each window separated by the
> >                         passed step.
> 
> This windowing function is rather specific to GenomeDiagram by the nature of
> how it returns the values and their positions.  The handling of the end of the
> sequence is also non-general.  Suppose we put apply_to_window somewhere under
> Bio.Graphics.GenomeDiagram.  It can then be used with any sequence analysis
> function which takes a sequence/string and returns a float, returning the
> scores and window positions as expected by GenomeDiagram for drawing graphical
> tracks.

That seems sensible, to me.  I like the generality that would result from it,
and it seems like apply_to_window could even be a useful convenience function
addition to Bio.SeqUtils in its own right.

[...]

> Because they differ from the existing Bio.SeqUtils code, I think there is a
> case for adding the four non-windowed functions from GenomeDiagram's
> Utilities.py under Bio.SeqUtils.  Perhaps under a sub module like
> Bio.SeqUtils.Nucleotides or Bio.SeqUtils.NucUtils?  The existing GC functions
> in Bio.SeqUtils could be deprecated or at least declared obsolete.

I think that there's value to be had in standardising to a floating-point 0..1
or -1..1 range for some of these kinds of functions, so I would support such a
move on those grounds.

Regarding my GC skew code (and the corresponding AT skew code): the
behaviour when there is no GC in the sequence is misleading (read: wrong ;) ).
Strictly, a divide-by-zero error would be correct here, but I just lazily went
for a zero value for ease of drawing, instead of doing something that properly
indicated 'not a number'.  I think that what needs to be done for GenomeDiagram
is to modify the graphing code so that it does something appropriate for NaNs
(however they may be indicated) - this should perhaps be to stop at the
preceding point, and resume at the subsequent point, for line graphs; not to
draw a box for the heat map; and not to draw a bar for the bar chart (not that
this will always be distinguishable from a zero value...).

The GenomeDiagram GC/AT skew code also needs to be modified to return None or
some other NaN indicator before its behaviour can be considered correct.

Apologies for propagating those shortcuts - my bad.




From bugzilla-daemon at portal.open-bio.org  Tue Dec  9 16:20:06 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 9 Dec 2008 11:20:06 -0500
Subject: [Biopython-dev] [Bug 2704] Parser for the markx10 alignment format
In-Reply-To: <bug-2704-42@http.bugzilla.open-bio.org/>
Message-ID: <200812091620.mB9GK6Si024603@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2704


------- Comment #2 from osvaldo.zagordi at bsse.ethz.ch  2008-12-09 11:20 EST -------
Created an attachment (id=1151)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1151&action=view)
Class Markx10Iterator for markx10 alignment format

Attached a simple example of using the code. Just running simple_test.py should
be enough.
If you remove the last two lines #------ from tmp_align.needle, the program
loops, sucking up more and more memory.




From bugzilla-daemon at portal.open-bio.org  Tue Dec  9 16:20:23 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 9 Dec 2008 11:20:23 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main
	Biopython distribution
In-Reply-To: <bug-2671-42@http.bugzilla.open-bio.org/>
Message-ID: <200812091620.mB9GKNCm024646@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671


------- Comment #22 from biopython-bugzilla at maubp.freeserve.co.uk  2008-12-09 11:20 EST -------
(In reply to comment #21)
> Regarding my GC skew code (and the corresponding AT skew code): that the
> behaviour when there is no GC in the sequence is misleading
> (read: wrong ;) ). 
> Strictly, a divide-by-zero error would be correct here, but I just lazily went
> for a zero value for ease of drawing, instead of doing something that properly
> indicated 'not a number'.  

Yeah - you're right.  Either we just allow the divide by zero to be raised, or
return a NaN, maybe via float("nan") unless there is a better way without
getting NumPy involved.

> I think that what needs to be done for GenomeDiagram
> is to modify the graphing code so that it does something appropriate for NaNs
> (however they may be indicated) - this should perhaps be to stop at the
> preceding point, and resume at the subsequent point, for line graphs; not to
> draw a box for the heat map; and not to draw a bar for the bar chart (not that
> this will always be distinguishable from a zero value...).

OK.  I can see why just using zero was a nice shortcut here.

> The GenomeDiagram GC/AT skew code also needs to be modified to return None or
> some other NaN indicator before its behaviour can be considered correct.

Or, if we accept that "sequence scoring functions" may raise a divide by zero
error, then apply_to_window should also be able to cope and map this to an
appropriate NaN indicator (e.g. None or float("nan")).
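
Something along these lines, as a rough sketch only. Here gc_skew uses the
usual (G - C) / (G + C) definition and simply lets the division fail, and
safe_score is a hypothetical wrapper that the windowing code could call for
each fragment.

```python
import math


def gc_skew(sequence):
    """(G - C) / (G + C); raises ZeroDivisionError if there is no G or C."""
    sequence = sequence.upper()
    g = sequence.count("G")
    c = sequence.count("C")
    return (g - c) / float(g + c)


def safe_score(function, fragment, missing=float("nan")):
    """Apply a scoring function, mapping a divide by zero to a missing value."""
    try:
        return function(fragment)
    except ZeroDivisionError:
        return missing


scores = [safe_score(gc_skew, frag) for frag in ("GGGC", "AATT", "GCGC")]
no_gc = safe_score(gc_skew, "AATT", missing=None)
```

The drawing code would then test for the missing value (math.isnan or an
"is None" check) before plotting a point.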

Peter




From bugzilla-daemon at portal.open-bio.org  Tue Dec  9 16:39:27 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 9 Dec 2008 11:39:27 -0500
Subject: [Biopython-dev] [Bug 2704] Parser for the markx10 alignment format
In-Reply-To: <bug-2704-42@http.bugzilla.open-bio.org/>
Message-ID: <200812091639.mB9GdRTJ026010@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2704


------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2008-12-09 11:39 EST -------
(In reply to comment #2)
> If you remove the last two lines #------ from tmp_align.needle the program
> loops sucking more and more memory

You have an infinite loop, try modifying the bit near line 162 as follows:

        #Now should have the aligned query sequence with flanking region...
        while not (line.startswith(">") or ">>>" in line) \
                and not line.startswith('#'):
            match_seq_parts.append(line.strip())
            line = handle.readline()
            if not line:
                #End of file
                return None
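
The reason the end of file check matters: readline() returns an empty string
at the end of the file, and an empty string matches none of the stop
conditions, so without the check the loop keeps appending forever. A
self-contained sketch of the same loop follows; read_block is a hypothetical
helper, and unlike the snippet above it returns the collected parts along with
the next line.

```python
from io import StringIO


def read_block(handle, line):
    """Collect lines until a '>' record, a '>>>' marker, a '#' footer, or EOF."""
    parts = []
    while not (line.startswith(">") or ">>>" in line) \
            and not line.startswith("#"):
        parts.append(line.strip())
        line = handle.readline()
        if not line:
            # readline() gives "" at end of file; stop rather than loop forever.
            return parts, None
    return parts, line


# A truncated file, with no closing #------ lines.
truncated = StringIO(u"ACGT\nTTGA\n")
parts, next_line = read_block(truncated, truncated.readline())

# A file which does end with a footer line.
complete = StringIO(u"ACGT\n#-----\n")
parts2, next_line2 = read_block(complete, complete.readline())
```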

Also, your code is based on an out of date version of Bio/AlignIO/FastaIO.py -
probably from Biopython 1.47, and lacks improvements which may also apply to
the EMBOSS output.  Given the object orientated nature of the current m10
parser, you/we should be able to subclass it and only override those bits
dealing with the header and footer.  This is probably the nicest way forward if
we decide to treat the EMBOSS markx10 format as a new format in Bio.AlignIO.




From bugzilla-daemon at portal.open-bio.org  Tue Dec  9 16:59:21 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 9 Dec 2008 11:59:21 -0500
Subject: [Biopython-dev] [Bug 2705] New: Nicer GC and AT content and skew
	functions
Message-ID: <bug-2705-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2705

           Summary: Nicer GC and AT content and skew functions
           Product: Biopython
           Version: Not Applicable
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk


This bug started out as a discussion on Bug 2671, based on some nucleotide
scoring functions in GenomeDiagram which were used for plotting sequence
properties along a sequence using a sliding window.  The basic underlying
functions could make a nice addition under Bio.SeqUtils (rather than hiding
them under Bio.Graphics.GenomeDiagram).

In particular, GenomeDiagram's Utilities.py included the following
(non-windowed) nucleotide composition functions:

calc_gc_content - returns a float in the range 0 to 1.
calc_at_content - returns a float in the range 0 to 1.
calc_gc_skew - returns a float [*]
calc_at_skew - returns a float [*]

[*] As discussed on Bug 2671, these currently give zero if there is no GC (or,
for the AT skew, no AT) content, which was a reasonable shortcut given these
functions were originally used for plotting only.  They should instead raise an
exception or return None or NaN.

Also, as implemented in GenomeDiagram, these functions do not cope with mixed
case sequences (easily rectified).  Also, for GC and AT content these do not
deal with ambiguous nucleotides (where we could follow the existing
Bio.SeqUtils convention).

Bio.SeqUtils already has several related functions including:

GC - returns a float (a percentage in the range 0 to 100)
GC123 - returns a tuple of four floats (percentages between 0 and 100)

GC_skew - returns a list of floats using a default window size of 100bp.  Gives
a floating point exception if there is no GC content in any window.

Personally I don't like the fact that the existing GC function returns a number
between 0 and 100 (rather than 0 and 1).  Leighton agreed.

I don't think the current GC_skew function is intuitive, and it doesn't cover
the non-windowed use-case where you want the GC_skew of the whole sequence
passed in.  This is important if you want to do your own windowing (e.g.
comparing GC skew of individual genes to the whole genome).

Because they differ from the existing Bio.SeqUtils code, I think there is a
case for adding the four non-windowed functions from GenomeDiagram's
Utilities.py under Bio.SeqUtils.  Each would take a single argument, a sequence
(coping with a string, Seq object or MutableSeq object).  I have no
particularly strong views on the naming of these functions.  Perhaps they could
be located under a sub module like Bio.SeqUtils.Nucleotides or
Bio.SeqUtils.NucUtils?  The existing GC functions in Bio.SeqUtils could be
deprecated or at least declared obsolete.

This would also be a good opportunity to explicitly specify what we expect to
get back for the GC content when there are ambiguous nucleotides.

e.g. Following Bio.SeqUtils.GC, only count C, G and S (which means C or G) (in
upper or lower case) and divide by the length, giving a lower bound.  Here
GC("ACGTN") is 40%.  An alternative approach might be to treat an N as 50% GC,
and H (which is A, C or T) as 33.3% GC etc, meaning GC("ACGTN") gives 50%.

The same approach should be used for the AT percentage, for example the current
lower bound approach would count only A, T and W characters (in either case).
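
The two conventions can be compared with a short sketch. The function names
are hypothetical, the results are returned as fractions (0 to 1) rather than
percentages, and the weights follow from the standard IUPAC ambiguity codes
(for instance H is A, C or T, so one third of it counts as GC).

```python
# Fraction of each IUPAC nucleotide code which is G or C.
GC_WEIGHT = {
    "A": 0.0, "C": 1.0, "G": 1.0, "T": 0.0, "U": 0.0,
    "S": 1.0, "W": 0.0,
    "R": 0.5, "Y": 0.5, "K": 0.5, "M": 0.5,
    "B": 2.0 / 3, "V": 2.0 / 3, "D": 1.0 / 3, "H": 1.0 / 3,
    "N": 0.5,
}


def gc_lower_bound(sequence):
    """Count only G, C and S (in either case), divided by the length."""
    seq = str(sequence).upper()
    return sum(seq.count(letter) for letter in "GCS") / float(len(seq))


def gc_weighted(sequence):
    """Weight each ambiguous letter by the fraction of it that is G or C."""
    seq = str(sequence).upper()
    return sum(GC_WEIGHT.get(letter, 0.0) for letter in seq) / float(len(seq))


lower = gc_lower_bound("ACGTN")
weighted = gc_weighted("acgtn")
```

Using str() means a plain string or a Seq-like object should both work; an
empty sequence raises ZeroDivisionError in this sketch.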




From bugzilla-daemon at portal.open-bio.org  Tue Dec  9 17:04:15 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 9 Dec 2008 12:04:15 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main
	Biopython distribution
In-Reply-To: <bug-2671-42@http.bugzilla.open-bio.org/>
Message-ID: <200812091704.mB9H4F9C028063@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671


------- Comment #23 from biopython-bugzilla at maubp.freeserve.co.uk  2008-12-09 12:04 EST -------
I've filed Bug 2705 about adding these nucleotide sequence functions somewhere
under Bio.SeqUtils - this should get more people reading it.  Because this bug
(Bug 2671) hasn't been assigned to the dev mailing list, I doubt many people
are aware of it.

For Bio.Graphics.GenomeDiagram we need to ensure the graphics tracks can cope
with NaN/None missing values as outlined by Leighton in comment 21.




From bugzilla-daemon at portal.open-bio.org  Tue Dec  9 17:53:44 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 9 Dec 2008 12:53:44 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main
	Biopython distribution
In-Reply-To: <bug-2671-42@http.bugzilla.open-bio.org/>
Message-ID: <200812091753.mB9Hri42031692@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1133 is|0                           |1
           obsolete|                            |


------- Comment #24 from biopython-bugzilla at maubp.freeserve.co.uk  2008-12-09 12:53 EST -------
(From update of attachment 1133)
I've checked something like this into CVS.




From bugzilla-daemon at portal.open-bio.org  Wed Dec 10 16:46:35 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 10 Dec 2008 11:46:35 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main
	Biopython distribution
In-Reply-To: <bug-2671-42@http.bugzilla.open-bio.org/>
Message-ID: <200812101646.mBAGkZs1003825@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  BugsThisDependsOn|                            |2705


------- Comment #25 from biopython-bugzilla at maubp.freeserve.co.uk  2008-12-10 11:46 EST -------
OK, GenomeDiagram is now in CVS, with some basic tests.  Still to do:

* Updating the existing GenomeDiagram manual to match (different imports,
colour to color), which I think can stay as a separate PDF file.

* A short introduction to Bio.Graphics including GenomeDiagram as part of a new
chapter in the tutorial?

* Dealing with Bug 2705 (for the AT and GC content and skew) and the window
function to help plot these in GenomeDiagram.




From bugzilla-daemon at portal.open-bio.org  Wed Dec 10 16:46:38 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 10 Dec 2008 11:46:38 -0500
Subject: [Biopython-dev] [Bug 2705] Nicer GC and AT content and skew
	functions
In-Reply-To: <bug-2705-42@http.bugzilla.open-bio.org/>
Message-ID: <200812101646.mBAGkcGB003850@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2705


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
OtherBugsDependingO|                            |2671
              nThis|                            |


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Wed Dec 10 17:16:37 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 10 Dec 2008 12:16:37 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main
	Biopython distribution
In-Reply-To: <bug-2671-42@http.bugzilla.open-bio.org/>
Message-ID: <200812101716.mBAHGbGG006815@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671


------- Comment #26 from biopython-bugzilla at maubp.freeserve.co.uk  2008-12-10 12:16 EST -------
We already talked about "colour" vs "color" (UK vs USA), but I've just noticed
the use of "centre" vs "center" where again I would prefer we follow computer
language norms and take the USA spelling.

Also, I'm not sure that the existing colour/color dual support works 100% of
the time.  I had an old script using colour where the feature colours specified
ended up being the default of light green.  Using "color" instead of "colour"
in my script worked.  I'll try and investigate this later.




From bugzilla-daemon at portal.open-bio.org  Wed Dec 10 17:55:31 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 10 Dec 2008 12:55:31 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main
	Biopython distribution
In-Reply-To: <bug-2671-42@http.bugzilla.open-bio.org/>
Message-ID: <200812101755.mBAHtVJ7009870@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671


------- Comment #27 from biopython-bugzilla at maubp.freeserve.co.uk  2008-12-10 12:55 EST -------
This might be better off as a new enhancement bug, but here is a possible
"arc-box" drawing function to go in the AbstractDrawer.py file, based on the
existing draw_box function.

from math import pi

from reportlab.lib import colors
from reportlab.graphics.shapes import ArcPath

def draw_arcbox(xcentre, ycentre, inner_radius, outer_radius,
                startangle, endangle,
                colour=colors.lightgreen, border=None, color=None):
    """Returns a closed path object describing an arced box.

    Expects the angles to be in radians."""
    if color is None:                  # Accept either spelling of the argument
        color = colour
    if border is not None:
        strokecolor = border
    elif color == colors.white:        # Force black border on white boxes
        strokecolor = colors.black     # with undefined border,
    else:
        strokecolor = color            # else use fill colour

    # The angles are converted from radians to degrees for addArc.
    p = ArcPath(strokeColor=strokecolor,
                fillColor=color,
                strokewidth=0)
    p.addArc(xcentre, ycentre, outer_radius,
             startangle * 180 / pi, endangle * 180 / pi,
             moveTo=True)
    p.addArc(xcentre, ycentre, inner_radius,
             startangle * 180 / pi, endangle * 180 / pi,
             reverse=True)
    p.closePath()
    return p

This takes advantage of reportlab's built-in arc approximation code, meaning we
can simplify the CircularDrawer.py method to just something like this:

    def draw_arc(self, inner_radius, outer_radius,
                 startangle, endangle,
                 color, border=None, colour=None):
        #Docstring here
        return draw_arcbox(self.xcentre, self.ycentre,
                           inner_radius, outer_radius,
                           startangle, endangle,
                           colour, border, color)

Alternately, the code could just go in CircularDrawer.py directly.

As far as I can tell from looking at their source code, even ReportLab_1_21_2
has ArcPath defined in reportlab.graphics.shapes so there shouldn't be any
issue here with backwards compatibility.




From bugzilla-daemon at portal.open-bio.org  Thu Dec 11 08:40:23 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 11 Dec 2008 03:40:23 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main
	Biopython distribution
In-Reply-To: <bug-2671-42@http.bugzilla.open-bio.org/>
Message-ID: <200812110840.mBB8eNFs006984@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671


------- Comment #28 from lpritc at scri.sari.ac.uk  2008-12-11 03:40 EST -------
(In reply to comment #26)
> We already talked about "colour" vs "color" (UK vs USA), but I've just noticed
> the use of "centre" vs "center" where again I would prefer we follow computer
> language norms and take the USA spelling.
> 
> Also, I'm not sure that the existing colour/color dual support works 100% of
> the time.  I had an old script using colour where the feature colours specified
> ended up being the default of light green.  Using "color" instead of "colour"
> in my script worked.  I'll try and investigate this later.

Is this related to my fix in comment #9?




From bugzilla-daemon at portal.open-bio.org  Thu Dec 11 11:50:17 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 11 Dec 2008 06:50:17 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main
	Biopython distribution
In-Reply-To: <bug-2671-42@http.bugzilla.open-bio.org/>
Message-ID: <200812111150.mBBBoHej030149@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671


------- Comment #29 from biopython-bugzilla at maubp.freeserve.co.uk  2008-12-11 06:50 EST -------
(In reply to comment #28)
> (In reply to comment #26)
> > Also, I'm not sure that the existing colour/color dual support works 100%
> > of the time.  I had an old script using colour where the feature colours
> > specified ended up being the default of light green.  Using "color"
> > instead of "colour" in my script worked.  I'll try and investigate this
> > later.
> 
> Is this related to my fix in comment #9?

Possibly - although I was already using that version of AbstractDrawer.py

I've updated CVS to make it clear in the comments that "colour" arguments
override "color" arguments (this is required for backwards compatibility with
old scripts which would be using "colour").  I also had to fix the FeatureSet's
add_feature method to handle the colour/color mapping (this was the root of the
problem I had observed in comment 26).

I propose that in Biopython 1.50 we support both "colour" and "color", but for
Biopython 1.51 we add deprecation warnings when "colour" is used.

We should probably do the same thing for "centre" and "center" as well...
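
The transition proposed above could be sketched roughly as follows. This is
only an illustration: resolve_color is a hypothetical helper rather than a
function in GenomeDiagram, and the "lightgreen" default simply mirrors the
light green default mentioned earlier in this bug.

```python
import warnings


def resolve_color(color=None, colour=None, default="lightgreen"):
    """Pick the fill colour, letting the legacy 'colour' spelling override."""
    if colour is not None:
        # Old scripts use "colour", so it wins for backwards compatibility,
        # but the caller is told to move to the US spelling.
        warnings.warn("The 'colour' argument is deprecated; use 'color'.",
                      DeprecationWarning, stacklevel=2)
        return colour
    if color is not None:
        return color
    return default
```

Under this sketch the warning would simply be left out for the release that
supports both spellings silently, and added in the following one. The same
pattern would apply to "centre" and "center".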




From bugzilla-daemon at portal.open-bio.org  Thu Dec 11 11:52:41 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 11 Dec 2008 06:52:41 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main
	Biopython distribution
In-Reply-To: <bug-2671-42@http.bugzilla.open-bio.org/>
Message-ID: <200812111152.mBBBqfTQ030413@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671


------- Comment #30 from lpritc at scri.sari.ac.uk  2008-12-11 06:52 EST -------
(In reply to comment #29)
> 
> I propose that in Biopython 1.50 we support both "colour" and "color", but for
> Biopython 1.51 we add deprecation warnings when "colour" is used.
> 
> We should probably do the same thing for "centre" and "center" as well...
> 

I agree.  We should encourage use of the US spelling in the documentation, to
catch those new to GD. This approach provides a window for conversion of old GD
scripts for previous users, which is a good thing.




From bugzilla-daemon at portal.open-bio.org  Fri Dec 12 16:09:27 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 12 Dec 2008 11:09:27 -0500
Subject: [Biopython-dev] [Bug 2709] New: test_GenomeDiagram fails under Linux
Message-ID: <bug-2709-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2709

           Summary: test_GenomeDiagram fails under Linux
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: minor
          Priority: P4
         Component: Unit Tests
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: bsouthey at gmail.com


On my 64-bit Linux system test_GenomeDiagram fails, but the other related
tests 'pass' (they are skipped) because reportlab is not available:

test_GenomeDiagram ... ERROR
test_GraphicsChromosome ... skipping. Install reportlab if you want to use Bio.Graphics.
ok
test_GraphicsDistribution ... skipping. Install reportlab if you want to use Bio.Graphics.
ok
test_GraphicsGeneral ... skipping. Install reportlab if you want to use Bio.Graphics.
ok


======================================================================
ERROR: test_GenomeDiagram                                             
----------------------------------------------------------------------
Traceback (most recent call last):                                    
  File "run_tests.py", line 125, in runTest                           
    self.runSafeTest()                                                
  File "run_tests.py", line 138, in runSafeTest                       
    cur_test = __import__(self.test_name)                             
  File "test_GenomeDiagram.py", line 21, in <module>                  
    raise MissingExternalDependencyError(\                            
NameError: name 'MissingExternalDependencyError' is not defined       

----------------------------------------------------------------------




From bugzilla-daemon at portal.open-bio.org  Fri Dec 12 16:25:59 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 12 Dec 2008 11:25:59 -0500
Subject: [Biopython-dev] [Bug 2709] test_GenomeDiagram fails under Linux
In-Reply-To: <bug-2709-42@http.bugzilla.open-bio.org/>
Message-ID: <200812121625.mBCGPxeQ031269@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2709


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2008-12-12 11:25 EST -------
It was trying to raise MissingExternalDependencyError when reportlab was
missing (which would have skipped the test), but MissingExternalDependencyError
hadn't been imported.

Fixed in test_GenomeDiagram.py CVS revision 1.10

Thanks for reporting this.




From bugzilla-daemon at portal.open-bio.org  Fri Dec 12 16:49:51 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 12 Dec 2008 11:49:51 -0500
Subject: [Biopython-dev] [Bug 2710] New: GenomeDiagram.py unnecessary
	requires the reportlab addon renderPM
Message-ID: <bug-2710-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2710

           Summary: GenomeDiagram.py unnecessary requires the reportlab
                    addon renderPM
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: bsouthey at gmail.com


test_GenomeDiagram fails because the renderPM module is not part of the
standard install of reportlab, at least under Linux.

I do not think the renderPM module should be required, so
Graphics/GenomeDiagram/Diagram.py needs to be rewritten to avoid using
renderPM when it is not available.

The installation documentation needs to mention that renderPM is needed
for JPG, BMP, GIF, PNG or TIFF output.

There must also be a test for the presence of the renderPM module.


test_GenomeDiagram ... ERROR
test_GraphicsChromosome ... ok
test_GraphicsDistribution ... ok
test_GraphicsGeneral ... ok

======================================================================
ERROR: test_GenomeDiagram
----------------------------------------------------------------------
Traceback (most recent call last):
  File "run_tests.py", line 125, in runTest
    self.runSafeTest()
  File "run_tests.py", line 138, in runSafeTest
    cur_test = __import__(self.test_name)
  File "test_GenomeDiagram.py", line 30, in <module>
    from Bio.Graphics.GenomeDiagram.FeatureSet import FeatureSet
  File "/home/bsouthey/python/biopython_cvs/biopython/build/lib.linux-x86_64-2.5/Bio/Graphics/GenomeDiagram/__init__.py", line 13, in <module>
    from Bio.Graphics.GenomeDiagram.Diagram import Diagram
  File "/home/bsouthey/python/biopython_cvs/biopython/build/lib.linux-x86_64-2.5/Bio/Graphics/GenomeDiagram/Diagram.py", line 32, in <module>
    from reportlab.graphics import renderPS, renderPDF, renderSVG, renderPM
  File "/usr/lib/python2.5/site-packages/reportlab/graphics/renderPM.py", line 28, in <module>
    "see http://www.reportlab.org/rl_addons.html")
ImportError: No module named _renderPM
see http://www.reportlab.org/rl_addons.html

----------------------------------------------------------------------




From bugzilla-daemon at portal.open-bio.org  Fri Dec 12 17:43:49 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 12 Dec 2008 12:43:49 -0500
Subject: [Biopython-dev] [Bug 2711] New: GenomeDiagram.py: write() and
	write_to_string() are inefficient and don't check inputs
Message-ID: <bug-2711-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2711

           Summary: GenomeDiagram.py: write() and write_to_string() are
                    inefficient and don't check inputs
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: bsouthey at gmail.com


While looking at GenomeDiagram.py I noticed some things that should be fixed. I
note that some of this stems from reportlab. In particular, reportlab
doesn't appear to have a generic interface for different image formats.

1) Why are there two functions to output a diagram rather than just one generic
function? In particular, why not make the filename optional? Yes, I know that
reportlab uses different functions, but this just duplicates code. So this is
more a comment than anything else.

2) I find the functions write() and write_to_string() just plain ugly.
A local dictionary of modules is defined every time these functions are called,
but only one key is needed per call, so you then go back to look up the input
that you already knew. A nested list would be better and would allow catching
invalid inputs (see next point).

3) Neither write() nor write_to_string() checks that the output option is valid.
These functions do not accept lowercase, so output='ps' will fail with a
KeyError, as will any invalid key.

4) I do not know the policy on module imports, but this line is only required
for write() and write_to_string():
from reportlab.graphics import renderPS, renderPDF, renderSVG, renderPM
Also, renderPM is an addon.
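
Points 2 and 3 could be addressed along the following lines. This is a sketch
under assumptions, not the attached patch: the names _RENDERERS,
normalize_format and renderer_name are hypothetical, and the table lists only
the formats and renderer modules named in this thread.

```python
# Built once at import time instead of on every call. Maps each output
# format to the name of the reportlab.graphics module assumed to render it.
_RENDERERS = {
    "PS": "renderPS",
    "PDF": "renderPDF",
    "SVG": "renderSVG",
    "JPG": "renderPM",
    "BMP": "renderPM",
    "GIF": "renderPM",
    "PNG": "renderPM",
    "TIFF": "renderPM",
}


def normalize_format(output):
    """Return the upper-case format name, or raise ValueError if unknown."""
    key = str(output).upper()
    if key not in _RENDERERS:
        raise ValueError("Unsupported output format %r, expected one of: %s"
                         % (output, ", ".join(sorted(_RENDERERS))))
    return key


def renderer_name(output):
    """Return the name of the renderer module for the requested format."""
    return _RENDERERS[normalize_format(output)]
```

With this, output='ps' and output='PS' are treated alike, and an unknown format
gives a ValueError listing the accepted values rather than a bare KeyError.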




From bugzilla-daemon at portal.open-bio.org  Fri Dec 12 17:46:53 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 12 Dec 2008 12:46:53 -0500
Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and
	write_to_string() are inefficient and don't check inputs
In-Reply-To: <bug-2711-42@http.bugzilla.open-bio.org/>
Message-ID: <200812121746.mBCHkrPi005835@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2711


------- Comment #1 from bsouthey at gmail.com  2008-12-12 12:46 EST -------
Created an attachment (id=1156)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1156&action=view)
Fix various issues with GenomeDiagram/Diagram.py

Contains a couple of fixes including bug 2710.




From bugzilla-daemon at portal.open-bio.org  Fri Dec 12 17:54:21 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 12 Dec 2008 12:54:21 -0500
Subject: [Biopython-dev] [Bug 2710] GenomeDiagram.py unnecessary requires
	the reportlab addon renderPM
In-Reply-To: <bug-2710-42@http.bugzilla.open-bio.org/>
Message-ID: <200812121754.mBCHsL4q006303@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2710


bsouthey at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |DUPLICATE


------- Comment #1 from bsouthey at gmail.com  2008-12-12 12:54 EST -------
The reason for this bug report was the import of renderPM, but a closer look at
the code shows a bigger issue with the write() and write_to_string() functions
of Diagram.py. I am marking this as a duplicate because correctly fixing bug
2711 (see the patch on Bug 2711) will also fix this one.

*** This bug has been marked as a duplicate of bug 2711 ***




From bugzilla-daemon at portal.open-bio.org  Fri Dec 12 17:54:34 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 12 Dec 2008 12:54:34 -0500
Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and
	write_to_string() are inefficient and don't check inputs
In-Reply-To: <bug-2711-42@http.bugzilla.open-bio.org/>
Message-ID: <200812121754.mBCHsYgN006312@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2711


------- Comment #2 from bsouthey at gmail.com  2008-12-12 12:54 EST -------
*** Bug 2710 has been marked as a duplicate of this bug. ***




From bugzilla-daemon at portal.open-bio.org  Fri Dec 12 18:25:25 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 12 Dec 2008 13:25:25 -0500
Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and
	write_to_string() are inefficient and don't check inputs
In-Reply-To: <bug-2711-42@http.bugzilla.open-bio.org/>
Message-ID: <200812121825.mBCIPPZq008484@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2711


------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2008-12-12 13:25 EST -------
I agree something needs to be done about this issue (in particular the bit
originally covered by Bug 2710).

Moving the imports into these function(s) would be another way to let us deal
with the missing renderPM module if and when it is used (either leave the
ImportError, or raise a missing external dependency error).
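
A deferred import of this kind might look roughly like the sketch below. The
helper load_renderer and the MissingDependencyError class are stand-ins
invented for illustration. In Biopython itself the existing
MissingExternalDependencyError would presumably be raised instead.

```python
import importlib


class MissingDependencyError(ImportError):
    """Stand-in for a 'missing external dependency' error."""


def load_renderer(module_name, package="reportlab.graphics"):
    """Import package.module_name only when a caller actually needs it."""
    full_name = "%s.%s" % (package, module_name)
    try:
        return importlib.import_module(full_name)
    except ImportError as err:
        # Re-raise with a message naming the module the user must install.
        raise MissingDependencyError(
            "Could not import %s (%s)" % (full_name, err)) from err
```

Because nothing is imported until load_renderer is called, a missing renderPM
would only matter to users who request one of the bitmap formats.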

As an aside, I'd like write_to_string() to support a DPI argument like write()
does.




From bugzilla-daemon at portal.open-bio.org  Fri Dec 12 19:23:06 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 12 Dec 2008 14:23:06 -0500
Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and
	write_to_string() are inefficient and don't check inputs
In-Reply-To: <bug-2711-42@http.bugzilla.open-bio.org/>
Message-ID: <200812121923.mBCJN64B013046@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2711


bsouthey at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1156 is|0                           |1
           obsolete|                            |


------- Comment #4 from bsouthey at gmail.com  2008-12-12 14:23 EST -------
Created an attachment (id=1157)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1157&action=view)
Corrected patch

I blindly copied and pasted without correcting it. Also, added 'dpi' to
write_to_string().




From bugzilla-daemon at portal.open-bio.org  Fri Dec 12 19:29:37 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 12 Dec 2008 14:29:37 -0500
Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and
	write_to_string() are inefficient and don't check inputs
In-Reply-To: <bug-2711-42@http.bugzilla.open-bio.org/>
Message-ID: <200812121929.mBCJTbtl013858@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2711


------- Comment #5 from bsouthey at gmail.com  2008-12-12 14:29 EST -------
(In reply to comment #3)
> 
> As an aside, I'd like write_to_string() to support a DPI argument like write()
> does.
> 

I added this to the patch as it was trivial. I would also think that exposing
the other options (bg, configPIL, showBoundary) could be useful. But I do not
know how these influence the GenomeDiagram.




From biopython at maubp.freeserve.co.uk  Sat Dec 13 18:20:10 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sat, 13 Dec 2008 18:20:10 +0000
Subject: [Biopython-dev] [Utilities-announce] PubMed Entrez Utility 2009
	DTD changes
In-Reply-To: <320fb6e00812031310s43124c68n988838af3837638d@mail.gmail.com>
References: <7B6F170840CA6C4DA63EE0C8A7BB43EC03A0001F@NIHCESMLBX15.nih.gov>
	<320fb6e00812031310s43124c68n988838af3837638d@mail.gmail.com>
Message-ID: <320fb6e00812131020r4a2a02dtcc7d65e8cf495052@mail.gmail.com>

On Wed, Dec 3, 2008 at 9:10 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> This email from the NCBI will be of interest for Bio.Entrez - we may
> need to add a few DTD files to Bio.Entrez in preparation for this...
> see also Bug 2678.

I've just added the following five DTD files to CVS,

nlmcommon_090101.dtd
nlmmedline_090101.dtd
nlmmedlinecitation_090101.dtd
nlmsharedcatcit_090101.dtd
pubmed_090101.dtd

All from http://www.ncbi.nlm.nih.gov/entrez/query/DTD/

Peter


From bugzilla-daemon at portal.open-bio.org  Sat Dec 13 20:19:15 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 13 Dec 2008 15:19:15 -0500
Subject: [Biopython-dev] [Bug 2678] Bio.Entrez module does not always
	retrieve or find DTD files
In-Reply-To: <bug-2678-42@http.bugzilla.open-bio.org/>
Message-ID: <200812132019.mBDKJFkD005703@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2678


------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk  2008-12-13 15:19 EST -------
(In reply to comment #6)
> If the DTD is available locally in Bio/Entrez/DTDs, then Bio.Entrez will read
> it from there. If not, it tries to download it. This may fail if the servers
> are busy. If the needed DTDs are saved in Bio/Entrez/DTDs (and installed when
> Biopython is installed), you won't run into this problem.

I was just looking at this on my Windows XP Python 2.3 machine, and when it
tried to download missing DTD files it was just using a filename as the URL.
I've committed a fix to CVS which should resolve this:

biopython/Bio/Entrez/Parser.py revision 1.3

I'll double check this on Linux/Mac next week.
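
For illustration, the lookup order described in comment #6 together with the
URL problem above could be sketched as follows. This is not the code in
Parser.py: locate_dtd is a hypothetical helper, and using the NCBI DTD
directory as the base URL is an assumption.

```python
import os
import posixpath
from urllib.parse import urljoin, urlparse

# Assumed base location for NCBI DTD files.
NCBI_DTD_BASE = "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/"


def locate_dtd(system_id, local_dir, base_url=NCBI_DTD_BASE):
    """Return ("local", path) if the DTD is cached, else ("remote", url)."""
    filename = posixpath.basename(urlparse(system_id).path)
    local_path = os.path.join(local_dir, filename)
    if os.path.isfile(local_path):
        return "local", local_path
    # urljoin leaves absolute URLs alone and resolves a bare filename
    # against base_url, so a filename is never used as the URL by itself.
    return "remote", urljoin(base_url, system_id)
```

The caller would then open the local file or attempt the download, with the
download still liable to fail if the server is busy.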

This may be related to Leighton's problem - although 'xhtml1-strict.dtd' and
'xhtml-lat1.ent' are not NCBI DTD files, but rather part of the W3C XHTML 1.0
specification.

Note that if I delete all the Bio/Entrez/DTDs/* files, then test_Entrez.py
fails.  I get warning messages about downloading missing DTD files, and the
following failures:

======================================================================
ERROR: Test parsing pubmed links returned by ELink (fifth test)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_Entrez.py", line 2523, in t_pubmed5
    record = Entrez.read(input)
  File "c:\python23\Lib\site-packages\Bio\Entrez\__init__.py", line 286, in read
    record = handler.run(handle)
  File "c:\python23\Lib\site-packages\Bio\Entrez\Parser.py", line 95, in run
    self.parser.ParseFile(handle)
  File "c:\python23\Lib\site-packages\Bio\Entrez\Parser.py", line 131, in startElement
    if object!="":
UnboundLocalError: local variable 'object' referenced before assignment

======================================================================
ERROR: Test parsing XML returned by EFetch, PubMed database (first test)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_Entrez.py", line 3058, in t_pubmed1
    record = Entrez.read(input)
  File "c:\python23\Lib\site-packages\Bio\Entrez\__init__.py", line 286, in read
    record = handler.run(handle)
  File "c:\python23\Lib\site-packages\Bio\Entrez\Parser.py", line 95, in run
    self.parser.ParseFile(handle)
  File "c:\python23\Lib\site-packages\Bio\Entrez\Parser.py", line 294, in external_entity_ref_handler
    parser.ParseFile(handle)
  File "c:\python23\Lib\site-packages\Bio\Entrez\Parser.py", line 294, in external_entity_ref_handler
    parser.ParseFile(handle)
ExpatError: syntax error: line 1, column 0

======================================================================
ERROR: Test parsing XML returned by EFetch, PubMed database (second test)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_Entrez.py", line 3261, in t_pubmed2
    record = Entrez.read(input)
  File "c:\python23\Lib\site-packages\Bio\Entrez\__init__.py", line 286, in read
    record = handler.run(handle)
  File "c:\python23\Lib\site-packages\Bio\Entrez\Parser.py", line 95, in run
    self.parser.ParseFile(handle)
  File "c:\python23\Lib\site-packages\Bio\Entrez\Parser.py", line 294, in external_entity_ref_handler
    parser.ParseFile(handle)
  File "c:\python23\Lib\site-packages\Bio\Entrez\Parser.py", line 294, in external_entity_ref_handler
    parser.ParseFile(handle)
ExpatError: syntax error: line 1, column 0

======================================================================
FAIL: Test parsing pubmed links returned by ELink (sixth test)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_Entrez.py", line 2697, in t_pubmed6
    assert len(record[0]["IdCheckList"])==2
AssertionError

----------------------------------------------------------------------

(The rest of the Entrez tests pass even with the missing DTDs - they are now
successfully downloaded on demand)




From bugzilla-daemon at portal.open-bio.org  Sat Dec 13 23:56:02 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 13 Dec 2008 18:56:02 -0500
Subject: [Biopython-dev] [Bug 2649] Bio.KDTree expects numpy array with
	dtype="float32" on 64 bit machines.
In-Reply-To: <bug-2649-42@http.bugzilla.open-bio.org/>
Message-ID: <200812132356.mBDNu2HE017869@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2649


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2008-12-13 18:56 EST -------
Hi Paul,

I'd like to close this bug now as we think it has been solved.  Michiel's
update was included with Biopython 1.49, so you don't need to mess about with
CVS to check and confirm this now.

Thanks,

Peter




From bugzilla-daemon at portal.open-bio.org  Sun Dec 14 00:12:00 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 13 Dec 2008 19:12:00 -0500
Subject: [Biopython-dev] [Bug 2681] BioSQL: record annotations enhancements
In-Reply-To: <bug-2681-42@http.bugzilla.open-bio.org/>
Message-ID: <200812140012.mBE0C0Yo018673@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2681


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |biopython-
                   |                            |bugzilla at maubp.freeserve.co.
                   |                            |uk


------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk  2008-12-13 19:11 EST -------
(In reply to comment #4)
> (In reply to comment #2)
> > (In reply to comment #0)
> > > 1) Fixed date/dates typo.
> > 
> > Why is it a typo?  Change not checked in.
> 
> The function _load_bioentry_date in Loader.py inserts the annotation 'date',
> if present, or the current date if not, into the bioentry_qualifier_value
> table. This is pulled by BioSeq.py _retrieve_qualifier_value and stored as
> the attribute 'dates'. Hence I considered line 307 in BioSeq.py to be a typo,
> which should be 'date' and not 'dates'.

OK, that does make sense.  However...

> Also, because Loader.py handles dates separately, they should not be
> handled by the function load_annotations.

That would make sense if we make the above "dates"/"date" change.

If we tested a record with a "date" annotation, I guess currently it would get
recorded twice - once under "date_changed" by _load_bioentry_date (retrieved as
"dates") and again but under "date" by _load_annotations (retrieved as "date").

Right now, I'm wondering why _load_bioentry_date exists in the first place ...
perhaps this special annotation entry "date_changed" is to mimic BioPerl?


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Sun Dec 14 00:59:14 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 13 Dec 2008 19:59:14 -0500
Subject: [Biopython-dev] [Bug 2697] MaxEntropy calculate function assumes
	integer values for class and convergence criteria is hard coded
In-Reply-To: <bug-2697-42@http.bugzilla.open-bio.org/>
Message-ID: <200812140059.mBE0xE0g021156@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2697


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2008-12-13 19:59 EST -------
(In reply to comment #0)
> Also, the convergence criteria is hard coded into the file by the following
> global definitions:
> MAX_IIS_ITERATIONS = 10000    # Maximum iterations for IIS.
> IIS_CONVERGE = 1E-5           # Convergence criteria for IIS.
> MAX_NEWTON_ITERATIONS = 100   # Maximum iterations on Newton's method.
> NEWTON_CONVERGE = 1E-10       # Convergence criteria for Newton's method.
> 
> This makes it impossible for the user to specify their own values without
> changing the actual function.

No, you can change them in your own code - they are just module level variables.
For example:

from Bio import MaxEntropy
# Check the current limit,
print(MaxEntropy.MAX_NEWTON_ITERATIONS)
# Increase the iteration limit,
MaxEntropy.MAX_NEWTON_ITERATIONS = 1000

One might argue these should be *optional* arguments to the functions. 
However, your suggested change adds new *required* arguments, which is not a
backwards compatible API change.

Peter


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Sun Dec 14 02:20:37 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 13 Dec 2008 21:20:37 -0500
Subject: [Biopython-dev] [Bug 2697] MaxEntropy calculate function assumes
	integer values for class and convergence criteria is hard coded
In-Reply-To: <bug-2697-42@http.bugzilla.open-bio.org/>
Message-ID: <200812140220.mBE2KbM1026093@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2697


------- Comment #3 from bsouthey at gmail.com  2008-12-13 21:20 EST -------
(In reply to comment #2)
> (In reply to comment #0)
> > Also, the convergence criteria is hard coded into the file by the following
> > global definitions:
> > MAX_IIS_ITERATIONS = 10000    # Maximum iterations for IIS.
> > IIS_CONVERGE = 1E-5           # Convergence criteria for IIS.
> > MAX_NEWTON_ITERATIONS = 100   # Maximum iterations on Newton's method.
> > NEWTON_CONVERGE = 1E-10       # Convergence criteria for Newton's method.
> > 
> > This makes it impossible for the user to specify their own values without
> > changing the actual function.
> 
> No, you can change them in your own code - they are just module level variables.
> For example:
> 
> from Bio import MaxEntropy
> #Check the current limit,
> print MaxEntropy.MAX_NEWTON_ITERATIONS
> #Increase the iteration limit,
> MaxEntropy.MAX_NEWTON_ITERATIONS = 1000
> 
> One might argue these should be *optional* arguments to the functions. 
> However, your suggested change adds new *required* arguments, which is not a
> backwards compatible API change.
> 
> Peter
> 

I strongly disagree on this because a user should not have to read the module
source code to find these module level global variables and what values these
actually are. But this is not my code.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Sun Dec 14 04:27:16 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 13 Dec 2008 23:27:16 -0500
Subject: [Biopython-dev] [Bug 2697] MaxEntropy calculate function assumes
	integer values for class and convergence criteria is hard coded
In-Reply-To: <bug-2697-42@http.bugzilla.open-bio.org/>
Message-ID: <200812140427.mBE4RGIE001073@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2697


------- Comment #4 from mdehoon at ims.u-tokyo.ac.jp  2008-12-13 23:27 EST -------
(In reply to comment #3)
> I strongly disagree on this because a user should not have to read the module
> source code to find these module level global variables and what values these
> actually are. But this is not my code.
> 
I agree with Bruce that these variables should be arguments to the function,
rather than module-level global variables. To keep the API backwards
compatible, we can specify the current values for these variables as default
values for these arguments. This will also make it easier for users that are
not particularly interested in these variables.

If you submit a revised patch, please do not just comment out unneeded code; it
is better to actually remove code that is no longer needed.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Sun Dec 14 13:17:47 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 14 Dec 2008 08:17:47 -0500
Subject: [Biopython-dev] [Bug 2697] MaxEntropy calculate function assumes
	integer values for class and convergence criteria is hard coded
In-Reply-To: <bug-2697-42@http.bugzilla.open-bio.org/>
Message-ID: <200812141317.mBEDHla7021974@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2697


------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk  2008-12-14 08:17 EST -------
(In reply to comment #3)
> (In reply to comment #2)
> > No, you can change them in your own code - they are just module level
> > variables
> > ...
> > One might argue these should be *optional* arguments to the functions. 
> > However, your suggested change adds new *required* arguments, which is not a
> > backwards compatible API change.

Sorry - you *did* use optional arguments for the train function. I was
distracted by the private functions where the new arguments are required.

> I strongly disagree on this because a user should not have to read the module
> source code to find these module level global variables and what values these
> actually are. But this is not my code.

I'm not saying the current state of the code is elegant - just correcting your
factual error that the end user couldn't change these parameters.  They can.

(In reply to comment #4)
> I agree with Bruce that these variables should be arguments to the function,
> rather than module-level global variables. To keep the API backwards
> compatible, we can specify the current values for these variables as default
> values for these arguments. This will also make it easier for users that are
> not particularly interested in these variables.

This is what I was implying, although less clearly.

To be even more explicit, if we want to add these variables as arguments to the
functions then they should default to the existing upper case module level
variables.  We shouldn't remove or rename the module level variables in case
anyone was using them in the way I illustrated in comment 2.

e.g.
def train(training_set, results, feature_fns, update_fn=None):

becomes something like this:

def train(training_set, results, feature_fns, update_fn=None,
          max_iis_iterations = MAX_IIS_ITERATIONS,
          iis_converge = IIS_CONVERGE,
          max_newton_iterations = MAX_NEWTON_ITERATIONS,
          newton_converge = NEWTON_CONVERGE):
#This function's code would then need updating to use
#local variable max_iis_iterations instead of the
#module level MAX_IIS_ITERATIONS.

Note this does NOT use uppercase argument names as in Bruce's original patch -
these would not be consistent with the rest of Biopython.
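
One subtlety with defaults of this kind, sketched here under assumptions:
Python evaluates default argument values once, when the def statement runs.
Writing max_newton_iterations = MAX_NEWTON_ITERATIONS in the signature would
therefore freeze the value at import time, and a later assignment to the
module level variable (as in comment 2) would no longer have any effect.
Using None as the default and looking up the module level value at call time
keeps both styles working.  The function below is a toy stand-in, not the real
Bio.MaxEntropy.train; its body and return value are made up for illustration.

```python
# Stand-in for Bio/MaxEntropy.py. The constant names and values are from
# the bug report; the function body is illustrative only.
MAX_IIS_ITERATIONS = 10000    # Maximum iterations for IIS.
IIS_CONVERGE = 1E-5           # Convergence criteria for IIS.
MAX_NEWTON_ITERATIONS = 100   # Maximum iterations on Newton's method.
NEWTON_CONVERGE = 1E-10       # Convergence criteria for Newton's method.


def train(training_set, results, feature_fns, update_fn=None,
          max_iis_iterations=None, iis_converge=None,
          max_newton_iterations=None, newton_converge=None):
    """Toy version: just report which settings would be used."""
    # None means "use whatever the module level variable holds right now".
    if max_iis_iterations is None:
        max_iis_iterations = MAX_IIS_ITERATIONS
    if iis_converge is None:
        iis_converge = IIS_CONVERGE
    if max_newton_iterations is None:
        max_newton_iterations = MAX_NEWTON_ITERATIONS
    if newton_converge is None:
        newton_converge = NEWTON_CONVERGE
    return {"max_iis_iterations": max_iis_iterations,
            "iis_converge": iis_converge,
            "max_newton_iterations": max_newton_iterations,
            "newton_converge": newton_converge}
```

With this arrangement an explicit argument wins, and otherwise the current
value of the module level variable is used, so existing scripts that set
MaxEntropy.MAX_NEWTON_ITERATIONS keep working.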

Peter


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Mon Dec 15 10:11:37 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 15 Dec 2008 05:11:37 -0500
Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and
	write_to_string() are inefficient and don't check inputs
In-Reply-To: <bug-2711-42@http.bugzilla.open-bio.org/>
Message-ID: <200812151011.mBFABbqD007138@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2711


------- Comment #6 from lpritc at scri.sari.ac.uk  2008-12-15 05:11 EST -------
(In reply to comment #2)
> *** Bug 2710 has been marked as a duplicate of this bug. ***
> 

(In reply to comment #0)
> test_GenomeDiagram fails because the renderPM module is not part of standard
> install of reportlab, at least under Linux. 

That's odd - renderPM is in the source for ReportLab 2.2.  Are you using an
up-to-date version?  It seems to install well enough on our 64-bit Linux box
from the ReportLab source.

> I consider that the renderPM module should not be required so
> Graphics/GenomeDiagram/Diagram.py needs to be rewritten to avoid using the
> renderPM module when it is not available. 

renderPM is how raster graphics are drawn, so is, I'm afraid, a necessary part
of GenomeDiagram's functionality.

I prefer your alternative suggestion of making it a 'dynamic' import, but even
then I think that the inconvenience of preparing the diagram, only to find out
at the last possible stage that you can't draw it because you're missing the
library, is worse than getting the error message upfront.  Not that this should
be a problem, since renderPM is part of the main ReportLab source, now.  YMMV
though, and I'm happy for the code to conform to the Biopython house style.

> The installation documentation needs to include something about needing the
> renderPM for JPG, BMP, GIF, PNG or TIFF outputs.
> 
> There must be a test for the presence of the renderPM module.

I'm not convinced of the value of this, as renderPM is part of the current
ReportLab source installation.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Mon Dec 15 10:17:54 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 15 Dec 2008 05:17:54 -0500
Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and
	write_to_string() are inefficient and don't check inputs
In-Reply-To: <bug-2711-42@http.bugzilla.open-bio.org/>
Message-ID: <200812151017.mBFAHs0K007630@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2711


------- Comment #7 from lpritc at scri.sari.ac.uk  2008-12-15 05:17 EST -------
(In reply to comment #0) (from #2710)
> test_GenomeDiagram fails because the renderPM module is not part of standard
> install of reportlab, at least under Linux. 

renderPM is part of the source install of ReportLab 2.2, and installs correctly
on our 64-bit Linux box.  Are you using an up-to-date version of ReportLab? 
The version that your distro's installer uses may not be the most recent.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Mon Dec 15 10:41:13 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 15 Dec 2008 05:41:13 -0500
Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and
	write_to_string() are inefficient and don't check inputs
In-Reply-To: <bug-2711-42@http.bugzilla.open-bio.org/>
Message-ID: <200812151041.mBFAfDI8010277@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2711


------- Comment #8 from lpritc at scri.sari.ac.uk  2008-12-15 05:41 EST -------
(In reply to comment #0)
> 1) Why are there two functions to output a diagram rather than just one generic
> function? In particular, why not just pass a filename or not? 

When I wrote the libraries originally, I had one main use in mind: production
of publication-quality images in vector format.  Later on I decided that I
needed streaming output for web display, and then bolted on the
write_to_string() to look like the ReportLab interface, for consistency. 
That's why there are two methods: the write() method produces
publication-quality vector files (and bitmaps, if you ask), and the
write_to_string() method produces the streaming output.

It should be possible to make write() do both jobs, so long as the intention is
declared in the argument list.  It might be nice to just be able to specify a
stream or handle, rather than the filename.  Both of these would be an API
change.

> 2) I find the functions write() and write_to_string() just plain ugly. 
> You define a local dictionary of modules every time these functions are called.

That dictionary could be placed at the head of the script to be defined on
import.  But I think it's more explicit what's going on to have it in the
method itself - the dictionary has restricted scope, and is garbage-collected
after the function call.  Also, I don't understand your nested list proposal:
distribution dictionaries are not that uncommon.
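
For comparison, the module level alternative might look roughly like the
sketch below.  The writer functions (write_ps, write_pdf, write_svg) and the
name _FORMATS are hypothetical stand-ins for the ReportLab render modules, not
the GenomeDiagram code itself.

```python
# Hypothetical stand-ins for renderPS / renderPDF / renderSVG.
def write_ps(drawing):
    return "PS:" + drawing


def write_pdf(drawing):
    return "PDF:" + drawing


def write_svg(drawing):
    return "SVG:" + drawing


# Built once, when the module is imported, rather than on every call.
_FORMATS = {"PS": write_ps, "PDF": write_pdf, "SVG": write_svg}


def write_to_string(drawing, output="PS"):
    """Look up the writer for the requested format and call it."""
    try:
        writer = _FORMATS[output.upper()]
    except KeyError:
        raise ValueError("Output format should be one of %s"
                         % ", ".join(sorted(_FORMATS)))
    return writer(drawing)
```

Either placement gives the same behaviour; the difference is only where the
dictionary lives and how often it is rebuilt.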

> 4) I do not know the policy on module imports, but this line is only required
> for write() and write_to_string():
> from reportlab.graphics import renderPS, renderPDF, renderSVG, renderPM
> Also renderPM is an addon.

Apologies for repeating myself earlier about this one - Bugzilla was being
flaky - but renderPM is now part of ReportLab 2.2.  Whether we should continue
to support/cater for installations of 1.21 without the add-ons is another
question, I think.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Mon Dec 15 10:51:30 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 15 Dec 2008 05:51:30 -0500
Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and
	write_to_string() are inefficient and don't check inputs
In-Reply-To: <bug-2711-42@http.bugzilla.open-bio.org/>
Message-ID: <200812151051.mBFApU9R011217@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2711


------- Comment #9 from lpritc at scri.sari.ac.uk  2008-12-15 05:51 EST -------
(In reply to comment #3)
> As an aside, I'd like write_to_string() to support a DPI argument like write()
> does.

The way I originally intended write_to_string() to be used - sending graphics
to a browser - the DPI has no influence at all.  DPI is only of any importance
for printing graphics: the DPI translates the pixel size into the final printed
size of the image.  The image you see on screen (assuming no fancy browser
scaling) is pixel-per-pixel.  That's why I left it out.

It may be that people have a sensible reason for writing their image output to
string - rather than binary - encoding, for writing to a file.  I'm not clear
on what that would be, but it's possible.  In that case, I think that an
appropriate merging of the write() and write_to_string() methods could be:

def write(self, filename=None, output=default_output, dpi=default_dpi,
encoding=default_encoding):

encoding could then be either 'binary' (default), or 'string' - which would
emulate write_to_string()'s function.

Where handle is not None, the resulting output would be sent to the passed
handle - which could potentially include sys.stdout.  Where handle is None, the
method could return the encoded image directly, as write_to_string() does, now.

Other than the obvious problem with ReportLab's drawToFile requiring a
filename, rather than a handle - does this seem like a reasonable plan to
others?


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Mon Dec 15 11:00:01 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 15 Dec 2008 06:00:01 -0500
Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and
	write_to_string() are inefficient and don't check inputs
In-Reply-To: <bug-2711-42@http.bugzilla.open-bio.org/>
Message-ID: <200812151100.mBFB01fk011962@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2711


------- Comment #10 from biopython-bugzilla at maubp.freeserve.co.uk  2008-12-15 06:00 EST -------
(In reply to comment #8)
> 
> > 4) I do not know the policy on module imports, but this line is only
> > required for write() and write_to_string():
> > from reportlab.graphics import renderPS, renderPDF, renderSVG, renderPM
> > Also renderPM is an addon.
> 
> Apologies for repeating myself earlier about this one - Bugzilla was being
> flaky - but renderPM is now part of ReportLab 2.2.  Whether we should continue
> to support/cater for installations of 1.21 without the add-ons is another
> question, I think.

I thought I'd commented on this bug already but I committed a patch which would
fail gracefully if renderPM was missing.  I must be running an older version of
ReportLab on my Linux box at home, because it didn't have renderPM installed.  

However - this check is done when writing the file.  This is good if you don't
have renderPM but only want vector images.  This is bad if you do want bitmap
images, as the missing dependency error happens at the very end.

However, I don't think we can assume renderPM will be installed.  Looking at
the website for reportlab 2.2, it's not clear if the Windows installers will
include renderPM or not...


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Mon Dec 15 11:02:35 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 15 Dec 2008 06:02:35 -0500
Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and
	write_to_string() are inefficient and don't check inputs
In-Reply-To: <bug-2711-42@http.bugzilla.open-bio.org/>
Message-ID: <200812151102.mBFB2ZMq012237@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2711


------- Comment #11 from lpritc at scri.sari.ac.uk  2008-12-15 06:02 EST -------
(In reply to comment #3)
> I agree something needs to be done for this issue (in particular the bit
> originally covered by Bug 2710).
> 
> Moving the imports into these function(s) would be another way to let us deal
> with the missing renderPM module if and when it is used (either leave the
> ImportError, or raise a missing external dependency error).

One issue with this approach is that, when working with the module
interactively, a user might not be aware of the absence of the appropriate
module until they attempted to produce their output - which might be after
quite a bit of interactive work.  Informing the user up-front that renderPM is
not available - either by ImportError or friendly warning - avoids this.
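
A minimal sketch of the "friendly warning up-front" option, assuming only that
renderPM is imported from reportlab.graphics as in the import line quoted in
this bug.  The names BITMAP_FORMATS and check_output_format are made up for
illustration and are not part of GenomeDiagram.

```python
import warnings

try:
    from reportlab.graphics import renderPM
except ImportError:
    # Tell the user at import time, not after they have built a diagram.
    renderPM = None
    warnings.warn("ReportLab's renderPM module is not available; "
                  "bitmap output (e.g. PNG) will not work.")

# Bitmap formats named in the bug report (hypothetical constant name).
BITMAP_FORMATS = ("JPG", "BMP", "GIF", "PNG", "TIFF")


def check_output_format(output):
    """Raise early if a bitmap format is requested without renderPM."""
    if output.upper() in BITMAP_FORMATS and renderPM is None:
        raise ImportError("renderPM is required for %s output" % output)
```

Vector output is unaffected when renderPM is missing, and a bitmap request
can be rejected before any drawing work is done.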


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Mon Dec 15 11:17:45 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 15 Dec 2008 06:17:45 -0500
Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and
	write_to_string() are inefficient and don't check inputs
In-Reply-To: <bug-2711-42@http.bugzilla.open-bio.org/>
Message-ID: <200812151117.mBFBHjgn013463@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2711


------- Comment #12 from biopython-bugzilla at maubp.freeserve.co.uk  2008-12-15 06:17 EST -------
(In reply to comment #9)
> (In reply to comment #3)
> > As an aside, I'd like write_to_string() to support a DPI argument like
> > write() does.
> 
> The way I originally intended write_to_string() to be used - sending graphics
> to a browser - the DPI has no influence at all.  DPI is only of any importance
> for printing graphics ...

OK, so it's less useful than I had expected.  Rendering bitmaps to strings so they
can be inserted into a database as blobs is one potential use-case.  Also for a
web-service where you expect the user to save and print the naked image
(unusual, and probably software dependent on how the DPI is treated).

> In that case, I think that an appropriate merging of the write() and
> write_to_string() methods could be:
> 
> def write(self, filename=None, output=default_output, dpi=default_dpi,
> encoding=default_encoding):
> 
> encoding could then be either 'binary' (default), or 'string' - which would
> emulate write_to_string()'s function.
> 
> Where handle is not None, the resulting output would be sent to the passed
> handle - which could potentially include sys.stdout.  Where handle is None,
> the method could return the encoded image directly, as write_to_string()
> does, now.
> 
> Other than the obvious problem with ReportLab's drawToFile requiring a
> filename, rather than a handle - does this seem like a reasonable plan to
> others?

On the plus side, this would be backwards compatible (and we could deprecate
the write_to_string function).

However, I'm not so keen on this style personally - the return value is
radically different depending on the arguments (nothing, or a string of data).

If we were designing this from scratch, I would have suggested one write
function which wrote to a handle - which would let you then write to a file or
a string (using StringIO).  On the other hand, this is perhaps a little low
level.  We've had similar discussions regarding Bio.SeqIO in the past.
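
As a rough sketch of the handle-based design (not the GenomeDiagram API): one
function writes to any file-like object, and the string version is a thin
wrapper around it.  The image bytes are passed in directly here because the
rendering step is out of scope, and io.BytesIO is used as the in-memory handle
on the assumption that the image data is binary.

```python
from io import BytesIO


def write(image_data, handle):
    """Write already-rendered image bytes to any file-like object."""
    handle.write(image_data)


def write_to_string(image_data):
    """Convenience wrapper: collect the output in memory and return it."""
    handle = BytesIO()
    write(image_data, handle)
    return handle.getvalue()
```

The same write() call works for a file opened in binary mode or for an
in-memory buffer, and each function has a single, predictable return type.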


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Mon Dec 15 20:33:51 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 15 Dec 2008 15:33:51 -0500
Subject: [Biopython-dev] [Bug 2591] GenBank files misparsed for long
	organism names
In-Reply-To: <bug-2591-42@http.bugzilla.open-bio.org/>
Message-ID: <200812152033.mBFKXpp4005791@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2591


------- Comment #4 from joelb at lanl.gov  2008-12-15 15:33 EST -------
I heard back from GenBank, and it seems they are saying the problem isn't
theirs:
>On Tue, December 9, 2008 10:30 am, gb-admin at ncbi.nlm.nih.gov wrote:
>> Hi Joel,
>>
>> I heard back from our database folks on this one.  Essentially we do
>> allow the source line to line-wrap, but we never publicly announced
>> it.  We apologize for this oversight and will be putting something
>> in the release notes regarding this.  Hopefully BioPython and other
>> companies will be able to pick up this change and adapt once it is
>> announced in the release notes.
>>
>> thanks for pointing it out
>>
>> Linda

I just wrote back with the followup question:
>

>OK, but then a followup question.  How does one distinguish, then, a
>line-wrapped organism line from the multiline phylogeny that follows? 
>According to my reading of the specs (and most Bio* GenBank parsers'
>implementations) it seems that an equally-valid parsing of the following
>ORGANISM record is that it belongs to the "AKU_12601 Bacteria" kingdom. 
>That is, there is no official way of signalling "this is the end of the
>multiline organism name" or "this begins the multiline phylogeny record."
>
>  ORGANISM  Salmonella enterica subsp. enterica serovar Paratyphi A str.
>            AKU_12601
>            Bacteria; Proteobacteria; Gammaproteobacteria;Enterobacteriales;
>            Enterobacteriaceae; Salmonella.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Wed Dec 17 23:44:58 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 17 Dec 2008 18:44:58 -0500
Subject: [Biopython-dev] [Bug 2591] GenBank files misparsed for long
	organism names
In-Reply-To: <bug-2591-42@http.bugzilla.open-bio.org/>
Message-ID: <200812172344.mBHNiwPt019616@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2591


------- Comment #5 from joelb at lanl.gov  2008-12-17 18:44 EST -------
I received the following response to my followup.  It now appears that the bug
is with BioPython, since GenBank has changed its definition.  It seems likely
that all Bio* flatfile parsers will be affected.

>I just received the wording that will appear in Section 3.4.2 of gbrel.txt 
>for this month's release:
>
>   ORGANISM     - Formal scientific name of the organism (first line)
>and taxonomic classification levels (second and subsequent lines).
>Mandatory subkeyword in all annotated entries/two or more records.
>
>   In the event that the organism name exceeds 68 characters (80 - 13 + 1)
>   in length, it will be line-wrapped and continue on a second line,
>   prior to the taxonomic classification. Unfortunately, very long 
>   organism names were not anticipated when the fixed-length GenBank
>   flatfile format was defined in the 1980s. The possibility of linewraps
>   makes the job of flatfile parsers more difficult : essentially, one
>   cannot be sure that the second line is truly a classification/lineage
>   unless it consists of multiple tokens, delimited by semi-colons.
>   The long-term solution to this problem is to introduce an additional
>   subkeyword, probably 'LINEAGE' . This might occur sometime in 2009
>   or 2010.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Thu Dec 18 11:07:16 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 18 Dec 2008 06:07:16 -0500
Subject: [Biopython-dev] [Bug 2591] GenBank files misparsed for long
	organism names
In-Reply-To: <bug-2591-42@http.bugzilla.open-bio.org/>
Message-ID: <200812181107.mBIB7G97005964@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2591


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk  2008-12-18 06:07 EST -------
(In reply to comment #5)
> I received the following response to my followup.  It now appears that the bug
> is with BioPython, since GenBank has changed its definition.  It seems likely
> that all Bio* flatfile parsers will be affected.

Thanks for chasing this up Joel :)

> I just received the wording that will appear in Section 3.4.2 of gbrel.txt 
> for this month's release:
> >
> >   ORGANISM     - Formal scientific name of the organism (first line)
> >and taxonomic classification levels (second and subsequent lines).
> >Mandatory subkeyword in all annotated entries/two or more records.
> >
> >   In the event that the organism name exceeds 68 characters (80-13+1)
> >   in length, it will be line-wrapped and continue on a second line,
> >   prior to the taxonomic classification. Unfortunately, very long 
> >   organism names were not anticipated when the fixed-length GenBank
> >   flatfile format was defined in the 1980s. The possibility of linewraps
> >   makes the job of flatfile parsers more difficult : essentially, one
> >   cannot be sure that the second line is truly a classification/lineage
> >   unless it consists of multiple tokens, delimited by semi-colons.
> >   The long-term solution to this problem is to introduce an additional
> >   subkeyword, probably 'LINEAGE' . This might occur sometime in 2009
> >   or 2010.


It looks like my guess was right, see comment #1:
> Let's wait and hear what the NCBI says - I expect they will have to change the
> file format definition slightly.
> 
> If they say this is a valid file, I hope they will also explain officially
> how we should split up the species and its lineage.  One option would be
> some thing like looking for semi-colons in the following text as indicative
> of the lineage (rather than as more of the ORGANISM).

Now that we've had the NCBI recommend the semi-colon approach, I've fixed our
parser in CVS:
Bio/GenBank/Record.py revision 1.14
Bio/GenBank/Scanner.py revision 1.26
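
The semi-colon approach can be sketched roughly as follows.  This is only an
illustration, not the code checked in to Bio/GenBank/Scanner.py; the function
name split_organism_and_lineage and the example organism are made up.

```python
def split_organism_and_lineage(lines):
    """Split the lines under ORGANISM into (organism name, lineage).

    Hypothetical helper: the first line is always treated as part of the
    name.  After that, the first line containing a semi-colon is taken as
    the start of the lineage, and all later lines belong to the lineage.
    """
    organism_parts = []
    lineage_parts = []
    for index, line in enumerate(lines):
        text = line.strip()
        if not text:
            continue
        if index > 0 and (lineage_parts or ";" in text):
            lineage_parts.append(text)
        else:
            organism_parts.append(text)
    return " ".join(organism_parts), " ".join(lineage_parts)


# Made-up example of a name wrapped over two lines.
name, lineage = split_organism_and_lineage([
    "Candidatus Example organism with an extremely long descriptive name",
    "strain XYZ-123",
    "Bacteria; Proteobacteria; Gammaproteobacteria;",
    "Enterobacteriales.",
])
```

Note that a lineage consisting of a single token with no semi-colon would be
treated as part of the name by this sketch, which is the ambiguity the NCBI
text describes.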

Peter


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Thu Dec 18 19:01:32 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 18 Dec 2008 14:01:32 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main
	Biopython distribution
In-Reply-To: <bug-2671-42@http.bugzilla.open-bio.org/>
Message-ID: <200812181901.mBIJ1W31019801@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671


------- Comment #31 from biopython-bugzilla at maubp.freeserve.co.uk  2008-12-18 14:01 EST -------
(In reply to comment #27)
> This might be better off as a new enhancement bug, but here is a possible
> "arc-box" drawing function to go in the AbstractDrawer.py file, based on the
> existing draw_box function.
> 
> ...

There was an issue with different frames of reference in the initial code I was
suggesting.

> Alternately, the code could just go in CircularDrawer.py directly.

This seemed simpler in the short term.

> As far as I can tell from looking at their source code, even ReportLab_1_21_2
> has ArcPath defined in reportlab.graphics.shapes so there shouldn't be any
> issue here with backwards compatibility.

I've just checked in a patch based on this - see
Bio/Graphics/GenomeDiagram/CircularDrawer.py revision 1.8

I've also updated the unit test to draw a circular diagram with some features
in white (with an automatic black border).  This now looks nice - with the old
code using multiple boxes to fake the arced box, the whole feature ended up
looking black.  See Tests/test_GenomeDiagram.py revision 1.13

As a bonus, PDF output seems a little smaller now as well :)


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.


From bugzilla-daemon at portal.open-bio.org  Mon Dec 22 16:19:51 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 22 Dec 2008 11:19:51 -0500
Subject: [Biopython-dev] [Bug 2375] Coalescent support through Simcoal2
In-Reply-To: <bug-2375-42@http.bugzilla.open-bio.org/>
Message-ID: <200812221619.mBMGJp6k013225@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2375


------- Comment #25 from biopython-bugzilla at maubp.freeserve.co.uk  2008-12-22 11:19 EST -------
(In reply to comment #24)
> I committed my patch to setup.py, as it seems to work fine with Python 2.3,
> 2.4, and 2.5 on all platforms. Leaving this bug open, since we still need to
> remove the workaround in Bio/PopGen/SimCoal/__init__.py.

Editing Bio/PopGen/SimCoal/__init__.py to do just the following seems to work
fine on Linux and MacOS (I've not tested on Windows yet):

import os
builtin_tpl_dir = os.path.abspath(os.path.join(os.path.dirname(__file__),
"data"))

I *think* this directory is only used in one place in
Bio/PopGen/SimCoal/Template.py so it might make more sense to put this code in
that function (leaving the __init__.py file essentially empty).  What do you
think Tiago?




From bugzilla-daemon at portal.open-bio.org  Mon Dec 22 17:20:46 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 22 Dec 2008 12:20:46 -0500
Subject: [Biopython-dev] [Bug 2532] Using IUPAC alphabets in mixed case Seq
	objects
In-Reply-To: <bug-2532-42@http.bugzilla.open-bio.org/>
Message-ID: <200812221720.mBMHKkwo018936@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2532


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #961 is|0                           |1
           obsolete|                            |


------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk  2008-12-22 12:20 EST -------
(From update of attachment 961)
This patch is now obsolete - I've checked in a variant of this into CVS.

This will allow us to proceed with Bug 2597 (
Enforce alphabet letters in Seq objects) without having to first introduce
mixed case variants of the IUPAC alphabets.

If/when we have mixed case IUPAC alphabets, then Bio.Sequencing.PhD could use
them.




From bugzilla-daemon at portal.open-bio.org  Mon Dec 22 17:33:33 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 22 Dec 2008 12:33:33 -0500
Subject: [Biopython-dev] [Bug 2532] Using IUPAC alphabets in mixed case Seq
	objects
In-Reply-To: <bug-2532-42@http.bugzilla.open-bio.org/>
Message-ID: <200812221733.mBMHXXjd020146@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2532


------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk  2008-12-22 12:33 EST -------
Created an attachment (id=1174)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1174&action=view)
Patch for Bio/Nexus/Nexus.py (non IUPAC) alphabet handling

(In reply to comment #2)
> I opt for (b): an easy one-time addition to Bio.Alphabets, easy to use for
> everyone (instead creating their own uppercase-lowercase variants of those
> terribly complicated biopython alphabet classes), and easy to change for all
> other modules if lowercase-uppercase is what they want (or need).

I'm not saying we shouldn't add mixed (and even lower) case variants of the
IUPAC alphabets, however, even if we had them, NEXUS still uses extra
characters like "-" for gaps (easily handled via a Gapped alphabet encoder) and
"?" (for a missing character).  Are there any other extra characters?

Under the current alphabet schema, we'd have to use a (mixed case) IUPAC
alphabet, then add a Gapped AlphabetEncoder (easy) then add a new alphabet
encoder for any miscellaneous non-IUPAC characters like "?".  This could be done
with the generic AlphabetEncoder, or we could add additional encoder objects
for special meanings.  This starts to get complicated (dealing with
AlphabetEncoders is nasty).

This attached patch is a variation on my "plan (a)" from comment 0. It makes
Bio.Nexus create its own alphabet objects (based on the generic DNA/RNA/Protein
classes) with the precise list of valid letters required for that file.  Using
this patch should allow us to press ahead with Bug 2597.
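
A rough sketch of the idea of working out the precise list of valid letters
for a file, in plain Python.  The helper name build_valid_letters and its
arguments are invented for illustration; the attached patch works with
Biopython's alphabet classes, which are not used here.

```python
def build_valid_letters(base_letters, gap="-", missing="?", mixed_case=True):
    """Return a string of the letters allowed in one particular file.

    Hypothetical helper: starts from the letters of a generic alphabet,
    optionally adds the lower case variants, then the gap and missing
    characters.  Duplicates are dropped while keeping the order.
    """
    candidates = base_letters.upper()
    if mixed_case:
        candidates += base_letters.lower()
    candidates += gap + missing
    letters = []
    for letter in candidates:
        if letter not in letters:
            letters.append(letter)
    return "".join(letters)


dna_letters = build_valid_letters("ACGT")
```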




From bugzilla-daemon at portal.open-bio.org  Mon Dec 22 17:38:10 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 22 Dec 2008 12:38:10 -0500
Subject: [Biopython-dev] [Bug 2597] Enforce alphabet letters in Seq objects
In-Reply-To: <bug-2597-42@http.bugzilla.open-bio.org/>
Message-ID: <200812221738.mBMHcA86020507@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2597


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2008-12-22 12:38 EST -------
Created an attachment (id=1175)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1175&action=view)
Patch for Bio/Seq.py to check the alphabet letters

This is a simple approach to checking the letters - probably not the fastest. 
I think it is important that the exception gives some clue about why the Seq
object was not created - either listing the first invalid character (as in this
patch) or listing all invalid characters (which could be done using sets).

On the other hand, I'd like this check to be as fast as possible - perhaps even
at the cost of a generic exception message like "Sequence contains letters
which are not valid for the given alphabet".
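
To make the trade-off concrete, here is a minimal sketch of the two ways of
reporting the problem.  The function names are invented for illustration and
this is not the code from the attached patch.

```python
def first_invalid_letter(sequence, letters):
    """Return the first character not in letters, or None if all are valid."""
    valid = set(letters)
    for letter in sequence:
        if letter not in valid:
            return letter
    return None


def all_invalid_letters(sequence, letters):
    """Return a sorted list of every distinct invalid character (using sets)."""
    return sorted(set(sequence) - set(letters))


def check_letters(sequence, letters):
    """Raise ValueError naming the invalid characters, if there are any."""
    invalid = all_invalid_letters(sequence, letters)
    if invalid:
        raise ValueError("Sequence contains letters which are not valid "
                         "for the given alphabet: %s"
                         % ", ".join(repr(letter) for letter in invalid))
```

The set based version only reports each invalid character once and does not
say where in the sequence it occurs.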




From bugzilla-daemon at portal.open-bio.org  Mon Dec 22 18:27:11 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 22 Dec 2008 13:27:11 -0500
Subject: [Biopython-dev] [Bug 2532] Using IUPAC alphabets in mixed case Seq
	objects
In-Reply-To: <bug-2532-42@http.bugzilla.open-bio.org/>
Message-ID: <200812221827.mBMIRBme024497@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2532


------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk  2008-12-22 13:27 EST -------
Created an attachment (id=1176)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1176&action=view)
Adding lower and mixed case IUPAC Alphabets

This needs reviewing by someone else - especially the multiple inheritance
which tries to follow the existing pattern that the parent is a more general
version of the child.




From bugzilla-daemon at portal.open-bio.org  Tue Dec 23 09:58:31 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 23 Dec 2008 04:58:31 -0500
Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and
	write_to_string() are inefficient and don't check inputs
In-Reply-To: <bug-2711-42@http.bugzilla.open-bio.org/>
Message-ID: <200812230958.mBN9wVDK000340@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2711


------- Comment #13 from bsouthey at gmail.com  2008-12-23 04:58 EST -------
(In reply to comment #6)
> (In reply to comment #2)
> > *** Bug 2710 has been marked as a duplicate of this bug. ***
> > 
> 
> (In reply to comment #0)
> > test_GenomeDiagram fails because the renderPM module is not part of standard
> > install of reportlab, at least under Linux. 
> 
> That's odd - renderPM is in the source for ReportLab 2.2.  Are you using an
> up-to-date version?  It seems to install well enough on our 64-bit Linux box
> from the ReportLab source.


I can not check this as I am away from my system. As I recall, the Python code
for accessing this library is provided with the standard install as there is a
renderPM.py file. But that is just a wrapper to some C code found in the
rl_addons directory. So it is a big no that renderPM is available unless you
actually build the C sources or download the binaries (only valid for Windows).

According to the website
http://www.reportlab.org/subversion.html
"
It will create subdirectories for reportlab, which is an importable
python package, and rl_addons which contains the C extensions. The
latter need building with the contained setup script, but can also be
downloaded in pre-built form from our downloads page. They rarely
change.
"

What did you actually install?
In particular where was _renderPM built?
Basically we need to document this as there appear to be different ways to
install reportlab (may also be version or svn related).

> 
> > I consider that the renderPM module should not be required so
> > Graphics/GenomeDiagram/Diagram.py needs to be rewritten to avoid using the
> > renderPM module when it is not available. 
> 
> renderPM is how raster graphics are drawn, so is, I'm afraid, a necessary part
> of GenomeDiagram's functionality.

No problem then, but you must provide a test for the presence and functionality
of it in the actual code as well as the biopython tests.

> 
> I prefer your alternative suggestion of making it a 'dynamic' import, but even
> then I think that the inconvenience of preparing the diagram, only to find out
> at the last possible stage that you can't draw it because you're missing the
> library, is worse than getting the error message upfront.  Not that this should
> be a problem, since renderPM is part of the main ReportLab source, now.  YMMV
> though, and I'm happy for the code to conform to the Biopython house style.
> 
> > The installation documentation needs to include something about needing the
> > renderPM for JPG, BMP, GIF, PNG or TIFF outputs.
> > 
> > There must be a test for the presence of the renderPM module.
> 
> I'm not convinced of the value of this, as renderPM is part of the current
> ReportLab source installation.
> 

My understanding is that this statement is not completely true.  But I would
like confirmation either way. There may also be allowance for windows
installations especially non-source ones but I can not check those.


Bruce




From bugzilla-daemon at portal.open-bio.org  Tue Dec 23 10:18:58 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 23 Dec 2008 05:18:58 -0500
Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and
	write_to_string() are inefficient and don't check inputs
In-Reply-To: <bug-2711-42@http.bugzilla.open-bio.org/>
Message-ID: <200812231018.mBNAIwuq002193@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2711


------- Comment #14 from bsouthey at gmail.com  2008-12-23 05:18 EST -------
(In reply to comment #12)
> (In reply to comment #9)
> > (In reply to comment #3)
> > > As an aside, I'd like write_to_string() to support a DPI argument like
> > > write() does.
> > 
> > The way I originally intended write_to_string() to be used - sending graphics
> > to a browser - the DPI has no influence at all.  DPI is only of any importance
> > for printing graphics ...
> 
> OK, so it's less useful than I had expected.  Rendering bitmaps to strings so they
> can be inserted into a database as blobs is one potential use-case.  Also for a
> web-service where you expect the user to save and print the naked image
> (unusual, and probably software dependent on how the DPI is treated).
> 

Surely it is important because a user can write to a string and then save the
string to a file rather than using write() a second time. 

What do these options do?
bg, configPIL, showBoundary

> > In that case, I think that an appropriate merging of the write() and
> > write_to_string() methods could be:
> > 
> > def write(self, filename=None, output=default_output, dpi=default_dpi,
> > encoding=default_encoding):
> > 
> > encoding could then be either 'binary' (default), or 'string' - which would
> > emulate write_to_string()'s function.
> > 
> > Where handle is not None, the resulting output would be sent to the passed
> > handle - which could potentially include sys.stdout.  Where handle is None,
> > the method could return the encoded image directly, as write_to_string()
> > does, now.
> > 
> > Other than the obvious problem with ReportLab's drawToFile requiring a
> > filename, rather than a handle - does this seem like a reasonable plan to
> > others?
> 
> On the plus side, this would be backwards compatible (and we could deprecate
> the draw_to_string function).
> 
> However, I'm not so keen on this style personally - the return value is
> radically different depending on the arguments (nothing, or a string of data).
> 
> If we were designing this from scratch, I would have suggested one write
> function which wrote to a handle - which would let you then write to a file or
> a string (using StringIO).  On the other hand, this is perhaps a little low
> level.  We've had similar discussions regarding Bio.SeqIO in the past.
> 

I agree and I am not very concerned about backwards compatibility since this is
a very new function to Biopython. I think that is almost what
write_to_string() does and python functions are very big. But this is not my
code so please do as you want here.

Bruce




From bugzilla-daemon at portal.open-bio.org  Tue Dec 23 11:12:33 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 23 Dec 2008 06:12:33 -0500
Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and
	write_to_string() are inefficient and don't check inputs
In-Reply-To: <bug-2711-42@http.bugzilla.open-bio.org/>
Message-ID: <200812231112.mBNBCXkt006916@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2711


------- Comment #15 from biopython-bugzilla at maubp.freeserve.co.uk  2008-12-23 06:12 EST -------
(In reply to comment #14)
> (In reply to comment #12)
> > OK, so it's less useful than I had expected.  Rendering bitmaps to strings so
> > they can be inserted into a database as blobs is one potential use-case.
> > Also for a web-service where you expect the user to save and print the
> > naked image (unusual, and probably software dependent on how the DPI is
> > treated).
> 
> Surely it is important because a user can write to a string and then save the
> string to a file rather than using write() a second time. 

I was talking about write to string with a DPI not being so useful.

Using write to string is VERY useful, particularly for a webserver (which is
why Leighton added it, and how I have used it).  Setting the DPI isn't
important for using images in webpages - HTML and CSS provide lots of ways to
control the displayed and printed size.  Even if the browser is pointed
directly at the image (and not as part of a webpage) and you then print it, the
browser may ignore the DPI setting (probably browser specific).  i.e. The DPI
will only matter if the user saves the image and opens it in DPI aware
software.

(In reply to comment #14)
> (In reply to comment #12)
> > However, I'm not so keen on this style personally - the return value is
> > radically different depending on the arguments (nothing, or a string of
> > data).
> > 
> > If we were designing this from scratch, I would have suggested one write
> > function which wrote to a handle - which would let you then write to a
> > file or a string (using StringIO).  On the other hand, this is perhaps a
> > little low level.  We've had similar discussions regarding Bio.SeqIO in
> > the past.
> 
> I agree and I am not very concerned about backwards compatibility since this
> is a very new function to Biopython. I think that is almost what
> write_to_string() does and python functions are very big. But this is not my
> code so please do as you want here.

GenomeDiagram is new to Biopython, but has been available independently for
many years.  There will be some existing users (not just me and Leighton), and
the less they have to change to switch their code from using standalone
GenomeDiagram to the one within Biopython the better (the import lines have to
change for example).  We do need to think about backwards compatibility a bit.

Getting back to your original points,

(1) Two functions write() and write_to_string()
This follows the reportlab API, and they do actually return different
encodings.  From a backwards compatibility argument they should both stay, but
that doesn't stop us providing a unified method and deprecating 
write_to_string().

(2) Coding style of write() and write_to_string()
I don't have a problem with this - it works, it's clear, it's easily extended if
ReportLab add more back ends.  It doesn't strike me as ugly.  Inevitably this
is largely a matter of preference.

(3) The KeyError exception with invalid arguments.
This is fixed in CVS, for an invalid format argument you now get a ValueError
which is standard python practice.

(4) renderPM
Fixed in CVS, in that you can now use GenomeDiagram without ReportLab renderPM,
and have full functionality except for bitmap output.  Given we don't seem to
be able to assume renderPM will be installed and working, this seems a
reasonable solution.  If you try and render a bitmap without renderPM, then you
get a MissingExternalDependencyError exception asking you to install renderPM. 
We will need to look into this further for the documentation.
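
The general pattern for point (4) can be sketched like this.  It is only an
illustration under some assumptions: MissingDependencyError is a stand-in
defined here for Biopython's MissingExternalDependencyError, and the helper
names require_module and get_bitmap_renderer are made up.

```python
import importlib


class MissingDependencyError(Exception):
    """Stand-in (defined here) for MissingExternalDependencyError."""


def require_module(module_name, purpose):
    """Import and return the named module, or explain what is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        raise MissingDependencyError(
            "Please install %s, which is needed for %s."
            % (module_name, purpose))


def get_bitmap_renderer():
    """Only called once a bitmap format has actually been requested.

    Vector output (PDF, PS, SVG) never reaches this function, so it keeps
    working when renderPM is not installed.
    """
    return require_module("reportlab.graphics.renderPM", "bitmap output")
```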

Peter




From bugzilla-daemon at portal.open-bio.org  Tue Dec 23 12:45:55 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 23 Dec 2008 07:45:55 -0500
Subject: [Biopython-dev] [Bug 2718] New: Bio.Graphics and output file
	formats (PDF, EPS, SVG, and bitmaps)
Message-ID: <bug-2718-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2718

           Summary: Bio.Graphics and output file formats (PDF, EPS, SVG, and
                    bitmaps)
           Product: Biopython
           Version: Not Applicable
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk


In addition to PDF and PS/EPS (encapsulated postscript), ReportLab can also do
SVG, and with its optional renderPM module can do assorted bitmaps too (e.g.
PNG, JPG, TIFF, GIF, BMP).  Note that renderPM may not be installed (see Bug
2710).

The recently added Bio.Graphics.GenomeDiagram module supports all of these
formats - see Diagram.py with write (to filename or a handle) and
write_to_string methods.

Looking at the older Bio.Graphics code, it currently only supports PDF and
postscript, using a mixture of method names (which isn't very consistent):

Bio.Graphics.Distribution has a DistributionPage object with a draw method
(which writes to a filename or handle).

Bio.Graphics.BasicChromosome has an Organism object with a write method (which
writes to a filename or handle).

Bio.Graphics.Comparative has a ComparativeScatterPlot object with a
draw_to_file method (which writes to a filename or handle).

I would like:

(1) All the Bio.Graphics "write to file/handle" functions to accept any of the
supported file formats (like Bio.Graphics.GenomeDiagram), which would require
renderPM at run time for the bitmap formats (see Bug 2710).  They should share
some code for mapping format names to ReportLab rendering module.  This would
be easy to do without changing the existing mix of method names.

(2) Update the docstrings for the "write to file/handle" functions to make it
clear they can accept a filename OR a handle (a result of the underlying
reportlab renderer's drawToFile function's behaviour - see note below).

(3) Standardise on the method naming (and perhaps deprecate the old methods). 
Using "write" seems to be a sensible choice based on the current names used in
Bio.Graphics.

For reference/comparison, ReportLab's render modules have three related
functions:

* drawToString - Returns a string, calls drawToFile internally with a StringIO
handle.
* drawToFile - Takes a filename OR a handle (although their docstrings do not
make this clear, this works as the Canvas object takes either).  Calls the draw
function internally.
* draw - Takes a canvas object

See also Bug 2711 which touched on these issues in the context of GenomeDiagram
only.




From bugzilla-daemon at portal.open-bio.org  Tue Dec 23 12:47:26 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 23 Dec 2008 07:47:26 -0500
Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and
	write_to_string() are inefficient and don't check inputs
In-Reply-To: <bug-2711-42@http.bugzilla.open-bio.org/>
Message-ID: <200812231247.mBNClPt9017108@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2711


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #16 from biopython-bugzilla at maubp.freeserve.co.uk  2008-12-23 07:47 EST -------
In comment #12, I wrote:
> If we were designing this from scratch, I would have suggested one write
> function which wrote to a handle - which would let you then write to a file or
> a string (using StringIO).  On the other hand, this is perhaps a little low
> level.  We've had similar discussions regarding Bio.SeqIO in the past.

The reportlab docstrings are very unclear, however, their renderer's drawToFile
functions take either a filename OR a handle.  This works because the
underlying Canvas object can be created giving either a filename or a handle.

As a result, GenomeDiagram's write() method should accept either a filename or
a handle.  We should update the docstring to say this (perhaps even renaming
the argument?).

(In reply to comment #15)
> (1) Two functions write() and write_to_string()
> This follows the reportlab API, and they do actually return different
> encodings.

I wrote this based on something Leighton had said to me.  Going over the
reportlab code, this isn't true - reportlab's drawToString just calls
drawToFile with a cStringIO or StringIO handle.  They write identical data.
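
The "filename OR handle" pattern, and a to-string function built on top of
it, can be sketched in plain Python as below.  This is not the reportlab or
GenomeDiagram code; write_data and write_to_bytes are made-up names, and
io.BytesIO plays the role of the StringIO handle.

```python
import io


def write_data(data, target):
    """Write bytes to target, which is a filename or an open binary handle."""
    if hasattr(target, "write"):
        # Looks like a handle, use it directly and leave it open.
        target.write(data)
    else:
        # Otherwise treat it as a filename.
        with open(target, "wb") as handle:
            handle.write(data)


def write_to_bytes(data):
    """Return the output as a string of bytes, via an in-memory handle."""
    handle = io.BytesIO()
    write_data(data, handle)
    return handle.getvalue()
```

With this arrangement both routes produce identical data, since there is
only one piece of code doing the writing.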

(In reply to comment #15)
> Getting back to your original points,
> 
> (1) Two functions write() and write_to_string()
> This follows the reportlab API, and they do actually return different
> encodings.  From a backwards compatibility argument they should both stay, but
> that doesn't stop us providing a unified method and deprecating 
> write_to_string().

I've filed Bug 2718 for the general issue of method naming for the Bio.Graphics
modules output functionality.

> (2) Coding style of write() and write_to_string()
> I don't have a problem with this - it works, it's clear, it's easily extended if
> ReportLab add more back ends.  It doesn't strike me as ugly.  Inevitably this
> is largely a matter of preference.

Leaving this as is - the code itself may end up handled via shared function for
all of Bio.Graphics via Bug 2718.

> (3) The KeyError exception with invalid arguments.
> This is fixed in CVS, for an invalid format argument you now get a ValueError
> which is standard python practice.
> 
> (4) renderPM
> Fixed in CVS, in that you can now use GenomeDiagram without ReportLab
> renderPM and have full functionality except for bitmap output.  Given we 
> don't seem to be able to assume renderPM will be installed and working, this
> seems a reasonable solution.  If you try and render a bitmap without
> renderPM, then you get a MissingExternalDependencyError exception asking you
> to install renderPM.  We will need to look into this further for the
> documentation.

Marking this bug as FIXED.




From bugzilla-daemon at portal.open-bio.org  Tue Dec 23 12:55:11 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 23 Dec 2008 07:55:11 -0500
Subject: [Biopython-dev] [Bug 2718] Bio.Graphics and output file formats
	(PDF, EPS, SVG, and bitmaps)
In-Reply-To: <bug-2718-42@http.bugzilla.open-bio.org/>
Message-ID: <200812231255.mBNCtB1L017851@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2718


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2008-12-23 07:55 EST -------
Example script showing the reportlab render modules producing output given a
filename, handle, or via a string:

from reportlab.pdfgen.canvas import Canvas
from reportlab.lib.units import cm
from reportlab.graphics import renderPS, renderPDF, renderPM
from reportlab.graphics.shapes import Drawing, String

width = 10*cm
height = 2*cm

print "Using canvas directly (PDF only)..."
c = Canvas("hello1.pdf", pagesize=(width, height))
c.drawString(1*cm, 1*cm, "Hello World!")
c.showPage()
c.save()

#Create very simple drawing object,
drawing = Drawing(width, height)
drawing.add(String(1*cm, 1*cm, "Hello World!"))

print "Using filenames..."
renderPDF.drawToFile(drawing, "hello2.pdf")
renderPM.drawToFile(drawing, "hello2.png", "PNG")

print "Using handles..."
#Use binary mode for the PDF and PNG files, so that the data
#is not corrupted by newline translation on Windows.
handle = open("hello3.pdf","wb")
renderPDF.drawToFile(drawing, handle)
handle.close()
handle = open("hello3.ps","w")
renderPS.drawToFile(drawing, handle)
handle.close()
handle = open("hello3.png","wb")
renderPM.drawToFile(drawing, handle, "PNG")
handle.close()

print "Using strings..."
handle = open("hello4.pdf","wb")
handle.write(renderPDF.drawToString(drawing))
handle.close()
handle = open("hello4.ps","w")
handle.write(renderPS.drawToString(drawing))
handle.close()
handle = open("hello4.png","wb")
handle.write(renderPM.drawToString(drawing, "PNG"))
handle.close()

print "Done"




From bugzilla-daemon at portal.open-bio.org  Tue Dec 23 13:14:06 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 23 Dec 2008 08:14:06 -0500
Subject: [Biopython-dev] [Bug 2718] Bio.Graphics and output file formats
	(PDF, EPS, SVG, and bitmaps)
In-Reply-To: <bug-2718-42@http.bugzilla.open-bio.org/>
Message-ID: <200812231314.mBNDE64X019775@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2718


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2008-12-23 08:14 EST -------
(In reply to comment #0)
> (1) All the Bio.Graphics "write to file/handle" functions to accept any of the
> supported file formats (like Bio.Graphics.GenomeDiagram), which would require
> renderPM at run time for the bitmap formats (see Bug 2710).  They should share
> some code for mapping format names to ReportLab rendering module.  This would
> be easy to do without changing the existing mix of method names.

In addition, I notice that Bio.Graphics.BasicChromosome,
Bio.Graphics.Comparative and Bio.Graphics.Distribution expect lower case
formats (currently just pdf and eps) while Bio.Graphics.GenomeDiagram expects
upper case.  We should be consistent, which for backwards compatibility would
mean accepting either case.
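
A shared lookup accepting either case might look something like this
sketch.  The dictionary and the function name get_renderer_name are
assumptions for illustration (real shared code would probably return the
ReportLab module itself rather than its name), and an unknown format gives
a ValueError rather than a KeyError.

```python
# Map lower case format names to the name of the ReportLab render module.
_RENDERERS = {
    "pdf": "renderPDF",
    "ps": "renderPS",
    "eps": "renderPS",
    "svg": "renderSVG",
    # The bitmap formats all need the optional renderPM module.
    "png": "renderPM",
    "jpg": "renderPM",
    "tiff": "renderPM",
    "gif": "renderPM",
    "bmp": "renderPM",
}


def get_renderer_name(format):
    """Return the render module name for a format given in either case."""
    try:
        return _RENDERERS[format.lower()]
    except (KeyError, AttributeError):
        raise ValueError("Unsupported output format %r, expected one of: %s"
                         % (format, ", ".join(sorted(_RENDERERS))))
```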

> (2) Update the docstrings for the "write to file/handle" functions to make it
> clear they can accept a filename OR a handle (a result of the underlying
> reportlab renderer's drawToFile function's behaviour - see note below).

I've updated the docstrings in CVS,

Bio/Graphics/BasicChromosome.py revision 1.3
Bio/Graphics/Comparative.py revision 1.2
Bio/Graphics/Distribution.py revision 1.3
Bio/Graphics/GenomeDiagram/Diagram.py revision 1.3




From mjldehoon at yahoo.com  Wed Dec 24 10:52:48 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Wed, 24 Dec 2008 02:52:48 -0800 (PST)
Subject: [Biopython-dev] Rethinking Biopython's testing framework
In-Reply-To: <442447.52362.qm@web62407.mail.re1.yahoo.com>
Message-ID: <451304.38587.qm@web62407.mail.re1.yahoo.com>

Hi everybody,

How about the following for Biopython tests:

For Python's unittest-style test modules, Python's unittest documentation recommends defining a function in each test module that returns the test suite. Most Biopython tests that use the unittest framework already do this (the function is called "testing_suite").

We could now do the following in run_tests.py:

1) import the testing module and save its output
2) try to call module.testing_suite
3) if it exists, then we're using Python's unittest framework. So we run the tests in the testing suite.
4) if it does not exist, then we're using the print-and-compare approach. So we compare the saved output from the test to the correct output.

I think that this can be set up such that it looks like nothing has changed for the user, while the files containing the correct output are no longer needed for the unittest-based tests.
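
A minimal sketch of the four steps above, not the actual run_tests.py.
The helper name run_test_module and the two throwaway demo modules are
made up for illustration, and the code is written for a current
Python 3 rather than the Python 2 in use on this list.

```python
import importlib
import io
import os
import sys
import tempfile
import unittest
from contextlib import redirect_stdout


def run_test_module(name, expected_output=""):
    """Run one test module; return True if it passes (hypothetical helper)."""
    captured = io.StringIO()
    # 1) import the testing module and save its output
    with redirect_stdout(captured):
        module = importlib.import_module(name)
    # 2) try to find module.testing_suite
    make_suite = getattr(module, "testing_suite", None)
    if make_suite is not None:
        # 3) unittest style: run the suite the module hands back
        runner = unittest.TextTestRunner(stream=io.StringIO())
        return runner.run(make_suite()).wasSuccessful()
    # 4) print-and-compare style: check the saved output
    return captured.getvalue() == expected_output


# Demo: two throwaway test modules written to a temporary directory.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "test_demo_unit.py"), "w") as handle:
    handle.write(
        "import unittest\n"
        "class Demo(unittest.TestCase):\n"
        "    def test_upper(self):\n"
        "        self.assertEqual('acgt'.upper(), 'ACGT')\n"
        "def testing_suite():\n"
        "    return unittest.TestLoader().loadTestsFromTestCase(Demo)\n"
    )
with open(os.path.join(demo_dir, "test_demo_print.py"), "w") as handle:
    handle.write("print('ACGT')\n")
sys.path.insert(0, demo_dir)
importlib.invalidate_caches()

unit_ok = run_test_module("test_demo_unit")
print_ok = run_test_module("test_demo_print", expected_output="ACGT\n")
```

In the real script the expected output would be read from the saved
output file rather than passed in as a string.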

Questions, comments, objections, anybody?

--Michiel.


--- On Thu, 12/4/08, Michiel de Hoon <mjldehoon at yahoo.com> wrote:

> From: Michiel de Hoon <mjldehoon at yahoo.com>
> Subject: Re: [Biopython-dev] Rethinking Biopython's testing framework
> To: "Brad Chapman" <chapmanb at 50mail.com>, "Peter" <biopython at maubp.freeserve.co.uk>
> Cc: biopython-dev at lists.open-bio.org
> Date: Thursday, December 4, 2008, 7:32 AM
> > Michiel de Hoon wrote:
> > > If one of the sub-tests fails, Python's unit testing framework
> > > will tell us so, though (perhaps) not exactly which sub-test
> > > fails. However, that is easy to figure out just by running the
> > > individual test script by itself.
> >
> > That won't always work.  Consider intermittent network problems,
> > or tests using random data - in general it really is worthwhile
> > having run_tests.py report a little more than just which
> > test_XXX.py module failed.
> >
> I wonder if Python's unit testing framework allows us
> to capture exactly which sub-test fails. I'll look into
> that. Ideally, it should be possible to have regular Python
> unit tests and Biopython-style print-and-compare tests side
> by side, and get information about failing sub-tests for
> both.
>
> --Michiel.


From dalloliogm at gmail.com  Thu Dec 25 19:22:04 2008
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Thu, 25 Dec 2008 20:22:04 +0100
Subject: [Biopython-dev] Rethinking Biopython's testing framework
In-Reply-To: <451304.38587.qm@web62407.mail.re1.yahoo.com>
References: <442447.52362.qm@web62407.mail.re1.yahoo.com>
	<451304.38587.qm@web62407.mail.re1.yahoo.com>
Message-ID: <5aa3b3570812251122s43352380ke843c167e85569b5@mail.gmail.com>

On Wed, Dec 24, 2008 at 11:52 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> Hi everybody,
>
> How about the following for Biopython tests:
>
> For Python's unittest-style test modules, Python's unittest documentation recommends to define a function in each test module that returns the test suite. Most Biopython tests that use the unittest framework already do this (the function is called "testing_suite".

Merry Christmas!
Some people have suggested the nose Python framework to me:
- http://somethingaboutorange.com/mrl/projects/nose/

It is used by many other open source projects, like sqlalchemy and elixir.
I haven't tried it, but I think it does more or less everything you
described automatically, so we could try to adopt it.


>
> We could now do the following in run_tests.py:
>
> 1) import the testing module and save its output
> 2) try to call module.testing_suite
> 3) if it exists, then we're using Python's unittest framework. So we run the tests in the testing suite.
> 4) if it does not exist, then we're using the print-and-compare approach. So we compare the saved output from the test to the correct output.
>
> I think that this can be set up such that it looks like nothing has changed for the user, while the files containing the correct output are no longer needed for the unittest-based tests.
>
> Questions, comments, objections, anybody?
>
> --Michiel.
>


-- 

My blog on bioinformatics (now in English): http://bioinfoblog.it


From mjldehoon at yahoo.com  Fri Dec 26 14:32:02 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 26 Dec 2008 06:32:02 -0800 (PST)
Subject: [Biopython-dev] Rethinking Biopython's testing framework
In-Reply-To: <5aa3b3570812251122s43352380ke843c167e85569b5@mail.gmail.com>
Message-ID: <726361.18977.qm@web62402.mail.re1.yahoo.com>

--- On Thu, 12/25/08, Giovanni Marco Dall'Olio <dalloliogm at gmail.com> wrote:
> Some people suggested me the nose python framework:
> - http://somethingaboutorange.com/mrl/projects/nose/
> 
> It is used by many other open source projects, like
> sqlalchemy and elixir.
> I haven't tried it but I think it does more or less
> everything you
> said automatically, we could try to adopt it.

If we use nose, does that mean adding another dependency to Biopython? If so, I don't think it's worth it. If not, how does this work?

--Michiel.


From dalloliogm at gmail.com  Fri Dec 26 17:52:58 2008
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Fri, 26 Dec 2008 18:52:58 +0100
Subject: [Biopython-dev] Rethinking Biopython's testing framework
In-Reply-To: <726361.18977.qm@web62402.mail.re1.yahoo.com>
References: <5aa3b3570812251122s43352380ke843c167e85569b5@mail.gmail.com>
	<726361.18977.qm@web62402.mail.re1.yahoo.com>
Message-ID: <5aa3b3570812260952s5cc5fcc9k71f3e8c3a988e63c@mail.gmail.com>

On Fri, Dec 26, 2008 at 3:32 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> --- On Thu, 12/25/08, Giovanni Marco Dall'Olio <dalloliogm at gmail.com> wrote:
>> Some people suggested me the nose python framework:
>> - http://somethingaboutorange.com/mrl/projects/nose/
>>
>> It is used by many other open source projects, like
>> sqlalchemy and elixir.
>> I haven't tried it but I think it does more or less
>> everything you
>> said automatically, we could try to adopt it.
>
> If we use nose, does that mean adding another dependency to Biopython? If so, I don't think it's worth it. If not, how does this work?

nose is a testing framework, so it is a dependency only for developers.
I have been able to install sqlalchemy and elixir (projects that make
use of nose) without having to install this framework first.

The docs on nose's website explain its usage better than I can.
Basically, you have to install nose (easy_install nose) and then run
it as a shell command (nosetests).
It automatically reads all the files in the current directory and
subdirectories, collects all the methods/classes/etc whose names begin
or end with 'test_' (_test), plus any unittest, and executes them. It
can also read doctests, and it is possible to write plugins and apply a
high degree of customization.
I tried to run it over the latest biopython cvs, and it already
highlighted some problems (a few modules still using Martel, etc).

I forgot to say that this project is also hosted on google/code:
- http://code.google.com/p/python-nose/
You can find more information in the docs:
- http://code.google.com/p/python-nose/wiki/FindingAndRunningTests


p.p.s. Even if it were a dependency, I think it would be worth using
anyway, rather than rewriting existing code.



From mjldehoon at yahoo.com  Fri Dec 26 21:40:57 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 26 Dec 2008 13:40:57 -0800 (PST)
Subject: [Biopython-dev] Rethinking Biopython's testing framework
In-Reply-To: <5aa3b3570812260952s5cc5fcc9k71f3e8c3a988e63c@mail.gmail.com>
Message-ID: <590227.1906.qm@web62402.mail.re1.yahoo.com>

--- On Fri, 12/26/08, Giovanni Marco Dall'Olio <dalloliogm at gmail.com> wrote:
> > If we use nose, does that mean adding another
> dependency to Biopython? If so, I don't think it's
> worth it. If not, how does this work?
> 
> nose is a testing framework, so it is a dependency only for
> developers.

If we use nose, can our users still run the Biopython tests (without having to install nose first)?

--Michiel.


From dalloliogm at gmail.com  Sat Dec 27 08:48:09 2008
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Sat, 27 Dec 2008 09:48:09 +0100
Subject: [Biopython-dev] Rethinking Biopython's testing framework
In-Reply-To: <590227.1906.qm@web62402.mail.re1.yahoo.com>
References: <5aa3b3570812260952s5cc5fcc9k71f3e8c3a988e63c@mail.gmail.com>
	<590227.1906.qm@web62402.mail.re1.yahoo.com>
Message-ID: <5aa3b3570812270048x20c10c52h25c8e30a29697a45@mail.gmail.com>

On Fri, Dec 26, 2008 at 10:40 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> --- On Fri, 12/26/08, Giovanni Marco Dall'Olio <dalloliogm at gmail.com> wrote:
>> > If we use nose, does that mean adding another
>> dependency to Biopython? If so, I don't think it's
>> worth it. If not, how does this work?
>>
>> nose is a testing framework, so it is a dependency only for
>> developers.
>
> If we use nose, can our users still run the Biopython tests (without having to install nose first)?

Yes, but they will have to do it manually, or with a wrapper script
(as it is now).

Basically, we will have to move every test into functions/classes with
names beginning with 'test_'. To be more precise, they should match
the regular expression '(?:^|[b_.-])[Tt]est' (it is also possible to
customize this regex).

So, if a test currently looks like this:

if __name__ == '__main__':
    seq = Seq('sadasda')
    assert seq.tostring() == 'sadasda'

we will have to refactor it like this:

def _test():
    """test description"""
    seq = Seq('sadasda')
    assert seq.tostring() == 'sadasda'

if __name__ == '__main__':
    _test()   # this is optional
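
To see which names the quoted pattern picks up, here is a small check
using Python's re module. The pattern is copied from this message as
written (nose's real default may differ, and nose may apply further
rules on top of the regex), and the candidate names are made up for
illustration.

```python
import re

# Pattern copied verbatim from the message; treat it as an approximation
# of whatever nose really uses by default.
TEST_NAME = re.compile(r"(?:^|[b_.-])[Tt]est")

# Hypothetical function, class and script names.
candidates = [
    "test_seq",     # starts with "test"
    "TestSeq",      # starts with "Test"
    "seq_test",     # "_test" inside the name
    "run_tests",    # "_test" inside the name
    "parse_fasta",  # no "test" at all
    "latest",       # contains "test", but not after ^ or one of b _ . -
]

# re.search is used so that a match anywhere in the name counts.
matches = {name: bool(TEST_NAME.search(name)) for name in candidates}

for name in candidates:
    print("%-12s %s" % (name, "collected" if matches[name] else "ignored"))
```

Only the first four names are matched by the pattern; "parse_fasta" and
"latest" are not.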




From mjldehoon at yahoo.com  Sun Dec 28 16:04:14 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sun, 28 Dec 2008 08:04:14 -0800 (PST)
Subject: [Biopython-dev] Rethinking Biopython's testing framework
In-Reply-To: <5aa3b3570812270048x20c10c52h25c8e30a29697a45@mail.gmail.com>
Message-ID: <877679.6134.qm@web62406.mail.re1.yahoo.com>

--- On Sat, 12/27/08, Giovanni Marco Dall'Olio <dalloliogm at gmail.com> wrote:
> >> > If we use nose, does that mean adding another
> >> > dependency to Biopython? If so, I don't think 
> >> > it's worth it. If not, how does this work?
> >>
> >> nose is a testing framework, so it is a dependency
> >> only for developers.
> >
> > If we use nose, can our users still run the Biopython
> tests (without having to install nose first)?
> 
> Yes, but they will have to do it manually, or with a
> wrapper script (as it is now).

By manually, do you mean running each test separately by hand? If we use a wrapper script, then what is the difference between using nose and using Python's unittest framework?

--Michiel.


From biopython at maubp.freeserve.co.uk  Sun Dec 28 16:51:58 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sun, 28 Dec 2008 16:51:58 +0000
Subject: [Biopython-dev] Rethinking Biopython's testing framework
In-Reply-To: <451304.38587.qm@web62407.mail.re1.yahoo.com>
References: <442447.52362.qm@web62407.mail.re1.yahoo.com>
	<451304.38587.qm@web62407.mail.re1.yahoo.com>
Message-ID: <320fb6e00812280851y32450bb9le505ae257726f497@mail.gmail.com>

On Wed, Dec 24, 2008 at 10:52 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
>
> Hi everybody,
>
> How about the following for Biopython tests:
>
> For Python's unittest-style test modules, Python's unittest documentation
> recommends to define a function in each test module that returns the
> test suite. Most Biopython tests that use the unittest framework already
> do this (the function is called "testing_suite".
>
> We could now do the following in run_tests.py:
>
> 1) import the testing module and save its output
> 2) try to call module.testing_suite
> 3) if it exists, then we're using Python's unittest framework.
> So we run the tests in the testing suite.
> 4) if it does not exist, then we're using the print-and-compare
> approach. So we compare the saved output from the test to the correct output.
>
> I think that this can be set up such that it looks like nothing has
> changed for the user, while the files containing the correct
> output are no longer needed for the unittest-based tests.
>
> Questions, comments, objections, anybody?

Sounds good to me - and doesn't add any new dependencies either.

Peter


From dalloliogm at gmail.com  Sun Dec 28 21:11:59 2008
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Sun, 28 Dec 2008 22:11:59 +0100
Subject: [Biopython-dev] Rethinking Biopython's testing framework
In-Reply-To: <877679.6134.qm@web62406.mail.re1.yahoo.com>
References: <5aa3b3570812270048x20c10c52h25c8e30a29697a45@mail.gmail.com>
	<877679.6134.qm@web62406.mail.re1.yahoo.com>
Message-ID: <5aa3b3570812281311t466e61bp99af198e918737d8@mail.gmail.com>

On Sun, Dec 28, 2008 at 5:04 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> --- On Sat, 12/27/08, Giovanni Marco Dall'Olio <dalloliogm at gmail.com> wrote:
>> >> > If we use nose, does that mean adding another
>> >> > dependency to Biopython? If so, I don't think
>> >> > it's worth it. If not, how does this work?
>> >>
>> >> nose is a testing framework, so it is a dependency
>> >> only for developers.
>> >
>> > If we use nose, can our users still run the Biopython
>> tests (without having to install nose first)?
>>
>> Yes, but they will have to do it manually, or with a
>> wrapper script (as it is now).


> If we use a wrapper script, then what is the difference between using nose and using Python's unittest framework?

The wrapper script won't be as efficient as using nose.
Writing a separate wrapper script will take a lot of time, and it will
be very difficult to keep up to date; moreover, you will have to test
the wrapper script itself, to prove that it works and doesn't alter the
results of the tests.

Nose is not a replacement for unittest: it is a tool that searches
for every unittest and every script that looks like a test, and
executes it. It has a few more advantages, for example it enables
global setUp and tearDown methods, but it is not necessary to use them.


If you want to reorganize biopython's testing infrastructure, then
you should think about adopting a serious testing environment, whether
it is nose or something else. You can't keep relying on wrapper
scripts; they are too difficult to maintain and they are not really
scientifically valid.

The pygr project (another bioinformatics library in python) makes use
of nose, and they explain how in their documentation:
- http://bioinformatics.ucla.edu/pygr_0_7_b3/testing-doc.html

Please have a look at the pages I have posted before.


> By manually, do you mean running each test separately by hand?

I mean they will have to be run in the same way as they are now.

Maybe there is a way to use nose itself to create a wrapper script
automatically.
In fact, what nose does is find all the functions that look like
tests, and then execute them. It should be possible to just save the
statements that are executed in a log file, which can be used as a
wrapper script.
If this option doesn't exist yet, we can just propose it to nose's developers.

In brief, I think it doesn't make sense to write a new testing
framework just for biopython, when there are many existing tools
available and free to use.




From biopython at maubp.freeserve.co.uk  Mon Dec 29 00:18:22 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 29 Dec 2008 00:18:22 +0000
Subject: [Biopython-dev] Rethinking Biopython's testing framework
In-Reply-To: <5aa3b3570812281311t466e61bp99af198e918737d8@mail.gmail.com>
References: <5aa3b3570812270048x20c10c52h25c8e30a29697a45@mail.gmail.com>
	<877679.6134.qm@web62406.mail.re1.yahoo.com>
	<5aa3b3570812281311t466e61bp99af198e918737d8@mail.gmail.com>
Message-ID: <320fb6e00812281618r7ae4899g5aa1f1634bd1b217@mail.gmail.com>

Giovanni wrote:
>> nose is a testing framework, so it is a dependency
>> only for developers.

Requiring another external dependency does count against using nose -
it is much nicer if anyone installing Biopython from source can run
our test suite without having to install anything further.

Giovanni wrote:
> If you want to reorganize the biopython's testing infrastructure, then
> you should think about adopting a serious testing environment, whether
> it is nose or something else. You can't continue on relying on wrapper
> scripts, they are too difficult to mantain and they are not really
> scientifically valid.

I'm not sure I understand your point here (especially re difficult to
maintain and not scientifically valid).

I'm fairly happy with the current test framework - I would rather see
any effort be spent on writing more tests under the current framework
than switching the framework itself.

Giovanni wrote:
> In brief, I think it doesn't make sense to write a new testingg
> framework just for biopython, when there are many already existing
> tool available and free to use.

We haven't been talking about writing a new test framework (which I
agree isn't a good idea).  Rather we're talking about a modification
to the existing Biopython test framework (part of which uses the
built-in python unittest library).  Michiel's proposal on 24th Dec seems
like it will simplify working with unittest based tests (especially
not having to track their trivial output in CVS/SVN).

Peter


From dalloliogm at gmail.com  Mon Dec 29 09:53:51 2008
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Mon, 29 Dec 2008 10:53:51 +0100
Subject: [Biopython-dev] Rethinking Biopython's testing framework
In-Reply-To: <320fb6e00812281618r7ae4899g5aa1f1634bd1b217@mail.gmail.com>
References: <5aa3b3570812270048x20c10c52h25c8e30a29697a45@mail.gmail.com>
	<877679.6134.qm@web62406.mail.re1.yahoo.com>
	<5aa3b3570812281311t466e61bp99af198e918737d8@mail.gmail.com>
	<320fb6e00812281618r7ae4899g5aa1f1634bd1b217@mail.gmail.com>
Message-ID: <5aa3b3570812290153k43e24a63nc0f27c90891adf7d@mail.gmail.com>

On Mon, Dec 29, 2008 at 1:18 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> Giovanni wrote:
>>> nose is a testing framework, so it is a dependency
>>> only for developers.
>
> Requiring another external dependency does count against using nose -
> it is much nicer if anyone installing Biopython from source can run
> our test suite without having to install anything further.

As I was saying before, it will not be a dependency. It's an external
tool that you can choose to use, or not, to execute the tests
automatically.
Also, it is not a replacement for unittest. It is comparable to using
epydoc for the documentation.

> Giovanni wrote:
>> If you want to reorganize the biopython's testing infrastructure, then
>> you should think about adopting a serious testing environment, whether
>> it is nose or something else. You can't continue on relying on wrapper
>> scripts, they are too difficult to mantain and they are not really
>> scientifically valid.
>
> I'm not sure I understand your point here (especially re difficult to
> maintain and not scientifically valid).
>

The wrapper script itself is a program. Therefore, if you want to be
paranoid, you will have to test it too :)
It will be difficult to maintain because you will have to modify it
every time to adapt to the new tests etc.
Many big open source python projects make use of this framework, and it
has already been proven to work correctly; so the quality of biopython
would be comparable with those existing projects.
Another project that makes use of nose is pytables (hdf5 format
wrapper for python). They say they have some billions of tests :).

> I'm failry happy with the current test framework - I would rather see
> any effort be spent on writing more tests under the current framework
> than switching the framework itself.
>
> Giovanni wrote:
>> In brief, I think it doesn't make sense to write a new testingg
>> framework just for biopython, when there are many already existing
>> tool available and free to use.
>
> We haven't been talking about writing a new test frame work (which I
> agree isn't a good idea).  Rather we're talking about a modification
> to the existing Biopython test framework (part of which uses the built
> in python unittest library).  Michiel's proposal on 24th Dec seems
> like it will simplify working with unittest based tests (especially
> not having to track their trivial output in CVS/SVN).

Then you will have to develop a way to execute only some of the tests
(e.g. only those that don't use an internet connection, or only
those that use a database).
You will need to write some way of running setUp and tearDown
methods globally.
You will have to verify that your wrapper script works.
In short, you will end up writing a tool which will be really
similar to nose. So, since this tool already exists, you will save
a lot of time by using it.
Michiel's proposal is good, but I am saying that there are already
tools that do the same thing automatically.



From biopython at maubp.freeserve.co.uk  Mon Dec 29 18:21:33 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 29 Dec 2008 18:21:33 +0000
Subject: [Biopython-dev] Rethinking Biopython's testing framework
In-Reply-To: <5aa3b3570812290153k43e24a63nc0f27c90891adf7d@mail.gmail.com>
References: <5aa3b3570812270048x20c10c52h25c8e30a29697a45@mail.gmail.com>
	<877679.6134.qm@web62406.mail.re1.yahoo.com>
	<5aa3b3570812281311t466e61bp99af198e918737d8@mail.gmail.com>
	<320fb6e00812281618r7ae4899g5aa1f1634bd1b217@mail.gmail.com>
	<5aa3b3570812290153k43e24a63nc0f27c90891adf7d@mail.gmail.com>
Message-ID: <320fb6e00812291021n297af797scaf7fd6ba1a7b048@mail.gmail.com>

>> We haven't been talking about writing a new test frame work (which I
>> agree isn't a good idea).  Rather we're talking about a modification
>> to the existing Biopython test framework (part of which uses the built
>> in python unittest library).  Michiel's proposal on 24th Dec seems
>> like it will simplify working with unittest based tests (especially
>> not having to track their trivial output in CVS/SVN).
>
> Then you will have to develop a way to execute only some of the tests
> (e.g. only those who doesn't make use of internet connection, or only
> those who make use of a database). ...

We already have that in place and working for our current framework.

> ... Michel's proposal is good, but I am saying that there are already
> tools that do the same thing automatically.

Well, let's go with Michiel's plan in the short term (a modification
to the current Biopython test framework, see his email of 24th
December).  We will then have a clear divide into two styles of unit
test:

(1) Those where the output is captured and compared to the expected
output (which will also be in CVS).  These are easy to write as
essentially any example Biopython script can be used.

(2) Those using the python unittest framework.  I think these are more
complicated and require a bit more effort and thought to write (and
debug), but make it very clear what exactly is being tested.
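
To make the two styles concrete, here is a minimal sketch of the same
check written both ways. The function reverse_complement, the expected
output string and the test class name are all made up for illustration
(they are not Biopython code), and the sketch uses a current Python 3.

```python
import io
import unittest
from contextlib import redirect_stdout


def reverse_complement(seq):
    """Hypothetical function under test (not Biopython's implementation)."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq))


# Style (1): run an example script, capture what it prints, and compare
# that with the expected output kept alongside the test.
def example_script():
    print(reverse_complement("AACG"))


EXPECTED_OUTPUT = "CGTT\n"

captured = io.StringIO()
with redirect_stdout(captured):
    example_script()
style1_passed = captured.getvalue() == EXPECTED_OUTPUT


# Style (2): a unittest test case that states exactly what is checked.
class ReverseComplementTest(unittest.TestCase):
    def test_short_sequence(self):
        self.assertEqual(reverse_complement("AACG"), "CGTT")


suite = unittest.TestLoader().loadTestsFromTestCase(ReverseComplementTest)
runner = unittest.TextTestRunner(stream=io.StringIO())
style2_passed = runner.run(suite).wasSuccessful()
```

Style (1) needs the expected output to be stored somewhere, while
style (2) carries its expectation inside the assertion itself.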

Peter


From mjldehoon at yahoo.com  Tue Dec 30 10:06:08 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Tue, 30 Dec 2008 02:06:08 -0800 (PST)
Subject: [Biopython-dev] Rethinking Biopython's testing framework
In-Reply-To: <5aa3b3570812270048x20c10c52h25c8e30a29697a45@mail.gmail.com>
Message-ID: <620107.65178.qm@web62401.mail.re1.yahoo.com>


--- On Sat, 12/27/08, Giovanni Marco Dall'Olio <dalloliogm at gmail.com> wrote:
> Basically, we will have to move every test in
> functions/classes with
> names beginning with 'test_'. To be more precise,
> they should match
> the regular expression '(?:^|[b_.-])[Tt]est' (it is
> also possible to
> coustomize this regex).
> 
> So, if a test now is it like this:
> 
> if __name__ == '__main__':
>     seq = Seq('sadasda')
>     assert seq.tostring() == 'sadasda'
> 
> we will have to refactor it like this:
> 
> def _test():
>     """test description"""
>     seq = Seq('sadasda')
>     assert seq.tostring() == 'sadasda'
> 
> if __name__ == '__main__':
>     _test()   # this is optional

Probably I don't quite understand how nose works, but if we refactor the code in this way, is that sufficient to enable users to use nose if they want to? If so, it may be possible to write the test scripts in a nose-compliant way as a courtesy to nose users. The only problem I can see with this is that it will be difficult to maintain. Basically every new test will have to be written in this nose-compliant way, and users are likely to be unaware of this.

--Michiel


From dalloliogm at gmail.com  Tue Dec 30 13:53:34 2008
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Tue, 30 Dec 2008 14:53:34 +0100
Subject: [Biopython-dev] Rethinking Biopython's testing framework
In-Reply-To: <620107.65178.qm@web62401.mail.re1.yahoo.com>
References: <5aa3b3570812270048x20c10c52h25c8e30a29697a45@mail.gmail.com>
	<620107.65178.qm@web62401.mail.re1.yahoo.com>
Message-ID: <5aa3b3570812300553v74c48cd1x66c1b7280a3f3319@mail.gmail.com>

On Tue, Dec 30, 2008 at 11:06 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
>
>
>
> --- On Sat, 12/27/08, Giovanni Marco Dall'Olio <dalloliogm at gmail.com> wrote:
>> Basically, we will have to move every test in
>> functions/classes with
>> names beginning with 'test_'. To be more precise,
>> they should match
>> the regular expression '(?:^|[b_.-])[Tt]est' (it is
>> also possible to
>> coustomize this regex).
>>
>> So, if a test now is it like this:
>>
>> if __name__ == '__main__':
>>     seq = Seq('sadasda')
>>     assert seq.tostring() == 'sadasda'
>>
>> we will have to refactor it like this:
>>
>> def _test():
>>     """test description"""
>>     seq = Seq('sadasda')
>>     assert seq.tostring() == 'sadasda'
>>
>> if __name__ == '__main__':
>>     _test()   # this is optional
>
> Probably I don't quite understand how nose works, but if we refactor the code in this way, is that sufficient to enable users to use nose if they want to? If so, it may be possible to write the test scripts in a nose-compliant way as a courtesy to nose users. The only problem I can see with this is that it will be difficult to maintain. Basically every new test will have to be written in this nose-compliant way, and users are likely to be unaware of this.


Why do you find it difficult?
You just have to rename every test to make sure that its name starts
or ends with 'test_'. That's all.
If you want to reorganize biopython's testing framework, this is a
good thing to do anyway.

In particular, every test function/class/script name should match the
regular expression '(?:^|[b_.-])[Tt]est' (it can be customized).
Unittest modules and doctests will be recognized, too.
Note that nose already works if you run it over biopython's cvs; but
since I am not familiar with biopython's code, I am not sure it
recognizes every test.

Ehm, the example that I gave won't work with the default settings :/
nose expects 'test_module' or something like this (anyway, the regex
can be customized).




From biopython at maubp.freeserve.co.uk  Tue Dec 30 17:29:06 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 30 Dec 2008 17:29:06 +0000
Subject: [Biopython-dev] Rethinking Biopython's testing framework
In-Reply-To: <5aa3b3570812300553v74c48cd1x66c1b7280a3f3319@mail.gmail.com>
References: <5aa3b3570812270048x20c10c52h25c8e30a29697a45@mail.gmail.com>
	<620107.65178.qm@web62401.mail.re1.yahoo.com>
	<5aa3b3570812300553v74c48cd1x66c1b7280a3f3319@mail.gmail.com>
Message-ID: <320fb6e00812300929j7fa767c7xce138912ae07d480@mail.gmail.com>

> You just have to rename every test to make sure that its name starts
> or end with 'test_'. That's all.
> If you want to reorganize biopython's testing framework, this is a
> good thing to do anyway.

All the individual Biopython test scripts are named test_*.py anyway,
so that should be fine.  Those test scripts where we have to verify the
output probably won't work in nose (this is handled via our
run_tests.py framework), but the rest of our test scripts being
unittest based might already be fine with nose.

Peter


From dalloliogm at gmail.com  Tue Dec 30 18:34:15 2008
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Tue, 30 Dec 2008 19:34:15 +0100
Subject: [Biopython-dev] Rethinking Biopython's testing framework
In-Reply-To: <320fb6e00812300929j7fa767c7xce138912ae07d480@mail.gmail.com>
References: <5aa3b3570812270048x20c10c52h25c8e30a29697a45@mail.gmail.com>
	<620107.65178.qm@web62401.mail.re1.yahoo.com>
	<5aa3b3570812300553v74c48cd1x66c1b7280a3f3319@mail.gmail.com>
	<320fb6e00812300929j7fa767c7xce138912ae07d480@mail.gmail.com>
Message-ID: <5aa3b3570812301034i5c007d92k17a8e55c61b5715@mail.gmail.com>

On Tue, Dec 30, 2008 at 6:29 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>> You just have to rename every test to make sure that its name starts
>> or end with 'test_'. That's all.
>> If you want to reorganize biopython's testing framework, this is a
>> good thing to do anyway.
>
> All the individual Biopython test scripts are named test_*.py anyway,
> so that should be fine.  Those test scripts were we have to verify the
> output probably won't work in nose (this is handled via our
> run_test.py framework), but the rest of our test scripts being
> unittest based might already be fine with nose.

I think it also executes the run_tests.py script, because its name
matches that regular expression.

> Peter
>


-- 

My blog on bioinformatics (now in English): http://bioinfoblog.it


From dalloliogm at gmail.com  Tue Dec 30 18:34:45 2008
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Tue, 30 Dec 2008 19:34:45 +0100
Subject: [Biopython-dev] Rethinking Biopython's testing framework
In-Reply-To: <320fb6e00811280309w7b5f0fc6m38795c4dc61c8744@mail.gmail.com>
References: <20081125144041.GC83220@sobchak.mgh.harvard.edu>
	<45956.75241.qm@web62406.mail.re1.yahoo.com>
	<320fb6e00811280309w7b5f0fc6m38795c4dc61c8744@mail.gmail.com>
Message-ID: <5aa3b3570812301034r3633ebe0k937e33c731e69ccd@mail.gmail.com>

On Fri, Nov 28, 2008 at 12:09 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:

> Brad wrote:
>> Agreed with the distinction between the unit tests and the "dump
>> lots of text and compare" approach. I've written both and do think
>> the unit testing/assertion model is more robust since you can go
>> back and actually get some insight into what someone was thinking
>> when they wrote an assertion.
>
> I have probably written more of the "dump lots of text and compare"
> style tests.  I think these have a number of advantages:
> (1) Easier for beginners to write a test, you can almost take any
> example script and use that.  You don't have to learn the unit test
> framework.

I agree with what you say, but I think that all the 'dump and compare'
tests should be organized in various functions.
This will make them easier to use and understand, and they will be
compatible with the nose framework.

> (2) Debugging a failing test in IDLE is much easier - using unit tests
> you have all that framework between you and the local scope where the
> error happens.

> (3) For many broad tests, manually setting up the expected output for
> an assert is extremely tedious (e.g. parsing sequences and checking
> their checksums).

This is an interesting discussion if you want to talk about it a bit.

An advantage of unittest is the pair of setUp and tearDown methods (fixtures).
With those, you are sure that all the tests are run with the right
environment and that all variables are dropped before executing a new
test.
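
A minimal sketch of the fixture idea, assuming a hypothetical test case
that needs a scratch file (ScratchFileTest and run_suite are
illustrative names, not part of the Biopython test suite):

```python
import os
import tempfile
import unittest


class ScratchFileTest(unittest.TestCase):
    """Hypothetical example: every test method gets its own scratch file."""

    def setUp(self):
        # Runs before each test method: create a fresh, empty temporary file.
        handle, self.path = tempfile.mkstemp(suffix=".txt")
        os.close(handle)

    def tearDown(self):
        # Runs after each test method, even when the test itself failed.
        if os.path.exists(self.path):
            os.remove(self.path)

    def test_starts_empty(self):
        self.assertEqual(os.path.getsize(self.path), 0)

    def test_write_and_read(self):
        with open(self.path, "w") as out:
            out.write(">seq1\nACGT\n")
        with open(self.path) as inp:
            self.assertEqual(inp.readline(), ">seq1\n")


def run_suite():
    """Run the tests above and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ScratchFileTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Because setUp creates a new file for each method, test_starts_empty
passes regardless of whether test_write_and_read ran first.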

Also, if you want to do a lot of dump and compare tests, consider
writing some big doctest scripts.
It will require a bit more work to write them, but they will be
easier to understand, and they will also become good tutorials for the
users.
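
As a rough illustration of what such a doctest could look like, here is
a sketch using a made-up helper (gc_fraction is hypothetical and is not
a Biopython function).  The docstring is both the usage example and the
test:

```python
import doctest


def gc_fraction(sequence):
    """Return the fraction of G and C letters in a sequence (hypothetical helper).

    >>> gc_fraction("GGCC")
    1.0
    >>> gc_fraction("ATGC")
    0.5
    >>> gc_fraction("")
    0.0
    """
    if not sequence:
        return 0.0
    upper = sequence.upper()
    return (upper.count("G") + upper.count("C")) / len(upper)


def run_doctests():
    """Run the docstring examples above; return (failed, attempted)."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(
        gc_fraction.__doc__, {"gc_fraction": gc_fraction}, "gc_fraction", None, 0
    )
    runner = doctest.DocTestRunner(verbose=False)
    results = runner.run(test)
    return results.failed, results.attempted
```

In a real module one would more likely call doctest.testmod() from the
bottom of the file; the explicit runner is used here only so the result
can be inspected as a value.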

This is a tutorial we wrote for a small project not related to biopython:
- http://github.com/cswegger/datamatrix/tree/master/tutorial.txt
As you can see, the text is both a tutorial and a test set (which makes
use of a dump and compare approach) for the program.

> We could discuss a modification to run_tests.py so that if there is no
> expected output file output/test_XXX for test_XXX.py we just run
> test_XXX.py and check its return value (I think Michiel had previously
> suggested something like this).

I think this should be done inside the test itself.
All the tests should return only a boolean value (passed or not) and a
description of the error.
The tests that make use of an expected output file should open
it and do the comparison by themselves, not in run_tests.py.
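
A sketch of how a test could do that comparison itself, under the
assumption that the test body is a function that prints its output
(example_script and check_against_expected are hypothetical names):

```python
import difflib
import io
from contextlib import redirect_stdout


def example_script():
    """Stand-in for the body of a 'dump and compare' test (hypothetical)."""
    print("Parsing 2 records")
    print("seq1 length 4")
    print("seq2 length 6")


def check_against_expected(func, expected_path):
    """Run func, capture what it prints, and compare with the expected file.

    Returns (passed, description); description is empty on success and
    holds a unified diff on failure.
    """
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        func()
    actual = buffer.getvalue().splitlines()
    with open(expected_path) as handle:
        expected = handle.read().splitlines()
    if actual == expected:
        return True, ""
    diff = difflib.unified_diff(expected, actual, "expected", "actual", lineterm="")
    return False, "\n".join(diff)
```

This gives every test the same return shape (a boolean plus an error
description), at the cost of each test carrying its own capture and
compare step.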

> Perhaps for more robustness, capture
> the output and compare it to a predefined list of regular expressions
> covering the typical outputs.  For example, looking at
> output/test_Cluster, the first line is the test name, but rest follows
> the pattern "test_... ok". I imagine only a few output styles exist.

mmm have you changed this file in the cvs recently? I can't find what
you are referring to.

> With such a change, half the unit tests (e.g. test_Cluster.py)
> wouldn't need their output file in CVS (output/test_Cluster).
>
> Michiel de Hoon wrote:
>> If one of the sub-tests fails, Python's unit testing framework will tell us so,
>> though (perhaps) not exactly which sub-test fails. However, that is easy to
>> figure out just by running the individual test script by itself.
>
> That won't always work.  Consider intermittent network problems, or
> tests using random data - in general it really is worthwhile having
> run_tests.py report a little more than just which test_XXX.py module
> failed.
>
> Peter


-- 

My blog on bioinformatics (now in English): http://bioinfoblog.it


From biopython at maubp.freeserve.co.uk  Tue Dec 30 23:33:16 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 30 Dec 2008 23:33:16 +0000
Subject: [Biopython-dev] Rethinking Biopython's testing framework
In-Reply-To: <5aa3b3570812301034r3633ebe0k937e33c731e69ccd@mail.gmail.com>
References: <20081125144041.GC83220@sobchak.mgh.harvard.edu>
	<45956.75241.qm@web62406.mail.re1.yahoo.com>
	<320fb6e00811280309w7b5f0fc6m38795c4dc61c8744@mail.gmail.com>
	<5aa3b3570812301034r3633ebe0k937e33c731e69ccd@mail.gmail.com>
Message-ID: <320fb6e00812301533h55f5e9eehcec69cc1d5913420@mail.gmail.com>

Brad wrote:
>>> Agreed with the distinction between the unit tests and the "dump
>>> lots of text and compare" approach. I've written both and do think
>>> the unit testing/assertion model is more robust since you can go
>>> back and actually get some insight into what someone was thinking
>>> when they wrote an assertion.

Peter wrote:
>> I have probably written more of the "dump lots of text and compare"
>> style tests.  I think these have a number of advantages:
>> (1) Easier for beginners to write a test, you can almost take any
>> example script and use that.  You don't have to learn the unit test
>> framework.
>> ...

Giovanni wrote:
> I agree with what you say, but I think that all the 'dump and compare'
> tests should be organized in various functions.
> This will make them easier to use and understand, and they will be
> compatible with the nose framework.

If we organise the "dump and compare" tests into various functions
(e.g. using the unittest framework), and turn print statements into
asserts etc, then yes they would become nose compatible.  However,
this is a lot of work, and for relatively little gain.  Also, doing so
we lose the simplicity (e.g. my points made earlier) and make it
harder for newcomers to write further tests.

Nevertheless, we could regard Michiel's plan of 24 Dec as a step
towards this, in that it simplifies writing unittest based tests (in
that they won't need an expected output file which must also be kept
in CVS/SVN).

I'm not sure what you meant by "This will make them easier to use and
understand, ...".  Switching the unit test coding style makes no
difference from the end user's point of view: they run the test suite
using "python setup.py test" (typically as part of installation from
source), or from the tests directory using "python run_tests.py", and
won't see any difference in how the tests work internally.

In terms of understanding the unit tests: If you are a beginner
wanting to look at a unit test to give a feel for how to use the code,
then frankly those of our unit tests which simply do some imports and
print some output are MUCH easier to understand.  By their nature they
are essentially example Biopython scripts.  On the other hand, those
of our unit tests using the unittest framework have all these
object classes defined, and split up the setup/clean up into separate
methods etc.  In some senses this is "clutter" which is not helpful if
you want to regard the unit test also as a usage example.

>> (2) Debugging a failing test in IDLE is much easier - using unit tests
>> you have all that framework between you and the local scope where the
>> error happens.
>
>> (3) For many broad tests, manually setting up the expected output for
>> an assert is extremely tedious (e.g. parsing sequences and checking
>> their checksums).
>
> This is an interesting discussion if you want to talk about it a bit.

It could be, but I don't want to get side tracked (distracted) from
pressing ahead with Michiel's plan (the email of 24th Dec, or
something similar) which seems to be a worthwhile small improvement to
the current status.

> An advantage of unittest is the pair of setUp and tearDown methods (fixtures).
> With those, you are sure that all the tests are run with the right
> environment and that all variables are dropped before executing a new
> test.

For some tests, yes, this is useful - in particular where there are
lots of independent small things you want to test.  In other
situations you want to test a work flow, with a series of cumulative
steps, each building on the previous one.  This would end up as a single
large test function/method.

> Also, if you want to do a lot of dump and compare tests, consider
> writing some big doctest scripts.
> It will require a bit more work to write them, but they will be
> easier to understand, and they will also become good tutorials for the
> users.

Certainly some of the current simple "dump and compare" tests might be
converted into doctests (and we could do this within the current
Biopython framework).  However, the requirements for good
documentation and good test coverage differ - you'd want to include
tests for atypical code which you would not want to encourage as good
coding practice.  I'm quite keen for further usage of doctests - but I
see them primarily as an improvement to our documentation.

Peter wrote:
>> We could discuss a modification to run_tests.py so that if there is no
>> expected output file output/test_XXX for test_XXX.py we just run
>> test_XXX.py and check its return value (I think Michiel had previously
>> suggested something like this).

Note that Michiel's email of 24th Dec is another approach to this
topic - either would work, but his plan makes the division between the
two test types much more explicit.
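
For concreteness, the fallback described in the quoted paragraph could
be sketched roughly as follows.  This is not the code in run_tests.py;
run_one_test and both of its path arguments are assumptions made for
illustration:

```python
import os
import subprocess
import sys


def run_one_test(script_path, expected_path):
    """Return True if the test script passes (sketch, not run_tests.py itself)."""
    result = subprocess.run(
        [sys.executable, script_path],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    if os.path.isfile(expected_path):
        # "Dump and compare" style: the script must exit cleanly and its
        # printed output must match the stored file exactly.
        with open(expected_path) as handle:
            return result.returncode == 0 and result.stdout == handle.read()
    # No expected output file: rely on the script's return value alone.
    return result.returncode == 0
```

With this shape, a unittest based script only needs to exit with a
non-zero status on failure, and no output file has to be kept in
CVS/SVN for it.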

Giovanni wrote:
> I think this should be done inside the test itself.
> All the tests should return only a boolean value (passed or not) and a
> description of the error.
> The tests that make use of an expected output file should open
> it and do the comparison by themselves, not in run_tests.py.

Your plan would work, but it means the simplicity of this style of
unit test is lost.  Rather than doing this change (which would be a
moderate amount of tedious work), I would rather go all the way and
make them unittest based like the rest of our test suite.

>> Perhaps for more robustness, capture
>> the output and compare it to a predefined list of regular expressions
>> covering the typical outputs.  For example, looking at
>> output/test_Cluster, the first line is the test name, but rest follows
>> the pattern "test_... ok". I imagine only a few output styles exist.
>> With such a change, half the unit tests (e.g. test_Cluster.py)
>> wouldn't need their output file in CVS (output/test_Cluster).
>
> mmm have you changed this file in the cvs recently? I can't find what
> you are referring to.

For this example, the unit test Tests/test_Cluster.py is here:
http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Tests/test_Cluster.py?cvsroot=biopython

Its expected output file Tests/output/test_Cluster is here:
http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Tests/output/test_Cluster?cvsroot=biopython
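
The regular-expression check suggested earlier in this message might
look something like the sketch below.  The exact line shape is an
assumption based only on the "test_... ok" description (a first line
with the test name, then one "test_... ok" line per sub-test);
OK_LINE and output_looks_ok are hypothetical names:

```python
import re

# Assumed shape of a passing line: starts with "test_" and ends with " ok".
OK_LINE = re.compile(r"^test_.* ok$")


def output_looks_ok(text, test_name):
    """Return True if captured output matches the assumed unittest style."""
    lines = text.splitlines()
    # The first line must be the test name, followed by at least one result.
    if len(lines) < 2 or lines[0] != test_name:
        return False
    # Every remaining line must report a passing sub-test.
    return all(OK_LINE.match(line) for line in lines[1:])
```

Any line reporting a failure or an error would not match, so the test
would be flagged without needing a stored copy of the expected output.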

Peter