From bugzilla-daemon at portal.open-bio.org  Thu Jan  1 20:37:43 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 1 Jan 2009 20:37:43 -0500
Subject: [Biopython-dev] [Bug 2544] Bio.GenBank and SeqFeature improvements
In-Reply-To: <bug-2544-42@http.bugzilla.open-bio.org/>
Message-ID: <200901020137.n021bhEB022751@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2544


------- Comment #4 from mmokrejs at ribosome.natur.cuni.cz  2009-01-01 20:37 EST -------
Can I instantiate GenBank file, reverse-complement the sequence (keep letter
casing) in the SeqIO object and dump it back to a GenBank file?


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Fri Jan  2 13:15:46 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 2 Jan 2009 13:15:46 -0500
Subject: [Biopython-dev] [Bug 2544] Bio.GenBank and SeqFeature improvements
In-Reply-To: <bug-2544-42@http.bugzilla.open-bio.org/>
Message-ID: <200901021815.n02IFkcf012662@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2544


------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-02 13:15 EST -------
(In reply to comment #4)
> Can I instantiate GenBank file, reverse-complement the sequence
> (keep letter casing) in the SeqIO object and dump it back to a
> GenBank file?

I think this question would have been better handled on the mailing lists,
rather than on this bug.  Note that currently our GenBank output via Bio.SeqIO
does not include the features and references - see Bug 2294.

I would do this based on the approach described in the tutorial, which assumes
there could be many records in the input file.  Here is a variation for just
one record (untested):

from Bio import SeqIO
from Bio.SeqRecord import SeqRecord

# SeqIO.read expects exactly one record in the file
with open("example.gbk") as in_handle:
    record = SeqIO.read(in_handle, "genbank")

# Build a new record instead of altering the original one
rc_record = SeqRecord(seq=record.seq.reverse_complement(),
                      id="rc_" + record.id,
                      name="rc_" + record.name,
                      description="reverse complement")

with open("rc_example.gbk", "w") as out_handle:
    SeqIO.write([rc_record], out_handle, "genbank")

Note you *could* override the record's sequence in situ:
record.seq = record.seq.reverse_complement() #BAD IDEA
This is a bad idea because none of the annotations will have been changed - in
addition to the name/id/description still being the same, all the feature
locations etc will still be for the forward sequence.
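
To make that warning concrete, here is a minimal sketch in plain Python (no
Biopython needed) of two things an in-place swap would miss: keeping the letter
case while complementing, and remapping a feature's coordinates onto the new
strand. The function names are hypothetical, and coordinates are assumed to be
Python-style half-open (start, end) pairs.

```python
# Complement table covering both cases, so letter casing is preserved.
# Only A, C, G, T and N are handled; other characters pass through unchanged.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement_keep_case(seq):
    """Return the reverse complement of a DNA string, keeping upper/lower case."""
    return seq.translate(_COMPLEMENT)[::-1]


def flip_location(start, end, length):
    """Map a half-open (start, end) span onto the reverse-complemented sequence."""
    return length - end, length - start


forward = "AAACCCgggttt"
reverse = reverse_complement_keep_case(forward)

# A feature covering the lower-case "ggg" on the forward strand ...
start, end = 6, 9
# ... sits at a different position once the sequence is reversed.
new_start, new_end = flip_location(start, end, len(forward))
print(reverse)
print(reverse[new_start:new_end])
```

A real record would also need each feature's strand flipped and its
name/id/description updated, which is why building a new record is safer.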

--

I'm leaving this bug open for defining __repr__ for the
Bio.SeqFeature.Reference object (and perhaps tweaking the display of the
references in the SeqRecord __str__ method) ONLY.

Please continue any other discussion on the mailing lists.  Thanks.

Peter.



From bugzilla-daemon at portal.open-bio.org  Sat Jan  3 17:18:56 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 3 Jan 2009 17:18:56 -0500
Subject: [Biopython-dev] [Bug 2723] New: Clarify what applies to which
	version of biopython and other doc cleanup
Message-ID: <bug-2723-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2723

           Summary: Clarify what applies to which version of biopython and
                    other doc cleanup
           Product: Biopython
           Version: 1.49
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Documentation
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: mmokrejs at ribosome.natur.cuni.cz


I went to look around at the docs because the built-in tests of 1.49 setup.py
spat out some messages about external programs missing. I haven't found any
hints on them in http://news.open-bio.org/news/2008/11/biopython-release-149/.

Anyway, looking at 
http://biopython.org/DIST/docs/install/Installation.html#htoc17
I see: "3.4  mxTextTools (no longer needed)". I would propose:

3.4  mxTextTools (no longer needed since 1.49)

Similarly:
- 3.1  Numerical Python (NumPy) (strongly recommended)
+ 3.1  Numerical Python (NumPy) (strongly recommended since 1.49)


Bad URL links are in the text:


3.3  Database Access (MySQLdb, ...) (optional)

[cut]

Additionally, both MySQLdb and psycopg (a PostgreSQL database adaptor) can be
used for accessing BioSQL databases through Biopython (see ). Again if you are 
-----------------------------------------------------------^
not going to use BioSQL, there shouldn't be any need to install these
modules.


3.4  mxTextTools (no longer needed)

[cut]

However, we currently recommend you install mxTextTools 2.0, as some of the API
changes made in 3.0 version were not compatible with Biopython. Goto to
---------------------------------------------------------------------^^
download this.


I haven't found an answer for me yet:

test_PopGen_FDist ... skipping. Install FDist if you want to use
Bio.PopGen.FDist.
ok
test_PopGen_FDist_nodepend ... ok
test_PopGen_GenePop ... ok
test_PopGen_SimCoal ... skipping. Install SIMCOAL2 if you want to use
Bio.PopGen.SimCoal.
ok
test_PopGen_SimCoal_nodepend ... ok
test_ProtParam ... ok
test_Registry ... ok
test_Restriction ... ok
test_SCOP_Astral ... ok
test_SCOP_Cla ... ok
test_SCOP_Des ... ok
test_SCOP_Dom ... ok
test_SCOP_Hie ... ok
test_SCOP_Raf ... ok
test_SCOP_Residues ... ok
test_SCOP_Scop ... ok
test_SProt ... ok
test_SVDSuperimposer ... ok
test_SeqIO ... ok
test_SeqIO_online ... ok
test_SeqUtils ... ok
test_SubsMat ... ok
test_UniGene ... ok
test_Wise ... skipping. Install Wise2 (dnal) if you want to use Bio.Wise.
ok
test_align ... ok
test_docstrings ... ok
test_geo ... ok
test_interpro ... ok
test_kNN ... ok
test_lowess ... ok
test_pairwise2 ... ok
test_prodoc ... ok
test_property_manager ... ok
test_prosite ... ok
test_prosite2 ... ok
test_psw ... skipping. Install Wise2 (dnal) if you want to use Bio.Wise.
ok
test_seq ... ok
test_translate ... ok
test_trie ... ok
test_triefind ... ok

----------------------------------------------------------------------
Ran 96 tests in 172.215s

OK


Pointers to those packages would have been helpful, both from the test suite
and from the installation manual. Moreover, what database username/password would
I have to make to get the BioSQL stuff compiled and tested?  ^H^H^H^H^H^H
I see, it gets compiled anyway; the tests just were not run. The installation
manual and the output from the test suite should be clearer.

Thanks, Peter!



From bugzilla-daemon at portal.open-bio.org  Sat Jan  3 17:30:55 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 3 Jan 2009 17:30:55 -0500
Subject: [Biopython-dev] [Bug 2724] New: Unclear? changes between 1.47 and
	1.49
Message-ID: <bug-2724-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2724

           Summary: Unclear? changes between 1.47 and 1.49
           Product: Biopython
           Version: 1.49
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: mmokrejs at ribosome.natur.cuni.cz


I used diff(1) to compare the files installed on my machine by the 1.47 release
with those installed by 1.49. I don't know what cdistance was about: the
mailing list archive search tool does not work, and searching for it manually
in the raw archives of Oct and Nov 2008 did not help.

The second file shown here contains a white space in a filename, not critical
but maybe good to rename in next release.

-/usr/lib/python2.5/site-packages/Bio/cdistance.so
+/usr/share/biopython/Tests/Clustalw/temp horses.dnd



From bugzilla-daemon at portal.open-bio.org  Sat Jan  3 20:10:02 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 3 Jan 2009 20:10:02 -0500
Subject: [Biopython-dev] [Bug 2724] Unclear? changes between 1.47 and 1.49
In-Reply-To: <bug-2724-42@http.bugzilla.open-bio.org/>
Message-ID: <200901040110.n041A2e5028585@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2724


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-03 20:10 EST -------
Bio.cdistance was an optional C implementation used within Bio.distance - the C
code was used if available to speed up calculations.  You can see the (now
deleted) code in CVS here:
http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Attic/cdistancemodule.c?hideattic=0&cvsroot=biopython

This C code (Bio.cdistance) was removed when the python code (Bio.distance) was
deprecated for release 1.49.

This was discussed at the start of October on the mailing list, see this
thread:
http://lists.open-bio.org/pipermail/biopython/2008-October/004532.html


This should have been mentioned in the DEPRECATED file, but wasn't.  I've
updated this in CVS, see revision 1.41:

http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/DEPRECATED?cvsroot=biopython

Thanks for spotting this omission.



From bugzilla-daemon at portal.open-bio.org  Sat Jan  3 20:20:42 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 3 Jan 2009 20:20:42 -0500
Subject: [Biopython-dev] [Bug 2724] Unclear? changes between 1.47 and 1.49
In-Reply-To: <bug-2724-42@http.bugzilla.open-bio.org/>
Message-ID: <200901040120.n041Kgkx029421@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2724


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-03 20:20 EST -------
The file "/usr/share/biopython/Tests/Clustalw/temp horses.dnd" is normally
created by one of the unit tests, test_Clustalw_tool.py (and the space is very
deliberate).

This stray dnd file does appear to have been included with biopython-1.49.zip
(and probably the tar ball as well), which must have been a minor slip on my
part.  However, I don't think it's worth re-issuing the archive files over this.

I've updated test_Clustalw_tool.py as of CVS revision 1.4 so that it should
remove this dnd file automatically.
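
As a rough illustration of the kind of clean-up involved (this is not the
actual change to test_Clustalw_tool.py), a test can remove its scratch file in
a finally block so that it never ends up in a release archive. The helper name
and the placeholder tree contents are assumptions; the file name keeps its
deliberate space.

```python
import os
import tempfile


def write_and_clean_up(directory):
    """Create a scratch tree file, check it, and always remove it afterwards."""
    path = os.path.join(directory, "temp horses.dnd")  # space is deliberate
    try:
        with open(path, "w") as handle:
            handle.write("(horse_a,horse_b);\n")  # placeholder Newick tree
        existed_during_test = os.path.isfile(path)
    finally:
        # Runs even if the checks above fail, so no stray file is left behind.
        if os.path.isfile(path):
            os.remove(path)
    return existed_during_test, os.path.isfile(path)


with tempfile.TemporaryDirectory() as scratch_dir:
    print(write_and_clean_up(scratch_dir))
```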



From bugzilla-daemon at portal.open-bio.org  Sat Jan  3 20:37:26 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 3 Jan 2009 20:37:26 -0500
Subject: [Biopython-dev] [Bug 2723] Clarify what applies to which version of
	biopython and other doc cleanup
In-Reply-To: <bug-2723-42@http.bugzilla.open-bio.org/>
Message-ID: <200901040137.n041bQ6Z030767@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2723


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-03 20:37 EST -------
(In reply to comment #0)
> I went to look around at the docs because the built-in tests of 1.49 setup.py
> spat out some messages about external programs missing. I haven't found any
> hints on them in http://news.open-bio.org/news/2008/11/biopython-release-149/.

No, that text and the matching email announcement don't go into details about
installation - the text was already long enough I felt.  However, the download
page does list various external programs:
http://biopython.org/wiki/Download

(Someone else had pointed out we were missing a few, which has been fixed, but I
couldn't find the email/bug report while writing this reply).

> Anyway, looking at 
> http://biopython.org/DIST/docs/install/Installation.html#htoc17
> I see: "3.4  mxTextTools (no longer needed)". I would propose:
> 
> 3.4  mxTextTools (no longer needed since 1.49)
> 
> Similarly:
> - 3.1  Numerical Python (NumPy) (strongly recommended)
> + 3.1  Numerical Python (NumPy) (strongly recommended since 1.49)

That does seem sensible.

> Bad URL links are in the text:
> 
> 3.3  Database Access (MySQLdb, ...) (optional)
> 
> [cut]
> 
> Additionally, both MySQLdb and psycopg (a PostgreSQL database adaptor) can be
> used for accessing BioSQL databases through Biopython (see ). Again if you
> -----------------------------------------------------------^
> are not going to use BioSQL, there shouldn't be any need to install these
> modules.
> 
> 
> 3.4  mxTextTools (no longer needed)
> 
> [cut]
> 
> However, we currently recommend you install mxTextTools 2.0, as some of the
> API changes made in 3.0 version were not compatible with Biopython. Goto
> ---------------------------------------------------------------------^^
> to download this.

I'll have to check those... probably something silly in the LaTeX source.

> I haven't found an answer for me yet:
> 
> test_PopGen_FDist ... skipping. Install FDist if you want to use
> Bio.PopGen.FDist.
> ok
> ...
> test_PopGen_SimCoal ... skipping. Install SIMCOAL2 if you want to use
> Bio.PopGen.SimCoal.
> ok
> ...
> test_Wise ... skipping. Install Wise2 (dnal) if you want to use Bio.Wise.
> ok
> test_psw ... skipping. Install Wise2 (dnal) if you want to use Bio.Wise.
> ok

See http://biopython.org/wiki/Download

> Pointer to those packages would have been helpful. From the test suite as well
> as from installation manual.

I'm not keen on making the unit test even more verbose by adding URLs to these
messages.  The information is on the download page, but yes, adding it to the
installation document seems sensible.

> Moreover, what database username/password would
> I have to make to get the BioSQL stuff compiled and tested?  ^H^H^H^H^H^H
> I see, it gets compiled anyway the tests just were not run.

The BioSQL unit test message should say: "Check settings in
Tests/setup_BioSQL.py if you plan to use BioSQL".  That is, once you have
installed BioSQL and set up a database, edit the file setup_BioSQL.py to match.  See
http://biopython.org/wiki/BioSQL

Peter



From bugzilla-daemon at portal.open-bio.org  Sun Jan  4 13:56:22 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 4 Jan 2009 13:56:22 -0500
Subject: [Biopython-dev] [Bug 2723] Minor corrections to the installation
	document
In-Reply-To: <bug-2723-42@http.bugzilla.open-bio.org/>
Message-ID: <200901041856.n04IuMhJ028749@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2723


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Clarify what applies to     |Minor corrections to the
                   |which version of biopython  |installation document
                   |and other doc cleanup       |


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-04 13:56 EST -------
(In reply to comment #1)
> (In reply to comment #0)
> > I went to look around at the docs because the built-in tests of 1.49
> > setup.py spat out some messages about external programs missing. I haven't
> > found any hints on them in
> > http://news.open-bio.org/news/2008/11/biopython-release-149/.
> 
> No, that text and the matching email announcement don't go into details about
> installation - the text was already long enough I felt.  However, the download
> page does list various external programs:
> http://biopython.org/wiki/Download

I've added a section on third party tools to the installation document in CVS.

> > Anyway, looking at 
> > http://biopython.org/DIST/docs/install/Installation.html#htoc17
> > I see: "3.4  mxTextTools (no longer needed)". I would propose:
> > 
> > 3.4  mxTextTools (no longer needed since 1.49)
> > 
> > Similarly:
> > - 3.1  Numerical Python (NumPy) (strongly recommended)
> > + 3.1  Numerical Python (NumPy) (strongly recommended since 1.49)
> 
> That does seem sensible.

On reflection, I don't like the layout with version numbers stuck in the
section names.  The NumPy section is already very clear about the fact that
this applies to 1.49 onwards, and that older versions of Biopython needed
Numeric instead.  I have tried to clarify the mxTextTools section in CVS.

> > Bad URL links are in the text:
> > 
> > 3.3  Database Access (MySQLdb, ...) (optional)
> > ...
> > 3.4  mxTextTools (no longer needed)
> > ...
> 
> I'll have to check those... probably something silly in the LaTeX source.

Fixed in CVS.

I'm leaving this bug open until I've updated the HTML and PDF copies of the
installation document on the website.  I don't have the tool hevea installed
on this machine, so I can't create the HTML version of the installation
document -- just the PDF.  I should be able to do this next week...



From bugzilla-daemon at portal.open-bio.org  Sun Jan  4 17:09:47 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 4 Jan 2009 17:09:47 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main
	Biopython distribution
In-Reply-To: <bug-2671-42@http.bugzilla.open-bio.org/>
Message-ID: <200901042209.n04M9lJ0010428@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671


------- Comment #32 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-04 17:09 EST -------
(In reply to comment #30)
> (In reply to comment #29)
> > 
> > I propose that in Biopython 1.50 we support both "colour" and "color",
> > but for Biopython 1.51 we add deprecation warnings when "colour" is used.
> > 
> > We should probably do the same thing for "centre" and "center" as well...
> > 
> 
> I agree.  We should encourage use of the US spelling in the documentation, to
> catch those new to GD. This approach provides a window for conversion of old
> GD scripts for previous users, which is a good thing.
> 

I've updated CVS to switch from centre to center, with properties set up to
allow access under the old spellings, and where I thought it appropriate I've
included both spellings in argument lists.  Another set of eyes to check this
wouldn't hurt.
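
For anyone reviewing the change, the general shape of a spelling alias is
sketched below. This is not the GenomeDiagram code itself: the class name
Circle and the warn_on_old_spelling switch are hypothetical, the latter
standing in for the deprecation warnings proposed for Biopython 1.51.

```python
import warnings


class Circle:
    """Toy class storing its value under the US spelling only."""

    def __init__(self, center=0, warn_on_old_spelling=False):
        self.center = center
        self._warn = warn_on_old_spelling

    def _note_old_spelling(self):
        # Only warn once the project decides to deprecate the old spelling.
        if self._warn:
            warnings.warn("'centre' is deprecated, use 'center'",
                          DeprecationWarning, stacklevel=3)

    @property
    def centre(self):
        """Old UK spelling, kept as an alias for existing scripts."""
        self._note_old_spelling()
        return self.center

    @centre.setter
    def centre(self, value):
        self._note_old_spelling()
        self.center = value


shape = Circle()
shape.centre = 10      # old spelling still works ...
print(shape.center)    # ... and updates the attribute under the new name
```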

I'm leaving this bug open until we've done the documentation (see my comment
25).

There is also the issue of Bug 2705 for the AT and GC content and skew
functions and any windowing function to help plot these in GenomeDiagram.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.

From bugzilla-daemon at portal.open-bio.org  Mon Jan  5 11:30:46 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 5 Jan 2009 11:30:46 -0500
Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and
	write_to_string() are inefficient and don't check inputs
In-Reply-To: <bug-2711-42@http.bugzilla.open-bio.org/>
Message-ID: <200901051630.n05GUkun032207@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2711


bsouthey at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |


------- Comment #17 from bsouthey at gmail.com  2009-01-05 11:30 EST -------
I do not consider this bug completely fixed, for multiple reasons, some of
which my patch addressed prior to the creation of the _write function. I do
like where _write is heading, as it is making the code cleaner and more
understandable.

1) I do not understand the need for the dictionary of modules 'formatdict' in
_write as it creates unnecessarily inefficient code. The options need to be part
of the check for the type of output.

2) There is no indication that the output for write and write_to_string only
accepts uppercase. Note the _write function states this, but a user will not see
it. I do not understand why lowercase is unacceptable. 

3) The check for renderPM at start is really redundant because _write checks
for it (well sort of). It is also an unnecessary delay if renderPM is not used.
If you really must use the dictionary (which I really do not like) I would
suggest something like:
formatdict = {'PS': renderPS, 'PDF': renderPDF, 'SVG': renderSVG}
try:
    from reportlab.graphics import renderPM
    formatdict.update({'JPG': renderPM, 'BMP': renderPM, 'GIF': renderPM,
                       'PNG': renderPM, 'TIFF': renderPM, 'TIF': renderPM})

The current code would show the correct options regardless of status
of renderPM. Perhaps an exception could provide a warning that renderPM is not
present.

4) There is no test for the presence of renderPM. The test function must check
for renderPM and should at least provide a warning if not present. Otherwise
this is a surprise to a user because not all options will be available.

5) The installation documentation must also indicate that renderPM is optional
and also how to install the renderPM module.
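
The idea in point 3 can be sketched without ReportLab installed. In the version
below, the standard-library modules json and csv merely stand in for renderers
that are always available, and _missing_bitmap_renderer is a made-up module
name playing the role of an absent renderPM. The helpers optional_import and
build_format_dict are hypothetical and not part of GenomeDiagram.

```python
import importlib


def optional_import(module_name):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


def build_format_dict(bitmap_module_name):
    """Map format names to renderer modules, adding bitmaps only if possible."""
    # Stand-ins for renderers that are assumed to always be present.
    formatdict = {"PS": importlib.import_module("json"),
                  "PDF": importlib.import_module("csv")}
    bitmap_renderer = optional_import(bitmap_module_name)
    if bitmap_renderer is not None:
        for name in ("JPG", "BMP", "GIF", "PNG", "TIFF", "TIF"):
            formatdict[name] = bitmap_renderer
    return formatdict


# With the bitmap module missing, only the always-available formats are listed.
print(sorted(build_format_dict("_missing_bitmap_renderer")))
# With it present (here "io" stands in), the bitmap formats appear as well.
print(sorted(build_format_dict("io")))
```

Built this way, the dictionary's keys always reflect which output formats can
actually be produced on the current machine.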



From bugzilla-daemon at portal.open-bio.org  Mon Jan  5 11:49:46 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 5 Jan 2009 11:49:46 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main
	Biopython distribution
In-Reply-To: <bug-2671-42@http.bugzilla.open-bio.org/>
Message-ID: <200901051649.n05GnkVK001550@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671


------- Comment #33 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-05 11:49 EST -------
Still to do on the documentation front (as written in comment #25),
> 
> * Updating the existing GenomeDiagram manual to match (different imports,
> colour to color), which I think can stay as a separate PDF file.
> 
> * A short introduction to Bio.Graphics including GenomeDiagram as part of
> a new chapter in the tutorial?

Plus (as pointed out on Bug 2711 / Bug 2710):

* Updating the installation instructions so that the ReportLab section also
covers renderPM (needed for bitmaps).



From bugzilla-daemon at portal.open-bio.org  Mon Jan  5 11:56:57 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 5 Jan 2009 11:56:57 -0500
Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and
	write_to_string() are inefficient and don't check inputs
In-Reply-To: <bug-2711-42@http.bugzilla.open-bio.org/>
Message-ID: <200901051656.n05GuvPP002443@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2711


------- Comment #18 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-05 11:56 EST -------
(In reply to comment #17)
> I do not consider this bug completely fixed for multiple reasons of which my
> patch addressed some of these prior to the creation of the _write function. I
> do like where _write is heading as it is making cleaner and more
> understandable code.
> 
> 1) I do not understand the need for the dictionary of modules 'formatdict' in
> _write as it creates unnecessary inefficient code. The options need to be part
> of the check for the type of output.

OK, the use of a dictionary is a style thing.  You think it's ugly and
inefficient.  Leighton and I don't find it ugly.  I thought the
if/elif/elif/else alternative you suggested was "ugly".

The argument for the type of output does get checked (by catching a KeyError
from the dictionary).

> 2) There is no indication that the output for write and write_to_string only
> accepts uppercase. Note the _write function states this but a user will not
> see these. I do not understand why lowercase is unacceptable. 

As part of Bug 2718, for consistency with the rest of Bio.Graphics I think we
should after all accept either case.
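
A minimal sketch of how those two points can combine: the format name is
normalised to upper case before the dictionary lookup, and the KeyError is
turned into a more readable error. The renderer values are placeholder strings
and pick_renderer is a hypothetical name, not the actual _write function.

```python
# Placeholder strings stand in for the real renderer modules.
FORMATS = {"PS": "renderPS", "PDF": "renderPDF", "SVG": "renderSVG"}


def pick_renderer(output):
    """Look up a renderer by format name, accepting upper or lower case."""
    try:
        return FORMATS[output.upper()]
    except KeyError:
        raise ValueError("Invalid format %r, expected one of: %s"
                         % (output, ", ".join(sorted(FORMATS))))


print(pick_renderer("pdf"))
print(pick_renderer("SVG"))
```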

> 3) The check for renderPM at start is really redundant because _write checks
> for it (well sort of). It is also an unnecessary delay if renderPM is not
> used. If you really must use the dictionary (which I really do not like) I
> would suggest something like:
> formatdict = {'PS': renderPS, 'PDF': renderPDF,'SVG': renderSVG}
> try:
>     from reportlab.graphics import renderPM
>     formatdict.update({'JPG': renderPM, 'BMP': renderPM, 'GIF': renderPM,
> 'PNG': renderPM, 'TIFF': renderPM,'TIF': renderPM})

I don't see how that would work, because unfortunately with the reportlab API,
we must treat renderPM differently to renderPDF, renderPS and renderSVG.

> The current code would show the correct options regardless of status
> of renderPM. Perhaps an exception could provide a warning that renderPM
> is not present.

Right now we do have a "helpful" exception raised when a bitmap format is
requested and renderPM is not installed.

> 4) There is no test for the presence of renderPM. The test function must check
> for renderPM and should at least provide a warning if not present. Otherwise
> this is a surprise to a user because not all options will be available.

There is an "on demand" test - via the _write function.  As Leighton has
already pointed out, this is nasty in that it can come as a surprise to the
user.  However, as far as I can see the alternative is an error/warning at
import time, even if the user doesn't need or want bitmap output
(i.e. Bug 2710).  The current situation strikes me as the lesser of two evils.

> 5) The installation documentation must also indicate that renderPM is
> optional and also how to install the renderPM module.

Yes, we should indicate renderPM is optional.  Updating our documentation to
cover GenomeDiagram is still pending on Bug 2671.



From bugzilla-daemon at portal.open-bio.org  Mon Jan  5 16:46:37 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 5 Jan 2009 16:46:37 -0500
Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and
	write_to_string() are inefficient and don't check inputs
In-Reply-To: <bug-2711-42@http.bugzilla.open-bio.org/>
Message-ID: <200901052146.n05LkbSZ031281@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2711


------- Comment #19 from bsouthey at gmail.com  2009-01-05 16:46 EST -------
(In reply to comment #18)
> (In reply to comment #17)
> > I do not consider this bug completely fixed for multiple reasons of which my
> > patch addressed some of these prior to the creation of the _write function. I
> > do like where _write is heading as it is making cleaner and more
> > understandable code.
> > 
> > 1) I do not understand the need for the dictionary of modules 'formatdict' in
> > _write as it creates unnecessary inefficient code. The options need to be part
> > of the check for the type of output.
> 
> OK, the use of a dictionary is a style thing.  You think it's ugly and
> inefficient.  Leighton and I don't find it ugly.  I thought the
> if/elif/elif/else alternative you suggested was "ugly".
> 
> The argument for the type of output does get checked (by catching a KeyError
> from the dictionary).

I agree that reportlab makes any solution "ugly" because the different types
require different arguments. I agree this is partly a style issue because it is
a case of what to do first, when to do it and when to tell the user what is
missing. 

> 
> > 2) There is no indication that the output for write and write_to_string only
> > accepts uppercase. Note the _write function states this but a user will not
> > see these. I do not understand why lowercase is unacceptable. 
> 
> As part of Bug 2718, for consistency with the rest of Bio.Graphics I think we
> should after all accept either case.
> 
> > 3) The check for renderPM at start is really redundant because _write checks
> > for it (well sort of). It is also an unnecessary delay if renderPM is not
> > used. If you really must use the dictionary (which I really do not like) I
> > would suggest something like:
> > formatdict = {'PS': renderPS, 'PDF': renderPDF,'SVG': renderSVG}
> > try:
> >     from reportlab.graphics import renderPM
> >     formatdict.update({'JPG': renderPM, 'BMP': renderPM, 'GIF': renderPM,
> > 'PNG': renderPM, 'TIFF': renderPM,'TIF': renderPM})
> 
> I don't see how that would work, because unfortunately with the reportlab API,
> we must treat renderPM differently to renderPDF, renderPS and renderSVG.
> 

This just moves the renderPM import into _write and the rest of the code runs
if you add:
except ImportError:
    renderPM = None

> > The current code would show the correct options regardless of status
> > of renderPM. Perhaps an exception could provide a warning that renderPM
> > is not present.
> 
> Right now we do have a "helpful" exception raised when a bitmap format is
> requested and renderPM is not installed.

Again a style issue because I just find it redundant if we already know that
renderPM is not present.

> 
> > 4) There is no test for the presence of renderPM. The test function must check
> > for renderPM and should at least provide a warning if not present. Otherwise
> > this is a surprise to a user because not all options will be available.
> 
> There is an "on demand" test - via the _write function.  As Leighton has
> already pointed out, this is nasty in that it can come as a surprise to the
> user.  However, as far as I can see the alternative is an error/warning at
> import time regardless even if the user doesn't need or want bitmap output
> (i.e. Bug 2710).  The current situation strikes me as the lesser of two evils.
> 

I mean that test_GenomeDiagram should also check for renderPM and provide a
warning if not present. So if tests are run then there is some indication that
something is missing.



From bugzilla-daemon at portal.open-bio.org  Mon Jan  5 17:33:30 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 5 Jan 2009 17:33:30 -0500
Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and
	write_to_string() are inefficient and don't check inputs
In-Reply-To: <bug-2711-42@http.bugzilla.open-bio.org/>
Message-ID: <200901052233.n05MXUCS002828@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2711


------- Comment #20 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-05 17:33 EST -------
(In reply to comment #19)
> I mean that test_GenomeDiagram should also check for renderPM and provide a
> warning if not present. So if tests are run then there is some indication that
> something is missing.

The way we have our external dependency checking setup, if something is missing
the whole test is skipped.  I want to keep test_GenomeDiagram.py as it is
producing PDF output (with no dependency on renderPM - so that the core
GenomeDiagram functionality is tested).

However, I had been thinking about adding a (smaller) extra test, say
test_GenomeDiagram_bitmaps.py which would need renderPM installed. 
Alternatively this could be a more general quick test for making PNG etc with
all of Bio.Graphics after fixing Bug 2718.

This would as you point out mean anyone running the test suite would then be
alerted to the fact they may be missing renderPM - which would be a good thing.
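
A rough sketch of how such a bitmap test could declare its dependency so that the whole test is skipped when renderPM is absent. MissingDependencyError and require_module are hypothetical stand-ins for whatever exception the test runner treats as "skip"; they are not names from the Biopython test suite.

```python
import importlib


class MissingDependencyError(Exception):
    """Hypothetical stand-in for the exception the test runner treats as 'skip'."""


def require_module(name, hint):
    """Import and return module `name`, or raise MissingDependencyError(hint)."""
    try:
        return importlib.import_module(name)
    except ImportError:
        raise MissingDependencyError(hint)


# At the top of a hypothetical test_GenomeDiagram_bitmaps.py one might write:
# renderPM = require_module(
#     "reportlab.graphics.renderPM",
#     "Install ReportLab's renderPM module to test bitmap output.",
# )
```

Keeping this check in a separate test file leaves the PDF-only test free of any renderPM dependency.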


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Mon Jan  5 18:20:52 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 5 Jan 2009 18:20:52 -0500
Subject: [Biopython-dev] [Bug 2718] Bio.Graphics and output file formats
	(PDF, EPS, SVG, and bitmaps)
In-Reply-To: <bug-2718-42@http.bugzilla.open-bio.org/>
Message-ID: <200901052320.n05NKqok006769@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2718


------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-05 18:20 EST -------
(In reply to comment #2)
> In addition, I notice that Bio.Graphics.BasicChromosome,
> Bio.Graphics.Comparative and Bio.Graphics.Distribution expect lower case
> formats (currently just pdf and eps) while Bio.Graphics.GenomeDiagram
> expects upper case.  We should be consistent, which for backwards
> compatibility would mean accepting either case.

Bio.Graphics.GenomeDiagram will now accept format names in any case.
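
One simple way to accept either case is to normalise the name before looking it up. The sketch below is an illustration only: the function name normalise_format and the exact list of known formats are assumptions, not the code that was checked in.

```python
KNOWN_FORMATS = ("PS", "EPS", "PDF", "SVG", "JPG", "BMP", "GIF", "PNG", "TIFF", "TIF")


def normalise_format(name, known=KNOWN_FORMATS):
    """Return the upper-case form of `name`, or raise ValueError if unknown."""
    key = name.upper()
    if key not in known:
        raise ValueError("Unsupported output format: %r" % name)
    return key
```

For example, normalise_format("pdf") and normalise_format("PDF") both give "PDF", so existing callers of either style keep working.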


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Mon Jan  5 19:16:10 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 5 Jan 2009 19:16:10 -0500
Subject: [Biopython-dev] [Bug 2718] Bio.Graphics and output file formats
	(PDF, EPS, SVG, and bitmaps)
In-Reply-To: <bug-2718-42@http.bugzilla.open-bio.org/>
Message-ID: <200901060016.n060GAfe011559@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2718


------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-05 19:16 EST -------
Created an attachment (id=1186)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1186&action=view)
Adding output function to Bio.Graphics for shared use

This is based on the code from Bio.Graphics.GenomeDiagram.Diagram and would be
called from all the Bio.Graphics modules to output to a file/handle in any
supported file format, in a consistent manner.

This is done as a private function, as I do not want to expose this as a new
public API.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Mon Jan  5 19:18:06 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 5 Jan 2009 19:18:06 -0500
Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and
	write_to_string() are inefficient and don't check inputs
In-Reply-To: <bug-2711-42@http.bugzilla.open-bio.org/>
Message-ID: <200901060018.n060I6eq011760@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2711


------- Comment #21 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-05 19:18 EST -------
(In reply to comment #17)
> I do not consider this bug completely fixed, for multiple reasons, some of
> which my patch addressed prior to the creation of the _write function. I
> do like where _write is heading as it is making cleaner and more
> understandable code.

I decided that since ReportLab used a cStringIO or StringIO handle internally
to implement its writeToString method, we might as well do the same as it
allows a great simplification to the GenomeDiagram write and write_to_string
methods (and we can get rid of _write too).

See revision 1.14 of Bio/Graphics/GenomeDiagram/Diagram.py
http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Graphics/GenomeDiagram/Diagram.py?cvsroot=biopython

I hope you'll agree that this is a further improvement (even if the dictionary
approach is still used internally).

My plan (see Bug 2718) is to move this code into a shared private function for
all of the Bio.Graphics modules to use.
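
The simplification described above can be sketched as follows. This is a toy class, not revision 1.14 of Diagram.py: the _render method is a placeholder for the call into the rendering backend, and io.BytesIO is used where the 2009 code would have used cStringIO or StringIO.

```python
from io import BytesIO


class Diagram:
    """Toy stand-in for a diagram class; not the real GenomeDiagram code."""

    def __init__(self, payload=b"placeholder"):
        self.payload = payload

    def _render(self, output):
        # Placeholder for the call into the rendering backend.
        return output.upper().encode("ascii") + b":" + self.payload

    def write(self, target, output="PDF"):
        """Write to a filename or an open binary handle; returns None."""
        data = self._render(output)
        if hasattr(target, "write"):
            target.write(data)
        else:
            with open(target, "wb") as handle:
                handle.write(data)

    def write_to_string(self, output="PDF"):
        """Return the rendered bytes by writing to an in-memory handle."""
        handle = BytesIO()
        self.write(handle, output)
        return handle.getvalue()
```

With this layout write_to_string is only three lines, and all of the format handling lives in one place.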


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From tiagoantao at gmail.com  Mon Jan  5 19:48:12 2009
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Tue, 6 Jan 2009 00:48:12 +0000
Subject: [Biopython-dev] Structure and LDNe
Message-ID: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com>

Hi all,

Jason Eshleman (he subscribes to this list also) has made available
code to interact with Structure (a widely used application in
population genetics - the 2 papers related to it have around 3000
citations according to Google Scholar). We will try to convert his code
to the Bio.PopGen namespace, create documentation and test cases.
Added to this is the existing LDNe code (mine). This all should be ready
in a reasonably fast time frame (I suppose before the next release).

The all important statistics part is still due, I am afraid (I don't
know if anybody has looked at the beta code on git). But at least this
LDNe and Structure code will be ready to go soon.

Tiago

From bugzilla-daemon at portal.open-bio.org  Mon Jan  5 21:56:35 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 5 Jan 2009 21:56:35 -0500
Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and
	write_to_string() are inefficient and don't check inputs
In-Reply-To: <bug-2711-42@http.bugzilla.open-bio.org/>
Message-ID: <200901060256.n062uZBF023086@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2711


------- Comment #22 from bsouthey at gmail.com  2009-01-05 21:56 EST -------
(In reply to comment #21)
> (In reply to comment #17)
> > I do not consider this bug completely fixed, for multiple reasons, some of
> > which my patch addressed prior to the creation of the _write function. I
> > do like where _write is heading as it is making cleaner and more
> > understandable code.
> 
> I decided that since ReportLab used a cStringIO or StringIO handle internally
> to implement its writeToString method, we might as well do the same as it
> allows a great simplification to the GenomeDiagram write and write_to_string
> methods (and we can get rid of _write too).
> 
> See revision 1.14 of Bio/Graphics/GenomeDiagram/Diagram.py
> http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Graphics/GenomeDiagram/Diagram.py?cvsroot=biopython
> 
> I hope you'll agree that this is a further improvement (even if the dictionary
> approach is still used internally).
> 
> My plan (see Bug 2718) is to move this code into a shared private function for
> all of the Bio.Graphics modules to use.
> 

That is great! 

Note that reportlab's drawToString first uses its getStringIO() and passes
that to drawToFile. I am not sure of the difference between getStringIO() and
StringIO() but getStringIO() might be preferred. 

Also, I would presume that checking for the filename would allow you to combine
the writing to a file and writing to a string into a single new function to
maintain backwards compatibility.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From rhythmbox-devel at maubp.freeserve.co.uk  Tue Jan  6 05:01:34 2009
From: rhythmbox-devel at maubp.freeserve.co.uk (Peter)
Date: Tue, 6 Jan 2009 10:01:34 +0000
Subject: [Biopython-dev] Structure and LDNe
In-Reply-To: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com>
References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com>
Message-ID: <320fb6e00901060201s3830c972w4638f5eefcd42b6a@mail.gmail.com>

On Tue, Jan 6, 2009 at 12:48 AM, Tiago Antão <tiagoantao at gmail.com> wrote:
> Hi all,
>
> Jason Eshleman (he subscribes to this list also) has made available
> code to interact with Structure (a widely used application in
> population genetics - the 2 papers related to it have around 3000
> citations according to Google Scholar). We will try to convert his code
> to the Bio.PopGen namespace, create documentation and test cases.
> Added to this is the existing LDNe code (mine). This all should be ready
> in a reasonably fast time frame (I suppose before the next release).

That sounds good :)

> The all important statistics part is still due, I am afraid (I don't
> know if anybody has looked at the beta code on git). But at least this
> LDNe and Structure code will be ready to go soon.
>
> Tiago

I haven't looked at any of your code on git - and I probably won't
have any spare time till next week.  But anyway, do you have the URL
handy?

Thanks

Peter


From bugzilla-daemon at portal.open-bio.org  Tue Jan  6 07:30:39 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 6 Jan 2009 07:30:39 -0500
Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and
	write_to_string() are inefficient and don't check inputs
In-Reply-To: <bug-2711-42@http.bugzilla.open-bio.org/>
Message-ID: <200901061230.n06CUds2006927@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2711


------- Comment #23 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-06 07:30 EST -------
(In reply to comment #22)
> That is great! 
> 
> Note that reportlab's drawToString first uses its getStringIO() and passes
> that to drawToFile. I am not sure of the difference between getStringIO() and
> StringIO() but getStringIO() might be preferred. 

From going through the ReportLab code a week or two ago, it ends up using
cStringIO (or falling back on StringIO) internally.

> Also, I would presume that checking for the filename would allow you to
> combine the writing to a file and writing to a string into a single new
> function to maintain backwards compatibility.

You'd then have one method to write to a string, handle or filename.  As I said
before, I'm not keen on this - having two very different return values (string
or nothing) depending on the arguments, with some special invocation needed to
request the string output (maybe None rather than a filename/handle?).

The status quo seems OK here, with a write method (to a handle or filename) and
a separate write_to_string method.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From tiagoantao at gmail.com  Tue Jan  6 11:52:22 2009
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Tue, 6 Jan 2009 16:52:22 +0000
Subject: [Biopython-dev] Structure and LDNe
In-Reply-To: <320fb6e00901060201s3830c972w4638f5eefcd42b6a@mail.gmail.com>
References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com>
	<320fb6e00901060201s3830c972w4638f5eefcd42b6a@mail.gmail.com>
Message-ID: <6d941f120901060852r482baf16m6b8399959b3c1aaa@mail.gmail.com>

On Tue, Jan 6, 2009 at 10:01 AM, Peter
<rhythmbox-devel at maubp.freeserve.co.uk> wrote:
> I haven't looked at any of your code on git - and I probably won't
> have any spare time till next week.  But anyway, do you have the URL
> handy?

I gave the code to Giovanni, so it's his URL:
http://github.com/dalloliogm/biopython---popgen/tree/master
The code on Stats is still in a version that will have to be changed.
It is probably only of interest to developers that might have direct
interest in the module.
For development purposes I will put the code there (I don't want to
commit to the main CVS branch - as it is a production branch - before
the code is in an acceptable format).

Tiago

From bsouthey at gmail.com  Tue Jan  6 12:41:29 2009
From: bsouthey at gmail.com (Bruce Southey)
Date: Tue, 06 Jan 2009 11:41:29 -0600
Subject: [Biopython-dev] Structure and LDNe
In-Reply-To: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com>
References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com>
Message-ID: <496397C9.3030706@gmail.com>

Tiago Antão wrote:
> Hi all,
>
> Jason Eshleman (he subscribes to this list also) has made available
> code to interact with Structure (a widely used application in
> population genetics - the 2 papers related to it have around 3000
> citations according to Google Scholar). We will try to convert his code
> to the Bio.PopGen namespace, create documentation and test cases.
> Added to this is the existing LDNe code (mine). This all should be ready
> in a reasonably fast time frame (I suppose before the next release).
>
> The all important statistics part is still due, I am afraid (I don't
> know if anybody has looked at the beta code on git). But at least this
> LDNe and Structure code will be ready to go soon.
>
> Tiago
Hi,
What are the licenses for LDNe and Structure?
Saying just 'free' is insufficient because it is not clear which 
definition is being used.

Also, please ensure that none of the code that is included in 
Biopython is a derivative of LDNe or Structure unless these have an 
explicit license that is compatible with Biopython.  For example, 
'copying' an existing function into Python would be considered a 
derivative. Reading a documented output format is probably not 
considered a derivative.

I prefer to be proactive with licenses so these don't bite back, as has 
happened in some formally open source projects or through the use of 
unclean code sources. A current example is that the release of scipy 
0.7 has been significantly delayed by a major effort to check 
various functions that reference the Numerical Recipes book (which has 
an incompatible license).

Anyhow, this sounds good!

Bruce

From tiagoantao at gmail.com  Tue Jan  6 13:10:28 2009
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Tue, 6 Jan 2009 18:10:28 +0000
Subject: [Biopython-dev] Structure and LDNe
In-Reply-To: <496397C9.3030706@gmail.com>
References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com>
	<496397C9.3030706@gmail.com>
Message-ID: <6d941f120901061010n36281702gc073d9f4469d492c@mail.gmail.com>

On Tue, Jan 6, 2009 at 5:41 PM, Bruce Southey <bsouthey at gmail.com> wrote:
> What are the licenses for LDNe and Structure?
> Saying just 'free' is insufficient because it is not clear which
> definition is being used.
>
> Also, please ensure that none of the code that is included in Biopython is
> a derivative of LDNe or Structure unless these have an explicit license
> that is compatible with Biopython.  For example, 'copying' an existing
> function into Python would be considered a derivative. Obviously reading a
> documented output is probably not considered a derivative.

Regarding LDNe we have had this discussion in the past. I have some
updates/extra info:
1. They only make available a Windows/DOS version. But they will make
a Linux version available (compiled by me, I offered to do that).
Probably a mac version also.
2. As I said before and as it is common in population genetics
(unfortunately), the software comes with no license at all; they
didn't even think that it was an issue.
3. No code is remotely derived or adapted.

Regarding structure, the authors make the source available (a notch
better than LDNe) http://pritch.bsd.uchicago.edu/structure.html , but
again, they didn't bother to include license info. I am contacting
them in order to investigate this. I will report back as soon as I
have an answer.

This being said, structure support is way more important than LDNe.
The userbase of structure is quite big (just see the earlier figure
for its Google Scholar citations).

From dalloliogm at gmail.com  Wed Jan  7 05:37:00 2009
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Wed, 7 Jan 2009 11:37:00 +0100
Subject: [Biopython-dev] Structure and LDNe
In-Reply-To: <6d941f120901060852r482baf16m6b8399959b3c1aaa@mail.gmail.com>
References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com>
	<320fb6e00901060201s3830c972w4638f5eefcd42b6a@mail.gmail.com>
	<6d941f120901060852r482baf16m6b8399959b3c1aaa@mail.gmail.com>
Message-ID: <5aa3b3570901070237s487a4307hb68fa69abc3cb23d@mail.gmail.com>

On Tue, Jan 6, 2009 at 5:52 PM, Tiago Antão <tiagoantao at gmail.com> wrote:
> On Tue, Jan 6, 2009 at 10:01 AM, Peter
> <rhythmbox-devel at maubp.freeserve.co.uk> wrote:
>> I haven't looked at any of your code on git - and I probably won't
>> have any spare time till next week.  But anyway, do you have the URL
>> handy?
>
> I gave the code to Giovanni, so it's his URL:
> http://github.com/dalloliogm/biopython---popgen/tree/master

Hi people,
if you want to upload the code there, please tell me and I will give
you the write access.

However, the right way to do it should be that you create a fork of
the code on github, add your changes and work on it locally, and then
merge them back again in the original repository. I suppose that is
the standard way to use git.


> The code on Stats is still in a version that will have to be changed.
> It is probably only of interest to developers that might have direct
> interest in the module.
> For development purposes I will put the code there (I don't want to
> commit to the main CVS branch - as it is a production branch - before
> the code is in an acceptable format).
>
> Tiago


-- 

My blog on bioinformatics (now in English): http://bioinfoblog.it


From tiagoantao at gmail.com  Wed Jan  7 06:54:19 2009
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Wed, 7 Jan 2009 11:54:19 +0000
Subject: [Biopython-dev] Structure and LDNe
In-Reply-To: <5aa3b3570901070237s487a4307hb68fa69abc3cb23d@mail.gmail.com>
References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com>
	<320fb6e00901060201s3830c972w4638f5eefcd42b6a@mail.gmail.com>
	<6d941f120901060852r482baf16m6b8399959b3c1aaa@mail.gmail.com>
	<5aa3b3570901070237s487a4307hb68fa69abc3cb23d@mail.gmail.com>
Message-ID: <6d941f120901070354o70b6c99ah37ffdb38a1af7554@mail.gmail.com>

> However, the right way to do it should be that you create a fork of
> the code on github, add your changes and work on it locally, and then
> merge them back again in the original repository. I suppose that is
> the standard way to use git.

Considering that CVS has no development branch I think having git is
very good. I would just recommend extreme care with changing existing
code. When merging back into CVS, changes to existing code might not
go in (especially if they change interfaces) or might be delayed.

Big _design_ changes will have to be discussed in advance.

For my part, what I am including is just new LDNe code and helping
Jason with the structure code. So I expect zero impact on existing
code and no need for design changes.

Tiago
PS - I am travelling until Saturday, apologies in advance for delayed answers.

From bugzilla-daemon at portal.open-bio.org  Wed Jan  7 09:12:46 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 7 Jan 2009 09:12:46 -0500
Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and
	write_to_string() are inefficient and don't check inputs
In-Reply-To: <bug-2711-42@http.bugzilla.open-bio.org/>
Message-ID: <200901071412.n07ECk1n012802@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2711


------- Comment #24 from lpritc at scri.sari.ac.uk  2009-01-07 09:12 EST -------
(In reply to comment #13)

> I can not check this as I am away from my system. As I recall, the Python code
> for accessing this library is provided with the standard install as there is a
> renderPM.py file. But that is just a wrapper to some C code found in the
> rl_addons directory. So it is a big no that renderPM is available unless you
> actually build the C sources or download the binaries (only valid for Windows).

That's not really a big deal, as those are the only two ways to get ReportLab,
from reportlab.org!

From the website (http://www.reportlab.org/downloads.html):

"""
We provide precompiled binaries for Windows, but not for any other platform.
Many Linux distributors and other UNIX-like OS vendors provide their own
binaries for download
"""

The installation procedure for me was to issue:

python setup.py install

at the command line while in the top directory of the source download, which
isn't any harder than installing Biopython itself.  This installed ReportLab
2.2, including compilation of renderPM.  

> According to the website
> http://www.reportlab.org/subversion.html
> "
> It will create subdirectories for reportlab, which is an importable
> python package, and rl_addons which contains the C extensions. The
> latter need building with the contained setup script, but can also be
> downloaded in pre-built form from our downloads page. They rarely
> change.
> "
> 
> What did you actually install?

Reportlab 2.2, stable build as ReportLab_2_2.tgz, downloaded on December 15th
last year.  From the checksum, it's the 11/9 build.

I've just checked the SVN trunk, and that also builds renderPM, on the same
machine.

> In particular where was _renderPM built?

Initially, in [download location]/ReportLab_2_2/src/rl_addons/renderPM

and the library was installed to 

/usr/local/lib/python2.4/site-packages/_renderPM.so

by the setup script.

> Basically we need to document this as there appear to be different ways to
> install reportlab (may also be version or svn related).

I'm happy with this, but it's not exactly a complicated issue: either the local
Reportlab installation does or does not have renderPM; if it does not, then
raising an error before the user dedicates too much effort to something that
can't work seems at least polite.  Also, providing pointers in the
documentation to where renderPM can be obtained (at time of last writing) is a
good idea.  IMO, given the straightforward installation procedure that corrects
the issue - which ought not to affect *nix users that do not run precompiled
binaries, anyway -  I reckon that raising an error will be sufficient for most
of the few cases that renderPM is not installed. 

L.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Wed Jan  7 09:33:21 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 7 Jan 2009 09:33:21 -0500
Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and
	write_to_string() are inefficient and don't check inputs
In-Reply-To: <bug-2711-42@http.bugzilla.open-bio.org/>
Message-ID: <200901071433.n07EXLSn014755@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2711


------- Comment #25 from lpritc at scri.sari.ac.uk  2009-01-07 09:33 EST -------
(In reply to comment #17)

> 1) I do not understand the need for the dictionary of modules 'formatdict' in
> _write as it creates unnecessary inefficient code. The options need to be part
> of the check for the type of output.

The need is that input types are associated with alternative rendering
backends.  The distribution dictionary approach is highly-readable and readily
extendable to accept, for example, lowercase variants of format names that map
to the same backend - as in your point number 2.

I also don't understand your efficiency argument.  Firstly, this step is not
AFAIAA a bottleneck, and hardly a priority for optimisation; secondly I do not
believe that a distribution dictionary is less efficient than your suggestion. 
The dictionary achieves the same end in three lines of code, rather than ten
for the elif.  Also computationally, if the format name is 'TIF', your elif
code will always have to cycle through all output format name tests (four
conditionals, and an O(n) list search) in order to associate that format with
renderPM.  This is less efficient than a dictionary approach: retrieving values
from dictionaries takes approximately constant time. Not that if we ran profile
on the two approaches we'd see much of a difference, of course - this is not a
speed-critical step.

Also, and in my opinion, elifs are not as easy to maintain, or as readable, as
distribution dictionaries.
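
To make the comparison concrete, here is a small sketch of the two styles side by side. It is not the code from either patch: the function names are made up for illustration, and the backends are represented by their names as strings so that the example runs without ReportLab.

```python
BITMAP_FORMATS = ["JPG", "BMP", "GIF", "PNG", "TIFF", "TIF"]


def backend_elif(fmt):
    """elif chain: a bitmap name passes three failed tests plus a list search."""
    if fmt == "PS":
        return "renderPS"
    elif fmt == "PDF":
        return "renderPDF"
    elif fmt == "SVG":
        return "renderSVG"
    elif fmt in BITMAP_FORMATS:
        return "renderPM"
    raise ValueError("Unsupported format: %r" % fmt)


# Dictionary approach: one table, one lookup.
FORMAT_TABLE = {"PS": "renderPS", "PDF": "renderPDF", "SVG": "renderSVG"}
FORMAT_TABLE.update(dict.fromkeys(BITMAP_FORMATS, "renderPM"))


def backend_dict(fmt):
    """Dictionary lookup: approximately constant time for any format name."""
    try:
        return FORMAT_TABLE[fmt]
    except KeyError:
        raise ValueError("Unsupported format: %r" % fmt)
```

Both functions give the same answer for every supported name; adding a new format means one new dictionary entry rather than another branch.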

> 2) There is no indication that the output for write and write_to_string only
> accepts uppercase. Note the _write function states this but a user will not see
> these. I do not understand why lowercase is unacceptable. 

It's not unacceptable - at least, not to me - I just didn't write it to accept
lowercase, originally.  I've no objection to adding lowercase variants of the
format names to the distribution dictionary.

> 3) The check for renderPM at start is really redundant because _write checks
> for it (well sort of). It is also an unnecessary delay if renderPM is not used.

It's not a big speed hit (or is there contradictory data? it's certainly not a
speed worry for my work) and, if tested on import, needs only to be done once
when GenomeDiagram is imported.

> 4) There is no test for the presence of renderPM. The test function must check
> for renderPM and should at least provide a warning if not present. Otherwise
> this is a surprise to a user because not all options will be available.

Raising an error, or at least a warning, is a good idea.  I favour raising this
error on first import.

> 5) The installation documentation must also indicate that renderPM is optional
> and also how to install the renderPM module.

I'm still not convinced that this is all that big an issue: renderPM is part of
the source ReportLab 2.2 distribution, and the instructions on reportlab.org
are pretty clear.  However, for those users who have pathological
installations, a line pointing out that renderPM can be obtained via
reportlab.org is a good idea.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Wed Jan  7 09:38:14 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 7 Jan 2009 09:38:14 -0500
Subject: [Biopython-dev] [Bug 2727] New: PDB.Bio: header should include
	CRYST1 information
Message-ID: <bug-2727-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2727

           Summary: PDB.Bio:  header should include CRYST1 information
           Product: Biopython
           Version: 1.49b
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: mok at bioxray.au.dk


The unit cell and spacegroup information should be available from PDBParser's
get_header() method.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Wed Jan  7 09:40:52 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 7 Jan 2009 09:40:52 -0500
Subject: [Biopython-dev] [Bug 2727] PDB.Bio: header should include CRYST1
	information
In-Reply-To: <bug-2727-42@http.bugzilla.open-bio.org/>
Message-ID: <200901071440.n07EeqsZ015513@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2727


------- Comment #1 from mok at bioxray.au.dk  2009-01-07 09:40 EST -------
Created an attachment (id=1188)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1188&action=view)
Patch for parse_pdb_header.py

Attached patch will add three keys to the header dictionary: cell, spacegroup
and cell_z, giving access to this data gleaned from the CRYST1 record of a PDB
file.
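
For readers without the attachment, a minimal sketch of reading a CRYST1 record by its fixed columns. This is not the attached patch: the function name parse_cryst1 is hypothetical, and the shape of the values (a tuple of six floats for cell, an int for cell_z) is an assumption. Only the three key names come from the description above.

```python
def parse_cryst1(line):
    """Parse a PDB CRYST1 record into cell, spacegroup and cell_z (hypothetical)."""
    if not line.startswith("CRYST1"):
        raise ValueError("Not a CRYST1 record")
    line = line.ljust(70)
    # a, b, c (columns 7-33) and alpha, beta, gamma (columns 34-54)
    columns = ((6, 15), (15, 24), (24, 33), (33, 40), (40, 47), (47, 54))
    cell = tuple(float(line[start:end]) for start, end in columns)
    spacegroup = line[55:66].strip()  # columns 56-66
    z_field = line[66:70].strip()     # columns 67-70
    return {
        "cell": cell,
        "spacegroup": spacegroup,
        "cell_z": int(z_field) if z_field else None,
    }


# Build an example record with the same column layout (made-up values).
example = "CRYST1%9.3f%9.3f%9.3f%7.2f%7.2f%7.2f %-11s%4d" % (
    52.0, 58.6, 61.9, 90.0, 90.0, 90.0, "P 21 21 21", 8)
header = parse_cryst1(example)
```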


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Wed Jan  7 10:10:12 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 7 Jan 2009 10:10:12 -0500
Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and
	write_to_string() are inefficient and don't check inputs
In-Reply-To: <bug-2711-42@http.bugzilla.open-bio.org/>
Message-ID: <200901071510.n07FACPH017825@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2711


------- Comment #26 from bsouthey at gmail.com  2009-01-07 10:10 EST -------
(In reply to comment #24)
I had Reportlab version 2.1 installed, but once I upgraded to version 2.2 I got
renderPM built. So anyone using reportlab version 2.2 will be happy; those
who aren't will not be!

So please ensure that Reportlab version 2.2 (released 11 Sep 2008) or higher
is required. Otherwise you must check for renderPM, because most people probably
have an old version around without renderPM and most distributions (OpenSUSE
seems to be an exception if you look in the right place) don't have the 2.2
version yet.
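
A version requirement like this could be checked along the following lines. This is a sketch only: the helper names are hypothetical, and it assumes the installed version is available as a dotted string (for ReportLab, the reportlab.Version attribute).

```python
def version_tuple(version):
    """Turn a dotted version string such as '2.2' into a tuple of ints."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for char in piece:
            if not char.isdigit():
                break
            digits += char
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def at_least(version, minimum=(2, 2)):
    """True if `version` is the minimum version or newer."""
    return version_tuple(version) >= minimum


# Assumed usage, with ReportLab installed:
# import reportlab
# if not at_least(reportlab.Version):
#     ...warn or skip bitmap output...
```

Comparing tuples of integers rather than strings matters here, since as strings "2.10" would sort before "2.2".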


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Wed Jan  7 10:52:52 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 7 Jan 2009 10:52:52 -0500
Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and
	write_to_string() are inefficient and don't check inputs
In-Reply-To: <bug-2711-42@http.bugzilla.open-bio.org/>
Message-ID: <200901071552.n07FqqcX021811@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2711


------- Comment #27 from bsouthey at gmail.com  2009-01-07 10:52 EST -------
(In reply to comment #25)
This is mainly a reportlab issue (API and version problem) and, as Peter
said, a style issue. So the only remaining issue is a unit test that at
least checks for the presence of renderPM, because of reportlab versions
older than 2.2.
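
A minimal sketch of such a check, assuming that a missing renderPM shows up
as an ImportError when reportlab.graphics.renderPM is imported. The helper
names optional_module and version_tuple are made up for illustration and are
not part of Biopython or reportlab:

```python
import importlib


def optional_module(name):
    """Return the imported module, or None if it cannot be imported."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


def version_tuple(text):
    """Turn a version string such as '2.2' into a comparable tuple (2, 2).

    Only the digits in each dot-separated piece are kept.
    """
    parts = []
    for piece in text.split("."):
        digits = "".join(ch for ch in piece if ch.isdigit())
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


# Assumption: renderPM lives at this import path when it has been built.
render_pm = optional_module("reportlab.graphics.renderPM")
if render_pm is None:
    print("renderPM not available; bitmap output tests should be skipped")
```

A unit test could call optional_module first and skip the bitmap cases when
it returns None, rather than relying on the reported version number alone.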



From jae at lmi.net  Thu Jan  8 17:24:21 2009
From: jae at lmi.net (Jason Eshleman)
Date: Thu, 08 Jan 2009 14:24:21 -0800
Subject: [Biopython-dev] Structure and LDNe
In-Reply-To: <496397C9.3030706@gmail.com>
References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com>
	<496397C9.3030706@gmail.com>
Message-ID: <6.1.2.0.2.20090108141534.0323a5f0@mail.lmi.net>

Greetings all,

Presently, the code I have for dealing with STRUCTURE is similar to the 
code for interacting with Clustal, in that it does not modify any of the 
STRUCTURE source code but merely initiates the compiled executable.

Initially, I have used my code in place of their Java front end as it 
allows for more control of the run-time variables for successive runs with 
varying run parameters.  At some point, I'd like to get it to interface 
more directly with the STRUCTURE code to be able to pipe results directly 
to python for parsing rather than working with the STRUCTURE text output 
but that's a ways off still.


-Jason


At 09:41 AM 1/6/2009, Bruce Southey wrote:
>Tiago Antão wrote:
>>Hi all,
>>
>>Jason Eshleman (he subscribes to this list also) has made available
>>code to interact with Structure (a widely used application in
>>population genetics - the 2 papers related to it have around 3000
>>citations according to Google scholar). We will try to convert his code
>>to the Bio.PopGen namespace, create documentation and test cases.
>>To this we add the existing LDNe code (mine). This all should be ready
>>in a reasonably fast time frame (I suppose before the next release).
>>
>>The all important statistics part is still due, I am afraid (I don't
>>know if anybody has looked at the beta code on git). But at least this
>>LDNe and Structure code will be ready to go soon.
>>
>>Tiago
>>_______________________________________________
>>Biopython-dev mailing list
>>Biopython-dev at lists.open-bio.org
>>http://lists.open-bio.org/mailman/listinfo/biopython-dev
>>
>Hi,
>What are the licenses for LDNe and Structure?
>Saying just 'free' is insufficient because it is not clear which 
>definition is being used.
>
>Also, please ensure that none of the code that is included into Biopython 
>is a derivative of LDNe and Structure unless these have an explicit 
>license that is compatible with Biopython.  For example, 'copying' an 
>existing function into Python would be considered a derivative. Obviously 
>reading a documented output is probably not considered a derivative.
>
>I prefer to be proactive with licenses so these don't bite back, as has 
>happened in some formally open source projects or with the use of unclean 
>code sources. A current example of this is that the current release of scipy 
>0.7 has been significantly delayed due to some major effort to check 
>various functions that reference the Numerical Recipes book (which has an 
>incompatible license).
>
>Anyhow, this sounds good!
>
>Bruce
>_______________________________________________
>Biopython-dev mailing list
>Biopython-dev at lists.open-bio.org
>http://lists.open-bio.org/mailman/listinfo/biopython-dev


From bugzilla-daemon at portal.open-bio.org  Fri Jan  9 07:50:37 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 9 Jan 2009 07:50:37 -0500
Subject: [Biopython-dev] [Bug 2727] PDB.Bio: header should include CRYST1
	information
In-Reply-To: <bug-2727-42@http.bugzilla.open-bio.org/>
Message-ID: <200901091250.n09Cob1q021245@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2727


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-09 07:50 EST -------
Hopefully Bio.PDB's owner/maintainer Thomas Hamelryck can comment on this.

In the meantime, the code style seems to fit fine with the rest of
parse_pdb_header.py which is good.  However, you have not updated the
parse_pdb_header function's docstring to include the new keys.  Furthermore, it
would be nice to have the docstring describe the meaning of the cell, z-cell
and spacegroup entries you have introduced.  I'm also curious about the default
values and their meanings.
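
To make the request concrete, here is a rough sketch of how the CRYST1 fields
could be pulled out and described. It assumes the fixed-column layout of the
CRYST1 record (a, b, c, then alpha, beta, gamma, then the space group, then
the Z value). The function name parse_cryst1 and the key name "z" are
hypothetical and are not taken from the patch:

```python
def parse_cryst1(line):
    """Split one CRYST1 record into unit cell, space group and Z value.

    Returns a dictionary with these keys (names are illustrative only):
      cell       - tuple (a, b, c, alpha, beta, gamma); lengths, then angles
      spacegroup - space group symbol as a string, e.g. 'P 21 21 21'
      z          - the Z value as an integer
    """
    if not line.startswith("CRYST1"):
        raise ValueError("not a CRYST1 record")
    # Column ranges are 0-based, end-exclusive slices of the fixed-width record.
    columns = [(6, 15), (15, 24), (24, 33), (33, 40), (40, 47), (47, 54)]
    cell = tuple(float(line[start:end]) for start, end in columns)
    return {
        "cell": cell,
        "spacegroup": line[55:66].strip(),
        "z": int(line[66:70]),
    }


# Build an example record with format strings so the columns line up.
record = "CRYST1%9.3f%9.3f%9.3f%7.2f%7.2f%7.2f %-11s%4d" % (
    52.0, 58.6, 61.9, 90.0, 90.0, 90.0, "P 21 21 21", 8)
header = parse_cryst1(record)
print(header)
```

A docstring along these lines would also be the natural place to state what
default values are used when a file has no CRYST1 record.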

Peter



From rhythmbox-devel at maubp.freeserve.co.uk  Fri Jan  9 07:55:13 2009
From: rhythmbox-devel at maubp.freeserve.co.uk (Peter)
Date: Fri, 9 Jan 2009 12:55:13 +0000
Subject: [Biopython-dev] Structure and LDNe
In-Reply-To: <6d941f120901070354o70b6c99ah37ffdb38a1af7554@mail.gmail.com>
References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com>
	<320fb6e00901060201s3830c972w4638f5eefcd42b6a@mail.gmail.com>
	<6d941f120901060852r482baf16m6b8399959b3c1aaa@mail.gmail.com>
	<5aa3b3570901070237s487a4307hb68fa69abc3cb23d@mail.gmail.com>
	<6d941f120901070354o70b6c99ah37ffdb38a1af7554@mail.gmail.com>
Message-ID: <320fb6e00901090455y334ecebdo55cd7c1a718ab499@mail.gmail.com>

On Wed, Jan 7, 2009 at 11:54 AM, Tiago Antão <tiagoantao at gmail.com> wrote:
>
> Considering that CVS has no development branch I think having git is
> very good. I would just recommend extreme care with changing existing
> code. When merging back into CVS, changes to existing code might not
> go in (especially if they change interfaces) or be delayed.
>

If there is a strong interest in having experimental branches in the
official Biopython repository, we could discuss that as an option.
Although I would prefer we get moved from CVS to SVN first before
actually doing this, in order to keep the migration as simple as
possible.

Peter


From biopython at maubp.freeserve.co.uk  Fri Jan  9 07:59:00 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 9 Jan 2009 12:59:00 +0000
Subject: [Biopython-dev] Structure and LDNe
In-Reply-To: <6.1.2.0.2.20090108141534.0323a5f0@mail.lmi.net>
References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com>
	<496397C9.3030706@gmail.com>
	<6.1.2.0.2.20090108141534.0323a5f0@mail.lmi.net>
Message-ID: <320fb6e00901090459x52976449gddcc4483699e0f56@mail.gmail.com>

On Thu, Jan 8, 2009 at 10:24 PM, Jason Eshleman <jae at lmi.net> wrote:
> Greetings all,
>
> Presently, the code I have for dealing with STRUCTURE is similar to the code
> for interacting with Clustal, in that it does not modify any of the STRUCTURE
> source code but merely initiates the compiled executable.

Biopython has code for interacting with lots of command line tools,
and this neatly avoids any copyright/licence questions about being a
derived work.
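
As a rough illustration of that pattern (not Jason's code, and not an existing
Biopython wrapper), the external program is started as a separate process and
only its text output is read back. The helper name run_tool is made up, and
the Python interpreter stands in for the real executable so that no STRUCTURE
options have to be guessed:

```python
import subprocess
import sys


def run_tool(command, input_text=None):
    """Run an external program and return (return_code, stdout, stderr)."""
    completed = subprocess.run(
        command,
        input=input_text,
        capture_output=True,
        text=True,
    )
    return completed.returncode, completed.stdout, completed.stderr


# Stand-in for a real executable: Python itself, printing one line of "output".
code, out, err = run_tool([sys.executable, "-c", "print('K=3 done')"])
print(code, out.strip())
```

Successive runs with varying parameters then become a loop that builds a
different command list each time and parses the captured output.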

> Initially, I have used my code in place of their Java front end as it allows
> for more control of the run-time variables for successive runs with varying
> run parameters.  At some point, I'd like to get it to interface more
> directly with the STRUCTURE code to be able to pipe results directly to
> python for parsing rather than working with the STRUCTURE text output but
> that's a ways off still.

I'm not quite clear what you have in mind, but this would probably
need a little more thought from the legal perspective.  If STRUCTURE
provides an API with header files you can compile against, that should
be OK (but I am not a lawyer).  Note that doing this within Biopython
would then mean adding another build time dependency, which would need
to be justified in terms of the benefits it brings.

Peter

From bsouthey at gmail.com  Fri Jan  9 09:46:15 2009
From: bsouthey at gmail.com (Bruce Southey)
Date: Fri, 09 Jan 2009 08:46:15 -0600
Subject: [Biopython-dev] Structure and LDNe
In-Reply-To: <320fb6e00901090455y334ecebdo55cd7c1a718ab499@mail.gmail.com>
References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com>	<320fb6e00901060201s3830c972w4638f5eefcd42b6a@mail.gmail.com>	<6d941f120901060852r482baf16m6b8399959b3c1aaa@mail.gmail.com>	<5aa3b3570901070237s487a4307hb68fa69abc3cb23d@mail.gmail.com>	<6d941f120901070354o70b6c99ah37ffdb38a1af7554@mail.gmail.com>
	<320fb6e00901090455y334ecebdo55cd7c1a718ab499@mail.gmail.com>
Message-ID: <49676337.7050504@gmail.com>

Peter wrote:
> On Wed, Jan 7, 2009 at 11:54 AM, Tiago Antão <tiagoantao at gmail.com> wrote:
>   
>> Considering that CVS has no development branch I think having git is
>> very good. I would just recommend extreme care with changing existing
>> code. When merging back into CVS, changes to existing code might not
>> go in (especially if they change interfaces) or be delayed.
>>
>>     
>
> If there is a strong interest in having experimental branches in the
> official Biopython repository, we could discuss that as an option.
> Although I would prefer we get moved from CVS to SVN first before
> actually doing this, in order to keep the migration as simple as
> possible.
>
> Peter
>
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>   

I agree that it is essential to move from CVS before doing this, but that 
does not prevent any discussion.

So I'll start a thread.

Bruce


From bugzilla-daemon at portal.open-bio.org  Fri Jan  9 10:59:40 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 9 Jan 2009 10:59:40 -0500
Subject: [Biopython-dev] [Bug 2729] New: Importing Bio.SeqUtils before
	importing pylab gives a "Bus Error"
Message-ID: <bug-2729-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2729

           Summary: Importing Bio.SeqUtils before importing pylab gives a
                    "Bus Error"
           Product: Biopython
           Version: 1.49
          Platform: Macintosh
        OS/Version: Mac OS
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: stephan_schiffels at mac.com


I use the newest cvs version of biopython (2009 Jan 09) and matplotlib 0.90.0
The following two lines crash:

import Bio.SeqUtils
import pylab

I nailed down the problem to lines 122 through 125 in Bio/SeqUtils/__init__.py.
Commenting out these four lines SOLVES the bug for me, since I don't use the
graphics-functions in the SeqUtils package

Best,
Stephan



From bsouthey at gmail.com  Fri Jan  9 11:18:26 2009
From: bsouthey at gmail.com (Bruce Southey)
Date: Fri, 09 Jan 2009 10:18:26 -0600
Subject: [Biopython-dev] Developmental and experimental branches
In-Reply-To: <320fb6e00901090455y334ecebdo55cd7c1a718ab499@mail.gmail.com>
References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com>	<320fb6e00901060201s3830c972w4638f5eefcd42b6a@mail.gmail.com>	<6d941f120901060852r482baf16m6b8399959b3c1aaa@mail.gmail.com>	<5aa3b3570901070237s487a4307hb68fa69abc3cb23d@mail.gmail.com>	<6d941f120901070354o70b6c99ah37ffdb38a1af7554@mail.gmail.com>
	<320fb6e00901090455y334ecebdo55cd7c1a718ab499@mail.gmail.com>
Message-ID: <496778D2.1050801@gmail.com>

Hi,
In a previous thread (and indicated in others) it was suggested that 
perhaps Biopython needs some type of development or experimental 
branch. So this thread is intended to provide some discussion on this, 
and assumes that Biopython has moved to SVN. I think it is a very 
relevant discussion because Biopython needs an effective approach, 
mainly to handle new code but also to handle significant rewrites of older code.

The most important question is do you support creating developmental and 
experimental branches or not?

However, I do not think that this has a yes or no answer and I am not 
concerned about the question at the present time.  Rather I am concerned 
about the burden placed on the maintainers (especially Peter and 
Michiel), the expression of the developers' needs and how this impacts the 
community. I am rather neutral on it (probably because I have not 
contributed any major code to Biopython) but I would like to ensure that 
the discussion leads to positive changes.

I find Biopython interesting and special for various reasons. There is a 
solid core of functions that are common to many aspects of 
bioinformatics. But it also contains very specialized code that has a 
much smaller audience. Consequently certain parts get considerable 
exposure and other parts get limited or no exposure. This means that it 
may be necessary to release beta versions in order to get the necessary 
exposure as I assume that code has had sufficient development to be 
released in the first place. Creating developmental and experimental 
branches is one way to get this exposure but perhaps branches are not 
necessary.

An alternative approach is creating specialized projects within 
Biopython that can be used for development and testing. For example, 
Scipy provides SciKits, which are related packages that are typically special 
purpose or released under a different license than scipy/numpy. These 
replaced the sandboxes that existed in prior versions of numpy and 
scipy. But a problem that recently arose in numpy was how to get code from 
such a location into numpy; creating an experimental section in the main 
distribution was suggested, but that met some strong resistance.

Therefore, I see the following issues that need to be addressed 
regardless of the approach taken:

0) Must be easy for project maintenance and release as this must not 
create an extra burden to Biopython!
1) Ensure adequate testing is performed especially to get it out to the 
appropriate audience and to correct the code and APIs. I consider this 
rather important because I tend to follow a type of user experience 
design (http://en.wikipedia.org/wiki/User_experience_design) and 
software prototyping (http://en.wikipedia.org/wiki/Software_prototyping) 
for software development.
2) Stabilization of APIs for backwards compatibility as we don't want to 
change these with each Biopython release.
3) Adequate test coverage especially across platforms and different 
software versions. For example Windows paths and older software versions 
can cause problems on other people's machines but not yours.
4) Some type of code review even if it is just to ensure a consistent 
format (like spaces versus tabs) or compatibility across Python versions 
and platforms.
5) If developmental or experimental branches are used, then how does the 
code move into the main distribution and how are these branches created 
and destroyed?

Please add other issues.

I would appreciate these issues being addressed when appropriate.

Regards
Bruce

Peter wrote:
> On Wed, Jan 7, 2009 at 11:54 AM, Tiago Antão <tiagoantao at gmail.com> wrote:
>   
>> Considering that CVS has no development branch I think having git is
>> very good. I would just recommend extreme care with changing existing
>> code. When merging back into CVS, changes to existing code might not
>> go in (especially if they change interfaces) or be delayed.
>>
>>     
>
> If there is a strong interest in having experimental branches in the
> official Biopython repository, we could discuss that as an option.
> Although I would prefer we get moved from CVS to SVN first before
> actually doing this, in order to keep the migration as simple as
> possible.
>
> Peter
>
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>   


From bugzilla-daemon at portal.open-bio.org  Fri Jan  9 11:27:08 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 9 Jan 2009 11:27:08 -0500
Subject: [Biopython-dev] [Bug 2729] Importing Bio.SeqUtils before importing
	pylab gives a "Bus Error"
In-Reply-To: <bug-2729-42@http.bugzilla.open-bio.org/>
Message-ID: <200901091627.n09GR88l003529@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2729


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-09 11:27 EST -------
i.e. these lines?

try:
    from Tkinter import *
except ImportError:
    pass

What happens with just "import Tkinter" on your machine?

Are you using the default Apple installed copy of python?

I can see why this might cause trouble if Tkinter does some initialisation at
import time.  Could you include the actual crash/traceback error please?

Note I see no crash on my MacOS machine (not sure which version of pylab) which
has Tkinter.  Nor do I see a crash on one of my linux machines (again, not sure
which pylab) which does NOT have Tkinter.



From bugzilla-daemon at portal.open-bio.org  Fri Jan  9 11:33:59 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 9 Jan 2009 11:33:59 -0500
Subject: [Biopython-dev] [Bug 2729] Importing Bio.SeqUtils before importing
	pylab gives a "Bus Error"
In-Reply-To: <bug-2729-42@http.bugzilla.open-bio.org/>
Message-ID: <200901091633.n09GXxDS004117@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2729


------- Comment #2 from mdehoon at ims.u-tokyo.ac.jp  2009-01-09 11:33 EST -------
(In reply to comment #0)
> I use the newest cvs version of biopython (2009 Jan 09) and matplotlib 0.90.0
> The following two lines crash:
> 
> import Bio.SeqUtils
> import pylab
> 
What do you mean by crash?
Also, do you get the same problem with the latest matplotlib (0.98.4 I
believe)?
If

try:
    from Tkinter import *
except ImportError:
    pass
import pylab

crashes, then this is not a Biopython bug.



From bugzilla-daemon at portal.open-bio.org  Fri Jan  9 11:45:52 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 9 Jan 2009 11:45:52 -0500
Subject: [Biopython-dev] [Bug 2729] Importing Bio.SeqUtils before importing
	pylab gives a "Bus Error"
In-Reply-To: <bug-2729-42@http.bugzilla.open-bio.org/>
Message-ID: <200901091645.n09GjqFV004905@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2729


------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-09 11:45 EST -------
Created an attachment (id=1189)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1189&action=view)
Patch to Bio/SeqUtils/__init__.py to move the Tkinter imports

This patch moves the Tkinter import back into the xGC_skew function as
suggested by the old comments in the code, and uses an explicit import list
instead of "import *".  For the history of this bit of code, see the deleted
file Bio/sequtils.py in CVS.

I think this is a worthwhile little bit of clean up - but it probably won't have
any effect on Stephan's issue with Tkinter/pylab.
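
For readers without the attachment, the general shape of that kind of change
looks roughly like the sketch below. This is not the patch itself: the
function names gc_skew_values and show_gc_skew are invented for the example,
and it uses the Python 3 module name tkinter rather than Tkinter.

```python
def gc_skew_values(seq, window=100):
    """Return (G - C) / (G + C) for each window; no GUI toolkit needed."""
    values = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window].upper()
        g = chunk.count("G")
        c = chunk.count("C")
        values.append((g - c) / (g + c) if (g + c) else 0.0)
    return values


def show_gc_skew(seq, window=100):
    """Draw the skew values; the GUI import happens only when this is called."""
    # Explicit names instead of "import *", and deferred until needed, so that
    # merely importing this module never touches the GUI toolkit.
    from tkinter import Tk, Canvas

    root = Tk()
    canvas = Canvas(root, width=400, height=200)
    canvas.pack()
    for i, value in enumerate(gc_skew_values(seq, window)):
        x = 10 + i * 4
        canvas.create_line(x, 100, x, 100 - value * 90)
    root.mainloop()


print(gc_skew_values("GGGGCC", window=6))
```

With this layout, code that only needs the numbers never triggers the GUI
import at all.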



From bugzilla-daemon at portal.open-bio.org  Fri Jan  9 11:53:23 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 9 Jan 2009 11:53:23 -0500
Subject: [Biopython-dev] [Bug 2729] Importing Bio.SeqUtils before importing
	pylab gives a "Bus Error"
In-Reply-To: <bug-2729-42@http.bugzilla.open-bio.org/>
Message-ID: <200901091653.n09GrN6W005481@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2729


------- Comment #4 from stephan_schiffels at mac.com  2009-01-09 11:53 EST -------
Hi,
importing Tkinter works fine. Only calling import pylab after it crashes... (no
traceback... just "bus error").
Here is the shell-output:

mac14:~ stschiff$ python
Python 2.5 (r25:51908, Apr 19 2007, 16:49:06) 
[GCC 4.0.1 (Apple Computer, Inc. build 5250)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import Tkinter
>>> import pylab
Bus error
mac14:~ stschiff$ 

The weirdest thing is that calling the other way around works fine:

mac14:~ stschiff$ python
Python 2.5 (r25:51908, Apr 19 2007, 16:49:06) 
[GCC 4.0.1 (Apple Computer, Inc. build 5250)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pylab
>>> import Tkinter
>>> 

The same holds for first calling pylab and then Bio.SeqUtils...
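
One way to compare the two orders without losing the interpreter each time is
to run every order in a child process and look at its exit status. This is
only a sketch: try_import_order is a made-up helper, and the module name is
spelled tkinter as in Python 3 (Tkinter on Python 2).

```python
import subprocess
import sys


def try_import_order(modules):
    """Import the given modules in order in a fresh interpreter.

    Returns the child's exit status: 0 means every import succeeded, a
    positive value means Python exited with an error, and on POSIX systems
    a negative value means the child was killed by a signal.
    """
    statements = "; ".join("import " + name for name in modules)
    result = subprocess.run(
        [sys.executable, "-c", statements],
        capture_output=True,
        text=True,
    )
    return result.returncode


for order in (["tkinter", "pylab"], ["pylab", "tkinter"]):
    print(order, "->", try_import_order(order))
```

A crash such as a bus error would show up here as a negative status for one
order only, while the parent script keeps running.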

I don't know, it could be that this is just a pathological case on my specific
setup. It's still weird though, since matplotlib uses GTK on X11 on my machine,
not Tkinter... I don't get it.

Maybe this is not a biopython bug after all... sorry and thanks anyway for your
concern

Stephan
(In reply to comment #1)
> i.e. these lines?
> 
> try:
>     from Tkinter import *
> except ImportError:
>     pass
> 
> What happens with just "import Tkinter" on your machine?
> 
> Are you using the default Apple installed copy of python?
> 
> I can see why this might cause trouble if Tkinter does some initialisation at
> import time.  Could you include the actual crash/traceback error please?
> 
> Note I see no crash on my MacOS machine (not sure which version of pylab) which
> has Tkinter.  Nor do I see a crash on one of my linux machines (again, not sure
> which pylab) which does NOT have TKinter.
> 



From bugzilla-daemon at portal.open-bio.org  Fri Jan  9 12:10:10 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 9 Jan 2009 12:10:10 -0500
Subject: [Biopython-dev] [Bug 2729] Importing Bio.SeqUtils before importing
	pylab gives a "Bus Error"
In-Reply-To: <bug-2729-42@http.bugzilla.open-bio.org/>
Message-ID: <200901091710.n09HAA5c006886@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2729


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |INVALID


------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-09 12:10 EST -------
(In reply to comment #4)
> Hi,
> importing Tkinter works fine. Only calling import pylab after it crashes...
> (no traceback... just "bus error").

You could try going to Applications, Utilities, Console on your Mac to look for
any error log associated with the bus error.

> Here is the shell-output:
> 
> mac14:~ stschiff$ python
> Python 2.5 (r25:51908, Apr 19 2007, 16:49:06) 
> [GCC 4.0.1 (Apple Computer, Inc. build 5250)] on darwin
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import Tkinter
> >>> import pylab
> Bus error
> mac14:~ stschiff$ 

OK - that does seem to confirm that it's a bug with pylab, and therefore isn't
Biopython's fault.  I'm going to close this bug.

I would suggest you update your installation of pylab, and if it still goes
wrong, file a bug with pylab.

Thanks anyway,

Peter



From bugzilla-daemon at portal.open-bio.org  Fri Jan  9 12:10:52 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 9 Jan 2009 12:10:52 -0500
Subject: [Biopython-dev] [Bug 2729] Importing Bio.SeqUtils before importing
	pylab gives a "Bus Error"
In-Reply-To: <bug-2729-42@http.bugzilla.open-bio.org/>
Message-ID: <200901091710.n09HAqh1006971@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2729


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1189 is|0                           |1
           obsolete|                            |


------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-09 12:10 EST -------
(From update of attachment 1189)
This didn't turn out to be related to Bug 2729 after all.

However, I've checked it in anyway.



From dalloliogm at gmail.com  Fri Jan  9 12:17:53 2009
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Fri, 9 Jan 2009 18:17:53 +0100
Subject: [Biopython-dev] Developmental and experimental branches
In-Reply-To: <496778D2.1050801@gmail.com>
References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com>
	<320fb6e00901060201s3830c972w4638f5eefcd42b6a@mail.gmail.com>
	<6d941f120901060852r482baf16m6b8399959b3c1aaa@mail.gmail.com>
	<5aa3b3570901070237s487a4307hb68fa69abc3cb23d@mail.gmail.com>
	<6d941f120901070354o70b6c99ah37ffdb38a1af7554@mail.gmail.com>
	<320fb6e00901090455y334ecebdo55cd7c1a718ab499@mail.gmail.com>
	<496778D2.1050801@gmail.com>
Message-ID: <5aa3b3570901090917m7c82fb17hb4c93235702b120b@mail.gmail.com>

On Fri, Jan 9, 2009 at 5:18 PM, Bruce Southey <bsouthey at gmail.com> wrote:
> Hi,
> In a previous thread (and indicated in others) it was suggested that perhaps
> Biopython needs some type of development  or experimental branch. So this
> thread is orientated to provide some discussion on this and considers that
> Biopython has moved to SVN.

Maybe you can consider the approach at the basis of git, in which
every developer works on their personal branch, and the owner of the
'official branch' can decide whether or not to accept the changes
contributed by the individual branches.

If you want to play a bit with it, you can use my repository at github:
- http://github.com/dalloliogm/biopython---popgen/commits/master
and then create a fork from it.
I am sorry that you will have to create an account on github, but I
don't know of any other free hosting service for git repositories.

Git also has other advantages over svn, like working locally (which
is done by creating a local branch internally) and being faster (this
is what they say).
Well, I am not a git guru, but I can suggest some good videos,
like this one:
- http://excess.org/article/2008/07/ogre-git-tutorial/


> I think it is very relevant discussion because
> Biopython needs an effective approach to mainly handle new code but also
> handle significant rewrites of older code.
>
> The most important question is do you support creating developmental and
> experimental branches or not?
>
> Please add other issues.
>
> I would appreciate these issues being addressed when appropriate.
>
> Regards
> Bruce
>
> Peter wrote:
>>
>> On Wed, Jan 7, 2009 at 11:54 AM, Tiago Antão <tiagoantao at gmail.com> wrote:
>>
>>>
>>> Considering that CVS has no development branch I think having git is
>>> very good. I would just recommend extreme care with changing existing
>>> code. When merging back into CVS, changes to existing code might not
>>> go in (especially if they change interfaces) or be delayed.
>>>
>>>
>>
>> If there is a strong interest in having experimental branches in the
>> official Biopython repository, we could discuss that as an option.
>> Although I would prefer we get moved from CVS to SVN first before
>> actually doing this, in order to keep the migration as simple as
>> possible.
>>
>> Peter
>>
>> _______________________________________________
>> Biopython-dev mailing list
>> Biopython-dev at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>>
>
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>


-- 

My blog on bioinformatics (now in English): http://bioinfoblog.it


From biopython at maubp.freeserve.co.uk  Fri Jan  9 12:28:06 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 9 Jan 2009 17:28:06 +0000
Subject: [Biopython-dev] Developmental and experimental branches
In-Reply-To: <5aa3b3570901090917m7c82fb17hb4c93235702b120b@mail.gmail.com>
References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com>
	<320fb6e00901060201s3830c972w4638f5eefcd42b6a@mail.gmail.com>
	<6d941f120901060852r482baf16m6b8399959b3c1aaa@mail.gmail.com>
	<5aa3b3570901070237s487a4307hb68fa69abc3cb23d@mail.gmail.com>
	<6d941f120901070354o70b6c99ah37ffdb38a1af7554@mail.gmail.com>
	<320fb6e00901090455y334ecebdo55cd7c1a718ab499@mail.gmail.com>
	<496778D2.1050801@gmail.com>
	<5aa3b3570901090917m7c82fb17hb4c93235702b120b@mail.gmail.com>
Message-ID: <320fb6e00901090928u662023d6rd6f2d82c5fbb7365@mail.gmail.com>

On Fri, Jan 9, 2009 at 5:17 PM, Giovanni Marco Dall'Olio
<dalloliogm at gmail.com> wrote:
> On Fri, Jan 9, 2009 at 5:18 PM, Bruce Southey <bsouthey at gmail.com> wrote:
>> Hi,
>> In a previous thread (and indicated in others) it was suggested that perhaps
>> Biopython needs some type of development  or experimental branch. So this
>> thread is orientated to provide some discussion on this and considers that
>> Biopython has moved to SVN.
>
> Maybe you can consider the approach at the basis of git, in which
> every developer works on its personal branch, and the owner of the
> 'official branch' can decide whether to accept the changes apported by
> the single branches or not.

In some ways this describes the current situation but without the
software: The CVS/SVN repository is the master official branch which
we (as a group) try and keep pretty stable.  When working on new
modules, individual developers or contributors have hacked away on
their own machines (perhaps using a local repository - I tended to
just save versioned snapshots of work in progress), and commit things
to the master once it was sufficiently stable to be approved.  For
self contained modules, this works OK - although using something like
git would be a bit more formalised and automated, and allow this kind
of "work in progress" to be done openly.

Peter

From dalloliogm at gmail.com  Fri Jan  9 12:43:26 2009
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Fri, 9 Jan 2009 18:43:26 +0100
Subject: [Biopython-dev] Developmental and experimental branches
In-Reply-To: <320fb6e00901090928u662023d6rd6f2d82c5fbb7365@mail.gmail.com>
References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com>
	<320fb6e00901060201s3830c972w4638f5eefcd42b6a@mail.gmail.com>
	<6d941f120901060852r482baf16m6b8399959b3c1aaa@mail.gmail.com>
	<5aa3b3570901070237s487a4307hb68fa69abc3cb23d@mail.gmail.com>
	<6d941f120901070354o70b6c99ah37ffdb38a1af7554@mail.gmail.com>
	<320fb6e00901090455y334ecebdo55cd7c1a718ab499@mail.gmail.com>
	<496778D2.1050801@gmail.com>
	<5aa3b3570901090917m7c82fb17hb4c93235702b120b@mail.gmail.com>
	<320fb6e00901090928u662023d6rd6f2d82c5fbb7365@mail.gmail.com>
Message-ID: <5aa3b3570901090943t37b14a4gfd7228eb747f2866@mail.gmail.com>

On Fri, Jan 9, 2009 at 6:28 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> On Fri, Jan 9, 2009 at 5:17 PM, Giovanni Marco Dall'Olio
> <dalloliogm at gmail.com> wrote:
>> On Fri, Jan 9, 2009 at 5:18 PM, Bruce Southey <bsouthey at gmail.com> wrote:
>>> Hi,
>>> In a previous thread (and indicated in others) it was suggested that perhaps
>>> Biopython needs some type of development  or experimental branch. So this
>>> thread is orientated to provide some discussion on this and considers that
>>> Biopython has moved to SVN.
>>
>> Maybe you can consider the approach at the basis of git, in which
>> every developer works on its personal branch, and the owner of the
>> 'official branch' can decide whether to accept the changes apported by
>> the single branches or not.
>
> In some ways this describes the current situation but without the
> software: The CVS/SVN repository is the master official branch which
> we (as a group) try and keep pretty stable.  When working on new
> modules, individual developers or contributors have hacked away on
> their own machines (perhaps using a local repository - I tended to
> just save versioned snapshots of work in progress), and commit things
> to the master once it was sufficiently stable to be approved.  For
> self contained modules, this works OK - although using something like
> git would be a bit more formalised and automated, and allow this kind
> of "work in progress" to be done openly.

Just a note: since I was trying to simplify the concept, I said
something which is not entirely correct.
In git, you do not need to have a central repository. Everyone has
their own personal branch and there is no such thing as an 'official
branch', unless it is defined by convention.

For example, look at this graph:
- http://github.com/blog/39-say-hello-to-the-network-graph-visualizer
On March 6th someone created a fork to work on MySQL support,
which has not been merged into the official branch yet.

There are many other forks, too: which one is the official one?
The answer is none of them, but if the authors wanted, they could have
created a repository, decided that it was the official one, and
kept it up to date.




-- 

My blog on bioinformatics (now in English): http://bioinfoblog.it

From biopython at maubp.freeserve.co.uk  Fri Jan  9 12:49:43 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 9 Jan 2009 17:49:43 +0000
Subject: [Biopython-dev] Developmental and experimental branches
In-Reply-To: <5aa3b3570901090943t37b14a4gfd7228eb747f2866@mail.gmail.com>
References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com>
	<320fb6e00901060201s3830c972w4638f5eefcd42b6a@mail.gmail.com>
	<6d941f120901060852r482baf16m6b8399959b3c1aaa@mail.gmail.com>
	<5aa3b3570901070237s487a4307hb68fa69abc3cb23d@mail.gmail.com>
	<6d941f120901070354o70b6c99ah37ffdb38a1af7554@mail.gmail.com>
	<320fb6e00901090455y334ecebdo55cd7c1a718ab499@mail.gmail.com>
	<496778D2.1050801@gmail.com>
	<5aa3b3570901090917m7c82fb17hb4c93235702b120b@mail.gmail.com>
	<320fb6e00901090928u662023d6rd6f2d82c5fbb7365@mail.gmail.com>
	<5aa3b3570901090943t37b14a4gfd7228eb747f2866@mail.gmail.com>
Message-ID: <320fb6e00901090949v695333ak2615e9c217bc1387@mail.gmail.com>

> just a note: since I was trying to simplify the concept, I said
> something which is not particularly correct.
> In git, you are not needed to have a central repository. Everyone has
> its personal branch and there is not such thing as an 'official
> branch', unless it is defined by convention.

If we did want to adopt a git style approach, I do think we need an
official branch which would be used for the releases and installers
hosted on biopython.org, and this branch would be managed in much the
same way as we do now with CVS/SVN.

I think this would be essential for avoiding confusion among typical end users.

Peter

From bartek at rezolwenta.eu.org  Fri Jan  9 13:17:09 2009
From: bartek at rezolwenta.eu.org (Bartek Wilczynski)
Date: Fri, 9 Jan 2009 19:17:09 +0100
Subject: [Biopython-dev] Developmental and experimental branches
In-Reply-To: <320fb6e00901090928u662023d6rd6f2d82c5fbb7365@mail.gmail.com>
References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com>
	<320fb6e00901060201s3830c972w4638f5eefcd42b6a@mail.gmail.com>
	<6d941f120901060852r482baf16m6b8399959b3c1aaa@mail.gmail.com>
	<5aa3b3570901070237s487a4307hb68fa69abc3cb23d@mail.gmail.com>
	<6d941f120901070354o70b6c99ah37ffdb38a1af7554@mail.gmail.com>
	<320fb6e00901090455y334ecebdo55cd7c1a718ab499@mail.gmail.com>
	<496778D2.1050801@gmail.com>
	<5aa3b3570901090917m7c82fb17hb4c93235702b120b@mail.gmail.com>
	<320fb6e00901090928u662023d6rd6f2d82c5fbb7365@mail.gmail.com>
Message-ID: <8b34ec180901091017o172e4acemf91c0a46a736bdb1@mail.gmail.com>

On Fri, Jan 9, 2009 at 6:28 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> On Fri, Jan 9, 2009 at 5:17 PM, Giovanni Marco Dall'Olio
> <dalloliogm at gmail.com> wrote:
>> On Fri, Jan 9, 2009 at 5:18 PM, Bruce Southey <bsouthey at gmail.com> wrote:
>>> Hi,
>>> In a previous thread (and indicated in others) it was suggested that perhaps
>>> Biopython needs some type of development  or experimental branch. So this
>>> thread is orientated to provide some discussion on this and considers that
>>> Biopython has moved to SVN.
>>
>> Maybe you can consider the approach at the basis of git, in which
>> every developer works on its personal branch, and the owner of the
>> 'official branch' can decide whether to accept the changes apported by
>> the single branches or not.
>
> In some ways this describes the current situation but without the
> software: The CVS/SVN repository is the master official branch which
> we (as a group) try and keep pretty stable.  When working on new
> modules, individual developers or contributors have hacked away on
> their own machines (perhaps using a local repository - I tended to
> just save versioned snapshots of work in progress), and commit things
> to the master once it was sufficiently stable to be approved.  For
> self contained modules, this works OK - although using something like
> git would be a bit more formalised and automated, and allow this kind
> of "work in progress" to be done openly.
>

It can be viewed this way, but the point here is that making this change to
the process of development might decrease the amount of work required to
join the development. Especially if you think about adding a new library
to biopython, the most sensible way to do it is to branch and then
stabilize. I've recently experienced (with Bio.Motif) that it might be
tedious even for a very simple task. Also, using a distributed version
control system, it is very easy for a small team of people to collaborate
on a branch before merging back to the main repository. In the current
mode this would be really difficult. And another benefit is that you do
not lose the history of changes made "on a branch".

As for github, it is currently used by the BioRuby project hosted on
open-bio.org. We can try to talk to them and ask about their experiences.
I'm not personally involved in it in any way, but it seems that they've
basically moved the main branch to github and update the cvs repository
only occasionally.

I think that for biopython, if we decided to use distributed version
control, it would be better to use bazaar+launchpad instead of git+github,
for the following reasons:
- it's completely free, as opposed to <300Mb for a free account on github
- launchpad could make the transition very easy. They provide a service
  of importing existing open source projects to launchpad:
  https://help.launchpad.net/VcsImports
  They convert the trunk to bazaar for us and set it up to update from
  the cvs every 6-12 hours. It would be easy then to see whether we like
  it like this or not
- bazaar is specifically aimed to be more user friendly than git, and
  allows developers to keep working in a familiar environment when moving
  from cvs or svn. I think it is important since git itself is really
  different from cvs and if we switch to anything else, everybody needs
  to learn the tool.
- they use openID, which makes it simpler for people to join (even
  though you still need another account)
- both bazaar and launchpad are developed in python, so they're more
  python oriented (while github is developed in ruby, so a better choice
  for bioruby).

More on comparing these two possibilities (from the bazaar developers'
non-objective point of view):
http://bazaar-vcs.org/BzrVsGit

These are my 2 cents on the choice of tools for development, but I
have to admit that I'm not sure whether it is needed for biopython now.
I'm very open to discussion.

-- 
Bartek Wilczynski
==================
Postdoctoral fellow
EMBL, Furlong group
Meyerhoffstrasse 1,
69012 Heidelberg,
Germany
tel: +49 6221 387 8433

From chapmanb at 50mail.com  Fri Jan  9 17:51:55 2009
From: chapmanb at 50mail.com (Brad Chapman)
Date: Fri, 9 Jan 2009 17:51:55 -0500
Subject: [Biopython-dev] Developmental and experimental branches
In-Reply-To: <8b34ec180901091017o172e4acemf91c0a46a736bdb1@mail.gmail.com>
References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com>
	<320fb6e00901060201s3830c972w4638f5eefcd42b6a@mail.gmail.com>
	<6d941f120901060852r482baf16m6b8399959b3c1aaa@mail.gmail.com>
	<5aa3b3570901070237s487a4307hb68fa69abc3cb23d@mail.gmail.com>
	<6d941f120901070354o70b6c99ah37ffdb38a1af7554@mail.gmail.com>
	<320fb6e00901090455y334ecebdo55cd7c1a718ab499@mail.gmail.com>
	<496778D2.1050801@gmail.com>
	<5aa3b3570901090917m7c82fb17hb4c93235702b120b@mail.gmail.com>
	<320fb6e00901090928u662023d6rd6f2d82c5fbb7365@mail.gmail.com>
	<8b34ec180901091017o172e4acemf91c0a46a736bdb1@mail.gmail.com>
Message-ID: <20090109225155.GF4135@sobchak.mgh.harvard.edu>

Hi all;
In terms of the coding of experimental modules, Giovanni is taking
an excellent approach. While they are under development, we can
utilize one of the many free hosting platforms to develop it as a
separate project in the Bio namespace. This allows interested users
to get the code, contribute, and test. Once an interface and
functionality are hammered out and they begin to stabilize, then it's
a good time to package it up and roll it into Biopython provided the
ol' mailing list consensus is happy.

This is a nice development model as it leverages the community, but
only rolls code into the main release when it stabilizes reasonably
well. Peter has taken a really good development methodology --
creating a rock solid stable core of modules, and actively deprecating
or fixing those that fall out of line.

My only suggestion would be to have a Biopython wiki page for the
experimental modules as they are under development. Something simple
with a description of the goals and a link to the source code would
help the majority of people who don't follow the mailing list find
and contribute to these.

Brad


> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev

From biopython at maubp.freeserve.co.uk  Sat Jan 10 09:46:13 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sat, 10 Jan 2009 14:46:13 +0000
Subject: [Biopython-dev] Developmental and experimental branches
In-Reply-To: <20090109225155.GF4135@sobchak.mgh.harvard.edu>
References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com>
	<6d941f120901060852r482baf16m6b8399959b3c1aaa@mail.gmail.com>
	<5aa3b3570901070237s487a4307hb68fa69abc3cb23d@mail.gmail.com>
	<6d941f120901070354o70b6c99ah37ffdb38a1af7554@mail.gmail.com>
	<320fb6e00901090455y334ecebdo55cd7c1a718ab499@mail.gmail.com>
	<496778D2.1050801@gmail.com>
	<5aa3b3570901090917m7c82fb17hb4c93235702b120b@mail.gmail.com>
	<320fb6e00901090928u662023d6rd6f2d82c5fbb7365@mail.gmail.com>
	<8b34ec180901091017o172e4acemf91c0a46a736bdb1@mail.gmail.com>
	<20090109225155.GF4135@sobchak.mgh.harvard.edu>
Message-ID: <320fb6e00901100646y6132686ap8a928404dd1e36c3@mail.gmail.com>

On Fri, Jan 9, 2009 at 10:51 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
> Hi all;
> In terms of the coding of experimental modules, Giovanni is taking
> an excellent approach. While they are under development, we can
> utilize one of the many free hosting platforms to develop it as a
> separate project in the Bio namespace. This allows interested users
> to get the code, contribute, and test. Once an interface and
> functionality is hammered out and they begin to stabilize, then it's
> a good time to package it up and roll it into Biopython provided the
> ol' mailing list consensus is happy.

This does describe recent large additions fairly well - such as
Bio.SeqIO, Bio.AlignIO, Bio.Entrez, Bio.PopGen and most recently
Bio.Graphics.GenomeDiagram (which is a little different in that it was
previously publicly available as a separate module).

Modifications to existing bits of code (for example I have some
proposals for Seq, SeqRecord and Alignment objects as enhancement
bugs) don't really work in the same way - but also by their nature
require more discussion because they can indirectly affect a lot of
code.

> This is a nice development model as it leverages the community, but
> only rolls code into the main release when it stabilizes reasonable
> well. Peter has taken a really good development methodology --
> creating a rock solid stable core of modules, and actively deprecating
> or fixing those that fall out of line.

I really don't deserve all the credit here - Michiel has also been a
strong proponent of this "spring cleaning" as needed, for example how
our NCBI online bits have been rationalised, refocusing on Bio.Entrez
as the preferred module.

> My only suggestion would be to have a Biopython wiki page for the
> experimental modules as they are under development. Something simple
> with a description of the goals and a link to the source code would
> help the majority of people who don't follow the mailing list find
> and contribute to these.

Using the wiki in this way is a nice idea.  Tiago - do you fancy
adding a PopGen page describing the additions you're working on?  As a
bonus, once these do get into the main repository, you may find the
wiki text will be a useful basis for extending the documentation.

Peter

From mjldehoon at yahoo.com  Sat Jan 10 11:30:07 2009
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sat, 10 Jan 2009 08:30:07 -0800 (PST)
Subject: [Biopython-dev] Rethinking Biopython's testing framework
In-Reply-To: <5aa3b3570812301034r3633ebe0k937e33c731e69ccd@mail.gmail.com>
Message-ID: <126502.76038.qm@web62403.mail.re1.yahoo.com>

> > We could discuss a modification to run_tests.py so
> > that if there is no expected output file
> > output/test_XXX for test_XXX.py we just run
> > test_XXX.py and check its return value (I think
> > Michiel had previously
> > suggested something like this).
> 
> I think this should be done inside the test itself.
> All the tests should return only a boolean value (passed or
> not) and a description of the error.
> The tests that make use of an expected output file, they
> should open it and do the comparison by themselves, not in
> run_tests.py.

Sounds attractive, but there is one complication for print-and-compare tests. The code that does the print-and-compare is not trivial (see run_tests.py). It is possible to have the print-and-compare code in a helper module, which is then imported by each print-and-compare test. Still, while currently the print-and-compare tests have the advantage of being simple, they will get more complicated if we require the print-and-compare to be part of each test.

Does anybody have an opinion on this? It's either doing the print-and-compare as part of each print-and-compare test script, or requiring a test_suite() function in each unittest-based test script, and assuming that a test script is a unittest-based test script if it contains a test_suite() function.
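To make the second option concrete, here is a rough sketch of how a runner could tell the two kinds of test script apart. It is only an illustration under stated assumptions, not the actual run_tests.py: it is written for a current Python 3, the function name run_test_module is made up, and it assumes a print-and-compare script exposes a hypothetical main() function rather than printing at import time.

```python
import contextlib
import io
import types
import unittest


def run_test_module(module, expected_output=None):
    """Return True if the given test module passes (illustrative only)."""
    suite_factory = getattr(module, "test_suite", None)
    if callable(suite_factory):
        # The module defines test_suite(), so treat it as unittest-based.
        runner = unittest.TextTestRunner(stream=io.StringIO())
        return runner.run(suite_factory()).wasSuccessful()
    # Otherwise treat it as print-and-compare: capture what the script
    # prints and compare it with the expected output text.
    captured = io.StringIO()
    with contextlib.redirect_stdout(captured):
        module.main()  # hypothetical entry point, an assumption of this sketch
    return captured.getvalue() == expected_output


# Two tiny stand-in modules, built in memory purely for demonstration.
class _DemoCase(unittest.TestCase):
    def test_addition(self):
        self.assertEqual(1 + 1, 2)


unit_module = types.ModuleType("test_unit_demo")
unit_module.test_suite = lambda: unittest.TestLoader().loadTestsFromTestCase(_DemoCase)

print_module = types.ModuleType("test_print_demo")
print_module.main = lambda: print("ACGT")
```

With this layout the print-and-compare scripts stay simple, because the comparison lives in one place in the runner, while unittest-based scripts only need to provide test_suite().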

--Michiel


From tiagoantao at gmail.com  Sat Jan 10 11:48:03 2009
From: tiagoantao at gmail.com (Tiago Antão)
Date: Sat, 10 Jan 2009 16:48:03 +0000
Subject: [Biopython-dev] Developmental and experimental branches
In-Reply-To: <496778D2.1050801@gmail.com>
References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com>
	<320fb6e00901060201s3830c972w4638f5eefcd42b6a@mail.gmail.com>
	<6d941f120901060852r482baf16m6b8399959b3c1aaa@mail.gmail.com>
	<5aa3b3570901070237s487a4307hb68fa69abc3cb23d@mail.gmail.com>
	<6d941f120901070354o70b6c99ah37ffdb38a1af7554@mail.gmail.com>
	<320fb6e00901090455y334ecebdo55cd7c1a718ab499@mail.gmail.com>
	<496778D2.1050801@gmail.com>
Message-ID: <6d941f120901100848h6e186022o241b928ea2566993@mail.gmail.com>

This whole discussion is very interesting. In fact, whatever the
conclusions are, I think they should be labeled "official policy" and put
on the Wiki.

The biggest problem that I've faced is that, whenever I am doing
something, I don't know how acceptable it will be to other
developers. I tend to put everything up for discussion before I commit it,
and whenever I say something I might get completely different answers
from time to time and from different people. The end result is that I
hold off on committing things because of issues that are raised in an
ad-hoc fashion.

There should be a page clarifying things like:
1. Are contributions that have a small target audience accepted?
2. Use of foreign libraries (e.g., SciPy)?
3. Code management policies. Branches?  Adding new code? Breaking interfaces?
4. New developers
5. Legal issues
6. Interop with non-free software
7. Code quality strategies. Code review? Testing?
8. Multiplatform issues

I am not asking for a big document. But as questions arise, just discuss
them, arrive at a decision and document it. It becomes tiring having
to answer the same questions about code that you want to submit over
and over again, and with different issues every time.

One can live with decisions that one dislikes, but it is much more
difficult to live with a playing field that is moving all the time.

On Fri, Jan 9, 2009 at 4:18 PM, Bruce Southey <bsouthey at gmail.com> wrote:
> Hi,
> In a previous thread (and indicated in others) it was suggested that perhaps
> Biopython needs some type of development  or experimental branch. So this
> thread is orientated to provide some discussion on this and considers that
> Biopython has moved to SVN. I think it is very relevant discussion because
> Biopython needs an effective approach to mainly handle new code but also
> handle significant rewrites of older code.
>
> The most important question is do you support creating developmental and
> experimental branches or not?
>
> However, I do not think that this is a yes or no answer and I am not
> concerned about the question at the present time.  Rather I am concerned
> about the burden placed on the maintainers (especially Peter and Michiel),
> the expression of the developer needs and how this impact the community. I
> am rather neutral on it (probably because I have not contributed any major
> code to Biopython) but I would like to ensure that the discussion leads to
> positive changes.
>
> I find Biopython interesting and special for various reasons. There is a
> solid core of functions that are common to many aspects of bioinformatics.
> But it also contains very specialized code that has a much smaller audience.
> Consequently certain parts get considerable exposure and other parts get
> limited or no exposure. This means that it may be necessary to release beta
> versions in order to get the necessary exposure as I assume that code has
> had sufficient development to be released in the first place. Creating
> developmental and experimental branches is one way to get this exposure but
> perhaps branches are not necessary.
>
> An alternative approach is creating specialized projects within Biopython
> that can be used for development and testing. For example, Scipy provides
> SciKits that are related code that is typically special purpose or is
> released under a different license than scipy/numpy. This replaced the
> sandboxes that existed in prior versions of numpy and scipy. But a recent
> problem arose in numpy was how to get code from such a location into numpy
> by creating a experimental section in the main distribution but that met
> some strong resistance.
>
> Therefore, I see the following issues that need to be addressed regardless
> of the approach taken:
>
> 0) Must be easy for project maintenance and release as this must not create
> an extra burden to Biopython!
> 1) Ensure adequate testing is performed especially to get it out to the
> appropriate audience and to correct the code and APIs. I consider this
> rather important because I tend to follow a type of user experience design
> (http://en.wikipedia.org/wiki/User_experience_design) and software
> prototyping (http://en.wikipedia.org/wiki/Software_prototyping) for software
> development.
> 2) Stabilization of APIs for backwards compatibility as we don't want to
> change these with each Biopython release.
> 3) Adequate test coverage especially across platforms and different software
> versions. For example Windows paths and older software versions can cause
> problems on other peoples machines but not yours.
> 4) Some type of code review even if it is just to ensure a consistent format
> (like spaces versus tabs) or compatibility across Python versions and
> platforms.
> 5) If developmental or experimental branch are used then how does the code
> move into the main distribution and how are these branches created and
> destroyed.
>
> Please add other issues.
>
> I would appreciate these issues being addressed when appropriate.
>
> Regards
> Bruce
>
> Peter wrote:
>>
>> On Wed, Jan 7, 2009 at 11:54 AM, Tiago Antão <tiagoantao at gmail.com> wrote:
>>
>>>
>>> Considering that CVS has no development branch I think having git is
>>> very good. I would just recommend extreme care with changing existing
>>> code. When merging back into CVS, changes to existing code might not
>>> go in (especially if they change interfaces) or be delayed.
>>>
>>>
>>
>> If there is a strong interest in having experimental branches in the
>> official Biopython repository, we could discuss that as an option.
>> Although I would prefer we get moved from CVS to SVN first before
>> actually doing this, in order to keep the migration as simple as
>> possible.
>>
>> Peter


-- 
"Systems can remain irrational far longer than you or I can survive" -
Freely adapted from John Maynard Keynes


From tiagoantao at gmail.com  Sat Jan 10 11:52:44 2009
From: tiagoantao at gmail.com (Tiago Antão)
Date: Sat, 10 Jan 2009 16:52:44 +0000
Subject: [Biopython-dev] Developmental and experimental branches
In-Reply-To: <320fb6e00901100646y6132686ap8a928404dd1e36c3@mail.gmail.com>
References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com>
	<5aa3b3570901070237s487a4307hb68fa69abc3cb23d@mail.gmail.com>
	<6d941f120901070354o70b6c99ah37ffdb38a1af7554@mail.gmail.com>
	<320fb6e00901090455y334ecebdo55cd7c1a718ab499@mail.gmail.com>
	<496778D2.1050801@gmail.com>
	<5aa3b3570901090917m7c82fb17hb4c93235702b120b@mail.gmail.com>
	<320fb6e00901090928u662023d6rd6f2d82c5fbb7365@mail.gmail.com>
	<8b34ec180901091017o172e4acemf91c0a46a736bdb1@mail.gmail.com>
	<20090109225155.GF4135@sobchak.mgh.harvard.edu>
	<320fb6e00901100646y6132686ap8a928404dd1e36c3@mail.gmail.com>
Message-ID: <6d941f120901100852g47b10e9ar214cf2ad2b206f6@mail.gmail.com>

On Sat, Jan 10, 2009 at 2:46 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> Using the wiki in this way is a nice idea.  Tiago - do you fancy
> adding a PopGen page describing the additions you're working on?  As a
> bonus, once these do get into the main repository, you may find the
> wiki text will be a useful basis for extending the documentation.

Where do you want me to link the page on the Wiki?

From biopython at maubp.freeserve.co.uk  Sat Jan 10 12:03:05 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sat, 10 Jan 2009 17:03:05 +0000
Subject: [Biopython-dev] Developmental and experimental branches
In-Reply-To: <6d941f120901100852g47b10e9ar214cf2ad2b206f6@mail.gmail.com>
References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com>
	<6d941f120901070354o70b6c99ah37ffdb38a1af7554@mail.gmail.com>
	<320fb6e00901090455y334ecebdo55cd7c1a718ab499@mail.gmail.com>
	<496778D2.1050801@gmail.com>
	<5aa3b3570901090917m7c82fb17hb4c93235702b120b@mail.gmail.com>
	<320fb6e00901090928u662023d6rd6f2d82c5fbb7365@mail.gmail.com>
	<8b34ec180901091017o172e4acemf91c0a46a736bdb1@mail.gmail.com>
	<20090109225155.GF4135@sobchak.mgh.harvard.edu>
	<320fb6e00901100646y6132686ap8a928404dd1e36c3@mail.gmail.com>
	<6d941f120901100852g47b10e9ar214cf2ad2b206f6@mail.gmail.com>
Message-ID: <320fb6e00901100903v1aa0180bsd1ca5335f7da1f7f@mail.gmail.com>

On Sat, Jan 10, 2009 at 4:52 PM, Tiago Antão <tiagoantao at gmail.com> wrote:
> On Sat, Jan 10, 2009 at 2:46 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>> Using the wiki in this way is a nice idea.  Tiago - do you fancy
>> adding a PopGen page describing the additions you're working on?  As a
>> bonus, once these do get into the main repository, you may find the
>> wiki text will be a useful basis for extending the documentation.
>
> Where do you want me to link the page on the Wiki?

How about having two pages:

http://biopython.org/wiki/PopGen
- documentation on the code in the current official release,
- linked to from the main doc page

http://biopython.org/wiki/PopGen_dev
- discussion and links to your branch etc,
- linked to from the above PopGen page

This would be consistent with how I did the Bio.SeqIO pages,
http://biopython.org/wiki/SeqIO
http://biopython.org/wiki/SeqIO_dev

If you think you have a better idea, feel free to make suggestions.

Peter


From peter at maubp.freeserve.co.uk  Sat Jan 10 12:46:38 2009
From: peter at maubp.freeserve.co.uk (Peter)
Date: Sat, 10 Jan 2009 17:46:38 +0000
Subject: [Biopython-dev] Developmental policies
Message-ID: <320fb6e00901100946g62e26170o7e112f3b2f357e76@mail.gmail.com>

On Sat, Jan 10, 2009 at 4:48 PM, Tiago Antão <tiagoantao at gmail.com> wrote:
> This whole discussion is very interesting. In fact, whatever are the
> conclusions I think they should be labeled "offical policy" and put on
> the Wiki.

That sounds good.

> The biggest problem that I've faced is that, whenever I am doing
> something, I don't know the level of acceptability with other
> developers. I tend to put everything to discussion before I commit it
> and whenever I say something I might get completely different answers
> from time to time and from different people. The end result is that I
> defer from commiting things because of issues that are raised in an
> ad-hoc fashion.

Asking before doing things is in general a good plan.  Sadly not
everyone will be free to respond at any one time - but I agree with
you that having more of the de facto policy written out explicitly
would help.

> There should be a page clarifying things like:
> 1. Are contributions that have a small target audience accepted?

Historically yes this has happened - although my impression is that
the bar was perhaps set too low.  I would say some things were
accepted without sufficient documentation and tests.  The problem with
small interest modules is that if the original developer moves on, in
the absense of any apparent users, the module gets abandoned.  This
seems to explain several of the smaller modules we've deprecated in
the last couple of years.

On the other hand, some things will start with a small target audience
that will grow.  If I was confident that the developer concerned would
stick around for several years and was prepared to deal with
documentation, unit tests and bug fixes then I would be much happier
about including something, even if it might have a relatively small
target audience initially.

> 2. Use of foreign libraries (e.g., SciPy)?

I think the current stance has been to try and minimise 3rd party
dependencies, other than the special case of python wrappers for
command line tools.  This makes it much easier for beginners to install
and use Biopython, and lowering the barrier to entry is a good thing.

There are practical points here too.  In general, 3rd party
dependencies can be a pain (e.g. our Martel parsers broke when
mxTextTools changed their API between 2.0 and 3.0).  Similarly they
can restrict the distribution of Biopython (e.g. NumPy isn't yet
available on Windows for Python 2.6), and will also be a potential
road block for moving to Python 3.  As another example, a small part
of Bio.PDB uses flex in a parser, and again this makes building and
distributing it a real pain (so much so, that it's been commented out
by default).

However, run time only dependencies (like pure python libraries and
command line tools) are not such an issue for packaging/distribution.
e.g. ReportLab (used in Bio.Graphics only).  If SciPy were to be used
by part of Bio.PopGen, and this didn't affect packaging/distribution
then this might be OK.

> 3. Code management policies. Branches?  Adding new code? Breaking interfaces?

Biopython has historically worked from a stable trunk.  As a
consequence we try and avoid breaking interfaces, instead adopting a
gradual deprecation of an old interface when adding a new interface,
or adding enhancements in a backwards compatible manner.
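
To illustrate the gradual deprecation approach, here is a minimal sketch
(not taken from Biopython; the function names read_records and parse_file
are made up for this example).  The old interface is kept for a while as
a thin wrapper which warns and then calls the new one:

```python
import warnings


def read_records(handle):
    """New interface (hypothetical name): return the non-empty lines."""
    return [line.strip() for line in handle if line.strip()]


def parse_file(handle):
    """Old interface (hypothetical name), kept as a thin wrapper."""
    warnings.warn(
        "parse_file is deprecated; please use read_records instead",
        DeprecationWarning,
        stacklevel=2,  # report the caller's line, not this wrapper
    )
    return read_records(handle)
```

Existing scripts calling parse_file keep working but see the warning,
and the wrapper can be removed after a few releases.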

> 4. New developers

I think there is something written down about this already...

> 5. Legal issues

Try and avoid them?  What did you mean in particular?

> 6. Interop with non-free software

This is linked to the legal issues question.  Many of the tools we
link to like BLAST aren't open source, but are "free" as in cost.  I
don't think we have any examples of non-free software.

> 7. Code quality strategies. Code review? Testing?

Code review:
For new code in a specialist area, it can be difficult to get a
qualified second opinion on the approach, but existing developers can
at least comment on the coding style.  For existing code, my
impression is module owners have been trusted to make changes to
"their" code without review - and generally speaking this has worked
out OK.  Although if anyone spots someone making a change they disagree
with, then please do raise it.  I would hope any larger change had
some discussion beforehand - possibly via enhancement entries on
bugzilla.

Testing:
I'd strongly resist adding any new module without an accompanying
test, and wish this had been a firm policy from day one.

> 8. Multiplatform issues

Ideally everything should be cross platform (like python itself).
There are exceptions to this - in particular some 3rd party tools are
not cross platform.  I personally use and test on Windows, Linux and
Mac - and I believe Michiel does too.

> I am not saying a big document. But as questions arise, just discuss
> them, arrive at a decision and document them. It becomes tiring having
> to answer the same questions about code that you want to submit over
> and over again and with different issues every time.
> One can live with decisions that are disliked, but it is much more
> difficult to live when the playing ground is moving all the time.

I'm sorry if you've had that feeling.  However, circumstances change.
As I recall when you first asked about using SciPy as a dependency,
Biopython was still using Numeric instead of NumPy - so using SciPy
had to wait until after that transition.  Now that we have moved to
NumPy, I think you have a much stronger case.

Peter


From tiagoantao at gmail.com  Sat Jan 10 13:31:05 2009
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Sat, 10 Jan 2009 18:31:05 +0000
Subject: [Biopython-dev] Developmental policies
In-Reply-To: <320fb6e00901100946g62e26170o7e112f3b2f357e76@mail.gmail.com>
References: <320fb6e00901100946g62e26170o7e112f3b2f357e76@mail.gmail.com>
Message-ID: <6d941f120901101031w22d9730dy87353cf22646d6fc@mail.gmail.com>

> mxTextTools changed their API between 2.0 and 3.0).  Similarly they
> can restrict the distribution of Biopython (e.g. NumPy isn't yet
> available on Windows for Python 2.6), and will also be a potential
> road block for moving to Python 3.  As another example, a small part

By the way, another issue that would be interesting to address is
deprecation of older Python versions and Python 3. Like just having a
clear stance on what is the current feeling about this. It seems to be
a recurring question.


>> 5. Legal issues
>
> Try and avoid them?  What did you mean in particular?

In my opinion something should be said about this. Actually I think
(suggest) it is essentially a matter of mainly taking Bruce's
comments (e.g. one cannot have derived works of non-free software) and
writing them down on a wiki page. Just things a potential contributor
would have to be aware of on a legal front.

> Testing:
> I'd strongly resist adding any new module without an accompanying
> test, and wish this had been a firm policy from day one.

People should also be encouraged to test (in as much as possible) in
at least Win/Linux/Mac. Of course, for some people it will be
difficult as access to all platforms is not always possible for
everybody. But at least encouragement should be made...


> I'm sorry if you've had that feeling.  However, circumstances change.
> As I recall when you first asked about using SciPy as a dependency,
> Biopython was still using Numeric instead of NumPy - so using SciPy
> had to wait until after that transition.  Now that we have moved to
> NumPy, I think you have a much stronger case.

Boss, don't say sorry, I think everybody would agree that you make a
most fantastic effort.

Regarding circumstances: When circumstances change, then one would
amend documents.
Again, my point is not in favour of this or that policy. Only that a
barebones policy should be documented, so that people know what the
basic rules are. This will allow for realistic expectations with
regard to code being accepted or not in the stable distribution.

From peter at maubp.freeserve.co.uk  Sat Jan 10 15:10:27 2009
From: peter at maubp.freeserve.co.uk (Peter)
Date: Sat, 10 Jan 2009 20:10:27 +0000
Subject: [Biopython-dev] Developmental policies
In-Reply-To: <6d941f120901101031w22d9730dy87353cf22646d6fc@mail.gmail.com>
References: <320fb6e00901100946g62e26170o7e112f3b2f357e76@mail.gmail.com>
	<6d941f120901101031w22d9730dy87353cf22646d6fc@mail.gmail.com>
Message-ID: <320fb6e00901101210k5e715beer240aa9338aa7ce2f@mail.gmail.com>

On Sat, Jan 10, 2009 at 6:31 PM, Tiago Antão <tiagoantao at gmail.com> wrote:
> By the way, another issue that would be interesting to address is
> deprecation of older Python versions and Python 3. Like just having a
> clear stance on what is the current feeling about this. It seems to be
> a recurring question.

Regarding older versions of python, we have stated that Biopython 1.49
should work on Python 2.3 to 2.6, and we expect to do the same for
Biopython 1.50.  Thereafter, we will probably drop support for Python
2.3 (unless anyone has a strong need for it and makes their voice
heard).  See the mailing list archive and the corresponding news
postings:
http://news.open-bio.org/news/2008/11/biopython-and-python-26-and-python-23/
http://news.open-bio.org/news/2008/11/biopython-release-149/

Regarding Python 3, one hold-up will be that neither ReportLab nor NumPy
has a clear plan for Python 3 - or at least that is my impression.
However, even ignoring those parts of Biopython which use NumPy (e.g.
Bio.PDB and Bio.Cluster) and Bio.Graphics (the only use of ReportLab),
we have a lot of useful code.  In the short term we should be aiming
to have everything run under Python 2.6 in warnings mode, as a step
towards eventual Python 3 support.

Beyond that, I think that it is likely we'll want to use bytes rather
than (unicode) strings in Python 3 for the Seq object, but have not
given this much thought.
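
As a small illustration of why the choice matters, here is a sketch of
how bytes and (unicode) strings behave differently in Python 3.  It is
plain Python, not Biopython code, and the variable names are only for
this example:

```python
seq_text = "ACGTacgt"                  # a (unicode) str
seq_bytes = seq_text.encode("ascii")   # the same letters as bytes

# Indexing differs: a str yields a one-letter str, bytes yield an integer.
first_of_text = seq_text[0]
first_of_bytes = seq_bytes[0]

# Slicing preserves the type, and the case methods exist on both types.
slice_of_bytes = seq_bytes[0:1]
upper_bytes = seq_bytes.upper()

# Mixing the two types is an error in Python 3, so a bytes based Seq
# object would need to convert explicitly at its boundaries.
try:
    seq_text + seq_bytes
    mixing_allowed = True
except TypeError:
    mixing_allowed = False
```

Any code which compares single letters (e.g. seq[0] == "A") would
therefore need attention if the underlying data became bytes.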

>>> 5. Legal issues
>>
>> Try and avoid them?  What did you mean in particular?
>
> In my opinion something should be said about this. Actually I think
> (suggest) it is essentially a matter of mainly taking Bruce's
> comments (e.g. one cannot have derived works of non-free software) and
> writing them down on a wiki page. Just things a potential contributor
> would have to be aware of on a legal front.

I see what you mean.  Perhaps I am naive in thinking this should be
common knowledge amongst potential contributors.

>> Testing:
>> I'd strongly resist adding any new module without an accompanying
>> test, and wish this had been a firm policy from day one.
>
> People should also be encouraged to test (in as much as possible) in
> at least Win/Linux/Mac. Of course, for some people it will be
> difficult as access to all platforms is not always possible for
> everybody. But at least encouragement should be made...

Also tests which require additional setup are a pain.  The BioSQL
tests are an example of this, where it is unavoidable - but any
situation like this reduces the number of people/machines where that
test will get checked.  Michiel has stressed this kind of thing as a
concern in the past (as I recall).

Peter


From bugzilla-daemon at portal.open-bio.org  Mon Jan 12 09:31:22 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 12 Jan 2009 09:31:22 -0500
Subject: [Biopython-dev] [Bug 2731] New: Adding .upper() and .lower()
	methods to the Seq object
Message-ID: <bug-2731-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2731

           Summary: Adding .upper() and .lower() methods to the Seq object
           Product: Biopython
           Version: Not Applicable
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk
 BugsThisDependsOn: 2532
OtherBugsDependingOnThis: 2351


As part of making the Seq object more string like (Bug 2351), it would be nice
to support the .upper() and .lower() methods.

Doing this elegantly will require different case versions of the alphabets (see
Bug 2532), perhaps by adding (private) upper and lower methods to the Alphabet
object itself.

Alternatively, we can handle this without adding new Alphabets by mapping the
fixed case IUPAC alphabets to case-less generic alphabets.
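
A minimal sketch of the second idea, for illustration only.  The classes
below are simplified stand-ins written for this example, they are not the
real Bio.Alphabet or Bio.Seq classes, and the caseless() method name is
hypothetical:

```python
class Alphabet:
    """Generic alphabet: no fixed list of letters, so any case is allowed."""
    letters = None

    def caseless(self):
        """Return the alphabet to use once the case of the letters changes."""
        return self


class GenericDNA(Alphabet):
    pass


class UpperCaseDNA(Alphabet):
    """Stand-in for a fixed, upper case only alphabet."""
    letters = "GATC"

    def caseless(self):
        # Lower case letters are not valid here, so fall back to the
        # generic alphabet of the same sequence type.
        return GenericDNA()


class Seq:
    def __init__(self, data, alphabet):
        self.data = data
        self.alphabet = alphabet

    def upper(self):
        # Upper case letters stay valid for an upper case only alphabet,
        # and a generic alphabet accepts any case, so keep the alphabet.
        return Seq(self.data.upper(), self.alphabet)

    def lower(self):
        return Seq(self.data.lower(), self.alphabet.caseless())
```

With this approach Seq("GATC", UpperCaseDNA()).lower() would give a
sequence with the generic DNA alphabet, and no lower case alphabets
need to be defined.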


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Mon Jan 12 09:31:25 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 12 Jan 2009 09:31:25 -0500
Subject: [Biopython-dev] [Bug 2532] Using IUPAC alphabets in mixed case Seq
	objects
In-Reply-To: <bug-2532-42@http.bugzilla.open-bio.org/>
Message-ID: <200901121431.n0CEVPFK010376@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2532


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
OtherBugsDependingOnThis|                       |2731



From bugzilla-daemon at portal.open-bio.org  Mon Jan 12 09:31:30 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 12 Jan 2009 09:31:30 -0500
Subject: [Biopython-dev] [Bug 2351] Make Seq more like a string,
	even subclass string?
In-Reply-To: <bug-2351-42@http.bugzilla.open-bio.org/>
Message-ID: <200901121431.n0CEVUDG010399@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2351


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  BugsThisDependsOn|                            |2731



From bsouthey at gmail.com  Mon Jan 12 12:03:45 2009
From: bsouthey at gmail.com (Bruce Southey)
Date: Mon, 12 Jan 2009 11:03:45 -0600
Subject: [Biopython-dev] Developmental policies
In-Reply-To: <320fb6e00901101210k5e715beer240aa9338aa7ce2f@mail.gmail.com>
References: <320fb6e00901100946g62e26170o7e112f3b2f357e76@mail.gmail.com>	<6d941f120901101031w22d9730dy87353cf22646d6fc@mail.gmail.com>
	<320fb6e00901101210k5e715beer240aa9338aa7ce2f@mail.gmail.com>
Message-ID: <496B77F1.9060207@gmail.com>

Peter wrote:
> On Sat, Jan 10, 2009 at 6:31 PM, Tiago Antão <tiagoantao at gmail.com> wrote:
>   
>> By the way, another issue that would be interesting to address is
>> deprecation of older Python versions and Python 3. Like just having a
>> clear stance on what is the current feeling about this. It seems to be
>> a recurring question.
>>     
>
> Regarding older versions of python, we have stated that Biopython 1.49
> should work on Python 2.3 to 2.6, and we expect to do the same for
> Biopython 1.50.  Thereafter, we will probably drop support for Python
> 2.3 (unless anyone has a strong need for it and makes their voice
> heard).  See the mailing list archive and the corresponding news
> postings:
> http://news.open-bio.org/news/2008/11/biopython-and-python-26-and-python-23/
> http://news.open-bio.org/news/2008/11/biopython-release-149/
>
> Regarding Python 3, one hold-up will be that neither ReportLab nor NumPy
> has a clear plan for Python 3 - or at least that is my impression.
>   
There has been limited information on the numpy list regarding Python 3 
but there has been some investigation on this 
(http://www.scipy.org/Python3k). I did ask about Python 3 last year in 
the thread titled 'Report from SciPy' and Robert Kern's response should 
be at:
http://www.mail-archive.com/numpy-discussion at scipy.org/msg12101.html

Also, this thread has the future aims of numpy (obviously still awaiting 
scipy 0.7):
http://www.mail-archive.com/numpy-discussion at scipy.org/msg12091.html

I think the main current effort for numpy 1.3 is getting
Python 2.6 fully supported (Windows is the main problem) before there
will be any further consideration of Python 3. One of the main problems
is that numpy uses a few APIs that are deprecated in Python 3. So any
porting will not go far until the correct APIs are used, which will
probably be after the next numpy release.

> However, even ignoring those parts of Biopython which use NumPy (e.g.
> Bio.PDB and Bio.Cluster) and Bio.Graphics (the only use of ReportLab),
> we have a lot of useful code.  In the short term we should be aiming
> to have everything run under Python 2.6 in warnings mode, as a step
> towards eventual Python 3 support.
>   
While I understand this approach, I do wonder how effective it will be 
compared to direct porting using the 2to3 tool. One reason is that 2to3 
is more than a code converter as it also attempts to guess at what you
are trying to do.

Anyhow, this is not a trivial task and I am willing to help in that regard.
 
> Beyond that, I think that it is likely we'll want to use bytes rather
> than (unicode) strings in Python 3 for the Seq object, but have not
> given this much thought.
>
>   
>>>> 5. Legal issues
>>>>         
>>> Try and avoid them?  What did you mean in particular?
>>>       
>> In my opinion something should be said about this. Actually I think
>> (suggest) it is essentially a matter of mainly taking Bruce's
>> comments (e.g. one cannot have derived works of non-free software) and
>> writing them down on a wiki page. Just things a potential contributor
>> would have to be aware of on a legal front.
>>     
>
> I see what you mean.  Perhaps I am naive in thinking this should be
> common knowledge amongst potential contributors.
>   
I think we must be explicit about this and ensure that any accepted code is
BSD-compatible, because we cannot be sure what people really know.
Further, the license of any application that Biopython interacts with
must be clearly stated, and the developer is responsible for getting one
if it does not have one. That way we know what is included, which should
also help users work out whether or not they can use some application.

>   
>>> Testing:
>>> I'd strongly resist adding any new module without an accompanying
>>> test, and wish this had been a firm policy from day one.
>>>       
>> People should also be encouraged to test (in as much as possible) in
>> at least Win/Linux/Mac. Of course, for some people it will be
>> difficult as access to all platforms is not always possible for
>> everybody. But at least encouragement should be made...
>>     
>
> Also tests which require additional setup are a pain.  The BioSQL
> tests are an example of this, where it is unavoidable - but any
> situation like this reduces the number of people/machines where that
> test will get checked.  Michiel has stressed this kind of thing as a
> concern in the past (as I recall).
>
> Peter
>
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>   
We cannot force people to run tests, but we can hope that enough people
do so to cover as many of the variations as possible. Do we need to create
buildbots (eg http://sourceforge.net/projects/buildbot/)?

I do not test or use BioSQL code because I do not use BioSQL and do not 
run a compatible database on my system. So it would be really great if 
BioSQL supported sqlite because the database requirements would be 
alleviated.

The other related aspect is that certain applications like clustalw must 
be in the path otherwise the application will not be found and the test 
skipped. But I do not know how to solve this except perhaps by using
environment variables.
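
A rough sketch of what an environment variable based lookup could look
like.  This is only an illustration: the find_tool helper and the
CLUSTALW_EXE variable name are invented here and are not part of the
Biopython test suite:

```python
import os
import shutil
import unittest


def find_tool(name, env_var):
    """Return a path to an external tool, or None if it cannot be found.

    An explicit environment variable takes priority over a PATH search.
    """
    override = os.environ.get(env_var)
    if override and os.path.isfile(override):
        return override
    return shutil.which(name)


# CLUSTALW_EXE is a made-up variable name for this sketch.
CLUSTALW = find_tool("clustalw", "CLUSTALW_EXE")


class ClustalwTest(unittest.TestCase):
    @unittest.skipIf(CLUSTALW is None, "clustalw not found; set CLUSTALW_EXE")
    def test_tool_located(self):
        self.assertTrue(os.path.exists(CLUSTALW))
```

The skip message then tells the person running the tests how to point
the test at a copy of the tool installed outside the path.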

Regards
Bruce


From bsouthey at gmail.com  Mon Jan 12 12:34:50 2009
From: bsouthey at gmail.com (Bruce Southey)
Date: Mon, 12 Jan 2009 11:34:50 -0600
Subject: [Biopython-dev] Structure and LDNe
In-Reply-To: <320fb6e00901090459x52976449gddcc4483699e0f56@mail.gmail.com>
References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com>	
	<496397C9.3030706@gmail.com>	
	<6.1.2.0.2.20090108141534.0323a5f0@mail.lmi.net>
	<320fb6e00901090459x52976449gddcc4483699e0f56@mail.gmail.com>
Message-ID: <496B7F3A.60407@gmail.com>

Peter wrote:
> On Thu, Jan 8, 2009 at 10:24 PM, Jason Eshleman <jae at lmi.net> wrote:
>   
>> Greetings all,
>>
>> Presently, the code I have for dealing with STRUCTURE is similar to the code
>> for interacting with Clustal, in that it does not modify any of the STRUCTURE
>> source code but merely initiates the compiled executable.
>>     
>
> Biopython has code for interacting with lots of command line tools,
> and this neatly avoids any copyright/licence questions about being a
> derived work.
>   
I have no problem with this provided that the parsing follows documented
information such as a description of the output. I would have a problem if
you based the code on another source that uses undocumented information
or information not obvious from the output.
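
To make the distinction concrete, here is a minimal sketch of the
"run the executable and parse its text output" pattern.  It is an
illustration only: the Python interpreter stands in for the external
tool, and the "key = value" report format is invented for this example
and has nothing to do with the real STRUCTURE output:

```python
import subprocess
import sys


def run_tool(command):
    """Run an external executable and return what it printed to stdout."""
    completed = subprocess.run(
        command, capture_output=True, text=True, check=True
    )
    return completed.stdout


def parse_key_values(text):
    """Parse lines of the form 'key = value' into a dictionary."""
    result = {}
    for line in text.splitlines():
        if "=" in line:
            key, value = line.split("=", 1)
            result[key.strip()] = value.strip()
    return result


# Stand-in for a real tool: the Python interpreter printing a fake report.
output = run_tool([sys.executable, "-c", "print('runs = 3'); print('K = 2')"])
report = parse_key_values(output)
```

Nothing here links against or copies the tool's source code; the
wrapper only relies on the command line and the printed output.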

>   
>> Initially, I have used my code in place of their Java front end as it allows
>> for more control of the run-time variables for successive runs with varying
>> run parameters.  At some point, I'd like to get it to interface more
>> directly with the STRUCTURE code to be able to pipe results directly to
>> python for parsing rather than working with the STRUCTURE text output but
>> that's a ways off still.
>>     
>
> I'm not quite clear what you have in mind, but this would probably
> need a little more thought from the legal perspective.  If STRUCTURE
> provides an API with header files you can compile against, that should
> be OK (but I am not a lawyer).  Note that do this within Biopython
> would then mean adding another build time dependency, which would need
> to be justified in terms of the benefits it brings.
>
> Peter
>   
Linking against header files is a gray area, but some views consider it
to be illegal (see the Linux kernel discussions on that!). It really
depends on whether or not the result can be considered to be a
derivative work.

Unless STRUCTURE is released under a BSD-compatible license, you should 
not use any code from it (and probably should not even look at the 
code). Just saying the code is free is insufficient because code 
licensed under the GPL is 'free' but not BSD-compatible. So if STRUCTURE 
does not have a license then either get one or forget about this until 
it does have a BSD-compatible license. Alternatively, get STRUCTURE to 
support your changes.

I am being difficult simply because of the potential impact on the
Biopython project of including code incompatible with the BSD license.

Bruce

From biopython at maubp.freeserve.co.uk  Mon Jan 12 13:19:03 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 12 Jan 2009 18:19:03 +0000
Subject: [Biopython-dev] Developmental policies
In-Reply-To: <496B77F1.9060207@gmail.com>
References: <320fb6e00901100946g62e26170o7e112f3b2f357e76@mail.gmail.com>
	<6d941f120901101031w22d9730dy87353cf22646d6fc@mail.gmail.com>
	<320fb6e00901101210k5e715beer240aa9338aa7ce2f@mail.gmail.com>
	<496B77F1.9060207@gmail.com>
Message-ID: <320fb6e00901121019h72463a5dl316cabc85100c09d@mail.gmail.com>

> We cannot force people to run tests, but we can hope that enough people
> do so to cover as many of the variations as possible. Do we need to create
> buildbots (eg http://sourceforge.net/projects/buildbot/)?

Some kind of "buildbots" would be nice - possibly with something
hosted on the OBF server to hold the reports (even just via the wiki
pages would work). I have access to one or two platforms at work which
might be able to act in this way, but the infrastructure isn't there
yet.

> I do not test or use BioSQL code because I do not use BioSQL and do not run
> a compatible database on my system. So it would be really great if BioSQL
> supported sqlite because the database requirements would be alleviated.

This was recently requested on the BioSQL mailing list - and it would be nice.

> The other related aspect is that certain applications like clustalw must be
> in the path otherwise the application will not be found and the test
> skipped. But I do not know how to solve this except perhaps by using
> environment variables.

Part of setting up a "buildbot" or test server would include
installing all the optional command line tools (like ClustalW) so that
the full test suite can be run.

Peter

From bsouthey at gmail.com  Mon Jan 12 17:24:00 2009
From: bsouthey at gmail.com (Bruce Southey)
Date: Mon, 12 Jan 2009 16:24:00 -0600
Subject: [Biopython-dev] Alphabet case and standards
Message-ID: <496BC300.90003@gmail.com>

Hi,
I am moving a potential discussion away from the bugzilla because it 
affects at least the following Bugs (please add others):
2351 (Make Seq more like a string, even subclass string? 
http://bugzilla.open-bio.org/show_bug.cgi?id=2351 ),
2532 (Using IUPAC alphabets in mixed case Seq objects 
http://bugzilla.open-bio.org/show_bug.cgi?id=2532 ),
2597 (Enforce alphabet letters in Seq objects 
http://bugzilla.open-bio.org/show_bug.cgi?id=2597 )
2731 (Adding .upper() and .lower() methods to the Seq object 
http://bugzilla.open-bio.org/show_bug.cgi?id=2731 ).

I am hoping this gets wider feedback than it would on bugzilla, avoids
unnecessary duplication, and leads to closure of these bugs.

 From Bug 2351, "Bio.Alphabets.IUPAC defines a number of alphabets with 
defined lists of valid letters which are in upper case ONLY". But 
various applications ignore the alphabet case and hence the standards. 
So this creates the problem of how Biopython should handle alphabet case.

If we follow the standard for all modules then there should be no need
to do anything except to ensure we follow it. There are numerous
examples where the standard is not followed, whether through user
ignorance, simplicity or design (such as using mixed case to denote
'important' things), and various databases and applications do not follow
it. But I think that the actual case is irrelevant in most situations and
not following the standard would make Biopython inefficient.

One suggestion given in two of the bugs is to change the Alphabet object 
but I believe that this is wrong because you do not know which alphabet 
to use. If you already know the case then my preferred option is to change
the case of your query. Otherwise you would have to obtain and use one
alphabet for every case used, for example, a user may need two alphabets 
to handle upper and lower case or just one combined one. Also, if mixed 
case alphabets are used, then an excessive number of alphabets may be 
required.

I think that the current approach is to force the user to use uppercase
when interacting with the Alphabet object or anything derived from it
(such as an actual alphabet). While this maintains storage of the input
case, it does not enforce the standard. This is also inefficient because
it requires constant checks for the correct case.

Similar to the first suggestion in Bug 2731, I think that we should
automatically change the case when creating any sequence-related object
and provide a warning that the input has changed. This enforces the
standard and probably requires only small changes to the code, but loses
the format of the input. Outside of Biopython, an example of this is the
web version of NCBI BLAST, which silently converts the input case of the
query.
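
As a sketch of this suggestion (plain Python, not existing Biopython
code; the function name to_upper_with_warning is made up), the
conversion and warning could be as simple as:

```python
import warnings


def to_upper_with_warning(sequence):
    """Return the sequence in upper case, warning if anything changed."""
    upper = sequence.upper()
    if upper != sequence:
        warnings.warn(
            "sequence was converted to upper case; the original case is lost",
            UserWarning,
            stacklevel=2,  # point at the code which supplied the sequence
        )
    return upper
```

The warning is only issued when the input actually contained lower case
letters, so data which already follows the standard is unaffected.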

Less desirable options:
a) Enforce the standard such as with Bug 2597 so that an error is
returned for any sequence-related object if the case is incorrect. This is
probably a little too harsh for a difference in case.
b) Use regular expressions to ignore case, but this will create a large
penalty especially if it is not required.

Regards
Bruce


From bugzilla-daemon at portal.open-bio.org  Mon Jan 12 17:43:55 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 12 Jan 2009 17:43:55 -0500
Subject: [Biopython-dev] [Bug 2731] Adding .upper() and .lower() methods to
	the Seq object
In-Reply-To: <bug-2731-42@http.bugzilla.open-bio.org/>
Message-ID: <200901122243.n0CMhtlZ017015@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2731


------- Comment #1 from bsouthey at gmail.com  2009-01-12 17:43 EST -------
(In reply to comment #0)
> As part of making the Seq object more string like (Bug 2351), it would be nice
> to support the .upper() and .lower() methods.

Sure it would be nice in terms of following the string object, but I do not
follow the reasons for adding .upper() and .lower() methods to the Seq object.
If we follow the standards, these should be unnecessary. The only time that I
can see you wanting this is when you output the sequence. In such situations,
the sequence is likely to be a string, which already has these methods.

I do not consider the fact that other applications can handle different case
a sufficiently compelling reason.

> 
> Doing this elegantly will require different case versions of the alphabets (see
> Bug 2532), perhaps by adding (private) upper and lower methods to the Alphabet
> object itself.
> 
> Alternatively, we can handle this without adding new Alphabets by mapping the
> fixed case IUPAC alphabets to case-less generic alphabets.
> 

These comments suggest that the Seq object needs to be case-aware, which also
affects other methods like string queries. But I think this is a different
issue (whether or not the standards would be enforced) from having these
two methods.

Bruce



From biopython at maubp.freeserve.co.uk  Mon Jan 12 18:04:46 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 12 Jan 2009 23:04:46 +0000
Subject: [Biopython-dev] Alphabet case and standards
In-Reply-To: <496BC300.90003@gmail.com>
References: <496BC300.90003@gmail.com>
Message-ID: <320fb6e00901121504u6e9f3b7fu23e5f2ea25dee003@mail.gmail.com>

On Mon, Jan 12, 2009 at 10:24 PM, Bruce Southey <bsouthey at gmail.com> wrote:
> Hi,
> I am moving a potential discussion away from the bugzilla because it affects
> at least the following Bugs (please add others):
> 2351 (Make Seq more like a string, even subclass string?
> http://bugzilla.open-bio.org/show_bug.cgi?id=2351 ),
> 2532 (Using IUPAC alphabets in mixed case Seq objects
> http://bugzilla.open-bio.org/show_bug.cgi?id=2532 ),
> 2597 (Enforce alphabet letters in Seq objects
> http://bugzilla.open-bio.org/show_bug.cgi?id=2597 )
> 2731 (Adding .upper() and .lower() methods to the Seq object
> http://bugzilla.open-bio.org/show_bug.cgi?id=2731 ).
>
> I am hoping this gets wider feedback than it would on bugzilla, avoids
> unnecessary duplication, and leads to closure of these bugs.

Yes, having a discussion on the mailing list is probably better than
on bugzilla.  I should probably write up my views on this topic
explicitly, but I've tried to do so below in reply to your points.

> From Bug 2351, "Bio.Alphabets.IUPAC defines a number of alphabets with
> defined lists of valid letters which are in upper case ONLY". But various
> applications ignore the alphabet case and hence the standards. So this
> creates the problem of how Biopython should handle alphabet case.
> ...

I don't want to prevent people from using mixed case or lower case
sequences if they want to.  However, I do think doing so with an
alphabet which is intended to be upper case ONLY should be treated
as an error.

We currently have a number of generic alphabets which DO NOT define
a set of valid letters.  We also have some IUPAC derived alphabets
which define a set of upper case only expected letters.

So, if you want to use lower or mixed case sequences in a Seq object,
(1) Use a generic alphabet which does not explicitly define the valid
letters (so any characters are allowed)
(2) Use an explicit alphabet which includes the relevant cases.  This
could be a user defined alphabet, or one we add to Biopython.

Most of the time in my personal usage, I don't actually care about
the precise alphabet - the generic DNA/RNA/protein alphabets suffice.
These do not list the expected/allowed letters, and thus can be used
for upper case, lower case or mixed case sequences.  Working with well
defined alphabets is more important when working with things like
BLOSUM matrices.

> One suggestion given in two of the bugs is to change the Alphabet object but
> I believe that this is wrong because you do not know which alphabet to use.

The person creating the Seq object should know what kind of data they
are dealing with, and if they specifically want to use say "mixed case
unambiguous IUPAC DNA" (if this were in Biopython) then that's up to
them.  If you don't know exactly what you are dealing with, fall back
on the generic DNA alphabet, or the generic nucleotide alphabet, or
even the generic single letter alphabet.

> ... Also, if mixed case alphabets are used, then an excessive number
> of alphabets may be required.

We *could* introduce mixed case IUPAC alphabets, and lower case IUPAC
alphabets to complement the existing upper case IUPAC alphabets (see
my patch on 2532).  Yes, this does add a lot of alphabets, and I'm not
entirely keen on this either.  Maybe just adding mixed case versions
would suffice?

> I think that current approach is to force to user to using uppercase when
> interacting with the Alphabet object or derived from it (such as an actual
> alphabet). While this maintains storage of the input case, it does not
> enforce the standard. This is also inefficient because it requires constant
> checks for the correct case.

Right now we don't force the user to do anything.  I would like to
make the alphabet check strict (Bug 2597), or at least give a warning.
Running with this change locally has flagged up several typos in my
unit tests - I think it is a good thing.

> Similar to the first suggestion in Bug 2731, I think that we should
> automatically changes the case when creating any sequence-related object and
> provide a warning that the input has changed. This enforces standard and
> probably requires small changes to the code but loses the format of the
> input. Outside of Biopython, an example of this is the web version of NCBI
> blast silently converts input case of the query.

My personal view on automatically changing the case of the sequence
string when creating a Seq object: NO WAY.  You're throwing away
potentially important data, and also preventing people from working
with mixed case sequences - for no real benefit.

> Less desirable options:
> a) Enforces the standard such as with Bug 2597 so that an error is return
> for any sequence-related object if the case is incorrect. This is probably a
> little too harsh for a difference in case.

It could be done as a warning for a couple of releases, and later an
error.  Why do you think it is too harsh?  Maybe I am being pedantic
here, but lots of code gets written assuming uppercase letters only,
and in this situation having any unwanted lower case caught early is a
good thing.

To my mind the whole point about the user explicitly using for example
the IUPAC protein alphabet is they expect the sequence to comply with
the IUPAC conventions.  I *WANT* to get an error if the sequence
contained something invalid like a "@" character, or anything else not
in the IUPAC definition.  Mixed cases are a special case of this (the
IUPAC standards use upper case).
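
As a rough sketch of the "warning for a couple of releases, and later
an error" idea: the function name check_letters and its strict flag
are hypothetical, and this is not the patch on the bug.  The letters
are the twenty standard amino acids in upper case.

```python
import warnings

# The twenty standard amino acids, upper case only.
IUPAC_PROTEIN_LETTERS = "ACDEFGHIKLMNPQRSTVWY"


def check_letters(sequence, letters, strict=False):
    """Check a sequence against an explicit set of letters.

    Hypothetical helper: warns by default (the transitional behaviour),
    and raises ValueError when strict is True (the later behaviour).
    """
    unexpected = sorted(set(sequence) - set(letters))
    if not unexpected:
        return
    message = "Letters %r not in alphabet %r" % ("".join(unexpected), letters)
    if strict:
        raise ValueError(message)
    warnings.warn(message)


check_letters("MKVLA", IUPAC_PROTEIN_LETTERS, strict=True)  # fine, no error
check_letters("MKvla", IUPAC_PROTEIN_LETTERS)               # lower case: warning only
try:
    check_letters("MK@LA", IUPAC_PROTEIN_LETTERS, strict=True)
except ValueError as error:
    print("Rejected:", error)
```

Lower case letters and a stray "@" go through the same test here, which
is the sense in which mixed case is just a special case of an invalid
letter.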

> b) Use regular expressions to ignore case but this will create a large
> penalty especially if it is not required.

I'm not sure what you mean here, but I don't think regular expressions
are required.

Peter

From bugzilla-daemon at portal.open-bio.org  Mon Jan 12 18:30:49 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 12 Jan 2009 18:30:49 -0500
Subject: [Biopython-dev] [Bug 2731] Adding .upper() and .lower() methods to
	the Seq object
In-Reply-To: <bug-2731-42@http.bugzilla.open-bio.org/>
Message-ID: <200901122330.n0CNUnG7021141@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2731


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-12 18:30 EST -------
Created an attachment (id=1191)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1191&action=view)
Patch to Bio/Seq.py ONLY adding upper and lower methods

This patch is a proof of principle of how we could add upper and lower methods
while following the strict alphabet checking proposed on Bug 2597.  The code is
a little complicated/nasty in order to localise the change to Bio/Seq.py only.

Here is a usage example with the patch applied,

>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import IUPAC
>>> my_dna = Seq("AGGGTGTTGA",IUPAC.IUPACUnambiguousDNA())
>>> my_dna
Seq('AGGGTGTTGA', IUPACUnambiguousDNA())
>>> my_dna.lower()
Seq('agggtgttga', NucleotideAlphabet())
>>> my_dna.lower().upper()
Seq('AGGGTGTTGA', NucleotideAlphabet())

Note that if we implemented (private) upper and lower methods in the Alphabet
objects as I suggested on Bug 2532, the code in the Seq class would be much
simpler, e.g.

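def upper(self):
    # Upper case the string, and ask the alphabet for its upper case form
    return Seq(str(self).upper(), self.alphabet._upper())
def lower(self):
    # Likewise for lower case, using the alphabet's _lower() method
    return Seq(str(self).lower(), self.alphabet._lower())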

The generic alphabets (where the list of letters is undefined) would just
return self, while the AlphabetEncoders could also implement these methods
simply.  Individual explicit alphabets (i.e. the IUPAC ones) would have to
define sensible upper/lower mappings - perhaps by defining lower case variants
(see Bug 2532).
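
A self-contained sketch of that idea, for illustration only.  The
classes below are hypothetical stand-ins rather than the real Bio.Seq
and Bio.Alphabet code, and the lower case DNA variant is an assumption
(no such alphabet exists in Biopython at this point).

```python
# Hypothetical stand-ins, not the real Biopython classes.
class Alphabet:
    """Generic alphabet: letters undefined, so case changes return self."""
    letters = None

    def _upper(self):
        return self

    def _lower(self):
        return self


class UpperDNA(Alphabet):
    """Explicit upper case alphabet, mapped to an assumed lower case variant."""
    letters = "GATC"

    def _lower(self):
        return LowerDNA()


class LowerDNA(Alphabet):
    """Assumed lower case variant of the explicit alphabet."""
    letters = "gatc"

    def _upper(self):
        return UpperDNA()


class Seq:
    def __init__(self, data, alphabet):
        self.data = data
        self.alphabet = alphabet

    def __str__(self):
        return self.data

    def upper(self):
        # The alphabet decides what its upper case counterpart is.
        return Seq(str(self).upper(), self.alphabet._upper())

    def lower(self):
        return Seq(str(self).lower(), self.alphabet._lower())


my_dna = Seq("AGGGTGTTGA", UpperDNA())
lowered = my_dna.lower()
print(str(lowered), type(lowered.alphabet).__name__)
print(str(lowered.upper()), type(lowered.upper().alphabet).__name__)
```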



From bugzilla-daemon at portal.open-bio.org  Mon Jan 12 19:21:42 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 12 Jan 2009 19:21:42 -0500
Subject: [Biopython-dev] [Bug 2731] Adding .upper() and .lower() methods to
	the Seq object
In-Reply-To: <bug-2731-42@http.bugzilla.open-bio.org/>
Message-ID: <200901130021.n0D0LgUu024264@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2731


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1191 is|0                           |1
           obsolete|                            |


------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-12 19:21 EST -------
(From update of attachment 1191)
There are a couple of "if" statements which should be "elif", but otherwise the
patch seems to cover the basics.

However, it does not cover the pathological/evil situation where a LETTER has
been used for a stop codon or gap character, e.g. something like this should
happen (assuming Bug 2597 is implemented in order to trigger the exception
shown):

>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import IUPAC, Gapped
>>> my_dna = Seq("AGGGTXGTTGA",Gapped(IUPAC.IUPACUnambiguousDNA(), "x"))
Traceback (most recent call last):
...
ValueError: Letter 'X' not in Gapped(IUPACUnambiguousDNA(), 'x')
>>> my_dna = Seq("AGGGTxGTTGA",Gapped(IUPAC.IUPACUnambiguousDNA(), "x"))
>>> my_dna.lower()
Seq('agggtxgttga', Gapped(DNAAlphabet(), 'x'))
>>> my_dna.lower().upper()
Seq('AGGGTXGTTGA', Gapped(DNAAlphabet(), 'X'))

I think the most elegant way to deal with the AlphabetEncoders (stop and gaps)
is by adding (private) upper/lower methods to the Alphabet objects as I
outlined in comment 2. Patch taking this approach to follow...
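
To illustrate why the encoder needs its own methods, here is a
hypothetical sketch (stand-in classes, not the attached patch): the
wrapper changes the case of its extra character as well as delegating
to the alphabet it wraps.

```python
# Hypothetical stand-ins, not the real Bio.Alphabet classes.
class GenericDNA:
    """Generic alphabet with undefined letters: case changes return self."""

    def _upper(self):
        return self

    def _lower(self):
        return self


class Gapped:
    """Encoder which adds a gap character to another alphabet."""

    def __init__(self, alphabet, gap_char):
        self.alphabet = alphabet
        self.gap_char = gap_char

    def _upper(self):
        # Change the case of the wrapped alphabet AND of the gap letter.
        return Gapped(self.alphabet._upper(), self.gap_char.upper())

    def _lower(self):
        return Gapped(self.alphabet._lower(), self.gap_char.lower())


evil = Gapped(GenericDNA(), "x")
print(evil._upper().gap_char)
print(evil._upper()._lower().gap_char)
# A conventional gap character is unaffected by case changes.
print(Gapped(GenericDNA(), "-")._upper().gap_char)
```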



From bugzilla-daemon at portal.open-bio.org  Mon Jan 12 19:30:55 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 12 Jan 2009 19:30:55 -0500
Subject: [Biopython-dev] [Bug 2731] Adding .upper() and .lower() methods to
	the Seq object
In-Reply-To: <bug-2731-42@http.bugzilla.open-bio.org/>
Message-ID: <200901130030.n0D0UtHL024905@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2731


------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-12 19:30 EST -------
Created an attachment (id=1192)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1192&action=view)
Patch to Bio/Seq.py and Bio/Alphabet/__init__.py

Implements upper/lower methods in the Seq object, handling the alphabet case
conversion in the Alphabet object using (private) upper/lower methods.  This
could be extended for the IUPAC alphabets if we add lower case variants to
those (see Bug 2532).

This works for the evil example in comment 3 where the case of any extra
characters from an AlphabetEncoder should also be changed.



From dalloliogm at gmail.com  Tue Jan 13 06:49:19 2009
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Tue, 13 Jan 2009 12:49:19 +0100
Subject: [Biopython-dev] Developmental and experimental branches
In-Reply-To: <320fb6e00901100903v1aa0180bsd1ca5335f7da1f7f@mail.gmail.com>
References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com>
	<320fb6e00901090455y334ecebdo55cd7c1a718ab499@mail.gmail.com>
	<496778D2.1050801@gmail.com>
	<5aa3b3570901090917m7c82fb17hb4c93235702b120b@mail.gmail.com>
	<320fb6e00901090928u662023d6rd6f2d82c5fbb7365@mail.gmail.com>
	<8b34ec180901091017o172e4acemf91c0a46a736bdb1@mail.gmail.com>
	<20090109225155.GF4135@sobchak.mgh.harvard.edu>
	<320fb6e00901100646y6132686ap8a928404dd1e36c3@mail.gmail.com>
	<6d941f120901100852g47b10e9ar214cf2ad2b206f6@mail.gmail.com>
	<320fb6e00901100903v1aa0180bsd1ca5335f7da1f7f@mail.gmail.com>
Message-ID: <5aa3b3570901130349u32924629lcf914579de34626e@mail.gmail.com>

On Sat, Jan 10, 2009 at 6:03 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> On Sat, Jan 10, 2009 at 4:52 PM, Tiago Antão <tiagoantao at gmail.com> wrote:
>> On Sat, Jan 10, 2009 at 2:46 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>>> Using the wiki in this way is a nice idea.  Tiago - do you fancy
>>> adding a PopGen page describing the additions you're working on?  As a
>>> bonus, once these do get into the main repository, you may find the
>>> wiki text will be a useful basis for extending the documentation.
>>
>> Where do you want me to link the page on the Wiki?
>
> How about having two pages:
>
> http://biopython.org/wiki/PopGen
> - documentation on the code in the current official release,
> - linked to from the main doc page
>
> http://biopython.org/wiki/PopGen_dev

ok, I have started writing something there..




-- 

My blog on bioinformatics (now in English): http://bioinfoblog.it


From tiagoantao at gmail.com  Tue Jan 13 07:14:05 2009
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Tue, 13 Jan 2009 12:14:05 +0000
Subject: [Biopython-dev] Structure and LDNe
In-Reply-To: <496B7F3A.60407@gmail.com>
References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com>
	<496397C9.3030706@gmail.com>
	<6.1.2.0.2.20090108141534.0323a5f0@mail.lmi.net>
	<320fb6e00901090459x52976449gddcc4483699e0f56@mail.gmail.com>
	<496B7F3A.60407@gmail.com>
Message-ID: <6d941f120901130414v3f770f3dy84bc44e4b4a8e25f@mail.gmail.com>

> Linking against header files is a gray area but some views considered it to
> be illegal (see the Linux kernel discussions on that!). It does really
> depend on whether or not the result can be considered to a derivative.

Fortunately this is not the case with Jason's code.
Anyway, if there is agreement on what you said, I think most of the
comments made should be put on the Wiki in some form. I don't mind
drafting something myself based on your comments.

From tiagoantao at gmail.com  Tue Jan 13 07:34:56 2009
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Tue, 13 Jan 2009 12:34:56 +0000
Subject: [Biopython-dev] Developmental policies
In-Reply-To: <496B77F1.9060207@gmail.com>
References: <320fb6e00901100946g62e26170o7e112f3b2f357e76@mail.gmail.com>
	<6d941f120901101031w22d9730dy87353cf22646d6fc@mail.gmail.com>
	<320fb6e00901101210k5e715beer240aa9338aa7ce2f@mail.gmail.com>
	<496B77F1.9060207@gmail.com>
Message-ID: <6d941f120901130434u19c87dehe3c1376b4b20031@mail.gmail.com>

> I think we must be explicit in this and ensure that any accepted code is
> BSD-compatible because we can not ensure what people really know. Further
> the license of any application that Biopython interacts with must be clearly
> stated and the developer is responsible to get one if it does not have one.
> That way we know what is included and should help users as well in terms of
> whether or not they can use some application.


A point is not clear here to me: If you only interact with an (say
command-line and web-based) application, is there a problem if that
application has an unspecified license? There are 3 dimensions here
that I find important
1. If biopython interacts with an application with no license are there
possible liabilities with regards to the project? The same question in
regards to users?
2. I would note that interaction might be library based (with
linking - where we know problems exist), command-line based (are there
any problems?) and web-based (are there any problems different from
the command-line case?).
3. I would suppose (for licensed non-free apps) that some licenses
might not be clear in regards to this kind of usage. Would it be
necessary to inspect the licenses in detail?

A strict view regarding software without licenses (ie, no interaction
at all) would require immediate removal of the fdist code (not very
important, it is the part that is probably not used by anyone). No
inclusion of LDNe code. And more importantly no STRUCTURE interaction
code and no Genepop interaction code (although the file format parser
that is currently inside is OK).

So, the very pertinent questions are:
1. Can biopython command-line interact with applications with no license?
2. Is biopython interacting with applications (command-line or web)
for which the license is not clear regarding interaction with
software?

From p.j.a.cock at googlemail.com  Tue Jan 13 07:54:57 2009
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 13 Jan 2009 12:54:57 +0000
Subject: [Biopython-dev] Developmental policies
In-Reply-To: <6d941f120901130434u19c87dehe3c1376b4b20031@mail.gmail.com>
References: <320fb6e00901100946g62e26170o7e112f3b2f357e76@mail.gmail.com>
	<6d941f120901101031w22d9730dy87353cf22646d6fc@mail.gmail.com>
	<320fb6e00901101210k5e715beer240aa9338aa7ce2f@mail.gmail.com>
	<496B77F1.9060207@gmail.com>
	<6d941f120901130434u19c87dehe3c1376b4b20031@mail.gmail.com>
Message-ID: <320fb6e00901130454i13f1faedw29e049f9b9df9478@mail.gmail.com>

> So, the very pertinent question are:
> 1. Can biopython command-line interact with applications with no license?

I think so, yes.  If there was a license then it may try and impose
rules which could prevent this (possible in some legal
jurisdictions?).  Even "viral" licences like the GPL should be fine in
this context.

However, for the Population Genetics software you are talking about,
trying to get the authors to make their licence explicit would be
worthwhile (even if they just say it's given freely to the public
domain or whatever the terminology is).

> 2. Is biopython interacting with applications (command-line or web)
> for which the license is not clear regarding interaction with
> software?

For command line tools (e.g. ClustalW, BLAST) calling them from a
script is common practice.  In fact, by their nature command line tools
are generally expected to be used in this way.  I think we are OK
here.
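
As a generic illustration of what "calling them from a script" means
(a sketch only: the helper name run_tool is hypothetical, and the
Python interpreter itself stands in for a real tool, since the actual
ClustalW or BLAST command lines are not reproduced here):

```python
import subprocess
import sys


def run_tool(command_line):
    """Run an external program and return its standard output as text.

    Hypothetical helper for illustration.  check=True raises
    CalledProcessError if the tool exits with a non-zero status.
    """
    result = subprocess.run(
        command_line, capture_output=True, text=True, check=True
    )
    return result.stdout


# sys.executable stands in for the external tool being wrapped.
output = run_tool([sys.executable, "-c", "print('alignment done')"])
print(output.strip())
```

The script only builds a command line, runs the separate program, and
reads back what it printed; no code from the tool itself is included.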

For web tools, in some cases the provider provides clear instructions
(e.g. NCBI and BLAST and Entrez).  Another example is Bio.PDB can
fetch files from the FTP site - which is by its nature provided as a
public server.  In other cases things are perhaps a little less clear
cut.  Speaking generally, many websites do have conditions imposed in
their terms of service (e.g. TV listing sites don't want people
"screen scraping" with a script to "steal" the schedule information),
although these may not be legally enforceable.  However, this is
unlikely to be a problem in the academic setting applicable to most
websites Biopython may interact with.

Peter

From bsouthey at gmail.com  Tue Jan 13 11:50:28 2009
From: bsouthey at gmail.com (Bruce Southey)
Date: Tue, 13 Jan 2009 10:50:28 -0600
Subject: [Biopython-dev] Developmental policies
In-Reply-To: <6d941f120901130434u19c87dehe3c1376b4b20031@mail.gmail.com>
References: <320fb6e00901100946g62e26170o7e112f3b2f357e76@mail.gmail.com>	
	<6d941f120901101031w22d9730dy87353cf22646d6fc@mail.gmail.com>	
	<320fb6e00901101210k5e715beer240aa9338aa7ce2f@mail.gmail.com>	
	<496B77F1.9060207@gmail.com>
	<6d941f120901130434u19c87dehe3c1376b4b20031@mail.gmail.com>
Message-ID: <496CC654.5090806@gmail.com>

Tiago Antão wrote:
>> I think we must be explicit in this and ensure that any accepted code is
>> BSD-compatible because we can not ensure what people really know. Further
>> the license of any application that Biopython interacts with must be clearly
>> stated and the developer is responsible to get one if it does not have one.
>> That way we know what is included and should help users as well in terms of
>> whether or not they can use some application.
>>     
>
>
> A point is not clear here to me: If you only interact with an (say
> command-line and web-based) application, is there a problem if that
> application has an unspecified license? There are 3 dimensions here
> that I find important
> 1. If biopython interacts with a application with no license are there
> possible liabilities with regards to the project? The same question in
> regards to users? 
>   
I do not think that there is any real difference between the developer 
and the user as ignorance is usually not a good defense.

If you use code from another application in your project with little or 
no modification (such as rewriting the code into Python) or did 
reverse-engineering or even looked at the code then your application 
could be controlled by the license of that application. Obviously if it 
has a license then you must abide by those terms. If it does not have a
license and you do not get permission to use that code then you have 
violated the original author's copyrights and you are liable for 
damages. Of course, as in one of the most important open-source related 
cases in the USA, the Jacobsen v. Katzer case (eg 
http://www.groklaw.net/article.php?story=2008081313212422 ) about the 
Java Model Railroad Interface (JMRI), those damages may be nothing.

> 2. I would remember that interaction might be library based (with
> linking - where we know problems exist), command-line based (are there
> any problems?) and web-based (are there any problems different from
> the command-line case?).
>   
Unless the application forbids it then there is no problem on how you 
actually run the application. As Peter said, web tools also have
conditions that you have to keep or you will find yourself locked out.

The main problem is using someone else's code in your project and the
real problem is the actual terms of the code used. Using a function from
that code in yours, such as one that parses the output (especially if it
is in a binary format), is a potential violation.  If your code clearly
follows the published documentation or a clean-room approach (see
http://en.wikipedia.org/wiki/Clean_room_design ) was properly used then
there should be no problems. Linking only becomes a problem if your code
can be considered a derivative or the license forbids linking, as the
GPL does but the LGPL does not. However, this is a grey area as evident
from the use of binary drivers in Linux.

> 3. I would suppose (for licensed non-free apps) that some licenses
> might not be clear in regards to this kind of usage. Would it be
> necessary to inspect the licenses in detail?
>   
Yes, you must inspect any license in detail because even downloading the 
code can involve or imply acceptance of the terms. Some licenses,
usually for commercial applications, are rather nasty in terms of what can
and can not be done, such as forbidding reverse engineering. Even open
source licenses like the GPL v3 can have some unexpected side effects (ie
related to patents). Most non-open source licenses (including academic
only licenses) that I have seen related to bioinformatics usually are 
aimed at restricting the commercial usage of the code and the subsequent 
distribution of it. But you need to see if there are other restrictions 
involved that limit the output from that application.

> A strict view regarding software without licenses (ie, no interaction
> at all) would require immediate removal of the fdist code (not very
> important, it is the part that is probably not used by anyone). No
> inclusion of LDNe code. And more importantly no STRUCTURE interaction
> code and no Genepop interaction code (although the file format parser
> that currently inside is OK).
>   
If the interaction is just creating inputs, running the standalone 
application and parsing the output, then those interactions should be 
okay. Obviously the code to create the input and parse the output must
be independent of the application, for example based on public
documentation or a clean-room approach.

If the interaction creates a derivative such as when the code of the 
application is required in addition to your code then it is not okay. 
Further, as Peter commented elsewhere, there needs to be strong 
justification to include it into Biopython. Rather I would strongly 
suggest that you try to get your code included in the other application 
as it may help other users and you don't have to maintain a version of 
the original application.
> So, the very pertinent question are:
> 1. Can biopython command-line interact with applications with no license?
>   
Yes, but it must not be considered a derivative of the application or it
must do so in terms of the license. For example, AlignACE uses the 
Harvard  University license where everyone using it must have their own 
license or it can be run on a second computer provided that only one 
copy is running at a time.

> 2. Is biopython interacting with applications (command-line or web)
> for which the license is not clear regarding interaction with
> software?
>   
I do not know the answer to this question because I do not know or use 
all the applications involved. However, we do need to create a list of 
applications with associated web sites and licenses that Biopython 
'interacts' with which would answer this question.

Regards
Bruce

From bsouthey at gmail.com  Wed Jan 14 15:24:29 2009
From: bsouthey at gmail.com (Bruce Southey)
Date: Wed, 14 Jan 2009 14:24:29 -0600
Subject: [Biopython-dev] Running Biopython tests on windows xp
Message-ID: <496E49FD.4080305@gmail.com>

Hi,
I decided to install windows on a virtual system, in part to have a windows
test system.  I installed Python 2.5, numpy 1.2 and biopython 1.49 using
binary installers. I am aiming to add the optional software like
Reportlab and a C compiler.

Is there a way to run the Biopython tests within Python rather than 
using the system command line?

When I run the tests from the command line I get a number of failures
that I think are due to the lack of a C compiler.
Are these expected or do you want bug reports?

Bruce

C:\Documents and Settings\virtualme\Desktop\biopython-1.49\biopython-1.49>c:\Python25\python.exe setup.py test
running test
test_Ace ... ok
test_AlignIO ... ok
test_BioSQL ... skipping. Install MySQLdb or correct Tests/setup_BioSQL.py (not important if you do not plan to use BioSQL).
ok
test_BioSQL_SeqIO ... skipping. Install MySQLdb or correct Tests/setup_BioSQL.py (not important if you do not plan to use BioSQL).
ok
test_CAPS ... ERROR
test_Clustalw ... ok
test_Clustalw_tool ... skipping. Install clustalw or clustalw2 if you want to use Bio.Clustalw.
ok
test_Cluster ... FAIL
test_CodonTable ... ok
test_CodonUsage ... ok
test_Compass ... ok
test_Crystal ... ok
test_DocSQL ... skipping. Install MySQLdb if you want to use Bio.DocSQL.
ok
test_EmbossPrimer ... ok
test_Entrez ... ok
test_Enzyme ... ok
test_FSSP ... ok
test_Fasta ... ok
test_Fasta2 ... ok
test_File ... ok
test_GACrossover ... ok
test_GAMutation ... ok
test_GAOrganism ... ok
test_GAQueens ... ok
test_GARepair ... ok
test_GASelection ... ok
test_GFF ... skipping. Environment is not configured for this test (not important if you do not plan to use Bio.GFF).
ok
test_GFF2 ... skipping. Install MySQLdb if you want to use Bio.GFF.
ok
test_GenBank ... ok
test_GraphicsChromosome ... skipping. Install reportlab if you want to use Bio.Graphics.
ok
test_GraphicsDistribution ... skipping. Install reportlab if you want to use Bio.Graphics.
ok
test_GraphicsGeneral ... skipping. Install reportlab if you want to use Bio.Graphics.
ok
test_HMMCasino ... ok
test_HMMGeneral ... ok
test_HotRand ... ok
test_IsoelectricPoint ... ok
test_KDTree ... ERROR
test_KEGG ... ok
test_KeyWList ... ok
test_Location ... ok
test_LocationParser ... ok
test_LogisticRegression ... ok
test_MEME ... ok
test_MarkovModel ... ok
test_Medline ... ok
test_NCBIStandalone ... ok
test_NCBIXML ... ok
test_NCBI_qblast ... ok
test_NNExclusiveOr ... ok
test_NNGene ... ok
test_NNGeneral ... ok
test_Nexus ... ok
test_PDB ... ERROR
test_ParserSupport ... ok
test_Pathway ... ok
test_Phd ... ok
test_PopGen_FDist ... skipping. Install FDist if you want to use Bio.PopGen.FDist.
ok
test_PopGen_FDist_nodepend ... ok
test_PopGen_GenePop ... ok
test_PopGen_SimCoal ... skipping. Install SIMCOAL2 if you want to use Bio.PopGen.SimCoal.
ok
test_PopGen_SimCoal_nodepend ... ok
test_ProtParam ... ok
test_Registry ... ok
test_Restriction ... ERROR
test_SCOP_Astral ... ok
test_SCOP_Cla ... FAIL
test_SCOP_Des ... ok
test_SCOP_Dom ... ok
test_SCOP_Hie ... ok
test_SCOP_Raf ... FAIL
test_SCOP_Residues ... ok
test_SCOP_Scop ... ok
test_SProt ... ok
test_SVDSuperimposer ... ok
test_SeqIO ... ok
test_SeqIO_online ... ok
test_SeqUtils ... ok
test_SubsMat ... ok
test_UniGene ... ok
test_Wise ... skipping. Don't know how to find the Wise2 tool dnal on Windows.
ok
test_align ... ok
test_docstrings ... ok
test_geo ... ok
test_interpro ... ok
test_kNN ... ok
test_lowess ... ok
test_pairwise2 ... ok
test_prodoc ... ok
test_property_manager ... ok
test_prosite ... ok
test_prosite2 ... ok
test_psw ... skipping. Don't know how to find the Wise2 tool dnal on Windows.
ok
test_seq ... ok
test_translate ... ok
test_trie ... ERROR
test_triefind ... ERROR

======================================================================
ERROR: test_CAPS
----------------------------------------------------------------------
Traceback (most recent call last):
  File "run_tests.py", line 125, in runTest
    self.runSafeTest()
  File "run_tests.py", line 138, in runSafeTest
    cur_test = __import__(self.test_name)
  File "test_CAPS.py", line 3, in <module>
    from Bio.Restriction import *
  File "C:\Documents and Settings\virtualme\Desktop\biopython-1.49\biopython-1.49\Bio\Restriction\__init__.py", line 61, in <module>
    from Bio.Restriction.Restriction import *
  File "C:\Documents and Settings\virtualme\Desktop\biopython-1.49\biopython-1.49\Bio\Restriction\Restriction.py", line 96, in <module>
    from Bio.Restriction.PrintFormat import PrintFormat
  File "C:\Documents and Settings\virtualme\Desktop\biopython-1.49\biopython-1.49\Bio\Restriction\PrintFormat.py", line 14, in <module>
    from Bio.Restriction.DNAUtils import complement
ImportError: No module named DNAUtils

======================================================================
ERROR: test_KDTree
----------------------------------------------------------------------
Traceback (most recent call last):
  File "run_tests.py", line 125, in runTest
    self.runSafeTest()
  File "run_tests.py", line 138, in runSafeTest
    cur_test = __import__(self.test_name)
  File "test_KDTree.py", line 10, in <module>
    from Bio.KDTree.KDTree import _neighbor_test, _test
  File "C:\Documents and Settings\virtualme\Desktop\biopython-1.49\biopython-1.49\Bio\KDTree\__init__.py", line 10, in <module>
    from KDTree import KDTree
  File "C:\Documents and Settings\virtualme\Desktop\biopython-1.49\biopython-1.49\Bio\KDTree\KDTree.py", line 20, in <module>
    from Bio.KDTree import _CKDTree
ImportError: cannot import name _CKDTree

======================================================================
ERROR: test_PDB
----------------------------------------------------------------------
Traceback (most recent call last):
  File "run_tests.py", line 125, in runTest
    self.runSafeTest()
  File "run_tests.py", line 138, in runSafeTest
    cur_test = __import__(self.test_name)
  File "test_PDB.py", line 98, in <module>
    run_test()
  File "test_PDB.py", line 90, in run_test
    quick_neighbor_search_test()
  File "test_PDB.py", line 19, in quick_neighbor_search_test
    from Bio.PDB.NeighborSearch import NeighborSearch
  File "C:\Documents and Settings\virtualme\Desktop\biopython-1.49\biopython-1.49\Bio\PDB\NeighborSearch.py", line 8, in <module>
    from Bio.KDTree import *
  File "C:\Documents and Settings\virtualme\Desktop\biopython-1.49\biopython-1.49\Bio\KDTree\__init__.py", line 10, in <module>
    from KDTree import KDTree
  File "C:\Documents and Settings\virtualme\Desktop\biopython-1.49\biopython-1.49\Bio\KDTree\KDTree.py", line 20, in <module>
    from Bio.KDTree import _CKDTree
ImportError: cannot import name _CKDTree

======================================================================
ERROR: test_Restriction
----------------------------------------------------------------------
Traceback (most recent call last):
  File "run_tests.py", line 125, in runTest
    self.runSafeTest()
  File "run_tests.py", line 138, in runSafeTest
    cur_test = __import__(self.test_name)
  File "test_Restriction.py", line 8, in <module>
    from Bio.Restriction import *
  File "C:\Documents and Settings\virtualme\Desktop\biopython-1.49\biopython-1.49\Bio\Restriction\__init__.py", line 61, in <module>
    from Bio.Restriction.Restriction import *
  File "C:\Documents and Settings\virtualme\Desktop\biopython-1.49\biopython-1.49\Bio\Restriction\Restriction.py", line 96, in <module>
    from Bio.Restriction.PrintFormat import PrintFormat
  File "C:\Documents and Settings\virtualme\Desktop\biopython-1.49\biopython-1.49\Bio\Restriction\PrintFormat.py", line 13, in <module>
    from Bio.Restriction import RanaConfig as RanaConf
ImportError: cannot import name RanaConfig

======================================================================
ERROR: test_trie
----------------------------------------------------------------------
Traceback (most recent call last):
  File "run_tests.py", line 125, in runTest
    self.runSafeTest()
  File "run_tests.py", line 138, in runSafeTest
    cur_test = __import__(self.test_name)
  File "test_trie.py", line 6, in <module>
    from Bio import trie
ImportError: cannot import name trie

======================================================================
ERROR: test_triefind
----------------------------------------------------------------------
Traceback (most recent call last):
  File "run_tests.py", line 125, in runTest
    self.runSafeTest()
  File "run_tests.py", line 138, in runSafeTest
    cur_test = __import__(self.test_name)
  File "test_triefind.py", line 6, in <module>
    from Bio import trie
ImportError: cannot import name trie

======================================================================
FAIL: test_Cluster
----------------------------------------------------------------------
Traceback (most recent call last):
  File "run_tests.py", line 125, in runTest
    self.runSafeTest()
  File "run_tests.py", line 162, in runSafeTest
    expected_handle)
  File "run_tests.py", line 263, in compare_output
    % (repr(output_line), repr(expected_line))
AssertionError:
Output  : 'test_clusterdistance (test_Cluster.TestCluster) ... ERROR\n'
Expected: 'test_clusterdistance (test_Cluster.TestCluster) ... ok\n'

======================================================================
FAIL: test_SCOP_Cla
----------------------------------------------------------------------
Traceback (most recent call last):
  File "run_tests.py", line 125, in runTest
    self.runSafeTest()
  File "run_tests.py", line 162, in runSafeTest
    expected_handle)
  File "run_tests.py", line 263, in compare_output
    % (repr(output_line), repr(expected_line))
AssertionError:
Output  : 'testIndex (test_SCOP_Cla.ClaTests) ... ERROR\n'
Expected: 'testIndex (test_SCOP_Cla.ClaTests) ... ok\n'

======================================================================
FAIL: test_SCOP_Raf
----------------------------------------------------------------------
Traceback (most recent call last):
  File "run_tests.py", line 125, in runTest
    self.runSafeTest()
  File "run_tests.py", line 162, in runSafeTest
    expected_handle)
  File "run_tests.py", line 263, in compare_output
    % (repr(output_line), repr(expected_line))
AssertionError:
Output  : 'testSeqMapIndex (test_SCOP_Raf.RafTests) ... ERROR\n'
Expected: 'testSeqMapIndex (test_SCOP_Raf.RafTests) ... ok\n'

----------------------------------------------------------------------
Ran 96 tests in 86.153s

FAILED (failures=3, errors=6)

C:\Documents and Settings\virtualme\Desktop\biopython-1.49\biopython-1.49>

From tiagoantao at gmail.com  Wed Jan 14 15:52:58 2009
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Wed, 14 Jan 2009 20:52:58 +0000
Subject: [Biopython-dev] Developmental and experimental branches
In-Reply-To: <5aa3b3570901130349u32924629lcf914579de34626e@mail.gmail.com>
References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com>
	<496778D2.1050801@gmail.com>
	<5aa3b3570901090917m7c82fb17hb4c93235702b120b@mail.gmail.com>
	<320fb6e00901090928u662023d6rd6f2d82c5fbb7365@mail.gmail.com>
	<8b34ec180901091017o172e4acemf91c0a46a736bdb1@mail.gmail.com>
	<20090109225155.GF4135@sobchak.mgh.harvard.edu>
	<320fb6e00901100646y6132686ap8a928404dd1e36c3@mail.gmail.com>
	<6d941f120901100852g47b10e9ar214cf2ad2b206f6@mail.gmail.com>
	<320fb6e00901100903v1aa0180bsd1ca5335f7da1f7f@mail.gmail.com>
	<5aa3b3570901130349u32924629lcf914579de34626e@mail.gmail.com>
Message-ID: <6d941f120901141252x1a1088f9n7f30d894f35c18ab@mail.gmail.com>

>> http://biopython.org/wiki/PopGen_dev
>
> ok, I have started writing something there..

I've edited the development one. I would recommend anyone interested
in tracking the changes to watch the page.

From biopython at maubp.freeserve.co.uk  Wed Jan 14 16:43:33 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 14 Jan 2009 21:43:33 +0000
Subject: [Biopython-dev] Running Biopython tests on windows xp
In-Reply-To: <496E49FD.4080305@gmail.com>
References: <496E49FD.4080305@gmail.com>
Message-ID: <320fb6e00901141343r529df66did6e172343592843d@mail.gmail.com>

On Wed, Jan 14, 2009 at 8:24 PM, Bruce Southey <bsouthey at gmail.com> wrote:
> Hi,
> I decided to install Windows on a virtual system, in part to have a Windows test
> system.  I installed Python 2.5, numpy 1.2 and biopython 1.49 using binary
> installers. I am aiming to add the optional software like Reportlab and
> a C compiler.

If you are installing Biopython using our Windows Installer then you
shouldn't need a C compiler.

If you would like to install from source, then yes, you will need a C
compiler.  You can either try the appropriate MS compiler for your
version of python, or we suggest Mingw32 from cygwin.

> Is there a way to run the Biopython tests within Python rather than using
> the system command line?

Not really - why do you want to?  I suppose you could use python to
invoke the command "python run_tests.py".
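
A minimal sketch of that idea, written for a current Python 3 rather than the
Python 2.5 discussed in this thread. The helper names build_command and
run_suite are made up for illustration; the only things taken from the thread
are the script name run_tests.py and the fact that it accepts test names as
arguments.

```python
import subprocess
import sys


def build_command(tests=(), script="run_tests.py"):
    """Return the argument list for running the test script.

    sys.executable is used so the child process runs under the same
    interpreter as the caller.
    """
    return [sys.executable, script] + list(tests)


def run_suite(tests=(), cwd=None):
    """Run the test script in a child process and return its exit status.

    cwd should be the directory that contains run_tests.py.
    """
    return subprocess.call(build_command(tests), cwd=cwd)
```

Calling run_suite(["test_KDTree.py"], cwd=...) with the Tests directory as cwd
would then be the in-Python equivalent of typing the command by hand.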

> When I run the tests from the command line I get a number of failures that I
> think are due to a lack of a C compiler.
>
> Are these expected or do you want bug reports?

These are not expected.  The whole test suite passes for me on Windows
where I have installed Biopython from source.

So you installed Biopython using our Windows Installer - how did you
get the unit tests?  I'm pretty sure the SCOP failures are due to the
files under Tests\SCOP having Unix line endings instead of Windows
line endings (we've fixed some similar issues in the past).  Note that
both the source code archives as *.zip and *.tar.gz use Unix line
endings internally, but if you used CVS it should have got them with
Windows line endings for you.
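
If it helps to check which line endings a given file actually has, something
like the following would do it. This is only a sketch for a current Python 3,
and count_line_endings is a made-up helper name, not anything in Biopython.

```python
def count_line_endings(path):
    """Return (crlf, lf_only) counts for the file at path."""
    # Binary mode, so Python does no newline translation of its own.
    with open(path, "rb") as handle:
        data = handle.read()
    crlf = data.count(b"\r\n")
    # Every CRLF also contains an LF, so subtract those.
    lf_only = data.count(b"\n") - crlf
    return crlf, lf_only
```

A file with Unix line endings should report zero for the first number, and a
file with Windows line endings should report zero for the second.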

However, most of your test failures do seem to be related to C code in
some way.  I wonder if this is linked to the virtual environment?  I
should be able to try the Biopython 1.49 installer with Python 2.5 on
a Windows machine myself to check that...

The list of failures:
> test_CAPS ... ERROR
> test_Cluster ... FAIL
> test_KDTree ... ERROR
> test_PDB ... ERROR
> test_Restriction ... ERROR
> test_SCOP_Cla ... FAIL
> test_SCOP_Raf ... FAIL
> test_trie ... ERROR
> test_triefind ... ERROR

And some comments on the messages:

> ERROR: test_CAPS
> ...
>   from Bio.Restriction.DNAUtils import complement
> ImportError: No module named DNAUtils

Strange.  Note Bio.Restriction.DNAUtils is a C module.

> ERROR: test_KDTree
> ...
>   from Bio.KDTree import _CKDTree
> ImportError: cannot import name _CKDTree

Again, Bio.KDTree._CKDTree is a C module.

> ERROR: test_PDB
> ...
>   from Bio.KDTree import _CKDTree
> ImportError: cannot import name _CKDTree

Same failure as test_KDTree

> ERROR: test_Restriction
> ...
>   from Bio.Restriction import RanaConfig as RanaConf
> ImportError: cannot import name RanaConfig

Odd.  RanaConfig is a pure python module, and pretty short too.

> ERROR: test_trie
> ...
>   from Bio import trie
> ImportError: cannot import name trie

Bio.trie is another C module

> ERROR: test_triefind
> ...
>   from Bio import trie
> ImportError: cannot import name trie

Same error as test_trie above.

> FAIL: test_Cluster
> ...
> Output  : 'test_clusterdistance (test_Cluster.TestCluster) ... ERROR\n'
> Expected: 'test_clusterdistance (test_Cluster.TestCluster) ... ok\n'

Could you run this test directly (python test_Cluster.py)?  That should
give a more helpful message.  But again, this module does include some
C code....

> FAIL: test_SCOP_Cla
> ...
> Output  : 'testIndex (test_SCOP_Cla.ClaTests) ... ERROR\n'
> Expected: 'testIndex (test_SCOP_Cla.ClaTests) ... ok\n'

I think this is just a new line issue.

> FAIL: test_SCOP_Raf
> ...
> Output  : 'testSeqMapIndex (test_SCOP_Raf.RafTests) ... ERROR\n'
> Expected: 'testSeqMapIndex (test_SCOP_Raf.RafTests) ... ok\n'

I think this is just a new line issue.

Peter

From bsouthey at gmail.com  Wed Jan 14 17:48:27 2009
From: bsouthey at gmail.com (Bruce Southey)
Date: Wed, 14 Jan 2009 16:48:27 -0600
Subject: [Biopython-dev] Running Biopython tests on windows xp
In-Reply-To: <320fb6e00901141343r529df66did6e172343592843d@mail.gmail.com>
References: <496E49FD.4080305@gmail.com>
	<320fb6e00901141343r529df66did6e172343592843d@mail.gmail.com>
Message-ID: <496E6BBB.2020506@gmail.com>

Peter wrote:
> These are not expected.  The whole test suite passes for me on Windows
> where I have installed Biopython from source.
>
> So you installed Biopython using our Windows Installer - how did you
> get the unit tests?  I'm pretty sure the SCOP failures are due to the
> files under Tests\SCOP having Unix line endings instead of Windows
> line endings (we've fixed some similar issues in the past).  Note that
> both the source code archives as *.zip and *.tar.gz use Unix line
> endings internally, but if you used CVS it should have got them with
> Windows line endings for you.
>
> However, most of your test failures do seem to be related to C code in
> some way.  I wonder if this is linked to the virtual environment?  I
> should be able to try the Biopython 1.49 installer with Python 2.5 on
> a Windows machine myself to check that...
>
> The list of failures:
>   
>> test_CAPS ... ERROR
>> test_Cluster ... FAIL
>> test_KDTree ... ERROR
>> test_PDB ... ERROR
>> test_Restriction ... ERROR
>> test_trie ... ERROR
>> test_triefind ... ERROR
>>     
Using IDLE, 'from Bio.Restriction import *' works correctly.

These ones are failures to find the correct biopython installation. 
Both  'python setup.py test' and 'python run_tests.py' are assuming that 
I have built from source and everything is in the local directory. But 
that assumption is wrong since I used the Biopython binary installer so 
technically the tests I run are invalid. The difference for these 
failures can be seen here:

C:\Documents and Settings\virtualme\Desktop\Python_packages\biopython-1.49\biopython-1.49\Tests>c:\Python25\python.exe test_KDTree.py
Passed.
Passed.
Passed.
Passed.
Passed.
Passed.
Passed.
Passed.
Passed.
Passed.
Passed.
Passed.
Passed.
Passed.
Passed.
Passed.
Passed.
Passed.
Passed.
Passed.

C:\Documents and Settings\virtualme\Desktop\Python_packages\biopython-1.49\biopython-1.49\Tests>c:\Python25\python.exe run_tests.py test_KDTree.py
test_KDTree ... ERROR

======================================================================
ERROR: test_KDTree
----------------------------------------------------------------------
Traceback (most recent call last):
  File "run_tests.py", line 125, in runTest
    self.runSafeTest()
  File "run_tests.py", line 138, in runSafeTest
    cur_test = __import__(self.test_name)
  File "test_KDTree.py", line 10, in <module>
    from Bio.KDTree.KDTree import _neighbor_test, _test
  File "C:\Documents and Settings\virtualme\Desktop\biopython-1.49\biopython-1.49\Bio\KDTree\__init__.py", line 10, in <module>
  File "C:\Documents and Settings\virtualme\Desktop\biopython-1.49\biopython-1.49\Bio\KDTree\KDTree.py", line 20, in <module>
ImportError: cannot import name _CKDTree

----------------------------------------------------------------------
Ran 1 test in 0.100s

FAILED (errors=1)


For the SCOP tests, this is as you say, an 'end of line' issue between
Windows and Linux. I opened 'dir.cla.scop.txt_test' with WordPad and
saved it with a new name. The line from testIndex in test_SCOP_Cla.py 
that gave the error index['d4hbia_'] works with the new file but not the 
old file.

I also installed reportlab and biosql and these pass the tests (except 
for the mysql warning with Biosql that Peter reported).

Regards
Bruce

From biopython at maubp.freeserve.co.uk  Wed Jan 14 18:27:27 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 14 Jan 2009 23:27:27 +0000
Subject: [Biopython-dev] Running Biopython tests on windows xp
In-Reply-To: <496E6BBB.2020506@gmail.com>
References: <496E49FD.4080305@gmail.com>
	<320fb6e00901141343r529df66did6e172343592843d@mail.gmail.com>
	<496E6BBB.2020506@gmail.com>
Message-ID: <320fb6e00901141527t1d2be466uf0b87f12b0d51d3a@mail.gmail.com>

On Wed, Jan 14, 2009 at 10:48 PM, Bruce Southey <bsouthey at gmail.com> wrote:
> Using IDLE, 'from Bio.Restriction import *' works correctly.
>
> These ones are failures to find the correct biopython installation. Both
>  'python setup.py test' and 'python run_tests.py' are assuming that I have
> built from source and everything is in the local directory. But that
> assumption is wrong since I used the Biopython binary installer so
> technically the tests I run are invalid.

I think I understand what's going on now.  All these failures are
essentially due to the unusual and unexpected setup on your machine
(or for the SCOP tests, the line endings).  You still didn't explain
how/where you installed the test scripts etc, but what I think is
happening is the following:

Your official installation (including the compiled C code) created
using the Windows Installer is in one place, say under
C:\XXX\site-packages for the sake of discussion.

You've unpacked the source code in another location, and are trying to
run the test suite there.  This set of files will NOT have the
compiled C code - and thus running some of the tests via run_tests.py
will fail.  If you run individual test_XXX.py files this should use
the system installed files under C:\XXX\site-packages and so the test
should work.
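
One way to confirm which copy is being picked up is to ask Python where it
would load a module from. A minimal sketch for a current Python 3 (the
importlib.util calls below did not exist in Python 2.5, and module_origin is
a made-up helper name):

```python
import importlib.util


def module_origin(name):
    """Return the file a module would be loaded from, or None if not found."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        # Raised when a parent package is missing or has no usable spec.
        return None
    return spec.origin if spec is not None else None
```

Calling module_origin("Bio") from a plain interpreter and again from inside
the test harness should show whether the site-packages copy or the unpacked
source tree wins. For a compiled module such as Bio.KDTree._CKDTree a result
of None would mean it cannot be found on the current path.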

It would be a bit of a hack, but you can probably overcome this by
manually copying the installed compiled modules from
C:\XXX\site-packages into the unpacked source code (under a suitably
named build sub directory), or moving the Test suite next to the
installed code.

Alternatively, you could try editing run_tests.py to comment out the
path "magic" so that it just uses the system installation of Biopython
(rather than trying to use the local copy it expects you to have just
built from source), i.e. try commenting out these two lines in
run_tests.py found near the start of the main function:

sys.path.insert(1, source_path)
sys.path.insert(1, build_path)
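
To see why those two lines matter, here is a small self-contained
demonstration (current Python 3) of how an entry inserted near the front of
sys.path shadows a copy found later on the path. The module name
gr_shadow_demo and the temporary directories are invented for the demo and
stand in for the installed and unpacked copies of Bio.

```python
import importlib
import os
import sys
import tempfile

MODULE = "gr_shadow_demo"  # hypothetical module name, used only in this demo


def write_module(directory, label):
    """Create MODULE.py in directory, recording which copy it is."""
    with open(os.path.join(directory, MODULE + ".py"), "w") as handle:
        handle.write("WHERE = %r\n" % label)


def which_copy(path_list):
    """Import MODULE with path_list at the front of sys.path; report the copy."""
    saved = list(sys.path)
    sys.modules.pop(MODULE, None)
    importlib.invalidate_caches()
    sys.path[:] = list(path_list) + saved
    try:
        return importlib.import_module(MODULE).WHERE
    finally:
        # Put everything back so the demo leaves no trace.
        sys.path[:] = saved
        sys.modules.pop(MODULE, None)


def demo():
    with tempfile.TemporaryDirectory() as script_dir, \
            tempfile.TemporaryDirectory() as installed, \
            tempfile.TemporaryDirectory() as source:
        write_module(installed, "installed")
        write_module(source, "source")
        path = [script_dir, installed]
        before = which_copy(path)
        path.insert(1, source)  # same kind of call as in run_tests.py
        after = which_copy(path)
        return before, after
```

Here demo() returns ("installed", "source"): once the source directory is
inserted ahead of the installed one, the source copy is the one imported.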

However, I'm no longer surprised that the C code tests are failing,
and don't think this is a bug per se.

> For the SCOP tests, this is as you say, an 'end of line' issue between
> Windows and Linux. I opened 'dir.cla.scop.txt_test' with WordPad and
> saved it with a new name. The line from testIndex in test_SCOP_Cla.py that
> gave the error index['d4hbia_'] works with the new file but not the old
> file.

Good to confirm that.  If you spot an easy cross platform fix so that
the SCOP code can cope with either line ending that would be good, but
I didn't consider this worth spending much time on.
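
For what it is worth, one general way to make an offset-based lookup
indifferent to line endings is to do all the file handling in binary mode.
The sketch below (current Python 3) assumes the lookup works by remembering
where each record starts in the file; that is an assumption about the cause
of the failure, and these helper functions are illustrative only, not the
Bio.SCOP code.

```python
def build_offset_index(path):
    """Map the first field of each line to its byte offset in the file."""
    index = {}
    # Binary mode: offsets are plain byte counts on every platform.
    with open(path, "rb") as handle:
        while True:
            offset = handle.tell()
            line = handle.readline()
            if not line:
                break
            fields = line.split()
            if fields and not line.startswith(b"#"):
                index[fields[0].decode("ascii")] = offset
    return index


def fetch_line(path, index, key):
    """Return the record for key with any trailing CR/LF removed."""
    with open(path, "rb") as handle:
        handle.seek(index[key])
        return handle.readline().rstrip(b"\r\n").decode("ascii")
```

With this approach the same key returns the same text whether the file was
saved with Unix or Windows line endings.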

> I also installed reportlab and biosql and these pass the tests (except for
> the mysql warning with Biosql that Peter reported).

Good.  Out of interest, which BioSQL warning are you talking about?

Peter

From bsouthey at gmail.com  Wed Jan 14 22:10:30 2009
From: bsouthey at gmail.com (Bruce Southey)
Date: Wed, 14 Jan 2009 21:10:30 -0600
Subject: [Biopython-dev] Running Biopython tests on windows xp
In-Reply-To: <320fb6e00901141527t1d2be466uf0b87f12b0d51d3a@mail.gmail.com>
References: <496E49FD.4080305@gmail.com>
	<320fb6e00901141343r529df66did6e172343592843d@mail.gmail.com>
	<496E6BBB.2020506@gmail.com>
	<320fb6e00901141527t1d2be466uf0b87f12b0d51d3a@mail.gmail.com>
Message-ID: <bbcd77d00901141910s304ef76brc92257af459361b0@mail.gmail.com>

On Wed, Jan 14, 2009 at 5:27 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> On Wed, Jan 14, 2009 at 10:48 PM, Bruce Southey <bsouthey at gmail.com> wrote:
>> Using IDLE, 'from Bio.Restriction import *' works correctly.
>>
>> These ones are failures to find the correct biopython installation. Both
>>  'python setup.py test' and 'python run_tests.py' are assuming that I have
>> built from source and everything is in the local directory. But that
>> assumption is wrong since I used the Biopython binary installer so
>> technically the tests I run are invalid.
>
> I think I understand what's going on now.  All these failures are
> essentially due to the unusual and unexpected setup on your machine
> (or for the SCOP tests, the line endings).

I do not see it as unusual as it does follow the instructions. But
these clearly need some enhancement to address perhaps a variation of
one of the options below.

I am now curious about what happens under Linux distros because these
may have the same issue.

> You still didn't explain
> how/where you installed the test scripts etc, but what I think is
> happening is the following:
>
> Your official installation (including the compiled C code) created
> using the Windows Installer is in one place, say under
> C:\XXX\site-packages for the sake of discussion.
>
> You've unpacked the source code in another location, and are trying to
> run the test suite there.  This set of files will NOT have the
> compiled C code - and thus running some of the tests via run_tests.py
> will fail.  If you run individual test_XXX.py files this should use
> the system installed files under C:\XXX\site-packages and so the test
> should work.

Correct!

The installation documentation is lacking at least for the binary
installer. Depending on what happens, I will write down this
information.

Would it be a hassle to include the tests with the binary installer?
At least some of the tests should work if they are run from that directory.

>
> It would be a bit of a hack, but you can probably overcome this by
> manually copying the installed compiled modules from
> C:\XXX\site-packages into the unpacked source code (under a suitably
> named build sub directory), or moving the Test suite next to the
> installed code.

While this would work for the binary installer, I do not think it is
a suitable solution for building it from source - especially if someone
has the binary installer and is building but not necessarily installing
from source.

>
> Alternatively, you could try editing run_tests.py to comment out the
> path "magic" so that it just uses the system installation of Biopython
> (rather than trying to use the local copy it expects you to have just
> built from source), i.e. try commenting out these two lines in
> run_tests.py found near the start of the main function:
>
> sys.path.insert(1, source_path)
> sys.path.insert(1, build_path)

I think the best solution is to fix this part because these assume the
location of the source and build directories even if these are not
really present. I would suggest we add a new commandline option that
causes the source_path and/or build_path variables to be undefined
forcing Python to use the installed versions. Passing a user-specified
path is also an option but these can get long.


> However, I'm no longer surprised that the C code tests are failing,
> and don't think this is a bug per se.

Agreed - just a case that has not been addressed yet.

>
>> For the SCOP tests, this is as you say, an 'end of line' issue between
>> Windows and Linux. I opened 'dir.cla.scop.txt_test' with WordPad and
>> saved it with a new name. The line from testIndex in test_SCOP_Cla.py that
>> gave the error index['d4hbia_'] works with the new file but not the old
>> file.
>
> Good to confirm that.  If you spot an easy cross platform fix so that
> the SCOP code can cope with either line ending that would be good, but
> I didn't consider this worth spending much time on.

When I get to my system, I will see if my Linux system will accept the
file correctly because the other SCOP tests did work. If I get time I
will try to look at that as I looked at the function and I think it is
just the way the file is being used.
>
>> I also installed reportlab and biosql and these pass the tests (except for
>> the mysql warning with Biosql that Peter reported).
>
> Good.  Out of interest, which BioSQL warning are you talking about?
>
> Peter

Sorry, I do not have that handy but it is a deprecation one for a
setting that will be gone in MySQL 5.2.

Bruce

From biopython at maubp.freeserve.co.uk  Thu Jan 15 07:46:21 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 15 Jan 2009 12:46:21 +0000
Subject: [Biopython-dev] Running Biopython tests on windows xp
In-Reply-To: <bbcd77d00901141910s304ef76brc92257af459361b0@mail.gmail.com>
References: <496E49FD.4080305@gmail.com>
	<320fb6e00901141343r529df66did6e172343592843d@mail.gmail.com>
	<496E6BBB.2020506@gmail.com>
	<320fb6e00901141527t1d2be466uf0b87f12b0d51d3a@mail.gmail.com>
	<bbcd77d00901141910s304ef76brc92257af459361b0@mail.gmail.com>
Message-ID: <320fb6e00901150446j57748cf0mb493601444a9422d@mail.gmail.com>

>>
>> I think I understand what's going on now.  All these failures are
>> essentially due to the unusual and unexpected setup on your machine
>> (or for the SCOP tests, the line endings).
>
> I do not see it as unusual as it does follow the instructions. But
> these clearly need some enhancement to address perhaps a variation of
> one of the options below.

There are no instructions on how to install Biopython on Windows using
the provided installer and then run the unit tests - so I don't
understand what you mean by you followed the instructions.  If the
installer came with the unit tests then this would be sensible.

Right now the only documented way to run the unit tests is part of an
installation from source.

>> You've unpacked the source code in another location, and are trying to
>> run the test suite there.  This set of files will NOT have the
>> compiled C code - and thus running some of the tests via run_tests.py
>> will fail.  If you run individual test_XXX.py files this should use
>> the system installed files under C:\XXX\site-packages and so the test
>> should work.
>
> Correct!
>
> The installation documentation is lacking at least for the binary
> installer. Depending on what happens, I will write down this
> information.
>
> Would it be a hassle to include the tests with the binary installer?

I don't know enough about distutils to answer that.  So the short
answer is yes, it might be a hassle.

> At least some of the tests should work if they are run from that directory.

Which directory?

>> It would be a bit of a hack, but you can probably overcome this by
>> manually copying the installed compiled modules from
>> C:\XXX\site-packages into the unpacked source code (under a suitably
>> named build sub directory), or moving the Test suite next to the
>> installed code.
>
> While this would work for the binary installer, I do not think it is
> a suitable solution for building it from source - especially if someone
> has the binary installer and is building but not necessarily installing
> from source.

The hack suggested was specifically for combining the installed files
from the Windows installer with the test suite by hand - you don't
need to do anything special if you are building from source.  The
current run_tests.py should work perfectly for anyone building from
source (on Windows, Linux and Mac).  You can (and ideally should)
build biopython, and then run the tests BEFORE installing it.

>> Alternatively, you could try editing run_tests.py to comment out the
>> path "magic" so that it just uses the system installation of Biopython
>> (rather than trying to use the local copy it expects you to have just
>> built from source), i.e. try commenting out these two lines in
>> run_tests.py found near the start of the main function:
>>
>> sys.path.insert(1, source_path)
>> sys.path.insert(1, build_path)
>
> I think the best solution is to fix this part because these assume the
> location of the source and build directories even if these are not
> really present.

If you are building from source this is a safe assumption (and in fact
the code does check they exist).  We WANT to run the tests using the
just built and not yet installed files!

> I would suggest we add a new commandline option that
> causes the source_path and/or build_path variables to be undefined
> forcing Python to use the installed versions. Passing a user-specified
> path is also an option but these can get long.

Yes, an option to run_tests.py to use the system installed version of
Biopython could solve this particular situation.  Alternatively, and
perhaps more simply for the end user, we could add a prompt if there
is no build directory to ask the user if they want to run the tests
using an already installed version of Biopython.  I might have time to
come up with a patch for this...
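
As a rough illustration of the command line option idea (current Python 3;
argparse was not available in Python 2.5), something along these lines could
work. The --use-installed flag is hypothetical and is not an existing
run_tests.py option, and parse_args and extend_path are made-up helper names.

```python
import argparse


def parse_args(argv):
    """Parse the (hypothetical) command line of a test runner."""
    parser = argparse.ArgumentParser(description="Run the test suite (sketch).")
    # Hypothetical flag: not an existing run_tests.py option.
    parser.add_argument("--use-installed", action="store_true",
                        help="do not add the local source/build paths")
    parser.add_argument("tests", nargs="*")
    return parser.parse_args(argv)


def extend_path(path_list, source_path, build_path, use_installed):
    """Return a copy of path_list, with local paths added unless told not to."""
    result = list(path_list)
    if not use_installed:
        # Same order as the two lines quoted above: build ends up first.
        result.insert(1, source_path)
        result.insert(1, build_path)
    return result
```

Leaving the path untouched when the flag is given means the normal import
rules apply, so the system installed Biopython is what gets tested.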

>> However, I'm no longer surprised that the C code tests are failing,
>> and don't think this is a bug per se.
>
> Agreed - just a case that has not been addressed yet.

----------------------------------------------------------------------------------------------

>>> I also installed reportlab and biosql and these pass the tests (except for
>>> the mysql warning with Biosql that Peter reported).
>>
>> Good.  Out of interest, which BioSQL warning are you talking about?
>>
>> Peter
>
> Sorry, I do not have that handy but it is a deprecation one for a
> setting that will be gone in MySQL 5.2.

You might be referring to BioSQL Bug 2568,
http://bugzilla.open-bio.org/show_bug.cgi?id=2568

Peter

From bugzilla-daemon at portal.open-bio.org  Thu Jan 15 09:37:57 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 15 Jan 2009 09:37:57 -0500
Subject: [Biopython-dev] [Bug 2733] New: Unit tests incorrectly assume that
	Biopthyon was built from source
Message-ID: <bug-2733-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2733

           Summary: Unit tests incorrectly assume that Biopthyon was built
                    from source
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: All
            Status: NEW
          Severity: minor
          Priority: P4
         Component: Unit Tests
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: bsouthey at gmail.com


If Biopython is not built from source and the tests are run from a different
place than the installation, the tests that use C objects fail because these are
not found (an example is below).

Currently the test environment uses the Biopython in the build directory. It
would be nice to be able to optionally specify some other Biopython such as the
installed version using say a command line argument.

Example of a failure:

======================================================================
ERROR: test_KDTree                                                    
----------------------------------------------------------------------
Traceback (most recent call last):                                    
  File "run_tests.py.orig", line 125, in runTest                      
    self.runSafeTest()                                                
  File "run_tests.py.orig", line 138, in runSafeTest                  
    cur_test = __import__(self.test_name)                             
  File "test_KDTree.py", line 10, in <module>                         
    from Bio.KDTree.KDTree import _neighbor_test, _test               
  File "/home/bsouthey/python/biopython_cvs/biopython/Bio/KDTree/__init__.py", line 10, in <module>
    from KDTree import KDTree                                                   
  File "/home/bsouthey/python/biopython_cvs/biopython/Bio/KDTree/KDTree.py", line 20, in <module>
    from Bio.KDTree import _CKDTree                                             
ImportError: cannot import name _CKDTree  
======================================================================



From bugzilla-daemon at portal.open-bio.org  Thu Jan 15 09:44:15 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 15 Jan 2009 09:44:15 -0500
Subject: [Biopython-dev] [Bug 2733] Unit tests incorrectly assume that
	Biopthyon was built from source
In-Reply-To: <bug-2733-42@http.bugzilla.open-bio.org/>
Message-ID: <200901151444.n0FEiFd8020991@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2733


------- Comment #1 from bsouthey at gmail.com  2009-01-15 09:44 EST -------
Created an attachment (id=1197)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1197&action=view)
Patch to avoid adding source path if Biopython is not built from source

This is a simple patch that just moves the inclusion of the source path to
being conditional on the presence of the build directory. That is, if a build
directory exists, then we assume that Biopython was built from the source. But
if the build directory does not exist then the source path is not added and the
test environment will use the installed Biopython and not the source directory. 

This patch works on a Linux system with the build directory removed and a
Windows XP system using the binary Biopython installer.
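
The attachment itself is not reproduced in this message, but the logic
described above can be sketched roughly as follows (current Python 3). The
function name add_local_paths is made up, and the details of the real patch
may differ.

```python
import os


def add_local_paths(path_list, source_path, build_path):
    """Insert the local paths only when a build directory is present.

    Returns True if the paths were added, False if the installed
    Biopython will be used instead.
    """
    if os.path.isdir(build_path):
        path_list.insert(1, source_path)
        path_list.insert(1, build_path)
        return True
    return False
```

When the build directory is missing, the path list is left alone and Python
falls back to whatever Biopython is installed.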



From bugzilla-daemon at portal.open-bio.org  Thu Jan 15 10:20:58 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 15 Jan 2009 10:20:58 -0500
Subject: [Biopython-dev] [Bug 2733] Unit tests incorrectly assume that
	Biopthyon was built from source
In-Reply-To: <bug-2733-42@http.bugzilla.open-bio.org/>
Message-ID: <200901151520.n0FFKwqZ024124@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2733


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-15 10:20 EST -------
Created an attachment (id=1198)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1198&action=view)
Patch to Tests/run_tests.py

Bruce,

Could you try out this alternative patch which tries to tell the user what is
happening in this atypical situation.

Peter



From bugzilla-daemon at portal.open-bio.org  Thu Jan 15 10:26:13 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 15 Jan 2009 10:26:13 -0500
Subject: [Biopython-dev] [Bug 2733] Runing unit tests where Biopthyon wasn't
	built from source
In-Reply-To: <bug-2733-42@http.bugzilla.open-bio.org/>
Message-ID: <200901151526.n0FFQD5F024483@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2733


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|minor                       |enhancement
            Summary|Unit tests incorrectly      |Runing unit tests where
                   |assume that Biopthyon was   |Biopthyon wasn't built from
                   |built from source           |source


------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-15 10:26 EST -------
Retitling bug and marking it as an enhancement.

The main use case for this is Windows users who installed Biopython from one
of our Windows Installers (pre-compiled, does not include the unit tests), and
later download and unzip the source code archive in order to run the unit
tests.

As Bruce points out, this might also apply to Linux users who install a
Biopython package (pre-compiled, and presumably not including the unit tests),
and then want to run the unit tests without themselves compiling Biopython.



From bugzilla-daemon at portal.open-bio.org  Thu Jan 15 10:41:34 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 15 Jan 2009 10:41:34 -0500
Subject: [Biopython-dev] [Bug 2733] Runing unit tests where Biopthyon wasn't
	built from source
In-Reply-To: <bug-2733-42@http.bugzilla.open-bio.org/>
Message-ID: <200901151541.n0FFfYgG025830@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2733


------- Comment #4 from dalloliogm at gmail.com  2009-01-15 10:41 EST -------
(In reply to comment #0)

What about re-organizing the tests in three categories:
- the ones needed to make sure the modules don't contain errors
- the ones needed to make sure that biopython can run correctly in the user's
environment
- the ones needed to make sure that the C modules are compiled correctly.

Usually, people don't need to repeat the tests from case 1, but only case 2 and
case 3 if they have compiled Biopython themselves.
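
As a rough sketch of how the third category could be handled (current
Python 3, standard unittest), tests that need a compiled module could be
skipped rather than failed when that module is missing. The helper names and
the placeholder module names below are invented for illustration and are not
part of the Biopython test suite.

```python
import importlib.util
import unittest


def have_module(name):
    """Return True if module `name` can be found on the current path."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        return False


def requires_module(name):
    """Skip a test (rather than fail it) when module `name` is unavailable."""
    return unittest.skipUnless(have_module(name), "requires %s" % name)


class CompiledCodeTests(unittest.TestCase):
    @requires_module("gr_no_such_compiled_module")  # placeholder name
    def test_needs_extension(self):
        self.fail("never runs: the placeholder module does not exist")

    @requires_module("math")  # stands in for a module that is present
    def test_extension_present(self):
        import math
        self.assertEqual(math.floor(2.5), 2)
```

A user without the compiled modules would then see those tests reported as
skipped instead of as errors.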



From bugzilla-daemon at portal.open-bio.org  Thu Jan 15 11:09:34 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 15 Jan 2009 11:09:34 -0500
Subject: [Biopython-dev] [Bug 2733] Runing unit tests where Biopthyon wasn't
	built from source
In-Reply-To: <bug-2733-42@http.bugzilla.open-bio.org/>
Message-ID: <200901151609.n0FG9Y5V028318@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2733


------- Comment #5 from bsouthey at gmail.com  2009-01-15 11:09 EST -------
(In reply to comment #2)
> Created an attachment (id=1198)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1198&action=view) [details]
> Patch to Tests/run_tests.py
> 
> Bruce,
> 
> Could you try out this alternative patch which tries to tell the user what is
> happening in this atypical situation.
> 
> Peter
> 

A quick check shows it works on my Linux system, where I removed the build
directory but have Biopython installed. I will let you know for Windows and
also for when Biopython is not installed. But I do not foresee any problems
with the patch.

Bruce



From bugzilla-daemon at portal.open-bio.org  Thu Jan 15 12:18:31 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 15 Jan 2009 12:18:31 -0500
Subject: [Biopython-dev] [Bug 2733] Runing unit tests where Biopthyon wasn't
	built from source
In-Reply-To: <bug-2733-42@http.bugzilla.open-bio.org/>
Message-ID: <200901151718.n0FHIVSm001687@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2733


------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-15 12:18 EST -------
(In reply to comment #4)
> (In reply to comment #0)
> 
> What about re-organizing the tests in three categories:
> - the ones needed to make sure the modules don't contain errors
> - the ones needed to make sure that biopython can run correctly
>   in the user's environment
> - the ones needed to make sure that the C modules are compiled correctly.
> 
> Usually, people don't need to repeat the tests from case 1, but only
> case 2 and in 3 if they have compiled biopython by theirselves.

Case 1 applies to all the unit tests.
Case 2 applies to all the unit tests whose dependencies are present.
Case 3 applies to those modules with C code.

I don't really understand your divisions.  If I was compiling Biopython myself,
I'd want all the tests run.  If I installed a pre-compiled version of Biopython
(from a Linux distribution or the Windows installers), I'd still want to try
and run all the tests.

There is the special case of trying to use Biopython without the C code modules
(e.g. installing from source without a C compiler, or for repackaging a subset
of the modules), but that is atypical.



From bugzilla-daemon at portal.open-bio.org  Thu Jan 15 15:31:21 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 15 Jan 2009 15:31:21 -0500
Subject: [Biopython-dev] [Bug 2733] Runing unit tests where Biopthyon wasn't
	built from source
In-Reply-To: <bug-2733-42@http.bugzilla.open-bio.org/>
Message-ID: <200901152031.n0FKVLDp015913@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2733


------- Comment #7 from bsouthey at gmail.com  2009-01-15 15:31 EST -------
(In reply to comment #5)
> (In reply to comment #2)
Just to confirm that it works as expected with Windows XP:
1) Without Biopython installed:

C:\Documents and Settings\virtualme\Desktop\Python_packages\biopython-1.49\biopython-1.49\Tests>c:\Python25\python.exe run_tests3.py test_trie.py
You do not seem to have built Biopython from source.
You do not seem to have installed Biopython.

2) With Biopython installed:
C:\Documents and Settings\virtualme\Desktop\Python_packages\biopython-1.49\biopython-1.49\Tests>c:\Python25\python.exe run_tests3.py test_trie.py
You do not seem to have built Biopython from source.
Unit tests will be run using the installed Biopython.
test_trie ... ok

----------------------------------------------------------------------
Ran 1 test in 0.731s

OK
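
The behaviour shown in the two runs above could be sketched roughly as
follows. This is only an illustration, not the attached patch: the function
name describe_biopython_source, the assumption that a source build leaves a
build/lib* directory next to Tests/, and the injectable find_spec argument
are all hypothetical. Only the three messages are taken from the output above.

```python
import importlib.util
import os


def describe_biopython_source(tests_dir, find_spec=importlib.util.find_spec):
    """Return (mode, messages) describing which Biopython the tests would use.

    mode is "build", "installed", or None when neither is available.
    """
    messages = []
    # Assumption: a source build leaves a build/lib* directory next to Tests/.
    build_root = os.path.join(os.path.dirname(os.path.abspath(tests_dir)), "build")
    build_dirs = []
    if os.path.isdir(build_root):
        build_dirs = [d for d in os.listdir(build_root) if d.startswith("lib")]
    if build_dirs:
        return "build", messages
    messages.append("You do not seem to have built Biopython from source.")
    # find_spec returns None when no top-level package of that name is importable.
    if find_spec("Bio") is None:
        messages.append("You do not seem to have installed Biopython.")
        return None, messages
    messages.append("Unit tests will be run using the installed Biopython.")
    return "installed", messages
```

A test runner could print the returned messages and stop early when the mode
is None, which matches case 1) above.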



From bugzilla-daemon at portal.open-bio.org  Thu Jan 15 18:55:14 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 15 Jan 2009 18:55:14 -0500
Subject: [Biopython-dev] [Bug 2734] New: db.load problem with postgresql and
	psycopg2
Message-ID: <bug-2734-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2734

           Summary: db.load problem with postgresql and psycopg2
           Product: Biopython
           Version: 1.49
          Platform: Macintosh
        OS/Version: Mac OS
            Status: NEW
          Severity: normal
          Priority: P2
         Component: BioSQL
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: stephen at blackrim.net


I have a simple script to load sequences into a PostgreSQL database using the
BioSQL schema and the Biopython db.load function.

Here is the script:

from Bio import GenBank
from Bio import SeqIO
from BioSQL import BioSeqDatabase
server = BioSeqDatabase.open_database(driver="psycopg2", user=...)
db = server["plants"]
for i in range(37):
        handle = open("PLN/gbpln"+str(i+1)+".seq", "r")
        db.load(SeqIO.parse(handle,"genbank"))
        handle.close()
        print str(i+1)
server.adaptor.commit()

There is an error with the output; here it is with some of the psycopg2
debug info:

asis_dealloc: deleted asis object at 0x52350, refcnt = 0
psyco_curs_execute: cvt->refcnt = 1
curs_execute: pg connection at 0x8d0c00 OK
pq_begin: pgconn = 0x8d0c00, isolevel = 1, status = 2
pq_begin: transaction in progress
pq_execute: executing SYNC query:
   SELECT seqfeature_id, dbxref_id FROM seqfeature_dbxref WHERE seqfeature_id =
"3" AND dbxref_id = "6"
pq_execute: entering syncronous DBAPI compatibility mode
pq_fetch: pgstatus = PGRES_FATAL_ERROR
pq_fetch: uh-oh, something FAILED
pq_fetch: fetching done; check for critical errors
psyco_curs_execute: res = -1, pgres = 0x0
Traceback (most recent call last):
 File "add_seqs_subdb2 2.py", line 9, in <module>
   db.load(SeqIO.parse(handle,"genbank"))
 File "/Library/Python/2.5/site-packages/BioSQL/BioSeqDatabase.py", line 420,
in load
   db_loader.load_seqrecord(cur_record)
 File "/Library/Python/2.5/site-packages/BioSQL/Loader.py", line 50, in
load_seqrecord
   self._load_seqfeature(seq_feature, seq_feature_num, bioentry_id)
 File "/Library/Python/2.5/site-packages/BioSQL/Loader.py", line 508, in
_load_seqfeature
   self._load_seqfeature_qualifiers(feature.qualifiers, seqfeature_id)
 File "/Library/Python/2.5/site-packages/BioSQL/Loader.py", line 607, in
_load_seqfeature_qualifiers
   seqfeature_id)
 File "/Library/Python/2.5/site-packages/BioSQL/Loader.py", line 645, in
_load_seqfeature_dbxref
   self._get_seqfeature_dbxref(seqfeature_id, dbxref_id, rank+1)
 File "/Library/Python/2.5/site-packages/BioSQL/Loader.py", line 679, in
_get_seqfeature_dbxref
   dbxref_id))
 File "/Library/Python/2.5/site-packages/BioSQL/BioSeqDatabase.py", line 295,
in execute_and_fetch_col0
   self.cursor.execute(sql, args or ())
psycopg2.ProgrammingError: column "3" does not exist
LINE 1: ...f_id FROM seqfeature_dbxref WHERE seqfeature_id = "3" AND db...

It seems like there could be some issue with the double quotes, but I am not
sure where that is being called. I am using PostgreSQL 8.2.
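
For background: in PostgreSQL, double quotes delimit identifiers (such as
column names) and single quotes delimit string literals, which is consistent
with the 'column "3" does not exist' message. A minimal sketch of passing the
values as bound parameters, so that the driver does the quoting, is given
below. It is not the BioSQL loader code. It uses the standard library sqlite3
module purely as a stand-in so that it runs anywhere (sqlite3 uses ?
placeholders where psycopg2 uses %s), and the two-column table is a cut-down,
hypothetical version of seqfeature_dbxref.

```python
import sqlite3

# In-memory stand-in database; the real case would be a psycopg2 connection.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE seqfeature_dbxref (seqfeature_id INTEGER, dbxref_id INTEGER)"
)
cur.execute("INSERT INTO seqfeature_dbxref VALUES (?, ?)", (3, 6))


def get_seqfeature_dbxref(cursor, seqfeature_id, dbxref_id):
    """Look up a row, letting the driver bind and quote the two values."""
    sql = (
        "SELECT seqfeature_id, dbxref_id FROM seqfeature_dbxref "
        "WHERE seqfeature_id = ? AND dbxref_id = ?"
    )
    cursor.execute(sql, (seqfeature_id, dbxref_id))
    return cursor.fetchall()


rows = get_seqfeature_dbxref(cur, 3, 6)
print(rows)
```

Whether hand-built quoting is really what goes wrong here is not established
in this report; the sketch only shows the parameter-binding style.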



From bugzilla-daemon at portal.open-bio.org  Fri Jan 16 05:24:16 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 16 Jan 2009 05:24:16 -0500
Subject: [Biopython-dev] [Bug 2734] db.load problem with postgresql and
	psycopg2
In-Reply-To: <bug-2734-42@http.bugzilla.open-bio.org/>
Message-ID: <200901161024.n0GAOGFA015422@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2734


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-16 05:24 EST -------
Hi Stephen,

Does this happen for all the files you've tried, or just one or two?  If it's
the latter, it may be something funny about the file and how it's been parsed.
I'm guessing you downloaded the GenBank files from
ftp://ftp.ncbi.nih.gov/genbank/ so could you tell us one which is failing?

Have you tried running the Biopython unit tests - in particular the two for
BioSQL?  I presume you installed Biopython from source on your Mac, so you
should have all the files present.  You'll need to edit the file
Tests/setup_BioSQL.py to point to a suitable postgresql test database.

P.S. As you are using Bio.SeqIO to parse the GenBank file, you don't need to
import Bio.GenBank (first line of code snippet).



From bugzilla-daemon at portal.open-bio.org  Fri Jan 16 14:12:28 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 16 Jan 2009 14:12:28 -0500
Subject: [Biopython-dev] [Bug 2734] db.load problem with postgresql and
	psycopg2
In-Reply-To: <bug-2734-42@http.bugzilla.open-bio.org/>
Message-ID: <200901161912.n0GJCSWO030831@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2734


------- Comment #2 from stephen at blackrim.net  2009-01-16 14:12 EST -------
Hi Peter,
Thanks for the quick reply. I will try to answer everything here. So I just
reran the BioSQL tests and I get 
test_BioSQL ... ok
test_BioSQL_SeqIO ... ok

so it seems like everything there is fine (and I did configure the test for
postgres with the psycopg2 driver). I am downloading from the NCBI ftp, and it
happens not only with all the files but also with the example on the Biopython
BioSQL wiki page. Specifically, with this example:
from Bio import Entrez
from Bio import SeqIO
from BioSQL import BioSeqDatabase
server = BioSeqDatabase.open_database(driver="psycopg2", ...)
db = server["plants"]
handle = Entrez.efetch(db="nuccore", id="6273291,6273290,6273289",
rettype="genbank")
db.load(SeqIO.parse(handle, "genbank"))
server.adaptor.commit()

I get the same error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Python/2.5/site-packages/BioSQL/BioSeqDatabase.py", line 420,
in load
    db_loader.load_seqrecord(cur_record)
  File "/Library/Python/2.5/site-packages/BioSQL/Loader.py", line 50, in
load_seqrecord
    self._load_seqfeature(seq_feature, seq_feature_num, bioentry_id)
  File "/Library/Python/2.5/site-packages/BioSQL/Loader.py", line 508, in
_load_seqfeature
    self._load_seqfeature_qualifiers(feature.qualifiers, seqfeature_id)
  File "/Library/Python/2.5/site-packages/BioSQL/Loader.py", line 607, in
_load_seqfeature_qualifiers
    seqfeature_id)
  File "/Library/Python/2.5/site-packages/BioSQL/Loader.py", line 645, in
_load_seqfeature_dbxref
    self._get_seqfeature_dbxref(seqfeature_id, dbxref_id, rank+1)
  File "/Library/Python/2.5/site-packages/BioSQL/Loader.py", line 679, in
_get_seqfeature_dbxref
    dbxref_id))
  File "/Library/Python/2.5/site-packages/BioSQL/BioSeqDatabase.py", line 295,
in execute_and_fetch_col0
    self.cursor.execute(sql, args or ())
psycopg2.ProgrammingError: column "3" does not exist
LINE 1: ...f_id FROM seqfeature_dbxref WHERE seqfeature_id = "3" AND db...

Thanks for any help. 
Stephen

(In reply to comment #1)
> Hi Stephen,
> 
> Does this happen for all the files you've tried, or just one or two?  If its
> the later it may be something funny about the file and how its been parsed. 
> I'm guessing you downloaded the GenBank files from
> ftp://ftp.ncbi.nih.gov/genbank/ so could you tell us one which is failing.
> 
> Have you tried running the Biopython unit tests - in particular the two for
> BioSQL?  I presume you installed Biopython from source on your Mac, so you
> should have all the files present.  You'll need to edit the file
> Tests/setup_BioSQL.py to point to a suitable postgresql test database.
> 
> P.S. As you are using Bio.SeqIO to parse the GenBank file, you don't need to
> import Bio.GenBank (first line of code snippet).
> 



From bugzilla-daemon at portal.open-bio.org  Sat Jan 17 05:09:21 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 17 Jan 2009 05:09:21 -0500
Subject: [Biopython-dev] [Bug 2734] db.load problem with postgresql and
	psycopg2
In-Reply-To: <bug-2734-42@http.bugzilla.open-bio.org/>
Message-ID: <200901171009.n0HA9Lk3027163@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2734


------- Comment #3 from cymon.cox at gmail.com  2009-01-17 05:09 EST -------
Hi Stephen,

2009/1/16  <bugzilla-daemon at portal.open-bio.org>:
> http://bugzilla.open-bio.org/show_bug.cgi?id=2734
>
> ------- Comment #2 from stephen at blackrim.net  2009-01-16 14:12 EST -------
> Hi Peter,
> Thanks for the quick reply. I will try to answer everything here. So I just
> reran the BioSQL tests and I get
> test_BioSQL ... ok
> test_BioSQL_SeqIO ... ok
>
> so seems like everything there is fine (and I did configure the test for
> postgres with the psycopg2 driver). I am downloading from the NCBI ftp and it
> happens not only with all the files but also with the example on the biopython
> biosql wiki page. Specifically with this example:
> from Bio import Entrez
> from Bio import SeqIO
> from BioSQL import BioSeqDatabase
> server = BioSeqDatabase.open_database(driver="psycopg2", ...)
> db = server["plants"]
> handle = Entrez.efetch(db="nuccore", id="6273291,6273290,6273289",
> rettype="genbank")
> db.load(SeqIO.parse(handle, "genbank"))
> server.adaptor.commit()

This code works for me:
[cymon at chara ~]$ python
Python 2.5.2 (r252:60911, Jul 24 2008, 17:11:36)
[GCC 4.1.2 20071124 (Red Hat 4.1.2-42)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from Bio import Entrez
>>> from Bio import SeqIO
>>> from BioSQL import BioSeqDatabase
>>> server = BioSeqDatabase.open_database(driver="psycopg2", db = "biosql_test")
>>> db = server.new_database("blah", description="Just for testing")
>>> handle = Entrez.efetch(db="nuccore", id="6273291,6273290,6273289", rettype="genbank")
>>> server.adaptor.commit()
>>>

What versions of biopython and the BioSQL schema are you using?

Cymon



From bugzilla-daemon at portal.open-bio.org  Sat Jan 17 05:50:19 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 17 Jan 2009 05:50:19 -0500
Subject: [Biopython-dev] [Bug 2734] db.load problem with postgresql and
	psycopg2
In-Reply-To: <bug-2734-42@http.bugzilla.open-bio.org/>
Message-ID: <200901171050.n0HAoJZa029834@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2734


------- Comment #4 from cymon.cox at gmail.com  2009-01-17 05:50 EST -------
> This code works form me:
> [cymon at chara ~]$ python
> Python 2.5.2 (r252:60911, Jul 24 2008, 17:11:36)
> [GCC 4.1.2 20071124 (Red Hat 4.1.2-42)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> from Bio import Entrez
> >>> from Bio import SeqIO
> >>> from BioSQL import BioSeqDatabase
> >>> server = BioSeqDatabase.open_database(driver="psycopg2", db = "biosql_test")
> >>> db = server.new_database("blah", description="Just for testing")
> >>> handle = Entrez.efetch(db="nuccore", id="6273291,6273290,6273289", rettype="genbank")
> >>> server.adaptor.commit()
> >>>

Sorry, I forgot to load it! :)

>>> db.load(SeqIO.parse(handle, "genbank"))
3
>>> server.adaptor.commit()
>>> 

C.



From bugzilla-daemon at portal.open-bio.org  Wed Jan 21 13:22:47 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 21 Jan 2009 13:22:47 -0500
Subject: [Biopython-dev] [Bug 2738] New: Speed up GenBank parsing,
	in particular location parsing
Message-ID: <bug-2738-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2738

           Summary: Speed up GenBank parsing, in particular location parsing
           Product: Biopython
           Version: Not Applicable
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk


This is an enhancement "bug", for trying to improve the speed of parsing
GenBank files WITHOUT any functionality changes.  From previous profiling, I
have found that the location parsing looks like an easy target.  However, this
code is non-trivial so we should proceed with caution.

Possible patch to follow...



From bugzilla-daemon at portal.open-bio.org  Wed Jan 21 13:30:27 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 21 Jan 2009 13:30:27 -0500
Subject: [Biopython-dev] [Bug 2738] Speed up GenBank parsing,
	in particular location parsing
In-Reply-To: <bug-2738-42@http.bugzilla.open-bio.org/>
Message-ID: <200901211830.n0LIURFx009561@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2738


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-21 13:30 EST -------
Created an attachment (id=1206)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1206&action=view)
Patch for Bio/GenBank/__init__.py to handle simple locations with re

This patch handles the simple cases (non-fuzzy, no database references) using
simple python and regular expressions.  Everything else works by falling back
on the old spark based Bio.GenBank.LocationParser code (e.g. fuzzy locations).

The new code is pretty simple, and could potentially be extended to cover all
the currently used location strings found in the feature table, allowing us to
remove the use of Bio.GenBank.LocationParser, which in the long term could
lead to an overall code simplification.

In the short term, this patch does complicate the location parsing because it
means there are effectively two ways we parse the location strings (my new
code, and the old spark based Bio.GenBank.LocationParser code).

However, from my limited testing using Python 2.5 on the Mac with GenBank files
for large bacterial genomes, this may be a price worth paying.  I'd like
independent measurements (and to check this on other platforms), but this does
seem to more than halve the time taken to parse GenBank files!
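
To illustrate the general idea (a fast path for simple locations, with a
fallback for everything else), here is a minimal sketch. It is not the
attached patch: the function names, the two regular expressions, and the
(start, end, strand) return value using the coordinates exactly as written in
the file are all assumptions made for this example, and the fallback argument
merely stands in for the existing spark-based parser.

```python
import re

# Fast path: only "123..456" and "complement(123..456)" are recognised here.
_PLAIN = re.compile(r"^(\d+)\.\.(\d+)$")
_COMPLEMENT = re.compile(r"^complement\((\d+)\.\.(\d+)\)$")


def parse_simple_location(text):
    """Return (start, end, strand) for a simple location, else None."""
    match = _PLAIN.match(text)
    strand = 1
    if match is None:
        match = _COMPLEMENT.match(text)
        strand = -1
    if match is None:
        # Fuzzy locations, joins, database references and so on.
        return None
    return int(match.group(1)), int(match.group(2)), strand


def parse_location(text, fallback):
    """Try the regular expressions first, then hand over to the slow parser."""
    result = parse_simple_location(text)
    if result is None:
        return fallback(text)
    return result
```

With this shape, a string such as "<1..>200" is not matched by either pattern
and is passed to the fallback unchanged.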



From bugzilla-daemon at portal.open-bio.org  Thu Jan 22 13:58:18 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 22 Jan 2009 13:58:18 -0500
Subject: [Biopython-dev] [Bug 2738] Speed up GenBank parsing,
	in particular location parsing
In-Reply-To: <bug-2738-42@http.bugzilla.open-bio.org/>
Message-ID: <200901221858.n0MIwIpR000974@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2738


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-22 13:58 EST -------
Created an attachment (id=1208)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1208&action=view)
Simple test script for timing GenBank parsing

I've attached a trivial script to time parsing all the GenBank files in a
directory to help anyone wanting to benchmark this change.
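
For anyone without access to the attachment, a timing helper along these
lines might look like the sketch below. It is not the attached script: the
function name, the "*.gb" file pattern, and the injectable parse argument are
assumptions made for this example, and it is written for current Python
rather than Python 2.5. When no parse function is given it uses
SeqIO.parse(handle, "genbank") from Biopython.

```python
import glob
import os
import time


def time_genbank_parsing(directory, parse=None, pattern="*.gb"):
    """Parse every matching file in directory; return (records, seconds)."""
    if parse is None:
        # Imported lazily so the helper can be exercised without Biopython.
        from Bio import SeqIO

        def parse(handle):
            return SeqIO.parse(handle, "genbank")

    count = 0
    start = time.perf_counter()
    for filename in sorted(glob.glob(os.path.join(directory, pattern))):
        with open(filename) as handle:
            # Iterate fully so that every record is actually parsed.
            for _record in parse(handle):
                count += 1
    return count, time.perf_counter() - start
```

Running it once with and once without the patch applied, on the same files,
gives the two numbers to compare.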

(In reply to comment #1)
> However, from my limited testing using Python 2.5 on the Mac with GenBank
> files for large bacterial genomes, this may be a price worth paying.  I'll
> like independent measurements (and to check this on other platforms), but
> this does seem to more than halve the time taken to parse GenBank files!

Further testing with Python 2.5 on Linux, this time also with some large
eukaryotic files, appears to confirm a very large speed up (most obvious on
feature rich GenBank files of course).

I still want to check this on other versions of python...



From bugzilla-daemon at portal.open-bio.org  Fri Jan 23 03:43:01 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 23 Jan 2009 03:43:01 -0500
Subject: [Biopython-dev] [Bug 2740] New: Wise test fails with wise 2.4.1
Message-ID: <bug-2740-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2740

           Summary: Wise test fails with wise 2.4.1
           Product: Biopython
           Version: 1.49
          Platform: Other
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Unit Tests
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: charles-debian-nospam at plessy.org


Dear Biopython developers,

The test for wise fails with wise 2.4.1 and Biopython 1.49. I think one gap is
missing in the reference used in the test script (probably because wise changed
its gap opening penalties):

anx159???Tests???$ dnal Wise/human_114_g01_exons.fna_01
Wise/human_114_g02_exons.fna_01 
Warning Error
        Strangely truncated line in fasta file
Warning Error
        Strangely truncated line in fasta file
DnaAlign Matrix calculation: [  14000] Cells 95%
Score 114
Warning Error
        Name ENSG00000172056|ENST00000321078|ENSE00001281503 is longer than
allowed name block (12). Truncating

Warning Error
        Name ENSG00000163182|ENST00000295339|ENSE00001130648 is longer than
allowed name block (12). Truncating

ENSG00000172135   AGGGAAAGCCCCTAAGCTC--CTGATCTATGCTGCATCCAGTTTGCAAAG-TGGGGTCC 
                  A GGAA GCCCC  AGCTC  CT  TCT   CT C TCC    TGC A   TGG  TCC 
ENSG000001631     ATGGAA-GCCCC--AGCTCAGCT--TCT---CTTCCTCC----TGCTACTCTGGC-TCC 


ENSG00000172191   CA                                                          
                  CA                                                          
ENSG0000016347    CA         


This is compared to a different reference result in the test script:

anx159???Tests???$ grep -A5 -B5 ENSG00000172135 test_Wise.py 
        sys.stdout = self.old_stdout

class TestWise(unittest.TestCase):
    def test_align(self):
        temp_file = Wise.align(["dnal"], ("Wise/human_114_g01_exons.fna_01",
"Wise/human_114_g02_exons.fna_01"), kbyte=100000, force_type="DNA", quiet=True)
        self.assertEqual(temp_file.readline().rstrip(), "ENSG00000172135  
AGGGAAAGCCCCTAAGCTC--CTGATCTATGCTGCATCCAGTTTGCAAAGTGGGGTCCC")

def run_tests(argv):
    test_suite = testing_suite()
    runner = unittest.TextTestRunner(sys.stdout, verbosity = 2)
    runner.run(test_suite)

Have a nice day,

-- 
Charles Plessy
Debian Med packaging team,
http://www.debian.org/devel/debian-med
Tsurumi, Kanagawa, Japan



From bugzilla-daemon at portal.open-bio.org  Fri Jan 23 07:06:29 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 23 Jan 2009 07:06:29 -0500
Subject: [Biopython-dev] [Bug 2740] Wise test fails with wise 2.4.1
In-Reply-To: <bug-2740-42@http.bugzilla.open-bio.org/>
Message-ID: <200901231206.n0NC6T4B023669@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2740


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-23 07:06 EST -------
Thanks for the report.  Based on the following pages I had assumed the latest
version was wise 2.2.0, available here:

http://www.sanger.ac.uk/Software/Wise2/ points to
ftp://ftp.ebi.ac.uk/pub/software/unix/wise2/ which only contains up to wise
2.2.0

After some Google searching I found Ewan Birney had changed his mind and started
work on it again:
http://www.ebi.ac.uk/~birney/wise2/

Installing wise 2.4.1 took a while (tip for Linux users: edit file
src/models/phasemodel.c line 23 to replace isnumber by isdigit), but I can
confirm the error you reported.

This is the output from an older version of wise,

$ ~/Downloads/wise2.2.0/src/bin/dnal Wise/human_114_g01_exons.fna_01
Wise/human_114_g02_exons.fna_01 
DnaAlign Matrix calculation: [  14000] Cells 97%
Warning Error
        Name ENSG00000172056|ENST00000321078|ENSE00001281503 is longer than
allowed name block (12). Truncating

Warning Error
        Name ENSG00000163182|ENST00000295339|ENSE00001130648 is longer than
allowed name block (12). Truncating

ENSG00000172135   AGGGAAAGCCCCTAAGCTC--CTGATCTATGCTGCATCCAGTTTGCAAAGTGGGGTCCC 
                  A GGAA GCCCC  AGCTC  CT  TCT   CT C TCC    TGC A    GG TCCC 
ENSG000001631     ATGGAA-GCCCC--AGCTCAGCT--TCT---CTTCCTCC----TGCTACTCTGGCTCCC 


ENSG00000172192   A                                                           
                  A                                                           
ENSG0000016348    A                                                           


Using the newer version of wise, we do indeed get a different alignment:

$ ~/Downloads/wise2.4.1/src/bin/dnal Wise/human_114_g01_exons.fna_01
Wise/human_114_g02_exons.fna_01 
DnaAlign Matrix calculation: [  14000] Cells 97%
Score 114
Warning Error
        Name ENSG00000172056|ENST00000321078|ENSE00001281503 is longer than
allowed name block (12). Truncating

Warning Error
        Name ENSG00000163182|ENST00000295339|ENSE00001130648 is longer than
allowed name block (12). Truncating

ENSG00000172135   AGGGAAAGCCCCTAAGCTC--CTGATCTATGCTGCATCCAGTTTGCAAAG-TGGGGTCC 
                  A GGAA GCCCC  AGCTC  CT  TCT   CT C TCC    TGC A   TGG  TCC 
ENSG000001631     ATGGAA-GCCCC--AGCTCAGCT--TCT---CTTCCTCC----TGCTACTCTGGC-TCC 


ENSG00000172191   CA                                                          
                  CA                                                          
ENSG0000016347    CA 



From bugzilla-daemon at portal.open-bio.org  Fri Jan 23 07:28:05 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 23 Jan 2009 07:28:05 -0500
Subject: [Biopython-dev] [Bug 2740] Wise test fails with wise 2.4.1
In-Reply-To: <bug-2740-42@http.bugzilla.open-bio.org/>
Message-ID: <200901231228.n0NCS5a8028823@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2740


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-23 07:28 EST -------
This should be fixed in CVS, see:

Tests/test_Wise.py revision 1.7
Tests/output/test_Wise revision 1.3

All I have done is make the unit test accept the old output, or the slightly
different output from wise 2.4.1 - the main Biopython code is unchanged.

From the help text (just run dnal with no arguments), it appears the gap
penalties have not changed - so the differing alignments must be an algorithm
change of some sort.

Another small difference is with wise 2.4.1, even in quiet mode, dnal starts
its output by printing the score.
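
As an illustration of the approach (not the actual contents of
Tests/test_Wise.py revision 1.7), a check that tolerates both outputs might
look like this sketch. The helper names are hypothetical; the two alignment
lines are copied from the dnal output quoted earlier on this bug, and a
leading "Score" line is skipped if present.

```python
# First alignment line as printed by wise 2.2.0 and by wise 2.4.1 respectively.
OLD_ALIGNMENT = (
    "ENSG00000172135",
    "AGGGAAAGCCCCTAAGCTC--CTGATCTATGCTGCATCCAGTTTGCAAAGTGGGGTCCC",
)
NEW_ALIGNMENT = (
    "ENSG00000172135",
    "AGGGAAAGCCCCTAAGCTC--CTGATCTATGCTGCATCCAGTTTGCAAAG-TGGGGTCC",
)


def first_alignment_line(lines):
    """Return the first non-blank line that is not the 'Score ...' line."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("Score"):
            continue
        return line
    return ""


def is_expected_alignment(lines):
    """True if the first alignment line matches either known wise output."""
    # Splitting on whitespace makes the check independent of column padding.
    fields = tuple(first_alignment_line(lines).split())
    return fields in (OLD_ALIGNMENT, NEW_ALIGNMENT)
```

In a unittest method this could be used as
self.assertTrue(is_expected_alignment(temp_file.readlines())).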

Thank you for reporting this,

Peter



From bugzilla-daemon at portal.open-bio.org  Wed Jan 28 05:13:43 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 28 Jan 2009 05:13:43 -0500
Subject: [Biopython-dev] [Bug 2743] New: manual installation overwrites
	previous biopython installations
Message-ID: <bug-2743-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2743

           Summary: manual installation overwrites previous biopython
                    installations
           Product: Biopython
           Version: Not Applicable
          Platform: All
               URL: http://lists.open-bio.org/pipermail/biopython/2009-
                    January/004893.html
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: dalloliogm at gmail.com


The manual biopython installation (the one made with python setup.py install)
installs all the files in a directory like this:
- /usr/lib/python2.5/site-packages/Bio

The problem comes when you want to install biopython in a system where there is
already an old version installed.
In that case, it is not clear what happens to the old installation... are all
the old files removed before the new version is installed? Or are the two
versions 'mixed'?

please refer to this discussion:
- http://lists.open-bio.org/pipermail/biopython/2009-January/004893.html



From bugzilla-daemon at portal.open-bio.org  Wed Jan 28 06:05:07 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 28 Jan 2009 06:05:07 -0500
Subject: [Biopython-dev] [Bug 2743] manual installation overwrites previous
	biopython installations
In-Reply-To: <bug-2743-42@http.bugzilla.open-bio.org/>
Message-ID: <200901281105.n0SB577F013398@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2743


------- Comment #1 from mdehoon at ims.u-tokyo.ac.jp  2009-01-28 06:05 EST -------
(In reply to comment #0)
> The manual biopython installation (the one made with python setup.py install)
> installs all the files in a directory like this:
> - /usr/lib/python2.5/site-packages/Bio
> 
> The problem comes when you want to install biopython in a system where there is
> already an old version installed.
> In that case, it is not clear what happens to the old installation... are all
> the old files removed before the new version is installed? Or are the two
> versions 'mixed'?

Isn't this what always happens when installing a Python module? If so, then it
doesn't seem to be a Biopython bug to me.



From bugzilla-daemon at portal.open-bio.org  Wed Jan 28 06:14:28 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 28 Jan 2009 06:14:28 -0500
Subject: [Biopython-dev] [Bug 2743] manual installation overwrites previous
	biopython installations
In-Reply-To: <bug-2743-42@http.bugzilla.open-bio.org/>
Message-ID: <200901281114.n0SBESYY014510@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2743


------- Comment #2 from dalloliogm at gmail.com  2009-01-28 06:14 EST -------
(In reply to comment #1)
> (In reply to comment #0)
> > The manual biopython installation (the one made with python setup.py install)
> > installs all the files in a directory like this:
> > - /usr/lib/python2.5/site-packages/Bio
> > 
> > The problem comes when you want to install biopython in a system where there is
> > already an old version installed.
> > In that case, it is not clear what happens to the old installation... are all
> > the old files removed before the new version is installed? Or are the two
> > versions 'mixed'?
> 
> Isn't this what always happens when installing a Python module? If so, then it
> doesn't seem to be a Biopython bug to me.


Well, I don't know if it is the same behaviour for the other python modules,
but it can create dangerous situations, especially if you are 'downgrading' a
biopython installation.
The biopython installer should clarify that, asking the user if he wants to
overwrite the existing installation, change the installation path, or abort.


Anyway, the right way to install biopython should be by using easy_install.
Easy_install downloads the latest code and creates an egg, and then installs
everything in a directory like this:
- /usr/lib/python2.5/site-packages/biopython-1.49-py2.5-linux-i686.egg/
automatically adding it to the Python search path.

I suggest changing the Biopython wiki to tell people that they should always
prefer to install biopython with easy_install, which by the way works perfectly
and automatically checks the dependencies.



From bugzilla-daemon at portal.open-bio.org  Wed Jan 28 07:46:37 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 28 Jan 2009 07:46:37 -0500
Subject: [Biopython-dev] [Bug 2743] manual installation overwrites previous
	biopython installations
In-Reply-To: <bug-2743-42@http.bugzilla.open-bio.org/>
Message-ID: <200901281246.n0SCkbKj028750@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2743


------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-28 07:46 EST -------
(In reply to comment #1)
> > the old files removed before the new version is installed? Or are the two
> > versions 'mixed'?
> 
> Isn't this what always happens when installing a Python module? If so, then it
> doesn't seem to be a Biopython bug to me.

Agreed.  As far as I know, this affects ANY python module installed with
distutils - and indeed this is typical practice for ANY unix tool installed
from source via a make file.  It is essentially NORMAL, although not so nice
for beginners.

Linux distributions will often provide packaged versions of python libraries
(including Biopython) which you can install/update/remove using the system's
package manager (e.g. apt, yum, up2date etc).  The only downside to me is they
won't always have the latest version of each package.

I suppose we could add a hack to setup.py to check if there is already a
Biopython installation present (try doing "import Bio"), and if it is
installed, ask the user if they want to continue.  However, there are
legitimate situations where this just makes things more confusing.  e.g. You
don't have admin rights on a unix machine where your systems administrator has
provided python and an old version of Biopython, so you want to install the
latest version of Biopython under your home directory.
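
The check suggested above could be sketched roughly as follows.  This is only
an illustration, not code from Biopython's setup.py: the helper names
find_existing_install and confirm_overwrite are hypothetical, and it uses the
modern importlib.util.find_spec instead of a bare "import Bio".  As noted, it
would still get in the way in the home directory case.

```python
import importlib.util


def find_existing_install(package="Bio"):
    """Return the directory or file of an importable package, or None."""
    try:
        spec = importlib.util.find_spec(package)
    except (ImportError, ValueError):
        return None
    if spec is None:
        return None
    locations = spec.submodule_search_locations
    if locations:
        # A package: report the first directory it would be loaded from.
        return list(locations)[0]
    # A single-file module: report the file itself.
    return spec.origin


def confirm_overwrite(existing_path, ask=input):
    """Ask before installing over an existing copy; True means continue."""
    if existing_path is None:
        return True
    answer = ask("Found an existing install at %s. Continue? [y/N] "
                 % existing_path)
    return answer.strip().lower() in ("y", "yes")
```

The ask argument is there only so the prompt can be replaced in a
non-interactive setting.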

(In reply to comment #2)
> I suggest to change the biopython's wiki to tell people that they should
> always prefer to install biopython with easy_install, which by the way works
> perfectly and automatically checks the dependencies.

For now distutils is still the python standard, while easy_install is a
non-standard optional extra.  This means that in some ways using easy_install
is more work.

Note that easy_install doesn't provide a simple uninstall either:
http://peak.telecommunity.com/DevCenter/EasyInstall#uninstalling-packages



From bugzilla-daemon at portal.open-bio.org  Wed Jan 28 10:23:48 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 28 Jan 2009 10:23:48 -0500
Subject: [Biopython-dev] [Bug 2743] manual installation overwrites previous
	biopython installations
In-Reply-To: <bug-2743-42@http.bugzilla.open-bio.org/>
Message-ID: <200901281523.n0SFNmqQ013945@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2743


------- Comment #4 from bsouthey at gmail.com  2009-01-28 10:23 EST -------
(In reply to comment #3)
> (In reply to comment #1)
> > > the old files removed before the new version is installed? Or are the two
> > > versions 'mixed'?
> > 
> > Isn't this what always happens when installing a Python module? If so, then it
> > doesn't seem to be a Biopython bug to me.
> 
> Agreed.  As far as I know, this affects ANY python module installed with
> distutils - and indeed this is typical practice for ANY unix tool installed
> from source via a make file.  It is essentially NORMAL, although not so nice
> for beginners.
> 

Agreed that this is not a Biopython bug but a Python feature.

Yes, the installation is usually 'mixed' when installing from source. The setup
will remove the existing egg-info and then install a new one. Python copies the files
to the appropriate place thus overwriting any old files with new versions but
old files that are no longer present or files with different names will remain.
To my knowledge, Python and Biopython will not know about those files unless a
user explicitly tries to use them. 
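
To see which leftover files such a 'mixed' installation would contain, one
could compare the installed tree with the new one.  A minimal sketch, assuming
both trees are available as plain directories; the function names are made up
for illustration and nothing like this exists in distutils or Biopython:

```python
import os


def relative_files(root):
    """Collect file paths under root, relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.add(os.path.relpath(full, root))
    return found


def stale_files(installed_dir, new_dir):
    """Files present in the installed tree but absent from the new tree."""
    return sorted(relative_files(installed_dir) - relative_files(new_dir))
```

Note that compiled .pyc files in the installed tree would also be listed,
since a fresh source tree does not contain them.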

Bruce



From bugzilla-daemon at portal.open-bio.org  Thu Jan 29 12:41:19 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 29 Jan 2009 12:41:19 -0500
Subject: [Biopython-dev] [Bug 2738] Speed up GenBank parsing,
	in particular location parsing
In-Reply-To: <bug-2738-42@http.bugzilla.open-bio.org/>
Message-ID: <200901291741.n0THfJYC018518@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2738


------- Comment #3 from bsouthey at gmail.com  2009-01-29 12:41 EST -------
First, I object to this patch because it replaces the current version without
keeping the old code. It should create a new parsing function so we can verify that
the old and new versions provide exactly the same output for the same input. 

As indicated below, it does speed things up! So I have no problems for it to
replace the current parsing code in the next release provided that the old
parsing code remains as a deprecated function. (Alternatively add a conditional
statement with a flag to avoid this new code as required.) 

(In reply to comment #2)
> Created an attachment (id=1208)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1208&action=view) [details]
> Simple test script for timing GenBank parsing
> 
> I've attached a trivial script to time parsing all the GenBank files in 
> directory to help anyone wanting to benchmark this change.
> 
> (In reply to comment #1)
> > However, from my limited testing using Python 2.5 on the Mac with GenBank
> > files for large bacterial genomes, this may be a price worth paying.  I'll
> > like independent measurements (and to check this on other platforms), but
> > this does seem to more than halve the time taken to parse GenBank files!
> 
> Further testing with Python 2.5 on Linux, this time also with some large
> Eurakyotics files, appears to confirm a very large speed up (most obvious on
> feature rich GenBank files of course).
> 
> I still want to check this on other versions of python...
> 

I ran the script against the patched Biopython on Linux under Python versions
2.3, 2.4, 2.5 and 2.6, and noted that the patch halved the time required to
parse a GenBank Incremental Update file (an update from Jan 2009: nc0101.flat,
size 573 MB, with 213942 records and total length 158245604 bp).

While the number of records and sequences are the same, I have not checked if
the patched version is providing exactly the same output as the unpatched
version. This is very important for the different types of GenBank files (Whole
Genome Shotgun and CON types).



From bugzilla-daemon at portal.open-bio.org  Thu Jan 29 12:57:22 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 29 Jan 2009 12:57:22 -0500
Subject: [Biopython-dev] [Bug 2738] Speed up GenBank parsing,
	in particular location parsing
In-Reply-To: <bug-2738-42@http.bugzilla.open-bio.org/>
Message-ID: <200901291757.n0THvMVl023111@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2738


------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-29 12:57 EST -------
(In reply to comment #3)
> First, I object to this patch because it replaces the current version without
> keeping the old code.

It does keep the old code, and explicitly uses the old code for the non-simple
locations.
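
The general idea (a regular expression for the simple cases, the old parser
for everything else) can be sketched like this.  It is not the patch itself:
the tuple return value, the two patterns and the slow_parser argument are
assumptions made for the example, and the coordinates are returned exactly as
written in the file.

```python
import re

# Matches only the simplest location form, e.g. "123..456".
_SIMPLE_LOCATION = re.compile(r"^(\d+)\.\.(\d+)$")
# The same form wrapped in complement(...).
_COMPLEMENT_LOCATION = re.compile(r"^complement\((\d+)\.\.(\d+)\)$")


def parse_location(text, slow_parser):
    """Return (start, end, strand), using slow_parser for anything complex."""
    match = _SIMPLE_LOCATION.match(text)
    if match:
        return int(match.group(1)), int(match.group(2)), 1
    match = _COMPLEMENT_LOCATION.match(text)
    if match:
        return int(match.group(1)), int(match.group(2)), -1
    # Joins, fuzzy positions, remote references and so on take the old route.
    return slow_parser(text)
```

Because anything the patterns do not match is handed on unchanged, the fast
path can only affect the location forms it explicitly recognises.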

> It should create a new parsing function so verify that
> the old and new versions provide exactly the same output for the same input. 

We should probably extend the Biopython GenBank/EMBL parsing unit tests to make
sure this patch doesn't break anything, and additionally have some extra test
cases using big GenBank files which won't become official unit tests.  This
could be as simple as a script which parses all the records in a set of GenBank
files, printing out a very minimal summary of each feature location (including
subfeatures).  We then run the script with and without the patch, and confirm
their output matches.
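
Such a summary could look something like the sketch below.  It is written
against plain objects and assumes each record has .id and .features, and each
feature has .type, .location and (optionally) .sub_features; the function
names are hypothetical.  In practice the records would come from
SeqIO.parse(handle, "genbank").

```python
def summarise_feature(feature, depth=0):
    """Yield one line for a feature, then one per nested subfeature."""
    yield "%s%s %s" % ("  " * depth, feature.type, feature.location)
    for sub in getattr(feature, "sub_features", None) or []:
        for line in summarise_feature(sub, depth + 1):
            yield line


def summarise_record(record):
    """Yield the record id followed by a summary line for every feature."""
    yield record.id
    for feature in record.features:
        for line in summarise_feature(feature, depth=1):
            yield line
```

Printing these lines for every record, once with and once without the patch,
gives two text files that can be compared directly.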

Once we are happy that the patch doesn't change the parser behaviour, I don't
see any reason to offer both options to the end user.  In fact, I would prefer
to go further and REMOVE the old slow location parser after extending the
regular expression based parser to cope with ALL location variants.

> As indicated below, it does speed things up! So I have no problems for it to
> replace the current parsing code in the next release provided that the old
> parsing code remains as depreciated function. (Alternatively add a conditional
> statement with a flag to avoid this new code as required.) 

Having the new code controlled by some option would actually be pretty easy. 
Other than for testing I see no reason to do this.

> I ran the script on patched version of Linux Python (versions 2.3, 2.4, 2.5
> and 2.6) and noted that this halved the time required to parse a Genbank
> Incremental Update file (an update from Jan 2009: nc0101.flat size 573 mb)
> with 213942 records with total length 158245604 bp). 

That is consistent with the speed ups I have seen - you can get even more
depending on the proportion of features in the file.  Thanks for checking
python 2.3 to 2.6, nice to see they all benefit.

> While the number of records and sequences are the same, I have not checked if
> the patched version is providing exactly the same output as the unpatched
> version. This is very important for the different types of GenBank files
> (Whole Genome Shotgun and CON types).

I agree thorough testing is important here.  Would you like to suggest any
particular WGS or CON files for testing with?  I'm thinking something large
with a wide range of location types would be good for checking this patch (but
not to include with Biopython).

Peter



From bugzilla-daemon at portal.open-bio.org  Thu Jan 29 13:26:09 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 29 Jan 2009 13:26:09 -0500
Subject: [Biopython-dev] [Bug 2738] Speed up GenBank parsing,
	in particular location parsing
In-Reply-To: <bug-2738-42@http.bugzilla.open-bio.org/>
Message-ID: <200901291826.n0TIQ9YR030903@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2738


------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-29 13:26 EST -------
Created an attachment (id=1209)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1209&action=view)
Simple test script for checking GenBank location parsing

This is a simple script to help validate the location parsing has not changed. 
Intended usage is to put the script in a directory with a good set of test
GenBank files (all ending with the extension .gbk), then:

(starting with a clean install of Biopython)

$ time python parse_gbk_locs.py > old.txt

(apply the patch)

$ time python parse_gbk_locs.py > new.txt

(verify the output matches)

$ ls -l old.txt new.txt

(check file sizes agree)

$ diff old.txt new.txt

(should be no output)
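
On a machine without the diff tool, the same comparison can be done from
Python.  A minimal sketch using the standard library difflib module; the
function name compare_outputs is made up, and the file names are whatever the
two runs were redirected to:

```python
import difflib


def compare_outputs(old_path, new_path):
    """Return unified-diff lines between two text files (empty if identical)."""
    with open(old_path) as handle:
        old_lines = handle.readlines()
    with open(new_path) as handle:
        new_lines = handle.readlines()
    return list(difflib.unified_diff(old_lines, new_lines,
                                     fromfile=old_path, tofile=new_path))
```

An empty list from compare_outputs("old.txt", "new.txt") corresponds to diff
printing nothing.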



From bugzilla-daemon at portal.open-bio.org  Thu Jan 29 14:38:20 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 29 Jan 2009 14:38:20 -0500
Subject: [Biopython-dev] [Bug 2738] Speed up GenBank parsing,
	in particular location parsing
In-Reply-To: <bug-2738-42@http.bugzilla.open-bio.org/>
Message-ID: <200901291938.n0TJcKh2021246@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2738


------- Comment #6 from bsouthey at gmail.com  2009-01-29 14:38 EST -------
Created an attachment (id=1210)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1210&action=view)
Single test case that is not correctly parsed

I just used a simple 'print record' followed by a diff (but that does not check
the references). This record (and related ones) has a difference between
versions ...



From bugzilla-daemon at portal.open-bio.org  Thu Jan 29 16:13:19 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 29 Jan 2009 16:13:19 -0500
Subject: [Biopython-dev] [Bug 2738] Speed up GenBank parsing,
	in particular location parsing
In-Reply-To: <bug-2738-42@http.bugzilla.open-bio.org/>
Message-ID: <200901292113.n0TLDJ51019466@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2738


------- Comment #7 from bsouthey at gmail.com  2009-01-29 16:13 EST -------
(In reply to comment #4)
> > While the number of records and sequences are the same, I have not checked if
> > the patched version is providing exactly the same output as the unpatched
> > version. This is very important for the different types of GenBank files
> > (Whole Genome Shotgun and CON types).
> 
> I agree through testing is important here.  Would you like to suggest any
> particular WGS or CON files for testing with? 

I downloaded a few example files including WGS and CON. I found that CON files
are not parsed by either version. Not a surprise given that these have no
sequences but that is a different topic. Apart from the errors in the attached
case, I have not seen any other errors (even parsing the references).

Bruce



From bugzilla-daemon at portal.open-bio.org  Fri Jan 30 06:00:24 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 30 Jan 2009 06:00:24 -0500
Subject: [Biopython-dev] [Bug 2738] Speed up GenBank parsing,
	in particular location parsing
In-Reply-To: <bug-2738-42@http.bugzilla.open-bio.org/>
Message-ID: <200901301100.n0UB0OsD002442@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2738


------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-30 06:00 EST -------
(In reply to comment #6)
> Created an attachment (id=1210)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1210&action=view) [details]
> Single test case that is not correctly parsed
> 
> I just used a simple 'print record' followed by a diff (but that does not
> check the references). This record (and related ones) has a difference
> between versions ...

If you do a 'print record' with a SeqRecord object, any references are shown
using their __repr__ string - which is currently the python object default
which includes a memory address (something I've been meaning to address on Bug
2544).  Different objects will have different memory locations, which will show
up in the diff.

For example, using the following as a simple test script and capturing its
output to files:

from Bio import SeqIO
record = SeqIO.read(open("CY029873.gbk"), "genbank")
print record

Running diff with and without the patch gave me:

9c9
< /references=[<Bio.SeqFeature.Reference instance at 0xb7b7bfcc>, <Bio.SeqFeature.Reference instance at 0xb7b8412c>]
---
> /references=[<Bio.SeqFeature.Reference instance at 0x866b04c>, <Bio.SeqFeature.Reference instance at 0x866b18c>]

i.e. No real differences between the records as far as I can see.  Please
clarify - if you have found a failing example I would be most interested.
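
One way to keep such memory addresses out of a diff is to mask them before
comparing.  A small sketch, assuming the default "<... instance at 0x...>"
style of repr shown above; mask_addresses is a hypothetical helper, not part
of Biopython:

```python
import re

# Matches the " at 0x...>" tail of a default Python object repr.
_ADDRESS = re.compile(r" at 0x[0-9a-fA-F]+>")


def mask_addresses(text):
    """Replace object memory addresses in default reprs with a fixed token."""
    return _ADDRESS.sub(" at 0x...>", text)
```

Applied to both captured outputs, the two /references= lines above become
identical and drop out of the diff.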

(In reply to comment #7)
> I downloaded a few example files including WGS and CON. I found that CON files
> are not parsed by either version. Not a surprise given that these have no
> sequences but that is a different topic. Apart from the errors in attached
> case, I have not seen any other errors (even parsing the references).

Could you clarify your problem with the CON files please (on a new bug, or the
mailing list - since as you point out this is a different topic).  I've just
downloaded and unzipped one of the smaller CON files and it parses fine for me:
ftp://ftp.ncbi.nih.gov/genbank/gbcon107.seq.gz

>>> from Bio import SeqIO
>>> count = 0
>>> for record in SeqIO.parse(open("gbcon107.seq"),"genbank") : count += 1
...
>>> print count
55031

As expected there is no sequence, but the name, description, features,
references etc are there.



From bugzilla-daemon at portal.open-bio.org  Fri Jan 30 06:29:07 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 30 Jan 2009 06:29:07 -0500
Subject: [Biopython-dev] [Bug 2738] Speed up GenBank parsing,
	in particular location parsing
In-Reply-To: <bug-2738-42@http.bugzilla.open-bio.org/>
Message-ID: <200901301129.n0UBT7Ah008213@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2738


------- Comment #9 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-30 06:29 EST -------
I've run my test script (attachment 1209) on a Linux machine with Python 2.5

 5.5K Jan 30 10:29 CY029873.gbk
  67M Jan 22 17:53 dr_ref_chr16.gbk
  42M Jan 22 17:53 NC_003075.gbk
  14M Jan 22 18:43 NC_003272.gbk
  25M Jan 22 17:52 NC_003279.gbk
 4.8M Jan 22 18:44 NC_004350.gbk
  20M Jan 22 18:42 NC_008095.gbk
  14M Jan 22 18:44 NC_009925.gbk
  18M Jan 22 18:43 NC_010628.gbk
 296M Jan 22 17:52 ptr_ref_chr1.gbk
  86M Jan 30 10:55 wgs.AAAB.1.gnp.gbk
 297M Jan 30 10:55 wgs.AABR.10.gbff.gbk

The last two files are WGS data for protein and nucleotide sequences,
downloaded from ftp://ftp.ncbi.nih.gov/genbank/wgs/ then unzipped and a gbk
extension added so my script parses them.

With and without the patch the test script gives identical output - which
appears to confirm the location parsing is not functionally altered.  The
timings were just over 2 min with the patch and just over 8 min without it (a
four-fold speed up on this dataset).
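
For anyone wanting to repeat the timing from within Python instead of using
the shell's time command, something along these lines would do.  It is a
sketch only: time_parser and speed_up are hypothetical names, and parse_file
stands for whatever function parses one file (for example a loop over
SeqIO.parse).

```python
import glob
import os
import time


def time_parser(directory, parse_file, pattern="*.gbk"):
    """Run parse_file on each matching file; return (file count, seconds)."""
    paths = sorted(glob.glob(os.path.join(directory, pattern)))
    start = time.perf_counter()
    for path in paths:
        parse_file(path)
    return len(paths), time.perf_counter() - start


def speed_up(old_seconds, new_seconds):
    """Ratio of the old timing to the new one, e.g. 8 min vs 2 min gives 4.0."""
    return old_seconds / new_seconds
```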



From bugzilla-daemon at portal.open-bio.org  Fri Jan 30 06:30:30 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 30 Jan 2009 06:30:30 -0500
Subject: [Biopython-dev] [Bug 2649] Bio.KDTree expects numpy array with
	dtype="float32" on 64 bit machines.
In-Reply-To: <bug-2649-42@http.bugzilla.open-bio.org/>
Message-ID: <200901301130.n0UBUUMm008550@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2649


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-30 06:30 EST -------
Marking as fixed - please reopen this if need be.



From bugzilla-daemon at portal.open-bio.org  Fri Jan 30 06:54:26 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 30 Jan 2009 06:54:26 -0500
Subject: [Biopython-dev] [Bug 2639] SeqRecord.init doesn't check for
	arguments for their types
In-Reply-To: <bug-2639-42@http.bugzilla.open-bio.org/>
Message-ID: <200901301154.n0UBsQbw014456@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2639


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|                            |FIXED


------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-30 06:54 EST -------
(In reply to comment #5)
> Ok, understood. I didn't thought of these cases.
> However, having not a Seq causes errors that are difficult to
> understand in other functions that use SeqRecord.
> For example, if you do:
> 
> >>> a = SeqRecord(id = '1')
> >>> a.format('fasta')
> 
> you get the error: 
> <type 'exceptions.AttributeError'>: 'NoneType' object has no attribute
> 'tostring'
> 
> This could scary an eventual biopython newbie, an exception like to
> 'error - current SeqRecord object doesn't have a Seq' could be better.

Well, if you want to create a SeqRecord where the sequence is None, you'd have
to do SeqRecord(None, id="1") - your suggestion of SeqRecord(id="1") doesn't
work as the sequence is a mandatory argument.

However, I see your point that the current AttributeError isn't helpful in this
special case.  I've updated the Bio/SeqIO/FastaIO.py file in CVS (revision
1.15) to give a TypeError in this situation which will try to explain the
problem.
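
The kind of check involved can be sketched as below.  This is not the code
from revision 1.15: the function name and the wording of the message are
invented for illustration, and the record is only assumed to have .id and
.seq attributes.

```python
def fasta_sequence_string(record):
    """Return the record's sequence as a string, or raise a clear TypeError."""
    if record.seq is None:
        # Fail early with an explanation instead of an AttributeError later.
        raise TypeError("SeqRecord (id=%s) has no sequence (seq is None), "
                        "so it cannot be written as FASTA" % record.id)
    return str(record.seq)
```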

> What do you think about creating a 'NullSeq' object, which represent a
> Seq with no value, and using it as a default for SeqRecord?
> Later we could modify the other functions like .format and Seq.translate to
> intercept these objects and return the right error message.

Hmm.  It seems rather complicated for a rare case.  Using None to mean
"missing" or "null" is done in other python libraries/code (e.g. database
access), which is why I suggested someone might want to do this.

Marking this bug as fixed.



From bugzilla-daemon at portal.open-bio.org  Fri Jan 30 07:00:19 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 30 Jan 2009 07:00:19 -0500
Subject: [Biopython-dev] [Bug 2734] db.load problem with postgresql and
	psycopg2
In-Reply-To: <bug-2734-42@http.bugzilla.open-bio.org/>
Message-ID: <200901301200.n0UC0JcD016114@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2734


------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-30 07:00 EST -------
(In reply to comment #3)
> 
> What versions of biopython and the BioSQL schema are you using?
> 
> Cymon

According to the bug report, Stephen was using Biopython 1.49, so:

Stephen:
Biopython 1.49
postgresql 8.2 
BioSQL - schema version unspecified
psycopg2 - version unspecified
python - version unspecified
OS - Mac OS X

What about you Cymon - you have postgresql with psycopg2 working, but what
versions of things?

Peter



From bugzilla-daemon at portal.open-bio.org  Fri Jan 30 07:13:52 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 30 Jan 2009 07:13:52 -0500
Subject: [Biopython-dev] [Bug 2723] Minor corrections to the installation
	document
In-Reply-To: <bug-2723-42@http.bugzilla.open-bio.org/>
Message-ID: <200901301213.n0UCDqef019147@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2723


------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-30 07:13 EST -------
(In reply to comment #2)
> I'm leaving this bug open until I've updated the HTML and PDF copies of the
> installation document on the website.  I don't have the tools hevea installed
> on this machine, so I can't create the HTML version of the installation
> document -- just the PDF.  I should be be able to do this next week...

Website updated.  Marking this bug as fixed. 



From bugzilla-daemon at portal.open-bio.org  Fri Jan 30 07:20:06 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 30 Jan 2009 07:20:06 -0500
Subject: [Biopython-dev] [Bug 2734] db.load problem with postgresql and
	psycopg2
In-Reply-To: <bug-2734-42@http.bugzilla.open-bio.org/>
Message-ID: <200901301220.n0UCK6Fp020687@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2734


------- Comment #6 from cymon.cox at gmail.com  2009-01-30 07:20 EST -------
(In reply to comment #5)
> (In reply to comment #3)
> > 
> > What versions of biopython and the BioSQL schema are you using?
> > 
> > Cymon
> 
> According to the bug report, Stephen was using Biopython 1.49, so:
> 
> Stephen:
> Biopython 1.49
> postgresql 8.2 
> BioSQL - schema version unspecified
> psycopg2 - version unspecified
> python - version unspecified
> OS - Mac OS X
> 
> What about you Cymon - you have postgresql with psycopg2 working, but what
> versions of things?
> 
> Peter
> 

Peter,

I'm using:
Biopython: CVS
PostgreSQL: 8.1.11
BioSQL: 1.0.1
Python: 2.5.2
Psycopg: 2.0.8 
OS: Red Hat Enterprise 5.3

C.



From bugzilla-daemon at portal.open-bio.org  Fri Jan 30 09:16:32 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 30 Jan 2009 09:16:32 -0500
Subject: [Biopython-dev] [Bug 2697] MaxEntropy calculate function assumes
	integer values for class and convergence criteria is hard coded
In-Reply-To: <bug-2697-42@http.bugzilla.open-bio.org/>
Message-ID: <200901301416.n0UEGWeN005337@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2697


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1139 is|0                           |1
           obsolete|                            |


------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-30 09:16 EST -------
Created an attachment (id=1211)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1211&action=view)
Patch to Bio/MaxEntropy.py to make the convergence parameters optional
arguments

This should retain API backwards compatibility by using the current module
level values as the function's default arguments (see earlier comments).  I've
checked that changing these and then re-calling the train function does work as
expected.
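
As a generic illustration of module level values acting as defaults (not the
attached patch, whose details are not shown here): plain default arguments are
evaluated once when the function is defined, so the sketch below uses None and
looks the module value up at call time, which keeps later changes to the
module value effective.  The names iterate, MAX_ITERATIONS and CONVERGE and
their values are placeholders.

```python
# Placeholder module-level settings; the names and values are illustrative.
MAX_ITERATIONS = 1000
CONVERGE = 1e-5


def iterate(step, start, max_iterations=None, converge=None):
    """Apply step() repeatedly until the change drops below converge."""
    # None means "use the module-level value as it is right now".
    if max_iterations is None:
        max_iterations = MAX_ITERATIONS
    if converge is None:
        converge = CONVERGE
    value = start
    for count in range(1, max_iterations + 1):
        new_value = step(value)
        if abs(new_value - value) < converge:
            return new_value, count
        value = new_value
    raise RuntimeError("no convergence after %i iterations" % max_iterations)
```

Existing callers that pass neither argument keep the old behaviour, while new
callers can override either setting per call.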

How does this look?



From bugzilla-daemon at portal.open-bio.org  Fri Jan 30 09:17:43 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 30 Jan 2009 09:17:43 -0500
Subject: [Biopython-dev] [Bug 2697] MaxEntropy calculate function assumes
	integer values for class and convergence criteria is hard coded
In-Reply-To: <bug-2697-42@http.bugzilla.open-bio.org/>
Message-ID: <200901301417.n0UEHhKG005438@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2697


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Attachment #1211|application/octet-stream    |text/plain
          mime type|                            |
Attachment #1211 is|0                           |1
              patch|                            |


------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-30 09:17 EST -------
(From update of attachment 1211)
Marking this as a patch (plain text)



From bugzilla-daemon at portal.open-bio.org  Fri Jan 30 09:19:43 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 30 Jan 2009 09:19:43 -0500
Subject: [Biopython-dev] [Bug 2697] MaxEntropy calculate function assumes
	integer values for class and convergence criteria is hard coded
In-Reply-To: <bug-2697-42@http.bugzilla.open-bio.org/>
Message-ID: <200901301419.n0UEJhID005587@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2697


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1211 is|0                           |1
           obsolete|                            |


------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-30 09:19 EST -------
(From update of attachment 1211)
Sorry - wrong version of the patch.  This doesn't cover _iis_solve_delta etc.



From bugzilla-daemon at portal.open-bio.org  Fri Jan 30 09:30:40 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 30 Jan 2009 09:30:40 -0500
Subject: [Biopython-dev] [Bug 2697] MaxEntropy calculate function assumes
	integer values for class and convergence criteria is hard coded
In-Reply-To: <bug-2697-42@http.bugzilla.open-bio.org/>
Message-ID: <200901301430.n0UEUe04006448@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2697


------- Comment #9 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-30 09:30 EST -------
Created an attachment (id=1212)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1212&action=view)
Patch to Bio/MaxEntropy.py to make the convergence parameters optional
arguments

This time it's the whole patch - sorry for the extra emails this has triggered.
I had stopped to check in a couple of docstring changes and fixed a few tabs in
MaxEntropy.py first, which confused things.

Note this is a bit different to what I was thinking in comment #5,
> ... something like this:
> 
> def train(training_set, results, feature_fns, update_fn=None,
>           max_iis_iterations = MAX_IIS_ITERATIONS,
>           iis_converge = IIS_CONVERGE,
>           max_newton_iterations = MAX_NEWTON_ITERATIONS,
>           newton_converge = NEWTON_CONVERGE):

The above code won't pick up changes to the module level variables like
MAX_IIS_ITERATIONS because the defaults are only evaluated once when the
function is created.  The patch deals with this as follows:

def train(training_set, results, feature_fns, update_fn=None,
          max_iis_iterations=None, iis_converge=None,
          max_newton_iterations=None, newton_converge=None):
    if max_iis_iterations is None :
        max_iis_iterations = MAX_IIS_ITERATIONS
    if iis_converge is None :
        iis_converge = IIS_CONVERGE
    if max_newton_iterations is None :
        max_newton_iterations = MAX_NEWTON_ITERATIONS
    if newton_converge is None :
        newton_converge = NEWTON_CONVERGE

This works :)
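
To make the difference concrete, here is a small self-contained sketch. It is
not taken from the patch: the constant and function names are made up purely
for illustration. It shows that a default bound in the signature ignores later
changes to the module level variable, while the None version picks them up:

```python
# Hypothetical stand-in for a module level setting such as MAX_IIS_ITERATIONS.
MAX_ITERATIONS = 100


def train_early_binding(max_iterations=MAX_ITERATIONS):
    # The default was evaluated once, when the "def" statement ran.
    return max_iterations


def train_late_binding(max_iterations=None):
    # The module level value is looked up on each call instead.
    if max_iterations is None:
        max_iterations = MAX_ITERATIONS
    return max_iterations


MAX_ITERATIONS = 5  # the user changes the module level setting

print(train_early_binding())   # still uses the value captured at definition
print(train_late_binding())    # sees the updated module level value
print(train_late_binding(20))  # an explicit argument always takes priority
```

The None sentinel costs a few extra lines per argument, but it keeps both ways
of configuring the function (module level variables and explicit arguments)
working.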



From bugzilla-daemon at portal.open-bio.org  Fri Jan 30 09:34:23 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 30 Jan 2009 09:34:23 -0500
Subject: [Biopython-dev] [Bug 2745] New: Bio.GenBank.LocationParserError
	with a GenBank CON file
Message-ID: <bug-2745-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2745

           Summary: Bio.GenBank.LocationParserError with a GenBank CON file
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: minor
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: bsouthey at gmail.com


The following file has a Bio.GenBank.LocationParserError:
ftp://ftp.ncbi.nih.gov/genbank/daily-nc/con_nc.0103.flat.gz

Partial error message (as the last line is the complete CONTIG line).

Syntax error at or near `Tokens('close_paren')' token
Traceback (most recent call last):
  File "parse_gbk.py", line 26, in <module>
    for record in SeqIO.parse(handle, "genbank") :
  File "/home/bsouthey/python/biopython_cvs/biopython/build/lib.linux-x86_64-2.5/Bio/GenBank/Scanner.py", line 410, in parse_records
  File "/home/bsouthey/python/biopython_cvs/biopython/build/lib.linux-x86_64-2.5/Bio/GenBank/Scanner.py", line 393, in parse
  File "/home/bsouthey/python/biopython_cvs/biopython/build/lib.linux-x86_64-2.5/Bio/GenBank/Scanner.py", line 371, in feed
  File "/home/bsouthey/python/biopython_cvs/biopython/build/lib.linux-x86_64-2.5/Bio/GenBank/Scanner.py", line 1093, in _feed_misc_lines
  File "/home/bsouthey/python/biopython_cvs/biopython/build/lib.linux-x86_64-2.5/Bio/GenBank/__init__.py", line 990, in contig_location
  File "/home/bsouthey/python/biopython_cvs/biopython/build/lib.linux-x86_64-2.5/Bio/GenBank/__init__.py", line 707, in location
Bio.GenBank.LocationParserError:
join(DS483543.1:1..325170,gap(unk100),DS483544.1:1..218545,gap(unk100),DS483545.1:1..95394,gap(unk100),DS483546.1:1..261305,gap(unk100),DS483547.1:1..63422,gap(unk100),DS483548.1:1..77432,gap(unk100),DS483549.1:1..371434,gap(unk100),DS483550.1:1..74569,gap(unk100),DS483551.1:1..54637,gap(unk100),DS483552.1:1..73591,gap(unk100),DS483553.1:1..63632,gap(unk100),DS483554.1:1..60619,gap(unk100),DS483555.1:1..57196,gap(unk100),DS483556.1:1..95189,gap(unk100),DS483557.1:1..48586,gap(unk100),DS483558.1:1..45971,gap(unk100),DS483559.1:1..59826,gap(unk100),DS483560.1:1..49535,gap(unk100),DS483561.1:1..51083,gap(unk100),...



From bugzilla-daemon at portal.open-bio.org  Fri Jan 30 09:35:41 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 30 Jan 2009 09:35:41 -0500
Subject: [Biopython-dev] [Bug 2745] Bio.GenBank.LocationParserError with a
	GenBank CON file
In-Reply-To: <bug-2745-42@http.bugzilla.open-bio.org/>
Message-ID: <200901301435.n0UEZfpC007388@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2745


------- Comment #1 from bsouthey at gmail.com  2009-01-30 09:35 EST -------
Created an attachment (id=1213)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1213&action=view)
Example of a single GenBank CON record that fails



From bugzilla-daemon at portal.open-bio.org  Fri Jan 30 09:47:36 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 30 Jan 2009 09:47:36 -0500
Subject: [Biopython-dev] [Bug 2738] Speed up GenBank parsing,
	in particular location parsing
In-Reply-To: <bug-2738-42@http.bugzilla.open-bio.org/>
Message-ID: <200901301447.n0UEla5Q009025@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2738


------- Comment #10 from bsouthey at gmail.com  2009-01-30 09:47 EST -------
(In reply to comment #8)
Thanks, I was able to print out the references from the annotations and I also
did not see any differences. 

I submitted a bug for the CON file.

I am a lot more comfortable with this patch now that a wide range of files has
been tested. But can you confirm that the example I provided is correctly
parsed?

Thanks
Bruce



From bugzilla-daemon at portal.open-bio.org  Fri Jan 30 10:11:56 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 30 Jan 2009 10:11:56 -0500
Subject: [Biopython-dev] [Bug 2745] Bio.GenBank.LocationParserError with a
	GenBank CON file
In-Reply-To: <bug-2745-42@http.bugzilla.open-bio.org/>
Message-ID: <200901301511.n0UFBuEW012224@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2745


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-30 10:11 EST -------
It's the "gap(unk100)" entries which are breaking the location parser in
Bruce's examples.  Similarly even "gap()" entries of unknown length like this
will fail:

LOCUS       AH007743     7832 bp    DNA             CON       26-MAY-1999
DEFINITION  Gallus gallus ornithine transcarbamylase (OTC) gene, complete cds.
ACCESSION   AH007743
VERSION     AH007743.1  GI:4927367
KEYWORDS    .
SOURCE      chicken.
  ORGANISM  Gallus gallus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Archosauria;
            Aves; Neognathae; Galliformes; Phasianidae; Phasianinae; Gallus.
[....]
FEATURES             Location/Qualifiers
     source          1..7832
                     /organism="Gallus gallus"
                     /db_xref="taxon:9031"
                     /chromosome="1"
CONTIG      join(AF065630.1:1..1903,gap(),AF065631.1:1..435,gap(),
            AF065632.1:1..509,gap(),AF065633.1:1..722,gap(),AF065634.1:1..707,
            gap(),AF065635.1:1..836,gap(),AF065636.1:1..1614,gap(),
            AF065637.1:1..605,gap(),AF065638.1:1..501)
//

Example based on ftp://ftp.ncbi.nih.gov/genbank/README.genbank although this
does not describe the new terms.  Older versions of the release notes do, e.g.
ftp://ftp.ncbi.nih.gov/genbank/release.notes/gb168.release.notes

========================= [start quote] =========================

3.4.15 CONTIG Format

  As an alternative to SEQUENCE, a CONTIG record can be present
following the ORIGIN record. A join() statement utilizing a syntax
similar to that of feature locations (see the Feature Table specification
mentioned in Section 3.4.12) provides the accession numbers and basepair
ranges of other GenBank sequences which contribute to a large-scale
biological object, such as a chromosome or complete genome. Here is
an example of the use of CONTIG :

CONTIG      join(AE003590.3:1..305900,AE003589.4:61..306076,
            AE003588.3:61..308447,AE003587.4:61..314549,AE003586.3:61..306696,
            AE003585.5:61..343161,AE003584.5:61..346734,AE003583.3:101..303641,

            [ lines removed for brevity ]

            AE003782.4:61..298116,AE003783.3:16..111706,AE002603.3:61..143856)

However, the CONTIG join() statement can also utilize a special operator
which is *not* part of the syntax for feature locations:

        gap()     : Gap of unknown length.

        gap(X)    : Gap with an estimated integer length of X bases.

                    To be represented as a run of n's of length X
                    in the sequence that can be constructed from
                    the CONTIG line join() statement .

        gap(unkX) : Gap of unknown length, which is to be represented
                    as an integer number (X) of n's in the sequence that
                    can be constructed from the CONTIG line join()
                    statement.

                    The value of this gap operator consists of the 
                    literal characters 'unk', followed by an integer.

Here is an example of a CONTIG line join() that utilizes the gap() operator:

CONTIG      join(complement(AADE01002756.1:1..10234),gap(1206),
            AADE01006160.1:1..1963,gap(323),AADE01002525.1:1..11915,gap(1633),
            AADE01005641.1:1..2377)

The first and last elements of the join() statement may be a gap() operator.
But if so, then those gaps should represent telomeres, centromeres, etc.

Consecutive gap() operators are illegal.

========================= [end quote] =========================

Evidently Biopython doesn't cope with these CONTIG lines - but then they do
have a different syntax to the feature locations.  I never understood why the
current code tries to parse the CONTIG string into a SeqFeature object in the
first place.
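
To show how the three gap forms differ from plain feature locations, here is a
rough sketch of splitting a CONTIG join() into its top-level elements and
classifying each one.  This is only an illustration under my own assumptions:
split_contig and classify are hypothetical helpers, not Biopython functions,
and the example string is a made-up mix of elements from the examples above.

```python
import re

# Hypothetical helpers for illustration only - not Biopython code.
# Matches gap(), gap(X) and gap(unkX), per the release notes quoted above.
GAP_RE = re.compile(r"^gap\((unk)?(\d*)\)$")


def split_contig(contig):
    """Split a CONTIG join(...) string into its top-level elements."""
    text = "".join(contig.split())  # undo line wrapping, drop spaces
    if not (text.startswith("join(") and text.endswith(")")):
        raise ValueError("Expected a join(...) statement")
    inner = text[len("join("):-1]
    parts = []
    depth = 0
    start = 0
    for i, char in enumerate(inner):
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
        elif char == "," and depth == 0:
            # Only split on commas outside complement(...) or gap(...)
            parts.append(inner[start:i])
            start = i + 1
    parts.append(inner[start:])
    return parts


def classify(element):
    """Return (kind, value) for one element of the join."""
    match = GAP_RE.match(element)
    if match is None:
        return ("segment", element)
    unk, digits = match.groups()
    if not digits:
        if unk:
            raise ValueError("gap(unk) needs an integer: %r" % element)
        return ("gap_unknown", None)  # gap()
    if unk:
        return ("gap_unknown_as_n", int(digits))  # gap(unkX)
    return ("gap_estimated", int(digits))  # gap(X)


example = """join(complement(AADE01002756.1:1..10234),gap(1206),
            AADE01006160.1:1..1963,gap(),DS483543.1:1..325170,gap(unk100))"""
for element in split_contig(example):
    print(classify(element))
```

The segments themselves (accession:start..end, possibly wrapped in
complement) are left as raw strings here; only the gap() operators get
special handling, since they are the part the feature location parser does
not expect.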



From bugzilla-daemon at portal.open-bio.org  Fri Jan 30 10:36:52 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 30 Jan 2009 10:36:52 -0500
Subject: [Biopython-dev] [Bug 2681] BioSQL: record annotations enhancements
In-Reply-To: <bug-2681-42@http.bugzilla.open-bio.org/>
Message-ID: <200901301536.n0UFaq5u015637@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2681


------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-30 10:36 EST -------
(In reply to comment #2)
> > 'contig' is ignored by loader because it's a SeqFeature object. Is there any
> > reason it couldnt be loaded and retrieved? (record is GenBank/NT_019265.gb)
> 
> I couldn't even say off hand how the CONTIG line in that example would be
> parsed, let alone how it gets dealt with when loading into BioSQL.

Basically the CONTIG line looks rather a lot like a feature location, typically
the join of lots of (external) sequences.  It makes some sense to parse this
into an object structure and, given the way joins are handled for features,
this led the original author to represent the CONTIG information as a dummy
feature with lots of sub features.  Given the CONTIG can also include gaps (of
unknown length), this doesn't quite fit the current SeqFeature location objects
(see Bug 2745).

If we extend the location objects to cope with these gaps, then perhaps the
CONTIG can stay as a SeqFeature, in which case for BioSQL maybe we should record
it in the SeqFeature table.  We'd have to invent a way to record these gap
locations though.

However, if we just stored the CONTIG line as a raw string, we could then store
it in BioSQL as just another bioentry qualifier (assuming it doesn't overflow
the text field limit).

I've checked how and where BioPerl stores the contig information using the
example Bruce used on Bug 2745, attachment 1213, and see that the CONTIG
information is stored in the bioentry_qualifier_value table under the term
"contig" under the ontology "Annotation Tags".  They have retained the separate
lines, storing each as a separate entry with an increasing rank.

Thus for compatibility with BioSQL, it would make sense for the GenBank parser
to store the CONTIG line as a simple string (or list of strings), and not as a
SeqFeature (which is currently half broken anyway - see Bug 2745).
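
As a rough illustration of the "list of strings" option, the sketch below
collects the CONTIG lines of a record and pairs each with an increasing rank,
which is roughly the shape of what I saw in the BioPerl-loaded database.  The
function name is hypothetical and starting the rank at 1 is an assumption on my
part; I have not checked which starting value BioPerl uses.

```python
def contig_to_ranked_rows(record_lines, first_rank=1):
    """Return (rank, text) pairs, one per CONTIG line, in file order.

    Hypothetical helper; first_rank=1 is an assumption, not a BioSQL rule.
    """
    rows = []
    in_contig = False
    for line in record_lines:
        if line.startswith("CONTIG"):
            in_contig = True
            text = line[len("CONTIG"):].strip()
        elif in_contig and line.startswith(" "):
            # Continuation lines of the CONTIG entry are indented
            text = line.strip()
        else:
            in_contig = False
            continue
        rows.append((first_rank + len(rows), text))
    return rows


lines = [
    "CONTIG      join(AF065630.1:1..1903,gap(),AF065631.1:1..435,gap(),",
    "            AF065632.1:1..509,gap(),AF065633.1:1..722)",
    "//",
]
rows = contig_to_ranked_rows(lines)
for rank, text in rows:
    print(rank, text)

# The single string variant is then just the pieces joined back together:
contig_string = "".join(text for rank, text in rows)
print(contig_string)
```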



From bugzilla-daemon at portal.open-bio.org  Fri Jan 30 11:20:18 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 30 Jan 2009 11:20:18 -0500
Subject: [Biopython-dev] [Bug 2745] Bio.GenBank.LocationParserError with a
	GenBank CON file
In-Reply-To: <bug-2745-42@http.bugzilla.open-bio.org/>
Message-ID: <200901301620.n0UGKIXW024960@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2745


------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-30 11:20 EST -------
Created an attachment (id=1214)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1214&action=view)
Treat the CONTIG information as a string, not a SeqFeature

As outlined on Bug 2681 comment 8, there are good reasons to simply store the
CONTIG information as a string or perhaps a list of strings.  This will make
our BioSQL bindings consistent with BioPerl.

More generally, I never really liked the idea of storing the CONTIG location as
a SeqFeature.  I could understand in principle using a location-object, but the
current location objects do not deal with joins directly - which is why you
have to use a SeqFeature with subfeatures.

In the long term, a new location object might be a worthwhile change to both
features and the contig.  For now, this patch simply stores the CONTIG
information as one long string.

If we commit this, then Tests/output/test_GenBank will need updating too.



From bugzilla-daemon at portal.open-bio.org  Fri Jan 30 11:54:20 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 30 Jan 2009 11:54:20 -0500
Subject: [Biopython-dev] [Bug 2723] Minor corrections to the installation
	document
In-Reply-To: <bug-2723-42@http.bugzilla.open-bio.org/>
Message-ID: <200901301654.n0UGsK0D003024@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2723


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-30 11:54 EST -------
This is fixed now.



From bugzilla-daemon at portal.open-bio.org  Fri Jan  2 18:15:46 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 2 Jan 2009 13:15:46 -0500
Subject: [Biopython-dev] [Bug 2544] Bio.GenBank and SeqFeature improvements
In-Reply-To: <bug-2544-42@http.bugzilla.open-bio.org/>
Message-ID: <200901021815.n02IFkcf012662@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2544


------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-02 13:15 EST -------
(In reply to comment #4)
> Can I instantiate GenBank file, reverse-complement the sequence
> (keep letter casing) in the SeqIO object and dump it back to a
> GenBank file?

I think this question would have been better handled on the mailing lists,
rather than on this bug.  Note that currently our GenBank output via Bio.SeqIO
does not include the features and references - see Bug 2294.

I would do this based on the approach described in the tutorial, which assumes
there could be many records in the input file.  Here is a variation for just
one record (untested):

from Bio import SeqIO
from Bio.SeqRecord import SeqRecord
in_handle = open("example.gbk")
record = SeqIO.read(in_handle, "genbank")
in_handle.close()
# Build a new record rather than modifying the original in place
rc_record = SeqRecord(seq=record.seq.reverse_complement(),
                      id="rc_" + record.id,
                      name="rc_" + record.name,
                      description="reverse complement")
out_handle = open("rc_example.gbk", "w")
SeqIO.write([rc_record], out_handle, "genbank")
out_handle.close()

Note you *could* override the record's sequence in situ:
record.seq = record.seq.reverse_complement() #BAD IDEA
This is a bad idea because none of the annotations will have been changed - in
addition to the name/id/description still being the same, all the feature
locations etc will still be for the forward sequence.

--

I'm leaving this bug open for defining __repr__ for the
Bio.SeqFeature.Reference object (and perhaps tweaking the display of the
references in the SeqRecord __str__ method) ONLY.

Please continue any other discussion on the mailing lists.  Thanks.

Peter.




From bugzilla-daemon at portal.open-bio.org  Sat Jan  3 22:18:56 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 3 Jan 2009 17:18:56 -0500
Subject: [Biopython-dev] [Bug 2723] New: Clarify what applies to which
	version of biopython and other doc cleanup
Message-ID: <bug-2723-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2723

           Summary: Clarify what applies to which version of biopython and
                    other doc cleanup
           Product: Biopython
           Version: 1.49
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Documentation
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: mmokrejs at ribosome.natur.cuni.cz


I went to look around at the docs because the built-in tests of 1.49 setup.py
spat out some messages about external programs missing. I haven't found any
hints on them in http://news.open-bio.org/news/2008/11/biopython-release-149/.

Anyway, looking at 
http://biopython.org/DIST/docs/install/Installation.html#htoc17
I see: "3.4  mxTextTools (no longer needed)". I would propose:

3.4  mxTextTools (no longer needed since 1.49)

Similarly:
- 3.1  Numerical Python (NumPy) (strongly recommended)
+ 3.1  Numerical Python (NumPy) (strongly recommended since 1.49)


Bad URL links are in the text:


3.3  Database Access (MySQLdb, ...) (optional)

[cut]

Additionally, both MySQLdb and psycopg (a PostgreSQL database adaptor) can be
used for accessing BioSQL databases through Biopython (see ). Again if you are 
-----------------------------------------------------------^
not going to use BioSQL, there shouldn't be any need to install these
modules.


3.4  mxTextTools (no longer needed)

[cut]

However, we currently recommend you install mxTextTools 2.0, as some of the API
changes made in 3.0 version were not compatible with Biopython. Goto to
download
---------------------------------------------------------------------^^
this.


I haven't found an answer for me yet:

test_PopGen_FDist ... skipping. Install FDist if you want to use
Bio.PopGen.FDist.
ok
test_PopGen_FDist_nodepend ... ok
test_PopGen_GenePop ... ok
test_PopGen_SimCoal ... skipping. Install SIMCOAL2 if you want to use
Bio.PopGen.SimCoal.
ok
test_PopGen_SimCoal_nodepend ... ok
test_ProtParam ... ok
test_Registry ... ok
test_Restriction ... ok
test_SCOP_Astral ... ok
test_SCOP_Cla ... ok
test_SCOP_Des ... ok
test_SCOP_Dom ... ok
test_SCOP_Hie ... ok
test_SCOP_Raf ... ok
test_SCOP_Residues ... ok
test_SCOP_Scop ... ok
test_SProt ... ok
test_SVDSuperimposer ... ok
test_SeqIO ... ok
test_SeqIO_online ... ok
test_SeqUtils ... ok
test_SubsMat ... ok
test_UniGene ... ok
test_Wise ... skipping. Install Wise2 (dnal) if you want to use Bio.Wise.
ok
test_align ... ok
test_docstrings ... ok
test_geo ... ok
test_interpro ... ok
test_kNN ... ok
test_lowess ... ok
test_pairwise2 ... ok
test_prodoc ... ok
test_property_manager ... ok
test_prosite ... ok
test_prosite2 ... ok
test_psw ... skipping. Install Wise2 (dnal) if you want to use Bio.Wise.
ok
test_seq ... ok
test_translate ... ok
test_trie ... ok
test_triefind ... ok

----------------------------------------------------------------------
Ran 96 tests in 172.215s

OK


Pointers to those packages would have been helpful, from the test suite as well
as from the installation manual. Moreover, what database username/password would
I have to make to get the BioSQL stuff compiled and tested?  ^H^H^H^H^H^H
I see, it gets compiled anyway; the tests just were not run. The installation
manual and the output from the test suite should be clearer.

Thanks, Peter!




From bugzilla-daemon at portal.open-bio.org  Sat Jan  3 22:30:55 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 3 Jan 2009 17:30:55 -0500
Subject: [Biopython-dev] [Bug 2724] New: Unclear? changes between 1.47 and
	1.49
Message-ID: <bug-2724-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2724

           Summary: Unclear? changes between 1.47 and 1.49
           Product: Biopython
           Version: 1.49
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: mmokrejs at ribosome.natur.cuni.cz


I used diff(1) to compare the files installed on my machine by the 1.47 release
with those installed by 1.49. I don't know what cdistance was about, but the
mailing list archive search tool does not work, and searching for it manually
in the raw archives of Oct and Nov 2008 did not help.

The second file shown here contains a white space in its filename; not critical,
but it may be good to rename it in the next release.

-/usr/lib/python2.5/site-packages/Bio/cdistance.so
+/usr/share/biopython/Tests/Clustalw/temp horses.dnd




From bugzilla-daemon at portal.open-bio.org  Sun Jan  4 01:10:02 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 3 Jan 2009 20:10:02 -0500
Subject: [Biopython-dev] [Bug 2724] Unclear? changes between 1.47 and 1.49
In-Reply-To: <bug-2724-42@http.bugzilla.open-bio.org/>
Message-ID: <200901040110.n041A2e5028585@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2724


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-03 20:10 EST -------
Bio.cdistance was an optional C implementation used within Bio.distance - the C
code was used if available to speed up calculations.  You can see the (now
deleted) code in CVS here:
http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Attic/cdistancemodule.c?hideattic=0&cvsroot=biopython

This C code (Bio.cdistance) was removed when the python code (Bio.distance) was
deprecated for release 1.49.

This was discussed at the start of October on the mailing list, see this
thread:
http://lists.open-bio.org/pipermail/biopython/2008-October/004532.html


This should have been mentioned in the DEPRECATED file, but wasn't.  I've
updated this in CVS, see revision 1.41

http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/DEPRECATED?cvsroot=biopython

Thanks for spotting this omission.




From bugzilla-daemon at portal.open-bio.org  Sun Jan  4 01:20:42 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 3 Jan 2009 20:20:42 -0500
Subject: [Biopython-dev] [Bug 2724] Unclear? changes between 1.47 and 1.49
In-Reply-To: <bug-2724-42@http.bugzilla.open-bio.org/>
Message-ID: <200901040120.n041Kgkx029421@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2724


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-03 20:20 EST -------
The file "/usr/share/biopython/Tests/Clustalw/temp horses.dnd" is normally
created by one of the unit tests, test_Clustalw_tool.py (and the space is very
deliberate).

This stray dnd file does appear to have been included with biopython-1.49.zip
(and probably the tar ball as well), which must have been a minor slip on my
part.  However, I don't think it's worth re-issuing the archive files over this.

I've updated test_Clustalw_tool.py as of CVS revision 1.4 so that it should
remove this dnd file automatically.




From bugzilla-daemon at portal.open-bio.org  Sun Jan  4 01:37:26 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 3 Jan 2009 20:37:26 -0500
Subject: [Biopython-dev] [Bug 2723] Clarify what applies to which version of
	biopython and other doc cleanup
In-Reply-To: <bug-2723-42@http.bugzilla.open-bio.org/>
Message-ID: <200901040137.n041bQ6Z030767@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2723


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-03 20:37 EST -------
(In reply to comment #0)
> I went to look around at the docs because the built-in tests of 1.49 setup.py
> spat out some messages about external programs missing. I haven't found any
> hints on them in http://news.open-bio.org/news/2008/11/biopython-release-149/.

No, that text and the matching email announcement don't go into details about
installation - the text was already long enough I felt.  However, the download
page does list various external programs:
http://biopython.org/wiki/Download

(Someone else had pointed out we were missing a few, which has been fixed, but I
couldn't find the email/bug report while writing this reply).

> Anyway, looking at 
> http://biopython.org/DIST/docs/install/Installation.html#htoc17
> I see: "3.4  mxTextTools (no longer needed)". I would propose:
> 
> 3.4  mxTextTools (no longer needed since 1.49)
> 
> Similarly:
> - 3.1  Numerical Python (NumPy) (strongly recommended)
> + 3.1  Numerical Python (NumPy) (strongly recommended since 1.49)

That does seem sensible.

> Bad URL links are in the text:
> 
> 3.3  Database Access (MySQLdb, ...) (optional)
> 
> [cut]
> 
> Additionally, both MySQLdb and psycopg (a PostgreSQL database adaptor) can be
> used for accessing BioSQL databases through Biopython (see ). Again if you
> -----------------------------------------------------------^
> are not going to use BioSQL, there shouldn't be any need to install these
> modules.
> 
> 
> 3.4  mxTextTools (no longer needed)
> 
> [cut]
> 
> However, we currently recommend you install mxTextTools 2.0, as some of the
> API changes made in 3.0 version were not compatible with Biopython. Goto
> ---------------------------------------------------------------------^^
> to download this.

I'll have to check those... probably something silly in the LaTeX source.

> I haven't found an answer for me yet:
> 
> test_PopGen_FDist ... skipping. Install FDist if you want to use
> Bio.PopGen.FDist.
> ok
> ...
> test_PopGen_SimCoal ... skipping. Install SIMCOAL2 if you want to use
> Bio.PopGen.SimCoal.
> ok
> ...
> test_Wise ... skipping. Install Wise2 (dnal) if you want to use Bio.Wise.
> ok
> test_psw ... skipping. Install Wise2 (dnal) if you want to use Bio.Wise.
> ok

See http://biopython.org/wiki/Download

> Pointer to those packages would have been helpful. From the test suite as well
> as from installation manual.

I'm not keen on making the unit test even more verbose by adding URLs to these
messages.  The information is on the download page, but yes, adding it to the
installation document seems sensible.

> Moreover, what database username/password would
> I have to make to get the BioSQL stuff compiled and tested?  ^H^H^H^H^H^H
> I see, it gets compiled anyway the tests just were not run.

The BioSQL unit test message should say: "Check settings in
Tests/setup_BioSQL.py if you plan to use BioSQL".  i.e. Once you have installed
BioSQL and set up a database, edit the file setup_BioSQL.py to match.  See
http://biopython.org/wiki/BioSQL

Peter


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Sun Jan  4 18:56:22 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 4 Jan 2009 13:56:22 -0500
Subject: [Biopython-dev] [Bug 2723] Minor corrections to the installation
	document
In-Reply-To: <bug-2723-42@http.bugzilla.open-bio.org/>
Message-ID: <200901041856.n04IuMhJ028749@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2723


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Clarify what applies to     |Minor corrections to the
                   |which version of biopython  |installation document
                   |and other doc cleanup       |


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-04 13:56 EST -------
(In reply to comment #1)
> (In reply to comment #0)
> > I went to look around at the docs because the built-in tests of 1.49
> > setup.py spitted some messages about external programs missing. I haven't
> > found any hints on them in
> > http://news.open-bio.org/news/2008/11/biopython-release-149/.
> 
> No, that text and the matching email announcement don't go into details about
> installation - the text was already long enough I felt.  However, the download
> page does list various external programs:
> http://biopython.org/wiki/Download

I've added a section on third party tools to the installation document in CVS.

> > Anyway, looking at 
> > http://biopython.org/DIST/docs/install/Installation.html#htoc17
> > I see: "3.4  mxTextTools (no longer needed)". I would propose:
> > 
> > 3.4  mxTextTools (no longer needed since 1.49)
> > 
> > Similarly:
> > - 3.1  Numerical Python (NumPy) (strongly recommended)
> > + 3.1  Numerical Python (NumPy) (strongly recommended since 1.49)
> 
> That does seem sensible.

On reflection, I don't like the layout with version numbers stuck in the
section names.  The NumPy section is already very clear about the fact that
this applies to 1.49 onwards, and that older versions of Biopython needed
Numeric instead.  I have tried to clarify the mxTextTools section in CVS.

> > Bad URL links are in the text:
> > 
> > 3.3  Database Access (MySQLdb, ...) (optional)
> > ...
> > 3.4  mxTextTools (no longer needed)
> > ...
> 
> I'll have to check those... probably something silly in the LaTeX source.

Fixed in CVS.

I'm leaving this bug open until I've updated the HTML and PDF copies of the
installation document on the website.  I don't have the tool hevea installed
on this machine, so I can't create the HTML version of the installation
document -- just the PDF.  I should be able to do this next week...


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Sun Jan  4 22:09:47 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 4 Jan 2009 17:09:47 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main
	Biopython distribution
In-Reply-To: <bug-2671-42@http.bugzilla.open-bio.org/>
Message-ID: <200901042209.n04M9lJ0010428@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671


------- Comment #32 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-04 17:09 EST -------
(In reply to comment #30)
> (In reply to comment #29)
> > 
> > I propose that in Biopython 1.50 we support both "colour" and "color",
> > but for Biopython 1.51 we add deprecation warnings when "colour" is used.
> > 
> > We should probably do the same thing for "centre" and "center" as well...
> > 
> 
> I agree.  We should encourage use of the US spelling in the documentation, to
> catch those new to GD. This approach provides a window for conversion of old
> GD scripts for previous users, which is a good thing.
> 

I've updated CVS to switch from centre to center, with properties set up to
allow access under the old spellings, and where I thought it appropriate I've
included both spellings in argument lists.  Another set of eyes to check this
wouldn't hurt.
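
As an illustration of the property approach described above, here is a minimal
sketch. The class name Feature and its attributes are hypothetical stand-ins,
not the actual GenomeDiagram code; the point is only that the old spelling
can remain as a thin alias onto the new attribute.

```python
class Feature(object):
    """Hypothetical stand-in for a GenomeDiagram-style object."""

    def __init__(self, color="black", colour=None):
        # Accept both spellings in the argument list; the old spelling
        # is used only when it was given explicitly.
        if colour is not None:
            color = colour
        self.color = color

    def _get_colour(self):
        return self.color

    def _set_colour(self, value):
        self.color = value

    # Old spelling kept as an alias for the new attribute.
    colour = property(_get_colour, _set_colour,
                      doc="Backwards compatible alias for color.")
```

A deprecation warning could later be added inside the two helper methods
without touching any calling code.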

I'm leaving this bug open until we've done the documentation (see my comment
25).

There is also the issue of Bug 2705 for the AT and GC content and skew
functions and any windowing function to help plot these in GenomeDiagram.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.


From bugzilla-daemon at portal.open-bio.org  Mon Jan  5 16:30:46 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 5 Jan 2009 11:30:46 -0500
Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and
	write_to_string() are inefficient and don't check inputs
In-Reply-To: <bug-2711-42@http.bugzilla.open-bio.org/>
Message-ID: <200901051630.n05GUkun032207@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2711


bsouthey at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |


------- Comment #17 from bsouthey at gmail.com  2009-01-05 11:30 EST -------
I do not consider this bug completely fixed, for several reasons, some of which
my patch addressed prior to the creation of the _write function. I do like
where _write is heading, as it makes the code cleaner and more understandable.

1) I do not understand the need for the dictionary of modules 'formatdict' in
_write as it creates unnecessarily inefficient code. The options need to be part
of the check for the type of output.

2) There is no indication that the output for write and write_to_string only
accepts uppercase. Note the _write function states this but a user will not see
it. I do not understand why lowercase is unacceptable.

3) The check for renderPM at start is really redundant because _write checks
for it (well sort of). It is also an unnecessary delay if renderPM is not used.
If you really must use the dictionary (which I really do not like) I would
suggest something like:
formatdict = {'PS': renderPS, 'PDF': renderPDF,'SVG': renderSVG}
try:
    from reportlab.graphics import renderPM
    formatdict.update({'JPG': renderPM, 'BMP': renderPM, 'GIF': renderPM,
'PNG': renderPM, 'TIFF': renderPM,'TIF': renderPM})

The current code would show the correct options regardless of the status
of renderPM. Perhaps an exception could provide a warning that renderPM is not
present.

4) There is no test for the presence of renderPM. The test function must check
for renderPM and should at least provide a warning if not present. Otherwise
this is a surprise to a user because not all options will be available.

5) The installation documentation must also indicate that renderPM is optional
and also how to install the renderPM module.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Mon Jan  5 16:49:46 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 5 Jan 2009 11:49:46 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main
	Biopython distribution
In-Reply-To: <bug-2671-42@http.bugzilla.open-bio.org/>
Message-ID: <200901051649.n05GnkVK001550@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671


------- Comment #33 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-05 11:49 EST -------
Still to do on the documentation front (as written in comment #25),
> 
> * Updating the existing GenomeDiagram manual to match (different imports,
> colour to color), which I think can stay as a separate PDF file.
> 
> * A short introduction to Bio.Graphics including GenomeDiagram as part of
> a new chapter in the tutorial?

Plus (as pointed out on Bug 2711 / Bug 2710):

* Updating the installation instructions so that the ReportLab section also
covers renderPM (needed for bitmaps).


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.


From bugzilla-daemon at portal.open-bio.org  Mon Jan  5 16:56:57 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 5 Jan 2009 11:56:57 -0500
Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and
	write_to_string() are inefficient and don't check inputs
In-Reply-To: <bug-2711-42@http.bugzilla.open-bio.org/>
Message-ID: <200901051656.n05GuvPP002443@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2711


------- Comment #18 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-05 11:56 EST -------
(In reply to comment #17)
> I do not consider this bug completely fixed for multiple reasons of which my
> patch addressed some of these prior to the creation of the _write function. I
> do like where _write is heading as it is making cleaner and more
> understandable code.
> 
> 1) I do not understand the need for the dictionary of modules 'formatdict' in
> _write as it creates unnecessary inefficient code. The options need to be part
> of the check for the type of output.

OK the use of a dictionary is a style thing.  You think it's ugly and
inefficient.  Leighton and I don't find it ugly.  I thought the
if/elif/elif/else alternative you suggested was "ugly".

The argument for the type of output does get checked (by catching a KeyError
from the dictionary).
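
To make the dictionary check concrete, here is a minimal sketch. The function
name pick_renderer and the placeholder strings are assumptions for
illustration only; in the real code the dictionary values would be the
ReportLab renderer modules.

```python
def pick_renderer(output, formatdict):
    """Return the entry registered for an output format name."""
    try:
        return formatdict[output]
    except KeyError:
        # An unknown format name surfaces as one clear error message
        # listing the names that are accepted.
        raise ValueError("Output format should be one of %s"
                         % ", ".join(sorted(formatdict)))


# Placeholder strings stand in for the renderer modules.
formats = {"PS": "renderPS", "PDF": "renderPDF", "SVG": "renderSVG"}
```

The lookup and the validation are the same operation here, so there is no
separate if/elif chain to keep in step with the list of formats.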

> 2) There is no indication that the output for write and write_to_string only
> accepts uppercase. Note the _write function states this but a user will not
> see these. I do not understand why lowercase is unacceptable. 

As part of Bug 2718, for consistency with the rest of Bio.Graphics I think we
should after all accept either case.

> 3) The check for renderPM at start is really redundant because _write checks
> for it (well sort of). It is also an unnecessary delay if renderPM is not
> used. If you really must use the dictionary (which I really do not like) I
> would suggest something like:
> formatdict = {'PS': renderPS, 'PDF': renderPDF,'SVG': renderSVG}
> try:
>     from reportlab.graphics import renderPM
>     formatdict.update({'JPG': renderPM, 'BMP': renderPM, 'GIF': renderPM,
> 'PNG': renderPM, 'TIFF': renderPM,'TIF': renderPM})

I don't see how that would work, because unfortunately with the reportlab API,
we must treat renderPM differently to renderPDF, renderPS and renderSVG.

> The current code would show the correct options regardless of status
> of renderPM. Perhaps an exception could provide a warning that renderPM
> is not present.

Right now we do have a "helpful" exception raised when a bitmap format is
requested and renderPM is not installed.

> 4) There is no test for the presence of renderPM. The test function must check
> for renderPM and should at least provide a warning if not present. Otherwise
> this is a surprise to a user because not all options will be available.

There is an "on demand" test - via the _write function.  As Leighton has
already pointed out, this is nasty in that it can come as a surprise to the
user.  However, as far as I can see the alternative is an error/warning at
import time, even if the user doesn't need or want bitmap output
(i.e. Bug 2710).  The current situation strikes me as the lesser of two evils.
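
The "on demand" check can be sketched as follows. This is a minimal
illustration, not the code in CVS: the function name get_bitmap_renderer and
the wording of the error are hypothetical, and the module path is assumed
from the import shown earlier in this thread.

```python
import importlib

BITMAP_FORMATS = ("JPG", "BMP", "GIF", "PNG", "TIFF", "TIF")


def get_bitmap_renderer(output, module_name="reportlab.graphics.renderPM"):
    """Import the bitmap renderer only when a bitmap format is requested."""
    if output.upper() not in BITMAP_FORMATS:
        raise ValueError("%r is not a bitmap format" % output)
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # Nothing is reported at import time; the user only sees this
        # if they actually ask for bitmap output.
        raise RuntimeError("Bitmap output (%s) needs the optional module %s"
                           % (output, module_name))
```

Users who only want PDF, PS or SVG never trigger the import, which is the
trade-off against warning everybody up front.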

> 5) The installation documentation must also indicate that renderPM is
> optional and also how to install the renderPM module.

Yes, we should indicate renderPM is optional.  Updating our documentation to
cover GenomeDiagram is still pending on Bug 2671.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Mon Jan  5 21:46:37 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 5 Jan 2009 16:46:37 -0500
Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and
	write_to_string() are inefficient and don't check inputs
In-Reply-To: <bug-2711-42@http.bugzilla.open-bio.org/>
Message-ID: <200901052146.n05LkbSZ031281@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2711


------- Comment #19 from bsouthey at gmail.com  2009-01-05 16:46 EST -------
(In reply to comment #18)
> (In reply to comment #17)
> > I do not consider this bug completely fixed for multiple reasons of which my
> > patch addressed some of these prior to the creation of the _write function. I
> > do like where _write is heading as it is making cleaner and more
> > understandable code.
> > 
> > 1) I do not understand the need for the dictionary of modules 'formatdict' in
> > _write as it creates unnecessary inefficient code. The options need to be part
> > of the check for the type of output.
> 
> OK the use of a dictionary is a style thing.  You think it's ugly and
> inefficient.  Leighton and I don't find it ugly.  I thought the
> if/elif/elif/else alternative you suggested was "ugly".
> 
> The argument for the type of output does get checked (by catching a KeyError
> from the dictionary).

I agree that reportlab makes any solution "ugly" because the different types
require different arguments. I agree this is partly a style issue because it is
a case of what to do first, when to do it and when to tell the user what is
missing. 

> 
> > 2) There is no indication that the output for write and write_to_string only
> > accepts uppercase. Note the _write function states this but a user will not
> > see these. I do not understand why lowercase is unacceptable. 
> 
> As part of Bug 2718, for consistency with the rest of Bio.Graphics I think we
> should after all accept either case.
> 
> > 3) The check for renderPM at start is really redundant because _write checks
> > for it (well sort of). It is also an unnecessary delay if renderPM is not
> > used. If you really must use the dictionary (which I really do not like) I
> > would suggest something like:
> > formatdict = {'PS': renderPS, 'PDF': renderPDF,'SVG': renderSVG}
> > try:
> >     from reportlab.graphics import renderPM
> >     formatdict.update({'JPG': renderPM, 'BMP': renderPM, 'GIF': renderPM,
> > 'PNG': renderPM, 'TIFF': renderPM,'TIF': renderPM})
> 
> I don't see how that would work, because unfortunately with the reportlab API,
> we must treat renderPM differently to renderPDF, renderPS and renderSVG.
> 

This just moves the renderPM import into _write and the rest of the code runs
if you add:
except:
    renderPM=None
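
Put together, the suggestion amounts to an optional import. A minimal sketch,
assuming the renderers live under reportlab.graphics as in the import above;
the helper names optional_import and build_formatdict are hypothetical, and
"except ImportError" is swapped in for the bare "except" so that unrelated
errors are not hidden.

```python
import importlib


def optional_import(name):
    """Return the named module, or None if it cannot be imported."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


def build_formatdict():
    """Map format names to whichever renderer modules are available."""
    formatdict = {}
    vector = {"PS": "renderPS", "PDF": "renderPDF", "SVG": "renderSVG"}
    for fmt, mod in vector.items():
        module = optional_import("reportlab.graphics." + mod)
        if module is not None:
            formatdict[fmt] = module
    # Bitmap formats are only offered when renderPM can be imported.
    renderPM = optional_import("reportlab.graphics.renderPM")
    if renderPM is not None:
        for fmt in ("JPG", "BMP", "GIF", "PNG", "TIFF", "TIF"):
            formatdict[fmt] = renderPM
    return formatdict
```

With this layout the list of accepted formats reflects what is installed,
though it does not by itself address the different calling conventions of
renderPM mentioned in comment #18.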

> > The current code would show the correct options regardless of status
> > of renderPM. Perhaps an exception could provide a warning that renderPM
> > is not present.
> 
> Right now we do have a "helpful" exception raised when a bitmap format is
> requested and renderPM is not installed.

Again a style issue because I just find it redundant if we already know that
renderPM is not present.

> 
> > 4) There is no test for the presence of renderPM. The test function must check
> > for renderPM and should at least provide a warning if not present. Otherwise
> > this is a surprise to a user because not all options will be available.
> 
> There is an "on demand" test - via the _write function.  As Leighton has
> already pointed out, this is nasty in that it can come as a surprise to the
> user.  However, as far as I can see the alternative is an error/warning at
> import time regardless even if the user doesn't need or want bitmap output
> (i.e. Bug 2710).  The current situation strikes me as the lesser of two evils.
> 

I mean that test_GenomeDiagram should also check for renderPM and provide a
warning if not present. So if tests are run then there is some indication that
something is missing.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Mon Jan  5 22:33:30 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 5 Jan 2009 17:33:30 -0500
Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and
	write_to_string() are inefficient and don't check inputs
In-Reply-To: <bug-2711-42@http.bugzilla.open-bio.org/>
Message-ID: <200901052233.n05MXUCS002828@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2711


------- Comment #20 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-05 17:33 EST -------
(In reply to comment #19)
> I mean that test_GenomeDiagram should also check for renderPM and provide a
> warning if not present. So if tests are run then there is some indication that
> something is missing.

The way we have our external dependency checking set up, if something is missing
the whole test is skipped.  I want to keep test_GenomeDiagram.py as it is
producing PDF output (with no dependency on renderPM - so that the core
GenomeDiagram functionality is tested).

However, I had been thinking about adding a (smaller) extra test, say
test_GenomeDiagram_bitmaps.py which would need renderPM installed. 
Alternatively this could be a more general quick test for making PNG etc with
all of Bio.Graphics after fixing Bug 2718.

This would as you point out mean anyone running the test suite would then be
alerted to the fact they may be missing renderPM - which would be a good thing.
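
A separate bitmap test along these lines might look like the sketch below.
It is only an illustration: the standard library's unittest skip decorator
is swapped in for whatever mechanism the Biopython test suite uses, and the
class name and skip message are hypothetical.

```python
import importlib.util
import unittest

# Assumption: renderPM lives at this import path when it is installed.
try:
    HAVE_RENDERPM = (
        importlib.util.find_spec("reportlab.graphics.renderPM") is not None
    )
except ImportError:
    # Raised when the parent package (reportlab) itself is missing.
    HAVE_RENDERPM = False


@unittest.skipUnless(HAVE_RENDERPM,
                     "Install ReportLab's renderPM if you want bitmap output.")
class BitmapOutputTest(unittest.TestCase):
    """Placeholder for the PNG/JPG/etc. drawing checks."""

    def test_renderpm_available(self):
        self.assertTrue(HAVE_RENDERPM)
```

Keeping this apart from the PDF-based test means the core functionality is
still exercised on machines without renderPM, while the skip message tells
the person running the suite what is missing.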


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Mon Jan  5 23:20:52 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 5 Jan 2009 18:20:52 -0500
Subject: [Biopython-dev] [Bug 2718] Bio.Graphics and output file formats
	(PDF, EPS, SVG, and bitmaps)
In-Reply-To: <bug-2718-42@http.bugzilla.open-bio.org/>
Message-ID: <200901052320.n05NKqok006769@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2718


------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-05 18:20 EST -------
(In reply to comment #2)
> In addition, I notice that Bio.Graphics.BasicChromosome,
> Bio.Graphics.Comparative and Bio.Graphics.Distribution expect lower case
> formats (currently just pdf and eps) while Bio.Graphics.GenomeDiagram
> expects upper case.  We should be consistent, which for backwards
> compatibility would mean accepting either case.

Bio.Graphics.GenomeDiagram will now accept format names in any case.
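
Accepting any case can be as simple as folding the name before the lookup.
A minimal sketch, with hypothetical names (normalise_format, SUPPORTED,
is_supported) that are not taken from the Biopython source:

```python
SUPPORTED = ("PS", "PDF", "SVG")


def normalise_format(output):
    """Fold a user supplied format name to the upper case used as keys."""
    return output.strip().upper()


def is_supported(output):
    """Check a format name without caring about its case."""
    return normalise_format(output) in SUPPORTED
```

Old scripts passing "PDF" and scripts written for the other Bio.Graphics
modules passing "pdf" would then both keep working.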


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Tue Jan  6 00:16:10 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 5 Jan 2009 19:16:10 -0500
Subject: [Biopython-dev] [Bug 2718] Bio.Graphics and output file formats
	(PDF, EPS, SVG, and bitmaps)
In-Reply-To: <bug-2718-42@http.bugzilla.open-bio.org/>
Message-ID: <200901060016.n060GAfe011559@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2718


------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-05 19:16 EST -------
Created an attachment (id=1186)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1186&action=view)
Adding output function to Bio.Graphics for shared use

This is based on the code from Bio.Graphics.GenomeDiagram.Diagram and would be
called from all the Bio.Graphics modules to output to a file/handle in any
supported file format, in a consistent manner.

This is done as a private function, as I do not want to expose this as a new
public API.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Tue Jan  6 00:18:06 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 5 Jan 2009 19:18:06 -0500
Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and
	write_to_string() are inefficient and don't check inputs
In-Reply-To: <bug-2711-42@http.bugzilla.open-bio.org/>
Message-ID: <200901060018.n060I6eq011760@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2711


------- Comment #21 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-05 19:18 EST -------
(In reply to comment #17)
> I do not consider this bug completely fixed for multiple reasons of which my
> patch addressed some of these prior to the creation of the _write function. I
> do like where _write is heading as it is making cleaner and more
> understandable code.

I decided that since ReportLab used a cStringIO or StringIO handle internally
to implement its writeToString method, we might as well do the same as it
allows a great simplification to the GenomeDiagram write and write_to_string
methods (and we can get rid of _write too).

See revision 1.14 of Bio/Graphics/GenomeDiagram/Diagram.py
http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Graphics/GenomeDiagram/Diagram.py?cvsroot=biopython

I hope you'll agree that this is a further improvement (even if the dictionary
approach is still used internally).

My plan (see Bug 2718) is to move this code into a shared private function for
all of the Bio.Graphics modules to use.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From tiagoantao at gmail.com  Tue Jan  6 00:48:12 2009
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Tue, 6 Jan 2009 00:48:12 +0000
Subject: [Biopython-dev] Structure and LDNe
Message-ID: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com>

Hi all,

Jason Eshleman (he subscribes to this list also) has made available
code to interact with Structure (a widely used application in
population genetics - the 2 papers related to it have around 3000
citations according to Google Scholar). We will try to convert his code
to the Bio.PopGen namespace, create documentation and test cases.
Added to this is the existing LDNe code (mine). This should all be ready
in a reasonably short time frame (I suppose before the next release).

The all important statistics part is still due, I am afraid (I don't
know if anybody has looked at the beta code on git). But at least this
LDNe and Structure code will be ready to go soon.

Tiago


From bugzilla-daemon at portal.open-bio.org  Tue Jan  6 02:56:35 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 5 Jan 2009 21:56:35 -0500
Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and
	write_to_string() are inefficient and don't check inputs
In-Reply-To: <bug-2711-42@http.bugzilla.open-bio.org/>
Message-ID: <200901060256.n062uZBF023086@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2711


------- Comment #22 from bsouthey at gmail.com  2009-01-05 21:56 EST -------
(In reply to comment #21)
> (In reply to comment #17)
> > I do not consider this bug completely fixed for multiple reasons of which my
> > patch addressed some of these prior to the creation of the _write function. I
> > do like where _write is heading as it is making cleaner and more
> > understandable code.
> 
> I decided that since ReportLab used a cStringIO or StringIO handle internally
> to implement its writeToString method, we might as well do the same as it
> allows a great simplification to the GenomeDiagram write and write_to_string
> methods (and we can get rid of _write too).
> 
> See revision 1.14 of Bio/Graphics/GenomeDiagram/Diagram.py
> http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Graphics/GenomeDiagram/Diagram.py?cvsroot=biopython
> 
> I hope you'll agree that this is a further improvement (even if the dictionary
> approach is still used internally).
> 
> My plan (see Bug 2718) is to move this code into a shared private function for
> all of the Bio.Graphics modules to use.
> 

That is great! 

Note that reportlab's drawToString first uses its getStringIO() and passes
that to drawToFile. I am not sure of the difference between getStringIO() and
StringIO() but getStringIO() might be preferred. 

Also, I would presume that checking for the filename would allow you to combine
the writing to a file and writing to a string into a single new function to
maintain backwards compatibility.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From rhythmbox-devel at maubp.freeserve.co.uk  Tue Jan  6 10:01:34 2009
From: rhythmbox-devel at maubp.freeserve.co.uk (Peter)
Date: Tue, 6 Jan 2009 10:01:34 +0000
Subject: [Biopython-dev] Structure and LDNe
In-Reply-To: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com>
References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com>
Message-ID: <320fb6e00901060201s3830c972w4638f5eefcd42b6a@mail.gmail.com>

On Tue, Jan 6, 2009 at 12:48 AM, Tiago Antão <tiagoantao at gmail.com> wrote:
> Hi all,
>
> Jason Eshleman (he subscribes to this list also) has made available
> code to interact with Structure (a widely used application in
> population genetics - the 2 papers related to it have around 3000
> citations according to Google scholar). We will try to convert his code
> to the Bio.PopGen namespace, create documentation and test cases.
> To this adds the existing LDNe code (mine). This all should be ready
> in a reasonably fast time frame (I suppose before the next release).

That sounds good :)

> The all important statistics part is still due, I am afraid (I don't
> know if anybody has looked at the beta code on git). But at least this
> LDNe and Structure code will be ready to go soon.
>
> Tiago

I haven't looked at any of your code on git - and I probably won't
have any spare time till next week.  But anyway, do you have the URL
handy?

Thanks

Peter


From bugzilla-daemon at portal.open-bio.org  Tue Jan  6 12:30:39 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 6 Jan 2009 07:30:39 -0500
Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and
	write_to_string() are inefficient and don't check inputs
In-Reply-To: <bug-2711-42@http.bugzilla.open-bio.org/>
Message-ID: <200901061230.n06CUds2006927@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2711


------- Comment #23 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-06 07:30 EST -------
(In reply to comment #22)
> That is great! 
> 
> Note that reportlab's drawToString first uses its getStringIO() and passes
> that to drawToFile. I am not sure the difference between getStringIO() and
> StringIO() but getStringIO() might be preferred. 

From going through the ReportLab code a week or two ago, it ends up using
cStringIO (or falling back on StringIO) internally.

> Also, I would presume that checking for the filename would allow you to
> combine the writing to a file and writing to a string into a single new
> function to maintain backwards compatibility.

You'd then have one method to write to a string, handle or filename.  As I said
before, I'm not keen on this - having two very different return values (string
or nothing) depending on the arguments, with some special invocation needed to
request the string output (maybe None rather than a filename/handle?).

The status quo seems OK here, with a write method (to a handle or filename) and
a separate write_to_string method.
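
The two-method layout can be sketched as below. This is a minimal
illustration rather than the Diagram.py code: the Diagram class and its
_render helper are hypothetical placeholders, and io.BytesIO is used as the
present-day equivalent of the cStringIO/StringIO handle mentioned in
comment #21.

```python
from io import BytesIO


class Diagram(object):
    """Hypothetical stand-in; _render is a placeholder for real drawing."""

    def _render(self, handle, output):
        handle.write(("%s data" % output.upper()).encode("ascii"))

    def write(self, filename_or_handle, output="PS"):
        """Write to a filename or an open binary handle; returns None."""
        if isinstance(filename_or_handle, str):
            with open(filename_or_handle, "wb") as handle:
                self._render(handle, output)
        else:
            self._render(filename_or_handle, output)

    def write_to_string(self, output="PS"):
        """Return the output as bytes by writing to an in-memory handle."""
        handle = BytesIO()
        self.write(handle, output)
        return handle.getvalue()
```

Each method has a single, predictable return type, and write_to_string
stays a thin wrapper around write.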


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From tiagoantao at gmail.com  Tue Jan  6 16:52:22 2009
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Tue, 6 Jan 2009 16:52:22 +0000
Subject: [Biopython-dev] Structure and LDNe
In-Reply-To: <320fb6e00901060201s3830c972w4638f5eefcd42b6a@mail.gmail.com>
References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com>
	<320fb6e00901060201s3830c972w4638f5eefcd42b6a@mail.gmail.com>
Message-ID: <6d941f120901060852r482baf16m6b8399959b3c1aaa@mail.gmail.com>

On Tue, Jan 6, 2009 at 10:01 AM, Peter
<rhythmbox-devel at maubp.freeserve.co.uk> wrote:
> I haven't looked at any of your code on git - and I probably won't
> have any spare time till next week.  But anyway, do you have the URL
> handy?

I gave the code to Giovanni, so it's his URL:
http://github.com/dalloliogm/biopython---popgen/tree/master
The code on Stats is still in a version that will have to be changed.
It is probably only of interest to developers that might have direct
interest in the module.
For development purposes I will put the code there (I don't want to
commit to the main CVS branch - as it is a production branch - before
the code is in an acceptable format).

Tiago


From bsouthey at gmail.com  Tue Jan  6 17:41:29 2009
From: bsouthey at gmail.com (Bruce Southey)
Date: Tue, 06 Jan 2009 11:41:29 -0600
Subject: [Biopython-dev] Structure and LDNe
In-Reply-To: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com>
References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com>
Message-ID: <496397C9.3030706@gmail.com>

Tiago Antão wrote:
> Hi all,
>
> Jason Eshleman (he subscribes to this list also) has made available
> code to interact with Structure (a widely used application in
> population genetics - the 2 papers related to it have around 3000
> citations according to Google scholar). We will try to convert his code
> to the Bio.PopGen namespace, create documentation and test cases.
> To this adds the existing LDNe code (mine). This all should be ready
> in a reasonably fast time frame (I suppose before the next release).
>
> The all important statistics part is still due, I am afraid (I don't
> know if anybody has looked at the beta code on git). But at least this
> LDNe and Structure code will be ready to go soon.
>
> Tiago
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>   
Hi,
What are the licenses for LDNe and Structure?
Saying just 'free' is insufficient because it is not clear which
definition is being used.

Also, please ensure that none of the code that is included in
Biopython is a derivative of LDNe or Structure unless these have an
explicit license that is compatible with Biopython.  For example,
'copying' an existing function into Python would be considered a
derivative. Obviously reading a documented output is probably not
considered a derivative.

I prefer to be proactive with licenses so these don't bite back, as has
happened in some formerly open source projects or with the use of unclean
code sources. A current example of this is that the release of scipy
0.7 has been significantly delayed due to a major effort to check
various functions that reference the Numerical Recipes book (which has
an incompatible license).

Anyhow, this sounds good!

Bruce


From tiagoantao at gmail.com  Tue Jan  6 18:10:28 2009
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Tue, 6 Jan 2009 18:10:28 +0000
Subject: [Biopython-dev] Structure and LDNe
In-Reply-To: <496397C9.3030706@gmail.com>
References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com>
	<496397C9.3030706@gmail.com>
Message-ID: <6d941f120901061010n36281702gc073d9f4469d492c@mail.gmail.com>

On Tue, Jan 6, 2009 at 5:41 PM, Bruce Southey <bsouthey at gmail.com> wrote:
> What are the licenses for LDNe and Structure?
> Saying just 'free' is insufficient because it is not clear in which
> definition is being used.
>
> Also, please ensure that none of the code that is included into Biopython is
> not a deriviative of LDNe and Structure unless these have explicit license
> that is compatible with Biopython.  For example, 'copying' an existing
> function into Python would be considered a derivative. Obviously reading a
> documented output is probably not considered a derivative.

Regarding LDNe we have had this discussion in the past. I have some
updates/extra info:
1. They only make available a Windows/DOS version. But they will make
a Linux version available (compiled by me, I offered to do that).
Probably a mac version also.
2. As I said before and as it is common in population genetics
(unfortunately), the software comes with no license at all, they
didn't even think that is an issue.
3. No code is remotely derived or adapted.

Regarding structure, the authors make the source available (a notch
better than LDNe) http://pritch.bsd.uchicago.edu/structure.html , but
again, they didn't bother to include license info. I am contacting
them in order to investigate this. I will report back as soon as I
have an answer.

This being said, structure support is way more important than LDNe.
The userbase of structure is quite big (just check the earlier factoid
on Google Scholar citations).


From dalloliogm at gmail.com  Wed Jan  7 10:37:00 2009
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Wed, 7 Jan 2009 11:37:00 +0100
Subject: [Biopython-dev] Structure and LDNe
In-Reply-To: <6d941f120901060852r482baf16m6b8399959b3c1aaa@mail.gmail.com>
References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com>
	<320fb6e00901060201s3830c972w4638f5eefcd42b6a@mail.gmail.com>
	<6d941f120901060852r482baf16m6b8399959b3c1aaa@mail.gmail.com>
Message-ID: <5aa3b3570901070237s487a4307hb68fa69abc3cb23d@mail.gmail.com>

On Tue, Jan 6, 2009 at 5:52 PM, Tiago Antão <tiagoantao at gmail.com> wrote:
> On Tue, Jan 6, 2009 at 10:01 AM, Peter
> <rhythmbox-devel at maubp.freeserve.co.uk> wrote:
>> I haven't looked at any of your code on git - and I probably won't
>> have any spare time till next week.  But anyway, do you have the URL
>> handy?
>
> I gave the code to Giovanni, so its his URL:
> http://github.com/dalloliogm/biopython---popgen/tree/master

Hi people,
if you want to upload the code there, please tell me and I will give
you the write access.

However, the right way to do it should be that you create a fork of
the code on github, add your changes and work on it locally, and then
merge them back again in the original repository. I suppose that is
the standard way to use git.


> The code on Stats is still in a version that will have to be changed.
> It is probably only of interest to developers that might have direct
> interest in the module.
> For development purposes I will put the code there (I don't want to
> commit to the main CVS branch - as it is a production branch - before
> the code is in an acceptable format).
>
> Tiago
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>


-- 

My blog on bioinformatics (now in English): http://bioinfoblog.it


From tiagoantao at gmail.com  Wed Jan  7 11:54:19 2009
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Wed, 7 Jan 2009 11:54:19 +0000
Subject: [Biopython-dev] Structure and LDNe
In-Reply-To: <5aa3b3570901070237s487a4307hb68fa69abc3cb23d@mail.gmail.com>
References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com>
	<320fb6e00901060201s3830c972w4638f5eefcd42b6a@mail.gmail.com>
	<6d941f120901060852r482baf16m6b8399959b3c1aaa@mail.gmail.com>
	<5aa3b3570901070237s487a4307hb68fa69abc3cb23d@mail.gmail.com>
Message-ID: <6d941f120901070354o70b6c99ah37ffdb38a1af7554@mail.gmail.com>

> However, the right way to do it should be that you create a fork of
> the code on github, add your changes and work on it locally, and then
> merge them back again in the original repository. I suppose that is
> the standard way to use git.

Considering that CVS has no development branch I think having git is
very good. I would just recommend extreme care with changing existing
code. When merging back into CVS, changes to existing code might not
go in (especially if they change interfaces) or be delayed.

Big _design_ changes will have to be discussed in advance.

For my part, what I am including is just new LDNe code and helping
Jason with the structure code. So I expect zero impact on existing
code and no need for design changes.

Tiago
PS - I am travelling until Saturday, apologies in advance for delayed answers.


From bugzilla-daemon at portal.open-bio.org  Wed Jan  7 14:12:46 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 7 Jan 2009 09:12:46 -0500
Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and
	write_to_string() are inefficient and don't check inputs
In-Reply-To: <bug-2711-42@http.bugzilla.open-bio.org/>
Message-ID: <200901071412.n07ECk1n012802@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2711


------- Comment #24 from lpritc at scri.sari.ac.uk  2009-01-07 09:12 EST -------
(In reply to comment #13)

> I can not check this as I am away from my system. As I recall, the Python code
> for accessing this library is provided with the standard install as there is a
> renderPM.py file. But that is just a wrapper to some C code found in the
> rl_addons directory. So it is a big no that renderPM is available unless you
> actually build the C sources or download the binaries (only valid for Windows).

That's not really a big deal, as those are the only two ways to get ReportLab,
from reportlab.org!

From the website (http://www.reportlab.org/downloads.html):

"""
We provide precompiled binaries for Windows, but not for any other platform.
Many Linux distributors and other UNIX-like OS vendors provide their own
binaries for download
"""

The installation procedure for me was to issue:

python setup.py install

at the command line while in the top directory of the source download, which
isn't any harder than installing Biopython itself.  This installed ReportLab
2.2, including compilation of renderPM.  

> According to the website
> http://www.reportlab.org/subversion.html
> "
> It will create subdirectories for reportlab, which is an importable
> python package, and rl_addons which contains the C extensions. The
> latter need building with the contained setup script, but can also be
> downloaded in pre-built form from our downloads page. They rarely
> change.
> "
> 
> What did you actually install?

Reportlab 2.2, stable build as ReportLab_2_2.tgz, downloaded on December 15th
last year.  From the checksum, it's the 11/9 build.

I've just checked the SVN trunk, and that also builds renderPM, on the same
machine.

> In particular where was _renderPM built?

Initially, in [download location]/ReportLab_2_2/src/rl_addons/renderPM

and the library was installed to 

/usr/local/lib/python2.4/site-packages/_renderPM.so

by the setup script.

> Basically we need to document this as there appears to be different ways to
> install reporlab (may also be version or svn related).

I'm happy with this, but it's not exactly a complicated issue: either the local
Reportlab installation does or does not have renderPM; if it does not, then
raising an error before the user dedicates too much effort to something that
can't work seems at least polite.  Also, providing pointers in the
documentation to where renderPM can be obtained (at time of last writing) is a
good idea.  IMO, given the straightforward installation procedure that corrects
the issue - which ought not to affect *nix users that do not run precompiled
binaries, anyway -  I reckon that raising an error will be sufficient for most
of the few cases that renderPM is not installed. 
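
A minimal sketch of such a check, assuming (as with the 2.x releases
discussed here) that importing reportlab.graphics.renderPM raises
ImportError when the compiled _renderPM extension is missing.  The names
have_renderpm and RENDERPM_AVAILABLE are only illustrative and are not
part of GenomeDiagram:

```python
import warnings


def have_renderpm():
    """Return True if ReportLab's bitmap backend can be imported."""
    try:
        # Assumption: the import fails with ImportError when ReportLab,
        # or its compiled _renderPM extension, is not installed.
        from reportlab.graphics import renderPM  # noqa: F401
    except ImportError:
        return False
    return True


# Evaluated once, when the module is first imported.
RENDERPM_AVAILABLE = have_renderpm()
if not RENDERPM_AVAILABLE:
    warnings.warn(
        "ReportLab's renderPM module was not found; bitmap output "
        "will be unavailable (see reportlab.org)."
    )
```

Whether this should be a warning or a hard error is a separate choice; the
sketch only warns, so that the vector formats remain usable.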

L.




From bugzilla-daemon at portal.open-bio.org  Wed Jan  7 14:33:21 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 7 Jan 2009 09:33:21 -0500
Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and
	write_to_string() are inefficient and don't check inputs
In-Reply-To: <bug-2711-42@http.bugzilla.open-bio.org/>
Message-ID: <200901071433.n07EXLSn014755@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2711


------- Comment #25 from lpritc at scri.sari.ac.uk  2009-01-07 09:33 EST -------
(In reply to comment #17)

> 1) I do not understand the need for the dictionary of modules 'formatdict' in
> _write as it creates unnecessary inefficient code. The options need to be part
> of the check for the type of output.

The need is that input types are associated with alternative rendering
backends.  The distribution dictionary approach is highly-readable and readily
extendable to accept, for example, lowercase variants of format names that map
to the same backend - as in your point number 2.

I also don't understand your efficiency argument.  Firstly, this step is not
AFAIAA a bottleneck, and hardly a priority for optimisation; secondly I do not
believe that a distribution dictionary is less efficient than your suggestion. 
The dictionary achieves the same end in three lines of code, rather than ten
for the elif.  Also computationally, if the format name is 'TIF', your elif
code will always have to cycle through all output format name tests (four
conditionals, and an O(n) list search) in order to associate that format with
renderPM.  This is less efficient than a dictionary approach: retrieving values
from dictionaries takes approximately constant time. Not that if we ran profile
on the two approaches we'd see much of a difference, of course - this is not a
speed-critical step.

Also, and in my opinion, elifs are not as easy to maintain, or as readable, as
distribution dictionaries.
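
To illustrate what I mean, here is a rough sketch of the dictionary
approach.  Apart from 'TIF' mapping to renderPM, the format names and the
backend strings below are placeholders chosen for the example rather than
the actual GenomeDiagram table, and backend_for is a hypothetical helper:

```python
# Placeholder table: each output format name maps to the backend that
# renders it. Several names can share one backend.
FORMAT_BACKENDS = {
    "PS": "renderPS",
    "EPS": "renderPS",
    "PDF": "renderPDF",
    "SVG": "renderSVG",
    "PNG": "renderPM",
    "JPG": "renderPM",
    "TIF": "renderPM",
    "TIFF": "renderPM",
}


def backend_for(output):
    """Look up the backend for a format name, ignoring letter case."""
    try:
        # A single dictionary lookup replaces the chain of elif tests.
        return FORMAT_BACKENDS[output.upper()]
    except KeyError:
        raise ValueError("Unsupported output format: %r" % output)


print(backend_for("tif"))
```

Normalising with upper() is one way to accept lowercase names; adding
lowercase keys to the dictionary would work equally well.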

> 2) There is no indication that the output for write and write_to_string only
> accepts uppercase. Note the _write function states this but a user will not see
> these. I do not understand why lowercase is unacceptable. 

It's not unacceptable - at least, not to me - I just didn't write it to accept
lowercase, originally.  I've no objection to adding lowercase variants of the
format names to the distribution dictionary.

> 3) The check for renderPM at start is really redundant because _write checks
> for it (well sort of). It is also an unnecessary delay if renderPM is not used.

It's not a big speed hit (or is there contradictory data? it's certainly not a
speed worry for my work) and, if tested on import, needs only to be done once
when GenomeDiagram is imported.

> 4) There is no test for the presence of renderPM. The test function must check
> for renderPM and should at least provide a warning if not present. Otherwise
> this is a surprise to a user because not all options will be available.

Raising an error, or at least a warning, is a good idea.  I favour raising this
error on first import.

> 5) The installation documentation must also indicate that renderPM is optional
> and also how to install the renderPM module.

I'm still not convinced that this is all that big an issue: renderPM is part of
the source ReportLab 2.2 distribution, and the instructions on reportlab.org
are pretty clear.  However, for those users who have pathological
installations, a line pointing out that renderPM can be obtained via
reportlab.org is a good idea.




From bugzilla-daemon at portal.open-bio.org  Wed Jan  7 14:38:14 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 7 Jan 2009 09:38:14 -0500
Subject: [Biopython-dev] [Bug 2727] New: PDB.Bio: header should include
	CRYST1 information
Message-ID: <bug-2727-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2727

           Summary: PDB.Bio:  header should include CRYST1 information
           Product: Biopython
           Version: 1.49b
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: mok at bioxray.au.dk


The unit cell and spacegroup information should be available from PDBParser's
get_header() method.




From bugzilla-daemon at portal.open-bio.org  Wed Jan  7 14:40:52 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 7 Jan 2009 09:40:52 -0500
Subject: [Biopython-dev] [Bug 2727] PDB.Bio: header should include CRYST1
	information
In-Reply-To: <bug-2727-42@http.bugzilla.open-bio.org/>
Message-ID: <200901071440.n07EeqsZ015513@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2727


------- Comment #1 from mok at bioxray.au.dk  2009-01-07 09:40 EST -------
Created an attachment (id=1188)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1188&action=view)
Patch for parse_pdb_header.py

Attached patch will add three keys to the header dictionary: cell, spacegroup
and cell_z, giving access to this data gleaned from the CRYST1 record of a PDB
file.
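
For readers without the attachment, the following is an independent sketch
(not the attached patch) of how such values can be read from a CRYST1
record using the fixed column layout of the PDB format.  The function name
parse_cryst1 and the shape of the returned values are assumptions made for
this example only:

```python
def parse_cryst1(line):
    """Parse one CRYST1 record into cell, spacegroup and cell_z entries."""
    if not line.startswith("CRYST1"):
        raise ValueError("not a CRYST1 record")
    line = line.ljust(70)  # pad short records so the slices are safe
    # Columns 7-33: cell edges a, b, c (three 9-character fields).
    a, b, c = (float(line[i:i + 9]) for i in (6, 15, 24))
    # Columns 34-54: angles alpha, beta, gamma (three 7-character fields).
    alpha, beta, gamma = (float(line[i:i + 7]) for i in (33, 40, 47))
    spacegroup = line[55:66].strip()  # columns 56-66
    z_field = line[66:70].strip()     # columns 67-70
    return {
        "cell": (a, b, c, alpha, beta, gamma),
        "spacegroup": spacegroup,
        "cell_z": int(z_field) if z_field else None,
    }


# Build a made-up example record with the right column widths.
example = "CRYST1%9.3f%9.3f%9.3f%7.2f%7.2f%7.2f %-11s%4d" % (
    52.0, 58.6, 61.9, 90.0, 90.0, 90.0, "P 21 21 21", 8)
header = parse_cryst1(example)
print(header["spacegroup"], header["cell_z"])
```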




From bugzilla-daemon at portal.open-bio.org  Wed Jan  7 15:10:12 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 7 Jan 2009 10:10:12 -0500
Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and
	write_to_string() are inefficient and don't check inputs
In-Reply-To: <bug-2711-42@http.bugzilla.open-bio.org/>
Message-ID: <200901071510.n07FACPH017825@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2711


------- Comment #26 from bsouthey at gmail.com  2009-01-07 10:10 EST -------
(In reply to comment #24)
I had Reportlab version 2.1 installed but once I upgraded to version 2.2 I got
renderPM built. So anyone using reportlab version 2.2 will be happy; those
who are not will not be!

So please ensure that Reportlab version 2.2 (released 11 Sep 2008) or higher
is required. Otherwise you must check for renderPM, because most people probably
have an old version around without renderPM and most distributions (OpenSUSE
seems to be an exception if you look in the right place) don't have the 2.2
version yet.
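
A version requirement could be sketched roughly as follows.  This assumes
the installed package exposes its version string as reportlab.Version; the
helper names version_tuple and reportlab_is_recent are made up for the
example:

```python
def version_tuple(version):
    """Turn a dotted version string such as '2.2' into a tuple of ints."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # ignore suffixes such as 'b1'
            digits += ch
        parts.append(int(digits or 0))
    return tuple(parts)


def reportlab_is_recent(minimum="2.2"):
    """Return True if ReportLab is installed and at least `minimum`."""
    try:
        import reportlab
    except ImportError:
        return False
    # Assumption: the version string is available as reportlab.Version.
    installed = getattr(reportlab, "Version", "0")
    return version_tuple(installed) >= version_tuple(minimum)
```

Comparing tuples of integers rather than raw strings means that a
hypothetical version 2.10 would correctly sort after 2.2.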




From bugzilla-daemon at portal.open-bio.org  Wed Jan  7 15:52:52 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 7 Jan 2009 10:52:52 -0500
Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and
	write_to_string() are inefficient and don't check inputs
In-Reply-To: <bug-2711-42@http.bugzilla.open-bio.org/>
Message-ID: <200901071552.n07FqqcX021811@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2711


------- Comment #27 from bsouthey at gmail.com  2009-01-07 10:52 EST -------
(In reply to comment #25)
This is mainly a reportlab issue (API and version problem) and, as Peter
said, a style issue. So the only remaining issue is a unit test that at
least checks for the presence of renderPM, since it may be missing with
versions of reportlab older than 2.2.




From jae at lmi.net  Thu Jan  8 22:24:21 2009
From: jae at lmi.net (Jason Eshleman)
Date: Thu, 08 Jan 2009 14:24:21 -0800
Subject: [Biopython-dev] Structure and LDNe
In-Reply-To: <496397C9.3030706@gmail.com>
References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com>
	<496397C9.3030706@gmail.com>
Message-ID: <6.1.2.0.2.20090108141534.0323a5f0@mail.lmi.net>

Greetings all,

Presently, the code I have for dealing with STRUCTURE is similar to the
code for interacting with Clustal in that it does not modify any of the
STRUCTURE source code but merely initiates the compiled executable.

Initially, I have used my code in place of their Java front end as it 
allows for more control of the run-time variables for successive runs with 
varying run parameters.  At some point, I'd like to get it to interface 
more directly with the STRUCTURE code to be able to pipe results directly 
to python for parsing rather than working with the STRUCTURE text output 
but that's a ways off still.


-Jason


At 09:41 AM 1/6/2009, Bruce Southey wrote:
>Tiago Antão wrote:
>>Hi all,
>>
>>Jason Eshleman (he subscribes to this list also) has made available
>>code to interact with Structure (a widely used application in
>>population genetics - the 2 papers related to it have around 3000
>>citations acording to Google scholar). We will try to convert his code
>>to the Bio.PopGen namespace, create documentation and test cases.
>>To this adds the exsiting LDNe code (mine). This all should be ready
>>in a reasonably fast time frame (I suppose before the next release).
>>
>>The all important statistics part is still due, I am afraid (I don't
>>know if anybody has looked at the beta code on git). But at least this
>>LDNe and Structure code will be ready to go soon.
>>
>>Tiago
>>_______________________________________________
>>Biopython-dev mailing list
>>Biopython-dev at lists.open-bio.org
>>http://lists.open-bio.org/mailman/listinfo/biopython-dev
>>
>Hi,
>What are the licenses for LDNe and Structure?
>Saying just 'free' is insufficient because it is not clear in which 
>definition is being used.
>
>Also, please ensure that none of the code that is included into Biopython 
>is not a deriviative of LDNe and Structure unless these have explicit 
>license that is compatible with Biopython.  For example, 'copying' an 
>existing function into Python would be considered a derivative. Obviously 
>reading a documented output is probably not considered a derivative.
>
>I prefer to be proactive with licenses so these don't bite back like has 
>happened in some formally open sources projects or use of unclean code 
>sources. A current example of this is that the current release of scipy 
>0.7 has been significantly delayed due to some major effort to check 
>various functions that reference the Numerical Recipes book (which has an 
>incompatible license).
>
>Anyhow, this sounds good!
>
>Bruce
>_______________________________________________
>Biopython-dev mailing list
>Biopython-dev at lists.open-bio.org
>http://lists.open-bio.org/mailman/listinfo/biopython-dev


From bugzilla-daemon at portal.open-bio.org  Fri Jan  9 12:50:37 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 9 Jan 2009 07:50:37 -0500
Subject: [Biopython-dev] [Bug 2727] PDB.Bio: header should include CRYST1
	information
In-Reply-To: <bug-2727-42@http.bugzilla.open-bio.org/>
Message-ID: <200901091250.n09Cob1q021245@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2727


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-09 07:50 EST -------
Hopefully Bio.PDB's owner/maintainer Thomas Hamelryck can comment on this.

In the meantime, the code style seems to fit fine with the rest of
parse_pdb_header.py which is good.  However, you have not updated the
parse_pdb_header function's docstring to include the new keys.  Furthermore, it
would be nice to have the docstring describe the meaning of the cell, cell_z
and spacegroup entries you have introduced.  I'm also curious about the default
values and their meanings.

Peter




From rhythmbox-devel at maubp.freeserve.co.uk  Fri Jan  9 12:55:13 2009
From: rhythmbox-devel at maubp.freeserve.co.uk (Peter)
Date: Fri, 9 Jan 2009 12:55:13 +0000
Subject: [Biopython-dev] Structure and LDNe
In-Reply-To: <6d941f120901070354o70b6c99ah37ffdb38a1af7554@mail.gmail.com>
References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com>
	<320fb6e00901060201s3830c972w4638f5eefcd42b6a@mail.gmail.com>
	<6d941f120901060852r482baf16m6b8399959b3c1aaa@mail.gmail.com>
	<5aa3b3570901070237s487a4307hb68fa69abc3cb23d@mail.gmail.com>
	<6d941f120901070354o70b6c99ah37ffdb38a1af7554@mail.gmail.com>
Message-ID: <320fb6e00901090455y334ecebdo55cd7c1a718ab499@mail.gmail.com>

On Wed, Jan 7, 2009 at 11:54 AM, Tiago Antão <tiagoantao at gmail.com> wrote:
>
> Considering that CVS has no development branch I think having git is
> very good. I would just recommend extreme care with changing existing
> code. When merging back into CVS, changes to existing code might not
> go in (especially if they change interfaces) or be delayed.
>

If there is a strong interest in having experimental branches in the
official Biopython repository, we could discuss that as an option.
Although I would prefer we get moved from CVS to SVN first before
actually doing this, in order to keep the migration as simple as
possible.

Peter


From biopython at maubp.freeserve.co.uk  Fri Jan  9 12:59:00 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 9 Jan 2009 12:59:00 +0000
Subject: [Biopython-dev] Structure and LDNe
In-Reply-To: <6.1.2.0.2.20090108141534.0323a5f0@mail.lmi.net>
References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com>
	<496397C9.3030706@gmail.com>
	<6.1.2.0.2.20090108141534.0323a5f0@mail.lmi.net>
Message-ID: <320fb6e00901090459x52976449gddcc4483699e0f56@mail.gmail.com>

On Thu, Jan 8, 2009 at 10:24 PM, Jason Eshleman <jae at lmi.net> wrote:
> Greetings all,
>
> Presently, the code I have for dealing with STRUCTURE is similar to the code
> for interacting with Clustal, in that it does not modify any of the STRUCTURE
> source code by merely initiates the compiled executable.

Biopython has code for interacting with lots of command line tools,
and this neatly avoids any copyright/licence questions about being a
derived work.
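
As a rough illustration of the wrapper approach (a sketch only, not Jason's
code): run_tool is a hypothetical helper, and the Python interpreter stands
in for the external executable so that the example is self-contained.  A
real wrapper would pass the path to the compiled program and its own
command line options instead.

```python
import subprocess
import sys


def run_tool(executable, args):
    """Run an external program and return its standard output as text."""
    result = subprocess.run(
        [executable] + list(args),
        capture_output=True,  # collect stdout and stderr
        text=True,            # decode the output to str
        check=True,           # raise CalledProcessError on non-zero exit
    )
    return result.stdout


# Stand-in for the real tool: ask the interpreter to print one line.
output = run_tool(sys.executable, ["-c", "print('K=3')"])
print(output.strip())
```

The wrapper only starts the program and reads what it prints, so none of
the tool's own source code is copied or modified.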

> Initially, I have used my code in place of their Java front end as it allows
> for more control of the run-time variables for successive runs with varying
> run parameters.  At some point, I'd like to get it to interface more
> directly with the STRUCTURE code to be able to pipe results directly to
> python for parsing rather than working with the STRUCTURE text output but
> that's a ways off still.

I'm not quite clear what you have in mind, but this would probably
need a little more thought from the legal perspective.  If STRUCTURE
provides an API with header files you can compile against, that should
be OK (but I am not a lawyer).  Note that doing this within Biopython
would then mean adding another build time dependency, which would need
to be justified in terms of the benefits it brings.

Peter


From bsouthey at gmail.com  Fri Jan  9 14:46:15 2009
From: bsouthey at gmail.com (Bruce Southey)
Date: Fri, 09 Jan 2009 08:46:15 -0600
Subject: [Biopython-dev] Structure and LDNe
In-Reply-To: <320fb6e00901090455y334ecebdo55cd7c1a718ab499@mail.gmail.com>
References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com>	<320fb6e00901060201s3830c972w4638f5eefcd42b6a@mail.gmail.com>	<6d941f120901060852r482baf16m6b8399959b3c1aaa@mail.gmail.com>	<5aa3b3570901070237s487a4307hb68fa69abc3cb23d@mail.gmail.com>	<6d941f120901070354o70b6c99ah37ffdb38a1af7554@mail.gmail.com>
	<320fb6e00901090455y334ecebdo55cd7c1a718ab499@mail.gmail.com>
Message-ID: <49676337.7050504@gmail.com>

Peter wrote:
> On Wed, Jan 7, 2009 at 11:54 AM, Tiago Antão <tiagoantao at gmail.com> wrote:
>   
>> Considering that CVS has no development branch I think having git is
>> very good. I would just recommend extreme care with changing existing
>> code. When merging back into CVS, changes to existing code might not
>> go in (especially if they change interfaces) or be delayed.
>>
>>     
>
> If there is a strong interest in having experimental branches in the
> official Biopython repository, we could discuss that as an option.
> Although I would prefer we get moved from CVS to SVN first before
> actually doing this, in order to keep the migration as simple as
> possible.
>
> Peter
>
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>   

I agree that it is essential to move from CVS before doing this, but that
does not prevent any discussion.

So I'll start a thread.

Bruce


From bugzilla-daemon at portal.open-bio.org  Fri Jan  9 15:59:40 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 9 Jan 2009 10:59:40 -0500
Subject: [Biopython-dev] [Bug 2729] New: Importing Bio.SeqUtils before
	importing pylab gives a "Bus Error"
Message-ID: <bug-2729-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2729

           Summary: Importing Bio.SeqUtils before importing pylab gives a
                    "Bus Error"
           Product: Biopython
           Version: 1.49
          Platform: Macintosh
        OS/Version: Mac OS
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: stephan_schiffels at mac.com


I use the newest cvs version of biopython (2009 Jan 09) and matplotlib 0.90.0
The following two lines crash:

import Bio.SeqUtils
import pylab

I nailed down the problem to lines 122 through 125 in Bio/SeqUtils/__init__.py.
Commenting out these four lines SOLVES the bug for me, since I don't use the
graphics-functions in the SeqUtils package

Best,
Stephan




From bsouthey at gmail.com  Fri Jan  9 16:18:26 2009
From: bsouthey at gmail.com (Bruce Southey)
Date: Fri, 09 Jan 2009 10:18:26 -0600
Subject: [Biopython-dev] Developmental and experimental branches
In-Reply-To: <320fb6e00901090455y334ecebdo55cd7c1a718ab499@mail.gmail.com>
References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com>	<320fb6e00901060201s3830c972w4638f5eefcd42b6a@mail.gmail.com>	<6d941f120901060852r482baf16m6b8399959b3c1aaa@mail.gmail.com>	<5aa3b3570901070237s487a4307hb68fa69abc3cb23d@mail.gmail.com>	<6d941f120901070354o70b6c99ah37ffdb38a1af7554@mail.gmail.com>
	<320fb6e00901090455y334ecebdo55cd7c1a718ab499@mail.gmail.com>
Message-ID: <496778D2.1050801@gmail.com>

Hi,
In a previous thread (and indicated in others) it was suggested that
perhaps Biopython needs some type of development or experimental
branch. So this thread is intended to provide some discussion on this,
and assumes that Biopython has moved to SVN. I think it is a very
relevant discussion because Biopython needs an effective approach
mainly to handle new code but also to handle significant rewrites of
older code.

The most important question is do you support creating developmental and 
experimental branches or not?

However, I do not think that this is a yes or no answer and I am not 
concerned about the question at the present time.  Rather I am concerned 
about the burden placed on the maintainers (especially Peter and 
Michiel), the expression of the developer needs and how this impacts the
community. I am rather neutral on it (probably because I have not 
contributed any major code to Biopython) but I would like to ensure that 
the discussion leads to positive changes.

I find Biopython interesting and special for various reasons. There is a 
solid core of functions that are common to many aspects of 
bioinformatics. But it also contains very specialized code that has a 
much smaller audience. Consequently certain parts get considerable 
exposure and other parts get limited or no exposure. This means that it 
may be necessary to release beta versions in order to get the necessary 
exposure as I assume that code has had sufficient development to be 
released in the first place. Creating developmental and experimental 
branches is one way to get this exposure but perhaps branches are not 
necessary.

An alternative approach is creating specialized projects within 
Biopython that can be used for development and testing. For example, 
Scipy provides SciKits that are related code that is typically special 
purpose or is released under a different license than scipy/numpy. This 
replaced the sandboxes that existed in prior versions of numpy and 
scipy. But a recent problem that arose in numpy was how to get code from
such a location into numpy: creating an experimental section in the main
distribution was suggested, but that met some strong resistance.

Therefore, I see the following issues that need to be addressed 
regardless of the approach taken:

0) Must be easy for project maintenance and release as this must not 
create an extra burden to Biopython!
1) Ensure adequate testing is performed especially to get it out to the 
appropriate audience and to correct the code and APIs. I consider this 
rather important because I tend to follow a type of user experience 
design (http://en.wikipedia.org/wiki/User_experience_design) and 
software prototyping (http://en.wikipedia.org/wiki/Software_prototyping) 
for software development.
2) Stabilization of APIs for backwards compatibility as we don't want to 
change these with each Biopython release.
3) Adequate test coverage especially across platforms and different 
software versions. For example Windows paths and older software versions 
can cause problems on other people's machines but not yours.
4) Some type of code review even if it is just to ensure a consistent 
format (like spaces versus tabs) or compatibility across Python versions 
and platforms.
5) If developmental or experimental branches are used then how does the 
code move into the main distribution and how are these branches created 
and destroyed?
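
As an illustration of the kind of automated format check mentioned in
item 4 (spaces versus tabs), here is a minimal sketch. The helper name
find_tab_indented_lines is hypothetical and is not part of Biopython;
it only flags lines whose leading whitespace contains a tab.

```python
def find_tab_indented_lines(source):
    """Return 1-based line numbers whose leading whitespace contains a tab."""
    flagged = []
    for number, line in enumerate(source.splitlines(), start=1):
        # Everything before the first non-whitespace character is the indent.
        indent = line[:len(line) - len(line.lstrip())]
        if "\t" in indent:
            flagged.append(number)
    return flagged


# One function indented with a tab, one with four spaces.
example = "def f():\n\treturn 1\n\ndef g():\n    return 2\n"
print(find_tab_indented_lines(example))
```

A check like this could be run over a contribution before review, so
that the reviewer only has to look at the lines it reports.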

Please add other issues.

I would appreciate these issues being addressed when appropriate.

Regards
Bruce

Peter wrote:
> On Wed, Jan 7, 2009 at 11:54 AM, Tiago Antão <tiagoantao at gmail.com> wrote:
>   
>> Considering that CVS has no development branch I think having git is
>> very good. I would just recommend extreme care with changing existing
>> code. When merging back into CVS, changes to existing code might not
>> go in (especially if they change interfaces) or be delayed.
>>
>>     
>
> If there is a strong interest in having experimental branches in the
> official Biopython repository, we could discuss that as an option.
> Although I would prefer we get moved from CVS to SVN first before
> actually doing this, in order to keep the migration as simple as
> possible.
>
> Peter
>
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>   


From bugzilla-daemon at portal.open-bio.org  Fri Jan  9 16:27:08 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 9 Jan 2009 11:27:08 -0500
Subject: [Biopython-dev] [Bug 2729] Importing Bio.SeqUtils before importing
	pylab gives a "Bus Error"
In-Reply-To: <bug-2729-42@http.bugzilla.open-bio.org/>
Message-ID: <200901091627.n09GR88l003529@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2729


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-09 11:27 EST -------
i.e. these lines?

try:
    from Tkinter import *
except ImportError:
    pass

What happens with just "import Tkinter" on your machine?

Are you using the default Apple installed copy of python?

I can see why this might cause trouble if Tkinter does some initialisation at
import time.  Could you include the actual crash/traceback error please?

Note I see no crash on my MacOS machine (not sure which version of pylab) which
has Tkinter.  Nor do I see a crash on one of my Linux machines (again, not sure
which pylab) which does NOT have Tkinter.




From bugzilla-daemon at portal.open-bio.org  Fri Jan  9 16:33:59 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 9 Jan 2009 11:33:59 -0500
Subject: [Biopython-dev] [Bug 2729] Importing Bio.SeqUtils before importing
	pylab gives a "Bus Error"
In-Reply-To: <bug-2729-42@http.bugzilla.open-bio.org/>
Message-ID: <200901091633.n09GXxDS004117@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2729


------- Comment #2 from mdehoon at ims.u-tokyo.ac.jp  2009-01-09 11:33 EST -------
(In reply to comment #0)
> I use the newest cvs version of biopython (2009 Jan 09) and matplotlib 0.90.0
> The following two lines crash:
> 
> import Bio.SeqUtils
> import pylab
> 
What do you mean by crash?
Also, do you get the same problem with the latest matplotlib (0.98.4 I
believe)?
If

try:
    from Tkinter import *
except ImportError:
    pass
import pylab

crashes, then this is not a Biopython bug.
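
One way to run this kind of check without losing the interpreter to a
hard crash is to try each import order in a fresh subprocess. This is
only a sketch: the helper name try_import_order is made up for
illustration, and standard library modules stand in for Tkinter and
pylab so that the example runs anywhere.

```python
import subprocess
import sys


def try_import_order(modules):
    """Import the modules in order in a fresh interpreter; return its exit code."""
    code = "; ".join("import " + name for name in modules)
    result = subprocess.run(
        [sys.executable, "-c", code],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    # 0 means every import succeeded. On POSIX systems a negative value
    # means the child was killed by a signal (a bus error is SIGBUS).
    return result.returncode


# Stand-in modules; replace with the pair under suspicion.
for order in (["json", "csv"], ["csv", "json"]):
    print(order, try_import_order(order))
```

If one order gives a non-zero exit code without Biopython being
imported at all, the problem lies outside Biopython.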




From bugzilla-daemon at portal.open-bio.org  Fri Jan  9 16:45:52 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 9 Jan 2009 11:45:52 -0500
Subject: [Biopython-dev] [Bug 2729] Importing Bio.SeqUtils before importing
	pylab gives a "Bus Error"
In-Reply-To: <bug-2729-42@http.bugzilla.open-bio.org/>
Message-ID: <200901091645.n09GjqFV004905@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2729


------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-09 11:45 EST -------
Created an attachment (id=1189)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1189&action=view)
Patch to Bio/SeqUtils/__init__.py to move the Tkinter imports

This patch moves the Tkinter import back into the xGC_skew function as
suggested by the old comments in the code, and uses an explicit import list
instead of "import *".  For the history of this bit of code, see the deleted
file Bio/sequtils.py in CVS.

I think this is a worthwhile little bit of clean up - but it probably won't have
any effect on Stephan's issue with Tkinter/pylab.
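
For readers without the attachment, the general pattern (importing
inside the function, with an explicit list of names) looks roughly like
the sketch below. This is not the actual patch: the function name and
the chosen widget names are placeholders, and the module is spelled
"tkinter" here as in Python 3 ("Tkinter" on Python 2).

```python
def load_gui_names():
    """Import the GUI toolkit only when called, and only the names needed."""
    try:
        # Explicit names instead of "from ... import *".
        from tkinter import Tk, Canvas, Scrollbar
    except ImportError:
        # No GUI toolkit installed; code that never calls this is unaffected.
        return None
    return Tk, Canvas, Scrollbar


names = load_gui_names()
print("GUI toolkit available:", names is not None)
```

With this layout, merely importing the enclosing module never touches
the GUI toolkit; only a call to the GUI function does.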




From bugzilla-daemon at portal.open-bio.org  Fri Jan  9 16:53:23 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 9 Jan 2009 11:53:23 -0500
Subject: [Biopython-dev] [Bug 2729] Importing Bio.SeqUtils before importing
	pylab gives a "Bus Error"
In-Reply-To: <bug-2729-42@http.bugzilla.open-bio.org/>
Message-ID: <200901091653.n09GrN6W005481@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2729


------- Comment #4 from stephan_schiffels at mac.com  2009-01-09 11:53 EST -------
Hi,
importing Tkinter works fine. Only calling import pylab after it crashes... (no
traceback... just "bus error").
Here is the shell-output:

mac14:~ stschiff$ python
Python 2.5 (r25:51908, Apr 19 2007, 16:49:06) 
[GCC 4.0.1 (Apple Computer, Inc. build 5250)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import Tkinter
>>> import pylab
Bus error
mac14:~ stschiff$ 

The weirdest thing is that calling the other way around works fine:

mac14:~ stschiff$ python
Python 2.5 (r25:51908, Apr 19 2007, 16:49:06) 
[GCC 4.0.1 (Apple Computer, Inc. build 5250)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pylab
>>> import Tkinter
>>> 

The same holds for first calling pylab and then Bio.SeqUtils...

I don't know, it could be that this is just a pathological case on my specific
setup. It's still weird though, since matplotlib uses GTK on X11 on my machine,
not Tkinter... I don't get it.

Maybe this is not a biopython bug after all... sorry and thanks anyway for your
concern

Stephan
(In reply to comment #1)
> i.e. these lines?
> 
> try:
>     from Tkinter import *
> except ImportError:
>     pass
> 
> What happens with just "import Tkinter" on your machine?
> 
> Are you using the default Apple installed copy of python?
> 
> I can see why this might cause trouble if Tkinter does some initialisation at
> import time.  Could you include the actual crash/traceback error please?
> 
> Note I see no crash on my MacOS machine (not sure which version of pylab) which
> has Tkinter.  Nor do I see a crash on one of my linux machines (again, not sure
> which pylab) which does NOT have TKinter.
> 




From bugzilla-daemon at portal.open-bio.org  Fri Jan  9 17:10:10 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 9 Jan 2009 12:10:10 -0500
Subject: [Biopython-dev] [Bug 2729] Importing Bio.SeqUtils before importing
	pylab gives a "Bus Error"
In-Reply-To: <bug-2729-42@http.bugzilla.open-bio.org/>
Message-ID: <200901091710.n09HAA5c006886@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2729


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |INVALID


------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-09 12:10 EST -------
(In reply to comment #4)
> Hi,
> importing Tkinter works fine. Only calling import pylab after it crashes...
> (no traceback... just "bus error").

You could try going to Applications, Utilities, Console on your Mac to look for
any error log associated with the bus error.

> Here is the shell-output:
> 
> mac14:~ stschiff$ python
> Python 2.5 (r25:51908, Apr 19 2007, 16:49:06) 
> [GCC 4.0.1 (Apple Computer, Inc. build 5250)] on darwin
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import Tkinter
> >>> import pylab
> Bus error
> mac14:~ stschiff$ 

OK - that does seem to confirm that it's a bug with pylab, and therefore isn't
Biopython's fault.  I'm going to close this bug.

I would suggest you update your installation of pylab, and if it still goes
wrong, file a bug with pylab.

Thanks anyway,

Peter




From bugzilla-daemon at portal.open-bio.org  Fri Jan  9 17:10:52 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 9 Jan 2009 12:10:52 -0500
Subject: [Biopython-dev] [Bug 2729] Importing Bio.SeqUtils before importing
	pylab gives a "Bus Error"
In-Reply-To: <bug-2729-42@http.bugzilla.open-bio.org/>
Message-ID: <200901091710.n09HAqh1006971@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2729


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1189 is|0                           |1
           obsolete|                            |


------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-09 12:10 EST -------
(From update of attachment 1189)
This didn't turn out to be related to Bug 2729 after all.

However, I've checked it in anyway.




From dalloliogm at gmail.com  Fri Jan  9 17:17:53 2009
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Fri, 9 Jan 2009 18:17:53 +0100
Subject: [Biopython-dev] Developmental and experimental branches
In-Reply-To: <496778D2.1050801@gmail.com>
References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com>
	<320fb6e00901060201s3830c972w4638f5eefcd42b6a@mail.gmail.com>
	<6d941f120901060852r482baf16m6b8399959b3c1aaa@mail.gmail.com>
	<5aa3b3570901070237s487a4307hb68fa69abc3cb23d@mail.gmail.com>
	<6d941f120901070354o70b6c99ah37ffdb38a1af7554@mail.gmail.com>
	<320fb6e00901090455y334ecebdo55cd7c1a718ab499@mail.gmail.com>
	<496778D2.1050801@gmail.com>
Message-ID: <5aa3b3570901090917m7c82fb17hb4c93235702b120b@mail.gmail.com>

On Fri, Jan 9, 2009 at 5:18 PM, Bruce Southey <bsouthey at gmail.com> wrote:
> Hi,
> In a previous thread (and indicated in others) it was suggested that perhaps
> Biopython needs some type of development  or experimental branch. So this
> thread is orientated to provide some discussion on this and considers that
> Biopython has moved to SVN.

Maybe you can consider the approach at the basis of git, in which
every developer works on their own personal branch, and the owner of the
'official branch' can decide whether or not to accept the changes made
in the individual branches.

If you want to play a bit with it, you can use my repository at github:
- http://github.com/dalloliogm/biopython---popgen/commits/master
and then create a fork from it.
I am sorry that you will have to create an account on github.. but I
don't know of any other free hosting service for git repositories.

Git also has other advantages over svn, like working locally (which
is done by creating a local branch internally) and being faster (this
is what they say).
Well, I am not a git guru, but I can suggest some good videos,
like this one:
- http://excess.org/article/2008/07/ogre-git-tutorial/


> I think it is very relevant discussion because
> Biopython needs an effective approach to mainly handle new code but also
> handle significant rewrites of older code.
>
> The most important question is do you support creating developmental and
> experimental branches or not?
>
> Please add other issues.
>
> I would appreciate these issues being addressed when appropriate.
>
> Regards
> Bruce
>
> Peter wrote:
>>
>> On Wed, Jan 7, 2009 at 11:54 AM, Tiago Antão <tiagoantao at gmail.com> wrote:
>>
>>>
>>> Considering that CVS has no development branch I think having git is
>>> very good. I would just recommend extreme care with changing existing
>>> code. When merging back into CVS, changes to existing code might not
>>> go in (especially if they change interfaces) or be delayed.
>>>
>>>
>>
>> If there is a strong interest in having experimental branches in the
>> official Biopython repository, we could discuss that as an option.
>> Although I would prefer we get moved from CVS to SVN first before
>> actually doing this, in order to keep the migration as simple as
>> possible.
>>
>> Peter
>>


-- 

My blog on bioinformatics (now in English): http://bioinfoblog.it


From biopython at maubp.freeserve.co.uk  Fri Jan  9 17:28:06 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 9 Jan 2009 17:28:06 +0000
Subject: [Biopython-dev] Developmental and experimental branches
In-Reply-To: <5aa3b3570901090917m7c82fb17hb4c93235702b120b@mail.gmail.com>
References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com>
	<320fb6e00901060201s3830c972w4638f5eefcd42b6a@mail.gmail.com>
	<6d941f120901060852r482baf16m6b8399959b3c1aaa@mail.gmail.com>
	<5aa3b3570901070237s487a4307hb68fa69abc3cb23d@mail.gmail.com>
	<6d941f120901070354o70b6c99ah37ffdb38a1af7554@mail.gmail.com>
	<320fb6e00901090455y334ecebdo55cd7c1a718ab499@mail.gmail.com>
	<496778D2.1050801@gmail.com>
	<5aa3b3570901090917m7c82fb17hb4c93235702b120b@mail.gmail.com>
Message-ID: <320fb6e00901090928u662023d6rd6f2d82c5fbb7365@mail.gmail.com>

On Fri, Jan 9, 2009 at 5:17 PM, Giovanni Marco Dall'Olio
<dalloliogm at gmail.com> wrote:
> On Fri, Jan 9, 2009 at 5:18 PM, Bruce Southey <bsouthey at gmail.com> wrote:
>> Hi,
>> In a previous thread (and indicated in others) it was suggested that perhaps
>> Biopython needs some type of development  or experimental branch. So this
>> thread is orientated to provide some discussion on this and considers that
>> Biopython has moved to SVN.
>
> Maybe you can consider the approach at the basis of git, in which
> every developer works on its personal branch, and the owner of the
> 'official branch' can decide whether to accept the changes apported by
> the single branches or not.

In some ways this describes the current situation but without the
software: The CVS/SVN repository is the master official branch which
we (as a group) try and keep pretty stable.  When working on new
modules, individual developers or contributors have hacked away on
their own machines (perhaps using a local repository - I tended to
just save versioned snapshots of work in progress), and commit things
to the master once it was sufficiently stable to be approved.  For
self contained modules, this works OK - although using something like
git would be a bit more formalised and automated, and allow this kind
of "work in progress" to be done openly.

Peter


From dalloliogm at gmail.com  Fri Jan  9 17:43:26 2009
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Fri, 9 Jan 2009 18:43:26 +0100
Subject: [Biopython-dev] Developmental and experimental branches
In-Reply-To: <320fb6e00901090928u662023d6rd6f2d82c5fbb7365@mail.gmail.com>
References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com>
	<320fb6e00901060201s3830c972w4638f5eefcd42b6a@mail.gmail.com>
	<6d941f120901060852r482baf16m6b8399959b3c1aaa@mail.gmail.com>
	<5aa3b3570901070237s487a4307hb68fa69abc3cb23d@mail.gmail.com>
	<6d941f120901070354o70b6c99ah37ffdb38a1af7554@mail.gmail.com>
	<320fb6e00901090455y334ecebdo55cd7c1a718ab499@mail.gmail.com>
	<496778D2.1050801@gmail.com>
	<5aa3b3570901090917m7c82fb17hb4c93235702b120b@mail.gmail.com>
	<320fb6e00901090928u662023d6rd6f2d82c5fbb7365@mail.gmail.com>
Message-ID: <5aa3b3570901090943t37b14a4gfd7228eb747f2866@mail.gmail.com>

On Fri, Jan 9, 2009 at 6:28 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> On Fri, Jan 9, 2009 at 5:17 PM, Giovanni Marco Dall'Olio
> <dalloliogm at gmail.com> wrote:
>> On Fri, Jan 9, 2009 at 5:18 PM, Bruce Southey <bsouthey at gmail.com> wrote:
>>> Hi,
>>> In a previous thread (and indicated in others) it was suggested that perhaps
>>> Biopython needs some type of development  or experimental branch. So this
>>> thread is orientated to provide some discussion on this and considers that
>>> Biopython has moved to SVN.
>>
>> Maybe you can consider the approach at the basis of git, in which
>> every developer works on its personal branch, and the owner of the
>> 'official branch' can decide whether to accept the changes apported by
>> the single branches or not.
>
> In some ways this describes the current situation but without the
> software: The CVS/SVN repository is the master official branch which
> we (as a group) try and keep pretty stable.  When working on new
> modules, individual developers or contributors have hacked away on
> their own machines (perhaps using a local repository - I tended to
> just save versioned snapshots of work in progress), and commit things
> to the master once it was sufficiently stable to be approved.  For
> self contained modules, this works OK - although using something like
> git would be a bit more formalised and automated, and allow this kind
> of "work in progress" to be done openly.

Just a note: since I was trying to simplify the concept, I said
something which is not strictly correct.
In git, you do not need to have a central repository. Everyone has
their own personal branch and there is no such thing as an 'official
branch', unless it is defined by convention.

For example, look at this graph:
- http://github.com/blog/39-say-hello-to-the-network-graph-visualizer
On March 6th someone created a fork to work on MySQL support,
which has not been merged into the official branch yet.

There are many other forks, too: which one is the official?
The answer is none of them, but if the authors wanted, they could have
created a repository and decided that it was the official one, and
kept it up to date.


>
> Peter
>


-- 

My blog on bioinformatics (now in English): http://bioinfoblog.it


From biopython at maubp.freeserve.co.uk  Fri Jan  9 17:49:43 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 9 Jan 2009 17:49:43 +0000
Subject: [Biopython-dev] Developmental and experimental branches
In-Reply-To: <5aa3b3570901090943t37b14a4gfd7228eb747f2866@mail.gmail.com>
References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com>
	<320fb6e00901060201s3830c972w4638f5eefcd42b6a@mail.gmail.com>
	<6d941f120901060852r482baf16m6b8399959b3c1aaa@mail.gmail.com>
	<5aa3b3570901070237s487a4307hb68fa69abc3cb23d@mail.gmail.com>
	<6d941f120901070354o70b6c99ah37ffdb38a1af7554@mail.gmail.com>
	<320fb6e00901090455y334ecebdo55cd7c1a718ab499@mail.gmail.com>
	<496778D2.1050801@gmail.com>
	<5aa3b3570901090917m7c82fb17hb4c93235702b120b@mail.gmail.com>
	<320fb6e00901090928u662023d6rd6f2d82c5fbb7365@mail.gmail.com>
	<5aa3b3570901090943t37b14a4gfd7228eb747f2866@mail.gmail.com>
Message-ID: <320fb6e00901090949v695333ak2615e9c217bc1387@mail.gmail.com>

> just a note: since I was trying to simplify the concept, I said
> something which is not particularly correct.
> In git, you are not needed to have a central repository. Everyone has
> its personal branch and there is not such thing as an 'official
> branch', unless it is defined by convention.

If we did want to adopt a git style approach, I do think we need an
official branch which would be used for the releases and installers
hosted on biopython.org, and this branch would be managed in much the
same way as we do now with CVS/SVN.

I think this would be essential for avoiding confusion in the typical end user.

Peter


From bartek at rezolwenta.eu.org  Fri Jan  9 18:17:09 2009
From: bartek at rezolwenta.eu.org (Bartek Wilczynski)
Date: Fri, 9 Jan 2009 19:17:09 +0100
Subject: [Biopython-dev] Developmental and experimental branches
In-Reply-To: <320fb6e00901090928u662023d6rd6f2d82c5fbb7365@mail.gmail.com>
References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com>
	<320fb6e00901060201s3830c972w4638f5eefcd42b6a@mail.gmail.com>
	<6d941f120901060852r482baf16m6b8399959b3c1aaa@mail.gmail.com>
	<5aa3b3570901070237s487a4307hb68fa69abc3cb23d@mail.gmail.com>
	<6d941f120901070354o70b6c99ah37ffdb38a1af7554@mail.gmail.com>
	<320fb6e00901090455y334ecebdo55cd7c1a718ab499@mail.gmail.com>
	<496778D2.1050801@gmail.com>
	<5aa3b3570901090917m7c82fb17hb4c93235702b120b@mail.gmail.com>
	<320fb6e00901090928u662023d6rd6f2d82c5fbb7365@mail.gmail.com>
Message-ID: <8b34ec180901091017o172e4acemf91c0a46a736bdb1@mail.gmail.com>

On Fri, Jan 9, 2009 at 6:28 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> On Fri, Jan 9, 2009 at 5:17 PM, Giovanni Marco Dall'Olio
> <dalloliogm at gmail.com> wrote:
>> On Fri, Jan 9, 2009 at 5:18 PM, Bruce Southey <bsouthey at gmail.com> wrote:
>>> Hi,
>>> In a previous thread (and indicated in others) it was suggested that perhaps
>>> Biopython needs some type of development  or experimental branch. So this
>>> thread is orientated to provide some discussion on this and considers that
>>> Biopython has moved to SVN.
>>
>> Maybe you can consider the approach at the basis of git, in which
>> every developer works on its personal branch, and the owner of the
>> 'official branch' can decide whether to accept the changes apported by
>> the single branches or not.
>
> In some ways this describes the current situation but without the
> software: The CVS/SVN repository is the master official branch which
> we (as a group) try and keep pretty stable.  When working on new
> modules, individual developers or contributors have hacked away on
> their own machines (perhaps using a local repository - I tended to
> just save versioned snapshots of work in progress), and commit things
> to the master once it was sufficiently stable to be approved.  For
> self contained modules, this works OK - although using something like
> git would be a bit more formalised and automated, and allow this kind
> of "work in progress" to be done openly.
>

It can be viewed this way, but the point here is that making this
change to the process of development might decrease the amount of work
required to join the development. Especially if you think about adding
a new library to biopython, the most sensible way to do it is to branch
and then stabilize. I've recently experienced (with Bio.Motif) that it
can be tedious even for a very simple task. Also, with a distributed
version control system it is very easy for a small team of people to
collaborate on a branch before merging back to the main repository. In
the current mode this would be really difficult. Another benefit is
that you do not lose the history of changes made "on a branch".
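
To make the "branch, stabilize, merge back with history" idea concrete,
here is a rough sketch that drives the git command line from Python in
a throwaway directory. It assumes git is installed and does nothing if
it is not; the branch name "experimental", the file names and the commit
messages are made up for the example.

```python
import os
import shutil
import subprocess
import tempfile


def git(repo, *args):
    """Run one git command inside repo and return its standard output."""
    cmd = ["git", "-c", "user.name=Example", "-c", "user.email=example@example.org",
           "-c", "commit.gpgsign=false"] + list(args)
    done = subprocess.run(cmd, cwd=repo, check=True, capture_output=True, text=True)
    return done.stdout.strip()


def commit_file(repo, name, message):
    """Create a small file in repo and commit it."""
    with open(os.path.join(repo, name), "w") as handle:
        handle.write(message + "\n")
    git(repo, "add", name)
    git(repo, "commit", "-q", "-m", message)


def demo_branch_and_merge():
    """Branch, commit on the branch, merge back; return the commit subjects."""
    if shutil.which("git") is None:
        return None  # git is not installed, nothing to demonstrate
    repo = tempfile.mkdtemp()
    try:
        git(repo, "init", "-q")
        main = git(repo, "symbolic-ref", "--short", "HEAD")  # default branch name
        commit_file(repo, "stable.txt", "Stable code")
        git(repo, "checkout", "-q", "-b", "experimental")
        commit_file(repo, "new_module.txt", "Experimental module")
        git(repo, "checkout", "-q", main)
        # --no-ff records a merge commit, so the branch stays visible in history.
        git(repo, "merge", "-q", "--no-ff", "-m", "Merge experimental work",
            "experimental")
        return git(repo, "log", "--format=%s")
    finally:
        shutil.rmtree(repo, ignore_errors=True)


print(demo_branch_and_merge())
```

The log printed at the end contains the commit made on the branch as
well as the merge commit, which is the "history is not lost" point.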

As for github, it is currently used by the BioRuby project hosted on
open-bio.org. We can try to talk to them and ask about their
experiences. I'm not personally involved in it in any way, but it seems
that they've basically moved the main branch to github and update the
cvs repository only occasionally.

I think that for biopython, if we decided to use distributed version
control, it would be better to use bazaar+launchpad instead of
git+github, for the following reasons:
- it's completely free, as opposed to the <300 MB free account on
  github
- launchpad could make the transition very easy. They provide a service
  for importing existing open source projects into launchpad:
  https://help.launchpad.net/VcsImports
  They would convert the trunk to bazaar for us and set it up to update
  from the cvs every 6-12 hours. It would then be easy to see whether we
  like it or not
- bazaar is specifically aimed at being more user friendly than git,
  and allows developers to keep working in a familiar environment when
  moving from cvs or svn. I think this is important, since git itself is
  really different from cvs, and if we switch to anything else everybody
  needs to learn the tool
- they use openID, which makes it simpler for people to join (even
  though you still need another account)
- both bazaar and launchpad are developed in python, so they're more
  python oriented (while github is developed in ruby, so it is a better
  choice for bioruby).

More on comparing these two possibilities (from the bazaar developers'
non-objective point of view):
http://bazaar-vcs.org/BzrVsGit

These are my 2 cents on the choice of tools for development, but I have
to admit that I'm not sure whether it is needed for biopython now. I'm
very open to discussion.

-- 
Bartek Wilczynski
==================
Postdoctoral fellow
EMBL, Furlong group
Meyerhoffstrasse 1,
69012 Heidelberg,
Germany
tel: +49 6221 387 8433


From chapmanb at 50mail.com  Fri Jan  9 22:51:55 2009
From: chapmanb at 50mail.com (Brad Chapman)
Date: Fri, 9 Jan 2009 17:51:55 -0500
Subject: [Biopython-dev] Developmental and experimental branches
In-Reply-To: <8b34ec180901091017o172e4acemf91c0a46a736bdb1@mail.gmail.com>
References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com>
	<320fb6e00901060201s3830c972w4638f5eefcd42b6a@mail.gmail.com>
	<6d941f120901060852r482baf16m6b8399959b3c1aaa@mail.gmail.com>
	<5aa3b3570901070237s487a4307hb68fa69abc3cb23d@mail.gmail.com>
	<6d941f120901070354o70b6c99ah37ffdb38a1af7554@mail.gmail.com>
	<320fb6e00901090455y334ecebdo55cd7c1a718ab499@mail.gmail.com>
	<496778D2.1050801@gmail.com>
	<5aa3b3570901090917m7c82fb17hb4c93235702b120b@mail.gmail.com>
	<320fb6e00901090928u662023d6rd6f2d82c5fbb7365@mail.gmail.com>
	<8b34ec180901091017o172e4acemf91c0a46a736bdb1@mail.gmail.com>
Message-ID: <20090109225155.GF4135@sobchak.mgh.harvard.edu>

Hi all;
In terms of the coding of experimental modules, Giovanni is taking
an excellent approach. While they are under development, we can
utilize one of the many free hosting platforms to develop it as a
separate project in the Bio namespace. This allows interested users
to get the code, contribute, and test. Once an interface and
functionality are hammered out and they begin to stabilize, then it's
a good time to package it up and roll it into Biopython provided the
ol' mailing list consensus is happy.

This is a nice development model as it leverages the community, but
only rolls code into the main release when it stabilizes reasonably
well. Peter has taken a really good development methodology -- 
creating a rock solid stable core of modules, and actively deprecating
or fixing those that fall out of line.

My only suggestion would be to have a Biopython wiki page for the
experimental modules as they are under development. Something simple
with a description of the goals and a link to the source code would
help the majority of people who don't follow the mailing list find
and contribute to these.

Brad


> On Fri, Jan 9, 2009 at 6:28 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> > On Fri, Jan 9, 2009 at 5:17 PM, Giovanni Marco Dall'Olio
> > <dalloliogm at gmail.com> wrote:
> >> On Fri, Jan 9, 2009 at 5:18 PM, Bruce Southey <bsouthey at gmail.com> wrote:
> >>> Hi,
> >>> In a previous thread (and indicated in others) it was suggested that perhaps
> >>> Biopython needs some type of development  or experimental branch. So this
> >>> thread is orientated to provide some discussion on this and considers that
> >>> Biopython has moved to SVN.
> >>
> >> Maybe you can consider the approach at the basis of git, in which
> >> every developer works on its personal branch, and the owner of the
> >> 'official branch' can decide whether to accept the changes apported by
> >> the single branches or not.
> >
> > In some ways this describes the current situation but without the
> > software: The CVS/SVN repository is the master official branch which
> > we (as a group) try and keep pretty stable.  When working on new
> > modules, individual developers or contributors have hacked away on
> > their own machines (perhaps using a local repository - I tended to
> > just save versioned snapshots of work in progress), and commit things
> > to the master once it was sufficiently stable to be approved.  For
> > self contained modules, this works OK - although using something like
> > git would be a bit more formalised and automated, and allow this kind
> > of "work in progress" to be done openly.
> >
> 
> It can be viewed this way, but the point here is that making this change to
> the process of development might decrease the amount of work required to
> join the  development. Especially, if you think about adding new library
> to biopython, the most sensible way to do it is to branch and then
> stabilize. I've
> recently experienced (with Bio.Motif) that it might be tedious even
> for a very simple
> task. Also, using the distributed version control system, it is very
> easy for a small team
> of people to collaborate on a branch before merging back to the main
> repository. In the
> current mode this would be really difficult. And another  benefit is
> that you do not loose
>  the history of changes made "on a branch".
> 
> As for github, it is currently used by BioRuby project hosted on
> open-bio.org. We can try
> to talk to them and ask about their experiences. I'm not personally
> involved in any way in it,
> but it seems, that they've basically moved the main branch to github
> and update the cvs repository
> only occasionaly.
> 
> I think that for biopython, if we decided to use distributed version
> control, it would
> be better to use bazaar+launchpad instead of git+github. And for the
> following reasons:
> - it's completely free, as opposed to <300Mb of free account on github
> - launchpad could make the transition very easy. They provide a
> service of importing existing
> open source projects  to launchpad:
> https://help.launchpad.net/VcsImports They convert the trunk
> to bazzaar for us and set it up to update from the cvs every 6-12
> hours. It would be easy then to
> see whether we like it like this or not
> - bazaar is specifically aimed to be more user friendly than git, and
> allows developers
> to keep working in a familiar environment when moving from cvs or svn.
> I think it is important since git
> itself is really different from cvs and if we switch to anything else,
> everybody needs to learn the tool.
> - they use openID, which makes it simpler for people to join (even
> though you still need another
>  account)
> - both bazaar and launchpad are developed in python, so they're more
> python oriented
> (while github is developed in ruby, so a better choice for bioruby).
> 
> More on comparing these two possibilities (from the bazaar developers
> non-objective point of view):
> http://bazaar-vcs.org/BzrVsGit
> 
> These are my 2 cents on the choice of  tools for development, but I
> have to admit that I'm not
> sure whether it is  needed for biopython now. I'm very open to discussion.
> 
> -- 
> Bartek Wilczynski
> ==================
> Postdoctoral fellow
> EMBL, Furlong group
> Meyerhoffstrasse 1,
> 69012 Heidelberg,
> Germany
> tel: +49 6221 387 8433
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev


From biopython at maubp.freeserve.co.uk  Sat Jan 10 14:46:13 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sat, 10 Jan 2009 14:46:13 +0000
Subject: [Biopython-dev] Developmental and experimental branches
In-Reply-To: <20090109225155.GF4135@sobchak.mgh.harvard.edu>
References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com>
	<6d941f120901060852r482baf16m6b8399959b3c1aaa@mail.gmail.com>
	<5aa3b3570901070237s487a4307hb68fa69abc3cb23d@mail.gmail.com>
	<6d941f120901070354o70b6c99ah37ffdb38a1af7554@mail.gmail.com>
	<320fb6e00901090455y334ecebdo55cd7c1a718ab499@mail.gmail.com>
	<496778D2.1050801@gmail.com>
	<5aa3b3570901090917m7c82fb17hb4c93235702b120b@mail.gmail.com>
	<320fb6e00901090928u662023d6rd6f2d82c5fbb7365@mail.gmail.com>
	<8b34ec180901091017o172e4acemf91c0a46a736bdb1@mail.gmail.com>
	<20090109225155.GF4135@sobchak.mgh.harvard.edu>
Message-ID: <320fb6e00901100646y6132686ap8a928404dd1e36c3@mail.gmail.com>

On Fri, Jan 9, 2009 at 10:51 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
> Hi all;
> In terms of the coding of experimental modules, Giovanni is taking
> an excellent approach. While they are under development, we can
> utilize one of the many free hosting platforms to develop it as a
> separate project in the Bio namespace. This allows interested users
> to get the code, contribute, and test. Once an interface and
> functionality is hammered out and they begin to stabilize, then it's
> a good time to package it up and roll it into Biopython provided the
> ol' mailing list consensus is happy.

This does describe recent large additions fairly well - such as
Bio.SeqIO, Bio.AlignIO, Bio.Entrez, Bio.PopGen and most recently
Bio.Graphics.GenomeDiagram (which is a little different in that it was
previously publicly available as a separate module).

Modifications to existing bits of code (for example I have some
proposals for Seq, SeqRecord and Alignment objects as enhancement
bugs) don't really work in the same way - but also by their nature
require more discussion because they can indirectly affect a lot of
code.

> This is a nice development model as it leverages the community, but
> only rolls code into the main release when it stabilizes reasonably
> well. Peter has taken a really good development methodology --
> creating a rock solid stable core of modules, and actively deprecating
> or fixing those that fall out of line.

I really don't deserve all the credit here - Michiel has also been a
strong proponent for this "spring cleaning" as needed, for example how
our NCBI online bits have been rationalised, refocusing on Bio.Entrez
as the preferred module.

> My only suggestion would be to have a Biopython wiki page for the
> experimental modules as they are under development. Something simple
> with a description of the goals and a link to the source code would
> help the majority of people who don't follow the mailing list find
> and contribute to these.

Using the wiki in this way is a nice idea.  Tiago - do you fancy
adding a PopGen page describing the additions you're working on?  As a
bonus, once these do get into the main repository, you may find the
wiki text will be a useful basis for extending the documentation.

Peter


From mjldehoon at yahoo.com  Sat Jan 10 16:30:07 2009
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sat, 10 Jan 2009 08:30:07 -0800 (PST)
Subject: [Biopython-dev] Rethinking Biopython's testing framework
In-Reply-To: <5aa3b3570812301034r3633ebe0k937e33c731e69ccd@mail.gmail.com>
Message-ID: <126502.76038.qm@web62403.mail.re1.yahoo.com>

> > We could discuss a modification to run_tests.py so
> > that if there is no expected output file
> > output/test_XXX for test_XXX.py we just run
> > test_XXX.py and check its return value (I think
> > Michiel had previously
> > suggested something like this).
> 
> I think this should be done inside the test itself.
> All the tests should return only a boolean value (passed or
> not) and a description of the error.
> The tests that make use of an expected output file, they
> should open it and do the comparison by themselves, not in
> run_tests.py.

Sounds attractive, but there is one complication for print-and-compare tests. The code that does the print-and-compare is not trivial (see run_tests.py). It is possible to have the print-and-compare code in a helper module, which is then imported by each print-and-compare test. Still, while currently the print-and-compare tests have the advantage of being simple, they will get more complicated if we require the print-and-compare to be part of each test.

Does anybody have an opinion on this? It's either doing the print-and-compare as part of each print-and-compare test script, or requiring a test_suite() function in each unittest-based test script, and assuming that a test script is a unittest-based test script if it contains a test_suite() function.
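
A minimal sketch of the second option, assuming a runner that treats any
script exposing test_suite() as unittest-based and falls back to
print-and-compare otherwise. The main() entry point and the way expected
output is passed in are hypothetical, chosen only to keep the example
self-contained; this is not the code in run_tests.py.

```python
import io
import types
import unittest
from contextlib import redirect_stdout


def run_test_module(module, expected_output=None):
    """Run one test module and return True if it passed."""
    if hasattr(module, "test_suite"):
        # unittest-based script: it builds its own suite.
        suite = module.test_suite()
        result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
        return result.wasSuccessful()
    # Print-and-compare script: capture what main() prints and compare it
    # with the expected text (main() is an assumption of this sketch).
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        module.main()
    return buffer.getvalue() == expected_output


# Hypothetical print-and-compare module.
printer = types.ModuleType("test_print")
printer.main = lambda: print("ACGT")


class _Case(unittest.TestCase):
    def test_truth(self):
        self.assertEqual(1 + 1, 2)


# Hypothetical unittest-based module exposing test_suite().
suite_mod = types.ModuleType("test_unit")
suite_mod.test_suite = lambda: unittest.TestLoader().loadTestsFromTestCase(_Case)
```

With this split the print-and-compare scripts stay as simple as they are
now, and only the runner needs to know about both styles.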

--Michiel


From tiagoantao at gmail.com  Sat Jan 10 16:48:03 2009
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Sat, 10 Jan 2009 16:48:03 +0000
Subject: [Biopython-dev] Developmental and experimental branches
In-Reply-To: <496778D2.1050801@gmail.com>
References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com>
	<320fb6e00901060201s3830c972w4638f5eefcd42b6a@mail.gmail.com>
	<6d941f120901060852r482baf16m6b8399959b3c1aaa@mail.gmail.com>
	<5aa3b3570901070237s487a4307hb68fa69abc3cb23d@mail.gmail.com>
	<6d941f120901070354o70b6c99ah37ffdb38a1af7554@mail.gmail.com>
	<320fb6e00901090455y334ecebdo55cd7c1a718ab499@mail.gmail.com>
	<496778D2.1050801@gmail.com>
Message-ID: <6d941f120901100848h6e186022o241b928ea2566993@mail.gmail.com>

This whole discussion is very interesting. In fact, whatever the
conclusions are, I think they should be labeled "official policy" and put on
the Wiki.

The biggest problem that I've faced is that, whenever I am doing
something, I don't know the level of acceptability with other
developers. I tend to put everything to discussion before I commit it
and whenever I say something I might get completely different answers
from time to time and from different people. The end result is that I
defer committing things because of issues that are raised in an
ad-hoc fashion.

There should be a page clarifying things like:
1. Are contributions that have a small target audience accepted?
2. Use of foreign libraries (e.g., SciPy)?
3. Code management policies. Branches?  Adding new code? Breaking interfaces?
4. New developers
5. Legal issues
6. Interop with non-free software
7. Code quality strategies. Code review? Testing?
8. Multiplatform issues

I am not saying a big document. But as questions arise, just discuss
them, arrive at a decision and document them. It becomes tiring having
to answer the same questions about code that you want to submit over
and over again and with different issues every time.

One can live with decisions that are disliked, but it is much more
difficult to live when the playing ground is moving all the time.

On Fri, Jan 9, 2009 at 4:18 PM, Bruce Southey <bsouthey at gmail.com> wrote:
> Hi,
> In a previous thread (and indicated in others) it was suggested that perhaps
> Biopython needs some type of development  or experimental branch. So this
> thread is orientated to provide some discussion on this and considers that
> Biopython has moved to SVN. I think it is a very relevant discussion because
> Biopython needs an effective approach to mainly handle new code but also
> handle significant rewrites of older code.
>
> The most important question is do you support creating developmental and
> experimental branches or not?
>
> However, I do not think that this is a yes or no answer and I am not
> concerned about the question at the present time.  Rather I am concerned
> about the burden placed on the maintainers (especially Peter and Michiel),
> the expression of the developer needs and how this impacts the community. I
> am rather neutral on it (probably because I have not contributed any major
> code to Biopython) but I would like to ensure that the discussion leads to
> positive changes.
>
> I find Biopython interesting and special for various reasons. There is a
> solid core of functions that are common to many aspects of bioinformatics.
> But it also contains very specialized code that has a much smaller audience.
> Consequently certain parts get considerable exposure and other parts get
> limited or no exposure. This means that it may be necessary to release beta
> versions in order to get the necessary exposure as I assume that code has
> had sufficient development to be released in the first place. Creating
> developmental and experimental branches is one way to get this exposure but
> perhaps branches are not necessary.
>
> An alternative approach is creating specialized projects within Biopython
> that can be used for development and testing. For example, Scipy provides
> SciKits that are related code that is typically special purpose or is
> released under a different license than scipy/numpy. This replaced the
> sandboxes that existed in prior versions of numpy and scipy. But a recent
> problem that arose in numpy was how to get code from such a location into
> numpy by creating an experimental section in the main distribution, but
> that met some strong resistance.
>
> Therefore, I see the following issues that need to be addressed regardless
> of the approach taken:
>
> 0) Must be easy for project maintenance and release as this must not create
> an extra burden to Biopython!
> 1) Ensure adequate testing is performed especially to get it out to the
> appropriate audience and to correct the code and APIs. I consider this
> rather important because I tend to follow a type of user experience design
> (http://en.wikipedia.org/wiki/User_experience_design) and software
> prototyping (http://en.wikipedia.org/wiki/Software_prototyping) for software
> development.
> 2) Stabilization of APIs for backwards compatibility as we don't want to
> change these with each Biopython release.
> 3) Adequate test coverage especially across platforms and different software
> versions. For example Windows paths and older software versions can cause
> problems on other people's machines but not yours.
> 4) Some type of code review even if it is just to ensure a consistent format
> (like spaces versus tabs) or compatibility across Python versions and
> platforms.
> 5) If developmental or experimental branches are used then how does the code
> move into the main distribution and how are these branches created and
> destroyed.
>
> Please add other issues.
>
> I would appreciate these issues being addressed when appropriate.
>
> Regards
> Bruce
>
> Peter wrote:
>>
>> On Wed, Jan 7, 2009 at 11:54 AM, Tiago Antão <tiagoantao at gmail.com> wrote:
>>
>>>
>>> Considering that CVS has no development branch I think having git is
>>> very good. I would just recommend extreme care with changing existing
>>> code. When merging back into CVS, changes to existing code might not
>>> go in (especially if they change interfaces) or be delayed.
>>>
>>>
>>
>> If there is a strong interest in having experimental branches in the
>> official Biopython repository, we could discuss that as an option.
>> Although I would prefer we get moved from CVS to SVN first before
>> actually doing this, in order to keep the migration as simple as
>> possible.
>>
>> Peter


-- 
"Systems can remain irrational far longer than you or I can survive" -
Freely adapted from John Maynard Keynes


From tiagoantao at gmail.com  Sat Jan 10 16:52:44 2009
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Sat, 10 Jan 2009 16:52:44 +0000
Subject: [Biopython-dev] Developmental and experimental branches
In-Reply-To: <320fb6e00901100646y6132686ap8a928404dd1e36c3@mail.gmail.com>
References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com>
	<5aa3b3570901070237s487a4307hb68fa69abc3cb23d@mail.gmail.com>
	<6d941f120901070354o70b6c99ah37ffdb38a1af7554@mail.gmail.com>
	<320fb6e00901090455y334ecebdo55cd7c1a718ab499@mail.gmail.com>
	<496778D2.1050801@gmail.com>
	<5aa3b3570901090917m7c82fb17hb4c93235702b120b@mail.gmail.com>
	<320fb6e00901090928u662023d6rd6f2d82c5fbb7365@mail.gmail.com>
	<8b34ec180901091017o172e4acemf91c0a46a736bdb1@mail.gmail.com>
	<20090109225155.GF4135@sobchak.mgh.harvard.edu>
	<320fb6e00901100646y6132686ap8a928404dd1e36c3@mail.gmail.com>
Message-ID: <6d941f120901100852g47b10e9ar214cf2ad2b206f6@mail.gmail.com>

On Sat, Jan 10, 2009 at 2:46 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> Using the wiki in this way is a nice idea.  Tiago - do you fancy
> adding a PopGen page describing the additions you're working on?  As a
> bonus, once these do get into the main repository, you may find the
> wiki text will be a useful basis for extending the documentation.

Where do you want me to link the page on the Wiki?


From biopython at maubp.freeserve.co.uk  Sat Jan 10 17:03:05 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sat, 10 Jan 2009 17:03:05 +0000
Subject: [Biopython-dev] Developmental and experimental branches
In-Reply-To: <6d941f120901100852g47b10e9ar214cf2ad2b206f6@mail.gmail.com>
References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com>
	<6d941f120901070354o70b6c99ah37ffdb38a1af7554@mail.gmail.com>
	<320fb6e00901090455y334ecebdo55cd7c1a718ab499@mail.gmail.com>
	<496778D2.1050801@gmail.com>
	<5aa3b3570901090917m7c82fb17hb4c93235702b120b@mail.gmail.com>
	<320fb6e00901090928u662023d6rd6f2d82c5fbb7365@mail.gmail.com>
	<8b34ec180901091017o172e4acemf91c0a46a736bdb1@mail.gmail.com>
	<20090109225155.GF4135@sobchak.mgh.harvard.edu>
	<320fb6e00901100646y6132686ap8a928404dd1e36c3@mail.gmail.com>
	<6d941f120901100852g47b10e9ar214cf2ad2b206f6@mail.gmail.com>
Message-ID: <320fb6e00901100903v1aa0180bsd1ca5335f7da1f7f@mail.gmail.com>

On Sat, Jan 10, 2009 at 4:52 PM, Tiago Antão <tiagoantao at gmail.com> wrote:
> On Sat, Jan 10, 2009 at 2:46 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>> Using the wiki in this way is a nice idea.  Tiago - do you fancy
>> adding a PopGen page describing the additions you're working on?  As a
>> bonus, once these do get into the main repository, you may find the
>> wiki text will be a useful basis for extending the documentation.
>
> Where do you want me to link the page on the Wiki?

How about having two pages:

http://biopython.org/wiki/PopGen
- documentation on the code in the current official release,
- linked to from the main doc page

http://biopython.org/wiki/PopGen_dev
- discussion and links to your branch etc,
- linked to from the above PopGen page

This would be consistent with how I did the Bio.SeqIO pages,
http://biopython.org/wiki/SeqIO
http://biopython.org/wiki/SeqIO_dev

If you think you have a better idea, feel free to make suggestions.

Peter


From peter at maubp.freeserve.co.uk  Sat Jan 10 17:46:38 2009
From: peter at maubp.freeserve.co.uk (Peter)
Date: Sat, 10 Jan 2009 17:46:38 +0000
Subject: [Biopython-dev] Developmental policies
Message-ID: <320fb6e00901100946g62e26170o7e112f3b2f357e76@mail.gmail.com>

On Sat, Jan 10, 2009 at 4:48 PM, Tiago Antão <tiagoantao at gmail.com> wrote:
> This whole discussion is very interesting. In fact, whatever the
> conclusions are, I think they should be labeled "official policy" and put on
> the Wiki.

That sounds good.

> The biggest problem that I've faced is that, whenever I am doing
> something, I don't know the level of acceptability with other
> developers. I tend to put everything to discussion before I commit it
> and whenever I say something I might get completely different answers
> from time to time and from different people. The end result is that I
> defer committing things because of issues that are raised in an
> ad-hoc fashion.

Asking before doing things is in general a good plan.  Sadly not
everyone will be free to respond at any one time - but I agree with
you that having more of the de facto policy written out explicitly
would help.

> There should be a page clarifying things like:
> 1. Are contributions that have a small target audience accepted?

Historically yes this has happened - although my impression is that
the bar was perhaps set too low.  I would say some things were
accepted without sufficient documentation and tests.  The problem with
small interest modules is that if the original developer moves on, in
the absence of any apparent users, the module gets abandoned.  This
seems to explain several of the smaller modules we've deprecated in
the last couple of years.

On the other hand, some things will start with a small target audience
that will grow.  If I was confident that the developer concerned would
stick around for several years and was prepared to deal with
documentation, unit tests and bug fixes then I would be much happier
about including something, even if it might have a relatively small
target audience initially.

> 2. Use of foreign libraries (e.g., SciPy)?

I think the current stance has been to try and minimise 3rd party
dependencies, other than the special case of python wrappers for
command line tools.  This makes it much easier for beginners to install
and use Biopython, and lowering the barrier to entry is a good thing.

There are practical points here too.  In general, 3rd party
dependencies can be a pain (e.g. our Martel parsers broke when
mxTextTools changed their API between 2.0 and 3.0).  Similarly they
can restrict the distribution of Biopython (e.g. NumPy isn't yet
available on Windows for Python 2.6), and will also be a potential
road block for moving to Python 3.  As another example, a small part
of Bio.PDB uses flex in a parser, and again this makes building and
distributing it a real pain (so much so, that it's been commented out
by default).

However, run time only dependencies (like pure python libraries and
command line tools) are not such an issue for packaging/distribution.
e.g. ReportLab (used in Bio.Graphics only).  If SciPy were to be used
by part of Bio.PopGen, and this didn't affect packaging/distribution
then this might be OK.
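
A minimal sketch of a run-time-only dependency, assuming the optional
library is looked up when needed and a pure-Python fallback is used when it
is missing. The helper names are hypothetical, and the standard library's
statistics module merely stands in for something like SciPy so the example
runs anywhere.

```python
import importlib


def optional_import(name):
    """Return the named module, or None if it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


def mean(values):
    """Average a list, using the optional library only when present."""
    stats = optional_import("statistics")  # stand-in for e.g. SciPy
    if stats is not None:
        return stats.mean(values)
    return sum(values) / len(values)  # pure-Python fallback
```

Because nothing is imported at install or build time, packaging is
unaffected; only the feature that needs the library degrades when it is
absent.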

> 3. Code management policies. Branches?  Adding new code? Breaking interfaces?

Biopython has historically worked from a stable trunk.  As a
consequence we try and avoid breaking interfaces, instead adopting a
gradual deprecation of an old interface when adding a new interface,
or adding enhancements in a backwards compatible manner.
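
A minimal sketch of gradual deprecation, assuming the old entry point is
kept for a few releases as a thin wrapper that warns and delegates. Both
function names are hypothetical, not existing Biopython functions.

```python
import warnings


def new_parse(text):
    """New interface: split a record into fields."""
    return text.split()


def old_parse(text):
    """Old interface, kept for now but flagged as deprecated."""
    warnings.warn(
        "old_parse is deprecated; use new_parse instead",
        DeprecationWarning,
        stacklevel=2,  # point the warning at the caller, not this wrapper
    )
    return new_parse(text)
```

Existing scripts keep working while their authors get a visible hint to
migrate before the old name is removed.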

> 4. New developers

I think there is something written down about this already...

> 5. Legal issues

Try and avoid them?  What did you mean in particular?

> 6. Interop with non-free software

This is linked to the legal issues question.  Many of the tools we
link to like BLAST aren't open source, but are "free" as in cost.  I
don't think we have any examples of non-free software.

> 7. Code quality strategies. Code review? Testing?

Code review:
For new code in a specialist area, it can be difficult to get a
qualified second opinion on the approach, but existing developers can
at least comment on the coding style.  For existing code, my
impression is module owners have been trusted to make changes to
"their" code without review - and generally speaking this has worked
out OK.  Although if anyone spots someone making a change they disagree
with, then please do raise it.  I would hope any larger change had
some discussion beforehand - possibly via enhancement entries on
bugzilla.

Testing:
I'd strongly resist adding any new module without an accompanying
test, and wish this had been a firm policy from day one.

> 8. Multiplatform issues

Ideally everything should be cross platform (like python itself).
There are exceptions to this - in particular some 3rd party tools are
not cross platform.  I personally use and test on Windows, Linux and
Mac - and I believe Michiel does too.

> I am not saying a big document. But as questions arise, just discuss
> them, arrive at a decision and document them. It becomes tiring having
> to answer the same questions about code that you want to submit over
> and over again and with different issues every time.
> One can live with decisions that are disliked, but it is much more
> difficult to live when the playing ground is moving all the time.

I'm sorry if you've had that feeling.  However, circumstances change.
As I recall when you first asked about using SciPy as a dependency,
Biopython was still using Numeric instead of Numpy - so using SciPy
had to wait until after that transition.  Now that we have moved to
NumPy, I think you have a much stronger case.

Peter


From tiagoantao at gmail.com  Sat Jan 10 18:31:05 2009
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Sat, 10 Jan 2009 18:31:05 +0000
Subject: [Biopython-dev] Developmental policies
In-Reply-To: <320fb6e00901100946g62e26170o7e112f3b2f357e76@mail.gmail.com>
References: <320fb6e00901100946g62e26170o7e112f3b2f357e76@mail.gmail.com>
Message-ID: <6d941f120901101031w22d9730dy87353cf22646d6fc@mail.gmail.com>

> mxTextTools changed their API between 2.0 and 3.0).  Similarly they
> can restrict the distribution of Biopython (e.g. NumPy isn't yet
> available on Windows for Python 2.6), and will also be a potential
> road block for moving to Python 3.  As another example, a small part

By the way, another issue that would be interesting to address is
deprecation of older Python versions and Python 3. Like just having a
clear stance on what is the current feeling about this. It seems to be
a recurring question.


>> 5. Legal issues
>
> Try and avoid them?  What did you mean in particular?

In my opinion something should be said about this. Actually I think
(suggest) it is essentially a matter of mainly taking Bruce's
comments (e.g. one cannot have derived works of non-free software) and
writing them down on a wiki page. Just things potential contributors
would have to be aware of on a legal front.

> Testing:
> I'd strongly resist adding any new module without an accompanying
> test, and wish this had been a firm policy from day one.

People should also be encouraged to test (in as much as possible) in
at least Win/Linux/Mac. Of course, for some people it will be
difficult as access to all platforms is not always possible for
everybody. But at least encouragement should be made...


> I'm sorry if you've had that feeling.  However, circumstances change.
> As I recall when you first asked about using SciPy as a dependency,
> Biopython was still using Numeric instead of Numpy - so using SciPy
> had to wait until after that transition.  Now that we have moved to
> NumPy, I think you have a much stronger case.

Boss, don't say sorry, I think everybody would agree that you make a
most fantastic effort.

Regarding circumstances: When circumstances change, then one would
amend documents.
Again, my point is not in favour of this or that policy. Only that a
barebones policy should be documented, so that people know what the
basic rules are. This will allow for realistic expectations with
regard to code being accepted or not in the stable distribution.


From peter at maubp.freeserve.co.uk  Sat Jan 10 20:10:27 2009
From: peter at maubp.freeserve.co.uk (Peter)
Date: Sat, 10 Jan 2009 20:10:27 +0000
Subject: [Biopython-dev] Developmental policies
In-Reply-To: <6d941f120901101031w22d9730dy87353cf22646d6fc@mail.gmail.com>
References: <320fb6e00901100946g62e26170o7e112f3b2f357e76@mail.gmail.com>
	<6d941f120901101031w22d9730dy87353cf22646d6fc@mail.gmail.com>
Message-ID: <320fb6e00901101210k5e715beer240aa9338aa7ce2f@mail.gmail.com>

On Sat, Jan 10, 2009 at 6:31 PM, Tiago Antão <tiagoantao at gmail.com> wrote:
> By the way, another issue that would be interesting to address is
> deprecation of older Python versions and Python 3. Like just having a
> clear stance on what is the current feeling about this. It seems to be
> a recurring question.

Regarding older versions of python, we have stated that Biopython 1.49
should work on Python 2.3 to 2.6, and we expect to do the same for
Biopython 1.50.  Thereafter, we will probably drop support for Python
2.3 (unless anyone has a strong need for it and makes their voice
heard).  See the mailing list archive and the corresponding news
postings:
http://news.open-bio.org/news/2008/11/biopython-and-python-26-and-python-23/
http://news.open-bio.org/news/2008/11/biopython-release-149/

Regarding Python 3, one hold-up will be that neither ReportLab nor NumPy
has a clear plan for Python 3 - or at least that is my impression.
However, even ignoring those parts of Biopython which use NumPy (e.g.
Bio.PDB and Bio.Cluster) and Bio.Graphics (the only use of ReportLab),
we have a lot of useful code.  In the short term we should be aiming
to have everything run under Python 2.6 in warnings mode, as a step
towards eventual Python 3 support.
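
Python 2.6 has a -3 command-line switch that warns about constructs that
change in Python 3; assuming that is the "warnings mode" meant here, the
same idea can be sketched with the standard warnings module by treating
DeprecationWarning as an error during a test run. The function names are
hypothetical.

```python
import warnings


def legacy_call():
    """Hypothetical function that relies on a deprecated feature."""
    warnings.warn("this feature goes away", DeprecationWarning)
    return 42


def runs_clean(func):
    """Return True if func() completes without a DeprecationWarning."""
    with warnings.catch_warnings():
        # Escalate the warning to an exception so it cannot be overlooked.
        warnings.simplefilter("error", DeprecationWarning)
        try:
            func()
        except DeprecationWarning:
            return False
    return True
```

A check like runs_clean() could be applied module by module, giving a
simple list of what still needs attention.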

Beyond that, I think that it is likely we'll want to use bytes rather
than (unicode) strings in Python 3 for the Seq object, but have not
given this much thought.
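
A small illustration of what the bytes choice would mean in practice,
written for Python 3 and using only built-ins. The reverse_complement
helper is a hypothetical sketch covering plain unambiguous DNA letters, not
the Seq implementation.

```python
# Text sequence: indexing yields a one-character str.
text_seq = "ACGTacgt"
# Bytes sequence: indexing yields an int (the byte value).
byte_seq = b"ACGTacgt"

# Case-preserving complement table for the letters A, C, G and T.
COMPLEMENT = bytes.maketrans(b"ACGTacgt", b"TGCAtgca")


def reverse_complement(seq):
    """Reverse-complement a bytes sequence, keeping letter case."""
    return seq.translate(COMPLEMENT)[::-1]
```

Note the difference in indexing: text_seq[0] is the string "A", while
byte_seq[0] is the integer 65, which would affect any code that compares
single letters.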

>>> 5. Legal issues
>>
>> Try and avoid them?  What did you mean in particular?
>
> In my opinion something should be said about this. Actually I think
> (suggest) it is essentially a matter of mainly taking Bruce's
> comments (e.g. one cannot have derived works of non-free software) and
> writing them down on a wiki page. Just things potential contributors
> would have to be aware of on a legal front.

I see what you mean.  Perhaps I am naive in thinking this should be
common knowledge amongst potential contributors.

>> Testing:
>> I'd strongly resist adding any new module without an accompanying
>> test, and wish this had been a firm policy from day one.
>
> People should also be encouraged to test (in as much as possible) in
> at least Win/Linux/Mac. Of course, for some people it will be
> difficult as access to all platforms is not always possible for
> everybody. But at least encouragement should be made...

Also tests which require additional setup are a pain.  The BioSQL
tests are an example of this, where it is unavoidable - but any
situation like this reduces the number of people/machines where that
test will get checked.  Michiel has stressed this kind of thing as a
concern in the past (as I recall).
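
One way to keep such tests from failing on machines without the extra
setup is to skip them explicitly. A minimal sketch, assuming a unittest
version that provides skipUnless (newer than the Python releases discussed
in this thread); the driver name and test classes are hypothetical.

```python
import importlib.util
import io
import unittest

# Hypothetical optional requirement; a real test would name its own.
HAVE_DRIVER = importlib.util.find_spec("no_such_database_driver") is not None


class DatabaseTests(unittest.TestCase):
    @unittest.skipUnless(HAVE_DRIVER, "database driver not installed")
    def test_round_trip(self):
        self.fail("only reached when the driver is available")


class PureTests(unittest.TestCase):
    def test_always_runs(self):
        self.assertEqual("acgt".upper(), "ACGT")


def run_quietly(*cases):
    """Run the given TestCase classes and return the result object."""
    loader = unittest.TestLoader()
    suite = unittest.TestSuite(loader.loadTestsFromTestCase(c) for c in cases)
    return unittest.TextTestRunner(stream=io.StringIO()).run(suite)
```

The skipped test is reported as skipped rather than passed, so it stays
visible that it was not exercised on that machine.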

Peter


From bugzilla-daemon at portal.open-bio.org  Mon Jan 12 14:31:22 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 12 Jan 2009 09:31:22 -0500
Subject: [Biopython-dev] [Bug 2731] New: Adding .upper() and .lower()
	methods to the Seq object
Message-ID: <bug-2731-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2731

           Summary: Adding .upper() and .lower() methods to the Seq object
           Product: Biopython
           Version: Not Applicable
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk
 BugsThisDependsOn: 2532
OtherBugsDependingOnThis: 2351


As part of making the Seq object more string like (Bug 2351), it would be nice
to support the .upper() and .lower() methods.

Doing this elegantly will require different case versions of the alphabets (see
Bug 2532), perhaps by adding (private) upper and lower methods to the Alphabet
object itself.

Alternatively, we can handle this without adding new Alphabets by mapping the
fixed case IUPAC alphabets to case-less generic alphabets.
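
A toy sketch of the first approach, assuming the alphabet object gains
private _upper() and _lower() methods that the sequence calls when changing
case. These Alphabet and Seq classes are simplified stand-ins written for
illustration, not Biopython's classes.

```python
class Alphabet:
    """Toy alphabet: a named set of allowed letters."""

    def __init__(self, name, letters):
        self.name = name
        self.letters = letters

    def _upper(self):
        # Return an upper case version of this alphabet.
        return Alphabet(self.name, self.letters.upper())

    def _lower(self):
        # Return a lower case version of this alphabet.
        return Alphabet(self.name, self.letters.lower())


class Seq:
    """Toy sequence pairing a string with its alphabet."""

    def __init__(self, data, alphabet):
        self.data = data
        self.alphabet = alphabet

    def upper(self):
        return Seq(self.data.upper(), self.alphabet._upper())

    def lower(self):
        return Seq(self.data.lower(), self.alphabet._lower())
```

Both methods return new objects, so the original sequence and its alphabet
are left unchanged, matching how str.upper() and str.lower() behave.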


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Mon Jan 12 14:31:25 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 12 Jan 2009 09:31:25 -0500
Subject: [Biopython-dev] [Bug 2532] Using IUPAC alphabets in mixed case Seq
	objects
In-Reply-To: <bug-2532-42@http.bugzilla.open-bio.org/>
Message-ID: <200901121431.n0CEVPFK010376@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2532


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
OtherBugsDependingO|                            |2731
              nThis|                            |




From bugzilla-daemon at portal.open-bio.org  Mon Jan 12 14:31:30 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 12 Jan 2009 09:31:30 -0500
Subject: [Biopython-dev] [Bug 2351] Make Seq more like a string,
	even subclass string?
In-Reply-To: <bug-2351-42@http.bugzilla.open-bio.org/>
Message-ID: <200901121431.n0CEVUDG010399@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2351


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  BugsThisDependsOn|                            |2731




From bsouthey at gmail.com  Mon Jan 12 17:03:45 2009
From: bsouthey at gmail.com (Bruce Southey)
Date: Mon, 12 Jan 2009 11:03:45 -0600
Subject: [Biopython-dev] Developmental policies
In-Reply-To: <320fb6e00901101210k5e715beer240aa9338aa7ce2f@mail.gmail.com>
References: <320fb6e00901100946g62e26170o7e112f3b2f357e76@mail.gmail.com>	<6d941f120901101031w22d9730dy87353cf22646d6fc@mail.gmail.com>
	<320fb6e00901101210k5e715beer240aa9338aa7ce2f@mail.gmail.com>
Message-ID: <496B77F1.9060207@gmail.com>

Peter wrote:
> On Sat, Jan 10, 2009 at 6:31 PM, Tiago Antão <tiagoantao at gmail.com> wrote:
>   
>> By the way, another issue that would be interesting to address is
>> deprecation of older Python versions and Python 3. Like just having a
>> clear stance on what is the current feeling about this. It seems to be
>> a recurring question.
>>     
>
> Regarding older versions of python, we have stated that Biopython 1.49
> should work on Python 2.3 to 2.6, and we expect to do the same for
> Biopython 1.50.  Thereafter, we will probably drop support for Python
> 2.3 (unless anyone has a strong need for it and makes their voice
> heard).  See the mailing list archive and the corresponding new
> postings:
> http://news.open-bio.org/news/2008/11/biopython-and-python-26-and-python-23/
> http://news.open-bio.org/news/2008/11/biopython-release-149/
>
> Regarding Python 3, one hold up will be neither ReportLab nor NumPy
> have a clear plan for Python 3 - or at least that is my impression.
>   
There has been limited information on the numpy list regarding Python 3 
but there has been some investigation on this 
(http://www.scipy.org/Python3k). I did ask about Python 3 last year in 
the thread titled 'Report from SciPy' and Robert Kern's response should 
be at:
http://www.mail-archive.com/numpy-discussion@scipy.org/msg12101.html

Also, this thread has the future aims of numpy (obviously still awaiting 
scipy 0.7):
http://www.mail-archive.com/numpy-discussion@scipy.org/msg12091.html

I think the main current effort for numpy 1.3 is getting Python 2.6
fully supported (Windows is the main problem) before there will be any
further consideration of Python 3. One of the main problems is that
numpy uses a few APIs that are deprecated in Python 3. So any porting
will not go far until the correct APIs are used, which will probably be
after the next numpy release.

> However, even ignoring those parts of Biopython which use NumPy (e.g.
> Bio.PDB and Bio.Cluster) and Bio.Graphics (the only use of ReportLab),
> we have a lot of useful code.  In the short term we should be aiming
> to have everything run under Python 2.6 in warnings mode, as a step
> towards eventual Python 3 support.
>   
While I understand this approach, I do wonder how effective it will be 
compared to direct porting using the 2to3 tool. One reason is that 2to3 
is more than a code convertor as it also attempts to guess at what you 
are trying to do.

Anyhow, this is not a trivial task and I am willing to help in that regard.
 
> Beyond that, I think that it is likely we'll want to use bytes rather
> than (unicode) strings in Python 3 for the Seq object, but have not
> given this much thought.
>
>   
>>>> 5. Legal issues
>>>>         
>>> Try and avoid them?  What did you mean in particular?
>>>       
>> In my opinion something should be said about this. Actually I think
>> (suggest) it is essencially a matter of mainly taking Bruce' s
>> comments (e.g. one cannot have derived works of non-free software) and
>> write them down on a wiki page. Just things potential contributor
>> would have to be aware of on a legal front.
>>     
>
> I see what you mean.  Perhaps I am naive in thinking this should be
> common knowledge amongst potential contributors.
>   
I think we must be explicit about this and ensure that any accepted code
is BSD-compatible, because we cannot be sure what people really know.
Further, the license of any application that Biopython interacts with
must be clearly stated, and the developer is responsible for obtaining
one if it does not have one. That way we know what is included, and it
should help users as well in terms of whether or not they can use some
application.

>   
>>> Testing:
>>> I'd strongly resist adding any new module without an accompanying
>>> test, and wish this had been a firm policy from day one.
>>>       
>> People should also be encouraged to test (in as much as possible) in
>> at least Win/Linux/Mac. Of course, for some people it will be
>> difficult as access to all platforms is not always possible for
>> everybody. But at least encouragement should be made...
>>     
>
> Also tests which require additional setup are a pain.  The BioSQL
> tests are an example of this, where it is unavoidable - but any
> situation like this reduces the number of people/machines where that
> test will get checked.  Michiel has stressed this kind of thing as a
> concern in the past (as I recall).
>
> Peter
>   
We cannot force people to run tests, but we can hope that enough people
do so to cover as many of the variations as possible. Do we need to
create buildbots (e.g. http://sourceforge.net/projects/buildbot/)?

I do not test or use BioSQL code because I do not use BioSQL and do not 
run a compatible database on my system. So it would be really great if 
BioSQL supported sqlite because the database requirements would be 
alleviated.

The other related aspect is that certain applications like clustalw must
be on the path, otherwise the application will not be found and the test
is skipped. I do not know how to solve this except perhaps by using
environment variables.
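
As a rough illustration of the environment variable idea, here is a
minimal sketch (untested against the Biopython test suite). The helper
name find_tool and the variable name CLUSTALW_EXE are made up for this
example and are not existing Biopython names. An explicit override is
consulted first, then the normal search path:

```python
import os
import shutil


def find_tool(name, env_var):
    """Return a path to a command line tool, or None if it is not found.

    env_var is a hypothetical override variable (e.g. "CLUSTALW_EXE");
    if it is set, it must point at an existing file.
    """
    override = os.environ.get(env_var)
    if override:
        # An explicit setting wins, but only if the file really exists.
        return override if os.path.isfile(override) else None
    # Otherwise fall back on the usual PATH lookup.
    return shutil.which(name)


# A test could then skip itself when the tool is missing:
clustalw = find_tool("clustalw", "CLUSTALW_EXE")
if clustalw is None:
    print("clustalw not found, the ClustalW test would be skipped")
```

A test server could set the variable once per machine, so tools installed
outside the path would still be exercised.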

Regards
Bruce


From bsouthey at gmail.com  Mon Jan 12 17:34:50 2009
From: bsouthey at gmail.com (Bruce Southey)
Date: Mon, 12 Jan 2009 11:34:50 -0600
Subject: [Biopython-dev] Structure and LDNe
In-Reply-To: <320fb6e00901090459x52976449gddcc4483699e0f56@mail.gmail.com>
References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com>	
	<496397C9.3030706@gmail.com>	
	<6.1.2.0.2.20090108141534.0323a5f0@mail.lmi.net>
	<320fb6e00901090459x52976449gddcc4483699e0f56@mail.gmail.com>
Message-ID: <496B7F3A.60407@gmail.com>

Peter wrote:
> On Thu, Jan 8, 2009 at 10:24 PM, Jason Eshleman <jae at lmi.net> wrote:
>   
>> Greetings all,
>>
>> Presently, the code I have for dealing with STRUCTURE is similar to the code
>> for interacting with Clustal, in that it does not modify any of the STRUCTURE
>> source code by merely initiates the compiled executable.
>>     
>
> Biopython has code for interacting with lots of command line tools,
> and this neatly avoids any copyright/licence questions about being a
> derived work.
>   
I have no problem with this provided that the parsing follows documented
information such as a description of the output. I would have a problem
if you based it on code from another source that uses undocumented
information or information not obvious from the output.

>   
>> Initially, I have used my code in place of their Java front end as it allows
>> for more control of the run-time variables for successive runs with varying
>> run parameters.  At some point, I'd like to get it to interface more
>> directly with the STRUCTURE code to be able to pipe results directly to
>> python for parsing rather than working with the STRUCTURE text output but
>> that's a ways off still.
>>     
>
> I'm not quite clear what you have in mind, but this would probably
> need a little more thought from the legal perspective.  If STRUCTURE
> provides an API with header files you can compile against, that should
> be OK (but I am not a lawyer).  Note that do this within Biopython
> would then mean adding another build time dependency, which would need
> to be justified in terms of the benefits it brings.
>
> Peter
>   
Linking against header files is a gray area, but some consider it to be
illegal (see the Linux kernel discussions on that!). It really does
depend on whether or not the result can be considered a derivative
work.

Unless STRUCTURE is released under a BSD-compatible license, you should 
not use any code from it (and probably should not even look at the 
code). Just saying the code is free is insufficient because code 
licensed under the GPL is 'free' but not BSD-compatible. So if STRUCTURE 
does not have a license then either get one or forget about this until 
it does have a BSD-compatible license. Alternatively, get STRUCTURE to 
support your changes.

I am being difficult simply because of the potential impact on the
Biopython project of including code incompatible with the BSD license.

Bruce


From biopython at maubp.freeserve.co.uk  Mon Jan 12 18:19:03 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 12 Jan 2009 18:19:03 +0000
Subject: [Biopython-dev] Developmental policies
In-Reply-To: <496B77F1.9060207@gmail.com>
References: <320fb6e00901100946g62e26170o7e112f3b2f357e76@mail.gmail.com>
	<6d941f120901101031w22d9730dy87353cf22646d6fc@mail.gmail.com>
	<320fb6e00901101210k5e715beer240aa9338aa7ce2f@mail.gmail.com>
	<496B77F1.9060207@gmail.com>
Message-ID: <320fb6e00901121019h72463a5dl316cabc85100c09d@mail.gmail.com>

> We can not force people to run tests but hope that sufficient people who do
> cover many of the variations as possible. Do we need to create buildbots (eg
> http://sourceforge.net/projects/buildbot/)?

Some kind of "buildbots" would be nice - possibly with something
hosted on the OBF server to hold the reports (even just via the wiki
pages would work). I have access to one or two platforms at work which
might be able to act in this way, but the infrastructure isn't there
yet.

> I do not test or use BioSQL code because I do not use BioSQL and do not run
> a compatible database on my system. So it would be really great if BioSQL
> supported sqlite because the database requirements would be alleviated.

This was recently requested on the BioSQL mailing list - and it would be nice.

> The other related aspect is that certain applications like clustalw must be
> in the path otherwise the application will not be found and the test
> skipped. But I do not know how to solve this except perhaps using
> environmental variables.

Part of setting up a "buildbot" or test server would include
installing all the optional command line tools (like ClustalW) so that
the full test suite can be run.

Peter


From bsouthey at gmail.com  Mon Jan 12 22:24:00 2009
From: bsouthey at gmail.com (Bruce Southey)
Date: Mon, 12 Jan 2009 16:24:00 -0600
Subject: [Biopython-dev] Alphabet case and standards
Message-ID: <496BC300.90003@gmail.com>

Hi,
I am moving a potential discussion away from the bugzilla because it 
affects at least the following Bugs (please add others):
2351 (Make Seq more like a string, even subclass string? 
http://bugzilla.open-bio.org/show_bug.cgi?id=2351 ),
2532 (Using IUPAC alphabets in mixed case Seq objects 
http://bugzilla.open-bio.org/show_bug.cgi?id=2532 ),
2597 (Enforce alphabet letters in Seq objects 
http://bugzilla.open-bio.org/show_bug.cgi?id=2597 )
2731 (Adding .upper() and .lower() methods to the Seq object 
http://bugzilla.open-bio.org/show_bug.cgi?id=2731 ).

I am hoping this gets wider feedback than bugzilla would, avoids
unnecessary duplication, and leads to closure of these bugs.

 From Bug 2351, "Bio.Alphabets.IUPAC defines a number of alphabets with 
defined lists of valid letters which are in upper case ONLY". But 
various applications ignore the alphabet case and hence the standards. 
So this creates the problem of how Biopython should handle alphabet case.

If we follow the standard for all modules then there should be no need
to do anything except to ensure we follow it. There are numerous
examples where the standard is not followed, whether through user
ignorance, simplicity or design (such as using mixed case to denote
'important' things), and various databases and applications do not
follow it. But I think that the actual case is irrelevant in most
situations and not following the standard would make Biopython
inefficient.

One suggestion given in two of the bugs is to change the Alphabet object
but I believe that this is wrong because you do not know which alphabet
to use. If you already know the case then my preferred option is to
change the case of your query. Otherwise you would have to obtain and
use one alphabet for every case used; for example, a user may need two
alphabets to handle upper and lower case, or one combined alphabet.
Also, if mixed case alphabets are used, then an excessive number of
alphabets may be required.

I think that the current approach is to force the user to use uppercase
when interacting with the Alphabet object or anything derived from it
(such as an actual alphabet). While this maintains storage of the input
case, it does not enforce the standard. This is also inefficient because
it requires constant checks for the correct case.

Similar to the first suggestion in Bug 2731, I think that we should
automatically change the case when creating any sequence-related object
and provide a warning that the input has changed. This enforces the
standard and probably requires only small changes to the code, but loses
the format of the input. Outside of Biopython, an example of this is the
web version of NCBI blast, which silently converts the input case of the
query.
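
To make the proposal concrete, here is a minimal sketch of what "change
the case and warn" could look like. The function name
to_upper_with_warning is hypothetical and this is not existing Biopython
code; it only illustrates the behaviour being suggested:

```python
import warnings


def to_upper_with_warning(data):
    """Return data in upper case, warning if anything was changed."""
    upper = data.upper()
    if upper != data:
        # The original mixed/lower case is lost at this point.
        warnings.warn("Sequence converted to upper case")
    return upper


print(to_upper_with_warning("ACGT"))  # already upper case, no warning
print(to_upper_with_warning("acGT"))  # converted, one warning issued
```

Note the trade-off: the standard is enforced in one place, but the
caller can no longer recover the case of the input.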

Less desirable options:
a) Enforce the standard as in Bug 2597, so that an error is returned
for any sequence-related object if the case is incorrect. This is
probably a little too harsh for a difference in case.
b) Use regular expressions to ignore case, but this will create a large
performance penalty, especially where it is not required.

Regards
Bruce


From bugzilla-daemon at portal.open-bio.org  Mon Jan 12 22:43:55 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 12 Jan 2009 17:43:55 -0500
Subject: [Biopython-dev] [Bug 2731] Adding .upper() and .lower() methods to
	the Seq object
In-Reply-To: <bug-2731-42@http.bugzilla.open-bio.org/>
Message-ID: <200901122243.n0CMhtlZ017015@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2731


------- Comment #1 from bsouthey at gmail.com  2009-01-12 17:43 EST -------
(In reply to comment #0)
> As part of making the Seq object more string like (Bug 2351), it would be nice
> to support the .upper() and .lower() methods.

Sure it would be nice in terms of following the string object, but I do not
follow the reasons for adding .upper() and .lower() methods to the Seq object.
If we follow the standards, these should be unnecessary. The only time I can
see you wanting this is when outputting the sequence. In such situations, the
sequence is likely to be a string, which already has these methods.

I do not consider the fact that other applications can handle different cases
a sufficiently compelling reason.

> 
> Doing this elegantly will require different case versions of the alphabets (see
> Bug 2532), perhaps by adding (private) upper and lower methods to the Alphabet
> object itself.
> 
> Alternatively, we can handle this without adding new Alphabets by mapping the
> fixed case IUPAC alphabets to case-less generic alphabets.
> 

These comments suggest that the Seq object needs to be case-aware, which also
affects other methods like string queries. But I think that is a different
issue (whether or not the standards should be enforced) from having these
two methods.

Bruce




From biopython at maubp.freeserve.co.uk  Mon Jan 12 23:04:46 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 12 Jan 2009 23:04:46 +0000
Subject: [Biopython-dev] Alphabet case and standards
In-Reply-To: <496BC300.90003@gmail.com>
References: <496BC300.90003@gmail.com>
Message-ID: <320fb6e00901121504u6e9f3b7fu23e5f2ea25dee003@mail.gmail.com>

On Mon, Jan 12, 2009 at 10:24 PM, Bruce Southey <bsouthey at gmail.com> wrote:
> Hi,
> I am moving a potential discussion away from the bugzilla because it affects
> at least the following Bugs (please add others):
> 2351 (Make Seq more like a string, even subclass string?
> http://bugzilla.open-bio.org/show_bug.cgi?id=2351 ),
> 2532 (Using IUPAC alphabets in mixed case Seq objects
> http://bugzilla.open-bio.org/show_bug.cgi?id=2532 ),
> 2597 (Enforce alphabet letters in Seq objects
> http://bugzilla.open-bio.org/show_bug.cgi?id=2597 )
> 2731 (Adding .upper() and .lower() methods to the Seq object
> http://bugzilla.open-bio.org/show_bug.cgi?id=2731 ).
>
> I am hoping it gets wider feedback than using bugzilla, avoid unnecessary
> duplication and closure of these bugs.

Yes, having a discussion on the mailing list is probably better than
on bugzilla.  I should probably write up my views on this topic
explicitly, but I've tried to do so below in reply to your points.

> From Bug 2351, "Bio.Alphabets.IUPAC defines a number of alphabets with
> defined lists of valid letters which are in upper case ONLY". But various
> applications ignore the alphabet case and hence the standards. So this
> creates the problem of how Biopython should handle alphabet case.
> ...

I don't want to prevent people from using mixed case or lower case
sequences if they want to.  However, I do think doing so with an
alphabet which is intended to be upper case ONLY should be treated
as an error.

We currently have a number of generic alphabets which DO NOT define
a set of valid letters.  We also have some IUPAC derived alphabets
which define a set of upper case only expected letters.

So, if you want to use lower or mixed case sequences in a Seq object,
(1) Use a generic alphabet which does not explicitly define the valid
letters (so any characters are allowed)
(2) Use an explicit alphabet which includes the relevant cases.  This
could be a user defined alphabet, or one added to Biopython.

Most of the time in my personal usage, I don't actually care about
the precise alphabet - the generic DNA/RNA/protein alphabets suffice.
These do not list the expected/allowed letters, and thus can be used
for upper case, lower case or mixed case sequences.  Working with well
defined alphabets is more important when working with things like
BLOSUM matrices.

> One suggestion given in two of the bugs is to change the Alphabet object but
> I believe that this is wrong because you do not know which alphabet to use.

The person creating the Seq object should know what kind of data they
are dealing with, and if they specifically want to use say "mixed case
unambiguous IUPAC DNA" (if this were in Biopython) then that's up to
them.  If you don't know exactly what you are dealing with, fall back
on the generic DNA alphabet, or the generic nucleotide alphabet, or
even the generic single letter alphabet.

> ... Also, if mixed case alphabets are used, then an excessive number
> of alphabets may be required.

We *could* introduce mixed case IUPAC alphabets, and lower case IUPAC
alphabets to complement the existing upper case IUPAC alphabets (see
my patch on 2532).  Yes, this does add a lot of alphabets, and I'm not
entirely keen on this either.  Maybe just adding mixed case versions
would suffice?

> I think that current approach is to force to user to using uppercase when
> interacting with the Alphabet object or derived from it (such as an actual
> alphabet). While this maintains storage of the input case, it does not
> enforce the standard. This is also inefficient because it requires constant
> checks for the correct case.

Right now we don't force the user to do anything.  I would like to
make the alphabet check strict (Bug 2597), or at least give a warning.
Running with this change locally has flagged up several typos in my
unit tests - I think it is a good thing.

> Similar to the first suggestion in Bug 2731, I think that we should
> automatically changes the case when creating any sequence-related object and
> provide a warning that the input has changed. This enforces standard and
> probably requires small changes to the code but loses the format of the
> input. Outside of Biopython, an example of this is the web version of NCBI
> blast silently converts input case of the query.

My personal view on automatically changing the case of the sequence
string when creating a Seq object: NO WAY.  You're throwing away
potentially important data, and also preventing people from working
with mixed case sequences - for no real benefit.

> Less desirable options:
> a) Enforces the standard such as with Bug 2597 so that an error is return
> for any sequence-related object if the case is incorrect. This is probably a
> little too harsh for a difference in case.

It could be done as a warning for a couple of releases, and later an
error.  Why do you think it is too harsh?  Maybe I am being pedantic
here, but lots of code gets written assuming uppercase letters only,
and in this situation having any unwanted lower case caught early is a
good thing.

To my mind the whole point about the user explicitly using for example
the IUPAC protein alphabet is they expect the sequence to comply with
the IUPAC conventions.  I *WANT* to get an error if the sequence
contains something invalid like a "@" character, or anything else not
in the IUPAC definition.  Mixed cases are a special case of this (the
IUPAC standards use upper case).
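
Here is a minimal sketch of the kind of check I mean, written as
stand-alone code rather than against the real Bio.Alphabet classes. The
class names and the check_sequence function are hypothetical, as is the
strict flag, which models "warning for a couple of releases, later an
error":

```python
import warnings


class Alphabet:
    """Generic alphabet: letters is None, so any character is allowed."""
    letters = None


class UnambiguousDNA(Alphabet):
    """Explicit alphabet listing upper case letters only."""
    letters = "GATC"


def check_sequence(data, alphabet, strict=True):
    """Raise ValueError (or warn once if not strict) on invalid letters."""
    if alphabet.letters is None:
        return  # generic alphabet, nothing to check
    for letter in data:
        if letter not in alphabet.letters:
            msg = "Letter %r not in %s" % (letter, type(alphabet).__name__)
            if strict:
                raise ValueError(msg)
            warnings.warn(msg)
            return  # one warning per sequence is enough


check_sequence("gattaca", Alphabet())        # fine, generic alphabet
check_sequence("GATTACA", UnambiguousDNA())  # fine, all letters valid
try:
    check_sequence("GATtACA", UnambiguousDNA())
except ValueError as err:
    print(err)
```

With this arrangement lower or mixed case data is still welcome, it just
has to be paired with an alphabet that allows it.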

> b) Use regular expressions to ignore case but this will create a large
> penalty especially if it is not required.

I'm not sure what you mean here, but I don't think regular expressions
are required.

Peter


From bugzilla-daemon at portal.open-bio.org  Mon Jan 12 23:30:49 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 12 Jan 2009 18:30:49 -0500
Subject: [Biopython-dev] [Bug 2731] Adding .upper() and .lower() methods to
	the Seq object
In-Reply-To: <bug-2731-42@http.bugzilla.open-bio.org/>
Message-ID: <200901122330.n0CNUnG7021141@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2731


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-12 18:30 EST -------
Created an attachment (id=1191)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1191&action=view)
Patch to Bio/Seq.py ONLY adding upper and lower methods

This patch is a proof of principle of how we could add upper and lower methods
while following the strict alphabet checking proposed on Bug 2597.  The code is
a little complicated/nasty in order to localise the change to Bio/Seq.py only.

Here is a usage example with the patch applied,

>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import IUPAC
>>> my_dna = Seq("AGGGTGTTGA",IUPAC.IUPACUnambiguousDNA())
>>> my_dna
Seq('AGGGTGTTGA', IUPACUnambiguousDNA())
>>> my_dna.lower()
Seq('agggtgttga', NucleotideAlphabet())
>>> my_dna.lower().upper()
Seq('AGGGTGTTGA', NucleotideAlphabet())

Note that if we implemented (private) upper and lower methods in the Alphabet
objects as I suggested on Bug 2532, the code in the Seq class would be much
simpler, e.g.

def upper(self):
    return Seq(str(self).upper(), self.alphabet._upper())
def lower(self):
    return Seq(str(self).lower(), self.alphabet._lower())

The generic alphabets (where the list of letters is undefined) would just
return self, while the AlphabetEncoders could also implement these methods
simply.  Individual explicit alphabets (i.e. the IUPAC ones) would have to
define sensible upper/lower mappings - perhaps by defining lower case variants
(see Bug 2532).
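
To show how the pieces would fit together, here is a self-contained
sketch of that delegation. It does not use the real Bio.Seq or
Bio.Alphabet code; the UpperDNA and LowerDNA classes are hypothetical
stand-ins for an explicit alphabet and a lower case variant of it:

```python
class Alphabet:
    """Generic alphabet with no letter list; case changes are no-ops."""
    letters = None

    def _upper(self):
        return self

    def _lower(self):
        return self


class UpperDNA(Alphabet):
    letters = "GATC"

    def _lower(self):
        return LowerDNA()


class LowerDNA(Alphabet):
    letters = "gatc"

    def _upper(self):
        return UpperDNA()


class Seq:
    def __init__(self, data, alphabet):
        self.data = data
        self.alphabet = alphabet

    def __str__(self):
        return self.data

    def upper(self):
        return Seq(str(self).upper(), self.alphabet._upper())

    def lower(self):
        return Seq(str(self).lower(), self.alphabet._lower())


my_dna = Seq("AGGGTGTTGA", UpperDNA())
low = my_dna.lower()
print(str(low), type(low.alphabet).__name__)
```

The Seq class never needs to know which alphabets exist; each alphabet
decides what its upper and lower case counterparts are.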




From bugzilla-daemon at portal.open-bio.org  Tue Jan 13 00:21:42 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 12 Jan 2009 19:21:42 -0500
Subject: [Biopython-dev] [Bug 2731] Adding .upper() and .lower() methods to
	the Seq object
In-Reply-To: <bug-2731-42@http.bugzilla.open-bio.org/>
Message-ID: <200901130021.n0D0LgUu024264@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2731


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1191 is|0                           |1
           obsolete|                            |


------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-12 19:21 EST -------
(From update of attachment 1191)
There are a couple of "if" statements which should be "elif", but otherwise the
patch seems to cover the basics.

However, it does not cover the pathological/evil situation where a LETTER has
been used for a stop codon or gap character.  e.g. Something like this should
happen (assuming Bug 2597 is implemented in order to trigger the exception
shown):

>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import IUPAC, Gapped
>>> my_dna = Seq("AGGGTXGTTGA",Gapped(IUPAC.IUPACUnambiguousDNA(), "x"))
Traceback (most recent call last):
...
ValueError: Letter 'X' not in Gapped(IUPACUnambiguousDNA(), 'x')
>>> my_dna = Seq("AGGGTxGTTGA",Gapped(IUPAC.IUPACUnambiguousDNA(), "x"))
>>> my_dna.lower()
Seq('agggtxgttga', Gapped(DNAAlphabet(), 'x'))
>>> my_dna.lower().upper()
Seq('AGGGTXGTTGA', Gapped(DNAAlphabet(), 'X'))

I think the most elegant way to deal with the AlphabetEncoders (stop and gaps)
is by adding (private) upper/lower methods to the Alphabet objects as I
outlined in comment 2. Patch taking this approach to follow...




From bugzilla-daemon at portal.open-bio.org  Tue Jan 13 00:30:55 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 12 Jan 2009 19:30:55 -0500
Subject: [Biopython-dev] [Bug 2731] Adding .upper() and .lower() methods to
	the Seq object
In-Reply-To: <bug-2731-42@http.bugzilla.open-bio.org/>
Message-ID: <200901130030.n0D0UtHL024905@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2731


------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-12 19:30 EST -------
Created an attachment (id=1192)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1192&action=view)
Patch to Bio/Seq.py and Bio/Alphabet/__init__.py

Implements upper/lower methods in the Seq object, handling the alphabet case
conversion in the Alphabet object using (private) upper/lower methods.  This
could be extended for the IUPAC alphabets if we add lower case variants to
those (see Bug 2532).

This works for the evil example in comment 3 where the case of any extra
characters from an AlphabetEncoder should also be changed.
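
As an illustration only (this is not the attached patch), a wrapper
alphabet can handle the evil case by changing the case of its extra
character and passing the request on to the alphabet it wraps. The
classes below are simplified stand-ins, not the real
Bio.Alphabet.Gapped:

```python
class Alphabet:
    """Generic alphabet; case changes return the same object."""
    letters = None

    def _upper(self):
        return self

    def _lower(self):
        return self


class Gapped:
    """Simplified encoder wrapping an alphabet and adding a gap character."""

    def __init__(self, alphabet, gap_char="-"):
        self.alphabet = alphabet
        self.gap_char = gap_char

    def _upper(self):
        # Change the case of the wrapped alphabet AND of the gap letter.
        return Gapped(self.alphabet._upper(), self.gap_char.upper())

    def _lower(self):
        return Gapped(self.alphabet._lower(), self.gap_char.lower())


evil = Gapped(Alphabet(), "x")
print(evil._upper().gap_char)
print(evil._upper()._lower().gap_char)
```

A conventional gap character such as "-" is unaffected, since it has no
upper or lower case form.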




From dalloliogm at gmail.com  Tue Jan 13 11:49:19 2009
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Tue, 13 Jan 2009 12:49:19 +0100
Subject: [Biopython-dev] Developmental and experimental branches
In-Reply-To: <320fb6e00901100903v1aa0180bsd1ca5335f7da1f7f@mail.gmail.com>
References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com>
	<320fb6e00901090455y334ecebdo55cd7c1a718ab499@mail.gmail.com>
	<496778D2.1050801@gmail.com>
	<5aa3b3570901090917m7c82fb17hb4c93235702b120b@mail.gmail.com>
	<320fb6e00901090928u662023d6rd6f2d82c5fbb7365@mail.gmail.com>
	<8b34ec180901091017o172e4acemf91c0a46a736bdb1@mail.gmail.com>
	<20090109225155.GF4135@sobchak.mgh.harvard.edu>
	<320fb6e00901100646y6132686ap8a928404dd1e36c3@mail.gmail.com>
	<6d941f120901100852g47b10e9ar214cf2ad2b206f6@mail.gmail.com>
	<320fb6e00901100903v1aa0180bsd1ca5335f7da1f7f@mail.gmail.com>
Message-ID: <5aa3b3570901130349u32924629lcf914579de34626e@mail.gmail.com>

On Sat, Jan 10, 2009 at 6:03 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> On Sat, Jan 10, 2009 at 4:52 PM, Tiago Antão <tiagoantao at gmail.com> wrote:
>> On Sat, Jan 10, 2009 at 2:46 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>>> Using the wiki in this way is a nice idea.  Tiago - do you fancy
>>> adding a PopGen page describing the additions you're working on?  As a
>>> bonus, once these do get into the main repository, you may find the
>>> wiki text will be a useful basis for extending the documentation.
>>
>> Where do you want me to link the page on the Wiki?
>
> How about having two pages:
>
> http://biopython.org/wiki/PopGen
> - documentation on the code in the current official release,
> - linked to from the main doc page
>
> http://biopython.org/wiki/PopGen_dev

ok, I have started writing something there..




-- 

My blog on bioinformatics (now in English): http://bioinfoblog.it


From tiagoantao at gmail.com  Tue Jan 13 12:14:05 2009
From: tiagoantao at gmail.com (Tiago Antão)
Date: Tue, 13 Jan 2009 12:14:05 +0000
Subject: [Biopython-dev] Structure and LDNe
In-Reply-To: <496B7F3A.60407@gmail.com>
References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com>
	<496397C9.3030706@gmail.com>
	<6.1.2.0.2.20090108141534.0323a5f0@mail.lmi.net>
	<320fb6e00901090459x52976449gddcc4483699e0f56@mail.gmail.com>
	<496B7F3A.60407@gmail.com>
Message-ID: <6d941f120901130414v3f770f3dy84bc44e4b4a8e25f@mail.gmail.com>

> Linking against header files is a gray area but some views considered it to
> be illegal (see the Linux kernel discussions on that!). It does really
> depend on whether or not the result can be considered to a derivative.

Fortunately this is not the case with Jason's code.
Anyway, if there is agreement on what you said, I think most of the
comments made should be put on the Wiki in some form. I don't mind
drafting something myself based on your comments.


From tiagoantao at gmail.com  Tue Jan 13 12:34:56 2009
From: tiagoantao at gmail.com (Tiago Antão)
Date: Tue, 13 Jan 2009 12:34:56 +0000
Subject: [Biopython-dev] Developmental policies
In-Reply-To: <496B77F1.9060207@gmail.com>
References: <320fb6e00901100946g62e26170o7e112f3b2f357e76@mail.gmail.com>
	<6d941f120901101031w22d9730dy87353cf22646d6fc@mail.gmail.com>
	<320fb6e00901101210k5e715beer240aa9338aa7ce2f@mail.gmail.com>
	<496B77F1.9060207@gmail.com>
Message-ID: <6d941f120901130434u19c87dehe3c1376b4b20031@mail.gmail.com>

> I think we must be explicit in this and ensure that any accepted code is
> BSD-compatible because we can not ensure what people really know. Further
> the license of any application that Biopython interacts with must be clearly
> stated and the developer is responsible to get one if it does not have one.
> That way we know what is included and should help users as well in terms of
> whether or not they can use some application.


One point is not clear to me here: if you only interact with a (say
command-line or web-based) application, is there a problem if that
application has an unspecified license? There are 3 dimensions here
that I find important:
1. If biopython interacts with an application with no license, are
there possible liabilities with regards to the project? The same
question in regards to users?
2. I would note that interaction might be library based (with
linking - where we know problems exist), command-line based (are there
any problems?) and web-based (are there any problems different from
the command-line case?).
3. I would suppose (for licensed non-free apps) that some licenses
might not be clear in regards to this kind of usage. Would it be
necessary to inspect the licenses in detail?

A strict view regarding software without licenses (i.e. no interaction
at all) would require immediate removal of the fdist code (not very
important, it is the part that is probably not used by anyone), no
inclusion of LDNe code and, more importantly, no STRUCTURE interaction
code and no Genepop interaction code (although the file format parser
that is currently inside is OK).

So, the very pertinent questions are:
1. Can biopython command-line interact with applications with no license?
2. Is biopython interacting with applications (command-line or web)
for which the license is not clear regarding interaction with
software?


From p.j.a.cock at googlemail.com  Tue Jan 13 12:54:57 2009
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 13 Jan 2009 12:54:57 +0000
Subject: [Biopython-dev] Developmental policies
In-Reply-To: <6d941f120901130434u19c87dehe3c1376b4b20031@mail.gmail.com>
References: <320fb6e00901100946g62e26170o7e112f3b2f357e76@mail.gmail.com>
	<6d941f120901101031w22d9730dy87353cf22646d6fc@mail.gmail.com>
	<320fb6e00901101210k5e715beer240aa9338aa7ce2f@mail.gmail.com>
	<496B77F1.9060207@gmail.com>
	<6d941f120901130434u19c87dehe3c1376b4b20031@mail.gmail.com>
Message-ID: <320fb6e00901130454i13f1faedw29e049f9b9df9478@mail.gmail.com>

> So, the very pertinent question are:
> 1. Can biopython command-line interact with applications with no license?

I think so, yes.  If there was a license then it may try and impose
rules which could prevent this (possible in some legal
jurisdictions?).  Even "viral" licences like the GPL should be fine in
this context.

However, for the Population Genetics software you are talking about,
trying to get the authors to make their licence explicit would be
worthwhile (even if they just say it's given freely to the public
domain or whatever the terminology is).

> 2. Is biopython interacting with applications (command-line or web)
> for which the license is not clear regarding interaction with
> software?

For command line tools (e.g. ClustalW, BLAST) calling them from a
script is common practice.  In fact, by their nature command line tools
are generally expected to be used in this way.  I think we are OK
here.

For web tools, in some cases the provider provides clear instructions
(e.g. NCBI and BLAST and Entrez).  Another example is Bio.PDB can
fetch files from the FTP site - which is by its nature provided as a
public server.  In other cases things are perhaps a little less clear
cut.  Speaking generally, many websites do have conditions imposed in
their terms of service (e.g. TV listing sites don't want people
"screen scraping" with a script to "steal" the schedule information),
although these may not be legally enforceable.  However, this is
unlikely to be a problem in the academic setting applicable to most
websites Biopython may interact with.

Peter


From bsouthey at gmail.com  Tue Jan 13 16:50:28 2009
From: bsouthey at gmail.com (Bruce Southey)
Date: Tue, 13 Jan 2009 10:50:28 -0600
Subject: [Biopython-dev] Developmental policies
In-Reply-To: <6d941f120901130434u19c87dehe3c1376b4b20031@mail.gmail.com>
References: <320fb6e00901100946g62e26170o7e112f3b2f357e76@mail.gmail.com>	
	<6d941f120901101031w22d9730dy87353cf22646d6fc@mail.gmail.com>	
	<320fb6e00901101210k5e715beer240aa9338aa7ce2f@mail.gmail.com>	
	<496B77F1.9060207@gmail.com>
	<6d941f120901130434u19c87dehe3c1376b4b20031@mail.gmail.com>
Message-ID: <496CC654.5090806@gmail.com>

Tiago Antão wrote:
>> I think we must be explicit in this and ensure that any accepted code is
>> BSD-compatible because we can not ensure what people really know. Further
>> the license of any application that Biopython interacts with must be clearly
>> stated and the developer is responsible to get one if it does not have one.
>> That way we know what is included and should help users as well in terms of
>> whether or not they can use some application.
>>     
>
>
> A point is not clear here to me: If you only interact with an (say
> command-line and web-based) application, is there a problem if that
> application has an unspecified license? There are 3 dimensions here
> that I find important
> 1. If biopython interacts with a application with no license are there
> possible liabilities with regards to the project? The same question in
> regards to users? 
>   
I do not think that there is any real difference between the developer 
and the user as ignorance is usually not a good defense.

If you use code from another application in your project with little or 
no modification (such as rewriting the code into Python) or did 
reverse-engineering or even looked at the code then your application 
could be controlled by the license of that application. Obviously if it 
has a license then you must abide by those terms. If it does not have a 
license and you do not get permission to use that code then you have 
violated the original author's copyrights and you are liable for 
damages. Of course, as in one of the most important open-source related 
cases in the USA, the Jacobsen v. Katzer case (eg 
http://www.groklaw.net/article.php?story=2008081313212422 ) about the 
Java Model Railroad Interface (JMRI), those damages may be nothing.

> 2. I would remember that interaction might be library based (with
> linking - where we know problems exist), command-line based (are there
> any problems?) and web-based (are there any problems different from
> the command-line case?).
>   
Unless the application forbids it then there is no problem on how you 
actually run the application. As Peter said, web tools also have 
conditions that you have to keep or you will find yourself locked out.

The main problem is using someone else's code in your project and the 
real problem is the actual terms of the code used. Using a function from 
that code in yours is a potential violation such as how to parse the 
output especially if it is in a binary format.  If your code clearly 
follows the published documentation or a clean-room approach (see 
http://en.wikipedia.org/wiki/Clean_room_design ) was properly used then 
there should be no problems. Linking only becomes a problem if your code 
can be considered a derivative or the license forbids linking such as 
the GPL but not the LGPL. However, this is a grey area as evident from 
the use of binary drivers in Linux.

> 3. I would suppose (for licensed non-free apps) that some licenses
> might not be clear in regards to this kind of usage. Would it be
> necessary to inspect the licenses in detail?
>   
Yes, you must inspect any license in detail because even downloading the 
code can involve or imply acceptance of the terms. Some licenses, 
usually for commercial applications, are rather nasty in terms of what can 
and can not be done, such as no reverse engineering. Even open source 
licenses like the GPL v3 can have some unexpected side effects (ie 
related to patents). Most non-open source licenses (including academic 
only licenses) that I have seen related to bioinformatics usually are 
aimed at restricting the commercial usage of the code and the subsequent 
distribution of it. But you need to see if there are other restrictions 
involved that limit the output from that application.

> A strict view regarding software without licenses (ie, no interaction
> at all) would require immediate removal of the fdist code (not very
> important, it is the part that is probably not used by anyone). No
> inclusion of LDNe code. And more importantly no STRUCTURE interaction
> code and no Genepop interaction code (although the file format parser
> that currently inside is OK).
>   
If the interaction is just creating inputs, running the standalone 
application and parsing the output, then those interactions should be 
okay. Obviously the code to create the input and parse the output must 
be free of the application like based on public documentation or a 
clean-room approach.
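
As a rough illustration of that kind of arm's-length interaction, here is a minimal sketch (not existing Biopython code). The helper names run_external_tool and parse_table are hypothetical, and the tab-separated "name, value" output format is an assumption made only for the example. The wrapper writes an input file, runs a standalone program on it, and parses the text the program prints, without using any of the program's own code.

```python
import os
import subprocess
import tempfile


def run_external_tool(command, input_text):
    """Write input_text to a temporary file, run command on it, return stdout.

    command is a list such as ["sometool", "--option"] (hypothetical); the
    path of the temporary input file is appended as the last argument.
    """
    fd, path = tempfile.mkstemp(suffix=".txt")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(input_text)
        result = subprocess.run(
            list(command) + [path],
            capture_output=True,
            text=True,
            check=True,  # raise CalledProcessError if the tool fails
        )
        return result.stdout
    finally:
        os.remove(path)


def parse_table(output):
    """Parse lines of the (assumed) form 'name<TAB>value' into a dict."""
    table = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        name, value = line.split("\t", 1)
        table[name] = value.strip()
    return table
```

The parser here is written purely against the assumed output format, which is the point: it depends on what the tool documents, not on how the tool is implemented.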

If the interaction creates a derivative such as when the code of the 
application is required in addition to your code then it is not okay. 
Further, as Peter commented elsewhere, there needs to be strong 
justification to include it into Biopython. Rather I would strongly 
suggest that you try to get your code included in the other application 
as it may help other users and you don't have to maintain a version of 
the original application.

> So, the very pertinent question are:
> 1. Can biopython command-line interact with applications with no license?
>   
Yes, but it must not be considered a derivative of the application or it 
must do so in terms of the license. For example, AlignACE uses the 
Harvard  University license where everyone using it must have their own 
license or it can be run on a second computer provided that only one 
copy is running at a time.

> 2. Is biopython interacting with applications (command-line or web)
> for which the license is not clear regarding interaction with
> software?
>   
I do not know the answer to this question because I do not know or use 
all the applications involved. However, we do need to create a list of 
applications with associated web sites and licenses that Biopython 
'interacts' with which would answer this question.

Regards
Bruce


From bsouthey at gmail.com  Wed Jan 14 20:24:29 2009
From: bsouthey at gmail.com (Bruce Southey)
Date: Wed, 14 Jan 2009 14:24:29 -0600
Subject: [Biopython-dev] Running Biopython tests on windows xp
Message-ID: <496E49FD.4080305@gmail.com>

Hi,
I decided to install Windows on a virtual system, in part to have a Windows 
test system.  I installed Python 2.5, numpy 1.2 and biopython 1.49 using 
binary installers. I am aiming to add the optional software like 
Reportlab and a C compiler.

Is there a way to run the Biopython tests within Python rather than 
using the system command line?

When I run the tests from the command line I get a number of failures 
that I think are due to a lack of a C compiler.
Are these expected or do you want bug reports?

Bruce

C:\Documents and Settings\virtualme\Desktop\biopython-1.49\biopython-1.49>c:\Python25\python.exe setup.py test
running test
test_Ace ... ok
test_AlignIO ... ok
test_BioSQL ... skipping. Install MySQLdb or correct Tests/setup_BioSQL.py (not important if you do not plan to use BioSQL).
ok
test_BioSQL_SeqIO ... skipping. Install MySQLdb or correct Tests/setup_BioSQL.py (not important if you do not plan to use BioSQL).
ok
test_CAPS ... ERROR
test_Clustalw ... ok
test_Clustalw_tool ... skipping. Install clustalw or clustalw2 if you want to use Bio.Clustalw.
ok
test_Cluster ... FAIL
test_CodonTable ... ok
test_CodonUsage ... ok
test_Compass ... ok
test_Crystal ... ok
test_DocSQL ... skipping. Install MySQLdb if you want to use Bio.DocSQL.
ok
test_EmbossPrimer ... ok
test_Entrez ... ok
test_Enzyme ... ok
test_FSSP ... ok
test_Fasta ... ok
test_Fasta2 ... ok
test_File ... ok
test_GACrossover ... ok
test_GAMutation ... ok
test_GAOrganism ... ok
test_GAQueens ... ok
test_GARepair ... ok
test_GASelection ... ok
test_GFF ... skipping. Environment is not configured for this test (not important if you do not plan to use Bio.GFF).
ok
test_GFF2 ... skipping. Install MySQLdb if you want to use Bio.GFF.
ok
test_GenBank ... ok
test_GraphicsChromosome ... skipping. Install reportlab if you want to use Bio.Graphics.
ok
test_GraphicsDistribution ... skipping. Install reportlab if you want to use Bio.Graphics.
ok
test_GraphicsGeneral ... skipping. Install reportlab if you want to use Bio.Graphics.
ok
test_HMMCasino ... ok
test_HMMGeneral ... ok
test_HotRand ... ok
test_IsoelectricPoint ... ok
test_KDTree ... ERROR
test_KEGG ... ok
test_KeyWList ... ok
test_Location ... ok
test_LocationParser ... ok
test_LogisticRegression ... ok
test_MEME ... ok
test_MarkovModel ... ok
test_Medline ... ok
test_NCBIStandalone ... ok
test_NCBIXML ... ok
test_NCBI_qblast ... ok
test_NNExclusiveOr ... ok
test_NNGene ... ok
test_NNGeneral ... ok
test_Nexus ... ok
test_PDB ... ERROR
test_ParserSupport ... ok
test_Pathway ... ok
test_Phd ... ok
test_PopGen_FDist ... skipping. Install FDist if you want to use Bio.PopGen.FDist.
ok
test_PopGen_FDist_nodepend ... ok
test_PopGen_GenePop ... ok
test_PopGen_SimCoal ... skipping. Install SIMCOAL2 if you want to use Bio.PopGen.SimCoal.
ok
test_PopGen_SimCoal_nodepend ... ok
test_ProtParam ... ok
test_Registry ... ok
test_Restriction ... ERROR
test_SCOP_Astral ... ok
test_SCOP_Cla ... FAIL
test_SCOP_Des ... ok
test_SCOP_Dom ... ok
test_SCOP_Hie ... ok
test_SCOP_Raf ... FAIL
test_SCOP_Residues ... ok
test_SCOP_Scop ... ok
test_SProt ... ok
test_SVDSuperimposer ... ok
test_SeqIO ... ok
test_SeqIO_online ... ok
test_SeqUtils ... ok
test_SubsMat ... ok
test_UniGene ... ok
test_Wise ... skipping. Don't know how to find the Wise2 tool dnal on Windows.
ok
test_align ... ok
test_docstrings ... ok
test_geo ... ok
test_interpro ... ok
test_kNN ... ok
test_lowess ... ok
test_pairwise2 ... ok
test_prodoc ... ok
test_property_manager ... ok
test_prosite ... ok
test_prosite2 ... ok
test_psw ... skipping. Don't know how to find the Wise2 tool dnal on Windows.
ok
test_seq ... ok
test_translate ... ok
test_trie ... ERROR
test_triefind ... ERROR

======================================================================
ERROR: test_CAPS
----------------------------------------------------------------------
Traceback (most recent call last):
  File "run_tests.py", line 125, in runTest
    self.runSafeTest()
  File "run_tests.py", line 138, in runSafeTest
    cur_test = __import__(self.test_name)
  File "test_CAPS.py", line 3, in <module>
    from Bio.Restriction import *
  File "C:\Documents and Settings\virtualme\Desktop\biopython-1.49\biopython-1.49\Bio\Restriction\__init__.py", line 61, in <module>
    from Bio.Restriction.Restriction import *
  File "C:\Documents and Settings\virtualme\Desktop\biopython-1.49\biopython-1.49\Bio\Restriction\Restriction.py", line 96, in <module>
    from Bio.Restriction.PrintFormat import PrintFormat
  File "C:\Documents and Settings\virtualme\Desktop\biopython-1.49\biopython-1.49\Bio\Restriction\PrintFormat.py", line 14, in <module>
    from Bio.Restriction.DNAUtils import complement
ImportError: No module named DNAUtils

======================================================================
ERROR: test_KDTree
----------------------------------------------------------------------
Traceback (most recent call last):
  File "run_tests.py", line 125, in runTest
    self.runSafeTest()
  File "run_tests.py", line 138, in runSafeTest
    cur_test = __import__(self.test_name)
  File "test_KDTree.py", line 10, in <module>
    from Bio.KDTree.KDTree import _neighbor_test, _test
  File "C:\Documents and Settings\virtualme\Desktop\biopython-1.49\biopython-1.49\Bio\KDTree\__init__.py", line 10, in <module>
    from KDTree import KDTree
  File "C:\Documents and Settings\virtualme\Desktop\biopython-1.49\biopython-1.49\Bio\KDTree\KDTree.py", line 20, in <module>
    from Bio.KDTree import _CKDTree
ImportError: cannot import name _CKDTree

======================================================================
ERROR: test_PDB
----------------------------------------------------------------------
Traceback (most recent call last):
  File "run_tests.py", line 125, in runTest
    self.runSafeTest()
  File "run_tests.py", line 138, in runSafeTest
    cur_test = __import__(self.test_name)
  File "test_PDB.py", line 98, in <module>
    run_test()
  File "test_PDB.py", line 90, in run_test
    quick_neighbor_search_test()
  File "test_PDB.py", line 19, in quick_neighbor_search_test
    from Bio.PDB.NeighborSearch import NeighborSearch
  File "C:\Documents and Settings\virtualme\Desktop\biopython-1.49\biopython-1.49\Bio\PDB\NeighborSearch.py", line 8, in <module>
    from Bio.KDTree import *
  File "C:\Documents and Settings\virtualme\Desktop\biopython-1.49\biopython-1.49\Bio\KDTree\__init__.py", line 10, in <module>
    from KDTree import KDTree
  File "C:\Documents and Settings\virtualme\Desktop\biopython-1.49\biopython-1.49\Bio\KDTree\KDTree.py", line 20, in <module>
    from Bio.KDTree import _CKDTree
ImportError: cannot import name _CKDTree

======================================================================
ERROR: test_Restriction
----------------------------------------------------------------------
Traceback (most recent call last):
  File "run_tests.py", line 125, in runTest
    self.runSafeTest()
  File "run_tests.py", line 138, in runSafeTest
    cur_test = __import__(self.test_name)
  File "test_Restriction.py", line 8, in <module>
    from Bio.Restriction import *
  File "C:\Documents and Settings\virtualme\Desktop\biopython-1.49\biopython-1.49\Bio\Restriction\__init__.py", line 61, in <module>
    from Bio.Restriction.Restriction import *
  File "C:\Documents and Settings\virtualme\Desktop\biopython-1.49\biopython-1.49\Bio\Restriction\Restriction.py", line 96, in <module>
    from Bio.Restriction.PrintFormat import PrintFormat
  File "C:\Documents and Settings\virtualme\Desktop\biopython-1.49\biopython-1.49\Bio\Restriction\PrintFormat.py", line 13, in <module>
    from Bio.Restriction import RanaConfig as RanaConf
ImportError: cannot import name RanaConfig

======================================================================
ERROR: test_trie
----------------------------------------------------------------------
Traceback (most recent call last):
  File "run_tests.py", line 125, in runTest
    self.runSafeTest()
  File "run_tests.py", line 138, in runSafeTest
    cur_test = __import__(self.test_name)
  File "test_trie.py", line 6, in <module>
    from Bio import trie
ImportError: cannot import name trie

======================================================================
ERROR: test_triefind
----------------------------------------------------------------------
Traceback (most recent call last):
  File "run_tests.py", line 125, in runTest
    self.runSafeTest()
  File "run_tests.py", line 138, in runSafeTest
    cur_test = __import__(self.test_name)
  File "test_triefind.py", line 6, in <module>
    from Bio import trie
ImportError: cannot import name trie

======================================================================
FAIL: test_Cluster
----------------------------------------------------------------------
Traceback (most recent call last):
  File "run_tests.py", line 125, in runTest
    self.runSafeTest()
  File "run_tests.py", line 162, in runSafeTest
    expected_handle)
  File "run_tests.py", line 263, in compare_output
    % (repr(output_line), repr(expected_line))
AssertionError:
Output  : 'test_clusterdistance (test_Cluster.TestCluster) ... ERROR\n'
Expected: 'test_clusterdistance (test_Cluster.TestCluster) ... ok\n'

======================================================================
FAIL: test_SCOP_Cla
----------------------------------------------------------------------
Traceback (most recent call last):
  File "run_tests.py", line 125, in runTest
    self.runSafeTest()
  File "run_tests.py", line 162, in runSafeTest
    expected_handle)
  File "run_tests.py", line 263, in compare_output
    % (repr(output_line), repr(expected_line))
AssertionError:
Output  : 'testIndex (test_SCOP_Cla.ClaTests) ... ERROR\n'
Expected: 'testIndex (test_SCOP_Cla.ClaTests) ... ok\n'

======================================================================
FAIL: test_SCOP_Raf
----------------------------------------------------------------------
Traceback (most recent call last):
  File "run_tests.py", line 125, in runTest
    self.runSafeTest()
  File "run_tests.py", line 162, in runSafeTest
    expected_handle)
  File "run_tests.py", line 263, in compare_output
    % (repr(output_line), repr(expected_line))
AssertionError:
Output  : 'testSeqMapIndex (test_SCOP_Raf.RafTests) ... ERROR\n'
Expected: 'testSeqMapIndex (test_SCOP_Raf.RafTests) ... ok\n'

----------------------------------------------------------------------
Ran 96 tests in 86.153s

FAILED (failures=3, errors=6)

C:\Documents and Settings\virtualme\Desktop\biopython-1.49\biopython-1.49>


From tiagoantao at gmail.com  Wed Jan 14 20:52:58 2009
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Wed, 14 Jan 2009 20:52:58 +0000
Subject: [Biopython-dev] Developmental and experimental branches
In-Reply-To: <5aa3b3570901130349u32924629lcf914579de34626e@mail.gmail.com>
References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com>
	<496778D2.1050801@gmail.com>
	<5aa3b3570901090917m7c82fb17hb4c93235702b120b@mail.gmail.com>
	<320fb6e00901090928u662023d6rd6f2d82c5fbb7365@mail.gmail.com>
	<8b34ec180901091017o172e4acemf91c0a46a736bdb1@mail.gmail.com>
	<20090109225155.GF4135@sobchak.mgh.harvard.edu>
	<320fb6e00901100646y6132686ap8a928404dd1e36c3@mail.gmail.com>
	<6d941f120901100852g47b10e9ar214cf2ad2b206f6@mail.gmail.com>
	<320fb6e00901100903v1aa0180bsd1ca5335f7da1f7f@mail.gmail.com>
	<5aa3b3570901130349u32924629lcf914579de34626e@mail.gmail.com>
Message-ID: <6d941f120901141252x1a1088f9n7f30d894f35c18ab@mail.gmail.com>

>> http://biopython.org/wiki/PopGen_dev
>
> ok, I have started writing something there..

I've edited the development one. I would recommend anyone interested
in tracking the changes to watch the page.


From biopython at maubp.freeserve.co.uk  Wed Jan 14 21:43:33 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 14 Jan 2009 21:43:33 +0000
Subject: [Biopython-dev] Running Biopython tests on windows xp
In-Reply-To: <496E49FD.4080305@gmail.com>
References: <496E49FD.4080305@gmail.com>
Message-ID: <320fb6e00901141343r529df66did6e172343592843d@mail.gmail.com>

On Wed, Jan 14, 2009 at 8:24 PM, Bruce Southey <bsouthey at gmail.com> wrote:
> Hi,
> I decided to install windows on a virtual system part to have a windows test
> system.  I installed Python 2.5, numpy 1.2 and biopython 1.49 using binary
> installers. I am aiming to get add the optional software like Reportlab and
> a C compiler.

If you are installing Biopython using our Windows Installer then you
shouldn't need a C compiler.

If you would like to install from source, then yes, you will need a C
compiler.  You can either try the appropriate MS compiler for your
version of python, or we suggest Mingw32 from cygwin.

> Is there a way to run the Biopython tests within Python rather than using
> the system command line?

Not really - why do you want to?  I suppose you could use python to
invoke the command "python run_tests.py".
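
For example, something along these lines might do (a minimal, untested-against-Biopython sketch using the standard library subprocess module; the helper name run_script is made up for illustration, and the "Tests" directory in the usage comment is an assumption about where the source was unpacked):

```python
import subprocess
import sys


def run_script(script, args=(), cwd=None):
    """Run a Python script with the current interpreter.

    Returns a tuple (exit_code, combined stdout and stderr text).
    """
    cmd = [sys.executable, script] + list(args)
    proc = subprocess.run(
        cmd,
        cwd=cwd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # keep error text in order with normal output
        universal_newlines=True,
    )
    return proc.returncode, proc.stdout


# Hypothetical usage from an interactive session:
#   code, output = run_script("run_tests.py", cwd="Tests")
#   print(output)
```

This still starts a separate Python process, so it is the command line in disguise rather than a true in-process test run.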

> When I run the tests from the command like I get a number a failures that I
> think are due to a lack of a C compiler.
>
> Are these expected or do you want bug reports?

These are not expected.  The whole test suite passes for me on Windows
where I have installed Biopython from source.

So you installed Biopython using our Windows Installer - how did you
get the unit tests?  I'm pretty sure the SCOP failures are due to the
files under Tests\SCOP having Unix line endings instead of Windows
line endings (we've fixed some similar issues in the past).  Note that
both the source code archives as *.zip and *.tar.gz use Unix line
endings internally, but if you used CVS it should have got them with
Windows line endings for you.

However, most of your test failures do seem to be related to C code in
some way.  I wonder if this is linked to the virtual environment?  I
should be able to try the Biopython 1.49 installer with Python 2.5 on
a Windows machine myself to check that...

The list of failures:
> test_CAPS ... ERROR
> test_Cluster ... FAIL
> test_KDTree ... ERROR
> test_PDB ... ERROR
> test_Restriction ... ERROR
> test_SCOP_Cla ... FAIL
> test_SCOP_Raf ... FAIL
> test_trie ... ERROR
> test_triefind ... ERROR

And some comments on the messages:

> ERROR: test_CAPS
> ...
>   from Bio.Restriction.DNAUtils import complement
> ImportError: No module named DNAUtils

Strange.  Note Bio.Restriction.DNAUtils is a C module.

> ERROR: test_KDTree
> ...
>   from Bio.KDTree import _CKDTree
> ImportError: cannot import name _CKDTree

Again, Bio.KDTree._CKDTree is a C module.

> ERROR: test_PDB
> ...
>   from Bio.KDTree import _CKDTree
> ImportError: cannot import name _CKDTree

Same failure as test_KDTree

> ERROR: test_Restriction
> ...
>   from Bio.Restriction import RanaConfig as RanaConf
> ImportError: cannot import name RanaConfig

Odd.  RanaConfig is a pure python module, and pretty short too.

> ERROR: test_trie
> ...
>   from Bio import trie
> ImportError: cannot import name trie

Bio.trie is another C module

> ERROR: test_triefind
> ...
>   from Bio import trie
> ImportError: cannot import name trie

Same error as test_trie above.

> FAIL: test_Cluster
> ...
> Output  : 'test_clusterdistance (test_Cluster.TestCluster) ... ERROR\n'
> Expected: 'test_clusterdistance (test_Cluster.TestCluster) ... ok\n'

Could you run this test directly (python test_Cluster.py) which should
give a more helpful message.  But again, this module does include some
C code....

> FAIL: test_SCOP_Cla
> ...
> Output  : 'testIndex (test_SCOP_Cla.ClaTests) ... ERROR\n'
> Expected: 'testIndex (test_SCOP_Cla.ClaTests) ... ok\n'

I think this is just a new line issue.

> FAIL: test_SCOP_Raf
> ...
> Output  : 'testSeqMapIndex (test_SCOP_Raf.RafTests) ... ERROR\n'
> Expected: 'testSeqMapIndex (test_SCOP_Raf.RafTests) ... ok\n'

I think this is just a new line issue.

Peter


From bsouthey at gmail.com  Wed Jan 14 22:48:27 2009
From: bsouthey at gmail.com (Bruce Southey)
Date: Wed, 14 Jan 2009 16:48:27 -0600
Subject: [Biopython-dev] Running Biopython tests on windows xp
In-Reply-To: <320fb6e00901141343r529df66did6e172343592843d@mail.gmail.com>
References: <496E49FD.4080305@gmail.com>
	<320fb6e00901141343r529df66did6e172343592843d@mail.gmail.com>
Message-ID: <496E6BBB.2020506@gmail.com>

Peter wrote:
> These are not expected.  The whole test suite passes for me on Windows
> where I have installed Biopython from source.
>
> So you installed Biopython using our Window Installer - how did you
> get the unit tests?  I'm pretty sure the SCOP failures are due to the
> files under Tests\SCOP having Unix line endings instead of Windows
> line endings (we're fixed some similar issues in the past).  Note that
> both the source code archives as *.zip and *.tar.gz use Unix line
> endings internally, but if you used CVS it should have got them with
> Windows line endings for you.
>
> However, most of your test failures do seem to be related to C code in
> some way.  I wonder if this is linked to the virtual environment?  I
> should be able to try the Biopython 1.49 installer with Python 2.5 on
> a Windows machine myself to check that...
>
> The list of failures:
>   
>> test_CAPS ... ERROR
>> test_Cluster ... FAIL
>> test_KDTree ... ERROR
>> test_PDB ... ERROR
>> test_Restriction ... ERROR
>> test_trie ... ERROR
>> test_triefind ... ERROR
>>     
Using IDLE, 'from Bio.Restriction import *' works correctly.

These ones are failures to find the correct biopython installation. 
Both  'python setup.py test' and 'python run_tests.py' are assuming that 
I have built from source and everything is in the local directory. But 
that assumption is wrong since I used the Biopython binary installer so 
technically the tests I run are invalid. The difference for these 
failures can be seen here:

C:\Documents and Settings\virtualme\Desktop\Python_packages\biopython-1.49\biopython-1.49\Tests>c:\Python25\python.exe test_KDTree.py
Passed.
Passed.
Passed.
Passed.
Passed.
Passed.
Passed.
Passed.
Passed.
Passed.
Passed.
Passed.
Passed.
Passed.
Passed.
Passed.
Passed.
Passed.
Passed.
Passed.

C:\Documents and Settings\virtualme\Desktop\Python_packages\biopython-1.49\biopython-1.49\Tests>c:\Python25\python.exe run_tests.py test_KDTree.py
test_KDTree ... ERROR

======================================================================
ERROR: test_KDTree
----------------------------------------------------------------------
Traceback (most recent call last):
  File "run_tests.py", line 125, in runTest
    self.runSafeTest()
  File "run_tests.py", line 138, in runSafeTest
    cur_test = __import__(self.test_name)
  File "test_KDTree.py", line 10, in <module>
    from Bio.KDTree.KDTree import _neighbor_test, _test
  File "C:\Documents and Settings\virtualme\Desktop\biopython-1.49\biopython-1.49\Bio\KDTree\__init__.py", line 10, in <module>
  File "C:\Documents and Settings\virtualme\Desktop\biopython-1.49\biopython-1.49\Bio\KDTree\KDTree.py", line 20, in <module>
ImportError: cannot import name _CKDTree

----------------------------------------------------------------------
Ran 1 test in 0.100s

FAILED (errors=1)


For the SCOP tests, this is, as you say, an 'end of line' issue between 
windows and Linux. I opened 'and dir.cla.scop.txt_test' with wordpad and 
saved it with a new name. The line from testIndex in test_SCOP_Cla.py 
that gave the error index['d4hbia_'] works with the new file but not the 
old file.
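
The same conversion can be scripted instead of using wordpad. This is only a sketch; the function name convert_line_endings is made up for illustration. It works on raw bytes so that Python's own newline translation does not get in the way.

```python
def convert_line_endings(src_path, dst_path, newline=b"\r\n"):
    """Copy src_path to dst_path with every line ending set to newline."""
    with open(src_path, "rb") as src:
        data = src.read()
    # Normalise to "\n" first so mixed files do not end up with "\r\r\n".
    data = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    if newline != b"\n":
        data = data.replace(b"\n", newline)
    with open(dst_path, "wb") as dst:
        dst.write(data)
```

Passing newline=b"\n" gives the Unix form back, which makes it easy to check a test against both variants of the same file.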

I also installed reportlab and biosql and these pass the tests (except 
for the mysql warning with Biosql that Peter reported).

Regards
Bruce


From biopython at maubp.freeserve.co.uk  Wed Jan 14 23:27:27 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 14 Jan 2009 23:27:27 +0000
Subject: [Biopython-dev] Running Biopython tests on windows xp
In-Reply-To: <496E6BBB.2020506@gmail.com>
References: <496E49FD.4080305@gmail.com>
	<320fb6e00901141343r529df66did6e172343592843d@mail.gmail.com>
	<496E6BBB.2020506@gmail.com>
Message-ID: <320fb6e00901141527t1d2be466uf0b87f12b0d51d3a@mail.gmail.com>

On Wed, Jan 14, 2009 at 10:48 PM, Bruce Southey <bsouthey at gmail.com> wrote:
> Using IDLE, 'from Bio.Restriction import *' works correctly.
>
> These ones are failures to find the correct biopython installation. Both
>  'python setup.py test' and 'python run_tests.py' are assuming that I have
> built from source and everything is in the local directory. But that
> assumption is wrong since I used the Biopython binary installer so
> technically the tests I run are invalid.

I think I understand what's going on now.  All these failures are
essentially due to the unusual and unexpected setup on your machine
(or for the SCOP tests, the line endings).  You still didn't explain
how/where you installed the test scripts etc, but what I think is
happening is the following:

Your official installation (including the compiled C code) created
using the Windows Installer is in one place, say under
C:\XXX\site-packages for the sake of discussion.

You've unpacked the source code in another location, and are trying to
run the test suite there.  This set of files will NOT have the
compiled C code - and thus running some of the tests via run_tests.py
will fail.  If you run individual test_XXX.py files this should use
the system installed files under C:\XXX\site-packages and so the test
should work.

It would be a bit of a hack, but you can probably overcome this by
manually copying the installed compiled modules from
C:\XXX\site-packages into the unpacked source code (under a suitably
named build sub directory), or moving the Test suite next to the
installed code.

Alternatively, you could try editing run_tests.py to comment out the
path "magic" so that it just uses the system installation of Biopython
(rather than trying to use the local copy it expects you to have just
built from source), i.e. try commenting out these two lines in
run_tests.py found near the start of the main function:

sys.path.insert(1, source_path)
sys.path.insert(1, build_path)
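
To see which copy of a package Python would pick up for a given sys.path, something like the following could help. It is a sketch using the standard library importlib.util.find_spec; the helper name locate is hypothetical and this is not part of run_tests.py.

```python
import importlib.util


def locate(module_name):
    """Return the file a top-level module would be loaded from.

    Returns None if the module cannot be found on the current sys.path.
    """
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return None
    return spec.origin


# Hypothetical usage: print(locate("Bio"))
# Entries earlier in sys.path win, so after the two insert calls above
# build_path is searched before source_path, and both are searched before
# site-packages.
```

Printing the location before and after commenting out those two lines should show the import switching between the unpacked source tree and the installed copy.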

However, I'm no longer surprised that the C code tests are failing,
and don't think this is a bug per se.

> For the SCOP tests, this is as you say, a 'end of line' issue between
> windows and Linux. I opened 'and dir.cla.scop.txt_test' with wordpad and
> saved it with a new name. The line from testIndex in test_SCOP_Cla.py that
> gave the error index['d4hbia_'] works with the new file but not the old
> file.

Good to confirm that.  If you spot an easy cross platform fix so that
the SCOP code can cope with either line ending that would be good, but
I didn't consider this worth spending much time on.
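
One general way to make an offset based index indifferent to line endings is to do all the reading in binary mode, so that the stored offsets are plain byte positions. The sketch below shows the idea only. It is not the Bio.SCOP implementation, the function names are made up, and I am assuming the key is the first whitespace-separated field and that lines starting with "#" are comments.

```python
def build_offset_index(path):
    """Map the first field of each data line to its byte offset in the file."""
    index = {}
    with open(path, "rb") as handle:
        while True:
            offset = handle.tell()
            line = handle.readline()
            if not line:
                break  # end of file
            if line.startswith(b"#") or not line.strip():
                continue  # skip comments and blank lines
            key = line.split()[0].decode("ascii")
            index[key] = offset
    return index


def fetch_line(path, index, key):
    """Return the line for key with any trailing CR/LF removed."""
    with open(path, "rb") as handle:
        handle.seek(index[key])
        return handle.readline().rstrip(b"\r\n").decode("ascii")
```

Because nothing is translated on the way in, the same code returns the same record from a file with Unix or with Windows line endings.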

> I also installed reportlab and biosql and these pass the tests (except for
> the mysql warning with Biosql that Peter reported).

Good.  Out of interest, which BioSQL warning are you talking about?

Peter


From bsouthey at gmail.com  Thu Jan 15 03:10:30 2009
From: bsouthey at gmail.com (Bruce Southey)
Date: Wed, 14 Jan 2009 21:10:30 -0600
Subject: [Biopython-dev] Running Biopython tests on windows xp
In-Reply-To: <320fb6e00901141527t1d2be466uf0b87f12b0d51d3a@mail.gmail.com>
References: <496E49FD.4080305@gmail.com>
	<320fb6e00901141343r529df66did6e172343592843d@mail.gmail.com>
	<496E6BBB.2020506@gmail.com>
	<320fb6e00901141527t1d2be466uf0b87f12b0d51d3a@mail.gmail.com>
Message-ID: <bbcd77d00901141910s304ef76brc92257af459361b0@mail.gmail.com>

On Wed, Jan 14, 2009 at 5:27 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> On Wed, Jan 14, 2009 at 10:48 PM, Bruce Southey <bsouthey at gmail.com> wrote:
>> Using IDLE, 'from Bio.Restriction import *' works correctly.
>>
>> These ones are failures to find the correct biopython installation. Both
>>  'python setup.py test' and 'python run_tests.py' are assuming that I have
>> built from source and everything is in the local directory. But that
>> assumption is wrong since I used the Biopython binary installer so
>> technically the tests I run are invalid.
>
> I think I understand what's going on now.  All these failures are
> essentially due to the unusual and unexpected setup on your machine
> (or for the SCOP tests, the line endings).

I do not see it as unusual as it does follow the instructions. But
these clearly need some enhancement to address perhaps a variation of
one of the options below.

I am now curious about what happens under Linux distros because these
may have the same issue.

> You still didn't explain
> how/where you installed the test scripts etc, but what I think is
> happening is the following:
>
> Your official installation (including the compiled C code) created
> using the Windows Installer is in one place, say under
> C:\XXX\site-packages for the sake of discussion.
>
> You've unpacked the source code in another location, and are trying to
> run the test suite there.  This set of files will NOT have the
> compiled C code - and thus running some of the tests via run_tests.py
> will fail.  If you run individual test_XXX.py files this should use
> the system installed files under C:\XXX\site-packages and so the test
> should work.

Correct!

The installation documentation is lacking at least for the binary
installer. Depending on what happens, I will write down this
information.

Would it be a hassle to include the tests with the binary installer?
At least some of the tests should work if they are run from that directory.

>
> It would be a bit of a hack, but you can probably overcome this by
> manually copying the installed compiled modules from
> C:\XXX\site-packages into the unpacked source code (under a suitably
> named build sub directory), or moving the Test suite next to the
> installed code.

While this would work for the binary installer, I do not think it is
a suitable solution for building it from source - especially if someone
has the binary installer and is building but not necessarily installing
from source.

>
> Alternatively, you could try editing run_tests.py to comment out the
> path "magic" so that it just uses the system installation of Biopython
> (rather than trying to use the local copy it expects you to have just
> built from source), i.e. try commenting out these two lines in
> run_tests.py found near the start of the main function:
>
> sys.path.insert(1, source_path)
> sys.path.insert(1, build_path)

I think the best solution is to fix this part because these assume the
location of the source and build directories even if these are not
really present. I would suggest we add a new commandline option that
causes the source_path and/or build_path variables to be undefined
forcing Python to use the installed versions. Passing a user-specified
path is also an option but these can get long.


> However, I'm no longer surprised that the C code tests are failing,
> and don't think this is a bug per se.

Agreed - just a case that has not been addressed yet.

>
>> For the SCOP tests, this is as you say, a 'end of line' issue between
>> windows and Linux. I opened 'and dir.cla.scop.txt_test' with wordpad and
>> saved it with a new name. The line from testIndex in test_SCOP_Cla.py that
>> gave the error index['d4hbia_'] works with the new file but not the old
>> file.
>
> Good to confirm that.  If you spot an easy cross platform fix so that
> the SCOP code can cope with either line ending that would be good, but
> I didn't consider this worth spending much time on.

When I get to my system, I will see if my Linux system will accept the
file correctly because the other SCOP tests did work. If I get time I
will try to look at that as I looked at the function and I think it is
just the way the file is being used.
>
>> I also installed reportlab and biosql and these pass the tests (except for
>> the mysql warning with Biosql that Peter reported).
>
> Good.  Out of interest, which BioSQL warning are you talking about?
>
> Peter

Sorry, I do not have that handy but it is a deprecation one for a
setting that will be gone in MySQL 5.2.

Bruce


From biopython at maubp.freeserve.co.uk  Thu Jan 15 12:46:21 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 15 Jan 2009 12:46:21 +0000
Subject: [Biopython-dev] Running Biopython tests on windows xp
In-Reply-To: <bbcd77d00901141910s304ef76brc92257af459361b0@mail.gmail.com>
References: <496E49FD.4080305@gmail.com>
	<320fb6e00901141343r529df66did6e172343592843d@mail.gmail.com>
	<496E6BBB.2020506@gmail.com>
	<320fb6e00901141527t1d2be466uf0b87f12b0d51d3a@mail.gmail.com>
	<bbcd77d00901141910s304ef76brc92257af459361b0@mail.gmail.com>
Message-ID: <320fb6e00901150446j57748cf0mb493601444a9422d@mail.gmail.com>

>>
>> I think I understand what's going on now.  All these failures are
>> essentially due to the unusual and unexpected setup on your machine
>> (or for the SCOP tests, the line endings).
>
> I do not see it as unusual as it does follow the instructions. But
> these clearly need some enhancement to address perhaps a variation of
> one of the options below.

There are no instructions on how to install Biopython on Windows using
the provided installer and then run the unit tests - so I don't
understand what you mean by you followed the instructions.  If the
installer came with the unit tests then this would be sensible.

Right now the only documented way to run the unit tests is part of an
installation from source.

>> You've unpacked the source code in another location, and are trying to
>> run the test suite there.  This set of files will NOT have the
>> compiled C code - and thus running some of the tests via run_tests.py
>> will fail.  If you run individual test_XXX.py files this should use
>> the system installed files under C:\XXX\site-packages and so the test
>> should work.
>
> Correct!
>
> The installation documentation is lacking at least for the binary
> installer. Depending on what happens, I will write down this
> information.
>
> Would it be a hassle to include the tests with the binary installer?

I don't know enough about distutils to answer that.  So the short
answer is yes, it might be a hassle.

> At least some of the tests should work if they are run from that directory.

Which directory?

>> It would be a bit of a hack, but you can probably overcome this by
>> manually copying the installed compiled modules from
>> C:\XXX\site-packages into the unpacked source code (under a suitably
>> named build sub directory), or moving the Test suite next to the
>> installed code.
>
> While this would work for the binary installer, I do not think it is
> a suitable solution for building it from source - especially if someone
> has the binary installer and is building but not necessarily installing
> from source.

The hack suggested was specifically for combining the installed files
from the Windows installer with the test suite by hand - you don't
need to do anything special if you are building from source.  The
current run_tests.py should work perfectly for anyone building from
source (on Windows, Linux and Mac).  You can (and ideally should)
build biopython, and then run the tests BEFORE installing it.

>> Alternatively, you could try editing run_tests.py to comment out the
>> path "magic" so that it just uses the system installation of Biopython
>> (rather than trying to use the local copy it expects you to have just
>> built from source), i.e. try commenting out these two lines in
>> run_tests.py found near the start of the main function:
>>
>> sys.path.insert(1, source_path)
>> sys.path.insert(1, build_path)
>
> I think the best solution is to fix this part because these assume the
> location of the source and build directories even if these are not
> really present.

If you are building from source this is a safe assumption (and in fact
the code does check they exist).  We WANT to run the tests using the
just built and not yet installed files!

> I would suggest we add a new commandline option that
> causes the source_path and/or build_path variables to be undefined
> forcing Python to use the installed versions. Passing a user-specified
> path is also an option but these can get long.

Yes, an option to run_tests.py to use the system installed version of
Biopython could solve this particular situation.  Alternatively, and
perhaps more simply for the end user, we could add a prompt if there
is no build directory to ask the user if they want to run the tests
using an already installed version of Biopython.  I might have time to
come up with a patch for this...
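
Roughly, the logic could look like the sketch below.  The function name
choose_biopython and the use_installed flag are hypothetical and are not in
the current run_tests.py; only the two sys.path.insert lines come from it.
Treat it as an untested outline of the idea rather than the eventual patch.

```python
import os
import sys


def choose_biopython(source_path, build_path, use_installed=False):
    """Decide which Biopython the tests will import.

    Returns "build" if the freshly built copy was put on sys.path, or
    "installed" if sys.path was left alone so the system copy is used.
    Hypothetical helper, for illustration only.
    """
    if not use_installed and os.path.isdir(build_path):
        # Same two lines as in run_tests.py today: the build directory
        # ends up ahead of the source directory on the search path.
        sys.path.insert(1, source_path)
        sys.path.insert(1, build_path)
        return "build"
    # No build directory (or the user asked for the installed version):
    # leave sys.path untouched and rely on the normal import lookup.
    return "installed"
```

The caller could then print a short message saying which copy is in use, so
the user is not left guessing in the no-build-directory case.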

>> However, I'm no longer surprised that the C code tests are failing,
>> and don't think this is a bug per se.
>
> Agreed - just a case that has not been addressed yet.

----------------------------------------------------------------------------------------------

>>> I also installed reportlab and biosql and these pass the tests (except for
>>> the mysql warning with Biosql that Peter reported).
>>
>> Good.  Out of interest, which BioSQL warning are you talking about?
>>
>> Peter
>
> Sorry, I do not have that handy but it is a deprecation one for a
> setting that will be gone in MySQL 5.2.

You might be referring to BioSQL Bug 2568,
http://bugzilla.open-bio.org/show_bug.cgi?id=2568

Peter


From bugzilla-daemon at portal.open-bio.org  Thu Jan 15 14:37:57 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 15 Jan 2009 09:37:57 -0500
Subject: [Biopython-dev] [Bug 2733] New: Unit tests incorrectly assume that
	Biopthyon was built from source
Message-ID: <bug-2733-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2733

           Summary: Unit tests incorrectly assume that Biopthyon was built
                    from source
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: All
            Status: NEW
          Severity: minor
          Priority: P4
         Component: Unit Tests
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: bsouthey at gmail.com


If Biopython is not built from source and the tests are run from a different
place than the installation, the tests that use C objects fail because these
are not found (an example is below).

Currently the test environment uses the Biopython in the build directory. It
would be nice to be able to optionally specify some other Biopython such as the
installed version using say a command line argument.

Example of a failure:

======================================================================
ERROR: test_KDTree                                                    
----------------------------------------------------------------------
Traceback (most recent call last):                                    
  File "run_tests.py.orig", line 125, in runTest                      
    self.runSafeTest()                                                
  File "run_tests.py.orig", line 138, in runSafeTest                  
    cur_test = __import__(self.test_name)                             
  File "test_KDTree.py", line 10, in <module>                         
    from Bio.KDTree.KDTree import _neighbor_test, _test               
  File "/home/bsouthey/python/biopython_cvs/biopython/Bio/KDTree/__init__.py",
line 10, in <module>
    from KDTree import KDTree                                                   
  File "/home/bsouthey/python/biopython_cvs/biopython/Bio/KDTree/KDTree.py",
line 20, in <module>  
    from Bio.KDTree import _CKDTree                                             
ImportError: cannot import name _CKDTree  
======================================================================




From bugzilla-daemon at portal.open-bio.org  Thu Jan 15 14:44:15 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 15 Jan 2009 09:44:15 -0500
Subject: [Biopython-dev] [Bug 2733] Unit tests incorrectly assume that
	Biopthyon was built from source
In-Reply-To: <bug-2733-42@http.bugzilla.open-bio.org/>
Message-ID: <200901151444.n0FEiFd8020991@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2733


------- Comment #1 from bsouthey at gmail.com  2009-01-15 09:44 EST -------
Created an attachment (id=1197)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1197&action=view)
Patch to avoid adding source path if Biopython is not built from source

This is a simple patch that just moves the inclusion of the source path to
being conditional on the presence of the build directory. That is, if a build
directory exists, then we assume that Biopython was built from the source. But
if the build directory does not exist then the source path is not added and the
test environment will use the installed Biopython and not the source directory. 

This patch works on a Linux system with the build directory removed and a
Windows XP system using the binary Biopython installer.




From bugzilla-daemon at portal.open-bio.org  Thu Jan 15 15:20:58 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 15 Jan 2009 10:20:58 -0500
Subject: [Biopython-dev] [Bug 2733] Unit tests incorrectly assume that
	Biopthyon was built from source
In-Reply-To: <bug-2733-42@http.bugzilla.open-bio.org/>
Message-ID: <200901151520.n0FFKwqZ024124@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2733


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-15 10:20 EST -------
Created an attachment (id=1198)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1198&action=view)
Patch to Tests/run_tests.py

Bruce,

Could you try out this alternative patch which tries to tell the user what is
happening in this atypical situation.

Peter




From bugzilla-daemon at portal.open-bio.org  Thu Jan 15 15:26:13 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 15 Jan 2009 10:26:13 -0500
Subject: [Biopython-dev] [Bug 2733] Runing unit tests where Biopthyon wasn't
	built from source
In-Reply-To: <bug-2733-42@http.bugzilla.open-bio.org/>
Message-ID: <200901151526.n0FFQD5F024483@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2733


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|minor                       |enhancement
            Summary|Unit tests incorrectly      |Runing unit tests where
                   |assume that Biopthyon was   |Biopthyon wasn't built from
                   |built from source           |source


------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-15 10:26 EST -------
Retitling bug and marking it as an enhancement.

The main use case for this is Windows users who installed Biopython from one
of our Windows Installers (pre-compiled, does not include the unit tests), and
later download and unzip the source code archive in order to run the unit
tests.

As Bruce points out, this might also apply to Linux users who install a
Biopython package (pre-compiled, and presumably not including the unit tests),
and then want to run the unit tests without themselves compiling Biopython.




From bugzilla-daemon at portal.open-bio.org  Thu Jan 15 15:41:34 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 15 Jan 2009 10:41:34 -0500
Subject: [Biopython-dev] [Bug 2733] Runing unit tests where Biopthyon wasn't
	built from source
In-Reply-To: <bug-2733-42@http.bugzilla.open-bio.org/>
Message-ID: <200901151541.n0FFfYgG025830@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2733


------- Comment #4 from dalloliogm at gmail.com  2009-01-15 10:41 EST -------
(In reply to comment #0)

What about re-organizing the tests in three categories:
- the ones needed to make sure the modules don't contain errors
- the ones needed to make sure that biopython can run correctly in the user's
environment
- the ones needed to make sure that the C modules are compiled correctly.

Usually, people don't need to repeat the tests from case 1, but only case 2,
and case 3 if they have compiled Biopython themselves.




From bugzilla-daemon at portal.open-bio.org  Thu Jan 15 16:09:34 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 15 Jan 2009 11:09:34 -0500
Subject: [Biopython-dev] [Bug 2733] Runing unit tests where Biopthyon wasn't
	built from source
In-Reply-To: <bug-2733-42@http.bugzilla.open-bio.org/>
Message-ID: <200901151609.n0FG9Y5V028318@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2733


------- Comment #5 from bsouthey at gmail.com  2009-01-15 11:09 EST -------
(In reply to comment #2)
> Created an attachment (id=1198)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1198&action=view) [details]
> Patch to Tests/run_tests.py
> 
> Bruce,
> 
> Could you try out this alternative patch which tries to tell the user what is
> happening in this atypical situation.
> 
> Peter
> 

Very quickly: it works for my Linux system where I removed the build directory
but have Biopython installed. I will let you know for Windows and also when
Biopython is not installed. But I do not foresee any problems with the patch.

Bruce




From bugzilla-daemon at portal.open-bio.org  Thu Jan 15 17:18:31 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 15 Jan 2009 12:18:31 -0500
Subject: [Biopython-dev] [Bug 2733] Runing unit tests where Biopthyon wasn't
	built from source
In-Reply-To: <bug-2733-42@http.bugzilla.open-bio.org/>
Message-ID: <200901151718.n0FHIVSm001687@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2733


------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-15 12:18 EST -------
(In reply to comment #4)
> (In reply to comment #0)
> 
> What about re-organizing the tests in three categories:
> - the ones needed to make sure the modules don't contain errors
> - the ones needed to make sure that biopython can run correctly
>   in the user's environment
> - the ones needed to make sure that the C modules are compiled correctly.
> 
> Usually, people don't need to repeat the tests from case 1, but only
> case 2, and case 3 if they have compiled Biopython themselves.

Case 1 applies to all the unit tests.
Case 2 applies to all the unit tests whose dependencies are present.
Case 3 applies to those modules with C code.

I don't really understand your divisions.  If I was compiling Biopython myself,
I'd want all the tests run.  If I installed a pre-compiled version of Biopython
(from a Linux distribution or the Windows installers), I'd still want to try
and run all the tests.

There is the special case of trying to use Biopython without the C code modules
(e.g. installing from source without a C compiler, or for repackaging a subset
of the modules), but that is atypical.




From bugzilla-daemon at portal.open-bio.org  Thu Jan 15 20:31:21 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 15 Jan 2009 15:31:21 -0500
Subject: [Biopython-dev] [Bug 2733] Runing unit tests where Biopthyon wasn't
	built from source
In-Reply-To: <bug-2733-42@http.bugzilla.open-bio.org/>
Message-ID: <200901152031.n0FKVLDp015913@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2733


------- Comment #7 from bsouthey at gmail.com  2009-01-15 15:31 EST -------
(In reply to comment #5)
> (In reply to comment #2)
Just to confirm that it works as expected with windows xp 
1) Without Biopython installed

C:\Documents and Settings\virtualme\Desktop\Python_packages\biopython-1.49\biopython-1.49\Tests>c:\Python25\python.exe run_tests3.py test_trie.py
You do not seem to have built Biopython from source.
You do not seem to have installed Biopython.

2) With Biopython installed:
C:\Documents and Settings\virtualme\Desktop\Python_packages\biopython-1.49\biopython-1.49\Tests>c:\Python25\python.exe run_tests3.py test_trie.py
You do not seem to have built Biopython from source.
Unit tests will be run using the installed Biopython.
test_trie ... ok

----------------------------------------------------------------------
Ran 1 test in 0.731s

OK




From bugzilla-daemon at portal.open-bio.org  Thu Jan 15 23:55:14 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 15 Jan 2009 18:55:14 -0500
Subject: [Biopython-dev] [Bug 2734] New: db.load problem with postgresql and
	psycopg2
Message-ID: <bug-2734-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2734

           Summary: db.load problem with postgresql and psycopg2
           Product: Biopython
           Version: 1.49
          Platform: Macintosh
        OS/Version: Mac OS
            Status: NEW
          Severity: normal
          Priority: P2
         Component: BioSQL
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: stephen at blackrim.net


I have a simple script to load sequences into a postgresql database using the
biosql schema and biopython db.load function. 

here is the script :

from Bio import GenBank
from Bio import SeqIO
from BioSQL import BioSeqDatabase
server = BioSeqDatabase.open_database(driver="psycopg2", user=...)
db = server["plants"]
for i in range(37):
        handle = open("PLN/gbpln"+str(i+1)+".seq", "r")
        db.load(SeqIO.parse(handle,"genbank"))
        handle.close()
        print str(i+1)
server.adaptor.commit()

there is an error with the output and here it is with some of the psycopg2
debug info:

asis_dealloc: deleted asis object at 0x52350, refcnt = 0
psyco_curs_execute: cvt->refcnt = 1
curs_execute: pg connection at 0x8d0c00 OK
pq_begin: pgconn = 0x8d0c00, isolevel = 1, status = 2
pq_begin: transaction in progress
pq_execute: executing SYNC query:
   SELECT seqfeature_id, dbxref_id FROM seqfeature_dbxref WHERE seqfeature_id =
"3" AND dbxref_id = "6"
pq_execute: entering syncronous DBAPI compatibility mode
pq_fetch: pgstatus = PGRES_FATAL_ERROR
pq_fetch: uh-oh, something FAILED
pq_fetch: fetching done; check for critical errors
psyco_curs_execute: res = -1, pgres = 0x0
Traceback (most recent call last):
 File "add_seqs_subdb2 2.py", line 9, in <module>
   db.load(SeqIO.parse(handle,"genbank"))
 File "/Library/Python/2.5/site-packages/BioSQL/BioSeqDatabase.py", line 420,
in load
   db_loader.load_seqrecord(cur_record)
 File "/Library/Python/2.5/site-packages/BioSQL/Loader.py", line 50, in
load_seqrecord
   self._load_seqfeature(seq_feature, seq_feature_num, bioentry_id)
 File "/Library/Python/2.5/site-packages/BioSQL/Loader.py", line 508, in
_load_seqfeature
   self._load_seqfeature_qualifiers(feature.qualifiers, seqfeature_id)
 File "/Library/Python/2.5/site-packages/BioSQL/Loader.py", line 607, in
_load_seqfeature_qualifiers
   seqfeature_id)
 File "/Library/Python/2.5/site-packages/BioSQL/Loader.py", line 645, in
_load_seqfeature_dbxref
   self._get_seqfeature_dbxref(seqfeature_id, dbxref_id, rank+1)
 File "/Library/Python/2.5/site-packages/BioSQL/Loader.py", line 679, in
_get_seqfeature_dbxref
   dbxref_id))
 File "/Library/Python/2.5/site-packages/BioSQL/BioSeqDatabase.py", line 295,
in execute_and_fetch_col0
   self.cursor.execute(sql, args or ())
psycopg2.ProgrammingError: column "3" does not exist
LINE 1: ...f_id FROM seqfeature_dbxref WHERE seqfeature_id = "3" AND db...

It seems like there could be an issue with the double quotes, but I am not
sure where that query is being built. I am using PostgreSQL 8.2.
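
On the double quotes: PostgreSQL reads "3" as a column name (an identifier),
whereas '3' or a bare 3 would be a value, which matches the error message
above.  Whatever the cause turns out to be, the usual way to avoid quoting
problems is to pass the values as query parameters and let the driver do
the quoting.  A minimal sketch follows.  It uses sqlite3 and a cut-down
two-column table purely so that it runs without a PostgreSQL server; both
are stand-ins and not the real BioSQL schema or the code in Loader.py.
With psycopg2 the placeholder would be %s rather than ?.

```python
import sqlite3

# In-memory stand-in database; the real table lives in the BioSQL schema.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE seqfeature_dbxref (seqfeature_id INTEGER, dbxref_id INTEGER)"
)
cur.execute("INSERT INTO seqfeature_dbxref VALUES (?, ?)", (3, 6))

# The values are passed separately from the SQL text, so no quote
# characters of any kind are typed around them in the statement.
sql = (
    "SELECT seqfeature_id, dbxref_id FROM seqfeature_dbxref "
    "WHERE seqfeature_id = ? AND dbxref_id = ?"
)
cur.execute(sql, (3, 6))
rows = cur.fetchall()
conn.close()
```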




From bugzilla-daemon at portal.open-bio.org  Fri Jan 16 10:24:16 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 16 Jan 2009 05:24:16 -0500
Subject: [Biopython-dev] [Bug 2734] db.load problem with postgresql and
	psycopg2
In-Reply-To: <bug-2734-42@http.bugzilla.open-bio.org/>
Message-ID: <200901161024.n0GAOGFA015422@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2734


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-16 05:24 EST -------
Hi Stephen,

Does this happen for all the files you've tried, or just one or two?  If it's
the latter it may be something funny about the file and how it's been parsed.
I'm guessing you downloaded the GenBank files from
ftp://ftp.ncbi.nih.gov/genbank/ so could you tell us one which is failing.

Have you tried running the Biopython unit tests - in particular the two for
BioSQL?  I presume you installed Biopython from source on your Mac, so you
should have all the files present.  You'll need to edit the file
Tests/setup_BioSQL.py to point to a suitable postgresql test database.

P.S. As you are using Bio.SeqIO to parse the GenBank file, you don't need to
import Bio.GenBank (first line of code snippet).




From bugzilla-daemon at portal.open-bio.org  Fri Jan 16 19:12:28 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 16 Jan 2009 14:12:28 -0500
Subject: [Biopython-dev] [Bug 2734] db.load problem with postgresql and
	psycopg2
In-Reply-To: <bug-2734-42@http.bugzilla.open-bio.org/>
Message-ID: <200901161912.n0GJCSWO030831@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2734


------- Comment #2 from stephen at blackrim.net  2009-01-16 14:12 EST -------
Hi Peter,
Thanks for the quick reply. I will try to answer everything here. So I just
reran the BioSQL tests and I get 
test_BioSQL ... ok
test_BioSQL_SeqIO ... ok

so seems like everything there is fine (and I did configure the test for
postgres with the psycopg2 driver). I am downloading from the NCBI ftp and it
happens not only with all the files but also with the example on the biopython
biosql wiki page. Specifically with this example:
from Bio import Entrez
from Bio import SeqIO
from BioSQL import BioSeqDatabase
server = BioSeqDatabase.open_database(driver="psycopg2", ...)
db = server["plants"]
handle = Entrez.efetch(db="nuccore", id="6273291,6273290,6273289",
rettype="genbank")
db.load(SeqIO.parse(handle, "genbank"))
server.adaptor.commit()

I get the same error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Python/2.5/site-packages/BioSQL/BioSeqDatabase.py", line 420,
in load
    db_loader.load_seqrecord(cur_record)
  File "/Library/Python/2.5/site-packages/BioSQL/Loader.py", line 50, in
load_seqrecord
    self._load_seqfeature(seq_feature, seq_feature_num, bioentry_id)
  File "/Library/Python/2.5/site-packages/BioSQL/Loader.py", line 508, in
_load_seqfeature
    self._load_seqfeature_qualifiers(feature.qualifiers, seqfeature_id)
  File "/Library/Python/2.5/site-packages/BioSQL/Loader.py", line 607, in
_load_seqfeature_qualifiers
    seqfeature_id)
  File "/Library/Python/2.5/site-packages/BioSQL/Loader.py", line 645, in
_load_seqfeature_dbxref
    self._get_seqfeature_dbxref(seqfeature_id, dbxref_id, rank+1)
  File "/Library/Python/2.5/site-packages/BioSQL/Loader.py", line 679, in
_get_seqfeature_dbxref
    dbxref_id))
  File "/Library/Python/2.5/site-packages/BioSQL/BioSeqDatabase.py", line 295,
in execute_and_fetch_col0
    self.cursor.execute(sql, args or ())
psycopg2.ProgrammingError: column "3" does not exist
LINE 1: ...f_id FROM seqfeature_dbxref WHERE seqfeature_id = "3" AND db...

Thanks for any help. 
Stephen

(In reply to comment #1)
> Hi Stephen,
> 
> Does this happen for all the files you've tried, or just one or two?  If it's
> the latter it may be something funny about the file and how it's been parsed.
> I'm guessing you downloaded the GenBank files from
> ftp://ftp.ncbi.nih.gov/genbank/ so could you tell us one which is failing.
> 
> Have you tried running the Biopython unit tests - in particular the two for
> BioSQL?  I presume you installed Biopython from source on your Mac, so you
> should have all the files present.  You'll need to edit the file
> Tests/setup_BioSQL.py to point to a suitable postgresql test database.
> 
> P.S. As you are using Bio.SeqIO to parse the GenBank file, you don't need to
> import Bio.GenBank (first line of code snippet).
> 




From bugzilla-daemon at portal.open-bio.org  Sat Jan 17 10:09:21 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 17 Jan 2009 05:09:21 -0500
Subject: [Biopython-dev] [Bug 2734] db.load problem with postgresql and
	psycopg2
In-Reply-To: <bug-2734-42@http.bugzilla.open-bio.org/>
Message-ID: <200901171009.n0HA9Lk3027163@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2734


------- Comment #3 from cymon.cox at gmail.com  2009-01-17 05:09 EST -------
Hi Stephen,

2009/1/16  <bugzilla-daemon at portal.open-bio.org>:
> http://bugzilla.open-bio.org/show_bug.cgi?id=2734
>
> ------- Comment #2 from stephen at blackrim.net  2009-01-16 14:12 EST -------
> Hi Peter,
> Thanks for the quick reply. I will try to answer everything here. So I just
> reran the BioSQL tests and I get
> test_BioSQL ... ok
> test_BioSQL_SeqIO ... ok
>
> so seems like everything there is fine (and I did configure the test for
> postgres with the psycopg2 driver). I am downloading from the NCBI ftp and it
> happens not only with all the files but also with the example on the biopython
> biosql wiki page. Specifically with this example:
> from Bio import Entrez
> from Bio import SeqIO
> from BioSQL import BioSeqDatabase
> server = BioSeqDatabase.open_database(driver="psycopg2", ...)
> db = server["plants"]
> handle = Entrez.efetch(db="nuccore", id="6273291,6273290,6273289",
> rettype="genbank")
> db.load(SeqIO.parse(handle, "genbank"))
> server.adaptor.commit()

This code works for me:
[cymon at chara ~]$ python
Python 2.5.2 (r252:60911, Jul 24 2008, 17:11:36)
[GCC 4.1.2 20071124 (Red Hat 4.1.2-42)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from Bio import Entrez
>>> from Bio import SeqIO
>>> from BioSQL import BioSeqDatabase
>>> server = BioSeqDatabase.open_database(driver="psycopg2", db = "biosql_test")
>>> db = server.new_database("blah", description="Just for testing")
>>> handle = Entrez.efetch(db="nuccore", id="6273291,6273290,6273289", rettype="genbank")
>>> server.adaptor.commit()
>>>

What versions of biopython and the BioSQL schema are you using?

Cymon




From bugzilla-daemon at portal.open-bio.org  Sat Jan 17 10:50:19 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 17 Jan 2009 05:50:19 -0500
Subject: [Biopython-dev] [Bug 2734] db.load problem with postgresql and
	psycopg2
In-Reply-To: <bug-2734-42@http.bugzilla.open-bio.org/>
Message-ID: <200901171050.n0HAoJZa029834@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2734


------- Comment #4 from cymon.cox at gmail.com  2009-01-17 05:50 EST -------
> This code works for me:
> [cymon at chara ~]$ python
> Python 2.5.2 (r252:60911, Jul 24 2008, 17:11:36)
> [GCC 4.1.2 20071124 (Red Hat 4.1.2-42)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> from Bio import Entrez
> >>> from Bio import SeqIO
> >>> from BioSQL import BioSeqDatabase
> >>> server = BioSeqDatabase.open_database(driver="psycopg2", db = "biosql_test")
> >>> db = server.new_database("blah", description="Just for testing")
> >>> handle = Entrez.efetch(db="nuccore", id="6273291,6273290,6273289", rettype="genbank")
> >>> server.adaptor.commit()
> >>>

Sorry forgot to load it! :)

>>> db.load(SeqIO.parse(handle, "genbank"))
3
>>> server.adaptor.commit()
>>> 

C.




From bugzilla-daemon at portal.open-bio.org  Wed Jan 21 18:22:47 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 21 Jan 2009 13:22:47 -0500
Subject: [Biopython-dev] [Bug 2738] New: Speed up GenBank parsing,
	in particular location parsing
Message-ID: <bug-2738-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2738

           Summary: Speed up GenBank parsing, in particular location parsing
           Product: Biopython
           Version: Not Applicable
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk


This is an enhancement "bug", for trying to improve the speed of parsing
GenBank files WITHOUT any functionality changes.  From previous profiling, I
have found that the location parsing looks like an easy target.  However, this
code is non-trivial so we should proceed with caution.

Possible patch to follow...




From bugzilla-daemon at portal.open-bio.org  Wed Jan 21 18:30:27 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 21 Jan 2009 13:30:27 -0500
Subject: [Biopython-dev] [Bug 2738] Speed up GenBank parsing,
	in particular location parsing
In-Reply-To: <bug-2738-42@http.bugzilla.open-bio.org/>
Message-ID: <200901211830.n0LIURFx009561@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2738


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-21 13:30 EST -------
Created an attachment (id=1206)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1206&action=view)
Patch for Bio/GenBank/__init__.py to handle simple locations with re

This patch handles the simple cases (non-fuzzy, no database references) using
simple python and regular expressions.  Everything else works by falling back
on the old spark based Bio.GenBank.LocationParser code (e.g. fuzzy locations).
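
As a rough illustration of the idea (this is not the attached patch), a
regular expression fast path with a fallback could look like the sketch below.
The names parse_simple_location and parse_location, the two patterns, and the
slow_parser argument standing in for the spark based code are all made up for
this example.

```python
import re

# Hypothetical patterns for illustration only; the real patch may differ.
_SIMPLE = re.compile(r"^(\d+)\.\.(\d+)$")
_COMPLEMENT = re.compile(r"^complement\((\d+)\.\.(\d+)\)$")


def parse_simple_location(text):
    """Return (start, end, strand) for a simple location, else None.

    Coordinates are returned exactly as written in the feature table
    (one-based, inclusive); no conversion to Python slicing is done here.
    """
    match = _SIMPLE.match(text)
    if match:
        return int(match.group(1)), int(match.group(2)), +1
    match = _COMPLEMENT.match(text)
    if match:
        return int(match.group(1)), int(match.group(2)), -1
    return None  # fuzzy, join(...), database references, etc.


def parse_location(text, slow_parser):
    """Try the regular expression fast path, else call slow_parser(text)."""
    result = parse_simple_location(text)
    if result is None:
        result = slow_parser(text)
    return result
```

Anything the patterns do not recognise is passed unchanged to the slower
parser, so in this sketch the fast path can only ever handle a subset of cases.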

The new code is pretty simple, and could potentially be extended to cover all
the currently used location strings found in the feature table, allowing us to
remove the use of Bio.GenBank.LocationParser, which in the long term could
lead to an overall code simplification.

In the short term, this patch does complicate the location parsing because it
means there are effectively two ways we parse the location strings (my new
code, and the old spark based Bio.GenBank.LocationParser code).

However, from my limited testing using Python 2.5 on the Mac with GenBank files
for large bacterial genomes, this may be a price worth paying.  I'd like
independent measurements (and to check this on other platforms), but this does
seem to more than halve the time taken to parse GenBank files!




From bugzilla-daemon at portal.open-bio.org  Thu Jan 22 18:58:18 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 22 Jan 2009 13:58:18 -0500
Subject: [Biopython-dev] [Bug 2738] Speed up GenBank parsing,
	in particular location parsing
In-Reply-To: <bug-2738-42@http.bugzilla.open-bio.org/>
Message-ID: <200901221858.n0MIwIpR000974@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2738


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-22 13:58 EST -------
Created an attachment (id=1208)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1208&action=view)
Simple test script for timing GenBank parsing

I've attached a trivial script to time parsing all the GenBank files in a
directory to help anyone wanting to benchmark this change.
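
For readers without access to the attachment, a minimal sketch of such a
timing script follows. It is not the attached script: the function names and
the ".gbk" extension are assumptions for this example, and count_records needs
Biopython installed.

```python
import os
import time


def count_records(path):
    """Count the GenBank records in one file using Bio.SeqIO."""
    # Imported lazily so time_directory can be used with another parse function.
    from Bio import SeqIO

    handle = open(path)
    try:
        return sum(1 for record in SeqIO.parse(handle, "genbank"))
    finally:
        handle.close()


def time_directory(directory, parse=count_records, extension=".gbk"):
    """Return (files, records, seconds) for parsing every matching file."""
    files = records = 0
    start = time.time()
    for name in sorted(os.listdir(directory)):
        if name.endswith(extension):
            records += parse(os.path.join(directory, name))
            files += 1
    return files, records, time.time() - start
```

Calling time_directory(".") before and after applying the patch gives two
wall clock figures to compare; the record counts should be identical.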

(In reply to comment #1)
> However, from my limited testing using Python 2.5 on the Mac with GenBank
> files for large bacterial genomes, this may be a price worth paying.  I'd
> like independent measurements (and to check this on other platforms), but
> this does seem to more than halve the time taken to parse GenBank files!

Further testing with Python 2.5 on Linux, this time also with some large
eukaryotic files, appears to confirm a very large speed up (most obvious on
feature rich GenBank files of course).

I still want to check this on other versions of python...




From bugzilla-daemon at portal.open-bio.org  Fri Jan 23 08:43:01 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 23 Jan 2009 03:43:01 -0500
Subject: [Biopython-dev] [Bug 2740] New: Wise test fails with wise 2.4.1
Message-ID: <bug-2740-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2740

           Summary: Wise test fails with wise 2.4.1
           Product: Biopython
           Version: 1.49
          Platform: Other
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Unit Tests
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: charles-debian-nospam at plessy.org


Dear Biopython developers,

The test for wise fails with wise 2.4.1 and Biopython 1.49. I think one gap is
missing in the reference used in the test script (probably that wise changed
its gap opening penalties):

anx159???Tests???$ dnal Wise/human_114_g01_exons.fna_01
Wise/human_114_g02_exons.fna_01 
Warning Error
        Strangely truncated line in fasta file
Warning Error
        Strangely truncated line in fasta file
DnaAlign Matrix calculation: [  14000] Cells 95%
Score 114
Warning Error
        Name ENSG00000172056|ENST00000321078|ENSE00001281503 is longer than
allowed name block (12). Truncating

Warning Error
        Name ENSG00000163182|ENST00000295339|ENSE00001130648 is longer than
allowed name block (12). Truncating

ENSG00000172135   AGGGAAAGCCCCTAAGCTC--CTGATCTATGCTGCATCCAGTTTGCAAAG-TGGGGTCC 
                  A GGAA GCCCC  AGCTC  CT  TCT   CT C TCC    TGC A   TGG  TCC 
ENSG000001631     ATGGAA-GCCCC--AGCTCAGCT--TCT---CTTCCTCC----TGCTACTCTGGC-TCC 


ENSG00000172191   CA                                                          
                  CA                                                          
ENSG0000016347    CA         


This is compared to a different reference result in the test script:

anx159???Tests???$ grep -A5 -B5 ENSG00000172135 test_Wise.py 
        sys.stdout = self.old_stdout

class TestWise(unittest.TestCase):
    def test_align(self):
        temp_file = Wise.align(["dnal"], ("Wise/human_114_g01_exons.fna_01",
"Wise/human_114_g02_exons.fna_01"), kbyte=100000, force_type="DNA", quiet=True)
        self.assertEqual(temp_file.readline().rstrip(), "ENSG00000172135  
AGGGAAAGCCCCTAAGCTC--CTGATCTATGCTGCATCCAGTTTGCAAAGTGGGGTCCC")

def run_tests(argv):
    test_suite = testing_suite()
    runner = unittest.TextTestRunner(sys.stdout, verbosity = 2)
    runner.run(test_suite)

Have a nice day,

-- 
Charles Plessy
Debian Med packaging team,
http://www.debian.org/devel/debian-med
Tsurumi, Kanagawa, Japan




From bugzilla-daemon at portal.open-bio.org  Fri Jan 23 12:06:29 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 23 Jan 2009 07:06:29 -0500
Subject: [Biopython-dev] [Bug 2740] Wise test fails with wise 2.4.1
In-Reply-To: <bug-2740-42@http.bugzilla.open-bio.org/>
Message-ID: <200901231206.n0NC6T4B023669@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2740


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-23 07:06 EST -------
Thanks for the report.  Based on the following pages I had assumed the latest
version was wise 2.2.0, available here:

http://www.sanger.ac.uk/Software/Wise2/ points to
ftp://ftp.ebi.ac.uk/pub/software/unix/wise2/ which only contains up to wise
2.2.0

After some Google searching I found Ewan Birney had changed his mind and started
work on it again:
http://www.ebi.ac.uk/~birney/wise2/

Installing wise 2.4.1 took a while (tip for Linux users, edit file
src/models/phasemodel.c line 23 to replace isnumber by isdigit), but I can
confirm the error you reported.

This is the output from an older version of wise,

$ ~/Downloads/wise2.2.0/src/bin/dnal Wise/human_114_g01_exons.fna_01
Wise/human_114_g02_exons.fna_01 
DnaAlign Matrix calculation: [  14000] Cells 97%
Warning Error
        Name ENSG00000172056|ENST00000321078|ENSE00001281503 is longer than
allowed name block (12). Truncating

Warning Error
        Name ENSG00000163182|ENST00000295339|ENSE00001130648 is longer than
allowed name block (12). Truncating

ENSG00000172135   AGGGAAAGCCCCTAAGCTC--CTGATCTATGCTGCATCCAGTTTGCAAAGTGGGGTCCC 
                  A GGAA GCCCC  AGCTC  CT  TCT   CT C TCC    TGC A    GG TCCC 
ENSG000001631     ATGGAA-GCCCC--AGCTCAGCT--TCT---CTTCCTCC----TGCTACTCTGGCTCCC 


ENSG00000172192   A                                                           
                  A                                                           
ENSG0000016348    A                                                           


Using the newer version of wise, we do indeed get a different alignment:

$ ~/Downloads/wise2.4.1/src/bin/dnal Wise/human_114_g01_exons.fna_01
Wise/human_114_g02_exons.fna_01 
DnaAlign Matrix calculation: [  14000] Cells 97%
Score 114
Warning Error
        Name ENSG00000172056|ENST00000321078|ENSE00001281503 is longer than
allowed name block (12). Truncating

Warning Error
        Name ENSG00000163182|ENST00000295339|ENSE00001130648 is longer than
allowed name block (12). Truncating

ENSG00000172135   AGGGAAAGCCCCTAAGCTC--CTGATCTATGCTGCATCCAGTTTGCAAAG-TGGGGTCC 
                  A GGAA GCCCC  AGCTC  CT  TCT   CT C TCC    TGC A   TGG  TCC 
ENSG000001631     ATGGAA-GCCCC--AGCTCAGCT--TCT---CTTCCTCC----TGCTACTCTGGC-TCC 


ENSG00000172191   CA                                                          
                  CA                                                          
ENSG0000016347    CA 




From bugzilla-daemon at portal.open-bio.org  Fri Jan 23 12:28:05 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 23 Jan 2009 07:28:05 -0500
Subject: [Biopython-dev] [Bug 2740] Wise test fails with wise 2.4.1
In-Reply-To: <bug-2740-42@http.bugzilla.open-bio.org/>
Message-ID: <200901231228.n0NCS5a8028823@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2740


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-23 07:28 EST -------
This should be fixed in CVS, see:

Tests/test_Wise.py revision 1.7
Tests/output/test_Wise revision 1.3

All I have done is made the unit test accept the old output, or the slightly
different output from wise 2.4.1 - the main Biopython code is unchanged.

From the help text (just run dnal with no arguments), it appears the gap
penalties have not changed - so the differing alignments must be an algorithm
change of some sort.

Another small difference is with wise 2.4.1, even in quiet mode, dnal starts
its output by printing the score.
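
A test that tolerates both versions could be sketched as below. This is not
the code committed as revision 1.7; the helper first_alignment_line and the
class name are invented for illustration, and the two expected strings are
copied from the dnal output quoted earlier in this bug.

```python
import unittest

# First alignment line printed by wise 2.2.0 and by wise 2.4.1 (from this bug).
OLD_LINE = "ENSG00000172135   AGGGAAAGCCCCTAAGCTC--CTGATCTATGCTGCATCCAGTTTGCAAAGTGGGGTCCC"
NEW_LINE = "ENSG00000172135   AGGGAAAGCCCCTAAGCTC--CTGATCTATGCTGCATCCAGTTTGCAAAG-TGGGGTCC"


def first_alignment_line(lines):
    """Return the first non-blank output line, skipping any 'Score' line."""
    for line in lines:
        line = line.rstrip()
        if line and not line.startswith("Score"):
            return line
    return ""


class AcceptEitherVersion(unittest.TestCase):
    def check(self, lines):
        # Pass if the line matches what either wise version prints.
        self.assertTrue(first_alignment_line(lines) in (OLD_LINE, NEW_LINE))

    def test_old_output(self):
        self.check([OLD_LINE + " \n"])

    def test_new_output(self):
        self.check(["Score 114\n", NEW_LINE + " \n"])
```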

Thank you for reporting this,

Peter




From bugzilla-daemon at portal.open-bio.org  Wed Jan 28 10:13:43 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 28 Jan 2009 05:13:43 -0500
Subject: [Biopython-dev] [Bug 2743] New: manual installation overwrites
	previous biopython installations
Message-ID: <bug-2743-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2743

           Summary: manual installation overwrites previous biopython
                    installations
           Product: Biopython
           Version: Not Applicable
          Platform: All
               URL: http://lists.open-bio.org/pipermail/biopython/2009-
                    January/004893.html
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: dalloliogm at gmail.com


The manual biopython installation (the one made with python setup.py install)
installs all the files in a directory like this:
- /usr/lib/python2.5/site-packages/Bio

The problem comes when you want to install biopython in a system where there is
already an old version installed.
In that case, it is not clear what happens to the old installation... are all
the old files removed before the new version is installed? Or are the two
versions 'mixed'?

Please refer to this discussion:
- http://lists.open-bio.org/pipermail/biopython/2009-January/004893.html




From bugzilla-daemon at portal.open-bio.org  Wed Jan 28 11:05:07 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 28 Jan 2009 06:05:07 -0500
Subject: [Biopython-dev] [Bug 2743] manual installation overwrites previous
	biopython installations
In-Reply-To: <bug-2743-42@http.bugzilla.open-bio.org/>
Message-ID: <200901281105.n0SB577F013398@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2743


------- Comment #1 from mdehoon at ims.u-tokyo.ac.jp  2009-01-28 06:05 EST -------
(In reply to comment #0)
> The manual biopython installation (the one made with python setup.py install)
> installs all the files in a directory like this:
> - /usr/lib/python2.5/site-packages/Bio
> 
> The problem comes when you want to install biopython in a system where there is
> already an old version installed.
> In that case, it is not clear what happens to the old installation... are all
> the old files removed before the new version is installed? Or are the two
> versions 'mixed'?

Isn't this what always happens when installing a Python module? If so, then it
doesn't seem to be a Biopython bug to me.




From bugzilla-daemon at portal.open-bio.org  Wed Jan 28 11:14:28 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 28 Jan 2009 06:14:28 -0500
Subject: [Biopython-dev] [Bug 2743] manual installation overwrites previous
	biopython installations
In-Reply-To: <bug-2743-42@http.bugzilla.open-bio.org/>
Message-ID: <200901281114.n0SBESYY014510@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2743


------- Comment #2 from dalloliogm at gmail.com  2009-01-28 06:14 EST -------
(In reply to comment #1)
> (In reply to comment #0)
> > The manual biopython installation (the one made with python setup.py install)
> > installs all the files in a directory like this:
> > - /usr/lib/python2.5/site-packages/Bio
> > 
> > The problem comes when you want to install biopython in a system where there is
> > already an old version installed.
> > In that case, it is not clear what happens to the old installation... are all
> > the old files removed before the new version is installed? Or are the two
> > versions 'mixed'?
> 
> Isn't this what always happens when installing a Python module? If so, then it
> doesn't seem to be a Biopython bug to me.


Well, I don't know if it is the same behaviour for the other python modules,
but it can create dangerous situations, especially if you are 'downgrading' a
biopython installation.
The biopython installer should clarify that, asking the user if he wants to
overwrite the existing installation, change the installation path, or abort.


Anyway, the right way to install biopython should be by using easy_install.
Easy_install downloads the latest code and creates an egg, and then installs
everything in a directory like this:
- /usr/lib/python2.5/site-packages/biopython-1.49-py2.5-linux-i686.egg/
automatically adding it to Python's module search path.

I suggest changing the Biopython wiki to tell people that they should always
prefer to install biopython with easy_install, which by the way works perfectly
and automatically checks the dependencies.




From bugzilla-daemon at portal.open-bio.org  Wed Jan 28 12:46:37 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 28 Jan 2009 07:46:37 -0500
Subject: [Biopython-dev] [Bug 2743] manual installation overwrites previous
	biopython installations
In-Reply-To: <bug-2743-42@http.bugzilla.open-bio.org/>
Message-ID: <200901281246.n0SCkbKj028750@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2743


------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-28 07:46 EST -------
(In reply to comment #1)
> > the old files removed before the new version is installed? Or are the two
> > versions 'mixed'?
> 
> Isn't this what always happens when installing a Python module? If so, then it
> doesn't seem to be a Biopython bug to me.

Agreed.  As far as I know, this affects ANY python module installed with
distutils - and indeed this is typical practice for ANY unix tool installed
from source via a make file.  It is essentially NORMAL, although not so nice
for beginners.

Linux distributions will often provide packaged versions of python libraries
(including Biopython) which you can install/update/remove using the system's
package manager (e.g. apt, yum, up2date etc).  The only downside to me is they
won't always have the latest version of each package.

I suppose we could add a hack to setup.py to check if there is already a
Biopython installation present (try doing "import Bio"), and if it is
installed, ask the user if they want to continue.  However, there are
legitimate situations where this just makes things more confusing.  e.g. You
don't have admin rights on a unix machine where your systems administrator has
provided python and an old version of Biopython, so you want to install the
latest version of Biopython under your home directory.
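
To make the idea concrete, here is a minimal sketch of such a check. It is
written against importlib from current Python rather than anything available
in Python 2.5, it is not part of Biopython's setup.py, and the function names
and prompt text are invented for this example.

```python
import importlib.util


def find_existing_install(module_name="Bio"):
    """Return the location of an importable top-level module, or None.

    find_spec only locates a top-level module; it does not execute it.
    """
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return None
    # Packages list their directories; plain modules have a single origin.
    if spec.submodule_search_locations:
        return list(spec.submodule_search_locations)[0]
    return spec.origin


def confirm_overwrite(module_name="Bio", ask=input):
    """Return True if the installation should go ahead."""
    location = find_existing_install(module_name)
    if location is None:
        return True
    answer = ask("%s is already installed at %s, continue? [y/N] "
                 % (module_name, location))
    return answer.strip().lower().startswith("y")
```

Note that the check only reports what is importable from the current Python;
it cannot tell whether the new copy is going to the same place, which is
exactly the home directory case described above.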

(In reply to comment #2)
> I suggest changing the Biopython wiki to tell people that they should
> always prefer to install biopython with easy_install, which by the way works
> perfectly and automatically checks the dependencies.

For now distutils is still the python standard, while easy_install is a
non-standard optional extra.  So in some ways using easy_install is more
work.

Note that easy_install doesn't provide a simple uninstall either:
http://peak.telecommunity.com/DevCenter/EasyInstall#uninstalling-packages




From bugzilla-daemon at portal.open-bio.org  Wed Jan 28 15:23:48 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 28 Jan 2009 10:23:48 -0500
Subject: [Biopython-dev] [Bug 2743] manual installation overwrites previous
	biopython installations
In-Reply-To: <bug-2743-42@http.bugzilla.open-bio.org/>
Message-ID: <200901281523.n0SFNmqQ013945@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2743


------- Comment #4 from bsouthey at gmail.com  2009-01-28 10:23 EST -------
(In reply to comment #3)
> (In reply to comment #1)
> > > the old files removed before the new version is installed? Or are the two
> > > versions 'mixed'?
> > 
> > Isn't this what always happens when installing a Python module? If so, then it
> > doesn't seem to be a Biopython bug to me.
> 
> Agreed.  As far as I know, this affects ANY python module installed with
> distutils - and indeed this is typical practice for ANY unix tool installed
> from source via a make file.  It is essentially NORMAL, although not so nice
> for beginners.
> 

Agreed that this is not a Biopython bug but a Python feature.

Yes, the installation is usually 'mixed' when installing from source. The setup
will remove the existing egg-info and then create a new one. Python copies the files
to the appropriate place thus overwriting any old files with new versions but
old files that are no longer present or files with different names will remain.
To my knowledge, Python and Biopython will not know about those files unless a
user explicitly tries to use them. 
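
One way to see which files would be left behind is to compare the installed
tree with the new one before installing. The sketch below assumes both trees
are available as directories; the function names are made up for this example
and nothing here is part of distutils.

```python
import os


def relative_files(root):
    """Return the set of file paths under root, relative to root."""
    found = set()
    for folder, _subdirs, names in os.walk(root):
        for name in names:
            found.add(os.path.relpath(os.path.join(folder, name), root))
    return found


def leftover_files(installed_dir, new_build_dir):
    """List files in the old installation that the new version no longer has."""
    return sorted(relative_files(installed_dir) - relative_files(new_build_dir))
```

Anything returned by leftover_files would survive the overwrite, for example
a module that was removed or renamed between releases.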

Bruce




From bugzilla-daemon at portal.open-bio.org  Thu Jan 29 17:41:19 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 29 Jan 2009 12:41:19 -0500
Subject: [Biopython-dev] [Bug 2738] Speed up GenBank parsing,
	in particular location parsing
In-Reply-To: <bug-2738-42@http.bugzilla.open-bio.org/>
Message-ID: <200901291741.n0THfJYC018518@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2738


------- Comment #3 from bsouthey at gmail.com  2009-01-29 12:41 EST -------
First, I object to this patch because it replaces the current version without
keeping the old code. It should create a new parsing function so we can verify that
the old and new versions provide exactly the same output for the same input. 

As indicated below, it does speed things up! So I have no problems for it to
replace the current parsing code in the next release provided that the old
parsing code remains as a deprecated function. (Alternatively add a conditional
statement with a flag to avoid this new code as required.) 

(In reply to comment #2)
> Created an attachment (id=1208)
>  --> (http://bugzilla.open-bio.org/attachment.cgi?id=1208&action=view) [details]
> Simple test script for timing GenBank parsing
> 
> I've attached a trivial script to time parsing all the GenBank files in 
> directory to help anyone wanting to benchmark this change.
> 
> (In reply to comment #1)
> > However, from my limited testing using Python 2.5 on the Mac with GenBank
> > files for large bacterial genomes, this may be a price worth paying.  I'd
> > like independent measurements (and to check this on other platforms), but
> > this does seem to more than halve the time taken to parse GenBank files!
> 
> Further testing with Python 2.5 on Linux, this time also with some large
> eukaryotic files, appears to confirm a very large speed up (most obvious on
> feature rich GenBank files of course).
> 
> I still want to check this on other versions of python...
> 

I ran the script with the patch applied under Python 2.3, 2.4, 2.5 and 2.6 on
Linux and noted that this halved the time required to parse a GenBank
Incremental Update file (an update from Jan 2009: nc0101.flat, size 573 MB,
with 213942 records of total length 158245604 bp).

While the number of records and sequences are the same, I have not checked if
the patched version is providing exactly the same output as the unpatched
version. This is very important for the different types of GenBank files (Whole
Genome Shotgun and CON types).




From bugzilla-daemon at portal.open-bio.org  Thu Jan 29 17:57:22 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 29 Jan 2009 12:57:22 -0500
Subject: [Biopython-dev] [Bug 2738] Speed up GenBank parsing,
	in particular location parsing
In-Reply-To: <bug-2738-42@http.bugzilla.open-bio.org/>
Message-ID: <200901291757.n0THvMVl023111@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2738


------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-29 12:57 EST -------
(In reply to comment #3)
> First, I object to this patch because it replaces the current version without
> keeping the old code.

It does keep the old code, and explicitly uses the old code for the non-simple
locations.

> It should create a new parsing function so we can verify that
> the old and new versions provide exactly the same output for the same input. 

We should probably extend the Biopython GenBank/EMBL parsing unit tests to make
sure this patch doesn't break anything, and additionally have some extra test
cases using big GenBank files which won't become official unit tests.  This
could be as simple as a script which parses all the records in a set of GenBank
files, printing out a very minimal summary of each feature location (including
subfeatures).  We then run the script with and without the patch, and confirm
their output matches.
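
A sketch of what such a summary could look like is given below. It relies
only on each record having id and features, and each feature having type,
location and (optionally) sub_features; the function names and the tab
separated layout are choices made for this example.

```python
def location_lines(records):
    """Yield one line per feature (and subfeature) in the given records."""
    for record in records:
        for feature in record.features:
            for line in _feature_lines(record.id, feature, depth=0):
                yield line


def _feature_lines(record_id, feature, depth):
    # The location is formatted with str() so fuzzy positions show up too.
    yield "%s\t%s%s\t%s" % (record_id, "  " * depth, feature.type,
                            feature.location)
    # Not every feature object has sub_features, so default to none.
    for sub in getattr(feature, "sub_features", None) or []:
        for line in _feature_lines(record_id, sub, depth + 1):
            yield line
```

Feeding it the records from SeqIO.parse(handle, "genbank") and printing each
line would give a text file that can be compared between the two versions.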

Once we are happy that the patch doesn't change the parser behaviour, I don't
see any reason to offer both options to the end user.  In fact, I would prefer
to go further and REMOVE the old slow location parser after extending the
regular expression based parser to cope with ALL location variants.

> As indicated below, it does speed things up! So I have no problems for it to
> replace the current parsing code in the next release provided that the old
> parsing code remains as a deprecated function. (Alternatively add a conditional
> statement with a flag to avoid this new code as required.) 

Having the new code controlled by some option would actually be pretty easy. 
Other than for testing I see no reason to do this.

> I ran the script with the patch applied under Python 2.3, 2.4, 2.5 and 2.6
> on Linux and noted that this halved the time required to parse a GenBank
> Incremental Update file (an update from Jan 2009: nc0101.flat, size 573 MB,
> with 213942 records of total length 158245604 bp).

That is consistent with the speed ups I have seen - you can get even more
depending on the proportion of features in the file.  Thanks for checking
python 2.3 to 2.6, nice to see they all benefit.

> While the number of records and sequences are the same, I have not checked if
> the patched version is providing exactly the same output as the unpatched
> version. This is very important for the different types of GenBank files
> (Whole Genome Shotgun and CON types).

I agree thorough testing is important here.  Would you like to suggest any
particular WGS or CON files for testing with?  I'm thinking something large
with a wide range of location types would be good for checking this patch (but
not to include with Biopython).

Peter




From bugzilla-daemon at portal.open-bio.org  Thu Jan 29 18:26:09 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 29 Jan 2009 13:26:09 -0500
Subject: [Biopython-dev] [Bug 2738] Speed up GenBank parsing,
	in particular location parsing
In-Reply-To: <bug-2738-42@http.bugzilla.open-bio.org/>
Message-ID: <200901291826.n0TIQ9YR030903@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2738


------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-29 13:26 EST -------
Created an attachment (id=1209)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1209&action=view)
Simple test script for checking GenBank location parsing

This is a simple script to help validate the location parsing has not changed. 
Intended usage is to put the script in a directory with a good set of test
GenBank files (all ending with the extension .gbk), then:

(starting with a clean install of Biopython)

$ time python parse_gbk_locs.py > old.txt

(apply the patch)

$ time python parse_gbk_locs.py > new.txt

(verify the output matches)

$ ls -l old.txt new.txt

(check file sizes agree)

$ diff old.txt new.txt

(should be no output)
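
The last two checks can also be done from Python.  What follows is only a
minimal sketch using the standard library; the function name outputs_match is
illustrative and is not part of the attached script, and the file names are
simply the ones used above.

```python
import difflib


def outputs_match(old_path, new_path):
    """Return (same, diff_lines) for two text files of parser output."""
    with open(old_path) as handle:
        old_lines = handle.readlines()
    with open(new_path) as handle:
        new_lines = handle.readlines()
    # unified_diff yields nothing at all when the two inputs are identical
    diff_lines = list(difflib.unified_diff(old_lines, new_lines,
                                           fromfile=old_path,
                                           tofile=new_path))
    return (not diff_lines), diff_lines
```

Calling outputs_match("old.txt", "new.txt") should give (True, []) when the
patch has not changed the parsed locations.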


From bugzilla-daemon at portal.open-bio.org  Thu Jan 29 19:38:20 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 29 Jan 2009 14:38:20 -0500
Subject: [Biopython-dev] [Bug 2738] Speed up GenBank parsing,
	in particular location parsing
In-Reply-To: <bug-2738-42@http.bugzilla.open-bio.org/>
Message-ID: <200901291938.n0TJcKh2021246@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2738


------- Comment #6 from bsouthey at gmail.com  2009-01-29 14:38 EST -------
Created an attachment (id=1210)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1210&action=view)
Single test case that is not correctly parsed

I just used a simple 'print record' followed by a diff (but that does not check
the references). This record (and related ones) has a difference between
versions ...


From bugzilla-daemon at portal.open-bio.org  Thu Jan 29 21:13:19 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 29 Jan 2009 16:13:19 -0500
Subject: [Biopython-dev] [Bug 2738] Speed up GenBank parsing,
	in particular location parsing
In-Reply-To: <bug-2738-42@http.bugzilla.open-bio.org/>
Message-ID: <200901292113.n0TLDJ51019466@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2738


------- Comment #7 from bsouthey at gmail.com  2009-01-29 16:13 EST -------
(In reply to comment #4)
> > While the number of records and sequences are the same, I have not checked if
> > the patched version is providing exactly the same output as the unpatched
> > version. This is very important for the different types of GenBank files
> > (Whole Genome Shotgun and CON types).
> 
> I agree thorough testing is important here.  Would you like to suggest any
> particular WGS or CON files for testing with? 

I downloaded a few example files including WGS and CON. I found that CON files
are not parsed by either version. Not a surprise given that these have no
sequences but that is a different topic. Apart from the errors in attached
case, I have not seen any other errors (even parsing the references).

Bruce


From bugzilla-daemon at portal.open-bio.org  Fri Jan 30 11:00:24 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 30 Jan 2009 06:00:24 -0500
Subject: [Biopython-dev] [Bug 2738] Speed up GenBank parsing,
	in particular location parsing
In-Reply-To: <bug-2738-42@http.bugzilla.open-bio.org/>
Message-ID: <200901301100.n0UB0OsD002442@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2738


------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-30 06:00 EST -------
(In reply to comment #6)
> Created an attachment (id=1210)
>  --> (http://bugzilla.open-bio.org/attachment.cgi?id=1210&action=view) [details]
> Single test case that is not correctly parsed
> 
> I just used a simple 'print record' followed by a diff (but that does not
> check the references). This record (and related ones) has a difference
> between versions ...

If you do a 'print record' with a SeqRecord object, any references are shown
using their __repr__ string - which is currently the python object default
which includes a memory address (something I've been meaning to address on Bug
2544).  Different objects will have different memory locations, which will show
up in the diff.

For example, using the following as a simple test script and capturing its
output to files:

from Bio import SeqIO
record = SeqIO.read(open("CY029873.gbk"), "genbank")
print record

Running diff with and without the patch gave me:

9c9
< /references=[<Bio.SeqFeature.Reference instance at 0xb7b7bfcc>, <Bio.SeqFeature.Reference instance at 0xb7b8412c>]
---
> /references=[<Bio.SeqFeature.Reference instance at 0x866b04c>, <Bio.SeqFeature.Reference instance at 0x866b18c>]

i.e. No real differences between the records as far as I can see.  Please
clarify - if you have found a failing example I would be most interested.
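
One way to stop such addresses showing up in a diff is to blank them out
before comparing.  This is a minimal sketch only; the helper name
strip_addresses and the "0x?" placeholder are made up for illustration and are
not part of Biopython.

```python
import re

# Matches the " at 0x..." suffix of a default Python object repr.
_ADDRESS = re.compile(r" at 0x[0-9a-fA-F]+")


def strip_addresses(text):
    """Replace memory addresses in default reprs with a fixed placeholder."""
    return _ADDRESS.sub(" at 0x?", text)
```

Applying strip_addresses to both captured outputs before running the diff
should leave only genuine differences between the records.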

(In reply to comment #7)
> I downloaded a few example files including WGS and CON. I found that CON files
> are not parsed by either version. Not a surprise given that these have no
> sequences but that is a different topic. Apart from the errors in attached
> case, I have not seen any other errors (even parsing the references).

Could you clarify your problem with the CON files please (on a new bug, or the
mailing list - since as you point out this is a different topic).  I've just
downloaded and unzipped one of the smaller CON files and it parses fine for me:
ftp://ftp.ncbi.nih.gov/genbank/gbcon107.seq.gz

>>> from Bio import SeqIO
>>> count = 0
>>> for record in SeqIO.parse(open("gbcon107.seq"),"genbank") : count += 1
...
>>> print count
55031

As expected there is no sequence, but the name, description, features,
references etc are there.


From bugzilla-daemon at portal.open-bio.org  Fri Jan 30 11:29:07 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 30 Jan 2009 06:29:07 -0500
Subject: [Biopython-dev] [Bug 2738] Speed up GenBank parsing,
	in particular location parsing
In-Reply-To: <bug-2738-42@http.bugzilla.open-bio.org/>
Message-ID: <200901301129.n0UBT7Ah008213@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2738


------- Comment #9 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-30 06:29 EST -------
I've run my test script (attachment 1209) on a Linux machine with Python 2.5

 5.5K Jan 30 10:29 CY029873.gbk
  67M Jan 22 17:53 dr_ref_chr16.gbk
  42M Jan 22 17:53 NC_003075.gbk
  14M Jan 22 18:43 NC_003272.gbk
  25M Jan 22 17:52 NC_003279.gbk
 4.8M Jan 22 18:44 NC_004350.gbk
  20M Jan 22 18:42 NC_008095.gbk
  14M Jan 22 18:44 NC_009925.gbk
  18M Jan 22 18:43 NC_010628.gbk
 296M Jan 22 17:52 ptr_ref_chr1.gbk
  86M Jan 30 10:55 wgs.AAAB.1.gnp.gbk
 297M Jan 30 10:55 wgs.AABR.10.gbff.gbk

The last two files are WGS data for protein and nucleotide sequences,
downloaded from ftp://ftp.ncbi.nih.gov/genbank/wgs/ then unzipped and a gbk
extension added so my script parses them.

With and without the patch the test script gives identical output - which
appears to confirm the location parsing is not functionally altered.  The
timings were just over 2min and just over 8min with and without the patch
respectively (a fourfold speed up on this dataset).


From bugzilla-daemon at portal.open-bio.org  Fri Jan 30 11:30:30 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 30 Jan 2009 06:30:30 -0500
Subject: [Biopython-dev] [Bug 2649] Bio.KDTree expects numpy array with
	dtype="float32" on 64 bit machines.
In-Reply-To: <bug-2649-42@http.bugzilla.open-bio.org/>
Message-ID: <200901301130.n0UBUUMm008550@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2649


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-30 06:30 EST -------
Marking as fixed - please reopen this if need be.


From bugzilla-daemon at portal.open-bio.org  Fri Jan 30 11:54:26 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 30 Jan 2009 06:54:26 -0500
Subject: [Biopython-dev] [Bug 2639] SeqRecord.init doesn't check for
	arguments for their types
In-Reply-To: <bug-2639-42@http.bugzilla.open-bio.org/>
Message-ID: <200901301154.n0UBsQbw014456@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2639


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|                            |FIXED


------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-30 06:54 EST -------
(In reply to comment #5)
> Ok, understood. I hadn't thought of these cases.
> However, not having a Seq causes errors that are difficult to
> understand in other functions that use SeqRecord.
> For example, if you do:
> 
> >>> a = SeqRecord(id = '1')
> >>> a.format('fasta')
> 
> you get the error: 
> <type 'exceptions.AttributeError'>: 'NoneType' object has no attribute
> 'tostring'
> 
> This could scare a Biopython newbie; an exception like
> 'error - current SeqRecord object doesn't have a Seq' could be better.

Well, if you want to create a SeqRecord where the sequence is None, you'd have
to do SeqRecord(None, id="1") - your suggestion of SeqRecord(id="1") doesn't
work as the sequence is a mandatory argument.

However, I see your point that the current AttributeError isn't helpful in this
special case.  I've updated the Bio/SeqIO/FastaIO.py file in CVS (revision
1.15) to give a TypeError in this situation which will try to explain the
problem.
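
The kind of check described might look roughly like the sketch below.  This is
not the code committed to Bio/SeqIO/FastaIO.py; FakeRecord and fasta_string
are hypothetical stand-ins, and the wording of the message is an assumption.

```python
class FakeRecord:
    """Hypothetical stand-in for a record with an id and a sequence."""

    def __init__(self, seq, id):
        self.seq = seq
        self.id = id


def fasta_string(record):
    """Format one record as FASTA, failing clearly if there is no sequence."""
    if record.seq is None:
        # Explain the real problem instead of failing later with an
        # AttributeError on None.
        raise TypeError("Record %r has no sequence (seq is None), "
                        "so it cannot be written as FASTA" % record.id)
    return ">%s\n%s\n" % (record.id, record.seq)
```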

> What do you think about creating a 'NullSeq' object, which represents a
> Seq with no value, and using it as a default for SeqRecord?
> Later we could modify the other functions like .format and Seq.translate to
> intercept these objects and return the right error message.

Hmm.  It seems rather complicated for a rare case.  Using None to mean
"missing" or "null" is done in other python libraries/code (e.g. database
access), which is why I suggested someone might want to do this.

Marking this bug as fixed.


From bugzilla-daemon at portal.open-bio.org  Fri Jan 30 12:00:19 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 30 Jan 2009 07:00:19 -0500
Subject: [Biopython-dev] [Bug 2734] db.load problem with postgresql and
	psycopg2
In-Reply-To: <bug-2734-42@http.bugzilla.open-bio.org/>
Message-ID: <200901301200.n0UC0JcD016114@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2734


------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-30 07:00 EST -------
(In reply to comment #3)
> 
> What versions of biopython and the BioSQL schema are you using?
> 
> Cymon

According to the bug report, Stephen was using Biopython 1.49, so:

Stephen:
Biopython 1.49
postgresql 8.2 
BioSQL - schema version unspecified
psycopg2 - version unspecified
python - version unspecified
OS - Mac OS X

What about you Cymon - you have postgresql with psycopg2 working, but what
versions of things?

Peter


From bugzilla-daemon at portal.open-bio.org  Fri Jan 30 12:13:52 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 30 Jan 2009 07:13:52 -0500
Subject: [Biopython-dev] [Bug 2723] Minor corrections to the installation
	document
In-Reply-To: <bug-2723-42@http.bugzilla.open-bio.org/>
Message-ID: <200901301213.n0UCDqef019147@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2723


------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-30 07:13 EST -------
(In reply to comment #2)
> I'm leaving this bug open until I've updated the HTML and PDF copies of the
> installation document on the website.  I don't have the tool hevea installed
> on this machine, so I can't create the HTML version of the installation
> document -- just the PDF.  I should be able to do this next week...

Website updated.  Marking this bug as fixed. 


From bugzilla-daemon at portal.open-bio.org  Fri Jan 30 12:20:06 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 30 Jan 2009 07:20:06 -0500
Subject: [Biopython-dev] [Bug 2734] db.load problem with postgresql and
	psycopg2
In-Reply-To: <bug-2734-42@http.bugzilla.open-bio.org/>
Message-ID: <200901301220.n0UCK6Fp020687@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2734


------- Comment #6 from cymon.cox at gmail.com  2009-01-30 07:20 EST -------
(In reply to comment #5)
> (In reply to comment #3)
> > 
> > What versions of biopython and the BioSQL schema are you using?
> > 
> > Cymon
> 
> According to the bug report, Stephen was using Biopython 1.49, so:
> 
> Stephen:
> Biopython 1.49
> postgresql 8.2 
> BioSQL - schema version unspecified
> psycopg2 - version unspecified
> python - version unspecified
> OS - Mac OS X
> 
> What about you Cymon - you have postgresql with psycopg2 working, but what
> versions of things?
> 
> Peter
> 

Peter,

I'm using:
Biopython: CVS
PostgreSQL: 8.1.11
BioSQL: 1.0.1
Python: 2.5.2
Psycopg: 2.0.8 
OS: Red Hat Enterprise 5.3

C.


From bugzilla-daemon at portal.open-bio.org  Fri Jan 30 14:16:32 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 30 Jan 2009 09:16:32 -0500
Subject: [Biopython-dev] [Bug 2697] MaxEntropy calculate function assumes
	integer values for class and convergence criteria is hard coded
In-Reply-To: <bug-2697-42@http.bugzilla.open-bio.org/>
Message-ID: <200901301416.n0UEGWeN005337@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2697


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1139 is|0                           |1
           obsolete|                            |


------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-30 09:16 EST -------
Created an attachment (id=1211)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1211&action=view)
Patch to Bio/MaxEntropy.py to make the convergence parameters optional
arguments

This should retain API backwards compatibility by using the current module
level values as the function's default arguments (see earlier comments).  I've
checked that changing these and then re-calling the train function does work as
expected.

How does this look?


From bugzilla-daemon at portal.open-bio.org  Fri Jan 30 14:17:43 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 30 Jan 2009 09:17:43 -0500
Subject: [Biopython-dev] [Bug 2697] MaxEntropy calculate function assumes
	integer values for class and convergence criteria is hard coded
In-Reply-To: <bug-2697-42@http.bugzilla.open-bio.org/>
Message-ID: <200901301417.n0UEHhKG005438@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2697


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Attachment #1211|application/octet-stream    |text/plain
          mime type|                            |
Attachment #1211 is|0                           |1
              patch|                            |


------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-30 09:17 EST -------
(From update of attachment 1211)
Marking this as a patch (plain text)


From bugzilla-daemon at portal.open-bio.org  Fri Jan 30 14:19:43 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 30 Jan 2009 09:19:43 -0500
Subject: [Biopython-dev] [Bug 2697] MaxEntropy calculate function assumes
	integer values for class and convergence criteria is hard coded
In-Reply-To: <bug-2697-42@http.bugzilla.open-bio.org/>
Message-ID: <200901301419.n0UEJhID005587@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2697


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1211 is|0                           |1
           obsolete|                            |


------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-30 09:19 EST -------
(From update of attachment 1211)
Sorry - wrong version of the patch.  This doesn't cover _iis_solve_delta etc.


From bugzilla-daemon at portal.open-bio.org  Fri Jan 30 14:30:40 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 30 Jan 2009 09:30:40 -0500
Subject: [Biopython-dev] [Bug 2697] MaxEntropy calculate function assumes
	integer values for class and convergence criteria is hard coded
In-Reply-To: <bug-2697-42@http.bugzilla.open-bio.org/>
Message-ID: <200901301430.n0UEUe04006448@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2697


------- Comment #9 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-30 09:30 EST -------
Created an attachment (id=1212)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1212&action=view)
Patch to Bio/MaxEntropy.py to make the convergence parameters optional
arguments

This time it's the whole patch - sorry for the extra emails this has triggered.
I had stopped to check in a couple of docstring changes and fixed a few tabs in
MaxEntropy.py first, which confused things.

Note this is a bit different to what I was thinking in comment #5,
> ... something like this:
> 
> def train(training_set, results, feature_fns, update_fn=None,
>           max_iis_iterations = MAX_IIS_ITERATIONS,
>           iis_converge = IIS_CONVERGE,
>           max_newton_iterations = MAX_NEWTON_ITERATIONS,
>           newton_converge = NEWTON_CONVERGE):

The above code won't pick up changes to the module level variables like
MAX_IIS_ITERATIONS because the defaults are only evaluated once when the
function is created.  The patch deals with this as follows:

def train(training_set, results, feature_fns, update_fn=None,
          max_iis_iterations=None, iis_converge=None,
          max_newton_iterations=None, newton_converge=None):
    if max_iis_iterations is None :
        max_iis_iterations = MAX_IIS_ITERATIONS
    if iis_converge is None :
        iis_converge = IIS_CONVERGE
    if max_newton_iterations is None :
        max_newton_iterations = MAX_NEWTON_ITERATIONS
    if newton_converge is None :
        newton_converge = NEWTON_CONVERGE

This works :)
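
The difference between the two approaches can be seen in a small standalone
sketch.  The names train_frozen and train_late and the value 10000 are
illustrative only and are not taken from the patch.

```python
MAX_IIS_ITERATIONS = 10000  # illustrative module level value


def train_frozen(max_iis_iterations=MAX_IIS_ITERATIONS):
    # The default was captured once, when the def statement ran, so later
    # changes to the module level variable are not seen here.
    return max_iis_iterations


def train_late(max_iis_iterations=None):
    # The module level variable is looked up at call time instead.
    if max_iis_iterations is None:
        max_iis_iterations = MAX_IIS_ITERATIONS
    return max_iis_iterations
```

If MAX_IIS_ITERATIONS is reassigned after these definitions, train_late()
returns the new value while train_frozen() keeps returning the old one.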


From bugzilla-daemon at portal.open-bio.org  Fri Jan 30 14:34:23 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 30 Jan 2009 09:34:23 -0500
Subject: [Biopython-dev] [Bug 2745] New: Bio.GenBank.LocationParserError
	with a GenBank CON file
Message-ID: <bug-2745-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=2745

           Summary: Bio.GenBank.LocationParserError with a GenBank CON file
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: minor
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: bsouthey at gmail.com


The following file has a Bio.GenBank.LocationParserError:
ftp://ftp.ncbi.nih.gov/genbank/daily-nc/con_nc.0103.flat.gz

Partial error message (as the last line is the complete CONTIG line).

Syntax error at or near `Tokens('close_paren')' token
Traceback (most recent call last):
  File "parse_gbk.py", line 26, in <module>
    for record in SeqIO.parse(handle, "genbank") :
  File "/home/bsouthey/python/biopython_cvs/biopython/build/lib.linux-x86_64-2.5/Bio/GenBank/Scanner.py", line 410, in parse_records
  File "/home/bsouthey/python/biopython_cvs/biopython/build/lib.linux-x86_64-2.5/Bio/GenBank/Scanner.py", line 393, in parse
  File "/home/bsouthey/python/biopython_cvs/biopython/build/lib.linux-x86_64-2.5/Bio/GenBank/Scanner.py", line 371, in feed
  File "/home/bsouthey/python/biopython_cvs/biopython/build/lib.linux-x86_64-2.5/Bio/GenBank/Scanner.py", line 1093, in _feed_misc_lines
  File "/home/bsouthey/python/biopython_cvs/biopython/build/lib.linux-x86_64-2.5/Bio/GenBank/__init__.py", line 990, in contig_location
  File "/home/bsouthey/python/biopython_cvs/biopython/build/lib.linux-x86_64-2.5/Bio/GenBank/__init__.py", line 707, in location
Bio.GenBank.LocationParserError:
join(DS483543.1:1..325170,gap(unk100),DS483544.1:1..218545,gap(unk100),DS483545.1:1..95394,gap(unk100),DS483546.1:1..261305,gap(unk100),DS483547.1:1..63422,gap(unk100),DS483548.1:1..77432,gap(unk100),DS483549.1:1..371434,gap(unk100),DS483550.1:1..74569,gap(unk100),DS483551.1:1..54637,gap(unk100),DS483552.1:1..73591,gap(unk100),DS483553.1:1..63632,gap(unk100),DS483554.1:1..60619,gap(unk100),DS483555.1:1..57196,gap(unk100),DS483556.1:1..95189,gap(unk100),DS483557.1:1..48586,gap(unk100),DS483558.1:1..45971,gap(unk100),DS483559.1:1..59826,gap(unk100),DS483560.1:1..49535,gap(unk100),DS483561.1:1..51083,gap(unk100),...


From bugzilla-daemon at portal.open-bio.org  Fri Jan 30 14:35:41 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 30 Jan 2009 09:35:41 -0500
Subject: [Biopython-dev] [Bug 2745] Bio.GenBank.LocationParserError with a
	GenBank CON file
In-Reply-To: <bug-2745-42@http.bugzilla.open-bio.org/>
Message-ID: <200901301435.n0UEZfpC007388@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2745


------- Comment #1 from bsouthey at gmail.com  2009-01-30 09:35 EST -------
Created an attachment (id=1213)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1213&action=view)
Example of a single GenBank CON record that fails


From bugzilla-daemon at portal.open-bio.org  Fri Jan 30 14:47:36 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 30 Jan 2009 09:47:36 -0500
Subject: [Biopython-dev] [Bug 2738] Speed up GenBank parsing,
	in particular location parsing
In-Reply-To: <bug-2738-42@http.bugzilla.open-bio.org/>
Message-ID: <200901301447.n0UEla5Q009025@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2738


------- Comment #10 from bsouthey at gmail.com  2009-01-30 09:47 EST -------
(In reply to comment #8)
Thanks, I was able to print out the references from the annotations and I also
did not see any differences. 

I submitted a bug for the CON file.

I am a lot more comfortable with this patch now that a wide range of files have
been tested. But can you confirm that the example I provided is correctly
parsed?

Thanks
Bruce


From bugzilla-daemon at portal.open-bio.org  Fri Jan 30 15:11:56 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 30 Jan 2009 10:11:56 -0500
Subject: [Biopython-dev] [Bug 2745] Bio.GenBank.LocationParserError with a
	GenBank CON file
In-Reply-To: <bug-2745-42@http.bugzilla.open-bio.org/>
Message-ID: <200901301511.n0UFBuEW012224@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2745


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-30 10:11 EST -------
It's the "gap(unk100)" entries which are breaking the location parser in
Bruce's examples.  Similarly even "gap()" entries of unknown length like this
will fail:

LOCUS       AH007743     7832 bp    DNA             CON       26-MAY-1999
DEFINITION  Gallus gallus ornithine transcarbamylase (OTC) gene, complete cds.
ACCESSION   AH007743
VERSION     AH007743.1  GI:4927367
KEYWORDS    .
SOURCE      chicken.
  ORGANISM  Gallus gallus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Archosauria;
            Aves; Neognathae; Galliformes; Phasianidae; Phasianinae; Gallus.
[....]
FEATURES             Location/Qualifiers
     source          1..7832
                     /organism="Gallus gallus"
                     /db_xref="taxon:9031"
                     /chromosome="1"
CONTIG      join(AF065630.1:1..1903,gap(),AF065631.1:1..435,gap(),
            AF065632.1:1..509,gap(),AF065633.1:1..722,gap(),AF065634.1:1..707,
            gap(),AF065635.1:1..836,gap(),AF065636.1:1..1614,gap(),
            AF065637.1:1..605,gap(),AF065638.1:1..501)
//

Example based on ftp://ftp.ncbi.nih.gov/genbank/README.genbank although this
does not describe the new terms.  Older versions of the release notes do, e.g.
ftp://ftp.ncbi.nih.gov/genbank/release.notes/gb168.release.notes

========================= [start quote] =========================

3.4.15 CONTIG Format

  As an alternative to SEQUENCE, a CONTIG record can be present
following the ORIGIN record. A join() statement utilizing a syntax
similar to that of feature locations (see the Feature Table specification
mentioned in Section 3.4.12) provides the accession numbers and basepair
ranges of other GenBank sequences which contribute to a large-scale
biological object, such as a chromosome or complete genome. Here is
an example of the use of CONTIG :

CONTIG      join(AE003590.3:1..305900,AE003589.4:61..306076,
            AE003588.3:61..308447,AE003587.4:61..314549,AE003586.3:61..306696,
            AE003585.5:61..343161,AE003584.5:61..346734,AE003583.3:101..303641,

            [ lines removed for brevity ]

            AE003782.4:61..298116,AE003783.3:16..111706,AE002603.3:61..143856)

However, the CONTIG join() statement can also utilize a special operator
which is *not* part of the syntax for feature locations:

        gap()     : Gap of unknown length.

        gap(X)    : Gap with an estimated integer length of X bases.

                    To be represented as a run of n's of length X
                    in the sequence that can be constructed from
                    the CONTIG line join() statement .

        gap(unkX) : Gap of unknown length, which is to be represented
                    as an integer number (X) of n's in the sequence that
                    can be constructed from the CONTIG line join()
                    statement.

                    The value of this gap operator consists of the 
                    literal characters 'unk', followed by an integer.

Here is an example of a CONTIG line join() that utilizes the gap() operator:

CONTIG      join(complement(AADE01002756.1:1..10234),gap(1206),
            AADE01006160.1:1..1963,gap(323),AADE01002525.1:1..11915,gap(1633),
            AADE01005641.1:1..2377)

The first and last elements of the join() statement may be a gap() operator.
But if so, then those gaps should represent telomeres, centromeres, etc.

Consecutive gap() operators are illegal.

========================= [end quote] =========================

Evidently Biopython doesn't cope with these CONTIG lines - but then they do
have a different syntax to the feature locations.  I never understood why the
current code tries to parse the CONTIG string into a SeqFeature object in the
first place.
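
To illustrate how the gap() operators differ from ordinary feature locations,
here is a minimal sketch that splits a CONTIG join() string into ranges and
gaps.  It is not Biopython code; the function name split_contig, the regular
expression, and the (length, unknown) tuple layout are all assumptions made
for this example, and it only handles the flat join() form quoted above.

```python
import re

# gap(), gap(X) or gap(unkX), as described in the quoted release notes.
_GAP = re.compile(r"gap\((unk)?(\d*)\)$")


def split_contig(contig):
    """Split a CONTIG join(...) string into (kind, value) tuples."""
    text = "".join(contig.split())  # drop line breaks and indentation
    if not (text.startswith("join(") and text.endswith(")")):
        raise ValueError("Expected join(...): %r" % contig)
    parts = []
    for item in text[5:-1].split(","):
        match = _GAP.match(item)
        if match is None:
            # An accession range, possibly wrapped in complement(...)
            parts.append(("range", item))
        else:
            unknown = match.group(1) is not None or match.group(2) == ""
            length = int(match.group(2)) if match.group(2) else None
            parts.append(("gap", (length, unknown)))
    return parts
```

With this sketch gap(1206) becomes ("gap", (1206, False)), gap(unk100) becomes
("gap", (100, True)) and gap() becomes ("gap", (None, True)).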


From bugzilla-daemon at portal.open-bio.org  Fri Jan 30 15:36:52 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 30 Jan 2009 10:36:52 -0500
Subject: [Biopython-dev] [Bug 2681] BioSQL: record annotations enhancements
In-Reply-To: <bug-2681-42@http.bugzilla.open-bio.org/>
Message-ID: <200901301536.n0UFaq5u015637@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2681


------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-30 10:36 EST -------
(In reply to comment #2)
> > 'contig' is ignored by loader because it's a SeqFeature object. Is there any
> > reason it couldnt be loaded and retrieved? (record is GenBank/NT_019265.gb)
> 
> I couldn't even say off hand how the CONTIG line in that example would be
> parsed, let alone how it gets dealt with when loading into BioSQL.

Basically the CONTIG line looks rather a lot like a feature location, typically
the join of lots of (external) sequences.  It makes some sense to parse this
into an object structure and, given the way joins are handled for features,
this led the original author to represent the CONTIG information as a dummy
feature with lots of sub features.  However, since the CONTIG can also include
gaps (of unknown length), this doesn't quite fit the current SeqFeature
location objects (see Bug 2745).

If we extend the location objects to cope with these gaps, then perhaps the
CONTIG can stay as a SeqFeature, in which case for BioSQL maybe we should
record it in the SeqFeature table.  We'd have to invent a way to record these
gap locations though.

However, if we just stored the CONTIG line as a raw string, we could then store
it in BioSQL as just another bioentry qualifier (assuming it doesn't overflow
the text field limit).

I've checked how and where BioPerl stores the contig information using the
example Bruce used on Bug 2745, attachment 1213, and see that the CONTIG
information is stored in the bioentry_qualifier_value table under the term
"contig" under the ontology "Annotation Tags".  They have retained the separate
lines, storing each as a separate entry with an increasing rank.

Thus for compatibility with BioSQL, it would make sense for the GenBank parser
to store the CONTIG line as a simple string (or list of strings), and not as a
SeqFeature (which is currently half broken anyway - see Bug 2745).
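
As a rough illustration of the list-of-strings idea, the sketch below pairs
each CONTIG line with an increasing rank, loosely mirroring the separate
ranked entries described above.  The helper name contig_lines_with_rank,
starting the ranks at 1, and the sample accessions are all assumptions; this
is not the BioPerl or BioSQL code:

```python
def contig_lines_with_rank(lines, first_rank=1):
    """Return (rank, value) pairs for the non-empty CONTIG lines, in order."""
    cleaned = [line.strip() for line in lines if line.strip()]
    return list(enumerate(cleaned, start=first_rank))


# Made-up CONTIG lines, split as they might appear in the flat file.
example = [
    "join(AAAA01000001.1:1..500,gap(100),",
    "complement(AAAA01000002.1:1..300))",
]
for rank, value in contig_lines_with_rank(example):
    print(rank, value)
```

Each pair could then become one qualifier value under the "contig" term, and
sorting by rank recovers the original line order.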




From bugzilla-daemon at portal.open-bio.org  Fri Jan 30 16:20:18 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 30 Jan 2009 11:20:18 -0500
Subject: [Biopython-dev] [Bug 2745] Bio.GenBank.LocationParserError with a
	GenBank CON file
In-Reply-To: <bug-2745-42@http.bugzilla.open-bio.org/>
Message-ID: <200901301620.n0UGKIXW024960@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2745


------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-30 11:20 EST -------
Created an attachment (id=1214)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1214&action=view)
Treat the CONTIG information as a string, not a SeqFeature

As outlined on Bug 2681 comment 8, there are good reasons to simply store the
CONTIG information as a string or perhaps a list of strings.  This will make
our BioSQL bindings consistent with BioPerl.

More generally, I never really liked the idea of storing the CONTIG location as
a SeqFeature.  I could understand in principle using a location object, but the
current location objects do not deal with joins directly, which is why you
have to use a SeqFeature with sub-features.

In the long term, a new location object might be a worthwhile change to both
features and the contig.  For now, this patch simply stores the CONTIG
information as one long string.
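
A minimal sketch of the "one long string" idea, assuming the CONTIG block has
already been read from the file as raw lines.  This is not the attached
patch; collapse_contig is a hypothetical helper and the sample lines are made
up:

```python
def collapse_contig(raw_lines):
    """Join a CONTIG block's raw lines into one string (hypothetical helper)."""
    pieces = []
    for i, line in enumerate(raw_lines):
        text = line.strip()
        # Drop the CONTIG keyword from the first line, if present.
        if i == 0 and text.startswith("CONTIG"):
            text = text[len("CONTIG"):].strip()
        pieces.append(text)
    return "".join(pieces)


block = [
    "CONTIG      join(AAAA01000001.1:1..500,gap(100),",
    "            complement(AAAA01000002.1:1..300))",
]
print(collapse_contig(block))
```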

If we commit this, then Tests/output/test_GenBank will need updating too.




From bugzilla-daemon at portal.open-bio.org  Fri Jan 30 16:54:20 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 30 Jan 2009 11:54:20 -0500
Subject: [Biopython-dev] [Bug 2723] Minor corrections to the installation
	document
In-Reply-To: <bug-2723-42@http.bugzilla.open-bio.org/>
Message-ID: <200901301654.n0UGsK0D003024@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2723


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk  2009-01-30 11:54 EST -------
This is fixed now.

