From bugzilla-daemon at portal.open-bio.org Thu Jan 1 20:37:43 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 1 Jan 2009 20:37:43 -0500 Subject: [Biopython-dev] [Bug 2544] Bio.GenBank and SeqFeature improvements In-Reply-To: Message-ID: <200901020137.n021bhEB022751@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2544 ------- Comment #4 from mmokrejs at ribosome.natur.cuni.cz 2009-01-01 20:37 EST ------- Can I instantiate GenBank file, reverse-complement the sequence (keep letter casing) in the SeqIO object and dump it back to a GenBank file? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Jan 2 13:15:46 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 2 Jan 2009 13:15:46 -0500 Subject: [Biopython-dev] [Bug 2544] Bio.GenBank and SeqFeature improvements In-Reply-To: Message-ID: <200901021815.n02IFkcf012662@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2544 ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-02 13:15 EST ------- (In reply to comment #4) > Can I instantiate GenBank file, reverse-complement the sequence > (keep letter casing) in the SeqIO object and dump it back to a > GenBank file? I think this question would have been better handled on the mailing lists, rather than on this bug. Note that currently our GenBank output via Bio.SeqIO does not include the features and references - see Bug 2294. I would do this based on the approach described in the tutorial, which assumes there could be many records in the input file. 
Here is a variation for just one record (untested):

    from Bio import SeqIO
    from Bio.SeqRecord import SeqRecord

    record = SeqIO.read(open("example.gbk"), "genbank")
    rc_record = SeqRecord(seq=record.seq.reverse_complement(),
                          id="rc_" + record.id,
                          name="rc_" + record.name,
                          description="reverse complement")
    out_handle = open("rc_example.gbk", "w")
    SeqIO.write([rc_record], out_handle, "genbank")
    out_handle.close()

Note you *could* override the record's sequence in situ:

    record.seq = record.seq.reverse_complement()  # BAD IDEA

This is a bad idea because none of the annotations will have been changed - in addition to the name/id/description still being the same, all the feature locations etc will still be for the forward sequence.

--

I'm leaving this bug open for defining __repr__ for the Bio.SeqFeature.Reference object (and perhaps tweaking the display of the references in the SeqRecord __str__ method) ONLY. Please continue any other discussion on the mailing lists.

Thanks.

Peter.

From bugzilla-daemon at portal.open-bio.org Sat Jan 3 17:18:56 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 3 Jan 2009 17:18:56 -0500
Subject: [Biopython-dev] [Bug 2723] New: Clarify what applies to which version of biopython and other doc cleanup
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2723

Summary: Clarify what applies to which version of biopython and other doc cleanup
Product: Biopython
Version: 1.49
Platform: PC
OS/Version: Linux
Status: NEW
Severity: normal
Priority: P2
Component: Documentation
AssignedTo: biopython-dev at biopython.org
ReportedBy: mmokrejs at ribosome.natur.cuni.cz

I went to look around at the docs because the built-in tests of 1.49 setup.py spat some messages about external programs missing.
I haven't found any hints on them in http://news.open-bio.org/news/2008/11/biopython-release-149/.

Anyway, looking at http://biopython.org/DIST/docs/install/Installation.html#htoc17 I see: "3.4 mxTextTools (no longer needed)". I would propose:

3.4 mxTextTools (no longer needed since 1.49)

Similarly:
- 3.1 Numerical Python (NumPy) (strongly recommended)
+ 3.1 Numerical Python (NumPy) (strongly recommended since 1.49)

Bad URL links are in the text:

3.3 Database Access (MySQLdb, ...) (optional)

[cut]

Additionally, both MySQLdb and psycopg (a PostgreSQL database adaptor) can be
used for accessing BioSQL databases through Biopython (see ). Again if you are
-----------------------------------------------------------^
not going to use BioSQL, there shouldn't be any need to install these modules.

3.4 mxTextTools (no longer needed)

[cut]

However, we currently recommend you install mxTextTools 2.0, as some of the
API changes made in 3.0 version were not compatible with Biopython. Goto to download
---------------------------------------------------------------------^^
this.

I haven't found an answer for me yet:

test_PopGen_FDist ... skipping. Install FDist if you want to use Bio.PopGen.FDist.
ok
test_PopGen_FDist_nodepend ... ok
test_PopGen_GenePop ... ok
test_PopGen_SimCoal ... skipping. Install SIMCOAL2 if you want to use Bio.PopGen.SimCoal.
ok
test_PopGen_SimCoal_nodepend ... ok
test_ProtParam ... ok
test_Registry ... ok
test_Restriction ... ok
test_SCOP_Astral ... ok
test_SCOP_Cla ... ok
test_SCOP_Des ... ok
test_SCOP_Dom ... ok
test_SCOP_Hie ... ok
test_SCOP_Raf ... ok
test_SCOP_Residues ... ok
test_SCOP_Scop ... ok
test_SProt ... ok
test_SVDSuperimposer ... ok
test_SeqIO ... ok
test_SeqIO_online ... ok
test_SeqUtils ... ok
test_SubsMat ... ok
test_UniGene ... ok
test_Wise ... skipping. Install Wise2 (dnal) if you want to use Bio.Wise.
ok
test_align ... ok
test_docstrings ... ok
test_geo ... ok
test_interpro ... ok
test_kNN ... ok
test_lowess ... ok
test_pairwise2 ...
ok test_prodoc ... ok test_property_manager ... ok test_prosite ... ok test_prosite2 ... ok test_psw ... skipping. Install Wise2 (dnal) if you want to use Bio.Wise. ok test_seq ... ok test_translate ... ok test_trie ... ok test_triefind ... ok ---------------------------------------------------------------------- Ran 96 tests in 172.215s OK Pointer to those packages would have been helpful. From the test suite as well as from installation manual. Moreover, what database username/password would I have to make to get the BioSQL stuff compiled and tested? ^H^H^H^H^H^H I see, it gets compiled anyway the tests just were not run. The installation manual and the output from test suite should be clearer. Thanks, Peter! -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sat Jan 3 17:30:55 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 3 Jan 2009 17:30:55 -0500 Subject: [Biopython-dev] [Bug 2724] New: Unclear? changes between 1.47 and 1.49 Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2724 Summary: Unclear? changes between 1.47 and 1.49 Product: Biopython Version: 1.49 Platform: PC OS/Version: Linux Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: mmokrejs at ribosome.natur.cuni.cz I had a look by diff(1) what files were installed on my machine by 1.47 release and which were installed by 1.49. I don't know what cdistance was about but the mailing list archive search tool does not work, and searching for it manually in raw archives of Oct and Nov 2008 did not help. The second file shown here contains a white space in a filename, not critical but maybe good to rename in next release. 
-/usr/lib/python2.5/site-packages/Bio/cdistance.so
+/usr/share/biopython/Tests/Clustalw/temp horses.dnd

From bugzilla-daemon at portal.open-bio.org Sat Jan 3 20:10:02 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 3 Jan 2009 20:10:02 -0500
Subject: [Biopython-dev] [Bug 2724] Unclear? changes between 1.47 and 1.49
In-Reply-To:
Message-ID: <200901040110.n041A2e5028585@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2724

------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-03 20:10 EST -------
Bio.cdistance was an optional C implementation used within Bio.distance - the C code was used if available to speed up calculations. You can see the (now deleted) code in CVS here:
http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Attic/cdistancemodule.c?hideattic=0&cvsroot=biopython

This C code (Bio.cdistance) was removed when the Python code (Bio.distance) was deprecated for release 1.49. This was discussed at the start of October on the mailing list, see this thread:
http://lists.open-bio.org/pipermail/biopython/2008-October/004532.html

This should have been mentioned in the DEPRECATED file, but wasn't. I've updated this in CVS, see revision 1.41
http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/DEPRECATED?cvsroot=biopython

Thanks for spotting this omission.
From bugzilla-daemon at portal.open-bio.org Sat Jan 3 20:20:42 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 3 Jan 2009 20:20:42 -0500
Subject: [Biopython-dev] [Bug 2724] Unclear? changes between 1.47 and 1.49
In-Reply-To:
Message-ID: <200901040120.n041Kgkx029421@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2724

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-03 20:20 EST -------
The file "/usr/share/biopython/Tests/Clustalw/temp horses.dnd" is normally created by one of the unit tests, test_Clustalw_tool.py (and the space is very deliberate). This stray dnd file does appear to have been included with biopython-1.49.zip (and probably the tarball as well), which must have been a minor slip on my part. However, I don't think it's worth re-issuing the archive files over this.

I've updated test_Clustalw_tool.py as of CVS revision 1.4 so that it should remove this dnd file automatically.
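A test that deliberately writes a file with a space in its name can clean up after itself along the following lines. This is a sketch using the standard unittest module and is not the actual change made to test_Clustalw_tool.py; the class name, the helper `run_cleanup_demo` and the use of a temporary directory are assumptions made for illustration.

```python
import io
import os
import tempfile
import unittest


class TreeFileCleanupTest(unittest.TestCase):
    """Hypothetical test that writes a file whose name contains a space."""

    def setUp(self):
        # Work in a private temporary directory, not the source tree.
        self.workdir = tempfile.mkdtemp()
        self.tree_path = os.path.join(self.workdir, "temp horses.dnd")

    def tearDown(self):
        # Runs even when the test fails, so no stray file is left behind.
        if os.path.exists(self.tree_path):
            os.remove(self.tree_path)
        os.rmdir(self.workdir)

    def test_tree_file_is_written(self):
        with open(self.tree_path, "w") as handle:
            handle.write("(horse_a,horse_b);\n")
        self.assertTrue(os.path.isfile(self.tree_path))


def run_cleanup_demo():
    """Run the test quietly and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TreeFileCleanupTest)
    return unittest.TextTestRunner(stream=io.StringIO()).run(suite)
```

Putting the removal in `tearDown` rather than at the end of the test body means a failing assertion cannot leave the file behind to be swept into a release archive.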
From bugzilla-daemon at portal.open-bio.org Sat Jan 3 20:37:26 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 3 Jan 2009 20:37:26 -0500
Subject: [Biopython-dev] [Bug 2723] Clarify what applies to which version of biopython and other doc cleanup
In-Reply-To:
Message-ID: <200901040137.n041bQ6Z030767@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2723

------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-03 20:37 EST -------
(In reply to comment #0)
> I went to look around at the docs because the built-in tests of 1.49 setup.py
> spat some messages about external programs missing. I haven't found any
> hints on them in http://news.open-bio.org/news/2008/11/biopython-release-149/.

No, that text and the matching email announcement don't go into details about installation - the text was already long enough I felt. However, the download page does list various external programs:
http://biopython.org/wiki/Download

(Someone else had pointed out we were missing a few, which has been fixed, but I couldn't find the email/bug report while writing this reply).

> Anyway, looking at
> http://biopython.org/DIST/docs/install/Installation.html#htoc17
> I see: "3.4 mxTextTools (no longer needed)". I would propose:
>
> 3.4 mxTextTools (no longer needed since 1.49)
>
> Similarly:
> - 3.1 Numerical Python (NumPy) (strongly recommended)
> + 3.1 Numerical Python (NumPy) (strongly recommended since 1.49)

That does seem sensible.

> Bad URL links are in the text:
>
> 3.3 Database Access (MySQLdb, ...) (optional)
>
> [cut]
>
> Additionally, both MySQLdb and psycopg (a PostgreSQL database adaptor) can be
> used for accessing BioSQL databases through Biopython (see ). Again if you
> -----------------------------------------------------------^
> are not going to use BioSQL, there shouldn't be any need to install these
> modules.
> > > 3.4 mxTextTools (no longer needed) > > [cut] > > However, we currently recommend you install mxTextTools 2.0, as some of the > API changes made in 3.0 version were not compatible with Biopython. Goto > ---------------------------------------------------------------------^^ > to download this. I'll have to check those... probably something silly in the LaTeX source. > I haven't found an answer for me yet: > > test_PopGen_FDist ... skipping. Install FDist if you want to use > Bio.PopGen.FDist. > ok > ... > test_PopGen_SimCoal ... skipping. Install SIMCOAL2 if you want to use > Bio.PopGen.SimCoal. > ok > ... > test_Wise ... skipping. Install Wise2 (dnal) if you want to use Bio.Wise. > ok > test_psw ... skipping. Install Wise2 (dnal) if you want to use Bio.Wise. > ok See http://biopython.org/wiki/Download > Pointer to those packages would have been helpful. From the test suite as well > as from installation manual. I'm not keen on making the unit test even more verbose by adding URLs to these messages. The information is on the download page, but yes, adding it to the installation document seems sensible. > Moreover, what database username/password would > I have to make to get the BioSQL stuff compiled and tested? ^H^H^H^H^H^H > I see, it gets compiled anyway the tests just were not run. The BioSQL unit test message should say: "Check settings in Tests/setup_BioSQL.py if you plan to use BioSQL". i.e. Once you have installed BioSQL and setup a database, edit the file setup_BioSQL.py to match. See http://biopython.org/wiki/BioSQL Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Sun Jan 4 13:56:22 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 4 Jan 2009 13:56:22 -0500
Subject: [Biopython-dev] [Bug 2723] Minor corrections to the installation document
In-Reply-To:
Message-ID: <200901041856.n04IuMhJ028749@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2723

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Clarify what applies to     |Minor corrections to the
                   |which version of biopython  |installation document
                   |and other doc cleanup       |

------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-04 13:56 EST -------
(In reply to comment #1)
> (In reply to comment #0)
> > I went to look around at the docs because the built-in tests of 1.49
> > setup.py spat some messages about external programs missing. I haven't
> > found any hints on them in
> > http://news.open-bio.org/news/2008/11/biopython-release-149/.
>
> No, that text and the matching email announcement don't go into details about
> installation - the text was already long enough I felt. However, the download
> page does list various external programs:
> http://biopython.org/wiki/Download

I've added a section on third party tools to the installation document in CVS.

> > Anyway, looking at
> > http://biopython.org/DIST/docs/install/Installation.html#htoc17
> > I see: "3.4 mxTextTools (no longer needed)". I would propose:
> >
> > 3.4 mxTextTools (no longer needed since 1.49)
> >
> > Similarly:
> > - 3.1 Numerical Python (NumPy) (strongly recommended)
> > + 3.1 Numerical Python (NumPy) (strongly recommended since 1.49)
>
> That does seem sensible.

On reflection, I don't like the layout with version numbers stuck in the section names.
The NumPy section is already very clear about the fact that this applies to 1.49 onwards, and that older versions of Biopython needed Numeric instead. I have tried to clarify the mxTextTools section in CVS.

> > Bad URL links are in the text:
> >
> > 3.3 Database Access (MySQLdb, ...) (optional)
> > ...
> > 3.4 mxTextTools (no longer needed)
> > ...
>
> I'll have to check those... probably something silly in the LaTeX source.

Fixed in CVS.

I'm leaving this bug open until I've updated the HTML and PDF copies of the installation document on the website. I don't have the tool hevea installed on this machine, so I can't create the HTML version of the installation document -- just the PDF. I should be able to do this next week...

From bugzilla-daemon at portal.open-bio.org Sun Jan 4 17:09:47 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 4 Jan 2009 17:09:47 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main Biopython distribution
In-Reply-To:
Message-ID: <200901042209.n04M9lJ0010428@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671

------- Comment #32 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-04 17:09 EST -------
(In reply to comment #30)
> (In reply to comment #29)
> >
> > I propose that in Biopython 1.50 we support both "colour" and "color",
> > but for Biopython 1.51 we add deprecation warnings when "colour" is used.
> >
> > We should probably do the same thing for "centre" and "center" as well...
> >
>
> I agree. We should encourage use of the US spelling in the documentation, to
> catch those new to GD. This approach provides a window for conversion of old
> GD scripts for previous users, which is a good thing.
>

I've updated CVS to switch from centre to center, with properties set up to allow access under the old spellings, and where I thought it appropriate I've included both spellings in argument lists. Another set of eyes to check this wouldn't hurt.

I'm leaving this bug open until we've done the documentation (see my comment 25). There is also the issue of Bug 2705 for the AT and GC content and skew functions and any windowing function to help plot these in GenomeDiagram.

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.

From bugzilla-daemon at portal.open-bio.org Mon Jan 5 11:30:46 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 5 Jan 2009 11:30:46 -0500
Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and write_to_string() are inefficient and don't check inputs
In-Reply-To:
Message-ID: <200901051630.n05GUkun032207@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2711

bsouthey at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |

------- Comment #17 from bsouthey at gmail.com 2009-01-05 11:30 EST -------
I do not consider this bug completely fixed for multiple reasons of which my patch addressed some of these prior to the creation of the _write function. I do like where _write is heading as it is making cleaner and more understandable code.

1) I do not understand the need for the dictionary of modules 'formatdict' in _write as it creates unnecessary inefficient code. The options need to be part of the check for the type of output.

2) There is no indication that the output for write and write_to_string only accepts uppercase. Note the _write function states this but a user will not see these.
I do not understand why lowercase is unacceptable.

3) The check for renderPM at start is really redundant because _write checks for it (well sort of). It is also an unnecessary delay if renderPM is not used. If you really must use the dictionary (which I really do not like) I would suggest something like:

    formatdict = {'PS': renderPS, 'PDF': renderPDF, 'SVG': renderSVG}
    try:
        from reportlab.graphics import renderPM
        formatdict.update({'JPG': renderPM, 'BMP': renderPM, 'GIF': renderPM,
                           'PNG': renderPM, 'TIFF': renderPM, 'TIF': renderPM})

The current code would show the correct options regardless of status of renderPM. Perhaps an exception could provide a warning that renderPM is not present.

4) There is no test for the presence of renderPM. The test function must check for renderPM and should at least provide a warning if not present. Otherwise this is a surprise to a user because not all options will be available.

5) The installation documentation must also indicate that renderPM is optional and also how to install the renderPM module.

From bugzilla-daemon at portal.open-bio.org Mon Jan 5 11:49:46 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 5 Jan 2009 11:49:46 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main Biopython distribution
In-Reply-To:
Message-ID: <200901051649.n05GnkVK001550@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671

------- Comment #33 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-05 11:49 EST -------
Still to do on the documentation front (as written in comment #25),
>
> * Updating the existing GenomeDiagram manual to match (different imports,
> colour to color), which I think can stay as a separate PDF file.
> > * A short introduction to Bio.Graphics including GenomeDiagram as part of > a new chapter in the tutorial? Plus (as pointed out on Bug 2711 / Bug 2710): * Updating the installation instructions so that the ReportLab section also covers renderPM (needed for bitmaps). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is. From bugzilla-daemon at portal.open-bio.org Mon Jan 5 11:56:57 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 5 Jan 2009 11:56:57 -0500 Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and write_to_string() are inefficient and don't check inputs In-Reply-To: Message-ID: <200901051656.n05GuvPP002443@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2711 ------- Comment #18 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-05 11:56 EST ------- (In reply to comment #17) > I do not consider this bug completely fixed for multiple reasons of which my > patch addressed some of these prior to the creation of the _write function. I > do like where _write is heading as it is making cleaner and more > understandable code. > > 1) I do not understand the need for the dictionary of modules 'formatdict' in > _write as it creates unnecessary inefficient code. The options need to be part > of the check for the type of output. OK the use of a dictionary is a style thing. You think its ugly and inefficient. Leighton and I don't find it ugly. I thought the if/elif/elif/else alternative you suggested was "ugly". The argument for the type of output does get checked (by catching a KeyError from the dictionary). > 2) There is no indication that the output for write and write_to_string only > accepts uppercase. Note the _write function states this but a user will not > see these. I do not understand why lowercase is unacceptable. 
As part of Bug 2718, for consistency with the rest of Bio.Graphics I think we should after all accept either case. > 3) The check for renderPM at start is really redundant because _write checks > for it (well sort of). It is also an unnecessary delay if renderPM is not > used. If you really must use the dictionary (which I really do not like) I > would suggest something like: > formatdict = {'PS': renderPS, 'PDF': renderPDF,'SVG': renderSVG} > try: > from reportlab.graphics import renderPM > formatdict.update({'JPG': renderPM, 'BMP': renderPM, 'GIF': renderPM, > 'PNG': renderPM, 'TIFF': renderPM,'TIF': renderPM}) I don't see how that would work, because unfortunately with the reportlab API, we must treat renderPM differently to renderPDF, renderPS and renderSVG. > The current code would show the correct options regardless of status > ofrenderPM. Perhaps an exception could provide a warning that renderPM > is not present. Right now we do have a "helpful" exception raised when a bitmap format is requested and renderPM is not installed. > 4) There is no test for the presence of renderPM. The test function must check > for renderPM and should at least provide a warning if not present. Otherwise > this is a surprise to a user because not all options will be available. There is an "on demand" test - via the _write function. As Leighton has already pointed out, this is nasty in that it can come as a surprise to the user. However, as far as I can see the alternative is an error/warning at import time regardless even if the user doesn't need or want bitmap output (i.e. Bug 2710). The current situation strikes me as the lesser of two evils. > 5) The installation documentation must also indicate that renderPM is > optional and also how to install the renderPM module. Yes, we should indicate renderPM is optional. Updating our documentation to cover GenomeDiagram is still pending on Bug 2671. 
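The points argued over in this comment (a dictionary of formats, the check via KeyError, accepting either case, and a helpful error when renderPM is missing) can be put together roughly as follows. This is a sketch under stated assumptions and not the code in Diagram.py: the function name `find_renderer` and its `importer` hook are hypothetical, and only the lookup is shown. As noted above, renderPM has to be called differently from the other three modules, which this sketch does not attempt.

```python
import importlib

# Format name -> ReportLab graphics module assumed to render it.
_FORMATS = {"PS": "renderPS", "PDF": "renderPDF", "SVG": "renderSVG"}
# Every bitmap format goes through the optional renderPM module.
_FORMATS.update(dict.fromkeys(
    ("JPG", "BMP", "GIF", "PNG", "TIFF", "TIF"), "renderPM"))


def find_renderer(output, importer=importlib.import_module):
    """Return the renderer module for a format name given in any case.

    The importer argument is a hypothetical hook so the lookup can be
    exercised without ReportLab installed.
    """
    try:
        name = _FORMATS[output.upper()]
    except KeyError:
        # The dictionary lookup doubles as the check on the argument.
        raise ValueError("Unknown output format %r, expected one of: %s"
                         % (output, ", ".join(sorted(_FORMATS)))) from None
    try:
        # Imported on demand, so nothing is loaded for unused formats.
        return importer("reportlab.graphics." + name)
    except ImportError as err:
        raise ImportError("Output format %s needs reportlab.graphics.%s, "
                          "which could not be imported"
                          % (output.upper(), name)) from err
```

With ReportLab installed, `find_renderer("pdf")` and `find_renderer("PDF")` would return the same module; a request such as `find_renderer("png")` only fails if renderPM itself cannot be imported, which is the "on demand" behaviour described above.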
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Jan 5 16:46:37 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 5 Jan 2009 16:46:37 -0500 Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and write_to_string() are inefficient and don't check inputs In-Reply-To: Message-ID: <200901052146.n05LkbSZ031281@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2711 ------- Comment #19 from bsouthey at gmail.com 2009-01-05 16:46 EST ------- (In reply to comment #18) > (In reply to comment #17) > > I do not consider this bug completely fixed for multiple reasons of which my > > patch addressed some of these prior to the creation of the _write function. I > > do like where _write is heading as it is making cleaner and more > > understandable code. > > > > 1) I do not understand the need for the dictionary of modules 'formatdict' in > > _write as it creates unnecessary inefficient code. The options need to be part > > of the check for the type of output. > > OK the use of a dictionary is a style thing. You think its ugly and > inefficient. Leighton and I don't find it ugly. I thought the > if/elif/elif/else alternative you suggested was "ugly". > > The argument for the type of output does get checked (by catching a KeyError > from the dictionary). I agree that reportlab makes any solution "ugly" because the different types require different arguments. I agree this is partly a style issue because it is a case of what to do first, when to do it and when to tell the user what is missing. > > > 2) There is no indication that the output for write and write_to_string only > > accepts uppercase. Note the _write function states this but a user will not > > see these. I do not understand why lowercase is unacceptable. 
> > As part of Bug 2718, for consistency with the rest of Bio.Graphics I think we > should after all accept either case. > > > 3) The check for renderPM at start is really redundant because _write checks > > for it (well sort of). It is also an unnecessary delay if renderPM is not > > used. If you really must use the dictionary (which I really do not like) I > > would suggest something like: > > formatdict = {'PS': renderPS, 'PDF': renderPDF,'SVG': renderSVG} > > try: > > from reportlab.graphics import renderPM > > formatdict.update({'JPG': renderPM, 'BMP': renderPM, 'GIF': renderPM, > > 'PNG': renderPM, 'TIFF': renderPM,'TIF': renderPM}) > > I don't see how that would work, because unfortunately with the reportlab API, > we must treat renderPM differently to renderPDF, renderPS and renderSVG. > This just moves the renderPM import into _write and the rest of the code runs if you add: except: renderPM=None > > The current code would show the correct options regardless of status > > ofrenderPM. Perhaps an exception could provide a warning that renderPM > > is not present. > > Right now we do have a "helpful" exception raised when a bitmap format is > requested and renderPM is not installed. Again a style issue because I just find it redundant if we already know that renderPM is not present. > > > 4) There is no test for the presence of renderPM. The test function must check > > for renderPM and should at least provide a warning if not present. Otherwise > > this is a surprise to a user because not all options will be available. > > There is an "on demand" test - via the _write function. As Leighton has > already pointed out, this is nasty in that it can come as a surprise to the > user. However, as far as I can see the alternative is an error/warning at > import time regardless even if the user doesn't need or want bitmap output > (i.e. Bug 2710). The current situation strikes me as the lesser of two evils. 
> I mean that test_GenomeDiagram should also check for renderPM and provide a warning if not present. So if tests are run then there is some indication that something is missing. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Jan 5 17:33:30 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 5 Jan 2009 17:33:30 -0500 Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and write_to_string() are inefficient and don't check inputs In-Reply-To: Message-ID: <200901052233.n05MXUCS002828@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2711 ------- Comment #20 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-05 17:33 EST ------- (In reply to comment #19) > I mean that test_GenomeDiagram should also check for renderPM and provide a > warning if not present. So if tests are run then there is some indication that > something is missing. The way we have our external dependency checking setup, if something is missing the whole test is skipped. I want to keep test_GenomeDiagram.py as it is producing PDF output (with no dependency on renderPM - so that the core GenomeDiagram functionality is tested). However, I had been thinking about adding a (smaller) extra test, say test_GenomeDiagram_bitmaps.py which would need renderPM installed. Alternatively this could be a more general quick test for making PNG etc with all of Bio.Graphics after fixing Bug 2718. This would as you point out mean anyone running the test suite would then be alerted to the fact they may be missing renderPM - which would be a good thing. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
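The split proposed here, keeping the core test free of renderPM and adding a smaller bitmap-only test that is skipped when renderPM is missing, might look like this. It is a sketch using the standard unittest skip decorators, which is not necessarily how the Biopython test runner handles missing dependencies. The class names and the helper `run_demo` are hypothetical, and the test bodies are placeholders.

```python
import io
import unittest

try:
    from reportlab.graphics import renderPM
except ImportError:
    renderPM = None  # ReportLab, or its bitmap renderer, is not installed


class VectorOutputTest(unittest.TestCase):
    """Stands in for the core tests that need no bitmap support."""

    def test_core(self):
        self.assertEqual("pdf".upper(), "PDF")


@unittest.skipIf(renderPM is None,
                 "Install ReportLab's renderPM module if you want bitmap output.")
class BitmapOutputTest(unittest.TestCase):
    """Stands in for a separate, smaller bitmap-only test."""

    def test_bitmap_support_present(self):
        self.assertIsNotNone(renderPM)


def run_demo():
    """Run both test cases quietly and return the unittest result object."""
    loader = unittest.defaultTestLoader
    suite = unittest.TestSuite([
        loader.loadTestsFromTestCase(VectorOutputTest),
        loader.loadTestsFromTestCase(BitmapOutputTest),
    ])
    return unittest.TextTestRunner(stream=io.StringIO()).run(suite)
```

Anyone running the suite without renderPM would then see the skip message for the bitmap test while the core test still runs.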
From bugzilla-daemon at portal.open-bio.org Mon Jan 5 18:20:52 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 5 Jan 2009 18:20:52 -0500
Subject: [Biopython-dev] [Bug 2718] Bio.Graphics and output file formats (PDF, EPS, SVG, and bitmaps)
In-Reply-To:
Message-ID: <200901052320.n05NKqok006769@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2718

------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-05 18:20 EST -------
(In reply to comment #2)
> In addition, I notice that Bio.Graphics.BasicChromosome,
> Bio.Graphics.Comparative and Bio.Graphics.Distribution expect lower case
> formats (currently just pdf and eps) while Bio.Graphics.GenomeDiagram
> expects upper case. We should be consistent, which for backwards
> compatibility would mean accepting either case.

Bio.Graphics.GenomeDiagram will now accept format names in any case.

From bugzilla-daemon at portal.open-bio.org Mon Jan 5 19:16:10 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 5 Jan 2009 19:16:10 -0500
Subject: [Biopython-dev] [Bug 2718] Bio.Graphics and output file formats (PDF, EPS, SVG, and bitmaps)
In-Reply-To:
Message-ID: <200901060016.n060GAfe011559@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2718

------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-05 19:16 EST -------
Created an attachment (id=1186)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1186&action=view)
Adding output function to Bio.Graphics for shared use

This is based on the code from Bio.Graphics.GenomeDiagram.Diagram and would be
called from all the Bio.Graphics modules to output to a file/handle in any
supported file format, in a consistent manner.
This is done as a private function, as I do not want to expose this as a new
public API.

From bugzilla-daemon at portal.open-bio.org Mon Jan 5 19:18:06 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 5 Jan 2009 19:18:06 -0500
Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and write_to_string() are inefficient and don't check inputs
In-Reply-To:
Message-ID: <200901060018.n060I6eq011760@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2711

------- Comment #21 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-05 19:18 EST -------
(In reply to comment #17)
> I do not consider this bug completely fixed for multiple reasons of which my
> patch addressed some of these prior to the creation of the _write function. I
> do like where _write is heading as it is making cleaner and more
> understandable code.

I decided that since ReportLab used a cStringIO or StringIO handle internally
to implement its writeToString method, we might as well do the same as it
allows a great simplification to the GenomeDiagram write and write_to_string
methods (and we can get rid of _write too).

See revision 1.14 of Bio/Graphics/GenomeDiagram/Diagram.py
http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Graphics/GenomeDiagram/Diagram.py?cvsroot=biopython

I hope you'll agree that this is a further improvement (even if the dictionary
approach is still used internally).

My plan (see Bug 2718) is to move this code into a shared private function for
all of the Bio.Graphics modules to use.
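
The simplification described in comment #21 can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from revision 1.14: the class `SketchDiagram` and its `_draw_to_handle` method are hypothetical stand-ins for the real ReportLab rendering call, and `io.BytesIO` is used as the modern equivalent of a cStringIO/StringIO handle.

```python
import io


class SketchDiagram:
    """Hypothetical stand-in for a diagram object with two output methods."""

    def __init__(self, payload=b"%!PS-sketch"):
        # Bytes that the (imaginary) renderer would produce.
        self._payload = payload

    def _draw_to_handle(self, handle, output):
        # Assumption: the real code would dispatch on `output` to a
        # ReportLab renderer here; this sketch just writes fixed bytes.
        handle.write(self._payload)

    def write(self, filename_or_handle, output="PS"):
        """Write to a filename or to an already open binary handle."""
        if isinstance(filename_or_handle, str):
            with open(filename_or_handle, "wb") as handle:
                self._draw_to_handle(handle, output)
        else:
            self._draw_to_handle(filename_or_handle, output)

    def write_to_string(self, output="PS"):
        """Return the output as bytes by writing to an in-memory handle."""
        handle = io.BytesIO()
        self.write(handle, output)
        return handle.getvalue()
```

With this shape, write_to_string needs no rendering logic of its own, and each method keeps a single, predictable return type.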
From tiagoantao at gmail.com Mon Jan 5 19:48:12 2009
From: tiagoantao at gmail.com (Tiago Antão)
Date: Tue, 6 Jan 2009 00:48:12 +0000
Subject: [Biopython-dev] Structure and LDNe
Message-ID: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com>

Hi all,

Jason Eshleman (he subscribes to this list also) has made available
code to interact with Structure (a widely used application in
population genetics - the 2 papers related to it have around 3000
citations according to Google Scholar). We will try to convert his code
to the Bio.PopGen namespace, create documentation and test cases.
To this we add the existing LDNe code (mine). This all should be ready
in a reasonably fast time frame (I suppose before the next release).

The all important statistics part is still due, I am afraid (I don't
know if anybody has looked at the beta code on git). But at least this
LDNe and Structure code will be ready to go soon.

Tiago

From bugzilla-daemon at portal.open-bio.org Mon Jan 5 21:56:35 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 5 Jan 2009 21:56:35 -0500
Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and write_to_string() are inefficient and don't check inputs
In-Reply-To:
Message-ID: <200901060256.n062uZBF023086@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2711

------- Comment #22 from bsouthey at gmail.com 2009-01-05 21:56 EST -------
(In reply to comment #21)
> (In reply to comment #17)
> > I do not consider this bug completely fixed for multiple reasons of which my
> > patch addressed some of these prior to the creation of the _write function. I
> > do like where _write is heading as it is making cleaner and more
> > understandable code.
>
> I decided that since ReportLab used a cStringIO or StringIO handle internally
> to implement its writeToString method, we might as well do the same as it
> allows a great simplification to the GenomeDiagram write and write_to_string
> methods (and we can get rid of _write too).
>
> See revision 1.14 of Bio/Graphics/GenomeDiagram/Diagram.py
> http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Graphics/GenomeDiagram/Diagram.py?cvsroot=biopython
>
> I hope you'll agree that this is a further improvement (even if the dictionary
> approach is still used internally).
>
> My plan (see Bug 2718) is to move this code into a shared private function for
> all of the Bio.Graphics modules to use.
>

That is great!

Note that reportlab's drawToString first uses its getStringIO() and passes
that to drawToFile. I am not sure of the difference between getStringIO() and
StringIO(), but getStringIO() might be preferred.

Also, I would presume that checking for the filename would allow you to
combine the writing to a file and writing to a string into a single new
function to maintain backwards compatibility.
From rhythmbox-devel at maubp.freeserve.co.uk Tue Jan 6 05:01:34 2009 From: rhythmbox-devel at maubp.freeserve.co.uk (Peter) Date: Tue, 6 Jan 2009 10:01:34 +0000 Subject: [Biopython-dev] Structure and LDNe In-Reply-To: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com> References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com> Message-ID: <320fb6e00901060201s3830c972w4638f5eefcd42b6a@mail.gmail.com> On Tue, Jan 6, 2009 at 12:48 AM, Tiago Ant?o wrote: > Hi all, > > Jason Eshleman (he subscribes to this list also) has made available > code to interact with Structure (a widely used application in > population genetics - the 2 papers related to it have around 3000 > citations acording to Google scholar). We will try to convert his code > to the Bio.PopGen namespace, create documentation and test cases. > To this adds the exsiting LDNe code (mine). This all should be ready > in a reasonably fast time frame (I suppose before the next release). That sounds good :) > The all important statistics part is still due, I am afraid (I don't > know if anybody has looked at the beta code on git). But at least this > LDNe and Structure code will be ready to go soon. > > Tiago I haven't looked at any of your code on git - and I probably won't have any spare time till next week. But anyway, do you have the URL handy? Thanks Peter From bugzilla-daemon at portal.open-bio.org Tue Jan 6 07:30:39 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 6 Jan 2009 07:30:39 -0500 Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and write_to_string() are inefficient and don't check inputs In-Reply-To: Message-ID: <200901061230.n06CUds2006927@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2711 ------- Comment #23 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-06 07:30 EST ------- (In reply to comment #22) > That is great! 
>
> Note that reportlab's drawToString first uses its getStringIO() and passes
> that to drawToFile. I am not sure of the difference between getStringIO() and
> StringIO(), but getStringIO() might be preferred.

From going through the ReportLab code a week or two ago, it ends up using
cStringIO (or falling back on StringIO) internally.

> Also, I would presume that checking for the filename would allow you to
> combine the writing to a file and writing to a string into a single new
> function to maintain backwards compatibility.

You'd then have one method to write to a string, handle or filename. As I said
before, I'm not keen on this - having two very different return values (string
or nothing) depending on the arguments, with some special invocation needed to
request the string output (maybe None rather than a filename/handle?).

The status quo seems OK here, with a write method (to a handle or filename)
and a separate write_to_string method.

From tiagoantao at gmail.com Tue Jan 6 11:52:22 2009
From: tiagoantao at gmail.com (Tiago Antão)
Date: Tue, 6 Jan 2009 16:52:22 +0000
Subject: [Biopython-dev] Structure and LDNe
In-Reply-To: <320fb6e00901060201s3830c972w4638f5eefcd42b6a@mail.gmail.com>
References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com> <320fb6e00901060201s3830c972w4638f5eefcd42b6a@mail.gmail.com>
Message-ID: <6d941f120901060852r482baf16m6b8399959b3c1aaa@mail.gmail.com>

On Tue, Jan 6, 2009 at 10:01 AM, Peter wrote:
> I haven't looked at any of your code on git - and I probably won't
> have any spare time till next week. But anyway, do you have the URL
> handy?

I gave the code to Giovanni, so it's his URL:
http://github.com/dalloliogm/biopython---popgen/tree/master

The code on Stats is still in a version that will have to be changed.
It is probably only of interest to developers that might have direct
interest in the module.
For development purposes I will put the code there (I don't want to
commit to the main CVS branch - as it is a production branch - before
the code is in an acceptable format).

Tiago

From bsouthey at gmail.com Tue Jan 6 12:41:29 2009
From: bsouthey at gmail.com (Bruce Southey)
Date: Tue, 06 Jan 2009 11:41:29 -0600
Subject: [Biopython-dev] Structure and LDNe
In-Reply-To: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com>
References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com>
Message-ID: <496397C9.3030706@gmail.com>

Tiago Antão wrote:
> Hi all,
>
> Jason Eshleman (he subscribes to this list also) has made available
> code to interact with Structure (a widely used application in
> population genetics - the 2 papers related to it have around 3000
> citations according to Google Scholar). We will try to convert his code
> to the Bio.PopGen namespace, create documentation and test cases.
> To this we add the existing LDNe code (mine). This all should be ready
> in a reasonably fast time frame (I suppose before the next release).
>
> The all important statistics part is still due, I am afraid (I don't
> know if anybody has looked at the beta code on git). But at least this
> LDNe and Structure code will be ready to go soon.
>
> Tiago
>

Hi,
What are the licenses for LDNe and Structure?
Saying just 'free' is insufficient because it is not clear which
definition is being used.

Also, please ensure that none of the code included in Biopython is a
derivative of LDNe or Structure unless these have an explicit license
that is compatible with Biopython. For example, 'copying' an existing
function into Python would be considered a derivative.
Obviously reading a documented output is probably not considered a
derivative.

I prefer to be proactive with licenses so these don't bite back, as has
happened in some formerly open source projects or through the use of unclean
code sources. A current example of this is that the current release of scipy
0.7 has been significantly delayed due to some major effort to check various
functions that reference the Numerical Recipes book (which has an
incompatible license).

Anyhow, this sounds good!

Bruce

From tiagoantao at gmail.com Tue Jan 6 13:10:28 2009
From: tiagoantao at gmail.com (Tiago Antão)
Date: Tue, 6 Jan 2009 18:10:28 +0000
Subject: [Biopython-dev] Structure and LDNe
In-Reply-To: <496397C9.3030706@gmail.com>
References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com> <496397C9.3030706@gmail.com>
Message-ID: <6d941f120901061010n36281702gc073d9f4469d492c@mail.gmail.com>

On Tue, Jan 6, 2009 at 5:41 PM, Bruce Southey wrote:
> What are the licenses for LDNe and Structure?
> Saying just 'free' is insufficient because it is not clear which
> definition is being used.
>
> Also, please ensure that none of the code included in Biopython is a
> derivative of LDNe or Structure unless these have an explicit license
> that is compatible with Biopython. For example, 'copying' an existing
> function into Python would be considered a derivative. Obviously reading a
> documented output is probably not considered a derivative.

Regarding LDNe we have had this discussion in the past. I have some
updates/extra info:
1. They only make available a Windows/DOS version. But they will make
a Linux version available (compiled by me, I offered to do that).
Probably a Mac version also.
2. As I said before and as it is common in population genetics
(unfortunately), the software comes with no license at all; they
didn't even think that is an issue.
3. No code is remotely derived or adapted.
Regarding structure, the authors make the source available (a notch
better than LDNe) http://pritch.bsd.uchicago.edu/structure.html , but
again, they didn't bother to include license info. I am contacting
them in order to investigate this. I will report back as soon as I
have an answer.

This being said, structure support is way more important than LDNe.
The userbase of structure is quite big (just check the earlier factoid
on Google Scholar citations).

From dalloliogm at gmail.com Wed Jan 7 05:37:00 2009
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Wed, 7 Jan 2009 11:37:00 +0100
Subject: [Biopython-dev] Structure and LDNe
In-Reply-To: <6d941f120901060852r482baf16m6b8399959b3c1aaa@mail.gmail.com>
References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com> <320fb6e00901060201s3830c972w4638f5eefcd42b6a@mail.gmail.com> <6d941f120901060852r482baf16m6b8399959b3c1aaa@mail.gmail.com>
Message-ID: <5aa3b3570901070237s487a4307hb68fa69abc3cb23d@mail.gmail.com>

On Tue, Jan 6, 2009 at 5:52 PM, Tiago Antão wrote:
> On Tue, Jan 6, 2009 at 10:01 AM, Peter wrote:
>> I haven't looked at any of your code on git - and I probably won't
>> have any spare time till next week. But anyway, do you have the URL
>> handy?
>
> I gave the code to Giovanni, so it's his URL:
> http://github.com/dalloliogm/biopython---popgen/tree/master

Hi people,
if you want to upload the code there, please tell me and I will give
you write access.
However, the right way to do it should be that you create a fork of
the code on github, add your changes and work on it locally, and then
merge them back again into the original repository. I suppose that is
the standard way to use git.

> The code on Stats is still in a version that will have to be changed.
> It is probably only of interest to developers that might have direct
> interest in the module.
> For development purposes I will put the code there (I don't want to > commit to the main CVS branch - as it is a production branch - before > the code is in an acceptable format). > > Tiago > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > -- My blog on bioinformatics (now in English): http://bioinfoblog.it From tiagoantao at gmail.com Wed Jan 7 06:54:19 2009 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Wed, 7 Jan 2009 11:54:19 +0000 Subject: [Biopython-dev] Structure and LDNe In-Reply-To: <5aa3b3570901070237s487a4307hb68fa69abc3cb23d@mail.gmail.com> References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com> <320fb6e00901060201s3830c972w4638f5eefcd42b6a@mail.gmail.com> <6d941f120901060852r482baf16m6b8399959b3c1aaa@mail.gmail.com> <5aa3b3570901070237s487a4307hb68fa69abc3cb23d@mail.gmail.com> Message-ID: <6d941f120901070354o70b6c99ah37ffdb38a1af7554@mail.gmail.com> > However, the right way to do it should be that you create a fork of > the code on github, add your changes and work on it locally, and then > merge them back again in the original repository. I suppose that is > the standard way to use git. Considering that CVS has no development branch I think having git is very good. I would just recommend extreme care with changing existing code. When merging back into CVS, changes to existing code might not go in (especially if they change interfaces) or be delayed. Big _design_ changes will have to be discussed in advance. For my part, what I am including is just new LDNe code and helping Jason with the structure code. So I expect zero impact on existing code and no need for design changes. Tiago PS - I am travelling until Saturday, apologies in advance for delayed answers. 
From bugzilla-daemon at portal.open-bio.org Wed Jan 7 09:12:46 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 7 Jan 2009 09:12:46 -0500
Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and write_to_string() are inefficient and don't check inputs
In-Reply-To:
Message-ID: <200901071412.n07ECk1n012802@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2711

------- Comment #24 from lpritc at scri.sari.ac.uk 2009-01-07 09:12 EST -------
(In reply to comment #13)
> I can not check this as I am away from my system. As I recall, the Python code
> for accessing this library is provided with the standard install as there is a
> renderPM.py file. But that is just a wrapper to some C code found in the
> rl_addons directory. So it is a big no that renderPM is available unless you
> actually build the C sources or download the binaries (only valid for Windows).

That's not really a big deal, as those are the only two ways to get ReportLab,
from reportlab.org!

From the website (http://www.reportlab.org/downloads.html):

"""
We provide precompiled binaries for Windows, but not for any other platform.
Many Linux distributors and other UNIX-like OS vendors provide their own
binaries for download
"""

The installation procedure for me was to issue:

python setup.py install

at the command line while in the top directory of the source download, which
isn't any harder than installing Biopython itself. This installed ReportLab
2.2, including compilation of renderPM.

> According to the website
> http://www.reportlab.org/subversion.html
> "
> It will create subdirectories for reportlab, which is an importable
> python package, and rl_addons which contains the C extensions. The
> latter need building with the contained setup script, but can also be
> downloaded in pre-built form from our downloads page. They rarely
> change.
> "
>
> What did you actually install?
Reportlab 2.2, stable build as ReportLab_2_2.tgz, downloaded on December 15th
last year. From the checksum, it's the 11/9 build. I've just checked the SVN
trunk, and that also builds renderPM, on the same machine.

> In particular where was _renderPM built?

Initially, in [download location]/ReportLab_2_2/src/rl_addons/renderPM and the
library was installed to /usr/local/lib/python2.4/site-packages/_renderPM.so
by the setup script.

> Basically we need to document this as there appear to be different ways to
> install reportlab (may also be version or svn related).

I'm happy with this, but it's not exactly a complicated issue: either the
local Reportlab installation does or does not have renderPM; if it does not,
then raising an error before the user dedicates too much effort to something
that can't work seems at least polite. Also, providing pointers in the
documentation to where renderPM can be obtained (at time of last writing) is a
good idea.

IMO, given the straightforward installation procedure that corrects the issue -
which ought not to affect *nix users that do not run precompiled binaries,
anyway - I reckon that raising an error will be sufficient for most of the few
cases where renderPM is not installed.

L.
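
The "raise an error early, with a pointer to where renderPM can be obtained" behaviour argued for in comment #24 might look something like the sketch below. The exception class `MissingRenderPMError` and the function `load_renderpm` are hypothetical names for illustration, not Biopython or ReportLab API; the `importer` argument exists only so the behaviour can be exercised without ReportLab installed.

```python
import importlib


class MissingRenderPMError(ImportError):
    """Raised when bitmap output is requested but renderPM is unavailable."""


def load_renderpm(importer=importlib.import_module):
    """Return the renderPM module, or raise an error that says how to get it."""
    try:
        return importer("reportlab.graphics.renderPM")
    except ImportError as err:
        raise MissingRenderPMError(
            "Bitmap output needs ReportLab's renderPM module, which could not "
            "be imported. See reportlab.org for installation instructions."
        ) from err
```

Subclassing ImportError keeps existing `except ImportError` handlers working while giving callers a more specific message to show the user.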
From bugzilla-daemon at portal.open-bio.org Wed Jan 7 09:33:21 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 7 Jan 2009 09:33:21 -0500 Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and write_to_string() are inefficient and don't check inputs In-Reply-To: Message-ID: <200901071433.n07EXLSn014755@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2711 ------- Comment #25 from lpritc at scri.sari.ac.uk 2009-01-07 09:33 EST ------- (In reply to comment #17) > 1) I do not understand the need for the dictionary of modules 'formatdict' in > _write as it creates unnecessary inefficient code. The options need to be part > of the check for the type of output. The need is that input types are associated with alternative rendering backends. The distribution dictionary approach is highly-readable and readily extendable to accept, for example, lowercase variants of format names that map to the same backend - as in your point number 2. I also don't understand your efficiency argument. Firstly, this step is not AFAIAA a bottleneck, and hardly a priority for optimisation; secondly I do not believe that a distribution dictionary is less efficient than your suggestion. The dictionary achieves the same end in three lines of code, rather than ten for the elif. Also computationally, if the format name is 'TIF', your elif code will always have to cycle through all output format name tests (four conditionals, and an O(n) list search) in order to associate that format with renderPM. This is less efficient than a dictionary approach: retrieving values from dictionaries takes approximately constant time. Not that if we ran profile on the two approaches we'd see much of a difference, of course - this is not a speed-critical step. Also, and in my opinion, elifs are not as easy to maintain, or as readable, as distribution dictionaries. 
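
The dictionary approach defended above, extended to accept format names in any case, can be sketched as below. The format names are the ones listed in the thread; the renderer names are plain strings standing in for the ReportLab modules so the sketch runs without ReportLab, and `backend_for` is a hypothetical helper name.

```python
# Format name -> name of the ReportLab renderer that handles it.
# Strings are used instead of the real modules (an assumption made
# purely to keep this sketch self-contained).
FORMAT_BACKENDS = {
    "PS": "renderPS",
    "PDF": "renderPDF",
    "SVG": "renderSVG",
    "JPG": "renderPM",
    "BMP": "renderPM",
    "GIF": "renderPM",
    "PNG": "renderPM",
    "TIFF": "renderPM",
    "TIF": "renderPM",
}


def backend_for(format_name):
    """Look up the renderer for a format name, ignoring letter case."""
    try:
        return FORMAT_BACKENDS[format_name.upper()]
    except KeyError:
        supported = ", ".join(sorted(FORMAT_BACKENDS))
        raise ValueError(
            "Unsupported format %r (expected one of: %s)" % (format_name, supported)
        ) from None
```

A single dictionary lookup handles every format, including 'TIF', and normalising the key with upper() covers the lowercase variants raised in point 2 without adding more entries.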
> 2) There is no indication that the output for write and write_to_string only > accepts uppercase. Note the _write function states this but a user will not see > these. I do not understand why lowercase is unacceptable. It's not unacceptable - at least, not to me - I just didn't write it to accept lowercase, originally. I've no objection to adding lowercase variants of the format names to the distribution dictionary. > 3) The check for renderPM at start is really redundant because _write checks > for it (well sort of). It is also an unnecessary delay if renderPM is not used. It's not a big speed hit (or is there contradictory data? it's certainly not a speed worry for my work) and, if tested on import, needs only to be done once when GenomeDiagram is imported. > 4) There is no test for the presence of renderPM. The test function must check > for renderPM and should at least provide a warning if not present. Otherwise > this is a surprise to a user because not all options will be available. Raising an error, or at least a warning, is a good idea. I favour raising this error on first import. > 5) The installation documentation must also indicate that renderPM is optional > and also how to install the renderPM module. I'm still not convinced that this is all that big an issue: renderPM is part of the source ReportLab 2.2 distribution, and the instructions on reportlab.org are pretty clear. However, for those users who have pathological installations, a line pointing out that renderPM can be obtained via reportlab.org is a good idea. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Wed Jan 7 09:38:14 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 7 Jan 2009 09:38:14 -0500
Subject: [Biopython-dev] [Bug 2727] New: PDB.Bio: header should include CRYST1 information
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2727

           Summary: PDB.Bio: header should include CRYST1 information
           Product: Biopython
           Version: 1.49b
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: mok at bioxray.au.dk

The unit cell and spacegroup information should be available from PDBParser's
get_header() method.

From bugzilla-daemon at portal.open-bio.org Wed Jan 7 09:40:52 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 7 Jan 2009 09:40:52 -0500
Subject: [Biopython-dev] [Bug 2727] PDB.Bio: header should include CRYST1 information
In-Reply-To:
Message-ID: <200901071440.n07EeqsZ015513@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2727

------- Comment #1 from mok at bioxray.au.dk 2009-01-07 09:40 EST -------
Created an attachment (id=1188)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1188&action=view)
Patch for parse_pdb_header.py

Attached patch will add three keys to the header dictionary: cell, spacegroup
and cell_z, giving access to this data gleaned from the CRYST1 record of a PDB
file.
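
For readers unfamiliar with the CRYST1 record, here is a rough sketch of how the three values could be extracted. It is not the attached patch: the function name `parse_cryst1` and the shape of the returned values (a tuple of six floats for the cell, None for a blank Z field) are assumptions. The column positions follow the fixed-width layout of the PDB format (a, b, c in columns 7-33, the three angles in columns 34-54, the space group in columns 56-66 and Z in columns 67-70).

```python
def parse_cryst1(line):
    """Extract unit cell, space group and Z value from a CRYST1 record."""
    if not line.startswith("CRYST1"):
        raise ValueError("Not a CRYST1 record: %r" % line)
    # Zero-based slices for a, b, c (9 columns each), then
    # alpha, beta, gamma (7 columns each).
    fields = ((6, 15), (15, 24), (24, 33), (33, 40), (40, 47), (47, 54))
    cell = tuple(float(line[start:end]) for start, end in fields)
    spacegroup = line[55:66].strip()
    z_field = line[66:70].strip()
    cell_z = int(z_field) if z_field else None
    return {"cell": cell, "spacegroup": spacegroup, "cell_z": cell_z}


# Build an example record with the same fixed-width layout (made-up values).
example = "CRYST1%9.3f%9.3f%9.3f%7.2f%7.2f%7.2f %-11s%4d" % (
    52.0, 58.6, 61.9, 90.0, 90.0, 90.0, "P 21 21 21", 8)
header = parse_cryst1(example)
```

Slicing by column rather than splitting on whitespace matters here because space group symbols such as "P 21 21 21" contain spaces themselves.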
From bugzilla-daemon at portal.open-bio.org Wed Jan 7 10:10:12 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 7 Jan 2009 10:10:12 -0500
Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and write_to_string() are inefficient and don't check inputs
In-Reply-To:
Message-ID: <200901071510.n07FACPH017825@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2711

------- Comment #26 from bsouthey at gmail.com 2009-01-07 10:10 EST -------
(In reply to comment #24)

I had Reportlab version 2.1 installed but once I upgraded to version 2.2 I got
renderPM built. So anyone using reportlab version 2.2 will be happy; others
will not be!

So please ensure that Reportlab version 2.2 (released 11 Sep 2008) or higher
is required. Otherwise you must check for renderPM because most people
probably have an old version around with renderPM and most distributions
(OpenSUSE seems to be an exception if you look in the right place) don't have
the 2.2 version yet.

From bugzilla-daemon at portal.open-bio.org Wed Jan 7 10:52:52 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 7 Jan 2009 10:52:52 -0500
Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and write_to_string() are inefficient and don't check inputs
In-Reply-To:
Message-ID: <200901071552.n07FqqcX021811@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2711

------- Comment #27 from bsouthey at gmail.com 2009-01-07 10:52 EST -------
(In reply to comment #25)

This is mainly a reportlab issue (API and version problem) and, as Peter said,
a style issue.
So the only remaining issue is a unit test that at least checks for the
presence of renderPM, because of versions of reportlab older than 2.2.

From jae at lmi.net Thu Jan 8 17:24:21 2009
From: jae at lmi.net (Jason Eshleman)
Date: Thu, 08 Jan 2009 14:24:21 -0800
Subject: [Biopython-dev] Structure and LDNe
In-Reply-To: <496397C9.3030706@gmail.com>
References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com> <496397C9.3030706@gmail.com>
Message-ID: <6.1.2.0.2.20090108141534.0323a5f0@mail.lmi.net>

Greetings all,

Presently, the code I have for dealing with STRUCTURE is similar to the code
for interacting with Clustal in that it does not modify any of the STRUCTURE
source code but merely launches the compiled executable. Initially, I have
used my code in place of their Java front end as it allows for more control
of the run-time variables for successive runs with varying run parameters.

At some point, I'd like to get it to interface more directly with the
STRUCTURE code to be able to pipe results directly to python for parsing
rather than working with the STRUCTURE text output, but that's a ways off
still.

-Jason

At 09:41 AM 1/6/2009, Bruce Southey wrote:
>Tiago Antão wrote:
>>Hi all,
>>
>>Jason Eshleman (he subscribes to this list also) has made available
>>code to interact with Structure (a widely used application in
>>population genetics - the 2 papers related to it have around 3000
>>citations according to Google Scholar). We will try to convert his code
>>to the Bio.PopGen namespace, create documentation and test cases.
>>To this we add the existing LDNe code (mine). This all should be ready
>>in a reasonably fast time frame (I suppose before the next release).
>> >>The all important statistics part is still due, I am afraid (I don't >>know if anybody has looked at the beta code on git). But at least this >>LDNe and Structure code will be ready to go soon. >> >>Tiago >>_______________________________________________ >>Biopython-dev mailing list >>Biopython-dev at lists.open-bio.org >>http://lists.open-bio.org/mailman/listinfo/biopython-dev >> >Hi, >What are the licenses for LDNe and Structure? >Saying just 'free' is insufficient because it is not clear in which >definition is being used. > >Also, please ensure that none of the code that is included into Biopython >is not a deriviative of LDNe and Structure unless these have explicit >license that is compatible with Biopython. For example, 'copying' an >existing function into Python would be considered a derivative. Obviously >reading a documented output is probably not considered a derivative. > >I prefer to be proactive with licenses so these don't bite back like has >happened in some formally open sources projects or use of unclean code >sources. A current example of this is that the current release of scipy >0.7 has been significantly delayed due to some major effort to check >various functions that reference the Numerical Recipes book (which has an >incompatible license). > >Anyhow, this sounds good! 
>
>Bruce

From bugzilla-daemon at portal.open-bio.org Fri Jan 9 07:50:37 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 9 Jan 2009 07:50:37 -0500
Subject: [Biopython-dev] [Bug 2727] PDB.Bio: header should include CRYST1 information
In-Reply-To:
Message-ID: <200901091250.n09Cob1q021245@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2727

------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-09 07:50 EST -------
Hopefully Bio.PDB's owner/maintainer Thomas Hamelryck can comment on this.

In the meantime, the code style seems to fit fine with the rest of parse_pdb_header.py, which is good. However, you have not updated the parse_pdb_header function's docstring to include the new keys. Furthermore, it would be nice to have the docstring describe the meaning of the cell, z-cell and spacegroup entries you have introduced. I'm also curious about the default values and their meanings.

Peter
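To illustrate the kind of docstring being asked for, here is a minimal sketch of parsing a CRYST1 record. It is not the patch attached to the bug: the function name parse_cryst1, the layout of the returned dictionary and the choice of None for a missing Z value are all assumptions made for this example. Only the column positions come from the published PDB file format.

```python
def parse_cryst1(line):
    """Parse one PDB CRYST1 record into a dictionary.

    Keys (names taken from the comment above, layout assumed here):

      cell       - tuple (a, b, c, alpha, beta, gamma); cell edge
                   lengths in Angstroms, cell angles in degrees
      spacegroup - space group symbol as a string, e.g. "P 21 21 21"
      z-cell     - the Z value as an int, or None if the field is blank
    """
    if not line.startswith("CRYST1"):
        raise ValueError("not a CRYST1 record")
    line = line.ljust(70)  # pad short records so the slices are safe
    # Fixed-width fields, given as 0-based (start, end) slices.
    fields = [(6, 15), (15, 24), (24, 33), (33, 40), (40, 47), (47, 54)]
    cell = tuple(float(line[start:end]) for start, end in fields)
    z_text = line[66:70].strip()
    return {
        "cell": cell,
        "spacegroup": line[55:66].strip(),
        "z-cell": int(z_text) if z_text else None,
    }


if __name__ == "__main__":
    # Build a record with the same fixed-width layout instead of typing
    # the column padding by hand.
    record = "CRYST1%9.3f%9.3f%9.3f%7.2f%7.2f%7.2f %-11s%4d" % (
        52.0, 58.6, 61.9, 90.0, 90.0, 90.0, "P 21 21 21", 8)
    print(parse_cryst1(record))
```

Documenting the units and the fallback value in the docstring is what answers the question about defaults; whether None is the right default is a design choice for the patch author.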

From rhythmbox-devel at maubp.freeserve.co.uk Fri Jan 9 07:55:13 2009
From: rhythmbox-devel at maubp.freeserve.co.uk (Peter)
Date: Fri, 9 Jan 2009 12:55:13 +0000
Subject: [Biopython-dev] Structure and LDNe
In-Reply-To: <6d941f120901070354o70b6c99ah37ffdb38a1af7554@mail.gmail.com>
References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com> <320fb6e00901060201s3830c972w4638f5eefcd42b6a@mail.gmail.com> <6d941f120901060852r482baf16m6b8399959b3c1aaa@mail.gmail.com> <5aa3b3570901070237s487a4307hb68fa69abc3cb23d@mail.gmail.com> <6d941f120901070354o70b6c99ah37ffdb38a1af7554@mail.gmail.com>
Message-ID: <320fb6e00901090455y334ecebdo55cd7c1a718ab499@mail.gmail.com>

On Wed, Jan 7, 2009 at 11:54 AM, Tiago Antão wrote:
>
> Considering that CVS has no development branch I think having git is
> very good. I would just recommend extreme care with changing existing
> code. When merging back into CVS, changes to existing code might not
> go in (especially if they change interfaces) or be delayed.
>

If there is a strong interest in having experimental branches in the official Biopython repository, we could discuss that as an option. However, I would prefer that we move from CVS to SVN first before actually doing this, in order to keep the migration as simple as possible.

Peter

From biopython at maubp.freeserve.co.uk Fri Jan 9 07:59:00 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 9 Jan 2009 12:59:00 +0000
Subject: [Biopython-dev] Structure and LDNe
In-Reply-To: <6.1.2.0.2.20090108141534.0323a5f0@mail.lmi.net>
References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com> <496397C9.3030706@gmail.com> <6.1.2.0.2.20090108141534.0323a5f0@mail.lmi.net>
Message-ID: <320fb6e00901090459x52976449gddcc4483699e0f56@mail.gmail.com>

On Thu, Jan 8, 2009 at 10:24 PM, Jason Eshleman wrote:
> Greetings all,
>
> Presently, the code I have for dealing with STRUCTURE is similar to the code
> for interacting with Clustal, in that it does not modify any of the STRUCTURE
> source code but merely initiates the compiled executable.

Biopython has code for interacting with lots of command line tools, and this neatly avoids any copyright/licence questions about being a derived work.

> Initially, I have used my code in place of their Java front end as it allows
> for more control of the run-time variables for successive runs with varying
> run parameters. At some point, I'd like to get it to interface more
> directly with the STRUCTURE code to be able to pipe results directly to
> python for parsing rather than working with the STRUCTURE text output but
> that's a ways off still.

I'm not quite clear what you have in mind, but this would probably need a little more thought from the legal perspective. If STRUCTURE provides an API with header files you can compile against, that should be OK (but I am not a lawyer). Note that doing this within Biopython would then mean adding another build time dependency, which would need to be justified in terms of the benefits it brings.
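
(For readers unfamiliar with the wrapper approach discussed above, a minimal sketch follows. It is not Jason's code and not an existing Biopython module: the helper names run_tool and parse_key_value_lines are hypothetical, and the "key = value" report format is made up, since the real STRUCTURE options and output layout are not given in this thread. A Python one-liner stands in for the external executable so the example runs anywhere.)

```python
import subprocess
import sys


def run_tool(args):
    """Run an external program and return its standard output as text."""
    completed = subprocess.run(
        args,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return completed.stdout


def parse_key_value_lines(text):
    """Parse 'key = value' lines (a made-up format) into a dict of strings."""
    results = {}
    for line in text.splitlines():
        if "=" in line:
            key, value = line.split("=", 1)
            results[key.strip()] = value.strip()
    return results


if __name__ == "__main__":
    # Stand-in for a real tool: a Python one-liner that prints a report.
    fake_tool = [sys.executable, "-c", "print('runs = 3'); print('K = 2')"]
    print(parse_key_value_lines(run_tool(fake_tool)))
```

The wrapper only starts the compiled program and reads what it prints, so no code from the wrapped tool is copied or linked, which is the property the licence discussion relies on.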

Peter

From bsouthey at gmail.com Fri Jan 9 09:46:15 2009
From: bsouthey at gmail.com (Bruce Southey)
Date: Fri, 09 Jan 2009 08:46:15 -0600
Subject: [Biopython-dev] Structure and LDNe
In-Reply-To: <320fb6e00901090455y334ecebdo55cd7c1a718ab499@mail.gmail.com>
References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com> <320fb6e00901060201s3830c972w4638f5eefcd42b6a@mail.gmail.com> <6d941f120901060852r482baf16m6b8399959b3c1aaa@mail.gmail.com> <5aa3b3570901070237s487a4307hb68fa69abc3cb23d@mail.gmail.com> <6d941f120901070354o70b6c99ah37ffdb38a1af7554@mail.gmail.com> <320fb6e00901090455y334ecebdo55cd7c1a718ab499@mail.gmail.com>
Message-ID: <49676337.7050504@gmail.com>

Peter wrote:
> On Wed, Jan 7, 2009 at 11:54 AM, Tiago Antão wrote:
>
>> Considering that CVS has no development branch I think having git is
>> very good. I would just recommend extreme care with changing existing
>> code. When merging back into CVS, changes to existing code might not
>> go in (especially if they change interfaces) or be delayed.
>>
>>
>
> If there is a strong interest in having experimental branches in the
> official Biopython repository, we could discuss that as an option.
> Although I would prefer we get moved from CVS to SVN first before
> actually doing this, in order to keep the migration as simple as
> possible.
>
> Peter
>

I agree that it is essential to move from CVS before doing this, but that does not prevent any discussion. So I'll start a thread.

Bruce

From bugzilla-daemon at portal.open-bio.org Fri Jan 9 10:59:40 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 9 Jan 2009 10:59:40 -0500
Subject: [Biopython-dev] [Bug 2729] New: Importing Bio.SeqUtils before importing pylab gives a "Bus Error"
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2729

           Summary: Importing Bio.SeqUtils before importing pylab gives a "Bus Error"
           Product: Biopython
           Version: 1.49
          Platform: Macintosh
        OS/Version: Mac OS
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: stephan_schiffels at mac.com

I use the newest CVS version of Biopython (2009 Jan 09) and matplotlib 0.90.0.

The following two lines crash:

import Bio.SeqUtils
import pylab

I narrowed the problem down to lines 122 through 125 in Bio/SeqUtils/__init__.py. Commenting out these four lines solves the bug for me, since I don't use the graphics functions in the SeqUtils package.

Best,
Stephan

From bsouthey at gmail.com Fri Jan 9 11:18:26 2009
From: bsouthey at gmail.com (Bruce Southey)
Date: Fri, 09 Jan 2009 10:18:26 -0600
Subject: [Biopython-dev] Developmental and experimental branches
In-Reply-To: <320fb6e00901090455y334ecebdo55cd7c1a718ab499@mail.gmail.com>
References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com> <320fb6e00901060201s3830c972w4638f5eefcd42b6a@mail.gmail.com> <6d941f120901060852r482baf16m6b8399959b3c1aaa@mail.gmail.com> <5aa3b3570901070237s487a4307hb68fa69abc3cb23d@mail.gmail.com> <6d941f120901070354o70b6c99ah37ffdb38a1af7554@mail.gmail.com> <320fb6e00901090455y334ecebdo55cd7c1a718ab499@mail.gmail.com>
Message-ID: <496778D2.1050801@gmail.com>

Hi,
In a previous thread (and indicated in others) it was suggested that perhaps Biopython needs some type of development or experimental branch. So this thread is orientated to provide some discussion on this and considers that Biopython has moved to SVN. I think it is a very relevant discussion because Biopython needs an effective approach to mainly handle new code but also handle significant rewrites of older code.

The most important question is do you support creating developmental and experimental branches or not?

However, I do not think that this has a yes or no answer, and I am not concerned about the question at the present time. Rather, I am concerned about the burden placed on the maintainers (especially Peter and Michiel), the expression of the developers' needs and how this impacts the community. I am rather neutral on it (probably because I have not contributed any major code to Biopython) but I would like to ensure that the discussion leads to positive changes.

I find Biopython interesting and special for various reasons. There is a solid core of functions that are common to many aspects of bioinformatics. But it also contains very specialized code that has a much smaller audience.
Consequently, certain parts get considerable exposure and other parts get limited or no exposure. This means that it may be necessary to release beta versions in order to get the necessary exposure, as I assume that code has had sufficient development to be released in the first place.

Creating developmental and experimental branches is one way to get this exposure, but perhaps branches are not necessary. An alternative approach is creating specialized projects within Biopython that can be used for development and testing. For example, SciPy provides SciKits, which hold related code that is typically special purpose or is released under a different license than scipy/numpy. This replaced the sandboxes that existed in prior versions of numpy and scipy. But a recent problem that arose in numpy was how to get code from such a location into numpy: creating an experimental section in the main distribution was suggested, but that met some strong resistance.

Therefore, I see the following issues that need to be addressed regardless of the approach taken:

0) Must be easy for project maintenance and release, as this must not create an extra burden to Biopython!

1) Ensure adequate testing is performed, especially to get it out to the appropriate audience and to correct the code and APIs. I consider this rather important because I tend to follow a type of user experience design (http://en.wikipedia.org/wiki/User_experience_design) and software prototyping (http://en.wikipedia.org/wiki/Software_prototyping) for software development.

2) Stabilization of APIs for backwards compatibility, as we don't want to change these with each Biopython release.

3) Adequate test coverage, especially across platforms and different software versions. For example, Windows paths and older software versions can cause problems on other people's machines but not yours.

4) Some type of code review, even if it is just to ensure a consistent format (like spaces versus tabs) or compatibility across Python versions and platforms.

5) If developmental or experimental branches are used, then how does the code move into the main distribution, and how are these branches created and destroyed?

Please add other issues.

I would appreciate these issues being addressed when appropriate.

Regards
Bruce

Peter wrote:
> On Wed, Jan 7, 2009 at 11:54 AM, Tiago Antão wrote:
>
>> Considering that CVS has no development branch I think having git is
>> very good. I would just recommend extreme care with changing existing
>> code. When merging back into CVS, changes to existing code might not
>> go in (especially if they change interfaces) or be delayed.
>>
>>
>
> If there is a strong interest in having experimental branches in the
> official Biopython repository, we could discuss that as an option.
> Although I would prefer we get moved from CVS to SVN first before
> actually doing this, in order to keep the migration as simple as
> possible.
>
> Peter
>

From bugzilla-daemon at portal.open-bio.org Fri Jan 9 11:27:08 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 9 Jan 2009 11:27:08 -0500
Subject: [Biopython-dev] [Bug 2729] Importing Bio.SeqUtils before importing pylab gives a "Bus Error"
In-Reply-To:
Message-ID: <200901091627.n09GR88l003529@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2729

------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-09 11:27 EST -------
i.e. these lines?

try:
    from Tkinter import *
except ImportError:
    pass

What happens with just "import Tkinter" on your machine?

Are you using the default Apple installed copy of python?

I can see why this might cause trouble if Tkinter does some initialisation at import time. Could you include the actual crash/traceback error please?

Note I see no crash on my MacOS machine (not sure which version of pylab) which has Tkinter. Nor do I see a crash on one of my Linux machines (again, not sure which pylab) which does NOT have Tkinter.

From bugzilla-daemon at portal.open-bio.org Fri Jan 9 11:33:59 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 9 Jan 2009 11:33:59 -0500
Subject: [Biopython-dev] [Bug 2729] Importing Bio.SeqUtils before importing pylab gives a "Bus Error"
In-Reply-To:
Message-ID: <200901091633.n09GXxDS004117@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2729

------- Comment #2 from mdehoon at ims.u-tokyo.ac.jp 2009-01-09 11:33 EST -------
(In reply to comment #0)
> I use the newest cvs version of biopython (2009 Jan 09) and matplotlib 0.90.0
> The following two lines crash:
>
> import Bio.SeqUtils
> import pylab
>

What do you mean by crash? Also, do you get the same problem with the latest matplotlib (0.98.4 I believe)?

If

try:
    from Tkinter import *
except ImportError:
    pass
import pylab

crashes, then this is not a Biopython bug.

From bugzilla-daemon at portal.open-bio.org Fri Jan 9 11:45:52 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 9 Jan 2009 11:45:52 -0500
Subject: [Biopython-dev] [Bug 2729] Importing Bio.SeqUtils before importing pylab gives a "Bus Error"
In-Reply-To:
Message-ID: <200901091645.n09GjqFV004905@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2729

------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-09 11:45 EST -------
Created an attachment (id=1189)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1189&action=view)
Patch to Bio/SeqUtils/__init__.py to move the Tkinter imports

This patch moves the Tkinter import back into the xGC_skew function as suggested by the old comments in the code, and uses an explicit import list instead of "import *". For the history of this bit of code, see the deleted file Bio/sequtils.py in CVS.

I think this is a worthwhile little bit of clean up - but it probably won't have any effect on Stephan's issue with Tkinter/pylab.

From bugzilla-daemon at portal.open-bio.org Fri Jan 9 11:53:23 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 9 Jan 2009 11:53:23 -0500
Subject: [Biopython-dev] [Bug 2729] Importing Bio.SeqUtils before importing pylab gives a "Bus Error"
In-Reply-To:
Message-ID: <200901091653.n09GrN6W005481@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2729

------- Comment #4 from stephan_schiffels at mac.com 2009-01-09 11:53 EST -------
Hi,
importing Tkinter works fine. Only calling import pylab after it crashes... (no traceback... just "bus error").

Here is the shell output:

mac14:~ stschiff$ python
Python 2.5 (r25:51908, Apr 19 2007, 16:49:06)
[GCC 4.0.1 (Apple Computer, Inc.
build 5250)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import Tkinter
>>> import pylab
Bus error
mac14:~ stschiff$

The weirdest thing is that importing in the other order works fine:

mac14:~ stschiff$ python
Python 2.5 (r25:51908, Apr 19 2007, 16:49:06)
[GCC 4.0.1 (Apple Computer, Inc. build 5250)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pylab
>>> import Tkinter
>>>

The same holds for first importing pylab and then Bio.SeqUtils...

I don't know, it could be that this is just a pathological case on my specific setup. It's still weird though, since matplotlib uses GTK on X11 on my machine, not Tkinter... I don't get it.

Maybe this is not a Biopython bug after all... sorry, and thanks anyway for your concern.

Stephan

(In reply to comment #1)
> i.e. these lines?
>
> try:
>     from Tkinter import *
> except ImportError:
>     pass
>
> What happens with just "import Tkinter" on your machine?
>
> Are you using the default Apple installed copy of python?
>
> I can see why this might cause trouble if Tkinter does some initialisation at
> import time. Could you include the actual crash/traceback error please?
>
> Note I see no crash on my MacOS machine (not sure which version of pylab) which
> has Tkinter. Nor do I see a crash on one of my linux machines (again, not sure
> which pylab) which does NOT have TKinter.
>

From bugzilla-daemon at portal.open-bio.org Fri Jan 9 12:10:10 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 9 Jan 2009 12:10:10 -0500
Subject: [Biopython-dev] [Bug 2729] Importing Bio.SeqUtils before importing pylab gives a "Bus Error"
In-Reply-To:
Message-ID: <200901091710.n09HAA5c006886@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2729

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |INVALID

------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-09 12:10 EST -------
(In reply to comment #4)
> Hi,
> importing Tkinter works fine. Only calling import pylab after it crashes...
> (no traceback... just "bus error").

You could try going to Applications, Utilities, Console on your Mac to look for any error log associated with the bus error.

> Here is the shell-output:
>
> mac14:~ stschiff$ python
> Python 2.5 (r25:51908, Apr 19 2007, 16:49:06)
> [GCC 4.0.1 (Apple Computer, Inc. build 5250)] on darwin
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import Tkinter
> >>> import pylab
> Bus error
> mac14:~ stschiff$

OK - that does seem to confirm that it's a bug with pylab, and therefore isn't Biopython's fault. I'm going to close this bug. I would suggest you update your installation of pylab, and if it still goes wrong, file a bug with pylab.

Thanks anyway,

Peter
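
(A rough sketch of the clean up described in comment #3, i.e. importing the GUI toolkit inside the one function that needs it and naming the imported classes explicitly. This is not the attached patch: gc_skew_values and show_gc_skew are hypothetical names, the drawing is deliberately crude, and the code assumes Python 3, where the module is spelled tkinter rather than Tkinter.)

```python
def gc_skew_values(seq, window=100):
    """Return (G - C) / (G + C) for each window of the sequence."""
    values = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window].upper()
        g = chunk.count("G")
        c = chunk.count("C")
        # Avoid dividing by zero for windows without any G or C.
        values.append((g - c) / (g + c) if (g + c) else 0.0)
    return values


def show_gc_skew(seq, window=100, width=400, height=200):
    """Draw the skew values in a Tk window (hypothetical helper)."""
    # Deferred, explicit import: tkinter is loaded only when plotting
    # is actually requested, so importing this module never touches Tk.
    try:
        from tkinter import Tk, Canvas
    except ImportError:
        raise RuntimeError("tkinter is needed for show_gc_skew()")

    values = gc_skew_values(seq, window)
    root = Tk()
    canvas = Canvas(root, width=width, height=height)
    canvas.pack()
    step = width / max(len(values), 1)
    mid = height / 2
    for i, value in enumerate(values):
        x = (i + 0.5) * step
        # Skew lies in [-1, 1]; positive values are drawn upwards.
        canvas.create_line(x, mid, x, mid - value * mid)
    root.mainloop()
```

With this layout, users who only call the numerical function never load the GUI toolkit, and a missing toolkit is reported at the point of use instead of being silently swallowed at import time.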

From bugzilla-daemon at portal.open-bio.org Fri Jan 9 12:10:52 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 9 Jan 2009 12:10:52 -0500
Subject: [Biopython-dev] [Bug 2729] Importing Bio.SeqUtils before importing pylab gives a "Bus Error"
In-Reply-To:
Message-ID: <200901091710.n09HAqh1006971@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2729

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1189 is|0                           |1
           obsolete|                            |

------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-09 12:10 EST -------
(From update of attachment 1189)
This didn't turn out to be related to Bug 2729 after all. However, I've checked it in anyway.

From dalloliogm at gmail.com Fri Jan 9 12:17:53 2009
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Fri, 9 Jan 2009 18:17:53 +0100
Subject: [Biopython-dev] Developmental and experimental branches
In-Reply-To: <496778D2.1050801@gmail.com>
References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com> <320fb6e00901060201s3830c972w4638f5eefcd42b6a@mail.gmail.com> <6d941f120901060852r482baf16m6b8399959b3c1aaa@mail.gmail.com> <5aa3b3570901070237s487a4307hb68fa69abc3cb23d@mail.gmail.com> <6d941f120901070354o70b6c99ah37ffdb38a1af7554@mail.gmail.com> <320fb6e00901090455y334ecebdo55cd7c1a718ab499@mail.gmail.com> <496778D2.1050801@gmail.com>
Message-ID: <5aa3b3570901090917m7c82fb17hb4c93235702b120b@mail.gmail.com>

On Fri, Jan 9, 2009 at 5:18 PM, Bruce Southey wrote:
> Hi,
> In a previous thread (and indicated in others) it was suggested that perhaps
> Biopython needs some type of development or experimental branch.
> So this thread is orientated to provide some discussion on this and considers
> that Biopython has moved to SVN.

Maybe you can consider the approach at the basis of git, in which every developer works on their own personal branch, and the owner of the 'official branch' can decide whether or not to accept the changes contributed by the individual branches.

If you want to play a bit with it, you can use my repository at github:
- http://github.com/dalloliogm/biopython---popgen/commits/master
and then create a fork from it.

I am sorry that you will have to create an account on github, but I don't know of any other free hosting service for git repositories.

Git also has other advantages over svn, like working locally (which is done by creating a local branch internally) and being faster (this is what they say). Well, I am not a git guru, but I can suggest some good videos, like this one:
- http://excess.org/article/2008/07/ogre-git-tutorial/

> I think it is a very relevant discussion because
> Biopython needs an effective approach to mainly handle new code but also
> handle significant rewrites of older code.
>
> The most important question is do you support creating developmental and
> experimental branches or not?
>
> Please add other issues.
>
> I would appreciate these issues being addressed when appropriate.
>
> Regards
> Bruce
>
> Peter wrote:
>>
>> On Wed, Jan 7, 2009 at 11:54 AM, Tiago Antão wrote:
>>
>>>
>>> Considering that CVS has no development branch I think having git is
>>> very good. I would just recommend extreme care with changing existing
>>> code. When merging back into CVS, changes to existing code might not
>>> go in (especially if they change interfaces) or be delayed.
>>>
>>>
>>
>> If there is a strong interest in having experimental branches in the
>> official Biopython repository, we could discuss that as an option.
>> Although I would prefer we get moved from CVS to SVN first before
>> actually doing this, in order to keep the migration as simple as
>> possible.
>>
>> Peter
>>
>

--
My blog on bioinformatics (now in English): http://bioinfoblog.it

From biopython at maubp.freeserve.co.uk Fri Jan 9 12:28:06 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 9 Jan 2009 17:28:06 +0000
Subject: [Biopython-dev] Developmental and experimental branches
In-Reply-To: <5aa3b3570901090917m7c82fb17hb4c93235702b120b@mail.gmail.com>
References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com> <320fb6e00901060201s3830c972w4638f5eefcd42b6a@mail.gmail.com> <6d941f120901060852r482baf16m6b8399959b3c1aaa@mail.gmail.com> <5aa3b3570901070237s487a4307hb68fa69abc3cb23d@mail.gmail.com> <6d941f120901070354o70b6c99ah37ffdb38a1af7554@mail.gmail.com> <320fb6e00901090455y334ecebdo55cd7c1a718ab499@mail.gmail.com> <496778D2.1050801@gmail.com> <5aa3b3570901090917m7c82fb17hb4c93235702b120b@mail.gmail.com>
Message-ID: <320fb6e00901090928u662023d6rd6f2d82c5fbb7365@mail.gmail.com>

On Fri, Jan 9, 2009 at 5:17 PM, Giovanni Marco Dall'Olio wrote:
> On Fri, Jan 9, 2009 at 5:18 PM, Bruce Southey wrote:
>> Hi,
>> In a previous thread (and indicated in others) it was suggested that perhaps
>> Biopython needs some type of development or experimental branch. So this
>> thread is orientated to provide some discussion on this and considers that
>> Biopython has moved to SVN.
>
> Maybe you can consider the approach at the basis of git, in which
> every developer works on their own personal branch, and the owner of the
> 'official branch' can decide whether or not to accept the changes
> contributed by the individual branches.

In some ways this describes the current situation but without the software: The CVS/SVN repository is the master official branch which we (as a group) try and keep pretty stable. When working on new modules, individual developers or contributors have hacked away on their own machines (perhaps using a local repository - I tended to just save versioned snapshots of work in progress), and commit things to the master once it was sufficiently stable to be approved. For self contained modules, this works OK - although using something like git would be a bit more formalised and automated, and allow this kind of "work in progress" to be done openly.

Peter

From dalloliogm at gmail.com Fri Jan 9 12:43:26 2009
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Fri, 9 Jan 2009 18:43:26 +0100
Subject: [Biopython-dev] Developmental and experimental branches
In-Reply-To: <320fb6e00901090928u662023d6rd6f2d82c5fbb7365@mail.gmail.com>
References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com> <320fb6e00901060201s3830c972w4638f5eefcd42b6a@mail.gmail.com> <6d941f120901060852r482baf16m6b8399959b3c1aaa@mail.gmail.com> <5aa3b3570901070237s487a4307hb68fa69abc3cb23d@mail.gmail.com> <6d941f120901070354o70b6c99ah37ffdb38a1af7554@mail.gmail.com> <320fb6e00901090455y334ecebdo55cd7c1a718ab499@mail.gmail.com> <496778D2.1050801@gmail.com> <5aa3b3570901090917m7c82fb17hb4c93235702b120b@mail.gmail.com> <320fb6e00901090928u662023d6rd6f2d82c5fbb7365@mail.gmail.com>
Message-ID: <5aa3b3570901090943t37b14a4gfd7228eb747f2866@mail.gmail.com>

On Fri, Jan 9, 2009 at 6:28 PM, Peter wrote:
> On Fri, Jan 9, 2009 at 5:17 PM, Giovanni Marco Dall'Olio
> wrote:
>> On Fri, Jan 9, 2009 at 5:18 PM, Bruce Southey wrote:
>>> Hi,
>>> In a previous thread
>>> (and indicated in others) it was suggested that perhaps
>>> Biopython needs some type of development or experimental branch. So this
>>> thread is orientated to provide some discussion on this and considers that
>>> Biopython has moved to SVN.
>>
>> Maybe you can consider the approach at the basis of git, in which
>> every developer works on their own personal branch, and the owner of the
>> 'official branch' can decide whether or not to accept the changes
>> contributed by the individual branches.
>
> In some ways this describes the current situation but without the
> software: The CVS/SVN repository is the master official branch which
> we (as a group) try and keep pretty stable. When working on new
> modules, individual developers or contributors have hacked away on
> their own machines (perhaps using a local repository - I tended to
> just save versioned snapshots of work in progress), and commit things
> to the master once it was sufficiently stable to be approved. For
> self contained modules, this works OK - although using something like
> git would be a bit more formalised and automated, and allow this kind
> of "work in progress" to be done openly.

Just a note: since I was trying to simplify the concept, I said something which is not quite correct. In git, you are not required to have a central repository. Everyone has their own personal branch and there is no such thing as an 'official branch', unless it is defined by convention.

For example, look at this graph:
- http://github.com/blog/39-say-hello-to-the-network-graph-visualizer
On March 6th someone created a fork to work on MySQL support, which has not been merged into the official branch yet. There are many other forks, too: which one is the official one? The answer is none of them, but if the authors wanted, they could have created a repository, decided that it was the official one, and kept it up to date.
>
> Peter
>

--
My blog on bioinformatics (now in English): http://bioinfoblog.it

From biopython at maubp.freeserve.co.uk Fri Jan 9 12:49:43 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 9 Jan 2009 17:49:43 +0000
Subject: [Biopython-dev] Developmental and experimental branches
In-Reply-To: <5aa3b3570901090943t37b14a4gfd7228eb747f2866@mail.gmail.com>
References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com> <320fb6e00901060201s3830c972w4638f5eefcd42b6a@mail.gmail.com> <6d941f120901060852r482baf16m6b8399959b3c1aaa@mail.gmail.com> <5aa3b3570901070237s487a4307hb68fa69abc3cb23d@mail.gmail.com> <6d941f120901070354o70b6c99ah37ffdb38a1af7554@mail.gmail.com> <320fb6e00901090455y334ecebdo55cd7c1a718ab499@mail.gmail.com> <496778D2.1050801@gmail.com> <5aa3b3570901090917m7c82fb17hb4c93235702b120b@mail.gmail.com> <320fb6e00901090928u662023d6rd6f2d82c5fbb7365@mail.gmail.com> <5aa3b3570901090943t37b14a4gfd7228eb747f2866@mail.gmail.com>
Message-ID: <320fb6e00901090949v695333ak2615e9c217bc1387@mail.gmail.com>

> Just a note: since I was trying to simplify the concept, I said
> something which is not quite correct.
> In git, you are not required to have a central repository. Everyone has
> their own personal branch and there is no such thing as an 'official
> branch', unless it is defined by convention.

If we did want to adopt a git style approach, I do think we need an official branch which would be used for the releases and installers hosted on biopython.org, and this branch would be managed in much the same way as we do now with CVS/SVN. I think this would be essential for avoiding confusion for the typical end user.

Peter

From bartek at rezolwenta.eu.org Fri Jan 9 13:17:09 2009
From: bartek at rezolwenta.eu.org (Bartek Wilczynski)
Date: Fri, 9 Jan 2009 19:17:09 +0100
Subject: [Biopython-dev] Developmental and experimental branches
In-Reply-To: <320fb6e00901090928u662023d6rd6f2d82c5fbb7365@mail.gmail.com>
References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com> <320fb6e00901060201s3830c972w4638f5eefcd42b6a@mail.gmail.com> <6d941f120901060852r482baf16m6b8399959b3c1aaa@mail.gmail.com> <5aa3b3570901070237s487a4307hb68fa69abc3cb23d@mail.gmail.com> <6d941f120901070354o70b6c99ah37ffdb38a1af7554@mail.gmail.com> <320fb6e00901090455y334ecebdo55cd7c1a718ab499@mail.gmail.com> <496778D2.1050801@gmail.com> <5aa3b3570901090917m7c82fb17hb4c93235702b120b@mail.gmail.com> <320fb6e00901090928u662023d6rd6f2d82c5fbb7365@mail.gmail.com>
Message-ID: <8b34ec180901091017o172e4acemf91c0a46a736bdb1@mail.gmail.com>

On Fri, Jan 9, 2009 at 6:28 PM, Peter wrote:
> On Fri, Jan 9, 2009 at 5:17 PM, Giovanni Marco Dall'Olio
> wrote:
>> On Fri, Jan 9, 2009 at 5:18 PM, Bruce Southey wrote:
>>> Hi,
>>> In a previous thread (and indicated in others) it was suggested that perhaps
>>> Biopython needs some type of development or experimental branch. So this
>>> thread is orientated to provide some discussion on this and considers that
>>> Biopython has moved to SVN.
>>
>> Maybe you can consider the approach at the basis of git, in which
>> every developer works on their own personal branch, and the owner of the
>> 'official branch' can decide whether or not to accept the changes
>> contributed by the individual branches.
>
> In some ways this describes the current situation but without the
> software: The CVS/SVN repository is the master official branch which
> we (as a group) try and keep pretty stable.
When working on new > modules, individual developers or contributors have hacked away on > their own machines (perhaps using a local repository - I tended to > just save versioned snapshots of work in progress), and commit things > to the master once it was sufficiently stable to be approved. For > self contained modules, this works OK - although using something like > git would be a bit more formalised and automated, and allow this kind > of "work in progress" to be done openly. > It can be viewed this way, but the point here is that making this change to the process of development might decrease the amount of work required to join the development. In particular, if you think about adding a new library to biopython, the most sensible way to do it is to branch and then stabilize. I've recently experienced (with Bio.Motif) that it might be tedious even for a very simple task. Also, with a distributed version control system, it is very easy for a small team of people to collaborate on a branch before merging back to the main repository. In the current mode this would be really difficult. Another benefit is that you do not lose the history of changes made "on a branch". As for github, it is currently used by the BioRuby project hosted on open-bio.org. We can try to talk to them and ask about their experiences. I'm not personally involved in it in any way, but it seems that they've basically moved the main branch to github and update the cvs repository only occasionally. I think that for biopython, if we decided to use distributed version control, it would be better to use bazaar+launchpad instead of git+github, for the following reasons: - it's completely free, as opposed to <300Mb of free account on github - launchpad could make the transition very easy. They provide a service of importing existing open source projects to launchpad: https://help.launchpad.net/VcsImports They convert the trunk to bazaar for us and set it up to update from the cvs every 6-12 hours.
It would be easy then to see whether we like it like this or not - bazaar is specifically aimed to be more user friendly than git, and allows developers to keep working in a familiar environment when moving from cvs or svn. I think this is important since git itself is really different from cvs, and if we switch to anything else, everybody needs to learn the tool. - they use openID, which makes it simpler for people to join (even though you still need another account) - both bazaar and launchpad are developed in python, so they're more python oriented (while github is developed in ruby, so a better choice for bioruby). More on comparing these two possibilities (from the bazaar developers' non-objective point of view): http://bazaar-vcs.org/BzrVsGit These are my 2 cents on the choice of tools for development, but I have to admit that I'm not sure whether it is needed for biopython now. I'm very open to discussion. -- Bartek Wilczynski ================== Postdoctoral fellow EMBL, Furlong group Meyerhoffstrasse 1, 69012 Heidelberg, Germany tel: +49 6221 387 8433 From chapmanb at 50mail.com Fri Jan 9 17:51:55 2009 From: chapmanb at 50mail.com (Brad Chapman) Date: Fri, 9 Jan 2009 17:51:55 -0500 Subject: [Biopython-dev] Developmental and experimental branches In-Reply-To: <8b34ec180901091017o172e4acemf91c0a46a736bdb1@mail.gmail.com> References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com> <320fb6e00901060201s3830c972w4638f5eefcd42b6a@mail.gmail.com> <6d941f120901060852r482baf16m6b8399959b3c1aaa@mail.gmail.com> <5aa3b3570901070237s487a4307hb68fa69abc3cb23d@mail.gmail.com> <6d941f120901070354o70b6c99ah37ffdb38a1af7554@mail.gmail.com> <320fb6e00901090455y334ecebdo55cd7c1a718ab499@mail.gmail.com> <496778D2.1050801@gmail.com> <5aa3b3570901090917m7c82fb17hb4c93235702b120b@mail.gmail.com> <320fb6e00901090928u662023d6rd6f2d82c5fbb7365@mail.gmail.com> <8b34ec180901091017o172e4acemf91c0a46a736bdb1@mail.gmail.com> Message-ID: 
<20090109225155.GF4135@sobchak.mgh.harvard.edu> Hi all; In terms of the coding of experimental modules, Giovanni is taking an excellent approach. While they are under development, we can utilize one of the many free hosting platforms to develop it as a separate project in the Bio namespace. This allows interested users to get the code, contribute, and test. Once an interface and functionality is hammered out and they begin to stabilize, then it's a good time to package it up and roll it into Biopython provided the ol' mailing list consensus is happy. This is a nice development model as it leverages the community, but only rolls code into the main release when it stabilizes reasonable well. Peter has taken a really good development methodology -- creating a rock solid stable core of modules, and actively deprecating or fixing those that fall out of line. My only suggestion would be to have a Biopython wiki page for the experimental modules as they are under development. Something simple with a description of the goals and a link to the source code would help the majority of people who don't follow the mailing list find and contribute to these. Brad From biopython at maubp.freeserve.co.uk Sat Jan 10 09:46:13 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sat, 10 Jan 2009 14:46:13 +0000 Subject: [Biopython-dev] Developmental and experimental branches In-Reply-To: <20090109225155.GF4135@sobchak.mgh.harvard.edu> References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com> <6d941f120901060852r482baf16m6b8399959b3c1aaa@mail.gmail.com> <5aa3b3570901070237s487a4307hb68fa69abc3cb23d@mail.gmail.com> <6d941f120901070354o70b6c99ah37ffdb38a1af7554@mail.gmail.com> <320fb6e00901090455y334ecebdo55cd7c1a718ab499@mail.gmail.com> <496778D2.1050801@gmail.com> <5aa3b3570901090917m7c82fb17hb4c93235702b120b@mail.gmail.com> <320fb6e00901090928u662023d6rd6f2d82c5fbb7365@mail.gmail.com> <8b34ec180901091017o172e4acemf91c0a46a736bdb1@mail.gmail.com> <20090109225155.GF4135@sobchak.mgh.harvard.edu> Message-ID: <320fb6e00901100646y6132686ap8a928404dd1e36c3@mail.gmail.com> On Fri, Jan 9, 2009 at 10:51 PM, Brad Chapman wrote: > Hi all; > In terms of the coding of experimental modules, Giovanni is taking > an excellent approach. While they are under development, we can > utilize one of the many free hosting platforms to develop it as a > separate project in the Bio namespace. This allows interested users > to get the code, contribute, and test. Once an interface and > functionality is hammered out and they begin to stabilize, then it's > a good time to package it up and roll it into Biopython provided the > ol' mailing list consensus is happy.
This does describe recent large additions fairly well - such as Bio.SeqIO, Bio.AlignIO, Bio.Entrez, Bio.PopGen and most recently Bio.Graphics.GenomeDiagram (which is a little different in that it was previously publicly available as a separate module). Modifications to existing bits of code (for example I have some proposals for Seq, SeqRecord and Alignment objects as enhancement bugs) don't really work in the same way - but also by their nature require more discussion because they can indirectly affect a lot of code. > This is a nice development model as it leverages the community, but > only rolls code into the main release when it stabilizes reasonable > well. Peter has taken a really good development methodology -- > creating a rock solid stable core of modules, and actively deprecating > or fixing those that fall out of line. I really don't deserve all the credit here - Michiel has also been a strong proponent for this "spring cleaning" as needed, for example how our NCBI online bits have been rationalised, refocusing on Bio.Entrez as the preferred module. > My only suggestion would be to have a Biopython wiki page for the > experimental modules as they are under development. Something simple > with a description of the goals and a link to the source code would > help the majority of people who don't follow the mailing list find > and contribute to these. Using the wiki in this way is a nice idea. Tiago - do you fancy adding a PopGen page describing the additions you're working on? As a bonus, once these do get into the main repository, you may find the wiki text will be a useful basis for extending the documentation.
Peter From mjldehoon at yahoo.com Sat Jan 10 11:30:07 2009 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Sat, 10 Jan 2009 08:30:07 -0800 (PST) Subject: [Biopython-dev] Rethinking Biopython's testing framework In-Reply-To: <5aa3b3570812301034r3633ebe0k937e33c731e69ccd@mail.gmail.com> Message-ID: <126502.76038.qm@web62403.mail.re1.yahoo.com> > > We could discuss a modification to run_tests.py so > > that if there is no expected output file > > output/test_XXX for test_XXX.py we just run > > test_XXX.py and check its return value (I think > > Michiel had previously > > suggested something like this). > > I think this should be done inside the test itself. > All the tests should return only a boolean value (passed or > not) and a description of the error. > The tests that make use of an expected output file, they > should open it and do the comparison by themselves, not in > run_tests.py. Sounds attractive, but there is one complication for print-and-compare tests. The code that does the print-and-compare is not trivial (see run_tests.py). It is possible to have the print-and-compare code in a helper module, which is then imported by each print-and-compare test. Still, while currently the print-and-compare tests have the advantage of being simple, they will get more complicated if we require the print-and-compare to be part of each test. Does anybody have an opinion on this? It's either doing the print-and-compare as part of each print-and-compare test script, or requiring a test_suite() function in each unittest-based test script, and assuming that a test script is a unittest-based test script if it contains a test_suite() function. 
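To make the second option concrete, here is a minimal sketch of how a runner could tell the two kinds of test script apart. It is not the actual run_tests.py code: the helper name `run_test_module`, the assumption that a print-and-compare script exposes a `main()` function, and the two toy modules are all hypothetical and only serve the illustration.

```python
import io
import types
import unittest
from contextlib import redirect_stdout


def run_test_module(module, expected_output=None):
    """Run one test module and return True if it passed.

    Hypothetical helper: a module that defines test_suite() is treated
    as unittest-based; any other module is treated as a
    print-and-compare test whose printed output must equal
    expected_output.
    """
    if hasattr(module, "test_suite"):
        # unittest-based script: build the suite and check the result.
        suite = module.test_suite()
        result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
        return result.wasSuccessful()
    # print-and-compare script: capture everything main() prints
    # (assumes the script exposes a main() function).
    captured = io.StringIO()
    with redirect_stdout(captured):
        module.main()
    return captured.getvalue() == expected_output


class _ExampleCase(unittest.TestCase):
    def test_addition(self):
        self.assertEqual(1 + 1, 2)


# Toy stand-ins for two test scripts (made-up names).
unittest_style = types.ModuleType("test_unittest_style")
unittest_style.test_suite = (
    lambda: unittest.defaultTestLoader.loadTestsFromTestCase(_ExampleCase)
)

print_style = types.ModuleType("test_print_style")
print_style.main = lambda: print("ACGT")
```

With this layout the print-and-compare scripts stay as simple as they are now, and only the runner needs to know about the comparison step.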
--Michiel From tiagoantao at gmail.com Sat Jan 10 11:48:03 2009 From: tiagoantao at gmail.com (Tiago Antão) Date: Sat, 10 Jan 2009 16:48:03 +0000 Subject: [Biopython-dev] Developmental and experimental branches In-Reply-To: <496778D2.1050801@gmail.com> References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com> <320fb6e00901060201s3830c972w4638f5eefcd42b6a@mail.gmail.com> <6d941f120901060852r482baf16m6b8399959b3c1aaa@mail.gmail.com> <5aa3b3570901070237s487a4307hb68fa69abc3cb23d@mail.gmail.com> <6d941f120901070354o70b6c99ah37ffdb38a1af7554@mail.gmail.com> <320fb6e00901090455y334ecebdo55cd7c1a718ab499@mail.gmail.com> <496778D2.1050801@gmail.com> Message-ID: <6d941f120901100848h6e186022o241b928ea2566993@mail.gmail.com> This whole discussion is very interesting. In fact, whatever the conclusions are, I think they should be labeled "official policy" and put on the Wiki. The biggest problem that I've faced is that, whenever I am doing something, I don't know the level of acceptability with other developers. I tend to put everything to discussion before I commit it, and whenever I say something I might get completely different answers from time to time and from different people. The end result is that I hold off committing things because of issues that are raised in an ad-hoc fashion. There should be a page clarifying things like: 1. Are contributions that have a small target audience accepted? 2. Use of foreign libraries (e.g., SciPy)? 3. Code management policies. Branches? Adding new code? Breaking interfaces? 4. New developers 5. Legal issues 6. Interop with non-free software 7. Code quality strategies. Code review? Testing? 8. Multiplatform issues I am not asking for a big document. But as questions arise, just discuss them, arrive at a decision and document it. It becomes tiring having to answer the same questions about code that you want to submit over and over again, with different issues every time.
One can live with decisions that are disliked, but it is much more difficult to live when the playing ground is moving all the time. On Fri, Jan 9, 2009 at 4:18 PM, Bruce Southey wrote: > Hi, > In a previous thread (and indicated in others) it was suggested that perhaps > Biopython needs some type of development or experimental branch. So this > thread is orientated to provide some discussion on this and considers that > Biopython has moved to SVN. I think it is very relevant discussion because > Biopython needs an effective approach to mainly handle new code but also > handle significant rewrites of older code. > > The most important question is do you support creating developmental and > experimental branches or not? > > However, I do not think that this is a yes or no answer and I am not > concerned about the question at the present time. Rather I am concerned > about the burden placed on the maintainers (especially Peter and Michiel), > the expression of the developer needs and how this impact the community. I > am rather neutral on it (probably because I have not contributed any major > code to Biopython) but I would like to ensure that the discussion leads to > positive changes. > > I find Biopython interesting and special for various reasons. There is a > solid core of functions that are common to many aspects of bioinformatics. > But it also contains very specialized code that has a much smaller audience. > Consequently certain parts get considerable exposure and other parts get > limited or no exposure. This means that it may be necessary to release beta > versions in order to get the necessary exposure as I assume that code has > had sufficient development to be released in the first place. Creating > developmental and experimental branches is one way to get this exposure but > perhaps branches are not necessary. > > An alternative approach is creating specialized projects within Biopython > that can be used for development and testing. 
For example, Scipy provides > SciKits that are related code that is typically special purpose or is > released under a different license than scipy/numpy. This replaced the > sandboxes that existed in prior versions of numpy and scipy. But a recent > problem arose in numpy was how to get code from such a location into numpy > by creating a experimental section in the main distribution but that met > some strong resistance. > > Therefore, I see the following issues that need to be addressed regardless > of the approach taken: > > 0) Must be easy for project maintenance and release as this must not create > an extra burden to Biopython! > 1) Ensure adequate testing is performed especially to get it out to the > appropriate audience and to correct the code and APIs. I consider this > rather important because I tend to follow a type of user experience design > (http://en.wikipedia.org/wiki/User_experience_design) and software > prototyping (http://en.wikipedia.org/wiki/Software_prototyping) for software > development. > 2) Stabilization of APIs for backwards compatibility as we don't want to > change these with each Biopython release. > 3) Adequate test coverage especially across platforms and different software > versions. For example Windows paths and older software versions can cause > problems on other peoples machines but not yours. > 4) Some type of code review even if it is just to ensure a consistent format > (like spaces versus tabs) or compatibility across Python versions and > platforms. > 5) If developmental or experimental branch are used then how does the code > move into the main distribution and how are these branches created and > destroyed. > > Please add other issues. > > I would appreciate these issues being addressed when appropriate. > > Regards > Bruce > > Peter wrote: >> >> On Wed, Jan 7, 2009 at 11:54 AM, Tiago Antão wrote: >> >>> >>> Considering that CVS has no development branch I think having git is >>> very good.
I would just recommend extreme care with changing existing >>> code. When merging back into CVS, changes to existing code might not >>> go in (especially if they change interfaces) or be delayed. >>> >>> >> >> If there is a strong interest in having experimental branches in the >> official Biopython repository, we could discuss that as an option. >> Although I would prefer we get moved from CVS to SVN first before >> actually doing this, in order to keep the migration as simple as >> possible. >> >> Peter >> >> _______________________________________________ >> Biopython-dev mailing list >> Biopython-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biopython-dev >> > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > -- "Systems can remain irrational far longer than you or I can survive" - Freely adapted from John Maynard Keynes From tiagoantao at gmail.com Sat Jan 10 11:52:44 2009 From: tiagoantao at gmail.com (Tiago Antão) Date: Sat, 10 Jan 2009 16:52:44 +0000 Subject: [Biopython-dev] Developmental and experimental branches In-Reply-To: <320fb6e00901100646y6132686ap8a928404dd1e36c3@mail.gmail.com> References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com> <5aa3b3570901070237s487a4307hb68fa69abc3cb23d@mail.gmail.com> <6d941f120901070354o70b6c99ah37ffdb38a1af7554@mail.gmail.com> <320fb6e00901090455y334ecebdo55cd7c1a718ab499@mail.gmail.com> <496778D2.1050801@gmail.com> <5aa3b3570901090917m7c82fb17hb4c93235702b120b@mail.gmail.com> <320fb6e00901090928u662023d6rd6f2d82c5fbb7365@mail.gmail.com> <8b34ec180901091017o172e4acemf91c0a46a736bdb1@mail.gmail.com> <20090109225155.GF4135@sobchak.mgh.harvard.edu> <320fb6e00901100646y6132686ap8a928404dd1e36c3@mail.gmail.com> Message-ID: <6d941f120901100852g47b10e9ar214cf2ad2b206f6@mail.gmail.com> On Sat, Jan 10, 2009 at 2:46 PM, Peter wrote: > Using
the wiki in this way is a nice idea. Tiago - do you fancy > adding a PopGen page describing the additions you're working on? As a > bonus, once these do get into the main repository, you may find the > wiki text will be a useful basis for extending the documentation. Where do you want me to link the page on the Wiki? From biopython at maubp.freeserve.co.uk Sat Jan 10 12:03:05 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sat, 10 Jan 2009 17:03:05 +0000 Subject: [Biopython-dev] Developmental and experimental branches In-Reply-To: <6d941f120901100852g47b10e9ar214cf2ad2b206f6@mail.gmail.com> References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com> <6d941f120901070354o70b6c99ah37ffdb38a1af7554@mail.gmail.com> <320fb6e00901090455y334ecebdo55cd7c1a718ab499@mail.gmail.com> <496778D2.1050801@gmail.com> <5aa3b3570901090917m7c82fb17hb4c93235702b120b@mail.gmail.com> <320fb6e00901090928u662023d6rd6f2d82c5fbb7365@mail.gmail.com> <8b34ec180901091017o172e4acemf91c0a46a736bdb1@mail.gmail.com> <20090109225155.GF4135@sobchak.mgh.harvard.edu> <320fb6e00901100646y6132686ap8a928404dd1e36c3@mail.gmail.com> <6d941f120901100852g47b10e9ar214cf2ad2b206f6@mail.gmail.com> Message-ID: <320fb6e00901100903v1aa0180bsd1ca5335f7da1f7f@mail.gmail.com> On Sat, Jan 10, 2009 at 4:52 PM, Tiago Antão wrote: > On Sat, Jan 10, 2009 at 2:46 PM, Peter wrote: >> Using the wiki in this way is a nice idea. Tiago - do you fancy >> adding a PopGen page describing the additions you're working on? As a >> bonus, once these do get into the main repository, you may find the >> wiki text will be a useful basis for extending the documentation. > > Where do you want me to link the page on the Wiki?
How about having two pages: http://biopython.org/wiki/PopGen - documentation on the code in the current official release, - linked to from the main doc page http://biopython.org/wiki/PopGen_dev - discussion and links to your branch etc, - linked to from the above PopGen page This would be consistent with how I did the Bio.SeqIO pages, http://biopython.org/wiki/SeqIO http://biopython.org/wiki/SeqIO_dev If you think you have a better idea, feel free to make suggestions. Peter From peter at maubp.freeserve.co.uk Sat Jan 10 12:46:38 2009 From: peter at maubp.freeserve.co.uk (Peter) Date: Sat, 10 Jan 2009 17:46:38 +0000 Subject: [Biopython-dev] Developmental policies Message-ID: <320fb6e00901100946g62e26170o7e112f3b2f357e76@mail.gmail.com> On Sat, Jan 10, 2009 at 4:48 PM, Tiago Antão wrote: > This whole discussion is very interesting. In fact, whatever are the > conclusions I think they should be labeled "official policy" and put on > the Wiki. That sounds good. > The biggest problem that I've faced is that, whenever I am doing > something, I don't know the level of acceptability with other > developers. I tend to put everything to discussion before I commit it > and whenever I say something I might get completely different answers > from time to time and from different people. The end result is that I > defer from committing things because of issues that are raised in an > ad-hoc fashion. Asking before doing things is in general a good plan. Sadly not everyone will be free to respond at any one time - but I agree with you that having more of the de facto policy written out explicitly would help. > There should be a page clarifying things like: > 1. Are contributions that have a small target audience accepted? Historically yes this has happened - although my impression is that the bar was perhaps set too low. I would say some things were accepted without sufficient documentation and tests.
The problem with small interest modules is that if the original developer moves on, in the absence of any apparent users, the module gets abandoned. This seems to explain several of the smaller modules we've deprecated in the last couple of years. On the other hand, some things will start with a small target audience that will grow. If I was confident that the developer concerned would stick around for several years and was prepared to deal with documentation, unit tests and bug fixes then I would be much happier about including something, even if it might have a relatively small target audience initially. > 2. Use of foreign libraries (e.g., SciPy)? I think the current stance has been to try and minimise 3rd party dependencies, other than the special case of python wrappers for command line tools. This makes it much easier for beginners to install and use Biopython, and lowering the barrier to entry is a good thing. There are practical points here too. In general, 3rd party dependencies can be a pain (e.g. our Martel parsers broke when mxTextTools changed their API between 2.0 and 3.0). Similarly they can restrict the distribution of Biopython (e.g. NumPy isn't yet available on Windows for Python 2.6), and will also be a potential road block for moving to Python 3. As another example, a small part of Bio.PDB uses flex in a parser, and again this makes building and distributing it a real pain (so much so, that it's been commented out by default). However, run time only dependencies (like pure python libraries and command line tools) are not such an issue for packaging/distribution. e.g. ReportLab (used in Bio.Graphics only). If SciPy were to be used by part of Bio.PopGen, and this didn't affect packaging/distribution then this might be OK. > 3. Code management policies. Branches? Adding new code? Breaking interfaces? Biopython has historically worked from a stable trunk.
As a consequence we try and avoid breaking interfaces, instead adopting a gradual deprecation of an old interface when adding a new interface, or adding enhancements in a backwards compatible manner. > 4. New developers I think there is something written down about this already... > 5. Legal issues Try and avoid them? What did you mean in particular? > 6. Interop with non-free software This is linked to the legal issues question. Many of the tools we link to like BLAST aren't open source, but are "free" as in cost. I don't think we have any examples of non-free software. > 7. Code quality strategies. Code review? Testing? Code review: For new code in a specialist area, it can be difficult to get a qualified second opinion on the approach, but existing developers can at least comment on the coding style. For existing code, my impression is module owners have been trusted to make changes to "their" code without review - and generally speaking this has worked out OK. Although if anyone spots someone making a change they disagree with, then please do raise it. I would hope any larger change had some discussion beforehand - possibly via enhancement entries on bugzilla. Testing: I'd strongly resist adding any new module without an accompanying test, and wish this had been a firm policy from day one. > 8. Multiplatform issues Ideally everything should be cross platform (like python itself). There are exceptions to this - in particular some 3rd party tools are not cross platform. I personally use and test on Windows, Linux and Mac - and I believe Michiel does too. > I am not saying a big document. But as questions arise, just discuss > them, arrive at a decision and document them. It becomes tiring having > to answer the same questions about code that you want to submit over > and over again and with different issues every time. > One can live with decisions that are disliked, but it is much more > difficult to live when the playing ground is moving all the time.
I'm sorry if you've had that feeling. However, circumstances change. As I recall when you first asked about using SciPy as a dependency, Biopython was still using Numeric instead of Numpy - so using SciPy had to wait until after that transition. Now that we have moved to NumPy, I think you have a much stronger case. Peter From tiagoantao at gmail.com Sat Jan 10 13:31:05 2009 From: tiagoantao at gmail.com (Tiago Antão) Date: Sat, 10 Jan 2009 18:31:05 +0000 Subject: [Biopython-dev] Developmental policies In-Reply-To: <320fb6e00901100946g62e26170o7e112f3b2f357e76@mail.gmail.com> References: <320fb6e00901100946g62e26170o7e112f3b2f357e76@mail.gmail.com> Message-ID: <6d941f120901101031w22d9730dy87353cf22646d6fc@mail.gmail.com> > mxTextTools changed their API between 2.0 and 3.0). Similarly they > can restrict the distribution of Biopython (e.g. NumPy isn't yet > available on Windows for Python 2.6), and will also be a potential > road block for moving to Python 3. As another example, a small part By the way, another issue that would be interesting to address is deprecation of older Python versions and Python 3. It would help just to have a clear stance on what the current feeling about this is. It seems to be a recurring question. >> 5. Legal issues > > Try and avoid them? What did you mean in particular? In my opinion something should be said about this. Actually I think (suggest) it is essentially a matter of taking Bruce's comments (e.g. one cannot have derived works of non-free software) and writing them down on a wiki page. Just the things a potential contributor would have to be aware of on the legal front. > Testing: > I'd strongly resist adding any new module without an accompanying > test, and wish this had been a firm policy from day one. People should also be encouraged to test (in as much as possible) in at least Win/Linux/Mac. Of course, for some people it will be difficult as access to all platforms is not always possible for everybody.
But at least encouragement should be made... > I'm sorry if you've had that feeling. However, circumstances change. > As I recall when you first asked about using SciPy as a dependency, > Biopython was still using Numeric instead of Numpy - so using SciPy > had to wait until after that transition. Now that we have moved to > NumPy, I think you have a much stronger case. Boss, don't say sorry, I think everybody would agree that you make a most fantastic effort. Regarding circumstances: when circumstances change, one would amend the documents. Again, my point is not in favour of this or that policy, only that a barebones policy should be documented, so that people know what the basic rules are. This will allow for realistic expectations with regard to code being accepted or not in the stable distribution. From peter at maubp.freeserve.co.uk Sat Jan 10 15:10:27 2009 From: peter at maubp.freeserve.co.uk (Peter) Date: Sat, 10 Jan 2009 20:10:27 +0000 Subject: [Biopython-dev] Developmental policies In-Reply-To: <6d941f120901101031w22d9730dy87353cf22646d6fc@mail.gmail.com> References: <320fb6e00901100946g62e26170o7e112f3b2f357e76@mail.gmail.com> <6d941f120901101031w22d9730dy87353cf22646d6fc@mail.gmail.com> Message-ID: <320fb6e00901101210k5e715beer240aa9338aa7ce2f@mail.gmail.com> On Sat, Jan 10, 2009 at 6:31 PM, Tiago Antão wrote: > By the way, another issue that would be interesting to address is > deprecation of older Python versions and Python 3. Like just having a > clear stance on what is the current feeling about this. It seems to be > a recurring question. Regarding older versions of python, we have stated that Biopython 1.49 should work on Python 2.3 to 2.6, and we expect to do the same for Biopython 1.50. Thereafter, we will probably drop support for Python 2.3 (unless anyone has a strong need for it and makes their voice heard).
See the mailing list archive and the corresponding new postings: http://news.open-bio.org/news/2008/11/biopython-and-python-26-and-python-23/ http://news.open-bio.org/news/2008/11/biopython-release-149/ Regarding Python 3, one hold up will be neither ReportLab nor NumPy have a clear plan for Python 3 - or at least that is my impression. However, even ignoring those parts of Biopython which use NumPy (e.g. Bio.PDB and Bio.Cluster) and Bio.Graphics (the only use of ReportLab), we have a lot of useful code. In the short term we should be aiming to have everything run under Python 2.6 in warnings mode, as a step towards eventual Python 3 support. Beyond that, I think that it is likely we'll want to use bytes rather than (unicode) strings in Python 3 for the Seq object, but have not given this much thought. >>> 5. Legal issues >> >> Try and avoid them? What did you mean in particular? > > In my opinion something should be said about this. Actually I think > (suggest) it is essencially a matter of mainly taking Bruce' s > comments (e.g. one cannot have derived works of non-free software) and > write them down on a wiki page. Just things potential contributor > would have to be aware of on a legal front. I see what you mean. Perhaps I am naive in thinking this should be common knowledge amongst potential contributors. >> Testing: >> I'd strongly resist adding any new module without an accompanying >> test, and wish this had been a firm policy from day one. > > People should also be encouraged to test (in as much as possible) in > at least Win/Linux/Mac. Of course, for some people it will be > difficult as access to all platforms is not always possible for > everybody. But at least encouragement should be made... Also tests which require additional setup are a pain. The BioSQL tests are an example of this, where it is unavoidable - but any situation like this reduces the number of people/machines where that test will get checked. 
Michiel has stressed this kind of thing as a concern in the past (as I recall). Peter From bugzilla-daemon at portal.open-bio.org Mon Jan 12 09:31:22 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 12 Jan 2009 09:31:22 -0500 Subject: [Biopython-dev] [Bug 2731] New: Adding .upper() and .lower() methods to the Seq object Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2731 Summary: Adding .upper() and .lower() methods to the Seq object Product: Biopython Version: Not Applicable Platform: All OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk BugsThisDependsOn: 2532 OtherBugsDependingO 2351 nThis: As part of making the Seq object more string like (Bug 2351), it would be nice to support the .upper() and .lower() methods. Doing this elegantly will require different case versions of the alphabets (see Bug 2532), perhaps by adding (private) upper and lower methods to the Alphabet object itself. Alternatively, we can handle this without adding new Alphabets by mapping the fixed case IUPAC alphabets to case-less generic alphabets. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Mon Jan 12 09:31:25 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 12 Jan 2009 09:31:25 -0500 Subject: [Biopython-dev] [Bug 2532] Using IUPAC alphabets in mixed case Seq objects In-Reply-To: Message-ID: <200901121431.n0CEVPFK010376@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2532 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- OtherBugsDependingO| |2731 nThis| | -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Jan 12 09:31:30 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 12 Jan 2009 09:31:30 -0500 Subject: [Biopython-dev] [Bug 2351] Make Seq more like a string, even subclass string? In-Reply-To: Message-ID: <200901121431.n0CEVUDG010399@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2351 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- BugsThisDependsOn| |2731 -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bsouthey at gmail.com Mon Jan 12 12:03:45 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Mon, 12 Jan 2009 11:03:45 -0600 Subject: [Biopython-dev] Developmental policies In-Reply-To: <320fb6e00901101210k5e715beer240aa9338aa7ce2f@mail.gmail.com> References: <320fb6e00901100946g62e26170o7e112f3b2f357e76@mail.gmail.com> <6d941f120901101031w22d9730dy87353cf22646d6fc@mail.gmail.com> <320fb6e00901101210k5e715beer240aa9338aa7ce2f@mail.gmail.com> Message-ID: <496B77F1.9060207@gmail.com> Peter wrote: > On Sat, Jan 10, 2009 at 6:31 PM, Tiago Ant?o wrote: > >> By the way, another issue that would be interesting to address is >> deprecation of older Python versions and Python 3. Like just having a >> clear stance on what is the current feeling about this. It seems to be >> a recurring question. >> > > Regarding older versions of python, we have stated that Biopython 1.49 > should work on Python 2.3 to 2.6, and we expect to do the same for > Biopython 1.50. Thereafter, we will probably drop support for Python > 2.3 (unless anyone has a strong need for it and makes their voice > heard). See the mailing list archive and the corresponding new > postings: > http://news.open-bio.org/news/2008/11/biopython-and-python-26-and-python-23/ > http://news.open-bio.org/news/2008/11/biopython-release-149/ > > Regarding Python 3, one hold up will be neither ReportLab nor NumPy > have a clear plan for Python 3 - or at least that is my impression. > There has been limited information on the numpy list regarding Python 3 but there has been some investigation on this (http://www.scipy.org/Python3k). 
I did ask about Python 3 last year in the thread titled 'Report from SciPy' and Robert Kern's response should be at: http://www.mail-archive.com/numpy-discussion at scipy.org/msg12101.html Also, this thread has the future aims of numpy (obviously still awaiting scipy 0.7): http://www.mail-archive.com/numpy-discussion at scipy.org/msg12091.html I think the main current effort for numpy 1.3 is getting Python 2.6 fully supported (Windows is the main problem) before there will be any further consideration of Python 3. One of the main problems is that numpy uses a few APIs that are deprecated in Python 3. So any porting will not go far until the correct APIs are used, which will probably be after the next numpy release. > However, even ignoring those parts of Biopython which use NumPy (e.g. > Bio.PDB and Bio.Cluster) and Bio.Graphics (the only use of ReportLab), > we have a lot of useful code. In the short term we should be aiming > to have everything run under Python 2.6 in warnings mode, as a step > towards eventual Python 3 support. > While I understand this approach, I do wonder how effective it will be compared to direct porting using the 2to3 tool. One reason is that 2to3 is more than a code converter as it also attempts to guess at what you are trying to do. Anyhow, this is not a trivial task and I am willing to help in that regard. > Beyond that, I think that it is likely we'll want to use bytes rather > than (unicode) strings in Python 3 for the Seq object, but have not > given this much thought. > > >>>> 5. Legal issues >>>> >>> Try and avoid them? What did you mean in particular? >>> >> In my opinion something should be said about this. Actually I think >> (suggest) it is essentially a matter of mainly taking Bruce's >> comments (e.g. one cannot have derived works of non-free software) and >> writing them down on a wiki page. Just things a potential contributor >> would have to be aware of on a legal front. >> > > I see what you mean.
Perhaps I am naive in thinking this should be > common knowledge amongst potential contributors. > I think we must be explicit in this and ensure that any accepted code is BSD-compatible because we can not ensure what people really know. Further the license of any application that Biopython interacts with must be clearly stated and the developer is responsible to get one if it does not have one. That way we know what is included and should help users as well in terms of whether or not they can use some application. > >>> Testing: >>> I'd strongly resist adding any new module without an accompanying >>> test, and wish this had been a firm policy from day one. >>> >> People should also be encouraged to test (in as much as possible) in >> at least Win/Linux/Mac. Of course, for some people it will be >> difficult as access to all platforms is not always possible for >> everybody. But at least encouragement should be made... >> > > Also tests which require additional setup are a pain. The BioSQL > tests are an example of this, where it is unavoidable - but any > situation like this reduces the number of people/machines where that > test will get checked. Michiel has stressed this kind of thing as a > concern in the past (as I recall). > > Peter > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > We can not force people to run tests but hope that sufficient people who do cover many of the variations as possible. Do we need to create buildbots (eg http://sourceforge.net/projects/buildbot/)? I do not test or use BioSQL code because I do not use BioSQL and do not run a compatible database on my system. So it would be really great if BioSQL supported sqlite because the database requirements would be alleviated. 
The other related aspect is that certain applications like clustalw must be in the path otherwise the application will not be found and the test skipped. But I do not know how to solve this except perhaps using environment variables. Regards Bruce From bsouthey at gmail.com Mon Jan 12 12:34:50 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Mon, 12 Jan 2009 11:34:50 -0600 Subject: [Biopython-dev] Structure and LDNe In-Reply-To: <320fb6e00901090459x52976449gddcc4483699e0f56@mail.gmail.com> References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com> <496397C9.3030706@gmail.com> <6.1.2.0.2.20090108141534.0323a5f0@mail.lmi.net> <320fb6e00901090459x52976449gddcc4483699e0f56@mail.gmail.com> Message-ID: <496B7F3A.60407@gmail.com> Peter wrote: > On Thu, Jan 8, 2009 at 10:24 PM, Jason Eshleman wrote: > >> Greetings all, >> >> Presently, the code I have for dealing with STRUCTURE is similar to the code >> for interacting with Clustal, in that it does not modify any of the STRUCTURE >> source code but merely initiates the compiled executable. >> > > Biopython has code for interacting with lots of command line tools, > and this neatly avoids any copyright/licence questions about being a > derived work. > I have no problem with this provided that the parsing follows documented information such as a description of the output. I would have a problem if you based the code on another source that uses undocumented information or information not obvious from the output. > >> Initially, I have used my code in place of their Java front end as it allows >> for more control of the run-time variables for successive runs with varying >> run parameters. At some point, I'd like to get it to interface more >> directly with the STRUCTURE code to be able to pipe results directly to >> python for parsing rather than working with the STRUCTURE text output but >> that's a ways off still.
>> > > I'm not quite clear what you have in mind, but this would probably > need a little more thought from the legal perspective. If STRUCTURE > provides an API with header files you can compile against, that should > be OK (but I am not a lawyer). Note that doing this within Biopython > would then mean adding another build time dependency, which would need > to be justified in terms of the benefits it brings. > > Peter > Linking against header files is a gray area but some views consider it to be illegal (see the Linux kernel discussions on that!). It does really depend on whether or not the result can be considered a derivative. Unless STRUCTURE is released under a BSD-compatible license, you should not use any code from it (and probably should not even look at the code). Just saying the code is free is insufficient because code licensed under the GPL is 'free' but not BSD-compatible. So if STRUCTURE does not have a license then either get one or forget about this until it does have a BSD-compatible license. Alternatively, get STRUCTURE to support your changes. One is being difficult simply because of the potential impact on the Biopython project by including code incompatible with the BSD license. Bruce From biopython at maubp.freeserve.co.uk Mon Jan 12 13:19:03 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 12 Jan 2009 18:19:03 +0000 Subject: [Biopython-dev] Developmental policies In-Reply-To: <496B77F1.9060207@gmail.com> References: <320fb6e00901100946g62e26170o7e112f3b2f357e76@mail.gmail.com> <6d941f120901101031w22d9730dy87353cf22646d6fc@mail.gmail.com> <320fb6e00901101210k5e715beer240aa9338aa7ce2f@mail.gmail.com> <496B77F1.9060207@gmail.com> Message-ID: <320fb6e00901121019h72463a5dl316cabc85100c09d@mail.gmail.com> > We can not force people to run tests but hope that sufficient people who do > cover many of the variations as possible. Do we need to create buildbots (eg > http://sourceforge.net/projects/buildbot/)?
Some kind of "buildbots" would be nice - possibly with something hosted on the OBF server to hold the reports (even just via the wiki pages would work). I have access to one or two platforms at work which might be able to act in this way, but the infrastructure isn't there yet. > I do not test or use BioSQL code because I do not use BioSQL and do not run > a compatible database on my system. So it would be really great if BioSQL > supported sqlite because the database requirements would be alleviated. This was recently requested on the BioSQL mailing list - and it would be nice. > The other related aspect is that certain applications like clustalw must be > in the path otherwise the application will not be found and the test > skipped. But I do not know how to solve this except perhaps using > environmental variables. Part of setting up a "buildbot" or test server would include installing all the optional command line tools (like ClustalW) so that the full test suite can be run. Peter From bsouthey at gmail.com Mon Jan 12 17:24:00 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Mon, 12 Jan 2009 16:24:00 -0600 Subject: [Biopython-dev] Alphabet case and standards Message-ID: <496BC300.90003@gmail.com> Hi, I am moving a potential discussion away from the bugzilla because it affects at least the following Bugs (please add others): 2351 (Make Seq more like a string, even subclass string? http://bugzilla.open-bio.org/show_bug.cgi?id=2351 ), 2532 (Using IUPAC alphabets in mixed case Seq objects http://bugzilla.open-bio.org/show_bug.cgi?id=2532 ), 2597 (Enforce alphabet letters in Seq objects http://bugzilla.open-bio.org/show_bug.cgi?id=2597 ) 2731 (Adding .upper() and .lower() methods to the Seq object http://bugzilla.open-bio.org/show_bug.cgi?id=2731 ). I am hoping it gets wider feedback than using bugzilla, avoid unnecessary duplication and closure of these bugs. 
From Bug 2351, "Bio.Alphabets.IUPAC defines a number of alphabets with defined lists of valid letters which are in upper case ONLY". But various applications ignore the alphabet case and hence the standards. So this creates the problem of how Biopython should handle alphabet case. If we follow the standard for all modules then there should be no need to do anything except to ensure we follow it. There are numerous examples where the standard is not followed including user ignorance, simplicity or design (such as using mixed case to denote 'important' things), and various databases and applications do not follow it. But I think that the actual case is irrelevant in most situations and not following the standard would make Biopython inefficient. One suggestion given in two of the bugs is to change the Alphabet object but I believe that this is wrong because you do not know which alphabet to use. If you already know the case then my preferred option is to change the case of your query. Otherwise you would have to obtain and use one alphabet for every case used, for example, a user may need two alphabets to handle upper and lower case or just one combined one. Also, if mixed case alphabets are used, then an excessive number of alphabets may be required. I think that the current approach is to force the user to use uppercase when interacting with the Alphabet object or anything derived from it (such as an actual alphabet). While this maintains storage of the input case, it does not enforce the standard. This is also inefficient because it requires constant checks for the correct case. Similar to the first suggestion in Bug 2731, I think that we should automatically change the case when creating any sequence-related object and provide a warning that the input has changed. This enforces the standard and probably requires small changes to the code but loses the format of the input.
Outside of Biopython, an example of this is that the web version of NCBI BLAST silently converts the input case of the query. Less desirable options: a) Enforce the standard such as with Bug 2597 so that an error is returned for any sequence-related object if the case is incorrect. This is probably a little too harsh for a difference in case. b) Use regular expressions to ignore case but this will create a large penalty especially if it is not required. Regards Bruce From bugzilla-daemon at portal.open-bio.org Mon Jan 12 17:43:55 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 12 Jan 2009 17:43:55 -0500 Subject: [Biopython-dev] [Bug 2731] Adding .upper() and .lower() methods to the Seq object In-Reply-To: Message-ID: <200901122243.n0CMhtlZ017015@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2731 ------- Comment #1 from bsouthey at gmail.com 2009-01-12 17:43 EST ------- (In reply to comment #0) > As part of making the Seq object more string like (Bug 2351), it would be nice > to support the .upper() and .lower() methods. Sure it would be nice in terms of following the string object, but I do not follow the reasons for adding .upper() and .lower() methods to the Seq object. If we follow the standards, these should be unnecessary. The only time I can see that you would want this is when outputting the sequence. In such situations, the sequence is likely to be a string, which has these methods. I do not consider the fact that other applications can handle different case a sufficiently compelling reason. > > Doing this elegantly will require different case versions of the alphabets (see > Bug 2532), perhaps by adding (private) upper and lower methods to the Alphabet > object itself. > > Alternatively, we can handle this without adding new Alphabets by mapping the > fixed case IUPAC alphabets to case-less generic alphabets.
> These comments suggests that Seq object needs to be case-aware which also affects other methods like string queries. But I think this is a different issue such as whether or not the standards would be enforced than having these two methods. Bruce -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Mon Jan 12 18:04:46 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 12 Jan 2009 23:04:46 +0000 Subject: [Biopython-dev] Alphabet case and standards In-Reply-To: <496BC300.90003@gmail.com> References: <496BC300.90003@gmail.com> Message-ID: <320fb6e00901121504u6e9f3b7fu23e5f2ea25dee003@mail.gmail.com> On Mon, Jan 12, 2009 at 10:24 PM, Bruce Southey wrote: > Hi, > I am moving a potential discussion away from the bugzilla because it affects > at least the following Bugs (please add others): > 2351 (Make Seq more like a string, even subclass string? > http://bugzilla.open-bio.org/show_bug.cgi?id=2351 ), > 2532 (Using IUPAC alphabets in mixed case Seq objects > http://bugzilla.open-bio.org/show_bug.cgi?id=2532 ), > 2597 (Enforce alphabet letters in Seq objects > http://bugzilla.open-bio.org/show_bug.cgi?id=2597 ) > 2731 (Adding .upper() and .lower() methods to the Seq object > http://bugzilla.open-bio.org/show_bug.cgi?id=2731 ). > > I am hoping it gets wider feedback than using bugzilla, avoid unnecessary > duplication and closure of these bugs. Yes, having a discussion on the mailing list is probably better than on bugzilla. I should probably write up my views on this topic explicitly, but I've tried to do so below in reply to your points. > From Bug 2351, "Bio.Alphabets.IUPAC defines a number of alphabets with > defined lists of valid letters which are in upper case ONLY". But various > applications ignore the alphabet case and hence the standards. 
So this > creates the problem of how Biopython should handle alphabet case. > ... I don't want to prevent people from using mixed case or lower case sequences if they want to. However, I do think doing so with an alphabet which is intended to be upper case ONLY should be treated as an error. We currently have a number of generic alphabets which DO NOT define a set of valid letters. We also have some IUPAC derived alphabets which define a set of upper case only expected letters. So, if you want to use lower or mixed case sequences in a Seq object, (1) Use a generic alphabet which does not explicitly define the valid letters (so any characters are allowed) (2) Use an explicit alphabet which includes the relevant cases. This could be a user defined alphabet, or one we add to Biopython. Most of the time in my personal usage, I don't actually care about the precise alphabet - the generic DNA/RNA/protein alphabets suffice. These do not list the expected/allowed letters, and thus can be used for upper case, lower case or mixed case sequences. Working with well defined alphabets is more important when working with things like BLOSUM matrices. > One suggestion given in two of the bugs is to change the Alphabet object but > I believe that this is wrong because you do not know which alphabet to use. The person creating the Seq object should know what kind of data they are dealing with, and if they specifically want to use say "mixed case unambiguous IUPAC DNA" (if this were in Biopython) then that's up to them. If you don't know exactly what you are dealing with, fall back on the generic DNA alphabet, or the generic nucleotide alphabet, or even the generic single letter alphabet. > ... Also, if mixed case alphabets are used, then an excessive number > of alphabets may be required. We *could* introduce mixed case IUPAC alphabets, and lower case IUPAC alphabets to complement the existing upper case IUPAC alphabets (see my patch on 2532).
Yes, this does add a lot of alphabets, and I'm not entirely keen on this either. Maybe just adding mixed case versions would suffice? > I think that the current approach is to force the user to use uppercase when > interacting with the Alphabet object or anything derived from it (such as an actual > alphabet). While this maintains storage of the input case, it does not > enforce the standard. This is also inefficient because it requires constant > checks for the correct case. Right now we don't force the user to do anything. I would like to make the alphabet check strict (Bug 2597), or at least give a warning. Running with this change locally has flagged up several typos in my unit tests - I think it is a good thing. > Similar to the first suggestion in Bug 2731, I think that we should > automatically change the case when creating any sequence-related object and > provide a warning that the input has changed. This enforces the standard and > probably requires small changes to the code but loses the format of the > input. Outside of Biopython, an example of this is that the web version of NCBI > BLAST silently converts the input case of the query. My personal view on automatically changing the case of the sequence string when creating a Seq object: NO WAY. You're throwing away potentially important data, and also preventing people from working with mixed case sequences - for no real benefit. > Less desirable options: > a) Enforce the standard such as with Bug 2597 so that an error is returned > for any sequence-related object if the case is incorrect. This is probably a > little too harsh for a difference in case. It could be done as a warning for a couple of releases, and later an error. Why do you think it is too harsh? Maybe I am being pedantic here, but lots of code gets written assuming uppercase letters only, and in this situation having any unwanted lower case caught early is a good thing.
To my mind the whole point about the user explicitly using for example the IUPAC protein alphabet is they expect the sequence to comply with the IUPAC conventions. I *WANT* to get an error if the sequence contained something invalid like a "@" character, or anything else not in the IUPAC definition. Mixed cases are a special case of this (the IUPAC standards use upper case). > b) Use regular expressions to ignore case but this will create a large > penalty especially if it is not required. I'm not sure what you mean here, but I don't think regular expressions are required. Peter From bugzilla-daemon at portal.open-bio.org Mon Jan 12 18:30:49 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 12 Jan 2009 18:30:49 -0500 Subject: [Biopython-dev] [Bug 2731] Adding .upper() and .lower() methods to the Seq object In-Reply-To: Message-ID: <200901122330.n0CNUnG7021141@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2731 ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-12 18:30 EST ------- Created an attachment (id=1191) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1191&action=view) Patch to Bio/Seq.py ONLY adding upper and lower methods This patch is a proof of principle of how we could add upper and lower methods while following the strict alphabet checking proposed on Bug 2597. The code is a little complicated/nasty in order to localise the change to Bio/Seq.py only.
Here is a usage example with the patch applied:

>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import IUPAC
>>> my_dna = Seq("AGGGTGTTGA", IUPAC.IUPACUnambiguousDNA())
>>> my_dna
Seq('AGGGTGTTGA', IUPACUnambiguousDNA())
>>> my_dna.lower()
Seq('agggtgttga', NucleotideAlphabet())
>>> my_dna.lower().upper()
Seq('AGGGTGTTGA', NucleotideAlphabet())

Note that if we implemented (private) upper and lower methods in the Alphabet objects as I suggested on Bug 2532, the code in the Seq class would be much simpler, e.g.

def upper(self):
    return Seq(str(self).upper(), self.alphabet._upper())

def lower(self):
    return Seq(str(self).lower(), self.alphabet._lower())

The generic alphabets (where the list of letters is undefined) would just return self, while the AlphabetEncoders could also implement these methods simply. Individual explicit alphabets (i.e. the IUPAC ones) would have to define sensible upper/lower mappings - perhaps by defining lower case variants (see Bug 2532).

-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
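The delegation idea in comment 2 can be sketched without Biopython at all. The classes below are simplified stand-ins written for illustration only (they are not the real Bio.Seq or Bio.Alphabet classes, and the patch itself may differ); the fallback from the explicit alphabet to the generic one on lower() is an assumption taken from the usage example in the comment.

```python
class Alphabet:
    """Stand-in generic alphabet: no fixed letter list, so any case is fine."""
    letters = None

    def _upper(self):
        # Generic alphabets do not constrain case, so return self unchanged.
        return self

    def _lower(self):
        return self

    def __repr__(self):
        return "%s()" % type(self).__name__


class NucleotideAlphabet(Alphabet):
    """Stand-in generic nucleotide alphabet."""


class IUPACUnambiguousDNA(NucleotideAlphabet):
    """Stand-in explicit alphabet: upper case letters only."""
    letters = "GATC"

    def _lower(self):
        # Assumption: no lower case IUPAC variant exists, so fall back to
        # the case-less generic alphabet, as in the comment's example output.
        return NucleotideAlphabet()


class Seq:
    """Minimal stand-in sequence object holding a string and an alphabet."""

    def __init__(self, data, alphabet=None):
        self._data = data
        self.alphabet = alphabet if alphabet is not None else Alphabet()

    def __str__(self):
        return self._data

    def __repr__(self):
        return "Seq(%r, %r)" % (self._data, self.alphabet)

    def upper(self):
        # The alphabet decides what the upper case sequence should carry.
        return Seq(str(self).upper(), self.alphabet._upper())

    def lower(self):
        return Seq(str(self).lower(), self.alphabet._lower())


my_dna = Seq("AGGGTGTTGA", IUPACUnambiguousDNA())
print(repr(my_dna.lower()))
print(repr(my_dna.lower().upper()))
```

Keeping the case logic in the alphabet means the Seq methods stay two lines each, and a new alphabet only has to say what its own upper and lower case counterparts are.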
From bugzilla-daemon at portal.open-bio.org Mon Jan 12 19:21:42 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 12 Jan 2009 19:21:42 -0500 Subject: [Biopython-dev] [Bug 2731] Adding .upper() and .lower() methods to the Seq object In-Reply-To: Message-ID: <200901130021.n0D0LgUu024264@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2731 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1191 is|0 |1 obsolete| | ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-12 19:21 EST ------- (From update of attachment 1191)

There are a couple of "if" statements which should be "elif", but otherwise the patch seems to cover the basics. However, it does not cover the pathological/evil situation where a LETTER has been used for a stop codon or gap character. e.g. Something like this should happen (assuming Bug 2597 is implemented in order to trigger the exception shown):

>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import IUPAC, Gapped
>>> my_dna = Seq("AGGGTXGTTGA", Gapped(IUPAC.IUPACUnambiguousDNA(), "x"))
Traceback (most recent call last):
...
ValueError: Letter 'X' not in Gapped(IUPACUnambiguousDNA(), 'x')
>>> my_dna = Seq("AGGGTxGTTGA", Gapped(IUPAC.IUPACUnambiguousDNA(), "x"))
>>> my_dna.lower()
Seq('agggtxgttga', Gapped(DNAAlphabet(), 'x'))
>>> my_dna.lower().upper()
Seq('AGGGTXGTTGA', Gapped(DNAAlphabet(), 'X'))

I think the most elegant way to deal with the AlphabetEncoders (stop and gaps) is by adding (private) upper/lower methods to the Alphabet objects as I outlined in comment 2. Patch taking this approach to follow...

-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
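The "evil" gap character case from comment 3 can be reproduced with a small self-contained sketch. Everything below is a hypothetical stand-in (not the real Bio.Alphabet.Gapped or the attached patch): the strict letter check in Seq.__init__ models what Bug 2597 proposes, and the Gapped wrapper changes the case of its gap character together with the wrapped alphabet.

```python
class DNAAlphabet:
    """Stand-in generic alphabet: letters is None, so nothing is rejected."""
    letters = None

    def _upper(self):
        return self

    def _lower(self):
        return self

    def __repr__(self):
        return "%s()" % type(self).__name__


class IUPACUnambiguousDNA(DNAAlphabet):
    """Stand-in explicit alphabet: upper case letters only."""
    letters = "GATC"

    def _lower(self):
        # Assumption: fall back to the generic alphabet for lower case.
        return DNAAlphabet()


class Gapped:
    """Stand-in encoder adding a gap character to another alphabet."""

    def __init__(self, alphabet, gap_char="-"):
        self.alphabet = alphabet
        self.gap_char = gap_char
        base = alphabet.letters
        # Only explicit alphabets have a letter list to extend.
        self.letters = None if base is None else base + gap_char

    def _upper(self):
        # The gap character changes case along with the wrapped alphabet.
        return Gapped(self.alphabet._upper(), self.gap_char.upper())

    def _lower(self):
        return Gapped(self.alphabet._lower(), self.gap_char.lower())

    def __repr__(self):
        return "Gapped(%r, %r)" % (self.alphabet, self.gap_char)


class Seq:
    """Minimal stand-in sequence with a strict letter check on creation."""

    def __init__(self, data, alphabet):
        if alphabet.letters is not None:
            for letter in data:
                if letter not in alphabet.letters:
                    raise ValueError("Letter %r not in %r" % (letter, alphabet))
        self._data = data
        self.alphabet = alphabet

    def __repr__(self):
        return "Seq(%r, %r)" % (self._data, self.alphabet)

    def upper(self):
        return Seq(self._data.upper(), self.alphabet._upper())

    def lower(self):
        return Seq(self._data.lower(), self.alphabet._lower())


my_dna = Seq("AGGGTxGTTGA", Gapped(IUPACUnambiguousDNA(), "x"))
print(repr(my_dna.lower()))
print(repr(my_dna.lower().upper()))
```

With these stand-ins, an upper case "X" is rejected while the gap is declared as "x", and upper() moves the gap character to "X" so the result stays consistent with its alphabet.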
From bugzilla-daemon at portal.open-bio.org Mon Jan 12 19:30:55 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 12 Jan 2009 19:30:55 -0500 Subject: [Biopython-dev] [Bug 2731] Adding .upper() and .lower() methods to the Seq object In-Reply-To: Message-ID: <200901130030.n0D0UtHL024905@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2731 ------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-12 19:30 EST ------- Created an attachment (id=1192) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1192&action=view) Patch to Bio/Seq.py and Bio/Alphabet/__init__.py Implements upper/lower methods in the Seq object, handling the alphabet case conversion in the Alphabet object using (private) upper/lower methods. This could be extended for the IUPAC alphabets if we add lower case variants to those (see Bug 2532). This works for the evil example in comment 3 where the case of any extra characters from an AlphabetEncoder should also be changed. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From dalloliogm at gmail.com Tue Jan 13 06:49:19 2009 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Tue, 13 Jan 2009 12:49:19 +0100 Subject: [Biopython-dev] Developmental and experimental branches In-Reply-To: <320fb6e00901100903v1aa0180bsd1ca5335f7da1f7f@mail.gmail.com> References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com> <320fb6e00901090455y334ecebdo55cd7c1a718ab499@mail.gmail.com> <496778D2.1050801@gmail.com> <5aa3b3570901090917m7c82fb17hb4c93235702b120b@mail.gmail.com> <320fb6e00901090928u662023d6rd6f2d82c5fbb7365@mail.gmail.com> <8b34ec180901091017o172e4acemf91c0a46a736bdb1@mail.gmail.com> <20090109225155.GF4135@sobchak.mgh.harvard.edu> <320fb6e00901100646y6132686ap8a928404dd1e36c3@mail.gmail.com> <6d941f120901100852g47b10e9ar214cf2ad2b206f6@mail.gmail.com> <320fb6e00901100903v1aa0180bsd1ca5335f7da1f7f@mail.gmail.com> Message-ID: <5aa3b3570901130349u32924629lcf914579de34626e@mail.gmail.com> On Sat, Jan 10, 2009 at 6:03 PM, Peter wrote: > On Sat, Jan 10, 2009 at 4:52 PM, Tiago Ant?o wrote: >> On Sat, Jan 10, 2009 at 2:46 PM, Peter wrote: >>> Using the wiki in this way is a nice idea. Tiago - do you fancy >>> adding a PopGen page describing the additions you're working on? As a >>> bonus, once these do get into the main repository, you may find the >>> wiki text will be a useful basis for extending the documentation. >> >> Where do you want me to link the page on the Wiki? > > How about having two pages: > > http://biopython.org/wiki/PopGen > - documentation on the code in the current official release, > - linked to from the main doc page > > http://biopython.org/wiki/PopGen_dev ok, I have started writing something there.. 
-- My blog on bioinformatics (now in English): http://bioinfoblog.it From tiagoantao at gmail.com Tue Jan 13 07:14:05 2009 From: tiagoantao at gmail.com (Tiago Antão) Date: Tue, 13 Jan 2009 12:14:05 +0000 Subject: [Biopython-dev] Structure and LDNe In-Reply-To: <496B7F3A.60407@gmail.com> References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com> <496397C9.3030706@gmail.com> <6.1.2.0.2.20090108141534.0323a5f0@mail.lmi.net> <320fb6e00901090459x52976449gddcc4483699e0f56@mail.gmail.com> <496B7F3A.60407@gmail.com> Message-ID: <6d941f120901130414v3f770f3dy84bc44e4b4a8e25f@mail.gmail.com> > Linking against header files is a gray area but some views consider it to > be illegal (see the Linux kernel discussions on that!). It does really > depend on whether or not the result can be considered to be a derivative. Fortunately this is not the case with Jason's code. Anyway, if there is agreement on what you said, I think most of the comments made should be put on the Wiki in some form. I don't mind drafting something myself based on your comments. From tiagoantao at gmail.com Tue Jan 13 07:34:56 2009 From: tiagoantao at gmail.com (Tiago Antão) Date: Tue, 13 Jan 2009 12:34:56 +0000 Subject: [Biopython-dev] Developmental policies In-Reply-To: <496B77F1.9060207@gmail.com> References: <320fb6e00901100946g62e26170o7e112f3b2f357e76@mail.gmail.com> <6d941f120901101031w22d9730dy87353cf22646d6fc@mail.gmail.com> <320fb6e00901101210k5e715beer240aa9338aa7ce2f@mail.gmail.com> <496B77F1.9060207@gmail.com> Message-ID: <6d941f120901130434u19c87dehe3c1376b4b20031@mail.gmail.com> > I think we must be explicit in this and ensure that any accepted code is > BSD-compatible because we can not ensure what people really know.
Further > the license of any application that Biopython interacts with must be clearly > stated and the developer is responsible to get one if it does not have one. > That way we know what is included and should help users as well in terms of > whether or not they can use some application. One point is not clear to me here: if you only interact with a (say, command-line or web-based) application, is there a problem if that application has an unspecified license? There are 3 dimensions here that I find important: 1. If biopython interacts with an application with no license, are there possible liabilities with regards to the project? The same question in regards to users? 2. I would note that interaction might be library based (with linking - where we know problems exist), command-line based (are there any problems?) and web-based (are there any problems different from the command-line case?). 3. I would suppose (for licensed non-free apps) that some licenses might not be clear in regards to this kind of usage. Would it be necessary to inspect the licenses in detail? A strict view regarding software without licenses (ie, no interaction at all) would require immediate removal of the fdist code (not very important, it is the part that is probably not used by anyone). No inclusion of LDNe code. And more importantly no STRUCTURE interaction code and no Genepop interaction code (although the file format parser that is currently inside is OK). So, the very pertinent questions are: 1. Can biopython command-line interact with applications with no license? 2. Is biopython interacting with applications (command-line or web) for which the license is not clear regarding interaction with software?
From p.j.a.cock at googlemail.com Tue Jan 13 07:54:57 2009 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 13 Jan 2009 12:54:57 +0000 Subject: [Biopython-dev] Developmental policies In-Reply-To: <6d941f120901130434u19c87dehe3c1376b4b20031@mail.gmail.com> References: <320fb6e00901100946g62e26170o7e112f3b2f357e76@mail.gmail.com> <6d941f120901101031w22d9730dy87353cf22646d6fc@mail.gmail.com> <320fb6e00901101210k5e715beer240aa9338aa7ce2f@mail.gmail.com> <496B77F1.9060207@gmail.com> <6d941f120901130434u19c87dehe3c1376b4b20031@mail.gmail.com> Message-ID: <320fb6e00901130454i13f1faedw29e049f9b9df9478@mail.gmail.com> > So, the very pertinent questions are: > 1. Can biopython command-line interact with applications with no license? I think so, yes. If there was a license then it may try and impose rules which could prevent this (possible in some legal jurisdictions?). Even "viral" licences like the GPL should be fine in this context. However, for the Population Genetics software you are talking about, trying to get the authors to make their licence explicit would be worthwhile (even if they just say it's given freely to the public domain or whatever the terminology is). > 2. Is biopython interacting with applications (command-line or web) > for which the license is not clear regarding interaction with > software? For command line tools (e.g. ClustalW, BLAST) calling them from a script is common practice. In fact, by their nature command line tools are generally expected to be used in this way. I think we are OK here. For web tools, in some cases the provider provides clear instructions (e.g. NCBI and BLAST and Entrez). Another example is Bio.PDB can fetch files from the FTP site - which is by its nature provided as a public server. In other cases things are perhaps a little less clear cut. Speaking generally, many websites do have conditions imposed in their terms of service (e.g.
TV listing sites don't want people "screen scraping" with a script to "steal" the schedule information), although these may not be legally enforceable. However, this is unlikely to be a problem in the academic setting applicable to most websites Biopython may interact with. Peter From bsouthey at gmail.com Tue Jan 13 11:50:28 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Tue, 13 Jan 2009 10:50:28 -0600 Subject: [Biopython-dev] Developmental policies In-Reply-To: <6d941f120901130434u19c87dehe3c1376b4b20031@mail.gmail.com> References: <320fb6e00901100946g62e26170o7e112f3b2f357e76@mail.gmail.com> <6d941f120901101031w22d9730dy87353cf22646d6fc@mail.gmail.com> <320fb6e00901101210k5e715beer240aa9338aa7ce2f@mail.gmail.com> <496B77F1.9060207@gmail.com> <6d941f120901130434u19c87dehe3c1376b4b20031@mail.gmail.com> Message-ID: <496CC654.5090806@gmail.com> Tiago Antão wrote: >> I think we must be explicit in this and ensure that any accepted code is >> BSD-compatible because we can not ensure what people really know. Further >> the license of any application that Biopython interacts with must be clearly >> stated and the developer is responsible to get one if it does not have one. >> That way we know what is included and should help users as well in terms of >> whether or not they can use some application. >> > > > One point is not clear to me here: if you only interact with a (say, > command-line or web-based) application, is there a problem if that > application has an unspecified license? There are 3 dimensions here > that I find important: > 1. If biopython interacts with an application with no license, are there > possible liabilities with regards to the project? The same question in > regards to users? > I do not think that there is any real difference between the developer and the user as ignorance is usually not a good defense.
If you use code from another application in your project with little or no modification (such as rewriting the code into Python) or did reverse-engineering or even looked at the code then your application could be controlled by the license of that application. Obviously if it has a license then you must abide by those terms. If it does not have a license and you do not get permission to use that code then you have violated the original author's copyright and you are liable for damages. Of course, as in one of the most important open-source related cases in the USA, the Jacobsen v. Katzer case (e.g. http://www.groklaw.net/article.php?story=2008081313212422 ) about the Java Model Railroad Interface (JMRI), those damages may be nothing. > 2. I would note that interaction might be library based (with > linking - where we know problems exist), command-line based (are there > any problems?) and web-based (are there any problems different from > the command-line case?). > Unless the application forbids it, there is no problem with how you actually run the application. As Peter said, web tools also have conditions that you have to keep or you will find yourself locked out. The main problem is using someone else's code in your project and the real problem is the actual terms of the code used. Using a function from that code in yours, such as one that parses the output (especially if it is in a binary format), is a potential violation. If your code clearly follows the published documentation or a clean-room approach (see http://en.wikipedia.org/wiki/Clean_room_design ) was properly used then there should be no problems. Linking only becomes a problem if your code can be considered a derivative, or if the license forbids linking, as the GPL does but the LGPL does not. However, this is a grey area as is evident from the use of binary drivers in Linux. > 3. I would suppose (for licensed non-free apps) that some licenses > might not be clear in regards to this kind of usage.
Would it be > necessary to inspect the licenses in detail? > Yes, you must inspect any license in detail because even downloading the code can involve or imply acceptance of the terms. Some licenses, usually for commercial applications, are rather nasty in terms of what can and cannot be done, such as forbidding reverse engineering. Even open source licenses like the GPL v3 can have some unexpected side effects (ie related to patents). Most non-open source licenses (including academic only licenses) that I have seen related to bioinformatics usually are aimed at restricting the commercial usage of the code and the subsequent distribution of it. But you need to see if there are other restrictions involved that limit the output from that application. > A strict view regarding software without licenses (ie, no interaction > at all) would require immediate removal of the fdist code (not very > important, it is the part that is probably not used by anyone). No > inclusion of LDNe code. And more importantly no STRUCTURE interaction > code and no Genepop interaction code (although the file format parser > that is currently inside is OK). > If the interaction is just creating inputs, running the standalone application and parsing the output, then those interactions should be okay. Obviously the code to create the input and parse the output must be independent of the application's code, for example based on public documentation or a clean-room approach. If the interaction creates a derivative such as when the code of the application is required in addition to your code then it is not okay. Further, as Peter commented elsewhere, there needs to be strong justification to include it into Biopython. Rather I would strongly suggest that you try to get your code included in the other application as it may help other users and you don't have to maintain a version of the original application. > So, the very pertinent questions are: > 1. Can biopython command-line interact with applications with no license?
> Yes, but it must not be considered a derivative of the application or it must do so in terms of the license. For example, AlignACE uses the Harvard University license where everyone using it must have their own license or it can be run on a second computer provided that only one copy is running at a time. > 2. Is biopython interacting with applications (command-line or web) > for which the license is not clear regarding interaction with > software? > I do not know the answer to this question because I do not know or use all the applications involved. However, we do need to create a list of applications with associated web sites and licenses that Biopython 'interacts' with which would answer this question. Regards Bruce From bsouthey at gmail.com Wed Jan 14 15:24:29 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Wed, 14 Jan 2009 14:24:29 -0600 Subject: [Biopython-dev] Running Biopython tests on windows xp Message-ID: <496E49FD.4080305@gmail.com> Hi, I decided to install Windows on a virtual system, partly to have a Windows test system. I installed Python 2.5, numpy 1.2 and biopython 1.49 using binary installers. I am aiming to add the optional software like Reportlab and a C compiler. Is there a way to run the Biopython tests within Python rather than using the system command line? When I run the tests from the command line I get a number of failures that I think are due to a lack of a C compiler. Are these expected or do you want bug reports? Bruce C:\Documents and Settings\virtualme\Desktop\biopython-1.49\biopython-1.49>c:\Python25\python.exe setup.py test running test test_Ace ... ok test_AlignIO ... ok test_BioSQL ... skipping. Install MySQLdb or correct Tests/setup_BioSQL.py (not important if you do not plan to use BioSQL). ok test_BioSQL_SeqIO ... skipping. Install MySQLdb or correct Tests/setup_BioSQL.py (not important if you do not plan to use BioSQL). ok test_CAPS ... ERROR test_Clustalw ... ok test_Clustalw_tool ... skipping.
Install clustalw or clustalw2 if you want to us e Bio.Clustalw. ok test_Cluster ... FAIL test_CodonTable ... ok test_CodonUsage ... ok test_Compass ... ok test_Crystal ... ok test_DocSQL ... skipping. Install MySQLdb if you want to use Bio.DocSQL. ok test_EmbossPrimer ... ok test_Entrez ... ok test_Enzyme ... ok test_FSSP ... ok test_Fasta ... ok test_Fasta2 ... ok test_File ... ok test_GACrossover ... ok test_GAMutation ... ok test_GAOrganism ... ok test_GAQueens ... ok test_GARepair ... ok test_GASelection ... ok test_GFF ... skipping. Environment is not configured for this test (not importan t if you do not plan to use Bio.GFF). ok test_GFF2 ... skipping. Install MySQLdb if you want to use Bio.GFF. ok test_GenBank ... ok test_GraphicsChromosome ... skipping. Install reportlab if you want to use Bio.G raphics. ok test_GraphicsDistribution ... skipping. Install reportlab if you want to use Bio .Graphics. ok test_GraphicsGeneral ... skipping. Install reportlab if you want to use Bio.Grap hics. ok test_HMMCasino ... ok test_HMMGeneral ... ok test_HotRand ... ok test_IsoelectricPoint ... ok test_KDTree ... ERROR test_KEGG ... ok test_KeyWList ... ok test_Location ... ok test_LocationParser ... ok test_LogisticRegression ... ok test_MEME ... ok test_MarkovModel ... ok test_Medline ... ok test_NCBIStandalone ... ok test_NCBIXML ... ok test_NCBI_qblast ... ok test_NNExclusiveOr ... ok test_NNGene ... ok test_NNGeneral ... ok test_Nexus ... ok test_PDB ... ERROR test_ParserSupport ... ok test_Pathway ... ok test_Phd ... ok test_PopGen_FDist ... skipping. Install FDist if you want to use Bio.PopGen.FDis t. ok test_PopGen_FDist_nodepend ... ok test_PopGen_GenePop ... ok test_PopGen_SimCoal ... skipping. Install SIMCOAL2 if you want to use Bio.PopGen .SimCoal. ok test_PopGen_SimCoal_nodepend ... ok test_ProtParam ... ok test_Registry ... ok test_Restriction ... ERROR test_SCOP_Astral ... ok test_SCOP_Cla ... FAIL test_SCOP_Des ... ok test_SCOP_Dom ... ok test_SCOP_Hie ... 
ok test_SCOP_Raf ... FAIL test_SCOP_Residues ... ok test_SCOP_Scop ... ok test_SProt ... ok test_SVDSuperimposer ... ok test_SeqIO ... ok test_SeqIO_online ... ok test_SeqUtils ... ok test_SubsMat ... ok test_UniGene ... ok test_Wise ... skipping. Don't know how to find the Wise2 tool dnal on Windows. ok test_align ... ok test_docstrings ... ok test_geo ... ok test_interpro ... ok test_kNN ... ok test_lowess ... ok test_pairwise2 ... ok test_prodoc ... ok test_property_manager ... ok test_prosite ... ok test_prosite2 ... ok test_psw ... skipping. Don't know how to find the Wise2 tool dnal on Windows. ok test_seq ... ok test_translate ... ok test_trie ... ERROR test_triefind ... ERROR ====================================================================== ERROR: test_CAPS ---------------------------------------------------------------------- Traceback (most recent call last): File "run_tests.py", line 125, in runTest self.runSafeTest() File "run_tests.py", line 138, in runSafeTest cur_test = __import__(self.test_name) File "test_CAPS.py", line 3, in from Bio.Restriction import * File "C:\Documents and Settings\virtualme\Desktop\biopython-1.49\biopython-1.4 9\Bio\Restriction\__init__.py", line 61, in from Bio.Restriction.Restriction import * File "C:\Documents and Settings\virtualme\Desktop\biopython-1.49\biopython-1.4 9\Bio\Restriction\Restriction.py", line 96, in from Bio.Restriction.PrintFormat import PrintFormat File "C:\Documents and Settings\virtualme\Desktop\biopython-1.49\biopython-1.4 9\Bio\Restriction\PrintFormat.py", line 14, in from Bio.Restriction.DNAUtils import complement ImportError: No module named DNAUtils ====================================================================== ERROR: test_KDTree ---------------------------------------------------------------------- Traceback (most recent call last): File "run_tests.py", line 125, in runTest self.runSafeTest() File "run_tests.py", line 138, in runSafeTest cur_test = __import__(self.test_name) File 
"test_KDTree.py", line 10, in from Bio.KDTree.KDTree import _neighbor_test, _test File "C:\Documents and Settings\virtualme\Desktop\biopython-1.49\biopython-1.4 9\Bio\KDTree\__init__.py", line 10, in from KDTree import KDTree File "C:\Documents and Settings\virtualme\Desktop\biopython-1.49\biopython-1.4 9\Bio\KDTree\KDTree.py", line 20, in from Bio.KDTree import _CKDTree ImportError: cannot import name _CKDTree ====================================================================== ERROR: test_PDB ---------------------------------------------------------------------- Traceback (most recent call last): File "run_tests.py", line 125, in runTest self.runSafeTest() File "run_tests.py", line 138, in runSafeTest cur_test = __import__(self.test_name) File "test_PDB.py", line 98, in run_test() File "test_PDB.py", line 90, in run_test quick_neighbor_search_test() File "test_PDB.py", line 19, in quick_neighbor_search_test from Bio.PDB.NeighborSearch import NeighborSearch File "C:\Documents and Settings\virtualme\Desktop\biopython-1.49\biopython-1.4 9\Bio\PDB\NeighborSearch.py", line 8, in from Bio.KDTree import * File "C:\Documents and Settings\virtualme\Desktop\biopython-1.49\biopython-1.4 9\Bio\KDTree\__init__.py", line 10, in from KDTree import KDTree File "C:\Documents and Settings\virtualme\Desktop\biopython-1.49\biopython-1.4 9\Bio\KDTree\KDTree.py", line 20, in from Bio.KDTree import _CKDTree ImportError: cannot import name _CKDTree ====================================================================== ERROR: test_Restriction ---------------------------------------------------------------------- Traceback (most recent call last): File "run_tests.py", line 125, in runTest self.runSafeTest() File "run_tests.py", line 138, in runSafeTest cur_test = __import__(self.test_name) File "test_Restriction.py", line 8, in from Bio.Restriction import * File "C:\Documents and Settings\virtualme\Desktop\biopython-1.49\biopython-1.4 9\Bio\Restriction\__init__.py", line 61, in from 
Bio.Restriction.Restriction import * File "C:\Documents and Settings\virtualme\Desktop\biopython-1.49\biopython-1.4 9\Bio\Restriction\Restriction.py", line 96, in from Bio.Restriction.PrintFormat import PrintFormat File "C:\Documents and Settings\virtualme\Desktop\biopython-1.49\biopython-1.4 9\Bio\Restriction\PrintFormat.py", line 13, in from Bio.Restriction import RanaConfig as RanaConf ImportError: cannot import name RanaConfig ====================================================================== ERROR: test_trie ---------------------------------------------------------------------- Traceback (most recent call last): File "run_tests.py", line 125, in runTest self.runSafeTest() File "run_tests.py", line 138, in runSafeTest cur_test = __import__(self.test_name) File "test_trie.py", line 6, in from Bio import trie ImportError: cannot import name trie ====================================================================== ERROR: test_triefind ---------------------------------------------------------------------- Traceback (most recent call last): File "run_tests.py", line 125, in runTest self.runSafeTest() File "run_tests.py", line 138, in runSafeTest cur_test = __import__(self.test_name) File "test_triefind.py", line 6, in from Bio import trie ImportError: cannot import name trie ====================================================================== FAIL: test_Cluster ---------------------------------------------------------------------- Traceback (most recent call last): File "run_tests.py", line 125, in runTest self.runSafeTest() File "run_tests.py", line 162, in runSafeTest expected_handle) File "run_tests.py", line 263, in compare_output % (repr(output_line), repr(expected_line)) AssertionError: Output : 'test_clusterdistance (test_Cluster.TestCluster) ... ERROR\n' Expected: 'test_clusterdistance (test_Cluster.TestCluster) ... 
ok\n' ====================================================================== FAIL: test_SCOP_Cla ---------------------------------------------------------------------- Traceback (most recent call last): File "run_tests.py", line 125, in runTest self.runSafeTest() File "run_tests.py", line 162, in runSafeTest expected_handle) File "run_tests.py", line 263, in compare_output % (repr(output_line), repr(expected_line)) AssertionError: Output : 'testIndex (test_SCOP_Cla.ClaTests) ... ERROR\n' Expected: 'testIndex (test_SCOP_Cla.ClaTests) ... ok\n' ====================================================================== FAIL: test_SCOP_Raf ---------------------------------------------------------------------- Traceback (most recent call last): File "run_tests.py", line 125, in runTest self.runSafeTest() File "run_tests.py", line 162, in runSafeTest expected_handle) File "run_tests.py", line 263, in compare_output % (repr(output_line), repr(expected_line)) AssertionError: Output : 'testSeqMapIndex (test_SCOP_Raf.RafTests) ... ERROR\n' Expected: 'testSeqMapIndex (test_SCOP_Raf.RafTests) ... 
ok\n' ---------------------------------------------------------------------- Ran 96 tests in 86.153s FAILED (failures=3, errors=6) C:\Documents and Settings\virtualme\Desktop\biopython-1.49\biopython-1.49> From tiagoantao at gmail.com Wed Jan 14 15:52:58 2009 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Wed, 14 Jan 2009 20:52:58 +0000 Subject: [Biopython-dev] Developmental and experimental branches In-Reply-To: <5aa3b3570901130349u32924629lcf914579de34626e@mail.gmail.com> References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com> <496778D2.1050801@gmail.com> <5aa3b3570901090917m7c82fb17hb4c93235702b120b@mail.gmail.com> <320fb6e00901090928u662023d6rd6f2d82c5fbb7365@mail.gmail.com> <8b34ec180901091017o172e4acemf91c0a46a736bdb1@mail.gmail.com> <20090109225155.GF4135@sobchak.mgh.harvard.edu> <320fb6e00901100646y6132686ap8a928404dd1e36c3@mail.gmail.com> <6d941f120901100852g47b10e9ar214cf2ad2b206f6@mail.gmail.com> <320fb6e00901100903v1aa0180bsd1ca5335f7da1f7f@mail.gmail.com> <5aa3b3570901130349u32924629lcf914579de34626e@mail.gmail.com> Message-ID: <6d941f120901141252x1a1088f9n7f30d894f35c18ab@mail.gmail.com> >> http://biopython.org/wiki/PopGen_dev > > ok, I have started writing something there.. I've edited the development one. I would recommend anyone interested in tracking the changes to watch the page. From biopython at maubp.freeserve.co.uk Wed Jan 14 16:43:33 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 14 Jan 2009 21:43:33 +0000 Subject: [Biopython-dev] Running Biopython tests on windows xp In-Reply-To: <496E49FD.4080305@gmail.com> References: <496E49FD.4080305@gmail.com> Message-ID: <320fb6e00901141343r529df66did6e172343592843d@mail.gmail.com> On Wed, Jan 14, 2009 at 8:24 PM, Bruce Southey wrote: > Hi, > I decided to install windows on a virtual system part to have a windows test > system. I installed Python 2.5, numpy 1.2 and biopython 1.49 using binary > installers. 
I am aiming to add the optional software like Reportlab and > a C compiler. If you are installing Biopython using our Windows Installer then you shouldn't need a C compiler. If you would like to install from source, then yes, you will need a C compiler. You can either try the appropriate MS compiler for your version of python, or we suggest Mingw32 from cygwin. > Is there a way to run the Biopython tests within Python rather than using > the system command line? Not really - why do you want to? I suppose you could use python to invoke the command "python run_tests.py". > When I run the tests from the command line I get a number of failures that I > think are due to a lack of a C compiler. > > Are these expected or do you want bug reports? These are not expected. The whole test suite passes for me on Windows where I have installed Biopython from source. So you installed Biopython using our Windows Installer - how did you get the unit tests? I'm pretty sure the SCOP failures are due to the files under Tests\SCOP having Unix line endings instead of Windows line endings (we've fixed some similar issues in the past). Note that both the source code archives as *.zip and *.tar.gz use Unix line endings internally, but if you used CVS it should have got them with Windows line endings for you. However, most of your test failures do seem to be related to C code in some way. I wonder if this is linked to the virtual environment? I should be able to try the Biopython 1.49 installer with Python 2.5 on a Windows machine myself to check that... The list of failures: > test_CAPS ... ERROR > test_Cluster ... FAIL > test_KDTree ... ERROR > test_PDB ... ERROR > test_Restriction ... ERROR > test_SCOP_Cla ... FAIL > test_SCOP_Raf ... FAIL > test_trie ... ERROR > test_triefind ... ERROR And some comments on the messages: > ERROR: test_CAPS > ... > from Bio.Restriction.DNAUtils import complement > ImportError: No module named DNAUtils Strange.
Note Bio.Restriction.DNAUtils is a C module. > ERROR: test_KDTree > ... > from Bio.KDTree import _CKDTree > ImportError: cannot import name _CKDTree Again, Bio.KDTree. _CKDTree is a C module > ERROR: test_PDB > ... > from Bio.KDTree import _CKDTree > ImportError: cannot import name _CKDTree Same failure as test_KDTree > ERROR: test_Restriction > ... > from Bio.Restriction import RanaConfig as RanaConf > ImportError: cannot import name RanaConfig Odd. RanaConfig is a pure python module, and pretty short too. > ERROR: test_trie > ... > from Bio import trie > ImportError: cannot import name trie Bio.trie is another C module > ERROR: test_triefind > ... > from Bio import trie > ImportError: cannot import name trie Same error as test_trie above. > FAIL: test_Cluster > ... > Output : 'test_clusterdistance (test_Cluster.TestCluster) ... ERROR\n' > Expected: 'test_clusterdistance (test_Cluster.TestCluster) ... ok\n' Could you run this test directly (python test_Cluster.py) which should give a more helpful message. But again, this module does include some C code.... > FAIL: test_SCOP_Cla > ... > Output : 'testIndex (test_SCOP_Cla.ClaTests) ... ERROR\n' > Expected: 'testIndex (test_SCOP_Cla.ClaTests) ... ok\n' I think this is just a new line issue. > FAIL: test_SCOP_Raf > ... > Output : 'testSeqMapIndex (test_SCOP_Raf.RafTests) ... ERROR\n' > Expected: 'testSeqMapIndex (test_SCOP_Raf.RafTests) ... ok\n' I think this is just a new line issue. Peter From bsouthey at gmail.com Wed Jan 14 17:48:27 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Wed, 14 Jan 2009 16:48:27 -0600 Subject: [Biopython-dev] Running Biopython tests on windows xp In-Reply-To: <320fb6e00901141343r529df66did6e172343592843d@mail.gmail.com> References: <496E49FD.4080305@gmail.com> <320fb6e00901141343r529df66did6e172343592843d@mail.gmail.com> Message-ID: <496E6BBB.2020506@gmail.com> Peter wrote: > These are not expected. 
The whole test suite passes for me on Windows > where I have installed Biopython from source. > > So you installed Biopython using our Window Installer - how did you > get the unit tests? I'm pretty sure the SCOP failures are due to the > files under Tests\SCOP having Unix line endings instead of Windows > line endings (we're fixed some similar issues in the past). Note that > both the source code archives as *.zip and *.tar.gz use Unix line > endings internally, but if you used CVS it should have got them with > Windows line endings for you. > > However, most of your test failures do seem to be related to C code in > some way. I wonder if this is linked to the virtual environment? I > should be able to try the Biopython 1.49 installer with Python 2.5 on > a Windows machine myself to check that... > > The list of failures: > >> test_CAPS ... ERROR >> test_Cluster ... FAIL >> test_KDTree ... ERROR >> test_PDB ... ERROR >> test_Restriction ... ERROR >> test_trie ... ERROR >> test_triefind ... ERROR >> Using IDLE, 'from Bio.Restriction import *' works correctly. These ones are failures to find the correct biopython installation. Both 'python setup.py test' and 'python run_tests.py' are assuming that I have built from source and everything is in the local directory. But that assumption is wrong since I used the Biopython binary installer so technically the tests I run are invalid. The difference for these failures can be seen here: C:\Documents and Settings\virtualme\Desktop\Python_packages\biopython-1.49\biopy thon-1.49\Tests>c:\Python25\python.exe test_KDTree.py Passed. Passed. Passed. Passed. Passed. Passed. Passed. Passed. Passed. Passed. Passed. Passed. Passed. Passed. Passed. Passed. Passed. Passed. Passed. Passed. C:\Documents and Settings\virtualme\Desktop\Python_packages\biopython-1.49\biopy thon-1.49\Tests>c:\Python25\python.exe run_tests.py test_KDTree.py test_KDTree ... 
ERROR ====================================================================== ERROR: test_KDTree ---------------------------------------------------------------------- Traceback (most recent call last): File "run_tests.py", line 125, in runTest self.runSafeTest() File "run_tests.py", line 138, in runSafeTest cur_test = __import__(self.test_name) File "test_KDTree.py", line 10, in from Bio.KDTree.KDTree import _neighbor_test, _test File "C:\Documents and Settings\virtualme\Desktop\biopython-1.49\biopython-1.4 9\Bio\KDTree\__init__.py", line 10, in File "C:\Documents and Settings\virtualme\Desktop\biopython-1.49\biopython-1.4 9\Bio\KDTree\KDTree.py", line 20, in ImportError: cannot import name _CKDTree ---------------------------------------------------------------------- Ran 1 test in 0.100s FAILED (errors=1) For the SCOP tests, this is as you say, a 'end of line' issue between windows and Linux. I opened 'and dir.cla.scop.txt_test' with wordpad and saved it with a new name. The line from testIndex in test_SCOP_Cla.py that gave the error index['d4hbia_'] works with the new file but not the old file. I also installed reportlab and biosql and these pass the tests (except for the mysql warning with Biosql that Peter reported). Regards Bruce From biopython at maubp.freeserve.co.uk Wed Jan 14 18:27:27 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 14 Jan 2009 23:27:27 +0000 Subject: [Biopython-dev] Running Biopython tests on windows xp In-Reply-To: <496E6BBB.2020506@gmail.com> References: <496E49FD.4080305@gmail.com> <320fb6e00901141343r529df66did6e172343592843d@mail.gmail.com> <496E6BBB.2020506@gmail.com> Message-ID: <320fb6e00901141527t1d2be466uf0b87f12b0d51d3a@mail.gmail.com> On Wed, Jan 14, 2009 at 10:48 PM, Bruce Southey wrote: > Using IDLE, 'from Bio.Restriction import *' works correctly. > > These ones are failures to find the correct biopython installation. 
Both > 'python setup.py test' and 'python run_tests.py' are assuming that I have > built from source and everything is in the local directory. But that > assumption is wrong since I used the Biopython binary installer so > technically the tests I run are invalid. I think I understand what's going on now. All these failures are essentially due to the unusual and unexpected setup on your machine (or for the SCOP tests, the line endings). You still didn't explain how/where you installed the test scripts etc, but what I think is happening is the following: Your official installation (including the compiled C code) created using the Windows Installer is in one place, say under C:\XXX\site-packages for the sake of discussion. You've unpacked the source code in another location, and are trying to run the test suite there. This set of files will NOT have the compiled C code - and thus running some of the tests via run_tests.py will fail. If you run individual test_XXX.py files this should use the system installed files under C:\XXX\site-packages and so the test should work. It would be a bit of a hack, but you can probably overcome this by manually copying the installed compiled modules from C:\XXX\site-packages into the unpacked source code (under a suitably named build sub directory), or moving the Test suite next to the installed code. Alternatively, you could try editing run_tests.py to comment out the path "magic" so that it just uses the system installation of Biopython (rather than trying to use the local copy it expects you to have just built from source), i.e. try commenting out these two lines in run_tests.py found near the start of the main function:
sys.path.insert(1, source_path)
sys.path.insert(1, build_path)
However, I'm no longer surprised that the C code tests are failing, and don't think this is a bug per se. > For the SCOP tests, this is as you say, an 'end of line' issue between > windows and Linux.
I opened 'and dir.cla.scop.txt_test' with wordpad and
> saved it with a new name. The line from testIndex in test_SCOP_Cla.py that
> gave the error index['d4hbia_'] works with the new file but not the old
> file.

Good to confirm that. If you spot an easy cross platform fix so that
the SCOP code can cope with either line ending that would be good, but
I didn't consider this worth spending much time on.

> I also installed reportlab and biosql and these pass the tests (except for
> the mysql warning with Biosql that Peter reported).

Good. Out of interest, which BioSQL warning are you talking about?

Peter

From bsouthey at gmail.com Wed Jan 14 22:10:30 2009
From: bsouthey at gmail.com (Bruce Southey)
Date: Wed, 14 Jan 2009 21:10:30 -0600
Subject: [Biopython-dev] Running Biopython tests on windows xp
In-Reply-To: <320fb6e00901141527t1d2be466uf0b87f12b0d51d3a@mail.gmail.com>
References: <496E49FD.4080305@gmail.com>
	<320fb6e00901141343r529df66did6e172343592843d@mail.gmail.com>
	<496E6BBB.2020506@gmail.com>
	<320fb6e00901141527t1d2be466uf0b87f12b0d51d3a@mail.gmail.com>
Message-ID:

On Wed, Jan 14, 2009 at 5:27 PM, Peter wrote:
> On Wed, Jan 14, 2009 at 10:48 PM, Bruce Southey wrote:
>> Using IDLE, 'from Bio.Restriction import *' works correctly.
>>
>> These ones are failures to find the correct biopython installation. Both
>> 'python setup.py test' and 'python run_tests.py' are assuming that I have
>> built from source and everything is in the local directory. But that
>> assumption is wrong since I used the Biopython binary installer so
>> technically the tests I run are invalid.
>
> I think I understand what's going on now. All these failures are
> essentially due to the unusual and unexpected setup on your machine
> (or for the SCOP tests, the line endings).

I do not see it as unusual as it does follow the instructions. But
these clearly need some enhancement to address perhaps a variation of
one of the options below.
I am now curious about what happens under Linux distros because these
may have the same issue.

> You still didn't explain
> how/where you installed the test scripts etc, but what I think is
> happening is the following:
>
> Your official installation (including the compiled C code) created
> using the Windows Installer is in one place, say under
> C:\XXX\site-packages for the sake of discussion.
>
> You've unpacked the source code in another location, and are trying to
> run the test suite there. This set of files will NOT have the
> compiled C code - and thus running some of the tests via run_tests.py
> will fail. If you run individual test_XXX.py files this should use
> the system installed files under C:\XXX\site-packages and so the test
> should work.

Correct!

The installation documentation is lacking at least for the binary
installer. Depending on what happens, I will write down this
information.

Would it be a hassle to include the tests with the binary installer?
At least of the tests should work if they are run from that directory.

>
> It would be a bit of a hack, but you can probably overcome this by
> manually copying the installed compiled modules from
> C:\XXX\site-packages into the unpacked source code (under a suitably
> named build sub directory), or moving the Test suite next to the
> installed code.

While this would work for the binary installer, I do not think it is
a suitable solution for building it from source - especially if someone
has the binary installer and is building but not necessarily installing
from source.

>
> Alternatively, you could try editing run_tests.py to comment out the
> path "magic" so that it just uses the system installation of Biopython
> (rather than trying to use the local copy it expects you to have just
> built from source), i.e.
try commenting out these two lines in
> run_tests.py found near the start of the main function:
>
> sys.path.insert(1, source_path)
> sys.path.insert(1, build_path)

I think the best solution is to fix this part because these assume the
location of the source and build directories even if these are not
really present. I would suggest we add a new commandline option that
causes the source_path and/or build_path variables to be undefined
forcing Python to use the installed versions. Passing a user-specified
path is also an option but these can get long.

> However, I'm no longer surprised that the C code tests are failing,
> and don't think this is a bug per se.

Agreed - just a case that has not been addressed yet.

>
>> For the SCOP tests, this is as you say, an 'end of line' issue between
>> windows and Linux. I opened 'and dir.cla.scop.txt_test' with wordpad and
>> saved it with a new name. The line from testIndex in test_SCOP_Cla.py that
>> gave the error index['d4hbia_'] works with the new file but not the old
>> file.
>
> Good to confirm that. If you spot an easy cross platform fix so that
> the SCOP code can cope with either line ending that would be good, but
> I didn't consider this worth spending much time on.

When I get to my system, I will see if my Linux system will accept the
file correctly because the other SCOP tests did work. If I get time I
will try to look at that as I looked at the function and I think it is
just the way the file is being used.

>
>> I also installed reportlab and biosql and these pass the tests (except for
>> the mysql warning with Biosql that Peter reported).
>
> Good. Out of interest, which BioSQL warning are you talking about?
>
> Peter

Sorry, I do not have that handy but it is a deprecation one for a
setting that will be gone in MySQL 5.2.
Bruce

From biopython at maubp.freeserve.co.uk Thu Jan 15 07:46:21 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 15 Jan 2009 12:46:21 +0000
Subject: [Biopython-dev] Running Biopython tests on windows xp
In-Reply-To:
References: <496E49FD.4080305@gmail.com>
	<320fb6e00901141343r529df66did6e172343592843d@mail.gmail.com>
	<496E6BBB.2020506@gmail.com>
	<320fb6e00901141527t1d2be466uf0b87f12b0d51d3a@mail.gmail.com>
Message-ID: <320fb6e00901150446j57748cf0mb493601444a9422d@mail.gmail.com>

>>
>> I think I understand what's going on now. All these failures are
>> essentially due to the unusual and unexpected setup on your machine
>> (or for the SCOP tests, the line endings).
>
> I do not see it as unusual as it does follow the instructions. But
> these clearly need some enhancement to address perhaps a variation of
> one of the options below.

There are no instructions on how to install Biopython on Windows using
the provided installer and then run the unit tests - so I don't
understand what you mean by you followed the instructions. If the
installer came with the unit tests then this would be sensible. Right
now the only documented way to run the unit tests is part of an
installation from source.

>> You've unpacked the source code in another location, and are trying to
>> run the test suite there. This set of files will NOT have the
>> compiled C code - and thus running some of the tests via run_tests.py
>> will fail. If you run individual test_XXX.py files this should use
>> the system installed files under C:\XXX\site-packages and so the test
>> should work.
>
> Correct!
>
> The installation documentation is lacking at least for the binary
> installer. Depending on what happens, I will write down this
> information.
>
> Would it be a hassle to include the tests with the binary installer?

I don't know enough about distutils to answer that. So the short
answer is yes, it might be a hassle.

> At least of the tests should work if they are run from that directory.

Which directory?

>> It would be a bit of a hack, but you can probably overcome this by
>> manually copying the installed compiled modules from
>> C:\XXX\site-packages into the unpacked source code (under a suitably
>> named build sub directory), or moving the Test suite next to the
>> installed code.
>
> While this would work for the binary installer, I do not think it is
> a suitable solution for building it from source - especially if someone
> has the binary installer and is building but not necessarily installing
> from source.

The hack suggested was specifically for combining the installed files
from the Windows installer with the test suite by hand - you don't
need to do anything special if you are building from source. The
current run_tests.py should work perfectly for anyone building from
source (on Windows, Linux and Mac). You can (and ideally should)
build biopython, and then run the tests BEFORE installing it.

>> Alternatively, you could try editing run_tests.py to comment out the
>> path "magic" so that it just uses the system installation of Biopython
>> (rather than trying to use the local copy it expects you to have just
>> built from source), i.e. try commenting out these two lines in
>> run_tests.py found near the start of the main function:
>>
>> sys.path.insert(1, source_path)
>> sys.path.insert(1, build_path)
>
> I think the best solution is to fix this part because these assume the
> location of the source and build directories even if these are not
> really present.

If you are building from source this is a safe assumption (and in fact
the code does check they exist). We WANT to run the tests using the
just built and not yet installed files!

> I would suggest we add a new commandline option that
> causes the source_path and/or build_path variables to be undefined
> forcing Python to use the installed versions.
Passing a user-specified
> path is also an option but these can get long.

Yes, an option to run_tests.py to use the system installed version of
Biopython could solve this particular situation. Alternatively, and
perhaps more simply for the end user, we could add a prompt if there
is no build directory to ask the user if they want to run the tests
using an already installed version of Biopython. I might have time to
come up with a patch for this...

>> However, I'm no longer surprised that the C code tests are failing,
>> and don't think this is a bug per se.
>
> Agreed - just a case that has not been addressed yet.

----------------------------------------------------------------------------------------------

>>> I also installed reportlab and biosql and these pass the tests (except for
>>> the mysql warning with Biosql that Peter reported).
>>
>> Good. Out of interest, which BioSQL warning are you talking about?
>>
>> Peter
>
> Sorry, I do not have that handy but it is a deprecation one for a
> setting that will be gone in MySQL 5.2.

You might be referring to BioSQL Bug 2568,
http://bugzilla.open-bio.org/show_bug.cgi?id=2568

Peter

From bugzilla-daemon at portal.open-bio.org Thu Jan 15 09:37:57 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 15 Jan 2009 09:37:57 -0500
Subject: [Biopython-dev] [Bug 2733] New: Unit tests incorrectly assume that Biopthyon was built from source
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2733

           Summary: Unit tests incorrectly assume that Biopthyon was built
                    from source
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: All
            Status: NEW
          Severity: minor
          Priority: P4
         Component: Unit Tests
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: bsouthey at gmail.com

If Biopython is not built from source and the tests are run from a different
place than the installation, the tests that use C objects fail because these
are not found (an example is below).
Currently the test environment uses the Biopython in the build directory.
It would be nice to be able to optionally specify some other Biopython such
as the installed version using say a command line argument.

Example of a failure:
======================================================================
ERROR: test_KDTree
----------------------------------------------------------------------
Traceback (most recent call last):
  File "run_tests.py.orig", line 125, in runTest
    self.runSafeTest()
  File "run_tests.py.orig", line 138, in runSafeTest
    cur_test = __import__(self.test_name)
  File "test_KDTree.py", line 10, in <module>
    from Bio.KDTree.KDTree import _neighbor_test, _test
  File "/home/bsouthey/python/biopython_cvs/biopython/Bio/KDTree/__init__.py", line 10, in <module>
    from KDTree import KDTree
  File "/home/bsouthey/python/biopython_cvs/biopython/Bio/KDTree/KDTree.py", line 20, in <module>
    from Bio.KDTree import _CKDTree
ImportError: cannot import name _CKDTree
======================================================================

From bugzilla-daemon at portal.open-bio.org Thu Jan 15 09:44:15 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 15 Jan 2009 09:44:15 -0500
Subject: [Biopython-dev] [Bug 2733] Unit tests incorrectly assume that Biopthyon was built from source
In-Reply-To:
Message-ID: <200901151444.n0FEiFd8020991@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2733

------- Comment #1 from bsouthey at gmail.com 2009-01-15 09:44 EST -------
Created an attachment (id=1197)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1197&action=view)
Patch to avoid adding source path if Biopython is not built from source

This is a simple patch that just moves the inclusion of the source path to
being conditional on the presence of the build directory. That is, if a build
directory exists, then we assume that Biopython was built from the source. But
if the build directory does not exist then the source path is not added and the
test environment will use the installed Biopython and not the source directory.

This patch works on a Linux system with the build directory removed and a
Windows XP system using the binary Biopython installer.
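
The attachment itself is not reproduced in this archive, so the following is only a minimal sketch of the idea described in comment #1, written for a current Python. The function name add_local_paths is hypothetical; source_path and build_path are the variable names quoted earlier in the thread from run_tests.py. The two inserts are kept in the same order as the quoted lines, so the build directory ends up ahead of the source directory.

```python
import os
import sys


def add_local_paths(source_path, build_path, path_list=None):
    """Prepend the local source/build directories only if a build exists.

    Hypothetical helper, not the code from attachment 1197.
    Returns True when the local copy will be used, False when Python is
    left to find the installed Biopython on its normal search path.
    """
    if path_list is None:
        path_list = sys.path
    if not os.path.isdir(build_path):
        # No build directory: assume Biopython was not built from source
        # and leave the search path untouched.
        return False
    # Same order as the two lines quoted from run_tests.py.
    path_list.insert(1, source_path)
    path_list.insert(1, build_path)
    return True
```

Passing an explicit list instead of relying on sys.path makes the behaviour easy to check without disturbing the running interpreter.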

From bugzilla-daemon at portal.open-bio.org Thu Jan 15 10:20:58 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 15 Jan 2009 10:20:58 -0500
Subject: [Biopython-dev] [Bug 2733] Unit tests incorrectly assume that Biopthyon was built from source
In-Reply-To:
Message-ID: <200901151520.n0FFKwqZ024124@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2733

------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-15 10:20 EST -------
Created an attachment (id=1198)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1198&action=view)
Patch to Tests/run_tests.py

Bruce,

Could you try out this alternative patch which tries to tell the user what is
happening in this atypical situation.

Peter

From bugzilla-daemon at portal.open-bio.org Thu Jan 15 10:26:13 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 15 Jan 2009 10:26:13 -0500
Subject: [Biopython-dev] [Bug 2733] Runing unit tests where Biopthyon wasn't built from source
In-Reply-To:
Message-ID: <200901151526.n0FFQD5F024483@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2733

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|minor                       |enhancement
            Summary|Unit tests incorrectly      |Runing unit tests where
                   |assume that Biopthyon was   |Biopthyon wasn't built from
                   |built from source           |source

------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-15 10:26 EST -------
Retitling bug and marking it as an enhancement.
The main use case for this is Windows users who installed Biopython from one
of our Windows Installers (pre-compiled, does not include the unit tests), and
later download and unzip the source code archive in order to run the unit
tests.

As Bruce points out, this might also apply to Linux users who install a
Biopython package (pre-compiled, and presumably not including the unit tests),
and then want to run the unit tests without themselves compiling Biopython.

From bugzilla-daemon at portal.open-bio.org Thu Jan 15 10:41:34 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 15 Jan 2009 10:41:34 -0500
Subject: [Biopython-dev] [Bug 2733] Runing unit tests where Biopthyon wasn't built from source
In-Reply-To:
Message-ID: <200901151541.n0FFfYgG025830@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2733

------- Comment #4 from dalloliogm at gmail.com 2009-01-15 10:41 EST -------
(In reply to comment #0)

What about re-organizing the tests in three categories:
- the ones needed to make sure the modules don't contain errors
- the ones needed to make sure that biopython can run correctly
  in the user's environment
- the ones needed to make sure that the C modules are compiled correctly.

Usually, people don't need to repeat the tests from case 1, but only
case 2, and case 3 if they have compiled biopython themselves.

From bugzilla-daemon at portal.open-bio.org Thu Jan 15 11:09:34 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 15 Jan 2009 11:09:34 -0500
Subject: [Biopython-dev] [Bug 2733] Runing unit tests where Biopthyon wasn't built from source
In-Reply-To:
Message-ID: <200901151609.n0FG9Y5V028318@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2733

------- Comment #5 from bsouthey at gmail.com 2009-01-15 11:09 EST -------
(In reply to comment #2)
> Created an attachment (id=1198)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1198&action=view) [details]
> Patch to Tests/run_tests.py
>
> Bruce,
>
> Could you try out this alternative patch which tries to tell the user what is
> happening in this atypical situation.
>
> Peter
>

A very quick check: it works on my Linux system, where I removed the build
directory but have Biopython installed. I will let you know about Windows, and
about the case where Biopython is not installed. But I do not foresee any
problems with the patch.

Bruce

From bugzilla-daemon at portal.open-bio.org Thu Jan 15 12:18:31 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 15 Jan 2009 12:18:31 -0500
Subject: [Biopython-dev] [Bug 2733] Runing unit tests where Biopthyon wasn't built from source
In-Reply-To:
Message-ID: <200901151718.n0FHIVSm001687@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2733

------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-15 12:18 EST -------
(In reply to comment #4)
> (In reply to comment #0)
>
> What about re-organizing the tests in three categories:
> - the ones needed to make sure the modules don't contain errors
> - the ones needed to make sure that biopython can run correctly
>   in the user's environment
> - the ones needed to make sure that the C modules are compiled correctly.
>
> Usually, people don't need to repeat the tests from case 1, but only
> case 2, and case 3 if they have compiled biopython themselves.

Case 1 applies to all the unit tests.
Case 2 applies to all the unit tests whose dependencies are present.
Case 3 applies to those modules with C code.

I don't really understand your divisions.

If I was compiling Biopython myself, I'd want all the tests run.

If I installed a pre-compiled version of Biopython (from a Linux distribution
or the Windows installers), I'd still want to try and run all the tests.

There is the special case of trying to use Biopython without the C code
modules (e.g. installing from source without a C compiler, or for repackaging
a subset of the modules), but that is atypical.

From bugzilla-daemon at portal.open-bio.org Thu Jan 15 15:31:21 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 15 Jan 2009 15:31:21 -0500
Subject: [Biopython-dev] [Bug 2733] Runing unit tests where Biopthyon wasn't built from source
In-Reply-To:
Message-ID: <200901152031.n0FKVLDp015913@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2733

------- Comment #7 from bsouthey at gmail.com 2009-01-15 15:31 EST -------
(In reply to comment #5)
> (In reply to comment #2)

Just to confirm that it works as expected with windows xp

1) Without Biopython installed
C:\Documents and Settings\virtualme\Desktop\Python_packages\biopython-1.49\biopython-1.49\Tests>c:\Python25\python.exe run_tests3.py test_trie.py
You do not seem to have built Biopython from source.
You do not seem to have installed Biopython.

2) With Biopython installed:
C:\Documents and Settings\virtualme\Desktop\Python_packages\biopython-1.49\biopython-1.49\Tests>c:\Python25\python.exe run_tests3.py test_trie.py
You do not seem to have built Biopython from source.
Unit tests will be run using the installed Biopython.
test_trie ... ok
----------------------------------------------------------------------
Ran 1 test in 0.731s

OK
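
The two runs in comment #7 suggest a three-way decision: use the local build if there is one, otherwise fall back to an installed Biopython, otherwise stop. The sketch below is one way that logic might look in a current Python. It is not the code from attachment 1198, which is not shown in this archive. The function name choose_biopython and the mode strings are hypothetical; the message texts are copied from the console output above. Python 2.5, as used in the thread, has no importlib.util, so the real patch would have had to detect the installed package some other way.

```python
import importlib.util
import os


def choose_biopython(build_path, package="Bio"):
    """Decide which Biopython the tests would run against.

    Hypothetical helper. Returns (mode, messages) where mode is one of
    "build", "installed" or "none".
    """
    if os.path.isdir(build_path):
        # A build directory exists: use the freshly built local copy.
        return "build", []
    messages = ["You do not seem to have built Biopython from source."]
    # find_spec returns None when a top-level package cannot be found.
    if importlib.util.find_spec(package) is None:
        messages.append("You do not seem to have installed Biopython.")
        return "none", messages
    messages.append("Unit tests will be run using the installed Biopython.")
    return "installed", messages
```

The package argument exists only so the lookup can be exercised without Biopython present; a caller would print the messages and exit early when the mode is "none".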

From bugzilla-daemon at portal.open-bio.org Thu Jan 15 18:55:14 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 15 Jan 2009 18:55:14 -0500
Subject: [Biopython-dev] [Bug 2734] New: db.load problem with postgresql and psycopg2
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2734

           Summary: db.load problem with postgresql and psycopg2
           Product: Biopython
           Version: 1.49
          Platform: Macintosh
        OS/Version: Mac OS
            Status: NEW
          Severity: normal
          Priority: P2
         Component: BioSQL
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: stephen at blackrim.net

I have a simple script to load sequences into a postgresql database using the
biosql schema and biopython db.load function.

here is the script :
from Bio import GenBank
from Bio import SeqIO
from BioSQL import BioSeqDatabase
server = BioSeqDatabase.open_database(driver="psycopg2", user=...)
db = server["plants"]
for i in range(37):
    handle = open("PLN/gbpln"+str(i+1)+".seq", "r")
    db.load(SeqIO.parse(handle,"genbank"))
    handle.close()
    print str(i+1)
server.adaptor.commit()

there is an error with the output and here it is with some of the psycopg2
debug info:

asis_dealloc: deleted asis object at 0x52350, refcnt = 0
psyco_curs_execute: cvt->refcnt = 1
curs_execute: pg connection at 0x8d0c00 OK
pq_begin: pgconn = 0x8d0c00, isolevel = 1, status = 2
pq_begin: transaction in progress
pq_execute: executing SYNC query:
    SELECT seqfeature_id, dbxref_id FROM seqfeature_dbxref WHERE seqfeature_id = "3" AND dbxref_id = "6"
pq_execute: entering syncronous DBAPI compatibility mode
pq_fetch: pgstatus = PGRES_FATAL_ERROR
pq_fetch: uh-oh, something FAILED
pq_fetch: fetching done; check for critical errors
psyco_curs_execute: res = -1, pgres = 0x0
Traceback (most recent call last):
  File "add_seqs_subdb2 2.py", line 9, in <module>
    db.load(SeqIO.parse(handle,"genbank"))
  File "/Library/Python/2.5/site-packages/BioSQL/BioSeqDatabase.py", line 420, in load
    db_loader.load_seqrecord(cur_record)
  File "/Library/Python/2.5/site-packages/BioSQL/Loader.py", line 50, in load_seqrecord
    self._load_seqfeature(seq_feature, seq_feature_num, bioentry_id)
  File "/Library/Python/2.5/site-packages/BioSQL/Loader.py", line 508, in _load_seqfeature
    self._load_seqfeature_qualifiers(feature.qualifiers, seqfeature_id)
  File "/Library/Python/2.5/site-packages/BioSQL/Loader.py", line 607, in _load_seqfeature_qualifiers
    seqfeature_id)
  File "/Library/Python/2.5/site-packages/BioSQL/Loader.py", line 645, in _load_seqfeature_dbxref
    self._get_seqfeature_dbxref(seqfeature_id, dbxref_id, rank+1)
  File "/Library/Python/2.5/site-packages/BioSQL/Loader.py", line 679, in _get_seqfeature_dbxref
    dbxref_id))
  File "/Library/Python/2.5/site-packages/BioSQL/BioSeqDatabase.py", line 295, in execute_and_fetch_col0
    self.cursor.execute(sql, args or ())
psycopg2.ProgrammingError: column "3" does not exist
LINE 1: ...f_id FROM seqfeature_dbxref WHERE seqfeature_id = "3" AND db...

it seems like there could be some issues with the double quotes but i am not
sure where that is being called. i am using postgresql 8.2.

From bugzilla-daemon at portal.open-bio.org Fri Jan 16 05:24:16 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 16 Jan 2009 05:24:16 -0500
Subject: [Biopython-dev] [Bug 2734] db.load problem with postgresql and psycopg2
In-Reply-To:
Message-ID: <200901161024.n0GAOGFA015422@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2734

------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-16 05:24 EST -------
Hi Stephen,

Does this happen for all the files you've tried, or just one or two? If it's
the latter it may be something funny about the file and how it's been parsed.
I'm guessing you downloaded the GenBank files from
ftp://ftp.ncbi.nih.gov/genbank/ so could you tell us one which is failing.

Have you tried running the Biopython unit tests - in particular the two for
BioSQL? I presume you installed Biopython from source on your Mac, so you
should have all the files present. You'll need to edit the file
Tests/setup_BioSQL.py to point to a suitable postgresql test database.

P.S. As you are using Bio.SeqIO to parse the GenBank file, you don't need to
import Bio.GenBank (first line of code snippet).

From bugzilla-daemon at portal.open-bio.org Fri Jan 16 14:12:28 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 16 Jan 2009 14:12:28 -0500
Subject: [Biopython-dev] [Bug 2734] db.load problem with postgresql and psycopg2
In-Reply-To:
Message-ID: <200901161912.n0GJCSWO030831@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2734

------- Comment #2 from stephen at blackrim.net 2009-01-16 14:12 EST -------
Hi Peter,
Thanks for the quick reply. I will try to answer everything here. So I just
reran the BioSQL tests and I get
test_BioSQL ... ok
test_BioSQL_SeqIO ... ok

so seems like everything there is fine (and I did configure the test for
postgres with the psycopg2 driver). I am downloading from the NCBI ftp and it
happens not only with all the files but also with the example on the biopython
biosql wiki page. Specifically with this example:
from Bio import Entrez
from Bio import SeqIO
from BioSQL import BioSeqDatabase
server = BioSeqDatabase.open_database(driver="psycopg2", ...)
db = server["plants"]
handle = Entrez.efetch(db="nuccore", id="6273291,6273290,6273289",
rettype="genbank")
db.load(SeqIO.parse(handle, "genbank"))
server.adaptor.commit()

I get the same error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Python/2.5/site-packages/BioSQL/BioSeqDatabase.py", line 420, in load
    db_loader.load_seqrecord(cur_record)
  File "/Library/Python/2.5/site-packages/BioSQL/Loader.py", line 50, in load_seqrecord
    self._load_seqfeature(seq_feature, seq_feature_num, bioentry_id)
  File "/Library/Python/2.5/site-packages/BioSQL/Loader.py", line 508, in _load_seqfeature
    self._load_seqfeature_qualifiers(feature.qualifiers, seqfeature_id)
  File "/Library/Python/2.5/site-packages/BioSQL/Loader.py", line 607, in _load_seqfeature_qualifiers
    seqfeature_id)
  File "/Library/Python/2.5/site-packages/BioSQL/Loader.py", line 645, in _load_seqfeature_dbxref
    self._get_seqfeature_dbxref(seqfeature_id, dbxref_id, rank+1)
  File "/Library/Python/2.5/site-packages/BioSQL/Loader.py", line 679, in _get_seqfeature_dbxref
    dbxref_id))
  File "/Library/Python/2.5/site-packages/BioSQL/BioSeqDatabase.py", line 295, in execute_and_fetch_col0
    self.cursor.execute(sql, args or ())
psycopg2.ProgrammingError: column "3" does not exist
LINE 1: ...f_id FROM seqfeature_dbxref WHERE seqfeature_id = "3" AND db...

Thanks for any help.
Stephen

(In reply to comment #1)
> Hi Stephen,
>
> Does this happen for all the files you've tried, or just one or two? If it's
> the latter it may be something funny about the file and how it's been parsed.
> I'm guessing you downloaded the GenBank files from
> ftp://ftp.ncbi.nih.gov/genbank/ so could you tell us one which is failing.
>
> Have you tried running the Biopython unit tests - in particular the two for
> BioSQL? I presume you installed Biopython from source on your Mac, so you
> should have all the files present. You'll need to edit the file
> Tests/setup_BioSQL.py to point to a suitable postgresql test database.
>
> P.S.
As you are using Bio.SeqIO to parse the GenBank file, you don't need to
> import Bio.GenBank (first line of code snippet).
>

From bugzilla-daemon at portal.open-bio.org Sat Jan 17 05:09:21 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 17 Jan 2009 05:09:21 -0500
Subject: [Biopython-dev] [Bug 2734] db.load problem with postgresql and psycopg2
In-Reply-To:
Message-ID: <200901171009.n0HA9Lk3027163@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2734

------- Comment #3 from cymon.cox at gmail.com 2009-01-17 05:09 EST -------
Hi Stephen,

2009/1/16 :
> http://bugzilla.open-bio.org/show_bug.cgi?id=2734
>
> ------- Comment #2 from stephen at blackrim.net 2009-01-16 14:12 EST -------
> Hi Peter,
> Thanks for the quick reply. I will try to answer everything here. So I just
> reran the BioSQL tests and I get
> test_BioSQL ... ok
> test_BioSQL_SeqIO ... ok
>
> so seems like everything there is fine (and I did configure the test for
> postgres with the psycopg2 driver). I am downloading from the NCBI ftp and it
> happens not only with all the files but also with the example on the biopython
> biosql wiki page. Specifically with this example:
> from Bio import Entrez
> from Bio import SeqIO
> from BioSQL import BioSeqDatabase
> server = BioSeqDatabase.open_database(driver="psycopg2", ...)
> db = server["plants"]
> handle = Entrez.efetch(db="nuccore", id="6273291,6273290,6273289",
> rettype="genbank")
> db.load(SeqIO.parse(handle, "genbank"))
> server.adaptor.commit()

This code works for me:
[cymon at chara ~]$ python
Python 2.5.2 (r252:60911, Jul 24 2008, 17:11:36)
[GCC 4.1.2 20071124 (Red Hat 4.1.2-42)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from Bio import Entrez
>>> from Bio import SeqIO
>>> from BioSQL import BioSeqDatabase
>>> server = BioSeqDatabase.open_database(driver="psycopg2", db = "biosql_test")
>>> db = server.new_database("blah", description="Just for testing")
>>> handle = Entrez.efetch(db="nuccore", id="6273291,6273290,6273289", rettype="genbank")
>>> server.adaptor.commit()
>>>

What versions of biopython and the BioSQL schema are you using?

Cymon

From bugzilla-daemon at portal.open-bio.org Sat Jan 17 05:50:19 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 17 Jan 2009 05:50:19 -0500
Subject: [Biopython-dev] [Bug 2734] db.load problem with postgresql and psycopg2
In-Reply-To:
Message-ID: <200901171050.n0HAoJZa029834@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2734

------- Comment #4 from cymon.cox at gmail.com 2009-01-17 05:50 EST -------
> This code works for me:
> [cymon at chara ~]$ python
> Python 2.5.2 (r252:60911, Jul 24 2008, 17:11:36)
> [GCC 4.1.2 20071124 (Red Hat 4.1.2-42)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> from Bio import Entrez
> >>> from Bio import SeqIO
> >>> from BioSQL import BioSeqDatabase
> >>> server = BioSeqDatabase.open_database(driver="psycopg2", db = "biosql_test")
> >>> db = server.new_database("blah", description="Just for testing")
> >>> handle = Entrez.efetch(db="nuccore", id="6273291,6273290,6273289", rettype="genbank")
> >>> server.adaptor.commit()
> >>>

Sorry forgot to load it! :)

>>> db.load(SeqIO.parse(handle, "genbank"))
3
>>> server.adaptor.commit()
>>>

C.
From bugzilla-daemon at portal.open-bio.org Wed Jan 21 13:22:47 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 21 Jan 2009 13:22:47 -0500 Subject: [Biopython-dev] [Bug 2738] New: Speed up GenBank parsing, in particular location parsing Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2738 Summary: Speed up GenBank parsing, in particular location parsing Product: Biopython Version: Not Applicable Platform: All OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk This is an enhancement "bug", for trying to improve the speed of parsing GenBank files WITHOUT any functionality changes. From previous profiling, I have found that the location parsing looks like an easy target. However, this code is non-trivial so we should proceed with caution. Possible patch to follow...
From bugzilla-daemon at portal.open-bio.org Wed Jan 21 13:30:27 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 21 Jan 2009 13:30:27 -0500 Subject: [Biopython-dev] [Bug 2738] Speed up GenBank parsing, in particular location parsing In-Reply-To: Message-ID: <200901211830.n0LIURFx009561@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2738 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-21 13:30 EST ------- Created an attachment (id=1206) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1206&action=view) Patch for Bio/GenBank/__init__.py to handle simple locations with re This patch handles the simple cases (non-fuzzy, no database references) using simple python and regular expressions. Everything else works by falling back on the old spark based Bio.GenBank.LocationParser code (e.g. fuzzy locations). The new code is pretty simple, and could potentially be extended to cover all the currently used location strings found in the feature table, allowing us to remove the use of Bio.GenBank.LocationParser, which in the long term could lead to an overall code simplification. In the short term, this patch does complicate the location parsing because it means there are effectively two ways we parse the location strings (my new code, and the old spark based Bio.GenBank.LocationParser code). However, from my limited testing using Python 2.5 on the Mac with GenBank files for large bacterial genomes, this may be a price worth paying. I'd like independent measurements (and to check this on other platforms), but this does seem to more than halve the time taken to parse GenBank files!
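The approach described in comment #1, a fast path for simple locations with a fallback to the existing parser, could be sketched roughly as follows. This is only an illustration and not the code in attachment 1206: the function name `parse_location`, the two patterns and the `(start, end, strand)` return value are assumptions made for the example, and `fallback` stands in for the spark based Bio.GenBank.LocationParser code. It is written for current Python rather than the Python 2.5 used in the thread.

```python
import re

# Hypothetical patterns covering only the "simple" cases:
# 123..456 and complement(123..456).
_SIMPLE = re.compile(r"^(\d+)\.\.(\d+)$")
_COMPLEMENT = re.compile(r"^complement\((\d+)\.\.(\d+)\)$")


def parse_location(text, fallback):
    """Return (start, end, strand); hand anything non-trivial to `fallback`."""
    match = _SIMPLE.match(text)
    if match:
        start, end = match.groups()
        # GenBank positions are one-based and inclusive, so the start is
        # shifted down by one to give Python-style counting (an assumption
        # of this sketch about the desired return value).
        return int(start) - 1, int(end), +1
    match = _COMPLEMENT.match(text)
    if match:
        start, end = match.groups()
        return int(start) - 1, int(end), -1
    # Fuzzy locations, joins, database references and so on take the slow path.
    return fallback(text)
```

Because every string the patterns do not match is passed through unchanged, the slow parser remains the single source of truth for the hard cases, which is what makes a before/after comparison of the output meaningful.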
From bugzilla-daemon at portal.open-bio.org Thu Jan 22 13:58:18 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 22 Jan 2009 13:58:18 -0500 Subject: [Biopython-dev] [Bug 2738] Speed up GenBank parsing, in particular location parsing In-Reply-To: Message-ID: <200901221858.n0MIwIpR000974@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2738 ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-22 13:58 EST ------- Created an attachment (id=1208) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1208&action=view) Simple test script for timing GenBank parsing I've attached a trivial script to time parsing all the GenBank files in a directory to help anyone wanting to benchmark this change. (In reply to comment #1) > However, from my limited testing using Python 2.5 on the Mac with GenBank > files for large bacterial genomes, this may be a price worth paying. I'd > like independent measurements (and to check this on other platforms), but > this does seem to more than halve the time taken to parse GenBank files! Further testing with Python 2.5 on Linux, this time also with some large eukaryotic files, appears to confirm a very large speed up (most obvious on feature rich GenBank files of course). I still want to check this on other versions of python...
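For anyone wanting to benchmark without the attachment, a timing loop along these lines might do. It is a sketch rather than the attached script (id=1208): `time_parsing` and its parameters are hypothetical names, and the parser is passed in as a callable so that the loop itself does not depend on any particular Biopython version. It targets current Python, not the Python 2.5 of the thread.

```python
import glob
import os
import time


def time_parsing(directory, parse_file, pattern="*.gbk"):
    """Call parse_file(path) on each matching file; return {filename: seconds}."""
    timings = {}
    for path in sorted(glob.glob(os.path.join(directory, pattern))):
        start = time.perf_counter()
        parse_file(path)
        timings[os.path.basename(path)] = time.perf_counter() - start
    return timings
```

With Biopython installed, `parse_file` could be something like `lambda path: sum(1 for record in SeqIO.parse(path, "genbank"))`, assuming a release in which `SeqIO.parse` accepts a filename. Running the same call before and after applying the patch gives the two sets of numbers to compare.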
From bugzilla-daemon at portal.open-bio.org Fri Jan 23 03:43:01 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 23 Jan 2009 03:43:01 -0500 Subject: [Biopython-dev] [Bug 2740] New: Wise test fails with wise 2.4.1 Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2740 Summary: Wise test fails with wise 2.4.1 Product: Biopython Version: 1.49 Platform: Other OS/Version: Linux Status: NEW Severity: normal Priority: P2 Component: Unit Tests AssignedTo: biopython-dev at biopython.org ReportedBy: charles-debian-nospam at plessy.org Dear Biopython developers, The test for wise fails with wise 2.4.1 and Biopython 1.49. I think one gap is missing in the reference used in the test script (probably because wise changed its gap opening penalties): anx159 Tests $ dnal Wise/human_114_g01_exons.fna_01 Wise/human_114_g02_exons.fna_01 Warning Error Strangely truncated line in fasta file Warning Error Strangely truncated line in fasta file DnaAlign Matrix calculation: [ 14000] Cells 95% Score 114 Warning Error Name ENSG00000172056|ENST00000321078|ENSE00001281503 is longer than allowed name block (12). Truncating Warning Error Name ENSG00000163182|ENST00000295339|ENSE00001130648 is longer than allowed name block (12).
Truncating ENSG00000172135 AGGGAAAGCCCCTAAGCTC--CTGATCTATGCTGCATCCAGTTTGCAAAG-TGGGGTCC A GGAA GCCCC AGCTC CT TCT CT C TCC TGC A TGG TCC ENSG000001631 ATGGAA-GCCCC--AGCTCAGCT--TCT---CTTCCTCC----TGCTACTCTGGC-TCC ENSG00000172191 CA CA ENSG0000016347 CA This is compared to a different reference result in the test script: anx159 Tests $ grep -A5 -B5 ENSG00000172135 test_Wise.py sys.stdout = self.old_stdout class TestWise(unittest.TestCase): def test_align(self): temp_file = Wise.align(["dnal"], ("Wise/human_114_g01_exons.fna_01", "Wise/human_114_g02_exons.fna_01"), kbyte=100000, force_type="DNA", quiet=True) self.assertEqual(temp_file.readline().rstrip(), "ENSG00000172135 AGGGAAAGCCCCTAAGCTC--CTGATCTATGCTGCATCCAGTTTGCAAAGTGGGGTCCC") def run_tests(argv): test_suite = testing_suite() runner = unittest.TextTestRunner(sys.stdout, verbosity = 2) runner.run(test_suite) Have a nice day, -- Charles Plessy Debian Med packaging team, http://www.debian.org/devel/debian-med Tsurumi, Kanagawa, Japan From bugzilla-daemon at portal.open-bio.org Fri Jan 23 07:06:29 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 23 Jan 2009 07:06:29 -0500 Subject: [Biopython-dev] [Bug 2740] Wise test fails with wise 2.4.1 In-Reply-To: Message-ID: <200901231206.n0NC6T4B023669@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2740 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-23 07:06 EST ------- Thanks for the report.
Based on the following pages I had assumed the latest version was wise 2.2.0: http://www.sanger.ac.uk/Software/Wise2/ points to ftp://ftp.ebi.ac.uk/pub/software/unix/wise2/ which only contains up to wise 2.2.0. After some Google searching I found Ewan Birney had changed his mind and started work on it again: http://www.ebi.ac.uk/~birney/wise2/ Installing wise 2.4.1 took a while (tip for Linux users: edit file src/models/phasemodel.c line 23 to replace isnumber by isdigit), but I can confirm the error you reported. This is the output from an older version of wise: $ ~/Downloads/wise2.2.0/src/bin/dnal Wise/human_114_g01_exons.fna_01 Wise/human_114_g02_exons.fna_01 DnaAlign Matrix calculation: [ 14000] Cells 97% Warning Error Name ENSG00000172056|ENST00000321078|ENSE00001281503 is longer than allowed name block (12). Truncating Warning Error Name ENSG00000163182|ENST00000295339|ENSE00001130648 is longer than allowed name block (12). Truncating ENSG00000172135 AGGGAAAGCCCCTAAGCTC--CTGATCTATGCTGCATCCAGTTTGCAAAGTGGGGTCCC A GGAA GCCCC AGCTC CT TCT CT C TCC TGC A GG TCCC ENSG000001631 ATGGAA-GCCCC--AGCTCAGCT--TCT---CTTCCTCC----TGCTACTCTGGCTCCC ENSG00000172192 A A ENSG0000016348 A Using the newer version of wise, we do indeed get a different alignment: $ ~/Downloads/wise2.4.1/src/bin/dnal Wise/human_114_g01_exons.fna_01 Wise/human_114_g02_exons.fna_01 DnaAlign Matrix calculation: [ 14000] Cells 97% Score 114 Warning Error Name ENSG00000172056|ENST00000321078|ENSE00001281503 is longer than allowed name block (12). Truncating Warning Error Name ENSG00000163182|ENST00000295339|ENSE00001130648 is longer than allowed name block (12).
Truncating ENSG00000172135 AGGGAAAGCCCCTAAGCTC--CTGATCTATGCTGCATCCAGTTTGCAAAG-TGGGGTCC A GGAA GCCCC AGCTC CT TCT CT C TCC TGC A TGG TCC ENSG000001631 ATGGAA-GCCCC--AGCTCAGCT--TCT---CTTCCTCC----TGCTACTCTGGC-TCC ENSG00000172191 CA CA ENSG0000016347 CA From bugzilla-daemon at portal.open-bio.org Fri Jan 23 07:28:05 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 23 Jan 2009 07:28:05 -0500 Subject: [Biopython-dev] [Bug 2740] Wise test fails with wise 2.4.1 In-Reply-To: Message-ID: <200901231228.n0NCS5a8028823@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2740 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-23 07:28 EST ------- This should be fixed in CVS, see: Tests/test_Wise.py revision 1.7 Tests/output/test_Wise revision 1.3 All I have done is made the unit test accept the old output, or the slightly different output from wise 2.4.1 - the main Biopython code is unchanged. From the help text (just run dnal with no arguments), it appears the gap penalties have not changed - so the differing alignments must be due to an algorithm change of some sort. Another small difference is with wise 2.4.1, even in quiet mode, dnal starts its output by printing the score. Thank you for reporting this, Peter
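A check that accepts either version's output, as comment #2 describes, might look something like the sketch below. It is not the code committed as Tests/test_Wise.py revision 1.7: the helper names are made up for the example, the two expected lines are the ones quoted in this thread, and whitespace is normalized on the assumption that the archive may have collapsed the original column spacing. Skipping a leading line that starts with "Score" is likewise a guess based on the 2.4.1 output shown above.

```python
# First alignment line as quoted in this thread for each wise version.
WISE_2_2_0 = "ENSG00000172135 AGGGAAAGCCCCTAAGCTC--CTGATCTATGCTGCATCCAGTTTGCAAAGTGGGGTCCC"
WISE_2_4_1 = "ENSG00000172135 AGGGAAAGCCCCTAAGCTC--CTGATCTATGCTGCATCCAGTTTGCAAAG-TGGGGTCC"


def first_alignment_line(lines):
    """Return the first non-blank, non-score line with whitespace normalized."""
    for line in lines:
        text = " ".join(line.split())
        if text and not text.startswith("Score"):
            return text
    return None


def is_accepted(lines):
    """True if the output starts with either of the known alignments."""
    return first_alignment_line(lines) in (WISE_2_2_0, WISE_2_4_1)
```

Accepting a fixed set of known outputs keeps the test strict about the alignment itself while tolerating the version difference, at the cost of needing an update if a later wise release changes the result again.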
From bugzilla-daemon at portal.open-bio.org Wed Jan 28 05:13:43 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 28 Jan 2009 05:13:43 -0500 Subject: [Biopython-dev] [Bug 2743] New: manual installation overwrites previous biopython installations Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2743 Summary: manual installation overwrites previous biopython installations Product: Biopython Version: Not Applicable Platform: All URL: http://lists.open-bio.org/pipermail/biopython/2009-January/004893.html OS/Version: Linux Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: dalloliogm at gmail.com The manual biopython installation (the one made with python setup.py install) installs all the files in a directory like this: - /usr/lib/python2.5/site-packages/Bio The problem comes when you want to install biopython in a system where there is already an old version installed. In that case, it is not clear what happens to the old installation... are all the old files removed before the new version is installed? Or are the two versions 'mixed'? Please refer to this discussion: - http://lists.open-bio.org/pipermail/biopython/2009-January/004893.html
From bugzilla-daemon at portal.open-bio.org Wed Jan 28 06:05:07 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 28 Jan 2009 06:05:07 -0500 Subject: [Biopython-dev] [Bug 2743] manual installation overwrites previous biopython installations In-Reply-To: Message-ID: <200901281105.n0SB577F013398@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2743 ------- Comment #1 from mdehoon at ims.u-tokyo.ac.jp 2009-01-28 06:05 EST ------- (In reply to comment #0) > The manual biopython installation (the one made with python setup.py install) > installs all the files in a directory like this: > - /usr/lib/python2.5/site-packages/Bio > > The problem comes when you want to install biopython in a system where there is > already an old version installed. > In that case, it is not clear what happens to the old installation... are all > the old files removed before the new version is installed? Or are the two > versions 'mixed'? Isn't this what always happens when installing a Python module? If so, then it doesn't seem to be a Biopython bug to me.
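The practical question behind this report is which copy of a package a given interpreter would pick up. One way to find out without importing the package is sketched below. The helper name `installed_location` is hypothetical, and the code relies on `importlib.util.find_spec` from current Python 3, which was not available in the Python 2.5 discussed in this thread.

```python
import importlib.util
import os


def installed_location(package):
    """Return the directory an import of `package` would use, or None."""
    try:
        spec = importlib.util.find_spec(package)
    except (ImportError, ValueError):
        return None
    if spec is None or spec.origin is None:
        # Not installed, or a namespace package with no single location.
        return None
    return os.path.dirname(spec.origin)
```

Calling `installed_location("Bio")` before running an installer would show whether an existing copy is already on the import path, and where it lives.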
From bugzilla-daemon at portal.open-bio.org Wed Jan 28 06:14:28 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 28 Jan 2009 06:14:28 -0500 Subject: [Biopython-dev] [Bug 2743] manual installation overwrites previous biopython installations In-Reply-To: Message-ID: <200901281114.n0SBESYY014510@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2743 ------- Comment #2 from dalloliogm at gmail.com 2009-01-28 06:14 EST ------- (In reply to comment #1) > (In reply to comment #0) > > The manual biopython installation (the one made with python setup.py install) > > installs all the files in a directory like this: > > - /usr/lib/python2.5/site-packages/Bio > > > > The problem comes when you want to install biopython in a system where there is > > already an old version installed. > > In that case, it is not clear what happens to the old installation... are all > > the old files removed before the new version is installed? Or are the two > > versions 'mixed'? > > Isn't this what always happens when installing a Python module? If so, then it > doesn't seem to be a Biopython bug to me. Well, I don't know if it is the same behaviour for the other python modules, but it can create dangerous situations, especially if you are 'downgrading' a biopython installation. The biopython installer should clarify that, asking the user if he wants to overwrite the existing installation, change the installation path, or abort. Anyway, the right way to install biopython should be by using easy_install. Easy_install downloads the latest code and creates an egg, and then installs everything in a directory like this: - /usr/lib/python2.5/site-packages/biopython-1.49-py2.5-linux-i686.egg/ automatically changing $PYTHON_PATH. I suggest changing the Biopython wiki to tell people that they should always prefer to install biopython with easy_install, which by the way works perfectly and automatically checks the dependencies.
From bugzilla-daemon at portal.open-bio.org Wed Jan 28 07:46:37 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 28 Jan 2009 07:46:37 -0500 Subject: [Biopython-dev] [Bug 2743] manual installation overwrites previous biopython installations In-Reply-To: Message-ID: <200901281246.n0SCkbKj028750@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2743 ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-28 07:46 EST ------- (In reply to comment #1) > > the old files removed before the new version is installed? Or are the two > > versions 'mixed'? > > Isn't this what always happens when installing a Python module? If so, then it > doesn't seem to be a Biopython bug to me. Agreed. As far as I know, this affects ANY python module installed with distutils - and indeed this is typical practice for ANY unix tool installed from source via a make file. It is essentially NORMAL, although not so nice for beginners. Linux distributions will often provide packaged versions of python libraries (including Biopython) which you can install/update/remove using the system's package manager (e.g. apt, yum, up2date etc). The only downside to me is they won't always have the latest version of each package. I suppose we could add a hack to setup.py to check if there is already a Biopython installation present (try doing "import Bio"), and if it is installed, ask the user if they want to continue. However, there are legitimate situations where this just makes things more confusing. e.g. You don't have admin rights on a unix machine where your systems administrator has provided python and an old version of Biopython, so you want to install the latest version of Biopython under your home directory.
(In reply to comment #2) > I suggest to change the biopython's wiki to tell people that they should > always prefer to install biopython with easy_install, which by the way works > perfectly and automatically checks the dependencies. For now distutils is still the python standard, while easy_install is a non-standard optional extra. So in some ways using easy_install is more work. Note that easy_install doesn't provide a simple uninstall either: http://peak.telecommunity.com/DevCenter/EasyInstall#uninstalling-packages From bugzilla-daemon at portal.open-bio.org Wed Jan 28 10:23:48 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 28 Jan 2009 10:23:48 -0500 Subject: [Biopython-dev] [Bug 2743] manual installation overwrites previous biopython installations In-Reply-To: Message-ID: <200901281523.n0SFNmqQ013945@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2743 ------- Comment #4 from bsouthey at gmail.com 2009-01-28 10:23 EST ------- (In reply to comment #3) > (In reply to comment #1) > > > the old files removed before the new version is installed? Or are the two > > > versions 'mixed'? > > > > Isn't this what always happens when installing a Python module? If so, then it > > doesn't seem to be a Biopython bug to me. > > Agreed. As far as I know, this affects ANY python module installed with > distutils - and indeed this is typical practice for ANY unix tool installed > from source via a make file. It is essentially NORMAL, although not so nice > for beginners. > Agreed that this is not a Biopython bug but a Python feature. Yes, the installation is usually 'mixed' when installing from source. The setup will remove the existing egg-info and then add a new one.
Python copies the files to the appropriate place, thus overwriting any old files with new versions, but old files that are no longer present in the new version, or files with different names, will remain. To my knowledge, Python and Biopython will not know about those files unless a user explicitly tries to use them. Bruce From bugzilla-daemon at portal.open-bio.org Thu Jan 29 12:41:19 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 29 Jan 2009 12:41:19 -0500 Subject: [Biopython-dev] [Bug 2738] Speed up GenBank parsing, in particular location parsing In-Reply-To: Message-ID: <200901291741.n0THfJYC018518@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2738 ------- Comment #3 from bsouthey at gmail.com 2009-01-29 12:41 EST ------- First, I object to this patch because it replaces the current version without keeping the old code. It should create a new parsing function so we can verify that the old and new versions provide exactly the same output for the same input. As indicated below, it does speed things up! So I have no problem with it replacing the current parsing code in the next release provided that the old parsing code remains as a deprecated function. (Alternatively add a conditional statement with a flag to avoid this new code as required.) (In reply to comment #2) > Created an attachment (id=1208) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1208&action=view) [details] > Simple test script for timing GenBank parsing > > I've attached a trivial script to time parsing all the GenBank files in > a directory to help anyone wanting to benchmark this change.
> > (In reply to comment #1) > > However, from my limited testing using Python 2.5 on the Mac with GenBank > > files for large bacterial genomes, this may be a price worth paying. I'd > > like independent measurements (and to check this on other platforms), but > > this does seem to more than halve the time taken to parse GenBank files! > > Further testing with Python 2.5 on Linux, this time also with some large > eukaryotic files, appears to confirm a very large speed up (most obvious on > feature rich GenBank files of course). > > I still want to check this on other versions of python... > I ran the script on a patched version of Linux Python (versions 2.3, 2.4, 2.5 and 2.6) and noted that this halved the time required to parse a GenBank Incremental Update file (an update from Jan 2009: nc0101.flat, size 573 MB, with 213942 records of total length 158245604 bp). While the number of records and sequences are the same, I have not checked if the patched version is providing exactly the same output as the unpatched version. This is very important for the different types of GenBank files (Whole Genome Shotgun and CON types). From bugzilla-daemon at portal.open-bio.org Thu Jan 29 12:57:22 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 29 Jan 2009 12:57:22 -0500 Subject: [Biopython-dev] [Bug 2738] Speed up GenBank parsing, in particular location parsing In-Reply-To: Message-ID: <200901291757.n0THvMVl023111@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2738 ------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-29 12:57 EST ------- (In reply to comment #3) > First, I object to this patch because it replaces the current version without > keeping the old code.
It does keep the old code, and explicitly uses the old code for the non-simple locations. > It should create a new parsing function so verify that > the old and new versions provide exactly the same output for the same input. We should probably extend the Biopython GenBank/EMBL parsing unit tests to make sure this patch doesn't break anything, and additionally have some extra test cases using big GenBank files which won't become official unit tests. This could be as simple as a script which parses all the records in a set of GenBank files, printing out a very minimal summary of each feature location (including subfeatures). We then run the script with and without the patch, and confirm their output matches. Once we are happy that the patch doesn't change the parser behaviour, I don't see any reason to offer both options to the end user. In fact, I would prefer to go further and REMOVE the old slow location parser after extending the regular expression based parser to cope with ALL location variants. > As indicated below, it does speed things up! So I have no problems for it to > replace the current parsing code in the next release provided that the old > parsing code remains as depreciated function. (Alternatively add a conditional > statement with a flag to avoid this new code as required.) Having the new code controlled by some option would actually be pretty easy. Other than for testing I see no reason to do this. > I ran the script on patched version of Linux Python (versions 2.3, 2.4, 2.5 > and 2.6) and noted that this halved the time required to parse a Genbank > Incremental Update file (an update from Jan 2009: nc0101.flat size 573 mb) > with 213942 records with total length 158245604 bp). That is consistent with the speed ups I have seen - you can get even more depending on the proportion of features in the file. Thanks for checking python 2.3 to 2.6, nice to see they all benefit. 
> While the number of records and sequences are the same, I have not checked if > the patched version is providing exactly the same output as the unpatched > version. This is very important for the different types of GenBank files > (Whole Genome Shotgun and CON types). I agree thorough testing is important here. Would you like to suggest any particular WGS or CON files for testing with? I'm thinking something large with a wide range of location types would be good for checking this patch (but not to include with Biopython). Peter From bugzilla-daemon at portal.open-bio.org Thu Jan 29 13:26:09 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 29 Jan 2009 13:26:09 -0500 Subject: [Biopython-dev] [Bug 2738] Speed up GenBank parsing, in particular location parsing In-Reply-To: Message-ID: <200901291826.n0TIQ9YR030903@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2738 ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-29 13:26 EST ------- Created an attachment (id=1209) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1209&action=view) Simple test script for checking GenBank location parsing This is a simple script to help validate the location parsing has not changed.
Intended usage is to put the script in a directory with a good set of test GenBank files (all ending with the extension .gbk), then: (starting with a clean install of Biopython) $ time python parse_gbk_locs.py > old.txt (apply the patch) $ time python parse_gbk_locs.py > new.txt (verify the output matches) $ ls -l old.txt new.txt (check file sizes agree) $ diff old.txt new.txt (should be no output) From bugzilla-daemon at portal.open-bio.org Thu Jan 29 14:38:20 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 29 Jan 2009 14:38:20 -0500 Subject: [Biopython-dev] [Bug 2738] Speed up GenBank parsing, in particular location parsing In-Reply-To: Message-ID: <200901291938.n0TJcKh2021246@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2738 ------- Comment #6 from bsouthey at gmail.com 2009-01-29 14:38 EST ------- Created an attachment (id=1210) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1210&action=view) Single test case that is not correctly parsed I just used a simple 'print record' followed by a diff (but that does not check the references). This record (and related ones) has a difference between versions ...
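The kind of minimal per-feature summary proposed in comment #4 and used in the workflow above could be sketched as follows. This is not the attached parse_gbk_locs.py (id=1209): the function names are invented for the example, subfeatures are not handled, and the only assumption about the records is that each has an `id` and a list of `features` with `type` and `location` attributes, as SeqRecord and SeqFeature objects do. It is written for current Python.

```python
import difflib


def summarise_locations(records):
    """Yield one line per feature: record id, feature type and location."""
    for record in records:
        for feature in record.features:
            yield "%s\t%s\t%s" % (record.id, feature.type, feature.location)


def differences(old_lines, new_lines):
    """Return unified diff lines between two summaries (empty if identical)."""
    return list(difflib.unified_diff(old_lines, new_lines, lineterm=""))
```

Writing the summary from the unpatched and patched parsers to two files and diffing them, as in the shell steps above, amounts to checking that `differences` returns an empty list.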
From bugzilla-daemon at portal.open-bio.org Thu Jan 29 16:13:19 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 29 Jan 2009 16:13:19 -0500 Subject: [Biopython-dev] [Bug 2738] Speed up GenBank parsing, in particular location parsing In-Reply-To: Message-ID: <200901292113.n0TLDJ51019466@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2738 ------- Comment #7 from bsouthey at gmail.com 2009-01-29 16:13 EST ------- (In reply to comment #4) > > While the number of records and sequences are the same, I have not checked if > > the patched version is providing exactly the same output as the unpatched > > version. This is very important for the different types of GenBank files > > (Whole Genome Shotgun and CON types). > > I agree thorough testing is important here. Would you like to suggest any > particular WGS or CON files for testing with? I downloaded a few example files including WGS and CON. I found that CON files are not parsed by either version. Not a surprise given that these have no sequences but that is a different topic. Apart from the errors in the attached case, I have not seen any other errors (even parsing the references). Bruce
From bugzilla-daemon at portal.open-bio.org Fri Jan 30 06:00:24 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 30 Jan 2009 06:00:24 -0500 Subject: [Biopython-dev] [Bug 2738] Speed up GenBank parsing, in particular location parsing In-Reply-To: Message-ID: <200901301100.n0UB0OsD002442@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2738 ------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-30 06:00 EST ------- (In reply to comment #6) > Created an attachment (id=1210) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1210&action=view) [details] > Single test case that is not correctly parsed > > I just used a simple 'print record' followed by a diff (but that does not > check the references). This record (and related ones) has a difference > between versions ... If you do a 'print record' with a SeqRecord object, any references are shown using their __repr__ string - which is currently the python object default which includes a memory address (something I've been meaning to address on Bug 2544). Different objects will have different memory locations, which will show up in the diff. For example, using the following as a simple test script and capturing its output to files: from Bio import SeqIO record = SeqIO.read(open("CY029873.gbk"), "genbank") print record Running diff with and without the patch gave me: 9c9 < /references=[, ] --- > /references=[, ] i.e. No real differences between the records as far as I can see. Please clarify - if you have found a failing example I would be most interested. (In reply to comment #7) > I downloaded a few example files including WGS and CON. I found that CON files > are not parsed by either version. Not a surprise given that these have no > sequences but that is a different topic. Apart from the errors in attached > case, I have not seen any other errors (even parsing the references). 
Could you clarify your problem with the CON files please (on a new bug, or the
mailing list - since as you point out this is a different topic).

I've just downloaded and unzipped one of the smaller CON files and it parses
fine for me: ftp://ftp.ncbi.nih.gov/genbank/gbcon107.seq.gz

>>> from Bio import SeqIO
>>> count = 0
>>> for record in SeqIO.parse(open("gbcon107.seq"),"genbank") : count += 1
...
>>> print count
55031

As expected there is no sequence, but the name, description, features,
references etc are there.

From bugzilla-daemon at portal.open-bio.org Fri Jan 30 06:29:07 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 30 Jan 2009 06:29:07 -0500
Subject: [Biopython-dev] [Bug 2738] Speed up GenBank parsing, in particular location parsing
In-Reply-To:
Message-ID: <200901301129.n0UBT7Ah008213@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2738

------- Comment #9 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-30 06:29 EST -------
I've run my test script (attachment 1209) on a Linux machine with Python 2.5

5.5K Jan 30 10:29 CY029873.gbk
 67M Jan 22 17:53 dr_ref_chr16.gbk
 42M Jan 22 17:53 NC_003075.gbk
 14M Jan 22 18:43 NC_003272.gbk
 25M Jan 22 17:52 NC_003279.gbk
4.8M Jan 22 18:44 NC_004350.gbk
 20M Jan 22 18:42 NC_008095.gbk
 14M Jan 22 18:44 NC_009925.gbk
 18M Jan 22 18:43 NC_010628.gbk
296M Jan 22 17:52 ptr_ref_chr1.gbk
 86M Jan 30 10:55 wgs.AAAB.1.gnp.gbk
297M Jan 30 10:55 wgs.AABR.10.gbff.gbk

The last two files are WGS data for protein and nucleotide sequences,
downloaded from ftp://ftp.ncbi.nih.gov/genbank/wgs/ then unzipped and a gbk
extension added so my script parses them.
With and without the patch the test script gives identical output - which
appears to confirm the location parsing is not functionally altered. The
timings were just over 2min with the patch and just over 8min without it (a
fourfold speed up on this dataset).

From bugzilla-daemon at portal.open-bio.org Fri Jan 30 06:30:30 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 30 Jan 2009 06:30:30 -0500
Subject: [Biopython-dev] [Bug 2649] Bio.KDTree expects numpy array with dtype="float32" on 64 bit machines.
In-Reply-To:
Message-ID: <200901301130.n0UBUUMm008550@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2649

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-30 06:30 EST -------
Marking as fixed - please reopen this if need be.
From bugzilla-daemon at portal.open-bio.org Fri Jan 30 06:54:26 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 30 Jan 2009 06:54:26 -0500 Subject: [Biopython-dev] [Bug 2639] SeqRecord.init doesn't check for arguments for their types In-Reply-To: Message-ID: <200901301154.n0UBsQbw014456@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2639 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|REOPENED |RESOLVED Resolution| |FIXED ------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-30 06:54 EST ------- (In reply to comment #5) > Ok, understood. I didn't thought of these cases. > However, having not a Seq causes errors that are difficult to > understand in other functions that use SeqRecord. > For example, if you do: > > >>> a = SeqRecord(id = '1') > >>> a.format('fasta') > > you get the error: > : 'NoneType' object has no attribute > 'tostring' > > This could scary an eventual biopython newbie, an exception like to > 'error - current SeqRecord object doesn't have a Seq' could be better. Well, if you want to create a SeqRecord where the sequence is None, you'd have to do SeqRecord(None, id="1") - your suggestion of SeqRecord(id="1") doesn't work as the sequence is a mandatory argument. However, I see your point that the current AttributeError isn't helpful in this special case. I've updated the Bio/SeqIO/FastaIO.py file in CVS (revision 1.15) to give a TypeError in this situation which will try to explain the problem. > What do you think about creating a 'NullSeq' object, which represent a > Seq with no value, and using it as a default for SeqRecord? > Later we could modify the other functions like .format e Seq.translate to > intercept these objects and return the right error message. Hmm. It seems rather complicated for a rare case. 
Using None to mean "missing" or "null" is done in other python libraries/code (e.g. database access), which is why I suggested someone might want to do this. Marking this bug as fixed. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Jan 30 07:00:19 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 30 Jan 2009 07:00:19 -0500 Subject: [Biopython-dev] [Bug 2734] db.load problem with postgresql and psycopg2 In-Reply-To: Message-ID: <200901301200.n0UC0JcD016114@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2734 ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-30 07:00 EST ------- (In reply to comment #3) > > What versions of biopython and the BioSQL schema are you using? > > Cymon According to the bug report, Stephen was using Biopython 1.49, so: Stephen: Biopython 1.49 postgresql 8.2 BioSQL - schema version unspecified psycopg2 - version unspecified python - version unspecified OS - Mac OS X What about you Cymon - you have postgresql with psycopg2 working, but what versions of things? Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Fri Jan 30 07:13:52 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 30 Jan 2009 07:13:52 -0500 Subject: [Biopython-dev] [Bug 2723] Minor corrections to the installation document In-Reply-To: Message-ID: <200901301213.n0UCDqef019147@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2723 ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-30 07:13 EST ------- (In reply to comment #2) > I'm leaving this bug open until I've updated the HTML and PDF copies of the > installation document on the website. I don't have the tools hevea installed > on this machine, so I can't create the HTML version of the installation > document -- just the PDF. I should be be able to do this next week... Website updated. Marking this bug as fixed. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Jan 30 07:20:06 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 30 Jan 2009 07:20:06 -0500 Subject: [Biopython-dev] [Bug 2734] db.load problem with postgresql and psycopg2 In-Reply-To: Message-ID: <200901301220.n0UCK6Fp020687@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2734 ------- Comment #6 from cymon.cox at gmail.com 2009-01-30 07:20 EST ------- (In reply to comment #5) > (In reply to comment #3) > > > > What versions of biopython and the BioSQL schema are you using? 
>
> > > Cymon
>
> According to the bug report, Stephen was using Biopython 1.49, so:
>
> Stephen:
> Biopython 1.49
> postgresql 8.2
> BioSQL - schema version unspecified
> psycopg2 - version unspecified
> python - version unspecified
> OS - Mac OS X
>
> What about you Cymon - you have postgresql with psycopg2 working, but what
> versions of things?
>
> Peter
>

Peter,

I'm using:
Biopython: CVS
PostgreSQL: 8.1.11
BioSQL: 1.0.1
Python: 2.5.2
Psycopg: 2.0.8
OS: Red Hat Enterprise 5.3

C.

From bugzilla-daemon at portal.open-bio.org Fri Jan 30 09:16:32 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 30 Jan 2009 09:16:32 -0500
Subject: [Biopython-dev] [Bug 2697] MaxEntropy calculate function assumes integer values for class and convergence criteria is hard coded
In-Reply-To:
Message-ID: <200901301416.n0UEGWeN005337@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2697

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1139 is|0                           |1
           obsolete|                            |

------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-30 09:16 EST -------
Created an attachment (id=1211) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1211&action=view)
Patch to Bio/MaxEntropy.py to make the convergence parameters optional
arguments

This should retain API backwards compatibility by using the current module
level values as the function's default arguments (see earlier comments). I've
checked that changing these and then re-calling the train function does work
as expected.

How does this look?
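
For anyone following the default-argument point in this thread, here is a minimal standalone sketch (not the attached patch) of why "changing the module level values and then re-calling" only works if the defaults are looked up at call time. The names MAX_ITERATIONS, train_fixed and train_late are invented for illustration and do not exist in Bio/MaxEntropy.py.

```python
# Hypothetical stand-in for a module-level setting such as a maximum
# iteration count; the name is made up for this sketch.
MAX_ITERATIONS = 100


def train_fixed(max_iterations=MAX_ITERATIONS):
    # The default was evaluated once, when this def statement ran,
    # so later changes to MAX_ITERATIONS are not seen here.
    return max_iterations


def train_late(max_iterations=None):
    # None acts as a sentinel meaning "use whatever the module-level
    # value is right now".
    if max_iterations is None:
        max_iterations = MAX_ITERATIONS
    return max_iterations


# A user changes the module-level value after the functions exist.
MAX_ITERATIONS = 5

print(train_fixed())    # 100, the value captured at definition time
print(train_late())     # 5, the current module-level value
print(train_late(20))   # 20, an explicit argument always wins
```

The sentinel form keeps existing callers working (they pass nothing and get the module-level value) while still letting new callers override the setting per call.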
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Jan 30 09:17:43 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 30 Jan 2009 09:17:43 -0500 Subject: [Biopython-dev] [Bug 2697] MaxEntropy calculate function assumes integer values for class and convergence criteria is hard coded In-Reply-To: Message-ID: <200901301417.n0UEHhKG005438@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2697 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1211|application/octet-stream |text/plain mime type| | Attachment #1211 is|0 |1 patch| | ------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-30 09:17 EST ------- (From update of attachment 1211) Marking this as a patch (plain text) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Fri Jan 30 09:19:43 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 30 Jan 2009 09:19:43 -0500
Subject: [Biopython-dev] [Bug 2697] MaxEntropy calculate function assumes integer values for class and convergence criteria is hard coded
In-Reply-To:
Message-ID: <200901301419.n0UEJhID005587@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2697

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1211 is|0                           |1
           obsolete|                            |

------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-30 09:19 EST -------
(From update of attachment 1211)
Sorry - wrong version of the patch. This doesn't cover _iis_solve_delta etc.

From bugzilla-daemon at portal.open-bio.org Fri Jan 30 09:30:40 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 30 Jan 2009 09:30:40 -0500
Subject: [Biopython-dev] [Bug 2697] MaxEntropy calculate function assumes integer values for class and convergence criteria is hard coded
In-Reply-To:
Message-ID: <200901301430.n0UEUe04006448@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2697

------- Comment #9 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-30 09:30 EST -------
Created an attachment (id=1212) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1212&action=view)
Patch to Bio/MaxEntropy.py to make the convergence parameters optional
arguments

This time it's the whole patch - sorry for the extra emails this has
triggered. I had stopped to check in a couple of docstring changes and fixed a
few tabs in MaxEntropy.py first, which confused things.
Note this is a bit different to what I was thinking in comment #5,

> ... something like this:
>
> def train(training_set, results, feature_fns, update_fn=None,
>           max_iis_iterations = MAX_IIS_ITERATIONS,
>           iis_converge = IIS_CONVERGE,
>           max_newton_iterations = MAX_NEWTON_ITERATIONS,
>           newton_converge = NEWTON_CONVERGE):

The above code won't pick up changes to the module level variables like
MAX_IIS_ITERATIONS because the defaults are only evaluated once when the
function is created. The patch deals with this as follows:

def train(training_set, results, feature_fns, update_fn=None,
          max_iis_iterations=None, iis_converge=None,
          max_newton_iterations=None, newton_converge=None):
    if max_iis_iterations is None :
        max_iis_iterations = MAX_IIS_ITERATIONS
    if iis_converge is None :
        iis_converge = IIS_CONVERGE
    if max_newton_iterations is None :
        max_newton_iterations = MAX_NEWTON_ITERATIONS
    if newton_converge is None :
        newton_converge = NEWTON_CONVERGE

This works :)

From bugzilla-daemon at portal.open-bio.org Fri Jan 30 09:34:23 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 30 Jan 2009 09:34:23 -0500
Subject: [Biopython-dev] [Bug 2745] New: Bio.GenBank.LocationParserError with a GenBank CON file
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2745

           Summary: Bio.GenBank.LocationParserError with a GenBank CON file
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: minor
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: bsouthey at gmail.com

The following file has a Bio.GenBank.LocationParserError:
ftp://ftp.ncbi.nih.gov/genbank/daily-nc/con_nc.0103.flat.gz

Partial error message (as the last line is the complete CONTIG line).
Syntax error at or near `Tokens('close_paren')' token Traceback (most recent call last): File "parse_gbk.py", line 26, in for record in SeqIO.parse(handle, "genbank") : File "/home/bsouthey/python/biopython_cvs/biopython/build/lib.linux-x86_64-2.5/Bio/GenBank/Scanner.py", line 410, in parse_records File "/home/bsouthey/python/biopython_cvs/biopython/build/lib.linux-x86_64-2.5/Bio/GenBank/Scanner.py", line 393, in parse File "/home/bsouthey/python/biopython_cvs/biopython/build/lib.linux-x86_64-2.5/Bio/GenBank/Scanner.py", line 371, in feed File "/home/bsouthey/python/biopython_cvs/biopython/build/lib.linux-x86_64-2.5/Bio/GenBank/Scanner.py", line 1093, in _feed_misc_lines File "/home/bsouthey/python/biopython_cvs/biopython/build/lib.linux-x86_64-2.5/Bio/GenBank/__init__.py", line 990, in contig_location File "/home/bsouthey/python/biopython_cvs/biopython/build/lib.linux-x86_64-2.5/Bio/GenBank/__init__.py", line 707, in location Bio.GenBank.LocationParserError: join(DS483543.1:1..325170,gap(unk100),DS483544.1:1..218545,gap(unk100),DS483545.1:1..95394,gap(unk100),DS483546.1:1..261305,gap(unk100),DS483547.1:1..63422,gap(unk100),DS483548.1:1..77432,gap(unk100),DS483549.1:1..371434,gap(unk100),DS483550.1:1..74569,gap(unk100),DS483551.1:1..54637,gap(unk100),DS483552.1:1..73591,gap(unk100),DS483553.1:1..63632,gap(unk100),DS483554.1:1..60619,gap(unk100),DS483555.1:1..57196,gap(unk100),DS483556.1:1..95189,gap(unk100),DS483557.1:1..48586,gap(unk100),DS483558.1:1..45971,gap(unk100),DS483559.1:1..59826,gap(unk100),DS483560.1:1..49535,gap(unk100),DS483561.1:1..51083,gap(unk100),... -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Fri Jan 30 09:35:41 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 30 Jan 2009 09:35:41 -0500 Subject: [Biopython-dev] [Bug 2745] Bio.GenBank.LocationParserError with a GenBank CON file In-Reply-To: Message-ID: <200901301435.n0UEZfpC007388@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2745 ------- Comment #1 from bsouthey at gmail.com 2009-01-30 09:35 EST ------- Created an attachment (id=1213) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1213&action=view) Example of a single GenBank CON record that fails -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Jan 30 09:47:36 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 30 Jan 2009 09:47:36 -0500 Subject: [Biopython-dev] [Bug 2738] Speed up GenBank parsing, in particular location parsing In-Reply-To: Message-ID: <200901301447.n0UEla5Q009025@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2738 ------- Comment #10 from bsouthey at gmail.com 2009-01-30 09:47 EST ------- (In reply to comment #8) Thanks, I was able to print out the references from the annotations and I also did not see any differences. I submitted a bug for the CON file. I am a lot more comfortable with this patch now that a wide range of files have been tested. But you can confirm that the example I provided is correctly parsed? Thanks Bruce -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Fri Jan 30 10:11:56 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 30 Jan 2009 10:11:56 -0500
Subject: [Biopython-dev] [Bug 2745] Bio.GenBank.LocationParserError with a GenBank CON file
In-Reply-To:
Message-ID: <200901301511.n0UFBuEW012224@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2745

------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-30 10:11 EST -------
It's the "gap(unk100)" entries which are breaking the location parser in
Bruce's examples. Similarly even "gap()" entries of unknown length like this
will fail:

LOCUS       AH007743     7832 bp    DNA             CON       26-MAY-1999
DEFINITION  Gallus gallus ornithine transcarbamylase (OTC) gene, complete cds.
ACCESSION   AH007743
VERSION     AH007743.1  GI:4927367
KEYWORDS    .
SOURCE      chicken.
  ORGANISM  Gallus gallus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Archosauria;
            Aves; Neognathae; Galliformes; Phasianidae; Phasianinae; Gallus.
[....]
FEATURES             Location/Qualifiers
     source          1..7832
                     /organism="Gallus gallus"
                     /db_xref="taxon:9031"
                     /chromosome="1"
CONTIG      join(AF065630.1:1..1903,gap(),AF065631.1:1..435,gap(),
            AF065632.1:1..509,gap(),AF065633.1:1..722,gap(),AF065634.1:1..707,
            gap(),AF065635.1:1..836,gap(),AF065636.1:1..1614,gap(),
            AF065637.1:1..605,gap(),AF065638.1:1..501)
//

Example based on ftp://ftp.ncbi.nih.gov/genbank/README.genbank although this
does not describe the new terms. Older versions of the release notes do, e.g.
ftp://ftp.ncbi.nih.gov/genbank/release.notes/gb168.release.notes

========================= [start quote] =========================
3.4.15 CONTIG Format

As an alternative to SEQUENCE, a CONTIG record can be present following the
ORIGIN record.
A join() statement utilizing a syntax similar to that of feature locations
(see the Feature Table specification mentioned in Section 3.4.12) provides the
accession numbers and basepair ranges of other GenBank sequences which
contribute to a large-scale biological object, such as a chromosome or
complete genome. Here is an example of the use of CONTIG :

CONTIG      join(AE003590.3:1..305900,AE003589.4:61..306076,
            AE003588.3:61..308447,AE003587.4:61..314549,AE003586.3:61..306696,
            AE003585.5:61..343161,AE003584.5:61..346734,AE003583.3:101..303641,
            [ lines removed for brevity ]
            AE003782.4:61..298116,AE003783.3:16..111706,AE002603.3:61..143856)

However, the CONTIG join() statement can also utilize a special operator which
is *not* part of the syntax for feature locations:

    gap()     : Gap of unknown length.

    gap(X)    : Gap with an estimated integer length of X bases.
                To be represented as a run of n's of length X
                in the sequence that can be constructed from
                the CONTIG line join() statement .

    gap(unkX) : Gap of unknown length, which is to be represented
                as an integer number (X) of n's in the sequence that
                can be constructed from the CONTIG line join()
                statement.
                The value of this gap operator consists of the
                literal characters 'unk', followed by an integer.

Here is an example of a CONTIG line join() that utilizes the gap() operator:

CONTIG      join(complement(AADE01002756.1:1..10234),gap(1206),
            AADE01006160.1:1..1963,gap(323),AADE01002525.1:1..11915,gap(1633),
            AADE01005641.1:1..2377)

The first and last elements of the join() statement may be a gap() operator.
But if so, then those gaps should represent telomeres, centromeres, etc.

Consecutive gap() operators are illegal.
========================= [end quote] =========================

Evidently Biopython doesn't cope with these CONTIG lines - but then they do
have a different syntax to the feature locations.

I never understood why the current code tries to parse the CONTIG string into
a SeqFeature object in the first place.
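
To make the quoted gap() syntax concrete, here is a minimal standalone sketch using only the standard library. It is not how Bio.GenBank handles CONTIG lines; the function parse_contig and the tuple layout it returns are invented for illustration, and it only covers a flat join() of ranges, complement() ranges and the three gap forms described in the release notes above.

```python
import re

# gap(), gap(X) and gap(unkX); 'unk' must be followed by an integer.
_GAP = re.compile(r"^gap\((?:(unk)?(\d+))?\)$")
# accession.version:start..end, optionally wrapped in complement(...).
_RANGE = re.compile(r"^(complement\()?([^:()]+):(\d+)\.\.(\d+)(\))?$")


def parse_contig(text):
    """Split a CONTIG join(...) string into simple tuples.

    Returns ("range", accession, start, end, strand) or
    ("gap", length_or_None, length_is_unknown) for each element.
    """
    text = "".join(text.split())  # remove line breaks and spaces
    if not (text.startswith("join(") and text.endswith(")")):
        raise ValueError("not a join() statement: %r" % text)
    parts = []
    for item in text[5:-1].split(","):
        gap = _GAP.match(item)
        if gap:
            unk, digits = gap.groups()
            length = int(digits) if digits else None
            parts.append(("gap", length, bool(unk) or length is None))
            continue
        rng = _RANGE.match(item)
        # Reject unknown elements and unbalanced complement( ... ) wrappers.
        if rng is None or bool(rng.group(1)) != bool(rng.group(5)):
            raise ValueError("unrecognised CONTIG element: %r" % item)
        strand = -1 if rng.group(1) else 1
        parts.append(("range", rng.group(2), int(rng.group(3)),
                      int(rng.group(4)), strand))
    return parts


example = """join(complement(AADE01002756.1:1..10234),gap(1206),
            AADE01006160.1:1..1963,gap(unk100),AF065631.1:1..435,gap())"""
for part in parse_contig(example):
    print(part)
```

Keeping gaps as their own tuple type, rather than forcing them into a feature-location structure, mirrors the point made above that the CONTIG syntax is not the same as the feature location syntax.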
From bugzilla-daemon at portal.open-bio.org Fri Jan 30 10:36:52 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 30 Jan 2009 10:36:52 -0500
Subject: [Biopython-dev] [Bug 2681] BioSQL: record annotations enhancements
In-Reply-To:
Message-ID: <200901301536.n0UFaq5u015637@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2681

------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-30 10:36 EST -------
(In reply to comment #2)
> > 'contig' is ignored by loader because it's a SeqFeature object. Is there any
> > reason it couldnt be loaded and retrieved? (record is GenBank/NT_019265.gb)
>
> I couldn't even say off hand how the CONTIG line in that example would be
> parsed, let alone how it gets dealt with when loading into BioSQL.

Basically the CONTIG line looks rather a lot like a feature location,
typically the join of lots of (external) sequences. It makes some sense to
parse this into an object structure, and given the way joins are handled for
features, this led the original author to represent the CONTIG information as
a dummy feature with lots of sub features.

Given the CONTIG can also include gaps (of unknown length), this doesn't quite
fit the current SeqFeature location objects (see Bug 2745). If we extend the
location objects to cope with these gaps, then perhaps the CONTIG can stay as
a SeqFeature in which case for BioSQL maybe we should record it in the
SeqFeature table. We'd have to invent a way to record these gap locations
though.

However, if we just stored the CONTIG line as a raw string, we could then
store it in BioSQL as just another bioentry qualifier (assuming it doesn't
overflow the text field limit).
I've checked how and where BioPerl stores the contig information using the example Bruce used on Bug 2745, attachment 1213, and see that the CONTIG information is stored in the bioentry_qualifier_value table under the term "contig" under the ontology "Annotation Tags". They have retained the separate lines, storing each as a separate entry with an increasing rank. Thus for compatibility with BioSQL, it would make sense for the GenBank parser to store the CONTIG line as a simple string (or list of strings), and not as a SeqFeature (which is currently half broken anyway - see Bug 2745). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Jan 30 11:20:18 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 30 Jan 2009 11:20:18 -0500 Subject: [Biopython-dev] [Bug 2745] Bio.GenBank.LocationParserError with a GenBank CON file In-Reply-To: Message-ID: <200901301620.n0UGKIXW024960@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2745 ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-30 11:20 EST ------- Created an attachment (id=1214) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1214&action=view) Treat the CONTIG information as a string, not a SeqFeature As outlined on Bug 2681 comment 8, there are good reasons to simply store the CONTIG information as a string or perhaps a list of strings. This will make our BioSQL bindings consistent with BioPerl. More generally, I never really liked the idea of storing the CONTIG location as a SeqFeature. I could understand in principle using a location-object, but the current location objects do not deal with joins directly - which is why you have to use a SeqFeature with subfeatures. 
In the long term, a new location object might be a worthwhile change to both features and the contig. For now, this patch simply stores the CONTIG information as one long string. If we commit this, then Tests/output/test_GenBank will need updating too. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Jan 30 11:54:20 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 30 Jan 2009 11:54:20 -0500 Subject: [Biopython-dev] [Bug 2723] Minor corrections to the installation document In-Reply-To: Message-ID: <200901301654.n0UGsK0D003024@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2723 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-30 11:54 EST ------- This is fixed now. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Jan 2 01:37:43 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 1 Jan 2009 20:37:43 -0500 Subject: [Biopython-dev] [Bug 2544] Bio.GenBank and SeqFeature improvements In-Reply-To: Message-ID: <200901020137.n021bhEB022751@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2544 ------- Comment #4 from mmokrejs at ribosome.natur.cuni.cz 2009-01-01 20:37 EST ------- Can I instantiate GenBank file, reverse-complement the sequence (keep letter casing) in the SeqIO object and dump it back to a GenBank file? 
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Jan 2 18:15:46 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 2 Jan 2009 13:15:46 -0500 Subject: [Biopython-dev] [Bug 2544] Bio.GenBank and SeqFeature improvements In-Reply-To: Message-ID: <200901021815.n02IFkcf012662@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2544 ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-02 13:15 EST ------- (In reply to comment #4) > Can I instantiate GenBank file, reverse-complement the sequence > (keep letter casing) in the SeqIO object and dump it back to a > GenBank file? I think this question would have been better handled on the mailing lists, rather than on this bug. Note that currently our GenBank output via Bio.SeqIO does not include the features and references - see Bug 2294. I would do this based on the approach described in the tutorial, which assumes there could be many records in the input file. Here is a variation for just one record (untested): from Bio import SeqIO from Bio.SeqRecord import SeqRecord record = SeqIO.read(open("example.gbk"), "genbank") rc_record = SeqRecord(seq = record.seq.reverse_complement(), \ id = "rc_" + record.id, \ name = "rc_" + record.name, \ description = "reverse complement") out_handle = open("rc_example.gbk","w") SeqIO.write([rc_record], out_handle, "genbank") out_handle.close() Note you *could* override the record's sequence in situ: record.seq = record.seq.reverse_complement() #BAD IDEA This is a bad idea because none of the annotations will have been changed - in addition to the name/id/description still being the same, all the feature locations etc will still be for the forward sequence. 
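
As a rough illustration of why the annotation has to be updated together with the sequence, here is a standalone sketch on plain strings rather than Biopython objects. The helpers reverse_complement and flip_location are hypothetical, the complement table only covers A, C, G, T and N, and feature locations are assumed to be 0-based and end-exclusive.

```python
# Complement table that preserves letter case (upper stays upper,
# lower stays lower). Only A/C/G/T/N are handled in this sketch.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Reverse-complement a plain DNA string, keeping letter casing."""
    return seq.translate(_COMPLEMENT)[::-1]


def flip_location(start, end, strand, length):
    """Map a 0-based, end-exclusive location onto the reverse complement."""
    return length - end, length - start, -strand


seq = "ATGcccTTTaaag"
feature = (0, 3, 1)            # covers 'ATG' on the forward strand

rc_seq = reverse_complement(seq)
rc_feature = flip_location(*feature, length=len(seq))

print(rc_seq)                  # ctttAAAgggCAT
print(rc_feature)              # (10, 13, -1)

# The remapped feature still points at the same bases, now read from
# the opposite strand. Using the old coordinates (0, 3) on rc_seq
# would give 'ctt', which is the failure mode described above.
start, end, strand = rc_feature
print(reverse_complement(rc_seq[start:end]))   # ATG
```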
-- I'm leaving this bug open for defining __repr__ for the Bio.SeqFeature.Reference object (and perhaps tweaking the display of the references in the SeqRecord __str__ method) ONLY. Please continue any other discussion on the mailing lists. Thanks. Peter. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sat Jan 3 22:18:56 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 3 Jan 2009 17:18:56 -0500 Subject: [Biopython-dev] [Bug 2723] New: Clarify what applies to which version of biopython and other doc cleanup Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2723 Summary: Clarify what applies to which version of biopython and other doc cleanup Product: Biopython Version: 1.49 Platform: PC OS/Version: Linux Status: NEW Severity: normal Priority: P2 Component: Documentation AssignedTo: biopython-dev at biopython.org ReportedBy: mmokrejs at ribosome.natur.cuni.cz I went to look around at the docs because the built-in tests of 1.49 setup.py spitted some messages about external programs missing. I haven't found any hints on them in http://news.open-bio.org/news/2008/11/biopython-release-149/. Anyway, looking at http://biopython.org/DIST/docs/install/Installation.html#htoc17 I see: "3.4 mxTextTools (no longer needed)". I would propose: 3.4 mxTextTools (no longer needed since 1.49) Similarly: - 3.1 Numerical Python (NumPy) (strongly recommended) + 3.1 Numerical Python (NumPy) (strongly recommended since 1.49) Bad URL links are in the text: 3.3 Database Access (MySQLdb, ...) (optional) [cut] Additionally, both MySQLdb and psycopg (a PostgreSQL database adaptor) can be used for accessing BioSQL databases through Biopython (see ). 
Again if you are
-----------------------------------------------------------^
not going to use BioSQL, there shouldn't be any need to install these modules.

3.4 mxTextTools (no longer needed)

[cut]

However, we currently recommend you install mxTextTools 2.0, as some of the API
changes made in 3.0 version were not compatible with Biopython. Goto to download
---------------------------------------------------------------------^^
this.

I haven't found an answer for me yet:

test_PopGen_FDist ... skipping. Install FDist if you want to use Bio.PopGen.FDist.
ok
test_PopGen_FDist_nodepend ... ok
test_PopGen_GenePop ... ok
test_PopGen_SimCoal ... skipping. Install SIMCOAL2 if you want to use Bio.PopGen.SimCoal.
ok
test_PopGen_SimCoal_nodepend ... ok
test_ProtParam ... ok
test_Registry ... ok
test_Restriction ... ok
test_SCOP_Astral ... ok
test_SCOP_Cla ... ok
test_SCOP_Des ... ok
test_SCOP_Dom ... ok
test_SCOP_Hie ... ok
test_SCOP_Raf ... ok
test_SCOP_Residues ... ok
test_SCOP_Scop ... ok
test_SProt ... ok
test_SVDSuperimposer ... ok
test_SeqIO ... ok
test_SeqIO_online ... ok
test_SeqUtils ... ok
test_SubsMat ... ok
test_UniGene ... ok
test_Wise ... skipping. Install Wise2 (dnal) if you want to use Bio.Wise.
ok
test_align ... ok
test_docstrings ... ok
test_geo ... ok
test_interpro ... ok
test_kNN ... ok
test_lowess ... ok
test_pairwise2 ... ok
test_prodoc ... ok
test_property_manager ... ok
test_prosite ... ok
test_prosite2 ... ok
test_psw ... skipping. Install Wise2 (dnal) if you want to use Bio.Wise.
ok
test_seq ... ok
test_translate ... ok
test_trie ... ok
test_triefind ... ok
----------------------------------------------------------------------
Ran 96 tests in 172.215s

OK

Pointer to those packages would have been helpful. From the test suite as well
as from installation manual. Moreover, what database username/password would
I have to make to get the BioSQL stuff compiled and tested? ^H^H^H^H^H^H
I see, it gets compiled anyway the tests just were not run.
The installation manual and the output from test suite should be clearer. Thanks, Peter! -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sat Jan 3 22:30:55 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 3 Jan 2009 17:30:55 -0500 Subject: [Biopython-dev] [Bug 2724] New: Unclear? changes between 1.47 and 1.49 Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2724 Summary: Unclear? changes between 1.47 and 1.49 Product: Biopython Version: 1.49 Platform: PC OS/Version: Linux Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: mmokrejs at ribosome.natur.cuni.cz I had a look by diff(1) what files were installed on my machine by 1.47 release and which were installed by 1.49. I don't know what cdistance was about but the mailing list archive search tool does not work, and searching for it manually in raw archives of Oct and Nov 2008 did not help. The second file shown here contains a white space in a filename, not critical but maybe good to rename in next release. -/usr/lib/python2.5/site-packages/Bio/cdistance.so +/usr/share/biopython/Tests/Clustalw/temp horses.dnd -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sun Jan 4 01:10:02 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 3 Jan 2009 20:10:02 -0500 Subject: [Biopython-dev] [Bug 2724] Unclear? 
changes between 1.47 and 1.49
In-Reply-To:
Message-ID: <200901040110.n041A2e5028585@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2724

------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-03 20:10 EST -------
Bio.cdistance was an optional C implementation used within Bio.distance - the C
code was used if available to speed up calculations. You can see the (now
deleted) code in CVS here:
http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Attic/cdistancemodule.c?hideattic=0&cvsroot=biopython

This C code (Bio.cdistance) was removed when the python code (Bio.distance) was
deprecated for release 1.49. This was discussed at the start of October on the
mailing list, see this thread:
http://lists.open-bio.org/pipermail/biopython/2008-October/004532.html

This should have been mentioned in the DEPRECATED file, but wasn't. I've
updated this in CVS, see revision 1.41
http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/DEPRECATED?cvsroot=biopython

Thanks for spotting this omission.

-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Sun Jan 4 01:20:42 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 3 Jan 2009 20:20:42 -0500
Subject: [Biopython-dev] [Bug 2724] Unclear?
changes between 1.47 and 1.49
In-Reply-To:
Message-ID: <200901040120.n041Kgkx029421@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2724

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-03 20:20 EST -------
The file "/usr/share/biopython/Tests/Clustalw/temp horses.dnd" is normally
created by one of the unit tests, test_Clustalw_tool.py (and the space is very
deliberate).

This stray dnd file does appear to have been included with biopython-1.49.zip
(and probably the tar ball as well), which must have been a minor slip on my
part. However, I don't think it's worth re-issuing the archive files over this.

I've updated test_Clustalw_tool.py as of CVS revision 1.4 so that it should
remove this dnd file automatically.

-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Sun Jan 4 01:37:26 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 3 Jan 2009 20:37:26 -0500
Subject: [Biopython-dev] [Bug 2723] Clarify what applies to which version of biopython and other doc cleanup
In-Reply-To:
Message-ID: <200901040137.n041bQ6Z030767@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2723

------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-03 20:37 EST -------
(In reply to comment #0)
> I went to look around at the docs because the built-in tests of 1.49 setup.py
> spitted some messages about external programs missing. I haven't found any
> hints on them in http://news.open-bio.org/news/2008/11/biopython-release-149/.
No, that text and the matching email announcement don't go into details about
installation - the text was already long enough I felt. However, the download
page does list various external programs:
http://biopython.org/wiki/Download

(Someone else had pointed out we were missing a few, which has been fixed, but
I couldn't find the email/bug report while writing this reply).

> Anyway, looking at
> http://biopython.org/DIST/docs/install/Installation.html#htoc17
> I see: "3.4 mxTextTools (no longer needed)". I would propose:
>
> 3.4 mxTextTools (no longer needed since 1.49)
>
> Similarly:
> - 3.1 Numerical Python (NumPy) (strongly recommended)
> + 3.1 Numerical Python (NumPy) (strongly recommended since 1.49)

That does seem sensible.

> Bad URL links are in the text:
>
> 3.3 Database Access (MySQLdb, ...) (optional)
>
> [cut]
>
> Additionally, both MySQLdb and psycopg (a PostgreSQL database adaptor) can be
> used for accessing BioSQL databases through Biopython (see ). Again if you
> -----------------------------------------------------------^
> are not going to use BioSQL, there shouldn't be any need to install these
> modules.
>
>
> 3.4 mxTextTools (no longer needed)
>
> [cut]
>
> However, we currently recommend you install mxTextTools 2.0, as some of the
> API changes made in 3.0 version were not compatible with Biopython. Goto
> ---------------------------------------------------------------------^^
> to download this.

I'll have to check those... probably something silly in the LaTeX source.

> I haven't found an answer for me yet:
>
> test_PopGen_FDist ... skipping. Install FDist if you want to use
> Bio.PopGen.FDist.
> ok
> ...
> test_PopGen_SimCoal ... skipping. Install SIMCOAL2 if you want to use
> Bio.PopGen.SimCoal.
> ok
> ...
> test_Wise ... skipping. Install Wise2 (dnal) if you want to use Bio.Wise.
> ok
> test_psw ... skipping. Install Wise2 (dnal) if you want to use Bio.Wise.
> ok See http://biopython.org/wiki/Download > Pointer to those packages would have been helpful. From the test suite as well > as from installation manual. I'm not keen on making the unit test even more verbose by adding URLs to these messages. The information is on the download page, but yes, adding it to the installation document seems sensible. > Moreover, what database username/password would > I have to make to get the BioSQL stuff compiled and tested? ^H^H^H^H^H^H > I see, it gets compiled anyway the tests just were not run. The BioSQL unit test message should say: "Check settings in Tests/setup_BioSQL.py if you plan to use BioSQL". i.e. Once you have installed BioSQL and setup a database, edit the file setup_BioSQL.py to match. See http://biopython.org/wiki/BioSQL Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sun Jan 4 18:56:22 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 4 Jan 2009 13:56:22 -0500 Subject: [Biopython-dev] [Bug 2723] Minor corrections to the installation document In-Reply-To: Message-ID: <200901041856.n04IuMhJ028749@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2723 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Summary|Clarify what applies to |Minor corrections to the |which version of biopython |installation document |and other doc cleanup | ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-04 13:56 EST ------- (In reply to comment #1) > (In reply to comment #0) > > I went to look around at the docs because the built-in tests of 1.49 > > setup.py spitted some messages about external programs missing. 
I haven't
> > found any hints on them in
> > http://news.open-bio.org/news/2008/11/biopython-release-149/.
>
> No, that text and the matching email announcement don't go into details about
> installation - the text was already long enough I felt. However, the download
> page does list various external programs:
> http://biopython.org/wiki/Download

I've added a section on third party tools to the installation document in CVS.

> > Anyway, looking at
> > http://biopython.org/DIST/docs/install/Installation.html#htoc17
> > I see: "3.4 mxTextTools (no longer needed)". I would propose:
> >
> > 3.4 mxTextTools (no longer needed since 1.49)
> >
> > Similarly:
> > - 3.1 Numerical Python (NumPy) (strongly recommended)
> > + 3.1 Numerical Python (NumPy) (strongly recommended since 1.49)
>
> That does seem sensible.

On reflection, I don't like the layout with version numbers stuck in the
section names. The NumPy section is already very clear about the fact that this
applies to 1.49 onwards, and that older versions of Biopython needed Numeric
instead. I have tried to clarify the mxTextTools section in CVS.

> > Bad URL links are in the text:
> >
> > 3.3 Database Access (MySQLdb, ...) (optional)
> > ...
> > 3.4 mxTextTools (no longer needed)
> > ...
>
> I'll have to check those... probably something silly in the LaTeX source.

Fixed in CVS.

I'm leaving this bug open until I've updated the HTML and PDF copies of the
installation document on the website. I don't have the tool hevea installed on
this machine, so I can't create the HTML version of the installation document
-- just the PDF. I should be able to do this next week...

-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Sun Jan 4 22:09:47 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 4 Jan 2009 17:09:47 -0500
Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main Biopython distribution
In-Reply-To:
Message-ID: <200901042209.n04M9lJ0010428@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2671

------- Comment #32 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-04 17:09 EST -------
(In reply to comment #30)
> (In reply to comment #29)
> >
> > I propose that in Biopython 1.50 we support both "colour" and "color",
> > but for Biopython 1.51 we add deprecation warnings when "colour" is used.
> >
> > We should probably do the same thing for "centre" and "center" as well...
> >
>
> I agree. We should encourage use of the US spelling in the documentation, to
> catch those new to GD. This approach provides a window for conversion of old
> GD scripts for previous users, which is a good thing.
>

I've updated CVS to switch from centre to center, with properties set up to
allow access under the old spellings, and where I thought it appropriate I've
included both spellings in argument lists. Another set of eyes to check this
wouldn't hurt.

I'm leaving this bug open until we've done the documentation (see my comment
25).

There is also the issue of Bug 2705 for the AT and GC content and skew
functions and any windowing function to help plot these in GenomeDiagram.

-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.
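
The spelling transition described in comment #32 (new US spelling, old spelling kept reachable through a property, deprecation warning added in a later release) could be sketched roughly as follows. This is an illustration only: the class `Track` is a hypothetical stand-in, not a GenomeDiagram class, and the sketch shows the later stage where the old spelling already warns.

```python
import warnings


class Track:
    """Hypothetical stand-in for a GenomeDiagram object (not the real class)."""

    def __init__(self, color="black"):
        self.color = color  # the new, preferred spelling holds the value

    @property
    def colour(self):
        """Old spelling: still readable, but warns and forwards to 'color'."""
        warnings.warn("'colour' is deprecated; use 'color' instead",
                      DeprecationWarning, stacklevel=2)
        return self.color

    @colour.setter
    def colour(self, value):
        """Old spelling: still writable, but warns and forwards to 'color'."""
        warnings.warn("'colour' is deprecated; use 'color' instead",
                      DeprecationWarning, stacklevel=2)
        self.color = value
```

Old scripts that set `track.colour = "red"` keep working, while new code reads and writes `track.color` directly.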
From bugzilla-daemon at portal.open-bio.org Mon Jan 5 16:30:46 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 5 Jan 2009 11:30:46 -0500
Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and write_to_string() are inefficient and don't check inputs
In-Reply-To:
Message-ID: <200901051630.n05GUkun032207@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2711

bsouthey at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |

------- Comment #17 from bsouthey at gmail.com 2009-01-05 11:30 EST -------
I do not consider this bug completely fixed for multiple reasons of which my
patch addressed some of these prior to the creation of the _write function. I
do like where _write is heading as it is making cleaner and more understandable
code.

1) I do not understand the need for the dictionary of modules 'formatdict' in
_write as it creates unnecessary inefficient code. The options need to be part
of the check for the type of output.

2) There is no indication that the output for write and write_to_string only
accepts uppercase. Note the _write function states this but a user will not see
these. I do not understand why lowercase is unacceptable.

3) The check for renderPM at start is really redundant because _write checks
for it (well sort of). It is also an unnecessary delay if renderPM is not used.
If you really must use the dictionary (which I really do not like) I would
suggest something like:
        formatdict = {'PS': renderPS, 'PDF': renderPDF,'SVG': renderSVG}
        try:
            from reportlab.graphics import renderPM
            formatdict.update({'JPG': renderPM, 'BMP': renderPM, 'GIF': renderPM,
                               'PNG': renderPM, 'TIFF': renderPM,'TIF': renderPM})

The current code would show the correct options regardless of status of
renderPM. Perhaps an exception could provide a warning that renderPM is not
present.
4) There is no test for the presence of renderPM. The test function must check for renderPM and should at least provide a warning if not present. Otherwise this is a surprise to a user because not all options will be available. 5) The installation documentation must also indicate that renderPM is optional and also how to install the renderPM module. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Jan 5 16:49:46 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 5 Jan 2009 11:49:46 -0500 Subject: [Biopython-dev] [Bug 2671] Including GenomeDiagram in the main Biopython distribution In-Reply-To: Message-ID: <200901051649.n05GnkVK001550@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2671 ------- Comment #33 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-05 11:49 EST ------- Still to do on the documentation front (as written in comment #25), > > * Updating the existing GenomeDiagram manual to match (different imports, > colour to color), which I think can stay as a separate PDF file. > > * A short introduction to Bio.Graphics including GenomeDiagram as part of > a new chapter in the tutorial? Plus (as pointed out on Bug 2711 / Bug 2710): * Updating the installation instructions so that the ReportLab section also covers renderPM (needed for bitmaps). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is. 
From bugzilla-daemon at portal.open-bio.org Mon Jan 5 16:56:57 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 5 Jan 2009 11:56:57 -0500 Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and write_to_string() are inefficient and don't check inputs In-Reply-To: Message-ID: <200901051656.n05GuvPP002443@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2711 ------- Comment #18 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-05 11:56 EST ------- (In reply to comment #17) > I do not consider this bug completely fixed for multiple reasons of which my > patch addressed some of these prior to the creation of the _write function. I > do like where _write is heading as it is making cleaner and more > understandable code. > > 1) I do not understand the need for the dictionary of modules 'formatdict' in > _write as it creates unnecessary inefficient code. The options need to be part > of the check for the type of output. OK the use of a dictionary is a style thing. You think its ugly and inefficient. Leighton and I don't find it ugly. I thought the if/elif/elif/else alternative you suggested was "ugly". The argument for the type of output does get checked (by catching a KeyError from the dictionary). > 2) There is no indication that the output for write and write_to_string only > accepts uppercase. Note the _write function states this but a user will not > see these. I do not understand why lowercase is unacceptable. As part of Bug 2718, for consistency with the rest of Bio.Graphics I think we should after all accept either case. > 3) The check for renderPM at start is really redundant because _write checks > for it (well sort of). It is also an unnecessary delay if renderPM is not > used. 
If you really must use the dictionary (which I really do not like) I > would suggest something like: > formatdict = {'PS': renderPS, 'PDF': renderPDF,'SVG': renderSVG} > try: > from reportlab.graphics import renderPM > formatdict.update({'JPG': renderPM, 'BMP': renderPM, 'GIF': renderPM, > 'PNG': renderPM, 'TIFF': renderPM,'TIF': renderPM}) I don't see how that would work, because unfortunately with the reportlab API, we must treat renderPM differently to renderPDF, renderPS and renderSVG. > The current code would show the correct options regardless of status > ofrenderPM. Perhaps an exception could provide a warning that renderPM > is not present. Right now we do have a "helpful" exception raised when a bitmap format is requested and renderPM is not installed. > 4) There is no test for the presence of renderPM. The test function must check > for renderPM and should at least provide a warning if not present. Otherwise > this is a surprise to a user because not all options will be available. There is an "on demand" test - via the _write function. As Leighton has already pointed out, this is nasty in that it can come as a surprise to the user. However, as far as I can see the alternative is an error/warning at import time regardless even if the user doesn't need or want bitmap output (i.e. Bug 2710). The current situation strikes me as the lesser of two evils. > 5) The installation documentation must also indicate that renderPM is > optional and also how to install the renderPM module. Yes, we should indicate renderPM is optional. Updating our documentation to cover GenomeDiagram is still pending on Bug 2671. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
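
The dictionary lookup discussed in this exchange (format name checked by catching a KeyError, either letter case accepted, and a helpful error when a bitmap format is requested without renderPM) might look roughly like the sketch below. It is not the code in Diagram.py: the function `choose_renderer` and its `bitmap_available` flag are hypothetical, and module names are returned as plain strings so the example runs without ReportLab installed.

```python
# Vector formats map straight to a renderer; bitmap formats all need renderPM.
VECTOR_FORMATS = {"PS": "renderPS", "PDF": "renderPDF", "SVG": "renderSVG"}
BITMAP_FORMATS = ("JPG", "BMP", "GIF", "PNG", "TIFF", "TIF")


def choose_renderer(output, bitmap_available):
    """Return the renderer module name for a format name given in any case."""
    key = output.upper()  # accept "pdf", "PDF", "Pdf", ...
    try:
        return VECTOR_FORMATS[key]
    except KeyError:
        pass  # not a vector format; fall through to the bitmap check
    if key in BITMAP_FORMATS:
        if not bitmap_available:
            # Raised on demand, only when a bitmap format is actually requested.
            raise RuntimeError(
                "Bitmap output (%s) needs renderPM, which was not found" % key)
        return "renderPM"
    known = sorted(VECTOR_FORMATS) + list(BITMAP_FORMATS)
    raise ValueError(
        "Unknown output format %r; expected one of %s" % (output, ", ".join(known)))
```

Raising only on demand matches the "lesser of two evils" position above: users who never ask for a bitmap never see a complaint about renderPM.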
From bugzilla-daemon at portal.open-bio.org Mon Jan 5 21:46:37 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 5 Jan 2009 16:46:37 -0500 Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and write_to_string() are inefficient and don't check inputs In-Reply-To: Message-ID: <200901052146.n05LkbSZ031281@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2711 ------- Comment #19 from bsouthey at gmail.com 2009-01-05 16:46 EST ------- (In reply to comment #18) > (In reply to comment #17) > > I do not consider this bug completely fixed for multiple reasons of which my > > patch addressed some of these prior to the creation of the _write function. I > > do like where _write is heading as it is making cleaner and more > > understandable code. > > > > 1) I do not understand the need for the dictionary of modules 'formatdict' in > > _write as it creates unnecessary inefficient code. The options need to be part > > of the check for the type of output. > > OK the use of a dictionary is a style thing. You think its ugly and > inefficient. Leighton and I don't find it ugly. I thought the > if/elif/elif/else alternative you suggested was "ugly". > > The argument for the type of output does get checked (by catching a KeyError > from the dictionary). I agree that reportlab makes any solution "ugly" because the different types require different arguments. I agree this is partly a style issue because it is a case of what to do first, when to do it and when to tell the user what is missing. > > > 2) There is no indication that the output for write and write_to_string only > > accepts uppercase. Note the _write function states this but a user will not > > see these. I do not understand why lowercase is unacceptable. > > As part of Bug 2718, for consistency with the rest of Bio.Graphics I think we > should after all accept either case. 
> > > 3) The check for renderPM at start is really redundant because _write checks > > for it (well sort of). It is also an unnecessary delay if renderPM is not > > used. If you really must use the dictionary (which I really do not like) I > > would suggest something like: > > formatdict = {'PS': renderPS, 'PDF': renderPDF,'SVG': renderSVG} > > try: > > from reportlab.graphics import renderPM > > formatdict.update({'JPG': renderPM, 'BMP': renderPM, 'GIF': renderPM, > > 'PNG': renderPM, 'TIFF': renderPM,'TIF': renderPM}) > > I don't see how that would work, because unfortunately with the reportlab API, > we must treat renderPM differently to renderPDF, renderPS and renderSVG. > This just moves the renderPM import into _write and the rest of the code runs if you add: except: renderPM=None > > The current code would show the correct options regardless of status > > ofrenderPM. Perhaps an exception could provide a warning that renderPM > > is not present. > > Right now we do have a "helpful" exception raised when a bitmap format is > requested and renderPM is not installed. Again a style issue because I just find it redundant if we already know that renderPM is not present. > > > 4) There is no test for the presence of renderPM. The test function must check > > for renderPM and should at least provide a warning if not present. Otherwise > > this is a surprise to a user because not all options will be available. > > There is an "on demand" test - via the _write function. As Leighton has > already pointed out, this is nasty in that it can come as a surprise to the > user. However, as far as I can see the alternative is an error/warning at > import time regardless even if the user doesn't need or want bitmap output > (i.e. Bug 2710). The current situation strikes me as the lesser of two evils. > I mean that test_GenomeDiagram should also check for renderPM and provide a warning if not present. So if tests are run then there is some indication that something is missing. 
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Jan 5 22:33:30 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 5 Jan 2009 17:33:30 -0500 Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and write_to_string() are inefficient and don't check inputs In-Reply-To: Message-ID: <200901052233.n05MXUCS002828@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2711 ------- Comment #20 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-05 17:33 EST ------- (In reply to comment #19) > I mean that test_GenomeDiagram should also check for renderPM and provide a > warning if not present. So if tests are run then there is some indication that > something is missing. The way we have our external dependency checking setup, if something is missing the whole test is skipped. I want to keep test_GenomeDiagram.py as it is producing PDF output (with no dependency on renderPM - so that the core GenomeDiagram functionality is tested). However, I had been thinking about adding a (smaller) extra test, say test_GenomeDiagram_bitmaps.py which would need renderPM installed. Alternatively this could be a more general quick test for making PNG etc with all of Bio.Graphics after fixing Bug 2718. This would as you point out mean anyone running the test suite would then be alerted to the fact they may be missing renderPM - which would be a good thing. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
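
A separate bitmap test that is skipped as a whole when renderPM is missing, as suggested in comment #20, could be sketched with the standard library alone. This uses `unittest.skipUnless`, which is not the mechanism Biopython's test runner used at the time; the helper `module_available` and the test class are hypothetical, and finding the module does not guarantee that its compiled backend works.

```python
import importlib.util
import unittest


def module_available(name):
    """Return True if the named module can be located.

    For a dotted name, the parent package is imported as a side effect.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except ImportError:  # raised when a parent package is itself missing
        return False


@unittest.skipUnless(module_available("reportlab.graphics.renderPM"),
                     "Install ReportLab's renderPM if you want bitmap output.")
class BitmapOutputTest(unittest.TestCase):
    """Placeholder for a small bitmap-only test, kept apart from the PDF test."""

    def test_placeholder(self):
        self.assertTrue(module_available("reportlab.graphics.renderPM"))
```

Anyone running the suite without renderPM would then see a skip message for this test while the PDF-based test still runs.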
From bugzilla-daemon at portal.open-bio.org Mon Jan 5 23:20:52 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 5 Jan 2009 18:20:52 -0500
Subject: [Biopython-dev] [Bug 2718] Bio.Graphics and output file formats (PDF, EPS, SVG, and bitmaps)
In-Reply-To:
Message-ID: <200901052320.n05NKqok006769@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2718

------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-05 18:20 EST -------
(In reply to comment #2)
> In addition, I notice that Bio.Graphics.BasicChromosome,
> Bio.Graphics.Comparative and Bio.Graphics.Distribution expect lower case
> formats (currently just pdf and eps) while Bio.Graphics.GenomeDiagram
> expects upper case. We should be consistent, which for backwards
> compatibility would mean accepting either case.

Bio.Graphics.GenomeDiagram will now accept format names in any case.

-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Tue Jan 6 00:16:10 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 5 Jan 2009 19:16:10 -0500
Subject: [Biopython-dev] [Bug 2718] Bio.Graphics and output file formats (PDF, EPS, SVG, and bitmaps)
In-Reply-To:
Message-ID: <200901060016.n060GAfe011559@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2718

------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-05 19:16 EST -------
Created an attachment (id=1186)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1186&action=view)
Adding output function to Bio.Graphics for shared use

This is based on the code from Bio.Graphics.GenomeDiagram.Diagram and would be
called from all the Bio.Graphics modules to output to a file/handle in any
supported file format, in a consistent manner.
This is done as a private function, as I do not want to expose this as a new public API. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Jan 6 00:18:06 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 5 Jan 2009 19:18:06 -0500 Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and write_to_string() are inefficient and don't check inputs In-Reply-To: Message-ID: <200901060018.n060I6eq011760@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2711 ------- Comment #21 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-05 19:18 EST ------- (In reply to comment #17) > I do not consider this bug completely fixed for multiple reasons of which my > patch addressed some of these prior to the creation of the _write function. I > do like where _write is heading as it is making cleaner and more > understandable code. I decided that since ReportLab used a cStringIO or StringIO handle internally to implement its writeToString method, we might as well do the same as it allows a great simplification to the GenomeDiagram write and write_to_string methods (and we can get rid of _write too). See revision 1.14 of Bio/Graphics/GenomeDiagram/Diagram.py http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Graphics/GenomeDiagram/Diagram.py?cvsroot=biopython I hope you'll agree that this is a further improvement (even if the dictionary approach is still used internally). My plan (see Bug 2718) is to move this code into a shared private function for all of the Bio.Graphics modules to use. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
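
The simplification described in comment #21, where write_to_string is implemented by writing to an in-memory handle, can be sketched as below. The `Diagram` class here is a hypothetical stand-in that just stores pre-rendered bytes; the real class draws with ReportLab. `io.BytesIO` is used as the modern counterpart of the cStringIO/StringIO handles mentioned above.

```python
import io


class Diagram:
    """Hypothetical stand-in; the real class renders a drawing with ReportLab."""

    def __init__(self, payload):
        self._payload = payload  # pretend these bytes are the rendered drawing

    def write(self, handle):
        """Write the rendered output to an open binary file-like handle."""
        handle.write(self._payload)

    def write_to_string(self):
        """Return the rendered output as bytes by reusing write()."""
        handle = io.BytesIO()  # in-memory handle instead of a file on disk
        self.write(handle)
        return handle.getvalue()
```

With this layout only `write` needs to know about output formats, and `write_to_string` stays a thin wrapper around it.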
From tiagoantao at gmail.com Tue Jan 6 00:48:12 2009
From: tiagoantao at gmail.com (Tiago Antão)
Date: Tue, 6 Jan 2009 00:48:12 +0000
Subject: [Biopython-dev] Structure and LDNe
Message-ID: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com>

Hi all,

Jason Eshleman (he subscribes to this list also) has made available
code to interact with Structure (a widely used application in
population genetics - the 2 papers related to it have around 3000
citations according to Google scholar). We will try to convert his code
to the Bio.PopGen namespace, create documentation and test cases.
To this we add the existing LDNe code (mine). This all should be ready
in a reasonably fast time frame (I suppose before the next release).

The all important statistics part is still due, I am afraid (I don't
know if anybody has looked at the beta code on git). But at least this
LDNe and Structure code will be ready to go soon.

Tiago

From bugzilla-daemon at portal.open-bio.org Tue Jan 6 02:56:35 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 5 Jan 2009 21:56:35 -0500
Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and write_to_string() are inefficient and don't check inputs
In-Reply-To:
Message-ID: <200901060256.n062uZBF023086@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2711

------- Comment #22 from bsouthey at gmail.com 2009-01-05 21:56 EST -------
(In reply to comment #21)
> (In reply to comment #17)
> > I do not consider this bug completely fixed for multiple reasons of which my
> > patch addressed some of these prior to the creation of the _write function. I
> > do like where _write is heading as it is making cleaner and more
> > understandable code.
>
> I decided that since ReportLab used a cStringIO or StringIO handle internally
> to implement its writeToString method, we might as well do the same as it
> allows a great simplification to the GenomeDiagram write and write_to_string
> methods (and we can get rid of _write too).
>
> See revision 1.14 of Bio/Graphics/GenomeDiagram/Diagram.py
> http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Graphics/GenomeDiagram/Diagram.py?cvsroot=biopython
>
> I hope you'll agree that this is a further improvement (even if the dictionary
> approach is still used internally).
>
> My plan (see Bug 2718) is to move this code into a shared private function for
> all of the Bio.Graphics modules to use.
>

That is great!

Note that reportlab's drawToString first uses its getStringIO() and passes that to drawToFile. I am not sure of the difference between getStringIO() and StringIO(), but getStringIO() might be preferred.

Also, I would presume that checking for the filename would allow you to combine the writing to a file and writing to a string into a single new function to maintain backwards compatibility.
From rhythmbox-devel at maubp.freeserve.co.uk Tue Jan 6 10:01:34 2009
From: rhythmbox-devel at maubp.freeserve.co.uk (Peter)
Date: Tue, 6 Jan 2009 10:01:34 +0000
Subject: [Biopython-dev] Structure and LDNe
In-Reply-To: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com>
References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com>
Message-ID: <320fb6e00901060201s3830c972w4638f5eefcd42b6a@mail.gmail.com>

On Tue, Jan 6, 2009 at 12:48 AM, Tiago Antão wrote:
> Hi all,
>
> Jason Eshleman (he subscribes to this list also) has made available
> code to interact with Structure (a widely used application in
> population genetics - the 2 papers related to it have around 3000
> citations according to Google scholar). We will try to convert his code
> to the Bio.PopGen namespace, create documentation and test cases.
> To this adds the existing LDNe code (mine). This all should be ready
> in a reasonably fast time frame (I suppose before the next release).

That sounds good :)

> The all important statistics part is still due, I am afraid (I don't
> know if anybody has looked at the beta code on git). But at least this
> LDNe and Structure code will be ready to go soon.
>
> Tiago

I haven't looked at any of your code on git - and I probably won't have any spare time till next week. But anyway, do you have the URL handy?

Thanks

Peter

From bugzilla-daemon at portal.open-bio.org Tue Jan 6 12:30:39 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 6 Jan 2009 07:30:39 -0500
Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and write_to_string() are inefficient and don't check inputs
In-Reply-To:
Message-ID: <200901061230.n06CUds2006927@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2711

------- Comment #23 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-06 07:30 EST -------
(In reply to comment #22)
> That is great!
>
> Note that reportlab's drawToString first uses it's getStringIO() and passes
> that to drawToFile. I am not sure the difference between getStringIO() and
> StringIO() but getStringIO() might be preferred.

From going through the ReportLab code a week or two ago, it ends up using cStringIO (or falling back on StringIO) internally.

> Also, I would presume that checking for the filename would allow you to
> combine the writing to a file and writing to a string into a single new
> function to maintain backwards compatibility.

You'd then have one method to write to a string, handle or filename. As I said before, I'm not keen on this - having two very different return values (string or nothing) depending on the arguments, with some special invocation needed to request the string output (maybe None rather than a filename/handle?).

The status quo seems OK here, with a write method (to a handle or filename) and a separate write_to_string method.

From tiagoantao at gmail.com Tue Jan 6 16:52:22 2009
From: tiagoantao at gmail.com (Tiago Antão)
Date: Tue, 6 Jan 2009 16:52:22 +0000
Subject: [Biopython-dev] Structure and LDNe
In-Reply-To: <320fb6e00901060201s3830c972w4638f5eefcd42b6a@mail.gmail.com>
References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com> <320fb6e00901060201s3830c972w4638f5eefcd42b6a@mail.gmail.com>
Message-ID: <6d941f120901060852r482baf16m6b8399959b3c1aaa@mail.gmail.com>

On Tue, Jan 6, 2009 at 10:01 AM, Peter wrote:
> I haven't looked at any of your code on git - and I probably won't
> have any spare time till next week. But anyway, do you have the URL
> handy?

I gave the code to Giovanni, so it's his URL:
http://github.com/dalloliogm/biopython---popgen/tree/master

The code on Stats is still in a version that will have to be changed.
It is probably only of interest to developers that might have direct interest in the module.
For development purposes I will put the code there (I don't want to commit to the main CVS branch - as it is a production branch - before the code is in an acceptable format).

Tiago

From bsouthey at gmail.com Tue Jan 6 17:41:29 2009
From: bsouthey at gmail.com (Bruce Southey)
Date: Tue, 06 Jan 2009 11:41:29 -0600
Subject: [Biopython-dev] Structure and LDNe
In-Reply-To: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com>
References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com>
Message-ID: <496397C9.3030706@gmail.com>

Tiago Antão wrote:
> Hi all,
>
> Jason Eshleman (he subscribes to this list also) has made available
> code to interact with Structure (a widely used application in
> population genetics - the 2 papers related to it have around 3000
> citations according to Google scholar). We will try to convert his code
> to the Bio.PopGen namespace, create documentation and test cases.
> To this adds the existing LDNe code (mine). This all should be ready
> in a reasonably fast time frame (I suppose before the next release).
>
> The all important statistics part is still due, I am afraid (I don't
> know if anybody has looked at the beta code on git). But at least this
> LDNe and Structure code will be ready to go soon.
>
> Tiago
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>

Hi,
What are the licenses for LDNe and Structure?
Saying just 'free' is insufficient because it is not clear which definition is being used.

Also, please ensure that none of the code that is included in Biopython is a derivative of LDNe or Structure unless these have an explicit license that is compatible with Biopython. For example, 'copying' an existing function into Python would be considered a derivative.
Obviously reading a documented output is probably not considered a derivative.

I prefer to be proactive with licenses so these don't bite back, as has happened in some formally open source projects or through the use of unclean code sources. A current example of this is that the release of scipy 0.7 has been significantly delayed due to a major effort to check various functions that reference the Numerical Recipes book (which has an incompatible license).

Anyhow, this sounds good!

Bruce

From tiagoantao at gmail.com Tue Jan 6 18:10:28 2009
From: tiagoantao at gmail.com (Tiago Antão)
Date: Tue, 6 Jan 2009 18:10:28 +0000
Subject: [Biopython-dev] Structure and LDNe
In-Reply-To: <496397C9.3030706@gmail.com>
References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com> <496397C9.3030706@gmail.com>
Message-ID: <6d941f120901061010n36281702gc073d9f4469d492c@mail.gmail.com>

On Tue, Jan 6, 2009 at 5:41 PM, Bruce Southey wrote:
> What are the licenses for LDNe and Structure?
> Saying just 'free' is insufficient because it is not clear in which
> definition is being used.
>
> Also, please ensure that none of the code that is included into Biopython is
> not a derivative of LDNe and Structure unless these have explicit license
> that is compatible with Biopython. For example, 'copying' an existing
> function into Python would be considered a derivative. Obviously reading a
> documented output is probably not considered a derivative.

Regarding LDNe we have had this discussion in the past. I have some updates/extra info:
1. They only make available a Windows/DOS version. But they will make a Linux version available (compiled by me, I offered to do that). Probably a Mac version also.
2. As I said before, and as is common in population genetics (unfortunately), the software comes with no license at all; they didn't even think that it is an issue.
3. No code is remotely derived or adapted.
Regarding structure, the authors make the source available (a notch better than LDNe) http://pritch.bsd.uchicago.edu/structure.html , but again, they didn't bother to include license info. I am contacting them in order to investigate this. I will report back as soon as I have an answer.

This being said, structure support is way more important than LDNe. The userbase of structure is quite big (just check the earlier factoid on Google Scholar citations).

From dalloliogm at gmail.com Wed Jan 7 10:37:00 2009
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Wed, 7 Jan 2009 11:37:00 +0100
Subject: [Biopython-dev] Structure and LDNe
In-Reply-To: <6d941f120901060852r482baf16m6b8399959b3c1aaa@mail.gmail.com>
References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com> <320fb6e00901060201s3830c972w4638f5eefcd42b6a@mail.gmail.com> <6d941f120901060852r482baf16m6b8399959b3c1aaa@mail.gmail.com>
Message-ID: <5aa3b3570901070237s487a4307hb68fa69abc3cb23d@mail.gmail.com>

On Tue, Jan 6, 2009 at 5:52 PM, Tiago Antão wrote:
> On Tue, Jan 6, 2009 at 10:01 AM, Peter
> wrote:
>> I haven't looked at any of your code on git - and I probably won't
>> have any spare time till next week. But anyway, do you have the URL
>> handy?
>
> I gave the code to Giovanni, so its his URL:
> http://github.com/dalloliogm/biopython---popgen/tree/master

Hi people,
if you want to upload the code there, please tell me and I will give you write access.
However, the right way to do it should be that you create a fork of the code on github, add your changes and work on it locally, and then merge them back again into the original repository. I suppose that is the standard way to use git.

> The code on Stats is still in a version that will have to be changed.
> It is probably only of interest to developers that might have direct
> interest in the module.
> For development purposes I will put the code there (I don't want to
> commit to the main CVS branch - as it is a production branch - before
> the code is in an acceptable format).
>
> Tiago
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>

--
My blog on bioinformatics (now in English): http://bioinfoblog.it

From tiagoantao at gmail.com Wed Jan 7 11:54:19 2009
From: tiagoantao at gmail.com (Tiago Antão)
Date: Wed, 7 Jan 2009 11:54:19 +0000
Subject: [Biopython-dev] Structure and LDNe
In-Reply-To: <5aa3b3570901070237s487a4307hb68fa69abc3cb23d@mail.gmail.com>
References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com> <320fb6e00901060201s3830c972w4638f5eefcd42b6a@mail.gmail.com> <6d941f120901060852r482baf16m6b8399959b3c1aaa@mail.gmail.com> <5aa3b3570901070237s487a4307hb68fa69abc3cb23d@mail.gmail.com>
Message-ID: <6d941f120901070354o70b6c99ah37ffdb38a1af7554@mail.gmail.com>

> However, the right way to do it should be that you create a fork of
> the code on github, add your changes and work on it locally, and then
> merge them back again in the original repository. I suppose that is
> the standard way to use git.

Considering that CVS has no development branch I think having git is very good. I would just recommend extreme care with changing existing code. When merging back into CVS, changes to existing code might not go in (especially if they change interfaces) or be delayed.
Big _design_ changes will have to be discussed in advance.

For my part, what I am including is just new LDNe code and helping Jason with the structure code. So I expect zero impact on existing code and no need for design changes.

Tiago
PS - I am travelling until Saturday, apologies in advance for delayed answers.
From bugzilla-daemon at portal.open-bio.org Wed Jan 7 14:12:46 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 7 Jan 2009 09:12:46 -0500
Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and write_to_string() are inefficient and don't check inputs
In-Reply-To:
Message-ID: <200901071412.n07ECk1n012802@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2711

------- Comment #24 from lpritc at scri.sari.ac.uk 2009-01-07 09:12 EST -------
(In reply to comment #13)
> I can not check this as I am away from my system. As I recall, the Python code
> for accessing this library is provided with the standard install as there is a
> renderPM.py file. But that is just a wrapper to some C code found in the
> rl_addons directory. So it is a big no that renderPM is available unless you
> actually build the C sources or download the binaries (only valid for Windows).

That's not really a big deal, as those are the only two ways to get ReportLab, from reportlab.org!

From the website (http://www.reportlab.org/downloads.html):

"""
We provide precompiled binaries for Windows, but not for any other platform. Many Linux distributors and other UNIX-like OS vendors provide their own binaries for download
"""

The installation procedure for me was to issue:

python setup.py install

at the command line while in the top directory of the source download, which isn't any harder than installing Biopython itself. This installed ReportLab 2.2, including compilation of renderPM.

> According to the website
> http://www.reportlab.org/subversion.html
> "
> It will create subdirectories for reportlab, which is an importable
> python package, and rl_addons which contains the C extensions. The
> latter need building with the contained setup script, but can also be
> downloaded in pre-built form from our downloads page. They rarely
> change.
> "
>
> What did you actually install?
Reportlab 2.2, stable build as ReportLab_2_2.tgz, downloaded on December 15th last year. From the checksum, it's the 11/9 build. I've just checked the SVN trunk, and that also builds renderPM, on the same machine. > In particular where was _renderPM built? Initially, in [download location]/ReportLab_2_2/src/rl_addons/renderPM and the library was installed to /usr/local/lib/python2.4/site-packages/_renderPM.so by the setup script. > Basically we need to document this as there appears to be different ways to > install reporlab (may also be version or svn related). I'm happy with this, but it's not exactly a complicated issue: either the local Reportlab installation does or does not have renderPM; if it does not, then raising an error before the user dedicates too much effort to something that can't work seems at least polite. Also, providing pointers in the documentation to where renderPM can be obtained (at time of last writing) is a good idea. IMO, given the straightforward installation procedure that corrects the issue - which ought not to affect *nix users that do not run precompiled binaries, anyway - I reckon that raising an error will be sufficient for most of the few cases that renderPM is not installed. L. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
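The "raise an error early" idea from comment #24 could look roughly like the sketch below. The helper name `require_module` and the wording of the hint are assumptions for illustration; this is not code from GenomeDiagram. It tries the import once and, if it fails, raises an ImportError that tells the user where to look.

```python
import importlib


def require_module(name, hint):
    """Import and return the named module, or raise ImportError with an install hint.

    Hypothetical helper: the hint text is supplied by the caller.
    """
    try:
        return importlib.import_module(name)
    except ImportError:
        raise ImportError("%s is not available: %s" % (name, hint))
```

Called once at import time with something like `require_module("reportlab.graphics.renderPM", "see reportlab.org")`, this would report the problem before any drawing work is done.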
From bugzilla-daemon at portal.open-bio.org Wed Jan 7 14:33:21 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 7 Jan 2009 09:33:21 -0500 Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and write_to_string() are inefficient and don't check inputs In-Reply-To: Message-ID: <200901071433.n07EXLSn014755@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2711 ------- Comment #25 from lpritc at scri.sari.ac.uk 2009-01-07 09:33 EST ------- (In reply to comment #17) > 1) I do not understand the need for the dictionary of modules 'formatdict' in > _write as it creates unnecessary inefficient code. The options need to be part > of the check for the type of output. The need is that input types are associated with alternative rendering backends. The distribution dictionary approach is highly-readable and readily extendable to accept, for example, lowercase variants of format names that map to the same backend - as in your point number 2. I also don't understand your efficiency argument. Firstly, this step is not AFAIAA a bottleneck, and hardly a priority for optimisation; secondly I do not believe that a distribution dictionary is less efficient than your suggestion. The dictionary achieves the same end in three lines of code, rather than ten for the elif. Also computationally, if the format name is 'TIF', your elif code will always have to cycle through all output format name tests (four conditionals, and an O(n) list search) in order to associate that format with renderPM. This is less efficient than a dictionary approach: retrieving values from dictionaries takes approximately constant time. Not that if we ran profile on the two approaches we'd see much of a difference, of course - this is not a speed-critical step. Also, and in my opinion, elifs are not as easy to maintain, or as readable, as distribution dictionaries. 
> 2) There is no indication that the output for write and write_to_string only > accepts uppercase. Note the _write function states this but a user will not see > these. I do not understand why lowercase is unacceptable. It's not unacceptable - at least, not to me - I just didn't write it to accept lowercase, originally. I've no objection to adding lowercase variants of the format names to the distribution dictionary. > 3) The check for renderPM at start is really redundant because _write checks > for it (well sort of). It is also an unnecessary delay if renderPM is not used. It's not a big speed hit (or is there contradictory data? it's certainly not a speed worry for my work) and, if tested on import, needs only to be done once when GenomeDiagram is imported. > 4) There is no test for the presence of renderPM. The test function must check > for renderPM and should at least provide a warning if not present. Otherwise > this is a surprise to a user because not all options will be available. Raising an error, or at least a warning, is a good idea. I favour raising this error on first import. > 5) The installation documentation must also indicate that renderPM is optional > and also how to install the renderPM module. I'm still not convinced that this is all that big an issue: renderPM is part of the source ReportLab 2.2 distribution, and the instructions on reportlab.org are pretty clear. However, for those users who have pathological installations, a line pointing out that renderPM can be obtained via reportlab.org is a good idea. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
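For readers unfamiliar with the pattern argued over in comment #25, here is a small sketch of a dispatch dictionary that maps format names to a backend. The backend labels are plain strings standing in for the rendering modules, the format list is illustrative, and normalising with `upper()` is one possible way to accept lowercase names (the comment itself proposes adding lowercase keys instead).

```python
# Illustrative mapping only: values are labels standing in for rendering backends.
FORMAT_BACKENDS = {
    "PS": "renderPS",
    "PDF": "renderPDF",
    "SVG": "renderSVG",
    "JPG": "renderPM",
    "PNG": "renderPM",
    "TIF": "renderPM",
}


def backend_for(format_name):
    """Return the backend label for a format name, ignoring case."""
    try:
        return FORMAT_BACKENDS[format_name.upper()]
    except KeyError:
        raise ValueError("Unsupported output format: %r" % format_name)
```

Supporting a new format is then a one-line change to the dictionary, and the lookup cost does not depend on how many formats are listed.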
From bugzilla-daemon at portal.open-bio.org Wed Jan 7 14:38:14 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 7 Jan 2009 09:38:14 -0500 Subject: [Biopython-dev] [Bug 2727] New: PDB.Bio: header should include CRYST1 information Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2727 Summary: PDB.Bio: header should include CRYST1 information Product: Biopython Version: 1.49b Platform: All OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: mok at bioxray.au.dk The unit cell and spacegroup information should be available from PDBParser's get_header() method. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Jan 7 14:40:52 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 7 Jan 2009 09:40:52 -0500 Subject: [Biopython-dev] [Bug 2727] PDB.Bio: header should include CRYST1 information In-Reply-To: Message-ID: <200901071440.n07EeqsZ015513@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2727 ------- Comment #1 from mok at bioxray.au.dk 2009-01-07 09:40 EST ------- Created an attachment (id=1188) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1188&action=view) Patch for parse_pdb_header.py Attached patch will add three keys to the header dictionary: cell, spacegroup and cell_z, giving access to this data gleaned from the CRYST1 record of a PDB file. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
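As an illustration of what extracting this data involves, the sketch below reads one CRYST1 record using the fixed column layout of the PDB format. It is not the attached patch: the function name `parse_cryst1` is hypothetical, and representing `cell` as a tuple of six floats (a, b, c, alpha, beta, gamma) is an assumption; only the three key names come from the comment above.

```python
def parse_cryst1(line):
    """Parse a CRYST1 record into a dict with cell, spacegroup and cell_z keys."""
    if not line.startswith("CRYST1"):
        raise ValueError("not a CRYST1 record")
    # 1-based columns: a 7-15, b 16-24, c 25-33,
    # alpha 34-40, beta 41-47, gamma 48-54, space group 56-66, Z 67-70.
    bounds = [(6, 15), (15, 24), (24, 33), (33, 40), (40, 47), (47, 54)]
    cell = tuple(float(line[start:end]) for start, end in bounds)
    z_field = line[66:70].strip()
    return {
        "cell": cell,
        "spacegroup": line[55:66].strip(),
        # Z may be blank in some files; None is an assumed placeholder.
        "cell_z": int(z_field) if z_field else None,
    }
```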
From bugzilla-daemon at portal.open-bio.org Wed Jan 7 15:10:12 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 7 Jan 2009 10:10:12 -0500
Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and write_to_string() are inefficient and don't check inputs
In-Reply-To:
Message-ID: <200901071510.n07FACPH017825@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2711

------- Comment #26 from bsouthey at gmail.com 2009-01-07 10:10 EST -------
(In reply to comment #24)

I had Reportlab version 2.1 installed but once I upgraded to version 2.2 I got renderPM built. So anyone using reportlab version 2.2 will be happy; those who are not using it will not be happy!

So please ensure that Reportlab version 2.2 (released 11 Sep 2008) or higher is required. Otherwise you must check for renderPM because most people probably have an old version around with renderPM and most distributions (OpenSUSE seems to be an exception if you look in the right place) don't have the 2.2 version yet.

From bugzilla-daemon at portal.open-bio.org Wed Jan 7 15:52:52 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 7 Jan 2009 10:52:52 -0500
Subject: [Biopython-dev] [Bug 2711] GenomeDiagram.py: write() and write_to_string() are inefficient and don't check inputs
In-Reply-To:
Message-ID: <200901071552.n07FqqcX021811@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2711

------- Comment #27 from bsouthey at gmail.com 2009-01-07 10:52 EST -------
(In reply to comment #25)

This is mainly a reportlab issue (API and version problem) and, as Peter said, a style issue.
So the only remaining issue is a unit test that at least checks for the presence of renderPM, because versions of reportlab older than 2.2 may lack it.

From jae at lmi.net Thu Jan 8 22:24:21 2009
From: jae at lmi.net (Jason Eshleman)
Date: Thu, 08 Jan 2009 14:24:21 -0800
Subject: [Biopython-dev] Structure and LDNe
In-Reply-To: <496397C9.3030706@gmail.com>
References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com> <496397C9.3030706@gmail.com>
Message-ID: <6.1.2.0.2.20090108141534.0323a5f0@mail.lmi.net>

Greetings all,

Presently, the code I have for dealing with STRUCTURE is similar to the code for interacting with Clustal in that it does not modify any of the STRUCTURE source code but merely invokes the compiled executable.

Initially, I have used my code in place of their Java front end as it allows for more control of the run-time variables for successive runs with varying run parameters. At some point, I'd like to get it to interface more directly with the STRUCTURE code to be able to pipe results directly to python for parsing rather than working with the STRUCTURE text output, but that's a ways off still.

-Jason

At 09:41 AM 1/6/2009, Bruce Southey wrote:
>Tiago Antão wrote:
>>Hi all,
>>
>>Jason Eshleman (he subscribes to this list also) has made available
>>code to interact with Structure (a widely used application in
>>population genetics - the 2 papers related to it have around 3000
>>citations according to Google scholar). We will try to convert his code
>>to the Bio.PopGen namespace, create documentation and test cases.
>>To this adds the existing LDNe code (mine). This all should be ready
>>in a reasonably fast time frame (I suppose before the next release).
>> >>The all important statistics part is still due, I am afraid (I don't >>know if anybody has looked at the beta code on git). But at least this >>LDNe and Structure code will be ready to go soon. >> >>Tiago >>_______________________________________________ >>Biopython-dev mailing list >>Biopython-dev at lists.open-bio.org >>http://lists.open-bio.org/mailman/listinfo/biopython-dev >> >Hi, >What are the licenses for LDNe and Structure? >Saying just 'free' is insufficient because it is not clear in which >definition is being used. > >Also, please ensure that none of the code that is included into Biopython >is not a deriviative of LDNe and Structure unless these have explicit >license that is compatible with Biopython. For example, 'copying' an >existing function into Python would be considered a derivative. Obviously >reading a documented output is probably not considered a derivative. > >I prefer to be proactive with licenses so these don't bite back like has >happened in some formally open sources projects or use of unclean code >sources. A current example of this is that the current release of scipy >0.7 has been significantly delayed due to some major effort to check >various functions that reference the Numerical Recipes book (which has an >incompatible license). > >Anyhow, this sounds good! 
> >Bruce >_______________________________________________ >Biopython-dev mailing list >Biopython-dev at lists.open-bio.org >http://lists.open-bio.org/mailman/listinfo/biopython-dev From bugzilla-daemon at portal.open-bio.org Fri Jan 9 12:50:37 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 9 Jan 2009 07:50:37 -0500 Subject: [Biopython-dev] [Bug 2727] PDB.Bio: header should include CRYST1 information In-Reply-To: Message-ID: <200901091250.n09Cob1q021245@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2727 ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-09 07:50 EST ------- Hopefully Bio.PDB's owner/maintainer Thomas Hamelryck can comment on this. In the meantime, the code style seems to fit fine with the rest of parse_pdb_header.py which is good. However, you have not updated the parse_pdb_header function's docstring to include the new keys. Furthermore, it would be nice to have the docstring describe the meaning of the cell, z-cell and spacegroup entries you have introduced. I'm also curious about the default values and their meanings. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From rhythmbox-devel at maubp.freeserve.co.uk Fri Jan 9 12:55:13 2009 From: rhythmbox-devel at maubp.freeserve.co.uk (Peter) Date: Fri, 9 Jan 2009 12:55:13 +0000 Subject: [Biopython-dev] Structure and LDNe In-Reply-To: <6d941f120901070354o70b6c99ah37ffdb38a1af7554@mail.gmail.com> References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com> <320fb6e00901060201s3830c972w4638f5eefcd42b6a@mail.gmail.com> <6d941f120901060852r482baf16m6b8399959b3c1aaa@mail.gmail.com> <5aa3b3570901070237s487a4307hb68fa69abc3cb23d@mail.gmail.com> <6d941f120901070354o70b6c99ah37ffdb38a1af7554@mail.gmail.com> Message-ID: <320fb6e00901090455y334ecebdo55cd7c1a718ab499@mail.gmail.com> On Wed, Jan 7, 2009 at 11:54 AM, Tiago Ant?o wrote: > > Considering that CVS has no development branch I think having git is > very good. I would just recommend extreme care with changing existing > code. When merging back into CVS, changes to existing code might not > go in (especially if they change interfaces) or be delayed. > If there is a strong interest in having experimental branches in the official Biopython repository, we could discuss that as an option. Although I would prefer we get moved from CVS to SVN first before actually doing this, in order to keep the migration as simple as possible. 
Peter From biopython at maubp.freeserve.co.uk Fri Jan 9 12:59:00 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 9 Jan 2009 12:59:00 +0000 Subject: [Biopython-dev] Structure and LDNe In-Reply-To: <6.1.2.0.2.20090108141534.0323a5f0@mail.lmi.net> References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com> <496397C9.3030706@gmail.com> <6.1.2.0.2.20090108141534.0323a5f0@mail.lmi.net> Message-ID: <320fb6e00901090459x52976449gddcc4483699e0f56@mail.gmail.com> On Thu, Jan 8, 2009 at 10:24 PM, Jason Eshleman wrote: > Greetings all, > > Presently, the code I have for dealing with STRUCTURE is similar to the code > for interacting with Clustal, in that it does not modify any of the STRUCTURE > source code by merely initiates the compiled executable. Biopython has code for interacting with lots of command line tools, and this neatly avoids any copyright/licence questions about being a derived work. > Initially, I have used my code in place of their Java front end as it allows > for more control of the run-time variables for successive runs with varying > run parameters. At some point, I'd like to get it to interface more > directly with the STRUCTURE code to be able to pipe results directly to > python for parsing rather than working with the STRUCTURE text output but > that's a ways off still. I'm not quite clear what you have in mind, but this would probably need a little more thought from the legal perspective. If STRUCTURE provides an API with header files you can compile against, that should be OK (but I am not a lawyer). Note that do this within Biopython would then mean adding another build time dependency, which would need to be justified in terms of the benefits it brings. 
Peter From bsouthey at gmail.com Fri Jan 9 14:46:15 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Fri, 09 Jan 2009 08:46:15 -0600 Subject: [Biopython-dev] Structure and LDNe In-Reply-To: <320fb6e00901090455y334ecebdo55cd7c1a718ab499@mail.gmail.com> References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com> <320fb6e00901060201s3830c972w4638f5eefcd42b6a@mail.gmail.com> <6d941f120901060852r482baf16m6b8399959b3c1aaa@mail.gmail.com> <5aa3b3570901070237s487a4307hb68fa69abc3cb23d@mail.gmail.com> <6d941f120901070354o70b6c99ah37ffdb38a1af7554@mail.gmail.com> <320fb6e00901090455y334ecebdo55cd7c1a718ab499@mail.gmail.com> Message-ID: <49676337.7050504@gmail.com> Peter wrote: > On Wed, Jan 7, 2009 at 11:54 AM, Tiago Ant?o wrote: > >> Considering that CVS has no development branch I think having git is >> very good. I would just recommend extreme care with changing existing >> code. When merging back into CVS, changes to existing code might not >> go in (especially if they change interfaces) or be delayed. >> >> > > If there is a strong interest in having experimental branches in the > official Biopython repository, we could discuss that as an option. > Although I would prefer we get moved from CVS to SVN first before > actually doing this, in order to keep the migration as simple as > possible. > > Peter > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > I agree that it is essential to move from CVS before doing this but does not prevent any discussion. So I'll start a thread. 
Bruce From bugzilla-daemon at portal.open-bio.org Fri Jan 9 15:59:40 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 9 Jan 2009 10:59:40 -0500 Subject: [Biopython-dev] [Bug 2729] New: Importing Bio.SeqUtils before importing pylab gives a "Bus Error" Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2729 Summary: Importing Bio.SeqUtils before importing pylab gives a "Bus Error" Product: Biopython Version: 1.49 Platform: Macintosh OS/Version: Mac OS Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: stephan_schiffels at mac.com I use the newest cvs version of biopython (2009 Jan 09) and matplotlib 0.90.0 The following two lines crash: import Bio.SeqUtils import pylab I nailed down the problem to lines 122 through 125 in Bio/SeqUtils/__init__.py. Commenting out these four lines SOLVES the bug for me, since I don't use the graphics-functions in the SeqUtils package Best, Stephan -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bsouthey at gmail.com Fri Jan 9 16:18:26 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Fri, 09 Jan 2009 10:18:26 -0600 Subject: [Biopython-dev] Developmental and experimental branches In-Reply-To: <320fb6e00901090455y334ecebdo55cd7c1a718ab499@mail.gmail.com> References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com> <320fb6e00901060201s3830c972w4638f5eefcd42b6a@mail.gmail.com> <6d941f120901060852r482baf16m6b8399959b3c1aaa@mail.gmail.com> <5aa3b3570901070237s487a4307hb68fa69abc3cb23d@mail.gmail.com> <6d941f120901070354o70b6c99ah37ffdb38a1af7554@mail.gmail.com> <320fb6e00901090455y334ecebdo55cd7c1a718ab499@mail.gmail.com> Message-ID: <496778D2.1050801@gmail.com> Hi, In a previous thread (and indicated in others) it was suggested that perhaps Biopython needs some type of development or experimental branch. So this thread is intended to provide some discussion on this and assumes that Biopython has moved to SVN. I think it is a very relevant discussion because Biopython needs an effective approach mainly to handle new code but also to handle significant rewrites of older code. The most important question is do you support creating developmental and experimental branches or not? However, I do not think that this is a yes or no question and I am not concerned about the question at the present time. Rather I am concerned about the burden placed on the maintainers (especially Peter and Michiel), the expression of the developers' needs and how this impacts the community. I am rather neutral on it (probably because I have not contributed any major code to Biopython) but I would like to ensure that the discussion leads to positive changes. I find Biopython interesting and special for various reasons. There is a solid core of functions that are common to many aspects of bioinformatics. But it also contains very specialized code that has a much smaller audience.
Consequently certain parts get considerable exposure and other parts get limited or no exposure. This means that it may be necessary to release beta versions in order to get the necessary exposure, as I assume that code has had sufficient development to be released in the first place. Creating developmental and experimental branches is one way to get this exposure but perhaps branches are not necessary. An alternative approach is creating specialized projects within Biopython that can be used for development and testing. For example, Scipy provides SciKits, which are related code that is typically special purpose or is released under a different license than scipy/numpy. This replaced the sandboxes that existed in prior versions of numpy and scipy. But a recent problem that arose in numpy was how to get code from such a location into numpy by creating an experimental section in the main distribution, but that met some strong resistance. Therefore, I see the following issues that need to be addressed regardless of the approach taken: 0) Must be easy for project maintenance and release as this must not create an extra burden to Biopython! 1) Ensure adequate testing is performed especially to get it out to the appropriate audience and to correct the code and APIs. I consider this rather important because I tend to follow a type of user experience design (http://en.wikipedia.org/wiki/User_experience_design) and software prototyping (http://en.wikipedia.org/wiki/Software_prototyping) for software development. 2) Stabilization of APIs for backwards compatibility as we don't want to change these with each Biopython release. 3) Adequate test coverage especially across platforms and different software versions. For example Windows paths and older software versions can cause problems on other people's machines but not yours. 4) Some type of code review even if it is just to ensure a consistent format (like spaces versus tabs) or compatibility across Python versions and platforms.
5) If developmental or experimental branches are used then how does the code move into the main distribution and how are these branches created and destroyed? Please add other issues. I would appreciate these issues being addressed when appropriate. Regards Bruce Peter wrote: > On Wed, Jan 7, 2009 at 11:54 AM, Tiago Antão wrote: > >> Considering that CVS has no development branch I think having git is >> very good. I would just recommend extreme care with changing existing >> code. When merging back into CVS, changes to existing code might not >> go in (especially if they change interfaces) or be delayed. >> >> > > If there is a strong interest in having experimental branches in the > official Biopython repository, we could discuss that as an option. > Although I would prefer we get moved from CVS to SVN first before > actually doing this, in order to keep the migration as simple as > possible. > > Peter > From bugzilla-daemon at portal.open-bio.org Fri Jan 9 16:27:08 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 9 Jan 2009 11:27:08 -0500 Subject: [Biopython-dev] [Bug 2729] Importing Bio.SeqUtils before importing pylab gives a "Bus Error" In-Reply-To: Message-ID: <200901091627.n09GR88l003529@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2729 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-09 11:27 EST ------- i.e. these lines? try: from Tkinter import * except ImportError: pass What happens with just "import Tkinter" on your machine? Are you using the default Apple installed copy of python? I can see why this might cause trouble if Tkinter does some initialisation at import time. Could you include the actual crash/traceback error please?
Note I see no crash on my MacOS machine (not sure which version of pylab) which has Tkinter. Nor do I see a crash on one of my linux machines (again, not sure which pylab) which does NOT have TKinter. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Jan 9 16:33:59 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 9 Jan 2009 11:33:59 -0500 Subject: [Biopython-dev] [Bug 2729] Importing Bio.SeqUtils before importing pylab gives a "Bus Error" In-Reply-To: Message-ID: <200901091633.n09GXxDS004117@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2729 ------- Comment #2 from mdehoon at ims.u-tokyo.ac.jp 2009-01-09 11:33 EST ------- (In reply to comment #0) > I use the newest cvs version of biopython (2009 Jan 09) and matplotlib 0.90.0 > The following two lines crash: > > import Bio.SeqUtils > import pylab > What do you mean by crash? Also, do you get the same problem with the latest matplotlib (0.98.4 I believe)? If try: from Tkinter import * except ImportError: pass import pylab crashes, then this is not a Biopython bug. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
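One way to check whether the import order alone triggers the crash, independently of Biopython, is to run each order in a fresh interpreter and compare the return codes. This is only a sketch assuming a recent Python; the helper names try_imports and compare_import_orders are made up for illustration, and standard library modules are used as placeholders because Tkinter and pylab may not be installed.

```python
import subprocess
import sys


def try_imports(module_names):
    """Import the named modules, in order, in a fresh interpreter.

    Returns the child's return code: 0 means every import worked, a
    positive value means Python exited with an error (for example an
    ImportError), and on Unix a negative value means the child was
    killed by a signal, which is how a bus error would show up.
    """
    code = "".join("import %s\n" % name for name in module_names)
    return subprocess.call(
        [sys.executable, "-c", code],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )


def compare_import_orders(first, second):
    """Try both import orders and return a dict of return codes."""
    return {
        (first, second): try_imports([first, second]),
        (second, first): try_imports([second, first]),
    }


# Placeholder modules; on the affected machine the interesting call
# would be compare_import_orders("Tkinter", "pylab").
results = compare_import_orders("os", "json")
```

Running each order in its own process means a crash in one order cannot take down the script that is doing the comparison.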
From bugzilla-daemon at portal.open-bio.org Fri Jan 9 16:45:52 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 9 Jan 2009 11:45:52 -0500 Subject: [Biopython-dev] [Bug 2729] Importing Bio.SeqUtils before importing pylab gives a "Bus Error" In-Reply-To: Message-ID: <200901091645.n09GjqFV004905@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2729 ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-09 11:45 EST ------- Created an attachment (id=1189) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1189&action=view) Patch to Bio/SeqUtils/__init__.py to moving the Tkinter imports This patch moves the Tkinter import back into the xGC_skew function as suggested by the old comments in the code, and uses an explicit import list instead of "import *". For the history of this bit of code, see the deleted file Bio/sequtils.py in CVS. I think this is a worthwhile little bit of clean up - but it probably won't have any effect on Stephan's issue with Tkinter/pylab. From bugzilla-daemon at portal.open-bio.org Fri Jan 9 16:53:23 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 9 Jan 2009 11:53:23 -0500 Subject: [Biopython-dev] [Bug 2729] Importing Bio.SeqUtils before importing pylab gives a "Bus Error" In-Reply-To: Message-ID: <200901091653.n09GrN6W005481@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2729 ------- Comment #4 from stephan_schiffels at mac.com 2009-01-09 11:53 EST ------- Hi, importing Tkinter works fine. Only calling import pylab after it crashes... (no traceback... just "bus error"). Here is the shell-output: mac14:~ stschiff$ python Python 2.5 (r25:51908, Apr 19 2007, 16:49:06) [GCC 4.0.1 (Apple Computer, Inc.
build 5250)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> import Tkinter >>> import pylab Bus error mac14:~ stschiff$ The weirdest thing is that importing them the other way around works fine: mac14:~ stschiff$ python Python 2.5 (r25:51908, Apr 19 2007, 16:49:06) [GCC 4.0.1 (Apple Computer, Inc. build 5250)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> import pylab >>> import Tkinter >>> The same holds for first importing pylab and then Bio.SeqUtils... I don't know, it could be that this is just a pathological case on my specific setup. It's still weird though, since matplotlib uses GTK on X11 on my machine, not Tkinter... I don't get it. Maybe this is not a biopython bug after all... sorry and thanks anyway for your concern Stephan (In reply to comment #1) > i.e. these lines? > > try: > from Tkinter import * > except ImportError: > pass > > What happens with just "import Tkinter" on your machine? > > Are you using the default Apple installed copy of python? > > I can see why this might cause trouble if Tkinter does some initialisation at > import time. Could you include the actual crash/traceback error please? > > Note I see no crash on my MacOS machine (not sure which version of pylab) which > has Tkinter. Nor do I see a crash on one of my linux machines (again, not sure > which pylab) which does NOT have TKinter. >
From bugzilla-daemon at portal.open-bio.org Fri Jan 9 17:10:10 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 9 Jan 2009 12:10:10 -0500 Subject: [Biopython-dev] [Bug 2729] Importing Bio.SeqUtils before importing pylab gives a "Bus Error" In-Reply-To: Message-ID: <200901091710.n09HAA5c006886@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2729 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |INVALID ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-09 12:10 EST ------- (In reply to comment #4) > Hi, > importing Tkinter works fine. Only calling import pylab after it crashes... > (no traceback... just "bus error"). You could try going to Applications, Utilities, Console on your Mac to look for any error log associated with the bus error. > Here is the shell-output: > > mac14:~ stschiff$ python > Python 2.5 (r25:51908, Apr 19 2007, 16:49:06) > [GCC 4.0.1 (Apple Computer, Inc. build 5250)] on darwin > Type "help", "copyright", "credits" or "license" for more information. > >>> import Tkinter > >>> import pylab > Bus error > mac14:~ stschiff$ OK - that does seem to confirm that it's a bug with pylab, and therefore isn't Biopython's fault. I'm going to close this bug. I would suggest you update your installation of pylab, and if it still goes wrong, file a bug with pylab. Thanks anyway, Peter
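For readers following the patch described in comment #3, the general idea (import the GUI toolkit inside the function that needs it, with an explicit list of names rather than "import *") can be sketched as below. This is not the actual Bio/SeqUtils code: the names load_gui_names and show_skew_window are hypothetical, the list of imported widgets is an assumption, and the Python 3 module name tkinter is used where the 2009 code would say Tkinter.

```python
def load_gui_names():
    """Import the GUI toolkit only when a graphical function needs it.

    Returns a dict of the few names required, or None if Tkinter is
    not installed. Nothing GUI related runs at module import time.
    """
    try:
        # Explicit names instead of "from tkinter import *".
        from tkinter import Canvas, Scrollbar, Tk
    except ImportError:
        return None
    return {"Tk": Tk, "Canvas": Canvas, "Scrollbar": Scrollbar}


def show_skew_window():
    """Hypothetical stand-in for a plotting function such as xGC_skew."""
    gui = load_gui_names()
    if gui is None:
        raise RuntimeError("Tkinter is not available on this machine")
    root = gui["Tk"]()  # a window is only created when this is called
    canvas = gui["Canvas"](root)
    canvas.pack()
    root.mainloop()


# Looking up the names does not open a window.
gui = load_gui_names()
```

With this layout, simply importing the module never touches Tkinter, so users who do not call the graphical function are unaffected by whatever Tkinter does at import time.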
From bugzilla-daemon at portal.open-bio.org Fri Jan 9 17:10:52 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 9 Jan 2009 12:10:52 -0500 Subject: [Biopython-dev] [Bug 2729] Importing Bio.SeqUtils before importing pylab gives a "Bus Error" In-Reply-To: Message-ID: <200901091710.n09HAqh1006971@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2729 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1189 is|0 |1 obsolete| | ------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-09 12:10 EST ------- (From update of attachment 1189) This didn't turn out to be related to Bug 2729 after all. However, I've checked it in anyway. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From dalloliogm at gmail.com Fri Jan 9 17:17:53 2009 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Fri, 9 Jan 2009 18:17:53 +0100 Subject: [Biopython-dev] Developmental and experimental branches In-Reply-To: <496778D2.1050801@gmail.com> References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com> <320fb6e00901060201s3830c972w4638f5eefcd42b6a@mail.gmail.com> <6d941f120901060852r482baf16m6b8399959b3c1aaa@mail.gmail.com> <5aa3b3570901070237s487a4307hb68fa69abc3cb23d@mail.gmail.com> <6d941f120901070354o70b6c99ah37ffdb38a1af7554@mail.gmail.com> <320fb6e00901090455y334ecebdo55cd7c1a718ab499@mail.gmail.com> <496778D2.1050801@gmail.com> Message-ID: <5aa3b3570901090917m7c82fb17hb4c93235702b120b@mail.gmail.com> On Fri, Jan 9, 2009 at 5:18 PM, Bruce Southey wrote: > Hi, > In a previous thread (and indicated in others) it was suggested that perhaps > Biopython needs some type of development or experimental branch. 
So this > thread is orientated to provide some discussion on this and considers that > Biopython has moved to SVN. Maybe you can consider the approach at the basis of git, in which every developer works on their own personal branch, and the owner of the 'official branch' can decide whether or not to accept the changes contributed by the individual branches. If you want to play a bit with it, you can use my repository at github: - http://github.com/dalloliogm/biopython---popgen/commits/master and then create a fork from it. I am sorry that you will have to create an account on github, but I don't know of any other free hosting service for git repositories. Git also has other advantages over svn, like working locally (which is done by creating a local branch internally) and being faster (this is what they say). Well, I am not a git guru, but I can suggest some good videos, like this one: - http://excess.org/article/2008/07/ogre-git-tutorial/ > I think it is very relevant discussion because > Biopython needs an effective approach to mainly handle new code but also > handle significant rewrites of older code. > > The most important question is do you support creating developmental and > experimental branches or not? > > Please add other issues. > > I would appreciate these issues being addressed when appropriate. > > Regards > Bruce > > Peter wrote: >> >> On Wed, Jan 7, 2009 at 11:54 AM, Tiago Antão wrote: >> >>> >>> Considering that CVS has no development branch I think having git is >>> very good. I would just recommend extreme care with changing existing >>> code. When merging back into CVS, changes to existing code might not >>> go in (especially if they change interfaces) or be delayed. >>> >>> >> >> If there is a strong interest in having experimental branches in the >> official Biopython repository, we could discuss that as an option.
>> Although I would prefer we get moved from CVS to SVN first before >> actually doing this, in order to keep the migration as simple as >> possible. >> >> Peter >> >> _______________________________________________ >> Biopython-dev mailing list >> Biopython-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biopython-dev >> > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > -- My blog on bioinformatics (now in English): http://bioinfoblog.it From biopython at maubp.freeserve.co.uk Fri Jan 9 17:28:06 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 9 Jan 2009 17:28:06 +0000 Subject: [Biopython-dev] Developmental and experimental branches In-Reply-To: <5aa3b3570901090917m7c82fb17hb4c93235702b120b@mail.gmail.com> References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com> <320fb6e00901060201s3830c972w4638f5eefcd42b6a@mail.gmail.com> <6d941f120901060852r482baf16m6b8399959b3c1aaa@mail.gmail.com> <5aa3b3570901070237s487a4307hb68fa69abc3cb23d@mail.gmail.com> <6d941f120901070354o70b6c99ah37ffdb38a1af7554@mail.gmail.com> <320fb6e00901090455y334ecebdo55cd7c1a718ab499@mail.gmail.com> <496778D2.1050801@gmail.com> <5aa3b3570901090917m7c82fb17hb4c93235702b120b@mail.gmail.com> Message-ID: <320fb6e00901090928u662023d6rd6f2d82c5fbb7365@mail.gmail.com> On Fri, Jan 9, 2009 at 5:17 PM, Giovanni Marco Dall'Olio wrote: > On Fri, Jan 9, 2009 at 5:18 PM, Bruce Southey wrote: >> Hi, >> In a previous thread (and indicated in others) it was suggested that perhaps >> Biopython needs some type of development or experimental branch. So this >> thread is orientated to provide some discussion on this and considers that >> Biopython has moved to SVN. 
> > Maybe you can consider the approach at the basis of git, in which > every developer works on its personal branch, and the owner of the > 'official branch' can decide whether to accept the changes apported by > the single branches or not. In some ways this describes the current situation but without the software: The CVS/SVN repository is the master official branch which we (as a group) try and keep pretty stable. When working on new modules, individual developers or contributors have hacked away on their own machines (perhaps using a local repository - I tended to just save versioned snapshots of work in progress), and commit things to the master once it was sufficiently stable to be approved. For self contained modules, this works OK - although using something like git would be a bit more formalised and automated, and allow this kind of "work in progress" to be done openly. Peter From dalloliogm at gmail.com Fri Jan 9 17:43:26 2009 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Fri, 9 Jan 2009 18:43:26 +0100 Subject: [Biopython-dev] Developmental and experimental branches In-Reply-To: <320fb6e00901090928u662023d6rd6f2d82c5fbb7365@mail.gmail.com> References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com> <320fb6e00901060201s3830c972w4638f5eefcd42b6a@mail.gmail.com> <6d941f120901060852r482baf16m6b8399959b3c1aaa@mail.gmail.com> <5aa3b3570901070237s487a4307hb68fa69abc3cb23d@mail.gmail.com> <6d941f120901070354o70b6c99ah37ffdb38a1af7554@mail.gmail.com> <320fb6e00901090455y334ecebdo55cd7c1a718ab499@mail.gmail.com> <496778D2.1050801@gmail.com> <5aa3b3570901090917m7c82fb17hb4c93235702b120b@mail.gmail.com> <320fb6e00901090928u662023d6rd6f2d82c5fbb7365@mail.gmail.com> Message-ID: <5aa3b3570901090943t37b14a4gfd7228eb747f2866@mail.gmail.com> On Fri, Jan 9, 2009 at 6:28 PM, Peter wrote: > On Fri, Jan 9, 2009 at 5:17 PM, Giovanni Marco Dall'Olio > wrote: >> On Fri, Jan 9, 2009 at 5:18 PM, Bruce Southey wrote: >>> Hi, >>> In a previous thread 
(and indicated in others) it was suggested that perhaps >>> Biopython needs some type of development or experimental branch. So this >>> thread is orientated to provide some discussion on this and considers that >>> Biopython has moved to SVN. >> >> Maybe you can consider the approach at the basis of git, in which >> every developer works on their own personal branch, and the owner of the >> 'official branch' can decide whether or not to accept the changes contributed by >> the individual branches. > > In some ways this describes the current situation but without the > software: The CVS/SVN repository is the master official branch which > we (as a group) try and keep pretty stable. When working on new > modules, individual developers or contributors have hacked away on > their own machines (perhaps using a local repository - I tended to > just save versioned snapshots of work in progress), and commit things > to the master once it was sufficiently stable to be approved. For > self contained modules, this works OK - although using something like > git would be a bit more formalised and automated, and allow this kind > of "work in progress" to be done openly. Just a note: since I was trying to simplify the concept, I said something which is not entirely correct. In git, you do not need to have a central repository. Everyone has their own personal branch and there is no such thing as an 'official branch', unless it is defined by convention. For example, look at this graph: - http://github.com/blog/39-say-hello-to-the-network-graph-visualizer On March 6th someone created a fork to work on MySQL support, which has not been merged into the official branch yet. There are many other forks, too: which one is the official one? The answer is none of them, but if the authors wanted, they could have created a repository, decided that it was the official one, and kept it up to date.
> > Peter > -- My blog on bioinformatics (now in English): http://bioinfoblog.it From biopython at maubp.freeserve.co.uk Fri Jan 9 17:49:43 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 9 Jan 2009 17:49:43 +0000 Subject: [Biopython-dev] Developmental and experimental branches In-Reply-To: <5aa3b3570901090943t37b14a4gfd7228eb747f2866@mail.gmail.com> References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com> <320fb6e00901060201s3830c972w4638f5eefcd42b6a@mail.gmail.com> <6d941f120901060852r482baf16m6b8399959b3c1aaa@mail.gmail.com> <5aa3b3570901070237s487a4307hb68fa69abc3cb23d@mail.gmail.com> <6d941f120901070354o70b6c99ah37ffdb38a1af7554@mail.gmail.com> <320fb6e00901090455y334ecebdo55cd7c1a718ab499@mail.gmail.com> <496778D2.1050801@gmail.com> <5aa3b3570901090917m7c82fb17hb4c93235702b120b@mail.gmail.com> <320fb6e00901090928u662023d6rd6f2d82c5fbb7365@mail.gmail.com> <5aa3b3570901090943t37b14a4gfd7228eb747f2866@mail.gmail.com> Message-ID: <320fb6e00901090949v695333ak2615e9c217bc1387@mail.gmail.com> > just a note: since I was trying to simplify the concept, I said > something which is not particularly correct. > In git, you are not needed to have a central repository. Everyone has > its personal branch and there is not such thing as an 'official > branch', unless it is defined by convention. If we did want to adopt a git style approach, I do think we need an official branch which would be used for the releases and installers hosted on biopython.org, and this branch would be managed in much the same way as we do now with CVS/SVN. I think this would be essential for avoiding confusion in the typical end user. 
Peter From bartek at rezolwenta.eu.org Fri Jan 9 18:17:09 2009 From: bartek at rezolwenta.eu.org (Bartek Wilczynski) Date: Fri, 9 Jan 2009 19:17:09 +0100 Subject: [Biopython-dev] Developmental and experimental branches In-Reply-To: <320fb6e00901090928u662023d6rd6f2d82c5fbb7365@mail.gmail.com> References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com> <320fb6e00901060201s3830c972w4638f5eefcd42b6a@mail.gmail.com> <6d941f120901060852r482baf16m6b8399959b3c1aaa@mail.gmail.com> <5aa3b3570901070237s487a4307hb68fa69abc3cb23d@mail.gmail.com> <6d941f120901070354o70b6c99ah37ffdb38a1af7554@mail.gmail.com> <320fb6e00901090455y334ecebdo55cd7c1a718ab499@mail.gmail.com> <496778D2.1050801@gmail.com> <5aa3b3570901090917m7c82fb17hb4c93235702b120b@mail.gmail.com> <320fb6e00901090928u662023d6rd6f2d82c5fbb7365@mail.gmail.com> Message-ID: <8b34ec180901091017o172e4acemf91c0a46a736bdb1@mail.gmail.com> On Fri, Jan 9, 2009 at 6:28 PM, Peter wrote: > On Fri, Jan 9, 2009 at 5:17 PM, Giovanni Marco Dall'Olio > wrote: >> On Fri, Jan 9, 2009 at 5:18 PM, Bruce Southey wrote: >>> Hi, >>> In a previous thread (and indicated in others) it was suggested that perhaps >>> Biopython needs some type of development or experimental branch. So this >>> thread is orientated to provide some discussion on this and considers that >>> Biopython has moved to SVN. >> >> Maybe you can consider the approach at the basis of git, in which >> every developer works on its personal branch, and the owner of the >> 'official branch' can decide whether to accept the changes apported by >> the single branches or not. > > In some ways this describes the current situation but without the > software: The CVS/SVN repository is the master official branch which > we (as a group) try and keep pretty stable. 
When working on new > modules, individual developers or contributors have hacked away on > their own machines (perhaps using a local repository - I tended to > just save versioned snapshots of work in progress), and commit things > to the master once it was sufficiently stable to be approved. For > self contained modules, this works OK - although using something like > git would be a bit more formalised and automated, and allow this kind > of "work in progress" to be done openly. > It can be viewed this way, but the point here is that making this change to the process of development might decrease the amount of work required to join the development. In particular, if you think about adding a new library to biopython, the most sensible way to do it is to branch and then stabilize. I've recently experienced (with Bio.Motif) that it might be tedious even for a very simple task. Also, using a distributed version control system, it is very easy for a small team of people to collaborate on a branch before merging back to the main repository. In the current mode this would be really difficult. And another benefit is that you do not lose the history of changes made "on a branch". As for github, it is currently used by the BioRuby project hosted on open-bio.org. We can try to talk to them and ask about their experiences. I'm not personally involved in any way in it, but it seems that they've basically moved the main branch to github and update the cvs repository only occasionally. I think that for biopython, if we decided to use distributed version control, it would be better to use bazaar+launchpad instead of git+github, for the following reasons: - it's completely free, as opposed to the <300Mb of a free account on github - launchpad could make the transition very easy. They provide a service of importing existing open source projects to launchpad: https://help.launchpad.net/VcsImports They convert the trunk to bazaar for us and set it up to update from the cvs every 6-12 hours.
It would be easy then to see whether we like it like this or not - bazaar is specifically aimed to be more user friendly than git, and allows developers to keep working in a familiar environment when moving from cvs or svn. I think it is important since git itself is really different from cvs and if we switch to anything else, everybody needs to learn the tool. - they use openID, which makes it simpler for people to join (even though you still need another account) - both bazaar and launchpad are developed in python, so they're more python oriented (while github is developed in ruby, so a better choice for bioruby). More on comparing these two possibilities (from the bazaar developers' non-objective point of view): http://bazaar-vcs.org/BzrVsGit These are my 2 cents on the choice of tools for development, but I have to admit that I'm not sure whether it is needed for biopython now. I'm very open to discussion. -- Bartek Wilczynski ================== Postdoctoral fellow EMBL, Furlong group Meyerhoffstrasse 1, 69012 Heidelberg, Germany tel: +49 6221 387 8433 From chapmanb at 50mail.com Fri Jan 9 22:51:55 2009 From: chapmanb at 50mail.com (Brad Chapman) Date: Fri, 9 Jan 2009 17:51:55 -0500 Subject: [Biopython-dev] Developmental and experimental branches In-Reply-To: <8b34ec180901091017o172e4acemf91c0a46a736bdb1@mail.gmail.com> References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com> <320fb6e00901060201s3830c972w4638f5eefcd42b6a@mail.gmail.com> <6d941f120901060852r482baf16m6b8399959b3c1aaa@mail.gmail.com> <5aa3b3570901070237s487a4307hb68fa69abc3cb23d@mail.gmail.com> <6d941f120901070354o70b6c99ah37ffdb38a1af7554@mail.gmail.com> <320fb6e00901090455y334ecebdo55cd7c1a718ab499@mail.gmail.com> <496778D2.1050801@gmail.com> <5aa3b3570901090917m7c82fb17hb4c93235702b120b@mail.gmail.com> <320fb6e00901090928u662023d6rd6f2d82c5fbb7365@mail.gmail.com> <8b34ec180901091017o172e4acemf91c0a46a736bdb1@mail.gmail.com> Message-ID:
<20090109225155.GF4135@sobchak.mgh.harvard.edu> Hi all; In terms of the coding of experimental modules, Giovanni is taking an excellent approach. While they are under development, we can utilize one of the many free hosting platforms to develop them as separate projects in the Bio namespace. This allows interested users to get the code, contribute, and test. Once an interface and functionality are hammered out and they begin to stabilize, then it's a good time to package it up and roll it into Biopython provided the ol' mailing list consensus is happy. This is a nice development model as it leverages the community, but only rolls code into the main release when it stabilizes reasonably well. Peter has taken a really good development methodology -- creating a rock solid stable core of modules, and actively deprecating or fixing those that fall out of line. My only suggestion would be to have a Biopython wiki page for the experimental modules as they are under development. Something simple with a description of the goals and a link to the source code would help the majority of people who don't follow the mailing list find and contribute to these. Brad > On Fri, Jan 9, 2009 at 6:28 PM, Peter wrote: > > On Fri, Jan 9, 2009 at 5:17 PM, Giovanni Marco Dall'Olio > > wrote: > >> On Fri, Jan 9, 2009 at 5:18 PM, Bruce Southey wrote: > >>> Hi, > >>> In a previous thread (and indicated in others) it was suggested that perhaps > >>> Biopython needs some type of development or experimental branch. So this > >>> thread is orientated to provide some discussion on this and considers that > >>> Biopython has moved to SVN. > >> > >> Maybe you can consider the approach at the basis of git, in which > >> every developer works on their own personal branch, and the owner of the > >> 'official branch' can decide whether or not to accept the changes contributed by > >> the individual branches.
> > > > In some ways this describes the current situation but without the > > software: The CVS/SVN repository is the master official branch which > > we (as a group) try and keep pretty stable. When working on new > > modules, individual developers or contributors have hacked away on > > their own machines (perhaps using a local repository - I tended to > > just save versioned snapshots of work in progress), and commit things > > to the master once it was sufficiently stable to be approved. For > > self contained modules, this works OK - although using something like > > git would be a bit more formalised and automated, and allow this kind > > of "work in progress" to be done openly. > > > > It can be viewed this way, but the point here is that making this change to > the process of development might decrease the amount of work required to > join the development. Especially, if you think about adding new library > to biopython, the most sensible way to do it is to branch and then > stabilize. I've > recently experienced (with Bio.Motif) that it might be tedious even > for a very simple > task. Also, using the distributed version control system, it is very > easy for a small team > of people to collaborate on a branch before merging back to the main > repository. In the > current mode this would be really difficult. And another benefit is > that you do not loose > the history of changes made "on a branch". > > As for github, it is currently used by BioRuby project hosted on > open-bio.org. We can try > to talk to them and ask about their experiences. I'm not personally > involved in any way in it, > but it seems, that they've basically moved the main branch to github > and update the cvs repository > only occasionaly. > > I think that for biopython, if we decided to use distributed version > control, it would > be better to use bazaar+launchpad instead of git+github. 
And for the > following reasons: > - it's completely free, as opposed to <300Mb of free account on github > - launchpad could make the transition very easy. They provide a > service of importing existing > open source projects to launchpad: > https://help.launchpad.net/VcsImports They convert the trunk > to bazzaar for us and set it up to update from the cvs every 6-12 > hours. It would be easy then to > see whether we like it like this or not > - bazaar is specifically aimed to be more user friendly than git, and > allows developers > to keep working in a familiar environment when moving from cvs or svn. > I think it is important since git > itself is really different from cvs and if we switch to anything else, > everybody needs to learn the tool. > - they use openID, which makes it simpler for people to join (even > though you still need another > account) > - both bazaar and launchpad are developed in python, so they're more > python oriented > (while github is developed in ruby, so a better choice for bioruby). > > More on comparing these to possibilities (from the bazaar developers > non-objective point of view): > http://bazaar-vcs.org/BzrVsGit > > These are my 2 cents on the choice of tools for development, but I > have to admit that I'm not > sure whether it is needed for biopython now. I'm very open to discussion. 
> > -- > Bartek Wilczynski > ================== > Postdoctoral fellow > EMBL, Furlong group > Meyerhoffstrasse 1, > 69012 Heidelberg, > Germany > tel: +49 6221 387 8433 > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From biopython at maubp.freeserve.co.uk Sat Jan 10 14:46:13 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sat, 10 Jan 2009 14:46:13 +0000 Subject: [Biopython-dev] Developmental and experimental branches In-Reply-To: <20090109225155.GF4135@sobchak.mgh.harvard.edu> References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com> <6d941f120901060852r482baf16m6b8399959b3c1aaa@mail.gmail.com> <5aa3b3570901070237s487a4307hb68fa69abc3cb23d@mail.gmail.com> <6d941f120901070354o70b6c99ah37ffdb38a1af7554@mail.gmail.com> <320fb6e00901090455y334ecebdo55cd7c1a718ab499@mail.gmail.com> <496778D2.1050801@gmail.com> <5aa3b3570901090917m7c82fb17hb4c93235702b120b@mail.gmail.com> <320fb6e00901090928u662023d6rd6f2d82c5fbb7365@mail.gmail.com> <8b34ec180901091017o172e4acemf91c0a46a736bdb1@mail.gmail.com> <20090109225155.GF4135@sobchak.mgh.harvard.edu> Message-ID: <320fb6e00901100646y6132686ap8a928404dd1e36c3@mail.gmail.com> On Fri, Jan 9, 2009 at 10:51 PM, Brad Chapman wrote: > Hi all; > In terms of the coding of experimental modules, Giovanni is taking > an excellent approach. While they are under development, we can > utilize one of the many free hosting platforms to develop it as a > separate project in the Bio namespace. This allows interested users > to get the code, contribute, and test. Once an interface and > functionality is hammered out and they begin to stabilize, then it's > a good time to package it up and roll it into Biopython provided the > ol' mailing list consensus is happy. 
This does describe recent large additions fairly well - such as Bio.SeqIO, Bio.AlignIO, Bio.Entrez, Bio.PopGen and most recently Bio.Graphics.GenomeDiagram (which is a little different in that it was previously publicly available as a separate module).

Modifications to existing bits of code (for example I have some proposals for Seq, SeqRecord and Alignment objects as enhancement bugs) don't really fit this model, and by their nature they require more discussion because they can indirectly affect a lot of code.

> This is a nice development model as it leverages the community, but
> only rolls code into the main release when it stabilizes reasonably
> well. Peter has taken a really good development methodology --
> creating a rock solid stable core of modules, and actively deprecating
> or fixing those that fall out of line.

I really don't deserve all the credit here - Michiel has also been a strong proponent of this "spring cleaning" as needed, for example how our NCBI online bits have been rationalised, refocusing on Bio.Entrez as the preferred module.

> My only suggestion would be to have a Biopython wiki page for the
> experimental modules as they are under development. Something simple
> with a description of the goals and a link to the source code would
> help the majority of people who don't follow the mailing list find
> and contribute to these.

Using the wiki in this way is a nice idea. Tiago - do you fancy adding a PopGen page describing the additions you're working on? As a bonus, once these do get into the main repository, you may find the wiki text will be a useful basis for extending the documentation.
Peter From mjldehoon at yahoo.com Sat Jan 10 16:30:07 2009 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Sat, 10 Jan 2009 08:30:07 -0800 (PST) Subject: [Biopython-dev] Rethinking Biopython's testing framework In-Reply-To: <5aa3b3570812301034r3633ebe0k937e33c731e69ccd@mail.gmail.com> Message-ID: <126502.76038.qm@web62403.mail.re1.yahoo.com> > > We could discuss a modification to run_tests.py so > > that if there is no expected output file > > output/test_XXX for test_XXX.py we just run > > test_XXX.py and check its return value (I think > > Michiel had previously > > suggested something like this). > > I think this should be done inside the test itself. > All the tests should return only a boolean value (passed or > not) and a description of the error. > The tests that make use of an expected output file, they > should open it and do the comparison by themselves, not in > run_tests.py. Sounds attractive, but there is one complication for print-and-compare tests. The code that does the print-and-compare is not trivial (see run_tests.py). It is possible to have the print-and-compare code in a helper module, which is then imported by each print-and-compare test. Still, while currently the print-and-compare tests have the advantage of being simple, they will get more complicated if we require the print-and-compare to be part of each test. Does anybody have an opinion on this? It's either doing the print-and-compare as part of each print-and-compare test script, or requiring a test_suite() function in each unittest-based test script, and assuming that a test script is a unittest-based test script if it contains a test_suite() function. 
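As a rough illustration of the second option described above, a runner could tell the two kinds of test script apart by checking for a test_suite() function. This is only a sketch under stated assumptions: run_test_module is a hypothetical name, the print-and-compare script is assumed to expose its body as a main() function, and none of this is taken from the actual run_tests.py.

```python
import io
import unittest
from contextlib import redirect_stdout


def run_test_module(module, expected_output=None):
    """Run one test module and return (passed, message).

    Hypothetical helper for illustration; not the real run_tests.py logic.
    """
    if callable(getattr(module, "test_suite", None)):
        # unittest-based script: build the suite and inspect the result.
        runner = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0)
        result = runner.run(module.test_suite())
        if result.wasSuccessful():
            return True, "ok"
        return False, "%d failure(s), %d error(s)" % (
            len(result.failures), len(result.errors))

    # print-and-compare script: capture what it prints and compare it
    # with the stored expected output.
    # Assumption: the script wraps its body in a main() function.
    captured = io.StringIO()
    with redirect_stdout(captured):
        module.main()
    if captured.getvalue() == expected_output:
        return True, "ok"
    return False, "output differs from expected"
```

With this layout the comparison code lives in one place, so the print-and-compare scripts themselves stay as simple as they are now.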
--Michiel

From tiagoantao at gmail.com Sat Jan 10 16:48:03 2009
From: tiagoantao at gmail.com (Tiago Antão)
Date: Sat, 10 Jan 2009 16:48:03 +0000
Subject: [Biopython-dev] Developmental and experimental branches
In-Reply-To: <496778D2.1050801@gmail.com>
References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com> <320fb6e00901060201s3830c972w4638f5eefcd42b6a@mail.gmail.com> <6d941f120901060852r482baf16m6b8399959b3c1aaa@mail.gmail.com> <5aa3b3570901070237s487a4307hb68fa69abc3cb23d@mail.gmail.com> <6d941f120901070354o70b6c99ah37ffdb38a1af7554@mail.gmail.com> <320fb6e00901090455y334ecebdo55cd7c1a718ab499@mail.gmail.com> <496778D2.1050801@gmail.com>
Message-ID: <6d941f120901100848h6e186022o241b928ea2566993@mail.gmail.com>

This whole discussion is very interesting. In fact, whatever the conclusions are, I think they should be labeled "official policy" and put on the Wiki.

The biggest problem that I've faced is that, whenever I am doing something, I don't know how acceptable it will be to other developers. I tend to put everything up for discussion before I commit it, and whenever I say something I might get completely different answers from time to time and from different people. The end result is that I hold off committing things because of issues that are raised in an ad-hoc fashion.

There should be a page clarifying things like:
1. Are contributions that have a small target audience accepted?
2. Use of foreign libraries (e.g., SciPy)?
3. Code management policies. Branches? Adding new code? Breaking interfaces?
4. New developers
5. Legal issues
6. Interop with non-free software
7. Code quality strategies. Code review? Testing?
8. Multiplatform issues

I am not asking for a big document. But as questions arise, just discuss them, arrive at a decision and document them. It becomes tiring having to answer the same questions about code that you want to submit over and over again, with different issues every time.
One can live with decisions that are disliked, but it is much more difficult to live when the playing ground is moving all the time. On Fri, Jan 9, 2009 at 4:18 PM, Bruce Southey wrote: > Hi, > In a previous thread (and indicated in others) it was suggested that perhaps > Biopython needs some type of development or experimental branch. So this > thread is orientated to provide some discussion on this and considers that > Biopython has moved to SVN. I think it is very relevant discussion because > Biopython needs an effective approach to mainly handle new code but also > handle significant rewrites of older code. > > The most important question is do you support creating developmental and > experimental branches or not? > > However, I do not think that this is a yes or no answer and I am not > concerned about the question at the present time. Rather I am concerned > about the burden placed on the maintainers (especially Peter and Michiel), > the expression of the developer needs and how this impact the community. I > am rather neutral on it (probably because I have not contributed any major > code to Biopython) but I would like to ensure that the discussion leads to > positive changes. > > I find Biopython interesting and special for various reasons. There is a > solid core of functions that are common to many aspects of bioinformatics. > But it also contains very specialized code that has a much smaller audience. > Consequently certain parts get considerable exposure and other parts get > limited or no exposure. This means that it may be necessary to release beta > versions in order to get the necessary exposure as I assume that code has > had sufficient development to be released in the first place. Creating > developmental and experimental branches is one way to get this exposure but > perhaps branches are not necessary. > > An alternative approach is creating specialized projects within Biopython > that can be used for development and testing. 
For example, Scipy provides > SciKits that are related code that is typically special purpose or is > released under a different license than scipy/numpy. This replaced the > sandboxes that existed in prior versions of numpy and scipy. But a recent > problem arose in numpy was how to get code from such a location into numpy > by creating a experimental section in the main distribution but that met > some strong resistance. > > Therefore, I see the following issues that need to be addressed regardless > of the approach taken: > > 0) Must be easy for project maintenance and release as this must not create > an extra burden to Biopython! > 1) Ensure adequate testing is performed especially to get it out to the > appropriate audience and to correct the code and APIs. I consider this > rather important because I tend to follow a type of user experience design > (http://en.wikipedia.org/wiki/User_experience_design) and software > prototyping (http://en.wikipedia.org/wiki/Software_prototyping) for software > development. > 2) Stabilization of APIs for backwards compatibility as we don't want to > change these with each Biopython release. > 3) Adequate test coverage especially across platforms and different software > versions. For example Windows paths and older software versions can cause > problems on other peoples machines but not yours. > 4) Some type of code review even if it is just to ensure a consistent format > (like spaces versus tabs) or compatibility across Python versions and > platforms. > 5) If developmental or experimental branch are used then how does the code > move into the main distribution and how are these branches created and > destroyed. > > Please add other issues. > > I would appreciate these issues being addressed when appropriate. > > Regards > Bruce > > Peter wrote: >> >> On Wed, Jan 7, 2009 at 11:54 AM, Tiago Ant?o wrote: >> >>> >>> Considering that CVS has no development branch I think having git is >>> very good. 
I would just recommend extreme care with changing existing >>> code. When merging back into CVS, changes to existing code might not >>> go in (especially if they change interfaces) or be delayed. >>> >>> >> >> If there is a strong interest in having experimental branches in the >> official Biopython repository, we could discuss that as an option. >> Although I would prefer we get moved from CVS to SVN first before >> actually doing this, in order to keep the migration as simple as >> possible. >> >> Peter >> >> _______________________________________________ >> Biopython-dev mailing list >> Biopython-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biopython-dev >> > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > -- "Systems can remain irrational far longer than you or I can survive" - Freely adapted from John Maynard Keynes From tiagoantao at gmail.com Sat Jan 10 16:52:44 2009 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Sat, 10 Jan 2009 16:52:44 +0000 Subject: [Biopython-dev] Developmental and experimental branches In-Reply-To: <320fb6e00901100646y6132686ap8a928404dd1e36c3@mail.gmail.com> References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com> <5aa3b3570901070237s487a4307hb68fa69abc3cb23d@mail.gmail.com> <6d941f120901070354o70b6c99ah37ffdb38a1af7554@mail.gmail.com> <320fb6e00901090455y334ecebdo55cd7c1a718ab499@mail.gmail.com> <496778D2.1050801@gmail.com> <5aa3b3570901090917m7c82fb17hb4c93235702b120b@mail.gmail.com> <320fb6e00901090928u662023d6rd6f2d82c5fbb7365@mail.gmail.com> <8b34ec180901091017o172e4acemf91c0a46a736bdb1@mail.gmail.com> <20090109225155.GF4135@sobchak.mgh.harvard.edu> <320fb6e00901100646y6132686ap8a928404dd1e36c3@mail.gmail.com> Message-ID: <6d941f120901100852g47b10e9ar214cf2ad2b206f6@mail.gmail.com> On Sat, Jan 10, 2009 at 2:46 PM, Peter wrote: > Using 
the wiki in this way is a nice idea. Tiago - do you fancy > adding a PopGen page describing the additions you're working on? As a > bonus, once these do get into the main repository, you may find the > wiki text will be a useful basis for extending the documentation. Where do you want me to link the page on the Wiki? From biopython at maubp.freeserve.co.uk Sat Jan 10 17:03:05 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sat, 10 Jan 2009 17:03:05 +0000 Subject: [Biopython-dev] Developmental and experimental branches In-Reply-To: <6d941f120901100852g47b10e9ar214cf2ad2b206f6@mail.gmail.com> References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com> <6d941f120901070354o70b6c99ah37ffdb38a1af7554@mail.gmail.com> <320fb6e00901090455y334ecebdo55cd7c1a718ab499@mail.gmail.com> <496778D2.1050801@gmail.com> <5aa3b3570901090917m7c82fb17hb4c93235702b120b@mail.gmail.com> <320fb6e00901090928u662023d6rd6f2d82c5fbb7365@mail.gmail.com> <8b34ec180901091017o172e4acemf91c0a46a736bdb1@mail.gmail.com> <20090109225155.GF4135@sobchak.mgh.harvard.edu> <320fb6e00901100646y6132686ap8a928404dd1e36c3@mail.gmail.com> <6d941f120901100852g47b10e9ar214cf2ad2b206f6@mail.gmail.com> Message-ID: <320fb6e00901100903v1aa0180bsd1ca5335f7da1f7f@mail.gmail.com> On Sat, Jan 10, 2009 at 4:52 PM, Tiago Ant?o wrote: > On Sat, Jan 10, 2009 at 2:46 PM, Peter wrote: >> Using the wiki in this way is a nice idea. Tiago - do you fancy >> adding a PopGen page describing the additions you're working on? As a >> bonus, once these do get into the main repository, you may find the >> wiki text will be a useful basis for extending the documentation. > > Where do you want me to link the page on the Wiki? 
How about having two pages:

http://biopython.org/wiki/PopGen
- documentation on the code in the current official release,
- linked to from the main doc page

http://biopython.org/wiki/PopGen_dev
- discussion and links to your branch etc,
- linked to from the above PopGen page

This would be consistent with how I did the Bio.SeqIO pages,
http://biopython.org/wiki/SeqIO
http://biopython.org/wiki/SeqIO_dev

If you think you have a better idea, feel free to make suggestions.

Peter

From peter at maubp.freeserve.co.uk Sat Jan 10 17:46:38 2009
From: peter at maubp.freeserve.co.uk (Peter)
Date: Sat, 10 Jan 2009 17:46:38 +0000
Subject: [Biopython-dev] Developmental policies
Message-ID: <320fb6e00901100946g62e26170o7e112f3b2f357e76@mail.gmail.com>

On Sat, Jan 10, 2009 at 4:48 PM, Tiago Antão wrote:
> This whole discussion is very interesting. In fact, whatever are the
> conclusions I think they should be labeled "official policy" and put on
> the Wiki.

That sounds good.

> The biggest problem that I've faced is that, whenever I am doing
> something, I don't know the level of acceptability with other
> developers. I tend to put everything to discussion before I commit it
> and whenever I say something I might get completely different answers
> from time to time and from different people. The end result is that I
> defer from committing things because of issues that are raised in an
> ad-hoc fashion.

Asking before doing things is in general a good plan. Sadly not everyone will be free to respond at any one time - but I agree with you that having more of the de facto policy written out explicitly would help.

> There should be a page clarifying things like:
> 1. Are contributions that have a small target audience accepted?

Historically yes, this has happened - although my impression is that the bar was perhaps set too low. I would say some things were accepted without sufficient documentation and tests.
The problem with small interest modules is that if the original developer moves on, in the absence of any apparent users, the module gets abandoned. This seems to explain several of the smaller modules we've deprecated in the last couple of years.

On the other hand, some things will start with a small target audience that will grow. If I was confident that the developer concerned would stick around for several years and was prepared to deal with documentation, unit tests and bug fixes then I would be much happier about including something, even if it might have a relatively small target audience initially.

> 2. Use of foreign libraries (e.g., SciPy)?

I think the current stance has been to try and minimise 3rd party dependencies, other than the special case of python wrappers for command line tools. This makes it much easier for beginners to install and use Biopython, and lowering the barrier to entry is a good thing.

There are practical points here too. In general, 3rd party dependencies can be a pain (e.g. our Martel parsers broke when mxTextTools changed their API between 2.0 and 3.0). Similarly they can restrict the distribution of Biopython (e.g. NumPy isn't yet available on Windows for Python 2.6), and will also be a potential road block for moving to Python 3. As another example, a small part of Bio.PDB uses flex in a parser, and again this makes building and distributing it a real pain (so much so that it's been commented out by default).

However, run time only dependencies (like pure python libraries and command line tools) are not such an issue for packaging/distribution, e.g. ReportLab (used in Bio.Graphics only). If SciPy were to be used by part of Bio.PopGen, and this didn't affect packaging/distribution, then this might be OK.

> 3. Code management policies. Branches? Adding new code? Breaking interfaces?

Biopython has historically worked from a stable trunk.
As a consequence we try and avoid breaking interfaces, instead adopting a gradual deprecation of an old interface when adding a new interface, or adding enhancements in a backwards compatible manner.

> 4. New developers

I think there is something written down about this already...

> 5. Legal issues

Try and avoid them? What did you mean in particular?

> 6. Interop with non-free software

This is linked to the legal issues question. Many of the tools we link to like BLAST aren't open source, but are "free" as in cost. I don't think we have any examples of non-free software.

> 7. Code quality strategies. Code review? Testing?

Code review:
For new code in a specialist area, it can be difficult to get a qualified second opinion on the approach, but existing developers can at least comment on the coding style. For existing code, my impression is module owners have been trusted to make changes to "their" code without review - and generally speaking this has worked out OK. Although if anyone spots someone making a change they disagree with, then please do raise it. I would hope any larger change had some discussion beforehand - possibly via enhancement entries on bugzilla.

Testing:
I'd strongly resist adding any new module without an accompanying test, and wish this had been a firm policy from day one.

> 8. Multiplatform issues

Ideally everything should be cross platform (like python itself). There are exceptions to this - in particular some 3rd party tools are not cross platform. I personally use and test on Windows, Linux and Mac - and I believe Michiel does too.

> I am not saying a big document. But as questions arise, just discuss
> them, arrive at a decision and document them. It becomes tiring having
> to answer the same questions about code that you want to submit over
> and over again and with different issues every time.
> One can live with decisions that are disliked, but it is much more
> difficult to live when the playing ground is moving all the time.
I'm sorry if you've had that feeling. However, circumstances change. As I recall when you first asked about using SciPy as a dependency, Biopython was still using Numeric instead of Numpy - so using SciPy had to wait until after that transition. Now that we have moved to NumPy, I think you have a much stronger case. Peter From tiagoantao at gmail.com Sat Jan 10 18:31:05 2009 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Sat, 10 Jan 2009 18:31:05 +0000 Subject: [Biopython-dev] Developmental policies In-Reply-To: <320fb6e00901100946g62e26170o7e112f3b2f357e76@mail.gmail.com> References: <320fb6e00901100946g62e26170o7e112f3b2f357e76@mail.gmail.com> Message-ID: <6d941f120901101031w22d9730dy87353cf22646d6fc@mail.gmail.com> > mxTextTools changed their API between 2.0 and 3.0). Similarly they > can restrict the distribution of Biopython (e.g. NumPy isn't get > available on Windows for Python 2.6), and will also be a potential > road block for moving to Python 3. As another example, a small part By the way, another issue that would be interesting to address is deprecation of older Python versions and Python 3. Like just having a clear stance on what is the current feeling about this. It seems to be a recurring question. >> 5. Legal issues > > Try and avoid them? What did you mean in particular? In my opinion something should be said about this. Actually I think (suggest) it is essencially a matter of mainly taking Bruce' s comments (e.g. one cannot have derived works of non-free software) and write them down on a wiki page. Just things potential contributor would have to be aware of on a legal front. > Testing: > I'd strongly resist adding any new module without an accompanying > test, and wish this had been a firm policy from day one. People should also be encouraged to test (in as much as possible) in at least Win/Linux/Mac. Of course, for some people it will be difficult as access to all platforms is not always possible for everybody. 
But at least it should be encouraged...

> I'm sorry if you've had that feeling. However, circumstances change.
> As I recall when you first asked about using SciPy as a dependency,
> Biopython was still using Numeric instead of Numpy - so using SciPy
> had to wait until after that transition. Now that we have moved to
> NumPy, I think you have a much stronger case.

Boss, don't say sorry, I think everybody would agree that you make a most fantastic effort.

Regarding circumstances: when circumstances change, one would amend the documents. Again, my point is not in favour of this or that policy, only that a barebones policy should be documented. If people know what the basic rules are, they can have realistic expectations about whether code will be accepted into the stable distribution.

From peter at maubp.freeserve.co.uk Sat Jan 10 20:10:27 2009
From: peter at maubp.freeserve.co.uk (Peter)
Date: Sat, 10 Jan 2009 20:10:27 +0000
Subject: [Biopython-dev] Developmental policies
In-Reply-To: <6d941f120901101031w22d9730dy87353cf22646d6fc@mail.gmail.com>
References: <320fb6e00901100946g62e26170o7e112f3b2f357e76@mail.gmail.com> <6d941f120901101031w22d9730dy87353cf22646d6fc@mail.gmail.com>
Message-ID: <320fb6e00901101210k5e715beer240aa9338aa7ce2f@mail.gmail.com>

On Sat, Jan 10, 2009 at 6:31 PM, Tiago Antão wrote:
> By the way, another issue that would be interesting to address is
> deprecation of older Python versions and Python 3. Like just having a
> clear stance on what is the current feeling about this. It seems to be
> a recurring question.

Regarding older versions of python, we have stated that Biopython 1.49 should work on Python 2.3 to 2.6, and we expect to do the same for Biopython 1.50. Thereafter, we will probably drop support for Python 2.3 (unless anyone has a strong need for it and makes their voice heard).
See the mailing list archive and the corresponding new postings: http://news.open-bio.org/news/2008/11/biopython-and-python-26-and-python-23/ http://news.open-bio.org/news/2008/11/biopython-release-149/ Regarding Python 3, one hold up will be neither ReportLab nor NumPy have a clear plan for Python 3 - or at least that is my impression. However, even ignoring those parts of Biopython which use NumPy (e.g. Bio.PDB and Bio.Cluster) and Bio.Graphics (the only use of ReportLab), we have a lot of useful code. In the short term we should be aiming to have everything run under Python 2.6 in warnings mode, as a step towards eventual Python 3 support. Beyond that, I think that it is likely we'll want to use bytes rather than (unicode) strings in Python 3 for the Seq object, but have not given this much thought. >>> 5. Legal issues >> >> Try and avoid them? What did you mean in particular? > > In my opinion something should be said about this. Actually I think > (suggest) it is essencially a matter of mainly taking Bruce' s > comments (e.g. one cannot have derived works of non-free software) and > write them down on a wiki page. Just things potential contributor > would have to be aware of on a legal front. I see what you mean. Perhaps I am naive in thinking this should be common knowledge amongst potential contributors. >> Testing: >> I'd strongly resist adding any new module without an accompanying >> test, and wish this had been a firm policy from day one. > > People should also be encouraged to test (in as much as possible) in > at least Win/Linux/Mac. Of course, for some people it will be > difficult as access to all platforms is not always possible for > everybody. But at least encouragement should be made... Also tests which require additional setup are a pain. The BioSQL tests are an example of this, where it is unavoidable - but any situation like this reduces the number of people/machines where that test will get checked. 
Michiel has stressed this kind of thing as a concern in the past (as I recall). Peter From bugzilla-daemon at portal.open-bio.org Mon Jan 12 14:31:22 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 12 Jan 2009 09:31:22 -0500 Subject: [Biopython-dev] [Bug 2731] New: Adding .upper() and .lower() methods to the Seq object Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2731 Summary: Adding .upper() and .lower() methods to the Seq object Product: Biopython Version: Not Applicable Platform: All OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk BugsThisDependsOn: 2532 OtherBugsDependingO 2351 nThis: As part of making the Seq object more string like (Bug 2351), it would be nice to support the .upper() and .lower() methods. Doing this elegantly will require different case versions of the alphabets (see Bug 2532), perhaps by adding (private) upper and lower methods to the Alphabet object itself. Alternatively, we can handle this without adding new Alphabets by mapping the fixed case IUPAC alphabets to case-less generic alphabets. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
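A minimal sketch of the second alternative mentioned in Bug 2731 above, mapping a fixed-case alphabet to a case-less generic one when the case changes. Everything here is a stand-in written for illustration: MiniSeq, the small alphabet classes and the _FALLBACK table are hypothetical names, not the Bio.Seq or Bio.Alphabet implementations, and the choice that upper() keeps an upper-case alphabet is an assumption.

```python
class Alphabet:
    """Stand-in alphabet; letters=None means any letter in any case."""
    letters = None


class GenericDNA(Alphabet):
    """Case-less generic DNA alphabet (hypothetical stand-in)."""


class UnambiguousDNA(GenericDNA):
    """Fixed upper-case alphabet, in the spirit of the IUPAC ones."""
    letters = "GATC"


# Assumed mapping from fixed-case alphabets to their case-less parent.
_FALLBACK = {UnambiguousDNA: GenericDNA}


def _caseless(alphabet):
    """Return a case-less alphabet instance for the given alphabet."""
    return _FALLBACK.get(type(alphabet), type(alphabet))()


class MiniSeq:
    """Tiny string-like sequence used only to show upper()/lower()."""

    def __init__(self, data, alphabet):
        self.data = data
        self.alphabet = alphabet

    def __str__(self):
        return self.data

    def upper(self):
        # Upper-case letters remain valid for an upper-case alphabet,
        # so the alphabet is kept unchanged.
        return MiniSeq(self.data.upper(), self.alphabet)

    def lower(self):
        # Lower-case letters are not in a fixed upper-case alphabet,
        # so fall back to the case-less generic alphabet.
        return MiniSeq(self.data.lower(), _caseless(self.alphabet))
```

The first alternative in the bug report (explicit lower-case versions of each alphabet) would replace the _FALLBACK lookup with a method on the alphabet itself; which of the two is preferable is left open in the report.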
From bugzilla-daemon at portal.open-bio.org Mon Jan 12 14:31:25 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 12 Jan 2009 09:31:25 -0500 Subject: [Biopython-dev] [Bug 2532] Using IUPAC alphabets in mixed case Seq objects In-Reply-To: Message-ID: <200901121431.n0CEVPFK010376@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2532 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- OtherBugsDependingO| |2731 nThis| | -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Jan 12 14:31:30 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 12 Jan 2009 09:31:30 -0500 Subject: [Biopython-dev] [Bug 2351] Make Seq more like a string, even subclass string? In-Reply-To: Message-ID: <200901121431.n0CEVUDG010399@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2351 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- BugsThisDependsOn| |2731 -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bsouthey at gmail.com Mon Jan 12 17:03:45 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Mon, 12 Jan 2009 11:03:45 -0600 Subject: [Biopython-dev] Developmental policies In-Reply-To: <320fb6e00901101210k5e715beer240aa9338aa7ce2f@mail.gmail.com> References: <320fb6e00901100946g62e26170o7e112f3b2f357e76@mail.gmail.com> <6d941f120901101031w22d9730dy87353cf22646d6fc@mail.gmail.com> <320fb6e00901101210k5e715beer240aa9338aa7ce2f@mail.gmail.com> Message-ID: <496B77F1.9060207@gmail.com> Peter wrote: > On Sat, Jan 10, 2009 at 6:31 PM, Tiago Ant?o wrote: > >> By the way, another issue that would be interesting to address is >> deprecation of older Python versions and Python 3. Like just having a >> clear stance on what is the current feeling about this. It seems to be >> a recurring question. >> > > Regarding older versions of python, we have stated that Biopython 1.49 > should work on Python 2.3 to 2.6, and we expect to do the same for > Biopython 1.50. Thereafter, we will probably drop support for Python > 2.3 (unless anyone has a strong need for it and makes their voice > heard). See the mailing list archive and the corresponding new > postings: > http://news.open-bio.org/news/2008/11/biopython-and-python-26-and-python-23/ > http://news.open-bio.org/news/2008/11/biopython-release-149/ > > Regarding Python 3, one hold up will be neither ReportLab nor NumPy > have a clear plan for Python 3 - or at least that is my impression. > There has been limited information on the numpy list regarding Python 3 but there has been some investigation on this (http://www.scipy.org/Python3k). 
I did ask about Python 3 last year in the thread titled 'Report from SciPy' and Robert Kern's response should be at: http://www.mail-archive.com/numpy-discussion at scipy.org/msg12101.html Also, this thread has the future aims of numpy (obviously still awaiting scipy 0.7): http://www.mail-archive.com/numpy-discussion at scipy.org/msg12091.html I think the main current effort for numpy 1.3 is getting Python 2.6 fully supported (Windows is the main problem) before there will be any further consideration of Python 3. One of the main problems is that numpy uses a few APIs that are deprecated in Python 3. So any porting will not go far until the correct APIs are used, which will probably be after the next numpy release. > However, even ignoring those parts of Biopython which use NumPy (e.g. > Bio.PDB and Bio.Cluster) and Bio.Graphics (the only use of ReportLab), > we have a lot of useful code. In the short term we should be aiming > to have everything run under Python 2.6 in warnings mode, as a step > towards eventual Python 3 support. > While I understand this approach, I do wonder how effective it will be compared to direct porting using the 2to3 tool. One reason is that 2to3 is more than a code converter as it also attempts to guess at what you are trying to do. Anyhow, this is not a trivial task and I am willing to help in that regard. > Beyond that, I think that it is likely we'll want to use bytes rather > than (unicode) strings in Python 3 for the Seq object, but have not > given this much thought. > > >>>> 5. Legal issues >>>> >>> Try and avoid them? What did you mean in particular? >>> >> In my opinion something should be said about this. Actually I think >> (suggest) it is essencially a matter of mainly taking Bruce' s >> comments (e.g. one cannot have derived works of non-free software) and >> write them down on a wiki page. Just things potential contributor >> would have to be aware of on a legal front. >> > > I see what you mean.
Perhaps I am naive in thinking this should be > common knowledge amongst potential contributors. > I think we must be explicit in this and ensure that any accepted code is BSD-compatible because we can not ensure what people really know. Further the license of any application that Biopython interacts with must be clearly stated and the developer is responsible to get one if it does not have one. That way we know what is included and should help users as well in terms of whether or not they can use some application. > >>> Testing: >>> I'd strongly resist adding any new module without an accompanying >>> test, and wish this had been a firm policy from day one. >>> >> People should also be encouraged to test (in as much as possible) in >> at least Win/Linux/Mac. Of course, for some people it will be >> difficult as access to all platforms is not always possible for >> everybody. But at least encouragement should be made... >> > > Also tests which require additional setup are a pain. The BioSQL > tests are an example of this, where it is unavoidable - but any > situation like this reduces the number of people/machines where that > test will get checked. Michiel has stressed this kind of thing as a > concern in the past (as I recall). > > Peter > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > We can not force people to run tests but hope that sufficient people who do cover many of the variations as possible. Do we need to create buildbots (eg http://sourceforge.net/projects/buildbot/)? I do not test or use BioSQL code because I do not use BioSQL and do not run a compatible database on my system. So it would be really great if BioSQL supported sqlite because the database requirements would be alleviated. 
The other related aspect is that certain applications like clustalw must be in the path otherwise the application will not be found and the test skipped. But I do not know how to solve this except perhaps using environmental variables. Regards Bruce From bsouthey at gmail.com Mon Jan 12 17:34:50 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Mon, 12 Jan 2009 11:34:50 -0600 Subject: [Biopython-dev] Structure and LDNe In-Reply-To: <320fb6e00901090459x52976449gddcc4483699e0f56@mail.gmail.com> References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com> <496397C9.3030706@gmail.com> <6.1.2.0.2.20090108141534.0323a5f0@mail.lmi.net> <320fb6e00901090459x52976449gddcc4483699e0f56@mail.gmail.com> Message-ID: <496B7F3A.60407@gmail.com> Peter wrote: > On Thu, Jan 8, 2009 at 10:24 PM, Jason Eshleman wrote: > >> Greetings all, >> >> Presently, the code I have for dealing with STRUCTURE is similar to the code >> for interacting with Clustal, in that it does not modify any of the STRUCTURE >> source code by merely initiates the compiled executable. >> > > Biopython has code for interacting with lots of command line tools, > and this neatly avoids any copyright/licence questions about being a > derived work. > I have no problem with this provided that the parsing follows documented information such a description of the output. I would have a problem if you based it code from another source that uses undocumented information or information not obvious from the output. > >> Initially, I have used my code in place of their Java front end as it allows >> for more control of the run-time variables for successive runs with varying >> run parameters. At some point, I'd like to get it to interface more >> directly with the STRUCTURE code to be able to pipe results directly to >> python for parsing rather than working with the STRUCTURE text output but >> that's a ways off still. 
>> > > I'm not quite clear what you have in mind, but this would probably > need a little more thought from the legal perspective. If STRUCTURE > provides an API with header files you can compile against, that should > be OK (but I am not a lawyer). Note that do this within Biopython > would then mean adding another build time dependency, which would need > to be justified in terms of the benefits it brings. > > Peter > Linking against header files is a gray area but some views considered it to be illegal (see the Linux kernel discussions on that!). It does really depend on whether or not the result can be considered to a derivative. Unless STRUCTURE is released under a BSD-compatible license, you should not use any code from it (and probably should not even look at the code). Just saying the code is free is insufficient because code licensed under the GPL is 'free' but not BSD-compatible. So if STRUCTURE does not have a license then either get one or forget about this until it does have a BSD-compatible license. Alternatively, get STRUCTURE to support your changes. One is being difficult simply because of the potential impact on the Biopython project by including code incompatible with the BSD license. Bruce From biopython at maubp.freeserve.co.uk Mon Jan 12 18:19:03 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 12 Jan 2009 18:19:03 +0000 Subject: [Biopython-dev] Developmental policies In-Reply-To: <496B77F1.9060207@gmail.com> References: <320fb6e00901100946g62e26170o7e112f3b2f357e76@mail.gmail.com> <6d941f120901101031w22d9730dy87353cf22646d6fc@mail.gmail.com> <320fb6e00901101210k5e715beer240aa9338aa7ce2f@mail.gmail.com> <496B77F1.9060207@gmail.com> Message-ID: <320fb6e00901121019h72463a5dl316cabc85100c09d@mail.gmail.com> > We can not force people to run tests but hope that sufficient people who do > cover many of the variations as possible. Do we need to create buildbots (eg > http://sourceforge.net/projects/buildbot/)? 
Some kind of "buildbots" would be nice - possibly with something hosted on the OBF server to hold the reports (even just via the wiki pages would work). I have access to one or two platforms at work which might be able to act in this way, but the infrastructure isn't there yet. > I do not test or use BioSQL code because I do not use BioSQL and do not run > a compatible database on my system. So it would be really great if BioSQL > supported sqlite because the database requirements would be alleviated. This was recently requested on the BioSQL mailing list - and it would be nice. > The other related aspect is that certain applications like clustalw must be > in the path otherwise the application will not be found and the test > skipped. But I do not know how to solve this except perhaps using > environmental variables. Part of setting up a "buildbot" or test server would include installing all the optional command line tools (like ClustalW) so that the full test suite can be run. Peter From bsouthey at gmail.com Mon Jan 12 22:24:00 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Mon, 12 Jan 2009 16:24:00 -0600 Subject: [Biopython-dev] Alphabet case and standards Message-ID: <496BC300.90003@gmail.com> Hi, I am moving a potential discussion away from the bugzilla because it affects at least the following Bugs (please add others): 2351 (Make Seq more like a string, even subclass string? http://bugzilla.open-bio.org/show_bug.cgi?id=2351 ), 2532 (Using IUPAC alphabets in mixed case Seq objects http://bugzilla.open-bio.org/show_bug.cgi?id=2532 ), 2597 (Enforce alphabet letters in Seq objects http://bugzilla.open-bio.org/show_bug.cgi?id=2597 ) 2731 (Adding .upper() and .lower() methods to the Seq object http://bugzilla.open-bio.org/show_bug.cgi?id=2731 ). I am hoping it gets wider feedback than using bugzilla, avoid unnecessary duplication and closure of these bugs. 
From Bug 2351, "Bio.Alphabets.IUPAC defines a number of alphabets with defined lists of valid letters which are in upper case ONLY". But various applications ignore the alphabet case and hence the standards. So this creates the problem of how Biopython should handle alphabet case. If we follow the standard for all modules then there should be no need to do anything except to ensure we follow it. There are numerous examples where the standard is not followed, including user ignorance, simplicity or design (such as using mixed case to denote 'important' things), and various databases and applications do not follow it. But I think that the actual case is irrelevant in most situations and not following the standard would make Biopython inefficient. One suggestion given in two of the bugs is to change the Alphabet object but I believe that this is wrong because you do not know which alphabet to use. If you already know the case then my preferred option is to change the case of your query. Otherwise you would have to obtain and use one alphabet for every case used; for example, a user may need two alphabets to handle upper and lower case, or just one combined one. Also, if mixed case alphabets are used, then an excessive number of alphabets may be required. I think that the current approach is to force the user to use uppercase when interacting with the Alphabet object or anything derived from it (such as an actual alphabet). While this maintains storage of the input case, it does not enforce the standard. This is also inefficient because it requires constant checks for the correct case. Similar to the first suggestion in Bug 2731, I think that we should automatically change the case when creating any sequence-related object and provide a warning that the input has changed. This enforces the standard and probably requires only small changes to the code, but loses the format of the input.
Outside of Biopython, an example of this is the web version of NCBI blast silently converts input case of the query. Less desirable options: a) Enforces the standard such as with Bug 2597 so that an error is return for any sequence-related object if the case is incorrect. This is probably a little too harsh for a difference in case. b) Use regular expressions to ignore case but this will create a large penalty especially if it is not required. Regards Bruce From bugzilla-daemon at portal.open-bio.org Mon Jan 12 22:43:55 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 12 Jan 2009 17:43:55 -0500 Subject: [Biopython-dev] [Bug 2731] Adding .upper() and .lower() methods to the Seq object In-Reply-To: Message-ID: <200901122243.n0CMhtlZ017015@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2731 ------- Comment #1 from bsouthey at gmail.com 2009-01-12 17:43 EST ------- (In reply to comment #0) > As part of making the Seq object more string like (Bug 2351), it would be nice > to support the .upper() and .lower() methods. Sure it would be nice in terms of following the string object, but I do not follow the reasons for having .upper() and .lower() methods to the Seq object. If we follow the standards, these should be unnecessary. The only time that I see is when you want this is to output the sequence. In such situations, the sequence is likely to be a string which has these methods. I do not consider that other applications can handle different case a sufficiently compelling reason. > > Doing this elegantly will require different case versions of the alphabets (see > Bug 2532), perhaps by adding (private) upper and lower methods to the Alphabet > object itself. > > Alternatively, we can handle this without adding new Alphabets by mapping the > fixed case IUPAC alphabets to case-less generic alphabets. 
> These comments suggests that Seq object needs to be case-aware which also affects other methods like string queries. But I think this is a different issue such as whether or not the standards would be enforced than having these two methods. Bruce -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Mon Jan 12 23:04:46 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 12 Jan 2009 23:04:46 +0000 Subject: [Biopython-dev] Alphabet case and standards In-Reply-To: <496BC300.90003@gmail.com> References: <496BC300.90003@gmail.com> Message-ID: <320fb6e00901121504u6e9f3b7fu23e5f2ea25dee003@mail.gmail.com> On Mon, Jan 12, 2009 at 10:24 PM, Bruce Southey wrote: > Hi, > I am moving a potential discussion away from the bugzilla because it affects > at least the following Bugs (please add others): > 2351 (Make Seq more like a string, even subclass string? > http://bugzilla.open-bio.org/show_bug.cgi?id=2351 ), > 2532 (Using IUPAC alphabets in mixed case Seq objects > http://bugzilla.open-bio.org/show_bug.cgi?id=2532 ), > 2597 (Enforce alphabet letters in Seq objects > http://bugzilla.open-bio.org/show_bug.cgi?id=2597 ) > 2731 (Adding .upper() and .lower() methods to the Seq object > http://bugzilla.open-bio.org/show_bug.cgi?id=2731 ). > > I am hoping it gets wider feedback than using bugzilla, avoid unnecessary > duplication and closure of these bugs. Yes, having a discussion on the mailing list is probably better than on bugzilla. I should probably write up my views on this topic explicitly, but I've tried to do so below in reply to your points. > From Bug 2351, "Bio.Alphabets.IUPAC defines a number of alphabets with > defined lists of valid letters which are in upper case ONLY". But various > applications ignore the alphabet case and hence the standards. 
So this > creates the problem of how Biopython should handle alphabet case. > ... I don't want to prevent people from using mixed case or lower case sequences if they want to. However, I do think doing so with an alphabet which is intended to be upper case ONLY should be treated as an error. We currently have a number of generic alphabets which DO NOT define a set of valid letters. We also have some IUPAC derived alphabets which define a set of upper case only expected letters. So, if you want to use lower or mixed case sequences in a Seq object, either (1) use a generic alphabet which does not explicitly define the valid letters (so any characters are allowed), or (2) use an explicit alphabet which includes the relevant cases. This could be a user defined alphabet, or one we added to Biopython. Most of the time in my personal usage, I don't actually care about the precise alphabet - the generic DNA/RNA/protein alphabets suffice. These do not list the expected/allowed letters, and thus can be used for upper case, lower case or mixed case sequences. Working with well defined alphabets is more important when working with things like BLOSUM matrices. > One suggestion given in two of the bugs is to change the Alphabet object but > I believe that this is wrong because you do not know which alphabet to use. The person creating the Seq object should know what kind of data they are dealing with, and if they specifically want to use say "mixed case unambiguous IUPAC DNA" (if this were in Biopython) then that's up to them. If you don't know exactly what you are dealing with, fall back on the generic DNA alphabet, or the generic nucleotide alphabet, or even the generic single letter alphabet. > ... Also, if mixed case alphabets are used, then an excessive number > of alphabets may be required. We *could* introduce mixed case IUPAC alphabets, and lower case IUPAC alphabets to complement the existing upper case IUPAC alphabets (see my patch on 2532).
Yes, this does add a lot of alphabets, and I'm not entirely keen on this either. Maybe just adding mixed case versions would suffice? > I think that current approach is to force to user to using uppercase when > interacting with the Alphabet object or derived from it (such as an actual > alphabet). While this maintains storage of the input case, it does not > enforce the standard. This is also inefficient because it requires constant > checks for the correct case. Right now we don't force the user to do anything. I would like to make the alphabet check strict (Bug 2597), or at least give a warning. Running with this change locally has flagged up several typos in my unit tests - I think it is a good thing. > Similar to the first suggestion in Bug 2731, I think that we should > automatically changes the case when creating any sequence-related object and > provide a warning that the input has changed. This enforces standard and > probably requires small changes to the code but loses the format of the > input. Outside of Biopython, an example of this is the web version of NCBI > blast silently converts input case of the query. My personal view on automatically changing the case of the sequence string when creating a Seq object: NO WAY. You're throwing away potentially important data, and also preventing people from working with mixed case sequences - for no real benefit. > Less desirable options: > a) Enforces the standard such as with Bug 2597 so that an error is return > for any sequence-related object if the case is incorrect. This is probably a > little too harsh for a difference in case. It could be done as a warning for a couple of releases, and later an error. Why do you think it is too harsh? Maybe I am being pedantic here, but lots of code gets written assuming uppercase letters only, and in this situation having any unwanted lower case caught early is a good thing.
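As a rough illustration of the "warning for a couple of releases, and later an error" idea, here is a minimal self-contained sketch. It does not use Biopython: the function name check_letters, the strict flag and the constant UNAMBIGUOUS_DNA_LETTERS are all hypothetical names made up for this example, not the patch on Bug 2597.

```python
import warnings

# Hypothetical stand-in for an explicit upper-case-only alphabet;
# this is NOT Biopython's real Alphabet API.
UNAMBIGUOUS_DNA_LETTERS = "GATC"


def check_letters(sequence, letters=UNAMBIGUOUS_DNA_LETTERS, strict=False):
    """Return the sorted unexpected letters, warning or raising if any exist.

    strict=False models the transitional releases (warning only);
    strict=True models the later behaviour (hard error).
    """
    unexpected = sorted(set(sequence) - set(letters))
    if unexpected:
        message = "Letters %r not in alphabet %r" % (unexpected, letters)
        if strict:
            raise ValueError(message)
        warnings.warn(message)
    return unexpected
```

With something like this, a lower case letter in a sequence declared as upper case only would be reported at construction time rather than surfacing later in code that assumes upper case.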
To my mind the whole point about the user explicitly using for example the IUPAC protein alphabet is they expect the sequence to comply with the IUPAC conventions. I *WANT* to get an error if the sequence contains something invalid like a "@" character, or anything else not in the IUPAC definition. Mixed cases are a special case of this (the IUPAC standards use upper case). > b) Use regular expressions to ignore case but this will create a large > penalty especially if it is not required. I'm not sure what you mean here, but I don't think regular expressions are required. Peter From bugzilla-daemon at portal.open-bio.org Mon Jan 12 23:30:49 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 12 Jan 2009 18:30:49 -0500 Subject: [Biopython-dev] [Bug 2731] Adding .upper() and .lower() methods to the Seq object In-Reply-To: Message-ID: <200901122330.n0CNUnG7021141@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2731 ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-12 18:30 EST ------- Created an attachment (id=1191) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1191&action=view) Patch to Bio/Seq.py ONLY adding upper and lower methods This patch is a proof of principle of how we could add upper and lower methods while following the strict alphabet checking proposed on Bug 2597. The code is a little complicated/nasty in order to localise the change to Bio/Seq.py only.
Here is a usage example with the patch applied:

    >>> from Bio.Seq import Seq
    >>> from Bio.Alphabet import IUPAC
    >>> my_dna = Seq("AGGGTGTTGA",IUPAC.IUPACUnambiguousDNA())
    >>> my_dna
    Seq('AGGGTGTTGA', IUPACUnambiguousDNA())
    >>> my_dna.lower()
    Seq('agggtgttga', NucleotideAlphabet())
    >>> my_dna.lower().upper()
    Seq('AGGGTGTTGA', NucleotideAlphabet())

Note that if we implemented (private) upper and lower methods in the Alphabet objects as I suggested on Bug 2532, the code in the Seq class would be much simpler, e.g.

    def upper(self):
        return Seq(str(self).upper(), self.alphabet._upper())

    def lower(self):
        return Seq(str(self).lower(), self.alphabet._lower())

The generic alphabets (where the list of letters is undefined) would just return self, while the AlphabetEncoders could also implement these methods simply. Individual explicit alphabets (i.e. the IUPAC ones) would have to define sensible upper/lower mappings - perhaps by defining lower case variants (see Bug 2532). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
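To make the delegation idea in this comment concrete, here is a small self-contained sketch using toy classes rather than Biopython itself. ToyAlphabet, ToyUpperDNA, ToyLowerDNA and ToySeq are hypothetical names invented for illustration; the lower case DNA alphabet in particular is an assumption (Bug 2532 only proposes such variants).

```python
class ToyAlphabet:
    """Generic toy alphabet: no fixed letter list, so case changes keep it."""

    letters = None

    def _upper(self):
        return self

    def _lower(self):
        return self


class ToyUpperDNA(ToyAlphabet):
    """Explicit upper-case-only alphabet (hypothetical)."""

    letters = "GATC"

    def _lower(self):
        # Map to the lower case variant instead of keeping upper case letters.
        return ToyLowerDNA()


class ToyLowerDNA(ToyAlphabet):
    """Explicit lower-case-only alphabet (hypothetical variant)."""

    letters = "gatc"

    def _upper(self):
        return ToyUpperDNA()


class ToySeq:
    """Minimal sequence wrapper that delegates alphabet case handling."""

    def __init__(self, data, alphabet):
        self.data = data
        self.alphabet = alphabet

    def __str__(self):
        return self.data

    def upper(self):
        return ToySeq(str(self).upper(), self.alphabet._upper())

    def lower(self):
        return ToySeq(str(self).lower(), self.alphabet._lower())
```

The point of the design is that the sequence class never needs to know which alphabets exist: each alphabet decides what its upper or lower case counterpart is, and a generic alphabet simply returns itself.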
From bugzilla-daemon at portal.open-bio.org Tue Jan 13 00:21:42 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 12 Jan 2009 19:21:42 -0500 Subject: [Biopython-dev] [Bug 2731] Adding .upper() and .lower() methods to the Seq object In-Reply-To: Message-ID: <200901130021.n0D0LgUu024264@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2731 biopython-bugzilla at maubp.freeserve.co.uk changed:

               What    |Removed                     |Added
    ----------------------------------------------------------------------------
    Attachment #1191 is|0                           |1
               obsolete|                            |

------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-12 19:21 EST ------- (From update of attachment 1191) There are a couple of "if" statements which should be "elif", but otherwise the patch seems to cover the basics. However, it does not cover the pathological/evil situation where a LETTER has been used for a stop codon or gap character, e.g. something like this should happen (assuming Bug 2597 is implemented in order to trigger the exception shown):

    >>> from Bio.Seq import Seq
    >>> from Bio.Alphabet import IUPAC, Gapped
    >>> my_dna = Seq("AGGGTXGTTGA",Gapped(IUPAC.IUPACUnambiguousDNA(), "x"))
    Traceback (most recent call last):
    ...
    ValueError: Letter 'X' not in Gapped(IUPACUnambiguousDNA(), 'x')
    >>> my_dna = Seq("AGGGTxGTTGA",Gapped(IUPAC.IUPACUnambiguousDNA(), "x"))
    >>> my_dna.lower()
    Seq('agggtxgttga', Gapped(DNAAlphabet(), 'x'))
    >>> my_dna.lower().upper()
    Seq('AGGGTXGTTGA', Gapped(DNAAlphabet(), 'X'))

I think the most elegant way to deal with the AlphabetEncoders (stop and gaps) is by adding (private) upper/lower methods to the Alphabet objects as I outlined in comment 2. Patch taking this approach to follow... -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Tue Jan 13 00:30:55 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 12 Jan 2009 19:30:55 -0500 Subject: [Biopython-dev] [Bug 2731] Adding .upper() and .lower() methods to the Seq object In-Reply-To: Message-ID: <200901130030.n0D0UtHL024905@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2731 ------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-12 19:30 EST ------- Created an attachment (id=1192) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1192&action=view) Patch to Bio/Seq.py and Bio/Alphabet/__init__.py Implements upper/lower methods in the Seq object, handling the alphabet case conversion in the Alphabet object using (private) upper/lower methods. This could be extended for the IUPAC alphabets if we add lower case variants to those (see Bug 2532). This works for the evil example in comment 3 where the case of any extra characters from an AlphabetEncoder should also be changed. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
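For the "evil" case where a letter is used as the gap character, the wrapper has to change the case of its own extra character as well as that of the alphabet it wraps. The following is only a sketch with toy classes, not the contents of attachment 1192: ToyGenericDNA and ToyGapped are hypothetical names, and the real AlphabetEncoder classes may be structured differently.

```python
class ToyGenericDNA:
    """Toy generic alphabet: no fixed letters, so case changes are a no-op."""

    def _upper(self):
        return self

    def _lower(self):
        return self


class ToyGapped:
    """Toy wrapper adding a gap character to another alphabet."""

    def __init__(self, alphabet, gap_char="-"):
        self.alphabet = alphabet
        self.gap_char = gap_char

    def _upper(self):
        # Convert both the wrapped alphabet and the extra gap character.
        return ToyGapped(self.alphabet._upper(), self.gap_char.upper())

    def _lower(self):
        return ToyGapped(self.alphabet._lower(), self.gap_char.lower())

    def __repr__(self):
        return "ToyGapped(%s(), %r)" % (
            type(self.alphabet).__name__,
            self.gap_char,
        )
```

A gap character such as "-" is unaffected by str.upper() and str.lower(), so the usual case needs no special handling; only letter gap characters like "x" actually change.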
From dalloliogm at gmail.com Tue Jan 13 11:49:19 2009 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Tue, 13 Jan 2009 12:49:19 +0100 Subject: [Biopython-dev] Developmental and experimental branches In-Reply-To: <320fb6e00901100903v1aa0180bsd1ca5335f7da1f7f@mail.gmail.com> References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com> <320fb6e00901090455y334ecebdo55cd7c1a718ab499@mail.gmail.com> <496778D2.1050801@gmail.com> <5aa3b3570901090917m7c82fb17hb4c93235702b120b@mail.gmail.com> <320fb6e00901090928u662023d6rd6f2d82c5fbb7365@mail.gmail.com> <8b34ec180901091017o172e4acemf91c0a46a736bdb1@mail.gmail.com> <20090109225155.GF4135@sobchak.mgh.harvard.edu> <320fb6e00901100646y6132686ap8a928404dd1e36c3@mail.gmail.com> <6d941f120901100852g47b10e9ar214cf2ad2b206f6@mail.gmail.com> <320fb6e00901100903v1aa0180bsd1ca5335f7da1f7f@mail.gmail.com> Message-ID: <5aa3b3570901130349u32924629lcf914579de34626e@mail.gmail.com> On Sat, Jan 10, 2009 at 6:03 PM, Peter wrote: > On Sat, Jan 10, 2009 at 4:52 PM, Tiago Ant?o wrote: >> On Sat, Jan 10, 2009 at 2:46 PM, Peter wrote: >>> Using the wiki in this way is a nice idea. Tiago - do you fancy >>> adding a PopGen page describing the additions you're working on? As a >>> bonus, once these do get into the main repository, you may find the >>> wiki text will be a useful basis for extending the documentation. >> >> Where do you want me to link the page on the Wiki? > > How about having two pages: > > http://biopython.org/wiki/PopGen > - documentation on the code in the current official release, > - linked to from the main doc page > > http://biopython.org/wiki/PopGen_dev ok, I have started writing something there.. 
_______________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > -- My blog on bioinformatics (now in English): http://bioinfoblog.it From tiagoantao at gmail.com Tue Jan 13 12:14:05 2009 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Tue, 13 Jan 2009 12:14:05 +0000 Subject: [Biopython-dev] Structure and LDNe In-Reply-To: <496B7F3A.60407@gmail.com> References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com> <496397C9.3030706@gmail.com> <6.1.2.0.2.20090108141534.0323a5f0@mail.lmi.net> <320fb6e00901090459x52976449gddcc4483699e0f56@mail.gmail.com> <496B7F3A.60407@gmail.com> Message-ID: <6d941f120901130414v3f770f3dy84bc44e4b4a8e25f@mail.gmail.com> > Linking against header files is a gray area but some views considered it to > be illegal (see the Linux kernel discussions on that!). It does really > depend on whether or not the result can be considered to a derivative. Fortunately this is not the case with Jason's code. Anyway, if there is agreement on what you said, I think most of the comments made should be put on the Wiki in some form. I don't mind to draft something myself based on your comments. From tiagoantao at gmail.com Tue Jan 13 12:34:56 2009 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Tue, 13 Jan 2009 12:34:56 +0000 Subject: [Biopython-dev] Developmental policies In-Reply-To: <496B77F1.9060207@gmail.com> References: <320fb6e00901100946g62e26170o7e112f3b2f357e76@mail.gmail.com> <6d941f120901101031w22d9730dy87353cf22646d6fc@mail.gmail.com> <320fb6e00901101210k5e715beer240aa9338aa7ce2f@mail.gmail.com> <496B77F1.9060207@gmail.com> Message-ID: <6d941f120901130434u19c87dehe3c1376b4b20031@mail.gmail.com> > I think we must be explicit in this and ensure that any accepted code is > BSD-compatible because we can not ensure what people really know. 
Further > the license of any application that Biopython interacts with must be clearly > stated and the developer is responsible to get one if it does not have one. > That way we know what is included and should help users as well in terms of > whether or not they can use some application.

One point is not clear to me: if you only interact with a (say command-line or web-based) application, is there a problem if that application has an unspecified license? There are 3 dimensions here that I find important:
1. If biopython interacts with an application with no license, are there possible liabilities with regards to the project? The same question in regards to users?
2. I would note that interaction might be library based (with linking - where we know problems exist), command-line based (are there any problems?) and web-based (are there any problems different from the command-line case?).
3. I would suppose (for licensed non-free apps) that some licenses might not be clear in regards to this kind of usage. Would it be necessary to inspect the licenses in detail?

A strict view regarding software without licenses (i.e. no interaction at all) would require immediate removal of the fdist code (not very important, it is the part that is probably not used by anyone). No inclusion of LDNe code. And more importantly no STRUCTURE interaction code and no Genepop interaction code (although the file format parser that is currently inside is OK).

So, the very pertinent questions are:
1. Can biopython command-line interact with applications with no license?
2. Is biopython interacting with applications (command-line or web) for which the license is not clear regarding interaction with software?
From p.j.a.cock at googlemail.com Tue Jan 13 12:54:57 2009 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 13 Jan 2009 12:54:57 +0000 Subject: [Biopython-dev] Developmental policies In-Reply-To: <6d941f120901130434u19c87dehe3c1376b4b20031@mail.gmail.com> References: <320fb6e00901100946g62e26170o7e112f3b2f357e76@mail.gmail.com> <6d941f120901101031w22d9730dy87353cf22646d6fc@mail.gmail.com> <320fb6e00901101210k5e715beer240aa9338aa7ce2f@mail.gmail.com> <496B77F1.9060207@gmail.com> <6d941f120901130434u19c87dehe3c1376b4b20031@mail.gmail.com> Message-ID: <320fb6e00901130454i13f1faedw29e049f9b9df9478@mail.gmail.com> > So, the very pertinent question are: > 1. Can biopython command-line interact with applications with no license? I think so, yes. If there was a license then it may try and impose rules which could prevent this (possible in some legal jurisdictions?). Even "viral" licences like the GPL should be fine in this context. However, for the Population Genetics software you are talking about, trying to get the authors to make their licence explicit would be worthwhile (even if they just say its given freely to the public domain or whatever the terminology is). > 2. Is biopython interacting with applications (command-line or web) > for which the license is not clear regarding interaction with > software? For command line tools (e.g. ClustalW, BLAST) calling them from a script is common practice. In fact, by the nature command line tools are generally expected to be used in this way. I think we are OK here. For web tools, in some cases the provider provides clear instructions (e.g. NCBI and BLAST and Entrez). Another example is Bio.PDB can fetch files from the FTP site - which is by its nature provided as a public server. In other cases things are perhaps a little less clear cut. Speaking generally, many websites do have conditions imposed in their terms of service (e.g. 
TV listing sites don't want people "screen scraping" with a script to "steal" the schedule information), although these may not be legally enforceable. However, this is unlikely to be a problem in the academic setting applicable to most websites Biopython may interact with. Peter From bsouthey at gmail.com Tue Jan 13 16:50:28 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Tue, 13 Jan 2009 10:50:28 -0600 Subject: [Biopython-dev] Developmental policies In-Reply-To: <6d941f120901130434u19c87dehe3c1376b4b20031@mail.gmail.com> References: <320fb6e00901100946g62e26170o7e112f3b2f357e76@mail.gmail.com> <6d941f120901101031w22d9730dy87353cf22646d6fc@mail.gmail.com> <320fb6e00901101210k5e715beer240aa9338aa7ce2f@mail.gmail.com> <496B77F1.9060207@gmail.com> <6d941f120901130434u19c87dehe3c1376b4b20031@mail.gmail.com> Message-ID: <496CC654.5090806@gmail.com> Tiago Antão wrote: >> I think we must be explicit in this and ensure that any accepted code is >> BSD-compatible because we can not ensure what people really know. Further >> the license of any application that Biopython interacts with must be clearly >> stated and the developer is responsible to get one if it does not have one. >> That way we know what is included and should help users as well in terms of >> whether or not they can use some application. >> > > > A point is not clear here to me: If you only interact with an (say > command-line and web-based) application, is there a problem if that > application has an unspecified license? There are 3 dimensions here > that I find important > 1. If biopython interacts with a application with no license are there > possible liabilities with regards to the project? The same question in > regards to users? > I do not think that there is any real difference between the developer and the user as ignorance is usually not a good defense.
If you use code from another application in your project with little or no modification (such as rewriting the code into Python), or reverse-engineered it, or even looked at the code, then your application could be controlled by the license of that application. Obviously if it has a license then you must abide by those terms. If it does not have a license and you do not get permission to use that code then you have violated the original author's copyrights and you are liable for damages. Of course, as in one of the most important open-source related cases in the USA, the Jacobsen v. Katzer case (e.g. http://www.groklaw.net/article.php?story=2008081313212422 ) about the Java Model Railroad Interface (JMRI), those damages may be nothing. > 2. I would remember that interaction might be library based (with > linking - where we know problems exist), command-line based (are there > any problems?) and web-based (are there any problems different from > the command-line case?). > Unless the application forbids it, there is no problem with how you actually run the application. As Peter said, web tools also have conditions that you have to keep or you will find yourself locked out. The main problem is using someone else's code in your project and the real problem is the actual terms of the code used. Using a function from that code in yours, such as one that parses the output, is a potential violation, especially if the output is in a binary format. If your code clearly follows the published documentation or a clean-room approach (see http://en.wikipedia.org/wiki/Clean_room_design ) was properly used then there should be no problems. Linking only becomes a problem if your code can be considered a derivative or the license forbids linking such as the GPL but not the LGPL. However, this is a grey area as evident from the use of binary drivers in Linux. > 3. I would suppose (for licensed non-free apps) that some licenses > might not be clear in regards to this kind of usage.
Would it be > necessary to inspect the licenses in detail? > Yes, you must inspect any license in detail because even downloading the code can involve or imply acceptance of the terms. Some licenses, usually for commercial applications, are rather nasty in terms of what can and cannot be done, such as forbidding reverse engineering. Even open source licenses like the GPL v3 can have some unexpected side effects (i.e. related to patents). Most non-open source licenses (including academic only licenses) that I have seen related to bioinformatics usually are aimed at restricting the commercial usage of the code and the subsequent distribution of it. But you need to see if there are other restrictions involved that limit the output from that application. > A strict view regarding software without licenses (ie, no interaction > at all) would require immediate removal of the fdist code (not very > important, it is the part that is probably not used by anyone). No > inclusion of LDNe code. And more importantly no STRUCTURE interaction > code and no Genepop interaction code (although the file format parser > that currently inside is OK). > If the interaction is just creating inputs, running the standalone application and parsing the output, then those interactions should be okay. Obviously the code to create the input and parse the output must be independent of the application, for example based on public documentation or a clean-room approach. If the interaction creates a derivative, such as when the code of the application is required in addition to your code, then it is not okay. Further, as Peter commented elsewhere, there needs to be strong justification to include it in Biopython. Rather I would strongly suggest that you try to get your code included in the other application as it may help other users and you don't have to maintain a version of the original application. > So, the very pertinent question are: > 1. Can biopython command-line interact with applications with no license?
> Yes, but must not be considered a derivative of the application or it must do so in terms of the license. For example, AlignACE uses the Harvard University license where everyone using it must have their own license or it can be run on a second computer provided that only one copy is running at a time. > 2. Is biopython interacting with applications (command-line or web) > for which the license is not clear regarding interaction with > software? > I do not know the answer to this question because I do not know or use all the applications involved. However, we do need to create a list of applications with associated web sites and licenses that Biopython 'interacts' with which would answer this question. Regards Bruce From bsouthey at gmail.com Wed Jan 14 20:24:29 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Wed, 14 Jan 2009 14:24:29 -0600 Subject: [Biopython-dev] Running Biopython tests on windows xp Message-ID: <496E49FD.4080305@gmail.com> Hi, I decided to install windows on a virtual system part to have a windows test system. I installed Python 2.5, numpy 1.2 and biopython 1.49 using binary installers. I am aiming to get add the optional software like Reportlab and a C compiler. Is there a way to run the Biopython tests within Python rather than using the system command line? When I run the tests from the command like I get a number a failures that I think are due to a lack of a C compiler. Are these expected or do you want bug reports? Bruce C:\Documents and Settings\virtualme\Desktop\biopython-1.49\biopython-1.49>c:\Pyt hon25\python.exe setup.py test running test test_Ace ... ok test_AlignIO ... ok test_BioSQL ... skipping. Install MySQLdb or correct Tests/setup_BioSQL.py (not important if you do not plan to use BioSQL). ok test_BioSQL_SeqIO ... skipping. Install MySQLdb or correct Tests/setup_BioSQL.py (not important if you do not plan to use BioSQL). ok test_CAPS ... ERROR test_Clustalw ... ok test_Clustalw_tool ... skipping. 
Install clustalw or clustalw2 if you want to us e Bio.Clustalw. ok test_Cluster ... FAIL test_CodonTable ... ok test_CodonUsage ... ok test_Compass ... ok test_Crystal ... ok test_DocSQL ... skipping. Install MySQLdb if you want to use Bio.DocSQL. ok test_EmbossPrimer ... ok test_Entrez ... ok test_Enzyme ... ok test_FSSP ... ok test_Fasta ... ok test_Fasta2 ... ok test_File ... ok test_GACrossover ... ok test_GAMutation ... ok test_GAOrganism ... ok test_GAQueens ... ok test_GARepair ... ok test_GASelection ... ok test_GFF ... skipping. Environment is not configured for this test (not importan t if you do not plan to use Bio.GFF). ok test_GFF2 ... skipping. Install MySQLdb if you want to use Bio.GFF. ok test_GenBank ... ok test_GraphicsChromosome ... skipping. Install reportlab if you want to use Bio.G raphics. ok test_GraphicsDistribution ... skipping. Install reportlab if you want to use Bio .Graphics. ok test_GraphicsGeneral ... skipping. Install reportlab if you want to use Bio.Grap hics. ok test_HMMCasino ... ok test_HMMGeneral ... ok test_HotRand ... ok test_IsoelectricPoint ... ok test_KDTree ... ERROR test_KEGG ... ok test_KeyWList ... ok test_Location ... ok test_LocationParser ... ok test_LogisticRegression ... ok test_MEME ... ok test_MarkovModel ... ok test_Medline ... ok test_NCBIStandalone ... ok test_NCBIXML ... ok test_NCBI_qblast ... ok test_NNExclusiveOr ... ok test_NNGene ... ok test_NNGeneral ... ok test_Nexus ... ok test_PDB ... ERROR test_ParserSupport ... ok test_Pathway ... ok test_Phd ... ok test_PopGen_FDist ... skipping. Install FDist if you want to use Bio.PopGen.FDis t. ok test_PopGen_FDist_nodepend ... ok test_PopGen_GenePop ... ok test_PopGen_SimCoal ... skipping. Install SIMCOAL2 if you want to use Bio.PopGen .SimCoal. ok test_PopGen_SimCoal_nodepend ... ok test_ProtParam ... ok test_Registry ... ok test_Restriction ... ERROR test_SCOP_Astral ... ok test_SCOP_Cla ... FAIL test_SCOP_Des ... ok test_SCOP_Dom ... ok test_SCOP_Hie ... 
ok test_SCOP_Raf ... FAIL test_SCOP_Residues ... ok test_SCOP_Scop ... ok test_SProt ... ok test_SVDSuperimposer ... ok test_SeqIO ... ok test_SeqIO_online ... ok test_SeqUtils ... ok test_SubsMat ... ok test_UniGene ... ok test_Wise ... skipping. Don't know how to find the Wise2 tool dnal on Windows. ok test_align ... ok test_docstrings ... ok test_geo ... ok test_interpro ... ok test_kNN ... ok test_lowess ... ok test_pairwise2 ... ok test_prodoc ... ok test_property_manager ... ok test_prosite ... ok test_prosite2 ... ok test_psw ... skipping. Don't know how to find the Wise2 tool dnal on Windows. ok test_seq ... ok test_translate ... ok test_trie ... ERROR test_triefind ... ERROR ====================================================================== ERROR: test_CAPS ---------------------------------------------------------------------- Traceback (most recent call last): File "run_tests.py", line 125, in runTest self.runSafeTest() File "run_tests.py", line 138, in runSafeTest cur_test = __import__(self.test_name) File "test_CAPS.py", line 3, in from Bio.Restriction import * File "C:\Documents and Settings\virtualme\Desktop\biopython-1.49\biopython-1.4 9\Bio\Restriction\__init__.py", line 61, in from Bio.Restriction.Restriction import * File "C:\Documents and Settings\virtualme\Desktop\biopython-1.49\biopython-1.4 9\Bio\Restriction\Restriction.py", line 96, in from Bio.Restriction.PrintFormat import PrintFormat File "C:\Documents and Settings\virtualme\Desktop\biopython-1.49\biopython-1.4 9\Bio\Restriction\PrintFormat.py", line 14, in from Bio.Restriction.DNAUtils import complement ImportError: No module named DNAUtils ====================================================================== ERROR: test_KDTree ---------------------------------------------------------------------- Traceback (most recent call last): File "run_tests.py", line 125, in runTest self.runSafeTest() File "run_tests.py", line 138, in runSafeTest cur_test = __import__(self.test_name) File 
"test_KDTree.py", line 10, in from Bio.KDTree.KDTree import _neighbor_test, _test File "C:\Documents and Settings\virtualme\Desktop\biopython-1.49\biopython-1.4 9\Bio\KDTree\__init__.py", line 10, in from KDTree import KDTree File "C:\Documents and Settings\virtualme\Desktop\biopython-1.49\biopython-1.4 9\Bio\KDTree\KDTree.py", line 20, in from Bio.KDTree import _CKDTree ImportError: cannot import name _CKDTree ====================================================================== ERROR: test_PDB ---------------------------------------------------------------------- Traceback (most recent call last): File "run_tests.py", line 125, in runTest self.runSafeTest() File "run_tests.py", line 138, in runSafeTest cur_test = __import__(self.test_name) File "test_PDB.py", line 98, in run_test() File "test_PDB.py", line 90, in run_test quick_neighbor_search_test() File "test_PDB.py", line 19, in quick_neighbor_search_test from Bio.PDB.NeighborSearch import NeighborSearch File "C:\Documents and Settings\virtualme\Desktop\biopython-1.49\biopython-1.4 9\Bio\PDB\NeighborSearch.py", line 8, in from Bio.KDTree import * File "C:\Documents and Settings\virtualme\Desktop\biopython-1.49\biopython-1.4 9\Bio\KDTree\__init__.py", line 10, in from KDTree import KDTree File "C:\Documents and Settings\virtualme\Desktop\biopython-1.49\biopython-1.4 9\Bio\KDTree\KDTree.py", line 20, in from Bio.KDTree import _CKDTree ImportError: cannot import name _CKDTree ====================================================================== ERROR: test_Restriction ---------------------------------------------------------------------- Traceback (most recent call last): File "run_tests.py", line 125, in runTest self.runSafeTest() File "run_tests.py", line 138, in runSafeTest cur_test = __import__(self.test_name) File "test_Restriction.py", line 8, in from Bio.Restriction import * File "C:\Documents and Settings\virtualme\Desktop\biopython-1.49\biopython-1.4 9\Bio\Restriction\__init__.py", line 61, in from 
Bio.Restriction.Restriction import * File "C:\Documents and Settings\virtualme\Desktop\biopython-1.49\biopython-1.4 9\Bio\Restriction\Restriction.py", line 96, in from Bio.Restriction.PrintFormat import PrintFormat File "C:\Documents and Settings\virtualme\Desktop\biopython-1.49\biopython-1.4 9\Bio\Restriction\PrintFormat.py", line 13, in from Bio.Restriction import RanaConfig as RanaConf ImportError: cannot import name RanaConfig ====================================================================== ERROR: test_trie ---------------------------------------------------------------------- Traceback (most recent call last): File "run_tests.py", line 125, in runTest self.runSafeTest() File "run_tests.py", line 138, in runSafeTest cur_test = __import__(self.test_name) File "test_trie.py", line 6, in from Bio import trie ImportError: cannot import name trie ====================================================================== ERROR: test_triefind ---------------------------------------------------------------------- Traceback (most recent call last): File "run_tests.py", line 125, in runTest self.runSafeTest() File "run_tests.py", line 138, in runSafeTest cur_test = __import__(self.test_name) File "test_triefind.py", line 6, in from Bio import trie ImportError: cannot import name trie ====================================================================== FAIL: test_Cluster ---------------------------------------------------------------------- Traceback (most recent call last): File "run_tests.py", line 125, in runTest self.runSafeTest() File "run_tests.py", line 162, in runSafeTest expected_handle) File "run_tests.py", line 263, in compare_output % (repr(output_line), repr(expected_line)) AssertionError: Output : 'test_clusterdistance (test_Cluster.TestCluster) ... ERROR\n' Expected: 'test_clusterdistance (test_Cluster.TestCluster) ... 
ok\n' ====================================================================== FAIL: test_SCOP_Cla ---------------------------------------------------------------------- Traceback (most recent call last): File "run_tests.py", line 125, in runTest self.runSafeTest() File "run_tests.py", line 162, in runSafeTest expected_handle) File "run_tests.py", line 263, in compare_output % (repr(output_line), repr(expected_line)) AssertionError: Output : 'testIndex (test_SCOP_Cla.ClaTests) ... ERROR\n' Expected: 'testIndex (test_SCOP_Cla.ClaTests) ... ok\n' ====================================================================== FAIL: test_SCOP_Raf ---------------------------------------------------------------------- Traceback (most recent call last): File "run_tests.py", line 125, in runTest self.runSafeTest() File "run_tests.py", line 162, in runSafeTest expected_handle) File "run_tests.py", line 263, in compare_output % (repr(output_line), repr(expected_line)) AssertionError: Output : 'testSeqMapIndex (test_SCOP_Raf.RafTests) ... ERROR\n' Expected: 'testSeqMapIndex (test_SCOP_Raf.RafTests) ... 
ok\n' ---------------------------------------------------------------------- Ran 96 tests in 86.153s FAILED (failures=3, errors=6) C:\Documents and Settings\virtualme\Desktop\biopython-1.49\biopython-1.49> From tiagoantao at gmail.com Wed Jan 14 20:52:58 2009 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Wed, 14 Jan 2009 20:52:58 +0000 Subject: [Biopython-dev] Developmental and experimental branches In-Reply-To: <5aa3b3570901130349u32924629lcf914579de34626e@mail.gmail.com> References: <6d941f120901051648t4be985e5se19b0f07fc6bd6b8@mail.gmail.com> <496778D2.1050801@gmail.com> <5aa3b3570901090917m7c82fb17hb4c93235702b120b@mail.gmail.com> <320fb6e00901090928u662023d6rd6f2d82c5fbb7365@mail.gmail.com> <8b34ec180901091017o172e4acemf91c0a46a736bdb1@mail.gmail.com> <20090109225155.GF4135@sobchak.mgh.harvard.edu> <320fb6e00901100646y6132686ap8a928404dd1e36c3@mail.gmail.com> <6d941f120901100852g47b10e9ar214cf2ad2b206f6@mail.gmail.com> <320fb6e00901100903v1aa0180bsd1ca5335f7da1f7f@mail.gmail.com> <5aa3b3570901130349u32924629lcf914579de34626e@mail.gmail.com> Message-ID: <6d941f120901141252x1a1088f9n7f30d894f35c18ab@mail.gmail.com> >> http://biopython.org/wiki/PopGen_dev > > ok, I have started writing something there.. I've edited the development one. I would recommend anyone interested in tracking the changes to watch the page. From biopython at maubp.freeserve.co.uk Wed Jan 14 21:43:33 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 14 Jan 2009 21:43:33 +0000 Subject: [Biopython-dev] Running Biopython tests on windows xp In-Reply-To: <496E49FD.4080305@gmail.com> References: <496E49FD.4080305@gmail.com> Message-ID: <320fb6e00901141343r529df66did6e172343592843d@mail.gmail.com> On Wed, Jan 14, 2009 at 8:24 PM, Bruce Southey wrote: > Hi, > I decided to install windows on a virtual system part to have a windows test > system. I installed Python 2.5, numpy 1.2 and biopython 1.49 using binary > installers. 
I am aiming to get add the optional software like Reportlab and > a C compiler. If you are installing Biopython using our Windows Installer then you shouldn't need a C compiler. If you would like to install from source, then yes, you will need a C compiler. You can either try the appropriate MS compiler for your version of python, or we suggest Mingw32 from cygwin. > Is there a way to run the Biopython tests within Python rather than using > the system command line? Not really - why do you want to? I suppose you could use python to invoke the command "python run_tests.py". > When I run the tests from the command like I get a number a failures that I > think are due to a lack of a C compiler. > > Are these expected or do you want bug reports? These are not expected. The whole test suite passes for me on Windows where I have installed Biopython from source. So you installed Biopython using our Window Installer - how did you get the unit tests? I'm pretty sure the SCOP failures are due to the files under Tests\SCOP having Unix line endings instead of Windows line endings (we're fixed some similar issues in the past). Note that both the source code archives as *.zip and *.tar.gz use Unix line endings internally, but if you used CVS it should have got them with Windows line endings for you. However, most of your test failures do seem to be related to C code in some way. I wonder if this is linked to the virtual environment? I should be able to try the Biopython 1.49 installer with Python 2.5 on a Windows machine myself to check that... The list of failures: > test_CAPS ... ERROR > test_Cluster ... FAIL > test_KDTree ... ERROR > test_PDB ... ERROR > test_Restriction ... ERROR > test_SCOP_Cla ... FAIL > test_SCOP_Raf ... FAIL > test_trie ... ERROR > test_triefind ... ERROR And some comments on the messages: > ERROR: test_CAPS > ... > from Bio.Restriction.DNAUtils import complement > ImportError: No module named DNAUtils Strange. 
Note Bio.Restriction.DNAUtils is a C module. > ERROR: test_KDTree > ... > from Bio.KDTree import _CKDTree > ImportError: cannot import name _CKDTree Again, Bio.KDTree. _CKDTree is a C module > ERROR: test_PDB > ... > from Bio.KDTree import _CKDTree > ImportError: cannot import name _CKDTree Same failure as test_KDTree > ERROR: test_Restriction > ... > from Bio.Restriction import RanaConfig as RanaConf > ImportError: cannot import name RanaConfig Odd. RanaConfig is a pure python module, and pretty short too. > ERROR: test_trie > ... > from Bio import trie > ImportError: cannot import name trie Bio.trie is another C module > ERROR: test_triefind > ... > from Bio import trie > ImportError: cannot import name trie Same error as test_trie above. > FAIL: test_Cluster > ... > Output : 'test_clusterdistance (test_Cluster.TestCluster) ... ERROR\n' > Expected: 'test_clusterdistance (test_Cluster.TestCluster) ... ok\n' Could you run this test directly (python test_Cluster.py) which should give a more helpful message. But again, this module does include some C code.... > FAIL: test_SCOP_Cla > ... > Output : 'testIndex (test_SCOP_Cla.ClaTests) ... ERROR\n' > Expected: 'testIndex (test_SCOP_Cla.ClaTests) ... ok\n' I think this is just a new line issue. > FAIL: test_SCOP_Raf > ... > Output : 'testSeqMapIndex (test_SCOP_Raf.RafTests) ... ERROR\n' > Expected: 'testSeqMapIndex (test_SCOP_Raf.RafTests) ... ok\n' I think this is just a new line issue. Peter From bsouthey at gmail.com Wed Jan 14 22:48:27 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Wed, 14 Jan 2009 16:48:27 -0600 Subject: [Biopython-dev] Running Biopython tests on windows xp In-Reply-To: <320fb6e00901141343r529df66did6e172343592843d@mail.gmail.com> References: <496E49FD.4080305@gmail.com> <320fb6e00901141343r529df66did6e172343592843d@mail.gmail.com> Message-ID: <496E6BBB.2020506@gmail.com> Peter wrote: > These are not expected. 
The whole test suite passes for me on Windows > where I have installed Biopython from source. > > So you installed Biopython using our Window Installer - how did you > get the unit tests? I'm pretty sure the SCOP failures are due to the > files under Tests\SCOP having Unix line endings instead of Windows > line endings (we're fixed some similar issues in the past). Note that > both the source code archives as *.zip and *.tar.gz use Unix line > endings internally, but if you used CVS it should have got them with > Windows line endings for you. > > However, most of your test failures do seem to be related to C code in > some way. I wonder if this is linked to the virtual environment? I > should be able to try the Biopython 1.49 installer with Python 2.5 on > a Windows machine myself to check that... > > The list of failures: > >> test_CAPS ... ERROR >> test_Cluster ... FAIL >> test_KDTree ... ERROR >> test_PDB ... ERROR >> test_Restriction ... ERROR >> test_trie ... ERROR >> test_triefind ... ERROR >> Using IDLE, 'from Bio.Restriction import *' works correctly. These ones are failures to find the correct biopython installation. Both 'python setup.py test' and 'python run_tests.py' are assuming that I have built from source and everything is in the local directory. But that assumption is wrong since I used the Biopython binary installer so technically the tests I run are invalid. The difference for these failures can be seen here: C:\Documents and Settings\virtualme\Desktop\Python_packages\biopython-1.49\biopy thon-1.49\Tests>c:\Python25\python.exe test_KDTree.py Passed. Passed. Passed. Passed. Passed. Passed. Passed. Passed. Passed. Passed. Passed. Passed. Passed. Passed. Passed. Passed. Passed. Passed. Passed. Passed. C:\Documents and Settings\virtualme\Desktop\Python_packages\biopython-1.49\biopy thon-1.49\Tests>c:\Python25\python.exe run_tests.py test_KDTree.py test_KDTree ... 
ERROR ====================================================================== ERROR: test_KDTree ---------------------------------------------------------------------- Traceback (most recent call last): File "run_tests.py", line 125, in runTest self.runSafeTest() File "run_tests.py", line 138, in runSafeTest cur_test = __import__(self.test_name) File "test_KDTree.py", line 10, in from Bio.KDTree.KDTree import _neighbor_test, _test File "C:\Documents and Settings\virtualme\Desktop\biopython-1.49\biopython-1.4 9\Bio\KDTree\__init__.py", line 10, in File "C:\Documents and Settings\virtualme\Desktop\biopython-1.49\biopython-1.4 9\Bio\KDTree\KDTree.py", line 20, in ImportError: cannot import name _CKDTree ---------------------------------------------------------------------- Ran 1 test in 0.100s FAILED (errors=1) For the SCOP tests, this is as you say, a 'end of line' issue between windows and Linux. I opened 'and dir.cla.scop.txt_test' with wordpad and saved it with a new name. The line from testIndex in test_SCOP_Cla.py that gave the error index['d4hbia_'] works with the new file but not the old file. I also installed reportlab and biosql and these pass the tests (except for the mysql warning with Biosql that Peter reported). Regards Bruce From biopython at maubp.freeserve.co.uk Wed Jan 14 23:27:27 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 14 Jan 2009 23:27:27 +0000 Subject: [Biopython-dev] Running Biopython tests on windows xp In-Reply-To: <496E6BBB.2020506@gmail.com> References: <496E49FD.4080305@gmail.com> <320fb6e00901141343r529df66did6e172343592843d@mail.gmail.com> <496E6BBB.2020506@gmail.com> Message-ID: <320fb6e00901141527t1d2be466uf0b87f12b0d51d3a@mail.gmail.com> On Wed, Jan 14, 2009 at 10:48 PM, Bruce Southey wrote: > Using IDLE, 'from Bio.Restriction import *' works correctly. > > These ones are failures to find the correct biopython installation. 
Both > 'python setup.py test' and 'python run_tests.py' are assuming that I have > built from source and everything is in the local directory. But that > assumption is wrong since I used the Biopython binary installer so > technically the tests I run are invalid. I think I understand what's going on now. All these failures are essentially due to the unusual and unexpected setup on your machine (or for the SCOP tests, the line endings). You still didn't explain how/where you installed the test scripts etc, but what I think is happening is the following: Your official installation (including the compiled C code), created using the Windows Installer, is in one place, say under C:\XXX\site-packages for the sake of discussion. You've unpacked the source code in another location, and are trying to run the test suite there. This set of files will NOT have the compiled C code - and thus running some of the tests via run_tests.py will fail. If you run individual test_XXX.py files this should use the system installed files under C:\XXX\site-packages and so the test should work. It would be a bit of a hack, but you can probably overcome this by manually copying the installed compiled modules from C:\XXX\site-packages into the unpacked source code (under a suitably named build sub directory), or moving the Test suite next to the installed code. Alternatively, you could try editing run_tests.py to comment out the path "magic" so that it just uses the system installation of Biopython (rather than trying to use the local copy it expects you to have just built from source), i.e. try commenting out these two lines in run_tests.py found near the start of the main function: sys.path.insert(1, source_path) sys.path.insert(1, build_path) However, I'm no longer surprised that the C code tests are failing, and don't think this is a bug per se. > For the SCOP tests, this is as you say, a 'end of line' issue between > windows and Linux.
I opened 'and dir.cla.scop.txt_test' with wordpad and > saved it with a new name. The line from testIndex in test_SCOP_Cla.py that > gave the error index['d4hbia_'] works with the new file but not the old > file. Good to confirm that. If you spot an easy cross platform fix so that the SCOP code can cope with either line ending that would be good, but I didn't consider this worth spending much time on. > I also installed reportlab and biosql and these pass the tests (except for > the mysql warning with Biosql that Peter reported). Good. Out of interest, which BioSQL warning are you talking about? Peter From bsouthey at gmail.com Thu Jan 15 03:10:30 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Wed, 14 Jan 2009 21:10:30 -0600 Subject: [Biopython-dev] Running Biopython tests on windows xp In-Reply-To: <320fb6e00901141527t1d2be466uf0b87f12b0d51d3a@mail.gmail.com> References: <496E49FD.4080305@gmail.com> <320fb6e00901141343r529df66did6e172343592843d@mail.gmail.com> <496E6BBB.2020506@gmail.com> <320fb6e00901141527t1d2be466uf0b87f12b0d51d3a@mail.gmail.com> Message-ID: On Wed, Jan 14, 2009 at 5:27 PM, Peter wrote: > On Wed, Jan 14, 2009 at 10:48 PM, Bruce Southey wrote: >> Using IDLE, 'from Bio.Restriction import *' works correctly. >> >> These ones are failures to find the correct biopython installation. Both >> 'python setup.py test' and 'python run_tests.py' are assuming that I have >> built from source and everything is in the local directory. But that >> assumption is wrong since I used the Biopython binary installer so >> technically the tests I run are invalid. > > I think I understand what's going on now. All these failures are > essentially due to the unusual and unexpected setup on your machine > (or for the SCOP tests, the line endings). I do not see it as unusual as it does follow the instructions. But the instructions clearly need some enhancement, perhaps along the lines of one of the options below.
I am now curious about what happens under Linux distros because these may have the same issue.

> You still didn't explain
> how/where you installed the test scripts etc, but what I think is
> happening is the following:
>
> Your official installation (including the compiled C code) created
> using the Windows Installer is in one place, say under
> C:\XXX\site-packages for the sake of discussion.
>
> You've unpacked the source code in another location, and are trying to
> run the test suite there. This set of files will NOT have the
> compiled C code - and thus running some of the tests via run_tests.py
> will fail. If you run individual test_XXX.py files this should use
> the system installed files under C:\XXX\site-packages and so the test
> should work.

Correct!

The installation documentation is lacking, at least for the binary installer. Depending on what happens, I will write down this information.

Would it be a hassle to include the tests with the binary installer? At least some of the tests should work if they are run from that directory.

>
> It would be a bit of a hack, but you can probably overcome this by
> manually copying the installed compiled modules from
> C:\XXX\site-packages into the unpacked source code (under a suitably
> named build sub directory), or moving the Test suite next to the
> installed code.

While this would work for the binary installer, I do not think it is a suitable solution for building it from source - especially if someone has the binary installer and is building but not necessarily installing from source.

>
> Alternatively, you could try editing run_tests.py to comment out the
> path "magic" so that it just uses the system installation of Biopython
> (rather than trying to use the local copy it expects you to have just
> built from source), i.e.
try commenting out these two lines in > run_tests.py found near the start of the main function: > > sys.path.insert(1, source_path) > sys.path.insert(1, build_path) I think the best solution is to fix this part because these assume the location of the source and build directories even if these are not really present. I would suggest we add a new commandline option that causes the source_path and/or build_path variables to be undefined forcing Python to use the installed versions. Passing a user-specified path is also an option but these can get long. > However, I'm no longer surprised that the C code tests are failing, > and don't think this is a bug per se. Agreed - just a case that has not been addressed yet. > >> For the SCOP tests, this is as you say, a 'end of line' issue between >> windows and Linux. I opened 'and dir.cla.scop.txt_test' with wordpad and >> saved it with a new name. The line from testIndex in test_SCOP_Cla.py that >> gave the error index['d4hbia_'] works with the new file but not the old >> file. > > Good to confirm that. If you spot an easy cross platform fix so that > the SCOP code can cope with either line ending that would be good, but > I didn't consider this worth sending much time on. When I get to my system, I will see if my Linux system will accept the file correctly because the other SCOP tests did work. If I get time I will try to look at that as I looked at the function and I think it is just the way the file is being used. > >> I also installed reportlab and biosql and these pass the tests (except for >> the mysql warning with Biosql that Peter reported). > > Good. Out of interest, which BioSQL warning are you talking about? > > Peter Sorry, I do not have that handy but it is depreciation one for a setting that will be gone in MySQL 5.2. 
Bruce From biopython at maubp.freeserve.co.uk Thu Jan 15 12:46:21 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 15 Jan 2009 12:46:21 +0000 Subject: [Biopython-dev] Running Biopython tests on windows xp In-Reply-To: References: <496E49FD.4080305@gmail.com> <320fb6e00901141343r529df66did6e172343592843d@mail.gmail.com> <496E6BBB.2020506@gmail.com> <320fb6e00901141527t1d2be466uf0b87f12b0d51d3a@mail.gmail.com> Message-ID: <320fb6e00901150446j57748cf0mb493601444a9422d@mail.gmail.com> >> >> I think I understand what's going on now. All these failures are >> essentially due to the unusual and unexpected setup on your machine >> (or for the SCOP tests, the line endings). > > I do not see it as unusual as it does follow the instructions. But > these clearly need some enhancement to address perhaps a variation of > one of the options below. There are no instructions on how to install Biopython on Windows using the provided installer and then run the unit tests - so I don't understand what you mean by you followed the instructions. If the installer came with the unit tests then this would be sensible. Right now the only documented way to run the unit tests is part of an installation from source. >> You've unpacked the source code in another location, and are trying to >> run the test suite there. This set of files will NOT have the >> compiled C code - and thus running some of the tests via run_tests.py >> will fail. If you run individual test_XXX.py files this should use >> the system installed files under C:\XXX\site-packages and so the test >> should work. > > Correct! > > The installation documentation is lacking at least for the binary > installer. Depending on what happens, I will write down this > information. > > Would be be a hassle to include the tests with the binary installer? I don't know enough about distutils to answer that. So the short answer is yes, it might be a hassle. 
> At least of the tests should work if they are run from that directory. Which directory? >> It would be a bit of a hack, but you can probably overcome this by >> manually copying the installed compiled modules from >> C:\XXX\site-packages into the unpacked source code (under a suitably >> named build sub directory), or moving the Test suite next to the >> installed code. > > While this would work for the binary installer, I do not think it is > suitable solution for building it from source - especially if someone > has the binary installer and is building but not necessary installing > from source. The hack suggested was specifically for combining the installed files from the Windows installer with the test suite by hand - you don't need to do anything special if you are building from source. The current run_tests.py should work perfectly for anyone building from source (on Windows, Linux and Mac). You can (and ideally should) build biopython, and then run the tests BEFORE installing it. >> Alternatively, you could try editing run_tests.py to comment out the >> path "magic" so that is just uses the system installation of Biopython >> (rather than trying to use the local copy it expects you to have just >> built from source), i.e. try commenting out these two lines in >> run_tests.py found near the start of the main function: >> >> sys.path.insert(1, source_path) >> sys.path.insert(1, build_path) > > I think the best solution is to fix this part because these assume the > location of the source and build directories even if these are not > really present. If you are building from source this is a safe assumption (and in fact the code does check they exist). We WANT to run the tests using the just built and not yet installed files! > I would suggest we add a new commandline option that > causes the source_path and/or build_path variables to be undefined > forcing Python to use the installed versions. 
Passing a user-specified > path is also an option but these can get long. Yes, an option to run_test.py to use the system installed version of Biopython could solve this particular situation. Alternatively, and perhaps more simply for the end user, we could add a prompt if there is no build directory to ask the user if they want to run the tests using an already installed version of Biopython. I might have time to come up with a patch for this... >> However, I'm no longer surprised that the C code tests are failing, >> and don't think this is a bug per se. > > Agreed - just a case that has not been addressed yet. ---------------------------------------------------------------------------------------------- >>> I also installed reportlab and biosql and these pass the tests (except for >>> the mysql warning with Biosql that Peter reported). >> >> Good. Out of interest, which BioSQL warning are you talking about? >> >> Peter > > Sorry, I do not have that handy but it is depreciation one for a > setting that will be gone in MySQL 5.2. You might be referring to BioSQL Bug 2568, http://bugzilla.open-bio.org/show_bug.cgi?id=2568 Peter From bugzilla-daemon at portal.open-bio.org Thu Jan 15 14:37:57 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 15 Jan 2009 09:37:57 -0500 Subject: [Biopython-dev] [Bug 2733] New: Unit tests incorrectly assume that Biopthyon was built from source Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2733 Summary: Unit tests incorrectly assume that Biopthyon was built from source Product: Biopython Version: Not Applicable Platform: PC OS/Version: All Status: NEW Severity: minor Priority: P4 Component: Unit Tests AssignedTo: biopython-dev at biopython.org ReportedBy: bsouthey at gmail.com If Biopython is not built from source and the tests are run from a different place than the installation, the test that use C objects fail because these are not found (an example is below). 
Currently the test environment uses the Biopython in the build directory. It would be nice to be able to optionally specify some other Biopython, such as the installed version, using say a command line argument.

Example of a failure:

======================================================================
ERROR: test_KDTree
----------------------------------------------------------------------
Traceback (most recent call last):
  File "run_tests.py.orig", line 125, in runTest
    self.runSafeTest()
  File "run_tests.py.orig", line 138, in runSafeTest
    cur_test = __import__(self.test_name)
  File "test_KDTree.py", line 10, in <module>
    from Bio.KDTree.KDTree import _neighbor_test, _test
  File "/home/bsouthey/python/biopython_cvs/biopython/Bio/KDTree/__init__.py", line 10, in <module>
    from KDTree import KDTree
  File "/home/bsouthey/python/biopython_cvs/biopython/Bio/KDTree/KDTree.py", line 20, in <module>
    from Bio.KDTree import _CKDTree
ImportError: cannot import name _CKDTree
======================================================================
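The command line argument requested in this report could look roughly like the sketch below. This is only an illustration under stated assumptions: the flag name `--use-installed` and the helper `configure_path` are hypothetical and are not part of the real run_tests.py; only the names `source_path` and `build_path` come from the thread.

```python
import argparse


def configure_path(argv, source_path, build_path):
    """Return the directories a test runner should put on sys.path.

    Hypothetical helper: an empty list means "leave sys.path alone",
    so that the installed Biopython is the one that gets imported.
    """
    parser = argparse.ArgumentParser(description="Run the unit tests")
    # Hypothetical flag, not an option of the real run_tests.py.
    parser.add_argument("--use-installed", action="store_true",
                        help="test the installed Biopython, not the local build")
    args, _remaining = parser.parse_known_args(argv)
    if args.use_installed:
        return []
    # Same order as the two sys.path.insert(1, ...) calls quoted in the thread.
    return [source_path, build_path]
```

A caller would then run `sys.path.insert(1, entry)` for each returned entry, which keeps the default behaviour unchanged when the flag is absent.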
From bugzilla-daemon at portal.open-bio.org Thu Jan 15 14:44:15 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 15 Jan 2009 09:44:15 -0500
Subject: [Biopython-dev] [Bug 2733] Unit tests incorrectly assume that Biopthyon was built from source
In-Reply-To:
Message-ID: <200901151444.n0FEiFd8020991@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2733

------- Comment #1 from bsouthey at gmail.com 2009-01-15 09:44 EST -------
Created an attachment (id=1197)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1197&action=view)
Patch to avoid adding source path if Biopython is not built from source

This is a simple patch that just makes the inclusion of the source path conditional on the presence of the build directory. That is, if a build directory exists, then we assume that Biopython was built from the source. But if the build directory does not exist then the source path is not added and the test environment will use the installed Biopython and not the source directory.

This patch works on a Linux system with the build directory removed and a Windows XP system using the binary Biopython installer.
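The logic described in this comment can be sketched as follows. This is not the attached patch (its contents are not shown in the archive), just a minimal illustration; the function name `add_local_paths` and its `path` parameter are assumptions made for the example.

```python
import os
import sys


def add_local_paths(source_path, build_path, path=None):
    """Add the local source and build directories only if a build exists.

    Returns True if the paths were added, or False if the list was left
    untouched (in which case an installed Biopython would be imported).
    """
    if path is None:
        path = sys.path
    if os.path.isdir(build_path):
        # Mirrors the two lines quoted from run_tests.py in the thread.
        path.insert(1, source_path)
        path.insert(1, build_path)
        return True
    return False
```

Taking the path list as a parameter is only a convenience so the behaviour can be checked without modifying the real `sys.path`.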
From bugzilla-daemon at portal.open-bio.org Thu Jan 15 15:20:58 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 15 Jan 2009 10:20:58 -0500 Subject: [Biopython-dev] [Bug 2733] Unit tests incorrectly assume that Biopthyon was built from source In-Reply-To: Message-ID: <200901151520.n0FFKwqZ024124@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2733 ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-15 10:20 EST ------- Created an attachment (id=1198) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1198&action=view) Patch to Tests/run_tests.py Bruce, Could you try out this alternative patch which tries to tell the user what is happening in this atypical situation. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Jan 15 15:26:13 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 15 Jan 2009 10:26:13 -0500 Subject: [Biopython-dev] [Bug 2733] Runing unit tests where Biopthyon wasn't built from source In-Reply-To: Message-ID: <200901151526.n0FFQD5F024483@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2733 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Severity|minor |enhancement Summary|Unit tests incorrectly |Runing unit tests where |assume that Biopthyon was |Biopthyon wasn't built from |built from source |source ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-15 10:26 EST ------- Retitling bug and marking it as an enhancement. 
The main use case for this is Windows users who installed Biopython from one of our Windows Installers (pre-compiled, does not include the unit tests), and later download and unzip the source code archive in order to run the unit tests.

As Bruce points out, this might also apply to Linux users who install a Biopython package (pre-compiled, and presumably not including the unit tests), and then want to run the unit tests without themselves compiling Biopython.

From bugzilla-daemon at portal.open-bio.org Thu Jan 15 15:41:34 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 15 Jan 2009 10:41:34 -0500
Subject: [Biopython-dev] [Bug 2733] Runing unit tests where Biopthyon wasn't built from source
In-Reply-To:
Message-ID: <200901151541.n0FFfYgG025830@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2733

------- Comment #4 from dalloliogm at gmail.com 2009-01-15 10:41 EST -------
(In reply to comment #0)

What about re-organizing the tests in three categories:
- the ones needed to make sure the modules don't contain errors
- the ones needed to make sure that biopython can run correctly in the user's environment
- the ones needed to make sure that the C modules are compiled correctly.

Usually, people don't need to repeat the tests from case 1, but only case 2, and case 3 if they have compiled biopython themselves.
From bugzilla-daemon at portal.open-bio.org Thu Jan 15 16:09:34 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 15 Jan 2009 11:09:34 -0500
Subject: [Biopython-dev] [Bug 2733] Runing unit tests where Biopthyon wasn't built from source
In-Reply-To:
Message-ID: <200901151609.n0FG9Y5V028318@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2733

------- Comment #5 from bsouthey at gmail.com 2009-01-15 11:09 EST -------
(In reply to comment #2)
> Created an attachment (id=1198)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1198&action=view) [details]
> Patch to Tests/run_tests.py
>
> Bruce,
>
> Could you try out this alternative patch which tries to tell the user what is
> happening in this atypical situation.
>
> Peter
>

A quick check shows it works on my Linux system, where I removed the build directory but have Biopython installed. I will let you know for Windows and also when Biopython is not installed. But I do not foresee any problems with the patch.

Bruce
From bugzilla-daemon at portal.open-bio.org Thu Jan 15 17:18:31 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 15 Jan 2009 12:18:31 -0500
Subject: [Biopython-dev] [Bug 2733] Runing unit tests where Biopthyon wasn't built from source
In-Reply-To:
Message-ID: <200901151718.n0FHIVSm001687@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2733

------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-15 12:18 EST -------
(In reply to comment #4)
> (In reply to comment #0)
>
> What about re-organizing the tests in three categories:
> - the ones needed to make sure the modules don't contain errors
> - the ones needed to make sure that biopython can run correctly
> in the user's environment
> - the ones needed to make sure that the C modules are compiled correctly.
>
> Usually, people don't need to repeat the tests from case 1, but only
> case 2, and case 3 if they have compiled biopython themselves.

Case 1 applies to all the unit tests.
Case 2 applies to all the unit tests whose dependencies are present.
Case 3 applies to those modules with C code.

I don't really understand your divisions.

If I was compiling Biopython myself, I'd want all the tests run.

If I installed a pre-compiled version of Biopython (from a Linux distribution or the Windows installers), I'd still want to try and run all the tests.

There is the special case of trying to use Biopython without the C code modules (e.g. installing from source without a C compiler, or for repackaging a subset of the modules), but that is atypical.
From bugzilla-daemon at portal.open-bio.org Thu Jan 15 20:31:21 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 15 Jan 2009 15:31:21 -0500
Subject: [Biopython-dev] [Bug 2733] Runing unit tests where Biopthyon wasn't built from source
In-Reply-To:
Message-ID: <200901152031.n0FKVLDp015913@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2733

------- Comment #7 from bsouthey at gmail.com 2009-01-15 15:31 EST -------
(In reply to comment #5)
> (In reply to comment #2)

Just to confirm that it works as expected with Windows XP.

1) Without Biopython installed:

C:\Documents and Settings\virtualme\Desktop\Python_packages\biopython-1.49\biopython-1.49\Tests>c:\Python25\python.exe run_tests3.py test_trie.py
You do not seem to have built Biopython from source.
You do not seem to have installed Biopython.

2) With Biopython installed:

C:\Documents and Settings\virtualme\Desktop\Python_packages\biopython-1.49\biopython-1.49\Tests>c:\Python25\python.exe run_tests3.py test_trie.py
You do not seem to have built Biopython from source.
Unit tests will be run using the installed Biopython.
test_trie ... ok
----------------------------------------------------------------------
Ran 1 test in 0.731s

OK
From bugzilla-daemon at portal.open-bio.org Thu Jan 15 23:55:14 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 15 Jan 2009 18:55:14 -0500
Subject: [Biopython-dev] [Bug 2734] New: db.load problem with postgresql and psycopg2
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2734

           Summary: db.load problem with postgresql and psycopg2
           Product: Biopython
           Version: 1.49
          Platform: Macintosh
        OS/Version: Mac OS
            Status: NEW
          Severity: normal
          Priority: P2
         Component: BioSQL
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: stephen at blackrim.net

I have a simple script to load sequences into a PostgreSQL database using the BioSQL schema and the Biopython db.load function. Here is the script:

    from Bio import GenBank
    from Bio import SeqIO
    from BioSQL import BioSeqDatabase
    server = BioSeqDatabase.open_database(driver="psycopg2", user=...)
    db = server["plants"]
    for i in range(37):
        handle = open("PLN/gbpln"+str(i+1)+".seq", "r")
        db.load(SeqIO.parse(handle,"genbank"))
        handle.close()
        print str(i+1)
    server.adaptor.commit()

There is an error; here is the output with some of the psycopg2 debug info:

asis_dealloc: deleted asis object at 0x52350, refcnt = 0
psyco_curs_execute: cvt->refcnt = 1
curs_execute: pg connection at 0x8d0c00 OK
pq_begin: pgconn = 0x8d0c00, isolevel = 1, status = 2
pq_begin: transaction in progress
pq_execute: executing SYNC query:
    SELECT seqfeature_id, dbxref_id FROM seqfeature_dbxref WHERE seqfeature_id = "3" AND dbxref_id = "6"
pq_execute: entering syncronous DBAPI compatibility mode
pq_fetch: pgstatus = PGRES_FATAL_ERROR
pq_fetch: uh-oh, something FAILED
pq_fetch: fetching done; check for critical errors
psyco_curs_execute: res = -1, pgres = 0x0
Traceback (most recent call last):
  File "add_seqs_subdb2 2.py", line 9, in <module>
    db.load(SeqIO.parse(handle,"genbank"))
  File "/Library/Python/2.5/site-packages/BioSQL/BioSeqDatabase.py", line 420, in load
    db_loader.load_seqrecord(cur_record)
  File "/Library/Python/2.5/site-packages/BioSQL/Loader.py", line 50, in load_seqrecord
    self._load_seqfeature(seq_feature, seq_feature_num, bioentry_id)
  File "/Library/Python/2.5/site-packages/BioSQL/Loader.py", line 508, in _load_seqfeature
    self._load_seqfeature_qualifiers(feature.qualifiers, seqfeature_id)
  File "/Library/Python/2.5/site-packages/BioSQL/Loader.py", line 607, in _load_seqfeature_qualifiers
    seqfeature_id)
  File "/Library/Python/2.5/site-packages/BioSQL/Loader.py", line 645, in _load_seqfeature_dbxref
    self._get_seqfeature_dbxref(seqfeature_id, dbxref_id, rank+1)
  File "/Library/Python/2.5/site-packages/BioSQL/Loader.py", line 679, in _get_seqfeature_dbxref
    dbxref_id))
  File "/Library/Python/2.5/site-packages/BioSQL/BioSeqDatabase.py", line 295, in execute_and_fetch_col0
    self.cursor.execute(sql, args or ())
psycopg2.ProgrammingError: column "3" does not exist
LINE 1: ...f_id FROM seqfeature_dbxref WHERE seqfeature_id = "3" AND db...

It seems like there could be some issue with the double quotes, but I am not sure where that query is being built. I am using PostgreSQL 8.2.

From bugzilla-daemon at portal.open-bio.org Fri Jan 16 10:24:16 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 16 Jan 2009 05:24:16 -0500
Subject: [Biopython-dev] [Bug 2734] db.load problem with postgresql and psycopg2
In-Reply-To:
Message-ID: <200901161024.n0GAOGFA015422@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2734

------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-16 05:24 EST -------
Hi Stephen,

Does this happen for all the files you've tried, or just one or two? If it's the latter it may be something funny about the file and how it's been parsed.
I'm guessing you downloaded the GenBank files from ftp://ftp.ncbi.nih.gov/genbank/ so could you tell us one which is failing. Have you tried running the Biopython unit tests - in particular the two for BioSQL? I presume you installed Biopython from source on your Mac, so you should have all the files present. You'll need to edit the file Tests/setup_BioSQL.py to point to a suitable postgresql test database. P.S. As you are using Bio.SeqIO to parse the GenBank file, you don't need to import Bio.GenBank (first line of code snippet). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Jan 16 19:12:28 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 16 Jan 2009 14:12:28 -0500 Subject: [Biopython-dev] [Bug 2734] db.load problem with postgresql and psycopg2 In-Reply-To: Message-ID: <200901161912.n0GJCSWO030831@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2734 ------- Comment #2 from stephen at blackrim.net 2009-01-16 14:12 EST ------- Hi Peter, Thanks for the quick reply. I will try to answer everything here. So I just reran the BioSQL tests and I get test_BioSQL ... ok test_BioSQL_SeqIO ... ok so seems like everything there is fine (and I did configure the test for postgres with the psycopg2 driver). I am downloading from the NCBI ftp and it happens not only with all the files but also with the example on the biopython biosql wiki page. Specifically with this example: from Bio import Entrez from Bio import SeqIO from BioSQL import BioSeqDatabase server = BioSeqDatabase.open_database(driver="psycopg2", ...) 
db = server["plants"] handle = Entrez.efetch(db="nuccore", id="6273291,6273290,6273289", rettype="genbank") db.load(SeqIO.parse(handle, "genbank")) server.adaptor.commit() I get the same error: Traceback (most recent call last): File "", line 1, in File "/Library/Python/2.5/site-packages/BioSQL/BioSeqDatabase.py", line 420, in load db_loader.load_seqrecord(cur_record) File "/Library/Python/2.5/site-packages/BioSQL/Loader.py", line 50, in load_seqrecord self._load_seqfeature(seq_feature, seq_feature_num, bioentry_id) File "/Library/Python/2.5/site-packages/BioSQL/Loader.py", line 508, in _load_seqfeature self._load_seqfeature_qualifiers(feature.qualifiers, seqfeature_id) File "/Library/Python/2.5/site-packages/BioSQL/Loader.py", line 607, in _load_seqfeature_qualifiers seqfeature_id) File "/Library/Python/2.5/site-packages/BioSQL/Loader.py", line 645, in _load_seqfeature_dbxref self._get_seqfeature_dbxref(seqfeature_id, dbxref_id, rank+1) File "/Library/Python/2.5/site-packages/BioSQL/Loader.py", line 679, in _get_seqfeature_dbxref dbxref_id)) File "/Library/Python/2.5/site-packages/BioSQL/BioSeqDatabase.py", line 295, in execute_and_fetch_col0 self.cursor.execute(sql, args or ()) psycopg2.ProgrammingError: column "3" does not exist LINE 1: ...f_id FROM seqfeature_dbxref WHERE seqfeature_id = "3" AND db... Thanks for any help. Stephen (In reply to comment #1) > Hi Stephen, > > Does this happen for all the files you've tried, or just one or two? If its > the later it may be something funny about the file and how its been parsed. > I'm guessing you downloaded the GenBank files from > ftp://ftp.ncbi.nih.gov/genbank/ so could you tell us one which is failing. > > Have you tried running the Biopython unit tests - in particular the two for > BioSQL? I presume you installed Biopython from source on your Mac, so you > should have all the files present. You'll need to edit the file > Tests/setup_BioSQL.py to point to a suitable postgresql test database. > > P.S. 
As you are using Bio.SeqIO to parse the GenBank file, you don't need to > import Bio.GenBank (first line of code snippet). > -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sat Jan 17 10:09:21 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 17 Jan 2009 05:09:21 -0500 Subject: [Biopython-dev] [Bug 2734] db.load problem with postgresql and psycopg2 In-Reply-To: Message-ID: <200901171009.n0HA9Lk3027163@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2734 ------- Comment #3 from cymon.cox at gmail.com 2009-01-17 05:09 EST ------- Hi Stephen, 2009/1/16 : > http://bugzilla.open-bio.org/show_bug.cgi?id=2734 > > ------- Comment #2 from stephen at blackrim.net 2009-01-16 14:12 EST ------- > Hi Peter, > Thanks for the quick reply. I will try to answer everything here. So I just > reran the BioSQL tests and I get > test_BioSQL ... ok > test_BioSQL_SeqIO ... ok > > so seems like everything there is fine (and I did configure the test for > postgres with the psycopg2 driver). I am downloading from the NCBI ftp and it > happens not only with all the files but also with the example on the biopython > biosql wiki page. Specifically with this example: > from Bio import Entrez > from Bio import SeqIO > from BioSQL import BioSeqDatabase > server = BioSeqDatabase.open_database(driver="psycopg2", ...) > db = server["plants"] > handle = Entrez.efetch(db="nuccore", id="6273291,6273290,6273289", > rettype="genbank") > db.load(SeqIO.parse(handle, "genbank")) > server.adaptor.commit() This code works form me: [cymon at chara ~]$ python Python 2.5.2 (r252:60911, Jul 24 2008, 17:11:36) [GCC 4.1.2 20071124 (Red Hat 4.1.2-42)] on linux2 Type "help", "copyright", "credits" or "license" for more information. 
>>> from Bio import Entrez >>> from Bio import SeqIO >>> from BioSQL import BioSeqDatabase >>> server = BioSeqDatabase.open_database(driver="psycopg2", db = "biosql_test") >>> db = server.new_database("blah", description="Just for testing") >>> handle = Entrez.efetch(db="nuccore", id="6273291,6273290,6273289", rettype="genbank") >>> server.adaptor.commit() >>> What versions of biopython and the BioSQL schema are you using? Cymon -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sat Jan 17 10:50:19 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 17 Jan 2009 05:50:19 -0500 Subject: [Biopython-dev] [Bug 2734] db.load problem with postgresql and psycopg2 In-Reply-To: Message-ID: <200901171050.n0HAoJZa029834@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2734 ------- Comment #4 from cymon.cox at gmail.com 2009-01-17 05:50 EST ------- > This code works form me: > [cymon at chara ~]$ python > Python 2.5.2 (r252:60911, Jul 24 2008, 17:11:36) > [GCC 4.1.2 20071124 (Red Hat 4.1.2-42)] on linux2 > Type "help", "copyright", "credits" or "license" for more information. > >>> from Bio import Entrez > >>> from Bio import SeqIO > >>> from BioSQL import BioSeqDatabase > >>> server = BioSeqDatabase.open_database(driver="psycopg2", db = "biosql_test") > >>> db = server.new_database("blah", description="Just for testing") > >>> handle = Entrez.efetch(db="nuccore", id="6273291,6273290,6273289", rettype="genbank") > >>> server.adaptor.commit() > >>> Sorry forgot to load it! :) >>> db.load(SeqIO.parse(handle, "genbank")) 3 >>> server.adaptor.commit() >>> C. 
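A note on the failing query reported earlier in this bug: PostgreSQL reads a double-quoted token such as "3" as an identifier (a column name), not as a value, which is consistent with the message 'column "3" does not exist'. The thread does not establish where the double quotes come from, so the following is only a sketch of the usual way to avoid the problem, namely passing values as bound parameters and letting the driver do the quoting. The helper name `seqfeature_dbxref_query` is hypothetical and is not BioSQL code.

```python
def seqfeature_dbxref_query(seqfeature_id, dbxref_id):
    """Return (sql, params) suitable for a DB-API cursor.execute call.

    The %s markers are placeholders understood by psycopg2; they are not
    Python string formatting, and no quotes are written around them.
    """
    sql = ("SELECT seqfeature_id, dbxref_id FROM seqfeature_dbxref "
           "WHERE seqfeature_id = %s AND dbxref_id = %s")
    return sql, (seqfeature_id, dbxref_id)


# Usage with an open psycopg2 cursor (not executed here):
#     sql, params = seqfeature_dbxref_query(3, 6)
#     cursor.execute(sql, params)
```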
From bugzilla-daemon at portal.open-bio.org Wed Jan 21 18:22:47 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 21 Jan 2009 13:22:47 -0500
Subject: [Biopython-dev] [Bug 2738] New: Speed up GenBank parsing, in particular location parsing
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2738

           Summary: Speed up GenBank parsing, in particular location parsing
           Product: Biopython
           Version: Not Applicable
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk

This is an enhancement "bug", for trying to improve the speed of parsing GenBank files WITHOUT any functionality changes.

From previous profiling, I have found that the location parsing looks like an easy target. However, this code is non-trivial so we should proceed with caution.

Possible patch to follow...
From bugzilla-daemon at portal.open-bio.org Wed Jan 21 18:30:27 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 21 Jan 2009 13:30:27 -0500
Subject: [Biopython-dev] [Bug 2738] Speed up GenBank parsing, in particular location parsing
In-Reply-To:
Message-ID: <200901211830.n0LIURFx009561@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2738

------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-21 13:30 EST -------
Created an attachment (id=1206) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1206&action=view)
Patch for Bio/GenBank/__init__.py to handle simple locations with re

This patch handles the simple cases (non-fuzzy, no database references) using simple python and regular expressions. Everything else works by falling back on the old spark based Bio.GenBank.LocationParser code (e.g. fuzzy locations).

The new code is pretty simple, and could potentially be extended to cover all the currently used location strings found in the feature table, allowing us to remove the use of Bio.GenBank.LocationParser, which in the long term could lead to an overall code simplification.

In the short term, this patch does complicate the location parsing because it means there are effectively two ways we parse the location strings (my new code, and the old spark based Bio.GenBank.LocationParser code). However, from my limited testing using Python 2.5 on the Mac with GenBank files for large bacterial genomes, this may be a price worth paying. I'd like independent measurements (and to check this on other platforms), but this does seem to more than halve the time taken to parse GenBank files!
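As a rough illustration of the approach described in comment #1 (this is not the attached patch), a regular expression can handle plain locations such as 123..456 or complement(123..456) and pass anything else to a slower general parser. The names parse_simple_location, parse_location and fallback are hypothetical, and the (start, end, strand) tuple is an assumption made for this sketch rather than Biopython's own location objects.

```python
import re

# Matches only plain "start..end" locations, optionally wrapped in complement(...).
# Fuzzy positions (<, >), joins and database references are deliberately excluded.
_SIMPLE_LOCATION = re.compile(r"^(complement\()?(\d+)\.\.(\d+)(\))?$")


def parse_simple_location(text):
    """Return (start, end, strand) for a simple location, or None if it is not simple.

    Positions are returned exactly as written in the file (no offset conversion).
    """
    match = _SIMPLE_LOCATION.match(text.strip())
    if match is None:
        return None
    open_paren, start, end, close_paren = match.groups()
    # "complement(" must be closed, and a bare location must not end in ")".
    if bool(open_paren) != bool(close_paren):
        return None
    strand = -1 if open_paren else 1
    return int(start), int(end), strand


def parse_location(text, fallback):
    """Try the fast path first; otherwise call the (hypothetical) slow parser."""
    result = parse_simple_location(text)
    if result is None:
        return fallback(text)
    return result
```

Keeping the fast path strict means any location it does not recognise still goes through the existing parser, so behaviour only changes for strings the regular expression fully understands.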
From bugzilla-daemon at portal.open-bio.org Thu Jan 22 18:58:18 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 22 Jan 2009 13:58:18 -0500
Subject: [Biopython-dev] [Bug 2738] Speed up GenBank parsing, in particular location parsing
In-Reply-To:
Message-ID: <200901221858.n0MIwIpR000974@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2738

------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-22 13:58 EST -------
Created an attachment (id=1208) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1208&action=view)
Simple test script for timing GenBank parsing

I've attached a trivial script to time parsing all the GenBank files in a directory to help anyone wanting to benchmark this change.

(In reply to comment #1)
> However, from my limited testing using Python 2.5 on the Mac with GenBank
> files for large bacterial genomes, this may be a price worth paying. I'd
> like independent measurements (and to check this on other platforms), but
> this does seem to more than halve the time taken to parse GenBank files!

Further testing with Python 2.5 on Linux, this time also with some large eukaryotic files, appears to confirm a very large speed up (most obvious on feature-rich GenBank files of course).

I still want to check this on other versions of python...
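The attached timing script is not reproduced in this archive. A minimal sketch of the same idea follows, assuming a hypothetical helper time_parsing and a caller-supplied count_records function (for example, one that loops over SeqIO.parse(handle, "genbank") as done elsewhere in this thread). The default "*.gbk" pattern is likewise just an assumption.

```python
import glob
import os
import time


def time_parsing(directory, count_records, pattern="*.gbk"):
    """Time count_records(path) for each matching file.

    Returns a list of (file name, record count, seconds) tuples, sorted by name.
    """
    results = []
    for path in sorted(glob.glob(os.path.join(directory, pattern))):
        start = time.perf_counter()
        n = count_records(path)  # caller-supplied parser, e.g. built on Bio.SeqIO
        elapsed = time.perf_counter() - start
        results.append((os.path.basename(path), n, elapsed))
    return results
```

Running this once against an unpatched install and once against a patched one, on the same files, would give per-file timings that can be compared directly.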
From bugzilla-daemon at portal.open-bio.org Fri Jan 23 08:43:01 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 23 Jan 2009 03:43:01 -0500 Subject: [Biopython-dev] [Bug 2740] New: Wise test fails with wise 2.4.1 Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2740 Summary: Wise test fails with wise 2.4.1 Product: Biopython Version: 1.49 Platform: Other OS/Version: Linux Status: NEW Severity: normal Priority: P2 Component: Unit Tests AssignedTo: biopython-dev at biopython.org ReportedBy: charles-debian-nospam at plessy.org Dear Biopython developers, The test for wise fails with wise 2.4.1 and Biopython 1.49. I think one gap is missing in the reference used in the test script (probably that wise changed its gap opening penalties): anx159???Tests???$ dnal Wise/human_114_g01_exons.fna_01 Wise/human_114_g02_exons.fna_01 Warning Error Strangely truncated line in fasta file Warning Error Strangely truncated line in fasta file DnaAlign Matrix calculation: [ 14000] Cells 95% Score 114 Warning Error Name ENSG00000172056|ENST00000321078|ENSE00001281503 is longer than allowed name block (12). Truncating Warning Error Name ENSG00000163182|ENST00000295339|ENSE00001130648 is longer than allowed name block (12). 
Truncating ENSG00000172135 AGGGAAAGCCCCTAAGCTC--CTGATCTATGCTGCATCCAGTTTGCAAAG-TGGGGTCC A GGAA GCCCC AGCTC CT TCT CT C TCC TGC A TGG TCC ENSG000001631 ATGGAA-GCCCC--AGCTCAGCT--TCT---CTTCCTCC----TGCTACTCTGGC-TCC ENSG00000172191 CA CA ENSG0000016347 CA This is compared to a different reference result in the test script: anx159???Tests???$ grep -A5 -B5 ENSG00000172135 test_Wise.py sys.stdout = self.old_stdout class TestWise(unittest.TestCase): def test_align(self): temp_file = Wise.align(["dnal"], ("Wise/human_114_g01_exons.fna_01", "Wise/human_114_g02_exons.fna_01"), kbyte=100000, force_type="DNA", quiet=True) self.assertEqual(temp_file.readline().rstrip(), "ENSG00000172135 AGGGAAAGCCCCTAAGCTC--CTGATCTATGCTGCATCCAGTTTGCAAAGTGGGGTCCC") def run_tests(argv): test_suite = testing_suite() runner = unittest.TextTestRunner(sys.stdout, verbosity = 2) runner.run(test_suite) Have a nice day, -- Charles Plessy Debian Med packaging team, http://www.debian.org/devel/debian-med Tsurumi, Kanagawa, Japan -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Jan 23 12:06:29 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 23 Jan 2009 07:06:29 -0500 Subject: [Biopython-dev] [Bug 2740] Wise test fails with wise 2.4.1 In-Reply-To: Message-ID: <200901231206.n0NC6T4B023669@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2740 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-23 07:06 EST ------- Thanks for the report. 
Based on the following pages I had assumed the latest version was wise 2.2.0, available here: http://www.sanger.ac.uk/Software/Wise2/ points to ftp://ftp.ebi.ac.uk/pub/software/unix/wise2/ which only contains up to wise 2.2.0 After some Google searching I found Ewan Birney had changed his mind and stared work on it again: http://www.ebi.ac.uk/~birney/wise2/ Installing wise 2.4.1 took a while (tip for Linux uses, edit file src/models/phasemodel.c line 23 to replace isnumber by isdigit), but I can confirm the error you reported. This is the output from an older version of wise, $ ~/Downloads/wise2.2.0/src/bin/dnal Wise/human_114_g01_exons.fna_01 Wise/human_114_g02_exons.fna_01 DnaAlign Matrix calculation: [ 14000] Cells 97% Warning Error Name ENSG00000172056|ENST00000321078|ENSE00001281503 is longer than allowed name block (12). Truncating Warning Error Name ENSG00000163182|ENST00000295339|ENSE00001130648 is longer than allowed name block (12). Truncating ENSG00000172135 AGGGAAAGCCCCTAAGCTC--CTGATCTATGCTGCATCCAGTTTGCAAAGTGGGGTCCC A GGAA GCCCC AGCTC CT TCT CT C TCC TGC A GG TCCC ENSG000001631 ATGGAA-GCCCC--AGCTCAGCT--TCT---CTTCCTCC----TGCTACTCTGGCTCCC ENSG00000172192 A A ENSG0000016348 A Using the newer version of wise, we do indeed get a different alignment: $ ~/Downloads/wise2.4.1/src/bin/dnal Wise/human_114_g01_exons.fna_01 Wise/human_114_g02_exons.fna_01 DnaAlign Matrix calculation: [ 14000] Cells 97% Score 114 Warning Error Name ENSG00000172056|ENST00000321078|ENSE00001281503 is longer than allowed name block (12). Truncating Warning Error Name ENSG00000163182|ENST00000295339|ENSE00001130648 is longer than allowed name block (12). 
Truncating ENSG00000172135 AGGGAAAGCCCCTAAGCTC--CTGATCTATGCTGCATCCAGTTTGCAAAG-TGGGGTCC A GGAA GCCCC AGCTC CT TCT CT C TCC TGC A TGG TCC ENSG000001631 ATGGAA-GCCCC--AGCTCAGCT--TCT---CTTCCTCC----TGCTACTCTGGC-TCC ENSG00000172191 CA CA ENSG0000016347 CA

From bugzilla-daemon at portal.open-bio.org Fri Jan 23 12:28:05 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 23 Jan 2009 07:28:05 -0500
Subject: [Biopython-dev] [Bug 2740] Wise test fails with wise 2.4.1
In-Reply-To:
Message-ID: <200901231228.n0NCS5a8028823@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2740

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-23 07:28 EST -------
This should be fixed in CVS, see:
Tests/test_Wise.py revision 1.7
Tests/output/test_Wise revision 1.3

All I have done is made the unit test accept the old output, or the slightly different output from wise 2.4.1 - the main Biopython code is unchanged.

From the help text (just run dnal with no arguments), it appears the gap penalties have not changed - so the differing alignments must be due to an algorithm change of some sort. Another small difference is that with wise 2.4.1, even in quiet mode, dnal starts its output by printing the score.

Thank you for reporting this,

Peter
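The CVS change itself is not shown in this thread. One way a test could tolerate both versions is sketched below, using the two alignment lines quoted in this bug. The helpers first_alignment_line and output_is_accepted are hypothetical, and the sketch assumes the score that wise 2.4.1 prints appears on its own line starting with "Score".

```python
# First alignment line reported in this bug for each wise version.
ACCEPTED_FIRST_LINES = {
    # wise 2.2.0
    "ENSG00000172135 AGGGAAAGCCCCTAAGCTC--CTGATCTATGCTGCATCCAGTTTGCAAAGTGGGGTCCC",
    # wise 2.4.1
    "ENSG00000172135 AGGGAAAGCCCCTAAGCTC--CTGATCTATGCTGCATCCAGTTTGCAAAG-TGGGGTCC",
}


def first_alignment_line(lines):
    """Return the first non-blank, non-score line with whitespace normalised."""
    for line in lines:
        normalised = " ".join(line.split())
        # Assumption: the score is printed as a separate "Score ..." line.
        if not normalised or normalised.startswith("Score"):
            continue
        return normalised
    return None


def output_is_accepted(lines):
    """True if the output starts with either known alignment line."""
    return first_alignment_line(lines) in ACCEPTED_FIRST_LINES
```

Normalising whitespace before comparing avoids depending on the exact column padding of the dnal output, which this archive does not preserve.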
From bugzilla-daemon at portal.open-bio.org Wed Jan 28 10:13:43 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 28 Jan 2009 05:13:43 -0500 Subject: [Biopython-dev] [Bug 2743] New: manual installation overwrites previous biopython installations Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2743 Summary: manual installation overwrites previous biopython installations Product: Biopython Version: Not Applicable Platform: All URL: http://lists.open-bio.org/pipermail/biopython/2009- January/004893.html OS/Version: Linux Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: dalloliogm at gmail.com The manual biopython installation (the one made with python setup.py install) installs all the files in a directory like this: - /usr/lib/python2.5/site-packages/Bio The problem comes when you want to install biopython in a system where there is already an old version installed. In that case, it is not clear what happens to the old installation... are all the old files removed before the new version is installed? Or are the two versions 'mixed'? please refer to this discussion: - http://lists.open-bio.org/pipermail/biopython/2009-January/004893.html -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Wed Jan 28 11:05:07 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 28 Jan 2009 06:05:07 -0500 Subject: [Biopython-dev] [Bug 2743] manual installation overwrites previous biopython installations In-Reply-To: Message-ID: <200901281105.n0SB577F013398@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2743 ------- Comment #1 from mdehoon at ims.u-tokyo.ac.jp 2009-01-28 06:05 EST ------- (In reply to comment #0) > The manual biopython installation (the one made with python setup.py install) > installs all the files in a directory like this: > - /usr/lib/python2.5/site-packages/Bio > > The problem comes when you want to install biopython in a system where there is > already an old version installed. > In that case, it is not clear what happens to the old installation... are all > the old files removed before the new version is installed? Or are the two > versions 'mixed'? Isn't this what always happens when installing a Python module? If so, then it doesn't seem to be a Biopython bug to me. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Wed Jan 28 11:14:28 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 28 Jan 2009 06:14:28 -0500 Subject: [Biopython-dev] [Bug 2743] manual installation overwrites previous biopython installations In-Reply-To: Message-ID: <200901281114.n0SBESYY014510@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2743 ------- Comment #2 from dalloliogm at gmail.com 2009-01-28 06:14 EST ------- (In reply to comment #1) > (In reply to comment #0) > > The manual biopython installation (the one made with python setup.py install) > > installs all the files in a directory like this: > > - /usr/lib/python2.5/site-packages/Bio > > > > The problem comes when you want to install biopython in a system where there is > > already an old version installed. > > In that case, it is not clear what happens to the old installation... are all > > the old files removed before the new version is installed? Or are the two > > versions 'mixed'? > > Isn't this what always happens when installing a Python module? If so, then it > doesn't seem to be a Biopython bug to me. Well, I don't know if it is the same behaviour for the other python modules, but it can create dangerous situations, especially if you are 'downgrading' a biopython installation. The biopython installer should clarify that, asking the user if he wants to overwrite the existing installation, change the installation path, or abort. Anyway. the right way to install biopython should be by using easy_install. Easy_install downloads the latest code and creates an egg, and then install everything on a directory like this: - /usr/lib/python2.5/site-packages/biopython-1.49-py2.5-linux-i686.egg/ automatically changing $PYTHON_PATH. I suggest to change the biopython's wiki to tell people that they should always prefer to install biopython with easy_install, which by the way works perfectly and automatically checks the dependencies. 
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Jan 28 12:46:37 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 28 Jan 2009 07:46:37 -0500 Subject: [Biopython-dev] [Bug 2743] manual installation overwrites previous biopython installations In-Reply-To: Message-ID: <200901281246.n0SCkbKj028750@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2743 ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-28 07:46 EST ------- (In reply to comment #1) > > the old files removed before the new version is installed? Or are the two > > versions 'mixed'? > > Isn't this what always happens when installing a Python module? If so, then it > doesn't seem to be a Biopython bug to me. Agreed. As far as I know, this affects ANY python module installed with distutils - and indeed this is typical practice for ANY unix tool installed from source via a make file. It is essentially NORMAL, although not so nice for beginners. Linux distributions will often provide packaged versions of python libraries (including Biopython) which you can install/update/remove using the system's package manager (e.g. apt, yum, up2date etc). The only downside to me is they won't always have the latest version of each package. I suppose we could add a hack to setup.py to check if there is already a Biopython installation present (try doing "import Bio"), and if it is installed, ask the user if they want to continue. However, there are legitimate situations where this just makes things more confusing. e.g. You don't have admin rights on a unix machine where your systems administrator has provided python and an old version of Biopython, so you want to install the latest version of Biopython under your home directory. 
(In reply to comment #2) > I suggest to change the biopython's wiki to tell people that they should > always prefer to install biopython with easy_install, which by the way works > perfectly and automatically checks the dependencies. For now distutils is still the python standard, while easy_install is an non-standard optional extra. This in some ways using easy_install is more work. Note that easy_install doesn't provide a simple uninstall either: http://peak.telecommunity.com/DevCenter/EasyInstall#uninstalling-packages -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Jan 28 15:23:48 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 28 Jan 2009 10:23:48 -0500 Subject: [Biopython-dev] [Bug 2743] manual installation overwrites previous biopython installations In-Reply-To: Message-ID: <200901281523.n0SFNmqQ013945@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2743 ------- Comment #4 from bsouthey at gmail.com 2009-01-28 10:23 EST ------- (In reply to comment #3) > (In reply to comment #1) > > > the old files removed before the new version is installed? Or are the two > > > versions 'mixed'? > > > > Isn't this what always happens when installing a Python module? If so, then it > > doesn't seem to be a Biopython bug to me. > > Agreed. As far as I know, this affects ANY python module installed with > distutils - and indeed this is typical practice for ANY unix tool installed > from source via a make file. It is essentially NORMAL, although not so nice > for beginners. > Agreed that this is not a Biopython bug but a Python feature. Yes, the installation is usually 'mixed' when installing from source. The setup will remove the existing egg-info and then a new one. 
Python copies the files to the appropriate place, thus overwriting any old files with new versions, but old files that are no longer part of the new version, or that have different names, will remain. To my knowledge, Python and Biopython will not know about those files unless a user explicitly tries to use them.

Bruce

From bugzilla-daemon at portal.open-bio.org Thu Jan 29 17:41:19 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 29 Jan 2009 12:41:19 -0500
Subject: [Biopython-dev] [Bug 2738] Speed up GenBank parsing, in particular location parsing
In-Reply-To:
Message-ID: <200901291741.n0THfJYC018518@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2738

------- Comment #3 from bsouthey at gmail.com 2009-01-29 12:41 EST -------
First, I object to this patch because it replaces the current version without keeping the old code. It should create a new parsing function so that we can verify that the old and new versions provide exactly the same output for the same input.

As indicated below, it does speed things up! So I have no problem with it replacing the current parsing code in the next release, provided that the old parsing code remains as a deprecated function. (Alternatively, add a conditional statement with a flag to avoid this new code as required.)

(In reply to comment #2)
> Created an attachment (id=1208) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1208&action=view) [details]
> Simple test script for timing GenBank parsing
>
> I've attached a trivial script to time parsing all the GenBank files in
> directory to help anyone wanting to benchmark this change.
>
> (In reply to comment #1)
> > However, from my limited testing using Python 2.5 on the Mac with GenBank
> > files for large bacterial genomes, this may be a price worth paying. I'd
> > like independent measurements (and to check this on other platforms), but
> > this does seem to more than halve the time taken to parse GenBank files!
>
> Further testing with Python 2.5 on Linux, this time also with some large
> eukaryotic files, appears to confirm a very large speed up (most obvious on
> feature rich GenBank files of course).
>
> I still want to check this on other versions of python...
>

I ran the script with the patch applied under Python on Linux (versions 2.3, 2.4, 2.5 and 2.6) and noted that this halved the time required to parse a GenBank Incremental Update file (an update from Jan 2009: nc0101.flat, size 573 MB, with 213942 records and a total length of 158245604 bp).

While the number of records and sequences are the same, I have not checked if the patched version is providing exactly the same output as the unpatched version. This is very important for the different types of GenBank files (Whole Genome Shotgun and CON types).

From bugzilla-daemon at portal.open-bio.org Thu Jan 29 17:57:22 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 29 Jan 2009 12:57:22 -0500
Subject: [Biopython-dev] [Bug 2738] Speed up GenBank parsing, in particular location parsing
In-Reply-To:
Message-ID: <200901291757.n0THvMVl023111@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2738

------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-29 12:57 EST -------
(In reply to comment #3)
> First, I object to this patch because it replaces the current version without
> keeping the old code.
It does keep the old code, and explicitly uses the old code for the non-simple locations. > It should create a new parsing function so verify that > the old and new versions provide exactly the same output for the same input. We should probably extend the Biopython GenBank/EMBL parsing unit tests to make sure this patch doesn't break anything, and additionally have some extra test cases using big GenBank files which won't become official unit tests. This could be as simple as a script which parses all the records in a set of GenBank files, printing out a very minimal summary of each feature location (including subfeatures). We then run the script with and without the patch, and confirm their output matches. Once we are happy that the patch doesn't change the parser behaviour, I don't see any reason to offer both options to the end user. In fact, I would prefer to go further and REMOVE the old slow location parser after extending the regular expression based parser to cope with ALL location variants. > As indicated below, it does speed things up! So I have no problems for it to > replace the current parsing code in the next release provided that the old > parsing code remains as depreciated function. (Alternatively add a conditional > statement with a flag to avoid this new code as required.) Having the new code controlled by some option would actually be pretty easy. Other than for testing I see no reason to do this. > I ran the script on patched version of Linux Python (versions 2.3, 2.4, 2.5 > and 2.6) and noted that this halved the time required to parse a Genbank > Incremental Update file (an update from Jan 2009: nc0101.flat size 573 mb) > with 213942 records with total length 158245604 bp). That is consistent with the speed ups I have seen - you can get even more depending on the proportion of features in the file. Thanks for checking python 2.3 to 2.6, nice to see they all benefit. 
> While the number of records and sequences are the same, I have not checked if
> the patched version is providing exactly the same output as the unpatched
> version. This is very important for the different types of GenBank files
> (Whole Genome Shotgun and CON types).

I agree thorough testing is important here. Would you like to suggest any particular WGS or CON files for testing with? I'm thinking something large with a wide range of location types would be good for checking this patch (but not to include with Biopython).

Peter

From bugzilla-daemon at portal.open-bio.org Thu Jan 29 18:26:09 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 29 Jan 2009 13:26:09 -0500
Subject: [Biopython-dev] [Bug 2738] Speed up GenBank parsing, in particular location parsing
In-Reply-To:
Message-ID: <200901291826.n0TIQ9YR030903@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2738

------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-29 13:26 EST -------
Created an attachment (id=1209) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1209&action=view)
Simple test script for checking GenBank location parsing

This is a simple script to help validate that the location parsing has not changed.
Intended usage is to put the script in a directory with a good set of test GenBank files (all ending with the extension .gbk), then:

(starting with a clean install of Biopython)
$ time python parse_gbk_locs.py > old.txt
(apply the patch)
$ time python parse_gbk_locs.py > new.txt
(verify the output matches)
$ ls -l old.txt new.txt    (check file sizes agree)
$ diff old.txt new.txt     (should be no output)

From bugzilla-daemon at portal.open-bio.org Thu Jan 29 19:38:20 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 29 Jan 2009 14:38:20 -0500
Subject: [Biopython-dev] [Bug 2738] Speed up GenBank parsing, in particular location parsing
In-Reply-To:
Message-ID: <200901291938.n0TJcKh2021246@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2738

------- Comment #6 from bsouthey at gmail.com 2009-01-29 14:38 EST -------
Created an attachment (id=1210) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1210&action=view)
Single test case that is not correctly parsed

I just used a simple 'print record' followed by a diff (but that does not check the references). This record (and related ones) has a difference between versions ...
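The before/after comparison discussed for this bug (capture a script's output with and without the patch, then diff the two files) could also be done from Python. This is only a sketch using the standard library's difflib; report_differences is a hypothetical helper, and the file names old.txt and new.txt simply follow the usage given for the test script.

```python
import difflib


def report_differences(old_path, new_path):
    """Return unified-diff lines between two text files (empty list if identical)."""
    with open(old_path) as old_handle:
        old_lines = old_handle.readlines()
    with open(new_path) as new_handle:
        new_lines = new_handle.readlines()
    return list(difflib.unified_diff(old_lines, new_lines,
                                     fromfile=old_path, tofile=new_path))
```

Calling report_differences("old.txt", "new.txt") and getting an empty list corresponds to diff printing nothing; any returned lines show exactly which location summaries changed.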
From bugzilla-daemon at portal.open-bio.org Thu Jan 29 21:13:19 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 29 Jan 2009 16:13:19 -0500
Subject: [Biopython-dev] [Bug 2738] Speed up GenBank parsing, in particular location parsing
In-Reply-To:
Message-ID: <200901292113.n0TLDJ51019466@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2738

------- Comment #7 from bsouthey at gmail.com 2009-01-29 16:13 EST -------
(In reply to comment #4)
> > While the number of records and sequences are the same, I have not checked if
> > the patched version is providing exactly the same output as the unpatched
> > version. This is very important for the different types of GenBank files
> > (Whole Genome Shotgun and CON types).
>
> I agree thorough testing is important here. Would you like to suggest any
> particular WGS or CON files for testing with?

I downloaded a few example files including WGS and CON. I found that CON files are not parsed by either version. Not a surprise given that these have no sequences, but that is a different topic. Apart from the errors in the attached case, I have not seen any other errors (even parsing the references).

Bruce
From bugzilla-daemon at portal.open-bio.org Fri Jan 30 11:00:24 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 30 Jan 2009 06:00:24 -0500
Subject: [Biopython-dev] [Bug 2738] Speed up GenBank parsing, in particular location parsing
In-Reply-To:
Message-ID: <200901301100.n0UB0OsD002442@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2738

------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-30 06:00 EST -------
(In reply to comment #6)
> Created an attachment (id=1210) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1210&action=view) [details]
> Single test case that is not correctly parsed
>
> I just used a simple 'print record' followed by a diff (but that does not
> check the references). This record (and related ones) has a difference
> between versions ...

If you do a 'print record' with a SeqRecord object, any references are shown
using their __repr__ string - which is currently the python object default
which includes a memory address (something I've been meaning to address on
Bug 2544). Different objects will have different memory locations, which will
show up in the diff.

For example, using the following as a simple test script and capturing its
output to files:

from Bio import SeqIO
record = SeqIO.read(open("CY029873.gbk"), "genbank")
print record

Running diff with and without the patch gave me:

9c9
< /references=[, ]
---
> /references=[, ]

i.e. No real differences between the records as far as I can see. Please
clarify - if you have found a failing example I would be most interested.

(In reply to comment #7)
> I downloaded a few example files including WGS and CON. I found that CON files
> are not parsed by either version. Not a surprise given that these have no
> sequences but that is a different topic. Apart from the errors in attached
> case, I have not seen any other errors (even parsing the references).

Could you clarify your problem with the CON files please (on a new bug, or the
mailing list - since as you point out this is a different topic). I've just
downloaded and unzipped one of the smaller CON files and it parses fine for me:
ftp://ftp.ncbi.nih.gov/genbank/gbcon107.seq.gz

>>> from Bio import SeqIO
>>> count = 0
>>> for record in SeqIO.parse(open("gbcon107.seq"),"genbank") : count += 1
...
>>> print count
55031

As expected there is no sequence, but the name, description, features,
references etc are there.

From bugzilla-daemon at portal.open-bio.org Fri Jan 30 11:29:07 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 30 Jan 2009 06:29:07 -0500
Subject: [Biopython-dev] [Bug 2738] Speed up GenBank parsing, in particular location parsing
In-Reply-To:
Message-ID: <200901301129.n0UBT7Ah008213@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2738

------- Comment #9 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-30 06:29 EST -------
I've run my test script (attachment 1209) on a Linux machine with Python 2.5

5.5K Jan 30 10:29 CY029873.gbk
 67M Jan 22 17:53 dr_ref_chr16.gbk
 42M Jan 22 17:53 NC_003075.gbk
 14M Jan 22 18:43 NC_003272.gbk
 25M Jan 22 17:52 NC_003279.gbk
4.8M Jan 22 18:44 NC_004350.gbk
 20M Jan 22 18:42 NC_008095.gbk
 14M Jan 22 18:44 NC_009925.gbk
 18M Jan 22 18:43 NC_010628.gbk
296M Jan 22 17:52 ptr_ref_chr1.gbk
 86M Jan 30 10:55 wgs.AAAB.1.gnp.gbk
297M Jan 30 10:55 wgs.AABR.10.gbff.gbk

The last two files are WGS data for protein and nucleotide sequences,
downloaded from ftp://ftp.ncbi.nih.gov/genbank/wgs/ then unzipped and a gbk
extension added so my script parses them.

With and without the patch the test script gives identical output - which
appears to confirm the location parsing is not functionally altered.

The timings were just over 2min with the patch and just over 8min without it
(a fourfold speed-up on this dataset).

From bugzilla-daemon at portal.open-bio.org Fri Jan 30 11:30:30 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 30 Jan 2009 06:30:30 -0500
Subject: [Biopython-dev] [Bug 2649] Bio.KDTree expects numpy array with dtype="float32" on 64 bit machines.
In-Reply-To:
Message-ID: <200901301130.n0UBUUMm008550@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2649

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-30 06:30 EST -------
Marking as fixed - please reopen this if need be.

From bugzilla-daemon at portal.open-bio.org Fri Jan 30 11:54:26 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 30 Jan 2009 06:54:26 -0500
Subject: [Biopython-dev] [Bug 2639] SeqRecord.init doesn't check for arguments for their types
In-Reply-To:
Message-ID: <200901301154.n0UBsQbw014456@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2639

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|                            |FIXED

------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-30 06:54 EST -------
(In reply to comment #5)
> Ok, understood. I didn't think of these cases.
> However, not having a Seq causes errors that are difficult to
> understand in other functions that use SeqRecord.
> For example, if you do:
>
> >>> a = SeqRecord(id = '1')
> >>> a.format('fasta')
>
> you get the error:
> : 'NoneType' object has no attribute
> 'tostring'
>
> This could scare a biopython newbie; an exception like
> 'error - current SeqRecord object doesn't have a Seq' would be better.

Well, if you want to create a SeqRecord where the sequence is None, you'd have
to do SeqRecord(None, id="1") - your suggestion of SeqRecord(id="1") doesn't
work as the sequence is a mandatory argument.

However, I see your point that the current AttributeError isn't helpful in
this special case. I've updated the Bio/SeqIO/FastaIO.py file in CVS (revision
1.15) to give a TypeError in this situation which will try to explain the
problem.

> What do you think about creating a 'NullSeq' object, which represents a
> Seq with no value, and using it as a default for SeqRecord?
> Later we could modify the other functions like .format and Seq.translate to
> intercept these objects and return the right error message.

Hmm. It seems rather complicated for a rare case. Using None to mean "missing"
or "null" is done in other python libraries/code (e.g. database access), which
is why I suggested someone might want to do this.

Marking this bug as fixed.

From bugzilla-daemon at portal.open-bio.org Fri Jan 30 12:00:19 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 30 Jan 2009 07:00:19 -0500
Subject: [Biopython-dev] [Bug 2734] db.load problem with postgresql and psycopg2
In-Reply-To:
Message-ID: <200901301200.n0UC0JcD016114@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2734

------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-30 07:00 EST -------
(In reply to comment #3)
>
> What versions of biopython and the BioSQL schema are you using?
>
> Cymon

According to the bug report, Stephen was using Biopython 1.49, so:

Stephen:
Biopython 1.49
postgresql 8.2
BioSQL - schema version unspecified
psycopg2 - version unspecified
python - version unspecified
OS - Mac OS X

What about you Cymon - you have postgresql with psycopg2 working, but what
versions of things?

Peter

From bugzilla-daemon at portal.open-bio.org Fri Jan 30 12:13:52 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 30 Jan 2009 07:13:52 -0500
Subject: [Biopython-dev] [Bug 2723] Minor corrections to the installation document
In-Reply-To:
Message-ID: <200901301213.n0UCDqef019147@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2723

------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-30 07:13 EST -------
(In reply to comment #2)
> I'm leaving this bug open until I've updated the HTML and PDF copies of the
> installation document on the website. I don't have the tools hevea installed
> on this machine, so I can't create the HTML version of the installation
> document -- just the PDF. I should be able to do this next week...

Website updated. Marking this bug as fixed.

From bugzilla-daemon at portal.open-bio.org Fri Jan 30 12:20:06 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 30 Jan 2009 07:20:06 -0500
Subject: [Biopython-dev] [Bug 2734] db.load problem with postgresql and psycopg2
In-Reply-To:
Message-ID: <200901301220.n0UCK6Fp020687@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2734

------- Comment #6 from cymon.cox at gmail.com 2009-01-30 07:20 EST -------
(In reply to comment #5)
> (In reply to comment #3)
> >
> > What versions of biopython and the BioSQL schema are you using?
> >
> > Cymon
>
> According to the bug report, Stephen was using Biopython 1.49, so:
>
> Stephen:
> Biopython 1.49
> postgresql 8.2
> BioSQL - schema version unspecified
> psycopg2 - version unspecified
> python - version unspecified
> OS - Mac OS X
>
> What about you Cymon - you have postgresql with psycopg2 working, but what
> versions of things?
>
> Peter
>

Peter, I'm using:

Biopython: CVS
Postgresql: 8.1.11
BioSQL: 1.0.1
Python: 2.5.2
Psycopg: 2.0.8
OS: Red Hat Enterprise 5.3

C.

From bugzilla-daemon at portal.open-bio.org Fri Jan 30 14:16:32 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 30 Jan 2009 09:16:32 -0500
Subject: [Biopython-dev] [Bug 2697] MaxEntropy calculate function assumes integer values for class and convergence criteria is hard coded
In-Reply-To:
Message-ID: <200901301416.n0UEGWeN005337@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2697

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1139 is|0                           |1
           obsolete|                            |

------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-30 09:16 EST -------
Created an attachment (id=1211) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1211&action=view)
Patch to Bio/MaxEntropy.py to make the convergence parameters optional arguments

This should retain API backwards compatibility by using the current module
level values as the function's default arguments (see earlier comments). I've
checked that changing these and then re-calling the train function does work
as expected.

How does this look?

From bugzilla-daemon at portal.open-bio.org Fri Jan 30 14:17:43 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 30 Jan 2009 09:17:43 -0500
Subject: [Biopython-dev] [Bug 2697] MaxEntropy calculate function assumes integer values for class and convergence criteria is hard coded
In-Reply-To:
Message-ID: <200901301417.n0UEHhKG005438@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2697

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Attachment #1211|application/octet-stream    |text/plain
          mime type|                            |
Attachment #1211 is|0                           |1
              patch|                            |

------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-30 09:17 EST -------
(From update of attachment 1211)
Marking this as a patch (plain text)

From bugzilla-daemon at portal.open-bio.org Fri Jan 30 14:19:43 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 30 Jan 2009 09:19:43 -0500
Subject: [Biopython-dev] [Bug 2697] MaxEntropy calculate function assumes integer values for class and convergence criteria is hard coded
In-Reply-To:
Message-ID: <200901301419.n0UEJhID005587@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2697

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1211 is|0                           |1
           obsolete|                            |

------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-30 09:19 EST -------
(From update of attachment 1211)
Sorry - wrong version of the patch. This doesn't cover _iis_solve_delta etc.

From bugzilla-daemon at portal.open-bio.org Fri Jan 30 14:30:40 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 30 Jan 2009 09:30:40 -0500
Subject: [Biopython-dev] [Bug 2697] MaxEntropy calculate function assumes integer values for class and convergence criteria is hard coded
In-Reply-To:
Message-ID: <200901301430.n0UEUe04006448@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2697

------- Comment #9 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-30 09:30 EST -------
Created an attachment (id=1212) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1212&action=view)
Patch to Bio/MaxEntropy.py to make the convergence parameters optional arguments

This time it's the whole patch - sorry for the extra emails this has
triggered. I had stopped to check in a couple of docstring changes and fixed a
few tabs in MaxEntropy.py first, which confused things.

Note this is a bit different to what I was thinking in comment #5,

> ... something like this:
>
> def train(training_set, results, feature_fns, update_fn=None,
>           max_iis_iterations = MAX_IIS_ITERATIONS,
>           iis_converge = IIS_CONVERGE,
>           max_newton_iterations = MAX_NEWTON_ITERATIONS,
>           newton_converge = NEWTON_CONVERGE):

The above code won't pick up changes to the module level variables like
MAX_IIS_ITERATIONS because the defaults are only evaluated once when the
function is created. The patch deals with this as follows:

def train(training_set, results, feature_fns, update_fn=None,
          max_iis_iterations=None, iis_converge=None,
          max_newton_iterations=None, newton_converge=None):
    if max_iis_iterations is None :
        max_iis_iterations = MAX_IIS_ITERATIONS
    if iis_converge is None :
        iis_converge = IIS_CONVERGE
    if max_newton_iterations is None :
        max_newton_iterations = MAX_NEWTON_ITERATIONS
    if newton_converge is None :
        newton_converge = NEWTON_CONVERGE

This works :)

From bugzilla-daemon at portal.open-bio.org Fri Jan 30 14:34:23 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 30 Jan 2009 09:34:23 -0500
Subject: [Biopython-dev] [Bug 2745] New: Bio.GenBank.LocationParserError with a GenBank CON file
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2745

           Summary: Bio.GenBank.LocationParserError with a GenBank CON file
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: minor
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: bsouthey at gmail.com

The following file has a Bio.GenBank.LocationParserError:
ftp://ftp.ncbi.nih.gov/genbank/daily-nc/con_nc.0103.flat.gz

Partial error message (as the last line is the complete CONTIG line).

Syntax error at or near `Tokens('close_paren')' token
Traceback (most recent call last):
  File "parse_gbk.py", line 26, in
    for record in SeqIO.parse(handle, "genbank") :
  File "/home/bsouthey/python/biopython_cvs/biopython/build/lib.linux-x86_64-2.5/Bio/GenBank/Scanner.py", line 410, in parse_records
  File "/home/bsouthey/python/biopython_cvs/biopython/build/lib.linux-x86_64-2.5/Bio/GenBank/Scanner.py", line 393, in parse
  File "/home/bsouthey/python/biopython_cvs/biopython/build/lib.linux-x86_64-2.5/Bio/GenBank/Scanner.py", line 371, in feed
  File "/home/bsouthey/python/biopython_cvs/biopython/build/lib.linux-x86_64-2.5/Bio/GenBank/Scanner.py", line 1093, in _feed_misc_lines
  File "/home/bsouthey/python/biopython_cvs/biopython/build/lib.linux-x86_64-2.5/Bio/GenBank/__init__.py", line 990, in contig_location
  File "/home/bsouthey/python/biopython_cvs/biopython/build/lib.linux-x86_64-2.5/Bio/GenBank/__init__.py", line 707, in location
Bio.GenBank.LocationParserError: join(DS483543.1:1..325170,gap(unk100),DS483544.1:1..218545,gap(unk100),DS483545.1:1..95394,gap(unk100),DS483546.1:1..261305,gap(unk100),DS483547.1:1..63422,gap(unk100),DS483548.1:1..77432,gap(unk100),DS483549.1:1..371434,gap(unk100),DS483550.1:1..74569,gap(unk100),DS483551.1:1..54637,gap(unk100),DS483552.1:1..73591,gap(unk100),DS483553.1:1..63632,gap(unk100),DS483554.1:1..60619,gap(unk100),DS483555.1:1..57196,gap(unk100),DS483556.1:1..95189,gap(unk100),DS483557.1:1..48586,gap(unk100),DS483558.1:1..45971,gap(unk100),DS483559.1:1..59826,gap(unk100),DS483560.1:1..49535,gap(unk100),DS483561.1:1..51083,gap(unk100),...

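[Editorial illustration for Bug 2697 comment #9 above.] The remark that
defaults "are only evaluated once when the function is created" can be checked
with a small stand-alone sketch. The names LIMIT, eager and lazy are made up
for this illustration; they stand in for a module level setting such as
MAX_IIS_ITERATIONS and are not taken from Bio/MaxEntropy.py or the attached
patch.

```python
# Hypothetical module level setting, standing in for something like
# MAX_IIS_ITERATIONS in the thread; not taken from Bio.MaxEntropy.
LIMIT = 10


def eager(limit=LIMIT):
    # The default was bound to the value LIMIT had (10) when "def" ran.
    return limit


def lazy(limit=None):
    # None is a sentinel; the module level value is looked up at call time.
    if limit is None:
        limit = LIMIT
    return limit


LIMIT = 99  # change the module level setting after both functions exist

eager_result = eager()     # keeps the value captured at definition time
lazy_result = lazy()       # sees the updated module level value
explicit_result = lazy(5)  # an explicitly passed argument is used as given

print((eager_result, lazy_result, explicit_result))
```

The None sentinel costs a few extra lines per argument, but it keeps the
module level variables usable as a global override, which seems to be the
backwards compatibility concern raised in the comment.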
From bugzilla-daemon at portal.open-bio.org Fri Jan 30 14:35:41 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 30 Jan 2009 09:35:41 -0500
Subject: [Biopython-dev] [Bug 2745] Bio.GenBank.LocationParserError with a GenBank CON file
In-Reply-To:
Message-ID: <200901301435.n0UEZfpC007388@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2745

------- Comment #1 from bsouthey at gmail.com 2009-01-30 09:35 EST -------
Created an attachment (id=1213) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1213&action=view)
Example of a single GenBank CON record that fails

From bugzilla-daemon at portal.open-bio.org Fri Jan 30 14:47:36 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 30 Jan 2009 09:47:36 -0500
Subject: [Biopython-dev] [Bug 2738] Speed up GenBank parsing, in particular location parsing
In-Reply-To:
Message-ID: <200901301447.n0UEla5Q009025@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2738

------- Comment #10 from bsouthey at gmail.com 2009-01-30 09:47 EST -------
(In reply to comment #8)

Thanks, I was able to print out the references from the annotations and I also
did not see any differences.

I submitted a bug for the CON file.

I am a lot more comfortable with this patch now that a wide range of files
has been tested. But can you confirm that the example I provided is correctly
parsed?

Thanks
Bruce

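[Editorial illustration for the Bug 2738 discussion above.] The spurious diff
lines described in comment #8 come from memory addresses in default __repr__
strings. One way to compare two captured outputs without that noise is to mask
the addresses before diffing. The sketch below uses only the Python standard
library; the helper names, the "Ref" class name and the sample strings are
invented for illustration and are not real Biopython output.

```python
import difflib
import re

# Matches the address part of a default repr, e.g. "at 0x00B2F3A0".
ADDRESS = re.compile(r"at 0x[0-9A-Fa-f]+")


def mask_addresses(text):
    """Replace memory addresses so they no longer differ between runs."""
    return ADDRESS.sub("at 0x?", text)


def meaningful_diff(old_text, new_text):
    """Return unified diff lines computed after masking memory addresses."""
    old_lines = mask_addresses(old_text).splitlines()
    new_lines = mask_addresses(new_text).splitlines()
    return list(difflib.unified_diff(old_lines, new_lines, lineterm=""))


# Made-up sample output; the class name and addresses are illustrative only.
before = "ID: X\n/references=[<Ref instance at 0x00A1>, <Ref instance at 0x00A2>]"
after = "ID: X\n/references=[<Ref instance at 0x00B7>, <Ref instance at 0x00B8>]"

# The two texts differ only in addresses, so no diff lines remain.
remaining = meaningful_diff(before, after)
```

This only hides the address noise; it does not compare the contents of the
reference objects themselves, which still need a separate check such as
printing the references from the annotations.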
From bugzilla-daemon at portal.open-bio.org Fri Jan 30 15:11:56 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 30 Jan 2009 10:11:56 -0500
Subject: [Biopython-dev] [Bug 2745] Bio.GenBank.LocationParserError with a GenBank CON file
In-Reply-To:
Message-ID: <200901301511.n0UFBuEW012224@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2745

------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-30 10:11 EST -------
It's the "gap(unk100)" entries which are breaking the location parser in
Bruce's examples. Similarly even "gap()" entries of unknown length like this
will fail:

LOCUS       AH007743                7832 bp    DNA             CON 26-MAY-1999
DEFINITION  Gallus gallus ornithine transcarbamylase (OTC) gene, complete cds.
ACCESSION   AH007743
VERSION     AH007743.1  GI:4927367
KEYWORDS    .
SOURCE      chicken.
  ORGANISM  Gallus gallus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Archosauria;
            Aves; Neognathae; Galliformes; Phasianidae; Phasianinae; Gallus.
[....]
FEATURES             Location/Qualifiers
     source          1..7832
                     /organism="Gallus gallus"
                     /db_xref="taxon:9031"
                     /chromosome="1"
CONTIG      join(AF065630.1:1..1903,gap(),AF065631.1:1..435,gap(),
            AF065632.1:1..509,gap(),AF065633.1:1..722,gap(),AF065634.1:1..707,
            gap(),AF065635.1:1..836,gap(),AF065636.1:1..1614,gap(),
            AF065637.1:1..605,gap(),AF065638.1:1..501)
//

Example based on ftp://ftp.ncbi.nih.gov/genbank/README.genbank although this
does not describe the new terms. Older versions of the release notes do, e.g.
ftp://ftp.ncbi.nih.gov/genbank/release.notes/gb168.release.notes

========================= [start quote] =========================
3.4.15 CONTIG Format

  As an alternative to SEQUENCE, a CONTIG record can be present
following the ORIGIN record. A join() statement utilizing a syntax
similar to that of feature locations (see the Feature Table specification
mentioned in Section 3.4.12) provides the accession numbers and basepair
ranges of other GenBank sequences which contribute to a large-scale
biological object, such as a chromosome or complete genome. Here is
an example of the use of CONTIG :

CONTIG      join(AE003590.3:1..305900,AE003589.4:61..306076,
            AE003588.3:61..308447,AE003587.4:61..314549,AE003586.3:61..306696,
            AE003585.5:61..343161,AE003584.5:61..346734,AE003583.3:101..303641,

            [ lines removed for brevity ]

            AE003782.4:61..298116,AE003783.3:16..111706,AE002603.3:61..143856)

However, the CONTIG join() statement can also utilize a special operator
which is *not* part of the syntax for feature locations:

    gap()     : Gap of unknown length.

    gap(X)    : Gap with an estimated integer length of X bases.

                To be represented as a run of n's of length X
                in the sequence that can be constructed from
                the CONTIG line join() statement .

    gap(unkX) : Gap of unknown length, which is to be represented
                as an integer number (X) of n's in the sequence that
                can be constructed from the CONTIG line join()
                statement.

                The value of this gap operator consists of the
                literal characters 'unk', followed by an integer.

Here is an example of a CONTIG line join() that utilizes the gap() operator:

CONTIG      join(complement(AADE01002756.1:1..10234),gap(1206),
            AADE01006160.1:1..1963,gap(323),AADE01002525.1:1..11915,gap(1633),
            AADE01005641.1:1..2377)

The first and last elements of the join() statement may be a gap() operator.
But if so, then those gaps should represent telomeres, centromeres, etc.

Consecutive gap() operators are illegal.
========================= [end quote] =========================

Evidently Biopython doesn't cope with these CONTIG lines - but then they do
have a different syntax to the feature locations.

I never understood why the current code tries to parse the CONTIG string into
a SeqFeature object in the first place.
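
[Editorial illustration.] The gap() forms quoted from the release notes above
can be recognised with a small stand-alone tokenizer. This is only a sketch
under stated assumptions: split_contig and its tuple layout are invented for
illustration, it is neither the Biopython location parser nor any attached
patch, and it only handles the forms shown in the quote (plain references,
complement(...) references and the three gap() variants).

```python
import re

# gap(), gap(X) or gap(unkX); the forms listed in the quoted release notes.
GAP = re.compile(r"^gap\((?:(unk)?(\d+))?\)$")
# accession.version:start..end, optionally wrapped in complement(...).
REF = re.compile(r"^(complement\()?([^:(),]+):(\d+)\.\.(\d+)(\))?$")


def split_contig(contig):
    """Split a CONTIG join(...) string into ('ref', ...) / ('gap', ...) tuples.

    ('ref', accession, start, end, is_complement)
    ('gap', size or None, is_unknown_length)
    """
    text = "".join(contig.split())  # drop line breaks and padding
    if not (text.startswith("join(") and text.endswith(")")):
        raise ValueError("expected a join(...) statement")
    parts = []
    for item in text[5:-1].split(","):
        gap = GAP.match(item)
        if gap:
            if parts and parts[-1][0] == "gap":
                raise ValueError("consecutive gap() operators are illegal")
            unk, digits = gap.groups()
            size = int(digits) if digits else None
            # gap() and gap(unkX) are of unknown length; gap(X) is estimated.
            parts.append(("gap", size, unk is not None or digits is None))
            continue
        ref = REF.match(item)
        if ref is None or bool(ref.group(1)) != bool(ref.group(5)):
            raise ValueError("unrecognised element: %r" % item)
        parts.append(("ref", ref.group(2), int(ref.group(3)),
                      int(ref.group(4)), ref.group(1) is not None))
    return parts


# First part is from the release notes example; the tail is made up to
# exercise gap(unkX) and gap().
example = """join(complement(AADE01002756.1:1..10234),gap(1206),
            AADE01006160.1:1..1963,gap(unk100),X.1:1..5,gap(),Y.2:3..9)"""
parts = split_contig(example)
```

Splitting on commas is safe here only because none of the quoted element forms
contains a comma; nested operators would need a real parser.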

From bugzilla-daemon at portal.open-bio.org Fri Jan 30 15:36:52 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 30 Jan 2009 10:36:52 -0500
Subject: [Biopython-dev] [Bug 2681] BioSQL: record annotations enhancements
In-Reply-To:
Message-ID: <200901301536.n0UFaq5u015637@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2681

------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-30 10:36 EST -------
(In reply to comment #2)
> > 'contig' is ignored by loader because it's a SeqFeature object. Is there any
> > reason it couldnt be loaded and retrieved? (record is GenBank/NT_019265.gb)
>
> I couldn't even say off hand how the CONTIG line in that example would be
> parsed, let alone how it gets dealt with when loading into BioSQL.

Basically the CONTIG line looks rather a lot like a feature location,
typically the join of lots of (external) sequences. It makes some sense to
parse this into an object structure, which, given the way joins are handled
for features, led the original author to represent the CONTIG information as a
dummy feature with lots of sub features.

Given the CONTIG can also include gaps (of unknown length), this doesn't quite
fit the current SeqFeature location objects (see Bug 2745). If we extend the
location objects to cope with these gaps, then perhaps the CONTIG can stay as
a SeqFeature, in which case for BioSQL maybe we should record it in the
SeqFeature table. We'd have to invent a way to record these gap locations
though.

However, if we just stored the CONTIG line as a raw string, we could then
store it in BioSQL as just another bioentry qualifier (assuming it doesn't
overflow the text field limit).

I've checked how and where BioPerl stores the contig information using the
example Bruce used on Bug 2745, attachment 1213, and see that the CONTIG
information is stored in the bioentry_qualifier_value table under the term
"contig" under the ontology "Annotation Tags". They have retained the separate
lines, storing each as a separate entry with an increasing rank.

Thus for compatibility with BioSQL, it would make sense for the GenBank parser
to store the CONTIG line as a simple string (or list of strings), and not as a
SeqFeature (which is currently half broken anyway - see Bug 2745).

From bugzilla-daemon at portal.open-bio.org Fri Jan 30 16:20:18 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 30 Jan 2009 11:20:18 -0500
Subject: [Biopython-dev] [Bug 2745] Bio.GenBank.LocationParserError with a GenBank CON file
In-Reply-To:
Message-ID: <200901301620.n0UGKIXW024960@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2745

------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-30 11:20 EST -------
Created an attachment (id=1214) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1214&action=view)
Treat the CONTIG information as a string, not a SeqFeature

As outlined on Bug 2681 comment 8, there are good reasons to simply store the
CONTIG information as a string or perhaps a list of strings. This will make
our BioSQL bindings consistent with BioPerl.

More generally, I never really liked the idea of storing the CONTIG location
as a SeqFeature. I could understand in principle using a location-object, but
the current location objects do not deal with joins directly - which is why
you have to use a SeqFeature with subfeatures. In the long term, a new
location object might be a worthwhile change to both features and the contig.

For now, this patch simply stores the CONTIG information as one long string.
If we commit this, then Tests/output/test_GenBank will need updating too.

From bugzilla-daemon at portal.open-bio.org Fri Jan 30 16:54:20 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 30 Jan 2009 11:54:20 -0500
Subject: [Biopython-dev] [Bug 2723] Minor corrections to the installation document
In-Reply-To:
Message-ID: <200901301654.n0UGsK0D003024@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2723

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2009-01-30 11:54 EST -------
This is fixed now.