From mjldehoon at yahoo.com Sun Feb 1 03:38:03 2009 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Sun, 1 Feb 2009 00:38:03 -0800 (PST) Subject: [Biopython-dev] run_tests.py rewrite Message-ID: <104713.36194.qm@web62407.mail.re1.yahoo.com> Hi everybody, I just uploaded to CVS a rewritten version of Tests/run_tests.py. This new version automatically detects whether a test is a unittest-style test or a print-and-compare test. By doing so, the unittest-style tests no longer need to have a file containing the test output in Tests/output. For users, run_tests.py works essentially the same as before. As changing the test framework is tricky business, I'd like to ask you to be careful with the Biopython tests, in particular to make sure that there are no bugs in the testing framework that would let test failures go unnoticed. If no problems show up in the next few weeks, we can start removing the output files of unittest-style tests from Biopython, as they're no longer needed. --Michiel. From dalloliogm at gmail.com Mon Feb 2 05:03:00 2009 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Mon, 2 Feb 2009 11:03:00 +0100 Subject: [Biopython-dev] run_tests.py rewrite In-Reply-To: <104713.36194.qm@web62407.mail.re1.yahoo.com> References: <104713.36194.qm@web62407.mail.re1.yahoo.com> Message-ID: <5aa3b3570902020203y21f37037vbae65cc17c7ca563@mail.gmail.com> On Sun, Feb 1, 2009 at 9:38 AM, Michiel de Hoon wrote: > Hi everybody, > > I just uploaded to CVS a rewritten version of Tests/run_tests.py. This new version automatically detects whether a test is a unittest-style test or a print-and-compare test. By doing so, the unittest-style tests no longer need to have a file containing the test output in Tests/output. For users, run_tests.py works essentially the same as before. ok: - it seems it doesn't support doctest yet. - how this run_tests script is supposed to be called? Can you add this information in run_tests's docstring? 
If I run it from the biopython main directory (python Tests/run_tests.py) it gives me an error on test_AlignAce, but if I run it from within the Tests directory, it gives me an import error on test_CAPS.
- some tests have a docstring associated. It would be more useful if, along with the name, you printed these docs.
For example, instead of:
test_ACE ... ok
It would be nice to see:
test_ACE (tests the ACE module for .... which does ...) .... ok
Again, nose does this already.
- while you are at it, it would be nice to be able to define some global fixtures for all tests.
Something like setup_BioSQL, run only once and with a warning that it has been created.
nose already does that by using the @classmethod syntax - it's not very intuitive at first but it works.

There is something that has never been clear to me about biopython's doctest.
Are they supposed to be run by the developers only, or by the users who install biopython manually?
Some of the tests seem to be written to check whether biopython can run on the user's computer correctly, others are tests on the code.

>
> As changing the test framework is tricky business, I'd like to ask you to be careful with the Biopython tests, in particular to make sure that there are no bugs in the testing framework that would let test failures go unnoticed. If no problems show up in the next few weeks, we can start removing the output files of unittest-style tests from Biopython, as they're no longer needed.
>
> --Michiel.
> > > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > -- My blog on bioinformatics (now in English): http://bioinfoblog.it From biopython at maubp.freeserve.co.uk Mon Feb 2 05:29:13 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 2 Feb 2009 10:29:13 +0000 Subject: [Biopython-dev] run_tests.py rewrite In-Reply-To: <5aa3b3570902020203y21f37037vbae65cc17c7ca563@mail.gmail.com> References: <104713.36194.qm@web62407.mail.re1.yahoo.com> <5aa3b3570902020203y21f37037vbae65cc17c7ca563@mail.gmail.com> Message-ID: <320fb6e00902020229l3fcff3a0r80f46e12c446bf3e@mail.gmail.com> On Mon, Feb 2, 2009 at 10:03 AM, Giovanni Marco Dall'Olio wrote: > On Sun, Feb 1, 2009 at 9:38 AM, Michiel de Hoon wrote: >> Hi everybody, >> >> I just uploaded to CVS a rewritten version of Tests/run_tests.py. This new >> version automatically detects whether a test is a unittest-style test or a >> print-and-compare test. By doing so, the unittest-style tests no longer >> need to have a file containing the test output in Tests/output. For users, >> run_tests.py works essentially the same as before. > > ok: > - it seems it doesn't support doctest yet. I think Michiel has only switched over test_Cluster.py thus far. The doctests are currently run via test_docstrings.py which is still a print-and-compare test for now. > - how this run_tests script is supposed to be called? Can you add this > information in run_tests's docstring? I guess it could be more explicit. > If I run it from the biopython main directory (python > Tests/run_tests.py) it gives me an error on test_AlignAce, but if I > run it from within the Tests directory, it retunrs me an import error > on test_CAPS. Michiel hasn't changed this. 
From the Tests directory do:

python run_tests.py

Or, from the parent directory (typically between doing build and install):

python setup.py test

Trying to call run_tests.py from outside the Tests directory is not expected to work.

As explained in the docstring for run_tests.py (read the start of the file), if you want to run just some of the tests, you can list them like this:

python run_tests.py test_CAPS test_docstrings

You can include the py extension here optionally.

Could you show us the error with test_CAPS.py please, with details of your setup? This test is working for me.

> - some tests have some docstring associated. It would be more useful
> if, along with the name, you print these docs.
> For example, instead of:
> test_ACE ... ok
> It would be nice to see:
> test_ACE (tests the ACE module for .... which does ...) .... ok
> again, nose does this already.

I think it would be unnecessary text, of little interest to the typical user.

> - while you are at it, it would be nice to be able to define some
> global fixtures for all tests.
> Something like setup_BioSQL ran only once and with a warning that it
> has been created.
> nose already does that by using the @classmethod syntax - it's not
> very intuitive at first but it works.
>
> There is something that has never been clear to me about biopython's doctest.
> Are they supposed to be ran by the developers only, or by the users
> who install biopython manually?

Both - developers, and optionally/ideally anyone installing from source.

With CVS, they should also work for Windows users who used the installation setup exe, but this requires them to download the source code separately to get the unit tests.

> Some of the test seems to be written to check whether biopython can
> run on the user's computer correctly, others are tests on the code.

In a sense they are all tests on the code - some of the code by its nature is a wrapper for a command line tool, so this may or may not be present on the user's machine.

Peter

From mjldehoon at yahoo.com Mon Feb 2 05:48:33 2009
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Mon, 2 Feb 2009 02:48:33 -0800 (PST)
Subject: [Biopython-dev] run_tests.py rewrite
In-Reply-To: <320fb6e00902020229l3fcff3a0r80f46e12c446bf3e@mail.gmail.com>
Message-ID: <50228.73742.qm@web62405.mail.re1.yahoo.com>

Just for clarification:
The only purpose of the run_tests.py rewrite is to remove the requirement of an output file for unittest-based tests. While personally I am in favor of unittest-based tests, it is not my intention to remove support for the print-and-compare tests. I expect that for the most part, the test scripts themselves won't need to be changed. A few test scripts will need to be adjusted; test_Cluster.py was one of them. The main visible result of the new run_tests.py is that we will be able to remove the output files in Tests/output/test_* for the unittest-based tests.

As Peter wrote, the doctests are being run via test_docstrings.py, which is picked up by run_tests.py

--Michiel.

From biopython at maubp.freeserve.co.uk Mon Feb 2 06:16:30 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 2 Feb 2009 11:16:30 +0000
Subject: [Biopython-dev] run_tests.py rewrite
In-Reply-To: <50228.73742.qm@web62405.mail.re1.yahoo.com>
References: <320fb6e00902020229l3fcff3a0r80f46e12c446bf3e@mail.gmail.com> <50228.73742.qm@web62405.mail.re1.yahoo.com>
Message-ID: <320fb6e00902020316tfa20931r6fe39444cc958adc@mail.gmail.com>

On Mon, Feb 2, 2009 at 10:48 AM, Michiel de Hoon wrote:
>
> Just for clarification:
> The only purpose of the run_tests.py rewrite is to remove the requirement
> of an output file for unittest-based tests. While personally I am in favor of
> unittest-based tests, it is not my intention to remove support for the
> print-and-compare tests. I expect that for the most part, the test scripts
> themselves won't need to be changed. A few test scripts will need to be
> adjusted; test_Cluster.py was one of them. The main visible result of the
> new run_tests.py is that we will be able to remove the output files in
> Tests/output/test_* for the unittest-based tests.

I've found something that will need changing.
Consider the following output (based on what run_tests.py is now doing; this was tested on Mac OS X, Python 2.5.2):

>>> import unittest
>>> unittest.TestLoader().loadTestsFromModule(__import__("test_Cluster")).countTestCases()
7
>>> unittest.TestLoader().loadTestsFromModule(__import__("test_Motif")).countTestCases()
0
>>> unittest.TestLoader().loadTestsFromModule(__import__("test_Phd")).countTestCases()
0

We need to override testMethodPrefix to "t" instead of the default of "test" in order to detect these (and others like them).

>>> test_loader = unittest.TestLoader()
>>> test_loader.testMethodPrefix="t"
>>> test_loader.loadTestsFromModule(__import__("test_Phd")).countTestCases()
2
>>> test_loader.loadTestsFromModule(__import__("test_Motif")).countTestCases()
8

We could just have run_tests.py check using either prefix, or we can standardise on one. I think we have more unit tests using the "t" prefix than the "test" prefix - so it would be simpler to standardise on using "t_*", although on the other hand, using "test_*" fits with the default. Which do you prefer Michiel?

Peter

From n.j.loman at bham.ac.uk Mon Feb 2 06:54:50 2009
From: n.j.loman at bham.ac.uk (Nick Loman)
Date: Mon, 02 Feb 2009 11:54:50 +0000
Subject: [Biopython-dev] Problems importing GenBank Files with complex LOCATION tags
Message-ID: <4986DF0A.1040103@bham.ac.uk>

Hi there,

I'm attempting to import the whole of RefSeq into a BioSQL schema using the BioPython loader. However, I am encountering problems with items in the CON division, such as NW_002063152. I am using stock Biopython 1.49 install.

The problem occurs when parsing complex CONTIG location tags, such as the following (spacing adjusted for readability):

CONTIG      join(NZ_ABJI01000250.1:1..6235,gap(unk100),
            NZ_ABJI01000251.1:1..2827,gap(1420),NZ_ABJI01000252.1:1..1802,
            gap(unk100),NZ_ABJI01000253.1:1..2460,gap(unk100),
            NZ_ABJI01000254.1:1..12092,gap(639),NZ_ABJI01000255.1:1..1192,
            gap(unk100),NZ_ABJI01000256.1:1..5498,gap(unk100),
            NZ_ABJI01000257.1:1..20442,gap(unk100),NZ_ABJI01000258.1:1..2364,
            gap(511),NZ_ABJI01000259.1:1..17405,gap(unk100),
            NZ_ABJI01000260.1:1..2462,gap(570),NZ_ABJI01000261.1:1..3348,
            gap(410),NZ_ABJI01000262.1:1..815,gap(196),
            NZ_ABJI01000263.1:1..589)

I have worked around the problem by rewriting during my import to produce a blank ORIGIN definition, which at least gets the sequence features imported.

I realise complex location parsing has been discussed before on this list - would the authors expect this to parse correctly, or is it out of the scope of the current code?

Best regards,

Nick.

From bsouthey at gmail.com Mon Feb 2 09:39:03 2009
From: bsouthey at gmail.com (Bruce Southey)
Date: Mon, 02 Feb 2009 08:39:03 -0600
Subject: [Biopython-dev] Problems importing GenBank Files with complex LOCATION tags
In-Reply-To: <4986DF0A.1040103@bham.ac.uk>
References: <4986DF0A.1040103@bham.ac.uk>
Message-ID: <49870587.2080009@gmail.com>

Hi,
I guess this pertains to Bugs 2681 and 2745. Please see Peter's comments and suggested patch to Bug 2745.
http://bugzilla.open-bio.org/show_bug.cgi?id=2681
http://bugzilla.open-bio.org/show_bug.cgi?id=2745

Any comments or thoughts on these would be appreciated!

Thanks
Bruce

From bugzilla-daemon at portal.open-bio.org Mon Feb 2 11:53:28 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 2 Feb 2009 11:53:28 -0500
Subject: [Biopython-dev] [Bug 2745] Bio.GenBank.LocationParserError with a GenBank CON file
In-Reply-To:
Message-ID: <200902021653.n12GrS1a028869@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2745

------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-02 11:53 EST -------
(In reply to comment #1)
> Created an attachment (id=1213)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=1213&action=view) [details]
> Example of a single GenBank CON record that fails

For interest, and as a possible workaround, note that you can download this GenBank file from Entrez WITH the sequence. First of all, try this:

>>> from Bio import Entrez
>>> Entrez.email = "A.N.Other@example.com"
>>> data = Entrez.efetch("nucleotide",id="FA000001",rettype="genbank",retmode="text").read()
>>> out_handle = open("FA000001.gbk","w")
>>> out_handle.write(data)
>>> out_handle.close()

This gives the CONTIG line without the actual nucleotides (as in Bruce's attachment, which I assume came from the NCBI's FTP site).

However, from reading the Entrez documentation, we can get the nucleotides too by asking for "gbwithparts" instead of "gb" (or its equivalent, "genbank"). See http://www.ncbi.nlm.nih.gov/entrez/query/static/efetchseq_help.html#SequenceDatabases

i.e.

>>> data = Entrez.efetch("nucleotide",id="FA000001",rettype="gbwithparts",retmode="text").read()
>>> out_handle = open("FA000001.gbwithparts.gbk","w")
>>> out_handle.write(data)
>>> out_handle.close()

I was getting some "Service unavailable!"
or proxy errors earlier (which Bio.Entrez wasn't catching - I've updated it in CVS), but this does work giving a 12.8 MB file with the full sequence (with plenty of sections with an N). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Feb 3 05:03:04 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 3 Feb 2009 05:03:04 -0500 Subject: [Biopython-dev] [Bug 2748] New: test_GAQueens's documentation refers to an unknown script 'place_queens.py' Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2748 Summary: test_GAQueens's documentation refers to an unknown script 'place_queens.py' Product: Biopython Version: Not Applicable Platform: PC OS/Version: Linux Status: NEW Severity: trivial Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: dalloliogm at gmail.com The test_GAQueens docstring refers to a script called 'place_queens.py', and it is not clear what it is: 12 python place_queens.py -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 

From dalloliogm at gmail.com Tue Feb 3 05:18:01 2009
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Tue, 3 Feb 2009 11:18:01 +0100
Subject: [Biopython-dev] run_tests.py rewrite
In-Reply-To: <320fb6e00902020229l3fcff3a0r80f46e12c446bf3e@mail.gmail.com>
References: <104713.36194.qm@web62407.mail.re1.yahoo.com> <5aa3b3570902020203y21f37037vbae65cc17c7ca563@mail.gmail.com> <320fb6e00902020229l3fcff3a0r80f46e12c446bf3e@mail.gmail.com>
Message-ID: <5aa3b3570902030218x78df37ddic6488638a7937712@mail.gmail.com>

On Mon, Feb 2, 2009 at 11:29 AM, Peter wrote:
> On Mon, Feb 2, 2009 at 10:03 AM, Giovanni Marco Dall'Olio
> wrote:
>> On Sun, Feb 1, 2009 at 9:38 AM, Michiel de Hoon wrote:
>>> Hi everybody,
>>>
>>> I just uploaded to CVS a rewritten version of Tests/run_tests.py. This new
>>> version automatically detects whether a test is a unittest-style test or a
>>> print-and-compare test. By doing so, the unittest-style tests no longer
>>> need to have a file containing the test output in Tests/output. For users,
>>> run_tests.py works essentially the same as before.
>>
>> ok:
>> - it seems it doesn't support doctest yet.
>
> I think Michiel has only switched over test_Cluster.py thus far. The
> doctests are currently run via test_docstrings.py which is still a
> print-and-compare test for now.

ah! I see.
However, this way, test_docstrings will be difficult to maintain in the future.
A better solution would be to have run_tests.py go through all biopython's modules, and then execute every doctest it encounters.
You can do this with doctest.DocTestFinder (have a look at nose's code, which does it already:
- http://code.google.com/p/python-nose/source/browse/trunk/nose/plugins/doctests.py)

> Could you show us the error with test_CAPS.py please, with details of
> your setup. This test is working for me.

sorry.. it works fine if I run it from within the Tests dir.

>> - some tests have some docstring associated.
It would be more useful >> if, along with the name, you print these docs. >> For example, instead of: >> test_ACE ... ok >> It would be nice to see: >> test_ACE (tests the ACE module for .... which does ...) .... ok >> again, nose does this already. > > I think it would be unnecessary text, of little interest to the typical user. It would be useful to make sure that every test is documented. Most of the tests in biopython are not: for example, can you tell which is the difference between test_Fasta.py and test_Fasta2.py? Moreover, why the typical user should be running biopython's tests? >> - while you are at it, it would be nice to be able to define some >> global fixtures for all tests. >> Something like setup_BioSQL ran only once and with a warning that it >> has been created. >> nose already does that by using the @classmethod syntax - it's not >> very intuitive at first but it works. What about having support to global fixtures? For example, many test scripts begin in the same way: they 'import numpy', check for python's version, etc.. All of this could be moved to a global fixture and then executed only once for all the tests. All the Bio.Seq files could open the sequence files only once, therefore it will be easier to write more complex tests. The Bio.BioSQL modules could create a database only once, reducing memory usage. 
> Peter > -- My blog on bioinformatics (now in English): http://bioinfoblog.it From bugzilla-daemon at portal.open-bio.org Tue Feb 3 05:30:50 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 3 Feb 2009 05:30:50 -0500 Subject: [Biopython-dev] [Bug 2748] test_GAQueens's documentation refers to an unknown script 'place_queens.py' In-Reply-To: Message-ID: <200902031030.n13AUolG010593@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2748 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED Version|Not Applicable |1.49 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-03 05:30 EST ------- Fixed in CVS (replacing references to place_queens.py with test_GAQueens.py). I believe test_GAQueens.py used to be called place_queens.py before it was re-used as a test case. Note that when it is run from the test suite, 5 queens are used. Thanks Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Tue Feb 3 05:35:09 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 3 Feb 2009 10:35:09 +0000 Subject: [Biopython-dev] run_tests.py rewrite In-Reply-To: <5aa3b3570902030218x78df37ddic6488638a7937712@mail.gmail.com> References: <104713.36194.qm@web62407.mail.re1.yahoo.com> <5aa3b3570902020203y21f37037vbae65cc17c7ca563@mail.gmail.com> <320fb6e00902020229l3fcff3a0r80f46e12c446bf3e@mail.gmail.com> <5aa3b3570902030218x78df37ddic6488638a7937712@mail.gmail.com> Message-ID: <320fb6e00902030235r302ec193g5975d20e824f352e@mail.gmail.com> >> I think Michiel has only switched over test_Cluster.py thus far. 
The
>> doctests are currently run via test_docstrings.py which is still a
>> print-and-compare test for now.
>
> ah! I see.

I was wrong - as Michiel clarified in a later comment, run_tests.py should have been finding all the unittest based tests (but right now it isn't). As in my earlier email, some of our unittest cases use a prefix of "t" and others use "test", meaning only some of the unittest test cases are currently being detected. Once this is fixed, then test_docstrings should work too.

>> Could you show us the error with test_CAPS.py please, with details of
>> your setup. This test is working for me.
>
> sorry.. it works fine if I run it from within the Tests dir.

Good. Thanks.

Peter

From mjldehoon at yahoo.com Tue Feb 3 06:38:06 2009
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Tue, 3 Feb 2009 03:38:06 -0800 (PST)
Subject: [Biopython-dev] run_tests.py rewrite
In-Reply-To: <320fb6e00902020316tfa20931r6fe39444cc958adc@mail.gmail.com>
Message-ID: <362230.6662.qm@web62401.mail.re1.yahoo.com>

Good catch.

> We need to override testMethodPrefix to "t"
> instead of the default of "test" in order
> to detect these (and others like them).
...
> We could just have run_tests.py check using either prefix,
> or we can standardise on one. I think we have more unit
> tests using the "t" prefix than the "test" prefix - so it
> would be simpler to standardise on using "t_*", although
> on the other hand, using "test_*" fits with
> the default. Which do you prefer Michiel?

I prefer sticking to the default ... changing the method names from t_* to test_* needs to be done only once, whereas if we continue to use t_* we'll have to remind ourselves of that for all future tests that will be written. So I've changed all the t_* method names to test_*. These tests should run now. Thanks again for noticing this bug.

--Michiel.

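[Editor's sketch, not part of the original thread.] The prefix issue discussed above can be reproduced with the standard library alone. The two test classes and the count_tests helper below are hypothetical illustrations, not Biopython code, and the sketch assumes a current Python 3 rather than the Python 2.5 used in the thread. It uses the prefix "t_" rather than a bare "t", on the assumption that a bare "t" would also match inherited callables such as tearDown.

```python
import unittest


class LegacyStyleTests(unittest.TestCase):
    """Hypothetical test case using the old "t_" method prefix."""

    def t_addition(self):
        self.assertEqual(1 + 1, 2)


class DefaultStyleTests(unittest.TestCase):
    """Hypothetical test case using the default "test" prefix."""

    def test_addition(self):
        self.assertEqual(1 + 1, 2)


def count_tests(case, prefix="test"):
    """Count the test methods a TestLoader finds in `case` for a given prefix."""
    loader = unittest.TestLoader()
    loader.testMethodPrefix = prefix
    return loader.loadTestsFromTestCase(case).countTestCases()


if __name__ == "__main__":
    print(count_tests(LegacyStyleTests))               # default prefix
    print(count_tests(LegacyStyleTests, prefix="t_"))  # legacy prefix
    print(count_tests(DefaultStyleTests))              # default prefix
```

With the default prefix the loader should find nothing in the legacy-style class, which is the silent failure mode the thread is worried about; renaming the methods to test_* avoids needing a custom prefix at all.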
From dalloliogm at gmail.com Tue Feb 3 06:46:54 2009 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Tue, 3 Feb 2009 12:46:54 +0100 Subject: [Biopython-dev] run_tests.py rewrite In-Reply-To: <320fb6e00902030235r302ec193g5975d20e824f352e@mail.gmail.com> References: <104713.36194.qm@web62407.mail.re1.yahoo.com> <5aa3b3570902020203y21f37037vbae65cc17c7ca563@mail.gmail.com> <320fb6e00902020229l3fcff3a0r80f46e12c446bf3e@mail.gmail.com> <5aa3b3570902030218x78df37ddic6488638a7937712@mail.gmail.com> <320fb6e00902030235r302ec193g5975d20e824f352e@mail.gmail.com> Message-ID: <5aa3b3570902030346r69d1677djc22550a1cc68b7c8@mail.gmail.com> ufff I am sorry but the more I think about it, the more it seems a nonsense to me.. Why are you writing a new test-discovery framework for biopython, when there are many already available that work fine and better? Isn't it a waste of time, really? I am not criticizing you - but speaking from a purely technical point of view, I really don't understand. If you are worried that using nose will add a new prerequisite to biopython (which is not true, by the way), you can easily include the nose executable within the test dir, as I think many other projects already do; Honestly, I have the feeling that you didn't even had a look at all the links I posted in the old discussion on nose, neither you have tried it, and that's so bad. You didn't discuss about the pros or cons of nose, you just kept saying 'it would add a prerequisite to biopython' (which is not true, again), and started writing your own new test discovery framework. With nose, you could have a good testing infrastructure and take advantage of things like global fixtures, automatic formatting of the output, integration with profilers, and a lot of things more. 
It seems nonsensical to me, because with biopython you provide source code that you make available to all bioinformaticians, with the idea that reuse of code is good; but then, you don't want to use code written by someone else.
I have seen many bioinformaticians tell me that they don't use biopython because they don't have the time to study it and they don't know how it works. I really believe that this is terrible, making the whole bioinformatics field a mess.

Cheers :)

--
My blog on bioinformatics (now in English): http://bioinfoblog.it

From biopython at maubp.freeserve.co.uk Tue Feb 3 06:55:40 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 3 Feb 2009 11:55:40 +0000
Subject: [Biopython-dev] run_tests.py rewrite
In-Reply-To: <5aa3b3570902030346r69d1677djc22550a1cc68b7c8@mail.gmail.com>
References: <104713.36194.qm@web62407.mail.re1.yahoo.com> <5aa3b3570902020203y21f37037vbae65cc17c7ca563@mail.gmail.com> <320fb6e00902020229l3fcff3a0r80f46e12c446bf3e@mail.gmail.com> <5aa3b3570902030218x78df37ddic6488638a7937712@mail.gmail.com> <320fb6e00902030235r302ec193g5975d20e824f352e@mail.gmail.com> <5aa3b3570902030346r69d1677djc22550a1cc68b7c8@mail.gmail.com>
Message-ID: <320fb6e00902030355r373be6d2taf6d1926f2f83757@mail.gmail.com>

On Tue, Feb 3, 2009 at 11:46 AM, Giovanni Marco Dall'Olio wrote:
>
> ufff I am sorry but the more I think about it, the more it seems a
> nonsense to me..
> Why are you writing a new test-discovery framework for biopython, when
> there are many already available that work fine and better?
> Isn't it a waste of time, really? I am not criticizing you - but
> speaking from a purely technical point of view, I really don't
> understand.

We're NOT writing a new test-discovery framework - in this recent change we're reusing part of the existing unittest framework included with python.

> If you are worried that using nose will add a new prerequisite to
> biopython (which is not true, by the way), you can easily include the
> nose executable within the test dir, as I think many other projects
> already do;

Using nose would be another prerequisite for anyone running the tests (although as you point out, it may be possible to include it with Biopython).

> Honestly, I have the feeling that you didn't even had a look at all
> the links I posted in the old discussion on nose, neither you have
> tried it, and that's so bad.
You didn't discuss about the pros or cons > of nose, you just kept saying 'it would add a prerequisite to > biopython' (which is not true, again), and started writing your own > new test discovery framework. We didn't just start writing our own framework (which I agree would be a waste of time). We already had a simple framework, and with Michiel's recent changes it makes more use of the python unittest infrastructure. Peter From mjldehoon at yahoo.com Tue Feb 3 06:52:05 2009 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Tue, 3 Feb 2009 03:52:05 -0800 (PST) Subject: [Biopython-dev] run_tests.py rewrite In-Reply-To: <5aa3b3570902030218x78df37ddic6488638a7937712@mail.gmail.com> Message-ID: <752614.16176.qm@web62402.mail.re1.yahoo.com> > However, this way, test_docstring will be difficult to > mantain in the future. > A better solution would be to have run_test.py go through > all biopython's modules, and then execute every doctest it > encounters. > You can do this with doctest.DocTestFinder (have a look at > nose's code, which does it already: Can doctest.DocTestFinder handle missing external dependencies? For example, if a user installed Biopython without NumPy, then the NumPy-dependent modules should be skipped and not flagged as errors. > Moreover, why the typical user should be running > biopython's tests? To make sure that it works. Biopython interacts with and therefore depends more on 3rd party software, web servers, and file formats than most other Python modules. Things are more likely to break than for example for a more self-contained library such as NumPy. I always run the Biopython tests, and I would advise every user to do so too. In addition, the tests can function as example scripts showing how to use Biopython. It is important that all users can run those scripts. > What about having support to global fixtures? > For example, many test scripts begin in the same way: they > 'import > numpy', check for python's version, etc.. 
All of > this could be moved > to a global fixture and then executed only once for all the > tests. Hmm... currently the Biopython tests can be written essentially independently of each other, without knowing much about the testing overall framework. I think that that makes it easier for new users/developers to add tests. I think we should avoid the situation that somebody first has to study Biopython's testing framework to be able to add a test. --Michiel. From mjldehoon at yahoo.com Tue Feb 3 07:21:05 2009 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Tue, 3 Feb 2009 04:21:05 -0800 (PST) Subject: [Biopython-dev] test_Ace, test_Nexus, test_Phd Message-ID: <240911.28388.qm@web62402.mail.re1.yahoo.com> These three tests currently are written as a combination of a unittest-based test and a print-and-compare test. That is, they contain classes deriving from unittest.TestCase, but then print out stuff that should get compared to the output file. However, run_tests.py assumes that they are true unittest-style tests, so the comparison is never done. Does anybody mind if I convert these three to pure print-and-compare or pure unittest-style tests? test_Ace.py and test_Nexus.py produce lots of output, so I'm tempted to go with a print-and-compare test there; test_Phd.py might work well as a unittest-style test. --Michiel. From mjldehoon at yahoo.com Tue Feb 3 07:26:02 2009 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Tue, 3 Feb 2009 04:26:02 -0800 (PST) Subject: [Biopython-dev] run_tests.py rewrite In-Reply-To: <5aa3b3570902030346r69d1677djc22550a1cc68b7c8@mail.gmail.com> Message-ID: <425937.30005.qm@web62402.mail.re1.yahoo.com> Maybe it was a mistake to call this a rewrite ... basically all I'm doing is making some changes in run_tests.py so that it will distinguish between unittest-style tests and print-and-compare tests, and cleaning up some code while I'm at it. 
This will allow us to remove the trivial output files for the unittest-style tests, which were a real annoyance because they had to be updated whenever a new test was added to an existing test script. And since the output files did not contain any real information, people tended to forget that. Maybe nose can do the same as unittest, but unittest comes with Python and nose does not, so as long as unittest does the job, I see no reason to change to nose. --Michiel. --- On Tue, 2/3/09, Giovanni Marco Dall'Olio wrote: > From: Giovanni Marco Dall'Olio > Subject: Re: [Biopython-dev] run_tests.py rewrite > To: "Peter" > Cc: biopython-dev at biopython.org > Date: Tuesday, February 3, 2009, 6:46 AM > ufff I am sorry but the more I think about it, the more it > seems a > nonsense to me.. > Why are you writing a new test-discovery framework for > biopython, when > there are many already available that work fine and better? > Isn't it a waste of time, really? I am not criticizing > you - but > speaking from a purely technical point of view, I really > don't > understand. > > If you are worried that using nose will add a new > prerequisite to > biopython (which is not true, by the way), you can easily > include the > nose executable within the test dir, as I think many other > projects > already do; > > Honestly, I have the feeling that you didn't even had a > look at all > the links I posted in the old discussion on nose, neither > you have > tried it, and that's so bad. You didn't discuss > about the pros or cons > of nose, you just kept saying 'it would add a > prerequisite to > biopython' (which is not true, again), and started > writing your own > new test discovery framework. > With nose, you could have a good testing infrastructure and > take > advantage of things like global fixtures, automatic > formatting of the > output, integration with profilers, and a lot of things > more. 
> > It seems a nonsense to me, because with biopython you > provide source > code that you make available to all the bioinformaticians, > with the > idea that reuse of the code is good; but then, you > don't want to use > the code written by someone else. > I have seen many bioinformatician telling me that they > don't use > biopython because they don't have the time to study it > and they don't > know how it works. I really believe that this is terrible, > making the > whole bioinformatics field a mess. > > Cheers :) > > On Tue, Feb 3, 2009 at 11:35 AM, Peter > wrote: > >>> I think Michiel has only switched over > test_Cluster.py thus far. The > >>> doctests are currently run via > test_docstrings.py which is still a > >>> print-and-compare test for now. > >> > >> ah! I see. > > > > I was wrong - as Michiel clarified in a later comment, > run_tests.py > > should have been finding all the unittest based tests > (but right now > > it isn't). As in my earlier email, some of our > unittest cases use a > > prefix of "t" and others use > "test" meaning only some of the unittest > > test cases are currently being detected. One this is > fixed, then > > test_docstring should work too. > > > >>> Could you show us the error with test_CAPS.py > please, with details of > >>> your setup. This test is working for me. > >> > >> sorry.. it works fine if I run it from within the > Tests dir. > > > > Good. Thanks. 
> > > > Peter > > > > > > -- > > My blog on bioinformatics (now in English): > http://bioinfoblog.it > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From biopython at maubp.freeserve.co.uk Tue Feb 3 08:12:51 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 3 Feb 2009 13:12:51 +0000 Subject: [Biopython-dev] run_tests.py rewrite In-Reply-To: <425937.30005.qm@web62402.mail.re1.yahoo.com> References: <5aa3b3570902030346r69d1677djc22550a1cc68b7c8@mail.gmail.com> <425937.30005.qm@web62402.mail.re1.yahoo.com> Message-ID: <320fb6e00902030512y36a7118ke81a8a890caf1437@mail.gmail.com> On Tue, Feb 3, 2009 at 12:26 PM, Michiel de Hoon wrote: > Maybe it was a mistake to call this a rewrite ... With hindsight, it did give the impression of something bigger happening. Oh well. > ... basically all I'm doing is making some changes in run_tests.py so that it will > distinguish between unittest-style tests and print-and-compare tests, and > cleaning up some code while I'm at it. In terms of cleaning up the code, something we can probably now remove from the print-and-compare handler is the special case of modules called via a run_tests method. I'd like to suggest removing this bit (lines 167 to 171 at the moment):

    try:
        cur_test.run_tests([])
    except AttributeError:
        pass

[As an aside, using a hasattr(module,"run_tests") would be safer in case the test itself raised an AttributeError. If we remove this code it doesn't matter.] Currently I think only test_GAQueens.py requires this "magic" which can be solved by making it explicitly default to running with five queens. Right now it is not at all clear from looking at this example how this default happens if run via run_tests.py but not when running test_GAQueens.py on its own.
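To illustrate the aside about hasattr, here is a minimal sketch of the difference between the two approaches. It is not the code from run_tests.py: the helper names (call_with_try, call_with_hasattr) and the two stand-in modules are hypothetical and exist only for this example.

```python
import types


def call_with_try(module):
    """Call module.run_tests([]) and ignore AttributeError, as in the snippet above."""
    try:
        module.run_tests([])
    except AttributeError:
        # Also reached when run_tests exists but fails internally with an
        # AttributeError, so a genuine bug in the test is silently ignored.
        return "skipped"
    return "ran"


def call_with_hasattr(module):
    """Call module.run_tests([]) only if the module defines it."""
    if hasattr(module, "run_tests"):
        module.run_tests([])  # errors raised inside the test propagate
        return "ran"
    return "skipped"


def _broken_run_tests(argv):
    # Hypothetical test body containing a bug: None has no such attribute.
    return None.no_such_attribute


# Two stand-in test modules: one without run_tests, one whose run_tests is buggy.
plain_module = types.ModuleType("test_plain")
buggy_module = types.ModuleType("test_buggy")
buggy_module.run_tests = _broken_run_tests

if __name__ == "__main__":
    print(call_with_try(plain_module))      # module has no run_tests
    print(call_with_try(buggy_module))      # the bug is hidden
    print(call_with_hasattr(plain_module))  # module has no run_tests
    try:
        call_with_hasattr(buggy_module)
    except AttributeError as err:
        print("bug surfaced:", err)
```

With the try/except version both modules look the same to the caller, whereas the hasattr version lets the error from the buggy test reach the test runner.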
The only other print-and-compare module I found with a run_tests function is test_NNExclusiveOr.py, but here it makes no difference as the same code gets called via the __main__ trick. Peter From biopython at maubp.freeserve.co.uk Tue Feb 3 08:26:15 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 3 Feb 2009 13:26:15 +0000 Subject: [Biopython-dev] test_Ace, test_Nexus, test_Phd In-Reply-To: <240911.28388.qm@web62402.mail.re1.yahoo.com> References: <240911.28388.qm@web62402.mail.re1.yahoo.com> Message-ID: <320fb6e00902030526i6c77c327ue346a4a14c545c93@mail.gmail.com> On Tue, Feb 3, 2009 at 12:21 PM, Michiel de Hoon wrote: > These three tests currently are written as a combination of a unittest-based > test and a print-and-compare test. That is, they contain classes deriving from > unittest.TestCase, but then print out stuff that should get compared to the > output file. However, run_tests.py assumes that they are true unittest-style > tests, so the comparison is never done. That makes sense - it's good there are only three of them! > Does anybody mind if I convert these three to pure print-and-compare or > pure unittest-style tests? test_Ace.py and test_Nexus.py produce lots of > output, so I'm tempted to go with a print-and-compare test there; > test_Phd.py might work well as a unittest-style test. That sounds sensible - unless Frank or Cymon want to help out, carry on. [I've recently fixed a couple of tear-down problems in test_PopGen_FDist.py and test_PopGen_SimCoal_nodepend.py to do with trying to remove files/directories which may not have been created if the test failed.]
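The kind of tear-down problem mentioned above can be sketched as follows. This is only an illustration under assumptions, not the actual fix made in the PopGen tests: the class OutputCleanupTest, its file name result.txt and the helper run_cleanup_demo are hypothetical.

```python
import os
import shutil
import tempfile
import unittest


class OutputCleanupTest(unittest.TestCase):
    """Hypothetical test case whose output file may never be written."""

    def setUp(self):
        self.work_dir = tempfile.mkdtemp()
        self.output_file = os.path.join(self.work_dir, "result.txt")

    def tearDown(self):
        # Only remove the file if the test got far enough to create it.
        if os.path.isfile(self.output_file):
            os.remove(self.output_file)
        # Remove the working directory if it is still there.
        if os.path.isdir(self.work_dir):
            shutil.rmtree(self.work_dir)

    def test_without_output(self):
        # Stands in for a test that stops before writing its output file.
        self.assertFalse(os.path.exists(self.output_file))

    def test_with_output(self):
        with open(self.output_file, "w") as handle:
            handle.write("done\n")
        self.assertTrue(os.path.isfile(self.output_file))


def run_cleanup_demo():
    """Run the test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(OutputCleanupTest)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

Because tearDown checks for existence first, a test that stops early is reported on its own merits instead of being followed by a second, misleading error from the clean-up code.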
Peter From biopython at maubp.freeserve.co.uk Tue Feb 3 09:02:25 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 3 Feb 2009 14:02:25 +0000 Subject: [Biopython-dev] run_tests.py rewrite In-Reply-To: <320fb6e00902030512y36a7118ke81a8a890caf1437@mail.gmail.com> References: <5aa3b3570902030346r69d1677djc22550a1cc68b7c8@mail.gmail.com> <425937.30005.qm@web62402.mail.re1.yahoo.com> <320fb6e00902030512y36a7118ke81a8a890caf1437@mail.gmail.com> Message-ID: <320fb6e00902030602p3afd8a82scd5ed5adffda65eb@mail.gmail.com> Michiel, I've noticed that for print-and-compare tests we can get unexpected errors from the line: module = __import__(name) For example, if there is an IOError in test_SeqIO_online.py this does not get caught - we only try to catch a MissingExternalDependencyError. Perhaps we should also catch any generic exception and report that test as a failure. Otherwise, the run_test.py file terminates prematurely. Would you like to look into this, or should I? Peter From bsouthey at gmail.com Tue Feb 3 10:15:37 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Tue, 03 Feb 2009 09:15:37 -0600 Subject: [Biopython-dev] run_tests.py rewrite In-Reply-To: <320fb6e00902030355r373be6d2taf6d1926f2f83757@mail.gmail.com> References: <104713.36194.qm@web62407.mail.re1.yahoo.com> <5aa3b3570902020203y21f37037vbae65cc17c7ca563@mail.gmail.com> <320fb6e00902020229l3fcff3a0r80f46e12c446bf3e@mail.gmail.com> <5aa3b3570902030218x78df37ddic6488638a7937712@mail.gmail.com> <320fb6e00902030235r302ec193g5975d20e824f352e@mail.gmail.com> <5aa3b3570902030346r69d1677djc22550a1cc68b7c8@mail.gmail.com> <320fb6e00902030355r373be6d2taf6d1926f2f83757@mail.gmail.com> Message-ID: <49885F99.30701@gmail.com> Hi, I do get your point and I do agree with it. In part I see this as a necessary step to clean up the current tests that would permit a smoother change to a different testing framework if or when necessary. It is a different question whether or not to do such a change. 
Technically nose is required by Numpy 1.2+ (but only for testing) so it is not really an extra dependency for Biopython (unless Biopython is split into two components - with and without Numpy). But I do not see a real advantage for a new testing framework in the current code base without a major effort to change everything at once (I would at least act as a tester for any new framework). Perhaps it would make better sense to do that when porting Biopython to Python 3 because the tests will need to be examined and perhaps rewritten. Bruce Peter wrote: > On Tue, Feb 3, 2009 at 11:46 AM, Giovanni Marco Dall'Olio > wrote: > >> ufff I am sorry but the more I think about it, the more it seems a >> nonsense to me.. >> Why are you writing a new test-discovery framework for biopython, when >> there are many already available that work fine and better? >> Isn't it a waste of time, really? I am not criticizing you - but >> speaking from a purely technical point of view, I really don't >> understand. >> > > We're NOT writing a new test-discovery framework - in this recent > change we're reusing part of the existing unittest framework included > with python. > > >> If you are worried that using nose will add a new prerequisite to >> biopython (which is not true, by the way), you can easily include the >> nose executable within the test dir, as I think many other projects >> already do; >> > > Using nose would be another prerequisite for anyone running the tests > (although as you point out, it may be possible to include it with > Biopython). > > >> Honestly, I have the feeling that you didn't even had a look at all >> the links I posted in the old discussion on nose, neither you have >> tried it, and that's so bad. You didn't discuss about the pros or cons >> of nose, you just kept saying 'it would add a prerequisite to >> biopython' (which is not true, again), and started writing your own >> new test discovery framework. 
>> > > We didn't just start writing our own framework (which I agree would be > a waste of time). We already had a simple framework, and with > Michiel's recent changes it makes more use of the python unittest > infrastructure. > > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From dalloliogm at gmail.com Tue Feb 3 19:13:43 2009 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Wed, 4 Feb 2009 01:13:43 +0100 Subject: [Biopython-dev] run_tests.py rewrite In-Reply-To: <425937.30005.qm@web62402.mail.re1.yahoo.com> References: <5aa3b3570902030346r69d1677djc22550a1cc68b7c8@mail.gmail.com> <425937.30005.qm@web62402.mail.re1.yahoo.com> Message-ID: <5aa3b3570902031613l6d69908w72331a532ca3c095@mail.gmail.com> On 2/3/09, Michiel de Hoon wrote: > Maybe it was a mistake to call this a rewrite ... basically all I'm doing is making some changes in run_tests.py so that it will distinguish between unittest-style tests and print-and-compare tests, and cleaning up some code while I'm at it. This will allow us to remove the trivial output files for the unittest-style tests, which were a real annoyance because they had to be updated whenever a new test was added to an existing test script. And since the output files did not contain any real information, people tended to forget that. > Maybe nose can do the same as unittest, but unittest comes with Python and nose does not, so as long as unittest does the job, I see no reason to change to nose. uff no!! :) nose is not a library, nor is it a substitute for unittest. It is a tool that you run from the command line, and it does exactly what you are doing with your run_tests.py: it discovers any function resembling a test (including unittests) and executes it. Only, it does it very well, since it is developed by many people and it is more than one year old.
-> http://code.google.com/p/python-nose/wiki/NosetestsUsage > > --Michiel. > > > > > > > > > > -- > > > > My blog on bioinformatics (now in English): > > http://bioinfoblog.it > > > _______________________________________________ > > Biopython-dev mailing list > > Biopython-dev at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biopython-dev > > > > -- My blog on bioinformatics (now in English): http://bioinfoblog.it From dalloliogm at gmail.com Tue Feb 3 19:35:07 2009 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Wed, 4 Feb 2009 01:35:07 +0100 Subject: [Biopython-dev] run_tests.py rewrite In-Reply-To: <752614.16176.qm@web62402.mail.re1.yahoo.com> References: <5aa3b3570902030218x78df37ddic6488638a7937712@mail.gmail.com> <752614.16176.qm@web62402.mail.re1.yahoo.com> Message-ID: <5aa3b3570902031635n357edc4bx77d3bf6094340532@mail.gmail.com> On 2/3/09, Michiel de Hoon wrote: > > However, this way, test_docstring will be difficult to > > mantain in the future. > > A better solution would be to have run_test.py go through > > > all biopython's modules, and then execute every doctest it > > encounters. > > You can do this with doctest.DocTestFinder (have a look at > > nose's code, which does it already: > > > Can doctest.DocTestFinder handle missing external dependencies? For example, if a user installed Biopython without NumPy, then the NumPy-dependent modules should be skipped and not flagged as errors. mmm no idea, sorry :( > > > Moreover, why the typical user should be running > > biopython's tests? > > > To make sure that it works. Biopython interacts with and therefore depends more on 3rd party software, web servers, and file formats than most other Python modules. Things are more likely to break than for example for a more self-contained library such as NumPy. I always run the Biopython tests, and I would advise every user to do so too. In addition, the tests can function as example scripts showing how to use Biopython. 
It is important that all users can run those scripts. I think that all the tests which check if biopython can run correctly on a computer should be separated from all the others. Why do I have to test whether biopython correctly translates the sequence ACTAGCT to a protein code when I install biopython? It should have been already checked by the developers/volunteers. If I want to install biopython on my computer, I want to run only the tests needed to make sure it can work fine on my configuration, not all of them. As an example, take pytables, a library to handle HDF5 files with python. The authors claim that they have written more than 10^6 tests for it. However, when you install pytables from source, you don't have to run all of these tests, only a subset of them: the ones required to check if it can run correctly on your computer. Consider that some of the tests on pytables take hours or days to complete, because they check the handling of big binary files. The idea is that, if we separate the tests on the code from the ones on the configuration, we will be able to enhance the test section of biopython a lot. For example, at the moment there are not many tests to check biopython's behaviour with big sequence files (e.g. 1 GB). It would be useful to have such tests, because now it is becoming common to handle big files in bioinformatics, and it would be possible to do some profiling on that. With that strategy, it would make sense to adopt a tool like nose, which enhances the test framework a lot. For example, it will be very difficult to write tests on big files without using global fixtures (which the basic unittest doesn't support). This means that if you want to write a test which studies the handling of a 1 GB sequence file with biopython, with the basic python testing framework, you are forced to open the file on every test (setUp function), while with a global fixture you will be able to do it in a very elegant way.
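As a rough sketch of what such a fixture looks like: the example below uses the setUpClass and tearDownClass hooks that were only added to the standard unittest module later (Python 2.7 and 3.2), so it was not an option when this message was written. The class name SharedFileTests, the seed, and the small generated file standing in for a large sequence file are all assumptions made for illustration.

```python
import os
import random
import tempfile
import unittest


class SharedFileTests(unittest.TestCase):
    """Hypothetical tests sharing one generated sequence file."""

    setup_calls = 0  # counts how often the class-level fixture ran

    @classmethod
    def setUpClass(cls):
        # Runs once for the whole class, not once per test method.
        SharedFileTests.setup_calls += 1
        rng = random.Random(42)  # fixed seed: same file on every run
        fd, cls.path = tempfile.mkstemp(suffix=".fasta")
        with os.fdopen(fd, "w") as handle:
            for index in range(100):
                handle.write(">seq%d\n" % index)
                handle.write("".join(rng.choice("ACGT") for _ in range(60)) + "\n")

    @classmethod
    def tearDownClass(cls):
        # Runs once after the last test in the class.
        os.remove(cls.path)

    def test_record_count(self):
        with open(self.path) as handle:
            titles = [line for line in handle if line.startswith(">")]
        self.assertEqual(len(titles), 100)

    def test_only_nucleotides(self):
        with open(self.path) as handle:
            for line in handle:
                if not line.startswith(">"):
                    self.assertEqual(set(line.strip()) - set("ACGT"), set())


def run_shared_fixture_demo():
    """Run both tests and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(SharedFileTests)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

The file is written once and read by both tests, which is the saving a per-test setUp cannot provide; the per-test setUp hook remains available for anything that must be fresh each time.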
nose has a lot of many other interesting features: it supports fixtures for doctests, it can be used to profile the execution of all tests, and it supports many plugins. For example, have a look at these ones: http://darcs.idyll.org/~t/projects/pinocchio/doc/#stopwatch-selecting-tests-based-on-execution-time > > > > What about having support to global fixtures? > > For example, many test scripts begin in the same way: they > > 'import > > numpy', check for python's version, etc.. All of > > this could be moved > > to a global fixture and then executed only once for all the > > tests. > > > Hmm... currently the Biopython tests can be written essentially independently of each other, without knowing much about the testing overall framework. I think that that makes it easier for new users/developers to add tests. I think we should avoid the situation that somebody first has to study Biopython's testing framework to be able to add a test. You could write a skeleton for biopython's tests, and it will be a lot useful (e.g. have a look at this recipe for elixir: http://elixir.ematia.de/trac/wiki/Recipes/Testing) > > > --Michiel. 
> > > > -- My blog on bioinformatics (now in English): http://bioinfoblog.it From bugzilla-daemon at portal.open-bio.org Tue Feb 3 21:57:14 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 3 Feb 2009 21:57:14 -0500 Subject: [Biopython-dev] [Bug 2749] New: Proposal: a template for biopython's unittests Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2749 Summary: Proposal: a template for biopython's unittests Product: Biopython Version: Not Applicable Platform: All URL: http://github.com/dalloliogm/bio-test-datasets- repository/blob/master/templates/biopython/biotest_templ ate.py OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: Documentation AssignedTo: biopython-dev at biopython.org ReportedBy: dalloliogm at gmail.com I have posted here: - http://github.com/dalloliogm/bio-test-datasets-repository/blob/master/templates/biopython/biotest_template.py a draft for a template for biopython's unittests. The idea is that if you provide a template for writing unittests for biopython, it will be easier for new developers. This example, in particular, makes uses of nose, and it has example of global fixtures (the two setUpAll and tearDownAll methods). It could be adapted for being used without nose, but it will be more difficult to understand. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From dalloliogm at gmail.com Tue Feb 3 22:01:51 2009 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Wed, 4 Feb 2009 04:01:51 +0100 Subject: [Biopython-dev] a template for unittests in biopython Message-ID: <5aa3b3570902031901q5450bf43nfeb23ded1c70608c@mail.gmail.com> Hi people, I have posted here: - http://github.com/dalloliogm/bio-test-datasets-repository/blob/master/templates/biopython/biotest_template.py a draft for a template for unittests in biopython. You can also refer to it as bug 2749 (http://bugzilla.open-bio.org/show_bug.cgi?id=2749). So, the idea is that if we have a template for test files, it will be easier for new developers to write new tests and modules. This one in particular makes use of nose, and it has some examples of global fixtures (the two setUpAll and tearDownAll methods). What do you think about it? Cheers.. -- My blog on bioinformatics (now in English): http://bioinfoblog.it From mjldehoon at yahoo.com Tue Feb 3 22:57:23 2009 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Tue, 3 Feb 2009 19:57:23 -0800 (PST) Subject: [Biopython-dev] run_tests.py rewrite In-Reply-To: <320fb6e00902030512y36a7118ke81a8a890caf1437@mail.gmail.com> Message-ID: <289461.11809.qm@web62401.mail.re1.yahoo.com> > In terms of cleaning up the code, something we can probably > now remove from the print-and-compare handler is the special > case of modules called via a run_tests method. I've removed these run_tests functions from the print-and-compare tests, and from a few unittest-based tests where this function is not actually being used. I've updated run_tests.py accordingly. --Michiel. 
From biopython at maubp.freeserve.co.uk Wed Feb 4 05:22:05 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 4 Feb 2009 10:22:05 +0000 Subject: [Biopython-dev] run_tests.py rewrite In-Reply-To: <5aa3b3570902031635n357edc4bx77d3bf6094340532@mail.gmail.com> References: <5aa3b3570902030218x78df37ddic6488638a7937712@mail.gmail.com> <752614.16176.qm@web62402.mail.re1.yahoo.com> <5aa3b3570902031635n357edc4bx77d3bf6094340532@mail.gmail.com> Message-ID: <320fb6e00902040222r4b3297b3q3b4141db1f16a25d@mail.gmail.com> >> > Moreover, why the typical user should be running >> > biopython's tests? >> >> >> To make sure that it works. Biopython interacts with and therefore >> depends more on 3rd party software, web servers, and file formats >> than most other Python modules. Things are more likely to break >> than for example for a more self-contained library such as NumPy. >> I always run the Biopython tests, and I would advise every user to >> do so too. In addition, the tests can function as example scripts >> showing how to use Biopython. It is important that all users can run those scripts. > > I think that all the tests which check if biopython can run correctly > on a computer should be separated from all the others. > Why do I have to test whether biopython correctly translate the > sequence ACTAGCT to a protein code when I install biopython? It should > have been already checked by the developers/volonteers. If I want to > install biopython on my computer, I want to run only the tests needed > to make it sure it can work fine on my configuration, not all of them. As an end user, I would still prefer to know that even simple things like translation have been checked as working on my machine. With a very simple example like this it is unlikely to break on some setups and not others, but for many test cases it is very hard to make this judgement call.
The only real way to "to make it sure it can work fine on my configuration" is to just test everything - and it doesn't take that long anyway. > As an example, take pytable, a library to handle HDF5 files with python. > The authors claim that they have written more than 10^6 tests for it. > However, when you install pytables from source, you don't have to run > all of these tests: but only a subset of them, the ones required to > check if it can run correctly on your computer. Consider that some of > the tests on pytables take hours or days to complete, because they > check the handling of big binary files. OK, this is a little different - simply because of the time taken. If the full test suite takes hours or more, then I can see why the pytables people only distribute a subset of the tests. > The idea is that, if we separate the tests on the code from the ones > on the configuration, we will be able to enhance the test section of > biopython a lot. > For example, at the moment there are not many tests to check > biopython's behaviour with big sequence files (e.g. 1 GB). It would be > useful to have such tests, because now it is becoming common to handle > big files in bioinformatics, and it would be possible to do some > profiling on that. If you want developers to download 1 GB files as part of building and testing Biopython, it will be a hurdle/barrier to development. Even for existing developers, it would make setting up a new machine that much more complicated. Other than looking at performance speed/memory, we can check most features of large multi-record files with much smaller examples. 
Peter From bugzilla-daemon at portal.open-bio.org Wed Feb 4 05:27:28 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 4 Feb 2009 05:27:28 -0500 Subject: [Biopython-dev] [Bug 2749] Proposal: a template for biopython's unittests In-Reply-To: Message-ID: <200902041027.n14ARS6S023505@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2749 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-04 05:27 EST ------- The current view from the Biopython developers is that we don't want to depend on nose for running our unit tests (nose is not installed automatically as part of python). This has been discussed on the mailing list, so I won't repeat myself here. In this example, having global setUpAll and tearDownAll methods isn't needed, but I can see how they might be helpful on larger (slower) tests. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From mjldehoon at yahoo.com Wed Feb 4 05:30:31 2009 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Wed, 4 Feb 2009 02:30:31 -0800 (PST) Subject: [Biopython-dev] run_tests.py rewrite In-Reply-To: <320fb6e00902030602p3afd8a82scd5ed5adffda65eb@mail.gmail.com> Message-ID: <114112.52378.qm@web62406.mail.re1.yahoo.com> I've uploaded to CVS a modified version of run_tests.py to address import errors. Could you have a look to see if you agree with my solution? --Michiel. 
--- On Tue, 2/3/09, Peter wrote: > From: Peter > Subject: Re: [Biopython-dev] run_tests.py rewrite > To: mjldehoon at yahoo.com > Cc: biopython-dev at biopython.org > Date: Tuesday, February 3, 2009, 9:02 AM > Michiel, > > I've noticed that for print-and-compare tests we can > get unexpected > errors from the line: > module = __import__(name) > > For example, if there is an IOError in test_SeqIO_online.py > this does > not get caught - we only try to catch a > MissingExternalDependencyError. Perhaps we should also > catch any > generic exception and report that test as a failure. > Otherwise, the > run_test.py file terminates prematurely. > > Would you like to look into this, or should I? > > Peter From bugzilla-daemon at portal.open-bio.org Wed Feb 4 06:28:46 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 4 Feb 2009 06:28:46 -0500 Subject: [Biopython-dev] [Bug 2749] Proposal: a template for biopython's unittests In-Reply-To: Message-ID: <200902041128.n14BSkDG030225@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2749 ------- Comment #2 from dalloliogm at gmail.com 2009-02-04 06:28 EST ------- yeee! I just come with an idea which makes this test template work both with and without nose. The global fixture methods should be called manually before executing the test suite. I am sure there is a way to do this automatically rather than manually as it is now. Anyway, look at the latest commit: - http://github.com/dalloliogm/bio-test-datasets-repository/commit/53554d7ee9a117bc4df9e9ea5bc844e0d4e4d2fa It can improved, of course. However, the idea behind this feature proposal is to have a template for unittests in biopython. What do you think about it? It can be refined... for example, telling people which version of numpy they should import if they need it, how they should format docstrings, etc.. The same can be done for a template of sequence formats parser. 
I was looking for something like that when I wrote the fastPhase parser. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From dalloliogm at gmail.com Wed Feb 4 06:37:56 2009 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Wed, 4 Feb 2009 12:37:56 +0100 Subject: [Biopython-dev] run_tests.py rewrite In-Reply-To: <320fb6e00902040222r4b3297b3q3b4141db1f16a25d@mail.gmail.com> References: <5aa3b3570902030218x78df37ddic6488638a7937712@mail.gmail.com> <752614.16176.qm@web62402.mail.re1.yahoo.com> <5aa3b3570902031635n357edc4bx77d3bf6094340532@mail.gmail.com> <320fb6e00902040222r4b3297b3q3b4141db1f16a25d@mail.gmail.com> Message-ID: <5aa3b3570902040337h26c590bara35a096a9d642c9b@mail.gmail.com> On Wed, Feb 4, 2009 at 11:22 AM, Peter wrote: >>> > Moreover, why the typical user should be running >>> > biopython's tests? >>> >>> >>> To make sure that it works. Biopython interacts with and therefore >>> depends more on 3rd party software, web servers, and file formats >>> than most other Python modules. Things are more likely to break >>> than for example for a more self-contained library such as NumPy. >>> I always run the Biopython tests, and I would advise every user to >>> do so too. In addition, the tests can function as example scripts >>> showing how to use Biopython. It is important that all users can run those scripts. >> >> I think that all the tests which check if biopython can run correctly >> on a computer should be separated from all the others. >> Why do I have to test whether biopython correctly translate the >> sequence ACTAGCT to a protein code when I install biopython? It should >> have been already checked by the developers/volonteers. If I want to >> install biopython on my computer, I want to run only the tests needed >> to make it sure it can work fine on my configuration, not all of them. 
> > As an end user, I would still prefer to know that even simple things > like translation have been checked as working on my machine. With a > very simple example like this is it unlikely to break on some setups > and not others, but for many test cases it is very hard to make this > judgement call. The only real way to "to make it sure it can work > fine on my configuration" is to just test everything - and it doesn't > take that long anyway. It doesn't take long, but the developers are forced to write tests which don't take long. However, this doesn't mean that big tests are not necessary. Many libraries I have installed have two separated commands, 'setup.py test' and 'setup.py test_all'. >> As an example, take pytable, a library to handle HDF5 files with python. >> The authors claim that they have written more than 10^6 tests for it. >> However, when you install pytables from source, you don't have to run >> all of these tests: but only a subset of them, the ones required to >> check if it can run correctly on your computer. Consider that some of >> the tests on pytables take hours or days to complete, because they >> check the handling of big binary files. > > OK, this is a little different - simply because of the time taken. If > the full test suite takes hours or more, then I can see why the > pytables people only distribute a subset of the tests. > >> The idea is that, if we separate the tests on the code from the ones >> on the configuration, we will be able to enhance the test section of >> biopython a lot. >> For example, at the moment there are not many tests to check >> biopython's behaviour with big sequence files (e.g. 1 GB). It would be >> useful to have such tests, because now it is becoming common to handle >> big files in bioinformatics, and it would be possible to do some >> profiling on that. > > If you want developers to download 1 GB files as part of building and > testing Biopython, it will be a hurdle/barrier to development. 
Even > for existing developers, it would make setting up a new machine that > much more complicated. Other than looking at performance > speed/memory, we can check most features of large multi-record files > with much smaller examples. Well, it is not necessary to put a 1 GB file in the repo: we could generate it with the random or hmm module, always using the same seed :). It would be a 'package' global fixture. > > Peter > -- My blog on bioinformatics (now in English): http://bioinfoblog.it From biopython at maubp.freeserve.co.uk Wed Feb 4 08:14:14 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 4 Feb 2009 13:14:14 +0000 Subject: [Biopython-dev] run_tests.py rewrite In-Reply-To: <114112.52378.qm@web62406.mail.re1.yahoo.com> References: <320fb6e00902030602p3afd8a82scd5ed5adffda65eb@mail.gmail.com> <114112.52378.qm@web62406.mail.re1.yahoo.com> Message-ID: <320fb6e00902040514te7c7d2ci245433371770d172@mail.gmail.com> On Wed, Feb 4, 2009 at 10:30 AM, Michiel de Hoon wrote: > > I've uploaded to CVS a modified version of run_tests.py to address import > errors. Could you have a look to see if you agree with my solution? > It took a little while to show up in CVS for me, but I've got it now. That seems to solve the problem neatly - and you've even managed to capture the stack trace elegantly, something I hadn't worked out how to do.
Nice :) Peter From bugzilla-daemon at portal.open-bio.org Wed Feb 4 11:16:37 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 4 Feb 2009 11:16:37 -0500 Subject: [Biopython-dev] [Bug 2749] Proposal: a template for biopython's unittests In-Reply-To: Message-ID: <200902041616.n14GGbwk031325@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2749 ------- Comment #3 from bsouthey at gmail.com 2009-02-04 11:16 EST ------- (In reply to comment #1) > The current view from the Biopython developers is that we don't want to depend > on nose for running our unit tests (nose is not installed automatically as part > of python). This has been discussed on the mailing list, so I won't repeat > myself here. Also, the test framework must support Python 2.3 while Biopython supports it. Really I find that the huge diversity in Biopython prevents a 'single' template that is sufficiently easy to follow. I do not like the splitting that test into setups for each 'subtest' followed by a general test. This starts to get rather difficult to read and manage when you have modules like the sequence object involve many different tasks that require a separate setup for each test as well as the actual test. A related problem is that certain tests may require a specific exception for a specific situation. Another problem is that some of the tests are very similar for the same module (say Logistic regression or testing alphabets in reading sequences into a Seq object) so it makes more sense to do what numpy does (http://projects.scipy.org/scipy/numpy/wiki/TestingGuidelines ) where the same test function is used with different inputs. I would like to easily add a new test case to an existing test like Numpy has a test case class that is separate from the actual tests. 
Just few cents, Bruce -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Feb 4 13:02:27 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 4 Feb 2009 13:02:27 -0500 Subject: [Biopython-dev] [Bug 2750] New: EMBL format: reference titles split across lines are not parsed correctly; pmids are not parsed Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2750 Summary: EMBL format: reference titles split across lines are not parsed correctly; pmids are not parsed Product: Biopython Version: 1.49 Platform: PC OS/Version: Linux Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: wolfgang.resch at gmail.com for example the following embl record: ID cel-let-7 standard; RNA; CEL; 99 BP. XX AC MI0000001; XX DE Caenorhabditis elegans let-7 stem-loop XX RN [1] RX PUBMED; 11679671. RA Lau NC, Lim LP, Weinstein EG, Bartel DP, Lim LP, Lau NC, Weinstein EG; RT "An abundant class of tiny RNAs with probable regulatory roles in RT Caenorhabditis elegans"; RL Science. 294:858-862(2001). XX FH Key Location/Qualifiers FH FT miRNA 17..38 FT /accession="MIMAT0000001" FT /product="cel-let-7" FT /evidence=experimental FT /experiment="cloned [1-3,5], Northern [1], PCR [4]" XX SQ Sequence 99 BP; 26 A; 19 C; 24 G; 0 T; 30 other; uacacugugg auccggugag guaguagguu guauaguuug gaauauuacc accggugaac 60 uaugcaauuu ucuaccuuac cggagacaga acucuucga 99 // is parsed as follows: authors: Lau NC, Lim LP, Weinstein EG, Bartel DP, Lim LP, Lau NC, Weinstein EG; title: Caenorhabditis elegans"; journal: Science. 294:858-862(2001). 
medline id: pubmed id: comment: -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bsouthey at gmail.com Wed Feb 4 14:55:20 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Wed, 04 Feb 2009 13:55:20 -0600 Subject: [Biopython-dev] test_Ace, test_Nexus, test_Phd In-Reply-To: <240911.28388.qm@web62402.mail.re1.yahoo.com> References: <240911.28388.qm@web62402.mail.re1.yahoo.com> Message-ID: <4989F2A8.3020300@gmail.com> Michiel de Hoon wrote: > These three tests currently are written as a combination of a unittest-based test and a print-and-compare test. That is, they contain classes deriving from unittest.TestCase, but then print out stuff that should get compared to the output file. However, run_tests.py assumes that they are true unittest-style tests, so the comparison is never done. > > Does anybody mind if I convert these three to pure print-and-compare or pure unittest-style tests? test_Ace.py and test_Nexus.py produce lots of output, so I'm tempted to go with a print-and-compare test there; test_Phd.py might work well as a unittest-style test. > > --Michiel. > > > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > I looked at these tests and think that these are actually examples not tests (except to say the code run). If so, then I would go with what is easiest. 
Bruce From bugzilla-daemon at portal.open-bio.org Wed Feb 4 16:58:51 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 4 Feb 2009 16:58:51 -0500 Subject: [Biopython-dev] [Bug 2749] Proposal: a template for biopython's unittests In-Reply-To: Message-ID: <200902042158.n14Lwpra003911@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2749 ------- Comment #4 from dalloliogm at gmail.com 2009-02-04 16:58 EST ------- > (In reply to comment #1) > > The current view from the Biopython developers is that we don't want to depend > > on nose for running our unit tests (nose is not installed automatically as part > > of python). This has been discussed on the mailing list, so I won't repeat > > myself here. > > Also, the test framework must support Python 2.3 while Biopython supports it. > > Really I find that the huge diversity in Biopython prevents a 'single' template > that is sufficiently easy to follow. ok, but you should give to new developers at least some guidelines on how they should write tests, documentation, and code. The fact that the tests in biopython are so various is not a positive point, it make it difficult to understand and to maintain them, especially for newcomers. > I do not like the splitting that test into > setups for each 'subtest' followed by a general test. Well, it is a matter of taste, I think. I find it elegant and rather clear: you can easily see in which conditions and environment every test is run, the code in every test method is reducted to the minimum, and you clean everything after the execution of the first test, so the order in which the tests are executed doesn't count. > This starts to get rather > difficult to read and manage when you have modules like the sequence object > involve many different tasks that require a separate setup for each test as > well as the actual test. You should put those in a different test module. 
Every test unit is a particular use case: for example, look at my example, where the first unit test is a simple sequence, and the second (subclassed) is a blank one. > A related problem is that certain tests may require a > specific exception for a specific situation. mmm what do you mean, exactly? > Another problem is that some of the tests are very similar for the same module > (say Logistic regression or testing alphabets in reading sequences into a Seq > object) so it makes more sense to do what numpy does > (http://projects.scipy.org/scipy/numpy/wiki/TestingGuidelines ) where the same > test function is used with different inputs. That will be difficult to do until you are so convinced against using nose :(. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Feb 5 13:14:19 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 5 Feb 2009 13:14:19 -0500 Subject: [Biopython-dev] [Bug 2751] New: PDBParser crashes on empty tempFactor fields Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2751 Summary: PDBParser crashes on empty tempFactor fields Product: Biopython Version: 1.49 Platform: PC OS/Version: Linux Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: eric.talevich at gmail.com When parsing ATOM lines, Bio.PDB.PDBParser appears to be passing the contents of indexes 60-66 directly to the float() constructor without checking if the string is empty (or all spaces). 
The PDB spec seems to indicate that the default value for this field should be 0.0: http://www.wwpdb.org/documentation/format23/sect9.html#ATOM I interpret that to mean PDBParser should assume 0.0 if the string is blank, at least in permissive mode; otherwise, perhaps a PDBException should be raised. Here's a traceback: File "/usr/lib/python2.5/site-packages/Bio/PDB/PDBParser.py", line 66, in get_structure self._parse(file.readlines()) File "/usr/lib/python2.5/site-packages/Bio/PDB/PDBParser.py", line 86, in _parse self.trailer=self._parse_coordinates(coords_trailer) File "/usr/lib/python2.5/site-packages/Bio/PDB/PDBParser.py", line 160, in _parse_coordinates bfactor=float(line[60:66]) ValueError: empty string for float() This occurs when parsing a file that looks like this: HEADER 1ad5 ATOM 4255 N GLU B 82 -6.363 45.622 156.936 1.00 69.02 ATOM 4256 CA GLU B 82 -6.235 44.414 157.713 1.00 68.26 ATOM 4257 C GLU B 82 -5.067 44.774 158.648 1.00 68.19 ATOM 4258 O GLU B 82 -5.169 45.863 159.227 1.00 67.24 ATOM 4259 CB GLU B 82 -5.903 43.230 156.774 1.00 68.47 ATOM 4260 H1 GLU B 82 -6.252 46.392 157.641 1.00 0 ATOM 4261 H2 GLU B 82 -5.588 45.683 156.246 1.00 0 ATOM 4262 H3 GLU B 82 -7.267 45.667 156.437 1.00 0 ATOM 4263 N ASP B 83 -3.979 43.981 158.770 1.00 67.44 ... -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
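The failure reported above can be reproduced without a PDB file. The following is a minimal sketch, not the Bio.PDB code: `make_atom_line` and `parse_bfactor` are hypothetical helpers written for this illustration, and the `permissive` flag only imitates the idea of PDBParser's permissive mode. It assumes the standard fixed-width ATOM layout (occupancy in columns 55-60, tempFactor in columns 61-66) and is written for a current Python rather than the 2.x versions discussed in the thread.

```python
def make_atom_line(serial, name, x, y, z, occupancy, bfactor_text):
    """Build a fixed-width ATOM record (hypothetical helper for this sketch).

    bfactor_text is inserted as-is into columns 61-66, so passing an empty
    string yields the blank tempFactor field described in the bug report.
    """
    return "ATOM  %5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f%6.2f%-6s" % (
        serial, name, "GLU", "B", 82, x, y, z, occupancy, bfactor_text)


def parse_bfactor(line, permissive=True):
    """Read the tempFactor field (Python slice 60:66) from an ATOM line.

    A blank or missing field becomes 0.0 in permissive mode; otherwise a
    ValueError with a more descriptive message is raised.
    """
    field = line[60:66]
    if not field.strip():
        if permissive:
            return 0.0
        raise ValueError("blank tempFactor field (columns 61-66)")
    return float(field)


good = make_atom_line(4255, "N", -6.363, 45.622, 156.936, 1.00, " 69.02")
blank = make_atom_line(4260, "H1", -6.252, 46.392, 157.641, 1.00, "")

print(parse_bfactor(good))
print(parse_bfactor(blank))  # permissive mode substitutes the default
```

Because slicing past the end of a string returns an empty string, the same check also covers lines that are simply truncated after the occupancy column.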
From bugzilla-daemon at portal.open-bio.org Thu Feb 5 13:25:10 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 5 Feb 2009 13:25:10 -0500 Subject: [Biopython-dev] [Bug 2751] PDBParser crashes on empty tempFactor fields In-Reply-To: Message-ID: <200902051825.n15IPAXF016833@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2751 ------- Comment #1 from eric.talevich at gmail.com 2009-02-05 13:25 EST ------- (In reply to comment #0) Sorry, that PDB example was manually fixed. The broken line format is: ATOM 4260 H1 GLU B 82 -6.252 46.392 157.641 1.00 This is from an odd edition of the 1AD5 structure; RCSB's version has the 0.0 values filled in correctly. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Feb 6 05:46:54 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 6 Feb 2009 05:46:54 -0500 Subject: [Biopython-dev] [Bug 2751] PDBParser crashes on empty tempFactor fields In-Reply-To: Message-ID: <200902061046.n16AksQd020147@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2751 ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-06 05:46 EST ------- (In reply to comment #0) > When parsing ATOM lines, Bio.PDB.PDBParser appears to be passing the contents > of indexes 60-66 directly to the float() constructor without checking if the > string is empty (or all spaces). > > The PDB spec seems to indicate that the default value for this field should be > 0.0: > http://www.wwpdb.org/documentation/format23/sect9.html#ATOM > > I interpret that to mean PDBParser should assume 0.0 if the string is blank, > at least in permissive mode; otherwise, perhaps a PDBException should be > raised. 
I would have read that spec to mean if you don't know the tempFactor, put "0.0" in the field and don't leave it blank. By this interpretation of the spec, your old file is invalid, and Biopython's failure is therefore not unreasonable. It would be good to cope with this in permissive mode though, and raise a more meaningful error in strict mode. (In reply to comment #1) > (In reply to comment #0) > > Sorry, that PDB example was manually fixed. The broken line format is: > > ATOM 4260 H1 GLU B 82 -6.252 46.392 157.641 1.00 > > This is from an odd edition of the 1AD5 structure; RCSB's version has the 0.0 > values filled in correctly. Do you have a link to download the old ("invalid") version of PDB reference 1AD5, as it would be very helpful to test this on a real file? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Fri Feb 6 06:22:16 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 6 Feb 2009 11:22:16 +0000 Subject: [Biopython-dev] Biopython tutorial update for unit tests Message-ID: <320fb6e00902060322s2d860056yd61dabd19b144d00@mail.gmail.com> Hi all [I thought I sent this email on Wednesday - oh well, better late than never!] I've recently checked in a revision to the test case section of the tutorial, see tutorial.tex revision 194, http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Doc/Tutorial.tex?cvsroot=biopython My intention is to describe the current test system in more detail. I've tried to make sure the text makes sense for both Biopython 1.49 (in case we want to update the website before the next release) and CVS (assuming we do get rid of the expected output files as currently being trialled). Let me know if anyone spots a typo, or something that should be clearer. 
You'll need (pdf)latex to build the PDF file, and hevea for the HTML output - but the raw tex file can just be read directly from the CVS link instead. Thanks, Peter From bugzilla-daemon at portal.open-bio.org Fri Feb 6 07:27:49 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 6 Feb 2009 07:27:49 -0500 Subject: [Biopython-dev] [Bug 2750] EMBL format: reference titles split across lines are not parsed correctly; pmids are not parsed In-Reply-To: Message-ID: <200902061227.n16CRng0029039@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2750 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-06 07:27 EST ------- Confirmed title problem, example code using your EMBL record saved to a file: >>> from Bio import SeqIO >>> record = SeqIO.read(open("long_ref.embl"),"embl") >>> print record.annotations["references"][0] authors: Lau NC, Lim LP, Weinstein EG, Bartel DP, Lim LP, Lau NC, Weinstein EG; title: Caenorhabditis elegans"; journal: Science. 294:858-862(2001). medline id: pubmed id: comment: This is due to a subtle difference between the GenBank and EMBL scanner code, the GenBank scanner pre-combines the title lines before passing it to the consumer, while the EMBL scanner passes the title in chunks. Fixed the consumer to cope with either. Also fixed for multi-line author lists etc. Could you update your Bio/GenBank/__init__.py file to CVS revision 102, which you will be able to download here, and retest: http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/GenBank/__init__.py?cvsroot=biopython Or update the full installation to CVS if you would find that easier. Thanks, Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
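The scanner/consumer mismatch described in the comment above can be illustrated with a small sketch. This is not the Bio.GenBank code: `Reference`, `OverwritingConsumer` and `ChunkTolerantConsumer` are hypothetical names, and the only assumption is that a consumer's `title` callback may be called either once with the whole title or once per RT line.

```python
class Reference(object):
    """Hypothetical stand-in for a parsed literature reference."""

    def __init__(self):
        self.title = ""


class OverwritingConsumer(object):
    """Keeps only the last chunk, reproducing the reported symptom."""

    def __init__(self):
        self.reference = Reference()

    def title(self, content):
        self.reference.title = content


class ChunkTolerantConsumer(object):
    """Accepts a title delivered whole or in several chunks."""

    def __init__(self):
        self.reference = Reference()

    def title(self, content):
        if self.reference.title:
            # A later chunk: append it to what has been seen so far.
            self.reference.title += " " + content
        else:
            self.reference.title = content


chunks = ['"An abundant class of tiny RNAs with probable regulatory roles in',
          'Caenorhabditis elegans";']

broken = OverwritingConsumer()
fixed = ChunkTolerantConsumer()
for chunk in chunks:
    broken.title(chunk)
    fixed.title(chunk)

print(broken.reference.title)
print(fixed.reference.title)
```

A scanner that pre-combines the lines would call `title` once, in which case both consumers give the same result; only the chunked case distinguishes them.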
From bugzilla-daemon at portal.open-bio.org Fri Feb 6 08:34:22 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 6 Feb 2009 08:34:22 -0500 Subject: [Biopython-dev] [Bug 2750] EMBL format: reference titles split across lines are not parsed correctly; pmids are not parsed In-Reply-To: Message-ID: <200902061334.n16DYMkp005189@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2750 ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-06 08:34 EST ------- Regarding the missing PUBMED ID, that is also now fixed in CVS. Note that this still ignores DOI and AGRICOLA references (supporting this would require a change to our reference object, and perhaps our BioSQL bindings to). You will need to update your Bio/GenBank/Scanner.py file to revision 1.27 which you will be able to download here: http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/GenBank/Scanner.py?cvsroot=biopython Rather than manually updating these two files (Bio/GenBank/__init__.py as per comment 1, and Bio/GenBank/Scanner.py as above), you may find doing a full installation from CVS simpler. e.g. >>> from Bio import SeqIO >>> record = SeqIO.read(open("long_ref.embl"),"embl") >>> for ref in record.annotations["references"] : print ref ... authors: Lau NC, Lim LP, Weinstein EG, Bartel DP, Lim LP, Lau NC, Weinstein EG; title: "An abundant class of tiny RNAs with probable regulatory roles in Caenorhabditis elegans"; journal: Science. 294:858-862(2001). medline id: pubmed id: 11679671 comment: Again, please let us know if that solves your problem. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
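For readers unfamiliar with the RX line, here is a minimal sketch of pulling the PubMed ID out of a cross-reference line such as the one in the reported record. It is not the Bio.GenBank.Scanner implementation: `parse_rx_line` and `pubmed_id_from_rx` are hypothetical helpers, the DOI value below is made up for illustration, and the only assumption is the "RX  DATABASE; identifier." layout seen in the record.

```python
def parse_rx_line(line):
    """Split an EMBL RX line into (database, identifier).

    Only the final full stop is removed, because identifiers such as DOIs
    contain dots of their own.
    """
    if not line.startswith("RX"):
        raise ValueError("not an RX line: %r" % line)
    body = line[2:].strip()
    database, _, identifier = body.partition(";")
    identifier = identifier.strip()
    if identifier.endswith("."):
        identifier = identifier[:-1]
    return database.strip().upper(), identifier


def pubmed_id_from_rx(lines):
    """Return the first PUBMED identifier found, or '' if there is none."""
    for line in lines:
        database, identifier = parse_rx_line(line)
        if database == "PUBMED":
            return identifier
    return ""


rx_lines = ["RX   DOI; 10.1000/example.123.",  # made-up DOI, illustration only
            "RX   PUBMED; 11679671."]
print(pubmed_id_from_rx(rx_lines))
```

Other databases (DOI, AGRICOLA) are simply skipped here, which mirrors the limitation mentioned in the comment above rather than solving it.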
From bugzilla-daemon at portal.open-bio.org Fri Feb 6 09:47:10 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 6 Feb 2009 09:47:10 -0500 Subject: [Biopython-dev] [Bug 2750] EMBL format: reference titles split across lines are not parsed correctly; pmids are not parsed In-Reply-To: Message-ID: <200902061447.n16ElAAr013975@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2750 ------- Comment #3 from wolfgang.resch at gmail.com 2009-02-06 09:47 EST ------- Peter, phantastic - that solved the problem. I've really got to learn the internals of biopython... Thanks and best regards, Wolfgang -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Feb 6 09:57:48 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 6 Feb 2009 09:57:48 -0500 Subject: [Biopython-dev] [Bug 2750] EMBL format: reference titles split across lines are not parsed correctly; pmids are not parsed In-Reply-To: Message-ID: <200902061457.n16Evmkm015880@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2750 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-06 09:57 EST ------- (In reply to comment #3) > Peter, > > phantastic - that solved the problem. I've really got to learn the internals > of biopython... > > Thanks and best regards, > > Wolfgang Good to know that's working - marking this as FIXED. If you do find anything else amiss, please report it. The EMBL parsing is not yet as well tested / well used as the GenBank support... 
Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Feb 6 13:29:45 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 6 Feb 2009 13:29:45 -0500 Subject: [Biopython-dev] [Bug 2751] PDBParser crashes on empty tempFactor fields In-Reply-To: Message-ID: <200902061829.n16ITj9p007988@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2751 ------- Comment #3 from eric.talevich at gmail.com 2009-02-06 13:29 EST ------- Created an attachment (id=1215) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1215&action=view) PDB file with some missing bfactor fields -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From eric.talevich at gmail.com Fri Feb 6 15:11:15 2009 From: eric.talevich at gmail.com (Eric Talevich) Date: Fri, 6 Feb 2009 15:11:15 -0500 Subject: [Biopython-dev] SVN migration and Launchpad mirroring Message-ID: <3f6baf360902061211o4da786b0q5f788efcc63e2bb1@mail.gmail.com> Hello all, Scanning the biopython-dev mailing list archives, it appears that either the CVS-to-SVN migration either stalled out during the past year, or the discussion about this migration went someplace other than the mailing list and wiki. I did, however, find another page for the project on Launchpad ( https://launchpad.net/biopython), apparently started a few years ago by Jonathan Taylor and abandoned. I didn't see any discussion of it on biopython-dev around that time. 
I'm pretty fond of bzr, branching, and Launchpad's PPA feature ( https://help.launchpad.net/Packaging/PPA) in particular, so I'd like to see if it's possible to start mirroring the CVS repository on Launchpad to see how it goes. I'm happy to take care of whatever setup and maintenance is needed. Comments? Is Jonathan Taylor still around and interested in resurrecting the Launchpad page? Best regards, Eric From bugzilla-daemon at portal.open-bio.org Fri Feb 6 16:38:09 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 6 Feb 2009 16:38:09 -0500 Subject: [Biopython-dev] [Bug 2751] PDBParser crashes on empty tempFactor fields In-Reply-To: Message-ID: <200902062138.n16Lc9Vq003304@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2751 ------- Comment #4 from eric.talevich at gmail.com 2009-02-06 16:38 EST ------- Created an attachment (id=1216) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1216&action=view) Catch float() failures and substitute a default or forward the exception The error message could be more helpful, and it would be nice to log a warning whenever the first exception is caught and a default value is used. The placement of try_float() may not match the coding conventions, I'm not sure. Generalizing as: try_coerce(field, into=float, default=None): ... would allow the same function to be used for coercing the integers and re-raising the exceptions as PDBConstructionException. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
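The generalisation suggested in the comment above might look roughly like the sketch below. It is one possible reading of the `try_coerce(field, into=float, default=None)` signature, not the attached patch: the `permissive` argument is an assumption added here so the function is self-contained, and the locally defined `PDBConstructionException` is a stand-in for the Bio.PDB exception of the same name.

```python
class PDBConstructionException(Exception):
    """Local stand-in for the Bio.PDB exception, to keep the sketch runnable."""


def try_coerce(field, into=float, default=None, permissive=True):
    """Coerce a fixed-width text field with `into` (float, int, ...).

    On failure, return `default` when one is given and permissive is true;
    otherwise re-raise the problem as a PDBConstructionException.
    """
    try:
        return into(field)
    except ValueError:
        if not permissive or default is None:
            raise PDBConstructionException(
                "Invalid %s value in field %r" % (into.__name__, field))
        return default


print(try_coerce(" 69.02"))                # ordinary float field
print(try_coerce("      ", default=0.0))   # blank field, default substituted
print(try_coerce("  82", into=int))        # same helper for integer columns
```

One design consequence of using `default=None` as the "no default" marker is that `None` itself can never be returned as a fallback; a dedicated sentinel object would be needed if that mattered.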
From biopython at maubp.freeserve.co.uk Sat Feb 7 07:55:42 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sat, 7 Feb 2009 12:55:42 +0000 Subject: [Biopython-dev] SVN migration and Launchpad mirroring In-Reply-To: <3f6baf360902061211o4da786b0q5f788efcc63e2bb1@mail.gmail.com> References: <3f6baf360902061211o4da786b0q5f788efcc63e2bb1@mail.gmail.com> Message-ID: <320fb6e00902070455h72c7bd31w506f5ed52e9633bc@mail.gmail.com> On Fri, Feb 6, 2009 at 8:11 PM, Eric Talevich wrote: > Hello all, > > Scanning the biopython-dev mailing list archives, it appears that either the > CVS-to-SVN migration either stalled out during the past year, or the > discussion about this migration went someplace other than the mailing list > and wiki. There have been some off-list discussions with the OBF guys who look after all the servers etc about the logistics of doing the migration, and when it might suit them. Having all the OBF projects moved from CVS to SVN will make life easier for them (BioPerl etc have already moved). I was actually about to chase that up... > I did, however, find another page for the project on Launchpad ( > https://launchpad.net/biopython), apparently started a few years ago by > Jonathan Taylor and abandoned. I didn't see any discussion of it on > biopython-dev around that time. I guess that was some 3rd party, I don't recall this being discussed here. In terms of other tools, several people here are interested in git, and git and SVN can be made to work together. Hopefully getting Biopython from CVS to SVN will make things easier for them.
Peter From bugzilla-daemon at portal.open-bio.org Sat Feb 7 12:44:46 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 7 Feb 2009 12:44:46 -0500 Subject: [Biopython-dev] [Bug 2751] PDBParser crashes on empty tempFactor fields In-Reply-To: Message-ID: <200902071744.n17Hik8w021047@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2751 ------- Comment #5 from eric.talevich at gmail.com 2009-02-07 12:44 EST ------- (From update of attachment 1216) >=== modified file 'Bio/PDB/PDBParser.py' >--- Bio/PDB/PDBParser.py 2009-02-06 20:42:42 +0000 >+++ Bio/PDB/PDBParser.py 2009-02-06 21:17:08 +0000 >@@ -111,6 +111,20 @@ > current_segid=None > current_residue_id=None > current_resname=None >+ >+ def try_float(field, default=None): >+ """Try coercing a string into a float, safely. >+ >+ If the string is not a valid float, then if default is given, >+ default is returned; otherwise an exception is raised. >+ """ >+ try: >+ return float(field) >+ except (ValueError, NameError): >+ if (self.PERMISSIVE==0) or default is None: >+ raise PDBConstructionException("Detected an invalid value in a field") >+ return default >+ > for i in range(0, len(coords_trailer)): > line=coords_trailer[i] > record_type=line[0:6] >@@ -150,13 +164,13 @@ > hetero_flag=" " > residue_id=(hetero_flag, resseq, icode) > # atomic coordinates >- x=float(line[30:38]) >- y=float(line[38:46]) >- z=float(line[46:54]) >+ x=try_float(line[30:38]) >+ y=try_float(line[38:46]) >+ z=try_float(line[46:54]) > coord=numpy.array((x, y, z), 'f') > # occupancy & B factor >- occupancy=float(line[54:60]) >- bfactor=float(line[60:66]) >+ occupancy=try_float(line[54:60], default=0.0) >+ bfactor=try_float(line[60:66], default=0.0) > segid=line[72:76] > if current_segid!=segid: > current_segid=segid >@@ -183,7 +197,7 @@ > except PDBConstructionException, message: > self._handle_PDB_exception(message, global_line_counter) > elif(record_type=='ANISOU'): >- 
anisou=map(float, (line[28:35], line[35:42], line[43:49], line[49:56], line[56:63], line[63:70])) >+ anisou=map(try_float, (line[28:35], line[35:42], line[43:49], line[49:56], line[56:63], line[63:70])) > # U's are scaled by 10^4 > anisou_array=(numpy.array(anisou, 'f')/10000.0).astype('f') > structure_builder.set_anisou(anisou_array) >@@ -203,13 +217,13 @@ > current_residue_id=None > elif(record_type=='SIGUIJ'): > # standard deviation of anisotropic B factor >- siguij=map(float, (line[28:35], line[35:42], line[42:49], line[49:56], line[56:63], line[63:70])) >+ siguij=map(try_float, (line[28:35], line[35:42], line[42:49], line[49:56], line[56:63], line[63:70])) > # U sigma's are scaled by 10^4 > siguij_array=(numpy.array(siguij, 'f')/10000.0).astype('f') > structure_builder.set_siguij(siguij_array) > elif(record_type=='SIGATM'): > # standard deviation of atomic positions >- sigatm=map(float, (line[30:38], line[38:45], line[46:54], line[54:60], line[60:66])) >+ sigatm=map(try_float, (line[30:38], line[38:45], line[46:54], line[54:60], line[60:66])) > sigatm_array=numpy.array(sigatm, 'f') > structure_builder.set_sigatm(sigatm_array) > local_line_counter=local_line_counter+1 > -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From eric.talevich at gmail.com Sun Feb 8 01:20:12 2009 From: eric.talevich at gmail.com (Eric Talevich) Date: Sun, 8 Feb 2009 01:20:12 -0500 Subject: [Biopython-dev] SVN migration and Launchpad mirroring In-Reply-To: <320fb6e00902070455h72c7bd31w506f5ed52e9633bc@mail.gmail.com> References: <3f6baf360902061211o4da786b0q5f788efcc63e2bb1@mail.gmail.com> <320fb6e00902070455h72c7bd31w506f5ed52e9633bc@mail.gmail.com> Message-ID: <3f6baf360902072220j5c565449i4c7266046051207f@mail.gmail.com> On Sat, Feb 7, 2009 at 7:55 AM, Peter wrote: > > In terms of other tools, several people here are interested in git, > and git and SVN can be made to work together. Hopefully getting > Biopython from CVS to SVN will make things easier for them. > Good to know. I can get behind git, too -- I see BioRuby is already on GitHub, and so are a couple of (partial/modified) branches of Biopython. It looks like git-cvs and git-cvsimport are reasonably complete, or at least enough that mirroring the existing CVS trunk on GitHub would be feasible already. Has this also been discussed before? I'd like to try it sometime, if no one objects. -Eric From dalloliogm at gmail.com Sun Feb 8 11:47:48 2009 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Sun, 8 Feb 2009 17:47:48 +0100 Subject: [Biopython-dev] SVN migration and Launchpad mirroring In-Reply-To: <3f6baf360902072220j5c565449i4c7266046051207f@mail.gmail.com> References: <3f6baf360902061211o4da786b0q5f788efcc63e2bb1@mail.gmail.com> <320fb6e00902070455h72c7bd31w506f5ed52e9633bc@mail.gmail.com> <3f6baf360902072220j5c565449i4c7266046051207f@mail.gmail.com> Message-ID: <5aa3b3570902080847p1a126664k4a76b7f19a0ed987@mail.gmail.com> On Sun, Feb 8, 2009 at 7:20 AM, Eric Talevich wrote: > On Sat, Feb 7, 2009 at 7:55 AM, Peter wrote: > >> >> In terms of other tools, several people here are interested in git, >> and git and SVN can be made to work together. Hopefully getting >> Biopython from CVS to SVN will make things easier for them. 
>> > > Good to know. I can get behind git, too -- I see BioRuby is already on > GitHub, and so are a couple of (partial/modified) branches of Biopython I like github and I think its web interface is one of the best to work with git: it has some tools that I didn't see in the other hosting services supporting git (trac, gitorious), especially those for creating forks. The problem is that the basic account on github is limited to 100 MB, and with the peculiar approach adopted by git (distributed source control) anyone wishing to participate code to biopython should have to create an account on github and in theory create a copy of the repository in his space. Moreover, I think it would be more difficult to use git without the tools offered by github, even if we configure a git repository with trac or similar on the openbio's servers. I don't know if the git-trac plugins has a feature to show all the forks like the one in github. Maybe I am just wrong.. but you should ask to the bioruby people how they are comfortable with these issues, since they are more expert. > > It looks like git-cvs and git-cvsimport are reasonably complete, or at least > enough that mirroring the existing CVS trunk on GitHub would be feasible > already. Has this also been discussed before? I'd like to try it sometime, > if no one objects. 
> > -Eric > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > -- My blog on bioinformatics (now in English): http://bioinfoblog.it From bugzilla-daemon at portal.open-bio.org Sun Feb 8 13:30:14 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 8 Feb 2009 13:30:14 -0500 Subject: [Biopython-dev] [Bug 2752] New: Context management for Bio.Entrez handles Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2752 Summary: Context management for Bio.Entrez handles Product: Biopython Version: 1.49 Platform: PC OS/Version: Linux Status: NEW Severity: enhancement Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: eric.talevich at gmail.com I'd like the following code to work:

    def write_gbk(gi):
        with open("gi%s.gbk" % gi, 'w+') as outfile:
            with Entrez.efetch(db='protein', rettype='genbank', id=gi) as gbk:
                text = gbk.read()
            outfile.write(text)
        print "Wrote", gi

Since Python 2.5 it's been possible to use the "with" statement to ensure handles are closed properly even if an exception occurs (PEP 343). There's also a decorator, @contextlib.contextmanager, to make this feature easy to support, but in general it works by adding the __enter__ and __exit__ methods to a class. To make Bio.Entrez work this way, we could just add @contextmanager decorators to efetch() and the others, but that would break 2.3 & 2.4 compatibility, so it's probably best to make a factory class that returns handles on instantiation, and includes __enter__ and __exit__ methods. The e* functions would become trivial classes that derive from the factory; this would also make it possible to remove the redundant code around the deprecated "cgi=None" argument.
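The `__enter__`/`__exit__` approach described in this report can be sketched roughly as follows. This is only an illustration under stated assumptions: `ClosingHandle` and `fake_efetch` are hypothetical names, the handle is an in-memory `io.StringIO` rather than a real network handle, and nothing here is the actual Bio.Entrez implementation.

```python
import io


class ClosingHandle(object):
    """Hypothetical wrapper that adds with-statement support to a handle."""

    def __init__(self, handle):
        self._handle = handle

    def __getattr__(self, name):
        # Delegate read(), readline(), close(), closed, ... to the wrapped handle.
        return getattr(self._handle, name)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Always close the handle; returning False lets exceptions propagate.
        self._handle.close()
        return False


def fake_efetch(**params):
    """Stand-in for an Entrez-style function (assumption: no network access)."""
    return ClosingHandle(io.StringIO(u"LOCUS       DUMMY\n//\n"))


with fake_efetch(db="protein", rettype="genbank", id="12345") as gbk:
    text = gbk.read()
```

A class with explicit `__enter__` and `__exit__` methods avoids any dependency on decorators. Callers who cannot wait for library support can get a similar effect today by wrapping an existing handle in the standard library's `contextlib.closing()`.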
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bartek at rezolwenta.eu.org Sun Feb 8 14:03:31 2009 From: bartek at rezolwenta.eu.org (Bartek Wilczynski) Date: Sun, 8 Feb 2009 20:03:31 +0100 Subject: [Biopython-dev] SVN migration and Launchpad mirroring In-Reply-To: <5aa3b3570902080847p1a126664k4a76b7f19a0ed987@mail.gmail.com> References: <3f6baf360902061211o4da786b0q5f788efcc63e2bb1@mail.gmail.com> <320fb6e00902070455h72c7bd31w506f5ed52e9633bc@mail.gmail.com> <3f6baf360902072220j5c565449i4c7266046051207f@mail.gmail.com> <5aa3b3570902080847p1a126664k4a76b7f19a0ed987@mail.gmail.com> Message-ID: <8b34ec180902081103r1befae9bt33e9024bd43f37fb@mail.gmail.com> On Sun, Feb 8, 2009 at 5:47 PM, Giovanni Marco Dall'Olio wrote: > I like github and I think its web interface is one of the best to work > with git: it has some tools that I didn't see in the other hosting > services supporting git (trac, gitorious), especially those for > creating forks. > > The problem is that the basic account on github is limited to 100 MB, > and with the peculiar approach adopted by git (distributed source > control) anyone wishing to participate code to biopython should have > to create an account on github and in theory create a copy of the > repository in his space. > > Moreover, I think it would be more difficult to use git without the > tools offered by github, even if we configure a git repository with > trac or similar on the openbio's servers. I don't know if the git-trac > plugins has a feature to show all the forks like the one in github. > Maybe I am just wrong.. but you should ask to the bioruby people how > they are comfortable with these issues, since they are more expert. > > Have you tried to use bazaar+launchpad? It's really easy and should do all the tricks you need from a distributed vcs. 
It also has features for bugtracking (like trac on github) but i dont' know if we are unhappy with current setup (bugzilla). I think bzr+launchpad has a number of advantages over git+github: -> can work with CVS as a master repository which means that the transition would not require going through SVN (although if it would help people from OBF it is also possible). -> Anyone used to cvs commands (commit, diff, update etc..) can use bzr without trouble. You only need to know new "distributed" commands (push,branch) -> it supports centralized decisions on merging: the possible scenario is that only a limited number of people can merge to the main repository (push in bzr terminology) cheers Bartek From chris.lasher at gmail.com Sun Feb 8 14:34:08 2009 From: chris.lasher at gmail.com (Chris Lasher) Date: Sun, 8 Feb 2009 14:34:08 -0500 Subject: [Biopython-dev] SVN migration and Launchpad mirroring In-Reply-To: <8b34ec180902081103r1befae9bt33e9024bd43f37fb@mail.gmail.com> References: <3f6baf360902061211o4da786b0q5f788efcc63e2bb1@mail.gmail.com> <320fb6e00902070455h72c7bd31w506f5ed52e9633bc@mail.gmail.com> <3f6baf360902072220j5c565449i4c7266046051207f@mail.gmail.com> <5aa3b3570902080847p1a126664k4a76b7f19a0ed987@mail.gmail.com> <8b34ec180902081103r1befae9bt33e9024bd43f37fb@mail.gmail.com> Message-ID: <128a885f0902081134m255ec4eao21c75aaf08f9d8f5@mail.gmail.com> On Sun, Feb 8, 2009 at 2:03 PM, Bartek Wilczynski wrote: > On Sun, Feb 8, 2009 at 5:47 PM, Giovanni Marco Dall'Olio > wrote: > >> I like github and I think its web interface is one of the best to work >> with git: it has some tools that I didn't see in the other hosting >> services supporting git (trac, gitorious), especially those for >> creating forks. 
>> >> The problem is that the basic account on github is limited to 100 MB, >> and with the peculiar approach adopted by git (distributed source >> control) anyone wishing to participate code to biopython should have >> to create an account on github and in theory create a copy of the >> repository in his space. >> >> Moreover, I think it would be more difficult to use git without the >> tools offered by github, even if we configure a git repository with >> trac or similar on the openbio's servers. I don't know if the git-trac >> plugins has a feature to show all the forks like the one in github. >> Maybe I am just wrong.. but you should ask to the bioruby people how >> they are comfortable with these issues, since they are more expert. >> >> > Have you tried to use bazaar+launchpad? It's really easy and should do > all the tricks you need from a distributed vcs. It also has features for > bugtracking (like trac on github) but i dont' know if we are unhappy with > current setup (bugzilla). I think bzr+launchpad has a number of advantages > over git+github: > -> can work with CVS as a master repository which means that the > transition would > not require going through SVN (although if it would help people from > OBF it is also possible). > -> Anyone used to cvs commands (commit, diff, update etc..) can use bzr without > trouble. You only need to know new "distributed" commands (push,branch) > -> it supports centralized decisions on merging: the possible scenario > is that only a > limited number of people can merge to the main repository (push in bzr > terminology) This is a good discussion. The longer BioPython has taken to move to SVN and the more I've worked with distributed revision control systems, the more inclined I am to say that moving from CVS to SVN is a waste of time. The advantages of DSCMs and the tools that have emerged around them (GitHub, Launchpad, Bitbucket, etc.) 
are too great to ignore; at some point in BioPython's path, it will move over to one of these tools. So why not skip to the current generation of SCM? I'm most a fan of Bazaar VCS, especially given its great integration with Launchpad. If BioPython were to move to hosting its bugs on Launchpad (I believe importing from Bugzilla is possible), I think the benefit becomes significantly greater, due to the great ability to automatically associate branches/commits with bugs. If BioPython chooses to stick with Bugzilla, that feature wouldn't be as useful. (I think the same could be said for using the GitHub + Lighthouse combination.) On that note, I do recommend making sure that the BioPython project moves the code to one of these "social coding" sites (e.g., GitHub, Launchpad, Bitbucket). They bring the "who's working on what" that's necessary for tracking the project as a whole. Finally, none of this is really technically challenging, just socially challenging: we have to find a consensus and then actually follow through and make the move. It's 2009; we need to say goodbye to CVS, acknowledge that we missed our time with SVN, and just go straight to a DSCM and a modern code tracking site. Best, Chris L. 
From eric.talevich at gmail.com Sun Feb 8 14:57:32 2009 From: eric.talevich at gmail.com (Eric Talevich) Date: Sun, 8 Feb 2009 14:57:32 -0500 Subject: [Biopython-dev] SVN migration and Launchpad mirroring In-Reply-To: <128a885f0902081134m255ec4eao21c75aaf08f9d8f5@mail.gmail.com> References: <3f6baf360902061211o4da786b0q5f788efcc63e2bb1@mail.gmail.com> <320fb6e00902070455h72c7bd31w506f5ed52e9633bc@mail.gmail.com> <3f6baf360902072220j5c565449i4c7266046051207f@mail.gmail.com> <5aa3b3570902080847p1a126664k4a76b7f19a0ed987@mail.gmail.com> <8b34ec180902081103r1befae9bt33e9024bd43f37fb@mail.gmail.com> <128a885f0902081134m255ec4eao21c75aaf08f9d8f5@mail.gmail.com> Message-ID: <3f6baf360902081157i733de1cfh9eb10a9acd809be7@mail.gmail.com> A couple more notes on Launchpad: - Checking out from the master branch does not require signing up for a Launchpad account. Using Launchpad's bug tracker, etc. does, but that's optional and expected. - The PPA feature really is cool, at least using it from Ubuntu. The python-biopython package is included in the main distribution, but Biopython releases happen more frequently than every 6 months, so that package gets out of date. With the PPA, interested users can track new releases in the package manager without downloading a fresh copy or checking out the development version with cvs/svn/bzr. 
Cheers, Eric From chris.lasher at gmail.com Sun Feb 8 15:11:10 2009 From: chris.lasher at gmail.com (Chris Lasher) Date: Sun, 8 Feb 2009 15:11:10 -0500 Subject: [Biopython-dev] SVN migration and Launchpad mirroring In-Reply-To: <3f6baf360902081157i733de1cfh9eb10a9acd809be7@mail.gmail.com> References: <3f6baf360902061211o4da786b0q5f788efcc63e2bb1@mail.gmail.com> <320fb6e00902070455h72c7bd31w506f5ed52e9633bc@mail.gmail.com> <3f6baf360902072220j5c565449i4c7266046051207f@mail.gmail.com> <5aa3b3570902080847p1a126664k4a76b7f19a0ed987@mail.gmail.com> <8b34ec180902081103r1befae9bt33e9024bd43f37fb@mail.gmail.com> <128a885f0902081134m255ec4eao21c75aaf08f9d8f5@mail.gmail.com> <3f6baf360902081157i733de1cfh9eb10a9acd809be7@mail.gmail.com> Message-ID: <128a885f0902081211o5db2e00esdc9aa9055412872f@mail.gmail.com> On Sun, Feb 8, 2009 at 2:57 PM, Eric Talevich wrote: > A couple more notes on Launchpad: > > - Checking out from the master branch does not require signing up for a > Launchpad account. Using Launchpad's bug tracker, etc. does, but that's > optional and expected. > > - The PPA feature really is cool, at least using it from Ubuntu. The > python-biopython package is included in the main distribution, but Biopython > releases happen more frequently than every 6 months, so that package gets > out of date. With the PPA, interested users can track new releases in the > package manager without downloading a fresh copy or checking out the > development version with cvs/svn/bzr. - Launchpad can host, but does not require hosting, the repositories for a project on its servers. It will mirror existing repositories hosted at another location, or simply provide the address for the repositories of branches for a project. In essence, it's happy to just track the presence of branches hosted outside of its own service--a major plus. I just went picking through GitHub and can't find a similar feature. Someone more familiar with GitHub might know a way, though. 
Chris From bsouthey at gmail.com Mon Feb 9 10:02:00 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Mon, 09 Feb 2009 09:02:00 -0600 Subject: [Biopython-dev] SVN migration and Launchpad mirroring In-Reply-To: <3f6baf360902081157i733de1cfh9eb10a9acd809be7@mail.gmail.com> References: <3f6baf360902061211o4da786b0q5f788efcc63e2bb1@mail.gmail.com> <320fb6e00902070455h72c7bd31w506f5ed52e9633bc@mail.gmail.com> <3f6baf360902072220j5c565449i4c7266046051207f@mail.gmail.com> <5aa3b3570902080847p1a126664k4a76b7f19a0ed987@mail.gmail.com> <8b34ec180902081103r1befae9bt33e9024bd43f37fb@mail.gmail.com> <128a885f0902081134m255ec4eao21c75aaf08f9d8f5@mail.gmail.com> <3f6baf360902081157i733de1cfh9eb10a9acd809be7@mail.gmail.com> Message-ID: <49904568.1040400@gmail.com> Eric Talevich wrote: > A couple more notes on Launchpad: > > - Checking out from the master branch does not require signing up for a > Launchpad account. Using Launchpad's bug tracker, etc. does, but that's > optional and expected. > What is a good project using Launchpad? Ignoring the arguments about it's openness, I have found it to be too slow and difficult to navigate to be useful. Sure the latter is experience since I use command line to update numpy and Biopython. > - The PPA feature really is cool, at least using it from Ubuntu. The > python-biopython package is included in the main distribution, but Biopython > releases happen more frequently than every 6 months, so that package gets > out of date. With the PPA, interested users can track new releases in the > package manager without downloading a fresh copy or checking out the > development version with cvs/svn/bzr. > I find the idea of PPA was complete waste of effort and time! Why? Simply because we are not the distribution maintainers for numpy and Biopython. It would be far better to work with the package maintainers to ensure these are up to date as well as any bugs that may get reported or fixed by them. 
While I do not use it, I am not sure how relevant that is to just using EasyInstall to provide the latest snapshots which try to avoid any distro or platform requirements. My couple of cents, Bruce From bsouthey at gmail.com Mon Feb 9 11:04:09 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Mon, 09 Feb 2009 10:04:09 -0600 Subject: [Biopython-dev] SVN migration and Launchpad mirroring In-Reply-To: <128a885f0902081134m255ec4eao21c75aaf08f9d8f5@mail.gmail.com> References: <3f6baf360902061211o4da786b0q5f788efcc63e2bb1@mail.gmail.com> <320fb6e00902070455h72c7bd31w506f5ed52e9633bc@mail.gmail.com> <3f6baf360902072220j5c565449i4c7266046051207f@mail.gmail.com> <5aa3b3570902080847p1a126664k4a76b7f19a0ed987@mail.gmail.com> <8b34ec180902081103r1befae9bt33e9024bd43f37fb@mail.gmail.com> <128a885f0902081134m255ec4eao21c75aaf08f9d8f5@mail.gmail.com> Message-ID: <499053F9.60709@gmail.com> Chris Lasher wrote: > On Sun, Feb 8, 2009 at 2:03 PM, Bartek Wilczynski > wrote: > >> On Sun, Feb 8, 2009 at 5:47 PM, Giovanni Marco Dall'Olio >> wrote: >> >> >>> I like github and I think its web interface is one of the best to work >>> with git: it has some tools that I didn't see in the other hosting >>> services supporting git (trac, gitorious), especially those for >>> creating forks. >>> >>> The problem is that the basic account on github is limited to 100 MB, >>> and with the peculiar approach adopted by git (distributed source >>> control) anyone wishing to participate code to biopython should have >>> to create an account on github and in theory create a copy of the >>> repository in his space. >>> >>> Moreover, I think it would be more difficult to use git without the >>> tools offered by github, even if we configure a git repository with >>> trac or similar on the openbio's servers. I don't know if the git-trac >>> plugins has a feature to show all the forks like the one in github. >>> Maybe I am just wrong.. 
but you should ask to the bioruby people how >>> they are comfortable with these issues, since they are more expert. >>> >>> >>> >> Have you tried to use bazaar+launchpad? It's really easy and should do >> all the tricks you need from a distributed vcs. It also has features for >> bugtracking (like trac on github) but i dont' know if we are unhappy with >> current setup (bugzilla). I think bzr+launchpad has a number of advantages >> over git+github: >> -> can work with CVS as a master repository which means that the >> transition would >> not require going through SVN (although if it would help people from >> OBF it is also possible). >> -> Anyone used to cvs commands (commit, diff, update etc..) can use bzr without >> trouble. You only need to know new "distributed" commands (push,branch) >> -> it supports centralized decisions on merging: the possible scenario >> is that only a >> limited number of people can merge to the main repository (push in bzr >> terminology) >> > > This is a good discussion. The longer BioPython has taken to move to > SVN and the more I've worked with distributed revision control > systems, the more inclined I am to say that moving from CVS to SVN is > a waste of time. The advantages of DSCMs and the tools that have > emerged around them (GitHub, Launchpad, Bitbucket, etc.) are too great > to ignore; at some point in BioPython's path, it will move over to one > of these tools. So why not skip to the current generation of SCM? > Do you control your own project with multiple developers? If so, how do you ensure which is the standard version and address conflicts? While I understand the advantages of distributed option, I do not see the end result any different between a distributed and a non-distributed version control system. Even in Linux, the only 'tree' that counts is Linus's as he provides the official versions of the kernel. 
I would argue that same applies to Biopython especially as there appears to be single developers providing their own material to the single tree rather than multiple developers working together. Part of that is legacy in that the core bioinformatics in Biopython is rather complete. > I'm most a fan of Bazaar VCS, especially given its great integration > with Launchpad. If BioPython were to move to hosting its bugs on > Launchpad (I believe importing from Bugzilla is possible), I think the > benefit becomes significantly greater, due to the great ability to > automatically associate branches/commits with bugs. I don't find automatic association between fixes and bugs a reason to change. In numpy's Trac system you can see which version where the bug was closed. > If BioPython > chooses to stick with Bugzilla, that feature wouldn't be as useful. (I > think the same could be said for using the GitHub + Lighthouse > combination.) > > On that note, I do recommend making sure that the BioPython project > moves the code to one of these "social coding" sites (e.g., GitHub, > Launchpad, Bitbucket). They bring the "who's working on what" that's > necessary for tracking the project as a whole. > > Finally, none of this is really technically challenging, just socially > challenging: we have to find a consensus and then actually follow > through and make the move. It's 2009; we need to say goodbye to CVS, > acknowledge that we missed our time with SVN, and just go straight to > a DSCM and a modern code tracking site. > > I think that central question that is lacking so far is how will any of these approaches work with what Biopython is, how Biopython operates and what Biopython provides? It is very easy to argue in general terms on how one system is better than another - lots of web pages on that. But that does not address the needs of the project as a whole. At present, you and others have not specifically addressed how Biopython would benefit from this. 
How do you maintain a stable tree that always should be correct and addresses conflicts (like different coding style and semantics :-) )? As with Linux, people do not scale, so the one of the main goals of any system is that it should minimize effort of maintaining and producing the stable release. How does a user get the 'latest' version if they have bug? How do you even know what version that actually have? How do they avoid picking up other changes that a developer has made in addition to that bug fix? (Not that any system is immune, like developers adding unsupported dependencies or undefined variables as in recent cases in numpy). I also favor the centralized system because I am not a Biopython developer but a tester. So getting the current version is essential to do that and I do not want to have to pull other people's code in to do that especially if it brings in new code not related to a fix. Nor do I think that an extended period of pre-release testing is suitable for Biopython. 
Just some thoughts, Bruce From bartek at rezolwenta.eu.org Mon Feb 9 11:08:17 2009 From: bartek at rezolwenta.eu.org (Bartek Wilczynski) Date: Mon, 9 Feb 2009 17:08:17 +0100 Subject: [Biopython-dev] SVN migration and Launchpad mirroring In-Reply-To: <8b34ec180902090807j46586568k5300a3565516d4bc@mail.gmail.com> References: <3f6baf360902061211o4da786b0q5f788efcc63e2bb1@mail.gmail.com> <320fb6e00902070455h72c7bd31w506f5ed52e9633bc@mail.gmail.com> <3f6baf360902072220j5c565449i4c7266046051207f@mail.gmail.com> <5aa3b3570902080847p1a126664k4a76b7f19a0ed987@mail.gmail.com> <8b34ec180902081103r1befae9bt33e9024bd43f37fb@mail.gmail.com> <128a885f0902081134m255ec4eao21c75aaf08f9d8f5@mail.gmail.com> <3f6baf360902081157i733de1cfh9eb10a9acd809be7@mail.gmail.com> <49904568.1040400@gmail.com> <8b34ec180902090807j46586568k5300a3565516d4bc@mail.gmail.com> Message-ID: <8b34ec180902090808i7968a054nc77fb7190dac50f1@mail.gmail.com> Hi, On Mon, Feb 9, 2009 at 4:02 PM, Bruce Southey wrote: > What is a good project using Launchpad? Probably the biggest one would be MySQL (https://launchpad.net/mysql-server). You may also look at http://bazaar-vcs.org/WhoUsesBzr > Ignoring the arguments about it's openness, I have found it to be too slow > and difficult to navigate to be useful. Sure the latter is experience since > I use command line to update numpy and Biopython. I don't exactly know what you have done, so it's hard to say what is at fault here. There are two separate pieces of software at work here: -bzr, the proper dvcs, which is a command-line tool -launchpad, the website where you can host your code branches and projects I find bzr faster than cvs (although it is considerably slower than git), and I don't find launchpad slow, but as usual with websites, YMMV. > > I find the idea of PPA was complete waste of effort and time! Why? Simply > because we are not the distribution maintainers for numpy and Biopython.
It > would be far better to work with the package maintainers to ensure these are > up to date as well as any bugs that may get reported or fixed by them. While > I do not use it, I am not sure how relevant that is to just using > EasyInstall to provide the latest snapshots which try to avoid any distro or > platform requirements. I don't think that PPA is an important thing for biopython. It might be a nice addition for those who use ubuntu (I do) and there is not much effort required, once there is a current bzr branch of biopython available in launchpad. However it doesn't need to be an official one, so It's not a major issue now. -- Bartek Wilczynski ================== Postdoctoral fellow EMBL, Furlong group Meyerhoffstrasse 1, 69012 Heidelberg, Germany tel: +49 6221 387 8433 From bartek at rezolwenta.eu.org Mon Feb 9 11:24:59 2009 From: bartek at rezolwenta.eu.org (Bartek Wilczynski) Date: Mon, 9 Feb 2009 17:24:59 +0100 Subject: [Biopython-dev] SVN migration and Launchpad mirroring In-Reply-To: <8b34ec180902090824t1acacbd7lf0377202c03ee6bf@mail.gmail.com> References: <3f6baf360902061211o4da786b0q5f788efcc63e2bb1@mail.gmail.com> <320fb6e00902070455h72c7bd31w506f5ed52e9633bc@mail.gmail.com> <3f6baf360902072220j5c565449i4c7266046051207f@mail.gmail.com> <5aa3b3570902080847p1a126664k4a76b7f19a0ed987@mail.gmail.com> <8b34ec180902081103r1befae9bt33e9024bd43f37fb@mail.gmail.com> <128a885f0902081134m255ec4eao21c75aaf08f9d8f5@mail.gmail.com> <499053F9.60709@gmail.com> <8b34ec180902090824t1acacbd7lf0377202c03ee6bf@mail.gmail.com> Message-ID: <8b34ec180902090824k20988294hbff5b9c0525c486e@mail.gmail.com> Hi, On Mon, Feb 9, 2009 at 5:04 PM, Bruce Southey wrote: >> >> This is a good discussion. The longer BioPython has taken to move to >> SVN and the more I've worked with distributed revision control >> systems, the more inclined I am to say that moving from CVS to SVN is >> a waste of time. 
The advantages of DSCMs and the tools that have >> emerged around them (GitHub, Launchpad, Bitbucket, etc.) are too great >> to ignore; at some point in BioPython's path, it will move over to one >> of these tools. So why not skip to the current generation of SCM? >> > > Do you control your own project with multiple developers? > If so, how do you ensure which is the standard version and address > conflicts? > > While I understand the advantages of distributed option, I do not see the > end result any different between a distributed and a non-distributed version > control system. Even in Linux, the only 'tree' that counts is Linus's as he > provides the official versions of the kernel. I would argue that same > applies to Biopython especially as there appears to be single developers > providing their own material to the single tree rather than multiple > developers working together. Part of that is legacy in that the core > bioinformatics in Biopython is rather complete. > That's the point. Linux is a perfect example of how a large project can benefit from using a distributed vcs. The official branch is the one which is linked from the biopython.org website. But anyone can _easily_ branch it on his/her own, make changes to it and submit it for merging with the trunk, or just publish it so people can use his branch. >> I'm most a fan of Bazaar VCS, especially given its great integration >> with Launchpad. If BioPython were to move to hosting its bugs on >> Launchpad (I believe importing from Bugzilla is possible), I think the >> benefit becomes significantly greater, due to the great ability to >> automatically associate branches/commits with bugs. > > I don't find automatic association between fixes and bugs a reason to > change. In numpy's Trac system you can see which version where the bug was > closed. I think that using launchpad for bugtracking is a separate issue. There are different options here.
The good thing about launchpad+bzr is that it allows this, so it won't be a problem if we decide to switch from bugzilla to somehing else. But it is a separate decision. >> Finally, none of this is really technically challenging, just socially >> challenging: we have to find a consensus and then actually follow >> through and make the move. It's 2009; we need to say goodbye to CVS, >> acknowledge that we missed our time with SVN, and just go straight to >> a DSCM and a modern code tracking site. >> >> > > I think that central question that is lacking so far is how will any of > these approaches work with what Biopython is, how Biopython operates and > what Biopython provides? > It is very easy to argue in general terms on how one system is better than > another - lots of web pages on that. But that does not address the needs of > the project as a whole. At present, you and others have not specifically > addressed how Biopython would benefit from this. > > How do you maintain a stable tree that always should be correct and > addresses conflicts (like different coding style and semantics :-) )? > As with Linux, people do not scale, so the one of the main goals of any > system is that it should minimize effort of maintaining and producing the > stable release. > > How does a user get the 'latest' version if they have bug? How do you even > know what version that actually have? > How do they avoid picking up other changes that a developer has made in > addition to that bug fix? (Not that any system is immune, like developers > adding unsupported dependencies or undefined variables as in recent cases in > numpy). > > I also favor the centralized system because I am not a Biopython developer > but a tester. So getting the current version is essential to do that and I > do not want to have to pull other people's code in to do that especially if > it brings in new code not related to a fix. 
Nor do I think that an extended > period of pre-release testing is suitable for Biopython. > Absolutely right. I think there is a misconception about the "distributed" part of git or bzr. I don't think anybody was proposing some guerilla style development with no official releases and code-base. Using dvcs is for enabling people to contribute effectively rather than for making centralized development easier. The key thing here is that bzr/launchpad (or git+github, but I'll stick to what I know for the sake of this example) does not _need_ to be the main repository for biopython. I think that possible advantages are not so much in using it internally, but making it easier for people to branch and merge. Having an "official" bzr branch of biopython which is automatically updated from current main vcs (currently CVS) makes branching as easy as writing:

    bzr branch lp:biopython

After someone has made a number of changes (and commits to his local vcs) and is happy with the result you just do

    bzr send lp:biopython

and the maintainer of the branch gets notified about a submission of a patch. Then he can decide to merge it into trunk (without losing any change history) or refuse. Once the changes are merged into the official bzr branch it's easy to commit them back to CVS. After a while, if people are happy with using bzr instead of cvs, we could switch to bzr to avoid synchronizing with CVS, but this is not necessary. It's all about making it easier for people to get involved. Currently the only possibility to participate is to send patches through bugzilla or mailing list but merging this into a cvs is a nightmare.
While in bzr (or git) you can develop "on a branch" locally, without disturbing anyone, and then merge with trunk without loosing your development history (virtually impossible in cvs or svn) -- Bartek Wilczynski ================== Postdoctoral fellow EMBL, Furlong group Meyerhoffstrasse 1, 69012 Heidelberg, Germany tel: +49 6221 387 8433 From biopython at maubp.freeserve.co.uk Mon Feb 9 11:29:37 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 9 Feb 2009 16:29:37 +0000 Subject: [Biopython-dev] SVN migration and Launchpad mirroring In-Reply-To: <499053F9.60709@gmail.com> References: <3f6baf360902061211o4da786b0q5f788efcc63e2bb1@mail.gmail.com> <320fb6e00902070455h72c7bd31w506f5ed52e9633bc@mail.gmail.com> <3f6baf360902072220j5c565449i4c7266046051207f@mail.gmail.com> <5aa3b3570902080847p1a126664k4a76b7f19a0ed987@mail.gmail.com> <8b34ec180902081103r1befae9bt33e9024bd43f37fb@mail.gmail.com> <128a885f0902081134m255ec4eao21c75aaf08f9d8f5@mail.gmail.com> <499053F9.60709@gmail.com> Message-ID: <320fb6e00902090829h5f4b02e6xcad41f9b47c9be68@mail.gmail.com> On Mon, Feb 9, 2009 at 4:04 PM, Bruce Southey wrote: > > I also favor the centralized system because I am not a Biopython developer > but a tester. So getting the current version is essential to do that and I > do not want to have to pull other people's code in to do that especially if > it brings in new code not related to a fix. > In case there had been any confusion here, yes, even with a distributed source code system, we do need some centralization. i.e. Even if we do end up with several developers having their own git branches, we would have to have an "official" tree used for the releases and installers published on Biopython.org (and this official tree could potentially be CVS, SVN or git based). Bartek has just written an email saying more or less the same thing. 
Peter From bugzilla-daemon at portal.open-bio.org Mon Feb 9 11:46:10 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 9 Feb 2009 11:46:10 -0500 Subject: [Biopython-dev] [Bug 2751] PDBParser crashes on empty tempFactor fields In-Reply-To: Message-ID: <200902091646.n19GkACh021770@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2751 ------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-09 11:46 EST ------- (In reply to comment #3) > Created an attachment (id=1215) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1215&action=view) [details] > PDB file with some missing bfactor fields > Where did this come from? The official 1AD5 file from the PDB has valid bfactor fields present: http://www.rcsb.org/pdb/download/downloadFile.do?fileFormat=pdb&compression=NO&structureId=1AD5 -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From dalloliogm at gmail.com Mon Feb 9 11:59:36 2009 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Mon, 9 Feb 2009 17:59:36 +0100 Subject: [Biopython-dev] SVN migration and Launchpad mirroring In-Reply-To: <499053F9.60709@gmail.com> References: <3f6baf360902061211o4da786b0q5f788efcc63e2bb1@mail.gmail.com> <320fb6e00902070455h72c7bd31w506f5ed52e9633bc@mail.gmail.com> <3f6baf360902072220j5c565449i4c7266046051207f@mail.gmail.com> <5aa3b3570902080847p1a126664k4a76b7f19a0ed987@mail.gmail.com> <8b34ec180902081103r1befae9bt33e9024bd43f37fb@mail.gmail.com> <128a885f0902081134m255ec4eao21c75aaf08f9d8f5@mail.gmail.com> <499053F9.60709@gmail.com> Message-ID: <5aa3b3570902090859q5ea82e3au87d94a708e3b2b74@mail.gmail.com> On Mon, Feb 9, 2009 at 5:04 PM, Bruce Southey wrote: > Chris Lasher wrote: >> > > Do you control your own project with multiple developers? 
> If so, how do you ensure which is the standard version and address > conflicts? > > While I understand the advantages of distributed option, I do not see the > end result any different between a distributed and a non-distributed version > control system. Let's say I want to develop a new module to read fasta sequence, alternative to the current one. With a DVCS, I would fork the official biopython branch, and start working on it. While I am changing things and committing everything to my private branch, the official biopython developers keep committing changes, on the official branch. When I will be sure that my SeqIO personalization is ready, I will send a merge request to you, and it will be easy to know: - which was the exact version and code of biopython when I created my branch; - which commits have been made in the official branch while I was working on mine, so it will be easier to determine how to merge them; - moreover, if my changes will be accepted, the whole history of my private branch will be included in biopython (and it could be useful). Imagine how to do the same with a normal VCS. It would be similar: I would create a local copy of biopython on my computer, and start working on that (since I don't have access to the official repository). When my new module will be ready, I will send the changes to the official biopython branch through bugzilla: the problem is that then, we will have lost the information on which was the version of biopython when I created my local copy, and it will be more difficult to merge it. Have a look at this post: - http://github.com/blog/39-say-hello-to-the-network-graph-visualizer > Even in Linux, the only 'tree' that counts is Linus's as he > provides the official versions of the kernel. I would argue that same > applies to Biopython especially as there appears to be single developers > providing their own material to the single tree rather than multiple > developers working together. 
Part of that is legacy in that the core > bioinformatics in Biopython is rather complete. > >> I'm most a fan of Bazaar VCS, especially given its great integration >> with Launchpad. If BioPython were to move to hosting its bugs on >> Launchpad (I believe importing from Bugzilla is possible), I think the >> benefit becomes significantly greater, due to the great ability to >> automatically associate branches/commits with bugs. > > I don't find automatic association between fixes and bugs a reason to > change. In numpy's Trac system you can see which version where the bug was > closed. > >> If BioPython >> chooses to stick with Bugzilla, that feature wouldn't be as useful. (I >> think the same could be said for using the GitHub + Lighthouse >> combination.) >> >> On that note, I do recommend making sure that the BioPython project >> moves the code to one of these "social coding" sites (e.g., GitHub, >> Launchpad, Bitbucket). They bring the "who's working on what" that's >> necessary for tracking the project as a whole. >> >> Finally, none of this is really technically challenging, just socially >> challenging: we have to find a consensus and then actually follow >> through and make the move. It's 2009; we need to say goodbye to CVS, >> acknowledge that we missed our time with SVN, and just go straight to >> a DSCM and a modern code tracking site. >> >> > > I think that central question that is lacking so far is how will any of > these approaches work with what Biopython is, how Biopython operates and > what Biopython provides? > It is very easy to argue in general terms on how one system is better than > another - lots of web pages on that. But that does not address the needs of > the project as a whole. At present, you and others have not specifically > addressed how Biopython would benefit from this. > > How do you maintain a stable tree that always should be correct and > addresses conflicts (like different coding style and semantics :-) )? 
> As with Linux, people do not scale, so the one of the main goals of any > system is that it should minimize effort of maintaining and producing the > stable release. > > How does a user get the 'latest' version if they have bug? How do you even > know what version that actually have? > How do they avoid picking up other changes that a developer has made in > addition to that bug fix? (Not that any system is immune, like developers > adding unsupported dependencies or undefined variables as in recent cases in > numpy). > > I also favor the centralized system because I am not a Biopython developer > but a tester. So getting the current version is essential to do that and I > do not want to have to pull other people's code in to do that especially if > it brings in new code not related to a fix. Nor do I think that an extended > period of pre-release testing is suitable for Biopython. > > Just some thoughts, > Bruce > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > -- My blog on bioinformatics (now in English): http://bioinfoblog.it From bugzilla-daemon at portal.open-bio.org Mon Feb 9 12:20:46 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 9 Feb 2009 12:20:46 -0500 Subject: [Biopython-dev] [Bug 2751] PDBParser crashes on empty tempFactor fields In-Reply-To: Message-ID: <200902091720.n19HKkrQ031145@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2751 ------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-09 12:20 EST ------- Hi Eric, Could you try out Bio/PDB/PDBParser.py CVS revision 1.25 please? This allows missing occupancy and B factor (temp factor) fields in permissive mode, and the exception or printed error message does include the line number. 
I can appreciate that getting these warnings hundreds of times from a single file would be annoying, so perhaps if the fields are just blank a single warning should be given? If you can find any official PDB examples which do this, or get clarification regarding the "legality" of omitting these fields, then I would be happy to change this code. Where are you getting your PDB files from? Note that I have not attempted to deal with ANISOU, SIGUIJ or SIGATM records. Do you have any examples of this, or were these changes just defensive programming? Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Feb 9 14:08:29 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 9 Feb 2009 14:08:29 -0500 Subject: [Biopython-dev] [Bug 2751] PDBParser crashes on empty tempFactor fields In-Reply-To: Message-ID: <200902091908.n19J8TNJ022487@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2751 ------- Comment #8 from eric.talevich at gmail.com 2009-02-09 14:08 EST ------- (In reply to comment #7) Works for me. Thanks! > I can appreciate that getting these warnings hundreds of times from a single > file would be annoying, so perhaps if the fields are just blank a single > warning should be given? I haven't explored the rest of Biopython's internals yet -- is there a general logging/warning system where verbosity is configured globally? Another issue: These warnings are printed to standard out, rather than standard error; that would screw up a pipeline. Tracebacks, for instance, are printed on standard error. I assume this complain->stdout situation is the case across the codebase -- should I file a separate bug for that? 
> If you can find any official PDB examples which do > this, or get clarification regarding the "legality" of omitting these fields, > then I would be happy to change this code. Where are you getting your PDB > files from? Another person in my lab reported the problem. Some other program extracted the 'B' chain from the full PDB file to create this one; I don't know which one, but I believe it's out in the wild, rather than a home-grown script. Scientific Python's PDB parser handles the file without complaint. > Note that I have not attempted to deal with ANISOU, SIGUIJ or SIGATM records. > Do you have any examples of this, or were these changes just defensive > programming? I have no examples, it's just defensive. Using try_float() instead of float() everywhere re-raises any ValueExceptions as PDBConstructionExceptions, and only eats the exception if a default value is supplied. Some Scheme sympathies showing, I guess -- it's a closure that generally works the same as the float constructor, but with our own error handling. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
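As a rough illustration of the try_float() closure described in comment #8 above, here is a minimal sketch. It is not the code attached to the bug: the factory name make_try_float, the sentinel _NO_DEFAULT and the error message are assumptions made for this example, and PDBConstructionException is redefined locally as a stand-in so the snippet runs without Biopython installed.

```python
class PDBConstructionException(Exception):
    """Local stand-in for Bio.PDB's exception, so the sketch is self-contained."""


# Sentinel so that "no default given" can be told apart from default=None.
_NO_DEFAULT = object()


def make_try_float(line_number):
    """Return a float() replacement that knows which PDB line it is parsing.

    Hypothetical helper: the closure captures line_number for error messages.
    """
    def try_float(text, default=_NO_DEFAULT):
        try:
            return float(text)
        except ValueError:
            # Only swallow the error when the caller supplied a fallback value.
            if default is not _NO_DEFAULT:
                return default
            raise PDBConstructionException(
                "Invalid or missing float %r at line %d" % (text, line_number))
    return try_float
```

With this shape, try_float(field) behaves like float(field) for well-formed input, a blank B factor column can be tolerated by passing default=0.0, and anything else surfaces as a parser-level exception that carries the line number.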
From bugzilla-daemon at portal.open-bio.org Mon Feb 9 15:36:26 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 9 Feb 2009 15:36:26 -0500 Subject: [Biopython-dev] [Bug 2751] PDBParser crashes on empty tempFactor fields In-Reply-To: Message-ID: <200902092036.n19KaQPt010719@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2751 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #9 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-09 15:36 EST ------- (In reply to comment #8) > > I haven't explored the rest of Biopython's internals yet -- is > there a general logging/warning system where verbosity is > configured globally? No, this is specific to Bio.PDB, presumably a reaction to the number of technically invalid but still useful PDB files one has to deal with. > Another issue: These warnings are printed to standard out, rather > than standard error; that would screw up a pipeline. Tracebacks, > for instance, are printed on standard error. I assume this > complain->stdout situation is the case across the > codebase -- should I file a separate bug for that? Please do - but keep it focused on Bio.PDB as I can't think of any other modules which do anything similar off hand. > > If you can find any official PDB examples which do this, or get > > clarification regarding the "legality" of omitting these fields, > > then I would be happy to change this code. Where are you > > getting your PDB files from? > > Another person in my lab reported the problem. Some other program > extracted the 'B' chain from the full PDB file to create this one; > I don't know which one, but I believe it's out in the wild, rather > than a home-grown script. Fair enough.
If you can chase that up, it will make other people's lives a tiny bit easier in the future - assuming my reading of the PDB format is valid that is ;) You should be able to work with the full PDB file in Biopython, and just look at the one chain you are interested in. > Scientific Python's PDB parser handles the file without complaint. If I was convinced missing occupancy or B-factors were valid then I agree Biopython shouldn't "complain" either. > > Note that I have not attempted to deal with ANISOU, SIGUIJ or SIGATM > > records. Do you have any examples of this, or were these changes just > > defensive programming? > > I have no examples, it's just defensive. Using try_float() instead of > float() everywhere re-raises any ValueExceptions as > PDBConstructionExceptions, and only eats the exception if a default value > is supplied. Some Scheme sympathies showing, I guess -- it's a closure that > generally works the same as the float constructor, but with our own error > handling. Fair enough. I didn't want to make any "invasive" changes without the original author's input. Anyway, marking this bug as fixed. Thank you Eric! Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
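Two ideas from the exchange above (report a repeated problem only once, and keep diagnostics off standard output) could be combined roughly as follows. This is only a sketch under stated assumptions: PermissiveWarner, its method names and the message format are hypothetical and are not part of Bio.PDB.

```python
import sys


class PermissiveWarner(object):
    """Hypothetical helper: report each distinct problem once, on stderr."""

    def __init__(self, stream=None):
        # Default to standard error so that piped standard output stays clean.
        self.stream = sys.stderr if stream is None else stream
        self.seen = set()

    def warn(self, message, line_number):
        """Write the warning unless this message was already reported.

        Returns True if something was written, False if it was suppressed.
        """
        if message in self.seen:
            return False
        self.seen.add(message)
        self.stream.write("WARNING: %s (first seen at line %d)\n"
                          % (message, line_number))
        return True
```

A parser running in permissive mode would create one such object per file and call warn() wherever it currently prints, so a file with hundreds of blank B factor fields produces a single line on standard error.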
From eric.talevich at gmail.com Mon Feb 9 15:39:04 2009 From: eric.talevich at gmail.com (Eric Talevich) Date: Mon, 9 Feb 2009 15:39:04 -0500 Subject: [Biopython-dev] SVN migration and Launchpad mirroring In-Reply-To: <499053F9.60709@gmail.com> References: <3f6baf360902061211o4da786b0q5f788efcc63e2bb1@mail.gmail.com> <320fb6e00902070455h72c7bd31w506f5ed52e9633bc@mail.gmail.com> <3f6baf360902072220j5c565449i4c7266046051207f@mail.gmail.com> <5aa3b3570902080847p1a126664k4a76b7f19a0ed987@mail.gmail.com> <8b34ec180902081103r1befae9bt33e9024bd43f37fb@mail.gmail.com> <128a885f0902081134m255ec4eao21c75aaf08f9d8f5@mail.gmail.com> <499053F9.60709@gmail.com> Message-ID: <3f6baf360902091239v5988749cm1f48c21d2f19ca9b@mail.gmail.com> Mark Shuttleworth blogged about why Ubuntu chose bzr a while back: * "Choose lossless VCS tools if you have that luxury" -- http://www.markshuttleworth.com/archives/125 * "Merging is the key to software developer collaboration" -- http://www.markshuttleworth.com/archives/126 The use case for DVCS is what I assume usually happens when a new parser or other module is added to Biopython -- an outside developer has some sizeable chunk of useful code and needs to integrate it with the trunk. "Code bombs" are something the Linux kernel deals with constantly; I have no idea how they'd deal with it in a centralized system. (Nobody does; they never did use cvs.) My lab uses bzr now. I have it set up to work like a centralized repository in general; I'm the only one who uses the distributed features at the moment, switching between a laptop and a workstation. The merging and renaming support is much better than svn's, and it was easier to set up. It feels kind of crazy to me now to add a significant new change to a project's trunk in one monolithic commit, and I feel the pain of any maintainer who has to apply a patch set to the trunk after the developer's branch and the trunk have diverged.
Regarding other concerns: - For update operations more advanced than just pulling the latest revision from the trunk, in bzr et al., it's possible to cherry-pick specific revisions from other developers. - Similarly, it's possible to only merge completed bug fixes and enhancements to the trunk, skipping any new/unstable work a developer has embarked on in their branch. That's why Linux is now permanently on version 2.6.X -- basically every commit can be made stable, so there's no need for a new unstable series in the trunk. - Testers should enjoy the ability to pull specific changes while ignoring unrelated code, even in the same file -- the distributed systems all have this capability (shelve). - PPAs are less useful for Python packages; they just let you manage everything from apt instead of easy_install. I use a PPA to keep my lab's machines on the same version of bzr despite having different versions of Ubuntu. I don't use easy_install for anything right now because it scares me. Best, Eric From bugzilla-daemon at portal.open-bio.org Mon Feb 9 15:58:42 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 9 Feb 2009 15:58:42 -0500 Subject: [Biopython-dev] [Bug 2754] New: Bio.PDB: Parse warnings should print to stderr, not stdout Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2754 Summary: Bio.PDB: Parse warnings should print to stderr, not stdout Product: Biopython Version: 1.49 Platform: PC OS/Version: Linux Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: eric.talevich at gmail.com In Bio.PDB.PDBParser, and perhaps its neighbors, warnings raised while parsing in permissive mode are printed to standard output. In general, messages like this should be printed to standard error to avoid sending garbage to the next program in a pipeline. 
Recommendation: In PDBParser._handle_PDB_exception, change the print statements to include ">>sys.stderr". Also track down other print statements in the PDB module and send any other warnings to sys.stderr as well. (Grepping Bio/PDB/*.py should work.) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bsouthey at gmail.com Mon Feb 9 16:39:06 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Mon, 09 Feb 2009 15:39:06 -0600 Subject: [Biopython-dev] SVN migration and Launchpad mirroring In-Reply-To: <3f6baf360902091239v5988749cm1f48c21d2f19ca9b@mail.gmail.com> References: <3f6baf360902061211o4da786b0q5f788efcc63e2bb1@mail.gmail.com> <320fb6e00902070455h72c7bd31w506f5ed52e9633bc@mail.gmail.com> <3f6baf360902072220j5c565449i4c7266046051207f@mail.gmail.com> <5aa3b3570902080847p1a126664k4a76b7f19a0ed987@mail.gmail.com> <8b34ec180902081103r1befae9bt33e9024bd43f37fb@mail.gmail.com> <128a885f0902081134m255ec4eao21c75aaf08f9d8f5@mail.gmail.com> <499053F9.60709@gmail.com> <3f6baf360902091239v5988749cm1f48c21d2f19ca9b@mail.gmail.com> Message-ID: <4990A27A.9060500@gmail.com> Eric Talevich wrote: > Mark Shuttleworth blogged about why Ubuntu chose bzr awhile back: > > * "Choose lossless VCS tools if you have that luxury" -- > http://www.markshuttleworth.com/archives/125 > > * "Merging is the key to software developer collaboration" -- > http://www.markshuttleworth.com/archives/126 > (Yeah but I am not a fan of his writing.) Python PEP 0374 "Migrating from svn to a distributed VCS" makes some points. http://www.python.org/dev/peps/pep-0374/ > The use case for DVCS is what I assume usually happens when a new parser or > other module is added to Biopython -- an outside developer has some sizeable > chunk of useful code and needs to integrate it with the trunk. 
"Code bombs" > are something the Linux kernel deals with constantly; I have no idea how > they'd deal with it in a centralized system. (Nobody does; they never did > use cvs.) > They complain rather loudly! Really there is a development process that tries to avoid this especially the 'release early, release often' adage. Hopefully this should not happen with Biopython... > My lab uses bzr now. I have it set up to work like a centralized repository > in general; I'm the only one who uses the distributed features at the > moment, switching between a laptop and a workstation. The merging and > renaming support is much better than svn's, and it was easier to set up. It > feels kind of crazy to me now to add a significant new change to a project's > trunk in one monolithic commit, and I feel the pain of any maintaner who has > to apply a patch set to the trunk after the developer's branch and the trunk > have diverged. > How do you avoid this pain? > > Regarding other concerns: > > - For update operations more advanced than just pulling the latest revision > from the trunk, in bzr et al., it's possible to cherry-pick specific > revisions from other developers. > But that requires some degree of advanced knowledge. How easy is it to revert a revision, especially down the road? > - Similarly, it's possible to only merge completed bug fixes and > enhancements to the trunk, skipping any new/unstable work a developer has > embarked on in their branch. That's why Linux is now permanently on version > 2.6.X -- basically every commit can be made stable, so there's no need for a > new unstable series in the trunk. > I do not agree here because the Linux kernel process doesn't work that way see Corbet's take: http://ldn.linuxfoundation.org/book/how-participate-linux-community You have the very frequent merge windows with a testing period, the new staging tree, the -mm tree, and a solid team of 'lieutenants' that reduce many of the problems. 
In addition, you have the stable tree for major bugs. Would Biopython need to do something similar, like having merge windows and a stable tree? > - Testers should enjoy the ability to pull specific changes while ignoring > unrelated code, even in the same file -- the distributed systems all have > this capability (shelve). > Again, this requires some advanced knowledge. But I am not sure how much time I would want to spend on doing that if it does not lead to something being included in the main branch, compared to code that does enter the main branch. (Yes, it is somewhat selfish.) > - PPAs are less useful for Python packages; they just let you manage > everything from apt instead of easy_install. I use a PPA to keep my lab's > machines on the same version of bzr despite having different versions of > Ubuntu. I don't use easy_install for anything right now because it scares > me. > > I know but I don't use either (nor any rpm-based system for that matter)! > Best, > Eric Bruce From bugzilla-daemon at portal.open-bio.org Mon Feb 9 21:32:24 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 9 Feb 2009 21:32:24 -0500 Subject: [Biopython-dev] [Bug 2752] Context management for Bio.Entrez handles In-Reply-To: Message-ID: <200902100232.n1A2WO4w005726@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2752 ------- Comment #1 from mdehoon at ims.u-tokyo.ac.jp 2009-02-09 21:32 EST ------- Could you write a patch to Bio.Entrez? Also, with the proposed modifications does anything change for current users of Bio.Entrez (i.e., people who don't use the "with" statement)? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Mon Feb 9 23:02:54 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 9 Feb 2009 23:02:54 -0500 Subject: [Biopython-dev] [Bug 2752] Context management for Bio.Entrez handles In-Reply-To: Message-ID: <200902100402.n1A42s0M027862@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2752 eric.talevich at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED ------- Comment #2 from eric.talevich at gmail.com 2009-02-09 23:02 EST ------- I'll take care of it this week. As I'm picturing this, existing users should be unaffected because the new __enter__ and __exit__ methods won't be called. The class-vs-function distinction will be invisible, unless some flagrant isinstance() testing or other metaprogramming is occurring, and I don't know why anyone would do that with these particular functions. This also means some classes will have lowercase names, contradicting the usual style. I hope that's OK; it's for a good cause. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Feb 10 05:12:12 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 10 Feb 2009 05:12:12 -0500 Subject: [Biopython-dev] [Bug 2752] Context management for Bio.Entrez handles In-Reply-To: Message-ID: <200902101012.n1AACC8N026482@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2752 ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-10 05:12 EST ------- (In reply to comment #0) > To make Bio.Entrez work this way, we could just add @contextmanager decorators > to efetch() and the others, ... 
Isn't it simpler just to change our Bio.Entrez._open function instead of all the Bio.Entrez.e* functions? These Bio.Entrez functions are just wrappers for urllib (via our _open function). From reading the example at the end of this page, it looks like closing a urllib handle is left to the user: http://www.python.org/doc/2.5.1/whatsnew/pep-343.html e.g.

    import urllib, sys
    from contextlib import closing

    with closing(urllib.urlopen('http://www.yahoo.com')) as f:
        for line in f:
            sys.stdout.write(line)

In the short term (without altering Biopython) using this should work, shouldn't it?

    from contextlib import closing
    from Bio import Entrez

    def write_gbk(gi):
        with open("gi%s.gbk" % gi, 'w+') as outfile:
            with closing(Entrez.efetch(db='protein', rettype='genbank', id=gi)) as gbk:
                text = gbk.read()
                outfile.write(text)
        print "Wrote", gi

Furthermore, rather than messing about with a factory class (which sounds overly complicated), can we just use contextlib.closing ourselves in the Bio.Entrez._open function? This approach should also make it easy to keep backwards compatibility with older versions of Python. i.e. At the end of _open, replace:

    return uhandle

with:

    try:
        from contextlib import closing
        return closing(uhandle)
    except ImportError:
        return uhandle

(I haven't tested this yet) Alternatively, we could add the __enter__ and __exit__ methods to the Bio.File.UndoHandle object instead (which would benefit any code using them, not just Bio.Entrez). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
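The last alternative in comment #3, giving the handle object its own __enter__ and __exit__ methods, could be sketched as below. One caveat worth keeping in mind: contextlib.closing only provides the context-manager methods and does not forward read() and friends, so code that uses the returned object without a "with" statement would need a wrapper that still behaves like a handle. ClosingHandle is a hypothetical name used for illustration; this is not Bio.File.UndoHandle or any existing Biopython class.

```python
import io


class ClosingHandle(object):
    """Hypothetical wrapper: behaves like the handle and also supports 'with'."""

    def __init__(self, handle):
        self._handle = handle

    def __getattr__(self, name):
        # Only called for attributes not found on the wrapper itself, so
        # read(), readline(), close(), ... are forwarded to the real handle.
        return getattr(self._handle, name)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self._handle.close()
        return False  # never swallow an exception raised inside the block


# New style: the handle is closed when the block ends.
source = io.StringIO(u"LOCUS example\n")
with ClosingHandle(source) as handle:
    text = handle.read()

# Old style keeps working: no 'with', explicit close().
legacy = ClosingHandle(io.StringIO(u"LOCUS example\n"))
legacy_text = legacy.read()
legacy.close()
```

Because attribute access is delegated, existing callers that never use "with" see the same read() and close() behaviour as before, which is the compatibility question raised in comment #1.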
From mjldehoon at yahoo.com Tue Feb 10 05:28:08 2009 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Tue, 10 Feb 2009 02:28:08 -0800 (PST) Subject: [Biopython-dev] test_Ace, test_Nexus, test_Phd In-Reply-To: <240911.28388.qm@web62402.mail.re1.yahoo.com> Message-ID: <831036.59343.qm@web62406.mail.re1.yahoo.com> I've converted these three tests to pure unittest-style tests. --Michiel --- On Tue, 2/3/09, Michiel de Hoon wrote: > From: Michiel de Hoon > Subject: [Biopython-dev] test_Ace, test_Nexus, test_Phd > To: biopython-dev at biopython.org > Date: Tuesday, February 3, 2009, 7:21 AM > These three tests currently are written as a combination of > a unittest-based test and a print-and-compare test. That is, > they contain classes deriving from unittest.TestCase, but > then print out stuff that should get compared to the output > file. However, run_tests.py assumes that they are true > unittest-style tests, so the comparison is never done. > > Does anybody mind if I convert these three to pure > print-and-compare or pure unittest-style tests? test_Ace.py > and test_Nexus.py produce lots of output, so I'm tempted > to go with a print-and-compare test there; test_Phd.py might > work well as a unittest-style test. > > --Michiel. > > > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From biopython at maubp.freeserve.co.uk Tue Feb 10 05:35:59 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 10 Feb 2009 10:35:59 +0000 Subject: [Biopython-dev] test_Ace, test_Nexus, test_Phd In-Reply-To: <831036.59343.qm@web62406.mail.re1.yahoo.com> References: <240911.28388.qm@web62402.mail.re1.yahoo.com> <831036.59343.qm@web62406.mail.re1.yahoo.com> Message-ID: <320fb6e00902100235m5dcd72e1reb9e4e7e0ea3b3e6@mail.gmail.com> On Tue, Feb 10, 2009 at 10:28 AM, Michiel de Hoon wrote: > I've converted these three tests to pure unittest-style tests. 
> > --Michiel Wow - generating all those assert lines must have taken some time (or a clever script)! The test_Nexus tearDown used to make sure the temp output files were removed. This is important on Windows which does not do this automatically. I see you now allocate "random" filenames using tempfile.NamedTemporaryFile(...) so presumably we would need to record these so that the tearDown method knows what temp files to remove. Peter From mjldehoon at yahoo.com Tue Feb 10 06:25:13 2009 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Tue, 10 Feb 2009 03:25:13 -0800 (PST) Subject: [Biopython-dev] test_Ace, test_Nexus, test_Phd In-Reply-To: <320fb6e00902100235m5dcd72e1reb9e4e7e0ea3b3e6@mail.gmail.com> Message-ID: <366127.53671.qm@web62408.mail.re1.yahoo.com> > The test_Nexus tearDown used to make sure the temp output > files were removed. This is important on Windows which > does not do this automatically. I see you now allocate > "random" filenames using > tempfile.NamedTemporaryFile(...) so presumably we would > need to record these so that the tearDown method knows > what temp files to remove. From reading the Python documentation, the file created by tempfile.NamedTemporaryFile is removed automatically when the file handle is closed, even on Windows.
--Michiel From bartek at rezolwenta.eu.org Tue Feb 10 07:21:41 2009 From: bartek at rezolwenta.eu.org (Bartek Wilczynski) Date: Tue, 10 Feb 2009 13:21:41 +0100 Subject: [Biopython-dev] SVN migration and Launchpad mirroring In-Reply-To: <4990A27A.9060500@gmail.com> References: <3f6baf360902061211o4da786b0q5f788efcc63e2bb1@mail.gmail.com> <320fb6e00902070455h72c7bd31w506f5ed52e9633bc@mail.gmail.com> <3f6baf360902072220j5c565449i4c7266046051207f@mail.gmail.com> <5aa3b3570902080847p1a126664k4a76b7f19a0ed987@mail.gmail.com> <8b34ec180902081103r1befae9bt33e9024bd43f37fb@mail.gmail.com> <128a885f0902081134m255ec4eao21c75aaf08f9d8f5@mail.gmail.com> <499053F9.60709@gmail.com> <3f6baf360902091239v5988749cm1f48c21d2f19ca9b@mail.gmail.com> <4990A27A.9060500@gmail.com> Message-ID: <8b34ec180902100421o1680735dsd68d890d8ccfbf4f@mail.gmail.com> Hi, On Mon, Feb 9, 2009 at 10:39 PM, Bruce Southey wrote: > Python PEP 0374 "Migrating from svn to a distributed VCS" makes some points. > http://www.python.org/dev/peps/pep-0374/ Excellent link. It shows quite a thorough comparison between possible DVCSs, and also that there are more people thinking about switching to a DVCS because SVN is not much better than CVS. The worrying part for me was the benchmarks showing that bzr is lagging behind mercurial and git in terms of speed. They mention that the benchmarks were done with an old version of bzr and there seems to be quite a lot of work on bzr performance, so I decided to see how it works with the actual biopython tree and current bzr. Below are my first impressions. An important note here is that I'm not experienced in converting fairly large projects from CVS to any DVCS and what I've done might not be an optimal setup. I've taken the whole biopython CVS tree with complete version history (~3500 commits) and converted it to a bzr branch using tailor. It took about 2-3 hours, but it needs to be done only once.
The nice thing about tailor is that it gives you a directory structure with both bzr and cvs files, so it can later be used for committing changes back to the CVS tree as well as for getting new changes from CVS.

Once I had that, I could publish my private branch of biopython to launchpad (it took about 10s). Now, if anyone is interested in test-driving bazaar+launchpad with biopython, they can just branch it to their own computer (you don't need any account for that, just bzr installed):

bzr branch lp:~bartek/junk/biopython

I did that (branch) on a different computer (~2min). Now one can start modifying code. I've made some changes to the Bio.Motif code (add a method, commit locally, fix a small bug in it, commit again, test) and pushed the changes to the branch on launchpad. Commits are quick (~3s); a push takes about a minute, but this includes a scan of the whole tree, so it should not take much longer than this for bigger changes.

Note: This is my own branch, so I can commit to it, but if I were not the owner (or maintainer) of the branch, I would have to either send my changes to the maintainer or publish my branch and let him "pull" from it.

I realised later that I had accidentally added a large directory during the tailor conversion, so I removed it in the original bzr branch (as made by tailor), merged it with the changes already pushed to launchpad from somewhere else (Motif), and pushed the resulting tree back to launchpad. The removal was very fast (~5s) and the push took about the same time as with the small change. The good thing is that the history of all changes is retained.
If anyone wants to give it a try, just install bzr and you can easily branch from me using:

bzr branch lp:~bartek/junk/biopython

The branch history can be seen here:
https://code.launchpad.net/~bartek/+junk/biopython/

And the annotated source code is here:
http://bazaar.launchpad.net/~bartek/+junk/biopython/files

The specific changes done by me can be seen as revisions:
http://bazaar.launchpad.net/~bartek/%2Bjunk/biopython/revision/3460
http://bazaar.launchpad.net/~bartek/%2Bjunk/biopython/revision/3459.1.1
http://bazaar.launchpad.net/~bartek/%2Bjunk/biopython/revision/3459.1.2

In summary, I think that it's doable to convert the current CVS tree to bzr, and that bzr can handle the job of a DVCS. Performance is not stellar (especially code browsing in launchpad is sometimes slow) but for me it's acceptable, especially given that I rarely browse the history and much more often use command line tools which are (for me) fast enough.

Please let me know what others think. If there is general interest in that, I can try to set up a more permanent (but still experimental) bzr branch which would be automatically synchronized from CVS, so that we can do a more long-term experiment to see whether it works and whether people like it.
cheers Bartek From biopython at maubp.freeserve.co.uk Tue Feb 10 08:26:19 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 10 Feb 2009 13:26:19 +0000 Subject: [Biopython-dev] SVN migration and Launchpad mirroring In-Reply-To: <8b34ec180902100421o1680735dsd68d890d8ccfbf4f@mail.gmail.com> References: <3f6baf360902061211o4da786b0q5f788efcc63e2bb1@mail.gmail.com> <320fb6e00902070455h72c7bd31w506f5ed52e9633bc@mail.gmail.com> <3f6baf360902072220j5c565449i4c7266046051207f@mail.gmail.com> <5aa3b3570902080847p1a126664k4a76b7f19a0ed987@mail.gmail.com> <8b34ec180902081103r1befae9bt33e9024bd43f37fb@mail.gmail.com> <128a885f0902081134m255ec4eao21c75aaf08f9d8f5@mail.gmail.com> <499053F9.60709@gmail.com> <3f6baf360902091239v5988749cm1f48c21d2f19ca9b@mail.gmail.com> <4990A27A.9060500@gmail.com> <8b34ec180902100421o1680735dsd68d890d8ccfbf4f@mail.gmail.com> Message-ID: <320fb6e00902100526oa681fb7n241185c64205921e@mail.gmail.com> > I've taken the whole biopython CVS tree with complete version history > (~3500 commits) and converted it to bzr branch using tailor. It took > about 2-3 hours, but it needs to be done only once. Did you do that from the public Biopython CVS server to your machine? If so, its nice to know that step isn't too slow. > In summary, I think that it's doable to convert current CVS tree to bzr and > bzr handle the job of a DVCS. Performance is not stellar (epsecially code > browsing in launchpad is sometimes slow) but for it's acceptable, especially > given that I'm rarely browsing the history, and much more often use command > line tools which are (for me) fast enough. > > Please let me know what others think. If there will be general interest > in that, I can try to set up a more permanent (but still experimental) bzr > branch which would be automatically synchronized from CVS, so that > we can do a more long-term experiment to see whether it works, and > people like it. 
Have you got a feel for whether it would be easier to sync CVS and bzr, or SVN and bzr? I personally would be more interested in an automatically synchronized git repository (rather than bzr), but this is not a thoroughly researched opinion. As you pointed out, the poor bzr benchmark speeds may not be so bad in the latest code - although the Biopython code base is not so big that this really matters. Peter From bartek at rezolwenta.eu.org Tue Feb 10 09:43:23 2009 From: bartek at rezolwenta.eu.org (Bartek Wilczynski) Date: Tue, 10 Feb 2009 15:43:23 +0100 Subject: [Biopython-dev] SVN migration and Launchpad mirroring In-Reply-To: <320fb6e00902100526oa681fb7n241185c64205921e@mail.gmail.com> References: <3f6baf360902061211o4da786b0q5f788efcc63e2bb1@mail.gmail.com> <3f6baf360902072220j5c565449i4c7266046051207f@mail.gmail.com> <5aa3b3570902080847p1a126664k4a76b7f19a0ed987@mail.gmail.com> <8b34ec180902081103r1befae9bt33e9024bd43f37fb@mail.gmail.com> <128a885f0902081134m255ec4eao21c75aaf08f9d8f5@mail.gmail.com> <499053F9.60709@gmail.com> <3f6baf360902091239v5988749cm1f48c21d2f19ca9b@mail.gmail.com> <4990A27A.9060500@gmail.com> <8b34ec180902100421o1680735dsd68d890d8ccfbf4f@mail.gmail.com> <320fb6e00902100526oa681fb7n241185c64205921e@mail.gmail.com> Message-ID: <8b34ec180902100643r2972e25eke8b8a8f621b5d554@mail.gmail.com> Hi, On Tue, Feb 10, 2009 at 2:26 PM, Peter wrote: >> I've taken the whole biopython CVS tree with complete version history >> (~3500 commits) and converted it to bzr branch using tailor. It took >> about 2-3 hours, but it needs to be done only once. > > Did you do that from the public Biopython CVS server to your machine? > If so, its nice to know that step isn't too slow. > You can do it using any cvs repository, but doing it over the network slows it down. I got bored so I downloaded the actual CVS repo from dev.open-bio.org:/home/repository/biopython The 2-3 hours is for conversion from a local repository which was a copy of the original biopython one. 
But once it is done you have a directory tree which has both CVS and .bzr entries, so you can use it for synchronization.

> Have you got a feel for whether it would be easier to sync CVS and bzr, or SVN and bzr?

The tool I used (tailor) works with all VCS systems out there. Also, launchpad is able to update a branch from either a CVS or an SVN main repository. So there should be no difference, apart from one more migration (CVS->SVN).

> I personally would be more interested in an automatically synchronized git repository (rather than bzr), but this is not a thoroughly researched opinion. As you pointed out, the poor bzr benchmark speeds may not be so bad in the latest code - although the Biopython code base is not so big that this really matters.

When it comes to git, I have to say that I'm not really experienced, but my current understanding of the possibilities is as follows: I don't know of any service to _automatically_ synchronize a CVS (or SVN) repo with git. There is git-svn, so if we move to SVN, we can set up a git repository and write some scripts around git-svn to keep it synchronized with SVN trunk. Then, if we want to host it, we need to start a git server on dev.open-bio.org or use the free account on github. It has a limit of 100MB and the current biopython CVS tree is 57MB, so we can go with it for a while, but I'm not sure if I would recommend it.

But there are surely people on the list more experienced with git than me, so we may hear about better options.
cheers Bartek From argriffi at ncsu.edu Tue Feb 10 09:53:38 2009 From: argriffi at ncsu.edu (alex) Date: Tue, 10 Feb 2009 09:53:38 -0500 Subject: [Biopython-dev] SVN migration and Launchpad mirroring In-Reply-To: <8b34ec180902100643r2972e25eke8b8a8f621b5d554@mail.gmail.com> References: <3f6baf360902061211o4da786b0q5f788efcc63e2bb1@mail.gmail.com> <3f6baf360902072220j5c565449i4c7266046051207f@mail.gmail.com> <5aa3b3570902080847p1a126664k4a76b7f19a0ed987@mail.gmail.com> <8b34ec180902081103r1befae9bt33e9024bd43f37fb@mail.gmail.com> <128a885f0902081134m255ec4eao21c75aaf08f9d8f5@mail.gmail.com> <499053F9.60709@gmail.com> <3f6baf360902091239v5988749cm1f48c21d2f19ca9b@mail.gmail.com> <4990A27A.9060500@gmail.com> <8b34ec180902100421o1680735dsd68d890d8ccfbf4f@mail.gmail.com> <320fb6e00902100526oa681fb7n241185c64205921e@mail.gmail.com> <8b34ec180902100643r2972e25eke8b8a8f621b5d554@mail.gmail.com> Message-ID: <499194F2.3020906@ncsu.edu> > It has a limit of 100mb and current biopython CVS tree is 57Mb, so we can go with it for > a while, but I'm not sure if I would recommend it. According to github, "The 100MB is a soft limit setup to prevent abuse of the service. If your open source project needs more space, email us , we're happy to provide it." Biopython is an obviously legitimate project so you could probably get more space. 
Alex From bsouthey at gmail.com Tue Feb 10 10:29:15 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Tue, 10 Feb 2009 09:29:15 -0600 Subject: [Biopython-dev] SVN migration and Launchpad mirroring In-Reply-To: <8b34ec180902100643r2972e25eke8b8a8f621b5d554@mail.gmail.com> References: <3f6baf360902061211o4da786b0q5f788efcc63e2bb1@mail.gmail.com> <3f6baf360902072220j5c565449i4c7266046051207f@mail.gmail.com> <5aa3b3570902080847p1a126664k4a76b7f19a0ed987@mail.gmail.com> <8b34ec180902081103r1befae9bt33e9024bd43f37fb@mail.gmail.com> <128a885f0902081134m255ec4eao21c75aaf08f9d8f5@mail.gmail.com> <499053F9.60709@gmail.com> <3f6baf360902091239v5988749cm1f48c21d2f19ca9b@mail.gmail.com> <4990A27A.9060500@gmail.com> <8b34ec180902100421o1680735dsd68d890d8ccfbf4f@mail.gmail.com> <320fb6e00902100526oa681fb7n241185c64205921e@mail.gmail.com> <8b34ec180902100643r2972e25eke8b8a8f621b5d554@mail.gmail.com> Message-ID: <49919D4B.9060305@gmail.com> Bartek Wilczynski wrote: > Hi, > > On Tue, Feb 10, 2009 at 2:26 PM, Peter wrote: > >>> I've taken the whole biopython CVS tree with complete version history >>> (~3500 commits) and converted it to bzr branch using tailor. It took >>> about 2-3 hours, but it needs to be done only once. >>> >> Did you do that from the public Biopython CVS server to your machine? >> If so, its nice to know that step isn't too slow. >> >> > You can do it using any cvs repository, but doing it over the network > slows it down. > I got bored so I downloaded the actual CVS repo from > dev.open-bio.org:/home/repository/biopython > The 2-3 hours is for conversion from a local repository which was a > copy of the > original biopython one. But once it is done you have a directory tree > which has both > CVS and .bzr entries, so you can use it for synchronization. > > > >> Have you got a feel for whether it would be easier to sync CVS and >> bzr, or SVN and bzr? >> >> > The tool I used (tailor) works with all VCS systems out there. 
> Also launchpad is able to update a branch form either cvs or svn main repository. So there should be no difference, apart from one migration (CVS->SVN) more.
>
>> I personally would be more interested in an automatically synchronized git repository (rather than bzr), but this is not a thoroughly researched opinion. As you pointed out, the poor bzr benchmark speeds may not be so bad in the latest code - although the Biopython code base is not so big that this really matters.
>
> when it comes to git, I have to say that I'm not really experienced, but my current understanding of the possibilities is as follows:
> I don't know about any service to _automaticaly_ synchronize CVS (or SVN) repo with git.
> There is git-svn, so if we move to SVN, we can set up a git repository and write some scripts around git-svn to have it synchronized with SVN trunk. Then, if we want to host it, we need to start a git-server on dev.open-bio.org or use the free account on github. It has a limit of 100mb and current biopython CVS tree is 57Mb, so we can go with it for a while, but I'm not sure if I would recommend it.
>
> But for sure there are people more experienced with git than me on the list, so we may hear about better options.
>
> cheers
> Bartek

Hi,

Thanks for doing all of this, as it is very, very interesting work! I am not experienced in either, nor do I have a preference. But I appreciate the various comments on this, as they make it clear that Biopython needs to go to one of these systems.

I came across various blogs that show the newest versions of bzr are much faster than the old versions. So nothing like good old competition!

Here is one link via Google that shows various actions between git and bzr (a link to a similar comparison between git, bzr and Mercurial is at the bottom of that link):
http://laserjock.wordpress.com/2008/05/08/git-and-bzr-historical-performance-comparison/

There are options to convert between bzr and git (like tailor as well as plugins). Also bzr-svn (http://bazaar-vcs.org/BzrForeignBranches/Subversion) and git-svn (http://www.kernel.org/pub/software/scm/git/docs/git-svn.html) allow you to connect directly to Subversion repositories. From my brief reading, I think these are (or are meant to be) bidirectional, but the cvs support is somewhat limited.

Echoing Chris's comment, should we even bother with svn at all? Obviously, if we are going to git or bzr or hg, svn is not necessary, but in the short term it could be used as a transition towards one of these. Possible uses of svn would be maintaining the official repository of the current release plus important bug fixes, a sort of staging tree that all new code should build against.
Bruce From biopython at maubp.freeserve.co.uk Tue Feb 10 10:31:26 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 10 Feb 2009 15:31:26 +0000 Subject: [Biopython-dev] SVN migration and Launchpad mirroring In-Reply-To: <499194F2.3020906@ncsu.edu> References: <3f6baf360902061211o4da786b0q5f788efcc63e2bb1@mail.gmail.com> <8b34ec180902081103r1befae9bt33e9024bd43f37fb@mail.gmail.com> <128a885f0902081134m255ec4eao21c75aaf08f9d8f5@mail.gmail.com> <499053F9.60709@gmail.com> <3f6baf360902091239v5988749cm1f48c21d2f19ca9b@mail.gmail.com> <4990A27A.9060500@gmail.com> <8b34ec180902100421o1680735dsd68d890d8ccfbf4f@mail.gmail.com> <320fb6e00902100526oa681fb7n241185c64205921e@mail.gmail.com> <8b34ec180902100643r2972e25eke8b8a8f621b5d554@mail.gmail.com> <499194F2.3020906@ncsu.edu> Message-ID: <320fb6e00902100731r7c931837o699e36b903dfd48c@mail.gmail.com> On Tue, Feb 10, 2009 at 2:53 PM, alex wrote: >> It has a limit of 100mb and current biopython CVS tree is 57Mb, so we can >> go with it for a while, but I'm not sure if I would recommend it. > > According to github, > "The 100MB is a soft limit setup to prevent abuse of the service. If your > open source project needs more space, email us , > we're happy to provide it." > > Biopython is an obviously legitimate project so you could probably get more > space. In the long term, assuming we do want an official git repository, I would be happier if we could just host it on biopython.org - this shouldn't be a technical problem, but would require discussion with the OBF team (e.g. opening ports, and who gets to look after the service, backing it up etc). That doesn't prevent a proof of concept using github. 
Peter From bugzilla-daemon at portal.open-bio.org Tue Feb 10 11:20:33 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 10 Feb 2009 11:20:33 -0500 Subject: [Biopython-dev] [Bug 2752] Context management for Bio.Entrez handles In-Reply-To: Message-ID: <200902101620.n1AGKXmN008289@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2752 ------- Comment #4 from eric.talevich at gmail.com 2009-02-10 11:20 EST ------- (In reply to comment #3) > Alternatively, we could add the __enter__ and __exit__ methods to the > Bio.File.UndoHandle object instead (which would benefit any code using them, > not just Bio.Entrez). You're right, that does what I wanted. This bug is just an enhancement to make the Entrez code work more like modern Python, not anything breaking current code -- my example is what I wished I could have written a few weeks ago when I first tried out Bio.Entrez. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
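To illustrate the enhancement discussed in this bug, here is a minimal sketch of a handle wrapper that supports the with statement. The class name ClosingHandle is hypothetical; this is not the actual Bio.File.UndoHandle code or the attached patch, it only shows the __enter__/__exit__ protocol on a file-like wrapper:

```python
import io


class ClosingHandle:
    """Hypothetical file-like wrapper; NOT the real Bio.File.UndoHandle."""

    def __init__(self, handle):
        self._handle = handle

    def read(self, size=-1):
        return self._handle.read(size)

    def close(self):
        self._handle.close()

    def __enter__(self):
        # The object bound by "with ... as handle" is the wrapper itself.
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Always close; returning None lets any exception propagate.
        self.close()


raw = io.StringIO(">seq1\nACGT\n")
with ClosingHandle(raw) as handle:
    data = handle.read()

print(data.startswith(">seq1"), raw.closed)
```

The underlying handle is closed when the block exits, whether or not an exception was raised inside it, which is the behaviour the report asks for.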
From dalloliogm at gmail.com Tue Feb 10 11:43:32 2009
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Tue, 10 Feb 2009 17:43:32 +0100
Subject: [Biopython-dev] SVN migration and Launchpad mirroring
In-Reply-To: <8b34ec180902100643r2972e25eke8b8a8f621b5d554@mail.gmail.com>
References: <3f6baf360902061211o4da786b0q5f788efcc63e2bb1@mail.gmail.com> <5aa3b3570902080847p1a126664k4a76b7f19a0ed987@mail.gmail.com> <8b34ec180902081103r1befae9bt33e9024bd43f37fb@mail.gmail.com> <128a885f0902081134m255ec4eao21c75aaf08f9d8f5@mail.gmail.com> <499053F9.60709@gmail.com> <3f6baf360902091239v5988749cm1f48c21d2f19ca9b@mail.gmail.com> <4990A27A.9060500@gmail.com> <8b34ec180902100421o1680735dsd68d890d8ccfbf4f@mail.gmail.com> <320fb6e00902100526oa681fb7n241185c64205921e@mail.gmail.com> <8b34ec180902100643r2972e25eke8b8a8f621b5d554@mail.gmail.com>
Message-ID: <5aa3b3570902100843q652adfcbw277565f0a1c95690@mail.gmail.com>

On Tue, Feb 10, 2009 at 3:43 PM, Bartek Wilczynski wrote:
> Hi,
> when it comes to git, I have to say that I'm not really experienced,

In github, for every repository there is a button to create a fork and automatically add it to your own space. Look at the image in this post:
- http://github.com/guides/keeping-a-git-fork-in-sync-with-the-forked-repo

Is there something similar with launchpad? Or is it planned?

Moreover, in github there are many tools that show the ramifications of all the repositories coming from the original one, with a very nice view (it's this link, again: http://github.com/blog/39-say-hello-to-the-network-graph-visualizer)

Let's say I fork your repository as you explained: how would you keep track of all the forks originating from your repository? Will you get notified that I have forked your repo?

By the way, do you have any clue on how to configure bazaar under a proxy?
:) > but my current understanding of > the possibilities is as follows: > I don't know about any service to _automaticaly_ synchronize CVS (or > SVN) repo with git. I don't know, but maybe the bioruby developers already know how to do it already. Thank you for all the posts and discussions above.. -- My blog on bioinformatics (now in English): http://bioinfoblog.it From bugzilla-daemon at portal.open-bio.org Tue Feb 10 11:59:17 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 10 Feb 2009 11:59:17 -0500 Subject: [Biopython-dev] [Bug 2752] Context management for Bio.Entrez handles In-Reply-To: Message-ID: <200902101659.n1AGxHqO013821@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2752 ------- Comment #5 from eric.talevich at gmail.com 2009-02-10 11:59 EST ------- Created an attachment (id=1226) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1226&action=view) Add __enter__ and __exit__ to UndoHandle Should SGMLHandle also get these methods? They'd be identical. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Feb 10 19:48:43 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 10 Feb 2009 19:48:43 -0500 Subject: [Biopython-dev] [Bug 2752] Context management for Bio.Entrez handles In-Reply-To: Message-ID: <200902110048.n1B0mhKl028339@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2752 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |RESOLVED Resolution| |FIXED ------- Comment #6 from mdehoon at ims.u-tokyo.ac.jp 2009-02-10 19:48 EST ------- I committed your patch to CVS; thanks for contributing. 
> Should SGMLHandle also get these methods? They'd be identical. In principle, yes, but SGMLHandle is currently not used anywhere in Biopython, and I wouldn't be surprised if it is removed from Biopython in a future release. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From mjldehoon at yahoo.com Tue Feb 10 20:06:38 2009 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Tue, 10 Feb 2009 17:06:38 -0800 (PST) Subject: [Biopython-dev] docstring tests Message-ID: <787613.22831.qm@web62407.mail.re1.yahoo.com> Hi everybody, I included the code to run the docstring tests to run_tests.py, which means that now they're run after the test_*.py tests have finished: test_seq ... ok test_translate ... ok test_trie ... ok test_triefind ... ok Bio.Seq docstring test ... ok Bio.SeqRecord docstring test ... ok Bio.SeqIO docstring test ... ok Bio.Align.Generic docstring test ... ok Bio.AlignIO docstring test ... ok Bio.KEGG.Compound docstring test ... ok Bio.KEGG.Enzyme docstring test ... ok Bio.Wise docstring test ... ok Bio.Wise.psw docstring test ... ok Bio.Statistics.lowess docstring test ... ok ---------------------------------------------------------------------- Ran 107 tests in 97.191 seconds Previously, this code was in test_docstrings.py, but it's easier to do this from run_tests.py because doctest can create a unittest-style test suite directly. This also means that if your module contains docstring tests, you should include the module name to DOCTEST_MODULES near the top of run_tests.py (instead of to test_docstrings.py). I've uploaded the new run_tests.py to CVS so people can try it, but we can revert to the previous version of run_tests.py if preferred. If there are no issues with this approach, we can remove test_docstrings.py. --Michiel. 
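A minimal sketch of how doctest can hand back a unittest-style suite, as described above. The demo module and its two functions are made up for illustration, and this is not the actual run_tests.py code; only doctest.DocTestSuite and the unittest runner are standard library calls:

```python
import doctest
import io
import types
import unittest

# Hypothetical stand-in for a module listed in DOCTEST_MODULES.
SOURCE = '''
def gc_count(seq):
    """Count G and C letters.

    >>> gc_count("ACGT")
    2
    """
    return sum(seq.count(base) for base in "GC")


def reverse(seq):
    """Reverse a sequence string.

    >>> reverse("ACGT")
    'TGCA'
    """
    return seq[::-1]
'''

demo = types.ModuleType("demo_module")
exec(SOURCE, demo.__dict__)

# One unittest test case is created per docstring that contains examples.
suite = doctest.DocTestSuite(demo)
result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)

print(result.testsRun, result.wasSuccessful())
```

Because the result is an ordinary unittest suite, it can be added to the same runner as the test_*.py suites, which is presumably why this is easier to do from run_tests.py than from a separate test_docstrings.py.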
From mjldehoon at yahoo.com Wed Feb 11 00:51:00 2009 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Tue, 10 Feb 2009 21:51:00 -0800 (PST) Subject: [Biopython-dev] Updated the documentation of the Biopython testing framework Message-ID: <572067.89768.qm@web62403.mail.re1.yahoo.com> Hi everybody, I've updated the section in the tutorial about the Biopython testing framework. This description includes the examples that were previously in Doc/cookbook/biopython_test. I haven't uploaded this to CVS yet, but the HTML version of the tutorial is viewable here: http://biopython.org/DIST/docs/tutorial/Tutorial.proposal.html If there are no objections, I'll upload the new tutorial to CVS. --Michiel. From bartek at rezolwenta.eu.org Wed Feb 11 03:52:05 2009 From: bartek at rezolwenta.eu.org (Bartek Wilczynski) Date: Wed, 11 Feb 2009 09:52:05 +0100 Subject: [Biopython-dev] SVN migration and Launchpad mirroring In-Reply-To: <499194F2.3020906@ncsu.edu> References: <3f6baf360902061211o4da786b0q5f788efcc63e2bb1@mail.gmail.com> <8b34ec180902081103r1befae9bt33e9024bd43f37fb@mail.gmail.com> <128a885f0902081134m255ec4eao21c75aaf08f9d8f5@mail.gmail.com> <499053F9.60709@gmail.com> <3f6baf360902091239v5988749cm1f48c21d2f19ca9b@mail.gmail.com> <4990A27A.9060500@gmail.com> <8b34ec180902100421o1680735dsd68d890d8ccfbf4f@mail.gmail.com> <320fb6e00902100526oa681fb7n241185c64205921e@mail.gmail.com> <8b34ec180902100643r2972e25eke8b8a8f621b5d554@mail.gmail.com> <499194F2.3020906@ncsu.edu> Message-ID: <8b34ec180902110052m440b030as7fcf2edab6fbbd7d@mail.gmail.com> On Tue, Feb 10, 2009 at 3:53 PM, alex wrote: > According to github, > "The 100MB is a soft limit setup to prevent abuse of the service. If your > open source project needs more space, email us , > we're happy to provide it." > > Biopython is an obviously legitimate project so you could probably get more > space. Oh, that's cool. I didn't know that. 
So I guess our job now is to evaluate both github and launchpad (maybe by trying them for a period of time) and see which seems to suit our needs better. Competition is always good :) cheers Bartek From bartek at rezolwenta.eu.org Wed Feb 11 04:11:57 2009 From: bartek at rezolwenta.eu.org (Bartek Wilczynski) Date: Wed, 11 Feb 2009 10:11:57 +0100 Subject: [Biopython-dev] SVN migration and Launchpad mirroring In-Reply-To: <5aa3b3570902100843q652adfcbw277565f0a1c95690@mail.gmail.com> References: <3f6baf360902061211o4da786b0q5f788efcc63e2bb1@mail.gmail.com> <8b34ec180902081103r1befae9bt33e9024bd43f37fb@mail.gmail.com> <128a885f0902081134m255ec4eao21c75aaf08f9d8f5@mail.gmail.com> <499053F9.60709@gmail.com> <3f6baf360902091239v5988749cm1f48c21d2f19ca9b@mail.gmail.com> <4990A27A.9060500@gmail.com> <8b34ec180902100421o1680735dsd68d890d8ccfbf4f@mail.gmail.com> <320fb6e00902100526oa681fb7n241185c64205921e@mail.gmail.com> <8b34ec180902100643r2972e25eke8b8a8f621b5d554@mail.gmail.com> <5aa3b3570902100843q652adfcbw277565f0a1c95690@mail.gmail.com> Message-ID: <8b34ec180902110111h388e6dabq12897c181a5a02b3@mail.gmail.com> Hi, On Tue, Feb 10, 2009 at 5:43 PM, Giovanni Marco Dall'Olio wrote: > In github, for every repository there is a button to create a fork and > automatically add it to to your own space. > Look at the image in this post: > - http://github.com/guides/keeping-a-git-fork-in-sync-with-the-forked-repo > Is there something similar with launchpad? Or is it planned to be? > No. As far as I know you need to branch using bzr to your machine and then you need to push it. It looks like this: bzr branch lp:branch-name then you get a local repo with all version history. Now you can register your branch to your launchpad account: bzr push lp:~username/project/branch In principle, you would do that only if you want other people to access it. 
Normally, I would publish only branches which have been modified in some meaningful way, and to modify it, you need to download it to your computer anyway... But having such a button might make it easier for people to branch and stimulate contributions. In launchpad, instead of a button you have the exact command printed on the page so that you can paste it into your console.

> Moreover, in github there are many tools that shows the ramifications of all the repositories coming from the original one, with a very nice view (it's this link, again:http://github.com/blog/39-say-hello-to-the-network-graph-visualizer)

Yeah, it's nice to see all the stuff that's going on in all branches forked from the trunk. In launchpad, you have only a list of branches already submitted for merging. I think it's again a different philosophy, which reduces the amount of information for maintainers to process (you only see mature changes submitted for merging into the trunk), but you might miss all the stuff which was not submitted (and which you would see in github). It's hard to predict how many branches with active development we will have in BioPython, but generally I think the more info we have the better.

> Let's say I fork your repository as you explained: how would you do to keep track of all the forks originated from your repository? Will you get notified that I have forked your repo?

Not by default. I think it can be modified to some extent by plugins, but I have no experience here. Again, I don't think that tracking _all_ branches is necessary (and sometimes it is simply not possible: people can branch anonymously), but having some statistics on how many times a project was branched (i.e. downloaded) could be interesting.

> By the way, do you have any clue on how to configure bazaar under a proxy? :)

I'm not sure what you mean. Are you behind an HTTP proxy? If you are a registered user of launchpad, the communication is done via ssh, so there should be no problem.
I don't know if there are any problems with using launchpad anonymously. What is your setup and what fails?

>> but my current understanding of the possibilities is as follows:
>> I don't know about any service to _automaticaly_ synchronize CVS (or SVN) repo with git.
>
> I don't know, but maybe the bioruby developers already know how to do it already.

According to their website, they just switched to Github. CVS is not synchronized (hasn't been updated for more than 6 months), but they might know about tools.

cheers
Bartek

From dalloliogm at gmail.com Wed Feb 11 04:16:29 2009
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Wed, 11 Feb 2009 10:16:29 +0100
Subject: [Biopython-dev] docstring tests
In-Reply-To: <787613.22831.qm@web62407.mail.re1.yahoo.com>
References: <787613.22831.qm@web62407.mail.re1.yahoo.com>
Message-ID: <5aa3b3570902110116j4cfa7992v22db1619fb34c05@mail.gmail.com>

On Wed, Feb 11, 2009 at 2:06 AM, Michiel de Hoon wrote:
> Hi everybody,
>
> I included the code to run the docstring tests to run_tests.py, which means that now they're run after the test_*.py tests have finished:

Thanks.

As for the doctests, it would really be useful to define some global fixtures. It would reduce the length of the docstrings: we wouldn't have to repeat all the examples in every method of every function, we wouldn't have to repeat the 'from Bio import ..' in every test, and so on. How could this be implemented in this test framework?

>
> test_seq ... ok
> test_translate ... ok
> test_trie ... ok
> test_triefind ... ok
> Bio.Seq docstring test ... ok
> Bio.SeqRecord docstring test ... ok
> Bio.SeqIO docstring test ... ok
> Bio.Align.Generic docstring test ... ok
> Bio.AlignIO docstring test ... ok
> Bio.KEGG.Compound docstring test ... ok
> Bio.KEGG.Enzyme docstring test ... ok
> Bio.Wise docstring test ... ok
> Bio.Wise.psw docstring test ... ok
> Bio.Statistics.lowess docstring test ...
ok > ---------------------------------------------------------------------- > Ran 107 tests in 97.191 seconds > > > Previously, this code was in test_docstrings.py, but it's easier to do this from run_tests.py because doctest can create a unittest-style test suite directly. This also means that if your module contains docstring tests, you should include the module name to DOCTEST_MODULES near the top of run_tests.py (instead of to test_docstrings.py). > > I've uploaded the new run_tests.py to CVS so people can try it, but we can revert to the previous version of run_tests.py if preferred. If there are no issues with this approach, we can remove test_docstrings.py. > > --Michiel. > > > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > -- My blog on bioinformatics (now in English): http://bioinfoblog.it From biopython at maubp.freeserve.co.uk Wed Feb 11 05:48:18 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 11 Feb 2009 10:48:18 +0000 Subject: [Biopython-dev] docstring tests In-Reply-To: <5aa3b3570902110116j4cfa7992v22db1619fb34c05@mail.gmail.com> References: <787613.22831.qm@web62407.mail.re1.yahoo.com> <5aa3b3570902110116j4cfa7992v22db1619fb34c05@mail.gmail.com> Message-ID: <320fb6e00902110248h4dc2071br943402d6ed02082e@mail.gmail.com> > On Wed, Feb 11, 2009 at 2:06 AM, Michiel de Hoon wrote: >> Hi everybody, >> >> I included the code to run the docstring tests to run_tests.py, which means that now they're run after the test_*.py tests have finished: That's nice, and arguably better than the test_docstring.py solution. In the medium/long term we may want to switch to an automatic detection of these tests. Do you think we need a way to run just the doctests? Before we could do "python run_tests.py test_docstring.py" to do this. 
On Wed, Feb 11, 2009 at 9:16 AM, Giovanni Marco Dall'Olio wrote: > As for the doctests, it would be really be useful to define some > global fixtures. > It will reduce the docstrings lengths and we won't have to repeat all > the examples in every method of every function, we won't have to > repeat the 'from Bio import ..' in every test and so on. > How this could be implemented in this test framework? I disagree here. If you have any import statements (or any other global fixtures) missing from the individual docstring examples this reduces their value as documentation (you have to explain somewhere what is missing, and make sure the user knows this). Including any required import statements is only a small overhead for the person writing the doctest examples and it makes the examples self contained (which I think is important for documentation). I can understand for example in NumPy that they might have a global "import numpy as np" done implicitly in all their tests, but they have a very flat namespace where this same line would otherwise be repeated for every single doctest. This is one "magic line of code" which would be the same for all the examples, and omitting it is more justified. 
Peter From dalloliogm at gmail.com Wed Feb 11 06:14:10 2009 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Wed, 11 Feb 2009 12:14:10 +0100 Subject: [Biopython-dev] docstring tests In-Reply-To: <320fb6e00902110248h4dc2071br943402d6ed02082e@mail.gmail.com> References: <787613.22831.qm@web62407.mail.re1.yahoo.com> <5aa3b3570902110116j4cfa7992v22db1619fb34c05@mail.gmail.com> <320fb6e00902110248h4dc2071br943402d6ed02082e@mail.gmail.com> Message-ID: <5aa3b3570902110314u3a4727edl2fd78fd14ef4d5aa@mail.gmail.com> On Wed, Feb 11, 2009 at 11:48 AM, Peter wrote: >> On Wed, Feb 11, 2009 at 2:06 AM, Michiel de Hoon wrote: >>> Hi everybody, >>> >>> I included the code to run the docstring tests to run_tests.py, which means that now they're run after the test_*.py tests have finished: > > That's nice, and arguably better than the test_docstring.py solution. > In the medium/long term we may want to switch to an automatic > detection of these tests. > > Do you think we need a way to run just the doctests? Before we could > do "python run_tests.py test_docstring.py" to do this. > > On Wed, Feb 11, 2009 at 9:16 AM, Giovanni Marco Dall'Olio > wrote: >> As for the doctests, it would be really be useful to define some >> global fixtures. >> It will reduce the docstrings lengths and we won't have to repeat all >> the examples in every method of every function, we won't have to >> repeat the 'from Bio import ..' in every test and so on. >> How this could be implemented in this test framework? > > I disagree here. If you have any import statements (or any other > global fixtures) missing from the individual docstring examples this > reduces their value as documentation (you have to explain somewhere > what is missing, and make sure the user knows this). Including any > required import statements is only a small overhead for the person > writing the doctest examples and it makes the examples self contained > (which I think is important for documentation). 
In the long run, it will be hard without fixtures: imagine, for example, the docs in BioSQL, where you will have to put the instructions to create a new database in the docstring of every method of every class. I think it is better to face the problem now, before it is too late, and at least set up a general infrastructure for how doctest fixtures should work in the future. > I can understand for example in NumPy that they might have a global > "import numpy as np" done implicitly in all their tests, but they have > a very flat namespace where this same line would otherwise be repeated > for every single doctest. This is one "magic line of code" which > would be the same for all the examples, and omitting it is more > justified. > > Peter > -- My blog on bioinformatics (now in English): http://bioinfoblog.it From biopython at maubp.freeserve.co.uk Wed Feb 11 06:29:29 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 11 Feb 2009 11:29:29 +0000 Subject: [Biopython-dev] Updated the documentation of the Biopython testing framework In-Reply-To: <572067.89768.qm@web62403.mail.re1.yahoo.com> References: <572067.89768.qm@web62403.mail.re1.yahoo.com> Message-ID: <320fb6e00902110329s4c84dab8w9120cf480fd84437@mail.gmail.com> On Wed, Feb 11, 2009 at 5:51 AM, Michiel de Hoon wrote: > Hi everybody, > > I've updated the section in the tutorial about the Biopython testing framework. > This description includes the examples that were previously in > Doc/cookbook/biopython_test. I haven't uploaded this to CVS yet, but the > HTML version of the tutorial is viewable here: > > http://biopython.org/DIST/docs/tutorial/Tutorial.proposal.html > > If there are no objections, I'll upload the new tutorial to CVS. > > --Michiel. In the unittest example could you add simple docstrings, so that the printed output is nicer? Otherwise, I have only skimmed the content so far, but it looks good. 
Peter From biopython at maubp.freeserve.co.uk Wed Feb 11 08:16:06 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 11 Feb 2009 13:16:06 +0000 Subject: [Biopython-dev] docstring tests In-Reply-To: <5aa3b3570902110314u3a4727edl2fd78fd14ef4d5aa@mail.gmail.com> References: <787613.22831.qm@web62407.mail.re1.yahoo.com> <5aa3b3570902110116j4cfa7992v22db1619fb34c05@mail.gmail.com> <320fb6e00902110248h4dc2071br943402d6ed02082e@mail.gmail.com> <5aa3b3570902110314u3a4727edl2fd78fd14ef4d5aa@mail.gmail.com> Message-ID: <320fb6e00902110516o5bc865co2b6b1f2ccef1e30f@mail.gmail.com> On Wed, Feb 11, 2009 at 11:14 AM, Giovanni Marco Dall'Olio wrote: > On the long run, it will be hard without fixtures: imagine, for > example, the docs in BioSQL, where you will have to put the > instructions to create a new database in every method's docstring of > every class. BioSQL is a special case - we can't have doctests which will work on every machine unless the user has installed particular database (e.g. MySQL), using particular database names, usernames and passwords. So I don't think we need to worry about doctests for BioSQL - because of its nature. For other complicated modules, you could just put one complex multi-part example in the main docstring, and not have individual doctests in each method (if doing so would require a lot of setup code each time). > I think it is not too bad to face the problem now, before it is too > late, and at least give a general infrastructure for how doctest's > fixtures will have to be in the future. It won't be too late - if we continue to write effectively "stand alone" doctests in each docstring, then if at some point we do need more infrastructure to support more complicated doctests, the old simple doctests will still work fine. I think you are inventing unneeded work here. Also if we do add something complicated or non-standard, it makes it harder if later on we do ever want to switch test frameworks (e.g. to nose). 
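For reference, the standard doctest module already has a hook that could serve as the "infrastructure" under discussion: DocTestSuite accepts setUp and tearDown callables that receive the DocTest object and may add names to its globals. The sketch below is only an illustration under that assumption; `demo_fixture`, `gc_count` and `example_seq` are hypothetical names, and it also shows the cost raised above, namely that the docstring no longer explains where `example_seq` comes from.

```python
import doctest
import types
import unittest

# Hypothetical module whose doctest relies on a name it never defines.
SOURCE = '''
def gc_count(seq):
    """Count the G and C letters in a sequence string.

    >>> gc_count(example_seq)
    2
    """
    return seq.count("G") + seq.count("C")
'''

demo = types.ModuleType("demo_fixture")
exec(SOURCE, demo.__dict__)


def set_up(test):
    # doctest passes the DocTest object; test.globs is the namespace
    # in which that docstring's examples are executed.
    test.globs["example_seq"] = "ACGT"


# With the fixture the example passes.
with_fixture = unittest.TestResult()
doctest.DocTestSuite(demo, setUp=set_up).run(with_fixture)

# Without it the same docstring fails, because example_seq is undefined.
without_fixture = unittest.TestResult()
doctest.DocTestSuite(demo).run(without_fixture)

print("with fixture:", with_fixture.wasSuccessful())
print("without fixture:", without_fixture.wasSuccessful())
```

Since stand-alone docstrings keep working whether or not such a hook is used, adopting it later would not invalidate the simple doctests written in the meantime.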
Peter From dalke at dalkescientific.com Wed Feb 11 09:25:07 2009 From: dalke at dalkescientific.com (Andrew Dalke) Date: Wed, 11 Feb 2009 15:25:07 +0100 Subject: [Biopython-dev] docstring tests In-Reply-To: <320fb6e00902110516o5bc865co2b6b1f2ccef1e30f@mail.gmail.com> References: <787613.22831.qm@web62407.mail.re1.yahoo.com> <5aa3b3570902110116j4cfa7992v22db1619fb34c05@mail.gmail.com> <320fb6e00902110248h4dc2071br943402d6ed02082e@mail.gmail.com> <5aa3b3570902110314u3a4727edl2fd78fd14ef4d5aa@mail.gmail.com> <320fb6e00902110516o5bc865co2b6b1f2ccef1e30f@mail.gmail.com> Message-ID: <5B539D3F-26E0-4260-B98C-C85419B3D427@dalkescientific.com> On Feb 11, 2009, at 2:16 PM, Peter wrote: > BioSQL is a special case - we can't have doctests which will work on > every machine unless the user has installed particular database (e.g. > MySQL), using particular database names, usernames and passwords. Python comes with SQLite. The distribution could ship/install a small test database with a known schema. Andrew dalke at dalkescientific.com From bsouthey at gmail.com Wed Feb 11 09:34:24 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Wed, 11 Feb 2009 08:34:24 -0600 Subject: [Biopython-dev] docstring tests In-Reply-To: <787613.22831.qm@web62407.mail.re1.yahoo.com> References: <787613.22831.qm@web62407.mail.re1.yahoo.com> Message-ID: <4992E1F0.1000807@gmail.com> Michiel de Hoon wrote: > Hi everybody, > > I included the code to run the docstring tests to run_tests.py, which means that now they're run after the test_*.py tests have finished: > > test_seq ... ok > test_translate ... ok > test_trie ... ok > test_triefind ... ok > Bio.Seq docstring test ... ok > Bio.SeqRecord docstring test ... ok > Bio.SeqIO docstring test ... ok > Bio.Align.Generic docstring test ... ok > Bio.AlignIO docstring test ... ok > Bio.KEGG.Compound docstring test ... ok > Bio.KEGG.Enzyme docstring test ... ok > Bio.Wise docstring test ... ok > Bio.Wise.psw docstring test ... 
ok > Bio.Statistics.lowess docstring test ... ok > ---------------------------------------------------------------------- > Ran 107 tests in 97.191 seconds > > > Previously, this code was in test_docstrings.py, but it's easier to do this from run_tests.py because doctest can create a unittest-style test suite directly. This also means that if your module contains docstring tests, you should include the module name to DOCTEST_MODULES near the top of run_tests.py (instead of to test_docstrings.py). > > I've uploaded the new run_tests.py to CVS so people can try it, but we can revert to the previous version of run_tests.py if preferred. If there are no issues with this approach, we can remove test_docstrings.py. > > --Michiel. > > > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > Hi, I ran the latest CVS version through my Python versions on Linux. All appear to pass for Python 2.5 (with and without Numpy) and 2.6. BUT Python 2.4 has an error with docstring tests so it crashes (output below): File "run_tests.py", line 263, in runDocTest module = __import__(name, fromlist=name.split(".")) TypeError: __import__() takes no keyword arguments There is also one failure with Python 2.3 which does not test docstrings: ====================================================================== ERROR: Test Nexus module ---------------------------------------------------------------------- Traceback (most recent call last): File "test_Nexus.py", line 92, in test_NexusTest1 self.assertTrue('codons' in n.charpartitions) AttributeError: 'NexusTest1' object has no attribute 'assertTrue' ---------------------------------------------------------------------- Ran 97 tests in 78.412 seconds Bruce [bsouthey at starling biopython]$ python2.4 setup.py test running test test_Ace ... ok test_AlignIO ... ok test_BioSQL ... skipping. 
Enter your settings in Tests/setup_BioSQL.py (not important if you do not plan to use BioSQL). test_BioSQL_SeqIO ... skipping. Enter your settings in Tests/setup_BioSQL.py (not important if you do not plan to use BioSQL). test_CAPS ... ok test_Clustalw ... ok test_Clustalw_tool ... skipping. Install clustalw or clustalw2 if you want to use Bio.Clustalw. test_Cluster ... ok test_CodonTable ... ok test_CodonUsage ... ok test_Compass ... ok test_Crystal ... ok test_DocSQL ... skipping. Install MySQLdb if you want to use Bio.DocSQL. test_EmbossPrimer ... ok test_Entrez ... ok test_Enzyme ... ok test_FSSP ... ok test_Fasta ... ok test_Fasta2 ... ok test_File ... ok test_GACrossover ... ok test_GAMutation ... ok test_GAOrganism ... ok test_GAQueens ... ok test_GARepair ... ok test_GASelection ... ok test_GFF ... skipping. Environment is not configured for this test (not important if you do not plan to use Bio.GFF). test_GFF2 ... skipping. Install MySQLdb if you want to use Bio.GFF. test_GenBank ... ok test_GenomeDiagram ... skipping. Install reportlab if you want to use Bio.Graphics. test_GraphicsChromosome ... skipping. Install reportlab if you want to use Bio.Graphics. test_GraphicsDistribution ... skipping. Install reportlab if you want to use Bio.Graphics. test_GraphicsGeneral ... skipping. Install reportlab if you want to use Bio.Graphics. test_HMMCasino ... ok test_HMMGeneral ... ok test_HotRand ... ok test_IsoelectricPoint ... ok test_KDTree ... ok test_KEGG ... ok test_KeyWList ... ok test_Location ... ok test_LocationParser ... ok test_LogisticRegression ... ok test_MEME ... ok test_MarkovModel ... ok test_Medline ... ok test_Motif ... ok test_NCBIStandalone ... ok test_NCBIXML ... ok test_NCBI_qblast ... ok test_NNExclusiveOr ... ok test_NNGene ... ok test_NNGeneral ... ok test_Nexus ... ok test_PDB ... ok test_ParserSupport ... ok test_Pathway ... ok test_Phd ... ok test_PopGen_FDist ... skipping. Install FDist if you want to use Bio.PopGen.FDist. 
test_PopGen_FDist_nodepend ... ok test_PopGen_GenePop ... ok test_PopGen_SimCoal ... skipping. Install SIMCOAL2 if you want to use Bio.PopGen.SimCoal. test_PopGen_SimCoal_nodepend ... ok test_ProtParam ... ok test_Restriction ... ok test_SCOP_Astral ... ok test_SCOP_Cla ... ok test_SCOP_Des ... ok test_SCOP_Dom ... ok test_SCOP_Hie ... ok test_SCOP_Raf ... ok test_SCOP_Residues ... ok test_SCOP_Scop ... ok test_SProt ... ok test_SVDSuperimposer ... ok test_SeqIO ... ok test_SeqIO_online ... ok test_SeqUtils ... ok test_SubsMat ... ok test_UniGene ... ok test_Wise ... ok test_align ... ok test_docstrings ... ok test_geo ... ok test_interpro ... ok test_kNN ... ok test_lowess ... ok test_pairwise2 ... ok test_prodoc ... ok test_property_manager ... ok test_prosite ... ok test_prosite2 ... ok test_psw ... ok test_seq ... ok test_translate ... ok test_trie ... ok test_triefind ... ok Traceback (most recent call last): File "/home/bsouthey/python/biopython_cvs/biopython/setup.py", line 418, in ? 
data_files=DATA_FILES, File "/usr/local/lib/python2.4/distutils/core.py", line 149, in setup dist.run_commands() File "/usr/local/lib/python2.4/distutils/dist.py", line 946, in run_commands self.run_command(cmd) File "/usr/local/lib/python2.4/distutils/dist.py", line 966, in run_command cmd_obj.run() File "/home/bsouthey/python/biopython_cvs/biopython/setup.py", line 212, in run run_tests.main([]) File "run_tests.py", line 107, in main runner.run() File "run_tests.py", line 292, in run ok = self.runDocTest(test) File "run_tests.py", line 263, in runDocTest module = __import__(name, fromlist=name.split(".")) TypeError: __import__() takes no keyword arguments From biopython at maubp.freeserve.co.uk Wed Feb 11 09:41:07 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 11 Feb 2009 14:41:07 +0000 Subject: [Biopython-dev] docstring tests In-Reply-To: <5B539D3F-26E0-4260-B98C-C85419B3D427@dalkescientific.com> References: <787613.22831.qm@web62407.mail.re1.yahoo.com> <5aa3b3570902110116j4cfa7992v22db1619fb34c05@mail.gmail.com> <320fb6e00902110248h4dc2071br943402d6ed02082e@mail.gmail.com> <5aa3b3570902110314u3a4727edl2fd78fd14ef4d5aa@mail.gmail.com> <320fb6e00902110516o5bc865co2b6b1f2ccef1e30f@mail.gmail.com> <5B539D3F-26E0-4260-B98C-C85419B3D427@dalkescientific.com> Message-ID: <320fb6e00902110641o4989f899xa8e6f3f51f5218f@mail.gmail.com> On Wed, Feb 11, 2009 at 2:25 PM, Andrew Dalke wrote: > On Feb 11, 2009, at 2:16 PM, Peter wrote: >> >> BioSQL is a special case - we can't have doctests which will work on >> every machine unless the user has installed particular database (e.g. >> MySQL), using particular database names, usernames and passwords. > > Python comes with SQLite. The distribution could ship/install > a small test database with a known schema. Python 2.5+ comes with SQLite, but there isn't (yet) a BioSQL schema for it. That would be nice though, and could make running Biopython and BioSQL easier. 
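To illustrate the SQLite idea in isolation, here is a minimal sketch of a test that builds a throwaway in-memory database in setUp, so no server, username or password is needed. The table `toy_seq` is a hypothetical toy schema invented for this example; it is not the BioSQL schema, which (as noted above) did not yet exist for SQLite.

```python
import sqlite3
import unittest


class SQLiteFixtureTest(unittest.TestCase):
    """Hypothetical test using an in-memory SQLite database."""

    def setUp(self):
        # ":memory:" gives each test its own private, empty database.
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE toy_seq (name TEXT PRIMARY KEY, seq TEXT)")
        self.conn.execute(
            "INSERT INTO toy_seq VALUES (?, ?)", ("demo", "ACGT"))

    def tearDown(self):
        self.conn.close()

    def test_lookup(self):
        row = self.conn.execute(
            "SELECT seq FROM toy_seq WHERE name = ?", ("demo",)).fetchone()
        self.assertEqual(row[0], "ACGT")


suite = unittest.TestLoader().loadTestsFromTestCase(SQLiteFixtureTest)
result = unittest.TestResult()
suite.run(result)
print("ok" if result.wasSuccessful() else "FAIL")
```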
Peter From biopython at maubp.freeserve.co.uk Wed Feb 11 09:45:16 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 11 Feb 2009 14:45:16 +0000 Subject: [Biopython-dev] docstring tests In-Reply-To: <4992E1F0.1000807@gmail.com> References: <787613.22831.qm@web62407.mail.re1.yahoo.com> <4992E1F0.1000807@gmail.com> Message-ID: <320fb6e00902110645q31d5f4b3j705511d0a8ba624@mail.gmail.com> > There is also one failure with Python 2.3 which does not test docstrings: > > ====================================================================== > ERROR: Test Nexus module > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "test_Nexus.py", line 92, in test_NexusTest1 > self.assertTrue('codons' in n.charpartitions) > AttributeError: 'NexusTest1' object has no attribute 'assertTrue' That is because the unittest assertTrue is only available on python 2.4+, so we should add a quick workaround with a note that this can be simplified once we drop Python 2.3 support. 
Peter From bsouthey at gmail.com Wed Feb 11 10:16:41 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Wed, 11 Feb 2009 09:16:41 -0600 Subject: [Biopython-dev] docstring tests In-Reply-To: <320fb6e00902110645q31d5f4b3j705511d0a8ba624@mail.gmail.com> References: <787613.22831.qm@web62407.mail.re1.yahoo.com> <4992E1F0.1000807@gmail.com> <320fb6e00902110645q31d5f4b3j705511d0a8ba624@mail.gmail.com> Message-ID: <4992EBD9.5040403@gmail.com> Peter wrote: >> There is also one failure with Python 2.3 which does not test docstrings: >> >> ====================================================================== >> ERROR: Test Nexus module >> ---------------------------------------------------------------------- >> Traceback (most recent call last): >> File "test_Nexus.py", line 92, in test_NexusTest1 >> self.assertTrue('codons' in n.charpartitions) >> AttributeError: 'NexusTest1' object has no attribute 'assertTrue' >> > > That is because the unittest assertTrue is only available on python > 2.4+, so we should add a quick workaround with a note that this can be > simplified once we drop Python 2.3 support. > > Peter > I think these (as there are more than one) should be using failUnless instead: self.failUnless('codons' in n.charpartitions) From the docstring | failUnless(self, expr, msg=None) | Fail the test unless the expression is true. 
Bruce From biopython at maubp.freeserve.co.uk Wed Feb 11 10:52:05 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 11 Feb 2009 15:52:05 +0000 Subject: [Biopython-dev] docstring tests In-Reply-To: <4992EBD9.5040403@gmail.com> References: <787613.22831.qm@web62407.mail.re1.yahoo.com> <4992E1F0.1000807@gmail.com> <320fb6e00902110645q31d5f4b3j705511d0a8ba624@mail.gmail.com> <4992EBD9.5040403@gmail.com> Message-ID: <320fb6e00902110752i22c06b94te829641abd68ee65@mail.gmail.com> On Wed, Feb 11, 2009 at 3:16 PM, Bruce Southey wrote: >> That is because the unittest assertTrue is only available on python >> 2.4+, so we should add a quick workaround with a note that this can be >> simplified once we drop Python 2.3 support. > > I think these (as there are more than one) should be using failUnless > instead: > self.failUnless('codons' in n.charpartitions) Actually, from further reading, I think we should really be using assert_ (it would have been called assert, but this is a reserved word, so add a trailing underscore as per PEP8). The variants assertTrue and assertFalse were added to match JUnit. See: http://bugs.python.org/issue2249 Fixed in CVS to use assert_ instead of assertTrue. 
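Since the available assertion names differ between Python versions (assertTrue was missing on 2.3, and the failUnless and assert_ spellings were, as far as I know, deprecated later and removed in Python 3.12), one version-tolerant alternative is a small helper that relies only on failureException. This is a sketch of that idea, not what was committed to CVS; `CompatTestCase`, `check_true` and the sample data are hypothetical.

```python
import unittest


class CompatTestCase(unittest.TestCase):
    """Hypothetical base class; not part of Biopython."""

    def check_true(self, expr, msg=None):
        # Uses only failureException, so it does not depend on which
        # assertion method names a given unittest version provides.
        if not expr:
            raise self.failureException(msg or "%r is not true" % (expr,))


class NexusLikeTest(CompatTestCase):
    def test_membership(self):
        # Stand-in for n.charpartitions in the traceback quoted above.
        charpartitions = {"codons": [1, 2, 3]}
        self.check_true("codons" in charpartitions)

    def test_failure_is_reported(self):
        # A false expression must surface as an ordinary test failure.
        self.assertRaises(self.failureException, self.check_true, False)


suite = unittest.TestLoader().loadTestsFromTestCase(NexusLikeTest)
result = unittest.TestResult()
suite.run(result)
print("ok" if result.wasSuccessful() else "FAIL")
```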
Peter From bsouthey at gmail.com Wed Feb 11 11:10:41 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Wed, 11 Feb 2009 10:10:41 -0600 Subject: [Biopython-dev] docstring tests In-Reply-To: <320fb6e00902110752i22c06b94te829641abd68ee65@mail.gmail.com> References: <787613.22831.qm@web62407.mail.re1.yahoo.com> <4992E1F0.1000807@gmail.com> <320fb6e00902110645q31d5f4b3j705511d0a8ba624@mail.gmail.com> <4992EBD9.5040403@gmail.com> <320fb6e00902110752i22c06b94te829641abd68ee65@mail.gmail.com> Message-ID: <4992F881.9000206@gmail.com> Peter wrote: > On Wed, Feb 11, 2009 at 3:16 PM, Bruce Southey wrote: > >>> That is because the unittest assertTrue is only available on python >>> 2.4+, so we should add a quick workaround with a note that this can be >>> simplified once we drop Python 2.3 support. >>> >> I think these (as there are more than one) should be using failUnless >> instead: >> self.failUnless('codons' in n.charpartitions) >> > > Actually, from further reading, I think we should really be using > assert_ (it would have been called assert, but this is a reserved > word, so add a trailing underscore as per PEP8). The variants > assertTrue and assertFalse were added to match JUnit. See: > http://bugs.python.org/issue2249 > > Fixed in CVS to use assert_ instead of assertTrue. > > Peter > Okay. On Python 2.3 the tests now either pass or, where they require Python 2.4+, are skipped, with the final message as expected: 'Docstring tests require Python 2.4 or later; skipping' Thanks Bruce From biopython at maubp.freeserve.co.uk Wed Feb 11 18:00:25 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 11 Feb 2009 23:00:25 +0000 Subject: [Biopython-dev] docstring tests In-Reply-To: <4992E1F0.1000807@gmail.com> References: <787613.22831.qm@web62407.mail.re1.yahoo.com> <4992E1F0.1000807@gmail.com> Message-ID: <320fb6e00902111500x2927f9b9pffde693fb966b94d@mail.gmail.com> On Wed, Feb 11, 2009 at 2:34 PM, Bruce Southey wrote: > ... 
Python 2.4 has an error with docstring tests so it crashes (output below): > File "run_tests.py", line 263, in runDocTest > module = __import__(name, fromlist=name.split(".")) > TypeError: __import__() takes no keyword arguments Fixed in run_tests.py CVS revision 1.22, using ordered arguments instead. This now works on Python 2.4. For Python 2.3 we skip the doctests anyway so this doesn't matter. Peter From biopython at maubp.freeserve.co.uk Thu Feb 12 06:49:53 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 12 Feb 2009 11:49:53 +0000 Subject: [Biopython-dev] docstring tests In-Reply-To: <320fb6e00902111500x2927f9b9pffde693fb966b94d@mail.gmail.com> References: <787613.22831.qm@web62407.mail.re1.yahoo.com> <4992E1F0.1000807@gmail.com> <320fb6e00902111500x2927f9b9pffde693fb966b94d@mail.gmail.com> Message-ID: <320fb6e00902120349u7898e2bate126208c837be913@mail.gmail.com> Hi Michiel (and everyone else), I was looking at how the doctests are currently integrated into run_tests.py, and wondered whether this patch makes things more concise. This patch is against run_tests.py CVS revision 1.22; essentially it adds the doctest modules to the list of tests, rather than keeping them as a separate list. The code becomes slightly shorter, but I am not sure if this is actually clearer or not. Note - this does not address the issue of how to run just the doctests - something I think is very useful when working on them. Peter $ diff run_tests.py run_tests2.py 209,211c209 < if self.tests: < self.doctest_modules = [] < else: --- > if not self.tests: 218c216,222 < self.doctest_modules = DOCTEST_MODULES --- > if sys.version_info[:2] < (2, 4): > #On python 2.3, doctest uses slightly different formatting > #which would be a problem as the expected output won't match. > #Also, it can't cope with <BLANKLINE> in a doctest string. 
> sys.stderr.write("Skipping doctests which require Python 2.4+\n") > else : > self.tests.extend(DOCTEST_MODULES) 234,240c238,253 < module = __import__(name) < suite = unittest.TestLoader().loadTestsFromModule(module) < if suite.countTestCases()==0: < # This is a print-and-compare test instead of a unittest- < # type test. < test = ComparisonTestCase(name, output) < suite = unittest.TestSuite([test]) --- > if "." in name : > #Its a doc test > #Can't use fromlist=name.split(".") until python 2.5+ > module = __import__(name, None, None, name.split(".")) > suite = doctest.DocTestSuite(module) > del module > else : > #Its a unittest (or a print-and-compare test) > suite = unittest.TestLoader().loadTestsFromName(name) > if suite.countTestCases()==0: > # This is a print-and-compare test instead of a > # unittest-type test. > test = ComparisonTestCase(name, output) > suite = unittest.TestSuite([test]) 263,277d275 < def runDocTest(self, name): < #Can't use fromlist=name.split(".") until python 2.5+ < module = __import__(name, None, None, name.split(".")) < sys.stderr.write("%s docstring test ... " % module.__name__) < suite = doctest.DocTestSuite(module) < result = self._makeResult() < suite.run(result) < if result.wasSuccessful(): < sys.stderr.write("ok\n") < return True < else: < sys.stderr.write("FAIL\n") < result.printErrors() < return False < 287,297d284 < if sys.version_info[:2] < (2, 4): < #On python 2.3, doctest uses slightly different formatting < #which would be a problem as the expected output won't match. < #Also, it can't cope with <BLANKLINE> in a doctest string. 
< sys.stderr.write("Docstring tests require Python 2.4 or later; skipping\n") < else: < for test in self.doctest_modules: < ok = self.runDocTest(test) < if not ok: < failures += 1 < total += 1 From bugzilla-daemon at portal.open-bio.org Thu Feb 12 08:11:10 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 12 Feb 2009 08:11:10 -0500 Subject: [Biopython-dev] [Bug 2759] New: Unit test for Bio.PDB.HSExposure Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2759 Summary: Unit test for Bio.PDB.HSExposure Product: Biopython Version: Not Applicable Platform: PC OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: Unit Tests AssignedTo: biopython-dev at biopython.org ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk Prompted by looking at the example script hsexpo, I've written a unittest-based test for Bio.PDB.HSExposure. I haven't checked it in yet because it prints out lots of warnings to stderr about oddities in the PDB file. We could either add a clean PDB file to the examples, or do something with the stderr. Note that the print-and-compare style test_PDB.py deals with this itself. Perhaps run_tests.py should do something similar for the unittest-based cases. See also Bug 2754 which would actually make Bio.PDB print even more warnings to stderr. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
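One possible way to "do something with the stderr" in a unittest-based test is to capture it for the duration of the noisy call. The sketch below assumes a current Python 3 (contextlib.redirect_stderr did not exist in 2009), and `noisy_parse` is a hypothetical stand-in for a parser that warns on stderr; it is not the Bio.PDB API.

```python
import contextlib
import io
import sys
import unittest


def noisy_parse():
    """Hypothetical parser that warns on stderr and returns a result."""
    sys.stderr.write("WARNING: chain A is discontinuous\n")
    return 42


class QuietParseTest(unittest.TestCase):
    def test_parse(self):
        captured = io.StringIO()
        # Anything written to sys.stderr inside the block lands in
        # `captured` instead of cluttering the test runner's output.
        with contextlib.redirect_stderr(captured):
            value = noisy_parse()
        self.assertEqual(value, 42)
        # The warnings stay available if the test wants to check them.
        self.assertIn("WARNING", captured.getvalue())


suite = unittest.TestLoader().loadTestsFromTestCase(QuietParseTest)
result = unittest.TestResult()
suite.run(result)
print("ok" if result.wasSuccessful() else "FAIL")
```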
From bugzilla-daemon at portal.open-bio.org Thu Feb 12 08:12:58 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 12 Feb 2009 08:12:58 -0500 Subject: [Biopython-dev] [Bug 2759] Unit test for Bio.PDB.HSExposure In-Reply-To: Message-ID: <200902121312.n1CDCw8h028040@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2759 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-12 08:12 EST ------- Created an attachment (id=1234) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1234&action=view) New unit test for Bio.PDB.HSExposure This does not cover the DSSP or residue depth calculation, as these require 3rd party tools (DSSP and MSMS) to be installed. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Feb 12 08:14:13 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 12 Feb 2009 08:14:13 -0500 Subject: [Biopython-dev] [Bug 2754] Bio.PDB: Parse warnings should print to stderr, not stdout In-Reply-To: Message-ID: <200902121314.n1CDEDgM028550@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2754 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-12 08:14 EST ------- This may have implications for how we write further Bio.PDB unit tests, see Bug 2759. [I still agree that any warnings from Bio.PDB should go to stderr rather than stdout] -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Thu Feb 12 08:36:26 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 12 Feb 2009 08:36:26 -0500 Subject: [Biopython-dev] [Bug 2759] Unit test for Bio.PDB.HSExposure In-Reply-To: Message-ID: <200902121336.n1CDaQfQ004937@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2759 ------- Comment #2 from dalloliogm at gmail.com 2009-02-12 08:36 EST ------- (In reply to comment #1) > Created an attachment (id=1234) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1234&action=view) [details] > New unit test for Bio.PDB.HSExposure > > This does not cover the DSSP or residue depth calculation, as these require 3rd > party tools (DSSP and MSMS) to be installed. > Can I suggest a small refactoring of the unit test? I would move all the asserts in setUp into separate functions. Then it would be good to also move the call to PDB.PDBStructure into a global fixture, to avoid repeating it for every test. Moreover, I would generalize all the known values and store them as variables, so that later you can apply the same test to other files just by subclassing the test. Let me know what you think... :) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Thu Feb 12 08:37:44 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 12 Feb 2009 08:37:44 -0500 Subject: [Biopython-dev] [Bug 2759] Unit test for Bio.PDB.HSExposure In-Reply-To: Message-ID: <200902121337.n1CDbikM005433@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2759 ------- Comment #3 from dalloliogm at gmail.com 2009-02-12 08:37 EST ------- Created an attachment (id=1235) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1235&action=view) some of the suggestions I made in the previous comment -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Feb 12 09:13:45 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 12 Feb 2009 09:13:45 -0500 Subject: [Biopython-dev] [Bug 2759] Unit test for Bio.PDB.HSExposure In-Reply-To: Message-ID: <200902121413.n1CEDjdZ017446@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2759 ------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-12 09:13 EST ------- (In reply to comment #2) > > Can I suggest you a small refactoring of the test unit? > I would move all the asserts in setUp to different functions. Maybe. They were really to check the file had loaded as I expected, so that the later tests are checking the residues I expect them to. > Then, it would be good to put also the call to PDB.PDBStructure to a global > fixture, to avoid to repeat it for every test. NO! That would be a very bad idea here. The HSExposure calls MODIFY the model passed to them, so for a clean test we NEED a fresh model each time. I suppose we could read the structure in once, and then make a copy for each sub-test, but I think it is clearer as it is. 
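The "read once, copy for each sub-test" variant mentioned above could look roughly like this. It is a sketch only: `PARSED_ONCE` is a hypothetical stand-in for a parsed structure and `annotate` for a call that, like the HSExposure calls, modifies the model it is given.

```python
import copy
import unittest

# Hypothetical stand-in for a structure that is slow to parse.
PARSED_ONCE = {"residues": ["MET", "ALA"], "exposure": {}}


def annotate(model):
    """Mutates its argument, as the HSExposure calls are said to do."""
    model["exposure"]["MET"] = 1.0


class FreshModelTest(unittest.TestCase):
    def setUp(self):
        # Each test method gets its own deep copy, so side effects of
        # one test cannot leak into another, whatever the run order.
        self.model = copy.deepcopy(PARSED_ONCE)

    def test_annotate(self):
        annotate(self.model)
        self.assertEqual(self.model["exposure"], {"MET": 1.0})

    def test_starts_clean(self):
        self.assertEqual(self.model["exposure"], {})


suite = unittest.TestLoader().loadTestsFromTestCase(FreshModelTest)
result = unittest.TestResult()
suite.run(result)
print("ok" if result.wasSuccessful() else "FAIL")
```

If both tests shared `PARSED_ONCE` directly, `test_starts_clean` would pass or fail depending on whether `test_annotate` had already run.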
In general, having "global fixtures" is risky. The individual test methods may have side effects (like the changes to the residues in the model in this case), meaning that the overall behaviour will depend on the order the individual test methods are called in. One of the big benefits of using the unittest framework is that each test method is run in a clean known environment (compare this to our print-and-compare scripts, where this isn't the case). Using "global fixtures" shares objects between the individual tests and breaks this. The only good reason I can think of for having a global-setUp method (called once only) rather than the current setUp method (called for each test method) is if the set up code is very slow. > Moreover, I will generalize all the known values and put them as variables, > so later you will be able to apply the same test to other files by just > subclassing the test. You would also have to extract the individual exposure scores. It would be simple to get these (and the residue names) as lists, and then check every single residue matches the expected values (rather than the short cut I used to just check the first few and the last few). We could also check any other chains in the structure (not just chain A). These changes are probably a good idea if we ever wanted to extend this unittest to try other PDB files as well, but seemed unnecessary for testing the basics of the Bio.PDB.HSExposure module. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
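The subclassing idea from this exchange (known values stored as variables, with one subclass per input file) can be sketched as follows. All names and values here are hypothetical; plain lists stand in for residue names and exposure scores that a real test would extract from a PDB file.

```python
import unittest


class ExposureChecks(object):
    """Shared checks; subclasses supply the data and the known values."""

    residue_names = None     # names extracted from the structure
    expected_names = None    # known-good names for that structure
    scores = None            # exposure score per residue
    expected_scores = None   # known-good scores

    def test_every_residue_name(self):
        self.assertEqual(self.residue_names, self.expected_names)

    def test_every_score(self):
        self.assertEqual(self.scores, self.expected_scores)


class FirstFileTest(ExposureChecks, unittest.TestCase):
    residue_names = ["MET", "ALA", "GLY"]
    expected_names = ["MET", "ALA", "GLY"]
    scores = [3, 7, 5]
    expected_scores = [3, 7, 5]


class SecondFileTest(ExposureChecks, unittest.TestCase):
    residue_names = ["SER", "THR"]
    expected_names = ["SER", "THR"]
    scores = [2, 9]
    expected_scores = [2, 9]


loader = unittest.TestLoader()
suite = unittest.TestSuite([
    loader.loadTestsFromTestCase(FirstFileTest),
    loader.loadTestsFromTestCase(SecondFileTest),
])
result = unittest.TestResult()
suite.run(result)
print("ran %d tests, ok=%s" % (result.testsRun, result.wasSuccessful()))
```

Keeping the checks in a plain mixin rather than in a TestCase means the base class itself is never collected and run without data.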
From bugzilla-daemon at portal.open-bio.org Thu Feb 12 09:31:59 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 12 Feb 2009 09:31:59 -0500 Subject: [Biopython-dev] [Bug 2754] Bio.PDB: Parse warnings should print to stderr, not stdout In-Reply-To: Message-ID: <200902121431.n1CEVxhk023680@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2754 ------- Comment #2 from eric.talevich at gmail.com 2009-02-12 09:31 EST ------- Created an attachment (id=1236) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1236&action=view) Print errors and warnings in Bio.PDB to sys.stderr I left the test scripts after "if __name__ == '__main__'" printing at stdout since those messages are meant to be the output of the script if it's run directly. There are some apparent debugging print statements in MMCIF2Dict, commented out. I didn't touch them. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bsouthey at gmail.com Thu Feb 12 09:39:03 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Thu, 12 Feb 2009 08:39:03 -0600 Subject: [Biopython-dev] docstring tests In-Reply-To: <320fb6e00902111500x2927f9b9pffde693fb966b94d@mail.gmail.com> References: <787613.22831.qm@web62407.mail.re1.yahoo.com> <4992E1F0.1000807@gmail.com> <320fb6e00902111500x2927f9b9pffde693fb966b94d@mail.gmail.com> Message-ID: <49943487.9010509@gmail.com> Peter wrote: > On Wed, Feb 11, 2009 at 2:34 PM, Bruce Southey wrote: > >> ... Python 2.4 has an error with docstring tests so it crashes (output below): >> File "run_tests.py", line 263, in runDocTest >> module = __import__(name, fromlist=name.split(".")) >> TypeError: __import__() takes no keyword arguments >> > > Fixed in run_tests.py CVS revision 1.22, using ordered arguments > instead. This now works on Python 2.4. 
For Python 2.3 we skip the > doctests anyway so this doesn't matter. > > Peter > Hi, Thanks! I just updated from CVS and all the tests currently pass on Linux for Python versions 2.3 (no doctests), 2.4, 2.5 (with and without numpy) and 2.6. Bruce From bugzilla-daemon at portal.open-bio.org Thu Feb 12 10:09:29 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 12 Feb 2009 10:09:29 -0500 Subject: [Biopython-dev] [Bug 2759] Unit test for Bio.PDB.HSExposure In-Reply-To: Message-ID: <200902121509.n1CF9TMo003270@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2759 ------- Comment #5 from dalloliogm at gmail.com 2009-02-12 10:09 EST ------- (In reply to comment #4) > > Then, it would be good to put also the call to PDB.PDBStructure to a global > > fixture, to avoid to repeat it for every test. > > NO! That would be a very bad idea here. The HSExposure calls MODIFY the model > passed to them, so for a clean test we NEED a fresh model each time. Now I see it, you're right! > > Moreover, I will generalize all the know values and put them as variables, > > so later you will be able to apply the same test to other files by just > > subclassing the test. > > You would also have to extract the individual exposure scores. It would be > simple to get these (and the residue names) as lists, and then check every > single residue matches the expected values (rather than the short cut I used to ok, I did it.. > -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
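On the `__import__` fix mentioned in the "docstring tests" message above: the sketch below shows the positional form, `__import__(name, globals, locals, fromlist)`, which avoids the keyword arguments that Python 2.4 rejected. The helper name `import_dotted` is an assumption made for illustration; it is not taken from run_tests.py.

```python
def import_dotted(name):
    """Import a dotted module name and return the innermost module.

    Only positional arguments are used:
    __import__(name, globals, locals, fromlist).
    A non-empty fromlist makes __import__ return the module named by
    the full dotted path rather than the top-level package.
    """
    return __import__(name, None, None, name.split("."))
```

For example, `import_dotted("xml.dom.minidom")` returns the `xml.dom.minidom` module, whereas a bare `__import__("xml.dom.minidom")` returns the top-level `xml` package. On Python 2.7 and later, `importlib.import_module(name)` does the same job more directly.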
From bugzilla-daemon at portal.open-bio.org Thu Feb 12 10:13:44 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 12 Feb 2009 10:13:44 -0500 Subject: [Biopython-dev] [Bug 2759] Unit test for Bio.PDB.HSExposure In-Reply-To: Message-ID: <200902121513.n1CFDiqm004524@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2759 ------- Comment #6 from dalloliogm at gmail.com 2009-02-12 10:13 EST ------- Created an attachment (id=1237) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1237&action=view) proposal of refactoring for test_PDB I have refactored the test and moved all the known values into separate variables. Now it should be very easy to test other pdb files and conditions: just subclass this test, and redefine the values of residue_number, pdb_filename, expected_values, etc... I left the setUpAll method commented out, as it does no harm there... even if it were not commented out, it would not be executed by the normal unittest framework (and under nose it would just mean one extra execution). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
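The subclassing approach described in comment #6 can be sketched roughly as follows. This is not the attached patch: `count_residues()` and its table of fake files are hypothetical stand-ins for parsing a structure, and `other.pdb` is an invented file name. Known values live in class attributes, so a subclass reuses every test method with different inputs:

```python
import unittest


def count_residues(filename):
    """Hypothetical stand-in for parsing a structure file."""
    fake_files = {"a_structure.pdb": 3, "other.pdb": 5}
    return fake_files[filename]


class StructureTestBase(unittest.TestCase):
    # Known values are class attributes so subclasses can override them.
    pdb_filename = "a_structure.pdb"
    expected_values = 3

    def test_residue_count(self):
        self.assertEqual(count_residues(self.pdb_filename),
                         self.expected_values)


class OtherStructureTest(StructureTestBase):
    # Same test methods, different input file and expected values.
    pdb_filename = "other.pdb"
    expected_values = 5
```

Both classes are collected by the normal unittest loader, so the inherited `test_residue_count` runs once per class.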
From bugzilla-daemon at portal.open-bio.org Thu Feb 12 10:14:25 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 12 Feb 2009 10:14:25 -0500 Subject: [Biopython-dev] [Bug 2759] Unit test for Bio.PDB.HSExposure In-Reply-To: Message-ID: <200902121514.n1CFEPYP004828@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2759 dalloliogm at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1235 is|0 |1 obsolete| | -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Feb 12 10:14:33 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 12 Feb 2009 10:14:33 -0500 Subject: [Biopython-dev] [Bug 2754] Bio.PDB: Parse warnings should print to stderr, not stdout In-Reply-To: Message-ID: <200902121514.n1CFEXbS004896@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2754 ------- Comment #3 from bsouthey at gmail.com 2009-02-12 10:14 EST ------- (In reply to comment #2) > Created an attachment (id=1236) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1236&action=view) [details] > Print errors and warnings in Bio.PDB to sys.stderr (In reply to comment #1) > This may have implications for how we write further Bio.PDB unit tests, see Bug > 2759. > > [I still agree that any warnings from Bio.PDB should go to stderr rather than > stdout] > I believe that we should be using the Python warnings module for these types of messages: http://docs.python.org/library/warnings.html This permits the user to have greater control over the output and also allows redirecting the output as required. In the Bio directory, there are currently 36 and 25 uses of stderr and stdout, respectively. 
In terms of the patch, my limited understanding is that local import sys will override any global redirection of the output which in my opinion is a bad idea. Further, it probably has implications for the current test_PDB.py (grepping stderr):
test_PDB.py:14:# Redirect stderr so user does not see warnings
test_PDB.py:37: # Class to hide stderr output
test_PDB.py:94:old_stderr = sys.stderr
test_PDB.py:95:# Hide stderr output for user
test_PDB.py:96:sys.stderr=TheVoid()
test_PDB.py:100: sys.stderr = old_stderr
Also redirection is already being used by the PDB module (from grepping):
PDB/NACCESS.py:51: stdout = out.readlines()
PDB/NACCESS.py:53: stderr = err.readlines()
Bruce -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Feb 12 10:27:04 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 12 Feb 2009 10:27:04 -0500 Subject: [Biopython-dev] [Bug 2754] Bio.PDB: Parse warnings should print to stderr, not stdout In-Reply-To: Message-ID: <200902121527.n1CFR4Mj009728@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2754 ------- Comment #4 from dalloliogm at gmail.com 2009-02-12 10:27 EST ------- (In reply to comment #3) > > I believe that we should be using the using Python warnings module for these > types of messages: > http://docs.python.org/library/warnings.html And what about the logging module? It allows configuration, personalization of the output, etc.. - http://docs.python.org/library/logging.html?highlight=logging#module-logging -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
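One way the two suggestions above (warnings and logging) can coexist is sketched below, assuming Python 3 (`logging.captureWarnings` first appeared in Python 2.7, so it postdates this thread). The library side only calls `warnings.warn()`; an application that prefers logging opts in by routing warnings to the `py.warnings` logger. `ParseWarning`, `parse_line()` and `parse_with_logging()` are hypothetical names for illustration, not Bio.PDB code.

```python
import io
import logging
import warnings


class ParseWarning(UserWarning):
    """Hypothetical warning category for recoverable parse problems."""


def parse_line(line):
    """Toy library function: warn instead of printing to stdout/stderr."""
    if not line.startswith("ATOM"):
        warnings.warn("Ignoring unrecognised record: %r" % line, ParseWarning)
        return None
    return line.split()


def parse_with_logging(lines):
    """Application side: send warnings through the logging system."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    logger = logging.getLogger("py.warnings")  # logger used by captureWarnings
    logger.addHandler(handler)
    logging.captureWarnings(True)
    try:
        with warnings.catch_warnings():
            # Report every occurrence, regardless of any global filters.
            warnings.simplefilter("always")
            records = [parse_line(line) for line in lines]
    finally:
        logging.captureWarnings(False)
        logger.removeHandler(handler)
    return records, stream.getvalue()
```

`parse_with_logging(["ATOM 1 N", "HELLO"])` returns the parsed records together with the captured log text, which names the `ParseWarning` category and the offending line. A library written this way never has to configure logging itself.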
From bugzilla-daemon at portal.open-bio.org Thu Feb 12 11:01:10 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 12 Feb 2009 11:01:10 -0500 Subject: [Biopython-dev] [Bug 2759] Unit test for Bio.PDB.HSExposure In-Reply-To: Message-ID: <200902121601.n1CG1Avv022384@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2759 dalloliogm at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1237 is|0 |1 obsolete| | ------- Comment #7 from dalloliogm at gmail.com 2009-02-12 11:01 EST ------- Created an attachment (id=1238) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1238&action=view) proposal of refactoring for test_PDB (fixed some errors) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Feb 12 12:00:53 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 12 Feb 2009 12:00:53 -0500 Subject: [Biopython-dev] [Bug 2754] Bio.PDB: Parse warnings should print to stderr, not stdout In-Reply-To: Message-ID: <200902121700.n1CH0r4G010835@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2754 ------- Comment #5 from eric.talevich at gmail.com 2009-02-12 12:00 EST ------- (In reply to comment #3-4) Using warnings.warn() sounds right. That module is used in other places in Biopython, but not in Bio.PDB yet. > And what about the logging module? > It allows configuration, personalization of the output, etc.. The logging module is probably overkill for a library, I think. 
It's very flexible, but the setup is kind of tedious, and generally an application using both Biopython and the logging module would figure out how to raise the warnings as exceptions, re-capture them, and log them in whatever customized way is needed. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Feb 12 13:08:01 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 12 Feb 2009 13:08:01 -0500 Subject: [Biopython-dev] [Bug 2754] Bio.PDB: Parse warnings should print to stderr, not stdout In-Reply-To: Message-ID: <200902121808.n1CI81oJ031358@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2754 ------- Comment #6 from eric.talevich at gmail.com 2009-02-12 13:08 EST ------- Created an attachment (id=1239) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1239&action=view) Use the warnings module for printing warnings I grepped Bio/PDB for stderr and replaced what looked like warning messages with calls to warnings.warn(). A couple of files need further attention: StructureBuilder.py: Every warning is protected by "if __debug__:", which seems like something the warnings module itself should cover. PDBParser.py: Parsing exceptions are caught and passed to _handle_PDB_exception, which then decides whether to re-raise the exception or just issue a warning. The warnings module should be able to cover some of this functionality. There's also a feature to only show the first instance of the same warning triggered by the same line, which would make the output from parsing semi-malformed PDB files less annoying in permissive mode. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
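The "re-raise or just warn" decision described in comment #6 maps fairly directly onto warning filters. The following is a sketch under stated assumptions, not the attached patch: `ConstructionWarning`, `add_atom()` and `build()` are hypothetical names, and `catch_warnings(record=True)` requires Python 2.6 or later.

```python
import warnings


class ConstructionWarning(UserWarning):
    """Hypothetical category for recoverable structure-building problems."""


def add_atom(seen, name):
    """Toy builder step: a duplicate atom triggers a warning, not a print."""
    if name in seen:
        warnings.warn("Atom %s defined twice" % name, ConstructionWarning)
        return False
    seen.add(name)
    return True


def build(names, strict):
    """Return (atoms kept, warning messages); raise in strict mode."""
    seen = set()
    with warnings.catch_warnings(record=True) as caught:
        # "error" turns the warning into an exception;
        # "always" records every occurrence instead.
        action = "error" if strict else "always"
        warnings.simplefilter(action, ConstructionWarning)
        for name in names:
            add_atom(seen, name)
    return sorted(seen), [str(w.message) for w in caught]
```

In permissive mode `build(["N", "CA", "N"], strict=False)` keeps going and reports the duplicate; in strict mode the same input raises `ConstructionWarning`. The "show only the first instance" behaviour mentioned above corresponds to the `"default"` filter action (once per location) or `"once"` (once per message, regardless of location).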
From bugzilla-daemon at portal.open-bio.org Thu Feb 12 14:14:12 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 12 Feb 2009 14:14:12 -0500 Subject: [Biopython-dev] [Bug 2754] Bio.PDB: Parse warnings should print to stderr, not stdout In-Reply-To: Message-ID: <200902121914.n1CJECTH014825@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2754 ------- Comment #7 from eric.talevich at gmail.com 2009-02-12 14:14 EST ------- (In reply to comment #6) Also, as Bruce and Peter implied may happen, this patch clobbers test_PDB.py. Some options: 1. Redirect stderr to stdout, and modify Tests/output/test_PDB to match again. 2. Change test_PDB.py to check the exceptions separately, maybe converting it to a unittest-style test in the process. Maybe also splitting a_structure.pdb into multiple files, with one bug each. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Feb 12 14:38:33 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 12 Feb 2009 14:38:33 -0500 Subject: [Biopython-dev] [Bug 2754] Bio.PDB: Parse warnings should print to stderr, not stdout In-Reply-To: Message-ID: <200902121938.n1CJcXmZ020200@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2754 ------- Comment #8 from bsouthey at gmail.com 2009-02-12 14:38 EST ------- (In reply to comment #7) > (In reply to comment #6) > > Also, as Bruce and Peter implied may happen, this patch clobbers test_PDB.py. > Some options: > > 1. Redirect stderr to stdout, and modify Tests/output/test_PDB to match again. > > 2. Change test_PDB.py to check the exceptions separately, maybe converting it > to a unittest-style test in the process. 
Maybe also splitting a_structure.pdb > into multiple files, with one bug each. > You know more about this than I do. But I think that test_PDB.py must get rewritten partly because of the text it prints and lack of coverage (like retrieving a PDB file online). But really it should be checking these corner cases are handled correctly. So if there is an error in the PDB file then the test should check that the error reported is the correct message for that error. For example, running the test from the command line the first message is: PDBConstructionException: Atom N defined twice in residue at line 19. Exception ignored. Is that correct or desired output? The actual error is in my mind irrelevant although I do wonder why a special exception is used. (In reply to comment #6) There are a few cases of this so I think a separate bug should be filed. But cleaning these up would be appreciated, at least by me. Just my couple of cents, Bruce -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bsouthey at gmail.com Thu Feb 12 16:08:42 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Thu, 12 Feb 2009 15:08:42 -0600 Subject: [Biopython-dev] FYI LWN article 318699 How patches get into the mainline Message-ID: <49948FDA.3080301@gmail.com> Hi, Just thought this might be interesting since we have been talking about git. Jonathan Corbet wrote this article (How patches get into the mainline), where he traced patches for Graphviz (article will be available to all next week). 
http://lwn.net/SubscriberLink/318699/1df097b75e861618/ Bruce From mjldehoon at yahoo.com Fri Feb 13 03:34:46 2009 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 13 Feb 2009 00:34:46 -0800 (PST) Subject: [Biopython-dev] docstring tests In-Reply-To: <320fb6e00902110248h4dc2071br943402d6ed02082e@mail.gmail.com> Message-ID: <189305.12645.qm@web62407.mail.re1.yahoo.com> > Do you think we need a way to run just the doctests? > Before we could > do "python run_tests.py test_docstring.py" to do > this. We could add an option "doctest": python run_tests.py doctest runs the doctests only; python run_tests.py test_Cluster doctest runs test_Cluster.py and the doctests, etc. --Michiel From bugzilla-daemon at portal.open-bio.org Fri Feb 13 06:40:40 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 13 Feb 2009 06:40:40 -0500 Subject: [Biopython-dev] [Bug 2760] New: proposal: enhancement for SeqIO.TabIO Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2760 Summary: proposal: enhancement for SeqIO.TabIO Product: Biopython Version: Not Applicable Platform: PC OS/Version: Linux Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: dalloliogm at gmail.com This patch fixes a problem that TabIO had (it fails if there are more than two tabs, or spaces instead of tabs, between the title and the sequence), and introduces a check to skip empty lines. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Fri Feb 13 06:41:09 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 13 Feb 2009 06:41:09 -0500 Subject: [Biopython-dev] [Bug 2760] proposal: enhancement for SeqIO.TabIO In-Reply-To: Message-ID: <200902131141.n1DBf9p1018277@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2760 ------- Comment #1 from dalloliogm at gmail.com 2009-02-13 06:41 EST ------- Created an attachment (id=1240) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1240&action=view) TabIO patch -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From mjldehoon at yahoo.com Fri Feb 13 07:12:38 2009 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 13 Feb 2009 04:12:38 -0800 (PST) Subject: [Biopython-dev] docstring tests In-Reply-To: <320fb6e00902120349u7898e2bate126208c837be913@mail.gmail.com> Message-ID: <992871.86988.qm@web62403.mail.re1.yahoo.com> Thanks for the patch. I've updated run_tests.py along these lines, and I added an option "doctest" to specify running the doctests: $ python run_tests.py doctest Bio.Seq docstring test ... ok Bio.SeqRecord docstring test ... ok Bio.SeqIO docstring test ... ok Bio.Align.Generic docstring test ... ok Bio.AlignIO docstring test ... ok Bio.KEGG.Compound docstring test ... ok Bio.KEGG.Enzyme docstring test ... ok Bio.Wise docstring test ... ok Bio.Wise.psw docstring test ... ok Bio.Statistics.lowess docstring test ... 
ok
----------------------------------------------------------------------
Ran 10 tests in 0.726 seconds

--- On Thu, 2/12/09, Peter wrote:

> From: Peter
> Subject: Re: [Biopython-dev] docstring tests
> To: mjldehoon at yahoo.com
> Cc: biopython-dev at biopython.org
> Date: Thursday, February 12, 2009, 6:49 AM
> Hi Michiel (and everyone else),
>
> I was wondering about how the doctests are currently integrated into
> run_tests.py, and wondered if this patch makes things more concise?
> This patch is against run_tests.py CVS revision 1.22, essentially it
> adds the doctest modules to the list of tests - rather than as a
> separate list. The code becomes slightly shorter, but I am not sure
> if this is actually clearer or not.
>
> Note - this does not address the issue of how to run just the doctests
> - something I think is very useful when working on them.
>
> Peter
>
> $ diff run_tests.py run_tests2.py
> 209,211c209
> < if self.tests:
> <     self.doctest_modules = []
> < else:
> ---
> > if not self.tests:
> 218c216,222
> < self.doctest_modules = DOCTEST_MODULES
> ---
> > if sys.version_info[:2] < (2, 4):
> >     #On python 2.3, doctest uses slightly different formatting
> >     #which would be a problem as the expected output won't match.
> >     #Also, it can't cope with in a doctest string.
> >     sys.stderr.write("Skipping doctests which require Python 2.4+\n")
> > else :
> >     self.tests.extend(DOCTEST_MODULES)
> 234,240c238,253
> < module = __import__(name)
> < suite = unittest.TestLoader().loadTestsFromModule(module)
> < if suite.countTestCases()==0:
> <     # This is a print-and-compare test instead of a unittest-
> <     # type test.
> <     test = ComparisonTestCase(name, output)
> <     suite = unittest.TestSuite([test])
> ---
> > if "." in name :
> >     #Its a doc test
> >     #Can't use fromlist=name.split(".") until python 2.5+
> >     module = __import__(name, None, None, name.split("."))
> >     suite = doctest.DocTestSuite(module)
> >     del module
> > else :
> >     #Its a unittest (or a print-and-compare test)
> >     suite = unittest.TestLoader().loadTestsFromName(name)
> >     if suite.countTestCases()==0:
> >         # This is a print-and-compare test instead of a
> >         # unittest-type test.
> >         test = ComparisonTestCase(name, output)
> >         suite = unittest.TestSuite([test])
> 263,277d275
> < def runDocTest(self, name):
> <     #Can't use fromlist=name.split(".") until python 2.5+
> <     module = __import__(name, None, None, name.split("."))
> <     sys.stderr.write("%s docstring test ... " % module.__name__)
> <     suite = doctest.DocTestSuite(module)
> <     result = self._makeResult()
> <     suite.run(result)
> <     if result.wasSuccessful():
> <         sys.stderr.write("ok\n")
> <         return True
> <     else:
> <         sys.stderr.write("FAIL\n")
> <         result.printErrors()
> <         return False
> <
> 287,297d284
> < if sys.version_info[:2] < (2, 4):
> <     #On python 2.3, doctest uses slightly different formatting
> <     #which would be a problem as the expected output won't match.
> <     #Also, it can't cope with in a doctest string.
> <     sys.stderr.write("Docstring tests require Python 2.4 or later; skipping\n")
> < else:
> <     for test in self.doctest_modules:
> <         ok = self.runDocTest(test)
> <         if not ok:
> <             failures += 1
> <         total += 1

From mjldehoon at yahoo.com Fri Feb 13 07:16:36 2009 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 13 Feb 2009 04:16:36 -0800 (PST) Subject: [Biopython-dev] Updated the documentation of the Biopython testing framework In-Reply-To: <320fb6e00902110329s4c84dab8w9120cf480fd84437@mail.gmail.com> Message-ID: <975833.90231.qm@web62402.mail.re1.yahoo.com> I've added some docstring examples to the unittest section. 
--Michiel --- On Wed, 2/11/09, Peter wrote: > From: Peter > Subject: Re: [Biopython-dev] Updated the documentation of the Biopython testing framework > To: mjldehoon at yahoo.com > Cc: biopython-dev at biopython.org > Date: Wednesday, February 11, 2009, 6:29 AM > On Wed, Feb 11, 2009 at 5:51 AM, Michiel de Hoon > wrote: > > Hi everybody, > > > > I've updated the section in the tutorial about the > Biopython testing framework. > > This description includes the examples that were > previously in > > Doc/cookbook/biopython_test. I haven't uploaded > this to CVS yet, but the > > HTML version of the tutorial is viewable here: > > > > > http://biopython.org/DIST/docs/tutorial/Tutorial.proposal.html > > > > If there are no objections, I'll upload the new > tutorial to CVS. > > > > --Michiel. > > In the unittest example could you add simple docstrings, so > that the > printed output is nicer? Otherwise thus far I have only > skimmed the > content, it looks good. > > Peter From bugzilla-daemon at portal.open-bio.org Fri Feb 13 07:46:40 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 13 Feb 2009 07:46:40 -0500 Subject: [Biopython-dev] [Bug 2760] proposal: enhancement for SeqIO.TabIO In-Reply-To: Message-ID: <200902131246.n1DCkeY1003356@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2760 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Severity|normal |enhancement ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-13 07:46 EST ------- The "tab" format in Bio.SeqIO was explicitly ONLY for simple tab files with two fields (see Bug 2533). Perhaps a more helpful error message would be a good idea. If there are more than two fields, determining which are the title and sequence is complicated. 
Your code seems to assume these are the first two fields, and ignores the rest - which may work in some cases. Do you have some specific examples of tab separated files you want to read in using Bio.SeqIO? I am particularly interested in files from other software packages (not ones you created yourself). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Feb 13 08:00:09 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 13 Feb 2009 08:00:09 -0500 Subject: [Biopython-dev] [Bug 2760] proposal: enhancement for SeqIO.TabIO In-Reply-To: Message-ID: <200902131300.n1DD09jK006776@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2760 ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-13 08:00 EST ------- (In reply to comment #0) > > this patch ..., and introduces a check to skip empty lines. > That change is probably a good idea, but note that rather than:
    if line != "" :
        #Do stuff...
I believe the following is considered better python style:
    if line :
        #Do stuff...
I have updated CVS to ignore blank lines, and to give a more helpful ValueError when trying to parse invalid files. See revision 1.2, http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/SeqIO/TabIO.py?cvsroot=biopython Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
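The behaviour described in comment #3 (skip blank lines, raise a helpful ValueError for anything that is not exactly two tab-separated fields) could look roughly like the sketch below. It is an illustration only, not the code from Bio/SeqIO/TabIO.py revision 1.2; the function name `parse_tab_lines` and the wording of the error message are assumptions.

```python
def parse_tab_lines(lines):
    """Yield (identifier, sequence) pairs from two-column tab-separated lines."""
    for number, line in enumerate(lines, 1):
        line = line.strip()
        if not line:
            # Blank or whitespace-only line: skip it (the "if line" idiom).
            continue
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError("Line %i has %i tab-separated fields, expected 2"
                             % (number, len(fields)))
        yield fields[0], fields[1]
```

Rejecting lines with extra fields, instead of silently taking the first two, keeps the parser from guessing which columns hold the title and the sequence.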
From bsouthey at gmail.com Fri Feb 13 10:37:38 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Fri, 13 Feb 2009 09:37:38 -0600 Subject: [Biopython-dev] docstring tests In-Reply-To: <992871.86988.qm@web62403.mail.re1.yahoo.com> References: <992871.86988.qm@web62403.mail.re1.yahoo.com> Message-ID: <499593C2.3090806@gmail.com> Michiel de Hoon wrote: > Thanks for the patch. I've updated run_tests.py along these lines, and I added an option "doctest" to specify running the doctests: > > $ python run_tests.py doctest > Bio.Seq docstring test ... ok > Bio.SeqRecord docstring test ... ok > Bio.SeqIO docstring test ... ok > Bio.Align.Generic docstring test ... ok > Bio.AlignIO docstring test ... ok > Bio.KEGG.Compound docstring test ... ok > Bio.KEGG.Enzyme docstring test ... ok > Bio.Wise docstring test ... ok > Bio.Wise.psw docstring test ... ok > Bio.Statistics.lowess docstring test ... ok > ---------------------------------------------------------------------- > Ran 10 tests in 0.726 seconds > Hi, At present 'python setup.py test' does do all the tests including the doctests. Just curious, will you also add the ability to select the doctests there as well? Bruce From biopython at maubp.freeserve.co.uk Fri Feb 13 10:48:33 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 13 Feb 2009 15:48:33 +0000 Subject: [Biopython-dev] run_tests.py rewrite In-Reply-To: <320fb6e00902040514te7c7d2ci245433371770d172@mail.gmail.com> References: <320fb6e00902030602p3afd8a82scd5ed5adffda65eb@mail.gmail.com> <114112.52378.qm@web62406.mail.re1.yahoo.com> <320fb6e00902040514te7c7d2ci245433371770d172@mail.gmail.com> Message-ID: <320fb6e00902130748y453c0965id99cbb36cb680ee6@mail.gmail.com> On Wed, Feb 4, 2009 at 1:14 PM, Peter wrote: > It look a little while to show up in CVS for me, but I've got it now. > That seems to solve the problem neatly - and you've even managed > to capture the stack trace elegantly, something I hadn't worked out how to do. 
>
> Nice :)

Unfortunately, the traceback.format_exc() function you used to capture the stack trace for print out is Python 2.4+ only[1]. This means if one of the print-and-compare tests fails with an exception on Python 2.3, then run_tests.py will fall over. I've checked in a simple fix to use the exception text instead - I'm sure something more useful could be done for Python 2.3, but we'll be dropping support for this fairly soon anyway.

This was failing for me on a "known failure", test_Clustalw_tool.py on Windows Python 2.3, where some filenames with spaces just won't work without the subprocess module (Python 2.4+ only). I don't think this can be avoided, so I've updated test_Clustalw_tool.py to skip this bit in future.

Peter

[1] See http://docs.python.org/library/traceback.html

From biopython at maubp.freeserve.co.uk Fri Feb 13 11:02:41 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 13 Feb 2009 16:02:41 +0000
Subject: [Biopython-dev] test_Ace, test_Nexus, test_Phd
In-Reply-To: <366127.53671.qm@web62408.mail.re1.yahoo.com>
References: <320fb6e00902100235m5dcd72e1reb9e4e7e0ea3b3e6@mail.gmail.com> <366127.53671.qm@web62408.mail.re1.yahoo.com>
Message-ID: <320fb6e00902130802i2abcad45xafdfb7e4c08820f9@mail.gmail.com>

On Tue, Feb 10, 2009 at 11:25 AM, Michiel de Hoon wrote:
>
>> The test_Nexus tearDown used to make sure the temp output
>> files were removed. This is important on Windows which
>> does not do this automatically. I see you now allocate
>> "random" filenames using tempfile.NamedTemporaryFile(...)
>> so presumably we would need to record these so that the
>> tearDown method knows what temp files to remove.
>
> From reading the Python documentation, the file created by
> tempfile.NamedTemporaryFile is removed automatically
> when the file handle is closed, even on Windows.

That's good to know.
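For readers who want to check this behaviour themselves, here is a minimal sketch. It assumes a current Python where NamedTemporaryFile keeps its default delete=True; the payload bytes are purely illustrative and nothing here is taken from test_Nexus.py.

```python
import os
import tempfile

# Open a named temporary file in binary read/write mode ("w+b" is the default).
handle = tempfile.NamedTemporaryFile(mode="w+b")
path = handle.name
handle.write(b"#NEXUS\n")  # illustrative payload only
handle.flush()

exists_while_open = os.path.exists(path)   # the file is on disk while the handle is open
handle.close()                             # closing the handle removes the file
exists_after_close = os.path.exists(path)

print(exists_while_open, exists_after_close)
```

If that holds on the platforms of interest, a tearDown method would only need to close the handles rather than track and delete file names.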

On a related point, I've just found test_Nexus.py is failing on Windows XP with Python 2.6 (but is fine with Python 2.3, 2.4 and 2.5):

C:\repository\biopython\Tests>c:\python26\python test_Nexus.py
Test Nexus module ... ERROR
Test Tree module. ... ok

======================================================================
ERROR: Test Nexus module
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_Nexus.py", line 114, in test_NexusTest1
    f1=tempfile.NamedTemporaryFile(mode='r+w+b')
  File "c:\python26\lib\tempfile.py", line 445, in NamedTemporaryFile
    file = _os.fdopen(fd, mode, bufsize)
OSError: [Errno 22] Invalid argument

----------------------------------------------------------------------
Ran 2 tests in 0.016s

FAILED (errors=1)

I don't have time to look into this right now, but should be able to investigate next week.

Peter

From bugzilla-daemon at portal.open-bio.org Fri Feb 13 11:24:37 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 13 Feb 2009 11:24:37 -0500
Subject: [Biopython-dev] [Bug 2749] Proposal: a template for biopython's unittests
In-Reply-To:
Message-ID: <200902131624.n1DGOb6d026675@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2749

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-13 11:24 EST -------
I think we have resolved this by extending the main tutorial (in CVS) to include a unittest example. This work also replaces the existing (slightly out of date) unit test examples, which have been deleted in CVS:
http://biopython.org/DIST/docs/cookbook/biopython_test.html
http://biopython.org/DIST/docs/cookbook/biopython_test.pdf

Marking bug as fixed.
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Feb 13 11:26:21 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 13 Feb 2009 11:26:21 -0500 Subject: [Biopython-dev] [Bug 2743] manual installation overwrites previous biopython installations In-Reply-To: Message-ID: <200902131626.n1DGQLdn027308@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2743 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |INVALID ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-13 11:26 EST ------- Closing this bug as "invalid". -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Feb 13 11:58:29 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 13 Feb 2009 11:58:29 -0500 Subject: [Biopython-dev] [Bug 2749] Proposal: a template for biopython's unittests In-Reply-To: Message-ID: <200902131658.n1DGwT20003435@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2749 ------- Comment #6 from dalloliogm at gmail.com 2009-02-13 11:58 EST ------- (In reply to comment #5) > I think we have resolved this by extending the main tutorial (in CVS) to > include a unittest example. This work also replaces the existing (slightly out > of date) unit test examples, which have been deleted in CVS: > http://biopython.org/DIST/docs/cookbook/biopython_test.html > http://biopython.org/DIST/docs/cookbook/biopython_test.pdf > > Marking bug as fixed. 
In the example, you could add a comment in the setUp and tearDown functions, something like: def setUp(self): """these instructions will be executed *before* each of the tests in this unit""" and def tearDown(self): """these instructions will be executed *after* each of the tests in this unit""" It will make it clearer. Moreover, the python's library reference for unittest explain very clearly how fixtures and unittest works, maybe it's worth to add a link to it somewhere: - http://www.python.org/doc/2.5.2/lib/module-unittest.html I would also structure the test in a slightly different way.. I would put 'filename' in a separated variable (easier to read), and I would add a knowValues test as example. Finally, if you want to add a comment on global fixture, you can say it is possible to implement them with the 'self._is_set_up' trick. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From mjldehoon at yahoo.com Fri Feb 13 21:12:17 2009 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 13 Feb 2009 18:12:17 -0800 (PST) Subject: [Biopython-dev] docstring tests In-Reply-To: <499593C2.3090806@gmail.com> Message-ID: <729099.67585.qm@web62405.mail.re1.yahoo.com> > Hi, > At present 'python setup.py test' does do all the > tests including the > doctests. Just curious, will you also add the ability to > select the > doctests there as well? > I wasn't planning to, since "python setup.py test" currently does not allow selecting for any of the test scripts either; it just runs all of them. But I won't object if somebody else (wink, wink) adds this capability to "python setup.py test". On the other hand, you may think of "python setup.py test" as the quick-and-comprehensive way to run the tests, and run_tests.py as a more specialized tool that gives you more control. --Michiel. 
From mjldehoon at yahoo.com Fri Feb 13 21:14:27 2009 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 13 Feb 2009 18:14:27 -0800 (PST) Subject: [Biopython-dev] run_tests.py rewrite In-Reply-To: <320fb6e00902130748y453c0965id99cbb36cb680ee6@mail.gmail.com> Message-ID: <743351.68664.qm@web62405.mail.re1.yahoo.com> Currently, Numpy doesn't seem to work with python < 2.4, so for reliability maybe Biopython also should require python >= 2.4. --Michiel --- On Fri, 2/13/09, Peter wrote: > From: Peter > Subject: Re: [Biopython-dev] run_tests.py rewrite > To: mjldehoon at yahoo.com > Cc: biopython-dev at biopython.org > Date: Friday, February 13, 2009, 10:48 AM > On Wed, Feb 4, 2009 at 1:14 PM, Peter > wrote: > > It look a little while to show up in CVS for me, but > I've got it now. > > That seems to solve the problem neatly - and > you've even managed > > to capture the stack trace elegantly, something I > hadn't worked out how to do. > > > > Nice :) > > Unfortunately, the traceback.format_exc() function you used > to capture > the stack trace for print out is Python 2.4+ only[1]. This > means if > one of the print-and-compare tests fails with an exception > on Python > 2.3, then run_tests.py will fall over. I've checked in > a simple fix > to use the exception text instead - I'm sure something > more useful > could be done for Python 2.3, but we'll be dropping > support for this > fairly soon anyway. > > This was failing for me on a "known failure", > test_Clustalw_tool.py on > Windows Python 2.3, where some filenames with spaces just > won't work > without the subprocess module (Python 2.4+ only). I > don't think this > can be avoided, so I've updated test_Clustalw_tool.py > to skip this bit > in future. 
>
> Peter
>
> [1] See http://docs.python.org/library/traceback.html

From biopython at maubp.freeserve.co.uk Sat Feb 14 09:31:43 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sat, 14 Feb 2009 14:31:43 +0000
Subject: [Biopython-dev] run_tests.py rewrite
In-Reply-To: <743351.68664.qm@web62405.mail.re1.yahoo.com>
References: <320fb6e00902130748y453c0965id99cbb36cb680ee6@mail.gmail.com> <743351.68664.qm@web62405.mail.re1.yahoo.com>
Message-ID: <320fb6e00902140631n4b5b472bi9e3f8de0e3a8647@mail.gmail.com>

On Sat, Feb 14, 2009 at 2:14 AM, Michiel de Hoon wrote:
>
> Currently, Numpy doesn't seem to work with python < 2.4, so for reliability
> maybe Biopython also should require python >= 2.4.
>

What specifically are you referring to? I've not had any trouble with older versions of numpy on python 2.3 - although I believe later versions of numpy do require python 2.4+ (this must be stated on their website somewhere).

We've already said that the next release (Biopython 1.50) will be the last to officially support Python 2.3, so this isn't going to be an issue for much longer anyway.

Peter

From biopython at maubp.freeserve.co.uk Sat Feb 14 09:37:16 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sat, 14 Feb 2009 14:37:16 +0000
Subject: [Biopython-dev] docstring tests
In-Reply-To: <729099.67585.qm@web62405.mail.re1.yahoo.com>
References: <499593C2.3090806@gmail.com> <729099.67585.qm@web62405.mail.re1.yahoo.com>
Message-ID: <320fb6e00902140637j68c03936q5efcd1e6ebe8313d@mail.gmail.com>

On Sat, Feb 14, 2009 at 2:12 AM, Michiel de Hoon wrote:
> I wasn't planning to, since "python setup.py test" currently does not
> allow selecting for any of the test scripts either; it just runs all of them.
> But I won't object if somebody else (wink, wink) adds this capability
> to "python setup.py test".
On the other hand, you may think of > "python setup.py test" as the quick-and-comprehensive way to > run the tests, and run_tests.py as a more specialized tool that > gives you more control. I think this is fine as it is, and would not be keen on adding any redundant code to the setup.py file (there is a small risk of causing problems for third party integrators, py2exe etc). The "python setup.py test" is really there as part of the installation procedure, where you would just want all the tests run. The only reason I'd want to run some of the tests is if debugging something - and this is where you would use run_tests.py directly. Peter From bugzilla-daemon at portal.open-bio.org Sat Feb 14 09:59:53 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 14 Feb 2009 09:59:53 -0500 Subject: [Biopython-dev] [Bug 2749] Proposal: a template for biopython's unittests In-Reply-To: Message-ID: <200902141459.n1EExr6u006975@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2749 ------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-14 09:59 EST ------- (In reply to comment #6) > > In the example, you could add a comment in the setUp and tearDown functions, > something like: > def setUp(self): > """these instructions will be executed *before* each of the tests in this > unit""" > > and > > def tearDown(self): > """these instructions will be executed *after* each of the tests in this > unit""" > > It will make it clearer. I've added a mention about setUp and tearDown. > Moreover, the python's library reference for unittest explain very clearly how > fixtures and unittest works, maybe it's worth to add a link to it somewhere: > - http://www.python.org/doc/2.5.2/lib/module-unittest.html There was one link at the start of the chapter, but I have added a couple more. 
We don't need too much detail - that's what the unittest documentation is for ;) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From dalloliogm at gmail.com Sat Feb 14 11:27:19 2009 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Sat, 14 Feb 2009 17:27:19 +0100 Subject: [Biopython-dev] SVN migration and Launchpad mirroring In-Reply-To: <8b34ec180902100421o1680735dsd68d890d8ccfbf4f@mail.gmail.com> References: <3f6baf360902061211o4da786b0q5f788efcc63e2bb1@mail.gmail.com> <320fb6e00902070455h72c7bd31w506f5ed52e9633bc@mail.gmail.com> <3f6baf360902072220j5c565449i4c7266046051207f@mail.gmail.com> <5aa3b3570902080847p1a126664k4a76b7f19a0ed987@mail.gmail.com> <8b34ec180902081103r1befae9bt33e9024bd43f37fb@mail.gmail.com> <128a885f0902081134m255ec4eao21c75aaf08f9d8f5@mail.gmail.com> <499053F9.60709@gmail.com> <3f6baf360902091239v5988749cm1f48c21d2f19ca9b@mail.gmail.com> <4990A27A.9060500@gmail.com> <8b34ec180902100421o1680735dsd68d890d8ccfbf4f@mail.gmail.com> Message-ID: <5aa3b3570902140827n55b210f4q49ed9d4b8b1c56fe@mail.gmail.com> On Tue, Feb 10, 2009 at 1:21 PM, Bartek Wilczynski wrote: > Hi, > > > Once I had that, I could publish my private branch of biopython to > launchpad (it took about 10s). > Now, if anyone is interested in test-driving bazaar+launchpad with > biopython, he/she can just > branch it to your own computer (you don't need any account for that, > just bzr installed): > bzr branch lp:~bartek/junk/biopython > > I did that (branch) on a different computert (~2min). Now one can > start modifying code. > I've done some changes to the Bio.Motif code (add a method, commit > locally, fix a small bug in it, > commit again, test) and pushed the changes to the branch on launchpad. 
> Commits are quick (~3s), > push takes about a minute, but this is including a scan of the whole > tree, so it should not > take much longer than this for bigger changes. > > Note:This is my own branch, so I can commit to it, but if I was not > the owner (or maintainer) of the > branch, I would have to either send my changes to the maintainer or > publish my branch and let him > "pull" from it. > > I realised later that I've accidentaly added a large directory during > tailor conversion, so I removed it in > the original bzr branch (as made by tailor) merged it with the changes > pushed already to launchpad > from somewhere else (Motif) and pushed the resulting tree back to > launchpad.The removal was very fast > (~5s) and the push took about the same time as with the small > change.The good thing is that the history > of all changes is retained. > > If anyone wants to give it a try, just install bzr and you can easily > branch from me using: > bzr branch lp:~bartek/junk/biopython Hi, I was trying bazaar. These are the steps I did, can you check if I did everything correctly? - I have created an account on launchpad and uploaded an ssh key (on my home page, -> click on 'Profile', then 'Edit details', and then 'Ssh keys' - it costed me a bit to find it at first :) ). - On my computer, from a terminal, I have executed "bzr-launchpad-login " to login to launchpad. - I ran "bzr branch lp:~bartek/junk/biopython", to create a branch of your repository in my computer. - I did some stupid changes (I must have messed a bit with creating branches), and then committed them. So now, if I want to inform you of the changes I have made, how does it work? Which is the correct bazaar command to pull a merge request? - In the meantime, I have created an entry for my branch on launchpad. I went to my home page, clicked on 'Code', and then 'Register a new branch'. 
On the 'Reference Project' field, I couldn't find your project, only the biopython created a few years ago of which you were asking earlier. This is my branch: - https://code.launchpad.net/~dalloliogm-gmail/+junk/biopython-gio How do I link it to your repo, now? > > The branch history can be seen here: > https://code.launchpad.net/~bartek/+junk/biopython/ > > And the annotated source code is here: > http://bazaar.launchpad.net/~bartek/+junk/biopython/files > > The specific changes done by me can be seen as revisions: > http://bazaar.launchpad.net/~bartek/%2Bjunk/biopython/revision/3460 > http://bazaar.launchpad.net/~bartek/%2Bjunk/biopython/revision/3459.1.1 > http://bazaar.launchpad.net/~bartek/%2Bjunk/biopython/revision/3459.1.2 > > In summary, I think that it's doable to convert current CVS tree to bzr and > bzr handle the job of a DVCS. Performance is not stellar (epsecially code > browsing in launchpad is sometimes slow) but for it's acceptable, especially > given that I'm rarely browsing the history, and much more often use command > line tools which are (for me) fast enough. > > Please let me know what others think. If there will be general > interest in that, I > can try to set up a more permanent (but still experimental) bzr branch which > would be automatically synchronized from CVS, so that we can do a more > long-term experiment to see whether it works, and people like it. 
> > cheers > Bartek > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > -- My blog on bioinformatics (now in English): http://bioinfoblog.it From biopython at maubp.freeserve.co.uk Sat Feb 14 13:47:30 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sat, 14 Feb 2009 18:47:30 +0000 Subject: [Biopython-dev] External python dependencies and doctests Message-ID: <320fb6e00902141047m6d71d977t946a018482313176@mail.gmail.com> Hi all, Currently the doctest handling in run_tests.py requires some special cases for those modules with an optional external dependency, for example the Bio.Statistics.lowess doctests will only work if NumPy is installed. We *could* just run all the doctests, and catch and ignore any import errors. However, an import error might be a real error in Biopython (e.g. if something was deleted or moved). This is therefore probably a bad idea. I was thinking we could introduce new exception(s) which subclasses both the ImportError and our MissingExternalDependencyError exception. This can then be treated as another variant of MissingExternalDependencyError and ignored by run_tests.py, plus as it is also an ImportError any third party scripts can continue to catch import errors as before. This means that run_tests.py doesn't need to know if some doctests require NumPy (or ReportLab) or not - we can just run them and find out (see patch below). The downside is that any bits of Biopython where we import numpy or reportlab (or at least those with doctests) would need to catch any import error and re-raise it (as below). I'm not sure if this is a good idea or not. It would certainly be useful if we want to switch to having the doctests found automatically (which is probably a good idea in the long run - the hand coded list was just my short term pragmatic solution). 
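
As a rough sketch of the idea (not the actual run_tests.py code), the snippet below shows how an exception that subclasses both ImportError and MissingExternalDependencyError would let a test runner skip tests whose optional library is missing, while still reporting any other import failure as an error. The run_one helper and the two fake tests are hypothetical names used only for illustration, and the exception classes are redefined locally so that the example is self-contained.

```python
class MissingExternalDependencyError(Exception):
    """Local stand-in for Biopython's exception of the same name."""


class MissingPythonDependencyError(MissingExternalDependencyError, ImportError):
    """Missing optional Python library; also counts as an ImportError."""


def run_one(test):
    """Hypothetical helper: run a callable and classify the outcome."""
    try:
        test()
    except MissingExternalDependencyError as err:
        # Optional dependency missing: skip rather than fail.
        return "skipped: %s" % err
    except ImportError as err:
        # Any other import problem is treated as a real error.
        return "error: %s" % err
    return "ok"


def needs_numpy():
    # Simulates a module that re-raises a failed "import numpy".
    raise MissingPythonDependencyError("NumPy is not installed")


def broken_import():
    # Simulates a genuine bug, e.g. a module that was moved or deleted.
    raise ImportError("No module named Bio.Moved")


print(run_one(needs_numpy))
print(run_one(broken_import))

# Third-party code that only knows about ImportError keeps working.
try:
    needs_numpy()
    caught_as_import_error = False
except ImportError:
    caught_as_import_error = True
```

The order of the except clauses matters in this sketch: the more specific missing-dependency case has to be tested before the general ImportError case.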

Peter

Index: Bio/__init__.py
===================================================================
RCS file: /home/repository/biopython/biopython/Bio/__init__.py,v
retrieving revision 1.31
diff -r1.31 __init__.py
14a15,37
>
> class MissingPythonDependencyError(MissingExternalDependencyError,ImportError) :
>     """Exception for missing python libraries.
>
>     This should be used when "import numpy" or "import reportlab" fail.
>     This exception subclasses both the standard python ImportError, and
>     our Biopython MissingExternalDependencyError meaning it can be caught
>     using "except ImportError" or "except MissingExternalDependencyError".
>     This is important for our test framework.
>     """
>     pass
>
> class MissingNumPyDependencyError(MissingPythonDependencyError) :
>     """Exception for when NumPy is not installed."""
>     def __str__(self) :
>         return "This requires the Numerical Python library, NumPy, " + \
>                "freely available from http://www.numpy.org"
>
> class MissingReportLabDependencyError(MissingPythonDependencyError) :
>     """Exception for when ReportLab is not installed."""
>     def __str__(self) :
>         return "This requires the python library ReportLab, " + \
>                "freely available from http://www.reportlab.org"
Index: Bio/Statistics/lowess.py
===================================================================
RCS file: /home/repository/biopython/biopython/Bio/Statistics/lowess.py,v
retrieving revision 1.10
diff -r1.10 lowess.py
23c23,27
< import numpy
---
> try :
>     import numpy
> except ImportError:
>     from Bio import MissingNumPyDependencyError
>     raise MissingNumPyDependencyError()
Index: Bio/Cluster/__init__.py
===================================================================
RCS file: /home/repository/biopython/biopython/Bio/Cluster/__init__.py,v
retrieving revision 1.13
diff -r1.13 __init__.py
1c1,6
< import numpy
---
> try :
>     import numpy
> except ImportError:
>     from Bio import MissingNumPyDependencyError
>     raise MissingNumPyDependencyError()
>
Index: Bio/Graphics/__init__.py
===================================================================
RCS file: /home/repository/biopython/biopython/Bio/Graphics/__init__.py,v
retrieving revision 1.2
diff -r1.2 __init__.py
2d1
<
7c6,7
<     raise ImportError("Install ReportLab if you want to use Bio.Graphics. You can find ReportLab at http://www.reportlab.org/downloads.html")
---
>     from Bio import MissingReportLabDependencyError
>     raise MissingReportLabDependencyError()

From bsouthey at gmail.com Sat Feb 14 15:12:49 2009
From: bsouthey at gmail.com (Bruce Southey)
Date: Sat, 14 Feb 2009 14:12:49 -0600
Subject: [Biopython-dev] run_tests.py rewrite
In-Reply-To: <320fb6e00902140631n4b5b472bi9e3f8de0e3a8647@mail.gmail.com>
References: <320fb6e00902130748y453c0965id99cbb36cb680ee6@mail.gmail.com> <743351.68664.qm@web62405.mail.re1.yahoo.com> <320fb6e00902140631n4b5b472bi9e3f8de0e3a8647@mail.gmail.com>
Message-ID:

On Sat, Feb 14, 2009 at 8:31 AM, Peter wrote:
> On Sat, Feb 14, 2009 at 2:14 AM, Michiel de Hoon wrote:
>>
>> Currently, Numpy doesn't seem to work with python < 2.4, so for reliability
>> maybe Biopython also should require python >= 2.4.
>>
>
> What specifically are you referring to? I've not had any trouble with
> older versions of numpy on python 2.3 - although I believe later
> versions of numpy do require python 2.4+ (this must be stated on their
> website somewhere).

numpy version 1.2 requires Python 2.4 or above. This was in the release notes, but the web site has not been updated! Likewise, there is no information on the lack of support for Python 2.6 on Windows (due to major issues with creating the binary installer), which should be addressed in the numpy 1.3 release.
Bruce From biopython at maubp.freeserve.co.uk Sat Feb 14 16:32:00 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sat, 14 Feb 2009 21:32:00 +0000 Subject: [Biopython-dev] External python dependencies and doctests In-Reply-To: <320fb6e00902141047m6d71d977t946a018482313176@mail.gmail.com> References: <320fb6e00902141047m6d71d977t946a018482313176@mail.gmail.com> Message-ID: <320fb6e00902141332m7bba0497g6650a883b86c1994@mail.gmail.com> On Sat, Feb 14, 2009 at 6:47 PM, Peter wrote: > Hi all, > > Currently the doctest handling in run_tests.py requires some special > cases for those modules with an optional external dependency, for > example the Bio.Statistics.lowess doctests will only work if NumPy is > installed. We *could* just run all the doctests, and catch and ignore > any import errors. However, an import error might be a real error in > Biopython (e.g. if something was deleted or moved). This is therefore > probably a bad idea. I've been thinking about the exception idea in my previous email, and maybe it is too complicated - it would be a hassle in the long term to have to manually add this catch ImportError and raise missing dependency code all over the place. An alternative would be to catch all ImportError exceptions in run_tests.py, and treat numpy and reportlab as special cases and skip those tests. Other ImportError cases would indeed be errors. This is basically what I suggested a while back on Bug 2524. http://bugzilla.open-bio.org/show_bug.cgi?id=2524 Perhaps this is better - it puts the special case code in one place only (run_test.py), meaning the our unit tests needing numpy or reportlab don't need to do anything special about raising a missing dependency error. This isn't a big issue for the unit tests, but for the doctests this is a significant benefit I think. [The missing external dependency exception is still useful for missing command line tools - although I'm not sure how best to cope with this in a doctest. 
See test_psw.py and test_wise.py for an example of this - they are basically doctests with a wrapper to determine if the dnal command line tool is installed.] Peter From bugzilla-daemon at portal.open-bio.org Sat Feb 14 17:06:24 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 14 Feb 2009 17:06:24 -0500 Subject: [Biopython-dev] [Bug 2754] Bio.PDB: Parse warnings should print to stderr, not stdout In-Reply-To: Message-ID: <200902142206.n1EM6O83011973@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2754 ------- Comment #9 from eric.talevich at gmail.com 2009-02-14 17:06 EST ------- (In reply to comment #8) Yes, something must be done with test_PDB.py, because I don't think warnings.warn can be made to play nice with that print-and-compare test -- or any print-and-compare, since the warning messages contain extra environment-specific information. The test suite I'm picturing looks like this: - Load the PDB file with permissive=0, verify the first PDBException. - Add a warning filter to silence the first message, retry loading, verify the next Exception. - Repeat for all the expected errors in the PDB file. - Silence warnings and load the PDB file with permissive=1; continue the usual print-and-compare tests. I think doctest can be coaxed into ignoring part of an output message with ellipses, and unittest might have an assertion for error messages or we could just catch the exception and check the message directly. So: either way, test_PDB.py gets a rewrite, and the example PDB file can stay the way it is. > For example, running the test from the command line the first message is: > PDBConstructionException: Atom N defined twice in residue resseq=2 icode= > at line 19. > Exception ignored. > > Is that correct or desired output? 
Yes, but warnings.warn prepends the absolute file path and the line number where the warning was raised (there's an option to make it look deeper in the stack, for catch-and-release cases like this one), so even if sys.stdout is assigned to sys.stderr, the text doesn't match exactly and the test fails. The important thing is that a PDBConstructionError is raised, and to be precise, that the message contains "defined twice", as I understand it. > The actual error is in my mind irrelevant although I do wonder why a special > exception is used. Two advantages for the user: 1. Tracebacks make it clear that there was a problem parsing the PDB file. Otherwise, it's a little unclear whether there's a problem in the user's code, a real bug in Biopython, or something wrong with the PDB file itself. 2. User code can catch a PDBConstructionException specifically and let other exceptions fall through, e.g. an IOError which could require different handling. > (In reply to comment #6) > There are a few cases of this so I think a separate bug should be filed. But > cleaning these up would be appreciated, at least by me. Cases of file-specific error handling, or sys.stderr/stdout abuse? Both sound like good cleanup tasks. In the case of __debug__ protection, it looks like normally Python executes with __debug__==True except when run with -O. Like turning off assertions, you know. Given that, and the simplicity of turning off warnings globally in user code (import warnings; warnings.simplefilter('ignore')), I think it's safe to remove these checks and just issue the warnings directly. For the other stunt in PDBParser, that seems like it deserves a separate patch at the very least, so I'm not going to attempt to resolve it in this bug unless it's breaking something else. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
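
The test sequence outlined in the comment above could look roughly like the following sketch. The PDBConstructionWarning class and the parse_atom_line function are hypothetical stand-ins defined only for this example (they are not Bio.PDB's API); the calls into the standard warnings module are real.

```python
import warnings


class PDBConstructionWarning(Warning):
    """Hypothetical warning category used only in this sketch."""


def parse_atom_line(name, seen):
    """Hypothetical parser step: warn when an atom name is repeated."""
    if name in seen:
        warnings.warn("Atom %s defined twice" % name, PDBConstructionWarning)
        return False
    seen.add(name)
    return True


# Strict mode: turn the warning into an exception and check its message.
with warnings.catch_warnings():
    warnings.simplefilter("error", PDBConstructionWarning)
    try:
        parse_atom_line("N", {"N"})
        strict_message = None
    except PDBConstructionWarning as err:
        strict_message = str(err)

# Test mode: record warnings instead of printing them, then inspect the text.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    parse_atom_line("N", {"N"})
recorded = [str(w.message) for w in caught]

# Permissive mode: silence warnings entirely, as user code might do.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    accepted = parse_atom_line("CA", {"N"})

print(strict_message, recorded, accepted)
```

Checking only for the "defined twice" substring sidesteps the file path and line number that the default warning display adds, which is what makes a print-and-compare test fragile here.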
From bugzilla-daemon at portal.open-bio.org Sun Feb 15 02:18:49 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 15 Feb 2009 02:18:49 -0500 Subject: [Biopython-dev] [Bug 2693] LogisticRegression convergence criterion is too lenient In-Reply-To: Message-ID: <200902150718.n1F7InuB029333@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2693 ------- Comment #2 from mdehoon at ims.u-tokyo.ac.jp 2009-02-15 02:18 EST ------- With this patch, the test_LogisticRegression.py unit test fails. Could you check that? Also, it is not necessary to pass old_llik to update_fn; if needed, update_fn can store the value of llik on each call. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bartek at rezolwenta.eu.org Sun Feb 15 09:18:25 2009 From: bartek at rezolwenta.eu.org (Bartek Wilczynski) Date: Sun, 15 Feb 2009 15:18:25 +0100 Subject: [Biopython-dev] SVN migration and Launchpad mirroring In-Reply-To: <5aa3b3570902140827n55b210f4q49ed9d4b8b1c56fe@mail.gmail.com> References: <3f6baf360902061211o4da786b0q5f788efcc63e2bb1@mail.gmail.com> <3f6baf360902072220j5c565449i4c7266046051207f@mail.gmail.com> <5aa3b3570902080847p1a126664k4a76b7f19a0ed987@mail.gmail.com> <8b34ec180902081103r1befae9bt33e9024bd43f37fb@mail.gmail.com> <128a885f0902081134m255ec4eao21c75aaf08f9d8f5@mail.gmail.com> <499053F9.60709@gmail.com> <3f6baf360902091239v5988749cm1f48c21d2f19ca9b@mail.gmail.com> <4990A27A.9060500@gmail.com> <8b34ec180902100421o1680735dsd68d890d8ccfbf4f@mail.gmail.com> <5aa3b3570902140827n55b210f4q49ed9d4b8b1c56fe@mail.gmail.com> Message-ID: <8b34ec180902150618p40805703oa700f6d8acbe0aec@mail.gmail.com> Hi, On Sat, Feb 14, 2009 at 5:27 PM, Giovanni Marco Dall'Olio wrote: > Hi, > I was trying bazaar. > > These are the steps I did, can you check if I did everything correctly? 
>

I think so.

> This is my branch:
> - https://code.launchpad.net/~dalloliogm-gmail/+junk/biopython-gio
>
> How do I link it to your repo, now?
>

The problem is that +junk branches cannot be proposed for merging in launchpad. See https://help.launchpad.net/Code/PersonalBranches

If you don't have a launchpad project and don't want to set one up, you have two options. The first is to send me a changeset (similar to a patch). You use the command

bzr send -o my_changeset_filename

This generates a text file with your changes, and you can just send it to me, so that I can merge the changes into my tree. The other option is to send me a link to your branch and I can pull from it (I can pull from +junk branches).

In order to have all the functionality of merge proposals, code review etc. we need a launchpad project. I created one:

https://launchpad.net/biopython-test

for the purpose of testing. I've added you (giovanni) to the team of maintainers of the project. I also created two branches and requested one of them to be merged with the other.

If anyone now pushes their branch to a proper place:

lp:~username/biopython-test/my_branch_name

it can be proposed for merging into biopython-test.

Branches pushed directly to the project directory (e.g. lp:biopython-test/trunk) have write permissions for all team members.

If anyone wants to give it a try, please join the biopython-test team on launchpad (you'll need a launchpad account).

cheers
Bartek

From dalloliogm at gmail.com Sun Feb 15 10:29:53 2009
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Sun, 15 Feb 2009 16:29:53 +0100
Subject: [Biopython-dev] biopython on github
Message-ID: <5aa3b3570902150729g367022a5p334b2c33f86461f@mail.gmail.com>

Hi,
I have uploaded a git-converted branch of biopython on github, in case you want to try it and see how it works.

You can find it here:
- http://github.com/biopython/biopython/

To work with it, the optimal protocol is:
- create an account on github.com.
Upload an ssh public key by clicking on 'account' after logging in. Using github is not mandatory, but it will help you understand how git works, and it allows other people to follow your branches and your work.
- go to the biopython repo: http://github.com/biopython/biopython/tree/master and you will see a button named 'Fork': click on it. It will create a fork of the official biopython repository in your personal account. Here the word 'fork' is not used in its usual sense; it just indicates that you are going to work on a modified version of the official code, and it is not even a git command.
- now, install git on your computer, and execute the following commands:
$: git clone git at github.com:/biopython.git
$: git remote add official_dist git://github.com/biopython/biopython.git
With the first command, you download a copy of the repository to your local computer, which is the one you will modify (technically, you are creating a new branch on your computer). With the second command, you add a reference to the official biopython repository, so that in the future you can easily import the official code and compare it with yours.
Here it is an explanation on these two commands: http://github.com/guides/keeping-a-git-fork-in-sync-with-the-forked-repo p.s.: to convert to git from cvs I have followed the instructions here: - http://www.kernel.org/pub/software/scm/git/docs/v1.4.4.4/cvs-migration.html This seems to be a good tutorial on git, too: - http://www.kernel.org/pub/software/scm/git/docs/v1.4.4.4/tutorial.html -- My blog on bioinformatics (now in English): http://bioinfoblog.it From bugzilla-daemon at portal.open-bio.org Mon Feb 16 08:00:53 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 16 Feb 2009 08:00:53 -0500 Subject: [Biopython-dev] [Bug 2734] db.load problem with postgresql and psycopg2 In-Reply-To: Message-ID: <200902161300.n1GD0rep000706@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2734 andrea at biodec.com changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |andrea at biodec.com ------- Comment #7 from andrea at biodec.com 2009-02-16 08:00 EST ------- (In reply to comment #6) > (In reply to comment #5) > > (In reply to comment #3) > > > > > > What versions of biopython and the BioSQL schema are you using? > > > > > > Cymon > > > > According to the bug report, Stephen was using Biopython 1.49, so: > > > > Stephen: > > Biopython 1.49 > > postgresql 8.2 > > BioSQL - schema version unspecified > > psycopg2 - version unspecified > > python - version unspecified > > OS - Mac OS X > > > > What about you Cymon - you have postgresql with psycopg2 working, but what > > versions of things? > > > > Peter > > > > Peter, > > I'm using: > Biopython: CVS > Posgresql: 8.1.11 > BioSQL: 1.0.1 > Python: 2.5.2 > Psycopg: 2.0.8 > OS: Red Hat Enterprise 5.3 > > C. > Hi, the problem, according to me, is already solved. It seems that Stephen has an old version of Loader.py. 
I submitted a bug and patch that explain that for Postgres is not possible to have double quotes in queries ("). Double quotes are reserved to Column names. In the correct Loader.py version everything is corrected and there aren't double quotes in any queries at all. Stephen, please check if: Loader.DatabaseLoader._get_seqfeature_dbxref is equivalent to: def _get_seqfeature_dbxref(self, seqfeature_id, dbxref_id, rank): """ Check for a pre-existing seqfeature_dbxref entry with the passed seqfeature_id and dbxref_id. If one does not exist, insert new data """ # Check for an existing record sql = r"SELECT seqfeature_id, dbxref_id FROM seqfeature_dbxref " \ r"WHERE seqfeature_id = '%s' AND dbxref_id = '%s'" result = self.adaptor.execute_and_fetch_col0(sql, (seqfeature_id, dbxref_id)) # If there was a record, return without executing anything, else create # the record and return if result: return result return self._add_seqfeature_dbxref(seqfeature_id, dbxref_id, rank) maybe in your version, there are still double quotes ("%s") instead of single quotes ('%s') Andrea -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Feb 16 08:24:53 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 16 Feb 2009 08:24:53 -0500 Subject: [Biopython-dev] [Bug 2734] db.load problem with postgresql and psycopg2 In-Reply-To: Message-ID: <200902161324.n1GDOrJE003880@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2734 ------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-16 08:24 EST ------- (In reply to comment #7) > Hi, > the problem, according to me, is already solved. > It seems that Stephen has an old version of Loader.py. Well spotted Andrea - you may be right... 
> I submitted a bug and patch that explain that for Postgres > is not possible to have double quotes in queries ("). > Double quotes are reserved to Column names. > > In the correct Loader.py version everything is corrected > and there aren't double quotes in any queries at all. > ... Andrea is referring to Bug 2506, which was fixed in Loader.py CVS revision 1.31, which means it was included in Biopython 1.46 onwards. Stephen said he was using Biopython 1.49, but the error may come from an out-of-date Loader.py which is still using double quotes. Quoting comment #0, > ... > pq_execute: executing SYNC query: > SELECT seqfeature_id, dbxref_id FROM seqfeature_dbxref WHERE seqfeature_id > = "3" AND dbxref_id = "6" > pq_execute: entering syncronous DBAPI compatibility mode > pq_fetch: pgstatus = PGRES_FATAL_ERROR > pq_fetch: uh-oh, something FAILED > pq_fetch: fetching done; check for critical errors > psyco_curs_execute: res = -1, pgres = 0x0 Certainly the SQL command shown in the pg log has double quotes. > Traceback (most recent call last): > ... > File "/Library/Python/2.5/site-packages/BioSQL/Loader.py", line 645, in > _load_seqfeature_dbxref > self._get_seqfeature_dbxref(seqfeature_id, dbxref_id, rank+1) > File "/Library/Python/2.5/site-packages/BioSQL/Loader.py", line 679, in > _get_seqfeature_dbxref > dbxref_id)) > File "/Library/Python/2.5/site-packages/BioSQL/BioSeqDatabase.py", line 295, > in execute_and_fetch_col0 > self.cursor.execute(sql, args or ()) According to that traceback, line 679 in _get_seqfeature_dbxref is executing some bad SQL (presumably the double quotes). This line number doesn't match up for Biopython 1.49, so it is probably an older version of Biopython. Stephen - maybe you have more than one copy of Biopython installed (e.g. an old system level copy, and a new local copy)?
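One way to check which copy Python would actually pick up is to ask the import machinery where a package lives. The following is only a minimal sketch for a current Python 3 interpreter (the importlib.util module did not exist in Python 2.5); the helper name module_location is made up for illustration and is not part of Biopython.

```python
import importlib.util


def module_location(name):
    """Return the file a top-level module would be imported from, or None.

    Hypothetical helper: it only locates the module on sys.path,
    it does not import or execute it.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None  # nothing with that name is importable
    return spec.origin


# Report where Bio and BioSQL would come from, if they are installed at all.
for name in ("Bio", "BioSQL"):
    print(name, "->", module_location(name))
```

If the reported paths point at an unexpected site-packages directory, that would be consistent with an old copy shadowing the new one.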
You could try deleting these directories and then reinstalling Biopython: /Library/Python/2.5/site-packages/Bio /Library/Python/2.5/site-packages/BioSQL Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Feb 16 11:05:27 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 16 Feb 2009 11:05:27 -0500 Subject: [Biopython-dev] [Bug 2693] LogisticRegression convergence criterion is too lenient In-Reply-To: Message-ID: <200902161605.n1GG5RLr010326@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2693 ------- Comment #3 from bsouthey at gmail.com 2009-02-16 11:05 EST ------- (In reply to comment #2) > With this patch, the test_LogisticRegression.py unit test fails. > Could you check that? Yes, it fails because the test example does not converge with the defaults (try the example in R or SAS) and thus does not provide a valid check for logistic regression. > > Also, it is not necessary to pass old_llik to update_fn; if needed, update_fn > can store the value of llik on each call. I guess this all depends on how you define the purpose of the update_fn function. Bruce -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
From dalloliogm at gmail.com Mon Feb 16 11:40:03 2009 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Mon, 16 Feb 2009 17:40:03 +0100 Subject: [Biopython-dev] biopython on github In-Reply-To: <5aa3b3570902150729g367022a5p334b2c33f86461f@mail.gmail.com> References: <5aa3b3570902150729g367022a5p334b2c33f86461f@mail.gmail.com> Message-ID: <5aa3b3570902160840p41948844tfe73b51cf37e6a7@mail.gmail.com> On Sun, Feb 15, 2009 at 4:29 PM, Giovanni Marco Dall'Olio wrote: > Hi, > I have uploaded a git-converted branch of biopython on github, in case > you want to try it and see how it works. So, yesterday Bartek and me have tried github a bit, and we both have done some test commits to our personal development branches. If you go here: - http://github.com/biopython/biopython/network you will see the network of all the changes we made each and the differences between the various branches. The application that creates the diagram tries to minimize the number of branches shown: so maybe you won't see my branch or Bartek's if one of the two can be included in the other. If you create your own branch, and later other people commit other changes on other forks of the same project, you will have an utility to list all these changes directly from github. It will look like this: - http://img8.imageshack.us/img8/4194/biopythonforkqueueod5.png So, in principle these are the most useful features that I think github offers and I couldn't find in other similar softwares (e.g. trac). On the other side, github has some disadvantages: it is a commercial product, and it has no specific tool to integrate it with a bug tracker. 
-- My blog on bioinformatics (now in English): http://bioinfoblog.it From bugzilla-daemon at portal.open-bio.org Mon Feb 16 14:06:55 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 16 Feb 2009 14:06:55 -0500 Subject: [Biopython-dev] [Bug 2697] MaxEntropy calculate function assumes integer values for class and convergence criteria is hard coded In-Reply-To: Message-ID: <200902161906.n1GJ6t3M022304@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2697 ------- Comment #10 from bsouthey at gmail.com 2009-02-16 14:06 EST ------- (In reply to comment #9) > Created an attachment (id=1212) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1212&action=view) [details] > Patch to Bio/MaxEntropy.py to make the convergence parameters optional > arguments > > This time its the whole patch - sorry for the extra emails this has triggered. > I had stopped to check in a couple of docstring changes and fixed a few tabs in > MaxEntropy.py first, which confused things. > > Note this is a bit different to what I was thinking in comment #5, > > ... something like this: > > > > def train(training_set, results, feature_fns, update_fn=None, > > max_iis_iterations = MAX_IIS_ITERATIONS, > > iis_convere = IIS_CONVERGE, > > max_newton_iterations = MAX_NEWTON_ITERATIONS > > newton_coverage = NEWTON_CONVERGE): > > The above code won't pick up changes to the module level variables like > MAX_IIS_ITERATIONS because the defaults are only evaluated once when the > function is created. My patch removed these hard coded default values and placed them in the function. 
>The patch deals with this as follows: > > def train(training_set, results, feature_fns, update_fn=None, > max_iis_iterations=None, iis_converge=None, > max_newton_iterations=None, newton_converge=None): > if max_iis_iterations is None : > max_iis_iterations = MAX_IIS_ITERATIONS > if iis_converge is None : > iis_converge = IIS_CONVERGE > if max_newton_iterations is None : > max_newton_iterations = MAX_NEWTON_ITERATIONS > if newton_converge is None : > newton_converge = NEWTON_CONVERGE > > This works :) > I hate the use of the local variable being the lowercase version of another variable. Obviously for the original variables we are stuck with uppercase for backwards compatibility. So we need to change the names of the lowercase variables. Bruce -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Feb 16 18:14:53 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 16 Feb 2009 18:14:53 -0500 Subject: [Biopython-dev] [Bug 2697] MaxEntropy calculate function assumes integer values for class and convergence criteria is hard coded In-Reply-To: Message-ID: <200902162314.n1GNEr9P028162@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2697 ------- Comment #11 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-16 18:14 EST ------- (In reply to comment #10) > > My patch removed these hard coded default values and placed them in the > function. > Yes - and in doing so it broke the existing API. My way preserves the ability to alter these module level variables as another way to control the function. We could describe these old upper case module level variables as obsolete in the docstring, as a step towards phasing them out.
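To make the difference concrete, here is a small stand-alone sketch (not the actual Bio/MaxEntropy.py code) contrasting a default bound at definition time with a None default resolved at call time. MAX_ITERATIONS and both function names are hypothetical stand-ins for MAX_IIS_ITERATIONS and train.

```python
# Hypothetical module-level setting, standing in for MAX_IIS_ITERATIONS.
MAX_ITERATIONS = 100


def train_bound_default(max_iterations=MAX_ITERATIONS):
    """The default is evaluated once, when the def statement runs."""
    return max_iterations


def train_late_default(max_iterations=None):
    """The default is looked up on each call, so later changes are seen."""
    if max_iterations is None:
        max_iterations = MAX_ITERATIONS
    return max_iterations


# A user tweaks the module-level variable after the functions are defined.
MAX_ITERATIONS = 5

print(train_bound_default())                  # 100: the old value was captured
print(train_late_default())                   # 5: picks up the new value
print(train_late_default(max_iterations=20))  # 20: an explicit argument wins
```

With the None default, both ways of controlling the function keep working: changing the module-level variable, or passing the argument explicitly.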
> >The patch deals with this as follows: > > > > def train(training_set, results, feature_fns, update_fn=None, > > max_iis_iterations=None, iis_converge=None, > > max_newton_iterations=None, newton_converge=None): > > if max_iis_iterations is None : > > max_iis_iterations = MAX_IIS_ITERATIONS > > if iis_converge is None : > > iis_converge = IIS_CONVERGE > > if max_newton_iterations is None : > > max_newton_iterations = MAX_NEWTON_ITERATIONS > > if newton_converge is None : > > newton_converge = NEWTON_CONVERGE > > > > This works :) > > > I hate the use of the local variable being the lowercase version of another > variable. Obviously for the original variables we are stuck with uppercase for > backwards compatibility. So we need to change the names of the lowercase > variables. Hate is a rather strong word. I can see that having the same name except for the case could confuse some people if they are not used to case mattering, but otherwise using the same name seems like a GOOD idea to me for consistency. Do you have any concrete suggestions? We could expand IIS into words. On a related point, using a lower case N for Newton feels a bit wrong to me ;) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Tue Feb 17 08:22:57 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 17 Feb 2009 08:22:57 -0500 Subject: [Biopython-dev] [Bug 2762] New: GFF capability in SeqIO Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2762 Summary: GFF capability in SeqIO Product: Biopython Version: 1.49b Platform: All OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: lpritc at scri.sari.ac.uk I'm increasingly coming across GFF format files, and SeqIO currently can't handle them. It might be useful if at some point in the future, it could. Also, the Bio.GFF module handles access to a database, and doesn't provide a mechanism for importing or writing GFF format files. I'm not sure that there is currently any facility to handle this format in Biopython. There are at least two variants of the GFF format that I've seen in use... GFF2 is the one I'm working with at the moment, and its specification is here: http://www.sanger.ac.uk/Software/formats/GFF/GFF_Spec.shtml I've come across GFF3 in other contexts, and it is defined here: http://www.sequenceontology.org/gff3.shtml Note that GFF3 is similar to GenBank files in that it may explicitly describe both sequence features, and the sequence itself (potentially for multiple sequences). GFF2 has the potential for this in the specification for the Comments section, which includes a recommended syntax for defining sequences to which the features refer, although that spec makes the reasonable assumption that you would be able to obtain the sequence from elsewhere, knowing the sequence ID from the GFF file. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Wed Feb 18 08:54:51 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 18 Feb 2009 08:54:51 -0500 Subject: [Biopython-dev] [Bug 2762] GFF capability in SeqIO In-Reply-To: Message-ID: <200902181354.n1IDsp8m007943@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2762 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-18 08:54 EST ------- This looks like a nice idea, and possible with some simplifications to match our existing object scheme. For example, the current SeqRecord and SeqFeature classes do not let us explicitly define parent (part-of) relationships between SeqFeature objects (e.g. GFF3 examples where a CDS has a parent mRNA, or an exon may have multiple parent mRNAs). We do have the idea of sub-features, but this only allows a single parent and thus won't work here. This parent information could be recorded as just another SeqFeature qualifier dictionary entry. P.S. It is nice to see there is an online GFF3 validator :) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Feb 18 09:04:28 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 18 Feb 2009 09:04:28 -0500 Subject: [Biopython-dev] [Bug 2762] GFF capability in SeqIO In-Reply-To: Message-ID: <200902181404.n1IE4SB7010286@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2762 ------- Comment #2 from dalloliogm at gmail.com 2009-02-18 09:04 EST ------- These are the class of things for which I think it would be useful to have a common repository of use cases with the other bio.* projects. I have seen people using every possible extension and modification of gff, and usually re-writing a new gff parser for each case. 
If you can, you should ask to the maintainers and the other bio.* projects and make your patch as much compatible with their. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Feb 18 09:34:35 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 18 Feb 2009 09:34:35 -0500 Subject: [Biopython-dev] [Bug 2762] GFF capability in SeqIO In-Reply-To: Message-ID: <200902181434.n1IEYZEX016913@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2762 ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-18 09:34 EST ------- (In reply to comment #2) > If you can, you should ask to the maintainers and the other bio.* projects and > make your patch as much compatible with their. I'm well aware of one very practical issue regarding compatibility between the Bio* projects, which is for BioSQL. Ideally regardless of which Bio* toolkit you use to load a sequence file into a BioSQL database, they should all record the information in the same way. See http://lists.open-bio.org/pipermail/biosql-l/2009-February/001492.html for a discussion of how GFF files should/could be stored in a BioSQL database. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Thu Feb 19 03:49:40 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 19 Feb 2009 03:49:40 -0500 Subject: [Biopython-dev] [Bug 2762] GFF capability in SeqIO In-Reply-To: Message-ID: <200902190849.n1J8neuO016523@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2762 ------- Comment #4 from lpritc at scri.sari.ac.uk 2009-02-19 03:49 EST ------- (In reply to comment #1) > This looks like a nice idea, and possible with some simplifications to match > our existing object scheme. For example, the current SeqRecord and SeqFeature > classes do not let us explicitly define parent (part-of) relationships between > SeqFeature objects (e.g. GFF3 examples where a CDS has a parent mRNA, or an > exon may have multiple parent mRNAs). We do have the idea of sub-features, but > this only allows a single parent and thus won't work here. This parent > information could be recorded as just another SeqFeature qualifier dictionary > entry. I'm not sure that these relationships would need to complicate the SeqFeature class model at all, and agree that the attribute tags indicating Parenthood (in the sense of CDS having parent mRNA, as opposed to the SeqRecord/SeqFeature parent-child relationship) could potentially be treated just as SeqFeature.qualifiers attributes. The possibility of multiple parents (in general, membership of more than one group) in GFF3 lends itself well to the existing list representation of qualifiers. I may be wrong but I think that at least some, if not all, of the relationships you might be worried about (for example, those in your linked post to the BioSQL list) are well-defined within the SOFA ontology. So, for example, a BioSQL database with properly-configured SOFA ontology, and properly-defined relationships, could be used to infer those parent-child relationships on the basis of the corresponding term_ids. 
I don't think that's a behaviour we need to expect from the SeqRecord/SeqFeature class models. Where possible, those relationships could be rebuilt by another function, or package, so long as the SeqFeature object correctly records those descriptions as SOFA terms in the qualifier (or implicitly uses the SOFA ontology when depositing in a database - but that's another enhancement request ;)). I'm not sure that this needs to complicate the SeqFeature class model either. (That said, maybe somewhere down the line there's a role for SQLite in handling that sort of behaviour 'on-the-fly'...) I may have misunderstood, but I think that this is still the same sort of general arrangement that is already the case for GenBank files. When loading, say, a bacterial chromosome, SeqRecord.seq gets the chromosome sequence, and the gene, CDS, and various misc_features for a single gene are imported as - essentially - independent features. We can unite them, after the fact, by the gene name, or locus_tag, or some other attribute, which is essentially the same kind of operation as uniting a CDS with its parent gene via the SOFA ontology and the Parent tag for upload into a SOFA-compliant instance of BioSQL. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
From biopython at maubp.freeserve.co.uk Thu Feb 19 05:25:46 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 19 Feb 2009 10:25:46 +0000 Subject: [Biopython-dev] determining the version In-Reply-To: <320fb6e00810010929y4dab07a5ya25767cc0818654d@mail.gmail.com> References: <320fb6e00809241412r54c2a3a1mc69f3e573f1eaac7@mail.gmail.com> <63700.34226.qm@web62405.mail.re1.yahoo.com> <320fb6e00809250222h3d0d15bw763446b5f0ec44d1@mail.gmail.com> <320fb6e00810010929y4dab07a5ya25767cc0818654d@mail.gmail.com> Message-ID: <320fb6e00902190225o34092311saddf02ec39f1e1dd@mail.gmail.com> On Wed, Oct 1, 2008 at 4:29 PM, Peter wrote: > Peter wrote: >> From a quick look at approach taken in the matplotlib >> code, we could add something like this to setup.py >> >> __version__ = "Undefined" >> for line in open('Bio/__init__.py'): >> if (line.startswith('__version__')): >> exec(line.strip()) >> >> setup( >> name='biopython', >> version=__version__, >> author='The Biopython Consortium', >> ... >> >> I'm happy to deal with this if we are agreed that we >> should add a __version__ to Bio/__init__.py >> (variations on the naming are possible, but this seems >> to be a de-facto standard in python libraries). > > Any objections to making this change now? > > Peter > Since this thread last year, there have been no objections. Following a recent question on the main mailing list about how to determine the version of Biopython, this seems worth doing before the next release. Again, any objections or comments on the implementation details? Otherwise I'll make this change shortly.
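As one possible variation on the setup.py snippet quoted above, the version line could be parsed as a literal rather than passed to exec. This is only a sketch for a current Python 3 interpreter, not what was committed; the helper name read_version and the demo version string "1.49+" are made up for illustration, and a temporary file stands in for Bio/__init__.py.

```python
import ast
import os
import tempfile


def read_version(init_path, default="Undefined"):
    """Return the __version__ string assigned in init_path without importing it."""
    with open(init_path) as handle:
        for line in handle:
            if line.startswith("__version__"):
                # Parse only the right-hand side as a Python literal,
                # instead of running the whole line through exec().
                _, _, value = line.partition("=")
                return ast.literal_eval(value.strip())
    return default


# Demonstration with a throwaway file standing in for Bio/__init__.py.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "__init__.py")
    with open(demo_path, "w") as handle:
        handle.write('"""Demo package."""\n__version__ = "1.49+"\n')
    demo_version = read_version(demo_path)

print(demo_version)
```

In setup.py the result would then be passed along as version=read_version('Bio/__init__.py'), so the number only needs to be maintained in one place.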
Peter From bsouthey at gmail.com Thu Feb 19 09:44:54 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Thu, 19 Feb 2009 08:44:54 -0600 Subject: [Biopython-dev] determining the version In-Reply-To: <320fb6e00902190225o34092311saddf02ec39f1e1dd@mail.gmail.com> References: <320fb6e00809241412r54c2a3a1mc69f3e573f1eaac7@mail.gmail.com> <63700.34226.qm@web62405.mail.re1.yahoo.com> <320fb6e00809250222h3d0d15bw763446b5f0ec44d1@mail.gmail.com> <320fb6e00810010929y4dab07a5ya25767cc0818654d@mail.gmail.com> <320fb6e00902190225o34092311saddf02ec39f1e1dd@mail.gmail.com> Message-ID: <499D7066.6090309@gmail.com> Peter wrote: > On Wed, Oct 1, 2008 at 4:29 PM, Peter wrote: > >> Peter wrote: >> >>> From a quick look at approach taken in the matplotlib >>> code, we could add something like this to setup.py >>> >>> __version__ = "Undefined" >>> for line in open('Bio/__init__.py'): >>> if (line.startswith('__version__')): >>> exec(line.strip()) >>> >>> setup( >>> name='biopython', >>> version=__version__, >>> author='The Biopython Consortium', >>> ... >>> >>> I'm happy to deal with this if we are agreed that we >>> should add a __version__ to Bio/__init__.py >>> (variations on the naming are possible, but this seems >>> to be a de-facto standard in python libraries). >>> >> Any objections to making this change now? >> >> Peter >> >> > > Since this thread last year, there have been no objections. Following > a recent question on the main mailing list about how to determine the > version of Biopython this seems worth doing before the next release. > Again, an objections or comments on the implementation details? > Otherwise I'll make this change shortly. > > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > Hi, Yes, version information must be included! 
I like numpy's approach because the version will display a svn-related number when using a developmental version. It is rather clever because the real magic occurs with distutils to find the actual svn version (see _get_svn_revision function in the distutils/misc_util.py file). But I do not know if the same tricks would apply to cvs. So the one thing I would ask for is that the __version__ gets changed immediately after a release so it is clear if you are using an official release or a cvs version. I know that will be a little extra burden on the release maintainer. Bruce From biopython at maubp.freeserve.co.uk Thu Feb 19 10:07:30 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 19 Feb 2009 15:07:30 +0000 Subject: [Biopython-dev] determining the version In-Reply-To: <499D7066.6090309@gmail.com> References: <320fb6e00809241412r54c2a3a1mc69f3e573f1eaac7@mail.gmail.com> <63700.34226.qm@web62405.mail.re1.yahoo.com> <320fb6e00809250222h3d0d15bw763446b5f0ec44d1@mail.gmail.com> <320fb6e00810010929y4dab07a5ya25767cc0818654d@mail.gmail.com> <320fb6e00902190225o34092311saddf02ec39f1e1dd@mail.gmail.com> <499D7066.6090309@gmail.com> Message-ID: <320fb6e00902190707q57677756o71249ad8e12298d0@mail.gmail.com> On Thu, Feb 19, 2009 at 2:44 PM, Bruce Southey wrote: > > Hi, > Yes, version information must be included! > > I like numpy's approach because the version will display a svn-related > number when using a developmental version. It is rather clever because the > real magic occurs with distutils to find the actual svn version (see > _get_svn_revision function in the distutils/misc_util.py file). > > But I do not know if the same tricks would apply to cvs. So the one thing I > would ask for is that the __version__ gets changed immediately after a > release so it is clear if you are using an official release or a cvs > version. I know that will be a little extra burden on the release > maintainer.
Given we probably will be moving from CVS to SVN shortly, there doesn't seem to be much point in setting up any "magic" at this point in time. We already manually update the version number in setup.py as part of the build process, so with this change we'll just have to update Bio/__init__.py instead. Peter From bugzilla-daemon at portal.open-bio.org Thu Feb 19 12:37:56 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 19 Feb 2009 12:37:56 -0500 Subject: [Biopython-dev] [Bug 2767] New: Bio.SeqIO support for FASTQ and QUAL files Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2767 Summary: Bio.SeqIO support for FASTQ and QUAL files Product: Biopython Version: Not Applicable Platform: All OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk This is an enhancement bug for adding support to Bio.SeqIO for two commonly used file formats for storing sequencing quality information (i.e. error rates). The format FASTQ (or FastQ) contains both sequences and PHRED style quality scores. This file format appears to have been introduced at the Sanger Centre, but there is no official specification that I am aware of. I would suggest for Bio.SeqIO we call this format "fastq" (as in BioPerl). See: http://maq.sourceforge.net/fastq.shtml http://www.bioperl.org/wiki/FASTQ_sequence_format Also note that Solexa/Illumina sequencers can produce FASTQ-like files which use a different score mapping and therefore cannot be treated in the same way. These would have to be treated as a different file format (e.g. Bio.SeqIO format name "fastq-solexa" might do). QUAL or qual files do not contain sequences but just the PHRED style quality scores. Roche 454 sequencers also appear to use this style file (see also Bug 2382), where again I believe that PHRED style scores are used.
Because they don't hold the actual sequence, Qual files normally come with a matching FASTA file containing the sequence for each entry (in the same order within the file). I would suggest we call this the "qual" format in Bio.SeqIO (to match BioPerl). See: http://www.bioperl.org/wiki/Qual_sequence_format http://www.cees.uio.no/research/facilities/roche454/resultsfiles.html I will attach a preliminary set of code to support this shortly. For the "qual" format Bio.SeqIO would return SeqRecord objects without any sequence (perhaps as None, although we do know the sequence length...). For both the "qual" and "fastq" formats the SeqRecord object would need to store the PHRED quality scores, ideally as a list of integers. Where we put this information is open to debate. The simple option is to just add the list of integers to the annotation dictionary, perhaps under key name "phred_quality" (with "solexa_quality" used when parsing a Solexa/Illumina style FASTQ file). This will then work with BioSQL (although the qualities will get stored in the database as strings rather than integers). However, this does not facilitate slicing a SeqRecord (i.e. it would make implementing enhancement Bug 2507 much harder). In order to use a paired "fasta" and "qual" file you might do this:

    from Bio import SeqIO

    def merge_fasta_qual(fasta_record, qual_record):
        """Modifies the fasta_record in place, and also returns it."""
        # Both files are expected to list the same entries in the same order.
        assert fasta_record.id == qual_record.id
        assert len(fasta_record) == len(qual_record.annotations["phred_quality"])
        fasta_record.annotations["phred_quality"] = qual_record.annotations["phred_quality"]
        return fasta_record

    records = [merge_fasta_qual(f_rec, q_rec) for (f_rec, q_rec) in
               zip(SeqIO.parse(open("example.fasta"), "fasta"),
                   SeqIO.parse(open("example.qual"), "qual"))]

I think it would probably make sense to offer this kind of functionality in the Bio.SeqIO.QualityIO module itself, as the code above has several drawbacks (e.g. the zip makes a list in memory, rather than a generator).
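A generator-based pairing might look roughly like the sketch below. It is not the attached QualityIO code: the Record class is a minimal stand-in for SeqRecord so the example runs without Biopython, and paired_fasta_qual is a hypothetical name. It relies on zip being lazy, which is true in Python 3 (in Python 2, itertools.izip would play that role).

```python
class Record:
    """Minimal stand-in for a SeqRecord: an id, a sequence and annotations."""

    def __init__(self, id, seq=None, annotations=None):
        self.id = id
        self.seq = seq
        self.annotations = annotations if annotations is not None else {}


def paired_fasta_qual(fasta_records, qual_records):
    """Yield FASTA records annotated with the qualities of the matching QUAL record.

    Records are consumed one pair at a time, so nothing is held in memory
    beyond the current pair. Note that zip stops silently at the shorter input.
    """
    for f_rec, q_rec in zip(fasta_records, qual_records):
        if f_rec.id != q_rec.id:
            raise ValueError("ID mismatch: %s vs %s" % (f_rec.id, q_rec.id))
        quals = q_rec.annotations["phred_quality"]
        if len(f_rec.seq) != len(quals):
            raise ValueError("Length mismatch for %s" % f_rec.id)
        f_rec.annotations["phred_quality"] = quals
        yield f_rec


# Made-up demo data; real input would come from the two parsers.
fasta = (Record(name, seq) for name, seq in [("r1", "ACGT"), ("r2", "GG")])
qual = (Record(name, annotations={"phred_quality": q})
        for name, q in [("r1", [30, 30, 20, 10]), ("r2", [40, 40])])

merged = list(paired_fasta_qual(fasta, qual))
print([(rec.id, rec.annotations["phred_quality"]) for rec in merged])
```

Raising ValueError instead of using assert keeps the checks active even when Python runs with optimisation enabled.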
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Feb 19 12:42:08 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 19 Feb 2009 12:42:08 -0500 Subject: [Biopython-dev] [Bug 2767] Bio.SeqIO support for FASTQ and QUAL files In-Reply-To: Message-ID: <200902191742.n1JHg8H2017714@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2767 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-19 12:42 EST ------- Created an attachment (id=1244) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1244&action=view) Read/write support for FASTQ and QUAL files, using the annotation dict This patch stores the PHRED qualities as a list of integers in the SeqRecord's annotations dictionary. Changing this to use say a property, or a separate per-letter-annotation dictionary should be trivial. For QUAL files, the SeqRecord's seq is set to None. This requires a few changes to test_SeqIO.py which does not expect this. We could also consider introducing an UnknownSeq object (giving it a character like "?", "N", or "X", an alphabet, and a length). This would have a __str__ output like "?"*length. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
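To illustrate the "separate per-letter-annotation dictionary" idea mentioned in comment #1, and why it helps with slicing, here is a minimal Python 3 sketch. It deliberately avoids the real SeqRecord class: SlicableRecord, its attribute names and its slicing behaviour are assumptions made up for this example, not the API of the attached patch.

```python
class SlicableRecord:
    """Hypothetical stand-in for a SeqRecord with per-letter annotations."""

    def __init__(self, seq, record_id, annotations=None, letter_annotations=None):
        self.seq = seq
        self.id = record_id
        # General annotation (e.g. source species): not tied to positions.
        self.annotations = dict(annotations or {})
        # Per-letter annotation: every entry must match the sequence length.
        self.letter_annotations = {}
        for key, values in (letter_annotations or {}).items():
            if len(values) != len(seq):
                raise ValueError("%r has length %d, expected %d"
                                 % (key, len(values), len(seq)))
            self.letter_annotations[key] = values

    def __len__(self):
        return len(self.seq)

    def __getitem__(self, index):
        if not isinstance(index, slice):
            return self.seq[index]
        # Slice every per-letter entry together with the sequence.
        sliced = {key: values[index]
                  for key, values in self.letter_annotations.items()}
        # Choice made for this sketch only: general annotation is copied as-is.
        return SlicableRecord(self.seq[index], self.id, self.annotations, sliced)


read = SlicableRecord("TTGGCAGG", "read1",
                      annotations={"source": "example"},
                      letter_annotations={"phred_quality": [26, 26, 26, 22, 26, 12, 18, 23]})
trimmed = read[2:6]
print(trimmed.seq, trimmed.letter_annotations["phred_quality"])
```

Whether general annotation should survive a slice at all is a separate design question; the sketch copies it only to keep the example short.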
From bugzilla-daemon at portal.open-bio.org Thu Feb 19 12:44:40 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 19 Feb 2009 12:44:40 -0500 Subject: [Biopython-dev] [Bug 2767] Bio.SeqIO support for FASTQ and QUAL files In-Reply-To: Message-ID: <200902191744.n1JHiepx018401@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2767 ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-19 12:44 EST ------- Created an attachment (id=1245) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1245&action=view) Patch for Tests/test_SeqIO.py and Bio/SeqIO/__init__.py This patches Bio/SeqIO/__init__.py to define "fastq" and "qual" as input and output file formats. It also patches Tests/test_SeqIO.py to include a couple of FASTQ and QUAL files, and cope with None as a SeqRecord's seq. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Feb 19 12:47:52 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 19 Feb 2009 12:47:52 -0500 Subject: [Biopython-dev] [Bug 2767] Bio.SeqIO support for FASTQ and QUAL files In-Reply-To: Message-ID: <200902191747.n1JHlqFF019314@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2767 ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-19 12:47 EST ------- Created an attachment (id=1246) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1246&action=view) ZIP file of plain text FASTQ and QUAL files for the unit tests These are used in the proposed Bio.SeqIO.QualityIO doctests (attachment 1244), and in the modified test_SeqIO.py file (see patch in attachment 1245). 
These example files should go in a new folder, Tests/Quality -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Feb 19 12:54:16 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 19 Feb 2009 12:54:16 -0500 Subject: [Biopython-dev] [Bug 2382] Generic Roche or GSFlex "FASTA" parser In-Reply-To: Message-ID: <200902191754.n1JHsGd3021099@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2382 ------- Comment #14 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-19 12:54 EST ------- Enhancement Bug 2767 would add support to Bio.SeqIO for the FASTA like QUAL file format used by both Sanger and Roche. See: http://www.bioperl.org/wiki/Qual_sequence_format http://www.cees.uio.no/research/facilities/roche454/resultsfiles.html This would not solve Jared's original request for a generic FASTA like parsing framework - although it would solve the particular example of dealing with a pair of FASTA and QUAL files. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Thu Feb 19 13:09:13 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 19 Feb 2009 13:09:13 -0500 Subject: [Biopython-dev] [Bug 2767] Bio.SeqIO support for FASTQ and QUAL files In-Reply-To: Message-ID: <200902191809.n1JI9Doq024787@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2767 ------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-19 13:09 EST ------- (In reply to comment #0) > In order to use a paired "fasta" and "qual" file you might do this: > > def merge_fasta_qual(fasta_record, qual_record) : > """Modifies the fasta_record in place, and also returns it.""" > assert fasta_record.id == qual_record.id > assert len(f_rec) == len(q_rec.annotations["phred_quality"]) > f_rec.annotations["phred_quality"] = q_rec.annotations["phred_quality"] > return f_rec > > from Bio import SeqIO > records = [merge_fasta_qual(f_rec, q_rec) for (f_rec, q_rec) in \ > zip(SeqIO.parse(open("example.fasta"), "fasta"), > SeqIO.parse(open("example.qual"), "qual"))] > > I think it would probably make sense to offer this kind of functionality in > the Bio.SeqIO.QualityIO module itself, as this code above has several draw > backs (e.g. the zip makes a list in memory, rather than a generator). Alternatively, if you have enough RAM to hold all the records in memory at once, then a simple dictionary approach using just Bio.SeqIO methods would also work. This was inspired by Jared's related example at the end of Bug 2382 comment 0. >>> from Bio import SeqIO >>> reads = SeqIO.to_dict(SeqIO.parse(open("Quality/example.fasta"), "fasta")) >>> for record in SeqIO.parse(open("Quality/example.qual"), "qual") : ... reads[record.id].annotations["phred_quality"] = record.annotations["phred_quality"] You can then access any record by its key, and get both the sequence and the quality scores. 
>>> print reads["EAS54_6_R1_2_1_540_792"].format("fastq") @EAS54_6_R1_2_1_540_792 TTGGCAGGCCAAGGCCGATGGATCA + ;;;;;;;;;;;7;;;;;-;;;3;83 This is neat, but given QUAL files are often very very large, wanting to use an iterator may be more typical. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Feb 19 13:19:49 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 19 Feb 2009 13:19:49 -0500 Subject: [Biopython-dev] [Bug 2382] Generic Roche or GSFlex "FASTA" parser In-Reply-To: Message-ID: <200902191819.n1JIJndd026934@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2382 ------- Comment #15 from jflatow at northwestern.edu 2009-02-19 13:19 EST ------- Created an attachment (id=1247) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1247&action=view) command line tool/library for sifting through and transforming complex (multiline) records I should have posted this earlier. Basically this is the strategy I have been taking for dealing with complex record types (multiline), so that you can filter them the way you would with sed/awk or other stream editing tools. This is far from perfect, and it may not be helpful, but perhaps it will give a better idea of what I was picturing. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Thu Feb 19 13:48:46 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 19 Feb 2009 13:48:46 -0500 Subject: [Biopython-dev] [Bug 2382] Generic Roche or GSFlex "FASTA" parser In-Reply-To: Message-ID: <200902191848.n1JImkmY030625@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2382 ------- Comment #16 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-19 13:48 EST ------- (In reply to comment #15) > Created an attachment (id=1247) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1247&action=view) [details] > command line tool/library for sifting through and transforming complex > (multiline) records > > I should have posted this earlier. Basically this is the strategy I have been > taking for dealing with complex record types (multiline), so that you can > filter them the way you would with sed/awk or other stream editing tools. This > is far from perfect, and it may not be helpful, but perhaps it will give a > better idea of what I was picturing. I just had a quick look, and in some ways it reminds me of the deprecated Martel/Mindy parsing infrastructure Biopython used to use. This was very flexible, perhaps too flexible as it had quite a learning curve - plus it didn't scale well with large records. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Fri Feb 20 03:34:42 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 20 Feb 2009 03:34:42 -0500 Subject: [Biopython-dev] [Bug 2767] Bio.SeqIO support for FASTQ and QUAL files In-Reply-To: Message-ID: <200902200834.n1K8YgK2002690@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2767 ------- Comment #5 from jblanca at btc.upv.es 2009-02-20 03:34 EST ------- Regarding where to store the quality information in the SeqRecord I'm in favor of using a property named .qual or .quality. That is consistent with the actual .seq property. I think this approach is cleaner, and it is very easy to implement and to understand. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Fri Feb 20 06:15:57 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 20 Feb 2009 11:15:57 +0000 Subject: [Biopython-dev] Quality scores (and per-letter-annotation) in a SeqRecord? Message-ID: <320fb6e00902200315q19c4dfebr8502a052a1a4fc9b@mail.gmail.com> Over on enhancement Bug 2767, I have uploaded parsers and writers for the FASTQ and QUAL file formats, which both hold PHRED style quality scores (integers ranging from 0 to about 90). See http://bugzilla.open-bio.org/show_bug.cgi?id=2767 One open question in this enhancement is how to store these PHRED quality scores in the SeqRecord. Keep in mind that there is more than one type of quality score in use, for example Solexa/Illumina use a different scaling (although it is possible to map between them without too much trouble for the mid range scores), something I hadn't noticed when we last talked about this (Sept 2008).
See: http://lists.open-bio.org/pipermail/biopython-dev/2008-September/004250.html For the initial code on Bug 2767, I took the simple and extensible route of recording the PHRED qualities as a list of integers in the SeqRecord's annotation dictionary under the key "phred_quality". There are a couple of drawbacks. Firstly, sequencing qualities are a good example of per-letter-annotation (others include secondary structure, atomic coordinates - which would apply to proteins as well as nucleotides). If we want to be able to slice a SeqRecord (Bug 2507) then it is important to distinguish between general annotation (like the source species) and per-letter-annotation (which should also be sliced). One way of dealing with this is to introduce a per-letter-annotation dictionary for the SeqRecord, whose entries would be strings/lists with a length equal to that of the sequence. Secondly, putting the PHRED qualities inside an annotations dictionary (or even a per-letter-annotation dictionary) doesn't make them very accessible. If you want to work with sequencing reads, then the sequence, quality and identifier are all key properties. In bug 2767 comment #5 Jose wrote: > Regarding where to store the quality information in the SeqRecord I'm in > favor of using a property named .qual or .quality. That is consistent with > the actual .seq property. I think this approach is cleaner, and it is very > easy to implement and to understand. I can certainly appreciate that a top level property is easier to use - and perhaps quality scores are important enough to justify this. However, what about PHRED qualities versus Solexa/Illumina qualities, or another sequencing system's scheme? I hadn't thought about this incompatibility when we were discussing this on the mailing list last year (Sept 2008). I suppose you could consider adding a .phred_quality property which is explicit, but then you'd end up with many different properties.
Then there are other per-letter quality annotations - you might want the A, C, G and T intensity from capillary sequencing (four sets of numbers, not just one). Plus of course this doesn't address non-quality related per-letter-annotations (like secondary structure, or atomic coordinates). My point is that we can't give top level properties to everything, hence the introduction of the annotations dictionary in the first place. Only a handful of really important things got their own properties (id, name, description and the sequence itself). If there was only ONE key quality score, then I wouldn't mind making an exception so much - but that doesn't seem to be the case. Peter From bugzilla-daemon at portal.open-bio.org Fri Feb 20 06:17:44 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 20 Feb 2009 06:17:44 -0500 Subject: [Biopython-dev] [Bug 2767] Bio.SeqIO support for FASTQ and QUAL files In-Reply-To: Message-ID: <200902201117.n1KBHiLG024083@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2767 ------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-20 06:17 EST ------- In comment #5 Jose wrote: > Regarding where to store the quality information in the SeqRecord I'm in > favor of using a property named .qual or .quality. That is consistent with > the actual .seq property. I think this approach is cleaner, and it is very > easy to implement and to understand. I can appreciate that a top level property is easier to use - and perhaps quality scores are important enough to justify this. However, what about PHRED qualities versus Solexa/Illumina qualities, or another sequencing system's scheme?
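To make the PHRED versus Solexa incompatibility concrete, here is a minimal Python 3 sketch of converting between the two scales. It assumes the usual definitions, which are not spelled out in this thread: a PHRED score is -10*log10(p) for error probability p, while the older Solexa score is -10*log10(p/(1-p)). The function names are illustrative only.

```python
from math import log10


def solexa_to_phred(q_solexa):
    """Convert a Solexa-scaled score to the PHRED scale (assumed definitions)."""
    return 10 * log10(10 ** (q_solexa / 10.0) + 1)


def phred_to_solexa(q_phred):
    """Convert a PHRED score to the Solexa scale.

    Undefined for q_phred == 0 (error probability 1), where the odds
    ratio p/(1-p) is infinite.
    """
    return 10 * log10(10 ** (q_phred / 10.0) - 1)


for q in (-5, 0, 10, 40):
    print(q, round(solexa_to_phred(q), 2))
```

Under these definitions the two scales nearly coincide for high scores and drift apart for low ones, which is why a list of Solexa scores cannot simply be stored under a "phred_quality" key.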
I've replied in more depth on the mailing list, where I suggest we discuss this: http://lists.open-bio.org/pipermail/biopython-dev/2009-February/005340.html -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From jblanca at btc.upv.es Fri Feb 20 06:49:36 2009 From: jblanca at btc.upv.es (Jose Blanca) Date: Fri, 20 Feb 2009 12:49:36 +0100 Subject: [Biopython-dev] Quality scores (and per-letter-annotation) in a SeqRecord? In-Reply-To: <320fb6e00902200315q19c4dfebr8502a052a1a4fc9b@mail.gmail.com> References: <320fb6e00902200315q19c4dfebr8502a052a1a4fc9b@mail.gmail.com> Message-ID: <200902201249.36743.jblanca@btc.upv.es> > I suppose you could consider adding a .phred_quality > property which is explicit, but then you'd end up with many different > properties. Then there are other per-letter quality annotations - you > might want the A, C, G and T intensity from capillary sequencing (four > sets of numbers, not just one). Plus of course this doesn't address > non-quality related per-letter-annotations (like secondary structure, > or atomic coordinates). > > My point is that if we can't give top level properties to everything, > hence the original introduction of the annotations dictionary in the > first place. Only a handful of really important things got their own > properties (id, name, description and the sequence itself). If there > was only ONE key quality score, then I wouldn't mind making an > exception so much - but that doesn't seem to be the case. That's a very good point. It wouldn't be wise to populate the SeqRecord class with a lot of properties. Another possible approach would be to create a derived class for that, a SeqWithQuality. It would be like a SeqRecord but with a .quality property. For other cases other classes could be derived from SeqRecord.
The problem with putting the qualities in a dict with all the other per-base annotations is that it behaves differently from the .seq case. The seq case is special because it is used much more, so maybe that's fair enough. I don't know, maybe it is wiser to put all the per-base annotations in a dict and leave the sequence outside. In that way we won't be creating a lot of new classes derived from SeqRecord. The more I think about the dict possibility, the more I like it. -- Jose M. Blanca Postigo Instituto Universitario de Conservacion y Mejora de la Agrodiversidad Valenciana (COMAV) Universidad Politecnica de Valencia (UPV) Edificio CPI (Ciudad Politecnica de la Innovacion), 8E 46022 Valencia (SPAIN) Tlf.:+34-96-3877000 (ext 88473) From lpritc at scri.ac.uk Fri Feb 20 09:15:50 2009 From: lpritc at scri.ac.uk (Leighton Pritchard) Date: Fri, 20 Feb 2009 14:15:50 +0000 Subject: [Biopython-dev] Quality scores (and per-letter-annotation) in a SeqRecord? In-Reply-To: <200902201249.36743.jblanca@btc.upv.es> Message-ID: Another 2p... I collect them, you know... An additional determinant of how these values are best stored is: "What will they be used for?". If the only use they would ever find was to accompany a sequence so that its file format could be converted from one with embedded qualities to a format that required two such files (or vice-versa), then straightforward storage as a string in a dictionary is all that's needed. This would be sufficient for conversion between some quality scores, as a utility function could just grab the stored string (given an appropriate name for each quality format). The question of how these per-symbol annotations would be modified when returning a Seq slice or join may be an issue. If 'live' access to the values is required for calculation or alignment purposes, then a different interface might be more useful, permitting slicing, base selection on the basis of quality, or other operations.
This use case is more complex, as the return value is likely to be dependent on the quality format (single- or multiple-value per base). Conceptually, I see quality scores as annotations of a sequence, rather than an intrinsic property of the sequence, so am happy for them to live in the same place other annotations do. I also see them as only one instance of a class of per-symbol annotations (along with hydrophobicity scores, secondary structure predictions, read map counts and several other measures). I think, therefore, that there is a case for a class describing per-symbol annotations to a Seq, and placing these in a dictionary of per-symbol annotations. Slices of the parent Seq could then be propagated downwards to all members of that dictionary (which would also be expected to implement the same string-like methods as the parent). The per-symbol annotation objects could be subclassed and/or contain a descriptive string from a controlled vocabulary to indicate their format, for standard interfacing with external packages (e.g. Drawing TOPS diagrams from secondary structure predictions or rendering base quality profiles), which I think would be a flexible approach. On 20/02/2009 11:49, "Jose Blanca" wrote: >> I suppose you could consider adding a .phred_quality >> property which is explicit, but then you'd end up with many different >> properties. Then there are other per-letter quality annotations - you >> might want the A, C, G and T intensity from capillary sequencing (four >> sets of numbers, not just one). Plus of course this doesn't address >> non-quality related per-letter-annotations (like secondary structure, >> or atomic coordinates). >> >> My point is that if we can't give top level properties to everything, >> hence the original introduction of the annotations dictionary in the >> first place. Only a handful of really important things got their own >> properties (id, name, description and the sequence itself). 
If there >> was only ONE key quality score, then I wouldn't mind making an >> exception so much - but that doesn't seem to be the case. > That's a very good point. It wouldn't be wise to populate the SeqRecord class > with a lot of properties. > Another posible approach would be to create a derived class for that a > SeqWithQuality. It would be like a SeqRecord but with a .quality property. > For other cases other classes could be derived from SeqRecord. > The problem with putting the quatilies in a dict with all the other per base > annotation is that it has a different behaviour than the .seq case. The seq > case is special because is much more used, so maybe that's fair enough. > I don't know, maybe it is wiser to set all the per case annotations in a dict > a let the sequence outside. In that way we won't be creating a lot of new > classes derived from SeqRecord. > The more I think about the dict possibility, the more I like it. -- Dr Leighton Pritchard MRSC D131, Plant Pathology Programme, SCRI Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA e:lpritc at scri.ac.uk w:http://www.scri.ac.uk/staff/leightonpritchard gpg/pgp: 0xFEFC205C tel:+44(0)1382 562731 x2405 ______________________________________________________________________ SCRI, Invergowrie, Dundee, DD2 5DA. The Scottish Crop Research Institute is a charitable company limited by guarantee. Registered in Scotland No: SC 29367. Recognised by the Inland Revenue as a Scottish Charity No: SC 006662. DISCLAIMER: This email is from the Scottish Crop Research Institute, but the views expressed by the sender are not necessarily the views of SCRI and its subsidiaries. This email and any files transmitted with it are confidential to the intended recipient at the e-mail address to which it has been addressed. It may not be disclosed or used by any other than that addressee. 
If you are not the intended recipient you are requested to preserve this confidentiality and you must not use, disclose, copy, print or rely on this e-mail in any way. Please notify postmaster at scri.ac.uk quoting the name of the sender and delete the email from your system. Although SCRI has taken reasonable precautions to ensure no viruses are present in this email, neither the Institute nor the sender accepts any responsibility for any viruses, and it is your responsibility to scan the email and the attachments (if any). ______________________________________________________________________ From bugzilla-daemon at portal.open-bio.org Fri Feb 20 11:01:06 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 20 Feb 2009 11:01:06 -0500 Subject: [Biopython-dev] [Bug 2768] New: Bio.Entrez under a proxy Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2768 Summary: Bio.Entrez under a proxy Product: Biopython Version: 1.49b Platform: PC OS/Version: Linux Status: NEW Severity: normal Priority: P2 Component: Documentation AssignedTo: biopython-dev at biopython.org ReportedBy: dalloliogm at gmail.com I think you should add, in biopython's tutorial, a short explanation of how to set up a proxy for modules like Bio.Entrez. I tried a simple query with Entrez, but the first time I received this error:
$: ipython
>>> from Bio import Entrez
>>> handle = Entrez.einfo()
IOError Traceback (most recent call last)
...
[Errno url error] invalid proxy for http: 'proxy.upf.es:8080'
I am using the latest biopython cvs, updated yesterday.
On my system, the proxy variables were set like this:
$http_proxy = 'proxy.upf.es:8080'
$HTTP_PROXY = 'proxy.upf.es:8080'
After a few tries, it seems that the module uses the HTTP_PROXY variable and that it expects it to contain 'http://':
$: export HTTP_PROXY=http://proxy.upf.es:8080
$: ipython
>>> from Bio import Entrez
>>> Entrez.einfo()
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Feb 20 11:15:24 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 20 Feb 2009 11:15:24 -0500 Subject: [Biopython-dev] [Bug 2769] New: Entrez results: seek methods doesn't work? Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2769 Summary: Entrez results: seek methods doesn't work? Product: Biopython Version: 1.49b Platform: PC OS/Version: Linux Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: dalloliogm at gmail.com Many methods in Entrez return a file-like object which has methods like .read, .readlines, etc. However, calling the .seek method gives this error:
>>> from Bio import Entrez
>>> result = Entrez.einfo()
>>> print result.read()
...
>>> print result.read()
>>> print result.seek(0)
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
/home/gioby/ in ()
/home/gioby/usr/share/biopython/Bio/File.pyc in seek(self, *args)
     89     def seek(self, *args):
     90         self._saved = []
---> 91         self._handle.seek(*args)
     92
     93     def __getattr__(self, attr):
AttributeError: addinfourl instance has no attribute 'seek'
p.s. system info: I am running the latest biopython cvs.
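The traceback suggests that the object wrapped by the Entrez handle is a network response, which can only be read forwards. One possible workaround on the user side, sketched here for Python 3 and not a fix to Bio.File itself, is to read the whole response once and wrap it in an in-memory buffer. The helper make_seekable and the ForwardOnlyHandle stand-in are hypothetical names made up for this example.

```python
import io


def make_seekable(handle):
    """Read a forward-only handle to the end and return a seekable in-memory copy."""
    data = handle.read()
    if isinstance(data, bytes):
        return io.BytesIO(data)
    return io.StringIO(data)


class ForwardOnlyHandle:
    """Stand-in for a network response: read() works, seek() does not exist."""

    def __init__(self, text):
        self._chunks = [text]

    def read(self):
        # A second read() returns nothing, like an exhausted socket.
        return self._chunks.pop() if self._chunks else ""


buffered = make_seekable(ForwardOnlyHandle("<eInfoResult/>"))
first = buffered.read()
buffered.seek(0)
second = buffered.read()
print(first == second)
```

This costs memory proportional to the size of the response, so it only suits results that comfortably fit in RAM.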
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Feb 20 11:21:36 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 20 Feb 2009 11:21:36 -0500 Subject: [Biopython-dev] [Bug 2770] New: suggestion: raise a warning if Entrez.email is not set Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2770 Summary: suggestion: raise a warning if Entrez.email is not set Product: Biopython Version: 1.49b Platform: PC OS/Version: Linux Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: dalloliogm at gmail.com This is just a proposal... In the biopython tutorial, you advise users to always set Entrez.email before using any Entrez utility: - http://www.biopython.org/DIST/docs/tutorial/Tutorial.html#htoc65 You could raise a warning if any utility is used while Entrez.email is not set. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Feb 20 11:43:20 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 20 Feb 2009 11:43:20 -0500 Subject: [Biopython-dev] [Bug 2771] New: Entrez.efetch: dbSNP not supported yet? Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2771 Summary: Entrez.efetch: dbSNP not supported yet?
Product: Biopython Version: 1.49b Platform: PC OS/Version: Linux Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: dalloliogm at gmail.com Executing efetch on the 'snp' database returns an HTML file instead of XML (by default, running efetch on 'gene' or another database returns XML).
>>> handle = Entrez.efetch(db='snp', id='9996597',)
>>> cont = handle.read()
>>> print cont
...
Moreover, even when forcing retmode=xml, it seems that the XML file returned is in a format not supported by biopython (not sure if this is an NCBI problem):
>>> handle = Entrez.efetch(db='snp', id='9996597', retmode='xml')
>>> cont = handle.read()
>>> print cont
' ...
You can see the problem better if you open the result handle as explained in the tutorial, via Entrez.read:
>>> handle = Entrez.efetch(db='snp', id='9996597', retmode='xml')
>>> result = Entrez.read(handle)
---------------------------------------------------------------------------
UnboundLocalError Traceback (most recent call last)
/home/gioby/Test/NCBI_wsdl/ in ()
/home/gioby/usr/share/biopython/Bio/Entrez/__init__.pyc in read(handle)
    284     DTDs = os.path.join(__path__[0], "DTDs")
    285     handler = DataHandler(DTDs)
--> 286     record = handler.run(handle)
    287     return record
    288
/home/gioby/usr/share/biopython/Bio/Entrez/Parser.py in run(self, handle)
     93         self.parser.CharacterDataHandler = self.characters
     94         self.parser.ExternalEntityRefHandler = self.external_entity_ref_handler
---> 95         self.parser.ParseFile(handle)
     96         self.parser = None
     97         return self.object
/home/gioby/usr/share/biopython/Bio/Entrez/Parser.py in startElement(self, name, attrs)
    129             self.attributes = attrs
    130             return
--> 131         if object!="":
    132             object.tag = name
    133             if attrs:
UnboundLocalError: local variable 'object' referenced before assignment
Try this code as well; it returns a different error:
>>> handle = Entrez.efetch(db='snp', id='9996597') # retmode is HTML
>>> result = Entrez.read(handle)
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Feb 20 11:50:09 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 20 Feb 2009 11:50:09 -0500 Subject: [Biopython-dev] [Bug 2771] Entrez.efetch: dbSNP not supported yet? In-Reply-To: Message-ID: <200902201650.n1KGo9e5026394@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2771 ------- Comment #1 from dalloliogm at gmail.com 2009-02-20 11:50 EST ------- (In reply to comment #0) > Executing efetch on the 'snp' database returns an html file instead of an xml > (by default, running efetch on 'gene' or another database returns an xml). > > >>> handle = Entrez.efetch(db='snp', id='9996597',) > >>> cont = handle.read() > >>> print cont > > ... > Sorry, this part is not correct. I am opening another bug report (#2772 ?) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Feb 20 12:16:49 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 20 Feb 2009 12:16:49 -0500 Subject: [Biopython-dev] [Bug 2772] New: Entrez.efetch: the default value of 'retmode' depends on the database Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2772 Summary: Entrez.efetch: the default value of 'retmode' depends on the database Product: Biopython Version: 1.49b Platform: PC OS/Version: Linux Status: NEW Severity: normal Priority: P5 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: dalloliogm at gmail.com minor issue: Entrez.efetch and Entrez.esummary have different 'retmode' default values.
This sometimes is confusing for the users. >>> Entrez.esummary(db='snp', id=1).readline() >>> Entrez.efetch(db='snp', id=1).readline() -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Feb 20 12:37:08 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 20 Feb 2009 12:37:08 -0500 Subject: [Biopython-dev] [Bug 2768] Bio.Entrez under a proxy In-Reply-To: Message-ID: <200902201737.n1KHb8hp008109@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2768 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-20 12:37 EST ------- This does seem like a good idea for the documentation - after all, you are not the first person to ask. See: http://lists.open-bio.org/pipermail/biopython/2008-November/004756.html http://www.python.org/doc/2.5.2/lib/module-urllib.html -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Fri Feb 20 12:42:29 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 20 Feb 2009 12:42:29 -0500 Subject: [Biopython-dev] [Bug 2770] suggestion: raise a warning if Entrez.email is not set In-Reply-To: Message-ID: <200902201742.n1KHgTOo009851@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2770 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-20 12:42 EST ------- We did actually have a warning in the code in CVS, but had concluded it was perhaps a bit too much - see mailing list discussions and revision 1.37 of Bio/Entrez/__init__.py, viewable here: http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Entrez/__init__.py?cvsroot=biopython The NCBI guidelines are relatively relaxed about this: http://www.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html#UserSystemRequirements Let's leave this open in case anyone else wants to comment, but unless the NCBI change their guidelines I am inclined to leave Bio.Entrez as it is. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Fri Feb 20 12:48:00 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 20 Feb 2009 12:48:00 -0500 Subject: [Biopython-dev] [Bug 2772] Entrez.efetch: the default value of 'retmode' depends on the database In-Reply-To: Message-ID: <200902201748.n1KHm0f4011522@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2772 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |INVALID ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-20 12:48 EST ------- This is not a bug in Biopython - for Bio.Entrez we leave the defaults to the NCBI, who can and may change them at any time. If you want XML, you should explicitly ask for it. Explicit is better than implicit. http://www.python.org/dev/peps/pep-0020/ [For comparison, our qblast wrapper is perhaps more confusing as it has its own defaults set within Biopython, and since it was first written the NCBI have changed some of their default parameters.] -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
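The advice in comment #1 above (ask for XML explicitly rather than rely on whatever default the NCBI currently uses) can be illustrated without Biopython. This is only a minimal sketch in Python 3: the helper name efetch_url is hypothetical and is not part of Bio.Entrez, and the only thing assumed about the service is the public efetch.fcgi endpoint. With Biopython itself the equivalent is simply passing retmode='xml' to Entrez.efetch, as in the bug reports.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities endpoint for efetch (Bio.Entrez builds this for you).
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(db, id, retmode):
    """Build an efetch URL (hypothetical helper, not a Biopython function).

    retmode deliberately has no default value, so the caller has to state
    the format they want instead of inheriting a per-database default.
    """
    params = {"db": db, "id": id, "retmode": retmode}
    return EFETCH_URL + "?" + urlencode(params)


url = efetch_url("snp", "9996597", retmode="xml")
```

Leaving out retmode here raises a TypeError at call time, which is the "explicit is better than implicit" behaviour taken to its extreme; Bio.Entrez itself does not enforce this.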
From bugzilla-daemon at portal.open-bio.org Fri Feb 20 12:58:12 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 20 Feb 2009 12:58:12 -0500 Subject: [Biopython-dev] [Bug 2771] Bio.Entrez.read can't parse XML files from dbSNP (snp database) In-Reply-To: Message-ID: <200902201758.n1KHwCvA014904@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2771

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Entrez.efetch: dbSNP not    |Bio.Entrez.read can't parse
                   |supported yet?              |XML files from dbSNP (snp
                   |                            |database)

------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-20 12:58 EST -------
I've retitled the bug to focus on the failure to parse the XML file (there is no problem in Bio.Entrez.efetch as far as I can tell). For example,

>>> from Bio import Entrez
>>> result = Entrez.read(Entrez.efetch(db='snp', id='9996597', retmode='xml'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "Bio/Entrez/__init__.py", line 286, in read
    record = handler.run(handle)
  File "Bio/Entrez/Parser.py", line 95, in run
    self.parser.ParseFile(handle)
  File "Bio/Entrez/Parser.py", line 131, in startElement
    if object!="":
UnboundLocalError: local variable 'object' referenced before assignment

This may be an NCBI bug, try this:

>>> from Bio import Entrez
>>> print Entrez.efetch(db='snp', id='9996597', retmode='xml').read()
...

Then copy and paste the XML into a validation site like http://www.validome.org/xml/validate/ where I see an error. On the other hand, http://validator.w3.org/#validate_by_input seems happy with only a warning.
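As a local alternative to pasting the XML into a web validator, a first check can be done with the expat parser from the Python standard library. This is a minimal sketch in Python 3 with a hypothetical helper name (check_well_formed). It tests well-formedness only and does not validate against a DTD, so it may not reproduce whatever the validation sites mentioned above report for the dbSNP output.

```python
import xml.parsers.expat


def check_well_formed(xml_text):
    """Return (True, None) if xml_text is well-formed XML, else (False, message).

    Hypothetical helper for illustration; it checks well-formedness only,
    not validity against any DTD or schema.
    """
    parser = xml.parsers.expat.ParserCreate()
    try:
        # The second argument marks this as the final chunk of the document.
        parser.Parse(xml_text, True)
    except xml.parsers.expat.ExpatError as err:
        return False, str(err)
    return True, None


ok, message = check_well_formed("<Rs><Sequence/></Rs>")
broken, reason = check_well_formed("<Rs><Sequence></Rs>")
```

In practice xml_text would be the string returned by reading the efetch handle; the element names used here are made up for the example.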
From bugzilla-daemon at portal.open-bio.org Fri Feb 20 13:03:11 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 20 Feb 2009 13:03:11 -0500 Subject: [Biopython-dev] [Bug 2771] Bio.Entrez.read can't parse XML files from dbSNP (snp database) In-Reply-To: Message-ID: <200902201803.n1KI3BAi016506@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2771

------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-20 13:03 EST -------
(In reply to comment #2)
> This may be an NCBI bug, ...

According to this page there is/was a problem with the XML files returned for the snp database by efetch,
http://eutils.ncbi.nlm.nih.gov/entrez/query/static/esoap_help.html

>> Known issues
>> * ...
>> * eFetch utility generates an invalid XML for SNP, so currently it doesn't
>> work through SOAP. The bug is being fixed.
>> * ...

Unfortunately I have no idea if that information is current or not. This could be unrelated.

From bugzilla-daemon at portal.open-bio.org Fri Feb 20 13:07:15 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 20 Feb 2009 13:07:15 -0500 Subject: [Biopython-dev] [Bug 2769] Entrez results: seek methods doesn't work? In-Reply-To: Message-ID: <200902201807.n1KI7FBv017703@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2769

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |INVALID

------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-20 13:07 EST -------
This is normal and expected behaviour from many python file-like handles. In particular it is normal for handles to network resources, e.g.

>>> import urllib
>>> handle = urllib.urlopen("http://biopython.org/")
>>> print handle.read()
...
>>> handle.seek(0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: addinfourl instance has no attribute 'seek'

From bugzilla-daemon at portal.open-bio.org Fri Feb 20 13:43:52 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 20 Feb 2009 13:43:52 -0500 Subject: [Biopython-dev] [Bug 2760] proposal: enhancement for SeqIO.TabIO In-Reply-To: Message-ID: <200902201843.n1KIhqCG026268@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2760

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-20 13:43 EST -------
(In reply to comment #0)
> this patch fix a problem that TabIO had (fail if there it are more than two
> tabs, or spaces instead of tabs, between the title and the sequence),

Those cases are intentionally not supported, but the error message should now be clearer.

> and introduces a check to skip empty lines.

Fixed, as this seems like a good idea (you can often get an empty line at the end of files). Closing this bug as "fixed".
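The behaviour described in comment #4 on Bug 2760 (skip empty lines, reject anything other than a single tab between title and sequence with a clear message) could look roughly like the following. This is a sketch in Python 3 under those stated assumptions only; parse_tab_lines is a hypothetical name and this is not the actual Bio.SeqIO.TabIO code.

```python
def parse_tab_lines(lines):
    """Yield (title, sequence) pairs from tab-separated lines.

    Hypothetical illustration, not the Biopython implementation.
    Blank lines are skipped; any other line must contain exactly one tab.
    """
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line.strip():
            # Skip empty lines, e.g. a stray newline at the end of the file.
            continue
        tabs = line.count("\t")
        if tabs != 1:
            raise ValueError(
                "Line %i: expected exactly one tab between title and "
                "sequence, found %i" % (number, tabs)
            )
        title, seq = line.split("\t")
        yield title.strip(), seq.strip()


records = list(parse_tab_lines(["id1\tACGT\n", "\n", "id2\tTTAA\n", "   \n"]))
```

A line that uses spaces instead of a tab, or several tabs, raises ValueError naming the offending line number rather than being silently accepted.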
From chapmanb at 50mail.com Fri Feb 20 18:19:04 2009 From: chapmanb at 50mail.com (Brad Chapman) Date: Fri, 20 Feb 2009 18:19:04 -0500 Subject: [Biopython-dev] Quality scores (and per-letter-annotation) in a SeqRecord? In-Reply-To: References: <200902201249.36743.jblanca@btc.upv.es> Message-ID: <20090220231904.GE18294@sobchak.mgh.harvard.edu>

Hi all;
Good points on this debate so far. What do you all think about a hybrid approach where the .quality attribute is a dictionary? The keys would be the quality type ("phred", "solexa"...) and the values would be a list or string the same length as the sequence.

For slicing, all of the quality dictionary values would be sliced identically to the sequence itself. For BioSQL storage the quality items would go in as annotations with names as a concatenation of the attribute and type ("quality_phred").

Treating these specially on the BioSQL in/out is a little hack-y, but quality is likely important enough to not bury it.

For Leighton's idea of generalization you could either:

- Derive a heavy-weight SeqRecord class from the base class that added several additional per-symbol cases.

- Provide a generic per_symbol_annotations attribute that collected these as a dictionary of dictionaries:

  dict(quality = dict(phred = [20, 30]),
       hydrophobicity = dict(some_predictor = ['some', 'scores'])
      )

These could map to generic attributes in the same way and follow the same slicing rules. After writing this up, I think the second idea is better and probably exactly what Leighton was proposing.

Brad

> Another 2p... I collect them, you know...
>
> An additional determinant of how these values are best scored is: "What will
> they be used for?".
>
> If the only use they would ever find was to accompany a sequence so that its
> file format could be converted from one with embedded qualities to a format
> that required two such files (or vice-versa), then straightforward storage
> as a string in a dictionary is all that's needed.
This would be sufficient > for conversion between some quality scores, as a utility function could just > grab the stored string (given an appropriate name for each quality format). > The question of how these per-symbol annotations would be modified when > returning a Seq slice or join may be an issue. > > If 'live' access to the values is required for calculation or alignment > purposes, then a different interface might be more useful, permitting > slicing, base selection on the basis of quality, or other operation. This > use case is more complex, as the return value is likely to be dependent on > the quality format (single- or multiple-value per base). > > Conceptually, I see quality scores as annotations of a sequence, rather than > an intrinsic property of the sequence, so am happy for them to live in the > same place other annotations do. I also see them as only one instance of a > class of per-symbol annotations (along with hydrophobicity scores, secondary > structure predictions, read map counts and several other measures). I > think, therefore, that there is a case for a class describing per-symbol > annotations to a Seq, and placing these in a dictionary of per-symbol > annotations. Slices of the parent Seq could then be propagated downwards to > all members of that dictionary (which would also be expected to implement > the same string-like methods as the parent). > > The per-symbol annotation objects could be subclassed and/or contain a > descriptive string from a controlled vocabulary to indicate their format, > for standard interfacing with external packages (e.g. Drawing TOPS diagrams > from secondary structure predictions or rendering base quality profiles), > which I think would be a flexible approach. > > On 20/02/2009 11:49, "Jose Blanca" wrote: > > >> I suppose you could consider adding a .phred_quality > >> property which is explicit, but then you'd end up with many different > >> properties. 
Then there are other per-letter quality annotations - you > >> might want the A, C, G and T intensity from capillary sequencing (four > >> sets of numbers, not just one). Plus of course this doesn't address > >> non-quality related per-letter-annotations (like secondary structure, > >> or atomic coordinates). > >> > >> My point is that if we can't give top level properties to everything, > >> hence the original introduction of the annotations dictionary in the > >> first place. Only a handful of really important things got their own > >> properties (id, name, description and the sequence itself). If there > >> was only ONE key quality score, then I wouldn't mind making an > >> exception so much - but that doesn't seem to be the case. > > That's a very good point. It wouldn't be wise to populate the SeqRecord class > > with a lot of properties. > > Another posible approach would be to create a derived class for that a > > SeqWithQuality. It would be like a SeqRecord but with a .quality property. > > For other cases other classes could be derived from SeqRecord. > > The problem with putting the quatilies in a dict with all the other per base > > annotation is that it has a different behaviour than the .seq case. The seq > > case is special because is much more used, so maybe that's fair enough. > > I don't know, maybe it is wiser to set all the per case annotations in a dict > > a let the sequence outside. In that way we won't be creating a lot of new > > classes derived from SeqRecord. > > The more I think about the dict possibility, the more I like it. > > -- > Dr Leighton Pritchard MRSC > D131, Plant Pathology Programme, SCRI > Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA > e:lpritc at scri.ac.uk w:http://www.scri.ac.uk/staff/leightonpritchard > gpg/pgp: 0xFEFC205C tel:+44(0)1382 562731 x2405 > > > ______________________________________________________________________ > SCRI, Invergowrie, Dundee, DD2 5DA. 
> The Scottish Crop Research Institute is a charitable company limited by > guarantee. > Registered in Scotland No: SC 29367. > Recognised by the Inland Revenue as a Scottish Charity No: SC 006662. > > > DISCLAIMER: > > This email is from the Scottish Crop Research Institute, but the views > expressed by the sender are not necessarily the views of SCRI and its > subsidiaries. This email and any files transmitted with it are > confidential > > to the intended recipient at the e-mail address to which it has been > addressed. It may not be disclosed or used by any other than that > addressee. > If you are not the intended recipient you are requested to preserve this > > confidentiality and you must not use, disclose, copy, print or rely on > this > e-mail in any way. Please notify postmaster at scri.ac.uk quoting the > name of the sender and delete the email from your system. > > Although SCRI has taken reasonable precautions to ensure no viruses are > present in this email, neither the Institute nor the sender accepts any > responsibility for any viruses, and it is your responsibility to scan > the email and the attachments (if any). > ______________________________________________________________________ > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From idoerg at gmail.com Fri Feb 20 19:24:43 2009 From: idoerg at gmail.com (Iddo Friedberg) Date: Fri, 20 Feb 2009 16:24:43 -0800 Subject: [Biopython-dev] Quality scores (and per-letter-annotation) in a SeqRecord? In-Reply-To: <20090220231904.GE18294@sobchak.mgh.harvard.edu> References: <200902201249.36743.jblanca@btc.upv.es> <20090220231904.GE18294@sobchak.mgh.harvard.edu> Message-ID: <1235175883.22598.62.camel@lafa> Hi all, I am sort of living in this world right now, doing a lot of metagenomics, so here are my $0.02. 
I agree with Leighton (assuming I understand him): We should consider the possible applications people will run using the quality data when designing the [...]

1) From what I have seen, the most common use for quality scores is trimming the sequences, i.e. removing the lesser-quality sequence data (usually on the edges) from the 5' and 3' ends of the read. So any data structure should take into consideration that we will probably have a .trim(self, threshold) method or a trim(seq, threshold) function that returns a slice of the sequence.

2) There is a certain need for optimization. Quality scores usually accompany high-throughput data, which today can mean around 3 Gbp per run. I am not sure where this is going exactly, but maybe with the advent of high-throughput, short-read-based genomics we should think about a slim SeqRecord to speed up short-read processing. Or simply write some stuff wrapped around C.

./I

-- Iddo Friedberg, Ph.D. CALIT2 Atkinson Hall MC #0446 University of California San Diego 9500 Gilman Drive La Jolla, CA 92093-0446 USA +1 (858) 534-0570 http://iddo-friedberg.org

From biopython at maubp.freeserve.co.uk Sat Feb 21 13:50:15 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sat, 21 Feb 2009 18:50:15 +0000 Subject: [Biopython-dev] Quality scores (and per-letter-annotation) in a SeqRecord? In-Reply-To: <20090220231904.GE18294@sobchak.mgh.harvard.edu> References: <200902201249.36743.jblanca@btc.upv.es> <20090220231904.GE18294@sobchak.mgh.harvard.edu> Message-ID: <320fb6e00902211050r7a57bceap9ba216924785b9b0@mail.gmail.com>

On Fri, Feb 20, 2009 at 11:19 PM, Brad Chapman wrote:
> Hi all;
> Good points on this debate so far. What do you all think about a
> hybrid approach where the .quality attribute is a dictionary? The
> keys would be the quality type ("phred", "solexa"...) and the values
> would be a list or string the same length as the sequence.

I was actually thinking about adding a per_letter_annotations (or using Brad's suggested name per_symbol_annotations) dictionary which could hold phred qualities, solexa qualities, secondary structure, atomic coordinates - any python sequence (e.g.
string, list or tuple) with a length matching the sequence. This would cover all the use cases I have come up with, and we can implement SeqRecord slicing which would also slice everything in the per_letter_annotations dictionary. Note that the per_letter_annotations dictionary could actually be a simple subclass of the python dictionary that only allows you to add elements with the appropriate length - this would prevent simple abuses/accidental errors. > For slicing, all of the quality dictionary values would be sliced > identically to the sequence itself. For BioSQL storage the quality > items would go in as annotations with names as a concatenation > of the attribute and type ("quality_phred"). > > Treating these specially on the BioSQL in/out is a little hack-y, > but quality is likely important enough to not bury it. If you are trying to store a sequence-with-quality in BioSQL, then yes using the existing annotation tables could work - the ontology term can tell us its a per-letter-annotation rather than a generic annotation. The only catch is the current tables only let us store strings. We could store each per-letter-annotation entry (e.g. a single quality score) as a separate table entry (where the rank tells us the correct order), but bundling them all into a single long table row might be more efficient. In the case of PHRED or Solexa scores, we could even use the FASTQ encoding (but a string "10, 20, 50, ..." might be more sensible). This would require some co-ordination with the other Bio* projects, probably on the BioSQL mailing list. On the other hand, I don't expect anyone to try and store GB of sequence+quality data in BioSQL. For this a custom database design would be much more efficient (or at least some custom tables). Here as Iddo points out, the SeqRecord object may be overkill. 
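The "simple subclass of the python dictionary that only allows you to add elements with the appropriate length" mentioned earlier in this message might be sketched as follows. This is an illustration in Python 3 only: the class name LengthCheckedDict is hypothetical, the sequence test is deliberately rough, and nothing here is taken from Biopython's code.

```python
class LengthCheckedDict(dict):
    """Dictionary that only accepts values of one fixed length.

    Hypothetical sketch of the idea in the thread, not Biopython code.
    """

    def __init__(self, length):
        dict.__init__(self)
        self._length = int(length)

    def __setitem__(self, key, value):
        # Rough test for "a python sequence": it has a length and supports
        # indexing. (This would also let a plain dict through.)
        if not (hasattr(value, "__len__") and hasattr(value, "__getitem__")):
            raise TypeError("per-letter annotation must be a sequence")
        if len(value) != self._length:
            raise ValueError(
                "expected length %i, got %i" % (self._length, len(value))
            )
        dict.__setitem__(self, key, value)

    def update(self, *args, **kwargs):
        # dict.update() does not call __setitem__, so route each item
        # through the length check explicitly. setdefault() is not covered.
        for key, value in dict(*args, **kwargs).items():
            self[key] = value


annotations = LengthCheckedDict(4)
annotations["phred_quality"] = [40, 30, 20, 10]
annotations.update({"secondary_structure": "HHEE"})
```

Assigning a three-letter string or a bare integer to this dictionary raises ValueError or TypeError respectively, which is the kind of accidental error the length check is meant to catch.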
> For Leighton's idea of generalization you could either:
>
> - Derive a heavy-weight SeqRecord class from the base class that
> added a several additional per-symbol cases.
>
> - Provide a generic per_symbol_annotations attribute that collected
> these as a dictionary of dictionaries:
>
> dict(quality = dict(phred = [20, 30]),
> hydrophobicity = dict(some_predictor = ['some', 'scores'])
> )
>
> These could map to generic attributes in the same way and follow the
> same slicing rules. After writing this up, I think the second idea
> is better and probably exactly what Leighton was proposing.

I'm not sure if it's exactly what Leighton has in mind, but it seems more complicated to have to do my_record.per_symbol_annotations["quality"]["phred"] rather than just my_record.per_symbol_annotations["quality_phred"]. I don't see much benefit to the extra level of nesting - after all you'll typically only have one type of quality present.

Peter

From biopython at maubp.freeserve.co.uk Sat Feb 21 14:03:14 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sat, 21 Feb 2009 19:03:14 +0000 Subject: [Biopython-dev] Quality scores (and per-letter-annotation) in a SeqRecord? In-Reply-To: <1235175883.22598.62.camel@lafa> References: <200902201249.36743.jblanca@btc.upv.es> <20090220231904.GE18294@sobchak.mgh.harvard.edu> <1235175883.22598.62.camel@lafa> Message-ID: <320fb6e00902211103n175fefc7w71a6922ee0cd0f26@mail.gmail.com>

On Sat, Feb 21, 2009 at 12:24 AM, Iddo Friedberg wrote:
>
> Hi all,
>
> I am sort of living in this world right now, doing a lot of
> metagenomics, so here are my $0.02. I agree with Leighton (assuming I
> understand him): We should consider the possible applications people
> will run using the quality data when designing the [parser?]

Sure. By having the FASTQ and QUAL files integrated into Bio.SeqIO (using SeqRecord objects) one simple use case is supported - interconverting these files into other formats (e.g. FASTQ to FASTA, or with a little more effort FASTA+QUAL to FASTQ). Your trimming example is another good use case - which could be done with the SeqRecord representation.

For anything more complicated (like assembly or mapping onto a genome), with massive datasets the modest overhead of the SeqRecord and Seq objects could be an issue - but isn't this sort of thing usually best handled by an external tool (written in C or C++ by a specialist)?

Anyway - if you have a look at the first attachment on Bug 2767, I did the core of the FASTQ parser as a generic function returning a tuple of strings (the record title, sequence and the encoded quality string - see FastqGeneralIterator). While this could be just a private function, I was thinking this could actually be very helpful for anyone trying to do something where speed or memory usage was important. On top of this core parser, I had a FastqPhredIterator (and would similarly have a FastqSolexaIterator) function which turns these into SeqRecord objects for use via the Bio.SeqIO API.

i.e. We can offer both the standard Bio.SeqIO interface using SeqRecords, and a simpler string-based parser for those that need it.

Peter

From chapmanb at 50mail.com Sun Feb 22 16:27:42 2009 From: chapmanb at 50mail.com (Brad Chapman) Date: Sun, 22 Feb 2009 16:27:42 -0500 Subject: [Biopython-dev] Quality scores (and per-letter-annotation) in a SeqRecord? In-Reply-To: <320fb6e00902211050r7a57bceap9ba216924785b9b0@mail.gmail.com> References: <200902201249.36743.jblanca@btc.upv.es> <20090220231904.GE18294@sobchak.mgh.harvard.edu> <320fb6e00902211050r7a57bceap9ba216924785b9b0@mail.gmail.com> Message-ID: <20090222212742.GA58314@kunkel>

Hi all;

> I was actually thinking about adding a per_letter_annotations (or
> using Brad's suggested name per_symbol_annotations) dictionary which
> could hold phred qualities, solexa qualities, secondary structure,
> atomic coordinates - any python sequence (e.g.
string, list or tuple) > with a length matching the sequence. This would cover all the use > cases I have come up with, and we can implement SeqRecord slicing > which would also slice everything in the per_letter_annotations > dictionary. [...] > I'm not sure if its exactly what Leighton has in mind, but it seems > more complicated to have to do > my_record.per_symbol_annotations["quality"]["phred"] rather than just > my_record.per_symbol_annotations["quality_phred"]. I'm agreed with you here -- the double dictionary I proposed is ugly and doesn't do much of anything extra. I'm +1 on exactly what you wrote here, and am not picky about the naming. > The only catch is the current tables only let us store > strings. We could store each per-letter-annotation entry (e.g. a > single quality score) as a separate table entry (where the rank tells > us the correct order), but bundling them all into a single long table > row might be more efficient. In the case of PHRED or Solexa scores, > we could even use the FASTQ encoding (but a string "10, 20, 50, ..." > might be more sensible). This would require some co-ordination with > the other Bio* projects, probably on the BioSQL mailing list. My vote is for bundling them together into a single row table using json to stringify the lists. It's a nice compact representation and will be well supported in any language. Python 2.6 has the simplejson library bundled, so it's just a matter of doing: jsonified_list = json.dumps(the_quality_list) the_quality_list = json.loads(jsonified_list) Since I've been doing more Javascript and Python, I appreciate not munging lists into strings with obscure separators and really like json. As a bonus, it looks just like Python. Brad From lpritc at scri.ac.uk Mon Feb 23 04:48:07 2009 From: lpritc at scri.ac.uk (Leighton Pritchard) Date: Mon, 23 Feb 2009 09:48:07 +0000 Subject: [Biopython-dev] Quality scores (and per-letter-annotation) in a SeqRecord? 
In-Reply-To: <20090222212742.GA58314@kunkel> Message-ID:

Hi all,

On 22/02/2009 21:27, "Brad Chapman" wrote:
> [...]
>> I'm not sure if its exactly what Leighton has in mind, but it seems
>> more complicated to have to do
>> my_record.per_symbol_annotations["quality"]["phred"] rather than just
>> my_record.per_symbol_annotations["quality_phred"].
>
> I'm agreed with you here -- the double dictionary I proposed is ugly
> and doesn't do much of anything extra. I'm +1 on exactly what you wrote
> here, and am not picky about the naming.

I was originally suggesting two extremes, a lightweight dictionary and a more heavyweight new class. I now prefer the lightweight option, which I imagine might operate along the lines of (keeping away from quality scores, for now...)

>>> my_seqrecord
SeqRecord(seq=Seq('FCLEPPYWYKNPGARTESRILRGGIID', Alphabet()), id='my_seqrecord', name='', description='<unknown description>', dbxrefs=[])
>>> my_seqrecord.per_symbol_annotations['secondary_structure']
'HHHHHHEEEEEEE     EEEEEEEEE'
>>> my_seqrecord.per_symbol_annotations['hydrophobicity']
[0.823, 0.880, 0.987, 0.461, 0.706, 0.972, 0.109, 0.499, 0.908, 0.045, 0.493, 0.162, 0.796, 0.989, 0.419, 0.501, 0.686, 0.985, 0.502, 0.242, 0.890, 0.436, 0.855, 0.426, 0.814, 0.178, 0.923]
>>> # Assuming that one day there's slicing of SeqRecords...
>>> shorter_seqrecord = my_seqrecord[:10]
>>> shorter_seqrecord.per_symbol_annotations['secondary_structure']
'HHHHHHEEEE'
>>> shorter_seqrecord.per_symbol_annotations['hydrophobicity']
[0.823, 0.880, 0.987, 0.461, 0.706, 0.972, 0.109, 0.499, 0.908, 0.045]

Which I guess could be enforced in slice-handling by having it loop over the values (if any) in my_seqrecord.per_symbol_annotations and propagate accordingly.

The more heavyweight idea involved a PerSymbolAnnotation (or somesuch name) class. I imagined this presenting a common API, but permitting the storage of annotation data in an arbitrary fashion so long as it could be returned as a Python sequence.
The class-based approach would make it possible to attach methods specific to that kind of annotation data, which may be useful - but probably not in the vast majority of cases. Also, any such operations could probably be handled external to the object by other functions, so long as they can get that Python sequence - which the more lightweight approach provides. Most people's attention here seems to be focused on sequence quality data, with a skew towards high-throughput sequencing, and the lightweight approach is the one that definitely makes most sense to me, there. >> The only catch is the current tables only let us store >> strings. We could store each per-letter-annotation entry (e.g. a >> single quality score) as a separate table entry (where the rank tells >> us the correct order), but bundling them all into a single long table >> row might be more efficient. In the case of PHRED or Solexa scores, >> we could even use the FASTQ encoding (but a string "10, 20, 50, ..." >> might be more sensible). This would require some co-ordination with >> the other Bio* projects, probably on the BioSQL mailing list. > > My vote is for bundling them together into a single row table using > json to stringify the lists. It's a nice compact representation and > will be well supported in any language. Python 2.6 has the > simplejson library bundled, so it's just a matter of doing: > > jsonified_list = json.dumps(the_quality_list) > the_quality_list = json.loads(jsonified_list) > > Since I've been doing more Javascript and Python, I appreciate not > munging lists into strings with obscure separators and really like > json. As a bonus, it looks just like Python. I don't like the idea of storing each per-symbol annotation (i.e. single score/annotation) in its own row, either. 
I think that we all realise that approach could rapidly become hugely inefficient ;) I can see that pulling out individual symbol annotations might be desirable when people want slices of the annotation in units smaller than a single seqfeature or bioentry (in BioSQL terms). In those cases, on grounds of efficiency, I think it possibly makes more sense to grab either the seqfeature or bioentry (since the per-symbol annotations would always be associated with such an object) as a SeqRecord and slice out the data, rather than to query a table with what would likely be (at least eventually) millions of rows of per-symbol annotations. That possibly means adding slicing to SeqRecords though, which brings its own problems... ;) Storage of per-symbol annotation as Python sequence information in a single db row, in a human-readable plain-text format that's readily-parsable when querying the database with Biopython looks like a winning approach to me. I'd not come across json before - it does remind me of nested Python dictionaries. It looks simple to use and parse, and reverse-engineerable if necessary. If it's robust to the kind of data we want to store, and a de facto or actual standard usable transparently across all Bio* projects, then it sounds like a good candidate, to me. L. -- Dr Leighton Pritchard MRSC D131, Plant Pathology Programme, SCRI Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA e:lpritc at scri.ac.uk w:http://www.scri.ac.uk/staff/leightonpritchard gpg/pgp: 0xFEFC205C tel:+44(0)1382 562731 x2405 ______________________________________________________________________ SCRI, Invergowrie, Dundee, DD2 5DA. The Scottish Crop Research Institute is a charitable company limited by guarantee. Registered in Scotland No: SC 29367. Recognised by the Inland Revenue as a Scottish Charity No: SC 006662. 
______________________________________________________________________ From biopython at maubp.freeserve.co.uk Mon Feb 23 05:42:13 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 23 Feb 2009 10:42:13 +0000 Subject: [Biopython-dev] Quality scores (and per-letter-annotation) in a SeqRecord? In-Reply-To: References: <20090222212742.GA58314@kunkel> Message-ID: <320fb6e00902230242k2ff44a37h4c0a303c9847c8ca@mail.gmail.com> On Mon, Feb 23, 2009 at 9:48 AM, Leighton Pritchard wrote: > Hi all, > > On 22/02/2009 21:27, "Brad Chapman" wrote: > >> [...] >>> I'm not sure if its exactly what Leighton has in mind, but it seems >>> more complicated to have to do >>> my_record.per_symbol_annotations["quality"]["phred"] rather than just >>> my_record.per_symbol_annotations["quality_phred"]. >> >> I'm agreed with you here -- the double dictionary I proposed is ugly >> and doesn't do much of anything extra. I'm +1 on exactly what you wrote >> here, and am not picky about the naming.
>
> I was originally suggesting two extremes, a lightweight dictionary and a
> more heavyweight new class. I now prefer the lightweight option, which I
> imagine might operate along the lines of (keeping away from quality scores,
> for now...)
>
>>>> my_seqrecord
> SeqRecord(seq=Seq('FCLEPPYWYKNPGARTESRILRGGIID', Alphabet()),
> id='my_seqrecord', name='', description='<unknown description>', dbxrefs=[])
>>>> my_seqrecord.per_symbol_annotations['secondary_structure']
> 'HHHHHHEEEEEEE     EEEEEEEEE'
>>>> my_seqrecord.per_symbol_annotations['hydrophobicity']
> [0.823, 0.880, 0.987, 0.461, 0.706, 0.972, 0.109, 0.499, 0.908, 0.045,
> 0.493, 0.162, 0.796, 0.989, 0.419, 0.501, 0.686, 0.985, 0.502, 0.242, 0.890,
> 0.436, 0.855, 0.426, 0.814, 0.178, 0.923]
>>>> # Assuming that one day there's slicing of SeqRecords...
>>>> shorter_seqrecord = my_seqrecord[:10]
>>>> shorter_seqrecord.per_symbol_annotations['secondary_structure']
> 'HHHHHHEEEE'
>>>> shorter_seqrecord.per_symbol_annotations['hydrophobicity']
> [0.823, 0.880, 0.987, 0.461, 0.706, 0.972, 0.109, 0.499, 0.908, 0.045]
>
> Which I guess could be enforced in slice-handling by having it loop over the
> values (if any) in my_seqrecord.per_symbol_annotations and propagate
> accordingly.

This sounds like a possible consensus :)

In terms of names, we have per_symbol_annotations and per_letter_annotations (to match the existing annotations dictionary), which are long but explicit. We could also have letter_annotations, symbol_annotations (shorter but more ambiguous), or even pas or pla (too short?).

For the implementation, we could start with a simple dictionary and see if any kind of safety feature should be added later if it seems necessary. What I had in mind was a dict subclass which takes the sequence length, and by overriding the __setitem__ method checks that only Python sequences (objects with __len__ and __getitem__) of the appropriate length can be added.
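A minimal sketch of the length-checked dictionary described above might look like the following. The class name `LengthCheckedDict` and the error messages are made up for illustration; nothing here is existing Biopython code, and it only shows the `__setitem__` check being discussed.

```python
class LengthCheckedDict(dict):
    """Dict that only accepts sequence-like values of one fixed length.

    Hypothetical name and behaviour, for illustration only.
    """

    def __init__(self, length):
        dict.__init__(self)
        self._length = int(length)

    def __setitem__(self, key, value):
        # Require the sequence protocol mentioned in the thread:
        # the value must offer both __len__ and __getitem__.
        if not (hasattr(value, "__len__") and hasattr(value, "__getitem__")):
            raise TypeError("value must be a Python sequence")
        if len(value) != self._length:
            raise ValueError(
                "expected length %d, got %d" % (self._length, len(value))
            )
        dict.__setitem__(self, key, value)

    def update(self, *args, **kwargs):
        # dict.update does not call __setitem__, so route each item
        # through the check explicitly.
        for key, value in dict(*args, **kwargs).items():
            self[key] = value


annotations = LengthCheckedDict(5)
annotations["phred_quality"] = [20, 30, 30, 25, 10]  # accepted
annotations["secondary_structure"] = "HHEEE"         # accepted

try:
    annotations["too_short"] = [1, 2, 3]             # wrong length
except ValueError as err:
    rejected = str(err)
```

Other dict methods that store values, such as setdefault, would need the same treatment in a real implementation; the sketch leaves them alone to stay short.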
This would add a small overhead when creating the annotated SeqRecord, and wouldn't stop abuses like my_seqrecord.per_symbol_annotations['secondary_structure'].append("X"), but would make it harder to accidentally end up with an inconsistent sequence and per-letter annotation.

> The more heavyweight idea involved a PerSymbolAnnotation (or somesuch name)
> class. I imagined this presenting a common API, but permitting the storage
> of annotation data in an arbitrary fashion so long as it could be returned
> as a Python sequence. The class-based approach would make it possible to
> attach methods specific to that kind of annotation data, which may be useful
> - but probably not in the vast majority of cases. Also, any such operations
> could probably be handled external to the object by other functions, so long
> as they can get that Python sequence - which the more lightweight approach
> provides.

You could implement things like a SolexaQualityList and PhredQualityList with methods to inter-convert the scores and still use them within the per_letter_annotations approach described above. One of the nice things about this dictionary approach is that it would be very flexible - you could also store an N by 3 numpy array containing the x,y,z atomic coordinates of the C-alpha protein backbone for a protein of length N, or a list of residue objects from our PDB parser. Anything which is a Python sequence object (so lists, strings, tuples for a start).

>> My vote is for bundling them together into a single row table using
>> json to stringify the lists. It's a nice compact representation and
>> will be well supported in any language. Python 2.6 has the
>> simplejson library bundled, so it's just a matter of doing:
>>
>> jsonified_list = json.dumps(the_quality_list)
>> the_quality_list = json.loads(jsonified_list)
>>
>> Since I've been doing more Javascript and Python, I appreciate not
>> munging lists into strings with obscure separators and really like
>> json. As a bonus, it looks just like Python.
>
> I don't like the idea of storing each per-symbol annotation (i.e. single
> score/annotation) in its own row, either. I think that we all realise that
> approach could rapidly become hugely inefficient ;) ...

For recording complex objects in a BioSQL database, using json sounds like a simple cross-language solution. We should take this sub-topic over to the BioSQL mailing list. In terms of Biopython, we'd need to be able to support old versions of Python. For simple cases like lists of integers, or lists of floats, this is probably very straightforward - but if we need full json support it's a bit more tricky. We'd want to use the BioSQL term/ontology features to indicate the value is json encoded somehow.

Peter

From andrea at biodec.com Mon Feb 23 07:22:51 2009 From: andrea at biodec.com (Andrea) Date: Mon, 23 Feb 2009 13:22:51 +0100 Subject: [Biopython-dev] DeprecationWorning SProt.py Message-ID: <49A2951B.9060706@biodec.com>

Good morning, my name is Andrea Zauli. Using the latest version of Biopython (1.49) I received this DeprecationWarning:

/usr/lib/python2.5/site-packages/biopython-1.49-py2.5-linux-x86_64.egg/Bio/SwissProt/SProt.py:147: DeprecationWarning: Bio.SwissProt.SProt.Iterator is deprecated. Please use the function Bio.SwissProt.parse instead if you want to get a SwissProt.SProt.Record, or Bio.SeqIO.parse if you want to get a SeqRecord. If these solutions do not work for you, please get in contact with the Biopython developers (biopython-dev at biopython.org). DeprecationWarning)

But I still need to use it. Let me explain my problem. I noticed that the SeqRecord parser SProt.SequenceParser (and the newer Bio.SeqIO.parse) is not able to parse UniProt features (and generate SeqFeature objects). I also noticed that SProt.RecordParser is able to parse UniProt features (it generates a list of tuples for the parsed features).
So to generate a "featured SeqRecord" I need to parse each UniProt record with both parsers (SProt.SequenceParser and SProt.RecordParser) and then transform each feature tuple into a SeqFeature instance.

To manage this I am currently using SProt.Iterator, which works with a file handle and, like a generator, returns each unparsed UniProt record. I can then pass each unparsed record to both SProt.SequenceParser and SProt.RecordParser for parsing. That way I am ALSO SURE that the record I'm parsing is the SAME.

If this method is deprecated, I'd be forced to use Bio.SeqIO.parse and Bio.SwissProt.parse, but each has to act on its own handle (so I have to open 2 file handles), and I'm not sure (OK, I would be reasonably sure) that I'm working on exactly the same record at each ".next()" call.

I could work in a different way if the feature parser (the part of SProt.RecordParser that parses the features) could be transferred to SProt.SequenceParser (or Bio.SeqIO.parse).

So at present I need to work with:
- SProt.SequenceParser and SProt.RecordParser, because they have the "parse_str" method.
- SProt.Iterator, because it produces the string object representing a UniProt record to parse (which I can easily pass to the ".parse_str" method).

I could stop working with SProt.SequenceParser and SProt.RecordParser if Bio.SeqIO and Bio.SwissProt had a ".parse_str" method. I could stop working with SProt.Iterator if there were some alternative.

Thanks in advance. Any help is appreciated.

Best regards,
Dr.
Andrea Zauli From biopython at maubp.freeserve.co.uk Mon Feb 23 08:16:56 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 23 Feb 2009 13:16:56 +0000 Subject: [Biopython-dev] DeprecationWorning SProt.py In-Reply-To: <49A2951B.9060706@biodec.com> References: <49A2951B.9060706@biodec.com> Message-ID: <320fb6e00902230516p7781bf73n9e2dbdfef43801df@mail.gmail.com> On Mon, Feb 23, 2009 at 12:22 PM, Andrea wrote: > Goodmorning, > my name is Andrea Zauli. > using the last version of biopyhthon (1.49) i received this > DeprecationWarning: > /usr/lib/python2.5/site-packages/biopython-1.49-py2.5-linux-x86_64.egg/Bio/SwissProt/SProt.py:147: > DeprecationWarning: > Bio.SwissProt.SProt.Iterator is deprecated. > Please use the function Bio.SwissProt.parse instead if you want to get a > SwissProt.SProt.Record, or Bio.SeqIO.parse if you want to get a SeqRecord. > If these solutions do not work for you, please get in contact with the Biopython > developers (biopython-dev at biopython.org). DeprecationWarning) > But i still need to use it. > > I'm going to explain my problem. > I noticed that the seq record parser SProt.SequenceParser (or the newest > Bio.SeqIO.parse) aren't able to parse uniprot Feature (and generate > SeqFeature Objects). I noticed also that SProt.RecordParser is able to > parse uniprot Feature (and it generates a list of tuple for the parsed features). The real solution is for us to finish fixing Bug 2235 so that the parsing SwissProt files as SeqRecord objects includes SeqFeature objects. I need to update the patch on that bug to record the SeqFeature object's qualifiers more like the GenBank parser. I don't personally use SwissProt files much, so If you are willing to help test these changes, I'd be a lot happier about committing this. 
http://bugzilla.open-bio.org/show_bug.cgi?id=2235 > So to generate a "featured SeqRecord" i need to parse > each uniprot "record" with both (SProt.SequenceParser, SProt.RecordParser) > and than transform easily each Feature tuple into a SeqFeature instance . That sounds ugly, but I guess it worked. > If this method is deprecated, i'd be forced to use Bio.SeqIO.parse and > Bio.SwissProt.parse, but each have to act on their own handle (so i've to > open 2 file handles)..... and i'm not sure (ok i would be reasonably sure) > that i'm working exactly on the same "record" every each "".next()"" . You should be able to use two separate handles for the two parsers, and they should iterate over the records correctly. Perhaps add an assert using the record identifier to make sure the records really are in sync. Peter From jblanca at btc.upv.es Mon Feb 23 08:25:21 2009 From: jblanca at btc.upv.es (Jose Blanca) Date: Mon, 23 Feb 2009 14:25:21 +0100 Subject: [Biopython-dev] Quality scores (and per-letter-annotation) in a SeqRecord? In-Reply-To: <320fb6e00902230242k2ff44a37h4c0a303c9847c8ca@mail.gmail.com> References: <20090222212742.GA58314@kunkel> <320fb6e00902230242k2ff44a37h4c0a303c9847c8ca@mail.gmail.com> Message-ID: <200902231425.21877.jblanca@btc.upv.es> > This sounds like a possible consensus :) Great > In terms of names, we've have per_symbol_annotations and > per_letter_annotations (to match the existing annotations dictionary), > which are long but explicit. We could also have letter_annotations, > symbol_annotations (shorter but more ambiguous), or even pas or pla > (too short?). I don't like pla or pas, their not clear, I would vote for letter_annotations. I think it's the clearest one. > For the implementation, we could start with a simple dictionary and > see if any kind of safety feature should be added later if is seems > necessary. 
What I had in mind was a dict subclass which takes the > sequence length, and by overriding the __setitem__ method checks only > python sequences (objects with __len__ and __getitem__) of the > appropriate length can be added. I'm not sure how to implement that. What would you think about creating a new class based on dict but with an extra property, parent? parent would be a reference to the SeqRecord. This new class would check the length of its parent before adding the letter_annotation. I'm just asking because I'm curious about the best way to implement it. Best regards, Jose Blanca From dalloliogm at gmail.com Mon Feb 23 08:31:00 2009 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Mon, 23 Feb 2009 14:31:00 +0100 Subject: [Biopython-dev] biopython on github In-Reply-To: <5aa3b3570902150729g367022a5p334b2c33f86461f@mail.gmail.com> References: <5aa3b3570902150729g367022a5p334b2c33f86461f@mail.gmail.com> Message-ID: <5aa3b3570902230531k6a0da3e0rdec28079971f1193@mail.gmail.com> On Sun, Feb 15, 2009 at 4:29 PM, Giovanni Marco Dall'Olio wrote: > Hi, > I have uploaded a git-converted branch of biopython on github, in case > you want to try it and see how it works. > > You can find it here: > - http://github.com/biopython/biopython/ Hi people, so, I am still testing biopython on git. The function to convert a cvs repository to git works well: I have just updated the branch on github to the latest cvs commit in open-bio, and it has correctly imported all the new commits without mixing them with the old ones. Now, if you look at http://github.com/biopython/biopython/network , you can see the results from all these experiments: the black line represent the code imported from cvs, and the other ones are experiments (well, don't care about the red one). For example, let's say you want to test the fix to the SwissProt parser commented by Andrea. 
You could create a new experimental branch, make it publicly accessible, and put all the changes there: only when you will consider it finish, you will merge it with the official one. The advantage of doing this is that two people or more are able to work on the same patch at the same time, and without having to touch the official code. > > > To work with it, the optimal protocol is: > > - create an account on github.com. Upload an ssh public key by > clicking on 'account' after having logged in. > It is not mandatory to use github, but it will help you understanding > how git works, and it allows other people to follow your branches and > your work. > > - go to the biopython repo: > http://github.com/biopython/biopython/tree/master > and you will see a button named 'Fork': click on it. > It will create a fork of the official biopython repository your > personal account. > Here the word 'fork' is not used in the common way it is, but just to > indicate that you are going to work on a modified version of the > official code, and it's not even a git command. > > > - now, install git on your computer, and execute the following commands: > $: git clone git at github.com:/biopython.git > $: git remote add official_dist git://github.com/biopython/biopython.git > > With the first command, you will download a copy of the repository on > your local computer, which will be the one you will modify > (technically, you are creating a new branch on your computer). > With the second command, you are adding a reference to the official > biopython repository, so in the future you will be able to easily > import the official code and compare it with yours. 
> > Here it is an explanation on these two commands: > http://github.com/guides/keeping-a-git-fork-in-sync-with-the-forked-repo > > > p.s.: to convert to git from cvs I have followed the instructions here: > - http://www.kernel.org/pub/software/scm/git/docs/v1.4.4.4/cvs-migration.html > This seems to be a good tutorial on git, too: > - http://www.kernel.org/pub/software/scm/git/docs/v1.4.4.4/tutorial.html > > > -- > > My blog on bioinformatics (now in English): http://bioinfoblog.it > -- My blog on bioinformatics (now in English): http://bioinfoblog.it From dalloliogm at gmail.com Mon Feb 23 08:50:49 2009 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Mon, 23 Feb 2009 14:50:49 +0100 Subject: [Biopython-dev] Quality scores (and per-letter-annotation) in a SeqRecord? In-Reply-To: <320fb6e00902211050r7a57bceap9ba216924785b9b0@mail.gmail.com> References: <200902201249.36743.jblanca@btc.upv.es> <20090220231904.GE18294@sobchak.mgh.harvard.edu> <320fb6e00902211050r7a57bceap9ba216924785b9b0@mail.gmail.com> Message-ID: <5aa3b3570902230550v12e505eeje3dcf38d9bed8d2b@mail.gmail.com> On Sat, Feb 21, 2009 at 7:50 PM, Peter wrote: > On Fri, Feb 20, 2009 at 11:19 PM, Brad Chapman wrote: >> Hi all; >> Good points on this debate so far. What do you all think about a >> hybrid approach where the .quality attribute is a dictionary? The >> keys would be the quality type ("phred", "solexa"...) and the values >> would be a list or string the same length as the sequence. > > I was actually thinking about adding a per_letter_annotations (or I suggest you to use github or any distribuited source versioning system to test the changes you are describing in this discussion. 
For example, I have created a branch on my github repository called 'qualityscores-experimental' (http://github.com/dalloliogm/biopython/tree/qualityscores-experimental) with a sample commit where I add a per_symbol_annotations attribute to SeqRecord: - http://github.com/dalloliogm/biopython/commit/7821d5f8cab1a5d7c4098c4b52f773b08a45969a I think that it is easier to discuss over this if you can show how the code would look like instead of only describing it. > using Brad's suggested name per_symbol_annotations) dictionary which > could hold phred qualities, solexa qualities, secondary structure, > atomic coordinates - any python sequence (e.g. string, list or tuple) > with a length matching the sequence. This would cover all the use > cases I have come up with, and we can implement SeqRecord slicing > which would also slice everything in the per_letter_annotations > dictionary. > > Note that the per_letter_annotations dictionary could actually be a > simple subclass of the python dictionary that only allows you to add > elements with the appropriate length - this would prevent simple > abuses/accidental errors. > >> For slicing, all of the quality dictionary values would be sliced >> identically to the sequence itself. For BioSQL storage the quality >> items would go in as annotations with names as a concatenation >> of the attribute and type ("quality_phred"). >> >> Treating these specially on the BioSQL in/out is a little hack-y, >> but quality is likely important enough to not bury it. > > If you are trying to store a sequence-with-quality in BioSQL, then yes > using the existing annotation tables could work - the ontology term > can tell us its a per-letter-annotation rather than a generic > annotation. The only catch is the current tables only let us store > strings. We could store each per-letter-annotation entry (e.g. 
a > single quality score) as a separate table entry (where the rank tells > us the correct order), but bundling them all into a single long table > row might be more efficient. In the case of PHRED or Solexa scores, > we could even use the FASTQ encoding (but a string "10, 20, 50, ..." > might be more sensible). This would require some co-ordination with > the other Bio* projects, probably on the BioSQL mailing list. > > On the other hand, I don't expect anyone to try and store GB of > sequence+quality data in BioSQL. For this a custom database design > would be much more efficient (or at least some custom tables). Here > as Iddo points out, the SeqRecord object may be overkill. > >> For Leighton's idea of generalization you could either: >> >> - Derive a heavy-weight SeqRecord class from the base class that >> added a several additional per-symbol cases. >> >> - Provide a generic per_symbol_annotations attribute that collected >> these as a dictionary of dictionaries: >> >> dict(quality = dict(phred = [20, 30]), >> hydrophobicity = dict(some_predictor = ['some', 'scores']) >> ) >> >> These could map to generic attributes in the same way and follow the >> same slicing rules. After writing this up, I think the second idea >> is better and probably exactly what Leighton was proposing. > > I'm not sure if its exactly what Leighton has in mind, but it seems > more complicated to have to do > my_record.per_symbol_annotations["quality"]["phred"] rather than just > my_record.per_symbol_annotations["quality_phred"]. I don't see much > benefit to the extra level of nesting - after all you'll typically > only have one type of quality present. 
> > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > -- My blog on bioinformatics (now in English): http://bioinfoblog.it From biopython at maubp.freeserve.co.uk Mon Feb 23 09:24:04 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 23 Feb 2009 14:24:04 +0000 Subject: [Biopython-dev] Quality scores (and per-letter-annotation) in a SeqRecord? In-Reply-To: <5aa3b3570902230550v12e505eeje3dcf38d9bed8d2b@mail.gmail.com> References: <200902201249.36743.jblanca@btc.upv.es> <20090220231904.GE18294@sobchak.mgh.harvard.edu> <320fb6e00902211050r7a57bceap9ba216924785b9b0@mail.gmail.com> <5aa3b3570902230550v12e505eeje3dcf38d9bed8d2b@mail.gmail.com> Message-ID: <320fb6e00902230624j65d90b63tb9b5c1063d03c923@mail.gmail.com> On Mon, Feb 23, 2009 at 1:50 PM, Giovanni Marco Dall'Olio wrote: > > I suggest you to use github or any distribuited source versioning > system to test the changes you are describing in this discussion. > > ... > > I think that it is easier to discuss over this if you can show how the > code would look like instead of only describing it. Or we can stick with the old fashioned approach of uploading patches to bugzilla. This proposal only requires additions to Bio/SeqRecord.py to define the new property, and won't change much existing code at all. I can see there are benefits to using a distributed source version system for more complicated patches touching lots of files, but it isn't needed here and (if you don't have git installed) using github might it actually make it harder for people to try the code on their local machine. 
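To give a feel for how small such an addition could be, here is a rough sketch of an attribute whose contents are sliced together with the sequence. `MiniRecord` is a hypothetical stand-in written for this example, not the real Bio.SeqRecord.SeqRecord, and `letter_annotations` is just one of the attribute names floated in this thread.

```python
class MiniRecord(object):
    """Hypothetical stand-in for SeqRecord; not the Biopython class."""

    def __init__(self, seq, id="<unknown id>", letter_annotations=None):
        self.seq = seq
        self.id = id
        self.letter_annotations = {}
        for key, value in (letter_annotations or {}).items():
            # Every per-letter annotation must match the sequence length.
            if len(value) != len(seq):
                raise ValueError(
                    "annotation %r has length %d, sequence has length %d"
                    % (key, len(value), len(seq))
                )
            self.letter_annotations[key] = value

    def __len__(self):
        return len(self.seq)

    def __getitem__(self, index):
        if not isinstance(index, slice):
            # A single position just returns that letter.
            return self.seq[index]
        # Apply the same slice to the sequence and to every annotation,
        # so the two cannot drift out of step.
        sliced = dict(
            (key, value[index])
            for key, value in self.letter_annotations.items()
        )
        return MiniRecord(self.seq[index], id=self.id,
                          letter_annotations=sliced)


record = MiniRecord(
    "FCLEPPYWYK",
    id="example",
    letter_annotations={
        "secondary_structure": "HHHHHHEEEE",
        "phred_quality": [20, 30, 30, 25, 10, 40, 40, 38, 12, 9],
    },
)
trimmed = record[2:6]
```

After the slice, `trimmed.seq` and each entry of `trimmed.letter_annotations` all have length 4, which is the behaviour proposed earlier in the thread for trimming reads by quality.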
Peter From p.j.a.cock at googlemail.com Mon Feb 23 10:31:35 2009 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 23 Feb 2009 15:31:35 +0000 Subject: [Biopython-dev] biopython on github In-Reply-To: <5aa3b3570902230531k6a0da3e0rdec28079971f1193@mail.gmail.com> References: <5aa3b3570902150729g367022a5p334b2c33f86461f@mail.gmail.com> <5aa3b3570902230531k6a0da3e0rdec28079971f1193@mail.gmail.com> Message-ID: <320fb6e00902230731h6257376sb2d6772f72b6e03a@mail.gmail.com> On Mon, Feb 23, 2009 at 1:31 PM, Giovanni Marco Dall'Olio wrote: > On Sun, Feb 15, 2009 at 4:29 PM, Giovanni Marco Dall'Olio > wrote: >> Hi, >> I have uploaded a git-converted branch of biopython on github, in case >> you want to try it and see how it works. >> >> You can find it here: >> - http://github.com/biopython/biopython/ > > Hi people, > so, I am still testing biopython on git. I should have said something two weeks ago, but I didn't actually realize you weren't just doing this with a branch under your own name. I think it is very misleading that you have created a git user called "biopython" and a branch called "biopython" with a description of "official biopython migration from cvs". I can see the value of having the official CVS server mirrored on github, but the way you have done this suggests this is an official project approved of by the biopython.org developers. What's more, if you get bored, I presume this branch on git hub won't get updated anymore and will just sit there - orphaned and out of date! > The function to convert a cvs repository to git works well: I have > just updated the branch on github to the latest cvs commit in > open-bio, and it has correctly imported all the new commits without > mixing them with the old ones. That sounds nice. 
> Now, if you look at http://github.com/biopython/biopython/network , > you can see the results from all these experiments: the black line > represent the code imported from cvs, and the other ones are > experiments (well, don't care about the red one). Does this work without Adobe flash? I don't have this on my Linux machine at home, and while I do have gnash it doesn't work on that many sites. Peter From eric.talevich at gmail.com Mon Feb 23 11:43:04 2009 From: eric.talevich at gmail.com (Eric Talevich) Date: Mon, 23 Feb 2009 11:43:04 -0500 Subject: [Biopython-dev] biopython on github In-Reply-To: <320fb6e00902230731h6257376sb2d6772f72b6e03a@mail.gmail.com> References: <5aa3b3570902150729g367022a5p334b2c33f86461f@mail.gmail.com> <5aa3b3570902230531k6a0da3e0rdec28079971f1193@mail.gmail.com> <320fb6e00902230731h6257376sb2d6772f72b6e03a@mail.gmail.com> Message-ID: <3f6baf360902230843u320e9fe9wc0a03928383d6cbb@mail.gmail.com> Hi folks, > The function to convert a cvs repository to git works well: I have > > just updated the branch on github to the latest cvs commit in > > open-bio, and it has correctly imported all the new commits without > > mixing them with the old ones. > > That sounds nice. > In support of Launchpad once again: Browsing the github docs, I don't see a way for this to be made automatic and continual through the site. (Of course, it's clearly against their financial interest to promote cvs/svn.) Launchpad appears to support it happily: https://help.launchpad.net/VcsImports I see biopython-test hasn't been set up this way yet. Should I try setting up a continuous mirror like this (under a name like biopython-cvs-test)? Or, would Bartek or Giovanni prefer to? > Now, if you look at http://github.com/biopython/biopython/network , > > you can see the results from all these experiments: the black line > > represent the code imported from cvs, and the other ones are > > experiments (well, don't care about the red one). 
> > Does this work without Adobe flash? I don't have this on my Linux > machine at home, and while I do have gnash it doesn't work on that > many sites. > It doesn't seem to work with gnash 0.8.4/amd64, but I think you could use gitk to get mostly the same information minus the snazzy site integration. Cheers, Eric From p.j.a.cock at googlemail.com Mon Feb 23 12:08:03 2009 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 23 Feb 2009 17:08:03 +0000 Subject: [Biopython-dev] biopython on github In-Reply-To: <3f6baf360902230843u320e9fe9wc0a03928383d6cbb@mail.gmail.com> References: <5aa3b3570902150729g367022a5p334b2c33f86461f@mail.gmail.com> <5aa3b3570902230531k6a0da3e0rdec28079971f1193@mail.gmail.com> <320fb6e00902230731h6257376sb2d6772f72b6e03a@mail.gmail.com> <3f6baf360902230843u320e9fe9wc0a03928383d6cbb@mail.gmail.com> Message-ID: <320fb6e00902230908j38f5755la85a55bfc461a763@mail.gmail.com> > In support of Launchpad once again: Browsing the github docs, I don't see a > way for this to be made automatic and continual through the site. (Of > course, it's clearly against their financial interest to promote cvs/svn.) Does anyone know if github can automatically keep in sync with ANY external repository? For example, suppose instead of CVS or SVN we actually ran a git server on biopython.org, would github be able to track it automatically? I really don't like the idea of relying on an external host - if github could mirror a repository on biopython.org that would seem much safer. > Launchpad appears to support it happily: > https://help.launchpad.net/VcsImports > > I see biopython-test hasn't been set up this way yet. Should I try setting > up a continuous mirror like this (under a name like biopython-cvs-test)? Or, > would Bartek or Giovanni prefer to? Given Bartek is one of the official Biopython developers, it might make more sense for him to try and setup a biopython-cvs-test tracker in launchpad if people want to try this. 
He may have done this already, as he seems to have several sub projects... I'm not sure and right now launchpad is being very slow (which does not impress me). See http://bazaar.launchpad.net/~bartek/biopython-test/trunk/files and links >> Does this [github] work without Adobe flash? ?I don't have this on my Linux >> machine at home, and while I do have gnash it doesn't work on that >> many sites. > > It doesn't seem to work with gnash 0.8.4/amd64, but I think you could use > gitk to get mostly the same information minus the snazzy site integration. That's a shame. At least gitk would work on any git repository, you wouldn't be tied into github. Peter From bartek at rezolwenta.eu.org Mon Feb 23 13:29:04 2009 From: bartek at rezolwenta.eu.org (Bartek Wilczynski) Date: Mon, 23 Feb 2009 19:29:04 +0100 Subject: [Biopython-dev] biopython on github In-Reply-To: <320fb6e00902230908j38f5755la85a55bfc461a763@mail.gmail.com> References: <5aa3b3570902150729g367022a5p334b2c33f86461f@mail.gmail.com> <5aa3b3570902230531k6a0da3e0rdec28079971f1193@mail.gmail.com> <320fb6e00902230731h6257376sb2d6772f72b6e03a@mail.gmail.com> <3f6baf360902230843u320e9fe9wc0a03928383d6cbb@mail.gmail.com> <320fb6e00902230908j38f5755la85a55bfc461a763@mail.gmail.com> Message-ID: <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com> >Does anyone know if github can automatically keep in sync with ANY >external repository? For example, suppose instead of CVS or SVN we >actually ran a git server on biopython.org, would github be able to >track it automatically? I really don't like the idea of relying on an >external host - if github could mirror a repository on biopython.org >that would seem much safer. I guess It's doable if we are allowed to setup cron jobs at open-bio. If we had a git branch at open-bio.org server, we could use git over ssh to push to the main branch and then set up a cron job which would push the main branch from open-bio to github, so that people can branch from it. 
The same thing is off course doable as well with bzr+launchpad. >> I see biopython-test hasn't been set up this way yet. Should I try setting >> up a continuous mirror like this (under a name like biopython-cvs-test)? Or, >> would Bartek or Giovanni prefer to? > > Given Bartek is one of the official Biopython developers, it might > make more sense for him to try and setup a biopython-cvs-test tracker > in launchpad if people want to try this. He may have done this > already, as he seems to have several sub projects... I'm not sure and > right now launchpad is being very slow (which does not impress me). > See http://bazaar.launchpad.net/~bartek/biopython-test/trunk/files and links > I've requesten launchpad to follow our cvs trunk. They should (after reviewing my request) put it into the location: https://code.edge.launchpad.net/~vcs-imports/biopython-test/trunk I'll post to the list if they get back to me. We'll see how it goes. >>> Does this [github] work without Adobe flash? I don't have this on my Linux >>> machine at home, and while I do have gnash it doesn't work on that >>> many sites. Github in iteslf does not depend on flash. In fact I don't think you need a browser at all to use it. Network visualization of your branch and its "relatives" is flash based, and thus not really accessible from some systems, but I don't think it's too important. cheers Bartek From biopython at maubp.freeserve.co.uk Mon Feb 23 13:34:18 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 23 Feb 2009 18:34:18 +0000 Subject: [Biopython-dev] Quality scores (and per-letter-annotation) in a SeqRecord? 
In-Reply-To: <200902231425.21877.jblanca@btc.upv.es> References: <20090222212742.GA58314@kunkel> <320fb6e00902230242k2ff44a37h4c0a303c9847c8ca@mail.gmail.com> <200902231425.21877.jblanca@btc.upv.es> Message-ID: <320fb6e00902231034q33fe4e6aofba4b238d67f020d@mail.gmail.com> Peter wrote: >> For the implementation, we could start with a simple dictionary and >> see if any kind of safety feature should be added later if it seems >> necessary. What I had in mind was a dict subclass which takes the >> sequence length, and by overriding the __setitem__ method checks only >> python sequences (objects with __len__ and __getitem__) of the >> appropriate length can be added. On Mon, Feb 23, 2009 at 1:25 PM, Jose Blanca wrote: > I'm not sure how to implement that. This is what I had in mind, though I haven't properly tested it yet:

class RestrictedDict(dict):
    """A dictionary which only allows sequences of given length as values."""
    def __init__(self, length) :
        """Create an EMPTY dictionary."""
        dict.__init__(self)
        self._length = int(length)
    def __setitem__(self, key, value) :
        if not hasattr(value,"__len__") or not hasattr(value,"__getitem__") \
        or len(value) != self._length :
            raise TypeError("We only allow python sequences (lists, tuples or strings) of length %i." % self._length)
        dict.__setitem__(self, key, value)

x = RestrictedDict(4)
x["test"] = "abcd"
x["test"] = ["a","b",5,None]
x["test"] = (1,2,3,4)
try :
    x["test"] = "abcde" #wrong length
    assert False
except TypeError :
    pass
try :
    x["test"] = 10 #not a sequence
    assert False
except TypeError :
    pass

> What would you think about creating a new > class based on dict but with an extra property, parent? parent would be a > reference to the SeqRecord. This new class would check the length of its > parent before adding the letter_annotation. I'm just asking because I'm > curious about the best way to implement it.
This could work, and would also mean the length of the sequence would get updated if the parent SeqRecord's seq property was changed. On the other hand, this kind of thing could cause trouble for automatic garbage collection (because of the circular references between the objects). This may not be a real problem, but it's something I would worry about. Peter From jblanca at btc.upv.es Tue Feb 24 05:24:07 2009 From: jblanca at btc.upv.es (Jose Blanca) Date: Tue, 24 Feb 2009 11:24:07 +0100 Subject: [Biopython-dev] Quality scores (and per-letter-annotation) in a SeqRecord? In-Reply-To: <320fb6e00902231034q33fe4e6aofba4b238d67f020d@mail.gmail.com> References: <20090222212742.GA58314@kunkel> <200902231425.21877.jblanca@btc.upv.es> <320fb6e00902231034q33fe4e6aofba4b238d67f020d@mail.gmail.com> Message-ID: <200902241124.07974.jblanca@btc.upv.es> On Monday 23 February 2009 19:34:18 Peter wrote:

> class RestrictedDict(dict):
>     """A dictionary which only allows sequences of given length as values."""
>     def __init__(self, length) :
>         """Create an EMPTY dictionary."""
>         dict.__init__(self)
>         self._length = int(length)
>     def __setitem__(self, key, value) :
>         if not hasattr(value,"__len__") or not hasattr(value,"__getitem__") \
>         or len(value) != self._length :
>             raise TypeError("We only allow python sequences (lists, tuples or strings) of length %i." % self._length)
>         dict.__setitem__(self, key, value)

An alternative implementation using weakref to link the RestrictedDict with the SeqRecord.
import weakref

class RestrictedDict(dict):
    """A dictionary which only allows sequences of the same length as the parent as values."""
    def __init__(self, parent):
        """Create an empty dictionary."""
        dict.__init__(self)
        # Weak reference, so the dictionary does not keep its parent alive
        self._parent = weakref.ref(parent)
    def __setitem__(self, key, value):
        attrs = dir(value)
        if "__len__" not in attrs or "__getitem__" not in attrs:
            raise TypeError("We only allow python sequences (lists, tuples or strings)")
        if len(value) != len(self._parent()):
            raise TypeError('Lengths do not match.')
        dict.__setitem__(self, key, value)

And in the SeqRecord __init__ we should add:

    #letter_annotations
    self.letter_annotations = RestrictedDict(self)

-- Jose M. Blanca Postigo Instituto Universitario de Conservacion y Mejora de la Agrodiversidad Valenciana (COMAV) Universidad Politecnica de Valencia (UPV) Edificio CPI (Ciudad Politecnica de la Innovacion), 8E 46022 Valencia (SPAIN) Tlf.:+34-96-3877000 (ext 88473) From dalloliogm at gmail.com Tue Feb 24 06:54:53 2009 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Tue, 24 Feb 2009 12:54:53 +0100 Subject: [Biopython-dev] biopython on github In-Reply-To: <320fb6e00902230731h6257376sb2d6772f72b6e03a@mail.gmail.com> References: <5aa3b3570902150729g367022a5p334b2c33f86461f@mail.gmail.com> <5aa3b3570902230531k6a0da3e0rdec28079971f1193@mail.gmail.com> <320fb6e00902230731h6257376sb2d6772f72b6e03a@mail.gmail.com> Message-ID: <5aa3b3570902240354j25ef5007g9ae750d70ed00993@mail.gmail.com> On Mon, Feb 23, 2009 at 4:31 PM, Peter Cock wrote: > On Mon, Feb 23, 2009 at 1:31 PM, Giovanni Marco Dall'Olio > wrote: >> On Sun, Feb 15, 2009 at 4:29 PM, Giovanni Marco Dall'Olio >> wrote: >>> Hi, >>> I have uploaded a git-converted branch of biopython on github, in case >>> you want to try it and see how it works. >>> >>> You can find it here: >>> - http://github.com/biopython/biopython/ >> >> Hi people, >> so, I am still testing biopython on git.
> > I should have said something two weeks ago, but I didn't actually > realize you weren't just doing this with a branch under your own name. > > I think it is very misleading that you have created a git user called > "biopython" and a branch called "biopython" with a description of > "official biopython migration from cvs". I can see the value of > having the official CVS server mirrored on github, but the way you > have done this suggests this is an official project approved of by the > biopython.org developers. Do not worry too much about that.. I also hadn't had too much time to refine it. I was going to send you the credentials of the biopython user, or to anyone wishing to have them, but I wanted to test the cvs update first. In any case, the term 'official' was just meant to indicate that all the other branches should be derived from that, as there are other biopython derivates on github already. > What's more, if you get bored, I presume > this branch on git hub won't get updated anymore and will just sit > there - orphaned and out of date! That is a matter of setting a cron job somewhere to automatically update the branch. However, I don't know if github can mirror a cvs repository, maybe not. But I just wanted to show you how a decentralized versioning system works and how it can be used in a more 'centralized' way, with an official repository - since this is what you were asking earlier. >> The function to convert a cvs repository to git works well: I have >> just updated the branch on github to the latest cvs commit in >> open-bio, and it has correctly imported all the new commits without >> mixing them with the old ones. > > That sounds nice. > >> Now, if you look at http://github.com/biopython/biopython/network , >> you can see the results from all these experiments: the black line >> represent the code imported from cvs, and the other ones are >> experiments (well, don't care about the red one). > > Does this work without Adobe flash? 
I don't have this on my Linux > machine at home, and while I do have gnash it doesn't work on that > many sites. It seems to not work with gnash... however, you can still see how many derived branches there are, which in launchpad is handled in a different way (since you were asking for the differences, again). > > Peter > -- My blog on bioinformatics (now in English): http://bioinfoblog.it From bugzilla-daemon at portal.open-bio.org Tue Feb 24 07:04:40 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 24 Feb 2009 07:04:40 -0500 Subject: [Biopython-dev] [Bug 2771] Bio.Entrez.read can't parse XML files from dbSNP (snp database) In-Reply-To: Message-ID: <200902241204.n1OC4erT008537@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2771 ------- Comment #4 from dalloliogm at gmail.com 2009-02-24 07:04 EST ------- (In reply to comment #3) > (In reply to comment #2) > > This may be an NCBI bug, ... > > According to this page there is/was a problem with the XML files returned for > the snp database by efetch, > http://eutils.ncbi.nlm.nih.gov/entrez/query/static/esoap_help.html > > >> Known issues > >> * ... > >> * eFetch utility generates an invalid XML for SNP, so currently it doesn't > >> work through SOAP. The bug is being fixed. > >> * ... > > Unfortunately I have no idea if that information is current or not. This could > been unrelated. Yeah, unfortunately the XML seems to be still invalid. I have tried to paste an XML result from Bio.Entrez to many XML validators, but they detect errors. I have also tried with a python module to interrogate SOAP services (suds) and it also return errors. > -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From biopython at maubp.freeserve.co.uk Tue Feb 24 07:46:34 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 24 Feb 2009 12:46:34 +0000 Subject: [Biopython-dev] Quality scores (and per-letter-annotation) in a SeqRecord? In-Reply-To: <200902241124.07974.jblanca@btc.upv.es> References: <20090222212742.GA58314@kunkel> <200902231425.21877.jblanca@btc.upv.es> <320fb6e00902231034q33fe4e6aofba4b238d67f020d@mail.gmail.com> <200902241124.07974.jblanca@btc.upv.es> Message-ID: <320fb6e00902240446j1b8c1ceerfb53cb6871479324@mail.gmail.com> On Tue, Feb 24, 2009 at 10:24 AM, Jose Blanca wrote: > > An alternative implementation using weakref to link the RestrictedDict > with the SeqRecord. > > ... > Your code seems a little more complicated, but should work too. It would mean that if the parent SeqRecord's seq property was altered, the per-letter-annotation dictionary would know the new length. This is better - but if someone did change the parent SeqRecord's seq, then perhaps we should also automatically clear the per-letter-annotation? We could do this by using a full property for the seq attribute, which would also allow us to clear any existing per-letter-annotation by replacing it with a new restricted dictionary using the new length.
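To make the property idea concrete, here is a minimal sketch only. It assumes a simplified stand-in class (SketchRecord is a hypothetical name for illustration, not the real Bio.SeqRecord.SeqRecord) and the length-based RestrictedDict discussed earlier in this thread. Silently clearing the annotation on assignment is just one of the possible behaviours.

```python
class RestrictedDict(dict):
    """Dictionary that only accepts sequence values of a fixed length."""

    def __init__(self, length):
        dict.__init__(self)
        self._length = int(length)

    def __setitem__(self, key, value):
        if (not hasattr(value, "__len__")
                or not hasattr(value, "__getitem__")
                or len(value) != self._length):
            raise TypeError("We only allow python sequences (lists, tuples "
                            "or strings) of length %i." % self._length)
        dict.__setitem__(self, key, value)


class SketchRecord(object):
    """Hypothetical stand-in for SeqRecord, for illustration only."""

    def __init__(self, seq):
        self._seq = seq
        self.letter_annotations = RestrictedDict(len(seq))

    def _get_seq(self):
        return self._seq

    def _set_seq(self, value):
        # Assigning a new sequence discards the old per-letter annotation
        # and starts a fresh restricted dictionary with the new length.
        self._seq = value
        self.letter_annotations = RestrictedDict(len(value))

    seq = property(_get_seq, _set_seq)
```

With this, assigning record.seq = "ACGTAC" on a record built from "ACGT" leaves an empty letter_annotations dictionary that only accepts values of length six.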
Peter From p.j.a.cock at googlemail.com Tue Feb 24 07:59:45 2009 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 24 Feb 2009 12:59:45 +0000 Subject: [Biopython-dev] biopython on github In-Reply-To: <5aa3b3570902240354j25ef5007g9ae750d70ed00993@mail.gmail.com> References: <5aa3b3570902150729g367022a5p334b2c33f86461f@mail.gmail.com> <5aa3b3570902230531k6a0da3e0rdec28079971f1193@mail.gmail.com> <320fb6e00902230731h6257376sb2d6772f72b6e03a@mail.gmail.com> <5aa3b3570902240354j25ef5007g9ae750d70ed00993@mail.gmail.com> Message-ID: <320fb6e00902240459i58ae1ad7w761c079a86fa389@mail.gmail.com> On Tue, Feb 24, 2009 at 11:54 AM, Giovanni Marco Dall'Olio wrote: > On Mon, Feb 23, 2009 at 4:31 PM, Peter Cock wrote: >> >> I think it is very misleading that you have created a git user called >> "biopython" and a branch called "biopython" with a description of >> "official biopython migration from cvs". ?I can see the value of >> having the official CVS server mirrored on github, but the way you >> have done this suggests this is an official project approved of by the >> biopython.org developers. > > Do not worry too much about that.. I also hadn't had too much time to > refine it. I was going to send you the credentials of the biopython user, > or to anyone wishing to have them, but I wanted to test the cvs update > first. > In any case, the term 'official' was just meant to indicate that all > the other branches should be derived from that, as there are other > biopython derivates on github already. The new description of "mirror of official biopython cvs on github" is much better - thanks. I would go further and call it "Unofficial test github mirror of Biopython CVS". If we do decide to use github (even just as a mirror to our own hosted repository), then yes giving the current Biopython admins control of the github "biopython" user would be a good idea. 
>> What's more, if you get bored, I presume >> this branch on git hub won't get updated anymore and will just sit >> there - orphaned and out of date! > > That is a matter of setting a cron job somewhere to automatically > update the branch. In the short term (as this is just an experiment for now), testing a daily cron job on your machine would be a good idea. In the long term (assuming we want an "official" github mirror), then doing it from the biopython.org repository server would be better. In theory this could be hooked into our main repository to push any trunk branch commits to github immediately. It would be much nicer if github could track an external repository on its own (like Bartek is hoping to get setup with Launchpad). Peter From lpritc at scri.ac.uk Tue Feb 24 08:04:09 2009 From: lpritc at scri.ac.uk (Leighton Pritchard) Date: Tue, 24 Feb 2009 13:04:09 +0000 Subject: [Biopython-dev] Quality scores (and per-letter-annotation) in a SeqRecord? In-Reply-To: <320fb6e00902240446j1b8c1ceerfb53cb6871479324@mail.gmail.com> Message-ID: On 24/02/2009 12:46, "Peter" wrote: > On Tue, Feb 24, 2009 at 10:24 AM, Jose Blanca wrote: >> >> An alternternative implementation using weakref to link the RestrictedDict >> with the SeqRecord. >> >> ... >> > > Your code seems a little more complicated, but should work too. It > would mean that if the parent SeqRecord's seq property was altered, > the per-letter-annotation dictionary would know the new length. This > is better - but if someone did change the parent SeqRecord's seq, then > perhaps we should also automatically clear the per-letter-annotation? > We could do this by using a full property for the seq attribute, which > would also us to clear any existing per-letter-annotation by replacing > it with a new restricted dictionary using the new length. I can think of two particular incompatible situations here: 1) I change the parent SeqRecord sequence, by slicing it to a region I'm interested in. 
I want to keep the per-symbol-annotation, but adjusted to the new sequence. 2) I change the parent SeqRecord sequence by adding some more symbols to it. I've just destroyed the association between the per-symbol-annotation and my sequence without even realising it. I'd prefer a warning that this is going to happen before it destroys my earlier work, so I can make the change in a duplicate SeqRecord object. I think it's worth considering which behaviours we would find desirable, and how to handle others. We'll all have different use cases, I imagine... L. -- Dr Leighton Pritchard MRSC D131, Plant Pathology Programme, SCRI Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA e:lpritc at scri.ac.uk w:http://www.scri.ac.uk/staff/leightonpritchard gpg/pgp: 0xFEFC205C tel:+44(0)1382 562731 x2405 ______________________________________________________________________ SCRI, Invergowrie, Dundee, DD2 5DA. The Scottish Crop Research Institute is a charitable company limited by guarantee. Registered in Scotland No: SC 29367. Recognised by the Inland Revenue as a Scottish Charity No: SC 006662. DISCLAIMER: This email is from the Scottish Crop Research Institute, but the views expressed by the sender are not necessarily the views of SCRI and its subsidiaries. This email and any files transmitted with it are confidential to the intended recipient at the e-mail address to which it has been addressed. It may not be disclosed or used by any other than that addressee. If you are not the intended recipient you are requested to preserve this confidentiality and you must not use, disclose, copy, print or rely on this e-mail in any way. Please notify postmaster at scri.ac.uk quoting the name of the sender and delete the email from your system. 
Although SCRI has taken reasonable precautions to ensure no viruses are present in this email, neither the Institute nor the sender accepts any responsibility for any viruses, and it is your responsibility to scan the email and the attachments (if any). ______________________________________________________________________ From bugzilla-daemon at portal.open-bio.org Tue Feb 24 08:29:31 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 24 Feb 2009 08:29:31 -0500 Subject: [Biopython-dev] [Bug 2768] Bio.Entrez under a proxy In-Reply-To: Message-ID: <200902241329.n1ODTVlX016168@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2768 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-24 08:29 EST ------- Fixed in Tutorial.tex CVS revision 1.201, see: http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Doc/Tutorial.tex?cvsroot=biopython -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Tue Feb 24 09:08:17 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 24 Feb 2009 14:08:17 +0000 Subject: [Biopython-dev] Quality scores (and per-letter-annotation) in a SeqRecord? In-Reply-To: References: <320fb6e00902240446j1b8c1ceerfb53cb6871479324@mail.gmail.com> Message-ID: <320fb6e00902240608r50191274m7dc9a996f13964d9@mail.gmail.com> On Tue, Feb 24, 2009 at 1:04 PM, Leighton Pritchard wrote: >> Your code seems a little more complicated, but should work too. ?It >> would mean that if the parent SeqRecord's seq property was altered, >> the per-letter-annotation dictionary would know the new length. 
?This >> is better - but if someone did change the parent SeqRecord's seq, then >> perhaps we should also automatically clear the per-letter-annotation? >> We could do this by using a full property for the seq attribute, which >> would also us to clear any existing per-letter-annotation by replacing >> it with a new restricted dictionary using the new length. > > I can think of two particular incompatible situations here: > > 1) I change the parent SeqRecord sequence, by slicing it to a region I'm > interested in. ?I want to keep the per-symbol-annotation, but adjusted to > the new sequence. If you did this by my_record[50:100] (assuming we implement the __getitem__ method, see Bug 2507), then my_record isn't changed - you'd get a new SeqRecord back for the partial sequence, with the appropriate per-symbol-annotation (by which I mean each per-symbol-annotation sequence would have been sliced using [50:100] to match, and a new dictionary created to hold these sub-sequences of the per-symbol-annotation). I'll try and upload a SeqRecord patch that does this shortly... > 2) I change the parent SeqRecord sequence by adding some more symbols to it. > I've just destroyed the association between the per-symbol-annotation and my > sequence without even realising it. ?I'd prefer a warning that this is going > to happen before it destroys my earlier work, so I can make the change in a > duplicate SeqRecord object. This situation could be caught by a set method for the SeqRecord seq property (not implemented yet). I was thinking this would silently throw away the old per-symbol-annotation, but this could instead raise an error (and make no changes), or issue a warning (but carry on). Good point. Peter From jblanca at btc.upv.es Tue Feb 24 09:26:02 2009 From: jblanca at btc.upv.es (Jose Blanca) Date: Tue, 24 Feb 2009 15:26:02 +0100 Subject: [Biopython-dev] Quality scores (and per-letter-annotation) in a SeqRecord? 
In-Reply-To: <320fb6e00902240608r50191274m7dc9a996f13964d9@mail.gmail.com> References: <320fb6e00902240446j1b8c1ceerfb53cb6871479324@mail.gmail.com> <320fb6e00902240608r50191274m7dc9a996f13964d9@mail.gmail.com> Message-ID: <200902241526.02209.jblanca@btc.upv.es> > Your code seems a little more complicated, but should work too. ?It > would mean that if the parent SeqRecord's seq property was altered, >the per-letter-annotation dictionary would know the new length. I did it that way to allow the creation of an empty SeqRecord and to modify the seq property after the creation. I don't know if that's a behaviour supported by biopython, but it can be done now. Your proposed seq property implementation could take care of that removing the possibility of setting seq after the creation. > > 2) I change the parent SeqRecord sequence by adding some more symbols to > > it. I've just destroyed the association between the per-symbol-annotation > > and my sequence without even realising it. ?I'd prefer a warning that > > this is going to happen before it destroys my earlier work, so I can make > > the change in a duplicate SeqRecord object. > > This situation could be caught by a set method for the SeqRecord seq > property (not implemented yet). I was thinking this would silently > throw away the old per-symbol-annotation, but this could instead raise > an error (and make no changes), or issue a warning (but carry on). > Good point. I would also prefer to raise an error in that case, because the user wouldn't be aware of the problem if the per-symbol-annotation is thown away without any warning. Regards, -- Jose M. 
Blanca Postigo Instituto Universitario de Conservacion y Mejora de la Agrodiversidad Valenciana (COMAV) Universidad Politecnica de Valencia (UPV) Edificio CPI (Ciudad Politecnica de la Innovacion), 8E 46022 Valencia (SPAIN) Tlf.:+34-96-3877000 (ext 88473) From bsouthey at gmail.com Tue Feb 24 09:31:44 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Tue, 24 Feb 2009 08:31:44 -0600 Subject: [Biopython-dev] FYI: Scipy and DVCS Message-ID: <49A404D0.3050106@gmail.com> Hi, In connection with our discussions, there is a long thread (already about 100 entries) started by a post from Stéfan van der Walt titled 'The future of SciPy and its development infrastructure' : http://thread.gmane.org/gmane.comp.python.scientific.devel/10065 " I'd like to propose two changes to the status quo: 1. Change to a distributed revision control system, encouraging more open collaboration. 2. Determine guidelines for code acceptance, in terms of unit tests, documentation and peer review. " No real conclusions yet, but there is concern about having a suitable bug tracker system as well. Regards Bruce From biopython at maubp.freeserve.co.uk Tue Feb 24 12:22:01 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 24 Feb 2009 17:22:01 +0000 Subject: [Biopython-dev] Converting between PHRED and Solexa quality scores (and FASTQ files) Message-ID: <320fb6e00902240922x1cf77a7amf387432d7f79e51b@mail.gmail.com> Hopefully this information will be of general interest - I could have just stuck it on the end of Bug 2767 but thought it more suited to the mailing list (or even a blog post?).
http://bugzilla.open-bio.org/show_bug.cgi?id=2767

Nice links on mapping between Solexa and PHRED scores,
http://maq.sourceforge.net/qual.shtml
http://maq.sourceforge.net/fastq.shtml (missing some brackets in the final formula at the time of writing, I've emailed them)

and:
http://illumina.ucr.edu/ht/documentation/file-formats
http://rcdev.umassmed.edu/pipeline/Alignment%20Scoring%20Guide%20and%20FAQ.html (note they are missing a minus sign in the definition of Q_solexa)

For good quality reads the two scores are almost equal - but they differ for poor quality reads (PHRED scores go to zero, but Solexa scores can be negative).

A standard FASTQ file (as used by Sanger) encodes the quality information using PHRED scores, while Solexa/Illumina decided to use their own schema in the FASTQ variant.

In a PHRED style FASTQ file, PHRED quality = ord(letter) - 33
In a Solexa style FASTQ file, Solexa quality = ord(letter) - 64

>>> def phred_quality_from_fastq_letter(letter) :
...     return ord(letter) - 33
...
>>> def solexa_quality_from_fastq_letter(letter) :
...     return ord(letter) - 64
...

Both these scores are defined in terms of the estimated probability of an error (between 0 for a good read and 1 for a bad read). A probability of almost zero gives a high quality score, while a probability of almost one gives a very low quality score.

>>> from math import log
>>> def phred_quality_from_error(error) :
...     return -10*log(error,10)
...
>>> def solexa_quality_from_error(error) :
...     return -10*log(error/(1-error),10)
...
>>> solexa_quality_from_error(0.000000001)
89.999999995657035
>>> solexa_quality_from_error(0.999999999)
-90.000000118483911
>>> phred_quality_from_error(0.000000001)
89.999999999999986
>>> phred_quality_from_error(0.999999999)
4.3429446983771231e-09
>>> phred_quality_from_error(1)
-0.0

Using these relationships you can map between PHRED and Solexa quality scores, assuming their error estimation methods are equivalent,

>>> def solexa_quality_from_phred(phred_quality) :
...     return 10*log(10**(phred_quality/10.0) - 1, 10)
...
>>> solexa_quality_from_phred(90)
89.999999995657035
>>> solexa_quality_from_phred(50)
49.99995657033466
>>> solexa_quality_from_phred(10)
9.5424250943932485
>>> solexa_quality_from_phred(1)
-5.8682532438011537
>>> solexa_quality_from_phred(0.1)
-16.32774717238372

Or, the other way round,

>>> def phred_quality_from_solexa(solexa_quality) :
...     return 10*log(10**(solexa_quality/10.0) + 1, 10)
...
>>> phred_quality_from_solexa(90)
90.000000004342922
>>> phred_quality_from_solexa(10)
10.41392685158225
>>> phred_quality_from_solexa(0)
3.0102999566398116
>>> phred_quality_from_solexa(-20)
0.043213737826425784

I think these python versions agree with the perl examples on http://maq.sourceforge.net/qual.shtml (doing a base ten logarithm seems much easier in python than in perl).

Combining this with the letter mapping used in the Solexa FASTQ files, ord(letter)-64, we have:

>>> def phred_quality_from_solexa_fastq_letter(letter) :
...     return 10*log(10**((ord(letter)-64)/10.0) + 1, 10)
...

This seems to agree with the perl example on http://maq.sourceforge.net/fastq.shtml (allowing for the missing brackets which I've emailed them about).

So, in conclusion:

>>> phred_quality_from_fastq_letter("!")
0
>>> phred_quality_from_fastq_letter("{")
90
>>> solexa_quality_from_fastq_letter("!")
-31
>>> solexa_quality_from_fastq_letter("{")
59
>>> phred_quality_from_solexa_fastq_letter("!")
0.0034483543102526788
>>> phred_quality_from_solexa_fastq_letter("{")
59.000005467440147

It's very tricky to guess which FASTQ variant you have from the data itself (but from the range of characters, some examples can only be Solexa style).

If we know we have a standard FASTQ file we can trivially get the PHRED scores. If we have a Solexa encoded FASTQ file, we can trivially get the Solexa scores. With this log mapping we *could* also do an implicit conversion of Solexa scores into PHRED scores, but due to floating point issues this is a little lossy.
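To make the lossiness concrete, here is a minimal sketch (not Biopython code) that pushes an integer Solexa score through the same two formulas and back. The helper name `round_trip` and the decision to round the intermediate PHRED value to the nearest integer, as you would when storing it in a standard FASTQ file, are assumptions made for this illustration only.

```python
from math import log10


def phred_quality_from_solexa(solexa_quality):
    """Convert a Solexa quality to a floating point PHRED quality."""
    return 10 * log10(10 ** (solexa_quality / 10.0) + 1)


def solexa_quality_from_phred(phred_quality):
    """Convert a PHRED quality (must be greater than zero) to a Solexa quality."""
    return 10 * log10(10 ** (phred_quality / 10.0) - 1)


def round_trip(solexa_quality):
    """Solexa -> integer PHRED -> integer Solexa (hypothetical helper).

    Returns None when the rounded PHRED score is zero, because the
    reverse formula would need the logarithm of zero.
    """
    phred = int(round(phred_quality_from_solexa(solexa_quality)))
    if phred <= 0:
        return None
    return int(round(solexa_quality_from_phred(phred)))


if __name__ == "__main__":
    for score in (40, 10, 0, -5, -20):
        print("%i -> %r" % (score, round_trip(score)))
```

Under these assumptions good scores survive the round trip (10 comes back as 10), but poor ones do not: -5 comes back as -6, and -20 rounds to a PHRED score of zero and cannot be mapped back at all.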
I would say follow python conventions and go with making things explicit, and not do this automatically when parsing. We could do this automatically if the user explicitly asks Bio.SeqIO to write out a "fastq-solexa" format file and their SeqRecords don't have Solexa qualities but do have PHRED qualities (or vice versa). Peter From chapmanb at 50mail.com Tue Feb 24 18:11:14 2009 From: chapmanb at 50mail.com (Brad Chapman) Date: Tue, 24 Feb 2009 18:11:14 -0500 Subject: [Biopython-dev] Converting between PHRED and Solexa quality scores (and FASTQ files) In-Reply-To: <320fb6e00902240922x1cf77a7amf387432d7f79e51b@mail.gmail.com> References: <320fb6e00902240922x1cf77a7amf387432d7f79e51b@mail.gmail.com> Message-ID: <20090224231114.GB39545@sobchak.mgh.harvard.edu> Peter; This is a great summary. I think these things belong on the wiki on the documentation page once the functionality is rolled into Biopython; it's a shame to see useful documentation hidden on the dev mailing list. Agreed 100% with no auto conversion. Providing the functionality to convert is plenty, and I think it would be more confusing to start seeing one type of scores when you expected another. Also, given the size of these data sets we want to be as lightweight as possible. Brad
From bartek at rezolwenta.eu.org Wed Feb 25 04:40:49 2009 From: bartek at rezolwenta.eu.org (Bartek Wilczynski) Date: Wed, 25 Feb 2009 10:40:49 +0100 Subject: [Biopython-dev] biopython on github In-Reply-To: <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com> References: <5aa3b3570902150729g367022a5p334b2c33f86461f@mail.gmail.com> <5aa3b3570902230531k6a0da3e0rdec28079971f1193@mail.gmail.com> <320fb6e00902230731h6257376sb2d6772f72b6e03a@mail.gmail.com> <3f6baf360902230843u320e9fe9wc0a03928383d6cbb@mail.gmail.com> <320fb6e00902230908j38f5755la85a55bfc461a763@mail.gmail.com> <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com> Message-ID: <8b34ec180902250140k4fb1bef0y913b97db0e309e4b@mail.gmail.com> On Mon, Feb 23, 2009 at 7:29 PM, Bartek Wilczynski wrote: > > I've requested launchpad to follow our cvs trunk. They should (after > reviewing my request) put it into the location: > https://code.edge.launchpad.net/~vcs-imports/biopython-test/trunk > I'll post to the list if they get back to me. We'll see how it goes. > There is a small technical problem with bazaar following our repo. For now, their scripts are working only with cvs pserver connections without password. It shouldn't be too difficult for them to adjust (anyway the setup of each import is not fully automated), but just in case it's not possible for now: Can we set up a user with no password and read-only access to our cvs repo? Who would be the right person to contact? cheers Bartek From dalloliogm at gmail.com Wed Feb 25 05:02:37 2009 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Wed, 25 Feb 2009 11:02:37 +0100 Subject: [Biopython-dev] Quality scores (and per-letter-annotation) in a SeqRecord?
In-Reply-To: <320fb6e00902230624j65d90b63tb9b5c1063d03c923@mail.gmail.com> References: <200902201249.36743.jblanca@btc.upv.es> <20090220231904.GE18294@sobchak.mgh.harvard.edu> <320fb6e00902211050r7a57bceap9ba216924785b9b0@mail.gmail.com> <5aa3b3570902230550v12e505eeje3dcf38d9bed8d2b@mail.gmail.com> <320fb6e00902230624j65d90b63tb9b5c1063d03c923@mail.gmail.com> Message-ID: <5aa3b3570902250202k6ad4779duea2c051ad6a8fd3c@mail.gmail.com> On Mon, Feb 23, 2009 at 3:24 PM, Peter wrote: > On Mon, Feb 23, 2009 at 1:50 PM, Giovanni Marco Dall'Olio > wrote: >> >> I suggest you to use github or any distribuited source versioning >> system to test the changes you are describing in this discussion. >> >> ... >> >> I think that it is easier to discuss over this if you can show how the >> code would look like instead of only describing it. > > Or we can stick with the old fashioned approach of uploading patches > to bugzilla. This proposal only requires additions to > Bio/SeqRecord.py to define the new property, and won't change much > existing code at all. Of course you can stick with bugzilla, but let me explain why I think using a drcs would be better :-). Basically, you should consider that with a drcs you can create forks very frequently, even for three or four commits, and when you have finished you merge the changes back and nobody will ever know that there was a fork. If you want to change an attribute of SeqRecord, this doesn't imply a single commit: you have to test various solutions, provide tests for each of them, see which one is the most comfortable, and only then push it into the official release. Basically, what you do now is similar to what you would do with a drcs: each one of you will probably have a modified copy of biopython on his computer, and when he has finished he will create a patch or commit to the cvs system.
However, the problem is that these local copies are on local computers, and for other people it is very difficult to evaluate them and to give good feedback. Moreover, these copies can fall out of sync with the official branch. You can post some code snippets via mail, but you probably won't post the tests and many other things. If you create an experimental branch to test the new SeqRecord attribute, along with its tests and all the separate commits for every change, and post it on a publicly accessible web site, then it will be possible to discuss the changes in much more depth, and I think this could improve biopython's development process. > > I can see there are benefits to using a distributed source version > system for more complicated patches touching lots of files, but it > isn't needed here and (if you don't have git installed) using github > might it actually make it harder for people to try the code on their > local machine. > > Peter > -- My blog on bioinformatics (now in English): http://bioinfoblog.it From biopython at maubp.freeserve.co.uk Wed Feb 25 05:10:19 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 25 Feb 2009 10:10:19 +0000 Subject: [Biopython-dev] biopython on github In-Reply-To: <8b34ec180902250140k4fb1bef0y913b97db0e309e4b@mail.gmail.com> References: <5aa3b3570902150729g367022a5p334b2c33f86461f@mail.gmail.com> <5aa3b3570902230531k6a0da3e0rdec28079971f1193@mail.gmail.com> <320fb6e00902230731h6257376sb2d6772f72b6e03a@mail.gmail.com> <3f6baf360902230843u320e9fe9wc0a03928383d6cbb@mail.gmail.com> <320fb6e00902230908j38f5755la85a55bfc461a763@mail.gmail.com> <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com> <8b34ec180902250140k4fb1bef0y913b97db0e309e4b@mail.gmail.com> Message-ID: <320fb6e00902250210t2ad19536ke379e219ba6f7dae@mail.gmail.com> On Wed, Feb 25, 2009 at 9:40 AM, Bartek Wilczynski wrote: > On Mon, Feb 23, 2009 at 7:29 PM, Bartek Wilczynski > wrote: >> >> I've requested launchpad to follow
our cvs trunk. They should (after >> reviewing my request) put it into the location: >> https://code.edge.launchpad.net/~vcs-imports/biopython-test/trunk >> I'll post to the list if they get back to me. We'll see how it goes. > > There is a small technical problem with bazaar following our repo. For > now, their scripts are working only with cvs pserver connections without > password. It shouldn't be too difficult for them to adjust (anyway the > setup of each import is not fully automated), but just in case it's not > possible for now: Can we set up a user with no password and read-only > access to our cvs repo? Who would be the right person to contact? Right now as far as I know you need username "cvs", password "cvs" - or a full developer account. I guess another read only account could be set up (maybe "guest") with no password, assuming there are no security issues with this, but the OBF guys would have to do this. You could ask them on support at helpdesk.open-bio.org but given we probably won't continue with CVS that much longer anyway, it seems a bit pointless to hassle the OBF over this now - it might be easier to just encourage Bazaar to deal with a password (as I'm sure lots of open source projects have a simple password like this).
Peter From bartek at rezolwenta.eu.org Wed Feb 25 05:56:01 2009 From: bartek at rezolwenta.eu.org (Bartek Wilczynski) Date: Wed, 25 Feb 2009 11:56:01 +0100 Subject: [Biopython-dev] biopython on github In-Reply-To: <320fb6e00902250210t2ad19536ke379e219ba6f7dae@mail.gmail.com> References: <5aa3b3570902150729g367022a5p334b2c33f86461f@mail.gmail.com> <5aa3b3570902230531k6a0da3e0rdec28079971f1193@mail.gmail.com> <320fb6e00902230731h6257376sb2d6772f72b6e03a@mail.gmail.com> <3f6baf360902230843u320e9fe9wc0a03928383d6cbb@mail.gmail.com> <320fb6e00902230908j38f5755la85a55bfc461a763@mail.gmail.com> <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com> <8b34ec180902250140k4fb1bef0y913b97db0e309e4b@mail.gmail.com> <320fb6e00902250210t2ad19536ke379e219ba6f7dae@mail.gmail.com> Message-ID: <8b34ec180902250256k6f6f5c1bvbf85d8b68a315927@mail.gmail.com> On Wed, Feb 25, 2009 at 11:10 AM, Peter wrote: > > Right now as far as I know you need username "cvs", password "cvs" - > or a full developer account. I guess another read only account could > be setup (maybe "guest") with no password, assuming there are no > security issues with this, but the OBF guys would have to do this. > You could ask them on support at helpdesk.open-bio.org but given we > probably won't continue with CVS that much longer anyway, it seems a > bit pointless to hassle the OBF over this now - it might easier to > just encourage Bazaar to deal with a password (as I'm sure lots of > open source projects have a simple password like this). I've already contacted them about this, but it might take time for them to update their procedures to support passwords. In the meantime, I'll try to look into a crontab based update procedure which wouldn't require anything on Launchpad's part.
cheers Bartek From bugzilla-daemon at portal.open-bio.org Wed Feb 25 10:42:25 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 25 Feb 2009 10:42:25 -0500 Subject: [Biopython-dev] [Bug 2507] Adding __getitem__ to SeqRecord for element access and slicing In-Reply-To: Message-ID: <200902251542.n1PFgP2Z029511@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2507

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #998 is|0                           |1
           obsolete|                            |

------- Comment #12 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-25 10:42 EST -------
Created an attachment (id=1249) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1249&action=view) Patch to SeqRecord.py and SeqFeature.py This updates the old patch (which no longer applied cleanly to CVS), and implements per-letter-annotation with a restricted dictionary as discussed on the mailing list. The precise name for the publicly exposed per-letter-annotation dictionary is still open to debate, here I have used letter_annotation - see the mailing list for more: http://lists.open-bio.org/pipermail/biopython-dev/2009-February/005340.html This includes a lengthy doctest on the SeqRecord __getitem__ method, but further additions to the unit tests would be wise. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Wed Feb 25 17:00:10 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 25 Feb 2009 22:00:10 +0000 Subject: [Biopython-dev] Quality scores (and per-letter-annotation) in a SeqRecord?
In-Reply-To: <5aa3b3570902250202k6ad4779duea2c051ad6a8fd3c@mail.gmail.com> References: <200902201249.36743.jblanca@btc.upv.es> <20090220231904.GE18294@sobchak.mgh.harvard.edu> <320fb6e00902211050r7a57bceap9ba216924785b9b0@mail.gmail.com> <5aa3b3570902230550v12e505eeje3dcf38d9bed8d2b@mail.gmail.com> <320fb6e00902230624j65d90b63tb9b5c1063d03c923@mail.gmail.com> <5aa3b3570902250202k6ad4779duea2c051ad6a8fd3c@mail.gmail.com> Message-ID: <320fb6e00902251400r58f46df4ka54328b617781bd4@mail.gmail.com> On Wed, Feb 25, 2009 at 10:02 AM, Giovanni Marco Dall'Olio wrote: > > Of course you can stick with bugzilla, ... > I've put an updated patch on Bug 2507 which implements the per-letter-annotations as a restricted dictionary (as the letter_annotations property for now), and adds a __getitem__ method to the SeqRecord object which is aware of it. This changes both SeqRecord.py and SeqFeature.py (required for switching the co-ordinates on SeqFeature objects as part of a SeqRecord slice), and is against the current CVS code. http://bugzilla.open-bio.org/show_bug.cgi?id=2507 If any of you aren't familiar with using the command line tools diff and patch, here's what you would do to try this code. Get a copy of the latest Biopython code from CVS, change to the Bio directory, download the attachment and save it in that directory as attachment.patch (for example) and then run "patch < attachment.patch" to update the code.
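For readers who would rather see the idea than apply the patch, the restricted dictionary and the annotation-aware `__getitem__` described above can be sketched roughly as follows. This is an illustrative stand-in, not the code attached to Bug 2507: the class names `RestrictedDict` and `ToyRecord` are hypothetical, the sequence is a plain string rather than a Seq object, and features are ignored entirely.

```python
class RestrictedDict(dict):
    """Dictionary whose values must all be sequences of one fixed length.

    Hypothetical stand-in for the restricted dictionary in the patch.
    """

    def __init__(self, length):
        dict.__init__(self)
        self._length = int(length)

    def __setitem__(self, key, value):
        # Accept only sliceable values as long as the parent sequence.
        if (not hasattr(value, "__len__")
                or not hasattr(value, "__getitem__")
                or len(value) != self._length):
            raise TypeError("Expected a sequence of length %i" % self._length)
        dict.__setitem__(self, key, value)

    def update(self, new_dict):
        # dict.update would bypass __setitem__, so route through it.
        for key, value in new_dict.items():
            self[key] = value


class ToyRecord(object):
    """Minimal record: a sequence string plus per-letter annotations."""

    def __init__(self, seq, letter_annotations=None):
        self.seq = seq
        self.letter_annotations = RestrictedDict(len(seq))
        if letter_annotations:
            self.letter_annotations.update(letter_annotations)

    def __len__(self):
        return len(self.seq)

    def __getitem__(self, index):
        if isinstance(index, int):
            # A single position returns just that letter.
            return self.seq[index]
        if isinstance(index, slice):
            # A slice returns a new record, slicing every annotation too.
            sub = ToyRecord(self.seq[index])
            for key, value in self.letter_annotations.items():
                sub.letter_annotations[key] = value[index]
            return sub
        raise ValueError("Invalid index")


if __name__ == "__main__":
    record = ToyRecord("ACGTACGT",
                       {"phred_quality": [30, 32, 35, 40, 40, 38, 20, 10]})
    trimmed = record[2:6]
    print(trimmed.seq)
    print(trimmed.letter_annotations["phred_quality"])
```

With this toy data, `record[2:6]` gives a record whose sequence is `GTAC` and whose `phred_quality` list is `[35, 40, 40, 38]`, while assigning a list of the wrong length to `record.letter_annotations` raises a `TypeError`. Tying the dictionary to the sequence length is what lets slicing keep the letters and their scores in step.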
Peter From bartek at rezolwenta.eu.org Thu Feb 26 08:26:15 2009 From: bartek at rezolwenta.eu.org (Bartek Wilczynski) Date: Thu, 26 Feb 2009 14:26:15 +0100 Subject: [Biopython-dev] biopython on github In-Reply-To: <8b34ec180902250256k6f6f5c1bvbf85d8b68a315927@mail.gmail.com> References: <5aa3b3570902150729g367022a5p334b2c33f86461f@mail.gmail.com> <5aa3b3570902230531k6a0da3e0rdec28079971f1193@mail.gmail.com> <320fb6e00902230731h6257376sb2d6772f72b6e03a@mail.gmail.com> <3f6baf360902230843u320e9fe9wc0a03928383d6cbb@mail.gmail.com> <320fb6e00902230908j38f5755la85a55bfc461a763@mail.gmail.com> <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com> <8b34ec180902250140k4fb1bef0y913b97db0e309e4b@mail.gmail.com> <320fb6e00902250210t2ad19536ke379e219ba6f7dae@mail.gmail.com> <8b34ec180902250256k6f6f5c1bvbf85d8b68a315927@mail.gmail.com> Message-ID: <8b34ec180902260526m3ff42f3x2a99a77d4d0fb928@mail.gmail.com> Hi all, I've been looking around for alternative ways of converting our current CVS repository to a new DVCS system (git or bzr). The launchpad team offers the possibility of automatic mirroring of a cvs repository in a bzr branch, but it would require a change in configuration on our side (they still haven't answered my request to support password protected repos). I was looking for another option, and it seems that there is a way to solve the problem of mirroring. There is a tool cvs2git (a part of the cvs2svn package http://cvs2svn.tigris.org/cvs2git.html), which reads a cvs repository and outputs a dump which is readable by both git and bzr (using the fast-import command). The nice thing about it is that it's very fast (~3mins) for the whole biopython repo. I've set up a small script, which grabs the newest cvs repo from dev.open-bio.org and converts it to git and bzr branches which are then pushed to github and launchpad.
It currently runs as a crontab script on my machine and it could be transferred to open-bio.org if they would install bzr and git, but I'm fine with running it from my computer for a few months, especially if we plan to drop CVS support in the foreseeable future, which would make installing the script on the open-bio servers useless. You can see the branches here: http://github.com/barwil/biopython-test/tree/master https://code.launchpad.net/~bartek/biopython-test/trunk_updates The branches are different from the previous ones made by me and Giovanni, because they now include the whole repository (including biodata, html, website etc.). We might consider splitting these into different repos. Using this kind of setup, we allow anyone interested to easily fork our current repo and then even merge their changes into the newer version of the exported source. The only problem is that, as long as CVS is the main repository, it might be difficult to commit these changes back to CVS. Does anyone have a clever idea for an easy procedure to commit things back to CVS? Because having a branch is not of much use if we cannot easily accept contributions.
All comments and/or ideas are welcome cheers Bartek From biopython at maubp.freeserve.co.uk Thu Feb 26 09:00:46 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 26 Feb 2009 14:00:46 +0000 Subject: [Biopython-dev] biopython on github In-Reply-To: <8b34ec180902260526m3ff42f3x2a99a77d4d0fb928@mail.gmail.com> References: <5aa3b3570902150729g367022a5p334b2c33f86461f@mail.gmail.com> <5aa3b3570902230531k6a0da3e0rdec28079971f1193@mail.gmail.com> <320fb6e00902230731h6257376sb2d6772f72b6e03a@mail.gmail.com> <3f6baf360902230843u320e9fe9wc0a03928383d6cbb@mail.gmail.com> <320fb6e00902230908j38f5755la85a55bfc461a763@mail.gmail.com> <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com> <8b34ec180902250140k4fb1bef0y913b97db0e309e4b@mail.gmail.com> <320fb6e00902250210t2ad19536ke379e219ba6f7dae@mail.gmail.com> <8b34ec180902250256k6f6f5c1bvbf85d8b68a315927@mail.gmail.com> <8b34ec180902260526m3ff42f3x2a99a77d4d0fb928@mail.gmail.com> Message-ID: <320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com> On Thu, Feb 26, 2009 at 1:26 PM, Bartek Wilczynski wrote: > You can see the branches here: > http://github.com/barwil/biopython-test/tree/master > https://code.launchpad.net/~bartek/biopython-test/trunk_updates I would have just gone with the main Biopython "directory", ignoring the old website etc. > Using this kind of setup, we are allowing all interested to easily > fork our current repo and then even merge their changes into the newer > version of exported source. The only problem is, that as long as the > CVS is the main repository, it might be difficult to commit these > changes back to CVS. Does anyone have a clever idea for an easy > procedure to commit things back to CVS? Because having a branch is not > of much use if cannot easily accept contributions. Can't you produce a diff between the git mirror of CVS, and your modified branch - and then we can deal with the patch as usual via CVS? 
Another option to consider would be to switch to running git on biopython.org, but use the git-cvsserver tool to provide an emulated CVS server on top of the git repository. This sounds possible in theory, and would be nice for any "old fashioned" biopython developers because it should be fairly transparent - they can continue to treat it as CVS and just work on the main trunk. This would require someone competent to do the conversion and alter the server setup - we'd have to talk to the OBF team about this. However, if anyone has first hand experience of git-cvsserver perhaps they could comment on whether this sounds like a good plan or not. Peter From jblanca at btc.upv.es Thu Feb 26 10:12:54 2009 From: jblanca at btc.upv.es (Jose Blanca) Date: Thu, 26 Feb 2009 16:12:54 +0100 Subject: [Biopython-dev] library to create gel image Message-ID: <200902261612.54306.jblanca@btc.upv.es> Hi: I'm writing an application that reads ABIF files (Applied Biosystems files) and generates a gel image. I'm able to read the trace (chromatogram) data from the file and now I would like to plot it. I don't want to plot every trace as a 2d graphic like in: http://www.mun.ca/biology/scarr/ABI377_chromatogram.jpg But to create a 2D gel image using all traces like in: http://www.fieldmuseum.org/research_collections/pritzker_lab/pritzker/images/ecran.jpg Any suggestions on which python library I could use? Of course, if anybody is interested in the code that I already have I'm willing to share it. Best regards, -- Jose M.
Blanca Postigo Instituto Universitario de Conservacion y Mejora de la Agrodiversidad Valenciana (COMAV) Universidad Politecnica de Valencia (UPV) Edificio CPI (Ciudad Politecnica de la Innovacion), 8E 46022 Valencia (SPAIN) Tlf.:+34-96-3877000 (ext 88473) From biopython at maubp.freeserve.co.uk Thu Feb 26 13:51:07 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 26 Feb 2009 18:51:07 +0000 Subject: [Biopython-dev] library to create gel image In-Reply-To: <200902261612.54306.jblanca@btc.upv.es> References: <200902261612.54306.jblanca@btc.upv.es> Message-ID: <320fb6e00902261051n62899098i86edd36ba00ee7d4@mail.gmail.com> On Thu, Feb 26, 2009 at 3:12 PM, Jose Blanca wrote: > Hi: > I'm writting an application that reads ABIF files (Applied Biosystems files) > and generates a gel image. I'm able to read the trace (chromatogram) data > from the file and now I would like to plot it. I don't want to plot every > trace as a 2d graphic like in: > http://www.mun.ca/biology/scarr/ABI377_chromatogram.jpg > But to create a 2D gel image using all traces like in: > http://www.fieldmuseum.org/research_collections/pritzker_lab/pritzker/images/ecran.jpg > Any suggestion on which python library could I use? Do you want to recreate the hexagonal grid, or would a simplified rectangular grid do? Do you need to be able to control the size, colour and intensity of the spots (in order to recreate something close to the original)? Do you get quality control information for nasty cases (e.g. non-circular dots, say a ring donut shape)? If you need this kind of fine control it would be a lot of work but you certainly could do this "by hand" using a number of python packages - for example ReportLab would let you generate PDF, PS, SVG or bitmap images from the same drawing object. Other backends might be equally suitable. > Of course, if anybody is interested in the code that I already got I'm willing > to share it.
> Best regards, The code for reading the trace (chromatogram) data from ABIF files (Applied Biosystems files) might make a nice addition to the Bio.Sequencing module. Peter From jblanca at btc.upv.es Fri Feb 27 04:05:28 2009 From: jblanca at btc.upv.es (Jose Blanca) Date: Fri, 27 Feb 2009 10:05:28 +0100 Subject: [Biopython-dev] library to create gel image In-Reply-To: <320fb6e00902261400v22af3c52ob1a8cb80113f6756@mail.gmail.com> References: <200902261612.54306.jblanca@btc.upv.es> <1235675865.49a6ead98bc24@webmail.upv.es> <320fb6e00902261400v22af3c52ob1a8cb80113f6756@mail.gmail.com> Message-ID: <200902271005.28459.jblanca@btc.upv.es> > The example was a bit small - so I had guessed a bit, and it sounds > like my guess was wrong. Do you have a larger example picture? I want something like what the Genographer software does: http://hordeum.oscs.montana.edu/genographer/help/tutorial/tutorial.html But I don't need an interactive GUI application, just the gel rendering. You can take a look at the code at: http://bioinf.comav.upv.es/svn/gelify/gelifyfsa/src/ Take into account that it is just a work in progress. Suggestions are welcome. Regards, -- Jose M.
Blanca Postigo Instituto Universitario de Conservacion y Mejora de la Agrodiversidad Valenciana (COMAV) Universidad Politecnica de Valencia (UPV) Edificio CPI (Ciudad Politecnica de la Innovacion), 8E 46022 Valencia (SPAIN) Tlf.:+34-96-3877000 (ext 88473) From biopython at maubp.freeserve.co.uk Fri Feb 27 05:45:59 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 27 Feb 2009 10:45:59 +0000 Subject: [Biopython-dev] library to create gel image In-Reply-To: <200902271005.28459.jblanca@btc.upv.es> References: <200902261612.54306.jblanca@btc.upv.es> <1235675865.49a6ead98bc24@webmail.upv.es> <320fb6e00902261400v22af3c52ob1a8cb80113f6756@mail.gmail.com> <200902271005.28459.jblanca@btc.upv.es> Message-ID: <320fb6e00902270245q65c0b924obd5181576374134c@mail.gmail.com> On Fri, Feb 27, 2009 at 9:05 AM, Jose Blanca wrote: >> The example was a bit small - so I had guessed a bit, and it sounds >> like my guess was wrong. Do you have a larger example picture? > I want something like the Genographer software does: > http://hordeum.oscs.montana.edu/genographer/help/tutorial/tutorial.html > But I don't need an interactive GUI application, just the gel rendering. That's much clearer - is the Genographer software showing the actual image (zoomed as required, with the colours adjusted as required), or an artificial recreation? Are you trying to create this figure for illustrative purposes only? I mean would a slightly cartoon like recreation be fine, or are you trying to make it as realistic as possible? Either way, I doubt there will be any existing software for exactly this purpose - and you will have to create your own code to draw this. > You can take a look at the code at: > http://bioinf.comav.upv.es/svn/gelify/gelifyfsa/src/ > Take into account that is just a work in progress. I see you are having to reverse engineer their file format. I guess other people have tried this in the past so there may be more clues out on the internet.
Have you tried emailing the company to see if they would publish the file format specifications (unlikely I fear, but worth asking). Peter From jblanca at btc.upv.es Fri Feb 27 05:57:49 2009 From: jblanca at btc.upv.es (Jose Blanca) Date: Fri, 27 Feb 2009 11:57:49 +0100 Subject: [Biopython-dev] library to create gel image In-Reply-To: <320fb6e00902270245q65c0b924obd5181576374134c@mail.gmail.com> References: <200902261612.54306.jblanca@btc.upv.es> <200902271005.28459.jblanca@btc.upv.es> <320fb6e00902270245q65c0b924obd5181576374134c@mail.gmail.com> Message-ID: <200902271157.49948.jblanca@btc.upv.es> On Friday 27 February 2009 11:45:59 Peter wrote: > On Fri, Feb 27, 2009 at 9:05 AM, Jose Blanca wrote: > That's much clearer - is the Genographer software showing the actual > image (zoomed as required, with the colours adjusted as required), or > an artificial recreation? It is an artificial recreation, the same as I'm trying to accomplish. I just want more resolution and an automated process (genographer is a GUI application). > Are you trying to create this figure for illustrative purposes only? > I mean would a slightly cartoon like recreation be fine, or are you > trying to make it as realistic as possible? I want to analyze it. > I see you are having to reverse engineer their file format. I guess > other people have tried this in the past so there may be more clues > out on the internet. Have you tried emailing the company to see if > they would publish the file format specifications (unlikely I fear, > but worth asking). Fortunately the ABIF format was reverse engineered by people more clever than me, and a couple of years ago Applied published a specification. http://bioinf.comav.upv.es/svn/gelify/gelifyfsa/src/doc/ABIF_File_Format.pdf You can't believe everything in that specification, but it is a good start. Reading an abif file is not a problem, drawing the gel with as little coding as possible is another thing.
Regards, Jose Blanca

From biopython at maubp.freeserve.co.uk Fri Feb 27 06:13:45 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 27 Feb 2009 11:13:45 +0000 Subject: [Biopython-dev] Quality scores (and per-letter-annotation) in a SeqRecord? In-Reply-To: <1235175883.22598.62.camel@lafa> References: <200902201249.36743.jblanca@btc.upv.es> <20090220231904.GE18294@sobchak.mgh.harvard.edu> <1235175883.22598.62.camel@lafa> Message-ID: <320fb6e00902270313o3c860b4eweb56a0a1cdc87e80@mail.gmail.com>

On Sat, Feb 21, 2009 at 12:24 AM, Iddo Friedberg wrote:
>
> Hi all,
>
> I am sort of living in this world right now, doing a lot of
> metagenomics, so here are my $0.02. I agree with Leighton (assuming I
> understand him): We should consider the possible applications people
> will run using the quality data when designing the
>
> from what I have seen the most common use for quality scores is for
> trimming the sequences, i.e. removing the lesser quality sequence data
> (usually on the edges) from the 5' and 3' ends of the read. So any data
> structure should take into consideration that we will probably have
> a .trim(self,threshold) method or function trim(seq, threshold) that
> will return a slice of the sequence.

I'm not convinced the SeqRecord needs a trim method (and if it did, it would also need to take an argument saying which per-letter-annotation it should use, e.g. the PHRED qualities). But yes, this is an excellent example of where it would be very useful to have the SeqRecord support slicing which also slices the quality information (as recently discussed, with an implementation on Bug 2507).

I've got a related example use-case, trimming primer sequences from the raw reads (and trimming the quality scores to match) before assembly. If the quality scores are recorded in a per-letter-annotation dictionary which is integrated into SeqRecord slicing, this becomes fairly straightforward. First read in the data (most simply from a FASTQ file).
You look at the SeqRecord's seq to determine where to cut the sequence, and then apply the slice to the SeqRecord - this will give you a new SeqRecord with the appropriate sub-sequence and the appropriate sub-list of the quality scores. You can then save this data, either as a FASTQ file, or paired FASTA and QUAL files. Peter From dalloliogm at gmail.com Fri Feb 27 06:50:03 2009 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Fri, 27 Feb 2009 12:50:03 +0100 Subject: [Biopython-dev] Quality scores (and per-letter-annotation) in a SeqRecord? In-Reply-To: <320fb6e00902251400r58f46df4ka54328b617781bd4@mail.gmail.com> References: <200902201249.36743.jblanca@btc.upv.es> <20090220231904.GE18294@sobchak.mgh.harvard.edu> <320fb6e00902211050r7a57bceap9ba216924785b9b0@mail.gmail.com> <5aa3b3570902230550v12e505eeje3dcf38d9bed8d2b@mail.gmail.com> <320fb6e00902230624j65d90b63tb9b5c1063d03c923@mail.gmail.com> <5aa3b3570902250202k6ad4779duea2c051ad6a8fd3c@mail.gmail.com> <320fb6e00902251400r58f46df4ka54328b617781bd4@mail.gmail.com> Message-ID: <5aa3b3570902270350j2a7d978bpc778e7f4f952e077@mail.gmail.com> On Wed, Feb 25, 2009 at 11:00 PM, Peter wrote: > On Wed, Feb 25, 2009 at 10:02 AM, Giovanni Marco Dall'Olio > wrote: >> >> Of course you can stick with bugzilla, ... >> > > I've put an updated patch on Bug 2507 which implements the > per-letter-annotations as a restricted dictionary (as the > letter_annotations property for now), and adds a __getitem__ method to > the SeqRecord object which is aware of it. Hi, I have applied your patch to my unofficial github branch. Here it is: - http://github.com/dalloliogm/biopython/commit/51383b0e91b46f66ca20b36707c3a21a3dcbf0fb People not wishing to use git can download the code anyway, by clicking on 'download' in this page: - http://github.com/dalloliogm/biopython/tree/qualityscores-experimental The right button to click is the 'download' near the 'watch' button. 
I know there is a second 'Downloads' page which creates confusion, but it has nothing to do with it. On the branches graph there is a bit of confusion now (my fault), but you can see that I have applied your patch over a recent version of biopython (there are some commits that I didn't include yet).

p.s. on your patch (http://bugzilla.open-bio.org/attachment.cgi?id=1249), on the third change, you modify this in SeqRecord.__init__:

95c120
<         self.seq = seq
---
>         self._seq = seq

can it be an error? Why has self.seq been moved to self._seq?

> This changes both
> SeqRecord.py and SeqFeature.py (required for switching the
> co-ordinates on SeqFeature objects as part of a SeqRecord slice), and
> is against the current CVS code.
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2507
>
> If any of you aren't familiar with using the command line tools diff
> and patch, here's what you would do to try this code. Get a copy of
> the latest Biopython code from CVS, change to the Bio directory,
> download the attachment and save it in that directory as
> attachment.patch (for example) then run "patch < attachment.patch"
> to update the code.
>
> Peter
>

-- My blog on bioinformatics (now in English): http://bioinfoblog.it

From biopython at maubp.freeserve.co.uk Fri Feb 27 07:12:11 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 27 Feb 2009 12:12:11 +0000 Subject: [Biopython-dev] Quality scores (and per-letter-annotation) in a SeqRecord?
In-Reply-To: <5aa3b3570902270350j2a7d978bpc778e7f4f952e077@mail.gmail.com> References: <200902201249.36743.jblanca@btc.upv.es> <20090220231904.GE18294@sobchak.mgh.harvard.edu> <320fb6e00902211050r7a57bceap9ba216924785b9b0@mail.gmail.com> <5aa3b3570902230550v12e505eeje3dcf38d9bed8d2b@mail.gmail.com> <320fb6e00902230624j65d90b63tb9b5c1063d03c923@mail.gmail.com> <5aa3b3570902250202k6ad4779duea2c051ad6a8fd3c@mail.gmail.com> <320fb6e00902251400r58f46df4ka54328b617781bd4@mail.gmail.com> <5aa3b3570902270350j2a7d978bpc778e7f4f952e077@mail.gmail.com> Message-ID: <320fb6e00902270412q6b0a9208m47320660f7d19c58@mail.gmail.com>

> p.s. on your patch
> (http://bugzilla.open-bio.org/attachment.cgi?id=1249), on the third
> change, you modify this in SeqRecord.__init__:
>
> 95c120
> <         self.seq = seq
> ---
> >         self._seq = seq
>
> can it be an error? Why self.seq has been moved to self._seq?

It is deliberate. Before the patch, the SeqRecord's .seq was a "naked" attribute. After the patch, the actual sequence is hidden in the private attribute ._seq and is publicly exposed using a property (also known as a "managed attribute") with a get and set method (and a doc string). The reason for doing this is I want to have some code run whenever anyone tries to set the seq property to a new value (in order to prevent the seq and per-letter-annotation getting out of sync).

Peter

From dalloliogm at gmail.com Fri Feb 27 07:20:51 2009 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Fri, 27 Feb 2009 13:20:51 +0100 Subject: [Biopython-dev] Quality scores (and per-letter-annotation) in a SeqRecord?
In-Reply-To: <320fb6e00902270412q6b0a9208m47320660f7d19c58@mail.gmail.com> References: <200902201249.36743.jblanca@btc.upv.es> <20090220231904.GE18294@sobchak.mgh.harvard.edu> <320fb6e00902211050r7a57bceap9ba216924785b9b0@mail.gmail.com> <5aa3b3570902230550v12e505eeje3dcf38d9bed8d2b@mail.gmail.com> <320fb6e00902230624j65d90b63tb9b5c1063d03c923@mail.gmail.com> <5aa3b3570902250202k6ad4779duea2c051ad6a8fd3c@mail.gmail.com> <320fb6e00902251400r58f46df4ka54328b617781bd4@mail.gmail.com> <5aa3b3570902270350j2a7d978bpc778e7f4f952e077@mail.gmail.com> <320fb6e00902270412q6b0a9208m47320660f7d19c58@mail.gmail.com> Message-ID: <5aa3b3570902270420j5b97932fo3f79bb4cd19566b0@mail.gmail.com>

On Fri, Feb 27, 2009 at 1:12 PM, Peter wrote:
>> p.s. on your patch
>> (http://bugzilla.open-bio.org/attachment.cgi?id=1249), on the third
>> change, you modify this in SeqRecord.__init__:
>>
>> 95c120
>> <         self.seq = seq
>> ---
>> >         self._seq = seq
>>
>> can it be an error? Why self.seq has been moved to self._seq?
>
> It is deliberate. Before the patch, the SeqRecord's .seq was a
> "naked" attribute. After the patch, the actual sequence is hidden in the
> private attribute ._seq and is publicly exposed using a property (also
> known as a "managed attribute") with a get and set method (and a doc
> string). The reason for doing this is I want to have some code run
> whenever anyone tries to set the seq property to a new value (in
> order to prevent the seq and per-letter-annotation getting out of sync).

I see, that is pretty nice. Now you define seq with the 'property' function.

p.s. it is not exactly related... but I was reading this article about python 3:
- http://www.informit.com/articles/article.aspx?p=1309289&seqNum=4
Look at the example: the class StockItem has an attribute called 'quantity' which can only take values between 1 and 1000; when someone tries to modify it to a negative number, an exception is raised.
Maybe it can be interesting for biopython 3 :-) > > Peter > -- My blog on bioinformatics (now in English): http://bioinfoblog.it From biopython at maubp.freeserve.co.uk Fri Feb 27 07:26:19 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 27 Feb 2009 12:26:19 +0000 Subject: [Biopython-dev] Quality scores (and per-letter-annotation) in a SeqRecord? In-Reply-To: <5aa3b3570902270420j5b97932fo3f79bb4cd19566b0@mail.gmail.com> References: <200902201249.36743.jblanca@btc.upv.es> <20090220231904.GE18294@sobchak.mgh.harvard.edu> <320fb6e00902211050r7a57bceap9ba216924785b9b0@mail.gmail.com> <5aa3b3570902230550v12e505eeje3dcf38d9bed8d2b@mail.gmail.com> <320fb6e00902230624j65d90b63tb9b5c1063d03c923@mail.gmail.com> <5aa3b3570902250202k6ad4779duea2c051ad6a8fd3c@mail.gmail.com> <320fb6e00902251400r58f46df4ka54328b617781bd4@mail.gmail.com> <5aa3b3570902270350j2a7d978bpc778e7f4f952e077@mail.gmail.com> <320fb6e00902270412q6b0a9208m47320660f7d19c58@mail.gmail.com> <5aa3b3570902270420j5b97932fo3f79bb4cd19566b0@mail.gmail.com> Message-ID: <320fb6e00902270426t60ba1970ld7e576a90d9ca99e@mail.gmail.com> On Fri, Feb 27, 2009 at 12:20 PM, Giovanni Marco Dall'Olio wrote: > I see, that is pretty nice. > now you define seq with the 'property' function. > > p.s. it is not exactly related... but I was reading this article about python 3: > - http://www.informit.com/articles/article.aspx?p=1309289&seqNum=4 > Look at the example: the class StockItem has an attribute called > 'quantity' which can be comprised only between 1 and 1000; when > someone tries to modify it to a negative number, an exception is > raised. > Maybe it can be interesting for biopython 3 :-) Actually decorators are available from Python 2.4+ http://www.python.org/dev/peps/pep-0318/ This is something we may want to look at once we've dropped support for Python 2.3 (Biopython 1.50 should be our last release to officially support Python 2.3). 
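The managed-attribute idea discussed in this thread can be written with decorator syntax. What follows is only a minimal sketch under stated assumptions: `Record`, `annotate` and the policy of clearing the annotations when the sequence is replaced are all hypothetical and are not taken from the patch on Bug 2507. Note also that the `@seq.setter` form needs Python 2.6 or later, which is newer than the 2.4 mentioned above.

```python
class Record(object):
    """Hypothetical stand-in for a SeqRecord-like class (not Biopython code)."""

    def __init__(self, seq):
        self._seq = seq                # private storage for the sequence
        self.letter_annotations = {}   # e.g. {"phred_quality": [...]}

    @property
    def seq(self):
        """The sequence itself, exposed as a managed attribute."""
        return self._seq

    @seq.setter
    def seq(self, value):
        # Assumed policy: per-letter data recorded for the old sequence is
        # meaningless for the new one, so discard it rather than let the
        # sequence and its annotations drift out of sync.
        self._seq = value
        self.letter_annotations = {}

    def annotate(self, key, values):
        """Attach one value per letter, refusing lists of the wrong length."""
        if len(values) != len(self._seq):
            raise ValueError("need %i values, got %i"
                             % (len(self._seq), len(values)))
        self.letter_annotations[key] = list(values)


rec = Record("ACGT")
rec.annotate("phred_quality", [40, 38, 30, 12])
rec.seq = "AC"                  # runs the setter, which clears the annotation
print(rec.letter_annotations)   # {}
```

Callers still write `rec.seq = ...` exactly as they would with a plain attribute, which is why a change of this kind can be made without breaking existing scripts.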
Peter From biopython at maubp.freeserve.co.uk Fri Feb 27 09:17:15 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 27 Feb 2009 14:17:15 +0000 Subject: [Biopython-dev] Bio.NetCatch, Bio.FilteredReader and Bio.File.SGMLHandle Message-ID: <320fb6e00902270617u6ed9e230u35dc8e440fcd21cd@mail.gmail.com> Hello all, Earlier this month over on the main discussion list Michiel suggested we start the deprecation process for the Bio.NetCatch and Bio.FilteredReader modules and Bio.File.SGMLHandle class. http://lists.open-bio.org/pipermail/biopython/2009-February/004932.html http://lists.open-bio.org/pipermail/biopython/2009-February/004933.html We didn't have any response, so I have just updated the docstrings and the DEPRECATED file in CVS to declare them obsolete, stating that in a subsequent release they will be deprecated, and later removed. If anyone wants to, we could probably go with an immediate deprecation of these (plus also Bio.EZRetrieve), but I see no reason to hurry. Peter From bugzilla-daemon at portal.open-bio.org Fri Feb 27 13:25:42 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 27 Feb 2009 13:25:42 -0500 Subject: [Biopython-dev] [Bug 2507] Adding __getitem__ to SeqRecord for element access and slicing In-Reply-To: Message-ID: <200902271825.n1RIPgLl011447@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2507 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1249 is|0 |1 obsolete| | ------- Comment #13 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-27 13:25 EST ------- Created an attachment (id=1250) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1250&action=view) Patch to SeqRecord.py and SeqFeature.py Updated the patch, fixes a couple of len(seq) which should have been len(self.seq), updates the __str__ method to show when there is 
per-letter-annotation. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Feb 27 13:29:31 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 27 Feb 2009 13:29:31 -0500 Subject: [Biopython-dev] [Bug 2767] Bio.SeqIO support for FASTQ and QUAL files In-Reply-To: Message-ID: <200902271829.n1RITVsp011783@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2767 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1244 is|0 |1 obsolete| | ------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-27 13:29 EST ------- Created an attachment (id=1251) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1251&action=view) Read/write support for FASTQ and QUAL files, using the per-letter-annotation dict Updated to: * use the per-letter-annotation dictionary added by the patch on Bug 2507 * read and write the Solexa FASTQ variant (which I plan to call "fastq-solexa" in Bio.SeqIO) * automatically convert PHRED/Solexa qualities when writing a file in the other format. This needs some more testing with real Solexa FASTQ files, but I expect to be able to do that next with with some real data from a colleague. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
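For readers unfamiliar with the two scales mentioned in the comment above, the conversion follows from the usual definitions: a PHRED score is -10*log10(p) and a Solexa score is -10*log10(p/(1-p)), where p is the estimated error probability. The sketch below is only an illustration of that arithmetic; the function names are made up here and are not necessarily those used in the attachment.

```python
from math import log10


def phred_from_solexa(q_solexa):
    """Convert a Solexa quality score to the PHRED scale (as a float)."""
    return 10 * log10(10 ** (q_solexa / 10.0) + 1)


def solexa_from_phred(q_phred):
    """Convert a PHRED quality score to the Solexa scale (as a float)."""
    if q_phred <= 0:
        # PHRED 0 means an error probability of 1, which has no finite
        # Solexa score, so refuse rather than return infinity.
        raise ValueError("PHRED quality must be positive to convert")
    return 10 * log10(10 ** (q_phred / 10.0) - 1)


# Print a few Solexa scores next to their PHRED equivalents.
for q in (-5, 0, 10, 40):
    print(q, round(phred_from_solexa(q), 2))
```

The two scales agree closely for high scores and differ mainly for low-quality bases, which is where rounding to integers when writing a file needs the most care.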
From bugzilla-daemon at portal.open-bio.org Fri Feb 27 13:31:23 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 27 Feb 2009 13:31:23 -0500 Subject: [Biopython-dev] [Bug 2767] Bio.SeqIO support for FASTQ and QUAL files In-Reply-To: Message-ID: <200902271831.n1RIVNRG012028@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2767 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- BugsThisDependsOn| |2507 ------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-27 13:31 EST ------- After discussion on the mailing list, storing the qualities values nicely will depend on the per-letter-annotation support being implemented on Bug 2507 (together with SeqRecord slicing). Marking this dependency in bugzilla. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Feb 27 13:31:25 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 27 Feb 2009 13:31:25 -0500 Subject: [Biopython-dev] [Bug 2507] Adding __getitem__ to SeqRecord for element access and slicing In-Reply-To: Message-ID: <200902271831.n1RIVP59012042@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2507 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- OtherBugsDependingO| |2767 nThis| | -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From mjldehoon at yahoo.com Sun Feb 1 08:38:03 2009 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Sun, 1 Feb 2009 00:38:03 -0800 (PST) Subject: [Biopython-dev] run_tests.py rewrite Message-ID: <104713.36194.qm@web62407.mail.re1.yahoo.com> Hi everybody, I just uploaded to CVS a rewritten version of Tests/run_tests.py. This new version automatically detects whether a test is a unittest-style test or a print-and-compare test. By doing so, the unittest-style tests no longer need to have a file containing the test output in Tests/output. For users, run_tests.py works essentially the same as before. As changing the test framework is tricky business, I'd like to ask you to be careful with the Biopython tests, in particular to make sure that there are no bugs in the testing framework that would let test failures go unnoticed. If no problems show up in the next few weeks, we can start removing the output files of unittest-style tests from Biopython, as they're no longer needed. --Michiel. From dalloliogm at gmail.com Mon Feb 2 10:03:00 2009 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Mon, 2 Feb 2009 11:03:00 +0100 Subject: [Biopython-dev] run_tests.py rewrite In-Reply-To: <104713.36194.qm@web62407.mail.re1.yahoo.com> References: <104713.36194.qm@web62407.mail.re1.yahoo.com> Message-ID: <5aa3b3570902020203y21f37037vbae65cc17c7ca563@mail.gmail.com> On Sun, Feb 1, 2009 at 9:38 AM, Michiel de Hoon wrote: > Hi everybody, > > I just uploaded to CVS a rewritten version of Tests/run_tests.py. This new version automatically detects whether a test is a unittest-style test or a print-and-compare test. By doing so, the unittest-style tests no longer need to have a file containing the test output in Tests/output. For users, run_tests.py works essentially the same as before. ok: - it seems it doesn't support doctest yet. - how this run_tests script is supposed to be called? Can you add this information in run_tests's docstring? 
If I run it from the biopython main directory (python Tests/run_tests.py) it gives me an error on test_AlignAce, but if I run it from within the Tests directory, it retunrs me an import error on test_CAPS. - some tests have some docstring associated. It would be more useful if, along with the name, you print these docs. For example, instead of: test_ACE ... ok It would be nice to see: test_ACE (tests the ACE module for .... which does ...) .... ok again, nose does this already. - while you are at it, it would be nice to be able to define some global fixtures for all tests. Something like setup_BioSQL ran only once and with a warning that it has been created. nose already does that by using the @classmethod syntax - it's not very intuitive at first but it works. There is something that has never been clear to me about biopython's doctest. Are they supposed to be ran by the developers only, or by the users who install biopython manually? Some of the test seems to be written to check whether biopython can run on the user's computer correctly, others are tests on the code. > > As changing the test framework is tricky business, I'd like to ask you to be careful with the Biopython tests, in particular to make sure that there are no bugs in the testing framework that would let test failures go unnoticed. If no problems show up in the next few weeks, we can start removing the output files of unittest-style tests from Biopython, as they're no longer needed. > > --Michiel. 
> > > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > -- My blog on bioinformatics (now in English): http://bioinfoblog.it From biopython at maubp.freeserve.co.uk Mon Feb 2 10:29:13 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 2 Feb 2009 10:29:13 +0000 Subject: [Biopython-dev] run_tests.py rewrite In-Reply-To: <5aa3b3570902020203y21f37037vbae65cc17c7ca563@mail.gmail.com> References: <104713.36194.qm@web62407.mail.re1.yahoo.com> <5aa3b3570902020203y21f37037vbae65cc17c7ca563@mail.gmail.com> Message-ID: <320fb6e00902020229l3fcff3a0r80f46e12c446bf3e@mail.gmail.com> On Mon, Feb 2, 2009 at 10:03 AM, Giovanni Marco Dall'Olio wrote: > On Sun, Feb 1, 2009 at 9:38 AM, Michiel de Hoon wrote: >> Hi everybody, >> >> I just uploaded to CVS a rewritten version of Tests/run_tests.py. This new >> version automatically detects whether a test is a unittest-style test or a >> print-and-compare test. By doing so, the unittest-style tests no longer >> need to have a file containing the test output in Tests/output. For users, >> run_tests.py works essentially the same as before. > > ok: > - it seems it doesn't support doctest yet. I think Michiel has only switched over test_Cluster.py thus far. The doctests are currently run via test_docstrings.py which is still a print-and-compare test for now. > - how this run_tests script is supposed to be called? Can you add this > information in run_tests's docstring? I guess it could be more explicit. > If I run it from the biopython main directory (python > Tests/run_tests.py) it gives me an error on test_AlignAce, but if I > run it from within the Tests directory, it retunrs me an import error > on test_CAPS. Michiel hasn't changed this. 
From the Tests directory do: python run_tests.py Or, from the parent directory (typically between doing build and install): python setup.py test Trying to call run_tests.py from outside the Tests directory is not expected to work. As explained in the docstring for run_tests.py (read the start of the file), if you want to run just some of the tests, you can list them like this: python run_tests.py test_CAPS test_docstrings You can include the py extension here optionally. Could you show us the error with test_CAPS.py please, with details of your setup. This test is working for me. > - some tests have some docstring associated. It would be more useful > if, along with the name, you print these docs. > For example, instead of: > test_ACE ... ok > It would be nice to see: > test_ACE (tests the ACE module for .... which does ...) .... ok > again, nose does this already. I think it would be unnecessary text, of little interest to the typical user. > - while you are at it, it would be nice to be able to define some > global fixtures for all tests. > Something like setup_BioSQL ran only once and with a warning that it > has been created. > nose already does that by using the @classmethod syntax - it's not > very intuitive at first but it works. > > There is something that has never been clear to me about biopython's doctest. > Are they supposed to be ran by the developers only, or by the users > who install biopython manually? Both - developers, and optionally/ideally anyone installing from source. With CVS, they should also work for Windows users who used the installation setup exe, but this requires them to download the source code separately to get the unit tests. > Some of the test seems to be written to check whether biopython can > run on the user's computer correctly, others are tests on the code. In a sense they are all tests on the code - some of the code by its nature is a wrapper for a command line tool, so this may or not be present on the user's machine. 
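As an illustration of the last point, here is one way a unittest-style test for a command line wrapper could report a missing tool as a skip rather than a failure. This is a sketch in present-day Python (`shutil.which` and `unittest.skipUnless` are much newer than the Python versions discussed in this thread) and it is not how run_tests.py handles the situation; the tool name is an arbitrary example.

```python
import shutil
import unittest

TOOL = "clustalw"  # arbitrary example; any external program name would do


class ToolWrapperTest(unittest.TestCase):
    """Hypothetical test case for code that wraps a command line tool."""

    @unittest.skipUnless(shutil.which(TOOL), "%s not found on PATH" % TOOL)
    def test_tool_is_available(self):
        # Only reached when the executable was found on the PATH.
        self.assertTrue(shutil.which(TOOL))


def run_suite():
    """Run the test case and return the unittest result object."""
    suite = unittest.TestLoader().loadTestsFromTestCase(ToolWrapperTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

With this arrangement a machine without the tool reports one skipped test, so a genuine failure in the wrapper code is not confused with an incomplete installation.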
Peter From mjldehoon at yahoo.com Mon Feb 2 10:48:33 2009 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Mon, 2 Feb 2009 02:48:33 -0800 (PST) Subject: [Biopython-dev] run_tests.py rewrite In-Reply-To: <320fb6e00902020229l3fcff3a0r80f46e12c446bf3e@mail.gmail.com> Message-ID: <50228.73742.qm@web62405.mail.re1.yahoo.com> Just for clarification: The only purpose of the run_tests.py rewrite is to remove the requirement of an output file for unittest-based tests. While personally I am in favor of unittest-based tests, it is not my intention to remove support for the print-and-compare tests. I expect that for the most part, the test scripts themselves won't need to be changed. A few test scripts will need to be adjusted; test_Cluster.py was one of them. The main visible result of the new run_tests.py is that we will be able to remove the output files in Tests/output/test_* for the unittest-based tests. As Peter wrote, the doctests are being run via test_docstrings.py, which is picked up by run_tests.py --Michiel. --- On Mon, 2/2/09, Peter wrote: > From: Peter > Subject: Re: [Biopython-dev] run_tests.py rewrite > To: dalloliogm at gmail.com > Cc: mjldehoon at yahoo.com, biopython-dev at biopython.org > Date: Monday, February 2, 2009, 5:29 AM > On Mon, Feb 2, 2009 at 10:03 AM, Giovanni Marco > Dall'Olio > wrote: > > On Sun, Feb 1, 2009 at 9:38 AM, Michiel de Hoon > wrote: > >> Hi everybody, > >> > >> I just uploaded to CVS a rewritten version of > Tests/run_tests.py. This new > >> version automatically detects whether a test is a > unittest-style test or a > >> print-and-compare test. By doing so, the > unittest-style tests no longer > >> need to have a file containing the test output in > Tests/output. For users, > >> run_tests.py works essentially the same as before. > > > > ok: > > - it seems it doesn't support doctest yet. > > I think Michiel has only switched over test_Cluster.py thus > far. 
The > doctests are currently run via test_docstrings.py which is > still a > print-and-compare test for now. > > > - how this run_tests script is supposed to be called? > Can you add this > > information in run_tests's docstring? > > I guess it could be more explicit. > > > If I run it from the biopython main directory (python > > Tests/run_tests.py) it gives me an error on > test_AlignAce, but if I > > run it from within the Tests directory, it retunrs me > an import error > > on test_CAPS. > > Michiel hasn't changed this. From the Tests directory > do: > python run_tests.py > > Or, from the parent directory (typically between doing > build and install): > python setup.py test > > Trying to call run_tests.py from outside the Tests > directory is not > expected to work. > > As explained in the docstring for run_tests.py (read the > start of the > file), if you want to run just some of the tests, you can > list them > like this: > > python run_tests.py test_CAPS test_docstrings > > You can include the py extension here optionally. > > Could you show us the error with test_CAPS.py please, with > details of > your setup. This test is working for me. > > > - some tests have some docstring associated. It would > be more useful > > if, along with the name, you print these docs. > > For example, instead of: > > test_ACE ... ok > > It would be nice to see: > > test_ACE (tests the ACE module for .... which does > ...) .... ok > > again, nose does this already. > > I think it would be unnecessary text, of little interest to > the typical user. > > > - while you are at it, it would be nice to be able to > define some > > global fixtures for all tests. > > Something like setup_BioSQL ran only once and with a > warning that it > > has been created. > > nose already does that by using the @classmethod > syntax - it's not > > very intuitive at first but it works. > > > > There is something that has never been clear to me > about biopython's doctest. 
> > Are they supposed to be ran by the developers only, or > by the users > > who install biopython manually? > > Both - developers, and optionally/ideally anyone installing > from source. > > With CVS, they should also work for Windows users who used > the > installation setup exe, but this requires them to download > the source > code separately to get the unit tests. > > > Some of the test seems to be written to check whether > biopython can > > run on the user's computer correctly, others are > tests on the code. > > In a sense they are all tests on the code - some of the > code by its > nature is a wrapper for a command line tool, so this may or > not be > present on the user's machine. > > Peter From biopython at maubp.freeserve.co.uk Mon Feb 2 11:16:30 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 2 Feb 2009 11:16:30 +0000 Subject: [Biopython-dev] run_tests.py rewrite In-Reply-To: <50228.73742.qm@web62405.mail.re1.yahoo.com> References: <320fb6e00902020229l3fcff3a0r80f46e12c446bf3e@mail.gmail.com> <50228.73742.qm@web62405.mail.re1.yahoo.com> Message-ID: <320fb6e00902020316tfa20931r6fe39444cc958adc@mail.gmail.com> On Mon, Feb 2, 2009 at 10:48 AM, Michiel de Hoon wrote: > > Just for clarification: > The only purpose of the run_tests.py rewrite is to remove the requirement > of an output file for unittest-based tests. While personally I am in favor of > unittest-based tests, it is not my intention to remove support for the > print-and-compare tests. I expect that for the most part, the test scripts > themselves won't need to be changed. A few test scripts will need to be > adjusted; test_Cluster.py was one of them. The main visible result of the > new run_tests.py is that we will be able to remove the output files in > Tests/output/test_* for the unittest-based tests. I've found something that will need changing. 
Consider the following output (based on what run_tests.py is now doing; this was tested on Mac OS X, Python 2.5.2):

>>> import unittest
>>> unittest.TestLoader().loadTestsFromModule(__import__("test_Cluster")).countTestCases()
7
>>> unittest.TestLoader().loadTestsFromModule(__import__("test_Motif")).countTestCases()
0
>>> unittest.TestLoader().loadTestsFromModule(__import__("test_Phd")).countTestCases()
0

We need to override testMethodPrefix to "t" instead of the default of "test" in order to detect these (and others like them).

>>> test_loader = unittest.TestLoader()
>>> test_loader.testMethodPrefix="t"
>>> test_loader.loadTestsFromModule(__import__("test_Phd")).countTestCases()
2
>>> test_loader.loadTestsFromModule(__import__("test_Motif")).countTestCases()
8

We could just have run_tests.py check using either prefix, or we can standardise on one. I think we have more unit tests using the "t" prefix than the "test" prefix - so it would be simpler to standardise on using "t_*", although on the other hand, using "test_*" fits with the default. Which do you prefer Michiel?

Peter

From n.j.loman at bham.ac.uk Mon Feb 2 11:54:50 2009 From: n.j.loman at bham.ac.uk (Nick Loman) Date: Mon, 02 Feb 2009 11:54:50 +0000 Subject: [Biopython-dev] Problems importing GenBank Files with complex LOCATION tags Message-ID: <4986DF0A.1040103@bham.ac.uk>

Hi there,

I'm attempting to import the whole of RefSeq into a BioSQL schema using the BioPython loader. However, I am encountering problems with items in the CON division, such as NW_002063152. I am using stock Biopython 1.49 install.
The problem occurs when parsing complex CONTIG location tags, such as the following (spacing adjusted for readability): CONTIG join(NZ_ABJI01000250.1:1..6235,gap(unk100), NZ_ABJI01000251.1:1..2827,gap(1420),NZ_ABJI01000252.1:1..1802, gap(unk100),NZ_ABJI01000253.1:1..2460,gap(unk100), NZ_ABJI01000254.1:1..12092,gap(639),NZ_ABJI01000255.1:1..1192, gap(unk100),NZ_ABJI01000256.1:1..5498,gap(unk100), NZ_ABJI01000257.1:1..20442,gap(unk100),NZ_ABJI01000258.1:1..2364, gap(511),NZ_ABJI01000259.1:1..17405,gap(unk100), NZ_ABJI01000260.1:1..2462,gap(570),NZ_ABJI01000261.1:1..3348, gap(410),NZ_ABJI01000262.1:1..815,gap(196), NZ_ABJI01000263.1:1..589) I have worked around the problem by rewriting during my import to produce a blank ORIGIN definition, which at least gets the sequence features imported. I realise complex location parsing has been discussed before on this list - would the authors expect this to parse correctly, or is it out of the scope of the current code? Best regards, Nick. From bsouthey at gmail.com Mon Feb 2 14:39:03 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Mon, 02 Feb 2009 08:39:03 -0600 Subject: [Biopython-dev] Problems importing GenBank Files with complex LOCATION tags In-Reply-To: <4986DF0A.1040103@bham.ac.uk> References: <4986DF0A.1040103@bham.ac.uk> Message-ID: <49870587.2080009@gmail.com> Hi, I guess this pertains to Bugs 2681 and 2745. Please see Peter's comments and suggested patch to Bug 2745. http://bugzilla.open-bio.org/show_bug.cgi?id=2681 http://bugzilla.open-bio.org/show_bug.cgi?id=2745 Any comments or thoughts on these would be appreciated! Thanks Bruce Nick Loman wrote: > Hi there, > > I'm attempting to import the whole of RefSeq into a BioSQL schema > using the BioPython loader. However, I am encountering problems with > items in the CON division, such as NW_002063152. I am using stock > Biopython 1.49 install. 
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev

From bugzilla-daemon at portal.open-bio.org Mon Feb 2 16:53:28 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 2 Feb 2009 11:53:28 -0500
Subject: [Biopython-dev] [Bug 2745] Bio.GenBank.LocationParserError with a GenBank CON file
In-Reply-To:
Message-ID: <200902021653.n12GrS1a028869@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2745

------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-02 11:53 EST -------
(In reply to comment #1)
> Created an attachment (id=1213) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1213&action=view) [details]
> Example of a single GenBank CON record that fails

For interest, and as a possible work around, note that you can download this GenBank file from Entrez WITH the sequence. First of all, try this:

>>> from Bio import Entrez
>>> Entrez.email = "A.N.Other at example.com"
>>> data = Entrez.efetch("nucleotide",id="FA000001",rettype="genbank",retmode="text").read()
>>> out_handle = open("FA000001.gbk","w")
>>> out_handle.write(data)
>>> out_handle.close()

This gives the CONTIG line without the actual nucleotides (as in Bruce's attachment, which I assume came from the NCBI's FTP site).

However, from reading the Entrez documentation, we can get the nucleotides too by asking for "gbwithparts" instead of "gb" (or its equivalent, "genbank"). See
http://www.ncbi.nlm.nih.gov/entrez/query/static/efetchseq_help.html#SequenceDatabases

i.e.

>>> data = Entrez.efetch("nucleotide",id="FA000001",rettype="gbwithparts",retmode="text").read()
>>> out_handle = open("FA000001.gbwithparts.gbk","w")
>>> out_handle.write(data)
>>> out_handle.close()

I was getting some "Service unavailable!"
or proxy errors earlier (which Bio.Entrez wasn't catching - I've updated it in CVS), but this does work giving a 12.8 MB file with the full sequence (with plenty of sections with an N). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Feb 3 10:03:04 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 3 Feb 2009 05:03:04 -0500 Subject: [Biopython-dev] [Bug 2748] New: test_GAQueens's documentation refers to an unknown script 'place_queens.py' Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2748 Summary: test_GAQueens's documentation refers to an unknown script 'place_queens.py' Product: Biopython Version: Not Applicable Platform: PC OS/Version: Linux Status: NEW Severity: trivial Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: dalloliogm at gmail.com The test_GAQueens docstring refers to a script called 'place_queens.py', and it is not clear what it is: 12 python place_queens.py -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From dalloliogm at gmail.com Tue Feb 3 10:18:01 2009
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Tue, 3 Feb 2009 11:18:01 +0100
Subject: [Biopython-dev] run_tests.py rewrite
In-Reply-To: <320fb6e00902020229l3fcff3a0r80f46e12c446bf3e@mail.gmail.com>
References: <104713.36194.qm@web62407.mail.re1.yahoo.com> <5aa3b3570902020203y21f37037vbae65cc17c7ca563@mail.gmail.com> <320fb6e00902020229l3fcff3a0r80f46e12c446bf3e@mail.gmail.com>
Message-ID: <5aa3b3570902030218x78df37ddic6488638a7937712@mail.gmail.com>

On Mon, Feb 2, 2009 at 11:29 AM, Peter wrote:
> On Mon, Feb 2, 2009 at 10:03 AM, Giovanni Marco Dall'Olio
> wrote:
>> On Sun, Feb 1, 2009 at 9:38 AM, Michiel de Hoon wrote:
>>> Hi everybody,
>>>
>>> I just uploaded to CVS a rewritten version of Tests/run_tests.py. This new
>>> version automatically detects whether a test is a unittest-style test or a
>>> print-and-compare test. By doing so, the unittest-style tests no longer
>>> need to have a file containing the test output in Tests/output. For users,
>>> run_tests.py works essentially the same as before.
>>
>> ok:
>> - it seems it doesn't support doctest yet.
>
> I think Michiel has only switched over test_Cluster.py thus far. The
> doctests are currently run via test_docstrings.py which is still a
> print-and-compare test for now.

ah! I see.
However, this way, test_docstring will be difficult to maintain in the future.
A better solution would be to have run_tests.py go through all biopython's modules, and then execute every doctest it encounters.
You can do this with doctest.DocTestFinder (have a look at nose's code, which does it already:
- http://code.google.com/p/python-nose/source/browse/trunk/nose/plugins/doctests.py)

> Could you show us the error with test_CAPS.py please, with details of
> your setup. This test is working for me.

sorry.. it works fine if I run it from within the Tests dir.

>> - some tests have some docstring associated.
It would be more useful >> if, along with the name, you print these docs. >> For example, instead of: >> test_ACE ... ok >> It would be nice to see: >> test_ACE (tests the ACE module for .... which does ...) .... ok >> again, nose does this already. > > I think it would be unnecessary text, of little interest to the typical user. It would be useful to make sure that every test is documented. Most of the tests in biopython are not: for example, can you tell which is the difference between test_Fasta.py and test_Fasta2.py? Moreover, why the typical user should be running biopython's tests? >> - while you are at it, it would be nice to be able to define some >> global fixtures for all tests. >> Something like setup_BioSQL ran only once and with a warning that it >> has been created. >> nose already does that by using the @classmethod syntax - it's not >> very intuitive at first but it works. What about having support to global fixtures? For example, many test scripts begin in the same way: they 'import numpy', check for python's version, etc.. All of this could be moved to a global fixture and then executed only once for all the tests. All the Bio.Seq files could open the sequence files only once, therefore it will be easier to write more complex tests. The Bio.BioSQL modules could create a database only once, reducing memory usage. 
> Peter > -- My blog on bioinformatics (now in English): http://bioinfoblog.it From bugzilla-daemon at portal.open-bio.org Tue Feb 3 10:30:50 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 3 Feb 2009 05:30:50 -0500 Subject: [Biopython-dev] [Bug 2748] test_GAQueens's documentation refers to an unknown script 'place_queens.py' In-Reply-To: Message-ID: <200902031030.n13AUolG010593@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2748 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED Version|Not Applicable |1.49 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-03 05:30 EST ------- Fixed in CVS (replacing references to place_queens.py with test_GAQueens.py). I believe test_GAQueens.py used to be called place_queens.py before it was re-used as a test case. Note that when it is run from the test suite, 5 queens are used. Thanks Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Tue Feb 3 10:35:09 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 3 Feb 2009 10:35:09 +0000 Subject: [Biopython-dev] run_tests.py rewrite In-Reply-To: <5aa3b3570902030218x78df37ddic6488638a7937712@mail.gmail.com> References: <104713.36194.qm@web62407.mail.re1.yahoo.com> <5aa3b3570902020203y21f37037vbae65cc17c7ca563@mail.gmail.com> <320fb6e00902020229l3fcff3a0r80f46e12c446bf3e@mail.gmail.com> <5aa3b3570902030218x78df37ddic6488638a7937712@mail.gmail.com> Message-ID: <320fb6e00902030235r302ec193g5975d20e824f352e@mail.gmail.com> >> I think Michiel has only switched over test_Cluster.py thus far. 
The
>> doctests are currently run via test_docstrings.py which is still a
>> print-and-compare test for now.
>
> ah! I see.

I was wrong - as Michiel clarified in a later comment, run_tests.py should have been finding all the unittest based tests (but right now it isn't). As in my earlier email, some of our unittest cases use a prefix of "t" and others use "test", meaning only some of the unittest test cases are currently being detected. Once this is fixed, then test_docstring should work too.

>> Could you show us the error with test_CAPS.py please, with details of
>> your setup. This test is working for me.
>
> sorry.. it works fine if I run it from within the Tests dir.

Good. Thanks.

Peter

From mjldehoon at yahoo.com Tue Feb 3 11:38:06 2009
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Tue, 3 Feb 2009 03:38:06 -0800 (PST)
Subject: [Biopython-dev] run_tests.py rewrite
In-Reply-To: <320fb6e00902020316tfa20931r6fe39444cc958adc@mail.gmail.com>
Message-ID: <362230.6662.qm@web62401.mail.re1.yahoo.com>

Good catch.

> We need to override testMethodPrefix to "t"
> instead of the default of "test" in order
> to detect these (and others like them).
...
> We could just have run_tests.py check using either prefix,
> or we can standardise on one. I think we have more unit
> tests using the "t" prefix than the "test" prefix - so it
> would be simpler to standardise on using "t_*", although
> on the other hand, using "test_*" fits with
> the default. Which do you prefer Michiel?

I prefer sticking to the default ... changing the method names from t_* to test_* needs to be done only once, whereas if we continue to use t_* we'll have to remind ourselves of that for all future tests that will be written. So I've changed all the t_* method names to test_*. These tests should run now. Thanks again for noticing this bug.

--Michiel.
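
For reference, the "check using either prefix" option discussed above could look roughly like the sketch below. It is only an illustration, not what run_tests.py does: MixedPrefixTests and load_with_prefixes are made-up names, and the prefix "t_" is used rather than a bare "t" because a bare "t" would also match inherited callables such as tearDown.

```python
import unittest


class MixedPrefixTests(unittest.TestCase):
    """Hypothetical test case mixing both naming conventions."""

    def test_default_prefix(self):
        self.assertEqual(1 + 1, 2)

    def t_short_prefix(self):
        self.assertEqual(2 * 2, 4)


def load_with_prefixes(test_case, prefixes=("test", "t_")):
    """Collect the methods matching any of the given prefixes into one suite.

    "t_" is chosen instead of "t" so that inherited names like
    tearDown are not mistaken for tests.
    """
    suite = unittest.TestSuite()
    for prefix in prefixes:
        loader = unittest.TestLoader()
        loader.testMethodPrefix = prefix
        suite.addTests(loader.loadTestsFromTestCase(test_case))
    return suite


# The default loader only sees the "test"-prefixed method.
default_only = unittest.TestLoader().loadTestsFromTestCase(MixedPrefixTests)
# Looping over both prefixes picks up both methods.
suite = load_with_prefixes(MixedPrefixTests)
```

Renaming the methods once, as done above, avoids needing this kind of loop at all.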
From dalloliogm at gmail.com Tue Feb 3 11:46:54 2009 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Tue, 3 Feb 2009 12:46:54 +0100 Subject: [Biopython-dev] run_tests.py rewrite In-Reply-To: <320fb6e00902030235r302ec193g5975d20e824f352e@mail.gmail.com> References: <104713.36194.qm@web62407.mail.re1.yahoo.com> <5aa3b3570902020203y21f37037vbae65cc17c7ca563@mail.gmail.com> <320fb6e00902020229l3fcff3a0r80f46e12c446bf3e@mail.gmail.com> <5aa3b3570902030218x78df37ddic6488638a7937712@mail.gmail.com> <320fb6e00902030235r302ec193g5975d20e824f352e@mail.gmail.com> Message-ID: <5aa3b3570902030346r69d1677djc22550a1cc68b7c8@mail.gmail.com> ufff I am sorry but the more I think about it, the more it seems a nonsense to me.. Why are you writing a new test-discovery framework for biopython, when there are many already available that work fine and better? Isn't it a waste of time, really? I am not criticizing you - but speaking from a purely technical point of view, I really don't understand. If you are worried that using nose will add a new prerequisite to biopython (which is not true, by the way), you can easily include the nose executable within the test dir, as I think many other projects already do; Honestly, I have the feeling that you didn't even had a look at all the links I posted in the old discussion on nose, neither you have tried it, and that's so bad. You didn't discuss about the pros or cons of nose, you just kept saying 'it would add a prerequisite to biopython' (which is not true, again), and started writing your own new test discovery framework. With nose, you could have a good testing infrastructure and take advantage of things like global fixtures, automatic formatting of the output, integration with profilers, and a lot of things more. 
It seems a nonsense to me, because with biopython you provide source code that you make available to all the bioinformaticians, with the idea that reuse of the code is good; but then, you don't want to use the code written by someone else. I have seen many bioinformatician telling me that they don't use biopython because they don't have the time to study it and they don't know how it works. I really believe that this is terrible, making the whole bioinformatics field a mess. Cheers :) On Tue, Feb 3, 2009 at 11:35 AM, Peter wrote: >>> I think Michiel has only switched over test_Cluster.py thus far. The >>> doctests are currently run via test_docstrings.py which is still a >>> print-and-compare test for now. >> >> ah! I see. > > I was wrong - as Michiel clarified in a later comment, run_tests.py > should have been finding all the unittest based tests (but right now > it isn't). As in my earlier email, some of our unittest cases use a > prefix of "t" and others use "test" meaning only some of the unittest > test cases are currently being detected. One this is fixed, then > test_docstring should work too. > >>> Could you show us the error with test_CAPS.py please, with details of >>> your setup. This test is working for me. >> >> sorry.. it works fine if I run it from within the Tests dir. > > Good. Thanks. 
> > Peter > -- My blog on bioinformatics (now in English): http://bioinfoblog.it From biopython at maubp.freeserve.co.uk Tue Feb 3 11:55:40 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 3 Feb 2009 11:55:40 +0000 Subject: [Biopython-dev] run_tests.py rewrite In-Reply-To: <5aa3b3570902030346r69d1677djc22550a1cc68b7c8@mail.gmail.com> References: <104713.36194.qm@web62407.mail.re1.yahoo.com> <5aa3b3570902020203y21f37037vbae65cc17c7ca563@mail.gmail.com> <320fb6e00902020229l3fcff3a0r80f46e12c446bf3e@mail.gmail.com> <5aa3b3570902030218x78df37ddic6488638a7937712@mail.gmail.com> <320fb6e00902030235r302ec193g5975d20e824f352e@mail.gmail.com> <5aa3b3570902030346r69d1677djc22550a1cc68b7c8@mail.gmail.com> Message-ID: <320fb6e00902030355r373be6d2taf6d1926f2f83757@mail.gmail.com> On Tue, Feb 3, 2009 at 11:46 AM, Giovanni Marco Dall'Olio wrote: > > ufff I am sorry but the more I think about it, the more it seems a > nonsense to me.. > Why are you writing a new test-discovery framework for biopython, when > there are many already available that work fine and better? > Isn't it a waste of time, really? I am not criticizing you - but > speaking from a purely technical point of view, I really don't > understand. We're NOT writing a new test-discovery framework - in this recent change we're reusing part of the existing unittest framework included with python. > If you are worried that using nose will add a new prerequisite to > biopython (which is not true, by the way), you can easily include the > nose executable within the test dir, as I think many other projects > already do; Using nose would be another prerequisite for anyone running the tests (although as you point out, it may be possible to include it with Biopython). > Honestly, I have the feeling that you didn't even had a look at all > the links I posted in the old discussion on nose, neither you have > tried it, and that's so bad. 
You didn't discuss about the pros or cons > of nose, you just kept saying 'it would add a prerequisite to > biopython' (which is not true, again), and started writing your own > new test discovery framework. We didn't just start writing our own framework (which I agree would be a waste of time). We already had a simple framework, and with Michiel's recent changes it make more use of the python unittest infrastructure. Peter From mjldehoon at yahoo.com Tue Feb 3 11:52:05 2009 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Tue, 3 Feb 2009 03:52:05 -0800 (PST) Subject: [Biopython-dev] run_tests.py rewrite In-Reply-To: <5aa3b3570902030218x78df37ddic6488638a7937712@mail.gmail.com> Message-ID: <752614.16176.qm@web62402.mail.re1.yahoo.com> > However, this way, test_docstring will be difficult to > mantain in the future. > A better solution would be to have run_test.py go through > all biopython's modules, and then execute every doctest it > encounters. > You can do this with doctest.DocTestFinder (have a look at > nose's code, which does it already: Can doctest.DocTestFinder handle missing external dependencies? For example, if a user installed Biopython without NumPy, then the NumPy-dependent modules should be skipped and not flagged as errors. > Moreover, why the typical user should be running > biopython's tests? To make sure that it works. Biopython interacts with and therefore depends more on 3rd party software, web servers, and file formats than most other Python modules. Things are more likely to break than for example for a more self-contained library such as NumPy. I always run the Biopython tests, and I would advise every user to do so too. In addition, the tests can function as example scripts showing how to use Biopython. It is important that all users can run those scripts. > What about having support to global fixtures? > For example, many test scripts begin in the same way: they > 'import > numpy', check for python's version, etc.. 
All of > this could be moved > to a global fixture and then executed only once for all the > tests. Hmm... currently the Biopython tests can be written essentially independently of each other, without knowing much about the testing overall framework. I think that that makes it easier for new users/developers to add tests. I think we should avoid the situation that somebody first has to study Biopython's testing framework to be able to add a test. --Michiel. From mjldehoon at yahoo.com Tue Feb 3 12:21:05 2009 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Tue, 3 Feb 2009 04:21:05 -0800 (PST) Subject: [Biopython-dev] test_Ace, test_Nexus, test_Phd Message-ID: <240911.28388.qm@web62402.mail.re1.yahoo.com> These three tests currently are written as a combination of a unittest-based test and a print-and-compare test. That is, they contain classes deriving from unittest.TestCase, but then print out stuff that should get compared to the output file. However, run_tests.py assumes that they are true unittest-style tests, so the comparison is never done. Does anybody mind if I convert these three to pure print-and-compare or pure unittest-style tests? test_Ace.py and test_Nexus.py produce lots of output, so I'm tempted to go with a print-and-compare test there; test_Phd.py might work well as a unittest-style test. --Michiel. From mjldehoon at yahoo.com Tue Feb 3 12:26:02 2009 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Tue, 3 Feb 2009 04:26:02 -0800 (PST) Subject: [Biopython-dev] run_tests.py rewrite In-Reply-To: <5aa3b3570902030346r69d1677djc22550a1cc68b7c8@mail.gmail.com> Message-ID: <425937.30005.qm@web62402.mail.re1.yahoo.com> Maybe it was a mistake to call this a rewrite ... basically all I'm doing is making some changes in run_tests.py so that it will distinguish between unittest-style tests and print-and-compare tests, and cleaning up some code while I'm at it. 
This will allow us to remove the trivial output files for the unittest-style tests, which were a real annoyance because they had to be updated whenever a new test was added to an existing test script. And since the output files did not contain any real information, people tended to forget that. Maybe nose can do the same as unittest, but unittest comes with Python and nose does not, so as long as unittest does the job, I see no reason to change to nose. --Michiel. --- On Tue, 2/3/09, Giovanni Marco Dall'Olio wrote: > From: Giovanni Marco Dall'Olio > Subject: Re: [Biopython-dev] run_tests.py rewrite > To: "Peter" > Cc: biopython-dev at biopython.org > Date: Tuesday, February 3, 2009, 6:46 AM > ufff I am sorry but the more I think about it, the more it > seems a > nonsense to me.. > Why are you writing a new test-discovery framework for > biopython, when > there are many already available that work fine and better? > Isn't it a waste of time, really? I am not criticizing > you - but > speaking from a purely technical point of view, I really > don't > understand. > > If you are worried that using nose will add a new > prerequisite to > biopython (which is not true, by the way), you can easily > include the > nose executable within the test dir, as I think many other > projects > already do; > > Honestly, I have the feeling that you didn't even had a > look at all > the links I posted in the old discussion on nose, neither > you have > tried it, and that's so bad. You didn't discuss > about the pros or cons > of nose, you just kept saying 'it would add a > prerequisite to > biopython' (which is not true, again), and started > writing your own > new test discovery framework. > With nose, you could have a good testing infrastructure and > take > advantage of things like global fixtures, automatic > formatting of the > output, integration with profilers, and a lot of things > more. 
> > > > Peter
> > > >
> > -- 
> > My blog on bioinformatics (now in English):
> http://bioinfoblog.it
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev

From biopython at maubp.freeserve.co.uk Tue Feb 3 13:12:51 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 3 Feb 2009 13:12:51 +0000
Subject: [Biopython-dev] run_tests.py rewrite
In-Reply-To: <425937.30005.qm@web62402.mail.re1.yahoo.com>
References: <5aa3b3570902030346r69d1677djc22550a1cc68b7c8@mail.gmail.com> <425937.30005.qm@web62402.mail.re1.yahoo.com>
Message-ID: <320fb6e00902030512y36a7118ke81a8a890caf1437@mail.gmail.com>

On Tue, Feb 3, 2009 at 12:26 PM, Michiel de Hoon wrote:
> Maybe it was a mistake to call this a rewrite ...

With hindsight, it did give the impression of something bigger happening. Oh well.

> ... basically all I'm doing is making some changes in run_tests.py so that it will
> distinguish between unittest-style tests and print-and-compare tests, and
> cleaning up some code while I'm at it.

In terms of cleaning up the code, something we can probably now remove from the print-and-compare handler is the special case of modules called via a run_tests method. I'd like to suggest removing this bit (lines 167 to 171 at the moment):

    try:
        cur_test.run_tests([])
    except AttributeError:
        pass

[As an aside, using a hasattr(module,"run_tests") would be safer in case the test itself raised an AttributeError. If we remove this code it doesn't matter.]

Currently I think only test_GAQueens.py requires this "magic" which can be solved by making it explicitly default to running with five queens. Right now it is not at all clear from looking at this example how this default happens if run via run_tests.py but not when running test_GAQueens.py on its own.
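
To illustrate the hasattr aside, here is a small self-contained sketch. The helper names (call_run_tests_unsafe, call_run_tests, make_module, broken_run_tests) are invented for the example and are not taken from run_tests.py; the point is only that the try/except form hides an AttributeError raised inside the test, while the hasattr form lets it surface.

```python
import types


def call_run_tests_unsafe(module):
    """Pattern from the snippet above: any AttributeError is swallowed."""
    try:
        module.run_tests([])
    except AttributeError:
        # Also hides AttributeErrors raised *inside* run_tests.
        pass


def call_run_tests(module):
    """Only call run_tests if the module defines it; real errors propagate."""
    if hasattr(module, "run_tests"):
        module.run_tests([])


def make_module(name, run_tests=None):
    """Build a throwaway module object, optionally with a run_tests function."""
    module = types.ModuleType(name)
    if run_tests is not None:
        module.run_tests = run_tests
    return module


def broken_run_tests(argv):
    # Simulates a genuine bug inside the test code itself.
    raise AttributeError("bug inside the test")


# The unsafe variant silently ignores the bug.
call_run_tests_unsafe(make_module("test_broken", broken_run_tests))

# The hasattr variant lets the bug be reported.
try:
    call_run_tests(make_module("test_broken", broken_run_tests))
    bug_reported = False
except AttributeError:
    bug_reported = True
```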
The only other print-and-compare module I found with a run_tests function is test_NNExclusiveOr.py but here it makes no difference as the same code gets called via the __main__ trick. Peter From biopython at maubp.freeserve.co.uk Tue Feb 3 13:26:15 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 3 Feb 2009 13:26:15 +0000 Subject: [Biopython-dev] test_Ace, test_Nexus, test_Phd In-Reply-To: <240911.28388.qm@web62402.mail.re1.yahoo.com> References: <240911.28388.qm@web62402.mail.re1.yahoo.com> Message-ID: <320fb6e00902030526i6c77c327ue346a4a14c545c93@mail.gmail.com> On Tue, Feb 3, 2009 at 12:21 PM, Michiel de Hoon wrote: > These three tests currently are written as a combination of a unittest-based > test and a print-and-compare test. That is, they contain classes deriving from > unittest.TestCase, but then print out stuff that should get compared to the > output file. However, run_tests.py assumes that they are true unittest-style > tests, so the comparison is never done. That makes sense - its good there are only three of them! > Does anybody mind if I convert these three to pure print-and-compare or > pure unittest-style tests? test_Ace.py and test_Nexus.py produce lots of > output, so I'm tempted to go with a print-and-compare test there; > test_Phd.py might work well as a unittest-style test. That sounds sensible - unless Frank or Cymon want to help out carry on. [I've recently fixed a couple of tear-down problems in test_PopGen_FDist.py and test_PopGen_SimCoal_nodepend.py to do with trying to remove files/directories which may not have been created if the test failed.] 
Peter From biopython at maubp.freeserve.co.uk Tue Feb 3 14:02:25 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 3 Feb 2009 14:02:25 +0000 Subject: [Biopython-dev] run_tests.py rewrite In-Reply-To: <320fb6e00902030512y36a7118ke81a8a890caf1437@mail.gmail.com> References: <5aa3b3570902030346r69d1677djc22550a1cc68b7c8@mail.gmail.com> <425937.30005.qm@web62402.mail.re1.yahoo.com> <320fb6e00902030512y36a7118ke81a8a890caf1437@mail.gmail.com> Message-ID: <320fb6e00902030602p3afd8a82scd5ed5adffda65eb@mail.gmail.com> Michiel, I've noticed that for print-and-compare tests we can get unexpected errors from the line: module = __import__(name) For example, if there is an IOError in test_SeqIO_online.py this does not get caught - we only try to catch a MissingExternalDependencyError. Perhaps we should also catch any generic exception and report that test as a failure. Otherwise, the run_test.py file terminates prematurely. Would you like to look into this, or should I? Peter From bsouthey at gmail.com Tue Feb 3 15:15:37 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Tue, 03 Feb 2009 09:15:37 -0600 Subject: [Biopython-dev] run_tests.py rewrite In-Reply-To: <320fb6e00902030355r373be6d2taf6d1926f2f83757@mail.gmail.com> References: <104713.36194.qm@web62407.mail.re1.yahoo.com> <5aa3b3570902020203y21f37037vbae65cc17c7ca563@mail.gmail.com> <320fb6e00902020229l3fcff3a0r80f46e12c446bf3e@mail.gmail.com> <5aa3b3570902030218x78df37ddic6488638a7937712@mail.gmail.com> <320fb6e00902030235r302ec193g5975d20e824f352e@mail.gmail.com> <5aa3b3570902030346r69d1677djc22550a1cc68b7c8@mail.gmail.com> <320fb6e00902030355r373be6d2taf6d1926f2f83757@mail.gmail.com> Message-ID: <49885F99.30701@gmail.com> Hi, I do get your point and I do agree with it. In part I see this as a necessary step to clean up the current tests that would permit a smoother change to a different testing framework if or when necessary. It is a different question whether or not to do such a change. 
Technically nose is required by Numpy 1.2+ (but only for testing) so it is not really an extra dependency on Biopython (unless Biopython is split into two components - with and without Numpy). But I do not see an real advantage for a new testing framework in the current code base without a major effect to change everything at once (I would at least act as a tester for any new framework). Perhaps it would make better sense to do that when porting Biopython to Python 3 because the tests will need to be examined and perhaps rewritten. Bruce Peter wrote: > On Tue, Feb 3, 2009 at 11:46 AM, Giovanni Marco Dall'Olio > wrote: > >> ufff I am sorry but the more I think about it, the more it seems a >> nonsense to me.. >> Why are you writing a new test-discovery framework for biopython, when >> there are many already available that work fine and better? >> Isn't it a waste of time, really? I am not criticizing you - but >> speaking from a purely technical point of view, I really don't >> understand. >> > > We're NOT writing a new test-discovery framework - in this recent > change we're reusing part of the existing unittest framework included > with python. > > >> If you are worried that using nose will add a new prerequisite to >> biopython (which is not true, by the way), you can easily include the >> nose executable within the test dir, as I think many other projects >> already do; >> > > Using nose would be another prerequisite for anyone running the tests > (although as you point out, it may be possible to include it with > Biopython). > > >> Honestly, I have the feeling that you didn't even had a look at all >> the links I posted in the old discussion on nose, neither you have >> tried it, and that's so bad. You didn't discuss about the pros or cons >> of nose, you just kept saying 'it would add a prerequisite to >> biopython' (which is not true, again), and started writing your own >> new test discovery framework. 
>> > > We didn't just start writing our own framework (which I agree would be > a waste of time). We already had a simple framework, and with > Michiel's recent changes it make more use of the python unittest > infrastructure. > > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From dalloliogm at gmail.com Wed Feb 4 00:13:43 2009 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Wed, 4 Feb 2009 01:13:43 +0100 Subject: [Biopython-dev] run_tests.py rewrite In-Reply-To: <425937.30005.qm@web62402.mail.re1.yahoo.com> References: <5aa3b3570902030346r69d1677djc22550a1cc68b7c8@mail.gmail.com> <425937.30005.qm@web62402.mail.re1.yahoo.com> Message-ID: <5aa3b3570902031613l6d69908w72331a532ca3c095@mail.gmail.com> On 2/3/09, Michiel de Hoon wrote: > Maybe it was a mistake to call this a rewrite ... basically all I'm doing is making some changes in run_tests.py so that it will distinguish between unittest-style tests and print-and-compare tests, and cleaning up some code while I'm at it. This will allow us to remove the trivial output files for the unittest-style tests, which were a real annoyance because they had to be updated whenever a new test was added to an existing test script. And since the output files did not contain any real information, people tended to forget that. > Maybe nose can do the same as unittest, but unittest comes with Python and nose does not, so as long as unittest does the job, I see no reason to change to nose. uff no!! :) nose is not a library, neither it is a substitute for unittest. it is a tool that you run from the command line and does exactly what you are doing with your run_tests.py: it finds and discover any function resembling a test (including unittests) and execute them. Only, it does it very well, since it is developed by many people and it is more than one year old. 
-> http://code.google.com/p/python-nose/wiki/NosetestsUsage > > --Michiel. > > > > > > > > > > -- > > > > My blog on bioinformatics (now in English): > > http://bioinfoblog.it > > > _______________________________________________ > > Biopython-dev mailing list > > Biopython-dev at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biopython-dev > > > > -- My blog on bioinformatics (now in English): http://bioinfoblog.it From dalloliogm at gmail.com Wed Feb 4 00:35:07 2009 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Wed, 4 Feb 2009 01:35:07 +0100 Subject: [Biopython-dev] run_tests.py rewrite In-Reply-To: <752614.16176.qm@web62402.mail.re1.yahoo.com> References: <5aa3b3570902030218x78df37ddic6488638a7937712@mail.gmail.com> <752614.16176.qm@web62402.mail.re1.yahoo.com> Message-ID: <5aa3b3570902031635n357edc4bx77d3bf6094340532@mail.gmail.com> On 2/3/09, Michiel de Hoon wrote: > > However, this way, test_docstring will be difficult to > > mantain in the future. > > A better solution would be to have run_test.py go through > > > all biopython's modules, and then execute every doctest it > > encounters. > > You can do this with doctest.DocTestFinder (have a look at > > nose's code, which does it already: > > > Can doctest.DocTestFinder handle missing external dependencies? For example, if a user installed Biopython without NumPy, then the NumPy-dependent modules should be skipped and not flagged as errors. mmm no idea, sorry :( > > > Moreover, why the typical user should be running > > biopython's tests? > > > To make sure that it works. Biopython interacts with and therefore depends more on 3rd party software, web servers, and file formats than most other Python modules. Things are more likely to break than for example for a more self-contained library such as NumPy. I always run the Biopython tests, and I would advise every user to do so too. In addition, the tests can function as example scripts showing how to use Biopython. 
It is important that all users can run those scripts. I think that all the tests which check if biopython can run correctly on a computer should be separated from all the others. Why do I have to test whether biopython correctly translates the sequence ACTAGCT to a protein code when I install biopython? It should have been already checked by the developers/volunteers. If I want to install biopython on my computer, I want to run only the tests needed to make it sure it can work fine on my configuration, not all of them. As an example, take pytables, a library to handle HDF5 files with python. The authors claim that they have written more than 10^6 tests for it. However, when you install pytables from source, you don't have to run all of these tests, but only a subset of them, the ones required to check if it can run correctly on your computer. Consider that some of the tests on pytables take hours or days to complete, because they check the handling of big binary files. The idea is that, if we separate the tests on the code from the ones on the configuration, we will be able to enhance the test section of biopython a lot. For example, at the moment there are not many tests to check biopython's behaviour with big sequence files (e.g. 1 GB). It would be useful to have such tests, because now it is becoming common to handle big files in bioinformatics, and it would be possible to do some profiling on that. With that strategy, it would make sense to adopt a tool like nose, which enhances the test framework a lot. For example, it will be very difficult to write tests on big files without using global fixtures (which the basic unittest doesn't support). This means that if you want to write a test which studies the handling of a 1 GB sequence file with biopython, with the basic python testing framework, you are forced to open the file on every test (setUp function), while with a global fixture you can open it once and share it between all the tests.
nose has many other interesting features: it supports fixtures for doctests, it can be used to profile the execution of all tests, and it supports many plugins. For example, have a look at this one: http://darcs.idyll.org/~t/projects/pinocchio/doc/#stopwatch-selecting-tests-based-on-execution-time > > > > What about having support to global fixtures? > > For example, many test scripts begin in the same way: they > > 'import > > numpy', check for python's version, etc.. All of > > this could be moved > > to a global fixture and then executed only once for all the > > tests. > > > Hmm... currently the Biopython tests can be written essentially independently of each other, without knowing much about the overall testing framework. I think that that makes it easier for new users/developers to add tests. I think we should avoid the situation that somebody first has to study Biopython's testing framework to be able to add a test. You could write a skeleton for biopython's tests, and it would be very useful (e.g. have a look at this recipe for elixir: http://elixir.ematia.de/trac/wiki/Recipes/Testing) > > > --Michiel.
> > > > -- My blog on bioinformatics (now in English): http://bioinfoblog.it From bugzilla-daemon at portal.open-bio.org Wed Feb 4 02:57:14 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 3 Feb 2009 21:57:14 -0500 Subject: [Biopython-dev] [Bug 2749] New: Proposal: a template for biopython's unittests Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2749 Summary: Proposal: a template for biopython's unittests Product: Biopython Version: Not Applicable Platform: All URL: http://github.com/dalloliogm/bio-test-datasets- repository/blob/master/templates/biopython/biotest_templ ate.py OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: Documentation AssignedTo: biopython-dev at biopython.org ReportedBy: dalloliogm at gmail.com I have posted here: - http://github.com/dalloliogm/bio-test-datasets-repository/blob/master/templates/biopython/biotest_template.py a draft for a template for biopython's unittests. The idea is that if you provide a template for writing unittests for biopython, it will be easier for new developers. This example, in particular, makes uses of nose, and it has example of global fixtures (the two setUpAll and tearDownAll methods). It could be adapted for being used without nose, but it will be more difficult to understand. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From dalloliogm at gmail.com Wed Feb 4 03:01:51 2009 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Wed, 4 Feb 2009 04:01:51 +0100 Subject: [Biopython-dev] a template for unittests in biopython Message-ID: <5aa3b3570902031901q5450bf43nfeb23ded1c70608c@mail.gmail.com> Hi people, I have posted here: - http://github.com/dalloliogm/bio-test-datasets-repository/blob/master/templates/biopython/biotest_template.py a draft for a template for unittests in biopython. You can also refer to it as bug 2749 (http://bugzilla.open-bio.org/show_bug.cgi?id=2749). So, the idea is that if we have a template for test files, it will be easier for new developers to write new tests and modules. This one in particular makes use of nose, and it has some examples of global fixtures (the two setUpAll and tearDownAll methods). What do you think about it? Cheers.. -- My blog on bioinformatics (now in English): http://bioinfoblog.it From mjldehoon at yahoo.com Wed Feb 4 03:57:23 2009 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Tue, 3 Feb 2009 19:57:23 -0800 (PST) Subject: [Biopython-dev] run_tests.py rewrite In-Reply-To: <320fb6e00902030512y36a7118ke81a8a890caf1437@mail.gmail.com> Message-ID: <289461.11809.qm@web62401.mail.re1.yahoo.com> > In terms of cleaning up the code, something we can probably > now remove from the print-and-compare handler is the special > case of modules called via a run_tests method. I've removed these run_tests functions from the print-and-compare tests, and from a few unittest-based tests where this function is not actually being used. I've updated run_tests.py accordingly. --Michiel.
From biopython at maubp.freeserve.co.uk Wed Feb 4 10:22:05 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 4 Feb 2009 10:22:05 +0000 Subject: [Biopython-dev] run_tests.py rewrite In-Reply-To: <5aa3b3570902031635n357edc4bx77d3bf6094340532@mail.gmail.com> References: <5aa3b3570902030218x78df37ddic6488638a7937712@mail.gmail.com> <752614.16176.qm@web62402.mail.re1.yahoo.com> <5aa3b3570902031635n357edc4bx77d3bf6094340532@mail.gmail.com> Message-ID: <320fb6e00902040222r4b3297b3q3b4141db1f16a25d@mail.gmail.com> >> > Moreover, why the typical user should be running >> > biopython's tests? >> >> >> To make sure that it works. Biopython interacts with and therefore >> depends more on 3rd party software, web servers, and file formats >> than most other Python modules. Things are more likely to break >> than for example for a more self-contained library such as NumPy. >> I always run the Biopython tests, and I would advise every user to >> do so too. In addition, the tests can function as example scripts >> showing how to use Biopython. It is important that all users can run those scripts. > > I think that all the tests which check if biopython can run correctly > on a computer should be separated from all the others. > Why do I have to test whether biopython correctly translate the > sequence ACTAGCT to a protein code when I install biopython? It should > have been already checked by the developers/volonteers. If I want to > install biopython on my computer, I want to run only the tests needed > to make it sure it can work fine on my configuration, not all of them. As an end user, I would still prefer to know that even simple things like translation have been checked as working on my machine. With a very simple example like this is it unlikely to break on some setups and not others, but for many test cases it is very hard to make this judgement call. 
The only real way to "to make it sure it can work fine on my configuration" is to just test everything - and it doesn't take that long anyway. > As an example, take pytable, a library to handle HDF5 files with python. > The authors claim that they have written more than 10^6 tests for it. > However, when you install pytables from source, you don't have to run > all of these tests: but only a subset of them, the ones required to > check if it can run correctly on your computer. Consider that some of > the tests on pytables take hours or days to complete, because they > check the handling of big binary files. OK, this is a little different - simply because of the time taken. If the full test suite takes hours or more, then I can see why the pytables people only distribute a subset of the tests. > The idea is that, if we separate the tests on the code from the ones > on the configuration, we will be able to enhance the test section of > biopython a lot. > For example, at the moment there are not many tests to check > biopython's behaviour with big sequence files (e.g. 1 GB). It would be > useful to have such tests, because now it is becoming common to handle > big files in bioinformatics, and it would be possible to do some > profiling on that. If you want developers to download 1 GB files as part of building and testing Biopython, it will be a hurdle/barrier to development. Even for existing developers, it would make setting up a new machine that much more complicated. Other than looking at performance speed/memory, we can check most features of large multi-record files with much smaller examples. 
Peter From bugzilla-daemon at portal.open-bio.org Wed Feb 4 10:27:28 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 4 Feb 2009 05:27:28 -0500 Subject: [Biopython-dev] [Bug 2749] Proposal: a template for biopython's unittests In-Reply-To: Message-ID: <200902041027.n14ARS6S023505@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2749 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-04 05:27 EST ------- The current view from the Biopython developers is that we don't want to depend on nose for running our unit tests (nose is not installed automatically as part of python). This has been discussed on the mailing list, so I won't repeat myself here. In this example, having global setUpAll and tearDownAll methods isn't needed, but I can see how they might be helpful on larger (slower) tests. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From mjldehoon at yahoo.com Wed Feb 4 10:30:31 2009 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Wed, 4 Feb 2009 02:30:31 -0800 (PST) Subject: [Biopython-dev] run_tests.py rewrite In-Reply-To: <320fb6e00902030602p3afd8a82scd5ed5adffda65eb@mail.gmail.com> Message-ID: <114112.52378.qm@web62406.mail.re1.yahoo.com> I've uploaded to CVS a modified version of run_tests.py to address import errors. Could you have a look to see if you agree with my solution? --Michiel. 
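The problem this message responds to is that the print-and-compare runner imports each test with `module = __import__(name)`, and any exception other than MissingExternalDependencyError aborted the whole run. A minimal sketch of one way to handle that follows. It is not the code committed to CVS: the function name, the returned status strings and the `importer` parameter are hypothetical, and the exception class is a local stand-in for Biopython's MissingExternalDependencyError so that the block is self-contained.

```python
import traceback


class MissingExternalDependencyError(Exception):
    """Local stand-in for Biopython's exception of the same name."""


def import_test_module(name, importer=__import__):
    """Import a test module and classify the outcome.

    Returns a (status, detail) pair:
      ("ok", module)        the import succeeded
      ("skipped", message)  an external dependency is missing
      ("failed", trace)     any other exception, with its formatted traceback
    """
    try:
        module = importer(name)
    except MissingExternalDependencyError as err:
        # A missing dependency is not a test failure; report and move on.
        return "skipped", str(err)
    except Exception:
        # Keep the traceback as text so the runner can print it later
        # and continue with the remaining tests.
        return "failed", traceback.format_exc()
    return "ok", module
```

A runner could loop over the test names, call this helper for each, and print the stored traceback for every "failed" entry at the end instead of terminating at the first import error.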
--- On Tue, 2/3/09, Peter wrote: > From: Peter > Subject: Re: [Biopython-dev] run_tests.py rewrite > To: mjldehoon at yahoo.com > Cc: biopython-dev at biopython.org > Date: Tuesday, February 3, 2009, 9:02 AM > Michiel, > > I've noticed that for print-and-compare tests we can > get unexpected > errors from the line: > module = __import__(name) > > For example, if there is an IOError in test_SeqIO_online.py > this does > not get caught - we only try to catch a > MissingExternalDependencyError. Perhaps we should also > catch any > generic exception and report that test as a failure. > Otherwise, the > run_tests.py file terminates prematurely. > > Would you like to look into this, or should I? > > Peter From bugzilla-daemon at portal.open-bio.org Wed Feb 4 11:28:46 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 4 Feb 2009 06:28:46 -0500 Subject: [Biopython-dev] [Bug 2749] Proposal: a template for biopython's unittests In-Reply-To: Message-ID: <200902041128.n14BSkDG030225@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2749 ------- Comment #2 from dalloliogm at gmail.com 2009-02-04 06:28 EST ------- yeee! I just came up with an idea which makes this test template work both with and without nose. The global fixture methods should be called manually before executing the test suite. I am sure there is a way to do this automatically rather than manually as it is now. Anyway, look at the latest commit: - http://github.com/dalloliogm/bio-test-datasets-repository/commit/53554d7ee9a117bc4df9e9ea5bc844e0d4e4d2fa It can be improved, of course. However, the idea behind this feature proposal is to have a template for unittests in biopython. What do you think about it? It can be refined... for example, telling people which version of numpy they should import if they need it, how they should format docstrings, etc.. The same can be done for a template for sequence format parsers.
I was looking for something like that when I wrote the fastPhase parser. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From dalloliogm at gmail.com Wed Feb 4 11:37:56 2009 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Wed, 4 Feb 2009 12:37:56 +0100 Subject: [Biopython-dev] run_tests.py rewrite In-Reply-To: <320fb6e00902040222r4b3297b3q3b4141db1f16a25d@mail.gmail.com> References: <5aa3b3570902030218x78df37ddic6488638a7937712@mail.gmail.com> <752614.16176.qm@web62402.mail.re1.yahoo.com> <5aa3b3570902031635n357edc4bx77d3bf6094340532@mail.gmail.com> <320fb6e00902040222r4b3297b3q3b4141db1f16a25d@mail.gmail.com> Message-ID: <5aa3b3570902040337h26c590bara35a096a9d642c9b@mail.gmail.com> On Wed, Feb 4, 2009 at 11:22 AM, Peter wrote: >>> > Moreover, why the typical user should be running >>> > biopython's tests? >>> >>> >>> To make sure that it works. Biopython interacts with and therefore >>> depends more on 3rd party software, web servers, and file formats >>> than most other Python modules. Things are more likely to break >>> than for example for a more self-contained library such as NumPy. >>> I always run the Biopython tests, and I would advise every user to >>> do so too. In addition, the tests can function as example scripts >>> showing how to use Biopython. It is important that all users can run those scripts. >> >> I think that all the tests which check if biopython can run correctly >> on a computer should be separated from all the others. >> Why do I have to test whether biopython correctly translate the >> sequence ACTAGCT to a protein code when I install biopython? It should >> have been already checked by the developers/volonteers. If I want to >> install biopython on my computer, I want to run only the tests needed >> to make it sure it can work fine on my configuration, not all of them. 
> > As an end user, I would still prefer to know that even simple things > like translation have been checked as working on my machine. With a > very simple example like this is it unlikely to break on some setups > and not others, but for many test cases it is very hard to make this > judgement call. The only real way to "to make it sure it can work > fine on my configuration" is to just test everything - and it doesn't > take that long anyway. It doesn't take long, but the developers are forced to write tests which don't take long. However, this doesn't mean that big tests are not necessary. Many libraries I have installed have two separated commands, 'setup.py test' and 'setup.py test_all'. >> As an example, take pytable, a library to handle HDF5 files with python. >> The authors claim that they have written more than 10^6 tests for it. >> However, when you install pytables from source, you don't have to run >> all of these tests: but only a subset of them, the ones required to >> check if it can run correctly on your computer. Consider that some of >> the tests on pytables take hours or days to complete, because they >> check the handling of big binary files. > > OK, this is a little different - simply because of the time taken. If > the full test suite takes hours or more, then I can see why the > pytables people only distribute a subset of the tests. > >> The idea is that, if we separate the tests on the code from the ones >> on the configuration, we will be able to enhance the test section of >> biopython a lot. >> For example, at the moment there are not many tests to check >> biopython's behaviour with big sequence files (e.g. 1 GB). It would be >> useful to have such tests, because now it is becoming common to handle >> big files in bioinformatics, and it would be possible to do some >> profiling on that. > > If you want developers to download 1 GB files as part of building and > testing Biopython, it will be a hurdle/barrier to development. 
Even > for existing developers, it would make setting up a new machine that > much more complicated. Other than looking at performance > speed/memory, we can check most features of large multi-record files > with much smaller examples. well it is not necessary to put a 1 GB file in the repo.. we could generate it with the random or hmm module, always using the same seed :). It would be a 'package' global fixture. > > Peter > -- My blog on bioinformatics (now in English): http://bioinfoblog.it From biopython at maubp.freeserve.co.uk Wed Feb 4 13:14:14 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 4 Feb 2009 13:14:14 +0000 Subject: [Biopython-dev] run_tests.py rewrite In-Reply-To: <114112.52378.qm@web62406.mail.re1.yahoo.com> References: <320fb6e00902030602p3afd8a82scd5ed5adffda65eb@mail.gmail.com> <114112.52378.qm@web62406.mail.re1.yahoo.com> Message-ID: <320fb6e00902040514te7c7d2ci245433371770d172@mail.gmail.com> On Wed, Feb 4, 2009 at 10:30 AM, Michiel de Hoon wrote: > > I've uploaded to CVS a modified version of run_tests.py to address import > errors. Could you have a look to see if you agree with my solution? > It took a little while to show up in CVS for me, but I've got it now. That seems to solve the problem neatly - and you've even managed to capture the stack trace elegantly, something I hadn't worked out how to do.
Nice :) Peter From bugzilla-daemon at portal.open-bio.org Wed Feb 4 16:16:37 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 4 Feb 2009 11:16:37 -0500 Subject: [Biopython-dev] [Bug 2749] Proposal: a template for biopython's unittests In-Reply-To: Message-ID: <200902041616.n14GGbwk031325@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2749 ------- Comment #3 from bsouthey at gmail.com 2009-02-04 11:16 EST ------- (In reply to comment #1) > The current view from the Biopython developers is that we don't want to depend > on nose for running our unit tests (nose is not installed automatically as part > of python). This has been discussed on the mailing list, so I won't repeat > myself here. Also, the test framework must support Python 2.3 while Biopython supports it. Really I find that the huge diversity in Biopython prevents a 'single' template that is sufficiently easy to follow. I do not like the splitting that test into setups for each 'subtest' followed by a general test. This starts to get rather difficult to read and manage when you have modules like the sequence object involve many different tasks that require a separate setup for each test as well as the actual test. A related problem is that certain tests may require a specific exception for a specific situation. Another problem is that some of the tests are very similar for the same module (say Logistic regression or testing alphabets in reading sequences into a Seq object) so it makes more sense to do what numpy does (http://projects.scipy.org/scipy/numpy/wiki/TestingGuidelines ) where the same test function is used with different inputs. I would like to easily add a new test case to an existing test like Numpy has a test case class that is separate from the actual tests. 
Just a few cents, Bruce -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Feb 4 18:02:27 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 4 Feb 2009 13:02:27 -0500 Subject: [Biopython-dev] [Bug 2750] New: EMBL format: reference titles split across lines are not parsed correctly; pmids are not parsed Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2750 Summary: EMBL format: reference titles split across lines are not parsed correctly; pmids are not parsed Product: Biopython Version: 1.49 Platform: PC OS/Version: Linux Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: wolfgang.resch at gmail.com for example the following embl record:

ID   cel-let-7 standard; RNA; CEL; 99 BP.
XX
AC   MI0000001;
XX
DE   Caenorhabditis elegans let-7 stem-loop
XX
RN   [1]
RX   PUBMED; 11679671.
RA   Lau NC, Lim LP, Weinstein EG, Bartel DP, Lim LP, Lau NC, Weinstein EG;
RT   "An abundant class of tiny RNAs with probable regulatory roles in
RT   Caenorhabditis elegans";
RL   Science. 294:858-862(2001).
XX
FH   Key             Location/Qualifiers
FH
FT   miRNA           17..38
FT                   /accession="MIMAT0000001"
FT                   /product="cel-let-7"
FT                   /evidence=experimental
FT                   /experiment="cloned [1-3,5], Northern [1], PCR [4]"
XX
SQ   Sequence 99 BP; 26 A; 19 C; 24 G; 0 T; 30 other;
     uacacugugg auccggugag guaguagguu guauaguuug gaauauuacc accggugaac        60
     uaugcaauuu ucuaccuuac cggagacaga acucuucga                               99
//

is parsed as follows:

authors: Lau NC, Lim LP, Weinstein EG, Bartel DP, Lim LP, Lau NC, Weinstein EG;
title: Caenorhabditis elegans";
journal: Science. 294:858-862(2001).
medline id: pubmed id: comment: -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bsouthey at gmail.com Wed Feb 4 19:55:20 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Wed, 04 Feb 2009 13:55:20 -0600 Subject: [Biopython-dev] test_Ace, test_Nexus, test_Phd In-Reply-To: <240911.28388.qm@web62402.mail.re1.yahoo.com> References: <240911.28388.qm@web62402.mail.re1.yahoo.com> Message-ID: <4989F2A8.3020300@gmail.com> Michiel de Hoon wrote: > These three tests currently are written as a combination of a unittest-based test and a print-and-compare test. That is, they contain classes deriving from unittest.TestCase, but then print out stuff that should get compared to the output file. However, run_tests.py assumes that they are true unittest-style tests, so the comparison is never done. > > Does anybody mind if I convert these three to pure print-and-compare or pure unittest-style tests? test_Ace.py and test_Nexus.py produce lots of output, so I'm tempted to go with a print-and-compare test there; test_Phd.py might work well as a unittest-style test. > > --Michiel. > > > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > I looked at these tests and think that these are actually examples not tests (except to say the code run). If so, then I would go with what is easiest. 
Bruce From bugzilla-daemon at portal.open-bio.org Wed Feb 4 21:58:51 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 4 Feb 2009 16:58:51 -0500 Subject: [Biopython-dev] [Bug 2749] Proposal: a template for biopython's unittests In-Reply-To: Message-ID: <200902042158.n14Lwpra003911@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2749 ------- Comment #4 from dalloliogm at gmail.com 2009-02-04 16:58 EST ------- > (In reply to comment #1) > > The current view from the Biopython developers is that we don't want to depend > > on nose for running our unit tests (nose is not installed automatically as part > > of python). This has been discussed on the mailing list, so I won't repeat > > myself here. > > Also, the test framework must support Python 2.3 while Biopython supports it. > > Really I find that the huge diversity in Biopython prevents a 'single' template > that is sufficiently easy to follow. ok, but you should give new developers at least some guidelines on how they should write tests, documentation, and code. The fact that the tests in biopython are so varied is not a positive point: it makes them difficult to understand and to maintain, especially for newcomers. > I do not like the splitting that test into > setups for each 'subtest' followed by a general test. Well, it is a matter of taste, I think. I find it elegant and rather clear: you can easily see in which conditions and environment every test is run, the code in every test method is reduced to the minimum, and you clean everything after the execution of each test, so the order in which the tests are executed doesn't count. > This starts to get rather > difficult to read and manage when you have modules like the sequence object > involve many different tasks that require a separate setup for each test as > well as the actual test. You should put those in a different test module.
Every test unit is a particular use case: for example, look at my example, where the first unit test is a simple sequence, and the second (subclassed) is a blank one. > A related problem is that certain tests may require a > specific exception for a specific situation. mmm what do you mean, exactly? > Another problem is that some of the tests are very similar for the same module > (say Logistic regression or testing alphabets in reading sequences into a Seq > object) so it makes more sense to do what numpy does > (http://projects.scipy.org/scipy/numpy/wiki/TestingGuidelines ) where the same > test function is used with different inputs. That will be difficult to do until you are so convinced against using nose :(. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Feb 5 18:14:19 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 5 Feb 2009 13:14:19 -0500 Subject: [Biopython-dev] [Bug 2751] New: PDBParser crashes on empty tempFactor fields Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2751 Summary: PDBParser crashes on empty tempFactor fields Product: Biopython Version: 1.49 Platform: PC OS/Version: Linux Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: eric.talevich at gmail.com When parsing ATOM lines, Bio.PDB.PDBParser appears to be passing the contents of indexes 60-66 directly to the float() constructor without checking if the string is empty (or all spaces). 
The PDB spec seems to indicate that the default value for this field should be 0.0: http://www.wwpdb.org/documentation/format23/sect9.html#ATOM I interpret that to mean PDBParser should assume 0.0 if the string is blank, at least in permissive mode; otherwise, perhaps a PDBException should be raised. Here's a traceback:

  File "/usr/lib/python2.5/site-packages/Bio/PDB/PDBParser.py", line 66, in get_structure
    self._parse(file.readlines())
  File "/usr/lib/python2.5/site-packages/Bio/PDB/PDBParser.py", line 86, in _parse
    self.trailer=self._parse_coordinates(coords_trailer)
  File "/usr/lib/python2.5/site-packages/Bio/PDB/PDBParser.py", line 160, in _parse_coordinates
    bfactor=float(line[60:66])
ValueError: empty string for float()

This occurs when parsing a file that looks like this:

HEADER 1ad5
ATOM 4255 N GLU B 82 -6.363 45.622 156.936 1.00 69.02
ATOM 4256 CA GLU B 82 -6.235 44.414 157.713 1.00 68.26
ATOM 4257 C GLU B 82 -5.067 44.774 158.648 1.00 68.19
ATOM 4258 O GLU B 82 -5.169 45.863 159.227 1.00 67.24
ATOM 4259 CB GLU B 82 -5.903 43.230 156.774 1.00 68.47
ATOM 4260 H1 GLU B 82 -6.252 46.392 157.641 1.00 0
ATOM 4261 H2 GLU B 82 -5.588 45.683 156.246 1.00 0
ATOM 4262 H3 GLU B 82 -7.267 45.667 156.437 1.00 0
ATOM 4263 N ASP B 83 -3.979 43.981 158.770 1.00 67.44
...

-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
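The behaviour proposed in this report (treat a blank tempFactor field as 0.0 in permissive mode, raise a clearer error otherwise) can be sketched as below. This is a minimal illustration, not Bio.PDB's actual code: `parse_bfactor` and `atom_line` are hypothetical helpers, and `PDBException` is a local stand-in so that the block runs without Biopython. The only detail taken from the report is that the field is read from `line[60:66]`.

```python
class PDBException(Exception):
    """Local stand-in for the exception class mentioned in the report."""


def parse_bfactor(line, permissive=True):
    """Return the temperature factor from an ATOM line (columns 61-66).

    A blank or missing field yields 0.0 in permissive mode and raises
    PDBException in strict mode, instead of a bare ValueError from float().
    """
    field = line[60:66]
    if not field.strip():
        if permissive:
            return 0.0
        raise PDBException("Blank tempFactor field in record: %r" % line)
    return float(field)


def atom_line(bfactor_text):
    """Build a fixed-width ATOM record up to column 60, then append the field."""
    head = "%-6s%5d %-4s%1s%3s %1s%4d%1s   %8.3f%8.3f%8.3f%6.2f" % (
        "ATOM", 4260, " H1 ", " ", "GLU", "B", 82, " ",
        -6.252, 46.392, 157.641, 1.00)
    return head + bfactor_text
```

With these helpers, `parse_bfactor(atom_line(" 68.26"))` gives 68.26, while a line that stops after the occupancy column gives 0.0 in permissive mode and raises PDBException when `permissive=False`.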
From bugzilla-daemon at portal.open-bio.org Thu Feb 5 18:25:10 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 5 Feb 2009 13:25:10 -0500 Subject: [Biopython-dev] [Bug 2751] PDBParser crashes on empty tempFactor fields In-Reply-To: Message-ID: <200902051825.n15IPAXF016833@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2751 ------- Comment #1 from eric.talevich at gmail.com 2009-02-05 13:25 EST ------- (In reply to comment #0) Sorry, that PDB example was manually fixed. The broken line format is: ATOM 4260 H1 GLU B 82 -6.252 46.392 157.641 1.00 This is from an odd edition of the 1AD5 structure; RCSB's version has the 0.0 values filled in correctly. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Feb 6 10:46:54 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 6 Feb 2009 05:46:54 -0500 Subject: [Biopython-dev] [Bug 2751] PDBParser crashes on empty tempFactor fields In-Reply-To: Message-ID: <200902061046.n16AksQd020147@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2751 ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-06 05:46 EST ------- (In reply to comment #0) > When parsing ATOM lines, Bio.PDB.PDBParser appears to be passing the contents > of indexes 60-66 directly to the float() constructor without checking if the > string is empty (or all spaces). > > The PDB spec seems to indicate that the default value for this field should be > 0.0: > http://www.wwpdb.org/documentation/format23/sect9.html#ATOM > > I interpret that to mean PDBParser should assume 0.0 if the string is blank, > at least in permissive mode; otherwise, perhaps a PDBException should be > raised. 
I would have read that spec to mean if you don't know the tempFactor, put "0.0" in the field and don't leave it blank. By this interpretation of the spec, your old file is invalid, and Biopython's failure is therefore not unreasonable. It would be good to cope with this in permissive mode though, and raise a more meaningful error in strict mode. (In reply to comment #1) > (In reply to comment #0) > > Sorry, that PDB example was manually fixed. The broken line format is: > > ATOM 4260 H1 GLU B 82 -6.252 46.392 157.641 1.00 > > This is from an odd edition of the 1AD5 structure; RCSB's version has the 0.0 > values filled in correctly. Do you have a link to download the old ("invalid") version of PDB reference 1AD5, as it would be very helpful to test this on a real file? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Fri Feb 6 11:22:16 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 6 Feb 2009 11:22:16 +0000 Subject: [Biopython-dev] Biopython tutorial update for unit tests Message-ID: <320fb6e00902060322s2d860056yd61dabd19b144d00@mail.gmail.com> Hi all [I thought I sent this email on Wednesday - oh well, better late than never!] I've recently checked in a revision to the test case section of the tutorial, see tutorial.tex revision 194, http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Doc/Tutorial.tex?cvsroot=biopython My intention is to describe the current test system in more detail. I've tried to make sure the text makes sense for both Biopython 1.49 (in case we want to update the website before the next release) and CVS (assuming we do get rid of the expected output files as currently being trialled). Let me know if anyone spots a typo, or something that should be clearer. 
You'll need (pdf)latex to build the PDF file, and hevea for the HTML output - but the raw tex file can just be read directly from the CVS link instead. Thanks, Peter From bugzilla-daemon at portal.open-bio.org Fri Feb 6 12:27:49 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 6 Feb 2009 07:27:49 -0500 Subject: [Biopython-dev] [Bug 2750] EMBL format: reference titles split across lines are not parsed correctly; pmids are not parsed In-Reply-To: Message-ID: <200902061227.n16CRng0029039@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2750 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-06 07:27 EST ------- Confirmed title problem, example code using your EMBL record saved to a file: >>> from Bio import SeqIO >>> record = SeqIO.read(open("long_ref.embl"),"embl") >>> print record.annotations["references"][0] authors: Lau NC, Lim LP, Weinstein EG, Bartel DP, Lim LP, Lau NC, Weinstein EG; title: Caenorhabditis elegans"; journal: Science. 294:858-862(2001). medline id: pubmed id: comment: This is due to a subtle difference between the GenBank and EMBL scanner code, the GenBank scanner pre-combines the title lines before passing it to the consumer, while the EMBL scanner passes the title in chunks. Fixed the consumer to cope with either. Also fixed for multi-line author lists etc. Could you update your Bio/GenBank/__init__.py file to CVS revision 102, which you will be able to download here, and retest: http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/GenBank/__init__.py?cvsroot=biopython Or update the full installation to CVS if you would find that easier. Thanks, Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
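A rough illustration of the scanner/consumer difference described in comment #1 on bug 2750: a consumer that appends title text as it arrives gives the same result whether the scanner hands it the title pre-combined (as the GenBank scanner is said to do) or in per-line chunks (as the EMBL scanner does). This is only a sketch; `TitleCollector` and `title_part` are hypothetical names and are not the actual Bio.GenBank consumer code.

```python
class TitleCollector(object):
    """Hypothetical consumer that accepts a title whole or in pieces."""

    def __init__(self):
        self.title = ""

    def title_part(self, text):
        # Append each chunk, separated from the previous one by one space.
        text = text.strip()
        if self.title:
            self.title += " " + text
        else:
            self.title = text


# EMBL-style: the title arrives one line at a time.
chunked = TitleCollector()
for part in ['"An abundant class of tiny RNAs with probable regulatory roles in',
             'Caenorhabditis elegans";']:
    chunked.title_part(part)

# GenBank-style: the scanner has already joined the lines.
combined = TitleCollector()
combined.title_part('"An abundant class of tiny RNAs with probable regulatory '
                    'roles in Caenorhabditis elegans";')
```

Both collectors end up holding the same title string, which is the behaviour the fix in comment #1 is aiming for.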
From bugzilla-daemon at portal.open-bio.org Fri Feb 6 13:34:22 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 6 Feb 2009 08:34:22 -0500 Subject: [Biopython-dev] [Bug 2750] EMBL format: reference titles split across lines are not parsed correctly; pmids are not parsed In-Reply-To: Message-ID: <200902061334.n16DYMkp005189@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2750 ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-06 08:34 EST ------- Regarding the missing PUBMED ID, that is also now fixed in CVS. Note that this still ignores DOI and AGRICOLA references (supporting this would require a change to our reference object, and perhaps our BioSQL bindings too). You will need to update your Bio/GenBank/Scanner.py file to revision 1.27 which you will be able to download here: http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/GenBank/Scanner.py?cvsroot=biopython Rather than manually updating these two files (Bio/GenBank/__init__.py as per comment 1, and Bio/GenBank/Scanner.py as above), you may find doing a full installation from CVS simpler. e.g.
>>> from Bio import SeqIO
>>> record = SeqIO.read(open("long_ref.embl"),"embl")
>>> for ref in record.annotations["references"] : print ref
...
authors: Lau NC, Lim LP, Weinstein EG, Bartel DP, Lim LP, Lau NC, Weinstein EG;
title: "An abundant class of tiny RNAs with probable regulatory roles in Caenorhabditis elegans";
journal: Science. 294:858-862(2001).
medline id:
pubmed id: 11679671
comment:
Again, please let us know if that solves your problem. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Fri Feb 6 14:47:10 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 6 Feb 2009 09:47:10 -0500 Subject: [Biopython-dev] [Bug 2750] EMBL format: reference titles split across lines are not parsed correctly; pmids are not parsed In-Reply-To: Message-ID: <200902061447.n16ElAAr013975@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2750 ------- Comment #3 from wolfgang.resch at gmail.com 2009-02-06 09:47 EST ------- Peter, phantastic - that solved the problem. I've really got to learn the internals of biopython... Thanks and best regards, Wolfgang -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Feb 6 14:57:48 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 6 Feb 2009 09:57:48 -0500 Subject: [Biopython-dev] [Bug 2750] EMBL format: reference titles split across lines are not parsed correctly; pmids are not parsed In-Reply-To: Message-ID: <200902061457.n16Evmkm015880@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2750 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-06 09:57 EST ------- (In reply to comment #3) > Peter, > > phantastic - that solved the problem. I've really got to learn the internals > of biopython... > > Thanks and best regards, > > Wolfgang Good to know that's working - marking this as FIXED. If you do find anything else amiss, please report it. The EMBL parsing is not yet as well tested / well used as the GenBank support... 
Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Feb 6 18:29:45 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 6 Feb 2009 13:29:45 -0500 Subject: [Biopython-dev] [Bug 2751] PDBParser crashes on empty tempFactor fields In-Reply-To: Message-ID: <200902061829.n16ITj9p007988@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2751 ------- Comment #3 from eric.talevich at gmail.com 2009-02-06 13:29 EST ------- Created an attachment (id=1215) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1215&action=view) PDB file with some missing bfactor fields -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From eric.talevich at gmail.com Fri Feb 6 20:11:15 2009 From: eric.talevich at gmail.com (Eric Talevich) Date: Fri, 6 Feb 2009 15:11:15 -0500 Subject: [Biopython-dev] SVN migration and Launchpad mirroring Message-ID: <3f6baf360902061211o4da786b0q5f788efcc63e2bb1@mail.gmail.com> Hello all, Scanning the biopython-dev mailing list archives, it appears that either the CVS-to-SVN migration either stalled out during the past year, or the discussion about this migration went someplace other than the mailing list and wiki. I did, however, find another page for the project on Launchpad ( https://launchpad.net/biopython), apparently started a few years ago by Jonathan Taylor and abandoned. I didn't see any discussion of it on biopython-dev around that time. 
I'm pretty fond of bzr, branching, and Launchpad's PPA feature ( https://help.launchpad.net/Packaging/PPA) in particular, so I'd like to see if it's possible to start mirroring the CVS repository on Launchpad to see how it goes. I'm happy to take care of whatever setup and maintenance is needed. Comments? Is Jonathan Taylor still around and interested in resurrecting the Launchpad page? Best regards, Eric From bugzilla-daemon at portal.open-bio.org Fri Feb 6 21:38:09 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 6 Feb 2009 16:38:09 -0500 Subject: [Biopython-dev] [Bug 2751] PDBParser crashes on empty tempFactor fields In-Reply-To: Message-ID: <200902062138.n16Lc9Vq003304@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2751 ------- Comment #4 from eric.talevich at gmail.com 2009-02-06 16:38 EST ------- Created an attachment (id=1216) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1216&action=view) Catch float() failures and substitute a default or forward the exception The error message could be more helpful, and it would be nice to log a warning whenever the first exception is caught and a default value is used. The placement of try_float() may not match the coding conventions, I'm not sure. Generalizing as: try_coerce(field, into=float, default=None): ... would allow the same function to be used for coercing the integers and re-raising the exceptions as PDBConstructionException. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
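The try_coerce() generalization suggested in comment #4 on bug 2751 might look roughly like the sketch below. It is a standalone illustration rather than the attached patch: PDBConstructionException is redefined locally as a stand-in for the Bio.PDB class, and the `permissive` argument is a hypothetical substitute for the parser's PERMISSIVE attribute.

```python
class PDBConstructionException(Exception):
    """Local stand-in for the Bio.PDB exception (assumption for this sketch)."""


def try_coerce(field, into=float, default=None, permissive=True):
    """Convert a fixed-width text field with `into`, or fall back to `default`.

    `permissive` is a hypothetical flag standing in for the parser's
    PERMISSIVE setting. In strict mode, or when no default is given,
    a bad field is re-raised as PDBConstructionException.
    """
    try:
        return into(field)
    except ValueError:
        if not permissive or default is None:
            raise PDBConstructionException(
                "Invalid value %r in field" % field)
        return default


# A blank six-character tempFactor field, like the one in the bug report.
blank = " " * 6
bfactor = try_coerce(blank, default=0.0)   # falls back to the default
resseq = try_coerce("  82", into=int)      # same helper, integer field
```

Passing the converter as an argument is what lets one helper cover both the float fields (coordinates, occupancy, B factor) and the integer ones, with a single place where the error is translated.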
From biopython at maubp.freeserve.co.uk Sat Feb 7 12:55:42 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sat, 7 Feb 2009 12:55:42 +0000 Subject: [Biopython-dev] SVN migration and Launchpad mirroring In-Reply-To: <3f6baf360902061211o4da786b0q5f788efcc63e2bb1@mail.gmail.com> References: <3f6baf360902061211o4da786b0q5f788efcc63e2bb1@mail.gmail.com> Message-ID: <320fb6e00902070455h72c7bd31w506f5ed52e9633bc@mail.gmail.com> On Fri, Feb 6, 2009 at 8:11 PM, Eric Talevich wrote: > Hello all, > > Scanning the biopython-dev mailing list archives, it appears that either the > CVS-to-SVN migration either stalled out during the past year, or the > discussion about this migration went someplace other than the mailing list > and wiki. There have been some off list discussions with the OBF guys who look after all the servers etc about the logistics of doing the migration, and when it might suit them. Having all the OBF projects moved from CVS to SVN will make life easier for them (BioPerl etc have already moved). I was actually about to chase that up... > I did, however, find another page for the project on Launchpad ( > https://launchpad.net/biopython), apparently started a few years ago by > Jonathan Taylor and abandoned. I didn't see any discussion of it on > biopython-dev around that time. I guess that was some 3rd party, I don't recall this being discussed here. In terms of other tools, several people here are interested in git, and git and SVN can be made to work together. Hopefully getting Biopython from CVS to SVN will make things easier for them.
Peter From bugzilla-daemon at portal.open-bio.org Sat Feb 7 17:44:46 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 7 Feb 2009 12:44:46 -0500 Subject: [Biopython-dev] [Bug 2751] PDBParser crashes on empty tempFactor fields In-Reply-To: Message-ID: <200902071744.n17Hik8w021047@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2751 ------- Comment #5 from eric.talevich at gmail.com 2009-02-07 12:44 EST ------- (From update of attachment 1216) >=== modified file 'Bio/PDB/PDBParser.py' >--- Bio/PDB/PDBParser.py 2009-02-06 20:42:42 +0000 >+++ Bio/PDB/PDBParser.py 2009-02-06 21:17:08 +0000 >@@ -111,6 +111,20 @@ > current_segid=None > current_residue_id=None > current_resname=None >+ >+ def try_float(field, default=None): >+ """Try coercing a string into a float, safely. >+ >+ If the string is not a valid float, then if default is given, >+ default is returned; otherwise an exception is raised. >+ """ >+ try: >+ return float(field) >+ except (ValueError, NameError): >+ if (self.PERMISSIVE==0) or default is None: >+ raise PDBConstructionException("Detected an invalid value in a field") >+ return default >+ > for i in range(0, len(coords_trailer)): > line=coords_trailer[i] > record_type=line[0:6] >@@ -150,13 +164,13 @@ > hetero_flag=" " > residue_id=(hetero_flag, resseq, icode) > # atomic coordinates >- x=float(line[30:38]) >- y=float(line[38:46]) >- z=float(line[46:54]) >+ x=try_float(line[30:38]) >+ y=try_float(line[38:46]) >+ z=try_float(line[46:54]) > coord=numpy.array((x, y, z), 'f') > # occupancy & B factor >- occupancy=float(line[54:60]) >- bfactor=float(line[60:66]) >+ occupancy=try_float(line[54:60], default=0.0) >+ bfactor=try_float(line[60:66], default=0.0) > segid=line[72:76] > if current_segid!=segid: > current_segid=segid >@@ -183,7 +197,7 @@ > except PDBConstructionException, message: > self._handle_PDB_exception(message, global_line_counter) > elif(record_type=='ANISOU'): >- 
anisou=map(float, (line[28:35], line[35:42], line[43:49], line[49:56], line[56:63], line[63:70])) >+ anisou=map(try_float, (line[28:35], line[35:42], line[43:49], line[49:56], line[56:63], line[63:70])) > # U's are scaled by 10^4 > anisou_array=(numpy.array(anisou, 'f')/10000.0).astype('f') > structure_builder.set_anisou(anisou_array) >@@ -203,13 +217,13 @@ > current_residue_id=None > elif(record_type=='SIGUIJ'): > # standard deviation of anisotropic B factor >- siguij=map(float, (line[28:35], line[35:42], line[42:49], line[49:56], line[56:63], line[63:70])) >+ siguij=map(try_float, (line[28:35], line[35:42], line[42:49], line[49:56], line[56:63], line[63:70])) > # U sigma's are scaled by 10^4 > siguij_array=(numpy.array(siguij, 'f')/10000.0).astype('f') > structure_builder.set_siguij(siguij_array) > elif(record_type=='SIGATM'): > # standard deviation of atomic positions >- sigatm=map(float, (line[30:38], line[38:45], line[46:54], line[54:60], line[60:66])) >+ sigatm=map(try_float, (line[30:38], line[38:45], line[46:54], line[54:60], line[60:66])) > sigatm_array=numpy.array(sigatm, 'f') > structure_builder.set_sigatm(sigatm_array) > local_line_counter=local_line_counter+1 > -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From eric.talevich at gmail.com Sun Feb 8 06:20:12 2009 From: eric.talevich at gmail.com (Eric Talevich) Date: Sun, 8 Feb 2009 01:20:12 -0500 Subject: [Biopython-dev] SVN migration and Launchpad mirroring In-Reply-To: <320fb6e00902070455h72c7bd31w506f5ed52e9633bc@mail.gmail.com> References: <3f6baf360902061211o4da786b0q5f788efcc63e2bb1@mail.gmail.com> <320fb6e00902070455h72c7bd31w506f5ed52e9633bc@mail.gmail.com> Message-ID: <3f6baf360902072220j5c565449i4c7266046051207f@mail.gmail.com> On Sat, Feb 7, 2009 at 7:55 AM, Peter wrote: > > In terms of other tools, several people here are interested in git, > and git and SVN can be made to work together. Hopefully getting > Biopython from CVS to SVN will make things easier for them. > Good to know. I can get behind git, too -- I see BioRuby is already on GitHub, and so are a couple of (partial/modified) branches of Biopython. It looks like git-cvs and git-cvsimport are reasonably complete, or at least enough that mirroring the existing CVS trunk on GitHub would be feasible already. Has this also been discussed before? I'd like to try it sometime, if no one objects. -Eric From dalloliogm at gmail.com Sun Feb 8 16:47:48 2009 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Sun, 8 Feb 2009 17:47:48 +0100 Subject: [Biopython-dev] SVN migration and Launchpad mirroring In-Reply-To: <3f6baf360902072220j5c565449i4c7266046051207f@mail.gmail.com> References: <3f6baf360902061211o4da786b0q5f788efcc63e2bb1@mail.gmail.com> <320fb6e00902070455h72c7bd31w506f5ed52e9633bc@mail.gmail.com> <3f6baf360902072220j5c565449i4c7266046051207f@mail.gmail.com> Message-ID: <5aa3b3570902080847p1a126664k4a76b7f19a0ed987@mail.gmail.com> On Sun, Feb 8, 2009 at 7:20 AM, Eric Talevich wrote: > On Sat, Feb 7, 2009 at 7:55 AM, Peter wrote: > >> >> In terms of other tools, several people here are interested in git, >> and git and SVN can be made to work together. Hopefully getting >> Biopython from CVS to SVN will make things easier for them. 
>> > > Good to know. I can get behind git, too -- I see BioRuby is already on > GitHub, and so are a couple of (partial/modified) branches of Biopython I like github and I think its web interface is one of the best to work with git: it has some tools that I didn't see in the other hosting services supporting git (trac, gitorious), especially those for creating forks. The problem is that the basic account on github is limited to 100 MB, and with the peculiar approach adopted by git (distributed source control) anyone wishing to contribute code to biopython would have to create an account on github and in theory create a copy of the repository in his space. Moreover, I think it would be more difficult to use git without the tools offered by github, even if we configure a git repository with trac or similar on the openbio's servers. I don't know if the git-trac plugin has a feature to show all the forks like the one in github. Maybe I am just wrong.. but you should ask the bioruby people how comfortable they are with these issues, since they have more experience. > > It looks like git-cvs and git-cvsimport are reasonably complete, or at least > enough that mirroring the existing CVS trunk on GitHub would be feasible > already. Has this also been discussed before? I'd like to try it sometime, > if no one objects.
> > -Eric > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > -- My blog on bioinformatics (now in English): http://bioinfoblog.it From bugzilla-daemon at portal.open-bio.org Sun Feb 8 18:30:14 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 8 Feb 2009 13:30:14 -0500 Subject: [Biopython-dev] [Bug 2752] New: Context management for Bio.Entrez handles Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2752 Summary: Context management for Bio.Entrez handles Product: Biopython Version: 1.49 Platform: PC OS/Version: Linux Status: NEW Severity: enhancement Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: eric.talevich at gmail.com I'd like the following code to work:

    def write_gbk(gi):
        with open("gi%s.gbk" % gi, 'w+') as outfile:
            with Entrez.efetch(db='protein', rettype='genbank', id=gi) as gbk:
                text = gbk.read()
                outfile.write(text)
        print "Wrote", gi

Since Python 2.5 it's been possible to use the "with" statement to ensure handles are closed properly even if an exception occurs (PEP 343). There's also a decorator, @contextlib.contextmanager, to make this feature easy to support, but in general it works by adding the __enter__ and __exit__ methods to a class. To make Bio.Entrez work this way, we could just add @contextmanager decorators to efetch() and the others, but that would break 2.3 & 2.4 compatibility, so it's probably best to make a factory class that returns handles on instantiation, and includes __enter__ and __exit__ methods. The e* functions would become trivial classes that derive from the factory; this would also make it possible to remove the redundant code around the deprecated "cgi=None" argument.
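As a rough sketch of the __enter__/__exit__ approach described in this report (written without contextlib decorators, as the report suggests), a thin wrapper can make any file-like handle usable in a "with" block. Everything named here is hypothetical: `ClosingHandle` is not an existing Bio.Entrez class, and `fake_efetch` is a stand-in that returns canned text from an in-memory buffer instead of contacting NCBI.

```python
import io


class ClosingHandle(object):
    """Hypothetical wrapper: delegates to a handle and closes it on exit."""

    def __init__(self, handle):
        self._handle = handle

    def __getattr__(self, name):
        # Forward read(), readline(), close(), closed, etc. to the handle.
        return getattr(self._handle, name)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self._handle.close()
        return False  # never swallow an exception raised in the block


def fake_efetch(**params):
    """Stand-in for Entrez.efetch: canned text, no network access."""
    return ClosingHandle(io.StringIO(u"LOCUS       example\n//\n"))


with fake_efetch(db="protein", rettype="genbank", id="12345") as gbk:
    text = gbk.read()

handle_closed = gbk.closed  # True once the block has exited
```

Where only Python 2.5 or later needs to be supported, the standard library's contextlib.closing() gives the same close-on-exit behaviour for an unmodified handle, so the wrapper is mainly of interest if the handles themselves are to grow the two methods.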
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bartek at rezolwenta.eu.org Sun Feb 8 19:03:31 2009 From: bartek at rezolwenta.eu.org (Bartek Wilczynski) Date: Sun, 8 Feb 2009 20:03:31 +0100 Subject: [Biopython-dev] SVN migration and Launchpad mirroring In-Reply-To: <5aa3b3570902080847p1a126664k4a76b7f19a0ed987@mail.gmail.com> References: <3f6baf360902061211o4da786b0q5f788efcc63e2bb1@mail.gmail.com> <320fb6e00902070455h72c7bd31w506f5ed52e9633bc@mail.gmail.com> <3f6baf360902072220j5c565449i4c7266046051207f@mail.gmail.com> <5aa3b3570902080847p1a126664k4a76b7f19a0ed987@mail.gmail.com> Message-ID: <8b34ec180902081103r1befae9bt33e9024bd43f37fb@mail.gmail.com> On Sun, Feb 8, 2009 at 5:47 PM, Giovanni Marco Dall'Olio wrote: > I like github and I think its web interface is one of the best to work > with git: it has some tools that I didn't see in the other hosting > services supporting git (trac, gitorious), especially those for > creating forks. > > The problem is that the basic account on github is limited to 100 MB, > and with the peculiar approach adopted by git (distributed source > control) anyone wishing to participate code to biopython should have > to create an account on github and in theory create a copy of the > repository in his space. > > Moreover, I think it would be more difficult to use git without the > tools offered by github, even if we configure a git repository with > trac or similar on the openbio's servers. I don't know if the git-trac > plugins has a feature to show all the forks like the one in github. > Maybe I am just wrong.. but you should ask to the bioruby people how > they are comfortable with these issues, since they are more expert. > > Have you tried to use bazaar+launchpad? It's really easy and should do all the tricks you need from a distributed vcs. 
It also has features for bugtracking (like trac on github) but I don't know if we are unhappy with current setup (bugzilla). I think bzr+launchpad has a number of advantages over git+github: -> can work with CVS as a master repository which means that the transition would not require going through SVN (although if it would help people from OBF it is also possible). -> Anyone used to cvs commands (commit, diff, update etc..) can use bzr without trouble. You only need to know new "distributed" commands (push,branch) -> it supports centralized decisions on merging: the possible scenario is that only a limited number of people can merge to the main repository (push in bzr terminology) cheers Bartek From chris.lasher at gmail.com Sun Feb 8 19:34:08 2009 From: chris.lasher at gmail.com (Chris Lasher) Date: Sun, 8 Feb 2009 14:34:08 -0500 Subject: [Biopython-dev] SVN migration and Launchpad mirroring In-Reply-To: <8b34ec180902081103r1befae9bt33e9024bd43f37fb@mail.gmail.com> References: <3f6baf360902061211o4da786b0q5f788efcc63e2bb1@mail.gmail.com> <320fb6e00902070455h72c7bd31w506f5ed52e9633bc@mail.gmail.com> <3f6baf360902072220j5c565449i4c7266046051207f@mail.gmail.com> <5aa3b3570902080847p1a126664k4a76b7f19a0ed987@mail.gmail.com> <8b34ec180902081103r1befae9bt33e9024bd43f37fb@mail.gmail.com> Message-ID: <128a885f0902081134m255ec4eao21c75aaf08f9d8f5@mail.gmail.com> On Sun, Feb 8, 2009 at 2:03 PM, Bartek Wilczynski wrote: > On Sun, Feb 8, 2009 at 5:47 PM, Giovanni Marco Dall'Olio > wrote: > >> I like github and I think its web interface is one of the best to work >> with git: it has some tools that I didn't see in the other hosting >> services supporting git (trac, gitorious), especially those for >> creating forks.
>> >> The problem is that the basic account on github is limited to 100 MB, >> and with the peculiar approach adopted by git (distributed source >> control) anyone wishing to participate code to biopython should have >> to create an account on github and in theory create a copy of the >> repository in his space. >> >> Moreover, I think it would be more difficult to use git without the >> tools offered by github, even if we configure a git repository with >> trac or similar on the openbio's servers. I don't know if the git-trac >> plugins has a feature to show all the forks like the one in github. >> Maybe I am just wrong.. but you should ask to the bioruby people how >> they are comfortable with these issues, since they are more expert. >> >> > Have you tried to use bazaar+launchpad? It's really easy and should do > all the tricks you need from a distributed vcs. It also has features for > bugtracking (like trac on github) but i dont' know if we are unhappy with > current setup (bugzilla). I think bzr+launchpad has a number of advantages > over git+github: > -> can work with CVS as a master repository which means that the > transition would > not require going through SVN (although if it would help people from > OBF it is also possible). > -> Anyone used to cvs commands (commit, diff, update etc..) can use bzr without > trouble. You only need to know new "distributed" commands (push,branch) > -> it supports centralized decisions on merging: the possible scenario > is that only a > limited number of people can merge to the main repository (push in bzr > terminology) This is a good discussion. The longer BioPython has taken to move to SVN and the more I've worked with distributed revision control systems, the more inclined I am to say that moving from CVS to SVN is a waste of time. The advantages of DSCMs and the tools that have emerged around them (GitHub, Launchpad, Bitbucket, etc.) 
are too great to ignore; at some point in BioPython's path, it will move over to one of these tools. So why not skip to the current generation of SCM? I'm most a fan of Bazaar VCS, especially given its great integration with Launchpad. If BioPython were to move to hosting its bugs on Launchpad (I believe importing from Bugzilla is possible), I think the benefit becomes significantly greater, due to the great ability to automatically associate branches/commits with bugs. If BioPython chooses to stick with Bugzilla, that feature wouldn't be as useful. (I think the same could be said for using the GitHub + Lighthouse combination.) On that note, I do recommend making sure that the BioPython project moves the code to one of these "social coding" sites (e.g., GitHub, Launchpad, Bitbucket). They bring the "who's working on what" that's necessary for tracking the project as a whole. Finally, none of this is really technically challenging, just socially challenging: we have to find a consensus and then actually follow through and make the move. It's 2009; we need to say goodbye to CVS, acknowledge that we missed our time with SVN, and just go straight to a DSCM and a modern code tracking site. Best, Chris L. 
From eric.talevich at gmail.com Sun Feb 8 19:57:32 2009 From: eric.talevich at gmail.com (Eric Talevich) Date: Sun, 8 Feb 2009 14:57:32 -0500 Subject: [Biopython-dev] SVN migration and Launchpad mirroring In-Reply-To: <128a885f0902081134m255ec4eao21c75aaf08f9d8f5@mail.gmail.com> References: <3f6baf360902061211o4da786b0q5f788efcc63e2bb1@mail.gmail.com> <320fb6e00902070455h72c7bd31w506f5ed52e9633bc@mail.gmail.com> <3f6baf360902072220j5c565449i4c7266046051207f@mail.gmail.com> <5aa3b3570902080847p1a126664k4a76b7f19a0ed987@mail.gmail.com> <8b34ec180902081103r1befae9bt33e9024bd43f37fb@mail.gmail.com> <128a885f0902081134m255ec4eao21c75aaf08f9d8f5@mail.gmail.com> Message-ID: <3f6baf360902081157i733de1cfh9eb10a9acd809be7@mail.gmail.com> A couple more notes on Launchpad: - Checking out from the master branch does not require signing up for a Launchpad account. Using Launchpad's bug tracker, etc. does, but that's optional and expected. - The PPA feature really is cool, at least using it from Ubuntu. The python-biopython package is included in the main distribution, but Biopython releases happen more frequently than every 6 months, so that package gets out of date. With the PPA, interested users can track new releases in the package manager without downloading a fresh copy or checking out the development version with cvs/svn/bzr. 
Cheers, Eric From chris.lasher at gmail.com Sun Feb 8 20:11:10 2009 From: chris.lasher at gmail.com (Chris Lasher) Date: Sun, 8 Feb 2009 15:11:10 -0500 Subject: [Biopython-dev] SVN migration and Launchpad mirroring In-Reply-To: <3f6baf360902081157i733de1cfh9eb10a9acd809be7@mail.gmail.com> References: <3f6baf360902061211o4da786b0q5f788efcc63e2bb1@mail.gmail.com> <320fb6e00902070455h72c7bd31w506f5ed52e9633bc@mail.gmail.com> <3f6baf360902072220j5c565449i4c7266046051207f@mail.gmail.com> <5aa3b3570902080847p1a126664k4a76b7f19a0ed987@mail.gmail.com> <8b34ec180902081103r1befae9bt33e9024bd43f37fb@mail.gmail.com> <128a885f0902081134m255ec4eao21c75aaf08f9d8f5@mail.gmail.com> <3f6baf360902081157i733de1cfh9eb10a9acd809be7@mail.gmail.com> Message-ID: <128a885f0902081211o5db2e00esdc9aa9055412872f@mail.gmail.com> On Sun, Feb 8, 2009 at 2:57 PM, Eric Talevich wrote: > A couple more notes on Launchpad: > > - Checking out from the master branch does not require signing up for a > Launchpad account. Using Launchpad's bug tracker, etc. does, but that's > optional and expected. > > - The PPA feature really is cool, at least using it from Ubuntu. The > python-biopython package is included in the main distribution, but Biopython > releases happen more frequently than every 6 months, so that package gets > out of date. With the PPA, interested users can track new releases in the > package manager without downloading a fresh copy or checking out the > development version with cvs/svn/bzr. - Launchpad can host, but does not require hosting, the repositories for a project on its servers. It will mirror existing repositories hosted at another location, or simply provide the address for the repositories of branches for a project. In essence, it's happy to just track the presence of branches hosted outside of its own service--a major plus. I just went picking through GitHub and can't find a similar feature. Someone more familiar with GitHub might know a way, though. 
Chris From bsouthey at gmail.com Mon Feb 9 15:02:00 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Mon, 09 Feb 2009 09:02:00 -0600 Subject: [Biopython-dev] SVN migration and Launchpad mirroring In-Reply-To: <3f6baf360902081157i733de1cfh9eb10a9acd809be7@mail.gmail.com> References: <3f6baf360902061211o4da786b0q5f788efcc63e2bb1@mail.gmail.com> <320fb6e00902070455h72c7bd31w506f5ed52e9633bc@mail.gmail.com> <3f6baf360902072220j5c565449i4c7266046051207f@mail.gmail.com> <5aa3b3570902080847p1a126664k4a76b7f19a0ed987@mail.gmail.com> <8b34ec180902081103r1befae9bt33e9024bd43f37fb@mail.gmail.com> <128a885f0902081134m255ec4eao21c75aaf08f9d8f5@mail.gmail.com> <3f6baf360902081157i733de1cfh9eb10a9acd809be7@mail.gmail.com> Message-ID: <49904568.1040400@gmail.com> Eric Talevich wrote: > A couple more notes on Launchpad: > > - Checking out from the master branch does not require signing up for a > Launchpad account. Using Launchpad's bug tracker, etc. does, but that's > optional and expected. > What is a good project using Launchpad? Ignoring the arguments about its openness, I have found it to be too slow and difficult to navigate to be useful. Sure, the latter is down to experience, since I use the command line to update numpy and Biopython. > - The PPA feature really is cool, at least using it from Ubuntu. The > python-biopython package is included in the main distribution, but Biopython > releases happen more frequently than every 6 months, so that package gets > out of date. With the PPA, interested users can track new releases in the > package manager without downloading a fresh copy or checking out the > development version with cvs/svn/bzr. > I find the idea of PPA a complete waste of effort and time! Why? Simply because we are not the distribution maintainers for numpy and Biopython. It would be far better to work with the package maintainers to ensure these are up to date as well as any bugs that may get reported or fixed by them.
While I do not use it, I am not sure how relevant that is to just using EasyInstall to provide the latest snapshots which try to avoid any distro or platform requirements. My couple of cents, Bruce From bsouthey at gmail.com Mon Feb 9 16:04:09 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Mon, 09 Feb 2009 10:04:09 -0600 Subject: [Biopython-dev] SVN migration and Launchpad mirroring In-Reply-To: <128a885f0902081134m255ec4eao21c75aaf08f9d8f5@mail.gmail.com> References: <3f6baf360902061211o4da786b0q5f788efcc63e2bb1@mail.gmail.com> <320fb6e00902070455h72c7bd31w506f5ed52e9633bc@mail.gmail.com> <3f6baf360902072220j5c565449i4c7266046051207f@mail.gmail.com> <5aa3b3570902080847p1a126664k4a76b7f19a0ed987@mail.gmail.com> <8b34ec180902081103r1befae9bt33e9024bd43f37fb@mail.gmail.com> <128a885f0902081134m255ec4eao21c75aaf08f9d8f5@mail.gmail.com> Message-ID: <499053F9.60709@gmail.com> Chris Lasher wrote: > On Sun, Feb 8, 2009 at 2:03 PM, Bartek Wilczynski > wrote: > >> On Sun, Feb 8, 2009 at 5:47 PM, Giovanni Marco Dall'Olio >> wrote: >> >> >>> I like github and I think its web interface is one of the best to work >>> with git: it has some tools that I didn't see in the other hosting >>> services supporting git (trac, gitorious), especially those for >>> creating forks. >>> >>> The problem is that the basic account on github is limited to 100 MB, >>> and with the peculiar approach adopted by git (distributed source >>> control) anyone wishing to participate code to biopython should have >>> to create an account on github and in theory create a copy of the >>> repository in his space. >>> >>> Moreover, I think it would be more difficult to use git without the >>> tools offered by github, even if we configure a git repository with >>> trac or similar on the openbio's servers. I don't know if the git-trac >>> plugins has a feature to show all the forks like the one in github. >>> Maybe I am just wrong.. 
but you should ask to the bioruby people how >>> they are comfortable with these issues, since they are more expert. >>> >>> >>> >> Have you tried to use bazaar+launchpad? It's really easy and should do >> all the tricks you need from a distributed vcs. It also has features for >> bugtracking (like trac on github) but i dont' know if we are unhappy with >> current setup (bugzilla). I think bzr+launchpad has a number of advantages >> over git+github: >> -> can work with CVS as a master repository which means that the >> transition would >> not require going through SVN (although if it would help people from >> OBF it is also possible). >> -> Anyone used to cvs commands (commit, diff, update etc..) can use bzr without >> trouble. You only need to know new "distributed" commands (push,branch) >> -> it supports centralized decisions on merging: the possible scenario >> is that only a >> limited number of people can merge to the main repository (push in bzr >> terminology) >> > > This is a good discussion. The longer BioPython has taken to move to > SVN and the more I've worked with distributed revision control > systems, the more inclined I am to say that moving from CVS to SVN is > a waste of time. The advantages of DSCMs and the tools that have > emerged around them (GitHub, Launchpad, Bitbucket, etc.) are too great > to ignore; at some point in BioPython's path, it will move over to one > of these tools. So why not skip to the current generation of SCM? > Do you control your own project with multiple developers? If so, how do you ensure which is the standard version and address conflicts? While I understand the advantages of distributed option, I do not see the end result any different between a distributed and a non-distributed version control system. Even in Linux, the only 'tree' that counts is Linus's as he provides the official versions of the kernel. 
I would argue that same applies to Biopython especially as there appears to be single developers providing their own material to the single tree rather than multiple developers working together. Part of that is legacy in that the core bioinformatics in Biopython is rather complete. > I'm most a fan of Bazaar VCS, especially given its great integration > with Launchpad. If BioPython were to move to hosting its bugs on > Launchpad (I believe importing from Bugzilla is possible), I think the > benefit becomes significantly greater, due to the great ability to > automatically associate branches/commits with bugs. I don't find automatic association between fixes and bugs a reason to change. In numpy's Trac system you can see which version where the bug was closed. > If BioPython > chooses to stick with Bugzilla, that feature wouldn't be as useful. (I > think the same could be said for using the GitHub + Lighthouse > combination.) > > On that note, I do recommend making sure that the BioPython project > moves the code to one of these "social coding" sites (e.g., GitHub, > Launchpad, Bitbucket). They bring the "who's working on what" that's > necessary for tracking the project as a whole. > > Finally, none of this is really technically challenging, just socially > challenging: we have to find a consensus and then actually follow > through and make the move. It's 2009; we need to say goodbye to CVS, > acknowledge that we missed our time with SVN, and just go straight to > a DSCM and a modern code tracking site. > > I think that central question that is lacking so far is how will any of these approaches work with what Biopython is, how Biopython operates and what Biopython provides? It is very easy to argue in general terms on how one system is better than another - lots of web pages on that. But that does not address the needs of the project as a whole. At present, you and others have not specifically addressed how Biopython would benefit from this. 
How do you maintain a stable tree that always should be correct and addresses conflicts (like different coding style and semantics :-) )? As with Linux, people do not scale, so the one of the main goals of any system is that it should minimize effort of maintaining and producing the stable release. How does a user get the 'latest' version if they have bug? How do you even know what version that actually have? How do they avoid picking up other changes that a developer has made in addition to that bug fix? (Not that any system is immune, like developers adding unsupported dependencies or undefined variables as in recent cases in numpy). I also favor the centralized system because I am not a Biopython developer but a tester. So getting the current version is essential to do that and I do not want to have to pull other people's code in to do that especially if it brings in new code not related to a fix. Nor do I think that an extended period of pre-release testing is suitable for Biopython. 
Just some thoughts, Bruce From bartek at rezolwenta.eu.org Mon Feb 9 16:08:17 2009 From: bartek at rezolwenta.eu.org (Bartek Wilczynski) Date: Mon, 9 Feb 2009 17:08:17 +0100 Subject: [Biopython-dev] SVN migration and Launchpad mirroring In-Reply-To: <8b34ec180902090807j46586568k5300a3565516d4bc@mail.gmail.com> References: <3f6baf360902061211o4da786b0q5f788efcc63e2bb1@mail.gmail.com> <320fb6e00902070455h72c7bd31w506f5ed52e9633bc@mail.gmail.com> <3f6baf360902072220j5c565449i4c7266046051207f@mail.gmail.com> <5aa3b3570902080847p1a126664k4a76b7f19a0ed987@mail.gmail.com> <8b34ec180902081103r1befae9bt33e9024bd43f37fb@mail.gmail.com> <128a885f0902081134m255ec4eao21c75aaf08f9d8f5@mail.gmail.com> <3f6baf360902081157i733de1cfh9eb10a9acd809be7@mail.gmail.com> <49904568.1040400@gmail.com> <8b34ec180902090807j46586568k5300a3565516d4bc@mail.gmail.com> Message-ID: <8b34ec180902090808i7968a054nc77fb7190dac50f1@mail.gmail.com> Hi, On Mon, Feb 9, 2009 at 4:02 PM, Bruce Southey wrote: > What is a good project using Launchpad? Probably the biggest one would be MySQL (https://launchpad.net/mysql-server). You may also look at http://bazaar-vcs.org/WhoUsesBzr > Ignoring the arguments about it's openness, I have found it to be too slow > and difficult to navigate to be useful. Sure the latter is experience since > I use command line to update numpy and Biopython. I don't exactly know what you have done, so it's hard to say what is at fault here. There are two separate pieces of software at work here: -bzr, the proper dvcs, which is a command-line tool -launchpad, the website where you can host your code branches and projects I find bzr faster than cvs (although it is considerably slower than git), and I don't find launchpad slow, but as usual with websites, YMMV. > > I find the idea of PPA was complete waste of effort and time! Why? Simply > because we are not the distribution maintainers for numpy and Biopython.
It > would be far better to work with the package maintainers to ensure these are > up to date as well as any bugs that may get reported or fixed by them. While > I do not use it, I am not sure how relevant that is to just using > EasyInstall to provide the latest snapshots which try to avoid any distro or > platform requirements. I don't think that PPA is an important thing for biopython. It might be a nice addition for those who use ubuntu (I do) and there is not much effort required, once there is a current bzr branch of biopython available in launchpad. However it doesn't need to be an official one, so It's not a major issue now. -- Bartek Wilczynski ================== Postdoctoral fellow EMBL, Furlong group Meyerhoffstrasse 1, 69012 Heidelberg, Germany tel: +49 6221 387 8433 From bartek at rezolwenta.eu.org Mon Feb 9 16:24:59 2009 From: bartek at rezolwenta.eu.org (Bartek Wilczynski) Date: Mon, 9 Feb 2009 17:24:59 +0100 Subject: [Biopython-dev] SVN migration and Launchpad mirroring In-Reply-To: <8b34ec180902090824t1acacbd7lf0377202c03ee6bf@mail.gmail.com> References: <3f6baf360902061211o4da786b0q5f788efcc63e2bb1@mail.gmail.com> <320fb6e00902070455h72c7bd31w506f5ed52e9633bc@mail.gmail.com> <3f6baf360902072220j5c565449i4c7266046051207f@mail.gmail.com> <5aa3b3570902080847p1a126664k4a76b7f19a0ed987@mail.gmail.com> <8b34ec180902081103r1befae9bt33e9024bd43f37fb@mail.gmail.com> <128a885f0902081134m255ec4eao21c75aaf08f9d8f5@mail.gmail.com> <499053F9.60709@gmail.com> <8b34ec180902090824t1acacbd7lf0377202c03ee6bf@mail.gmail.com> Message-ID: <8b34ec180902090824k20988294hbff5b9c0525c486e@mail.gmail.com> Hi, On Mon, Feb 9, 2009 at 5:04 PM, Bruce Southey wrote: >> >> This is a good discussion. The longer BioPython has taken to move to >> SVN and the more I've worked with distributed revision control >> systems, the more inclined I am to say that moving from CVS to SVN is >> a waste of time. 
The advantages of DSCMs and the tools that have >> emerged around them (GitHub, Launchpad, Bitbucket, etc.) are too great >> to ignore; at some point in BioPython's path, it will move over to one >> of these tools. So why not skip to the current generation of SCM? >> > > Do you control your own project with multiple developers? > If so, how do you ensure which is the standard version and address > conflicts? > > While I understand the advantages of distributed option, I do not see the > end result any different between a distributed and a non-distributed version > control system. Even in Linux, the only 'tree' that counts is Linus's as he > provides the official versions of the kernel. I would argue that same > applies to Biopython especially as there appears to be single developers > providing their own material to the single tree rather than multiple > developers working together. Part of that is legacy in that the core > bioinformatics in Biopython is rather complete. > That's the point. Linux is a perfect example of how a large project can benefit from using a distributed vcs. The official branch is the one which is linked from the biopython.org website. But anyone can _easily_ branch it on his/her own, make changes to it and submit it for merging with the trunk, or just publish it so people can use that branch. >> I'm most a fan of Bazaar VCS, especially given its great integration >> with Launchpad. If BioPython were to move to hosting its bugs on >> Launchpad (I believe importing from Bugzilla is possible), I think the >> benefit becomes significantly greater, due to the great ability to >> automatically associate branches/commits with bugs. > > I don't find automatic association between fixes and bugs a reason to > change. In numpy's Trac system you can see which version where the bug was > closed. I think that using launchpad for bugtracking is a separate issue. There are different options here.
The good thing about launchpad+bzr is that it allows this, so it won't be a problem if we decide to switch from bugzilla to something else. But it is a separate decision. >> Finally, none of this is really technically challenging, just socially >> challenging: we have to find a consensus and then actually follow >> through and make the move. It's 2009; we need to say goodbye to CVS, >> acknowledge that we missed our time with SVN, and just go straight to >> a DSCM and a modern code tracking site. >> >> > > I think that central question that is lacking so far is how will any of > these approaches work with what Biopython is, how Biopython operates and > what Biopython provides? > It is very easy to argue in general terms on how one system is better than > another - lots of web pages on that. But that does not address the needs of > the project as a whole. At present, you and others have not specifically > addressed how Biopython would benefit from this. > > How do you maintain a stable tree that always should be correct and > addresses conflicts (like different coding style and semantics :-) )? > As with Linux, people do not scale, so the one of the main goals of any > system is that it should minimize effort of maintaining and producing the > stable release. > > How does a user get the 'latest' version if they have bug? How do you even > know what version that actually have? > How do they avoid picking up other changes that a developer has made in > addition to that bug fix? (Not that any system is immune, like developers > adding unsupported dependencies or undefined variables as in recent cases in > numpy). > > I also favor the centralized system because I am not a Biopython developer > but a tester. So getting the current version is essential to do that and I > do not want to have to pull other people's code in to do that especially if > it brings in new code not related to a fix.
Nor do I think that an extended > period of pre-release testing is suitable for Biopython. > Absolutely right. I think there is a misconception about the "distributed" part of git or bzr. I don't think anybody was proposing some guerrilla-style development with no official releases and code-base. Using dvcs is for enabling people to contribute effectively rather than because it makes centralized development easier. The key thing here is that bzr/launchpad (or git+github, but I'll stick to what I know for the sake of this example) does not _need_ to be the main repository for biopython. I think that possible advantages are not so much in using it internally, but making it easier for people to branch and merge. Having an "official" bzr branch of biopython which is automatically updated from current main vcs (currently CVS) makes branching as easy as writing: bzr branch lp:biopython After someone has made a number of changes (and commits to his local vcs) and is happy with the result you just do bzr send lp:biopython and the maintainer of the branch gets notified about a submission of a patch. Then he can decide to merge it into trunk (without losing any change history) or refuse. Once the changes are merged into the official bzr branch it's easy to commit them back to CVS. After a while, if people are happy with using bzr instead of cvs, we could switch to bzr to avoid synchronizing with CVS, but this is not necessary. It's all about making it easier for people to get involved. Currently the only possibility to participate is to send patches through bugzilla or the mailing list, but merging these into cvs is a nightmare.
While in bzr (or git) you can develop "on a branch" locally, without disturbing anyone, and then merge with trunk without losing your development history (virtually impossible in cvs or svn) -- Bartek Wilczynski ================== Postdoctoral fellow EMBL, Furlong group Meyerhoffstrasse 1, 69012 Heidelberg, Germany tel: +49 6221 387 8433 From biopython at maubp.freeserve.co.uk Mon Feb 9 16:29:37 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 9 Feb 2009 16:29:37 +0000 Subject: [Biopython-dev] SVN migration and Launchpad mirroring In-Reply-To: <499053F9.60709@gmail.com> References: <3f6baf360902061211o4da786b0q5f788efcc63e2bb1@mail.gmail.com> <320fb6e00902070455h72c7bd31w506f5ed52e9633bc@mail.gmail.com> <3f6baf360902072220j5c565449i4c7266046051207f@mail.gmail.com> <5aa3b3570902080847p1a126664k4a76b7f19a0ed987@mail.gmail.com> <8b34ec180902081103r1befae9bt33e9024bd43f37fb@mail.gmail.com> <128a885f0902081134m255ec4eao21c75aaf08f9d8f5@mail.gmail.com> <499053F9.60709@gmail.com> Message-ID: <320fb6e00902090829h5f4b02e6xcad41f9b47c9be68@mail.gmail.com> On Mon, Feb 9, 2009 at 4:04 PM, Bruce Southey wrote: > > I also favor the centralized system because I am not a Biopython developer > but a tester. So getting the current version is essential to do that and I > do not want to have to pull other people's code in to do that especially if > it brings in new code not related to a fix. > In case there had been any confusion here, yes, even with a distributed source code system, we do need some centralization. i.e. Even if we do end up with several developers having their own git branches, we would have to have an "official" tree used for the releases and installers published on Biopython.org (and this official tree could potentially be CVS, SVN or git based). Bartek has just written an email saying more or less the same thing.
Peter From bugzilla-daemon at portal.open-bio.org Mon Feb 9 16:46:10 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 9 Feb 2009 11:46:10 -0500 Subject: [Biopython-dev] [Bug 2751] PDBParser crashes on empty tempFactor fields In-Reply-To: Message-ID: <200902091646.n19GkACh021770@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2751 ------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-09 11:46 EST ------- (In reply to comment #3) > Created an attachment (id=1215) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1215&action=view) [details] > PDB file with some missing bfactor fields > Where did this come from? The official 1AD5 file from the PDB has valid bfactor fields present: http://www.rcsb.org/pdb/download/downloadFile.do?fileFormat=pdb&compression=NO&structureId=1AD5 -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From dalloliogm at gmail.com Mon Feb 9 16:59:36 2009 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Mon, 9 Feb 2009 17:59:36 +0100 Subject: [Biopython-dev] SVN migration and Launchpad mirroring In-Reply-To: <499053F9.60709@gmail.com> References: <3f6baf360902061211o4da786b0q5f788efcc63e2bb1@mail.gmail.com> <320fb6e00902070455h72c7bd31w506f5ed52e9633bc@mail.gmail.com> <3f6baf360902072220j5c565449i4c7266046051207f@mail.gmail.com> <5aa3b3570902080847p1a126664k4a76b7f19a0ed987@mail.gmail.com> <8b34ec180902081103r1befae9bt33e9024bd43f37fb@mail.gmail.com> <128a885f0902081134m255ec4eao21c75aaf08f9d8f5@mail.gmail.com> <499053F9.60709@gmail.com> Message-ID: <5aa3b3570902090859q5ea82e3au87d94a708e3b2b74@mail.gmail.com> On Mon, Feb 9, 2009 at 5:04 PM, Bruce Southey wrote: > Chris Lasher wrote: >> > > Do you control your own project with multiple developers? 
> If so, how do you ensure which is the standard version and address > conflicts? > > While I understand the advantages of distributed option, I do not see the > end result any different between a distributed and a non-distributed version > control system. Let's say I want to develop a new module to read fasta sequence, alternative to the current one. With a DVCS, I would fork the official biopython branch, and start working on it. While I am changing things and committing everything to my private branch, the official biopython developers keep committing changes, on the official branch. When I will be sure that my SeqIO personalization is ready, I will send a merge request to you, and it will be easy to know: - which was the exact version and code of biopython when I created my branch; - which commits have been made in the official branch while I was working on mine, so it will be easier to determine how to merge them; - moreover, if my changes will be accepted, the whole history of my private branch will be included in biopython (and it could be useful). Imagine how to do the same with a normal VCS. It would be similar: I would create a local copy of biopython on my computer, and start working on that (since I don't have access to the official repository). When my new module will be ready, I will send the changes to the official biopython branch through bugzilla: the problem is that then, we will have lost the information on which was the version of biopython when I created my local copy, and it will be more difficult to merge it. Have a look at this post: - http://github.com/blog/39-say-hello-to-the-network-graph-visualizer > Even in Linux, the only 'tree' that counts is Linus's as he > provides the official versions of the kernel. I would argue that same > applies to Biopython especially as there appears to be single developers > providing their own material to the single tree rather than multiple > developers working together. 
Part of that is legacy in that the core > bioinformatics in Biopython is rather complete. > >> I'm most a fan of Bazaar VCS, especially given its great integration >> with Launchpad. If BioPython were to move to hosting its bugs on >> Launchpad (I believe importing from Bugzilla is possible), I think the >> benefit becomes significantly greater, due to the great ability to >> automatically associate branches/commits with bugs. > > I don't find automatic association between fixes and bugs a reason to > change. In numpy's Trac system you can see which version where the bug was > closed. > >> If BioPython >> chooses to stick with Bugzilla, that feature wouldn't be as useful. (I >> think the same could be said for using the GitHub + Lighthouse >> combination.) >> >> On that note, I do recommend making sure that the BioPython project >> moves the code to one of these "social coding" sites (e.g., GitHub, >> Launchpad, Bitbucket). They bring the "who's working on what" that's >> necessary for tracking the project as a whole. >> >> Finally, none of this is really technically challenging, just socially >> challenging: we have to find a consensus and then actually follow >> through and make the move. It's 2009; we need to say goodbye to CVS, >> acknowledge that we missed our time with SVN, and just go straight to >> a DSCM and a modern code tracking site. >> >> > > I think that central question that is lacking so far is how will any of > these approaches work with what Biopython is, how Biopython operates and > what Biopython provides? > It is very easy to argue in general terms on how one system is better than > another - lots of web pages on that. But that does not address the needs of > the project as a whole. At present, you and others have not specifically > addressed how Biopython would benefit from this. > > How do you maintain a stable tree that always should be correct and > addresses conflicts (like different coding style and semantics :-) )? 
> As with Linux, people do not scale, so the one of the main goals of any > system is that it should minimize effort of maintaining and producing the > stable release. > > How does a user get the 'latest' version if they have bug? How do you even > know what version that actually have? > How do they avoid picking up other changes that a developer has made in > addition to that bug fix? (Not that any system is immune, like developers > adding unsupported dependencies or undefined variables as in recent cases in > numpy). > > I also favor the centralized system because I am not a Biopython developer > but a tester. So getting the current version is essential to do that and I > do not want to have to pull other people's code in to do that especially if > it brings in new code not related to a fix. Nor do I think that an extended > period of pre-release testing is suitable for Biopython. > > Just some thoughts, > Bruce > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > -- My blog on bioinformatics (now in English): http://bioinfoblog.it From bugzilla-daemon at portal.open-bio.org Mon Feb 9 17:20:46 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 9 Feb 2009 12:20:46 -0500 Subject: [Biopython-dev] [Bug 2751] PDBParser crashes on empty tempFactor fields In-Reply-To: Message-ID: <200902091720.n19HKkrQ031145@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2751 ------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-09 12:20 EST ------- Hi Eric, Could you try out Bio/PDB/PDBParser.py CVS revision 1.25 please? This allows missing occupancy and B factor (temp factor) fields in permissive mode, and the exception or printed error message does include the line number. 
I can appreciate that getting these warnings hundreds of times from a single file would be annoying, so perhaps if the fields are just blank a single warning should be given? If you can find any official PDB examples which do this, or get clarification regarding the "legality" of omitting these fields, then I would be happy to change this code. Where are you getting your PDB files from? Note that I have not attempted to deal with ANISOU, SIGUIJ or SIGATM records. Do you have any examples of this, or were these changes just defensive programming? Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Feb 9 19:08:29 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 9 Feb 2009 14:08:29 -0500 Subject: [Biopython-dev] [Bug 2751] PDBParser crashes on empty tempFactor fields In-Reply-To: Message-ID: <200902091908.n19J8TNJ022487@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2751 ------- Comment #8 from eric.talevich at gmail.com 2009-02-09 14:08 EST ------- (In reply to comment #7) Works for me. Thanks! > I can appreciate that getting these warnings hundreds of times from a single > file would be annoying, so perhaps if the fields are just blank a single > warning should be given? I haven't explored the rest of Biopython's internals yet -- is there a general logging/warning system where verbosity is configured globally? Another issue: These warnings are printed to standard out, rather than standard error; that would screw up a pipeline. Tracebacks, for instance, are printed on standard error. I assume this complain->stdout situation is the case across the codebase -- should I file a separate bug for that? 
> If you can find any official PDB examples which do > this, or get clarification regarding the "legality" of omitting these fields, > then I would be happy to change this code. Where are you getting your PDB > files from? Another person in my lab reported the problem. Some other program extracted the 'B' chain from the full PDB file to create this one; I don't know which one, but I believe it's out in the wild, rather than a home-grown script. Scientific Python's PDB parser handles the file without complaint. > Note that I have not attempted to deal with ANISOU, SIGUIJ or SIGATM records. > Do you have any examples of this, or were these changes just defensive > programming? I have no examples, it's just defensive. Using try_float() instead of float() everywhere re-raises any ValueExceptions as PDBConstructionExceptions, and only eats the exception if a default value is supplied. Some Scheme sympathies showing, I guess -- it's a closure that generally works the same as the float constructor, but with our own error handling. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
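A minimal sketch of the try_float() idea described in comment #8 above, for readers who want to see the shape of it. This is not the patch attached to the bug. The factory name make_try_float, the _NO_DEFAULT sentinel and the message text are hypothetical, and PDBConstructionException is redefined locally as a stand-in so the snippet runs without Biopython installed.

```python
class PDBConstructionException(Exception):
    """Local stand-in for Bio.PDB's exception, so this sketch runs alone."""


# Sentinel so that None (or 0.0) can still be passed as a real default.
_NO_DEFAULT = object()


def make_try_float(line_number):
    """Return a float()-like converter that remembers the current line number.

    The returned function is a closure over line_number (hypothetical design).
    """
    def try_float(text, default=_NO_DEFAULT):
        try:
            return float(text)
        except ValueError:
            # Swallow the error only if the caller supplied a default.
            if default is not _NO_DEFAULT:
                return default
            # Otherwise re-raise as the parser's own exception type.
            raise PDBConstructionException(
                "Invalid or missing number %r at line %i" % (text, line_number))
    return try_float


# Usage: a blank B-factor field falls back to 0.0, a blank coordinate is an error.
try_float = make_try_float(line_number=17)
bfactor = try_float("      ", default=0.0)
x_coord = try_float("  11.104")
```

The point of the closure is that callers keep writing something that looks like float(), while the error handling (line number, exception type, optional default) stays in one place.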
From bugzilla-daemon at portal.open-bio.org Mon Feb 9 20:36:26 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 9 Feb 2009 15:36:26 -0500 Subject: [Biopython-dev] [Bug 2751] PDBParser crashes on empty tempFactor fields In-Reply-To: Message-ID: <200902092036.n19KaQPt010719@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2751 biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

------- Comment #9 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-09 15:36 EST ------- (In reply to comment #8) > > I haven't explored the rest of Biopython's internals yet -- is > there a general logging/warning system where verbosity is > configured globally? No, this is specific to Bio.PDB, presumably a reaction to the number of technically invalid but still useful PDB files one has to deal with. > Another issue: These warnings are printed to standard out, rather > than standard error; that would screw up a pipeline. Tracebacks, > for instance, are printed on standard error. I assume this > complain->stdout situation is the case across the > codebase -- should I file a separate bug for that? Please do - but keep it focused on Bio.PDB as I can't think of any other modules which do anything similar off hand. > > If you can find any official PDB examples which do this, or get > > clarification regarding the "legality" of omitting these fields, > > then I would be happy to change this code. Where are you > > getting your PDB files from? > > Another person in my lab reported the problem. Some other program > extracted the 'B' chain from the full PDB file to create this one; > I don't know which one, but I believe it's out in the wild, rather > than a home-grown script. Fair enough.
If you can chase that up, it will make other people's lives a tiny bit easier in the future - assuming my reading of the PDB format is valid that is ;) You should be able to work with the full PDB file in Biopython, and just look at the one chain you are interested in. > Scientific Python's PDB parser handles the file without complaint. If I was convinced missing occupancy or B-factors were valid then I agree Biopython shouldn't "complain" either. > > Note that I have not attempted to deal with ANISOU, SIGUIJ or SIGATM > > records. Do you have any examples of this, or were these changes just > > defensive programming? > > I have no examples, it's just defensive. Using try_float() instead of > float() everywhere re-raises any ValueExceptions as > PDBConstructionExceptions, and only eats the exception if a default value > is supplied. Some Scheme sympathies showing, I guess -- it's a closure that > generally works the same as the float constructor, but with our own error > handling. Fair enough. I didn't want to make any "invasive" changes without the original author's input. Anyway, marking this bug as fixed. Thank you Eric! Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
From eric.talevich at gmail.com Mon Feb 9 20:39:04 2009 From: eric.talevich at gmail.com (Eric Talevich) Date: Mon, 9 Feb 2009 15:39:04 -0500 Subject: [Biopython-dev] SVN migration and Launchpad mirroring In-Reply-To: <499053F9.60709@gmail.com> References: <3f6baf360902061211o4da786b0q5f788efcc63e2bb1@mail.gmail.com> <320fb6e00902070455h72c7bd31w506f5ed52e9633bc@mail.gmail.com> <3f6baf360902072220j5c565449i4c7266046051207f@mail.gmail.com> <5aa3b3570902080847p1a126664k4a76b7f19a0ed987@mail.gmail.com> <8b34ec180902081103r1befae9bt33e9024bd43f37fb@mail.gmail.com> <128a885f0902081134m255ec4eao21c75aaf08f9d8f5@mail.gmail.com> <499053F9.60709@gmail.com> Message-ID: <3f6baf360902091239v5988749cm1f48c21d2f19ca9b@mail.gmail.com> Mark Shuttleworth blogged about why Ubuntu chose bzr awhile back: * "Choose lossless VCS tools if you have that luxury" -- http://www.markshuttleworth.com/archives/125 * "Merging is the key to software developer collaboration" -- http://www.markshuttleworth.com/archives/126 The use case for DVCS is what I assume usually happens when a new parser or other module is added to Biopython -- an outside developer has some sizeable chunk of useful code and needs to integrate it with the trunk. "Code bombs" are something the Linux kernel deals with constantly; I have no idea how they'd deal with it in a centralized system. (Nobody does; they never did use cvs.) My lab uses bzr now. I have it set up to work like a centralized repository in general; I'm the only one who uses the distributed features at the moment, switching between a laptop and a workstation. The merging and renaming support is much better than svn's, and it was easier to set up. It feels kind of crazy to me now to add a significant new change to a project's trunk in one monolithic commit, and I feel the pain of any maintaner who has to apply a patch set to the trunk after the developer's branch and the trunk have diverged. 
Regarding other concerns: - For update operations more advanced than just pulling the latest revision from the trunk, in bzr et al., it's possible to cherry-pick specific revisions from other developers. - Similarly, it's possible to only merge completed bug fixes and enhancements to the trunk, skipping any new/unstable work a developer has embarked on in their branch. That's why Linux is now permanently on version 2.6.X -- basically every commit can be made stable, so there's no need for a new unstable series in the trunk. - Testers should enjoy the ability to pull specific changes while ignoring unrelated code, even in the same file -- the distributed systems all have this capability (shelve). - PPAs are less useful for Python packages; they just let you manage everything from apt instead of easy_install. I use a PPA to keep my lab's machines on the same version of bzr despite having different versions of Ubuntu. I don't use easy_install for anything right now because it scares me. Best, Eric From bugzilla-daemon at portal.open-bio.org Mon Feb 9 20:58:42 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 9 Feb 2009 15:58:42 -0500 Subject: [Biopython-dev] [Bug 2754] New: Bio.PDB: Parse warnings should print to stderr, not stdout Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2754 Summary: Bio.PDB: Parse warnings should print to stderr, not stdout Product: Biopython Version: 1.49 Platform: PC OS/Version: Linux Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: eric.talevich at gmail.com In Bio.PDB.PDBParser, and perhaps its neighbors, warnings raised while parsing in permissive mode are printed to standard output. In general, messages like this should be printed to standard error to avoid sending garbage to the next program in a pipeline. 
Recommendation: In PDBParser._handle_PDB_exception, change the print statements to include ">>sys.stderr". Also track down other print statements in the PDB module and send any other warnings to sys.stderr as well. (Grepping Bio/PDB/*.py should work.) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bsouthey at gmail.com Mon Feb 9 21:39:06 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Mon, 09 Feb 2009 15:39:06 -0600 Subject: [Biopython-dev] SVN migration and Launchpad mirroring In-Reply-To: <3f6baf360902091239v5988749cm1f48c21d2f19ca9b@mail.gmail.com> References: <3f6baf360902061211o4da786b0q5f788efcc63e2bb1@mail.gmail.com> <320fb6e00902070455h72c7bd31w506f5ed52e9633bc@mail.gmail.com> <3f6baf360902072220j5c565449i4c7266046051207f@mail.gmail.com> <5aa3b3570902080847p1a126664k4a76b7f19a0ed987@mail.gmail.com> <8b34ec180902081103r1befae9bt33e9024bd43f37fb@mail.gmail.com> <128a885f0902081134m255ec4eao21c75aaf08f9d8f5@mail.gmail.com> <499053F9.60709@gmail.com> <3f6baf360902091239v5988749cm1f48c21d2f19ca9b@mail.gmail.com> Message-ID: <4990A27A.9060500@gmail.com> Eric Talevich wrote: > Mark Shuttleworth blogged about why Ubuntu chose bzr awhile back: > > * "Choose lossless VCS tools if you have that luxury" -- > http://www.markshuttleworth.com/archives/125 > > * "Merging is the key to software developer collaboration" -- > http://www.markshuttleworth.com/archives/126 > (Yeah but I am not a fan of his writing.) Python PEP 0374 "Migrating from svn to a distributed VCS" makes some points. http://www.python.org/dev/peps/pep-0374/ > The use case for DVCS is what I assume usually happens when a new parser or > other module is added to Biopython -- an outside developer has some sizeable > chunk of useful code and needs to integrate it with the trunk. 
"Code bombs" > are something the Linux kernel deals with constantly; I have no idea how > they'd deal with it in a centralized system. (Nobody does; they never did > use cvs.) > They complain rather loudly! Really there is a development process that tries to avoid this especially the 'release early, release often' adage. Hopefully this should not happen with Biopython... > My lab uses bzr now. I have it set up to work like a centralized repository > in general; I'm the only one who uses the distributed features at the > moment, switching between a laptop and a workstation. The merging and > renaming support is much better than svn's, and it was easier to set up. It > feels kind of crazy to me now to add a significant new change to a project's > trunk in one monolithic commit, and I feel the pain of any maintaner who has > to apply a patch set to the trunk after the developer's branch and the trunk > have diverged. > How do you avoid this pain? > > Regarding other concerns: > > - For update operations more advanced than just pulling the latest revision > from the trunk, in bzr et al., it's possible to cherry-pick specific > revisions from other developers. > But that requires some degree of advanced knowledge. How easy is it to revert a revision, especially down the road? > - Similarly, it's possible to only merge completed bug fixes and > enhancements to the trunk, skipping any new/unstable work a developer has > embarked on in their branch. That's why Linux is now permanently on version > 2.6.X -- basically every commit can be made stable, so there's no need for a > new unstable series in the trunk. > I do not agree here because the Linux kernel process doesn't work that way see Corbet's take: http://ldn.linuxfoundation.org/book/how-participate-linux-community You have the very frequent merge windows with a testing period, the new staging tree, the -mm tree, and a solid team of 'lieutenants' that reduce many of the problems. 
In addition, you have the stable tree for major bugs. Would Biopython need to do something similar, like having merge windows and a stable tree? > - Testers should enjoy the ability to pull specific changes while ignoring > unrelated code, even in the same file -- the distributed systems all have > this capability (shelve). > Again, this requires some advanced knowledge. But I am not sure how much time I would want to waste on doing that if it does not lead to something being included in the main branch, compared to code that does enter the main branch. (Yes, it is somewhat selfish.) > - PPAs are less useful for Python packages; they just let you manage > everything from apt instead of easy_install. I use a PPA to keep my lab's > machines on the same version of bzr despite having different versions of > Ubuntu. I don't use easy_install for anything right now because it scares > me. > > I know but I don't use either (nor any rpm-based system for that matter)! > Best, > Eric Bruce From bugzilla-daemon at portal.open-bio.org Tue Feb 10 02:32:24 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 9 Feb 2009 21:32:24 -0500 Subject: [Biopython-dev] [Bug 2752] Context management for Bio.Entrez handles In-Reply-To: Message-ID: <200902100232.n1A2WO4w005726@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2752 ------- Comment #1 from mdehoon at ims.u-tokyo.ac.jp 2009-02-09 21:32 EST ------- Could you write a patch to Bio.Entrez? Also, with the proposed modifications does anything change for current users of Bio.Entrez (i.e., people who don't use the "with" statement)?
From bugzilla-daemon at portal.open-bio.org Tue Feb 10 04:02:54 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 9 Feb 2009 23:02:54 -0500 Subject: [Biopython-dev] [Bug 2752] Context management for Bio.Entrez handles In-Reply-To: Message-ID: <200902100402.n1A42s0M027862@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2752 eric.talevich at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED ------- Comment #2 from eric.talevich at gmail.com 2009-02-09 23:02 EST ------- I'll take care of it this week. As I'm picturing this, existing users should be unaffected because the new __enter__ and __exit__ methods won't be called. The class-vs-function distinction will be invisible, unless some flagrant isinstance() testing or other metaprogramming is occurring, and I don't know why anyone would do that with these particular functions. This also means some classes will have lowercase names, contradicting the usual style. I hope that's OK; it's for a good cause. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Feb 10 10:12:12 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 10 Feb 2009 05:12:12 -0500 Subject: [Biopython-dev] [Bug 2752] Context management for Bio.Entrez handles In-Reply-To: Message-ID: <200902101012.n1AACC8N026482@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2752 ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-10 05:12 EST ------- (In reply to comment #0) > To make Bio.Entrez work this way, we could just add @contextmanager decorators > to efetch() and the others, ... 
Isn't it simpler just to change our Bio.Entrez._open function instead of all the Bio.Entrez.e* functions? These Bio.Entrez functions are just wrappers for urllib (via our _open function). From reading the example at the end of this page, it looks like closing a urllib handle is left to the user: http://www.python.org/doc/2.5.1/whatsnew/pep-343.html

e.g.

    import urllib, sys
    from contextlib import closing

    with closing(urllib.urlopen('http://www.yahoo.com')) as f:
        for line in f:
            sys.stdout.write(line)

In the short term (without altering Biopython) using this should work, shouldn't it?

    from contextlib import closing
    from Bio import Entrez

    def write_gbk(gi):
        with open("gi%s.gbk" % gi, 'w+') as outfile:
            with closing(Entrez.efetch(db='protein', rettype='genbank', id=gi)) as gbk:
                text = gbk.read()
                outfile.write(text)
        print "Wrote", gi

Furthermore, rather than messing about with a factory class (which sounds overly complicated), can we just use contextlib.closing ourselves in the Bio.Entrez._open function? This approach should also make it easy to keep backwards compatibility with older versions of Python. i.e. At the end of _open, replace:

    return uhandle

with:

    try :
        from contextlib import closing
        return closing(uhandle)
    except ImportError :
        return uhandle

(I haven't tested this yet)

Alternatively, we could add the __enter__ and __exit__ methods to the Bio.File.UndoHandle object instead (which would benefit any code using them, not just Bio.Entrez).
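One point that may be worth checking before changing _open as proposed above: the object returned by contextlib.closing() only implements the context-manager protocol and does not forward read() or other file methods, so code that uses the return value without a "with" statement would likely be affected. The following small sketch illustrates this in Python 3 syntax; io.StringIO stands in for the network handle, and nothing here is Biopython code.

```python
import io
from contextlib import closing

# Stand-in for the handle that _open would return.
handle = io.StringIO("example record\n")
wrapped = closing(handle)

# The wrapper itself has no file methods; only __enter__/__exit__.
print(hasattr(wrapped, "read"))

# Inside a with-block, the original handle is handed back and closed on exit.
with wrapped as inner:
    text = inner.read()

print(inner is handle)
print(handle.closed)
```

This suggests that the alternative mentioned at the end of the comment (giving the handle object its own __enter__ and __exit__ methods) would be the less disruptive route for existing callers.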
From mjldehoon at yahoo.com Tue Feb 10 10:28:08 2009 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Tue, 10 Feb 2009 02:28:08 -0800 (PST) Subject: [Biopython-dev] test_Ace, test_Nexus, test_Phd In-Reply-To: <240911.28388.qm@web62402.mail.re1.yahoo.com> Message-ID: <831036.59343.qm@web62406.mail.re1.yahoo.com> I've converted these three tests to pure unittest-style tests. --Michiel --- On Tue, 2/3/09, Michiel de Hoon wrote: > From: Michiel de Hoon > Subject: [Biopython-dev] test_Ace, test_Nexus, test_Phd > To: biopython-dev at biopython.org > Date: Tuesday, February 3, 2009, 7:21 AM > These three tests currently are written as a combination of > a unittest-based test and a print-and-compare test. That is, > they contain classes deriving from unittest.TestCase, but > then print out stuff that should get compared to the output > file. However, run_tests.py assumes that they are true > unittest-style tests, so the comparison is never done. > > Does anybody mind if I convert these three to pure > print-and-compare or pure unittest-style tests? test_Ace.py > and test_Nexus.py produce lots of output, so I'm tempted > to go with a print-and-compare test there; test_Phd.py might > work well as a unittest-style test. > > --Michiel. > > > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From biopython at maubp.freeserve.co.uk Tue Feb 10 10:35:59 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 10 Feb 2009 10:35:59 +0000 Subject: [Biopython-dev] test_Ace, test_Nexus, test_Phd In-Reply-To: <831036.59343.qm@web62406.mail.re1.yahoo.com> References: <240911.28388.qm@web62402.mail.re1.yahoo.com> <831036.59343.qm@web62406.mail.re1.yahoo.com> Message-ID: <320fb6e00902100235m5dcd72e1reb9e4e7e0ea3b3e6@mail.gmail.com> On Tue, Feb 10, 2009 at 10:28 AM, Michiel de Hoon wrote: > I've converted these three tests to pure unittest-style tests. 
> > --Michiel Wow - generating all those assert lines must have taken some time (or a clever script)! The test_Nexus tearDown used to make sure the temp output files were removed. This is important on Windows which does not do this automatically. I see you now allocate "random" filenames using tempfile.NamedTemporaryFile(...) so presumably we would need to record these so that the tearDown method knows what temp files to remove. Peter From mjldehoon at yahoo.com Tue Feb 10 11:25:13 2009 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Tue, 10 Feb 2009 03:25:13 -0800 (PST) Subject: [Biopython-dev] test_Ace, test_Nexus, test_Phd In-Reply-To: <320fb6e00902100235m5dcd72e1reb9e4e7e0ea3b3e6@mail.gmail.com> Message-ID: <366127.53671.qm@web62408.mail.re1.yahoo.com> > The test_Nexus tearDown used to make sure the temp output > files were removed. This is important on Windows which > does not do this automatically. I see you now allocate > "random" filenames using > tempfile.NamedTemporaryFile(...) so presumably we would > need to record these so that the tearDown method knows > what temp files to remove. From reading the Python documentation, the file created by tempfile.NamedTemporaryFile is removed automatically when the file handle is closed, even on Windows.
--Michiel From bartek at rezolwenta.eu.org Tue Feb 10 12:21:41 2009 From: bartek at rezolwenta.eu.org (Bartek Wilczynski) Date: Tue, 10 Feb 2009 13:21:41 +0100 Subject: [Biopython-dev] SVN migration and Launchpad mirroring In-Reply-To: <4990A27A.9060500@gmail.com> References: <3f6baf360902061211o4da786b0q5f788efcc63e2bb1@mail.gmail.com> <320fb6e00902070455h72c7bd31w506f5ed52e9633bc@mail.gmail.com> <3f6baf360902072220j5c565449i4c7266046051207f@mail.gmail.com> <5aa3b3570902080847p1a126664k4a76b7f19a0ed987@mail.gmail.com> <8b34ec180902081103r1befae9bt33e9024bd43f37fb@mail.gmail.com> <128a885f0902081134m255ec4eao21c75aaf08f9d8f5@mail.gmail.com> <499053F9.60709@gmail.com> <3f6baf360902091239v5988749cm1f48c21d2f19ca9b@mail.gmail.com> <4990A27A.9060500@gmail.com> Message-ID: <8b34ec180902100421o1680735dsd68d890d8ccfbf4f@mail.gmail.com> Hi, On Mon, Feb 9, 2009 at 10:39 PM, Bruce Southey wrote: > Python PEP 0374 "Migrating from svn to a distributed VCS" makes some points. > http://www.python.org/dev/peps/pep-0374/ Excellent link. It shows quite a thorough comparison between possible DVCSs. It also shows that more people are thinking about switching to a DVCS because SVN is not much better than CVS. The worrying part for me was the benchmarks showing that bzr is lagging behind mercurial and git in terms of speed. They mention that the benchmarks were done with an old version of bzr and there seems to be quite a lot of work on bzr performance, so I decided to see how it works with the actual biopython tree and current bzr. Below are my first impressions. An important note here is that I'm not experienced in converting fairly large projects from CVS to any DVCS and what I've done might not be an optimal setup. I've taken the whole biopython CVS tree with complete version history (~3500 commits) and converted it to a bzr branch using tailor. It took about 2-3 hours, but it needs to be done only once.
The nice thing about tailor is that it gives you a directory structure with both bzr and cvs files so it can be later used for committing stuff back to the CVS tree as well as getting new changes from CVS. Once I had that, I could publish my private branch of biopython to launchpad (it took about 10s). Now, if anyone is interested in test-driving bazaar+launchpad with biopython, they can just branch it to their own computer (you don't need any account for that, just bzr installed): bzr branch lp:~bartek/junk/biopython I did that (branch) on a different computer (~2min). Now one can start modifying code. I've done some changes to the Bio.Motif code (add a method, commit locally, fix a small bug in it, commit again, test) and pushed the changes to the branch on launchpad. Commits are quick (~3s), push takes about a minute, but this is including a scan of the whole tree, so it should not take much longer than this for bigger changes. Note: This is my own branch, so I can commit to it, but if I was not the owner (or maintainer) of the branch, I would have to either send my changes to the maintainer or publish my branch and let him "pull" from it. I realised later that I had accidentally added a large directory during tailor conversion, so I removed it in the original bzr branch (as made by tailor), merged it with the changes pushed already to launchpad from somewhere else (Motif) and pushed the resulting tree back to launchpad. The removal was very fast (~5s) and the push took about the same time as with the small change. The good thing is that the history of all changes is retained.
If anyone wants to give it a try, just install bzr and you can easily branch from me using: bzr branch lp:~bartek/junk/biopython The branch history can be seen here: https://code.launchpad.net/~bartek/+junk/biopython/ And the annotated source code is here: http://bazaar.launchpad.net/~bartek/+junk/biopython/files The specific changes done by me can be seen as revisions: http://bazaar.launchpad.net/~bartek/%2Bjunk/biopython/revision/3460 http://bazaar.launchpad.net/~bartek/%2Bjunk/biopython/revision/3459.1.1 http://bazaar.launchpad.net/~bartek/%2Bjunk/biopython/revision/3459.1.2 In summary, I think that it's doable to convert the current CVS tree to bzr, and that bzr can handle the job of a DVCS. Performance is not stellar (especially code browsing in launchpad is sometimes slow) but it's acceptable, especially given that I'm rarely browsing the history, and much more often use command line tools which are (for me) fast enough. Please let me know what others think. If there is general interest in that, I can try to set up a more permanent (but still experimental) bzr branch which would be automatically synchronized from CVS, so that we can do a more long-term experiment to see whether it works, and whether people like it.
cheers Bartek From biopython at maubp.freeserve.co.uk Tue Feb 10 13:26:19 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 10 Feb 2009 13:26:19 +0000 Subject: [Biopython-dev] SVN migration and Launchpad mirroring In-Reply-To: <8b34ec180902100421o1680735dsd68d890d8ccfbf4f@mail.gmail.com> References: <3f6baf360902061211o4da786b0q5f788efcc63e2bb1@mail.gmail.com> <320fb6e00902070455h72c7bd31w506f5ed52e9633bc@mail.gmail.com> <3f6baf360902072220j5c565449i4c7266046051207f@mail.gmail.com> <5aa3b3570902080847p1a126664k4a76b7f19a0ed987@mail.gmail.com> <8b34ec180902081103r1befae9bt33e9024bd43f37fb@mail.gmail.com> <128a885f0902081134m255ec4eao21c75aaf08f9d8f5@mail.gmail.com> <499053F9.60709@gmail.com> <3f6baf360902091239v5988749cm1f48c21d2f19ca9b@mail.gmail.com> <4990A27A.9060500@gmail.com> <8b34ec180902100421o1680735dsd68d890d8ccfbf4f@mail.gmail.com> Message-ID: <320fb6e00902100526oa681fb7n241185c64205921e@mail.gmail.com> > I've taken the whole biopython CVS tree with complete version history > (~3500 commits) and converted it to bzr branch using tailor. It took > about 2-3 hours, but it needs to be done only once. Did you do that from the public Biopython CVS server to your machine? If so, its nice to know that step isn't too slow. > In summary, I think that it's doable to convert current CVS tree to bzr and > bzr handle the job of a DVCS. Performance is not stellar (epsecially code > browsing in launchpad is sometimes slow) but for it's acceptable, especially > given that I'm rarely browsing the history, and much more often use command > line tools which are (for me) fast enough. > > Please let me know what others think. If there will be general interest > in that, I can try to set up a more permanent (but still experimental) bzr > branch which would be automatically synchronized from CVS, so that > we can do a more long-term experiment to see whether it works, and > people like it. 
Have you got a feel for whether it would be easier to sync CVS and bzr, or SVN and bzr? I personally would be more interested in an automatically synchronized git repository (rather than bzr), but this is not a thoroughly researched opinion. As you pointed out, the poor bzr benchmark speeds may not be so bad in the latest code - although the Biopython code base is not so big that this really matters. Peter From bartek at rezolwenta.eu.org Tue Feb 10 14:43:23 2009 From: bartek at rezolwenta.eu.org (Bartek Wilczynski) Date: Tue, 10 Feb 2009 15:43:23 +0100 Subject: [Biopython-dev] SVN migration and Launchpad mirroring In-Reply-To: <320fb6e00902100526oa681fb7n241185c64205921e@mail.gmail.com> References: <3f6baf360902061211o4da786b0q5f788efcc63e2bb1@mail.gmail.com> <3f6baf360902072220j5c565449i4c7266046051207f@mail.gmail.com> <5aa3b3570902080847p1a126664k4a76b7f19a0ed987@mail.gmail.com> <8b34ec180902081103r1befae9bt33e9024bd43f37fb@mail.gmail.com> <128a885f0902081134m255ec4eao21c75aaf08f9d8f5@mail.gmail.com> <499053F9.60709@gmail.com> <3f6baf360902091239v5988749cm1f48c21d2f19ca9b@mail.gmail.com> <4990A27A.9060500@gmail.com> <8b34ec180902100421o1680735dsd68d890d8ccfbf4f@mail.gmail.com> <320fb6e00902100526oa681fb7n241185c64205921e@mail.gmail.com> Message-ID: <8b34ec180902100643r2972e25eke8b8a8f621b5d554@mail.gmail.com> Hi, On Tue, Feb 10, 2009 at 2:26 PM, Peter wrote: >> I've taken the whole biopython CVS tree with complete version history >> (~3500 commits) and converted it to bzr branch using tailor. It took >> about 2-3 hours, but it needs to be done only once. > > Did you do that from the public Biopython CVS server to your machine? > If so, its nice to know that step isn't too slow. > You can do it using any cvs repository, but doing it over the network slows it down. I got bored so I downloaded the actual CVS repo from dev.open-bio.org:/home/repository/biopython The 2-3 hours is for conversion from a local repository which was a copy of the original biopython one. 
But once it is done you have a directory tree which has both CVS and .bzr entries, so you can use it for synchronization. > > Have you got a feel for whether it would be easier to sync CVS and > bzr, or SVN and bzr? > The tool I used (tailor) works with all VCS systems out there. Also launchpad is able to update a branch from either a cvs or svn main repository. So there should be no difference, apart from one more migration (CVS->SVN). > I personally would be more interested in an automatically synchronized > git repository (rather than bzr), but this is not a thoroughly > researched opinion. As you pointed out, the poor bzr benchmark speeds > may not be so bad in the latest code - although the Biopython code > base is not so big that this really matters. > When it comes to git, I have to say that I'm not really experienced, but my current understanding of the possibilities is as follows: I don't know about any service to _automatically_ synchronize a CVS (or SVN) repo with git. There is git-svn, so if we move to SVN, we can set up a git repository and write some scripts around git-svn to have it synchronized with SVN trunk. Then, if we want to host it, we need to start a git-server on dev.open-bio.org or use the free account on github. It has a limit of 100MB and the current biopython CVS tree is 57MB, so we can go with it for a while, but I'm not sure if I would recommend it. But for sure there are people more experienced with git than me on the list, so we may hear about better options.
cheers Bartek From argriffi at ncsu.edu Tue Feb 10 14:53:38 2009 From: argriffi at ncsu.edu (alex) Date: Tue, 10 Feb 2009 09:53:38 -0500 Subject: [Biopython-dev] SVN migration and Launchpad mirroring In-Reply-To: <8b34ec180902100643r2972e25eke8b8a8f621b5d554@mail.gmail.com> References: <3f6baf360902061211o4da786b0q5f788efcc63e2bb1@mail.gmail.com> <3f6baf360902072220j5c565449i4c7266046051207f@mail.gmail.com> <5aa3b3570902080847p1a126664k4a76b7f19a0ed987@mail.gmail.com> <8b34ec180902081103r1befae9bt33e9024bd43f37fb@mail.gmail.com> <128a885f0902081134m255ec4eao21c75aaf08f9d8f5@mail.gmail.com> <499053F9.60709@gmail.com> <3f6baf360902091239v5988749cm1f48c21d2f19ca9b@mail.gmail.com> <4990A27A.9060500@gmail.com> <8b34ec180902100421o1680735dsd68d890d8ccfbf4f@mail.gmail.com> <320fb6e00902100526oa681fb7n241185c64205921e@mail.gmail.com> <8b34ec180902100643r2972e25eke8b8a8f621b5d554@mail.gmail.com> Message-ID: <499194F2.3020906@ncsu.edu> > It has a limit of 100mb and current biopython CVS tree is 57Mb, so we can go with it for > a while, but I'm not sure if I would recommend it. According to github, "The 100MB is a soft limit setup to prevent abuse of the service. If your open source project needs more space, email us , we're happy to provide it." Biopython is an obviously legitimate project so you could probably get more space. 
Alex From bsouthey at gmail.com Tue Feb 10 15:29:15 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Tue, 10 Feb 2009 09:29:15 -0600 Subject: [Biopython-dev] SVN migration and Launchpad mirroring In-Reply-To: <8b34ec180902100643r2972e25eke8b8a8f621b5d554@mail.gmail.com> References: <3f6baf360902061211o4da786b0q5f788efcc63e2bb1@mail.gmail.com> <3f6baf360902072220j5c565449i4c7266046051207f@mail.gmail.com> <5aa3b3570902080847p1a126664k4a76b7f19a0ed987@mail.gmail.com> <8b34ec180902081103r1befae9bt33e9024bd43f37fb@mail.gmail.com> <128a885f0902081134m255ec4eao21c75aaf08f9d8f5@mail.gmail.com> <499053F9.60709@gmail.com> <3f6baf360902091239v5988749cm1f48c21d2f19ca9b@mail.gmail.com> <4990A27A.9060500@gmail.com> <8b34ec180902100421o1680735dsd68d890d8ccfbf4f@mail.gmail.com> <320fb6e00902100526oa681fb7n241185c64205921e@mail.gmail.com> <8b34ec180902100643r2972e25eke8b8a8f621b5d554@mail.gmail.com> Message-ID: <49919D4B.9060305@gmail.com> Bartek Wilczynski wrote: > Hi, > > On Tue, Feb 10, 2009 at 2:26 PM, Peter wrote: > >>> I've taken the whole biopython CVS tree with complete version history >>> (~3500 commits) and converted it to bzr branch using tailor. It took >>> about 2-3 hours, but it needs to be done only once. >>> >> Did you do that from the public Biopython CVS server to your machine? >> If so, its nice to know that step isn't too slow. >> >> > You can do it using any cvs repository, but doing it over the network > slows it down. > I got bored so I downloaded the actual CVS repo from > dev.open-bio.org:/home/repository/biopython > The 2-3 hours is for conversion from a local repository which was a > copy of the > original biopython one. But once it is done you have a directory tree > which has both > CVS and .bzr entries, so you can use it for synchronization. > > > >> Have you got a feel for whether it would be easier to sync CVS and >> bzr, or SVN and bzr? >> >> > The tool I used (tailor) works with all VCS systems out there. 
Also launchpad > is able to update a branch form either cvs or svn main repository. So > there should be > no difference, apart from one migration (CVS->SVN) more. > > >> I personally would be more interested in an automatically synchronized >> git repository (rather than bzr), but this is not a thoroughly >> researched opinion. As you pointed out, the poor bzr benchmark speeds >> may not be so bad in the latest code - although the Biopython code >> base is not so big that this really matters. >> >> > > when it comes to git, I have to say that I'm not really experienced, > but my current understanding of > the possibilities is as follows: > I don't know about any service to _automaticaly_ synchronize CVS (or > SVN) repo with git. > There is git-svn, so if we move to SVN, we can set up a git repository > and write some scripts > around git-svn to have it synchronized with SVN trunk. Then, if we > want to host it, > we need to start a git-server on dev.open-bio.org or use the free > account on github. It has a limit of > 100mb and current biopython CVS tree is 57Mb, so we can go with it for > a while, but I'm not sure if > I would recommend it. > > But for sure there are people more experienced with git than me on the > list, so we may hear about better options. > > cheers > Bartek > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > Hi, Thanks for doing all of this as it is very very interesting work! I am not experienced in either nor do I have a preference. But I appreciate the various comments on this as it makes it clear that Biopython needs to go to one of these systems. I came across various blogs that show the newest versions are bzr are much faster than the old versions. So nothing like good old competition! 
Here is one link via Google that compares various operations in git and bzr (a link to a similar comparison between git, bzr and Mercurial is at the bottom of that page): http://laserjock.wordpress.com/2008/05/08/git-and-bzr-historical-performance-comparison/ There are options to convert between bzr and git (like tailor as well as plugins). Also bzr-svn (http://bazaar-vcs.org/BzrForeignBranches/Subversion) and git-svn (http://www.kernel.org/pub/software/scm/git/docs/git-svn.html) allow you to connect directly to Subversion repositories. From my brief reading, I think these are (or are meant to be) bidirectional but the cvs support is somewhat limited. Echoing Chris's comment, should we even bother with svn at all? Obviously, if we are going to git or bzr or hg, svn is not necessary, but in the short term it could be used as a transition towards one of these. Possible uses of the svn would be maintaining the official repository of the current release plus important bug fixes, a sort of staging tree that all new code should build against.
Bruce From biopython at maubp.freeserve.co.uk Tue Feb 10 15:31:26 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 10 Feb 2009 15:31:26 +0000 Subject: [Biopython-dev] SVN migration and Launchpad mirroring In-Reply-To: <499194F2.3020906@ncsu.edu> References: <3f6baf360902061211o4da786b0q5f788efcc63e2bb1@mail.gmail.com> <8b34ec180902081103r1befae9bt33e9024bd43f37fb@mail.gmail.com> <128a885f0902081134m255ec4eao21c75aaf08f9d8f5@mail.gmail.com> <499053F9.60709@gmail.com> <3f6baf360902091239v5988749cm1f48c21d2f19ca9b@mail.gmail.com> <4990A27A.9060500@gmail.com> <8b34ec180902100421o1680735dsd68d890d8ccfbf4f@mail.gmail.com> <320fb6e00902100526oa681fb7n241185c64205921e@mail.gmail.com> <8b34ec180902100643r2972e25eke8b8a8f621b5d554@mail.gmail.com> <499194F2.3020906@ncsu.edu> Message-ID: <320fb6e00902100731r7c931837o699e36b903dfd48c@mail.gmail.com> On Tue, Feb 10, 2009 at 2:53 PM, alex wrote: >> It has a limit of 100mb and current biopython CVS tree is 57Mb, so we can >> go with it for a while, but I'm not sure if I would recommend it. > > According to github, > "The 100MB is a soft limit setup to prevent abuse of the service. If your > open source project needs more space, email us , > we're happy to provide it." > > Biopython is an obviously legitimate project so you could probably get more > space. In the long term, assuming we do want an official git repository, I would be happier if we could just host it on biopython.org - this shouldn't be a technical problem, but would require discussion with the OBF team (e.g. opening ports, and who gets to look after the service, backing it up etc). That doesn't prevent a proof of concept using github. 
Peter From bugzilla-daemon at portal.open-bio.org Tue Feb 10 16:20:33 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 10 Feb 2009 11:20:33 -0500 Subject: [Biopython-dev] [Bug 2752] Context management for Bio.Entrez handles In-Reply-To: Message-ID: <200902101620.n1AGKXmN008289@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2752 ------- Comment #4 from eric.talevich at gmail.com 2009-02-10 11:20 EST ------- (In reply to comment #3) > Alternatively, we could add the __enter__ and __exit__ methods to the > Bio.File.UndoHandle object instead (which would benefit any code using them, > not just Bio.Entrez). You're right, that does what I wanted. This bug is just an enhancement to make the Entrez code work more like modern Python, not anything breaking current code -- my example is what I wished I could have written a few weeks ago when I first tried out Bio.Entrez. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
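[Editorial illustration] The enhancement discussed in bug 2752 amounts to giving a handle wrapper the two context manager methods, so that it can be used in a with statement and is closed automatically. The following is only a minimal sketch under stated assumptions: SimpleUndoHandle is a hypothetical stand-in written for this example, not the actual Bio.File.UndoHandle and not the attached patch, and it is written for a current Python 3 interpreter.

```python
import io


class SimpleUndoHandle:
    """Hypothetical wrapper around a file-like object (not Bio.File.UndoHandle)."""

    def __init__(self, handle):
        self._handle = handle
        self._saved = []  # lines pushed back by saveline()

    def readline(self):
        # Serve pushed-back lines first, then fall through to the real handle.
        if self._saved:
            return self._saved.pop(0)
        return self._handle.readline()

    def saveline(self, line):
        # Push a line back so the next readline() returns it again.
        if line:
            self._saved.insert(0, line)

    def close(self):
        return self._handle.close()

    def __enter__(self):
        # The object bound by "with ... as handle" is the wrapper itself.
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Always close the underlying handle; never swallow exceptions.
        self.close()
        return False


raw = io.StringIO("first\nsecond\n")
with SimpleUndoHandle(raw) as handle:
    line = handle.readline()
    handle.saveline(line)           # undo the read
    first_again = handle.readline()
```

Because __exit__ returns False, an exception raised inside the with block still propagates after the handle has been closed.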
From dalloliogm at gmail.com Tue Feb 10 16:43:32 2009 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Tue, 10 Feb 2009 17:43:32 +0100 Subject: [Biopython-dev] SVN migration and Launchpad mirroring In-Reply-To: <8b34ec180902100643r2972e25eke8b8a8f621b5d554@mail.gmail.com> References: <3f6baf360902061211o4da786b0q5f788efcc63e2bb1@mail.gmail.com> <5aa3b3570902080847p1a126664k4a76b7f19a0ed987@mail.gmail.com> <8b34ec180902081103r1befae9bt33e9024bd43f37fb@mail.gmail.com> <128a885f0902081134m255ec4eao21c75aaf08f9d8f5@mail.gmail.com> <499053F9.60709@gmail.com> <3f6baf360902091239v5988749cm1f48c21d2f19ca9b@mail.gmail.com> <4990A27A.9060500@gmail.com> <8b34ec180902100421o1680735dsd68d890d8ccfbf4f@mail.gmail.com> <320fb6e00902100526oa681fb7n241185c64205921e@mail.gmail.com> <8b34ec180902100643r2972e25eke8b8a8f621b5d554@mail.gmail.com> Message-ID: <5aa3b3570902100843q652adfcbw277565f0a1c95690@mail.gmail.com> On Tue, Feb 10, 2009 at 3:43 PM, Bartek Wilczynski wrote: > Hi, > when it comes to git, I have to say that I'm not really experienced, In github, for every repository there is a button to create a fork and automatically add it to to your own space. Look at the image in this post: - http://github.com/guides/keeping-a-git-fork-in-sync-with-the-forked-repo Is there something similar with launchpad? Or is it planned to be? Moreover, in github there are many tools that shows the ramifications of all the repositories coming from the original one, with a very nice view (it's this link, again:http://github.com/blog/39-say-hello-to-the-network-graph-visualizer) Let's say I fork your repository as you explained: how would you do to keep track of all the forks originated from your repository? Will you get notified that I have forked your repo? By the way, do you have any clue on how to configure bazaar under a proxy? 
:) > but my current understanding of > the possibilities is as follows: > I don't know about any service to _automaticaly_ synchronize CVS (or > SVN) repo with git. I don't know, but maybe the bioruby developers already know how to do it already. Thank you for all the posts and discussions above.. -- My blog on bioinformatics (now in English): http://bioinfoblog.it From bugzilla-daemon at portal.open-bio.org Tue Feb 10 16:59:17 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 10 Feb 2009 11:59:17 -0500 Subject: [Biopython-dev] [Bug 2752] Context management for Bio.Entrez handles In-Reply-To: Message-ID: <200902101659.n1AGxHqO013821@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2752 ------- Comment #5 from eric.talevich at gmail.com 2009-02-10 11:59 EST ------- Created an attachment (id=1226) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1226&action=view) Add __enter__ and __exit__ to UndoHandle Should SGMLHandle also get these methods? They'd be identical. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Feb 11 00:48:43 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 10 Feb 2009 19:48:43 -0500 Subject: [Biopython-dev] [Bug 2752] Context management for Bio.Entrez handles In-Reply-To: Message-ID: <200902110048.n1B0mhKl028339@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2752 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |RESOLVED Resolution| |FIXED ------- Comment #6 from mdehoon at ims.u-tokyo.ac.jp 2009-02-10 19:48 EST ------- I committed your patch to CVS; thanks for contributing. 
> Should SGMLHandle also get these methods? They'd be identical. In principle, yes, but SGMLHandle is currently not used anywhere in Biopython, and I wouldn't be surprised if it is removed from Biopython in a future release. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From mjldehoon at yahoo.com Wed Feb 11 01:06:38 2009 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Tue, 10 Feb 2009 17:06:38 -0800 (PST) Subject: [Biopython-dev] docstring tests Message-ID: <787613.22831.qm@web62407.mail.re1.yahoo.com> Hi everybody, I included the code to run the docstring tests to run_tests.py, which means that now they're run after the test_*.py tests have finished: test_seq ... ok test_translate ... ok test_trie ... ok test_triefind ... ok Bio.Seq docstring test ... ok Bio.SeqRecord docstring test ... ok Bio.SeqIO docstring test ... ok Bio.Align.Generic docstring test ... ok Bio.AlignIO docstring test ... ok Bio.KEGG.Compound docstring test ... ok Bio.KEGG.Enzyme docstring test ... ok Bio.Wise docstring test ... ok Bio.Wise.psw docstring test ... ok Bio.Statistics.lowess docstring test ... ok ---------------------------------------------------------------------- Ran 107 tests in 97.191 seconds Previously, this code was in test_docstrings.py, but it's easier to do this from run_tests.py because doctest can create a unittest-style test suite directly. This also means that if your module contains docstring tests, you should include the module name to DOCTEST_MODULES near the top of run_tests.py (instead of to test_docstrings.py). I've uploaded the new run_tests.py to CVS so people can try it, but we can revert to the previous version of run_tests.py if preferred. If there are no issues with this approach, we can remove test_docstrings.py. --Michiel. 
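[Editorial illustration] The point that doctest can create a unittest-style test suite directly refers to doctest.DocTestSuite in the standard library. The sketch below shows the general technique only; it is not the code in run_tests.py. The names demo_module and doctest_modules are assumptions made for this example (the message says run_tests.py keeps module names in DOCTEST_MODULES, whereas this sketch uses an already-created module object so that it is self-contained), and it targets a current Python 3 interpreter.

```python
import doctest
import io
import types
import unittest

# Hypothetical stand-in for a module whose docstring contains examples.
demo_module = types.ModuleType("demo_module")
demo_module.__doc__ = """
>>> "ACGT".lower()
'acgt'
>>> len("ACGT")
4
"""

# Assumption: a list of module objects; the real script lists module names.
doctest_modules = [demo_module]

suite = unittest.TestSuite()
for module in doctest_modules:
    # DocTestSuite wraps each docstring's examples in a unittest test case.
    suite.addTests(doctest.DocTestSuite(module))

# Run the combined suite with the ordinary unittest runner.
stream = io.StringIO()
result = unittest.TextTestRunner(stream=stream, verbosity=2).run(suite)
```

Each docstring becomes one test case, so the doctests can be reported alongside ordinary unittest results by the same runner.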
From mjldehoon at yahoo.com Wed Feb 11 05:51:00 2009 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Tue, 10 Feb 2009 21:51:00 -0800 (PST) Subject: [Biopython-dev] Updated the documentation of the Biopython testing framework Message-ID: <572067.89768.qm@web62403.mail.re1.yahoo.com> Hi everybody, I've updated the section in the tutorial about the Biopython testing framework. This description includes the examples that were previously in Doc/cookbook/biopython_test. I haven't uploaded this to CVS yet, but the HTML version of the tutorial is viewable here: http://biopython.org/DIST/docs/tutorial/Tutorial.proposal.html If there are no objections, I'll upload the new tutorial to CVS. --Michiel. From bartek at rezolwenta.eu.org Wed Feb 11 08:52:05 2009 From: bartek at rezolwenta.eu.org (Bartek Wilczynski) Date: Wed, 11 Feb 2009 09:52:05 +0100 Subject: [Biopython-dev] SVN migration and Launchpad mirroring In-Reply-To: <499194F2.3020906@ncsu.edu> References: <3f6baf360902061211o4da786b0q5f788efcc63e2bb1@mail.gmail.com> <8b34ec180902081103r1befae9bt33e9024bd43f37fb@mail.gmail.com> <128a885f0902081134m255ec4eao21c75aaf08f9d8f5@mail.gmail.com> <499053F9.60709@gmail.com> <3f6baf360902091239v5988749cm1f48c21d2f19ca9b@mail.gmail.com> <4990A27A.9060500@gmail.com> <8b34ec180902100421o1680735dsd68d890d8ccfbf4f@mail.gmail.com> <320fb6e00902100526oa681fb7n241185c64205921e@mail.gmail.com> <8b34ec180902100643r2972e25eke8b8a8f621b5d554@mail.gmail.com> <499194F2.3020906@ncsu.edu> Message-ID: <8b34ec180902110052m440b030as7fcf2edab6fbbd7d@mail.gmail.com> On Tue, Feb 10, 2009 at 3:53 PM, alex wrote: > According to github, > "The 100MB is a soft limit setup to prevent abuse of the service. If your > open source project needs more space, email us , > we're happy to provide it." > > Biopython is an obviously legitimate project so you could probably get more > space. Oh, that's cool. I didn't know that. 
So I guess our job now is to evaluate both github and launchpad (maybe by trying them for a period of time) and see which seems to suit our needs better. Competition is always good :) cheers Bartek From bartek at rezolwenta.eu.org Wed Feb 11 09:11:57 2009 From: bartek at rezolwenta.eu.org (Bartek Wilczynski) Date: Wed, 11 Feb 2009 10:11:57 +0100 Subject: [Biopython-dev] SVN migration and Launchpad mirroring In-Reply-To: <5aa3b3570902100843q652adfcbw277565f0a1c95690@mail.gmail.com> References: <3f6baf360902061211o4da786b0q5f788efcc63e2bb1@mail.gmail.com> <8b34ec180902081103r1befae9bt33e9024bd43f37fb@mail.gmail.com> <128a885f0902081134m255ec4eao21c75aaf08f9d8f5@mail.gmail.com> <499053F9.60709@gmail.com> <3f6baf360902091239v5988749cm1f48c21d2f19ca9b@mail.gmail.com> <4990A27A.9060500@gmail.com> <8b34ec180902100421o1680735dsd68d890d8ccfbf4f@mail.gmail.com> <320fb6e00902100526oa681fb7n241185c64205921e@mail.gmail.com> <8b34ec180902100643r2972e25eke8b8a8f621b5d554@mail.gmail.com> <5aa3b3570902100843q652adfcbw277565f0a1c95690@mail.gmail.com> Message-ID: <8b34ec180902110111h388e6dabq12897c181a5a02b3@mail.gmail.com> Hi, On Tue, Feb 10, 2009 at 5:43 PM, Giovanni Marco Dall'Olio wrote: > In github, for every repository there is a button to create a fork and > automatically add it to to your own space. > Look at the image in this post: > - http://github.com/guides/keeping-a-git-fork-in-sync-with-the-forked-repo > Is there something similar with launchpad? Or is it planned to be? > No. As far as I know you need to branch using bzr to your machine and then you need to push it. It looks like this: bzr branch lp:branch-name then you get a local repo with all version history. Now you can register your branch to your launchpad account: bzr push lp:~username/project/branch In principle, you would do that only if you want other people to access it. 
Normally, I would publish only branches which have been modified in some meaningful way, and to modify it, you need to download it to your computer anyway... But having such a button might make it easier for people to branch and stimulate contributions. In launchpad, instead of a button you have the exact command printed on the page so that you can paste it into your console. > Moreover, in github there are many tools that show the ramifications > of all the repositories coming from the original one, with a very nice > view (it's this link, > again: http://github.com/blog/39-say-hello-to-the-network-graph-visualizer) > Yeah, it's nice to see all the stuff that's going on in all branches forked from the trunk. In launchpad, you have only a list of branches already submitted for merging. I think it's again a different philosophy, which reduces the amount of information to process for maintainers (you only see mature changes submitted for merging into the trunk) but you might miss all the stuff which was not submitted (and which you would see in github). It's hard to predict how many branches with active development we will have in BioPython, but generally I think the more info we have the better. > Let's say I fork your repository as you explained: how would you > keep track of all the forks originated from your repository? Will you > get notified that I have forked your repo? Not by default. I think this can be modified to some extent by plugins, but I have no experience here. Again, I don't think that tracking _all_ branches is necessary (and sometimes simply not possible: people can branch anonymously) but having some statistics on how many times a project was branched (i.e. downloaded) could be interesting. > > By the way, do you have any clue on how to configure bazaar under a proxy? :) > I'm not sure what you mean. Are you behind an HTTP proxy? If you are a registered user of launchpad, the communication is done via ssh, so there should be no problem.
I don't know if there are any problems with using launchpad anonymously. What is your setup and what fails? >> but my current understanding of >> the possibilities is as follows: >> I don't know about any service to _automatically_ synchronize CVS (or >> SVN) repo with git. > > I don't know, but maybe the bioruby developers already know how to do > it. According to their website, they just switched to Github. CVS is not synchronized (hasn't been updated for more than 6 months), but they might know about tools. cheers Bartek From dalloliogm at gmail.com Wed Feb 11 09:16:29 2009 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Wed, 11 Feb 2009 10:16:29 +0100 Subject: [Biopython-dev] docstring tests In-Reply-To: <787613.22831.qm@web62407.mail.re1.yahoo.com> References: <787613.22831.qm@web62407.mail.re1.yahoo.com> Message-ID: <5aa3b3570902110116j4cfa7992v22db1619fb34c05@mail.gmail.com> On Wed, Feb 11, 2009 at 2:06 AM, Michiel de Hoon wrote: > Hi everybody, > > I included the code to run the docstring tests to run_tests.py, which means that now they're run after the test_*.py tests have finished: Thanks. As for the doctests, it would really be useful to define some global fixtures. It will reduce the length of the docstrings and we won't have to repeat all the examples in every method of every function, we won't have to repeat the 'from Bio import ..' in every test and so on. How could this be implemented in this test framework? > > test_seq ... ok > test_translate ... ok > test_trie ... ok > test_triefind ... ok > Bio.Seq docstring test ... ok > Bio.SeqRecord docstring test ... ok > Bio.SeqIO docstring test ... ok > Bio.Align.Generic docstring test ... ok > Bio.AlignIO docstring test ... ok > Bio.KEGG.Compound docstring test ... ok > Bio.KEGG.Enzyme docstring test ... ok > Bio.Wise docstring test ... ok > Bio.Wise.psw docstring test ... ok > Bio.Statistics.lowess docstring test ...
ok > ---------------------------------------------------------------------- > Ran 107 tests in 97.191 seconds > > > Previously, this code was in test_docstrings.py, but it's easier to do this from run_tests.py because doctest can create a unittest-style test suite directly. This also means that if your module contains docstring tests, you should include the module name to DOCTEST_MODULES near the top of run_tests.py (instead of to test_docstrings.py). > > I've uploaded the new run_tests.py to CVS so people can try it, but we can revert to the previous version of run_tests.py if preferred. If there are no issues with this approach, we can remove test_docstrings.py. > > --Michiel. > > > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > -- My blog on bioinformatics (now in English): http://bioinfoblog.it From biopython at maubp.freeserve.co.uk Wed Feb 11 10:48:18 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 11 Feb 2009 10:48:18 +0000 Subject: [Biopython-dev] docstring tests In-Reply-To: <5aa3b3570902110116j4cfa7992v22db1619fb34c05@mail.gmail.com> References: <787613.22831.qm@web62407.mail.re1.yahoo.com> <5aa3b3570902110116j4cfa7992v22db1619fb34c05@mail.gmail.com> Message-ID: <320fb6e00902110248h4dc2071br943402d6ed02082e@mail.gmail.com> > On Wed, Feb 11, 2009 at 2:06 AM, Michiel de Hoon wrote: >> Hi everybody, >> >> I included the code to run the docstring tests to run_tests.py, which means that now they're run after the test_*.py tests have finished: That's nice, and arguably better than the test_docstring.py solution. In the medium/long term we may want to switch to an automatic detection of these tests. Do you think we need a way to run just the doctests? Before we could do "python run_tests.py test_docstring.py" to do this. 
On Wed, Feb 11, 2009 at 9:16 AM, Giovanni Marco Dall'Olio wrote: > As for the doctests, it would be really be useful to define some > global fixtures. > It will reduce the docstrings lengths and we won't have to repeat all > the examples in every method of every function, we won't have to > repeat the 'from Bio import ..' in every test and so on. > How this could be implemented in this test framework? I disagree here. If you have any import statements (or any other global fixtures) missing from the individual docstring examples this reduces their value as documentation (you have to explain somewhere what is missing, and make sure the user knows this). Including any required import statements is only a small overhead for the person writing the doctest examples and it makes the examples self contained (which I think is important for documentation). I can understand for example in NumPy that they might have a global "import numpy as np" done implicitly in all their tests, but they have a very flat namespace where this same line would otherwise be repeated for every single doctest. This is one "magic line of code" which would be the same for all the examples, and omitting it is more justified. 
Peter From dalloliogm at gmail.com Wed Feb 11 11:14:10 2009 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Wed, 11 Feb 2009 12:14:10 +0100 Subject: [Biopython-dev] docstring tests In-Reply-To: <320fb6e00902110248h4dc2071br943402d6ed02082e@mail.gmail.com> References: <787613.22831.qm@web62407.mail.re1.yahoo.com> <5aa3b3570902110116j4cfa7992v22db1619fb34c05@mail.gmail.com> <320fb6e00902110248h4dc2071br943402d6ed02082e@mail.gmail.com> Message-ID: <5aa3b3570902110314u3a4727edl2fd78fd14ef4d5aa@mail.gmail.com> On Wed, Feb 11, 2009 at 11:48 AM, Peter wrote: >> On Wed, Feb 11, 2009 at 2:06 AM, Michiel de Hoon wrote: >>> Hi everybody, >>> >>> I included the code to run the docstring tests to run_tests.py, which means that now they're run after the test_*.py tests have finished: > > That's nice, and arguably better than the test_docstring.py solution. > In the medium/long term we may want to switch to an automatic > detection of these tests. > > Do you think we need a way to run just the doctests? Before we could > do "python run_tests.py test_docstring.py" to do this. > > On Wed, Feb 11, 2009 at 9:16 AM, Giovanni Marco Dall'Olio > wrote: >> As for the doctests, it would be really be useful to define some >> global fixtures. >> It will reduce the docstrings lengths and we won't have to repeat all >> the examples in every method of every function, we won't have to >> repeat the 'from Bio import ..' in every test and so on. >> How this could be implemented in this test framework? > > I disagree here. If you have any import statements (or any other > global fixtures) missing from the individual docstring examples this > reduces their value as documentation (you have to explain somewhere > what is missing, and make sure the user knows this). Including any > required import statements is only a small overhead for the person > writing the doctest examples and it makes the examples self contained > (which I think is important for documentation). 
On the long run, it will be hard without fixtures: imagine, for example, the docs in BioSQL, where you will have to put the instructions to create a new database in every method's docstring of every class. I think it is not too bad to face the problem now, before it is too late, and at least give a general infrastructure for how doctest's fixtures will have to be in the future. > I can understand for example in NumPy that they might have a global > "import numpy as np" done implicitly in all their tests, but they have > a very flat namespace where this same line would otherwise be repeated > for every single doctest. This is one "magic line of code" which > would be the same for all the examples, and omitting it is more > justified. > > Peter > -- My blog on bioinformatics (now in English): http://bioinfoblog.it From biopython at maubp.freeserve.co.uk Wed Feb 11 11:29:29 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 11 Feb 2009 11:29:29 +0000 Subject: [Biopython-dev] Updated the documentation of the Biopython testing framework In-Reply-To: <572067.89768.qm@web62403.mail.re1.yahoo.com> References: <572067.89768.qm@web62403.mail.re1.yahoo.com> Message-ID: <320fb6e00902110329s4c84dab8w9120cf480fd84437@mail.gmail.com> On Wed, Feb 11, 2009 at 5:51 AM, Michiel de Hoon wrote: > Hi everybody, > > I've updated the section in the tutorial about the Biopython testing framework. > This description includes the examples that were previously in > Doc/cookbook/biopython_test. I haven't uploaded this to CVS yet, but the > HTML version of the tutorial is viewable here: > > http://biopython.org/DIST/docs/tutorial/Tutorial.proposal.html > > If there are no objections, I'll upload the new tutorial to CVS. > > --Michiel. In the unittest example could you add simple docstrings, so that the printed output is nicer? Otherwise thus far I have only skimmed the content, it looks good. 
Peter From biopython at maubp.freeserve.co.uk Wed Feb 11 13:16:06 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 11 Feb 2009 13:16:06 +0000 Subject: [Biopython-dev] docstring tests In-Reply-To: <5aa3b3570902110314u3a4727edl2fd78fd14ef4d5aa@mail.gmail.com> References: <787613.22831.qm@web62407.mail.re1.yahoo.com> <5aa3b3570902110116j4cfa7992v22db1619fb34c05@mail.gmail.com> <320fb6e00902110248h4dc2071br943402d6ed02082e@mail.gmail.com> <5aa3b3570902110314u3a4727edl2fd78fd14ef4d5aa@mail.gmail.com> Message-ID: <320fb6e00902110516o5bc865co2b6b1f2ccef1e30f@mail.gmail.com> On Wed, Feb 11, 2009 at 11:14 AM, Giovanni Marco Dall'Olio wrote: > On the long run, it will be hard without fixtures: imagine, for > example, the docs in BioSQL, where you will have to put the > instructions to create a new database in every method's docstring of > every class. BioSQL is a special case - we can't have doctests which will work on every machine unless the user has installed particular database (e.g. MySQL), using particular database names, usernames and passwords. So I don't think we need to worry about doctests for BioSQL - because of its nature. For other complicated modules, you could just put one complex multi-part example in the main docstring, and not have individual doctests in each method (if doing so would require a lot of setup code each time). > I think it is not too bad to face the problem now, before it is too > late, and at least give a general infrastructure for how doctest's > fixtures will have to be in the future. It won't be too late - if we continue to write effectively "stand alone" doctests in each docstring, then if at some point we do need more infrastructure to support more complicated doctests, the old simple doctests will still work fine. I think you are inventing unneeded work here. Also if we do add something complicated or non-standard, it makes it harder if later on we do ever want to switch test frameworks (e.g. to nose). 
Peter From dalke at dalkescientific.com Wed Feb 11 14:25:07 2009 From: dalke at dalkescientific.com (Andrew Dalke) Date: Wed, 11 Feb 2009 15:25:07 +0100 Subject: [Biopython-dev] docstring tests In-Reply-To: <320fb6e00902110516o5bc865co2b6b1f2ccef1e30f@mail.gmail.com> References: <787613.22831.qm@web62407.mail.re1.yahoo.com> <5aa3b3570902110116j4cfa7992v22db1619fb34c05@mail.gmail.com> <320fb6e00902110248h4dc2071br943402d6ed02082e@mail.gmail.com> <5aa3b3570902110314u3a4727edl2fd78fd14ef4d5aa@mail.gmail.com> <320fb6e00902110516o5bc865co2b6b1f2ccef1e30f@mail.gmail.com> Message-ID: <5B539D3F-26E0-4260-B98C-C85419B3D427@dalkescientific.com> On Feb 11, 2009, at 2:16 PM, Peter wrote: > BioSQL is a special case - we can't have doctests which will work on > every machine unless the user has installed particular database (e.g. > MySQL), using particular database names, usernames and passwords. Python comes with SQLite. The distribution could ship/install a small test database with a known schema. Andrew dalke at dalkescientific.com From bsouthey at gmail.com Wed Feb 11 14:34:24 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Wed, 11 Feb 2009 08:34:24 -0600 Subject: [Biopython-dev] docstring tests In-Reply-To: <787613.22831.qm@web62407.mail.re1.yahoo.com> References: <787613.22831.qm@web62407.mail.re1.yahoo.com> Message-ID: <4992E1F0.1000807@gmail.com> Michiel de Hoon wrote: > Hi everybody, > > I included the code to run the docstring tests to run_tests.py, which means that now they're run after the test_*.py tests have finished: > > test_seq ... ok > test_translate ... ok > test_trie ... ok > test_triefind ... ok > Bio.Seq docstring test ... ok > Bio.SeqRecord docstring test ... ok > Bio.SeqIO docstring test ... ok > Bio.Align.Generic docstring test ... ok > Bio.AlignIO docstring test ... ok > Bio.KEGG.Compound docstring test ... ok > Bio.KEGG.Enzyme docstring test ... ok > Bio.Wise docstring test ... ok > Bio.Wise.psw docstring test ... 
ok > Bio.Statistics.lowess docstring test ... ok > ---------------------------------------------------------------------- > Ran 107 tests in 97.191 seconds > > > Previously, this code was in test_docstrings.py, but it's easier to do this from run_tests.py because doctest can create a unittest-style test suite directly. This also means that if your module contains docstring tests, you should include the module name to DOCTEST_MODULES near the top of run_tests.py (instead of to test_docstrings.py). > > I've uploaded the new run_tests.py to CVS so people can try it, but we can revert to the previous version of run_tests.py if preferred. If there are no issues with this approach, we can remove test_docstrings.py. > > --Michiel. > > > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > Hi, I ran the latest CVS version through my Python versions on Linux. All appear to pass for Python 2.5 (with and without Numpy) and 2.6. BUT Python 2.4 has an error with docstring tests so it crashes (output below): File "run_tests.py", line 263, in runDocTest module = __import__(name, fromlist=name.split(".")) TypeError: __import__() takes no keyword arguments There is also one failure with Python 2.3 which does not test docstrings: ====================================================================== ERROR: Test Nexus module ---------------------------------------------------------------------- Traceback (most recent call last): File "test_Nexus.py", line 92, in test_NexusTest1 self.assertTrue('codons' in n.charpartitions) AttributeError: 'NexusTest1' object has no attribute 'assertTrue' ---------------------------------------------------------------------- Ran 97 tests in 78.412 seconds Bruce [bsouthey at starling biopython]$ python2.4 setup.py test running test test_Ace ... ok test_AlignIO ... ok test_BioSQL ... skipping. 
Enter your settings in Tests/setup_BioSQL.py (not important if you do not plan to use BioSQL). test_BioSQL_SeqIO ... skipping. Enter your settings in Tests/setup_BioSQL.py (not important if you do not plan to use BioSQL). test_CAPS ... ok test_Clustalw ... ok test_Clustalw_tool ... skipping. Install clustalw or clustalw2 if you want to use Bio.Clustalw. test_Cluster ... ok test_CodonTable ... ok test_CodonUsage ... ok test_Compass ... ok test_Crystal ... ok test_DocSQL ... skipping. Install MySQLdb if you want to use Bio.DocSQL. test_EmbossPrimer ... ok test_Entrez ... ok test_Enzyme ... ok test_FSSP ... ok test_Fasta ... ok test_Fasta2 ... ok test_File ... ok test_GACrossover ... ok test_GAMutation ... ok test_GAOrganism ... ok test_GAQueens ... ok test_GARepair ... ok test_GASelection ... ok test_GFF ... skipping. Environment is not configured for this test (not important if you do not plan to use Bio.GFF). test_GFF2 ... skipping. Install MySQLdb if you want to use Bio.GFF. test_GenBank ... ok test_GenomeDiagram ... skipping. Install reportlab if you want to use Bio.Graphics. test_GraphicsChromosome ... skipping. Install reportlab if you want to use Bio.Graphics. test_GraphicsDistribution ... skipping. Install reportlab if you want to use Bio.Graphics. test_GraphicsGeneral ... skipping. Install reportlab if you want to use Bio.Graphics. test_HMMCasino ... ok test_HMMGeneral ... ok test_HotRand ... ok test_IsoelectricPoint ... ok test_KDTree ... ok test_KEGG ... ok test_KeyWList ... ok test_Location ... ok test_LocationParser ... ok test_LogisticRegression ... ok test_MEME ... ok test_MarkovModel ... ok test_Medline ... ok test_Motif ... ok test_NCBIStandalone ... ok test_NCBIXML ... ok test_NCBI_qblast ... ok test_NNExclusiveOr ... ok test_NNGene ... ok test_NNGeneral ... ok test_Nexus ... ok test_PDB ... ok test_ParserSupport ... ok test_Pathway ... ok test_Phd ... ok test_PopGen_FDist ... skipping. Install FDist if you want to use Bio.PopGen.FDist. 
test_PopGen_FDist_nodepend ... ok test_PopGen_GenePop ... ok test_PopGen_SimCoal ... skipping. Install SIMCOAL2 if you want to use Bio.PopGen.SimCoal. test_PopGen_SimCoal_nodepend ... ok test_ProtParam ... ok test_Restriction ... ok test_SCOP_Astral ... ok test_SCOP_Cla ... ok test_SCOP_Des ... ok test_SCOP_Dom ... ok test_SCOP_Hie ... ok test_SCOP_Raf ... ok test_SCOP_Residues ... ok test_SCOP_Scop ... ok test_SProt ... ok test_SVDSuperimposer ... ok test_SeqIO ... ok test_SeqIO_online ... ok test_SeqUtils ... ok test_SubsMat ... ok test_UniGene ... ok test_Wise ... ok test_align ... ok test_docstrings ... ok test_geo ... ok test_interpro ... ok test_kNN ... ok test_lowess ... ok test_pairwise2 ... ok test_prodoc ... ok test_property_manager ... ok test_prosite ... ok test_prosite2 ... ok test_psw ... ok test_seq ... ok test_translate ... ok test_trie ... ok test_triefind ... ok Traceback (most recent call last): File "/home/bsouthey/python/biopython_cvs/biopython/setup.py", line 418, in ? 
data_files=DATA_FILES, File "/usr/local/lib/python2.4/distutils/core.py", line 149, in setup dist.run_commands() File "/usr/local/lib/python2.4/distutils/dist.py", line 946, in run_commands self.run_command(cmd) File "/usr/local/lib/python2.4/distutils/dist.py", line 966, in run_command cmd_obj.run() File "/home/bsouthey/python/biopython_cvs/biopython/setup.py", line 212, in run run_tests.main([]) File "run_tests.py", line 107, in main runner.run() File "run_tests.py", line 292, in run ok = self.runDocTest(test) File "run_tests.py", line 263, in runDocTest module = __import__(name, fromlist=name.split(".")) TypeError: __import__() takes no keyword arguments From biopython at maubp.freeserve.co.uk Wed Feb 11 14:41:07 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 11 Feb 2009 14:41:07 +0000 Subject: [Biopython-dev] docstring tests In-Reply-To: <5B539D3F-26E0-4260-B98C-C85419B3D427@dalkescientific.com> References: <787613.22831.qm@web62407.mail.re1.yahoo.com> <5aa3b3570902110116j4cfa7992v22db1619fb34c05@mail.gmail.com> <320fb6e00902110248h4dc2071br943402d6ed02082e@mail.gmail.com> <5aa3b3570902110314u3a4727edl2fd78fd14ef4d5aa@mail.gmail.com> <320fb6e00902110516o5bc865co2b6b1f2ccef1e30f@mail.gmail.com> <5B539D3F-26E0-4260-B98C-C85419B3D427@dalkescientific.com> Message-ID: <320fb6e00902110641o4989f899xa8e6f3f51f5218f@mail.gmail.com> On Wed, Feb 11, 2009 at 2:25 PM, Andrew Dalke wrote: > On Feb 11, 2009, at 2:16 PM, Peter wrote: >> >> BioSQL is a special case - we can't have doctests which will work on >> every machine unless the user has installed particular database (e.g. >> MySQL), using particular database names, usernames and passwords. > > Python comes with SQLite. The distribution could ship/install > a small test database with a known schema. Python 2.5+ comes with SQLite, but there isn't (yet) a BioSQL schema for it. That would be nice though, and could make running Biopython and BioSQL easier. 
Peter From biopython at maubp.freeserve.co.uk Wed Feb 11 14:45:16 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 11 Feb 2009 14:45:16 +0000 Subject: [Biopython-dev] docstring tests In-Reply-To: <4992E1F0.1000807@gmail.com> References: <787613.22831.qm@web62407.mail.re1.yahoo.com> <4992E1F0.1000807@gmail.com> Message-ID: <320fb6e00902110645q31d5f4b3j705511d0a8ba624@mail.gmail.com> > There is also one failure with Python 2.3 which does not test docstrings: > > ====================================================================== > ERROR: Test Nexus module > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "test_Nexus.py", line 92, in test_NexusTest1 > self.assertTrue('codons' in n.charpartitions) > AttributeError: 'NexusTest1' object has no attribute 'assertTrue' That is because the unittest assertTrue is only available on python 2.4+, so we should add a quick workaround with a note that this can be simplified once we drop Python 2.3 support.
Peter From bsouthey at gmail.com Wed Feb 11 15:16:41 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Wed, 11 Feb 2009 09:16:41 -0600 Subject: [Biopython-dev] docstring tests In-Reply-To: <320fb6e00902110645q31d5f4b3j705511d0a8ba624@mail.gmail.com> References: <787613.22831.qm@web62407.mail.re1.yahoo.com> <4992E1F0.1000807@gmail.com> <320fb6e00902110645q31d5f4b3j705511d0a8ba624@mail.gmail.com> Message-ID: <4992EBD9.5040403@gmail.com> Peter wrote: >> There is also one failure with Python 2.3 which does not test docstrings: >> >> ====================================================================== >> ERROR: Test Nexus module >> ---------------------------------------------------------------------- >> Traceback (most recent call last): >> File "test_Nexus.py", line 92, in test_NexusTest1 >> self.assertTrue('codons' in n.charpartitions) >> AttributeError: 'NexusTest1' object has no attribute 'assertTrue' >> > > That is because the unittest assertTrue is only available on python > 2.4+, so we should add a quick workaround with a note that this can be > simplified once we drop Python 2.3 support. > > Peter > I think these (as there are more than one) should be using failUnless instead: self.failUnless('codons' in n.charpartitions) From the docstring | failUnless(self, expr, msg=None) | Fail the test unless the expression is true. 
Bruce From biopython at maubp.freeserve.co.uk Wed Feb 11 15:52:05 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 11 Feb 2009 15:52:05 +0000 Subject: [Biopython-dev] docstring tests In-Reply-To: <4992EBD9.5040403@gmail.com> References: <787613.22831.qm@web62407.mail.re1.yahoo.com> <4992E1F0.1000807@gmail.com> <320fb6e00902110645q31d5f4b3j705511d0a8ba624@mail.gmail.com> <4992EBD9.5040403@gmail.com> Message-ID: <320fb6e00902110752i22c06b94te829641abd68ee65@mail.gmail.com> On Wed, Feb 11, 2009 at 3:16 PM, Bruce Southey wrote: >> That is because the unittest assertTrue is only available on python >> 2.4+, so we should add a quick workaround with a note that this can be >> simplified once we drop Python 2.3 support. > > I think these (as there are more than one) should be using failUnless > instead: > self.failUnless('codons' in n.charpartitions) Actually, from further reading, I think we should really be using assert_ (it would have been called assert, but this is a reserved word, so add a trailing underscore as per PEP8). The variants assertTrue and assertFalse were added to match JUnit. See: http://bugs.python.org/issue2249 Fixed in CVS to use assert_ instead of assertTrue. 
Peter From bsouthey at gmail.com Wed Feb 11 16:10:41 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Wed, 11 Feb 2009 10:10:41 -0600 Subject: [Biopython-dev] docstring tests In-Reply-To: <320fb6e00902110752i22c06b94te829641abd68ee65@mail.gmail.com> References: <787613.22831.qm@web62407.mail.re1.yahoo.com> <4992E1F0.1000807@gmail.com> <320fb6e00902110645q31d5f4b3j705511d0a8ba624@mail.gmail.com> <4992EBD9.5040403@gmail.com> <320fb6e00902110752i22c06b94te829641abd68ee65@mail.gmail.com> Message-ID: <4992F881.9000206@gmail.com> Peter wrote: > On Wed, Feb 11, 2009 at 3:16 PM, Bruce Southey wrote: > >>> That is because the unittest assertTrue is only available on python >>> 2.4+, so we should add a quick workaround with a note that this can be >>> simplified once we drop Python 2.3 support. >>> >> I think these (as there are more than one) should be using failUnless >> instead: >> self.failUnless('codons' in n.charpartitions) >> > > Actually, from further reading, I think we should really be using > assert_ (it would have been called assert, but this is a reserved > word, so add a trailing underscore as per PEP8). The variants > assertTrue and assertFalse were added to match JUnit. See: > http://bugs.python.org/issue2249 > > Fixed in CVS to use assert_ instead of assertTrue. > > Peter > Okay The tests pass or skipped the tests that require Python 2.4+ for Python2.3 with the final message as expected : 'Docstring tests require Python 2.4 or later; skipping' Thanks Bruce From biopython at maubp.freeserve.co.uk Wed Feb 11 23:00:25 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 11 Feb 2009 23:00:25 +0000 Subject: [Biopython-dev] docstring tests In-Reply-To: <4992E1F0.1000807@gmail.com> References: <787613.22831.qm@web62407.mail.re1.yahoo.com> <4992E1F0.1000807@gmail.com> Message-ID: <320fb6e00902111500x2927f9b9pffde693fb966b94d@mail.gmail.com> On Wed, Feb 11, 2009 at 2:34 PM, Bruce Southey wrote: > ... 
Python 2.4 has an error with docstring tests so it crashes (output below): > File "run_tests.py", line 263, in runDocTest > module = __import__(name, fromlist=name.split(".")) > TypeError: __import__() takes no keyword arguments Fixed in run_tests.py CVS revision 1.22, using ordered arguments instead. This now works on Python 2.4. For Python 2.3 we skip the doctests anyway so this doesn't matter. Peter From biopython at maubp.freeserve.co.uk Thu Feb 12 11:49:53 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 12 Feb 2009 11:49:53 +0000 Subject: [Biopython-dev] docstring tests In-Reply-To: <320fb6e00902111500x2927f9b9pffde693fb966b94d@mail.gmail.com> References: <787613.22831.qm@web62407.mail.re1.yahoo.com> <4992E1F0.1000807@gmail.com> <320fb6e00902111500x2927f9b9pffde693fb966b94d@mail.gmail.com> Message-ID: <320fb6e00902120349u7898e2bate126208c837be913@mail.gmail.com> Hi Michiel (and everyone else), I was wondering about how the doctests are currently integrated into run_tests.py, and wondered if this patch makes things more concise? This patch is against run_tests.py CVS revision 1.22, essentially it adds the doctest modules to the list of tests - rather than as a separate list. The code becomes slightly shorter, but I am not sure if this is actually clearer or not. Note - this does not address the issue of how to run just the doctests - something I think is very useful when working on them. Peter $ diff run_tests.py run_tests2.py 209,211c209 < if self.tests: < self.doctest_modules = [] < else: --- > if not self.tests: 218c216,222 < self.doctest_modules = DOCTEST_MODULES --- > if sys.version_info[:2] < (2, 4): > #On python 2.3, doctest uses slightly different formatting > #which would be a problem as the expected output won't match. > #Also, it can't cope with in a doctest string. 
> sys.stderr.write("Skipping doctests which require Python 2.4+\n") > else : > self.tests.extend(DOCTEST_MODULES) 234,240c238,253 < module = __import__(name) < suite = unittest.TestLoader().loadTestsFromModule(module) < if suite.countTestCases()==0: < # This is a print-and-compare test instead of a unittest- < # type test. < test = ComparisonTestCase(name, output) < suite = unittest.TestSuite([test]) --- > if "." in name : > #Its a doc test > #Can't use fromlist=name.split(".") until python 2.5+ > module = __import__(name, None, None, name.split(".")) > suite = doctest.DocTestSuite(module) > del module > else : > #Its a unittest (or a print-and-compare test) > suite = unittest.TestLoader().loadTestsFromName(name) > if suite.countTestCases()==0: > # This is a print-and-compare test instead of a > # unittest-type test. > test = ComparisonTestCase(name, output) > suite = unittest.TestSuite([test]) 263,277d275 < def runDocTest(self, name): < #Can't use fromlist=name.split(".") until python 2.5+ < module = __import__(name, None, None, name.split(".")) < sys.stderr.write("%s docstring test ... " % module.__name__) < suite = doctest.DocTestSuite(module) < result = self._makeResult() < suite.run(result) < if result.wasSuccessful(): < sys.stderr.write("ok\n") < return True < else: < sys.stderr.write("FAIL\n") < result.printErrors() < return False < 287,297d284 < if sys.version_info[:2] < (2, 4): < #On python 2.3, doctest uses slightly different formatting < #which would be a problem as the expected output won't match. < #Also, it can't cope with in a doctest string. 
< sys.stderr.write("Docstring tests require Python 2.4 or later; skipping\n") < else: < for test in self.doctest_modules: < ok = self.runDocTest(test) < if not ok: < failures += 1 < total += 1 From bugzilla-daemon at portal.open-bio.org Thu Feb 12 13:11:10 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 12 Feb 2009 08:11:10 -0500 Subject: [Biopython-dev] [Bug 2759] New: Unit test for Bio.PDB.HSExposure Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2759 Summary: Unit test for Bio.PDB.HSExposure Product: Biopython Version: Not Applicable Platform: PC OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: Unit Tests AssignedTo: biopython-dev at biopython.org ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk Prompted by looking at the example script hsexpo, I've written a unittest-based test for Bio.PDB.HSExposure. I haven't checked it in yet because it prints out lots of warnings to stderr about oddities in the PDB file. We could either add a clean PDB file to the examples, or do something with the stderr. Note that the print-and-compare style test_PDB.py deals with this itself. Perhaps run_tests.py should do something similar for the unittest-based cases. See also Bug 2754 which would actually make Bio.PDB print even more warnings to stderr. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Thu Feb 12 13:12:58 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 12 Feb 2009 08:12:58 -0500 Subject: [Biopython-dev] [Bug 2759] Unit test for Bio.PDB.HSExposure In-Reply-To: Message-ID: <200902121312.n1CDCw8h028040@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2759 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-12 08:12 EST ------- Created an attachment (id=1234) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1234&action=view) New unit test for Bio.PDB.HSExposure This does not cover the DSSP or residue depth calculation, as these require 3rd party tools (DSSP and MSMS) to be installed. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Feb 12 13:14:13 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 12 Feb 2009 08:14:13 -0500 Subject: [Biopython-dev] [Bug 2754] Bio.PDB: Parse warnings should print to stderr, not stdout In-Reply-To: Message-ID: <200902121314.n1CDEDgM028550@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2754 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-12 08:14 EST ------- This may have implications for how we write further Bio.PDB unit tests, see Bug 2759. [I still agree that any warnings from Bio.PDB should go to stderr rather than stdout] -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Thu Feb 12 13:36:26 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 12 Feb 2009 08:36:26 -0500 Subject: [Biopython-dev] [Bug 2759] Unit test for Bio.PDB.HSExposure In-Reply-To: Message-ID: <200902121336.n1CDaQfQ004937@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2759 ------- Comment #2 from dalloliogm at gmail.com 2009-02-12 08:36 EST ------- (In reply to comment #1) > Created an attachment (id=1234) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1234&action=view) [details] > New unit test for Bio.PDB.HSExposure > > This does not cover the DSSP or residue depth calculation, as these require 3rd > party tools (DSSP and MSMS) to be installed. > Can I suggest you a small refactoring of the test unit? I would move all the asserts in setUp to different functions. Then, it would be good to put also the call to PDB.PDBStructure to a global fixture, to avoid to repeat it for every test. Moreover, I will generalize all the know values and put them as variables, so later you will be able to apply the same test to other files by just subclassing the test. Let me know what is your opinion... :) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Thu Feb 12 13:37:44 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 12 Feb 2009 08:37:44 -0500 Subject: [Biopython-dev] [Bug 2759] Unit test for Bio.PDB.HSExposure In-Reply-To: Message-ID: <200902121337.n1CDbikM005433@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2759 ------- Comment #3 from dalloliogm at gmail.com 2009-02-12 08:37 EST ------- Created an attachment (id=1235) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1235&action=view) some of the suggestions I made in the previous comment -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Feb 12 14:13:45 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 12 Feb 2009 09:13:45 -0500 Subject: [Biopython-dev] [Bug 2759] Unit test for Bio.PDB.HSExposure In-Reply-To: Message-ID: <200902121413.n1CEDjdZ017446@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2759 ------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-12 09:13 EST ------- (In reply to comment #2) > > Can I suggest you a small refactoring of the test unit? > I would move all the asserts in setUp to different functions. Maybe. They were really to check the file had loaded as I expected, so that the later tests are checking the residues I expect them to. > Then, it would be good to put also the call to PDB.PDBStructure to a global > fixture, to avoid to repeat it for every test. NO! That would be a very bad idea here. The HSExposure calls MODIFY the model passed to them, so for a clean test we NEED a fresh model each time. I suppose we could read the structure in once, and then make a copy for each sub-test, but I think it is clearer as it is. 
In general, having "global fixtures" is risky. The individual test methods may have side effects (like the changes to the residues in the model in this case), meaning that the overall behaviour will depend on the order the individual test methods are called in. One of the big benefits of using the unittest framework is that each test method is run in a clean known environment (compare this to our print-and-compare scripts, where this isn't the case). Using "global fixtures" shares objects between the individual tests and breaks this. The only good reason I can think of for having a global-setUp method (called once only) rather the current setUp method (called for each test method) is if the set up code is very slow. > Moreover, I will generalize all the know values and put them as variables, > so later you will be able to apply the same test to other files by just > subclassing the test. You would also have to extract the individual exposure scores. It would be simple to get these (and the residue names) as lists, and then check every single residue matches the expected values (rather than the short cut I used to just check the first few and the last few). We could also check any other chains in the structure (not just chain A). These changes are probably a good idea if we ever wanted to extend this unittest to try other PDB files as well, but seemed unnecessary for testing the basics of the Bio.PDB.HSExposure module. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
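The point about per-test setUp versus shared fixtures can be sketched as follows. This is a minimal illustration under stated assumptions, not the attached HSExposure test: `Model` and `annotate` are hypothetical stand-ins for a parsed structure and for a call that modifies it in place.

```python
import unittest


class Model(object):
    """Hypothetical stand-in for a parsed structure that tests may modify."""

    def __init__(self):
        self.annotations = {}


def annotate(model):
    """Hypothetical stand-in for a call that modifies the model in place."""
    model.annotations["exposure"] = "computed"


class FreshModelTest(unittest.TestCase):

    def setUp(self):
        # Runs before every test method, so each test gets its own model.
        self.model = Model()

    def test_annotate_modifies_model(self):
        annotate(self.model)
        self.assertTrue("exposure" in self.model.annotations)

    def test_model_starts_clean(self):
        # Passes whatever the test order, because setUp rebuilt the model.
        self.assertEqual(self.model.annotations, {})


if __name__ == "__main__":
    unittest.main()
```

If the model were instead created once and shared, test_model_starts_clean would pass or fail depending on whether test_annotate_modifies_model had already run.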
From bugzilla-daemon at portal.open-bio.org Thu Feb 12 14:31:59 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 12 Feb 2009 09:31:59 -0500 Subject: [Biopython-dev] [Bug 2754] Bio.PDB: Parse warnings should print to stderr, not stdout In-Reply-To: Message-ID: <200902121431.n1CEVxhk023680@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2754 ------- Comment #2 from eric.talevich at gmail.com 2009-02-12 09:31 EST ------- Created an attachment (id=1236) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1236&action=view) Print errors and warnings in Bio.PDB to sys.stderr I left the test scripts after "if __name__ == '__main__'" printing at stdout since those messages are meant to be the output of the script if it's run directly. There are some apparent debugging print statements in MMCIF2Dict, commented out. I didn't touch them. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bsouthey at gmail.com Thu Feb 12 14:39:03 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Thu, 12 Feb 2009 08:39:03 -0600 Subject: [Biopython-dev] docstring tests In-Reply-To: <320fb6e00902111500x2927f9b9pffde693fb966b94d@mail.gmail.com> References: <787613.22831.qm@web62407.mail.re1.yahoo.com> <4992E1F0.1000807@gmail.com> <320fb6e00902111500x2927f9b9pffde693fb966b94d@mail.gmail.com> Message-ID: <49943487.9010509@gmail.com> Peter wrote: > On Wed, Feb 11, 2009 at 2:34 PM, Bruce Southey wrote: > >> ... Python 2.4 has an error with docstring tests so it crashes (output below): >> File "run_tests.py", line 263, in runDocTest >> module = __import__(name, fromlist=name.split(".")) >> TypeError: __import__() takes no keyword arguments >> > > Fixed in run_tests.py CVS revision 1.22, using ordered arguments > instead. This now works on Python 2.4. 
For Python 2.3 we skip the > doctests anyway so this doesn't matter. > > Peter > Hi, Thanks! I just update from the cvs and all the tests currently pass on Linux for Python versions 2.3 (no doctests), 2.4, 2.5 (with and without numpy) and 2.6. Bruce From bugzilla-daemon at portal.open-bio.org Thu Feb 12 15:09:29 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 12 Feb 2009 10:09:29 -0500 Subject: [Biopython-dev] [Bug 2759] Unit test for Bio.PDB.HSExposure In-Reply-To: Message-ID: <200902121509.n1CF9TMo003270@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2759 ------- Comment #5 from dalloliogm at gmail.com 2009-02-12 10:09 EST ------- (In reply to comment #4) > > Then, it would be good to put also the call to PDB.PDBStructure to a global > > fixture, to avoid to repeat it for every test. > > NO! That would be a very bad idea here. The HSExposure calls MODIFY the model > passed to them, so for a clean test we NEED a fresh model each time. Now I saw it, you're right! > > Moreover, I will generalize all the know values and put them as variables, > > so later you will be able to apply the same test to other files by just > > subclassing the test. > > You would also have to extract the individual exposure scores. It would be > simple to get these (and the residue names) as lists, and then check every > single residue matches the expected values (rather than the short cut I used to ok, I did it.. > -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
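The `__import__` fix mentioned in this thread (ordered arguments instead of the `fromlist=` keyword, which Python 2.4 rejects) can be sketched roughly as below. This is only an illustration, not the run_tests.py code itself: `import_dotted` and `load_doctest_suite` are hypothetical helper names, and the standard library module `xml.dom.minidom` stands in for a Biopython module.

```python
import doctest


def import_dotted(name):
    """Import a dotted module name and return the innermost module.

    A plain __import__("xml.dom.minidom") returns the top-level package
    ("xml"); a non-empty fromlist makes it return the named submodule.
    The arguments are passed positionally (globals, locals, fromlist).
    """
    return __import__(name, None, None, name.split("."))


def load_doctest_suite(name):
    """Build a unittest-compatible suite from the doctests in a module."""
    module = import_dotted(name)
    return doctest.DocTestSuite(module)


if __name__ == "__main__":
    print(__import__("xml.dom.minidom").__name__)
    print(import_dotted("xml.dom.minidom").__name__)
```

The two print calls show the difference the fromlist makes: the first names the top-level package, the second the submodule that was actually asked for.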
From bugzilla-daemon at portal.open-bio.org Thu Feb 12 15:13:44 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 12 Feb 2009 10:13:44 -0500 Subject: [Biopython-dev] [Bug 2759] Unit test for Bio.PDB.HSExposure In-Reply-To: Message-ID: <200902121513.n1CFDiqm004524@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2759 ------- Comment #6 from dalloliogm at gmail.com 2009-02-12 10:13 EST ------- Created an attachment (id=1237) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1237&action=view) proposal of refactoring for test_PDB I have refactored the test and moved all the known values into separate variables. Now it should be very easy to test other pdb files and conditions: just subclass this test, and redefine the values of residue_number, pdb_filename, expected_values, etc... I left the setUpAll method commented out, as it doesn't harm anybody there... even if it were not commented out, it wouldn't be executed by the normal unittest framework (and under nose, it would just be one more execution). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
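The "subclass and redefine the known values" idea described above can be sketched like this. It is a toy illustration, not the attached patch: the class names and the string data are hypothetical and simply stand in for attributes such as pdb_filename and expected_values.

```python
import unittest


class KnownValuesTest(unittest.TestCase):
    """Base test; subclasses override the class attributes below."""

    # Hypothetical toy data standing in for a file name and known values.
    text = "ACGT"
    expected_length = 4
    expected_first = "A"

    def test_length(self):
        self.assertEqual(len(self.text), self.expected_length)

    def test_first_character(self):
        self.assertEqual(self.text[0], self.expected_first)


class OtherKnownValuesTest(KnownValuesTest):
    """Same checks, different data: only the attributes are redefined."""

    text = "GGCCAA"
    expected_length = 6
    expected_first = "G"


if __name__ == "__main__":
    unittest.main()
```

Both classes are collected by unittest, so the inherited test methods run once per data set.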
From bugzilla-daemon at portal.open-bio.org Thu Feb 12 15:14:25 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 12 Feb 2009 10:14:25 -0500 Subject: [Biopython-dev] [Bug 2759] Unit test for Bio.PDB.HSExposure In-Reply-To: Message-ID: <200902121514.n1CFEPYP004828@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2759 dalloliogm at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1235 is|0 |1 obsolete| | -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Feb 12 15:14:33 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 12 Feb 2009 10:14:33 -0500 Subject: [Biopython-dev] [Bug 2754] Bio.PDB: Parse warnings should print to stderr, not stdout In-Reply-To: Message-ID: <200902121514.n1CFEXbS004896@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2754 ------- Comment #3 from bsouthey at gmail.com 2009-02-12 10:14 EST ------- (In reply to comment #2) > Created an attachment (id=1236) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1236&action=view) [details] > Print errors and warnings in Bio.PDB to sys.stderr (In reply to comment #1) > This may have implications for how we write further Bio.PDB unit tests, see Bug > 2759. > > [I still agree that any warnings from Bio.PDB should go to stderr rather than > stdout] > I believe that we should be using the Python warnings module for these types of messages: http://docs.python.org/library/warnings.html This permits the user to have greater control over the output and also allows redirecting the output as required. In the Bio directory, there are currently 36 and 25 uses of stderr and stdout, respectively.
In terms of the patch, my limited understanding is that local import sys will override any global redirection of the output which in my opinion is a bad idea. Further, it probably has implications for the current test_PDB.py (grepping stderr): test_PDB.py:14:# Redirect stderr so user does not see warnings test_PDB.py:37: # Class to hide stderr output test_PDB.py:94:old_stderr = sys.stderr test_PDB.py:95:# Hide stderr output for user test_PDB.py:96:sys.stderr=TheVoid() test_PDB.py:100: sys.stderr = old_stderr Also redirection is already being used by the PDB module (from grepping): PDB/NACCESS.py:51: stdout = out.readlines() PDB/NACCESS.py:53: stderr = err.readlines() Bruce -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Feb 12 15:27:04 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 12 Feb 2009 10:27:04 -0500 Subject: [Biopython-dev] [Bug 2754] Bio.PDB: Parse warnings should print to stderr, not stdout In-Reply-To: Message-ID: <200902121527.n1CFR4Mj009728@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2754 ------- Comment #4 from dalloliogm at gmail.com 2009-02-12 10:27 EST ------- (In reply to comment #3) > > I believe that we should be using the Python warnings module for these > types of messages: > http://docs.python.org/library/warnings.html And what about the logging module? It allows configuration, personalization of the output, etc.. - http://docs.python.org/library/logging.html?highlight=logging#module-logging -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
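The stderr hiding that the grep of test_PDB.py points at (save sys.stderr, swap in an object that discards output, restore it afterwards) can be sketched as below. Only the name `TheVoid` and the save/restore pattern come from the grep output; the body of the class and the `run_quietly` helper are assumptions made for illustration.

```python
import sys


class TheVoid(object):
    """File-like object that discards everything written to it."""

    def write(self, text):
        pass

    def flush(self):
        pass


def run_quietly(func, *args, **kwargs):
    """Call func with sys.stderr hidden, restoring it even on errors."""
    old_stderr = sys.stderr
    sys.stderr = TheVoid()
    try:
        return func(*args, **kwargs)
    finally:
        # Always put the real stderr back, whatever func did.
        sys.stderr = old_stderr
```

Note that this only affects code that looks up sys.stderr at call time; anything that kept its own reference to the original stream would still write to it.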
From bugzilla-daemon at portal.open-bio.org Thu Feb 12 16:01:10 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 12 Feb 2009 11:01:10 -0500 Subject: [Biopython-dev] [Bug 2759] Unit test for Bio.PDB.HSExposure In-Reply-To: Message-ID: <200902121601.n1CG1Avv022384@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2759 dalloliogm at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1237 is|0 |1 obsolete| | ------- Comment #7 from dalloliogm at gmail.com 2009-02-12 11:01 EST ------- Created an attachment (id=1238) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1238&action=view) proposal of refactoring for test_PDB (fixed some errors) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Feb 12 17:00:53 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 12 Feb 2009 12:00:53 -0500 Subject: [Biopython-dev] [Bug 2754] Bio.PDB: Parse warnings should print to stderr, not stdout In-Reply-To: Message-ID: <200902121700.n1CH0r4G010835@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2754 ------- Comment #5 from eric.talevich at gmail.com 2009-02-12 12:00 EST ------- (In reply to comment #3-4) Using warnings.warn() sounds right. That module is used in other places in Biopython, but not in Bio.PDB yet. > And what about the logging module? > It allows configuration, personalization of the output, etc.. The logging module is probably overkill for a library, I think. 
It's very flexible, but the setup is kind of tedious, and generally an application using both Biopython and the logging module would figure out how to raise the warnings as exceptions, re-capture them, and log them in whatever customized way is needed. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Feb 12 18:08:01 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 12 Feb 2009 13:08:01 -0500 Subject: [Biopython-dev] [Bug 2754] Bio.PDB: Parse warnings should print to stderr, not stdout In-Reply-To: Message-ID: <200902121808.n1CI81oJ031358@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2754 ------- Comment #6 from eric.talevich at gmail.com 2009-02-12 13:08 EST ------- Created an attachment (id=1239) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1239&action=view) Use the warnings module for printing warnings I grepped Bio/PDB for stderr and replaced what looked like warning messages with calls to warnings.warn(). A couple of files need further attention: StructureBuilder.py: Every warning is protected by "if __debug__:", which seems like something the warning module itself should cover. PDBParser.py: Parsing exceptions are caught and passed to _handle_PDB_exception, which then decides whether to re-raise the exception or just issue a warning. The warnings module should be able to cover some of this functionality. There's also a feature to only show the first instance of the same warnings triggered by the same lines, which would make the output from parsing semi-malformed PDB files less annoying in permissive mode. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
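How warnings.warn() lets the caller choose between a permissive and a strict mode can be sketched as follows. This is a minimal example using only the standard warnings module, not the attached patch: `ParseWarning` and `parse_line` are hypothetical names, and the "ATOM" check is a toy stand-in for real PDB parsing.

```python
import warnings


class ParseWarning(UserWarning):
    """Hypothetical warning category for recoverable parse problems."""


def parse_line(line, line_number):
    """Toy parser: warn about, and skip, lines that are not ATOM records."""
    if not line.startswith("ATOM"):
        warnings.warn("Ignoring unexpected record at line %i" % line_number,
                      ParseWarning)
        return None
    return line.split()


if __name__ == "__main__":
    # Permissive (default filters): the message is reported on stderr
    # and parsing carries on.
    parse_line("JUNK", 1)

    # Strict: the caller turns this warning category into an exception.
    with warnings.catch_warnings():
        warnings.simplefilter("error", ParseWarning)
        try:
            parse_line("JUNK", 2)
        except ParseWarning as err:
            print("strict mode raised: %s" % err)

    # Quiet: the caller silences this category entirely.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", ParseWarning)
        parse_line("JUNK", 3)
```

The library code stays the same in all three cases; the policy is set by whoever calls it, which is the control over output that the comments above are asking for.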
From bugzilla-daemon at portal.open-bio.org Thu Feb 12 19:14:12 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 12 Feb 2009 14:14:12 -0500 Subject: [Biopython-dev] [Bug 2754] Bio.PDB: Parse warnings should print to stderr, not stdout In-Reply-To: Message-ID: <200902121914.n1CJECTH014825@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2754 ------- Comment #7 from eric.talevich at gmail.com 2009-02-12 14:14 EST ------- (In reply to comment #6) Also, as Bruce and Peter implied may happen, this patch clobbers test_PDB.py. Some options: 1. Redirect stderr to stdout, and modify Tests/output/test_PDB to match again. 2. Change test_PDB.py to check the exceptions separately, maybe converting it to a unittest-style test in the process. Maybe also splitting a_structure.pdb into multiple files, with one bug each. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Feb 12 19:38:33 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 12 Feb 2009 14:38:33 -0500 Subject: [Biopython-dev] [Bug 2754] Bio.PDB: Parse warnings should print to stderr, not stdout In-Reply-To: Message-ID: <200902121938.n1CJcXmZ020200@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2754 ------- Comment #8 from bsouthey at gmail.com 2009-02-12 14:38 EST ------- (In reply to comment #7) > (In reply to comment #6) > > Also, as Bruce and Peter implied may happen, this patch clobbers test_PDB.py. > Some options: > > 1. Redirect stderr to stdout, and modify Tests/output/test_PDB to match again. > > 2. Change test_PDB.py to check the exceptions separately, maybe converting it > to a unittest-style test in the process. 
Maybe also splitting a_structure.pdb > into multiple files, with one bug each. > You know more about this than I do. But I think that test_PDB.py needs to be rewritten, partly because of the text it prints and partly because of its lack of coverage (such as retrieving a PDB file online). But really it should be checking that these corner cases are handled correctly. So if there is an error in the PDB file then the test should check that the error reported is the correct message for that error. For example, running the test from the command line the first message is: PDBConstructionException: Atom N defined twice in residue at line 19. Exception ignored. Is that correct or desired output? The actual error is in my mind irrelevant although I do wonder why a special exception is used. (In reply to comment #6) There are a few cases of this so I think a separate bug should be filed. But cleaning these up would be appreciated, at least by me. Just my couple of cents, Bruce -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bsouthey at gmail.com Thu Feb 12 21:08:42 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Thu, 12 Feb 2009 15:08:42 -0600 Subject: [Biopython-dev] FYI LWN article 318699 How patches get into the mainline Message-ID: <49948FDA.3080301@gmail.com> Hi, Just thought this might be interesting since we have been talking about git. Jonathan Corbet wrote this article (How patches get into the mainline), where he traced a patch for Graphviz (article will be available to all next week). 
http://lwn.net/SubscriberLink/318699/1df097b75e861618/ Bruce From mjldehoon at yahoo.com Fri Feb 13 08:34:46 2009 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 13 Feb 2009 00:34:46 -0800 (PST) Subject: [Biopython-dev] docstring tests In-Reply-To: <320fb6e00902110248h4dc2071br943402d6ed02082e@mail.gmail.com> Message-ID: <189305.12645.qm@web62407.mail.re1.yahoo.com> > Do you think we need a way to run just the doctests? > Before we could > do "python run_tests.py test_docstring.py" to do > this. We could add an option "doctest": python run_tests.py doctest runs the doctests only; python run_tests.py test_Cluster doctest runs test_Cluster.py and the doctests, etc. --Michiel From bugzilla-daemon at portal.open-bio.org Fri Feb 13 11:40:40 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 13 Feb 2009 06:40:40 -0500 Subject: [Biopython-dev] [Bug 2760] New: proposal: enhancement for SeqIO.TabIO Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2760 Summary: proposal: enhancement for SeqIO.TabIO Product: Biopython Version: Not Applicable Platform: PC OS/Version: Linux Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: dalloliogm at gmail.com This patch fixes a problem that TabIO had (it fails if there are more than two tabs, or spaces instead of tabs, between the title and the sequence), and introduces a check to skip empty lines. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Fri Feb 13 11:41:09 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 13 Feb 2009 06:41:09 -0500 Subject: [Biopython-dev] [Bug 2760] proposal: enhancement for SeqIO.TabIO In-Reply-To: Message-ID: <200902131141.n1DBf9p1018277@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2760 ------- Comment #1 from dalloliogm at gmail.com 2009-02-13 06:41 EST ------- Created an attachment (id=1240) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1240&action=view) TabIO patch -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From mjldehoon at yahoo.com Fri Feb 13 12:12:38 2009 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 13 Feb 2009 04:12:38 -0800 (PST) Subject: [Biopython-dev] docstring tests In-Reply-To: <320fb6e00902120349u7898e2bate126208c837be913@mail.gmail.com> Message-ID: <992871.86988.qm@web62403.mail.re1.yahoo.com> Thanks for the patch. I've updated run_tests.py along these lines, and I added an option "doctest" to specify running the doctests: $ python run_tests.py doctest Bio.Seq docstring test ... ok Bio.SeqRecord docstring test ... ok Bio.SeqIO docstring test ... ok Bio.Align.Generic docstring test ... ok Bio.AlignIO docstring test ... ok Bio.KEGG.Compound docstring test ... ok Bio.KEGG.Enzyme docstring test ... ok Bio.Wise docstring test ... ok Bio.Wise.psw docstring test ... ok Bio.Statistics.lowess docstring test ... 
ok ---------------------------------------------------------------------- Ran 10 tests in 0.726 seconds --- On Thu, 2/12/09, Peter wrote: > From: Peter > Subject: Re: [Biopython-dev] docstring tests > To: mjldehoon at yahoo.com > Cc: biopython-dev at biopython.org > Date: Thursday, February 12, 2009, 6:49 AM > Hi Michiel (and everyone else), > > I was wondering about how the doctests are currently > integrated into > run_tests.py, and wondered if this patch makes things more > concise? > This patch is against run_tests.py CVS revision 1.22, > essentially it > adds the doctest modules to the list of tests - rather than > as a > separate list. The code becomes slightly shorter, but I am > not sure > if this is actually clearer or not. > > Note - this does not address the issue of how to run just > the doctests > - something I think is very useful when working on them. > > Peter > > $ diff run_tests.py run_tests2.py > 209,211c209 > < if self.tests: > < self.doctest_modules = [] > < else: > --- > > if not self.tests: > 218c216,222 > < self.doctest_modules = DOCTEST_MODULES > --- > > if sys.version_info[:2] < (2, 4): > > #On python 2.3, doctest uses slightly > different formatting > > #which would be a problem as the > expected output won't match. > > #Also, it can't cope with > in a doctest string. > > sys.stderr.write("Skipping > doctests which require Python 2.4+\n") > > else : > > self.tests.extend(DOCTEST_MODULES) > 234,240c238,253 > < module = __import__(name) > < suite = > unittest.TestLoader().loadTestsFromModule(module) > < if suite.countTestCases()==0: > < # This is a print-and-compare test > instead of a unittest- > < # type test. > < test = ComparisonTestCase(name, > output) > < suite = unittest.TestSuite([test]) > --- > > if "." 
in name : > > #Its a doc test > > #Can't use > fromlist=name.split(".") until python 2.5+ > > module = __import__(name, None, > None, name.split(".")) > > suite = > doctest.DocTestSuite(module) > > del module > > else : > > #Its a unittest (or a > print-and-compare test) > > suite = > unittest.TestLoader().loadTestsFromName(name) > > if suite.countTestCases()==0: > > # This is a print-and-compare > test instead of a > > # unittest-type test. > > test = > ComparisonTestCase(name, output) > > suite = > unittest.TestSuite([test]) > 263,277d275 > < def runDocTest(self, name): > < #Can't use > fromlist=name.split(".") until python 2.5+ > < module = __import__(name, None, None, > name.split(".")) > < sys.stderr.write("%s docstring test ... > " % module.__name__) > < suite = doctest.DocTestSuite(module) > < result = self._makeResult() > < suite.run(result) > < if result.wasSuccessful(): > < sys.stderr.write("ok\n") > < return True > < else: > < sys.stderr.write("FAIL\n") > < result.printErrors() > < return False > < > 287,297d284 > < if sys.version_info[:2] < (2, 4): > < #On python 2.3, doctest uses slightly > different formatting > < #which would be a problem as the expected > output won't match. > < #Also, it can't cope with > in a doctest string. > < sys.stderr.write("Docstring tests > require Python 2.4 or > later; skipping\n") > < else: > < for test in self.doctest_modules: > < ok = self.runDocTest(test) > < if not ok: > < failures += 1 > < total += 1 From mjldehoon at yahoo.com Fri Feb 13 12:16:36 2009 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 13 Feb 2009 04:16:36 -0800 (PST) Subject: [Biopython-dev] Updated the documentation of the Biopython testing framework In-Reply-To: <320fb6e00902110329s4c84dab8w9120cf480fd84437@mail.gmail.com> Message-ID: <975833.90231.qm@web62402.mail.re1.yahoo.com> I've added some docstring examples to the unittest section. 
--Michiel --- On Wed, 2/11/09, Peter wrote: > From: Peter > Subject: Re: [Biopython-dev] Updated the documentation of the Biopython testing framework > To: mjldehoon at yahoo.com > Cc: biopython-dev at biopython.org > Date: Wednesday, February 11, 2009, 6:29 AM > On Wed, Feb 11, 2009 at 5:51 AM, Michiel de Hoon > wrote: > > Hi everybody, > > > > I've updated the section in the tutorial about the > Biopython testing framework. > > This description includes the examples that were > previously in > > Doc/cookbook/biopython_test. I haven't uploaded > this to CVS yet, but the > > HTML version of the tutorial is viewable here: > > > > > http://biopython.org/DIST/docs/tutorial/Tutorial.proposal.html > > > > If there are no objections, I'll upload the new > tutorial to CVS. > > > > --Michiel. > > In the unittest example could you add simple docstrings, so > that the > printed output is nicer? Otherwise thus far I have only > skimmed the > content, it looks good. > > Peter From bugzilla-daemon at portal.open-bio.org Fri Feb 13 12:46:40 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 13 Feb 2009 07:46:40 -0500 Subject: [Biopython-dev] [Bug 2760] proposal: enhancement for SeqIO.TabIO In-Reply-To: Message-ID: <200902131246.n1DCkeY1003356@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2760 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Severity|normal |enhancement ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-13 07:46 EST ------- The "tab" format in Bio.SeqIO was explicitly ONLY for simple tab files with two fields (see Bug 2533). Perhaps a more helpful error message would be a good idea. If there are more than two fields, determining which are the title and sequence is complicated. 
Your code seems to assume these are the first two fields, and ignores the rest - which may work in some cases. Do you have some specific examples of tab separated files you want to read in using Bio.SeqIO? I am particularly interested in files from other software packages (not ones you created yourself). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Feb 13 13:00:09 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 13 Feb 2009 08:00:09 -0500 Subject: [Biopython-dev] [Bug 2760] proposal: enhancement for SeqIO.TabIO In-Reply-To: Message-ID: <200902131300.n1DD09jK006776@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2760 ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-13 08:00 EST ------- (In reply to comment #0) > > this patch ..., and introduces a check to skip empty lines. > That change is probably a good idea, but note that rather than: if line != "": #Do stuff... I believe the following is considered better Python style: if line: #Do stuff... I have updated CVS to ignore blank lines, and to give a more helpful ValueError when trying to parse invalid files. See revision 1.2, http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/SeqIO/TabIO.py?cvsroot=biopython Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
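The blank-line handling and the more helpful ValueError discussed in comments #2 and #3 on Bug 2760 might look roughly like the sketch below. It is not the actual Bio/SeqIO/TabIO.py code from CVS: the function name iter_tab_records is made up for illustration, the error text is invented, and it yields plain (title, sequence) tuples rather than SeqRecord objects.

```python
def iter_tab_records(lines):
    """Yield (title, sequence) pairs from simple two-field tab separated lines.

    Hypothetical helper for illustration only; blank lines are skipped and
    anything other than exactly two fields is rejected.
    """
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            # Skip empty (or whitespace-only) lines rather than failing.
            continue
        fields = line.split("\t")
        if len(fields) != 2:
            # Guessing which fields hold the title and sequence is risky,
            # so report the problem clearly instead.
            raise ValueError(
                "Expected exactly one tab separating title and sequence, "
                "found %i tabs in line %r" % (line.count("\t"), line)
            )
        title, seq = fields
        yield title.strip(), seq.strip()
```

Stripping the line ending before the emptiness check matters here, because a line holding only a newline character is still a non-empty string.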
From bsouthey at gmail.com Fri Feb 13 15:37:38 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Fri, 13 Feb 2009 09:37:38 -0600 Subject: [Biopython-dev] docstring tests In-Reply-To: <992871.86988.qm@web62403.mail.re1.yahoo.com> References: <992871.86988.qm@web62403.mail.re1.yahoo.com> Message-ID: <499593C2.3090806@gmail.com> Michiel de Hoon wrote: > Thanks for the patch. I've updated run_tests.py along these lines, and I added an option "doctest" to specify running the doctests: > > $ python run_tests.py doctest > Bio.Seq docstring test ... ok > Bio.SeqRecord docstring test ... ok > Bio.SeqIO docstring test ... ok > Bio.Align.Generic docstring test ... ok > Bio.AlignIO docstring test ... ok > Bio.KEGG.Compound docstring test ... ok > Bio.KEGG.Enzyme docstring test ... ok > Bio.Wise docstring test ... ok > Bio.Wise.psw docstring test ... ok > Bio.Statistics.lowess docstring test ... ok > ---------------------------------------------------------------------- > Ran 10 tests in 0.726 seconds > > > > > --- On Thu, 2/12/09, Peter wrote: > > >> From: Peter >> Subject: Re: [Biopython-dev] docstring tests >> To: mjldehoon at yahoo.com >> Cc: biopython-dev at biopython.org >> Date: Thursday, February 12, 2009, 6:49 AM >> Hi Michiel (and everyone else), >> >> I was wondering about how the doctests are currently >> integrated into >> run_tests.py, and wondered if this patch makes things more >> concise? >> This patch is against run_tests.py CVS revision 1.22, >> essentially it >> adds the doctest modules to the list of tests - rather than >> as a >> separate list. The code becomes slightly shorter, but I am >> not sure >> if this is actually clearer or not. >> >> Note - this does not address the issue of how to run just >> the doctests >> - something I think is very useful when working on them. 
>> >> Peter >> >> $ diff run_tests.py run_tests2.py >> 209,211c209 >> < if self.tests: >> < self.doctest_modules = [] >> < else: >> --- >> >>> if not self.tests: >>> >> 218c216,222 >> < self.doctest_modules = DOCTEST_MODULES >> --- >> >>> if sys.version_info[:2] < (2, 4): >>> #On python 2.3, doctest uses slightly >>> >> different formatting >> >>> #which would be a problem as the >>> >> expected output won't match. >> >>> #Also, it can't cope with >>> >> in a doctest string. >> >>> sys.stderr.write("Skipping >>> >> doctests which require Python 2.4+\n") >> >>> else : >>> self.tests.extend(DOCTEST_MODULES) >>> >> 234,240c238,253 >> < module = __import__(name) >> < suite = >> unittest.TestLoader().loadTestsFromModule(module) >> < if suite.countTestCases()==0: >> < # This is a print-and-compare test >> instead of a unittest- >> < # type test. >> < test = ComparisonTestCase(name, >> output) >> < suite = unittest.TestSuite([test]) >> --- >> >>> if "." in name : >>> #Its a doc test >>> #Can't use >>> >> fromlist=name.split(".") until python 2.5+ >> >>> module = __import__(name, None, >>> >> None, name.split(".")) >> >>> suite = >>> >> doctest.DocTestSuite(module) >> >>> del module >>> else : >>> #Its a unittest (or a >>> >> print-and-compare test) >> >>> suite = >>> >> unittest.TestLoader().loadTestsFromName(name) >> >>> if suite.countTestCases()==0: >>> # This is a print-and-compare >>> >> test instead of a >> >>> # unittest-type test. >>> test = >>> >> ComparisonTestCase(name, output) >> >>> suite = >>> >> unittest.TestSuite([test]) >> 263,277d275 >> < def runDocTest(self, name): >> < #Can't use >> fromlist=name.split(".") until python 2.5+ >> < module = __import__(name, None, None, >> name.split(".")) >> < sys.stderr.write("%s docstring test ... 
>> " % module.__name__) >> < suite = doctest.DocTestSuite(module) >> < result = self._makeResult() >> < suite.run(result) >> < if result.wasSuccessful(): >> < sys.stderr.write("ok\n") >> < return True >> < else: >> < sys.stderr.write("FAIL\n") >> < result.printErrors() >> < return False >> < >> 287,297d284 >> < if sys.version_info[:2] < (2, 4): >> < #On python 2.3, doctest uses slightly >> different formatting >> < #which would be a problem as the expected >> output won't match. >> < #Also, it can't cope with >> in a doctest string. >> < sys.stderr.write("Docstring tests >> require Python 2.4 or >> later; skipping\n") >> < else: >> < for test in self.doctest_modules: >> < ok = self.runDocTest(test) >> < if not ok: >> < failures += 1 >> < total += 1 >> > > > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > Hi, At present 'python setup.py test' does do all the tests including the doctests. Just curious, will you also add the ability to select the doctests there as well? Bruce From biopython at maubp.freeserve.co.uk Fri Feb 13 15:48:33 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 13 Feb 2009 15:48:33 +0000 Subject: [Biopython-dev] run_tests.py rewrite In-Reply-To: <320fb6e00902040514te7c7d2ci245433371770d172@mail.gmail.com> References: <320fb6e00902030602p3afd8a82scd5ed5adffda65eb@mail.gmail.com> <114112.52378.qm@web62406.mail.re1.yahoo.com> <320fb6e00902040514te7c7d2ci245433371770d172@mail.gmail.com> Message-ID: <320fb6e00902130748y453c0965id99cbb36cb680ee6@mail.gmail.com> On Wed, Feb 4, 2009 at 1:14 PM, Peter wrote: > It look a little while to show up in CVS for me, but I've got it now. > That seems to solve the problem neatly - and you've even managed > to capture the stack trace elegantly, something I hadn't worked out how to do. 
> > Nice :) Unfortunately, the traceback.format_exc() function you used to capture the stack trace for print out is Python 2.4+ only[1]. This means if one of the print-and-compare tests fails with an exception on Python 2.3, then run_tests.py will fall over. I've checked in a simple fix to use the exception text instead - I'm sure something more useful could be done for Python 2.3, but we'll be dropping support for this fairly soon anyway. This was failing for me on a "known failure", test_Clustalw_tool.py on Windows Python 2.3, where some filenames with spaces just won't work without the subprocess module (Python 2.4+ only). I don't think this can be avoided, so I've updated test_Clustalw_tool.py to skip this bit in future. Peter [1] See http://docs.python.org/library/traceback.html From biopython at maubp.freeserve.co.uk Fri Feb 13 16:02:41 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 13 Feb 2009 16:02:41 +0000 Subject: [Biopython-dev] test_Ace, test_Nexus, test_Phd In-Reply-To: <366127.53671.qm@web62408.mail.re1.yahoo.com> References: <320fb6e00902100235m5dcd72e1reb9e4e7e0ea3b3e6@mail.gmail.com> <366127.53671.qm@web62408.mail.re1.yahoo.com> Message-ID: <320fb6e00902130802i2abcad45xafdfb7e4c08820f9@mail.gmail.com> On Tue, Feb 10, 2009 at 11:25 AM, Michiel de Hoon wrote: > >> The test_Nexus tearDown used to make sure the temp output >> files were removed. This is important on Windows which >> does not do this automatically. I see you now allocate >> "random" filenames using tempfile.NamedTemporaryFile(...) >> so presumably we would need to record these so that the >> tearDown method knows what temp files to remove. > > From reading the Python documentation, the file created by > tempfile.NamedTemporaryFile is removed automatically > when the file handle is closed, even on Windows. That's good to know. 
On a related point, I've just found test_Nexus.py is failing on Windows XP with Python 2.6 (but is fine with Python 2.3, 2.4 and 2.5): C:\repository\biopython\Tests>c:\python26\python test_Nexus.py Test Nexus module ... ERROR Test Tree module. ... ok ====================================================================== ERROR: Test Nexus module ---------------------------------------------------------------------- Traceback (most recent call last): File "test_Nexus.py", line 114, in test_NexusTest1 f1=tempfile.NamedTemporaryFile(mode='r+w+b') File "c:\python26\lib\tempfile.py", line 445, in NamedTemporaryFile file = _os.fdopen(fd, mode, bufsize) OSError: [Errno 22] Invalid argument ---------------------------------------------------------------------- Ran 2 tests in 0.016s FAILED (errors=1) I don't have time to look into this right now, but should be able to investigate next week. Peter From bugzilla-daemon at portal.open-bio.org Fri Feb 13 16:24:37 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 13 Feb 2009 11:24:37 -0500 Subject: [Biopython-dev] [Bug 2749] Proposal: a template for biopython's unittests In-Reply-To: Message-ID: <200902131624.n1DGOb6d026675@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2749 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-13 11:24 EST ------- I think we have resolved this by extending the main tutorial (in CVS) to include a unittest example. This work also replaces the existing (slightly out of date) unit test examples, which have been deleted in CVS: http://biopython.org/DIST/docs/cookbook/biopython_test.html http://biopython.org/DIST/docs/cookbook/biopython_test.pdf Marking bug as fixed. 
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Feb 13 16:26:21 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 13 Feb 2009 11:26:21 -0500 Subject: [Biopython-dev] [Bug 2743] manual installation overwrites previous biopython installations In-Reply-To: Message-ID: <200902131626.n1DGQLdn027308@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2743 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |INVALID ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-13 11:26 EST ------- Closing this bug as "invalid". -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Feb 13 16:58:29 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 13 Feb 2009 11:58:29 -0500 Subject: [Biopython-dev] [Bug 2749] Proposal: a template for biopython's unittests In-Reply-To: Message-ID: <200902131658.n1DGwT20003435@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2749 ------- Comment #6 from dalloliogm at gmail.com 2009-02-13 11:58 EST ------- (In reply to comment #5) > I think we have resolved this by extending the main tutorial (in CVS) to > include a unittest example. This work also replaces the existing (slightly out > of date) unit test examples, which have been deleted in CVS: > http://biopython.org/DIST/docs/cookbook/biopython_test.html > http://biopython.org/DIST/docs/cookbook/biopython_test.pdf > > Marking bug as fixed. 
In the example, you could add a comment in the setUp and tearDown functions, something like: def setUp(self): """these instructions will be executed *before* each of the tests in this unit""" and def tearDown(self): """these instructions will be executed *after* each of the tests in this unit""" It will make it clearer. Moreover, Python's library reference for unittest explains very clearly how fixtures and unittest work; maybe it's worth adding a link to it somewhere: - http://www.python.org/doc/2.5.2/lib/module-unittest.html I would also structure the test in a slightly different way. I would put 'filename' in a separate variable (easier to read), and I would add a known-values test as an example. Finally, if you want to add a comment on global fixtures, you can say it is possible to implement them with the 'self._is_set_up' trick. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From mjldehoon at yahoo.com Sat Feb 14 02:12:17 2009 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 13 Feb 2009 18:12:17 -0800 (PST) Subject: [Biopython-dev] docstring tests In-Reply-To: <499593C2.3090806@gmail.com> Message-ID: <729099.67585.qm@web62405.mail.re1.yahoo.com> > Hi, > At present 'python setup.py test' does do all the > tests including the > doctests. Just curious, will you also add the ability to > select the > doctests there as well? > I wasn't planning to, since "python setup.py test" currently does not allow selecting for any of the test scripts either; it just runs all of them. But I won't object if somebody else (wink, wink) adds this capability to "python setup.py test". On the other hand, you may think of "python setup.py test" as the quick-and-comprehensive way to run the tests, and run_tests.py as a more specialized tool that gives you more control. --Michiel. 
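For readers following the Bug 2749 thread, the setUp/tearDown suggestions in comment #6 can be sketched as a small unittest-style test. This is a minimal sketch rather than the example from the Biopython tutorial: the class name, the temporary file contents and the run_example helper are all assumptions made for illustration, and it is written for a current Python.

```python
import os
import tempfile
import unittest


class ExampleFixtureTest(unittest.TestCase):
    """Hypothetical test case showing per-test fixtures."""

    def setUp(self):
        """Executed *before* each of the tests in this class."""
        # Keep the filename in its own attribute so the tests read clearly.
        handle, self.filename = tempfile.mkstemp(suffix=".txt")
        with os.fdopen(handle, "w") as output:
            output.write("id1\tACGT\n")

    def tearDown(self):
        """Executed *after* each of the tests in this class, pass or fail."""
        os.remove(self.filename)

    def test_file_exists(self):
        """The fixture file is present while a test runs."""
        self.assertTrue(os.path.exists(self.filename))

    def test_known_value(self):
        """The fixture file holds the known title and sequence."""
        with open(self.filename) as handle:
            self.assertEqual(handle.read().split(), ["id1", "ACGT"])


def run_example():
    """Run the test case and return (tests run, all passed)."""
    suite = unittest.TestLoader().loadTestsFromTestCase(ExampleFixtureTest)
    result = unittest.TestResult()
    suite.run(result)
    return result.testsRun, result.wasSuccessful()
```

Because setUp and tearDown run around every test method, each test gets a fresh file; the docstrings on the test methods are what a verbose test runner prints alongside each result.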
From mjldehoon at yahoo.com Sat Feb 14 02:14:27 2009 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 13 Feb 2009 18:14:27 -0800 (PST) Subject: [Biopython-dev] run_tests.py rewrite In-Reply-To: <320fb6e00902130748y453c0965id99cbb36cb680ee6@mail.gmail.com> Message-ID: <743351.68664.qm@web62405.mail.re1.yahoo.com> Currently, Numpy doesn't seem to work with python < 2.4, so for reliability maybe Biopython also should require python >= 2.4. --Michiel --- On Fri, 2/13/09, Peter wrote: > From: Peter > Subject: Re: [Biopython-dev] run_tests.py rewrite > To: mjldehoon at yahoo.com > Cc: biopython-dev at biopython.org > Date: Friday, February 13, 2009, 10:48 AM > On Wed, Feb 4, 2009 at 1:14 PM, Peter > wrote: > > It look a little while to show up in CVS for me, but > I've got it now. > > That seems to solve the problem neatly - and > you've even managed > > to capture the stack trace elegantly, something I > hadn't worked out how to do. > > > > Nice :) > > Unfortunately, the traceback.format_exc() function you used > to capture > the stack trace for print out is Python 2.4+ only[1]. This > means if > one of the print-and-compare tests fails with an exception > on Python > 2.3, then run_tests.py will fall over. I've checked in > a simple fix > to use the exception text instead - I'm sure something > more useful > could be done for Python 2.3, but we'll be dropping > support for this > fairly soon anyway. > > This was failing for me on a "known failure", > test_Clustalw_tool.py on > Windows Python 2.3, where some filenames with spaces just > won't work > without the subprocess module (Python 2.4+ only). I > don't think this > can be avoided, so I've updated test_Clustalw_tool.py > to skip this bit > in future. 
> > Peter > > [1] See http://docs.python.org/library/traceback.html From biopython at maubp.freeserve.co.uk Sat Feb 14 14:31:43 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sat, 14 Feb 2009 14:31:43 +0000 Subject: [Biopython-dev] run_tests.py rewrite In-Reply-To: <743351.68664.qm@web62405.mail.re1.yahoo.com> References: <320fb6e00902130748y453c0965id99cbb36cb680ee6@mail.gmail.com> <743351.68664.qm@web62405.mail.re1.yahoo.com> Message-ID: <320fb6e00902140631n4b5b472bi9e3f8de0e3a8647@mail.gmail.com> On Sat, Feb 14, 2009 at 2:14 AM, Michiel de Hoon wrote: > > Currently, Numpy doesn't seem to work with python < 2.4, so for reliability > maybe Biopython also should require python >= 2.4. > What specifically are you referring to? I've not had any trouble with older versions of numpy on python 2.3 - although I believe later versions of numpy do require python 2.4+ (this must be stated on their website somewhere). We've already said that the next release (Biopython 1.50) will be the last to officially support Python 2.3, so this isn't going to be an issue for much longer anyway. Peter From biopython at maubp.freeserve.co.uk Sat Feb 14 14:37:16 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sat, 14 Feb 2009 14:37:16 +0000 Subject: [Biopython-dev] docstring tests In-Reply-To: <729099.67585.qm@web62405.mail.re1.yahoo.com> References: <499593C2.3090806@gmail.com> <729099.67585.qm@web62405.mail.re1.yahoo.com> Message-ID: <320fb6e00902140637j68c03936q5efcd1e6ebe8313d@mail.gmail.com> On Sat, Feb 14, 2009 at 2:12 AM, Michiel de Hoon wrote: > I wasn't planning to, since "python setup.py test" currently does not > allow selecting for any of the test scripts either; it just runs all of them. > But I won't object if somebody else (wink, wink) adds this capability > to "python setup.py test". 
On the other hand, you may think of > "python setup.py test" as the quick-and-comprehensive way to > run the tests, and run_tests.py as a more specialized tool that > gives you more control. I think this is fine as it is, and would not be keen on adding any redundant code to the setup.py file (there is a small risk of causing problems for third party integrators, py2exe etc). The "python setup.py test" is really there as part of the installation procedure, where you would just want all the tests run. The only reason I'd want to run some of the tests is if debugging something - and this is where you would use run_tests.py directly. Peter From bugzilla-daemon at portal.open-bio.org Sat Feb 14 14:59:53 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 14 Feb 2009 09:59:53 -0500 Subject: [Biopython-dev] [Bug 2749] Proposal: a template for biopython's unittests In-Reply-To: Message-ID: <200902141459.n1EExr6u006975@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2749 ------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-14 09:59 EST ------- (In reply to comment #6) > > In the example, you could add a comment in the setUp and tearDown functions, > something like: > def setUp(self): > """these instructions will be executed *before* each of the tests in this > unit""" > > and > > def tearDown(self): > """these instructions will be executed *after* each of the tests in this > unit""" > > It will make it clearer. I've added a mention about setUp and tearDown. > Moreover, the python's library reference for unittest explain very clearly how > fixtures and unittest works, maybe it's worth to add a link to it somewhere: > - http://www.python.org/doc/2.5.2/lib/module-unittest.html There was one link at the start of the chapter, but I have added a couple more. 
We don't need too much detail - that's what the unittest documentation is for ;) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From dalloliogm at gmail.com Sat Feb 14 16:27:19 2009 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Sat, 14 Feb 2009 17:27:19 +0100 Subject: [Biopython-dev] SVN migration and Launchpad mirroring In-Reply-To: <8b34ec180902100421o1680735dsd68d890d8ccfbf4f@mail.gmail.com> References: <3f6baf360902061211o4da786b0q5f788efcc63e2bb1@mail.gmail.com> <320fb6e00902070455h72c7bd31w506f5ed52e9633bc@mail.gmail.com> <3f6baf360902072220j5c565449i4c7266046051207f@mail.gmail.com> <5aa3b3570902080847p1a126664k4a76b7f19a0ed987@mail.gmail.com> <8b34ec180902081103r1befae9bt33e9024bd43f37fb@mail.gmail.com> <128a885f0902081134m255ec4eao21c75aaf08f9d8f5@mail.gmail.com> <499053F9.60709@gmail.com> <3f6baf360902091239v5988749cm1f48c21d2f19ca9b@mail.gmail.com> <4990A27A.9060500@gmail.com> <8b34ec180902100421o1680735dsd68d890d8ccfbf4f@mail.gmail.com> Message-ID: <5aa3b3570902140827n55b210f4q49ed9d4b8b1c56fe@mail.gmail.com> On Tue, Feb 10, 2009 at 1:21 PM, Bartek Wilczynski wrote: > Hi, > > > Once I had that, I could publish my private branch of biopython to > launchpad (it took about 10s). > Now, if anyone is interested in test-driving bazaar+launchpad with > biopython, he/she can just > branch it to your own computer (you don't need any account for that, > just bzr installed): > bzr branch lp:~bartek/junk/biopython > > I did that (branch) on a different computert (~2min). Now one can > start modifying code. > I've done some changes to the Bio.Motif code (add a method, commit > locally, fix a small bug in it, > commit again, test) and pushed the changes to the branch on launchpad. 
> Commits are quick (~3s), > push takes about a minute, but this is including a scan of the whole > tree, so it should not > take much longer than this for bigger changes. > > Note:This is my own branch, so I can commit to it, but if I was not > the owner (or maintainer) of the > branch, I would have to either send my changes to the maintainer or > publish my branch and let him > "pull" from it. > > I realised later that I've accidentaly added a large directory during > tailor conversion, so I removed it in > the original bzr branch (as made by tailor) merged it with the changes > pushed already to launchpad > from somewhere else (Motif) and pushed the resulting tree back to > launchpad.The removal was very fast > (~5s) and the push took about the same time as with the small > change.The good thing is that the history > of all changes is retained. > > If anyone wants to give it a try, just install bzr and you can easily > branch from me using: > bzr branch lp:~bartek/junk/biopython Hi, I was trying bazaar. These are the steps I did, can you check if I did everything correctly? - I have created an account on launchpad and uploaded an ssh key (on my home page, -> click on 'Profile', then 'Edit details', and then 'Ssh keys' - it costed me a bit to find it at first :) ). - On my computer, from a terminal, I have executed "bzr-launchpad-login " to login to launchpad. - I ran "bzr branch lp:~bartek/junk/biopython", to create a branch of your repository in my computer. - I did some stupid changes (I must have messed a bit with creating branches), and then committed them. So now, if I want to inform you of the changes I have made, how does it work? Which is the correct bazaar command to pull a merge request? - In the meantime, I have created an entry for my branch on launchpad. I went to my home page, clicked on 'Code', and then 'Register a new branch'. 
On the 'Reference Project' field, I couldn't find your project, only the biopython created a few years ago of which you were asking earlier. This is my branch: - https://code.launchpad.net/~dalloliogm-gmail/+junk/biopython-gio How do I link it to your repo, now? > > The branch history can be seen here: > https://code.launchpad.net/~bartek/+junk/biopython/ > > And the annotated source code is here: > http://bazaar.launchpad.net/~bartek/+junk/biopython/files > > The specific changes done by me can be seen as revisions: > http://bazaar.launchpad.net/~bartek/%2Bjunk/biopython/revision/3460 > http://bazaar.launchpad.net/~bartek/%2Bjunk/biopython/revision/3459.1.1 > http://bazaar.launchpad.net/~bartek/%2Bjunk/biopython/revision/3459.1.2 > > In summary, I think that it's doable to convert current CVS tree to bzr and > bzr handle the job of a DVCS. Performance is not stellar (epsecially code > browsing in launchpad is sometimes slow) but for it's acceptable, especially > given that I'm rarely browsing the history, and much more often use command > line tools which are (for me) fast enough. > > Please let me know what others think. If there will be general > interest in that, I > can try to set up a more permanent (but still experimental) bzr branch which > would be automatically synchronized from CVS, so that we can do a more > long-term experiment to see whether it works, and people like it. 
> > cheers > Bartek > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > -- My blog on bioinformatics (now in English): http://bioinfoblog.it From biopython at maubp.freeserve.co.uk Sat Feb 14 18:47:30 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sat, 14 Feb 2009 18:47:30 +0000 Subject: [Biopython-dev] External python dependencies and doctests Message-ID: <320fb6e00902141047m6d71d977t946a018482313176@mail.gmail.com> Hi all, Currently the doctest handling in run_tests.py requires some special cases for those modules with an optional external dependency, for example the Bio.Statistics.lowess doctests will only work if NumPy is installed. We *could* just run all the doctests, and catch and ignore any import errors. However, an import error might be a real error in Biopython (e.g. if something was deleted or moved). This is therefore probably a bad idea. I was thinking we could introduce new exception(s) which subclasses both the ImportError and our MissingExternalDependencyError exception. This can then be treated as another variant of MissingExternalDependencyError and ignored by run_tests.py, plus as it is also an ImportError any third party scripts can continue to catch import errors as before. This means that run_tests.py doesn't need to know if some doctests require NumPy (or ReportLab) or not - we can just run them and find out (see patch below). The downside is that any bits of Biopython where we import numpy or reportlab (or at least those with doctests) would need to catch any import error and re-raise it (as below). I'm not sure if this is a good idea or not. It would certainly be useful if we want to switch to having the doctests found automatically (which is probably a good idea in the long run - the hand coded list was just my short term pragmatic solution). 
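As a rough sketch of the idea above (not the actual run_tests.py code): an exception that inherits from both ImportError and a missing-dependency base class is caught by either `except` clause. The class bodies here are simplified stand-ins, and `run_one_test`, `needs_numpy` and `broken_import` are hypothetical names used only for illustration.

```python
class MissingExternalDependencyError(Exception):
    """Simplified stand-in for Biopython's exception of the same name."""


class MissingPythonDependencyError(MissingExternalDependencyError, ImportError):
    """Raised when an optional Python library cannot be imported."""


def needs_numpy():
    # Hypothetical test body: pretend the NumPy import failed.
    raise MissingPythonDependencyError("NumPy is not installed")


def broken_import():
    # Hypothetical test body: a genuine import problem that must be reported.
    raise ImportError("No module named Bio.Moved")


def run_one_test(test):
    """Hypothetical runner step: skip on a missing dependency, report other import errors."""
    try:
        test()
    except MissingExternalDependencyError as err:
        return "skipped: %s" % err
    except ImportError as err:
        return "ERROR: %s" % err
    return "ok"


print(run_one_test(needs_numpy))
print(run_one_test(broken_import))
```

Third-party code that only knows about ImportError would still catch the first case, which is the compatibility argument made above.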
Peter

Index: Bio/__init__.py
===================================================================
RCS file: /home/repository/biopython/biopython/Bio/__init__.py,v
retrieving revision 1.31
diff -r1.31 __init__.py
14a15,37
>
> class MissingPythonDependencyError(MissingExternalDependencyError,ImportError) :
>     """Exception for missing python libraries.
>
>     This should be used when "import numpy" or "import reportlab" fail.
>     This exception subclasses both the standard python ImportError, and
>     our Biopython MissingExternalDependencyError meaning it can be caught
>     using "except ImportError" or "except MissingExternalDependencyError".
>     This is important for our test framework.
>     """
>     pass
>
> class MissingNumPyDependencyError(MissingPythonDependencyError) :
>     """Exception for when NumPy is not installed."""
>     def __str__(self) :
>         return "This requires the Numerical Python library, NumPy, " + \
>                "freely available from http://www.numpy.org"
>
> class MissingReportLabDependencyError(MissingPythonDependencyError) :
>     """Exception for when ReportLab is not installed."""
>     def __str__(self) :
>         return "This requires the python library ReportLab, " + \
>                "freely available from http://www.reportlab.org"
Index: Bio/Statistics/lowess.py
===================================================================
RCS file: /home/repository/biopython/biopython/Bio/Statistics/lowess.py,v
retrieving revision 1.10
diff -r1.10 lowess.py
23c23,27
< import numpy
---
> try :
>     import numpy
> except ImportError:
>     from Bio import MissingNumPyDependencyError
>     raise MissingNumPyDependencyError()
Index: Bio/Cluster/__init__.py
===================================================================
RCS file: /home/repository/biopython/biopython/Bio/Cluster/__init__.py,v
retrieving revision 1.13
diff -r1.13 __init__.py
1c1,6
< import numpy
---
> try :
>     import numpy
> except ImportError:
>     from Bio import MissingNumPyDependencyError
>     raise MissingNumPyDependencyError()
>
Index: Bio/Graphics/__init__.py
===================================================================
RCS file: /home/repository/biopython/biopython/Bio/Graphics/__init__.py,v
retrieving revision 1.2
diff -r1.2 __init__.py
2d1
<
7c6,7
<     raise ImportError("Install ReportLab if you want to use Bio.Graphics. You can find ReportLab at http://www.reportlab.org/downloads.html")
---
>     from Bio import MissingReportLabDependencyError
>     raise MissingReportLabDependencyError()

From bsouthey at gmail.com Sat Feb 14 20:12:49 2009
From: bsouthey at gmail.com (Bruce Southey)
Date: Sat, 14 Feb 2009 14:12:49 -0600
Subject: [Biopython-dev] run_tests.py rewrite
In-Reply-To: <320fb6e00902140631n4b5b472bi9e3f8de0e3a8647@mail.gmail.com>
References: <320fb6e00902130748y453c0965id99cbb36cb680ee6@mail.gmail.com>
	<743351.68664.qm@web62405.mail.re1.yahoo.com>
	<320fb6e00902140631n4b5b472bi9e3f8de0e3a8647@mail.gmail.com>
Message-ID:

On Sat, Feb 14, 2009 at 8:31 AM, Peter wrote:
> On Sat, Feb 14, 2009 at 2:14 AM, Michiel de Hoon wrote:
>>
>> Currently, Numpy doesn't seem to work with python < 2.4, so for reliability
>> maybe Biopython also should require python >= 2.4.
>>
>
> What specifically are you referring to? I've not had any trouble with
> older versions of numpy on python 2.3 - although I believe later
> versions of numpy do require python 2.4+ (this must be stated on their
> website somewhere).

NumPy version 1.2 requires Python 2.4 or above. This was in the release notes, but the web site has not been updated! Likewise, there is no information on the lack of support for Python 2.6 on Windows - which should be in the NumPy 1.3 release (due to major issues with creating the binary installer).
Bruce

From biopython at maubp.freeserve.co.uk Sat Feb 14 21:32:00 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sat, 14 Feb 2009 21:32:00 +0000
Subject: [Biopython-dev] External python dependencies and doctests
In-Reply-To: <320fb6e00902141047m6d71d977t946a018482313176@mail.gmail.com>
References: <320fb6e00902141047m6d71d977t946a018482313176@mail.gmail.com>
Message-ID: <320fb6e00902141332m7bba0497g6650a883b86c1994@mail.gmail.com>

On Sat, Feb 14, 2009 at 6:47 PM, Peter wrote:
> Hi all,
>
> Currently the doctest handling in run_tests.py requires some special
> cases for those modules with an optional external dependency, for
> example the Bio.Statistics.lowess doctests will only work if NumPy is
> installed. We *could* just run all the doctests, and catch and ignore
> any import errors. However, an import error might be a real error in
> Biopython (e.g. if something was deleted or moved). This is therefore
> probably a bad idea.

I've been thinking about the exception idea in my previous email, and maybe it is too complicated - it would be a hassle in the long term to have to manually add this "catch ImportError and raise missing dependency" code all over the place.

An alternative would be to catch all ImportError exceptions in run_tests.py, and treat numpy and reportlab as special cases and skip those tests. Other ImportError cases would indeed be errors. This is basically what I suggested a while back on Bug 2524.
http://bugzilla.open-bio.org/show_bug.cgi?id=2524

Perhaps this is better - it puts the special case code in one place only (run_tests.py), meaning that our unit tests needing numpy or reportlab don't need to do anything special about raising a missing dependency error. This isn't a big issue for the unit tests, but for the doctests this is a significant benefit I think.

[The missing external dependency exception is still useful for missing command line tools - although I'm not sure how best to cope with this in a doctest.
See test_psw.py and test_wise.py for an example of this - they are basically doctests with a wrapper to determine if the dnal command line tool is installed.] Peter From bugzilla-daemon at portal.open-bio.org Sat Feb 14 22:06:24 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 14 Feb 2009 17:06:24 -0500 Subject: [Biopython-dev] [Bug 2754] Bio.PDB: Parse warnings should print to stderr, not stdout In-Reply-To: Message-ID: <200902142206.n1EM6O83011973@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2754 ------- Comment #9 from eric.talevich at gmail.com 2009-02-14 17:06 EST ------- (In reply to comment #8) Yes, something must be done with test_PDB.py, because I don't think warnings.warn can be made to play nice with that print-and-compare test -- or any print-and-compare, since the warning messages contain extra environment-specific information. The test suite I'm picturing looks like this: - Load the PDB file with permissive=0, verify the first PDBException. - Add a warning filter to silence the first message, retry loading, verify the next Exception. - Repeat for all the expected errors in the PDB file. - Silence warnings and load the PDB file with permissive=1; continue the usual print-and-compare tests. I think doctest can be coaxed into ignoring part of an output message with ellipses, and unittest might have an assertion for error messages or we could just catch the exception and check the message directly. So: either way, test_PDB.py gets a rewrite, and the example PDB file can stay the way it is. > For example, running the test from the command line the first message is: > PDBConstructionException: Atom N defined twice in residue resseq=2 icode= > at line 19. > Exception ignored. > > Is that correct or desired output? 
Yes, but warnings.warn prepends the absolute file path and the line number where the warning was raised (there's an option to make it look deeper in the stack, for catch-and-release cases like this one), so even if sys.stdout is assigned to sys.stderr, the text doesn't match exactly and the test fails. The important thing is that a PDBConstructionException is raised, and to be precise, that the message contains "defined twice", as I understand it.

> The actual error is in my mind irrelevant although I do wonder why a special
> exception is used.

Two advantages for the user:

1. Tracebacks make it clear that there was a problem parsing the PDB file. Otherwise, it's a little unclear whether there's a problem in the user's code, a real bug in Biopython, or something wrong with the PDB file itself.
2. User code can catch a PDBConstructionException specifically and let other exceptions fall through, e.g. an IOError which could require different handling.

> (In reply to comment #6)
> There are a few cases of this so I think a separate bug should be filed. But
> cleaning these up would be appreciated, at least by me.

Cases of file-specific error handling, or sys.stderr/stdout abuse? Both sound like good cleanup tasks.

In the case of __debug__ protection, it looks like normally Python executes with __debug__==True except when run with -O. Like turning off assertions, you know. Given that, and the simplicity of turning off warnings globally in user code (import warnings; warnings.simplefilter('ignore')), I think it's safe to remove these checks and just issue the warnings directly.

For the other stunt in PDBParser, that seems like it deserves a separate patch at the very least, so I'm not going to attempt to resolve it in this bug unless it's breaking something else.

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
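A minimal sketch of the warning-based testing idea discussed in comment #9 above, using only the standard library. `ParseWarning`, `parse_atom` and `collect_messages` are hypothetical stand-ins, not Bio.PDB names. The point is only that a test can check for a fragment of the warning message instead of comparing printed output, which would include the file path and line number.

```python
import warnings


class ParseWarning(UserWarning):
    """Hypothetical warning category standing in for a PDB parse warning."""


def parse_atom(seen, name):
    """Hypothetical parser step: warn (rather than print) on a duplicate atom."""
    if name in seen:
        warnings.warn("Atom %s defined twice in residue" % name, ParseWarning)
    seen.add(name)


def collect_messages(names):
    """Run the parser step and return the text of every warning it issued."""
    seen = set()
    with warnings.catch_warnings(record=True) as caught:
        # "always" stops the default once-per-location rule from hiding repeats.
        warnings.simplefilter("always")
        for name in names:
            parse_atom(seen, name)
    return [str(w.message) for w in caught]


messages = collect_messages(["N", "CA", "N"])
assert len(messages) == 1
assert "defined twice" in messages[0]
```

Replacing `"always"` with `"error"` would turn each warning into a raised exception, which is closer to the strict (permissive=0) behaviour described in the comment.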
From bugzilla-daemon at portal.open-bio.org Sun Feb 15 07:18:49 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 15 Feb 2009 02:18:49 -0500 Subject: [Biopython-dev] [Bug 2693] LogisticRegression convergence criterion is too lenient In-Reply-To: Message-ID: <200902150718.n1F7InuB029333@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2693 ------- Comment #2 from mdehoon at ims.u-tokyo.ac.jp 2009-02-15 02:18 EST ------- With this patch, the test_LogisticRegression.py unit test fails. Could you check that? Also, it is not necessary to pass old_llik to update_fn; if needed, update_fn can store the value of llik on each call. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bartek at rezolwenta.eu.org Sun Feb 15 14:18:25 2009 From: bartek at rezolwenta.eu.org (Bartek Wilczynski) Date: Sun, 15 Feb 2009 15:18:25 +0100 Subject: [Biopython-dev] SVN migration and Launchpad mirroring In-Reply-To: <5aa3b3570902140827n55b210f4q49ed9d4b8b1c56fe@mail.gmail.com> References: <3f6baf360902061211o4da786b0q5f788efcc63e2bb1@mail.gmail.com> <3f6baf360902072220j5c565449i4c7266046051207f@mail.gmail.com> <5aa3b3570902080847p1a126664k4a76b7f19a0ed987@mail.gmail.com> <8b34ec180902081103r1befae9bt33e9024bd43f37fb@mail.gmail.com> <128a885f0902081134m255ec4eao21c75aaf08f9d8f5@mail.gmail.com> <499053F9.60709@gmail.com> <3f6baf360902091239v5988749cm1f48c21d2f19ca9b@mail.gmail.com> <4990A27A.9060500@gmail.com> <8b34ec180902100421o1680735dsd68d890d8ccfbf4f@mail.gmail.com> <5aa3b3570902140827n55b210f4q49ed9d4b8b1c56fe@mail.gmail.com> Message-ID: <8b34ec180902150618p40805703oa700f6d8acbe0aec@mail.gmail.com> Hi, On Sat, Feb 14, 2009 at 5:27 PM, Giovanni Marco Dall'Olio wrote: > Hi, > I was trying bazaar. > > These are the steps I did, can you check if I did everything correctly? 
>
I think so.

> This is my branch:
> - https://code.launchpad.net/~dalloliogm-gmail/+junk/biopython-gio
>
> How do I link it to your repo, now?
>

The problem is that +junk branches cannot be proposed for merging in launchpad (see https://help.launchpad.net/Code/PersonalBranches).

If you don't have a launchpad project and don't want to set one up, you have two options. The first is to just send me a changeset (similar to a patch). You use the command

bzr send -o my_changeset_filename

which generates a text file with your changes, and you can just send it to me, so that I can merge the changes into my tree.

The other option is to send me a link to your branch and I can pull from it (I can pull from +junk branches).

In order to have all the functionality of merge proposals, code review etc. we need a launchpad project. I created one:
https://launchpad.net/biopython-test
for the purpose of testing. I've added you (giovanni) to the team of maintainers of the project. I also created two branches and requested one of them to be merged with the other.

If anyone now pushes their branch to a proper place:
lp:~username/biopython-test/my_branch_name
it can be proposed for merging into biopython-test.

Branches pushed to the project directory directly (e.g. lp:biopython-test/trunk) have write permissions for all team members.

If anyone wants to give it a try, please join the biopython-test team on launchpad (you'll need a launchpad account).

cheers
Bartek

From dalloliogm at gmail.com Sun Feb 15 15:29:53 2009
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Sun, 15 Feb 2009 16:29:53 +0100
Subject: [Biopython-dev] biopython on github
Message-ID: <5aa3b3570902150729g367022a5p334b2c33f86461f@mail.gmail.com>

Hi,
I have uploaded a git-converted branch of biopython on github, in case you want to try it and see how it works.

You can find it here:
- http://github.com/biopython/biopython/

To work with it, the optimal protocol is:
- create an account on github.com.
Upload an ssh public key by clicking on 'account' after having logged in. It is not mandatory to use github, but it will help you understanding how git works, and it allows other people to follow your branches and your work. - go to the biopython repo: http://github.com/biopython/biopython/tree/master and you will see a button named 'Fork': click on it. It will create a fork of the official biopython repository your personal account. Here the word 'fork' is not used in the common way it is, but just to indicate that you are going to work on a modified version of the official code, and it's not even a git command. - now, install git on your computer, and execute the following commands: $: git clone git at github.com:/biopython.git $: git remote add official_dist git://github.com/biopython/biopython.git With the first command, you will download a copy of the repository on your local computer, which will be the one you will modify (technically, you are creating a new branch on your computer). With the second command, you are adding a reference to the official biopython repository, so in the future you will be able to easily import the official code and compare it with yours. 
Here it is an explanation on these two commands: http://github.com/guides/keeping-a-git-fork-in-sync-with-the-forked-repo p.s.: to convert to git from cvs I have followed the instructions here: - http://www.kernel.org/pub/software/scm/git/docs/v1.4.4.4/cvs-migration.html This seems to be a good tutorial on git, too: - http://www.kernel.org/pub/software/scm/git/docs/v1.4.4.4/tutorial.html -- My blog on bioinformatics (now in English): http://bioinfoblog.it From bugzilla-daemon at portal.open-bio.org Mon Feb 16 13:00:53 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 16 Feb 2009 08:00:53 -0500 Subject: [Biopython-dev] [Bug 2734] db.load problem with postgresql and psycopg2 In-Reply-To: Message-ID: <200902161300.n1GD0rep000706@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2734 andrea at biodec.com changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |andrea at biodec.com ------- Comment #7 from andrea at biodec.com 2009-02-16 08:00 EST ------- (In reply to comment #6) > (In reply to comment #5) > > (In reply to comment #3) > > > > > > What versions of biopython and the BioSQL schema are you using? > > > > > > Cymon > > > > According to the bug report, Stephen was using Biopython 1.49, so: > > > > Stephen: > > Biopython 1.49 > > postgresql 8.2 > > BioSQL - schema version unspecified > > psycopg2 - version unspecified > > python - version unspecified > > OS - Mac OS X > > > > What about you Cymon - you have postgresql with psycopg2 working, but what > > versions of things? > > > > Peter > > > > Peter, > > I'm using: > Biopython: CVS > Posgresql: 8.1.11 > BioSQL: 1.0.1 > Python: 2.5.2 > Psycopg: 2.0.8 > OS: Red Hat Enterprise 5.3 > > C. > Hi, the problem, according to me, is already solved. It seems that Stephen has an old version of Loader.py. 
I submitted a bug and patch that explain that for Postgres is not possible to have double quotes in queries ("). Double quotes are reserved to Column names. In the correct Loader.py version everything is corrected and there aren't double quotes in any queries at all. Stephen, please check if: Loader.DatabaseLoader._get_seqfeature_dbxref is equivalent to: def _get_seqfeature_dbxref(self, seqfeature_id, dbxref_id, rank): """ Check for a pre-existing seqfeature_dbxref entry with the passed seqfeature_id and dbxref_id. If one does not exist, insert new data """ # Check for an existing record sql = r"SELECT seqfeature_id, dbxref_id FROM seqfeature_dbxref " \ r"WHERE seqfeature_id = '%s' AND dbxref_id = '%s'" result = self.adaptor.execute_and_fetch_col0(sql, (seqfeature_id, dbxref_id)) # If there was a record, return without executing anything, else create # the record and return if result: return result return self._add_seqfeature_dbxref(seqfeature_id, dbxref_id, rank) maybe in your version, there are still double quotes ("%s") instead of single quotes ('%s') Andrea -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Feb 16 13:24:53 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 16 Feb 2009 08:24:53 -0500 Subject: [Biopython-dev] [Bug 2734] db.load problem with postgresql and psycopg2 In-Reply-To: Message-ID: <200902161324.n1GDOrJE003880@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2734 ------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-16 08:24 EST ------- (In reply to comment #7) > Hi, > the problem, according to me, is already solved. > It seems that Stephen has an old version of Loader.py. Well spotted Andrea - you may be right... 
> I submitted a bug and patch that explain that for Postgres > is not possible to have double quotes in queries ("). > Double quotes are reserved to Column names. > > In the correct Loader.py version everything is corrected > and there aren't double quotes in any queries at all. > ... Andrea is referring to Bug 2506, which was fixed in Loader.py CVS revision 1.31, which means it was included in Biopython 1.46 onwards. Stephen said he was using Biopython 1.49, the error may be an out of date Loader.py which is still using double quotes: Quoting comment #0, > ... > pq_execute: executing SYNC query: > SELECT seqfeature_id, dbxref_id FROM seqfeature_dbxref WHERE seqfeature_id > = "3" AND dbxref_id = "6" > pq_execute: entering syncronous DBAPI compatibility mode > pq_fetch: pgstatus = PGRES_FATAL_ERROR > pq_fetch: uh-oh, something FAILED > pq_fetch: fetching done; check for critical errors > psyco_curs_execute: res = -1, pgres = 0x0 Certainly the SQL command shown in the pg log has double quotes. > Traceback (most recent call last): > ... > File "/Library/Python/2.5/site-packages/BioSQL/Loader.py", line 645, in > _load_seqfeature_dbxref > self._get_seqfeature_dbxref(seqfeature_id, dbxref_id, rank+1) > File "/Library/Python/2.5/site-packages/BioSQL/Loader.py", line 679, in > _get_seqfeature_dbxref > dbxref_id)) > File "/Library/Python/2.5/site-packages/BioSQL/BioSeqDatabase.py", line 295, > in execute_and_fetch_col0 > self.cursor.execute(sql, args or ()) According to that traceback, line 679 in _get_seqfeature_dbxref is excuting some bad SQL (presumably the double quotes). This line number doesn't match up for Biopython 1.49, so it probably is an older version of Biopython. Stephen - maybe you have more than one copy of Biopython installed (e.g. and old system level copy, and a new local copy)? 
You could try deleting these directories and then reinstalling Biopython:

/Library/Python/2.5/site-packages/Bio
/Library/Python/2.5/site-packages/BioSQL

Peter

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Mon Feb 16 16:05:27 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 16 Feb 2009 11:05:27 -0500
Subject: [Biopython-dev] [Bug 2693] LogisticRegression convergence criterion is too lenient
In-Reply-To:
Message-ID: <200902161605.n1GG5RLr010326@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2693

------- Comment #3 from bsouthey at gmail.com 2009-02-16 11:05 EST -------
(In reply to comment #2)
> With this patch, the test_LogisticRegression.py unit test fails.
> Could you check that?

Yes, it fails because the test example does not converge with the defaults (try the example in R or SAS) and, thus, does not provide a valid check for logistic regression.

>
> Also, it is not necessary to pass old_llik to update_fn; if needed, update_fn
> can store the value of llik on each call.

I guess this is all about how you define the purpose of the update_fn function.

Bruce

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From dalloliogm at gmail.com Mon Feb 16 16:40:03 2009
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Mon, 16 Feb 2009 17:40:03 +0100
Subject: [Biopython-dev] biopython on github
In-Reply-To: <5aa3b3570902150729g367022a5p334b2c33f86461f@mail.gmail.com>
References: <5aa3b3570902150729g367022a5p334b2c33f86461f@mail.gmail.com>
Message-ID: <5aa3b3570902160840p41948844tfe73b51cf37e6a7@mail.gmail.com>

On Sun, Feb 15, 2009 at 4:29 PM, Giovanni Marco Dall'Olio wrote:
> Hi,
> I have uploaded a git-converted branch of biopython on github, in case
> you want to try it and see how it works.

So, yesterday Bartek and I tried github a bit, and we both made some test commits to our personal development branches.

If you go here:
- http://github.com/biopython/biopython/network
you will see the network of all the changes each of us made and the differences between the various branches. The application that creates the diagram tries to minimize the number of branches shown, so you may not see my branch or Bartek's if one of the two can be included in the other.

If you create your own branch, and later other people commit other changes on other forks of the same project, you will have a utility to list all these changes directly from github. It will look like this:
- http://img8.imageshack.us/img8/4194/biopythonforkqueueod5.png

So, in principle these are the most useful features that I think github offers and that I couldn't find in other similar software (e.g. trac).

On the other hand, github has some disadvantages: it is a commercial product, and it has no specific tool to integrate it with a bug tracker.
-- My blog on bioinformatics (now in English): http://bioinfoblog.it From bugzilla-daemon at portal.open-bio.org Mon Feb 16 19:06:55 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 16 Feb 2009 14:06:55 -0500 Subject: [Biopython-dev] [Bug 2697] MaxEntropy calculate function assumes integer values for class and convergence criteria is hard coded In-Reply-To: Message-ID: <200902161906.n1GJ6t3M022304@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2697 ------- Comment #10 from bsouthey at gmail.com 2009-02-16 14:06 EST ------- (In reply to comment #9) > Created an attachment (id=1212) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1212&action=view) [details] > Patch to Bio/MaxEntropy.py to make the convergence parameters optional > arguments > > This time its the whole patch - sorry for the extra emails this has triggered. > I had stopped to check in a couple of docstring changes and fixed a few tabs in > MaxEntropy.py first, which confused things. > > Note this is a bit different to what I was thinking in comment #5, > > ... something like this: > > > > def train(training_set, results, feature_fns, update_fn=None, > > max_iis_iterations = MAX_IIS_ITERATIONS, > > iis_convere = IIS_CONVERGE, > > max_newton_iterations = MAX_NEWTON_ITERATIONS > > newton_coverage = NEWTON_CONVERGE): > > The above code won't pick up changes to the module level variables like > MAX_IIS_ITERATIONS because the defaults are only evaluated once when the > function is created. My patch removed these hard coded default values and placed them in the function. 
>The patch deals with this as follows: > > def train(training_set, results, feature_fns, update_fn=None, > max_iis_iterations=None, iis_converge=None, > max_newton_iterations=None, newton_converge=None): > if max_iis_iterations is None : > max_iis_iterations = MAX_IIS_ITERATIONS > if iis_converge is None : > iis_converge = IIS_CONVERGE > if max_newton_iterations is None : > max_newton_iterations = MAX_NEWTON_ITERATIONS > if newton_converge is None : > newton_converge = NEWTON_CONVERGE > > This works :) > I hate the use of the local variable being the lowercase version of another variable. Obviously for the original variables we are stuck with uppercase for backwards compatibility. So we need to change the names of the lowercase variables. Bruce -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Feb 16 23:14:53 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 16 Feb 2009 18:14:53 -0500 Subject: [Biopython-dev] [Bug 2697] MaxEntropy calculate function assumes integer values for class and convergence criteria is hard coded In-Reply-To: Message-ID: <200902162314.n1GNEr9P028162@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2697 ------- Comment #11 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-16 18:14 EST ------- (In reply to comment #10) > > My patch removed these hard coded default values and placed them in the > function. > Yes - and in doing so it broke the existing API. My way preserves the ability to alter these module level variables as another way to control the funtion. We could describe these old upper case module level variables as obsolete in the docstring as a step to phasing them out. 
> >The patch deals with this as follows: > > > > def train(training_set, results, feature_fns, update_fn=None, > > max_iis_iterations=None, iis_converge=None, > > max_newton_iterations=None, newton_converge=None): > > if max_iis_iterations is None : > > max_iis_iterations = MAX_IIS_ITERATIONS > > if iis_converge is None : > > iis_converge = IIS_CONVERGE > > if max_newton_iterations is None : > > max_newton_iterations = MAX_NEWTON_ITERATIONS > > if newton_converge is None : > > newton_converge = NEWTON_CONVERGE > > > > This works :) > > > I hate the use of the local variable being the lowercase version of another > variable. Obviously for the original variables we are stuck with uppercase for > backwards compatibility. So we need to change the names of the lowercase > variables. Hate is a rather strong word. I can see that having the same name except for the case could confuse some people if they are not used to case mattering, but otherwise using the same name seems like a GOOD idea to me for consistency. Do you have any concrete suggestions? We could expand ISS into words. On a related point, using a lower case N for Newton feels a bit wrong to me ;) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
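The difference between the two signatures discussed in this bug can be checked with a small self-contained sketch. The names below (MAX_ITERATIONS, train_fixed, train_late) are hypothetical stand-ins, not the Bio.MaxEntropy code. A default written in the def line is evaluated once, when the function is created, while a None sentinel looks up the module level value at call time:

```python
# Hypothetical stand-in for a module level setting such as MAX_IIS_ITERATIONS.
MAX_ITERATIONS = 100


def train_fixed(max_iterations=MAX_ITERATIONS):
    # The default was captured when the def statement ran.
    return max_iterations


def train_late(max_iterations=None):
    # The module level value is looked up on every call.
    if max_iterations is None:
        max_iterations = MAX_ITERATIONS
    return max_iterations


# A user changes the module level variable after import.
MAX_ITERATIONS = 5

print(train_fixed())   # 100, the later change is not picked up
print(train_late())    # 5, the later change is picked up
print(train_late(20))  # 20, an explicit argument still wins
```

With the None sentinel both ways of controlling the function keep working: passing an argument, or altering the module level variable.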
From bugzilla-daemon at portal.open-bio.org Tue Feb 17 13:22:57 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 17 Feb 2009 08:22:57 -0500 Subject: [Biopython-dev] [Bug 2762] New: GFF capability in SeqIO Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2762 Summary: GFF capability in SeqIO Product: Biopython Version: 1.49b Platform: All OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: lpritc at scri.sari.ac.uk I'm increasingly coming across GFF format files, and SeqIO currently can't handle them. It might be useful if at some point in the future, it could. Also, the Bio.GFF module handles access to a database, and doesn't provide a mechanism for importing or writing GFF format files. I'm not sure that there is currently any facility to handle this format in Biopython. There are at least two variants of the GFF format that I've seen in use... GFF2 is the one I'm working with at the moment, and its specification is here: http://www.sanger.ac.uk/Software/formats/GFF/GFF_Spec.shtml I've come across GFF3 in other contexts, and it is defined here: http://www.sequenceontology.org/gff3.shtml Note that GFF3 is similar to GenBank files in that it may explicitly describe both sequence features, and the sequence itself (potentially for multiple sequences). GFF2 has the potential for this in the specification for the Comments section, which includes a recommended syntax for defining sequences to which the features refer, although that spec makes the reasonable assumption that you would be able to obtain the sequence from elsewhere, knowing the sequence ID from the GFF file. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Wed Feb 18 13:54:51 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 18 Feb 2009 08:54:51 -0500 Subject: [Biopython-dev] [Bug 2762] GFF capability in SeqIO In-Reply-To: Message-ID: <200902181354.n1IDsp8m007943@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2762 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-18 08:54 EST ------- This looks like a nice idea, and possible with some simplifications to match our existing object scheme. For example, the current SeqRecord and SeqFeature classes do not let us explicitly define parent (part-of) relationships between SeqFeature objects (e.g. GFF3 examples where a CDS has a parent mRNA, or an exon may have multiple parent mRNAs). We do have the idea of sub-features, but this only allows a single parent and thus won't work here. This parent information could be recorded as just another SeqFeature qualifier dictionary entry. P.S. It is nice to see there is an online GFF3 validator :) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Feb 18 14:04:28 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 18 Feb 2009 09:04:28 -0500 Subject: [Biopython-dev] [Bug 2762] GFF capability in SeqIO In-Reply-To: Message-ID: <200902181404.n1IE4SB7010286@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2762 ------- Comment #2 from dalloliogm at gmail.com 2009-02-18 09:04 EST ------- These are the class of things for which I think it would be useful to have a common repository of use cases with the other bio.* projects. I have seen people using every possible extension and modification of gff, and usually re-writing a new gff parser for each case. 
If you can, you should ask the maintainers of the other bio.* projects and make your patch as compatible with theirs as possible. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Feb 18 14:34:35 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 18 Feb 2009 09:34:35 -0500 Subject: [Biopython-dev] [Bug 2762] GFF capability in SeqIO In-Reply-To: Message-ID: <200902181434.n1IEYZEX016913@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2762 ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-18 09:34 EST ------- (In reply to comment #2) > If you can, you should ask the maintainers of the other bio.* projects and > make your patch as compatible with theirs as possible. I'm well aware of one very practical issue regarding compatibility between the Bio* projects, which is for BioSQL. Ideally regardless of which Bio* toolkit you use to load a sequence file into a BioSQL database, they should all record the information in the same way. See http://lists.open-bio.org/pipermail/biosql-l/2009-February/001492.html for a discussion of how GFF files should/could be stored in a BioSQL database. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Thu Feb 19 08:49:40 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 19 Feb 2009 03:49:40 -0500 Subject: [Biopython-dev] [Bug 2762] GFF capability in SeqIO In-Reply-To: Message-ID: <200902190849.n1J8neuO016523@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2762 ------- Comment #4 from lpritc at scri.sari.ac.uk 2009-02-19 03:49 EST ------- (In reply to comment #1) > This looks like a nice idea, and possible with some simplifications to match > our existing object scheme. For example, the current SeqRecord and SeqFeature > classes do not let us explicitly define parent (part-of) relationships between > SeqFeature objects (e.g. GFF3 examples where a CDS has a parent mRNA, or an > exon may have multiple parent mRNAs). We do have the idea of sub-features, but > this only allows a single parent and thus won't work here. This parent > information could be recorded as just another SeqFeature qualifier dictionary > entry. I'm not sure that these relationships would need to complicate the SeqFeature class model at all, and agree that the attribute tags indicating Parenthood (in the sense of CDS having parent mRNA, as opposed to the SeqRecord/SeqFeature parent-child relationship) could potentially be treated just as SeqFeature.qualifiers attributes. The possibility of multiple parents (in general, membership of more than one group) in GFF3 lends itself well to the existing list representation of qualifiers. I may be wrong but I think that at least some, if not all, of the relationships you might be worried about (for example, those in your linked post to the BioSQL list) are well-defined within the SOFA ontology. So, for example, a BioSQL database with properly-configured SOFA ontology, and properly-defined relationships, could be used to infer those parent-child relationships on the basis of the corresponding term_ids. 
I don't think that's a behaviour we need to expect from the SeqRecord/SeqFeature class models. Where possible, those relationships could be rebuilt by another function, or package. So long as the SeqFeature object correctly records those descriptions as SOFA terms in the qualifier (or implicitly uses the SOFA ontology when depositing in a database - but that's another enhancement request ;)), I'm not sure that this needs to complicate the SeqFeature class model either. (That said, maybe somewhere down the line there's a role for SQLite in handling that sort of behaviour 'on-the-fly'...) I may have misunderstood, but I think that this is still the same sort of general arrangement that is already the case for GenBank files. When loading, say, a bacterial chromosome, SeqRecord.seq gets the chromosome sequence, and the gene, CDS, and various misc_features for a single gene are imported as - essentially - independent features. We can unite them, after the fact, by the gene name, or locus_tag, or some other attribute, which is essentially the same kind of operation as uniting a CDS with its parent gene via the SOFA ontology and the Parent tag for upload into a SOFA-compliant instance of BioSQL. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
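As a rough illustration of the point made in this bug, that multiple GFF3 parents fit the list representation of qualifiers, here is a minimal sketch. The function name parse_gff3_attributes is hypothetical and this is not a Biopython parser. It only relies on the GFF3 conventions that the attributes column holds tag=value pairs separated by semicolons, that multiple values for one tag are separated by commas, and that values are URL-escaped:

```python
from urllib.parse import unquote


def parse_gff3_attributes(column9):
    """Turn a GFF3 attributes column into a dict mapping tag -> list of values."""
    qualifiers = {}
    for field in column9.strip().split(";"):
        if not field:
            continue  # tolerate a trailing semicolon
        tag, _, value = field.partition("=")
        # Every tag gets a list, so one parent and many parents look the same.
        qualifiers.setdefault(tag, []).extend(unquote(v) for v in value.split(","))
    return qualifiers


attrs = parse_gff3_attributes("ID=exon00001;Parent=mRNA00001,mRNA00002")
print(attrs["Parent"])  # ['mRNA00001', 'mRNA00002']
```

A dictionary of this shape could be stored as feature qualifiers as it stands, leaving any rebuilding of the parent-child relationships to separate code.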
From biopython at maubp.freeserve.co.uk Thu Feb 19 10:25:46 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 19 Feb 2009 10:25:46 +0000 Subject: [Biopython-dev] determining the version In-Reply-To: <320fb6e00810010929y4dab07a5ya25767cc0818654d@mail.gmail.com> References: <320fb6e00809241412r54c2a3a1mc69f3e573f1eaac7@mail.gmail.com> <63700.34226.qm@web62405.mail.re1.yahoo.com> <320fb6e00809250222h3d0d15bw763446b5f0ec44d1@mail.gmail.com> <320fb6e00810010929y4dab07a5ya25767cc0818654d@mail.gmail.com> Message-ID: <320fb6e00902190225o34092311saddf02ec39f1e1dd@mail.gmail.com> On Wed, Oct 1, 2008 at 4:29 PM, Peter wrote: > Peter wrote: >> From a quick look at approach taken in the matplotlib >> code, we could add something like this to setup.py >> >> __version__ = "Undefined" >> for line in open('Bio/__init__.py'): >> if (line.startswith('__version__')): >> exec(line.strip()) >> >> setup( >> name='biopython', >> version=__version__, >> author='The Biopython Consortium', >> ... >> >> I'm happy to deal with this if we are agreed that we >> should add a __version__ to Bio/__init__.py >> (variations on the naming are possible, but this seems >> to be a de-facto standard in python libraries). > > Any objections to making this change now? > > Peter > Since this thread last year, there have been no objections. Following a recent question on the main mailing list about how to determine the version of Biopython this seems worth doing before the next release. Again, any objections or comments on the implementation details? Otherwise I'll make this change shortly.
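For comparison with the exec based snippet quoted above, here is a sketch of the same idea that reads the version string with a regular expression instead of executing the line. This is only one possible variation, not the proposed change itself. The helper name read_version, the temporary file, and the version value "1.49+" are all made up for the demonstration:

```python
import os
import re
import tempfile


def read_version(path, default="Undefined"):
    """Return the string assigned to __version__ in the file at path."""
    pattern = re.compile(r"""^__version__\s*=\s*['"]([^'"]+)['"]""")
    with open(path) as handle:
        for line in handle:
            match = pattern.match(line)
            if match:
                return match.group(1)
    return default


# Demonstration on a throwaway file standing in for Bio/__init__.py.
with tempfile.TemporaryDirectory() as tmp:
    init_py = os.path.join(tmp, "__init__.py")
    with open(init_py, "w") as handle:
        handle.write('"""Placeholder package."""\n__version__ = "1.49+"\n')
    version = read_version(init_py)

print(version)  # 1.49+
```

In setup.py the result would then be passed as version=read_version('Bio/__init__.py'), so the number only has to be maintained in one place.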
Peter From bsouthey at gmail.com Thu Feb 19 14:44:54 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Thu, 19 Feb 2009 08:44:54 -0600 Subject: [Biopython-dev] determining the version In-Reply-To: <320fb6e00902190225o34092311saddf02ec39f1e1dd@mail.gmail.com> References: <320fb6e00809241412r54c2a3a1mc69f3e573f1eaac7@mail.gmail.com> <63700.34226.qm@web62405.mail.re1.yahoo.com> <320fb6e00809250222h3d0d15bw763446b5f0ec44d1@mail.gmail.com> <320fb6e00810010929y4dab07a5ya25767cc0818654d@mail.gmail.com> <320fb6e00902190225o34092311saddf02ec39f1e1dd@mail.gmail.com> Message-ID: <499D7066.6090309@gmail.com> Peter wrote: > On Wed, Oct 1, 2008 at 4:29 PM, Peter wrote: > >> Peter wrote: >> >>> From a quick look at approach taken in the matplotlib >>> code, we could add something like this to setup.py >>> >>> __version__ = "Undefined" >>> for line in open('Bio/__init__.py'): >>> if (line.startswith('__version__')): >>> exec(line.strip()) >>> >>> setup( >>> name='biopython', >>> version=__version__, >>> author='The Biopython Consortium', >>> ... >>> >>> I'm happy to deal with this if we are agreed that we >>> should add a __version__ to Bio/__init__.py >>> (variations on the naming are possible, but this seems >>> to be a de-facto standard in python libraries). >>> >> Any objections to making this change now? >> >> Peter >> >> > > Since this thread last year, there have been no objections. Following > a recent question on the main mailing list about how to determine the > version of Biopython this seems worth doing before the next release. > Again, an objections or comments on the implementation details? > Otherwise I'll make this change shortly. > > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > Hi, Yes, version information must be included! 
I like numpy's approach because the version will display a svn-related number when using a development version. It is rather clever because the real magic occurs with distutils to find the actual svn version (see _get_svn_revision function in the distutils/misc_util.py file). But I do not know if the same tricks would apply to cvs. So the one thing I would ask for is that the __version__ gets changed immediately after a release so it is clear if you are using an official release or a cvs version. I know that will be a little extra burden on the release maintainer. Bruce From biopython at maubp.freeserve.co.uk Thu Feb 19 15:07:30 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 19 Feb 2009 15:07:30 +0000 Subject: [Biopython-dev] determining the version In-Reply-To: <499D7066.6090309@gmail.com> References: <320fb6e00809241412r54c2a3a1mc69f3e573f1eaac7@mail.gmail.com> <63700.34226.qm@web62405.mail.re1.yahoo.com> <320fb6e00809250222h3d0d15bw763446b5f0ec44d1@mail.gmail.com> <320fb6e00810010929y4dab07a5ya25767cc0818654d@mail.gmail.com> <320fb6e00902190225o34092311saddf02ec39f1e1dd@mail.gmail.com> <499D7066.6090309@gmail.com> Message-ID: <320fb6e00902190707q57677756o71249ad8e12298d0@mail.gmail.com> On Thu, Feb 19, 2009 at 2:44 PM, Bruce Southey wrote: > > Hi, > Yes, version information must be included! > > I like numpy's approach because the version will display a svn-related > number when using a development version. It is rather clever because the > real magic occurs with distutils to find the actual svn version (see > _get_svn_revision function in the distutils/misc_util.py file). > > But I do not know if the same tricks would apply to cvs. So the one thing I > would ask for is that the __version__ gets changed immediately after a > release so it is clear if you are using an official release or a cvs > version. I know that will be a little extra burden on the release > maintainer.
Given we probably will be moving from CVS to SVN shortly, there doesn't seem to be much point in setting up any "magic" at this point in time. We already manually update the version number in setup.py as part of the build process, so with this change we'll just have to update Bio/__init__.py instead. Peter From bugzilla-daemon at portal.open-bio.org Thu Feb 19 17:37:56 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 19 Feb 2009 12:37:56 -0500 Subject: [Biopython-dev] [Bug 2767] New: Bio.SeqIO support for FASTQ and QUAL files Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2767 Summary: Bio.SeqIO support for FASTQ and QUAL files Product: Biopython Version: Not Applicable Platform: All OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk This is an enhancement bug for adding support to Bio.SeqIO for two commonly used file formats for storing sequencing quality information (i.e. error rates). The format FASTQ (or FastQ) contains both sequences and PHRED style quality scores. This file format appears to have been introduced at the Sanger Centre, but there is no official specification that I am aware of. I would suggest for Bio.SeqIO we call this format "fastq" (as in BioPerl). See: http://maq.sourceforge.net/fastq.shtml http://www.bioperl.org/wiki/FASTQ_sequence_format Also note that Solexa/Illumina sequencers can produce FASTQ-like files which use a different score mapping and therefore cannot be treated in the same way. These would have to be treated as a different file format (e.g. Bio.SeqIO format name "fastq-solexa" might do). QUAL or qual files do not contain sequences but just the PHRED style quality scores. Roche 454 sequencers also appear to use this style file (see also Bug 2382), where again I believe that PHRED style scores are used.
Because they don't hold the actual sequence, Qual files normally come with a matching FASTA file containing the sequence for each entry (in the same order within the file). I would suggest we call this the "qual" format in Bio.SeqIO (to match BioPerl). See: http://www.bioperl.org/wiki/Qual_sequence_format http://www.cees.uio.no/research/facilities/roche454/resultsfiles.html I will attach a preliminary set of code to support this shortly. For the "qual" format Bio.SeqIO would return SeqRecord objects without any sequence (perhaps as None, although we do know the sequence length...). For both the "qual" and "fastq" formats the SeqRecord object would need to store the PHRED quality scores, ideally as a list of integers. Where we put this information is open to debate. The simple option is to just add the list of integers to the annotation dictionary, perhaps under key name "phred_quality" (with "solexa_quality" used when parsing a Solexa/Illumina style FASTQ file). This will then work with BioSQL (although the qualities will get stored in the database as strings rather than integers). However, this does not facilitate slicing a SeqRecord (i.e. it would make implementing enhancement Bug 2507 much harder). In order to use a paired "fasta" and "qual" file you might do this:

    def merge_fasta_qual(f_rec, q_rec) :
        """Modifies the FASTA record (f_rec) in place, and also returns it."""
        # The two files must list the same entries in the same order.
        assert f_rec.id == q_rec.id
        assert len(f_rec) == len(q_rec.annotations["phred_quality"])
        f_rec.annotations["phred_quality"] = q_rec.annotations["phred_quality"]
        return f_rec

    from Bio import SeqIO
    records = [merge_fasta_qual(f_rec, q_rec) for (f_rec, q_rec) in \
               zip(SeqIO.parse(open("example.fasta"), "fasta"),
                   SeqIO.parse(open("example.qual"), "qual"))]

I think it would probably make sense to offer this kind of functionality in the Bio.SeqIO.QualityIO module itself, as this code above has several drawbacks (e.g. the zip makes a list in memory, rather than a generator).
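The drawback noted above, zip building a whole list in memory, can be avoided with a generator. The following is only a sketch under stated assumptions: paired_fasta_qual is a hypothetical name, not a function in Bio.SeqIO, and the records are simple stand-in objects so that the example runs without any sequence files. It is written for Python 3, where zip is lazy; under Python 2 the equivalent would be itertools.izip.

```python
from types import SimpleNamespace


def paired_fasta_qual(fasta_records, qual_records):
    """Yield FASTA records with the quality list copied from the matching QUAL record."""
    # zip pulls one record from each iterator at a time (Python 3).
    for f_rec, q_rec in zip(fasta_records, qual_records):
        if f_rec.id != q_rec.id:
            raise ValueError("Mismatched ids: %s vs %s" % (f_rec.id, q_rec.id))
        quality = q_rec.annotations["phred_quality"]
        if len(f_rec.seq) != len(quality):
            raise ValueError("Sequence and quality lengths differ for %s" % f_rec.id)
        f_rec.annotations["phred_quality"] = quality
        yield f_rec


# Stand-in records; real code would pass two record iterators instead.
fasta = iter([SimpleNamespace(id="read1", seq="ACGT", annotations={})])
qual = iter([SimpleNamespace(id="read1", seq=None,
                             annotations={"phred_quality": [26, 26, 22, 18]})])

merged = list(paired_fasta_qual(fasta, qual))
print(merged[0].annotations["phred_quality"])  # [26, 26, 22, 18]
```

Only one pair of records is held at a time, so the same loop would work for very large paired files.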
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Feb 19 17:42:08 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 19 Feb 2009 12:42:08 -0500 Subject: [Biopython-dev] [Bug 2767] Bio.SeqIO support for FASTQ and QUAL files In-Reply-To: Message-ID: <200902191742.n1JHg8H2017714@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2767 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-19 12:42 EST ------- Created an attachment (id=1244) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1244&action=view) Read/write support for FASTQ and QUAL files, using the annotation dict This patch stores the PHRED qualities as a list of integers in the SeqRecord's annotations dictionary. Changing this to use say a property, or a separate per-letter-annotation dictionary should be trivial. For QUAL files, the SeqRecord's seq is set to None. This requires a few changes to test_SeqIO.py which does not expect this. We could also consider introducing an UnknownSeq object (giving it a character like "?", "N", or "X", an alphabet, and a length). This would have a __str__ output like "?"*length. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Thu Feb 19 17:44:40 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 19 Feb 2009 12:44:40 -0500 Subject: [Biopython-dev] [Bug 2767] Bio.SeqIO support for FASTQ and QUAL files In-Reply-To: Message-ID: <200902191744.n1JHiepx018401@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2767 ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-19 12:44 EST ------- Created an attachment (id=1245) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1245&action=view) Patch for Tests/test_SeqIO.py and Bio/SeqIO/__init__.py This patches Bio/SeqIO/__init__.py to define "fastq" and "qual" as input and output file formats. It also patches Tests/test_SeqIO.py to include a couple of FASTQ and QUAL files, and cope with None as a SeqRecord's seq. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Feb 19 17:47:52 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 19 Feb 2009 12:47:52 -0500 Subject: [Biopython-dev] [Bug 2767] Bio.SeqIO support for FASTQ and QUAL files In-Reply-To: Message-ID: <200902191747.n1JHlqFF019314@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2767 ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-19 12:47 EST ------- Created an attachment (id=1246) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1246&action=view) ZIP file of plain text FASTQ and QUAL files for the unit tests These are used in the proposed Bio.SeqIO.QualityIO doctests (attachment 1244), and in the modified test_SeqIO.py file (see patch in attachment 1245). 
These example files should go in a new folder, Tests/Quality -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Feb 19 17:54:16 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 19 Feb 2009 12:54:16 -0500 Subject: [Biopython-dev] [Bug 2382] Generic Roche or GSFlex "FASTA" parser In-Reply-To: Message-ID: <200902191754.n1JHsGd3021099@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2382 ------- Comment #14 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-19 12:54 EST ------- Enhancement Bug 2767 would add support to Bio.SeqIO for the FASTA like QUAL file format used by both Sanger and Roche. See: http://www.bioperl.org/wiki/Qual_sequence_format http://www.cees.uio.no/research/facilities/roche454/resultsfiles.html This would not solve Jared's original request for a generic FASTA like parsing framework - although it would solve the particular example of dealing with a pair of FASTA and QUAL files. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Thu Feb 19 18:09:13 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 19 Feb 2009 13:09:13 -0500 Subject: [Biopython-dev] [Bug 2767] Bio.SeqIO support for FASTQ and QUAL files In-Reply-To: Message-ID: <200902191809.n1JI9Doq024787@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2767 ------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-19 13:09 EST ------- (In reply to comment #0) > In order to use a paired "fasta" and "qual" file you might do this: > > def merge_fasta_qual(fasta_record, qual_record) : > """Modifies the fasta_record in place, and also returns it.""" > assert fasta_record.id == qual_record.id > assert len(f_rec) == len(q_rec.annotations["phred_quality"]) > f_rec.annotations["phred_quality"] = q_rec.annotations["phred_quality"] > return f_rec > > from Bio import SeqIO > records = [merge_fasta_qual(f_rec, q_rec) for (f_rec, q_rec) in \ > zip(SeqIO.parse(open("example.fasta"), "fasta"), > SeqIO.parse(open("example.qual"), "qual"))] > > I think it would probably make sense to offer this kind of functionality in > the Bio.SeqIO.QualityIO module itself, as this code above has several draw > backs (e.g. the zip makes a list in memory, rather than a generator). Alternatively, if you have enough RAM to hold all the records in memory at once, then a simple dictionary approach using just Bio.SeqIO methods would also work. This was inspired by Jared's related example at the end of Bug 2382 comment 0. >>> from Bio import SeqIO >>> reads = SeqIO.to_dict(SeqIO.parse(open("Quality/example.fasta"), "fasta")) >>> for record in SeqIO.parse(open("Quality/example.qual"), "qual") : ... reads[record.id].annotations["phred_quality"] = record.annotations["phred_quality"] You can then access any record by its key, and get both the sequence and the quality scores. 
>>> print reads["EAS54_6_R1_2_1_540_792"].format("fastq") @EAS54_6_R1_2_1_540_792 TTGGCAGGCCAAGGCCGATGGATCA + ;;;;;;;;;;;7;;;;;-;;;3;83 This is neat, but given QUAL files are often very very large, wanting to use an iterator may be more typical. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Feb 19 18:19:49 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 19 Feb 2009 13:19:49 -0500 Subject: [Biopython-dev] [Bug 2382] Generic Roche or GSFlex "FASTA" parser In-Reply-To: Message-ID: <200902191819.n1JIJndd026934@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2382 ------- Comment #15 from jflatow at northwestern.edu 2009-02-19 13:19 EST ------- Created an attachment (id=1247) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1247&action=view) command line tool/library for sifting through and transforming complex (multiline) records I should have posted this earlier. Basically this is the strategy I have been taking for dealing with complex record types (multiline), so that you can filter them the way you would with sed/awk or other stream editing tools. This is far from perfect, and it may not be helpful, but perhaps it will give a better idea of what I was picturing. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Thu Feb 19 18:48:46 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 19 Feb 2009 13:48:46 -0500 Subject: [Biopython-dev] [Bug 2382] Generic Roche or GSFlex "FASTA" parser In-Reply-To: Message-ID: <200902191848.n1JImkmY030625@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2382 ------- Comment #16 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-19 13:48 EST ------- (In reply to comment #15) > Created an attachment (id=1247) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1247&action=view) [details] > command line tool/library for sifting through and transforming complex > (multiline) records > > I should have posted this earlier. Basically this is the strategy I have been > taking for dealing with complex record types (multiline), so that you can > filter them the way you would with sed/awk or other stream editing tools. This > is far from perfect, and it may not be helpful, but perhaps it will give a > better idea of what I was picturing. I just had a quick look, and in some ways it reminds me of the deprecated Martel/Mindy parsing infrastructure Biopython used to use. This was very flexible, perhaps too flexible as it had quite a learning curve - plus it didn't scale well with large records. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Fri Feb 20 08:34:42 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 20 Feb 2009 03:34:42 -0500 Subject: [Biopython-dev] [Bug 2767] Bio.SeqIO support for FASTQ and QUAL files In-Reply-To: Message-ID: <200902200834.n1K8YgK2002690@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2767 ------- Comment #5 from jblanca at btc.upv.es 2009-02-20 03:34 EST ------- Regarding where to store the quality information in the SeqRecord I'm in favor of using a property named .qual or .quality. That is consistent with the actual .seq property. I think this approach is cleaner, and it is very easy to implement and to understand. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Fri Feb 20 11:15:57 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 20 Feb 2009 11:15:57 +0000 Subject: [Biopython-dev] Quality scores (and per-letter-annotation) in a SeqRecord? Message-ID: <320fb6e00902200315q19c4dfebr8502a052a1a4fc9b@mail.gmail.com> Over on enhancement Bug 2767, I have uploaded parsers and writers for the FASTQ and QUAL file formats, which both hold PHRED style quality scores (integers ranging from 0 to about 90). See http://bugzilla.open-bio.org/show_bug.cgi?id=2767 One open question in this enhancement is how to store these PHRED quality scores in the SeqRecord. Keep in mind that there is more than one type of quality score in use, for example Solexa/Illumina use a different scaling (although it is possible to map between them without too much trouble for the mid range scores), something I hadn't noticed when we last talked about this (Sept 2008).
See: http://lists.open-bio.org/pipermail/biopython-dev/2008-September/004250.html For the initial code on Bug 2767, I took the simple and extensible route of recording the PHRED qualities as a list of integers in the SeqRecord's annotation dictionary under the key "phred_quality". There are a couple of drawbacks. Firstly, sequencing qualities are a good example of per-letter-annotation (others include secondary structure, atomic coordinates - which would apply to proteins as well as nucleotides). If we want to be able to slice a SeqRecord (Bug 2507) then it is important to distinguish between general annotation (like the source species) and per-letter-annotation (which should also be sliced). One way of dealing with this is to introduce a per-letter-annotation dictionary for the SeqRecord, whose entries would be strings/lists with a length equal to that of the sequence. Secondly, putting the PHRED qualities inside an annotations dictionary (or even a per-letter-annotation dictionary) doesn't make them very accessible. If you are wanting to work with sequencing reads, then the sequence, quality and identifier are all key properties. In bug 2767 comment #5 Jose wrote: > Regarding where to store the quality information in the SeqRecord I'm in > favor of using a property named .qual or .quality. That is consistent with > the actual .seq property. I think this approach is cleaner, and it is very > easy to implement and to understand. I can certainly appreciate that a top level property is easier to use - and perhaps quality scores are important enough to justify this. However, what about PHRED qualities versus Solexa/Illumina qualities, or another sequencing system's scheme? I hadn't thought about this incompatibility when we were discussing this on the mailing list last year (Sept 2008). I suppose you could consider adding a .phred_quality property which is explicit, but then you'd end up with many different properties.
Then there are other per-letter quality annotations - you might want the A, C, G and T intensity from capillary sequencing (four sets of numbers, not just one). Plus of course this doesn't address non-quality related per-letter-annotations (like secondary structure, or atomic coordinates). My point is that we can't give top level properties to everything, which is why the annotations dictionary was introduced in the first place. Only a handful of really important things got their own properties (id, name, description and the sequence itself). If there was only ONE key quality score, then I wouldn't mind making an exception so much - but that doesn't seem to be the case. Peter From bugzilla-daemon at portal.open-bio.org Fri Feb 20 11:17:44 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 20 Feb 2009 06:17:44 -0500 Subject: [Biopython-dev] [Bug 2767] Bio.SeqIO support for FASTQ and QUAL files In-Reply-To: Message-ID: <200902201117.n1KBHiLG024083@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2767 ------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-20 06:17 EST ------- In comment #5 Jose wrote: > Regarding where to store the quality information in the SeqRecord I'm in > favor of using a property named .qual or .quality. That is consistent with > the actual .seq property. I think this approach is cleaner, and it is very > easy to implement and to understand. I can appreciate that a top level property is easier to use - and perhaps quality scores are important enough to justify this. However, what about PHRED qualities versus Solexa/Illumina qualities, or another sequencing system's scheme?
I've replied in more depth on the mailing list, where I suggest we discuss this: http://lists.open-bio.org/pipermail/biopython-dev/2009-February/005340.html -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From jblanca at btc.upv.es Fri Feb 20 11:49:36 2009 From: jblanca at btc.upv.es (Jose Blanca) Date: Fri, 20 Feb 2009 12:49:36 +0100 Subject: [Biopython-dev] Quality scores (and per-letter-annotation) in a SeqRecord? In-Reply-To: <320fb6e00902200315q19c4dfebr8502a052a1a4fc9b@mail.gmail.com> References: <320fb6e00902200315q19c4dfebr8502a052a1a4fc9b@mail.gmail.com> Message-ID: <200902201249.36743.jblanca@btc.upv.es> > I suppose you could consider adding a .phred_quality > property which is explicit, but then you'd end up with many different > properties. Then there are other per-letter quality annotations - you > might want the A, C, G and T intensity from capillary sequencing (four > sets of numbers, not just one). Plus of course this doesn't address > non-quality related per-letter-annotations (like secondary structure, > or atomic coordinates). > > My point is that if we can't give top level properties to everything, > hence the original introduction of the annotations dictionary in the > first place. Only a handful of really important things got their own > properties (id, name, description and the sequence itself). If there > was only ONE key quality score, then I wouldn't mind making an > exception so much - but that doesn't seem to be the case. That's a very good point. It wouldn't be wise to populate the SeqRecord class with a lot of properties. Another possible approach would be to create a derived class for that, a SeqWithQuality. It would be like a SeqRecord but with a .quality property. For other cases other classes could be derived from SeqRecord.
The problem with putting the qualities in a dict with all the other per-base annotations is that it has a different behaviour than the .seq case. The seq case is special because it is much more used, so maybe that's fair enough. I don't know, maybe it is wiser to set all the per-base annotations in a dict and leave the sequence outside. In that way we won't be creating a lot of new classes derived from SeqRecord. The more I think about the dict possibility, the more I like it. -- Jose M. Blanca Postigo Instituto Universitario de Conservacion y Mejora de la Agrodiversidad Valenciana (COMAV) Universidad Politecnica de Valencia (UPV) Edificio CPI (Ciudad Politecnica de la Innovacion), 8E 46022 Valencia (SPAIN) Tlf.:+34-96-3877000 (ext 88473) From lpritc at scri.ac.uk Fri Feb 20 14:15:50 2009 From: lpritc at scri.ac.uk (Leighton Pritchard) Date: Fri, 20 Feb 2009 14:15:50 +0000 Subject: [Biopython-dev] Quality scores (and per-letter-annotation) in a SeqRecord? In-Reply-To: <200902201249.36743.jblanca@btc.upv.es> Message-ID: Another 2p... I collect them, you know... An additional determinant of how these values are best stored is: "What will they be used for?". If the only use they would ever find was to accompany a sequence so that its file format could be converted from one with embedded qualities to a format that required two such files (or vice-versa), then straightforward storage as a string in a dictionary is all that's needed. This would be sufficient for conversion between some quality scores, as a utility function could just grab the stored string (given an appropriate name for each quality format). The question of how these per-symbol annotations would be modified when returning a Seq slice or join may be an issue. If 'live' access to the values is required for calculation or alignment purposes, then a different interface might be more useful, permitting slicing, base selection on the basis of quality, or other operations.
This use case is more complex, as the return value is likely to be dependent on the quality format (single- or multiple-value per base). Conceptually, I see quality scores as annotations of a sequence, rather than an intrinsic property of the sequence, so am happy for them to live in the same place other annotations do. I also see them as only one instance of a class of per-symbol annotations (along with hydrophobicity scores, secondary structure predictions, read map counts and several other measures). I think, therefore, that there is a case for a class describing per-symbol annotations to a Seq, and placing these in a dictionary of per-symbol annotations. Slices of the parent Seq could then be propagated downwards to all members of that dictionary (which would also be expected to implement the same string-like methods as the parent). The per-symbol annotation objects could be subclassed and/or contain a descriptive string from a controlled vocabulary to indicate their format, for standard interfacing with external packages (e.g. Drawing TOPS diagrams from secondary structure predictions or rendering base quality profiles), which I think would be a flexible approach. On 20/02/2009 11:49, "Jose Blanca" wrote: >> I suppose you could consider adding a .phred_quality >> property which is explicit, but then you'd end up with many different >> properties. Then there are other per-letter quality annotations - you >> might want the A, C, G and T intensity from capillary sequencing (four >> sets of numbers, not just one). Plus of course this doesn't address >> non-quality related per-letter-annotations (like secondary structure, >> or atomic coordinates). >> >> My point is that if we can't give top level properties to everything, >> hence the original introduction of the annotations dictionary in the >> first place. Only a handful of really important things got their own >> properties (id, name, description and the sequence itself). 
If there >> was only ONE key quality score, then I wouldn't mind making an >> exception so much - but that doesn't seem to be the case. > That's a very good point. It wouldn't be wise to populate the SeqRecord class > with a lot of properties. > Another posible approach would be to create a derived class for that a > SeqWithQuality. It would be like a SeqRecord but with a .quality property. > For other cases other classes could be derived from SeqRecord. > The problem with putting the quatilies in a dict with all the other per base > annotation is that it has a different behaviour than the .seq case. The seq > case is special because is much more used, so maybe that's fair enough. > I don't know, maybe it is wiser to set all the per case annotations in a dict > a let the sequence outside. In that way we won't be creating a lot of new > classes derived from SeqRecord. > The more I think about the dict possibility, the more I like it. -- Dr Leighton Pritchard MRSC D131, Plant Pathology Programme, SCRI Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA e:lpritc at scri.ac.uk w:http://www.scri.ac.uk/staff/leightonpritchard gpg/pgp: 0xFEFC205C tel:+44(0)1382 562731 x2405 ______________________________________________________________________ SCRI, Invergowrie, Dundee, DD2 5DA. The Scottish Crop Research Institute is a charitable company limited by guarantee. Registered in Scotland No: SC 29367. Recognised by the Inland Revenue as a Scottish Charity No: SC 006662. DISCLAIMER: This email is from the Scottish Crop Research Institute, but the views expressed by the sender are not necessarily the views of SCRI and its subsidiaries. This email and any files transmitted with it are confidential to the intended recipient at the e-mail address to which it has been addressed. It may not be disclosed or used by any other than that addressee. 
If you are not the intended recipient you are requested to preserve this confidentiality and you must not use, disclose, copy, print or rely on this e-mail in any way. Please notify postmaster at scri.ac.uk quoting the name of the sender and delete the email from your system. Although SCRI has taken reasonable precautions to ensure no viruses are present in this email, neither the Institute nor the sender accepts any responsibility for any viruses, and it is your responsibility to scan the email and the attachments (if any). ______________________________________________________________________ From bugzilla-daemon at portal.open-bio.org Fri Feb 20 16:01:06 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 20 Feb 2009 11:01:06 -0500 Subject: [Biopython-dev] [Bug 2768] New: Bio.Entrez under a proxy Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2768 Summary: Bio.Entrez under a proxy Product: Biopython Version: 1.49b Platform: PC OS/Version: Linux Status: NEW Severity: normal Priority: P2 Component: Documentation AssignedTo: biopython-dev at biopython.org ReportedBy: dalloliogm at gmail.com I think you should add, in biopython's tutorial, a short explanation on how to setup a proxy for modules like Bio.Entrez. I have tried a simple query with entrez, but the first time I have received this error: $: ipython >>> from Bio import Entrez >>> handle = Entrez.einfo() IOError Traceback (most recent call last) ... [Errno url error] invalid proxy for http: 'proxy.upf.es:8080' I am using the latest biopython cvs, updated yesterday. 
On my system, the proxy variables were set like this: $http_proxy = 'proxy.upf.es:8080' $HTTP_PROXY = 'proxy.upf.es:8080' After a few tries, it seems that the module uses the HTTP_PROXY variable and that it expects it to contain 'http://' $: export HTTP_PROXY=http://proxy.upf.es:8080 $: ipython >>> from Bio import Entrez >>> Entrez.einfo() -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Feb 20 16:15:24 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 20 Feb 2009 11:15:24 -0500 Subject: [Biopython-dev] [Bug 2769] New: Entrez results: seek methods doesn't work? Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2769 Summary: Entrez results: seek methods doesn't work? Product: Biopython Version: 1.49b Platform: PC OS/Version: Linux Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: dalloliogm at gmail.com Many methods in Entrez return a file-like object which has methods like .read, .readlines, etc.. However I report this error in the .seek method: >>> from Bio import Entrez >>> result = Entrez.einfo() >>> print result.read() ... >>> print result.read() >>> print handle.seek(0) --------------------------------------------------------------------------- AttributeError Traceback (most recent call last) /home/gioby/ in () /home/gioby/usr/share/biopython/Bio/File.pyc in seek(self, *args) 89 def seek(self, *args): 90 self._saved = [] ---> 91 self._handle.seek(*args) 92 93 def __getattr__(self, attr): AttributeError: addinfourl instance has no attribute 'seek' p.s. system info: I am running the latest biopython cvs. 
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Feb 20 16:21:36 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 20 Feb 2009 11:21:36 -0500 Subject: [Biopython-dev] [Bug 2770] New: suggestion: raise a warning if Entrez.email is not set Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2770 Summary: suggestion: raise a warning if Entrez.email is not set Product: Biopython Version: 1.49b Platform: PC OS/Version: Linux Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: dalloliogm at gmail.com This is a just proposal... In the biopython tutorial, you suggest users to always set Entrez.email before using any Entrez util: - http://www.biopython.org/DIST/docs/tutorial/Tutorial.html#htoc65 You could raise a warning if Entrez.email is not set and if any util is used. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Feb 20 16:43:20 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 20 Feb 2009 11:43:20 -0500 Subject: [Biopython-dev] [Bug 2771] New: Entrez.efetch: dbSNP not supported yet? Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2771 Summary: Entrez.efetch: dbSNP not supported yet? 
Product: Biopython Version: 1.49b Platform: PC OS/Version: Linux Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: dalloliogm at gmail.com Executing efetch on the 'snp' database returns an html file instead of an xml (by default, running efetch on 'gene' or another database returns an xml). >>> handle = Entrez.efetch(db='snp', id='9996597',) >>> cont = handle.read() >>> print cont ... Moreover, even when forcing retmode=xml, it seems that the xml file returned is written in an xml not supported by biopython (not sure if this a ncbi's problem): >>> handle = Entrez.efetch(db='snp', id='9996597', retmode='xml') >>> cont = handle.read() >>> print cont ' ... You can see the problem better if you open the result handle as explained in the tutorial, via Entrez.read: >>> handle = Entrez.efetch(db='snp', id='9996597', retmode='xml') >>> result = Entrez.read(handle) --------------------------------------------------------------------------- UnboundLocalError Traceback (most recent call last) /home/gioby/Test/NCBI_wsdl/ in () /home/gioby/usr/share/biopython/Bio/Entrez/__init__.pyc in read(handle) 284 DTDs = os.path.join(__path__[0], "DTDs") 285 handler = DataHandler(DTDs) --> 286 record = handler.run(handle) 287 return record 288 /home/gioby/usr/share/biopython/Bio/Entrez/Parser.py in run(self, handle) 93 self.parser.CharacterDataHandler = self.characters 94 self.parser.ExternalEntityRefHandler = self.external_entity_ref_handler ---> 95 self.parser.ParseFile(handle) 96 self.parser = None 97 return self.object /home/gioby/usr/share/biopython/Bio/Entrez/Parser.py in startElement(self, name, attrs) 129 self.attributes = attrs 130 return --> 131 if object!="": 132 object.tag = name 133 if attrs: UnboundLocalError: local variable 'object' referenced before assignment Try this code also, it will return a different error: >>> handle = Entrez.efetch(db='snp', id='9996597') # retmode is HTML >>> result = 
Entrez.read(handle) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Feb 20 16:50:09 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 20 Feb 2009 11:50:09 -0500 Subject: [Biopython-dev] [Bug 2771] Entrez.efetch: dbSNP not supported yet? In-Reply-To: Message-ID: <200902201650.n1KGo9e5026394@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2771 ------- Comment #1 from dalloliogm at gmail.com 2009-02-20 11:50 EST ------- (In reply to comment #0) > Executing efetch on the 'snp' database returns an html file instead of an xml > (by default, running efetch on 'gene' or another database returns an xml). > > >>> handle = Entrez.efetch(db='snp', id='9996597',) > >>> cont = handle.read() > >>> print cont > > ... > Sorry, this part is not correct. I am opening another bug report (#2772 ?) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Feb 20 17:16:49 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 20 Feb 2009 12:16:49 -0500 Subject: [Biopython-dev] [Bug 2772] New: Entrez.efetch: the default value of 'retmode' depends on the database Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2772 Summary: Entrez.efetch: the default value of 'retmode' depends on the database Product: Biopython Version: 1.49b Platform: PC OS/Version: Linux Status: NEW Severity: normal Priority: P5 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: dalloliogm at gmail.com minor issue: Entrez.efetch and Entrez.esummary have different 'retmode' default values. 
This sometimes is confusing for the users. >>> Entrez.esummary(db='snp', id=1).readline() >>> Entrez.efetch(db='snp', id=1).readline() -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Feb 20 17:37:08 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 20 Feb 2009 12:37:08 -0500 Subject: [Biopython-dev] [Bug 2768] Bio.Entrez under a proxy In-Reply-To: Message-ID: <200902201737.n1KHb8hp008109@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2768 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-20 12:37 EST ------- This does seem like a good idea for the documentation - after all, you are not the first person to ask. See: http://lists.open-bio.org/pipermail/biopython/2008-November/004756.html http://www.python.org/doc/2.5.2/lib/module-urllib.html -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
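A possible tutorial snippet for the proxy issue in Bug 2768 could look something like the sketch below. It only illustrates the fix from the report (the proxy address needs an explicit http:// prefix before urllib will accept it). The helper names with_http_scheme and set_http_proxy are hypothetical and not part of Biopython, and the host is simply the one quoted in the bug report.

```python
def with_http_scheme(proxy):
    """Return the proxy address with an explicit scheme (hypothetical helper)."""
    if "://" in proxy:
        return proxy  # already has a scheme such as http://
    return "http://" + proxy


def set_http_proxy(proxy, environ):
    """Store the fixed address under both spellings of the variable name."""
    fixed = with_http_scheme(proxy)
    environ["http_proxy"] = fixed
    environ["HTTP_PROXY"] = fixed
    return fixed


# A plain dict is used here; pass os.environ instead to change the
# environment of the running process before calling Bio.Entrez.
env = {}
set_http_proxy("proxy.upf.es:8080", env)
print(env["HTTP_PROXY"])
```

Passing the mapping in as an argument keeps the example free of side effects; in real use the same two keys would be set in os.environ (or exported in the shell, as in the report) before the first Entrez call.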
From bugzilla-daemon at portal.open-bio.org Fri Feb 20 17:42:29 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 20 Feb 2009 12:42:29 -0500 Subject: [Biopython-dev] [Bug 2770] suggestion: raise a warning if Entrez.email is not set In-Reply-To: Message-ID: <200902201742.n1KHgTOo009851@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2770 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-20 12:42 EST ------- We did actually have a warning in the code in CVS, but had concluded it was perhaps a bit too much - see mailing list discussions and revision 1.37 of Bio/Entrez/__init__.py, viewable here: http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Entrez/__init__.py?cvsroot=biopython The NCBI guidelines are relatively relaxed about this: http://www.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html#UserSystemRequirements Let's leave this open in case anyone else wants to comment, but unless the NCBI change their guidelines I am inclined to leave Bio.Entrez as it is. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Fri Feb 20 17:48:00 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 20 Feb 2009 12:48:00 -0500 Subject: [Biopython-dev] [Bug 2772] Entrez.efetch: the default value of 'retmode' depends on the database In-Reply-To: Message-ID: <200902201748.n1KHm0f4011522@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2772 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |INVALID ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-20 12:48 EST ------- This is not a bug in Biopython - for Bio.Entrez we leave the defaults to the NCBI, who can and may change them at any time. If you want XML, you should explicitly ask for it. Explicit is better than implicit. http://www.python.org/dev/peps/pep-0020/ [For comparison, our qblast wrapper is perhaps more confusing as it has its own defaults set within Biopython, and since it was first written the NCBI have changed some of their default parameters.] -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
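To illustrate the "explicit is better than implicit" advice in the reply to Bug 2772, here is a rough sketch of a request builder that refuses to rely on the NCBI defaults by making retmode a required argument. This is not how Bio.Entrez is implemented; the function efetch_url is hypothetical, and the endpoint URL is an assumption that should be checked against the current NCBI documentation.

```python
from urllib.parse import urlencode

# Assumed E-utilities endpoint; verify against the NCBI documentation.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(db, uid, retmode):
    """Build an efetch URL; retmode has no default, so it must be stated."""
    query = urlencode({"db": db, "id": uid, "retmode": retmode})
    return EFETCH + "?" + query


# Database and identifier taken from the bug reports in this thread.
print(efetch_url("snp", "9996597", "xml"))
```

With Bio.Entrez itself the equivalent is simply to pass retmode='xml' to Entrez.efetch, as done in the examples from Bug 2771.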
From bugzilla-daemon at portal.open-bio.org Fri Feb 20 17:58:12 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 20 Feb 2009 12:58:12 -0500 Subject: [Biopython-dev] [Bug 2771] Bio.Entrez.read can't parse XML files from dbSNP (snp database) In-Reply-To: Message-ID: <200902201758.n1KHwCvA014904@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2771 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Summary|Entrez.efetch: dbSNP not |Bio.Entrez.read can't parse |supported yet? |XML files from dbSNP (snp | |database) ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-20 12:58 EST ------- I've retitled the bug to focus on the failure to parse the XML file (there is no problem in Bio.Entrez.efetch as far as I can tell). For example, >>> from Bio import Entrez >>> result = Entrez.read(Entrez.efetch(db='snp', id='9996597', retmode='xml')) Traceback (most recent call last): File "", line 1, in File "Bio/Entrez/__init__.py", line 286, in read record = handler.run(handle) File "Bio/Entrez/Parser.py", line 95, in run self.parser.ParseFile(handle) File "Bio/Entrez/Parser.py", line 131, in startElement if object!="": UnboundLocalError: local variable 'object' referenced before assignment This may be an NCBI bug, try this: >>> from Bio import Entrez >>> print Entrez.efetch(db='snp', id='9996597', retmode='xml').read() ... Then copy and paste the XML into a validation site like http://www.validome.org/xml/validate/ where I see an error. On the other hand, http://validator.w3.org/#validate_by_input seems happy with only a warning. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Fri Feb 20 18:03:11 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 20 Feb 2009 13:03:11 -0500 Subject: [Biopython-dev] [Bug 2771] Bio.Entrez.read can't parse XML files from dbSNP (snp database) In-Reply-To: Message-ID: <200902201803.n1KI3BAi016506@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2771 ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-20 13:03 EST ------- (In reply to comment #2) > This may be an NCBI bug, ... According to this page there is/was a problem with the XML files returned for the snp database by efetch, http://eutils.ncbi.nlm.nih.gov/entrez/query/static/esoap_help.html >> Known issues >> * ... >> * eFetch utility generates an invalid XML for SNP, so currently it doesn't >> work through SOAP. The bug is being fixed. >> * ... Unfortunately I have no idea if that information is current or not. This could be unrelated. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Feb 20 18:07:15 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 20 Feb 2009 13:07:15 -0500 Subject: [Biopython-dev] [Bug 2769] Entrez results: seek methods doesn't work? In-Reply-To: Message-ID: <200902201807.n1KI7FBv017703@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2769 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |INVALID ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-20 13:07 EST ------- This is normal and expected behaviour from many Python file-like handles.
In particular it is normal for handles to network resources, e.g. >>> import urllib >>> handle = urllib.urlopen("http://biopython.org/") >>> print handle.read() ... >>> handle.seek(0) Traceback (most recent call last): File "", line 1, in AttributeError: addinfourl instance has no attribute 'seek' -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Feb 20 18:43:52 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 20 Feb 2009 13:43:52 -0500 Subject: [Biopython-dev] [Bug 2760] proposal: enhancement for SeqIO.TabIO In-Reply-To: Message-ID: <200902201843.n1KIhqCG026268@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2760 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-20 13:43 EST ------- (In reply to comment #0) > this patch fix a problem that TabIO had (fail if there it are more than two > tabs, or spaces instead of tabs, between the title and the sequence), Those cases are intentionally not supported, but the error message should now be clearer. > and introduces a check to skip empty lines. Fixed as this seems like a good idea (you can often get an empty line at the end of files). Closing this bug as "fixed". -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
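The behaviour described in the reply on Bug 2760 (empty lines skipped, anything other than exactly one tab rejected with a clear message) could be sketched roughly as follows. This is an illustration only and not the code committed to Bio.SeqIO; the function name parse_tab_lines and the wording of the error message are made up for the example.

```python
def parse_tab_lines(lines):
    """Yield (title, sequence) pairs from tab-separated lines (illustrative)."""
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip empty lines, e.g. a blank line at the end of a file
        fields = line.split("\t")
        if len(fields) != 2:
            # Extra tabs, or spaces instead of a tab, are intentionally rejected.
            raise ValueError(
                "line %d: expected exactly one tab, found %d"
                % (number, len(fields) - 1)
            )
        title, sequence = fields
        yield title.strip(), sequence.strip()


records = list(parse_tab_lines(["gi|1\tACGT\n", "\n", "gi|2\tTTGA\n"]))
print(records)
```

Raising an error that names the line number and the number of tabs found is one way to make the unsupported cases easier to diagnose than a generic unpacking failure.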
From chapmanb at 50mail.com Fri Feb 20 23:19:04 2009 From: chapmanb at 50mail.com (Brad Chapman) Date: Fri, 20 Feb 2009 18:19:04 -0500 Subject: [Biopython-dev] Quality scores (and per-letter-annotation) in a SeqRecord? In-Reply-To: References: <200902201249.36743.jblanca@btc.upv.es> Message-ID: <20090220231904.GE18294@sobchak.mgh.harvard.edu> Hi all; Good points on this debate so far. What do you all think about a hybrid approach where the .quality attribute is a dictionary? The keys would be the quality type ("phred", "solexa"...) and the values would be a list or string the same length as the sequence. For slicing, all of the quality dictionary values would be sliced identically to the sequence itself. For BioSQL storage the quality items would go in as annotations with names as a concatenation of the attribute and type ("quality_phred"). Treating these specially on the BioSQL in/out is a little hack-y, but quality is likely important enough to not bury it. For Leighton's idea of generalization you could either: - Derive a heavy-weight SeqRecord class from the base class that added a several additional per-symbol cases. - Provide a generic per_symbol_annotations attribute that collected these as a dictionary of dictionaries: dict(quality = dict(phred = [20, 30]), hydrophobicity = dict(some_predictor = ['some', 'scores']) ) These could map to generic attributes in the same way and follow the same slicing rules. After writing this up, I think the second idea is better and probably exactly what Leighton was proposing. Brad > Another 2p... I collect them, you know... > > An additional determinant of how these values are best scored is: "What will > they be used for?". > > If the only use they would ever find was to accompany a sequence so that its > file format could be converted from one with embedded qualities to a format > that required two such files (or vice-versa), then straightforward storage > as a string in a dictionary is all that's needed. 
This would be sufficient > for conversion between some quality scores, as a utility function could just > grab the stored string (given an appropriate name for each quality format). > The question of how these per-symbol annotations would be modified when > returning a Seq slice or join may be an issue. > > If 'live' access to the values is required for calculation or alignment > purposes, then a different interface might be more useful, permitting > slicing, base selection on the basis of quality, or other operation. This > use case is more complex, as the return value is likely to be dependent on > the quality format (single- or multiple-value per base). > > Conceptually, I see quality scores as annotations of a sequence, rather than > an intrinsic property of the sequence, so am happy for them to live in the > same place other annotations do. I also see them as only one instance of a > class of per-symbol annotations (along with hydrophobicity scores, secondary > structure predictions, read map counts and several other measures). I > think, therefore, that there is a case for a class describing per-symbol > annotations to a Seq, and placing these in a dictionary of per-symbol > annotations. Slices of the parent Seq could then be propagated downwards to > all members of that dictionary (which would also be expected to implement > the same string-like methods as the parent). > > The per-symbol annotation objects could be subclassed and/or contain a > descriptive string from a controlled vocabulary to indicate their format, > for standard interfacing with external packages (e.g. Drawing TOPS diagrams > from secondary structure predictions or rendering base quality profiles), > which I think would be a flexible approach. > > On 20/02/2009 11:49, "Jose Blanca" wrote: > > >> I suppose you could consider adding a .phred_quality > >> property which is explicit, but then you'd end up with many different > >> properties. 
Then there are other per-letter quality annotations - you > >> might want the A, C, G and T intensity from capillary sequencing (four > >> sets of numbers, not just one). Plus of course this doesn't address > >> non-quality related per-letter-annotations (like secondary structure, > >> or atomic coordinates). > >> > >> My point is that we can't give top level properties to everything, > >> hence the original introduction of the annotations dictionary in the > >> first place. Only a handful of really important things got their own > >> properties (id, name, description and the sequence itself). If there > >> was only ONE key quality score, then I wouldn't mind making an > >> exception so much - but that doesn't seem to be the case. > > That's a very good point. It wouldn't be wise to populate the SeqRecord class > > with a lot of properties. > > Another possible approach would be to create a derived class for that, a > > SeqWithQuality. It would be like a SeqRecord but with a .quality property. > > For other cases other classes could be derived from SeqRecord. > > The problem with putting the qualities in a dict with all the other per base > > annotation is that it has a different behaviour than the .seq case. The seq > > case is special because it is used much more, so maybe that's fair enough. > > I don't know, maybe it is wiser to set all the per case annotations in a dict > > and leave the sequence outside. In that way we won't be creating a lot of new > > classes derived from SeqRecord. > > The more I think about the dict possibility, the more I like it. > > -- > Dr Leighton Pritchard MRSC > D131, Plant Pathology Programme, SCRI > Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA > e:lpritc at scri.ac.uk w:http://www.scri.ac.uk/staff/leightonpritchard > gpg/pgp: 0xFEFC205C tel:+44(0)1382 562731 x2405 > > > ______________________________________________________________________ > SCRI, Invergowrie, Dundee, DD2 5DA.
> The Scottish Crop Research Institute is a charitable company limited by > guarantee. > Registered in Scotland No: SC 29367. > Recognised by the Inland Revenue as a Scottish Charity No: SC 006662. > > > DISCLAIMER: > > This email is from the Scottish Crop Research Institute, but the views > expressed by the sender are not necessarily the views of SCRI and its > subsidiaries. This email and any files transmitted with it are > confidential > > to the intended recipient at the e-mail address to which it has been > addressed. It may not be disclosed or used by any other than that > addressee. > If you are not the intended recipient you are requested to preserve this > > confidentiality and you must not use, disclose, copy, print or rely on > this > e-mail in any way. Please notify postmaster at scri.ac.uk quoting the > name of the sender and delete the email from your system. > > Although SCRI has taken reasonable precautions to ensure no viruses are > present in this email, neither the Institute nor the sender accepts any > responsibility for any viruses, and it is your responsibility to scan > the email and the attachments (if any). > ______________________________________________________________________ > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From idoerg at gmail.com Sat Feb 21 00:24:43 2009 From: idoerg at gmail.com (Iddo Friedberg) Date: Fri, 20 Feb 2009 16:24:43 -0800 Subject: [Biopython-dev] Quality scores (and per-letter-annotation) in a SeqRecord? In-Reply-To: <20090220231904.GE18294@sobchak.mgh.harvard.edu> References: <200902201249.36743.jblanca@btc.upv.es> <20090220231904.GE18294@sobchak.mgh.harvard.edu> Message-ID: <1235175883.22598.62.camel@lafa> Hi all, I am sort of living in this world right now, doing a lot of metagenomics, so here are my $0.02. 
I agree with Leighton (assuming I understand him): We should consider the possible applications people will run using the quality data when designing the [...] 1) From what I have seen, the most common use for quality scores is trimming the sequences, i.e. removing the lower-quality sequence data (usually at the edges) from the 5' and 3' ends of the read. So any data structure should take into consideration that we will probably have a .trim(self, threshold) method or a trim(seq, threshold) function that will return a slice of the sequence. 2) There is a certain need for optimization. Quality scores usually come with high-throughput data, which today can mean around 3 Gbp per run. I am not sure where this is going exactly, but maybe with the advent of high-throughput short-read genomics we should think about a slim SeqRecord to expedite short-read processing. Or simply write some stuff wrapped around C. ./I On Fri, 2009-02-20 at 18:19 -0500, Brad Chapman wrote: > Hi all; > Good points on this debate so far. What do you all think about a > hybrid approach where the .quality attribute is a dictionary? The > keys would be the quality type ("phred", "solexa"...) and the values > would be a list or string the same length as the sequence. > > For slicing, all of the quality dictionary values would be sliced > identically to the sequence itself. For BioSQL storage the quality > items would go in as annotations with names as a concatenation > of the attribute and type ("quality_phred"). > > Treating these specially on the BioSQL in/out is a little hack-y, > but quality is likely important enough to not bury it. > > For Leighton's idea of generalization you could either: > > - Derive a heavy-weight SeqRecord class from the base class that > added several additional per-symbol cases.
> > - Provide a generic per_symbol_annotations attribute that collected > these as a dictionary of dictionaries: > > dict(quality = dict(phred = [20, 30]), > hydrophobicity = dict(some_predictor = ['some', 'scores']) > ) > > These could map to generic attributes in the same way and follow the > same slicing rules. After writing this up, I think the second idea > is better and probably exactly what Leighton was proposing. > > Brad > > > Another 2p... I collect them, you know... > > > > An additional determinant of how these values are best scored is: "What will > > they be used for?". > > > > If the only use they would ever find was to accompany a sequence so that its > > file format could be converted from one with embedded qualities to a format > > that required two such files (or vice-versa), then straightforward storage > > as a string in a dictionary is all that's needed. This would be sufficient > > for conversion between some quality scores, as a utility function could just > > grab the stored string (given an appropriate name for each quality format). > > The question of how these per-symbol annotations would be modified when > > returning a Seq slice or join may be an issue. > > > > If 'live' access to the values is required for calculation or alignment > > purposes, then a different interface might be more useful, permitting > > slicing, base selection on the basis of quality, or other operation. This > > use case is more complex, as the return value is likely to be dependent on > > the quality format (single- or multiple-value per base). > > > > Conceptually, I see quality scores as annotations of a sequence, rather than > > an intrinsic property of the sequence, so am happy for them to live in the > > same place other annotations do. I also see them as only one instance of a > > class of per-symbol annotations (along with hydrophobicity scores, secondary > > structure predictions, read map counts and several other measures). 
I > > think, therefore, that there is a case for a class describing per-symbol > > annotations to a Seq, and placing these in a dictionary of per-symbol > > annotations. Slices of the parent Seq could then be propagated downwards to > > all members of that dictionary (which would also be expected to implement > > the same string-like methods as the parent). > > > > The per-symbol annotation objects could be subclassed and/or contain a > > descriptive string from a controlled vocabulary to indicate their format, > > for standard interfacing with external packages (e.g. Drawing TOPS diagrams > > from secondary structure predictions or rendering base quality profiles), > > which I think would be a flexible approach. > > > > On 20/02/2009 11:49, "Jose Blanca" wrote: > > > > >> I suppose you could consider adding a .phred_quality > > >> property which is explicit, but then you'd end up with many different > > >> properties. Then there are other per-letter quality annotations - you > > >> might want the A, C, G and T intensity from capillary sequencing (four > > >> sets of numbers, not just one). Plus of course this doesn't address > > >> non-quality related per-letter-annotations (like secondary structure, > > >> or atomic coordinates). > > >> > > >> My point is that if we can't give top level properties to everything, > > >> hence the original introduction of the annotations dictionary in the > > >> first place. Only a handful of really important things got their own > > >> properties (id, name, description and the sequence itself). If there > > >> was only ONE key quality score, then I wouldn't mind making an > > >> exception so much - but that doesn't seem to be the case. > > > That's a very good point. It wouldn't be wise to populate the SeqRecord class > > > with a lot of properties. > > > Another posible approach would be to create a derived class for that a > > > SeqWithQuality. It would be like a SeqRecord but with a .quality property. 
> > > For other cases other classes could be derived from SeqRecord. > > > The problem with putting the quatilies in a dict with all the other per base > > > annotation is that it has a different behaviour than the .seq case. The seq > > > case is special because is much more used, so maybe that's fair enough. > > > I don't know, maybe it is wiser to set all the per case annotations in a dict > > > a let the sequence outside. In that way we won't be creating a lot of new > > > classes derived from SeqRecord. > > > The more I think about the dict possibility, the more I like it. > > > > -- > > Dr Leighton Pritchard MRSC > > D131, Plant Pathology Programme, SCRI > > Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA > > e:lpritc at scri.ac.uk w:http://www.scri.ac.uk/staff/leightonpritchard > > gpg/pgp: 0xFEFC205C tel:+44(0)1382 562731 x2405
-- Iddo Friedberg, Ph.D. CALIT2 Atkinson Hall MC #0446 University of California San Diego 9500 Gilman Drive La Jolla, CA 92093-0446 USA +1 (858) 534-0570 http://iddo-friedberg.org From biopython at maubp.freeserve.co.uk Sat Feb 21 18:50:15 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sat, 21 Feb 2009 18:50:15 +0000 Subject: [Biopython-dev] Quality scores (and per-letter-annotation) in a SeqRecord? In-Reply-To: <20090220231904.GE18294@sobchak.mgh.harvard.edu> References: <200902201249.36743.jblanca@btc.upv.es> <20090220231904.GE18294@sobchak.mgh.harvard.edu> Message-ID: <320fb6e00902211050r7a57bceap9ba216924785b9b0@mail.gmail.com> On Fri, Feb 20, 2009 at 11:19 PM, Brad Chapman wrote: > Hi all; > Good points on this debate so far. What do you all think about a > hybrid approach where the .quality attribute is a dictionary? The > keys would be the quality type ("phred", "solexa"...) and the values > would be a list or string the same length as the sequence. I was actually thinking about adding a per_letter_annotations (or using Brad's suggested name per_symbol_annotations) dictionary which could hold phred qualities, solexa qualities, secondary structure, atomic coordinates - any python sequence (e.g.
string, list or tuple) with a length matching the sequence. This would cover all the use cases I have come up with, and we can implement SeqRecord slicing which would also slice everything in the per_letter_annotations dictionary. Note that the per_letter_annotations dictionary could actually be a simple subclass of the python dictionary that only allows you to add elements with the appropriate length - this would prevent simple abuses/accidental errors. > For slicing, all of the quality dictionary values would be sliced > identically to the sequence itself. For BioSQL storage the quality > items would go in as annotations with names as a concatenation > of the attribute and type ("quality_phred"). > > Treating these specially on the BioSQL in/out is a little hack-y, > but quality is likely important enough to not bury it. If you are trying to store a sequence-with-quality in BioSQL, then yes, using the existing annotation tables could work - the ontology term can tell us it's a per-letter-annotation rather than a generic annotation. The only catch is the current tables only let us store strings. We could store each per-letter-annotation entry (e.g. a single quality score) as a separate table entry (where the rank tells us the correct order), but bundling them all into a single long table row might be more efficient. In the case of PHRED or Solexa scores, we could even use the FASTQ encoding (but a string "10, 20, 50, ..." might be more sensible). This would require some co-ordination with the other Bio* projects, probably on the BioSQL mailing list. On the other hand, I don't expect anyone to try and store GB of sequence+quality data in BioSQL. For this a custom database design would be much more efficient (or at least some custom tables). Here, as Iddo points out, the SeqRecord object may be overkill.
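To make the two single-row string options mentioned above concrete, here is a minimal sketch in plain Python. It is not Biopython or BioSQL code: the function names are made up for illustration, and the FASTQ variant assumes the Sanger convention of adding an ASCII offset of 33 to each PHRED score. Solexa scores use a different scale and offset and are not handled here.

```python
# Hypothetical helpers, for illustration only (not part of Biopython or BioSQL).

SANGER_OFFSET = 33  # assumption: Sanger-style FASTQ, PHRED score + 33 -> ASCII


def scores_to_csv(scores):
    """Bundle integer scores into one string such as '10, 20, 50'."""
    return ", ".join(str(score) for score in scores)


def csv_to_scores(text):
    """Reverse scores_to_csv; surrounding whitespace is ignored by int()."""
    return [int(item) for item in text.split(",") if item.strip()]


def scores_to_fastq(scores):
    """Encode PHRED scores as one character per letter (Sanger FASTQ style)."""
    return "".join(chr(score + SANGER_OFFSET) for score in scores)


def fastq_to_scores(encoded):
    """Decode a Sanger-style FASTQ quality string back into PHRED scores."""
    return [ord(char) - SANGER_OFFSET for char in encoded]


phred = [10, 20, 50]
as_csv = scores_to_csv(phred)      # human-readable, one table row
as_fastq = scores_to_fastq(phred)  # compact, one character per letter
```

Either string fits in a single annotation value; the comma-separated form is easier to read, while the FASTQ form keeps the stored string the same length as the sequence.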
> For Leighton's idea of generalization you could either: > > - Derive a heavy-weight SeqRecord class from the base class that > added several additional per-symbol cases. > > - Provide a generic per_symbol_annotations attribute that collected > these as a dictionary of dictionaries: > > dict(quality = dict(phred = [20, 30]), > hydrophobicity = dict(some_predictor = ['some', 'scores']) > ) > > These could map to generic attributes in the same way and follow the > same slicing rules. After writing this up, I think the second idea > is better and probably exactly what Leighton was proposing. I'm not sure if it's exactly what Leighton has in mind, but it seems more complicated to have to do my_record.per_symbol_annotations["quality"]["phred"] rather than just my_record.per_symbol_annotations["quality_phred"]. I don't see much benefit to the extra level of nesting - after all you'll typically only have one type of quality present. Peter From biopython at maubp.freeserve.co.uk Sat Feb 21 19:03:14 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sat, 21 Feb 2009 19:03:14 +0000 Subject: [Biopython-dev] Quality scores (and per-letter-annotation) in a SeqRecord? In-Reply-To: <1235175883.22598.62.camel@lafa> References: <200902201249.36743.jblanca@btc.upv.es> <20090220231904.GE18294@sobchak.mgh.harvard.edu> <1235175883.22598.62.camel@lafa> Message-ID: <320fb6e00902211103n175fefc7w71a6922ee0cd0f26@mail.gmail.com> On Sat, Feb 21, 2009 at 12:24 AM, Iddo Friedberg wrote: > > Hi all, > > I am sort of living in this world right now, doing a lot of > metagenomics, so here are my $0.02. I agree with Leighton (assuming I > understand him): We should consider the possible applications people > will run using the quality data when designing the [parser?] Sure. By having the FASTQ and QUAL files integrated into Bio.SeqIO (using SeqRecord objects) one simple use case is supported - interconverting these files into other formats (e.g.
FASTQ to FASTA, or with a little more effort FASTA+QUAL to FASTQ). Your trimming example is another good use case - which could be done with the SeqRecord representation. For anything more complicated (like assembly or mapping onto a genome), with massive datasets the modest overhead of the SeqRecord and Seq objects could be an issue - but isn't this sort of thing usually best handled by an external tool (written in C or C++ by a specialist)? Anyway - if you have a look at the first attachment on Bug 2767, I did the core of the FASTQ parser as a generic function returning a tuple of strings (the record title, sequence and the encoded quality string - see FastqGeneralIterator). While this could be just a private function, I was thinking this could actually be very helpful for anyone trying to do something where speed or memory usage was important. On top of this core parser, I had a FastqPhredIterator (and would similarly have a FastqSolexaIterator) function which turns these into SeqRecord objects for use via the Bio.SeqIO API. That is, we can offer both the standard Bio.SeqIO interface using SeqRecords, and a simpler string based parser for those that need it. Peter From chapmanb at 50mail.com Sun Feb 22 21:27:42 2009 From: chapmanb at 50mail.com (Brad Chapman) Date: Sun, 22 Feb 2009 16:27:42 -0500 Subject: [Biopython-dev] Quality scores (and per-letter-annotation) in a SeqRecord? In-Reply-To: <320fb6e00902211050r7a57bceap9ba216924785b9b0@mail.gmail.com> References: <200902201249.36743.jblanca@btc.upv.es> <20090220231904.GE18294@sobchak.mgh.harvard.edu> <320fb6e00902211050r7a57bceap9ba216924785b9b0@mail.gmail.com> Message-ID: <20090222212742.GA58314@kunkel> Hi all; > I was actually thinking about adding a per_letter_annotations (or > using Brad's suggested name per_symbol_annotations) dictionary which > could hold phred qualities, solexa qualities, secondary structure, > atomic coordinates - any python sequence (e.g.
string, list or tuple) > with a length matching the sequence. This would cover all the use > cases I have come up with, and we can implement SeqRecord slicing > which would also slice everything in the per_letter_annotations > dictionary. [...] > I'm not sure if its exactly what Leighton has in mind, but it seems > more complicated to have to do > my_record.per_symbol_annotations["quality"]["phred"] rather than just > my_record.per_symbol_annotations["quality_phred"]. I'm agreed with you here -- the double dictionary I proposed is ugly and doesn't do much of anything extra. I'm +1 on exactly what you wrote here, and am not picky about the naming. > The only catch is the current tables only let us store > strings. We could store each per-letter-annotation entry (e.g. a > single quality score) as a separate table entry (where the rank tells > us the correct order), but bundling them all into a single long table > row might be more efficient. In the case of PHRED or Solexa scores, > we could even use the FASTQ encoding (but a string "10, 20, 50, ..." > might be more sensible). This would require some co-ordination with > the other Bio* projects, probably on the BioSQL mailing list. My vote is for bundling them together into a single row table using json to stringify the lists. It's a nice compact representation and will be well supported in any language. Python 2.6 has the simplejson library bundled, so it's just a matter of doing: jsonified_list = json.dumps(the_quality_list) the_quality_list = json.loads(jsonified_list) Since I've been doing more Javascript and Python, I appreciate not munging lists into strings with obscure separators and really like json. As a bonus, it looks just like Python. Brad From lpritc at scri.ac.uk Mon Feb 23 09:48:07 2009 From: lpritc at scri.ac.uk (Leighton Pritchard) Date: Mon, 23 Feb 2009 09:48:07 +0000 Subject: [Biopython-dev] Quality scores (and per-letter-annotation) in a SeqRecord? 
In-Reply-To: <20090222212742.GA58314@kunkel> Message-ID: Hi all, On 22/02/2009 21:27, "Brad Chapman" wrote: > [...] >> I'm not sure if it's exactly what Leighton has in mind, but it seems >> more complicated to have to do >> my_record.per_symbol_annotations["quality"]["phred"] rather than just >> my_record.per_symbol_annotations["quality_phred"]. > > I'm agreed with you here -- the double dictionary I proposed is ugly > and doesn't do much of anything extra. I'm +1 on exactly what you wrote > here, and am not picky about the naming. I was originally suggesting two extremes, a lightweight dictionary and a more heavyweight new class. I now prefer the lightweight option, which I imagine might operate along the lines of (keeping away from quality scores, for now...)
>>> my_seqrecord
SeqRecord(seq=Seq('FCLEPPYWYKNPGARTESRILRGGIID', Alphabet()), id='my_seqrecord', name='<unknown name>', description='<unknown description>', dbxrefs=[])
>>> my_seqrecord.per_symbol_annotations['secondary_structure']
'HHHHHHEEEEEEE     EEEEEEEEE'
>>> my_seqrecord.per_symbol_annotations['hydrophobicity']
[0.823, 0.880, 0.987, 0.461, 0.706, 0.972, 0.109, 0.499, 0.908, 0.045, 0.493, 0.162, 0.796, 0.989, 0.419, 0.501, 0.686, 0.985, 0.502, 0.242, 0.890, 0.436, 0.855, 0.426, 0.814, 0.178, 0.923]
>>> # Assuming that one day there's slicing of SeqRecords...
>>> shorter_seqrecord = my_seqrecord[:10]
>>> shorter_seqrecord.per_symbol_annotations['secondary_structure']
'HHHHHHEEEE'
>>> shorter_seqrecord.per_symbol_annotations['hydrophobicity']
[0.823, 0.880, 0.987, 0.461, 0.706, 0.972, 0.109, 0.499, 0.908, 0.045]
Which I guess could be enforced in slice-handling by having it loop over the values (if any) in my_seqrecord.per_symbol_annotations and propagate accordingly. The more heavyweight idea involved a PerSymbolAnnotation (or somesuch name) class. I imagined this presenting a common API, but permitting the storage of annotation data in an arbitrary fashion so long as it could be returned as a Python sequence.
The class-based approach would make it possible to attach methods specific to that kind of annotation data, which may be useful - but probably not in the vast majority of cases. Also, any such operations could probably be handled external to the object by other functions, so long as they can get that Python sequence - which the more lightweight approach provides. Most people's attention here seems to be focused on sequence quality data, with a skew towards high-throughput sequencing, and the lightweight approach is the one that definitely makes most sense to me, there. >> The only catch is the current tables only let us store >> strings. We could store each per-letter-annotation entry (e.g. a >> single quality score) as a separate table entry (where the rank tells >> us the correct order), but bundling them all into a single long table >> row might be more efficient. In the case of PHRED or Solexa scores, >> we could even use the FASTQ encoding (but a string "10, 20, 50, ..." >> might be more sensible). This would require some co-ordination with >> the other Bio* projects, probably on the BioSQL mailing list. > > My vote is for bundling them together into a single row table using > json to stringify the lists. It's a nice compact representation and > will be well supported in any language. Python 2.6 has the > simplejson library bundled, so it's just a matter of doing: > > jsonified_list = json.dumps(the_quality_list) > the_quality_list = json.loads(jsonified_list) > > Since I've been doing more Javascript and Python, I appreciate not > munging lists into strings with obscure separators and really like > json. As a bonus, it looks just like Python. I don't like the idea of storing each per-symbol annotation (i.e. single score/annotation) in its own row, either. 
I think that we all realise that approach could rapidly become hugely inefficient ;) I can see that pulling out individual symbol annotations might be desirable when people want slices of the annotation in units smaller than a single seqfeature or bioentry (in BioSQL terms). In those cases, on grounds of efficiency, I think it possibly makes more sense to grab either the seqfeature or bioentry (since the per-symbol annotations would always be associated with such an object) as a SeqRecord and slice out the data, rather than to query a table with what would likely be (at least eventually) millions of rows of per-symbol annotations. That possibly means adding slicing to SeqRecords though, which brings its own problems... ;) Storage of per-symbol annotation as Python sequence information in a single db row, in a human-readable plain-text format that's readily-parsable when querying the database with Biopython looks like a winning approach to me. I'd not come across json before - it does remind me of nested Python dictionaries. It looks simple to use and parse, and reverse-engineerable if necessary. If it's robust to the kind of data we want to store, and a de facto or actual standard usable transparently across all Bio* projects, then it sounds like a good candidate, to me. L. -- Dr Leighton Pritchard MRSC D131, Plant Pathology Programme, SCRI Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA e:lpritc at scri.ac.uk w:http://www.scri.ac.uk/staff/leightonpritchard gpg/pgp: 0xFEFC205C tel:+44(0)1382 562731 x2405
From biopython at maubp.freeserve.co.uk Mon Feb 23 10:42:13 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 23 Feb 2009 10:42:13 +0000 Subject: [Biopython-dev] Quality scores (and per-letter-annotation) in a SeqRecord? In-Reply-To: References: <20090222212742.GA58314@kunkel> Message-ID: <320fb6e00902230242k2ff44a37h4c0a303c9847c8ca@mail.gmail.com> On Mon, Feb 23, 2009 at 9:48 AM, Leighton Pritchard wrote: > Hi all, > > On 22/02/2009 21:27, "Brad Chapman" wrote: > >> [...] >>> I'm not sure if its exactly what Leighton has in mind, but it seems >>> more complicated to have to do >>> my_record.per_symbol_annotations["quality"]["phred"] rather than just >>> my_record.per_symbol_annotations["quality_phred"]. >> >> I'm agreed with you here -- the double dictionary I proposed is ugly >> and doesn't do much of anything extra. I'm +1 on exactly what you wrote >> here, and am not picky about the naming.
> > I was originally suggesting two extremes, a lightweight dictionary and a > more heavyweight new class. I now prefer the lightweight option, which I > imagine might operate along the lines of (keeping away from quality scores, > for now...)
>
> >>> my_seqrecord
> SeqRecord(seq=Seq('FCLEPPYWYKNPGARTESRILRGGIID', Alphabet()),
> id='my_seqrecord', name='<unknown name>', description='<unknown description>', dbxrefs=[])
> >>> my_seqrecord.per_symbol_annotations['secondary_structure']
> 'HHHHHHEEEEEEE     EEEEEEEEE'
> >>> my_seqrecord.per_symbol_annotations['hydrophobicity']
> [0.823, 0.880, 0.987, 0.461, 0.706, 0.972, 0.109, 0.499, 0.908, 0.045,
> 0.493, 0.162, 0.796, 0.989, 0.419, 0.501, 0.686, 0.985, 0.502, 0.242, 0.890,
> 0.436, 0.855, 0.426, 0.814, 0.178, 0.923]
> >>> # Assuming that one day there's slicing of SeqRecords...
> >>> shorter_seqrecord = my_seqrecord[:10]
> >>> shorter_seqrecord.per_symbol_annotations['secondary_structure']
> 'HHHHHHEEEE'
> >>> shorter_seqrecord.per_symbol_annotations['hydrophobicity']
> [0.823, 0.880, 0.987, 0.461, 0.706, 0.972, 0.109, 0.499, 0.908, 0.045]
>
> Which I guess could be enforced in slice-handling by having it loop over the > values (if any) in my_seqrecord.per_symbol_annotations and propagate > accordingly. This sounds like a possible consensus :) In terms of names, we have per_symbol_annotations and per_letter_annotations (to match the existing annotations dictionary), which are long but explicit. We could also have letter_annotations, symbol_annotations (shorter but more ambiguous), or even pas or pla (too short?). For the implementation, we could start with a simple dictionary and see if any kind of safety feature should be added later if it seems necessary. What I had in mind was a dict subclass which takes the sequence length, and by overriding the __setitem__ method checks that only python sequences (objects with __len__ and __getitem__) of the appropriate length can be added.
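A dict subclass of the kind just described might look roughly like the sketch below. The names LengthCheckedDict and slice_annotations are hypothetical, chosen only for illustration, and nothing here is existing Biopython API. The helper shows the "loop over the values and propagate" idea for slicing.

```python
class LengthCheckedDict(dict):
    """Dictionary that only accepts Python sequences of one fixed length.

    Hypothetical illustration, not a Biopython class.
    """

    def __init__(self, length):
        super().__init__()
        self.length = int(length)

    def __setitem__(self, key, value):
        # Accept anything sequence-like: it must have __len__ and __getitem__.
        if not (hasattr(value, "__len__") and hasattr(value, "__getitem__")):
            raise TypeError("value must be a Python sequence")
        if len(value) != self.length:
            raise ValueError(
                "expected length %d, got %d" % (self.length, len(value))
            )
        super().__setitem__(key, value)

    def update(self, *args, **kwargs):
        # Route update() through __setitem__ so the check cannot be skipped.
        # Other dict methods (e.g. setdefault) are not guarded in this sketch.
        for key, value in dict(*args, **kwargs).items():
            self[key] = value


def slice_annotations(annotations, index):
    """Apply the same slice object to every per-letter annotation."""
    new_length = len(range(*index.indices(annotations.length)))
    sliced = LengthCheckedDict(new_length)
    for key, value in annotations.items():
        sliced[key] = value[index]
    return sliced


annotations = LengthCheckedDict(5)
annotations["secondary_structure"] = "HHEEE"
annotations["phred"] = [10, 20, 30, 40, 50]
first_three = slice_annotations(annotations, slice(0, 3))
```

The check only runs when a value is assigned, so changing a stored list in place afterwards is not detected.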
This would add a small overhead when creating the annotated SeqRecord, and wouldn't stop abuses like my_seqrecord.per_symbol_annotations['secondary_structure'].append("X"), but would make it harder to accidentally get inconsistent sequence and per-letter-annotation. > The more heavyweight idea involved a PerSymbolAnnotation (or somesuch name) > class. I imagined this presenting a common API, but permitting the storage > of annotation data in an arbitrary fashion so long as it could be returned > as a Python sequence. The class-based approach would make it possible to > attach methods specific to that kind of annotation data, which may be useful > - but probably not in the vast majority of cases. Also, any such operations > could probably be handled external to the object by other functions, so long > as they can get that Python sequence - which the more lightweight approach > provides. You could implement things like a SolexaQualityList and PhredQualityList with methods to inter-convert the scores and still use them within the per_letter_annotations approach described above. One of the nice things about this dictionary approach is it would be very flexible - you could also store an N by 3 numpy array containing the x,y,z atomic coordinates of the C-alpha protein backbone for a protein of length N, or a list of residue objects from our PDB parser. Anything which is a python sequence object (so lists, strings, tuples for a start). >> My vote is for bundling them together into a single row table using >> json to stringify the lists. It's a nice compact representation and >> will be well supported in any language. Python 2.6 has the >> simplejson library bundled, so it's just a matter of doing: >> >> jsonified_list = json.dumps(the_quality_list) >> the_quality_list = json.loads(jsonified_list) >> >> Since I've been doing more Javascript and Python, I appreciate not >> munging lists into strings with obscure separators and really like >> json.
>> As a bonus, it looks just like Python.
>
> I don't like the idea of storing each per-symbol annotation (i.e. single
> score/annotation) in its own row, either. I think that we all realise that
> approach could rapidly become hugely inefficient ;) ...

For recording complex objects in a BioSQL database, using json sounds like a simple cross-language solution. We should take this sub-topic over to the BioSQL mailing list. In terms of Biopython, we'd need to be able to support old versions of Python. For simple cases like lists of integers or lists of floats, this is probably very straightforward, but if we need full json support it's a bit more tricky. We'd want to use the BioSQL term/ontology features to indicate somehow that the value is json encoded.

Peter

From andrea at biodec.com Mon Feb 23 12:22:51 2009
From: andrea at biodec.com (Andrea)
Date: Mon, 23 Feb 2009 13:22:51 +0100
Subject: [Biopython-dev] DeprecationWorning SProt.py
Message-ID: <49A2951B.9060706@biodec.com>

Good morning,
my name is Andrea Zauli.
Using the latest version of Biopython (1.49) I received this DeprecationWarning:

/usr/lib/python2.5/site-packages/biopython-1.49-py2.5-linux-x86_64.egg/Bio/SwissProt/SProt.py:147: DeprecationWarning: Bio.SwissProt.SProt.Iterator is deprecated. Please use the function Bio.SwissProt.parse instead if you want to get a SwissProt.SProt.Record, or Bio.SeqIO.parse if you want to get a SeqRecord. If these solutions do not work for you, please get in contact with the Biopython developers (biopython-dev at biopython.org). DeprecationWarning)

But I still need to use it.

I'm going to explain my problem. I noticed that the SeqRecord parser SProt.SequenceParser (or the newer Bio.SeqIO.parse) isn't able to parse UniProt Features (and generate SeqFeature objects). I also noticed that SProt.RecordParser is able to parse UniProt Features (and it generates a list of tuples for the parsed features).
So to generate a "featured SeqRecord" I need to parse each UniProt "record" with both parsers (SProt.SequenceParser and SProt.RecordParser) and then transform each Feature tuple into a SeqFeature instance.

To manage this I'm currently using SProt.Iterator, which works on a file handle and returns, like a generator, each unparsed UniProt record. Afterwards I can easily pass each unparsed record to both SProt.SequenceParser and SProt.RecordParser for parsing. That way I'm ALSO SURE that the record I'm parsing is the SAME.

If this method is deprecated, I'd be forced to use Bio.SeqIO.parse and Bio.SwissProt.parse, but each has to act on its own handle (so I have to open 2 file handles)..... and I'm not sure (ok, I would be reasonably sure) that I'm working on exactly the same "record" at each ".next()".

I could work in a different way if the Feature Parser (that in some way parses the Feature in the SProt.RecordParser) could be transferred to the SProt.SequenceParser (or Bio.SeqIO.parse).

So currently I need to work with:
- SProt.SequenceParser and SProt.RecordParser, because they have the method "parse_str".
- SProt.Iterator, because it is able to produce the "string" object that represents a UniProt record to parse (and that I can easily pass to the ".parse_str" method for parsing).

I could stop working with SProt.SequenceParser and SProt.RecordParser if Bio.SeqIO and Bio.SwissProt had the method ".parse_str".
I could stop working with SProt.Iterator if in some way there is an alternative.

Thanks in advance.
Any help is appreciated.
Best Regards
Dr.
Andrea Zauli From biopython at maubp.freeserve.co.uk Mon Feb 23 13:16:56 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 23 Feb 2009 13:16:56 +0000 Subject: [Biopython-dev] DeprecationWorning SProt.py In-Reply-To: <49A2951B.9060706@biodec.com> References: <49A2951B.9060706@biodec.com> Message-ID: <320fb6e00902230516p7781bf73n9e2dbdfef43801df@mail.gmail.com> On Mon, Feb 23, 2009 at 12:22 PM, Andrea wrote: > Goodmorning, > my name is Andrea Zauli. > using the last version of biopyhthon (1.49) i received this > DeprecationWarning: > /usr/lib/python2.5/site-packages/biopython-1.49-py2.5-linux-x86_64.egg/Bio/SwissProt/SProt.py:147: > DeprecationWarning: > Bio.SwissProt.SProt.Iterator is deprecated. > Please use the function Bio.SwissProt.parse instead if you want to get a > SwissProt.SProt.Record, or Bio.SeqIO.parse if you want to get a SeqRecord. > If these solutions do not work for you, please get in contact with the Biopython > developers (biopython-dev at biopython.org). DeprecationWarning) > But i still need to use it. > > I'm going to explain my problem. > I noticed that the seq record parser SProt.SequenceParser (or the newest > Bio.SeqIO.parse) aren't able to parse uniprot Feature (and generate > SeqFeature Objects). I noticed also that SProt.RecordParser is able to > parse uniprot Feature (and it generates a list of tuple for the parsed features). The real solution is for us to finish fixing Bug 2235 so that the parsing SwissProt files as SeqRecord objects includes SeqFeature objects. I need to update the patch on that bug to record the SeqFeature object's qualifiers more like the GenBank parser. I don't personally use SwissProt files much, so If you are willing to help test these changes, I'd be a lot happier about committing this. 
http://bugzilla.open-bio.org/show_bug.cgi?id=2235 > So to generate a "featured SeqRecord" i need to parse > each uniprot "record" with both (SProt.SequenceParser, SProt.RecordParser) > and than transform easily each Feature tuple into a SeqFeature instance . That sounds ugly, but I guess it worked. > If this method is deprecated, i'd be forced to use Bio.SeqIO.parse and > Bio.SwissProt.parse, but each have to act on their own handle (so i've to > open 2 file handles)..... and i'm not sure (ok i would be reasonably sure) > that i'm working exactly on the same "record" every each "".next()"" . You should be able to use two separate handles for the two parsers, and they should iterate over the records correctly. Perhaps add an assert using the record identifier to make sure the records really are in sync. Peter From jblanca at btc.upv.es Mon Feb 23 13:25:21 2009 From: jblanca at btc.upv.es (Jose Blanca) Date: Mon, 23 Feb 2009 14:25:21 +0100 Subject: [Biopython-dev] Quality scores (and per-letter-annotation) in a SeqRecord? In-Reply-To: <320fb6e00902230242k2ff44a37h4c0a303c9847c8ca@mail.gmail.com> References: <20090222212742.GA58314@kunkel> <320fb6e00902230242k2ff44a37h4c0a303c9847c8ca@mail.gmail.com> Message-ID: <200902231425.21877.jblanca@btc.upv.es> > This sounds like a possible consensus :) Great > In terms of names, we've have per_symbol_annotations and > per_letter_annotations (to match the existing annotations dictionary), > which are long but explicit. We could also have letter_annotations, > symbol_annotations (shorter but more ambiguous), or even pas or pla > (too short?). I don't like pla or pas, their not clear, I would vote for letter_annotations. I think it's the clearest one. > For the implementation, we could start with a simple dictionary and > see if any kind of safety feature should be added later if is seems > necessary. 
What I had in mind was a dict subclass which takes the > sequence length, and by overriding the __setitem__ method checks only > python sequences (objects with __len__ and __getitem__) of the > appropriate length can be added. I'm not sure how to implement that. What would you think about creating a new class based on dict but with an extra property, parent? parent would be a reference to the SeqRecord. This new class would check the length of its parent before adding the letter_annotation. I'm just asking because I'm curious about the best way to implement it. Best regards, Jose Blanca From dalloliogm at gmail.com Mon Feb 23 13:31:00 2009 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Mon, 23 Feb 2009 14:31:00 +0100 Subject: [Biopython-dev] biopython on github In-Reply-To: <5aa3b3570902150729g367022a5p334b2c33f86461f@mail.gmail.com> References: <5aa3b3570902150729g367022a5p334b2c33f86461f@mail.gmail.com> Message-ID: <5aa3b3570902230531k6a0da3e0rdec28079971f1193@mail.gmail.com> On Sun, Feb 15, 2009 at 4:29 PM, Giovanni Marco Dall'Olio wrote: > Hi, > I have uploaded a git-converted branch of biopython on github, in case > you want to try it and see how it works. > > You can find it here: > - http://github.com/biopython/biopython/ Hi people, so, I am still testing biopython on git. The function to convert a cvs repository to git works well: I have just updated the branch on github to the latest cvs commit in open-bio, and it has correctly imported all the new commits without mixing them with the old ones. Now, if you look at http://github.com/biopython/biopython/network , you can see the results from all these experiments: the black line represent the code imported from cvs, and the other ones are experiments (well, don't care about the red one). For example, let's say you want to test the fix to the SwissProt parser commented by Andrea. 
You could create a new experimental branch, make it publicly accessible, and put all the changes there: only when you will consider it finish, you will merge it with the official one. The advantage of doing this is that two people or more are able to work on the same patch at the same time, and without having to touch the official code. > > > To work with it, the optimal protocol is: > > - create an account on github.com. Upload an ssh public key by > clicking on 'account' after having logged in. > It is not mandatory to use github, but it will help you understanding > how git works, and it allows other people to follow your branches and > your work. > > - go to the biopython repo: > http://github.com/biopython/biopython/tree/master > and you will see a button named 'Fork': click on it. > It will create a fork of the official biopython repository your > personal account. > Here the word 'fork' is not used in the common way it is, but just to > indicate that you are going to work on a modified version of the > official code, and it's not even a git command. > > > - now, install git on your computer, and execute the following commands: > $: git clone git at github.com:/biopython.git > $: git remote add official_dist git://github.com/biopython/biopython.git > > With the first command, you will download a copy of the repository on > your local computer, which will be the one you will modify > (technically, you are creating a new branch on your computer). > With the second command, you are adding a reference to the official > biopython repository, so in the future you will be able to easily > import the official code and compare it with yours. 
> > Here it is an explanation on these two commands: > http://github.com/guides/keeping-a-git-fork-in-sync-with-the-forked-repo > > > p.s.: to convert to git from cvs I have followed the instructions here: > - http://www.kernel.org/pub/software/scm/git/docs/v1.4.4.4/cvs-migration.html > This seems to be a good tutorial on git, too: > - http://www.kernel.org/pub/software/scm/git/docs/v1.4.4.4/tutorial.html > > > -- > > My blog on bioinformatics (now in English): http://bioinfoblog.it > -- My blog on bioinformatics (now in English): http://bioinfoblog.it From dalloliogm at gmail.com Mon Feb 23 13:50:49 2009 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Mon, 23 Feb 2009 14:50:49 +0100 Subject: [Biopython-dev] Quality scores (and per-letter-annotation) in a SeqRecord? In-Reply-To: <320fb6e00902211050r7a57bceap9ba216924785b9b0@mail.gmail.com> References: <200902201249.36743.jblanca@btc.upv.es> <20090220231904.GE18294@sobchak.mgh.harvard.edu> <320fb6e00902211050r7a57bceap9ba216924785b9b0@mail.gmail.com> Message-ID: <5aa3b3570902230550v12e505eeje3dcf38d9bed8d2b@mail.gmail.com> On Sat, Feb 21, 2009 at 7:50 PM, Peter wrote: > On Fri, Feb 20, 2009 at 11:19 PM, Brad Chapman wrote: >> Hi all; >> Good points on this debate so far. What do you all think about a >> hybrid approach where the .quality attribute is a dictionary? The >> keys would be the quality type ("phred", "solexa"...) and the values >> would be a list or string the same length as the sequence. > > I was actually thinking about adding a per_letter_annotations (or I suggest you to use github or any distribuited source versioning system to test the changes you are describing in this discussion. 
For example, I have created a branch on my github repository called 'qualityscores-experimental' (http://github.com/dalloliogm/biopython/tree/qualityscores-experimental) with a sample commit where I add a per_symbol_annotations attribute to SeqRecord: - http://github.com/dalloliogm/biopython/commit/7821d5f8cab1a5d7c4098c4b52f773b08a45969a I think that it is easier to discuss over this if you can show how the code would look like instead of only describing it. > using Brad's suggested name per_symbol_annotations) dictionary which > could hold phred qualities, solexa qualities, secondary structure, > atomic coordinates - any python sequence (e.g. string, list or tuple) > with a length matching the sequence. This would cover all the use > cases I have come up with, and we can implement SeqRecord slicing > which would also slice everything in the per_letter_annotations > dictionary. > > Note that the per_letter_annotations dictionary could actually be a > simple subclass of the python dictionary that only allows you to add > elements with the appropriate length - this would prevent simple > abuses/accidental errors. > >> For slicing, all of the quality dictionary values would be sliced >> identically to the sequence itself. For BioSQL storage the quality >> items would go in as annotations with names as a concatenation >> of the attribute and type ("quality_phred"). >> >> Treating these specially on the BioSQL in/out is a little hack-y, >> but quality is likely important enough to not bury it. > > If you are trying to store a sequence-with-quality in BioSQL, then yes > using the existing annotation tables could work - the ontology term > can tell us its a per-letter-annotation rather than a generic > annotation. The only catch is the current tables only let us store > strings. We could store each per-letter-annotation entry (e.g. 
a > single quality score) as a separate table entry (where the rank tells > us the correct order), but bundling them all into a single long table > row might be more efficient. In the case of PHRED or Solexa scores, > we could even use the FASTQ encoding (but a string "10, 20, 50, ..." > might be more sensible). This would require some co-ordination with > the other Bio* projects, probably on the BioSQL mailing list. > > On the other hand, I don't expect anyone to try and store GB of > sequence+quality data in BioSQL. For this a custom database design > would be much more efficient (or at least some custom tables). Here > as Iddo points out, the SeqRecord object may be overkill. > >> For Leighton's idea of generalization you could either: >> >> - Derive a heavy-weight SeqRecord class from the base class that >> added a several additional per-symbol cases. >> >> - Provide a generic per_symbol_annotations attribute that collected >> these as a dictionary of dictionaries: >> >> dict(quality = dict(phred = [20, 30]), >> hydrophobicity = dict(some_predictor = ['some', 'scores']) >> ) >> >> These could map to generic attributes in the same way and follow the >> same slicing rules. After writing this up, I think the second idea >> is better and probably exactly what Leighton was proposing. > > I'm not sure if its exactly what Leighton has in mind, but it seems > more complicated to have to do > my_record.per_symbol_annotations["quality"]["phred"] rather than just > my_record.per_symbol_annotations["quality_phred"]. I don't see much > benefit to the extra level of nesting - after all you'll typically > only have one type of quality present. 
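To make the comparison between the two layouts concrete, here is a rough sketch. flatten_annotations is a hypothetical helper (not part of Biopython) that turns the nested form into flat keys such as "quality_phred", the naming mentioned above for BioSQL storage. The values are the placeholders from the quoted example.

```python
def flatten_annotations(nested):
    """Collapse {category: {kind: values}} into {"category_kind": values}."""
    flat = {}
    for category, by_kind in nested.items():
        for kind, values in by_kind.items():
            # Join the two levels into a single key, e.g. "quality_phred".
            flat["%s_%s" % (category, kind)] = values
    return flat


# Nested layout: a dictionary of dictionaries.
nested = {
    "quality": {"phred": [20, 30]},
    "hydrophobicity": {"some_predictor": ["some", "scores"]},
}

# Flat layout: one level, with the type folded into the key.
flat = flatten_annotations(nested)

# Nested access needs two lookups, flat access needs one.
assert nested["quality"]["phred"] == flat["quality_phred"]
```

Both layouts hold the same values, so slicing would work the same way on either; the difference is only in how the entries are addressed.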
> > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > -- My blog on bioinformatics (now in English): http://bioinfoblog.it From biopython at maubp.freeserve.co.uk Mon Feb 23 14:24:04 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 23 Feb 2009 14:24:04 +0000 Subject: [Biopython-dev] Quality scores (and per-letter-annotation) in a SeqRecord? In-Reply-To: <5aa3b3570902230550v12e505eeje3dcf38d9bed8d2b@mail.gmail.com> References: <200902201249.36743.jblanca@btc.upv.es> <20090220231904.GE18294@sobchak.mgh.harvard.edu> <320fb6e00902211050r7a57bceap9ba216924785b9b0@mail.gmail.com> <5aa3b3570902230550v12e505eeje3dcf38d9bed8d2b@mail.gmail.com> Message-ID: <320fb6e00902230624j65d90b63tb9b5c1063d03c923@mail.gmail.com> On Mon, Feb 23, 2009 at 1:50 PM, Giovanni Marco Dall'Olio wrote: > > I suggest you to use github or any distribuited source versioning > system to test the changes you are describing in this discussion. > > ... > > I think that it is easier to discuss over this if you can show how the > code would look like instead of only describing it. Or we can stick with the old fashioned approach of uploading patches to bugzilla. This proposal only requires additions to Bio/SeqRecord.py to define the new property, and won't change much existing code at all. I can see there are benefits to using a distributed source version system for more complicated patches touching lots of files, but it isn't needed here and (if you don't have git installed) using github might it actually make it harder for people to try the code on their local machine. 
Peter From p.j.a.cock at googlemail.com Mon Feb 23 15:31:35 2009 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 23 Feb 2009 15:31:35 +0000 Subject: [Biopython-dev] biopython on github In-Reply-To: <5aa3b3570902230531k6a0da3e0rdec28079971f1193@mail.gmail.com> References: <5aa3b3570902150729g367022a5p334b2c33f86461f@mail.gmail.com> <5aa3b3570902230531k6a0da3e0rdec28079971f1193@mail.gmail.com> Message-ID: <320fb6e00902230731h6257376sb2d6772f72b6e03a@mail.gmail.com> On Mon, Feb 23, 2009 at 1:31 PM, Giovanni Marco Dall'Olio wrote: > On Sun, Feb 15, 2009 at 4:29 PM, Giovanni Marco Dall'Olio > wrote: >> Hi, >> I have uploaded a git-converted branch of biopython on github, in case >> you want to try it and see how it works. >> >> You can find it here: >> - http://github.com/biopython/biopython/ > > Hi people, > so, I am still testing biopython on git. I should have said something two weeks ago, but I didn't actually realize you weren't just doing this with a branch under your own name. I think it is very misleading that you have created a git user called "biopython" and a branch called "biopython" with a description of "official biopython migration from cvs". I can see the value of having the official CVS server mirrored on github, but the way you have done this suggests this is an official project approved of by the biopython.org developers. What's more, if you get bored, I presume this branch on git hub won't get updated anymore and will just sit there - orphaned and out of date! > The function to convert a cvs repository to git works well: I have > just updated the branch on github to the latest cvs commit in > open-bio, and it has correctly imported all the new commits without > mixing them with the old ones. That sounds nice. 
> Now, if you look at http://github.com/biopython/biopython/network , > you can see the results from all these experiments: the black line > represent the code imported from cvs, and the other ones are > experiments (well, don't care about the red one). Does this work without Adobe flash? I don't have this on my Linux machine at home, and while I do have gnash it doesn't work on that many sites. Peter From eric.talevich at gmail.com Mon Feb 23 16:43:04 2009 From: eric.talevich at gmail.com (Eric Talevich) Date: Mon, 23 Feb 2009 11:43:04 -0500 Subject: [Biopython-dev] biopython on github In-Reply-To: <320fb6e00902230731h6257376sb2d6772f72b6e03a@mail.gmail.com> References: <5aa3b3570902150729g367022a5p334b2c33f86461f@mail.gmail.com> <5aa3b3570902230531k6a0da3e0rdec28079971f1193@mail.gmail.com> <320fb6e00902230731h6257376sb2d6772f72b6e03a@mail.gmail.com> Message-ID: <3f6baf360902230843u320e9fe9wc0a03928383d6cbb@mail.gmail.com> Hi folks, > The function to convert a cvs repository to git works well: I have > > just updated the branch on github to the latest cvs commit in > > open-bio, and it has correctly imported all the new commits without > > mixing them with the old ones. > > That sounds nice. > In support of Launchpad once again: Browsing the github docs, I don't see a way for this to be made automatic and continual through the site. (Of course, it's clearly against their financial interest to promote cvs/svn.) Launchpad appears to support it happily: https://help.launchpad.net/VcsImports I see biopython-test hasn't been set up this way yet. Should I try setting up a continuous mirror like this (under a name like biopython-cvs-test)? Or, would Bartek or Giovanni prefer to? > Now, if you look at http://github.com/biopython/biopython/network , > > you can see the results from all these experiments: the black line > > represent the code imported from cvs, and the other ones are > > experiments (well, don't care about the red one). 
> > Does this work without Adobe flash? I don't have this on my Linux > machine at home, and while I do have gnash it doesn't work on that > many sites. > It doesn't seem to work with gnash 0.8.4/amd64, but I think you could use gitk to get mostly the same information minus the snazzy site integration. Cheers, Eric From p.j.a.cock at googlemail.com Mon Feb 23 17:08:03 2009 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 23 Feb 2009 17:08:03 +0000 Subject: [Biopython-dev] biopython on github In-Reply-To: <3f6baf360902230843u320e9fe9wc0a03928383d6cbb@mail.gmail.com> References: <5aa3b3570902150729g367022a5p334b2c33f86461f@mail.gmail.com> <5aa3b3570902230531k6a0da3e0rdec28079971f1193@mail.gmail.com> <320fb6e00902230731h6257376sb2d6772f72b6e03a@mail.gmail.com> <3f6baf360902230843u320e9fe9wc0a03928383d6cbb@mail.gmail.com> Message-ID: <320fb6e00902230908j38f5755la85a55bfc461a763@mail.gmail.com> > In support of Launchpad once again: Browsing the github docs, I don't see a > way for this to be made automatic and continual through the site. (Of > course, it's clearly against their financial interest to promote cvs/svn.) Does anyone know if github can automatically keep in sync with ANY external repository? For example, suppose instead of CVS or SVN we actually ran a git server on biopython.org, would github be able to track it automatically? I really don't like the idea of relying on an external host - if github could mirror a repository on biopython.org that would seem much safer. > Launchpad appears to support it happily: > https://help.launchpad.net/VcsImports > > I see biopython-test hasn't been set up this way yet. Should I try setting > up a continuous mirror like this (under a name like biopython-cvs-test)? Or, > would Bartek or Giovanni prefer to? Given Bartek is one of the official Biopython developers, it might make more sense for him to try and setup a biopython-cvs-test tracker in launchpad if people want to try this. 
He may have done this already, as he seems to have several sub projects... I'm not sure and right now launchpad is being very slow (which does not impress me). See http://bazaar.launchpad.net/~bartek/biopython-test/trunk/files and links

>> Does this [github] work without Adobe flash? I don't have this on my Linux
>> machine at home, and while I do have gnash it doesn't work on that
>> many sites.
>
> It doesn't seem to work with gnash 0.8.4/amd64, but I think you could use
> gitk to get mostly the same information minus the snazzy site integration.

That's a shame. At least gitk would work on any git repository, you wouldn't be tied into github.

Peter

From bartek at rezolwenta.eu.org Mon Feb 23 18:29:04 2009
From: bartek at rezolwenta.eu.org (Bartek Wilczynski)
Date: Mon, 23 Feb 2009 19:29:04 +0100
Subject: [Biopython-dev] biopython on github
In-Reply-To: <320fb6e00902230908j38f5755la85a55bfc461a763@mail.gmail.com>
References: <5aa3b3570902150729g367022a5p334b2c33f86461f@mail.gmail.com> <5aa3b3570902230531k6a0da3e0rdec28079971f1193@mail.gmail.com> <320fb6e00902230731h6257376sb2d6772f72b6e03a@mail.gmail.com> <3f6baf360902230843u320e9fe9wc0a03928383d6cbb@mail.gmail.com> <320fb6e00902230908j38f5755la85a55bfc461a763@mail.gmail.com>
Message-ID: <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com>

>Does anyone know if github can automatically keep in sync with ANY
>external repository? For example, suppose instead of CVS or SVN we
>actually ran a git server on biopython.org, would github be able to
>track it automatically? I really don't like the idea of relying on an
>external host - if github could mirror a repository on biopython.org
>that would seem much safer.

I guess it's doable if we are allowed to set up cron jobs at open-bio. If we had a git branch at open-bio.org server, we could use git over ssh to push to the main branch and then set up a cron job which would push the main branch from open-bio to github, so that people can branch from it.
The same thing is of course doable as well with bzr+launchpad.

>> I see biopython-test hasn't been set up this way yet. Should I try setting
>> up a continuous mirror like this (under a name like biopython-cvs-test)? Or,
>> would Bartek or Giovanni prefer to?
>
> Given Bartek is one of the official Biopython developers, it might
> make more sense for him to try and setup a biopython-cvs-test tracker
> in launchpad if people want to try this. He may have done this
> already, as he seems to have several sub projects... I'm not sure and
> right now launchpad is being very slow (which does not impress me).
> See http://bazaar.launchpad.net/~bartek/biopython-test/trunk/files and links
>

I've requested launchpad to follow our cvs trunk. They should (after reviewing my request) put it into the location:
https://code.edge.launchpad.net/~vcs-imports/biopython-test/trunk
I'll post to the list if they get back to me. We'll see how it goes.

>>> Does this [github] work without Adobe flash? I don't have this on my Linux
>>> machine at home, and while I do have gnash it doesn't work on that
>>> many sites.

Github in itself does not depend on flash. In fact I don't think you need a browser at all to use it. Network visualization of your branch and its "relatives" is flash based, and thus not really accessible from some systems, but I don't think it's too important.

cheers
Bartek

From biopython at maubp.freeserve.co.uk Mon Feb 23 18:34:18 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 23 Feb 2009 18:34:18 +0000
Subject: [Biopython-dev] Quality scores (and per-letter-annotation) in a SeqRecord?
In-Reply-To: <200902231425.21877.jblanca@btc.upv.es>
References: <20090222212742.GA58314@kunkel> <320fb6e00902230242k2ff44a37h4c0a303c9847c8ca@mail.gmail.com> <200902231425.21877.jblanca@btc.upv.es>
Message-ID: <320fb6e00902231034q33fe4e6aofba4b238d67f020d@mail.gmail.com>

Peter wrote:
>> For the implementation, we could start with a simple dictionary and
>> see if any kind of safety feature should be added later if it seems
>> necessary. What I had in mind was a dict subclass which takes the
>> sequence length, and by overriding the __setitem__ method checks only
>> python sequences (objects with __len__ and __getitem__) of the
>> appropriate length can be added.

On Mon, Feb 23, 2009 at 1:25 PM, Jose Blanca wrote:
> I'm not sure how to implement that.

This is what I had in mind, though I haven't properly tested it yet:

    class RestrictedDict(dict):
        """A dictionary which only allows sequences of given length as values."""
        def __init__(self, length):
            """Create an EMPTY dictionary."""
            dict.__init__(self)
            self._length = int(length)
        def __setitem__(self, key, value):
            if not hasattr(value, "__len__") or not hasattr(value, "__getitem__") \
            or len(value) != self._length:
                raise TypeError("We only allow python sequences (lists, tuples or strings) of length %i." % self._length)
            dict.__setitem__(self, key, value)

    x = RestrictedDict(4)
    x["test"] = "abcd"
    x["test"] = ["a", "b", 5, None]
    x["test"] = (1, 2, 3, 4)
    try:
        x["test"] = "abcde"  # wrong length
        assert False
    except TypeError:
        pass
    try:
        x["test"] = 10  # not a sequence
        assert False
    except TypeError:
        pass

> What would you think about creating a new
> class based on dict but with an extra property, parent? parent would be a
> reference to the SeqRecord. This new class would check the length of its
> parent before adding the letter_annotation. I'm just asking because I'm
> curious about the best way to implement it.
This could work, and would also mean the length of the sequence would get updated if the parent SeqRecord's seq property was changed. On the other hand, this kind of thing could cause trouble for automatic garbage collection (because of the circular references between the objects). This may not be a real problem, but it's something I would worry about.

Peter

From jblanca at btc.upv.es Tue Feb 24 10:24:07 2009
From: jblanca at btc.upv.es (Jose Blanca)
Date: Tue, 24 Feb 2009 11:24:07 +0100
Subject: [Biopython-dev] Quality scores (and per-letter-annotation) in a SeqRecord?
In-Reply-To: <320fb6e00902231034q33fe4e6aofba4b238d67f020d@mail.gmail.com>
References: <20090222212742.GA58314@kunkel> <200902231425.21877.jblanca@btc.upv.es> <320fb6e00902231034q33fe4e6aofba4b238d67f020d@mail.gmail.com>
Message-ID: <200902241124.07974.jblanca@btc.upv.es>

On Monday 23 February 2009 19:34:18 Peter wrote:
> class RestrictedDict(dict):
>     """A dictionary which only allows sequences of given length as values."""
>     def __init__(self, length) :
>         """Create an EMPTY dictionary."""
>         dict.__init__(self)
>         self._length = int(length)
>     def __setitem__(self, key, value) :
>         if not hasattr(value,"__len__") or not hasattr(value,"__getitem__") \
>         or len(value) != self._length :
>             raise TypeError("We only allow python sequences (lists, tuples or strings) of length %i." % self._length)
>         dict.__setitem__(self, key, value)

An alternative implementation using weakref to link the RestrictedDict with the SeqRecord.

    import weakref

    class RestrictedDict(dict):
        """A dictionary which only allows sequences of the same length as the parent as values."""
        def __init__(self, parent):
            """Create an empty dictionary."""
            dict.__init__(self)
            # Weak reference, so the dict does not keep its parent alive.
            self._parent = weakref.ref(parent)
        def __setitem__(self, key, value):
            attrs = dir(value)
            if not "__len__" in attrs or not "__getitem__" in attrs:
                raise TypeError("We only allow python sequences (lists, tuples or strings)")
            if len(value) != len(self._parent()):
                raise TypeError('Lengths do not match.')
            dict.__setitem__(self, key, value)

And in the SeqRecord __init__ we should add:

    #letter_annotations
    self.letter_annotations = RestrictedDict(self)

--
Jose M. Blanca Postigo
Instituto Universitario de Conservacion y Mejora de la Agrodiversidad Valenciana (COMAV)
Universidad Politecnica de Valencia (UPV)
Edificio CPI (Ciudad Politecnica de la Innovacion), 8E
46022 Valencia (SPAIN)
Tlf.:+34-96-3877000 (ext 88473)

From dalloliogm at gmail.com Tue Feb 24 11:54:53 2009
From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio)
Date: Tue, 24 Feb 2009 12:54:53 +0100
Subject: [Biopython-dev] biopython on github
In-Reply-To: <320fb6e00902230731h6257376sb2d6772f72b6e03a@mail.gmail.com>
References: <5aa3b3570902150729g367022a5p334b2c33f86461f@mail.gmail.com> <5aa3b3570902230531k6a0da3e0rdec28079971f1193@mail.gmail.com> <320fb6e00902230731h6257376sb2d6772f72b6e03a@mail.gmail.com>
Message-ID: <5aa3b3570902240354j25ef5007g9ae750d70ed00993@mail.gmail.com>

On Mon, Feb 23, 2009 at 4:31 PM, Peter Cock wrote:
> On Mon, Feb 23, 2009 at 1:31 PM, Giovanni Marco Dall'Olio
> wrote:
>> On Sun, Feb 15, 2009 at 4:29 PM, Giovanni Marco Dall'Olio
>> wrote:
>>> Hi,
>>> I have uploaded a git-converted branch of biopython on github, in case
>>> you want to try it and see how it works.
>>>
>>> You can find it here:
>>> - http://github.com/biopython/biopython/
>>
>> Hi people,
>> so, I am still testing biopython on git.
> > I should have said something two weeks ago, but I didn't actually > realize you weren't just doing this with a branch under your own name. > > I think it is very misleading that you have created a git user called > "biopython" and a branch called "biopython" with a description of > "official biopython migration from cvs". I can see the value of > having the official CVS server mirrored on github, but the way you > have done this suggests this is an official project approved of by the > biopython.org developers. Do not worry too much about that.. I also hadn't had too much time to refine it. I was going to send you the credentials of the biopython user, or to anyone wishing to have them, but I wanted to test the cvs update first. In any case, the term 'official' was just meant to indicate that all the other branches should be derived from that, as there are other biopython derivates on github already. > What's more, if you get bored, I presume > this branch on git hub won't get updated anymore and will just sit > there - orphaned and out of date! That is a matter of setting a cron job somewhere to automatically update the branch. However, I don't know if github can mirror a cvs repository, maybe not. But I just wanted to show you how a decentralized versioning system works and how it can be used in a more 'centralized' way, with an official repository - since this is what you were asking earlier. >> The function to convert a cvs repository to git works well: I have >> just updated the branch on github to the latest cvs commit in >> open-bio, and it has correctly imported all the new commits without >> mixing them with the old ones. > > That sounds nice. > >> Now, if you look at http://github.com/biopython/biopython/network , >> you can see the results from all these experiments: the black line >> represent the code imported from cvs, and the other ones are >> experiments (well, don't care about the red one). > > Does this work without Adobe flash? 
I don't have this on my Linux > machine at home, and while I do have gnash it doesn't work on that > many sites. It seems not to work with gnash... however, you can still see how many derived branches there are, which in launchpad is handled in a different way (since you were asking for the differences, again). > > Peter > -- My blog on bioinformatics (now in English): http://bioinfoblog.it From bugzilla-daemon at portal.open-bio.org Tue Feb 24 12:04:40 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 24 Feb 2009 07:04:40 -0500 Subject: [Biopython-dev] [Bug 2771] Bio.Entrez.read can't parse XML files from dbSNP (snp database) In-Reply-To: Message-ID: <200902241204.n1OC4erT008537@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2771 ------- Comment #4 from dalloliogm at gmail.com 2009-02-24 07:04 EST ------- (In reply to comment #3) > (In reply to comment #2) > > This may be an NCBI bug, ... > > According to this page there is/was a problem with the XML files returned for > the snp database by efetch, > http://eutils.ncbi.nlm.nih.gov/entrez/query/static/esoap_help.html > > >> Known issues > >> * ... > >> * eFetch utility generates an invalid XML for SNP, so currently it doesn't > >> work through SOAP. The bug is being fixed. > >> * ... > > Unfortunately I have no idea if that information is current or not. This could > be unrelated. Yeah, unfortunately the XML still seems to be invalid. I have tried pasting an XML result from Bio.Entrez into several XML validators, and they all detect errors. I have also tried a python module for interrogating SOAP services (suds), and it also returns errors. > -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
From biopython at maubp.freeserve.co.uk Tue Feb 24 12:46:34 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 24 Feb 2009 12:46:34 +0000 Subject: [Biopython-dev] Quality scores (and per-letter-annotation) in a SeqRecord? In-Reply-To: <200902241124.07974.jblanca@btc.upv.es> References: <20090222212742.GA58314@kunkel> <200902231425.21877.jblanca@btc.upv.es> <320fb6e00902231034q33fe4e6aofba4b238d67f020d@mail.gmail.com> <200902241124.07974.jblanca@btc.upv.es> Message-ID: <320fb6e00902240446j1b8c1ceerfb53cb6871479324@mail.gmail.com> On Tue, Feb 24, 2009 at 10:24 AM, Jose Blanca wrote: > > An alternative implementation using weakref to link the RestrictedDict > with the SeqRecord. > > ... > Your code seems a little more complicated, but should work too. It would mean that if the parent SeqRecord's seq property was altered, the per-letter-annotation dictionary would know the new length. This is better - but if someone did change the parent SeqRecord's seq, then perhaps we should also automatically clear the per-letter-annotation? We could do this by using a full property for the seq attribute, which would also allow us to clear any existing per-letter-annotation by replacing it with a new restricted dictionary using the new length.
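
As a rough sketch of the seq property idea (this is not actual Biopython code: SketchRecord and _RestrictedDict are made-up names used only for illustration, and silently clearing the annotation is just one possible behaviour), assigning to seq could replace the dictionary like this:

```python
class _RestrictedDict(dict):
    """Dictionary whose values must be sequences of one fixed length."""

    def __init__(self, length):
        dict.__init__(self)
        self._length = int(length)

    def __setitem__(self, key, value):
        if not hasattr(value, "__len__") or not hasattr(value, "__getitem__") \
                or len(value) != self._length:
            raise TypeError("We only allow python sequences (lists, tuples "
                            "or strings) of length %i." % self._length)
        dict.__setitem__(self, key, value)


class SketchRecord(object):
    """Hypothetical stand-in for SeqRecord, only to show the seq property."""

    def __init__(self, seq):
        self._seq = seq
        self.letter_annotations = _RestrictedDict(len(seq))

    def _get_seq(self):
        return self._seq

    def _set_seq(self, value):
        # A new sequence invalidates the old per-letter annotation, so start
        # again with an empty dictionary sized for the new sequence.
        self._seq = value
        self.letter_annotations = _RestrictedDict(len(value))

    seq = property(_get_seq, _set_seq)


record = SketchRecord("ACGT")
record.letter_annotations["quality"] = [40, 30, 20, 10]
record.seq = "ACGTAC"  # the old "quality" entry is dropped here
```

After the assignment the dictionary is empty again and only accepts sequences of length six.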
Peter From p.j.a.cock at googlemail.com Tue Feb 24 12:59:45 2009 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 24 Feb 2009 12:59:45 +0000 Subject: [Biopython-dev] biopython on github In-Reply-To: <5aa3b3570902240354j25ef5007g9ae750d70ed00993@mail.gmail.com> References: <5aa3b3570902150729g367022a5p334b2c33f86461f@mail.gmail.com> <5aa3b3570902230531k6a0da3e0rdec28079971f1193@mail.gmail.com> <320fb6e00902230731h6257376sb2d6772f72b6e03a@mail.gmail.com> <5aa3b3570902240354j25ef5007g9ae750d70ed00993@mail.gmail.com> Message-ID: <320fb6e00902240459i58ae1ad7w761c079a86fa389@mail.gmail.com> On Tue, Feb 24, 2009 at 11:54 AM, Giovanni Marco Dall'Olio wrote: > On Mon, Feb 23, 2009 at 4:31 PM, Peter Cock wrote: >> >> I think it is very misleading that you have created a git user called >> "biopython" and a branch called "biopython" with a description of >> "official biopython migration from cvs". I can see the value of >> having the official CVS server mirrored on github, but the way you >> have done this suggests this is an official project approved of by the >> biopython.org developers. > > Do not worry too much about that.. I also hadn't had too much time to > refine it. I was going to send you the credentials of the biopython user, > or to anyone wishing to have them, but I wanted to test the cvs update > first. > In any case, the term 'official' was just meant to indicate that all > the other branches should be derived from that, as there are other > biopython derivates on github already. The new description of "mirror of official biopython cvs on github" is much better - thanks. I would go further and call it "Unofficial test github mirror of Biopython CVS". If we do decide to use github (even just as a mirror to our own hosted repository), then yes giving the current Biopython admins control of the github "biopython" user would be a good idea.
>> What's more, if you get bored, I presume >> this branch on git hub won't get updated anymore and will just sit >> there - orphaned and out of date! > > That is a matter of setting a cron job somewhere to automatically > update the branch. In the short term (as this is just an experiment for now), testing a daily cron job on your machine would be a good idea. In the long term (assuming we want an "official" github mirror), then doing it from the biopython.org repository server would be better. In theory this could be hooked into our main repository to push any trunk branch commits to github immediately. It would be much nicer if github could track an external repository on its own (like Bartek is hoping to get set up with Launchpad). Peter From lpritc at scri.ac.uk Tue Feb 24 13:04:09 2009 From: lpritc at scri.ac.uk (Leighton Pritchard) Date: Tue, 24 Feb 2009 13:04:09 +0000 Subject: [Biopython-dev] Quality scores (and per-letter-annotation) in a SeqRecord? In-Reply-To: <320fb6e00902240446j1b8c1ceerfb53cb6871479324@mail.gmail.com> Message-ID: On 24/02/2009 12:46, "Peter" wrote: > On Tue, Feb 24, 2009 at 10:24 AM, Jose Blanca wrote: >> >> An alternative implementation using weakref to link the RestrictedDict >> with the SeqRecord. >> >> ... >> > > Your code seems a little more complicated, but should work too. It > would mean that if the parent SeqRecord's seq property was altered, > the per-letter-annotation dictionary would know the new length. This > is better - but if someone did change the parent SeqRecord's seq, then > perhaps we should also automatically clear the per-letter-annotation? > We could do this by using a full property for the seq attribute, which > would also allow us to clear any existing per-letter-annotation by replacing > it with a new restricted dictionary using the new length. I can think of two particular incompatible situations here: 1) I change the parent SeqRecord sequence, by slicing it to a region I'm interested in.
I want to keep the per-symbol-annotation, but adjusted to the new sequence. 2) I change the parent SeqRecord sequence by adding some more symbols to it. I've just destroyed the association between the per-symbol-annotation and my sequence without even realising it. I'd prefer a warning that this is going to happen before it destroys my earlier work, so I can make the change in a duplicate SeqRecord object. I think it's worth considering which behaviours we would find desirable, and how to handle others. We'll all have different use cases, I imagine... L. -- Dr Leighton Pritchard MRSC D131, Plant Pathology Programme, SCRI Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA e:lpritc at scri.ac.uk w:http://www.scri.ac.uk/staff/leightonpritchard gpg/pgp: 0xFEFC205C tel:+44(0)1382 562731 x2405 From bugzilla-daemon at portal.open-bio.org Tue Feb 24 13:29:31 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 24 Feb 2009 08:29:31 -0500 Subject: [Biopython-dev] [Bug 2768] Bio.Entrez under a proxy In-Reply-To: Message-ID: <200902241329.n1ODTVlX016168@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2768 biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-24 08:29 EST ------- Fixed in Tutorial.tex CVS revision 1.201, see: http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Doc/Tutorial.tex?cvsroot=biopython From biopython at maubp.freeserve.co.uk Tue Feb 24 14:08:17 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 24 Feb 2009 14:08:17 +0000 Subject: [Biopython-dev] Quality scores (and per-letter-annotation) in a SeqRecord? In-Reply-To: References: <320fb6e00902240446j1b8c1ceerfb53cb6871479324@mail.gmail.com> Message-ID: <320fb6e00902240608r50191274m7dc9a996f13964d9@mail.gmail.com> On Tue, Feb 24, 2009 at 1:04 PM, Leighton Pritchard wrote: >> Your code seems a little more complicated, but should work too. It >> would mean that if the parent SeqRecord's seq property was altered, >> the per-letter-annotation dictionary would know the new length.
This >> is better - but if someone did change the parent SeqRecord's seq, then >> perhaps we should also automatically clear the per-letter-annotation? >> We could do this by using a full property for the seq attribute, which >> would also allow us to clear any existing per-letter-annotation by replacing >> it with a new restricted dictionary using the new length. > > I can think of two particular incompatible situations here: > > 1) I change the parent SeqRecord sequence, by slicing it to a region I'm > interested in. I want to keep the per-symbol-annotation, but adjusted to > the new sequence. If you did this by my_record[50:100] (assuming we implement the __getitem__ method, see Bug 2507), then my_record isn't changed - you'd get a new SeqRecord back for the partial sequence, with the appropriate per-symbol-annotation (by which I mean each per-symbol-annotation sequence would have been sliced using [50:100] to match, and a new dictionary created to hold these sub-sequences of the per-symbol-annotation). I'll try and upload a SeqRecord patch that does this shortly... > 2) I change the parent SeqRecord sequence by adding some more symbols to it. > I've just destroyed the association between the per-symbol-annotation and my > sequence without even realising it. I'd prefer a warning that this is going > to happen before it destroys my earlier work, so I can make the change in a > duplicate SeqRecord object. This situation could be caught by a set method for the SeqRecord seq property (not implemented yet). I was thinking this would silently throw away the old per-symbol-annotation, but this could instead raise an error (and make no changes), or issue a warning (but carry on). Good point. Peter From jblanca at btc.upv.es Tue Feb 24 14:26:02 2009 From: jblanca at btc.upv.es (Jose Blanca) Date: Tue, 24 Feb 2009 15:26:02 +0100 Subject: [Biopython-dev] Quality scores (and per-letter-annotation) in a SeqRecord?
In-Reply-To: <320fb6e00902240608r50191274m7dc9a996f13964d9@mail.gmail.com> References: <320fb6e00902240446j1b8c1ceerfb53cb6871479324@mail.gmail.com> <320fb6e00902240608r50191274m7dc9a996f13964d9@mail.gmail.com> Message-ID: <200902241526.02209.jblanca@btc.upv.es> > Your code seems a little more complicated, but should work too. It > would mean that if the parent SeqRecord's seq property was altered, > the per-letter-annotation dictionary would know the new length. I did it that way to allow the creation of an empty SeqRecord and to modify the seq property after the creation. I don't know if that's a behaviour supported by biopython, but it can be done now. Your proposed seq property implementation could take care of that, removing the possibility of setting seq after the creation. > > 2) I change the parent SeqRecord sequence by adding some more symbols to > > it. I've just destroyed the association between the per-symbol-annotation > > and my sequence without even realising it. I'd prefer a warning that > > this is going to happen before it destroys my earlier work, so I can make > > the change in a duplicate SeqRecord object. > > This situation could be caught by a set method for the SeqRecord seq > property (not implemented yet). I was thinking this would silently > throw away the old per-symbol-annotation, but this could instead raise > an error (and make no changes), or issue a warning (but carry on). > Good point. I would also prefer to raise an error in that case, because the user wouldn't be aware of the problem if the per-symbol-annotation is thrown away without any warning. Regards, -- Jose M.
Blanca Postigo Instituto Universitario de Conservacion y Mejora de la Agrodiversidad Valenciana (COMAV) Universidad Politecnica de Valencia (UPV) Edificio CPI (Ciudad Politecnica de la Innovacion), 8E 46022 Valencia (SPAIN) Tlf.:+34-96-3877000 (ext 88473) From bsouthey at gmail.com Tue Feb 24 14:31:44 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Tue, 24 Feb 2009 08:31:44 -0600 Subject: [Biopython-dev] FYI: Scipy and DVCS Message-ID: <49A404D0.3050106@gmail.com> Hi, In connection with our discussions, there is a long thread (already about 100 entries) started by a post by Stéfan van der Walt titled 'The future of SciPy and its development infrastructure' : http://thread.gmane.org/gmane.comp.python.scientific.devel/10065 " I'd like to propose two changes to the status quo: 1. Change to a distributed revision control system, encouraging more open collaboration. 2. Determine guidelines for code acceptance, in terms of unit tests, documentation and peer review. " No real conclusions but there is concern about having a suitable bug tracker system as well. Regards Bruce From biopython at maubp.freeserve.co.uk Tue Feb 24 17:22:01 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 24 Feb 2009 17:22:01 +0000 Subject: [Biopython-dev] Converting between PHRED and Solexa quality scores (and FASTQ files) Message-ID: <320fb6e00902240922x1cf77a7amf387432d7f79e51b@mail.gmail.com> Hopefully this information will be of general interest - I could have just stuck it on the end of Bug 2767 but thought it more suited to the mailing list (or even a blog post?).
http://bugzilla.open-bio.org/show_bug.cgi?id=2767

Nice links on mapping between Solexa and PHRED scores,
http://maq.sourceforge.net/qual.shtml
http://maq.sourceforge.net/fastq.shtml (missing some brackets in the final formula at the time of writing, I've emailed them)

and:
http://illumina.ucr.edu/ht/documentation/file-formats
http://rcdev.umassmed.edu/pipeline/Alignment%20Scoring%20Guide%20and%20FAQ.html (note they are missing a minus sign in the definition of Q_solexa)

For good quality reads the two scores are almost equal - but they differ for poor quality reads (PHRED scores go to zero, but Solexa scores can be negative).

A standard FASTQ file (as used by Sanger) encodes the quality information using PHRED scores, while Solexa/Illumina decided to use their own schema in the FASTQ variant.

In a PHRED style FASTQ file, PHRED quality = ord(letter) - 33
In a Solexa style FASTQ file, Solexa quality = ord(letter) - 64

>>> from math import log
>>> def phred_quality_from_fastq_letter(letter) :
...     return ord(letter) - 33
...
>>> def solexa_quality_from_fastq_letter(letter) :
...     return ord(letter) - 64
...

Both these scores are defined in terms of the estimated probability of an error (between 0 for a good read and 1 for a bad read). A probability of almost zero gives a high quality score, while a probability of almost one gives a very low quality score.

>>> def phred_quality_from_error(error) :
...     return -10*log(error,10)
...
>>> def solexa_quality_from_error(error) :
...     return -10*log(error/(1-error),10)
...
>>> solexa_quality_from_error(0.000000001)
89.999999995657035
>>> solexa_quality_from_error(0.999999999)
-90.000000118483911
>>> phred_quality_from_error(0.000000001)
89.999999999999986
>>> phred_quality_from_error(0.999999999)
4.3429446983771231e-09
>>> phred_quality_from_error(1)
-0.0

Using these relationships you can map between PHRED and Solexa quality scores, assuming their error estimation methods are equivalent,

>>> def solexa_quality_from_phred(phred_quality) :
...     return 10*log(10**(phred_quality/10.0) - 1, 10)
...
>>> solexa_quality_from_phred(90)
89.999999995657035
>>> solexa_quality_from_phred(50)
49.99995657033466
>>> solexa_quality_from_phred(10)
9.5424250943932485
>>> solexa_quality_from_phred(1)
-5.8682532438011537
>>> solexa_quality_from_phred(0.1)
-16.32774717238372

Or, the other way round,

>>> def phred_quality_from_solexa(solexa_quality) :
...     return 10*log(10**(solexa_quality/10.0) + 1, 10)
...
>>> phred_quality_from_solexa(90)
90.000000004342922
>>> phred_quality_from_solexa(10)
10.41392685158225
>>> phred_quality_from_solexa(0)
3.0102999566398116
>>> phred_quality_from_solexa(-20)
0.043213737826425784

I think these python versions agree with the perl examples on http://maq.sourceforge.net/qual.shtml (doing a base ten logarithm seems much easier in python than in perl).

Combining this with the letter mapping used in the Solexa FASTQ files, ord(letter)-64, we have:

>>> def phred_quality_from_solexa_fastq_letter(letter) :
...     return 10*log(10**((ord(letter)-64)/10.0) + 1, 10)
...

This seems to agree with the perl example on http://maq.sourceforge.net/fastq.shtml (allowing for the missing brackets which I've emailed them about).

So, in conclusion:

>>> phred_quality_from_fastq_letter("!")
0
>>> phred_quality_from_fastq_letter("{")
90
>>> solexa_quality_from_fastq_letter("!")
-31
>>> solexa_quality_from_fastq_letter("{")
59
>>> phred_quality_from_solexa_fastq_letter("!")
0.0034483543102526788
>>> phred_quality_from_solexa_fastq_letter("{")
59.000005467440147

It's very tricky to guess which FASTQ variant you have from the data itself (but from the range of characters, some examples can only be Solexa style).

If we know we have a standard FASTQ file we can trivially get the PHRED scores. If we have a Solexa encoded FASTQ file, we can trivially get the Solexa scores. With this log mapping we *could* also do an implicit conversion of Solexa scores into PHRED scores, but due to floating point issues this is a little lossy.
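
To make the "lossy" remark concrete, here is a small sketch of a round trip. It assumes (my assumption, not something stated above) that scores get rounded to whole numbers at each step, as they would have to be to fit into a single FASTQ character; the range of scores tried is arbitrary.

```python
from math import log


def solexa_quality_from_phred(phred_quality):
    """Map a PHRED score to the equivalent Solexa score."""
    return 10 * log(10 ** (phred_quality / 10.0) - 1, 10)


def phred_quality_from_solexa(solexa_quality):
    """Map a Solexa score to the equivalent PHRED score."""
    return 10 * log(10 ** (solexa_quality / 10.0) + 1, 10)


# Round trip: Solexa -> PHRED (rounded) -> Solexa (rounded again).
# Collect the Solexa scores that do not come back unchanged.
changed = []
for solexa in range(-5, 41):
    phred = int(round(phred_quality_from_solexa(solexa)))
    back = int(round(solexa_quality_from_phred(phred)))
    if back != solexa:
        changed.append(solexa)

print("Solexa scores altered by the round trip: %r" % changed)
```

Under that rounding assumption the high scores come back unchanged, while some of the low ones (where the two scales diverge) do not.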
I would say follow python conventions and go with making things explicit, and not do this automatically when parsing. We could do this automatically if the user explicitly asks Bio.SeqIO to write out a "fastq-solexa" format file and their SeqRecords don't have Solexa qualities but do have PHRED qualities (or vice versa). Peter From chapmanb at 50mail.com Tue Feb 24 23:11:14 2009 From: chapmanb at 50mail.com (Brad Chapman) Date: Tue, 24 Feb 2009 18:11:14 -0500 Subject: [Biopython-dev] Converting between PHRED and Solexa quality scores (and FASTQ files) In-Reply-To: <320fb6e00902240922x1cf77a7amf387432d7f79e51b@mail.gmail.com> References: <320fb6e00902240922x1cf77a7amf387432d7f79e51b@mail.gmail.com> Message-ID: <20090224231114.GB39545@sobchak.mgh.harvard.edu> Peter; This is a great summary. I think these things belong on the wiki on the documentation page once the functionality is rolled into Biopython; it's a shame to see useful documentation hidden on the dev mailing list. Agreed 100% with no auto conversion. Providing the functionality to convert is plenty, and I think it would be more confusing to start seeing one type of scores when you expected another. Also, given the size of these data sets we want to be as lightweight as possible. Brad > Hopefully this information will be of general interest - I could have > just stuck it on the end of Bug 2767 but thought it more suited to the > mailing list (or even a blog post?). 
> > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From bartek at rezolwenta.eu.org Wed Feb 25 09:40:49 2009 From: bartek at rezolwenta.eu.org (Bartek Wilczynski) Date: Wed, 25 Feb 2009 10:40:49 +0100 Subject: [Biopython-dev] biopython on github In-Reply-To: <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com> References: <5aa3b3570902150729g367022a5p334b2c33f86461f@mail.gmail.com> <5aa3b3570902230531k6a0da3e0rdec28079971f1193@mail.gmail.com> <320fb6e00902230731h6257376sb2d6772f72b6e03a@mail.gmail.com> <3f6baf360902230843u320e9fe9wc0a03928383d6cbb@mail.gmail.com> <320fb6e00902230908j38f5755la85a55bfc461a763@mail.gmail.com> <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com> Message-ID: <8b34ec180902250140k4fb1bef0y913b97db0e309e4b@mail.gmail.com> On Mon, Feb 23, 2009 at 7:29 PM, Bartek Wilczynski wrote: > > I've requested launchpad to follow our cvs trunk. They should (after > reviewing my request) put it into the location: > https://code.edge.launchpad.net/~vcs-imports/biopython-test/trunk > I'll post to the list if they get back to me. We'll see how it goes. > There is a small technical problem with bazaar following our repo. For now, their scripts are working only with cvs pserver connections without password. It shouldn't be too difficult for them to adjust (anyway the setup of each import is not fully automated), but just in case it's not possible for now: Can we set up a user with no password and read-only access to our cvs repo? Who would be the right person to contact? cheers Bartek From dalloliogm at gmail.com Wed Feb 25 10:02:37 2009 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Wed, 25 Feb 2009 11:02:37 +0100 Subject: [Biopython-dev] Quality scores (and per-letter-annotation) in a SeqRecord? 
In-Reply-To: <320fb6e00902230624j65d90b63tb9b5c1063d03c923@mail.gmail.com> References: <200902201249.36743.jblanca@btc.upv.es> <20090220231904.GE18294@sobchak.mgh.harvard.edu> <320fb6e00902211050r7a57bceap9ba216924785b9b0@mail.gmail.com> <5aa3b3570902230550v12e505eeje3dcf38d9bed8d2b@mail.gmail.com> <320fb6e00902230624j65d90b63tb9b5c1063d03c923@mail.gmail.com> Message-ID: <5aa3b3570902250202k6ad4779duea2c051ad6a8fd3c@mail.gmail.com> On Mon, Feb 23, 2009 at 3:24 PM, Peter wrote: > On Mon, Feb 23, 2009 at 1:50 PM, Giovanni Marco Dall'Olio > wrote: >> >> I suggest you to use github or any distribuited source versioning >> system to test the changes you are describing in this discussion. >> >> ... >> >> I think that it is easier to discuss over this if you can show how the >> code would look like instead of only describing it. > > Or we can stick with the old fashioned approach of uploading patches > to bugzilla. This proposal only requires additions to > Bio/SeqRecord.py to define the new property, and won't change much > existing code at all. Of course you can stick with bugzilla, but let me explain why I think using a drcs would be better :-). Basically, you should consider that with a drcs you can create forks very frequently, even for three or four commits, and when you have finished you merge the changes back and nobody will ever know that there was a fork. If you want to change an attribute to SeqRecord, this doesn't imply a single commit: you have to test various solutions, provide tests for each of them, see which one is the most comfortable, and only then push it into the official release. Basically, what you do now is similar to what you would do with a drcs: each one of you will probably have a modified copy of biopython on his computer, and when he has finished he will create a patch or commit to the cvs system.
However, the problem is that these local copies are on local computers, and for other people it is very difficult to evaluate them and to give good feedback. Moreover, these copies can become out of synchronization with the official branch. You can post some code snippets via mail, but you probably won't post the tests and many other things. If you create an experimental branch to test the new attribute to SeqRecord, along with its tests and all the separated commits for every change, and post it on a publicly accessible web site, then it will be possible to discuss a lot more over the changes, and I think this could improve biopython's development process. > > I can see there are benefits to using a distributed source version > system for more complicated patches touching lots of files, but it > isn't needed here and (if you don't have git installed) using github > might actually make it harder for people to try the code on their > local machine. > > Peter > -- My blog on bioinformatics (now in English): http://bioinfoblog.it From biopython at maubp.freeserve.co.uk Wed Feb 25 10:10:19 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 25 Feb 2009 10:10:19 +0000 Subject: [Biopython-dev] biopython on github In-Reply-To: <8b34ec180902250140k4fb1bef0y913b97db0e309e4b@mail.gmail.com> References: <5aa3b3570902150729g367022a5p334b2c33f86461f@mail.gmail.com> <5aa3b3570902230531k6a0da3e0rdec28079971f1193@mail.gmail.com> <320fb6e00902230731h6257376sb2d6772f72b6e03a@mail.gmail.com> <3f6baf360902230843u320e9fe9wc0a03928383d6cbb@mail.gmail.com> <320fb6e00902230908j38f5755la85a55bfc461a763@mail.gmail.com> <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com> <8b34ec180902250140k4fb1bef0y913b97db0e309e4b@mail.gmail.com> Message-ID: <320fb6e00902250210t2ad19536ke379e219ba6f7dae@mail.gmail.com> On Wed, Feb 25, 2009 at 9:40 AM, Bartek Wilczynski wrote: > On Mon, Feb 23, 2009 at 7:29 PM, Bartek Wilczynski > wrote: >> >> I've requested launchpad to follow
our cvs trunk. They should (after >> reviewing my request) put it into the location: >> https://code.edge.launchpad.net/~vcs-imports/biopython-test/trunk >> I'll post to the list if they get back to me. We'll see how it goes. > > There is a small technical problem with bazaar following our repo. For > now, their scripts are working only with cvs pserver connections without > password. It shouldn't be too difficult for them to adjust (anyway the > setup of each import is not fully automated), but just in case it's not > possible for now: Can we set up a user with no password and read-only > access to our cvs repo? Who would be the right person to contact? Right now as far as I know you need username "cvs", password "cvs" - or a full developer account. I guess another read only account could be setup (maybe "guest") with no password, assuming there are no security issues with this, but the OBF guys would have to do this. You could ask them on support at helpdesk.open-bio.org but given we probably won't continue with CVS that much longer anyway, it seems a bit pointless to hassle the OBF over this now - it might easier to just encourage Bazaar to deal with a password (as I'm sure lots of open source projects have a simple password like this). 
Peter From bartek at rezolwenta.eu.org Wed Feb 25 10:56:01 2009 From: bartek at rezolwenta.eu.org (Bartek Wilczynski) Date: Wed, 25 Feb 2009 11:56:01 +0100 Subject: [Biopython-dev] biopython on github In-Reply-To: <320fb6e00902250210t2ad19536ke379e219ba6f7dae@mail.gmail.com> References: <5aa3b3570902150729g367022a5p334b2c33f86461f@mail.gmail.com> <5aa3b3570902230531k6a0da3e0rdec28079971f1193@mail.gmail.com> <320fb6e00902230731h6257376sb2d6772f72b6e03a@mail.gmail.com> <3f6baf360902230843u320e9fe9wc0a03928383d6cbb@mail.gmail.com> <320fb6e00902230908j38f5755la85a55bfc461a763@mail.gmail.com> <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com> <8b34ec180902250140k4fb1bef0y913b97db0e309e4b@mail.gmail.com> <320fb6e00902250210t2ad19536ke379e219ba6f7dae@mail.gmail.com> Message-ID: <8b34ec180902250256k6f6f5c1bvbf85d8b68a315927@mail.gmail.com> On Wed, Feb 25, 2009 at 11:10 AM, Peter wrote: > > Right now as far as I know you need username "cvs", password "cvs" - > or a full developer account. I guess another read only account could > be setup (maybe "guest") with no password, assuming there are no > security issues with this, but the OBF guys would have to do this. > You could ask them on support at helpdesk.open-bio.org but given we > probably won't continue with CVS that much longer anyway, it seems a > bit pointless to hassle the OBF over this now - it might easier to > just encourage Bazaar to deal with a password (as I'm sure lots of > open source projects have a simple password like this). I've already contacted them about this, but it might take time for them to update their procedures to support passwords. In the meantime, I'll try to look into a crontab-based update procedure which wouldn't require anything on Launchpad's part.
cheers Bartek From bugzilla-daemon at portal.open-bio.org Wed Feb 25 15:42:25 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 25 Feb 2009 10:42:25 -0500 Subject: [Biopython-dev] [Bug 2507] Adding __getitem__ to SeqRecord for element access and slicing In-Reply-To: Message-ID: <200902251542.n1PFgP2Z029511@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2507 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #998 is|0 |1 obsolete| | ------- Comment #12 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-25 10:42 EST ------- Created an attachment (id=1249) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1249&action=view) Patch to SeqRecord.py and SeqFeature.py This updates the old patch (which no longer applied cleanly to CVS), and implements per-letter-annotation with a restricted dictionary as discussed on the mailing list. The precise name for the publicly exposed per-letter-annotation dictionary is still open to debate, here I have used letter_annotation - see the mailing list for more: http://lists.open-bio.org/pipermail/biopython-dev/2009-February/005340.html This includes a lengthy doctest on the SeqRecord __getitem__ method, but further additions to the unit tests would be wise. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Wed Feb 25 22:00:10 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 25 Feb 2009 22:00:10 +0000 Subject: [Biopython-dev] Quality scores (and per-letter-annotation) in a SeqRecord? 
In-Reply-To: <5aa3b3570902250202k6ad4779duea2c051ad6a8fd3c@mail.gmail.com> References: <200902201249.36743.jblanca@btc.upv.es> <20090220231904.GE18294@sobchak.mgh.harvard.edu> <320fb6e00902211050r7a57bceap9ba216924785b9b0@mail.gmail.com> <5aa3b3570902230550v12e505eeje3dcf38d9bed8d2b@mail.gmail.com> <320fb6e00902230624j65d90b63tb9b5c1063d03c923@mail.gmail.com> <5aa3b3570902250202k6ad4779duea2c051ad6a8fd3c@mail.gmail.com> Message-ID: <320fb6e00902251400r58f46df4ka54328b617781bd4@mail.gmail.com> On Wed, Feb 25, 2009 at 10:02 AM, Giovanni Marco Dall'Olio wrote: > > Of course you can stick with bugzilla, ... > I've put an updated patch on Bug 2507 which implements the per-letter-annotations as a restricted dictionary (as the letter_annotations property for now), and adds a __getitem__ method to the SeqRecord object which is aware of it. This changes both SeqRecord.py and SeqFeature.py (required for switching the co-ordinates on SeqFeature objects as part of a SeqRecord slice), and is against the current CVS code. http://bugzilla.open-bio.org/show_bug.cgi?id=2507 If any of you aren't familiar with using the command line tools diff and patch, here's what you would do to try this code. Get a copy of the latest Biopython code from CVS, change to the Bio directory, download the attachment and save it in that directory as attachment.patch (for example), and then run "patch < attachment.patch" to update the code.
Peter From bartek at rezolwenta.eu.org Thu Feb 26 13:26:15 2009 From: bartek at rezolwenta.eu.org (Bartek Wilczynski) Date: Thu, 26 Feb 2009 14:26:15 +0100 Subject: [Biopython-dev] biopython on github In-Reply-To: <8b34ec180902250256k6f6f5c1bvbf85d8b68a315927@mail.gmail.com> References: <5aa3b3570902150729g367022a5p334b2c33f86461f@mail.gmail.com> <5aa3b3570902230531k6a0da3e0rdec28079971f1193@mail.gmail.com> <320fb6e00902230731h6257376sb2d6772f72b6e03a@mail.gmail.com> <3f6baf360902230843u320e9fe9wc0a03928383d6cbb@mail.gmail.com> <320fb6e00902230908j38f5755la85a55bfc461a763@mail.gmail.com> <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com> <8b34ec180902250140k4fb1bef0y913b97db0e309e4b@mail.gmail.com> <320fb6e00902250210t2ad19536ke379e219ba6f7dae@mail.gmail.com> <8b34ec180902250256k6f6f5c1bvbf85d8b68a315927@mail.gmail.com> Message-ID: <8b34ec180902260526m3ff42f3x2a99a77d4d0fb928@mail.gmail.com> Hi all, I've been looking around for alternative ways of converting our current CVS repository to a new DVCS system (git or bzr). The launchpad team offers the possibility of automatic mirroring of a cvs repository in a bzr branch, but it would require a change in configuration on our side (they still haven't answered my request to support password protected repos). I was looking for other options, and it seems that there is a way to solve the problem of mirroring. There is a tool cvs2git (a part of cvs2svn package http://cvs2svn.tigris.org/cvs2git.html), which reads a cvs repository and outputs a dump which is readable by both git and bzr (using the fast-import command). The nice thing about it is that it's very fast (~3mins) for the whole biopython repo. I've set up a small script, which grabs the newest cvs repo from dev.open-bio.org and converts it to git and bzr branches which are then pushed to github and launchpad.
It currently runs as a crontab script on my machine and it could be transferred to open-bio.org if they would install bzr and git, but I'm fine with running it from my computer for a few months, especially if we plan to drop CVS support in the foreseeable future, which would make installing the script on the open-bio servers useless. You can see the branches here: http://github.com/barwil/biopython-test/tree/master https://code.launchpad.net/~bartek/biopython-test/trunk_updates The branches differ from the previous ones made by me and Giovanni, because they now include the whole repository (including biodata, html, website etc.). We might consider splitting these into different repos. Using this kind of setup, we allow anyone interested to easily fork our current repo and then even merge their changes into the newer version of the exported source. The only problem is that, as long as CVS is the main repository, it might be difficult to commit these changes back to CVS. Does anyone have a clever idea for an easy procedure to commit things back to CVS? Having a branch is not of much use if we cannot easily accept contributions.
All comments and/or ideas are welcome cheers Bartek From biopython at maubp.freeserve.co.uk Thu Feb 26 14:00:46 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 26 Feb 2009 14:00:46 +0000 Subject: [Biopython-dev] biopython on github In-Reply-To: <8b34ec180902260526m3ff42f3x2a99a77d4d0fb928@mail.gmail.com> References: <5aa3b3570902150729g367022a5p334b2c33f86461f@mail.gmail.com> <5aa3b3570902230531k6a0da3e0rdec28079971f1193@mail.gmail.com> <320fb6e00902230731h6257376sb2d6772f72b6e03a@mail.gmail.com> <3f6baf360902230843u320e9fe9wc0a03928383d6cbb@mail.gmail.com> <320fb6e00902230908j38f5755la85a55bfc461a763@mail.gmail.com> <8b34ec180902231029u7a9d003r533af7f078f4a8e2@mail.gmail.com> <8b34ec180902250140k4fb1bef0y913b97db0e309e4b@mail.gmail.com> <320fb6e00902250210t2ad19536ke379e219ba6f7dae@mail.gmail.com> <8b34ec180902250256k6f6f5c1bvbf85d8b68a315927@mail.gmail.com> <8b34ec180902260526m3ff42f3x2a99a77d4d0fb928@mail.gmail.com> Message-ID: <320fb6e00902260600p5fb90241td1ded497c08cb901@mail.gmail.com> On Thu, Feb 26, 2009 at 1:26 PM, Bartek Wilczynski wrote: > You can see the branches here: > http://github.com/barwil/biopython-test/tree/master > https://code.launchpad.net/~bartek/biopython-test/trunk_updates I would have just gone with the main Biopython "directory", ignoring the old website etc. > Using this kind of setup, we are allowing all interested to easily > fork our current repo and then even merge their changes into the newer > version of exported source. The only problem is, that as long as the > CVS is the main repository, it might be difficult to commit these > changes back to CVS. Does anyone have a clever idea for an easy > procedure to commit things back to CVS? Because having a branch is not > of much use if cannot easily accept contributions. Can't you produce a diff between the git mirror of CVS, and your modified branch - and then we can deal with the patch as usual via CVS? 
Another option to consider would be to switch to running git on biopython.org, but use the git-cvsserver tool to provide an emulated CVS server on top of the git repository. This sounds possible in theory, and would be nice for any "old fashioned" biopython developers because it should be fairly transparent - they can continue to treat it as CVS and just work on the main trunk. This would require someone competent to do the conversion and alter the server setup - we'd have to talk to the OBF team about this. However, if anyone has first hand experience of git-cvsserver perhaps they could comment on whether this sounds like a good plan or not. Peter From jblanca at btc.upv.es Thu Feb 26 15:12:54 2009 From: jblanca at btc.upv.es (Jose Blanca) Date: Thu, 26 Feb 2009 16:12:54 +0100 Subject: [Biopython-dev] library to create gel image Message-ID: <200902261612.54306.jblanca@btc.upv.es> Hi: I'm writing an application that reads ABIF files (Applied Biosystems files) and generates a gel image. I'm able to read the trace (chromatogram) data from the file and now I would like to plot it. I don't want to plot every trace as a 2d graphic like in: http://www.mun.ca/biology/scarr/ABI377_chromatogram.jpg But to create a 2D gel image using all traces like in: http://www.fieldmuseum.org/research_collections/pritzker_lab/pritzker/images/ecran.jpg Any suggestions on which Python library I could use? Of course, if anybody is interested in the code that I already have I'm willing to share it. Best regards, -- Jose M.
Blanca Postigo Instituto Universitario de Conservacion y Mejora de la Agrodiversidad Valenciana (COMAV) Universidad Politecnica de Valencia (UPV) Edificio CPI (Ciudad Politecnica de la Innovacion), 8E 46022 Valencia (SPAIN) Tlf.:+34-96-3877000 (ext 88473) From biopython at maubp.freeserve.co.uk Thu Feb 26 18:51:07 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 26 Feb 2009 18:51:07 +0000 Subject: [Biopython-dev] library to create gel image In-Reply-To: <200902261612.54306.jblanca@btc.upv.es> References: <200902261612.54306.jblanca@btc.upv.es> Message-ID: <320fb6e00902261051n62899098i86edd36ba00ee7d4@mail.gmail.com> On Thu, Feb 26, 2009 at 3:12 PM, Jose Blanca wrote: > Hi: > I'm writting an application that reads ABIF files (Applied Biosystems files) > and generates a gel image. I'm able to read the trace (chromatogram) data > from the file and now I would like to plot it. I don't want to plot every > trace as a 2d graphic like in: > http://www.mun.ca/biology/scarr/ABI377_chromatogram.jpg > But to create a 2D gel image using all traces like in: > http://www.fieldmuseum.org/research_collections/pritzker_lab/pritzker/images/ecran.jpg > Any suggestion on which python library could I use? Do you want to recreate the hexagonal grid, or would a simplified rectangular grid do? Do you need to be able to control the size, colour and intensity of the spots (in order to recreate the something close to the original). Do you get quality control information for nasty cases (e.g. non-circular dots, say a ring donut shape)? If you need this kind of fine control it would be a lot of work but you certainly could do this "by hand" using a number of python packages - for example ReportLab would let you generate PDF, PS, SVG or bitmap images from the same drawing object. Other backends might be equally suitable. > Of course, if anybody is interested in the code that I already got I'm willing > to share it. 
> Best regards, The code for reading the trace (chromatogram) data from ABIF files (Applied Biosystems files) might make a nice a addition to the Bio.Sequencing module. Peter From jblanca at btc.upv.es Fri Feb 27 09:05:28 2009 From: jblanca at btc.upv.es (Jose Blanca) Date: Fri, 27 Feb 2009 10:05:28 +0100 Subject: [Biopython-dev] library to create gel image In-Reply-To: <320fb6e00902261400v22af3c52ob1a8cb80113f6756@mail.gmail.com> References: <200902261612.54306.jblanca@btc.upv.es> <1235675865.49a6ead98bc24@webmail.upv.es> <320fb6e00902261400v22af3c52ob1a8cb80113f6756@mail.gmail.com> Message-ID: <200902271005.28459.jblanca@btc.upv.es> > The example was a bit small - so I had guessed a bit, and it sounds > like my guess was wrong. Do you have a larger example picture? I want something like the Genographer software does: http://hordeum.oscs.montana.edu/genographer/help/tutorial/tutorial.html But I don't need an interactive GUI application, just the gel rendering. You can take a look at the code at: http://bioinf.comav.upv.es/svn/gelify/gelifyfsa/src/ Take into account that is just a work in progress. Suggestions are welcomed. Regards, -- Jose M. 
Blanca Postigo Instituto Universitario de Conservacion y Mejora de la Agrodiversidad Valenciana (COMAV) Universidad Politecnica de Valencia (UPV) Edificio CPI (Ciudad Politecnica de la Innovacion), 8E 46022 Valencia (SPAIN) Tlf.:+34-96-3877000 (ext 88473) From biopython at maubp.freeserve.co.uk Fri Feb 27 10:45:59 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 27 Feb 2009 10:45:59 +0000 Subject: [Biopython-dev] library to create gel image In-Reply-To: <200902271005.28459.jblanca@btc.upv.es> References: <200902261612.54306.jblanca@btc.upv.es> <1235675865.49a6ead98bc24@webmail.upv.es> <320fb6e00902261400v22af3c52ob1a8cb80113f6756@mail.gmail.com> <200902271005.28459.jblanca@btc.upv.es> Message-ID: <320fb6e00902270245q65c0b924obd5181576374134c@mail.gmail.com> On Fri, Feb 27, 2009 at 9:05 AM, Jose Blanca wrote: >> The example was a bit small - so I had guessed a bit, and it sounds >> like my guess was wrong. ?Do you have a larger example picture? > I want something like the Genographer software does: > http://hordeum.oscs.montana.edu/genographer/help/tutorial/tutorial.html > But I don't need an interactive GUI application, just the gel rendering. That's much clearer - is the Genographer software showing the actual image (zoomed as required, with the colours adjusted as required), or an artificial recreation? Are you trying to create this figure for illustrative purposes only? I mean would a slightly cartoon like recreation be fine, or are you trying to make it as realistic as possible? Either way, I doubt there will be any existing software for exactly this purpose - and you will have to create your own code to draw this. > You can take a look at the code at: > http://bioinf.comav.upv.es/svn/gelify/gelifyfsa/src/ > Take into account that is just a work in progress. I see you are having to reverse engineer their file format. I guess other people have tried this in the past so there may be more clues out on the internet. 
Have you tried emailing the company to see if they would publish the file format specifications (unlikely I fear, but worth asking). Peter From jblanca at btc.upv.es Fri Feb 27 10:57:49 2009 From: jblanca at btc.upv.es (Jose Blanca) Date: Fri, 27 Feb 2009 11:57:49 +0100 Subject: [Biopython-dev] library to create gel image In-Reply-To: <320fb6e00902270245q65c0b924obd5181576374134c@mail.gmail.com> References: <200902261612.54306.jblanca@btc.upv.es> <200902271005.28459.jblanca@btc.upv.es> <320fb6e00902270245q65c0b924obd5181576374134c@mail.gmail.com> Message-ID: <200902271157.49948.jblanca@btc.upv.es> On Friday 27 February 2009 11:45:59 Peter wrote: > On Fri, Feb 27, 2009 at 9:05 AM, Jose Blanca wrote: > That's much clearer - is the Genographer software showing the actual > image (zoomed as required, with the colours adjusted as required), or > an artificial recreation? It is an artificial recreation, the same as I'm trying to accomplish. I just want more resolution and an automated process (genographer is a GUI application). > Are you trying to create this figure for illustrative purposes only? > I mean would a slightly cartoon like recreation be fine, or are you > trying to make it as realistic as possible? I want to analyze it. > I see you are having to reverse engineer their file format. I guess > other people have tried this in the past so there may be more clues > out on the internet. Have you tried emailing the company to see if > they would publish the file format specifications (unlikely I fear, > but worth asking). Fortunately the ABIF format was reverse engineered by people more clever than me. And a couple of years ago Applied published a specification. http://bioinf.comav.upv.es/svn/gelify/gelifyfsa/src/doc/ABIF_File_Format.pdf You can't believe everything in that specification, but it is a good start. Reading an ABIF file is not a problem; drawing the gel with as little coding as possible is another thing.
Regards, Jose Blanca From biopython at maubp.freeserve.co.uk Fri Feb 27 11:13:45 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 27 Feb 2009 11:13:45 +0000 Subject: [Biopython-dev] Quality scores (and per-letter-annotation) in a SeqRecord? In-Reply-To: <1235175883.22598.62.camel@lafa> References: <200902201249.36743.jblanca@btc.upv.es> <20090220231904.GE18294@sobchak.mgh.harvard.edu> <1235175883.22598.62.camel@lafa> Message-ID: <320fb6e00902270313o3c860b4eweb56a0a1cdc87e80@mail.gmail.com> On Sat, Feb 21, 2009 at 12:24 AM, Iddo Friedberg wrote: > > Hi all, > > I am sort of living in this world right now, doing a lot of > metagenomics, so here are my $0.02. I agree with Leighton (assuming I > understand him): We should consider the possible applications people > will run using the quality data when designing the > > from what I have seen the most common use for quality scores is for > trimming the sequences, i.e. removing the lesser quality sequence data > (usually on the edges) from the 5' and 3' ends of the read. So any data > structure should take into consideration that we will probably have > a .trim(self,threshold) method or function trim(seq, threshold) that > will return a slice of the sequence. I'm not convinced the SeqRecord needs a trim method (and if it did, it would also need to take an argument saying which per-letter-annotation it should use, e.g. the PHRED qualities). But yes, this is an excellent example of where it would be very useful to have the SeqRecord support slicing which also slices the quality information (as recently discussed, with an implementation on Bug 2507). I've got a related example use-case, trimming primer sequences from the raw reads (and trimming the quality scores to match) before assembly. If the quality scores are recorded in a per-letter-annotation dictionary which is integrated into SeqRecord slicing, this becomes fairly straightforward. First read in the data (most simply from a FASTQ file).
You look at the SeqRecord's seq to determine where to cut the sequence, and then apply the slice to the SeqRecord - this will give you a new SeqRecord with the appropriate sub-sequence and the appropriate sub-list of the quality scores. You can then save this data, either as a FASTQ file, or paired FASTA and QUAL files. Peter From dalloliogm at gmail.com Fri Feb 27 11:50:03 2009 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Fri, 27 Feb 2009 12:50:03 +0100 Subject: [Biopython-dev] Quality scores (and per-letter-annotation) in a SeqRecord? In-Reply-To: <320fb6e00902251400r58f46df4ka54328b617781bd4@mail.gmail.com> References: <200902201249.36743.jblanca@btc.upv.es> <20090220231904.GE18294@sobchak.mgh.harvard.edu> <320fb6e00902211050r7a57bceap9ba216924785b9b0@mail.gmail.com> <5aa3b3570902230550v12e505eeje3dcf38d9bed8d2b@mail.gmail.com> <320fb6e00902230624j65d90b63tb9b5c1063d03c923@mail.gmail.com> <5aa3b3570902250202k6ad4779duea2c051ad6a8fd3c@mail.gmail.com> <320fb6e00902251400r58f46df4ka54328b617781bd4@mail.gmail.com> Message-ID: <5aa3b3570902270350j2a7d978bpc778e7f4f952e077@mail.gmail.com> On Wed, Feb 25, 2009 at 11:00 PM, Peter wrote: > On Wed, Feb 25, 2009 at 10:02 AM, Giovanni Marco Dall'Olio > wrote: >> >> Of course you can stick with bugzilla, ... >> > > I've put an updated patch on Bug 2507 which implements the > per-letter-annotations as a restricted dictionary (as the > letter_annotations property for now), and adds a __getitem__ method to > the SeqRecord object which is aware of it. Hi, I have applied your patch to my unofficial github branch. Here it is: - http://github.com/dalloliogm/biopython/commit/51383b0e91b46f66ca20b36707c3a21a3dcbf0fb People not wishing to use git can download the code anyway, by clicking on 'download' in this page: - http://github.com/dalloliogm/biopython/tree/qualityscores-experimental The right button to click is the 'download' near the 'watch' button. 
I know there is a second 'Downloads' page which creates confusion, but it doesn't have nothing to do with it. On the branches graph there is a bit of confusion now (my fault), but you can see that I have applied your patch over a recent version of biopython (there are some commits that I didn't include yet). p.s. on your patch (http://bugzilla.open-bio.org/attachment.cgi?id=1249), on the third change, you modify this in SeqRecord.__init__: 95c120 < self.seq = seq --- > self._seq = seq can it be an error? Why self.seq has been moved to self._seq? > This changes both > SeqRecord.py and SeqFeature.py (required for switching the > co-ordinates on SeqFeature objects as part of a SeqRecord slice), and > is against the current CVS code. > > http://bugzilla.open-bio.org/show_bug.cgi?id=2507 > > If any of you aren't familiar with using the command line tools diff > and patch, here's what you would do to try this code. ?Get a copy of > the latest Biopython code from CVS, change to the Bio directory, > download the attachment and save it in that directory as > attachment.patch (for example) then and run "patch < attachment.patch" > to update the code. > > Peter > -- My blog on bioinformatics (now in English): http://bioinfoblog.it From biopython at maubp.freeserve.co.uk Fri Feb 27 12:12:11 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 27 Feb 2009 12:12:11 +0000 Subject: [Biopython-dev] Quality scores (and per-letter-annotation) in a SeqRecord? 
In-Reply-To: <5aa3b3570902270350j2a7d978bpc778e7f4f952e077@mail.gmail.com> References: <200902201249.36743.jblanca@btc.upv.es> <20090220231904.GE18294@sobchak.mgh.harvard.edu> <320fb6e00902211050r7a57bceap9ba216924785b9b0@mail.gmail.com> <5aa3b3570902230550v12e505eeje3dcf38d9bed8d2b@mail.gmail.com> <320fb6e00902230624j65d90b63tb9b5c1063d03c923@mail.gmail.com> <5aa3b3570902250202k6ad4779duea2c051ad6a8fd3c@mail.gmail.com> <320fb6e00902251400r58f46df4ka54328b617781bd4@mail.gmail.com> <5aa3b3570902270350j2a7d978bpc778e7f4f952e077@mail.gmail.com> Message-ID: <320fb6e00902270412q6b0a9208m47320660f7d19c58@mail.gmail.com> > p.s. on your patch > (http://bugzilla.open-bio.org/attachment.cgi?id=1249), on the third > change, you modify this in SeqRecord.__init__: > > 95c120 > < self.seq = seq > --- >> self._seq = seq > > can it be an error? Why self.seq has been moved to self._seq? It is deliberate. Before the patch, the SeqRecord's .seq was a "naked" attribute. After the patch, the actual sequence is hidden in the private attribute ._seq and is publicly exposed using a property (also known as a "managed attribute") with a get and set method (and a doc string). The reason for doing this is that I want to have some code run whenever anyone tries to set the seq property to a new value (in order to prevent the seq and per-letter-annotation getting out of sync). Peter From dalloliogm at gmail.com Fri Feb 27 12:20:51 2009 From: dalloliogm at gmail.com (Giovanni Marco Dall'Olio) Date: Fri, 27 Feb 2009 13:20:51 +0100 Subject: [Biopython-dev] Quality scores (and per-letter-annotation) in a SeqRecord?
In-Reply-To: <320fb6e00902270412q6b0a9208m47320660f7d19c58@mail.gmail.com> References: <200902201249.36743.jblanca@btc.upv.es> <20090220231904.GE18294@sobchak.mgh.harvard.edu> <320fb6e00902211050r7a57bceap9ba216924785b9b0@mail.gmail.com> <5aa3b3570902230550v12e505eeje3dcf38d9bed8d2b@mail.gmail.com> <320fb6e00902230624j65d90b63tb9b5c1063d03c923@mail.gmail.com> <5aa3b3570902250202k6ad4779duea2c051ad6a8fd3c@mail.gmail.com> <320fb6e00902251400r58f46df4ka54328b617781bd4@mail.gmail.com> <5aa3b3570902270350j2a7d978bpc778e7f4f952e077@mail.gmail.com> <320fb6e00902270412q6b0a9208m47320660f7d19c58@mail.gmail.com> Message-ID: <5aa3b3570902270420j5b97932fo3f79bb4cd19566b0@mail.gmail.com> On Fri, Feb 27, 2009 at 1:12 PM, Peter wrote: >> p.s. on your patch >> (http://bugzilla.open-bio.org/attachment.cgi?id=1249), on the third >> change, you modify this in SeqRecord.__init__: >> >> 95c120 >> < ? ? ? ? self.seq = seq >> --- >>> ? ? ? ? self._seq = seq >> >> can it be an error? Why self.seq has been moved to self._seq? > > It is deliberate. ?Before the patch, the SeqRecord's .seq was a > "naked" attribute. ?After the patch, the actual sequence hidden in the > private attribute ._seq and is publicly exposed using a property (also > known as a "managed attribute") with a get and set method (and a doc > string). ?The reason for doing this is I want to have some code run > when ever anyone tries to set the seq property to a new value (in > order prevent the seq and per-letter-annotation getting out of sync). I see, that is pretty nice. now you define seq with the 'property' function. p.s. it is not exactly related... but I was reading this article about python 3: - http://www.informit.com/articles/article.aspx?p=1309289&seqNum=4 Look at the example: the class StockItem has an attribute called 'quantity' which can be comprised only between 1 and 1000; when someone tries to modify it to a negative number, an exception is raised. 
Maybe it can be interesting for biopython 3 :-) > > Peter > -- My blog on bioinformatics (now in English): http://bioinfoblog.it From biopython at maubp.freeserve.co.uk Fri Feb 27 12:26:19 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 27 Feb 2009 12:26:19 +0000 Subject: [Biopython-dev] Quality scores (and per-letter-annotation) in a SeqRecord? In-Reply-To: <5aa3b3570902270420j5b97932fo3f79bb4cd19566b0@mail.gmail.com> References: <200902201249.36743.jblanca@btc.upv.es> <20090220231904.GE18294@sobchak.mgh.harvard.edu> <320fb6e00902211050r7a57bceap9ba216924785b9b0@mail.gmail.com> <5aa3b3570902230550v12e505eeje3dcf38d9bed8d2b@mail.gmail.com> <320fb6e00902230624j65d90b63tb9b5c1063d03c923@mail.gmail.com> <5aa3b3570902250202k6ad4779duea2c051ad6a8fd3c@mail.gmail.com> <320fb6e00902251400r58f46df4ka54328b617781bd4@mail.gmail.com> <5aa3b3570902270350j2a7d978bpc778e7f4f952e077@mail.gmail.com> <320fb6e00902270412q6b0a9208m47320660f7d19c58@mail.gmail.com> <5aa3b3570902270420j5b97932fo3f79bb4cd19566b0@mail.gmail.com> Message-ID: <320fb6e00902270426t60ba1970ld7e576a90d9ca99e@mail.gmail.com> On Fri, Feb 27, 2009 at 12:20 PM, Giovanni Marco Dall'Olio wrote: > I see, that is pretty nice. > now you define seq with the 'property' function. > > p.s. it is not exactly related... but I was reading this article about python 3: > - http://www.informit.com/articles/article.aspx?p=1309289&seqNum=4 > Look at the example: the class StockItem has an attribute called > 'quantity' which can be comprised only between 1 and 1000; when > someone tries to modify it to a negative number, an exception is > raised. > Maybe it can be interesting for biopython 3 :-) Actually decorators are available from Python 2.4+ http://www.python.org/dev/peps/pep-0318/ This is something we may want to look at once we've dropped support for Python 2.3 (Biopython 1.50 should be our last release to officially support Python 2.3). 
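For readers following the thread, here is a minimal sketch of the managed-attribute idea described above, written with a plain property() call so that it does not depend on decorator syntax. The class name ToyRecord, the annotate helper and the length check are all assumptions made for illustration; this is not the code from the patch on Bug 2507.

```python
class ToyRecord(object):
    """Hypothetical stand-in for a sequence record (not Bio.SeqRecord)."""

    def __init__(self, seq):
        self._seq = seq          # the real data lives in a private attribute
        self._per_letter = {}    # name -> list with one entry per letter

    def annotate(self, name, values):
        # Hypothetical helper: accept exactly one value per letter.
        if len(values) != len(self._seq):
            raise ValueError("need one value per letter")
        self._per_letter[name] = list(values)

    def _get_seq(self):
        return self._seq

    def _set_seq(self, value):
        # Assumed policy: while per-letter annotation is present, a
        # replacement sequence must keep the same length.
        if self._per_letter and len(value) != len(self._seq):
            raise ValueError("new sequence length does not match annotation")
        self._seq = value

    # Public name "seq" is routed through the get and set methods above.
    seq = property(_get_seq, _set_seq, doc="The sequence itself.")


record = ToyRecord("ACGT")
record.annotate("phred_quality", [30, 32, 35, 20])
record.seq = "TTTT"        # same length: accepted
try:
    record.seq = "TT"      # different length: rejected by the setter
    rejected = False
except ValueError:
    rejected = True
```

Callers keep writing record.seq as before; only assignment now passes through a check, which is the point of switching from a naked attribute to a property.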
Peter From biopython at maubp.freeserve.co.uk Fri Feb 27 14:17:15 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 27 Feb 2009 14:17:15 +0000 Subject: [Biopython-dev] Bio.NetCatch, Bio.FilteredReader and Bio.File.SGMLHandle Message-ID: <320fb6e00902270617u6ed9e230u35dc8e440fcd21cd@mail.gmail.com> Hello all, Earlier this month over on the main discussion list Michiel suggested we start the deprecation process for the Bio.NetCatch and Bio.FilteredReader modules and Bio.File.SGMLHandle class. http://lists.open-bio.org/pipermail/biopython/2009-February/004932.html http://lists.open-bio.org/pipermail/biopython/2009-February/004933.html We didn't have any response, so I have just updated the docstrings and the DEPRECATED file in CVS to declare them obsolete, stating that in a subsequent release they will be deprecated, and later removed. If anyone wants to, we could probably go with an immediate deprecation of these (plus also Bio.EZRetrieve), but I see no reason to hurry. Peter From bugzilla-daemon at portal.open-bio.org Fri Feb 27 18:25:42 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 27 Feb 2009 13:25:42 -0500 Subject: [Biopython-dev] [Bug 2507] Adding __getitem__ to SeqRecord for element access and slicing In-Reply-To: Message-ID: <200902271825.n1RIPgLl011447@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2507 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1249 is|0 |1 obsolete| | ------- Comment #13 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-27 13:25 EST ------- Created an attachment (id=1250) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1250&action=view) Patch to SeqRecord.py and SeqFeature.py Updated the patch, fixes a couple of len(seq) which should have been len(self.seq), updates the __str__ method to show when there is 
per-letter-annotation. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Feb 27 18:29:31 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 27 Feb 2009 13:29:31 -0500 Subject: [Biopython-dev] [Bug 2767] Bio.SeqIO support for FASTQ and QUAL files In-Reply-To: Message-ID: <200902271829.n1RITVsp011783@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2767 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1244 is|0 |1 obsolete| | ------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-27 13:29 EST ------- Created an attachment (id=1251) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1251&action=view) Read/write support for FASTQ and QUAL files, using the per-letter-annotation dict Updated to: * use the per-letter-annotation dictionary added by the patch on Bug 2507 * read and write the Solexa FASTQ variant (which I plan to call "fastq-solexa" in Bio.SeqIO) * automatically convert PHRED/Solexa qualities when writing a file in the other format. This needs some more testing with real Solexa FASTQ files, but I expect to be able to do that next with some real data from a colleague.
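For readers unfamiliar with the two quality scales mentioned in the comment above, here is a minimal sketch of how PHRED and Solexa scores relate. It uses the standard definitions (PHRED: Q = -10 log10(p); Solexa: Q = -10 log10(p / (1 - p)), where p is the error probability). The function names are made up for illustration and are not taken from the attached patch, which may do the conversion differently.

```python
import math


def solexa_to_phred(q_solexa):
    """Map a Solexa score to the PHRED score with the same error probability.

    Hypothetical helper for illustration only, not the patch's API.
    """
    return 10 * math.log10(10 ** (q_solexa / 10.0) + 1)


def phred_to_solexa(q_phred):
    """Map a PHRED score to the equivalent Solexa score.

    The mapping is undefined for q_phred <= 0 (the logarithm's argument
    would be zero or negative), so that case is rejected here. How to
    treat it in practice is a design decision this sketch does not make.
    """
    if q_phred <= 0:
        raise ValueError("PHRED score must be positive to convert to Solexa")
    return 10 * math.log10(10 ** (q_phred / 10.0) - 1)
```

The two scales differ noticeably for low-quality bases (Solexa scores can be negative) and become nearly identical at high quality, which is presumably why an automatic conversion on output is convenient.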
From bugzilla-daemon at portal.open-bio.org Fri Feb 27 18:31:23 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 27 Feb 2009 13:31:23 -0500 Subject: [Biopython-dev] [Bug 2767] Bio.SeqIO support for FASTQ and QUAL files In-Reply-To: Message-ID: <200902271831.n1RIVNRG012028@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2767 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- BugsThisDependsOn| |2507 ------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk 2009-02-27 13:31 EST ------- After discussion on the mailing list, storing the quality values nicely will depend on the per-letter-annotation support being implemented on Bug 2507 (together with SeqRecord slicing). Marking this dependency in bugzilla. From bugzilla-daemon at portal.open-bio.org Fri Feb 27 18:31:25 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 27 Feb 2009 13:31:25 -0500 Subject: [Biopython-dev] [Bug 2507] Adding __getitem__ to SeqRecord for element access and slicing In-Reply-To: Message-ID: <200902271831.n1RIVP59012042@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2507 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- OtherBugsDependingO| |2767 nThis| |
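To illustrate why quality-score storage depends on record slicing (the dependency between Bug 2767 and Bug 2507 noted above), here is a rough sketch of the idea: when a record is sliced, any per-letter annotation has to be sliced with it so the two stay the same length. ToyRecord, its attribute names, and the "phred_quality" key are all hypothetical; this is not the code from the attached patches and not Biopython's SeqRecord.

```python
class ToyRecord(object):
    """Hypothetical stand-in for a sequence record (not Biopython's SeqRecord)."""

    def __init__(self, seq, letter_annotations=None):
        self.seq = seq
        # Copy so later changes to the caller's dict do not leak in.
        self.letter_annotations = dict(letter_annotations or {})
        # Each per-letter annotation must have one entry per letter.
        for key, values in self.letter_annotations.items():
            if len(values) != len(seq):
                raise ValueError("annotation %r does not match sequence length" % key)

    def __len__(self):
        return len(self.seq)

    def __getitem__(self, index):
        if isinstance(index, slice):
            # Slice every per-letter annotation with the same slice object.
            sliced = dict((key, values[index])
                          for key, values in self.letter_annotations.items())
            return ToyRecord(self.seq[index], sliced)
        # A plain integer index returns just the letter.
        return self.seq[index]
```

For example, slicing a four-letter record carrying four quality values with [1:3] gives a two-letter record carrying the two matching quality values.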