From schaefer at rostlab.org Tue Nov 2 05:17:49 2010
From: schaefer at rostlab.org (Christian Schaefer)
Date: Tue, 02 Nov 2010 10:17:49 +0100
Subject: [Biopython-dev] RMSD calculation
In-Reply-To:
References:
Message-ID: <4CCFD73D.7000203@rostlab.org>

Hey,

I was using the PDB superimposer once and compared it to ProFit [1] which
does a McLachlan fitting. Both return essentially the same rmsd, while the
implementation in Bio.PDB seems to yield higher precision.

Chris

[1] http://www.bioinf.org.uk/software/profit/

--
Dipl.-Bioinf. Christian Schaefer
Technical University Munich
Department for Bioinformatics
Faculty of Computer Science/I12
Boltzmannstr. 3
D-85748 Garching b. Muenchen
Germany
http://www.rostlab.org/~schaefer

On 10/30/2010 01:42 AM, George Devaniranjan wrote:
> Thanks Eric and Peter,
> Your patience in answering this question is very much appreciated.
> I think Eric maybe right, I tried the RMSD calculation for several
> structures and VMD does give a lower value for them all.
> George
>
> Thanks once again for all of you for your answers
>
> On Fri, Oct 29, 2010 at 10:39 PM, Eric Talevich wrote:
>
>> On Thu, Oct 28, 2010 at 12:49 PM, George Devaniranjan<
>> devaniranjan at gmail.com> wrote:
>>
>>> I was wondering why there is two functions for calculating RMSD
>>>
>>> 1)in the SVDSuperimposer()
>>> 2)in PDB.Superimposer()
>>>
>>> In the code its says RMS-is RMS being calculated instead of RMSD???
>>> I ask because VMD gives a different value for RMSD to the one from
>>> Biopython
>>>
>>>
>> Hello George,
>>
>> Here's my understanding of it:
>>
>> 1. RMSD and "RMS distance" both mean root mean square deviation, in terms
>> of the distances in 3D space between each corresponding pair of atoms. The
>> RMSD between all atoms in two aligned structures may be different than the
>> RMSD between backbone atoms only. Or, if the two structures don't have the
>> same peptide sequence, that raises another set of issues.
>>
>> 2. In Biopython, PDB.Superimposer internally uses SVDSuperimposer. It's a
>> simplified wrapper.
>>
>> 3. The SVDSuperimposer module allows you to either (i) align two structures
>> in 3D space and then calculate RMSD, or (ii) just calculate RMSD without
>> spatially (re-)aligning the structures. PDB.Superimposer just does the
>> former. If the structures weren't already aligned, these can yield very
>> different values.
>>
>> 4. There are many ways to perform a structural alignment; SVDSuperimposer
>> implements a simple one. PyMOL, VMD, ce, DALI, and other programs implement
>> more advanced methods.
>>
>> So don't be alarmed that VMD gives you a smaller RMSD than PDB.Superimposer
>> -- it just means VMD found a better alignment between the two structures.
>>
>> Best,
>> Eric
>>
>>
>>
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev

From krother at rubor.de Tue Nov 2 07:15:05 2010
From: krother at rubor.de (Kristian Rother)
Date: Tue, 2 Nov 2010 12:15:05 +0100
Subject: [Biopython-dev] RMSD calculation
In-Reply-To:
References:
Message-ID: <529a050d3a1c3801f07adbef605341ef-EhVcX1xCQgFaRwICBxEAXR0wfgFLV15YQUBGAEFfUC9ZUFgWXVpyH1RXX0FdQU1tXlhRSF5cXg1fWg==-webmailer1@server08.webmailer.hosteurope.de>

Hi Greg,

I think I can help to clear up the RMSD question (or RMS; however you
abbreviate it, it's the same formula).

The short answer is, the methods giving lower RMSD do something
conceptually very different from Bio.PDB.

Long answer:

- Bio.PDB.Superimposer does structure *superposition*. It takes pairs of
  atoms, and finds the rotation/translation matrix that minimizes the RMSD.
  There is a single analytical solution to this, returned by the Kabsch
  algorithm from 1976 (see http://www.pymolwiki.org/index.php/Kabsch).
  I'm quite sure Biopython/SVDSuperimposer implements this algorithm.

- Services like the EBI SSM server do *structure alignment*.
  They take two structures and try to find a set of residue pairs that fit
  to each other well. To do so, they occasionally calculate RMSDs, but do
  not necessarily use all the residues provided.

For instance, when submitting protein1 and protein2 to EBI, the output
tells me that N(algn) = 31, meaning that 31 of the 36 residues were used to
calculate the alignment. When looking at the structures, these are probably
on the N-terminus (see picture).

==> the structure alignment algorithm discards the residues it doesn't
regard as useful for aligning; this is why the RMSD is lower.

Do you think this explains all our observations?

Best regards,
Kristian

> Hello everyone,
> I tried with pymol and it gives a value of 1.792 for the RMSD after
> alignment
> The EU bioinformatics server gives a value of 1.74
> VMD 1.62
> But SVD and PDB Superimposer gives a value 3.2
> I have attached the 2 PDB files concerned-is it something I am doing in
> calculating the RMSD using biopython?
> Thank you
>
> On Thu, Oct 28, 2010 at 1:46 PM, Peter wrote:
>
>> On Thu, Oct 28, 2010 at 6:14 PM, George Devaniranjan wrote:
>> > Yes there is a difference-for 2 proteins having exact same residues of 36
>> > residues the values from 4 sources are as follows
>> > VMD RMSD=1.61
>> > SVD RMSD =3.2
>> > PDB RMSD=3.2
>> >
>> > From the EU Bioinformatics server (link below) RMSD =1.75
>> > (http://www.ebi.ac.uk/msd-srv/ssm/cgi-bin/ssmserver)
>> >
>> > So Biopython really is computing the RMSD and not RMS?
>> > Thanks you
>>
>> It has been a while since I looked at this (but I can still edit
>> the Warwick page if it is unclear).
>>
>> Which definition of RMSD are you using?
>>
>> Bio.PDB uses Bio.SVDSuperimposer, so they should be the same.
>> The comment for this code *says* it calculates the RMS deviation,
>> here:
>>
>>     diff=coords1-coords2
>>     l=coords1.shape[0]
>>     return sqrt(sum(sum(diff*diff))/l)
>>
>> Here variable l will be the number of atoms.
>>
>> What are the two examples you are using?
>> Can you perhaps
>> share a small example pair of PDB files?
>>
>> Peter
>>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: superpos.png
Type: image/png
Size: 172427 bytes
Desc: not available
URL:

From devaniranjan at gmail.com Tue Nov 2 21:09:18 2010
From: devaniranjan at gmail.com (George Devaniranjan)
Date: Wed, 3 Nov 2010 01:09:18 +0000
Subject: [Biopython-dev] RMSD calculation
In-Reply-To: <4CCFD73D.7000203@rostlab.org>
References: <4CCFD73D.7000203@rostlab.org>
Message-ID:

Hi,
Thank you. I have been noticing that in most cases the PDB Superimposer as
well as the SVDSuperimposer give similar values.
In addition, PyMOL in most cases also gives similar values; however, in all
cases VMD continues to give the smallest value.
I will also test ProFit. Thanks for the link.
George

On Tue, Nov 2, 2010 at 9:17 AM, Christian Schaefer wrote:

> Hey,
>
> I was using the PDB superimposer once and compared it to ProFit [1] which
> does a McLachlan fitting. Both return essentially the same rmsd, while the
> implementation in Bio.PDB seems to yield higher precision.
>
> Chris
>
> [1] http://www.bioinf.org.uk/software/profit/
>
> --
> Dipl.-Bioinf. Christian Schaefer
> Technical University Munich
> Department for Bioinformatics
> Faculty of Computer Science/I12
> Boltzmannstr. 3
> D-85748 Garching b. Muenchen
> Germany
> http://www.rostlab.org/~schaefer
>
>
>
> On 10/30/2010 01:42 AM, George Devaniranjan wrote:
>
>> Thanks Eric and Peter,
>> Your patience in answering this question is very much appreciated.
>> I think Eric maybe right, I tried the RMSD calculation for several
>> structures and VMD does give a lower value for them all.
>> George
>>
>> Thanks once again for all of you for your answers
>>
>> On Fri, Oct 29, 2010 at 10:39 PM, Eric Talevich wrote:
>>
>>> On Thu, Oct 28, 2010 at 12:49 PM, George Devaniranjan<
>>> devaniranjan at gmail.com> wrote:
>>>
>>>> I was wondering why there is two functions for calculating RMSD
>>>>
>>>> 1)in the SVDSuperimposer()
>>>> 2)in PDB.Superimposer()
>>>>
>>>> In the code its says RMS-is RMS being calculated instead of RMSD???
>>>> I ask because VMD gives a different value for RMSD to the one from
>>>> Biopython
>>>>
>>>>
>>> Hello George,
>>>
>>> Here's my understanding of it:
>>>
>>> 1. RMSD and "RMS distance" both mean root mean square deviation, in terms
>>> of the distances in 3D space between each corresponding pair of atoms. The
>>> RMSD between all atoms in two aligned structures may be different than the
>>> RMSD between backbone atoms only. Or, if the two structures don't have the
>>> same peptide sequence, that raises another set of issues.
>>>
>>> 2. In Biopython, PDB.Superimposer internally uses SVDSuperimposer. It's a
>>> simplified wrapper.
>>>
>>> 3. The SVDSuperimposer module allows you to either (i) align two structures
>>> in 3D space and then calculate RMSD, or (ii) just calculate RMSD without
>>> spatially (re-)aligning the structures. PDB.Superimposer just does the
>>> former. If the structures weren't already aligned, these can yield very
>>> different values.
>>>
>>> 4. There are many ways to perform a structural alignment; SVDSuperimposer
>>> implements a simple one. PyMOL, VMD, ce, DALI, and other programs implement
>>> more advanced methods.
>>>
>>> So don't be alarmed that VMD gives you a smaller RMSD than PDB.Superimposer
>>> -- it just means VMD found a better alignment between the two structures.
>>>
>>> Best,
>>> Eric

From biopython at maubp.freeserve.co.uk Wed Nov 3 10:02:48 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 3 Nov 2010 14:02:48 +0000
Subject: [Biopython-dev] Merging Uniprot XML parser?
In-Reply-To:
References:
Message-ID:

On Tue, Oct 19, 2010 at 4:54 PM, Peter wrote:
> Hi all,
>
> I've fixed a few issues I felt were holding up merging Andrea's UniProt
> XML parser.
>
> I've now tested the uniprot_sprot.txt and uniprot_sprot.xml are parsed
> into more or less equivalent objects, and that these can be written out
> as GenBank (well, GenPept) files or as EMBL/IMGT files (given recent
> work to support protein EMBL files - which do exist but are rarely used).
>
> This required "fixing" Bug 3026 to cope with long annotation that cannot
> be line wrapped nicely (lots of long URL strings in UniProt XML comments).
> http://bugzilla.open-bio.org/show_bug.cgi?id=3026
> I'm tempted to remove the warning because it is so common... or make
> it use the same text each time so you get warned once.
>
> There are also some additions to the Bio.SeqFeature position classes,
> since SwissProt/UniProt files can have uncertain positions.
>
> Could someone take a look at the code here (a rebased branch), as I'd
> like some independent testing (and better yet, code review):
> http://github.com/peterjc/biopython/tree/uniprot

I've now merged this into the trunk (with a git rebase first so
the history is linear - no branch+merge), and Andrea has agreed
to retest it. Other testing and comments are most welcome.

Peter

From biopython at maubp.freeserve.co.uk Wed Nov 3 12:45:25 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 3 Nov 2010 16:45:25 +0000
Subject: [Biopython-dev] Bio/cMarkovModelmodule.c
In-Reply-To: <781588.85801.qm@web62407.mail.re1.yahoo.com>
References: <781588.85801.qm@web62407.mail.re1.yahoo.com>
Message-ID:

On Sat, Oct 30, 2010 at 3:23 PM, Michiel de Hoon wrote:
>
> OK, done. In the end, I put the warning message in MarkovModel.py anyway,
> since it's very easy to miss if it's in setup.py.
>

Do we really need the warning? I guess otherwise people using this
code might notice a drop in performance if they were using our C
code version, updated their Biopython, and then get the Python
fallback if their NumPy is too old.

If we do keep the warning should it be silenced in test_MarkovModel.py?
Something like the patch below should do it...

Peter

diff --git a/Tests/test_MarkovModel.py b/Tests/test_MarkovModel.py
index fc5ae8b..bb3afe8 100644
--- a/Tests/test_MarkovModel.py
+++ b/Tests/test_MarkovModel.py
@@ -9,7 +9,12 @@ except ImportError:
     raise MissingPythonDependencyError(\
         "Install NumPy if you want to use Bio.MarkovModel.")
 
+import warnings
+#Silence this warning:
+#For optimal speed, please update to Numpy version 1.3 or later
+warnings.filterwarnings("ignore", category=UserWarning)
 from Bio import MarkovModel
+warnings.filters.pop()
 
 def print_mm(markov_model):
     print "STATES: %s" % ' '.join(markov_model.states)

From biopython at maubp.freeserve.co.uk Wed Nov 3 13:17:46 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 3 Nov 2010 17:17:46 +0000
Subject: [Biopython-dev] Continuous integration server
In-Reply-To:
References:
Message-ID:

2010/10/30 Tiago Antão:
> Hi all,
>
> I've been hacking with buildbot, an integration server. This is to
> allow continuous testing of Biopython.
> So that we are alerted of any
> problems as soon as somebody does a dreadful commit (I have the top 5
> of most dreadful commits, so it was fair that I should try to do
> something about it).
>
> Things are still incomplete, but I think it is time to inform the list
> of this effort...
> To know more about buildbot you can either go to the buildbot site
> http://buildbot.net/ or see the draft doc that I have been preparing
> http://biopython.org/wiki/Continuous_integration
> There is a draft server here:
> http://events.open-bio.org:8010/
> The cool thing about buildbot is that actual testing is done by
> volunteer computers. Want to test on OS y, Python version z? You can
> offer the idle time of your laptop for that...
>

It is looking impressive Tiago - excellent work :)

>
> Obvious things missing:
>
> 0. First and foremost, see if people like this?

Looks very promising.

> 1. Changing the biopython test code to avoid stressing the network
> (i.e., having a run_tests option that will not test network tests).
> This to avoid imposing continuous traffic on genbank and friends. This
> is a show stopper.

Certainly we can't scale this up to many machines running regular
testing without limiting the network access somewhat.

> 2. Maybe warn the mailing list when some fundamental build stops
> working (e.g. send an email when a python 2.x build stops working)
> 3. Have test servers with all the applications installed (do you want
> to volunteer? This is more to do with volunteers)

I would expect "core" developers to have machines with most of the
command line applications used in Biopython's tests already installed
- but yes, we do want to make sure each optional command line tool
or library is installed on at least one build slave.

> 4. Maybe change run_tests to require all tests to be done. If we are
> doing integration testing, we want all tests to be done (missing
> applications or libraries should be an error).
> As an example, none of
> my tests are complete

This is about how it currently skips tests missing external
dependencies (like PopGen command line tools in your case).
I think that is OK, otherwise we'll get false positives (see
below, we can't satisfy all dependencies on all platforms).

> 5. Support mac (my access to Mr Job's fashion machines is limited).
> Again this is more a volunteer issue.

My main work machine is a Mac, so this shouldn't be an issue.

> 6. Discuss policies: One test a day? Full tests or updates? Full
> network tests (probably sporadically)? Send emails?

Right now triggering tests after each commit isn't easy to do,
is it (due to limited git support in buildbot)? That might be nice
but in the short term running the tests once a day is a big step
forward.

I'd suggest we do network tests once a week (or fortnight?).

> 7. Find volunteers to cover several OSes and several Python
> versions. Assure that people do full tests (i.e. with all applications
> and libraries)

That isn't possible - some applications are not available on Windows,
and some libraries are not available on Jython or Python 3 (yet).

> 8. While I have volunteer Windows testing myself, I will not be able
> to maintain it regularly.

I have access to a Windows machine (which I use to build the
Biopython installers) but currently it is only online intermittently.
I'd have to reorganise machines due to limited network ports in
the office, but it could in principle be used as a buildbot slave.

>
> Opinions are most welcome
>

What is wrong with your Linux Python 3.1 slave? It seems that
2to3 is failing on the doctest conversion.

Peter

From tiagoantao at gmail.com Thu Nov 4 08:04:17 2010
From: tiagoantao at gmail.com (Tiago Antão)
Date: Thu, 4 Nov 2010 12:04:17 +0000
Subject: [Biopython-dev] Continuous integration server
In-Reply-To:
References:
Message-ID:

2010/11/3 Peter:
> Certainly we can't scale this up to many machines running regular
> testing without limiting the network access somewhat.

As we discussed before, I was thinking of adding an option to
run_tests.py (like --offline) and changing the tests that access the
Internet to honour that flag. I was thinking of coding this myself and
then sending it to the list for approval (I am not going to make big
changes to the test framework myself without passing them through here).

>> 6. Discuss policies: One test a day? Full tests or updates? Full
>> network tests (probably sporadically)? Send emails?
>
> Right now triggering tests after each commit isn't easy to do
> is it (due to limited git support in builtbot)? That might be nice
> but in the short term running the tests once a day is a big step
> forward.

It is actually quite easy (with a hook on github), but I would
suggest leaving this for version 2: let's get the fundamentals working
and then add bells and whistles.

> I'd suggest we do network tests once a week (or fortnight?).

OK, I will go ahead and do some changes to run_tests.py as per above.

> That isn't possible - some applications are not available on Windows,
> and some libraries are not available on Jython or Python 3 (yet).

OK, we just have to be sure (manually) that all applications that need
testing are tested.

>> 8. While I have volunteer Windows testing myself, I will not be able
>> to maintain it regularly.
>
> I have access to a Windows machine (which I use to build the
> Biopython installers) but currently it is only online intermittently.
> I'd have to reorganise machines due to limited network ports in
> the office, but it could in principle be used as a builtbot slave.

Regarding Mac and Windows, I will email again as soon as we have the
network issue sorted out. Before that we would be doing maybe too much
traffic as we have no way to stop the network access for now.

> What is wrong with your Linux Python 3.1 slave? It seems that
> 2to3 is failing on the doctest conversion.

I do not have time to evaluate this now, I will trace this issue over
the weekend.

Tiago

From biopython at maubp.freeserve.co.uk Thu Nov 4 08:28:50 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 4 Nov 2010 12:28:50 +0000
Subject: [Biopython-dev] Continuous integration server
In-Reply-To:
References:
Message-ID:

2010/11/4 Tiago Antão:
> 2010/11/3 Peter:
>> Certainly we can't scale this up to many machines running regular
>> testing without limiting the network access somewhat.
>
> As we discussed before, I was thinking in adding an option to
> run_tests.py (like --offline) and change the tests that access the
> Internet to honour that flag. I was thinking in coding this myself and
> then send to the list for approval (I am not going to make big changes
> to the test framework myself without passing them through here).

Yep, that sounds good. The previous discussion is here if anyone missed it:
http://lists.open-bio.org/pipermail/biopython-dev/2010-October/008295.html

>>> 6. Discuss policies: One test a day? Full tests or updates? Full
>>> network tests (probably sporadically)? Send emails?
>>
>> Right now triggering tests after each commit isn't easy to do
>> is it (due to limited git support in builtbot)? That might be nice
>> but in the short term running the tests once a day is a big step
>> forward.
>
> It is actually quite easy (with an hook on github), but I would
> suggest leaving this for version 2: lets put the fundamental working
> and the add bells and whistles.

I agree.

>> I'd suggest we do network tests once a week (or fortnight?).
>
> OK, I will go ahead and do some changes to run_tests.py as per above.
>
>> That isn't possible - some applications are not available on Windows,
>> and some libraries are not available on Jython or Python 3 (yet).
>
> OK, we just have to be sure (manually) that all applications that need
> tested are tested.

Yes, that will be a manual task. When we document the slave setup
process we can list which applications we ideally want people to
install on each OS. Having a slight range in versions would actually
be a good thing here.

>>> 8. While I have volunteer Windows testing myself, I will not be able
>>> to maintain it regularly.
>>
>> I have access to a Windows machine (which I use to build the
>> Biopython installers) but currently it is only online intermittently.
>> I'd have to reorganise machines due to limited network ports in
>> the office, but it could in principle be used as a builtbot slave.
>
> Regarding Mac and Windows, I will email again as soon as we have the
> network issue sorted out. Before that we would be doing maybe too much
> traffic as we have no way to stop the network access for now.
>
>> What is wrong with your Linux Python 3.1 slave? It seems that
>> 2to3 is failing on the doctest conversion.
>
> I do not have time to evaluate this now, I will trace this issue over
> the weekend.

Sure. And once the --offline switch is working, we can start
adding slaves (and documenting how to do it to assist future
volunteers).
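
For illustration only (this is not the change that went into run_tests.py):
a minimal sketch of how an --offline switch could make network-dependent
tests skip themselves. It assumes current Python 3 and nothing beyond the
standard library; the names OFFLINE, requires_network, ExampleTests and
run_tests are hypothetical.

```python
import argparse
import functools
import io
import unittest

# Hypothetical module-level switch, set from the command line in run_tests().
OFFLINE = False


def requires_network(test_method):
    """Mark a test method as needing internet access.

    The flag is checked when the test runs (not when it is defined), so it
    can still be set after the test modules have been imported.
    """
    @functools.wraps(test_method)
    def wrapper(self, *args, **kwargs):
        if OFFLINE:
            raise unittest.SkipTest("network access disabled (--offline)")
        return test_method(self, *args, **kwargs)
    return wrapper


class ExampleTests(unittest.TestCase):
    def test_local_parsing(self):
        # Needs no network, so it always runs.
        self.assertEqual("ACGT".lower(), "acgt")

    @requires_network
    def test_remote_lookup(self):
        # Placeholder: a real test would contact a remote server here.
        self.assertTrue(True)


def run_tests(argv=None):
    """Parse the command line, set the switch, and run the example tests."""
    global OFFLINE
    parser = argparse.ArgumentParser(description="Run the example tests.")
    parser.add_argument("--offline", action="store_true",
                        help="skip tests that need internet access")
    args = parser.parse_args(argv)
    OFFLINE = args.offline
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleTests)
    # Collect the report in memory so the caller decides what to print.
    runner = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0)
    return runner.run(suite)
```

With this sketch, run_tests(["--offline"]) reports the remote test as
skipped rather than failed, in the same spirit as the way tests with missing
external dependencies are skipped.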

Good work Tiago :)

Peter

From bugzilla-daemon at portal.open-bio.org Thu Nov 4 12:49:45 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 4 Nov 2010 12:49:45 -0400
Subject: [Biopython-dev] [Bug 3139] python setup.py test ends with error code 0 even on failure
In-Reply-To:
Message-ID: <201011041649.oA4GnjEw008477@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=3139

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2010-11-04 12:49 EST -------
Fix checked in by Tiago, marking as fixed.

http://github.com/biopython/biopython/commit/457ce49a060fe540f98aa37a6266cff17864487b

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From biopython at maubp.freeserve.co.uk Thu Nov 4 13:13:33 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 4 Nov 2010 17:13:33 +0000
Subject: [Biopython-dev] Biopython 1.56 release plans
Message-ID:

Hi all,

I've mentioned in recent threads that I think we should try and
release Biopython 1.56 this month (November 2010).

I think the NEWS file is pretty up to date, and covers important
new functionality like Andrea Pierleoni's UniProt XML parser
and the IMGT support (with Uri Laserson).

Is there any other functionality which is ready for merging?

For example, Tiago - you've been doing lots of work on your
branch with the PopGen code. Is that code ready? I'm willing
to do the git merge/rebase.

Is there any reason to bother with a beta release this time?

If there are no pressing additions, I may be able to do the
release tomorrow - otherwise how about aiming for Thursday
or Friday next week (11 or 12 November)?

Regards,

Peter

From mjldehoon at yahoo.com Fri Nov 5 05:40:19 2010
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 5 Nov 2010 02:40:19 -0700 (PDT)
Subject: [Biopython-dev] Biopython 1.56 release plans
In-Reply-To:
Message-ID: <701600.10148.qm@web62403.mail.re1.yahoo.com>

I think the following should be removed before the release:

Bio/SwissProt/SProt.py
Bio/Transcribe.py
Bio/Translate.py

as well as the Iterator class in Bio/SCOP/Dom.py.

These have been deprecated since Biopython 1.52.

Best,
--Michiel.

--- On Thu, 11/4/10, Peter wrote:

> From: Peter
> Subject: [Biopython-dev] Biopython 1.56 release plans
> To: "Biopython-Dev Mailing List"
> Date: Thursday, November 4, 2010, 1:13 PM
> Hi all,
>
> I've mentioned in recent threads that I think we should try and
> release Biopython 1.56 this month (November 2010).
>
> I think the NEWS file is pretty up to date, and covers important
> new functionality like Andrea Pierleoni's UniProt XML parser
> and the IMGT support (with Uri Laserson).
>
> Is there any other functionality which is ready for merging?
>
> For example, Tiago - you've been doing lots of work on your
> branch with the PopGen code. Is that code ready? I'm willing
> to do the git merge/rebase.
>
> Is there any reason to bother with a beta release this time?
>
> If there are no pressing additions, I may be able to do the
> release tomorrow - otherwise how about aiming for Thursday
> or Friday next week (11 or 12 November)?
>
> Regards,
>
> Peter

From tiagoantao at gmail.com Fri Nov 5 06:13:09 2010
From: tiagoantao at gmail.com (Tiago Antão)
Date: Fri, 5 Nov 2010 10:13:09 +0000
Subject: [Biopython-dev] Biopython 1.56 release plans
In-Reply-To:
References:
Message-ID:

On Thu, Nov 4, 2010 at 5:13 PM, Peter wrote:
> For example, Tiago - you've been doing lots of work on your
> branch with the PopGen code. Is that code ready? I'm willing
> to do the git merge/rebase.

I was hoping that would offer to do a merge ;) . Though we
need a broken repository to test the integration server, so maybe I
could do it myself .
Yes, the code is ready.
After the merge I will still add a couple of functions (also ready,
but not committed) and make sure the test cases are fully ready.
But it should be a day only and better done after the merge.
This is mainly new code that does much faster GENEPOP
parsing and supports AFLP processing.

Tiago

From biopython at maubp.freeserve.co.uk Fri Nov 5 06:19:53 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 5 Nov 2010 10:19:53 +0000
Subject: [Biopython-dev] Biopython 1.56 release plans
In-Reply-To:
References:
Message-ID:

2010/11/5 Tiago Antão:
> On Thu, Nov 4, 2010 at 5:13 PM, Peter wrote:
>> For example, Tiago - you've been doing lots of work on your
>> branch with the PopGen code. Is that code ready? I'm willing
>> to do the git merge/rebase.
>
> I was hoping that would offer to do a merge ;) . Though we
> need a broken repository to test the integration server, so maybe I
> could do it myself .
> Yes, the code is ready.

OK - I'll try to get your code, rebase it onto the current master,
then post it as a new branch for you to check.
Once that is OK,
I'll rebase it again if the master has changed, then fast-forward
merge it to the master (that way we don't get a split and join
on the master history - just a sudden batch of commits).

> After the merge I will still add a couple of functions (also ready,
> but not committed) and make sure the test cases are fully ready.
> But it should be a day only and better done after the merge.
> This is mainly new code that does much faster GENEPOP
> parsing and supports AFLP processing.

Hopefully we can get that part done early next week.

Peter

From biopython at maubp.freeserve.co.uk Fri Nov 5 06:23:26 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 5 Nov 2010 10:23:26 +0000
Subject: [Biopython-dev] Biopython 1.56 release plans
In-Reply-To: <701600.10148.qm@web62403.mail.re1.yahoo.com>
References: <701600.10148.qm@web62403.mail.re1.yahoo.com>
Message-ID:

On Fri, Nov 5, 2010 at 9:40 AM, Michiel de Hoon wrote:
> I think the following should be removed before the release:
>
> Bio/SwissProt/SProt.py
> Bio/Transcribe.py
> Bio/Translate.py
>
> as well as the Iterator class in Bio/SCOP/Dom.py.
>
> These have been deprecated since Biopython 1.52.

According to the DEPRECATED file, those modules were deprecated
in Biopython 1.51, so they are definitely due for removal. In any
case Biopython 1.52 was very nearly a year ago [1] as it was
released 22 September 2009.

Please go ahead and tidy this up.

Thanks,

Peter

[1] http://www.biopython.org/wiki/Deprecation_policy

From biopython at maubp.freeserve.co.uk Fri Nov 5 06:47:12 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 5 Nov 2010 10:47:12 +0000
Subject: [Biopython-dev] Biopython 1.56 release plans
In-Reply-To:
References:
Message-ID:

2010/11/5 Peter:
> 2010/11/5 Tiago Antão:
>> On Thu, Nov 4, 2010 at 5:13 PM, Peter wrote:
>>> For example, Tiago - you've been doing lots of work on your
>>> branch with the PopGen code. Is that code ready? I'm willing
>>> to do the git merge/rebase.
>>
>> I was hoping that would offer to do a merge ;) . Though we
>> need a broken repository to test the integration server, so maybe I
>> could do it myself .
>> Yes, the code is ready.
>
> OK - I'll try to get your code, rebase it onto the current master,
> then post it as a new branch for you to check.

Notes on how I did this:

$ git remote add tiago https://github.com/tiagoantao/biopython.git
$ git fetch tiago
...
From https://github.com/tiagoantao/biopython
 * [new branch]      buildbot   -> tiago/buildbot
 * [new branch]      master     -> tiago/master

Now I want your "master" branch, but that name clashes with my
"master" branch... the following worked here:

$ git checkout tiago/master
Note: moving to "tiago/master" which isn't a local branch
If you want to create a new branch from this checkout, you may do so
(now or later) by using -b with the checkout command again. Example:
  git checkout -b
HEAD is now at 21b7a22... Merge branch 'master' of github.com:tiagoantao/biopython
$ git checkout -b tiago-pop-gen
Switched to a new branch "tiago-pop-gen"

Now I want to write the history of your PopGen work as though it was
started from the current state of the master branch. I was hoping there
would have been no changes to the PopGen code on the master so that
this would be trivial...

$ git rebase master
...
CONFLICT (content): Merge conflict in Bio/PopGen/FDist/__init__.py
...

So open Bio/PopGen/FDist/__init__.py and look for the merge failures
(which are marked with <<<<<<< to >>>>>>>). In this case it was the
removal of some deprecated code done on the pop gen branch, which was
only deprecated in Biopython 1.55 so it is a bit premature to remove it
already. So I fixed up Bio/PopGen/FDist/__init__.py and saved it. Then:

$ git add Bio/PopGen/FDist/__init__.py
$ git rebase --continue
...

This seems to have worked. I can now do a comparison to the master branch,

$ git diff master
...

After running the unit tests (which was of limited value as I don't
have FDist installed on this machine), I then pushed it online:

$ git push peterjc tiago-pop-gen

The rebased branch is now here:
https://github.com/peterjc/biopython/tree/tiago-pop-gen

If you agree the rebased branch is sane, it should be trivial to
now merge that onto the master as a fast-forward merge.
(But I would check first that the master hasn't changed, and
if it has, repeat the rebase).

Peter

From tiagoantao at gmail.com Fri Nov 5 06:50:32 2010
From: tiagoantao at gmail.com (Tiago Antão)
Date: Fri, 5 Nov 2010 10:50:32 +0000
Subject: [Biopython-dev] Biopython 1.56 release plans
In-Reply-To:
References:
Message-ID:

2010/11/5 Peter:
> If you agree the rebased branch is sane, it should be trivial to
> now merge that onto the master as a fast-forward merge.
> (But I would check first that the master hasn't changed, and
> if it has, repeat the rebase).

Many thanks for the guide, maybe in the future I will have the
courage to do it myself.

Go ahead and commit the changes. I will make sure the module
is sane this Sunday.

From biopython at maubp.freeserve.co.uk Fri Nov 5 07:08:54 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 5 Nov 2010 11:08:54 +0000
Subject: [Biopython-dev] Biopython 1.56 release plans
In-Reply-To:
References:
Message-ID:

2010/11/5 Tiago Antão:
> 2010/11/5 Peter:
>> If you agree the rebased branch is sane, it should be trivial to
>> now merge that onto the master as a fast-forward merge.
>> (But I would check first that the master hasn't changed, and
>> if it has, repeat the rebase).
>
> Many thanks for the guide, maybe in the future I will have the
> courage to do it myself.
>
> Go ahead and commit the changes. I will make sure the module
> is sane this Sunday.

Done.
The master hadn't changed in the meantime so I didn't have to re-rebase: $ git checkout master Switched to branch "master" $ git merge tiago-pop-gen Updating 065e235..4f318a4 Fast forward Bio/PopGen/FDist/Async.py | 21 +- Bio/PopGen/FDist/Controller.py | 125 +- Bio/PopGen/FDist/Utils.py | 68 +- Bio/PopGen/FDist/__init__.py | 1 - Bio/PopGen/GenePop/EasyController.py | 10 +- Bio/PopGen/GenePop/FileParser.py | 69 +- Tests/PopGen/data_dfst_outfile | 300 + Tests/PopGen/dfdist1 | 1204 + Tests/PopGen/dout.cpl | 300 + Tests/PopGen/dout.dat |50000 ++++++++++++++++++++++++++++++++++ Tests/test_PopGen_DFDist.py | 106 + Tests/test_PopGen_FDist_nodepend.py | 20 +- 12 files changed, 52176 insertions(+), 48 deletions(-) create mode 100644 Tests/PopGen/data_dfst_outfile create mode 100644 Tests/PopGen/dfdist1 create mode 100644 Tests/PopGen/dout.cpl create mode 100644 Tests/PopGen/dout.dat create mode 100644 Tests/test_PopGen_DFDist.py Then publishing it, $ git push origin master Counting objects: 120, done. Delta compression using 8 threads. Compressing objects: 100% (106/106), done. Writing objects: 100% (106/106), 133.46 KiB, done. Total 106 (delta 79), reused 0 (delta 0) To git at github.com:biopython/biopython.git 065e235..4f318a4 master -> master And removing my now pointless public branch: $ git push peterjc :tiago-pop-gen To git at github.com:peterjc/biopython.git - [deleted] tiago-pop-gen We need to update the NEWS file now. Peter From mjldehoon at yahoo.com Fri Nov 5 07:52:15 2010 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 5 Nov 2010 04:52:15 -0700 (PDT) Subject: [Biopython-dev] Biopython 1.56 release plans In-Reply-To: Message-ID: <645847.84052.qm@web62404.mail.re1.yahoo.com> > > Bio/SwissProt/SProt.py > > the Iterator class in Bio/SCOP/Dom.py I have removed these. > > Bio/Transcribe.py > > Bio/Translate.py These are still imported from Bio/Encodings/IUPACEncoding.py, which is imported from Bio/Alphabet/IUPAC.py. I have no idea what this code is doing. 
Does anybody know? --Michiel. From biopython at maubp.freeserve.co.uk Fri Nov 5 08:01:45 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 5 Nov 2010 12:01:45 +0000 Subject: [Biopython-dev] Biopython 1.56 release plans In-Reply-To: <645847.84052.qm@web62404.mail.re1.yahoo.com> References: <645847.84052.qm@web62404.mail.re1.yahoo.com> Message-ID: On Fri, Nov 5, 2010 at 11:52 AM, Michiel de Hoon wrote: > >> > Bio/SwissProt/SProt.py >> > the Iterator class in Bio/SCOP/Dom.py > > I have removed these. > >> > Bio/Transcribe.py >> > Bio/Translate.py > > These are still imported from Bio/Encodings/IUPACEncoding.py, which > is imported from Bio/Alphabet/IUPAC.py. I have no idea what this code > is doing. Does anybody know? Ah right - sorry, that had slipped my mind: http://lists.open-bio.org/pipermail/biopython-dev/2010-September/008255.html I had suggested we leave Bio.Transcribe and Bio.Translate in for Biopython 1.56 and remove them (and Bio.utils, Bio.PropertyManager, and Bio.Encodings.IUPACEncoding) for Biopython 1.57 Peter From mjldehoon at yahoo.com Fri Nov 5 08:08:17 2010 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 5 Nov 2010 05:08:17 -0700 (PDT) Subject: [Biopython-dev] Biopython 1.56 release plans In-Reply-To: Message-ID: <772269.63506.qm@web62407.mail.re1.yahoo.com> I'd like to suggest also that we deprecate Bio.Prosite.Prodoc; this functionality moved to Bio.ExPASy.Prodoc at least since release 1.50, and the module has been labeled as obsolete since then. The enclosing module Bio.Prosite itself is already deprecated. --Michiel. 
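
The deprecation step discussed above can be sketched in plain Python.
This is only an illustration under stated assumptions: the function
below is a hypothetical stand-in for code at the top of a deprecated
package's __init__.py, and it uses the standard DeprecationWarning
class, which may not be the warning class Biopython itself uses.

```python
import warnings


def import_deprecated_package():
    """Hypothetical stand-in for the top of a deprecated package's __init__.py."""
    warnings.warn(
        "This package is deprecated; please use its replacement instead.",
        DeprecationWarning,
        stacklevel=2,
    )


# Record warnings instead of printing them, so the effect can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # DeprecationWarning is often hidden by default
    import_deprecated_package()

print(caught[0].category.__name__, "-", caught[0].message)
```

Because importing a child module runs the parent package's __init__.py
first, a warning placed there would also be seen by anyone importing
the child module.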
From biopython at maubp.freeserve.co.uk Fri Nov 5 08:19:27 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 5 Nov 2010 12:19:27 +0000 Subject: [Biopython-dev] Biopython 1.56 release plans In-Reply-To: <772269.63506.qm@web62407.mail.re1.yahoo.com> References: <772269.63506.qm@web62407.mail.re1.yahoo.com> Message-ID: On Fri, Nov 5, 2010 at 12:08 PM, Michiel de Hoon wrote: > I'd like to suggest also that we deprecate Bio.Prosite.Prodoc; this > functionality moved to Bio.ExPASy.Prodoc at least since release 1.50, > and the module has been labeled as obsolete since then. The enclosing > module Bio.Prosite itself is already deprecated. Since Bio.Prosite is deprecated that means Bio.Prosite.Prodoc (and any other child modules) is too. If you try "from Bio.Prosite import Prodoc" you get a deprecation warning. Feel free to add "(DEPRECATED)" to the Bio.Prosite.Prodoc docstrings if you think it would be clearer. Peter From andrea at biocomp.unibo.it Fri Nov 5 12:43:16 2010 From: andrea at biocomp.unibo.it (Andrea Pierleoni) Date: Fri, 5 Nov 2010 17:43:16 +0100 (CET) Subject: [Biopython-dev] Merging Uniprot XML parser? In-Reply-To: References: Message-ID: <3cb74578eeedb8825ef75202c909b843.squirrel@lipid.biocomp.unibo.it> > On Tue, Oct 19, 2010 at 4:54 PM, Peter > wrote: > I've now merged this into the trunk (with a git rebase first so the > history > is linear - no branch+merge), and Andrea has agreed to retest it. > Other testing and comments are most welcome. > > Peter > I've done a couple of testing, from the master biopython branch. The uniprot-xml parser successfully parsed the 2010_11 release of uniprot containing 522,019 entries. The plain text 'swiss' parser took 6 mins to parse the complete flatfile uniprot db on my system (python 2.6 on a macbook pro, core2duo). the uniprot-xml parser took 12 minutes to do the same task when using cElementTree and looks pretty good to me (compare this to the 8 minutes I needed to download the gzipped db). 
However it took more than 80 mins to do the same task using ElementTree. So be aware that the parser can turn very slow without the C library. I'm currently retesting also on TrEMBL, but I don't think there is going to be any problem. I have no idea of the performances with jython, and similar derivations of python, nor if it works. Andrea From eric.talevich at gmail.com Fri Nov 5 13:26:03 2010 From: eric.talevich at gmail.com (Eric Talevich) Date: Fri, 5 Nov 2010 13:26:03 -0400 Subject: [Biopython-dev] Merging Uniprot XML parser? In-Reply-To: <3cb74578eeedb8825ef75202c909b843.squirrel@lipid.biocomp.unibo.it> References: <3cb74578eeedb8825ef75202c909b843.squirrel@lipid.biocomp.unibo.it> Message-ID: On Fri, Nov 5, 2010 at 12:43 PM, Andrea Pierleoni wrote: > > I've done a couple of testing, from the master biopython branch. > The uniprot-xml parser successfully parsed the 2010_11 release of uniprot > containing > 522,019 entries. > > [...] > > I have no idea of the performances with jython, and similar derivations of > python, nor if it works. > > Speaking from my experience with ElementTree in Bio.Phylo -- Jython 2.5's implementation of xml.etree should work as a drop-in replacement, but it's painfully slow. However, I've read that the next release of Jython will include some substantial overall speedups, which should make it more competitive. I once tried to get Biopython working on IronPython (on Mono, on Linux), but didn't succeed. The release I used didn't seem to have a compatible xml.etree implementation, though the developers may have made progress on this recently. -Eric From biopython at maubp.freeserve.co.uk Fri Nov 5 13:53:50 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 5 Nov 2010 17:53:50 +0000 Subject: [Biopython-dev] Merging Uniprot XML parser? 
In-Reply-To: <3cb74578eeedb8825ef75202c909b843.squirrel@lipid.biocomp.unibo.it> References: <3cb74578eeedb8825ef75202c909b843.squirrel@lipid.biocomp.unibo.it> Message-ID: On Fri, Nov 5, 2010 at 4:43 PM, Andrea Pierleoni wrote: > > On Tue, Oct 19, 2010 at 4:54 PM, Peter wrote: >> I've now merged this into the trunk (with a git rebase first so the >> history is linear - no branch+merge), and Andrea has agreed to >> retest it. Other testing and comments are most welcome. >> >> Peter >> > > > I've done a couple of testing, from the master biopython branch. > The uniprot-xml parser successfully parsed the 2010_11 release > of uniprot containing 522,019 entries. > > The plain text 'swiss' parser took 6 mins to parse the complete flatfile > uniprot db on my system (python 2.6 on a macbook pro, core2duo). > the uniprot-xml parser took 12 minutes to do the same task when using > cElementTree and looks pretty good to me (compare this to the 8 > minutes I needed to download the gzipped db). I think I have a slightly older version as it only has 519348 entries. My timings using Python 2.6 on Mac OS X, using looping over the file with Bio.SeqIO.parse() and incrementing a counter: uniprot_sprot.fasta, 232 MB, 15s ("fasta") uniprot_sprot.dat, 2.2 GB, 4m57s ("swiss") uniprot_sprot.xml, 4.5 GB, 10m34s ("uniprot-xml") Note the XML file is about twice the size of the plain text swiss format file, and as you noted, takes about twice as long to parse. > However it took more than 80 mins to do the same task using > ElementTree. So be aware that the parser can turn very slow > without the C library. > > I'm currently retesting also on TrEMBL, but I don't think there is going > to be any problem. OK - those files are about 10 times bigger, right? > I have no idea of the performances with jython, and similar > derivations of python, nor if it works. 
The tests all pass with Jython 2.5.1 (running under Mac OS X), and here are some timings: uniprot_sprot.fasta, 232 MB, 21s ("fasta") uniprot_sprot.dat, 2.2 GB, 8m34s ("swiss") uniprot_sprot.xml, 4.5 GB, FAILED ("uniprot-xml") The XML file failed almost immediately with this traceback: Traceback (most recent call last): File "../count.py", line 13, in for record in SeqIO.parse(open(filename), format_name): File "../count.py", line 13, in for record in SeqIO.parse(open(filename), format_name): File "/Users/xxx/jython2.5.1/Lib/site-packages/Bio/SeqIO/UniprotIO.py", line 80, in UniprotIterator for event, elem in ElementTree.iterparse(handle, events=("start", "end")): File "/Users/xxx/jython2.5.1/Lib/xml/etree/ElementTree.py", line 937, in next self._parser.feed(data) File "/Users/xxx/jython2.5.1/Lib/xml/etree/ElementTree.py", line 1245, in feed self._parser.Parse(data, 0) File "/Users/xxx/jython2.5.1/Lib/xml/parsers/expat.py", line 195, in Parse self._data.append(data) at java.util.Arrays.copyOf(Arrays.java:2882) at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100) at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:390) at java.lang.StringBuilder.append(StringBuilder.java:119) at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) java.lang.OutOfMemoryError: java.lang.OutOfMemoryError: Java heap space Note this wasn't a simple out of memory error (the machine had GBs free), rather it was heap space. That's a bit frustrating - but Kyle's email suggests things could improve in the next Jython release. Peter From andrea at biocomp.unibo.it Fri Nov 5 14:09:08 2010 From: andrea at biocomp.unibo.it (Andrea Pierleoni) Date: Fri, 5 Nov 2010 19:09:08 +0100 (CET) Subject: [Biopython-dev] Merging Uniprot XML parser? 
In-Reply-To: References: <3cb74578eeedb8825ef75202c909b843.squirrel@lipid.biocomp.unibo.it> Message-ID: <37e194782e740bf5bd2e872bfc6a37d3.squirrel@lipid.biocomp.unibo.it> > I think I have a slightly older version as it only has 519348 entries. > My timings using Python 2.6 on Mac OS X, using looping over the > file with Bio.SeqIO.parse() and incrementing a counter: > > uniprot_sprot.fasta, 232 MB, 15s ("fasta") > uniprot_sprot.dat, 2.2 GB, 4m57s ("swiss") > uniprot_sprot.xml, 4.5 GB, 10m34s ("uniprot-xml") > my timings were without the counter :) > Note the XML file is about twice the size of the plain text swiss > format file, and as you noted, takes about twice as long to parse. > yes it's true, but iterating over the two files takes 18s for .dat one and 38s for .xml one. the information retrieved is more or less the same. the rest is overhead due to the XML file complexity. however it's pretty fast anyway, at least with cElementTree. >> I'm currently retesting also on TrEMBL, but I don't think there is going >> to be any problem. > > OK - those files are about 10 times bigger, right? it's currently 12 millions entries! so it's 24 times bigger (7.5Gb gzipped) in fact I can't complete the test today. I'll keep you updated. > > Note this wasn't a simple out of memory error (the machine had GBs > free), rather it was heap space. That's a bit frustrating - but Kyle's > email suggests things could improve in the next Jython release. > Is the new Jython release coming soon? I'm really a newbie to jython, so I don't think I can help with it. maybe it is safer for jython users to use the 'swiss' parser until the new release came out, particularly if they have performance issues. 
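
The cElementTree versus ElementTree difference reported above is
commonly handled with an import fallback combined with iterparse. A
minimal sketch, assuming a toy XML document rather than the real
UniProt schema (the un-namespaced "uniprot" and "entry" tags and the
count_entries helper are made up for illustration and are not taken
from Bio.SeqIO.UniprotIO):

```python
import io

try:
    # C accelerated parser where it exists as a separate module (older Pythons).
    from xml.etree import cElementTree as ElementTree
except ImportError:
    # Fallback; on recent Pythons this module is already C accelerated.
    from xml.etree import ElementTree


def count_entries(handle, tag="entry"):
    """Count elements named `tag`, clearing each one once it has been seen."""
    count = 0
    for event, elem in ElementTree.iterparse(handle, events=("end",)):
        if elem.tag == tag:
            count += 1
            elem.clear()  # drop the element's children and text to limit memory use
    return count


# Toy input standing in for a (much larger) XML database dump.
toy_xml = (
    b"<uniprot>"
    b"<entry><accession>P1</accession></entry>"
    b"<entry><accession>P2</accession></entry>"
    b"</uniprot>"
)
print(count_entries(io.BytesIO(toy_xml)))
```

Clearing each element after use is what keeps memory roughly flat on
multi-gigabyte files; whether that helps with the Jython heap error
reported earlier in the thread is not something this sketch can show.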
Andrea From mjldehoon at yahoo.com Fri Nov 5 22:41:57 2010 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 5 Nov 2010 19:41:57 -0700 (PDT) Subject: [Biopython-dev] Bio/cMarkovModelmodule.c In-Reply-To: Message-ID: <646748.14362.qm@web62407.mail.re1.yahoo.com> --- On Wed, 11/3/10, Peter wrote: > > I put the warning message in MarkovModel.py anyway, > > since it's very easy to miss if it's in setup.py. > > Do we really need the warning? I guess otherwise people > using this code > might notice a drop in performance if they were using our C > code version, > updated their Biopython, and then get the Python fallback > if their NumPy is too old. We need the warning, otherwise we'd leave the user guessing as to why their code is suddenly slower. > If we do keep the warning should it be silenced in > test_MarkovModel.py? OK I've added this warning. --Michiel. From biopython at maubp.freeserve.co.uk Mon Nov 8 11:12:06 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 8 Nov 2010 16:12:06 +0000 Subject: [Biopython-dev] Continuous integration server In-Reply-To: References: Message-ID: 2010/11/4 Peter : >> As we discussed before, I was thinking in adding an option to >> run_tests.py (like --offline) and change the tests that access the >> Internet to honour that flag. I was thinking in coding this myself and >> then send to the list for approval (I am not going to make big changes >> to the test framework myself without passing them through here). > > Yep, that sounds good. > > The previous discussion is here if anyone missed it: > http://lists.open-bio.org/pipermail/biopython-dev/2010-October/008295.html > Hi Tiago, I've implemented the proposed --offline switch in run_tests.py, https://github.com/biopython/biopython/commit/b6bbcea355a8f71df8654256d8da6ef8b8c02697 Does that work for you ? 
If you can come up with a more
elegant solution do speak up - mine is a bit of a hack ;)

Peter

From tiagoantao at gmail.com Mon Nov 8 11:17:07 2010
From: tiagoantao at gmail.com (Tiago Antão)
Date: Mon, 8 Nov 2010 16:17:07 +0000
Subject: [Biopython-dev] Continuous integration server
In-Reply-To:
References:
Message-ID:

2010/11/8 Peter :
> I've implemented the proposed --offline switch in run_tests.py,
>
> https://github.com/biopython/biopython/commit/b6bbcea355a8f71df8654256d8da6ef8b8c02697
>
> Does that work for you ? If you can come up with a more
> elegant solution do speak up - mine is a bit of a hack ;)

Thanks a lot. I was waiting for the 1.56 release to work on this thing
(to avoid adding entropy). But as this is now in, I will progress
immediately with the rest of the integration server work.
I will contact you soon regarding Mac testing.

From tiagoantao at gmail.com Mon Nov 8 11:34:31 2010
From: tiagoantao at gmail.com (Tiago Antão)
Date: Mon, 8 Nov 2010 16:34:31 +0000
Subject: [Biopython-dev] Bio/Entrez/__init__.py
Message-ID:

Hi,

There is a doctest line that is making 2to3 go bonkers on Bio.Entrez
(__init__.py)
Line 55
            >>> for record in records:
            ...     # each record is a Python dictionary or list.

Simply adding a
            ...     pass

is enough (the code should not work as it is an empty for, so 2to3 is
actually correct)

--
"If you want to get laid, go to college. If you want an education, go to
the library." - Frank Zappa

From biopython at maubp.freeserve.co.uk Mon Nov 8 11:38:08 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 8 Nov 2010 16:38:08 +0000
Subject: [Biopython-dev] Bio/Entrez/__init__.py
In-Reply-To:
References:
Message-ID:

2010/11/8 Tiago Antão :
> Hi,
>
> There is a doctest line that is making 2to3 go bonkers on Bio.Entrez
> (__init__.py)
> Line 55
>             >>> for record in records:
>             ...     # each record is a Python dictionary or list.
>
> Simply adding a
> ...
pass
>
> Is enough (the code should not work as it is an empty for, so 2to3 is
> actually correct)

Ah - that isn't actually being used as a doctest (we don't call it
in run_tests.py) and it wouldn't work if we tried because half
the function arguments are omitted or left as dots.

I like your solution of adding the pass line.

Peter

From mjldehoon at yahoo.com Mon Nov 8 20:22:39 2010
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Mon, 8 Nov 2010 17:22:39 -0800 (PST)
Subject: [Biopython-dev] Bio/Entrez/__init__.py
In-Reply-To:
Message-ID: <365364.32303.qm@web62403.mail.re1.yahoo.com>

I've added this line:

    ...     print record

which should solve the 2to3 error.

--Michiel.

--- On Mon, 11/8/10, Peter wrote:

> From: Peter
> Subject: Re: [Biopython-dev] Bio/Entrez/__init__.py
> To: "Tiago Antão"
> Cc: "BioPython-Dev Mailing List"
> Date: Monday, November 8, 2010, 11:38 AM
> 2010/11/8 Tiago Antão :
> > Hi,
> >
> > There is a doctest line that is making 2to3 go bonkers on Bio.Entrez
> > (__init__.py)
> > Line 55
> >             >>> for record in records:
> >             ...     # each record is a Python dictionary or list.
> >
> > Simply adding a
> > ...       pass
> >
> > Is enough (the code should not work as it is an empty for, so 2to3 is
> > actually correct)
>
> Ah - that isn't actually being used as a doctest (we don't call it
> in run_tests.py) and it wouldn't work if we tried because half
> the function arguments are omitted or left as dots.
>
> I like your solution of adding the pass line.
>
> Peter
>
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>

From tiagoantao at gmail.com Tue Nov 9 04:12:29 2010
From: tiagoantao at gmail.com (Tiago Antão)
Date: Tue, 9 Nov 2010 09:12:29 +0000
Subject: [Biopython-dev] Bio/Entrez/__init__.py
In-Reply-To: <365364.32303.qm@web62403.mail.re1.yahoo.com>
References: <365364.32303.qm@web62403.mail.re1.yahoo.com>
Message-ID:

The buildbot server VM is currently down (Chris is moving it to
another physical location). As soon as the machine is back up, I will
activate the server and maybe we can start activating things on a Mac
architecture.

I was thinking of sending emails to the list (automatically) when a
build that was previously working stops doing so...?

2010/11/9 Michiel de Hoon :
> I've added this line:
>
>     ...     print record
>
> which should solve the 2to3 error.
>
> --Michiel.
>
> --- On Mon, 11/8/10, Peter wrote:
>
>> From: Peter
>> Subject: Re: [Biopython-dev] Bio/Entrez/__init__.py
>> To: "Tiago Antão"
>> Cc: "BioPython-Dev Mailing List"
>> Date: Monday, November 8, 2010, 11:38 AM
>> 2010/11/8 Tiago Antão :
>> > Hi,
>> >
>> > There is a doctest line that is making 2to3 go bonkers on Bio.Entrez
>> > (__init__.py)
>> > Line 55
>> >             >>> for record in records:
>> >             ...     # each record is a Python dictionary or list.
>> >
>> > Simply adding a
>> > ...       pass
>> >
>> > Is enough (the code should not work as it is an empty for, so 2to3 is
>> > actually correct)
>>
>> Ah - that isn't actually being used as a doctest (we don't call it
>> in run_tests.py) and it wouldn't work if we tried because half
>> the function arguments are omitted or left as dots.
>>
>> I like your solution of adding the pass line.
>>
>> Peter
>>
>> _______________________________________________
>> Biopython-dev mailing list
>> Biopython-dev at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>>
>
>
>

--
"If you want to get laid, go to college. If you want an education, go to
the library." - Frank Zappa

From biopython at maubp.freeserve.co.uk Tue Nov 9 04:57:47 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 9 Nov 2010 09:57:47 +0000
Subject: [Biopython-dev] Continuous integration server
Message-ID:

2010/11/9 Tiago Antão :
> The buildbot server VM is currently down (Chris is moving it to
> another physical location). As soon as the machine is back up, I will
> activate the server and maybe we can start activating things on a Mac
> architecture.
>
> I was thinking of sending emails to the list (automatically) when a
> build that was previously working stops doing so...?
>

That sounds worth trying, as it removes the need for us to actively
check the buildbot server's web report. Alternatively we should be
able to use the RSS/Atom feed.

One concern is if we have (say) 8 buildbot slaves, and a change on
the trunk accidentally breaks a unit test (on all platforms), does that
mean we'd get one email or eight?

Peter

From tiagoantao at gmail.com Tue Nov 9 05:14:37 2010
From: tiagoantao at gmail.com (Tiago Antão)
Date: Tue, 9 Nov 2010 10:14:37 +0000
Subject: [Biopython-dev] Continuous integration server
In-Reply-To:
References:
Message-ID:

2010/11/9 Peter :
> That sounds worth trying, as it removes the need for us to actively
> check the buildbot server's web report. Alternatively we should be
> able to use the RSS/Atom feed.

The web interface has RSS and Atom.

> One concern is if we have (say) 8 buildbot slaves, and a change on
> the trunk accidentally breaks a unit test (on all platforms), does that
> mean we'd get one email or eight?

It can be configured to send only 1.
I just cannot promise that I will
get the configuration right the first time ;) . But it can be done.

From biopython at maubp.freeserve.co.uk Tue Nov 9 05:33:26 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 9 Nov 2010 10:33:26 +0000
Subject: [Biopython-dev] Continuous integration server
In-Reply-To:
References:
Message-ID:

2010/11/9 Tiago Antão :
> 2010/11/9 Peter :
>> That sounds worth trying, as it removes the need for us to actively
>> check the buildbot server's web report. Alternatively we should be
>> able to use the RSS/Atom feed.
>
> The web interface has RSS and Atom.

Yet another feed for me to track :)

Emails have the advantage of being logged on the mailing list archive.
Let's try it and see how it goes.

>> One concern is if we have (say) 8 buildbot slaves, and a change on
>> the trunk accidentally breaks a unit test (on all platforms), does that
>> mean we'd get one email or eight?
>
> It can be configured to send only 1. I just cannot promise that I will
> get the configuration right the first time ;) . But it can be done.

I thought they (buildbot) would have considered that example :)

You'll probably need the buildbot server's email address added to the
biopython-dev mailing list's white list - let me know nearer the time.

Peter

From tiagoantao at gmail.com Tue Nov 9 09:07:56 2010
From: tiagoantao at gmail.com (Tiago Antão)
Date: Tue, 9 Nov 2010 14:07:56 +0000
Subject: [Biopython-dev] bugzilla jython platform
Message-ID:

Hi,

Just a minor thingy: would it be possible to have a bugzilla platform
called jython? (Or OS).

I am going to report a bug on Jython and noticed that it is not available.

--
"If you want to get laid, go to college. If you want an education, go to
the library."
- Frank Zappa From bugzilla-daemon at portal.open-bio.org Tue Nov 9 09:09:42 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 9 Nov 2010 09:09:42 -0500 Subject: [Biopython-dev] [Bug 3155] New: Some Phylip tools seem to fail on Jython Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=3155 Summary: Some Phylip tools seem to fail on Jython Product: Biopython Version: Not Applicable Platform: PC OS/Version: Linux Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: tiagoantao at gmail.com According to the integration tests, some Phylip tools seem to fail on Jython. Please see below or http://events.open-bio.org:8010/builders/jython/builds/18 ====================================================================== ERROR: pseudosample a phylip DNA alignment written with AlignIO ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/tantao/test/slave/jython/build/Tests/test_EmbossPhylipNew.py", line 270, in test_bootstrap_AlignIO_DNA self.check_bootstrap("Phylip/opuntia.phy", "phylip") File "/home/tantao/test/slave/jython/build/Tests/test_EmbossPhylipNew.py", line 251, in check_bootstrap raise ValueError("Return code %s from:\n%s" \ ValueError: Return code 1 from: fseqboot -auto -filter -outfile=test_file -sequence=Phylip/opuntia.phy -seqtype=d -reps=2 ====================================================================== ERROR: pseudosample a phylip protein alignment written with AlignIO ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/tantao/test/slave/jython/build/Tests/test_EmbossPhylipNew.py", line 279, in test_bootstrap_AlignIO_protein self.check_bootstrap("Phylip/hedgehog.phy", "phylip", "p") File "/home/tantao/test/slave/jython/build/Tests/test_EmbossPhylipNew.py", line 251, in check_bootstrap raise 
ValueError("Return code %s from:\n%s" \ ValueError: Return code 1 from: fseqboot -auto -filter -outfile=test_file -sequence=Phylip/hedgehog.phy -seqtype=p -reps=2 ====================================================================== ERROR: Calculate distance matrix from an AlignIO written protein alignment ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/tantao/test/slave/jython/build/Tests/test_EmbossPhylipNew.py", line 157, in test_distances_from_protein_AlignIO self.distances_from_alignment("Phylip/hedgehog.phy", DNA=False) File "/home/tantao/test/slave/jython/build/Tests/test_EmbossPhylipNew.py", line 117, in distances_from_alignment raise ValueError("Return code %s from:\n%s" \ ValueError: Return code 1 from: fprotdist -auto -outfile=test_file -sequence=Phylip/hedgehog.phy -method=j ====================================================================== ERROR: Make a parsimony tree from an alignment written with AlignIO ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/tantao/test/slave/jython/build/Tests/test_EmbossPhylipNew.py", line 210, in test_parsimony_tree_from_AlignIO_DNA self.parsimony_tree("Phylip/opuntia.phy", "phylip") File "/home/tantao/test/slave/jython/build/Tests/test_EmbossPhylipNew.py", line 194, in parsimony_tree raise ValueError("Return code %s from:\n%s" \ ValueError: Return code 1 from: fdnapars -auto -stdout -sequence=Phylip/opuntia.phy -outtreefile=test_file ====================================================================== -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From biopython at maubp.freeserve.co.uk Tue Nov 9 09:14:10 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 9 Nov 2010 14:14:10 +0000 Subject: [Biopython-dev] bugzilla jython platform In-Reply-To: References: Message-ID: 2010/11/9 Tiago Ant?o : > Hi, > > Just a minor thingy: would it be possible to have a bugzilla platform > called jython? (Or OS). > > I am going to report a bug on Jython and noticed that it is not available. > It doesn't make sense to me to add Jython as an OS (for one thing, the OS field is used by all the Bio* projects on our bugzilla, also you can run Jython on Windows/Mac/Linux etc). Currently we don't even have a field for the Python version... maybe we should add a whole new (Biopython only) field for this (e.g. with Python 2.4, 2.5, 2.6, 2.7, 3.1, and Jython 2.5 as choices for now). Peter From bugzilla-daemon at portal.open-bio.org Tue Nov 9 09:26:57 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 9 Nov 2010 09:26:57 -0500 Subject: [Biopython-dev] [Bug 3155] Some Phylip tools seem to fail on Jython In-Reply-To: Message-ID: <201011091426.oA9EQvws028228@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3155 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2010-11-09 09:26 EST ------- I realise I don't have EMBOSS phylipnew installed on my machine with Jython, so the test has just been skipped. What version of Jython? What version of EMBOSS, and the phylipnew package? Do these tests pass *on the same machine* if run in normal (C) Python? Alternately, do these four command line examples work when run by hand? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Tue Nov 9 09:55:50 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 9 Nov 2010 09:55:50 -0500 Subject: [Biopython-dev] [Bug 3155] Some Phylip tools seem to fail on Jython In-Reply-To: Message-ID: <201011091455.oA9Eto7n029965@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3155 ------- Comment #2 from tiagoantao at gmail.com 2010-11-09 09:55 EST ------- (In reply to comment #1) > What version of Jython? Jython 2.5.2rc2 > What version of EMBOSS, and the phylipnew package? EMBOSS 6.0.1 Phylip seems 3.68 > Do these tests pass *on the same machine* if run in normal (C) Python? Yep. This is the same machine as the one doing integration testing in C-Python > Alternately, do these four command line examples work when run by hand? No. I've noticed that the example files do not exist! e.g. Phylip/opuntia.phy does not exist. Indeed this should not work, I think -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Nov 9 10:05:29 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 9 Nov 2010 10:05:29 -0500 Subject: [Biopython-dev] [Bug 3155] Some Phylip tools seem to fail on Jython In-Reply-To: Message-ID: <201011091505.oA9F5SxD030383@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3155 ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2010-11-09 10:05 EST ------- (In reply to comment #2) > (In reply to comment #1) > > What version of Jython? > > Jython 2.5.2rc2 Can you easily update to Jython 2.5.2 (actual release)? > > What version of EMBOSS, and the phylipnew package? > > EMBOSS 6.0.1 > Phylip seems 3.68 Your EMBOSS is a bit old, but should be fine. 
> > Do these tests pass *on the same machine* if run in normal (C) Python? > > Yep. This is the same machine as the one doing integration testing in C-Python > Good - that means we can rule out EMBOSS being too old. > > Alternately, do these four command line examples work when run by hand? > > No. I've noticed that the example files do not exist! e.g. Phylip/opuntia.phy > does not exist. Indeed this should not work, I think > The unit tests create Phylip/opuntia.phy at runtime, converted from Clustalw/opuntia.aln -- I'd forgotten about that and it does make testing the individual commands harder. The point here is to ensure the PHYLIP likes what we write out as PHYLIP format. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Nov 9 10:11:37 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 9 Nov 2010 10:11:37 -0500 Subject: [Biopython-dev] [Bug 3155] Some Phylip tools seem to fail on Jython In-Reply-To: Message-ID: <201011091511.oA9FBbaK030580@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3155 ------- Comment #4 from tiagoantao at gmail.com 2010-11-09 10:11 EST ------- > Can you easily update to Jython 2.5.2 (actual release)? rc2 is the most recent. I can do 2.5.*1* -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Tue Nov 9 10:33:39 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 9 Nov 2010 10:33:39 -0500 Subject: [Biopython-dev] [Bug 3155] Some Phylip tools seem to fail on Jython In-Reply-To: Message-ID: <201011091533.oA9FXdSo031629@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3155 ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2010-11-09 10:33 EST ------- (In reply to comment #4) > > Can you easily update to Jython 2.5.2 (actual release)? > > rc2 is the most recent. I can do 2.5.*1* Sorry - my mistake. I have Jython 2.5.1 (final release). I'll try to get EMBOSS phylipnew on this machine (useful anyway as a potential buildbot slave). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Tue Nov 9 17:54:13 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 9 Nov 2010 22:54:13 +0000 Subject: [Biopython-dev] buildbot and setup.py Message-ID: Hi all, For the continuous integration server, it is important to be able to run setup.py without it prompting the user. There are (just?) two potential prompts at the moment. First, if running on Python 3, it asks the user to confirm they have run 2to3 as per the README file. This was done as a bit of a hack - perhaps now that most of the Python code works on Py3 we can avoid this? Second, if running without NumPy, it asks the user if they really want to do this as it is best to install NumPy to use all of Biopython. For the purposes of the buildbot, I think we should have at least one build-slave without NumPy. This should then catch any regressions in the test suite. 
Since Jython doesn't have NumPy (and so we don't prompt about it) then maybe that would double in this role for the test matrix ;) Right now Tiago has solved the first prompt (about 2to3) by piping a "y\n" into stdin. I guess piping two would solve the case of no NumPy on Py3 ;) However, do we need an --auto or --force flag to bypass these yes or no prompts in setup.py? (Meanwhile I'm off to install NumPy under Python 3 on my Linux box which will avoid the issue for now) Peter From tiagoantao at gmail.com Tue Nov 9 19:15:02 2010 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Wed, 10 Nov 2010 00:15:02 +0000 Subject: [Biopython-dev] Continuous integration server In-Reply-To: References: Message-ID: 2010/11/9 Peter : > One concern is if we have (say) 8 builtbot slaves, and a change on > the trunk accidentally breaks a unit test (on all platforms), does that > mean we'd get one email or eight? I was wrong here. It is not possible to send only one email. I misread the documentation. But it is quite simple to extend the mail system (by code) to do this. I least it seems simple: I will have a try at it tomorrow. For now I am only sending automated emails to myself and Peter. If anyone wants to be in the loop, please tell me. As soon as the system is reliable I will send to biopython-dev. From tiagoantao at gmail.com Tue Nov 9 19:21:15 2010 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Wed, 10 Nov 2010 00:21:15 +0000 Subject: [Biopython-dev] Continuous integration server In-Reply-To: References: Message-ID: 2010/11/9 Peter : >> The web interface has RSS and atom. > > Yet another feed for me to track :) In order to minimize the number of feed entries one can specify constraints, useful is just to report failed builds. Like this http://events.open-bio.org:8010/rss?failures_only=true Which only shows entries that relate to failures. 
Tiago From eric.talevich at gmail.com Tue Nov 9 22:04:38 2010 From: eric.talevich at gmail.com (Eric Talevich) Date: Tue, 9 Nov 2010 22:04:38 -0500 Subject: [Biopython-dev] buildbot and setup.py In-Reply-To: References: Message-ID: On Tue, Nov 9, 2010 at 5:54 PM, Peter wrote: > Hi all, > > For the continuous integration server, it is important > to be able to run setup.py without it prompting the > user. There are (just?) two potential prompts at the > moment. > > [...] > However, do we need an --auto or --force flag > to bypass these yes or no prompts in setup.py? > I'd find a flag like that convenient for running setup.py manually, too. For reference: apt-get takes a "-y" option which assumes a "yes" answer to all prompts, just like this. -Eric From biopython at maubp.freeserve.co.uk Wed Nov 10 06:48:30 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 10 Nov 2010 11:48:30 +0000 Subject: [Biopython-dev] test_PopGen_GenePop_EasyController.py failure on Jython Message-ID: Hi Taigo, >From your buildbot log for Jython 2.5.2 (release candidate 2), and also my Mac OS Jython 2.5.1 install, we have a PopGen failure: ====================================================================== FAIL: Test get alleles. ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/tantao/test/slave/jython252lin/build/Tests/test_PopGen_GenePop_EasyController.py", line 57, in test_get_alleles self.assertEqual(self.ctrl.get_alleles(0,"Locus3"), [3, 20]) AssertionError: [20, 3] != [3, 20] Notice that by using the unittest assertEqual method we get to see the values compared: https://github.com/biopython/biopython/commit/06a719be51ecd207b781224d3f57bb5ebb07198a Before the change the output was like this: ====================================================================== FAIL: Test get alleles. 
---------------------------------------------------------------------- Traceback (most recent call last): File "test_PopGen_GenePop_EasyController.py", line 57, in test_get_alleles assert self.ctrl.get_alleles(0,"Locus3") == [3, 20] AssertionError It is interesting that Jython is giving [20, 3] rather than [3, 20]. My guess would be this is down to something python implementation specific like the sort order of dictionaries or sets, in which case the unittest needs to compare sorted lists -- or the get_alleles method needs a sort? Peter From tiagoantao at gmail.com Wed Nov 10 08:05:59 2010 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Wed, 10 Nov 2010 13:05:59 +0000 Subject: [Biopython-dev] test_PopGen_GenePop_EasyController.py failure on Jython In-Reply-To: References: Message-ID: I know, this might be an issue with the jython version (being just a release candidate). I am going to wait for results on 2.5.1 and compare. Or I might just install it myself and see. Is there any reason for the unittest framework to ignore OSErrors? I am getting some OSErrors (just in jython 2.5.2) and they are being ignored (but reported as warnings)... Tiago 2010/11/10 Peter : > Hi Taigo, > > From your buildbot log for Jython 2.5.2 (release candidate 2), and > also my Mac OS > Jython 2.5.1 install, we have a PopGen failure: > > ====================================================================== > FAIL: Test get alleles. > ---------------------------------------------------------------------- > Traceback (most recent call last): > ?File "/home/tantao/test/slave/jython252lin/build/Tests/test_PopGen_GenePop_EasyController.py", > line 57, in test_get_alleles > ? 
?self.assertEqual(self.ctrl.get_alleles(0,"Locus3"), [3, 20]) > AssertionError: [20, 3] != [3, 20] > > Notice that by using the unittest assertEqual method we get to see the > values compared: > https://github.com/biopython/biopython/commit/06a719be51ecd207b781224d3f57bb5ebb07198a > > Before the change the output was like this: > > ====================================================================== > FAIL: Test get alleles. > ---------------------------------------------------------------------- > Traceback (most recent call last): > ?File "test_PopGen_GenePop_EasyController.py", line 57, in test_get_alleles > ? ?assert self.ctrl.get_alleles(0,"Locus3") == [3, 20] > AssertionError > > > It is interesting that Jython is giving [20, 3] rather than [3, 20]. My > guess would be this is down to something python implementation > specific like the sort order of dictionaries or sets, in which case > the unittest needs to compare sorted lists -- or the get_alleles > method needs a sort? > > Peter > -- "If you want to get laid, go to college.? If you want an education, go to the library." - Frank Zappa From biopython at maubp.freeserve.co.uk Wed Nov 10 08:15:16 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 10 Nov 2010 13:15:16 +0000 Subject: [Biopython-dev] test_PopGen_GenePop_EasyController.py failure on Jython In-Reply-To: References: Message-ID: 2010/11/10 Tiago Ant?o : > > I know, this might be an issue with the jython version (being just a > release candidate). I am going to wait for results on 2.5.1 and > compare. Or I might just install it myself and see. I also see the same test_get_alleles failure on the Mac and on Windows 32 using Jython 2.5.1, so it isn't a Jython 2.5.2 relase candidate specific issue. > Is there any reason for the unittest framework to ignore OSErrors? I > am getting some OSErrors (just in jython 2.5.2) and they are being > ignored (but reported as warnings)... 
> > Tiago I've just recently put Jython 2.5.1 on my Windows box, and in addition to the test_get_alleles failure, I also see OSErrors about being unable to delete files (but the F stats test still passes). This seems to be a wider issue, affecting more than just test_PopGen_GenePop_EasyController.py, but it does seem to be OS specific (no problems deleting files in Jython 2.5.1 on my Mac, I've not tried on Linux). Peter From biopython at maubp.freeserve.co.uk Wed Nov 10 09:14:07 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 10 Nov 2010 14:14:07 +0000 Subject: [Biopython-dev] test_PopGen_SimCoal.py on Windows Message-ID: Hi Tiago Is/was test_PopGen_SimCoal.py working for you on Windows? I'm getting "Output directory not created!" under Python 2.6 I've also tried it under Jython 2.5.1 and had to tweak things to find the executable, thus: https://github.com/biopython/biopython/commit/95cba71f7286860fa9cd79843c47b075a2f530a6 Now both Jython 2.5.1 and Python 2.6 give the same error, "Output directory not created!" (progress I suppose). Peter P.S. On the bright side, both the FDist2 and DFDist tests are passing on Windows on Python 2.6 and Jython 2.5.1 now (after a couple of little tweaks). From tiagoantao at gmail.com Wed Nov 10 09:35:31 2010 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Wed, 10 Nov 2010 14:35:31 +0000 Subject: [Biopython-dev] test_PopGen_GenePop_EasyController.py failure on Jython In-Reply-To: References: Message-ID: 2010/11/10 Peter : > I've just recently put Jython 2.5.1 on my Windows box, and > in addition to the test_get_alleles failure, I also see OSErrors > about being unable to delete files (but the F stats test still > passes). This seems to be a wider issue, affecting more than > just test_PopGen_GenePop_EasyController.py, but it does > seem to be OS specific (no problems deleting files in > Jython 2.5.1 on my Mac, I've not tried on Linux). The OSError has to potential to be somewhat nasty (i.e. 
throughout other Bio.* modules) as it is silent. There might be tests failing that report OK. Tiago From tiagoantao at gmail.com Wed Nov 10 09:42:18 2010 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Wed, 10 Nov 2010 14:42:18 +0000 Subject: [Biopython-dev] test_PopGen_SimCoal.py on Windows In-Reply-To: References: Message-ID: 2010/11/10 Peter : > Hi Tiago > > Is/was test_PopGen_SimCoal.py working for you on Windows? > I'm getting "Output directory not created!" under Python 2.6 This code is used 99.99% on Jython (as the fdist/dfdist code and genepop parser, BTW). I happen to test on Linux. I will fire my Windows machine and have a look, but I do not have it at hand. This will have to wait a few hours or a couple of days at most) > Now both Jython 2.5.1 and Python 2.6 give the same error, > "Output directory not created!" (progress I suppose). I cannot test this here, but I am 99% sure that the problem is the executable name (case sensitive on Windows and Mac, maybe even on Windows Jython?). If it is compiled with a capital S (seen happening) it might be a problem. > P.S. On the bright side, both the FDist2 and DFDist tests are > passing on Windows on Python 2.6 and Jython 2.5.1 now > (after a couple of little tweaks). Were they failing on Jython? I do have a reasonable amount of users on my applications (jython based)... -- "If you want to get laid, go to college.? If you want an education, go to the library." - Frank Zappa From biopython at maubp.freeserve.co.uk Wed Nov 10 10:13:27 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 10 Nov 2010 15:13:27 +0000 Subject: [Biopython-dev] test_PopGen_SimCoal.py on Windows In-Reply-To: References: Message-ID: 2010/11/10 Tiago Ant?o : > > 2010/11/10 Peter : >> Hi Tiago >> >> Is/was test_PopGen_SimCoal.py working for you on Windows? >> I'm getting "Output directory not created!" under Python 2.6 > > This code is used 99.99% on Jython (as the fdist/dfdist code and > genepop parser, BTW). 
I happen to test on Linux. > I will fire my Windows machine and have a look, but I do not have it > at hand. This will have to wait a few hours or a couple of days at > most) > > >> Now both Jython 2.5.1 and Python 2.6 give the same error, >> "Output directory not created!" (progress I suppose). > > I cannot test this here, but I am 99% sure that the problem is the > executable name (case sensitive on Windows and Mac, maybe even on > Windows Jython?). If it is compiled with a capital S (seen happening) > it might be a problem. It could also be something with spaces in filenames, much more common on Windows :( >> P.S. On the bright side, both the FDist2 and DFDist tests are >> passing on Windows on Python 2.6 and Jython 2.5.1 now >> (after a couple of little tweaks). > > Were they failing on Jython? I do have a reasonable amount > of users on my applications (jython based)... I tweaked the executable checking in the unit tests, it now looks for all four binaries required, and works on Windows (both Python and Jython) and Mac (both Python and Jython). Peter From biopython at maubp.freeserve.co.uk Wed Nov 10 12:35:37 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 10 Nov 2010 17:35:37 +0000 Subject: [Biopython-dev] test_PopGen_SimCoal.py on Windows In-Reply-To: References: Message-ID: 2010/11/10 Peter : >> >> I cannot test this here, but I am 99% sure that the problem is the >> executable name (case sensitive on Windows and Mac, maybe even on >> Windows Jython?). If it is compiled with a capital S (seen happening) >> it might be a problem. > > It could also be something with spaces in filenames, much > more common on Windows :( > Yep, that was it. Fixed: https://github.com/biopython/biopython/commit/e24f1662b5e619d558fea17c11ddea12c3561e53 I've got my Windows box running as a buildslave now, so fingers crossed it will all be green. 
Peter From lpritc at scri.ac.uk Thu Nov 11 09:12:21 2010 From: lpritc at scri.ac.uk (Leighton Pritchard) Date: Thu, 11 Nov 2010 14:12:21 +0000 Subject: [Biopython-dev] Bioinformatics position Message-ID: We have a bioinformatics post available at SCRI, and would be grateful if you could please bring it to the attention of any colleagues who may be interested in applying. It is advertised at http://www.jobs.ac.uk/job/ABS904/bioinformatics/ and some details are included below: """ Bioinformatics Scottish Crop Research Institute - SCRI SCRI is Scotland's leading Institute for research on plants and their interactions with the environment, particularly in managed ecosystems. Our mission is to conduct excellent research in plant and environmental sciences. Our vision is to deliver innovative products, knowledge and services that enrich the life of the community and address the public goods of environmental sustainability, high quality and healthy food. Post Reference SMB/1/10 Research in the Plant Pathology Programme at SCRI is founded on pathogen genomics, and scientists in the Programme have a strong track record of contributing to whole genome sequencing and genetic analysis of economically important pests and pathogens. The successful candidate will collaborate with other groups in the Programme working on plant-pathogen interactions developing innovative approaches to understand disease processes. This post provides an opportunity to influence biological research of direct impact to agriculture. The ideal candidate would be experienced in manipulating and curating large biological datasets with a record of collaboration and integration with biologists. The successful applicant is expected to have an interest in plant-pathogen interactions and to develop their own research profile. The candidate should have a PhD or equivalent in bioinformatics, biostatistics or a related field.
Informal enquiries from: Leighton.Pritchard at scri.ac.uk or Lesley.Torrance at scri.ac.uk Salary Scale For All Posts: *Band D/E, £26,610 - £37,534 (commensurate with experience) *Appointments to Band F, £42,769 - £47,521 available for exceptional candidates. Candidates willing to apply for a research fellowship to further help establish their own laboratory are encouraged to apply and will, if successful, benefit from generous Institute support throughout the tenure of their fellowship. Further information on the above posts, including how to apply, is available on the SCRI website at http://www.scri.ac.uk/careers/vacancies Closing date - Friday 19th November 2010. The Institute is an equal opportunities employer. """ Many thanks, L. -- Dr Leighton Pritchard MRSC D131, Plant Pathology Programme, SCRI Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA e:lpritc at scri.ac.uk w:http://www.scri.ac.uk/staff/leightonpritchard gpg/pgp: 0xFEFC205C tel:+44(0)1382 562731 x2405 ______________________________________________________ SCRI, Invergowrie, Dundee, DD2 5DA. The Scottish Crop Research Institute is a charitable company limited by guarantee. Registered in Scotland No: SC 29367. Recognised by the Inland Revenue as a Scottish Charity No: SC 006662. DISCLAIMER: This email is from the Scottish Crop Research Institute, but the views expressed by the sender are not necessarily the views of SCRI and its subsidiaries. This email and any files transmitted with it are confidential to the intended recipient at the e-mail address to which it has been addressed. It may not be disclosed or used by any other than that addressee. If you are not the intended recipient you are requested to preserve this confidentiality and you must not use, disclose, copy, print or rely on this e-mail in any way. Please notify postmaster at scri.ac.uk quoting the name of the sender and delete the email from your system.
Although SCRI has taken reasonable precautions to ensure no viruses are present in this email, neither the Institute nor the sender accepts any responsibility for any viruses, and it is your responsibility to scan the email and the attachments (if any). ______________________________________________________ From biopython at maubp.freeserve.co.uk Thu Nov 11 11:45:43 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 11 Nov 2010 16:45:43 +0000 Subject: [Biopython-dev] Uniprot XML parser on TrEmbl In-Reply-To: References: <3cb74578eeedb8825ef75202c909b843.squirrel@lipid.biocomp.unibo.it> Message-ID: On Thu, Nov 11, 2010 at 4:08 PM, Andrea Pierleoni wrote: > I finally found the time, and the 62Gb needed to test the TrEmbl database > in uniprot xml format. Is that the size on disk of the XML file? 62GB is a lot. > the analisis ic currently going, but so far I've been able to parse 1 > million entries out of 12 millions (it will go overnight...) > > I've had just one problem with the entry: Q2LEH1_9ROSI > in the downloaded files, there are multiple organism name fields, one of > wich is empty: > > ... > ? > ? ? > ? ?Populus tomentosa x P. bolleana) x P. tomentosa > var. truncat > ... > > this part of the file is differentially reported on the uniprot server at: > http://www.uniprot.org/uniprot/Q2LEH1.xml > > ... > ? > ?(Populus tomentosa x P. bolleana) x P. tomentosa > var. truncata > ... > > now, given also the missing start parenthesis, I think there is an error > non the downloaded XML file. It sounds like it - have you told UniProt? > I've attached a patch that should cope with this issue. I don't know if > there are more "errors" in the xml file. > the patch was made on the current version of biopython master branch on > github and is valid for commit ?9363c3cdc5f51805f247. 
> > Andrea Checked in, thanks: https://github.com/biopython/biopython/commit/38da3ff264fe180e903cda4c143a7aa9be3d431a Peter From andrea at biocomp.unibo.it Thu Nov 11 11:08:58 2010 From: andrea at biocomp.unibo.it (Andrea Pierleoni) Date: Thu, 11 Nov 2010 17:08:58 +0100 (CET) Subject: [Biopython-dev] Uniprot XML parser on TrEmbl In-Reply-To: References: <3cb74578eeedb8825ef75202c909b843.squirrel@lipid.biocomp.unibo.it> Message-ID: I finally found the time, and the 62Gb needed to test the TrEmbl database in uniprot xml format. The analysis is currently running, but so far I've been able to parse 1 million entries out of 12 million (it will go overnight...) I've had just one problem with the entry: Q2LEH1_9ROSI In the downloaded files, there are multiple organism name fields, one of which is empty: ... Populus tomentosa x P. bolleana) x P. tomentosa var. truncat ... This part of the file is reported differently on the uniprot server at: http://www.uniprot.org/uniprot/Q2LEH1.xml ... (Populus tomentosa x P. bolleana) x P. tomentosa var. truncata ... Now, given also the missing start parenthesis, I think there is an error in the downloaded XML file. I've attached a patch that should cope with this issue. I don't know if there are more "errors" in the xml file. The patch was made on the current version of biopython master branch on github and is valid for commit 9363c3cdc5f51805f247. Andrea -------------- next part -------------- A non-text attachment was scrubbed... Name: UniprotIO.patch Type: / Size: 610 bytes Desc: not available URL: From andrea at biocomp.unibo.it Thu Nov 11 12:15:08 2010 From: andrea at biocomp.unibo.it (Andrea Pierleoni) Date: Thu, 11 Nov 2010 18:15:08 +0100 (CET) Subject: [Biopython-dev] Uniprot XML parser on TrEmbl In-Reply-To: References: <3cb74578eeedb8825ef75202c909b843.squirrel@lipid.biocomp.unibo.it> Message-ID: > > Is that the size on disk of the XML file? 62GB is a lot. yes, my macbook is getting very hot...
> It sounds like it - have you told UniProt? I've notified them, let's see what they say... Anyhow the parser works. I just don't know if we should have an internet browser-like approach interpreting errors, or just be consistent and raise an error if there is a format error. In this case an empty organism name is an error. From biopython at maubp.freeserve.co.uk Thu Nov 11 14:16:57 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 11 Nov 2010 19:16:57 +0000 Subject: [Biopython-dev] test_PopGen_GenePop_EasyController.py failure on Jython In-Reply-To: References: Message-ID: 2010/11/10 Peter : > 2010/11/10 Tiago Antão : >> >> I know, this might be an issue with the jython version (being just a >> release candidate). I am going to wait for results on 2.5.1 and >> compare. Or I might just install it myself and see. > > I also see the same test_get_alleles failure on the Mac and on > Windows 32 using Jython 2.5.1, so it isn't a Jython 2.5.2 release > candidate specific issue. Yes, the order just came from the order of a dict's keys - which is Python implementation dependent. Quick fix committed: https://github.com/biopython/biopython/commit/2aa604e54df02804219e092141bb32728b021a64 If you actually care about the order, then perhaps add a sorted(...) to the get_alleles method itself instead? Peter From biopython at maubp.freeserve.co.uk Thu Nov 11 15:19:05 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 11 Nov 2010 20:19:05 +0000 Subject: [Biopython-dev] Jython on Windows: OSError deleting files Message-ID: Hi all, I recently installed Jython 2.5.1 on Windows XP (32 bit) for use as a build slave. This showed up some new bugs, in particular several problems with trying to delete temp files triggering an OSError. It turns out this can be triggered by trying to delete a file while we still have a handle open on it.
This is a Windows limitation, but we don't see it on normal Python because there the garbage collector closes handles promptly when they go out of scope. The Java garbage collector doesn't do that. See also: http://web.archiveorange.com/archive/v/8tc1Z6ysA03SXedms7TA In particular, I am aware that if given a filename the SeqIO and AlignIO read and parse functions did not explicitly close the handle they open. I was intending to address this with a with statement in Python 2.5+, but it can be solved in Python 2.4 as well. I have started to address this, e.g. https://github.com/biopython/biopython/commit/0fb039b745b0b2ddacf2a6c9ee8afcdb56018f3c https://github.com/biopython/biopython/commit/936ea5f348cc1feea8556d263761e77ce960217e Assuming it will be easier to fix on Python 2.5+, it might be pragmatic to ignore the issue in the short term since it only seems to affect Jython on Windows. Peter From rjalves at igc.gulbenkian.pt Thu Nov 11 17:06:06 2010 From: rjalves at igc.gulbenkian.pt (Renato Alves) Date: Thu, 11 Nov 2010 22:06:06 +0000 Subject: [Biopython-dev] Uniprot parsers Message-ID: <4CDC68CE.9070401@igc.gulbenkian.pt> Hi everyone, With the arrival of the Uniprot XML parser, is the swiss format still going to be maintained? I just clashed with a 'swiss' format parsing problem present in the 1.55b release (and previous releases). Seems like the format might have changed. One random case is [1] where all of the 2nd and following IDs are ignored by the parser. In Ensembl, for instance, the parser only collects the ENST (the 1st) but not the ENSP (2nd) and ENSG (3rd) identifiers. Is this a known issue? Regards, Renato [1] http://www.uniprot.org/uniprot/P31946.txt -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 262 bytes Desc: OpenPGP digital signature URL: From biopython at maubp.freeserve.co.uk Thu Nov 11 17:26:22 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 11 Nov 2010 22:26:22 +0000 Subject: [Biopython-dev] Uniprot parsers In-Reply-To: <4CDC68CE.9070401@igc.gulbenkian.pt> References: <4CDC68CE.9070401@igc.gulbenkian.pt> Message-ID: On Thu, Nov 11, 2010 at 10:06 PM, Renato Alves wrote: > Hi everyone, > > With the arrival of the Uniprot XML parser, is the swiss format still > going to be maintained? Definitely yes in the short term, for one thing the swiss files are smaller and much faster to parse. I suspect UniProt themselves may want to retire the swiss text format at some point, but moving every user over to XML will take some time. > I just clashed with a 'swiss' format parsing problem present in the > 1.55b release (and previous releases). Seems like the format might have > changed. > > One random case is [1] where all of the 2nd and following IDs are > ignored by the parser. In Ensembl, for instance, the parser only > collects the ENST (the 1st) but not the ENSP (2nd) and ENSG (3rd) > identifiers. > > Is this a known issue? > No - could you file a bug one this with a short example to explain what result you get, and what you want. 
Thanks, Peter From bugzilla-daemon at portal.open-bio.org Thu Nov 11 18:09:04 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 11 Nov 2010 18:09:04 -0500 Subject: [Biopython-dev] [Bug 3156] New: UniProt XML and SwissProt parsers silently fail to parse all of database references Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=3156 Summary: UniProt XML and SwissProt parsers silently fail to parse all of database references Product: Biopython Version: Not Applicable Platform: PC OS/Version: Linux Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: rjalves at igc.gulbenkian.pt Example code: from Bio import SeqIO, ExPASy entry = SeqIO.read(ExPASy.get_sprot_raw('P31946'), 'swiss') If you then inspect entry.dbxrefs, you can see that it includes: ['Ensembl:ENST00000353703', 'Ensembl:ENST00000372839'] but not ['Ensembl:ENSP00000300161', 'Ensembl:ENSG00000166913'. 'Ensembl:ENSP00000361930', 'Ensembl:ENSG00000166913'] which are present in the original file as: DR Ensembl; ENST00000353703; ENSP00000300161; ENSG00000166913. DR Ensembl; ENST00000372839; ENSP00000361930; ENSG00000166913. The same happens with the XML format and the new uniprot-xml parser where the original file contains: -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From rjalves at igc.gulbenkian.pt Thu Nov 11 17:32:41 2010 From: rjalves at igc.gulbenkian.pt (Renato Alves) Date: Thu, 11 Nov 2010 22:32:41 +0000 Subject: [Biopython-dev] Uniprot parsers In-Reply-To: <4CDC68CE.9070401@igc.gulbenkian.pt> References: <4CDC68CE.9070401@igc.gulbenkian.pt> Message-ID: <4CDC6F09.9090506@igc.gulbenkian.pt> Actually I just tested the Uniprot-XML parser and it seems to suffer from the same issue... 
It ignores the following XML "properties": Quoting Renato Alves on 11/11/2010 10:06 PM: > Hi everyone, > > With the arrival of the Uniprot XML parser, is the swiss format still > going to be maintained? > > I just clashed with a 'swiss' format parsing problem present in the > 1.55b release (and previous releases). Seems like the format might have > changed. > > One random case is [1] where all of the 2nd and following IDs are > ignored by the parser. In Ensembl, for instance, the parser only > collects the ENST (the 1st) but not the ENSP (2nd) and ENSG (3rd) > identifiers. > > Is this a known issue? > > Regards, > Renato > > [1] http://www.uniprot.org/uniprot/P31946.txt -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 262 bytes Desc: OpenPGP digital signature URL: From bugzilla-daemon at portal.open-bio.org Thu Nov 11 18:50:46 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 11 Nov 2010 18:50:46 -0500 Subject: [Biopython-dev] [Bug 3156] UniProt XML and SwissProt parsers silently fail to parse all of database references In-Reply-To: Message-ID: <201011112350.oABNokG9031101@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3156 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2010-11-11 18:50 EST ------- That was by design, dbxrefs is a flat list and for consistency with other formats we have only stored the primary identifier. Would you regard this as two primary cross references, or six? DR Ensembl; ENST00000353703; ENSP00000300161; ENSG00000166913. DR Ensembl; ENST00000372839; ENSP00000361930; ENSG00000166913 -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
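[Editorial sketch, not part of the original thread] The question raised in the comment above, whether those two DR lines hold two cross references or more, can be explored with a few lines of plain Python. This is only a minimal sketch under stated assumptions: the helper names parse_dr_line and all_dbxrefs are hypothetical (they are not Biopython functions), and the code assumes that every semicolon-separated field after the database name is an identifier. That assumption holds for the Ensembl lines quoted in this bug but not for every database.

```python
def parse_dr_line(line):
    """Split one SwissProt 'DR' line into (database, [identifiers]).

    Hypothetical helper: assumes every field after the database name
    is an identifier, which is true for the Ensembl lines in this bug.
    """
    if not line.startswith("DR "):
        raise ValueError("Not a DR line: %r" % line)
    # Drop the 'DR' line code, surrounding whitespace and the trailing dot.
    body = line[2:].strip().rstrip(".")
    fields = [field.strip() for field in body.split(";")]
    database = fields[0]
    # Skip empty fields and the '-' placeholder used for missing values.
    identifiers = [f for f in fields[1:] if f and f != "-"]
    return database, identifiers


def all_dbxrefs(lines):
    """Return 'Database:identifier' strings, in order, without duplicates."""
    seen = set()
    refs = []
    for line in lines:
        database, identifiers = parse_dr_line(line)
        for identifier in identifiers:
            ref = "%s:%s" % (database, identifier)
            if ref not in seen:
                seen.add(ref)
                refs.append(ref)
    return refs


if __name__ == "__main__":
    dr_lines = [
        "DR Ensembl; ENST00000353703; ENSP00000300161; ENSG00000166913.",
        "DR Ensembl; ENST00000372839; ENSP00000361930; ENSG00000166913.",
    ]
    for ref in all_dbxrefs(dr_lines):
        print(ref)
```

For the two lines quoted in the bug report this yields five distinct references, because the gene identifier ENSG00000166913 appears on both lines. For other databases the trailing fields are often not identifiers at all, so a real fix would probably need per-database handling rather than this blanket split.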
From bugzilla-daemon at portal.open-bio.org Thu Nov 11 18:59:20 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 11 Nov 2010 18:59:20 -0500 Subject: [Biopython-dev] [Bug 3156] UniProt XML and SwissProt parsers silently fail to parse all of database references In-Reply-To: Message-ID: <201011112359.oABNxKcn031294@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3156 ------- Comment #2 from rjalves at igc.gulbenkian.pt 2010-11-11 18:59 EST ------- Five primary references, since ENSG00000166913 appears twice (once per line). More precisely, ENSG = Ensembl Gene ENST = Ensembl Transcript ENSP = Ensembl Protein -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From andrea at biocomp.unibo.it Thu Nov 11 20:02:14 2010 From: andrea at biocomp.unibo.it (Andrea Pierleoni) Date: Fri, 12 Nov 2010 02:02:14 +0100 (CET) Subject: [Biopython-dev] [Bug 3156] UniProt XML and SwissProt parsers silently fail to parse all of database references In-Reply-To: References: Message-ID: <7c21462addfa62e09fd6c42135cc7d76.squirrel@lipid.biocomp.unibo.it> It was by construction also in the XML format; there is also a comment at line 343 of UniprotIO.py to address this issue. To parse this type of data an adapter for each db type should be written, since each DB has different data, and can have different structures. Also note that the Ensembl reference fields have recently undergone a change of format in the XML file: http://www.uniprot.org/docs/xml_news.htm This happened in release 2010_10.
Andrea

From andrea at biocomp.unibo.it Fri Nov 12 05:24:07 2010
From: andrea at biocomp.unibo.it (Andrea Pierleoni)
Date: Fri, 12 Nov 2010 11:24:07 +0100 (CET)
Subject: [Biopython-dev] Uniprot XML parser on TrEmbl
In-Reply-To:
References: <3cb74578eeedb8825ef75202c909b843.squirrel@lipid.biocomp.unibo.it>
Message-ID: <430ea31975638cdd972a3aa01757fa03.squirrel@lipid.biocomp.unibo.it>

With the submitted patch the parser was able to correctly parse 12,347,303
entries in the 62 GB XML file in 2h 13m.
It looks like reasonable performance to me, since you are going to spend
more time downloading the 8 GB gzipped file and decompressing it.

Andrea

From biopython at maubp.freeserve.co.uk Fri Nov 12 05:29:51 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 12 Nov 2010 10:29:51 +0000
Subject: [Biopython-dev] Uniprot XML parser on TrEmbl
In-Reply-To: <430ea31975638cdd972a3aa01757fa03.squirrel@lipid.biocomp.unibo.it>
References: <3cb74578eeedb8825ef75202c909b843.squirrel@lipid.biocomp.unibo.it> <430ea31975638cdd972a3aa01757fa03.squirrel@lipid.biocomp.unibo.it>
Message-ID:

On Fri, Nov 12, 2010 at 10:24 AM, Andrea Pierleoni wrote:
> With the submitted patch the parser was able to correctly parse 12,347,303
> entries in the 62 GB XML file in 2h 13m.

That's good - but I thought the patch broke the unit test so I reverted it
last night. I'll double check this.

> It looks like reasonable performance to me, since you are going to spend
> more time downloading the 8 GB gzipped file and decompressing it.

On the other hand, you only download it once, and will probably only
decompress it once (although you can parse gzipped files from within
Python if you want to), but you will parse it many times.
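As an illustration of parsing a gzipped file from within Python without unpacking it to disk first, here is a minimal standard-library sketch. It is only a stand-in: it counts `entry` elements with `xml.etree.ElementTree.iterparse` rather than building records, and the tiny demo file and the `count_entries` name are made up for the example. With Biopython, the same kind of handle from `gzip.open` would presumably be passed to `Bio.SeqIO.parse(handle, "uniprot-xml")` instead.

```python
import gzip
import os
import tempfile
import xml.etree.ElementTree as ET


def count_entries(gz_path, tag="entry"):
    """Stream a gzipped XML file and count elements with the given tag."""
    count = 0
    with gzip.open(gz_path, "rb") as handle:
        # iterparse reads incrementally, so the decompressed XML is never
        # held in memory (or written to disk) as a whole.
        for _event, elem in ET.iterparse(handle, events=("end",)):
            # Ignore any XML namespace prefix of the form '{uri}'.
            if elem.tag.rsplit("}", 1)[-1] == tag:
                count += 1
                elem.clear()  # release the children of the finished element
    return count


# Tiny made-up file standing in for a large gzipped UniProt XML download.
sample = (
    b"<uniprot>"
    b"<entry><accession>P1</accession></entry>"
    b"<entry><accession>P2</accession></entry>"
    b"</uniprot>"
)
demo_path = os.path.join(tempfile.mkdtemp(), "demo.xml.gz")
with gzip.open(demo_path, "wb") as out:
    out.write(sample)

n_entries = count_entries(demo_path)
print(n_entries)
```

Whether this is faster than decompressing once and re-reading the plain file depends on disk speed versus CPU time, so it is worth timing both on the machine in question.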
My point is it probably could be made faster (if anyone wanted to spend
the time), but it is fast enough already to be useful, and worth having
in Biopython :)

Peter

From andrea at biocomp.unibo.it Fri Nov 12 06:05:43 2010
From: andrea at biocomp.unibo.it (Andrea Pierleoni)
Date: Fri, 12 Nov 2010 12:05:43 +0100 (CET)
Subject: [Biopython-dev] Uniprot XML parser on TrEmbl
In-Reply-To:
References: <3cb74578eeedb8825ef75202c909b843.squirrel@lipid.biocomp.unibo.it> <430ea31975638cdd972a3aa01757fa03.squirrel@lipid.biocomp.unibo.it>
Message-ID: <6c12e6fda6bab033738ed36d74d2a24a.squirrel@lipid.biocomp.unibo.it>

> That's good - but I thought the patch broke the unit test so I reverted it
> last night. I'll double check this.
>

Yes, I've seen it on GitHub. Can you fix it?

> On the other hand, you only download it once, and will probably only
> decompress it once (although you can parse gzipped files from within
> python if you want to), but you will parse it many times.
>

Well, if you're looking at performance, you're not going to scan a 62 GB
file each time you search for an entry; you're going to index it. Then of
course it depends on what you are doing... but, given the monthly release,
maybe you're downloading and decompressing (or parsing a compressed file)
once a month.

> My point is it probably could be made faster (if anyone wanted to spend
> the time), but it is fast enough already to be useful, and worth having
> in Biopython :)

Yes, I hope it can be made faster, but I have no idea how, since the
process is very straightforward. I have not profiled the parser, so I
cannot exclude some bottleneck.
The only obvious speed-up would be using the multiprocessing library on
multi-CPU systems, but I've never seen it used in Biopython.
It should be really easy to implement, and maybe we can think about it
after Python 2.4 support is dropped. As far as I know, multiprocessing is
included in Python 2.6 and available in Python 2.5.
On the other hand, Biopython has the fastest uniprot XML parser among Bio* projects and (to my knowledge) the fastest public parser on the planet ;) I bet Uniprot guys have their parser... Andrea From biopython at maubp.freeserve.co.uk Fri Nov 12 07:00:42 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 12 Nov 2010 12:00:42 +0000 Subject: [Biopython-dev] Uniprot XML parser on TrEmbl In-Reply-To: <6c12e6fda6bab033738ed36d74d2a24a.squirrel@lipid.biocomp.unibo.it> References: <3cb74578eeedb8825ef75202c909b843.squirrel@lipid.biocomp.unibo.it> <430ea31975638cdd972a3aa01757fa03.squirrel@lipid.biocomp.unibo.it> <6c12e6fda6bab033738ed36d74d2a24a.squirrel@lipid.biocomp.unibo.it> Message-ID: On Fri, Nov 12, 2010 at 11:05 AM, Andrea Pierleoni wrote: > >> That's good - but I thought the patch broke the unit test so I reverted it >> last night. I'll double check this. >> > > yes I've seen it in github, can you fix it? > Probably. I'll make time to look at it before the Biopython 1.56 release (which is unlikely to happen this week, delayed by the identification of some problems running under Jython on Windows). >> On the other hand, you only download it once, and will probably only >> decompress it once (although you can parse gzipped files from within >> python if you want to), but you will parse it many times. >> > > well, if your looking to performance, you're not scanning a 62Gb file > each time you search for an entry, but your going to index it. the of > course it depends on what you are doing... but, given the monthly > release, maybe you're downloading and decompressing (or parsing > a compressed file) once a month. Yeah, it depends. >> My point is it probably could be made faster (if anyone wanted to spend >> the time), but it is fast enough already to be useful, and worth having >> in Biopython :) > > Yes, I hope it can be made faster, but I have no idea about this, since > the process is very straightforward. 
> I did not make any profiling of the
> parser, so I cannot exclude some bottleneck.

That would be worthwhile at some point.

> the only obvious speed up would be using the multiprocessing library in
> multi-cpu system, but I've never seen it used in biopython.

We haven't been able to due to the Python 2.4 requirement, but
I know of people using Biopython and multiprocessing together.

> It should be really easy to implement, and maybe we can think about
> it after python 2.4 support is dropped. as far as i know, multiprocessing
> is included in python 2.6 and available in python 2.5.

Personally I'd try profiling the current single threaded code before
going to multiprocessing.

> On the other hand, Biopython has the fastest uniprot XML parser
> among Bio* projects and (to my knowledge) the fastest public
> parser on the planet ;) I bet Uniprot guys have their parser...

Which of the other Bio* projects have a Uniprot XML parser?
(Or was that intended as a joke?)

Peter

From p.j.a.cock at googlemail.com Fri Nov 12 12:18:52 2010
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 12 Nov 2010 17:18:52 +0000
Subject: [Biopython-dev] test_PopGen_GenePop_EasyController.py failure on Jython
In-Reply-To:
References:
Message-ID:

Hi all,

I've exchanged a few emails with Tiago off list regarding an inconsistent
test_PopGen_GenePop_EasyController.py problem (most visible on
Jython), giving error "Unable to open file genepop.txt".

I've just had it from Python 2.7 on a 32bit Linux machine:

======================================================================
ERROR: test_get_avg_fst_pair (test_PopGen_GenePop_EasyController.AppTest)
Test get pairwise Fst.
---------------------------------------------------------------------- Traceback (most recent call last): File "/home/pjcock/repositories/biopython/Tests/test_PopGen_GenePop_EasyController.py", line 98, in test_get_avg_fst_pair pop_fis = self.ctrl.get_avg_fst_pair() File "/home/pjcock/repositories/biopython/build/lib.linux-i686-2.7/Bio/PopGen/GenePop/EasyController.py", line 162, in get_avg_fst_pair return self._controller.calc_fst_pair(self._fname)[1] File "/home/pjcock/repositories/biopython/build/lib.linux-i686-2.7/Bio/PopGen/GenePop/Controller.py", line 819, in calc_fst_pair self._run_genepop([".ST2", ".MIG"], [6,2], fname) File "/home/pjcock/repositories/biopython/build/lib.linux-i686-2.7/Bio/PopGen/GenePop/Controller.py", line 296, in _run_genepop % (ret, e_out.strip().split("\n",1)[0])) IOError: GenePop error -11, Unable to open file genepop.txt ====================================================================== ERROR: test_get_avg_fst_pair_locus (test_PopGen_GenePop_EasyController.AppTest) Test get average Fst for pairwise pops on a locus. 
---------------------------------------------------------------------- Traceback (most recent call last): File "/home/pjcock/repositories/biopython/Tests/test_PopGen_GenePop_EasyController.py", line 93, in test_get_avg_fst_pair_locus self.assertEqual(len(self.ctrl.get_avg_fst_pair_locus("Locus4")), 45) File "/home/pjcock/repositories/biopython/build/lib.linux-i686-2.7/Bio/PopGen/GenePop/EasyController.py", line 166, in get_avg_fst_pair_locus iter = self._controller.calc_fst_pair(self._fname)[0] File "/home/pjcock/repositories/biopython/build/lib.linux-i686-2.7/Bio/PopGen/GenePop/Controller.py", line 819, in calc_fst_pair self._run_genepop([".ST2", ".MIG"], [6,2], fname) File "/home/pjcock/repositories/biopython/build/lib.linux-i686-2.7/Bio/PopGen/GenePop/Controller.py", line 296, in _run_genepop % (ret, e_out.strip().split("\n",1)[0])) IOError: GenePop error -11, Unable to open file genepop.txt ---------------------------------------------------------------------- This failed twice in a row, then passed four times in a row (Linux, Python 2.7). I suspect the issue was related to machine IO load - during the first tests I had something compiling at the same time. I can't reproduce it on demand :( I've also seen it on the Mac with Apple's Python 2.6 (although usually it is usually fine). However, I'm seeing this (consistently?) with Jython 2.5.1 on the Mac. Peter From biopython at maubp.freeserve.co.uk Fri Nov 12 12:47:22 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 12 Nov 2010 17:47:22 +0000 Subject: [Biopython-dev] Biopython 1.56 release plans In-Reply-To: References: Message-ID: On Thu, Nov 4, 2010 at 5:13 PM, Peter wrote: > Hi all, > > I've mentioned in recent threads that I think we should try and > release Biopython 1.56 this month (November 2010). > > I think the NEWS file is pretty up to date, and covers important > new functionality like Andrea Pierleoni's UniProt XML parser > and the IMGT support (with Uri Laserson). 
> > Is there any other functionality which is ready for merging? > > For example, Tiago - you've been doing lots of work on your > branch with the PopGen code. Is that code ready? I'm willing > to do the git merge/rebase. > > Is there any reason to bother with a beta release this time? > > If there are no pressing additions, I may be able to do the > release tomorrow - otherwise how about aiming for Thursday > or Friday next week (11 or 12 November)? As people will have noticed, the release didn't happen this week. Tiago has been doing some excellent work with the prototype buildbot server (see http://events.open-bio.org:8010/grid for the current temporary home), and as part of this we've set up a few machines as buildslaves. See this thread: http://lists.open-bio.org/pipermail/biopython-dev/2010-November/008376.html Running under Jython on the Mac showed a few problems which appear to now be sorted, other than an apparent problem with the GenePop tool. Unfortunately running under Jython on Windows XP has revealed several new problems, e.g. http://lists.open-bio.org/pipermail/biopython-dev/2010-November/008431.html As things stand all the tests (*) are fine on "C" Python on Linux, Mac, and Windows. They are also fine on Jython on Linux, give some warnings on Jython on Mac, and 3 errors on Windows. Hopefully we can address these three test failures (or at least understand them) and do Biopython 1.56 at the end of next week instead. Peter (*) We haven't audited all the slave test output to check which tests are being skipped due to missing optional dependencies yet. e.g. command line tools, or Python modules like ReportLab or NetworkX. 
From p.j.a.cock at googlemail.com Fri Nov 12 12:55:57 2010
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 12 Nov 2010 17:55:57 +0000
Subject: [Biopython-dev] test_PopGen_GenePop_EasyController.py failure on Jython
In-Reply-To:
References:
Message-ID:

2010/11/12 Peter Cock:
> Hi all,
>
> I've exchanged a few emails with Tiago off list regarding an inconsistent
> test_PopGen_GenePop_EasyController.py problem (most visible on
> Jython), giving error "Unable to open file genepop.txt".
>
> I've just had it from Python 2.7 on a 32bit Linux machine:
>
> ======================================================================
> ERROR: test_get_avg_fst_pair (test_PopGen_GenePop_EasyController.AppTest)
> Test get pairwise Fst.
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/home/pjcock/repositories/biopython/Tests/test_PopGen_GenePop_EasyController.py",
> line 98, in test_get_avg_fst_pair
>     pop_fis = self.ctrl.get_avg_fst_pair()
>   File "/home/pjcock/repositories/biopython/build/lib.linux-i686-2.7/Bio/PopGen/GenePop/EasyController.py",
> line 162, in get_avg_fst_pair
>     return self._controller.calc_fst_pair(self._fname)[1]
>   File "/home/pjcock/repositories/biopython/build/lib.linux-i686-2.7/Bio/PopGen/GenePop/Controller.py",
> line 819, in calc_fst_pair
>     self._run_genepop([".ST2", ".MIG"], [6,2], fname)
>   File "/home/pjcock/repositories/biopython/build/lib.linux-i686-2.7/Bio/PopGen/GenePop/Controller.py",
> line 296, in _run_genepop
>     % (ret, e_out.strip().split("\n",1)[0]))
> IOError: GenePop error -11, Unable to open file genepop.txt
>
> ======================================================================
> ERROR: test_get_avg_fst_pair_locus (test_PopGen_GenePop_EasyController.AppTest)
> Test get average Fst for pairwise pops on a locus.
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/home/pjcock/repositories/biopython/Tests/test_PopGen_GenePop_EasyController.py",
> line 93, in test_get_avg_fst_pair_locus
>     self.assertEqual(len(self.ctrl.get_avg_fst_pair_locus("Locus4")), 45)
>   File "/home/pjcock/repositories/biopython/build/lib.linux-i686-2.7/Bio/PopGen/GenePop/EasyController.py",
> line 166, in get_avg_fst_pair_locus
>     iter = self._controller.calc_fst_pair(self._fname)[0]
>   File "/home/pjcock/repositories/biopython/build/lib.linux-i686-2.7/Bio/PopGen/GenePop/Controller.py",
> line 819, in calc_fst_pair
>     self._run_genepop([".ST2", ".MIG"], [6,2], fname)
>   File "/home/pjcock/repositories/biopython/build/lib.linux-i686-2.7/Bio/PopGen/GenePop/Controller.py",
> line 296, in _run_genepop
>     % (ret, e_out.strip().split("\n",1)[0]))
> IOError: GenePop error -11, Unable to open file genepop.txt
>
> ----------------------------------------------------------------------
>
>
> This failed twice in a row, then passed four times in a row (Linux, Python 2.7).
> I suspect the issue was related to machine IO load - during the first
> tests I had something compiling at the same time. I can't reproduce
> it on demand :(
>
> I've also seen it on the Mac with Apple's Python 2.6 (although usually it is
> usually fine).
>
> However, I'm seeing this (consistently?) with Jython 2.5.1 on the Mac.

Well right now on my Mac with Jython, the test passes but with
lots of warnings:

$ jython test_PopGen_GenePop_EasyController.py
Test basic info. ... ok
Test Nm estimation. ... ok
Test allele frequency. ... ok
Test get alleles. ... ok
Test get alleles for all populations. ... ok
Test average Fis. ... ok
Test get pairwise Fst. ... ok
Test get average Fst for pairwise pops on a locus. ...
Exception OSError: [Errno 0] couldn't delete file: 'big.gen.INF' in > ignored Exception OSError: [Errno 0] couldn't delete file: 'big.gen.IN2' in > ignored ok Test F stats. ... ok Test get Fis. ... Exception OSError: [Errno 0] couldn't delete file: 'big.gen.ST2' in > ignored ok Test genotype count. ... ok Test heterozygosity info. ... Exception OSError: [Errno 0] couldn't delete file: 'big.gen.INF' in > ignored Exception OSError: [Errno 0] couldn't delete file: 'big.gen.IN2' in > ignored ok Test multilocus F stats. ... ok ---------------------------------------------------------------------- Ran 13 tests in 5.912s Or another example, the same machine as a build slave: http://events.open-bio.org:8010/builders/OS%20X%2010.6%20Snow%20Leopard%20-%20Jython%202.5.1/builds/9/steps/shell/logs/stdio On the previous build Jython on Mac gave the same error I reported above on Linux with "C" Python 2.7: http://events.open-bio.org:8010/builders/OS%20X%2010.6%20Snow%20Leopard%20-%20Jython%202.5.1/builds/7/steps/shell/logs/stdio Peter From andrea at biocomp.unibo.it Fri Nov 12 15:45:24 2010 From: andrea at biocomp.unibo.it (Andrea Pierleoni) Date: Fri, 12 Nov 2010 21:45:24 +0100 (CET) Subject: [Biopython-dev] Uniprot XML parser on TrEmbl Message-ID: <5c0bc5f9bead03ed216fafaff35c709b.squirrel@lipid.biocomp.unibo.it> > We haven't been able to due to the Python 2.4 requirement, but > I know of people using Biopython and multiprocessing together. > good > Personally I'd try profiling the current single threaded code before > going to multiprocessing. > yes, of course. >> On the other hand, Biopython has the fastest uniprot XML parse >> among Bio* projects and (to my knowledge) the fastest public >> parser on the planet ;) I bet Uniprot guys have their parser... > > Which of the other Bio* projects have a Uniprot XML parser? > (Or was that intended as a joke?) > It was both a joke and a matter of fact, since I don't know about other publicly available parsers. 
Usually I look at a glass as half full... Andrea From gawbul at gmail.com Sat Nov 13 16:24:43 2010 From: gawbul at gmail.com (Steve Moss) Date: Sat, 13 Nov 2010 21:24:43 +0000 Subject: [Biopython-dev] Developing for the BioPython project... Message-ID: Hi all, I've just started a PhD centring around evolutionary comparative genomics, and will be focusing on bioinformatics and computational biology methodology. I'm really keen to use Python and BioPython in particular throughout my PhD and would like to contribute any code I can to aid in promoting BioPython as viable alternative to BioPerl, which I feel has a larger user base currently? Is there any particular process of registration to become involved with development, or is it just a case of fork'ing the repository from github? Cheers, Steve -- Kindest regards, Steve Moss http://stevemoss.ath.cx From eric.talevich at gmail.com Sat Nov 13 18:05:24 2010 From: eric.talevich at gmail.com (Eric Talevich) Date: Sat, 13 Nov 2010 18:05:24 -0500 Subject: [Biopython-dev] Developing for the BioPython project... In-Reply-To: References: Message-ID: On Sat, Nov 13, 2010 at 4:24 PM, Steve Moss wrote: > Hi all, > > I've just started a PhD centring around evolutionary comparative genomics, > and will be focusing on bioinformatics and computational biology > methodology. > > I'm really keen to use Python and BioPython in particular throughout my PhD > and would like to contribute any code I can to aid in promoting BioPython > as > viable alternative to BioPerl, which I feel has a larger user > base currently? Is there any particular process of registration to become > involved with development, or is it just a case of fork'ing the repository > from github? > > Hi Steve, If you've joined the biopython-dev mailing list, you're in the club. Feel free to fork away! 
To get a feel for where development is focused right now, you can look at
our wiki page for active projects:
http://biopython.org/wiki/Active_projects

We're also collectively working on Python 3 compatibility (C extensions
still need some work), though that isn't listed.

Since you're a new grad student, you might have some leeway to get involved
with Google Summer of Code next summer. The project ideas for Biopython,
Open Bio, and NESCent drummed up last year are still worth doing, or might
inspire you to do something else on your own:
http://biopython.org/wiki/Google_Summer_of_Code
http://www.open-bio.org/wiki/Google_Summer_of_Code
https://www.nescent.org/wg_phyloinformatics/Phyloinformatics_Summer_of_Code_2010

Cheers,
Eric

From biopython at maubp.freeserve.co.uk Mon Nov 15 09:34:40 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 15 Nov 2010 14:34:40 +0000
Subject: [Biopython-dev] FASTA filtering by ID
Message-ID:

Hi all,

Something I want to do in several of my workflows is to filter a FASTA
file (or potentially other format sequence files) using a list of desired
identifiers (e.g. a column from a tabular file).

Right now I can achieve this with three steps in Galaxy. Suppose I have:

Dataset #1, FASTA file
Dataset #2, Tabular file with identifiers of interest (e.g. BLAST hits,
or filtered output from a sequence analysis tool)

Then:

Create tabular Dataset #3 using FASTA-to-tabular on Dataset #1,
subject to the enhancement proposed here:
http://lists.bx.psu.edu/pipermail/galaxy-dev/2010-November/003717.html

Create tabular Dataset #4 using join on Datasets #2 and #3 using the
matched identifier columns. This does the filtering.

Create FASTA Dataset #5 using tabular-to-FASTA on Dataset #4.

This works (at least for reasonably sized datasets), but requires three
steps and the creation of at least two temporary files. I'd like to
introduce another tool under "FASTA manipulation" to do it in one
step (rather than three).
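The one-step filter described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the proposed Galaxy tool: `filter_fasta` is a hypothetical name, the sample data is invented, and it assumes the identifier is the first whitespace-delimited word of each FASTA header and the first column of the tabular file.

```python
def filter_fasta(lines, wanted_ids):
    """Yield the lines of those FASTA records whose ID is in wanted_ids."""
    keep = False
    for line in lines:
        if line.startswith(">"):
            # The record ID is taken to be the first word after '>'.
            header = line[1:].split()
            keep = bool(header) and header[0] in wanted_ids
        if keep:
            yield line


# Invented stand-in for Dataset #1 (FASTA).
fasta = """>seq1 first
ACGT
>seq2 second
GGCC
TTAA
>seq3 third
AAAA
""".splitlines(True)

# Invented stand-in for Dataset #2 (tabular, identifier in column 1).
table = ["seq3\t95.0", "seq1\t88.2"]
wanted = set(row.split("\t")[0] for row in table)

filtered = "".join(filter_fasta(fasta, wanted))
print(filtered)
```

Because the FASTA file is streamed once and only the identifier set is held in memory, no temporary tabular files are needed. Records come out in FASTA order rather than in the order of the tabular file, which may or may not be what a given workflow wants.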
Am I going against the apparent Galaxy ideal that complex manipulations should be done with tabular files? Would such a FASTA filter tool be of interest to add directly to Galaxy (e.g. under the "FASTA manipulation" section), or better off on the community tool shed? Thanks, Peter From biopython at maubp.freeserve.co.uk Mon Nov 15 12:05:00 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 15 Nov 2010 17:05:00 +0000 Subject: [Biopython-dev] Biopython 1.56 release plans In-Reply-To: References: Message-ID: On Fri, Nov 12, 2010 at 5:47 PM, Peter wrote: > On Thu, Nov 4, 2010 at 5:13 PM, Peter wrote: >> Hi all, >> >> I've mentioned in recent threads that I think we should try and >> release Biopython 1.56 this month (November 2010). >> >> ... > > As people will have noticed, the release didn't happen this week. > > ... > > Unfortunately running under Jython on Windows XP has > revealed several new problems, e.g. > http://lists.open-bio.org/pipermail/biopython-dev/2010-November/008431.html > > ... > > Hopefully we can address these three test failures (or > at least understand them) and do Biopython 1.56 at > the end of next week instead. Two of the problems on Jython on Windows were down to the Windows specific command line tool detection not being used, now fixed: https://github.com/biopython/biopython/commit/db41d7e4bfd8f5d4ea44bf8254334fcd7b76474f https://github.com/biopython/biopython/commit/7e5b71093c8408de140de1937480e26aaaa5daf1 There was also a heap space problem solved by a more memory efficient __getitem__ method for the UnknownSeq object (still room for improvement here). https://github.com/biopython/biopython/commit/125d8d31d07f57628c231286afae99a178e6f2c5 So, we now have a clean bill of health from the offline tests run on the buildslaves (apart from the occasional GenePop failure where retesting can make it work). 
I still want to look at the SeqIO/AlignIO handle issue, http://lists.open-bio.org/pipermail/biopython-dev/2010-November/008431.html and also the UniProt XML issue, http://lists.open-bio.org/pipermail/biopython-dev/2010-November/008440.html Peter From peter at maubp.freeserve.co.uk Thu Nov 18 10:47:08 2010 From: peter at maubp.freeserve.co.uk (Peter) Date: Thu, 18 Nov 2010 15:47:08 +0000 Subject: [Biopython-dev] Dropping Python 2.4 Support? Message-ID: Dear Biopythoneers, Are any of you still using Biopython on Python 2.4? http://news.open-bio.org/news/2010/11/dropping-python24-support/ Please get in touch if dropping support for Python 2.4 would be a problem. Otherwise we plan for Biopython 1.56 (expected by the end of this month) to be our last release to work with Python 2.4. Thanks, Peter From biopython at maubp.freeserve.co.uk Thu Nov 18 12:45:30 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 18 Nov 2010 17:45:30 +0000 Subject: [Biopython-dev] FASTA filtering by ID In-Reply-To: References: Message-ID: Sorry folk - I meant to post that to the Galaxy development mailing list, http://lists.bx.psu.edu/listinfo/galaxy-dev Peter From biopython at maubp.freeserve.co.uk Wed Nov 24 13:03:03 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 24 Nov 2010 18:03:03 +0000 Subject: [Biopython-dev] Uniprot XML parser on TrEmbl In-Reply-To: <5c0bc5f9bead03ed216fafaff35c709b.squirrel@lipid.biocomp.unibo.it> References: <5c0bc5f9bead03ed216fafaff35c709b.squirrel@lipid.biocomp.unibo.it> Message-ID: Hi Andrea, I *think* I have fixed the problem with empty names in the UniProt XML format, without affecting the unit tests, but I don't have the 62GB free to unpack uniprot_trembl.xml.gz to try it out: https://github.com/biopython/biopython/commit/bb971b2a7384d42d9a6e4994e59299a90e6cc700 Would you be able to retest the trunk code on that please? 
I also changed the handling of the organism host (where present) in both the UniProt and SwissProt parsers to be more consistent. I've checked uniprot_sprot.dat still parses, but haven't tried the much bigger uniprot_trembl.dat from uniprot_trembl.dat.gz - so again, would you be able to retest the "swiss" text parser too? Many thanks, Peter P.S. Did you get any reply from UniProt about the apparent error in the Q2LEH1 record within uniprot_trembl.xml.gz? From andrea at biocomp.unibo.it Thu Nov 25 11:09:28 2010 From: andrea at biocomp.unibo.it (Andrea Pierleoni) Date: Thu, 25 Nov 2010 17:09:28 +0100 (CET) Subject: [Biopython-dev] Uniprot XML parser on TrEmbl In-Reply-To: References: <5c0bc5f9bead03ed216fafaff35c709b.squirrel@lipid.biocomp.unibo.it> Message-ID: <17fb1526d4af40ebbe4e6129d1bd0c2c.squirrel@lipid.biocomp.unibo.it> > Hi Andrea, > > I *think* I have fixed the problem with empty names in the UniProt XML > format, without affecting the unit tests, but I don't have the 62GB free > to > unpack uniprot_trembl.xml.gz to try it out: > > https://github.com/biopython/biopython/commit/bb971b2a7384d42d9a6e4994e59299a90e6cc700 > > Would you be able to retest the trunk code on that please? > I've just completed a run on the 8Gb gzipped trembl file (I don't have the free 62Gb either) an it was ok, with zero errors. By the way it took just 2h 18m, the same time it took on the uncompressed 62Gb XML file. So it's definitely better not to decompress this file... > I also changed the handling of the organism host (where present) > in both the UniProt and SwissProt parsers to be more consistent. good > I've checked uniprot_sprot.dat still parses, but haven't tried the > much bigger uniprot_trembl.dat from uniprot_trembl.dat.gz - so > again, would you be able to retest the "swiss" text parser too? I'll test this too and let you know. > > Many thanks, > > Peter > > P.S. 
Did you get any reply from UniProt about the apparent error in > the Q2LEH1 record within uniprot_trembl.xml.gz? > Unfortunately not. Andrea From andrea at biocomp.unibo.it Fri Nov 26 08:54:29 2010 From: andrea at biocomp.unibo.it (Andrea Pierleoni) Date: Fri, 26 Nov 2010 14:54:29 +0100 (CET) Subject: [Biopython-dev] Uniprot XML parser on TrEmbl In-Reply-To: <17fb1526d4af40ebbe4e6129d1bd0c2c.squirrel@lipid.biocomp.unibo.it> References: <5c0bc5f9bead03ed216fafaff35c709b.squirrel@lipid.biocomp.unibo.it> <17fb1526d4af40ebbe4e6129d1bd0c2c.squirrel@lipid.biocomp.unibo.it> Message-ID: <1f693f5d96187fcc44a180d1e7c55a3d.squirrel@lipid.biocomp.unibo.it> >> I've checked uniprot_sprot.dat still parses, but haven't tried the >> much bigger uniprot_trembl.dat from uniprot_trembl.dat.gz - so >> again, would you be able to retest the "swiss" text parser too? > > I'll test this too and let you know. > Test completed on the .dat file, all entries were parsed without errors. This time it took almost 3h but was done on the gzipped file stored in a removable 5400rpm hard drive. the XML file was on an SSD so maybe that's why it is faster with that parser. From biopython at maubp.freeserve.co.uk Fri Nov 26 09:06:58 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 26 Nov 2010 14:06:58 +0000 Subject: [Biopython-dev] Uniprot XML parser on TrEmbl In-Reply-To: <1f693f5d96187fcc44a180d1e7c55a3d.squirrel@lipid.biocomp.unibo.it> References: <5c0bc5f9bead03ed216fafaff35c709b.squirrel@lipid.biocomp.unibo.it> <17fb1526d4af40ebbe4e6129d1bd0c2c.squirrel@lipid.biocomp.unibo.it> <1f693f5d96187fcc44a180d1e7c55a3d.squirrel@lipid.biocomp.unibo.it> Message-ID: On Fri, Nov 26, 2010 at 1:54 PM, Andrea Pierleoni wrote: > >>> I've checked uniprot_sprot.dat still parses, but haven't tried the >>> much bigger uniprot_trembl.dat from uniprot_trembl.dat.gz - so >>> again, would you be able to retest the "swiss" text parser too? >> >> I'll test this too and let you know. 
>> > > Test completed on the .dat file, all entries were parsed without errors. > This time it took almost 3h but was done on the gzipped file stored in a > removable 5400rpm hard drive. the XML file was on an SSD so maybe that's > why it is faster with that parser. > Excellent - thanks. Peter From biopython at maubp.freeserve.co.uk Fri Nov 26 09:08:59 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 26 Nov 2010 14:08:59 +0000 Subject: [Biopython-dev] git freeze for Biopython 1.56 Message-ID: Hi all, No one has raised any outstanding issues to warrant delaying the 1.56 release any further, so I plan to do it now. Please don't make any commits to the master branch until further notice. Thank you, Peter From biopython at maubp.freeserve.co.uk Fri Nov 26 10:19:20 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 26 Nov 2010 15:19:20 +0000 Subject: [Biopython-dev] git freeze for Biopython 1.56 In-Reply-To: References: Message-ID: On Fri, Nov 26, 2010 at 2:08 PM, Peter wrote: > Hi all, > > No one has raised any outstanding issues to warrant delaying > the 1.56 release any further, so I plan to do it now. Please don't > make any commits to the master branch until further notice. > > Thank you, > > Peter I think that's the source code bundles and Windows installers all done and uploaded, plus the PyPI upload done. I'll work on a release announcement for the news server and mailing list. In the meantime, if anyone could check the files as a sanity test (just in case I missed something), please do. 
Get them from here: http://biopython.org/DIST/ Thanks, Peter From biopython at maubp.freeserve.co.uk Fri Nov 26 11:07:48 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 26 Nov 2010 16:07:48 +0000 Subject: [Biopython-dev] git freeze for Biopython 1.56 In-Reply-To: References: Message-ID: On Fri, Nov 26, 2010 at 3:19 PM, Peter wrote: > On Fri, Nov 26, 2010 at 2:08 PM, Peter wrote: >> Hi all, >> >> No one has raised any outstanding issues to warrant delaying >> the 1.56 release any further, so I plan to do it now. Please don't >> make any commits to the master branch until further notice. >> >> Thank you, >> >> Peter > > I think that's the source code bundles and Windows installers > all done and uploaded, plus the PyPI upload done. I'll work on > a release announcement for the news server and mailing list. > Posted online, http://news.open-bio.org/news/2010/11/biopython-1-56-released/ If anyone spots a typo please drop me an email, and I can fix it - hopefully before sending out the email announcement which I'll do a bit later on in case there are any suggested revisions to the text. Regards, Peter From biopython at maubp.freeserve.co.uk Fri Nov 26 11:25:42 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 26 Nov 2010 16:25:42 +0000 Subject: [Biopython-dev] Biopython 1.56 release plans In-Reply-To: References: <645847.84052.qm@web62404.mail.re1.yahoo.com> Message-ID: On Fri, Nov 5, 2010 at 12:01 PM, Peter wrote: > On Fri, Nov 5, 2010 at 11:52 AM, Michiel de Hoon wrote: >> >> Bio/Transcribe.py >> Bio/Translate.py >> >> These are still imported from Bio/Encodings/IUPACEncoding.py, which >> is imported from Bio/Alphabet/IUPAC.py. I have no idea what this code >> is doing. Does anybody know? 
> > Ah right - sorry, that had slipped my mind: > http://lists.open-bio.org/pipermail/biopython-dev/2010-September/008255.html > > I had suggested we leave Bio.Transcribe and Bio.Translate in for > Biopython 1.56 and remove them (and Bio.utils, Bio.PropertyManager, > and Bio.Encodings.IUPACEncoding) for Biopython 1.57 Hi Michiel, Now Biopython 1.56 is out, would you like to remove those modules? Thanks Peter From biopython at maubp.freeserve.co.uk Fri Nov 26 14:31:40 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 26 Nov 2010 19:31:40 +0000 Subject: [Biopython-dev] git freeze for Biopython 1.56 In-Reply-To: References: Message-ID: On Fri, Nov 26, 2010 at 4:07 PM, Peter wrote: > > Posted online, > http://news.open-bio.org/news/2010/11/biopython-1-56-released/ > > If anyone spots a typo please drop me an email, and I can fix > it - hopefully before sending out the email announcement which > I'll do a bit later on in case there are any suggested revisions > to the text. I aim to send out the email in a hour or so's time. If I forget, Brad - you're in a suitable time zone right? By the way - please consider the git freeze over (I should have said so explicitly earlier - sorry about that). Peter From chapmanb at 50mail.com Fri Nov 26 15:20:04 2010 From: chapmanb at 50mail.com (Brad Chapman) Date: Fri, 26 Nov 2010 15:20:04 -0500 Subject: [Biopython-dev] git freeze for Biopython 1.56 In-Reply-To: References: Message-ID: <20101126202003.GC29878@sobchak.mgh.harvard.edu> Peter; > > Posted online, > > http://news.open-bio.org/news/2010/11/biopython-1-56-released/ > > > > If anyone spots a typo please drop me an email, and I can fix > > it - hopefully before sending out the email announcement which > > I'll do a bit later on in case there are any suggested revisions > > to the text. Thanks for all the hard work getting this together. Everything looks great and thanks for pushing to PyPi. 
The only thing I noticed was that after "Note as previously announced" there is an extra tag which causes the rest of the text through the authors to be a link. Not a big deal. Congrats on the new release, Brad From biopython at maubp.freeserve.co.uk Fri Nov 26 16:17:23 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 26 Nov 2010 21:17:23 +0000 Subject: [Biopython-dev] git freeze for Biopython 1.56 In-Reply-To: <20101126202003.GC29878@sobchak.mgh.harvard.edu> References: <20101126202003.GC29878@sobchak.mgh.harvard.edu> Message-ID: Hi Brad, On Fri, Nov 26, 2010 at 8:20 PM, Brad Chapman wrote: > > Thanks for all the hard work getting this together. Everything looks > great and thanks for pushing to PyPi. I must say a public thank you to Tiago too - having the buildbot up and running (even with the handful of buildslaves we have now) has been a great reassurance that things are looking OK. This will be particularly helpful for spotting problems on Python 3 (since it is a hassle to test by hand right now) and older versions of Python - my main machine these days run Python 2.6. As an example, for a while the trunk had been broken on Python 2.4 without anyone noticing. This was when I merged the UniProt XML parser without having checked the unit tests were skipped nicely on Python 2.4 when ElementTree was missing. Having the tests run every night automatically is much safer - so thanks Tiago :) [Hopefully we'll get the buildbot running on a dedicated VM before too long - we're in touch with the OBF admins about this already.] > The only thing I noticed was that after "Note as previously > announced" there is an extra tag which causes the rest > of the text through the authors to be a link. Not a big deal. Well spotted - I'd actually put rather than which must have confused the formatting because it looked OK. Thanks! 
Peter

From biopython at maubp.freeserve.co.uk Fri Nov 26 18:12:14 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 26 Nov 2010 23:12:14 +0000
Subject: [Biopython-dev] Biopython 1.56
Message-ID:

Dear Biopythoneers,

On behalf of the developers, I'm pleased to announce we released
Biopython 1.56 earlier today. For more details please see:
http://news.open-bio.org/news/2010/11/biopython-1-56-released/

Please note this will probably be the last release to support
Python 2.4, see:
http://news.open-bio.org/news/2010/11/dropping-python24-support/

(At least) 13 people have contributed to this release, including
6 new people - thank you all:

* Andrea Pierleoni (first contribution)
* Bart de Koning (first contribution)
* Bartek Wilczynski
* Bartosz Telenczuk (first contribution)
* Cymon Cox
* Eric Talevich
* Frank Kauff
* Michiel de Hoon
* Peter Cock
* Phillip Garland (first contribution)
* Siong Kong (first contribution)
* Tiago Antao
* Uri Laserson (first contribution)

Source distributions and Windows installers are available from the
downloads page on the Biopython website:
http://www.biopython.org/wiki/Download

As usual, feedback is most welcome on the mailing lists (or bugzilla).

Regards,

Peter

From biopython at maubp.freeserve.co.uk Mon Nov 29 07:02:55 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 29 Nov 2010 12:02:55 +0000
Subject: [Biopython-dev] Dropping Python 2.4 Support?
In-Reply-To:
References:
Message-ID:

On Thu, Nov 18, 2010 at 3:47 PM, Peter wrote:
> Dear Biopythoneers,
>
> Are any of you still using Biopython on Python 2.4?
> http://news.open-bio.org/news/2010/11/dropping-python24-support/
>
> Please get in touch if dropping support for Python 2.4 would be a
> problem. Otherwise we plan for Biopython 1.56 (expected by the
> end of this month) to be our last release to work with Python 2.4.
>
> Thanks,
>
> Peter

So, no comments?
We're using CentOS on our servers at work, but have installed a later Python on most of them and made it the default. I'm also keen to use Biopython with Galaxy, and they currently support Python 2.4 to 2.6 (and I'm unclear when they will add 2.7 and drop 2.4), so this is another reason to keep some level of support for Python 2.4. However, on a local level this isn't important as we are running Galaxy on Python 2.6 now. Likewise I know Brad is running Galaxy on a more recent Python than 2.4 (are you using Biopython within Galaxy Brad? Maybe we could chat about that on a new thread). Hopefully the release of Biopython 1.56 will alert more of our users to the planned withdrawal of support of Python 2.4, so we may get some feedback this week... Peter From chapmanb at 50mail.com Mon Nov 29 07:23:23 2010 From: chapmanb at 50mail.com (Brad Chapman) Date: Mon, 29 Nov 2010 07:23:23 -0500 Subject: [Biopython-dev] Dropping Python 2.4 Support? In-Reply-To: References: Message-ID: <20101129122323.GA3139@sobchak.mgh.harvard.edu> Peter; [Python2.4 support] > So, no comments? The folks who are still using 5 year old versions of python might not be the most responsive. We'll probably hear some complaints when some of the code breaks. > I'm also keen to use Biopython with Galaxy, and they currently > support Python 2.4 to 2.6 (and I'm unclear when they will add > 2.7 and drop 2.4), so this is another reason to keep some level > of support for Python 2.4. However, on a local level this isn't > important as we are running Galaxy on Python 2.6 now. > Likewise I know Brad is running Galaxy on a more recent > Python than 2.4 (are you using Biopython within Galaxy > Brad? Maybe we could chat about that on a new thread). Yes, I'm running on 2.6 (and sad to be missing nested with statements in my code). It would be great to have formal Biopython/Galaxy interoperability. 
If I remember right, the biggest complaint was lack of PEP 8 compliance with module names, but it should be worth discussing. Brad From mjldehoon at yahoo.com Tue Nov 30 08:14:20 2010 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Tue, 30 Nov 2010 05:14:20 -0800 (PST) Subject: [Biopython-dev] Biopython 1.56 release plans In-Reply-To: Message-ID: <215849.18567.qm@web62405.mail.re1.yahoo.com> OK, I have removed these modules: Bio.Encodings Bio.PropertyManager Bio.Transcribe Bio.Translate Bio.utils --Michiel. --- On Fri, 11/26/10, Peter wrote: > From: Peter > Subject: Re: [Biopython-dev] Biopython 1.56 release plans > To: "Michiel de Hoon" > Cc: "Biopython-Dev Mailing List" > Date: Friday, November 26, 2010, 11:25 AM > On Fri, Nov 5, 2010 at 12:01 PM, > Peter > wrote: > > On Fri, Nov 5, 2010 at 11:52 AM, Michiel de Hoon > > wrote: > >> > >> Bio/Transcribe.py > >> Bio/Translate.py > >> > >> These are still imported from > Bio/Encodings/IUPACEncoding.py, which > >> is imported from Bio/Alphabet/IUPAC.py. I have no > idea what this code > >> is doing. Does anybody know? > > > > Ah right - sorry, that had slipped my mind: > > http://lists.open-bio.org/pipermail/biopython-dev/2010-September/008255.html > > > > I had suggested we leave Bio.Transcribe and > Bio.Translate in for > > Biopython 1.56 and remove them (and Bio.utils, > Bio.PropertyManager, > > and Bio.Encodings.IUPACEncoding) for Biopython 1.57 > > Hi Michiel, > > Now Biopython 1.56 is out, would you like to remove those > modules? > > Thanks > > Peter > From anaryin at gmail.com Tue Nov 30 10:45:35 2010 From: anaryin at gmail.com (=?UTF-8?Q?Jo=C3=A3o_Rodrigues?=) Date: Tue, 30 Nov 2010 16:45:35 +0100 Subject: [Biopython-dev] Features of the GSOC branch ready to be merged Message-ID: Hello all, I've been looking at the code I wrote for the GSOC to see what is ready to be merged in the main branch. I have to thank Kristian and whoever participated in the Python & Friends for the input. 
From what I gathered, and from my own tests, I believe the following
functions are solid enough:

1. Bio/PDB/Atom.py: automatically guessing atom element from atom name
2. Bio/PDB/Structure.py
   1. Building biological unit from REMARK 350 in the header (link)
   2. Renumbering residues (link)

Let me know what you all think.

Best,

João [...] Rodrigues
http://doeidoei.wordpress.com

From biopython at maubp.freeserve.co.uk Tue Nov 30 18:24:35 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 30 Nov 2010 23:24:35 +0000
Subject: [Biopython-dev] Bio.SeqIO.index extension, Bio.SeqIO.index_many
Message-ID:

Hi all,

You may recall some previous discussion about extending the
Bio.SeqIO.index functionality. I'm particularly interested in keeping
the index on disk to reduce the memory overhead and thus support NGS
files with many millions of reads. e.g.
http://lists.open-bio.org/pipermail/biopython-dev/2009-September/006713.html
http://lists.open-bio.org/pipermail/biopython-dev/2009-September/006716.html

I'd also like to index multiple files (e.g. a folder of GenBank files
for different chromosomes), functionality we used to have with the
OBDA style index (using BDB or a flat file) and Martel/Mindy
(deprecated and removed some time ago due to problems with 3rd party
libraries, scaling problems when parsing, and ultimately no one
familiar enough with the code to try and fix it). See also:
http://lists.open-bio.org/pipermail/biopython-dev/2009-August/006704.html

I've been working on the following idea on branches in github, and
have something workable using SQLite3 to store a table of record
identifiers, file offset, and file number (for where we have multiple
files indexed together).

Following the OBDA standard, I extended this to also (optionally)
store the record length on disk. This allows the get_raw method to be
much faster, but may not be possible on all file formats. [Currently
I get the length when building the index on all supported file
formats except SFF.
Here we normally use the Roche index, and that doesn't have the raw record lengths.] Note that using SQLite seems sensible to me as it is included with Python 2.5+ including Python 3, while BDB, the other candidate from the standard library, has been deprecated. The current API is as follows, a new function: def index_many(index_filename, filenames=None, format=None, alphabet=None, key_function=None) This is similar to the existing index function, although here the key_function must return a string for use as the key in the SQLite database. The idea is that you call index_many to build a new index (if the index file does not exist) or reload an existing index (if the index file does exist). If you are reloading an existing index, you can omit the filenames and format. The index_many function returns a read only dictionary like object - very much like the existing index function. Although not (currently) exposed by this API, the code allows a configurable limit on the number of handles (since these are a finite resource limited by the OS). I've put a branch up for comment: https://github.com/peterjc/biopython/tree/index-many I hope the docstring text and embedded doctest examples are clear. You can read them here: https://github.com/peterjc/biopython/blob/index-many/Bio/SeqIO/__init__.py What do people think? One thing I haven't done yet (any volunteers?) is any benchmarking - for example comparing the index build and retrieval times for some large files using Biopython 1.55 (recent baseline), Biopython 1.56 (should be faster on retrieval) and the branch to check for any regressions in Bio.SeqIO.index(), and compare this to Bio.SeqIO.index_many() which being disk based will be slower but require much less RAM. Peter P.S. 
This was based on the following branch, which proved non-trivial to merge since in the meantime I'd made separate tweaks to the index code on the trunk: https://github.com/peterjc/biopython/tree/index-many-length I didn't propose merging this back then because it absolutely requires SQLite, and thus Python 2.5+ and we wanted Biopython 1.56 to support Python 2.4. From schaefer at rostlab.org Tue Nov 2 09:17:49 2010 From: schaefer at rostlab.org (Christian Schaefer) Date: Tue, 02 Nov 2010 10:17:49 +0100 Subject: [Biopython-dev] RMSD calculation In-Reply-To: References: Message-ID: <4CCFD73D.7000203@rostlab.org> Hey, I was using the PDB superimposer once and compared it to ProFit [1] which does a McLachlan fitting. Both return essentially the same rmsd, while the implementation in Bio.PDB seems to yield higher precision. Chris [1] http://www.bioinf.org.uk/software/profit/ -- Dipl.-Bioinf. Christian Schaefer Technical University Munich Department for Bioinformatics Faculty of Computer Science/I12 Boltzmannstr. 3 D-85748 Garching b. Muenchen Germany http://www.rostlab.org/~schaefer On 10/30/2010 01:42 AM, George Devaniranjan wrote: > Thanks Eric and Peter, > Your patience in answering this question is very much appreciated. > I think Eric maybe right, I tried the RMSD calculation for several > structures and VMD does give a lower value for them all. > George > > Thanks once again for all of you for your answers > > On Fri, Oct 29, 2010 at 10:39 PM, Eric Talevichwrote: > >> On Thu, Oct 28, 2010 at 12:49 PM, George Devaniranjan< >> devaniranjan at gmail.com> wrote: >> >>> I was wondering why there is two functions for calculating RMSD >>> >>> 1)in the SVDSuperimposer() >>> 2)in PDB.Superimposer() >>> >>> In the code its says RMS-is RMS being calculated instead of RMSD??? >>> I ask because VMD gives a different value for RMSD to the one from >>> Biopython >>> >>> >> Hello George, >> >> Here's my understanding of it: >> >> 1. 
RMSD and "RMS distance" both mean root mean square deviation, in terms >> of the distances in 3D space between each corresponding pair of atoms. The >> RMSD between all atoms in two aligned structures may be different than the >> RMSD between backbone atoms only. Or, if the two structures don't have the >> same peptide sequence, that raises another set of issues. >> >> 2. In Biopython, PDB.Superimposer internally uses SVDSuperimposer. It's a >> simplified wrapper. >> >> 3. The SVDSuperimposer module allows you to either (i) align two structures >> in 3D space and then calculate RMSD, or (ii) just calculate RMSD without >> spatially (re-)aligning the structures. PDB.Superimposer just does the >> former. If the structures weren't already aligned, these can yield very >> different values. >> >> 4. There are many ways to perform a structural alignment; SVDSuperimposer >> implements a simple one. PyMOL, VMD, ce, DALI, and other programs implement >> more advanced methods. >> >> So don't be alarmed that VMD gives you a smaller RMSD than PDB.Superimposer >> -- it just means VMD found a better alignment between the two structures. >> >> Best, >> Eric >> >> >> > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From krother at rubor.de Tue Nov 2 11:15:05 2010 From: krother at rubor.de (Kristian Rother) Date: Tue, 2 Nov 2010 12:15:05 +0100 Subject: [Biopython-dev] RMSD calculation In-Reply-To: References: Message-ID: <529a050d3a1c3801f07adbef605341ef-EhVcX1xCQgFaRwICBxEAXR0wfgFLV15YQUBGAEFfUC9ZUFgWXVpyH1RXX0FdQU1tXlhRSF5cXg1fWg==-webmailer1@server08.webmailer.hosteurope.de> Hi Greg, I think I can help to clear up the RMSD question. (or RMS however you abbreviate it its the same formula) The short answer is, the methods giving lower RMSD do something conceptually very different from Bio.PDB. 
Long answer: - Bio.PDB.Superimposer does structure *superposition*. It takes pairs of atoms, and finds the rotation/translation matrix that minimizes the RMSD. There is a single analytical solution to this, returned by the Kabsch algorithm from 1976 (see http://www.pymolwiki.org/index.php/Kabsch). I'm quite sure Biopython/SVDSuperimposer implements this algorithm. - Services like the EBI SSM server do *structure alignment*. They take two structures and try to find a set of residue pairs that fit to each other well. To do so, they occasionally calculate RMSDs, but do not necessarily use all the residues provided. For instance, when submitting protein1 and protein2 to EBI, the output tells me that N(algn) = 31 meaning that 31 of the 36 residues were used to calculate the alignment. When looking at the structures, these are probably on the N-terminus (see picture). ==> the structure alignment algorithm discards the residues he doesnt regard useful for aligning, this is why the RMSD is lower. Do you think this explains all our observations? Best regards, Kristian > Hello everyone, > I tried with pymol and it gives a value of 1.792 for the RMSD after > alignment > The EU bioinformatics server gives a value of 1.74 > VMD 1.62 > But SVD and PDB Superimposer gives a value 3.2 > I have attached the 2 PDB files concerned-is it something I am doing in > calculating the RMSD using biopython? > Thank you > > On Thu, Oct 28, 2010 at 1:46 PM, Peter > wrote: > >> On Thu, Oct 28, 2010 at 6:14 PM, George Devaniranjan >> wrote: >> > Yes there is a difference-for 2 proteins having exact same residues of >> 36 >> > residues the values from 4 sources are as follows >> > VMD RMSD=1.61 >> > SVD RMSD =3.2 >> > PDB RMSD=3.2 >> > >> > From the EU Bioinformatics server (link below) RMSD =1.75 >> > (http://www.ebi.ac.uk/msd-srv/ssm/cgi-bin/ssmserver) >> > >> > So Biopython really is computing the RMSD and not RMS? 
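A minimal sketch of the point made above, that an RMSD taken over a chosen subset of residue pairs can come out lower than one taken over every supplied pair. It uses only plain Python and made-up coordinates. The helper `rmsd_best_subset` is hypothetical: it simply keeps the closest pairs as a toy stand-in for what a structure alignment does, and no rotation or translation fitting is performed, so it is neither what Bio.PDB.Superimposer nor the EBI server actually computes.

```python
from math import sqrt


def squared_distances(coords1, coords2):
    """Squared distance for each corresponding pair of (x, y, z) points."""
    if len(coords1) != len(coords2) or not coords1:
        raise ValueError("need two non-empty coordinate lists of equal length")
    return [
        (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
        for (x1, y1, z1), (x2, y2, z2) in zip(coords1, coords2)
    ]


def rmsd(coords1, coords2):
    """Root mean square deviation over all supplied atom pairs."""
    sq = squared_distances(coords1, coords2)
    return sqrt(sum(sq) / len(sq))


def rmsd_best_subset(coords1, coords2, keep):
    """RMSD over only the `keep` closest pairs (hypothetical toy helper)."""
    sq = sorted(squared_distances(coords1, coords2))
    return sqrt(sum(sq[:keep]) / keep)


# Made-up coordinates: four well-matched atoms and one outlier.
a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0),
     (3.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
b = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0),
     (3.0, 0.0, 0.0), (4.0, 5.0, 0.0)]

all_pairs = rmsd(a, b)               # every pair counts, outlier included
subset = rmsd_best_subset(a, b, 4)   # the outlier pair is discarded
```

Here `subset` is smaller than `all_pairs` only because one pair was left out of the average, which is the same effect as an alignment reporting N(algn) = 31 out of 36 residues.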
>> > Thanks you >> >> It has been a while since I looked at this (but I can still edit >> the Warwick page if is is unclear). >> >> Which definition of RMSD are you using? >> >> Bio.PDB uses Bio.SVDSuperimposer, so they should be the same. >> The comment for this code *says* is calculates the RMS deviation, >> here: >> >> diff=coords1-coords2 >> l=coords1.shape[0] >> return sqrt(sum(sum(diff*diff))/l) >> >> Here variable l will be the number of atoms. >> >> What are the two examples you are using? Can you at perhaps >> share a small example pair of PDB files? >> >> Peter >> > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > -------------- next part -------------- A non-text attachment was scrubbed... Name: superpos.png Type: image/png Size: 172427 bytes Desc: not available URL: From devaniranjan at gmail.com Wed Nov 3 01:09:18 2010 From: devaniranjan at gmail.com (George Devaniranjan) Date: Wed, 3 Nov 2010 01:09:18 +0000 Subject: [Biopython-dev] RMSD calculation In-Reply-To: <4CCFD73D.7000203@rostlab.org> References: <4CCFD73D.7000203@rostlab.org> Message-ID: Hi, Thank you- I have been noticing that for most PDB-superimposer well as SV-superimposer give similar values In addition PYMOL in most cases also gives similar values however in all cases VMD continues to give the smallest value. I will also test ProFit -thanks for the link. George On Tue, Nov 2, 2010 at 9:17 AM, Christian Schaefer wrote: > Hey, > > I was using the PDB superimposer once and compared it to ProFit [1] which > does a McLachlan fitting. Both return essentially the same rmsd, while the > implementation in Bio.PDB seems to yield higher precision. > > Chris > > [1] http://www.bioinf.org.uk/software/profit/ > > -- > Dipl.-Bioinf. Christian Schaefer > Technical University Munich > Department for Bioinformatics > Faculty of Computer Science/I12 > Boltzmannstr. 
3 > D-85748 Garching b. Muenchen > Germany > http://www.rostlab.org/~schaefer > > > > On 10/30/2010 01:42 AM, George Devaniranjan wrote: > >> Thanks Eric and Peter, >> Your patience in answering this question is very much appreciated. >> I think Eric maybe right, I tried the RMSD calculation for several >> structures and VMD does give a lower value for them all. >> George >> >> Thanks once again for all of you for your answers >> >> On Fri, Oct 29, 2010 at 10:39 PM, Eric Talevich> >wrote: >> >> On Thu, Oct 28, 2010 at 12:49 PM, George Devaniranjan< >>> devaniranjan at gmail.com> wrote: >>> >>> I was wondering why there is two functions for calculating RMSD >>>> >>>> 1)in the SVDSuperimposer() >>>> 2)in PDB.Superimposer() >>>> >>>> In the code its says RMS-is RMS being calculated instead of RMSD??? >>>> I ask because VMD gives a different value for RMSD to the one from >>>> Biopython >>>> >>>> >>>> Hello George, >>> >>> Here's my understanding of it: >>> >>> 1. RMSD and "RMS distance" both mean root mean square deviation, in terms >>> of the distances in 3D space between each corresponding pair of atoms. >>> The >>> RMSD between all atoms in two aligned structures may be different than >>> the >>> RMSD between backbone atoms only. Or, if the two structures don't have >>> the >>> same peptide sequence, that raises another set of issues. >>> >>> 2. In Biopython, PDB.Superimposer internally uses SVDSuperimposer. It's a >>> simplified wrapper. >>> >>> 3. The SVDSuperimposer module allows you to either (i) align two >>> structures >>> in 3D space and then calculate RMSD, or (ii) just calculate RMSD without >>> spatially (re-)aligning the structures. PDB.Superimposer just does the >>> former. If the structures weren't already aligned, these can yield very >>> different values. >>> >>> 4. There are many ways to perform a structural alignment; SVDSuperimposer >>> implements a simple one. PyMOL, VMD, ce, DALI, and other programs >>> implement >>> more advanced methods. 
>>> >>> So don't be alarmed that VMD gives you a smaller RMSD than >>> PDB.Superimposer >>> -- it just means VMD found a better alignment between the two structures. >>> >>> Best, >>> Eric >>> >>> >>> >>> _______________________________________________ >> Biopython-dev mailing list >> Biopython-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biopython-dev >> > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From biopython at maubp.freeserve.co.uk Wed Nov 3 14:02:48 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 3 Nov 2010 14:02:48 +0000 Subject: [Biopython-dev] Merging Uniprot XML parser? In-Reply-To: References: Message-ID: On Tue, Oct 19, 2010 at 4:54 PM, Peter wrote: > Hi all, > > I've fixed a few issues I felt were holding up merging Andrea's UniProt > XML parser. > > I've now tested the uniprot_sprot.txt and uniprot_sprot.xml are parsed > into more or less equivalent objects, and that these can be written out > as GenBank (well, GenPept) files or as EMBL/IMGT files (given recent > work to support protein EMBL files - which do exist but are rarely used). > > This required "fixing" Bug 3026 to cope with long annotation that cannot > be line wrapper nicely (lots of long URL strings in UniProt XML comments). > http://bugzilla.open-bio.org/show_bug.cgi?id=3026 > I'm tempted to remove the warning because it is so common... or make > it use the same text each time so you get warned once. > > There are also some additions to the Bio.SeqFeature position classes, > since SwissProt/UniProt files can have uncertain positions. 
> > Could someone take a look at the code here (a rebased branch), as I'd > like some independent testing (and better yet, code review): > http://github.com/peterjc/biopython/tree/uniprot I've now merged this into the trunk (with a git rebase first so the history is linear - no branch+merge), and Andrea has agreed to retest it. Other testing and comments are most welcome. Peter From biopython at maubp.freeserve.co.uk Wed Nov 3 16:45:25 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 3 Nov 2010 16:45:25 +0000 Subject: [Biopython-dev] Bio/cMarkovModelmodule.c In-Reply-To: <781588.85801.qm@web62407.mail.re1.yahoo.com> References: <781588.85801.qm@web62407.mail.re1.yahoo.com> Message-ID: On Sat, Oct 30, 2010 at 3:23 PM, Michiel de Hoon wrote: > > OK, done. In the end, I put the warning message in MarkovModel.py anyway, > since it's very easy to miss if it's in setup.py. > Do we really need the warning? I guess otherwise people using this code might notice a drop in performance if they were using our C code version, updated their Biopython, and then get the Python fallback if their NumPy is too old. If we do keep the warning should it be silenced in test_MarkovModel.py? Something like the patch below should do it... 
Peter

diff --git a/Tests/test_MarkovModel.py b/Tests/test_MarkovModel.py
index fc5ae8b..bb3afe8 100644
--- a/Tests/test_MarkovModel.py
+++ b/Tests/test_MarkovModel.py
@@ -9,7 +9,12 @@ except ImportError:
     raise MissingPythonDependencyError(\
         "Install NumPy if you want to use Bio.MarkovModel.")
 
+import warnings
+#Silence this warning:
+#For optimal speed, please update to Numpy version 1.3 or later
+warnings.filterwarnings("ignore", category=UserWarning)
 from Bio import MarkovModel
+warnings.filters.pop()
 
 def print_mm(markov_model):
     print "STATES: %s" % ' '.join(markov_model.states)

From biopython at maubp.freeserve.co.uk Wed Nov 3 17:17:46 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 3 Nov 2010 17:17:46 +0000
Subject: [Biopython-dev] Continuous integration server
In-Reply-To:
References:
Message-ID:

2010/10/30 Tiago Antão :
> Hi all,
>
> I've been hacking with buildbot, an integration server. This is to
> allow continuous testing of Biopython. So that we are alerted of any
> problems as soon as somebody does a dreadful commit (I have the top 5
> of most dreadful commits, so it was fair that I should try to do
> something about it).
>
> Things are still incomplete, but I think it is time to inform the list
> of this effort...
> To know more about buildbot you can either go to the buildbot site
> http://buildbot.net/ or see the draft doc that I have been preparing
> http://biopython.org/wiki/Continuous_integration
> There is a draft server here:
> http://events.open-bio.org:8010/
> The cool thing about buildbot is that actual testing is done by
> volunteer computers. Want to test on OS y, Python version z? You can
> offer the idle time of your laptop for that...
>

It is looking impressive Tiago - excellent work :)

>
> Obvious things missing:
>
> 0. First and foremost, see if people like this?

Looks very promising.

> 1.
Changing the biopython test code to avoid stressing the network > (i.e., having a run_tests option that will not test network tests). > This to avoid imposing continuous traffic on genbank and friends. This > is a show stopper. Certainly we can't scale this up to many machines running regular testing without limiting the network access somewhat. > 2. Maybe warn the mailing list when some fundamental build stops > working (e.g. send an email when a python 2.x build stops working) > 3. Have test servers with all the applications installed (do you want > to volunteer? This is more to do with volunteers) I would expect "core" developers to have machines with most of the command line applications used in Biopython's tests already installed - but yes, we do want to make sure each optional command line tool or library is installed on at least one build slave. > 4. Maybe change run_tests to require all tests to be done. If we are > doing integration testing, we want all tests to be done (missing > applications or libraries should be an error). As an example, none of > my tests are complete This is about how it currently skips tests missing external dependencies (like PopGen command line tools in your case). I think that is OK, otherwise we'll get false positives (see below, we can't satisfy all dependencies on all platforms). > 5. Support mac (my access to Mr Job's fashion machines is limited). > Again this is more a volunteer issue. My main work machine is a Mac, so this shouldn't be an issue. > 6. Discuss policies: One test a day? Full tests or updates? Full > network tests (probably sporadically)? Send emails? Right now triggering tests after each commit isn't easy to do is it (due to limited git support in builtbot)? That might be nice but in the short term running the tests once a day is a big step forward. I'd suggest we do network tests once a week (or fortnight?). > 7. Find volunteers to cover several OSes and several Python > versions. 
Assure that people do full tests (i.e. with all applications > and libraries) That isn't possible - some applications are not available on Windows, and some libraries are not available on Jython or Python 3 (yet). > 8. While I have volunteer Windows testing myself, I will not be able > to maintain it regularly. I have access to a Windows machine (which I use to build the Biopython installers) but currently it is only online intermittently. I'd have to reorganise machines due to limited network ports in the office, but it could in principle be used as a builtbot slave. > > Opinions are most welcome > What is wrong with your Linux Python 3.1 slave? It seems that 2to3 is failing on the doctest conversion. Peter From tiagoantao at gmail.com Thu Nov 4 12:04:17 2010 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Thu, 4 Nov 2010 12:04:17 +0000 Subject: [Biopython-dev] Continuous integration server In-Reply-To: References: Message-ID: 2010/11/3 Peter : > Certainly we can't scale this up to many machines running regular > testing without limiting the network access somewhat. As we discussed before, I was thinking in adding an option to run_tests.py (like --offline) and change the tests that access the Internet to honour that flag. I was thinking in coding this myself and then send to the list for approval (I am not going to make big changes to the test framework myself without passing them through here). >> 6. Discuss policies: One test a day? Full tests or updates? Full >> network tests (probably sporadically)? Send emails? > > Right now triggering tests after each commit isn't easy to do > is it (due to limited git support in builtbot)? That might be nice > but in the short term running the tests once a day is a big step > forward. It is actually quite easy (with an hook on github), but I would suggest leaving this for version 2: lets put the fundamental working and the add bells and whistles. 
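A minimal sketch of how the --offline option proposed above might behave, assuming a standard-library unittest setup. Everything in it (the `Settings` holder, the `needs_network` decorator, `make_example_case` and `run`) is hypothetical and is not taken from Biopython's run_tests.py. It only shows one way a runner flag could turn network tests into skips instead of failures.

```python
import argparse
import functools
import unittest


class Settings(object):
    """Hypothetical holder for the runner's global options."""
    offline = False


def parse_args(argv):
    """Parse the (hypothetical) command line of a toy test runner."""
    parser = argparse.ArgumentParser(description="Toy test runner")
    parser.add_argument("--offline", action="store_true",
                        help="skip tests that need internet access")
    return parser.parse_args(argv)


def needs_network(test_method):
    """Mark a test method as requiring internet access."""
    @functools.wraps(test_method)
    def wrapper(self, *args, **kwargs):
        if Settings.offline:
            # Reported as a skip, so an offline run can still pass.
            raise unittest.SkipTest("network tests disabled by --offline")
        return test_method(self, *args, **kwargs)
    return wrapper


def make_example_case():
    """Build a small example TestCase with one network and one local test."""
    class ExampleCase(unittest.TestCase):
        @needs_network
        def test_remote_lookup(self):
            # Placeholder: fails on purpose instead of contacting a server.
            self.fail("placeholder: a real test would contact a remote server")

        def test_local_parsing(self):
            self.assertEqual("ACGT".lower(), "acgt")
    return ExampleCase


def run(argv):
    """Run the example tests and return the unittest.TestResult."""
    Settings.offline = parse_args(argv).offline
    suite = unittest.TestLoader().loadTestsFromTestCase(make_example_case())
    result = unittest.TestResult()
    suite.run(result)
    return result
```

Calling `run(["--offline"])` records the network test as skipped, while `run([])` lets it execute. A weekly or fortnightly full run could then simply omit the flag.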
> I'd suggest we do network tests once a week (or fortnight?). OK, I will go ahead and do some changes to run_tests.py as per above. > That isn't possible - some applications are not available on Windows, > and some libraries are not available on Jython or Python 3 (yet). OK, we just have to be sure (manually) that all applications that need tested are tested. >> 8. While I have volunteer Windows testing myself, I will not be able >> to maintain it regularly. > > I have access to a Windows machine (which I use to build the > Biopython installers) but currently it is only online intermittently. > I'd have to reorganise machines due to limited network ports in > the office, but it could in principle be used as a builtbot slave. Regarding Mac and Windows, I will email again as soon as we have the network issue sorted out. Before that we would be doing maybe too much traffic as we have no way to stop the network access for now. > What is wrong with your Linux Python 3.1 slave? It seems that > 2to3 is failing on the doctest conversion. I do not have time to evaluate this now, I will trace this issue over the weekend. Tiago From biopython at maubp.freeserve.co.uk Thu Nov 4 12:28:50 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 4 Nov 2010 12:28:50 +0000 Subject: [Biopython-dev] Continuous integration server In-Reply-To: References: Message-ID: 2010/11/4 Tiago Ant?o : > 2010/11/3 Peter : >> Certainly we can't scale this up to many machines running regular >> testing without limiting the network access somewhat. > > As we discussed before, I was thinking in adding an option to > run_tests.py (like --offline) and change the tests that access the > Internet to honour that flag. I was thinking in coding this myself and > then send to the list for approval (I am not going to make big changes > to the test framework myself without passing them through here). Yep, that sounds good. 
The previous discussion is here if anyone missed it: http://lists.open-bio.org/pipermail/biopython-dev/2010-October/008295.html >>> 6. Discuss policies: One test a day? Full tests or updates? Full >>> network tests (probably sporadically)? Send emails? >> >> Right now triggering tests after each commit isn't easy to do >> is it (due to limited git support in builtbot)? That might be nice >> but in the short term running the tests once a day is a big step >> forward. > > It is actually quite easy (with an hook on github), but I would > suggest leaving this for version 2: lets put the fundamental working > and the add bells and whistles. I agree. >> I'd suggest we do network tests once a week (or fortnight?). > > OK, I will go ahead and do some changes to run_tests.py as per above. > >> That isn't possible - some applications are not available on Windows, >> and some libraries are not available on Jython or Python 3 (yet). > > OK, we just have to be sure (manually) that all applications that need > tested are tested. Yes, that will be a manual task. When we document the slave setup process we can list which applications we ideally want people to install on each OS. Having a slight range in versions would actually be a good thing here. >>> 8. While I have volunteer Windows testing myself, I will not be able >>> to maintain it regularly. >> >> I have access to a Windows machine (which I use to build the >> Biopython installers) but currently it is only online intermittently. >> I'd have to reorganise machines due to limited network ports in >> the office, but it could in principle be used as a builtbot slave. > > Regarding Mac and Windows, I will email again as soon as we have the > network issue sorted out. Before that we would be doing maybe too much > traffic as we have no way to stop the network access for now. > >> What is wrong with your Linux Python 3.1 slave? It seems that >> 2to3 is failing on the doctest conversion. 
> > I do not have time to evaluate this now, I will trace this issue over > the weekend. Sure. And once the --offline switch is working, we can start adding slaves (and documenting how to do it to assist future volunteers). Good work Tiago :) Peter From bugzilla-daemon at portal.open-bio.org Thu Nov 4 16:49:45 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 4 Nov 2010 12:49:45 -0400 Subject: [Biopython-dev] [Bug 3139] python setup.py test ends with error code 0 even on failure In-Reply-To: Message-ID: <201011041649.oA4GnjEw008477@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3139 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2010-11-04 12:49 EST ------- Fix checked in by Tiago, marking as fixed. http://github.com/biopython/biopython/commit/457ce49a060fe540f98aa37a6266cff17864487b -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Thu Nov 4 17:13:33 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 4 Nov 2010 17:13:33 +0000 Subject: [Biopython-dev] Biopython 1.56 release plans Message-ID: Hi all, I've mentioned in recent threads that I think we should try and release Biopython 1.56 this month (November 2010). I think the NEWS file is pretty up to date, and covers important new functionality like Andrea Pierleoni's UniProt XML parser and the IMGT support (with Uri Laserson). Is there any other functionality which is ready for merging? For example, Tiago - you've been doing lots of work on your branch with the PopGen code. Is that code ready? I'm willing to do the git merge/rebase. 
Is there any reason to bother with a beta release this time? If there are no pressing additions, I may be able to do the release tomorrow - otherwise how about aiming for Thursday or Friday next week (11 or 12 November)? Regards, Peter From mjldehoon at yahoo.com Fri Nov 5 09:40:19 2010 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 5 Nov 2010 02:40:19 -0700 (PDT) Subject: [Biopython-dev] Biopython 1.56 release plans In-Reply-To: Message-ID: <701600.10148.qm@web62403.mail.re1.yahoo.com> I think the following should be removed before the release: Bio/SwissProt/SProt.py Bio/Transcribe.py Bio/Translate.py as well as the Iterator class in Bio/SCOP/Dom.py. These have been deprecated since Biopython 1.52. Best, --Michiel. --- On Thu, 11/4/10, Peter wrote: > From: Peter > Subject: [Biopython-dev] Biopython 1.56 release plans > To: "Biopython-Dev Mailing List" > Date: Thursday, November 4, 2010, 1:13 PM > Hi all, > > I've mentioned in recent threads that I think we should try > and > release Biopython 1.56 this month (November 2010). > > I think the NEWS file is pretty up to date, and covers > important > new functionality like Andrea Pierleoni's UniProt XML > parser > and the IMGT support (with Uri Laserson). > > Is there any other functionality which is ready for > merging? > > For example, Tiago - you've been doing lots of work on > your > branch with the PopGen code. Is that code ready? I'm > willing > to do the git merge/rebase. > > Is there any reason to bother with a beta release this > time? > > If there are no pressing additions, I may be able to do > the > release tomorrow - otherwise how about aiming for Thursday > or Friday next week (11 or 12 November)? 
> > Regards, > > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From tiagoantao at gmail.com Fri Nov 5 10:13:09 2010 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Fri, 5 Nov 2010 10:13:09 +0000 Subject: [Biopython-dev] Biopython 1.56 release plans In-Reply-To: References: Message-ID: On Thu, Nov 4, 2010 at 5:13 PM, Peter wrote: > For example, Tiago - you've been doing lots of work on your > branch with the PopGen code. Is that code ready? I'm willing > to do the git merge/rebase. I was hoping that would offer to do a merge ;) . Though we need a broken repository to test the integration server, so maybe I could do it myself . Yes, the code is ready. After the merge I will still add a couple of functions (also ready, but not committed) and make sure the test cases are fully ready. But it should be a day only and better done after the merge. This is mainly new code that does much faster GENEPOP parsing and supports AFLP processing. Tiago From biopython at maubp.freeserve.co.uk Fri Nov 5 10:19:53 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 5 Nov 2010 10:19:53 +0000 Subject: [Biopython-dev] Biopython 1.56 release plans In-Reply-To: References: Message-ID: 2010/11/5 Tiago Ant?o : > On Thu, Nov 4, 2010 at 5:13 PM, Peter wrote: >> For example, Tiago - you've been doing lots of work on your >> branch with the PopGen code. Is that code ready? I'm willing >> to do the git merge/rebase. > > I was hoping that would offer to do a merge ;) . Though we > need a broken repository to test the integration server, so maybe I > could do it myself . > Yes, the code is ready. OK - I'll try to get your code, rebase it onto the current master, then post it as a new branch for you to check. 
Once that is OK, I'll rebase it again if the master has changed, then fast-forward merge it to the master (that way we don't get a split and join on the master history - just a sudden batch of commits). > After the merge I will still add a couple of functions (also ready, > but not committed) and make sure the test cases are fully ready. > But it should be a day only and better done after the merge. > This is mainly new code that does much faster GENEPOP > parsing and supports AFLP processing. Hopefully we can get that part done early next week. Peter From biopython at maubp.freeserve.co.uk Fri Nov 5 10:23:26 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 5 Nov 2010 10:23:26 +0000 Subject: [Biopython-dev] Biopython 1.56 release plans In-Reply-To: <701600.10148.qm@web62403.mail.re1.yahoo.com> References: <701600.10148.qm@web62403.mail.re1.yahoo.com> Message-ID: On Fri, Nov 5, 2010 at 9:40 AM, Michiel de Hoon wrote: > I think the following should be removed before the release: > > Bio/SwissProt/SProt.py > Bio/Transcribe.py > Bio/Translate.py > > as well as the Iterator class in Bio/SCOP/Dom.py. > > These have been deprecated since Biopython 1.52. According to the DEPRECATED file, those modules were deprecated in Biopython 1.51, so they are definitely due for removal. In any case Biopython 1.52 was very nearly a year ago [1] as it was released 22 September 2009. Please go ahead and tidy this up. Thanks, Peter [1] http://www.biopython.org/wiki/Deprecation_policy From biopython at maubp.freeserve.co.uk Fri Nov 5 10:47:12 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 5 Nov 2010 10:47:12 +0000 Subject: [Biopython-dev] Biopython 1.56 release plans In-Reply-To: References: Message-ID: 2010/11/5 Peter : > 2010/11/5 Tiago Ant?o : >> On Thu, Nov 4, 2010 at 5:13 PM, Peter wrote: >>> For example, Tiago - you've been doing lots of work on your >>> branch with the PopGen code. Is that code ready? I'm willing >>> to do the git merge/rebase. 
>>
>> I was hoping that would offer to do a merge ;) . Though we
>> need a broken repository to test the integration server, so maybe I
>> could do it myself .
>> Yes, the code is ready.
>
> OK - I'll try to get your code, rebase it onto the current master,
> then post it as a new branch for you to check.

Notes on how I did this:

$ git remote add tiago https://github.com/tiagoantao/biopython.git
$ git fetch tiago
...
From https://github.com/tiagoantao/biopython
 * [new branch]      buildbot   -> tiago/buildbot
 * [new branch]      master     -> tiago/master

Now I want your "master" branch, but that name clashes with my
"master" branch... the following worked here:

$ git checkout tiago/master
Note: moving to "tiago/master" which isn't a local branch
If you want to create a new branch from this checkout, you may do so
(now or later) by using -b with the checkout command again. Example:
  git checkout -b
HEAD is now at 21b7a22... Merge branch 'master' of github.com:tiagoantao/biopython
$ git checkout -b tiago-pop-gen
Switched to a new branch "tiago-pop-gen"

Now I want to write the history of your PopGen work as though it
was started from the current state of the master branch. I was
hoping there would have been no changes to the PopGen code
on the master so that this would be trivial...

$ git rebase master
...
CONFLICT (content): Merge conflict in Bio/PopGen/FDist/__init__.py
...

So open Bio/PopGen/FDist/__init__.py and look for the merge
failures (which are marked with <<<<<<< to >>>>>>>). In this case it
was the removal of some deprecated code done on the pop gen
branch, which was only deprecated in Biopython 1.55 so it is a
bit premature to remove it already. So I fixed up
Bio/PopGen/FDist/__init__.py and saved it. Then:

$ git add Bio/PopGen/FDist/__init__.py
$ git rebase --continue
...

This seems to have worked. I can now do a comparison to the
master branch,

$ git diff master
...
After running the unit tests (which was of limited value as I don't have FDist installed on this machine), I then pushed it online: $ git push peterjc tiago-pop-gen The rebased branch is now here: https://github.com/peterjc/biopython/tree/tiago-pop-gen If you agree the rebased branch is sane, it should be trivial to now merge that onto the master as a fast-forward merge. (But I would check first that the master hasn't changed, and if it has, repeat the rebase). Peter From tiagoantao at gmail.com Fri Nov 5 10:50:32 2010 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Fri, 5 Nov 2010 10:50:32 +0000 Subject: [Biopython-dev] Biopython 1.56 release plans In-Reply-To: References: Message-ID: 2010/11/5 Peter : > If you agree the rebased branch is sane, it should be trivial to > now merge that onto the master as a fast-forward merge. > (But I would check first that the master hasn't changed, and > if it has, repeat the rebase). Many thanks for the guide, maybe in the future I will have the courage to do it myself. Go ahead and commit the changes. I will make sure the module is sane this Sunday. From biopython at maubp.freeserve.co.uk Fri Nov 5 11:08:54 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 5 Nov 2010 11:08:54 +0000 Subject: [Biopython-dev] Biopython 1.56 release plans In-Reply-To: References: Message-ID: 2010/11/5 Tiago Ant?o : > 2010/11/5 Peter : >> If you agree the rebased branch is sane, it should be trivial to >> now merge that onto the master as a fast-forward merge. >> (But I would check first that the master hasn't changed, and >> if it has, repeat the rebase). > > Many thanks for the guide, maybe in the future I will have the > courage to do it myself. > > Go ahead and commit the changes. I will make sure the module > is sane this Sunday. Done. 
The master hadn't changed in the meantime so I didn't have to re-rebase:

$ git checkout master
Switched to branch "master"
$ git merge tiago-pop-gen
Updating 065e235..4f318a4
Fast forward
 Bio/PopGen/FDist/Async.py            |   21 +-
 Bio/PopGen/FDist/Controller.py       |  125 +-
 Bio/PopGen/FDist/Utils.py            |   68 +-
 Bio/PopGen/FDist/__init__.py         |    1 -
 Bio/PopGen/GenePop/EasyController.py |   10 +-
 Bio/PopGen/GenePop/FileParser.py     |   69 +-
 Tests/PopGen/data_dfst_outfile       |  300 +
 Tests/PopGen/dfdist1                 | 1204 +
 Tests/PopGen/dout.cpl                |  300 +
 Tests/PopGen/dout.dat                |50000 ++++++++++++++++++++++++++++++++++
 Tests/test_PopGen_DFDist.py          |  106 +
 Tests/test_PopGen_FDist_nodepend.py  |   20 +-
 12 files changed, 52176 insertions(+), 48 deletions(-)
 create mode 100644 Tests/PopGen/data_dfst_outfile
 create mode 100644 Tests/PopGen/dfdist1
 create mode 100644 Tests/PopGen/dout.cpl
 create mode 100644 Tests/PopGen/dout.dat
 create mode 100644 Tests/test_PopGen_DFDist.py

Then publishing it,

$ git push origin master
Counting objects: 120, done.
Delta compression using 8 threads.
Compressing objects: 100% (106/106), done.
Writing objects: 100% (106/106), 133.46 KiB, done.
Total 106 (delta 79), reused 0 (delta 0)
To git@github.com:biopython/biopython.git
   065e235..4f318a4  master -> master

And removing my now pointless public branch:

$ git push peterjc :tiago-pop-gen
To git@github.com:peterjc/biopython.git
 - [deleted]         tiago-pop-gen

We need to update the NEWS file now.

Peter

From mjldehoon at yahoo.com Fri Nov 5 11:52:15 2010
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 5 Nov 2010 04:52:15 -0700 (PDT)
Subject: [Biopython-dev] Biopython 1.56 release plans
In-Reply-To:
Message-ID: <645847.84052.qm@web62404.mail.re1.yahoo.com>

> > Bio/SwissProt/SProt.py
> > the Iterator class in Bio/SCOP/Dom.py

I have removed these.

> > Bio/Transcribe.py
> > Bio/Translate.py

These are still imported from Bio/Encodings/IUPACEncoding.py, which
is imported from Bio/Alphabet/IUPAC.py. I have no idea what this code is doing.
Does anybody know?

--Michiel.

From biopython at maubp.freeserve.co.uk Fri Nov 5 12:01:45 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 5 Nov 2010 12:01:45 +0000
Subject: [Biopython-dev] Biopython 1.56 release plans
In-Reply-To: <645847.84052.qm@web62404.mail.re1.yahoo.com>
References: <645847.84052.qm@web62404.mail.re1.yahoo.com>
Message-ID:

On Fri, Nov 5, 2010 at 11:52 AM, Michiel de Hoon wrote:
>
>> > Bio/SwissProt/SProt.py
>> > the Iterator class in Bio/SCOP/Dom.py
>
> I have removed these.
>
>> > Bio/Transcribe.py
>> > Bio/Translate.py
>
> These are still imported from Bio/Encodings/IUPACEncoding.py, which
> is imported from Bio/Alphabet/IUPAC.py. I have no idea what this code
> is doing. Does anybody know?

Ah right - sorry, that had slipped my mind:
http://lists.open-bio.org/pipermail/biopython-dev/2010-September/008255.html

I had suggested we leave Bio.Transcribe and Bio.Translate in for
Biopython 1.56 and remove them (and Bio.utils, Bio.PropertyManager,
and Bio.Encodings.IUPACEncoding) for Biopython 1.57

Peter

From mjldehoon at yahoo.com Fri Nov 5 12:08:17 2010
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 5 Nov 2010 05:08:17 -0700 (PDT)
Subject: [Biopython-dev] Biopython 1.56 release plans
In-Reply-To:
Message-ID: <772269.63506.qm@web62407.mail.re1.yahoo.com>

I'd like to suggest also that we deprecate Bio.Prosite.Prodoc; this
functionality moved to Bio.ExPASy.Prodoc at least since release 1.50,
and the module has been labeled as obsolete since then. The enclosing
module Bio.Prosite itself is already deprecated.

--Michiel.
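
The point about an already-deprecated enclosing package can be illustrated
with a small sketch. It assumes only standard Python import behaviour: a
package's __init__.py runs before any of its child modules are imported, so
a warning issued there also fires when a child is imported. The names
legacy_pkg and child are hypothetical stand-ins, not Biopython modules, and
this is not Biopython's actual code.

```python
import importlib
import os
import sys
import tempfile
import warnings

# Build a throwaway package on disk. "legacy_pkg" and "child" are
# hypothetical names used only for this illustration.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "legacy_pkg")
os.mkdir(pkg_dir)

# The package __init__ issues the deprecation warning.
with open(os.path.join(pkg_dir, "__init__.py"), "w") as handle:
    handle.write(
        "import warnings\n"
        "warnings.warn('legacy_pkg is deprecated', DeprecationWarning)\n"
    )

# The child module itself contains no warning at all.
with open(os.path.join(pkg_dir, "child.py"), "w") as handle:
    handle.write("VALUE = 42\n")

sys.path.insert(0, root)
importlib.invalidate_caches()

# Import only the child and record any warnings raised along the way.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    child = importlib.import_module("legacy_pkg.child")

# Keep just the deprecation messages; the parent's warning is among them.
messages = [
    str(w.message) for w in caught
    if issubclass(w.category, DeprecationWarning)
]
print(messages)
print(child.VALUE)
```

Importing the child alone is enough to trigger the parent's warning, so
under this assumption a separate warning inside the child module would
mainly serve as documentation.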
From biopython at maubp.freeserve.co.uk Fri Nov 5 12:19:27 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 5 Nov 2010 12:19:27 +0000 Subject: [Biopython-dev] Biopython 1.56 release plans In-Reply-To: <772269.63506.qm@web62407.mail.re1.yahoo.com> References: <772269.63506.qm@web62407.mail.re1.yahoo.com> Message-ID: On Fri, Nov 5, 2010 at 12:08 PM, Michiel de Hoon wrote: > I'd like to suggest also that we deprecate Bio.Prosite.Prodoc; this > functionality moved to Bio.ExPASy.Prodoc at least since release 1.50, > and the module has been labeled as obsolete since then. The enclosing > module Bio.Prosite itself is already deprecated. Since Bio.Prosite is deprecated that means Bio.Prosite.Prodoc (and any other child modules) is too. If you try "from Bio.Prosite import Prodoc" you get a deprecation warning. Feel free to add "(DEPRECATED)" to the Bio.Prosite.Prodoc docstrings if you think it would be clearer. Peter From andrea at biocomp.unibo.it Fri Nov 5 16:43:16 2010 From: andrea at biocomp.unibo.it (Andrea Pierleoni) Date: Fri, 5 Nov 2010 17:43:16 +0100 (CET) Subject: [Biopython-dev] Merging Uniprot XML parser? In-Reply-To: References: Message-ID: <3cb74578eeedb8825ef75202c909b843.squirrel@lipid.biocomp.unibo.it> > On Tue, Oct 19, 2010 at 4:54 PM, Peter > wrote: > I've now merged this into the trunk (with a git rebase first so the > history > is linear - no branch+merge), and Andrea has agreed to retest it. > Other testing and comments are most welcome. > > Peter > I've done a couple of testing, from the master biopython branch. The uniprot-xml parser successfully parsed the 2010_11 release of uniprot containing 522,019 entries. The plain text 'swiss' parser took 6 mins to parse the complete flatfile uniprot db on my system (python 2.6 on a macbook pro, core2duo). the uniprot-xml parser took 12 minutes to do the same task when using cElementTree and looks pretty good to me (compare this to the 8 minutes I needed to download the gzipped db). 
However it took more than 80 mins to do the same task using ElementTree. So be aware that the parser can turn very slow without the C library. I'm currently retesting also on TrEMBL, but I don't think there is going to be any problem. I have no idea of the performances with jython, and similar derivations of python, nor if it works. Andrea From eric.talevich at gmail.com Fri Nov 5 17:26:03 2010 From: eric.talevich at gmail.com (Eric Talevich) Date: Fri, 5 Nov 2010 13:26:03 -0400 Subject: [Biopython-dev] Merging Uniprot XML parser? In-Reply-To: <3cb74578eeedb8825ef75202c909b843.squirrel@lipid.biocomp.unibo.it> References: <3cb74578eeedb8825ef75202c909b843.squirrel@lipid.biocomp.unibo.it> Message-ID: On Fri, Nov 5, 2010 at 12:43 PM, Andrea Pierleoni wrote: > > I've done a couple of testing, from the master biopython branch. > The uniprot-xml parser successfully parsed the 2010_11 release of uniprot > containing > 522,019 entries. > > [...] > > I have no idea of the performances with jython, and similar derivations of > python, nor if it works. > > Speaking from my experience with ElementTree in Bio.Phylo -- Jython 2.5's implementation of xml.etree should work as a drop-in replacement, but it's painfully slow. However, I've read that the next release of Jython will include some substantial overall speedups, which should make it more competitive. I once tried to get Biopython working on IronPython (on Mono, on Linux), but didn't succeed. The release I used didn't seem to have a compatible xml.etree implementation, though the developers may have made progress on this recently. -Eric From biopython at maubp.freeserve.co.uk Fri Nov 5 17:53:50 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 5 Nov 2010 17:53:50 +0000 Subject: [Biopython-dev] Merging Uniprot XML parser? 
In-Reply-To: <3cb74578eeedb8825ef75202c909b843.squirrel@lipid.biocomp.unibo.it> References: <3cb74578eeedb8825ef75202c909b843.squirrel@lipid.biocomp.unibo.it> Message-ID: On Fri, Nov 5, 2010 at 4:43 PM, Andrea Pierleoni wrote: > > On Tue, Oct 19, 2010 at 4:54 PM, Peter wrote: >> I've now merged this into the trunk (with a git rebase first so the >> history is linear - no branch+merge), and Andrea has agreed to >> retest it. Other testing and comments are most welcome. >> >> Peter >> > > > I've done a couple of testing, from the master biopython branch. > The uniprot-xml parser successfully parsed the 2010_11 release > of uniprot containing 522,019 entries. > > The plain text 'swiss' parser took 6 mins to parse the complete flatfile > uniprot db on my system (python 2.6 on a macbook pro, core2duo). > the uniprot-xml parser took 12 minutes to do the same task when using > cElementTree and looks pretty good to me (compare this to the 8 > minutes I needed to download the gzipped db). I think I have a slightly older version as it only has 519348 entries. My timings using Python 2.6 on Mac OS X, using looping over the file with Bio.SeqIO.parse() and incrementing a counter: uniprot_sprot.fasta, 232 MB, 15s ("fasta") uniprot_sprot.dat, 2.2 GB, 4m57s ("swiss") uniprot_sprot.xml, 4.5 GB, 10m34s ("uniprot-xml") Note the XML file is about twice the size of the plain text swiss format file, and as you noted, takes about twice as long to parse. > However it took more than 80 mins to do the same task using > ElementTree. So be aware that the parser can turn very slow > without the C library. > > I'm currently retesting also on TrEMBL, but I don't think there is going > to be any problem. OK - those files are about 10 times bigger, right? > I have no idea of the performances with jython, and similar > derivations of python, nor if it works. 
The tests all pass with Jython 2.5.1 (running under Mac OS X), and
here are some timings:

uniprot_sprot.fasta, 232 MB, 21s ("fasta")
uniprot_sprot.dat, 2.2 GB, 8m34s ("swiss")
uniprot_sprot.xml, 4.5 GB, FAILED ("uniprot-xml")

The XML file failed almost immediately with this traceback:

Traceback (most recent call last):
  File "../count.py", line 13, in
    for record in SeqIO.parse(open(filename), format_name):
  File "../count.py", line 13, in
    for record in SeqIO.parse(open(filename), format_name):
  File "/Users/xxx/jython2.5.1/Lib/site-packages/Bio/SeqIO/UniprotIO.py", line 80, in UniprotIterator
    for event, elem in ElementTree.iterparse(handle, events=("start", "end")):
  File "/Users/xxx/jython2.5.1/Lib/xml/etree/ElementTree.py", line 937, in next
    self._parser.feed(data)
  File "/Users/xxx/jython2.5.1/Lib/xml/etree/ElementTree.py", line 1245, in feed
    self._parser.Parse(data, 0)
  File "/Users/xxx/jython2.5.1/Lib/xml/parsers/expat.py", line 195, in Parse
    self._data.append(data)
        at java.util.Arrays.copyOf(Arrays.java:2882)
        at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
        at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:390)
        at java.lang.StringBuilder.append(StringBuilder.java:119)
        at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)

java.lang.OutOfMemoryError: java.lang.OutOfMemoryError: Java heap space

Note this wasn't a simple out of memory error (the machine had GBs
free), rather it was heap space. That's a bit frustrating - but Kyle's
email suggests things could improve in the next Jython release.

Peter

From andrea at biocomp.unibo.it Fri Nov 5 18:09:08 2010
From: andrea at biocomp.unibo.it (Andrea Pierleoni)
Date: Fri, 5 Nov 2010 19:09:08 +0100 (CET)
Subject: [Biopython-dev] Merging Uniprot XML parser?
In-Reply-To: References: <3cb74578eeedb8825ef75202c909b843.squirrel@lipid.biocomp.unibo.it> Message-ID: <37e194782e740bf5bd2e872bfc6a37d3.squirrel@lipid.biocomp.unibo.it> > I think I have a slightly older version as it only has 519348 entries. > My timings using Python 2.6 on Mac OS X, using looping over the > file with Bio.SeqIO.parse() and incrementing a counter: > > uniprot_sprot.fasta, 232 MB, 15s ("fasta") > uniprot_sprot.dat, 2.2 GB, 4m57s ("swiss") > uniprot_sprot.xml, 4.5 GB, 10m34s ("uniprot-xml") > my timings were without the counter :) > Note the XML file is about twice the size of the plain text swiss > format file, and as you noted, takes about twice as long to parse. > yes it's true, but iterating over the two files takes 18s for .dat one and 38s for .xml one. the information retrieved is more or less the same. the rest is overhead due to the XML file complexity. however it's pretty fast anyway, at least with cElementTree. >> I'm currently retesting also on TrEMBL, but I don't think there is going >> to be any problem. > > OK - those files are about 10 times bigger, right? it's currently 12 millions entries! so it's 24 times bigger (7.5Gb gzipped) in fact I can't complete the test today. I'll keep you updated. > > Note this wasn't a simple out of memory error (the machine had GBs > free), rather it was heap space. That's a bit frustrating - but Kyle's > email suggests things could improve in the next Jython release. > Is the new Jython release coming soon? I'm really a newbie to jython, so I don't think I can help with it. maybe it is safer for jython users to use the 'swiss' parser until the new release came out, particularly if they have performance issues. 
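
The timings in this thread come from looping over a file and counting
records, and the traceback above shows the XML parser driving
ElementTree.iterparse. A minimal standard-library sketch of that counting
pattern follows. It is not the UniprotIO implementation: the tiny document
and its element names are hypothetical rather than the UniProt schema, and
whether clearing elements would help with the Jython heap error is not
known.

```python
import io
import xml.etree.ElementTree as ElementTree

# Tiny stand-in document; the element names are hypothetical and are
# not taken from the UniProt XML schema.
xml_data = (
    b"<db>"
    b"<entry id='a'><seq>MKV</seq></entry>"
    b"<entry id='b'><seq>MTT</seq></entry>"
    b"</db>"
)


def count_entries(handle, tag="entry"):
    """Count elements named `tag`, parsing the handle incrementally."""
    count = 0
    # An "end" event fires once an element has been read completely.
    for event, elem in ElementTree.iterparse(handle, events=("end",)):
        if elem.tag == tag:
            count += 1
            # Drop the finished entry's text, attributes and children
            # so they do not accumulate in the growing tree.
            elem.clear()
    return count


total = count_entries(io.BytesIO(xml_data))
print(total)
```

The same loop shape works with a real file handle in place of the BytesIO
object; only the tag name would need to match the actual document.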
Andrea From mjldehoon at yahoo.com Sat Nov 6 02:41:57 2010 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 5 Nov 2010 19:41:57 -0700 (PDT) Subject: [Biopython-dev] Bio/cMarkovModelmodule.c In-Reply-To: Message-ID: <646748.14362.qm@web62407.mail.re1.yahoo.com> --- On Wed, 11/3/10, Peter wrote: > > I put the warning message in MarkovModel.py anyway, > > since it's very easy to miss if it's in setup.py. > > Do we really need the warning? I guess otherwise people > using this code > might notice a drop in performance if they were using our C > code version, > updated their Biopython, and then get the Python fallback > if their NumPy is too old. We need the warning, otherwise we'd leave the user guessing as to why their code is suddenly slower. > If we do keep the warning should it be silenced in > test_MarkovModel.py? OK I've added this warning. --Michiel. From biopython at maubp.freeserve.co.uk Mon Nov 8 16:12:06 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 8 Nov 2010 16:12:06 +0000 Subject: [Biopython-dev] Continuous integration server In-Reply-To: References: Message-ID: 2010/11/4 Peter : >> As we discussed before, I was thinking in adding an option to >> run_tests.py (like --offline) and change the tests that access the >> Internet to honour that flag. I was thinking in coding this myself and >> then send to the list for approval (I am not going to make big changes >> to the test framework myself without passing them through here). > > Yep, that sounds good. > > The previous discussion is here if anyone missed it: > http://lists.open-bio.org/pipermail/biopython-dev/2010-October/008295.html > Hi Tiago, I've implemented the proposed --offline switch in run_tests.py, https://github.com/biopython/biopython/commit/b6bbcea355a8f71df8654256d8da6ef8b8c02697 Does that work for you ? 
If you can come up with a more elegant solution do speak up - mine
is a bit of a hack ;)

Peter

From tiagoantao at gmail.com Mon Nov 8 16:17:07 2010
From: tiagoantao at gmail.com (Tiago Antão)
Date: Mon, 8 Nov 2010 16:17:07 +0000
Subject: [Biopython-dev] Continuous integration server
In-Reply-To:
References:
Message-ID:

2010/11/8 Peter :
> I've implemented the proposed --offline switch in run_tests.py,
>
> https://github.com/biopython/biopython/commit/b6bbcea355a8f71df8654256d8da6ef8b8c02697
>
> Does that work for you ? If you can come up with a more
> elegant solution do speak up - mine is a bit of a hack ;)

Thanks a lot. I was waiting for the 1.56 release to work on this thing
(to avoid adding entropy). But as this is now in, I will progress
immediately with the rest of the integration server work.
I will contact soon regarding Mac testing.

From tiagoantao at gmail.com Mon Nov 8 16:34:31 2010
From: tiagoantao at gmail.com (Tiago Antão)
Date: Mon, 8 Nov 2010 16:34:31 +0000
Subject: [Biopython-dev] Bio/Entrez/__init__.py
Message-ID:

Hi,

There is a doctest line that is making 2to3 go bonkers on Bio.Entrez
(__init__.py)
Line 55
            >>> for record in records:
            ...     # each record is a Python dictionary or list.

Simply adding a
...       pass

Is enough (the code should not work as it is an empty for, so 2to3 is
actually correct)

--
"If you want to get laid, go to college. If you want an education, go to
the library." - Frank Zappa

From biopython at maubp.freeserve.co.uk Mon Nov 8 16:38:08 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 8 Nov 2010 16:38:08 +0000
Subject: [Biopython-dev] Bio/Entrez/__init__.py
In-Reply-To:
References:
Message-ID:

2010/11/8 Tiago Antão :
> Hi,
>
> There is a doctest line that is making 2to3 go bonkers on Bio.Entrez
> (__init__.py)
> Line 55
>             >>> for record in records:
>             ...     # each record is a Python dictionary or list.
>
> Simply adding a
> ...       pass
>
> Is enough (the code should not work as it is an empty for, so 2to3 is
> actually correct)

Ah - that isn't actually being used as a doctest (we don't call it
in run_tests.py) and it wouldn't work if we tried because half
the function arguments are omitted or left as dots.

I like your solution of adding the pass line.

Peter

From mjldehoon at yahoo.com Tue Nov 9 01:22:39 2010
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Mon, 8 Nov 2010 17:22:39 -0800 (PST)
Subject: [Biopython-dev] Bio/Entrez/__init__.py
In-Reply-To:
Message-ID: <365364.32303.qm@web62403.mail.re1.yahoo.com>

I've added this line:

    ...    print record

which should solve the 2to3 error.

--Michiel.

--- On Mon, 11/8/10, Peter wrote:

> From: Peter
> Subject: Re: [Biopython-dev] Bio/Entrez/__init__.py
> To: "Tiago Antão"
> Cc: "BioPython-Dev Mailing List"
> Date: Monday, November 8, 2010, 11:38 AM
> 2010/11/8 Tiago Antão :
> > Hi,
> >
> > There is a doctest line that is making 2to3 go bonkers on Bio.Entrez
> > (__init__.py)
> > Line 55
> >             >>> for record in records:
> >             ...     # each record is a Python dictionary or list.
> >
> > Simply adding a
> > ...       pass
> >
> > Is enough (the code should not work as it is an empty for, so 2to3 is
> > actually correct)
>
> Ah - that isn't actually being used as a doctest (we don't call it
> in run_tests.py) and it wouldn't work if we tried because half
> the function arguments are omitted or left as dots.
>
> I like your solution of adding the pass line.
> > Peter > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From tiagoantao at gmail.com Tue Nov 9 09:12:29 2010 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Tue, 9 Nov 2010 09:12:29 +0000 Subject: [Biopython-dev] Bio/Entrez/__init__.py In-Reply-To: <365364.32303.qm@web62403.mail.re1.yahoo.com> References: <365364.32303.qm@web62403.mail.re1.yahoo.com> Message-ID: The buildbot server VM is currently down (Chris is moving it to another physical location). As soon as the machine is back up, I will activate the server and maybe we can start activating things on a Mac architecture. I was thinking in sending emails to the list (automatically) when a build that was previously working, stops doing so...? 2010/11/9 Michiel de Hoon : > I've added this line: > > ? ?... ? ?print record > > which should solve the 2to3 error. > > --Michiel. > > --- On Mon, 11/8/10, Peter wrote: > >> From: Peter >> Subject: Re: [Biopython-dev] Bio/Entrez/__init__.py >> To: "Tiago Ant?o" >> Cc: "BioPython-Dev Mailing List" >> Date: Monday, November 8, 2010, 11:38 AM >> 2010/11/8 Tiago Ant?o : >> > Hi, >> > >> > There is a doctest line that is making 2to3 go bonkers >> on Bio.Entrez >> > (__init__.py) >> > Line 55 >> > ? ? ? ? ? ? >>> for record in records: >> > ? ? ? ? ? ? ... ? ? # each record is a Python >> dictionary or list. >> > >> > Simplying adding a >> > ... ? ? ? pass >> > >> > Is enough (the code should not work as it is an empty >> for, so 2to3 is >> > actually correct) >> >> Ah - that isn't actually being used as a doctest (we don't >> call it >> in run_tests.py) and it wouldn't work if we tried because >> half >> the function arguments are omitted or left as dots. >> >> I like your solution of adding the pass line. 
>> >> Peter >> > > > > -- "If you want to get laid, go to college. If you want an education, go to the library." - Frank Zappa From biopython at maubp.freeserve.co.uk Tue Nov 9 09:57:47 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 9 Nov 2010 09:57:47 +0000 Subject: [Biopython-dev] Continuous integration server Message-ID: 2010/11/9 Tiago Antão : > The buildbot server VM is currently down (Chris is moving it to > another physical location). As soon as the machine is back up, I will > activate the server and maybe we can start activating things on a Mac > architecture. > > I was thinking of sending emails to the list (automatically) when a > build that was previously working stops doing so...? > That sounds worth trying, as it removes the need for us to actively check the buildbot server's web report. Alternatively we should be able to use the RSS/Atom feed. One concern is if we have (say) 8 buildbot slaves, and a change on the trunk accidentally breaks a unit test (on all platforms), does that mean we'd get one email or eight? Peter From tiagoantao at gmail.com Tue Nov 9 10:14:37 2010 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Tue, 9 Nov 2010 10:14:37 +0000 Subject: [Biopython-dev] Continuous integration server In-Reply-To: References: Message-ID: 2010/11/9 Peter : > That sounds worth trying, as it removes the need for us to actively > check the buildbot server's web report. Alternatively we should be > able to use the RSS/Atom feed. The web interface has RSS and Atom. > One concern is if we have (say) 8 buildbot slaves, and a change on > the trunk accidentally breaks a unit test (on all platforms), does that > mean we'd get one email or eight? It can be configured to send only 1.
I just cannot promise that I will get the configuration right the first time ;) . But it can be done. From biopython at maubp.freeserve.co.uk Tue Nov 9 10:33:26 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 9 Nov 2010 10:33:26 +0000 Subject: [Biopython-dev] Continuous integration server In-Reply-To: References: Message-ID: 2010/11/9 Tiago Antão : > 2010/11/9 Peter : >> That sounds worth trying, as it removes the need for us to actively >> check the buildbot server's web report. Alternatively we should be >> able to use the RSS/Atom feed. > > The web interface has RSS and Atom. Yet another feed for me to track :) Emails have the advantage of being logged on the mailing list archive. Let's try it and see how it goes. >> One concern is if we have (say) 8 buildbot slaves, and a change on >> the trunk accidentally breaks a unit test (on all platforms), does that >> mean we'd get one email or eight? > > It can be configured to send only 1. I just cannot promise that I will > get the configuration right the first time ;) . But it can be done. I thought they (buildbot) would have considered that example :) You'll probably need the buildbot server's email address added to the biopython-dev mailing list's white list - let me know nearer the time. Peter From tiagoantao at gmail.com Tue Nov 9 14:07:56 2010 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Tue, 9 Nov 2010 14:07:56 +0000 Subject: [Biopython-dev] bugzilla jython platform Message-ID: Hi, Just a minor thingy: would it be possible to have a bugzilla platform called jython? (Or OS). I am going to report a bug on Jython and noticed that it is not available. -- "If you want to get laid, go to college. If you want an education, go to the library."
- Frank Zappa From bugzilla-daemon at portal.open-bio.org Tue Nov 9 14:09:42 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 9 Nov 2010 09:09:42 -0500 Subject: [Biopython-dev] [Bug 3155] New: Some Phylip tools seem to fail on Jython Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=3155 Summary: Some Phylip tools seem to fail on Jython Product: Biopython Version: Not Applicable Platform: PC OS/Version: Linux Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: tiagoantao at gmail.com According to the integration tests, some Phylip tools seem to fail on Jython. Please see below or http://events.open-bio.org:8010/builders/jython/builds/18 ====================================================================== ERROR: pseudosample a phylip DNA alignment written with AlignIO ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/tantao/test/slave/jython/build/Tests/test_EmbossPhylipNew.py", line 270, in test_bootstrap_AlignIO_DNA self.check_bootstrap("Phylip/opuntia.phy", "phylip") File "/home/tantao/test/slave/jython/build/Tests/test_EmbossPhylipNew.py", line 251, in check_bootstrap raise ValueError("Return code %s from:\n%s" \ ValueError: Return code 1 from: fseqboot -auto -filter -outfile=test_file -sequence=Phylip/opuntia.phy -seqtype=d -reps=2 ====================================================================== ERROR: pseudosample a phylip protein alignment written with AlignIO ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/tantao/test/slave/jython/build/Tests/test_EmbossPhylipNew.py", line 279, in test_bootstrap_AlignIO_protein self.check_bootstrap("Phylip/hedgehog.phy", "phylip", "p") File "/home/tantao/test/slave/jython/build/Tests/test_EmbossPhylipNew.py", line 251, in check_bootstrap raise 
ValueError("Return code %s from:\n%s" \ ValueError: Return code 1 from: fseqboot -auto -filter -outfile=test_file -sequence=Phylip/hedgehog.phy -seqtype=p -reps=2 ====================================================================== ERROR: Calculate distance matrix from an AlignIO written protein alignment ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/tantao/test/slave/jython/build/Tests/test_EmbossPhylipNew.py", line 157, in test_distances_from_protein_AlignIO self.distances_from_alignment("Phylip/hedgehog.phy", DNA=False) File "/home/tantao/test/slave/jython/build/Tests/test_EmbossPhylipNew.py", line 117, in distances_from_alignment raise ValueError("Return code %s from:\n%s" \ ValueError: Return code 1 from: fprotdist -auto -outfile=test_file -sequence=Phylip/hedgehog.phy -method=j ====================================================================== ERROR: Make a parsimony tree from an alignment written with AlignIO ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/tantao/test/slave/jython/build/Tests/test_EmbossPhylipNew.py", line 210, in test_parsimony_tree_from_AlignIO_DNA self.parsimony_tree("Phylip/opuntia.phy", "phylip") File "/home/tantao/test/slave/jython/build/Tests/test_EmbossPhylipNew.py", line 194, in parsimony_tree raise ValueError("Return code %s from:\n%s" \ ValueError: Return code 1 from: fdnapars -auto -stdout -sequence=Phylip/opuntia.phy -outtreefile=test_file ====================================================================== -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
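All four errors in the report above come from the same kind of check: the test runs an external command line and raises ValueError when the return code is non-zero. The following is only a minimal sketch of that pattern, assuming a hypothetical helper named run_checked; it is not the code in test_EmbossPhylipNew.py, and it calls the standard library subprocess module directly rather than any Biopython wrapper.

```python
import subprocess
import sys


def run_checked(args):
    """Run a command line; raise ValueError on a non-zero return code.

    Hypothetical helper for illustration only.
    """
    child = subprocess.Popen(args,
                             stdout=subprocess.PIPE,
                             stderr=subprocess.PIPE,
                             universal_newlines=True)
    out, err = child.communicate()
    if child.returncode != 0:
        # Include stderr so the reason for the failure is visible
        raise ValueError("Return code %s from:\n%s\n%s"
                         % (child.returncode, " ".join(args), err))
    return out


if __name__ == "__main__":
    # Use the running interpreter as a stand-in for an external tool
    print(run_checked([sys.executable, "-c", "print('ok')"]))
```

Putting the tool's stderr in the exception message may make a bare "Return code 1" easier to diagnose, for example when an input file is missing.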
From biopython at maubp.freeserve.co.uk Tue Nov 9 14:14:10 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 9 Nov 2010 14:14:10 +0000 Subject: [Biopython-dev] bugzilla jython platform In-Reply-To: References: Message-ID: 2010/11/9 Tiago Antão : > Hi, > > Just a minor thingy: would it be possible to have a bugzilla platform > called jython? (Or OS). > > I am going to report a bug on Jython and noticed that it is not available. > It doesn't make sense to me to add Jython as an OS (for one thing, the OS field is used by all the Bio* projects on our bugzilla, also you can run Jython on Windows/Mac/Linux etc). Currently we don't even have a field for the Python version... maybe we should add a whole new (Biopython only) field for this (e.g. with Python 2.4, 2.5, 2.6, 2.7, 3.1, and Jython 2.5 as choices for now). Peter From bugzilla-daemon at portal.open-bio.org Tue Nov 9 14:26:57 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 9 Nov 2010 09:26:57 -0500 Subject: [Biopython-dev] [Bug 3155] Some Phylip tools seem to fail on Jython In-Reply-To: Message-ID: <201011091426.oA9EQvws028228@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3155 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2010-11-09 09:26 EST ------- I realise I don't have EMBOSS phylipnew installed on my machine with Jython, so the test has just been skipped. What version of Jython? What version of EMBOSS, and the phylipnew package? Do these tests pass *on the same machine* if run in normal (C) Python? Alternately, do these four command line examples work when run by hand?
From bugzilla-daemon at portal.open-bio.org Tue Nov 9 14:55:50 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 9 Nov 2010 09:55:50 -0500 Subject: [Biopython-dev] [Bug 3155] Some Phylip tools seem to fail on Jython In-Reply-To: Message-ID: <201011091455.oA9Eto7n029965@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3155 ------- Comment #2 from tiagoantao at gmail.com 2010-11-09 09:55 EST ------- (In reply to comment #1) > What version of Jython? Jython 2.5.2rc2 > What version of EMBOSS, and the phylipnew package? EMBOSS 6.0.1 Phylip seems 3.68 > Do these tests pass *on the same machine* if run in normal (C) Python? Yep. This is the same machine as the one doing integration testing in C-Python > Alternately, do these four command line examples work when run by hand? No. I've noticed that the example files do not exist! e.g. Phylip/opuntia.phy does not exist. Indeed this should not work, I think -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Nov 9 15:05:29 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 9 Nov 2010 10:05:29 -0500 Subject: [Biopython-dev] [Bug 3155] Some Phylip tools seem to fail on Jython In-Reply-To: Message-ID: <201011091505.oA9F5SxD030383@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3155 ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2010-11-09 10:05 EST ------- (In reply to comment #2) > (In reply to comment #1) > > What version of Jython? > > Jython 2.5.2rc2 Can you easily update to Jython 2.5.2 (actual release)? > > What version of EMBOSS, and the phylipnew package? > > EMBOSS 6.0.1 > Phylip seems 3.68 Your EMBOSS is a bit old, but should be fine. 
> > Do these tests pass *on the same machine* if run in normal (C) Python? > > Yep. This is the same machine as the one doing integration testing in C-Python > Good - that means we can rule out EMBOSS being too old. > > Alternately, do these four command line examples work when run by hand? > > No. I've noticed that the example files do not exist! e.g. Phylip/opuntia.phy > does not exist. Indeed this should not work, I think > The unit tests create Phylip/opuntia.phy at runtime, converted from Clustalw/opuntia.aln -- I'd forgotten about that and it does make testing the individual commands harder. The point here is to ensure the PHYLIP likes what we write out as PHYLIP format. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Nov 9 15:11:37 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 9 Nov 2010 10:11:37 -0500 Subject: [Biopython-dev] [Bug 3155] Some Phylip tools seem to fail on Jython In-Reply-To: Message-ID: <201011091511.oA9FBbaK030580@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3155 ------- Comment #4 from tiagoantao at gmail.com 2010-11-09 10:11 EST ------- > Can you easily update to Jython 2.5.2 (actual release)? rc2 is the most recent. I can do 2.5.*1* -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Tue Nov 9 15:33:39 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 9 Nov 2010 10:33:39 -0500 Subject: [Biopython-dev] [Bug 3155] Some Phylip tools seem to fail on Jython In-Reply-To: Message-ID: <201011091533.oA9FXdSo031629@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3155 ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2010-11-09 10:33 EST ------- (In reply to comment #4) > > Can you easily update to Jython 2.5.2 (actual release)? > > rc2 is the most recent. I can do 2.5.*1* Sorry - my mistake. I have Jython 2.5.1 (final release). I'll try to get EMBOSS phylipnew on this machine (useful anyway as a potential buildbot slave). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Tue Nov 9 22:54:13 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 9 Nov 2010 22:54:13 +0000 Subject: [Biopython-dev] buildbot and setup.py Message-ID: Hi all, For the continuous integration server, it is important to be able to run setup.py without it prompting the user. There are (just?) two potential prompts at the moment. First, if running on Python 3, it asks the user to confirm they have run 2to3 as per the README file. This was done as a bit of a hack - perhaps now that most of the Python code works on Py3 we can avoid this? Second, if running without NumPy, it asks the user if they really want to do this as it is best to install NumPy to use all of Biopython. For the purposes of the buildbot, I think we should have at least one build-slave without NumPy. This should then catch any regressions in the test suite. 
Since Jython doesn't have NumPy (and so we don't prompt about it) then maybe that would double in this role for the test matrix ;) Right now Tiago has solved the first prompt (about 2to3) by piping a "y\n" into stdin. I guess piping two would solve the case of no NumPy on Py3 ;) However, do we need an --auto or --force flag to bypass these yes or no prompts in setup.py? (Meanwhile I'm off to install NumPy under Python 3 on my Linux box which will avoid the issue for now) Peter From tiagoantao at gmail.com Wed Nov 10 00:15:02 2010 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Wed, 10 Nov 2010 00:15:02 +0000 Subject: [Biopython-dev] Continuous integration server In-Reply-To: References: Message-ID: 2010/11/9 Peter : > One concern is if we have (say) 8 buildbot slaves, and a change on > the trunk accidentally breaks a unit test (on all platforms), does that > mean we'd get one email or eight? I was wrong here. It is not possible to send only one email. I misread the documentation. But it is quite simple to extend the mail system (by code) to do this. At least it seems simple: I will have a try at it tomorrow. For now I am only sending automated emails to myself and Peter. If anyone wants to be in the loop, please tell me. As soon as the system is reliable I will send to biopython-dev. From tiagoantao at gmail.com Wed Nov 10 00:21:15 2010 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Wed, 10 Nov 2010 00:21:15 +0000 Subject: [Biopython-dev] Continuous integration server In-Reply-To: References: Message-ID: 2010/11/9 Peter : >> The web interface has RSS and Atom. > > Yet another feed for me to track :) To minimize the number of feed entries one can specify constraints; a useful one is to report only failed builds, like this: http://events.open-bio.org:8010/rss?failures_only=true which only shows entries that relate to failures.
Tiago From eric.talevich at gmail.com Wed Nov 10 03:04:38 2010 From: eric.talevich at gmail.com (Eric Talevich) Date: Tue, 9 Nov 2010 22:04:38 -0500 Subject: [Biopython-dev] buildbot and setup.py In-Reply-To: References: Message-ID: On Tue, Nov 9, 2010 at 5:54 PM, Peter wrote: > Hi all, > > For the continuous integration server, it is important > to be able to run setup.py without it prompting the > user. There are (just?) two potential prompts at the > moment. > > [...] > However, do we need an --auto or --force flag > to bypass these yes or no prompts in setup.py? > I'd find a flag like that convenient for running setup.py manually, too. For reference: apt-get takes a "-y" option which assumes a "yes" answer to all prompts, just like this. -Eric From biopython at maubp.freeserve.co.uk Wed Nov 10 11:48:30 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 10 Nov 2010 11:48:30 +0000 Subject: [Biopython-dev] test_PopGen_GenePop_EasyController.py failure on Jython Message-ID: Hi Tiago, From your buildbot log for Jython 2.5.2 (release candidate 2), and also my Mac OS Jython 2.5.1 install, we have a PopGen failure: ====================================================================== FAIL: Test get alleles. ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/tantao/test/slave/jython252lin/build/Tests/test_PopGen_GenePop_EasyController.py", line 57, in test_get_alleles self.assertEqual(self.ctrl.get_alleles(0,"Locus3"), [3, 20]) AssertionError: [20, 3] != [3, 20] Notice that by using the unittest assertEqual method we get to see the values compared: https://github.com/biopython/biopython/commit/06a719be51ecd207b781224d3f57bb5ebb07198a Before the change the output was like this: ====================================================================== FAIL: Test get alleles.
---------------------------------------------------------------------- Traceback (most recent call last): File "test_PopGen_GenePop_EasyController.py", line 57, in test_get_alleles assert self.ctrl.get_alleles(0,"Locus3") == [3, 20] AssertionError It is interesting that Jython is giving [20, 3] rather than [3, 20]. My guess would be this is down to something python implementation specific like the sort order of dictionaries or sets, in which case the unittest needs to compare sorted lists -- or the get_alleles method needs a sort? Peter From tiagoantao at gmail.com Wed Nov 10 13:05:59 2010 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Wed, 10 Nov 2010 13:05:59 +0000 Subject: [Biopython-dev] test_PopGen_GenePop_EasyController.py failure on Jython In-Reply-To: References: Message-ID: I know, this might be an issue with the jython version (being just a release candidate). I am going to wait for results on 2.5.1 and compare. Or I might just install it myself and see. Is there any reason for the unittest framework to ignore OSErrors? I am getting some OSErrors (just in jython 2.5.2) and they are being ignored (but reported as warnings)... Tiago 2010/11/10 Peter : > Hi Taigo, > > From your buildbot log for Jython 2.5.2 (release candidate 2), and > also my Mac OS > Jython 2.5.1 install, we have a PopGen failure: > > ====================================================================== > FAIL: Test get alleles. > ---------------------------------------------------------------------- > Traceback (most recent call last): > ?File "/home/tantao/test/slave/jython252lin/build/Tests/test_PopGen_GenePop_EasyController.py", > line 57, in test_get_alleles > ? 
self.assertEqual(self.ctrl.get_alleles(0,"Locus3"), [3, 20]) > AssertionError: [20, 3] != [3, 20] > > Notice that by using the unittest assertEqual method we get to see the > values compared: > https://github.com/biopython/biopython/commit/06a719be51ecd207b781224d3f57bb5ebb07198a > > Before the change the output was like this: > > ====================================================================== > FAIL: Test get alleles. > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "test_PopGen_GenePop_EasyController.py", line 57, in test_get_alleles > assert self.ctrl.get_alleles(0,"Locus3") == [3, 20] > AssertionError > > > It is interesting that Jython is giving [20, 3] rather than [3, 20]. My > guess would be this is down to something python implementation > specific like the sort order of dictionaries or sets, in which case > the unittest needs to compare sorted lists -- or the get_alleles > method needs a sort? > > Peter > -- "If you want to get laid, go to college. If you want an education, go to the library." - Frank Zappa From biopython at maubp.freeserve.co.uk Wed Nov 10 13:15:16 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 10 Nov 2010 13:15:16 +0000 Subject: [Biopython-dev] test_PopGen_GenePop_EasyController.py failure on Jython In-Reply-To: References: Message-ID: 2010/11/10 Tiago Antão : > > I know, this might be an issue with the jython version (being just a > release candidate). I am going to wait for results on 2.5.1 and > compare. Or I might just install it myself and see. I also see the same test_get_alleles failure on the Mac and on Windows 32 using Jython 2.5.1, so it isn't a Jython 2.5.2 release candidate specific issue. > Is there any reason for the unittest framework to ignore OSErrors? I > am getting some OSErrors (just in jython 2.5.2) and they are being > ignored (but reported as warnings)...
> > Tiago I've just recently put Jython 2.5.1 on my Windows box, and in addition to the test_get_alleles failure, I also see OSErrors about being unable to delete files (but the F stats test still passes). This seems to be a wider issue, affecting more than just test_PopGen_GenePop_EasyController.py, but it does seem to be OS specific (no problems deleting files in Jython 2.5.1 on my Mac, I've not tried on Linux). Peter From biopython at maubp.freeserve.co.uk Wed Nov 10 14:14:07 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 10 Nov 2010 14:14:07 +0000 Subject: [Biopython-dev] test_PopGen_SimCoal.py on Windows Message-ID: Hi Tiago Is/was test_PopGen_SimCoal.py working for you on Windows? I'm getting "Output directory not created!" under Python 2.6 I've also tried it under Jython 2.5.1 and had to tweak things to find the executable, thus: https://github.com/biopython/biopython/commit/95cba71f7286860fa9cd79843c47b075a2f530a6 Now both Jython 2.5.1 and Python 2.6 give the same error, "Output directory not created!" (progress I suppose). Peter P.S. On the bright side, both the FDist2 and DFDist tests are passing on Windows on Python 2.6 and Jython 2.5.1 now (after a couple of little tweaks). From tiagoantao at gmail.com Wed Nov 10 14:35:31 2010 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Wed, 10 Nov 2010 14:35:31 +0000 Subject: [Biopython-dev] test_PopGen_GenePop_EasyController.py failure on Jython In-Reply-To: References: Message-ID: 2010/11/10 Peter : > I've just recently put Jython 2.5.1 on my Windows box, and > in addition to the test_get_alleles failure, I also see OSErrors > about being unable to delete files (but the F stats test still > passes). This seems to be a wider issue, affecting more than > just test_PopGen_GenePop_EasyController.py, but it does > seem to be OS specific (no problems deleting files in > Jython 2.5.1 on my Mac, I've not tried on Linux). The OSError has the potential to be somewhat nasty (i.e.
throughout other Bio.* modules) as it is silent. There might be tests failing that report OK. Tiago From tiagoantao at gmail.com Wed Nov 10 14:42:18 2010 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Wed, 10 Nov 2010 14:42:18 +0000 Subject: [Biopython-dev] test_PopGen_SimCoal.py on Windows In-Reply-To: References: Message-ID: 2010/11/10 Peter : > Hi Tiago > > Is/was test_PopGen_SimCoal.py working for you on Windows? > I'm getting "Output directory not created!" under Python 2.6 This code is used 99.99% on Jython (as the fdist/dfdist code and genepop parser, BTW). I happen to test on Linux. I will fire up my Windows machine and have a look, but I do not have it at hand. (This will have to wait a few hours, or a couple of days at most.) > Now both Jython 2.5.1 and Python 2.6 give the same error, > "Output directory not created!" (progress I suppose). I cannot test this here, but I am 99% sure that the problem is the executable name (case sensitive on Windows and Mac, maybe even on Windows Jython?). If it is compiled with a capital S (seen happening) it might be a problem. > P.S. On the bright side, both the FDist2 and DFDist tests are > passing on Windows on Python 2.6 and Jython 2.5.1 now > (after a couple of little tweaks). Were they failing on Jython? I do have a reasonable number of users of my applications (jython based)... -- "If you want to get laid, go to college. If you want an education, go to the library." - Frank Zappa From biopython at maubp.freeserve.co.uk Wed Nov 10 15:13:27 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 10 Nov 2010 15:13:27 +0000 Subject: [Biopython-dev] test_PopGen_SimCoal.py on Windows In-Reply-To: References: Message-ID: 2010/11/10 Tiago Antão : > > 2010/11/10 Peter : >> Hi Tiago >> >> Is/was test_PopGen_SimCoal.py working for you on Windows? >> I'm getting "Output directory not created!" under Python 2.6 > > This code is used 99.99% on Jython (as the fdist/dfdist code and > genepop parser, BTW).
I happen to test on Linux. > I will fire my Windows machine and have a look, but I do not have it > at hand. This will have to wait a few hours or a couple of days at > most) > > >> Now both Jython 2.5.1 and Python 2.6 give the same error, >> "Output directory not created!" (progress I suppose). > > I cannot test this here, but I am 99% sure that the problem is the > executable name (case sensitive on Windows and Mac, maybe even on > Windows Jython?). If it is compiled with a capital S (seen happening) > it might be a problem. It could also be something with spaces in filenames, much more common on Windows :( >> P.S. On the bright side, both the FDist2 and DFDist tests are >> passing on Windows on Python 2.6 and Jython 2.5.1 now >> (after a couple of little tweaks). > > Were they failing on Jython? I do have a reasonable amount > of users on my applications (jython based)... I tweaked the executable checking in the unit tests, it now looks for all four binaries required, and works on Windows (both Python and Jython) and Mac (both Python and Jython). Peter From biopython at maubp.freeserve.co.uk Wed Nov 10 17:35:37 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 10 Nov 2010 17:35:37 +0000 Subject: [Biopython-dev] test_PopGen_SimCoal.py on Windows In-Reply-To: References: Message-ID: 2010/11/10 Peter : >> >> I cannot test this here, but I am 99% sure that the problem is the >> executable name (case sensitive on Windows and Mac, maybe even on >> Windows Jython?). If it is compiled with a capital S (seen happening) >> it might be a problem. > > It could also be something with spaces in filenames, much > more common on Windows :( > Yep, that was it. Fixed: https://github.com/biopython/biopython/commit/e24f1662b5e619d558fea17c11ddea12c3561e53 I've got my Windows box running as a buildslave now, so fingers crossed it will all be green. 
Peter From lpritc at scri.ac.uk Thu Nov 11 14:12:21 2010 From: lpritc at scri.ac.uk (Leighton Pritchard) Date: Thu, 11 Nov 2010 14:12:21 +0000 Subject: [Biopython-dev] Bioinformatics position Message-ID: We have a bioinformatics post available at SCRI, and would be grateful if you could please bring it to the attention of any colleagues who may be interested in applying. It is advertised at http://www.jobs.ac.uk/job/ABS904/bioinformatics/ and some details are included below: """ Bioinformatics Scottish Crop Research Institute - SCRI SCRI is Scotland's leading Institute for research on plants and their interactions with the environment, particularly in managed ecosystems. Our mission is to conduct excellent research in plant and environmental sciences. Our vision is to deliver innovative products, knowledge and services that enrich the life of the community and address the public goods of environmental sustainability, high quality and healthy food. Post Reference SMB/1/10 Research in the Plant Pathology Programme at SCRI is founded on pathogen genomics, and scientists in the Programme have a strong track record of contributing to whole genome sequencing and genetic analysis of economically important pests and pathogens. The successful candidate will collaborate with other groups in the Programme working on plant-pathogen interactions developing innovative approaches to understand disease processes. This post provides an opportunity to influence biological research of direct impact to agriculture. The ideal candidate would be experienced in manipulating and curating large biological datasets with a record of collaboration and integration with biologists. The successful applicant is expected to have an interest in plant-pathogen interactions and to develop their own research profile. The candidate should have a PhD or equivalent in bioinformatics, biostatistics or a related field.
Informal enquiries from: Leighton.Pritchard at scri.ac.uk or Lesley.Torrance at scri.ac.uk Salary Scale For All Posts: *Band D/E, £26,610 - £37,534 (commensurate with experience) *Appointments to Band F, £42,769 - £47,521 available for exceptional candidates. Candidates willing to apply for a research fellowship to further help establish their own laboratory are encouraged to apply and will, if successful, benefit from generous Institute support throughout the tenure of their fellowship. Further information on the above posts, including how to apply, is available on the SCRI website at http://www.scri.ac.uk/careers/vacancies Closing date - Friday 19th November 2010. The Institute is an equal opportunities employer. """ Many thanks, L. -- Dr Leighton Pritchard MRSC D131, Plant Pathology Programme, SCRI Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA e:lpritc at scri.ac.uk w:http://www.scri.ac.uk/staff/leightonpritchard gpg/pgp: 0xFEFC205C tel:+44(0)1382 562731 x2405 ______________________________________________________ SCRI, Invergowrie, Dundee, DD2 5DA. The Scottish Crop Research Institute is a charitable company limited by guarantee. Registered in Scotland No: SC 29367. Recognised by the Inland Revenue as a Scottish Charity No: SC 006662. DISCLAIMER: This email is from the Scottish Crop Research Institute, but the views expressed by the sender are not necessarily the views of SCRI and its subsidiaries. This email and any files transmitted with it are confidential to the intended recipient at the e-mail address to which it has been addressed. It may not be disclosed or used by any other than that addressee. If you are not the intended recipient you are requested to preserve this confidentiality and you must not use, disclose, copy, print or rely on this e-mail in any way. Please notify postmaster at scri.ac.uk quoting the name of the sender and delete the email from your system.
Although SCRI has taken reasonable precautions to ensure no viruses are present in this email, neither the Institute nor the sender accepts any responsibility for any viruses, and it is your responsibility to scan the email and the attachments (if any). ______________________________________________________ From biopython at maubp.freeserve.co.uk Thu Nov 11 16:45:43 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 11 Nov 2010 16:45:43 +0000 Subject: [Biopython-dev] Uniprot XML parser on TrEmbl In-Reply-To: References: <3cb74578eeedb8825ef75202c909b843.squirrel@lipid.biocomp.unibo.it> Message-ID: On Thu, Nov 11, 2010 at 4:08 PM, Andrea Pierleoni wrote: > I finally found the time, and the 62Gb needed to test the TrEmbl database > in uniprot xml format. Is that the size on disk of the XML file? 62GB is a lot. > The analysis is currently running, but so far I've been able to parse 1 > million entries out of 12 million (it will run overnight...) > > I've had just one problem, with the entry Q2LEH1_9ROSI: > in the downloaded file there are multiple organism name fields, one of > which is empty: > > ... > Populus tomentosa x P. bolleana) x P. tomentosa > var. truncat > ... > > This part of the file is reported differently on the UniProt server at: > http://www.uniprot.org/uniprot/Q2LEH1.xml > > ... > (Populus tomentosa x P. bolleana) x P. tomentosa > var. truncata > ... > > Given also the missing opening parenthesis, I think there is an error > in the downloaded XML file. It sounds like it - have you told UniProt? > I've attached a patch that should cope with this issue. I don't know if > there are more "errors" in the XML file. > The patch was made against the current Biopython master branch on > github and is valid for commit 9363c3cdc5f51805f247. > > Andrea Checked in, thanks: https://github.com/biopython/biopython/commit/38da3ff264fe180e903cda4c143a7aa9be3d431a Peter From andrea at biocomp.unibo.it Thu Nov 11 16:08:58 2010 From: andrea at biocomp.unibo.it (Andrea Pierleoni) Date: Thu, 11 Nov 2010 17:08:58 +0100 (CET) Subject: [Biopython-dev] Uniprot XML parser on TrEmbl In-Reply-To: References: <3cb74578eeedb8825ef75202c909b843.squirrel@lipid.biocomp.unibo.it> Message-ID: I finally found the time, and the 62Gb needed to test the TrEmbl database in uniprot xml format. The analysis is currently running, but so far I've been able to parse 1 million entries out of 12 million (it will run overnight...) I've had just one problem, with the entry Q2LEH1_9ROSI: in the downloaded file there are multiple organism name fields, one of which is empty: ... Populus tomentosa x P. bolleana) x P. tomentosa var. truncat ... This part of the file is reported differently on the UniProt server at: http://www.uniprot.org/uniprot/Q2LEH1.xml ... (Populus tomentosa x P. bolleana) x P. tomentosa var. truncata ... Given also the missing opening parenthesis, I think there is an error in the downloaded XML file. I've attached a patch that should cope with this issue. I don't know if there are more "errors" in the XML file. The patch was made against the current Biopython master branch on github and is valid for commit 9363c3cdc5f51805f247. Andrea -------------- next part -------------- A non-text attachment was scrubbed... Name: UniprotIO.patch Type: / Size: 610 bytes Desc: not available URL: From andrea at biocomp.unibo.it Thu Nov 11 17:15:08 2010 From: andrea at biocomp.unibo.it (Andrea Pierleoni) Date: Thu, 11 Nov 2010 18:15:08 +0100 (CET) Subject: [Biopython-dev] Uniprot XML parser on TrEmbl In-Reply-To: References: <3cb74578eeedb8825ef75202c909b843.squirrel@lipid.biocomp.unibo.it> Message-ID: > > Is that the size on disk of the XML file? 62GB is a lot. Yes, my macbook is getting very hot...
> It sounds like it - have you told UniProt? I've notified them, let's see what they say... Anyhow, the parser works. I just don't know whether we should take a lenient, web-browser-like approach to malformed input, or be consistent and raise an error whenever there is a format error. In this case an empty organism name is an error. From biopython at maubp.freeserve.co.uk Thu Nov 11 19:16:57 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 11 Nov 2010 19:16:57 +0000 Subject: [Biopython-dev] test_PopGen_GenePop_EasyController.py failure on Jython In-Reply-To: References: Message-ID: 2010/11/10 Peter : > 2010/11/10 Tiago Antão : >> >> I know, this might be an issue with the jython version (being just a >> release candidate). I am going to wait for results on 2.5.1 and >> compare. Or I might just install it myself and see. > > I also see the same test_get_alleles failure on the Mac and on > Windows 32 using Jython 2.5.1, so it isn't a Jython 2.5.2 release > candidate specific issue. Yes, the order just came from the order of a dict's keys - which is Python implementation dependent. Quick fix committed: https://github.com/biopython/biopython/commit/2aa604e54df02804219e092141bb32728b021a64 If you actually care about the order, then perhaps add a sorted(...) to the get_alleles method itself instead? Peter From biopython at maubp.freeserve.co.uk Thu Nov 11 20:19:05 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 11 Nov 2010 20:19:05 +0000 Subject: [Biopython-dev] Jython on Windows: OSError deleting files Message-ID: Hi all, I recently installed Jython 2.5.1 on Windows XP (32 bit) for use as a build slave. This showed up some new bugs, in particular several problems with trying to delete temp files triggering an OSError. It turns out this can be triggered by trying to delete a file while we still have a handle open on it.
This is a Windows limitation, but we don't see it on normal Python because there the garbage collector closes handles promptly when they go out of scope. The Java garbage collector doesn't do that. See also: http://web.archiveorange.com/archive/v/8tc1Z6ysA03SXedms7TA In particular, I am aware that if given a filename the SeqIO and AlignIO read and parse functions did not explicitly close the handle they open. I was intending to address this with a with statement in Python 2.5+, but it can be solved in Python 2.4 as well. I have started to address this, e.g. https://github.com/biopython/biopython/commit/0fb039b745b0b2ddacf2a6c9ee8afcdb56018f3c https://github.com/biopython/biopython/commit/936ea5f348cc1feea8556d263761e77ce960217e Assuming it will be easier to fix on Python 2.5+, it might be pragmatic to ignore the issue in the short term since it only seems to affect Jython on Windows. Peter From rjalves at igc.gulbenkian.pt Thu Nov 11 22:06:06 2010 From: rjalves at igc.gulbenkian.pt (Renato Alves) Date: Thu, 11 Nov 2010 22:06:06 +0000 Subject: [Biopython-dev] Uniprot parsers Message-ID: <4CDC68CE.9070401@igc.gulbenkian.pt> Hi everyone, With the arrival of the Uniprot XML parser, is the swiss format still going to be maintained? I just clashed with a 'swiss' format parsing problem present in the 1.55b release (and previous releases). Seems like the format might have changed. One random case is [1] where all of the 2nd and following IDs are ignored by the parser. In Ensembl, for instance, the parser only collects the ENST (the 1st) but not the ENSP (2nd) and ENSG (3rd) identifiers. Is this a known issue? Regards, Renato [1] http://www.uniprot.org/uniprot/P31946.txt -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 262 bytes Desc: OpenPGP digital signature URL: From biopython at maubp.freeserve.co.uk Thu Nov 11 22:26:22 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 11 Nov 2010 22:26:22 +0000 Subject: [Biopython-dev] Uniprot parsers In-Reply-To: <4CDC68CE.9070401@igc.gulbenkian.pt> References: <4CDC68CE.9070401@igc.gulbenkian.pt> Message-ID: On Thu, Nov 11, 2010 at 10:06 PM, Renato Alves wrote: > Hi everyone, > > With the arrival of the Uniprot XML parser, is the swiss format still > going to be maintained? Definitely yes in the short term; for one thing, the swiss files are smaller and much faster to parse. I suspect UniProt themselves may want to retire the swiss text format at some point, but moving every user over to XML will take some time. > I just clashed with a 'swiss' format parsing problem present in the > 1.55b release (and previous releases). Seems like the format might have > changed. > > One random case is [1] where all of the 2nd and following IDs are > ignored by the parser. In Ensembl, for instance, the parser only > collects the ENST (the 1st) but not the ENSP (2nd) and ENSG (3rd) > identifiers. > > Is this a known issue? > No - could you file a bug on this with a short example to explain what result you get, and what you want.
Thanks, Peter From bugzilla-daemon at portal.open-bio.org Thu Nov 11 23:09:04 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 11 Nov 2010 18:09:04 -0500 Subject: [Biopython-dev] [Bug 3156] New: UniProt XML and SwissProt parsers silently fail to parse all of database references Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=3156 Summary: UniProt XML and SwissProt parsers silently fail to parse all of database references Product: Biopython Version: Not Applicable Platform: PC OS/Version: Linux Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: rjalves at igc.gulbenkian.pt

Example code:

    from Bio import SeqIO, ExPASy
    entry = SeqIO.read(ExPASy.get_sprot_raw('P31946'), 'swiss')

If you then inspect entry.dbxrefs, you can see that it includes:

    ['Ensembl:ENST00000353703', 'Ensembl:ENST00000372839']

but not

    ['Ensembl:ENSP00000300161', 'Ensembl:ENSG00000166913', 'Ensembl:ENSP00000361930', 'Ensembl:ENSG00000166913']

which are present in the original file as:

    DR   Ensembl; ENST00000353703; ENSP00000300161; ENSG00000166913.
    DR   Ensembl; ENST00000372839; ENSP00000361930; ENSG00000166913.

The same happens with the XML format and the new uniprot-xml parser where the original file contains: -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From rjalves at igc.gulbenkian.pt Thu Nov 11 22:32:41 2010 From: rjalves at igc.gulbenkian.pt (Renato Alves) Date: Thu, 11 Nov 2010 22:32:41 +0000 Subject: [Biopython-dev] Uniprot parsers In-Reply-To: <4CDC68CE.9070401@igc.gulbenkian.pt> References: <4CDC68CE.9070401@igc.gulbenkian.pt> Message-ID: <4CDC6F09.9090506@igc.gulbenkian.pt> Actually I just tested the Uniprot-XML parser and it seems to suffer from the same issue...
It ignores the following XML "properties": Quoting Renato Alves on 11/11/2010 10:06 PM: > Hi everyone, > > With the arrival of the Uniprot XML parser, is the swiss format still > going to be maintained? > > I just clashed with a 'swiss' format parsing problem present in the > 1.55b release (and previous releases). Seems like the format might have > changed. > > One random case is [1] where all of the 2nd and following IDs are > ignored by the parser. In Ensembl, for instance, the parser only > collects the ENST (the 1st) but not the ENSP (2nd) and ENSG (3rd) > identifiers. > > Is this a known issue? > > Regards, > Renato > > [1] http://www.uniprot.org/uniprot/P31946.txt -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 262 bytes Desc: OpenPGP digital signature URL: From bugzilla-daemon at portal.open-bio.org Thu Nov 11 23:50:46 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 11 Nov 2010 18:50:46 -0500 Subject: [Biopython-dev] [Bug 3156] UniProt XML and SwissProt parsers silently fail to parse all of database references In-Reply-To: Message-ID: <201011112350.oABNokG9031101@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3156 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2010-11-11 18:50 EST ------- That was by design, dbxrefs is a flat list and for consistency with other formats we have only stored the primary identifier. Would you regard this as two primary cross references, or six? DR Ensembl; ENST00000353703; ENSP00000300161; ENSG00000166913. DR Ensembl; ENST00000372839; ENSP00000361930; ENSG00000166913 -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
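For illustration, the two DR lines quoted in the bug report can be split into every identifier they carry. This is a minimal sketch using only the standard library; `parse_dr_line` is a hypothetical helper, not a Biopython function, and it assumes the simple `DR   Database; id; id; id.` layout shown in the report (real DR lines may carry placeholder or extra fields).

```python
# Sketch only: collect every identifier from SwissProt "DR" lines.
# parse_dr_line is a hypothetical helper, not part of Biopython.

def parse_dr_line(line):
    """Split a 'DR   Database; id1; id2; ...' line into (database, [ids])."""
    body = line[2:].strip().rstrip(".")  # drop the "DR" code and trailing dot
    fields = [field.strip() for field in body.split(";")]
    return fields[0], fields[1:]


dr_lines = [
    "DR   Ensembl; ENST00000353703; ENSP00000300161; ENSG00000166913.",
    "DR   Ensembl; ENST00000372839; ENSP00000361930; ENSG00000166913.",
]

all_refs = []
for line in dr_lines:
    database, identifiers = parse_dr_line(line)
    for identifier in identifiers:
        ref = "%s:%s" % (database, identifier)
        if ref not in all_refs:  # the gene identifier appears on both lines
            all_refs.append(ref)

print(len(all_refs))  # prints 5
```

Counting unique `Database:identifier` strings gives five for these two lines, because the gene identifier ENSG00000166913 appears on both.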
From bugzilla-daemon at portal.open-bio.org Thu Nov 11 23:59:20 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 11 Nov 2010 18:59:20 -0500 Subject: [Biopython-dev] [Bug 3156] UniProt XML and SwissProt parsers silently fail to parse all of database references In-Reply-To: Message-ID: <201011112359.oABNxKcn031294@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3156 ------- Comment #2 from rjalves at igc.gulbenkian.pt 2010-11-11 18:59 EST ------- Five primary references, since ENSG00000166913 is repeated (once per line). More precisely, ENSG = Ensembl Gene ENST = Ensembl Transcript ENSP = Ensembl Protein -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From andrea at biocomp.unibo.it Fri Nov 12 01:02:14 2010 From: andrea at biocomp.unibo.it (Andrea Pierleoni) Date: Fri, 12 Nov 2010 02:02:14 +0100 (CET) Subject: [Biopython-dev] [Bug 3156] UniProt XML and SwissProt parsers silently fail to parse all of database references In-Reply-To: References: Message-ID: <7c21462addfa62e09fd6c42135cc7d76.squirrel@lipid.biocomp.unibo.it> It was by design in the XML format as well; there is also a comment at line 343 of UniprotIO.py about this issue. To parse this type of data, an adapter for each db type should be written, since each DB has different data and can have different structures. Also note that the Ensembl reference fields have recently undergone a change of format in the XML file: http://www.uniprot.org/docs/xml_news.htm This happened in release 2010_10.
Andrea From andrea at biocomp.unibo.it Fri Nov 12 10:24:07 2010 From: andrea at biocomp.unibo.it (Andrea Pierleoni) Date: Fri, 12 Nov 2010 11:24:07 +0100 (CET) Subject: [Biopython-dev] Uniprot XML parser on TrEmbl In-Reply-To: References: <3cb74578eeedb8825ef75202c909b843.squirrel@lipid.biocomp.unibo.it> Message-ID: <430ea31975638cdd972a3aa01757fa03.squirrel@lipid.biocomp.unibo.it> With the submitted patch the parser was able to correctly parse 12,347,303 entries in the 62Gb XML file in 2h 13m. That looks like reasonable performance to me, since you are going to spend more time downloading the 8Gb gzipped file and decompressing it. Andrea From biopython at maubp.freeserve.co.uk Fri Nov 12 10:29:51 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 12 Nov 2010 10:29:51 +0000 Subject: [Biopython-dev] Uniprot XML parser on TrEmbl In-Reply-To: <430ea31975638cdd972a3aa01757fa03.squirrel@lipid.biocomp.unibo.it> References: <3cb74578eeedb8825ef75202c909b843.squirrel@lipid.biocomp.unibo.it> <430ea31975638cdd972a3aa01757fa03.squirrel@lipid.biocomp.unibo.it> Message-ID: On Fri, Nov 12, 2010 at 10:24 AM, Andrea Pierleoni wrote: > With the submitted patch the parser was able to correctly parse 12,347,303 > entries in the 62Gb XML file in 2h 13m. That's good - but I thought the patch broke the unit test so I reverted it last night. I'll double check this. > That looks like reasonable performance to me, since you are going to spend > more time downloading the 8Gb gzipped file and decompressing it. On the other hand, you only download it once, and will probably only decompress it once (although you can parse gzipped files from within python if you want to), but you will parse it many times.
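As a rough illustration of parsing a gzipped file from within Python, the sketch below streams a compressed XML file and counts its entry elements using only the standard library. The function name `count_entries` is hypothetical, and the sketch assumes each record is an element named `entry`, as in the UniProt XML files discussed here; it is not the Biopython parser.

```python
import gzip
import xml.etree.ElementTree as ET


def count_entries(gz_path):
    """Count <entry> elements in a gzipped XML file without unpacking it to disk."""
    count = 0
    with gzip.open(gz_path, "rb") as handle:
        # iterparse reads the decompressed stream incrementally.
        for event, elem in ET.iterparse(handle, events=("end",)):
            # Ignore any namespace prefix such as "{http://...}entry".
            if elem.tag.rsplit("}", 1)[-1] == "entry":
                count += 1
                elem.clear()  # drop the entry's children to limit memory use
    return count
```

Called as `count_entries("uniprot_trembl.xml.gz")` (a hypothetical file name), this trades some CPU time for decompression against the disk space needed for the unpacked file.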
My point is it probably could be made faster (if anyone wanted to spend the time), but it is fast enough already to be useful, and worth having in Biopython :) Peter From andrea at biocomp.unibo.it Fri Nov 12 11:05:43 2010 From: andrea at biocomp.unibo.it (Andrea Pierleoni) Date: Fri, 12 Nov 2010 12:05:43 +0100 (CET) Subject: [Biopython-dev] Uniprot XML parser on TrEmbl In-Reply-To: References: <3cb74578eeedb8825ef75202c909b843.squirrel@lipid.biocomp.unibo.it> <430ea31975638cdd972a3aa01757fa03.squirrel@lipid.biocomp.unibo.it> Message-ID: <6c12e6fda6bab033738ed36d74d2a24a.squirrel@lipid.biocomp.unibo.it> > That's good - but I thought the patch broke the unit test so I reverted it > last night. I'll double check this. > Yes, I've seen it on github, can you fix it? > On the other hand, you only download it once, and will probably only > decompress it once (although you can parse gzipped files from within > python if you want to), but you will parse it many times. > Well, if you're looking for performance, you're not scanning a 62Gb file each time you search for an entry; you're going to index it. Then of course it depends on what you are doing... but, given the monthly release, maybe you're downloading and decompressing (or parsing a compressed file) once a month. > My point is it probably could be made faster (if anyone wanted to spend > the time), but it is fast enough already to be useful, and worth having > in Biopython :) Yes, I hope it can be made faster, but I have no idea how, since the process is very straightforward. I did not profile the parser, so I cannot exclude some bottleneck. The only obvious speed-up would be using the multiprocessing library on multi-CPU systems, but I've never seen it used in Biopython. It should be really easy to implement, and maybe we can think about it after Python 2.4 support is dropped. As far as I know, multiprocessing is included in Python 2.6 and available for Python 2.5.
On the other hand, Biopython has the fastest uniprot XML parser among Bio* projects and (to my knowledge) the fastest public parser on the planet ;) I bet Uniprot guys have their parser... Andrea From biopython at maubp.freeserve.co.uk Fri Nov 12 12:00:42 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 12 Nov 2010 12:00:42 +0000 Subject: [Biopython-dev] Uniprot XML parser on TrEmbl In-Reply-To: <6c12e6fda6bab033738ed36d74d2a24a.squirrel@lipid.biocomp.unibo.it> References: <3cb74578eeedb8825ef75202c909b843.squirrel@lipid.biocomp.unibo.it> <430ea31975638cdd972a3aa01757fa03.squirrel@lipid.biocomp.unibo.it> <6c12e6fda6bab033738ed36d74d2a24a.squirrel@lipid.biocomp.unibo.it> Message-ID: On Fri, Nov 12, 2010 at 11:05 AM, Andrea Pierleoni wrote: > >> That's good - but I thought the patch broke the unit test so I reverted it >> last night. I'll double check this. >> > > Yes, I've seen it on github, can you fix it? > Probably. I'll make time to look at it before the Biopython 1.56 release (which is unlikely to happen this week, delayed by the identification of some problems running under Jython on Windows). >> On the other hand, you only download it once, and will probably only >> decompress it once (although you can parse gzipped files from within >> python if you want to), but you will parse it many times. >> > > Well, if you're looking for performance, you're not scanning a 62Gb file > each time you search for an entry; you're going to index it. Then of > course it depends on what you are doing... but, given the monthly > release, maybe you're downloading and decompressing (or parsing > a compressed file) once a month. Yeah, it depends. >> My point is it probably could be made faster (if anyone wanted to spend >> the time), but it is fast enough already to be useful, and worth having >> in Biopython :) > > Yes, I hope it can be made faster, but I have no idea how, since > the process is very straightforward.
I did not profile the > parser, so I cannot exclude some bottleneck. That would be worthwhile at some point. > The only obvious speed-up would be using the multiprocessing library on > multi-CPU systems, but I've never seen it used in Biopython. We haven't been able to due to the Python 2.4 requirement, but I know of people using Biopython and multiprocessing together. > It should be really easy to implement, and maybe we can think about > it after Python 2.4 support is dropped. As far as I know, multiprocessing > is included in Python 2.6 and available for Python 2.5. Personally I'd try profiling the current single threaded code before going to multiprocessing. > On the other hand, Biopython has the fastest uniprot XML parser > among Bio* projects and (to my knowledge) the fastest public > parser on the planet ;) I bet Uniprot guys have their parser... Which of the other Bio* projects have a Uniprot XML parser? (Or was that intended as a joke?) Peter From p.j.a.cock at googlemail.com Fri Nov 12 17:18:52 2010 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 12 Nov 2010 17:18:52 +0000 Subject: [Biopython-dev] test_PopGen_GenePop_EasyController.py failure on Jython In-Reply-To: References: Message-ID: Hi all, I've exchanged a few emails with Tiago off list regarding an inconsistent test_PopGen_GenePop_EasyController.py problem (most visible on Jython), giving error "Unable to open file genepop.txt". I've just had it from Python 2.7 on a 32bit Linux machine:

======================================================================
ERROR: test_get_avg_fst_pair (test_PopGen_GenePop_EasyController.AppTest)
Test get pairwise Fst.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/pjcock/repositories/biopython/Tests/test_PopGen_GenePop_EasyController.py", line 98, in test_get_avg_fst_pair
    pop_fis = self.ctrl.get_avg_fst_pair()
  File "/home/pjcock/repositories/biopython/build/lib.linux-i686-2.7/Bio/PopGen/GenePop/EasyController.py", line 162, in get_avg_fst_pair
    return self._controller.calc_fst_pair(self._fname)[1]
  File "/home/pjcock/repositories/biopython/build/lib.linux-i686-2.7/Bio/PopGen/GenePop/Controller.py", line 819, in calc_fst_pair
    self._run_genepop([".ST2", ".MIG"], [6,2], fname)
  File "/home/pjcock/repositories/biopython/build/lib.linux-i686-2.7/Bio/PopGen/GenePop/Controller.py", line 296, in _run_genepop
    % (ret, e_out.strip().split("\n",1)[0]))
IOError: GenePop error -11, Unable to open file genepop.txt

======================================================================
ERROR: test_get_avg_fst_pair_locus (test_PopGen_GenePop_EasyController.AppTest)
Test get average Fst for pairwise pops on a locus.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/pjcock/repositories/biopython/Tests/test_PopGen_GenePop_EasyController.py", line 93, in test_get_avg_fst_pair_locus
    self.assertEqual(len(self.ctrl.get_avg_fst_pair_locus("Locus4")), 45)
  File "/home/pjcock/repositories/biopython/build/lib.linux-i686-2.7/Bio/PopGen/GenePop/EasyController.py", line 166, in get_avg_fst_pair_locus
    iter = self._controller.calc_fst_pair(self._fname)[0]
  File "/home/pjcock/repositories/biopython/build/lib.linux-i686-2.7/Bio/PopGen/GenePop/Controller.py", line 819, in calc_fst_pair
    self._run_genepop([".ST2", ".MIG"], [6,2], fname)
  File "/home/pjcock/repositories/biopython/build/lib.linux-i686-2.7/Bio/PopGen/GenePop/Controller.py", line 296, in _run_genepop
    % (ret, e_out.strip().split("\n",1)[0]))
IOError: GenePop error -11, Unable to open file genepop.txt

----------------------------------------------------------------------

This failed twice in a row, then passed four times in a row (Linux, Python 2.7). I suspect the issue was related to machine IO load - during the first tests I had something compiling at the same time. I can't reproduce it on demand :( I've also seen it on the Mac with Apple's Python 2.6 (although it is usually fine). However, I'm seeing this (consistently?) with Jython 2.5.1 on the Mac. Peter From biopython at maubp.freeserve.co.uk Fri Nov 12 17:47:22 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 12 Nov 2010 17:47:22 +0000 Subject: [Biopython-dev] Biopython 1.56 release plans In-Reply-To: References: Message-ID: On Thu, Nov 4, 2010 at 5:13 PM, Peter wrote: > Hi all, > > I've mentioned in recent threads that I think we should try and > release Biopython 1.56 this month (November 2010). > > I think the NEWS file is pretty up to date, and covers important > new functionality like Andrea Pierleoni's UniProt XML parser > and the IMGT support (with Uri Laserson).
> > Is there any other functionality which is ready for merging? > > For example, Tiago - you've been doing lots of work on your > branch with the PopGen code. Is that code ready? I'm willing > to do the git merge/rebase. > > Is there any reason to bother with a beta release this time? > > If there are no pressing additions, I may be able to do the > release tomorrow - otherwise how about aiming for Thursday > or Friday next week (11 or 12 November)? As people will have noticed, the release didn't happen this week. Tiago has been doing some excellent work with the prototype buildbot server (see http://events.open-bio.org:8010/grid for the current temporary home), and as part of this we've set up a few machines as buildslaves. See this thread: http://lists.open-bio.org/pipermail/biopython-dev/2010-November/008376.html Running under Jython on the Mac showed a few problems which appear to now be sorted, other than an apparent problem with the GenePop tool. Unfortunately running under Jython on Windows XP has revealed several new problems, e.g. http://lists.open-bio.org/pipermail/biopython-dev/2010-November/008431.html As things stand all the tests (*) are fine on "C" Python on Linux, Mac, and Windows. They are also fine on Jython on Linux, give some warnings on Jython on Mac, and 3 errors on Windows. Hopefully we can address these three test failures (or at least understand them) and do Biopython 1.56 at the end of next week instead. Peter (*) We haven't audited all the slave test output to check which tests are being skipped due to missing optional dependencies yet. e.g. command line tools, or Python modules like ReportLab or NetworkX. 
From p.j.a.cock at googlemail.com Fri Nov 12 17:55:57 2010 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 12 Nov 2010 17:55:57 +0000 Subject: [Biopython-dev] test_PopGen_GenePop_EasyController.py failure on Jython In-Reply-To: References: Message-ID: 2010/11/12 Peter Cock :
> Hi all,
>
> I've exchanged a few emails with Tiago off list regarding an inconsistent
> test_PopGen_GenePop_EasyController.py problem (most visible on
> Jython), giving error "Unable to open file genepop.txt".
>
> I've just had it from Python 2.7 on a 32bit Linux machine:
>
> ======================================================================
> ERROR: test_get_avg_fst_pair (test_PopGen_GenePop_EasyController.AppTest)
> Test get pairwise Fst.
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/home/pjcock/repositories/biopython/Tests/test_PopGen_GenePop_EasyController.py", line 98, in test_get_avg_fst_pair
>     pop_fis = self.ctrl.get_avg_fst_pair()
>   File "/home/pjcock/repositories/biopython/build/lib.linux-i686-2.7/Bio/PopGen/GenePop/EasyController.py", line 162, in get_avg_fst_pair
>     return self._controller.calc_fst_pair(self._fname)[1]
>   File "/home/pjcock/repositories/biopython/build/lib.linux-i686-2.7/Bio/PopGen/GenePop/Controller.py", line 819, in calc_fst_pair
>     self._run_genepop([".ST2", ".MIG"], [6,2], fname)
>   File "/home/pjcock/repositories/biopython/build/lib.linux-i686-2.7/Bio/PopGen/GenePop/Controller.py", line 296, in _run_genepop
>     % (ret, e_out.strip().split("\n",1)[0]))
> IOError: GenePop error -11, Unable to open file genepop.txt
>
> ======================================================================
> ERROR: test_get_avg_fst_pair_locus (test_PopGen_GenePop_EasyController.AppTest)
> Test get average Fst for pairwise pops on a locus.
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/home/pjcock/repositories/biopython/Tests/test_PopGen_GenePop_EasyController.py", line 93, in test_get_avg_fst_pair_locus
>     self.assertEqual(len(self.ctrl.get_avg_fst_pair_locus("Locus4")), 45)
>   File "/home/pjcock/repositories/biopython/build/lib.linux-i686-2.7/Bio/PopGen/GenePop/EasyController.py", line 166, in get_avg_fst_pair_locus
>     iter = self._controller.calc_fst_pair(self._fname)[0]
>   File "/home/pjcock/repositories/biopython/build/lib.linux-i686-2.7/Bio/PopGen/GenePop/Controller.py", line 819, in calc_fst_pair
>     self._run_genepop([".ST2", ".MIG"], [6,2], fname)
>   File "/home/pjcock/repositories/biopython/build/lib.linux-i686-2.7/Bio/PopGen/GenePop/Controller.py", line 296, in _run_genepop
>     % (ret, e_out.strip().split("\n",1)[0]))
> IOError: GenePop error -11, Unable to open file genepop.txt
>
> ----------------------------------------------------------------------
>
> This failed twice in a row, then passed four times in a row (Linux, Python 2.7).
> I suspect the issue was related to machine IO load - during the first
> tests I had something compiling at the same time. I can't reproduce
> it on demand :(
>
> I've also seen it on the Mac with Apple's Python 2.6 (although it is
> usually fine).
>
> However, I'm seeing this (consistently?) with Jython 2.5.1 on the Mac.

Well right now on my Mac with Jython, the test passes but with lots of warnings:

$ jython test_PopGen_GenePop_EasyController.py
Test basic info. ... ok
Test Nm estimation. ... ok
Test allele frequency. ... ok
Test get alleles. ... ok
Test get alleles for all populations. ... ok
Test average Fis. ... ok
Test get pairwise Fst. ... ok
Test get average Fst for pairwise pops on a locus. ...
Exception OSError: [Errno 0] couldn't delete file: 'big.gen.INF' in > ignored
Exception OSError: [Errno 0] couldn't delete file: 'big.gen.IN2' in > ignored
ok
Test F stats. ... ok
Test get Fis. ... Exception OSError: [Errno 0] couldn't delete file: 'big.gen.ST2' in > ignored
ok
Test genotype count. ... ok
Test heterozygosity info. ... Exception OSError: [Errno 0] couldn't delete file: 'big.gen.INF' in > ignored
Exception OSError: [Errno 0] couldn't delete file: 'big.gen.IN2' in > ignored
ok
Test multilocus F stats. ... ok
----------------------------------------------------------------------
Ran 13 tests in 5.912s

Or another example, the same machine as a build slave: http://events.open-bio.org:8010/builders/OS%20X%2010.6%20Snow%20Leopard%20-%20Jython%202.5.1/builds/9/steps/shell/logs/stdio On the previous build Jython on Mac gave the same error I reported above on Linux with "C" Python 2.7: http://events.open-bio.org:8010/builders/OS%20X%2010.6%20Snow%20Leopard%20-%20Jython%202.5.1/builds/7/steps/shell/logs/stdio Peter From andrea at biocomp.unibo.it Fri Nov 12 20:45:24 2010 From: andrea at biocomp.unibo.it (Andrea Pierleoni) Date: Fri, 12 Nov 2010 21:45:24 +0100 (CET) Subject: [Biopython-dev] Uniprot XML parser on TrEmbl Message-ID: <5c0bc5f9bead03ed216fafaff35c709b.squirrel@lipid.biocomp.unibo.it> > We haven't been able to due to the Python 2.4 requirement, but > I know of people using Biopython and multiprocessing together. > Good. > Personally I'd try profiling the current single threaded code before > going to multiprocessing. > Yes, of course. >> On the other hand, Biopython has the fastest uniprot XML parser >> among Bio* projects and (to my knowledge) the fastest public >> parser on the planet ;) I bet Uniprot guys have their parser... > > Which of the other Bio* projects have a Uniprot XML parser? > (Or was that intended as a joke?) > It was both a joke and a matter of fact, since I don't know about other publicly available parsers.
Usually I look at a glass as half full... Andrea From gawbul at gmail.com Sat Nov 13 21:24:43 2010 From: gawbul at gmail.com (Steve Moss) Date: Sat, 13 Nov 2010 21:24:43 +0000 Subject: [Biopython-dev] Developing for the BioPython project... Message-ID: Hi all, I've just started a PhD centring around evolutionary comparative genomics, and will be focusing on bioinformatics and computational biology methodology. I'm really keen to use Python and BioPython in particular throughout my PhD and would like to contribute any code I can to aid in promoting BioPython as viable alternative to BioPerl, which I feel has a larger user base currently? Is there any particular process of registration to become involved with development, or is it just a case of fork'ing the repository from github? Cheers, Steve -- Kindest regards, Steve Moss http://stevemoss.ath.cx From eric.talevich at gmail.com Sat Nov 13 23:05:24 2010 From: eric.talevich at gmail.com (Eric Talevich) Date: Sat, 13 Nov 2010 18:05:24 -0500 Subject: [Biopython-dev] Developing for the BioPython project... In-Reply-To: References: Message-ID: On Sat, Nov 13, 2010 at 4:24 PM, Steve Moss wrote: > Hi all, > > I've just started a PhD centring around evolutionary comparative genomics, > and will be focusing on bioinformatics and computational biology > methodology. > > I'm really keen to use Python and BioPython in particular throughout my PhD > and would like to contribute any code I can to aid in promoting BioPython > as > viable alternative to BioPerl, which I feel has a larger user > base currently? Is there any particular process of registration to become > involved with development, or is it just a case of fork'ing the repository > from github? > > Hi Steve, If you've joined the biopython-dev mailing list, you're in the club. Feel free to fork away! 
To get a feel for where development is focused right now, you can look at our wiki page for active projects: http://biopython.org/wiki/Active_projects We're also collectively working on Python 3 compatibility (C extensions still need some work), though that isn't listed. Since you're a new grad student, you might have some leeway to get involved with Google Summer of Code next summer. The project ideas for Biopython, Open Bio, and NESCent drummed up last year are still worth doing, or might inspire you to do something else on your own: http://biopython.org/wiki/Google_Summer_of_Code http://www.open-bio.org/wiki/Google_Summer_of_Code https://www.nescent.org/wg_phyloinformatics/Phyloinformatics_Summer_of_Code_2010 Cheers, Eric From biopython at maubp.freeserve.co.uk Mon Nov 15 14:34:40 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 15 Nov 2010 14:34:40 +0000 Subject: [Biopython-dev] FASTA filtering by ID Message-ID: Hi all, Something I want to do in several of my workflows is to filter a FASTA file (or potentially other format sequence files) using a list of desired identifiers (e.g. a column from a tabular file). Right now I can achieve this with three steps in Galaxy. Suppose I have: Dataset #1, FASTA file Dataset #2, Tabular file with identifiers of interest (e.g. BLAST hits, or filtered output from a sequence analysis tool) Then: Create tabular Dataset #3 using FASTA-to-tabular on Dataset #1, subject to the enhancement proposed here: http://lists.bx.psu.edu/pipermail/galaxy-dev/2010-November/003717.html Create tabular Dataset #4 using join on Datasets #2 and #3 using the matched identifier columns. This does the filtering. Create FASTA Dataset #5 using tabular-to-FASTA on Dataset #4. This works (at least for reasonably sized datasets), but requires three steps and the creation of at least two temporary files. I'd like to introduce another tool under "FASTA manipulation" to do it in one step (rather than three).
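As a rough illustration of the one-step filter described above, here is a minimal Python sketch using only the standard library. It is not the Galaxy tool itself; the function names read_ids and filter_fasta, the choice of column, and the rule that the record ID is the first word after ">" are all assumptions made for this example. In practice Bio.SeqIO could do the FASTA reading and writing instead of the hand-rolled loop.

```python
import io


def read_ids(tabular_handle, column=0):
    """Collect identifiers from one column of a tab-separated file.

    Hypothetical helper: skips blank lines and lines starting with '#'.
    """
    ids = set()
    for line in tabular_handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if column < len(fields):
            ids.add(fields[column])
    return ids


def filter_fasta(fasta_handle, wanted, out_handle):
    """Copy records whose ID is in `wanted` to out_handle; return the count.

    Assumption: the ID is the first whitespace-delimited word after '>'.
    Records are streamed, so no temporary files are needed.
    """
    keep = False
    count = 0
    for line in fasta_handle:
        if line.startswith(">"):
            words = line[1:].split(None, 1)
            keep = bool(words) and words[0] in wanted
            if keep:
                count += 1
        if keep:
            out_handle.write(line)
    return count


if __name__ == "__main__":
    # Tiny in-memory demonstration with made-up data.
    fasta = io.StringIO(">a desc\nACGT\n>b\nGGGG\n")
    table = io.StringIO("a\t42\n")
    out = io.StringIO()
    filter_fasta(fasta, read_ids(table), out)
    print(out.getvalue(), end="")
```

Records come out in the order of the FASTA file rather than the order of the identifier table, which differs from what a tabular join followed by tabular-to-FASTA might give.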
Am I going against the apparent Galaxy ideal that complex manipulations should be done with tabular files? Would such a FASTA filter tool be of interest to add directly to Galaxy (e.g. under the "FASTA manipulation" section), or better off on the community tool shed? Thanks, Peter From biopython at maubp.freeserve.co.uk Mon Nov 15 17:05:00 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 15 Nov 2010 17:05:00 +0000 Subject: [Biopython-dev] Biopython 1.56 release plans In-Reply-To: References: Message-ID: On Fri, Nov 12, 2010 at 5:47 PM, Peter wrote: > On Thu, Nov 4, 2010 at 5:13 PM, Peter wrote: >> Hi all, >> >> I've mentioned in recent threads that I think we should try and >> release Biopython 1.56 this month (November 2010). >> >> ... > > As people will have noticed, the release didn't happen this week. > > ... > > Unfortunately running under Jython on Windows XP has > revealed several new problems, e.g. > http://lists.open-bio.org/pipermail/biopython-dev/2010-November/008431.html > > ... > > Hopefully we can address these three test failures (or > at least understand them) and do Biopython 1.56 at > the end of next week instead. Two of the problems on Jython on Windows were down to the Windows specific command line tool detection not being used, now fixed: https://github.com/biopython/biopython/commit/db41d7e4bfd8f5d4ea44bf8254334fcd7b76474f https://github.com/biopython/biopython/commit/7e5b71093c8408de140de1937480e26aaaa5daf1 There was also a heap space problem solved by a more memory efficient __getitem__ method for the UnknownSeq object (still room for improvement here). https://github.com/biopython/biopython/commit/125d8d31d07f57628c231286afae99a178e6f2c5 So, we now have a clean bill of health from the offline tests run on the buildslaves (apart from the occasional GenePop failure where retesting can make it work). 
I still want to look at the SeqIO/AlignIO handle issue, http://lists.open-bio.org/pipermail/biopython-dev/2010-November/008431.html and also the UniProt XML issue, http://lists.open-bio.org/pipermail/biopython-dev/2010-November/008440.html Peter From peter at maubp.freeserve.co.uk Thu Nov 18 15:47:08 2010 From: peter at maubp.freeserve.co.uk (Peter) Date: Thu, 18 Nov 2010 15:47:08 +0000 Subject: [Biopython-dev] Dropping Python 2.4 Support? Message-ID: Dear Biopythoneers, Are any of you still using Biopython on Python 2.4? http://news.open-bio.org/news/2010/11/dropping-python24-support/ Please get in touch if dropping support for Python 2.4 would be a problem. Otherwise we plan for Biopython 1.56 (expected by the end of this month) to be our last release to work with Python 2.4. Thanks, Peter From biopython at maubp.freeserve.co.uk Thu Nov 18 17:45:30 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 18 Nov 2010 17:45:30 +0000 Subject: [Biopython-dev] FASTA filtering by ID In-Reply-To: References: Message-ID: Sorry folk - I meant to post that to the Galaxy development mailing list, http://lists.bx.psu.edu/listinfo/galaxy-dev Peter From biopython at maubp.freeserve.co.uk Wed Nov 24 18:03:03 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 24 Nov 2010 18:03:03 +0000 Subject: [Biopython-dev] Uniprot XML parser on TrEmbl In-Reply-To: <5c0bc5f9bead03ed216fafaff35c709b.squirrel@lipid.biocomp.unibo.it> References: <5c0bc5f9bead03ed216fafaff35c709b.squirrel@lipid.biocomp.unibo.it> Message-ID: Hi Andrea, I *think* I have fixed the problem with empty names in the UniProt XML format, without affecting the unit tests, but I don't have the 62GB free to unpack uniprot_trembl.xml.gz to try it out: https://github.com/biopython/biopython/commit/bb971b2a7384d42d9a6e4994e59299a90e6cc700 Would you be able to retest the trunk code on that please? 
I also changed the handling of the organism host (where present) in both the UniProt and SwissProt parsers to be more consistent. I've checked uniprot_sprot.dat still parses, but haven't tried the much bigger uniprot_trembl.dat from uniprot_trembl.dat.gz - so again, would you be able to retest the "swiss" text parser too? Many thanks, Peter P.S. Did you get any reply from UniProt about the apparent error in the Q2LEH1 record within uniprot_trembl.xml.gz? From andrea at biocomp.unibo.it Thu Nov 25 16:09:28 2010 From: andrea at biocomp.unibo.it (Andrea Pierleoni) Date: Thu, 25 Nov 2010 17:09:28 +0100 (CET) Subject: [Biopython-dev] Uniprot XML parser on TrEmbl In-Reply-To: References: <5c0bc5f9bead03ed216fafaff35c709b.squirrel@lipid.biocomp.unibo.it> Message-ID: <17fb1526d4af40ebbe4e6129d1bd0c2c.squirrel@lipid.biocomp.unibo.it> > Hi Andrea, > > I *think* I have fixed the problem with empty names in the UniProt XML > format, without affecting the unit tests, but I don't have the 62GB free > to > unpack uniprot_trembl.xml.gz to try it out: > > https://github.com/biopython/biopython/commit/bb971b2a7384d42d9a6e4994e59299a90e6cc700 > > Would you be able to retest the trunk code on that please? > I've just completed a run on the 8GB gzipped TrEMBL file (I don't have the free 62GB either) and it was OK, with zero errors. By the way, it took just 2h 18m, the same time it took on the uncompressed 62GB XML file. So it's definitely better not to decompress this file... > I also changed the handling of the organism host (where present) > in both the UniProt and SwissProt parsers to be more consistent. good > I've checked uniprot_sprot.dat still parses, but haven't tried the > much bigger uniprot_trembl.dat from uniprot_trembl.dat.gz - so > again, would you be able to retest the "swiss" text parser too? I'll test this too and let you know. > > Many thanks, > > Peter > > P.S.
Did you get any reply from UniProt about the apparent error in > the Q2LEH1 record within uniprot_trembl.xml.gz? > Unfortunately not. Andrea From andrea at biocomp.unibo.it Fri Nov 26 13:54:29 2010 From: andrea at biocomp.unibo.it (Andrea Pierleoni) Date: Fri, 26 Nov 2010 14:54:29 +0100 (CET) Subject: [Biopython-dev] Uniprot XML parser on TrEmbl In-Reply-To: <17fb1526d4af40ebbe4e6129d1bd0c2c.squirrel@lipid.biocomp.unibo.it> References: <5c0bc5f9bead03ed216fafaff35c709b.squirrel@lipid.biocomp.unibo.it> <17fb1526d4af40ebbe4e6129d1bd0c2c.squirrel@lipid.biocomp.unibo.it> Message-ID: <1f693f5d96187fcc44a180d1e7c55a3d.squirrel@lipid.biocomp.unibo.it> >> I've checked uniprot_sprot.dat still parses, but haven't tried the >> much bigger uniprot_trembl.dat from uniprot_trembl.dat.gz - so >> again, would you be able to retest the "swiss" text parser too? > > I'll test this too and let you know. > Test completed on the .dat file, all entries were parsed without errors. This time it took almost 3h but was done on the gzipped file stored in a removable 5400rpm hard drive. the XML file was on an SSD so maybe that's why it is faster with that parser. From biopython at maubp.freeserve.co.uk Fri Nov 26 14:06:58 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 26 Nov 2010 14:06:58 +0000 Subject: [Biopython-dev] Uniprot XML parser on TrEmbl In-Reply-To: <1f693f5d96187fcc44a180d1e7c55a3d.squirrel@lipid.biocomp.unibo.it> References: <5c0bc5f9bead03ed216fafaff35c709b.squirrel@lipid.biocomp.unibo.it> <17fb1526d4af40ebbe4e6129d1bd0c2c.squirrel@lipid.biocomp.unibo.it> <1f693f5d96187fcc44a180d1e7c55a3d.squirrel@lipid.biocomp.unibo.it> Message-ID: On Fri, Nov 26, 2010 at 1:54 PM, Andrea Pierleoni wrote: > >>> I've checked uniprot_sprot.dat still parses, but haven't tried the >>> much bigger uniprot_trembl.dat from uniprot_trembl.dat.gz - so >>> again, would you be able to retest the "swiss" text parser too? >> >> I'll test this too and let you know. 
>> > > Test completed on the .dat file, all entries were parsed without errors. > This time it took almost 3h but was done on the gzipped file stored in a > removable 5400rpm hard drive. the XML file was on an SSD so maybe that's > why it is faster with that parser. > Excellent - thanks. Peter From biopython at maubp.freeserve.co.uk Fri Nov 26 14:08:59 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 26 Nov 2010 14:08:59 +0000 Subject: [Biopython-dev] git freeze for Biopython 1.56 Message-ID: Hi all, No one has raised any outstanding issues to warrant delaying the 1.56 release any further, so I plan to do it now. Please don't make any commits to the master branch until further notice. Thank you, Peter From biopython at maubp.freeserve.co.uk Fri Nov 26 15:19:20 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 26 Nov 2010 15:19:20 +0000 Subject: [Biopython-dev] git freeze for Biopython 1.56 In-Reply-To: References: Message-ID: On Fri, Nov 26, 2010 at 2:08 PM, Peter wrote: > Hi all, > > No one has raised any outstanding issues to warrant delaying > the 1.56 release any further, so I plan to do it now. Please don't > make any commits to the master branch until further notice. > > Thank you, > > Peter I think that's the source code bundles and Windows installers all done and uploaded, plus the PyPI upload done. I'll work on a release announcement for the news server and mailing list. In the meantime, if anyone could check the files as a sanity test (just in case I missed something), please do. 
Get them from here: http://biopython.org/DIST/ Thanks, Peter From biopython at maubp.freeserve.co.uk Fri Nov 26 16:07:48 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 26 Nov 2010 16:07:48 +0000 Subject: [Biopython-dev] git freeze for Biopython 1.56 In-Reply-To: References: Message-ID: On Fri, Nov 26, 2010 at 3:19 PM, Peter wrote: > On Fri, Nov 26, 2010 at 2:08 PM, Peter wrote: >> Hi all, >> >> No one has raised any outstanding issues to warrant delaying >> the 1.56 release any further, so I plan to do it now. Please don't >> make any commits to the master branch until further notice. >> >> Thank you, >> >> Peter > > I think that's the source code bundles and Windows installers > all done and uploaded, plus the PyPI upload done. I'll work on > a release announcement for the news server and mailing list. > Posted online, http://news.open-bio.org/news/2010/11/biopython-1-56-released/ If anyone spots a typo please drop me an email, and I can fix it - hopefully before sending out the email announcement which I'll do a bit later on in case there are any suggested revisions to the text. Regards, Peter From biopython at maubp.freeserve.co.uk Fri Nov 26 16:25:42 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 26 Nov 2010 16:25:42 +0000 Subject: [Biopython-dev] Biopython 1.56 release plans In-Reply-To: References: <645847.84052.qm@web62404.mail.re1.yahoo.com> Message-ID: On Fri, Nov 5, 2010 at 12:01 PM, Peter wrote: > On Fri, Nov 5, 2010 at 11:52 AM, Michiel de Hoon wrote: >> >> Bio/Transcribe.py >> Bio/Translate.py >> >> These are still imported from Bio/Encodings/IUPACEncoding.py, which >> is imported from Bio/Alphabet/IUPAC.py. I have no idea what this code >> is doing. Does anybody know? 
> > Ah right - sorry, that had slipped my mind: > http://lists.open-bio.org/pipermail/biopython-dev/2010-September/008255.html > > I had suggested we leave Bio.Transcribe and Bio.Translate in for > Biopython 1.56 and remove them (and Bio.utils, Bio.PropertyManager, > and Bio.Encodings.IUPACEncoding) for Biopython 1.57 Hi Michiel, Now Biopython 1.56 is out, would you like to remove those modules? Thanks Peter From biopython at maubp.freeserve.co.uk Fri Nov 26 19:31:40 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 26 Nov 2010 19:31:40 +0000 Subject: [Biopython-dev] git freeze for Biopython 1.56 In-Reply-To: References: Message-ID: On Fri, Nov 26, 2010 at 4:07 PM, Peter wrote: > > Posted online, > http://news.open-bio.org/news/2010/11/biopython-1-56-released/ > > If anyone spots a typo please drop me an email, and I can fix > it - hopefully before sending out the email announcement which > I'll do a bit later on in case there are any suggested revisions > to the text. I aim to send out the email in an hour or so's time. If I forget, Brad - you're in a suitable time zone, right? By the way - please consider the git freeze over (I should have said so explicitly earlier - sorry about that). Peter From chapmanb at 50mail.com Fri Nov 26 20:20:04 2010 From: chapmanb at 50mail.com (Brad Chapman) Date: Fri, 26 Nov 2010 15:20:04 -0500 Subject: [Biopython-dev] git freeze for Biopython 1.56 In-Reply-To: References: Message-ID: <20101126202003.GC29878@sobchak.mgh.harvard.edu> Peter; > > Posted online, > > http://news.open-bio.org/news/2010/11/biopython-1-56-released/ > > > > If anyone spots a typo please drop me an email, and I can fix > > it - hopefully before sending out the email announcement which > > I'll do a bit later on in case there are any suggested revisions > > to the text. Thanks for all the hard work getting this together. Everything looks great and thanks for pushing to PyPI.
The only thing I noticed was that after "Note as previously announced" there is an extra tag which causes the rest of the text through the authors to be a link. Not a big deal. Congrats on the new release, Brad From biopython at maubp.freeserve.co.uk Fri Nov 26 21:17:23 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 26 Nov 2010 21:17:23 +0000 Subject: [Biopython-dev] git freeze for Biopython 1.56 In-Reply-To: <20101126202003.GC29878@sobchak.mgh.harvard.edu> References: <20101126202003.GC29878@sobchak.mgh.harvard.edu> Message-ID: Hi Brad, On Fri, Nov 26, 2010 at 8:20 PM, Brad Chapman wrote: > > Thanks for all the hard work getting this together. Everything looks > great and thanks for pushing to PyPi. I must say a public thank you to Tiago too - having the buildbot up and running (even with the handful of buildslaves we have now) has been a great reassurance that things are looking OK. This will be particularly helpful for spotting problems on Python 3 (since it is a hassle to test by hand right now) and older versions of Python - my main machine these days runs Python 2.6. As an example, for a while the trunk had been broken on Python 2.4 without anyone noticing. This was when I merged the UniProt XML parser without having checked the unit tests were skipped nicely on Python 2.4 when ElementTree was missing. Having the tests run every night automatically is much safer - so thanks Tiago :) [Hopefully we'll get the buildbot running on a dedicated VM before too long - we're in touch with the OBF admins about this already.] > The only thing I noticed was that after "Note as previously > announced" there is an extra tag which causes the rest > of the text through the authors to be a link. Not a big deal. Well spotted - I'd actually put rather than which must have confused the formatting because it looked OK. Thanks!
Peter From biopython at maubp.freeserve.co.uk Fri Nov 26 23:12:14 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 26 Nov 2010 23:12:14 +0000 Subject: [Biopython-dev] Biopython 1.56 Message-ID: Dear Biopythoneers, On behalf of the developers, I'm pleased to announce we released Biopython 1.56 earlier today. For more details please see: http://news.open-bio.org/news/2010/11/biopython-1-56-released/ Please note this will probably be the last release to support Python 2.4, see: http://news.open-bio.org/news/2010/11/dropping-python24-support/ (At least) 13 people have contributed to this release, including 6 new people - thank you all: * Andrea Pierleoni (first contribution) * Bart de Koning (first contribution) * Bartek Wilczynski * Bartosz Telenczuk (first contribution) * Cymon Cox * Eric Talevich * Frank Kauff * Michiel de Hoon * Peter Cock * Phillip Garland (first contribution) * Siong Kong (first contribution) * Tiago Antao * Uri Laserson (first contribution) Source distributions and Windows installers are available from the downloads page on the Biopython website: http://www.biopython.org/wiki/Download As usual, feedback is most welcome on the mailing lists (or bugzilla). Regards, Peter From biopython at maubp.freeserve.co.uk Mon Nov 29 12:02:55 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 29 Nov 2010 12:02:55 +0000 Subject: [Biopython-dev] Dropping Python 2.4 Support? In-Reply-To: References: Message-ID: On Thu, Nov 18, 2010 at 3:47 PM, Peter wrote: > Dear Biopythoneers, > > Are any of you still using Biopython on Python 2.4? > http://news.open-bio.org/news/2010/11/dropping-python24-support/ > > Please get in touch if dropping support for Python 2.4 would be a > problem. Otherwise we plan for Biopython 1.56 (expected by the > end of this month) to be our last release to work with Python 2.4. > > Thanks, > > Peter So, no comments?
We're using CentOS on our servers at work, but have installed a later Python on most of them and made it the default. I'm also keen to use Biopython with Galaxy, and they currently support Python 2.4 to 2.6 (and I'm unclear when they will add 2.7 and drop 2.4), so this is another reason to keep some level of support for Python 2.4. However, on a local level this isn't important as we are running Galaxy on Python 2.6 now. Likewise I know Brad is running Galaxy on a more recent Python than 2.4 (are you using Biopython within Galaxy Brad? Maybe we could chat about that on a new thread). Hopefully the release of Biopython 1.56 will alert more of our users to the planned withdrawal of support of Python 2.4, so we may get some feedback this week... Peter From chapmanb at 50mail.com Mon Nov 29 12:23:23 2010 From: chapmanb at 50mail.com (Brad Chapman) Date: Mon, 29 Nov 2010 07:23:23 -0500 Subject: [Biopython-dev] Dropping Python 2.4 Support? In-Reply-To: References: Message-ID: <20101129122323.GA3139@sobchak.mgh.harvard.edu> Peter; [Python2.4 support] > So, no comments? The folks who are still using 5 year old versions of python might not be the most responsive. We'll probably hear some complaints when some of the code breaks. > I'm also keen to use Biopython with Galaxy, and they currently > support Python 2.4 to 2.6 (and I'm unclear when they will add > 2.7 and drop 2.4), so this is another reason to keep some level > of support for Python 2.4. However, on a local level this isn't > important as we are running Galaxy on Python 2.6 now. > Likewise I know Brad is running Galaxy on a more recent > Python than 2.4 (are you using Biopython within Galaxy > Brad? Maybe we could chat about that on a new thread). Yes, I'm running on 2.6 (and sad to be missing nested with statements in my code). It would be great to have formal Biopython/Galaxy interoperability. 
If I remember right, the biggest complaint was lack of PEP 8 compliance with module names, but it should be worth discussing. Brad From mjldehoon at yahoo.com Tue Nov 30 13:14:20 2010 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Tue, 30 Nov 2010 05:14:20 -0800 (PST) Subject: [Biopython-dev] Biopython 1.56 release plans In-Reply-To: Message-ID: <215849.18567.qm@web62405.mail.re1.yahoo.com> OK, I have removed these modules: Bio.Encodings Bio.PropertyManager Bio.Transcribe Bio.Translate Bio.utils --Michiel. --- On Fri, 11/26/10, Peter wrote: > From: Peter > Subject: Re: [Biopython-dev] Biopython 1.56 release plans > To: "Michiel de Hoon" > Cc: "Biopython-Dev Mailing List" > Date: Friday, November 26, 2010, 11:25 AM > On Fri, Nov 5, 2010 at 12:01 PM, > Peter > wrote: > > On Fri, Nov 5, 2010 at 11:52 AM, Michiel de Hoon > > wrote: > >> > >> Bio/Transcribe.py > >> Bio/Translate.py > >> > >> These are still imported from > Bio/Encodings/IUPACEncoding.py, which > >> is imported from Bio/Alphabet/IUPAC.py. I have no > idea what this code > >> is doing. Does anybody know? > > > > Ah right - sorry, that had slipped my mind: > > http://lists.open-bio.org/pipermail/biopython-dev/2010-September/008255.html > > > > I had suggested we leave Bio.Transcribe and > Bio.Translate in for > > Biopython 1.56 and remove them (and Bio.utils, > Bio.PropertyManager, > > and Bio.Encodings.IUPACEncoding) for Biopython 1.57 > > Hi Michiel, > > Now Biopython 1.56 is out, would you like to remove those > modules? > > Thanks > > Peter > From anaryin at gmail.com Tue Nov 30 15:45:35 2010 From: anaryin at gmail.com (=?UTF-8?Q?Jo=C3=A3o_Rodrigues?=) Date: Tue, 30 Nov 2010 16:45:35 +0100 Subject: [Biopython-dev] Features of the GSOC branch ready to be merged Message-ID: Hello all, I've been looking at the code I wrote for the GSOC to see what is ready to be merged in the main branch. I have to thank Kristian and whoever participated in the Python & Friends for the input. 
From what I gathered, and from my own tests, I believe the following functions are solid enough: 1. Bio/PDB/Atom.py: automatically guessing atom element from atom name 2. Bio/PDB/Structure.py 1. Building biological unit from REMARK 350 in the header 2. Renumbering residues Let me know what you all think. Best, João [...] Rodrigues http://doeidoei.wordpress.com From biopython at maubp.freeserve.co.uk Tue Nov 30 23:24:35 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 30 Nov 2010 23:24:35 +0000 Subject: [Biopython-dev] Bio.SeqIO.index extension, Bio.SeqIO.index_many Message-ID: Hi all, You may recall some previous discussion about extending the Bio.SeqIO.index functionality. I'm particularly interested in keeping the index on disk to reduce the memory overhead and thus support NGS files with many millions of reads. e.g. http://lists.open-bio.org/pipermail/biopython-dev/2009-September/006713.html http://lists.open-bio.org/pipermail/biopython-dev/2009-September/006716.html I'd also like to index multiple files (e.g. a folder of GenBank files for different chromosomes), functionality we used to have with the OBDA style index (using BDB or a flat file) and Martel/Mindy (deprecated and removed some time ago due to problems with 3rd party libraries, scaling problems when parsing, and ultimately no one familiar enough with the code to try and fix it). See also: http://lists.open-bio.org/pipermail/biopython-dev/2009-August/006704.html I've been working on the following idea on branches in github, and have something workable using SQLite3 to store a table of record identifiers, file offset, and file number (for where we have multiple files indexed together). Following the OBDA standard, I extended this to also (optionally) store the record length on disk. This allows the get_raw method to be much faster, but may not be possible on all file formats. [Currently I get the length when building the index on all supported file formats except SFF.
Here we normally use the Roche index, and that doesn't have the raw record lengths.] Note that using SQLite seems sensible to me as it is included with Python 2.5+ including Python 3, while BDB, the other candidate from the standard library, has been deprecated. The current API is as follows, a new function: def index_many(index_filename, filenames=None, format=None, alphabet=None, key_function=None) This is similar to the existing index function, although here the key_function must return a string for use as the key in the SQLite database. The idea is that you call index_many to build a new index (if the index file does not exist) or reload an existing index (if the index file does exist). If you are reloading an existing index, you can omit the filenames and format. The index_many function returns a read only dictionary like object - very much like the existing index function. Although not (currently) exposed by this API, the code allows a configurable limit on the number of handles (since these are a finite resource limited by the OS). I've put a branch up for comment: https://github.com/peterjc/biopython/tree/index-many I hope the docstring text and embedded doctest examples are clear. You can read them here: https://github.com/peterjc/biopython/blob/index-many/Bio/SeqIO/__init__.py What do people think? One thing I haven't done yet (any volunteers?) is any benchmarking - for example comparing the index build and retrieval times for some large files using Biopython 1.55 (recent baseline), Biopython 1.56 (should be faster on retrieval) and the branch to check for any regressions in Bio.SeqIO.index(), and compare this to Bio.SeqIO.index_many() which being disk based will be slower but require much less RAM. Peter P.S. 
This was based on the following branch, which proved non-trivial to merge since in the meantime I'd made separate tweaks to the index code on the trunk: https://github.com/peterjc/biopython/tree/index-many-length I didn't propose merging this back then because it absolutely requires SQLite, and thus Python 2.5+ and we wanted Biopython 1.56 to support Python 2.4.
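To make the on-disk index idea above more concrete, here is a minimal standard-library sketch of a SQLite table holding record key, file number, file offset and record length, with raw record retrieval by seek-and-read. It is only an illustration under stated assumptions, not the code on the index-many branch: the table name offsets, the column names, the helper names build_index and get_raw, and the restriction to FASTA files (key taken as the first word after ">") are all made up for this example.

```python
import sqlite3


def build_index(index_filename, filenames):
    """Store (key, file_number, offset, length) for every FASTA record.

    Assumptions: index_filename does not exist yet, keys are unique across
    all files, and the key is the first word after '>'.
    """
    con = sqlite3.connect(index_filename)
    con.execute(
        "CREATE TABLE offsets (key TEXT PRIMARY KEY, "
        "file_number INTEGER, offset INTEGER, length INTEGER)"
    )
    for file_number, filename in enumerate(filenames):
        # Binary mode so tell() gives true byte offsets.
        with open(filename, "rb") as handle:
            key, start = None, 0
            while True:
                pos = handle.tell()
                line = handle.readline()
                if not line or line.startswith(b">"):
                    if key is not None:
                        # The previous record ends where this one starts.
                        con.execute(
                            "INSERT INTO offsets VALUES (?, ?, ?, ?)",
                            (key, file_number, start, pos - start),
                        )
                    if not line:
                        break
                    words = line[1:].split(None, 1)
                    key = words[0].decode() if words else ""
                    start = pos
    con.commit()
    return con


def get_raw(con, filenames, key):
    """Return the raw bytes of one record using the stored offset and length."""
    row = con.execute(
        "SELECT file_number, offset, length FROM offsets WHERE key = ?",
        (key,),
    ).fetchone()
    if row is None:
        raise KeyError(key)
    file_number, offset, length = row
    with open(filenames[file_number], "rb") as handle:
        handle.seek(offset)
        return handle.read(length)
```

Because the length is stored, get_raw is a single seek plus a single read, with no need to scan forward for the start of the next record. This sketch opens the file on every lookup; a handle pool with a configurable limit, as mentioned above, would avoid that cost.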