From schaefer at rostlab.org  Tue Nov  2 05:17:49 2010
From: schaefer at rostlab.org (Christian Schaefer)
Date: Tue, 02 Nov 2010 10:17:49 +0100
Subject: [Biopython-dev] RMSD calculation
In-Reply-To: <AANLkTikndP+_qBoFe=u2jR=oYQ6Dn-+WLD2xBNjXxXCC@mail.gmail.com>
References: <AANLkTi=DMWNh1AtuVjtv8=thDx5Y6KKPW+aaUK=Gi1Yj@mail.gmail.com>	<AANLkTinTy-t_-FafL23kj7PrsiLH=48mL0KZi2f-3RbS@mail.gmail.com>
	<AANLkTikndP+_qBoFe=u2jR=oYQ6Dn-+WLD2xBNjXxXCC@mail.gmail.com>
Message-ID: <4CCFD73D.7000203@rostlab.org>

Hey,

I once compared the Bio.PDB Superimposer with ProFit [1], which does
McLachlan fitting. Both return essentially the same RMSD, while the
implementation in Bio.PDB seems to yield higher precision.

Chris

[1] http://www.bioinf.org.uk/software/profit/

-- 
Dipl.-Bioinf. Christian Schaefer
Technical University Munich
Department for Bioinformatics
Faculty of Computer Science/I12
Boltzmannstr. 3
D-85748 Garching b. Muenchen
Germany
http://www.rostlab.org/~schaefer


On 10/30/2010 01:42 AM, George Devaniranjan wrote:
> Thanks Eric and Peter,
> Your patience in answering this question is very much appreciated.
> I think Eric may be right. I tried the RMSD calculation for several
> structures and VMD does give a lower value for them all.
> George
>
> Thanks once again to all of you for your answers
>
> On Fri, Oct 29, 2010 at 10:39 PM, Eric Talevich<eric.talevich at gmail.com>wrote:
>
>> On Thu, Oct 28, 2010 at 12:49 PM, George Devaniranjan<
>> devaniranjan at gmail.com>  wrote:
>>
>>> I was wondering why there are two functions for calculating RMSD
>>>
>>> 1) in the SVDSuperimposer()
>>> 2) in PDB.Superimposer()
>>>
>>> In the code it says RMS. Is RMS being calculated instead of RMSD?
>>> I ask because VMD gives a different value for RMSD to the one from
>>> Biopython
>>>
>>>
>> Hello George,
>>
>> Here's my understanding of it:
>>
>> 1. RMSD and "RMS distance" both mean root mean square deviation, in terms
>> of the distances in 3D space between each corresponding pair of atoms. The
>> RMSD between all atoms in two aligned structures may be different than the
>> RMSD between backbone atoms only. Or, if the two structures don't have the
>> same peptide sequence, that raises another set of issues.
>>
>> 2. In Biopython, PDB.Superimposer internally uses SVDSuperimposer. It's a
>> simplified wrapper.
>>
>> 3. The SVDSuperimposer module allows you to either (i) align two structures
>> in 3D space and then calculate RMSD, or (ii) just calculate RMSD without
>> spatially (re-)aligning the structures. PDB.Superimposer just does the
>> former. If the structures weren't already aligned, these can yield very
>> different values.
>>
>> 4. There are many ways to perform a structural alignment; SVDSuperimposer
>> implements a simple one. PyMOL, VMD, ce, DALI, and other programs implement
>> more advanced methods.
>>
>> So don't be alarmed that VMD gives you a smaller RMSD than PDB.Superimposer
>> -- it just means VMD found a better alignment between the two structures.
>>
>> Best,
>> Eric
>>
>>
>>
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev


From krother at rubor.de  Tue Nov  2 07:15:05 2010
From: krother at rubor.de (Kristian Rother)
Date: Tue, 2 Nov 2010 12:15:05 +0100
Subject: [Biopython-dev] RMSD calculation
In-Reply-To: <AANLkTikBe8eU7w+F14T_VdBy2dBwTmfiWv84mXjRvX8-@mail.gmail.com>
References: <AANLkTi=DMWNh1AtuVjtv8=thDx5Y6KKPW+aaUK=Gi1Yj@mail.gmail.com>
	<AANLkTikiX3RDMqmcYatjqmJ4ukuiLNq5GC=DTOn0Pmje@mail.gmail.com>
	<AANLkTik=ckm184v_+ZH1C34YNzrFpZ8Mrtt-cO5iDm3B@mail.gmail.com>
	<AANLkTi=MxnisQA8s5Kf2TioNv2dggkWsRPZEFkDw0Oaa@mail.gmail.com>
	<AANLkTikBe8eU7w+F14T_VdBy2dBwTmfiWv84mXjRvX8-@mail.gmail.com>
Message-ID: <529a050d3a1c3801f07adbef605341ef-EhVcX1xCQgFaRwICBxEAXR0wfgFLV15YQUBGAEFfUC9ZUFgWXVpyH1RXX0FdQU1tXlhRSF5cXg1fWg==-webmailer1@server08.webmailer.hosteurope.de>

Hi George,

I think I can help to clear up the RMSD question
(or RMS; however you abbreviate it, it's the same formula).

The short answer is, the methods giving lower RMSD do something
conceptually very different from Bio.PDB.

Long answer:

- Bio.PDB.Superimposer does structure *superposition*. It takes pairs of
atoms, and finds the rotation/translation matrix that minimizes the RMSD.
There is a single analytical solution to this, returned by the Kabsch
algorithm from 1976 (see http://www.pymolwiki.org/index.php/Kabsch). I'm
quite sure Biopython/SVDSuperimposer implements this algorithm.

- Services like the EBI SSM server do *structure alignment*. They take two
structures and try to find a set of residue pairs that fit to each other
well. To do so, they occasionally calculate RMSDs, but do not necessarily
use all the residues provided.

For instance, when submitting protein1 and protein2 to EBI, the output
tells me that

N(algn) = 31

meaning that 31 of the 36 residues were used to calculate the alignment.
When looking at the structures, these are probably on the N-terminus (see
picture).

==> the structure alignment algorithm discards the residues it does not
regard as useful for aligning, which is why the RMSD is lower.
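
To make the distinction concrete, here is a minimal sketch in plain Python.
It is not Biopython code. It works in 2D, where the least-squares rotation
has a closed form, as a stand-in for the 3D superposition that
Bio.PDB.Superimposer performs. The coordinates, the superpose_2d and move
helpers, and the choice of which pair to discard are all made up for
illustration.

```python
import math

def rmsd(a, b):
    """Root mean square deviation over paired 2D points."""
    total = sum((ax - bx) ** 2 + (ay - by) ** 2
                for (ax, ay), (bx, by) in zip(a, b))
    return math.sqrt(total / len(a))

def centroid(points):
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def superpose_2d(reference, mobile):
    """Rotate and translate `mobile` onto `reference` (least squares).

    Hypothetical helper: a 2D analogue of 3D superposition.
    """
    (rx, ry), (mx, my) = centroid(reference), centroid(mobile)
    ref_c = [(x - rx, y - ry) for x, y in reference]
    mob_c = [(x - mx, y - my) for x, y in mobile]
    dot = sum(px * qx + py * qy for (px, py), (qx, qy) in zip(mob_c, ref_c))
    cross = sum(px * qy - py * qx for (px, py), (qx, qy) in zip(mob_c, ref_c))
    theta = math.atan2(cross, dot)  # rotation angle that minimises the RMSD
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + rx, s * x + c * y + ry) for x, y in mob_c]

def move(points, degrees, shift):
    """Apply a rigid rotation and translation (only used to build test data)."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    return [(c * x - s * y + shift[0], s * x + c * y + shift[1])
            for x, y in points]

reference = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.5), (3.0, 0.0), (4.0, 1.0)]
# Same shape as the reference except for a displaced final point.
distorted = reference[:4] + [(6.0, -1.0)]
mobile = move(distorted, 40.0, (5.0, -2.0))

# RMSD of the coordinates as given, without any superposition.
rmsd_raw = rmsd(reference, mobile)
# Superposition: all 5 pairs are kept and the RMSD over them is minimised.
rmsd_all = rmsd(reference, superpose_2d(reference, mobile))
# Alignment-like behaviour: the badly fitting pair is dropped before fitting.
core_ref, core_mob = reference[:4], mobile[:4]
rmsd_core = rmsd(core_ref, superpose_2d(core_ref, core_mob))

print("no fit: %.3f, all pairs: %.3f, core pairs: %.3f"
      % (rmsd_raw, rmsd_all, rmsd_core))
```

The three numbers come out in decreasing order: the unfitted value is the
largest, the fit over all pairs is lower, and the fit over the four
well-matching pairs is close to zero. A real structure alignment program
chooses the subset itself; here it is picked by hand.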


Do you think this explains all our observations?

Best regards,
    Kristian


> Hello everyone,
> I tried with PyMOL and it gives a value of 1.792 for the RMSD after
> alignment.
> The EU bioinformatics server gives a value of 1.74.
> VMD gives 1.62.
> But SVDSuperimposer and PDB.Superimposer give a value of 3.2.
> I have attached the 2 PDB files concerned. Is it something I am doing
> wrong in calculating the RMSD using Biopython?
> Thank you
>
> On Thu, Oct 28, 2010 at 1:46 PM, Peter
> <biopython at maubp.freeserve.co.uk>wrote:
>
>> On Thu, Oct 28, 2010 at 6:14 PM, George Devaniranjan
>> <devaniranjan at gmail.com> wrote:
>> > Yes there is a difference. For 2 proteins having exactly the same 36
>> > residues, the values from 4 sources are as follows
>> > VMD RMSD=1.61
>> > SVD RMSD =3.2
>> > PDB RMSD=3.2
>> >
>> > From the EU Bioinformatics server (link below) RMSD =1.75
>> > (http://www.ebi.ac.uk/msd-srv/ssm/cgi-bin/ssmserver)
>> >
>> > So Biopython really is computing the RMSD and not RMS?
>> > Thank you
>>
>> It has been a while since I looked at this (but I can still edit
>> the Warwick page if it is unclear).
>>
>> Which definition of RMSD are you using?
>>
>> Bio.PDB uses Bio.SVDSuperimposer, so they should be the same.
>> The comment for this code *says* it calculates the RMS deviation,
>> here:
>>
>>        diff=coords1-coords2
>>        l=coords1.shape[0]
>>        return sqrt(sum(sum(diff*diff))/l)
>>
>> Here variable l will be the number of atoms.
>>
>> What are the two examples you are using? Can you perhaps
>> share a small example pair of PDB files?
>>
>> Peter
>>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: superpos.png
Type: image/png
Size: 172427 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/biopython-dev/attachments/20101102/f02741f3/attachment-0001.png>

From devaniranjan at gmail.com  Tue Nov  2 21:09:18 2010
From: devaniranjan at gmail.com (George Devaniranjan)
Date: Wed, 3 Nov 2010 01:09:18 +0000
Subject: [Biopython-dev] RMSD calculation
In-Reply-To: <4CCFD73D.7000203@rostlab.org>
References: <AANLkTi=DMWNh1AtuVjtv8=thDx5Y6KKPW+aaUK=Gi1Yj@mail.gmail.com>
	<AANLkTinTy-t_-FafL23kj7PrsiLH=48mL0KZi2f-3RbS@mail.gmail.com>
	<AANLkTikndP+_qBoFe=u2jR=oYQ6Dn-+WLD2xBNjXxXCC@mail.gmail.com>
	<4CCFD73D.7000203@rostlab.org>
Message-ID: <AANLkTinrxtJbP6AzKqfkNwpL+w3fakVduQr=WJRRDNMO@mail.gmail.com>

Hi,
Thank you. I have noticed that in most cases PDB.Superimposer and
SVDSuperimposer give similar values.
In addition, PyMOL in most cases also gives similar values; however, in all
cases VMD continues to give the smallest value.

I will also test ProFit. Thanks for the link.
George


From biopython at maubp.freeserve.co.uk  Wed Nov  3 10:02:48 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 3 Nov 2010 14:02:48 +0000
Subject: [Biopython-dev] Merging Uniprot XML parser?
In-Reply-To: <AANLkTineNfa+eMqcUyN7+anQ4OQOyLnVYOT+gM5H_Qg3@mail.gmail.com>
References: <AANLkTineNfa+eMqcUyN7+anQ4OQOyLnVYOT+gM5H_Qg3@mail.gmail.com>
Message-ID: <AANLkTimcrZBsL_1re6wYn0qr2H3Z-0Tq3Wo7748Pifvz@mail.gmail.com>

On Tue, Oct 19, 2010 at 4:54 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> Hi all,
>
> I've fixed a few issues I felt were holding up merging Andrea's UniProt
> XML parser.
>
> I've now tested that uniprot_sprot.txt and uniprot_sprot.xml are parsed
> into more or less equivalent objects, and that these can be written out
> as GenBank (well, GenPept) files or as EMBL/IMGT files (given recent
> work to support protein EMBL files - which do exist but are rarely used).
>
> This required "fixing" Bug 3026 to cope with long annotation that cannot
> be line wrapped nicely (lots of long URL strings in UniProt XML comments).
> http://bugzilla.open-bio.org/show_bug.cgi?id=3026
> I'm tempted to remove the warning because it is so common... or make
> it use the same text each time so you get warned once.
>
> There are also some additions to the Bio.SeqFeature position classes,
> since SwissProt/UniProt files can have uncertain positions.
>
> Could someone take a look at the code here (a rebased branch), as I'd
> like some independent testing (and better yet, code review):
> http://github.com/peterjc/biopython/tree/uniprot

I've now merged this into the trunk (with a git rebase first so the history
is linear - no branch+merge), and Andrea has agreed to retest it.
Other testing and comments are most welcome.

Peter

From biopython at maubp.freeserve.co.uk  Wed Nov  3 12:45:25 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 3 Nov 2010 16:45:25 +0000
Subject: [Biopython-dev] Bio/cMarkovModelmodule.c
In-Reply-To: <781588.85801.qm@web62407.mail.re1.yahoo.com>
References: <AANLkTimkefVwSCjYSQPBhQ5SFyMFVPiJYiRSnC8G2ygQ@mail.gmail.com>
	<781588.85801.qm@web62407.mail.re1.yahoo.com>
Message-ID: <AANLkTi=U1bcLmbJczO3GNmkViBMe+0SrTJUQJ7LBGnha@mail.gmail.com>

On Sat, Oct 30, 2010 at 3:23 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
>
> OK, done. In the end, I put the warning message in MarkovModel.py anyway,
> since it's very easy to miss if it's in setup.py.
>

Do we really need the warning? I guess otherwise people using this code
might notice a drop in performance if they were using our C code version,
updated their Biopython, and then get the Python fallback if their NumPy
is too old.

If we do keep the warning should it be silenced in test_MarkovModel.py?
Something like the patch below should do it...

Peter

diff --git a/Tests/test_MarkovModel.py b/Tests/test_MarkovModel.py
index fc5ae8b..bb3afe8 100644
--- a/Tests/test_MarkovModel.py
+++ b/Tests/test_MarkovModel.py
@@ -9,7 +9,12 @@ except ImportError:
     raise MissingPythonDependencyError(\
         "Install NumPy if you want to use Bio.MarkovModel.")

+import warnings
+#Silence this warning:
+#For optimal speed, please update to Numpy version 1.3 or later
+warnings.filterwarnings("ignore", category=UserWarning)
 from Bio import MarkovModel
+warnings.filters.pop()

 def print_mm(markov_model):
     print "STATES: %s" % ' '.join(markov_model.states)
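
A possible alternative to the filterwarnings/pop pair in the patch, sketched
under the assumption that only the import itself needs silencing: the
standard library's warnings.catch_warnings() context manager restores the
filter list on exit. This matters because warnings.filterwarnings inserts at
the front of warnings.filters by default, so a bare warnings.filters.pop()
removes the last entry rather than the one just added. The noisy_import
function below is a hypothetical stand-in for "from Bio import MarkovModel".

```python
import warnings

def noisy_import():
    """Hypothetical stand-in for an import that warns at load time."""
    warnings.warn(
        "For optimal speed, please update to Numpy version 1.3 or later",
        UserWarning,
    )
    return "imported"

def quiet(func):
    """Call func with UserWarning silenced, then restore the old filters."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", UserWarning)
        return func()

filters_before = list(warnings.filters)
result = quiet(noisy_import)
# The temporary "ignore" filter is gone again once the with-block exits.
print(result, list(warnings.filters) == filters_before)
```

In the test file this would mean placing the import inside the with-block
instead of calling a helper.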

From biopython at maubp.freeserve.co.uk  Wed Nov  3 13:17:46 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 3 Nov 2010 17:17:46 +0000
Subject: [Biopython-dev] Continuous integration server
In-Reply-To: <AANLkTikQNr-VfKtF5w-BbXLawb6hMBPegBerg9yb7jC+@mail.gmail.com>
References: <AANLkTikQNr-VfKtF5w-BbXLawb6hMBPegBerg9yb7jC+@mail.gmail.com>
Message-ID: <AANLkTik5BxRuFN4T6rA=hqAjK0LwGpQDqgfz94bFPsGm@mail.gmail.com>

2010/10/30 Tiago Antão <tiagoantao at gmail.com>:
> Hi all,
>
> I've been hacking with buildbot, an integration server. This is to
> allow continuous testing of Biopython, so that we are alerted to any
> problems as soon as somebody does a dreadful commit (I have the top 5
> of most dreadful commits, so it was fair that I should try to do
> something about it).
>
> Things are still incomplete, but I think it is time to inform the list
> of this effort...
> To know more about buildbot you can either go to the buildbot site
> http://buildbot.net/ or see the draft doc that I have been preparing
> http://biopython.org/wiki/Continuous_integration
> There is a draft server here:
> http://events.open-bio.org:8010/
> The cool thing about buildbot is that actual testing is done by
> volunteer computers. Want to test on OS y, Python version z? You can
> offer the idle time of your laptop for that...
>

It is looking impressive Tiago - excellent work :)

>
> Obvious things missing:
>
> 0. First and foremost, see if people like this?

Looks very promising.

> 1. Changing the biopython test code to avoid stressing the network
> (i.e., having a run_tests option that skips the network tests).
> This is to avoid imposing continuous traffic on genbank and friends. This
> is a show stopper.

Certainly we can't scale this up to many machines running regular
testing without limiting the network access somewhat.

> 2. Maybe warn the mailing list when some fundamental build stops
> working (e.g. send an email when a python 2.x build stops working)
> 3. Have test servers with all the applications installed (do you want
> to volunteer? This is more to do with volunteers)

I would expect "core" developers to have machines with most of
the command line applications used in Biopython's tests already
installed - but yes, we do want to make sure each optional
command line tool or library is installed on at least one build slave.

> 4. Maybe change run_tests to require all tests to be done. If we are
> doing integration testing, we want all tests to be done (missing
> applications or libraries should be an error). As an example, none of
> my tests are complete

This is about how it currently skips tests missing external
dependencies (like PopGen command line tools in your case).
I think that is OK, otherwise we'll get false positives (see below,
we can't satisfy all dependencies on all platforms).

> 5. Support mac (my access to Mr Job's fashion machines is limited).
> Again this is more a volunteer issue.

My main work machine is a Mac, so this shouldn't be an issue.

> 6. Discuss policies: One test a day? Full tests or updates? Full
> network tests (probably sporadically)? Send emails?

Right now triggering tests after each commit isn't easy to do,
is it (due to limited git support in buildbot)? That might be nice
but in the short term running the tests once a day is a big step
forward.

I'd suggest we do network tests once a week (or fortnight?).

> 7. Find volunteers to cover several OSes and several Python
> versions. Assure that people do full tests (i.e. with all applications
> and libraries)

That isn't possible - some applications are not available on Windows,
and some libraries are not available on Jython or Python 3 (yet).

> 8. While I have volunteered Windows testing myself, I will not be able
> to maintain it regularly.

I have access to a Windows machine (which I use to build the
Biopython installers) but currently it is only online intermittently.
I'd have to reorganise machines due to limited network ports in
the office, but it could in principle be used as a buildbot slave.

>
> Opinions are most welcome
>

What is wrong with your Linux Python 3.1 slave? It seems that
2to3 is failing on the doctest conversion.

Peter


From tiagoantao at gmail.com  Thu Nov  4 08:04:17 2010
From: tiagoantao at gmail.com (Tiago Antão)
Date: Thu, 4 Nov 2010 12:04:17 +0000
Subject: [Biopython-dev] Continuous integration server
In-Reply-To: <AANLkTik5BxRuFN4T6rA=hqAjK0LwGpQDqgfz94bFPsGm@mail.gmail.com>
References: <AANLkTikQNr-VfKtF5w-BbXLawb6hMBPegBerg9yb7jC+@mail.gmail.com>
	<AANLkTik5BxRuFN4T6rA=hqAjK0LwGpQDqgfz94bFPsGm@mail.gmail.com>
Message-ID: <AANLkTimnLwnBC2bx8S5POa1GCXjne2M4g6AsENJU_s-h@mail.gmail.com>

2010/11/3 Peter <biopython at maubp.freeserve.co.uk>:
> Certainly we can't scale this up to many machines running regular
> testing without limiting the network access somewhat.

As we discussed before, I was thinking of adding an option to
run_tests.py (like --offline) and changing the tests that access the
Internet to honour that flag. I was thinking of coding this myself and
then sending it to the list for approval (I am not going to make big changes
to the test framework myself without passing them through here).
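
A rough sketch of how such a flag could work. This is not the actual
run_tests.py code: only the --offline name comes from the proposal above,
while the requires_internet decorator, the CONFIG dictionary and the example
test case are hypothetical names chosen for illustration.

```python
import argparse
import functools
import unittest

CONFIG = {"offline": False}  # hypothetical shared switch set by the runner

def requires_internet(test_method):
    """Skip the decorated test when the runner was started with --offline."""
    @functools.wraps(test_method)
    def wrapper(self, *args, **kwargs):
        if CONFIG["offline"]:
            self.skipTest("internet access disabled (--offline)")
        return test_method(self, *args, **kwargs)
    return wrapper

class ExampleTests(unittest.TestCase):
    def test_local_parsing(self):
        self.assertEqual("ACGT".lower(), "acgt")

    @requires_internet
    def test_remote_fetch(self):
        # A real test would contact a remote server here.
        self.assertTrue(True)

def run(argv):
    """Parse the command line, then run the example tests."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--offline", action="store_true",
                        help="skip tests that need internet access")
    CONFIG["offline"] = parser.parse_args(argv).offline
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleTests)
    result = unittest.TestResult()
    suite.run(result)
    return result

if __name__ == "__main__":
    outcome = run(["--offline"])
    print("ran %d, skipped %d" % (outcome.testsRun, len(outcome.skipped)))
```

Reporting the online tests as skipped rather than dropping them silently
keeps the build output honest about what was not exercised.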

>> 6. Discuss policies: One test a day? Full tests or updates? Full
>> network tests (probably sporadically)? Send emails?
>
> Right now triggering tests after each commit isn't easy to do,
> is it (due to limited git support in buildbot)? That might be nice
> but in the short term running the tests once a day is a big step
> forward.

It is actually quite easy (with a hook on GitHub), but I would
suggest leaving this for version 2: let's get the fundamentals working
and then add bells and whistles.

> I'd suggest we do network tests once a week (or fortnight?).

OK, I will go ahead and do some changes to run_tests.py as per above.

> That isn't possible - some applications are not available on Windows,
> and some libraries are not available on Jython or Python 3 (yet).


OK, we just have to be sure (manually) that all applications that need
testing are tested.

>> 8. While I have volunteered Windows testing myself, I will not be able
>> to maintain it regularly.
>
> I have access to a Windows machine (which I use to build the
> Biopython installers) but currently it is only online intermittently.
> I'd have to reorganise machines due to limited network ports in
> the office, but it could in principle be used as a buildbot slave.

Regarding Mac and Windows, I will email again as soon as we have the
network issue sorted out. Before that we might generate too much
traffic, as we have no way to stop the network access for now.

> What is wrong with your Linux Python 3.1 slave? It seems that
> 2to3 is failing on the doctest conversion.

I do not have time to evaluate this now; I will trace this issue over
the weekend.

Tiago

From biopython at maubp.freeserve.co.uk  Thu Nov  4 08:28:50 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 4 Nov 2010 12:28:50 +0000
Subject: [Biopython-dev] Continuous integration server
In-Reply-To: <AANLkTimnLwnBC2bx8S5POa1GCXjne2M4g6AsENJU_s-h@mail.gmail.com>
References: <AANLkTikQNr-VfKtF5w-BbXLawb6hMBPegBerg9yb7jC+@mail.gmail.com>
	<AANLkTik5BxRuFN4T6rA=hqAjK0LwGpQDqgfz94bFPsGm@mail.gmail.com>
	<AANLkTimnLwnBC2bx8S5POa1GCXjne2M4g6AsENJU_s-h@mail.gmail.com>
Message-ID: <AANLkTi=QPiwjis+o91AXZR90fd-zVHgd59E-C_6+Mg5Q@mail.gmail.com>

2010/11/4 Tiago Antão <tiagoantao at gmail.com>:
> 2010/11/3 Peter <biopython at maubp.freeserve.co.uk>:
>> Certainly we can't scale this up to many machines running regular
>> testing without limiting the network access somewhat.
>
> As we discussed before, I was thinking of adding an option to
> run_tests.py (like --offline) and changing the tests that access the
> Internet to honour that flag. I was thinking of coding this myself and
> then sending it to the list for approval (I am not going to make big changes
> to the test framework myself without passing them through here).

Yep, that sounds good.

The previous discussion is here if anyone missed it:
http://lists.open-bio.org/pipermail/biopython-dev/2010-October/008295.html

>>> 6. Discuss policies: One test a day? Full tests or updates? Full
>>> network tests (probably sporadically)? Send emails?
>>
>> Right now triggering tests after each commit isn't easy to do,
>> is it (due to limited git support in buildbot)? That might be nice
>> but in the short term running the tests once a day is a big step
>> forward.
>
> It is actually quite easy (with a hook on GitHub), but I would
> suggest leaving this for version 2: let's get the fundamentals working
> and then add bells and whistles.

I agree.

>> I'd suggest we do network tests once a week (or fortnight?).
>
> OK, I will go ahead and do some changes to run_tests.py as per above.
>
>> That isn't possible - some applications are not available on Windows,
>> and some libraries are not available on Jython or Python 3 (yet).
>
> OK, we just have to be sure (manually) that all applications that need
> testing are tested.

Yes, that will be a manual task. When we document the slave setup
process we can list which applications we ideally want people to install
on each OS. Having a slight range in versions would actually be a good
thing here.

>>> 8. While I have volunteered Windows testing myself, I will not be able
>>> to maintain it regularly.
>>
>> I have access to a Windows machine (which I use to build the
>> Biopython installers) but currently it is only online intermittently.
>> I'd have to reorganise machines due to limited network ports in
>> the office, but it could in principle be used as a buildbot slave.
>
> Regarding Mac and Windows, I will email again as soon as we have the
> network issue sorted out. Before that we might generate too much
> traffic, as we have no way to stop the network access for now.
>
>> What is wrong with your Linux Python 3.1 slave? It seems that
>> 2to3 is failing on the doctest conversion.
>
> I do not have time to evaluate this now; I will trace this issue over
> the weekend.

Sure.

And once the --offline switch is working, we can start adding slaves
(and documenting how to do it to assist future volunteers).

Good work Tiago :)

Peter


From bugzilla-daemon at portal.open-bio.org  Thu Nov  4 12:49:45 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 4 Nov 2010 12:49:45 -0400
Subject: [Biopython-dev] [Bug 3139] python setup.py test ends with error
	code 0 even on failure
In-Reply-To: <bug-3139-42@http.bugzilla.open-bio.org/>
Message-ID: <201011041649.oA4GnjEw008477@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=3139


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2010-11-04 12:49 EST -------
Fix checked in by Tiago, marking as fixed.

http://github.com/biopython/biopython/commit/457ce49a060fe540f98aa37a6266cff17864487b


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From biopython at maubp.freeserve.co.uk  Thu Nov  4 13:13:33 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 4 Nov 2010 17:13:33 +0000
Subject: [Biopython-dev] Biopython 1.56 release plans
Message-ID: <AANLkTikq5TXOhAB-WVurn=WDNM8GiCrPRznrjcZ0Caew@mail.gmail.com>

Hi all,

I've mentioned in recent threads that I think we should try and
release Biopython 1.56 this month (November 2010).

I think the NEWS file is pretty up to date, and covers important
new functionality like Andrea Pierleoni's UniProt XML parser
and the IMGT support (with Uri Laserson).

Is there any other functionality which is ready for merging?

For example, Tiago - you've been doing lots of work on your
branch with the PopGen code. Is that code ready? I'm willing
to do the git merge/rebase.

Is there any reason to bother with a beta release this time?

If there are no pressing additions, I may be able to do the
release tomorrow - otherwise how about aiming for Thursday
or Friday next week (11 or 12 November)?

Regards,

Peter

From mjldehoon at yahoo.com  Fri Nov  5 05:40:19 2010
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 5 Nov 2010 02:40:19 -0700 (PDT)
Subject: [Biopython-dev] Biopython 1.56 release plans
In-Reply-To: <AANLkTikq5TXOhAB-WVurn=WDNM8GiCrPRznrjcZ0Caew@mail.gmail.com>
Message-ID: <701600.10148.qm@web62403.mail.re1.yahoo.com>

I think the following should be removed before the release:

Bio/SwissProt/SProt.py
Bio/Transcribe.py
Bio/Translate.py

as well as the Iterator class in Bio/SCOP/Dom.py.

These have been deprecated since Biopython 1.52.

Best,
--Michiel.



From tiagoantao at gmail.com  Fri Nov  5 06:13:09 2010
From: tiagoantao at gmail.com (Tiago Antão)
Date: Fri, 5 Nov 2010 10:13:09 +0000
Subject: [Biopython-dev] Biopython 1.56 release plans
In-Reply-To: <AANLkTikq5TXOhAB-WVurn=WDNM8GiCrPRznrjcZ0Caew@mail.gmail.com>
References: <AANLkTikq5TXOhAB-WVurn=WDNM8GiCrPRznrjcZ0Caew@mail.gmail.com>
Message-ID: <AANLkTimUFDQNh3gw4eT1F6=0rbsG7GgKF4X-7NCvPFA9@mail.gmail.com>

On Thu, Nov 4, 2010 at 5:13 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> For example, Tiago - you've been doing lots of work on your
> branch with the PopGen code. Is that code ready? I'm willing
> to do the git merge/rebase.

I was hoping that you would offer to do a merge ;) . <sarcasm> Though we
need a broken repository to test the integration server, so maybe I
could do it myself </sarcasm>.
Yes, the code is ready.
After the merge I will still add a couple of functions (also ready,
but not committed) and make sure the test cases are fully ready. But
it should be a day only and better done after the merge.
This is mainly new code that does much faster GENEPOP parsing and
supports AFLP processing.

Tiago

From biopython at maubp.freeserve.co.uk  Fri Nov  5 06:19:53 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 5 Nov 2010 10:19:53 +0000
Subject: [Biopython-dev] Biopython 1.56 release plans
In-Reply-To: <AANLkTimUFDQNh3gw4eT1F6=0rbsG7GgKF4X-7NCvPFA9@mail.gmail.com>
References: <AANLkTikq5TXOhAB-WVurn=WDNM8GiCrPRznrjcZ0Caew@mail.gmail.com>
	<AANLkTimUFDQNh3gw4eT1F6=0rbsG7GgKF4X-7NCvPFA9@mail.gmail.com>
Message-ID: <AANLkTi=1WC_rM_BctDzU+ubLj=w9o8Q-W5tAYogs9ND=@mail.gmail.com>

2010/11/5 Tiago Antão <tiagoantao at gmail.com>:
> On Thu, Nov 4, 2010 at 5:13 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>> For example, Tiago - you've been doing lots of work on your
>> branch with the PopGen code. Is that code ready? I'm willing
>> to do the git merge/rebase.
>
> I was hoping that you would offer to do a merge ;) . <sarcasm> Though we
> need a broken repository to test the integration server, so maybe I
> could do it myself </sarcasm>.
> Yes, the code is ready.

OK - I'll try to get your code, rebase it onto the current master,
then post it as a new branch for you to check. Once that is OK,
I'll rebase it again if the master has changed, then fast-forward
merge it to the master (that way we don't get a split and join on
the master history - just a sudden batch of commits).

> After the merge I will still add a couple of functions (also ready,
> but not committed) and make sure the test cases are fully ready.
> But it should be a day only and better done after the merge.
> This is mainly new code that does much faster GENEPOP
> parsing and supports AFLP processing.

Hopefully we can get that part done early next week.

Peter


From biopython at maubp.freeserve.co.uk  Fri Nov  5 06:23:26 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 5 Nov 2010 10:23:26 +0000
Subject: [Biopython-dev] Biopython 1.56 release plans
In-Reply-To: <701600.10148.qm@web62403.mail.re1.yahoo.com>
References: <AANLkTikq5TXOhAB-WVurn=WDNM8GiCrPRznrjcZ0Caew@mail.gmail.com>
	<701600.10148.qm@web62403.mail.re1.yahoo.com>
Message-ID: <AANLkTi=feVugOz6M6uK3E=SjKw3Ett4MahGTkLs80Xje@mail.gmail.com>

On Fri, Nov 5, 2010 at 9:40 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> I think the following should be removed before the release:
>
> Bio/SwissProt/SProt.py
> Bio/Transcribe.py
> Bio/Translate.py
>
> as well as the Iterator class in Bio/SCOP/Dom.py.
>
> These have been deprecated since Biopython 1.52.

According to the DEPRECATED file, those modules were
deprecated in Biopython 1.51, so they are definitely due for
removal. In any case Biopython 1.52 was very nearly a year
ago [1] as it was released 22 September 2009.

Please go ahead and tidy this up.

Thanks,

Peter

[1] http://www.biopython.org/wiki/Deprecation_policy

From biopython at maubp.freeserve.co.uk  Fri Nov  5 06:47:12 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 5 Nov 2010 10:47:12 +0000
Subject: [Biopython-dev] Biopython 1.56 release plans
In-Reply-To: <AANLkTi=1WC_rM_BctDzU+ubLj=w9o8Q-W5tAYogs9ND=@mail.gmail.com>
References: <AANLkTikq5TXOhAB-WVurn=WDNM8GiCrPRznrjcZ0Caew@mail.gmail.com>
	<AANLkTimUFDQNh3gw4eT1F6=0rbsG7GgKF4X-7NCvPFA9@mail.gmail.com>
	<AANLkTi=1WC_rM_BctDzU+ubLj=w9o8Q-W5tAYogs9ND=@mail.gmail.com>
Message-ID: <AANLkTi=MYN1AjZudZ7hdV6MYVgR4UifakWXCtjk1zUFs@mail.gmail.com>

2010/11/5 Peter <biopython at maubp.freeserve.co.uk>:
> 2010/11/5 Tiago Antão <tiagoantao at gmail.com>:
>> On Thu, Nov 4, 2010 at 5:13 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>>> For example, Tiago - you've been doing lots of work on your
>>> branch with the PopGen code. Is that code ready? I'm willing
>>> to do the git merge/rebase.
>>
>> I was hoping that you would offer to do a merge ;) . <sarcasm> Though we
>> need a broken repository to test the integration server, so maybe I
>> could do it myself </sarcasm>.
>> Yes, the code is ready.
>
> OK - I'll try to get your code, rebase it onto the current master,
> then post it as a new branch for you to check.

Notes on how I did this:

$ git remote add tiago https://github.com/tiagoantao/biopython.git
$ git fetch tiago
...
>From https://github.com/tiagoantao/biopython
 * [new branch]      buildbot   -> tiago/buildbot
 * [new branch]      master     -> tiago/master

Now I want your "master" branch, but that name clashes with
my "master" branch... the following worked here:

$ git checkout tiago/master
Note: moving to "tiago/master" which isn't a local branch
If you want to create a new branch from this checkout, you may do so
(now or later) by using -b with the checkout command again. Example:
  git checkout -b <new_branch_name>
HEAD is now at 21b7a22... Merge branch 'master' of
github.com:tiagoantao/biopython
$ git checkout -b tiago-pop-gen
Switched to a new branch "tiago-pop-gen"

Now I want to rewrite the history of your PopGen work as though
it was started from the current state of the master branch. I
was hoping there would have been no changes to the PopGen
code on the master so that this would be trivial...

$ git rebase master
...
CONFLICT (content): Merge conflict in Bio/PopGen/FDist/__init__.py
...

So open Bio/PopGen/FDist/__init__.py and look for the merge conflicts
(which are marked with <<<<<<< to >>>>>>>). In this case it was the
removal of some deprecated code done on the pop gen branch, which
was only deprecated in Biopython 1.55, so it is a bit premature to remove
it already. So I fixed up Bio/PopGen/FDist/__init__.py and saved it. Then:

$ git add Bio/PopGen/FDist/__init__.py
$ git rebase --continue
...

This seems to have worked. I can now do a comparison to
the master branch,

$ git diff master
...

After running the unit tests (which was of limited value as I don't
have FDist installed on this machine), I then pushed it online:

$ git push peterjc tiago-pop-gen

The rebased branch is now here:
https://github.com/peterjc/biopython/tree/tiago-pop-gen

If you agree the rebased branch is sane, it should be trivial to
now merge that onto the master as a fast-forward merge.
(But I would check first that the master hasn't changed, and
if it has, repeat the rebase).

Peter


From tiagoantao at gmail.com  Fri Nov  5 06:50:32 2010
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Fri, 5 Nov 2010 10:50:32 +0000
Subject: [Biopython-dev] Biopython 1.56 release plans
In-Reply-To: <AANLkTi=MYN1AjZudZ7hdV6MYVgR4UifakWXCtjk1zUFs@mail.gmail.com>
References: <AANLkTikq5TXOhAB-WVurn=WDNM8GiCrPRznrjcZ0Caew@mail.gmail.com>
	<AANLkTimUFDQNh3gw4eT1F6=0rbsG7GgKF4X-7NCvPFA9@mail.gmail.com>
	<AANLkTi=1WC_rM_BctDzU+ubLj=w9o8Q-W5tAYogs9ND=@mail.gmail.com>
	<AANLkTi=MYN1AjZudZ7hdV6MYVgR4UifakWXCtjk1zUFs@mail.gmail.com>
Message-ID: <AANLkTim+pFsPSt2wMvwpLsy=ks4rP6rvX6zC8DBwQqwK@mail.gmail.com>

2010/11/5 Peter <biopython at maubp.freeserve.co.uk>:
> If you agree the rebased branch is sane, it should be trivial to
> now merge that onto the master as a fast-forward merge.
> (But I would check first that the master hasn't changed, and
> if it has, repeat the rebase).

Many thanks for the guide, maybe in the future I will have the courage
to do it myself.

Go ahead and commit the changes. I will make sure the module is sane
this Sunday.

From biopython at maubp.freeserve.co.uk  Fri Nov  5 07:08:54 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 5 Nov 2010 11:08:54 +0000
Subject: [Biopython-dev] Biopython 1.56 release plans
In-Reply-To: <AANLkTim+pFsPSt2wMvwpLsy=ks4rP6rvX6zC8DBwQqwK@mail.gmail.com>
References: <AANLkTikq5TXOhAB-WVurn=WDNM8GiCrPRznrjcZ0Caew@mail.gmail.com>
	<AANLkTimUFDQNh3gw4eT1F6=0rbsG7GgKF4X-7NCvPFA9@mail.gmail.com>
	<AANLkTi=1WC_rM_BctDzU+ubLj=w9o8Q-W5tAYogs9ND=@mail.gmail.com>
	<AANLkTi=MYN1AjZudZ7hdV6MYVgR4UifakWXCtjk1zUFs@mail.gmail.com>
	<AANLkTim+pFsPSt2wMvwpLsy=ks4rP6rvX6zC8DBwQqwK@mail.gmail.com>
Message-ID: <AANLkTi=W8uAcNDT2FYm5KEq6h4uLK8TUyQodmNtYUg=_@mail.gmail.com>

2010/11/5 Tiago Antão <tiagoantao at gmail.com>:
> 2010/11/5 Peter <biopython at maubp.freeserve.co.uk>:
>> If you agree the rebased branch is sane, it should be trivial to
>> now merge that onto the master as a fast-forward merge.
>> (But I would check first that the master hasn't changed, and
>> if it has, repeat the rebase).
>
> Many thanks for the guide, maybe in the future I will have the
> courage to do it myself.
>
> Go ahead and commit the changes. I will make sure the module
> is sane this Sunday.

Done. The master hadn't changed in the meantime so I didn't
have to re-rebase:

$ git checkout master
Switched to branch "master"
$ git merge tiago-pop-gen
Updating 065e235..4f318a4
Fast forward
 Bio/PopGen/FDist/Async.py            |   21 +-
 Bio/PopGen/FDist/Controller.py       |  125 +-
 Bio/PopGen/FDist/Utils.py            |   68 +-
 Bio/PopGen/FDist/__init__.py         |    1 -
 Bio/PopGen/GenePop/EasyController.py |   10 +-
 Bio/PopGen/GenePop/FileParser.py     |   69 +-
 Tests/PopGen/data_dfst_outfile       |  300 +
 Tests/PopGen/dfdist1                 | 1204 +
 Tests/PopGen/dout.cpl                |  300 +
 Tests/PopGen/dout.dat                |50000 ++++++++++++++++++++++++++++++++++
 Tests/test_PopGen_DFDist.py          |  106 +
 Tests/test_PopGen_FDist_nodepend.py  |   20 +-
 12 files changed, 52176 insertions(+), 48 deletions(-)
 create mode 100644 Tests/PopGen/data_dfst_outfile
 create mode 100644 Tests/PopGen/dfdist1
 create mode 100644 Tests/PopGen/dout.cpl
 create mode 100644 Tests/PopGen/dout.dat
 create mode 100644 Tests/test_PopGen_DFDist.py

Then publishing it,

$ git push origin master
Counting objects: 120, done.
Delta compression using 8 threads.
Compressing objects: 100% (106/106), done.
Writing objects: 100% (106/106), 133.46 KiB, done.
Total 106 (delta 79), reused 0 (delta 0)
To git at github.com:biopython/biopython.git
   065e235..4f318a4  master -> master

And removing my now pointless public branch:

$ git push peterjc :tiago-pop-gen
To git at github.com:peterjc/biopython.git
 - [deleted]         tiago-pop-gen

We need to update the NEWS file now.

Peter


From mjldehoon at yahoo.com  Fri Nov  5 07:52:15 2010
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 5 Nov 2010 04:52:15 -0700 (PDT)
Subject: [Biopython-dev] Biopython 1.56 release plans
In-Reply-To: <AANLkTi=feVugOz6M6uK3E=SjKw3Ett4MahGTkLs80Xje@mail.gmail.com>
Message-ID: <645847.84052.qm@web62404.mail.re1.yahoo.com>

> > Bio/SwissProt/SProt.py
> > the Iterator class in Bio/SCOP/Dom.py

I have removed these.

> > Bio/Transcribe.py
> > Bio/Translate.py

These are still imported from Bio/Encodings/IUPACEncoding.py, which is imported from Bio/Alphabet/IUPAC.py. I have no idea what this code is doing. Does anybody know?

--Michiel.


From biopython at maubp.freeserve.co.uk  Fri Nov  5 08:01:45 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 5 Nov 2010 12:01:45 +0000
Subject: [Biopython-dev] Biopython 1.56 release plans
In-Reply-To: <645847.84052.qm@web62404.mail.re1.yahoo.com>
References: <AANLkTi=feVugOz6M6uK3E=SjKw3Ett4MahGTkLs80Xje@mail.gmail.com>
	<645847.84052.qm@web62404.mail.re1.yahoo.com>
Message-ID: <AANLkTikhuis9NVte79m9PZMb9pNoFBQvqqq+PwLXstAf@mail.gmail.com>

On Fri, Nov 5, 2010 at 11:52 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
>
>> > Bio/SwissProt/SProt.py
>> > the Iterator class in Bio/SCOP/Dom.py
>
> I have removed these.
>
>> > Bio/Transcribe.py
>> > Bio/Translate.py
>
> These are still imported from Bio/Encodings/IUPACEncoding.py, which
> is imported from Bio/Alphabet/IUPAC.py. I have no idea what this code
> is doing. Does anybody know?

Ah right - sorry, that had slipped my mind:
http://lists.open-bio.org/pipermail/biopython-dev/2010-September/008255.html

I had suggested we leave Bio.Transcribe and Bio.Translate in for
Biopython 1.56 and remove them (and Bio.utils, Bio.PropertyManager,
and Bio.Encodings.IUPACEncoding) for Biopython 1.57

Peter

From mjldehoon at yahoo.com  Fri Nov  5 08:08:17 2010
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 5 Nov 2010 05:08:17 -0700 (PDT)
Subject: [Biopython-dev] Biopython 1.56 release plans
In-Reply-To: <AANLkTikq5TXOhAB-WVurn=WDNM8GiCrPRznrjcZ0Caew@mail.gmail.com>
Message-ID: <772269.63506.qm@web62407.mail.re1.yahoo.com>

I'd like to suggest also that we deprecate Bio.Prosite.Prodoc; this functionality moved to Bio.ExPASy.Prodoc at least since release 1.50, and the module has been labeled as obsolete since then. The enclosing module Bio.Prosite itself is already deprecated.

--Michiel.


From biopython at maubp.freeserve.co.uk  Fri Nov  5 08:19:27 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 5 Nov 2010 12:19:27 +0000
Subject: [Biopython-dev] Biopython 1.56 release plans
In-Reply-To: <772269.63506.qm@web62407.mail.re1.yahoo.com>
References: <AANLkTikq5TXOhAB-WVurn=WDNM8GiCrPRznrjcZ0Caew@mail.gmail.com>
	<772269.63506.qm@web62407.mail.re1.yahoo.com>
Message-ID: <AANLkTin+Qn8K+eqNvXVU87VGCWGgDGu0xJO8gD_gaYQ0@mail.gmail.com>

On Fri, Nov 5, 2010 at 12:08 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> I'd like to suggest also that we deprecate Bio.Prosite.Prodoc; this
> functionality moved to Bio.ExPASy.Prodoc at least since release 1.50,
> and the module has been labeled as obsolete since then. The enclosing
> module Bio.Prosite itself is already deprecated.

Since Bio.Prosite is deprecated that means Bio.Prosite.Prodoc (and any
other child modules) is too. If you try "from Bio.Prosite import Prodoc"
you get a deprecation warning. Feel free to add "(DEPRECATED)" to
the Bio.Prosite.Prodoc docstrings if you think it would be clearer.
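
A minimal sketch of why a warning in a package's __init__.py also covers
its child modules: Python runs the parent package's __init__.py before
importing any child. The package name oldpkg_demo and the helper function
are made up for illustration; this is not Biopython code.

```python
import importlib
import os
import sys
import tempfile
import warnings


def warnings_from_child_import():
    """Build a throwaway package on disk and import one of its children."""
    with tempfile.TemporaryDirectory() as root:
        package_dir = os.path.join(root, "oldpkg_demo")  # made-up name
        os.mkdir(package_dir)
        with open(os.path.join(package_dir, "__init__.py"), "w") as handle:
            handle.write(
                "import warnings\n"
                "warnings.warn('oldpkg_demo is deprecated', DeprecationWarning)\n"
            )
        with open(os.path.join(package_dir, "child.py"), "w") as handle:
            handle.write("VALUE = 1\n")

        sys.path.insert(0, root)
        importlib.invalidate_caches()
        try:
            with warnings.catch_warnings(record=True) as caught:
                warnings.simplefilter("always")
                # Importing the child runs the parent's __init__.py first.
                importlib.import_module("oldpkg_demo.child")
            return [
                str(item.message)
                for item in caught
                if issubclass(item.category, DeprecationWarning)
            ]
        finally:
            # Leave no trace of the demo package behind.
            sys.path.remove(root)
            sys.modules.pop("oldpkg_demo.child", None)
            sys.modules.pop("oldpkg_demo", None)


print(warnings_from_child_import())
```

The only warning recorded comes from the parent package, even though the
import statement named the child module.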

Peter

From andrea at biocomp.unibo.it  Fri Nov  5 12:43:16 2010
From: andrea at biocomp.unibo.it (Andrea Pierleoni)
Date: Fri, 5 Nov 2010 17:43:16 +0100 (CET)
Subject: [Biopython-dev] Merging Uniprot XML parser?
In-Reply-To: <AANLkTimcrZBsL_1re6wYn0qr2H3Z-0Tq3Wo7748Pifvz@mail.gmail.com>
References: <AANLkTineNfa+eMqcUyN7+anQ4OQOyLnVYOT+gM5H_Qg3@mail.gmail.com>
	<AANLkTimcrZBsL_1re6wYn0qr2H3Z-0Tq3Wo7748Pifvz@mail.gmail.com>
Message-ID: <3cb74578eeedb8825ef75202c909b843.squirrel@lipid.biocomp.unibo.it>

> On Tue, Oct 19, 2010 at 4:54 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> I've now merged this into the trunk (with a git rebase first so the
> history is linear - no branch+merge), and Andrea has agreed to retest it.
> Other testing and comments are most welcome.
>
> Peter
>


I've run a couple of tests from the Biopython master branch.
The uniprot-xml parser successfully parsed the 2010_11 release of UniProt,
containing 522,019 entries.

The plain text 'swiss' parser took 6 minutes to parse the complete flatfile
UniProt db on my system (Python 2.6 on a MacBook Pro, Core 2 Duo).
The uniprot-xml parser took 12 minutes to do the same task when using
cElementTree, which looks pretty good to me (compare this to the 8 minutes
I needed to download the gzipped db).
However, it took more than 80 minutes to do the same task using ElementTree.
So be aware that the parser can become very slow without the C library.
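
A minimal sketch of the usual "C library if available" import, together
with a streaming iterparse loop. The function count_entries and the plain
tag name "entry" are illustrative assumptions (real UniProt XML tags carry
a namespace); this is not the code in UniprotIO.py.

```python
import io

try:
    # C accelerated parser, present on older interpreters.
    from xml.etree import cElementTree as ElementTree
except ImportError:
    # Fallback; on recent Python 3 this module already uses the
    # C accelerator internally when it is available.
    from xml.etree import ElementTree


def count_entries(handle, tag="entry"):
    """Count closed <entry> elements while streaming through the file."""
    count = 0
    for event, elem in ElementTree.iterparse(handle, events=("end",)):
        if elem.tag == tag:
            count += 1
            elem.clear()  # drop children that are no longer needed
    return count


sample = io.BytesIO(b"<uniprot><entry/><entry/><entry/></uniprot>")
print(count_entries(sample))
```

Clearing each element after use keeps memory roughly flat on a large file,
which matters more than the choice of parser once the input reaches GBs.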

I'm currently retesting also on TrEMBL, but I don't think there is going
to be any problem.
I have no idea about the performance with Jython and similar Python
implementations, or whether it works at all.

Andrea


From eric.talevich at gmail.com  Fri Nov  5 13:26:03 2010
From: eric.talevich at gmail.com (Eric Talevich)
Date: Fri, 5 Nov 2010 13:26:03 -0400
Subject: [Biopython-dev] Merging Uniprot XML parser?
In-Reply-To: <3cb74578eeedb8825ef75202c909b843.squirrel@lipid.biocomp.unibo.it>
References: <AANLkTineNfa+eMqcUyN7+anQ4OQOyLnVYOT+gM5H_Qg3@mail.gmail.com>
	<AANLkTimcrZBsL_1re6wYn0qr2H3Z-0Tq3Wo7748Pifvz@mail.gmail.com>
	<3cb74578eeedb8825ef75202c909b843.squirrel@lipid.biocomp.unibo.it>
Message-ID: <AANLkTin7khy0GkxBd1LMJpCtQeAaFHybS1v4C+52FdK5@mail.gmail.com>

On Fri, Nov 5, 2010 at 12:43 PM, Andrea Pierleoni
<andrea at biocomp.unibo.it> wrote:

>
> I've run a couple of tests from the Biopython master branch.
> The uniprot-xml parser successfully parsed the 2010_11 release of UniProt,
> containing 522,019 entries.
>
> [...]
>
> I have no idea about the performance with Jython and similar Python
> implementations, or whether it works at all.
>
>
Speaking from my experience with ElementTree in Bio.Phylo -- Jython 2.5's
implementation of xml.etree should work as a drop-in replacement, but it's
painfully slow. However, I've read that the next release of Jython will
include some substantial overall speedups, which should make it more
competitive.

I once tried to get Biopython working on IronPython (on Mono, on Linux), but
didn't succeed. The release I used didn't seem to have a compatible
xml.etree implementation, though the developers may have made progress on
this recently.

-Eric

From biopython at maubp.freeserve.co.uk  Fri Nov  5 13:53:50 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 5 Nov 2010 17:53:50 +0000
Subject: [Biopython-dev] Merging Uniprot XML parser?
In-Reply-To: <3cb74578eeedb8825ef75202c909b843.squirrel@lipid.biocomp.unibo.it>
References: <AANLkTineNfa+eMqcUyN7+anQ4OQOyLnVYOT+gM5H_Qg3@mail.gmail.com>
	<AANLkTimcrZBsL_1re6wYn0qr2H3Z-0Tq3Wo7748Pifvz@mail.gmail.com>
	<3cb74578eeedb8825ef75202c909b843.squirrel@lipid.biocomp.unibo.it>
Message-ID: <AANLkTikCzLALtfhydpM7n3=fC=0+WoSuMnuzFxhmwgvV@mail.gmail.com>

On Fri, Nov 5, 2010 at 4:43 PM, Andrea Pierleoni wrote:
>
> On Tue, Oct 19, 2010 at 4:54 PM, Peter wrote:
>> I've now merged this into the trunk (with a git rebase first so the
>> history is linear - no branch+merge), and Andrea has agreed to
>> retest it. Other testing and comments are most welcome.
>>
>> Peter
>>
>
>
> I've run a couple of tests from the Biopython master branch.
> The uniprot-xml parser successfully parsed the 2010_11 release
> of UniProt, containing 522,019 entries.
>
> The plain text 'swiss' parser took 6 minutes to parse the complete
> flatfile UniProt db on my system (Python 2.6 on a MacBook Pro, Core 2 Duo).
> The uniprot-xml parser took 12 minutes to do the same task when using
> cElementTree, which looks pretty good to me (compare this to the 8
> minutes I needed to download the gzipped db).

I think I have a slightly older version as it only has 519348 entries.
My timings using Python 2.6 on Mac OS X, looping over the
file with Bio.SeqIO.parse() and incrementing a counter:

uniprot_sprot.fasta, 232 MB, 15s ("fasta")
uniprot_sprot.dat, 2.2 GB, 4m57s ("swiss")
uniprot_sprot.xml, 4.5 GB, 10m34s ("uniprot-xml")

Note the XML file is about twice the size of the plain text swiss
format file, and as you noted, takes about twice as long to parse.
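
For reference, a sketch of that kind of counting script. The helper names
count_records and time_parse are made up here; only the use of
Bio.SeqIO.parse(handle, format_name) comes from this thread, and Biopython
must be installed for time_parse to run.

```python
import time


def count_records(records):
    """Loop over an iterator of records and increment a counter."""
    count = 0
    for _record in records:
        count += 1
    return count


def time_parse(filename, format_name):
    """Return (record count, seconds taken) for one file."""
    from Bio import SeqIO  # Biopython, imported only when actually used

    start = time.time()
    with open(filename) as handle:
        count = count_records(SeqIO.parse(handle, format_name))
    return count, time.time() - start
```

Something like time_parse("uniprot_sprot.dat", "swiss") would then give
one row of the timings above; the numbers will depend on the machine.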

> However, it took more than 80 minutes to do the same task using
> ElementTree. So be aware that the parser can become very slow
> without the C library.
>
> I'm currently retesting also on TrEMBL, but I don't think there is going
> to be any problem.

OK - those files are about 10 times bigger, right?

> I have no idea about the performance with Jython and similar
> Python implementations, or whether it works at all.

The tests all pass with Jython 2.5.1 (running under Mac OS X),
and here are some timings:

uniprot_sprot.fasta, 232 MB, 21s ("fasta")
uniprot_sprot.dat, 2.2 GB, 8m34s ("swiss")
uniprot_sprot.xml, 4.5 GB, FAILED ("uniprot-xml")

The XML file failed almost immediately with this traceback:

Traceback (most recent call last):
  File "../count.py", line 13, in <module>
    for record in SeqIO.parse(open(filename), format_name):
  File "../count.py", line 13, in <module>
    for record in SeqIO.parse(open(filename), format_name):
  File "/Users/xxx/jython2.5.1/Lib/site-packages/Bio/SeqIO/UniprotIO.py", line 80, in UniprotIterator
    for event, elem in ElementTree.iterparse(handle, events=("start", "end")):
  File "/Users/xxx/jython2.5.1/Lib/xml/etree/ElementTree.py", line 937, in next
    self._parser.feed(data)
  File "/Users/xxx/jython2.5.1/Lib/xml/etree/ElementTree.py", line 1245, in feed
    self._parser.Parse(data, 0)
  File "/Users/xxx/jython2.5.1/Lib/xml/parsers/expat.py", line 195, in Parse
    self._data.append(data)
	at java.util.Arrays.copyOf(Arrays.java:2882)
	at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
	at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:390)
	at java.lang.StringBuilder.append(StringBuilder.java:119)
	at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)

java.lang.OutOfMemoryError: java.lang.OutOfMemoryError: Java heap space


Note this wasn't a simple out of memory error (the machine had GBs
free), rather it was heap space. That's a bit frustrating - but Kyle's
email suggests things could improve in the next Jython release.

Peter

From andrea at biocomp.unibo.it  Fri Nov  5 14:09:08 2010
From: andrea at biocomp.unibo.it (Andrea Pierleoni)
Date: Fri, 5 Nov 2010 19:09:08 +0100 (CET)
Subject: [Biopython-dev] Merging Uniprot XML parser?
In-Reply-To: <AANLkTikCzLALtfhydpM7n3=fC=0+WoSuMnuzFxhmwgvV@mail.gmail.com>
References: <AANLkTineNfa+eMqcUyN7+anQ4OQOyLnVYOT+gM5H_Qg3@mail.gmail.com>
	<AANLkTimcrZBsL_1re6wYn0qr2H3Z-0Tq3Wo7748Pifvz@mail.gmail.com>
	<3cb74578eeedb8825ef75202c909b843.squirrel@lipid.biocomp.unibo.it>
	<AANLkTikCzLALtfhydpM7n3=fC=0+WoSuMnuzFxhmwgvV@mail.gmail.com>
Message-ID: <37e194782e740bf5bd2e872bfc6a37d3.squirrel@lipid.biocomp.unibo.it>


> I think I have a slightly older version as it only has 519348 entries.
> My timings using Python 2.6 on Mac OS X, looping over the
> file with Bio.SeqIO.parse() and incrementing a counter:
>
> uniprot_sprot.fasta, 232 MB, 15s ("fasta")
> uniprot_sprot.dat, 2.2 GB, 4m57s ("swiss")
> uniprot_sprot.xml, 4.5 GB, 10m34s ("uniprot-xml")
>

my timings were without the counter :)

> Note the XML file is about twice the size of the plain text swiss
> format file, and as you noted, takes about twice as long to parse.
>

Yes, that's true, but simply iterating over the two files takes 18s for
the .dat one and 38s for the .xml one. The information retrieved is more
or less the same; the rest is overhead due to the XML file's complexity.
However, it's pretty fast anyway, at least with cElementTree.

>> I'm currently retesting also on TrEMBL, but I don't think there is going
>> to be any problem.
>
> OK - those files are about 10 times bigger, right?

It's currently 12 million entries! So it's 24 times bigger (7.5 GB gzipped);
in fact I can't complete the test today. I'll keep you updated.


>
> Note this wasn't a simple out of memory error (the machine had GBs
> free), rather it was heap space. That's a bit frustrating - but Kyle's
> email suggests things could improve in the next Jython release.
>


Is the new Jython release coming soon? I'm really a newbie to Jython,
so I don't think I can help with it. Maybe it is safer for Jython users to
use the 'swiss' parser until the new release comes out, particularly if
they have performance issues.

Andrea


From mjldehoon at yahoo.com  Fri Nov  5 22:41:57 2010
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 5 Nov 2010 19:41:57 -0700 (PDT)
Subject: [Biopython-dev] Bio/cMarkovModelmodule.c
In-Reply-To: <AANLkTi=U1bcLmbJczO3GNmkViBMe+0SrTJUQJ7LBGnha@mail.gmail.com>
Message-ID: <646748.14362.qm@web62407.mail.re1.yahoo.com>

--- On Wed, 11/3/10, Peter <biopython at maubp.freeserve.co.uk> wrote:
> > I put the warning message in MarkovModel.py anyway,
> > since it's very easy to miss if it's in setup.py.
> 
> Do we really need the warning? I guess otherwise people using this code
> might notice a drop in performance if they were using our C code version,
> updated their Biopython, and then get the Python fallback if their NumPy
> is too old.

We need the warning, otherwise we'd leave the user guessing as to why their code is suddenly slower.

> If we do keep the warning should it be silenced in
> test_MarkovModel.py?

OK I've added this warning.
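
A rough sketch of the pattern under discussion: warn when falling back
from a C extension to pure Python, and silence that warning inside a unit
test. The module name _hypothetical_c_backend and both function names are
made up; this is not the actual code in MarkovModel.py or its test.

```python
import warnings


def load_backend():
    """Return the C extension if importable, else None (with a warning)."""
    try:
        import _hypothetical_c_backend as backend  # made-up module name
    except ImportError:
        warnings.warn(
            "C extension not available; falling back to the slower "
            "pure-Python implementation.",
            RuntimeWarning,
        )
        backend = None
    return backend


def load_backend_quietly():
    """What a unit test could do to keep its output clean."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", RuntimeWarning)
        return load_backend()
```

The catch_warnings context restores the previous filters on exit, so the
warning stays visible to ordinary users outside the test.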

--Michiel.


From biopython at maubp.freeserve.co.uk  Mon Nov  8 11:12:06 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 8 Nov 2010 16:12:06 +0000
Subject: [Biopython-dev] Continuous integration server
In-Reply-To: <AANLkTi=QPiwjis+o91AXZR90fd-zVHgd59E-C_6+Mg5Q@mail.gmail.com>
References: <AANLkTikQNr-VfKtF5w-BbXLawb6hMBPegBerg9yb7jC+@mail.gmail.com>
	<AANLkTik5BxRuFN4T6rA=hqAjK0LwGpQDqgfz94bFPsGm@mail.gmail.com>
	<AANLkTimnLwnBC2bx8S5POa1GCXjne2M4g6AsENJU_s-h@mail.gmail.com>
	<AANLkTi=QPiwjis+o91AXZR90fd-zVHgd59E-C_6+Mg5Q@mail.gmail.com>
Message-ID: <AANLkTimctC=9-EVUYhF01SmOzR1wKQByVuvi96_vTrfZ@mail.gmail.com>

2010/11/4 Peter <biopython at maubp.freeserve.co.uk>:
>> As we discussed before, I was thinking of adding an option to
>> run_tests.py (like --offline) and changing the tests that access the
>> Internet to honour that flag. I was thinking of coding this myself and
>> then sending it to the list for approval (I am not going to make big
>> changes to the test framework myself without passing them through here).
>
> Yep, that sounds good.
>
> The previous discussion is here if anyone missed it:
> http://lists.open-bio.org/pipermail/biopython-dev/2010-October/008295.html
>

Hi Tiago,

I've implemented the proposed --offline switch in run_tests.py,

https://github.com/biopython/biopython/commit/b6bbcea355a8f71df8654256d8da6ef8b8c02697

Does that work for you? If you can come up with a more
elegant solution, do speak up - mine is a bit of a hack ;)
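
One possible shape for such a switch, as a sketch only. SETTINGS,
parse_args, requires_internet and fetch_something are all hypothetical
names, and this is not a copy of the commit linked above.

```python
import argparse
import functools
import unittest

SETTINGS = {"offline": False}  # hypothetical shared switch


def parse_args(argv):
    """Parse the command line of a test runner."""
    parser = argparse.ArgumentParser(description="Run the test suite")
    parser.add_argument("--offline", action="store_true",
                        help="skip tests that need internet access")
    return parser.parse_args(argv)


def requires_internet(test_func):
    """Decorator: skip the test at run time when running offline."""
    @functools.wraps(test_func)
    def wrapper(*args, **kwargs):
        if SETTINGS["offline"]:
            raise unittest.SkipTest("internet access disabled by --offline")
        return test_func(*args, **kwargs)
    return wrapper


@requires_internet
def fetch_something():
    return "pretend this contacted a server"


SETTINGS["offline"] = parse_args(["--offline"]).offline
try:
    fetch_something()
except unittest.SkipTest as skipped:
    print("skipped:", skipped)
```

Checking the switch when the test is called, rather than at import time,
means the flag can be set after the test modules have been loaded.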

Peter

From tiagoantao at gmail.com  Mon Nov  8 11:17:07 2010
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Mon, 8 Nov 2010 16:17:07 +0000
Subject: [Biopython-dev] Continuous integration server
In-Reply-To: <AANLkTimctC=9-EVUYhF01SmOzR1wKQByVuvi96_vTrfZ@mail.gmail.com>
References: <AANLkTikQNr-VfKtF5w-BbXLawb6hMBPegBerg9yb7jC+@mail.gmail.com>
	<AANLkTik5BxRuFN4T6rA=hqAjK0LwGpQDqgfz94bFPsGm@mail.gmail.com>
	<AANLkTimnLwnBC2bx8S5POa1GCXjne2M4g6AsENJU_s-h@mail.gmail.com>
	<AANLkTi=QPiwjis+o91AXZR90fd-zVHgd59E-C_6+Mg5Q@mail.gmail.com>
	<AANLkTimctC=9-EVUYhF01SmOzR1wKQByVuvi96_vTrfZ@mail.gmail.com>
Message-ID: <AANLkTi=_mj2se5k6wAEDKyhGrUqkhqgS-geAfkh+8+zx@mail.gmail.com>

2010/11/8 Peter <biopython at maubp.freeserve.co.uk>:
> I've implemented the proposed --offline switch in run_tests.py,
>
> https://github.com/biopython/biopython/commit/b6bbcea355a8f71df8654256d8da6ef8b8c02697
>
> Does that work for you ? If you can come up with a more
> elegant solution do speak up - mine is a bit of a hack ;)


Thanks a lot. I was waiting for the 1.56 release to work on this thing
(to avoid adding entropy). But as this is now in, I will progress
immediately with the rest of the integration server work. I will
contact you soon regarding Mac testing.

From tiagoantao at gmail.com  Mon Nov  8 11:34:31 2010
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Mon, 8 Nov 2010 16:34:31 +0000
Subject: [Biopython-dev] Bio/Entrez/__init__.py
Message-ID: <AANLkTimqTeR7qERSW8ATepANRxQeVRyVNg=BBnXpXYXW@mail.gmail.com>

Hi,

There is a doctest line that is making 2to3 go bonkers on Bio.Entrez
(__init__.py)
Line 55
             >>> for record in records:
             ...     # each record is a Python dictionary or list.

Simply adding a
...       pass

is enough (the code should not work as it is an empty for loop, so 2to3 is
actually correct)
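
A small check of that claim using the built-in compile(). The helper name
compiles is made up, and this only shows the syntax problem any Python
parser sees; it does not run 2to3 itself.

```python
def compiles(source):
    """Return True if the source is syntactically valid Python."""
    try:
        compile(source, "<docstring example>", "exec")
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False
    return True


# A for loop whose body is only a comment has no statement in it.
empty_loop = (
    "for record in records:\n"
    "    # each record is a Python dictionary or list.\n"
)
with_pass = empty_loop + "    pass\n"

print(compiles(empty_loop), compiles(with_pass))
```

The name records never needs to exist, since compile() only parses the
source without running it.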

-- 
"If you want to get laid, go to college. If you want an education, go
to the library." - Frank Zappa


From biopython at maubp.freeserve.co.uk  Mon Nov  8 11:38:08 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 8 Nov 2010 16:38:08 +0000
Subject: [Biopython-dev] Bio/Entrez/__init__.py
In-Reply-To: <AANLkTimqTeR7qERSW8ATepANRxQeVRyVNg=BBnXpXYXW@mail.gmail.com>
References: <AANLkTimqTeR7qERSW8ATepANRxQeVRyVNg=BBnXpXYXW@mail.gmail.com>
Message-ID: <AANLkTin0g01DY3HhTEZjRNVM2zwgp-G-AtzS8YqWb+Ct@mail.gmail.com>

2010/11/8 Tiago Antão <tiagoantao at gmail.com>:
> Hi,
>
> There is a doctest line that is making 2to3 go bonkers on Bio.Entrez
> (__init__.py)
> Line 55
>             >>> for record in records:
>             ...     # each record is a Python dictionary or list.
>
> Simply adding a
> ...       pass
>
> is enough (the code should not work as it is an empty for loop, so 2to3
> is actually correct)

Ah - that isn't actually being used as a doctest (we don't call it
in run_tests.py) and it wouldn't work if we tried because half
the function arguments are omitted or left as dots.

I like your solution of adding the pass line.

Peter


From mjldehoon at yahoo.com  Mon Nov  8 20:22:39 2010
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Mon, 8 Nov 2010 17:22:39 -0800 (PST)
Subject: [Biopython-dev] Bio/Entrez/__init__.py
In-Reply-To: <AANLkTin0g01DY3HhTEZjRNVM2zwgp-G-AtzS8YqWb+Ct@mail.gmail.com>
Message-ID: <365364.32303.qm@web62403.mail.re1.yahoo.com>

I've added this line:

    ...    print record

which should solve the 2to3 error.

--Michiel.



From tiagoantao at gmail.com  Tue Nov  9 04:12:29 2010
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Tue, 9 Nov 2010 09:12:29 +0000
Subject: [Biopython-dev] Bio/Entrez/__init__.py
In-Reply-To: <365364.32303.qm@web62403.mail.re1.yahoo.com>
References: <AANLkTin0g01DY3HhTEZjRNVM2zwgp-G-AtzS8YqWb+Ct@mail.gmail.com>
	<365364.32303.qm@web62403.mail.re1.yahoo.com>
Message-ID: <AANLkTim86p-vgZjp60BggkCzGJ_RaRCpXdvZHeV=YMwv@mail.gmail.com>

The buildbot server VM is currently down (Chris is moving it to
another physical location). As soon as the machine is back up, I will
activate the server and maybe we can start activating things on a Mac
architecture.

I was thinking of sending emails to the list (automatically) when a
build that was previously working stops doing so...?

2010/11/9 Michiel de Hoon <mjldehoon at yahoo.com>:
> I've added this line:
>
>     ...    print record
>
> which should solve the 2to3 error.
>
> --Michiel.


-- 
"If you want to get laid, go to college. If you want an education, go
to the library." - Frank Zappa


From biopython at maubp.freeserve.co.uk  Tue Nov  9 04:57:47 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 9 Nov 2010 09:57:47 +0000
Subject: [Biopython-dev] Continuous integration server
Message-ID: <AANLkTikMrhZ+MryAg4emN=Q+RCOFG-gxtysA_g=edTsa@mail.gmail.com>

2010/11/9 Tiago Antão <tiagoantao at gmail.com>:
> The buildbot server VM is currently down (Chris is moving it to
> another physical location). As soon as the machine is back up, I will
> activate the server and maybe we can start activating things on a Mac
> architecture.
>
> I was thinking of sending emails to the list (automatically) when a
> build that was previously working stops doing so...?
>

That sounds worth trying, as it removes the need for us to actively
check the buildbot server's webreport. Alternatively we should be
able to use the RSS/Atom feed.

One concern is if we have (say) 8 buildbot slaves, and a change on
the trunk accidentally breaks a unit test (on all platforms), does that
mean we'd get one email or eight?
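
To make the one-versus-eight question concrete, here is a sketch of the
behaviour we would want: one message per change, sent only on the
transition from working to broken. The function failure_email and its
arguments are made up for illustration; this is not Buildbot's API or
configuration.

```python
def failure_email(previous_ok, current_results):
    """Return a single message text, or None if nothing newly broke.

    previous_ok: True if the previous build passed on every slave.
    current_results: dict mapping slave name -> True (pass) / False (fail).
    """
    failed = sorted(name for name, ok in current_results.items() if not ok)
    if previous_ok and failed:
        return "Build broken on %d slave(s): %s" % (
            len(failed), ", ".join(failed))
    return None


print(failure_email(True, {"linux": False, "mac": False, "win": True}))
```

A build that was already failing returns None here, so a long-standing
breakage does not generate a new email on every commit.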

Peter


From tiagoantao at gmail.com  Tue Nov  9 05:14:37 2010
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Tue, 9 Nov 2010 10:14:37 +0000
Subject: [Biopython-dev] Continuous integration server
In-Reply-To: <AANLkTikMrhZ+MryAg4emN=Q+RCOFG-gxtysA_g=edTsa@mail.gmail.com>
References: <AANLkTikMrhZ+MryAg4emN=Q+RCOFG-gxtysA_g=edTsa@mail.gmail.com>
Message-ID: <AANLkTikURQGajrVKhZFY=+j4cBHz7Yyn37A-ag_6DUcy@mail.gmail.com>

2010/11/9 Peter <biopython at maubp.freeserve.co.uk>:
> That sounds worth trying, as it removes the need for us to actively
> check the buildbot server's webreport. Alternatively we should be
> able to use the RSS/Atom feed.

The web interface has RSS and atom.

> One concern is if we have (say) 8 buildbot slaves, and a change on
> the trunk accidentally breaks a unit test (on all platforms), does that
> mean we'd get one email or eight?

It can be configured to send only 1. I just cannot promise that I will
get the configuration right the first time ;) . But it can be done.

From biopython at maubp.freeserve.co.uk  Tue Nov  9 05:33:26 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 9 Nov 2010 10:33:26 +0000
Subject: [Biopython-dev] Continuous integration server
In-Reply-To: <AANLkTikURQGajrVKhZFY=+j4cBHz7Yyn37A-ag_6DUcy@mail.gmail.com>
References: <AANLkTikMrhZ+MryAg4emN=Q+RCOFG-gxtysA_g=edTsa@mail.gmail.com>
	<AANLkTikURQGajrVKhZFY=+j4cBHz7Yyn37A-ag_6DUcy@mail.gmail.com>
Message-ID: <AANLkTimh3umr_D=bRchK0shWT0zwxo45EeaiMop-53Sz@mail.gmail.com>

2010/11/9 Tiago Antão <tiagoantao at gmail.com>:
> 2010/11/9 Peter <biopython at maubp.freeserve.co.uk>:
>> That sounds worth trying, as it removes the need for us to actively
>> check the buildbot server's webreport. Alternatively we should be
>> able to use the RSS/Atom feed.
>
> The web interface has RSS and atom.

Yet another feed for me to track :)

Emails have the advantage of being logged on the mailing list
archive. Let's try it and see how it goes.

>> One concern is if we have (say) 8 buildbot slaves, and a change on
>> the trunk accidentally breaks a unit test (on all platforms), does that
>> mean we'd get one email or eight?
>
> It can be configured to send only 1. I just cannot promise that I will
> get the configuration right the first time ;) . But it can be done.

I thought they (buildbot) would have considered that example :)
You'll probably need the buildbot server's email address added
to the biopython-dev mailing list's white list - let me know nearer
the time.

Peter


From tiagoantao at gmail.com  Tue Nov  9 09:07:56 2010
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Tue, 9 Nov 2010 14:07:56 +0000
Subject: [Biopython-dev] bugzilla jython platform
Message-ID: <AANLkTimKpWaeKKyi10mEtLTeOy8f8BcHmBi3e_mv7ZVd@mail.gmail.com>

Hi,

Just a minor thingy: would it be possible to have a bugzilla platform
called jython? (Or OS).

I am going to report a bug on Jython and noticed that it is not available.

-- 
"If you want to get laid, go to college.  If you want an education, go
to the library." - Frank Zappa


From bugzilla-daemon at portal.open-bio.org  Tue Nov  9 09:09:42 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 9 Nov 2010 09:09:42 -0500
Subject: [Biopython-dev] [Bug 3155] New: Some Phylip tools seem to fail on
	Jython
Message-ID: <bug-3155-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=3155

           Summary: Some Phylip tools seem to fail on Jython
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: tiagoantao at gmail.com


According to the integration tests, some Phylip tools seem to fail on Jython.

Please see below or http://events.open-bio.org:8010/builders/jython/builds/18

======================================================================
ERROR: pseudosample a phylip DNA alignment written with AlignIO
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/tantao/test/slave/jython/build/Tests/test_EmbossPhylipNew.py",
line 270, in test_bootstrap_AlignIO_DNA
    self.check_bootstrap("Phylip/opuntia.phy", "phylip")
  File "/home/tantao/test/slave/jython/build/Tests/test_EmbossPhylipNew.py",
line 251, in check_bootstrap
    raise ValueError("Return code %s from:\n%s" \
ValueError: Return code 1 from:
fseqboot -auto -filter -outfile=test_file -sequence=Phylip/opuntia.phy
-seqtype=d -reps=2

======================================================================
ERROR: pseudosample a phylip protein alignment written with AlignIO
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/tantao/test/slave/jython/build/Tests/test_EmbossPhylipNew.py",
line 279, in test_bootstrap_AlignIO_protein
    self.check_bootstrap("Phylip/hedgehog.phy", "phylip", "p")
  File "/home/tantao/test/slave/jython/build/Tests/test_EmbossPhylipNew.py",
line 251, in check_bootstrap
    raise ValueError("Return code %s from:\n%s" \
ValueError: Return code 1 from:
fseqboot -auto -filter -outfile=test_file -sequence=Phylip/hedgehog.phy
-seqtype=p -reps=2

======================================================================
ERROR: Calculate distance matrix from an AlignIO written protein alignment
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/tantao/test/slave/jython/build/Tests/test_EmbossPhylipNew.py",
line 157, in test_distances_from_protein_AlignIO
    self.distances_from_alignment("Phylip/hedgehog.phy", DNA=False)
  File "/home/tantao/test/slave/jython/build/Tests/test_EmbossPhylipNew.py",
line 117, in distances_from_alignment
    raise ValueError("Return code %s from:\n%s" \
ValueError: Return code 1 from:
fprotdist -auto -outfile=test_file -sequence=Phylip/hedgehog.phy -method=j

======================================================================
ERROR: Make a parsimony tree from an alignment written with AlignIO
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/tantao/test/slave/jython/build/Tests/test_EmbossPhylipNew.py",
line 210, in test_parsimony_tree_from_AlignIO_DNA
    self.parsimony_tree("Phylip/opuntia.phy", "phylip")
  File "/home/tantao/test/slave/jython/build/Tests/test_EmbossPhylipNew.py",
line 194, in parsimony_tree
    raise ValueError("Return code %s from:\n%s" \
ValueError: Return code 1 from:
fdnapars -auto -stdout -sequence=Phylip/opuntia.phy -outtreefile=test_file

======================================================================


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From biopython at maubp.freeserve.co.uk  Tue Nov  9 09:14:10 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 9 Nov 2010 14:14:10 +0000
Subject: [Biopython-dev] bugzilla jython platform
In-Reply-To: <AANLkTimKpWaeKKyi10mEtLTeOy8f8BcHmBi3e_mv7ZVd@mail.gmail.com>
References: <AANLkTimKpWaeKKyi10mEtLTeOy8f8BcHmBi3e_mv7ZVd@mail.gmail.com>
Message-ID: <AANLkTikD4UquEALj0_DffoGtW5qwALn3PZikJ8BPOq=U@mail.gmail.com>

2010/11/9 Tiago Antão <tiagoantao at gmail.com>:
> Hi,
>
> Just a minor thingy: would it be possible to have a bugzilla platform
> called jython? (Or OS).
>
> I am going to report a bug on Jython and noticed that it is not available.
>

It doesn't make sense to me to add Jython as an OS (for one thing, the
OS field is used by all the Bio* projects on our bugzilla, also you can
run Jython on Windows/Mac/Linux etc).

Currently we don't even have a field for the Python version... maybe
we should add a whole new (Biopython only) field for this (e.g. with
Python 2.4, 2.5, 2.6, 2.7, 3.1, and Jython 2.5 as choices for now).

Peter


From bugzilla-daemon at portal.open-bio.org  Tue Nov  9 09:26:57 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 9 Nov 2010 09:26:57 -0500
Subject: [Biopython-dev] [Bug 3155] Some Phylip tools seem to fail on Jython
In-Reply-To: <bug-3155-42@http.bugzilla.open-bio.org/>
Message-ID: <201011091426.oA9EQvws028228@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=3155


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2010-11-09 09:26 EST -------
I realise I don't have EMBOSS phylipnew installed on my machine with Jython, so
the test has just been skipped.

What version of Jython?

What version of EMBOSS, and the phylipnew package?

Do these tests pass *on the same machine* if run in normal (C) Python?
Alternately, do these four command line examples work when run by hand?


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Tue Nov  9 09:55:50 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 9 Nov 2010 09:55:50 -0500
Subject: [Biopython-dev] [Bug 3155] Some Phylip tools seem to fail on Jython
In-Reply-To: <bug-3155-42@http.bugzilla.open-bio.org/>
Message-ID: <201011091455.oA9Eto7n029965@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=3155


------- Comment #2 from tiagoantao at gmail.com  2010-11-09 09:55 EST -------
(In reply to comment #1)
> What version of Jython?

Jython 2.5.2rc2


> What version of EMBOSS, and the phylipnew package?

EMBOSS 6.0.1
Phylip seems 3.68

> Do these tests pass *on the same machine* if run in normal (C) Python?

Yep. This is the same machine as the one doing integration testing in C-Python

> Alternately, do these four command line examples work when run by hand?

No. I've noticed that the example files do not exist! e.g. Phylip/opuntia.phy
does not exist. Indeed this should not work, I think


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Tue Nov  9 10:05:29 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 9 Nov 2010 10:05:29 -0500
Subject: [Biopython-dev] [Bug 3155] Some Phylip tools seem to fail on Jython
In-Reply-To: <bug-3155-42@http.bugzilla.open-bio.org/>
Message-ID: <201011091505.oA9F5SxD030383@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=3155


------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2010-11-09 10:05 EST -------
(In reply to comment #2)
> (In reply to comment #1)
> > What version of Jython?
> 
> Jython 2.5.2rc2

Can you easily update to Jython 2.5.2 (actual release)?

> > What version of EMBOSS, and the phylipnew package?
> 
> EMBOSS 6.0.1
> Phylip seems 3.68

Your EMBOSS is a bit old, but should be fine.

> > Do these tests pass *on the same machine* if run in normal (C) Python?
> 
> Yep. This is the same machine as the one doing integration testing in C-Python
> 

Good - that means we can rule out EMBOSS being too old.

> > Alternately, do these four command line examples work when run by hand?
> 
> No. I've noticed that the example files do not exist! e.g. Phylip/opuntia.phy
> does not exist. Indeed this should not work, I think
> 

The unit tests create Phylip/opuntia.phy at runtime, converted from
Clustalw/opuntia.aln -- I'd forgotten about that and it does make testing the
individual commands harder. The point here is to ensure that PHYLIP likes what
we write out as PHYLIP format.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Tue Nov  9 10:11:37 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 9 Nov 2010 10:11:37 -0500
Subject: [Biopython-dev] [Bug 3155] Some Phylip tools seem to fail on Jython
In-Reply-To: <bug-3155-42@http.bugzilla.open-bio.org/>
Message-ID: <201011091511.oA9FBbaK030580@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=3155


------- Comment #4 from tiagoantao at gmail.com  2010-11-09 10:11 EST -------
> Can you easily update to Jython 2.5.2 (actual release)?

rc2 is the most recent. I can do 2.5.*1*


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Tue Nov  9 10:33:39 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 9 Nov 2010 10:33:39 -0500
Subject: [Biopython-dev] [Bug 3155] Some Phylip tools seem to fail on Jython
In-Reply-To: <bug-3155-42@http.bugzilla.open-bio.org/>
Message-ID: <201011091533.oA9FXdSo031629@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=3155


------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk  2010-11-09 10:33 EST -------
(In reply to comment #4)
> > Can you easily update to Jython 2.5.2 (actual release)?
> 
> rc2 is the most recent. I can do 2.5.*1*

Sorry - my mistake. I have Jython 2.5.1 (final release).

I'll try to get EMBOSS phylipnew on this machine (useful anyway as a potential
buildbot slave).


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From biopython at maubp.freeserve.co.uk  Tue Nov  9 17:54:13 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 9 Nov 2010 22:54:13 +0000
Subject: [Biopython-dev] buildbot and setup.py
Message-ID: <AANLkTinhPa5VgCLzs3WS+ezyghL4JsD=jzb4Qwndw1vy@mail.gmail.com>

Hi all,

For the continuous integration server, it is important
to be able to run setup.py without it prompting the
user. There are (just?) two potential prompts at the
moment.

First, if running on Python 3, it asks the user to
confirm they have run 2to3 as per the README
file. This was done as a bit of a hack - perhaps
now that most of the Python code works on Py3
we can avoid this?

Second, if running without NumPy, it asks the user
if they really want to do this as it is best to install
NumPy to use all of Biopython.

For the purposes of the buildbot, I think we should
have at least one build-slave without NumPy. This
should then catch any regressions in the test suite.
Since Jython doesn't have NumPy (and so we
don't prompt about it) then maybe that would
double in this role for the test matrix ;)

Right now Tiago has solved the first prompt (about
2to3) by piping a "y\n" into stdin. I guess piping
two would solve the case of no NumPy on Py3 ;)
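
For anyone wanting to reproduce the piping trick from a script rather
than a shell, something along these lines should work. It is only a
sketch: the child program below is a stand-in that reads two answers
from stdin, not the real setup.py, and run_with_answers is a made-up
helper name.

```python
import subprocess
import sys

# Stand-in for a script that asks two yes/no questions on stdin
# (hypothetical; the real setup.py prompts are worded differently).
CHILD = (
    "import sys\n"
    "a = sys.stdin.readline().strip()\n"
    "b = sys.stdin.readline().strip()\n"
    "print('answers:', a, b)\n"
)


def run_with_answers(cmd, answers):
    """Run cmd, feeding one line per answer to its stdin.

    Returns a (return code, captured stdout) tuple.
    """
    text = "".join(answer + "\n" for answer in answers)
    proc = subprocess.run(cmd, input=text, capture_output=True, text=True)
    return proc.returncode, proc.stdout


if __name__ == "__main__":
    code, out = run_with_answers([sys.executable, "-c", CHILD], ["y", "y"])
    print(code, out.strip())
```

The weak point of this approach is that the number and order of the
answers has to match the prompts exactly, which is one argument for a
proper command line switch instead.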

However, do we need an --auto or --force flag
to bypass these yes or no prompts in setup.py?
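
To make the question concrete, here is a rough sketch of how such a
switch could be handled. Nothing like this exists in setup.py at the
moment; the helper names pop_auto_flag and confirm are invented for
illustration, and the flag names are simply the two suggested above.

```python
AUTO_FLAGS = ("--auto", "--force")  # names taken from the question above


def pop_auto_flag(argv):
    """Return (auto, remaining), where remaining has the auto flags removed."""
    remaining = [arg for arg in argv if arg not in AUTO_FLAGS]
    return len(remaining) != len(argv), remaining


def confirm(question, auto, ask=input):
    """Return True without prompting if auto is set, otherwise ask the user.

    The ask argument exists only so the prompt can be replaced in tests.
    """
    if auto:
        return True
    reply = ask(question + " (y/N) ")
    return reply.strip().lower().startswith("y")
```

The idea would be to call pop_auto_flag(sys.argv) early on, assign the
remaining list back to sys.argv, and pass the resulting boolean to each
prompt. Stripping the flag first matters because, as far as I know,
distutils complains about global options it does not recognise.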

(Meanwhile I'm off to install NumPy under
Python 3 on my Linux box which will avoid
the issue for now)

Peter

From tiagoantao at gmail.com  Tue Nov  9 19:15:02 2010
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Wed, 10 Nov 2010 00:15:02 +0000
Subject: [Biopython-dev] Continuous integration server
In-Reply-To: <AANLkTikMrhZ+MryAg4emN=Q+RCOFG-gxtysA_g=edTsa@mail.gmail.com>
References: <AANLkTikMrhZ+MryAg4emN=Q+RCOFG-gxtysA_g=edTsa@mail.gmail.com>
Message-ID: <AANLkTim488HLxcxsw0yTvR7T6row3C3gvoHNFf4Ww_wT@mail.gmail.com>

2010/11/9 Peter <biopython at maubp.freeserve.co.uk>:
> One concern is if we have (say) 8 buildbot slaves, and a change on
> the trunk accidentally breaks a unit test (on all platforms), does that
> mean we'd get one email or eight?

I was wrong here. It is not possible to send only one email. I misread
the documentation.
But it is quite simple to extend the mail system (by code) to do this.
At least it seems simple: I will have a try at it tomorrow.

For now I am only sending automated emails to myself and Peter. If
anyone wants to be in the loop, please tell me.

As soon as the system is reliable I will send to biopython-dev.
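
The kind of extension I have in mind looks roughly like the following.
This is plain Python to show the logic only: it does not use the
buildbot API, the class and builder names are made up, and the question
of when to flush the collected messages (for example once every builder
has reported) is left open.

```python
from collections import defaultdict


class FailureDigest:
    """Collect build results and produce one message per breaking revision.

    Plain-Python illustration only; all names here are hypothetical.
    """

    def __init__(self):
        self.last_ok = {}                # builder name -> was last build OK?
        self.broken = defaultdict(list)  # revision -> builders it broke

    def record(self, builder, revision, ok):
        """Note a finished build; remember pass -> fail transitions only."""
        was_ok = self.last_ok.get(builder, True)
        if was_ok and not ok:
            self.broken[revision].append(builder)
        self.last_ok[builder] = ok

    def messages(self):
        """Return one summary line per revision that broke something."""
        return [
            "Revision %s broke %d builder(s): %s"
            % (rev, len(names), ", ".join(sorted(names)))
            for rev, names in sorted(self.broken.items())
        ]
```

With eight builders all failing on the same revision, messages() would
return a single line naming all eight, and a builder that keeps failing
on later revisions would not trigger anything new.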

From tiagoantao at gmail.com  Tue Nov  9 19:21:15 2010
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Wed, 10 Nov 2010 00:21:15 +0000
Subject: [Biopython-dev] Continuous integration server
In-Reply-To: <AANLkTimh3umr_D=bRchK0shWT0zwxo45EeaiMop-53Sz@mail.gmail.com>
References: <AANLkTikMrhZ+MryAg4emN=Q+RCOFG-gxtysA_g=edTsa@mail.gmail.com>
	<AANLkTikURQGajrVKhZFY=+j4cBHz7Yyn37A-ag_6DUcy@mail.gmail.com>
	<AANLkTimh3umr_D=bRchK0shWT0zwxo45EeaiMop-53Sz@mail.gmail.com>
Message-ID: <AANLkTim7_T=pO6s7UsOWfadmNz9ZYE6Yof-rkSROb=s-@mail.gmail.com>

2010/11/9 Peter <biopython at maubp.freeserve.co.uk>:
>> The web interface has RSS and atom.
>
> Yet another feed for me to track :)

In order to minimize the number of feed entries one can specify
constraints. A useful one is to report only failed builds, like this:
http://events.open-bio.org:8010/rss?failures_only=true
which shows only the entries that relate to failures.

Tiago

From eric.talevich at gmail.com  Tue Nov  9 22:04:38 2010
From: eric.talevich at gmail.com (Eric Talevich)
Date: Tue, 9 Nov 2010 22:04:38 -0500
Subject: [Biopython-dev] buildbot and setup.py
In-Reply-To: <AANLkTinhPa5VgCLzs3WS+ezyghL4JsD=jzb4Qwndw1vy@mail.gmail.com>
References: <AANLkTinhPa5VgCLzs3WS+ezyghL4JsD=jzb4Qwndw1vy@mail.gmail.com>
Message-ID: <AANLkTimNfS_B=6SNggrKTABEdcjwn7X2Js0jAeqOeS_q@mail.gmail.com>

On Tue, Nov 9, 2010 at 5:54 PM, Peter <biopython at maubp.freeserve.co.uk>wrote:

> Hi all,
>
> For the continuous integration server, it is important
> to be able to run setup.py without it prompting the
> user. There are (just?) two potential prompts at the
> moment.
>
> [...]
>
> However, do we need an --auto or --force flag
> to bypass these yes or no prompts in setup.py?
>

I'd find a flag like that convenient for running setup.py manually, too.

For reference: apt-get takes a "-y" option which assumes a "yes" answer to
all prompts, just like this.

-Eric

From biopython at maubp.freeserve.co.uk  Wed Nov 10 06:48:30 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 10 Nov 2010 11:48:30 +0000
Subject: [Biopython-dev] test_PopGen_GenePop_EasyController.py failure on
	Jython
Message-ID: <AANLkTimQ+XXcEDwrC6AR15OdvDtLV+CqaKUnBv0=+F0=@mail.gmail.com>

Hi Tiago,

From your buildbot log for Jython 2.5.2 (release candidate 2), and
also my Mac OS Jython 2.5.1 install, we have a PopGen failure:

======================================================================
FAIL: Test get alleles.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/tantao/test/slave/jython252lin/build/Tests/test_PopGen_GenePop_EasyController.py",
line 57, in test_get_alleles
    self.assertEqual(self.ctrl.get_alleles(0,"Locus3"), [3, 20])
AssertionError: [20, 3] != [3, 20]

Notice that by using the unittest assertEqual method we get to see the
values compared:
https://github.com/biopython/biopython/commit/06a719be51ecd207b781224d3f57bb5ebb07198a

Before the change the output was like this:

======================================================================
FAIL: Test get alleles.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_PopGen_GenePop_EasyController.py", line 57, in test_get_alleles
    assert self.ctrl.get_alleles(0,"Locus3") == [3, 20]
AssertionError


It is interesting that Jython is giving [20, 3] rather than [3, 20]. My
guess would be this is down to something python implementation
specific like the sort order of dictionaries or sets, in which case
the unittest needs to compare sorted lists -- or the get_alleles
method needs a sort?
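
To illustrate the first option, here is a minimal sketch of an
order-independent test. The get_alleles function below is a
hypothetical stand-in that builds its result from a set; I have not
checked whether the real method does this, it is just the simplest way
to get an implementation-dependent order.

```python
import unittest


def get_alleles(genotypes):
    """Hypothetical stand-in: collect the distinct alleles via a set.

    The iteration order of a set is an implementation detail, so the
    order of the returned list may differ between interpreters.
    """
    return list(set(genotypes))


class AlleleTest(unittest.TestCase):
    def test_get_alleles_any_order(self):
        # Sort before comparing so the test does not depend on set order.
        self.assertEqual(sorted(get_alleles([20, 3, 3, 20])), [3, 20])
```

Sorting inside the test keeps the library behaviour unchanged, whereas
sorting inside get_alleles would give callers a stable order as well.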

Peter

From tiagoantao at gmail.com  Wed Nov 10 08:05:59 2010
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Wed, 10 Nov 2010 13:05:59 +0000
Subject: [Biopython-dev] test_PopGen_GenePop_EasyController.py failure
	on Jython
In-Reply-To: <AANLkTimQ+XXcEDwrC6AR15OdvDtLV+CqaKUnBv0=+F0=@mail.gmail.com>
References: <AANLkTimQ+XXcEDwrC6AR15OdvDtLV+CqaKUnBv0=+F0=@mail.gmail.com>
Message-ID: <AANLkTikDVija_mNTs4vE+BFbndm9OpwA2+cYLFKvg=Yj@mail.gmail.com>

I know, this might be an issue with the jython version (being just a
release candidate). I am going to wait for results on 2.5.1 and
compare. Or I might just install it myself and see.

Is there any reason for the unittest framework to ignore OSErrors? I
am getting some OSErrors (just in jython 2.5.2) and they are being
ignored (but reported as warnings)...

Tiago

2010/11/10 Peter <biopython at maubp.freeserve.co.uk>:
> Hi Tiago,
>
> From your buildbot log for Jython 2.5.2 (release candidate 2), and
> also my Mac OS
> Jython 2.5.1 install, we have a PopGen failure:
>
> ======================================================================
> FAIL: Test get alleles.
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/home/tantao/test/slave/jython252lin/build/Tests/test_PopGen_GenePop_EasyController.py",
> line 57, in test_get_alleles
>     self.assertEqual(self.ctrl.get_alleles(0,"Locus3"), [3, 20])
> AssertionError: [20, 3] != [3, 20]
>
> Notice that by using the unittest assertEqual method we get to see the
> values compared:
> https://github.com/biopython/biopython/commit/06a719be51ecd207b781224d3f57bb5ebb07198a
>
> Before the change the output was like this:
>
> ======================================================================
> FAIL: Test get alleles.
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "test_PopGen_GenePop_EasyController.py", line 57, in test_get_alleles
>     assert self.ctrl.get_alleles(0,"Locus3") == [3, 20]
> AssertionError
>
>
> It is interesting that Jython is giving [20, 3] rather than [3, 20]. My
> guess would be this is down to something python implementation
> specific like the sort order of dictionaries or sets, in which case
> the unittest needs to compare sorted lists -- or the get_alleles
> method needs a sort?
>
> Peter
>


-- 
"If you want to get laid, go to college.  If you want an education, go
to the library." - Frank Zappa


From biopython at maubp.freeserve.co.uk  Wed Nov 10 08:15:16 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 10 Nov 2010 13:15:16 +0000
Subject: [Biopython-dev] test_PopGen_GenePop_EasyController.py failure
	on Jython
In-Reply-To: <AANLkTikDVija_mNTs4vE+BFbndm9OpwA2+cYLFKvg=Yj@mail.gmail.com>
References: <AANLkTimQ+XXcEDwrC6AR15OdvDtLV+CqaKUnBv0=+F0=@mail.gmail.com>
	<AANLkTikDVija_mNTs4vE+BFbndm9OpwA2+cYLFKvg=Yj@mail.gmail.com>
Message-ID: <AANLkTi=gkMchj-Fao8HtvPHSKdOhDKT-o7QQhZap2SkW@mail.gmail.com>

2010/11/10 Tiago Antão <tiagoantao at gmail.com>:
>
> I know, this might be an issue with the jython version (being just a
> release candidate). I am going to wait for results on 2.5.1 and
> compare. Or I might just install it myself and see.

I also see the same test_get_alleles failure on the Mac and on
Windows 32 using Jython 2.5.1, so it isn't a Jython 2.5.2 release
candidate specific issue.

> Is there any reason for the unittest framework to ignore OSErrors? I
> am getting some OSErrors (just in jython 2.5.2) and they are being
> ignored (but reported as warnings)...
>
> Tiago

I've just recently put Jython 2.5.1 on my Windows box, and
in addition to the test_get_alleles failure, I also see OSErrors
about being unable to delete files (but the F stats test still
passes). This seems to be a wider issue, affecting more than
just test_PopGen_GenePop_EasyController.py, but it does
seem to be OS specific (no problems deleting files in
Jython 2.5.1 on my Mac, I've not tried on Linux).

Peter


From biopython at maubp.freeserve.co.uk  Wed Nov 10 09:14:07 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 10 Nov 2010 14:14:07 +0000
Subject: [Biopython-dev] test_PopGen_SimCoal.py on Windows
Message-ID: <AANLkTinaO3D=tL1BLsOgHTPhS65X7aGQVXQ1G7wUk=Om@mail.gmail.com>

Hi Tiago

Is/was test_PopGen_SimCoal.py working for you on Windows?
I'm getting "Output directory not created!" under Python 2.6

I've also tried it under Jython 2.5.1 and had to tweak things to
find the executable, thus:
https://github.com/biopython/biopython/commit/95cba71f7286860fa9cd79843c47b075a2f530a6

Now both Jython 2.5.1 and Python 2.6 give the same error,
"Output directory not created!" (progress I suppose).

Peter

P.S. On the bright side, both the FDist2 and DFDist tests are
passing on Windows on Python 2.6 and Jython 2.5.1 now
(after a couple of little tweaks).

From tiagoantao at gmail.com  Wed Nov 10 09:35:31 2010
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Wed, 10 Nov 2010 14:35:31 +0000
Subject: [Biopython-dev] test_PopGen_GenePop_EasyController.py failure
	on Jython
In-Reply-To: <AANLkTi=gkMchj-Fao8HtvPHSKdOhDKT-o7QQhZap2SkW@mail.gmail.com>
References: <AANLkTimQ+XXcEDwrC6AR15OdvDtLV+CqaKUnBv0=+F0=@mail.gmail.com>
	<AANLkTikDVija_mNTs4vE+BFbndm9OpwA2+cYLFKvg=Yj@mail.gmail.com>
	<AANLkTi=gkMchj-Fao8HtvPHSKdOhDKT-o7QQhZap2SkW@mail.gmail.com>
Message-ID: <AANLkTi=phgYOTJWHAq1vMEYr9rNPbdG-eckJm=Asa4oH@mail.gmail.com>

2010/11/10 Peter <biopython at maubp.freeserve.co.uk>:
> I've just recently put Jython 2.5.1 on my Windows box, and
> in addition to the test_get_alleles failure, I also see OSErrors
> about being unable to delete files (but the F stats test still
> passes). This seems to be a wider issue, affecting more than
> just test_PopGen_GenePop_EasyController.py, but it does
> seem to be OS specific (no problems deleting files in
> Jython 2.5.1 on my Mac, I've not tried on Linux).

The OSError has the potential to be somewhat nasty (i.e. throughout
other Bio.* modules) as it is silent. There might be tests failing
that report OK.

Tiago

From tiagoantao at gmail.com  Wed Nov 10 09:42:18 2010
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Wed, 10 Nov 2010 14:42:18 +0000
Subject: [Biopython-dev] test_PopGen_SimCoal.py on Windows
In-Reply-To: <AANLkTinaO3D=tL1BLsOgHTPhS65X7aGQVXQ1G7wUk=Om@mail.gmail.com>
References: <AANLkTinaO3D=tL1BLsOgHTPhS65X7aGQVXQ1G7wUk=Om@mail.gmail.com>
Message-ID: <AANLkTikRZNemx=43O6t0M36YcZEKUM8W-Fqer=XAe6bf@mail.gmail.com>

2010/11/10 Peter <biopython at maubp.freeserve.co.uk>:
> Hi Tiago
>
> Is/was test_PopGen_SimCoal.py working for you on Windows?
> I'm getting "Output directory not created!" under Python 2.6

This code is used 99.99% on Jython (as the fdist/dfdist code and
genepop parser, BTW). I happen to test on Linux.
I will fire up my Windows machine and have a look, but I do not have it
at hand (this will have to wait a few hours, or a couple of days at
most).


> Now both Jython 2.5.1 and Python 2.6 give the same error,
> "Output directory not created!" (progress I suppose).

I cannot test this here, but I am 99% sure that the problem is the
executable name (case sensitive on Windows and Mac, maybe even on
Windows Jython?). If it is compiled with a capital S (seen happening)
it might be a problem.

> P.S. On the bright side, both the FDist2 and DFDist tests are
> passing on Windows on Python 2.6 and Jython 2.5.1 now
> (after a couple of little tweaks).

Were they failing on Jython? I do have a reasonable amount of users on
my applications (jython based)...


-- 
"If you want to get laid, go to college.  If you want an education, go
to the library." - Frank Zappa


From biopython at maubp.freeserve.co.uk  Wed Nov 10 10:13:27 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 10 Nov 2010 15:13:27 +0000
Subject: [Biopython-dev] test_PopGen_SimCoal.py on Windows
In-Reply-To: <AANLkTikRZNemx=43O6t0M36YcZEKUM8W-Fqer=XAe6bf@mail.gmail.com>
References: <AANLkTinaO3D=tL1BLsOgHTPhS65X7aGQVXQ1G7wUk=Om@mail.gmail.com>
	<AANLkTikRZNemx=43O6t0M36YcZEKUM8W-Fqer=XAe6bf@mail.gmail.com>
Message-ID: <AANLkTinBqkc2dEWZ82RBiP89TwwP5WfSge+=rH4-GUYH@mail.gmail.com>

2010/11/10 Tiago Antão <tiagoantao at gmail.com>:
>
> 2010/11/10 Peter <biopython at maubp.freeserve.co.uk>:
>> Hi Tiago
>>
>> Is/was test_PopGen_SimCoal.py working for you on Windows?
>> I'm getting "Output directory not created!" under Python 2.6
>
> This code is used 99.99% on Jython (as the fdist/dfdist code and
> genepop parser, BTW). I happen to test on Linux.
> I will fire up my Windows machine and have a look, but I do not have it
> at hand (this will have to wait a few hours, or a couple of days at
> most).
>
>
>> Now both Jython 2.5.1 and Python 2.6 give the same error,
>> "Output directory not created!" (progress I suppose).
>
> I cannot test this here, but I am 99% sure that the problem is the
> executable name (case sensitive on Windows and Mac, maybe even on
> Windows Jython?). If it is compiled with a capital S (seen happening)
> it might be a problem.

It could also be something with spaces in filenames, much
more common on Windows :(
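
As a general illustration of why spaces cause trouble (this is not a
description of the SimCoal wrapper code): when a command is built as a
single string, a path such as "Program Files" gets split into two
arguments unless it is quoted. Passing the program and its arguments as
a list avoids the problem. The helper names and file names below are
made up for the demonstration.

```python
import os
import subprocess
import sys
import tempfile


def run_tool(executable, *args):
    """Run a program given as separate arguments (no shell involved).

    Because no shell parses the command line, spaces inside the path or
    the arguments are passed through untouched.
    """
    proc = subprocess.run(
        [executable] + list(args), capture_output=True, text=True
    )
    return proc.returncode, proc.stdout


def demo():
    """Run a tiny script stored under a directory with a space in its name."""
    with tempfile.TemporaryDirectory() as tmp:
        folder = os.path.join(tmp, "Program Files")
        os.mkdir(folder)
        script = os.path.join(folder, "hello tool.py")
        with open(script, "w") as handle:
            handle.write("import sys\nprint(sys.argv[1])\n")
        # Both the script path and its argument contain a space.
        return run_tool(sys.executable, script, "out dir")
```

Calling demo() runs the script from the directory with the space in its
name and hands it "out dir" as one argument rather than two.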

>> P.S. On the bright side, both the FDist2 and DFDist tests are
>> passing on Windows on Python 2.6 and Jython 2.5.1 now
>> (after a couple of little tweaks).
>
> Were they failing on Jython? I do have a reasonable amount
> of users on my applications (jython based)...

I tweaked the executable checking in the unit tests, it now
looks for all four binaries required, and works on Windows
(both Python and Jython) and Mac (both Python and Jython).
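
For reference, a check of this kind can be written quite compactly on a
recent Python. This is a sketch of the idea rather than the code in the
unit tests: missing_binaries is a made-up name, shutil.which needs
Python 3.3 or later (so it would not run on Python 2.x or Jython 2.5),
and the list of names to look for is up to the caller.

```python
import os
import shutil
import sys


def missing_binaries(names, path=None):
    """Return the names that cannot be found as executables.

    If path is None the PATH environment variable is searched; on
    Windows shutil.which also tries the PATHEXT extensions.
    """
    return [name for name in names if shutil.which(name, path=path) is None]


if __name__ == "__main__":
    # Demonstration only: look for the running interpreter by name.
    here = os.path.dirname(sys.executable)
    name = os.path.basename(sys.executable)
    print(missing_binaries([name, "no-such-binary-xyz-12345"], path=here))
```

A test module could then skip itself when the returned list is not
empty, and mention the missing names in the skip message.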

Peter


From biopython at maubp.freeserve.co.uk  Wed Nov 10 12:35:37 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 10 Nov 2010 17:35:37 +0000
Subject: [Biopython-dev] test_PopGen_SimCoal.py on Windows
In-Reply-To: <AANLkTinBqkc2dEWZ82RBiP89TwwP5WfSge+=rH4-GUYH@mail.gmail.com>
References: <AANLkTinaO3D=tL1BLsOgHTPhS65X7aGQVXQ1G7wUk=Om@mail.gmail.com>
	<AANLkTikRZNemx=43O6t0M36YcZEKUM8W-Fqer=XAe6bf@mail.gmail.com>
	<AANLkTinBqkc2dEWZ82RBiP89TwwP5WfSge+=rH4-GUYH@mail.gmail.com>
Message-ID: <AANLkTi=tYbJqFgHdZrOgqMrcdPQnZOqoDpHoUf9-5HZO@mail.gmail.com>

2010/11/10 Peter <biopython at maubp.freeserve.co.uk>:
>>
>> I cannot test this here, but I am 99% sure that the problem is the
>> executable name (case sensitive on Windows and Mac, maybe even on
>> Windows Jython?). If it is compiled with a capital S (seen happening)
>> it might be a problem.
>
> It could also be something with spaces in filenames, much
> more common on Windows :(
>

Yep, that was it. Fixed:
https://github.com/biopython/biopython/commit/e24f1662b5e619d558fea17c11ddea12c3561e53

I've got my Windows box running as a buildslave now, so
fingers crossed it will all be green.

Peter

From lpritc at scri.ac.uk  Thu Nov 11 09:12:21 2010
From: lpritc at scri.ac.uk (Leighton Pritchard)
Date: Thu, 11 Nov 2010 14:12:21 +0000
Subject: [Biopython-dev] Bioinformatics position
Message-ID: <C901AA45.3F8C2%lpritc@scri.ac.uk>

We have a bioinformatics post available at SCRI, and would be grateful if
you could please bring it to the attention of any colleagues who may be
interested in applying.  It is advertised at
http://www.jobs.ac.uk/job/ABS904/bioinformatics/ and some details are
included below:

"""
Bioinformatics
Scottish Crop Research Institute - SCRI
SCRI is Scotland's leading Institute for research on plants and their
interactions with the environment, particularly in managed ecosystems. Our
mission is to conduct excellent research in plant and environmental
sciences. Our vision is to deliver innovative products, knowledge and
services that enrich the life of the community and address the public goods
of environmental sustainability, high quality and healthy food.

Post Reference SMB/1/10

Research in the Plant Pathology Programme at SCRI is founded on pathogen
genomics, and scientists in the Programme have a strong track record of
contributing to whole genome sequencing and genetic analysis of economically
important pests and pathogens. The successful candidate will collaborate
with other groups in the Programme working on plant-pathogen interactions
developing innovative approaches to understand disease processes. This post
provides an opportunity to influence biological research of direct impact to
agriculture.

The ideal candidate would be experienced in manipulating and curating large
biological datasets with a record of collaboration and integration with
biologists. The successful applicant is expected to have an interest in
plant-pathogen interactions and to develop their own research profile. The
candidate should have a PhD or equivalent in bioinformatics, biostatistics
or a related field.

Informal enquiries from: Leighton.Pritchard at scri.ac.uk
<mailto:Leighton.Pritchard at scri.ac.uk> or Lesley.Torrance at scri.ac.uk
<mailto:Lesley.Torrance at scri.ac.uk>

Salary Scale For All Posts:

*Band D/E, £26,610 - £37,534 (commensurate with experience)

*Appointments to Band F, £42,769 - £47,521 available for exceptional
candidates.

Candidates willing to apply for a research fellowship to further help
establish their own laboratory are encouraged to apply and will, if
successful, benefit from generous Institute support throughout the tenure of
their fellowship.

Further information on the above posts, including how to apply, is available
on the SCRI website at http://www.scri.ac.uk/careers/vacancies
<http://www.scri.ac.uk/careers/vacancies>

Closing date - Friday 19th November 2010.

The Institute is an equal opportunities employer.
"""

Many thanks,

L.

-- 
Dr Leighton Pritchard MRSC
D131, Plant Pathology Programme, SCRI
Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
e:lpritc at scri.ac.uk       w:http://www.scri.ac.uk/staff/leightonpritchard
gpg/pgp: 0xFEFC205C       tel:+44(0)1382 562731 x2405


______________________________________________________
SCRI, Invergowrie, Dundee, DD2 5DA.  
The Scottish Crop Research Institute is a charitable company limited by guarantee. 
Registered in Scotland No: SC 29367.
Recognised by the Inland Revenue as a Scottish Charity No: SC 006662.


DISCLAIMER:

This email is from the Scottish Crop Research Institute, but the views expressed by the sender are not necessarily the views of SCRI and its subsidiaries.  This email and any files transmitted with it are confidential to the intended recipient at the e-mail address to which it has been addressed.  It may not be disclosed or used by any other than that addressee.
If you are not the intended recipient you are requested to preserve this confidentiality and you must not use, disclose, copy, print or rely on this e-mail in any way. Please notify postmaster at scri.ac.uk quoting the name of the sender and delete the email from your system.

Although SCRI has taken reasonable precautions to ensure no viruses are present in this email, neither the Institute nor the sender accepts any responsibility for any viruses, and it is your responsibility to scan the email and the attachments (if any).
______________________________________________________


From biopython at maubp.freeserve.co.uk  Thu Nov 11 11:45:43 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 11 Nov 2010 16:45:43 +0000
Subject: [Biopython-dev] Uniprot XML parser on TrEmbl
In-Reply-To: <ef80b0313dade56171f9d119dbc2baea.squirrel@lipid.biocomp.unibo.it>
References: <AANLkTineNfa+eMqcUyN7+anQ4OQOyLnVYOT+gM5H_Qg3@mail.gmail.com>
	<AANLkTimcrZBsL_1re6wYn0qr2H3Z-0Tq3Wo7748Pifvz@mail.gmail.com>
	<3cb74578eeedb8825ef75202c909b843.squirrel@lipid.biocomp.unibo.it>
	<AANLkTikCzLALtfhydpM7n3=fC=0+WoSuMnuzFxhmwgvV@mail.gmail.com>
	<ef80b0313dade56171f9d119dbc2baea.squirrel@lipid.biocomp.unibo.it>
Message-ID: <AANLkTimx7OZvgqbWOtV9T33Zek6HODw8pWnOkEU3Wqwk@mail.gmail.com>

On Thu, Nov 11, 2010 at 4:08 PM, Andrea Pierleoni
<andrea at biocomp.unibo.it> wrote:
> I finally found the time, and the 62Gb needed to test the TrEmbl database
> in uniprot xml format.

Is that the size on disk of the XML file? 62GB is a lot.

> The analysis is currently running, but so far I've been able to parse 1
> million entries out of 12 million (it will run overnight...)
>
> I've had just one problem with the entry: Q2LEH1_9ROSI
> in the downloaded file, there are multiple organism name fields, one of
> which is empty:
>
> ...
>   <organism evidence="EI1">
>     <name type="scientific"></name>
>     <name type="common">Populus tomentosa x P. bolleana) x P. tomentosa
> var. truncat</name>
> ...
>
> this part of the file is reported differently on the uniprot server at:
> http://www.uniprot.org/uniprot/Q2LEH1.xml
>
> ...
>  <organism evidence="EI1">
>   <name type="scientific">(Populus tomentosa x P. bolleana) x P. tomentosa
> var. truncata</name>
> ...
>
> now, given also the missing opening parenthesis, I think there is an error
> in the downloaded XML file.

It sounds like it - have you told UniProt?

> I've attached a patch that should cope with this issue. I don't know if
> there are more "errors" in the xml file.
> the patch was made on the current version of biopython master branch on
> github and is valid for commit 9363c3cdc5f51805f247.
>
> Andrea

Checked in, thanks:
https://github.com/biopython/biopython/commit/38da3ff264fe180e903cda4c143a7aa9be3d431a

Peter


From andrea at biocomp.unibo.it  Thu Nov 11 11:08:58 2010
From: andrea at biocomp.unibo.it (Andrea Pierleoni)
Date: Thu, 11 Nov 2010 17:08:58 +0100 (CET)
Subject: [Biopython-dev] Uniprot XML parser on TrEmbl
In-Reply-To: <AANLkTikCzLALtfhydpM7n3=fC=0+WoSuMnuzFxhmwgvV@mail.gmail.com>
References: <AANLkTineNfa+eMqcUyN7+anQ4OQOyLnVYOT+gM5H_Qg3@mail.gmail.com>
	<AANLkTimcrZBsL_1re6wYn0qr2H3Z-0Tq3Wo7748Pifvz@mail.gmail.com>
	<3cb74578eeedb8825ef75202c909b843.squirrel@lipid.biocomp.unibo.it>
	<AANLkTikCzLALtfhydpM7n3=fC=0+WoSuMnuzFxhmwgvV@mail.gmail.com>
Message-ID: <ef80b0313dade56171f9d119dbc2baea.squirrel@lipid.biocomp.unibo.it>

I finally found the time, and the 62Gb needed to test the TrEmbl database
in uniprot xml format.
The analysis is currently running, but so far I've been able to parse 1
million entries out of 12 million (it will run overnight...)

I've had just one problem with the entry: Q2LEH1_9ROSI
in the downloaded file, there are multiple organism name fields, one of
which is empty:

...
  <organism evidence="EI1">
    <name type="scientific"></name>
    <name type="common">Populus tomentosa x P. bolleana) x P. tomentosa
var. truncat</name>
...

this part of the file is reported differently on the uniprot server at:
http://www.uniprot.org/uniprot/Q2LEH1.xml

...
 <organism evidence="EI1">
  <name type="scientific">(Populus tomentosa x P. bolleana) x P. tomentosa
var. truncata</name>
...

now, given also the missing opening parenthesis, I think there is an error
in the downloaded XML file.

I've attached a patch that should cope with this issue. I don't know if
there are more "errors" in the xml file.
the patch was made on the current version of biopython master branch on
github and is valid for commit  9363c3cdc5f51805f247.

Andrea
-------------- next part --------------
A non-text attachment was scrubbed...
Name: UniprotIO.patch
Type: /
Size: 610 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/biopython-dev/attachments/20101111/3f9a10ae/attachment.bin>

From andrea at biocomp.unibo.it  Thu Nov 11 12:15:08 2010
From: andrea at biocomp.unibo.it (Andrea Pierleoni)
Date: Thu, 11 Nov 2010 18:15:08 +0100 (CET)
Subject: [Biopython-dev] Uniprot XML parser on TrEmbl
In-Reply-To: <AANLkTimx7OZvgqbWOtV9T33Zek6HODw8pWnOkEU3Wqwk@mail.gmail.com>
References: <AANLkTineNfa+eMqcUyN7+anQ4OQOyLnVYOT+gM5H_Qg3@mail.gmail.com>
	<AANLkTimcrZBsL_1re6wYn0qr2H3Z-0Tq3Wo7748Pifvz@mail.gmail.com>
	<3cb74578eeedb8825ef75202c909b843.squirrel@lipid.biocomp.unibo.it>
	<AANLkTikCzLALtfhydpM7n3=fC=0+WoSuMnuzFxhmwgvV@mail.gmail.com>
	<ef80b0313dade56171f9d119dbc2baea.squirrel@lipid.biocomp.unibo.it>
	<AANLkTimx7OZvgqbWOtV9T33Zek6HODw8pWnOkEU3Wqwk@mail.gmail.com>
Message-ID: <cf0f600f7252fc960d7f3ac1a5c720c2.squirrel@lipid.biocomp.unibo.it>


>
> Is that the size on disk of the XML file? 62GB is a lot.

yes, my macbook is getting very hot...

> It sounds like it - have you told UniProt?

I've notified them, let's see what they say...

Anyhow the parser works. I just don't know if we should have an
internet browser-like approach interpreting errors, or just be
consistent and raise an error if there is a format error.

in this case an empty organism name is an error.
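
The two options (tolerate the error like a browser, or raise) can be
sketched roughly as below. This is only an illustration: the helper name
pick_organism_name and its strict flag are hypothetical and are not part
of UniprotIO.

```python
def pick_organism_name(names, strict=False):
    """Choose an organism name from (type, text) pairs.

    Hypothetical helper for illustration; this is not the UniprotIO code.
    """
    for name_type, text in names:
        if name_type == "scientific":
            if text.strip():
                return text.strip()
            if strict:
                # Consistent behaviour: a format error is reported at once.
                raise ValueError("Empty scientific organism name")
    # Browser-like behaviour: fall back to the first non-empty name.
    for name_type, text in names:
        if text.strip():
            return text.strip()
    if strict:
        raise ValueError("No organism name found")
    return ""


# The problem entry, as it appeared in the downloaded file.
names = [
    ("scientific", ""),
    ("common", "Populus tomentosa x P. bolleana) x P. tomentosa var. truncat"),
]
lenient_name = pick_organism_name(names)
```

With strict=True the same input raises ValueError instead of silently
using the common name.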


From biopython at maubp.freeserve.co.uk  Thu Nov 11 14:16:57 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 11 Nov 2010 19:16:57 +0000
Subject: [Biopython-dev] test_PopGen_GenePop_EasyController.py failure
	on Jython
In-Reply-To: <AANLkTi=gkMchj-Fao8HtvPHSKdOhDKT-o7QQhZap2SkW@mail.gmail.com>
References: <AANLkTimQ+XXcEDwrC6AR15OdvDtLV+CqaKUnBv0=+F0=@mail.gmail.com>
	<AANLkTikDVija_mNTs4vE+BFbndm9OpwA2+cYLFKvg=Yj@mail.gmail.com>
	<AANLkTi=gkMchj-Fao8HtvPHSKdOhDKT-o7QQhZap2SkW@mail.gmail.com>
Message-ID: <AANLkTimmyGz_hx5PtuiwcDq39eW=VfV=7u+Gas92jRih@mail.gmail.com>

2010/11/10 Peter <biopython at maubp.freeserve.co.uk>:
> 2010/11/10 Tiago Antão <tiagoantao at gmail.com>:
>>
>> I know, this might be an issue with the jython version (being just a
>> release candidate). I am going to wait for results on 2.5.1 and
>> compare. Or I might just install it myself and see.
>
> I also see the same test_get_alleles failure on the Mac and on
> Windows 32 using Jython 2.5.1, so it isn't a Jython 2.5.2 release
> candidate specific issue.

Yes, the order just came from the order of a dict's keys - which
is Python implementation dependent. Quick fix committed:

https://github.com/biopython/biopython/commit/2aa604e54df02804219e092141bb32728b021a64

If you actually care about the order, then perhaps add a
sorted(...) to the get_alleles method itself instead?
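
As a minimal sketch of that suggestion (the data and the stand-alone
get_alleles function here are made up for illustration and are not the
EasyController code), sorting the keys makes the result independent of
the Python implementation:

```python
# Hypothetical allele counts keyed by allele number; not from the test suite.
allele_counts = {3: 12, 1: 7, 20: 2}


def get_alleles(counts):
    """Return the allele identifiers in a stable, sorted order."""
    # Before Python 3.7 dict key order was implementation dependent,
    # so sort explicitly rather than relying on it.
    return sorted(counts.keys())


alleles = get_alleles(allele_counts)
```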

Peter


From biopython at maubp.freeserve.co.uk  Thu Nov 11 15:19:05 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 11 Nov 2010 20:19:05 +0000
Subject: [Biopython-dev] Jython on Windows: OSError deleting files
Message-ID: <AANLkTimO-YUNPy6U7J-QAYxW1a-OZcG0rYBDzZQFHAHy@mail.gmail.com>

Hi all,

I recently installed Jython 2.5.1 on Windows XP (32 bit) for
use as a build slave. This showed up some new bugs, in
particular several problems with trying to delete temp
files triggering an OSError.

It turns out this can be triggered by trying to delete a
file while we still have a handle open on it. This is a
Windows limitation, but we don't see it on normal
Python because there the garbage collector closes
handles promptly when they go out of scope. The Java
garbage collector doesn't do that. See also:

http://web.archiveorange.com/archive/v/8tc1Z6ysA03SXedms7TA

In particular, I am aware that if given a filename the
SeqIO and AlignIO read and parse functions did not
explicitly close the handle they open. I was intending
to address this with a with statement in Python 2.5+,
but it can be solved in Python 2.4 as well. I have
started to address this, e.g.
https://github.com/biopython/biopython/commit/0fb039b745b0b2ddacf2a6c9ee8afcdb56018f3c
https://github.com/biopython/biopython/commit/936ea5f348cc1feea8556d263761e77ce960217e

Assuming it will be easier to fix on Python 2.5+, it
might be pragmatic to ignore the issue in the short
term since it only seems to affect Jython on Windows.
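
For reference, a Python 2.4 compatible way to make sure a handle is
closed is try/finally. The sketch below uses a made-up helper
(read_first_line) and a temporary file rather than the real SeqIO code:

```python
import os
import tempfile


def read_first_line(filename):
    """Open a file by name, read one line, and always close the handle."""
    handle = open(filename)
    try:
        return handle.readline().rstrip("\n")
    finally:
        # Runs even if readline() raises, so no handle is left open.
        handle.close()


# Create a small temporary file to work on.
fd, path = tempfile.mkstemp(suffix=".fasta")
os.close(fd)  # mkstemp returns an open descriptor; close it first
out = open(path, "w")
try:
    out.write(">example\nACGT\n")
finally:
    out.close()

first = read_first_line(path)
# No handle is still open on the file, so the delete should not hit the
# open-handle problem described above.
os.remove(path)
```

On Python 2.5+ the try/finally pair could be replaced by a with
statement, which closes the handle in the same way.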

Peter

From rjalves at igc.gulbenkian.pt  Thu Nov 11 17:06:06 2010
From: rjalves at igc.gulbenkian.pt (Renato Alves)
Date: Thu, 11 Nov 2010 22:06:06 +0000
Subject: [Biopython-dev] Uniprot parsers
Message-ID: <4CDC68CE.9070401@igc.gulbenkian.pt>

Hi everyone,

With the arrival of the Uniprot XML parser, is the swiss format still
going to be maintained?

I just clashed with a 'swiss' format parsing problem present in the
1.55b release (and previous releases). Seems like the format might have
changed.

One random case is [1] where all of the 2nd and following IDs are
ignored by the parser. In Ensembl, for instance, the parser only
collects the ENST (the 1st) but not the ENSP (2nd) and ENSG (3rd)
identifiers.

Is this a known issue?

Regards,
Renato

[1] http://www.uniprot.org/uniprot/P31946.txt


From biopython at maubp.freeserve.co.uk  Thu Nov 11 17:26:22 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 11 Nov 2010 22:26:22 +0000
Subject: [Biopython-dev] Uniprot parsers
In-Reply-To: <4CDC68CE.9070401@igc.gulbenkian.pt>
References: <4CDC68CE.9070401@igc.gulbenkian.pt>
Message-ID: <AANLkTikNd0FuWn8_QspaRrmGe_ahLxbJ6=Hkt+1+GOfi@mail.gmail.com>

On Thu, Nov 11, 2010 at 10:06 PM, Renato Alves
<rjalves at igc.gulbenkian.pt> wrote:
> Hi everyone,
>
> With the arrival of the Uniprot XML parser, is the swiss format still
> going to be maintained?

Definitely yes in the short term, for one thing the swiss files are
smaller and much faster to parse. I suspect UniProt themselves
may want to retire the swiss text format at some point, but moving
every user over to XML will take some time.

> I just clashed with a 'swiss' format parsing problem present in the
> 1.55b release (and previous releases). Seems like the format might have
> changed.
>
> One random case is [1] where all of the 2nd and following IDs are
> ignored by the parser. In Ensembl, for instance, the parser only
> collects the ENST (the 1st) but not the ENSP (2nd) and ENSG (3rd)
> identifiers.
>
> Is this a known issue?
>

No - could you file a bug on this with a short example to explain
what result you get, and what you want?

Thanks,

Peter

From bugzilla-daemon at portal.open-bio.org  Thu Nov 11 18:09:04 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 11 Nov 2010 18:09:04 -0500
Subject: [Biopython-dev] [Bug 3156] New: UniProt XML and SwissProt parsers
	silently fail to parse all of database references
Message-ID: <bug-3156-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=3156

           Summary: UniProt XML and SwissProt parsers silently fail to parse
                    all of database references
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: rjalves at igc.gulbenkian.pt


Example code:

from Bio import SeqIO, ExPASy
entry = SeqIO.read(ExPASy.get_sprot_raw('P31946'), 'swiss')

If you then inspect entry.dbxrefs, you can see that it includes:

['Ensembl:ENST00000353703', 'Ensembl:ENST00000372839']

but not
['Ensembl:ENSP00000300161', 'Ensembl:ENSG00000166913',
'Ensembl:ENSP00000361930', 'Ensembl:ENSG00000166913']

which are present in the original file as:
DR   Ensembl; ENST00000353703; ENSP00000300161; ENSG00000166913.
DR   Ensembl; ENST00000372839; ENSP00000361930; ENSG00000166913.


The same happens with the XML format and the new uniprot-xml parser where the
original file contains:

<dbReference type="Ensembl" id="ENST00000353703" key="75">
<property type="protein sequence ID" value="ENSP00000300161" />
<property type="gene ID" value="ENSG00000166913" />
</dbReference>


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From rjalves at igc.gulbenkian.pt  Thu Nov 11 17:32:41 2010
From: rjalves at igc.gulbenkian.pt (Renato Alves)
Date: Thu, 11 Nov 2010 22:32:41 +0000
Subject: [Biopython-dev] Uniprot parsers
In-Reply-To: <4CDC68CE.9070401@igc.gulbenkian.pt>
References: <4CDC68CE.9070401@igc.gulbenkian.pt>
Message-ID: <4CDC6F09.9090506@igc.gulbenkian.pt>

Actually I just tested the Uniprot-XML parser and it seems to suffer
from the same issue...

It ignores the following XML "properties":

<dbReference type="Ensembl" id="ENST00000353703" key="75">
<property type="protein sequence ID" value="ENSP00000300161" />
<property type="gene ID" value="ENSG00000166913" />
</dbReference>
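
To show what information is there, here is a rough sketch using the
standard library ElementTree on just this fragment. It is not the
UniprotIO code, and a real UniProt file declares an XML namespace, so
the tag names would need the namespace prefix there:

```python
import xml.etree.ElementTree as ET

# The fragment quoted above, without the namespace a full UniProt file has.
snippet = """<dbReference type="Ensembl" id="ENST00000353703" key="75">
<property type="protein sequence ID" value="ENSP00000300161" />
<property type="gene ID" value="ENSG00000166913" />
</dbReference>"""

element = ET.fromstring(snippet)

# The primary identifier lives in the attributes of dbReference itself.
primary = "%s:%s" % (element.get("type"), element.get("id"))

# The secondary identifiers are child "property" elements.
secondary = {}
for prop in element.findall("property"):
    secondary[prop.get("type")] = prop.get("value")
```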


Quoting Renato Alves on 11/11/2010 10:06 PM:
> Hi everyone,
> 
> With the arrival of the Uniprot XML parser, is the swiss format still
> going to be maintained?
> 
> I just clashed with a 'swiss' format parsing problem present in the
> 1.55b release (and previous releases). Seems like the format might have
> changed.
> 
> One random case is [1] where all of the 2nd and following IDs are
> ignored by the parser. In Ensembl, for instance, the parser only
> collects the ENST (the 1st) but not the ENSP (2nd) and ENSG (3rd)
> identifiers.
> 
> Is this a known issue?
> 
> Regards,
> Renato
> 
> [1] http://www.uniprot.org/uniprot/P31946.txt


From bugzilla-daemon at portal.open-bio.org  Thu Nov 11 18:50:46 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 11 Nov 2010 18:50:46 -0500
Subject: [Biopython-dev] [Bug 3156] UniProt XML and SwissProt parsers
	silently fail to parse all of database references
In-Reply-To: <bug-3156-42@http.bugzilla.open-bio.org/>
Message-ID: <201011112350.oABNokG9031101@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=3156


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2010-11-11 18:50 EST -------
That was by design, dbxrefs is a flat list and for consistency with other
formats we have only stored the primary identifier.

Would you regard this as two primary cross references, or six?

DR   Ensembl; ENST00000353703; ENSP00000300161; ENSG00000166913.
DR   Ensembl; ENST00000372839; ENSP00000361930; ENSG00000166913.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org  Thu Nov 11 18:59:20 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 11 Nov 2010 18:59:20 -0500
Subject: [Biopython-dev] [Bug 3156] UniProt XML and SwissProt parsers
	silently fail to parse all of database references
In-Reply-To: <bug-3156-42@http.bugzilla.open-bio.org/>
Message-ID: <201011112359.oABNxKcn031294@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=3156


------- Comment #2 from rjalves at igc.gulbenkian.pt  2010-11-11 18:59 EST -------
Five primary references since ENSG00000166913 is repeated twice (once per
line).

More precisely,
ENSG = Ensembl Gene
ENST = Ensembl Transcript
ENSP = Ensembl Protein
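
The count of five can be checked with a small sketch that splits the two
DR lines by hand. This is a simplified illustration and not the SwissProt
parser; it ignores things like "-" placeholders that other DR lines may
contain:

```python
dr_lines = [
    "DR   Ensembl; ENST00000353703; ENSP00000300161; ENSG00000166913.",
    "DR   Ensembl; ENST00000372839; ENSP00000361930; ENSG00000166913.",
]

unique_refs = []
for line in dr_lines:
    # Drop the "DR   " prefix and the trailing full stop, then split on ";".
    fields = [f.strip() for f in line[5:].rstrip().rstrip(".").split(";")]
    database, identifiers = fields[0], fields[1:]
    for identifier in identifiers:
        ref = "%s:%s" % (database, identifier)
        if ref not in unique_refs:  # the gene ID appears on both lines
            unique_refs.append(ref)

# Group by the four-letter Ensembl prefix (ENST, ENSP, ENSG).
by_kind = {}
for ref in unique_refs:
    kind = ref.split(":")[1][:4]
    by_kind.setdefault(kind, []).append(ref)
```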


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From andrea at biocomp.unibo.it  Thu Nov 11 20:02:14 2010
From: andrea at biocomp.unibo.it (Andrea Pierleoni)
Date: Fri, 12 Nov 2010 02:02:14 +0100 (CET)
Subject: [Biopython-dev] [Bug 3156] UniProt XML and SwissProt parsers
 silently fail to parse all of database references
In-Reply-To: <mailman.2563.1289519960.2958.biopython-dev@lists.open-bio.org>
References: <mailman.2563.1289519960.2958.biopython-dev@lists.open-bio.org>
Message-ID: <7c21462addfa62e09fd6c42135cc7d76.squirrel@lipid.biocomp.unibo.it>

It was by design in the XML parser as well;
there is a comment at line 343 of UniprotIO.py addressing
this issue.
To parse this type of data, an adapter would have to be written for each
DB type, since each DB has different data and can have a different structure.
Also note that the Ensembl reference fields have recently undergone a
change of format in the XML file:

http://www.uniprot.org/docs/xml_news.htm

this happens in release 2010_10.

Andrea


From andrea at biocomp.unibo.it  Fri Nov 12 05:24:07 2010
From: andrea at biocomp.unibo.it (Andrea Pierleoni)
Date: Fri, 12 Nov 2010 11:24:07 +0100 (CET)
Subject: [Biopython-dev] Uniprot XML parser on TrEmbl
In-Reply-To: <cf0f600f7252fc960d7f3ac1a5c720c2.squirrel@lipid.biocomp.unibo.it>
References: <AANLkTineNfa+eMqcUyN7+anQ4OQOyLnVYOT+gM5H_Qg3@mail.gmail.com>
	<AANLkTimcrZBsL_1re6wYn0qr2H3Z-0Tq3Wo7748Pifvz@mail.gmail.com>
	<3cb74578eeedb8825ef75202c909b843.squirrel@lipid.biocomp.unibo.it>
	<AANLkTikCzLALtfhydpM7n3=fC=0+WoSuMnuzFxhmwgvV@mail.gmail.com>
	<ef80b0313dade56171f9d119dbc2baea.squirrel@lipid.biocomp.unibo.it>
	<AANLkTimx7OZvgqbWOtV9T33Zek6HODw8pWnOkEU3Wqwk@mail.gmail.com>
	<cf0f600f7252fc960d7f3ac1a5c720c2.squirrel@lipid.biocomp.unibo.it>
Message-ID: <430ea31975638cdd972a3aa01757fa03.squirrel@lipid.biocomp.unibo.it>

With the submitted patch the parser was able to correctly parse 12,347,303
entries in the 62Gb XML file in 2h 13m.
It looks like reasonable performance to me, since you are going to spend
more time downloading the 8Gb gzipped file and decompressing it.

Andrea


From biopython at maubp.freeserve.co.uk  Fri Nov 12 05:29:51 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 12 Nov 2010 10:29:51 +0000
Subject: [Biopython-dev] Uniprot XML parser on TrEmbl
In-Reply-To: <430ea31975638cdd972a3aa01757fa03.squirrel@lipid.biocomp.unibo.it>
References: <AANLkTineNfa+eMqcUyN7+anQ4OQOyLnVYOT+gM5H_Qg3@mail.gmail.com>
	<AANLkTimcrZBsL_1re6wYn0qr2H3Z-0Tq3Wo7748Pifvz@mail.gmail.com>
	<3cb74578eeedb8825ef75202c909b843.squirrel@lipid.biocomp.unibo.it>
	<AANLkTikCzLALtfhydpM7n3=fC=0+WoSuMnuzFxhmwgvV@mail.gmail.com>
	<ef80b0313dade56171f9d119dbc2baea.squirrel@lipid.biocomp.unibo.it>
	<AANLkTimx7OZvgqbWOtV9T33Zek6HODw8pWnOkEU3Wqwk@mail.gmail.com>
	<cf0f600f7252fc960d7f3ac1a5c720c2.squirrel@lipid.biocomp.unibo.it>
	<430ea31975638cdd972a3aa01757fa03.squirrel@lipid.biocomp.unibo.it>
Message-ID: <AANLkTimPmqPDdiLAANuGepLWyyu74p=wGu2i-6gvb7LX@mail.gmail.com>

On Fri, Nov 12, 2010 at 10:24 AM, Andrea Pierleoni wrote:
> With the submitted patch the parser was able to correctly parse 12,347,303
> entries in the 62Gb XML file in 2h 13m.

That's good - but I thought the patch broke the unit test so I reverted it
last night. I'll double check this.

> it looks like a reasonable performance to me, since you are going to spend
> more time in downloading the 8Gb gzipped file and decompressing it.

On the other hand, you only download it once, and will probably only
decompress it once (although you can parse gzipped files from within
python if you want to), but you will parse it many times.
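
Reading a gzipped file directly could look something like this sketch,
which uses only the standard library gzip module on a tiny stand-in file
(the file contents are made up; a real TrEMBL dump would already be on
disk):

```python
import gzip
import os
import tempfile

fd, path = tempfile.mkstemp(suffix=".xml.gz")
os.close(fd)

# Write a tiny compressed stand-in file.
out = gzip.open(path, "wb")
try:
    out.write(b"<entry>one</entry>\n<entry>two</entry>\n")
finally:
    out.close()

# Stream the compressed file line by line without unpacking it to disk.
handle = gzip.open(path, "rb")
try:
    entry_count = sum(1 for line in handle if b"<entry>" in line)
finally:
    handle.close()
os.remove(path)
```

The handle from gzip.open is file-like, so in principle it could be given
to a parser in place of a plain file handle.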

My point is it probably could be made faster (if anyone wanted to spend
the time), but it is fast enough already to be useful, and worth having
in Biopython :)

Peter

From andrea at biocomp.unibo.it  Fri Nov 12 06:05:43 2010
From: andrea at biocomp.unibo.it (Andrea Pierleoni)
Date: Fri, 12 Nov 2010 12:05:43 +0100 (CET)
Subject: [Biopython-dev] Uniprot XML parser on TrEmbl
In-Reply-To: <AANLkTimPmqPDdiLAANuGepLWyyu74p=wGu2i-6gvb7LX@mail.gmail.com>
References: <AANLkTineNfa+eMqcUyN7+anQ4OQOyLnVYOT+gM5H_Qg3@mail.gmail.com>
	<AANLkTimcrZBsL_1re6wYn0qr2H3Z-0Tq3Wo7748Pifvz@mail.gmail.com>
	<3cb74578eeedb8825ef75202c909b843.squirrel@lipid.biocomp.unibo.it>
	<AANLkTikCzLALtfhydpM7n3=fC=0+WoSuMnuzFxhmwgvV@mail.gmail.com>
	<ef80b0313dade56171f9d119dbc2baea.squirrel@lipid.biocomp.unibo.it>
	<AANLkTimx7OZvgqbWOtV9T33Zek6HODw8pWnOkEU3Wqwk@mail.gmail.com>
	<cf0f600f7252fc960d7f3ac1a5c720c2.squirrel@lipid.biocomp.unibo.it>
	<430ea31975638cdd972a3aa01757fa03.squirrel@lipid.biocomp.unibo.it>
	<AANLkTimPmqPDdiLAANuGepLWyyu74p=wGu2i-6gvb7LX@mail.gmail.com>
Message-ID: <6c12e6fda6bab033738ed36d74d2a24a.squirrel@lipid.biocomp.unibo.it>


> That's good - but I thought the patch broke the unit test so I reverted it
> last night. I'll double check this.
>

yes I've seen it in github, can you fix it?


> On the other hand, you only download it once, and will probably only
> decompress it once (although you can parse gzipped files from within
> python if you want to), but you will parse it many times.
>

well, if you're looking at performance, you're not scanning a 62Gb file each
time you search for an entry, you're going to index it. Then of course it
depends on what you are doing... but, given the monthly release, maybe
you're downloading and decompressing (or parsing a compressed file) once a
month.
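
By indexing I mean something like the sketch below: scan the file once,
remember the byte offset of each entry, then seek straight to it later.
This is a toy with one entry per line and a made-up build_offset_index
helper, not an existing Biopython function:

```python
import io
import re


def build_offset_index(handle):
    """Map each accession to the byte offset where its line starts."""
    index = {}
    while True:
        offset = handle.tell()
        line = handle.readline()
        if not line:
            break
        match = re.search(br"<accession>(.*?)</accession>", line)
        if match:
            index[match.group(1).decode("ascii")] = offset
    return index


# In-memory stand-in for a large file opened in binary mode.
handle = io.BytesIO(
    b"<entry><accession>P31946</accession></entry>\n"
    b"<entry><accession>Q2LEH1</accession></entry>\n"
)
index = build_offset_index(handle)

# Later lookups jump to the stored offset instead of rescanning the file.
handle.seek(index["Q2LEH1"])
record_line = handle.readline()
```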

> My point is it probably could be made faster (if anyone wanted to spend
> the time), but it is fast enough already to be useful, and worth having
> in Biopython :)

Yes, I hope it can be made faster, but I have no idea how, since
the process is very straightforward. I have not profiled the
parser, so I cannot exclude some bottleneck.
The only obvious speed-up would be using the multiprocessing library on
multi-CPU systems, but I've never seen it used in Biopython.
It should be really easy to implement, and maybe we can think about it
after Python 2.4 support is dropped. As far as I know, multiprocessing is
included in Python 2.6 and available for Python 2.5.

On the other hand, Biopython has the fastest UniProt XML parser among Bio*
projects and (to my knowledge) the fastest public parser on the planet ;)
I bet the UniProt guys have their own parser...

Andrea


From biopython at maubp.freeserve.co.uk  Fri Nov 12 07:00:42 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 12 Nov 2010 12:00:42 +0000
Subject: [Biopython-dev] Uniprot XML parser on TrEmbl
In-Reply-To: <6c12e6fda6bab033738ed36d74d2a24a.squirrel@lipid.biocomp.unibo.it>
References: <AANLkTineNfa+eMqcUyN7+anQ4OQOyLnVYOT+gM5H_Qg3@mail.gmail.com>
	<AANLkTimcrZBsL_1re6wYn0qr2H3Z-0Tq3Wo7748Pifvz@mail.gmail.com>
	<3cb74578eeedb8825ef75202c909b843.squirrel@lipid.biocomp.unibo.it>
	<AANLkTikCzLALtfhydpM7n3=fC=0+WoSuMnuzFxhmwgvV@mail.gmail.com>
	<ef80b0313dade56171f9d119dbc2baea.squirrel@lipid.biocomp.unibo.it>
	<AANLkTimx7OZvgqbWOtV9T33Zek6HODw8pWnOkEU3Wqwk@mail.gmail.com>
	<cf0f600f7252fc960d7f3ac1a5c720c2.squirrel@lipid.biocomp.unibo.it>
	<430ea31975638cdd972a3aa01757fa03.squirrel@lipid.biocomp.unibo.it>
	<AANLkTimPmqPDdiLAANuGepLWyyu74p=wGu2i-6gvb7LX@mail.gmail.com>
	<6c12e6fda6bab033738ed36d74d2a24a.squirrel@lipid.biocomp.unibo.it>
Message-ID: <AANLkTi=zwkXWAUfkDaEAfU+FxrNAqX5KLUr0a8uOGZUY@mail.gmail.com>

On Fri, Nov 12, 2010 at 11:05 AM, Andrea Pierleoni
<andrea at biocomp.unibo.it> wrote:
>
>> That's good - but I thought the patch broke the unit test so I reverted it
>> last night. I'll double check this.
>>
>
> yes I've seen it in github, can you fix it?
>

Probably. I'll make time to look at it before the Biopython 1.56 release
(which is unlikely to happen this week, delayed by the identification of
some problems running under Jython on Windows).

>> On the other hand, you only download it once, and will probably only
>> decompress it once (although you can parse gzipped files from within
>> python if you want to), but you will parse it many times.
>>
>
> well, if you're looking at performance, you're not scanning a 62Gb file
> each time you search for an entry, you're going to index it. Then of
> course it depends on what you are doing... but, given the monthly
> release, maybe you're downloading and decompressing (or parsing
> a compressed file) once a month.

Yeah, it depends.

>> My point is it probably could be made faster (if anyone wanted to spend
>> the time), but it is fast enough already to be useful, and worth having
>> in Biopython :)
>
> Yes, I hope it can be made faster, but I have no idea about this, since
> the process is very straightforward. I did not make any profiling of the
> parser, so I cannot exclude some bottleneck.

That would be worthwhile at some point.

> the only obvious speed up would be using the multiprocessing library in
> multi-cpu system, but I've never seen it used in biopython.

We haven't been able to due to the Python 2.4 requirement, but
I know of people using Biopython and multiprocessing together.

> It should be really easy to implement, and maybe we can think about
> it after python 2.4 support is dropped. as far as i know, multiprocessing
> is included in python 2.6 and available in python 2.5.

Personally I'd try profiling the current single threaded code before
going to multiprocessing.
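
A minimal profiling run with the standard library cProfile and pstats
modules might look like the sketch below. The parse_entries function is
only a stand-in for a parser loop, not the real UniprotIO code:

```python
import cProfile
import io
import pstats


def parse_entries(lines):
    """Stand-in for a parser loop; the real parser is not shown here."""
    return [line.strip().upper() for line in lines if line.strip()]


profiler = cProfile.Profile()
profiler.enable()
result = parse_entries(["a\n", "\n", "b\n"] * 1000)
profiler.disable()

# Collect the five most expensive calls by cumulative time into a string.
report = io.StringIO()
stats = pstats.Stats(profiler, stream=report)
stats.sort_stats("cumulative").print_stats(5)
summary = report.getvalue()
```

Printing summary then shows where the time goes, which should say
whether there is a single hot spot worth fixing first.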

> On the other hand, Biopython has the fastest uniprot XML parser
> among Bio* projects and (to my knowledge) the fastest public
> parser on the planet ;) I bet Uniprot guys have their parser...

Which of the other Bio* projects have a Uniprot XML parser?
(Or was that intended as a joke?)

Peter


From p.j.a.cock at googlemail.com  Fri Nov 12 12:18:52 2010
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 12 Nov 2010 17:18:52 +0000
Subject: [Biopython-dev] test_PopGen_GenePop_EasyController.py failure
	on Jython
In-Reply-To: <AANLkTimmyGz_hx5PtuiwcDq39eW=VfV=7u+Gas92jRih@mail.gmail.com>
References: <AANLkTimQ+XXcEDwrC6AR15OdvDtLV+CqaKUnBv0=+F0=@mail.gmail.com>
	<AANLkTikDVija_mNTs4vE+BFbndm9OpwA2+cYLFKvg=Yj@mail.gmail.com>
	<AANLkTi=gkMchj-Fao8HtvPHSKdOhDKT-o7QQhZap2SkW@mail.gmail.com>
	<AANLkTimmyGz_hx5PtuiwcDq39eW=VfV=7u+Gas92jRih@mail.gmail.com>
Message-ID: <AANLkTikWZt42DK2rp2hxhWBKGGHpA26QQu4te-m4hrnA@mail.gmail.com>

Hi all,

I've exchanged a few emails with Tiago off list regarding an inconsistent
test_PopGen_GenePop_EasyController.py problem (most visible on
Jython), giving error "Unable to open file genepop.txt".

I've just had it from Python 2.7 on a 32bit Linux machine:

======================================================================
ERROR: test_get_avg_fst_pair (test_PopGen_GenePop_EasyController.AppTest)
Test get pairwise Fst.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/pjcock/repositories/biopython/Tests/test_PopGen_GenePop_EasyController.py",
line 98, in test_get_avg_fst_pair
    pop_fis =  self.ctrl.get_avg_fst_pair()
  File "/home/pjcock/repositories/biopython/build/lib.linux-i686-2.7/Bio/PopGen/GenePop/EasyController.py",
line 162, in get_avg_fst_pair
    return self._controller.calc_fst_pair(self._fname)[1]
  File "/home/pjcock/repositories/biopython/build/lib.linux-i686-2.7/Bio/PopGen/GenePop/Controller.py",
line 819, in calc_fst_pair
    self._run_genepop([".ST2", ".MIG"], [6,2], fname)
  File "/home/pjcock/repositories/biopython/build/lib.linux-i686-2.7/Bio/PopGen/GenePop/Controller.py",
line 296, in _run_genepop
    % (ret, e_out.strip().split("\n",1)[0]))
IOError: GenePop error -11, Unable to open file genepop.txt

======================================================================
ERROR: test_get_avg_fst_pair_locus (test_PopGen_GenePop_EasyController.AppTest)
Test get average Fst for pairwise pops on a locus.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/pjcock/repositories/biopython/Tests/test_PopGen_GenePop_EasyController.py",
line 93, in test_get_avg_fst_pair_locus
    self.assertEqual(len(self.ctrl.get_avg_fst_pair_locus("Locus4")), 45)
  File "/home/pjcock/repositories/biopython/build/lib.linux-i686-2.7/Bio/PopGen/GenePop/EasyController.py",
line 166, in get_avg_fst_pair_locus
    iter = self._controller.calc_fst_pair(self._fname)[0]
  File "/home/pjcock/repositories/biopython/build/lib.linux-i686-2.7/Bio/PopGen/GenePop/Controller.py",
line 819, in calc_fst_pair
    self._run_genepop([".ST2", ".MIG"], [6,2], fname)
  File "/home/pjcock/repositories/biopython/build/lib.linux-i686-2.7/Bio/PopGen/GenePop/Controller.py",
line 296, in _run_genepop
    % (ret, e_out.strip().split("\n",1)[0]))
IOError: GenePop error -11, Unable to open file genepop.txt

----------------------------------------------------------------------


This failed twice in a row, then passed four times in a row (Linux, Python 2.7).
I suspect the issue was related to machine IO load - during the first
tests I had something compiling at the same time. I can't reproduce
it on demand :(

I've also seen it on the Mac with Apple's Python 2.6 (although it is
usually fine).

However, I'm seeing this (consistently?) with Jython 2.5.1 on the Mac.

Peter

From biopython at maubp.freeserve.co.uk  Fri Nov 12 12:47:22 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 12 Nov 2010 17:47:22 +0000
Subject: [Biopython-dev] Biopython 1.56 release plans
In-Reply-To: <AANLkTikq5TXOhAB-WVurn=WDNM8GiCrPRznrjcZ0Caew@mail.gmail.com>
References: <AANLkTikq5TXOhAB-WVurn=WDNM8GiCrPRznrjcZ0Caew@mail.gmail.com>
Message-ID: <AANLkTimkZk3n9VLb3fLtg2-GwvpMhZreSVnBbB4-LB6W@mail.gmail.com>

On Thu, Nov 4, 2010 at 5:13 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> Hi all,
>
> I've mentioned in recent threads that I think we should try and
> release Biopython 1.56 this month (November 2010).
>
> I think the NEWS file is pretty up to date, and covers important
> new functionality like Andrea Pierleoni's UniProt XML parser
> and the IMGT support (with Uri Laserson).
>
> Is there any other functionality which is ready for merging?
>
> For example, Tiago - you've been doing lots of work on your
> branch with the PopGen code. Is that code ready? I'm willing
> to do the git merge/rebase.
>
> Is there any reason to bother with a beta release this time?
>
> If there are no pressing additions, I may be able to do the
> release tomorrow - otherwise how about aiming for Thursday
> or Friday next week (11 or 12 November)?

As people will have noticed, the release didn't happen this week.

Tiago has been doing some excellent work with the prototype
buildbot server (see http://events.open-bio.org:8010/grid for
the current temporary home), and as part of this we've set
up a few machines as buildslaves. See this thread:
http://lists.open-bio.org/pipermail/biopython-dev/2010-November/008376.html

Running under Jython on the Mac showed a few problems
which appear to now be sorted, other than an apparent
problem with the GenePop tool.

Unfortunately running under Jython on Windows XP has
revealed several new problems, e.g.
http://lists.open-bio.org/pipermail/biopython-dev/2010-November/008431.html

As things stand all the tests (*) are fine on "C" Python on
Linux, Mac, and Windows. On Jython they are also fine on
Linux, give some warnings on Mac, and give 3 errors on
Windows.

Hopefully we can address these three test failures (or
at least understand them) and do Biopython 1.56 at
the end of next week instead.

Peter

(*) We haven't audited all the slave test output to check
which tests are being skipped due to missing optional
dependencies yet. e.g. command line tools, or Python
modules like ReportLab or NetworkX.

From p.j.a.cock at googlemail.com  Fri Nov 12 12:55:57 2010
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 12 Nov 2010 17:55:57 +0000
Subject: [Biopython-dev] test_PopGen_GenePop_EasyController.py failure
	on Jython
In-Reply-To: <AANLkTikWZt42DK2rp2hxhWBKGGHpA26QQu4te-m4hrnA@mail.gmail.com>
References: <AANLkTimQ+XXcEDwrC6AR15OdvDtLV+CqaKUnBv0=+F0=@mail.gmail.com>
	<AANLkTikDVija_mNTs4vE+BFbndm9OpwA2+cYLFKvg=Yj@mail.gmail.com>
	<AANLkTi=gkMchj-Fao8HtvPHSKdOhDKT-o7QQhZap2SkW@mail.gmail.com>
	<AANLkTimmyGz_hx5PtuiwcDq39eW=VfV=7u+Gas92jRih@mail.gmail.com>
	<AANLkTikWZt42DK2rp2hxhWBKGGHpA26QQu4te-m4hrnA@mail.gmail.com>
Message-ID: <AANLkTikoqMEDZvQAn+tvedvhe6FE+udf4==FdcA-V4rz@mail.gmail.com>

2010/11/12 Peter Cock <p.j.a.cock at googlemail.com>:
> Hi all,
>
> I've exchanged a few emails with Tiago off list regarding an inconsistent
> test_PopGen_GenePop_EasyController.py problem (most visible on
> Jython), giving error "Unable to open file genepop.txt".
>
> I've just had it from Python 2.7 on a 32bit Linux machine:
>
> ======================================================================
> ERROR: test_get_avg_fst_pair (test_PopGen_GenePop_EasyController.AppTest)
> Test get pairwise Fst.
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/home/pjcock/repositories/biopython/Tests/test_PopGen_GenePop_EasyController.py",
> line 98, in test_get_avg_fst_pair
>     pop_fis =  self.ctrl.get_avg_fst_pair()
>   File "/home/pjcock/repositories/biopython/build/lib.linux-i686-2.7/Bio/PopGen/GenePop/EasyController.py",
> line 162, in get_avg_fst_pair
>     return self._controller.calc_fst_pair(self._fname)[1]
>   File "/home/pjcock/repositories/biopython/build/lib.linux-i686-2.7/Bio/PopGen/GenePop/Controller.py",
> line 819, in calc_fst_pair
>     self._run_genepop([".ST2", ".MIG"], [6,2], fname)
>   File "/home/pjcock/repositories/biopython/build/lib.linux-i686-2.7/Bio/PopGen/GenePop/Controller.py",
> line 296, in _run_genepop
>     % (ret, e_out.strip().split("\n",1)[0]))
> IOError: GenePop error -11, Unable to open file genepop.txt
>
> ======================================================================
> ERROR: test_get_avg_fst_pair_locus (test_PopGen_GenePop_EasyController.AppTest)
> Test get average Fst for pairwise pops on a locus.
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/home/pjcock/repositories/biopython/Tests/test_PopGen_GenePop_EasyController.py",
> line 93, in test_get_avg_fst_pair_locus
>     self.assertEqual(len(self.ctrl.get_avg_fst_pair_locus("Locus4")), 45)
>   File "/home/pjcock/repositories/biopython/build/lib.linux-i686-2.7/Bio/PopGen/GenePop/EasyController.py",
> line 166, in get_avg_fst_pair_locus
>     iter = self._controller.calc_fst_pair(self._fname)[0]
>   File "/home/pjcock/repositories/biopython/build/lib.linux-i686-2.7/Bio/PopGen/GenePop/Controller.py",
> line 819, in calc_fst_pair
>     self._run_genepop([".ST2", ".MIG"], [6,2], fname)
>   File "/home/pjcock/repositories/biopython/build/lib.linux-i686-2.7/Bio/PopGen/GenePop/Controller.py",
> line 296, in _run_genepop
>     % (ret, e_out.strip().split("\n",1)[0]))
> IOError: GenePop error -11, Unable to open file genepop.txt
>
> ----------------------------------------------------------------------
>
>
> This failed twice in a row, then passed four times in a row (Linux, Python 2.7).
> I suspect the issue was related to machine IO load - during the first
> tests I had something compiling at the same time. I can't reproduce
> it on demand :(
>
> I've also seen it on the Mac with Apple's Python 2.6 (although it is
> usually fine).
>
> However, I'm seeing this (consistently?) with Jython 2.5.1 on the Mac.

Well right now on my Mac with Jython, the test passes but with lots of warnings:

$ jython test_PopGen_GenePop_EasyController.py
Test basic info. ... ok
Test Nm estimation. ... ok
Test allele frequency. ... ok
Test get alleles. ... ok
Test get alleles for all populations. ... ok
Test average Fis. ... ok
Test get pairwise Fst. ... ok
Test get average Fst for pairwise pops on a locus. ... Exception
OSError: [Errno 0] couldn't delete file: 'big.gen.INF' in <bound
method _FileIterator.__del__ of
<Bio.PopGen.GenePop.Controller._FileIterator instance at 0x1>> ignored
Exception OSError: [Errno 0] couldn't delete file: 'big.gen.IN2' in
<bound method _FileIterator.__del__ of
<Bio.PopGen.GenePop.Controller._FileIterator instance at 0x2>> ignored
ok
Test F stats. ... ok
Test get Fis. ... Exception OSError: [Errno 0] couldn't delete file:
'big.gen.ST2' in <bound method _FileIterator.__del__ of
<Bio.PopGen.GenePop.Controller._FileIterator instance at 0x3>> ignored
ok
Test genotype count. ... ok
Test heterozygosity info. ... Exception OSError: [Errno 0] couldn't
delete file: 'big.gen.INF' in <bound method _FileIterator.__del__ of
<Bio.PopGen.GenePop.Controller._FileIterator instance at 0x4>> ignored
Exception OSError: [Errno 0] couldn't delete file: 'big.gen.IN2' in
<bound method _FileIterator.__del__ of
<Bio.PopGen.GenePop.Controller._FileIterator instance at 0x5>> ignored
ok
Test multilocus F stats. ... ok

----------------------------------------------------------------------
Ran 13 tests in 5.912s
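
The "couldn't delete file" warnings in the output above are raised from a
__del__ method, and when __del__ runs depends on the interpreter's garbage
collector. One possible way to avoid relying on that timing is to close and
delete the temporary file explicitly as soon as iteration finishes. The
sketch below is only an illustration under that assumption: TempFileLines is
a hypothetical class, not the actual Bio.PopGen.GenePop.Controller
_FileIterator code, and it has not been checked against Jython.

```python
import os
import tempfile


class TempFileLines(object):
    """Hypothetical iterator over a temporary file, removed when exhausted."""

    def __init__(self, path):
        self.path = path
        self._handle = open(path)

    def __iter__(self):
        return self

    def __next__(self):
        if self._handle is None:
            raise StopIteration
        line = self._handle.readline()
        if not line:
            # End of file: clean up now rather than waiting for __del__.
            self.close()
            raise StopIteration
        return line.rstrip("\n")

    next = __next__  # Python 2 spelling of the iterator protocol

    def close(self):
        # Close the handle before deleting; safe to call more than once.
        if self._handle is not None:
            self._handle.close()
            self._handle = None
        if os.path.exists(self.path):
            os.remove(self.path)


# Demo with a throwaway file standing in for a GenePop output file.
fd, demo_path = tempfile.mkstemp(suffix=".INF")
with os.fdopen(fd, "w") as out:
    out.write("locus1\nlocus2\n")
lines = list(TempFileLines(demo_path))
```

A caller that stops iterating early would still need to call close() itself,
so this only moves the clean-up to a predictable point for the common case.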

Or another example, the same machine as a build slave:

http://events.open-bio.org:8010/builders/OS%20X%2010.6%20Snow%20Leopard%20-%20Jython%202.5.1/builds/9/steps/shell/logs/stdio

On the previous build Jython on Mac gave the same error I reported
above on Linux with "C" Python 2.7:

http://events.open-bio.org:8010/builders/OS%20X%2010.6%20Snow%20Leopard%20-%20Jython%202.5.1/builds/7/steps/shell/logs/stdio

Peter


From andrea at biocomp.unibo.it  Fri Nov 12 15:45:24 2010
From: andrea at biocomp.unibo.it (Andrea Pierleoni)
Date: Fri, 12 Nov 2010 21:45:24 +0100 (CET)
Subject: [Biopython-dev] Uniprot XML parser on TrEmbl
Message-ID: <5c0bc5f9bead03ed216fafaff35c709b.squirrel@lipid.biocomp.unibo.it>


> We haven't been able to due to the Python 2.4 requirement, but
> I know of people using Biopython and multiprocessing together.
>

good


> Personally I'd try profiling the current single threaded code before
> going to multiprocessing.
>

yes, of course.

>> On the other hand, Biopython has the fastest uniprot XML parser
>> among Bio* projects and (to my knowledge) the fastest public
>> parser on the planet ;) I bet Uniprot guys have their parser...
>
> Which of the other Bio* projects have a Uniprot XML parser?
> (Or was that intended as a joke?)
>

It was both a joke and a matter of fact, since I don't know about other
publicly available parsers. Usually I look at a glass as half full...

Andrea


From gawbul at gmail.com  Sat Nov 13 16:24:43 2010
From: gawbul at gmail.com (Steve Moss)
Date: Sat, 13 Nov 2010 21:24:43 +0000
Subject: [Biopython-dev] Developing for the BioPython project...
Message-ID: <AANLkTinrVnyZSgr3WbX40-ACMdcjhAShBUAhNvb63Hg=@mail.gmail.com>

Hi all,

I've just started a PhD centring around evolutionary comparative genomics,
and will be focusing on bioinformatics and computational biology
methodology.

I'm really keen to use Python and BioPython in particular throughout my PhD
and would like to contribute any code I can to aid in promoting BioPython as
a viable alternative to BioPerl, which I feel has a larger user
base currently? Is there any particular process of registration to become
involved with development, or is it just a case of forking the repository
from github?

Cheers,

Steve
-- 
Kindest regards,

Steve Moss
http://stevemoss.ath.cx

From eric.talevich at gmail.com  Sat Nov 13 18:05:24 2010
From: eric.talevich at gmail.com (Eric Talevich)
Date: Sat, 13 Nov 2010 18:05:24 -0500
Subject: [Biopython-dev] Developing for the BioPython project...
In-Reply-To: <AANLkTinrVnyZSgr3WbX40-ACMdcjhAShBUAhNvb63Hg=@mail.gmail.com>
References: <AANLkTinrVnyZSgr3WbX40-ACMdcjhAShBUAhNvb63Hg=@mail.gmail.com>
Message-ID: <AANLkTik86tnDLV6M4sFvjNJ_Kb_MTDGwv19U8njtwCrk@mail.gmail.com>

On Sat, Nov 13, 2010 at 4:24 PM, Steve Moss <gawbul at gmail.com> wrote:

> Hi all,
>
> I've just started a PhD centring around evolutionary comparative genomics,
> and will be focusing on bioinformatics and computational biology
> methodology.
>
> I'm really keen to use Python and BioPython in particular throughout my PhD
> and would like to contribute any code I can to aid in promoting BioPython
> as a viable alternative to BioPerl, which I feel has a larger user
> base currently? Is there any particular process of registration to become
> involved with development, or is it just a case of forking the repository
> from github?
>
>
Hi Steve,

If you've joined the biopython-dev mailing list, you're in the club. Feel
free to fork away!

To get a feel for where development is focused right now, you can look at
our wiki page for active projects:
http://biopython.org/wiki/Active_projects

We're also collectively working on Python 3 compatibility (C extensions
still need some work), though that isn't listed.

Since you're a new grad student, you might have some leeway to get involved
with Google Summer of Code next summer. The project ideas for Biopython,
Open Bio, and NESCent drummed up last year are still worth doing, or might
inspire you to do something else on your own:
http://biopython.org/wiki/Google_Summer_of_Code
http://www.open-bio.org/wiki/Google_Summer_of_Code
https://www.nescent.org/wg_phyloinformatics/Phyloinformatics_Summer_of_Code_2010

Cheers,
Eric

From biopython at maubp.freeserve.co.uk  Mon Nov 15 09:34:40 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 15 Nov 2010 14:34:40 +0000
Subject: [Biopython-dev] FASTA filtering by ID
Message-ID: <AANLkTikF7+nPAsBd7u=yYx=AKXmqrNpzUOP8RRnd40o8@mail.gmail.com>

Hi all,

Something I want to do in several of my workflows is to filter a
FASTA file (or potentially other format sequence files) using a
list of desired identifiers (e.g. a column from a tabular file).

Right now I can achieve this with three steps in Galaxy.
Suppose I have:

Dataset #1, FASTA file

Dataset #2, Tabular file with identifiers of interest (e.g. BLAST hits,
or filtered output from a sequence analysis tool)

Then:

Create tabular Dataset #3 using FASTA-to-tabular on Dataset #1,
subject to the enhancement proposed here:
http://lists.bx.psu.edu/pipermail/galaxy-dev/2010-November/003717.html

Create tabular Dataset #4 using join on Datasets #2 and #3 using the
matched identifier columns. This does the filtering.

Create FASTA Dataset #5 using tabular-to-FASTA on Dataset #4.

This works (at least for reasonably sized datasets), but requires
three steps and the creation of at least two temporary files.

I'd like to introduce another tool under "FASTA manipulation"
to do it in one step (rather than three). Am I going against
the apparent Galaxy ideal that complex manipulations should
be done with tabular files? Would such a FASTA filter tool be
of interest to add directly to Galaxy (e.g. under the "FASTA
manipulation" section), or better off on the community tool shed?
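
For illustration, the one-step filter described above could look roughly
like the following. This is a minimal sketch using only the standard
library, not an existing Galaxy tool: the function names filter_fasta and
ids_from_tabular are made up for the example, and it assumes the record
identifier is the first word after ">" and that the identifiers sit in one
tab-separated column.

```python
import io


def ids_from_tabular(tab_handle, column=0):
    """Yield the identifier found in one column of a tab-separated file."""
    for line in tab_handle:
        if line.strip() and not line.startswith("#"):
            yield line.rstrip("\n").split("\t")[column]


def filter_fasta(fasta_handle, wanted_ids, out_handle):
    """Copy records whose ID is in wanted_ids; return how many were kept."""
    wanted = set(wanted_ids)
    keep = False
    kept = 0
    for line in fasta_handle:
        if line.startswith(">"):
            # Assumption: the identifier is the first word of the title line.
            fields = line[1:].split(None, 1)
            record_id = fields[0] if fields else ""
            keep = record_id in wanted
            if keep:
                kept += 1
        if keep:
            out_handle.write(line)
    return kept


# Small in-memory demo standing in for Dataset #1 and Dataset #2.
fasta = io.StringIO(">alpha first\nACGT\nACGT\n>beta\nGGGG\n>gamma\nTTTT\n")
table = io.StringIO("gamma\t0.001\nalpha\t0.5\n")
output = io.StringIO()
count = filter_fasta(fasta, ids_from_tabular(table), output)
```

Records come out in FASTA file order rather than table order, and the
sequence data is streamed line by line, so no intermediate tabular files
are needed.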

Thanks,

Peter

From biopython at maubp.freeserve.co.uk  Mon Nov 15 12:05:00 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 15 Nov 2010 17:05:00 +0000
Subject: [Biopython-dev] Biopython 1.56 release plans
In-Reply-To: <AANLkTimkZk3n9VLb3fLtg2-GwvpMhZreSVnBbB4-LB6W@mail.gmail.com>
References: <AANLkTikq5TXOhAB-WVurn=WDNM8GiCrPRznrjcZ0Caew@mail.gmail.com>
	<AANLkTimkZk3n9VLb3fLtg2-GwvpMhZreSVnBbB4-LB6W@mail.gmail.com>
Message-ID: <AANLkTiknLLR3=7DKk5PANLAAMjPHK_kE9Detd==koZCe@mail.gmail.com>

On Fri, Nov 12, 2010 at 5:47 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> On Thu, Nov 4, 2010 at 5:13 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>> Hi all,
>>
>> I've mentioned in recent threads that I think we should try and
>> release Biopython 1.56 this month (November 2010).
>>
>> ...
>
> As people will have noticed, the release didn't happen this week.
>
> ...
>
> Unfortunately running under Jython on Windows XP has
> revealed several new problems, e.g.
> http://lists.open-bio.org/pipermail/biopython-dev/2010-November/008431.html
>
> ...
>
> Hopefully we can address these three test failures (or
> at least understand them) and do Biopython 1.56 at
> the end of next week instead.

Two of the problems on Jython on Windows were down
to the Windows specific command line tool detection
not being used, now fixed:

https://github.com/biopython/biopython/commit/db41d7e4bfd8f5d4ea44bf8254334fcd7b76474f
https://github.com/biopython/biopython/commit/7e5b71093c8408de140de1937480e26aaaa5daf1

There was also a heap space problem solved by a
more memory efficient __getitem__ method for the
UnknownSeq object (still room for improvement here).

https://github.com/biopython/biopython/commit/125d8d31d07f57628c231286afae99a178e6f2c5
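
To illustrate the general idea of a memory efficient __getitem__ (this is
a sketch of one possible approach, not the code in the commit above), a
sequence made of one repeated character can answer slices by computing the
new length instead of building the full string first. UnknownRun is a
hypothetical stand-in class, not Biopython's UnknownSeq.

```python
class UnknownRun(object):
    """Hypothetical stand-in: `length` copies of a single character."""

    def __init__(self, length, character="N"):
        self.length = length
        self.character = character

    def __len__(self):
        return self.length

    def __getitem__(self, index):
        if isinstance(index, slice):
            # slice.indices() normalises start/stop/step for this length;
            # len(range(...)) then gives the slice length without ever
            # creating a string of the original size.
            new_length = len(range(*index.indices(self.length)))
            return UnknownRun(new_length, self.character)
        if not -self.length <= index < self.length:
            raise IndexError("sequence index out of range")
        return self.character

    def __str__(self):
        # Only here is a real string of the full length created.
        return self.character * self.length


big = UnknownRun(10 ** 9)
part = big[10:10 ** 9:2]
small = UnknownRun(5)[1:4]
```

Slicing `big` here never allocates a gigabyte string; only str() on a
(hopefully small) result does any real allocation.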

So, we now have a clean bill of health from the offline
tests run on the buildslaves (apart from the occasional
GenePop failure where retesting can make it work).

I still want to look at the SeqIO/AlignIO handle issue,
http://lists.open-bio.org/pipermail/biopython-dev/2010-November/008431.html
and also the UniProt XML issue,
http://lists.open-bio.org/pipermail/biopython-dev/2010-November/008440.html

Peter

From peter at maubp.freeserve.co.uk  Thu Nov 18 10:47:08 2010
From: peter at maubp.freeserve.co.uk (Peter)
Date: Thu, 18 Nov 2010 15:47:08 +0000
Subject: [Biopython-dev] Dropping Python 2.4 Support?
Message-ID: <AANLkTikTaHLuFGCHzXJBpFCENwVj4oDbY1WM1wgKPwhn@mail.gmail.com>

Dear Biopythoneers,

Are any of you still using Biopython on Python 2.4?
http://news.open-bio.org/news/2010/11/dropping-python24-support/

Please get in touch if dropping support for Python 2.4 would be a
problem. Otherwise we plan for Biopython 1.56 (expected by the
end of this month) to be our last release to work with Python 2.4.

Thanks,

Peter

From biopython at maubp.freeserve.co.uk  Thu Nov 18 12:45:30 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 18 Nov 2010 17:45:30 +0000
Subject: [Biopython-dev] FASTA filtering by ID
In-Reply-To: <AANLkTikF7+nPAsBd7u=yYx=AKXmqrNpzUOP8RRnd40o8@mail.gmail.com>
References: <AANLkTikF7+nPAsBd7u=yYx=AKXmqrNpzUOP8RRnd40o8@mail.gmail.com>
Message-ID: <AANLkTinaE7kwWRO-rSN_5N9HoMNMQhDfrF+0r0JSv5So@mail.gmail.com>

Sorry folks - I meant to post that to the Galaxy development
mailing list, http://lists.bx.psu.edu/listinfo/galaxy-dev

Peter

From biopython at maubp.freeserve.co.uk  Wed Nov 24 13:03:03 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 24 Nov 2010 18:03:03 +0000
Subject: [Biopython-dev] Uniprot XML parser on TrEmbl
In-Reply-To: <5c0bc5f9bead03ed216fafaff35c709b.squirrel@lipid.biocomp.unibo.it>
References: <5c0bc5f9bead03ed216fafaff35c709b.squirrel@lipid.biocomp.unibo.it>
Message-ID: <AANLkTin7aHw4z4=JXP_n+b1Q_rhWgGpzfY=uw81F99FP@mail.gmail.com>

Hi Andrea,

I *think* I have fixed the problem with empty names in the UniProt XML
format, without affecting the unit tests, but I don't have the 62GB free to
unpack uniprot_trembl.xml.gz to try it out:

https://github.com/biopython/biopython/commit/bb971b2a7384d42d9a6e4994e59299a90e6cc700

Would you be able to retest the trunk code on that please?

I also changed the handling of the organism host (where present)
in both the UniProt and SwissProt parsers to be more consistent.
I've checked uniprot_sprot.dat still parses, but haven't tried the
much bigger uniprot_trembl.dat from uniprot_trembl.dat.gz - so
again, would you be able to retest the "swiss" text parser too?

Many thanks,

Peter

P.S. Did you get any reply from UniProt about the apparent error in
the Q2LEH1 record within uniprot_trembl.xml.gz?

From andrea at biocomp.unibo.it  Thu Nov 25 11:09:28 2010
From: andrea at biocomp.unibo.it (Andrea Pierleoni)
Date: Thu, 25 Nov 2010 17:09:28 +0100 (CET)
Subject: [Biopython-dev] Uniprot XML parser on TrEmbl
In-Reply-To: <AANLkTin7aHw4z4=JXP_n+b1Q_rhWgGpzfY=uw81F99FP@mail.gmail.com>
References: <5c0bc5f9bead03ed216fafaff35c709b.squirrel@lipid.biocomp.unibo.it>
	<AANLkTin7aHw4z4=JXP_n+b1Q_rhWgGpzfY=uw81F99FP@mail.gmail.com>
Message-ID: <17fb1526d4af40ebbe4e6129d1bd0c2c.squirrel@lipid.biocomp.unibo.it>

> Hi Andrea,
>
> I *think* I have fixed the problem with empty names in the UniProt XML
> format, without affecting the unit tests, but I don't have the 62GB free
> to
> unpack uniprot_trembl.xml.gz to try it out:
>
> https://github.com/biopython/biopython/commit/bb971b2a7384d42d9a6e4994e59299a90e6cc700
>
> Would you be able to retest the trunk code on that please?
>

I've just completed a run on the 8GB gzipped trembl file (I don't have the
free 62GB either) and it was OK, with zero errors.
By the way, it took just 2h 18m, the same time it took on the uncompressed
62GB XML file. So it's definitely better not to decompress this file...
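
As a rough illustration of parsing straight from the compressed file, a
gzip handle can be fed to an incremental XML parser so the data is never
unpacked on disk. The sketch below uses only the standard library
(gzip.open and xml.etree.ElementTree.iterparse) rather than the Biopython
parser itself; count_entries is a made-up helper, and the tiny demo file has
no XML namespace, whereas real UniProt XML tags are namespace-qualified.

```python
import gzip
import os
import shutil
import tempfile
import xml.etree.ElementTree as ET


def count_entries(gz_path, tag="entry"):
    """Count <entry> elements while streaming from a gzipped XML file."""
    count = 0
    with gzip.open(gz_path, "rb") as handle:
        for event, elem in ET.iterparse(handle, events=("end",)):
            if elem.tag == tag:
                count += 1
                # Drop the children of a finished entry to limit memory use.
                elem.clear()
    return count


# Build a tiny gzipped demo file (an assumption standing in for
# uniprot_trembl.xml.gz) and stream it back.
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "demo.xml.gz")
with gzip.open(demo_path, "wb") as out:
    out.write(b"<uniprot><entry><name>A</name></entry>"
              b"<entry><name>B</name></entry></uniprot>")
n_entries = count_entries(demo_path)
shutil.rmtree(tmp_dir)
```

The same kind of gzip handle could be passed to a sequence parser in place
of a plain file handle, which is presumably what was done for the run
described above.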


> I also changed the handling of the organism host (where present)
> in both the UniProt and SwissProt parsers to be more consistent.
good

> I've checked uniprot_sprot.dat still parses, but haven't tried the
> much bigger uniprot_trembl.dat from uniprot_trembl.dat.gz - so
> again, would you be able to retest the "swiss" text parser too?

I'll test this too and let you know.

>
> Many thanks,
>
> Peter
>
> P.S. Did you get any reply from UniProt about the apparent error in
> the Q2LEH1 record within uniprot_trembl.xml.gz?
>

Unfortunately not.

Andrea


From andrea at biocomp.unibo.it  Fri Nov 26 08:54:29 2010
From: andrea at biocomp.unibo.it (Andrea Pierleoni)
Date: Fri, 26 Nov 2010 14:54:29 +0100 (CET)
Subject: [Biopython-dev] Uniprot XML parser on TrEmbl
In-Reply-To: <17fb1526d4af40ebbe4e6129d1bd0c2c.squirrel@lipid.biocomp.unibo.it>
References: <5c0bc5f9bead03ed216fafaff35c709b.squirrel@lipid.biocomp.unibo.it>
	<AANLkTin7aHw4z4=JXP_n+b1Q_rhWgGpzfY=uw81F99FP@mail.gmail.com>
	<17fb1526d4af40ebbe4e6129d1bd0c2c.squirrel@lipid.biocomp.unibo.it>
Message-ID: <1f693f5d96187fcc44a180d1e7c55a3d.squirrel@lipid.biocomp.unibo.it>


>> I've checked uniprot_sprot.dat still parses, but haven't tried the
>> much bigger uniprot_trembl.dat from uniprot_trembl.dat.gz - so
>> again, would you be able to retest the "swiss" text parser too?
>
> I'll test this too and let you know.
>

Test completed on the .dat file, all entries were parsed without errors.
This time it took almost 3h but was done on the gzipped file stored on a
removable 5400rpm hard drive. The XML file was on an SSD so maybe that's
why it is faster with that parser.


From biopython at maubp.freeserve.co.uk  Fri Nov 26 09:06:58 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 26 Nov 2010 14:06:58 +0000
Subject: [Biopython-dev] Uniprot XML parser on TrEmbl
In-Reply-To: <1f693f5d96187fcc44a180d1e7c55a3d.squirrel@lipid.biocomp.unibo.it>
References: <5c0bc5f9bead03ed216fafaff35c709b.squirrel@lipid.biocomp.unibo.it>
	<AANLkTin7aHw4z4=JXP_n+b1Q_rhWgGpzfY=uw81F99FP@mail.gmail.com>
	<17fb1526d4af40ebbe4e6129d1bd0c2c.squirrel@lipid.biocomp.unibo.it>
	<1f693f5d96187fcc44a180d1e7c55a3d.squirrel@lipid.biocomp.unibo.it>
Message-ID: <AANLkTimpKYSANr8R3LhLXOkCxjqu51DragKhosZ5BYtS@mail.gmail.com>

On Fri, Nov 26, 2010 at 1:54 PM, Andrea Pierleoni
<andrea at biocomp.unibo.it> wrote:
>
>>> I've checked uniprot_sprot.dat still parses, but haven't tried the
>>> much bigger uniprot_trembl.dat from uniprot_trembl.dat.gz - so
>>> again, would you be able to retest the "swiss" text parser too?
>>
>> I'll test this too and let you know.
>>
>
> Test completed on the .dat file, all entries were parsed without errors.
> This time it took almost 3h but was done on the gzipped file stored on a
> removable 5400rpm hard drive. The XML file was on an SSD so maybe that's
> why it is faster with that parser.
>

Excellent - thanks.

Peter

From biopython at maubp.freeserve.co.uk  Fri Nov 26 09:08:59 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 26 Nov 2010 14:08:59 +0000
Subject: [Biopython-dev] git freeze for Biopython 1.56
Message-ID: <AANLkTikfYGQG2_KMJHdWr42X5A3-0iJaZD8-kTPvuoVP@mail.gmail.com>

Hi all,

No one has raised any outstanding issues to warrant delaying
the 1.56 release any further, so I plan to do it now. Please don't
make any commits to the master branch until further notice.

Thank you,

Peter

From biopython at maubp.freeserve.co.uk  Fri Nov 26 10:19:20 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 26 Nov 2010 15:19:20 +0000
Subject: [Biopython-dev] git freeze for Biopython 1.56
In-Reply-To: <AANLkTikfYGQG2_KMJHdWr42X5A3-0iJaZD8-kTPvuoVP@mail.gmail.com>
References: <AANLkTikfYGQG2_KMJHdWr42X5A3-0iJaZD8-kTPvuoVP@mail.gmail.com>
Message-ID: <AANLkTim1POLxTm93=QjEB14whbNhvgDPBwpMyf4-HMB8@mail.gmail.com>

On Fri, Nov 26, 2010 at 2:08 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> Hi all,
>
> No one has raised any outstanding issues to warrant delaying
> the 1.56 release any further, so I plan to do it now. Please don't
> make any commits to the master branch until further notice.
>
> Thank you,
>
> Peter

I think that's the source code bundles and Windows installers
all done and uploaded, plus the PyPI upload done. I'll work on
a release announcement for the news server and mailing list.

In the meantime, if anyone could check the files as a sanity
test (just in case I missed something), please do. Get them
from here: http://biopython.org/DIST/

Thanks,

Peter

From biopython at maubp.freeserve.co.uk  Fri Nov 26 11:07:48 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 26 Nov 2010 16:07:48 +0000
Subject: [Biopython-dev] git freeze for Biopython 1.56
In-Reply-To: <AANLkTim1POLxTm93=QjEB14whbNhvgDPBwpMyf4-HMB8@mail.gmail.com>
References: <AANLkTikfYGQG2_KMJHdWr42X5A3-0iJaZD8-kTPvuoVP@mail.gmail.com>
	<AANLkTim1POLxTm93=QjEB14whbNhvgDPBwpMyf4-HMB8@mail.gmail.com>
Message-ID: <AANLkTime7jvxqQmM+=ry-6a0+1h_bJKFQT6WsdM6hVsU@mail.gmail.com>

On Fri, Nov 26, 2010 at 3:19 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> On Fri, Nov 26, 2010 at 2:08 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>> Hi all,
>>
>> No one has raised any outstanding issues to warrant delaying
>> the 1.56 release any further, so I plan to do it now. Please don't
>> make any commits to the master branch until further notice.
>>
>> Thank you,
>>
>> Peter
>
> I think that's the source code bundles and Windows installers
> all done and uploaded, plus the PyPI upload done. I'll work on
> a release announcement for the news server and mailing list.
>

Posted online,
http://news.open-bio.org/news/2010/11/biopython-1-56-released/

If anyone spots a typo please drop me an email, and I can fix
it - hopefully before sending out the email announcement which
I'll do a bit later on in case there are any suggested revisions
to the text.

Regards,

Peter

From biopython at maubp.freeserve.co.uk  Fri Nov 26 11:25:42 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 26 Nov 2010 16:25:42 +0000
Subject: [Biopython-dev] Biopython 1.56 release plans
In-Reply-To: <AANLkTikhuis9NVte79m9PZMb9pNoFBQvqqq+PwLXstAf@mail.gmail.com>
References: <AANLkTi=feVugOz6M6uK3E=SjKw3Ett4MahGTkLs80Xje@mail.gmail.com>
	<645847.84052.qm@web62404.mail.re1.yahoo.com>
	<AANLkTikhuis9NVte79m9PZMb9pNoFBQvqqq+PwLXstAf@mail.gmail.com>
Message-ID: <AANLkTi=E2me=8XBN7LGNRnQK5Kv7Qvu92Uue3qyhsstj@mail.gmail.com>

On Fri, Nov 5, 2010 at 12:01 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> On Fri, Nov 5, 2010 at 11:52 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
>>
>> Bio/Transcribe.py
>> Bio/Translate.py
>>
>> These are still imported from Bio/Encodings/IUPACEncoding.py, which
>> is imported from Bio/Alphabet/IUPAC.py. I have no idea what this code
>> is doing. Does anybody know?
>
> Ah right - sorry, that had slipped my mind:
> http://lists.open-bio.org/pipermail/biopython-dev/2010-September/008255.html
>
> I had suggested we leave Bio.Transcribe and Bio.Translate in for
> Biopython 1.56 and remove them (and Bio.utils, Bio.PropertyManager,
> and Bio.Encodings.IUPACEncoding) for Biopython 1.57

Hi Michiel,

Now Biopython 1.56 is out, would you like to remove those modules?

Thanks

Peter

From biopython at maubp.freeserve.co.uk  Fri Nov 26 14:31:40 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 26 Nov 2010 19:31:40 +0000
Subject: [Biopython-dev] git freeze for Biopython 1.56
In-Reply-To: <AANLkTime7jvxqQmM+=ry-6a0+1h_bJKFQT6WsdM6hVsU@mail.gmail.com>
References: <AANLkTikfYGQG2_KMJHdWr42X5A3-0iJaZD8-kTPvuoVP@mail.gmail.com>
	<AANLkTim1POLxTm93=QjEB14whbNhvgDPBwpMyf4-HMB8@mail.gmail.com>
	<AANLkTime7jvxqQmM+=ry-6a0+1h_bJKFQT6WsdM6hVsU@mail.gmail.com>
Message-ID: <AANLkTimf9hMnu9egYyu5vx4R6nBybjw0JOP20pA9VtHv@mail.gmail.com>

On Fri, Nov 26, 2010 at 4:07 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>
> Posted online,
> http://news.open-bio.org/news/2010/11/biopython-1-56-released/
>
> If anyone spots a typo please drop me an email, and I can fix
> it - hopefully before sending out the email announcement which
> I'll do a bit later on in case there are any suggested revisions
> to the text.

I aim to send out the email in an hour or so's time. If I forget,
Brad - you're in a suitable time zone right?

By the way - please consider the git freeze over (I should have
said so explicitly earlier - sorry about that).

Peter

From chapmanb at 50mail.com  Fri Nov 26 15:20:04 2010
From: chapmanb at 50mail.com (Brad Chapman)
Date: Fri, 26 Nov 2010 15:20:04 -0500
Subject: [Biopython-dev] git freeze for Biopython 1.56
In-Reply-To: <AANLkTimf9hMnu9egYyu5vx4R6nBybjw0JOP20pA9VtHv@mail.gmail.com>
References: <AANLkTikfYGQG2_KMJHdWr42X5A3-0iJaZD8-kTPvuoVP@mail.gmail.com>
	<AANLkTim1POLxTm93=QjEB14whbNhvgDPBwpMyf4-HMB8@mail.gmail.com>
	<AANLkTime7jvxqQmM+=ry-6a0+1h_bJKFQT6WsdM6hVsU@mail.gmail.com>
	<AANLkTimf9hMnu9egYyu5vx4R6nBybjw0JOP20pA9VtHv@mail.gmail.com>
Message-ID: <20101126202003.GC29878@sobchak.mgh.harvard.edu>

Peter;

> > Posted online,
> > http://news.open-bio.org/news/2010/11/biopython-1-56-released/
> >
> > If anyone spots a typo please drop me an email, and I can fix
> > it - hopefully before sending out the email announcement which
> > I'll do a bit later on in case there are any suggested revisions
> > to the text.

Thanks for all the hard work getting this together. Everything looks
great and thanks for pushing to PyPi.

The only thing I noticed was that after "Note as previously
announced" there is an extra <a> tag which causes the rest of the
text through the authors to be a link. Not a big deal.

Congrats on the new release,
Brad

From biopython at maubp.freeserve.co.uk  Fri Nov 26 16:17:23 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 26 Nov 2010 21:17:23 +0000
Subject: [Biopython-dev] git freeze for Biopython 1.56
In-Reply-To: <20101126202003.GC29878@sobchak.mgh.harvard.edu>
References: <AANLkTikfYGQG2_KMJHdWr42X5A3-0iJaZD8-kTPvuoVP@mail.gmail.com>
	<AANLkTim1POLxTm93=QjEB14whbNhvgDPBwpMyf4-HMB8@mail.gmail.com>
	<AANLkTime7jvxqQmM+=ry-6a0+1h_bJKFQT6WsdM6hVsU@mail.gmail.com>
	<AANLkTimf9hMnu9egYyu5vx4R6nBybjw0JOP20pA9VtHv@mail.gmail.com>
	<20101126202003.GC29878@sobchak.mgh.harvard.edu>
Message-ID: <AANLkTik4E6XKKPbkbFeP--2p_a-0dvypLyyj7UcQTsZb@mail.gmail.com>

Hi Brad,

On Fri, Nov 26, 2010 at 8:20 PM, Brad Chapman wrote:
>
> Thanks for all the hard work getting this together. Everything looks
> great and thanks for pushing to PyPi.

I must say a public thank you to Tiago too - having the buildbot
up and running (even with the handful of buildslaves we have
now) has been a great reassurance that things are looking OK.

This will be particularly helpful for spotting problems on Python
3 (since it is a hassle to test by hand right now) and older
versions of Python - my main machine these days runs
Python 2.6.

As an example, for a while the trunk had been broken on
Python 2.4 without anyone noticing. This was when I merged
the UniProt XML parser without having checked the unit tests
were skipped nicely on Python 2.4 when ElementTree was
missing.
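
As an illustration of skipping tests cleanly when an optional module is
missing, here is a minimal sketch using the skip decorators from the
standard unittest module (available from Python 2.7 onwards, so not on
Python 2.4 itself). It is not how the Biopython test suite does it; the
XmlParserTest class and its test are invented for the example.

```python
import unittest

try:
    from xml.etree import ElementTree
except ImportError:
    # Older interpreters without ElementTree end up here.
    ElementTree = None


@unittest.skipIf(ElementTree is None, "ElementTree not available")
class XmlParserTest(unittest.TestCase):
    """Hypothetical test that only makes sense with ElementTree present."""

    def test_parse_name(self):
        root = ElementTree.fromstring("<entry><name>P12345</name></entry>")
        self.assertEqual(root.findtext("name"), "P12345")


suite = unittest.defaultTestLoader.loadTestsFromTestCase(XmlParserTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

With the dependency missing the test is reported as skipped rather than as
an error, so the rest of the suite still gives a meaningful result.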

Having the tests run every night automatically is much
safer - so thanks Tiago :)

[Hopefully we'll get the buildbot running on a dedicated
VM before too long - we're in touch with the OBF admins
about this already.]

> The only thing I noticed was that after "Note as previously
> announced" there is an extra <a> tag which causes the rest
> of the text through the authors to be a link. Not a big deal.

Well spotted - I'd actually put <a/> rather than </a> which must
have confused the formatting because it looked OK.

Thanks!

Peter

From biopython at maubp.freeserve.co.uk  Fri Nov 26 18:12:14 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 26 Nov 2010 23:12:14 +0000
Subject: [Biopython-dev] Biopython 1.56
Message-ID: <AANLkTim+FCTZDGm-jfKW-T14VRNESPUJow2c4Acm-U6K@mail.gmail.com>

Dear Biopythoneers,

On behalf of the developers, I'm pleased to announce we
released Biopython 1.56 earlier today. For more details
please see:

http://news.open-bio.org/news/2010/11/biopython-1-56-released/

Please note this will probably be the last release to
support Python 2.4, see:

http://news.open-bio.org/news/2010/11/dropping-python24-support/

(At least) 13 people have contributed to this release,
including 6 new people - thank you all:

    * Andrea Pierleoni (first contribution)
    * Bart de Koning (first contribution)
    * Bartek Wilczynski
    * Bartosz Telenczuk (first contribution)
    * Cymon Cox
    * Eric Talevich
    * Frank Kauff
    * Michiel de Hoon
    * Peter Cock
    * Phillip Garland (first contribution)
    * Siong Kong (first contribution)
    * Tiago Antao
    * Uri Laserson (first contribution)

Source distributions and Windows installers are available
from the downloads page on the Biopython website:
http://www.biopython.org/wiki/Download

As usual, feedback is most welcome on the mailing lists
(or bugzilla).

Regards,

Peter


From biopython at maubp.freeserve.co.uk  Mon Nov 29 07:02:55 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 29 Nov 2010 12:02:55 +0000
Subject: [Biopython-dev] Dropping Python 2.4 Support?
In-Reply-To: <AANLkTikTaHLuFGCHzXJBpFCENwVj4oDbY1WM1wgKPwhn@mail.gmail.com>
References: <AANLkTikTaHLuFGCHzXJBpFCENwVj4oDbY1WM1wgKPwhn@mail.gmail.com>
Message-ID: <AANLkTinmM49x+L8DH_duCc46mQ67mOtcQvW86y7WS94Q@mail.gmail.com>

On Thu, Nov 18, 2010 at 3:47 PM, Peter wrote:
> Dear Biopythoneers,
>
> Are any of you still using Biopython on Python 2.4?
> http://news.open-bio.org/news/2010/11/dropping-python24-support/
>
> Please get in touch if dropping support for Python 2.4 would be a
> problem. Otherwise we plan for Biopython 1.56 (expected by the
> end of this month) to be our last release to work with Python 2.4.
>
> Thanks,
>
> Peter

So, no comments?

We're using CentOS on our servers at work, but have installed
a later Python on most of them and made it the default.

I'm also keen to use Biopython with Galaxy, and they currently
support Python 2.4 to 2.6 (and I'm unclear when they will add
2.7 and drop 2.4), so this is another reason to keep some level
of support for Python 2.4. However, on a local level this isn't
important as we are running Galaxy on Python 2.6 now.
Likewise I know Brad is running Galaxy on a more recent
Python than 2.4 (are you using Biopython within Galaxy
Brad? Maybe we could chat about that on a new thread).

Hopefully the release of Biopython 1.56 will alert more of our
users to the planned withdrawal of support of Python 2.4, so
we may get some feedback this week...

Peter

From chapmanb at 50mail.com  Mon Nov 29 07:23:23 2010
From: chapmanb at 50mail.com (Brad Chapman)
Date: Mon, 29 Nov 2010 07:23:23 -0500
Subject: [Biopython-dev] Dropping Python 2.4 Support?
In-Reply-To: <AANLkTinmM49x+L8DH_duCc46mQ67mOtcQvW86y7WS94Q@mail.gmail.com>
References: <AANLkTikTaHLuFGCHzXJBpFCENwVj4oDbY1WM1wgKPwhn@mail.gmail.com>
	<AANLkTinmM49x+L8DH_duCc46mQ67mOtcQvW86y7WS94Q@mail.gmail.com>
Message-ID: <20101129122323.GA3139@sobchak.mgh.harvard.edu>

Peter;

[Python2.4 support]
> So, no comments?

The folks who are still using 5 year old versions of python might
not be the most responsive. We'll probably hear some complaints
when some of the code breaks.

> I'm also keen to use Biopython with Galaxy, and they currently
> support Python 2.4 to 2.6 (and I'm unclear when they will add
> 2.7 and drop 2.4), so this is another reason to keep some level
> of support for Python 2.4. However, on a local level this isn't
> important as we are running Galaxy on Python 2.6 now.
> Likewise I know Brad is running Galaxy on a more recent
> Python than 2.4 (are you using Biopython within Galaxy
> Brad? Maybe we could chat about that on a new thread).

Yes, I'm running on 2.6 (and sad to be missing nested with
statements in my code). It would be great to have formal
Biopython/Galaxy interoperability. If I remember right, the biggest
complaint was lack of PEP 8 compliance with module names, but it
should be worth discussing.

Brad

From mjldehoon at yahoo.com  Tue Nov 30 08:14:20 2010
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Tue, 30 Nov 2010 05:14:20 -0800 (PST)
Subject: [Biopython-dev] Biopython 1.56 release plans
In-Reply-To: <AANLkTi=E2me=8XBN7LGNRnQK5Kv7Qvu92Uue3qyhsstj@mail.gmail.com>
Message-ID: <215849.18567.qm@web62405.mail.re1.yahoo.com>

OK, I have removed these modules:
  Bio.Encodings
  Bio.PropertyManager
  Bio.Transcribe
  Bio.Translate
  Bio.utils

--Michiel.

--- On Fri, 11/26/10, Peter <biopython at maubp.freeserve.co.uk> wrote:

> From: Peter <biopython at maubp.freeserve.co.uk>
> Subject: Re: [Biopython-dev] Biopython 1.56 release plans
> To: "Michiel de Hoon" <mjldehoon at yahoo.com>
> Cc: "Biopython-Dev Mailing List" <biopython-dev at biopython.org>
> Date: Friday, November 26, 2010, 11:25 AM
> On Fri, Nov 5, 2010 at 12:01 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> > On Fri, Nov 5, 2010 at 11:52 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> >>
> >> Bio/Transcribe.py
> >> Bio/Translate.py
> >>
> >> These are still imported from Bio/Encodings/IUPACEncoding.py, which
> >> is imported from Bio/Alphabet/IUPAC.py. I have no idea what this code
> >> is doing. Does anybody know?
> >
> > Ah right - sorry, that had slipped my mind:
> > http://lists.open-bio.org/pipermail/biopython-dev/2010-September/008255.html
> >
> > I had suggested we leave Bio.Transcribe and Bio.Translate in for
> > Biopython 1.56 and remove them (and Bio.utils, Bio.PropertyManager,
> > and Bio.Encodings.IUPACEncoding) for Biopython 1.57
> 
> Hi Michiel,
> 
> Now Biopython 1.56 is out, would you like to remove those
> modules?
> 
> Thanks
> 
> Peter
> 


From anaryin at gmail.com  Tue Nov 30 10:45:35 2010
From: anaryin at gmail.com (=?UTF-8?Q?Jo=C3=A3o_Rodrigues?=)
Date: Tue, 30 Nov 2010 16:45:35 +0100
Subject: [Biopython-dev] Features of the GSOC branch ready to be merged
Message-ID: <AANLkTime10jWf1URpPyqxUvXPw79bfrH=GDvB79J+dNq@mail.gmail.com>

Hello all,

I've been looking at the code I wrote for the GSOC to see what is ready to
be merged in the main branch. I have to thank Kristian and whoever
participated in the Python & Friends for the input.

From what I gathered, and from my own tests, I believe the following
functions are solid enough:


   1. Bio/PDB/Atom.py: automatically guessing atom element from atom name
      <https://github.com/JoaoRodrigues/biopython/blob/GSOC2010/Bio/PDB/Atom.py#L75-105>
   2. Bio/PDB/Structure.py
      1. Building biological unit from REMARK 350 in the header
         <https://github.com/JoaoRodrigues/biopython/blob/GSOC2010/Bio/PDB/Structure.py#L78-110>
      2. Renumbering residues
         <https://github.com/JoaoRodrigues/biopython/blob/GSOC2010/Bio/PDB/Structure.py#L66-76>


Let me know what you all think.

Best,

João [...] Rodrigues
http://doeidoei.wordpress.com


From biopython at maubp.freeserve.co.uk  Tue Nov 30 18:24:35 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 30 Nov 2010 23:24:35 +0000
Subject: [Biopython-dev] Bio.SeqIO.index extension, Bio.SeqIO.index_many
Message-ID: <AANLkTimena_PPhuzjksES=eSWf7ZmSsBWM1uam=V87fC@mail.gmail.com>

Hi all,

You may recall some previous discussion about extending the
Bio.SeqIO.index functionality. I'm particularly interested in
keeping the index on disk to reduce the memory overhead
and thus support NGS files with many millions of reads. e.g.

http://lists.open-bio.org/pipermail/biopython-dev/2009-September/006713.html
http://lists.open-bio.org/pipermail/biopython-dev/2009-September/006716.html

I'd also like to index multiple files (e.g. a folder of GenBank
files for different chromosomes), functionality we used to
have with the OBDA style index (using BDB or a flat file)
and Martel/Mindy (deprecated and removed some time ago
due to problems with 3rd party libraries, scaling problems
when parsing, and ultimately no one familiar enough with
the code to try and fix it). See also:

http://lists.open-bio.org/pipermail/biopython-dev/2009-August/006704.html

I've been working on the following idea on branches in github,
and have something workable using SQLite3 to store a
table of record identifiers, file offset, and file number
(for where we have multiple files indexed together).
Following the OBDA standard, I extended this to
also (optionally) store the record length on disk.
This allows the get_raw method to be much faster,
but may not be possible on all file formats.

[Currently I get the length when building the index
on all supported file formats except SFF. Here we
normally use the Roche index, and that doesn't
have the raw record lengths.]

Note that using SQLite seems sensible to me as
it is included with Python 2.5+ including Python 3,
while BDB, the other candidate from the standard
library, has been deprecated.
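
To make the table layout described above concrete, here is a minimal
sketch using only the standard library. It is not the code on the branch:
the table and column names (file_data, offset_data, rec_key, start, size)
and the helper names build_index and get_raw are assumptions made for
illustration, and only a simple FASTA layout is handled.

```python
import os
import sqlite3
import tempfile


def build_index(index_filename, filenames):
    """Store (key, file number, offset, length) for each FASTA record."""
    con = sqlite3.connect(index_filename)
    con.execute("CREATE TABLE file_data "
                "(file_number INTEGER PRIMARY KEY, name TEXT)")
    con.execute("CREATE TABLE offset_data (rec_key TEXT PRIMARY KEY, "
                "file_number INTEGER, start INTEGER, size INTEGER)")
    for file_number, name in enumerate(filenames):
        con.execute("INSERT INTO file_data VALUES (?, ?)", (file_number, name))
        rows = []
        key, start, offset = None, 0, 0
        with open(name, "rb") as handle:  # binary mode: offsets are in bytes
            for line in handle:
                if line.startswith(b">"):
                    if key is not None:
                        rows.append((key, file_number, start, offset - start))
                    key = line[1:].split()[0].decode("ascii")
                    start = offset
                offset += len(line)
        if key is not None:
            rows.append((key, file_number, start, offset - start))
        con.executemany("INSERT INTO offset_data VALUES (?, ?, ?, ?)", rows)
    con.commit()
    return con


def get_raw(con, key):
    """Return the raw bytes of one record using its stored offset and length."""
    row = con.execute(
        "SELECT name, start, size FROM offset_data "
        "JOIN file_data USING (file_number) WHERE rec_key = ?", (key,)
    ).fetchone()
    if row is None:
        raise KeyError(key)
    name, start, size = row
    with open(name, "rb") as handle:
        handle.seek(start)
        return handle.read(size)


# Usage: index two small FASTA files written to a temporary directory.
workdir = tempfile.mkdtemp()
paths = []
for number, text in enumerate([b">a desc\nACGT\nAC\n>b\nGG\n", b">c\nTTTT\n"]):
    path = os.path.join(workdir, "file%d.fasta" % number)
    with open(path, "wb") as out:
        out.write(text)
    paths.append(path)

con = build_index(os.path.join(workdir, "index.sqlite"), paths)
raw_a = get_raw(con, "a")
raw_b = get_raw(con, "b")
raw_c = get_raw(con, "c")
```

Because the length is stored next to the offset, get_raw needs one seek
and one read and never has to scan for the start of the next record.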

The current API is as follows, a new function:

def index_many(index_filename, filenames=None,
                        format=None, alphabet=None,
                        key_function=None)

This is similar to the existing index function, although
here the key_function must return a string for use as
the key in the SQLite database.

The idea is that you call index_many to build a new
index (if the index file does not exist) or reload an
existing index (if the index file does exist). If you
are reloading an existing index, you can omit the
filenames and format.

The index_many function returns a read only dictionary
like object - very much like the existing index function.

Although not (currently) exposed by this API, the code
allows a configurable limit on the number of handles
(since these are a finite resource limited by the OS).
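
One way such a handle limit could work is sketched here. This is an
assumption for illustration, not the code on the branch: the class name
HandlePool, the max_open parameter and the least-recently-used eviction
policy are all hypothetical.

```python
import os
import tempfile
from collections import OrderedDict


class HandlePool:
    """Keep at most max_open files open, closing the least recently used."""

    def __init__(self, max_open=10):
        if max_open < 1:
            raise ValueError("max_open must be at least 1")
        self.max_open = max_open
        self._handles = OrderedDict()  # filename -> open handle, oldest first

    def get(self, filename):
        """Return an open binary handle for filename, reusing one if possible."""
        handle = self._handles.pop(filename, None)
        if handle is None:
            if len(self._handles) >= self.max_open:
                # Evict the handle that was used longest ago.
                _, oldest = self._handles.popitem(last=False)
                oldest.close()
            handle = open(filename, "rb")
        # Re-insert so this filename becomes the most recently used.
        self._handles[filename] = handle
        return handle

    def close(self):
        """Close every handle still held by the pool."""
        while self._handles:
            _, handle = self._handles.popitem()
            handle.close()

    def __len__(self):
        return len(self._handles)


# Usage: three files, but never more than two open at once.
workdir = tempfile.mkdtemp()
names = []
for i in range(3):
    name = os.path.join(workdir, "chr%d.txt" % i)
    with open(name, "wb") as out:
        out.write(b"record %d\n" % i)
    names.append(name)

pool = HandlePool(max_open=2)
first = pool.get(names[0])
pool.get(names[1])
pool.get(names[2])  # evicts names[0], the least recently used
open_after_three = len(pool)
first_was_closed = first.closed
reopened = pool.get(names[0]).read()
pool.close()
```

A caller should not hold on to a returned handle across later get() calls,
since the pool may close it at any time.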

I've put a branch up for comment:
https://github.com/peterjc/biopython/tree/index-many

I hope the docstring text and embedded doctest
examples are clear. You can read them here:
https://github.com/peterjc/biopython/blob/index-many/Bio/SeqIO/__init__.py

What do people think?

One thing I haven't done yet (any volunteers?) is any
benchmarking - for example comparing the index
build and retrieval times for some large files using
Biopython 1.55 (recent baseline), Biopython 1.56
(should be faster on retrieval) and the branch to
check for any regressions in Bio.SeqIO.index(), and
compare this to Bio.SeqIO.index_many() which being
disk based will be slower but require much less RAM.

Peter

P.S. This was based on the following branch, which
proved non-trivial to merge since in the meantime I'd
made separate tweaks to the index code on the trunk:
https://github.com/peterjc/biopython/tree/index-many-length

I didn't propose merging this back then because it
absolutely requires SQLite, and thus Python 2.5+
and we wanted Biopython 1.56 to support Python 2.4.

From schaefer at rostlab.org  Tue Nov  2 09:17:49 2010
From: schaefer at rostlab.org (Christian Schaefer)
Date: Tue, 02 Nov 2010 10:17:49 +0100
Subject: [Biopython-dev] RMSD calculation
In-Reply-To: <AANLkTikndP+_qBoFe=u2jR=oYQ6Dn-+WLD2xBNjXxXCC@mail.gmail.com>
References: <AANLkTi=DMWNh1AtuVjtv8=thDx5Y6KKPW+aaUK=Gi1Yj@mail.gmail.com>	<AANLkTinTy-t_-FafL23kj7PrsiLH=48mL0KZi2f-3RbS@mail.gmail.com>
	<AANLkTikndP+_qBoFe=u2jR=oYQ6Dn-+WLD2xBNjXxXCC@mail.gmail.com>
Message-ID: <4CCFD73D.7000203@rostlab.org>

Hey,

I was using the PDB superimposer once and compared it to ProFit [1] 
which does a McLachlan fitting. Both return essentially the same rmsd, 
while the implementation in Bio.PDB seems to yield higher precision.

Chris

[1] http://www.bioinf.org.uk/software/profit/

-- 
Dipl.-Bioinf. Christian Schaefer
Technical University Munich
Department for Bioinformatics
Faculty of Computer Science/I12
Boltzmannstr. 3
D-85748 Garching b. Muenchen
Germany
http://www.rostlab.org/~schaefer


On 10/30/2010 01:42 AM, George Devaniranjan wrote:
> Thanks Eric and Peter,
> Your patience in answering this question is very much appreciated.
> I think Eric maybe right, I tried the RMSD calculation for several
> structures and VMD does give a lower value for them all.
> George
>
> Thanks once again for all of you for your answers
>
> On Fri, Oct 29, 2010 at 10:39 PM, Eric Talevich<eric.talevich at gmail.com>wrote:
>
>> On Thu, Oct 28, 2010 at 12:49 PM, George Devaniranjan<
>> devaniranjan at gmail.com>  wrote:
>>
>>> I was wondering why there is two functions for calculating RMSD
>>>
>>> 1)in the SVDSuperimposer()
>>> 2)in PDB.Superimposer()
>>>
>>> In the code its says RMS-is RMS being calculated instead of RMSD???
>>> I ask because VMD gives a different value for RMSD to the one from
>>> Biopython
>>>
>>>
>> Hello George,
>>
>> Here's my understanding of it:
>>
>> 1. RMSD and "RMS distance" both mean root mean square deviation, in terms
>> of the distances in 3D space between each corresponding pair of atoms. The
>> RMSD between all atoms in two aligned structures may be different than the
>> RMSD between backbone atoms only. Or, if the two structures don't have the
>> same peptide sequence, that raises another set of issues.
>>
>> 2. In Biopython, PDB.Superimposer internally uses SVDSuperimposer. It's a
>> simplified wrapper.
>>
>> 3. The SVDSuperimposer module allows you to either (i) align two structures
>> in 3D space and then calculate RMSD, or (ii) just calculate RMSD without
>> spatially (re-)aligning the structures. PDB.Superimposer just does the
>> former. If the structures weren't already aligned, these can yield very
>> different values.
>>
>> 4. There are many ways to perform a structural alignment; SVDSuperimposer
>> implements a simple one. PyMOL, VMD, ce, DALI, and other programs implement
>> more advanced methods.
>>
>> So don't be alarmed that VMD gives you a smaller RMSD than PDB.Superimposer
>> -- it just means VMD found a better alignment between the two structures.
>>
>> Best,
>> Eric
>>
>>
>>
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev


From krother at rubor.de  Tue Nov  2 11:15:05 2010
From: krother at rubor.de (Kristian Rother)
Date: Tue, 2 Nov 2010 12:15:05 +0100
Subject: [Biopython-dev] RMSD calculation
In-Reply-To: <AANLkTikBe8eU7w+F14T_VdBy2dBwTmfiWv84mXjRvX8-@mail.gmail.com>
References: <AANLkTi=DMWNh1AtuVjtv8=thDx5Y6KKPW+aaUK=Gi1Yj@mail.gmail.com>
	<AANLkTikiX3RDMqmcYatjqmJ4ukuiLNq5GC=DTOn0Pmje@mail.gmail.com>
	<AANLkTik=ckm184v_+ZH1C34YNzrFpZ8Mrtt-cO5iDm3B@mail.gmail.com>
	<AANLkTi=MxnisQA8s5Kf2TioNv2dggkWsRPZEFkDw0Oaa@mail.gmail.com>
	<AANLkTikBe8eU7w+F14T_VdBy2dBwTmfiWv84mXjRvX8-@mail.gmail.com>
Message-ID: <529a050d3a1c3801f07adbef605341ef-EhVcX1xCQgFaRwICBxEAXR0wfgFLV15YQUBGAEFfUC9ZUFgWXVpyH1RXX0FdQU1tXlhRSF5cXg1fWg==-webmailer1@server08.webmailer.hosteurope.de>

Hi Greg,

I think I can help to clear up the RMSD question.
(or RMS; however you abbreviate it, it's the same formula)

The short answer is, the methods giving lower RMSD do something
conceptually very different from Bio.PDB.

Long answer:

- Bio.PDB.Superimposer does structure *superposition*. It takes pairs of
atoms, and finds the rotation/translation matrix that minimizes the RMSD.
There is a single analytical solution to this, returned by the Kabsch
algorithm from 1976 (see http://www.pymolwiki.org/index.php/Kabsch). I'm
quite sure Biopython/SVDSuperimposer implements this algorithm.

- Services like the EBI SSM server do *structure alignment*. They take two
structures and try to find a set of residue pairs that fit to each other
well. To do so, they occasionally calculate RMSDs, but do not necessarily
use all the residues provided.

For instance, when submitting protein1 and protein2 to EBI, the output
tells me that

N(algn) = 31

meaning that 31 of the 36 residues were used to calculate the alignment.
When looking at the structures, these are probably on the N-terminus (see
picture).

==> the structure alignment algorithm discards the residues it doesn't
regard as useful for aligning; this is why the RMSD is lower.
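
The arithmetic behind this can be sketched in a few lines of plain Python.
The coordinates are made up for illustration (they are not the two PDB
files from this thread), no superposition is performed, and the function
rmsd_best_pairs is a hypothetical stand-in: real structure alignment
programs choose their residue pairs in more elaborate ways.

```python
import math


def squared_distances(coords1, coords2):
    """Squared distance between each corresponding pair of 3D points."""
    if len(coords1) != len(coords2) or not coords1:
        raise ValueError("need two non-empty coordinate lists of equal length")
    return [sum((a - b) ** 2 for a, b in zip(p, q))
            for p, q in zip(coords1, coords2)]


def rmsd(coords1, coords2):
    """Root mean square deviation over all pairs (no fitting is done)."""
    squares = squared_distances(coords1, coords2)
    return math.sqrt(sum(squares) / len(squares))


def rmsd_best_pairs(coords1, coords2, keep):
    """RMSD over only the `keep` closest pairs, discarding the rest."""
    squares = sorted(squared_distances(coords1, coords2))
    return math.sqrt(sum(squares[:keep]) / keep)


# 36 atom pairs: 31 are 1 Angstrom apart, 5 are 8 Angstrom apart.
coords1 = [(float(i), 0.0, 0.0) for i in range(36)]
coords2 = [(float(i), 1.0 if i < 31 else 8.0, 0.0) for i in range(36)]

rmsd_all = rmsd(coords1, coords2)                   # uses all 36 pairs
rmsd_subset = rmsd_best_pairs(coords1, coords2, 31)  # uses the best 31 pairs
```

With these toy numbers the RMSD over all 36 pairs is sqrt(351/36), about
3.12, while the RMSD over the 31 best pairs is exactly 1.0, although the
coordinates themselves are identical in both calculations.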


Do you think this explains all our observations?

Best regards,
    Kristian


> Hello everyone,
> I tried with pymol and it gives a value of 1.792 for the RMSD after
> alignment
> The EU bioinformatics server gives a value of 1.74
> VMD 1.62
> But SVD and PDB Superimposer gives a value 3.2
> I have attached the 2 PDB files concerned-is it something I am doing in
> calculating the RMSD using biopython?
> Thank you
>
> On Thu, Oct 28, 2010 at 1:46 PM, Peter
> <biopython at maubp.freeserve.co.uk>wrote:
>
>> On Thu, Oct 28, 2010 at 6:14 PM, George Devaniranjan
>> <devaniranjan at gmail.com> wrote:
>> > Yes there is a difference-for 2 proteins having exact same residues of
>> 36
>> > residues the values from 4 sources are as follows
>> > VMD RMSD=1.61
>> > SVD RMSD =3.2
>> > PDB RMSD=3.2
>> >
>> > From the EU Bioinformatics server (link below) RMSD =1.75
>> > (http://www.ebi.ac.uk/msd-srv/ssm/cgi-bin/ssmserver)
>> >
>> > So Biopython really is computing the RMSD and not RMS?
>> > Thanks you
>>
>> It has been a while since I looked at this (but I can still edit
>> the Warwick page if is is unclear).
>>
>> Which definition of RMSD are you using?
>>
>> Bio.PDB uses Bio.SVDSuperimposer, so they should be the same.
>> The comment for this code *says* is calculates the RMS deviation,
>> here:
>>
>>        diff=coords1-coords2
>>        l=coords1.shape[0]
>>        return sqrt(sum(sum(diff*diff))/l)
>>
>> Here variable l will be the number of atoms.
>>
>> What are the two examples you are using? Can you at perhaps
>> share a small example pair of PDB files?
>>
>> Peter
>>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: superpos.png
Type: image/png
Size: 172427 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/biopython-dev/attachments/20101102/f02741f3/attachment-0002.png>

From devaniranjan at gmail.com  Wed Nov  3 01:09:18 2010
From: devaniranjan at gmail.com (George Devaniranjan)
Date: Wed, 3 Nov 2010 01:09:18 +0000
Subject: [Biopython-dev] RMSD calculation
In-Reply-To: <4CCFD73D.7000203@rostlab.org>
References: <AANLkTi=DMWNh1AtuVjtv8=thDx5Y6KKPW+aaUK=Gi1Yj@mail.gmail.com>
	<AANLkTinTy-t_-FafL23kj7PrsiLH=48mL0KZi2f-3RbS@mail.gmail.com>
	<AANLkTikndP+_qBoFe=u2jR=oYQ6Dn-+WLD2xBNjXxXCC@mail.gmail.com>
	<4CCFD73D.7000203@rostlab.org>
Message-ID: <AANLkTinrxtJbP6AzKqfkNwpL+w3fakVduQr=WJRRDNMO@mail.gmail.com>

Hi,
Thank you - I have been noticing that in most cases PDB-superimposer as well as
SVD-superimposer give similar values.
In addition, PyMOL in most cases also gives similar values; however, in all
cases VMD continues to give the smallest value.

I will also test ProFit - thanks for the link.
George

On Tue, Nov 2, 2010 at 9:17 AM, Christian Schaefer <schaefer at rostlab.org>wrote:

> Hey,
>
> I was using the PDB superimposer once and compared it to ProFit [1] which
> does a McLachlan fitting. Both return essentially the same rmsd, while the
> implementation in Bio.PDB seems to yield higher precision.
>
> Chris
>
> [1] http://www.bioinf.org.uk/software/profit/
>
> --
> Dipl.-Bioinf. Christian Schaefer
> Technical University Munich
> Department for Bioinformatics
> Faculty of Computer Science/I12
> Boltzmannstr. 3
> D-85748 Garching b. Muenchen
> Germany
> http://www.rostlab.org/~schaefer <http://www.rostlab.org/%7Eschaefer>
>
>
>
> On 10/30/2010 01:42 AM, George Devaniranjan wrote:
>
>> Thanks Eric and Peter,
>> Your patience in answering this question is very much appreciated.
>> I think Eric maybe right, I tried the RMSD calculation for several
>> structures and VMD does give a lower value for them all.
>> George
>>
>> Thanks once again for all of you for your answers
>>
>> On Fri, Oct 29, 2010 at 10:39 PM, Eric Talevich<eric.talevich at gmail.com
>> >wrote:
>>
>>  On Thu, Oct 28, 2010 at 12:49 PM, George Devaniranjan<
>>> devaniranjan at gmail.com>  wrote:
>>>
>>>  I was wondering why there is two functions for calculating RMSD
>>>>
>>>> 1)in the SVDSuperimposer()
>>>> 2)in PDB.Superimposer()
>>>>
>>>> In the code its says RMS-is RMS being calculated instead of RMSD???
>>>> I ask because VMD gives a different value for RMSD to the one from
>>>> Biopython
>>>>
>>>>
>>>>  Hello George,
>>>
>>> Here's my understanding of it:
>>>
>>> 1. RMSD and "RMS distance" both mean root mean square deviation, in terms
>>> of the distances in 3D space between each corresponding pair of atoms.
>>> The
>>> RMSD between all atoms in two aligned structures may be different than
>>> the
>>> RMSD between backbone atoms only. Or, if the two structures don't have
>>> the
>>> same peptide sequence, that raises another set of issues.
>>>
>>> 2. In Biopython, PDB.Superimposer internally uses SVDSuperimposer. It's a
>>> simplified wrapper.
>>>
>>> 3. The SVDSuperimposer module allows you to either (i) align two
>>> structures
>>> in 3D space and then calculate RMSD, or (ii) just calculate RMSD without
>>> spatially (re-)aligning the structures. PDB.Superimposer just does the
>>> former. If the structures weren't already aligned, these can yield very
>>> different values.
>>>
>>> 4. There are many ways to perform a structural alignment; SVDSuperimposer
>>> implements a simple one. PyMOL, VMD, ce, DALI, and other programs
>>> implement
>>> more advanced methods.
>>>
>>> So don't be alarmed that VMD gives you a smaller RMSD than
>>> PDB.Superimposer
>>> -- it just means VMD found a better alignment between the two structures.
>>>
>>> Best,
>>> Eric
>>>
>>>
>>>


From biopython at maubp.freeserve.co.uk  Wed Nov  3 14:02:48 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 3 Nov 2010 14:02:48 +0000
Subject: [Biopython-dev] Merging Uniprot XML parser?
In-Reply-To: <AANLkTineNfa+eMqcUyN7+anQ4OQOyLnVYOT+gM5H_Qg3@mail.gmail.com>
References: <AANLkTineNfa+eMqcUyN7+anQ4OQOyLnVYOT+gM5H_Qg3@mail.gmail.com>
Message-ID: <AANLkTimcrZBsL_1re6wYn0qr2H3Z-0Tq3Wo7748Pifvz@mail.gmail.com>

On Tue, Oct 19, 2010 at 4:54 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> Hi all,
>
> I've fixed a few issues I felt were holding up merging Andrea's UniProt
> XML parser.
>
> I've now tested the uniprot_sprot.txt and uniprot_sprot.xml are parsed
> into more or less equivalent objects, and that these can be written out
> as GenBank (well, GenPept) files or as EMBL/IMGT files (given recent
> work to support protein EMBL files - which do exist but are rarely used).
>
> This required "fixing" Bug 3026 to cope with long annotation that cannot
> be line wrapped nicely (lots of long URL strings in UniProt XML comments).
> http://bugzilla.open-bio.org/show_bug.cgi?id=3026
> I'm tempted to remove the warning because it is so common... or make
> it use the same text each time so you get warned once.
>
> There are also some additions to the Bio.SeqFeature position classes,
> since SwissProt/UniProt files can have uncertain positions.
>
> Could someone take a look at the code here (a rebased branch), as I'd
> like some independent testing (and better yet, code review):
> http://github.com/peterjc/biopython/tree/uniprot

I've now merged this into the trunk (with a git rebase first so the history
is linear - no branch+merge), and Andrea has agreed to retest it.
Other testing and comments are most welcome.

Peter


From biopython at maubp.freeserve.co.uk  Wed Nov  3 16:45:25 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 3 Nov 2010 16:45:25 +0000
Subject: [Biopython-dev] Bio/cMarkovModelmodule.c
In-Reply-To: <781588.85801.qm@web62407.mail.re1.yahoo.com>
References: <AANLkTimkefVwSCjYSQPBhQ5SFyMFVPiJYiRSnC8G2ygQ@mail.gmail.com>
	<781588.85801.qm@web62407.mail.re1.yahoo.com>
Message-ID: <AANLkTi=U1bcLmbJczO3GNmkViBMe+0SrTJUQJ7LBGnha@mail.gmail.com>

On Sat, Oct 30, 2010 at 3:23 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
>
> OK, done. In the end, I put the warning message in MarkovModel.py anyway,
> since it's very easy to miss if it's in setup.py.
>

Do we really need the warning? I guess otherwise people using this code
might notice a drop in performance if they were using our C code version,
updated their Biopython, and then get the Python fallback if their NumPy
is too old.

If we do keep the warning should it be silenced in test_MarkovModel.py?
Something like the patch below should do it...

Peter

diff --git a/Tests/test_MarkovModel.py b/Tests/test_MarkovModel.py
index fc5ae8b..bb3afe8 100644
--- a/Tests/test_MarkovModel.py
+++ b/Tests/test_MarkovModel.py
@@ -9,7 +9,12 @@ except ImportError:
     raise MissingPythonDependencyError(\
         "Install NumPy if you want to use Bio.MarkovModel.")

+import warnings
+#Silence this warning:
+#For optimal speed, please update to Numpy version 1.3 or later
+warnings.filterwarnings("ignore", category=UserWarning)
 from Bio import MarkovModel
+warnings.filters.pop(0)

 def print_mm(markov_model):
     print "STATES: %s" % ' '.join(markov_model.states)
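
For anyone wanting to try the idea outside the test suite, here is a
self-contained sketch. It assumes Python 2.6 or later, where
warnings.catch_warnings() is available (so it goes beyond the Python 2.4
support discussed on this list), and it is written in Python 3 syntax.
The function noisy_import is a hypothetical stand-in for importing
Bio.MarkovModel with an old NumPy; it is not Biopython code.

```python
import warnings


def noisy_import():
    """Stand-in for an import that warns about an old NumPy (hypothetical)."""
    warnings.warn(
        "For optimal speed, please update to NumPy version 1.3 or later",
        UserWarning,
    )
    return "MarkovModel"


def call_quietly(func):
    """Run func with UserWarning silenced, then restore the old filters."""
    with warnings.catch_warnings():
        # This filter only lives until the end of the with block.
        warnings.simplefilter("ignore", category=UserWarning)
        return func()


# Usage: record warnings to show what does and does not get through.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    quiet_result = call_quietly(noisy_import)
    warnings_while_quiet = len(caught)
    noisy_import()
    warnings_afterwards = len(caught)
```

The context manager restores the filter list on exit, so there is no need
to remove the temporary entry from warnings.filters by hand.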


From biopython at maubp.freeserve.co.uk  Wed Nov  3 17:17:46 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 3 Nov 2010 17:17:46 +0000
Subject: [Biopython-dev] Continuous integration server
In-Reply-To: <AANLkTikQNr-VfKtF5w-BbXLawb6hMBPegBerg9yb7jC+@mail.gmail.com>
References: <AANLkTikQNr-VfKtF5w-BbXLawb6hMBPegBerg9yb7jC+@mail.gmail.com>
Message-ID: <AANLkTik5BxRuFN4T6rA=hqAjK0LwGpQDqgfz94bFPsGm@mail.gmail.com>

2010/10/30 Tiago Antão <tiagoantao at gmail.com>:
> Hi all,
>
> I've been hacking with buildbot, an integration server. This is to
> allow continuous testing of Biopython. So that we are alerted of any
> problems as soon as somebody does a dreadful commit (I have the top 5
> of most dreadful commits, so it was fair that I should try to do
> something about it).
>
> Things are still incomplete, but I think it is time to inform the list
> of this effort...
> To know more about buildbot you can either go to the buildbot site
> http://buildbot.net/ or see the draft doc that I have been preparing
> http://biopython.org/wiki/Continuous_integration
> There is a draft server here:
> http://events.open-bio.org:8010/
> The cool thing about buildbot is that actual testing is done by
> volunteer computers. Want to test on OS y, Python version z? You can
> offer the idle time of your laptop for that...
>

It is looking impressive Tiago - excellent work :)

>
> Obvious things missing:
>
> 0. First and foremost, see if people like this?

Looks very promising.

> 1. Changing the biopython test code to avoid stressing the network
> (i.e., having a run_tests option that will not test network tests).
> This to avoid imposing continuous traffic on genbank and friends. This
> is a show stopper.

Certainly we can't scale this up to many machines running regular
testing without limiting the network access somewhat.

> 2. Maybe warn the mailing list when some fundamental build stops
> working (e.g. send an email when a python 2.x build stops working)
> 3. Have test servers with all the applications installed (do you want
> to volunteer? This is more to do with volunteers)

I would expect "core" developers to have machines with most of
the command line applications used in Biopython's tests already
installed - but yes, we do want to make sure each optional
command line tool or library is installed on at least one build slave.

> 4. Maybe change run_tests to require all tests to be done. If we are
> doing integration testing, we want all tests to be done (missing
> applications or libraries should be an error). As an example, none of
> my tests are complete

This is about how it currently skips tests missing external
dependencies (like PopGen command line tools in your case).
I think that is OK, otherwise we'll get false positives (see below,
we can't satisfy all dependencies on all platforms).

> 5. Support mac (my access to Mr Job's fashion machines is limited).
> Again this is more a volunteer issue.

My main work machine is a Mac, so this shouldn't be an issue.

> 6. Discuss policies: One test a day? Full tests or updates? Full
> network tests (probably sporadically)? Send emails?

Right now triggering tests after each commit isn't easy to do,
is it (due to limited git support in buildbot)? That might be nice
but in the short term running the tests once a day is a big step
forward.

I'd suggest we do network tests once a week (or fortnight?).

> 7. Find volunteers to cover several OSes and several Python
> versions. Assure that people do full tests (i.e. with all applications
> and libraries)

That isn't possible - some applications are not available on Windows,
and some libraries are not available on Jython or Python 3 (yet).

> 8. While I have volunteer Windows testing myself, I will not be able
> to maintain it regularly.

I have access to a Windows machine (which I use to build the
Biopython installers) but currently it is only online intermittently.
I'd have to reorganise machines due to limited network ports in
the office, but it could in principle be used as a buildbot slave.

>
> Opinions are most welcome
>

What is wrong with your Linux Python 3.1 slave? It seems that
2to3 is failing on the doctest conversion.

Peter


From tiagoantao at gmail.com  Thu Nov  4 12:04:17 2010
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Thu, 4 Nov 2010 12:04:17 +0000
Subject: [Biopython-dev] Continuous integration server
In-Reply-To: <AANLkTik5BxRuFN4T6rA=hqAjK0LwGpQDqgfz94bFPsGm@mail.gmail.com>
References: <AANLkTikQNr-VfKtF5w-BbXLawb6hMBPegBerg9yb7jC+@mail.gmail.com>
	<AANLkTik5BxRuFN4T6rA=hqAjK0LwGpQDqgfz94bFPsGm@mail.gmail.com>
Message-ID: <AANLkTimnLwnBC2bx8S5POa1GCXjne2M4g6AsENJU_s-h@mail.gmail.com>

2010/11/3 Peter <biopython at maubp.freeserve.co.uk>:
> Certainly we can't scale this up to many machines running regular
> testing without limiting the network access somewhat.

As we discussed before, I was thinking of adding an option to
run_tests.py (like --offline) and changing the tests that access the
Internet to honour that flag. I was thinking of coding this myself and
then sending it to the list for approval (I am not going to make big
changes to the test framework myself without passing them through here).
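
A rough sketch of how such a flag might be honoured follows. It is written
for a current Python with unittest skipping support and is not the actual
run_tests.py code: the OFFLINE switch, the requires_internet decorator and
the example test case are hypothetical names used only for illustration.

```python
import argparse
import functools
import unittest

# Hypothetical module-level switch, set from the command line.
OFFLINE = {"enabled": False}


def requires_internet(test_func):
    """Skip the decorated test when the runner was started with --offline."""
    @functools.wraps(test_func)
    def wrapper(*args, **kwargs):
        if OFFLINE["enabled"]:
            raise unittest.SkipTest("offline mode: network test skipped")
        return test_func(*args, **kwargs)
    return wrapper


class ExampleTests(unittest.TestCase):
    def test_local_parsing(self):
        self.assertEqual("ACGT".lower(), "acgt")

    @requires_internet
    def test_remote_lookup(self):
        # A real test would contact a remote service here.
        self.assertTrue(True)


def run(argv):
    """Parse the (hypothetical) --offline flag, then run the example tests."""
    parser = argparse.ArgumentParser(description="Toy test runner")
    parser.add_argument("--offline", action="store_true",
                        help="skip tests that need Internet access")
    OFFLINE["enabled"] = parser.parse_args(argv).offline
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)


offline_result = run(["--offline"])
online_result = run([])
```

Skipped tests are reported separately from failures, so a build slave
running offline would still show a passing run.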

>> 6. Discuss policies: One test a day? Full tests or updates? Full
>> network tests (probably sporadically)? Send emails?
>
> Right now triggering tests after each commit isn't easy to do
> is it (due to limited git support in buildbot)? That might be nice
> but in the short term running the tests once a day is a big step
> forward.

It is actually quite easy (with a hook on GitHub), but I would
suggest leaving this for version 2: let's get the fundamentals working
and then add bells and whistles.

> I'd suggest we do network tests once a week (or fortnight?).

OK, I will go ahead and do some changes to run_tests.py as per above.

> That isn't possible - some applications are not available on Windows,
> and some libraries are not available on Jython or Python 3 (yet).


OK, we just have to be sure (manually) that all applications that need
testing are tested.

>> 8. While I have volunteer Windows testing myself, I will not be able
>> to maintain it regularly.
>
> I have access to a Windows machine (which I use to build the
> Biopython installers) but currently it is only online intermittently.
> I'd have to reorganise machines due to limited network ports in
> the office, but it could in principle be used as a buildbot slave.

Regarding Mac and Windows, I will email again as soon as we have the
network issue sorted out. Before that we would be doing maybe too much
traffic as we have no way to stop the network access for now.

> What is wrong with your Linux Python 3.1 slave? It seems that
> 2to3 is failing on the doctest conversion.

I do not have time to evaluate this now, I will trace this issue over
the weekend.

Tiago


From biopython at maubp.freeserve.co.uk  Thu Nov  4 12:28:50 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 4 Nov 2010 12:28:50 +0000
Subject: [Biopython-dev] Continuous integration server
In-Reply-To: <AANLkTimnLwnBC2bx8S5POa1GCXjne2M4g6AsENJU_s-h@mail.gmail.com>
References: <AANLkTikQNr-VfKtF5w-BbXLawb6hMBPegBerg9yb7jC+@mail.gmail.com>
	<AANLkTik5BxRuFN4T6rA=hqAjK0LwGpQDqgfz94bFPsGm@mail.gmail.com>
	<AANLkTimnLwnBC2bx8S5POa1GCXjne2M4g6AsENJU_s-h@mail.gmail.com>
Message-ID: <AANLkTi=QPiwjis+o91AXZR90fd-zVHgd59E-C_6+Mg5Q@mail.gmail.com>

2010/11/4 Tiago Antão <tiagoantao at gmail.com>:
> 2010/11/3 Peter <biopython at maubp.freeserve.co.uk>:
>> Certainly we can't scale this up to many machines running regular
>> testing without limiting the network access somewhat.
>
> As we discussed before, I was thinking of adding an option to
> run_tests.py (like --offline) and changing the tests that access the
> Internet to honour that flag. I was thinking of coding this myself and
> then sending it to the list for approval (I am not going to make big changes
> to the test framework myself without passing them through here).

Yep, that sounds good.

The previous discussion is here if anyone missed it:
http://lists.open-bio.org/pipermail/biopython-dev/2010-October/008295.html

>>> 6. Discuss policies: One test a day? Full tests or updates? Full
>>> network tests (probably sporadically)? Send emails?
>>
>> Right now triggering tests after each commit isn't easy to do
>> is it (due to limited git support in buildbot)? That might be nice
>> but in the short term running the tests once a day is a big step
>> forward.
>
> It is actually quite easy (with a hook on GitHub), but I would
> suggest leaving this for version 2: let's get the fundamentals working
> and then add bells and whistles.

I agree.

>> I'd suggest we do network tests once a week (or fortnight?).
>
> OK, I will go ahead and do some changes to run_tests.py as per above.
>
>> That isn't possible - some applications are not available on Windows,
>> and some libraries are not available on Jython or Python 3 (yet).
>
> OK, we just have to be sure (manually) that all applications that need
> tested are tested.

Yes, that will be a manual task. When we document the slave setup
process we can list which applications we ideally want people to install
on each OS. Having a slight range in versions would actually be a good
thing here.

>>> 8. While I have volunteer Windows testing myself, I will not be able
>>> to maintain it regularly.
>>
>> I have access to a Windows machine (which I use to build the
>> Biopython installers) but currently it is only online intermittently.
>> I'd have to reorganise machines due to limited network ports in
>> the office, but it could in principle be used as a buildbot slave.
>
> Regarding Mac and Windows, I will email again as soon as we have the
> network issue sorted out. Before that we would be doing maybe too much
> traffic as we have no way to stop the network access for now.
>
>> What is wrong with your Linux Python 3.1 slave? It seems that
>> 2to3 is failing on the doctest conversion.
>
> I do not have time to evaluate this now, I will trace this issue over
> the weekend.

Sure.

And once the --offline switch is working, we can start adding slaves
(and documenting how to do it to assist future volunteers).

Good work Tiago :)

Peter


From bugzilla-daemon at portal.open-bio.org  Thu Nov  4 16:49:45 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 4 Nov 2010 12:49:45 -0400
Subject: [Biopython-dev] [Bug 3139] python setup.py test ends with error
	code 0 even on failure
In-Reply-To: <bug-3139-42@http.bugzilla.open-bio.org/>
Message-ID: <201011041649.oA4GnjEw008477@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=3139


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2010-11-04 12:49 EST -------
Fix checked in by Tiago, marking as fixed.

http://github.com/biopython/biopython/commit/457ce49a060fe540f98aa37a6266cff17864487b


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From biopython at maubp.freeserve.co.uk  Thu Nov  4 17:13:33 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 4 Nov 2010 17:13:33 +0000
Subject: [Biopython-dev] Biopython 1.56 release plans
Message-ID: <AANLkTikq5TXOhAB-WVurn=WDNM8GiCrPRznrjcZ0Caew@mail.gmail.com>

Hi all,

I've mentioned in recent threads that I think we should try and
release Biopython 1.56 this month (November 2010).

I think the NEWS file is pretty up to date, and covers important
new functionality like Andrea Pierleoni's UniProt XML parser
and the IMGT support (with Uri Laserson).

Is there any other functionality which is ready for merging?

For example, Tiago - you've been doing lots of work on your
branch with the PopGen code. Is that code ready? I'm willing
to do the git merge/rebase.

Is there any reason to bother with a beta release this time?

If there are no pressing additions, I may be able to do the
release tomorrow - otherwise how about aiming for Thursday
or Friday next week (11 or 12 November)?

Regards,

Peter


From mjldehoon at yahoo.com  Fri Nov  5 09:40:19 2010
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 5 Nov 2010 02:40:19 -0700 (PDT)
Subject: [Biopython-dev] Biopython 1.56 release plans
In-Reply-To: <AANLkTikq5TXOhAB-WVurn=WDNM8GiCrPRznrjcZ0Caew@mail.gmail.com>
Message-ID: <701600.10148.qm@web62403.mail.re1.yahoo.com>

I think the following should be removed before the release:

Bio/SwissProt/SProt.py
Bio/Transcribe.py
Bio/Translate.py

as well as the Iterator class in Bio/SCOP/Dom.py.

These have been deprecated since Biopython 1.52.

Best,
--Michiel.


From tiagoantao at gmail.com  Fri Nov  5 10:13:09 2010
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Fri, 5 Nov 2010 10:13:09 +0000
Subject: [Biopython-dev] Biopython 1.56 release plans
In-Reply-To: <AANLkTikq5TXOhAB-WVurn=WDNM8GiCrPRznrjcZ0Caew@mail.gmail.com>
References: <AANLkTikq5TXOhAB-WVurn=WDNM8GiCrPRznrjcZ0Caew@mail.gmail.com>
Message-ID: <AANLkTimUFDQNh3gw4eT1F6=0rbsG7GgKF4X-7NCvPFA9@mail.gmail.com>

On Thu, Nov 4, 2010 at 5:13 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> For example, Tiago - you've been doing lots of work on your
> branch with the PopGen code. Is that code ready? I'm willing
> to do the git merge/rebase.

I was hoping that you would offer to do a merge ;) . <sarcasm> Though we
need a broken repository to test the integration server, so maybe I
could do it myself </sarcasm>.
Yes, the code is ready.
After the merge I will still add a couple of functions (also ready,
but not committed) and make sure the test cases are fully ready. But
it should be a day only and better done after the merge.
This is mainly new code that does much faster GENEPOP parsing and
supports AFLP processing.

Tiago


From biopython at maubp.freeserve.co.uk  Fri Nov  5 10:19:53 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 5 Nov 2010 10:19:53 +0000
Subject: [Biopython-dev] Biopython 1.56 release plans
In-Reply-To: <AANLkTimUFDQNh3gw4eT1F6=0rbsG7GgKF4X-7NCvPFA9@mail.gmail.com>
References: <AANLkTikq5TXOhAB-WVurn=WDNM8GiCrPRznrjcZ0Caew@mail.gmail.com>
	<AANLkTimUFDQNh3gw4eT1F6=0rbsG7GgKF4X-7NCvPFA9@mail.gmail.com>
Message-ID: <AANLkTi=1WC_rM_BctDzU+ubLj=w9o8Q-W5tAYogs9ND=@mail.gmail.com>

2010/11/5 Tiago Antão <tiagoantao at gmail.com>:
> On Thu, Nov 4, 2010 at 5:13 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>> For example, Tiago - you've been doing lots of work on your
>> branch with the PopGen code. Is that code ready? I'm willing
>> to do the git merge/rebase.
>
> I was hoping that you would offer to do a merge ;) . <sarcasm> Though we
> need a broken repository to test the integration server, so maybe I
> could do it myself </sarcasm>.
> Yes, the code is ready.

OK - I'll try to get your code, rebase it onto the current master,
then post it as a new branch for you to check. Once that is OK,
I'll rebase it again if the master has changed, then fast-forward
merge it to the master (that way we don't get a split and join on
the master history - just a sudden batch of commits).

> After the merge I will still add a couple of functions (also ready,
> but not committed) and make sure the test cases are fully ready.
> But it should be a day only and better done after the merge.
> This is mainly new code that does much faster GENEPOP
> parsing and supports AFLP processing.

Hopefully we can get that part done early next week.

Peter


From biopython at maubp.freeserve.co.uk  Fri Nov  5 10:23:26 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 5 Nov 2010 10:23:26 +0000
Subject: [Biopython-dev] Biopython 1.56 release plans
In-Reply-To: <701600.10148.qm@web62403.mail.re1.yahoo.com>
References: <AANLkTikq5TXOhAB-WVurn=WDNM8GiCrPRznrjcZ0Caew@mail.gmail.com>
	<701600.10148.qm@web62403.mail.re1.yahoo.com>
Message-ID: <AANLkTi=feVugOz6M6uK3E=SjKw3Ett4MahGTkLs80Xje@mail.gmail.com>

On Fri, Nov 5, 2010 at 9:40 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> I think the following should be removed before the release:
>
> Bio/SwissProt/SProt.py
> Bio/Transcribe.py
> Bio/Translate.py
>
> as well as the Iterator class in Bio/SCOP/Dom.py.
>
> These have been deprecated since Biopython 1.52.

According to the DEPRECATED file, those modules were
deprecated in Biopython 1.51, so they are definitely due for
removal. In any case Biopython 1.52 was over a year
ago [1] as it was released 22 September 2009.

Please go ahead and tidy this up.

Thanks,

Peter

[1] http://www.biopython.org/wiki/Deprecation_policy


From biopython at maubp.freeserve.co.uk  Fri Nov  5 10:47:12 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 5 Nov 2010 10:47:12 +0000
Subject: [Biopython-dev] Biopython 1.56 release plans
In-Reply-To: <AANLkTi=1WC_rM_BctDzU+ubLj=w9o8Q-W5tAYogs9ND=@mail.gmail.com>
References: <AANLkTikq5TXOhAB-WVurn=WDNM8GiCrPRznrjcZ0Caew@mail.gmail.com>
	<AANLkTimUFDQNh3gw4eT1F6=0rbsG7GgKF4X-7NCvPFA9@mail.gmail.com>
	<AANLkTi=1WC_rM_BctDzU+ubLj=w9o8Q-W5tAYogs9ND=@mail.gmail.com>
Message-ID: <AANLkTi=MYN1AjZudZ7hdV6MYVgR4UifakWXCtjk1zUFs@mail.gmail.com>

2010/11/5 Peter <biopython at maubp.freeserve.co.uk>:
> 2010/11/5 Tiago Antão <tiagoantao at gmail.com>:
>> On Thu, Nov 4, 2010 at 5:13 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>>> For example, Tiago - you've been doing lots of work on your
>>> branch with the PopGen code. Is that code ready? I'm willing
>>> to do the git merge/rebase.
>>
>> I was hoping that you would offer to do a merge ;) . <sarcasm> Though we
>> need a broken repository to test the integration server, so maybe I
>> could do it myself </sarcasm>.
>> Yes, the code is ready.
>
> OK - I'll try to get your code, rebase it onto the current master,
> then post it as a new branch for you to check.

Notes on how I did this:

$ git remote add tiago https://github.com/tiagoantao/biopython.git
$ git fetch tiago
...
>From https://github.com/tiagoantao/biopython
 * [new branch]      buildbot   -> tiago/buildbot
 * [new branch]      master     -> tiago/master

Now I want your "master" branch, but that name clashes with
my "master" branch... the following worked here:

$ git checkout tiago/master
Note: moving to "tiago/master" which isn't a local branch
If you want to create a new branch from this checkout, you may do so
(now or later) by using -b with the checkout command again. Example:
  git checkout -b <new_branch_name>
HEAD is now at 21b7a22... Merge branch 'master' of github.com:tiagoantao/biopython
$ git checkout -b tiago-pop-gen
Switched to a new branch "tiago-pop-gen"

Now I want to write the history of your PopGen work as though
it was started from the current state of the master branch. I
was hoping there would have been no changes to the PopGen
code on the master so that this would be trivial...

$ git rebase master
...
CONFLICT (content): Merge conflict in Bio/PopGen/FDist/__init__.py
...

So open Bio/PopGen/FDist/__init__.py and look for the merge failures
(which are marked with <<<<<<< to >>>>>>>). In this case it was the
removal of some deprecated code done on the pop gen branch, which
was only deprecated in Biopython 1.55 so it is a bit premature to remove
it already. So I fixed up Bio/PopGen/FDist/__init__.py and saved it. Then:

$ git add Bio/PopGen/FDist/__init__.py
$ git rebase --continue
...

This seems to have worked. I can now do a comparison to
the master branch,

$ git diff master
...

After running the unit tests (which was of limited value as I don't
have FDist installed on this machine), I then pushed it online:

$ git push peterjc tiago-pop-gen

The rebased branch is now here:
https://github.com/peterjc/biopython/tree/tiago-pop-gen

If you agree the rebased branch is sane, it should be trivial to
now merge that onto the master as a fast-forward merge.
(But I would check first that the master hasn't changed, and
if it has, repeat the rebase).

Peter


From tiagoantao at gmail.com  Fri Nov  5 10:50:32 2010
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Fri, 5 Nov 2010 10:50:32 +0000
Subject: [Biopython-dev] Biopython 1.56 release plans
In-Reply-To: <AANLkTi=MYN1AjZudZ7hdV6MYVgR4UifakWXCtjk1zUFs@mail.gmail.com>
References: <AANLkTikq5TXOhAB-WVurn=WDNM8GiCrPRznrjcZ0Caew@mail.gmail.com>
	<AANLkTimUFDQNh3gw4eT1F6=0rbsG7GgKF4X-7NCvPFA9@mail.gmail.com>
	<AANLkTi=1WC_rM_BctDzU+ubLj=w9o8Q-W5tAYogs9ND=@mail.gmail.com>
	<AANLkTi=MYN1AjZudZ7hdV6MYVgR4UifakWXCtjk1zUFs@mail.gmail.com>
Message-ID: <AANLkTim+pFsPSt2wMvwpLsy=ks4rP6rvX6zC8DBwQqwK@mail.gmail.com>

2010/11/5 Peter <biopython at maubp.freeserve.co.uk>:
> If you agree the rebased branch is sane, it should be trivial to
> now merge that onto the master as a fast-forward merge.
> (But I would check first that the master hasn't changed, and
> if it has, repeat the rebase).

Many thanks for the guide, maybe in the future I will have the courage
to do it myself.

Go ahead and commit the changes. I will make sure the module is sane
this Sunday.


From biopython at maubp.freeserve.co.uk  Fri Nov  5 11:08:54 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 5 Nov 2010 11:08:54 +0000
Subject: [Biopython-dev] Biopython 1.56 release plans
In-Reply-To: <AANLkTim+pFsPSt2wMvwpLsy=ks4rP6rvX6zC8DBwQqwK@mail.gmail.com>
References: <AANLkTikq5TXOhAB-WVurn=WDNM8GiCrPRznrjcZ0Caew@mail.gmail.com>
	<AANLkTimUFDQNh3gw4eT1F6=0rbsG7GgKF4X-7NCvPFA9@mail.gmail.com>
	<AANLkTi=1WC_rM_BctDzU+ubLj=w9o8Q-W5tAYogs9ND=@mail.gmail.com>
	<AANLkTi=MYN1AjZudZ7hdV6MYVgR4UifakWXCtjk1zUFs@mail.gmail.com>
	<AANLkTim+pFsPSt2wMvwpLsy=ks4rP6rvX6zC8DBwQqwK@mail.gmail.com>
Message-ID: <AANLkTi=W8uAcNDT2FYm5KEq6h4uLK8TUyQodmNtYUg=_@mail.gmail.com>

2010/11/5 Tiago Antão <tiagoantao at gmail.com>:
> 2010/11/5 Peter <biopython at maubp.freeserve.co.uk>:
>> If you agree the rebased branch is sane, it should be trivial to
>> now merge that onto the master as a fast-forward merge.
>> (But I would check first that the master hasn't changed, and
>> if it has, repeat the rebase).
>
> Many thanks for the guide, maybe in the future I will have the
> courage to do it myself.
>
> Go ahead and commit the changes. I will make sure the module
> is sane this Sunday.

Done. The master hadn't changed in the meantime so I didn't
have to re-rebase:

$ git checkout master
Switched to branch "master"
$ git merge tiago-pop-gen
Updating 065e235..4f318a4
Fast forward
 Bio/PopGen/FDist/Async.py            |   21 +-
 Bio/PopGen/FDist/Controller.py       |  125 +-
 Bio/PopGen/FDist/Utils.py            |   68 +-
 Bio/PopGen/FDist/__init__.py         |    1 -
 Bio/PopGen/GenePop/EasyController.py |   10 +-
 Bio/PopGen/GenePop/FileParser.py     |   69 +-
 Tests/PopGen/data_dfst_outfile       |  300 +
 Tests/PopGen/dfdist1                 | 1204 +
 Tests/PopGen/dout.cpl                |  300 +
 Tests/PopGen/dout.dat                |50000 ++++++++++++++++++++++++++++++++++
 Tests/test_PopGen_DFDist.py          |  106 +
 Tests/test_PopGen_FDist_nodepend.py  |   20 +-
 12 files changed, 52176 insertions(+), 48 deletions(-)
 create mode 100644 Tests/PopGen/data_dfst_outfile
 create mode 100644 Tests/PopGen/dfdist1
 create mode 100644 Tests/PopGen/dout.cpl
 create mode 100644 Tests/PopGen/dout.dat
 create mode 100644 Tests/test_PopGen_DFDist.py

Then publishing it,

$ git push origin master
Counting objects: 120, done.
Delta compression using 8 threads.
Compressing objects: 100% (106/106), done.
Writing objects: 100% (106/106), 133.46 KiB, done.
Total 106 (delta 79), reused 0 (delta 0)
To git at github.com:biopython/biopython.git
   065e235..4f318a4  master -> master

And removing my now pointless public branch:

$ git push peterjc :tiago-pop-gen
To git at github.com:peterjc/biopython.git
 - [deleted]         tiago-pop-gen

We need to update the NEWS file now.

Peter


From mjldehoon at yahoo.com  Fri Nov  5 11:52:15 2010
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 5 Nov 2010 04:52:15 -0700 (PDT)
Subject: [Biopython-dev] Biopython 1.56 release plans
In-Reply-To: <AANLkTi=feVugOz6M6uK3E=SjKw3Ett4MahGTkLs80Xje@mail.gmail.com>
Message-ID: <645847.84052.qm@web62404.mail.re1.yahoo.com>

> > Bio/SwissProt/SProt.py
> > the Iterator class in Bio/SCOP/Dom.py

I have removed these.

> > Bio/Transcribe.py
> > Bio/Translate.py

These are still imported from Bio/Encodings/IUPACEncoding.py, which is imported from Bio/Alphabet/IUPAC.py. I have no idea what this code is doing. Does anybody know?

--Michiel.


From biopython at maubp.freeserve.co.uk  Fri Nov  5 12:01:45 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 5 Nov 2010 12:01:45 +0000
Subject: [Biopython-dev] Biopython 1.56 release plans
In-Reply-To: <645847.84052.qm@web62404.mail.re1.yahoo.com>
References: <AANLkTi=feVugOz6M6uK3E=SjKw3Ett4MahGTkLs80Xje@mail.gmail.com>
	<645847.84052.qm@web62404.mail.re1.yahoo.com>
Message-ID: <AANLkTikhuis9NVte79m9PZMb9pNoFBQvqqq+PwLXstAf@mail.gmail.com>

On Fri, Nov 5, 2010 at 11:52 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
>
>> > Bio/SwissProt/SProt.py
>> > the Iterator class in Bio/SCOP/Dom.py
>
> I have removed these.
>
>> > Bio/Transcribe.py
>> > Bio/Translate.py
>
> These are still imported from Bio/Encodings/IUPACEncoding.py, which
> is imported from Bio/Alphabet/IUPAC.py. I have no idea what this code
> is doing. Does anybody know?

Ah right - sorry, that had slipped my mind:
http://lists.open-bio.org/pipermail/biopython-dev/2010-September/008255.html

I had suggested we leave Bio.Transcribe and Bio.Translate in for
Biopython 1.56 and remove them (and Bio.utils, Bio.PropertyManager,
and Bio.Encodings.IUPACEncoding) for Biopython 1.57

Peter


From mjldehoon at yahoo.com  Fri Nov  5 12:08:17 2010
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 5 Nov 2010 05:08:17 -0700 (PDT)
Subject: [Biopython-dev] Biopython 1.56 release plans
In-Reply-To: <AANLkTikq5TXOhAB-WVurn=WDNM8GiCrPRznrjcZ0Caew@mail.gmail.com>
Message-ID: <772269.63506.qm@web62407.mail.re1.yahoo.com>

I'd like to suggest also that we deprecate Bio.Prosite.Prodoc; this functionality moved to Bio.ExPASy.Prodoc at least since release 1.50, and the module has been labeled as obsolete since then. The enclosing module Bio.Prosite itself is already deprecated.

--Michiel.


From biopython at maubp.freeserve.co.uk  Fri Nov  5 12:19:27 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 5 Nov 2010 12:19:27 +0000
Subject: [Biopython-dev] Biopython 1.56 release plans
In-Reply-To: <772269.63506.qm@web62407.mail.re1.yahoo.com>
References: <AANLkTikq5TXOhAB-WVurn=WDNM8GiCrPRznrjcZ0Caew@mail.gmail.com>
	<772269.63506.qm@web62407.mail.re1.yahoo.com>
Message-ID: <AANLkTin+Qn8K+eqNvXVU87VGCWGgDGu0xJO8gD_gaYQ0@mail.gmail.com>

On Fri, Nov 5, 2010 at 12:08 PM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
> I'd like to suggest also that we deprecate Bio.Prosite.Prodoc; this
> functionality moved to Bio.ExPASy.Prodoc at least since release 1.50,
> and the module has been labeled as obsolete since then. The enclosing
> module Bio.Prosite itself is already deprecated.

Since Bio.Prosite is deprecated that means Bio.Prosite.Prodoc (and any
other child modules) is too. If you try "from Bio.Prosite import Prodoc"
you get a deprecation warning. Feel free to add "(DEPRECATED)" to
the Bio.Prosite.Prodoc docstrings if you think it would be clearer.
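
For anyone wondering how this kind of warning is usually raised, here
is a minimal sketch using the standard warnings module. The helper name
warn_deprecated is my own invention for illustration, and this is not
a copy of what Bio/Prosite/__init__.py contains.

```python
import warnings


def warn_deprecated(old_name, new_name):
    """Issue a DeprecationWarning pointing users at the replacement.

    Hypothetical helper for this sketch. stacklevel=2 makes the warning
    point at the caller's line rather than at this function.
    """
    warnings.warn(
        "%s is deprecated; please use %s instead." % (old_name, new_name),
        DeprecationWarning,
        stacklevel=2,
    )


# In a real package this call would sit at the top of the package's
# __init__.py, so it runs once when the package is first imported.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warn_deprecated("Bio.Prosite.Prodoc", "Bio.ExPASy.Prodoc")

print(caught[0].category.__name__)
print(caught[0].message)
```

Because Python imports a parent package before any of its child
modules, a warning placed in the parent's __init__.py also fires when
only a child module is imported.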

Peter


From andrea at biocomp.unibo.it  Fri Nov  5 16:43:16 2010
From: andrea at biocomp.unibo.it (Andrea Pierleoni)
Date: Fri, 5 Nov 2010 17:43:16 +0100 (CET)
Subject: [Biopython-dev] Merging Uniprot XML parser?
In-Reply-To: <AANLkTimcrZBsL_1re6wYn0qr2H3Z-0Tq3Wo7748Pifvz@mail.gmail.com>
References: <AANLkTineNfa+eMqcUyN7+anQ4OQOyLnVYOT+gM5H_Qg3@mail.gmail.com>
	<AANLkTimcrZBsL_1re6wYn0qr2H3Z-0Tq3Wo7748Pifvz@mail.gmail.com>
Message-ID: <3cb74578eeedb8825ef75202c909b843.squirrel@lipid.biocomp.unibo.it>

> On Tue, Oct 19, 2010 at 4:54 PM, Peter <biopython at maubp.freeserve.co.uk>
> wrote:
> I've now merged this into the trunk (with a git rebase first so the
> history
> is linear - no branch+merge), and Andrea has agreed to retest it.
> Other testing and comments are most welcome.
>
> Peter
>


I've run a couple of tests from the master Biopython branch.
The uniprot-xml parser successfully parsed the 2010_11 release of UniProt,
containing 522,019 entries.

The plain text 'swiss' parser took 6 minutes to parse the complete flatfile
UniProt db on my system (Python 2.6 on a MacBook Pro, Core 2 Duo).
The uniprot-xml parser took 12 minutes to do the same task when using
cElementTree, which looks pretty good to me (compare this to the 8 minutes
I needed to download the gzipped db).
However, it took more than 80 minutes to do the same task using ElementTree.
So be aware that the parser can become very slow without the C library.
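
The usual way to prefer the C library is an import fallback like the
sketch below. This is only an illustration and not the code in
UniprotIO.py: the count_entries function and the tiny sample document
are made up for the example. On recent Python 3 versions the separate
cElementTree module no longer exists and the plain ElementTree module
uses the C accelerator on its own, so the fallback branch is simply
taken there.

```python
from io import BytesIO

try:
    # C implementation, a separate module on older Python versions.
    from xml.etree import cElementTree as ElementTree
except ImportError:
    # Pure-Python module name (C-accelerated internally on newer Pythons).
    from xml.etree import ElementTree


def count_entries(handle, tag="entry"):
    """Count elements named `tag` while streaming through the file.

    Hypothetical helper for this sketch. Each finished element is
    cleared so its children do not pile up in memory.
    """
    count = 0
    for event, elem in ElementTree.iterparse(handle, events=("end",)):
        # Accept both plain and namespace-qualified tags ("{uri}entry").
        if elem.tag == tag or elem.tag.endswith("}" + tag):
            count += 1
            elem.clear()
    return count


sample = BytesIO(
    b"<uniprot>"
    b"<entry><accession>P12345</accession></entry>"
    b"<entry><accession>Q67890</accession></entry>"
    b"</uniprot>"
)
print(count_entries(sample))  # prints 2
```

Streaming with iterparse rather than loading the whole tree is what
keeps memory use reasonable on a file of several GB.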

I'm currently retesting on TrEMBL as well, but I don't think there is going
to be any problem.
I have no idea about the performance with Jython and similar implementations
of Python, nor whether it works at all.

Andrea


From eric.talevich at gmail.com  Fri Nov  5 17:26:03 2010
From: eric.talevich at gmail.com (Eric Talevich)
Date: Fri, 5 Nov 2010 13:26:03 -0400
Subject: [Biopython-dev] Merging Uniprot XML parser?
In-Reply-To: <3cb74578eeedb8825ef75202c909b843.squirrel@lipid.biocomp.unibo.it>
References: <AANLkTineNfa+eMqcUyN7+anQ4OQOyLnVYOT+gM5H_Qg3@mail.gmail.com>
	<AANLkTimcrZBsL_1re6wYn0qr2H3Z-0Tq3Wo7748Pifvz@mail.gmail.com>
	<3cb74578eeedb8825ef75202c909b843.squirrel@lipid.biocomp.unibo.it>
Message-ID: <AANLkTin7khy0GkxBd1LMJpCtQeAaFHybS1v4C+52FdK5@mail.gmail.com>

On Fri, Nov 5, 2010 at 12:43 PM, Andrea Pierleoni
<andrea at biocomp.unibo.it>wrote:

>
> I've run a couple of tests from the master Biopython branch.
> The uniprot-xml parser successfully parsed the 2010_11 release of uniprot
> containing
> 522,019 entries.
>
> [...]
>
> I have no idea of the performances with jython, and similar derivations of
> python, nor if it works.
>
>
Speaking from my experience with ElementTree in Bio.Phylo -- Jython 2.5's
implementation of xml.etree should work as a drop-in replacement, but it's
painfully slow. However, I've read that the next release of Jython will
include some substantial overall speedups, which should make it more
competitive.

I once tried to get Biopython working on IronPython (on Mono, on Linux), but
didn't succeed. The release I used didn't seem to have a compatible
xml.etree implementation, though the developers may have made progress on
this recently.

-Eric


From biopython at maubp.freeserve.co.uk  Fri Nov  5 17:53:50 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 5 Nov 2010 17:53:50 +0000
Subject: [Biopython-dev] Merging Uniprot XML parser?
In-Reply-To: <3cb74578eeedb8825ef75202c909b843.squirrel@lipid.biocomp.unibo.it>
References: <AANLkTineNfa+eMqcUyN7+anQ4OQOyLnVYOT+gM5H_Qg3@mail.gmail.com>
	<AANLkTimcrZBsL_1re6wYn0qr2H3Z-0Tq3Wo7748Pifvz@mail.gmail.com>
	<3cb74578eeedb8825ef75202c909b843.squirrel@lipid.biocomp.unibo.it>
Message-ID: <AANLkTikCzLALtfhydpM7n3=fC=0+WoSuMnuzFxhmwgvV@mail.gmail.com>

On Fri, Nov 5, 2010 at 4:43 PM, Andrea Pierleoni wrote:
>
> On Tue, Oct 19, 2010 at 4:54 PM, Peter wrote:
>> I've now merged this into the trunk (with a git rebase first so the
>> history is linear - no branch+merge), and Andrea has agreed to
>> retest it. Other testing and comments are most welcome.
>>
>> Peter
>>
>
>
> I've run a couple of tests from the master Biopython branch.
> The uniprot-xml parser successfully parsed the 2010_11 release
> of uniprot containing 522,019 entries.
>
> The plain text 'swiss' parser took 6 mins to parse the complete flatfile
> uniprot db on my system (python 2.6 on a macbook pro, core2duo).
> the uniprot-xml parser took 12 minutes to do the same task when using
> cElementTree and looks pretty good to me (compare this to the 8
> minutes I needed to download the gzipped db).

I think I have a slightly older version as it only has 519348 entries.
My timings using Python 2.6 on Mac OS X, looping over the
file with Bio.SeqIO.parse() and incrementing a counter:

uniprot_sprot.fasta, 232 MB, 15s ("fasta")
uniprot_sprot.dat, 2.2 GB, 4m57s ("swiss")
uniprot_sprot.xml, 4.5 GB, 10m34s ("uniprot-xml")

Note the XML file is about twice the size of the plain text swiss
format file, and as you noted, takes about twice as long to parse.
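
For reference, the timing loop amounts to something like the sketch
below. It is a tidied-up illustration rather than my exact count.py
script; the function names count_records and time_parse are made up
here, and time_parse assumes Biopython is installed.

```python
import time


def count_records(records):
    """Count the items in an iterator without keeping them in memory."""
    count = 0
    for _ in records:
        count += 1
    return count


def time_parse(filename, format_name):
    """Return (number of records, elapsed seconds) for one file.

    Passing the filename lets SeqIO.parse open the file itself. The
    import is done here so the counting helper works without Biopython.
    """
    from Bio import SeqIO

    start = time.time()
    count = count_records(SeqIO.parse(filename, format_name))
    return count, time.time() - start
```

For example, time_parse("uniprot_sprot.dat", "swiss") would give the
entry count and wall-clock time for the plain text file.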

> However it took more than 80 mins to do the same task using
> ElementTree. So be aware that the parser can turn very slow
> without the C library.
>
> I'm currently retesting also on TrEMBL, but I don't think there is going
> to be any problem.

OK - those files are about 10 times bigger, right?

> I have no idea of the performances with jython, and similar
> derivations of python, nor if it works.

The tests all pass with Jython 2.5.1 (running under Mac OS X),
and here are some timings:

uniprot_sprot.fasta, 232 MB, 21s ("fasta")
uniprot_sprot.dat, 2.2 GB, 8m34s ("swiss")
uniprot_sprot.xml, 4.5 GB, FAILED ("uniprot-xml")

The XML file failed almost immediately with this traceback:

Traceback (most recent call last):
  File "../count.py", line 13, in <module>
    for record in SeqIO.parse(open(filename), format_name):
  File "../count.py", line 13, in <module>
    for record in SeqIO.parse(open(filename), format_name):
  File "/Users/xxx/jython2.5.1/Lib/site-packages/Bio/SeqIO/UniprotIO.py", line 80, in UniprotIterator
    for event, elem in ElementTree.iterparse(handle, events=("start", "end")):
  File "/Users/xxx/jython2.5.1/Lib/xml/etree/ElementTree.py", line 937, in next
    self._parser.feed(data)
  File "/Users/xxx/jython2.5.1/Lib/xml/etree/ElementTree.py", line 1245, in feed
    self._parser.Parse(data, 0)
  File "/Users/xxx/jython2.5.1/Lib/xml/parsers/expat.py", line 195, in Parse
    self._data.append(data)
	at java.util.Arrays.copyOf(Arrays.java:2882)
	at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
	at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:390)
	at java.lang.StringBuilder.append(StringBuilder.java:119)
	at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)

java.lang.OutOfMemoryError: java.lang.OutOfMemoryError: Java heap space


Note this wasn't a simple out of memory error (the machine had GBs
free), rather it was heap space. That's a bit frustrating - but Kyle's
email suggests things could improve in the next Jython release.

Peter


From andrea at biocomp.unibo.it  Fri Nov  5 18:09:08 2010
From: andrea at biocomp.unibo.it (Andrea Pierleoni)
Date: Fri, 5 Nov 2010 19:09:08 +0100 (CET)
Subject: [Biopython-dev] Merging Uniprot XML parser?
In-Reply-To: <AANLkTikCzLALtfhydpM7n3=fC=0+WoSuMnuzFxhmwgvV@mail.gmail.com>
References: <AANLkTineNfa+eMqcUyN7+anQ4OQOyLnVYOT+gM5H_Qg3@mail.gmail.com>
	<AANLkTimcrZBsL_1re6wYn0qr2H3Z-0Tq3Wo7748Pifvz@mail.gmail.com>
	<3cb74578eeedb8825ef75202c909b843.squirrel@lipid.biocomp.unibo.it>
	<AANLkTikCzLALtfhydpM7n3=fC=0+WoSuMnuzFxhmwgvV@mail.gmail.com>
Message-ID: <37e194782e740bf5bd2e872bfc6a37d3.squirrel@lipid.biocomp.unibo.it>


> I think I have a slightly older version as it only has 519348 entries.
> My timings using Python 2.6 on Mac OS X, looping over the
> file with Bio.SeqIO.parse() and incrementing a counter:
>
> uniprot_sprot.fasta, 232 MB, 15s ("fasta")
> uniprot_sprot.dat, 2.2 GB, 4m57s ("swiss")
> uniprot_sprot.xml, 4.5 GB, 10m34s ("uniprot-xml")
>

my timings were without the counter :)

> Note the XML file is about twice the size of the plain text swiss
> format file, and as you noted, takes about twice as long to parse.
>

Yes, that's true, but iterating over the two files takes 18s for the .dat
one and 38s for the .xml one. The information retrieved is more or less
the same; the rest is overhead due to the complexity of the XML file.
However, it's pretty fast anyway, at least with cElementTree.

>> I'm currently retesting also on TrEMBL, but I don't think there is going
>> to be any problem.
>
> OK - those files are about 10 times bigger, right?

It's currently 12 million entries! So it's 24 times bigger (7.5 GB gzipped);
in fact I can't complete the test today. I'll keep you updated.


>
> Note this wasn't a simple out of memory error (the machine had GBs
> free), rather it was heap space. That's a bit frustrating - but Kyle's
> email suggests things could improve in the next Jython release.
>


Is the new Jython release coming soon? I'm really a newbie to Jython,
so I don't think I can help with it. Maybe it is safer for Jython users to
use the 'swiss' parser until the new release comes out, particularly if they
have performance issues.

Andrea


From mjldehoon at yahoo.com  Sat Nov  6 02:41:57 2010
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 5 Nov 2010 19:41:57 -0700 (PDT)
Subject: [Biopython-dev] Bio/cMarkovModelmodule.c
In-Reply-To: <AANLkTi=U1bcLmbJczO3GNmkViBMe+0SrTJUQJ7LBGnha@mail.gmail.com>
Message-ID: <646748.14362.qm@web62407.mail.re1.yahoo.com>

--- On Wed, 11/3/10, Peter <biopython at maubp.freeserve.co.uk> wrote:
> > I put the warning message in MarkovModel.py anyway,
> > since it's very easy to miss if it's in setup.py.
> 
> Do we really need the warning? I guess otherwise people
> using this code
> might notice a drop in performance if they were using our C
> code version,
> updated their Biopython, and then get the Python fallback
> if their NumPy is too old.

We need the warning, otherwise we'd leave the user guessing as to why their code is suddenly slower.

> If we do keep the warning should it be silenced in
> test_MarkovModel.py?

OK I've added this warning.

--Michiel.
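
As an illustration of the fallback-with-warning idea discussed above, here
is a rough sketch. It is not the code in MarkovModel.py: the function name
load_implementation and the module name hypothetical_c_markov_model are
made up, and the wording of the message is an assumption.

```python
import importlib
import warnings


def load_implementation(module_name):
    """Return (module, is_compiled), warning if the C version is missing."""
    try:
        return importlib.import_module(module_name), True
    except ImportError:
        warnings.warn(
            "Could not import %s; using the slower pure Python code "
            "(perhaps NumPy is too old?)" % module_name
        )
        return None, False


# A test script could silence the warning like this:
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    module, is_compiled = load_implementation("hypothetical_c_markov_model")
print(is_compiled)
```

The catch_warnings() block restores the previous warning filters on exit,
so silencing the message in one test does not hide it elsewhere.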


From biopython at maubp.freeserve.co.uk  Mon Nov  8 16:12:06 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 8 Nov 2010 16:12:06 +0000
Subject: [Biopython-dev] Continuous integration server
In-Reply-To: <AANLkTi=QPiwjis+o91AXZR90fd-zVHgd59E-C_6+Mg5Q@mail.gmail.com>
References: <AANLkTikQNr-VfKtF5w-BbXLawb6hMBPegBerg9yb7jC+@mail.gmail.com>
	<AANLkTik5BxRuFN4T6rA=hqAjK0LwGpQDqgfz94bFPsGm@mail.gmail.com>
	<AANLkTimnLwnBC2bx8S5POa1GCXjne2M4g6AsENJU_s-h@mail.gmail.com>
	<AANLkTi=QPiwjis+o91AXZR90fd-zVHgd59E-C_6+Mg5Q@mail.gmail.com>
Message-ID: <AANLkTimctC=9-EVUYhF01SmOzR1wKQByVuvi96_vTrfZ@mail.gmail.com>

2010/11/4 Peter <biopython at maubp.freeserve.co.uk>:
>> As we discussed before, I was thinking of adding an option to
>> run_tests.py (like --offline) and changing the tests that access the
>> Internet to honour that flag. I was thinking of coding this myself and
>> then sending it to the list for approval (I am not going to make big changes
>> to the test framework myself without passing them through here).
>
> Yep, that sounds good.
>
> The previous discussion is here if anyone missed it:
> http://lists.open-bio.org/pipermail/biopython-dev/2010-October/008295.html
>

Hi Tiago,

I've implemented the proposed --offline switch in run_tests.py,

https://github.com/biopython/biopython/commit/b6bbcea355a8f71df8654256d8da6ef8b8c02697

Does that work for you? If you can come up with a more
elegant solution do speak up - mine is a bit of a hack ;)

Peter
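
For readers who have not looked at the commit, the general shape of an
--offline switch could look something like the sketch below. This is not
the code from the commit above: the ONLINE flag, the parse_args helper and
the example test class are all hypothetical, and argparse is used purely
for brevity.

```python
import argparse
import unittest

ONLINE = True  # module-level switch; hypothetical, for illustration only


def parse_args(argv):
    parser = argparse.ArgumentParser(description="Run the test suite")
    parser.add_argument("--offline", action="store_true",
                        help="skip tests which need Internet access")
    return parser.parse_args(argv)


class EntrezOnlineTest(unittest.TestCase):
    def test_fetch(self):
        if not ONLINE:
            self.skipTest("internet access disabled with --offline")
        # A real test would contact the remote server here.
        self.assertTrue(ONLINE)


args = parse_args(["--offline"])
ONLINE = not args.offline
suite = unittest.defaultTestLoader.loadTestsFromTestCase(EntrezOnlineTest)
result = unittest.TestResult()
suite.run(result)
print("skipped: %i" % len(result.skipped))
```

Skipping (rather than silently passing) keeps the offline tests visible in
the report, which matters on a build server.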


From tiagoantao at gmail.com  Mon Nov  8 16:17:07 2010
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Mon, 8 Nov 2010 16:17:07 +0000
Subject: [Biopython-dev] Continuous integration server
In-Reply-To: <AANLkTimctC=9-EVUYhF01SmOzR1wKQByVuvi96_vTrfZ@mail.gmail.com>
References: <AANLkTikQNr-VfKtF5w-BbXLawb6hMBPegBerg9yb7jC+@mail.gmail.com>
	<AANLkTik5BxRuFN4T6rA=hqAjK0LwGpQDqgfz94bFPsGm@mail.gmail.com>
	<AANLkTimnLwnBC2bx8S5POa1GCXjne2M4g6AsENJU_s-h@mail.gmail.com>
	<AANLkTi=QPiwjis+o91AXZR90fd-zVHgd59E-C_6+Mg5Q@mail.gmail.com>
	<AANLkTimctC=9-EVUYhF01SmOzR1wKQByVuvi96_vTrfZ@mail.gmail.com>
Message-ID: <AANLkTi=_mj2se5k6wAEDKyhGrUqkhqgS-geAfkh+8+zx@mail.gmail.com>

2010/11/8 Peter <biopython at maubp.freeserve.co.uk>:
> I've implemented the proposed --offline switch in run_tests.py,
>
> https://github.com/biopython/biopython/commit/b6bbcea355a8f71df8654256d8da6ef8b8c02697
>
> Does that work for you? If you can come up with a more
> elegant solution do speak up - mine is a bit of a hack ;)


Thanks a lot. I was waiting for the 1.56 release to work on this thing
(to avoid adding entropy). But as this is now in, I will progress
immediately with the rest of the integration server work. I will
contact you soon regarding Mac testing.


From tiagoantao at gmail.com  Mon Nov  8 16:34:31 2010
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Mon, 8 Nov 2010 16:34:31 +0000
Subject: [Biopython-dev] Bio/Entrez/__init__.py
Message-ID: <AANLkTimqTeR7qERSW8ATepANRxQeVRyVNg=BBnXpXYXW@mail.gmail.com>

Hi,

There is a doctest line that is making 2to3 go bonkers on Bio.Entrez
(__init__.py)
Line 55
             >>> for record in records:
             ...     # each record is a Python dictionary or list.

Simply adding a
...       pass

Is enough (the code should not work as it is an empty for, so 2to3 is
actually correct)

-- 
"If you want to get laid, go to college. If you want an education, go
to the library." - Frank Zappa


From biopython at maubp.freeserve.co.uk  Mon Nov  8 16:38:08 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 8 Nov 2010 16:38:08 +0000
Subject: [Biopython-dev] Bio/Entrez/__init__.py
In-Reply-To: <AANLkTimqTeR7qERSW8ATepANRxQeVRyVNg=BBnXpXYXW@mail.gmail.com>
References: <AANLkTimqTeR7qERSW8ATepANRxQeVRyVNg=BBnXpXYXW@mail.gmail.com>
Message-ID: <AANLkTin0g01DY3HhTEZjRNVM2zwgp-G-AtzS8YqWb+Ct@mail.gmail.com>

2010/11/8 Tiago Antão <tiagoantao at gmail.com>:
> Hi,
>
> There is a doctest line that is making 2to3 go bonkers on Bio.Entrez
> (__init__.py)
> Line 55
>              >>> for record in records:
>              ...     # each record is a Python dictionary or list.
>
> Simply adding a
> ...       pass
>
> Is enough (the code should not work as it is an empty for, so 2to3 is
> actually correct)

Ah - that isn't actually being used as a doctest (we don't call it
in run_tests.py) and it wouldn't work if we tried because half
the function arguments are omitted or left as dots.

I like your solution of adding the pass line.

Peter
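
To see why 2to3 objects, note that a for loop whose body is only a comment
is not valid Python, since comments do not count as statements. A small
sketch using the built-in compile() (the helper name compiles is just for
illustration):

```python
def compiles(source):
    """Return True if the source is syntactically valid Python."""
    try:
        compile(source, "<doctest>", "exec")
        return True
    except SyntaxError:
        return False


comment_only = (
    "for record in records:\n"
    "    # each record is a Python dictionary or list.\n"
)
with_pass = comment_only + "    pass\n"

print(compiles(comment_only))  # False
print(compiles(with_pass))     # True
```

Only the syntax is checked here, so the undefined name records does not
matter.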


From mjldehoon at yahoo.com  Tue Nov  9 01:22:39 2010
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Mon, 8 Nov 2010 17:22:39 -0800 (PST)
Subject: [Biopython-dev] Bio/Entrez/__init__.py
In-Reply-To: <AANLkTin0g01DY3HhTEZjRNVM2zwgp-G-AtzS8YqWb+Ct@mail.gmail.com>
Message-ID: <365364.32303.qm@web62403.mail.re1.yahoo.com>

I've added this line:

    ...    print record

which should solve the 2to3 error.

--Michiel.



From tiagoantao at gmail.com  Tue Nov  9 09:12:29 2010
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Tue, 9 Nov 2010 09:12:29 +0000
Subject: [Biopython-dev] Bio/Entrez/__init__.py
In-Reply-To: <365364.32303.qm@web62403.mail.re1.yahoo.com>
References: <AANLkTin0g01DY3HhTEZjRNVM2zwgp-G-AtzS8YqWb+Ct@mail.gmail.com>
	<365364.32303.qm@web62403.mail.re1.yahoo.com>
Message-ID: <AANLkTim86p-vgZjp60BggkCzGJ_RaRCpXdvZHeV=YMwv@mail.gmail.com>

The buildbot server VM is currently down (Chris is moving it to
another physical location). As soon as the machine is back up, I will
activate the server and maybe we can start activating things on a Mac
architecture.

I was thinking of sending emails to the list (automatically) when a
build that was previously working stops doing so...?

2010/11/9 Michiel de Hoon <mjldehoon at yahoo.com>:
> I've added this line:
>
>     ...    print record
>
> which should solve the 2to3 error.
>
> --Michiel.


-- 
"If you want to get laid, go to college. If you want an education, go
to the library." - Frank Zappa


From biopython at maubp.freeserve.co.uk  Tue Nov  9 09:57:47 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 9 Nov 2010 09:57:47 +0000
Subject: [Biopython-dev] Continuous integration server
Message-ID: <AANLkTikMrhZ+MryAg4emN=Q+RCOFG-gxtysA_g=edTsa@mail.gmail.com>

2010/11/9 Tiago Antão <tiagoantao at gmail.com>:
> The buildbot server VM is currently down (Chris is moving it to
> another physical location). As soon as the machine is back up, I will
> activate the server and maybe we can start activating things on a Mac
> architecture.
>
> I was thinking of sending emails to the list (automatically) when a
> build that was previously working stops doing so...?
>

That sounds worth trying, as it removes the need for us to actively
check the buildbot server's webreport. Alternatively we should be
able to use the RSS/Atom feed.

One concern is if we have (say) 8 buildbot slaves, and a change on
the trunk accidentally breaks a unit test (on all platforms), does that
mean we'd get one email or eight?

Peter


From tiagoantao at gmail.com  Tue Nov  9 10:14:37 2010
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Tue, 9 Nov 2010 10:14:37 +0000
Subject: [Biopython-dev] Continuous integration server
In-Reply-To: <AANLkTikMrhZ+MryAg4emN=Q+RCOFG-gxtysA_g=edTsa@mail.gmail.com>
References: <AANLkTikMrhZ+MryAg4emN=Q+RCOFG-gxtysA_g=edTsa@mail.gmail.com>
Message-ID: <AANLkTikURQGajrVKhZFY=+j4cBHz7Yyn37A-ag_6DUcy@mail.gmail.com>

2010/11/9 Peter <biopython at maubp.freeserve.co.uk>:
> That sounds worth trying, as it removes the need for us to actively
> check the buildbot server's webreport. Alternatively we should be
> able to use the RSS/Atom feed.

The web interface has RSS and Atom.

> One concern is if we have (say) 8 buildbot slaves, and a change on
> the trunk accidentally breaks a unit test (on all platforms), does that
> mean we'd get one email or eight?

It can be configured to send only 1. I just cannot promise that I will
get the configuration right the first time ;). But it can be done.


From biopython at maubp.freeserve.co.uk  Tue Nov  9 10:33:26 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 9 Nov 2010 10:33:26 +0000
Subject: [Biopython-dev] Continuous integration server
In-Reply-To: <AANLkTikURQGajrVKhZFY=+j4cBHz7Yyn37A-ag_6DUcy@mail.gmail.com>
References: <AANLkTikMrhZ+MryAg4emN=Q+RCOFG-gxtysA_g=edTsa@mail.gmail.com>
	<AANLkTikURQGajrVKhZFY=+j4cBHz7Yyn37A-ag_6DUcy@mail.gmail.com>
Message-ID: <AANLkTimh3umr_D=bRchK0shWT0zwxo45EeaiMop-53Sz@mail.gmail.com>

2010/11/9 Tiago Antão <tiagoantao at gmail.com>:
> 2010/11/9 Peter <biopython at maubp.freeserve.co.uk>:
>> That sounds worth trying, as it removes the need for us to actively
>> check the buildbot server's webreport. Alternatively we should be
>> able to use the RSS/Atom feed.
>
> The web interface has RSS and atom.

Yet another feed for me to track :)

Emails have the advantage of being logged on the mailing list
archive. Let's try it and see how it goes.

>> One concern is if we have (say) 8 buildbot slaves, and a change on
>> the trunk accidentally breaks a unit test (on all platforms), does that
>> mean we'd get one email or eight?
>
> It can be configured to send only 1. I just cannot promise that I will
> get the configuration right at the first time ;) . But it can be done.

I thought they (buildbot) would have considered that example :)
You'll probably need the buildbot server's email address added
to the biopython-dev mailing list's white list - let me know nearer
the time.

Peter


From tiagoantao at gmail.com  Tue Nov  9 14:07:56 2010
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Tue, 9 Nov 2010 14:07:56 +0000
Subject: [Biopython-dev] bugzilla jython platform
Message-ID: <AANLkTimKpWaeKKyi10mEtLTeOy8f8BcHmBi3e_mv7ZVd@mail.gmail.com>

Hi,

Just a minor thingy: would it be possible to have a bugzilla platform
called jython? (Or OS).

I am going to report a bug on Jython and noticed that it is not available.

-- 
"If you want to get laid, go to college. If you want an education, go
to the library." - Frank Zappa


From bugzilla-daemon at portal.open-bio.org  Tue Nov  9 14:09:42 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 9 Nov 2010 09:09:42 -0500
Subject: [Biopython-dev] [Bug 3155] New: Some Phylip tools seem to fail on
	Jython
Message-ID: <bug-3155-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=3155

           Summary: Some Phylip tools seem to fail on Jython
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: tiagoantao at gmail.com


According to the integration tests, some Phylip tools seem to fail on Jython.

Please see below or http://events.open-bio.org:8010/builders/jython/builds/18

======================================================================
ERROR: pseudosample a phylip DNA alignment written with AlignIO
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/tantao/test/slave/jython/build/Tests/test_EmbossPhylipNew.py", line 270, in test_bootstrap_AlignIO_DNA
    self.check_bootstrap("Phylip/opuntia.phy", "phylip")
  File "/home/tantao/test/slave/jython/build/Tests/test_EmbossPhylipNew.py", line 251, in check_bootstrap
    raise ValueError("Return code %s from:\n%s" \
ValueError: Return code 1 from:
fseqboot -auto -filter -outfile=test_file -sequence=Phylip/opuntia.phy -seqtype=d -reps=2

======================================================================
ERROR: pseudosample a phylip protein alignment written with AlignIO
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/tantao/test/slave/jython/build/Tests/test_EmbossPhylipNew.py", line 279, in test_bootstrap_AlignIO_protein
    self.check_bootstrap("Phylip/hedgehog.phy", "phylip", "p")
  File "/home/tantao/test/slave/jython/build/Tests/test_EmbossPhylipNew.py", line 251, in check_bootstrap
    raise ValueError("Return code %s from:\n%s" \
ValueError: Return code 1 from:
fseqboot -auto -filter -outfile=test_file -sequence=Phylip/hedgehog.phy -seqtype=p -reps=2

======================================================================
ERROR: Calculate distance matrix from an AlignIO written protein alignment
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/tantao/test/slave/jython/build/Tests/test_EmbossPhylipNew.py", line 157, in test_distances_from_protein_AlignIO
    self.distances_from_alignment("Phylip/hedgehog.phy", DNA=False)
  File "/home/tantao/test/slave/jython/build/Tests/test_EmbossPhylipNew.py", line 117, in distances_from_alignment
    raise ValueError("Return code %s from:\n%s" \
ValueError: Return code 1 from:
fprotdist -auto -outfile=test_file -sequence=Phylip/hedgehog.phy -method=j

======================================================================
ERROR: Make a parsimony tree from an alignment written with AlignIO
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/tantao/test/slave/jython/build/Tests/test_EmbossPhylipNew.py", line 210, in test_parsimony_tree_from_AlignIO_DNA
    self.parsimony_tree("Phylip/opuntia.phy", "phylip")
  File "/home/tantao/test/slave/jython/build/Tests/test_EmbossPhylipNew.py", line 194, in parsimony_tree
    raise ValueError("Return code %s from:\n%s" \
ValueError: Return code 1 from:
fdnapars -auto -stdout -sequence=Phylip/opuntia.phy -outtreefile=test_file

======================================================================


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From biopython at maubp.freeserve.co.uk  Tue Nov  9 14:14:10 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 9 Nov 2010 14:14:10 +0000
Subject: [Biopython-dev] bugzilla jython platform
In-Reply-To: <AANLkTimKpWaeKKyi10mEtLTeOy8f8BcHmBi3e_mv7ZVd@mail.gmail.com>
References: <AANLkTimKpWaeKKyi10mEtLTeOy8f8BcHmBi3e_mv7ZVd@mail.gmail.com>
Message-ID: <AANLkTikD4UquEALj0_DffoGtW5qwALn3PZikJ8BPOq=U@mail.gmail.com>

2010/11/9 Tiago Antão <tiagoantao at gmail.com>:
> Hi,
>
> Just a minor thingy: would it be possible to have a bugzilla platform
> called jython? (Or OS).
>
> I am going to report a bug on Jython and noticed that it is not available.
>

It doesn't make sense to me to add Jython as an OS (for one thing, the
OS field is used by all the Bio* projects on our bugzilla, also you can
run Jython on Windows/Mac/Linux etc).

Currently we don't even have a field for the Python version... maybe
we should add a whole new (Biopython only) field for this (e.g. with
Python 2.4, 2.5, 2.6, 2.7, 3.1, and Jython 2.5 as choices for now).

Peter


From bugzilla-daemon at portal.open-bio.org  Tue Nov  9 14:26:57 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 9 Nov 2010 09:26:57 -0500
Subject: [Biopython-dev] [Bug 3155] Some Phylip tools seem to fail on Jython
In-Reply-To: <bug-3155-42@http.bugzilla.open-bio.org/>
Message-ID: <201011091426.oA9EQvws028228@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=3155


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2010-11-09 09:26 EST -------
I realise I don't have EMBOSS phylipnew installed on my machine with Jython, so
the test has just been skipped.

What version of Jython?

What version of EMBOSS, and the phylipnew package?

Do these tests pass *on the same machine* if run in normal (C) Python?
Alternately, do these four command line examples work when run by hand?




From bugzilla-daemon at portal.open-bio.org  Tue Nov  9 14:55:50 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 9 Nov 2010 09:55:50 -0500
Subject: [Biopython-dev] [Bug 3155] Some Phylip tools seem to fail on Jython
In-Reply-To: <bug-3155-42@http.bugzilla.open-bio.org/>
Message-ID: <201011091455.oA9Eto7n029965@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=3155


------- Comment #2 from tiagoantao at gmail.com  2010-11-09 09:55 EST -------
(In reply to comment #1)
> What version of Jython?

Jython 2.5.2rc2


> What version of EMBOSS, and the phylipnew package?

EMBOSS 6.0.1
Phylip seems 3.68

> Do these tests pass *on the same machine* if run in normal (C) Python?

Yep. This is the same machine as the one doing integration testing in C-Python

> Alternately, do these four command line examples work when run by hand?

No. I've noticed that the example files do not exist! e.g. Phylip/opuntia.phy
does not exist. Indeed this should not work, I think




From bugzilla-daemon at portal.open-bio.org  Tue Nov  9 15:05:29 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 9 Nov 2010 10:05:29 -0500
Subject: [Biopython-dev] [Bug 3155] Some Phylip tools seem to fail on Jython
In-Reply-To: <bug-3155-42@http.bugzilla.open-bio.org/>
Message-ID: <201011091505.oA9F5SxD030383@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=3155


------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2010-11-09 10:05 EST -------
(In reply to comment #2)
> (In reply to comment #1)
> > What version of Jython?
> 
> Jython 2.5.2rc2

Can you easily update to Jython 2.5.2 (actual release)?

> > What version of EMBOSS, and the phylipnew package?
> 
> EMBOSS 6.0.1
> Phylip seems 3.68

Your EMBOSS is a bit old, but should be fine.

> > Do these tests pass *on the same machine* if run in normal (C) Python?
> 
> Yep. This is the same machine as the one doing integration testing in C-Python
> 

Good - that means we can rule out EMBOSS being too old.

> > Alternately, do these four command line examples work when run by hand?
> 
> No. I've noticed that the example files do not exist! e.g. Phylip/opuntia.phy
> does not exist. Indeed this should not work, I think
> 

The unit tests create Phylip/opuntia.phy at runtime, converted from
Clustalw/opuntia.aln -- I'd forgotten about that and it does make testing the
individual commands harder. The point here is to ensure that PHYLIP likes what
we write out as PHYLIP format.




From bugzilla-daemon at portal.open-bio.org  Tue Nov  9 15:11:37 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 9 Nov 2010 10:11:37 -0500
Subject: [Biopython-dev] [Bug 3155] Some Phylip tools seem to fail on Jython
In-Reply-To: <bug-3155-42@http.bugzilla.open-bio.org/>
Message-ID: <201011091511.oA9FBbaK030580@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=3155


------- Comment #4 from tiagoantao at gmail.com  2010-11-09 10:11 EST -------
> Can you easily update to Jython 2.5.2 (actual release)?

rc2 is the most recent. I can do 2.5.*1*




From bugzilla-daemon at portal.open-bio.org  Tue Nov  9 15:33:39 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 9 Nov 2010 10:33:39 -0500
Subject: [Biopython-dev] [Bug 3155] Some Phylip tools seem to fail on Jython
In-Reply-To: <bug-3155-42@http.bugzilla.open-bio.org/>
Message-ID: <201011091533.oA9FXdSo031629@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=3155


------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk  2010-11-09 10:33 EST -------
(In reply to comment #4)
> > Can you easily update to Jython 2.5.2 (actual release)?
> 
> rc2 is the most recent. I can do 2.5.*1*

Sorry - my mistake. I have Jython 2.5.1 (final release).

I'll try to get EMBOSS phylipnew on this machine (useful anyway as a potential
buildbot slave).




From biopython at maubp.freeserve.co.uk  Tue Nov  9 22:54:13 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 9 Nov 2010 22:54:13 +0000
Subject: [Biopython-dev] buildbot and setup.py
Message-ID: <AANLkTinhPa5VgCLzs3WS+ezyghL4JsD=jzb4Qwndw1vy@mail.gmail.com>

Hi all,

For the continuous integration server, it is important
to be able to run setup.py without it prompting the
user. There are (just?) two potential prompts at the
moment.

First, if running on Python 3, it asks the user to
confirm they have run 2to3 as per the README
file. This was done as a bit of a hack - perhaps
now that most of the Python code works on Py3
we can avoid this?

Second, if running without NumPy, it asks the user
if they really want to do this as it is best to install
NumPy to use all of Biopython.

For the purposes of the buildbot, I think we should
have at least one build-slave without NumPy. This
should then catch any regressions in the test suite.
Since Jython doesn't have NumPy (and so we
don't prompt about it) then maybe that would
double in this role for the test matrix ;)

Right now Tiago has solved the first prompt (about
2to3) by piping a "y\n" into stdin. I guess piping
two would solve the case of no NumPy on Py3 ;)
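
The piping trick can be sketched from Python as well. The child process
below is only a stand-in for a script that asks a yes/no question (it is
not Biopython's setup.py, and the prompt text is made up):

```python
import subprocess
import sys

# Stand-in for "python setup.py build": a child that asks a yes/no question.
child_code = (
    "import sys\n"
    "sys.stdout.write('Do you want to continue (y/N)? ')\n"
    "answer = sys.stdin.readline().strip().lower()\n"
    "sys.exit(0 if answer.startswith('y') else 1)\n"
)

proc = subprocess.Popen(
    [sys.executable, "-c", child_code],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
)
# Send one "y" line per expected prompt; two lines would cover two prompts.
proc.communicate(b"y\n")
print("return code: %i" % proc.returncode)
```

This works, but it depends on knowing how many prompts will appear, which
is one argument for a proper command line flag.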

However, do we need an --auto or --force flag
to bypass these yes or no prompts in setup.py?

(Meanwhile I'm off to install NumPy under
Python 3 on my Linux box which will avoid
the issue for now)

Peter


From tiagoantao at gmail.com  Wed Nov 10 00:15:02 2010
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Wed, 10 Nov 2010 00:15:02 +0000
Subject: [Biopython-dev] Continuous integration server
In-Reply-To: <AANLkTikMrhZ+MryAg4emN=Q+RCOFG-gxtysA_g=edTsa@mail.gmail.com>
References: <AANLkTikMrhZ+MryAg4emN=Q+RCOFG-gxtysA_g=edTsa@mail.gmail.com>
Message-ID: <AANLkTim488HLxcxsw0yTvR7T6row3C3gvoHNFf4Ww_wT@mail.gmail.com>

2010/11/9 Peter <biopython at maubp.freeserve.co.uk>:
> One concern is if we have (say) 8 buildbot slaves, and a change on
> the trunk accidentally breaks a unit test (on all platforms), does that
> mean we'd get one email or eight?

I was wrong here. It is not possible to send only one email. I misread
the documentation.
But it is quite simple to extend the mail system (by code) to do this.
At least it seems simple: I will have a try at it tomorrow.

For now I am only sending automated emails to myself and Peter. If
anyone wants to be in the loop, please tell me.

As soon as the system is reliable I will send to biopython-dev.
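
The "one email rather than eight" idea amounts to grouping failures by
revision before notifying anyone. A rough sketch of that grouping follows;
it does not use the buildbot API at all, and the function name, the tuple
layout and the builder names are invented for the example.

```python
from collections import defaultdict


def group_failures(results):
    """Map each revision to the sorted list of builders which failed on it.

    `results` is an iterable of (revision, builder, passed) tuples.
    """
    failed = defaultdict(list)
    for revision, builder, passed in results:
        if not passed:
            failed[revision].append(builder)
    return dict((rev, sorted(names)) for rev, names in failed.items())


results = [
    ("abc123", "linux-py26", False),
    ("abc123", "mac-py26", False),
    ("abc123", "jython", False),
    ("def456", "linux-py26", True),
]
for revision, builders in group_failures(results).items():
    # One message per revision rather than one per builder.
    print("Revision %s broke: %s" % (revision, ", ".join(builders)))
```

In a real setup the grouping would have to wait until all builders have
reported on a revision, which this sketch ignores.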


From tiagoantao at gmail.com  Wed Nov 10 00:21:15 2010
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Wed, 10 Nov 2010 00:21:15 +0000
Subject: [Biopython-dev] Continuous integration server
In-Reply-To: <AANLkTimh3umr_D=bRchK0shWT0zwxo45EeaiMop-53Sz@mail.gmail.com>
References: <AANLkTikMrhZ+MryAg4emN=Q+RCOFG-gxtysA_g=edTsa@mail.gmail.com>
	<AANLkTikURQGajrVKhZFY=+j4cBHz7Yyn37A-ag_6DUcy@mail.gmail.com>
	<AANLkTimh3umr_D=bRchK0shWT0zwxo45EeaiMop-53Sz@mail.gmail.com>
Message-ID: <AANLkTim7_T=pO6s7UsOWfadmNz9ZYE6Yof-rkSROb=s-@mail.gmail.com>

2010/11/9 Peter <biopython at maubp.freeserve.co.uk>:
>> The web interface has RSS and atom.
>
> Yet another feed for me to track :)

In order to minimize the number of feed entries one can specify
constraints. A useful one is to report only failed builds, like this:
http://events.open-bio.org:8010/rss?failures_only=true
which only shows entries that relate to failures.

Tiago


From eric.talevich at gmail.com  Wed Nov 10 03:04:38 2010
From: eric.talevich at gmail.com (Eric Talevich)
Date: Tue, 9 Nov 2010 22:04:38 -0500
Subject: [Biopython-dev] buildbot and setup.py
In-Reply-To: <AANLkTinhPa5VgCLzs3WS+ezyghL4JsD=jzb4Qwndw1vy@mail.gmail.com>
References: <AANLkTinhPa5VgCLzs3WS+ezyghL4JsD=jzb4Qwndw1vy@mail.gmail.com>
Message-ID: <AANLkTimNfS_B=6SNggrKTABEdcjwn7X2Js0jAeqOeS_q@mail.gmail.com>

On Tue, Nov 9, 2010 at 5:54 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:

> Hi all,
>
> For the continuous integration server, it is important
> to be able to run setup.py without it prompting the
> user. There are (just?) two potential prompts at the
> moment.
>
> [...]
>
> However, do we need an --auto or --force flag
> to bypass these yes or no prompts in setup.py?
>

I'd find a flag like that convenient for running setup.py manually, too.

For reference: apt-get takes a "-y" option which assumes a "yes" answer to
all prompts, just like this.

-Eric
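
A possible shape for such a flag, in the spirit of apt-get -y, is sketched
below. This is not how setup.py works today: the confirm helper is
hypothetical, and --auto is simply one of the names floated in this thread.

```python
def confirm(question, argv, flag="--auto"):
    """Return True if the user (or the hypothetical flag) says yes."""
    if flag in argv:
        return True  # non-interactive: assume "yes", like apt-get -y
    reply = input("%s (y/N) " % question)
    return reply.strip().lower().startswith("y")


# Simulated command line from an unattended build:
argv = ["setup.py", "build", "--auto"]
proceed = confirm("NumPy is not installed, continue anyway?", argv)
print(proceed)  # True
```

With the flag present the prompt is never shown, so the same helper serves
both the buildbot and people running setup.py by hand.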


From biopython at maubp.freeserve.co.uk  Wed Nov 10 11:48:30 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 10 Nov 2010 11:48:30 +0000
Subject: [Biopython-dev] test_PopGen_GenePop_EasyController.py failure on
	Jython
Message-ID: <AANLkTimQ+XXcEDwrC6AR15OdvDtLV+CqaKUnBv0=+F0=@mail.gmail.com>

Hi Tiago,

From your buildbot log for Jython 2.5.2 (release candidate 2), and
also my Mac OS Jython 2.5.1 install, we have a PopGen failure:

======================================================================
FAIL: Test get alleles.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/tantao/test/slave/jython252lin/build/Tests/test_PopGen_GenePop_EasyController.py", line 57, in test_get_alleles
    self.assertEqual(self.ctrl.get_alleles(0,"Locus3"), [3, 20])
AssertionError: [20, 3] != [3, 20]

Notice that by using the unittest assertEqual method we get to see the
values compared:
https://github.com/biopython/biopython/commit/06a719be51ecd207b781224d3f57bb5ebb07198a

Before the change the output was like this:

======================================================================
FAIL: Test get alleles.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_PopGen_GenePop_EasyController.py", line 57, in test_get_alleles
    assert self.ctrl.get_alleles(0,"Locus3") == [3, 20]
AssertionError


It is interesting that Jython is giving [20, 3] rather than [3, 20]. My
guess would be this is down to something python implementation
specific like the sort order of dictionaries or sets, in which case
the unittest needs to compare sorted lists -- or the get_alleles
method needs a sort?

Peter
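
The "compare sorted lists" option could look like the sketch below. The
get_alleles function here is only a stand-in that returns values in
whatever order a set yields them; it is not the EasyController method.

```python
import unittest


def get_alleles():
    """Stand-in returning alleles in arbitrary (set) order."""
    return list(set([20, 3]))


class AlleleTest(unittest.TestCase):
    def test_get_alleles(self):
        # Sort before comparing so the test ignores iteration order.
        self.assertEqual(sorted(get_alleles()), [3, 20])


suite = unittest.defaultTestLoader.loadTestsFromTestCase(AlleleTest)
result = unittest.TestResult()
suite.run(result)
print(result.wasSuccessful())
```

Sorting in the test keeps the method's behaviour unchanged; sorting inside
the method instead would give callers a stable order on every Python
implementation.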


From tiagoantao at gmail.com  Wed Nov 10 13:05:59 2010
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Wed, 10 Nov 2010 13:05:59 +0000
Subject: [Biopython-dev] test_PopGen_GenePop_EasyController.py failure
	on Jython
In-Reply-To: <AANLkTimQ+XXcEDwrC6AR15OdvDtLV+CqaKUnBv0=+F0=@mail.gmail.com>
References: <AANLkTimQ+XXcEDwrC6AR15OdvDtLV+CqaKUnBv0=+F0=@mail.gmail.com>
Message-ID: <AANLkTikDVija_mNTs4vE+BFbndm9OpwA2+cYLFKvg=Yj@mail.gmail.com>

I know, this might be an issue with the jython version (being just a
release candidate). I am going to wait for results on 2.5.1 and
compare. Or I might just install it myself and see.

Is there any reason for the unittest framework to ignore OSErrors? I
am getting some OSErrors (just in jython 2.5.2) and they are being
ignored (but reported as warnings)...

Tiago

2010/11/10 Peter <biopython at maubp.freeserve.co.uk>:
> Hi Taigo,
>
> From your buildbot log for Jython 2.5.2 (release candidate 2), and
> also my Mac OS
> Jython 2.5.1 install, we have a PopGen failure:
>
> ======================================================================
> FAIL: Test get alleles.
> ----------------------------------------------------------------------
> Traceback (most recent call last):
> ?File "/home/tantao/test/slave/jython252lin/build/Tests/test_PopGen_GenePop_EasyController.py",
> line 57, in test_get_alleles
> ? ?self.assertEqual(self.ctrl.get_alleles(0,"Locus3"), [3, 20])
> AssertionError: [20, 3] != [3, 20]
>
> Notice that by using the unittest assertEqual method we get to see the
> values compared:
> https://github.com/biopython/biopython/commit/06a719be51ecd207b781224d3f57bb5ebb07198a
>
> Before the change the output was like this:
>
> ======================================================================
> FAIL: Test get alleles.
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "test_PopGen_GenePop_EasyController.py", line 57, in test_get_alleles
>     assert self.ctrl.get_alleles(0,"Locus3") == [3, 20]
> AssertionError
>
>
> It is interesting that Jython is giving [20, 3] rather than [3, 20]. My
> guess would be this is down to something python implementation
> specific like the sort order of dictionaries or sets, in which case
> the unittest needs to compare sorted lists -- or the get_alleles
> method needs a sort?
>
> Peter
>


-- 
"If you want to get laid, go to college.  If you want an education, go
to the library." - Frank Zappa


From biopython at maubp.freeserve.co.uk  Wed Nov 10 13:15:16 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 10 Nov 2010 13:15:16 +0000
Subject: [Biopython-dev] test_PopGen_GenePop_EasyController.py failure
	on Jython
In-Reply-To: <AANLkTikDVija_mNTs4vE+BFbndm9OpwA2+cYLFKvg=Yj@mail.gmail.com>
References: <AANLkTimQ+XXcEDwrC6AR15OdvDtLV+CqaKUnBv0=+F0=@mail.gmail.com>
	<AANLkTikDVija_mNTs4vE+BFbndm9OpwA2+cYLFKvg=Yj@mail.gmail.com>
Message-ID: <AANLkTi=gkMchj-Fao8HtvPHSKdOhDKT-o7QQhZap2SkW@mail.gmail.com>

2010/11/10 Tiago Antão <tiagoantao at gmail.com>:
>
> I know, this might be an issue with the jython version (being just a
> release candidate). I am going to wait for results on 2.5.1 and
> compare. Or I might just install it myself and see.

I also see the same test_get_alleles failure on the Mac and on
Windows 32 using Jython 2.5.1, so it isn't a Jython 2.5.2 release
candidate specific issue.

> Is there any reason for the unittest framework to ignore OSErrors? I
> am getting some OSErrors (just in jython 2.5.2) and they are being
> ignored (but reported as warnings)...
>
> Tiago

I've just recently put Jython 2.5.1 on my Windows box, and
in addition to the test_get_alleles failure, I also see OSErrors
about being unable to delete files (but the F stats test still
passes). This seems to be a wider issue, affecting more than
just test_PopGen_GenePop_EasyController.py, but it does
seem to be OS specific (no problems deleting files in
Jython 2.5.1 on my Mac, I've not tried on Linux).

Peter


From biopython at maubp.freeserve.co.uk  Wed Nov 10 14:14:07 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 10 Nov 2010 14:14:07 +0000
Subject: [Biopython-dev] test_PopGen_SimCoal.py on Windows
Message-ID: <AANLkTinaO3D=tL1BLsOgHTPhS65X7aGQVXQ1G7wUk=Om@mail.gmail.com>

Hi Tiago

Is/was test_PopGen_SimCoal.py working for you on Windows?
I'm getting "Output directory not created!" under Python 2.6

I've also tried it under Jython 2.5.1 and had to tweak things to
find the executable, thus:
https://github.com/biopython/biopython/commit/95cba71f7286860fa9cd79843c47b075a2f530a6

Now both Jython 2.5.1 and Python 2.6 give the same error,
"Output directory not created!" (progress I suppose).

Peter

P.S. On the bright side, both the FDist2 and DFDist tests are
passing on Windows on Python 2.6 and Jython 2.5.1 now
(after a couple of little tweaks).


From tiagoantao at gmail.com  Wed Nov 10 14:35:31 2010
From: tiagoantao at gmail.com (Tiago Antão)
Date: Wed, 10 Nov 2010 14:35:31 +0000
Subject: [Biopython-dev] test_PopGen_GenePop_EasyController.py failure
	on Jython
In-Reply-To: <AANLkTi=gkMchj-Fao8HtvPHSKdOhDKT-o7QQhZap2SkW@mail.gmail.com>
References: <AANLkTimQ+XXcEDwrC6AR15OdvDtLV+CqaKUnBv0=+F0=@mail.gmail.com>
	<AANLkTikDVija_mNTs4vE+BFbndm9OpwA2+cYLFKvg=Yj@mail.gmail.com>
	<AANLkTi=gkMchj-Fao8HtvPHSKdOhDKT-o7QQhZap2SkW@mail.gmail.com>
Message-ID: <AANLkTi=phgYOTJWHAq1vMEYr9rNPbdG-eckJm=Asa4oH@mail.gmail.com>

2010/11/10 Peter <biopython at maubp.freeserve.co.uk>:
> I've just recently put Jython 2.5.1 on my Windows box, and
> in addition to the test_get_alleles failure, I also see OSErrors
> about being unable to delete files (but the F stats test still
> passes). This seems to be a wider issue, affecting more than
> just test_PopGen_GenePop_EasyController.py, but it does
> seem to be OS specific (no problems deleting files in
> Jython 2.5.1 on my Mac, I've not tried on Linux).

The OSError has the potential to be somewhat nasty (i.e. throughout
other Bio.* modules) as it is silent. There might be tests failing
that report OK.

Tiago


From tiagoantao at gmail.com  Wed Nov 10 14:42:18 2010
From: tiagoantao at gmail.com (Tiago Antão)
Date: Wed, 10 Nov 2010 14:42:18 +0000
Subject: [Biopython-dev] test_PopGen_SimCoal.py on Windows
In-Reply-To: <AANLkTinaO3D=tL1BLsOgHTPhS65X7aGQVXQ1G7wUk=Om@mail.gmail.com>
References: <AANLkTinaO3D=tL1BLsOgHTPhS65X7aGQVXQ1G7wUk=Om@mail.gmail.com>
Message-ID: <AANLkTikRZNemx=43O6t0M36YcZEKUM8W-Fqer=XAe6bf@mail.gmail.com>

2010/11/10 Peter <biopython at maubp.freeserve.co.uk>:
> Hi Tiago
>
> Is/was test_PopGen_SimCoal.py working for you on Windows?
> I'm getting "Output directory not created!" under Python 2.6

This code is used 99.99% on Jython (as the fdist/dfdist code and
genepop parser, BTW). I happen to test on Linux.
I will fire up my Windows machine and have a look, but I do not have it
at hand. This will have to wait a few hours (or a couple of days at
most).


> Now both Jython 2.5.1 and Python 2.6 give the same error,
> "Output directory not created!" (progress I suppose).

I cannot test this here, but I am 99% sure that the problem is the
executable name (case sensitive on Windows and Mac, maybe even on
Windows Jython?). If it is compiled with a capital S (seen happening)
it might be a problem.

> P.S. On the bright side, both the FDist2 and DFDist tests are
> passing on Windows on Python 2.6 and Jython 2.5.1 now
> (after a couple of little tweaks).

Were they failing on Jython? I do have a reasonable amount of users on
my applications (jython based)...


-- 
"If you want to get laid, go to college.  If you want an education, go
to the library." - Frank Zappa


From biopython at maubp.freeserve.co.uk  Wed Nov 10 15:13:27 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 10 Nov 2010 15:13:27 +0000
Subject: [Biopython-dev] test_PopGen_SimCoal.py on Windows
In-Reply-To: <AANLkTikRZNemx=43O6t0M36YcZEKUM8W-Fqer=XAe6bf@mail.gmail.com>
References: <AANLkTinaO3D=tL1BLsOgHTPhS65X7aGQVXQ1G7wUk=Om@mail.gmail.com>
	<AANLkTikRZNemx=43O6t0M36YcZEKUM8W-Fqer=XAe6bf@mail.gmail.com>
Message-ID: <AANLkTinBqkc2dEWZ82RBiP89TwwP5WfSge+=rH4-GUYH@mail.gmail.com>

2010/11/10 Tiago Antão <tiagoantao at gmail.com>:
>
> 2010/11/10 Peter <biopython at maubp.freeserve.co.uk>:
>> Hi Tiago
>>
>> Is/was test_PopGen_SimCoal.py working for you on Windows?
>> I'm getting "Output directory not created!" under Python 2.6
>
> This code is used 99.99% on Jython (as the fdist/dfdist code and
> genepop parser, BTW). I happen to test on Linux.
> I will fire up my Windows machine and have a look, but I do not have it
> at hand. This will have to wait a few hours (or a couple of days at
> most).
>
>
>> Now both Jython 2.5.1 and Python 2.6 give the same error,
>> "Output directory not created!" (progress I suppose).
>
> I cannot test this here, but I am 99% sure that the problem is the
> executable name (case sensitive on Windows and Mac, maybe even on
> Windows Jython?). If it is compiled with a capital S (seen happening)
> it might be a problem.

It could also be something with spaces in filenames, much
more common on Windows :(

>> P.S. On the bright side, both the FDist2 and DFDist tests are
>> passing on Windows on Python 2.6 and Jython 2.5.1 now
>> (after a couple of little tweaks).
>
> Were they failing on Jython? I do have a reasonable amount
> of users on my applications (jython based)...

I tweaked the executable checking in the unit tests, it now
looks for all four binaries required, and works on Windows
(both Python and Jython) and Mac (both Python and Jython).

Peter


From biopython at maubp.freeserve.co.uk  Wed Nov 10 17:35:37 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 10 Nov 2010 17:35:37 +0000
Subject: [Biopython-dev] test_PopGen_SimCoal.py on Windows
In-Reply-To: <AANLkTinBqkc2dEWZ82RBiP89TwwP5WfSge+=rH4-GUYH@mail.gmail.com>
References: <AANLkTinaO3D=tL1BLsOgHTPhS65X7aGQVXQ1G7wUk=Om@mail.gmail.com>
	<AANLkTikRZNemx=43O6t0M36YcZEKUM8W-Fqer=XAe6bf@mail.gmail.com>
	<AANLkTinBqkc2dEWZ82RBiP89TwwP5WfSge+=rH4-GUYH@mail.gmail.com>
Message-ID: <AANLkTi=tYbJqFgHdZrOgqMrcdPQnZOqoDpHoUf9-5HZO@mail.gmail.com>

2010/11/10 Peter <biopython at maubp.freeserve.co.uk>:
>>
>> I cannot test this here, but I am 99% sure that the problem is the
>> executable name (case sensitive on Windows and Mac, maybe even on
>> Windows Jython?). If it is compiled with a capital S (seen happening)
>> it might be a problem.
>
> It could also be something with spaces in filenames, much
> more common on Windows :(
>

Yep, that was it. Fixed:
https://github.com/biopython/biopython/commit/e24f1662b5e619d558fea17c11ddea12c3561e53

I've got my Windows box running as a buildslave now, so
fingers crossed it will all be green.

Peter
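
A general note on the "spaces in filenames" problem: the linked commit is not
reproduced here, so the following is only a minimal sketch of one common way
to avoid it, not the actual fix. It assumes the external tool is started
through the subprocess module, and passes the command as a list so that no
shell quoting is involved. The directory and script names are made up for
illustration.

```python
import os
import shutil
import subprocess
import sys
import tempfile

# Hypothetical working directory and script whose names contain spaces.
work_dir = tempfile.mkdtemp(prefix="sim coal ")
script = os.path.join(work_dir, "hello world.py")

handle = open(script, "w")
try:
    handle.write("print('ok')\n")
finally:
    handle.close()

# Passing the command as a list hands each argument over unchanged,
# so the space in the path does not split it into two arguments.
child = subprocess.Popen([sys.executable, script], stdout=subprocess.PIPE)
output, _ = child.communicate()
return_code = child.returncode

shutil.rmtree(work_dir)
```

Building a single command string instead would require quoting the path
correctly for each platform's shell, which is where this kind of bug tends
to creep in.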


From lpritc at scri.ac.uk  Thu Nov 11 14:12:21 2010
From: lpritc at scri.ac.uk (Leighton Pritchard)
Date: Thu, 11 Nov 2010 14:12:21 +0000
Subject: [Biopython-dev] Bioinformatics position
Message-ID: <C901AA45.3F8C2%lpritc@scri.ac.uk>

We have a bioinformatics post available at SCRI, and would be grateful if
you could please bring it to the attention of any colleagues who may be
interested in applying.  It is advertised at
http://www.jobs.ac.uk/job/ABS904/bioinformatics/ and some details are
included below:

"""
Bioinformatics
Scottish Crop Research Institute- SCRI
SCRI is Scotland's leading Institute for research on plants and their
interactions with the environment, particularly in managed ecosystems. Our
mission is to conduct excellent research in plant and environmental
sciences. Our vision is to deliver innovative products, knowledge and
services that enrich the life of the community and address the public goods
of environmental sustainability, high quality and healthy food.

Post Reference SMB/1/10

Research in the Plant Pathology Programme at SCRI is founded on pathogen
genomics, and scientists in the Programme have a strong track record of
contributing to whole genome sequencing and genetic analysis of economically
important pests and pathogens. The successful candidate will collaborate
with other groups in the Programme working on plant-pathogen interactions
developing innovative approaches to understand disease processes. This post
provides an opportunity to influence biological research of direct impact to
agriculture.

The ideal candidate would be experienced in manipulating and curating large
biological datasets with a record of collaboration and integration with
biologists. The successful applicant is expected to have an interest in
plant-pathogen interactions and to develop their own research profile. The
candidate should have a PhD or equivalent in bioinformatics, biostatistics
or a related field.

Informal enquiries from: Leighton.Pritchard at scri.ac.uk
<mailto:Leighton.Pritchard at scri.ac.uk> or Lesley.Torrance at scri.ac.uk
<mailto:Lesley.Torrance at scri.ac.uk>

Salary Scale For All Posts:

*Band D/E, £26,610 - £37,534 (commensurate with experience)

*Appointments to Band F, £42,769 - £47,521 available for exceptional
candidates.

Candidates willing to apply for a research fellowship to further help
establish their own laboratory are encouraged to apply and will, if
successful, benefit from generous Institute support throughout the tenure of
their fellowship.

Further information on the above posts, including how to apply, is available
on the SCRI website at http://www.scri.ac.uk/careers/vacancies

Closing date - Friday 19th November 2010.

The Institute is an equal opportunities employer.
"""

Many thanks,

L.

-- 
Dr Leighton Pritchard MRSC
D131, Plant Pathology Programme, SCRI
Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
e:lpritc at scri.ac.uk       w:http://www.scri.ac.uk/staff/leightonpritchard
gpg/pgp: 0xFEFC205C       tel:+44(0)1382 562731 x2405




From biopython at maubp.freeserve.co.uk  Thu Nov 11 16:45:43 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 11 Nov 2010 16:45:43 +0000
Subject: [Biopython-dev] Uniprot XML parser on TrEmbl
In-Reply-To: <ef80b0313dade56171f9d119dbc2baea.squirrel@lipid.biocomp.unibo.it>
References: <AANLkTineNfa+eMqcUyN7+anQ4OQOyLnVYOT+gM5H_Qg3@mail.gmail.com>
	<AANLkTimcrZBsL_1re6wYn0qr2H3Z-0Tq3Wo7748Pifvz@mail.gmail.com>
	<3cb74578eeedb8825ef75202c909b843.squirrel@lipid.biocomp.unibo.it>
	<AANLkTikCzLALtfhydpM7n3=fC=0+WoSuMnuzFxhmwgvV@mail.gmail.com>
	<ef80b0313dade56171f9d119dbc2baea.squirrel@lipid.biocomp.unibo.it>
Message-ID: <AANLkTimx7OZvgqbWOtV9T33Zek6HODw8pWnOkEU3Wqwk@mail.gmail.com>

On Thu, Nov 11, 2010 at 4:08 PM, Andrea Pierleoni
<andrea at biocomp.unibo.it> wrote:
> I finally found the time, and the 62Gb needed to test the TrEmbl database
> in uniprot xml format.

Is that the size on disk of the XML file? 62GB is a lot.

> The analysis is still running, but so far I've been able to parse 1
> million entries out of 12 million (it will run overnight...)
>
> I've had just one problem with the entry: Q2LEH1_9ROSI
> In the downloaded file there are multiple organism name fields, one of
> which is empty:
>
> ...
>   <organism evidence="EI1">
>     <name type="scientific"></name>
>     <name type="common">Populus tomentosa x P. bolleana) x P. tomentosa
> var. truncat</name>
> ...
>
> This part of the file is reported differently on the uniprot server at:
> http://www.uniprot.org/uniprot/Q2LEH1.xml
>
> ...
>  <organism evidence="EI1">
>   <name type="scientific">(Populus tomentosa x P. bolleana) x P. tomentosa
> var. truncata</name>
> ...
>
> Now, given also the missing start parenthesis, I think there is an error
> in the downloaded XML file.

It sounds like it - have you told UniProt?

> I've attached a patch that should cope with this issue. I don't know if
> there are more "errors" in the xml file.
> the patch was made on the current version of biopython master branch on
> github and is valid for commit 9363c3cdc5f51805f247.
>
> Andrea

Checked in, thanks:
https://github.com/biopython/biopython/commit/38da3ff264fe180e903cda4c143a7aa9be3d431a

Peter


From andrea at biocomp.unibo.it  Thu Nov 11 16:08:58 2010
From: andrea at biocomp.unibo.it (Andrea Pierleoni)
Date: Thu, 11 Nov 2010 17:08:58 +0100 (CET)
Subject: [Biopython-dev] Uniprot XML parser on TrEmbl
In-Reply-To: <AANLkTikCzLALtfhydpM7n3=fC=0+WoSuMnuzFxhmwgvV@mail.gmail.com>
References: <AANLkTineNfa+eMqcUyN7+anQ4OQOyLnVYOT+gM5H_Qg3@mail.gmail.com>
	<AANLkTimcrZBsL_1re6wYn0qr2H3Z-0Tq3Wo7748Pifvz@mail.gmail.com>
	<3cb74578eeedb8825ef75202c909b843.squirrel@lipid.biocomp.unibo.it>
	<AANLkTikCzLALtfhydpM7n3=fC=0+WoSuMnuzFxhmwgvV@mail.gmail.com>
Message-ID: <ef80b0313dade56171f9d119dbc2baea.squirrel@lipid.biocomp.unibo.it>

I finally found the time, and the 62Gb needed to test the TrEmbl database
in uniprot xml format.
The analysis is still running, but so far I've been able to parse 1
million entries out of 12 million (it will run overnight...)

I've had just one problem with the entry: Q2LEH1_9ROSI
In the downloaded file there are multiple organism name fields, one of
which is empty:

...
  <organism evidence="EI1">
    <name type="scientific"></name>
    <name type="common">Populus tomentosa x P. bolleana) x P. tomentosa
var. truncat</name>
...

This part of the file is reported differently on the uniprot server at:
http://www.uniprot.org/uniprot/Q2LEH1.xml

...
 <organism evidence="EI1">
  <name type="scientific">(Populus tomentosa x P. bolleana) x P. tomentosa
var. truncata</name>
...

Now, given also the missing start parenthesis, I think there is an error
in the downloaded XML file.

I've attached a patch that should cope with this issue. I don't know if
there are more "errors" in the xml file.
the patch was made on the current version of biopython master branch on
github and is valid for commit  9363c3cdc5f51805f247.

Andrea
-------------- next part --------------
A non-text attachment was scrubbed...
Name: UniprotIO.patch
Type: /
Size: 610 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/biopython-dev/attachments/20101111/3f9a10ae/attachment-0002.bin>

From andrea at biocomp.unibo.it  Thu Nov 11 17:15:08 2010
From: andrea at biocomp.unibo.it (Andrea Pierleoni)
Date: Thu, 11 Nov 2010 18:15:08 +0100 (CET)
Subject: [Biopython-dev] Uniprot XML parser on TrEmbl
In-Reply-To: <AANLkTimx7OZvgqbWOtV9T33Zek6HODw8pWnOkEU3Wqwk@mail.gmail.com>
References: <AANLkTineNfa+eMqcUyN7+anQ4OQOyLnVYOT+gM5H_Qg3@mail.gmail.com>
	<AANLkTimcrZBsL_1re6wYn0qr2H3Z-0Tq3Wo7748Pifvz@mail.gmail.com>
	<3cb74578eeedb8825ef75202c909b843.squirrel@lipid.biocomp.unibo.it>
	<AANLkTikCzLALtfhydpM7n3=fC=0+WoSuMnuzFxhmwgvV@mail.gmail.com>
	<ef80b0313dade56171f9d119dbc2baea.squirrel@lipid.biocomp.unibo.it>
	<AANLkTimx7OZvgqbWOtV9T33Zek6HODw8pWnOkEU3Wqwk@mail.gmail.com>
Message-ID: <cf0f600f7252fc960d7f3ac1a5c720c2.squirrel@lipid.biocomp.unibo.it>


>
> Is that the size on disk of the XML file? 62GB is a lot.

yes, my macbook is getting very hot...

> It sounds like it - have you told UniProt?

I've notified them, let's see what they say...

Anyhow the parser works. I just don't know if we should have an
internet browser-like approach interpreting errors, or just be
consistent and raise an error if there is a format error.

in this case an empty organism name is an error.
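
The two options (tolerate the error like a browser, or raise) can be sketched
as follows. This is only an illustration using the standard library's
ElementTree, not the code in UniprotIO.py; the function name and the `strict`
flag are hypothetical, and the snippet omits the XML namespace that real
UniProt files carry.

```python
import xml.etree.ElementTree as ET

# The problematic fragment from the downloaded file, without namespaces.
SNIPPET = """<organism evidence="EI1">
  <name type="scientific"></name>
  <name type="common">Populus tomentosa x P. bolleana) x P. tomentosa var. truncat</name>
</organism>"""


def organism_names(xml_text, strict=False):
    """Return {name type: text}; skip or reject empty names (hypothetical helper)."""
    names = {}
    for element in ET.fromstring(xml_text).findall("name"):
        text = (element.text or "").strip()
        if not text:
            if strict:
                # Consistent behaviour: treat the empty field as a format error.
                raise ValueError("empty organism name of type %r"
                                 % element.get("type"))
            # Browser-like behaviour: silently ignore the empty field.
            continue
        names[element.get("type")] = text
    return names


lenient = organism_names(SNIPPET)
try:
    organism_names(SNIPPET, strict=True)
    strict_failed = False
except ValueError:
    strict_failed = True
```

With the lenient setting only the "common" name survives; with the strict
setting the same record is rejected, which is the trade-off raised above.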


From biopython at maubp.freeserve.co.uk  Thu Nov 11 19:16:57 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 11 Nov 2010 19:16:57 +0000
Subject: [Biopython-dev] test_PopGen_GenePop_EasyController.py failure
	on Jython
In-Reply-To: <AANLkTi=gkMchj-Fao8HtvPHSKdOhDKT-o7QQhZap2SkW@mail.gmail.com>
References: <AANLkTimQ+XXcEDwrC6AR15OdvDtLV+CqaKUnBv0=+F0=@mail.gmail.com>
	<AANLkTikDVija_mNTs4vE+BFbndm9OpwA2+cYLFKvg=Yj@mail.gmail.com>
	<AANLkTi=gkMchj-Fao8HtvPHSKdOhDKT-o7QQhZap2SkW@mail.gmail.com>
Message-ID: <AANLkTimmyGz_hx5PtuiwcDq39eW=VfV=7u+Gas92jRih@mail.gmail.com>

2010/11/10 Peter <biopython at maubp.freeserve.co.uk>:
> 2010/11/10 Tiago Antão <tiagoantao at gmail.com>:
>>
>> I know, this might be an issue with the jython version (being just a
>> release candidate). I am going to wait for results on 2.5.1 and
>> compare. Or I might just install it myself and see.
>
> I also see the same test_get_alleles failure on the Mac and on
> Windows 32 using Jython 2.5.1, so it isn't a Jython 2.5.2 release
> candidate specific issue.

Yes, the order just came from the order of a dict's keys - which
is Python implementation dependent. Quick fix committed:

https://github.com/biopython/biopython/commit/2aa604e54df02804219e092141bb32728b021a64

If you actually care about the order, then perhaps add a
sorted(...) to the get_alleles method itself instead?

Peter
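
Both options mentioned above can be sketched in a few lines. This is not the
committed fix, just an illustration under assumptions: the `get_alleles`
function below is a hypothetical stand-in for the real method, taking a dict
of allele counts.

```python
import unittest


def get_alleles(allele_counts):
    """Hypothetical stand-in: return the allele ids in sorted order."""
    # Sorting here makes the result independent of dict iteration order.
    return sorted(allele_counts)


class AlleleOrderTest(unittest.TestCase):
    def test_compare_sorted(self):
        # Option 1: the test sorts whatever order the implementation gave.
        observed = [20, 3]
        self.assertEqual(sorted(observed), [3, 20])

    def test_method_sorts(self):
        # Option 2: the method itself guarantees the order.
        self.assertEqual(get_alleles({20: 1, 3: 5}), [3, 20])


suite = unittest.defaultTestLoader.loadTestsFromTestCase(AlleleOrderTest)
result = unittest.TextTestRunner().run(suite)
```

Sorting inside the method changes what callers see, so it only makes sense
if the order is meant to be part of the method's contract.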


From biopython at maubp.freeserve.co.uk  Thu Nov 11 20:19:05 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 11 Nov 2010 20:19:05 +0000
Subject: [Biopython-dev] Jython on Windows: OSError deleting files
Message-ID: <AANLkTimO-YUNPy6U7J-QAYxW1a-OZcG0rYBDzZQFHAHy@mail.gmail.com>

Hi all,

I recently installed Jython 2.5.1 on Windows XP (32 bit) for
use as a build slave. This showed up some new bugs, in
particular several problems with trying to delete temp
files triggering an OSError.

It turns out this can be triggered by trying to delete a
file while we still have a handle open on it. This is a
Windows limitation, but we don't see it on normal
Python because there the garbage collector closes
handles promptly when they go out of scope. The Java
garbage collector doesn't do that. See also:

http://web.archiveorange.com/archive/v/8tc1Z6ysA03SXedms7TA

In particular, I am aware that if given a filename the
SeqIO and AlignIO read and parse functions did not
explicitly close the handle they open. I was intending
to address this with a with statement in Python 2.5+,
but it can be solved in Python 2.4 as well. I have
started to address this, e.g.
https://github.com/biopython/biopython/commit/0fb039b745b0b2ddacf2a6c9ee8afcdb56018f3c
https://github.com/biopython/biopython/commit/936ea5f348cc1feea8556d263761e77ce960217e

Assuming it will be easier to fix on Python 2.5+, it
might be pragmatic to ignore the issue in the short
term since it only seems to affect Jython on Windows.

Peter
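
For reference, a minimal sketch of the Python 2.4 compatible pattern, assuming
nothing about the actual SeqIO/AlignIO changes in the commits above: the handle
is closed in a finally clause, so the file can be deleted straight afterwards
without relying on the garbage collector. The helper name is hypothetical.

```python
import os
import tempfile


def read_first_line(filename):
    """Hypothetical helper: open, read, and always close the handle."""
    handle = open(filename)
    try:
        return handle.readline()
    finally:
        # Runs whether or not readline() raised, so no handle is left open.
        handle.close()


# Create a small temporary file to demonstrate with.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
out_handle = open(path, "w")
try:
    out_handle.write(">seq1\nACGT\n")
finally:
    out_handle.close()

first = read_first_line(path)

# Safe even on Windows: no open handle on the file remains.
os.remove(path)
```

On Python 2.5+ the try/finally pair can be replaced by a with statement,
which closes the handle in the same way.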


From rjalves at igc.gulbenkian.pt  Thu Nov 11 22:06:06 2010
From: rjalves at igc.gulbenkian.pt (Renato Alves)
Date: Thu, 11 Nov 2010 22:06:06 +0000
Subject: [Biopython-dev] Uniprot parsers
Message-ID: <4CDC68CE.9070401@igc.gulbenkian.pt>

Hi everyone,

With the arrival of the Uniprot XML parser, is the swiss format still
going to be maintained?

I just clashed with a 'swiss' format parsing problem present in the
1.55b release (and previous releases). Seems like the format might have
changed.

One random case is [1] where all of the 2nd and following IDs are
ignored by the parser. In Ensembl, for instance, the parser only
collects the ENST (the 1st) but not the ENSP (2nd) and ENSG (3rd)
identifiers.

Is this a known issue?

Regards,
Renato

[1] http://www.uniprot.org/uniprot/P31946.txt


From biopython at maubp.freeserve.co.uk  Thu Nov 11 22:26:22 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 11 Nov 2010 22:26:22 +0000
Subject: [Biopython-dev] Uniprot parsers
In-Reply-To: <4CDC68CE.9070401@igc.gulbenkian.pt>
References: <4CDC68CE.9070401@igc.gulbenkian.pt>
Message-ID: <AANLkTikNd0FuWn8_QspaRrmGe_ahLxbJ6=Hkt+1+GOfi@mail.gmail.com>

On Thu, Nov 11, 2010 at 10:06 PM, Renato Alves
<rjalves at igc.gulbenkian.pt> wrote:
> Hi everyone,
>
> With the arrival of the Uniprot XML parser, is the swiss format still
> going to be maintained?

Definitely yes in the short term, for one thing the swiss files are
smaller and much faster to parse. I suspect UniProt themselves
may want to retire the swiss text format at some point, but moving
every user over to XML will take some time.

> I just clashed with a 'swiss' format parsing problem present in the
> 1.55b release (and previous releases). Seems like the format might have
> changed.
>
> One random case is [1] where all of the 2nd and following IDs are
> ignored by the parser. In Ensembl, for instance, the parser only
> collects the ENST (the 1st) but not the ENSP (2nd) and ENSG (3rd)
> identifiers.
>
> Is this a known issue?
>

No - could you file a bug on this with a short example to explain
what result you get, and what you want?

Thanks,

Peter


From bugzilla-daemon at portal.open-bio.org  Thu Nov 11 23:09:04 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 11 Nov 2010 18:09:04 -0500
Subject: [Biopython-dev] [Bug 3156] New: UniProt XML and SwissProt parsers
	silently fail to parse all of database references
Message-ID: <bug-3156-42@http.bugzilla.open-bio.org/>

http://bugzilla.open-bio.org/show_bug.cgi?id=3156

           Summary: UniProt XML and SwissProt parsers silently fail to parse
                    all of database references
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: rjalves at igc.gulbenkian.pt


Example code:

from Bio import SeqIO, ExPASy
entry = SeqIO.read(ExPASy.get_sprot_raw('P31946'), 'swiss')

If you then inspect entry.dbxrefs, you can see that it includes:

['Ensembl:ENST00000353703', 'Ensembl:ENST00000372839']

but not
['Ensembl:ENSP00000300161', 'Ensembl:ENSG00000166913',
'Ensembl:ENSP00000361930', 'Ensembl:ENSG00000166913']

which are present in the original file as:
DR   Ensembl; ENST00000353703; ENSP00000300161; ENSG00000166913.
DR   Ensembl; ENST00000372839; ENSP00000361930; ENSG00000166913.


The same happens with the XML format and the new uniprot-xml parser where the
original file contains:

<dbReference type="Ensembl" id="ENST00000353703" key="75">
<property type="protein sequence ID" value="ENSP00000300161" />
<property type="gene ID" value="ENSG00000166913" />
</dbReference>


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From rjalves at igc.gulbenkian.pt  Thu Nov 11 22:32:41 2010
From: rjalves at igc.gulbenkian.pt (Renato Alves)
Date: Thu, 11 Nov 2010 22:32:41 +0000
Subject: [Biopython-dev] Uniprot parsers
In-Reply-To: <4CDC68CE.9070401@igc.gulbenkian.pt>
References: <4CDC68CE.9070401@igc.gulbenkian.pt>
Message-ID: <4CDC6F09.9090506@igc.gulbenkian.pt>

Actually I just tested the Uniprot-XML parser and it seems to suffer
from the same issue...

It ignores the following XML "properties":

<dbReference type="Ensembl" id="ENST00000353703" key="75">
<property type="protein sequence ID" value="ENSP00000300161" />
<property type="gene ID" value="ENSG00000166913" />
</dbReference>


Quoting Renato Alves on 11/11/2010 10:06 PM:
> Hi everyone,
> 
> With the arrival of the Uniprot XML parser, is the swiss format still
> going to be maintained?
> 
> I just clashed with a 'swiss' format parsing problem present in the
> 1.55b release (and previous releases). Seems like the format might have
> changed.
> 
> One random case is [1] where all of the 2nd and following IDs are
> ignored by the parser. In Ensembl, for instance, the parser only
> collects the ENST (the 1st) but not the ENSP (2nd) and ENSG (3rd)
> identifiers.
> 
> Is this a known issue?
> 
> Regards,
> Renato
> 
> [1] http://www.uniprot.org/uniprot/P31946.txt


From bugzilla-daemon at portal.open-bio.org  Thu Nov 11 23:50:46 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 11 Nov 2010 18:50:46 -0500
Subject: [Biopython-dev] [Bug 3156] UniProt XML and SwissProt parsers
	silently fail to parse all of database references
In-Reply-To: <bug-3156-42@http.bugzilla.open-bio.org/>
Message-ID: <201011112350.oABNokG9031101@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=3156


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2010-11-11 18:50 EST -------
That was by design, dbxrefs is a flat list and for consistency with other
formats we have only stored the primary identifier.

Would you regard this as two primary cross references, or six?

DR   Ensembl; ENST00000353703; ENSP00000300161; ENSG00000166913.
DR   Ensembl; ENST00000372839; ENSP00000361930; ENSG00000166913.


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From bugzilla-daemon at portal.open-bio.org  Thu Nov 11 23:59:20 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 11 Nov 2010 18:59:20 -0500
Subject: [Biopython-dev] [Bug 3156] UniProt XML and SwissProt parsers
	silently fail to parse all of database references
In-Reply-To: <bug-3156-42@http.bugzilla.open-bio.org/>
Message-ID: <201011112359.oABNxKcn031294@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=3156


------- Comment #2 from rjalves at igc.gulbenkian.pt  2010-11-11 18:59 EST -------
Five primary references since ENSG00000166913 is repeated twice (once per
line).

More precisely,
ENSG = Ensembl Gene
ENST = Ensembl Transcript
ENSP = Ensembl Protein
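
To make the counting concrete, here is a small sketch that splits the two DR
lines by hand. It is not the Biopython parser; the function name is
hypothetical and it assumes the simple "DR   database; id; id; id." layout
shown in this report.

```python
def parse_dr_line(line):
    """Split one DR line into (database, [identifiers]) (hypothetical helper)."""
    if not line.startswith("DR   "):
        raise ValueError("not a DR line: %r" % line)
    # Drop the line code and the trailing full stop, then split on ';'.
    fields = [f.strip() for f in line[5:].rstrip().rstrip(".").split(";")]
    return fields[0], fields[1:]


lines = [
    "DR   Ensembl; ENST00000353703; ENSP00000300161; ENSG00000166913.",
    "DR   Ensembl; ENST00000372839; ENSP00000361930; ENSG00000166913.",
]

primary_refs = []   # first identifier only, as currently stored
all_refs = []       # every distinct identifier on the lines
for line in lines:
    database, identifiers = parse_dr_line(line)
    primary_refs.append("%s:%s" % (database, identifiers[0]))
    for identifier in identifiers:
        ref = "%s:%s" % (database, identifier)
        if ref not in all_refs:
            all_refs.append(ref)
```

Keeping only the first identifier gives two references; keeping every
distinct identifier gives five, because the gene ID appears on both lines.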


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


From andrea at biocomp.unibo.it  Fri Nov 12 01:02:14 2010
From: andrea at biocomp.unibo.it (Andrea Pierleoni)
Date: Fri, 12 Nov 2010 02:02:14 +0100 (CET)
Subject: [Biopython-dev] [Bug 3156] UniProt XML and SwissProt parsers
 silently fail to parse all of database references
In-Reply-To: <mailman.2563.1289519960.2958.biopython-dev@lists.open-bio.org>
References: <mailman.2563.1289519960.2958.biopython-dev@lists.open-bio.org>
Message-ID: <7c21462addfa62e09fd6c42135cc7d76.squirrel@lipid.biocomp.unibo.it>

This was also by design in the XML parser;
there is a comment at line 343 of UniprotIO.py that addresses
this issue.
To parse this type of data, an adapter would have to be written for each
database type, since each DB has different data and can have a different
structure.
Also note that the Ensembl reference fields have recently undergone a
change of format in the XML file:

http://www.uniprot.org/docs/xml_news.htm

This happened in release 2010_10.

Andrea


From andrea at biocomp.unibo.it  Fri Nov 12 10:24:07 2010
From: andrea at biocomp.unibo.it (Andrea Pierleoni)
Date: Fri, 12 Nov 2010 11:24:07 +0100 (CET)
Subject: [Biopython-dev] Uniprot XML parser on TrEmbl
In-Reply-To: <cf0f600f7252fc960d7f3ac1a5c720c2.squirrel@lipid.biocomp.unibo.it>
References: <AANLkTineNfa+eMqcUyN7+anQ4OQOyLnVYOT+gM5H_Qg3@mail.gmail.com>
	<AANLkTimcrZBsL_1re6wYn0qr2H3Z-0Tq3Wo7748Pifvz@mail.gmail.com>
	<3cb74578eeedb8825ef75202c909b843.squirrel@lipid.biocomp.unibo.it>
	<AANLkTikCzLALtfhydpM7n3=fC=0+WoSuMnuzFxhmwgvV@mail.gmail.com>
	<ef80b0313dade56171f9d119dbc2baea.squirrel@lipid.biocomp.unibo.it>
	<AANLkTimx7OZvgqbWOtV9T33Zek6HODw8pWnOkEU3Wqwk@mail.gmail.com>
	<cf0f600f7252fc960d7f3ac1a5c720c2.squirrel@lipid.biocomp.unibo.it>
Message-ID: <430ea31975638cdd972a3aa01757fa03.squirrel@lipid.biocomp.unibo.it>

With the submitted patch the parser was able to correctly parse 12.347.303
entries in the 62Gb XML file in 2h 13m.
It looks like a reasonable performance to me, since you are going to spend
more time downloading the 8Gb gzipped file and decompressing it.

Andrea


From biopython at maubp.freeserve.co.uk  Fri Nov 12 10:29:51 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 12 Nov 2010 10:29:51 +0000
Subject: [Biopython-dev] Uniprot XML parser on TrEmbl
In-Reply-To: <430ea31975638cdd972a3aa01757fa03.squirrel@lipid.biocomp.unibo.it>
References: <AANLkTineNfa+eMqcUyN7+anQ4OQOyLnVYOT+gM5H_Qg3@mail.gmail.com>
	<AANLkTimcrZBsL_1re6wYn0qr2H3Z-0Tq3Wo7748Pifvz@mail.gmail.com>
	<3cb74578eeedb8825ef75202c909b843.squirrel@lipid.biocomp.unibo.it>
	<AANLkTikCzLALtfhydpM7n3=fC=0+WoSuMnuzFxhmwgvV@mail.gmail.com>
	<ef80b0313dade56171f9d119dbc2baea.squirrel@lipid.biocomp.unibo.it>
	<AANLkTimx7OZvgqbWOtV9T33Zek6HODw8pWnOkEU3Wqwk@mail.gmail.com>
	<cf0f600f7252fc960d7f3ac1a5c720c2.squirrel@lipid.biocomp.unibo.it>
	<430ea31975638cdd972a3aa01757fa03.squirrel@lipid.biocomp.unibo.it>
Message-ID: <AANLkTimPmqPDdiLAANuGepLWyyu74p=wGu2i-6gvb7LX@mail.gmail.com>

On Fri, Nov 12, 2010 at 10:24 AM, Andrea Pierleoni wrote:
> With the submitted patch the parser was able to correctly parse 12,347,303
> entries in the 62Gb XML file in 2h 13m.

That's good - but I thought the patch broke the unit test so I reverted it
last night. I'll double check this.

> it looks like a reasonable performance to me, since you are going to spend
> more time in downloading the 8Gb gzipped file and decompressing it.

On the other hand, you only download it once, and will probably only
decompress it once (although you can parse gzipped files from within
python if you want to), but you will parse it many times.

My point is it probably could be made faster (if anyone wanted to spend
the time), but it is fast enough already to be useful, and worth having
in Biopython :)

Peter


From andrea at biocomp.unibo.it  Fri Nov 12 11:05:43 2010
From: andrea at biocomp.unibo.it (Andrea Pierleoni)
Date: Fri, 12 Nov 2010 12:05:43 +0100 (CET)
Subject: [Biopython-dev] Uniprot XML parser on TrEmbl
In-Reply-To: <AANLkTimPmqPDdiLAANuGepLWyyu74p=wGu2i-6gvb7LX@mail.gmail.com>
References: <AANLkTineNfa+eMqcUyN7+anQ4OQOyLnVYOT+gM5H_Qg3@mail.gmail.com>
	<AANLkTimcrZBsL_1re6wYn0qr2H3Z-0Tq3Wo7748Pifvz@mail.gmail.com>
	<3cb74578eeedb8825ef75202c909b843.squirrel@lipid.biocomp.unibo.it>
	<AANLkTikCzLALtfhydpM7n3=fC=0+WoSuMnuzFxhmwgvV@mail.gmail.com>
	<ef80b0313dade56171f9d119dbc2baea.squirrel@lipid.biocomp.unibo.it>
	<AANLkTimx7OZvgqbWOtV9T33Zek6HODw8pWnOkEU3Wqwk@mail.gmail.com>
	<cf0f600f7252fc960d7f3ac1a5c720c2.squirrel@lipid.biocomp.unibo.it>
	<430ea31975638cdd972a3aa01757fa03.squirrel@lipid.biocomp.unibo.it>
	<AANLkTimPmqPDdiLAANuGepLWyyu74p=wGu2i-6gvb7LX@mail.gmail.com>
Message-ID: <6c12e6fda6bab033738ed36d74d2a24a.squirrel@lipid.biocomp.unibo.it>


> That's good - but I thought the patch broke the unit test so I reverted it
> last night. I'll double check this.
>

yes I've seen it in github, can you fix it?


> On the other hand, you only download it once, and will probably only
> decompress it once (although you can parse gzipped files from within
> python if you want to), but you will parse it many times.
>

Well, if you're looking at performance, you're not scanning a 62Gb file each
time you search for an entry; you're going to index it. Then of course it
depends on what you are doing... but, given the monthly release, maybe you're
downloading and decompressing (or parsing a compressed file) once a month.
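
To make the indexing idea concrete, here is a minimal sketch, assuming
Python 3 and a SwissProt-style flat file in which every record starts with
an "ID   " line and ends with "//". The function names build_offset_index
and fetch_record are hypothetical, not part of Biopython (Bio.SeqIO.index
offers this kind of lookup for the formats it supports).

```python
def build_offset_index(path):
    """Scan the file once, mapping each record name to its byte offset."""
    index = {}
    with open(path, "rb") as handle:
        while True:
            offset = handle.tell()  # position of the line about to be read
            line = handle.readline()
            if not line:
                break
            if line.startswith(b"ID   "):
                name = line.split()[1].decode("ascii")
                index[name] = offset
    return index


def fetch_record(path, offset):
    """Return the raw text of the record at offset, up to its '//' line."""
    lines = []
    with open(path, "rb") as handle:
        handle.seek(offset)
        for line in handle:
            lines.append(line.decode("ascii"))
            if line.startswith(b"//"):
                break
    return "".join(lines)
```

The full scan is paid once per release; after that each lookup is a seek
plus a short read, whatever the size of the file.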

> My point is it probably could be made faster (if anyone wanted to spend
> the time), but it is fast enough already to be useful, and worth having
> in Biopython :)

Yes, I hope it can be made faster, but I have no idea how, since
the process is very straightforward. I have not profiled the
parser, so I cannot exclude some bottleneck.
The only obvious speed-up would be using the multiprocessing library on
multi-CPU systems, but I've never seen it used in Biopython.
It should be really easy to implement, and maybe we can think about it
after Python 2.4 support is dropped. As far as I know, multiprocessing is
included in Python 2.6 and available for Python 2.5.

On the other hand, Biopython has the fastest UniProt XML parser among Bio*
projects and (to my knowledge) the fastest public parser on the planet ;)
I bet the UniProt guys have their own parser...

Andrea


From biopython at maubp.freeserve.co.uk  Fri Nov 12 12:00:42 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 12 Nov 2010 12:00:42 +0000
Subject: [Biopython-dev] Uniprot XML parser on TrEmbl
In-Reply-To: <6c12e6fda6bab033738ed36d74d2a24a.squirrel@lipid.biocomp.unibo.it>
References: <AANLkTineNfa+eMqcUyN7+anQ4OQOyLnVYOT+gM5H_Qg3@mail.gmail.com>
	<AANLkTimcrZBsL_1re6wYn0qr2H3Z-0Tq3Wo7748Pifvz@mail.gmail.com>
	<3cb74578eeedb8825ef75202c909b843.squirrel@lipid.biocomp.unibo.it>
	<AANLkTikCzLALtfhydpM7n3=fC=0+WoSuMnuzFxhmwgvV@mail.gmail.com>
	<ef80b0313dade56171f9d119dbc2baea.squirrel@lipid.biocomp.unibo.it>
	<AANLkTimx7OZvgqbWOtV9T33Zek6HODw8pWnOkEU3Wqwk@mail.gmail.com>
	<cf0f600f7252fc960d7f3ac1a5c720c2.squirrel@lipid.biocomp.unibo.it>
	<430ea31975638cdd972a3aa01757fa03.squirrel@lipid.biocomp.unibo.it>
	<AANLkTimPmqPDdiLAANuGepLWyyu74p=wGu2i-6gvb7LX@mail.gmail.com>
	<6c12e6fda6bab033738ed36d74d2a24a.squirrel@lipid.biocomp.unibo.it>
Message-ID: <AANLkTi=zwkXWAUfkDaEAfU+FxrNAqX5KLUr0a8uOGZUY@mail.gmail.com>

On Fri, Nov 12, 2010 at 11:05 AM, Andrea Pierleoni
<andrea at biocomp.unibo.it> wrote:
>
>> That's good - but I thought the patch broke the unit test so I reverted it
>> last night. I'll double check this.
>>
>
> yes I've seen it in github, can you fix it?
>

Probably. I'll make time to look at it before the Biopython 1.56 release
(which is unlikely to happen this week, delayed by the identification of
some problems running under Jython on Windows).

>> On the other hand, you only download it once, and will probably only
>> decompress it once (although you can parse gzipped files from within
>> python if you want to), but you will parse it many times.
>>
>
> well, if your looking to performance, you're not scanning a 62Gb file
> each time you search for an entry, but your going to index it. the of
> course it depends on what you are doing... but, given the monthly
> release, maybe you're downloading and decompressing (or parsing
> a compressed file) once a month.

Yeah, it depends.

>> My point is it probably could be made faster (if anyone wanted to spend
>> the time), but it is fast enough already to be useful, and worth having
>> in Biopython :)
>
> Yes, I hope it can be made faster, but I have no idea about this, since
> the process is very straightforward. I did not make any profiling of the
> parser, so I cannot exclude some bottleneck.

That would be worth while at some point.

> the only obvious speed up would be using the multiprocessing library in
> multi-cpu system, but I've never seen it used in biopython.

We haven't been able to due to the Python 2.4 requirement, but
I know of people using Biopython and multiprocessing together.

> It should be really easy to implement, and maybe we can think about
> it after python 2.4 support is dropped. as far as i know, multiprocessing
> is included in python 2.6 and available in python 2.5.

Personally I'd try profiling the current single threaded code before
going to multiprocessing.
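
For reference, a minimal profiling sketch using the standard library's
cProfile and pstats modules, assuming Python 3. The parse_records function
is a hypothetical stand-in for the real parser, and profile_call is just an
illustrative helper name.

```python
import cProfile
import io
import pstats


def parse_records(lines):
    """Hypothetical stand-in for a parser: split 'key=value' lines."""
    return [tuple(line.split("=", 1)) for line in lines]


def profile_call(func, *args):
    """Run func(*args) under cProfile; return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time so the most expensive call chains come first.
    stats.sort_stats("cumulative").print_stats(10)
    return result, stream.getvalue()
```

Calling profile_call on the parser with a modest input file and reading the
top few lines of the report should show whether the time goes into the XML
library or into our own code.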

> On the other hand, Biopython has the fastest uniprot XML parse
> among Bio* projects and (to my knowledge) the fastest public
> parser on the planet ;) I bet Uniprot guys have their parser...

Which of the other Bio* projects have a Uniprot XML parser?
(Or was that intended as a joke?)

Peter


From p.j.a.cock at googlemail.com  Fri Nov 12 17:18:52 2010
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 12 Nov 2010 17:18:52 +0000
Subject: [Biopython-dev] test_PopGen_GenePop_EasyController.py failure
	on Jython
In-Reply-To: <AANLkTimmyGz_hx5PtuiwcDq39eW=VfV=7u+Gas92jRih@mail.gmail.com>
References: <AANLkTimQ+XXcEDwrC6AR15OdvDtLV+CqaKUnBv0=+F0=@mail.gmail.com>
	<AANLkTikDVija_mNTs4vE+BFbndm9OpwA2+cYLFKvg=Yj@mail.gmail.com>
	<AANLkTi=gkMchj-Fao8HtvPHSKdOhDKT-o7QQhZap2SkW@mail.gmail.com>
	<AANLkTimmyGz_hx5PtuiwcDq39eW=VfV=7u+Gas92jRih@mail.gmail.com>
Message-ID: <AANLkTikWZt42DK2rp2hxhWBKGGHpA26QQu4te-m4hrnA@mail.gmail.com>

Hi all,

I've exchanged a few emails with Tiago off list regarding an inconsistent
test_PopGen_GenePop_EasyController.py problem (most visible on
Jython), giving error "Unable to open file genepop.txt".

I've just had it from Python 2.7 on a 32bit Linux machine:

======================================================================
ERROR: test_get_avg_fst_pair (test_PopGen_GenePop_EasyController.AppTest)
Test get pairwise Fst.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/pjcock/repositories/biopython/Tests/test_PopGen_GenePop_EasyController.py",
line 98, in test_get_avg_fst_pair
    pop_fis =  self.ctrl.get_avg_fst_pair()
  File "/home/pjcock/repositories/biopython/build/lib.linux-i686-2.7/Bio/PopGen/GenePop/EasyController.py",
line 162, in get_avg_fst_pair
    return self._controller.calc_fst_pair(self._fname)[1]
  File "/home/pjcock/repositories/biopython/build/lib.linux-i686-2.7/Bio/PopGen/GenePop/Controller.py",
line 819, in calc_fst_pair
    self._run_genepop([".ST2", ".MIG"], [6,2], fname)
  File "/home/pjcock/repositories/biopython/build/lib.linux-i686-2.7/Bio/PopGen/GenePop/Controller.py",
line 296, in _run_genepop
    % (ret, e_out.strip().split("\n",1)[0]))
IOError: GenePop error -11, Unable to open file genepop.txt

======================================================================
ERROR: test_get_avg_fst_pair_locus (test_PopGen_GenePop_EasyController.AppTest)
Test get average Fst for pairwise pops on a locus.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/pjcock/repositories/biopython/Tests/test_PopGen_GenePop_EasyController.py",
line 93, in test_get_avg_fst_pair_locus
    self.assertEqual(len(self.ctrl.get_avg_fst_pair_locus("Locus4")), 45)
  File "/home/pjcock/repositories/biopython/build/lib.linux-i686-2.7/Bio/PopGen/GenePop/EasyController.py",
line 166, in get_avg_fst_pair_locus
    iter = self._controller.calc_fst_pair(self._fname)[0]
  File "/home/pjcock/repositories/biopython/build/lib.linux-i686-2.7/Bio/PopGen/GenePop/Controller.py",
line 819, in calc_fst_pair
    self._run_genepop([".ST2", ".MIG"], [6,2], fname)
  File "/home/pjcock/repositories/biopython/build/lib.linux-i686-2.7/Bio/PopGen/GenePop/Controller.py",
line 296, in _run_genepop
    % (ret, e_out.strip().split("\n",1)[0]))
IOError: GenePop error -11, Unable to open file genepop.txt

----------------------------------------------------------------------


This failed twice in a row, then passed four times in a row (Linux, Python 2.7).
I suspect the issue was related to machine IO load - during the first
tests I had
something compiling at the same time. I can't reproduce it on demand :(

I've also seen it on the Mac with Apple's Python 2.6 (although it is
usually fine).

However, I'm seeing this (consistently?) with Jython 2.5.1 on the Mac.

Peter


From biopython at maubp.freeserve.co.uk  Fri Nov 12 17:47:22 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 12 Nov 2010 17:47:22 +0000
Subject: [Biopython-dev] Biopython 1.56 release plans
In-Reply-To: <AANLkTikq5TXOhAB-WVurn=WDNM8GiCrPRznrjcZ0Caew@mail.gmail.com>
References: <AANLkTikq5TXOhAB-WVurn=WDNM8GiCrPRznrjcZ0Caew@mail.gmail.com>
Message-ID: <AANLkTimkZk3n9VLb3fLtg2-GwvpMhZreSVnBbB4-LB6W@mail.gmail.com>

On Thu, Nov 4, 2010 at 5:13 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> Hi all,
>
> I've mentioned in recent threads that I think we should try and
> release Biopython 1.56 this month (November 2010).
>
> I think the NEWS file is pretty up to date, and covers important
> new functionality like Andrea Pierleoni's UniProt XML parser
> and the IMGT support (with Uri Laserson).
>
> Is there any other functionality which is ready for merging?
>
> For example, Tiago - you've been doing lots of work on your
> branch with the PopGen code. Is that code ready? I'm willing
> to do the git merge/rebase.
>
> Is there any reason to bother with a beta release this time?
>
> If there are no pressing additions, I may be able to do the
> release tomorrow - otherwise how about aiming for Thursday
> or Friday next week (11 or 12 November)?

As people will have noticed, the release didn't happen this week.

Tiago has been doing some excellent work with the prototype
buildbot server (see http://events.open-bio.org:8010/grid for
the current temporary home), and as part of this we've set
up a few machines as buildslaves. See this thread:
http://lists.open-bio.org/pipermail/biopython-dev/2010-November/008376.html

Running under Jython on the Mac showed a few problems
which appear to now be sorted, other than an apparent
problem with the GenePop tool.

Unfortunately running under Jython on Windows XP has
revealed several new problems, e.g.
http://lists.open-bio.org/pipermail/biopython-dev/2010-November/008431.html

As things stand all the tests (*) are fine on "C" Python on
Linux, Mac, and Windows. They are also fine on Jython
on Linux, give some warnings on Jython on Mac, and 3
errors on Windows.

Hopefully we can address these three test failures (or
at least understand them) and do Biopython 1.56 at
the end of next week instead.

Peter

(*) We haven't audited all the slave test output to check
which tests are being skipped due to missing optional
dependencies yet. e.g. command line tools, or Python
modules like ReportLab or NetworkX.


From p.j.a.cock at googlemail.com  Fri Nov 12 17:55:57 2010
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 12 Nov 2010 17:55:57 +0000
Subject: [Biopython-dev] test_PopGen_GenePop_EasyController.py failure
	on Jython
In-Reply-To: <AANLkTikWZt42DK2rp2hxhWBKGGHpA26QQu4te-m4hrnA@mail.gmail.com>
References: <AANLkTimQ+XXcEDwrC6AR15OdvDtLV+CqaKUnBv0=+F0=@mail.gmail.com>
	<AANLkTikDVija_mNTs4vE+BFbndm9OpwA2+cYLFKvg=Yj@mail.gmail.com>
	<AANLkTi=gkMchj-Fao8HtvPHSKdOhDKT-o7QQhZap2SkW@mail.gmail.com>
	<AANLkTimmyGz_hx5PtuiwcDq39eW=VfV=7u+Gas92jRih@mail.gmail.com>
	<AANLkTikWZt42DK2rp2hxhWBKGGHpA26QQu4te-m4hrnA@mail.gmail.com>
Message-ID: <AANLkTikoqMEDZvQAn+tvedvhe6FE+udf4==FdcA-V4rz@mail.gmail.com>

2010/11/12 Peter Cock <p.j.a.cock at googlemail.com>:
> Hi all,
>
> I've exchanged a few emails with Tiago off list regarding an inconsistent
> test_PopGen_GenePop_EasyController.py problem (most visible on
> Jython), giving error "Unable to open file genepop.txt".
>
> I've just had it from Python 2.7 on a 32bit Linux machine:
>
> ======================================================================
> ERROR: test_get_avg_fst_pair (test_PopGen_GenePop_EasyController.AppTest)
> Test get pairwise Fst.
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/home/pjcock/repositories/biopython/Tests/test_PopGen_GenePop_EasyController.py",
> line 98, in test_get_avg_fst_pair
>     pop_fis =  self.ctrl.get_avg_fst_pair()
>   File "/home/pjcock/repositories/biopython/build/lib.linux-i686-2.7/Bio/PopGen/GenePop/EasyController.py",
> line 162, in get_avg_fst_pair
>     return self._controller.calc_fst_pair(self._fname)[1]
>   File "/home/pjcock/repositories/biopython/build/lib.linux-i686-2.7/Bio/PopGen/GenePop/Controller.py",
> line 819, in calc_fst_pair
>     self._run_genepop([".ST2", ".MIG"], [6,2], fname)
>   File "/home/pjcock/repositories/biopython/build/lib.linux-i686-2.7/Bio/PopGen/GenePop/Controller.py",
> line 296, in _run_genepop
>     % (ret, e_out.strip().split("\n",1)[0]))
> IOError: GenePop error -11, Unable to open file genepop.txt
>
> ======================================================================
> ERROR: test_get_avg_fst_pair_locus (test_PopGen_GenePop_EasyController.AppTest)
> Test get average Fst for pairwise pops on a locus.
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/home/pjcock/repositories/biopython/Tests/test_PopGen_GenePop_EasyController.py",
> line 93, in test_get_avg_fst_pair_locus
>     self.assertEqual(len(self.ctrl.get_avg_fst_pair_locus("Locus4")), 45)
>   File "/home/pjcock/repositories/biopython/build/lib.linux-i686-2.7/Bio/PopGen/GenePop/EasyController.py",
> line 166, in get_avg_fst_pair_locus
>     iter = self._controller.calc_fst_pair(self._fname)[0]
>   File "/home/pjcock/repositories/biopython/build/lib.linux-i686-2.7/Bio/PopGen/GenePop/Controller.py",
> line 819, in calc_fst_pair
>     self._run_genepop([".ST2", ".MIG"], [6,2], fname)
>   File "/home/pjcock/repositories/biopython/build/lib.linux-i686-2.7/Bio/PopGen/GenePop/Controller.py",
> line 296, in _run_genepop
>     % (ret, e_out.strip().split("\n",1)[0]))
> IOError: GenePop error -11, Unable to open file genepop.txt
>
> ----------------------------------------------------------------------
>
>
> This failed twice in a row, then passed four times in a row (Linux, Python 2.7).
> I suspect the issue was related to machine IO load - during the first
> tests I had something compiling at the same time. I can't reproduce
> it on demand :(
>
> I've also seen it on the Mac with Apple's Python 2.6 (although usually it is
> usually fine).
>
> However, I'm seeing this (consistently?) with Jython 2.5.1 on the Mac.

Well right now on my Mac with Jython, the test passes but with lots of warnings:

$ jython test_PopGen_GenePop_EasyController.py
Test basic info. ... ok
Test Nm estimation. ... ok
Test allele frequency. ... ok
Test get alleles. ... ok
Test get alleles for all populations. ... ok
Test average Fis. ... ok
Test get pairwise Fst. ... ok
Test get average Fst for pairwise pops on a locus. ... Exception
OSError: [Errno 0] couldn't delete file: 'big.gen.INF' in <bound
method _FileIterator.__del__ of
<Bio.PopGen.GenePop.Controller._FileIterator instance at 0x1>> ignored
Exception OSError: [Errno 0] couldn't delete file: 'big.gen.IN2' in
<bound method _FileIterator.__del__ of
<Bio.PopGen.GenePop.Controller._FileIterator instance at 0x2>> ignored
ok
Test F stats. ... ok
Test get Fis. ... Exception OSError: [Errno 0] couldn't delete file:
'big.gen.ST2' in <bound method _FileIterator.__del__ of
<Bio.PopGen.GenePop.Controller._FileIterator instance at 0x3>> ignored
ok
Test genotype count. ... ok
Test heterozygosity info. ... Exception OSError: [Errno 0] couldn't
delete file: 'big.gen.INF' in <bound method _FileIterator.__del__ of
<Bio.PopGen.GenePop.Controller._FileIterator instance at 0x4>> ignored
Exception OSError: [Errno 0] couldn't delete file: 'big.gen.IN2' in
<bound method _FileIterator.__del__ of
<Bio.PopGen.GenePop.Controller._FileIterator instance at 0x5>> ignored
ok
Test multilocus F stats. ... ok

----------------------------------------------------------------------
Ran 13 tests in 5.912s

Or another example, the same machine as a build slave:

http://events.open-bio.org:8010/builders/OS%20X%2010.6%20Snow%20Leopard%20-%20Jython%202.5.1/builds/9/steps/shell/logs/stdio

On the previous build Jython on Mac gave the same error I reported
above on Linux with "C" Python 2.7:

http://events.open-bio.org:8010/builders/OS%20X%2010.6%20Snow%20Leopard%20-%20Jython%202.5.1/builds/7/steps/shell/logs/stdio

Peter


From andrea at biocomp.unibo.it  Fri Nov 12 20:45:24 2010
From: andrea at biocomp.unibo.it (Andrea Pierleoni)
Date: Fri, 12 Nov 2010 21:45:24 +0100 (CET)
Subject: [Biopython-dev] Uniprot XML parser on TrEmbl
Message-ID: <5c0bc5f9bead03ed216fafaff35c709b.squirrel@lipid.biocomp.unibo.it>


> We haven't been able to due to the Python 2.4 requirement, but
> I know of people using Biopython and multiprocessing together.
>

good


> Personally I'd try profiling the current single threaded code before
> going to multiprocessing.
>

yes, of course.

>> On the other hand, Biopython has the fastest uniprot XML parse
>> among Bio* projects and (to my knowledge) the fastest public
>> parser on the planet ;) I bet Uniprot guys have their parser...
>
> Which of the other Bio* projects have a Uniprot XML parser?
> (Or was that intended as a joke?)
>

It was both a joke and a matter of fact, since I don't know about other
publicly available parsers. Usually I look at a glass as half full...

Andrea


From gawbul at gmail.com  Sat Nov 13 21:24:43 2010
From: gawbul at gmail.com (Steve Moss)
Date: Sat, 13 Nov 2010 21:24:43 +0000
Subject: [Biopython-dev] Developing for the BioPython project...
Message-ID: <AANLkTinrVnyZSgr3WbX40-ACMdcjhAShBUAhNvb63Hg=@mail.gmail.com>

Hi all,

I've just started a PhD centring around evolutionary comparative genomics,
and will be focusing on bioinformatics and computational biology
methodology.

I'm really keen to use Python and BioPython in particular throughout my PhD
and would like to contribute any code I can to aid in promoting BioPython as
a viable alternative to BioPerl, which I feel has a larger user
base currently? Is there any particular process of registration to become
involved with development, or is it just a case of fork'ing the repository
from github?

Cheers,

Steve
-- 
Kindest regards,

Steve Moss
http://stevemoss.ath.cx


From eric.talevich at gmail.com  Sat Nov 13 23:05:24 2010
From: eric.talevich at gmail.com (Eric Talevich)
Date: Sat, 13 Nov 2010 18:05:24 -0500
Subject: [Biopython-dev] Developing for the BioPython project...
In-Reply-To: <AANLkTinrVnyZSgr3WbX40-ACMdcjhAShBUAhNvb63Hg=@mail.gmail.com>
References: <AANLkTinrVnyZSgr3WbX40-ACMdcjhAShBUAhNvb63Hg=@mail.gmail.com>
Message-ID: <AANLkTik86tnDLV6M4sFvjNJ_Kb_MTDGwv19U8njtwCrk@mail.gmail.com>

On Sat, Nov 13, 2010 at 4:24 PM, Steve Moss <gawbul at gmail.com> wrote:

> Hi all,
>
> I've just started a PhD centring around evolutionary comparative genomics,
> and will be focusing on bioinformatics and computational biology
> methodology.
>
> I'm really keen to use Python and BioPython in particular throughout my PhD
> and would like to contribute any code I can to aid in promoting BioPython
> as
> viable alternative to BioPerl, which I feel has a larger user
> base currently? Is there any particular process of registration to become
> involved with development, or is it just a case of fork'ing the repository
> from github?
>
>
Hi Steve,

If you've joined the biopython-dev mailing list, you're in the club. Feel
free to fork away!

To get a feel for where development is focused right now, you can look at
our wiki page for active projects:
http://biopython.org/wiki/Active_projects

We're also collectively working on Python 3 compatibility (C extensions
still need some work), though that isn't listed.

Since you're a new grad student, you might have some leeway to get involved
with Google Summer of Code next summer. The project ideas for Biopython,
Open Bio, and NESCent drummed up last year are still worth doing, or might
inspire you to do something else on your own:
http://biopython.org/wiki/Google_Summer_of_Code
http://www.open-bio.org/wiki/Google_Summer_of_Code
https://www.nescent.org/wg_phyloinformatics/Phyloinformatics_Summer_of_Code_2010

Cheers,
Eric


From biopython at maubp.freeserve.co.uk  Mon Nov 15 14:34:40 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 15 Nov 2010 14:34:40 +0000
Subject: [Biopython-dev] FASTA filtering by ID
Message-ID: <AANLkTikF7+nPAsBd7u=yYx=AKXmqrNpzUOP8RRnd40o8@mail.gmail.com>

Hi all,

Something I want to do in several of my workflows is to filter a
FASTA file (or potentially other format sequence files) using a
list of desired identifiers (e.g. a column from a tabular file).

Right now I can achieve this with three steps in Galaxy.
Suppose I have:

Dataset #1, FASTA file

Dataset #2, Tabular file with identifiers of interest (e.g. BLAST hits,
or filtered output from a sequence analysis tool)

Then:

Create tabular Dataset #3 using FASTA-to-tabular on Dataset #1,
subject to the enhancement proposed here:
http://lists.bx.psu.edu/pipermail/galaxy-dev/2010-November/003717.html

Create tabular Dataset #4 using join on Datasets #2 and #3 using the
matched identifier columns. This does the filtering.

Create FASTA Dataset #5 using tabular-to-FASTA on Dataset #4.

This works (at least for reasonably sized datasets), but requires
three steps and the creation of at least two temporary files.

I'd like to introduce another tool under "FASTA manipulation"
to do it on one step (rather than three). Am I going against
the apparent Galaxy ideal that complex manipulations should
be done with tabular files? Would such a FASTA filter tool be
of interest to add directly to Galaxy (e.g. under the "FASTA
manipulation" section), or better off on the community tool shed?
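
As a rough illustration of the one-step version, here is a minimal sketch
in plain Python 3, assuming the identifier is the first word after ">" and
the tabular file is tab-separated. The names ids_from_column and
filter_fasta are hypothetical; the same could presumably be written with
Bio.SeqIO.parse and Bio.SeqIO.write.

```python
def ids_from_column(table_lines, column=0):
    """Collect identifiers from one column of a tab-separated table."""
    return [
        line.rstrip("\n").split("\t")[column]
        for line in table_lines
        if line.strip()  # skip blank lines
    ]


def filter_fasta(fasta_lines, wanted_ids):
    """Yield the lines of FASTA records whose identifier is wanted."""
    wanted = set(wanted_ids)
    keep = False
    for line in fasta_lines:
        if line.startswith(">"):
            # The identifier is the first word of the header line.
            fields = line[1:].split(None, 1)
            keep = bool(fields) and fields[0] in wanted
        if keep:
            yield line
```

Both functions stream line by line, so only the set of wanted identifiers
is held in memory and no intermediate tabular file is needed.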

Thanks,

Peter


From biopython at maubp.freeserve.co.uk  Mon Nov 15 17:05:00 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 15 Nov 2010 17:05:00 +0000
Subject: [Biopython-dev] Biopython 1.56 release plans
In-Reply-To: <AANLkTimkZk3n9VLb3fLtg2-GwvpMhZreSVnBbB4-LB6W@mail.gmail.com>
References: <AANLkTikq5TXOhAB-WVurn=WDNM8GiCrPRznrjcZ0Caew@mail.gmail.com>
	<AANLkTimkZk3n9VLb3fLtg2-GwvpMhZreSVnBbB4-LB6W@mail.gmail.com>
Message-ID: <AANLkTiknLLR3=7DKk5PANLAAMjPHK_kE9Detd==koZCe@mail.gmail.com>

On Fri, Nov 12, 2010 at 5:47 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> On Thu, Nov 4, 2010 at 5:13 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>> Hi all,
>>
>> I've mentioned in recent threads that I think we should try and
>> release Biopython 1.56 this month (November 2010).
>>
>> ...
>
> As people will have noticed, the release didn't happen this week.
>
> ...
>
> Unfortunately running under Jython on Windows XP has
> revealed several new problems, e.g.
> http://lists.open-bio.org/pipermail/biopython-dev/2010-November/008431.html
>
> ...
>
> Hopefully we can address these three test failures (or
> at least understand them) and do Biopython 1.56 at
> the end of next week instead.

Two of the problems on Jython on Windows were down
to the Windows specific command line tool detection
not being used, now fixed:

https://github.com/biopython/biopython/commit/db41d7e4bfd8f5d4ea44bf8254334fcd7b76474f
https://github.com/biopython/biopython/commit/7e5b71093c8408de140de1937480e26aaaa5daf1

There was also a heap space problem solved by a
more memory efficient __getitem__ method for the
UnknownSeq object (still room for improvement here).

https://github.com/biopython/biopython/commit/125d8d31d07f57628c231286afae99a178e6f2c5
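
The general idea can be sketched as follows, assuming Python 3. UnknownRun
is a hypothetical stand-in written for this illustration and is not the
actual UnknownSeq code from the commit above; it shows how a slice can be
answered from the length alone, without first building the full string.

```python
class UnknownRun:
    """Hypothetical sequence made of one repeated character."""

    def __init__(self, length, character="?"):
        self._length = length
        self._character = character

    def __len__(self):
        return self._length

    def __getitem__(self, index):
        if isinstance(index, slice):
            # range() works out the slice length without allocating
            # a string of self._length characters.
            new_length = len(range(*index.indices(self._length)))
            return UnknownRun(new_length, self._character)
        if not -self._length <= index < self._length:
            raise IndexError("sequence index out of range")
        return self._character

    def __str__(self):
        # Only here is the full string actually materialised.
        return self._character * self._length
```

With this approach slicing a very long unknown sequence costs about the
same as slicing a short one, because only two small attributes are copied.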

So, we now have a clean bill of health from the offline
tests run on the buildslaves (apart from the occasional
GenePop failure where retesting can make it work).

I still want to look at the SeqIO/AlignIO handle issue,
http://lists.open-bio.org/pipermail/biopython-dev/2010-November/008431.html
and also the UniProt XML issue,
http://lists.open-bio.org/pipermail/biopython-dev/2010-November/008440.html

Peter


From peter at maubp.freeserve.co.uk  Thu Nov 18 15:47:08 2010
From: peter at maubp.freeserve.co.uk (Peter)
Date: Thu, 18 Nov 2010 15:47:08 +0000
Subject: [Biopython-dev] Dropping Python 2.4 Support?
Message-ID: <AANLkTikTaHLuFGCHzXJBpFCENwVj4oDbY1WM1wgKPwhn@mail.gmail.com>

Dear Biopythoneers,

Are any of you still using Biopython on Python 2.4?
http://news.open-bio.org/news/2010/11/dropping-python24-support/

Please get in touch if dropping support for Python 2.4 would be a
problem. Otherwise we plan for Biopython 1.56 (expected by the
end of this month) to be our last release to work with Python 2.4.

Thanks,

Peter


From biopython at maubp.freeserve.co.uk  Thu Nov 18 17:45:30 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 18 Nov 2010 17:45:30 +0000
Subject: [Biopython-dev] FASTA filtering by ID
In-Reply-To: <AANLkTikF7+nPAsBd7u=yYx=AKXmqrNpzUOP8RRnd40o8@mail.gmail.com>
References: <AANLkTikF7+nPAsBd7u=yYx=AKXmqrNpzUOP8RRnd40o8@mail.gmail.com>
Message-ID: <AANLkTinaE7kwWRO-rSN_5N9HoMNMQhDfrF+0r0JSv5So@mail.gmail.com>

Sorry folks - I meant to post that to the Galaxy development
mailing list, http://lists.bx.psu.edu/listinfo/galaxy-dev

Peter


From biopython at maubp.freeserve.co.uk  Wed Nov 24 18:03:03 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 24 Nov 2010 18:03:03 +0000
Subject: [Biopython-dev] Uniprot XML parser on TrEmbl
In-Reply-To: <5c0bc5f9bead03ed216fafaff35c709b.squirrel@lipid.biocomp.unibo.it>
References: <5c0bc5f9bead03ed216fafaff35c709b.squirrel@lipid.biocomp.unibo.it>
Message-ID: <AANLkTin7aHw4z4=JXP_n+b1Q_rhWgGpzfY=uw81F99FP@mail.gmail.com>

Hi Andrea,

I *think* I have fixed the problem with empty names in the UniProt XML
format, without affecting the unit tests, but I don't have the 62GB free to
unpack uniprot_trembl.xml.gz to try it out:

https://github.com/biopython/biopython/commit/bb971b2a7384d42d9a6e4994e59299a90e6cc700

Would you be able to retest the trunk code on that please?

I also changed the handling of the organism host (where present)
in both the UniProt and SwissProt parsers to be more consistent.
I've checked uniprot_sprot.dat still parses, but haven't tried the
much bigger uniprot_trembl.dat from uniprot_trembl.dat.gz - so
again, would you be able to retest the "swiss" text parser too?

Many thanks,

Peter

P.S. Did you get any reply from UniProt about the apparent error in
the Q2LEH1 record within uniprot_trembl.xml.gz?


From andrea at biocomp.unibo.it  Thu Nov 25 16:09:28 2010
From: andrea at biocomp.unibo.it (Andrea Pierleoni)
Date: Thu, 25 Nov 2010 17:09:28 +0100 (CET)
Subject: [Biopython-dev] Uniprot XML parser on TrEmbl
In-Reply-To: <AANLkTin7aHw4z4=JXP_n+b1Q_rhWgGpzfY=uw81F99FP@mail.gmail.com>
References: <5c0bc5f9bead03ed216fafaff35c709b.squirrel@lipid.biocomp.unibo.it>
	<AANLkTin7aHw4z4=JXP_n+b1Q_rhWgGpzfY=uw81F99FP@mail.gmail.com>
Message-ID: <17fb1526d4af40ebbe4e6129d1bd0c2c.squirrel@lipid.biocomp.unibo.it>

> Hi Andrea,
>
> I *think* I have fixed the problem with empty names in the UniProt XML
> format, without affecting the unit tests, but I don't have the 62GB free
> to
> unpack uniprot_trembl.xml.gz to try it out:
>
> https://github.com/biopython/biopython/commit/bb971b2a7384d42d9a6e4994e59299a90e6cc700
>
> Would you be able to retest the trunk code on that please?
>

I've just completed a run on the 8Gb gzipped trembl file (I don't have the
free 62Gb either) and it was ok, with zero errors.
By the way it took just 2h 18m, the same time it took on the uncompressed
62Gb XML file. So it's definitely better not to decompress this file...
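
For anyone who wants to try the same thing, here is a minimal sketch of
streaming a gzipped XML file without unpacking it, assuming Python 3 and
only the standard library. The function name count_entries is hypothetical;
with Biopython the same gzip handle would presumably be passed to
SeqIO.parse(handle, "uniprot-xml") instead.

```python
import gzip
import xml.etree.ElementTree as ET


def count_entries(path):
    """Count <entry> elements in a gzipped UniProt-style XML file."""
    count = 0
    with gzip.open(path, "rb") as handle:
        for event, elem in ET.iterparse(handle, events=("end",)):
            # Compare the local tag name so any XML namespace is accepted.
            if elem.tag.rsplit("}", 1)[-1] == "entry":
                count += 1
                elem.clear()  # drop the finished entry's children
    return count
```

The file is decompressed in blocks as the parser asks for data, so the
uncompressed XML never has to exist on disk.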


> I also changed the handling of the organism host (where present)
> in both the UniProt and SwissProt parsers to be more consistent.
good

> I've checked uniprot_sprot.dat still parses, but haven't tried the
> much bigger uniprot_trembl.dat from uniprot_trembl.dat.gz - so
> again, would you be able to retest the "swiss" text parser too?

I'll test this too and let you know.

>
> Many thanks,
>
> Peter
>
> P.S. Did you get any reply from UniProt about the apparent error in
> the Q2LEH1 record within uniprot_trembl.xml.gz?
>

Unfortunately not.

Andrea


From andrea at biocomp.unibo.it  Fri Nov 26 13:54:29 2010
From: andrea at biocomp.unibo.it (Andrea Pierleoni)
Date: Fri, 26 Nov 2010 14:54:29 +0100 (CET)
Subject: [Biopython-dev] Uniprot XML parser on TrEmbl
In-Reply-To: <17fb1526d4af40ebbe4e6129d1bd0c2c.squirrel@lipid.biocomp.unibo.it>
References: <5c0bc5f9bead03ed216fafaff35c709b.squirrel@lipid.biocomp.unibo.it>
	<AANLkTin7aHw4z4=JXP_n+b1Q_rhWgGpzfY=uw81F99FP@mail.gmail.com>
	<17fb1526d4af40ebbe4e6129d1bd0c2c.squirrel@lipid.biocomp.unibo.it>
Message-ID: <1f693f5d96187fcc44a180d1e7c55a3d.squirrel@lipid.biocomp.unibo.it>


>> I've checked uniprot_sprot.dat still parses, but haven't tried the
>> much bigger uniprot_trembl.dat from uniprot_trembl.dat.gz - so
>> again, would you be able to retest the "swiss" text parser too?
>
> I'll test this too and let you know.
>

Test completed on the .dat file; all entries were parsed without errors.
This time it took almost 3h, but it was done on the gzipped file stored on a
removable 5400rpm hard drive. The XML file was on an SSD, so maybe that's
why it is faster with that parser.


From biopython at maubp.freeserve.co.uk  Fri Nov 26 14:06:58 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 26 Nov 2010 14:06:58 +0000
Subject: [Biopython-dev] Uniprot XML parser on TrEmbl
In-Reply-To: <1f693f5d96187fcc44a180d1e7c55a3d.squirrel@lipid.biocomp.unibo.it>
References: <5c0bc5f9bead03ed216fafaff35c709b.squirrel@lipid.biocomp.unibo.it>
	<AANLkTin7aHw4z4=JXP_n+b1Q_rhWgGpzfY=uw81F99FP@mail.gmail.com>
	<17fb1526d4af40ebbe4e6129d1bd0c2c.squirrel@lipid.biocomp.unibo.it>
	<1f693f5d96187fcc44a180d1e7c55a3d.squirrel@lipid.biocomp.unibo.it>
Message-ID: <AANLkTimpKYSANr8R3LhLXOkCxjqu51DragKhosZ5BYtS@mail.gmail.com>

On Fri, Nov 26, 2010 at 1:54 PM, Andrea Pierleoni
<andrea at biocomp.unibo.it> wrote:
>
>>> I've checked uniprot_sprot.dat still parses, but haven't tried the
>>> much bigger uniprot_trembl.dat from uniprot_trembl.dat.gz - so
>>> again, would you be able to retest the "swiss" text parser too?
>>
>> I'll test this too and let you know.
>>
>
> Test completed on the .dat file, all entries were parsed without errors.
> This time it took almost 3h but was done on the gzipped file stored in a
> removable 5400rpm hard drive. The XML file was on an SSD so maybe that's
> why it is faster with that parser.
>

Excellent - thanks.

Peter


From biopython at maubp.freeserve.co.uk  Fri Nov 26 14:08:59 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 26 Nov 2010 14:08:59 +0000
Subject: [Biopython-dev] git freeze for Biopython 1.56
Message-ID: <AANLkTikfYGQG2_KMJHdWr42X5A3-0iJaZD8-kTPvuoVP@mail.gmail.com>

Hi all,

No one has raised any outstanding issues to warrant delaying
the 1.56 release any further, so I plan to do it now. Please don't
make any commits to the master branch until further notice.

Thank you,

Peter


From biopython at maubp.freeserve.co.uk  Fri Nov 26 15:19:20 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 26 Nov 2010 15:19:20 +0000
Subject: [Biopython-dev] git freeze for Biopython 1.56
In-Reply-To: <AANLkTikfYGQG2_KMJHdWr42X5A3-0iJaZD8-kTPvuoVP@mail.gmail.com>
References: <AANLkTikfYGQG2_KMJHdWr42X5A3-0iJaZD8-kTPvuoVP@mail.gmail.com>
Message-ID: <AANLkTim1POLxTm93=QjEB14whbNhvgDPBwpMyf4-HMB8@mail.gmail.com>

On Fri, Nov 26, 2010 at 2:08 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> Hi all,
>
> No one has raised any outstanding issues to warrant delaying
> the 1.56 release any further, so I plan to do it now. Please don't
> make any commits to the master branch until further notice.
>
> Thank you,
>
> Peter

I think that's the source code bundles and Windows installers
all done and uploaded, plus the PyPI upload done. I'll work on
a release announcement for the news server and mailing list.

In the meantime, if anyone could check the files as a sanity
test (just in case I missed something), please do. Get them
from here: http://biopython.org/DIST/

Thanks,

Peter


From biopython at maubp.freeserve.co.uk  Fri Nov 26 16:07:48 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 26 Nov 2010 16:07:48 +0000
Subject: [Biopython-dev] git freeze for Biopython 1.56
In-Reply-To: <AANLkTim1POLxTm93=QjEB14whbNhvgDPBwpMyf4-HMB8@mail.gmail.com>
References: <AANLkTikfYGQG2_KMJHdWr42X5A3-0iJaZD8-kTPvuoVP@mail.gmail.com>
	<AANLkTim1POLxTm93=QjEB14whbNhvgDPBwpMyf4-HMB8@mail.gmail.com>
Message-ID: <AANLkTime7jvxqQmM+=ry-6a0+1h_bJKFQT6WsdM6hVsU@mail.gmail.com>

On Fri, Nov 26, 2010 at 3:19 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> On Fri, Nov 26, 2010 at 2:08 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>> Hi all,
>>
>> No one has raised any outstanding issues to warrant delaying
>> the 1.56 release any further, so I plan to do it now. Please don't
>> make any commits to the master branch until further notice.
>>
>> Thank you,
>>
>> Peter
>
> I think that's the source code bundles and Windows installers
> all done and uploaded, plus the PyPI upload done. I'll work on
> a release announcement for the news server and mailing list.
>

Posted online,
http://news.open-bio.org/news/2010/11/biopython-1-56-released/

If anyone spots a typo please drop me an email, and I can fix
it - hopefully before sending out the email announcement which
I'll do a bit later on in case there are any suggested revisions
to the text.

Regards,

Peter


From biopython at maubp.freeserve.co.uk  Fri Nov 26 16:25:42 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 26 Nov 2010 16:25:42 +0000
Subject: [Biopython-dev] Biopython 1.56 release plans
In-Reply-To: <AANLkTikhuis9NVte79m9PZMb9pNoFBQvqqq+PwLXstAf@mail.gmail.com>
References: <AANLkTi=feVugOz6M6uK3E=SjKw3Ett4MahGTkLs80Xje@mail.gmail.com>
	<645847.84052.qm@web62404.mail.re1.yahoo.com>
	<AANLkTikhuis9NVte79m9PZMb9pNoFBQvqqq+PwLXstAf@mail.gmail.com>
Message-ID: <AANLkTi=E2me=8XBN7LGNRnQK5Kv7Qvu92Uue3qyhsstj@mail.gmail.com>

On Fri, Nov 5, 2010 at 12:01 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> On Fri, Nov 5, 2010 at 11:52 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
>>
>> Bio/Transcribe.py
>> Bio/Translate.py
>>
>> These are still imported from Bio/Encodings/IUPACEncoding.py, which
>> is imported from Bio/Alphabet/IUPAC.py. I have no idea what this code
>> is doing. Does anybody know?
>
> Ah right - sorry, that had slipped my mind:
> http://lists.open-bio.org/pipermail/biopython-dev/2010-September/008255.html
>
> I had suggested we leave Bio.Transcribe and Bio.Translate in for
> Biopython 1.56 and remove them (and Bio.utils, Bio.PropertyManager,
> and Bio.Encodings.IUPACEncoding) for Biopython 1.57

Hi Michiel,

Now Biopython 1.56 is out, would you like to remove those modules?

Thanks

Peter


From biopython at maubp.freeserve.co.uk  Fri Nov 26 19:31:40 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 26 Nov 2010 19:31:40 +0000
Subject: [Biopython-dev] git freeze for Biopython 1.56
In-Reply-To: <AANLkTime7jvxqQmM+=ry-6a0+1h_bJKFQT6WsdM6hVsU@mail.gmail.com>
References: <AANLkTikfYGQG2_KMJHdWr42X5A3-0iJaZD8-kTPvuoVP@mail.gmail.com>
	<AANLkTim1POLxTm93=QjEB14whbNhvgDPBwpMyf4-HMB8@mail.gmail.com>
	<AANLkTime7jvxqQmM+=ry-6a0+1h_bJKFQT6WsdM6hVsU@mail.gmail.com>
Message-ID: <AANLkTimf9hMnu9egYyu5vx4R6nBybjw0JOP20pA9VtHv@mail.gmail.com>

On Fri, Nov 26, 2010 at 4:07 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>
> Posted online,
> http://news.open-bio.org/news/2010/11/biopython-1-56-released/
>
> If anyone spots a typo please drop me an email, and I can fix
> it - hopefully before sending out the email announcement which
> I'll do a bit later on in case there are any suggested revisions
> to the text.

I aim to send out the email in an hour or so's time. If I forget,
Brad - you're in a suitable time zone right?

By the way - please consider the git freeze over (I should have
said so explicitly earlier - sorry about that).

Peter


From chapmanb at 50mail.com  Fri Nov 26 20:20:04 2010
From: chapmanb at 50mail.com (Brad Chapman)
Date: Fri, 26 Nov 2010 15:20:04 -0500
Subject: [Biopython-dev] git freeze for Biopython 1.56
In-Reply-To: <AANLkTimf9hMnu9egYyu5vx4R6nBybjw0JOP20pA9VtHv@mail.gmail.com>
References: <AANLkTikfYGQG2_KMJHdWr42X5A3-0iJaZD8-kTPvuoVP@mail.gmail.com>
	<AANLkTim1POLxTm93=QjEB14whbNhvgDPBwpMyf4-HMB8@mail.gmail.com>
	<AANLkTime7jvxqQmM+=ry-6a0+1h_bJKFQT6WsdM6hVsU@mail.gmail.com>
	<AANLkTimf9hMnu9egYyu5vx4R6nBybjw0JOP20pA9VtHv@mail.gmail.com>
Message-ID: <20101126202003.GC29878@sobchak.mgh.harvard.edu>

Peter;

> > Posted online,
> > http://news.open-bio.org/news/2010/11/biopython-1-56-released/
> >
> > If anyone spots a typo please drop me an email, and I can fix
> > it - hopefully before sending out the email announcement which
> > I'll do a bit later on in case there are any suggested revisions
> > to the text.

Thanks for all the hard work getting this together. Everything looks
great and thanks for pushing to PyPI.

The only thing I noticed was that after "Note as previously
announced" there is an extra <a> tag which causes the rest of the
text through the authors to be a link. Not a big deal.

Congrats on the new release,
Brad


From biopython at maubp.freeserve.co.uk  Fri Nov 26 21:17:23 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 26 Nov 2010 21:17:23 +0000
Subject: [Biopython-dev] git freeze for Biopython 1.56
In-Reply-To: <20101126202003.GC29878@sobchak.mgh.harvard.edu>
References: <AANLkTikfYGQG2_KMJHdWr42X5A3-0iJaZD8-kTPvuoVP@mail.gmail.com>
	<AANLkTim1POLxTm93=QjEB14whbNhvgDPBwpMyf4-HMB8@mail.gmail.com>
	<AANLkTime7jvxqQmM+=ry-6a0+1h_bJKFQT6WsdM6hVsU@mail.gmail.com>
	<AANLkTimf9hMnu9egYyu5vx4R6nBybjw0JOP20pA9VtHv@mail.gmail.com>
	<20101126202003.GC29878@sobchak.mgh.harvard.edu>
Message-ID: <AANLkTik4E6XKKPbkbFeP--2p_a-0dvypLyyj7UcQTsZb@mail.gmail.com>

Hi Brad,

On Fri, Nov 26, 2010 at 8:20 PM, Brad Chapman wrote:
>
> Thanks for all the hard work getting this together. Everything looks
> great and thanks for pushing to PyPI.

I must say a public thank you to Tiago too - having the buildbot
up and running (even with the handful of buildslaves we have
now) has been a great reassurance that things are looking OK.

This will be particularly helpful for spotting problems on Python
3 (since it is a hassle to test by hand right now) and older
versions of Python - my main machine these days runs
Python 2.6.

As an example, for a while the trunk had been broken on
Python 2.4 without anyone noticing. This was when I merged
the UniProt XML parser without having checked the unit tests
were skipped nicely on Python 2.4 when ElementTree was
missing.

Having the tests run every night automatically is much
safer - so thanks Tiago :)
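
As an illustration of what "skipped nicely" means, here is a minimal sketch
using the standard unittest module. It assumes a modern Python (skipIf
itself is not available on Python 2.4) and is not how the Biopython test
suite handled this at the time; the test class name and the accession
"X00001" are made up for the example.

```python
import unittest

try:
    from xml.etree import ElementTree
except ImportError:
    # Python 2.4 did not ship xml.etree (it arrived in Python 2.5).
    ElementTree = None


@unittest.skipIf(ElementTree is None, "ElementTree not available")
class XmlSmokeTest(unittest.TestCase):
    """Hypothetical test that needs ElementTree to run at all."""

    def test_parse_minimal_entry(self):
        xml = "<entry><accession>X00001</accession></entry>"
        root = ElementTree.fromstring(xml)
        self.assertEqual(root.findtext("accession"), "X00001")
```

With the import guarded like this, a missing dependency is reported as a
skipped test rather than as an error at import time.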

[Hopefully we'll get the buildbot running on a dedicated
VM before too long - we're in touch with the OBF admins
about this already.]

> The only thing I noticed was that after "Note as previously
> announced" there is an extra <a> tag which causes the rest
> of the text through the authors to be a link. Not a big deal.

Well spotted - I'd actually put <a/> rather than </a> which must
have confused the formatting because it looked OK.

Thanks!

Peter


From biopython at maubp.freeserve.co.uk  Fri Nov 26 23:12:14 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 26 Nov 2010 23:12:14 +0000
Subject: [Biopython-dev] Biopython 1.56
Message-ID: <AANLkTim+FCTZDGm-jfKW-T14VRNESPUJow2c4Acm-U6K@mail.gmail.com>

Dear Biopythoneers,

On behalf of the developers, I'm pleased to announce we
released Biopython 1.56 earlier today. For more details
please see:

http://news.open-bio.org/news/2010/11/biopython-1-56-released/

Please note this will probably be the last release to
support Python 2.4, see:

http://news.open-bio.org/news/2010/11/dropping-python24-support/

(At least) 13 people have contributed to this release,
including 6 new people - thank you all:

    * Andrea Pierleoni (first contribution)
    * Bart de Koning (first contribution)
    * Bartek Wilczynski
    * Bartosz Telenczuk (first contribution)
    * Cymon Cox
    * Eric Talevich
    * Frank Kauff
    * Michiel de Hoon
    * Peter Cock
    * Phillip Garland (first contribution)
    * Siong Kong (first contribution)
    * Tiago Antao
    * Uri Laserson (first contribution)

Source distributions and Windows installers are available
from the downloads page on the Biopython website:
http://www.biopython.org/wiki/Download

As usual, feedback is most welcome on the mailing lists
(or bugzilla).

Regards,

Peter


From biopython at maubp.freeserve.co.uk  Mon Nov 29 12:02:55 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 29 Nov 2010 12:02:55 +0000
Subject: [Biopython-dev] Dropping Python 2.4 Support?
In-Reply-To: <AANLkTikTaHLuFGCHzXJBpFCENwVj4oDbY1WM1wgKPwhn@mail.gmail.com>
References: <AANLkTikTaHLuFGCHzXJBpFCENwVj4oDbY1WM1wgKPwhn@mail.gmail.com>
Message-ID: <AANLkTinmM49x+L8DH_duCc46mQ67mOtcQvW86y7WS94Q@mail.gmail.com>

On Thu, Nov 18, 2010 at 3:47 PM, Peter wrote:
> Dear Biopythoneers,
>
> Are any of you still using Biopython on Python 2.4?
> http://news.open-bio.org/news/2010/11/dropping-python24-support/
>
> Please get in touch if dropping support for Python 2.4 would be a
> problem. Otherwise we plan for Biopython 1.56 (expected by the
> end of this month) to be our last release to work with Python 2.4.
>
> Thanks,
>
> Peter

So, no comments?

We're using CentOS on our servers at work, but have installed
a later Python on most of them and made it the default.

I'm also keen to use Biopython with Galaxy, and they currently
support Python 2.4 to 2.6 (and I'm unclear when they will add
2.7 and drop 2.4), so this is another reason to keep some level
of support for Python 2.4. However, on a local level this isn't
important as we are running Galaxy on Python 2.6 now.
Likewise I know Brad is running Galaxy on a more recent
Python than 2.4 (are you using Biopython within Galaxy
Brad? Maybe we could chat about that on a new thread).

Hopefully the release of Biopython 1.56 will alert more of our
users to the planned withdrawal of support of Python 2.4, so
we may get some feedback this week...

Peter


From chapmanb at 50mail.com  Mon Nov 29 12:23:23 2010
From: chapmanb at 50mail.com (Brad Chapman)
Date: Mon, 29 Nov 2010 07:23:23 -0500
Subject: [Biopython-dev] Dropping Python 2.4 Support?
In-Reply-To: <AANLkTinmM49x+L8DH_duCc46mQ67mOtcQvW86y7WS94Q@mail.gmail.com>
References: <AANLkTikTaHLuFGCHzXJBpFCENwVj4oDbY1WM1wgKPwhn@mail.gmail.com>
	<AANLkTinmM49x+L8DH_duCc46mQ67mOtcQvW86y7WS94Q@mail.gmail.com>
Message-ID: <20101129122323.GA3139@sobchak.mgh.harvard.edu>

Peter;

[Python2.4 support]
> So, no comments?

The folks who are still using 5-year-old versions of Python might
not be the most responsive. We'll probably hear some complaints
when some of the code breaks.

> I'm also keen to use Biopython with Galaxy, and they currently
> support Python 2.4 to 2.6 (and I'm unclear when they will add
> 2.7 and drop 2.4), so this is another reason to keep some level
> of support for Python 2.4. However, on a local level this isn't
> important as we are running Galaxy on Python 2.6 now.
> Likewise I know Brad is running Galaxy on a more recent
> Python than 2.4 (are you using Biopython within Galaxy
> Brad? Maybe we could chat about that on a new thread).

Yes, I'm running on 2.6 (and sad to be missing nested with
statements in my code). It would be great to have formal
Biopython/Galaxy interoperability. If I remember right, the biggest
complaint was lack of PEP 8 compliance with module names, but it
should be worth discussing.

Brad


From mjldehoon at yahoo.com  Tue Nov 30 13:14:20 2010
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Tue, 30 Nov 2010 05:14:20 -0800 (PST)
Subject: [Biopython-dev] Biopython 1.56 release plans
In-Reply-To: <AANLkTi=E2me=8XBN7LGNRnQK5Kv7Qvu92Uue3qyhsstj@mail.gmail.com>
Message-ID: <215849.18567.qm@web62405.mail.re1.yahoo.com>

OK, I have removed these modules:
    Bio.Encodings
    Bio.PropertyManager
    Bio.Transcribe
    Bio.Translate
    Bio.utils

--Michiel.

--- On Fri, 11/26/10, Peter <biopython at maubp.freeserve.co.uk> wrote:

> From: Peter <biopython at maubp.freeserve.co.uk>
> Subject: Re: [Biopython-dev] Biopython 1.56 release plans
> To: "Michiel de Hoon" <mjldehoon at yahoo.com>
> Cc: "Biopython-Dev Mailing List" <biopython-dev at biopython.org>
> Date: Friday, November 26, 2010, 11:25 AM
> On Fri, Nov 5, 2010 at 12:01 PM,
> Peter <biopython at maubp.freeserve.co.uk>
> wrote:
> > On Fri, Nov 5, 2010 at 11:52 AM, Michiel de Hoon
> <mjldehoon at yahoo.com>
> wrote:
> >>
> >> Bio/Transcribe.py
> >> Bio/Translate.py
> >>
> >> These are still imported from
> Bio/Encodings/IUPACEncoding.py, which
> >> is imported from Bio/Alphabet/IUPAC.py. I have no
> idea what this code
> >> is doing. Does anybody know?
> >
> > Ah right - sorry, that had slipped my mind:
> > http://lists.open-bio.org/pipermail/biopython-dev/2010-September/008255.html
> >
> > I had suggested we leave Bio.Transcribe and
> Bio.Translate in for
> > Biopython 1.56 and remove them (and Bio.utils,
> Bio.PropertyManager,
> > and Bio.Encodings.IUPACEncoding) for Biopython 1.57
> 
> Hi Michiel,
> 
> Now Biopython 1.56 is out, would you like to remove those
> modules?
> 
> Thanks
> 
> Peter
> 


From anaryin at gmail.com  Tue Nov 30 15:45:35 2010
From: anaryin at gmail.com (João Rodrigues)
Date: Tue, 30 Nov 2010 16:45:35 +0100
Subject: [Biopython-dev] Features of the GSOC branch ready to be merged
Message-ID: <AANLkTime10jWf1URpPyqxUvXPw79bfrH=GDvB79J+dNq@mail.gmail.com>

Hello all,

I've been looking at the code I wrote for the GSOC to see what is ready to
be merged into the main branch. I have to thank Kristian and everyone who
participated in Python & Friends for their input.

From what I gathered, and from my own tests, I believe the following
functions are solid enough:


   1. Bio/PDB/Atom.py: automatically guessing atom element from atom name
      <https://github.com/JoaoRodrigues/biopython/blob/GSOC2010/Bio/PDB/Atom.py#L75-105>
   2. Bio/PDB/Structure.py
      1. Building biological unit from REMARK 350 in the header
         <https://github.com/JoaoRodrigues/biopython/blob/GSOC2010/Bio/PDB/Structure.py#L78-110>
      2. Renumbering residues
         <https://github.com/JoaoRodrigues/biopython/blob/GSOC2010/Bio/PDB/Structure.py#L66-76>
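
To give a feel for the first item, here is a rough sketch of one common
heuristic for guessing an element from the 4-character PDB atom name field
(columns 13-16). It is not the code from the GSOC branch, guess_element is
a hypothetical name, and the rules are an assumption based on the usual PDB
alignment convention (one-letter elements start in column 14).

```python
def guess_element(fullname):
    """Guess the element symbol from a 4-character PDB atom name field."""
    name = fullname.ljust(4)[:4]
    head = name[:2]
    # A blank or a digit in column 13 means a one-letter element,
    # e.g. " CA " (alpha carbon) or "1HD1" (hydrogen).
    if head[0] == " " or head[0].isdigit():
        return head[1].upper()
    stripped = name.strip()
    # Four-character hydrogen names such as "HD11" spill into column 13.
    if len(stripped) == 4 and stripped[0] == "H":
        return "H"
    # Otherwise assume a two-letter element, e.g. "CA  " (calcium).
    return head.strip().upper()
```

The padding matters here: " CA " and "CA  " give "C" and "CA" respectively,
which is why the sketch takes the unstripped name field.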


Let me know what you all think.

Best,

João [...] Rodrigues
http://doeidoei.wordpress.com


From biopython at maubp.freeserve.co.uk  Tue Nov 30 23:24:35 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 30 Nov 2010 23:24:35 +0000
Subject: [Biopython-dev] Bio.SeqIO.index extension, Bio.SeqIO.index_many
Message-ID: <AANLkTimena_PPhuzjksES=eSWf7ZmSsBWM1uam=V87fC@mail.gmail.com>

Hi all,

You may recall some previous discussion about extending the
Bio.SeqIO.index functionality. I'm particularly interested in
keeping the index on disk to reduce the memory overhead
and thus support NGS files with many millions of reads. e.g.

http://lists.open-bio.org/pipermail/biopython-dev/2009-September/006713.html
http://lists.open-bio.org/pipermail/biopython-dev/2009-September/006716.html

I'd also like to index multiple files (e.g. a folder of GenBank
files for different chromosomes), functionality we used to
have with the OBDA style index (using BDB or a flat file)
and Martel/Mindy (deprecated and removed some time ago
due to problems with 3rd party libraries, scaling problems
when parsing, and ultimately no one familiar enough with
the code to try and fix it). See also:

http://lists.open-bio.org/pipermail/biopython-dev/2009-August/006704.html

I've been working on the following idea on branches in github,
and have something workable using SQLite3 to store a
table of record identifiers, file offset, and file number
(for where we have multiple files indexed together).
Following the OBDA standard, I extended this to
also (optionally) store the record length on disk.
This allows the get_raw method to be much faster,
but may not be possible on all file formats.

[Currently I get the length when building the index
on all supported file formats except SFF. Here we
normally use the Roche index, and that doesn't
have the raw record lengths.]

Note that using SQLite seems sensible to me as
it is included with Python 2.5+ including Python 3,
while BDB, the other candidate from the standard
library, has been deprecated.
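
To make the idea concrete, here is a minimal stdlib-only sketch of such a
table, assuming FASTA-style files where each record starts with a ">" line.
It is not the code on the branch: the table layout, build_offset_index and
this free-standing get_raw are all hypothetical names for illustration.

```python
import sqlite3


def build_offset_index(index_filename, filenames):
    """Store key, file number, offset and length for every record."""
    con = sqlite3.connect(index_filename)
    con.execute(
        "CREATE TABLE offsets (key TEXT PRIMARY KEY, "
        "file_number INTEGER, offset INTEGER, length INTEGER)"
    )
    for file_number, filename in enumerate(filenames):
        with open(filename, "rb") as handle:
            key, start = None, 0
            while True:
                pos = handle.tell()
                line = handle.readline()
                # A new header or end of file closes the previous record.
                if key is not None and (not line or line.startswith(b">")):
                    con.execute(
                        "INSERT INTO offsets VALUES (?, ?, ?, ?)",
                        (key, file_number, start, pos - start),
                    )
                    key = None
                if not line:
                    break
                if line.startswith(b">"):
                    key, start = line[1:].split(None, 1)[0].decode(), pos
    con.commit()
    return con


def get_raw(con, filenames, key):
    """Return one record's raw bytes with a single seek and read."""
    row = con.execute(
        "SELECT file_number, offset, length FROM offsets WHERE key = ?",
        (key,),
    ).fetchone()
    if row is None:
        raise KeyError(key)
    file_number, offset, length = row
    with open(filenames[file_number], "rb") as handle:
        handle.seek(offset)
        return handle.read(length)
```

With the length stored, get_raw does not have to scan forward for the start
of the next record, which is where the speed-up comes from.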

The current API is as follows, a new function:

def index_many(index_filename, filenames=None,
                        format=None, alphabet=None,
                        key_function=None)

This is similar to the existing index function, although
here the key_function must return a string for use as
the key in the SQLite database.

The idea is that you call index_many to build a new
index (if the index file does not exist) or reload an
existing index (if the index file does exist). If you
are reloading an existing index, you can omit the
filenames and format.

The index_many function returns a read-only, dictionary-like
object - very much like the existing index function.

Although not (currently) exposed by this API, the code
allows a configurable limit on the number of handles
(since these are a finite resource limited by the OS).
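
One way such a limit could work is a small least-recently-used pool of open
handles. The sketch below is only an assumption about the general approach,
not the code on the branch; HandlePool and max_open are hypothetical names.

```python
from collections import OrderedDict


class HandlePool:
    """Keep at most max_open files open, closing the least recently used."""

    def __init__(self, filenames, max_open=10):
        self._filenames = list(filenames)
        self._max_open = max_open
        self._open = OrderedDict()  # file number -> open handle

    def get(self, file_number):
        handle = self._open.pop(file_number, None)
        if handle is None:
            if len(self._open) >= self._max_open:
                # Evict the handle that was used longest ago.
                _, oldest = self._open.popitem(last=False)
                oldest.close()
            handle = open(self._filenames[file_number], "rb")
        self._open[file_number] = handle  # now the most recently used
        return handle

    def close(self):
        for handle in self._open.values():
            handle.close()
        self._open.clear()
```

Lookups that keep hitting the same few files reuse their handles, while an
index spanning hundreds of files stays within the OS limit.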

I've put a branch up for comment:
https://github.com/peterjc/biopython/tree/index-many

I hope the docstring text and embedded doctest
examples are clear. You can read them here:
https://github.com/peterjc/biopython/blob/index-many/Bio/SeqIO/__init__.py

What do people think?

One thing I haven't done yet (any volunteers?) is any
benchmarking - for example comparing the index
build and retrieval times for some large files using
Biopython 1.55 (recent baseline), Biopython 1.56
(should be faster on retrieval) and the branch to
check for any regressions in Bio.SeqIO.index(), and
compare this to Bio.SeqIO.index_many() which being
disk based will be slower but require much less RAM.

Peter

P.S. This was based on the following branch, which
proved non-trivial to merge since in the meantime I'd
made separate tweaks to the index code on the trunk:
https://github.com/peterjc/biopython/tree/index-many-length

I didn't propose merging this back then because it
absolutely requires SQLite, and thus Python 2.5+
and we wanted Biopython 1.56 to support Python 2.4.