From redmine at redmine.open-bio.org  Mon Aug  1 01:24:51 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Mon, 1 Aug 2011 05:24:51 +0000
Subject: [Biopython-dev] [Biopython - Feature #3271] Updates to PDBList.py-
	downloading PDB structures
References: <redmine.issue-3271.20110726201643@redmine.open-bio.org>
Message-ID: <redmine.journal-14650.20110801052451@redmine.open-bio.org>


Issue #3271 has been updated by David Cain.


Hi, Eric. I'm glad you like my changes, and I appreciate your feedback. I made some changes in line with your suggestions and submitted my branch as a pull request.

Thank you again for the response.
----------------------------------------
Feature #3271: Updates to PDBList.py- downloading PDB structures
https://redmine.open-bio.org/issues/3271

Author: David Cain
Status: New
Priority: Normal
Assignee: Biopython Dev Mailing List
Category: 
Target version: 1.57
URL: https://github.com/DavidCain/biopython


PDBList.py is somewhat out of date: it has support for .Z compression, but the ftp://ftp.wwpdb.org/ server only has .gz archives. It also relies on a system utility to decompress the downloaded archives. The default, gunzip, is effective enough for posix systems, but Windows requires the installation of a command line tool, such as 7zip. I've rewritten it to use the gzip module, and to ignore the compression parameter (as all files are .gz anyway). I left the 'uncompress' and 'compression' parameters for backwards compatibility. I've also made it so that the user can override and use a system decompression tool if desired. I'm not sure if this is the best way to handle it, as the retrieve_pdb_file() function would work just fine removing support for system decompression and the 'compression' parameter.
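
The switch from a system utility to the gzip module can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in PDBList.py: the helper name gunzip_file and the sample file name are hypothetical, and the context-manager form of gzip.open assumes a modern Python 3 interpreter.

```python
import gzip
import os
import shutil
import tempfile


def gunzip_file(gz_path, out_path):
    """Decompress gz_path into out_path using only the standard library."""
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path


# Demonstration on a throwaway archive; "pdb1abc.ent" is only a sample name.
workdir = tempfile.mkdtemp()
gz_path = os.path.join(workdir, "pdb1abc.ent.gz")
with gzip.open(gz_path, "wb") as handle:
    handle.write(b"HEADER    EXAMPLE STRUCTURE\n")

out_path = gunzip_file(gz_path, os.path.join(workdir, "pdb1abc.ent"))
with open(out_path, "rb") as handle:
    print(handle.read())
```

Because nothing here shells out to gunzip or 7zip, the same code path works on POSIX systems and on Windows.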

Also, when calling retrieve_pdb_file() repeatedly, urllib can generate too many FTP connections and crash (for example) a script attempting to download some structures in succession. Updating to urllib2 removes this issue.

My GitHub branch is linked, and the only file I've modified (PDBList.py) is attached.


-- 
You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org


From redmine at redmine.open-bio.org  Mon Aug  1 10:57:06 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Mon, 1 Aug 2011 14:57:06 +0000
Subject: [Biopython-dev] [Biopython - Feature #3271] (Closed) Updates to
	PDBList.py- downloading PDB structures
References: <redmine.issue-3271.20110726201643@redmine.open-bio.org>
Message-ID: <redmine.journal-14652.20110801145706@redmine.open-bio.org>


Issue #3271 has been updated by Eric Talevich.

Status changed from New to Closed
% Done changed from 0 to 100

Merged it:
https://github.com/biopython/biopython/pull/14

I think we could do more work on the docstrings and comments, generally, but it's out of the scope of this bug.

Thanks again!
----------------------------------------
Feature #3271: Updates to PDBList.py- downloading PDB structures
https://redmine.open-bio.org/issues/3271

Author: David Cain
Status: Closed
Priority: Normal
Assignee: Biopython Dev Mailing List
Category: 
Target version: 1.57
URL: https://github.com/DavidCain/biopython




From p.j.a.cock at googlemail.com  Tue Aug  2 12:43:30 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 2 Aug 2011 17:43:30 +0100
Subject: [Biopython-dev] Leaked handles in PAML unit tests
Message-ID: <CAKVJ-_7L3fCbEPj0iGUzy_1sq4x3Y3Y7-k7F9+qhhWkcbnHFFg@mail.gmail.com>

Hi Brandon,

Would you be able to look at these handle leaks in the PAML unit tests
some time?

test_PAML_baseml ... /Users/pjcock/lib/python3.2/unittest/case.py:574:
ResourceWarning: unclosed file <_io.TextIOWrapper name='PAML/bad1.ctl'
mode='r' encoding='UTF-8'>
  callableObj(*args, **kwargs)
/Users/pjcock/lib/python3.2/unittest/case.py:574: ResourceWarning:
unclosed file <_io.TextIOWrapper name='PAML/bad2.ctl' mode='r'
encoding='UTF-8'>
  callableObj(*args, **kwargs)
/Users/pjcock/lib/python3.2/unittest/case.py:574: ResourceWarning:
unclosed file <_io.TextIOWrapper name='/dev/null' mode='w'
encoding='UTF-8'>
  callableObj(*args, **kwargs)
ok
test_PAML_codeml ... ok
test_PAML_yn00 ... /Users/pjcock/lib/python3.2/unittest/case.py:574:
ResourceWarning: unclosed file <_io.TextIOWrapper name='PAML/bad3.ctl'
mode='r' encoding='UTF-8'>
  callableObj(*args, **kwargs)
ok

This warning is new under Python 3.2, but this kind of code can and
has caused bugs on Windows (can't delete files if there is an open
handle) and Jython (different GC collection, so implicit handle closing
is stochastic). See also:

http://bugs.python.org/issue10093

Note there are other cases of this, some in PopGen (which may
explain a periodic failure under Jython), and in test_SCOP_Astral.py
(where the object design makes this difficult to avoid IIRC), etc.
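
A minimal sketch of the usual fix for this kind of leak, assuming the leaking code simply opens a control file and reads it. The function names below are hypothetical and are not taken from the PAML module.

```python
import os
import tempfile


def read_ctl_lines(path):
    """Return the lines of a control file, closing the handle on exit."""
    with open(path) as handle:
        return handle.readlines()


def read_ctl_lines_legacy(path):
    """Same behaviour with try/finally, for interpreters lacking 'with'."""
    handle = open(path)
    try:
        return handle.readlines()
    finally:
        # Runs even if readlines() raises, so the handle never leaks.
        handle.close()


# Demonstration with a temporary file standing in for a .ctl file.
fd, ctl_path = tempfile.mkstemp(suffix=".ctl")
with os.fdopen(fd, "w") as handle:
    handle.write("seqfile = example.phy\n")

print(read_ctl_lines(ctl_path))
os.remove(ctl_path)  # succeeds on Windows too, since no handle is left open
```

Closing explicitly removes the dependence on garbage collection timing, which is what differs between CPython and Jython.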

Peter

From p.j.a.cock at googlemail.com  Tue Aug  2 12:47:20 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 2 Aug 2011 17:47:20 +0100
Subject: [Biopython-dev] SeqIO Abi Parser
In-Reply-To: <CADEGkF6GDqgK9SOmUe_fQsjv9RU_dZwG_E0QnfD7nAKXUOD8yg@mail.gmail.com>
References: <CADEGkF7qC4Q_KsN-bjOWTyXOu6vw4dzva02H0LWm+kk3xXX6og@mail.gmail.com>
	<CADEGkF5HBL2nMTN=kNhS9Hs_8UiWHs3CXw36=yHUiaVgQwQ4hg@mail.gmail.com>
	<CAKVJ-_7HOhEAn+XwQes+2h=7CwFR10FtefOHZqXMU5xo97D0tA@mail.gmail.com>
	<CADEGkF6E=PDCbqnJ60qsYpt231CrS6S1EWghx7tdFZ_gwAi7QQ@mail.gmail.com>
	<CAKVJ-_6LiJ-j5md_+Fw6KyJoaFqCeRSGPwwbcFaO=PJWNdqNmg@mail.gmail.com>
	<CADEGkF7imzYii6qq7D=3r68uAumu_ZzXaQpC9eyg_8prYcDoHw@mail.gmail.com>
	<CAKVJ-_5WAL+XbC=br4fp5gRnd1DCqoPbidqtZeHi6sM0JS+KXQ@mail.gmail.com>
	<CADEGkF662Xs0cmFEa7We+mVyPB5VQkKAO2yYpERBbRJ0ZYQj4Q@mail.gmail.com>
	<CAKVJ-_7xNgb5xUX6VTZcWoakkUXOkUKv93KOs5NYYEOCTQgkuQ@mail.gmail.com>
	<CAKVJ-_4FpFYyVgOL3Rod1QR1DhuFk3rn6yqWKxUq2+8+OBnpJw@mail.gmail.com>
	<CADEGkF6GDqgK9SOmUe_fQsjv9RU_dZwG_E0QnfD7nAKXUOD8yg@mail.gmail.com>
Message-ID: <CAKVJ-_6LT+VJStu_sN5mfUbGTuTChMffxOw0PsJ6hTSq-fgeqA@mail.gmail.com>

On Sat, Jul 30, 2011 at 8:42 AM, Wibowo Arindrarto
<w.arindrarto at gmail.com> wrote:
> Hi Peter,
> I've done some more improvements to the code:
> - I've written the check and unittest for the file handle mode. I've set it
> so that abi file has to be opened in 'rb' mode, otherwise it'll return an
> error. While it's ok to open in 'r' mode in python 2 in Linux, it has to be
> specified as 'rb' in Windows and/or Python 3 for the file to be read
> correctly. So I decided forcing it to 'rb' is the best. Because of this, I
> changed 'test_SeqIO.py:503' to include the mode argument when opening.

OK, good.
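
One way such a check could look, as a sketch only. The function name is hypothetical and this is not necessarily how AbiIO performs its check; it assumes that reading zero bytes from the handle is harmless and reveals whether the handle yields bytes or text.

```python
import io


def require_binary_handle(handle):
    """Return the handle unchanged, or raise ValueError if it yields text."""
    # read(0) consumes nothing but has the same type as a real read.
    if not isinstance(handle.read(0), bytes):
        raise ValueError("ABI files must be opened in binary mode ('rb')")
    return handle


require_binary_handle(io.BytesIO(b"ABIF"))  # accepted
try:
    require_binary_handle(io.StringIO("ABIF"))
except ValueError as err:
    print(err)
```

Testing the type of the data rather than the mode attribute also covers in-memory handles, which may not expose a mode at all.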

> - I've also checked against test_Emboss.py for seqret output, after
> including the abi format in it. My EMBOSS version is 6.4.0. There was a
> slight problem with this testing, since for some reason the ID returned by
> seqret is always "EMBOSS_001". Something might be wrong with my EMBOSS
> installation, since when I previously tested it against 6.1.0, the ID was
> correct (although the qual values not, so I had to upgrade). As expected, if
> I comment out the code that tests for sequence id ('test_Emboss.py:168-172')
> the tests pass. Maybe you could try testing it as well and see if EMBOSS
> also returns the default id instead of the sample name?

EMBOSS 6.3.1 is fine, so I think we should report this as a bug in EMBOSS 6.4.0

> - Finally, I did some small cosmetic changes to the code (typos, etc).
> All changes have been pushed to my github fork. Now I still have time for
> the weekend to improve whatever needs to be improved :).
> Regards,

There appears to be another Python 3 problem, consider this at the
python prompt:

from Bio import SeqIO
record = SeqIO.read("Tests/Abi/310.ab1", "abi")
record.letter_annotations["phred_quality"]

I expect a list of integers, e.g. [0, 0, 0, ..., 0], not ['\x00',
'\x00', '\x00', ..., '\x00']
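
The conversion being asked for can be sketched like this. The helper name is hypothetical and this is not the fix that went into AbiIO.py; it only shows how one-byte quality values map to integers.

```python
def quality_bytes_to_ints(raw):
    """Convert raw one-byte quality values into a list of integers."""
    if isinstance(raw, (bytes, bytearray)):
        # Iterating a bytearray yields integers directly.
        return list(bytearray(raw))
    # Otherwise assume a sequence of one-character strings.
    return [ord(char) for char in raw]


print(quality_bytes_to_ints(b"\x00\x14\x28"))
print(quality_bytes_to_ints(["\x00", "\x00", "\x00"]))
```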

Peter

From w.arindrarto at gmail.com  Tue Aug  2 12:53:46 2011
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Tue, 2 Aug 2011 18:53:46 +0200
Subject: [Biopython-dev] SeqIO Abi Parser
In-Reply-To: <CAKVJ-_6LT+VJStu_sN5mfUbGTuTChMffxOw0PsJ6hTSq-fgeqA@mail.gmail.com>
References: <CADEGkF7qC4Q_KsN-bjOWTyXOu6vw4dzva02H0LWm+kk3xXX6og@mail.gmail.com>
	<CADEGkF5HBL2nMTN=kNhS9Hs_8UiWHs3CXw36=yHUiaVgQwQ4hg@mail.gmail.com>
	<CAKVJ-_7HOhEAn+XwQes+2h=7CwFR10FtefOHZqXMU5xo97D0tA@mail.gmail.com>
	<CADEGkF6E=PDCbqnJ60qsYpt231CrS6S1EWghx7tdFZ_gwAi7QQ@mail.gmail.com>
	<CAKVJ-_6LiJ-j5md_+Fw6KyJoaFqCeRSGPwwbcFaO=PJWNdqNmg@mail.gmail.com>
	<CADEGkF7imzYii6qq7D=3r68uAumu_ZzXaQpC9eyg_8prYcDoHw@mail.gmail.com>
	<CAKVJ-_5WAL+XbC=br4fp5gRnd1DCqoPbidqtZeHi6sM0JS+KXQ@mail.gmail.com>
	<CADEGkF662Xs0cmFEa7We+mVyPB5VQkKAO2yYpERBbRJ0ZYQj4Q@mail.gmail.com>
	<CAKVJ-_7xNgb5xUX6VTZcWoakkUXOkUKv93KOs5NYYEOCTQgkuQ@mail.gmail.com>
	<CAKVJ-_4FpFYyVgOL3Rod1QR1DhuFk3rn6yqWKxUq2+8+OBnpJw@mail.gmail.com>
	<CADEGkF6GDqgK9SOmUe_fQsjv9RU_dZwG_E0QnfD7nAKXUOD8yg@mail.gmail.com>
	<CAKVJ-_6LT+VJStu_sN5mfUbGTuTChMffxOw0PsJ6hTSq-fgeqA@mail.gmail.com>
Message-ID: <CADEGkF4RK5tV5gfX-0Rm1Q7wOa5DXFcX1JZjNjjwC0fukO5ekg@mail.gmail.com>

Hi Peter,

I noticed that bug was because I did not add the _bytes_to_string()
converter for a data type. I already fixed this with my latest push, adding
the appropriate if clause at AbiIO.py:293-294.

Regards,
---
Wibowo Arindrarto (bow)
http://bow.web.id



From p.j.a.cock at googlemail.com  Tue Aug  2 13:57:56 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 2 Aug 2011 18:57:56 +0100
Subject: [Biopython-dev] SeqIO Abi Parser
In-Reply-To: <CADEGkF4RK5tV5gfX-0Rm1Q7wOa5DXFcX1JZjNjjwC0fukO5ekg@mail.gmail.com>
References: <CADEGkF7qC4Q_KsN-bjOWTyXOu6vw4dzva02H0LWm+kk3xXX6og@mail.gmail.com>
	<CADEGkF5HBL2nMTN=kNhS9Hs_8UiWHs3CXw36=yHUiaVgQwQ4hg@mail.gmail.com>
	<CAKVJ-_7HOhEAn+XwQes+2h=7CwFR10FtefOHZqXMU5xo97D0tA@mail.gmail.com>
	<CADEGkF6E=PDCbqnJ60qsYpt231CrS6S1EWghx7tdFZ_gwAi7QQ@mail.gmail.com>
	<CAKVJ-_6LiJ-j5md_+Fw6KyJoaFqCeRSGPwwbcFaO=PJWNdqNmg@mail.gmail.com>
	<CADEGkF7imzYii6qq7D=3r68uAumu_ZzXaQpC9eyg_8prYcDoHw@mail.gmail.com>
	<CAKVJ-_5WAL+XbC=br4fp5gRnd1DCqoPbidqtZeHi6sM0JS+KXQ@mail.gmail.com>
	<CADEGkF662Xs0cmFEa7We+mVyPB5VQkKAO2yYpERBbRJ0ZYQj4Q@mail.gmail.com>
	<CAKVJ-_7xNgb5xUX6VTZcWoakkUXOkUKv93KOs5NYYEOCTQgkuQ@mail.gmail.com>
	<CAKVJ-_4FpFYyVgOL3Rod1QR1DhuFk3rn6yqWKxUq2+8+OBnpJw@mail.gmail.com>
	<CADEGkF6GDqgK9SOmUe_fQsjv9RU_dZwG_E0QnfD7nAKXUOD8yg@mail.gmail.com>
	<CAKVJ-_6LT+VJStu_sN5mfUbGTuTChMffxOw0PsJ6hTSq-fgeqA@mail.gmail.com>
	<CADEGkF4RK5tV5gfX-0Rm1Q7wOa5DXFcX1JZjNjjwC0fukO5ekg@mail.gmail.com>
Message-ID: <CAKVJ-_4F7En085bcLFXO3cc_xkq632gY9f=NDi=vCm0p1NWZvw@mail.gmail.com>

On Tue, Aug 2, 2011 at 5:53 PM, Wibowo Arindrarto
<w.arindrarto at gmail.com> wrote:
> Hi Peter,
> I noticed that bug was because I did not add the _bytes_to_string()
> converter for a data type. I already fixed this with my latest push, adding
> the appropriate if clause at AbiIO.py:293-294.
> Regards,

Was that only half the fix? This made it work for me:

https://github.com/peterjc/biopython/commit/8fc1e141173a735740f91a1338a3fbb747fa2a45

and:

https://github.com/peterjc/biopython/commit/a44e206e482ca5904b395aaca3576a232769ce2e

Peter

From p.j.a.cock at googlemail.com  Tue Aug  2 14:03:24 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 2 Aug 2011 19:03:24 +0100
Subject: [Biopython-dev] SeqIO Abi Parser
In-Reply-To: <CAKVJ-_4F7En085bcLFXO3cc_xkq632gY9f=NDi=vCm0p1NWZvw@mail.gmail.com>
References: <CADEGkF7qC4Q_KsN-bjOWTyXOu6vw4dzva02H0LWm+kk3xXX6og@mail.gmail.com>
	<CADEGkF5HBL2nMTN=kNhS9Hs_8UiWHs3CXw36=yHUiaVgQwQ4hg@mail.gmail.com>
	<CAKVJ-_7HOhEAn+XwQes+2h=7CwFR10FtefOHZqXMU5xo97D0tA@mail.gmail.com>
	<CADEGkF6E=PDCbqnJ60qsYpt231CrS6S1EWghx7tdFZ_gwAi7QQ@mail.gmail.com>
	<CAKVJ-_6LiJ-j5md_+Fw6KyJoaFqCeRSGPwwbcFaO=PJWNdqNmg@mail.gmail.com>
	<CADEGkF7imzYii6qq7D=3r68uAumu_ZzXaQpC9eyg_8prYcDoHw@mail.gmail.com>
	<CAKVJ-_5WAL+XbC=br4fp5gRnd1DCqoPbidqtZeHi6sM0JS+KXQ@mail.gmail.com>
	<CADEGkF662Xs0cmFEa7We+mVyPB5VQkKAO2yYpERBbRJ0ZYQj4Q@mail.gmail.com>
	<CAKVJ-_7xNgb5xUX6VTZcWoakkUXOkUKv93KOs5NYYEOCTQgkuQ@mail.gmail.com>
	<CAKVJ-_4FpFYyVgOL3Rod1QR1DhuFk3rn6yqWKxUq2+8+OBnpJw@mail.gmail.com>
	<CADEGkF6GDqgK9SOmUe_fQsjv9RU_dZwG_E0QnfD7nAKXUOD8yg@mail.gmail.com>
	<CAKVJ-_6LT+VJStu_sN5mfUbGTuTChMffxOw0PsJ6hTSq-fgeqA@mail.gmail.com>
	<CADEGkF4RK5tV5gfX-0Rm1Q7wOa5DXFcX1JZjNjjwC0fukO5ekg@mail.gmail.com>
	<CAKVJ-_4F7En085bcLFXO3cc_xkq632gY9f=NDi=vCm0p1NWZvw@mail.gmail.com>
Message-ID: <CAKVJ-_7DdjXjme+n3bkQo5BQk0=7V_Czb=c3Fym38aOHY2V=CQ@mail.gmail.com>

On Tue, Aug 2, 2011 at 6:57 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Tue, Aug 2, 2011 at 5:53 PM, Wibowo Arindrarto
> <w.arindrarto at gmail.com> wrote:
>> Hi Peter,
>> I noticed that bug was because I did not add the _bytes_to_string()
>> converter for a data type. I already fixed this with my latest push, adding
>> the appropriate if clause at AbiIO.py:293-294.
>> Regards,
>
> Was that only half the fix? This made it work for me:
>
> https://github.com/peterjc/biopython/commit/8fc1e141173a735740f91a1338a3fbb747fa2a45
>
> and:
>
> https://github.com/peterjc/biopython/commit/a44e206e482ca5904b395aaca3576a232769ce2e
>
> Peter
>

Could you test this branch, which I think is ready to be merged to the
trunk now:

https://github.com/peterjc/biopython/tree/seqio-abi

Thanks,

Peter

From w.arindrarto at gmail.com  Wed Aug  3 08:14:53 2011
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Wed, 3 Aug 2011 14:14:53 +0200
Subject: [Biopython-dev] SeqIO Abi Parser
In-Reply-To: <CAKVJ-_7DdjXjme+n3bkQo5BQk0=7V_Czb=c3Fym38aOHY2V=CQ@mail.gmail.com>
References: <CADEGkF7qC4Q_KsN-bjOWTyXOu6vw4dzva02H0LWm+kk3xXX6og@mail.gmail.com>
	<CADEGkF5HBL2nMTN=kNhS9Hs_8UiWHs3CXw36=yHUiaVgQwQ4hg@mail.gmail.com>
	<CAKVJ-_7HOhEAn+XwQes+2h=7CwFR10FtefOHZqXMU5xo97D0tA@mail.gmail.com>
	<CADEGkF6E=PDCbqnJ60qsYpt231CrS6S1EWghx7tdFZ_gwAi7QQ@mail.gmail.com>
	<CAKVJ-_6LiJ-j5md_+Fw6KyJoaFqCeRSGPwwbcFaO=PJWNdqNmg@mail.gmail.com>
	<CADEGkF7imzYii6qq7D=3r68uAumu_ZzXaQpC9eyg_8prYcDoHw@mail.gmail.com>
	<CAKVJ-_5WAL+XbC=br4fp5gRnd1DCqoPbidqtZeHi6sM0JS+KXQ@mail.gmail.com>
	<CADEGkF662Xs0cmFEa7We+mVyPB5VQkKAO2yYpERBbRJ0ZYQj4Q@mail.gmail.com>
	<CAKVJ-_7xNgb5xUX6VTZcWoakkUXOkUKv93KOs5NYYEOCTQgkuQ@mail.gmail.com>
	<CAKVJ-_4FpFYyVgOL3Rod1QR1DhuFk3rn6yqWKxUq2+8+OBnpJw@mail.gmail.com>
	<CADEGkF6GDqgK9SOmUe_fQsjv9RU_dZwG_E0QnfD7nAKXUOD8yg@mail.gmail.com>
	<CAKVJ-_6LT+VJStu_sN5mfUbGTuTChMffxOw0PsJ6hTSq-fgeqA@mail.gmail.com>
	<CADEGkF4RK5tV5gfX-0Rm1Q7wOa5DXFcX1JZjNjjwC0fukO5ekg@mail.gmail.com>
	<CAKVJ-_4F7En085bcLFXO3cc_xkq632gY9f=NDi=vCm0p1NWZvw@mail.gmail.com>
	<CAKVJ-_7DdjXjme+n3bkQo5BQk0=7V_Czb=c3Fym38aOHY2V=CQ@mail.gmail.com>
Message-ID: <CADEGkF5aRyD97NxXzUROUZxVqA7xNoEHi4_8Kcs_fKcXGOPDZg@mail.gmail.com>

Hi Peter,

My bad, I forgot to change that one line and didn't test before committing.
Thanks for fixing it.

I've run the tests on your tree on py2.6.5 and py3.1.2, here are the
results:

- On both py2.6.5  and py3.1.2, I have the following test case error:
"NameError: global name 'embossversion' is not defined", on line 257. I
didn't have "EMBOSS_ROOT" in my os.environ paths (I installed 6.4.0 from
source, by the way), so this must be what's causing it. Is there another way
to automatically detect EMBOSS_ROOT other than this? Or perhaps we should
avoid emboss 6.4.0's bug by only checking if the id is EMBOSS_001? The only
case I think this would fail is if the user inputs "EMBOSS_001" before the
sequencing run as the sample id, which is possible but unlikely.

- On a related note, I noticed you set the minimum Emboss requirement to
6.1.0 patch 3. I'm not sure if this is the one I used previously, but my
previous Emboss 6.1.0 installation failed to extract the proper quality
values. Perhaps we should set the minimum version to 6.3.1? (well, making it
the only Emboss version that works with Biopython because of that 6.4.0
bug).

- Other than those two, everything's tip top :).


Regards,
Wibowo Arindrarto (bow)
http://bow.web.id



From macrozhu at gmail.com  Wed Aug  3 09:47:07 2011
From: macrozhu at gmail.com (Hongbo Zhu)
Date: Wed, 3 Aug 2011 15:47:07 +0200
Subject: [Biopython-dev] inconsistent return values
	Bio.PDB.NeighborSearch.search()
Message-ID: <CABHu63pviognJX42+kyPDt+yxZrvETPygTuv5bAkm0FnZkECuw@mail.gmail.com>

Hi, python-developers,

In the current version of BioPython (source code as of 3 Aug. 2011), it
seems the outcome of *Bio.PDB.NeighborSearch.search()* is inconsistent if
different levels are specified when the returned list is empty.

e.g.

> ns.search(center, radius, 'A')
> []
> ns.search(center, radius, 'R') # similar for levels 'C', 'M', 'S'
> IndexError: list index out of range

Obviously, this is because the Bio.PDB.NeighborSearch.search() function
tries to convert the returned list to levels other than 'A' using
function Bio.PDB.Selection.unfold_entities() (see line 92 in
NeighborSearch.py). In function unfold_entities(), the first element of
input argument entity_list is evaluated without entity_list being checked
for emptiness (see line 47 in Selection.py). An IndexError is raised when
entity_list is empty.

So, I think either the length of the returned list in
Bio.PDB.NeighborSearch.search()
should be checked before invoking Bio.PDB.Selection.unfold_entities(), or
the function Bio.PDB.Selection.unfold_entities() should be revised so that
it simply returns an empty list if the argument entity_list is empty. I
prefer the latter solution because this would also fix other similar
situations when  Bio.PDB.Selection.unfold_entities() is invoked in other
functions.
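
The second option, an early return for an empty list, can be sketched with a simplified stand-in. The Entity class and unfold_up function below are hypothetical and only model the upward case; they are not the Bio.PDB.Selection code.

```python
LEVELS = ["A", "R", "C", "M", "S"]  # atom, residue, chain, model, structure


class Entity:
    """Toy entity that knows its level and its parent."""

    def __init__(self, level, name, parent=None):
        self.level = level
        self.name = name
        self.parent = parent


def unfold_up(entity_list, target_level):
    """Map entities to their unique ancestors at target_level."""
    if target_level not in LEVELS:
        raise ValueError("unknown level: %r" % (target_level,))
    if not entity_list:
        # Guard: without it, entity_list[0] below raises IndexError.
        return []
    level = entity_list[0].level
    steps = LEVELS.index(target_level) - LEVELS.index(level)
    if steps < 0:
        raise ValueError("this sketch only unfolds upwards")
    current = list(entity_list)
    for _ in range(steps):
        parents = []
        for entity in current:
            if entity.parent not in parents:
                parents.append(entity.parent)
        current = parents
    return current


residue = Entity("R", "GLY1")
atoms = [Entity("A", "N", residue), Entity("A", "CA", residue)]
print(unfold_up([], "R"))
print([entity.name for entity in unfold_up(atoms, "R")])
```

With the guard inside the helper, every caller gets an empty list for an empty input, which is the reason given above for preferring this option.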

And it seems "Sorry, entering bugs into the product Biopython has been
disabled."

regards,
Hongbo Zhu

From p.j.a.cock at googlemail.com  Wed Aug  3 09:58:13 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 3 Aug 2011 14:58:13 +0100
Subject: [Biopython-dev] inconsistent return values
	Bio.PDB.NeighborSearch.search()
In-Reply-To: <CABHu63pviognJX42+kyPDt+yxZrvETPygTuv5bAkm0FnZkECuw@mail.gmail.com>
References: <CABHu63pviognJX42+kyPDt+yxZrvETPygTuv5bAkm0FnZkECuw@mail.gmail.com>
Message-ID: <CAKVJ-_4=W0wuVZ0__Mawh340ajQfPkS_7yOFNT4ikH8rUdGX1g@mail.gmail.com>

On Wed, Aug 3, 2011 at 2:47 PM, Hongbo Zhu <macrozhu at gmail.com> wrote:
>
> And it seems "Sorry, entering bugs into the product Biopython has been
> disabled."

We moved from Bugzilla to Redmine, links on the main homepage
were updated: http://redmine.open-bio.org/projects/biopython

I wonder if we can change that message text or something...

Peter

From p.j.a.cock at googlemail.com  Wed Aug  3 10:04:46 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 3 Aug 2011 15:04:46 +0100
Subject: [Biopython-dev] SeqIO Abi Parser
In-Reply-To: <CADEGkF5aRyD97NxXzUROUZxVqA7xNoEHi4_8Kcs_fKcXGOPDZg@mail.gmail.com>
References: <CADEGkF7qC4Q_KsN-bjOWTyXOu6vw4dzva02H0LWm+kk3xXX6og@mail.gmail.com>
	<CADEGkF5HBL2nMTN=kNhS9Hs_8UiWHs3CXw36=yHUiaVgQwQ4hg@mail.gmail.com>
	<CAKVJ-_7HOhEAn+XwQes+2h=7CwFR10FtefOHZqXMU5xo97D0tA@mail.gmail.com>
	<CADEGkF6E=PDCbqnJ60qsYpt231CrS6S1EWghx7tdFZ_gwAi7QQ@mail.gmail.com>
	<CAKVJ-_6LiJ-j5md_+Fw6KyJoaFqCeRSGPwwbcFaO=PJWNdqNmg@mail.gmail.com>
	<CADEGkF7imzYii6qq7D=3r68uAumu_ZzXaQpC9eyg_8prYcDoHw@mail.gmail.com>
	<CAKVJ-_5WAL+XbC=br4fp5gRnd1DCqoPbidqtZeHi6sM0JS+KXQ@mail.gmail.com>
	<CADEGkF662Xs0cmFEa7We+mVyPB5VQkKAO2yYpERBbRJ0ZYQj4Q@mail.gmail.com>
	<CAKVJ-_7xNgb5xUX6VTZcWoakkUXOkUKv93KOs5NYYEOCTQgkuQ@mail.gmail.com>
	<CAKVJ-_4FpFYyVgOL3Rod1QR1DhuFk3rn6yqWKxUq2+8+OBnpJw@mail.gmail.com>
	<CADEGkF6GDqgK9SOmUe_fQsjv9RU_dZwG_E0QnfD7nAKXUOD8yg@mail.gmail.com>
	<CAKVJ-_6LT+VJStu_sN5mfUbGTuTChMffxOw0PsJ6hTSq-fgeqA@mail.gmail.com>
	<CADEGkF4RK5tV5gfX-0Rm1Q7wOa5DXFcX1JZjNjjwC0fukO5ekg@mail.gmail.com>
	<CAKVJ-_4F7En085bcLFXO3cc_xkq632gY9f=NDi=vCm0p1NWZvw@mail.gmail.com>
	<CAKVJ-_7DdjXjme+n3bkQo5BQk0=7V_Czb=c3Fym38aOHY2V=CQ@mail.gmail.com>
	<CADEGkF5aRyD97NxXzUROUZxVqA7xNoEHi4_8Kcs_fKcXGOPDZg@mail.gmail.com>
Message-ID: <CAKVJ-_6PdooLMb=mvaskFTjtriLQ1LNikhqJXho7m1k1jA1WtQ@mail.gmail.com>

On Wed, Aug 3, 2011 at 1:14 PM, Wibowo Arindrarto
<w.arindrarto at gmail.com> wrote:
> Hi Peter,
> My bad, I forgot to change that one line and didn't test before committing.
> Thanks for fixing it.
> I've run the tests on your tree on py2.6.5 and py3.1.2, here are the
> results:
> - On both py2.6.5 and py3.1.2, I have the following test case error:
> "NameError: global name 'embossversion' is not defined", on line 257.
>...

It was simpler than that - I'd checked it in with a typo, emboss_version
was what I wanted. Sorry about that confusion!

> - On a related note, I noticed you set the minimum Emboss requirement to
> 6.1.0 patch 3. I'm not sure if this is the one I used previously, but my
> previous Emboss 6.1.0 installation failed to extract the proper quality
> values. Perhaps we should set the minimum version to 6.3.1? (well, making it
> the only Emboss version that works with Biopython because of that 6.4.0
> bug).

We test a lot of FASTQ stuff which requires 6.1.0 patch 3 or later,
which is why that requirement exists. Asking for at least EMBOSS
6.3.1 makes no practical difference as far as I can see.

If you meant requiring EMBOSS 6.4.1, that hasn't been released yet.

I'm expecting them to release EMBOSS 6.4.0 patch 1 soon (after
I've tested the proposed patch Peter Rice sent), but that will still
report itself as EMBOSS 6.4.0 (based on past patch behaviour,
something I consider annoying but have to live with).
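
A sketch of why the patch level cannot be used in a minimum-version test, assuming the version arrives as a dotted string such as "6.4.0". The function names and the MINIMUM constant are hypothetical, and this is not the check used in test_Emboss.py.

```python
MINIMUM = (6, 1, 0)  # the patch level cannot be expressed here


def parse_version(text):
    """Turn a dotted version string like '6.4.0' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def is_supported(version_text, minimum=MINIMUM):
    """Compare versions as tuples so that 6.10.0 sorts after 6.9.0."""
    return parse_version(version_text) >= minimum


for reported in ("6.0.1", "6.1.0", "6.3.1", "6.4.0"):
    print(reported, is_supported(reported))
```

An unpatched 6.4.0 and a patched 6.4.0 both parse to (6, 4, 0), so a check of this kind cannot tell them apart.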

> - Other than those two, everything's tip top :).
>

Great. I've pushed the code to the main repository, and have
just set off the buildbot slaves as a final sanity test.

This revealed a minor Python 2.4 breakage (not a big issue - it only
seems to be me still trying to keep testing this - and I'm about
ready to give up), and another probable EMBOSS bug in an
older version installed on one buildslave.

Congratulations, your code will be in the next Biopython release.

Thank you,

Peter


From redmine at redmine.open-bio.org  Wed Aug  3 10:52:32 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Wed, 3 Aug 2011 14:52:32 +0000
Subject: [Biopython-dev] [Biopython - Bug #3276] (New) inconsistent returns
	of Bio.PDB.NeighborSearch.search()
Message-ID: <redmine.issue-3276.20110803145232@redmine.open-bio.org>


Issue #3276 has been reported by Hongbo Zhu.

----------------------------------------
Bug #3276: inconsistent returns of Bio.PDB.NeighborSearch.search()
https://redmine.open-bio.org/issues/3276

Author: Hongbo Zhu
Status: New
Priority: Normal
Assignee: 
Category: 
Target version: 
URL: 


In the current version of BioPython (source code as of 3 Aug. 2011), it seems the outcome of Bio.PDB.NeighborSearch.search() is inconsistent if different levels are specified when the returned list is empty.

i.e.
@
ns.search(center, radius, 'A')
[]
ns.search(center, radius, 'R') # similar for levels 'C', 'M', 'S'
IndexError: list index out of range
@
Obviously, this is because the Bio.PDB.NeighborSearch.search() function tries to convert the returned list to levels other than 'A' using function Bio.PDB.Selection.unfold_entities() (see line 92 in NeighborSearch.py). In function unfold_entities(), the first element of input argument entity_list is evaluated without entity_list being checked for emptiness (see line 47 in Selection.py). An IndexError is raised when entity_list is empty.

So, I think either the length of the returned list in Bio.PDB.NeighborSearch.search() should be checked before invoking Bio.PDB.Selection.unfold_entities(), or the function Bio.PDB.Selection.unfold_entities() should be revised so that it simply returns an empty list if the argument entity_list is empty. I prefer the latter solution because this would also fix other similar situations when  Bio.PDB.Selection.unfold_entities() is invoked in other functions.

cheers, hongbo


----------------------------------------
You have received this notification because this email was added to the New Issue Alert plugin




From w.arindrarto at gmail.com  Wed Aug  3 11:11:13 2011
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Wed, 3 Aug 2011 17:11:13 +0200
Subject: [Biopython-dev] SeqIO Abi Parser
In-Reply-To: <CAKVJ-_6PdooLMb=mvaskFTjtriLQ1LNikhqJXho7m1k1jA1WtQ@mail.gmail.com>
References: <CADEGkF7qC4Q_KsN-bjOWTyXOu6vw4dzva02H0LWm+kk3xXX6og@mail.gmail.com>
	<CADEGkF5HBL2nMTN=kNhS9Hs_8UiWHs3CXw36=yHUiaVgQwQ4hg@mail.gmail.com>
	<CAKVJ-_7HOhEAn+XwQes+2h=7CwFR10FtefOHZqXMU5xo97D0tA@mail.gmail.com>
	<CADEGkF6E=PDCbqnJ60qsYpt231CrS6S1EWghx7tdFZ_gwAi7QQ@mail.gmail.com>
	<CAKVJ-_6LiJ-j5md_+Fw6KyJoaFqCeRSGPwwbcFaO=PJWNdqNmg@mail.gmail.com>
	<CADEGkF7imzYii6qq7D=3r68uAumu_ZzXaQpC9eyg_8prYcDoHw@mail.gmail.com>
	<CAKVJ-_5WAL+XbC=br4fp5gRnd1DCqoPbidqtZeHi6sM0JS+KXQ@mail.gmail.com>
	<CADEGkF662Xs0cmFEa7We+mVyPB5VQkKAO2yYpERBbRJ0ZYQj4Q@mail.gmail.com>
	<CAKVJ-_7xNgb5xUX6VTZcWoakkUXOkUKv93KOs5NYYEOCTQgkuQ@mail.gmail.com>
	<CAKVJ-_4FpFYyVgOL3Rod1QR1DhuFk3rn6yqWKxUq2+8+OBnpJw@mail.gmail.com>
	<CADEGkF6GDqgK9SOmUe_fQsjv9RU_dZwG_E0QnfD7nAKXUOD8yg@mail.gmail.com>
	<CAKVJ-_6LT+VJStu_sN5mfUbGTuTChMffxOw0PsJ6hTSq-fgeqA@mail.gmail.com>
	<CADEGkF4RK5tV5gfX-0Rm1Q7wOa5DXFcX1JZjNjjwC0fukO5ekg@mail.gmail.com>
	<CAKVJ-_4F7En085bcLFXO3cc_xkq632gY9f=NDi=vCm0p1NWZvw@mail.gmail.com>
	<CAKVJ-_7DdjXjme+n3bkQo5BQk0=7V_Czb=c3Fym38aOHY2V=CQ@mail.gmail.com>
	<CADEGkF5aRyD97NxXzUROUZxVqA7xNoEHi4_8Kcs_fKcXGOPDZg@mail.gmail.com>
	<CAKVJ-_6PdooLMb=mvaskFTjtriLQ1LNikhqJXho7m1k1jA1WtQ@mail.gmail.com>
Message-ID: <CADEGkF4zevZ7Mhy_viayMJ+mUrS1VXjf7q1zApEU=5dUJWjqbQ@mail.gmail.com>

Hi Peter,

On Wed, Aug 3, 2011 at 16:04, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> On Wed, Aug 3, 2011 at 1:14 PM, Wibowo Arindrarto
> <w.arindrarto at gmail.com> wrote:
> > Hi Peter,
> > My bad, I forgot to change that one line and didn't test before
> committing.
> > Thanks for fixing it.
> > I've run the tests on your tree on py2.6.5 and py3.1.2, here are the
> > results:
> > - On both py2.6.5  and py3.1.2, I have the following test case error:
> > "NameError: global name 'embossversion' is not defined", on line 257.
> >...

> It was simpler than that - I'd checked it in with a typo, emboss_version
> was what I wanted. Sorry about that confusion!


Silly me, I should've noticed you used emboss_version when I was looking at
the code checking Emboss dependency :/.


> > - On a related note, I noticed you set the minimum Emboss requirement to
> > 6.1.0 patch 3. I'm not sure if this is the one I used previously, but my
> > previous Emboss 6.1.0 installation failed to extract the proper quality
> > values. Perhaps we should set the minimum version to 6.3.1? (well, making
> it
> > the only Emboss version that works with Biopython because of that 6.4.0
> > bug).
>
> We test a lot of FASTQ stuff which requires 6.1.0 patch 3 or later,
> which is why that requirement exists. Asking for at least EMBOSS
> 6.3.1 makes no practical difference as far as I can see.
>
> If you meant requiring EMBOSS 6.4.1, that hasn't been released yet.
>
> I'm expecting them to release EMBOSS 6.4.0 patch 1 soon (after
> I've tested the proposed patch Peter Rice sent), but that will still
> report itself as EMBOSS 6.4.0 (based on past patch behaviour,
> something I consider annoying but have to live with).


I meant Emboss 6.3.1, since that seems to be one that works best with the
current AbiIO implementation. But yeah, I guess as long as the tests work
it's fine.


> > - Other than those two, everything's tip top :).
> >
>
> Great. I've pushed the code to the main repository, and have
> just set off the buildbot slaves as a final sanity test.
>
> This revealed a minor Python 2.4 breakage (not a big issue - it only
> seems to be me still trying to keep testing this - and I'm about
> ready to give up), and another probable EMBOSS bug in an
> older version installed on one buildslave.
>
> Congratulations, your code will be in the next Biopython release.
>
> Thank you,
>
> Peter
>

This really made my day :)! You're welcome and thank you for reviewing my
code, too!


Regards,
---
Wibowo Arindrarto (bow)
http://bow.web.id

From w.arindrarto at gmail.com  Thu Aug  4 07:30:44 2011
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Thu, 4 Aug 2011 13:30:44 +0200
Subject: [Biopython-dev] SeqIO Abi Parser
In-Reply-To: <CAKVJ-_5uXgdu8C8qqnhEcXT1G9kTFkJsS82VJ0M1=9q9QOVeOw@mail.gmail.com>
References: <CADEGkF7qC4Q_KsN-bjOWTyXOu6vw4dzva02H0LWm+kk3xXX6og@mail.gmail.com>
	<CADEGkF5HBL2nMTN=kNhS9Hs_8UiWHs3CXw36=yHUiaVgQwQ4hg@mail.gmail.com>
	<CAKVJ-_7HOhEAn+XwQes+2h=7CwFR10FtefOHZqXMU5xo97D0tA@mail.gmail.com>
	<CADEGkF6E=PDCbqnJ60qsYpt231CrS6S1EWghx7tdFZ_gwAi7QQ@mail.gmail.com>
	<CAKVJ-_6LiJ-j5md_+Fw6KyJoaFqCeRSGPwwbcFaO=PJWNdqNmg@mail.gmail.com>
	<CADEGkF7imzYii6qq7D=3r68uAumu_ZzXaQpC9eyg_8prYcDoHw@mail.gmail.com>
	<CAKVJ-_5WAL+XbC=br4fp5gRnd1DCqoPbidqtZeHi6sM0JS+KXQ@mail.gmail.com>
	<CADEGkF662Xs0cmFEa7We+mVyPB5VQkKAO2yYpERBbRJ0ZYQj4Q@mail.gmail.com>
	<CAKVJ-_7xNgb5xUX6VTZcWoakkUXOkUKv93KOs5NYYEOCTQgkuQ@mail.gmail.com>
	<CAKVJ-_4FpFYyVgOL3Rod1QR1DhuFk3rn6yqWKxUq2+8+OBnpJw@mail.gmail.com>
	<CADEGkF6GDqgK9SOmUe_fQsjv9RU_dZwG_E0QnfD7nAKXUOD8yg@mail.gmail.com>
	<CAKVJ-_6LT+VJStu_sN5mfUbGTuTChMffxOw0PsJ6hTSq-fgeqA@mail.gmail.com>
	<CADEGkF4RK5tV5gfX-0Rm1Q7wOa5DXFcX1JZjNjjwC0fukO5ekg@mail.gmail.com>
	<CAKVJ-_4F7En085bcLFXO3cc_xkq632gY9f=NDi=vCm0p1NWZvw@mail.gmail.com>
	<CAKVJ-_7DdjXjme+n3bkQo5BQk0=7V_Czb=c3Fym38aOHY2V=CQ@mail.gmail.com>
	<CADEGkF5aRyD97NxXzUROUZxVqA7xNoEHi4_8Kcs_fKcXGOPDZg@mail.gmail.com>
	<CAKVJ-_6PdooLMb=mvaskFTjtriLQ1LNikhqJXho7m1k1jA1WtQ@mail.gmail.com>
	<CADEGkF4zevZ7Mhy_viayMJ+mUrS1VXjf7q1zApEU=5dUJWjqbQ@mail.gmail.com>
	<CAKVJ-_5uXgdu8C8qqnhEcXT1G9kTFkJsS82VJ0M1=9q9QOVeOw@mail.gmail.com>
Message-ID: <CADEGkF4fYp_J7hdqCs+BQWdc=5mvs=iAN1v8AJf88nDL1hJw4A@mail.gmail.com>

Hi Peter,

Ah yes, I didn't know there could be handles without .seek() and .tell(),
and I thought those two are the proper way of traversing files, so I used
them. I also didn't realize you could use SeqIO with network handles, too.
This is really neat :).

In any case, sure, I'd love to make some changes to the current AbiIO code
so it works without .seek() and .tell(). Are there any other input types that
do not support .seek() and .tell() besides network handles? Here's my new
branch from the current master:
https://github.com/bow/biopython/tree/seqio-abi_handlefix, nothing different
for now but I'll push my updates soon.


Regards,
---
Wibowo Arindrarto (bow)
http://bow.web.id


On Thu, Aug 4, 2011 at 13:03, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> On Wed, Aug 3, 2011 at 4:11 PM, Wibowo Arindrarto
> <w.arindrarto at gmail.com> wrote:
> > On Wed, Aug 3, 2011 at 16:04, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> >> ...
> >> Congratulations, your code will be in the next Biopython release.
> >> ...
> >
> > This really made my day :)! You're welcome and thank you for reviewing my
> > code, too!
>
> I found something else to work on (sorry!). You're using seek and tell, which
> may not exist. Network handles are a good example of this situation. Try:
>
> from urllib import urlopen
> from Bio import SeqIO
> handle = urlopen("http://biopython.org/SRC/biopython/Tests/Abi/310.ab1")
> record = SeqIO.read(handle, "abi")
> handle.close()
>
> I've added some code to test_SeqIO.py to simulate this, which revealed that
> the SFF parser was also using the tell method. In that case we must track the
> offset explicitly (it is needed for handling SFF index blocks). You can see how
> I did this here - note I avoid the overhead of tracking the offset in general:
>
> https://github.com/biopython/biopython/commit/9a3c44b28aae256b8da825c3c1553d71dbe329cc
>
> I've tried the same trick in the ABI parser, but this reveals your code likes to
> seek backwards. Try the attached patch against this revision to confirm this.
>
> Having looked over your code, I don't believe you need to use seek and tell
> at all. This isn't critical to fix right now, but I would like us to solve
> it. Would you like to try? Make a new branch from the current master for
> this please.
>
> Regards,
>
> Peter
>

From p.j.a.cock at googlemail.com  Thu Aug  4 07:03:27 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 4 Aug 2011 12:03:27 +0100
Subject: [Biopython-dev] SeqIO Abi Parser
In-Reply-To: <CADEGkF4zevZ7Mhy_viayMJ+mUrS1VXjf7q1zApEU=5dUJWjqbQ@mail.gmail.com>
References: <CADEGkF7qC4Q_KsN-bjOWTyXOu6vw4dzva02H0LWm+kk3xXX6og@mail.gmail.com>
	<CADEGkF5HBL2nMTN=kNhS9Hs_8UiWHs3CXw36=yHUiaVgQwQ4hg@mail.gmail.com>
	<CAKVJ-_7HOhEAn+XwQes+2h=7CwFR10FtefOHZqXMU5xo97D0tA@mail.gmail.com>
	<CADEGkF6E=PDCbqnJ60qsYpt231CrS6S1EWghx7tdFZ_gwAi7QQ@mail.gmail.com>
	<CAKVJ-_6LiJ-j5md_+Fw6KyJoaFqCeRSGPwwbcFaO=PJWNdqNmg@mail.gmail.com>
	<CADEGkF7imzYii6qq7D=3r68uAumu_ZzXaQpC9eyg_8prYcDoHw@mail.gmail.com>
	<CAKVJ-_5WAL+XbC=br4fp5gRnd1DCqoPbidqtZeHi6sM0JS+KXQ@mail.gmail.com>
	<CADEGkF662Xs0cmFEa7We+mVyPB5VQkKAO2yYpERBbRJ0ZYQj4Q@mail.gmail.com>
	<CAKVJ-_7xNgb5xUX6VTZcWoakkUXOkUKv93KOs5NYYEOCTQgkuQ@mail.gmail.com>
	<CAKVJ-_4FpFYyVgOL3Rod1QR1DhuFk3rn6yqWKxUq2+8+OBnpJw@mail.gmail.com>
	<CADEGkF6GDqgK9SOmUe_fQsjv9RU_dZwG_E0QnfD7nAKXUOD8yg@mail.gmail.com>
	<CAKVJ-_6LT+VJStu_sN5mfUbGTuTChMffxOw0PsJ6hTSq-fgeqA@mail.gmail.com>
	<CADEGkF4RK5tV5gfX-0Rm1Q7wOa5DXFcX1JZjNjjwC0fukO5ekg@mail.gmail.com>
	<CAKVJ-_4F7En085bcLFXO3cc_xkq632gY9f=NDi=vCm0p1NWZvw@mail.gmail.com>
	<CAKVJ-_7DdjXjme+n3bkQo5BQk0=7V_Czb=c3Fym38aOHY2V=CQ@mail.gmail.com>
	<CADEGkF5aRyD97NxXzUROUZxVqA7xNoEHi4_8Kcs_fKcXGOPDZg@mail.gmail.com>
	<CAKVJ-_6PdooLMb=mvaskFTjtriLQ1LNikhqJXho7m1k1jA1WtQ@mail.gmail.com>
	<CADEGkF4zevZ7Mhy_viayMJ+mUrS1VXjf7q1zApEU=5dUJWjqbQ@mail.gmail.com>
Message-ID: <CAKVJ-_5uXgdu8C8qqnhEcXT1G9kTFkJsS82VJ0M1=9q9QOVeOw@mail.gmail.com>

On Wed, Aug 3, 2011 at 4:11 PM, Wibowo Arindrarto
<w.arindrarto at gmail.com> wrote:
> On Wed, Aug 3, 2011 at 16:04, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>> ...
>> Congratulations, your code will be in the next Biopython release.
>> ...
>
> This really made my day :)! You're welcome and thank you for reviewing my code,
> too!

I found something else to work on (sorry!). You're using seek and tell, which
may not exist. Network handles are a good example of this situation. Try:

from urllib import urlopen
from Bio import SeqIO
handle = urlopen("http://biopython.org/SRC/biopython/Tests/Abi/310.ab1")
record = SeqIO.read(handle, "abi")
handle.close()

I've added some code to test_SeqIO.py to simulate this, which revealed that
the SFF parser was also using the tell method. In that case we must track the
offset explicitly (it is needed for handling SFF index blocks). You can see how
I did this here - note I avoid the overhead of tracking the offset in general:
https://github.com/biopython/biopython/commit/9a3c44b28aae256b8da825c3c1553d71dbe329cc
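
For anyone following along, here is a minimal sketch of the explicit offset
tracking idea. It is not the code from the commit above, and the class names
(OffsetTrackingReader, ForwardOnly) are made up for illustration. It is
written in Python 3 syntax and only assumes the handle has a read() method:

```python
import io


class OffsetTrackingReader:
    """Wrap a forward-only handle and count the bytes read so far.

    Hypothetical helper for illustration; not part of Biopython.
    """

    def __init__(self, handle):
        self._handle = handle
        self.offset = 0  # stands in for handle.tell()

    def read(self, size):
        data = self._handle.read(size)
        self.offset += len(data)
        return data

    def skip(self, size):
        """Move forward by reading and discarding, instead of seek()."""
        return len(self.read(size))


class ForwardOnly:
    """Simulates a network-like handle: read() only, no seek() or tell()."""

    def __init__(self, data):
        self._buffer = io.BytesIO(data)

    def read(self, size=-1):
        return self._buffer.read(size)


# Toy stream: a 4-byte tag, 12 bytes of padding, then a 7-byte payload.
reader = OffsetTrackingReader(ForwardOnly(b"HEAD" + b"\x00" * 12 + b"payload"))
tag = reader.read(4)
reader.skip(12)
payload = reader.read(7)
```

This only helps when the parser moves forwards; it cannot replace a backwards
seek.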

I've tried the same trick in the ABI parser, but this reveals your code likes to
seek backwards. Try the attached patch against this revision to confirm this.

Having looked over your code, I don't believe you need to use seek and tell
at all. This isn't critical to fix right now, but I would like us to solve
it. Would you like to try? Make a new branch from the current master for
this please.

Regards,

Peter
-------------- next part --------------
A non-text attachment was scrubbed...
Name: tell_hack.patch
Type: application/octet-stream
Size: 1466 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/biopython-dev/attachments/20110804/bd28e873/attachment.obj>

From p.j.a.cock at googlemail.com  Thu Aug  4 07:47:49 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 4 Aug 2011 12:47:49 +0100
Subject: [Biopython-dev] SeqIO Abi Parser
In-Reply-To: <CADEGkF4fYp_J7hdqCs+BQWdc=5mvs=iAN1v8AJf88nDL1hJw4A@mail.gmail.com>
References: <CADEGkF7qC4Q_KsN-bjOWTyXOu6vw4dzva02H0LWm+kk3xXX6og@mail.gmail.com>
	<CADEGkF5HBL2nMTN=kNhS9Hs_8UiWHs3CXw36=yHUiaVgQwQ4hg@mail.gmail.com>
	<CAKVJ-_7HOhEAn+XwQes+2h=7CwFR10FtefOHZqXMU5xo97D0tA@mail.gmail.com>
	<CADEGkF6E=PDCbqnJ60qsYpt231CrS6S1EWghx7tdFZ_gwAi7QQ@mail.gmail.com>
	<CAKVJ-_6LiJ-j5md_+Fw6KyJoaFqCeRSGPwwbcFaO=PJWNdqNmg@mail.gmail.com>
	<CADEGkF7imzYii6qq7D=3r68uAumu_ZzXaQpC9eyg_8prYcDoHw@mail.gmail.com>
	<CAKVJ-_5WAL+XbC=br4fp5gRnd1DCqoPbidqtZeHi6sM0JS+KXQ@mail.gmail.com>
	<CADEGkF662Xs0cmFEa7We+mVyPB5VQkKAO2yYpERBbRJ0ZYQj4Q@mail.gmail.com>
	<CAKVJ-_7xNgb5xUX6VTZcWoakkUXOkUKv93KOs5NYYEOCTQgkuQ@mail.gmail.com>
	<CAKVJ-_4FpFYyVgOL3Rod1QR1DhuFk3rn6yqWKxUq2+8+OBnpJw@mail.gmail.com>
	<CADEGkF6GDqgK9SOmUe_fQsjv9RU_dZwG_E0QnfD7nAKXUOD8yg@mail.gmail.com>
	<CAKVJ-_6LT+VJStu_sN5mfUbGTuTChMffxOw0PsJ6hTSq-fgeqA@mail.gmail.com>
	<CADEGkF4RK5tV5gfX-0Rm1Q7wOa5DXFcX1JZjNjjwC0fukO5ekg@mail.gmail.com>
	<CAKVJ-_4F7En085bcLFXO3cc_xkq632gY9f=NDi=vCm0p1NWZvw@mail.gmail.com>
	<CAKVJ-_7DdjXjme+n3bkQo5BQk0=7V_Czb=c3Fym38aOHY2V=CQ@mail.gmail.com>
	<CADEGkF5aRyD97NxXzUROUZxVqA7xNoEHi4_8Kcs_fKcXGOPDZg@mail.gmail.com>
	<CAKVJ-_6PdooLMb=mvaskFTjtriLQ1LNikhqJXho7m1k1jA1WtQ@mail.gmail.com>
	<CADEGkF4zevZ7Mhy_viayMJ+mUrS1VXjf7q1zApEU=5dUJWjqbQ@mail.gmail.com>
	<CAKVJ-_5uXgdu8C8qqnhEcXT1G9kTFkJsS82VJ0M1=9q9QOVeOw@mail.gmail.com>
	<CADEGkF4fYp_J7hdqCs+BQWdc=5mvs=iAN1v8AJf88nDL1hJw4A@mail.gmail.com>
Message-ID: <CAKVJ-_4n8EUJXeGwGGPxAd1yvaXF0uGR_52m0LXVJ5i2Y03JUA@mail.gmail.com>

On Thu, Aug 4, 2011 at 12:30 PM, Wibowo Arindrarto
<w.arindrarto at gmail.com> wrote:
> Hi Peter,
> Ah yes, I didn't know there could be handles without .seek() and .tell(),
> and I thought those two are the proper way of traversing files, so I used
> them. I also didn't realize you could use SeqIO with network handles, too.
> This is really neat :).

Yes - having a handle focused API makes some clever stuff possible :)
Of course, parsing sequences directly from network handles isn't always
a good idea, but it can be useful.

> In any case, sure, I'd love to make some changes to the current AbiIO code
> so it works without .seek() and .tell(). Are there any other input types that
> do not support .seek() and .tell() besides network handles?

I suspect some specialised handles for accessing compressed files might
have similar limitations. In the case of gzip at least, I think it does support
seek and tell.
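
A quick way to check this, as a sketch using an in-memory gzip stream (Python 3
syntax; the variable names are just for illustration):

```python
import gzip
import io

# Build a small gzip stream in memory.
raw = io.BytesIO()
with gzip.GzipFile(fileobj=raw, mode="wb") as out:
    out.write(b"0123456789")

# Read it back, using tell() and a backwards seek() on the gzip handle.
raw.seek(0)
with gzip.GzipFile(fileobj=raw, mode="rb") as handle:
    first = handle.read(4)
    position = handle.tell()
    handle.seek(2)
    again = handle.read(3)
```

Here the offsets refer to the uncompressed data. A backwards seek on a gzip
handle may be slow, since the compressed stream probably has to be read again
from the start.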

> Here's my new branch from the current master:
> https://github.com/bow/biopython/tree/seqio-abi_handlefix
> nothing different for now but I'll push my updates soon.

Don't rush yourself - I'm away for a long weekend so won't be testing
any updates till next week anyway.

Thanks,

Peter

From b.invergo at gmail.com  Thu Aug  4 11:38:23 2011
From: b.invergo at gmail.com (Brandon Invergo)
Date: Thu, 04 Aug 2011 17:38:23 +0200
Subject: [Biopython-dev] Leaked handles in PAML unit tests
In-Reply-To: <CAKVJ-_6v+ntM2_6adhZQ_UbX4=NDNL6iDNp2vjOddXXLFhPTMQ@mail.gmail.com>
References: <CAKVJ-_7L3fCbEPj0iGUzy_1sq4x3Y3Y7-k7F9+qhhWkcbnHFFg@mail.gmail.com>
	<1312366681.1302.9.camel@localhost.localdomain>
	<CAKVJ-_6v+ntM2_6adhZQ_UbX4=NDNL6iDNp2vjOddXXLFhPTMQ@mail.gmail.com>
Message-ID: <1312472309.8916.15.camel@localhost.localdomain>

Hi Peter,
(I'm CCing this to the dev list for the info in the second paragraph)
Thanks for the reply. I solved the Python2 problem by fixing my
PYTHONPATH. Running the tests from the Tests directory couldn't find the
Bio module due to a mistake in the PYTHONPATH, so I tried to run them
from the parent directory, resulting in test failures. A dumb mistake
but anyway it's fixed. Sorry for wasting your time with that.

I still have the following error with Python 3.2, though, which prevents
me from figuring out the leaked handle problem in Py3k:
[brandon at brandon-linux Tests]$ python test_PAML_baseml.py
Traceback (most recent call last):
  File "test_PAML_baseml.py", line 10, in <module>
    from Bio.Phylo.PAML import baseml
  File "/home/brandon/Projects/pypaml/biopython/Bio/Phylo/__init__.py",
line 12, in <module>
    from Bio.Phylo._io import parse, read, write, convert
  File "/home/brandon/Projects/pypaml/biopython/Bio/Phylo/_io.py", line
12, in <module>
    from Bio.Phylo import BaseTree, NewickIO, NexusIO
  File "/home/brandon/Projects/pypaml/biopython/Bio/Phylo/BaseTree.py",
line 222
    return u'%s(%s)' % (self.__class__.__name__,

SyntaxError: invalid syntax

Regarding that specific error, I think all strings are implicitly
unicode in Python 3, aren't they? I don't have much experience with
maintaining Py2/3 compatibility, though, so I don't know how to best
handle this. Searching for the unicode operator (u') in the entire Bio
file tree shows that it only exists in Phylo/PhyloXML.py and
Phylo/BaseTree.py.

-brandon

On Wed, 2011-08-03 at 13:33 +0100, Peter Cock wrote:
> On Wed, Aug 3, 2011 at 11:18 AM, Brandon Invergo <b.invergo at gmail.com> wrote:
> > Hi Peter,
> > I'm still in the process of looking at them now but I'm running into a
> > side issue that maybe you can help with. I've tried running the unit
> > tests myself using both Python 2.7.2 and Python 3.2.1, the two versions
> > I have, and both times it fails.
> 
> Python 3 takes a bit more effort to debug due to the 2to3 thing
> and different paths - so I'd focus on Python 2.7 initially.
> 
> > Just looking at test_PAML_baseml.py, for example, with Python 2 I get a
> > lot of test failures due to baseml.py now (correctly) throwing IOErrors
> > rather than AttributeErrors or TypeErrors. With Python 3, on the other
> > hand, I get syntax errors in BaseTree.py (I'll include the output of
> > both below). I did a git pull upstream master before doing this, so my
> > code should be up-to-date (it seems like the unit tests are out-of-date,
> > re: the error types). Now, clearly these have passed on the build
> > machine so I'm wondering what I could be doing wrong.  Being able to
> > replicate the test failures in Python 3 on my machine will really help
> > in fixing them.
> > Sorry about the probable-newbie question...
> 
> What does "git status" give you?
> 
> My usual routine is as follows, but I clone from the official repository
> (which is therefore called origin), and have my personal one setup
> as peterjc via "git remote add ...":
> 
> git checkout master #if not there already
> git fetch origin
> git status #should say behind and can FF merge
> git merge origin/master #should now have latest code
> 
> I'm guessing you're working from a clone of your github repo?
> 
> An easy thing to try is a fresh clone of the official biopython.
> 
> The other key point is all the unit tests expect the current
> directory to be the Tests directory NOT the parent directory
> where setup.py lives.
> 
> Note if you just do "python test_PAML_baseml.py" this will
> pickup the installed Biopython (via PYTHONPATH etc).
> 
> One option is "run_tests.py test_PAML_baseml.py" which
> will use the local code for you.
> 
> If you do "python Tests/test_PAML_baseml.py" this should
> pickup the source code for Biopython (won't work for any
> compiled modules IIRC).
> 
> Peter


From p.j.a.cock at googlemail.com  Thu Aug  4 11:59:42 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 4 Aug 2011 16:59:42 +0100
Subject: [Biopython-dev] Leaked handles in PAML unit tests
In-Reply-To: <1312472309.8916.15.camel@localhost.localdomain>
References: <CAKVJ-_7L3fCbEPj0iGUzy_1sq4x3Y3Y7-k7F9+qhhWkcbnHFFg@mail.gmail.com>
	<1312366681.1302.9.camel@localhost.localdomain>
	<CAKVJ-_6v+ntM2_6adhZQ_UbX4=NDNL6iDNp2vjOddXXLFhPTMQ@mail.gmail.com>
	<1312472309.8916.15.camel@localhost.localdomain>
Message-ID: <CAKVJ-_6+P2g5A5PRf6xpevzcNP3JLYQj2hE5e+G+tPEQ2AtF=w@mail.gmail.com>

On Thu, Aug 4, 2011 at 4:38 PM, Brandon Invergo <b.invergo at gmail.com> wrote:
> Hi Peter,
> (I'm CCing this to the dev list for the info in the second paragraph)
> Thanks for the reply. I solved the Python2 problem by fixing my
> PYTHONPATH. Running the tests from the Tests directory couldn't find the
> Bio module due to a mistake in the PYTHONPATH, so I tried to run them
> from the parent directory, resulting in test failures. A dumb mistake
> but anyway it's fixed. Sorry for wasting your time with that.

No problem - learning about paths and imports is a bit tricky.

> I still have the following error with Python 3.2, though, which prevents
> me from figuring out the leaked handle problem in Py3k:
> [brandon at brandon-linux Tests]$ python test_PAML_baseml.py
> Traceback (most recent call last):
>   File "test_PAML_baseml.py", line 10, in <module>
>     from Bio.Phylo.PAML import baseml
>   File "/home/brandon/Projects/pypaml/biopython/Bio/Phylo/__init__.py",
> line 12, in <module>
>     from Bio.Phylo._io import parse, read, write, convert
>   File "/home/brandon/Projects/pypaml/biopython/Bio/Phylo/_io.py", line
> 12, in <module>
>     from Bio.Phylo import BaseTree, NewickIO, NexusIO
>   File "/home/brandon/Projects/pypaml/biopython/Bio/Phylo/BaseTree.py",
> line 222
>     return u'%s(%s)' % (self.__class__.__name__,
>
> SyntaxError: invalid syntax

Hang on - that looks like you ran it with "python" meaning Python 2.x

Working with Python 3 the following should "just work":

cd /home/brandon/Projects/pypaml/biopython
python3 setup.py build
python3 setup.py test
python3 setup.py install #Use sudo or --prefix etc if you want

However, if you want to run the offline test only, you need
to go into the Python3 converted Tests directory, not the
unconverted Python2 Tests directory. Note that this is
Biopython specific (but based on what NumPy does). e.g.

cd /home/brandon/Projects/pypaml/biopython
python3 setup.py build
cd build/py3.2/Tests
python3 run_tests.py --offline

Likewise if you want to test just one module,

cd /home/brandon/Projects/pypaml/biopython
python3 setup.py build
cd build/py3.2/Tests
python3 run_tests.py test_PAML_baseml.py

In the above, run_tests.py should take care of the path
settings to ensure the freshly built Biopython is used
(not whatever old version may be installed elsewhere).

If the above works nicely for you, stick with that.

Alternatively, I often just install in-development versions of
Biopython on my personal machine under my home directory
(where Python 3 was also installed using the --prefix option
so I don't need to mess about with the PYTHONPATH):

cd /home/brandon/Projects/pypaml/biopython
python3 setup.py install --prefix=$HOME
cd build/py3.2/Tests
python3 test_PAML_baseml.py

If your Python 3 is installed at system level you can do this but
it isn't very clean (certainly don't do it on a shared machine):

cd /home/brandon/Projects/pypaml/biopython
sudo python3 setup.py install
cd build/py3.2/Tests
python3 test_PAML_baseml.py

Alternatively if your Python 3 is at the system level you can
install Biopython under your home directory but then you have
to mess about with PYTHONPATH and keep changing it for
Python2 vs Python3, since they use the same variable (a
design choice I fail to see any advantages in).

Confusing isn't it?

There are other potential solutions to having multiple copies
of Python installed, like using virtualenv...

Peter


From p.j.a.cock at googlemail.com  Thu Aug  4 13:32:38 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 4 Aug 2011 18:32:38 +0100
Subject: [Biopython-dev] Leaked handles in PAML unit tests
In-Reply-To: <1312478530.8916.20.camel@localhost.localdomain>
References: <CAKVJ-_7L3fCbEPj0iGUzy_1sq4x3Y3Y7-k7F9+qhhWkcbnHFFg@mail.gmail.com>
	<1312366681.1302.9.camel@localhost.localdomain>
	<CAKVJ-_6v+ntM2_6adhZQ_UbX4=NDNL6iDNp2vjOddXXLFhPTMQ@mail.gmail.com>
	<1312472309.8916.15.camel@localhost.localdomain>
	<CAKVJ-_6+P2g5A5PRf6xpevzcNP3JLYQj2hE5e+G+tPEQ2AtF=w@mail.gmail.com>
	<1312478530.8916.20.camel@localhost.localdomain>
Message-ID: <CAKVJ-_6NT1YO4MzagtF0gUsvZqc+9c2AfDN+iRLn_QAt=sUTsg@mail.gmail.com>

On Thu, Aug 4, 2011 at 6:22 PM, Brandon Invergo <b.invergo at gmail.com> wrote:
>
> The above does work nicely for me. So nicely, in fact, that the PAML
> tests all pass! So I'm still having trouble replicating the leaked
> handles. I'm still trying to figure out what's happening...
>

It could be something silly with warning silencing being global
and not local, and thus depends on the order the tests are run in.

Did you try running all the (offline) tests in one go under Python 3.2?

Peter

From b.invergo at gmail.com  Thu Aug  4 14:21:59 2011
From: b.invergo at gmail.com (Brandon Invergo)
Date: Thu, 04 Aug 2011 20:21:59 +0200
Subject: [Biopython-dev] Leaked handles in PAML unit tests
In-Reply-To: <CAKVJ-_6NT1YO4MzagtF0gUsvZqc+9c2AfDN+iRLn_QAt=sUTsg@mail.gmail.com>
References: <CAKVJ-_7L3fCbEPj0iGUzy_1sq4x3Y3Y7-k7F9+qhhWkcbnHFFg@mail.gmail.com>
	<1312366681.1302.9.camel@localhost.localdomain>
	<CAKVJ-_6v+ntM2_6adhZQ_UbX4=NDNL6iDNp2vjOddXXLFhPTMQ@mail.gmail.com>
	<1312472309.8916.15.camel@localhost.localdomain>
	<CAKVJ-_6+P2g5A5PRf6xpevzcNP3JLYQj2hE5e+G+tPEQ2AtF=w@mail.gmail.com>
	<1312478530.8916.20.camel@localhost.localdomain>
	<CAKVJ-_6NT1YO4MzagtF0gUsvZqc+9c2AfDN+iRLn_QAt=sUTsg@mail.gmail.com>
Message-ID: <1312482121.8916.22.camel@localhost.localdomain>

Ok, now I've got the errors. Now I can actually get to work. Thanks for
your help with this. I had no idea about the special Py3 building (I've
just been using the raw tests from the repository)

I'll see what I can do now.
-brandon

On Thu, 2011-08-04 at 18:32 +0100, Peter Cock wrote:
> On Thu, Aug 4, 2011 at 6:22 PM, Brandon Invergo <b.invergo at gmail.com> wrote:
> >
> > The above does work nicely for me. So nicely, in fact, that the PAML
> > tests all pass! So I'm still having trouble replicating the leaked
> > handles. I'm still trying to figure out what's happening...
> >
> 
> It could be something silly with warning silencing being global
> and not local, and thus depends on the order the tests are run in.
> 
> Did you try running all the (offline) tests in one go under Python 3.2?
> 
> Peter


From b.invergo at gmail.com  Fri Aug  5 09:58:27 2011
From: b.invergo at gmail.com (Brandon Invergo)
Date: Fri, 05 Aug 2011 15:58:27 +0200
Subject: [Biopython-dev] Leaked handles in PAML unit tests
In-Reply-To: <CAKVJ-_6NT1YO4MzagtF0gUsvZqc+9c2AfDN+iRLn_QAt=sUTsg@mail.gmail.com>
References: <CAKVJ-_7L3fCbEPj0iGUzy_1sq4x3Y3Y7-k7F9+qhhWkcbnHFFg@mail.gmail.com>
	<1312366681.1302.9.camel@localhost.localdomain>
	<CAKVJ-_6v+ntM2_6adhZQ_UbX4=NDNL6iDNp2vjOddXXLFhPTMQ@mail.gmail.com>
	<1312472309.8916.15.camel@localhost.localdomain>
	<CAKVJ-_6+P2g5A5PRf6xpevzcNP3JLYQj2hE5e+G+tPEQ2AtF=w@mail.gmail.com>
	<1312478530.8916.20.camel@localhost.localdomain>
	<CAKVJ-_6NT1YO4MzagtF0gUsvZqc+9c2AfDN+iRLn_QAt=sUTsg@mail.gmail.com>
Message-ID: <1312552714.8916.28.camel@localhost.localdomain>

Ok the leaks have been taken care of. The problem arises when an
exception is raised within a block of code in which a file handle is
currently open. I simply had to close the handle just before raising the
exception. There was another one, however, that came up from using
stdout=open('/dev/null', 'w') in the subprocess.call() to PAML programs
(which, come to think of it, is *nix-specific anyway, and probably
wouldn't work with Windows). Instead, I set stdout to a subprocess.PIPE
and get rid of the /dev/null handle altogether.
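
Roughly, the two patterns look like the sketch below. This is an illustration
in Python 3 syntax, not the actual baseml.py code, and the function names are
hypothetical. A with-block is used here as an alternative to closing the
handle by hand before the raise. For the subprocess part, note that the
subprocess documentation warns against stdout=PIPE with call(), since the
child can block once the pipe fills up, so the sketch uses Popen with
communicate() instead:

```python
import subprocess
import sys


def first_line(path):
    """Return the first line of a file; raise ValueError if it is empty."""
    # The with-block closes the handle even when the exception is raised.
    with open(path) as handle:
        line = handle.readline()
        if not line:
            raise ValueError("no content in %s" % path)
        return line.rstrip("\n")


def run_quietly(command):
    """Run a command and throw away its output, without /dev/null."""
    process = subprocess.Popen(
        command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT
    )
    process.communicate()  # reads the pipe to the end, then waits
    return process.returncode


# Stand-in for a PAML program: any command that prints something.
exit_code = run_quietly([sys.executable, "-c", "print('hello')"])
```

On Python 3.3 and later, subprocess.DEVNULL would be another portable option.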

Cheers,
Brandon


On Thu, 2011-08-04 at 18:32 +0100, Peter Cock wrote:
> On Thu, Aug 4, 2011 at 6:22 PM, Brandon Invergo <b.invergo at gmail.com> wrote:
> >
> > The above does work nicely for me. So nicely, in fact, that the PAML
> > tests all pass! So I'm still having trouble replicating the leaked
> > handles. I'm still trying to figure out what's happening...
> >
> 
> It could be something silly with warning silencing being global
> and not local, and thus depends on the order the tests are run in.
> 
> Did you try running all the (offline) tests in one go under Python 3.2?
> 
> Peter


From w.arindrarto at gmail.com  Sat Aug  6 05:52:13 2011
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Sat, 6 Aug 2011 11:52:13 +0200
Subject: [Biopython-dev] SeqIO Abi Parser
In-Reply-To: <CAKVJ-_4n8EUJXeGwGGPxAd1yvaXF0uGR_52m0LXVJ5i2Y03JUA@mail.gmail.com>
References: <CADEGkF7qC4Q_KsN-bjOWTyXOu6vw4dzva02H0LWm+kk3xXX6og@mail.gmail.com>
	<CADEGkF5HBL2nMTN=kNhS9Hs_8UiWHs3CXw36=yHUiaVgQwQ4hg@mail.gmail.com>
	<CAKVJ-_7HOhEAn+XwQes+2h=7CwFR10FtefOHZqXMU5xo97D0tA@mail.gmail.com>
	<CADEGkF6E=PDCbqnJ60qsYpt231CrS6S1EWghx7tdFZ_gwAi7QQ@mail.gmail.com>
	<CAKVJ-_6LiJ-j5md_+Fw6KyJoaFqCeRSGPwwbcFaO=PJWNdqNmg@mail.gmail.com>
	<CADEGkF7imzYii6qq7D=3r68uAumu_ZzXaQpC9eyg_8prYcDoHw@mail.gmail.com>
	<CAKVJ-_5WAL+XbC=br4fp5gRnd1DCqoPbidqtZeHi6sM0JS+KXQ@mail.gmail.com>
	<CADEGkF662Xs0cmFEa7We+mVyPB5VQkKAO2yYpERBbRJ0ZYQj4Q@mail.gmail.com>
	<CAKVJ-_7xNgb5xUX6VTZcWoakkUXOkUKv93KOs5NYYEOCTQgkuQ@mail.gmail.com>
	<CAKVJ-_4FpFYyVgOL3Rod1QR1DhuFk3rn6yqWKxUq2+8+OBnpJw@mail.gmail.com>
	<CADEGkF6GDqgK9SOmUe_fQsjv9RU_dZwG_E0QnfD7nAKXUOD8yg@mail.gmail.com>
	<CAKVJ-_6LT+VJStu_sN5mfUbGTuTChMffxOw0PsJ6hTSq-fgeqA@mail.gmail.com>
	<CADEGkF4RK5tV5gfX-0Rm1Q7wOa5DXFcX1JZjNjjwC0fukO5ekg@mail.gmail.com>
	<CAKVJ-_4F7En085bcLFXO3cc_xkq632gY9f=NDi=vCm0p1NWZvw@mail.gmail.com>
	<CAKVJ-_7DdjXjme+n3bkQo5BQk0=7V_Czb=c3Fym38aOHY2V=CQ@mail.gmail.com>
	<CADEGkF5aRyD97NxXzUROUZxVqA7xNoEHi4_8Kcs_fKcXGOPDZg@mail.gmail.com>
	<CAKVJ-_6PdooLMb=mvaskFTjtriLQ1LNikhqJXho7m1k1jA1WtQ@mail.gmail.com>
	<CADEGkF4zevZ7Mhy_viayMJ+mUrS1VXjf7q1zApEU=5dUJWjqbQ@mail.gmail.com>
	<CAKVJ-_5uXgdu8C8qqnhEcXT1G9kTFkJsS82VJ0M1=9q9QOVeOw@mail.gmail.com>
	<CADEGkF4fYp_J7hdqCs+BQWdc=5mvs=iAN1v8AJf88nDL1hJw4A@mail.gmail.com>
	<CAKVJ-_4n8EUJXeGwGGPxAd1yvaXF0uGR_52m0LXVJ5i2Y03JUA@mail.gmail.com>
Message-ID: <CADEGkF7tXn6VqW7SGS9iFcyMDbeEOFfgjUBcCao4OH075hQDdw@mail.gmail.com>

Hi Peter & everyone,

I've been trying to improve the parser so it works with forward-only
handles, but I'm drawing a blank for now.

I realized the reason I use seek in the first place was because of the file
structure. In an Abi file we've got three data blocks: the header that
contains the file information, the sequencing data, and the directories
which serve as indexes to the sequencing data. To unpack the sequencing data
bytes, we need the information stored in the directories. Depending on its
size, it could be stored outside the directories block, or in the directory
itself. This is why .seek() helps, because it allows for jumping between the
directories and the sequencing data as it is being parsed.

Now, I thought the three blocks were stored in this order: header -
directory - sequencing data. I've thought of a way of parsing the file if
the structure is like this. As it turns out, it's possible (or this might
even be the norm) that the order is: header - sequencing data - directory.
So as soon as I finished parsing the information on how to retrieve the data
from the directories, I've already gone past the data block. In forward-only
handles, this makes the data irretrievable.

There should be other ways to retrieve the sequencing data in forward-only
handles. I thought about reading the entire handle stream first and storing
it into a variable. This way, we could replace seek() with slicing
operators. The trade off is we store the entire handle stream in memory at
once (abi files are probably ~300-500kb in size). I'm sure there are other
ways, but I couldn't think of any now.
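
To make the slicing idea concrete, here is a minimal sketch in Python 3
syntax. The byte layout below is a toy one invented for the example (a 4-byte
header holding the directory offset, then the data, then an 8-byte directory
entry), NOT the real Abi layout, and read_block is a hypothetical helper:

```python
import io
import struct


def read_block(data, offset, size):
    """Return `size` bytes starting at `offset`, as seek() + read() would."""
    block = data[offset:offset + size]
    if len(block) != size:
        raise ValueError("block at offset %i runs past the end" % offset)
    return block


# Toy file in the order header - sequencing data - directory.
payload = b"ACGTACGT"
directory = struct.pack(">II", 4, len(payload))  # data offset, data size
header = struct.pack(">I", 4 + len(payload))     # offset of the directory
handle = io.BytesIO(header + payload + directory)

# One forward-only read of the whole stream, then only slicing.
data = handle.read()
(dir_offset,) = struct.unpack(">I", read_block(data, 0, 4))
data_offset, data_size = struct.unpack(">II", read_block(data, dir_offset, 8))
sequence = read_block(data, data_offset, data_size)
```

Another option along the same lines would be wrapping the bytes in
io.BytesIO(data), which gives back a handle with seek() and tell(), so the
existing parsing code might not need to change much.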

So what do you think? Or maybe anyone else has ideas that I could try?

Regards & have a nice weekend all,
---
Wibowo Arindrarto (bow)
http://bow.web.id


On Thu, Aug 4, 2011 at 13:47, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> On Thu, Aug 4, 2011 at 12:30 PM, Wibowo Arindrarto
> <w.arindrarto at gmail.com> wrote:
> > Hi Peter,
> > Ah yes, I didn't know there could be handles without .seek() and .tell(),
> > and I thought those two are the proper way of traversing files, so I used
> > them. I also didn't realize you could use SeqIO with network handles, too.
> > This is really neat :).
>
> Yes - having a handle focused API makes some clever stuff possible :)
> Of course, parsing sequences directly from network handles isn't always
> a good idea, but it can be useful.
>
> > In any case, sure, I'd love to make some changes to the current AbiIO code
> > so it works without .seek() and .tell(). Are there any other input types that
> > do not support .seek() and .tell() besides network handles?
>
> I suspect some specialised handles for accessing compressed files might
> have similar limitations. In the case of gzip at least, I think it does support
> seek and tell.
>
> > Here's my new branch from the current master:
> > https://github.com/bow/biopython/tree/seqio-abi_handlefix
> > nothing different for now but I'll push my updates soon.
>
> Don't rush yourself - I'm away for a long weekend so won't be testing
> any updates till next week anyway.
>
> Thanks,
>
> Peter
>

From derjogi at web.de  Sun Aug  7 09:44:03 2011
From: derjogi at web.de (Jogi)
Date: Sun, 07 Aug 2011 15:44:03 +0200
Subject: [Biopython-dev] Bug in Bio.Restriction.Analysis.print_as('map') +
	correction
Message-ID: <1312724643.2148.5.camel@JogiDesk>

I'm new to the field of 'bug reporting', so please, if someone knows
where I should post this message please tell me or do it yourself :)

I've found a bug in the Bio.Restriction module when calling
Analysis.print_as('map').

The bugs (that I know of and that I corrected):
1. When there is a restriction site within the first 60 basepairs in the
sequence this one isn't added to a list and thus raises a KeyError: 0

2. Sometimes (I don't know exactly how to reproduce it any more) an
Enzyme is repeated in every line although there is no restriction site.

Solution:

Replace from line 310 in PrintFormat.py:
        x, counter, length = 0, 0, len(self.sequence)
        for x in xrange(60, length, 60):
            counter = x - 60
            l=[]
            for key in mapping:
                if key <= x:
                    l.append(key)
                else:
                    cutloc[counter] = l
                    mapping = mapping[mapping.index(key):]
                    break
            cutloc[x] = l
        cutloc[x] = mapping
        sequence = self.sequence.tostring()

With
        upper, lower, length = 0, 0, len(self.sequence)
        for upper in xrange(60, length+60, 60):
            lower = upper - 60
            l=[]
            for key in mapping:
                if key <= upper and key > lower:
                    l.append(key)
                else:
                    mapping = mapping[mapping.index(key):]
                    break
            cutloc[lower] = l
        sequence = self.sequence.tostring()
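
The intended grouping can also be written as a standalone sketch (Python 3
syntax). This is not the PrintFormat.py code; the function name
bin_cut_positions is hypothetical, and it assumes the cut positions are
integers starting from 1:

```python
def bin_cut_positions(positions, length, width=60):
    """Group cut positions into windows of `width` bases.

    Returns a dict mapping each window's lower bound (0, 60, 120, ...)
    to the positions p with lower < p <= lower + width.
    """
    bins = {}
    for lower in range(0, length, width):
        upper = lower + width
        bins[lower] = [p for p in positions if lower < p <= upper]
    return bins


# A site at position 5 (within the first 60 bp) lands in the window keyed 0.
cutloc = bin_cut_positions([5, 60, 61, 130], 150)
```

Every window gets a key, including the first one, and a window without any
restriction site simply gets an empty list.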


Hope this bug report/solution was/is helpful and at the right place :)
J.Kuhn


From p.j.a.cock at googlemail.com  Tue Aug  9 09:40:18 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 9 Aug 2011 14:40:18 +0100
Subject: [Biopython-dev] SeqIO Abi Parser
In-Reply-To: <CADEGkF7tXn6VqW7SGS9iFcyMDbeEOFfgjUBcCao4OH075hQDdw@mail.gmail.com>
References: <CADEGkF7qC4Q_KsN-bjOWTyXOu6vw4dzva02H0LWm+kk3xXX6og@mail.gmail.com>
	<CADEGkF5HBL2nMTN=kNhS9Hs_8UiWHs3CXw36=yHUiaVgQwQ4hg@mail.gmail.com>
	<CAKVJ-_7HOhEAn+XwQes+2h=7CwFR10FtefOHZqXMU5xo97D0tA@mail.gmail.com>
	<CADEGkF6E=PDCbqnJ60qsYpt231CrS6S1EWghx7tdFZ_gwAi7QQ@mail.gmail.com>
	<CAKVJ-_6LiJ-j5md_+Fw6KyJoaFqCeRSGPwwbcFaO=PJWNdqNmg@mail.gmail.com>
	<CADEGkF7imzYii6qq7D=3r68uAumu_ZzXaQpC9eyg_8prYcDoHw@mail.gmail.com>
	<CAKVJ-_5WAL+XbC=br4fp5gRnd1DCqoPbidqtZeHi6sM0JS+KXQ@mail.gmail.com>
	<CADEGkF662Xs0cmFEa7We+mVyPB5VQkKAO2yYpERBbRJ0ZYQj4Q@mail.gmail.com>
	<CAKVJ-_7xNgb5xUX6VTZcWoakkUXOkUKv93KOs5NYYEOCTQgkuQ@mail.gmail.com>
	<CAKVJ-_4FpFYyVgOL3Rod1QR1DhuFk3rn6yqWKxUq2+8+OBnpJw@mail.gmail.com>
	<CADEGkF6GDqgK9SOmUe_fQsjv9RU_dZwG_E0QnfD7nAKXUOD8yg@mail.gmail.com>
	<CAKVJ-_6LT+VJStu_sN5mfUbGTuTChMffxOw0PsJ6hTSq-fgeqA@mail.gmail.com>
	<CADEGkF4RK5tV5gfX-0Rm1Q7wOa5DXFcX1JZjNjjwC0fukO5ekg@mail.gmail.com>
	<CAKVJ-_4F7En085bcLFXO3cc_xkq632gY9f=NDi=vCm0p1NWZvw@mail.gmail.com>
	<CAKVJ-_7DdjXjme+n3bkQo5BQk0=7V_Czb=c3Fym38aOHY2V=CQ@mail.gmail.com>
	<CADEGkF5aRyD97NxXzUROUZxVqA7xNoEHi4_8Kcs_fKcXGOPDZg@mail.gmail.com>
	<CAKVJ-_6PdooLMb=mvaskFTjtriLQ1LNikhqJXho7m1k1jA1WtQ@mail.gmail.com>
	<CADEGkF4zevZ7Mhy_viayMJ+mUrS1VXjf7q1zApEU=5dUJWjqbQ@mail.gmail.com>
	<CAKVJ-_5uXgdu8C8qqnhEcXT1G9kTFkJsS82VJ0M1=9q9QOVeOw@mail.gmail.com>
	<CADEGkF4fYp_J7hdqCs+BQWdc=5mvs=iAN1v8AJf88nDL1hJw4A@mail.gmail.com>
	<CAKVJ-_4n8EUJXeGwGGPxAd1yvaXF0uGR_52m0LXVJ5i2Y03JUA@mail.gmail.com>
	<CADEGkF7tXn6VqW7SGS9iFcyMDbeEOFfgjUBcCao4OH075hQDdw@mail.gmail.com>
Message-ID: <CAKVJ-_6YnaVhnfux31NQq0kx7UDFWLYYxNRJvicEwMf=sGy=4w@mail.gmail.com>

On Sat, Aug 6, 2011 at 10:52 AM, Wibowo Arindrarto
<w.arindrarto at gmail.com> wrote:
> Hi Peter & everyone,
> I've been trying to improve the parser so it works with forward-only
> handles, but I'm drawing a blank for now.
> I realized the reason I use seek in the first place was because of the file
> structure. In an Abi file we've got three data blocks: the header that
> contains the file information, the sequencing data, and the directories
> which serve as indexes to the sequencing data. To unpack the sequencing data
> bytes, we need the information stored in the directories. Depending on its
> size, it could be stored outside the directories block, or in the directory
> itself. This is why .seek() helps, because it allows for jumping between the
> directories and the sequencing data as it is being parsed.

Yes - this design makes sense, especially given the computer
capabilities back when the format was designed.

> Now, I thought the three blocks were stored in this order: header -
> directory - sequencing data. I've thought of a way of parsing the file if
> the structure is like this. As it turns out, it's possible (or even this
> might be the norm) that the order is: header - sequencing data - directory.
> So as soon as I finished parsing the information on how to retrieve the data
> from the directories, I've already gone past the data block. In forward-only
> handles, this makes the data irretrievable.

I see now, that is unfortunate. I presume the current order was chosen
to make writing the data easy (do the directory last). A simple forward
only parser would be possible IF the data was reordered, but we can't
require that.

> There should be other ways to retrieve the sequencing data in forward-only
> handles. I thought about reading the entire handle stream first and storing
> it into a variable. This way, we could replace seek() with slicing
> operators. The trade off is we store the entire handle stream in memory at
> once (abi files are probably ~300-500kb in size). I'm sure there are other
> ways, but I couldn't think of any now.
> So what do you think? Or maybe anyone else have ideas that I could try?
> Regards & have a nice weekend all,

I think we have to accept that typical ABI files are not suitable for forward
only parsing. Thanks for looking into this - I hope you found it interesting.
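
For reference, the read-everything-then-slice idea from the quoted message
could look roughly like the sketch below. It uses a made-up toy layout
(header, then data, then directory) purely to show the technique; the field
sizes, the struct formats and the function names are assumptions and do NOT
follow the real ABI file specification.

```python
import io
import struct

# Toy layout (hypothetical, not the real ABI format):
# header = directory offset + number of entries, then raw data, then directory.
HEADER = struct.Struct(">II")
ENTRY = struct.Struct(">4sII")  # tag name, data offset, data length


def build_toy_file(records):
    """Serialise {tag: payload} as header - data - directory."""
    data = b""
    entries = []
    for tag, payload in records.items():
        entries.append((tag, HEADER.size + len(data), len(payload)))
        data += payload
    directory = b"".join(ENTRY.pack(*entry) for entry in entries)
    header = HEADER.pack(HEADER.size + len(data), len(entries))
    return header + data + directory


def parse_toy_file(handle):
    """Read the stream once, then use slicing where seek() would be used."""
    blob = handle.read()  # the only call made on the handle
    dir_offset, count = HEADER.unpack_from(blob, 0)
    result = {}
    for i in range(count):
        tag, offset, length = ENTRY.unpack_from(blob, dir_offset + i * ENTRY.size)
        result[tag] = blob[offset:offset + length]
    return result


raw = build_toy_file({b"PBAS": b"ACGT", b"SMPL": b"sample1"})
parsed = parse_toy_file(io.BytesIO(raw))
```

The cost is the one mentioned in the quoted message: the whole stream is held
in memory at once.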

Regards,

Peter


From redmine at redmine.open-bio.org  Tue Aug  9 10:29:53 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Tue, 9 Aug 2011 14:29:53 +0000
Subject: [Biopython-dev] [Biopython - Bug #3278] (New) SeqIO tries to use
	Gapped without import
Message-ID: <redmine.issue-3278.20110809142953@redmine.open-bio.org>


Issue #3278 has been reported by Paul Agapow.

----------------------------------------
Bug #3278: SeqIO tries to use Gapped without import
https://redmine.open-bio.org/issues/3278

Author: Paul Agapow
Status: New
Priority: Normal
Assignee: Biopython Dev Mailing List
Category: Main Distribution
Target version: 1.53
URL: 


@to_alignment@ in @SeqIO@ uses @Gapped@ (@isinstance(alphabet, Gapped)@) but does not actually import @Gapped@. Thus a @NameError@ results. Although the method is labelled obsolete, it is used by @SeqIO@ in write when an @AlignIO@ writer must be used (e.g. when trying to write sequences to a Nexus file).

Solution: @from Bio.Alphabet import Gapped@
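
A small self-contained illustration of why this kind of bug can go unnoticed:
Python only looks up a name when the line using it runs, so a missing import
stays hidden until that code path is taken. The class and function below are
hypothetical stand-ins and do not use Biopython at all.

```python
class Alphabet:
    """Stand-in for a plain alphabet class (hypothetical)."""


def describe(alphabet):
    # `Gapped` is neither imported nor defined in this module, so the
    # NameError only appears when this line is actually executed.
    if isinstance(alphabet, Gapped):
        return "gapped"
    return "ungapped"


try:
    describe(Alphabet())
except NameError as err:
    message = str(err)
```

Defining or importing the missing name at module level makes the same call
succeed, which is what the proposed import does.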


----------------------------------------
You have received this notification because this email was added to the New Issue Alert plugin


-- 
You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org


From redmine at redmine.open-bio.org  Tue Aug  9 10:47:22 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Tue, 9 Aug 2011 14:47:22 +0000
Subject: [Biopython-dev] [Biopython - Bug #3278] SeqIO tries to use Gapped
	without import
References: <redmine.issue-3278.20110809142953@redmine.open-bio.org>
Message-ID: <redmine.journal-14664.20110809144722@redmine.open-bio.org>


Issue #3278 has been updated by Peter Cock.


Looking at Biopython 1.53 (December 2009) you appear to be correct.

However, the function was explicitly made obsolete in Biopython 1.54 (with a deprecation warning), and at that point this error did not exist.

Unless there is a related problem in the current release, I will close this report.

Thanks.
----------------------------------------
Bug #3278: SeqIO tries to use Gapped without import
https://redmine.open-bio.org/issues/3278

Author: Paul Agapow
Status: New
Priority: Normal
Assignee: Biopython Dev Mailing List
Category: Main Distribution
Target version: 1.53
URL: 


@to_alignment@ in @SeqIO@ uses @Gapped@ (@isinstance(alphabet, Gapped)@) but does not actually import @Gapped@. Thus a @NameError@ results. Although the method is labelled obsolete, it is used by @SeqIO@ in write when an @AlignIO@ writer must be used (e.g. when trying to write sequences to a Nexus file).

Solution: @from Bio.Alphabet import Gapped@




From p.j.a.cock at googlemail.com  Tue Aug  9 10:49:30 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 9 Aug 2011 15:49:30 +0100
Subject: [Biopython-dev] Bug in Bio.Restriction.Analysis.print_as('map')
	+ correction
In-Reply-To: <1312724643.2148.5.camel@JogiDesk>
References: <1312724643.2148.5.camel@JogiDesk>
Message-ID: <CAKVJ-_6Fv-Nom7=JbJ9Z0Vc+PQgNkTUqLwDSRp-EeGeWMPCfhA@mail.gmail.com>

On Sun, Aug 7, 2011 at 2:44 PM, Jogi <derjogi at web.de> wrote:
> I'm new to the field of 'bug reporting', so please, if someone knows
> where I should post this message please tell me or do it yourself :)
>
> I've found a bug in the Bio.Restriction module when calling
> Analysis.print_as('map').
>
> The bugs (that I know of and that I corrected):
> 1. When there is a restriction site within the first 60 basepairs in the
> sequence this one isn't added to a list and thus raises an KeyError: 0

Could you give a short example script showing the problem?
It could then be used for a unit test.

> 2. Sometimes (I don't know exactly how to reproduce it any more) an
> Enzyme is repeated in every line although there is no restriction site.

I'm not familiar with that problem - without an example that will be
hard to look into.

Peter

From w.arindrarto at gmail.com  Tue Aug  9 10:59:37 2011
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Tue, 9 Aug 2011 16:59:37 +0200
Subject: [Biopython-dev] SeqIO Abi Parser
In-Reply-To: <CAKVJ-_6YnaVhnfux31NQq0kx7UDFWLYYxNRJvicEwMf=sGy=4w@mail.gmail.com>
References: <CADEGkF7qC4Q_KsN-bjOWTyXOu6vw4dzva02H0LWm+kk3xXX6og@mail.gmail.com>
	<CADEGkF5HBL2nMTN=kNhS9Hs_8UiWHs3CXw36=yHUiaVgQwQ4hg@mail.gmail.com>
	<CAKVJ-_7HOhEAn+XwQes+2h=7CwFR10FtefOHZqXMU5xo97D0tA@mail.gmail.com>
	<CADEGkF6E=PDCbqnJ60qsYpt231CrS6S1EWghx7tdFZ_gwAi7QQ@mail.gmail.com>
	<CAKVJ-_6LiJ-j5md_+Fw6KyJoaFqCeRSGPwwbcFaO=PJWNdqNmg@mail.gmail.com>
	<CADEGkF7imzYii6qq7D=3r68uAumu_ZzXaQpC9eyg_8prYcDoHw@mail.gmail.com>
	<CAKVJ-_5WAL+XbC=br4fp5gRnd1DCqoPbidqtZeHi6sM0JS+KXQ@mail.gmail.com>
	<CADEGkF662Xs0cmFEa7We+mVyPB5VQkKAO2yYpERBbRJ0ZYQj4Q@mail.gmail.com>
	<CAKVJ-_7xNgb5xUX6VTZcWoakkUXOkUKv93KOs5NYYEOCTQgkuQ@mail.gmail.com>
	<CAKVJ-_4FpFYyVgOL3Rod1QR1DhuFk3rn6yqWKxUq2+8+OBnpJw@mail.gmail.com>
	<CADEGkF6GDqgK9SOmUe_fQsjv9RU_dZwG_E0QnfD7nAKXUOD8yg@mail.gmail.com>
	<CAKVJ-_6LT+VJStu_sN5mfUbGTuTChMffxOw0PsJ6hTSq-fgeqA@mail.gmail.com>
	<CADEGkF4RK5tV5gfX-0Rm1Q7wOa5DXFcX1JZjNjjwC0fukO5ekg@mail.gmail.com>
	<CAKVJ-_4F7En085bcLFXO3cc_xkq632gY9f=NDi=vCm0p1NWZvw@mail.gmail.com>
	<CAKVJ-_7DdjXjme+n3bkQo5BQk0=7V_Czb=c3Fym38aOHY2V=CQ@mail.gmail.com>
	<CADEGkF5aRyD97NxXzUROUZxVqA7xNoEHi4_8Kcs_fKcXGOPDZg@mail.gmail.com>
	<CAKVJ-_6PdooLMb=mvaskFTjtriLQ1LNikhqJXho7m1k1jA1WtQ@mail.gmail.com>
	<CADEGkF4zevZ7Mhy_viayMJ+mUrS1VXjf7q1zApEU=5dUJWjqbQ@mail.gmail.com>
	<CAKVJ-_5uXgdu8C8qqnhEcXT1G9kTFkJsS82VJ0M1=9q9QOVeOw@mail.gmail.com>
	<CADEGkF4fYp_J7hdqCs+BQWdc=5mvs=iAN1v8AJf88nDL1hJw4A@mail.gmail.com>
	<CAKVJ-_4n8EUJXeGwGGPxAd1yvaXF0uGR_52m0LXVJ5i2Y03JUA@mail.gmail.com>
	<CADEGkF7tXn6VqW7SGS9iFcyMDbeEOFfgjUBcCao4OH075hQDdw@mail.gmail.com>
	<CAKVJ-_6YnaVhnfux31NQq0kx7UDFWLYYxNRJvicEwMf=sGy=4w@mail.gmail.com>
Message-ID: <CADEGkF4JQEisxGfR-yMCfx1v=MZW=VGAmoEg2BPeXfRwRR3qoA@mail.gmail.com>

Hi Peter,

You're welcome :)! Although a bit disappointing, it was nice when I
understood why my forward parser didn't work.

Regards,
---
Wibowo Arindrarto (bow)
http://bow.web.id



From redmine at redmine.open-bio.org  Tue Aug  9 11:48:06 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Tue, 9 Aug 2011 15:48:06 +0000
Subject: [Biopython-dev] [Biopython - Bug #3278] (Closed) SeqIO tries to use
	Gapped without import
References: <redmine.issue-3278.20110809142953@redmine.open-bio.org>
Message-ID: <redmine.journal-14665.20110809154806@redmine.open-bio.org>


Issue #3278 has been updated by Peter Cock.

Status changed from New to Closed
% Done changed from 0 to 100

I realised this deprecated function was due for removal; it will be gone in Biopython 1.58:
https://github.com/biopython/biopython/commit/9eb934ee0425b4636b26f310a0f1454f53745b17

Marking this bug as closed.
----------------------------------------
Bug #3278: SeqIO tries to use Gapped without import
https://redmine.open-bio.org/issues/3278

Author: Paul Agapow
Status: Closed
Priority: Normal
Assignee: Biopython Dev Mailing List
Category: Main Distribution
Target version: 1.53
URL: 


@to_alignment@ in @SeqIO@ uses @Gapped@ (@isinstance(alphabet, Gapped)@) but does not actually import @Gapped@. Thus a @NameError@ results. Although the method is labelled obsolete, it is used by @SeqIO@ in write when an @AlignIO@ writer must be used (e.g. when trying to write sequences to a Nexus file).

Solution: @from Bio.Alphabet import Gapped@




From p.j.a.cock at googlemail.com  Wed Aug 10 13:12:25 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 10 Aug 2011 18:12:25 +0100
Subject: [Biopython-dev] pypaml
In-Reply-To: <AANLkTikcSMo6uVZu9fx_FC8Z-s-1JFN4pH=O2+GkRcm6@mail.gmail.com>
References: <AANLkTikFsKt+RqO4b_ep9xxdfNtxbNJfigUMphd5OusA@mail.gmail.com>
	<AANLkTi=4vrti-J6HhVZEV8L7n1N8LSMra40HAp7Qst3J@mail.gmail.com>
	<AANLkTikKU0n7Hei=X9YnLjgO7YnZ88Ebz5-as8Zq1CXY@mail.gmail.com>
	<AANLkTin5=OCSDtHSEuyPkdbHBTHuC9Z8G6Gcw8L-kOhu@mail.gmail.com>
	<AANLkTikcSMo6uVZu9fx_FC8Z-s-1JFN4pH=O2+GkRcm6@mail.gmail.com>
Message-ID: <CAKVJ-_6m2Vg4NVLXs3w6QRvvopLQ8sH8uoMEomitKch86M-kLw@mail.gmail.com>

On Fri, Jan 14, 2011 at 2:11 PM, Brandon Invergo <b.invergo at gmail.com> wrote:
>> By the way, have you ever tried using this under Windows?
>
> I haven't yet but by the looks of it it should work fine assuming the
> programs are in the system path and thus can be called by name from
> any location in the file system. I see one line where I accidentally
> made it *nix-specific (default working directory is "./") but other
> than that, all files/directories are located via os.path or by
> user-inputted strings (as they would be in the control file). I have
> both a Linux and a Windows 7 machine at home though so I can do some
> testing. Obviously the unit tests here will help catch system-specific
> errors such as entering file locations incorrectly (I can see a few
> exceptions that I'm currently not handling).

Hi Brandon,

Have you looked into PAML under Windows yet?

Regards,

Peter

From b.invergo at gmail.com  Wed Aug 10 13:16:08 2011
From: b.invergo at gmail.com (Brandon Invergo)
Date: Wed, 10 Aug 2011 19:16:08 +0200
Subject: [Biopython-dev] pypaml
In-Reply-To: <CAKVJ-_6m2Vg4NVLXs3w6QRvvopLQ8sH8uoMEomitKch86M-kLw@mail.gmail.com>
References: <AANLkTikFsKt+RqO4b_ep9xxdfNtxbNJfigUMphd5OusA@mail.gmail.com>
	<AANLkTi=4vrti-J6HhVZEV8L7n1N8LSMra40HAp7Qst3J@mail.gmail.com>
	<AANLkTikKU0n7Hei=X9YnLjgO7YnZ88Ebz5-as8Zq1CXY@mail.gmail.com>
	<AANLkTin5=OCSDtHSEuyPkdbHBTHuC9Z8G6Gcw8L-kOhu@mail.gmail.com>
	<AANLkTikcSMo6uVZu9fx_FC8Z-s-1JFN4pH=O2+GkRcm6@mail.gmail.com>
	<CAKVJ-_6m2Vg4NVLXs3w6QRvvopLQ8sH8uoMEomitKch86M-kLw@mail.gmail.com>
Message-ID: <1312996570.1339.12.camel@localhost.localdomain>

On Wed, 2011-08-10 at 18:12 +0100, Peter Cock wrote:
> Hi Brandon,
> 
> Have you looked into PAML under Windows yet?
> 
> Regards,
> 
> Peter

Hi Peter,
Unfortunately, I don't have a Windows machine at my disposal to test it
on! Has anyone reported any problems yet?

-brandon


From p.j.a.cock at googlemail.com  Thu Aug 11 07:36:41 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 11 Aug 2011 12:36:41 +0100
Subject: [Biopython-dev] pypaml
In-Reply-To: <1312996570.1339.12.camel@localhost.localdomain>
References: <AANLkTikFsKt+RqO4b_ep9xxdfNtxbNJfigUMphd5OusA@mail.gmail.com>
	<AANLkTi=4vrti-J6HhVZEV8L7n1N8LSMra40HAp7Qst3J@mail.gmail.com>
	<AANLkTikKU0n7Hei=X9YnLjgO7YnZ88Ebz5-as8Zq1CXY@mail.gmail.com>
	<AANLkTin5=OCSDtHSEuyPkdbHBTHuC9Z8G6Gcw8L-kOhu@mail.gmail.com>
	<AANLkTikcSMo6uVZu9fx_FC8Z-s-1JFN4pH=O2+GkRcm6@mail.gmail.com>
	<CAKVJ-_6m2Vg4NVLXs3w6QRvvopLQ8sH8uoMEomitKch86M-kLw@mail.gmail.com>
	<1312996570.1339.12.camel@localhost.localdomain>
Message-ID: <CAKVJ-_5HL33L8hzDE9Ht-Yy=4q0bN_GH9qrvJbUXvLvuPfvEfw@mail.gmail.com>

On Wed, Aug 10, 2011 at 6:16 PM, Brandon Invergo <b.invergo at gmail.com> wrote:
> On Wed, 2011-08-10 at 18:12 +0100, Peter Cock wrote:
>> Hi Brandon,
>>
>> Have you looked into PAML under Windows yet?
>>
>> Regards,
>>
>> Peter
>
> Hi Peter,
> Unfortunately, I don't have a Windows machine at my disposal to test it
> on! Has anyone reported any problems yet?
>
> -brandon

Hi Brandon,

It's a shame you don't still have access to the Windows 7 box.

I've just grabbed the current PAML 4.4 pre-compiled for Windows
and put it on my Windows machine which runs as a buildslave,
and put the binaries on the PATH:

http://abacus.gene.ucl.ac.uk/software/paml.html
http://abacus.gene.ucl.ac.uk/software/paml4.4e.tar.gz

None of the current unit tests actually use the binaries do they?
Could you add a basic test (in a separate file which raises the
missing dependency exception to skip the test if the binary is
not on the path) for calling the tools?

Peter

From b.invergo at gmail.com  Thu Aug 11 07:51:26 2011
From: b.invergo at gmail.com (Brandon Invergo)
Date: Thu, 11 Aug 2011 13:51:26 +0200
Subject: [Biopython-dev] pypaml
In-Reply-To: <CAKVJ-_5HL33L8hzDE9Ht-Yy=4q0bN_GH9qrvJbUXvLvuPfvEfw@mail.gmail.com>
References: <AANLkTikFsKt+RqO4b_ep9xxdfNtxbNJfigUMphd5OusA@mail.gmail.com>
	<AANLkTi=4vrti-J6HhVZEV8L7n1N8LSMra40HAp7Qst3J@mail.gmail.com>
	<AANLkTikKU0n7Hei=X9YnLjgO7YnZ88Ebz5-as8Zq1CXY@mail.gmail.com>
	<AANLkTin5=OCSDtHSEuyPkdbHBTHuC9Z8G6Gcw8L-kOhu@mail.gmail.com>
	<AANLkTikcSMo6uVZu9fx_FC8Z-s-1JFN4pH=O2+GkRcm6@mail.gmail.com>
	<CAKVJ-_6m2Vg4NVLXs3w6QRvvopLQ8sH8uoMEomitKch86M-kLw@mail.gmail.com>
	<1312996570.1339.12.camel@localhost.localdomain>
	<CAKVJ-_5HL33L8hzDE9Ht-Yy=4q0bN_GH9qrvJbUXvLvuPfvEfw@mail.gmail.com>
Message-ID: <1313063488.1339.28.camel@localhost.localdomain>

On Thu, 2011-08-11 at 12:36 +0100, Peter Cock wrote:
> It's a shame you don't still have access to the Windows 7 box.
> 
> I've just grabbed the current PAML 4.4 pre-compiled for Windows
> and put it on my Windows machine which runs as a buildslave,
> and put the binaries on the PATH:
> 
> http://abacus.gene.ucl.ac.uk/software/paml.html
> http://abacus.gene.ucl.ac.uk/software/paml4.4e.tar.gz
> 
> None of the current unit tests actually use the binaries do they?
> Could you add a basic test (in a separate file which raises the
> missing dependency exception to skip the test if the binary is
> not on the path) for calling the tools?
> 
> Peter

No, I didn't include any tests that use the binaries because I wasn't
sure if they would be on the main test machine. Also, generating the
output which is used in other tests can take a lot of time in some
cases. Instead, I've generated the output files myself and then accessed
those from the tests. The one problem I have with this approach is that
it's not very reproducible; if someone else wishes to add data files
from later versions of PAML, they won't know how I generated them. Again
the goal is to make sure that we're parsing each new version correctly,
since the output format has been known to change between versions. I
could create a readme file which contains the info and put it in the
paml Tests subfolder. Sound reasonable?

I can create a Tests/test_PAML.py file to contain the proposed test. In
it, I can try to run codeml, baseml and yn00 directly using subprocess,
each on some bogus input. If the binaries are there, they'll report an
error which the test will catch. If they aren't, subprocess itself will
raise an error. I can't do this check using Bio.Phylo.PAML because we,
of course, aim to prevent bogus input from ever even reaching the
binary.
How does that sound? Is that what you had in mind?
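
To make the idea concrete, here is a rough sketch of such a probe, written
for current Python 3 rather than the Python 2 we target today. The helper
name binary_available and the 30 second timeout are assumptions for
illustration; nothing like this exists in Bio.Phylo.PAML.

```python
import subprocess


def binary_available(name):
    """Return True if `name` can be launched, False if it cannot be started."""
    try:
        # Reading from the null device means a tool that prompts for input
        # sees end-of-file straight away instead of waiting for the user.
        subprocess.run(
            [name],
            stdin=subprocess.DEVNULL,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=30,
        )
    except OSError:
        # Raised by subprocess when the executable is missing or unusable.
        return False
    except subprocess.TimeoutExpired:
        # The program started (so it exists) but did not exit in time.
        return True
    return True


found = binary_available("definitely-not-a-real-binary-xyz")
```

The exit status is deliberately ignored: with bogus input a non-zero status
is expected, and all the probe cares about is whether the program started.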

-brandon


From p.j.a.cock at googlemail.com  Thu Aug 11 09:49:39 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 11 Aug 2011 14:49:39 +0100
Subject: [Biopython-dev] pypaml
In-Reply-To: <1313063488.1339.28.camel@localhost.localdomain>
References: <AANLkTikFsKt+RqO4b_ep9xxdfNtxbNJfigUMphd5OusA@mail.gmail.com>
	<AANLkTi=4vrti-J6HhVZEV8L7n1N8LSMra40HAp7Qst3J@mail.gmail.com>
	<AANLkTikKU0n7Hei=X9YnLjgO7YnZ88Ebz5-as8Zq1CXY@mail.gmail.com>
	<AANLkTin5=OCSDtHSEuyPkdbHBTHuC9Z8G6Gcw8L-kOhu@mail.gmail.com>
	<AANLkTikcSMo6uVZu9fx_FC8Z-s-1JFN4pH=O2+GkRcm6@mail.gmail.com>
	<CAKVJ-_6m2Vg4NVLXs3w6QRvvopLQ8sH8uoMEomitKch86M-kLw@mail.gmail.com>
	<1312996570.1339.12.camel@localhost.localdomain>
	<CAKVJ-_5HL33L8hzDE9Ht-Yy=4q0bN_GH9qrvJbUXvLvuPfvEfw@mail.gmail.com>
	<1313063488.1339.28.camel@localhost.localdomain>
Message-ID: <CAKVJ-_4aJ4xg1DzP-NmwPNFqLV+iJPZVGxdZNyu-DpCF3eJdng@mail.gmail.com>

On Thu, Aug 11, 2011 at 12:51 PM, Brandon Invergo <b.invergo at gmail.com> wrote:
> On Thu, 2011-08-11 at 12:36 +0100, Peter Cock wrote:
>> It's a shame you don't still have access to the Windows 7 box.
>>
>> I've just grabbed the current PAML 4.4 pre-compiled for Windows
>> and put it on my Windows machine which runs as a buildslave,
>> and put the binaries on the PATH:
>>
>> http://abacus.gene.ucl.ac.uk/software/paml.html
>> http://abacus.gene.ucl.ac.uk/software/paml4.4e.tar.gz
>>
>> None of the current unit tests actually use the binaries do they?
>> Could you add a basic test (in a separate file which raises the
>> missing dependency exception to skip the test if the binary is
>> not on the path) for calling the tools?
>>
>> Peter
>
> No, I didn't include any tests that use the binaries because I wasn't
> sure if they would be on the main test machine. Also, generating the
> output which is used in other tests can take a lot of time in some
> cases. Instead, I've generated the output files myself and then accessed
> those from the tests. The one problem I have with this approach is that
> it's not very reproducible; if someone else wishes to add data files
> from later versions of PAML, they won't know how I generated them.

Next time there is a PAML release, you'll have to make some more
test files ;)

> Again
> the goal is to make sure that we're parsing each new version correctly,
> since the output format has been known to change between versions. I
> could create a readme file which contains the info and put it in the
> paml Tests subfolder. Sound reasonable?

Yes.

> I can create a Tests/test_PAML.py file to contain the proposed test. In
> it, I can try to run codeml, baseml and yn00 directly using Subprocess,
> each on some bogus input. If the binaries are there, they'll throw an
> error which the test will catch. If they aren't Subprocess itself will
> throw an error. I can't do this check using Bio.Phylo.PAML because we,
> of course, aim to prevent bogus input from ever even reaching the
> binary. How does that sound? Is that what you had in mind?

I believe we're thinking on the same lines here - have a look at
test_Muscle_tool.py or test_Emboss.py and others like it. There is
some header code which tries to locate the binaries, and perhaps
check their version.

Some tools have a switch like -v or --help or similar which makes
them immediately exit, sometimes with a version number. This
is less trouble than trying to run them with a dummy input file.
Having had a quick play with ds.exe it generally seems to insist
on asking for an input file, so you may have to go that route. But
see if this is useful - probably you'd need /dev/null on Unix machines:

C:\repositories\biopython\Tests>ds nul
results go into out.txt

(1) collecting min, max, and mean       0:00
(2) variance-covariance matrix      0:00
(3) median, percentiles & serial correlation       0:00
(4) Histograms and 1-D densities


If the binaries are missing or the wrong version, we raise
MissingExternalDependencyError and the test gets skipped.

If the binaries are present (and the right version), use the normal
unittest framework. Try to make the examples quick to run (aim
for well under a minute for the whole test), so smaller datafiles
than might be typical.
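
As a rough sketch of that header code (Python 3, standard library only): the
exception class below is a local stand-in defined just for this example, and
the helper name require_binary is hypothetical.

```python
import shutil


class MissingExternalDependencyError(Exception):
    """Local stand-in for the exception the test runner treats as a skip."""


def require_binary(name):
    """Return the full path of `name`, or raise if it is not on the PATH."""
    path = shutil.which(name)
    if path is None:
        raise MissingExternalDependencyError(
            "Install %s if you want to run these tests." % name
        )
    return path


try:
    require_binary("definitely-not-a-real-binary-xyz")
    skipped = False
except MissingExternalDependencyError:
    skipped = True
```

Calling something like require_binary at the top of the test module, before
any unittest classes are defined, means the whole file is skipped in one go
when the tool is absent.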

Peter

From p.j.a.cock at googlemail.com  Thu Aug 11 12:06:48 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 11 Aug 2011 17:06:48 +0100
Subject: [Biopython-dev] Deprecated code in Bio.PopGen and Bio.Motif ready
	to go?
Message-ID: <CAKVJ-_4fS1JABxHojMX+PATmE0X127=tD+_s+4TGo6X3FGQTuA@mail.gmail.com>

Hi Tiago & Bartek,

Looking over the DEPRECATED file, the following are about due for removal
in Bio.PopGen and Bio.Motif - do you guys have time to make these changes
yourselves?

Thanks,

Peter

> Bio.PopGen.FDist
> ================
> The RecordParser, _Scanner, and _RecordConsumer classes were declared obsolete
> in Release 1.54, and deprecated in Release 1.55 final. Their functionality is
> now available through a read() function.

and:

> Bio.Motif
> =========
> ...
> AlignAceConsumer, AlignAceParser, and AlignAceScanner were declared obsolete
> in Release 1.53 and deprecated in Release 1.55 final; their functionality is
> now available through a read() function in Bio.Motif.Parsers.AlignAce.
> MEMEParser, _MEMEScanner, _MEMEConsumer, _MASTConsumer, MASTParser,
> _MASTScanner, and MASTRecord were declared obsolete in Release 1.54 and
> deprecated in Release 1.55 final; their functionality is now available through
> a read() function in Bio.Motif.Parsers.MEME and Bio.Motif.Parsers.MAST,
> respectively.

P.S. We don't usually need to mention private classes like _MEMEScanner in
the DEPRECATED file.
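
For anyone unfamiliar with the pattern, the deprecated stage usually amounts
to something like the sketch below: the old class stays as a thin wrapper
that warns and forwards to the new read() function. The names and the toy
read() body are illustrative assumptions, not the actual Bio.PopGen or
Bio.Motif code, and the built-in DeprecationWarning is used to keep the
example self-contained.

```python
import io
import warnings


def read(handle):
    """Toy stand-in for a module-level read() function."""
    return handle.read().split()


class RecordParser:
    """Old-style parser kept only as a deprecated wrapper around read()."""

    def __init__(self):
        warnings.warn(
            "RecordParser is deprecated; please use read() instead.",
            DeprecationWarning,
            stacklevel=2,
        )

    def parse(self, handle):
        return read(handle)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    parser = RecordParser()
tokens = parser.parse(io.StringIO("a b"))
```

Removal then just means deleting the wrapper class, since callers have
already been pointed at read().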

From tiagoantao at gmail.com  Thu Aug 11 12:15:08 2011
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Thu, 11 Aug 2011 17:15:08 +0100
Subject: [Biopython-dev] Deprecated code in Bio.PopGen and Bio.Motif
	ready to go?
In-Reply-To: <CAKVJ-_4fS1JABxHojMX+PATmE0X127=tD+_s+4TGo6X3FGQTuA@mail.gmail.com>
References: <CAKVJ-_4fS1JABxHojMX+PATmE0X127=tD+_s+4TGo6X3FGQTuA@mail.gmail.com>
Message-ID: <CAA9RGEOwk+R3caBEUyXwPHHRV3VEhBs1tVOggdRLgJ181GPagw@mail.gmail.com>

I will do it over the weekend for Bio.PopGen.


-- 
Sent from my mobile device

"If you want to get laid, go to college.  If you want an education, go
to the library." - Frank Zappa


From barwil at gmail.com  Thu Aug 11 12:28:01 2011
From: barwil at gmail.com (Bartek Wilczynski)
Date: Thu, 11 Aug 2011 09:28:01 -0700
Subject: [Biopython-dev] Deprecated code in Bio.PopGen and Bio.Motif
	ready to go?
In-Reply-To: <CAKVJ-_4fS1JABxHojMX+PATmE0X127=tD+_s+4TGo6X3FGQTuA@mail.gmail.com>
References: <CAKVJ-_4fS1JABxHojMX+PATmE0X127=tD+_s+4TGo6X3FGQTuA@mail.gmail.com>
Message-ID: <CABHxouWCaSHhQzj3-hh9dRtN8HJCc_1sZc1XEZGY=G4SMAw-Tg@mail.gmail.com>

Hi,

I'll do the necessary changes in Bio.Motif by the end of the week.

best
Bartek



-- 
Bartek Wilczynski
==================
Institute of Informatics
University of Warsaw
http://www.mimuw.edu.pl/~bartek

From redmine at redmine.open-bio.org  Mon Aug 15 05:59:39 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Mon, 15 Aug 2011 09:59:39 +0000
Subject: [Biopython-dev] [Biopython - Bug #3188] (Closed) Test bug,
	please ignore
References: <redmine.issue-3188.20110328134006@redmine.open-bio.org>
Message-ID: <redmine.journal-14672.20110815095939@redmine.open-bio.org>


Issue #3188 has been updated by Peter Cock.

Status changed from New to Closed
% Done changed from 0 to 100

Should have closed this test bug a while ago.
----------------------------------------
Bug #3188: Test bug, please ignore
https://redmine.open-bio.org/issues/3188

Author: Peter Cock
Status: Closed
Priority: Normal
Assignee: Biopython Dev Mailing List
Category: 
Target version: 
URL: 


The aim of this bug is to test the Redmine "Email on New Issue" option from the Newissuealerts module.

This issue should get emailed to the biopython-dev email list automatically...

Peter


-- 
You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org


From p.j.a.cock at googlemail.com  Mon Aug 15 06:04:41 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 15 Aug 2011 11:04:41 +0100
Subject: [Biopython-dev] Release blockers? PAML?
Message-ID: <CAKVJ-_4NJTRBWbSr3cjUX4TT=JFMFYfSLRs0n9tEnbsvxBS-cw@mail.gmail.com>

Hi all,

We're about due to make a Biopython release, and I could
do it early this week - but then I'm away for a fortnight. I am
fortunate to be attending the BioHackathon 2011 in Kyoto
next week, http://2011.biohackathon.org/

I think we're in a good position with the code on the trunk to
release Biopython 1.58, bar the PAML code which has not
yet been tested on Windows. Also, I'd be keen for Tiago and
Brandon to take a look at the application calling code to see
if there is any scope for a more common approach between
the PAML wrappers and the PopGen tools. Note that both
sets of tools are not 'nicely behaved' Unix style tools (which
is what the Bio.Applications API targets). To do anything
useful with these tools you have to do nasty things like
switch the current working directory and so on.

If we want to do the release this week, we could just warn
that the PAML code is considered to be "in beta" and that
the API may well change in non-backwards compatible
ways?

What else should be addressed before the next release?

There are some open bugs, but at first glance nothing
critical.

Regards,

Peter

From b.invergo at gmail.com  Mon Aug 15 06:15:04 2011
From: b.invergo at gmail.com (Brandon Invergo)
Date: Mon, 15 Aug 2011 12:15:04 +0200
Subject: [Biopython-dev] Release blockers? PAML?
In-Reply-To: <CAKVJ-_4NJTRBWbSr3cjUX4TT=JFMFYfSLRs0n9tEnbsvxBS-cw@mail.gmail.com>
References: <CAKVJ-_4NJTRBWbSr3cjUX4TT=JFMFYfSLRs0n9tEnbsvxBS-cw@mail.gmail.com>
Message-ID: <1313403306.3107.5.camel@localhost.localdomain>

Hi, 
Regarding PAML, I'm sorry I haven't implemented the binary tests yet.
I'll put it on my to-do for today. Turns out it's a Spanish national
holiday today so I guess I don't have to go to the lab.  

I have a Windows 7 laptop that up until now has been quarantined and
used only for music software, with no other software allowed on it, not
allowed near the interwebs, etc (it's a fickle machine), but last night
I broke the rules and installed Python 2.7 on it. I'll try running the
PAML tests on it and I'll let everyone know how it goes. 

Until later,
-brandon

On Mon, 2011-08-15 at 11:04 +0100, Peter Cock wrote:
> Hi all,
> 
> We're about due to make a Biopython release, and I could
> do it early this week - but then I'm away for a fortnight. I am
> fortunate to be attending the BioHackathon 2011 in Kyoto
> next week, http://2011.biohackathon.org/
> 
> I think we're in a good position with the code on the trunk to
> release Biopython 1.58, bar the PAML code which has not
> yet been tested on Windows. Also, I'd be keen for Tiago and
> Brandon to take a look at the application calling code to see
> if there is any scope for a more common approach between
> the PAML wrappers and the PopGen tools. Note that both
> sets of tools are not 'nicely behaved' Unix style tools (which
> is what the Bio.Applications API targets). To do anything
> useful with these tools you have to do nasty things like
> switch the current working directory and so on.
> 
> If we want to do the release this week, we could just warn
> that the PAML code is considered to be "in beta" and that
> the API may well change in non-backwards compatible
> ways?
> 
> What else should be addressed before the next release?
> 
> There are some open bugs, but at first glance nothing
> critical.
> 
> Regards,
> 
> Peter


From eric.talevich at gmail.com  Mon Aug 15 11:02:57 2011
From: eric.talevich at gmail.com (Eric Talevich)
Date: Mon, 15 Aug 2011 11:02:57 -0400
Subject: [Biopython-dev] Release blockers? PAML?
In-Reply-To: <CAKVJ-_4NJTRBWbSr3cjUX4TT=JFMFYfSLRs0n9tEnbsvxBS-cw@mail.gmail.com>
References: <CAKVJ-_4NJTRBWbSr3cjUX4TT=JFMFYfSLRs0n9tEnbsvxBS-cw@mail.gmail.com>
Message-ID: <CAMC681nEp6JeLyCYXamHtRSZ5VLkb=MwMWxHE3dw9dVJezYFUg@mail.gmail.com>

On Mon, Aug 15, 2011 at 6:04 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> Hi all,
>
> We're about due to make a Biopython release, and I could
> do it early this week - but then I'm away for a fortnight. I am
> fortunate to be attending the BioHackathon 2011 in Kyoto
> next week, http://2011.biohackathon.org/
>
> [...]

> What else should be addressed before the next release?
>
> There are some open bugs, but at first glance nothing
> critical.
>
>
A while ago I pushed a new function, Phylo.draw(). It draws rooted
phylograms much like Phylip's drawgram or ape's plot.tree function. There's
a lot of room for personal preferences here, so I'd appreciate if someone
else could try it out and suggest changes.

Usage:
>>> from Bio import Phylo
>>> tree = Phylo.read('some_tree.nwk', 'newick')
>>> Phylo.draw(tree)

Code:
https://github.com/biopython/biopython/blob/master/Bio/Phylo/_utils.py

The function only takes a few arguments, but since it's based on
matplotlib/pylab, the aesthetics of a plot can easily be changed after the
initial plotting.

If we're happy with it, then I'll add a mention of it to the Tutorial.

While I'm at it, has anyone else used Bio.Applications.PhymlCommandline and
found any issues?

Thanks,
Eric

From b.invergo at gmail.com  Tue Aug 16 16:06:24 2011
From: b.invergo at gmail.com (Brandon Invergo)
Date: Tue, 16 Aug 2011 22:06:24 +0200
Subject: [Biopython-dev] Release blockers? PAML?
In-Reply-To: <CAKVJ-_4NJTRBWbSr3cjUX4TT=JFMFYfSLRs0n9tEnbsvxBS-cw@mail.gmail.com>
References: <CAKVJ-_4NJTRBWbSr3cjUX4TT=JFMFYfSLRs0n9tEnbsvxBS-cw@mail.gmail.com>
Message-ID: <1313525186.3107.7.camel@localhost.localdomain>

Hi everyone,

I wrote some tests for the presence of the PAML binaries and I've run
all the unit tests in Python 2.7 on Windows 7 and they all pass.

Cheers,
Brandon


On Mon, 2011-08-15 at 11:04 +0100, Peter Cock wrote:
> Hi all,
> 
> We're about due to make a Biopython release, and I could
> do it early this week - but then I'm away for a fortnight. I am
> fortunate to be attending the BioHackathon 2011 in Kyoto
> next week, http://2011.biohackathon.org/
> 
> I think we're in a good position with the code on the trunk to
> release Biopython 1.58, bar the PAML code which has not
> yet been tested on Windows. Also, I'd be keen for Tiago and
> Brandon to take a look at the application calling code to see
> if there is any scope for a more common approach between
> the PAML wrappers and the PopGen tools. Note that both
> sets of tools are not 'nicely behaved' Unix style tools (which
> is what the Bio.Applications API targets). To do anything
> useful with these tools you have to do nasty things like
> switch the current working directory and so on.
> 
> If we want to do the release this week, we could just warn
> that the PAML code is considered to be "in beta" and that
> the API may well change in non-backwards compatible
> ways?
> 
> What else should be addressed before the next release?
> 
> There are some open bugs, but at first glance nothing
> critical.
> 
> Regards,
> 
> Peter


From p.j.a.cock at googlemail.com  Wed Aug 17 11:28:16 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 17 Aug 2011 16:28:16 +0100
Subject: [Biopython-dev] PAML yn00 under Windows
Message-ID: <CAKVJ-_53J=c99VvZEEtcNEv+aBQ3MkxBF670099ZkFNuu34Pvg@mail.gmail.com>

Hi Brandon,

It looks like the stats line parsing in yn00 needs a little adjustment
for this platform,

======================================================================
ERROR: Test that the yn00 binary runs and generates correct output.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "c:\repositories\BuildBot\win26\build\Tests\test_PAML_tools.py", line 139, in testYn00Binary
    results = self.yn.run()
  File "c:\repositories\BuildBot\win26\build\build\lib.win32-2.6\Bio\Phylo\PAML\yn00.py", line 106, in run
    results = read(self.out_file)
  File "c:\repositories\BuildBot\win26\build\build\lib.win32-2.6\Bio\Phylo\PAML\yn00.py", line 131, in read
    sequences)
  File "c:\repositories\BuildBot\win26\build\build\lib.win32-2.6\Bio\Phylo\PAML\_parse_yn00.py", line 110, in parse_others
    value = stats_split[i+2].strip("()")
IndexError: list index out of range

----------------------------------------------------------------------
Ran 157 tests in 282.385 seconds


I added this commit for a more helpful error message:
https://github.com/biopython/biopython/commit/420430164d258aae27714d907705cd729626f3c6

C:\repositories\biopython\Tests>c:\python26\python test_PAML_tools.py
Test that the baseml binary runs and generates correct output ... ok
Test that the codeml binary runs and generates correct output ... ok
Test that the yn00 binary runs and generates correct output. ... ERROR

======================================================================
ERROR: Test that the yn00 binary runs and generates correct output.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_PAML_tools.py", line 139, in testYn00Binary
    results = self.yn.run()
  File "c:\python26\Lib\site-packages\Bio\Phylo\PAML\yn00.py", line 106, in run
    results = read(self.out_file)
  File "c:\python26\Lib\site-packages\Bio\Phylo\PAML\yn00.py", line 131, in read
    sequences)
  File "c:\python26\Lib\site-packages\Bio\Phylo\PAML\_parse_yn00.py", line 113, in parse_others
    raise ValueError("Problem with stats line: %r" % line)
ValueError: Problem with stats line: 'LWL85m: dS = -1.#IND dN = -1.#IND w =-1.#IND S =   -1.$ N =   -1.$ (rho = -1.#IO)\n'

----------------------------------------------------------------------
Ran 3 tests in 1.312s

FAILED (errors=1)


It looks like you're not expecting a bracket pattern quite like that
(and/or this is a cross platform C float representation issue).

Hopefully that string is enough to work out how to fix the parser,
even if you can't reproduce this on your own machine. I can try
and find the output file if you like... might have to disable the
tool's clean up code temporarily to leave it behind.

Regards,

Peter

From p.j.a.cock at googlemail.com  Wed Aug 17 11:39:41 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 17 Aug 2011 16:39:41 +0100
Subject: [Biopython-dev] PAML yn00 under Windows
In-Reply-To: <CAKVJ-_53J=c99VvZEEtcNEv+aBQ3MkxBF670099ZkFNuu34Pvg@mail.gmail.com>
References: <CAKVJ-_53J=c99VvZEEtcNEv+aBQ3MkxBF670099ZkFNuu34Pvg@mail.gmail.com>
Message-ID: <CAKVJ-_4Y+1pDQOg+43StxuomyrnUrNuhd0pVFEe7OFYXteGtKQ@mail.gmail.com>

On Wed, Aug 17, 2011 at 4:28 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> Hi Brandon,
>
> It looks like the stats line parsing in yn00 needs a little adjustment
> for this platform,
> ...
>    value = stats_split[i+2].strip("()")
> IndexError: list index out of range
>
>
> ...
>    raise ValueError("Problem with stats line: %r" % line)
> ValueError: Problem with stats line: 'LWL85m: dS = -1.#IND dN =
> -1.#IND w =-1.#IND S =   -1.$ N =   -1.$ (rho = -1.#IO)\n'

I think you need an adjustment to the bounds on i, given you want to use
stats_split[i] and stats_split[i+2]. Not sure if you want a -3 or -2 on the
upper bound...

C:\repositories\biopython\Tests>git diff
diff --git a/Bio/Phylo/PAML/_parse_yn00.py b/Bio/Phylo/PAML/_parse_yn00.py
index 221b6de..e4967fb 100644
--- a/Bio/Phylo/PAML/_parse_yn00.py
+++ b/Bio/Phylo/PAML/_parse_yn00.py
@@ -103,7 +103,7 @@ def parse_others(lines, results, sequences):
                 stats = {}
                 line_stats = line.split(":")[1].strip()
                 stats_split = line_stats.split()
-                for i in range(0, len(stats_split), 3):
+                for i in range(0, len(stats_split)-3, 3):
                     stat = stats_split[i].strip("()")
                     if stat == "w":
                         stat = "omega"


I don't know why this didn't come up under Linux, something subtle
going on between the PAML versions maybe?
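
On the -3 versus -2 question, here is a quick sketch for checking the bound. The well-formed line and the helper name triple_starts are made up for illustration; only the Windows stats are taken from the traceback. Since the loop reads stats_split[i+2], the last valid start index is len(stats_split)-3, which range() reaches with an upper bound of len(stats_split)-2. With -3 the final triple of a well-formed line would be skipped.

```python
def triple_starts(tokens, offset):
    """Start indices of 'name = value' triples for a given bound offset."""
    return list(range(0, len(tokens) - offset, 3))


# Hypothetical well-formed stats (18 tokens); the values are invented.
good = "dS = 0.1 dN = 0.2 w = 2.0 S = 10.0 N = 20.0 (rho = 0.5)".split()

# Stats from the Windows traceback; "w =-1.#IND" has no space after "=",
# so the split yields 17 tokens instead of 18.
bad = ("dS = -1.#IND dN = -1.#IND w =-1.#IND "
       "S =   -1.$ N =   -1.$ (rho = -1.#IO)").split()

print(len(good), triple_starts(good, 2), triple_starts(good, 3))
print(len(bad), triple_starts(bad, 0), triple_starts(bad, 2))
print(bad[6:9])  # the triple starting at "w" is misaligned
```

Note that on the Windows line a tighter bound only avoids the IndexError: every triple after "w" is shifted by one token, so the values would still be paired with the wrong names.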

Regards,

Peter


From p.j.a.cock at googlemail.com  Wed Aug 17 13:02:24 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 17 Aug 2011 18:02:24 +0100
Subject: [Biopython-dev] PAML yn00 under Windows
In-Reply-To: <CAKVJ-_4Y+1pDQOg+43StxuomyrnUrNuhd0pVFEe7OFYXteGtKQ@mail.gmail.com>
References: <CAKVJ-_53J=c99VvZEEtcNEv+aBQ3MkxBF670099ZkFNuu34Pvg@mail.gmail.com>
	<CAKVJ-_4Y+1pDQOg+43StxuomyrnUrNuhd0pVFEe7OFYXteGtKQ@mail.gmail.com>
Message-ID: <CAKVJ-_7zz+VXWpnmrqv-JNt-odP-p7KeB-ZSud_NK_=ORwVs8A@mail.gmail.com>

Hi again,

You may have noticed from the buildbot emails that there is a
separate issue with the PAML tests on Python (2.4 and) 2.5,
applying to executing all three binaries tried: yn00, baseml
and codeml, e.g.

http://testing.open-bio.org:8010/builders/Windows%20XP%20-%20Python%202.4/builds/259/steps/shell/logs/stdio

======================================================================
ERROR: Test that the yn00 binary runs and generates correct output.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "c:\repositories\BuildBot\win24\build\Tests\test_PAML_tools.py", line 139, in testYn00Binary
    results = self.yn.run()
  File "c:\repositories\BuildBot\win24\build\build\lib.win32-2.4\Bio\Phylo\PAML\yn00.py", line 104, in run
    Paml.run(self, ctl_file, verbose, command)
  File "c:\repositories\BuildBot\win24\build\build\lib.win32-2.4\Bio\Phylo\PAML\_paml.py", line 148, in run
    raise EnvironmentError, "The %s process was killed." % command
EnvironmentError: The yn00 process was killed.

----------------------------------------------------------------------


I can reproduce this at the terminal window, and it is specific
to Python (2.4 and) 2.5, using Python 2.6 and 2.7 is fine, as
are Python 3.1 and 3.2.

Peter

From p.j.a.cock at googlemail.com  Wed Aug 17 13:56:28 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 17 Aug 2011 18:56:28 +0100
Subject: [Biopython-dev] PAML yn00 under Windows
In-Reply-To: <CAKVJ-_7zz+VXWpnmrqv-JNt-odP-p7KeB-ZSud_NK_=ORwVs8A@mail.gmail.com>
References: <CAKVJ-_53J=c99VvZEEtcNEv+aBQ3MkxBF670099ZkFNuu34Pvg@mail.gmail.com>
	<CAKVJ-_4Y+1pDQOg+43StxuomyrnUrNuhd0pVFEe7OFYXteGtKQ@mail.gmail.com>
	<CAKVJ-_7zz+VXWpnmrqv-JNt-odP-p7KeB-ZSud_NK_=ORwVs8A@mail.gmail.com>
Message-ID: <CAKVJ-_7DGxnyOCZc0Vx3_F8e4e4iF_=K-5NFiKsLGmJxLwdYGQ@mail.gmail.com>

On Wed, Aug 17, 2011 at 6:02 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> Hi again,
>
> You may have noticed from the buildbot emails that there is a
> separate issue with the PAML tests on Python (2.4 and) 2.5,
> applying to executing all three binaries tried: yn00, baseml
> and codeml, e.g.
> ...
> I can reproduce this at the terminal window, and it is specific
> to Python (2.4 and) 2.5, using Python 2.6 and 2.7 is fine, as
> are Python 3.1 and 3.2.

I'm getting -1 back from the subprocess.call(...)
https://github.com/biopython/biopython/commit/2d94a24ca223851d9fd895a82780dd0f23dc2dca

Some debugging later I realised the paths in the control file
were using Unix slashes rather than Windows slashes:
https://github.com/biopython/biopython/commit/4125e55b291922053380b5fe688bd687c70035fa
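
For anyone hitting the same thing, a minimal sketch of the separator issue, not necessarily what the commit above does. The helper name to_windows_path and the file name are invented for illustration. It uses ntpath and posixpath directly so the result is the same on any platform; real code would more likely call os.path.normpath or os.path.join, which pick the host's convention.

```python
import ntpath
import posixpath


def to_windows_path(path):
    """Return path with Windows separators (ntpath works on any platform)."""
    return ntpath.normpath(path)


# Hypothetical control file entry built with Unix-style separators.
unix_style = posixpath.join("Alignments", "aln.phylip")

print(unix_style)
print(to_windows_path(unix_style))
```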

That should now just leave the yn00 stats parsing for you
to check (which offset should the fix use, assuming that
is the right fix).

It was worth insisting on more tests and running them on Windows :)

Regards,

Peter

From b.invergo at gmail.com  Wed Aug 17 14:43:04 2011
From: b.invergo at gmail.com (Brandon Invergo)
Date: Wed, 17 Aug 2011 20:43:04 +0200
Subject: [Biopython-dev] PAML yn00 under Windows
In-Reply-To: <CAKVJ-_7DGxnyOCZc0Vx3_F8e4e4iF_=K-5NFiKsLGmJxLwdYGQ@mail.gmail.com>
References: <CAKVJ-_53J=c99VvZEEtcNEv+aBQ3MkxBF670099ZkFNuu34Pvg@mail.gmail.com>
	<CAKVJ-_4Y+1pDQOg+43StxuomyrnUrNuhd0pVFEe7OFYXteGtKQ@mail.gmail.com>
	<CAKVJ-_7zz+VXWpnmrqv-JNt-odP-p7KeB-ZSud_NK_=ORwVs8A@mail.gmail.com>
	<CAKVJ-_7DGxnyOCZc0Vx3_F8e4e4iF_=K-5NFiKsLGmJxLwdYGQ@mail.gmail.com>
Message-ID: <1313606586.3107.9.camel@localhost.localdomain>

Hi, 
Just got home and saw the emails. Yes, in the end it was good to do the
extra tests! So the path separator problem is solved, right?

That indexing is a weird one. I'll look at it now.

-brandon

On Wed, 2011-08-17 at 18:56 +0100, Peter Cock wrote:
> On Wed, Aug 17, 2011 at 6:02 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> > Hi again,
> >
> > You may have noticed from the buildbot emails that there is a
> > separate issue with the PAML tests on Python (2.4 and) 2.5,
> > applying to executing all three binaries tried: yn00, baseml
> > and codeml, e.g.
> > ...
> > I can reproduce this at the terminal window, and it is specific
> > to Python (2.4 and) 2.5, using Python 2.6 and 2.7 is fine, as
> > are Python 3.1 and 3.2.
> 
> I'm getting -1 back from the subprocess.call(...)
> https://github.com/biopython/biopython/commit/2d94a24ca223851d9fd895a82780dd0f23dc2dca
> 
> Some debugging later I realised the paths in the control file
> were using Unix slashes rather than Windows slashes:
> https://github.com/biopython/biopython/commit/4125e55b291922053380b5fe688bd687c70035fa
> 
> That should now just leave the yn00 stats parsing for you
> to check (which offset should the fix use, assuming that
> is the right fix).
> 
> It was worth insisting on more tests and running them on Windows :)
> 
> Regards,
> 
> Peter


From b.invergo at gmail.com  Wed Aug 17 17:28:32 2011
From: b.invergo at gmail.com (Brandon Invergo)
Date: Wed, 17 Aug 2011 23:28:32 +0200
Subject: [Biopython-dev] PAML yn00 under Windows
In-Reply-To: <CAKVJ-_4Y+1pDQOg+43StxuomyrnUrNuhd0pVFEe7OFYXteGtKQ@mail.gmail.com>
References: <CAKVJ-_53J=c99VvZEEtcNEv+aBQ3MkxBF670099ZkFNuu34Pvg@mail.gmail.com>
	<CAKVJ-_4Y+1pDQOg+43StxuomyrnUrNuhd0pVFEe7OFYXteGtKQ@mail.gmail.com>
Message-ID: <1313616514.3107.27.camel@localhost.localdomain>

Ok, I just sent a pull request. It turns out that either due to the way
C works in Windows or due to the way PAML was coded, what was a nice
"-nan" in Linux is printed as "-1.#IND" in Windows, which messed up
everything. Rather than parsing it in an algorithmic manner, I got angry
and threw some regex fu at it, which works a lot nicer than what I had
before.

Tested successfully in Linux and Windows 7, Python 2.7.2
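
To give an idea of the regex approach, here is a minimal sketch. It is not the code in the pull request: the pattern, the helper names, and the choice to map unparsable markers such as "-1.#IND" to NaN are all assumptions made for illustration. The Windows line is the one from Peter's traceback.

```python
import math
import re

# Matches "name = value" with optional spaces around "=" and optional
# surrounding parentheses, e.g. "w =-1.#IND" or "(rho = -1.#IO)".
PAIR = re.compile(r"\(?(\w+)\s*=\s*([^\s()]+)\)?")


def to_float(text):
    """Convert a value, treating unparsable markers such as -1.#IND as NaN."""
    try:
        return float(text)
    except ValueError:
        return float("nan")


def parse_stats(line):
    """Parse a 'METHOD: name = value ...' line into a dict of floats."""
    stats = {}
    for name, value in PAIR.findall(line.split(":", 1)[1]):
        if name == "w":
            name = "omega"
        stats[name] = to_float(value)
    return stats


windows_line = ("LWL85m: dS = -1.#IND dN = -1.#IND w =-1.#IND "
                "S =   -1.$ N =   -1.$ (rho = -1.#IO)\n")
stats = parse_stats(windows_line)
print(sorted(stats))
print(all(math.isnan(v) for v in stats.values()))
```

Because the pattern does not rely on whitespace-separated tokens, the missing space in "w =-1.#IND" no longer shifts the later fields.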

-brandon

On Wed, 2011-08-17 at 16:39 +0100, Peter Cock wrote:
> On Wed, Aug 17, 2011 at 4:28 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> > Hi Brandon,
> >
> > It looks like the stats line parsing in yn00 needs a little adjustment
> > for this platform,
> > ...
> >    value = stats_split[i+2].strip("()")
> > IndexError: list index out of range
> >
> >
> > ...
> >    raise ValueError("Problem with stats line: %r" % line)
> > ValueError: Problem with stats line: 'LWL85m: dS = -1.#IND dN =
> > -1.#IND w =-1.#IND S =   -1.$ N =   -1.$ (rho = -1.#IO)\n'
> 
> I think you need an adjustment to the bounds on i, given you want to use
> stats_split[i] and stats_split[i+2]. Not sure if you want a -3 or -2 on the
> upper bound...
> 
> C:\repositories\biopython\Tests>git diff
> diff --git a/Bio/Phylo/PAML/_parse_yn00.py b/Bio/Phylo/PAML/_parse_yn00.py
> index 221b6de..e4967fb 100644
> --- a/Bio/Phylo/PAML/_parse_yn00.py
> +++ b/Bio/Phylo/PAML/_parse_yn00.py
> @@ -103,7 +103,7 @@ def parse_others(lines, results, sequences):
>                  stats = {}
>                  line_stats = line.split(":")[1].strip()
>                  stats_split = line_stats.split()
> -                for i in range(0, len(stats_split), 3):
> +                for i in range(0, len(stats_split)-3, 3):
>                      stat = stats_split[i].strip("()")
>                      if stat == "w":
>                          stat = "omega"
> 
> 
> I don't know why this didn't come up under Linux, something subtle
> going on between the PAML versions maybe?
> 
> Regards,
> 
> Peter


From p.j.a.cock at googlemail.com  Wed Aug 17 17:43:13 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 17 Aug 2011 22:43:13 +0100
Subject: [Biopython-dev] PAML yn00 under Windows
In-Reply-To: <1313616514.3107.27.camel@localhost.localdomain>
References: <CAKVJ-_53J=c99VvZEEtcNEv+aBQ3MkxBF670099ZkFNuu34Pvg@mail.gmail.com>
	<CAKVJ-_4Y+1pDQOg+43StxuomyrnUrNuhd0pVFEe7OFYXteGtKQ@mail.gmail.com>
	<1313616514.3107.27.camel@localhost.localdomain>
Message-ID: <CAKVJ-_4PLb42auxnmuKYZPdFhhg66c+kWRNpiFEF9Rd4hsngxQ@mail.gmail.com>

On Wed, Aug 17, 2011 at 10:28 PM, Brandon Invergo <b.invergo at gmail.com> wrote:
> Ok, I just sent a pull request. It turns out that either due to the way
> C works in Windows or due to the way PAML was coded, what was a nice
> "-nan" in Linux is printed as "-1.#IND" in Windows, which messed up
> everything.

That sounds like the C float libraries, the oddities of which are
something which later versions of Python have done a better
and better job of hiding from us ;)

> Rather than parsing it in an algorithmic manner, I got angry
> and threw some regex fu at it, which works a lot nicer than what
> I had before.
>
> Tested successfully in Linux and Windows 7, Python 2.7.2
>
> -brandon

Sounds good - I'll have a look on github (possibly tomorrow),

Peter

From p.j.a.cock at googlemail.com  Thu Aug 18 12:10:15 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 18 Aug 2011 17:10:15 +0100
Subject: [Biopython-dev] Commit freeze for release 1.58
Message-ID: <CAKVJ-_6An1D82CAdD0tN6BzbMTSqhhRx4ES4Z+38aT82+_8tow@mail.gmail.com>

Hi all,

Unless anyone objects I propose to do the Biopython 1.58
release in the next hour. If this runs into any issues, it will
have to wait until I'm back at work in two weeks' time, or
someone else (with access to a Windows 32 bit machine
with all the compilers set up) can tackle it instead.

I will be active online next week however - and coding -
but on Japan time: http://2011.biohackathon.org/

I'm assuming the NEWS file is up to date, and will as
usual be basing the release notice on that. If there is
anything missing, please reply by email.

Thank you all,

Peter

From p.j.a.cock at googlemail.com  Thu Aug 18 13:19:32 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 18 Aug 2011 18:19:32 +0100
Subject: [Biopython-dev] Commit freeze for release 1.58
In-Reply-To: <CAKVJ-_6An1D82CAdD0tN6BzbMTSqhhRx4ES4Z+38aT82+_8tow@mail.gmail.com>
References: <CAKVJ-_6An1D82CAdD0tN6BzbMTSqhhRx4ES4Z+38aT82+_8tow@mail.gmail.com>
Message-ID: <CAKVJ-_6F47YomzP+YVu69=6AA=MMuuFjhpnB0+yuNvmgpVenGA@mail.gmail.com>

On Thu, Aug 18, 2011 at 5:10 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> Hi all,
>
> Unless anyone objects I propose to do the Biopython 1.58
> release in the next hour. If this runs into any issues, it will
> have to wait until I'm back at work in two weeks' time, or
> someone else (with access to a Windows 32 bit machine
> with all the compilers set up) can tackle it instead.
>
> I will be active online next week however - and coding -
> but on Japan time: http://2011.biohackathon.org/
>
> I'm assuming the NEWS file is up to date, and will as
> usual be basing the release notice on that. If there is
> anything missing, please reply by email.
>
> Thank you all,
>
> Peter
>

Ok, that's done. And in news that will no doubt please
some of you, I've finally given up on keeping Python 2.4
support going. Feel free to start cleaning up some of the
nastier hacks (like the ElementTree imports).

Peter

From p.j.a.cock at googlemail.com  Thu Aug 18 15:32:57 2011
From: p.j.a.cock at googlemail.com (Peter)
Date: Thu, 18 Aug 2011 20:32:57 +0100
Subject: [Biopython-dev] Biopython 1.58 released
Message-ID: <75327C54-CF88-43BC-BACF-87139456FE67@googlemail.com>

Dear All,

Biopython 1.58 is out:
http://news.open-bio.org/news/2011/08/biopython-1-58-released/

Thank you to everyone who has contributed.

Peter

P.S. We're on Twitter as @Biopython


From updates at feedmyinbox.com  Sun Aug 21 03:49:13 2011
From: updates at feedmyinbox.com (Feed My Inbox)
Date: Sun, 21 Aug 2011 03:49:13 -0400
Subject: [Biopython-dev] 8/21 newest questions tagged biopython - Stack
	Overflow
Message-ID: <0adf58b4241f2a58161d1a41524288d1@74.63.51.88>

// A PWM with gapped alignments in Biopython
// August 9, 2011 at 11:28 AM

http://stackoverflow.com/questions/6998727/a-pwm-with-gapped-alignments-in-biopython
I'm trying to generate a Position-Weighted Matrix (PWM) in Biopython from Clustalw multiple sequence alignments.  I get a "Wrong Alphabet" error every time I do it with gapped alignments.  From reading the documentation, I think I need to utilize the Gapped Alphabet to deal with the '-' character in gapped alignments.  But when I do this, it still doesn't resolve the error.  Does anyone see the problem with this code, or have a better way to generate a PWM from gapped Clustal alignments?

from Bio.Alphabet import Gapped
alignment = AlignIO.read("filename.clustalw", "clustal", alphabet=Gapped)
m = Motif.Motif()
for a in alignment:
    m.add_instance(a.seq)
m.pwm()


--
Website: http://stackoverflow.com/questions/tagged/?tagnames=biopython&sort=newest

Account Login: 
https://www.feedmyinbox.com/members/login/?utm_source=fmi&utm_medium=email&utm_campaign=feed-email

Unsubscribe here: 
http://www.feedmyinbox.com/feeds/unsubscribe/837947/00ae8e456ba91bb32a32b795eb392f971eee04e9/?utm_source=fmi&utm_medium=email&utm_campaign=feed-email

--
This email was carefully delivered by FeedMyInbox.com. 
PO Box 682532 Franklin, TN 37068


From updates at feedmyinbox.com  Sun Aug 21 03:48:37 2011
From: updates at feedmyinbox.com (Feed My Inbox)
Date: Sun, 21 Aug 2011 03:48:37 -0400
Subject: [Biopython-dev] 8/21 biopython Questions - BioStar
Message-ID: <44c53445166933a51ab21f5d53e72577@74.63.51.88>

// Error using Entrez.esummary from biopython
// August 16, 2011 at 8:47 AM

http://biostar.stackexchange.com/questions/11232/error-using-entrez-esummary-from-biopython
Can someone please explain this error?

I have a small script that tries to fetch information from an NCBI BioAssay using the Entrez module from Biopython. I get an error I do not understand. I try to run:

from Bio import Entrez
Entrez.email="yourname at mail.se"

handle_esummary=Entrez.esummary(db='pcassay',id='1337')
record_esummary=Entrez.read(handle_esummary)


I get the error:

File "smaltest.py", line 5, in <module>
    record_esummary=Entrez.read(handle_esummary)
  File "/usr/common/schrodinger/mmshare-v20109/lib/Linux-x86_64/lib/python2.7/site-packages/Bio/Entrez/__init__.py", line 297, in read
    record = handler.run(handle)
  File "/usr/common/schrodinger/mmshare-v20109/lib/Linux-x86_64/lib/python2.7/site-packages/Bio/Entrez/Parser.py", line 90, in run
    self.parser.ParseFile(handle)
  File "/usr/common/schrodinger/mmshare-v20109/lib/Linux-x86_64/lib/python2.7/site-packages/Bio/Entrez/Parser.py", line 105, in startElement
    itemtype = str(attrs["Type"]) # convert from Unicode
KeyError: 'Type'


// Import fasta sequences to a motif
// August 15, 2011 at 11:54 AM

http://biostar.stackexchange.com/questions/11204/import-fasta-sequences-to-a-motif
I need to construct a PWM from every sequence in a fasta file, using biopython.  The way I'm trying to do this is to import each line of sequence into a motif, then run a PWM on each instance of the motif.  Currently, I'm trying it this way, but different variations of it have generated their fair share of errors, mostly "Wrong Alphabet" and "NoneType object is not iterable":

alphabet = IUPAC.unambiguous_dna
m = Motif.Motif(alphabet)

for seq_record in SeqIO.parse("10fasta.fasta", "fasta"):
    m.add_instance(seq_record.seq)
    print m1.pwm()


Does anyone see what's wrong with the way I'm adding instances to the motif?  Of course, if there's a better way to do this that I'm completely missing, feel free to comment on that too.


// A PWM with gapped alignments in Biopython
// August 9, 2011 at 1:47 PM

http://biostar.stackexchange.com/questions/11070/a-pwm-with-gapped-alignments-in-biopython
I'm trying to generate a Position-Weighted Matrix (PWM) in Biopython from Clustalw multiple sequence alignments. I get a "Wrong Alphabet" error every time I do it with gapped alignments. From reading the documentation, I think I need to utilize the Gapped Alphabet to deal with the '-' character in gapped alignments. But when I do this, it still doesn't resolve the error. Does anyone see the problem with this code, or have a better way to generate a PWM from gapped Clustal alignments?

from Bio.Alphabet import Gapped
alignment = AlignIO.read("filename.clustalw", "clustal", alphabet=Gapped)
m = Motif.Motif()
for a in alignment:
    m.add_instance(a.seq)
m.pwm()


--
Website: http://biostar.stackexchange.com/questions/tagged/biopython



From p.j.a.cock at googlemail.com  Mon Aug 22 02:53:17 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 22 Aug 2011 07:53:17 +0100
Subject: [Biopython-dev] Type vs type in Entrez.esummary XML (via BioStar)
Message-ID: <CAKVJ-_67YzQCHrv2ZZyrbbtMiqu9xtJWZLsZPdXYSnTP4Ti9Wg@mail.gmail.com>

Hi all,

On Sun, Aug 21, 2011 at 8:48 AM, Feed My Inbox <updates at feedmyinbox.com> wrote:
> // Error using Entrez.esummary from biopython
> // August 16, 2011 at 8:47 AM
>
> http://biostar.stackexchange.com/questions/11232/error-using-entrez-esummary-from-biopython
> Can someone please explain this error?
>
> I have a small script that tries to fetch information from an
> NCBI BioAssay using the Entrez module from Biopython. I get
> an error I do not understand. I try to run:
>
> from Bio import Entrez
> Entrez.email="yourname at mail.se"
>
> handle_esummary=Entrez.esummary(db='pcassay',id='1337')
> record_esummary=Entrez.read(handle_esummary)
>
>
> I get the error:
>
> File "smaltest.py", line 5, in <module>
>    record_esummary=Entrez.read(handle_esummary)
>  File "/usr/common/schrodinger/mmshare-v20109/lib/Linux-x86_64/lib/python2.7/site-packages/Bio/Entrez/__init__.py", line 297, in read
>    record = handler.run(handle)
>  File "/usr/common/schrodinger/mmshare-v20109/lib/Linux-x86_64/lib/python2.7/site-packages/Bio/Entrez/Parser.py", line 90, in run
>    self.parser.ParseFile(handle)
>  File "/usr/common/schrodinger/mmshare-v20109/lib/Linux-x86_64/lib/python2.7/site-packages/Bio/Entrez/Parser.py", line 105, in startElement
>    itemtype = str(attrs["Type"]) # convert from Unicode
> KeyError: 'Type'
>

I can reproduce this, and the cause is the NCBI using
lowercase in one tag's attribute:

<Item Name="SourceNameList" type="List">

We're expecting the attributes to be Name and Type, and
that is the case for all the other <Item> tags in this file.

Michiel - do you think we should just add a fallback for
type if we get a KeyError on Type? Do you think we should
report this inconsistency/bug to the NCBI?
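
A minimal sketch of what such a fallback could look like is below.
The helper name get_item_type is hypothetical and this is not the
code in Bio/Entrez/Parser.py; it only assumes that attrs behaves
like the dictionary of attributes passed to startElement.

```python
def get_item_type(attrs):
    """Return the <Item> type, tolerating a lowercase 'type' attribute.

    Hypothetical helper: tries the documented 'Type' key first and only
    falls back to 'type' if the expected key is missing.
    """
    try:
        return str(attrs["Type"])
    except KeyError:
        # Fallback for the non-conforming tag seen in the pcassay output.
        return str(attrs["type"])


# The problematic tag from the esummary output above:
print(get_item_type({"Name": "SourceNameList", "type": "List"}))
```

If neither attribute is present the KeyError still propagates, so a
genuinely malformed <Item> is not silently accepted.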

Peter


From p.j.a.cock at googlemail.com  Mon Aug 22 03:03:30 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 22 Aug 2011 08:03:30 +0100
Subject: [Biopython-dev] Type vs type in Entrez.esummary XML (via
	BioStar)
In-Reply-To: <CAKVJ-_67YzQCHrv2ZZyrbbtMiqu9xtJWZLsZPdXYSnTP4Ti9Wg@mail.gmail.com>
References: <CAKVJ-_67YzQCHrv2ZZyrbbtMiqu9xtJWZLsZPdXYSnTP4Ti9Wg@mail.gmail.com>
Message-ID: <CAKVJ-_783C_9JjJGar4stEu__c9v9Fk3gW=7sTf5b_VmJN-QUA@mail.gmail.com>

On Mon, Aug 22, 2011 at 7:53 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> Hi all,
>
> On Sun, Aug 21, 2011 at 8:48 AM, Feed My Inbox <updates at feedmyinbox.com> wrote:
>> // Error using Entrez.esummary from biopython
>> // August 16, 2011 at 8:47 AM
>>
>> http://biostar.stackexchange.com/questions/11232/error-using-entrez-esummary-from-biopython
>> Can someone please explain this error?
>>
>> I have a small script that tries to fetch information from an
>> NCBI BioAssay using the Entrez module from Biopython. I get
>> an error I do not understand. I try to run:
>>
>> from Bio import Entrez
>> Entrez.email="yourname at mail.se"
>>
>> handle_esummary=Entrez.esummary(db='pcassay',id='1337')
>> record_esummary=Entrez.read(handle_esummary)
>>
>>
>> I get the error:
>>
>>   File "smaltest.py", line 5, in <module>
>>     record_esummary=Entrez.read(handle_esummary)
>>   File "/usr/common/schrodinger/mmshare-v20109/lib/Linux-x86_64/lib/python2.7/site-packages/Bio/Entrez/__init__.py", line 297, in read
>>     record = handler.run(handle)
>>   File "/usr/common/schrodinger/mmshare-v20109/lib/Linux-x86_64/lib/python2.7/site-packages/Bio/Entrez/Parser.py", line 90, in run
>>     self.parser.ParseFile(handle)
>>   File "/usr/common/schrodinger/mmshare-v20109/lib/Linux-x86_64/lib/python2.7/site-packages/Bio/Entrez/Parser.py", line 105, in startElement
>>     itemtype = str(attrs["Type"]) # convert from Unicode
>> KeyError: 'Type'
>>
>
> I can reproduce this, and the cause is the NCBI using
> lowercase in one tag's attribute:
>
> <Item Name="SourceNameList" type="List">
>
> We're expecting the attributes to be Name and Type, and
> that is the case for all the other <Item> tags in this file.
>
> Michiel - do you think we should just add a fallback for
> type if we get a KeyError on Type? Do you think we should
> report this inconsistency/bug to the NCBI?

Actually it violates the DTD, and thus fails XML
validation - so it is clearly an NCBI bug.

Peter


From chapmanb at 50mail.com  Tue Aug 23 15:31:34 2011
From: chapmanb at 50mail.com (Brad Chapman)
Date: Tue, 23 Aug 2011 15:31:34 -0400
Subject: [Biopython-dev] Brad's GFF parser in a Biopython repository
In-Reply-To: <CAKVJ-_738+2QeQgsTb14yWjYMKoWtNVxxgMUMFapvZNMyCSc6g@mail.gmail.com>
References: <CAKVJ-_738+2QeQgsTb14yWjYMKoWtNVxxgMUMFapvZNMyCSc6g@mail.gmail.com>
Message-ID: <20110823193134.GB507@kunkel>

Peter;
Awesome, thanks for doing this. I didn't even realize there was a
git solution that could transfer histories across repositories like
this; how did you do it?

Everything looks great on a first pass. Do you think some of the
scripts would also be useful to include in the script directory?
They handle some of the common cases people have asked about;
'access_gff_index.py' uses bx-python so might be excluded, but the
others are Biopython specific.

Thanks again,
Brad

> I managed to do a git script to select out the GFF code and tests from
> your bcbb repository and get it into the Biopython source tree. The
> folder changes made it interesting ;)
> 
> Input: https://github.com/chapmanb/bcbb (master branch)
> 
> Output: https://github.com/peterjc/biopython/tree/brad_gff
> 
> The tests pass, but that is as far as I have got with this. Brad,
> could you have a look at this new branch for sanity checking please?
> 
> Peter

From p.j.a.cock at googlemail.com  Tue Aug 23 22:33:21 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 24 Aug 2011 03:33:21 +0100
Subject: [Biopython-dev] Brad's GFF parser in a Biopython repository
In-Reply-To: <20110823193134.GB507@kunkel>
References: <CAKVJ-_738+2QeQgsTb14yWjYMKoWtNVxxgMUMFapvZNMyCSc6g@mail.gmail.com>
	<20110823193134.GB507@kunkel>
Message-ID: <CAKVJ-_6HZO0qoUUQBbMJ-oRPdK50hj_=3bAZy6qefKAOO30+uw@mail.gmail.com>

On Tue, Aug 23, 2011 at 8:31 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
> Peter;
> Awesome, thanks for doing this. I didn't even realize there was a
> git solution that could transfer histories across repositories like
> this; how did you do it?

Well, it wasn't an off the shelf solution, it was a hack.

See https://gist.github.com/1167169
and https://github.com/gitpython-developers/GitPython

I used the Python library (import git) to query the source
repository, basically doing "git log -- gff/BCBio gff/Tests"
to find only the commits of interest, then "git show XXX"
to extract the diff which I then had to modify to change
the paths, then a system call to patch to apply each
patch to the destination repository, git add, git commit.
Note for git commit you can specify the message via
a file (-F) so I could preserve the original long message,
plus you can preserve the authored date (--date) and
the author too.
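
The steps above can be sketched roughly as follows. This is not the
script from the gist: it calls git through subprocess rather than
GitPython, the function names are hypothetical, and the destination
paths in the mapping are assumptions for illustration. It also
ignores merge commits, renames and binary files.

```python
import subprocess


def git(repo, *args, **kwargs):
    """Run a git command inside repo and return its output as text."""
    return subprocess.run(
        ("git",) + args, cwd=repo, check=True,
        stdout=subprocess.PIPE, universal_newlines=True, **kwargs
    ).stdout


def commits_touching(repo, paths):
    """Return hashes, oldest first, of the commits that touch paths."""
    return git(repo, "log", "--reverse", "--format=%H", "--", *paths).split()


def rewrite_paths(diff_text, mapping):
    """Rewrite path prefixes, but only on the header lines of a diff."""
    headers = ("diff --git ", "--- ", "+++ ")
    lines = []
    for line in diff_text.splitlines(True):
        if line.startswith(headers):
            for old, new in mapping.items():
                line = line.replace(old, new)
        lines.append(line)
    return "".join(lines)


def transplant(src, dst, commit, mapping):
    """Replay one commit from src in dst, keeping author, date and message."""
    author, date, message = git(
        src, "show", "-s", "--format=%an <%ae>%n%aD%n%B", commit
    ).split("\n", 2)
    diff = rewrite_paths(git(src, "show", "--format=", commit), mapping)
    # Apply the modified diff with the system patch tool.
    subprocess.run(["patch", "-p1"], cwd=dst, input=diff,
                   universal_newlines=True, check=True)
    git(dst, "add", "--all")
    # "-F -" reads the commit message from standard input.
    git(dst, "commit", "--author", author, "--date", date, "-F", "-",
        input=message)


# Hypothetical usage; the destination folders are assumptions:
# mapping = {"gff/BCBio/": "BCBio/", "gff/Tests/": "Tests/"}
# for commit in commits_touching("bcbb", ["gff/BCBio", "gff/Tests"]):
#     transplant("bcbb", "biopython", commit, mapping)
```

Restricting the rewrite to header lines avoids touching file content
that happens to mention the old paths, although a removed line that
itself starts with "-- " would still be misread as a header.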

There were several steps where I couldn't work out
how you were meant to do something via the git
wrapper's API (e.g. get a diff as a patch), but it also
lets you easily call git commands directly which was
easier for me.

Bit hacky but seemed to get the job done.

> Everything looks great on a first pass. Do you think some of the
> scripts would also be useful to include in the script directory?
> They handle some of the common cases people have asked about;
> 'access_gff_index.py' uses bx-python so might be excluded, but the
> others are Biopython specific.
>
> Thanks again,
> Brad

Good point - that could be mapped to the Biopython
scripts folder. I'll take a look.

Peter

From updates at feedmyinbox.com  Thu Aug 25 03:48:40 2011
From: updates at feedmyinbox.com (Feed My Inbox)
Date: Thu, 25 Aug 2011 03:48:40 -0400
Subject: [Biopython-dev] 8/25 biopython Questions - BioStar
Message-ID: <738da676fc97903dba65147015733dc5@74.63.51.88>

// How to fetch genomics sequence using coordinates in BioPython
// August 24, 2011 at 10:56 PM

http://biostar.stackexchange.com/questions/11454/how-to-fetch-genomics-sequence-using-coordinates-in-biopython
Hi everyone,

I'm a newbie of biopython. My question may be stupid but please help.
I want to use (chromosome number, start position, end position, strand) to fetch the corresponding sequence in mouse genome.
How can this be done with biopython connecting to NCBI database?
Could anyone help me please?

Thanks a lot.


--
Website: http://biostar.stackexchange.com/questions/tagged/biopython



From p.j.a.cock at googlemail.com  Fri Aug 26 03:44:32 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 26 Aug 2011 08:44:32 +0100
Subject: [Biopython-dev] Biopython under Python from Cygwin on Windows?
Message-ID: <CAKVJ-_7aDUyv+ruVdQmkZ4yVgjc+hhKaLUNi7-+pGb2hqZwPPg@mail.gmail.com>

Hi all,

I was just wondering if anyone has tried this recently
(Biopython under Cygwin), and if it would be worth
adding as another platform for the buildbot. There
are likely enough differences from Linux to cause
potential cross platform issues - especially for calling
external tools...

Regards,

Peter

From updates at feedmyinbox.com  Fri Aug 26 04:05:18 2011
From: updates at feedmyinbox.com (Feed My Inbox)
Date: Fri, 26 Aug 2011 04:05:18 -0400
Subject: [Biopython-dev] 8/26 newest questions tagged biopython - Stack
	Overflow
Message-ID: <d193273feedfd7bf650264a3d6525a5a@74.63.51.88>

// How do I set the PYTHONPATH on Cygwin?
// August 25, 2011 at 9:16 PM

http://stackoverflow.com/questions/7199082/how-do-i-set-the-pythonpath-on-cygwin
In the Biopython installation instructions, it says that if Biopython doesn't work I'm supposed to do this:

export PYTHONPATH = $PYTHONPATH':/directory/where/you/put/Biopython'

I tried doing that in Cygwin from the ~ directory using the name of the Biopython directory (or everything of it past the ~ directory), but when I tested it by going into the Python interpreter and typing in


    from Bio.Seq import Seq

It said the module doesn't exist.

How do I make it so that I don't have to be in the Biopython directory to be able to import Seq?


--
Website: http://stackoverflow.com/questions/tagged/?tagnames=biopython&sort=newest



From clements at galaxyproject.org  Mon Aug 29 17:29:28 2011
From: clements at galaxyproject.org (Dave Clements)
Date: Mon, 29 Aug 2011 14:29:28 -0700
Subject: [Biopython-dev] Galaxy is Hiring
In-Reply-To: <CA+He-X8EiBEMsx+AwvXvM+H0XfRDv=MONLoQ9w+Y6HLHd_JmLw@mail.gmail.com>
References: <CA+He-X8EiBEMsx+AwvXvM+H0XfRDv=MONLoQ9w+Y6HLHd_JmLw@mail.gmail.com>
Message-ID: <CA+He-X-xHsgmr4HRW_xKxwEtKKJ9if9LvVPgjStC-LGf0vnG9Q@mail.gmail.com>

Hello all

The Galaxy Project is growing and has open positions in both the Penn State
and Emory groups (http://wiki.g2.bx.psu.edu/News/Galaxy%20is%20Hiring).

*Penn State: System administrators/analysts*

The Nekrutenko Lab <http://www.bx.psu.edu/%7Eanton/> at the Huck Institutes
of Life Sciences <http://www.huck.psu.edu/> at Penn State <http://psu.edu/>
is currently recruiting system analysts/administrators with experience in
building and maintaining complex performance compute environments. The areas
of immediate need include:

   - Storage balancing and tiered storage
   - Virtualization
   - Schedulers
   - Deployment of Galaxy instances and dependency management
   - Relational databases and query optimization
   - User management

A minimum of 5 years of experience with UNIX/Linux system administration is
required. Applicants should submit a CV and list of references to
jobs at galaxyproject.org.

*Emory: Software Engineers and Post-Docs* <http://bx.mathcs.emory.edu/joining/>

The Taylor Lab <http://bx.mathcs.emory.edu/> in the Biology
<http://www.biology.emory.edu/> and Mathematics & Computer Science
<http://www.mathcs.emory.edu/> departments at Emory University
<http://emory.edu/> is looking for software engineers
<http://bx.mathcs.emory.edu/joining/sw/> and postdoctoral scholars
<http://bx.mathcs.emory.edu/joining/postdocs/> to work on the Galaxy project.

We are seeking software engineers <http://bx.mathcs.emory.edu/joining/sw/>
with expertise in distributed computing and systems programming, web-based
visualization and visual analytics, informatics and data analysis and
integration, and bioinformatics application areas such as re-sequencing, de
novo assembly, metagenomics, transcriptome analysis and epigenetics. These
are full time positions located in Atlanta, GA. See the official posting
(http://bx.mathcs.emory.edu/joining/sw/) for full details.

Postdoctoral applicants <http://bx.mathcs.emory.edu/joining/postdocs/>
should have expertise in Bioinformatics and Computational Biology and
research interests that complement but extend the lab's current interests
<http://bx.mathcs.emory.edu/research/>: the Galaxy project; distributed and
high-performance computing for data intensive science; vertebrate functional
genomics; and genomics and epigenomic mechanisms of gene regulation, the
role of transcription factors and chromatin structure in global gene
expression, development, and differentiation. See the announcement
(http://bx.mathcs.emory.edu/joining/postdocs/) for full details.


If any of these openings describe you then please consider applying.

Thanks,

Dave C.


-- 
http://galaxyproject.org/
http://getgalaxy.org/
http://usegalaxy.org/
http://galaxyproject.org/wiki/

From redmine at redmine.open-bio.org  Mon Aug  1 05:24:51 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Mon, 1 Aug 2011 05:24:51 +0000
Subject: [Biopython-dev] [Biopython - Feature #3271] Updates to PDBList.py-
	downloading PDB structures
References: <redmine.issue-3271.20110726201643@redmine.open-bio.org>
Message-ID: <redmine.journal-14650.20110801052451@redmine.open-bio.org>


Issue #3271 has been updated by David Cain.


Hi, Eric. I'm glad you like my changes, and I appreciate your feedback. I made some changes in line with your suggestions and submitted my branch as a pull request.

Thank you again for the response.
----------------------------------------
Feature #3271: Updates to PDBList.py- downloading PDB structures
https://redmine.open-bio.org/issues/3271

Author: David Cain
Status: New
Priority: Normal
Assignee: Biopython Dev Mailing List
Category: 
Target version: 1.57
URL: https://github.com/DavidCain/biopython


PDBList.py is somewhat out of date: it has support for .Z compression, but the ftp://ftp.wwpdb.org/ server only has .gz archives. It also relies on a system utility to decompress the downloaded archives. The default, gunzip, is effective enough for posix systems, but Windows requires the installation of a command line tool, such as 7zip. I've rewritten it to use the gzip module, and to ignore the compression parameter (as all files are .gz anyway). I left the 'uncompress' and 'compression' parameters for backwards compatibility. I've also made it so that the user can override and use a system decompression tool if desired. I'm not sure if this is the best way to handle it, as the retrieve_pdb_file() function would work just fine removing support for system decompression and the 'compression' parameter.

Also, when calling retrieve_pdb_file() repeatedly, urllib can generate too many FTP connections and crash (for example) a script attempting to download some structures in succession. Updating to urllib2 removes this issue.

My GitHub branch is linked, and the only file I've modified (PDBList.py) is attached.
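
For illustration, decompressing a downloaded archive with the standard
gzip module instead of an external gunzip could look roughly like the
sketch below. The function name and file names are hypothetical, and
this is not the code from the attached PDBList.py.

```python
import gzip
import shutil


def gunzip_file(gz_path, out_path):
    """Decompress gz_path to out_path without any external tool.

    Hypothetical helper: streams the data in chunks, so large
    structure files are never held in memory all at once.
    """
    with gzip.open(gz_path, "rb") as source:
        with open(out_path, "wb") as target:
            shutil.copyfileobj(source, target)


# Hypothetical usage:
# gunzip_file("pdb1abc.ent.gz", "pdb1abc.ent")
```

Because only the standard library is involved, the same call should
behave identically on POSIX systems and on Windows.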


-- 
You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org


From redmine at redmine.open-bio.org  Mon Aug  1 14:57:06 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Mon, 1 Aug 2011 14:57:06 +0000
Subject: [Biopython-dev] [Biopython - Feature #3271] (Closed) Updates to
	PDBList.py- downloading PDB structures
References: <redmine.issue-3271.20110726201643@redmine.open-bio.org>
Message-ID: <redmine.journal-14652.20110801145706@redmine.open-bio.org>


Issue #3271 has been updated by Eric Talevich.

Status changed from New to Closed
% Done changed from 0 to 100

Merged it:
https://github.com/biopython/biopython/pull/14

I think we could do more work on the docstrings and comments, generally, but it's out of the scope of this bug.

Thanks again!
----------------------------------------
Feature #3271: Updates to PDBList.py- downloading PDB structures
https://redmine.open-bio.org/issues/3271

Author: David Cain
Status: Closed
Priority: Normal
Assignee: Biopython Dev Mailing List
Category: 
Target version: 1.57
URL: https://github.com/DavidCain/biopython


PDBList.py is somewhat out of date: it has support for .Z compression, but the ftp://ftp.wwpdb.org/ server only has .gz archives. It also relies on a system utility to decompress the downloaded archives. The default, gunzip, is effective enough for posix systems, but Windows requires the installation of a command line tool, such as 7zip. I've rewritten it to use the gzip module, and to ignore the compression parameter (as all files are .gz anyway). I left the 'uncompress' and 'compression' parameters for backwards compatibility. I've also made it so that the user can override and use a system decompression tool if desired. I'm not sure if this is the best way to handle it, as the retrieve_pdb_file() function would work just fine removing support for system decompression and the 'compression' parameter.

Also, when calling retrieve_pdb_file() repeatedly, urllib can generate too many FTP connections and crash (for example) a script attempting to download some structures in succession. Updating to urllib2 removes this issue.

My GitHub branch is linked, and the only file I've modified (PDBList.py) is attached.


-- 
You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org


From p.j.a.cock at googlemail.com  Tue Aug  2 16:43:30 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 2 Aug 2011 17:43:30 +0100
Subject: [Biopython-dev] Leaked handles in PAML unit tests
Message-ID: <CAKVJ-_7L3fCbEPj0iGUzy_1sq4x3Y3Y7-k7F9+qhhWkcbnHFFg@mail.gmail.com>

Hi Brandon,

Would you be able to look at these handle leaks in the PAML unit tests
some time?

test_PAML_baseml ... /Users/pjcock/lib/python3.2/unittest/case.py:574:
ResourceWarning: unclosed file <_io.TextIOWrapper name='PAML/bad1.ctl'
mode='r' encoding='UTF-8'>
  callableObj(*args, **kwargs)
/Users/pjcock/lib/python3.2/unittest/case.py:574: ResourceWarning:
unclosed file <_io.TextIOWrapper name='PAML/bad2.ctl' mode='r'
encoding='UTF-8'>
  callableObj(*args, **kwargs)
/Users/pjcock/lib/python3.2/unittest/case.py:574: ResourceWarning:
unclosed file <_io.TextIOWrapper name='/dev/null' mode='w'
encoding='UTF-8'>
  callableObj(*args, **kwargs)
ok
test_PAML_codeml ... ok
test_PAML_yn00 ... /Users/pjcock/lib/python3.2/unittest/case.py:574:
ResourceWarning: unclosed file <_io.TextIOWrapper name='PAML/bad3.ctl'
mode='r' encoding='UTF-8'>
  callableObj(*args, **kwargs)
ok

This warning is new under Python 3.2, but this kind of code can and
has caused bugs on Windows (can't delete files if there is an open
handle) and Jython (different garbage collection, so implicit handle
closing is stochastic). See also:

http://bugs.python.org/issue10093

Note there are other cases of this, some in PopGen (which may
explain a periodic failure under Jython), and in test_SCOP_Astral.py
(where the object design makes this difficult to avoid IIRC), etc.
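
As a generic illustration of the fix (not the actual PAML code; the
function name is hypothetical), closing the handle explicitly with a
with statement avoids relying on garbage collection, even when an
exception is raised part way through:

```python
def read_ctl_lines(path):
    """Return the non-blank lines of a control file.

    The with block closes the handle as soon as the block exits,
    whether it ends normally or through an exception, so no open
    handle is left behind to block deletion on Windows.
    """
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]
```

In a unit test that expects a parsing error, the same pattern means
the file is already closed by the time the exception reaches
assertRaises.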

Peter


From p.j.a.cock at googlemail.com  Tue Aug  2 16:47:20 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 2 Aug 2011 17:47:20 +0100
Subject: [Biopython-dev] SeqIO Abi Parser
In-Reply-To: <CADEGkF6GDqgK9SOmUe_fQsjv9RU_dZwG_E0QnfD7nAKXUOD8yg@mail.gmail.com>
References: <CADEGkF7qC4Q_KsN-bjOWTyXOu6vw4dzva02H0LWm+kk3xXX6og@mail.gmail.com>
	<CADEGkF5HBL2nMTN=kNhS9Hs_8UiWHs3CXw36=yHUiaVgQwQ4hg@mail.gmail.com>
	<CAKVJ-_7HOhEAn+XwQes+2h=7CwFR10FtefOHZqXMU5xo97D0tA@mail.gmail.com>
	<CADEGkF6E=PDCbqnJ60qsYpt231CrS6S1EWghx7tdFZ_gwAi7QQ@mail.gmail.com>
	<CAKVJ-_6LiJ-j5md_+Fw6KyJoaFqCeRSGPwwbcFaO=PJWNdqNmg@mail.gmail.com>
	<CADEGkF7imzYii6qq7D=3r68uAumu_ZzXaQpC9eyg_8prYcDoHw@mail.gmail.com>
	<CAKVJ-_5WAL+XbC=br4fp5gRnd1DCqoPbidqtZeHi6sM0JS+KXQ@mail.gmail.com>
	<CADEGkF662Xs0cmFEa7We+mVyPB5VQkKAO2yYpERBbRJ0ZYQj4Q@mail.gmail.com>
	<CAKVJ-_7xNgb5xUX6VTZcWoakkUXOkUKv93KOs5NYYEOCTQgkuQ@mail.gmail.com>
	<CAKVJ-_4FpFYyVgOL3Rod1QR1DhuFk3rn6yqWKxUq2+8+OBnpJw@mail.gmail.com>
	<CADEGkF6GDqgK9SOmUe_fQsjv9RU_dZwG_E0QnfD7nAKXUOD8yg@mail.gmail.com>
Message-ID: <CAKVJ-_6LT+VJStu_sN5mfUbGTuTChMffxOw0PsJ6hTSq-fgeqA@mail.gmail.com>

On Sat, Jul 30, 2011 at 8:42 AM, Wibowo Arindrarto
<w.arindrarto at gmail.com> wrote:
> Hi Peter,
> I've done some more improvements to the code:
> - I've written the check and unittest for the file handle mode. I've set it
> so that abi file has to be opened in 'rb' mode, otherwise it'll return an
> error. While it's ok to open in 'r' mode in python 2 in Linux, it has to be
> specified as 'rb' in Windows and/or Python 3 for the file to be read
> correctly. So I decided forcing it to 'rb' is the best. Because of this, I
> changed 'test_SeqIO.py:503' to include the mode argument when opening.

OK, good.

> - I've also checked against test_Emboss.py for seqret output, after
> including the abi format in it. My EMBOSS version is 6.4.0. There was a
> slight problem with this testing, since for some reason the ID returned by
> seqret is always "EMBOSS_001". Something might be wrong with my EMBOSS
> installation, since when I previously tested it against 6.1.0, the ID was
> correct (although the qual values not, so I had to upgrade). As expected, if
> I comment out the code that tests for sequence id ('test_Emboss.py:168-172')
> the tests pass. Maybe you could try testing it as well and see if EMBOSS
> also returns the default id instead of the sample name?

EMBOSS 6.3.1 is fine, so I think we should report this as a bug in EMBOSS 6.4.0

> - Finally, I did some small cosmetic changes to the code (typos, etc).
> All changes have been pushed to my github fork. Now I still have time for
> the weekend to improve whatever needs to be improved :).
> Regards,

There appears to be another Python 3 problem, consider this at the
python prompt:

from Bio import SeqIO
record = SeqIO.read("Tests/Abi/310.ab1", "abi")
record.letter_annotations["phred_quality"]

I expect a list of integers, e.g. [0, 0, 0, ..., 0] not ['\x00',
'\x00', '\x00', ..., '\x00']
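
A minimal sketch of the kind of conversion that yields integers on
both Python 2 and Python 3 is below. The function name is hypothetical
and this is not the code in AbiIO.py; it only assumes the quality
values arrive as one raw byte per base.

```python
def quality_values(raw):
    """Convert raw one-byte-per-base quality data to a list of ints."""
    if isinstance(raw, bytes):
        # bytearray yields integers on both Python 2 and Python 3.
        return list(bytearray(raw))
    # Already decoded to text: map each character to its code point.
    return [ord(char) for char in raw]


print(quality_values(b"\x00\x14\x3e"))
```

Going through bytearray sidesteps the difference between iterating
over a Python 2 str (one-character strings) and a Python 3 bytes
object (integers).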

Peter


From w.arindrarto at gmail.com  Tue Aug  2 16:53:46 2011
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Tue, 2 Aug 2011 18:53:46 +0200
Subject: [Biopython-dev] SeqIO Abi Parser
In-Reply-To: <CAKVJ-_6LT+VJStu_sN5mfUbGTuTChMffxOw0PsJ6hTSq-fgeqA@mail.gmail.com>
References: <CADEGkF7qC4Q_KsN-bjOWTyXOu6vw4dzva02H0LWm+kk3xXX6og@mail.gmail.com>
	<CADEGkF5HBL2nMTN=kNhS9Hs_8UiWHs3CXw36=yHUiaVgQwQ4hg@mail.gmail.com>
	<CAKVJ-_7HOhEAn+XwQes+2h=7CwFR10FtefOHZqXMU5xo97D0tA@mail.gmail.com>
	<CADEGkF6E=PDCbqnJ60qsYpt231CrS6S1EWghx7tdFZ_gwAi7QQ@mail.gmail.com>
	<CAKVJ-_6LiJ-j5md_+Fw6KyJoaFqCeRSGPwwbcFaO=PJWNdqNmg@mail.gmail.com>
	<CADEGkF7imzYii6qq7D=3r68uAumu_ZzXaQpC9eyg_8prYcDoHw@mail.gmail.com>
	<CAKVJ-_5WAL+XbC=br4fp5gRnd1DCqoPbidqtZeHi6sM0JS+KXQ@mail.gmail.com>
	<CADEGkF662Xs0cmFEa7We+mVyPB5VQkKAO2yYpERBbRJ0ZYQj4Q@mail.gmail.com>
	<CAKVJ-_7xNgb5xUX6VTZcWoakkUXOkUKv93KOs5NYYEOCTQgkuQ@mail.gmail.com>
	<CAKVJ-_4FpFYyVgOL3Rod1QR1DhuFk3rn6yqWKxUq2+8+OBnpJw@mail.gmail.com>
	<CADEGkF6GDqgK9SOmUe_fQsjv9RU_dZwG_E0QnfD7nAKXUOD8yg@mail.gmail.com>
	<CAKVJ-_6LT+VJStu_sN5mfUbGTuTChMffxOw0PsJ6hTSq-fgeqA@mail.gmail.com>
Message-ID: <CADEGkF4RK5tV5gfX-0Rm1Q7wOa5DXFcX1JZjNjjwC0fukO5ekg@mail.gmail.com>

Hi Peter,

I noticed that bug was because I did not add the _bytes_to_string()
converter for a data type. I already fixed this with my latest push, adding
the appropriate if clause at AbiIO.py:293-294.

Regards,
---
Wibowo Arindrarto (bow)
http://bow.web.id


On Tue, Aug 2, 2011 at 18:47, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> On Sat, Jul 30, 2011 at 8:42 AM, Wibowo Arindrarto
> <w.arindrarto at gmail.com> wrote:
> > Hi Peter,
> > I've done some more improvements to the code:
> > - I've written the check and unittest for the file handle mode. I've set
> it
> > so that abi file has to be opened in 'rb' mode, otherwise it'll return an
> > error. While it's ok to open in 'r' mode in python 2 in Linux, it has to
> be
> > specified as 'rb' in Windows and/or Python 3 for the file to be read
> > correctly. So I decided forcing it to 'rb' is the best. Because of this,
> I
> > changed 'test_SeqIO.py:503' to include the mode argument when opening.
>
> OK, good.
>
> > - I've also checked against test_Emboss.py for seqret output, after
> > including the abi format in it. My EMBOSS version is 6.4.0. There was a
> > slight problem with this testing, since for some reason the ID returned
> by
> > seqret is always "EMBOSS_001". Something might be wrong with my EMBOSS
> > installation, since when I previously tested it against 6.1.0, the ID was
> > correct (although the qual values not, so I had to upgrade). As expected,
> if
> > I comment out the code that tests for sequence id
> ('test_Emboss.py:168-172')
> > the tests pass. Maybe you could try testing it as well and see if EMBOSS
> > also returns the default id instead of the sample name?
>
> EMBOSS 6.3.1 is fine, so I think we should report this as a bug in EMBOSS
> 6.4.0
>
> > - Finally, I did some small cosmetic changes to the code (typos, etc).
> > All changes have been pushed to my github fork. Now I still have time for
> > the weekend to improve whatever needs to be improved :).
> > Regards,
>
> There appears to be another Python 3 problem, consider this at the
> python prompt:
>
> from Bio import SeqIO
> record = SeqIO.read("Tests/Abi/310.ab1", "abi")
> record.letter_annotations["phred_quality"]
>
> I expect a list of integers, e.g. [0, 0, 0, ..., 0] not ['\x00',
> '\x00', '\x00', ..., '\x00']
>
> Peter
>


From p.j.a.cock at googlemail.com  Tue Aug  2 17:57:56 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 2 Aug 2011 18:57:56 +0100
Subject: [Biopython-dev] SeqIO Abi Parser
In-Reply-To: <CADEGkF4RK5tV5gfX-0Rm1Q7wOa5DXFcX1JZjNjjwC0fukO5ekg@mail.gmail.com>
References: <CADEGkF7qC4Q_KsN-bjOWTyXOu6vw4dzva02H0LWm+kk3xXX6og@mail.gmail.com>
	<CADEGkF5HBL2nMTN=kNhS9Hs_8UiWHs3CXw36=yHUiaVgQwQ4hg@mail.gmail.com>
	<CAKVJ-_7HOhEAn+XwQes+2h=7CwFR10FtefOHZqXMU5xo97D0tA@mail.gmail.com>
	<CADEGkF6E=PDCbqnJ60qsYpt231CrS6S1EWghx7tdFZ_gwAi7QQ@mail.gmail.com>
	<CAKVJ-_6LiJ-j5md_+Fw6KyJoaFqCeRSGPwwbcFaO=PJWNdqNmg@mail.gmail.com>
	<CADEGkF7imzYii6qq7D=3r68uAumu_ZzXaQpC9eyg_8prYcDoHw@mail.gmail.com>
	<CAKVJ-_5WAL+XbC=br4fp5gRnd1DCqoPbidqtZeHi6sM0JS+KXQ@mail.gmail.com>
	<CADEGkF662Xs0cmFEa7We+mVyPB5VQkKAO2yYpERBbRJ0ZYQj4Q@mail.gmail.com>
	<CAKVJ-_7xNgb5xUX6VTZcWoakkUXOkUKv93KOs5NYYEOCTQgkuQ@mail.gmail.com>
	<CAKVJ-_4FpFYyVgOL3Rod1QR1DhuFk3rn6yqWKxUq2+8+OBnpJw@mail.gmail.com>
	<CADEGkF6GDqgK9SOmUe_fQsjv9RU_dZwG_E0QnfD7nAKXUOD8yg@mail.gmail.com>
	<CAKVJ-_6LT+VJStu_sN5mfUbGTuTChMffxOw0PsJ6hTSq-fgeqA@mail.gmail.com>
	<CADEGkF4RK5tV5gfX-0Rm1Q7wOa5DXFcX1JZjNjjwC0fukO5ekg@mail.gmail.com>
Message-ID: <CAKVJ-_4F7En085bcLFXO3cc_xkq632gY9f=NDi=vCm0p1NWZvw@mail.gmail.com>

On Tue, Aug 2, 2011 at 5:53 PM, Wibowo Arindrarto
<w.arindrarto at gmail.com> wrote:
> Hi Peter,
> I noticed that bug was because I did not add the _bytes_to_string()
> converter for a data type. I already fixed this with my latest push, adding
> the appropriate if clause at AbiIO.py:293-294.
> Regards,

Was that only half the fix? This made it work for me:

https://github.com/peterjc/biopython/commit/8fc1e141173a735740f91a1338a3fbb747fa2a45

and:

https://github.com/peterjc/biopython/commit/a44e206e482ca5904b395aaca3576a232769ce2e

Peter


From p.j.a.cock at googlemail.com  Tue Aug  2 18:03:24 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 2 Aug 2011 19:03:24 +0100
Subject: [Biopython-dev] SeqIO Abi Parser
In-Reply-To: <CAKVJ-_4F7En085bcLFXO3cc_xkq632gY9f=NDi=vCm0p1NWZvw@mail.gmail.com>
References: <CADEGkF7qC4Q_KsN-bjOWTyXOu6vw4dzva02H0LWm+kk3xXX6og@mail.gmail.com>
	<CADEGkF5HBL2nMTN=kNhS9Hs_8UiWHs3CXw36=yHUiaVgQwQ4hg@mail.gmail.com>
	<CAKVJ-_7HOhEAn+XwQes+2h=7CwFR10FtefOHZqXMU5xo97D0tA@mail.gmail.com>
	<CADEGkF6E=PDCbqnJ60qsYpt231CrS6S1EWghx7tdFZ_gwAi7QQ@mail.gmail.com>
	<CAKVJ-_6LiJ-j5md_+Fw6KyJoaFqCeRSGPwwbcFaO=PJWNdqNmg@mail.gmail.com>
	<CADEGkF7imzYii6qq7D=3r68uAumu_ZzXaQpC9eyg_8prYcDoHw@mail.gmail.com>
	<CAKVJ-_5WAL+XbC=br4fp5gRnd1DCqoPbidqtZeHi6sM0JS+KXQ@mail.gmail.com>
	<CADEGkF662Xs0cmFEa7We+mVyPB5VQkKAO2yYpERBbRJ0ZYQj4Q@mail.gmail.com>
	<CAKVJ-_7xNgb5xUX6VTZcWoakkUXOkUKv93KOs5NYYEOCTQgkuQ@mail.gmail.com>
	<CAKVJ-_4FpFYyVgOL3Rod1QR1DhuFk3rn6yqWKxUq2+8+OBnpJw@mail.gmail.com>
	<CADEGkF6GDqgK9SOmUe_fQsjv9RU_dZwG_E0QnfD7nAKXUOD8yg@mail.gmail.com>
	<CAKVJ-_6LT+VJStu_sN5mfUbGTuTChMffxOw0PsJ6hTSq-fgeqA@mail.gmail.com>
	<CADEGkF4RK5tV5gfX-0Rm1Q7wOa5DXFcX1JZjNjjwC0fukO5ekg@mail.gmail.com>
	<CAKVJ-_4F7En085bcLFXO3cc_xkq632gY9f=NDi=vCm0p1NWZvw@mail.gmail.com>
Message-ID: <CAKVJ-_7DdjXjme+n3bkQo5BQk0=7V_Czb=c3Fym38aOHY2V=CQ@mail.gmail.com>

On Tue, Aug 2, 2011 at 6:57 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Tue, Aug 2, 2011 at 5:53 PM, Wibowo Arindrarto
> <w.arindrarto at gmail.com> wrote:
>> Hi Peter,
>> I noticed that bug was because I did not add the _bytes_to_string()
>> converter for a data type. I already fixed this with my latest push, adding
>> the appropriate if clause at AbiIO.py:293-294.
>> Regards,
>
> Was that only half the fix? This made it work for me:
>
> https://github.com/peterjc/biopython/commit/8fc1e141173a735740f91a1338a3fbb747fa2a45
>
> and:
>
> https://github.com/peterjc/biopython/commit/a44e206e482ca5904b395aaca3576a232769ce2e
>
> Peter
>

Could you test this branch, which I think is ready to be merged to the
trunk now:

https://github.com/peterjc/biopython/tree/seqio-abi

Thanks,

Peter


From w.arindrarto at gmail.com  Wed Aug  3 12:14:53 2011
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Wed, 3 Aug 2011 14:14:53 +0200
Subject: [Biopython-dev] SeqIO Abi Parser
In-Reply-To: <CAKVJ-_7DdjXjme+n3bkQo5BQk0=7V_Czb=c3Fym38aOHY2V=CQ@mail.gmail.com>
References: <CADEGkF7qC4Q_KsN-bjOWTyXOu6vw4dzva02H0LWm+kk3xXX6og@mail.gmail.com>
	<CADEGkF5HBL2nMTN=kNhS9Hs_8UiWHs3CXw36=yHUiaVgQwQ4hg@mail.gmail.com>
	<CAKVJ-_7HOhEAn+XwQes+2h=7CwFR10FtefOHZqXMU5xo97D0tA@mail.gmail.com>
	<CADEGkF6E=PDCbqnJ60qsYpt231CrS6S1EWghx7tdFZ_gwAi7QQ@mail.gmail.com>
	<CAKVJ-_6LiJ-j5md_+Fw6KyJoaFqCeRSGPwwbcFaO=PJWNdqNmg@mail.gmail.com>
	<CADEGkF7imzYii6qq7D=3r68uAumu_ZzXaQpC9eyg_8prYcDoHw@mail.gmail.com>
	<CAKVJ-_5WAL+XbC=br4fp5gRnd1DCqoPbidqtZeHi6sM0JS+KXQ@mail.gmail.com>
	<CADEGkF662Xs0cmFEa7We+mVyPB5VQkKAO2yYpERBbRJ0ZYQj4Q@mail.gmail.com>
	<CAKVJ-_7xNgb5xUX6VTZcWoakkUXOkUKv93KOs5NYYEOCTQgkuQ@mail.gmail.com>
	<CAKVJ-_4FpFYyVgOL3Rod1QR1DhuFk3rn6yqWKxUq2+8+OBnpJw@mail.gmail.com>
	<CADEGkF6GDqgK9SOmUe_fQsjv9RU_dZwG_E0QnfD7nAKXUOD8yg@mail.gmail.com>
	<CAKVJ-_6LT+VJStu_sN5mfUbGTuTChMffxOw0PsJ6hTSq-fgeqA@mail.gmail.com>
	<CADEGkF4RK5tV5gfX-0Rm1Q7wOa5DXFcX1JZjNjjwC0fukO5ekg@mail.gmail.com>
	<CAKVJ-_4F7En085bcLFXO3cc_xkq632gY9f=NDi=vCm0p1NWZvw@mail.gmail.com>
	<CAKVJ-_7DdjXjme+n3bkQo5BQk0=7V_Czb=c3Fym38aOHY2V=CQ@mail.gmail.com>
Message-ID: <CADEGkF5aRyD97NxXzUROUZxVqA7xNoEHi4_8Kcs_fKcXGOPDZg@mail.gmail.com>

Hi Peter,

My bad, I forgot to change that one line and didn't test before committing.
Thanks for fixing it.

I've run the tests on your tree on py2.6.5 and py3.1.2, here are the
results:

- On both py2.6.5 and py3.1.2, I have the following test case error:
"NameError: global name 'embossversion' is not defined", on line 257. I
didn't have "EMBOSS_ROOT" in my os.environ paths (I installed 6.4.0 from
source, by the way), so this must be what's causing it. Is there another way
to automatically detect EMBOSS_ROOT other than this? Or perhaps we should
avoid emboss 6.4.0's bug by only checking if the id is EMBOSS_001? The only
case I think this would fail is if the user inputs "EMBOSS_001" before the
sequencing run as the sample id, which is possible but unlikely.

- On a related note, I noticed you set the minimum Emboss requirement to
6.1.0 patch 3. I'm not sure if this is the one I used previously, but my
previous Emboss 6.1.0 installation failed to extract the proper quality
values. Perhaps we should set the minimum version to 6.3.1? (well, making it
the only Emboss version that works with Biopython because of that 6.4.0
bug).

- Other than those two, everything's tip top :).


Regards,
Wibowo Arindrarto (bow)
http://bow.web.id


On Tue, Aug 2, 2011 at 20:03, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> On Tue, Aug 2, 2011 at 6:57 PM, Peter Cock <p.j.a.cock at googlemail.com>
> wrote:
> > On Tue, Aug 2, 2011 at 5:53 PM, Wibowo Arindrarto
> > <w.arindrarto at gmail.com> wrote:
> >> Hi Peter,
> >> I noticed that bug was because I did not add the _bytes_to_string()
> >> converter for a data type. I already fixed this with my latest push,
> adding
> >> the appropriate if clause at AbiIO.py:293-294.
> >> Regards,
> >
> > Was that only half the fix? This made it work for me:
> >
> >
> https://github.com/peterjc/biopython/commit/8fc1e141173a735740f91a1338a3fbb747fa2a45
> >
> > and:
> >
> >
> https://github.com/peterjc/biopython/commit/a44e206e482ca5904b395aaca3576a232769ce2e
> >
> > Peter
> >
>
> Could you test this branch, which I think is ready to be merged to the
> trunk now:
>
> https://github.com/peterjc/biopython/tree/seqio-abi
>
> Thanks,
>
> Peter
>


From macrozhu at gmail.com  Wed Aug  3 13:47:07 2011
From: macrozhu at gmail.com (Hongbo Zhu)
Date: Wed, 3 Aug 2011 15:47:07 +0200
Subject: [Biopython-dev] inconsistent return values
	Bio.PDB.NeighborSearch.search()
Message-ID: <CABHu63pviognJX42+kyPDt+yxZrvETPygTuv5bAkm0FnZkECuw@mail.gmail.com>

Hi, python-developers,

In the current version of BioPython (source code as of 3 Aug. 2011), it
seems the outcome of *Bio.PDB.NeighborSearch.search()* is inconsistent if
different levels are specified when the returned list is empty.

e.g.

> ns.search(center, radius, 'A')
> []
> ns.search(center, radius, 'R') # similar for levels 'C', 'M', 'S'
> IndexError: list index out of range

Obviously, this is because the Bio.PDB.NeighborSearch.search() function
tries to convert the returned list to levels other than 'A' using the
function Bio.PDB.Selection.unfold_entities() (see line 92 in
NeighborSearch.py). In function unfold_entities(), the first element of
input argument entity_list is evaluated without entity_list being checked
for emptiness (see line 47 in Selection.py). An IndexError is raised when
entity_list is empty.

So, I think either the length of the returned list in
Bio.PDB.NeighborSearch.search()
should be checked before invoking Bio.PDB.Selection.unfold_entities(), or
the function Bio.PDB.Selection.unfold_entities() should be revised so that
it simply returns an empty list if the argument entity_list is empty. I
prefer the latter solution because this would also fix other similar
situations when  Bio.PDB.Selection.unfold_entities() is invoked in other
functions.
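
A minimal sketch of the latter option is below. It is a simplified,
hypothetical stand-in (the Entity class and unfold_up() are illustrative
names, not the real Bio.PDB code), and it only unfolds upwards, e.g. from
atoms to residues:

```python
# Hypothetical, simplified stand-in for the idea behind
# Bio.PDB.Selection.unfold_entities(); NOT the actual Biopython code.
ENTITY_LEVELS = ["A", "R", "C", "M", "S"]  # atom, residue, chain, model, structure


class Entity(object):
    """Toy entity that only knows its level and its parent."""

    def __init__(self, level, parent=None):
        self.level = level
        self.parent = parent

    def get_level(self):
        return self.level

    def get_parent(self):
        return self.parent


def unfold_up(entity_list, target_level):
    """Map entities to their unique ancestors at target_level."""
    if target_level not in ENTITY_LEVELS:
        raise ValueError("%s: not an entity level" % target_level)
    # The guard: an empty input gives an empty result, so
    # entity_list[0] below is never evaluated on an empty list.
    if not entity_list:
        return []
    current = ENTITY_LEVELS.index(entity_list[0].get_level())
    target = ENTITY_LEVELS.index(target_level)
    if current > target:
        raise ValueError("this sketch only unfolds upwards")
    entities = list(entity_list)
    for _ in range(target - current):
        parents = []
        for entity in entities:
            parent = entity.get_parent()
            if parent not in parents:  # keep each parent once, in order
                parents.append(parent)
        entities = parents
    return entities
```

With a guard like this, an empty hit list gives [] at every level instead
of raising IndexError.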

And it seems "Sorry, entering bugs into the product Biopython has been
disabled."

regards,
Hongbo Zhu


From p.j.a.cock at googlemail.com  Wed Aug  3 13:58:13 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 3 Aug 2011 14:58:13 +0100
Subject: [Biopython-dev] inconsistent return values
	Bio.PDB.NeighborSearch.search()
In-Reply-To: <CABHu63pviognJX42+kyPDt+yxZrvETPygTuv5bAkm0FnZkECuw@mail.gmail.com>
References: <CABHu63pviognJX42+kyPDt+yxZrvETPygTuv5bAkm0FnZkECuw@mail.gmail.com>
Message-ID: <CAKVJ-_4=W0wuVZ0__Mawh340ajQfPkS_7yOFNT4ikH8rUdGX1g@mail.gmail.com>

On Wed, Aug 3, 2011 at 2:47 PM, Hongbo Zhu <macrozhu at gmail.com> wrote:
>
> And it seems "Sorry, entering bugs into the product Biopython has been
> disabled."

We moved from Bugzilla to Redmine, links on the main homepage
were updated: http://redmine.open-bio.org/projects/biopython

I wonder if we can change that message text or something...

Peter


From p.j.a.cock at googlemail.com  Wed Aug  3 14:04:46 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 3 Aug 2011 15:04:46 +0100
Subject: [Biopython-dev] SeqIO Abi Parser
In-Reply-To: <CADEGkF5aRyD97NxXzUROUZxVqA7xNoEHi4_8Kcs_fKcXGOPDZg@mail.gmail.com>
References: <CADEGkF7qC4Q_KsN-bjOWTyXOu6vw4dzva02H0LWm+kk3xXX6og@mail.gmail.com>
	<CADEGkF5HBL2nMTN=kNhS9Hs_8UiWHs3CXw36=yHUiaVgQwQ4hg@mail.gmail.com>
	<CAKVJ-_7HOhEAn+XwQes+2h=7CwFR10FtefOHZqXMU5xo97D0tA@mail.gmail.com>
	<CADEGkF6E=PDCbqnJ60qsYpt231CrS6S1EWghx7tdFZ_gwAi7QQ@mail.gmail.com>
	<CAKVJ-_6LiJ-j5md_+Fw6KyJoaFqCeRSGPwwbcFaO=PJWNdqNmg@mail.gmail.com>
	<CADEGkF7imzYii6qq7D=3r68uAumu_ZzXaQpC9eyg_8prYcDoHw@mail.gmail.com>
	<CAKVJ-_5WAL+XbC=br4fp5gRnd1DCqoPbidqtZeHi6sM0JS+KXQ@mail.gmail.com>
	<CADEGkF662Xs0cmFEa7We+mVyPB5VQkKAO2yYpERBbRJ0ZYQj4Q@mail.gmail.com>
	<CAKVJ-_7xNgb5xUX6VTZcWoakkUXOkUKv93KOs5NYYEOCTQgkuQ@mail.gmail.com>
	<CAKVJ-_4FpFYyVgOL3Rod1QR1DhuFk3rn6yqWKxUq2+8+OBnpJw@mail.gmail.com>
	<CADEGkF6GDqgK9SOmUe_fQsjv9RU_dZwG_E0QnfD7nAKXUOD8yg@mail.gmail.com>
	<CAKVJ-_6LT+VJStu_sN5mfUbGTuTChMffxOw0PsJ6hTSq-fgeqA@mail.gmail.com>
	<CADEGkF4RK5tV5gfX-0Rm1Q7wOa5DXFcX1JZjNjjwC0fukO5ekg@mail.gmail.com>
	<CAKVJ-_4F7En085bcLFXO3cc_xkq632gY9f=NDi=vCm0p1NWZvw@mail.gmail.com>
	<CAKVJ-_7DdjXjme+n3bkQo5BQk0=7V_Czb=c3Fym38aOHY2V=CQ@mail.gmail.com>
	<CADEGkF5aRyD97NxXzUROUZxVqA7xNoEHi4_8Kcs_fKcXGOPDZg@mail.gmail.com>
Message-ID: <CAKVJ-_6PdooLMb=mvaskFTjtriLQ1LNikhqJXho7m1k1jA1WtQ@mail.gmail.com>

On Wed, Aug 3, 2011 at 1:14 PM, Wibowo Arindrarto
<w.arindrarto at gmail.com> wrote:
> Hi Peter,
> My bad, I forgot to change that one line and didn't test before committing.
> Thanks for fixing it.
> I've run the tests on your tree on py2.6.5 and py3.1.2, here are the
> results:
> - On both py2.6.5 and py3.1.2, I have the following test case error:
> "NameError: global name 'embossversion' is not defined", on line 257.
>...

It was simpler than that - I'd checked it in with a typo, emboss_version
was what I wanted. Sorry about that confusion!

> - On a related note, I noticed you set the minimum Emboss requirement to
> 6.1.0 patch 3. I'm not sure if this is the one I used previously, but my
> previous Emboss 6.1.0 installation failed to extract the proper quality
> values. Perhaps we should set the minimum version to 6.3.1? (well, making it
> the only Emboss version that works with Biopython because of that 6.4.0
> bug).

We test a lot of FASTQ stuff which requires 6.1.0 patch 3 or later,
which is why that requirement exists. Asking for at least EMBOSS
6.3.1 makes no practical difference as far as I can see.

If you meant require EMBOSS 6.4.1 that hasn't been released yet.

I'm expecting them to release EMBOSS 6.4.0 patch 1 soon (after
I've tested the proposed patch Peter Rice sent), but that will still
report itself as EMBOSS 6.4.0 (based on past patch behaviour,
something I consider annoying but have to live with).
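
To illustrate why the patch level cannot be enforced in code, here is a
rough sketch of a version gate. parse_version() and meets_minimum() are
hypothetical helpers, not the functions used in the Biopython test suite,
and the sketch assumes the tool reports a plain "major.minor.micro" string:

```python
# Hypothetical helpers for illustration only; the real test suite may
# obtain and compare the EMBOSS version differently.
MINIMUM_VERSION = (6, 1, 0)  # patch levels are not visible in the reported version


def parse_version(text):
    """Turn a string such as '6.4.0' into a tuple of ints, e.g. (6, 4, 0)."""
    return tuple(int(part) for part in text.strip().split("."))


def meets_minimum(version_text, minimum=MINIMUM_VERSION):
    """Return True if the reported version is at least the minimum."""
    return parse_version(version_text) >= minimum
```

Because a patched build reports the same three numbers as the unpatched
one, a check like this cannot tell 6.1.0 from 6.1.0 patch 3, or 6.4.0 from
6.4.0 patch 1.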

> - Other than those two, everything's tip top :).
>

Great. I've pushed the code to the main repository, and have
just set off the buildbot slaves as a final sanity test.

This revealed a minor Python 2.4 breakage (not a big issue - it only
seems to be me still trying to keep testing this - and I'm about
ready to give up), and another probable EMBOSS bug in an
older version installed on one buildslave.

Congratulations, your code will be in the next Biopython release.

Thank you,

Peter


From redmine at redmine.open-bio.org  Wed Aug  3 14:52:32 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Wed, 3 Aug 2011 14:52:32 +0000
Subject: [Biopython-dev] [Biopython - Bug #3276] (New) inconsistent returns
	of Bio.PDB.NeighborSearch.search()
Message-ID: <redmine.issue-3276.20110803145232@redmine.open-bio.org>


Issue #3276 has been reported by Hongbo Zhu.

----------------------------------------
Bug #3276: inconsistent returns of Bio.PDB.NeighborSearch.search()
https://redmine.open-bio.org/issues/3276

Author: Hongbo Zhu
Status: New
Priority: Normal
Assignee: 
Category: 
Target version: 
URL: 


In the current version of BioPython (source code as of 3 Aug. 2011), it seems the outcome of Bio.PDB.NeighborSearch.search() is inconsistent if different levels are specified when the returned list is empty.

i.e.
@
ns.search(center, radius, 'A')
[]
ns.search(center, radius, 'R') # similar for levels 'C', 'M', 'S'
IndexError: list index out of range
@
Obviously, this is because the Bio.PDB.NeighborSearch.search() function tries to convert the returned list to levels other than 'A' using the function Bio.PDB.Selection.unfold_entities() (see line 92 in NeighborSearch.py). In function unfold_entities(), the first element of input argument entity_list is evaluated without entity_list being checked for emptiness (see line 47 in Selection.py). An IndexError is raised when entity_list is empty.

So, I think either the length of the returned list in Bio.PDB.NeighborSearch.search() should be checked before invoking Bio.PDB.Selection.unfold_entities(), or the function Bio.PDB.Selection.unfold_entities() should be revised so that it simply returns an empty list if the argument entity_list is empty. I prefer the latter solution because this would also fix other similar situations when  Bio.PDB.Selection.unfold_entities() is invoked in other functions.

cheers, hongbo




From w.arindrarto at gmail.com  Wed Aug  3 15:11:13 2011
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Wed, 3 Aug 2011 17:11:13 +0200
Subject: [Biopython-dev] SeqIO Abi Parser
In-Reply-To: <CAKVJ-_6PdooLMb=mvaskFTjtriLQ1LNikhqJXho7m1k1jA1WtQ@mail.gmail.com>
References: <CADEGkF7qC4Q_KsN-bjOWTyXOu6vw4dzva02H0LWm+kk3xXX6og@mail.gmail.com>
	<CADEGkF5HBL2nMTN=kNhS9Hs_8UiWHs3CXw36=yHUiaVgQwQ4hg@mail.gmail.com>
	<CAKVJ-_7HOhEAn+XwQes+2h=7CwFR10FtefOHZqXMU5xo97D0tA@mail.gmail.com>
	<CADEGkF6E=PDCbqnJ60qsYpt231CrS6S1EWghx7tdFZ_gwAi7QQ@mail.gmail.com>
	<CAKVJ-_6LiJ-j5md_+Fw6KyJoaFqCeRSGPwwbcFaO=PJWNdqNmg@mail.gmail.com>
	<CADEGkF7imzYii6qq7D=3r68uAumu_ZzXaQpC9eyg_8prYcDoHw@mail.gmail.com>
	<CAKVJ-_5WAL+XbC=br4fp5gRnd1DCqoPbidqtZeHi6sM0JS+KXQ@mail.gmail.com>
	<CADEGkF662Xs0cmFEa7We+mVyPB5VQkKAO2yYpERBbRJ0ZYQj4Q@mail.gmail.com>
	<CAKVJ-_7xNgb5xUX6VTZcWoakkUXOkUKv93KOs5NYYEOCTQgkuQ@mail.gmail.com>
	<CAKVJ-_4FpFYyVgOL3Rod1QR1DhuFk3rn6yqWKxUq2+8+OBnpJw@mail.gmail.com>
	<CADEGkF6GDqgK9SOmUe_fQsjv9RU_dZwG_E0QnfD7nAKXUOD8yg@mail.gmail.com>
	<CAKVJ-_6LT+VJStu_sN5mfUbGTuTChMffxOw0PsJ6hTSq-fgeqA@mail.gmail.com>
	<CADEGkF4RK5tV5gfX-0Rm1Q7wOa5DXFcX1JZjNjjwC0fukO5ekg@mail.gmail.com>
	<CAKVJ-_4F7En085bcLFXO3cc_xkq632gY9f=NDi=vCm0p1NWZvw@mail.gmail.com>
	<CAKVJ-_7DdjXjme+n3bkQo5BQk0=7V_Czb=c3Fym38aOHY2V=CQ@mail.gmail.com>
	<CADEGkF5aRyD97NxXzUROUZxVqA7xNoEHi4_8Kcs_fKcXGOPDZg@mail.gmail.com>
	<CAKVJ-_6PdooLMb=mvaskFTjtriLQ1LNikhqJXho7m1k1jA1WtQ@mail.gmail.com>
Message-ID: <CADEGkF4zevZ7Mhy_viayMJ+mUrS1VXjf7q1zApEU=5dUJWjqbQ@mail.gmail.com>

Hi Peter,

On Wed, Aug 3, 2011 at 16:04, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> On Wed, Aug 3, 2011 at 1:14 PM, Wibowo Arindrarto
> <w.arindrarto at gmail.com> wrote:
> > Hi Peter,
> > My bad, I forgot to change that one line and didn't test before
> committing.
> > Thanks for fixing it.
> > I've run the tests on your tree on py2.6.5 and py3.1.2, here are the
> > results:
> > - On both py2.6.5  and py3.1.2, I have the following test case error:
> > "NameError: global name 'embossversion' is not defined", on line 257.
> >...

> It was simpler than that - I'd checked it in with a typo, emboss_version
> was what I wanted. Sorry about that confusion!


Silly me, I should've noticed you used emboss_version when I was looking at
the code checking Emboss dependency :/.


> > - On a related note, I noticed you set the minimum Emboss requirement to
> > 6.1.0 patch 3. I'm not sure if this is the one I used previously, but my
> > previous Emboss 6.1.0 installation failed to extract the proper quality
> > values. Perhaps we should set the minimum version to 6.3.1? (well, making
> it
> > the only Emboss version that works with Biopython because of that 6.4.0
> > bug).
>
> We test a lot of FASTQ stuff which requires 6.1.0 patch 3 or later,
> which is why that requirement exists. Asking for at least EMBOSS
> 6.3.1 makes no practical difference as far as I can see.
>
> If you meant require EMBOSS 6.4.1 that hasn't been released yet.
>
> I'm expecting them to release EMBOSS 6.4.0 patch 1 soon (after
> I've tested the proposed patch Peter Rice sent), but that will still
> report itself as EMBOSS 6.4.0 (based on past patch behaviour,
> something I consider annoying but have to live with).


I meant Emboss 6.3.1, since that seems to be one that works best with the
current AbiIO implementation. But yeah, I guess as long as the tests work
it's fine.


> > - Other than those two, everything's tip top :).
> >
>
> Great. I've pushed the code to the main repository, and have
> just set off the buildbot slaves as a final sanity test.
>
> This revealed a minor Python 2.4 breakage (not a big issue - it only
> seems to be me still trying to keep testing this - and I'm about
> ready to give up), and another probable EMBOSS bug in an
> older version installed on one buildslave.
>
> Congratulations, your code will be in the next Biopython release.
>
> Thank you,
>
> Peter
>

This really made my day :)! You're welcome and thank you for reviewing my code,
too!


Regards,
---
Wibowo Arindrarto (bow)
http://bow.web.id


From w.arindrarto at gmail.com  Thu Aug  4 11:30:44 2011
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Thu, 4 Aug 2011 13:30:44 +0200
Subject: [Biopython-dev] SeqIO Abi Parser
In-Reply-To: <CAKVJ-_5uXgdu8C8qqnhEcXT1G9kTFkJsS82VJ0M1=9q9QOVeOw@mail.gmail.com>
References: <CADEGkF7qC4Q_KsN-bjOWTyXOu6vw4dzva02H0LWm+kk3xXX6og@mail.gmail.com>
	<CADEGkF5HBL2nMTN=kNhS9Hs_8UiWHs3CXw36=yHUiaVgQwQ4hg@mail.gmail.com>
	<CAKVJ-_7HOhEAn+XwQes+2h=7CwFR10FtefOHZqXMU5xo97D0tA@mail.gmail.com>
	<CADEGkF6E=PDCbqnJ60qsYpt231CrS6S1EWghx7tdFZ_gwAi7QQ@mail.gmail.com>
	<CAKVJ-_6LiJ-j5md_+Fw6KyJoaFqCeRSGPwwbcFaO=PJWNdqNmg@mail.gmail.com>
	<CADEGkF7imzYii6qq7D=3r68uAumu_ZzXaQpC9eyg_8prYcDoHw@mail.gmail.com>
	<CAKVJ-_5WAL+XbC=br4fp5gRnd1DCqoPbidqtZeHi6sM0JS+KXQ@mail.gmail.com>
	<CADEGkF662Xs0cmFEa7We+mVyPB5VQkKAO2yYpERBbRJ0ZYQj4Q@mail.gmail.com>
	<CAKVJ-_7xNgb5xUX6VTZcWoakkUXOkUKv93KOs5NYYEOCTQgkuQ@mail.gmail.com>
	<CAKVJ-_4FpFYyVgOL3Rod1QR1DhuFk3rn6yqWKxUq2+8+OBnpJw@mail.gmail.com>
	<CADEGkF6GDqgK9SOmUe_fQsjv9RU_dZwG_E0QnfD7nAKXUOD8yg@mail.gmail.com>
	<CAKVJ-_6LT+VJStu_sN5mfUbGTuTChMffxOw0PsJ6hTSq-fgeqA@mail.gmail.com>
	<CADEGkF4RK5tV5gfX-0Rm1Q7wOa5DXFcX1JZjNjjwC0fukO5ekg@mail.gmail.com>
	<CAKVJ-_4F7En085bcLFXO3cc_xkq632gY9f=NDi=vCm0p1NWZvw@mail.gmail.com>
	<CAKVJ-_7DdjXjme+n3bkQo5BQk0=7V_Czb=c3Fym38aOHY2V=CQ@mail.gmail.com>
	<CADEGkF5aRyD97NxXzUROUZxVqA7xNoEHi4_8Kcs_fKcXGOPDZg@mail.gmail.com>
	<CAKVJ-_6PdooLMb=mvaskFTjtriLQ1LNikhqJXho7m1k1jA1WtQ@mail.gmail.com>
	<CADEGkF4zevZ7Mhy_viayMJ+mUrS1VXjf7q1zApEU=5dUJWjqbQ@mail.gmail.com>
	<CAKVJ-_5uXgdu8C8qqnhEcXT1G9kTFkJsS82VJ0M1=9q9QOVeOw@mail.gmail.com>
Message-ID: <CADEGkF4fYp_J7hdqCs+BQWdc=5mvs=iAN1v8AJf88nDL1hJw4A@mail.gmail.com>

Hi Peter,

Ah yes, I didn't know there could be handles without .seek() and .tell(),
and I thought those two are the proper way of traversing files, so I used
them. I also didn't realize you could use SeqIO with network handles, too.
This is really neat :).

In any case, sure, I'd love to make some changes to the current AbiIO code
so it works without .seek() and .tell(). Are there any other input types that
do not support .seek() and .tell() besides network handles? Here's my new
branch from the current master:
https://github.com/bow/biopython/tree/seqio-abi_handlefix, nothing different
for now but I'll push my updates soon.


Regards,
---
Wibowo Arindrarto (bow)
http://bow.web.id


On Thu, Aug 4, 2011 at 13:03, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> On Wed, Aug 3, 2011 at 4:11 PM, Wibowo Arindrarto
> <w.arindrarto at gmail.com> wrote:
> > On Wed, Aug 3, 2011 at 16:04, Peter Cock <p.j.a.cock at googlemail.com>
> wrote:
> >> ...
> >> Congratulations, your code will be in the next Biopython release.
> >> ...
> >
> > This really made my day :)! You're welcome and thank you for reviewing my
> code,
> > too!
>
> I found something else to work on (sorry!). You're using seek and tell,
> which
> may not exist. Network handles are a good example of this situation. Try:
>
> from urllib import urlopen
> from Bio import SeqIO
> handle = urlopen("http://biopython.org/SRC/biopython/Tests/Abi/310.ab1")
> record = SeqIO.read(handle, "abi")
> handle.close()
>
> I've added some code to test_SeqIO.py to simulate this, which revealed that
> the SFF parser was also using the tell method. In that case we must track
> the
> offset explicitly (it is needed for handling SFF index blocks). You can see
> how
> I did this here - note I avoid the overhead of tracking the offset in
> general:
>
> https://github.com/biopython/biopython/commit/9a3c44b28aae256b8da825c3c1553d71dbe329cc
>
> I've tried the same trick in the ABI parser, but this reveals your code
> likes to
> seek backwards. Try the attached patch against this revision to confirm
> this.
>
> Having looked over your code, I don't believe you need to use seek and tell
> at all. This isn't critical to fix right now, but I would like us to
> solve it. Would
> you like to try? Make a new branch from the current master for this please.
>
> Regards,
>
> Peter
>


From p.j.a.cock at googlemail.com  Thu Aug  4 11:03:27 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 4 Aug 2011 12:03:27 +0100
Subject: [Biopython-dev] SeqIO Abi Parser
In-Reply-To: <CADEGkF4zevZ7Mhy_viayMJ+mUrS1VXjf7q1zApEU=5dUJWjqbQ@mail.gmail.com>
References: <CADEGkF7qC4Q_KsN-bjOWTyXOu6vw4dzva02H0LWm+kk3xXX6og@mail.gmail.com>
	<CADEGkF5HBL2nMTN=kNhS9Hs_8UiWHs3CXw36=yHUiaVgQwQ4hg@mail.gmail.com>
	<CAKVJ-_7HOhEAn+XwQes+2h=7CwFR10FtefOHZqXMU5xo97D0tA@mail.gmail.com>
	<CADEGkF6E=PDCbqnJ60qsYpt231CrS6S1EWghx7tdFZ_gwAi7QQ@mail.gmail.com>
	<CAKVJ-_6LiJ-j5md_+Fw6KyJoaFqCeRSGPwwbcFaO=PJWNdqNmg@mail.gmail.com>
	<CADEGkF7imzYii6qq7D=3r68uAumu_ZzXaQpC9eyg_8prYcDoHw@mail.gmail.com>
	<CAKVJ-_5WAL+XbC=br4fp5gRnd1DCqoPbidqtZeHi6sM0JS+KXQ@mail.gmail.com>
	<CADEGkF662Xs0cmFEa7We+mVyPB5VQkKAO2yYpERBbRJ0ZYQj4Q@mail.gmail.com>
	<CAKVJ-_7xNgb5xUX6VTZcWoakkUXOkUKv93KOs5NYYEOCTQgkuQ@mail.gmail.com>
	<CAKVJ-_4FpFYyVgOL3Rod1QR1DhuFk3rn6yqWKxUq2+8+OBnpJw@mail.gmail.com>
	<CADEGkF6GDqgK9SOmUe_fQsjv9RU_dZwG_E0QnfD7nAKXUOD8yg@mail.gmail.com>
	<CAKVJ-_6LT+VJStu_sN5mfUbGTuTChMffxOw0PsJ6hTSq-fgeqA@mail.gmail.com>
	<CADEGkF4RK5tV5gfX-0Rm1Q7wOa5DXFcX1JZjNjjwC0fukO5ekg@mail.gmail.com>
	<CAKVJ-_4F7En085bcLFXO3cc_xkq632gY9f=NDi=vCm0p1NWZvw@mail.gmail.com>
	<CAKVJ-_7DdjXjme+n3bkQo5BQk0=7V_Czb=c3Fym38aOHY2V=CQ@mail.gmail.com>
	<CADEGkF5aRyD97NxXzUROUZxVqA7xNoEHi4_8Kcs_fKcXGOPDZg@mail.gmail.com>
	<CAKVJ-_6PdooLMb=mvaskFTjtriLQ1LNikhqJXho7m1k1jA1WtQ@mail.gmail.com>
	<CADEGkF4zevZ7Mhy_viayMJ+mUrS1VXjf7q1zApEU=5dUJWjqbQ@mail.gmail.com>
Message-ID: <CAKVJ-_5uXgdu8C8qqnhEcXT1G9kTFkJsS82VJ0M1=9q9QOVeOw@mail.gmail.com>

On Wed, Aug 3, 2011 at 4:11 PM, Wibowo Arindrarto
<w.arindrarto at gmail.com> wrote:
> On Wed, Aug 3, 2011 at 16:04, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>> ...
>> Congratulations, your code will be in the next Biopython release.
>> ...
>
> This really made my day :)! You're welcome and thank you for reviewing my code,
> too!

I found something else to work on (sorry!). You're using seek and tell, which
may not exist. Network handles are a good example of this situation. Try:

from urllib import urlopen
from Bio import SeqIO
handle = urlopen("http://biopython.org/SRC/biopython/Tests/Abi/310.ab1")
record = SeqIO.read(handle, "abi")
handle.close()

I've added some code to test_SeqIO.py to simulate this, which revealed that
the SFF parser was also using the tell method. In that case we must track the
offset explicitly (it is needed for handling SFF index blocks). You can see how
I did this here - note I avoid the overhead of tracking the offset in general:
https://github.com/biopython/biopython/commit/9a3c44b28aae256b8da825c3c1553d71dbe329cc
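
The general idea of tracking the offset explicitly can be sketched as
below. This is not the code from the commit above; OffsetTrackingReader and
ForwardOnlyHandle are hypothetical names, and the second class only exists
to imitate a network-style handle that has read() but no seek() or tell():

```python
import io


class ForwardOnlyHandle(object):
    """Imitates a network-style handle: read() only, no seek() or tell()."""

    def __init__(self, data):
        self._buffer = io.BytesIO(data)

    def read(self, size=-1):
        return self._buffer.read(size)


class OffsetTrackingReader(object):
    """Wraps any handle with read() and counts the bytes consumed so far."""

    def __init__(self, handle):
        self._handle = handle
        self.offset = 0  # plays the role of handle.tell()

    def read(self, size):
        data = self._handle.read(size)
        # Count what was actually returned, which may be less than size
        # at the end of the stream.
        self.offset += len(data)
        return data
```

A parser written against such a wrapper can ask for reader.offset wherever
it would otherwise call handle.tell(), as long as it only ever reads
forwards.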

I've tried the same trick in the ABI parser, but this reveals your code likes to
seek backwards. Try the attached patch against this revision to confirm this.

Having looked over your code, I don't believe you need to use seek and tell
at all. This isn't critical to fix right now, but I would like us to solve
it. Would you like to try? Make a new branch from the current master for
this please.

Regards,

Peter
-------------- next part --------------
A non-text attachment was scrubbed...
Name: tell_hack.patch
Type: application/octet-stream
Size: 1466 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/biopython-dev/attachments/20110804/bd28e873/attachment-0002.obj>

From p.j.a.cock at googlemail.com  Thu Aug  4 11:47:49 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 4 Aug 2011 12:47:49 +0100
Subject: [Biopython-dev] SeqIO Abi Parser
In-Reply-To: <CADEGkF4fYp_J7hdqCs+BQWdc=5mvs=iAN1v8AJf88nDL1hJw4A@mail.gmail.com>
References: <CADEGkF7qC4Q_KsN-bjOWTyXOu6vw4dzva02H0LWm+kk3xXX6og@mail.gmail.com>
	<CADEGkF5HBL2nMTN=kNhS9Hs_8UiWHs3CXw36=yHUiaVgQwQ4hg@mail.gmail.com>
	<CAKVJ-_7HOhEAn+XwQes+2h=7CwFR10FtefOHZqXMU5xo97D0tA@mail.gmail.com>
	<CADEGkF6E=PDCbqnJ60qsYpt231CrS6S1EWghx7tdFZ_gwAi7QQ@mail.gmail.com>
	<CAKVJ-_6LiJ-j5md_+Fw6KyJoaFqCeRSGPwwbcFaO=PJWNdqNmg@mail.gmail.com>
	<CADEGkF7imzYii6qq7D=3r68uAumu_ZzXaQpC9eyg_8prYcDoHw@mail.gmail.com>
	<CAKVJ-_5WAL+XbC=br4fp5gRnd1DCqoPbidqtZeHi6sM0JS+KXQ@mail.gmail.com>
	<CADEGkF662Xs0cmFEa7We+mVyPB5VQkKAO2yYpERBbRJ0ZYQj4Q@mail.gmail.com>
	<CAKVJ-_7xNgb5xUX6VTZcWoakkUXOkUKv93KOs5NYYEOCTQgkuQ@mail.gmail.com>
	<CAKVJ-_4FpFYyVgOL3Rod1QR1DhuFk3rn6yqWKxUq2+8+OBnpJw@mail.gmail.com>
	<CADEGkF6GDqgK9SOmUe_fQsjv9RU_dZwG_E0QnfD7nAKXUOD8yg@mail.gmail.com>
	<CAKVJ-_6LT+VJStu_sN5mfUbGTuTChMffxOw0PsJ6hTSq-fgeqA@mail.gmail.com>
	<CADEGkF4RK5tV5gfX-0Rm1Q7wOa5DXFcX1JZjNjjwC0fukO5ekg@mail.gmail.com>
	<CAKVJ-_4F7En085bcLFXO3cc_xkq632gY9f=NDi=vCm0p1NWZvw@mail.gmail.com>
	<CAKVJ-_7DdjXjme+n3bkQo5BQk0=7V_Czb=c3Fym38aOHY2V=CQ@mail.gmail.com>
	<CADEGkF5aRyD97NxXzUROUZxVqA7xNoEHi4_8Kcs_fKcXGOPDZg@mail.gmail.com>
	<CAKVJ-_6PdooLMb=mvaskFTjtriLQ1LNikhqJXho7m1k1jA1WtQ@mail.gmail.com>
	<CADEGkF4zevZ7Mhy_viayMJ+mUrS1VXjf7q1zApEU=5dUJWjqbQ@mail.gmail.com>
	<CAKVJ-_5uXgdu8C8qqnhEcXT1G9kTFkJsS82VJ0M1=9q9QOVeOw@mail.gmail.com>
	<CADEGkF4fYp_J7hdqCs+BQWdc=5mvs=iAN1v8AJf88nDL1hJw4A@mail.gmail.com>
Message-ID: <CAKVJ-_4n8EUJXeGwGGPxAd1yvaXF0uGR_52m0LXVJ5i2Y03JUA@mail.gmail.com>

On Thu, Aug 4, 2011 at 12:30 PM, Wibowo Arindrarto
<w.arindrarto at gmail.com> wrote:
> Hi Peter,
> Ah yes, I didn't know there could be handles without .seek() and .tell(),
> and I thought those two are the proper way of traversing files, so I used
> them. I also didn't realize you could use SeqIO with network handles, too.
> This is really neat :).

Yes - having a handle focused API makes some clever stuff possible :)
Of course, parsing sequences directly from network handles isn't always
a good idea, but it can be useful.

> In any case, sure, I'd love to make some changes to the current AbiIO code
> so it works without .seek() and .tell(). Are there any other input types that
> do not support .seek() and .tell() besides network handles?

I suspect some specialised handles for accessing compressed files might
have similar limitations. In the case of gzip at least, I think it does support
seek and tell.
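
A quick way to check this for gzip is sketched below, using only the
standard library. It runs against a small in-memory archive rather than a
real trace file, and the 16-byte payload is made up for the example:

```python
import gzip
import io

# Build a tiny gzip archive in memory (made-up payload).
raw = io.BytesIO()
with gzip.GzipFile(fileobj=raw, mode="wb") as out:
    out.write(b"ABIF" + b"\x00" * 12)
raw.seek(0)

# Read it back through the decompressing handle.
with gzip.GzipFile(fileobj=raw, mode="rb") as handle:
    header = handle.read(4)
    position = handle.tell()   # offset in the uncompressed data
    handle.seek(0)             # seeking backwards is allowed
    header_again = handle.read(4)
```

Note that the offsets refer to the uncompressed data, and that seeking
backwards in a gzip handle may be slow because the stream has to be
decompressed again from the start.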

> Here's my new branch from the current master:
> https://github.com/bow/biopython/tree/seqio-abi_handlefix
> nothing different for now but I'll push my updates soon.

Don't rush yourself - I'm away for a long weekend so won't be testing
any updates till next week anyway.

Thanks,

Peter


From b.invergo at gmail.com  Thu Aug  4 15:38:23 2011
From: b.invergo at gmail.com (Brandon Invergo)
Date: Thu, 04 Aug 2011 17:38:23 +0200
Subject: [Biopython-dev] Leaked handles in PAML unit tests
In-Reply-To: <CAKVJ-_6v+ntM2_6adhZQ_UbX4=NDNL6iDNp2vjOddXXLFhPTMQ@mail.gmail.com>
References: <CAKVJ-_7L3fCbEPj0iGUzy_1sq4x3Y3Y7-k7F9+qhhWkcbnHFFg@mail.gmail.com>
	<1312366681.1302.9.camel@localhost.localdomain>
	<CAKVJ-_6v+ntM2_6adhZQ_UbX4=NDNL6iDNp2vjOddXXLFhPTMQ@mail.gmail.com>
Message-ID: <1312472309.8916.15.camel@localhost.localdomain>

Hi Peter,
(I'm CCing this to the dev list for the info in the second paragraph)
Thanks for the reply. I solved the Python2 problem by fixing my
PYTHONPATH. Running the tests from the Tests directory couldn't find the
Bio module due to a mistake in the PYTHONPATH, so I tried to run them
from the parent directory, resulting in test failures. A dumb mistake
but anyway it's fixed. Sorry for wasting your time with that.

I still have the following error with Python 3.2, though, which prevents
me from figuring out the leaked handle problem in Py3k:
[brandon at brandon-linux Tests]$ python test_PAML_baseml.py
Traceback (most recent call last):
  File "test_PAML_baseml.py", line 10, in <module>
    from Bio.Phylo.PAML import baseml
  File "/home/brandon/Projects/pypaml/biopython/Bio/Phylo/__init__.py", line 12, in <module>
    from Bio.Phylo._io import parse, read, write, convert
  File "/home/brandon/Projects/pypaml/biopython/Bio/Phylo/_io.py", line 12, in <module>
    from Bio.Phylo import BaseTree, NewickIO, NexusIO
  File "/home/brandon/Projects/pypaml/biopython/Bio/Phylo/BaseTree.py", line 222
    return u'%s(%s)' % (self.__class__.__name__,

SyntaxError: invalid syntax

Regarding that specific error, I think all strings are implicitly
unicode in Python 3, aren't they? I don't have much experience with
maintaining Py2/3 compatibility, though, so I don't know how to best
handle this. Searching for the unicode operator (u') in the entire Bio
file tree shows that it only exists in Phylo/PhyloXML.py and
Phylo/BaseTree.py.
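
On Python 3 every str literal is already unicode text, so a plain literal
gives the same result there without the u prefix. A small sketch, using a
made-up TreeElement class rather than the real BaseTree code:

```python
class TreeElement(object):
    """Made-up stand-in for a tree node; not the real BaseTree class."""

    def __init__(self, name):
        self.name = name

    def summary(self):
        # No u prefix: on Python 3 this literal is already unicode text.
        return "%s(name=%s)" % (self.__class__.__name__, self.name)
```

This does not settle what the Python 2 side should do, where the plain
literal is a byte string.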

-brandon

On Wed, 2011-08-03 at 13:33 +0100, Peter Cock wrote:
> On Wed, Aug 3, 2011 at 11:18 AM, Brandon Invergo <b.invergo at gmail.com> wrote:
> > Hi Peter,
> > I'm still in the process of looking at them now but I'm running into a
> > side issue that maybe you can help with. I've tried running the unit
> > tests myself using both Python 2.7.2 and Python 3.2.1, the two versions
> > I have, and both times it fails.
> 
> Python 3 takes a bit more effort to debug due to the 2to3 thing
> and different paths - so I'd focus on Python 2.7 initially.
> 
> > Just looking at test_PAML_baseml.py, for example, with Python 2 I get a
> > lot of test failures due to baseml.py now (correctly) throwing IOErrors
> > rather than AttributeErrors or TypeErrors. With Python 3, on the other
> > hand, I get syntax errors in BaseTree.py (I'll include the output of
> > both below). I did a git pull upstream master before doing this, so my
> > code should be up-to-date (it seems like the unit tests are out-of-date,
> > re: the error types). Now, clearly these have passed on the build
> > machine so I'm wondering what I could be doing wrong.  Being able to
> > replicate the test failures in Python 3 on my machine will really help
> > in fixing them.
> > Sorry about the probable-newbie question...
> 
> What does "git status" give you?
> 
> My usual routine is as follows, but I clone from the official repository
> (which is therefore called origin), and have my personal one setup
> as peterjc via "git remote add ...":
> 
> git checkout master #if not there already
> git fetch origin
> git status #should say behind and can FF merge
> git merge origin/master #should now have latest code
> 
> I'm guessing you're working from a clone of your github repo?
> 
> An easy thing to try is a fresh clone of the official biopython.
> 
> The other key point is all the unit tests expect the current
> directory to be the Tests directory NOT the parent directory
> where setup.py lives.
> 
> Note if you just do "python test_PAML_baseml.py" this will
> pickup the installed Biopython (via PYTHONPATH etc).
> 
> One option is "run_tests.py test_PAML_baseml.py" which
> will use the local code for you.
> 
> If you do "python Tests/test_PAML_baseml.py" this should
> pickup the source code for Biopython (won't work for any
> compiled modules IIRC).
> 
> Peter


From p.j.a.cock at googlemail.com  Thu Aug  4 15:59:42 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 4 Aug 2011 16:59:42 +0100
Subject: [Biopython-dev] Leaked handles in PAML unit tests
In-Reply-To: <1312472309.8916.15.camel@localhost.localdomain>
References: <CAKVJ-_7L3fCbEPj0iGUzy_1sq4x3Y3Y7-k7F9+qhhWkcbnHFFg@mail.gmail.com>
	<1312366681.1302.9.camel@localhost.localdomain>
	<CAKVJ-_6v+ntM2_6adhZQ_UbX4=NDNL6iDNp2vjOddXXLFhPTMQ@mail.gmail.com>
	<1312472309.8916.15.camel@localhost.localdomain>
Message-ID: <CAKVJ-_6+P2g5A5PRf6xpevzcNP3JLYQj2hE5e+G+tPEQ2AtF=w@mail.gmail.com>

On Thu, Aug 4, 2011 at 4:38 PM, Brandon Invergo <b.invergo at gmail.com> wrote:
> Hi Peter,
> (I'm CCing this to the dev list for the info in the second paragraph)
> Thanks for the reply. I solved the Python2 problem by fixing my
> PYTHONPATH. Running the tests from the Tests directory couldn't find the
> Bio module due to a mistake in the PYTHONPATH, so I tried to run them
> from the parent directory, resulting in test failures. A dumb mistake
> but anyway it's fixed. Sorry for wasting your time with that.

No problem - learning about paths and imports is a bit tricky.

> I still have the following error with Python 3.2, though, which prevents
> me from figuring out the leaked handle problem in Py3k:
> [brandon at brandon-linux Tests]$ python test_PAML_baseml.py
> Traceback (most recent call last):
>   File "test_PAML_baseml.py", line 10, in <module>
>     from Bio.Phylo.PAML import baseml
>   File "/home/brandon/Projects/pypaml/biopython/Bio/Phylo/__init__.py",
> line 12, in <module>
>     from Bio.Phylo._io import parse, read, write, convert
>   File "/home/brandon/Projects/pypaml/biopython/Bio/Phylo/_io.py", line
> 12, in <module>
>     from Bio.Phylo import BaseTree, NewickIO, NexusIO
>   File "/home/brandon/Projects/pypaml/biopython/Bio/Phylo/BaseTree.py",
> line 222
>     return u'%s(%s)' % (self.__class__.__name__,
>
> SyntaxError: invalid syntax

Hang on - that looks like you ran it with "python" meaning Python 2.x

Working with Python 3 the following should "just work":

cd /home/brandon/Projects/pypaml/biopython
python3 setup.py build
python3 setup.py test
python3 setup.py install #Use sudo or --prefix etc if you want

However, if you want to run the offline test only, you need
to go into the Python3 converted Tests directory, not the
unconverted Python2 Tests directory. Note that this is
Biopython specific (but based on what NumPy does). e.g.

cd /home/brandon/Projects/pypaml/biopython
python3 setup.py build
cd build/py3.2/Tests
python3 run_tests.py --offline

Likewise if you want to test just one module,

cd /home/brandon/Projects/pypaml/biopython
python3 setup.py build
cd build/py3.2/Tests
python3 run_tests.py test_PAML_baseml.py

In the above, run_tests.py should take care of the path
settings to ensure the freshly built Biopython is used
(not whatever old version may be installed elsewhere).

If the above works nicely for you, stick with that.

Alternatively, I often just install in-development versions of
Biopython on my personal machine under my home directory
(where Python 3 was also installed using the --prefix option
so I don't need to mess about with the PYTHONPATH):

cd /home/brandon/Projects/pypaml/biopython
python3 setup.py install --prefix=$HOME
cd build/py3.2/Tests
python3 test_PAML_baseml.py

If your Python 3 is installed at system level you can do this but
it isn't very clean (certainly don't do it on a shared machine):

cd /home/brandon/Projects/pypaml/biopython
sudo python3 setup.py install
cd build/py3.2/Tests
python3 test_PAML_baseml.py

Alternatively if your Python 3 is at the system level you can
install Biopython under your home directory but then you have
to mess about with PYTHONPATH and keep changing it for
Python2 vs Python3, since they use the same variable (a
design choice I fail to see any advantages in).

Confusing isn't it?

There are other potential solutions to having multiple copies
of Python installed, like using virtualenv...

Peter


From p.j.a.cock at googlemail.com  Thu Aug  4 17:32:38 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 4 Aug 2011 18:32:38 +0100
Subject: [Biopython-dev] Leaked handles in PAML unit tests
In-Reply-To: <1312478530.8916.20.camel@localhost.localdomain>
References: <CAKVJ-_7L3fCbEPj0iGUzy_1sq4x3Y3Y7-k7F9+qhhWkcbnHFFg@mail.gmail.com>
	<1312366681.1302.9.camel@localhost.localdomain>
	<CAKVJ-_6v+ntM2_6adhZQ_UbX4=NDNL6iDNp2vjOddXXLFhPTMQ@mail.gmail.com>
	<1312472309.8916.15.camel@localhost.localdomain>
	<CAKVJ-_6+P2g5A5PRf6xpevzcNP3JLYQj2hE5e+G+tPEQ2AtF=w@mail.gmail.com>
	<1312478530.8916.20.camel@localhost.localdomain>
Message-ID: <CAKVJ-_6NT1YO4MzagtF0gUsvZqc+9c2AfDN+iRLn_QAt=sUTsg@mail.gmail.com>

On Thu, Aug 4, 2011 at 6:22 PM, Brandon Invergo <b.invergo at gmail.com> wrote:
>
> The above does work nicely for me. So nicely, in fact, that the PAML
> tests all pass! So I'm still having trouble replicating the leaked
> handles. I'm still trying to figure out what's happening...
>

It could be something silly with warning silencing being global
and not local, and thus depends on the order the tests are run in.
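
A minimal sketch of the global-versus-local distinction, assuming the
leaked handles surface as ResourceWarning (which Python 3.2 issues for
unclosed files). The function names are made up for illustration and
this is not the Biopython test code:

```python
import warnings


def emit():
    """Stand-in for code that leaks a handle; same warning category."""
    warnings.warn("unclosed file", ResourceWarning)


def count_with_local_silencing():
    """Silence warnings inside one block only; later warnings still show."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        with warnings.catch_warnings():
            # This filter is undone when the inner block exits.
            warnings.simplefilter("ignore")
            emit()
        emit()  # recorded, because the "ignore" filter is gone again
    return len(caught)


def count_with_global_silencing():
    """Silence warnings without a scope; everything afterwards is hidden."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        # No inner catch_warnings(): the filter stays active for later code,
        # much as a module-level simplefilter() call would for later tests.
        warnings.simplefilter("ignore")
        emit()
        emit()
    return len(caught)
```

With the scoped version the second warning is still reported; with the
unscoped one, whether a warning appears depends on what ran earlier.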

Did you try running all the (offline) tests in one go under Python 3.2?

Peter


From b.invergo at gmail.com  Thu Aug  4 18:21:59 2011
From: b.invergo at gmail.com (Brandon Invergo)
Date: Thu, 04 Aug 2011 20:21:59 +0200
Subject: [Biopython-dev] Leaked handles in PAML unit tests
In-Reply-To: <CAKVJ-_6NT1YO4MzagtF0gUsvZqc+9c2AfDN+iRLn_QAt=sUTsg@mail.gmail.com>
References: <CAKVJ-_7L3fCbEPj0iGUzy_1sq4x3Y3Y7-k7F9+qhhWkcbnHFFg@mail.gmail.com>
	<1312366681.1302.9.camel@localhost.localdomain>
	<CAKVJ-_6v+ntM2_6adhZQ_UbX4=NDNL6iDNp2vjOddXXLFhPTMQ@mail.gmail.com>
	<1312472309.8916.15.camel@localhost.localdomain>
	<CAKVJ-_6+P2g5A5PRf6xpevzcNP3JLYQj2hE5e+G+tPEQ2AtF=w@mail.gmail.com>
	<1312478530.8916.20.camel@localhost.localdomain>
	<CAKVJ-_6NT1YO4MzagtF0gUsvZqc+9c2AfDN+iRLn_QAt=sUTsg@mail.gmail.com>
Message-ID: <1312482121.8916.22.camel@localhost.localdomain>

Ok, now I've got the errors. Now I can actually get to work. Thanks for
your help with this. I had no idea about the special Py3 building (I've
just been using the raw tests from the repository)

I'll see what I can do now.
-brandon

On Thu, 2011-08-04 at 18:32 +0100, Peter Cock wrote:
> On Thu, Aug 4, 2011 at 6:22 PM, Brandon Invergo <b.invergo at gmail.com> wrote:
> >
> > The above does work nicely for me. So nicely, in fact, that the PAML
> > tests all pass! So I'm still having trouble replicating the leaked
> > handles. I'm still trying to figure out what's happening...
> >
> 
> It could be something silly with warning silencing being global
> and not local, and thus depends on the order the tests are run in.
> 
> Did you try running all the (offline) tests in one go under Python 3.2?
> 
> Peter


From b.invergo at gmail.com  Fri Aug  5 13:58:27 2011
From: b.invergo at gmail.com (Brandon Invergo)
Date: Fri, 05 Aug 2011 15:58:27 +0200
Subject: [Biopython-dev] Leaked handles in PAML unit tests
In-Reply-To: <CAKVJ-_6NT1YO4MzagtF0gUsvZqc+9c2AfDN+iRLn_QAt=sUTsg@mail.gmail.com>
References: <CAKVJ-_7L3fCbEPj0iGUzy_1sq4x3Y3Y7-k7F9+qhhWkcbnHFFg@mail.gmail.com>
	<1312366681.1302.9.camel@localhost.localdomain>
	<CAKVJ-_6v+ntM2_6adhZQ_UbX4=NDNL6iDNp2vjOddXXLFhPTMQ@mail.gmail.com>
	<1312472309.8916.15.camel@localhost.localdomain>
	<CAKVJ-_6+P2g5A5PRf6xpevzcNP3JLYQj2hE5e+G+tPEQ2AtF=w@mail.gmail.com>
	<1312478530.8916.20.camel@localhost.localdomain>
	<CAKVJ-_6NT1YO4MzagtF0gUsvZqc+9c2AfDN+iRLn_QAt=sUTsg@mail.gmail.com>
Message-ID: <1312552714.8916.28.camel@localhost.localdomain>

OK, the leaks have been taken care of. The problem arose when an
exception was raised within a block of code in which a file handle was
still open. I simply had to close the handle just before raising the
exception. There was another leak, however, which came from using
stdout=open('/dev/null', 'w') in the subprocess.call() to the PAML programs
(which, come to think of it, is *nix-specific anyway and probably
wouldn't work on Windows). Instead, I set stdout to subprocess.PIPE
and got rid of the /dev/null handle altogether.
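
A rough sketch of both fixes, not the actual PAML wrapper code. The
function names are hypothetical. Note one deliberate difference: instead
of subprocess.PIPE it discards output through os.devnull, since the
Python documentation cautions against PIPE with subprocess.call() (the
child can block once the unread pipe fills up):

```python
import os
import subprocess
import sys


def read_first_line(path):
    """Raise on an empty file without leaving the handle open."""
    # The with-block closes the handle even when the exception propagates,
    # so there is no need to call close() by hand before raising.
    with open(path) as handle:
        line = handle.readline()
        if not line:
            raise ValueError("empty file: %s" % path)
    return line.rstrip("\n")


def run_quietly(args):
    """Run a command and discard its output on any platform."""
    # os.devnull is '/dev/null' on POSIX and 'nul' on Windows.
    with open(os.devnull, "w") as sink:
        return subprocess.call(args, stdout=sink, stderr=sink)


if __name__ == "__main__":
    # Uses the running interpreter as a harmless stand-in for a PAML binary.
    print(run_quietly([sys.executable, "-c", "print('hello')"]))
```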

Cheers,
Brandon


On Thu, 2011-08-04 at 18:32 +0100, Peter Cock wrote:
> On Thu, Aug 4, 2011 at 6:22 PM, Brandon Invergo <b.invergo at gmail.com> wrote:
> >
> > The above does work nicely for me. So nicely, in fact, that the PAML
> > tests all pass! So I'm still having trouble replicating the leaked
> > handles. I'm still trying to figure out what's happening...
> >
> 
> It could be something silly with warning silencing being global
> and not local, and thus depends on the order the tests are run in.
> 
> Did you try running all the (offline) tests in one go under Python 3.2?
> 
> Peter


From w.arindrarto at gmail.com  Sat Aug  6 09:52:13 2011
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Sat, 6 Aug 2011 11:52:13 +0200
Subject: [Biopython-dev] SeqIO Abi Parser
In-Reply-To: <CAKVJ-_4n8EUJXeGwGGPxAd1yvaXF0uGR_52m0LXVJ5i2Y03JUA@mail.gmail.com>
References: <CADEGkF7qC4Q_KsN-bjOWTyXOu6vw4dzva02H0LWm+kk3xXX6og@mail.gmail.com>
	<CADEGkF5HBL2nMTN=kNhS9Hs_8UiWHs3CXw36=yHUiaVgQwQ4hg@mail.gmail.com>
	<CAKVJ-_7HOhEAn+XwQes+2h=7CwFR10FtefOHZqXMU5xo97D0tA@mail.gmail.com>
	<CADEGkF6E=PDCbqnJ60qsYpt231CrS6S1EWghx7tdFZ_gwAi7QQ@mail.gmail.com>
	<CAKVJ-_6LiJ-j5md_+Fw6KyJoaFqCeRSGPwwbcFaO=PJWNdqNmg@mail.gmail.com>
	<CADEGkF7imzYii6qq7D=3r68uAumu_ZzXaQpC9eyg_8prYcDoHw@mail.gmail.com>
	<CAKVJ-_5WAL+XbC=br4fp5gRnd1DCqoPbidqtZeHi6sM0JS+KXQ@mail.gmail.com>
	<CADEGkF662Xs0cmFEa7We+mVyPB5VQkKAO2yYpERBbRJ0ZYQj4Q@mail.gmail.com>
	<CAKVJ-_7xNgb5xUX6VTZcWoakkUXOkUKv93KOs5NYYEOCTQgkuQ@mail.gmail.com>
	<CAKVJ-_4FpFYyVgOL3Rod1QR1DhuFk3rn6yqWKxUq2+8+OBnpJw@mail.gmail.com>
	<CADEGkF6GDqgK9SOmUe_fQsjv9RU_dZwG_E0QnfD7nAKXUOD8yg@mail.gmail.com>
	<CAKVJ-_6LT+VJStu_sN5mfUbGTuTChMffxOw0PsJ6hTSq-fgeqA@mail.gmail.com>
	<CADEGkF4RK5tV5gfX-0Rm1Q7wOa5DXFcX1JZjNjjwC0fukO5ekg@mail.gmail.com>
	<CAKVJ-_4F7En085bcLFXO3cc_xkq632gY9f=NDi=vCm0p1NWZvw@mail.gmail.com>
	<CAKVJ-_7DdjXjme+n3bkQo5BQk0=7V_Czb=c3Fym38aOHY2V=CQ@mail.gmail.com>
	<CADEGkF5aRyD97NxXzUROUZxVqA7xNoEHi4_8Kcs_fKcXGOPDZg@mail.gmail.com>
	<CAKVJ-_6PdooLMb=mvaskFTjtriLQ1LNikhqJXho7m1k1jA1WtQ@mail.gmail.com>
	<CADEGkF4zevZ7Mhy_viayMJ+mUrS1VXjf7q1zApEU=5dUJWjqbQ@mail.gmail.com>
	<CAKVJ-_5uXgdu8C8qqnhEcXT1G9kTFkJsS82VJ0M1=9q9QOVeOw@mail.gmail.com>
	<CADEGkF4fYp_J7hdqCs+BQWdc=5mvs=iAN1v8AJf88nDL1hJw4A@mail.gmail.com>
	<CAKVJ-_4n8EUJXeGwGGPxAd1yvaXF0uGR_52m0LXVJ5i2Y03JUA@mail.gmail.com>
Message-ID: <CADEGkF7tXn6VqW7SGS9iFcyMDbeEOFfgjUBcCao4OH075hQDdw@mail.gmail.com>

Hi Peter & everyone,

I've been trying to improve the parser so it works with forward-only
handles, but I'm drawing a blank for now.

I realized the reason I use seek in the first place was because of the file
structure. In an Abi file we've got three data blocks: the header that
contains the file information, the sequencing data, and the directories
which serve as indexes to the sequencing data. To unpack the sequencing data
bytes, we need the information stored in the directories. Depending on its
size, it could be stored outside the directories block, or in the directory
itself. This is why .seek() helps, because it allows for jumping between the
directories and the sequencing data as it is being parsed.

Now, I thought the three blocks were stored in this order: header -
directory - sequencing data. I've thought of a way of parsing the file if
the structure is like this. As it turns out, it's possible (or even this
might be the norm) that the order is: header - sequencing data - directory.
So as soon as I finished parsing the information on how to retrieve the data
from the directories, I've already gone past the data block. In forward-only
handles, this makes the data irretrievable.

There should be other ways to retrieve the sequencing data in forward-only
handles. I thought about reading the entire handle stream first and storing
it into a variable. This way, we could replace seek() with slicing
operators. The trade off is we store the entire handle stream in memory at
once (abi files are probably ~300-500kb in size). I'm sure there are other
ways, but I couldn't think of any now.
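
The read-everything-then-slice idea could look roughly like this. The
layout below is a made-up toy format (magic, directory offset, data,
then a one-entry directory), not the real ABI layout, and all names are
hypothetical:

```python
import io
import struct

HEADER = struct.Struct(">4sI")  # magic, offset of the directory
ENTRY = struct.Struct(">II")    # offset and length of the data block


def build_toy_file(payload):
    """Write header, then data, then directory (the awkward order)."""
    data_offset = HEADER.size
    dir_offset = data_offset + len(payload)
    return (HEADER.pack(b"TOY1", dir_offset)
            + payload
            + ENTRY.pack(data_offset, len(payload)))


def read_payload(handle):
    """Parse using only handle.read(), never seek() or tell()."""
    blob = handle.read()  # the whole stream is held in memory
    magic, dir_offset = HEADER.unpack_from(blob, 0)
    if magic != b"TOY1":
        raise ValueError("not a toy file")
    data_offset, length = ENTRY.unpack_from(blob, dir_offset)
    # Slicing the in-memory bytes takes the place of seek() plus read().
    return blob[data_offset:data_offset + length]


if __name__ == "__main__":
    print(read_payload(io.BytesIO(build_toy_file(b"ACGT"))))
```

Because the directory is looked up by offset in the buffer, it no longer
matters whether it comes before or after the data it describes.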

So what do you think? Or does anyone else have ideas that I could try?

Regards & have a nice weekend all,
---
Wibowo Arindrarto (bow)
http://bow.web.id


On Thu, Aug 4, 2011 at 13:47, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> On Thu, Aug 4, 2011 at 12:30 PM, Wibowo Arindrarto
> <w.arindrarto at gmail.com> wrote:
> > Hi Peter,
> > Ah yes, I didn't know there could be handles without .seek() and .tell(),
> > and I thought those two are the proper way of traversing files, so I used
> > them. I also didn't realize you could use SeqIO with network handles, too.
> > This is really neat :).
>
> Yes - having a handle focused API makes some clever stuff possible :)
> Of course, parsing sequences directly from network handles isn't always
> a good idea, but it can be useful.
>
> > In any case, sure, I'd love to make some changes to the current AbiIO code
> > so it works without .seek() and .tell(). Are there any other input types
> > that do not support .seek() and .tell() other than network handles?
>
> I suspect some specialised handles for accessing compressed files might
> have similar limitations. In the case of gzip at least, I think it does
> support seek and tell.
>
> > Here's my new branch from the current master:
> > https://github.com/bow/biopython/tree/seqio-abi_handlefix
> > nothing different for now but I'll push my updates soon.
>
> Don't rush yourself - I'm away for a long weekend so won't be testing
> any updates till next week anyway.
>
> Thanks,
>
> Peter
>


From derjogi at web.de  Sun Aug  7 13:44:03 2011
From: derjogi at web.de (Jogi)
Date: Sun, 07 Aug 2011 15:44:03 +0200
Subject: [Biopython-dev] Bug in Bio.Restriction.Analysis.print_as('map') +
	correction
Message-ID: <1312724643.2148.5.camel@JogiDesk>

I'm new to the field of 'bug reporting', so please, if someone knows
where I should post this message please tell me or do it yourself :)

I've found a bug in the Bio.Restriction module when calling
Analysis.print_as('map').

The bugs (that I know of and that I corrected):
1. When there is a restriction site within the first 60 basepairs in the
sequence this one isn't added to a list and thus raises a KeyError: 0

2. Sometimes (I don't know exactly how to reproduce it any more) an
Enzyme is repeated in every line although there is no restriction site.

Solution:

Replace from line 310 in PrintFormat.py:
        x, counter, length = 0, 0, len(self.sequence)
        for x in xrange(60, length, 60):
            counter = x - 60
            l=[]
            for key in mapping:
                if key <= x:
                    l.append(key)
                else:
                    cutloc[counter] = l
                    mapping = mapping[mapping.index(key):]
                    break
            cutloc[x] = l
        cutloc[x] = mapping
        sequence = self.sequence.tostring()

With
        upper, lower, length = 0, 0, len(self.sequence)
        for upper in xrange(60, length+60, 60):
            lower = upper - 60
            l=[]
            for key in mapping:
                if key <= upper and key > lower:
                    l.append(key)
                else:
                    mapping = mapping[mapping.index(key):]
                    break
            cutloc[lower] = l
        sequence = self.sequence.tostring()
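
For reference, the windowing that the replacement code performs can be
sketched as a standalone function. This is only an illustration of the
binning rule (lower < position <= upper), written for Python 3; the
function name and signature are hypothetical and not part of
PrintFormat.py:

```python
def bin_cut_positions(positions, length, width=60):
    """Group cut positions by the line of the map they fall on.

    Keys are each line's start offset (0, 60, 120, ...). A position p is
    placed on the line where lower < p <= lower + width, as in the patch.
    """
    cutloc = {}
    ordered = sorted(positions)
    for lower in range(0, length, width):
        upper = lower + width
        # Every line gets an entry, even an empty one, so looking up
        # cutloc[0] can no longer raise KeyError.
        cutloc[lower] = [p for p in ordered if lower < p <= upper]
    return cutloc


if __name__ == "__main__":
    print(bin_cut_positions([10, 60, 61, 130], 150))
```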


Hope this bug report/solution was/is helpful and at the right place :)
J.Kuhn


From p.j.a.cock at googlemail.com  Tue Aug  9 13:40:18 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 9 Aug 2011 14:40:18 +0100
Subject: [Biopython-dev] SeqIO Abi Parser
In-Reply-To: <CADEGkF7tXn6VqW7SGS9iFcyMDbeEOFfgjUBcCao4OH075hQDdw@mail.gmail.com>
References: <CADEGkF7qC4Q_KsN-bjOWTyXOu6vw4dzva02H0LWm+kk3xXX6og@mail.gmail.com>
	<CADEGkF5HBL2nMTN=kNhS9Hs_8UiWHs3CXw36=yHUiaVgQwQ4hg@mail.gmail.com>
	<CAKVJ-_7HOhEAn+XwQes+2h=7CwFR10FtefOHZqXMU5xo97D0tA@mail.gmail.com>
	<CADEGkF6E=PDCbqnJ60qsYpt231CrS6S1EWghx7tdFZ_gwAi7QQ@mail.gmail.com>
	<CAKVJ-_6LiJ-j5md_+Fw6KyJoaFqCeRSGPwwbcFaO=PJWNdqNmg@mail.gmail.com>
	<CADEGkF7imzYii6qq7D=3r68uAumu_ZzXaQpC9eyg_8prYcDoHw@mail.gmail.com>
	<CAKVJ-_5WAL+XbC=br4fp5gRnd1DCqoPbidqtZeHi6sM0JS+KXQ@mail.gmail.com>
	<CADEGkF662Xs0cmFEa7We+mVyPB5VQkKAO2yYpERBbRJ0ZYQj4Q@mail.gmail.com>
	<CAKVJ-_7xNgb5xUX6VTZcWoakkUXOkUKv93KOs5NYYEOCTQgkuQ@mail.gmail.com>
	<CAKVJ-_4FpFYyVgOL3Rod1QR1DhuFk3rn6yqWKxUq2+8+OBnpJw@mail.gmail.com>
	<CADEGkF6GDqgK9SOmUe_fQsjv9RU_dZwG_E0QnfD7nAKXUOD8yg@mail.gmail.com>
	<CAKVJ-_6LT+VJStu_sN5mfUbGTuTChMffxOw0PsJ6hTSq-fgeqA@mail.gmail.com>
	<CADEGkF4RK5tV5gfX-0Rm1Q7wOa5DXFcX1JZjNjjwC0fukO5ekg@mail.gmail.com>
	<CAKVJ-_4F7En085bcLFXO3cc_xkq632gY9f=NDi=vCm0p1NWZvw@mail.gmail.com>
	<CAKVJ-_7DdjXjme+n3bkQo5BQk0=7V_Czb=c3Fym38aOHY2V=CQ@mail.gmail.com>
	<CADEGkF5aRyD97NxXzUROUZxVqA7xNoEHi4_8Kcs_fKcXGOPDZg@mail.gmail.com>
	<CAKVJ-_6PdooLMb=mvaskFTjtriLQ1LNikhqJXho7m1k1jA1WtQ@mail.gmail.com>
	<CADEGkF4zevZ7Mhy_viayMJ+mUrS1VXjf7q1zApEU=5dUJWjqbQ@mail.gmail.com>
	<CAKVJ-_5uXgdu8C8qqnhEcXT1G9kTFkJsS82VJ0M1=9q9QOVeOw@mail.gmail.com>
	<CADEGkF4fYp_J7hdqCs+BQWdc=5mvs=iAN1v8AJf88nDL1hJw4A@mail.gmail.com>
	<CAKVJ-_4n8EUJXeGwGGPxAd1yvaXF0uGR_52m0LXVJ5i2Y03JUA@mail.gmail.com>
	<CADEGkF7tXn6VqW7SGS9iFcyMDbeEOFfgjUBcCao4OH075hQDdw@mail.gmail.com>
Message-ID: <CAKVJ-_6YnaVhnfux31NQq0kx7UDFWLYYxNRJvicEwMf=sGy=4w@mail.gmail.com>

On Sat, Aug 6, 2011 at 10:52 AM, Wibowo Arindrarto
<w.arindrarto at gmail.com> wrote:
> Hi Peter & everyone,
> I've been trying to improve the parser so it works with forward-only
> handles, but I'm drawing a blank for now.
> I realized the reason I use seek in the first place was because of the file
> structure. In an Abi file we've got three data blocks: the header that
> contains the file information, the sequencing data, and the directories
> which serve as indexes to the sequencing data. To unpack the sequencing data
> bytes, we need the information stored in the directories. Depending on its
> size, it could be stored outside the directories block, or in the directory
> itself. This is why .seek() helps, because it allows for jumping between the
> directories and the sequencing data as it is being parsed.

Yes - this design makes sense, especially given the computer
capabilities back when the format was designed.

> Now, I thought the three blocks were stored in this order: header -
> directory - sequencing data. I've thought of a way of parsing the file if
> the structure is like this. As it turns out, it's possible (or even this
> might be the norm) that the order is: header - sequencing data - directory.
> So as soon as I finished parsing the information on how to retrieve the data
> from the directories, I've already gone past the data block. In forward-only
> handles, this makes the data irretrievable.

I see now, that is unfortunate. I presume the current order was chosen
to make writing the data easy (do the directory last). A simple forward
only parser would be possible IF the data was reordered, but we can't
require that.

> There should be other ways to retrieve the sequencing data in forward-only
> handles. I thought about reading the entire handle stream first and storing
> it into a variable. This way, we could replace seek() with slicing
> operators. The trade off is we store the entire handle stream in memory at
> once (abi files are probably ~300-500kb in size). I'm sure there are other
> ways, but I couldn't think of any now.
> So what do you think? Or maybe anyone else have ideas that I could try?
> Regards & have a nice weekend all,

I think we have to accept that typical ABI files are not suitable for forward
only parsing. Thanks for looking into this - I hope you found it interesting.

Regards,

Peter


From redmine at redmine.open-bio.org  Tue Aug  9 14:29:53 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Tue, 9 Aug 2011 14:29:53 +0000
Subject: [Biopython-dev] [Biopython - Bug #3278] (New) SeqIO tries to use
	Gapped without import
Message-ID: <redmine.issue-3278.20110809142953@redmine.open-bio.org>


Issue #3278 has been reported by Paul Agapow.

----------------------------------------
Bug #3278: SeqIO tries to use Gapped without import
https://redmine.open-bio.org/issues/3278

Author: Paul Agapow
Status: New
Priority: Normal
Assignee: Biopython Dev Mailing List
Category: Main Distribution
Target version: 1.53
URL: 


@to_alignment@ in @SeqIO@ uses @Gapped@ (@ isinstance(alphabet, Gapped)@) but does not actually import @Gapped@. Thus a @NameError@ results. Although the method is labelled obsolete, it is used by @SeqIO@ in write when an @AlignIO@ writer must be used (e.g. when trying to write sequences to a Nexus file).

Solution: @from Bio.Alphabet import Gapped@


----------------------------------------
You have received this notification because this email was added to the New Issue Alert plugin


-- 
You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org



From redmine at redmine.open-bio.org  Tue Aug  9 14:47:22 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Tue, 9 Aug 2011 14:47:22 +0000
Subject: [Biopython-dev] [Biopython - Bug #3278] SeqIO tries to use Gapped
	without import
References: <redmine.issue-3278.20110809142953@redmine.open-bio.org>
Message-ID: <redmine.journal-14664.20110809144722@redmine.open-bio.org>


Issue #3278 has been updated by Peter Cock.


Looking at Biopython 1.53 (December 2009) you appear to be correct.

However, the function was explicitly made obsolete in Biopython 1.54 (with a deprecation warning), and at that point this error did not exist.

Unless there is a related problem in the current release, I will close this report.

Thanks.
----------------------------------------
Bug #3278: SeqIO tries to use Gapped without import
https://redmine.open-bio.org/issues/3278

Author: Paul Agapow
Status: New
Priority: Normal
Assignee: Biopython Dev Mailing List
Category: Main Distribution
Target version: 1.53
URL: 


@to_alignment@ in @SeqIO@ uses @Gapped@ (@ isinstance(alphabet, Gapped)@) but does not actually import @Gapped@. Thus a @NameError@ results. Although the method is labelled obsolete, it is used by @SeqIO@ in write when an @AlignIO@ writer must be used (e.g. when trying to write sequences to a Nexus file).

Solution: @from Bio.Alphabet import Gapped@


From p.j.a.cock at googlemail.com  Tue Aug  9 14:49:30 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 9 Aug 2011 15:49:30 +0100
Subject: [Biopython-dev] Bug in Bio.Restriction.Analysis.print_as('map')
	+ correction
In-Reply-To: <1312724643.2148.5.camel@JogiDesk>
References: <1312724643.2148.5.camel@JogiDesk>
Message-ID: <CAKVJ-_6Fv-Nom7=JbJ9Z0Vc+PQgNkTUqLwDSRp-EeGeWMPCfhA@mail.gmail.com>

On Sun, Aug 7, 2011 at 2:44 PM, Jogi <derjogi at web.de> wrote:
> I'm new to the field of 'bug reporting', so please, if someone knows
> where I should post this message please tell me or do it yourself :)
>
> I've found a bug in the Bio.Restriction module when calling
> Analysis.print_as('map').
>
> The bugs (that I know of and that I corrected):
> 1. When there is a restriction site within the first 60 basepairs in the
> sequence this one isn't added to a list and thus raises an KeyError: 0

Could you give a short example script showing the problem?
It could then be used for a unit test.

> 2. Sometimes (I don't know exactly how to reproduce it any more) an
> Enzyme is repeated in every line although there is no restriction site.

I'm not familiar with that problem - without an example that will be
hard to look into.

Peter


From w.arindrarto at gmail.com  Tue Aug  9 14:59:37 2011
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Tue, 9 Aug 2011 16:59:37 +0200
Subject: [Biopython-dev] SeqIO Abi Parser
In-Reply-To: <CAKVJ-_6YnaVhnfux31NQq0kx7UDFWLYYxNRJvicEwMf=sGy=4w@mail.gmail.com>
References: <CADEGkF7qC4Q_KsN-bjOWTyXOu6vw4dzva02H0LWm+kk3xXX6og@mail.gmail.com>
	<CADEGkF5HBL2nMTN=kNhS9Hs_8UiWHs3CXw36=yHUiaVgQwQ4hg@mail.gmail.com>
	<CAKVJ-_7HOhEAn+XwQes+2h=7CwFR10FtefOHZqXMU5xo97D0tA@mail.gmail.com>
	<CADEGkF6E=PDCbqnJ60qsYpt231CrS6S1EWghx7tdFZ_gwAi7QQ@mail.gmail.com>
	<CAKVJ-_6LiJ-j5md_+Fw6KyJoaFqCeRSGPwwbcFaO=PJWNdqNmg@mail.gmail.com>
	<CADEGkF7imzYii6qq7D=3r68uAumu_ZzXaQpC9eyg_8prYcDoHw@mail.gmail.com>
	<CAKVJ-_5WAL+XbC=br4fp5gRnd1DCqoPbidqtZeHi6sM0JS+KXQ@mail.gmail.com>
	<CADEGkF662Xs0cmFEa7We+mVyPB5VQkKAO2yYpERBbRJ0ZYQj4Q@mail.gmail.com>
	<CAKVJ-_7xNgb5xUX6VTZcWoakkUXOkUKv93KOs5NYYEOCTQgkuQ@mail.gmail.com>
	<CAKVJ-_4FpFYyVgOL3Rod1QR1DhuFk3rn6yqWKxUq2+8+OBnpJw@mail.gmail.com>
	<CADEGkF6GDqgK9SOmUe_fQsjv9RU_dZwG_E0QnfD7nAKXUOD8yg@mail.gmail.com>
	<CAKVJ-_6LT+VJStu_sN5mfUbGTuTChMffxOw0PsJ6hTSq-fgeqA@mail.gmail.com>
	<CADEGkF4RK5tV5gfX-0Rm1Q7wOa5DXFcX1JZjNjjwC0fukO5ekg@mail.gmail.com>
	<CAKVJ-_4F7En085bcLFXO3cc_xkq632gY9f=NDi=vCm0p1NWZvw@mail.gmail.com>
	<CAKVJ-_7DdjXjme+n3bkQo5BQk0=7V_Czb=c3Fym38aOHY2V=CQ@mail.gmail.com>
	<CADEGkF5aRyD97NxXzUROUZxVqA7xNoEHi4_8Kcs_fKcXGOPDZg@mail.gmail.com>
	<CAKVJ-_6PdooLMb=mvaskFTjtriLQ1LNikhqJXho7m1k1jA1WtQ@mail.gmail.com>
	<CADEGkF4zevZ7Mhy_viayMJ+mUrS1VXjf7q1zApEU=5dUJWjqbQ@mail.gmail.com>
	<CAKVJ-_5uXgdu8C8qqnhEcXT1G9kTFkJsS82VJ0M1=9q9QOVeOw@mail.gmail.com>
	<CADEGkF4fYp_J7hdqCs+BQWdc=5mvs=iAN1v8AJf88nDL1hJw4A@mail.gmail.com>
	<CAKVJ-_4n8EUJXeGwGGPxAd1yvaXF0uGR_52m0LXVJ5i2Y03JUA@mail.gmail.com>
	<CADEGkF7tXn6VqW7SGS9iFcyMDbeEOFfgjUBcCao4OH075hQDdw@mail.gmail.com>
	<CAKVJ-_6YnaVhnfux31NQq0kx7UDFWLYYxNRJvicEwMf=sGy=4w@mail.gmail.com>
Message-ID: <CADEGkF4JQEisxGfR-yMCfx1v=MZW=VGAmoEg2BPeXfRwRR3qoA@mail.gmail.com>

Hi Peter,

You're welcome :)! Although a bit disappointing, it was nice when I
understood why my forward parser didn't work.

Regards,
---
Wibowo Arindrarto (bow)
http://bow.web.id


On Tue, Aug 9, 2011 at 15:40, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> On Sat, Aug 6, 2011 at 10:52 AM, Wibowo Arindrarto
> <w.arindrarto at gmail.com> wrote:
> > Hi Peter & everyone,
> > I've been trying to improve the parser so it works with forward-only
> > handles, but I'm drawing a blank for now.
> > I realized the reason I use seek in the first place was because of the file
> > structure. In an Abi file we've got three data blocks: the header that
> > contains the file information, the sequencing data, and the directories
> > which serve as indexes to the sequencing data. To unpack the sequencing data
> > bytes, we need the information stored in the directories. Depending on its
> > size, it could be stored outside the directories block, or in the directory
> > itself. This is why .seek() helps, because it allows for jumping between the
> > directories and the sequencing data as it is being parsed.
>
> Yes - this design makes sense, especially given the computer
> capabilities back when the format was designed.
>
> > Now, I thought the three blocks were stored in this order: header -
> > directory - sequencing data. I've thought of a way of parsing the file if
> > the structure is like this. As it turns out, it's possible (or even this
> > might be the norm) that the order is: header - sequencing data - directory.
> > So as soon as I finished parsing the information on how to retrieve the data
> > from the directories, I've already gone past the data block. In forward-only
> > handles, this makes the data irretrievable.
>
> I see now, that is unfortunate. I presume the current order was chosen
> to make writing the data easy (do the directory last). A simple forward
> only parser would be possible IF the data was reordered, but we can't
> require that.
>
> > There should be other ways to retrieve the sequencing data in forward-only
> > handles. I thought about reading the entire handle stream first and storing
> > it into a variable. This way, we could replace seek() with slicing
> > operators. The trade off is we store the entire handle stream in memory at
> > once (abi files are probably ~300-500kb in size). I'm sure there are other
> > ways, but I couldn't think of any now.
> > So what do you think? Or maybe anyone else have ideas that I could try?
> > Regards & have a nice weekend all,
>
> I think we have to accept that typical ABI files are not suitable for forward
> only parsing. Thanks for looking into this - I hope you found it interesting.
>
> Regards,
>
> Peter
>


From redmine at redmine.open-bio.org  Tue Aug  9 15:48:06 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Tue, 9 Aug 2011 15:48:06 +0000
Subject: [Biopython-dev] [Biopython - Bug #3278] (Closed) SeqIO tries to use
	Gapped without import
References: <redmine.issue-3278.20110809142953@redmine.open-bio.org>
Message-ID: <redmine.journal-14665.20110809154806@redmine.open-bio.org>


Issue #3278 has been updated by Peter Cock.

Status changed from New to Closed
% Done changed from 0 to 100

I realised this deprecated function was due for removal; it will be gone in Biopython 1.58:
https://github.com/biopython/biopython/commit/9eb934ee0425b4636b26f310a0f1454f53745b17

Marking this bug as closed.
----------------------------------------
Bug #3278: SeqIO tries to use Gapped without import
https://redmine.open-bio.org/issues/3278

Author: Paul Agapow
Status: Closed
Priority: Normal
Assignee: Biopython Dev Mailing List
Category: Main Distribution
Target version: 1.53
URL: 


@to_alignment@ in @SeqIO@ uses @Gapped@ (@ isinstance(alphabet, Gapped)@) but does not actually import @Gapped@. Thus a @NameError@ results. Although the method is labelled obsolete, it is used by @SeqIO@ in write when an @AlignIO@ writer must be used (e.g. when trying to write sequences to a Nexus file).

Solution: @from Bio.Alphabet import Gapped@


From p.j.a.cock at googlemail.com  Wed Aug 10 17:12:25 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 10 Aug 2011 18:12:25 +0100
Subject: [Biopython-dev] pypaml
In-Reply-To: <AANLkTikcSMo6uVZu9fx_FC8Z-s-1JFN4pH=O2+GkRcm6@mail.gmail.com>
References: <AANLkTikFsKt+RqO4b_ep9xxdfNtxbNJfigUMphd5OusA@mail.gmail.com>
	<AANLkTi=4vrti-J6HhVZEV8L7n1N8LSMra40HAp7Qst3J@mail.gmail.com>
	<AANLkTikKU0n7Hei=X9YnLjgO7YnZ88Ebz5-as8Zq1CXY@mail.gmail.com>
	<AANLkTin5=OCSDtHSEuyPkdbHBTHuC9Z8G6Gcw8L-kOhu@mail.gmail.com>
	<AANLkTikcSMo6uVZu9fx_FC8Z-s-1JFN4pH=O2+GkRcm6@mail.gmail.com>
Message-ID: <CAKVJ-_6m2Vg4NVLXs3w6QRvvopLQ8sH8uoMEomitKch86M-kLw@mail.gmail.com>

On Fri, Jan 14, 2011 at 2:11 PM, Brandon Invergo <b.invergo at gmail.com> wrote:
>> By the way, have you ever tried using this under Windows?
>
> I haven't yet but by the looks of it it should work fine assuming the
> programs are in the system path and thus can be called by name from
> any location in the file system. I see one line where I accidentally
> made it *nix-specific (default working directory is "./") but other
> than that, all files/directories are located via os.path or by
> user-inputted strings (as they would be in the control file). I have
> both a Linux and a Windows 7 machine at home though so I can do some
> testing. Obviously the unit tests here will help catch system-specific
> errors such as entering file locations incorrectly (I can see a few
> exceptions that I'm currently not handling).

Hi Brandon,

Have you looked into PAML under Windows yet?

Regards,

Peter


From b.invergo at gmail.com  Wed Aug 10 17:16:08 2011
From: b.invergo at gmail.com (Brandon Invergo)
Date: Wed, 10 Aug 2011 19:16:08 +0200
Subject: [Biopython-dev] pypaml
In-Reply-To: <CAKVJ-_6m2Vg4NVLXs3w6QRvvopLQ8sH8uoMEomitKch86M-kLw@mail.gmail.com>
References: <AANLkTikFsKt+RqO4b_ep9xxdfNtxbNJfigUMphd5OusA@mail.gmail.com>
	<AANLkTi=4vrti-J6HhVZEV8L7n1N8LSMra40HAp7Qst3J@mail.gmail.com>
	<AANLkTikKU0n7Hei=X9YnLjgO7YnZ88Ebz5-as8Zq1CXY@mail.gmail.com>
	<AANLkTin5=OCSDtHSEuyPkdbHBTHuC9Z8G6Gcw8L-kOhu@mail.gmail.com>
	<AANLkTikcSMo6uVZu9fx_FC8Z-s-1JFN4pH=O2+GkRcm6@mail.gmail.com>
	<CAKVJ-_6m2Vg4NVLXs3w6QRvvopLQ8sH8uoMEomitKch86M-kLw@mail.gmail.com>
Message-ID: <1312996570.1339.12.camel@localhost.localdomain>

On Wed, 2011-08-10 at 18:12 +0100, Peter Cock wrote:
> Hi Brandon,
> 
> Have you looked into PAML under Windows yet?
> 
> Regards,
> 
> Peter

Hi Peter,
Unfortunately, I don't have a Windows machine at my disposal to test it
on! Has anyone reported any problems yet?

-brandon


From p.j.a.cock at googlemail.com  Thu Aug 11 11:36:41 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 11 Aug 2011 12:36:41 +0100
Subject: [Biopython-dev] pypaml
In-Reply-To: <1312996570.1339.12.camel@localhost.localdomain>
References: <AANLkTikFsKt+RqO4b_ep9xxdfNtxbNJfigUMphd5OusA@mail.gmail.com>
	<AANLkTi=4vrti-J6HhVZEV8L7n1N8LSMra40HAp7Qst3J@mail.gmail.com>
	<AANLkTikKU0n7Hei=X9YnLjgO7YnZ88Ebz5-as8Zq1CXY@mail.gmail.com>
	<AANLkTin5=OCSDtHSEuyPkdbHBTHuC9Z8G6Gcw8L-kOhu@mail.gmail.com>
	<AANLkTikcSMo6uVZu9fx_FC8Z-s-1JFN4pH=O2+GkRcm6@mail.gmail.com>
	<CAKVJ-_6m2Vg4NVLXs3w6QRvvopLQ8sH8uoMEomitKch86M-kLw@mail.gmail.com>
	<1312996570.1339.12.camel@localhost.localdomain>
Message-ID: <CAKVJ-_5HL33L8hzDE9Ht-Yy=4q0bN_GH9qrvJbUXvLvuPfvEfw@mail.gmail.com>

On Wed, Aug 10, 2011 at 6:16 PM, Brandon Invergo <b.invergo at gmail.com> wrote:
> On Wed, 2011-08-10 at 18:12 +0100, Peter Cock wrote:
>> Hi Brandon,
>>
>> Have you looked into PAML under Windows yet?
>>
>> Regards,
>>
>> Peter
>
> Hi Peter,
> Unfortunately, I don't have a Windows machine at my disposal to test it
> on! Has anyone reported any problems yet?
>
> -brandon

Hi Brandon,

It's a shame you don't still have access to the Windows 7 box.

I've just grabbed the current PAML 4.4 pre-compiled for Windows
and put it on my Windows machine which runs as a buildslave,
and put the binaries on the PATH:

http://abacus.gene.ucl.ac.uk/software/paml.html
http://abacus.gene.ucl.ac.uk/software/paml4.4e.tar.gz

None of the current unit tests actually use the binaries, do they?
Could you add a basic test (in a separate file which raises the
missing dependency exception to skip the test if the binary is
not on the path) for calling the tools?

Peter


From b.invergo at gmail.com  Thu Aug 11 11:51:26 2011
From: b.invergo at gmail.com (Brandon Invergo)
Date: Thu, 11 Aug 2011 13:51:26 +0200
Subject: [Biopython-dev] pypaml
In-Reply-To: <CAKVJ-_5HL33L8hzDE9Ht-Yy=4q0bN_GH9qrvJbUXvLvuPfvEfw@mail.gmail.com>
References: <AANLkTikFsKt+RqO4b_ep9xxdfNtxbNJfigUMphd5OusA@mail.gmail.com>
	<AANLkTi=4vrti-J6HhVZEV8L7n1N8LSMra40HAp7Qst3J@mail.gmail.com>
	<AANLkTikKU0n7Hei=X9YnLjgO7YnZ88Ebz5-as8Zq1CXY@mail.gmail.com>
	<AANLkTin5=OCSDtHSEuyPkdbHBTHuC9Z8G6Gcw8L-kOhu@mail.gmail.com>
	<AANLkTikcSMo6uVZu9fx_FC8Z-s-1JFN4pH=O2+GkRcm6@mail.gmail.com>
	<CAKVJ-_6m2Vg4NVLXs3w6QRvvopLQ8sH8uoMEomitKch86M-kLw@mail.gmail.com>
	<1312996570.1339.12.camel@localhost.localdomain>
	<CAKVJ-_5HL33L8hzDE9Ht-Yy=4q0bN_GH9qrvJbUXvLvuPfvEfw@mail.gmail.com>
Message-ID: <1313063488.1339.28.camel@localhost.localdomain>

On Thu, 2011-08-11 at 12:36 +0100, Peter Cock wrote:
> It's a shame you don't still have access to the Windows 7 box.
> 
> I've just grabbed the current PAML 4.4 pre-compiled for Windows
> and put it on my Windows machine which runs as a buildslave,
> and put the binaries on the PATH:
> 
> http://abacus.gene.ucl.ac.uk/software/paml.html
> http://abacus.gene.ucl.ac.uk/software/paml4.4e.tar.gz
> 
> None of the current unit tests actually use the binaries do they?
> Could you add a basic test (in a separate file which raises the
> missing dependency exception to skip the test if the binary is
> not on the path) for calling the tools?
> 
> Peter

No, I didn't include any tests that use the binaries because I wasn't
sure if they would be on the main test machine. Also, generating the
output which is used in other tests can take a lot of time in some
cases. Instead, I've generated the output files myself and then accessed
those from the tests. The one problem I have with this approach is that
it's not very reproducible; if someone else wishes to add data files
from later versions of PAML, they won't know how I generated them. Again
the goal is to make sure that we're parsing each new version correctly,
since the output format has been known to change between versions. I
could create a readme file which contains the info and put it in the
paml Tests subfolder. Sound reasonable?

I can create a Tests/test_PAML.py file to contain the proposed test. In
it, I can try to run codeml, baseml and yn00 directly using subprocess,
each on some bogus input. If the binaries are there, they'll throw an
error which the test will catch. If they aren't, subprocess itself will
throw an error. I can't do this check using Bio.Phylo.PAML because we,
of course, aim to prevent bogus input from ever even reaching the
binary.
How does that sound? Is that what you had in mind?
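
A rough sketch of the check described above, assuming Python 3 and only
the standard library. The helper name binary_is_callable and the file name
bogus.ctl are made up for illustration; this is not the code that ended up
in the test suite.

```python
import subprocess
import sys


def binary_is_callable(command, timeout=30):
    """Return True if the executable in `command` can be launched at all.

    The exit status is ignored on purpose: a tool that starts and then
    complains about bogus input still proves that the binary is present.
    """
    try:
        subprocess.run(
            command,
            stdin=subprocess.DEVNULL,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # The process started but did not finish in time, so it exists.
        return True
    except OSError:
        # Raised by subprocess itself when the executable cannot be
        # started (FileNotFoundError is a subclass of OSError).
        return False
    return True


if __name__ == "__main__":
    for tool in ("codeml", "baseml", "yn00"):
        print(tool, binary_is_callable([tool, "bogus.ctl"]))
```

A missing binary shows up as an OSError from subprocess, while a present
binary fed nonsense simply returns a non-zero exit status.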

-brandon


From p.j.a.cock at googlemail.com  Thu Aug 11 13:49:39 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 11 Aug 2011 14:49:39 +0100
Subject: [Biopython-dev] pypaml
In-Reply-To: <1313063488.1339.28.camel@localhost.localdomain>
References: <AANLkTikFsKt+RqO4b_ep9xxdfNtxbNJfigUMphd5OusA@mail.gmail.com>
	<AANLkTi=4vrti-J6HhVZEV8L7n1N8LSMra40HAp7Qst3J@mail.gmail.com>
	<AANLkTikKU0n7Hei=X9YnLjgO7YnZ88Ebz5-as8Zq1CXY@mail.gmail.com>
	<AANLkTin5=OCSDtHSEuyPkdbHBTHuC9Z8G6Gcw8L-kOhu@mail.gmail.com>
	<AANLkTikcSMo6uVZu9fx_FC8Z-s-1JFN4pH=O2+GkRcm6@mail.gmail.com>
	<CAKVJ-_6m2Vg4NVLXs3w6QRvvopLQ8sH8uoMEomitKch86M-kLw@mail.gmail.com>
	<1312996570.1339.12.camel@localhost.localdomain>
	<CAKVJ-_5HL33L8hzDE9Ht-Yy=4q0bN_GH9qrvJbUXvLvuPfvEfw@mail.gmail.com>
	<1313063488.1339.28.camel@localhost.localdomain>
Message-ID: <CAKVJ-_4aJ4xg1DzP-NmwPNFqLV+iJPZVGxdZNyu-DpCF3eJdng@mail.gmail.com>

On Thu, Aug 11, 2011 at 12:51 PM, Brandon Invergo <b.invergo at gmail.com> wrote:
> On Thu, 2011-08-11 at 12:36 +0100, Peter Cock wrote:
>> It's a shame you don't still have access to the Windows 7 box.
>>
>> I've just grabbed the current PAML 4.4 pre-compiled for Windows
>> and put it on my Windows machine which runs as a buildslave,
>> and put the binaries on the PATH:
>>
>> http://abacus.gene.ucl.ac.uk/software/paml.html
>> http://abacus.gene.ucl.ac.uk/software/paml4.4e.tar.gz
>>
>> None of the current unit tests actually use the binaries do they?
>> Could you add a basic test (in a separate file which raises the
>> missing dependency exception to skip the test if the binary is
>> not on the path) for calling the tools?
>>
>> Peter
>
> No, I didn't include any tests that use the binaries because I wasn't
> sure if they would be on the main test machine. Also, generating the
> output which is used in other tests can take a lot of time in some
> cases. Instead, I've generated the output files myself and then accessed
> those from the tests. The one problem I have with this approach is that
> it's not very reproducible; if someone else wishes to add data files
> from later versions of PAML, they won't know how I generated them.

Next time there is a PAML release, you'll have to make some more
test files ;)

> Again
> the goal is to make sure that we're parsing each new version correctly,
> since the output format has been known to change between versions. I
> could create a readme file which contains the info and put it in the
> paml Tests subfolder. Sound reasonable?

Yes.

> I can create a Tests/test_PAML.py file to contain the proposed test. In
> it, I can try to run codeml, baseml and yn00 directly using subprocess,
> each on some bogus input. If the binaries are there, they'll throw an
> error which the test will catch. If they aren't, subprocess itself will
> throw an error. I can't do this check using Bio.Phylo.PAML because we,
> of course, aim to prevent bogus input from ever even reaching the
> binary. How does that sound? Is that what you had in mind?

I believe we're thinking on the same lines here - have a look at
test_Muscle_tool.py or test_Emboss.py and others like it. There is
some header code which tries to locate the binaries, and perhaps
check their version.

Some tools have a switch like -v or --help or similar which makes
them immediately exit, sometimes with a version number. This
is less trouble than trying to run them with a dummy input file.
Having had a quick play with ds.exe it generally seems to insist
on asking for an input file, so you may have to go that route. But
see if this is useful - probably you'd need /dev/null on Unix machines:

C:\repositories\biopython\Tests>ds nul
results go into out.txt

(1) collecting min, max, and mean       0:00
(2) variance-covariance matrix      0:00
(3) median, percentiles & serial correlation       0:00
(4) Histograms and 1-D densities


If the binaries are missing or the wrong version, we raise
MissingExternalDependencyError and the test gets skipped.

If the binaries are present (and the right version), use the normal
unittest framework. Try to make the examples quick to run (aim
for well under a minute for the whole test), so smaller datafiles
than might be typical.
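
The skip logic above might look roughly like this (Python 3, standard
library only). The exception class here is a local stand-in for
Biopython's MissingExternalDependencyError so the sketch runs on its own,
and find_binaries is a hypothetical helper, not existing header code.

```python
import shutil


class MissingExternalDependencyError(Exception):
    """Local stand-in for the Biopython exception of the same name."""


def find_binaries(names, which=shutil.which):
    """Map each tool name to its full path, or raise if any is missing.

    `which` is injectable so the lookup can be exercised without PAML
    being installed; by default the PATH is searched.
    """
    found = {name: which(name) for name in names}
    missing = sorted(name for name, path in found.items() if path is None)
    if missing:
        raise MissingExternalDependencyError(
            "Install PAML if you want to use these tools: " + ", ".join(missing)
        )
    return found


if __name__ == "__main__":
    try:
        print(find_binaries(["codeml", "baseml", "yn00"]))
    except MissingExternalDependencyError as err:
        print("Skipping test:", err)
```

Doing the lookup once at the top of the test file keeps the actual
unittest cases free of any "is it installed?" checks.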

Peter


From p.j.a.cock at googlemail.com  Thu Aug 11 16:06:48 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 11 Aug 2011 17:06:48 +0100
Subject: [Biopython-dev] Deprecated code in Bio.PopGen and Bio.Motif ready
	to go?
Message-ID: <CAKVJ-_4fS1JABxHojMX+PATmE0X127=tD+_s+4TGo6X3FGQTuA@mail.gmail.com>

Hi Tiago & Bartek,

Looking over the DEPRECATED file, the following are about due for removal
in Bio.PopGen and Bio.Motif - do you guys have time to make these changes
yourselves?

Thanks,

Peter

> Bio.PopGen.FDist
> ================
> The RecordParser, _Scanner, and _RecordConsumer classes were declared obsolete
> in Release 1.54, and deprecated in Release 1.55 final. Their functionality is
> now available through a read() function.

and:

> Bio.Motif
> =========
> ...
> AlignAceConsumer, AlignAceParser, and AlignAceScanner were declared obsolete
> in Release 1.53 and deprecated in Release 1.55 final; their functionality is
> now available through a read() function in Bio.Motif.Parsers.AlignAce.
> MEMEParser, _MEMEScanner, _MEMEConsumer, _MASTConsumer, MASTParser,
> _MASTScanner, and MASTRecord were declared obsolete in Release 1.54 and
> deprecated in Release 1.55 final; their functionality is now available through
> a read() function in Bio.Motif.Parsers.MEME and Bio.Motif.Parsers.MAST,
> respectively.

P.S. We don't usually need to mention private classes like _MEMEScanner in
the DEPRECATED file.


From tiagoantao at gmail.com  Thu Aug 11 16:15:08 2011
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Thu, 11 Aug 2011 17:15:08 +0100
Subject: [Biopython-dev] Deprecated code in Bio.PopGen and Bio.Motif
	ready to go?
In-Reply-To: <CAKVJ-_4fS1JABxHojMX+PATmE0X127=tD+_s+4TGo6X3FGQTuA@mail.gmail.com>
References: <CAKVJ-_4fS1JABxHojMX+PATmE0X127=tD+_s+4TGo6X3FGQTuA@mail.gmail.com>
Message-ID: <CAA9RGEOwk+R3caBEUyXwPHHRV3VEhBs1tVOggdRLgJ181GPagw@mail.gmail.com>

I will do it over the weekend for bio.popgen

2011/8/11, Peter Cock <p.j.a.cock at googlemail.com>:
> Hi Tiago & Bartek,
>
> Looking over the DEPRECATED file, the following are about due for removal
> in Bio.PopGen and Bio.Motif - do you guys have time to make these changes
> yourselves?
>
> Thanks,
>
> Peter
>
>> Bio.PopGen.FDist
>> ================
>> The RecordParser, _Scanner, and _RecordConsumer classes were declared
>> obsolete
>> in Release 1.54, and deprecated in Release 1.55 final. Their functionality
>> is
>> now available through a read() function.
>
> and:
>
>> Bio.Motif
>> =========
>> ...
>> AlignAceConsumer, AlignAceParser, and AlignAceScanner were declared
>> obsolete
>> in Release 1.53 and deprecated in Release 1.55 final; their functionality
>> is
>> now available through a read() function in Bio.Motif.Parsers.AlignAce.
>> MEMEParser, _MEMEScanner, _MEMEConsumer, _MASTConsumer, MASTParser,
>> _MASTScanner, and MASTRecord were declared obsolete in Release 1.54 and
>> deprecated in Release 1.55 final; their functionality is now available
>> through
>> a read() function in Bio.Motif.Parsers.MEME and Bio.Motif.Parsers.MAST,
>> respectively.
>
> P.S. We don't usually need to mention private classes like _MEMEScanner in
> the DEPRECATED file.
>

-- 
Sent from my mobile device

"If you want to get laid, go to college.  If you want an education, go
to the library." - Frank Zappa


From barwil at gmail.com  Thu Aug 11 16:28:01 2011
From: barwil at gmail.com (Bartek Wilczynski)
Date: Thu, 11 Aug 2011 09:28:01 -0700
Subject: [Biopython-dev] Deprecated code in Bio.PopGen and Bio.Motif
	ready to go?
In-Reply-To: <CAKVJ-_4fS1JABxHojMX+PATmE0X127=tD+_s+4TGo6X3FGQTuA@mail.gmail.com>
References: <CAKVJ-_4fS1JABxHojMX+PATmE0X127=tD+_s+4TGo6X3FGQTuA@mail.gmail.com>
Message-ID: <CABHxouWCaSHhQzj3-hh9dRtN8HJCc_1sZc1XEZGY=G4SMAw-Tg@mail.gmail.com>

Hi,

I'll do the necessary changes in Bio.Motif by the end of the week.

best
Bartek

2011/8/11 Peter Cock <p.j.a.cock at googlemail.com>:
> Hi Tiago & Bartek,
>
> Looking over the DEPRECATED file, the following are about due for removal
> in Bio.PopGen and Bio.Motif - do you guys have time to make these changes
> yourselves?
>
> Thanks,
>
> Peter
>
>> Bio.PopGen.FDist
>> ================
>> The RecordParser, _Scanner, and _RecordConsumer classes were declared obsolete
>> in Release 1.54, and deprecated in Release 1.55 final. Their functionality is
>> now available through a read() function.
>
> and:
>
>> Bio.Motif
>> =========
>> ...
>> AlignAceConsumer, AlignAceParser, and AlignAceScanner were declared obsolete
>> in Release 1.53 and deprecated in Release 1.55 final; their functionality is
>> now available through a read() function in Bio.Motif.Parsers.AlignAce.
>> MEMEParser, _MEMEScanner, _MEMEConsumer, _MASTConsumer, MASTParser,
>> _MASTScanner, and MASTRecord were declared obsolete in Release 1.54 and
>> deprecated in Release 1.55 final; their functionality is now available through
>> a read() function in Bio.Motif.Parsers.MEME and Bio.Motif.Parsers.MAST,
>> respectively.
>
> P.S. We don't usually need to mention private classes like _MEMEScanner in
> the DEPRECATED file.
>


-- 
Bartek Wilczynski
==================
Institute of Informatics
University of Warsaw
http://www.mimuw.edu.pl/~bartek


From redmine at redmine.open-bio.org  Mon Aug 15 09:59:39 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Mon, 15 Aug 2011 09:59:39 +0000
Subject: [Biopython-dev] [Biopython - Bug #3188] (Closed) Test bug,
	please ignore
References: <redmine.issue-3188.20110328134006@redmine.open-bio.org>
Message-ID: <redmine.journal-14672.20110815095939@redmine.open-bio.org>


Issue #3188 has been updated by Peter Cock.

Status changed from New to Closed
% Done changed from 0 to 100

Should have closed this test bug a while ago.
----------------------------------------
Bug #3188: Test bug, please ignore
https://redmine.open-bio.org/issues/3188

Author: Peter Cock
Status: Closed
Priority: Normal
Assignee: Biopython Dev Mailing List
Category: 
Target version: 
URL: 


The aim of this bug is to test the Redmine "Email on New Issue" option from the Newissuealerts module.

This issue should get emailed to the biopython-dev email list automatically...

Peter


-- 
You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org


From p.j.a.cock at googlemail.com  Mon Aug 15 10:04:41 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 15 Aug 2011 11:04:41 +0100
Subject: [Biopython-dev] Release blockers? PAML?
Message-ID: <CAKVJ-_4NJTRBWbSr3cjUX4TT=JFMFYfSLRs0n9tEnbsvxBS-cw@mail.gmail.com>

Hi all,

We're about due to make a Biopython release, and I could
do it early this week - but then I'm away for a fortnight. I am
fortunate to be attending the BioHackathon 2011 in Kyoto
next week, http://2011.biohackathon.org/

I think we're in a good position with the code on the trunk to
release Biopython 1.58, bar the PAML code which has not
yet been tested on Windows. Also, I'd be keen for Tiago and
Brandon to take a look at the application calling code to see
if there is any scope for a more common approach between
the PAML wrappers and the PopGen tools. Note that both
sets of tools are not 'nicely behaved' Unix style tools (which
is what the Bio.Applications API targets). To do anything
useful with these tools you have to do nasty things like
switch the current working directory and so on.
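
As an illustration of the working directory juggling mentioned above,
here is a minimal sketch (Python 3, standard library only). The name
working_directory is made up; it is not part of Bio.Application or of
the PAML and PopGen wrappers.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def working_directory(path):
    """Temporarily switch the current working directory to `path`."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        # Always switch back, even if the wrapped tool call fails.
        os.chdir(previous)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        with working_directory(scratch):
            # A tool that insists on writing next to its control file
            # would be launched here.
            print("running in", os.getcwd())
    print("back in", os.getcwd())
```

Restoring the directory in a finally clause means a crashing tool cannot
leave the calling script stranded in a scratch folder.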

If we want to do the release this week, we could just warn
that the PAML code is considered to be "in beta" and that
the API may well change in non-backwards compatible
ways?

What else should be addressed before the next release?

There are some open bugs, but at first glance nothing
critical.

Regards,

Peter


From b.invergo at gmail.com  Mon Aug 15 10:15:04 2011
From: b.invergo at gmail.com (Brandon Invergo)
Date: Mon, 15 Aug 2011 12:15:04 +0200
Subject: [Biopython-dev] Release blockers? PAML?
In-Reply-To: <CAKVJ-_4NJTRBWbSr3cjUX4TT=JFMFYfSLRs0n9tEnbsvxBS-cw@mail.gmail.com>
References: <CAKVJ-_4NJTRBWbSr3cjUX4TT=JFMFYfSLRs0n9tEnbsvxBS-cw@mail.gmail.com>
Message-ID: <1313403306.3107.5.camel@localhost.localdomain>

Hi, 
Regarding PAML, I'm sorry I haven't implemented the binary tests yet.
I'll put it on my to-do for today. Turns out it's a Spanish national
holiday today so I guess I don't have to go to the lab.  

I have a Windows 7 laptop that up until now has been quarantined and
used only for music software, with no other software allowed on it, not
allowed near the interwebs, etc (it's a fickle machine), but last night
I broke the rules and installed Python 2.7 on it. I'll try running the
PAML tests on it and I'll let everyone know how it goes. 

Until later,
-brandon

On Mon, 2011-08-15 at 11:04 +0100, Peter Cock wrote:
> Hi all,
> 
> We're about due to make a Biopython release, and I could
> do it early this week - but then I'm away for a fortnight. I am
> fortunate to be attending the BioHackathon 2011 in Kyoto
> next week, http://2011.biohackathon.org/
> 
> I think we're in a good position with the code on the trunk to
> release Biopython 1.58, bar the PAML code which has not
> yet been tested on Windows. Also, I'd be keen for Tiago and
> Brandon to take a look at the application calling code to see
> if there is any scope for a more common approach between
> the PAML wrappers and the PopGen tools. Note that both
> sets of tools are not 'nicely behaved' Unix style tools (which
> is what the Bio.Applications API targets). To do anything
> useful with these tools you have to do nasty things like
> switch the current working directory and so on.
> 
> If we want to do the release this week, we could just warn
> that the PAML code is considered to be "in beta" and that
> the API may well change in non-backwards compatible
> ways?
> 
> What else should be addressed before the next release?
> 
> There are some open bugs, but at first glance nothing
> critical.
> 
> Regards,
> 
> Peter


From eric.talevich at gmail.com  Mon Aug 15 15:02:57 2011
From: eric.talevich at gmail.com (Eric Talevich)
Date: Mon, 15 Aug 2011 11:02:57 -0400
Subject: [Biopython-dev] Release blockers? PAML?
In-Reply-To: <CAKVJ-_4NJTRBWbSr3cjUX4TT=JFMFYfSLRs0n9tEnbsvxBS-cw@mail.gmail.com>
References: <CAKVJ-_4NJTRBWbSr3cjUX4TT=JFMFYfSLRs0n9tEnbsvxBS-cw@mail.gmail.com>
Message-ID: <CAMC681nEp6JeLyCYXamHtRSZ5VLkb=MwMWxHE3dw9dVJezYFUg@mail.gmail.com>

On Mon, Aug 15, 2011 at 6:04 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> Hi all,
>
> We're about due to make a Biopython release, and I could
> do it early this week - but then I'm away for a fortnight. I am
> fortunate to be attending the BioHackathon 2011 in Kyoto
> next week, http://2011.biohackathon.org/
>
> [...]

> What else should be addressed before the next release?
>
> There are some open bugs, but at first glance nothing
> critical.
>
>
A while ago I pushed a new function, Phylo.draw(). It draws rooted
phylograms much like Phylip's drawgram or ape's plot.tree function. There's
a lot of room for personal preferences here, so I'd appreciate if someone
else could try it out and suggest changes.

Usage:
>>> from Bio import Phylo
>>> tree = Phylo.read('some_tree.nwk', 'newick')
>>> Phylo.draw(tree)

Code:
https://github.com/biopython/biopython/blob/master/Bio/Phylo/_utils.py

The function only takes a few arguments, but since it's based on
matplotlib/pylab, the aesthetics of a plot can easily be changed after the
initial plotting.

If we're happy with it, then I'll add a mention of it to the Tutorial.

While I'm at it, has anyone else used Bio.Applications.PhymlCommandline and
found any issues?

Thanks,
Eric


From b.invergo at gmail.com  Tue Aug 16 20:06:24 2011
From: b.invergo at gmail.com (Brandon Invergo)
Date: Tue, 16 Aug 2011 22:06:24 +0200
Subject: [Biopython-dev] Release blockers? PAML?
In-Reply-To: <CAKVJ-_4NJTRBWbSr3cjUX4TT=JFMFYfSLRs0n9tEnbsvxBS-cw@mail.gmail.com>
References: <CAKVJ-_4NJTRBWbSr3cjUX4TT=JFMFYfSLRs0n9tEnbsvxBS-cw@mail.gmail.com>
Message-ID: <1313525186.3107.7.camel@localhost.localdomain>

Hi everyone,

I wrote some tests for the presence of the PAML binaries and I've run
all the unit tests in Python 2.7 on Windows 7 and they all pass.

Cheers,
Brandon


On Mon, 2011-08-15 at 11:04 +0100, Peter Cock wrote:
> Hi all,
> 
> We're about due to make a Biopython release, and I could
> do it early this week - but then I'm away for a fortnight. I am
> fortunate to be attending the BioHackathon 2011 in Kyoto
> next week, http://2011.biohackathon.org/
> 
> I think we're in a good position with the code on the trunk to
> release Biopython 1.58, bar the PAML code which has not
> yet been tested on Windows. Also, I'd be keen for Tiago and
> Brandon to take a look at the application calling code to see
> if there is any scope for a more common approach between
> the PAML wrappers and the PopGen tools. Note that both
> sets of tools are not 'nicely behaved' Unix style tools (which
> is what the Bio.Applications API targets). To do anything
> useful with these tools you have to do nasty things like
> switch the current working directory and so on.
> 
> If we want to do the release this week, we could just warn
> that the PAML code is considered to be "in beta" and that
> the API may well change in non-backwards compatible
> ways?
> 
> What else should be addressed before the next release?
> 
> There are some open bugs, but at first glance nothing
> critical.
> 
> Regards,
> 
> Peter


From p.j.a.cock at googlemail.com  Wed Aug 17 15:28:16 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 17 Aug 2011 16:28:16 +0100
Subject: [Biopython-dev] PAML yn00 under Windows
Message-ID: <CAKVJ-_53J=c99VvZEEtcNEv+aBQ3MkxBF670099ZkFNuu34Pvg@mail.gmail.com>

Hi Brandon,

It looks like the stats line parsing in yn00 needs a little adjustment
for this platform,

======================================================================
ERROR: Test that the yn00 binary runs and generates correct output.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "c:\repositories\BuildBot\win26\build\Tests\test_PAML_tools.py",
line 139, in testYn00Binary
    results = self.yn.run()
  File "c:\repositories\BuildBot\win26\build\build\lib.win32-2.6\Bio\Phylo\PAML\yn00.py",
line 106, in run
    results = read(self.out_file)
  File "c:\repositories\BuildBot\win26\build\build\lib.win32-2.6\Bio\Phylo\PAML\yn00.py",
line 131, in read
    sequences)
  File "c:\repositories\BuildBot\win26\build\build\lib.win32-2.6\Bio\Phylo\PAML\_parse_yn00.py",
line 110, in parse_others
    value = stats_split[i+2].strip("()")
IndexError: list index out of range

----------------------------------------------------------------------
Ran 157 tests in 282.385 seconds


I added this commit for a more helpful error message:
https://github.com/biopython/biopython/commit/420430164d258aae27714d907705cd729626f3c6

C:\repositories\biopython\Tests>c:\python26\python test_PAML_tools.py
Test that the baseml binary runs and generates correct output ... ok
Test that the codeml binary runs and generates correct output ... ok
Test that the yn00 binary runs and generates correct output. ... ERROR

======================================================================
ERROR: Test that the yn00 binary runs and generates correct output.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_PAML_tools.py", line 139, in testYn00Binary
    results = self.yn.run()
  File "c:\python26\Lib\site-packages\Bio\Phylo\PAML\yn00.py", line 106, in run
    results = read(self.out_file)
  File "c:\python26\Lib\site-packages\Bio\Phylo\PAML\yn00.py", line 131, in read
    sequences)
  File "c:\python26\Lib\site-packages\Bio\Phylo\PAML\_parse_yn00.py",
line 113, in parse_others
    raise ValueError("Problem with stats line: %r" % line)
ValueError: Problem with stats line: 'LWL85m: dS = -1.#IND dN =
-1.#IND w =-1.#IND S =   -1.$ N =   -1.$ (rho = -1.#IO)\n'

----------------------------------------------------------------------
Ran 3 tests in 1.312s

FAILED (errors=1)


It looks like you're not expecting a bracket pattern quite like that
(and/or this is a cross platform C float representation issue).

Hopefully that string is enough to work out how to fix the parser,
even if you can't reproduce this on your own machine. I can try
and find the output file if you like... might have to disable the
tool's clean up code temporarily to leave it behind.

Regards,

Peter


From p.j.a.cock at googlemail.com  Wed Aug 17 15:39:41 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 17 Aug 2011 16:39:41 +0100
Subject: [Biopython-dev] PAML yn00 under Windows
In-Reply-To: <CAKVJ-_53J=c99VvZEEtcNEv+aBQ3MkxBF670099ZkFNuu34Pvg@mail.gmail.com>
References: <CAKVJ-_53J=c99VvZEEtcNEv+aBQ3MkxBF670099ZkFNuu34Pvg@mail.gmail.com>
Message-ID: <CAKVJ-_4Y+1pDQOg+43StxuomyrnUrNuhd0pVFEe7OFYXteGtKQ@mail.gmail.com>

On Wed, Aug 17, 2011 at 4:28 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> Hi Brandon,
>
> It looks like the stats line parsing in yn00 needs a little adjustment
> for this platform,
> ...
>    value = stats_split[i+2].strip("()")
> IndexError: list index out of range
>
>
> ...
>    raise ValueError("Problem with stats line: %r" % line)
> ValueError: Problem with stats line: 'LWL85m: dS = -1.#IND dN =
> -1.#IND w =-1.#IND S =   -1.$ N =   -1.$ (rho = -1.#IO)\n'

I think you need an adjustment to the bounds on i, given you want to use
stats_split[i] and stats_split[i+2]. Not sure if you want a -3 or -2 on the
upper bound...

C:\repositories\biopython\Tests>git diff
diff --git a/Bio/Phylo/PAML/_parse_yn00.py b/Bio/Phylo/PAML/_parse_yn00.py
index 221b6de..e4967fb 100644
--- a/Bio/Phylo/PAML/_parse_yn00.py
+++ b/Bio/Phylo/PAML/_parse_yn00.py
@@ -103,7 +103,7 @@ def parse_others(lines, results, sequences):
                 stats = {}
                 line_stats = line.split(":")[1].strip()
                 stats_split = line_stats.split()
-                for i in range(0, len(stats_split), 3):
+                for i in range(0, len(stats_split)-3, 3):
                     stat = stats_split[i].strip("()")
                     if stat == "w":
                         stat = "omega"


I don't know why this didn't come up under Linux, something subtle
going on between the PAML versions maybe?
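
The -3 versus -2 question can be checked with a few lines of plain
Python. This is only a sketch: triple_starts is a made-up helper that
mirrors the range() call in the diff, and the "well formed" numbers are
invented; only the Windows string comes from the traceback.

```python
def triple_starts(tokens, trim):
    """Start indices of (name, "=", value) triples for a given trim."""
    return list(range(0, len(tokens) - trim, 3))


# Invented, well-formed line: 9 tokens, i.e. exactly three triples.
well_formed = "dS = 0.1 dN = 0.2 w = 0.5".split()

# The stats portion of the line reported from the Windows build.
windows = (
    "dS = -1.#IND dN = -1.#IND w =-1.#IND "
    "S =   -1.$ N =   -1.$ (rho = -1.#IO)"
).split()

if __name__ == "__main__":
    print(triple_starts(well_formed, 2))  # keeps the final triple
    print(triple_starts(well_formed, 3))  # drops the final triple
    print(len(windows))                   # not a multiple of three
    print(windows[6:9])                   # "w" and "=" have merged
```

Since stats_split[i+2] needs i < len - 2, a trim of 2 is the tight bound,
while 3 silently skips the last statistic whenever the token count is an
exact multiple of three. Either bound only avoids the IndexError: once
"w =-1.#IND" splits into two tokens instead of three, every later triple
is misaligned.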

Regards,

Peter


From p.j.a.cock at googlemail.com  Wed Aug 17 17:02:24 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 17 Aug 2011 18:02:24 +0100
Subject: [Biopython-dev] PAML yn00 under Windows
In-Reply-To: <CAKVJ-_4Y+1pDQOg+43StxuomyrnUrNuhd0pVFEe7OFYXteGtKQ@mail.gmail.com>
References: <CAKVJ-_53J=c99VvZEEtcNEv+aBQ3MkxBF670099ZkFNuu34Pvg@mail.gmail.com>
	<CAKVJ-_4Y+1pDQOg+43StxuomyrnUrNuhd0pVFEe7OFYXteGtKQ@mail.gmail.com>
Message-ID: <CAKVJ-_7zz+VXWpnmrqv-JNt-odP-p7KeB-ZSud_NK_=ORwVs8A@mail.gmail.com>

Hi again,

You may have noticed from the buildbot emails that there is a
separate issue with the PAML tests on Python (2.4 and) 2.5,
applying to executing all three binaries tried: yn00, baseml
and codeml, e.g.

http://testing.open-bio.org:8010/builders/Windows%20XP%20-%20Python%202.4/builds/259/steps/shell/logs/stdio

======================================================================
ERROR: Test that the yn00 binary runs and generates correct output.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "c:\repositories\BuildBot\win24\build\Tests\test_PAML_tools.py",
line 139, in testYn00Binary
    results = self.yn.run()
  File "c:\repositories\BuildBot\win24\build\build\lib.win32-2.4\Bio\Phylo\PAML\yn00.py",
line 104, in run
    Paml.run(self, ctl_file, verbose, command)
  File "c:\repositories\BuildBot\win24\build\build\lib.win32-2.4\Bio\Phylo\PAML\_paml.py",
line 148, in run
    raise EnvironmentError, "The %s process was killed." % command
EnvironmentError: The yn00 process was killed.

----------------------------------------------------------------------


I can reproduce this at the terminal window, and it is specific
to Python (2.4 and) 2.5, using Python 2.6 and 2.7 is fine, as
are Python 3.1 and 3.2.

Peter


From p.j.a.cock at googlemail.com  Wed Aug 17 17:56:28 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 17 Aug 2011 18:56:28 +0100
Subject: [Biopython-dev] PAML yn00 under Windows
In-Reply-To: <CAKVJ-_7zz+VXWpnmrqv-JNt-odP-p7KeB-ZSud_NK_=ORwVs8A@mail.gmail.com>
References: <CAKVJ-_53J=c99VvZEEtcNEv+aBQ3MkxBF670099ZkFNuu34Pvg@mail.gmail.com>
	<CAKVJ-_4Y+1pDQOg+43StxuomyrnUrNuhd0pVFEe7OFYXteGtKQ@mail.gmail.com>
	<CAKVJ-_7zz+VXWpnmrqv-JNt-odP-p7KeB-ZSud_NK_=ORwVs8A@mail.gmail.com>
Message-ID: <CAKVJ-_7DGxnyOCZc0Vx3_F8e4e4iF_=K-5NFiKsLGmJxLwdYGQ@mail.gmail.com>

On Wed, Aug 17, 2011 at 6:02 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> Hi again,
>
> You may have noticed from the buildbot emails that there is a
> separate issue with the PAML tests on Python (2.4 and) 2.5,
> applying to executing all three binaries tried: yn00, baseml
> and codeml, e.g.
> ...
> I can reproduce this at the terminal window, and it is specific
> to Python (2.4 and) 2.5, using Python 2.6 and 2.7 is fine, as
> are Python 3.1 and 3.2.

I'm getting -1 back from the subprocess.call(...)
https://github.com/biopython/biopython/commit/2d94a24ca223851d9fd895a82780dd0f23dc2dca

Some debugging later I realised the paths in the control file
were using Unix slashes rather than Windows slashes:
https://github.com/biopython/biopython/commit/4125e55b291922053380b5fe688bd687c70035fa
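
For illustration only (this is not the content of the commit above), one
way to write control file paths with the platform's own separators is to
pass them through os.path.normpath. The helper name and the file path are
hypothetical; "seqfile" is a PAML control file option.

```python
import ntpath
import os
import posixpath


def control_file_line(option, path, pathmod=os.path):
    """Format one "option = path" line using native path separators.

    `pathmod` defaults to os.path; passing ntpath or posixpath shows
    what the same call produces on Windows or Unix respectively.
    """
    return "%s = %s" % (option, pathmod.normpath(path))


if __name__ == "__main__":
    path = "Tests/PAML/alignment.phylip"
    print(control_file_line("seqfile", path, pathmod=posixpath))
    print(control_file_line("seqfile", path, pathmod=ntpath))
    print(control_file_line("seqfile", path))  # whatever this OS uses
```

On Windows, normpath converts forward slashes to backslashes, so paths
built with "/" in the tests come out in the form the binaries expect.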

That should now just leave the yn00 stats parsing for you
to check (which offset should the fix use, assuming that
is the right fix).

It was worth insisting on more tests and running them on Windows :)

Regards,

Peter


From b.invergo at gmail.com  Wed Aug 17 18:43:04 2011
From: b.invergo at gmail.com (Brandon Invergo)
Date: Wed, 17 Aug 2011 20:43:04 +0200
Subject: [Biopython-dev] PAML yn00 under Windows
In-Reply-To: <CAKVJ-_7DGxnyOCZc0Vx3_F8e4e4iF_=K-5NFiKsLGmJxLwdYGQ@mail.gmail.com>
References: <CAKVJ-_53J=c99VvZEEtcNEv+aBQ3MkxBF670099ZkFNuu34Pvg@mail.gmail.com>
	<CAKVJ-_4Y+1pDQOg+43StxuomyrnUrNuhd0pVFEe7OFYXteGtKQ@mail.gmail.com>
	<CAKVJ-_7zz+VXWpnmrqv-JNt-odP-p7KeB-ZSud_NK_=ORwVs8A@mail.gmail.com>
	<CAKVJ-_7DGxnyOCZc0Vx3_F8e4e4iF_=K-5NFiKsLGmJxLwdYGQ@mail.gmail.com>
Message-ID: <1313606586.3107.9.camel@localhost.localdomain>

Hi, 
Just got home and saw the emails. Yes, in the end it was good to do the
extra tests! So the path separator problem is solved, right?

That indexing is a weird one. I'll look at it now.

-brandon

On Wed, 2011-08-17 at 18:56 +0100, Peter Cock wrote:
> On Wed, Aug 17, 2011 at 6:02 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> > Hi again,
> >
> > You may have noticed from the buildbot emails that there is a
> > separate issue with the PAML tests on Python (2.4 and) 2.5,
> > applying to executing all three binaries tried: yn00, baseml
> > and codeml, e.g.
> > ...
> > I can reproduce this at the terminal window, and it is specific
> > to Python (2.4 and) 2.5, using Python 2.6 and 2.7 is fine, as
> > are Python 3.1 and 3.2.
> 
> I'm getting -1 back from the subprocess.call(...)
> https://github.com/biopython/biopython/commit/2d94a24ca223851d9fd895a82780dd0f23dc2dca
> 
> Some debugging later I realised the paths in the control file
> were using Unix slashes rather than Windows slashes:
> https://github.com/biopython/biopython/commit/4125e55b291922053380b5fe688bd687c70035fa
> 
> That should now just leave the yn00 stats parsing for you
> to check (which offset should the fix use, assuming that
> is the right fix).
> 
> It was worth insisting on more tests and running them on Windows :)
> 
> Regards,
> 
> Peter


From b.invergo at gmail.com  Wed Aug 17 21:28:32 2011
From: b.invergo at gmail.com (Brandon Invergo)
Date: Wed, 17 Aug 2011 23:28:32 +0200
Subject: [Biopython-dev] PAML yn00 under Windows
In-Reply-To: <CAKVJ-_4Y+1pDQOg+43StxuomyrnUrNuhd0pVFEe7OFYXteGtKQ@mail.gmail.com>
References: <CAKVJ-_53J=c99VvZEEtcNEv+aBQ3MkxBF670099ZkFNuu34Pvg@mail.gmail.com>
	<CAKVJ-_4Y+1pDQOg+43StxuomyrnUrNuhd0pVFEe7OFYXteGtKQ@mail.gmail.com>
Message-ID: <1313616514.3107.27.camel@localhost.localdomain>

Ok, I just sent a pull request. It turns out that either due to the way
C works in Windows or due to the way PAML was coded, what was a nice
"-nan" in Linux is printed as "-1.#IND" in Windows, which messed up
everything. Rather than parsing it in an algorithmic manner, I got angry
and threw some regex fu at it, which works a lot nicer than what I had
before.

Tested successfully in Linux and Windows 7, Python 2.7.2
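
For readers following along, here is a minimal sketch of the regex idea described above. It is not the code from the pull request: the pattern, the function name parse_stats_line, and the choice to map unparseable Windows tokens such as "-1.#IND" to NaN are all assumptions made for illustration.

```python
import re

# Assumed pattern: a name, an equals sign, then any run of characters that
# is not whitespace or a parenthesis (so "-nan", "-1.#IND", "-1.$" all match).
STAT_RE = re.compile(r"(\w+)\s*=\s*([^\s()]+)")


def parse_stats_line(line):
    """Split a 'METHOD: name = value ...' line into (method, stats dict)."""
    method, _, rest = line.partition(":")
    stats = {}
    for name, raw in STAT_RE.findall(rest):
        if name == "w":
            name = "omega"
        try:
            value = float(raw)  # handles ordinary numbers and "-nan"
        except ValueError:
            # Assumption: anything float() rejects (e.g. "-1.#IND") is a NaN
            value = float("nan")
        stats[name] = value
    return method.strip(), stats
```

Because the pattern pairs each name with the token after its equals sign, it does not depend on how many whitespace-separated fields the line splits into, which is where the index arithmetic went wrong.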

-brandon

On Wed, 2011-08-17 at 16:39 +0100, Peter Cock wrote:
> On Wed, Aug 17, 2011 at 4:28 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> > Hi Brandon,
> >
> > It looks like the stats line parsing in yn00 needs a little adjustment
> > for this platform,
> > ...
> >    value = stats_split[i+2].strip("()")
> > IndexError: list index out of range
> >
> >
> > ...
> >    raise ValueError("Problem with stats line: %r" % line)
> > ValueError: Problem with stats line: 'LWL85m: dS = -1.#IND dN =
> > -1.#IND w =-1.#IND S =   -1.$ N =   -1.$ (rho = -1.#IO)\n'
> 
> I think you need an adjustment to the bounds on i given you want to use
> stats_split[i] and stats_split[i+2]. Not sure if you want a -3 or -2 on the upper
> bound...
> 
> C:\repositories\biopython\Tests>git diff
> diff --git a/Bio/Phylo/PAML/_parse_yn00.py b/Bio/Phylo/PAML/_parse_yn00.py
> index 221b6de..e4967fb 100644
> --- a/Bio/Phylo/PAML/_parse_yn00.py
> +++ b/Bio/Phylo/PAML/_parse_yn00.py
> @@ -103,7 +103,7 @@ def parse_others(lines, results, sequences):
>                  stats = {}
>                  line_stats = line.split(":")[1].strip()
>                  stats_split = line_stats.split()
> -                for i in range(0, len(stats_split), 3):
> +                for i in range(0, len(stats_split)-3, 3):
>                      stat = stats_split[i].strip("()")
>                      if stat == "w":
>                          stat = "omega"
> 
> 
> I don't know why this didn't come up under Linux, something subtle
> going on between the PAML versions maybe?
> 
> Regards,
> 
> Peter


From p.j.a.cock at googlemail.com  Wed Aug 17 21:43:13 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 17 Aug 2011 22:43:13 +0100
Subject: [Biopython-dev] PAML yn00 under Windows
In-Reply-To: <1313616514.3107.27.camel@localhost.localdomain>
References: <CAKVJ-_53J=c99VvZEEtcNEv+aBQ3MkxBF670099ZkFNuu34Pvg@mail.gmail.com>
	<CAKVJ-_4Y+1pDQOg+43StxuomyrnUrNuhd0pVFEe7OFYXteGtKQ@mail.gmail.com>
	<1313616514.3107.27.camel@localhost.localdomain>
Message-ID: <CAKVJ-_4PLb42auxnmuKYZPdFhhg66c+kWRNpiFEF9Rd4hsngxQ@mail.gmail.com>

On Wed, Aug 17, 2011 at 10:28 PM, Brandon Invergo <b.invergo at gmail.com> wrote:
> Ok, I just sent a pull request. It turns out that either due to the way
> C works in Windows or due to the way PAML was coded, what was a nice
> "-nan" in Linux is printed as "-1.#IND" in Windows, which messed up
> everything.

That sounds like the C float libraries, the oddities of which are
something which later versions of Python have done a better
and better job of hiding from us ;)

> Rather than parsing it in an algorithmic manner, I got angry
> and threw some regex fu at it, which works a lot nicer than what
> I had before.
>
> Tested successfully in Linux and Windows 7, Python 2.7.2
>
> -brandon

Sounds good - I'll have a look on github (possibly tomorrow),

Peter


From p.j.a.cock at googlemail.com  Thu Aug 18 16:10:15 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 18 Aug 2011 17:10:15 +0100
Subject: [Biopython-dev] Commit freeze for release 1.58
Message-ID: <CAKVJ-_6An1D82CAdD0tN6BzbMTSqhhRx4ES4Z+38aT82+_8tow@mail.gmail.com>

Hi all,

Unless anyone objects I propose to do the Biopython 1.58
release in the next hour. If this runs into any issues, it will
have to wait until I'm back at work in two weeks time, or
someone else (with access to a Windows 32 bit machine
with all the compilers setup) can tackle it instead.

I will be active online next week however - and coding -
but on Japan time: http://2011.biohackathon.org/

I'm assuming the NEWS file is up to date, and will as
usual be basing the release notice on that. If there is
anything missing, please reply by email.

Thank you all,

Peter


From p.j.a.cock at googlemail.com  Thu Aug 18 17:19:32 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 18 Aug 2011 18:19:32 +0100
Subject: [Biopython-dev] Commit freeze for release 1.58
In-Reply-To: <CAKVJ-_6An1D82CAdD0tN6BzbMTSqhhRx4ES4Z+38aT82+_8tow@mail.gmail.com>
References: <CAKVJ-_6An1D82CAdD0tN6BzbMTSqhhRx4ES4Z+38aT82+_8tow@mail.gmail.com>
Message-ID: <CAKVJ-_6F47YomzP+YVu69=6AA=MMuuFjhpnB0+yuNvmgpVenGA@mail.gmail.com>

On Thu, Aug 18, 2011 at 5:10 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> Hi all,
>
> Unless anyone objects I propose to do the Biopython 1.58
> release in the next hour. If this runs into any issues, it will
> have to wait until I'm back at work in two weeks time, or
> someone else (with access to a Windows 32 bit machine
> with all the compilers setup) can tackle it instead.
>
> I will be active online next week however - and coding -
> but on Japan time: http://2011.biohackathon.org/
>
> I'm assuming the NEWS file is up to date, and will as
> usual be basing the release notice on that. If there is
> anything missing, please reply by email.
>
> Thank you all,
>
> Peter
>

Ok, that's done. And in news that will no doubt please
some of you, I've finally given up on keeping Python 2.4
support going. Feel free to start cleaning up some of the
nastier hacks (like the ElementTree imports).

Peter


From p.j.a.cock at googlemail.com  Thu Aug 18 19:32:57 2011
From: p.j.a.cock at googlemail.com (Peter)
Date: Thu, 18 Aug 2011 20:32:57 +0100
Subject: [Biopython-dev] Biopython 1.58 released
Message-ID: <75327C54-CF88-43BC-BACF-87139456FE67@googlemail.com>

Dear All,

Biopython 1.58 is out:
http://news.open-bio.org/news/2011/08/biopython-1-58-released/

Thank you to everyone who has contributed.

Peter

P.S. We're on Twitter as @Biopython


From updates at feedmyinbox.com  Sun Aug 21 07:49:13 2011
From: updates at feedmyinbox.com (Feed My Inbox)
Date: Sun, 21 Aug 2011 03:49:13 -0400
Subject: [Biopython-dev] 8/21 newest questions tagged biopython - Stack
	Overflow
Message-ID: <0adf58b4241f2a58161d1a41524288d1@74.63.51.88>

// A PWM with gapped alignments in Biopython
// August 9, 2011 at 11:28 AM

http://stackoverflow.com/questions/6998727/a-pwm-with-gapped-alignments-in-biopython
I'm trying to generate a Position-Weighted Matrix (PWM) in Biopython from Clustalw multiple sequence alignments.  I get a "Wrong Alphabet" error every time I do it with gapped alignments.  From reading the documentation, I think I need to utilize the Gapped Alphabet to deal with the '-' character in gapped alignments.  But when I do this, it still doesn't resolve the error.  Does anyone see the problem with this code, or have a better way to generate a PWM from gapped Clustal alignments?

from Bio.Alphabet import Gapped
alignment = AlignIO.read("filename.clustalw", "clustal", alphabet=Gapped)
m = Motif.Motif()
for a in alignment:
    m.add_instance(a.seq)
m.pwm()


--
Website: http://stackoverflow.com/questions/tagged/?tagnames=biopython&sort=newest

Account Login: 
https://www.feedmyinbox.com/members/login/?utm_source=fmi&utm_medium=email&utm_campaign=feed-email

Unsubscribe here: 
http://www.feedmyinbox.com/feeds/unsubscribe/837947/00ae8e456ba91bb32a32b795eb392f971eee04e9/?utm_source=fmi&utm_medium=email&utm_campaign=feed-email

--
This email was carefully delivered by FeedMyInbox.com. 
PO Box 682532 Franklin, TN 37068


From updates at feedmyinbox.com  Sun Aug 21 07:48:37 2011
From: updates at feedmyinbox.com (Feed My Inbox)
Date: Sun, 21 Aug 2011 03:48:37 -0400
Subject: [Biopython-dev] 8/21 biopython Questions - BioStar
Message-ID: <44c53445166933a51ab21f5d53e72577@74.63.51.88>

// Error using Entrez.esummary from biopython
// August 16, 2011 at 8:47 AM

http://biostar.stackexchange.com/questions/11232/error-using-entrez-esummary-from-biopython
Can someone please explain this error?

I have a small script that tries to fetch information from an NCBI BioAssay using the Entrez module from Biopython. I get an error I do not understand. I try to run:

from Bio import Entrez
Entrez.email="yourname at mail.se"

handle_esummary=Entrez.esummary(db='pcassay',id='1337')
record_esummary=Entrez.read(handle_esummary)


I get the error:

File "smaltest.py", line 5, in <module>
    record_esummary=Entrez.read(handle_esummary)
  File "/usr/common/schrodinger/mmshare-v20109/lib/Linux-x86_64/lib/python2.7/site-packages/Bio/Entrez/__init__.py", line 297, in read
    record = handler.run(handle)
  File "/usr/common/schrodinger/mmshare-v20109/lib/Linux-x86_64/lib/python2.7/site-packages/Bio/Entrez/Parser.py", line 90, in run
    self.parser.ParseFile(handle)
  File "/usr/common/schrodinger/mmshare-v20109/lib/Linux-x86_64/lib/python2.7/site-packages/Bio/Entrez/Parser.py", line 105, in startElement
    itemtype = str(attrs["Type"]) # convert from Unicode
KeyError: 'Type'


// Import fasta sequences to a motif
// August 15, 2011 at 11:54 AM

http://biostar.stackexchange.com/questions/11204/import-fasta-sequences-to-a-motif
I need to construct a PWM from every sequence in a fasta file, using biopython.  The way I'm trying to do this is to import each line of sequence into a motif, then run a PWM on each instance of the motif.  Currently, I'm trying it this way, but different variations of it have generated their fair share of errors, mostly "Wrong Alphabet" and "NoneType object is not iterable":

alphabet = IUPAC.unambiguous_dna
m = Motif.Motif(alphabet)

for seq_record in SeqIO.parse("10fasta.fasta", "fasta"):
    m.add_instance(seq_record.seq)
    print m1.pwm()


Does anyone see what's wrong with the way I'm adding instances to the motif?  Of course, if there's a better way to do this that I'm completely missing, feel free to comment on that too.


// A PWM with gapped alignments in Biopython
// August 9, 2011 at 1:47 PM

http://biostar.stackexchange.com/questions/11070/a-pwm-with-gapped-alignments-in-biopython
I'm trying to generate a Position-Weighted Matrix (PWM) in Biopython from Clustalw multiple sequence alignments. I get a "Wrong Alphabet" error every time I do it with gapped alignments. From reading the documentation, I think I need to utilize the Gapped Alphabet to deal with the '-' character in gapped alignments. But when I do this, it still doesn't resolve the error. Does anyone see the problem with this code, or have a better way to generate a PWM from gapped Clustal alignments?

from Bio.Alphabet import Gapped
alignment = AlignIO.read("filename.clustalw", "clustal", alphabet=Gapped)
m = Motif.Motif()
for a in alignment:
    m.add_instance(a.seq)
m.pwm()


--
Website: http://biostar.stackexchange.com/questions/tagged/biopython



From p.j.a.cock at googlemail.com  Mon Aug 22 06:53:17 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 22 Aug 2011 07:53:17 +0100
Subject: [Biopython-dev] Type vs type in Entrez.esummary XML (via BioStar)
Message-ID: <CAKVJ-_67YzQCHrv2ZZyrbbtMiqu9xtJWZLsZPdXYSnTP4Ti9Wg@mail.gmail.com>

Hi all,

On Sun, Aug 21, 2011 at 8:48 AM, Feed My Inbox <updates at feedmyinbox.com> wrote:
> // Error using Entrez.esummary from biopython
> // August 16, 2011 at 8:47 AM
>
> http://biostar.stackexchange.com/questions/11232/error-using-entrez-esummary-from-biopython
> Can someone please explain this error?
>
> I have a small script that tries to fetch information from an
> NCBI BioAssay using the Entrez module from Biopython. I get
> an error I do not understand. I try to run:
>
> from Bio import Entrez
> Entrez.email="yourname at mail.se"
>
> handle_esummary=Entrez.esummary(db='pcassay',id='1337')
> record_esummary=Entrez.read(handle_esummary)
>
>
> I get the error:
>
> File "smaltest.py", line 5, in <module>
>     record_esummary=Entrez.read(handle_esummary)
>   File "/usr/common/schrodinger/mmshare-v20109/lib/Linux-x86_64/lib/python2.7/site-packages/Bio/Entrez/__init__.py", line 297, in read
>     record = handler.run(handle)
>   File "/usr/common/schrodinger/mmshare-v20109/lib/Linux-x86_64/lib/python2.7/site-packages/Bio/Entrez/Parser.py", line 90, in run
>     self.parser.ParseFile(handle)
>   File "/usr/common/schrodinger/mmshare-v20109/lib/Linux-x86_64/lib/python2.7/site-packages/Bio/Entrez/Parser.py", line 105, in startElement
>     itemtype = str(attrs["Type"]) # convert from Unicode
> KeyError: 'Type'
>

I can reproduce this, and the cause is the NCBI using
lowercase in one tag's attribute:

<Item Name="SourceNameList" type="List">

We're expecting the attributes to be Name and Type, and
that is the case for all the other <Item> tags in this file.

Michiel - do you think we should just add a fallback for
type if we get a KeyError on Type? Do you think we should
report this inconsistency/bug to the NCBI?
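
A minimal sketch of what such a fallback could look like. This is an illustration only, not the actual Bio.Entrez.Parser code; the helper name get_item_type is hypothetical, and attrs is assumed to behave like a plain dictionary of XML attributes.

```python
def get_item_type(attrs):
    """Return the Item type, tolerating a lowercase 'type' attribute."""
    try:
        return str(attrs["Type"])
    except KeyError:
        # Fallback for XML that uses type="..." instead of Type="..."
        return str(attrs["type"])
```

With the tag shown above, get_item_type({"Name": "SourceNameList", "type": "List"}) would return "List" rather than raising a KeyError.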

Peter


From p.j.a.cock at googlemail.com  Mon Aug 22 07:03:30 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 22 Aug 2011 08:03:30 +0100
Subject: [Biopython-dev] Type vs type in Entrez.esummary XML (via
	BioStar)
In-Reply-To: <CAKVJ-_67YzQCHrv2ZZyrbbtMiqu9xtJWZLsZPdXYSnTP4Ti9Wg@mail.gmail.com>
References: <CAKVJ-_67YzQCHrv2ZZyrbbtMiqu9xtJWZLsZPdXYSnTP4Ti9Wg@mail.gmail.com>
Message-ID: <CAKVJ-_783C_9JjJGar4stEu__c9v9Fk3gW=7sTf5b_VmJN-QUA@mail.gmail.com>

On Mon, Aug 22, 2011 at 7:53 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> Hi all,
>
> On Sun, Aug 21, 2011 at 8:48 AM, Feed My Inbox <updates at feedmyinbox.com> wrote:
>> // Error using Entrez.esummary from biopython
>> // August 16, 2011 at 8:47 AM
>>
>> http://biostar.stackexchange.com/questions/11232/error-using-entrez-esummary-from-biopython
>> Can someone please explain this error?
>>
>> I have a small script that tries to fetch information from an
>> NCBI BioAssay using the Entrez module from Biopython. I get
>> an error I do not understand. I try to run:
>>
>> from Bio import Entrez
>> Entrez.email="yourname at mail.se"
>>
>> handle_esummary=Entrez.esummary(db='pcassay',id='1337')
>> record_esummary=Entrez.read(handle_esummary)
>>
>>
>> I get the error:
>>
>> File "smaltest.py", line 5, in <module>
>>     record_esummary=Entrez.read(handle_esummary)
>>   File "/usr/common/schrodinger/mmshare-v20109/lib/Linux-x86_64/lib/python2.7/site-packages/Bio/Entrez/__init__.py", line 297, in read
>>     record = handler.run(handle)
>>   File "/usr/common/schrodinger/mmshare-v20109/lib/Linux-x86_64/lib/python2.7/site-packages/Bio/Entrez/Parser.py", line 90, in run
>>     self.parser.ParseFile(handle)
>>   File "/usr/common/schrodinger/mmshare-v20109/lib/Linux-x86_64/lib/python2.7/site-packages/Bio/Entrez/Parser.py", line 105, in startElement
>>     itemtype = str(attrs["Type"]) # convert from Unicode
>> KeyError: 'Type'
>>
>
> I can reproduce this, and the cause is the NCBI using
> lowercase in one tag's attribute:
>
> <Item Name="SourceNameList" type="List">
>
> We're expecting the attributes to be Name and Type, and
> that is the case for all the other <Item> tags in this file.
>
> Michiel - do you think we should just add a fallback for
> type if we get a KeyError on Type? Do you think we should
> report this inconsistency/bug to the NCBI?

Actually it clearly violates the DTD, and thus fails XML
validation - so it is clearly a NCBI bug.

Peter


From chapmanb at 50mail.com  Tue Aug 23 19:31:34 2011
From: chapmanb at 50mail.com (Brad Chapman)
Date: Tue, 23 Aug 2011 15:31:34 -0400
Subject: [Biopython-dev] Brad's GFF parser in a Biopython repository
In-Reply-To: <CAKVJ-_738+2QeQgsTb14yWjYMKoWtNVxxgMUMFapvZNMyCSc6g@mail.gmail.com>
References: <CAKVJ-_738+2QeQgsTb14yWjYMKoWtNVxxgMUMFapvZNMyCSc6g@mail.gmail.com>
Message-ID: <20110823193134.GB507@kunkel>

Peter;
Awesome, thanks for doing this. I didn't even realize there was a
git solution that could transfer histories across repositories like
this; how did you do it?

Everything looks great on a first pass. Do you think some of the
scripts would also be useful to include in the script directory?
They handle some of the common cases people have asked about;
'access_gff_index.py' uses bx-python so might be excluded, but the
others are Biopython specific.

Thanks again,
Brad

> I managed to do a git script to select out the GFF code and tests from
> your bcbb repository and get it into the Biopython source tree. The
> folder changes made it interesting ;)
> 
> Input: https://github.com/chapmanb/bcbb (master branch)
> 
> Output: https://github.com/peterjc/biopython/tree/brad_gff
> 
> The tests pass, but that is as far as I have got with this. Brad,
> could you have a look at this new branch for sanity checking please?
> 
> Peter


From p.j.a.cock at googlemail.com  Wed Aug 24 02:33:21 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 24 Aug 2011 03:33:21 +0100
Subject: [Biopython-dev] Brad's GFF parser in a Biopython repository
In-Reply-To: <20110823193134.GB507@kunkel>
References: <CAKVJ-_738+2QeQgsTb14yWjYMKoWtNVxxgMUMFapvZNMyCSc6g@mail.gmail.com>
	<20110823193134.GB507@kunkel>
Message-ID: <CAKVJ-_6HZO0qoUUQBbMJ-oRPdK50hj_=3bAZy6qefKAOO30+uw@mail.gmail.com>

On Tue, Aug 23, 2011 at 8:31 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
> Peter;
> Awesome, thanks for doing this. I didn't even realize there was a
> git solution that could transfer histories across repositories like
> this; how did you do it?

Well, it wasn't an off the shelf solution, it was a hack.

See https://gist.github.com/1167169
and https://github.com/gitpython-developers/GitPython

I used the Python library (import git) to query the source
repository, basically doing "git log -- gff/BCBio gff/Tests"
to find only the commits of interest, then "git show XXX"
to extract the diff which I then had to modify to change
the paths, then a system call to patch to apply each
patch to the destination repository, git add, git commit.
Note for git commit you can specify the message via
a file (-F) so I could preserve the original long message,
plus you can preserve the authored date (--date) and
the author too.

There were several steps where I couldn't work out
how you were meant to do something via the git
wrapper's API (e.g. get a diff as a patch), but it also
lets you easily call git commands directly which was
easier for me.

Bit hacky but seemed to get the job done.
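
For anyone wanting to try something similar, the steps above can be sketched roughly as follows. This is not the script from the gist: it calls git and patch directly through subprocess rather than GitPython, the function names are hypothetical, and it ignores renames, binary files and merge commits.

```python
import os
import subprocess
import tempfile


def git(repo, *args):
    """Run a git command inside repo and return its output as text."""
    return subprocess.check_output(("git",) + args, cwd=repo).decode("utf-8")


def rewrite_paths(patch_text, old_prefix, new_prefix):
    """Swap old_prefix for new_prefix in the file header lines of a patch."""
    out = []
    in_header = False
    for line in patch_text.splitlines(True):
        if line.startswith("diff --git "):
            in_header = True
        elif line.startswith("@@"):
            in_header = False  # hunk content starts, leave it untouched
        if in_header and line.startswith(("diff --git ", "--- ", "+++ ")):
            line = line.replace("a/" + old_prefix, "a/" + new_prefix)
            line = line.replace("b/" + old_prefix, "b/" + new_prefix)
        out.append(line)
    return "".join(out)


def transfer(src, dest, paths, old_prefix, new_prefix):
    """Replay the commits touching paths from src onto dest, oldest first."""
    shas = git(src, "log", "--reverse", "--format=%H", "--", *paths).split()
    for sha in shas:
        author = git(src, "show", "-s", "--format=%an <%ae>", sha).strip()
        date = git(src, "show", "-s", "--format=%aD", sha).strip()
        message = git(src, "show", "-s", "--format=%B", sha)
        patch = git(src, "show", "--pretty=format:", sha, "--", *paths)
        if not patch.strip():
            continue  # nothing to apply for this commit
        patch = rewrite_paths(patch, old_prefix, new_prefix)
        proc = subprocess.Popen(["patch", "-p1"], cwd=dest,
                                stdin=subprocess.PIPE)
        proc.communicate(patch.encode("utf-8"))
        if proc.returncode != 0:
            raise RuntimeError("patch failed for commit %s" % sha)
        git(dest, "add", "-A")
        # Pass the message via a file (-F) so long messages survive intact
        with tempfile.NamedTemporaryFile("w", delete=False) as handle:
            handle.write(message)
        try:
            git(dest, "commit", "-F", handle.name,
                "--date=" + date, "--author=" + author)
        finally:
            os.unlink(handle.name)
```

A call might look like transfer("bcbb", "biopython", ["gff/BCBio", "gff/Tests"], "gff/", ""), where the prefix mapping is only a guess at the folder change and would need adjusting to the real layout.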

> Everything looks great on a first pass. Do you think some of the
> scripts would also be useful to include in the script directory?
> They handle some of the common cases people have asked about;
> 'access_gff_index.py' uses bx-python so might be excluded, but the
> others are Biopython specific.
>
> Thanks again,
> Brad

Good point - that could be mapped to the Biopython
scripts folder. I'll take a look.

Peter


From updates at feedmyinbox.com  Thu Aug 25 07:48:40 2011
From: updates at feedmyinbox.com (Feed My Inbox)
Date: Thu, 25 Aug 2011 03:48:40 -0400
Subject: [Biopython-dev] 8/25 biopython Questions - BioStar
Message-ID: <738da676fc97903dba65147015733dc5@74.63.51.88>

// How to fetch genomics sequence using coordinates in BioPython
// August 24, 2011 at 10:56 PM

http://biostar.stackexchange.com/questions/11454/how-to-fetch-genomics-sequence-using-coordinates-in-biopython
Hi everyone,

I'm a newbie of biopython. My question may be stupid but please help.
I want to use (chromosome number, start position, end position, strand) to fetch the corresponding sequence in mouse genome.
How can this be done with biopython connecting to NCBI database?
Could anyone help me please?

Thanks a lot.


--
Website: http://biostar.stackexchange.com/questions/tagged/biopython



From p.j.a.cock at googlemail.com  Fri Aug 26 07:44:32 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 26 Aug 2011 08:44:32 +0100
Subject: [Biopython-dev] Biopython under Python from Cygwin on Windows?
Message-ID: <CAKVJ-_7aDUyv+ruVdQmkZ4yVgjc+hhKaLUNi7-+pGb2hqZwPPg@mail.gmail.com>

Hi all,

I was just wondering if anyone has tried this recently
(Biopython under Cygwin), and if it would be worth
adding as another platform for the buildbot. There
are likely enough differences from Linux to cause
potential cross platform issues - especially for calling
external tools...

Regards,

Peter


From updates at feedmyinbox.com  Fri Aug 26 08:05:18 2011
From: updates at feedmyinbox.com (Feed My Inbox)
Date: Fri, 26 Aug 2011 04:05:18 -0400
Subject: [Biopython-dev] 8/26 newest questions tagged biopython - Stack
	Overflow
Message-ID: <d193273feedfd7bf650264a3d6525a5a@74.63.51.88>

// How do I set the PYTHONPATH on Cygwin?
// August 25, 2011 at 9:16 PM

http://stackoverflow.com/questions/7199082/how-do-i-set-the-pythonpath-on-cygwin
In the Biopython installation instructions, it says that if Biopython doesn't work I'm supposed to do this:

export PYTHONPATH = $PYTHONPATH':/directory/where/you/put/Biopython'

I tried doing that in Cygwin from the ~ directory using the name of the Biopython directory (or everything of it past the ~ directory), but when I tested it by going into the Python interpreter and typing in


    from Bio.Seq import Seq
  

It said the module doesn't exist.

How do I make it so that I don't have to be in the Biopython directory to be able to import Seq?


--
Website: http://stackoverflow.com/questions/tagged/?tagnames=biopython&sort=newest



From clements at galaxyproject.org  Mon Aug 29 21:29:28 2011
From: clements at galaxyproject.org (Dave Clements)
Date: Mon, 29 Aug 2011 14:29:28 -0700
Subject: [Biopython-dev] Galaxy is Hiring
In-Reply-To: <CA+He-X8EiBEMsx+AwvXvM+H0XfRDv=MONLoQ9w+Y6HLHd_JmLw@mail.gmail.com>
References: <CA+He-X8EiBEMsx+AwvXvM+H0XfRDv=MONLoQ9w+Y6HLHd_JmLw@mail.gmail.com>
Message-ID: <CA+He-X-xHsgmr4HRW_xKxwEtKKJ9if9LvVPgjStC-LGf0vnG9Q@mail.gmail.com>

Hello all

The Galaxy Project is growing and has open positions in both the Penn State
and Emory groups (http://wiki.g2.bx.psu.edu/News/Galaxy%20is%20Hiring).

*Penn State: System administrators/analysts*

The Nekrutenko Lab <http://www.bx.psu.edu/%7Eanton/> at the Huck Institutes
of Life Sciences <http://www.huck.psu.edu/> at Penn State <http://psu.edu/>
is currently recruiting system analysts/administrators with experience in
building and maintaining complex performance compute environments. The areas
of immediate need include:

   - Storage balancing and tiered storage
   - Virtualization
   - Schedulers
   - Deployment of Galaxy instances and dependence management
   - Relational databases and query optimization
   - User management

A minimum of 5 years of experience with UNIX/Linux system administration is
required. Applicants should submit a CV and list of references to
jobs at galaxyproject.org.

*Emory: Software Engineers and Post-Docs* <http://bx.mathcs.emory.edu/joining/>

The Taylor Lab <http://bx.mathcs.emory.edu/> in the Biology
<http://www.biology.emory.edu/> and Mathematics & Computer Science
<http://www.mathcs.emory.edu/> departments at Emory University
<http://emory.edu/> is looking for software engineers
<http://bx.mathcs.emory.edu/joining/sw/> and postdoctoral scholars
<http://bx.mathcs.emory.edu/joining/postdocs/> to work on the
Galaxy project.

We are seeking software engineers <http://bx.mathcs.emory.edu/joining/sw/>
with expertise in distributed computing and systems programming, web-based
visualization and visual analytics, informatics and data analysis and
integration, and bioinformatics application areas such as re-sequencing, de
novo assembly, metagenomics, transcriptome analysis and epigenetics. These
are full time positions located in Atlanta, GA. See the official posting
<http://bx.mathcs.emory.edu/joining/sw/> for full details.

Postdoctoral applicants <http://bx.mathcs.emory.edu/joining/postdocs/>
should have expertise in Bioinformatics and Computational Biology and
research interests that complement but extend the lab's current interests
<http://bx.mathcs.emory.edu/research/>: the Galaxy project; distributed and
high-performance computing for data intensive science; vertebrate functional
genomics; and genomics and epigenomic mechanisms of gene regulation, the role
of transcription factors and chromatin structure in global gene expression,
development, and differentiation. See the announcement
<http://bx.mathcs.emory.edu/joining/postdocs/> for full details.


If any of these openings describe you then please consider applying.

Thanks,

Dave C.


-- 
http://galaxyproject.org/
http://getgalaxy.org/
http://usegalaxy.org/
http://galaxyproject.org/wiki/